Search results

1 – 10 of 417
Article
Publication date: 6 February 2017

Aytug Onan

Abstract

Purpose

The immense quantity of available unstructured text documents serves as one of the largest sources of information. Text classification can be an essential task for many purposes in information retrieval, such as document organization, text filtering and sentiment analysis. Ensemble learning has been extensively studied to construct efficient text classification schemes with higher predictive performance and generalization ability. The purpose of this paper is to provide diversity among the classification algorithms of the ensemble, which is a key issue in ensemble design.

Design/methodology/approach

An ensemble scheme based on hybrid supervised clustering is presented for text classification. In the presented scheme, supervised hybrid clustering, which is based on the cuckoo search algorithm and k-means, is introduced to partition the data samples of each class into clusters so that training subsets with higher diversity can be provided. Each classifier is trained on the diversified training subsets, and the predictions of the individual classifiers are combined by the majority voting rule. The predictive performance of the proposed classifier ensemble is compared to conventional classification algorithms (such as Naïve Bayes, logistic regression, support vector machines and the C4.5 algorithm) and ensemble learning methods (such as AdaBoost, bagging and random subspace) using 11 text benchmarks.
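
For readers who want the shape of the scheme in code, here is a minimal sketch: plain k-means stands in for the paper's cuckoo-search/k-means hybrid, the leave-one-cluster-out subset rule is an illustrative stand-in for the paper's diversification step, and the synthetic data and logistic regression members are assumptions, not the authors' setup.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1500, n_features=40, n_informative=10,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

n_members = 5
# Partition each class's samples into n_members clusters.
cluster_id = np.zeros(len(y_tr), dtype=int)
for c in np.unique(y_tr):
    idx = np.where(y_tr == c)[0]
    cluster_id[idx] = KMeans(n_clusters=n_members, n_init=10,
                             random_state=0).fit_predict(X_tr[idx])

# Train each member on a diversified subset (all clusters but one per class).
members = []
for i in range(n_members):
    keep = cluster_id != i
    members.append(LogisticRegression(max_iter=1000).fit(X_tr[keep], y_tr[keep]))

# Combine the members' predictions by majority vote.
votes = np.stack([m.predict(X_te) for m in members]).astype(int)
majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("ensemble accuracy:", accuracy_score(y_te, majority))
```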

Findings

The experimental results indicate that the presented classifier ensemble outperforms the conventional classification algorithms and ensemble learning methods for text classification.

Originality/value

The presented ensemble scheme is the first to use supervised clustering to obtain a diverse ensemble for text classification.

Details

Kybernetes, vol. 46 no. 2
Type: Research Article
ISSN: 0368-492X

Keywords

Article
Publication date: 27 February 2024

Jianhua Zhang, Liangchen Li, Fredrick Ahenkora Boamah, Dandan Wen, Jiake Li and Dandan Guo

Abstract

Purpose

Traditional case-adaptation methods have poor accuracy, low efficiency and limited applicability, which cannot meet the needs of knowledge users. To address the shortcomings of the existing research in the industry, this paper proposes a case-adaptation optimization algorithm to support the effective application of tacit knowledge resources.

Design/methodology/approach

The attribute simplification algorithm based on the forward search strategy in the neighborhood decision information system is implemented to realize the vertical dimensionality reduction of the case base, and the fuzzy C-means (FCM) clustering algorithm based on the simulated annealing genetic algorithm (SAGA) is implemented to compress the case base horizontally with multiple decision classes. Then, the subspace K-nearest neighbors (KNN) algorithm is used to induce the decision rules for the set of adapted cases to complete the optimization of the adaptation model.
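
The fuzzy C-means step can be sketched directly; the short numpy implementation below is a generic FCM, without the paper's simulated annealing genetic algorithm initialization or the neighborhood-rough-set attribute reduction, and its data and parameter values are illustrative assumptions.

```python
import numpy as np

def fcm(X, n_clusters, m=2.0, n_iter=100, seed=0):
    """Generic fuzzy C-means: returns cluster centers and the membership matrix."""
    rng = np.random.default_rng(seed)
    # Random fuzzy membership matrix U (rows sum to 1).
    U = rng.random((len(X), n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        # Cluster centers are membership-weighted means.
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Update memberships from inverse relative distances.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-10
        U = 1.0 / (d ** (2 / (m - 1)))
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

X = np.random.default_rng(1).normal(size=(300, 6))
centers, U = fcm(X, n_clusters=4)
print("hard labels of first samples:", U.argmax(axis=1)[:10])
```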

Findings

The findings suggest that the rapid enrichment of data, information and tacit knowledge in the field of practice has led to low efficiency and low utilization in knowledge dissemination, and that the proposed algorithm can effectively alleviate users’ “knowledge disorientation” in the era of the knowledge economy.

Practical implications

This study provides a model with case knowledge that meets users’ needs, thereby effectively improving the application of the tacit knowledge in the explicit case base and the problem-solving efficiency of knowledge users.

Social implications

The adaptation model can serve as a stable and efficient prediction model for the effects of logistics and e-commerce enterprises’ plans.

Originality/value

This study designs a multi-decision-class case-adaptation optimization algorithm based on forward attribute selection strategy-neighborhood rough sets (FASS-NRS) and simulated annealing genetic algorithm-fuzzy C-means (SAGA-FCM) for exogenous cases that carry tacit knowledge. By effectively organizing and adjusting tacit knowledge resources, knowledge service organizations can maintain their competitive advantages. The algorithm models established in this study open theoretical directions for multi-decision-class case-adaptation optimization of tacit knowledge.

Details

Journal of Advances in Management Research, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0972-7981

Keywords

Content available
Book part
Publication date: 18 January 2022

Abstract

Details

Essays in Honor of M. Hashem Pesaran: Prediction and Macro Modeling
Type: Book
ISBN: 978-1-80262-062-7

Book part
Publication date: 18 January 2022

Andreas Pick and Matthijs Carpay

Abstract

This chapter investigates the performance of different dimension reduction approaches for large vector autoregressions in multi-step-ahead forecasts. The authors consider factor-augmented VAR models using principal components and partial least squares, random subset regression, random projection, random compression, and estimation via LASSO and Bayesian VAR. The authors compare the accuracy of iterated and direct multi-step point and density forecasts. The comparison is based on macroeconomic and financial variables from the FRED-MD database. The findings suggest that random subspace methods and LASSO estimation deliver the most precise forecasts.
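
As a rough illustration of two of the approaches compared, the sketch below produces a direct h-step forecast with LASSO and with a random subspace average of small OLS regressions; the simulated data and all hyperparameters are assumptions, not the chapter's FRED-MD setup.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
T, N, h = 300, 50, 6                     # periods, predictors, forecast horizon
X = rng.normal(size=(T, N))              # stand-in for FRED-MD style predictors
y = 0.5 * X[:, 0] + rng.normal(size=T)   # target series

# Direct forecasting: regress y_{t+h} on predictors dated t.
X_tr, y_tr = X[:-h], y[h:]
x_last = X[-1:]                          # information set at the final period

lasso_fc = Lasso(alpha=0.1).fit(X_tr, y_tr).predict(x_last)[0]

# Random subspace: average OLS forecasts over random predictor subsets.
fcs = []
for _ in range(100):
    cols = rng.choice(N, size=10, replace=False)
    fcs.append(LinearRegression().fit(X_tr[:, cols], y_tr)
               .predict(x_last[:, cols])[0])
print("LASSO:", lasso_fc, "random subspace:", np.mean(fcs))
```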

Details

Essays in Honor of M. Hashem Pesaran: Prediction and Macro Modeling
Type: Book
ISBN: 978-1-80262-062-7

Keywords

Article
Publication date: 20 August 2018

Laouni Djafri, Djamel Amar Bensaber and Reda Adjoudj

Abstract

Purpose

This paper aims to solve the problems of big data analytics for prediction including volume, veracity and velocity by improving the prediction result to an acceptable level and in the shortest possible time.

Design/methodology/approach

This paper is divided into two parts. The first aims to improve the prediction result. In this part, two ideas are proposed: the double-pruning enhanced random forest algorithm, and extracting a shared learning base via stratified random sampling to obtain a learning base representative of all the original data. The second part proposes a distributed architecture, supported by new technology solutions, which works coherently and efficiently with the sampling strategy under the supervision of the MapReduce algorithm.
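
A minimal sketch of the sampling idea follows: a stratified shared learning base that preserves class proportions feeds a random forest. The paper's double-pruning modification and the MapReduce distribution are not reproduced here, and the sizes and parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=20000, n_features=30, weights=[0.8, 0.2],
                           random_state=0)
# Stratified sampling keeps the shared base representative of all classes.
X_shared, _, y_shared, _ = train_test_split(X, y, train_size=0.1, stratify=y,
                                            random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_shared, y_shared)
print("class balance preserved:", np.bincount(y_shared) / len(y_shared))
```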

Findings

The representative learning base obtained by integrating two learning bases, the partial base and the shared base, presents an excellent representation of the original data set and gives very good results for big data predictive analytics. Furthermore, these results were supported by the improved random forest supervised learning method, which played a key role in this context.

Originality/value

All companies are concerned, especially those that hold large amounts of information and want to screen it to improve their customer knowledge and optimize their campaigns.

Details

Information Discovery and Delivery, vol. 46 no. 3
Type: Research Article
ISSN: 2398-6247

Keywords

Open Access
Article
Publication date: 2 November 2021

Showmitra Kumar Sarkar, Swapan Talukdar, Atiqur Rahman, Shahfahad and Sujit Kumar Roy

Abstract

Purpose

The present study aims to construct ensemble machine learning (EML) algorithms, including random forest (RF) and random subspace (RSS), for groundwater potentiality mapping (GPM) in the Teesta River basin of Bangladesh.

Design/methodology/approach

The RF and RSS models have been implemented to integrate 14 selected groundwater condition parameters with groundwater inventories to generate GPMs. The GPMs were then validated using the empirical and binormal receiver operating characteristic (ROC) curves.
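
In scikit-learn terms, the two ensembles can be approximated as below, with the random subspace ensemble expressed as bagging over feature subsets and validation via ROC AUC; the synthetic stand-in for the 14 groundwater condition parameters and all settings are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2000, n_features=14, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
# Random subspace: each tree sees all samples but only half of the features.
rss = BaggingClassifier(DecisionTreeClassifier(), n_estimators=300,
                        bootstrap=False, max_features=0.5,
                        random_state=0).fit(X_tr, y_tr)
for name, model in [("RF", rf), ("RSS", rss)]:
    print(name, "AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```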

Findings

The very high (831–1200 km²) and high (521–680 km²) groundwater potential areas were predicted using the EML algorithms. The RSS model (AUC = 0.892) outperformed the RF model based on the area under the ROC curve (AUC).

Originality/value

Two new EML models have been constructed for GPM. These findings will aid in proposing sustainable water resource management plans.

Details

Frontiers in Engineering and Built Environment, vol. 2 no. 1
Type: Research Article
ISSN: 2634-2499

Keywords

Article
Publication date: 23 August 2011

Ch. Aswani Kumar

Abstract

Purpose

The purpose of this paper is to introduce a new hybrid method for reducing dimensionality of high dimensional data.

Design/methodology/approach

The literature on dimensionality reduction (DR) includes research efforts that combine random projections (RP) and singular value decomposition (SVD) so as to derive the benefits of both methods. However, SVD is well known for its computational complexity. Clustering under the notion of concept decomposition has been proved to be less computationally complex than SVD and useful for DR. The method proposed in this paper combines RP and fuzzy k-means clustering (FKM) for reducing the dimensionality of the data.
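
A rough sketch of the combination: random projection first, then concept decomposition, with each sample represented by its least-squares coordinates in the cluster centroid basis. Hard k-means stands in here for the paper's fuzzy k-means, and the data and sizes are illustrative assumptions.

```python
import numpy as np
from sklearn.random_projection import GaussianRandomProjection
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1000))               # high-dimensional data

# Step 1: random projection down to an intermediate dimension.
X_rp = GaussianRandomProjection(n_components=100,
                                random_state=0).fit_transform(X)

# Step 2: concept decomposition via cluster centroids.
km = KMeans(n_clusters=20, n_init=10, random_state=0).fit(X_rp)
C = km.cluster_centers_.T                      # concept matrix, 100 x 20
# Final representation: least-squares coordinates in the centroid basis.
X_low, *_ = np.linalg.lstsq(C, X_rp.T, rcond=None)
print("reduced shape:", X_low.T.shape)         # 500 samples x 20 concepts
```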

Findings

The proposed RP-FKM is computationally less complex than SVD and RP-SVD. On the image data, the proposed RP-FKM produced less distortion than RP. The proposed RP-FKM provides better text retrieval results than conventional RP and performs similarly to RP-SVD. For the text retrieval task, the superiority of SVD over the other DR methods noted here is in good agreement with the analysis reported by Moravec.

Originality/value

The hybrid method proposed in this paper, combining RP and FKM, is new. Experimental results indicate that the proposed method is useful for reducing the dimensionality of high-dimensional data such as images and text.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 4 no. 3
Type: Research Article
ISSN: 1756-378X

Keywords

Article
Publication date: 23 June 2021

Serkan Altuntas, Türkay Dereli and Zülfiye Erdoğan

Abstract

Purpose

This study aims to propose a service quality evaluation model for health-care services.

Design/methodology/approach

In this study, a service quality evaluation model is proposed based on the service quality measurement (SERVQUAL) scale and a machine learning algorithm. First, the items that affect service quality are determined based on the SERVQUAL scale. Subsequently, a service quality assessment model is generated to manage efficiently the resources allocated to improvement activities. Following this phase, a sample classification model is constructed, with machine learning algorithms used to establish it.
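
The pipeline's shape might be sketched as follows: SERVQUAL gap scores (perception minus expectation) per dimension feed a classifier whose feature importances hint at each dimension's impact. The dimension names, synthetic survey responses, outcome label and choice of model are all illustrative assumptions, not the study's instrument.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
dims = ["tangibles", "reliability", "responsiveness", "assurance", "empathy"]
n = 400
expectation = rng.integers(3, 6, size=(n, 5))        # 1-5 Likert responses
perception = rng.integers(1, 6, size=(n, 5))
# SERVQUAL gap score: perception minus expectation, per dimension.
gaps = pd.DataFrame(perception - expectation, columns=dims)
satisfied = (gaps.mean(axis=1) > -0.5).astype(int)   # stand-in outcome label

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(gaps, satisfied)
impact = pd.Series(clf.feature_importances_, index=dims).sort_values(ascending=False)
print(impact)                                        # dimensions to improve first
```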

Findings

The proposed evaluation model addresses the following questions: What are the potential impact levels of the service quality dimensions on service quality in practice? How should the service quality dimensions be prioritized, and which dimensions should be improved first? A real-life case study in a public hospital is carried out to reveal how the proposed model works. The results obtained from the case study show that the proposed model can be applied easily in practice. It is also found that there is a remarkably high service gap in the public hospital in which the case study was conducted regarding the general physical conditions and food services.

Originality/value

The contribution of this study is threefold. The proposed evaluation model determines the impact levels of the service quality dimensions on service quality in practice. It prioritizes the service quality dimensions in terms of their significance. Finally, it identifies which service quality dimensions should be improved first.

Details

Kybernetes, vol. 51 no. 2
Type: Research Article
ISSN: 0368-492X

Keywords

Article
Publication date: 24 November 2020

Changro Lee and Key-Ho Park

Abstract

Purpose

Most prior attempts at real estate valuation have focused on the use of metadata such as size and property age, neglecting the fact that the building workmanship in the construction of a house is also a key factor in the estimation of house prices. Building workmanship, such as exterior walls and floor tiling, corresponds to the visual attributes of a house, and it is difficult to capture and evaluate such attributes efficiently through classical models like regression analysis. A deep learning approach is therefore taken in the valuation process to utilize this visual information.

Design/methodology/approach

The authors propose a two-input neural network comprising a multilayer perceptron and a convolutional neural network that can utilize both metadata and the visual information from images of the front view of the house.
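
A minimal Keras sketch of such a two-input network follows: a small CNN branch for the facade image and an MLP branch for the metadata, merged into a single price output. The input sizes and layer widths are assumptions, not the authors' architecture.

```python
from tensorflow import keras
from tensorflow.keras import layers

# CNN branch for the front-view photograph of the house.
image_in = keras.Input(shape=(128, 128, 3), name="facade_photo")
x = layers.Conv2D(32, 3, activation="relu")(image_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

# MLP branch for the metadata (e.g. size, age, rooms; widths assumed).
meta_in = keras.Input(shape=(8,), name="metadata")
m = layers.Dense(32, activation="relu")(meta_in)

# Merge both branches into a single price regression output.
merged = layers.concatenate([x, m])
hidden = layers.Dense(64, activation="relu")(merged)
price = layers.Dense(1, name="price")(hidden)

model = keras.Model(inputs=[image_in, meta_in], outputs=price)
model.compile(optimizer="adam", loss="mse")
model.summary()
```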

Findings

The authors applied the two-input neural network to Guri City in Gyeonggi Province, South Korea, as a case study and found that the accuracy of house price estimations can be improved by employing image information along with metadata.

Originality/value

Few studies have considered the impact of building workmanship in the valuation process. The authors show that using both photographs and metadata enhances the accuracy of house price estimation.

Details

Data Technologies and Applications, vol. 55 no. 2
Type: Research Article
ISSN: 2514-9288

Keywords

Article
Publication date: 14 May 2020

Byungdae An and Yongmoo Suh

Abstract

Purpose

Financial statement fraud (FSF) committed by companies implies that their current status may not be healthy. As such, it is important to detect FSF, since such companies tend to conceal bad information, which causes great losses to various stakeholders. Thus, the objective of the paper is to propose a novel approach to building a classification model to identify FSF, which shows high classification performance and from which human-readable rules are extracted to explain why a company is likely to commit FSF.

Design/methodology/approach

Having prepared multiple sub-datasets to cope with the class imbalance problem, we build a set of decision trees for each sub-dataset; select a subset of that set as the model for the sub-dataset by removing each tree whose performance is less than the average accuracy of all trees in the set; and then select the model that shows the best accuracy among the models. We call the resulting model MRF (Modified Random Forest). Given a new instance, we extract rules from the MRF model to explain whether the company corresponding to the new instance is likely to commit FSF or not.
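
The construction can be sketched as follows; the balanced sub-datasets, pool size, bootstrap step and validation split are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=4000, weights=[0.9, 0.1], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
rng = np.random.default_rng(0)

def balanced_subset(X, y):
    # Undersample the majority class to match the minority class size.
    pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
    keep = np.concatenate([pos, rng.choice(neg, size=len(pos), replace=False)])
    return X[keep], y[keep]

def vote(trees, X):
    # Majority vote over the surviving trees.
    votes = np.stack([t.predict(X) for t in trees]).astype(int)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

best_model, best_acc = None, -1.0
for _ in range(5):                       # one pruned tree pool per sub-dataset
    Xs, ys = balanced_subset(X_tr, y_tr)
    trees = []
    for s in range(25):                  # bootstrap within the sub-dataset
        bi = rng.integers(0, len(ys), size=len(ys))
        trees.append(DecisionTreeClassifier(random_state=s).fit(Xs[bi], ys[bi]))
    accs = [accuracy_score(y_val, t.predict(X_val)) for t in trees]
    # Prune trees whose accuracy falls below the pool average.
    pruned = [t for t, a in zip(trees, accs) if a >= np.mean(accs)]
    acc = accuracy_score(y_val, vote(pruned, X_val))
    if acc > best_acc:
        best_model, best_acc = pruned, acc
print("best pruned pool accuracy:", best_acc)
```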

Findings

Experimental results show that the MRF classifier outperformed the benchmark models. The results also revealed that all the variables related to profit belong to the set of the most important indicators of FSF, and that two new variables related to gross profit, which were not considered in previous studies on FSF, were identified.

Originality/value

This study proposed a method of building a classification model which shows outstanding performance and provides decision rules that can be used to explain the classification results. In addition, a new way to resolve the class imbalance problem was suggested in this paper.

Details

Data Technologies and Applications, vol. 54 no. 2
Type: Research Article
ISSN: 2514-9288

Keywords
