Search results

1 – 10 of 147
Article
Publication date: 23 March 2021

Mostafa El Habib Daho, Nesma Settouti, Mohammed El Amine Bechar, Amina Boublenza and Mohammed Amine Chikh

Ensemble methods have been widely used in the field of pattern recognition due to the difficulty of finding a single classifier that performs well on a wide variety of problems…

Abstract

Purpose

Ensemble methods have been widely used in the field of pattern recognition due to the difficulty of finding a single classifier that performs well on a wide variety of problems. Despite the effectiveness of these techniques, studies have shown that ensemble methods generate a large number of hypotheses that, in most cases, contain redundant classifiers. Several works in the state of the art attempt to reduce the number of hypotheses without affecting performance.

Design/methodology/approach

In this work, the authors propose a pruning method that takes into consideration the correlation between classifiers and classes as well as the correlation of each classifier with the rest of the set. The authors used the random forest algorithm as a tree-based ensemble classifier, and the pruning was performed by a technique inspired by the CFS (correlation-based feature selection) algorithm.
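
A minimal sketch of how such a correlation-based selection over a trained random forest might look is given below. The specific merit formula, validation split and greedy search are illustrative assumptions in the spirit of CFS, not the authors' exact CES algorithm.

```python
# Illustrative sketch (not the authors' exact CES): greedy, CFS-style selection
# of trees from a random forest based on prediction correlations.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
preds = np.array([t.predict(X_val) for t in forest.estimators_])  # (n_trees, n_val)

def corr(a, b):
    c = np.corrcoef(a, b)[0, 1]
    return 0.0 if np.isnan(c) else abs(c)

rel = np.array([corr(p, y_val) for p in preds])  # classifier-class correlation

def merit(subset):
    # CFS-style merit: reward relevance, penalize redundancy among members.
    k = len(subset)
    avg_rel = rel[subset].mean()
    pairs = [(i, j) for i in subset for j in subset if i < j]
    avg_red = np.mean([corr(preds[i], preds[j]) for i, j in pairs]) if pairs else 0.0
    return k * avg_rel / np.sqrt(k + k * (k - 1) * avg_red)

selected = [int(rel.argmax())]
remaining = set(range(len(preds))) - set(selected)
while remaining:
    best = max(remaining, key=lambda i: merit(selected + [i]))
    if merit(selected + [best]) <= merit(selected):
        break
    selected.append(best)
    remaining.remove(best)

print("pruned ensemble size:", len(selected))
```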

Findings

The proposed method, CES (correlation-based ensemble selection), was evaluated on ten datasets from the UCI machine learning repository, and its performance was compared to six ensemble pruning techniques. The results showed that the proposed pruning method selects a small ensemble in less time while improving classification rates compared to the state-of-the-art methods.

Originality/value

CES is a new ordering-based method that uses the CFS algorithm. CES selects, in a short time, a small sub-ensemble that outperforms results obtained from the whole forest and the other state-of-the-art techniques used in this study.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 14 no. 2
Type: Research Article
ISSN: 1756-378X

Keywords

Article
Publication date: 10 January 2020

Waqar Ahmed Khan, S.H. Chung, Muhammad Usman Awan and Xin Wen

The purpose of this paper is three-fold: to review the categories, covering mainly optimization algorithms (techniques), needed to improve the generalization performance…

Abstract

Purpose

The purpose of this paper is three-fold: to review the categories, covering mainly optimization algorithms (techniques), needed to improve the generalization performance and learning speed of the Feedforward Neural Network (FNN); to discover the change in research trends by analyzing all six categories (i.e. gradient learning algorithms for network training, gradient-free learning algorithms, optimization algorithms for learning rate, bias and variance (underfitting and overfitting) minimization algorithms, constructive topology neural networks, and metaheuristic search algorithms) collectively; and to recommend new research directions for researchers and help users understand the algorithms' real-world applications in solving complex management, engineering and health sciences problems.

Design/methodology/approach

The FNN has gained much attention from researchers over the last few decades for its ability to support more informed decisions. The literature survey is focused on the learning algorithms and the optimization techniques proposed in the last three decades. This paper (Part II) is an extension of Part I. For the sake of simplicity, the paper entitled “Machine learning facilitated business intelligence (Part I): Neural networks learning algorithms and applications” is referred to as Part I. To keep the study consistent with Part I, the approach and survey methodology in this paper are kept similar to those in Part I.

Findings

Combining the work performed in Part I, the authors studied a total of 80 articles identified through popular keyword searches. The FNN learning algorithms and optimization techniques identified in the selected literature are classified into six categories based on their problem identification, mathematical model, technical reasoning and proposed solution. Previously, in Part I, the two categories focusing on learning algorithms (i.e. gradient learning algorithms for network training and gradient-free learning algorithms) were reviewed with their real-world applications in management, engineering and health sciences. Therefore, in the current paper, Part II, the remaining four categories, exploring optimization techniques (i.e. optimization algorithms for learning rate, bias and variance (underfitting and overfitting) minimization algorithms, constructive topology neural networks, and metaheuristic search algorithms), are studied in detail. The algorithm explanations are enriched by discussing their technical merits, limitations and applications in their respective categories. Finally, the authors recommend new research directions that can contribute to strengthening the literature.

Research limitations/implications

The FNN's contributions are increasing rapidly because of its ability to support reliably informed decisions. As with the learning algorithms reviewed in Part I, the focus here is to enrich the comprehensive study by reviewing the remaining categories, which concentrate on optimization techniques. However, future efforts may be needed to incorporate other algorithms into the six identified categories, or to suggest new categories, so that the shift in research trends can be monitored continuously.

Practical implications

The authors studied the shift in research trends over three decades by collectively analyzing the learning algorithms and optimization techniques together with their applications. This may help researchers to identify future research gaps for improving generalization performance and learning speed, and help users to understand the application areas of the FNN. For instance, research contributions in FNN over the last three decades have shifted from complex gradient-based algorithms to gradient-free algorithms, from trial-and-error, fixed-topology approaches to hidden units toward cascade topologies, from initial guesses of hyperparameters toward their analytical calculation, and toward algorithms that converge at a global rather than a local minimum.

Originality/value

Existing literature surveys include comparative studies of the algorithms, identify the algorithms' application areas and focus on specific techniques, so they may not be able to identify algorithm categories, shifts in research trends over time, frequently analyzed application areas, common research gaps and collective future directions. Parts I and II attempt to overcome these limitations by classifying articles into six categories covering a wide range of algorithms proposed to improve the FNN's generalization performance and convergence rate. The classification of algorithms into six categories helps to analyze the shift in research trends, which makes the classification scheme significant and innovative.

Details

Industrial Management & Data Systems, vol. 120 no. 1
Type: Research Article
ISSN: 0263-5577

Keywords

Article
Publication date: 6 February 2017

Aytug Onan

The immense quantity of available unstructured text documents serves as one of the largest sources of information. Text classification can be an essential task for many purposes in…

Abstract

Purpose

The immense quantity of available unstructured text documents serves as one of the largest sources of information. Text classification can be an essential task for many purposes in information retrieval, such as document organization, text filtering and sentiment analysis. Ensemble learning has been extensively studied to construct efficient text classification schemes with higher predictive performance and generalization ability. The purpose of this paper is to provide diversity among the classification algorithms of the ensemble, which is a key issue in ensemble design.

Design/methodology/approach

An ensemble scheme based on hybrid supervised clustering is presented for text classification. In the presented scheme, supervised hybrid clustering, which is based on the cuckoo search algorithm and k-means, is introduced to partition the data samples of each class into clusters so that training subsets with higher diversity can be provided. Each classifier is trained on the diversified training subsets, and the predictions of the individual classifiers are combined by the majority voting rule. The predictive performance of the proposed classifier ensemble is compared to conventional classification algorithms (such as Naïve Bayes, logistic regression, support vector machines and the C4.5 algorithm) and ensemble learning methods (such as AdaBoost, bagging and random subspace) using 11 text benchmarks.
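
A rough sketch of the per-class clustering and majority-voting idea follows. Plain k-means stands in for the paper's cuckoo-search-enhanced clustering, the base learner and subset sizes are assumptions, a dense feature matrix with integer-encoded labels is assumed, and none of the names below come from the paper.

```python
# Illustrative sketch: cluster the samples of each class, draw diversified
# training subsets from the (class, cluster) cells, and combine member
# predictions by majority voting.
import numpy as np
from sklearn.base import clone
from sklearn.cluster import KMeans
from sklearn.naive_bayes import MultinomialNB

def diversified_subsets(X, y, n_clusters=3, n_subsets=5, seed=0):
    rng = np.random.default_rng(seed)
    cluster_ids = np.empty(len(y), dtype=int)
    for c in np.unique(y):                      # cluster each class separately
        idx = np.where(y == c)[0]
        km = KMeans(n_clusters=min(n_clusters, len(idx)), n_init=10, random_state=seed)
        cluster_ids[idx] = km.fit_predict(X[idx])
    subsets = []
    for _ in range(n_subsets):                  # every subset samples all cells
        chosen = []
        for c in np.unique(y):
            for k in np.unique(cluster_ids[y == c]):
                cell = np.where((y == c) & (cluster_ids == k))[0]
                chosen.extend(rng.choice(cell, size=max(1, len(cell) // 2), replace=False))
        subsets.append(np.array(chosen))
    return subsets

def train_and_vote(X, y, X_test, base=MultinomialNB()):
    # Assumes integer-encoded class labels and non-negative (count) features.
    members = [clone(base).fit(X[s], y[s]) for s in diversified_subsets(X, y)]
    votes = np.array([m.predict(X_test) for m in members])
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
```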

Findings

The experimental results indicate that the presented classifier ensemble outperforms the conventional classification algorithms and ensemble learning methods for text classification.

Originality/value

The presented ensemble scheme is the first to use supervised clustering to obtain a diverse ensemble for text classification.

Details

Kybernetes, vol. 46 no. 2
Type: Research Article
ISSN: 0368-492X

Keywords

Article
Publication date: 13 August 2019

Hongshan Xiao and Yu Wang

Feature space heterogeneity exists widely in various application fields of classification techniques, such as customs inspection decision, credit scoring and medical diagnosis…

Abstract

Purpose

Feature space heterogeneity exists widely in various application fields of classification techniques, such as customs inspection decision, credit scoring and medical diagnosis. This paper aims to study the relationship between feature space heterogeneity and classification performance.

Design/methodology/approach

A measurement is first developed for measuring and identifying any significant heterogeneity that exists in the feature space of a data set. The main idea of this measurement is derived from a meta-analysis. For the data set with significant feature space heterogeneity, a classification algorithm based on factor analysis and clustering is proposed to learn the data patterns, which, in turn, are used for data classification.
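
The sketch below illustrates one way the "factor analysis plus clustering" classification step could be structured: transform the features with orthogonal factor analysis, partition samples by their factor scores, and train one classifier per partition. The heterogeneity measurement itself is not reproduced, and the class name, base learner and parameters are assumptions rather than the paper's method.

```python
# Illustrative sketch: factor-score partitioning with one local classifier per
# partition; assumes each partition contains samples from more than one class.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LogisticRegression

class FactorClusterClassifier:
    def __init__(self, n_factors=5, n_clusters=3, seed=0):
        self.fa = FactorAnalysis(n_components=n_factors, random_state=seed)
        self.km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
        self.models = {}

    def fit(self, X, y):
        scores = self.fa.fit_transform(X)        # orthogonal factor scores
        groups = self.km.fit_predict(scores)     # capture feature space heterogeneity
        for g in np.unique(groups):
            self.models[g] = LogisticRegression(max_iter=1000).fit(
                X[groups == g], y[groups == g])
        return self

    def predict(self, X):
        groups = self.km.predict(self.fa.transform(X))  # route to nearest partition
        return np.array([self.models[g].predict(x.reshape(1, -1))[0]
                         for g, x in zip(groups, X)])
```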

Findings

The proposed approach has two main advantages over previous methods. The first advantage lies in feature transformation using orthogonal factor analysis, which results in new features without redundancy or irrelevance. The second advantage rests on sample partitioning to capture the feature space heterogeneity reflected by differences in factor scores. The validity and effectiveness of the proposed approach are verified on a number of benchmarking data sets.

Research limitations/implications

The measurement should be used to guide the heterogeneity elimination process, which is an interesting topic for future research. In addition, developing a classification algorithm that enables scalable and incremental learning for large data sets with significant feature space heterogeneity is also an important issue.

Practical implications

Measuring and eliminating the feature space heterogeneity possibly existing in the data are important for accurate classification. This study provides a systematic approach to feature space heterogeneity measurement and elimination for better classification performance, which is favorable for applications of classification techniques in real-world problems.

Originality/value

A measurement based on meta-analysis for measuring and identifying any significant feature space heterogeneity in a classification problem is developed, and an ensemble classification framework is proposed to deal with the feature space heterogeneity and improve the classification accuracy.

Details

Kybernetes, vol. 48 no. 9
Type: Research Article
ISSN: 0368-492X

Keywords

Article
Publication date: 27 May 2021

Sara Tavassoli and Hamidreza Koosha

Customer churn prediction is one of the most well-known approaches to manage and improve customer retention. Machine learning techniques, especially classification algorithms, are…

Abstract

Purpose

Customer churn prediction is one of the most well-known approaches to manage and improve customer retention. Machine learning techniques, especially classification algorithms, are very popular tools to predict the churners. In this paper, three ensemble classifiers are proposed based on bagging and boosting for customer churn prediction.

Design/methodology/approach

In this paper, three ensemble classifiers are proposed based on bagging and boosting for customer churn prediction. The first classifier, called boosted bagging, uses boosting for each bagging sample: before concluding the final results of the bagging algorithm, the authors try to improve the prediction by applying a boosting algorithm to each bootstrap sample. The second proposed ensemble classifier, called bagged bagging, combines bagging with itself; in other words, the authors apply bagging to each sample of the bagging algorithm. Finally, the third approach uses bagging of neural networks whose learning is based on a genetic algorithm.
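
A compact sketch of the first two ideas can be assembled from scikit-learn's generic meta-estimators, as shown below. The hyperparameters are placeholder assumptions rather than the paper's settings, and the genetic-algorithm-trained neural network variant is omitted.

```python
# Illustrative sketch of "boosted bagging" and "bagged bagging"
# (parameter name `estimator` assumes scikit-learn >= 1.2).
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Boosted bagging: boosting is applied inside each bootstrap sample.
boosted_bagging = BaggingClassifier(
    estimator=AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                                 n_estimators=50),
    n_estimators=10, random_state=0)

# Bagged bagging: bagging is applied again within each bootstrap sample.
bagged_bagging = BaggingClassifier(
    estimator=BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=10),
    n_estimators=10, random_state=0)

# Both follow the usual fit/predict API, e.g.:
# boosted_bagging.fit(X_train, y_train); churn_pred = boosted_bagging.predict(X_test)
```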

Findings

To examine the performance of all proposed ensemble classifiers, they are applied to two datasets. Numerical simulations illustrate that the proposed hybrid approaches outperform the simple bagging and boosting algorithms as well as base classifiers. Especially, bagged bagging provides high accuracy and precision results.

Originality/value

In this paper, three novel ensemble classifiers are proposed based on bagging and boosting for customer churn prediction. The proposed approaches can be applied not only to customer churn prediction but also to any other binary classification problem.

Details

Kybernetes, vol. 51 no. 3
Type: Research Article
ISSN: 0368-492X

Keywords

Article
Publication date: 28 November 2023

Tingting Tian, Hongjian Shi, Ruhui Ma and Yuan Liu

For privacy protection, federated learning based on data separation allows machine learning models to be trained on remote devices or in isolated data devices. However, due to the…

Abstract

Purpose

For privacy protection, federated learning based on data separation allows machine learning models to be trained on remote devices or in isolated data devices. However, due to the limited resources such as bandwidth and power of local devices, communication in federated learning can be much slower than in local computing. This study aims to improve communication efficiency by reducing the number of communication rounds and the size of information transmitted in each round.

Design/methodology/approach

This paper allows each user node to perform multiple rounds of local training and then upload the local model parameters to a central server. The central server updates the global model parameters by a weighted average of the parameter information. Building on this aggregation, user nodes first cluster the parameter information to be uploaded and then replace each value with the mean value of its cluster. Considering the asymmetry of the federated learning framework, the optimal number of clusters required to compress the model information is selected adaptively.
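
A minimal sketch of the uplink-compression step is given below: cluster the values of a local update and replace each value with its cluster mean, so only the centroids plus per-value cluster indices need to be transmitted. The adaptive choice of the number of clusters is simplified to a fixed k here, and all names and sizes are illustrative assumptions.

```python
# Illustrative sketch: k-means quantization of a local model update before upload.
import numpy as np
from sklearn.cluster import KMeans

def compress_update(params: np.ndarray, n_clusters: int = 16, seed: int = 0):
    flat = params.reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(flat)
    # Transmit: a few float centroids plus one small integer index per parameter.
    return km.cluster_centers_.ravel(), km.labels_.astype(np.uint8), params.shape

def decompress_update(centroids, labels, shape):
    # Server side: rebuild the update by replacing each index with its centroid.
    return centroids[labels].reshape(shape)

# Example: quantize a fake weight tensor to 16 shared values before upload.
update = np.random.randn(256, 64).astype(np.float32)
centroids, labels, shape = compress_update(update)
restored = decompress_update(centroids, labels, shape)
print("max reconstruction error:", np.abs(update - restored).max())
```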

Findings

While maintaining a loss convergence rate similar to that of federated averaging, the test accuracy does not decrease significantly.

Originality/value

By compressing uplink traffic, the work can improve communication efficiency on dynamic networks with limited resources.

Details

International Journal of Web Information Systems, vol. 20 no. 1
Type: Research Article
ISSN: 1744-0084

Keywords

Open Access
Article
Publication date: 11 July 2022

Afreen Khan, Swaleha Zubair and Samreen Khan

This study aimed to assess the potential of the Clinical Dementia Rating (CDR) Scale in the prognosis of dementia in elderly subjects.

Abstract

Purpose

This study aimed to assess the potential of the Clinical Dementia Rating (CDR) Scale in the prognosis of dementia in elderly subjects.

Design/methodology/approach

Dementia severity staging is clinically an essential task, so the authors used machine learning (ML) on magnetic resonance imaging (MRI) features to locate and study the impact of various MR readings on the classification of demented and nondemented patients. The authors used cross-sectional MRI data in this study. The designed ML approach established the role of the CDR in the prognosis of afflicted and normal patients. Moreover, the pattern analysis indicated that the CDR is a strong attribute amongst the various features, with a significant value of p < 0.01. The authors employed 20 ML classifiers.

Findings

The mean prediction accuracy varied with the ML classifier used, with the bagging classifier (random forest as the base estimator) achieving the highest (93.67%). A series of ML analyses demonstrated that the model including the CDR score had better prediction accuracy and other related performance metrics.
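
A hedged sketch of the best-performing configuration reported above, a bagging classifier with a random forest as base estimator, is shown below. The dataset, split and hyperparameters are placeholders and do not reproduce the study's MRI/CDR features or its 93.67% result.

```python
# Illustrative sketch only: bagging with a random-forest base estimator
# (parameter name `estimator` assumes scikit-learn >= 1.2).
from sklearn.datasets import load_breast_cancer   # stand-in for the MRI/CDR features
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = BaggingClassifier(
    estimator=RandomForestClassifier(n_estimators=100, random_state=0),
    n_estimators=10, random_state=0)
print("mean CV accuracy:", cross_val_score(model, X, y, cv=5).mean())
```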

Originality/value

The results suggest that the CDR score, a simple clinical measure, can be used in real community settings. It can be used to predict dementia progression with ML modeling.

Details

Arab Gulf Journal of Scientific Research, vol. 40 no. 1
Type: Research Article
ISSN: 1985-9899

Keywords

Article
Publication date: 6 January 2022

Hanan Alghamdi and Ali Selamat

With the proliferation of terrorist/extremist websites on the World Wide Web, it has become progressively more crucial to detect and analyze the content on these websites…

Abstract

Purpose

With the proliferation of terrorist/extremist websites on the World Wide Web, it has become progressively more crucial to detect and analyze the content on these websites. Accordingly, the volume of previous research focused on identifying the techniques and activities of terrorist/extremist groups, as revealed by their sites on the so-called dark web, has also grown.

Design/methodology/approach

This study presents a review of the techniques used to detect and process the content of terrorist/extremist sites on the dark web. Forty of the most relevant data sources were examined, and various techniques were identified among them.

Findings

Based on this review, it was found that methods of feature selection and feature extraction can be used together with topic modeling, content analysis and text clustering.

Originality/value

At the end of the review, the authors present the current state of the art and certain open issues associated with Arabic dark web content analysis.

Details

Data Technologies and Applications, vol. 56 no. 4
Type: Research Article
ISSN: 2514-9288

Keywords

Book part
Publication date: 30 September 2020

Hera Khan, Ayush Srivastav and Amit Kumar Mishra

A detailed description will be provided of all the classification algorithms that have been widely used in the domain of medical science. The foundation will be laid by giving a…

Abstract

A detailed description will be provided of all the classification algorithms that have been widely used in the domain of medical science. The foundation will be laid by giving a comprehensive overview of the background and history of the classification algorithms. This will be followed by an extensive discussion of various classification techniques in machine learning (ML), concluding with their relevant applications to data analysis in medical science and health care. To begin with, the initial sections of this chapter will deal with the basic fundamentals required for a profound understanding of classification techniques in ML, comprising the underlying differences between unsupervised and supervised learning, followed by the basic terminology of classification and its history. Further, it will include the types of classification algorithms, ranging from linear classifiers such as logistic regression and Naïve Bayes to nearest neighbour, support vector machines, tree-based classifiers and neural networks, together with their respective mathematics. Ensemble algorithms such as majority voting, boosting, bagging and stacking will also be discussed at great length along with their relevant applications. Furthermore, this chapter will also incorporate a comprehensive elucidation of the areas of application of such classification algorithms in the field of biomedicine and health care and their contribution to decision-making systems and predictive analysis. To conclude, this chapter will contribute strongly to research and development, as it will provide thorough insight into the classification algorithms and their relevant applications in the healthcare development sector.

Details

Big Data Analytics and Intelligence: A Perspective for Health Care
Type: Book
ISBN: 978-1-83909-099-8

Keywords

Article
Publication date: 5 February 2018

Olugbenga Wilson Adejo and Thomas Connolly

The purpose of this paper is to empirically investigate and compare the use of multiple data sources, different classifiers and ensembles of classifiers in predicting…


Abstract

Purpose

The purpose of this paper is to empirically investigate and compare the use of multiple data sources, different classifiers and ensembles of classifiers in predicting student academic performance. The study compares the performance and efficiency of ensemble techniques that make use of different combinations of data sources with those of base classifiers using a single data source.

Design/methodology/approach

Using a quantitative research methodology, data samples of 141 learners enrolled at the University of the West of Scotland were extracted from the institution's databases and also collected through a survey questionnaire. The research focused on three data sources: the student record system, the learning management system and the survey, and used three state-of-the-art data mining classifiers, namely decision tree, artificial neural network and support vector machine, for the modeling. In addition, ensembles of these base classifiers were used in the student performance prediction, and the performances of the seven different models developed were compared using six different evaluation metrics.
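
A brief sketch of a heterogeneous stacking ensemble over the three base classifiers named above (decision tree, artificial neural network and support vector machine) is shown below. The hyperparameters, scaling pipeline and meta-learner choice are assumptions, not taken from the paper.

```python
# Illustrative sketch: stacking ensemble of the three base classifiers.
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

stack = StackingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
        ("ann", make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=0))),
        ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5)

# Usage on combined features from the three data sources (record system, LMS, survey):
# stack.fit(X_train, y_train); y_pred = stack.predict(X_test)
```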

Findings

The results show that the approach of using multiple data sources along with heterogeneous ensemble techniques is very efficient and accurate in predicting student performance, as well as helping to properly identify students at risk of attrition.

Practical implications

The approach proposed in this study will help educational administrators and policy makers working within the educational sector to develop new higher education policies and curricula that are relevant to student retention. In addition, the general implication of this research for practice is its ability to accurately support early identification of students at risk of dropping out of HE from the combination of data sources, so that necessary support and intervention can be provided.

Originality/value

The research empirically investigated and compared the prediction accuracy and efficiency of single classifiers and ensembles of classifiers that make use of single and multiple data sources. The study has developed a novel hybrid model for predicting student performance that is accurate and efficient. Generally, this research advances the understanding of applying ensemble techniques to predicting student performance using learner data and addresses these fundamental questions: What combination of variables will accurately predict student academic performance? What is the potential of stacking ensemble techniques in accurately predicting student academic performance?

Details

Journal of Applied Research in Higher Education, vol. 10 no. 1
Type: Research Article
ISSN: 2050-7003

Keywords
