Search results

1 – 10 of over 3000
Article
Publication date: 23 November 2022

Ibrahim Karatas and Abdulkadir Budak

The study is aimed to compare the prediction success of basic machine learning and ensemble machine learning models and accordingly create novel prediction models by…

Abstract

Purpose

The study is aimed to compare the prediction success of basic machine learning and ensemble machine learning models and accordingly create novel prediction models by combining machine learning models to increase the prediction success in construction labor productivity prediction models.

Design/methodology/approach

Categorical and numerical data used in prediction models in many studies in the literature for the prediction of construction labor productivity were made ready for analysis by preprocessing. The Python programming language was used to develop machine learning models. As a result of many variation trials, the models were combined and the proposed novel voting and stacking meta-ensemble machine learning models were constituted. Finally, the models were compared to Target and Taylor diagram.

Findings

Meta-ensemble models have been developed for labor productivity prediction by combining machine learning models. Voting ensemble by combining et, gbm, xgboost, lightgbm, catboost and mlp models and stacking ensemble by combining et, gbm, xgboost, catboost and mlp models were created and finally the Et model as meta-learner was selected. Considering the prediction success, it has been determined that the voting and stacking meta-ensemble algorithms have higher prediction success than other machine learning algorithms. Model evaluation metrics, namely MAE, MSE, RMSE and R2, were selected to measure the prediction success. For the voting meta-ensemble algorithm, the values of the model evaluation metrics MAE, MSE, RMSE and R2 are 0.0499, 0.0045, 0.0671 and 0.7886, respectively. For the stacking meta-ensemble algorithm, the values of the model evaluation metrics MAE, MSE, RMSE and R2 are 0.0469, 0.0043, 0.0658 and 0.7967, respectively.

Research limitations/implications

The study shows the comparison between machine learning algorithms and created novel meta-ensemble machine learning algorithms to predict the labor productivity of construction formwork activity. The practitioners and project planners can use this model as reliable and accurate tool for predicting the labor productivity of construction formwork activity prior to construction planning.

Originality/value

The study provides insight into the application of ensemble machine learning algorithms in predicting construction labor productivity. Additionally, novel meta-ensemble algorithms have been used and proposed. Therefore, it is hoped that predicting the labor productivity of construction formwork activity with high accuracy will make a great contribution to construction project management.

Details

Engineering, Construction and Architectural Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0969-9988

Keywords

Article
Publication date: 17 October 2022

Hasnae Zerouaoui, Ali Idri and Omar El Alaoui

Hundreds of thousands of deaths each year in the world are caused by breast cancer (BC). An early-stage diagnosis of this disease can positively reduce the morbidity and…

Abstract

Purpose

Hundreds of thousands of deaths each year in the world are caused by breast cancer (BC). An early-stage diagnosis of this disease can positively reduce the morbidity and mortality rate by helping to select the most appropriate treatment options, especially by using histological BC images for the diagnosis.

Design/methodology/approach

The present study proposes and evaluates a novel approach which consists of 24 deep hybrid heterogenous ensembles that combine the strength of seven deep learning techniques (DenseNet 201, Inception V3, VGG16, VGG19, Inception-ResNet-V3, MobileNet V2 and ResNet 50) for feature extraction and four well-known classifiers (multi-layer perceptron, support vector machines, K-nearest neighbors and decision tree) by means of hard and weighted voting combination methods for histological classification of BC medical image. Furthermore, the best deep hybrid heterogenous ensembles were compared to the deep stacked ensembles to determine the best strategy to design the deep ensemble methods. The empirical evaluations used four classification performance criteria (accuracy, sensitivity, precision and F1-score), fivefold cross-validation, Scott–Knott (SK) statistical test and Borda count voting method. All empirical evaluations were assessed using four performance measures, including accuracy, precision, recall and F1-score, and were over the histological BreakHis public dataset with four magnification factors (40×, 100×, 200× and 400×). SK statistical test and Borda count were also used to cluster the designed techniques and rank the techniques belonging to the best SK cluster, respectively.

Findings

Results showed that the deep hybrid heterogenous ensembles outperformed both their singles and the deep stacked ensembles and reached the accuracy values of 96.3, 95.6, 96.3 and 94 per cent across the four magnification factors 40×, 100×, 200× and 400×, respectively.

Originality/value

The proposed deep hybrid heterogenous ensembles can be applied for the BC diagnosis to assist pathologists in reducing the missed diagnoses and proposing adequate treatments for the patients.

Details

Data Technologies and Applications, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2514-9288

Keywords

Article
Publication date: 31 May 2022

Osamah M. Al-Qershi, Junbum Kwon, Shuning Zhao and Zhaokun Li

For the case of many content features, This paper aims to investigate which content features in video and text ads more contribute to accurately predicting the success of…

Abstract

Purpose

For the case of many content features, This paper aims to investigate which content features in video and text ads more contribute to accurately predicting the success of crowdfunding by comparing prediction models.

Design/methodology/approach

With 1,368 features extracted from 15,195 Kickstarter campaigns in the USA, the authors compare base models such as logistic regression (LR) with tree-based homogeneous ensembles such as eXtreme gradient boosting (XGBoost) and heterogeneous ensembles such as XGBoost + LR.

Findings

XGBoost shows higher prediction accuracy than LR (82% vs 69%), in contrast to the findings of a previous relevant study. Regarding important content features, humans (e.g. founders) are more important than visual objects (e.g. products). In both spoken and written language, words related to experience (e.g. eat) or perception (e.g. hear) are more important than cognitive (e.g. causation) words. In addition, a focus on the future is more important than a present or past time orientation. Speech aids (see and compare) to complement visual content are also effective and positive tone matters in speech.

Research limitations/implications

This research makes theoretical contributions by finding more important visuals (human) and language features (experience, perception and future time). Also, in a multimodal context, complementary cues (e.g. speech aids) across different modalities help. Furthermore, the noncontent parts of speech such as positive “tone” or pace of speech are important.

Practical implications

Founders are encouraged to assess and revise the content of their video or text ads as well as their basic campaign features (e.g. goal, duration and reward) before they launch their campaigns. Next, overly complex ensembles may suffer from overfitting problems. In practice, model validation using unseen data is recommended.

Originality/value

Rather than reducing the number of content feature dimensions (Kaminski and Hopp, 2020), by enabling advanced prediction models to accommodate many contents features, prediction accuracy rises substantially.

Article
Publication date: 23 March 2021

Mostafa El Habib Daho, Nesma Settouti, Mohammed El Amine Bechar, Amina Boublenza and Mohammed Amine Chikh

Ensemble methods have been widely used in the field of pattern recognition due to the difficulty of finding a single classifier that performs well on a wide variety of…

Abstract

Purpose

Ensemble methods have been widely used in the field of pattern recognition due to the difficulty of finding a single classifier that performs well on a wide variety of problems. Despite the effectiveness of these techniques, studies have shown that ensemble methods generate a large number of hypotheses and that contain redundant classifiers in most cases. Several works proposed in the state of the art attempt to reduce all hypotheses without affecting performance.

Design/methodology/approach

In this work, the authors are proposing a pruning method that takes into consideration the correlation between classifiers/classes and each classifier with the rest of the set. The authors have used the random forest algorithm as trees-based ensemble classifiers and the pruning was made by a technique inspired by the CFS (correlation feature selection) algorithm.

Findings

The proposed method CES (correlation-based Ensemble Selection) was evaluated on ten datasets from the UCI machine learning repository, and the performances were compared to six ensemble pruning techniques. The results showed that our proposed pruning method selects a small ensemble in a smaller amount of time while improving classification rates compared to the state-of-the-art methods.

Originality/value

CES is a new ordering-based method that uses the CFS algorithm. CES selects, in a short time, a small sub-ensemble that outperforms results obtained from the whole forest and the other state-of-the-art techniques used in this study.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 14 no. 2
Type: Research Article
ISSN: 1756-378X

Keywords

Article
Publication date: 28 May 2021

Zhibin Xiong and Jun Huang

Ensemble models that combine multiple base classifiers have been widely used to improve prediction performance in credit risk evaluation. However, an arbitrary selection…

Abstract

Purpose

Ensemble models that combine multiple base classifiers have been widely used to improve prediction performance in credit risk evaluation. However, an arbitrary selection of base classifiers is problematic. The purpose of this paper is to develop a framework for selecting base classifiers to improve the overall classification performance of an ensemble model.

Design/methodology/approach

In this study, selecting base classifiers is treated as a feature selection problem, where the output from a base classifier can be considered a feature. The proposed correlation-based classifier selection using the maximum information coefficient (MIC-CCS), a correlation-based classifier selection under the maximum information coefficient method, selects the features (classifiers) using nonlinear optimization programming, which seeks to optimize the relationship between the accuracy and diversity of base classifiers, based on MIC.

Findings

The empirical results show that ensemble models perform better than stand-alone ones, whereas the ensemble model based on MIC-CCS outperforms the ensemble models with unselected base classifiers and other ensemble models based on traditional forward and backward selection methods. Additionally, the classification performance of the ensemble model in which correlation is measured with MIC is better than that measured with the Pearson correlation coefficient.

Research limitations/implications

The study provides an alternate solution to effectively select base classifiers that are significantly different, so that they can provide complementary information and, as these selected classifiers have good predictive capabilities, the classification performance of the ensemble model is improved.

Originality/value

This paper introduces MIC to the correlation-based selection process to better capture nonlinear and nonfunctional relationships in a complex credit data structure and construct a novel nonlinear programming model for base classifiers selection that has not been used in other studies.

Article
Publication date: 7 April 2015

Jie Sun, Hui Li, Pei-Chann Chang and Qing-Hua Huang

Previous researches on credit scoring mainly focussed on static modeling on panel sample data set in a certain period of time, and did not pay enough attention on dynamic…

Abstract

Purpose

Previous researches on credit scoring mainly focussed on static modeling on panel sample data set in a certain period of time, and did not pay enough attention on dynamic incremental modeling. The purpose of this paper is to address the integration of branch and bound algorithm with incremental support vector machine (SVM) ensemble to make dynamic modeling of credit scoring.

Design/methodology/approach

This new model hybridizes support vectors of old data with incremental financial data of corporate in the process of dynamic ensemble modeling based on bagged SVM. In the incremental stage, multiple base SVM models are dynamically adjusted according to bagged new updated information for credit scoring. These updated base models are further combined to generate a dynamic credit scoring. In the empirical experiment, the new method was compared with the traditional model of non-incremental SVM ensemble for credit scoring.

Findings

The results show that the new model is able to continuously and dynamically adjust credit scoring according to corporate incremental information, which helps produce better evaluation ability than the traditional model.

Originality/value

This research pioneered on dynamic modeling for credit scoring with incremental SVM ensemble. As time pasts, new incremental samples will be combined with support vectors of old samples to construct SVM ensemble credit scoring model. The incremental model will continuously adjust itself to keep good evaluation performance.

Details

Kybernetes, vol. 44 no. 4
Type: Research Article
ISSN: 0368-492X

Keywords

Article
Publication date: 29 July 2014

Chih-Fong Tsai and Chihli Hung

Credit scoring is important for financial institutions in order to accurately predict the likelihood of business failure. Related studies have shown that machine learning…

1042

Abstract

Purpose

Credit scoring is important for financial institutions in order to accurately predict the likelihood of business failure. Related studies have shown that machine learning techniques, such as neural networks, outperform many statistical approaches to solving this type of problem, and advanced machine learning techniques, such as classifier ensembles and hybrid classifiers, provide better prediction performance than single machine learning based classification techniques. However, it is not known which type of advanced classification technique performs better in terms of financial distress prediction. The paper aims to discuss these issues.

Design/methodology/approach

This paper compares neural network ensembles and hybrid neural networks over three benchmarking credit scoring related data sets, which are Australian, German, and Japanese data sets.

Findings

The experimental results show that hybrid neural networks and neural network ensembles outperform the single neural network. Although hybrid neural networks perform slightly better than neural network ensembles in terms of predication accuracy and errors with two of the data sets, there is no significant difference between the two types of prediction models.

Originality/value

The originality of this paper is in comparing two types of advanced classification techniques, i.e. hybrid and ensemble learning techniques, in terms of financial distress prediction.

Details

Kybernetes, vol. 43 no. 7
Type: Research Article
ISSN: 0368-492X

Keywords

Article
Publication date: 5 August 2014

Theodoros Anagnostopoulos and Christos Skourlas

The purpose of this paper is to understand the emotional state of a human being by capturing the speech utterances that are used during common conversation. Human beings…

Abstract

Purpose

The purpose of this paper is to understand the emotional state of a human being by capturing the speech utterances that are used during common conversation. Human beings except of thinking creatures are also sentimental and emotional organisms. There are six universal basic emotions plus a neutral emotion: happiness, surprise, fear, sadness, anger, disgust and neutral.

Design/methodology/approach

It is proved that, given enough acoustic evidence, the emotional state of a person can be classified by an ensemble majority voting classifier. The proposed ensemble classifier is constructed over three base classifiers: k nearest neighbors, C4.5 and support vector machine (SVM) polynomial kernel.

Findings

The proposed ensemble classifier achieves better performance than each base classifier. It is compared with two other ensemble classifiers: one-against-all (OAA) multiclass SVM with radial basis function kernels and OAA multiclass SVM with hybrid kernels. The proposed ensemble classifier achieves better performance than the other two ensemble classifiers.

Originality/value

The current paper performs emotion classification with an ensemble majority voting classifier that combines three certain types of base classifiers which are of low computational complexity. The base classifiers stem from different theoretical background to avoid bias and redundancy. It gives to the proposed ensemble classifier the ability to be generalized in the emotion domain space.

Details

Journal of Systems and Information Technology, vol. 16 no. 3
Type: Research Article
ISSN: 1328-7265

Keywords

Article
Publication date: 7 July 2020

Jiaming Liu, Liuan Wang, Linan Zhang, Zeming Zhang and Sicheng Zhang

The primary objective of this study was to recognize critical indicators in predicting blood glucose (BG) through data-driven methods and to compare the prediction…

Abstract

Purpose

The primary objective of this study was to recognize critical indicators in predicting blood glucose (BG) through data-driven methods and to compare the prediction performance of four tree-based ensemble models, i.e. bagging with tree regressors (bagging-decision tree [Bagging-DT]), AdaBoost with tree regressors (Adaboost-DT), random forest (RF) and gradient boosting decision tree (GBDT).

Design/methodology/approach

This study proposed a majority voting feature selection method by combining lasso regression with the Akaike information criterion (AIC) (LR-AIC), lasso regression with the Bayesian information criterion (BIC) (LR-BIC) and RF to select indicators with excellent predictive performance from initial 38 indicators in 5,642 samples. The selected features were deployed to build the tree-based ensemble models. The 10-fold cross-validation (CV) method was used to evaluate the performance of each ensemble model.

Findings

The results of feature selection indicated that age, corpuscular hemoglobin concentration (CHC), red blood cell volume distribution width (RBCVDW), red blood cell volume and leucocyte count are five most important clinical/physical indicators in BG prediction. Furthermore, this study also found that the GBDT ensemble model combined with the proposed majority voting feature selection method is better than other three models with respect to prediction performance and stability.

Practical implications

This study proposed a novel BG prediction framework for better predictive analytics in health care.

Social implications

This study incorporated medical background and machine learning technology to reduce diabetes morbidity and formulate precise medical schemes.

Originality/value

The majority voting feature selection method combined with the GBDT ensemble model provides an effective decision-making tool for predicting BG and detecting diabetes risk in advance.

Article
Publication date: 7 March 2016

Stephan Körner and Frank Holzäpfel

Wake vortices that are generated by an aircraft as a consequence of lift constitute a potential danger to the following aircraft. To predict and avoid dangerous…

Abstract

Purpose

Wake vortices that are generated by an aircraft as a consequence of lift constitute a potential danger to the following aircraft. To predict and avoid dangerous situations, wake vortex transport and decay models have been developed. Being based on different model physics, they can complement each other with their individual strengths. This paper investigates the skill of a Multi-Model Ensemble (MME) approach to improve prediction performance. Therefore, this paper aims to use wake vortex models developed by NASA (APA3.2, APA3.4, TDP2.1) and by DLR (P2P). Furthermore, this paper analyzes the possibility to use the ensemble spread to compute uncertainty envelopes.

Design/methodology/approach

An MME approach called Reliability Ensemble Averaging (REA) is adapted and used to the wake vortex predictions. To train the ensemble, a set of wake vortex measurements accomplished at the airports of Frankfurt (WakeFRA), Munich (WakeMUC) and at a special airport Oberpfaffenhofen was applied.

Findings

The REA approach can outperform the best member of the ensemble, on average, regarding the root-mean-square error. Moreover, the ensemble delivers reasonable uncertainty envelopes.

Practical implications

Reliable wake vortex predictions may be applicable for both tactical optimization of aircraft separation at airports and airborne wake vortex prediction and avoidance.

Originality/value

Ensemble approaches are widely used in weather forecasting, but they have never been applied to wake vortex predictions. Until today, the uncertainty envelopes for wake vortex forecasts have been computed among others from perturbed initial conditions or perturbed physics as well as from uncertainties from environmental conditions or from safety margins but not from the spread of structurally independent model forecasts.

Details

Aircraft Engineering and Aerospace Technology: An International Journal, vol. 88 no. 2
Type: Research Article
ISSN: 1748-8842

Keywords

1 – 10 of over 3000