Search results

1 – 10 of 339

Article

Publication date: 28 June 2021

Application of stacking ensemble machine learning algorithm in predicting the cost of highway construction projects

Meseret Getnet Meharie, Wubshet Jekale Mengesha, Zachary Abiero Gariy and Raphael N.N. Mutuku

Downloads: 964

Abstract

Purpose

The purpose of this study is to apply a stacking ensemble machine learning algorithm to predict the cost of highway construction projects.

Design/methodology/approach

The proposed stacking ensemble model automatically and optimally combines three distinct base predictive models (linear regression, support vector machine and artificial neural network), using a gradient boosting algorithm as the meta-regressor.
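
The core mechanism of stacking is that the meta-regressor is trained on out-of-fold predictions, so it never sees a base model's prediction for a row that model was trained on. The following is a minimal pure-Python sketch of that mechanism only. It is not the authors' model: the linear regression, SVM and ANN base learners and the gradient-boosting meta-regressor are replaced by two hypothetical toy base learners (a one-feature least-squares line and a k-nearest-neighbour mean) and a least-squares blending meta-learner.

```python
# Hypothetical, simplified stacking sketch: two toy base learners on one
# numeric feature, blended by a least-squares meta-learner fitted on
# out-of-fold predictions (NOT the paper's LR/SVM/ANN + gradient boosting).


def fit_line(xs, ys):
    """Ordinary least-squares line y = a + b*x (base learner 1)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def fit_knn(xs, ys, k=3):
    """Mean target of the k nearest training points (base learner 2)."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


BASE_LEARNERS = (fit_line, fit_knn)


def out_of_fold_predictions(xs, ys, n_folds=5):
    """Row i holds base-model predictions made by models that never saw row i."""
    n = len(xs)
    meta = [[0.0] * len(BASE_LEARNERS) for _ in range(n)]
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        test = [i for i in range(n) if i % n_folds == fold]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        for j, learner in enumerate(BASE_LEARNERS):
            model = learner(tx, ty)
            for i in test:
                meta[i][j] = model(xs[i])
    return meta


def fit_meta(meta, ys):
    """Least-squares weights (w1, w2), no intercept, via the 2x2 normal equations."""
    s11 = sum(p[0] * p[0] for p in meta)
    s12 = sum(p[0] * p[1] for p in meta)
    s22 = sum(p[1] * p[1] for p in meta)
    t1 = sum(p[0] * y for p, y in zip(meta, ys))
    t2 = sum(p[1] * y for p, y in zip(meta, ys))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:  # degenerate case: fall back to a plain average
        return 0.5, 0.5
    return (t1 * s22 - t2 * s12) / det, (s11 * t2 - s12 * t1) / det


def fit_stack(xs, ys, n_folds=5):
    """Fit the meta-learner on out-of-fold predictions, then refit bases on all data."""
    w1, w2 = fit_meta(out_of_fold_predictions(xs, ys, n_folds), ys)
    line, knn = (learner(xs, ys) for learner in BASE_LEARNERS)
    return lambda x: w1 * line(x) + w2 * knn(x)


# Toy usage on an exactly linear target: the meta-learner puts (almost) all
# weight on the line, so the stack extrapolates correctly.
xs = list(range(20))
ys = [2 * x + 1 for x in xs]
model = fit_stack(xs, ys)
print(round(model(25), 6))
```

The out-of-fold step is the design choice that matters: if the meta-learner were fitted on in-sample base predictions, it would over-trust whichever base model memorises the training data best.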

Findings

The findings reveal that the proposed model predicted the final project cost with a very small prediction error, i.e. the predicted cost differed only slightly from the actual cost. Across all performance metrics, the stacking ensemble model outperformed the individual models. Based on root mean square error (RMSE), the stacking ensemble cost model is 86.8, 87.8 and 5.6 percent more accurate than the linear regression, support vector machine and neural network models, respectively.
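
The abstract does not state how "percent more accurate" is derived from RMSE. One common reading, assumed here, is the relative reduction in RMSE with respect to each baseline model. The RMSE values in the example are hypothetical, since the abstract does not report them.

```python
def rmse_reduction_pct(rmse_baseline: float, rmse_stacked: float) -> float:
    """Relative RMSE reduction (percent) of the stacked model versus a baseline.

    Assumption: "X percent more accurate" means (baseline - stacked) / baseline.
    """
    if rmse_baseline <= 0:
        raise ValueError("baseline RMSE must be positive")
    return (rmse_baseline - rmse_stacked) / rmse_baseline * 100.0


# Hypothetical RMSE values chosen to reproduce an 86.8 percent reduction.
print(round(rmse_reduction_pct(10.0, 1.32), 1))  # 86.8
```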

Research limitations/implications

The study shows how a stacking ensemble machine learning algorithm can be applied to predict the cost of construction projects. Estimators and practitioners can use the new model as an effective and reliable tool for predicting the cost of Ethiopian highway construction projects at the preliminary stage.

Originality/value

The study provides insight into the application of machine learning algorithms to forecasting the cost of future highway construction projects in Ethiopia.

Details

Engineering, Construction and Architectural Management, vol. 29 no. 7

Type: Research Article

ISSN: 0969-9988

Article

Publication date: 23 November 2022

Development and comparative of a new meta-ensemble machine learning model in predicting construction labor productivity

Ibrahim Karatas and Abdulkadir Budak

Downloads: 557

Abstract

Purpose

The study aims to compare the prediction success of basic and ensemble machine learning models and, on that basis, to create novel prediction models that combine machine learning models to increase the prediction success of construction labor productivity models.

Design/methodology/approach

Categorical and numerical data of the kind used in many construction labor productivity prediction models in the literature were preprocessed to make them ready for analysis. The machine learning models were developed in the Python programming language. After many trial combinations, the models were combined to form the proposed novel voting and stacking meta-ensemble machine learning models. Finally, the models were compared using Target and Taylor diagrams.

Findings

Meta-ensemble models were developed for labor productivity prediction by combining machine learning models. A voting ensemble combining the et (extra trees), gbm (gradient boosting machine), xgboost, lightgbm, catboost and mlp (multilayer perceptron) models and a stacking ensemble combining the et, gbm, xgboost, catboost and mlp models were created, with the et model selected as the meta-learner. Both the voting and the stacking meta-ensemble algorithms achieved higher prediction success than the other machine learning algorithms. Prediction success was measured with four evaluation metrics: MAE, MSE, RMSE and R². For the voting meta-ensemble algorithm, MAE, MSE, RMSE and R² are 0.0499, 0.0045, 0.0671 and 0.7886, respectively. For the stacking meta-ensemble algorithm, they are 0.0469, 0.0043, 0.0658 and 0.7967, respectively.
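
For reference, the four metrics can be computed as below. This sketch assumes the standard definitions (R² as 1 minus the ratio of residual to total sum of squares); the abstract does not define them explicitly, and the input values are made up.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 for paired observed/predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n          # mean absolute error
    mse = sum(e * e for e in errors) / n           # mean squared error
    rmse = math.sqrt(mse)                          # same units as the target
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot else float("nan")
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


print(regression_metrics([1, 2, 3, 4], [1, 2, 3, 5]))
# MAE 0.25, MSE 0.25, RMSE 0.5, R2 0.8
```

Because RMSE is the square root of MSE, the reported values can be cross-checked: √0.0045 ≈ 0.0671, matching the voting ensemble's RMSE.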

Research limitations/implications

The study compares machine learning algorithms with the newly created meta-ensemble machine learning algorithms for predicting the labor productivity of construction formwork activity. Practitioners and project planners can use this model as a reliable and accurate tool for predicting the labor productivity of formwork activity before construction planning.

Originality/value

The study provides insight into the application of ensemble machine learning algorithms to predicting construction labor productivity, and it proposes and applies novel meta-ensemble algorithms. The authors hope that predicting the labor productivity of construction formwork activity with high accuracy will make a significant contribution to construction project management.

Details

Engineering, Construction and Architectural Management, vol. 31 no. 3

Type: Research Article

ISSN: 0969-9988

Article

Publication date: 30 March 2023

A stacked ensemble learning method for customer lifetime value prediction

Nader Asadi Ejgerdi and Mehrdad Kazerooni

Downloads: 198

Abstract

Purpose

With the growth of organizations and businesses, customer acquisition and retention processes have become more complex in the long run, which is why customer lifetime value (CLV) has become crucial to sales managers. Predicting CLV is a strategic weapon and a competitive advantage for increasing profitability and identifying the most profitable customers, and CLV is one of the key performance indicators (KPIs) used in customer segmentation. This paper therefore proposes a stacked ensemble learning method, which combines multiple machine learning methods, for CLV prediction.

Design/methodology/approach

To use customers’ behavioral features to predict each customer’s CLV, data from a textile sales company were used as a case study. The proposed stacked ensemble learning method is compared with several popular predictive methods: deep neural networks, bagging support vector regression, light gradient boosting machine, random forest and extreme gradient boosting.

Findings

Empirical results indicate that the stacked ensemble learning method outperformed the other methods in regression performance, achieving a normalized root mean squared error of 0.248, a normalized mean absolute error of 0.364 and a coefficient of determination of 0.848. In addition, the prediction capability of the proposed method improved significantly after its hyperparameters were optimized.

Originality/value

This paper proposes a stacked ensemble learning method as a new method for accurate CLV prediction. The results and comparisons support the robustness and efficiency of the proposed method for CLV prediction.

Details

Kybernetes, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 0368-492X

Article

Publication date: 2 February 2022

Short-term cooling load prediction for office buildings based on feature selection scheme and stacking ensemble model

Wenzhong Gao, Xingzong Huang, Mengya Lin, Jing Jia and Zhen Tian

Downloads: 311

Abstract

Purpose

The purpose of this paper is to design a short-term load prediction framework that can accurately predict the cooling load of office buildings.

Design/methodology/approach

A feature selection scheme and a stacking ensemble model were proposed for the cooling load prediction task. First, abnormal data were identified with a data density estimation algorithm. Second, the crucial input features were identified from three aspects (i.e. historical load information, time information and meteorological information). Third, a stacking ensemble model combining a long short-term memory network and a light gradient boosting machine was used to predict the cooling load. Finally, the performance of the proposed framework was verified with evaluation indicators by predicting the cooling load of office buildings.

Findings

The identified input features can improve prediction performance. The proposed model is more accurate than existing models, and the stacking ensemble model is robust to weather forecasting errors.

Originality/value

The stacking ensemble model, which can overcome the shortcomings of deep learning models, was applied to the cooling load prediction task. The selection of input features, which receives little attention in most studies, is treated as an important step in this paper.

Details

Engineering Computations, vol. 39 no. 5

Type: Research Article

ISSN: 0264-4401

Article

Publication date: 7 November 2023

Ensemble of ensembles for fine particulate matter pollution prediction using big data analytics and IoT emission sensors

Christian Nnaemeka Egwim, Hafiz Alaka, Youlu Pan, Habeeb Balogun, Saheed Ajayi, Abdul Hye and Oluwapelumi Oluwaseun Egunjobi

Abstract

Purpose

The study aims to develop a highly effective, multilayer ensemble-of-ensembles predictive model (stacking ensemble) from several hyperparameter-optimized ensemble machine learning (ML) methods (bagging and boosting ensembles), trained on high-volume data retrieved from Internet of Things (IoT) emission sensors together with time-matched meteorological and traffic data.

Design/methodology/approach

First, the study tested the big data hypothesis by developing sample ensemble predictive models on data samples of different sizes and comparing their results. Second, it developed a standalone model and several bagging and boosting ensemble models and compared their results. Finally, it used the best-performing bagging and boosting predictive models as input estimators to develop a novel, highly effective multilayer stacking ensemble predictive model.

Findings

The results showed data size to be one of the main determinants of the predictive power of ensemble ML. They also showed that combining the outputs of ensemble ML algorithms usually yields better prediction accuracy than using a single algorithm. Finally, the stacking ensemble predicted PM2.5 concentration levels better than the bagging and boosting ensemble models.

Research limitations/implications

A limitation of this study is the trade-off between the performance of the novel model and the computational time required to train it. Whether this gap can be closed remains an open research question that future research should address. Future studies could also integrate the model into a personal air quality messaging system to inform the public of pollution levels and improve public access to air quality forecasts.

Practical implications

When integrated into an air pollution monitoring system, the outcome of this study will help the public proactively identify highly polluted areas. By encouraging avoidance behavior, this could reduce deaths, complications and transmission of COVID-19 (and other lung diseases) associated with or triggered by pollution, and it can support informed government decisions on lockdowns.

Originality/value

This study fills a gap in the literature by providing a justification for selecting appropriate ensemble ML algorithms for PM2.5 concentration level predictive modeling. Second, it contributes to the big data hypothesis, which suggests that data size is one of the most important factors in ML predictive capability. Third, it supports the premise that combining the outputs of ensemble ML algorithms usually yields better prediction accuracy than using a single algorithm. Finally, it develops a novel, high-performance, multilayer, hyperparameter-optimized ensemble-of-ensembles predictive model that can accurately predict PM2.5 concentration levels with improved model interpretability and enhanced generalizability, and it provides a novel databank of historical pollution data from IoT emission sensors that can be purchased for research, consultancy and policymaking.

Details

Journal of Engineering, Design and Technology, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 1726-0531

Article

Publication date: 5 February 2018

Predicting student academic performance using multi-model heterogeneous ensemble approach

Olugbenga Wilson Adejo and Thomas Connolly

Downloads: 1192

Abstract

Purpose

The purpose of this paper is to empirically investigate and compare the use of multiple data sources, different classifiers and classifier ensemble techniques in predicting student academic performance. The study compares the performance and efficiency of ensemble techniques that use different combinations of data sources with those of base classifiers that use a single data source.

Design/methodology/approach

Using a quantitative research methodology, data samples of 141 learners enrolled at the University of the West of Scotland were extracted from the institution’s databases and collected through a survey questionnaire. The research drew on three data sources (the student record system, the learning management system and the survey) and used three state-of-the-art data mining classifiers, namely decision tree, artificial neural network and support vector machine, for modeling. Ensembles of these base classifiers were also used to predict student performance, and the seven resulting models were compared using six evaluation metrics.

Findings

The results show that using multiple data sources together with heterogeneous ensemble techniques is very efficient and accurate in predicting student performance and helps to identify students at risk of attrition.

Practical implications

The approach proposed in this study will help educational administrators and policymakers in the education sector to develop new higher education policies and curricula relevant to student retention. More generally, its main implication for practice is that, by combining data sources, it can accurately identify students at risk of dropping out of higher education (HE) early, so that the necessary support and intervention can be provided.

Originality/value

The research empirically investigated and compared the accuracy and efficiency of single classifiers and classifier ensembles that use single and multiple data sources. The study developed a novel hybrid model for predicting student performance that is highly accurate and efficient. Overall, this study advances the understanding of how ensemble techniques can be applied to predict student performance from learner data, and it successfully addresses two fundamental questions: What combination of variables will accurately predict student academic performance? What is the potential of stacking ensemble techniques for accurately predicting student academic performance?

Details

Journal of Applied Research in Higher Education, vol. 10 no. 1

Type: Research Article

ISSN: 2050-7003

Open Access

Article

Publication date: 21 June 2022

Design of ensemble recurrent model with stacked fuzzy ARTMAP for breast cancer detection

Abhishek Das and Mihir Narayan Mohanty

Downloads: 542

Abstract

Purpose

Timely and accurate detection of cancer can save the life of the person affected. According to the World Health Organization (WHO), breast cancer has the highest incidence of all cancers but ranks fifth in mortality. Among the many image processing techniques, several works have focused on convolutional neural networks (CNNs) for processing these images. However, deep learning models have yet to be well explored.

Design/methodology/approach

In this work, kernel principal component analysis (KPCA), a multivariate statistical method, is used to extract essential features; KPCA also helps to denoise the data. These features are processed by a heterogeneous ensemble of three base models: a recurrent neural network (RNN), a long short-term memory (LSTM) network and a gated recurrent unit (GRU). The outputs of these base learners are fed to a fuzzy adaptive resonance theory mapping (ARTMAP) model for decision making. Nodes are added to its F_2^a layer if the winning criteria are fulfilled, which makes the ARTMAP model more robust.

Findings

The proposed model is verified on the breast histopathology image dataset publicly available on Kaggle. It achieves 99.36% training accuracy and 98.72% validation accuracy. The model applies processing at every stage: image denoising reduces data redundancy, ensemble training gives better results than single models, and final classification by a fuzzy ARTMAP model, which controls the number of nodes according to performance, makes the classification robust and accurate.

Research limitations/implications

Research on medical applications is ongoing, and more advanced classification algorithms are being developed. There is still scope to improve the models' performance, practicability and cost efficiency. The ensemble could also be built from different combinations of models with different characteristics, and the proposed model could be verified on signals instead of images. Although experimental analysis shows the improved performance of the proposed model, the method still needs to be verified on practical systems, and a practical implementation will be carried out to assess its real-time performance and cost efficiency.

Originality/value

The proposed model uses KPCA to denoise the data, reduce data redundancy and select features. Training and classification are performed by a heterogeneous ensemble with RNN, LSTM and GRU base classifiers, which gives better results than single models, and the adaptive fuzzy mapping model makes the final classification accurate. The work analyzes the effectiveness of combining these methods into a single model.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2634-1964

Article

Publication date: 12 January 2022

Using AI and ML to predict shipment times of therapeutics, diagnostics and vaccines in e-pharmacy supply chains during COVID-19 pandemic

Mahesh Babu Mariappan, Kanniga Devi, Yegnanarayanan Venkataraman, Ming K. Lim and Panneerselvam Theivendren

Downloads: 1061

Abstract

Purpose

This paper aims to address the pressing problem of predicting the shipment times of therapeutics, diagnostics and vaccines during the ongoing COVID-19 pandemic using a novel artificial intelligence (AI) and machine learning (ML) approach.

Design/methodology/approach

The present study used organic real-world therapeutic supplies data of over 3 million shipments collected during the COVID-19 pandemic through a large real-world e-pharmacy. The researchers built various ML multiclass classification models, namely, random forest (RF), extra trees (XRT), decision tree (DT), multilayer perceptron (MLP), XGBoost (XGB), CatBoost (CB), linear stochastic gradient descent (SGD) and the linear Naïve Bayes (NB) and trained them on striped datasets of (source, destination, shipper) triplets. The study stacked the base models and built stacked meta-models. Subsequently, the researchers built a model zoo with a combination of the base models and stacked meta-models trained on these striped datasets. The study used 10-fold cross-validation (CV) for performance evaluation.
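
The abstract does not define "striped datasets". One plausible reading, assumed here, is that the shipment records are partitioned into one subset per (source, destination, shipper) triplet, and models are trained and evaluated on each subset. The sketch below shows that partitioning and a 10-fold cross-validation split in plain Python; the record field names (`source`, `destination`, `shipper`, `days`) are hypothetical.

```python
from collections import defaultdict


def stripe_by_triplet(shipments):
    """Group shipment records into one 'stripe' per (source, destination, shipper)."""
    stripes = defaultdict(list)
    for record in shipments:
        key = (record["source"], record["destination"], record["shipper"])
        stripes[key].append(record)
    return dict(stripes)


def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k contiguous folds of near-equal size."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    base, extra = divmod(n, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # first `extra` folds get one more row
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, test
        start += size


# Hypothetical records; the field names are illustrative only.
shipments = [
    {"source": "A", "destination": "B", "shipper": "S1", "days": 3},
    {"source": "A", "destination": "B", "shipper": "S1", "days": 4},
    {"source": "A", "destination": "C", "shipper": "S2", "days": 6},
]
stripes = stripe_by_triplet(shipments)
print(sorted(stripes))  # [('A', 'B', 'S1'), ('A', 'C', 'S2')]
```

The contiguous folds keep the sketch short; a real pipeline would normally shuffle (or stratify by class) before splitting, so that each fold is representative.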

Findings

The findings reveal that the turnaround times provided by therapeutic supply logistics providers are only 62.91% accurate when compared with actual shipment times. In contrast, the solution provided in this study is up to 93.5% accurate, an improvement of up to 48.62%, and its performance shows a clear trend of improving each week as more historical data become available.
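
Assuming the 48.62% figure is a relative gain over the providers' 62.91% accuracy (rather than a difference in percentage points), the reported numbers are consistent:

$$\frac{93.5 - 62.91}{62.91} \times 100\% = \frac{30.59}{62.91} \times 100\% \approx 48.6\%$$

The absolute gain is 30.59 percentage points.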

Research limitations/implications

The study demonstrates the efficacy of an ML model zoo that combines base models and stacked meta-models trained on striped datasets of (source, destination, shipper) triplets for predicting the shipment times of therapeutics, diagnostics and vaccines in the e-pharmacy supply chain.

Originality/value

The study is novel in addressing a real-world e-pharmacy supply chain under post-COVID-19 lockdown conditions and in proposing a novel ML model zoo based on ensemble stacking to predict the shipment times of therapeutics. The authors expect this work to encourage greater adoption of AI and ML techniques for predicting therapeutic shipment times in the logistics industry during pandemics.

Details

The International Journal of Logistics Management, vol. 34 no. 2

Type: Research Article

ISSN: 0957-4093

Article

Publication date: 3 January 2023

Weighted ensemble classifier for malicious link detection using natural language processing

Saleem Raja A., Sundaravadivazhagan Balasubaramanian, Pradeepa Ganesan, Justin Rajasekaran and Karthikeyan R.

Abstract

Purpose

The internet has become fully integrated into contemporary life, and people depend on internet services for everyday activities. Consequently, an abundance of information about people and organizations is available online, which encourages the proliferation of cybercrime. Cybercriminals often use malicious links, disseminated via email, SMS and social media, for large-scale cyberattacks, and recognizing malicious links online can be exceedingly challenging. The purpose of this paper is to present a robust security system that detects malicious links in cyberspace using natural language processing techniques.

Design/methodology/approach

Researchers have proposed a variety of approaches for automatically recognizing malicious links, including blacklisting and rule-based machine/deep learning. However, these approaches generally require generating a set of features to generalize the detection process. Most features are generated by processing URLs and web page content, plus some external features such as the web page's ranking and domain name system information. This feature extraction and selection process typically takes considerable time and demands a high level of domain expertise, and the generated features may not exploit the full potential of the data set. In addition, most currently deployed systems use a single classifier to classify malicious links, yet prediction accuracy may vary widely depending on the data set and the classifier used.

Findings

To avoid hand-crafting feature sets, the proposed method uses a natural language processing technique, term frequency-inverse document frequency (TF-IDF), to vectorize URLs. To build a robust system for classifying malicious links, the proposed system implements a weighted soft voting classifier, an ensemble classifier that combines the predictions of base classifiers, with each classifier's weight based on its ability or skill.
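
A minimal sketch of weighted soft voting: each base classifier outputs class probabilities, these are averaged with per-classifier weights, and the class with the highest weighted average wins. Using something like validation accuracy as the weight is an assumption for illustration; the abstract says only that each weight reflects the classifier's ability or skill. The TF-IDF step is not shown, and the probabilities and weights are made up.

```python
def weighted_soft_vote(prob_rows, weights):
    """Weighted soft voting over per-classifier class-probability vectors.

    prob_rows: one list of class probabilities per base classifier (same class order).
    weights:   one non-negative weight per classifier (e.g. a skill score).
    Returns (index of the winning class, weighted-average probabilities).
    """
    if not prob_rows or len(prob_rows) != len(weights):
        raise ValueError("need one weight per classifier")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    n_classes = len(prob_rows[0])
    avg = [
        sum(w * probs[c] for w, probs in zip(weights, prob_rows)) / total
        for c in range(n_classes)
    ]
    winner = max(range(n_classes), key=avg.__getitem__)
    return winner, avg


# Hypothetical outputs of three base classifiers; classes are [benign, malicious].
probs = [[0.7, 0.3], [0.35, 0.65], [0.4, 0.6]]
print(weighted_soft_vote(probs, [1, 1, 1])[0])          # 1 (plain soft voting)
print(weighted_soft_vote(probs, [0.95, 0.55, 0.6])[0])  # 0 (skill-weighted)
```

With equal weights this reduces to ordinary soft voting and predicts malicious; giving more weight to the strongest classifier flips the decision to benign, which is the effect the weighting is meant to have.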

Originality/value

The proposed method performs better when optimal weights are assigned. Its performance was assessed on two data sets (D1 and D2) and compared with base machine learning classifiers and previous research results. The proposed method outperforms the existing methods, achieving 91.4% and 98.8% accuracy on data sets D1 and D2, respectively.

Details

International Journal of Pervasive Computing and Communications, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 1742-7371

Article

Publication date: 8 February 2020

Scale up predictive models for early detection of at-risk students: a feasibility study

Ying Cui, Fu Chen and Ali Shiri

Downloads: 437

Abstract

Purpose

This study aims to investigate the feasibility of developing general predictive models that use learning management system (LMS) data to predict student performance in various courses. The authors focused on three practical but important questions: Is there a common set of student activity variables that predicts student performance in different courses? Which machine-learning classifiers perform consistently well across different courses? Can a general model be developed for use in multiple courses to predict student performance from LMS data?

Design/methodology/approach

Three mandatory undergraduate courses with large class sizes were selected from three faculties (science, engineering and education) at a large Western Canadian university. Course-specific models for these three courses were built and compared using data from two semesters, one for model building and the other for generalizability testing.

Findings

The authors conclude that developing a general model to predict course failure across different courses is not desirable. However, for the science course, a predictive model built on data from one semester identified about 70% of the students who failed the course and 70% of those who passed it in another semester, using only LMS data extracted from the first four weeks.
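
The "about 70% of students who failed and 70% who passed" result corresponds to two per-class recall figures, usually called sensitivity and specificity when failing is treated as the positive class. A small sketch of that computation, using made-up labels chosen to reproduce a 70/70 split:

```python
def sensitivity_specificity(y_true, y_pred, positive="fail"):
    """Recall on the positive class (sensitivity) and on the other class (specificity)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Made-up labels: 10 students failed, 10 passed.
y_true = ["fail"] * 10 + ["pass"] * 10
y_pred = ["fail"] * 7 + ["pass"] * 3 + ["pass"] * 7 + ["fail"] * 3
print(sensitivity_specificity(y_true, y_pred))  # (0.7, 0.7)
```

Reporting both figures matters for at-risk detection: a model that flags everyone as failing would reach 100% sensitivity but 0% specificity.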

Originality/value

The results of this study are promising, as they show that LMS data can be used for early prediction of student course failure, which has the potential to provide students with timely feedback and support in higher education institutions.

Details

Information and Learning Sciences, vol. 121 no. 3/4

Type: Research Article

ISSN: 2398-5348
