Search results

1 – 10 of 559

Article

Publication date: 7 November 2023

Ensemble of ensembles for fine particulate matter pollution prediction using big data analytics and IoT emission sensors

Christian Nnaemeka Egwim, Hafiz Alaka, Youlu Pan, Habeeb Balogun, Saheed Ajayi, Abdul Hye and Oluwapelumi Oluwaseun Egunjobi

Abstract

Purpose

The study aims to develop a multilayer, highly effective ensemble-of-ensembles predictive model (stacking ensemble) using several hyperparameter-optimized ensemble machine learning (ML) methods (bagging and boosting ensembles) trained on high-volume data points retrieved from Internet of Things (IoT) emission sensors, together with time-corresponding meteorology and traffic data.

Design/methodology/approach

First, the study tested the big data hypothesis by developing sample ensemble predictive models on different data sample sizes and comparing their results. Second, it developed a standalone model and several bagging and boosting ensemble models and compared their results. Finally, it used the best-performing bagging and boosting predictive models as input estimators to develop a novel multilayer, highly effective stacking ensemble predictive model.
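
The stacking step described above can be illustrated with a minimal sketch. It is not the authors' model: the two toy base learners (`fit_line`, `fit_knn`), the synthetic data and the no-intercept least-squares meta-learner are all assumptions chosen to keep the example dependency-free. The point it shows is that the meta-learner is trained on out-of-fold predictions of the base models.

```python
import random
import statistics


def fit_line(xs, ys):
    """Base learner 1 (hypothetical): ordinary least-squares line."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return lambda x: mean_y + slope * (x - mean_x)


def fit_knn(xs, ys, k=3):
    """Base learner 2 (hypothetical): mean target of the k nearest points."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return statistics.fmean(y for _, y in nearest)

    return predict


BASE_FITTERS = (fit_line, fit_knn)


def out_of_fold_predictions(xs, ys, n_folds=5):
    """Predict every point with base models trained without that point's fold."""
    preds = [[0.0, 0.0] for _ in xs]
    for fold in range(n_folds):
        train = [i for i in range(len(xs)) if i % n_folds != fold]
        held_out = [i for i in range(len(xs)) if i % n_folds == fold]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        for j, fitter in enumerate(BASE_FITTERS):
            model = fitter(train_x, train_y)
            for i in held_out:
                preds[i][j] = model(xs[i])
    return preds


def fit_meta_weights(preds, ys):
    """Meta-learner: least-squares weights for the two base predictions."""
    a = sum(p[0] * p[0] for p in preds)
    b = sum(p[0] * p[1] for p in preds)
    d = sum(p[1] * p[1] for p in preds)
    r0 = sum(p[0] * y for p, y in zip(preds, ys))
    r1 = sum(p[1] * y for p, y in zip(preds, ys))
    det = a * d - b * b  # assumed non-zero: base predictions must not be collinear
    return (d * r0 - b * r1) / det, (a * r1 - b * r0) / det


def fit_stack(xs, ys):
    """Fit meta-weights on out-of-fold predictions, then refit bases on all data."""
    w_line, w_knn = fit_meta_weights(out_of_fold_predictions(xs, ys), ys)
    line, knn = fit_line(xs, ys), fit_knn(xs, ys)
    return lambda x: w_line * line(x) + w_knn * knn(x)


# Synthetic data (an assumption for illustration only).
rng = random.Random(0)
xs = [float(i) for i in range(40)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]
stack = fit_stack(xs, ys)
print(round(stack(12.5), 2))
```

Training the meta-learner on out-of-fold predictions rather than in-sample ones keeps it from simply rewarding whichever base model overfits the training data most.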

Findings

Results proved data size to be one of the main determinants of ensemble ML predictive power. Second, they proved that, compared with using a single algorithm, the cumulative result from ensemble ML algorithms is usually better in terms of predictive accuracy. Finally, they proved the stacking ensemble to be a better model for predicting PM_2.5 concentration level than the bagging and boosting ensemble models.

Research limitations/implications

A limitation of this study is the trade-off between the performance of this novel model and the computational time required to train it. Whether this gap can be closed remains an open research question that future research should address. Future studies could also integrate this novel model into a personal air quality messaging system to inform the public of pollution levels and improve public access to air quality forecasts.

Practical implications

The outcome of this study will help the public proactively identify highly polluted areas, potentially reducing pollution-associated or pollution-triggered deaths, complications and transmission of COVID-19 (and other lung diseases) by encouraging avoidance behavior. When integrated into an air pollution monitoring system, it will also support informed lockdown decisions by government bodies.

Originality/value

This study fills a gap in the literature by providing a justification for selecting appropriate ensemble ML algorithms for PM_2.5 concentration level predictive modeling. Second, it contributes to the big data hypothesis, which suggests that data size is one of the most important factors in ML predictive capability. Third, it supports the premise that, when using ensemble ML algorithms, the cumulative output is usually better in terms of predictive accuracy than that of a single algorithm. Finally, it develops a novel multilayer, high-performance, hyperparameter-optimized ensemble-of-ensembles predictive model that can accurately predict PM_2.5 concentration levels with improved model interpretability and enhanced generalizability, and it provides a novel databank of historic pollution data from IoT emission sensors that can be purchased for research, consultancy and policymaking.

Details

Journal of Engineering, Design and Technology, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 1726-0531

Article

Publication date: 11 July 2023

Machine learning and deep learning-based advanced classification techniques for the detection of major depressive disorder

Abhinandan Chatterjee, Pradip Bala, Shruti Gedam, Sanchita Paul and Nishant Goyal

Abstract

Purpose

Depression is a mental health problem characterized by a persistent sense of sadness and loss of interest. EEG signals are regarded as the most appropriate instruments for diagnosing depression because they reflect the operating status of the human brain. The purpose of this study is the early detection of depression among people using EEG signals.

Design/methodology/approach

(i) Artifacts are removed by filtering, and linear and non-linear features are extracted; (ii) feature scaling is done using a standard scaler, while principal component analysis (PCA) is used for feature reduction; (iii) the linear features, the non-linear features and a combination of both (only for those whose accuracy is highest) are taken for further analysis, where ML and DL classifiers are applied for the classification of depression; and (iv) in total, 15 distinct ML and DL methods that have been effectively used as classifiers for a variety of real-world problems are applied: KNN, SVM, bagging SVM, RF, GB, Extreme Gradient Boosting, MNB, Adaboost, Bagging RF, BootAgg, Gaussian NB, RNN, 1DCNN, RBFNN and LSTM.

Findings

1. Alpha, alpha asymmetry, gamma and gamma asymmetry give the best results among the linear features, while RWE, DFA, CD and AE give the best results among the non-linear features. 2. For linear features, gamma and alpha asymmetry gave 99.98% accuracy with Bagging RF, while gamma asymmetry gave 99.98% accuracy with BootAgg. 3. For non-linear features, accuracies of 99.84% for RWE and DFA with RF, 99.97% for DFA with XGBoost and 99.94% for RWE with BootAgg were obtained. 4. With DL, for linear features, gamma asymmetry gave more than 96% accuracy with RNN and 91% accuracy with LSTM; for non-linear features, 89% accuracy was achieved for CD and AE with LSTM. 5. When linear and non-linear features were combined, the highest accuracy was achieved with Bagging RF (98.50%) for gamma asymmetry + RWE. With DL, alpha + RWE, gamma asymmetry + CD and gamma asymmetry + RWE achieved 98% accuracy with LSTM.

Originality/value

A novel dataset was collected from the Central Institute of Psychiatry (CIP), Ranchi, recorded using 128 channels, whereas major previous studies used fewer channels; the details of the study participants are summarized and a model is developed for statistical analysis using N-way ANOVA; artifacts are removed by high- and low-pass filtering of epoch data, followed by re-referencing and independent component analysis for noise removal; linear features, namely band power and interhemispheric asymmetry, and non-linear features, namely relative wavelet energy, wavelet entropy, approximate entropy, sample entropy, detrended fluctuation analysis and correlation dimension, are extracted; the model uses epochs (213,072) of 5 s EEG data, which allows the model to train for longer, thereby increasing the efficiency of the classifiers. Feature scaling is done using a standard scaler rather than normalization because it helps increase the accuracy of the models (especially for deep learning algorithms), while PCA is used for feature reduction; the linear features, the non-linear features and a combination of both are taken for extensive analysis in conjunction with ML and DL classifiers for the classification of depression. The combination of linear and non-linear features (only for those whose accuracy is highest) is used for the best detection results.

Details

Aslib Journal of Information Management, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2050-3806

Article

Publication date: 30 March 2023

A stacked ensemble learning method for customer lifetime value prediction

Nader Asadi Ejgerdi and Mehrdad Kazerooni

Abstract

Purpose

With the growth of organizations and businesses, customer acquisition and retention processes have become more complex in the long run. That is why customer lifetime value (CLV) has become crucial to sales managers. Predicting CLV is a strategic tool and a competitive advantage for increasing profitability and identifying more profitable customers, and CLV is one of the essential key performance indicators (KPIs) used in customer segmentation. Thus, this paper proposes a stacked ensemble learning method, a combination of multiple machine learning methods, for CLV prediction.

Design/methodology/approach

To use customers’ behavioral features for predicting each customer’s CLV, the data of a textile sales company was used as a case study. The proposed stacked ensemble learning method is compared with several popular predictive methods, namely deep neural networks, bagging support vector regression, light gradient boosting machine, random forest and extreme gradient boosting.

Findings

Empirical results indicate that the stacked ensemble learning method outperformed the other methods in regression performance, with a normalized root mean squared error, normalized mean absolute error and coefficient of determination of 0.248, 0.364 and 0.848, respectively. In addition, the prediction capability of the proposed method improved significantly after optimizing its hyperparameters.
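
As a rough illustration of the three reported metrics, the sketch below computes them for a pair of hypothetical value lists. The abstract does not state how the errors were normalized; dividing by the range of the observed values is one common convention and is an assumption here, as is the function name `regression_metrics`.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return range-normalized RMSE and MAE plus the coefficient of determination."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    span = max(y_true) - min(y_true)  # assumed normalizer: range of observed values
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {"nrmse": rmse / span, "nmae": mae / span, "r2": 1.0 - ss_res / ss_tot}


# Hypothetical CLV values, for illustration only.
observed = [120.0, 340.0, 560.0, 410.0, 90.0]
predicted = [150.0, 300.0, 500.0, 450.0, 100.0]
print(regression_metrics(observed, predicted))
```

Lower normalized errors and a coefficient of determination closer to 1 indicate a better fit, which is the sense in which the three figures above are compared across methods.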

Originality/value

This paper proposes a stacked ensemble learning method as a new method for accurate CLV prediction. The results and comparisons support the robustness and efficiency of the proposed method for CLV prediction.

Details

Kybernetes, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 0368-492X

Article

Publication date: 26 September 2022

Comparison of machine learning algorithms for evaluating building energy efficiency using big data analytics

Christian Nnaemeka Egwim, Hafiz Alaka, Oluwapelumi Oluwaseun Egunjobi, Alvaro Gomes and Iosif Mporas

Abstract

Purpose

This study aims to compare and evaluate the application of commonly used machine learning (ML) algorithms used to develop models for assessing energy efficiency of buildings.

Design/methodology/approach

This study first combined building energy efficiency ratings from several data sources and used them to create predictive models using a variety of ML methods. Second, to test the hypothesis of ensemble techniques, this study designed a hybrid stacking ensemble approach based on the best-performing bagging and boosting ensemble methods generated from its predictive analytics.

Findings

Based on performance evaluation metric scores, the extra trees model was shown to be the best predictive model. More importantly, this study demonstrated that the cumulative result of ensemble ML algorithms is usually better in terms of predictive accuracy than that of a single method. Finally, stacking was found to be a better ensemble approach for analysing building energy efficiency than bagging and boosting.

Research limitations/implications

While the proposed contemporary method of analysis is assumed to be applicable to assessing the energy efficiency of buildings within the sector, the unique data transformation used in this study may not, as is typical of any data-driven model, be transferable to data from regions other than the UK.

Practical implications

This study aids in the initial selection of appropriate and high-performing ML algorithms for future analysis. This study also assists building managers, residents, government agencies and other stakeholders in better understanding contributing factors and making better decisions about building energy performance. Furthermore, this study will assist the general public in proactively identifying buildings with high energy demands, potentially lowering energy costs by promoting avoidance behaviour and assisting government agencies in making informed decisions about energy tariffs when this novel model is integrated into an energy monitoring system.

Originality/value

This study fills a gap in the literature by providing a rationale for selecting appropriate ML algorithms for assessing building energy efficiency. More importantly, this study demonstrated that the cumulative result of ensemble ML algorithms is usually better in terms of predictive accuracy than that of a single method.

Details

Journal of Engineering, Design and Technology, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 1726-0531

Article

Publication date: 29 September 2023

Influencing factors and prediction of overcapacity of new energy enterprises in China

Wen-Qian Lou, Bin Wu and Bo-Wen Zhu

Abstract

Purpose

This study aims to clarify influencing factors of overcapacity of new energy enterprises in China and accurately predict whether these enterprises have overcapacity.

Design/methodology/approach

Based on relevant data including the experience and evidence from the capital market in China, the research establishes a generic univariate selection-comparative machine learning model to study relevant factors that affect overcapacity of new energy enterprises from five dimensions. These include the governmental intervention, market demand, corporate finance, corporate governance and corporate decision. Moreover, the bridging approach is used to strengthen findings from quantitative studies via the results from qualitative studies.

Findings

The authors' results show that the overcapacity of new energy enterprises in China is brought about by the combined effect of governmental intervention, corporate governance and corporate decision. Governmental interventions increase the overcapacity risk of new energy enterprises mainly by distorting the investment behaviors of enterprises. Corporate decision and corporate governance factors affect overcapacity mainly by regulating the degree of overconfidence of the management team and the agency cost. Among the eight comparable integrated models, generic univariate selection-bagging exhibits the best overall generalization performance; its area under the receiver operating characteristic curve (AUC), accuracy, precision and recall are 0.719, 0.960, 0.975 and 0.983, respectively.
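
The four metrics reported above can be computed as in the following sketch. The labels and scores are hypothetical, the 0.5 decision threshold is an assumption, and the AUC is obtained with the pairwise-ranking formulation (the share of positive/negative pairs in which the positive case receives the higher score, ties counting as half).

```python
def classification_metrics(y_true, scores, threshold=0.5):
    """Return accuracy, precision, recall and AUC for binary labels and scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 0)

    # AUC as the probability that a random positive outranks a random negative.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)

    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "auc": wins / (len(pos) * len(neg)),
    }


# Hypothetical labels (1 = overcapacity) and model scores, for illustration only.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.1]
print(classification_metrics(labels, scores))
```

Accuracy, precision and recall depend on the chosen threshold, whereas AUC depends only on how the scores rank the cases, so the two kinds of metric can diverge, particularly when the classes are imbalanced.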

Originality/value

The proposed integrated model analyzes causes and predicts presence of overcapacity of new energy enterprises to help governments to formulate appropriate strategies to deal with overcapacity and new energy enterprises to optimize resource allocation. Ten main features which affect the overcapacity of new energy enterprises in China are identified through generic univariate selection model. Through the bridging approach, the impact of the main features on the overcapacity of new energy enterprises and the mechanism of the influence are analyzed.

Details

Kybernetes, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 0368-492X

Article

Publication date: 3 November 2023

Fatal structure fire classification from building fire data using machine learning

Vimala Balakrishnan, Aainaa Nadia Mohammed Hashim, Voon Chung Lee, Voon Hee Lee and Ying Qiu Lee

Abstract

Purpose

This study aims to develop a machine learning model to detect structure fire fatalities using a dataset comprising 11,341 cases from 2011 to 2019.

Design/methodology/approach

Exploratory data analysis (EDA) was conducted prior to modelling, in which ten machine learning models were experimented with.

Findings

The main fatal structure fire risk factors were fires originating from bedrooms, living areas and the cooking/dining areas. The highest fatality rate (20.69%) was reported for fires ignited due to bedding (23.43%), despite a low fire incident rate (3.50%). Using 21 structure fire features, Random Forest (RF) yielded the best detection performance with 86% accuracy, followed by Decision Tree (DT) with bagging (accuracy = 84.7%).

Research limitations/practical implications

The limitations of the study pertain to data quality and the grouping of categories in the data pre-processing stage, which could affect the performance of the models.

Originality/value

The study is the first of its kind to manipulate risk factors to detect fatal structure classification, particularly focussing on structure fire fatalities. Most of the previous studies examined the importance of fire risk factors and their relationship to the fire risk level.

Details

International Journal of Intelligent Computing and Cybernetics, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 1756-378X

Article

Publication date: 20 March 2024

Swirl-induced motion prediction with physics-guided machine learning utilizing spatiotemporal flow field structure

Ziming Zhou, Fengnian Zhao and David Hung

Abstract

Purpose

Higher energy conversion efficiency of internal combustion engines can be achieved with optimal control of the unsteady in-cylinder flow fields inside a direct-injection (DI) engine. However, predicting the nonlinear and transient in-cylinder flow motion remains a daunting task because it is highly complex and changes in both space and time. Recently, machine learning methods have demonstrated great promise in inferring relatively simple temporal flow field development. This paper presents a physics-guided machine learning approach to achieve highly accurate and generalizable prediction of complex swirl-induced flow field motions.

Design/methodology/approach

To achieve high-fidelity time-series prediction of unsteady engine flow fields, this work features an automated machine learning framework with the following objectives: (1) The spatiotemporal physical constraint of the flow field structure is transferred to the machine learning structure. (2) The ML inputs and targets are designed efficiently to ensure high model convergence with limited sets of experiments. (3) The prediction results are optimized by an ensemble learning mechanism within the automated machine learning framework.

Findings

The proposed data-driven framework is proven effective across different time periods and different extents of unsteadiness of the flow dynamics, and the predicted flow fields are highly similar to the target fields under various complex flow patterns. Among the described framework designs, the use of the spatial flow field structure is the featured improvement to the time-series flow field prediction process.

Originality/value

The proposed flow field prediction framework could be generalized to different crank angle periods, cycles and swirl ratio conditions, which could greatly promote real-time flow control and reduce experiments on in-cylinder flow field measurement and diagnostics.

Details

International Journal of Numerical Methods for Heat & Fluid Flow, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 0961-5539

Article

Publication date: 2 January 2024

The impact of policy intervention on international wine demand

Xinyang Liu, Anyu Liu, Xiaoying Jiao and Zhen Liu

Abstract

Purpose

The purpose of the study is to investigate the impact of implementing anti-dumping duties on imported Australian wine to China in the short- and long-run, respectively.

Design/methodology/approach

First, the Difference-in-Differences (DID) method is used in this study to evaluate the short-run causal effect of implementing anti-dumping duties on imported Australian wine to China. Second, a Bayesian ensemble method is used to predict 2023–2025 wine exports from Australia to China. The disparity between the forecasts and counterfactual prediction which assumes no anti-dumping duties represents the accumulated impact of the anti-dumping duties in the long run.
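
The basic Difference-in-Differences estimate can be sketched for the simplest two-group, two-period case. The figures and the function name `did_estimate` are hypothetical, and the study's actual specification (for example, a regression with controls) is not given in the abstract; the sketch only shows the core arithmetic of subtracting the control group's change from the treated group's change.

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period Difference-in-Differences on group means."""

    def mean(values):
        return sum(values) / len(values)

    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    # The control group's change stands in for what would have happened
    # to the treated group without the intervention (parallel trends).
    return treated_change - control_change


# Hypothetical export volumes, for illustration only.
effect = did_estimate(
    treated_pre=[100.0, 110.0],   # treated market before the duties
    treated_post=[20.0, 30.0],    # treated market after the duties
    control_pre=[50.0, 60.0],     # comparison markets before
    control_post=[55.0, 65.0],    # comparison markets after
)
print(effect)
```

The estimate is only interpretable as a causal effect under the parallel-trends assumption, that is, if the treated and control groups would have followed the same trend in the absence of the policy.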

Findings

The anti-dumping duties resulted in a significant decline in red and rose, white and sparkling wine exports to China by 92.59%, 99.06% and 90.06%, respectively, in 2021. In the long run, wine exports to China are projected to continue this downward trend, with an average annual growth rate of −21.92%, −38.90% and −9.54% for the three types of wine, respectively. In contrast, the counterfactual prediction indicates an increase of 3.20%, 20.37% and 4.55% for the respective categories. Consequently, the policy intervention is expected to result in a decrease of 96.11%, 93.15% and 84.11% in red and rose, white and sparkling wine exports to China from 2021 to 2025.

Originality/value

The originality of this study lies in the creation of an economic paradigm for assessing policy impacts within the realm of wine economics. Methodologically, it also represents the pioneering application of the DID and Bayesian ensemble forecasting methods within the field of wine economics.

Details

International Journal of Contemporary Hospitality Management, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 0959-6119

Open Access

Article

Publication date: 27 February 2024

Using machine learning to determine factors affecting product and product–service innovation

Oscar F. Bustinza, Luis M. Molina Fernandez and Marlene Mendoza Macías

Abstract

Purpose

Machine learning (ML) analytical tools are increasingly being considered as an alternative quantitative methodology in management research. This paper proposes a new approach for uncovering the antecedents behind product and product–service innovation (PSI).

Design/methodology/approach

The ML approach is novel in the field of innovation antecedents at the country level. A sample of the Ecuadorian National Survey on Technology and Innovation, consisting of more than 6,000 firms, is used to rank the antecedents of innovation.

Findings

The analysis reveals that the antecedents of product and PSI are distinct, yet rooted in the principles of open innovation and competitive priorities.

Research limitations/implications

The analysis is based on a sample of Ecuadorian firms, with the objective of showing how ML techniques are suitable for testing the antecedents of innovation in any other context.

Originality/value

The novel ML approach, in contrast to traditional quantitative analysis of the topic, can consider the full set of antecedent interactions to each of the innovations analyzed.

Details

Journal of Enterprise Information Management, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 1741-0398

Article

Publication date: 9 January 2024

Comprehensive evaluation of classification: an empirical study on consequence prediction of construction accidents in China

Ning Chen, Zhenyu Zhang and An Chen

Abstract

Purpose

Consequence prediction is an emerging topic in safety management concerning the severity outcome of accidents. In practical applications, it is usually implemented through supervised learning methods; however, the evaluation of classification results remains a challenge. The previous studies mostly adopted simplex evaluation based on empirical and quantitative assessment strategies. This paper aims to shed new light on the comprehensive evaluation and comparison of diverse classification methods through visualization, clustering and ranking techniques.

Design/methodology/approach

An empirical study is conducted using 9 state-of-the-art classification methods on a real-world data set of 653 construction accidents in China for predicting the consequence with respect to 39 carefully featured factors and accident type. The proposed comprehensive evaluation enriches the interpretation of classification results from different perspectives. Furthermore, the critical factors leading to severe construction accidents are identified by analyzing the coefficients of a logistic regression model.

Findings

This paper identifies the critical factors that significantly influence the consequence of construction accidents, which include accident type (particularly collapse), improper accident reporting and handling (E21), inadequate supervision engineers (O41), no special safety department (O11), delayed or low-quality drawings (T11), unqualified contractor (C21), schedule pressure (C11), multi-level subcontracting (C22), lacking safety examination (S22), improper operation of mechanical equipment (R11) and improper construction procedure arrangement (T21). The prediction models and findings of critical factors help make safety intervention measures in a targeted way and enhance the experience of safety professionals in the construction industry.

Research limitations/implications

The empirical study using some well-known classification methods for forecasting the consequences of construction accidents provides some evidence for the comprehensive evaluation of multiple classifiers. These techniques can be used jointly with other evaluation approaches for a comprehensive understanding of the classification algorithms. Despite the limitation of specific methods used in the study, the presented methodology can be configured with other classification methods and performance metrics and even applied to other decision-making problems such as clustering.

Originality/value

This study sheds new light on the comprehensive comparison and evaluation of classification results through visualization, clustering and ranking techniques using an empirical study of consequence prediction of construction accidents. The relevance of construction accident type is discussed with the severity of accidents. The critical factors influencing the accident consequence are identified for the sake of taking prevention measures for risk reduction. The proposed method can be applied to other decision-making tasks where the evaluation is involved as an important component.

Details

Construction Innovation, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 1471-4175
