Search results

1 – 10 of 653
Article
Publication date: 13 April 2023

Ian Lenaers, Kris Boudt and Lieven De Moor

Abstract

Purpose

The purpose is twofold. First, this study aims to establish that black box tree-based machine learning (ML) models have better predictive performance than a standard linear regression (LR) hedonic model for rent prediction. Second, it shows the added value of analyzing tree-based ML models with interpretable machine learning (IML) techniques.

Design/methodology/approach

Data on Belgian residential rental properties were collected. Two tree-based ML models, random forest regression and eXtreme gradient boosting regression, were applied to derive rent prediction models, and their predictive performance was compared with that of an LR model. The tree-based models were then interpreted, with respect to the factors that matter most in predicting rent, using SHapley Additive exPlanations (SHAP) feature importance (FI) plots and SHAP summary plots.
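
As a rough illustration of what SHAP attributions measure, the sketch below computes exact Shapley values for one prediction by enumerating every feature coalition. It is not the authors' pipeline: the toy rent model, the two feature values and the single baseline row are all hypothetical, and practical SHAP tooling relies on faster tree-specific or sampling-based estimates rather than brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley values of `model` at `instance` relative to `baseline`.

    Features outside a coalition are set to their baseline value.
    Cost grows as 2^n, so this is only usable for a handful of features.
    """
    n = len(instance)

    def value(coalition):
        mixed = [instance[j] if j in coalition else baseline[j] for j in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_rent_model(x):
    """Hypothetical model: x[0] is livable surface, x[1] is number of bedrooms."""
    return 5 * x[0] + 100 * x[1] + 0.5 * x[0] * x[1]


instance = [80, 2]   # hypothetical property to explain
baseline = [60, 1]   # hypothetical reference property
phi = shapley_values(toy_rent_model, instance, baseline)
print(phi)
```

The attributions sum to the difference between the prediction for the instance and the prediction for the baseline, which is the additivity property that SHAP plots rely on. A feature importance score of the kind shown in SHAP FI plots is typically the mean absolute attribution over many instances.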

Findings

Results indicate that tree-based models perform better than an LR model for Belgian residential rent prediction. The SHAP FI plots agree that asking price, cadastral income, livable surface, number of bedrooms, number of bathrooms and variables measuring the proximity to points of interest are dominant predictors. The direction of relationships between rent and its factors is determined with SHAP summary plots. In addition to linear relationships, nonlinear relationships emerge.

Originality/value

Rent prediction using ML is less studied than house price prediction. In addition, studying prediction models using IML techniques is relatively new in real estate economics. Moreover, to the best of the authors’ knowledge, this study is the first to derive insights into the driving determinants of predicted rents from SHAP FI and SHAP summary plots.

Details

International Journal of Housing Markets and Analysis, vol. 17 no. 1
Type: Research Article
ISSN: 1753-8270

Article
Publication date: 7 July 2020

Jiaming Liu, Liuan Wang, Linan Zhang, Zeming Zhang and Sicheng Zhang

Abstract

Purpose

The primary objective of this study was to recognize critical indicators in predicting blood glucose (BG) through data-driven methods and to compare the prediction performance of four tree-based ensemble models, i.e. bagging with tree regressors (bagging-decision tree [Bagging-DT]), AdaBoost with tree regressors (Adaboost-DT), random forest (RF) and gradient boosting decision tree (GBDT).

Design/methodology/approach

This study proposed a majority voting feature selection method that combines lasso regression with the Akaike information criterion (AIC) (LR-AIC), lasso regression with the Bayesian information criterion (BIC) (LR-BIC) and RF to select the indicators with excellent predictive performance from an initial 38 indicators in 5,642 samples. The selected features were used to build the tree-based ensemble models. The 10-fold cross-validation (CV) method was used to evaluate the performance of each ensemble model.
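
The voting step can be sketched as follows, assuming each of the three selectors has already produced its own list of chosen indicators and that a feature is kept when at least two of the three agree. The threshold, the function name and the feature lists are hypothetical placeholders, not the study's actual outputs.

```python
from collections import Counter


def majority_vote(selections, min_votes=2):
    """Keep features chosen by at least `min_votes` of the selectors."""
    # set() guards against a selector listing the same feature twice.
    votes = Counter(f for selected in selections for f in set(selected))
    return sorted(f for f, n in votes.items() if n >= min_votes)


# Hypothetical selector outputs, for illustration only.
lasso_aic = ["age", "chc", "rbcvdw", "leucocyte_count"]
lasso_bic = ["age", "chc", "rbc_volume"]
random_forest = ["age", "rbcvdw", "rbc_volume", "platelet_count"]

selected = majority_vote([lasso_aic, lasso_bic, random_forest])
print(selected)
```

Raising `min_votes` to 3 would keep only the features on which all selectors agree, which trades recall for a smaller, more conservative feature set.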

Findings

The results of feature selection indicated that age, corpuscular hemoglobin concentration (CHC), red blood cell volume distribution width (RBCVDW), red blood cell volume and leucocyte count are the five most important clinical/physical indicators in BG prediction. Furthermore, this study found that the GBDT ensemble model combined with the proposed majority voting feature selection method is better than the other three models with respect to prediction performance and stability.

Practical implications

This study proposed a novel BG prediction framework for better predictive analytics in health care.

Social implications

This study incorporated medical background and machine learning technology to reduce diabetes morbidity and formulate precise medical schemes.

Originality/value

The majority voting feature selection method combined with the GBDT ensemble model provides an effective decision-making tool for predicting BG and detecting diabetes risk in advance.

Article
Publication date: 31 May 2022

Osamah M. Al-Qershi, Junbum Kwon, Shuning Zhao and Zhaokun Li

Abstract

Purpose

Where many content features are available, this paper aims to investigate which content features in video and text ads contribute most to accurately predicting the success of crowdfunding by comparing prediction models.

Design/methodology/approach

With 1,368 features extracted from 15,195 Kickstarter campaigns in the USA, the authors compare base models such as logistic regression (LR) with tree-based homogeneous ensembles such as eXtreme gradient boosting (XGBoost) and heterogeneous ensembles such as XGBoost + LR.

Findings

XGBoost shows higher prediction accuracy than LR (82% vs 69%), in contrast to the findings of a previous relevant study. Regarding important content features, humans (e.g. founders) are more important than visual objects (e.g. products). In both spoken and written language, words related to experience (e.g. eat) or perception (e.g. hear) are more important than cognitive (e.g. causation) words. In addition, a focus on the future is more important than a present or past time orientation. Speech aids (see and compare) to complement visual content are also effective and positive tone matters in speech.

Research limitations/implications

This research makes theoretical contributions by finding more important visuals (human) and language features (experience, perception and future time). Also, in a multimodal context, complementary cues (e.g. speech aids) across different modalities help. Furthermore, the noncontent parts of speech such as positive “tone” or pace of speech are important.

Practical implications

Founders are encouraged to assess and revise the content of their video or text ads as well as their basic campaign features (e.g. goal, duration and reward) before they launch their campaigns. Next, overly complex ensembles may suffer from overfitting problems. In practice, model validation using unseen data is recommended.

Originality/value

Rather than reducing the number of content feature dimensions (Kaminski and Hopp, 2020), this study enables advanced prediction models to accommodate many content features, which raises prediction accuracy substantially.

Article
Publication date: 18 January 2024

Jing Tang, Yida Guo and Yilin Han

Abstract

Purpose

Coal is a critical global energy source, and fluctuations in its price significantly impact related enterprises' profitability. This study aims to develop a robust model for predicting the coal price index to enhance coal purchase strategies for coal-consuming enterprises and provide crucial information for global carbon emission reduction.

Design/methodology/approach

The proposed coal price forecasting system combines data decomposition, semi-supervised feature engineering, ensemble learning and deep learning. It addresses the challenge of merging low-resolution and high-resolution data by adaptively combining both types of data and filling in missing gaps through interpolation for internal missing data and self-supervision for initial/terminal missing data. The system employs self-supervised learning to complete the filling of complex missing data.
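
To make the internal-gap step concrete, here is a minimal sketch that fills only the gaps lying between two observed points and leaves leading and trailing gaps untouched. Linear interpolation is an assumption (the abstract does not name the interpolation scheme), and the series values are hypothetical.

```python
def interpolate_internal(values):
    """Linearly fill None gaps that lie between two observed points.

    Leading and trailing gaps stay None; they would need another method,
    such as the self-supervised filling the abstract mentions.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        step = filled[right] - filled[left]
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left) / span
    return filled


# Hypothetical low-resolution index aligned to a finer time grid.
series = [None, 100.0, None, None, 106.0, None]
filled = interpolate_internal(series)
print(filled)
```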

Findings

The ensemble model, which combines long short-term memory, XGBoost and support vector regression, demonstrated the best prediction performance among the tested models. It exhibited superior accuracy and stability across multiple indices in two datasets, namely the Bohai-Rim steam-coal price index and coal daily settlement price.

Originality/value

The proposed coal price forecasting system stands out as it integrates data decomposition, semi-supervised feature engineering, ensemble learning and deep learning. Moreover, the system pioneers the use of self-supervised learning for filling in complex missing data, contributing to its originality and effectiveness.

Details

Data Technologies and Applications, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 19 February 2020

Shashidhar Kaparthi and Daniel Bumblauskas

Abstract

Purpose

The after-sale service industry is estimated to contribute over 8 percent to the US GDP. For use in this considerably large service management industry, this article provides verification in the application of decision tree-based machine learning algorithms for optimal maintenance decision-making. The motivation for this research arose from discussions held with a large agricultural equipment manufacturing company interested in increasing the uptime of their expensive machinery and in helping their dealer network.

Design/methodology/approach

We propose a general strategy for the design of predictive maintenance systems using machine learning techniques. Then, we present a case study where multiple machine learning algorithms are applied to a particular example situation for an illustration of the proposed strategy and evaluation of its performance.

Findings

We found progressive improvements using such machine learning techniques in terms of accuracy in predictions of failure, demonstrating that the proposed strategy is successful.

Research limitations/implications

This approach is scalable to a wide variety of applications to aid in failure prediction. These approaches are generalizable to many systems irrespective of the underlying physics. Even though we focus on decision tree-based machine learning techniques in this study, the general design strategy proposed can be used with all other supervised learning techniques like neural networks, boosting algorithms, support vector machines, and statistical methods.

Practical implications

This approach is applicable to many different types of systems that require maintenance and repair decision-making. A case is provided for a cloud data storage provider. The methods described in the case can be used in any number of systems and industrial applications, making this a very scalable case for industry practitioners. This scalability is possible as the machine learning techniques learn the correspondence between machine conditions and outcome state irrespective of the underlying physics governing the systems.

Social implications

Sustainable systems and operations require allocating and utilizing resources efficiently and effectively. This approach can help asset managers decide how to sustainably allocate resources by increasing uptime and utilization for expensive equipment.

Originality/value

This is a novel application and case study for decision tree-based machine learning that will aid researchers in developing tools and techniques in this area as well as those working in the artificial intelligence and service management space.

Details

International Journal of Quality & Reliability Management, vol. 37 no. 4
Type: Research Article
ISSN: 0265-671X

Open Access
Article
Publication date: 16 August 2021

Bo Qiu and Wei Fan

Abstract

Purpose

Metropolitan areas suffer from frequent road traffic congestion not only during peak hours but also during off-peak periods. Different machine learning methods have been used in travel time prediction; in practice, however, such methods face the problem of overfitting. Tree-based ensembles have been applied in various prediction fields, and such approaches usually produce high prediction accuracy by aggregating and averaging individual decision trees. These approaches not only yield better prediction results but also offer a good bias-variance trade-off, which can help to avoid overfitting. However, the application of tree-based ensemble algorithms in traffic prediction is still limited. This study aims to improve the accuracy and interpretability of the models by using random forest (RF) to analyze and model the travel time on freeways.

Design/methodology/approach

As traffic conditions often change greatly, prediction results are often unsatisfactory. To improve the accuracy of short-term travel time prediction in the freeway network, a practically feasible and computationally efficient RF prediction method for real-world freeways was developed using probe traffic data. In addition, the variables’ relative importance was ranked, which provides an investigation platform to gain a better understanding of how different contributing factors might affect travel time on freeways.

Findings

The parameters of the RF model were estimated by using the training sample set. After the parameter tuning process was completed, the proposed RF model was developed. The features’ relative importance showed that the variables travel time 15 min before and time of day (TOD) contribute the most to the predicted travel time result. The model performance was also evaluated and compared against the extreme gradient boosting method, and the results indicated that the RF always produces more accurate travel time predictions.

Originality/value

This research developed an RF method to predict the freeway travel time by using the probe vehicle-based traffic data and weather data. Detailed information about the input variables and data pre-processing was presented. To measure the effectiveness of the proposed travel time prediction algorithms, the mean absolute percentage errors were computed for different observation segments combined with different prediction horizons ranging from 15 to 60 min.
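
For reference, the mean absolute percentage error is the average of |actual - predicted| / |actual|, expressed as a percentage. The sketch below is a plain implementation of that standard formula; the travel times in the example are hypothetical and are not taken from the study.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero (travel times are strictly positive).
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical travel times in minutes for one segment and one horizon.
observed = [10.0, 20.0, 40.0]
forecast = [11.0, 18.0, 40.0]
error = mape(observed, forecast)
print(round(error, 2))
```

Computing this value separately per segment and per prediction horizon gives the kind of comparison grid the abstract describes.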

Details

Smart and Resilient Transportation, vol. 3 no. 2
Type: Research Article
ISSN: 2632-0487

Article
Publication date: 3 March 2020

Nesreen El-Rayes, Ming Fang, Michael Smith and Stephen M. Taylor

Abstract

Purpose

The purpose of this study is to develop tree-based binary classification models to predict the likelihood of employee attrition based on firm cultural and management attributes.

Design/methodology/approach

A data set of resumes anonymously submitted through Glassdoor’s online portal is used in tandem with public company review information to fit decision tree, random forest and gradient boosted tree models to predict the probability of an employee leaving a firm during a job transition.

Findings

Random forest and decision tree methods are found to be the strongest attrition prediction models. In addition, compensation, company culture and senior management performance play a primary role in an employee’s decision to leave a firm.

Practical implications

This study may be used by human resources staff to better understand factors which influence employee attrition. In addition, techniques developed in this study may be applied to company-specific data sets to construct customized attrition models.

Originality/value

This study contains several novel contributions which include exploratory studies such as industry job transition percentages, distributional comparisons between factors strongly contributing to employee attrition between those who left or stayed with the firm and the first comprehensive search over binary classification models to identify which provides the strongest predictive performance of employee attrition.

Details

International Journal of Organizational Analysis, vol. 28 no. 6
Type: Research Article
ISSN: 1934-8835

Article
Publication date: 5 July 2022

Mahesh Babu Mariappan, Kanniga Devi, Yegnanarayanan Venkataraman and Samuel Fosso Wamba

Abstract

Purpose

The purpose of this study is to present a large-scale real-world comparative study using pre-COVID lockdown data versus post-COVID lockdown data on predicting shipment times of therapeutic supplies in e-pharmacy supply chains and show that our proposed methodology is robust to lockdown effects.

Design/methodology/approach

The researchers used organic data of over 5.9 million records of therapeutic shipments, with 2.87 million records collected pre-COVID lockdown and 3.03 million records collected post-COVID lockdown. The researchers built various Machine Learning (ML) classifier models on the two datasets, namely, Random Forest (RF), Extra Trees (XRT), Decision Tree (DT), Multi-Layer Perceptron (MLP), XGBoost (XGB), CatBoost (CB), Linear Stochastic Gradient Descent (SGD) and the Linear Naïve Bayes (NB). Then, the researchers stacked these base models and built meta models on top of them. Further, the researchers performed a detailed comparison of the performances of ML models on pre-COVID lockdown and post-COVID lockdown datasets.

Findings

The proposed approach attains performance of 93.5% on real-world post-COVID lockdown data and 91.35% on real-world pre-COVID lockdown data. In contrast, the turn-around times (TAT) provided by therapeutic supply logistics providers are 62.91% accurate compared to reality in post-COVID lockdown times and 73.68% accurate compared to reality pre-COVID lockdown times. Hence, it is clear that while the TAT provided by logistics providers has deteriorated in the post-pandemic business climate, the proposed method is robust to handle pandemic lockdown effects on e-pharmacy supply chains.

Research limitations/implications

The study provides a novel ML-based framework for predicting the shipment times of therapeutics, diagnostics and vaccines that is robust to COVID-19 lockdown effects.

Practical implications

E-pharmacy companies can readily adopt the proposed approach to enhance their supply chain management (SCM) capabilities and build resilience during COVID lockdown times.

Originality/value

The present study is one of the first to perform a large-scale real-world comparative analysis on predicting therapeutic supply shipment times in the e-pharmacy supply chain with novel ML ensemble stacking, obtaining robust results in these COVID lockdown times.

Details

International Journal of Physical Distribution & Logistics Management, vol. 52 no. 7
Type: Research Article
ISSN: 0960-0035

Article
Publication date: 12 November 2019

Sasanka Choudhury, Dhirendra Nath Thatoi, Jhalak Hota and Mohan D. Rao

Abstract

Purpose

To avoid structural defects, early crack detection is one of the important aspects of recent research. The purpose of this paper is to detect a crack before failure by determining its position and severity.

Design/methodology/approach

This paper uses two tree-based regressors, namely, decision tree (DT) regressor and random forest (RF) regressor, for their capabilities to accommodate different types of parameters and generate simple rules by which the method can predict the crack parameters with better accuracy, making it possible to effectively predict crack parameters such as location and depth before failure of the beam.

Findings

The crack parameters can be predicted if the relationship between vibration and crack parameters is established. This relationship yields the beam natural frequencies, which are used as the input values for the regression techniques. It is observed that the RF regressor predicts the parameters with better accuracy than the DT regressor.

Originality/value

The idea is to use the developed regression techniques to identify the crack parameters; these are more effective than other developed methods because predicting such parameters is, by another name, a regression problem. The authors have used DT regressor and RF regressor to achieve the target. In this paper, care has been given to the generalization of the model, so that its adaptability can be ensured. The robustness of the proposed methods has been verified through numerical and experimental analysis.

Details

International Journal of Structural Integrity, vol. 11 no. 6
Type: Research Article
ISSN: 1757-9864

Open Access
Article
Publication date: 24 November 2023

Elena Higueras-Castillo, Helena Alves, Francisco Liébana-Cabanillas and Ángel F. Villarejo-Ramos

Abstract

Purpose

This study proposes a hierarchic segmentation that develops a tree-based classification model and classifies the cases into groups. This allows for the definition of e-commerce user profiles for each of the groups. Additionally, it facilitates the development of actions to improve the adoption of the online channel that is in such high demand in the current pandemic COVID-19 context.

Design/methodology/approach

Regarding the created segments, two extreme segments stand out due to their marked differences and high volume. Segment 3, with 23% of the sample, is the group most predisposed to use the online channel and is characterised by a high level of trust, more habitual use in comparison with other groups and the belief that its use implies high performance, which indicates they believe it to be useful, quick and helpful for a more effective shopping experience. The other extreme is found in segment 7. This group makes up 17.7% of the total and is the most reluctant to use the online channel. These users are characterised by the complete opposite: they have a low level of trust in this channel. However, the effort expectancy is low, i.e. they consider that the adoption of the online channel does not involve many difficulties in its learning and use. Nevertheless, they use it less regularly than the others.

Findings

Based on the conclusions reached in this study, in the current pandemic context in which consumer demand for online shopping channels for all types of products is on the rise, it is recommended that companies focus on the following aspects. It is essential to build trust with the user and show them the real benefits of e-commerce, how it would improve their life and why they should use it. Additionally, it is vital that the user perceives it as an easy procedure that does not require a significant learning curve. Other fundamental aspects would be to reduce any uncertainty the user might have about the online shopping process, to make it as easy as possible, and to design a simple, intuitive and user-friendly interface. It is also recommendable to manage data usage efficiently. To do so, the authors recommend asking the user for the least amount of information possible, offering a data protection policy and assuring them that their information will not be misused nor shared with third parties. All of this provides a series of facilities to modify the online shopping habits of users.

Research limitations/implications

As in most research, this study presents a series of limitations that should be debated and that could open future lines of investigation. Firstly, the sample was limited to two neighbouring countries with a priori similar profiles; it would be necessary to compare their possible cultural differences according to Hofstede's dimensions as well as to increase the number of European countries analysed to reach more generalised conclusions. Secondly, the variables used are a combination of those derived from the UTAUT2 model and others suggested in the literature as decisive in technology adoption by users; in this sense, other theories and variables could be incorporated to complete a more holistic model.

Practical implications

This work contributes in a general way to (1) analysing the intention to use e-commerce platforms from a set of antecedents previously defined by their importance, after a period of economic and social restrictions derived from the pandemic; (2) determining customer segments from the classification made by the CHAID analysis; and (3) characterising the previously defined segments through the successive divisions proposed in the analysis carried out.

Social implications

Other fundamental aspects would be to reduce any uncertainty the user might have about the online shopping process to make it as easy as possible, and to design a simple, intuitive, and user-friendly interface. It is also recommended to manage data usage efficiently. To do so, the authors recommend asking the user for the least amount of information possible, offering a data protection policy, and assuring them that their information will not be misused or shared with third parties.

Originality/value

The results obtained have allowed us to establish predictive and explanatory models of the behaviour of the segments and profiles created, which will help companies to improve their relationships with online customers in the coming years.

Details

European Journal of Management and Business Economics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2444-8451
