Search results
1 – 10 of 260
Rahul Priyadarshi, Akash Panigrahi, Srikanta Routroy and Girish Kant Garg
Abstract
Purpose
The purpose of this study is to select the appropriate forecasting model at the retail stage for selected vegetables on the basis of performance analysis.
Design/methodology/approach
Various forecasting models such as the Box–Jenkins-based auto-regressive integrated moving average model and machine learning-based algorithms such as long short-term memory (LSTM) networks, support vector regression (SVR), random forest regression, gradient boosting regression (GBR) and extreme GBR (XGBoost/XGBR) were proposed and applied (i.e. modeling, training, testing and predicting) at the retail stage for selected vegetables to forecast demand. The performance analysis (i.e. forecasting error analysis) was carried out to select the appropriate forecasting model at the retail stage for selected vegetables.
Findings
From the results obtained for the case environment, the machine learning algorithms LSTM and SVR produced better results than the other demand forecasting models.
Research limitations/implications
The results obtained from the case environment cannot be generalized. However, the approach may be used to forecast demand for other agricultural produce at the retail stage by capturing their demand environment.
Practical implications
The implementation of LSTM and SVR for the case situation at the retail stage will reduce the forecast error, daily retail inventory and fresh produce wastage and will increase the daily revenue.
Originality/value
This study is unique in selecting a demand forecasting model for agricultural produce at the retail stage on the basis of performance analysis, with both traditional and non-traditional models analyzed and compared.
Ian Lenaers, Kris Boudt and Lieven De Moor
Abstract
Purpose
The purpose is twofold. First, this study aims to establish that black box tree-based machine learning (ML) models have better predictive performance than a standard linear regression (LR) hedonic model for rent prediction. Second, it shows the added value of analyzing tree-based ML models with interpretable machine learning (IML) techniques.
Design/methodology/approach
Data on Belgian residential rental properties were collected. Two tree-based ML models, random forest regression and eXtreme gradient boosting regression, were applied to derive rent prediction models, and their predictive performance was compared with that of an LR model. Interpretations of the tree-based models regarding important factors in predicting rent were made using SHapley Additive exPlanations (SHAP) feature importance (FI) plots and SHAP summary plots.
Findings
Results indicate that tree-based models perform better than an LR model for Belgian residential rent prediction. The SHAP FI plots agree that asking price, cadastral income, livable surface area, number of bedrooms, number of bathrooms and variables measuring the proximity to points of interest are dominant predictors. The direction of relationships between rent and its factors is determined with SHAP summary plots. In addition to linear relationships, nonlinear relationships emerge.
Originality/value
Rent prediction using ML is less studied than house price prediction. In addition, studying prediction models using IML techniques is relatively new in real estate economics. Moreover, to the best of the authors’ knowledge, this study is the first to derive insights into the determinants driving predicted rents from SHAP FI and SHAP summary plots.
Jinghui Deng, Qiyou Cheng and Xing Lu
Abstract
Purpose
Helicopter fuselage vibration prediction is important for maintaining a safe and comfortable flight. The helicopter vibration mechanism model struggles to meet the demand for accurate vibration prediction. Thus, the purpose of this paper is to develop an intelligent algorithm for accurate helicopter fuselage vibration analysis.
Design/methodology/approach
In this research, a novel weighted variational mode decomposition (VMD)-extreme gradient boosting (xgboost) helicopter fuselage vibration prediction model is proposed. The vibration data are decomposed and then reconstructed according to the signal clustering results. The vibration response is predicted by the xgboost algorithm based on the reconstructed data. The information transfer order between the controllable flight data and flight attitude is analyzed.
Findings
The mean absolute percentage error (MAPE), root mean square error (RMSE) and mean absolute error (MAE) of the proposed weighted VMD-xgboost model are 6.8%, 31.5% and 32.8% lower, respectively, than those of the xgboost model. The established weighted VMD-xgboost model has the highest prediction accuracy, with the lowest mean MAPE, RMSE and MAE of 4.54%, 0.0162 and 0.0131, respectively. The attitude of the horizontal tail and cycle pitch are the key factors affecting vibration.
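The three error measures named above, and the percentage decrease used to compare two models, can be sketched as follows. This is a minimal illustration under the standard textbook definitions; the function names and the sample values are hypothetical and are not taken from the paper.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumes no actual value is zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error, in the units of the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error, in the units of the data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def relative_decrease(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline

# Made-up illustrative values, not data from the study.
actual = [1.0, 2.0, 4.0]
predicted = [1.1, 1.8, 4.4]
print(mape(actual, predicted), rmse(actual, predicted), mae(actual, predicted))
```

A statement such as "RMSE decreased by 31.5%" would then correspond to `relative_decrease(rmse_baseline, rmse_proposed)` evaluating to 31.5, assuming the authors report relative rather than absolute reductions.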
Originality/value
A novel weighted VMD-xgboost intelligent prediction method is proposed. The prediction performance of the xgboost model is greatly improved by using the signal-weighted reconstruction technique. In addition, the data set used was collected from actual helicopter flights.
Cheng Zhang and Zehao Ye
Abstract
Purpose
Developing physical pipe prediction models consumes considerable resources, and statistical models cannot fit the failure records perfectly. The purpose of this paper is therefore to use a data mining method to analyze and predict the risk of water pipe failure by considering the attributes and locations of pipes in historical failure records. The tree-based pipeline optimization technique (TPOT), an automated machine learning (AutoML) method, was used as the key data mining technique in this research.
Design/methodology/approach
This research proposes a water pipeline failure prediction method that considers pipeline attributes, environmental factors and historical pipeline break records. Regression analysis, genetic algorithms, machine learning and data mining approaches are used to analyze and predict the probability of pipeline failure, with TPOT as the key data mining technique. A case study was carried out in a specific area in China to investigate the relationships between pipeline breaks and relevant parameters such as pipeline age, material, diameter and pipeline density.
Findings
By integrating the prediction models for individual pipelines and small research regions, a prediction model is developed to describe the probability of water pipe failures and is validated with real data. The model achieves a high degree of fit, which indicates good potential for using the proposed method in practice as a guideline for water supply companies to identify high-risk areas, take proactive measures and optimize resource allocation.
Originality/value
Different models are developed to improve prediction for regions and for individual pipelines. A comparison of the predicted values with real records shows that the preliminary model has good potential for predicting future failure risks.
Abstract
Purpose
The purpose of this study is to evaluate the performance of ensemble learning models, such as the Random Forest and Extreme Gradient Boosting models, in predicting the direction of Japan real estate investment trusts (J-REITs) at different return horizons, based on input obtained from various technical indicators.
Design/methodology/approach
This study measures the predictability of J-REITs with technical indicators by using different REIT return horizons and machine learning models. The ensemble learning models include Random Forest and Extreme Gradient Boosting, while the return horizons range from 1 to 300 days. The results were further split into individual years to check the consistency of performance over time.
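One way to picture "direction at different return horizons" is to label each day by whether the price is higher a given number of days later, and then score a classifier against those labels. The sketch below assumes that labelling rule, which the abstract does not spell out; the function names and the price series are hypothetical.

```python
def direction_labels(prices, horizon):
    """Label day t with 1 if the price `horizon` days later is higher, else 0.

    Assumed labelling rule for illustration; the study may define direction differently.
    """
    return [1 if prices[t + horizon] > prices[t] else 0
            for t in range(len(prices) - horizon)]

def directional_accuracy(labels, predictions):
    """Share of days on which the predicted direction matches the label."""
    hits = sum(1 for y, p in zip(labels, predictions) if y == p)
    return hits / len(labels)

# Made-up prices, not J-REIT data.
prices = [100, 101, 99, 102, 103]
one_day = direction_labels(prices, 1)   # [1, 0, 1, 1]
two_day = direction_labels(prices, 2)   # [0, 1, 1]
print(one_day, two_day)
```

Longer horizons leave fewer labelled days at the end of the series, which matters when comparing accuracy across horizons on short samples.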
Findings
Extreme Gradient Boosting appears to be the best method for improving forecast accuracy, but not trading return. Wider return horizons seemed to deliver relatively better performance in both forecast accuracy and trading return than a return horizon of one day.
Practical implications
It is recommended that practitioners consider the Extreme Gradient Boosting and Random Forest models for back-testing trading models. In addition, they should consider selecting different return horizons to achieve better trading or investment performance.
Originality/value
The predictability of J-REITs using technical indicators was compared across different return horizons and models (Extreme Gradient Boosting and Random Forest).
Suraj Kulkarni, Suhas Suresh Ambekar and Manoj Hudnurkar
Abstract
Purpose
Increasing health-care costs are a major concern, especially in the USA. The purpose of this paper is to predict the hospital charges of a patient before admission. This will help a patient who is being admitted “electively” to plan his or her finances. It can also be used as a tool by payers (insurance companies) to better forecast the amount that a patient might claim.
Design/methodology/approach
This research uses secondary data collected from New York state’s patient discharges of 2017. A stratified sampling technique is used to sample the data from the population, and feature engineering is performed on categorical variables. Different regression techniques are used to predict the target value “total charges.”
Findings
Total cost varies linearly with the length of stay. Among all the machine learning algorithms considered, namely, random forest, stochastic gradient descent (SGD) regressor, K nearest neighbors regressor, extreme gradient boosting regressor and gradient boosting regressor, the random forest regressor had the best accuracy, with an R2 value of 0.7753. “Age group” was the most important predictor among all the features.
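The R2 value reported above is the coefficient of determination. A minimal sketch of its usual definition (one minus the ratio of residual to total sum of squares) follows; the function name and the sample numbers are hypothetical, and the paper may have computed it with a library routine instead.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Made-up charges, not values from the New York discharge data.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
print(r_squared(actual, predicted))
```

Under this definition a value of 0.7753 would mean the model accounts for roughly 78% of the variance in total charges on the evaluated data.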
Practical implications
This model can be helpful for patients who want to compare costs at different hospitals and plan their finances accordingly in the case of “elective” admission. Insurance companies can predict how much a patient with a particular medical condition might claim for a hospital admission.
Originality/value
Health care can be a costly affair if not planned properly. This research gives patients and insurance companies a better prediction of the total cost that they might incur.
Julián Martínez-Vargas, Pedro Carmona and Pol Torrelles
Abstract
Purpose
The purpose of this paper is to study the influence on share price formation of different quantitative (traditionally used) and qualitative variables, such as the possible negative effect of certain socio-political factors in particular periods.
Design/methodology/approach
We first descriptively analyse the evolution of the Ibex-35 in recent years and compare it with other international benchmark indices. Next, two techniques are compared: a classic linear regression statistical model (GLM) and a machine learning method called Extreme Gradient Boosting (XGBoost).
Findings
XGBoost yields a very accurate market value prediction model that clearly outperforms the GLM, with a coefficient of determination close to 90%, calculated on validation sets.
Practical implications
According to our analysis, individual accounts are equally or more important than consolidated information in predicting the behaviour of share prices. This would justify Spain maintaining the obligation to present individual interim financial statements, which does not happen in other European Union countries because IAS 34 only stipulates consolidated interim financial statements.
Social implications
The descriptive analysis allows us to see how the Ibex-35 has moved away from international trends, especially in periods in which some relevant socio-political events occurred, such as the independence referendum in Catalonia, the double elections of 2019 or the early handling of the Covid-19 pandemic in 2020.
Originality/value
Compared to other variables, the XGBoost model assigns little importance to socio-political factors when it comes to share price formation; however, this model explains 89.33% of its variance.
Farshad Peiman, Mohammad Khalilzadeh, Nasser Shahsavari-Pour and Mehdi Ravanshadnia
Abstract
Purpose
Earned value management (EVM)–based models for estimating project actual duration (AD) and cost at completion using various methods are continuously developed to improve the accuracy and actualization of predicted values. This study primarily aimed to examine natural gradient boosting (NGBoost-2020) with the classification and regression trees (CART) base model (base learner). To the best of the authors' knowledge, this concept has never been applied to EVM AD forecasting problem. Consequently, the authors compared this method to the single K-nearest neighbor (KNN) method, the ensemble method of extreme gradient boosting (XGBoost-2016) with the CART base model and the optimal equation of EVM, the earned schedule (ES) equation with the performance factor equal to 1 (ES1). The paper also sought to determine the extent to which the World Bank's two legal factors affect countries and how the two legal causes of delay (related to institutional flaws) influence AD prediction models.
Design/methodology/approach
In this paper, data from 30 construction projects of various building types in Iran, Pakistan, India, Turkey, Malaysia and Nigeria (due to the high number of delayed projects and the detrimental effects of these delays in these countries) were used to develop three models. The target variable of the models was a dimensionless output, the ratio of estimated duration to completion (ETC(t)) to planned duration (PD). Furthermore, 426 tracking periods were used to build the three models, with 353 samples and 23 projects in the training set, 73 patterns (17% of the total) and six projects (21% of the total) in the testing set. Furthermore, 17 dimensionless input variables were used, including ten variables based on the main variables and performance indices of EVM and several other variables detailed in the study. The three models were subsequently created using Python and several GitHub-hosted codes.
Findings
For the testing set of the optimal model (NGBoost), the better percentage mean (better%) of the prediction error (based on projects with a lower error percentage) of the NGBoost compared to two KNN and ES1 single models, as well as the total mean absolute percentage error (MAPE) and mean lags (MeLa) (indicating model stability) were 100, 83.33, 5.62 and 3.17%, respectively. Notably, the total MAPE and MeLa for the NGBoost model testing set, which had ten EVM-based input variables, were 6.74 and 5.20%, respectively. The ensemble artificial intelligence (AI) models exhibited a much lower MAPE than ES1. Additionally, ES1 was less stable in prediction than NGBoost. The possibility of excessive and unusual MAPE and MeLa values occurred only in the two single models. However, on some data sets, ES1 outperformed AI models. NGBoost also outperformed other models, especially single models for most developing countries, and was more accurate than previously presented optimized models. In addition, sensitivity analysis was conducted on the NGBoost predicted outputs of 30 projects using the SHapley Additive exPlanations (SHAP) method. All variables demonstrated an effect on ETC(t)/PD. The results revealed that the most influential input variables in order of importance were actual time (AT) to PD, regulatory quality (RQ), earned duration (ED) to PD, schedule cost index (SCI), planned complete percentage, rule of law (RL), actual complete percentage (ACP) and ETC(t) of the ES optimal equation to PD. The probabilistic hybrid model was selected based on the outputs predicted by the NGBoost and XGBoost models and the MAPE values from three AI models. The 95% prediction interval of the NGBoost–XGBoost model revealed that 96.10 and 98.60% of the actual output values of the testing and training sets are within this interval, respectively.
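The closing coverage figures (the share of actual outputs falling inside the 95% prediction interval) can be checked with a simple count. The sketch below assumes per-sample lower and upper bounds are already available from the model; the function name and the numbers are hypothetical and do not reproduce the NGBoost–XGBoost hybrid.

```python
def interval_coverage(actual, lower, upper):
    """Percentage of actual values that lie within [lower, upper], bounds inclusive."""
    inside = sum(1 for a, lo, hi in zip(actual, lower, upper) if lo <= a <= hi)
    return 100.0 * inside / len(actual)

# Made-up ETC(t)/PD-style ratios and bounds, not data from the study.
actual = [1.0, 2.0, 3.0, 4.0]
lower = [0.0, 1.5, 3.5, 3.0]
upper = [2.0, 2.5, 4.0, 5.0]
print(interval_coverage(actual, lower, upper))
```

For a well-calibrated 95% interval, the coverage on held-out data would be expected to sit near 95%, which is how figures such as 96.10% can be read.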
Research limitations/implications
Due to the use of projects performed in different countries, it was not possible to distribute the questionnaire to the managers and stakeholders of 30 projects in six developing countries. Due to the low number of EVM-based projects in various references, it was unfeasible to utilize other types of projects. Future prospects include evaluating the accuracy and stability of NGBoost for timely and non-fluctuating projects (mostly in developed countries), considering a greater number of legal/institutional variables as input, using legal/institutional/internal/inflation inputs for complex projects with extremely high uncertainty (such as bridge and road construction) and integrating these inputs and NGBoost with new technologies (such as blockchain, radio frequency identification (RFID) systems, building information modeling (BIM) and Internet of things (IoT)).
Practical implications
The legal/intuitive recommendations made to governments are strict control of prices, adequate supervision, removal of additional rules, removal of unfair regulations, clarification of the future trend of a law change, strict monitoring of property rights, simplification of the processes for obtaining permits and elimination of unnecessary changes particularly in developing countries and at the onset of irregular projects with limited information and numerous uncertainties. Furthermore, the managers and stakeholders of this group of projects were informed of the significance of seven construction variables (institutional/legal external risks, internal factors and inflation) at an early stage, using time series (dynamic) models to predict AD, accurate calculation of progress percentage variables, the effectiveness of building type in non-residential projects, regular updating inflation during implementation, effectiveness of employer type in the early stage of public projects in addition to the late stage of private projects, and allocating reserve duration (buffer) in order to respond to institutional/legal risks.
Originality/value
Ensemble methods were optimized in 70% of references. To the authors' knowledge, NGBoost from the set of ensemble methods has not previously been used to estimate construction project duration and delays. NGBoost is an effective method for considering uncertainties in irregular projects and is often implemented in developing countries. Furthermore, existing AD estimation models fail to incorporate RQ and RL from the World Bank's worldwide governance indicators (WGI) as risk-based inputs. In addition, the various WGI, EVM and inflation variables are not combined with substantial degrees of delay institutional risks as inputs. Consequently, due to the existence of critical and complex risks in different countries, it is vital to consider legal and institutional factors. This is especially recommended if an in-depth, accurate and reality-based method like SHAP is used for analysis.
Abstract
Purpose
This study aims to examine the association between board gender diversity (BGD) and workplace diversity and the relative importance of various board and firm characteristics in predicting diversity.
Design/methodology/approach
With a novel machine learning (ML) approach, this study models the association between three workplace diversity variables and BGD using a social media data set of approximately 250,000 employee reviews. Using the tools of explainable artificial intelligence, the authors interpret the results of the ML model.
Findings
The results show that BGD has a strong positive association with the gender equality and inclusiveness dimensions of corporate diversity culture. However, BGD is found to have a weak negative association with age diversity in a company. Furthermore, the authors find that workplace diversity is an important predictor of firm value, indicating a possible channel through which BGD affects firm performance.
Originality/value
The effects of BGD on workplace diversity below management levels are mainly omitted in the current corporate governance literature. Furthermore, existing research has not considered different dimensions of this diversity and has mainly focused on its gender aspects. In this study, the authors address this research problem and examine how BGD affects different dimensions of diversity at the overall company level. This study reveals important associations and identifies key variables that should be included as a part of theoretical causal models in future research.