Search results
1 – 10 of 396

Metropolitan areas suffer from frequent road traffic congestion not only during peak hours but also during off-peak periods. Different machine learning methods have been used in…
Abstract
Purpose
Metropolitan areas suffer from frequent road traffic congestion not only during peak hours but also during off-peak periods. Different machine learning methods have been used in travel time prediction; in practice, however, such methods face the problem of overfitting. Tree-based ensembles have been applied in various prediction fields and usually produce high prediction accuracy by aggregating and averaging individual decision trees. These approaches not only yield better prediction results but also offer a good bias-variance trade-off, which helps to avoid overfitting. Nevertheless, the application of tree-based ensemble algorithms in traffic prediction remains limited. This study aims to improve the accuracy and interpretability of travel time models by using random forest (RF) to analyze and model travel time on freeways.
Design/methodology/approach
Because traffic conditions often change greatly, prediction results are often unsatisfactory. To improve the accuracy of short-term travel time prediction in the freeway network, a practically feasible and computationally efficient RF prediction method for real-world freeways was developed using probe traffic data. In addition, the variables' relative importance was ranked, which provides an investigation platform for better understanding how different contributing factors affect travel time on freeways.
Findings
The parameters of the RF model were estimated using the training sample set. After the parameter tuning process was completed, the proposed RF model was developed. The features' relative importance showed that travel time 15 min earlier and time of day (TOD) contribute the most to the predicted travel time. The model's performance was also evaluated against the extreme gradient boosting method, and the results indicated that the RF consistently produces more accurate travel time predictions.
Originality/value
This research developed an RF method to predict freeway travel time using probe vehicle-based traffic data and weather data. Detailed information about the input variables and data pre-processing is presented. To measure the effectiveness of the proposed travel time prediction algorithms, mean absolute percentage errors were computed for different observation segments combined with prediction horizons ranging from 15 to 60 min.
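The evaluation metric named above is straightforward to state concretely. The following is a minimal sketch of a mean absolute percentage error (MAPE) computation, not code from the study; the travel time values are hypothetical:

```python
# Illustrative sketch: mean absolute percentage error (MAPE), the metric
# used to score travel time predictions per segment and horizon.
def mape(actual, predicted):
    """Return MAPE in percent; 'actual' values must be nonzero."""
    return 100.0 * sum(abs(a - p) / abs(a)
                       for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical observed and predicted travel times (minutes) for one segment.
observed = [12.0, 15.0, 20.0, 18.0]
predicted_15min = [11.0, 16.0, 19.0, 18.5]
error_15 = mape(observed, predicted_15min)  # lower is better
```

A model is then compared across horizons by computing this score once per horizon (15, 30, 45, 60 min) on the same segments.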
Kerim Koc, Ömer Ekmekcioğlu and Asli Pelin Gurgun
Central to the entire discipline of construction safety management is the concept of construction accidents. Although distinctive progress has been made in safety management…
Abstract
Purpose
Central to the entire discipline of construction safety management is the concept of construction accidents. Although distinctive progress has been made in safety management applications over the last decades, the construction industry still accounts for a considerable percentage of all workplace fatalities across the world. This study aims to predict occupational accident outcomes based on national data using machine learning (ML) methods coupled with several resampling strategies.
Design/methodology/approach
An occupational accident dataset recorded in Turkey was collected. To deal with the class imbalance between the numbers of nonfatal and fatal accidents, the dataset was pre-processed with random under-sampling (RUS), random over-sampling (ROS) and the synthetic minority over-sampling technique (SMOTE). In addition, random forest (RF), Naïve Bayes (NB), K-nearest neighbor (KNN) and artificial neural networks (ANNs) were employed as ML methods to predict accident outcomes.
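As a rough illustration of the simplest of the resampling strategies mentioned, here is a minimal pure-Python sketch of random under-sampling (RUS). The function and label names are hypothetical, not taken from the study:

```python
import random

# Illustrative sketch of random under-sampling (RUS): discard majority-class
# records at random until every class has the size of the smallest class.
def random_under_sample(records, labels, seed=0):
    rng = random.Random(seed)
    by_class = {}
    for rec, lab in zip(records, labels):
        by_class.setdefault(lab, []).append(rec)
    n_min = min(len(v) for v in by_class.values())
    balanced = []
    for lab, recs in by_class.items():
        for rec in rng.sample(recs, n_min):  # draw without replacement
            balanced.append((rec, lab))
    return balanced

# Hypothetical imbalanced labels: 8 nonfatal vs 2 fatal accidents.
sample = random_under_sample(list(range(10)),
                             ['nonfatal'] * 8 + ['fatal'] * 2)
```

ROS and SMOTE instead grow the minority class (by duplication and by synthetic interpolation, respectively) rather than shrinking the majority class.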
Findings
The results highlighted that RF outperformed the other methods when the dataset was preprocessed with RUS. The permutation importance results obtained through the RF showed that the number of past accidents in the company, the worker's age, the material used, the number of workers in the company, the accident year and the time of the accident were the most significant attributes.
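Permutation importance, the attribute-ranking technique referred to above, can be sketched in a few lines: shuffle one feature column and measure how much the model's accuracy drops. This is a generic illustration with hypothetical names, not the study's implementation:

```python
import random

# Illustrative sketch of permutation importance. 'model' is any callable
# mapping a feature row to a predicted label; rows are lists of features.
def permutation_importance(model, rows, labels, col, seed=0):
    rng = random.Random(seed)
    base = sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)
    shuffled = [r[col] for r in rows]
    rng.shuffle(shuffled)  # break the column's link to the labels
    perm_rows = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
    perm = sum(model(r) == y for r, y in zip(perm_rows, labels)) / len(rows)
    return base - perm  # accuracy drop = importance

# Hypothetical toy model that only uses column 0; column 1 is irrelevant.
toy_model = lambda r: r[0] > 0
toy_rows = [[1, 5], [-1, 5], [1, 5], [-1, 5]]
toy_labels = [True, False, True, False]
```

Shuffling an irrelevant column leaves accuracy unchanged (importance 0), while shuffling a decisive column degrades it.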
Practical implications
The proposed framework can be used on construction sites on a monthly basis to detect workers who have a high probability of experiencing fatal accidents, which can be a valuable decision-making input for safety professionals seeking to reduce the number of fatal accidents.
Social implications
Practitioners and occupational health and safety (OHS) departments of construction firms can focus on the most important attributes identified by analysis results to enhance the workers' quality of life and well-being.
Originality/value
The literature on accident outcome prediction is limited in terms of dealing with imbalanced datasets through integrated resampling techniques and ML methods in the construction safety domain. A novel utilization plan was proposed and supported by the analysis results.
Laouni Djafri, Djamel Amar Bensaber and Reda Adjoudj
This paper aims to solve the problems of big data analytics for prediction including volume, veracity and velocity by improving the prediction result to an acceptable level and in…
Abstract
Purpose
This paper aims to solve the problems of big data analytics for prediction including volume, veracity and velocity by improving the prediction result to an acceptable level and in the shortest possible time.
Design/methodology/approach
This paper is divided into two parts. The first aims to improve the prediction result. Two ideas are proposed in this part: a double-pruning enhanced random forest algorithm, and the extraction of a shared learning base via stratified random sampling to obtain a learning base representative of all the original data. The second part proposes a distributed architecture supported by new technology solutions, which works coherently and efficiently with the sampling strategy under the supervision of the MapReduce algorithm.
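The stratified random sampling idea above amounts to drawing the same fraction from each class so the sample keeps the original class proportions. A minimal pure-Python sketch, with hypothetical names and not the authors' code:

```python
import random

# Illustrative sketch of stratified random sampling: partition rows by class
# label, then draw the same fraction from each stratum, so the resulting
# learning base mirrors the class proportions of the original data.
def stratified_sample(rows, labels, fraction, seed=0):
    rng = random.Random(seed)
    strata = {}
    for row, lab in zip(rows, labels):
        strata.setdefault(lab, []).append(row)
    sample = []
    for lab, members in strata.items():
        k = max(1, round(fraction * len(members)))
        sample.extend((m, lab) for m in rng.sample(members, k))
    return sample

# Hypothetical data: 70 rows of class 'a', 30 of class 'b'; a 10% sample
# should contain 7 'a' rows and 3 'b' rows.
shared_base = stratified_sample(list(range(100)),
                                ['a'] * 70 + ['b'] * 30, 0.1)
```

In a MapReduce setting, each mapper could apply this routine to its partition, with the reducer merging the per-partition samples into the shared base.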
Findings
The representative learning base, obtained by integrating two learning bases (the partial base and the shared base), represents the original data set very well and gives very good Big Data predictive analytics results. These results were further supported by the improved random forest supervised learning method, which played a key role in this context.
Originality/value
All companies are concerned, especially those holding large amounts of information that wish to mine them to improve their knowledge of the customer and optimize their campaigns.
Murat Özemre and Ozgur Kabadurmus
The purpose of this paper is to present a novel framework for strategic decision making using Big Data Analytics (BDA) methodology.
Abstract
Purpose
The purpose of this paper is to present a novel framework for strategic decision making using Big Data Analytics (BDA) methodology.
Design/methodology/approach
In this study, two different machine learning algorithms, Random Forest (RF) and Artificial Neural Networks (ANN), are employed to forecast export volumes using an extensive amount of open trade data. The forecasted values are then fed into the Boston Consulting Group (BCG) Matrix to conduct strategic market analysis.
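Once export volumes are forecast, placing a product in the BCG Matrix reduces to comparing market growth and relative market share against two cut-offs. The sketch below uses the textbook conventions (10% growth, relative share 1.0); these thresholds and names are illustrative, not values from the study:

```python
# Illustrative sketch: mapping a forecasted market growth rate and a
# relative market share to a BCG Matrix quadrant.
def bcg_quadrant(growth_rate, relative_share,
                 growth_cut=0.10, share_cut=1.0):
    if growth_rate >= growth_cut:
        # high-growth markets: strong share -> star, weak -> question mark
        return "star" if relative_share >= share_cut else "question mark"
    # low-growth markets: strong share -> cash cow, weak -> dog
    return "cash cow" if relative_share >= share_cut else "dog"
```

With ML-forecasted growth rates substituted for historical ones, the same placement rule yields a forward-looking rather than retrospective matrix.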
Findings
The proposed methodology is validated using a hypothetical case study of a Chinese company exporting refrigerators and freezers. The results show that the proposed methodology makes accurate trade forecasts and helps to conduct strategic market analysis effectively. Also, the RF performs better than the ANN in terms of forecast accuracy.
Research limitations/implications
This study presents only one case study to test the proposed methodology. In future studies, the validity of the proposed method can be further generalized in different product groups and countries.
Practical implications
In today’s highly competitive business environment, an effective strategic market analysis requires importers or exporters to make better predictions and strategic decisions. Using the proposed BDA based methodology, companies can effectively identify new business opportunities and adjust their strategic decisions accordingly.
Originality/value
This is the first study to present a holistic methodology for strategic market analysis using BDA. The proposed methodology accurately forecasts international trade volumes and facilitates the strategic decision-making process by providing future insights into global markets.
Hafiz Syed Mohsin Abbas, Zahid Hussain Qaisar, Xiaodong Xu and Chunxia Sun
E-government development (EGD) is vital in enhancing the institutional quality and sustainable public service (SPS) delivery by eradicating corruption and cybersecurity crimes.
Abstract
Purpose
E-government development (EGD) is vital in enhancing the institutional quality and sustainable public service (SPS) delivery by eradicating corruption and cybersecurity crimes.
Design/methodology/approach
The present study applied econometric fixed-effect (FE) regression analysis and the random forest (RF) machine learning algorithm for comprehensive estimation of how SPS is achieved. It gauges the nexus between EGD, as the independent variable, and public service sustainability (PSS), proxied by public health services, as the dependent variable, in the presence of two moderators, the corruption and cybersecurity indices, across 47 Asian economies from 2015 to 2019.
Findings
The computational and econometric findings show that EGD quality has improved over time in Asia and has substantially promoted PSS. They further show that exercising corruption control measures and introducing sound cybersecurity initiatives enhance the quality of PSS and reinforce the effect of EGD.
Practical implications
The study concludes that E-Government has positively impacted PSS (healthcare) in Asia, and that controlling cybersecurity risks and institutional malfunctioning makes an E-Government system healthier and supports SPS development in Asia.
Originality/value
This study adds a novel contribution to the existing E-Government and public services literature by comprehensively applying FE regression and RF algorithm analysis. Moreover, E-Government and cybersecurity improvement are also taken into consideration for PSS in Asian economies.
Carina Titus Swai and Steven Edward Mangowi
The general goal of this paper is to help educators understand the importance of MOOC training to school teachers and their hypothetical value for predicting the use of teaching…
Abstract
Purpose
The general goal of this paper is to help educators understand the importance of MOOC training for school teachers and its hypothetical value for predicting the use of teaching strategies in face-to-face classroom teaching. With this purpose, the study is guided by two research questions: (1) Are there different patterns of preference in teaching strategies among school teachers when they participate in MOOC training? (2) To what extent are the attributes selected from the data set to visualize patterns suitable for the formation of models?
Design/methodology/approach
Peer instruction (PI) and think-pair-share (TPS) strategies might bring positive outcomes during classroom teaching. When introduced properly to school teachers, these strategies help students see the reasoning beyond the answers by sharing their responses with other students and thus learning from each other. This study aims to use educational data mining (EDM) techniques to visualize patterns and propose models, based on the teaching strategies training, to be used in face-to-face classroom teaching. The data set includes five attributes extracted from school teachers' Massive Open Online Course (MOOC) training interaction data. All analysis and visualization were performed using Python, and the models were evaluated using fivefold cross-validation. The modeling performance of three different algorithms (decision tree, random forest and K-means) was tested on the data set, and the results of model accuracy were presented as a confusion matrix. The experimental results indicate that the random forest (RF) algorithm outperforms the decision tree (DT) and K-means algorithms with an accuracy of 96.4%.
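The confusion matrix mentioned above is simply a class-by-class tally of actual versus predicted labels, from which accuracy falls out as the diagonal fraction. A minimal sketch with hypothetical strategy labels, not the study's code:

```python
# Illustrative sketch: confusion matrix and accuracy for a classifier.
# Rows index the actual class, columns the predicted class.
def confusion_matrix(actual, predicted, classes):
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        m[index[a]][index[p]] += 1
    return m

def accuracy(matrix):
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical labels: two teaching-strategy classes, PI and TPS.
cm = confusion_matrix(['PI', 'PI', 'TPS', 'TPS'],
                      ['PI', 'TPS', 'TPS', 'TPS'],
                      ['PI', 'TPS'])
```

Under fivefold cross-validation, one such matrix is accumulated over the five held-out folds before accuracy is reported.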
Findings
This visualization information on the grouping of school teachers based on the teaching strategies serves as an essential reference for school teachers choosing between the two types of strategies within their face-to-face classroom settings. Teachers may use the finding obtained for an initial understanding of which strategies will fit well on their classroom teaching based on their subject majors. Moreover, the classification accuracy rates of DT and RF algorithms were the highest and considered highly significant to allow developing predictive models for similar EDM cases and provide a positive effect on the learning environment.
Research limitations/implications
Unlike predicting different patterns of preference in teaching strategies among school teachers through modeling, visualization was found to be much more comfortable, less complicated and more time-efficient for small data sets.
Practical implications
The DT classifier in this study ranks first in accuracy before model optimization but second after it. Therefore, the goodness of the indicators needs to be further studied to devise a reasonable intervention.
Social implications
A different group of school teachers attending training on teaching strategies on a different online platform is required in future research to cross-validate these findings.
Originality/value
The authors declare that this submission is their own work and to the best of their knowledge it contains no materials previously published or written by another person, or substantial proportions of material that have been accepted for the award of any other degree at any other educational institution.
Financial statement fraud (FSF) committed by companies implies the current status of the companies may not be healthy. As such, it is important to detect FSF, since such companies…
Abstract
Purpose
Financial statement fraud (FSF) committed by companies implies that the current status of those companies may not be healthy. It is therefore important to detect FSF, since such companies tend to conceal bad information, which causes great losses to various stakeholders. The objective of the paper is to propose a novel approach to building a classification model that identifies FSF with high classification performance and from which human-readable rules can be extracted to explain why a company is likely to commit FSF.
Design/methodology/approach
Having prepared multiple sub-datasets to cope with the class imbalance problem, we build a set of decision trees for each sub-dataset; select a subset of that set as the model for the sub-dataset by removing every tree whose performance is below the average accuracy of all trees in the set; and then select the model that shows the best accuracy among the models. We call the resulting model MRF (Modified Random Forest). Given a new instance, we extract rules from the MRF model to explain whether the company corresponding to the instance is likely to commit FSF.
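The tree-selection step described above can be sketched compactly. Here trees are abstracted to name-accuracy pairs; this is an illustration of the rule, not the authors' implementation:

```python
# Illustrative sketch of the MRF selection rule: keep only the trees whose
# individual accuracy is at least the average accuracy of the whole forest.
def prune_forest(tree_accuracies):
    """tree_accuracies: dict mapping tree id -> validation accuracy."""
    avg = sum(tree_accuracies.values()) / len(tree_accuracies)
    return {name: acc for name, acc in tree_accuracies.items() if acc >= avg}

# Hypothetical forest of three trees; average accuracy is 0.75, so the
# below-average tree 'b' is dropped.
kept = prune_forest({'a': 1.0, 'b': 0.5, 'c': 0.75})
```

Applying this per sub-dataset and then keeping the best-scoring pruned model across sub-datasets yields the MRF described above.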
Findings
Experimental results show that the MRF classifier outperformed the benchmark models. The results also revealed that all the variables related to profit belong to the set of the most important indicators of FSF, and two new variables related to gross profit, which had gone unexamined in previous studies on FSF, were identified.
Originality/value
This study proposed a method of building a classification model that shows outstanding performance and provides decision rules that can be used to explain the classification results. In addition, a new way to resolve the class imbalance problem was suggested.
Arthi R., Nayana J.S. and Rajarshee Mondal
The purpose of optimal protocol prediction and the benefits offered by quantum key distribution (QKD), including unbreakable security, there is a growing interest in the practical…
Abstract
Purpose
Given the benefits offered by quantum key distribution (QKD), including unbreakable security, there is growing interest in the practical realization of quantum communication, and in particular in optimal protocol prediction. Realizing an optimal protocol predictor for QKD is a critical step toward its commercialization.
Design/methodology/approach
The proposed work designs machine learning models, namely the K-nearest neighbor algorithm, convolutional neural networks, decision tree (DT), support vector machine and random forest (RF), as optimal protocol selectors for a quantum key distribution network (QKDN).
Findings
Because of the effectiveness of machine learning methods in predicting effective solutions from data, these models are well suited as optimal protocol selectors for achieving high efficiency in a QKDN. The results show that the best machine learning method for predicting the optimal protocol in QKD is the RF algorithm, which also validates the effectiveness of machine learning for optimal protocol selection.
Originality/value
Previous approaches to this problem used algorithms such as local search or exhaustive traversal; however, their major downside is that they take a very long time to return results, which is unacceptable for commercial systems. Hence, machine learning methods are proposed to assess the effectiveness of prediction for achieving high efficiency.
Rui Tian, Ruheng Yin and Feng Gan
Music sentiment analysis helps to promote the diversification of music information retrieval methods. Traditional music emotion classification tasks suffer from high manual…
Abstract
Purpose
Music sentiment analysis helps to promote the diversification of music information retrieval methods. Traditional music emotion classification tasks suffer from a high manual workload and low classification accuracy caused by difficult feature extraction and inaccurate manual determination of hyperparameters. In this paper, the authors propose an optimized convolutional neural network-random forest (CNN-RF) model for music sentiment classification that is capable of optimizing the manually selected hyperparameters, improving the accuracy of music sentiment classification and reducing labor costs and human classification errors.
Design/methodology/approach
A CNN-RF music sentiment classification model is designed based on quantum particle swarm optimization (QPSO). First, the audio data are transformed into a Mel spectrogram and features are extracted by a CNN. Second, the extracted music features are processed by the RF algorithm to complete a preliminary emotion classification. Finally, the QPSO algorithm is adopted to select the most suitable hyperparameters for the CNN and obtain the final classification results.
Findings
The model was experimentally validated and achieved a classification accuracy of 97 per cent across sentiment categories with a shortened training time. The proposed method with QPSO achieved 1.2 and 1.6 per cent higher accuracy than particle swarm optimization and the genetic algorithm, respectively. The proposed model has great potential for music sentiment classification.
Originality/value
The dual contribution of this work comprises the proposed model, which integrates two learning models (a CNN and an RF), and the introduction of QPSO into model optimization. With these two innovations, the efficiency and accuracy of music emotion recognition and classification have been significantly improved.
Gaurav Kumar, Molla Ramizur Rahman, Abhinav Rajverma and Arun Kumar Misra
This study aims to analyse the systemic risk emitted by all publicly listed commercial banks in a key emerging economy, India.
Abstract
Purpose
This study aims to analyse the systemic risk emitted by all publicly listed commercial banks in a key emerging economy, India.
Design/methodology/approach
The study makes use of the Tobias and Brunnermeier (2016) estimator to quantify the systemic risk (ΔCoVaR) that banks contribute to the system. The methodology addresses a classification problem based on the probability that a particular bank will emit high or moderate systemic risk. The study applies machine learning models such as logistic regression, random forest (RF), neural networks and gradient boosting machines (GBM), and addresses the issue of imbalanced data sets, to investigate which balance sheet and stock features of banks may determine systemic risk emission.
Findings
Across various performance metrics, the authors find that two specifications are preferred: RF and GBM. The study identifies the lag of the systemic risk estimator, stock beta, stock volatility and return on equity as important features for explaining the emission of systemic risk.
Practical implications
The findings will help banks and regulators identify the key features that can be used to formulate policy decisions.
Originality/value
This study contributes to the existing literature by suggesting classification algorithms that can be used to model the probability of systemic risk emission in a classification problem setting. Further, the study identifies the features responsible for the likelihood of systemic risk.