Search results

1 – 10 of 396
Open Access
Article
Publication date: 16 August 2021

Bo Qiu and Wei Fan

Metropolitan areas suffer from frequent road traffic congestion not only during peak hours but also during off-peak periods. Different machine learning methods have been used in…

Abstract

Purpose

Metropolitan areas suffer from frequent road traffic congestion, not only during peak hours but also during off-peak periods. Various machine learning methods have been used for travel time prediction; in practice, however, such methods face the problem of overfitting. Tree-based ensembles have been applied in various prediction fields, and such approaches usually produce high prediction accuracy by aggregating and averaging individual decision trees. These approaches not only yield better prediction results but also offer a good bias-variance trade-off, which helps to avoid overfitting. Nevertheless, the application of tree-based ensemble algorithms in traffic prediction remains limited. This study aims to improve the accuracy and interpretability of travel time models by using random forest (RF) to analyze and model travel time on freeways.

Design/methodology/approach

Because traffic conditions often change greatly, prediction results are often unsatisfactory. To improve the accuracy of short-term travel time prediction in the freeway network, a practically feasible and computationally efficient RF prediction method for real-world freeways was developed using probe traffic data. In addition, the variables' relative importance was ranked, providing an investigation platform for a better understanding of how different contributing factors might affect freeway travel time.
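As a rough illustration of the approach described, the sketch below fits an RF regressor to synthetic probe-style data and ranks feature importances. The feature set (lagged travel time, time of day, a rain flag) is an illustrative assumption, not the paper's exact inputs.

```python
# Sketch: RF travel-time regressor on synthetic probe-style data.
# Feature names and the data-generating rule are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.uniform(5, 30, n),    # travel time 15 min earlier (min)
    rng.uniform(0, 24, n),    # time of day (h)
    rng.integers(0, 2, n),    # rain indicator
])
# Current travel time depends mostly on the lagged travel time
y = 0.8 * X[:, 0] + 2.0 * X[:, 2] + rng.normal(0, 1, n)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
importances = rf.feature_importances_  # ranks the contributing factors
```

On data generated this way, the lagged travel time dominates the importance ranking, mirroring the paper's finding that the 15-min-earlier travel time contributes most.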

Findings

The parameters of the RF model were estimated using the training sample set. After the parameter tuning process was completed, the proposed RF model was developed. The features' relative importance showed that the travel time 15 min earlier and the time of day (TOD) contribute the most to the predicted travel time. The model performance was also evaluated against the extreme gradient boosting method, and the results indicated that the RF consistently produces more accurate travel time predictions.

Originality/value

This research developed an RF method to predict freeway travel time using probe vehicle-based traffic data and weather data. Detailed information about the input variables and data pre-processing was presented. To measure the effectiveness of the proposed travel time prediction algorithms, mean absolute percentage errors were computed for different observation segments combined with prediction horizons ranging from 15 to 60 min.
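The reported error metric, mean absolute percentage error, can be computed as below; the travel-time figures are illustrative, not the study's.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in per cent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# e.g. observed vs predicted travel times (min) for one segment/horizon
error = mape([10.0, 20.0, 40.0], [11.0, 18.0, 40.0])  # about 6.67 per cent
```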

Details

Smart and Resilient Transportation, vol. 3 no. 2
Type: Research Article
ISSN: 2632-0487

Article
Publication date: 23 June 2022

Kerim Koc, Ömer Ekmekcioğlu and Asli Pelin Gurgun

Central to the entire discipline of construction safety management is the concept of construction accidents. Although distinctive progress has been made in safety management…

Abstract

Purpose

Central to the entire discipline of construction safety management is the concept of construction accidents. Although distinctive progress has been made in safety management applications over the last decades, the construction industry still accounts for a considerable percentage of all workplace fatalities across the world. This study aims to predict occupational accident outcomes from national data using machine learning (ML) methods coupled with several resampling strategies.

Design/methodology/approach

An occupational accident dataset recorded in Turkey was collected. To deal with the class imbalance between the numbers of nonfatal and fatal accidents, the dataset was pre-processed with random under-sampling (RUS), random over-sampling (ROS) and the synthetic minority over-sampling technique (SMOTE). In addition, random forest (RF), Naïve Bayes (NB), K-nearest neighbor (KNN) and artificial neural networks (ANNs) were employed as ML methods to predict accident outcomes.
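A minimal sketch of the RUS step, assuming a simple binary label; ROS and SMOTE would replace the discard step with duplication or synthetic interpolation, respectively.

```python
import random

def random_under_sample(records, labels, seed=42):
    """Balance a binary dataset by randomly discarding majority-class
    records until both classes are the same size (RUS)."""
    rnd = random.Random(seed)
    by_class = {}
    for rec, lab in zip(records, labels):
        by_class.setdefault(lab, []).append(rec)
    n_min = min(len(v) for v in by_class.values())
    out = []
    for lab, recs in by_class.items():
        for rec in rnd.sample(recs, n_min):
            out.append((rec, lab))
    rnd.shuffle(out)
    return out

# 9 nonfatal vs 3 fatal accidents -> 3 of each after RUS
data = random_under_sample(list(range(12)), ["nonfatal"] * 9 + ["fatal"] * 3)
```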

Findings

The results highlighted that RF outperformed the other methods when the dataset was preprocessed with RUS. The permutation importance results obtained through the RF showed that the number of past accidents in the company, the worker's age, the material used, the number of workers in the company, the accident year and the time of the accident were the most significant attributes.
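The permutation-importance procedure can be sketched as follows on synthetic data; the attribute names and the data-generating rule are illustrative assumptions, not the study's dataset.

```python
# Sketch: permutation importance of an RF classifier on synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(1)
n = 400
# Hypothetical attributes: past accidents in company, worker age, noise
X = np.column_stack([
    rng.integers(0, 20, n),
    rng.integers(18, 65, n),
    rng.normal(0, 1, n),
])
# Synthetic rule: outcome driven mainly by the past-accident count
y = (X[:, 0] + rng.normal(0, 2, n) > 10).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
result = permutation_importance(clf, X, y, n_repeats=10, random_state=0)
ranking = result.importances_mean.argsort()[::-1]  # most important first
```

Permutation importance shuffles one attribute at a time and measures the score drop, so the dominant attribute (here, the past-accident count) tops the ranking.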

Practical implications

The proposed framework can be used on construction sites on a monthly basis to detect workers who have a high probability of experiencing a fatal accident, a valuable decision-making input for safety professionals seeking to reduce the number of fatal accidents.

Social implications

Practitioners and occupational health and safety (OHS) departments of construction firms can focus on the most important attributes identified by the analysis to enhance workers' quality of life and well-being.

Originality/value

The literature on accident outcome predictions is limited in terms of dealing with imbalanced dataset through integrated resampling techniques and ML methods in the construction safety domain. A novel utilization plan was proposed and enhanced by the analysis results.

Details

Engineering, Construction and Architectural Management, vol. 30 no. 9
Type: Research Article
ISSN: 0969-9988

Article
Publication date: 20 August 2018

Laouni Djafri, Djamel Amar Bensaber and Reda Adjoudj

This paper aims to solve the problems of big data analytics for prediction including volume, veracity and velocity by improving the prediction result to an acceptable level and in…

Abstract

Purpose

This paper aims to solve the problems of big data analytics for prediction including volume, veracity and velocity by improving the prediction result to an acceptable level and in the shortest possible time.

Design/methodology/approach

This paper is divided into two parts. The first aims to improve the prediction result. Two ideas are proposed: a double-pruning enhanced random forest algorithm, and extraction of a shared learning base, via stratified random sampling, to obtain a learning base representative of all the original data. The second part proposes a distributed architecture supported by new technology solutions, which works coherently and efficiently with the sampling strategy under the supervision of the MapReduce algorithm.
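The shared-base extraction rests on stratified random sampling, which can be sketched as follows; the 10% fraction and stratification by class label are illustrative assumptions.

```python
import random

def stratified_sample(records, key, frac, seed=0):
    """Draw a stratified random sample: take `frac` of the records
    from each stratum so the sample mirrors the original distribution."""
    rnd = random.Random(seed)
    strata = {}
    for rec in records:
        strata.setdefault(key(rec), []).append(rec)
    sample = []
    for members in strata.values():
        k = max(1, round(frac * len(members)))
        sample.extend(rnd.sample(members, k))
    return sample

# A 10% shared learning base, stratified by class label
data = [{"label": "A"}] * 80 + [{"label": "B"}] * 20
shared = stratified_sample(data, key=lambda r: r["label"], frac=0.1)
```

Each MapReduce worker could then combine its partial base with this shared base to form a representative learning base, as the paper proposes.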

Findings

The representative learning base obtained by integrating two learning bases, the partial base and the shared base, represents the original data set very well and yields very good Big Data predictive analytics results. These results were further supported by the improved random forest supervised learning method, which played a key role in this context.

Originality/value

This work concerns all companies, especially those with large amounts of information they want to mine to improve their customer knowledge and optimize their campaigns.

Details

Information Discovery and Delivery, vol. 46 no. 3
Type: Research Article
ISSN: 2398-6247

Article
Publication date: 26 May 2020

Murat Özemre and Ozgur Kabadurmus

The purpose of this paper is to present a novel framework for strategic decision making using Big Data Analytics (BDA) methodology.

Abstract

Purpose

The purpose of this paper is to present a novel framework for strategic decision making using Big Data Analytics (BDA) methodology.

Design/methodology/approach

In this study, two machine learning algorithms, Random Forest (RF) and Artificial Neural Networks (ANN), are employed to forecast export volumes using an extensive amount of open trade data. The forecast values are placed in the Boston Consulting Group (BCG) Matrix to conduct strategic market analysis.
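Feeding forecasts into the BCG Matrix amounts to a quadrant classification. The sketch below uses the conventional textbook thresholds, which are assumptions and may differ from the study's.

```python
def bcg_quadrant(market_growth, relative_share,
                 growth_cut=0.10, share_cut=1.0):
    """Place a product in the BCG matrix from its forecast market
    growth rate and relative market share (thresholds are the
    conventional defaults, not values from the study)."""
    high_growth = market_growth >= growth_cut
    high_share = relative_share >= share_cut
    if high_growth and high_share:
        return "Star"
    if high_growth:
        return "Question Mark"
    if high_share:
        return "Cash Cow"
    return "Dog"

# Forecast export volumes would feed the growth estimate, e.g.:
quadrant = bcg_quadrant(market_growth=0.15, relative_share=1.4)  # "Star"
```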

Findings

The proposed methodology is validated using a hypothetical case study of a Chinese company exporting refrigerators and freezers. The results show that the proposed methodology makes accurate trade forecasts and helps to conduct strategic market analysis effectively. Also, the RF performs better than the ANN in terms of forecast accuracy.

Research limitations/implications

This study presents only one case study to test the proposed methodology. In future studies, the validity of the proposed method can be further generalized in different product groups and countries.

Practical implications

In today’s highly competitive business environment, effective strategic market analysis requires importers and exporters to make better predictions and strategic decisions. Using the proposed BDA-based methodology, companies can effectively identify new business opportunities and adjust their strategic decisions accordingly.

Originality/value

This is the first study to present a holistic methodology for strategic market analysis using BDA. The proposed methodology accurately forecasts international trade volumes and facilitates the strategic decision-making process by providing future insights into global markets.

Details

Journal of Enterprise Information Management, vol. 33 no. 6
Type: Research Article
ISSN: 1741-0398

Article
Publication date: 28 September 2021

Hafiz Syed Mohsin Abbas, Zahid Hussain Qaisar, Xiaodong Xu and Chunxia Sun

E-government development (EGD) is vital in enhancing the institutional quality and sustainable public service (SPS) delivery by eradicating corruption and cybersecurity crimes.

Abstract

Purpose

E-government development (EGD) is vital in enhancing the institutional quality and sustainable public service (SPS) delivery by eradicating corruption and cybersecurity crimes.

Design/methodology/approach

The present study applied econometric fixed-effect (FE) regression analysis and the random forest (RF) machine learning algorithm for comprehensive estimation of the drivers of SPS. It gauges the nexus between EGD as an independent variable and public service sustainability (PSS), proxied by public health services, as the dependent variable, in the presence of two moderators, corruption and cybersecurity indices, across 47 Asian economies from 2015 to 2019.
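The core of the FE estimator is the "within" transformation: demeaning each country's series to absorb country-level effects before regression. A minimal sketch, with illustrative variable names:

```python
# Minimal sketch of the fixed-effect "within" transformation.
import numpy as np

def within_transform(values, groups):
    """Subtract each group's mean from its observations, absorbing
    group-level (here, country-level) fixed effects."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    out = np.empty_like(values)
    for g in np.unique(groups):
        mask = groups == g
        out[mask] = values[mask] - values[mask].mean()
    return out

egd = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]    # e-government index, two countries
country = ["A", "A", "A", "B", "B", "B"]
demeaned = within_transform(egd, country)   # level gap between A and B removed
```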

Findings

The computational and econometric findings show that EGD quality has improved over time in Asia and has substantially promoted PSS. They further show that exercising corruption-control measures and introducing sound cybersecurity initiatives enhance PSS quality and strengthen the effect of EGD.

Practical implications

The study concludes that e-government has positively impacted PSS (healthcare) in Asia, while controlling cybersecurity risks and institutional malfunctioning makes an e-government system healthier and supports SPS development in Asia.

Originality/value

This study adds a novel contribution to the existing e-government and public services literature by comprehensively applying FE regression and RF algorithm analysis. Moreover, e-government and cybersecurity improvements are also taken into consideration for PSS in Asian economies.

Article
Publication date: 20 January 2022

Carina Titus Swai and Steven Edward Mangowi

The general goal of this paper is to help educators understand the importance of MOOC training to school teachers and their hypothetical value for predicting the use of teaching…

Abstract

Purpose

The general goal of this paper is to help educators understand the importance of MOOC training for school teachers and its hypothetical value for predicting the use of teaching strategies in face-to-face classroom teaching. With this purpose, the study is guided by two research questions: (1) Are there different patterns of preference in teaching strategies among school teachers when they participate in MOOC training? (2) To what extent are the attributes selected from the data set to visualize patterns suitable for the formation of models?

Design/methodology/approach

Peer instruction (PI) and think-pair-share (TPS) strategies can bring positive outcomes during classroom teaching. When introduced properly to school teachers, these strategies help students see the reasoning behind the answers by sharing their responses with other students and thus learning from each other. This study uses educational data mining (EDM) techniques to visualize patterns and propose models, based on the teaching-strategy training, to be used in face-to-face classroom teaching. The data set includes five attributes extracted from school teachers' Massive Open Online Course (MOOC) training interaction data. All analysis and visualization were performed using Python, and the models were evaluated using fivefold cross-validation. The modeling performance of three algorithms (decision tree, random forest and K-means) was tested on the data set, and model accuracy was presented as a confusion matrix. The experimental results indicate that the random forest (RF) algorithm outperforms the decision tree (DT) and K-means algorithms with an accuracy of 96.4%.
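The evaluation pipeline described, fivefold cross-validation summarised as a confusion matrix, can be sketched as follows; the synthetic data stands in for the five MOOC interaction attributes.

```python
# Sketch: fivefold CV of an RF classifier, summarised by a confusion matrix.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in for the five MOOC interaction attributes
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_classes=2, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
y_pred = cross_val_predict(clf, X, y, cv=5)   # fivefold CV predictions
cm = confusion_matrix(y, y_pred)
accuracy = np.trace(cm) / cm.sum()            # diagonal = correct predictions
```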

Findings

This visualization information on the grouping of school teachers based on the teaching strategies serves as an essential reference for school teachers choosing between the two types of strategies within their face-to-face classroom settings. Teachers may use the finding obtained for an initial understanding of which strategies will fit well on their classroom teaching based on their subject majors. Moreover, the classification accuracy rates of DT and RF algorithms were the highest and considered highly significant to allow developing predictive models for similar EDM cases and provide a positive effect on the learning environment.

Research limitations/implications

Unlike predicting different patterns of preference in teaching strategies among school teachers during MOOC training, visualization was found to be much more comfortable, less complicated and more time-efficient for small data sets. Moreover, the classification accuracy rates of the DT and RF algorithms were the highest and considered significant enough to allow developing predictive models for similar EDM cases, with a positive effect on the learning environment.

Practical implications

The DT classifier in this study ranks first before model optimization but second after model optimization in terms of accuracy. Therefore, the goodness of the indicators needs to be studied further to devise a reasonable intervention.

Social implications

A different group of school teachers attending training on teaching strategies on a different online platform is required in future research to cross-validate these findings.

Originality/value

The authors declare that this submission is their own work and to the best of their knowledge it contains no materials previously published or written by another person, or substantial proportions of material that have been accepted for the award of any other degree at any other educational institution.

Details

The International Journal of Information and Learning Technology, vol. 39 no. 1
Type: Research Article
ISSN: 2056-4880

Article
Publication date: 14 May 2020

Byungdae An and Yongmoo Suh

Financial statement fraud (FSF) committed by companies implies the current status of the companies may not be healthy. As such, it is important to detect FSF, since such companies…

Abstract

Purpose

Financial statement fraud (FSF) committed by companies implies that their current status may not be healthy. As such, it is important to detect FSF, since such companies tend to conceal bad information, which causes great losses to various stakeholders. Thus, the objective of this paper is to propose a novel approach to building a classification model that identifies FSF with high classification performance and from which human-readable rules can be extracted to explain why a company is likely to commit FSF.

Design/methodology/approach

Having prepared multiple sub-datasets to cope with the class imbalance problem, we build a set of decision trees for each sub-dataset; select a subset of that set as the sub-dataset's model by removing every tree whose performance is below the average accuracy of all trees in the set; and then select the model showing the best accuracy among those models. We call the resulting model MRF (Modified Random Forest). Given a new instance, we extract rules from the MRF model to explain whether the company corresponding to the new instance is likely to commit FSF.
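The below-average pruning step can be sketched generically; the tree labels and accuracies here are illustrative.

```python
def prune_below_average(trees, accuracies):
    """Keep only the trees whose accuracy is at least the average
    accuracy of the whole set (the pruning step described above)."""
    avg = sum(accuracies) / len(accuracies)
    return [t for t, acc in zip(trees, accuracies) if acc >= avg]

# Five trees with their validation accuracies; the average is 0.77,
# so the two weakest trees are dropped.
trees = ["t1", "t2", "t3", "t4", "t5"]
accs = [0.90, 0.85, 0.80, 0.70, 0.60]
kept = prune_below_average(trees, accs)
```

Applying this pruning to each sub-dataset's tree set, then keeping the best-performing pruned set, yields the MRF described above.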

Findings

Experimental results show that the MRF classifier outperformed the benchmark models. They also revealed that all the variables related to profit belong to the set of the most important indicators of FSF and that two new variables related to gross profit, overlooked in previous studies on FSF, were identified.

Originality/value

This study proposed a method of building a classification model that shows outstanding performance and provides decision rules that can be used to explain the classification results. In addition, a new way to resolve the class imbalance problem was suggested.

Details

Data Technologies and Applications, vol. 54 no. 2
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 1 September 2022

Arthi R., Nayana J.S. and Rajarshee Mondal

Given the benefits offered by quantum key distribution (QKD), including unbreakable security, there is a growing interest in the practical…

Abstract

Purpose

Given the benefits offered by quantum key distribution (QKD), including unbreakable security, there is a growing interest in the practical realization of quantum communication. Realizing an optimal protocol predictor for QKD is a critical step toward its commercialization.

Design/methodology/approach

The proposed work designs machine learning models, namely the K-nearest neighbor (KNN) algorithm, convolutional neural networks, decision tree (DT), support vector machine and random forest (RF), as optimal protocol selectors for a quantum key distribution network (QKDN).

Findings

Given the effectiveness of machine learning methods in predicting effective solutions from data, these models are well suited as optimal protocol selectors for achieving high efficiency in a QKDN. The results show that the best machine learning method for predicting the optimal protocol in QKD is the RF algorithm, validating the effectiveness of machine learning for optimal protocol selection.

Originality/value

Prior approaches to this task used algorithms such as local search or exhaustive traversal; however, their major downside is that they take a very long time to return results, which is unacceptable for commercial systems. Hence, machine learning methods are proposed and their predictive effectiveness for achieving high efficiency is examined.

Details

International Journal of Pervasive Computing and Communications, vol. 19 no. 5
Type: Research Article
ISSN: 1742-7371

Article
Publication date: 17 March 2023

Rui Tian, Ruheng Yin and Feng Gan

Music sentiment analysis helps to promote the diversification of music information retrieval methods. Traditional music emotion classification tasks suffer from high manual…

Abstract

Purpose

Music sentiment analysis helps to promote the diversification of music information retrieval methods. Traditional music emotion classification tasks suffer from a high manual workload and low classification accuracy caused by difficult feature extraction and inaccurate manual determination of hyperparameters. In this paper, the authors propose an optimized convolutional neural network-random forest (CNN-RF) model for music sentiment classification that optimizes the manually selected hyperparameters to improve classification accuracy and reduce labor costs and human classification errors.

Design/methodology/approach

A CNN-RF music sentiment classification model is designed based on quantum particle swarm optimization (QPSO). First, the audio data are transformed into a Mel spectrogram, and features are extracted by a CNN. Second, the extracted music features are processed by the RF algorithm to complete a preliminary emotion classification. Finally, to select suitable hyperparameters for the CNN, the QPSO algorithm is adopted to find the best hyperparameters and obtain the final classification results.
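The QPSO search itself can be sketched as follows, minimising a toy objective that stands in for the CNN's validation loss over its hyperparameters; the swarm size and contraction-expansion coefficient are illustrative assumptions, not the paper's settings.

```python
# Minimal QPSO (quantum-behaved PSO) sketch.
import numpy as np

def qpso(objective, dim, n_particles=20, iters=200, beta=0.75, seed=0):
    """Quantum-behaved PSO: particles are sampled around a local attractor
    instead of carrying velocities, contracting the swarm onto the best
    solutions found so far."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5.0, 5.0, (n_particles, dim))
    pbest = x.copy()
    pbest_val = np.array([objective(pos) for pos in x])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        mbest = pbest.mean(axis=0)                 # mean of personal bests
        phi = rng.uniform(size=(n_particles, dim))
        attractor = phi * pbest + (1.0 - phi) * gbest
        u = rng.uniform(size=(n_particles, dim))
        sign = rng.choice([-1.0, 1.0], size=(n_particles, dim))
        x = attractor + sign * beta * np.abs(mbest - x) * np.log(1.0 / u)
        vals = np.array([objective(pos) for pos in x])
        improved = vals < pbest_val
        pbest[improved] = x[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, float(pbest_val.min())

# Toy stand-in for a CNN's validation loss over two hyperparameters
best, best_val = qpso(lambda v: float(np.sum(v ** 2)), dim=2)
```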

Findings

The model has undergone experimental validation and achieved a classification accuracy of 97 per cent across sentiment categories with a shortened training time. The proposed method with QPSO achieved 1.2 and 1.6 per cent higher accuracy than particle swarm optimization and the genetic algorithm, respectively. The proposed model shows great potential for music sentiment classification.

Originality/value

The dual contribution of this work comprises the proposed model, which integrates two learning models, and the introduction of QPSO into model optimization. With these two innovations, the efficiency and accuracy of music emotion recognition and classification are significantly improved.

Details

Data Technologies and Applications, vol. 57 no. 5
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 19 July 2023

Gaurav Kumar, Molla Ramizur Rahman, Abhinav Rajverma and Arun Kumar Misra

This study aims to analyse the systemic risk emitted by all publicly listed commercial banks in a key emerging economy, India.

Abstract

Purpose

This study aims to analyse the systemic risk emitted by all publicly listed commercial banks in a key emerging economy, India.

Design/methodology/approach

The study uses the Tobias and Brunnermeier (2016) estimator to quantify the systemic risk (ΔCoVaR) that banks contribute to the system. The methodology addresses a classification problem based on the probability that a particular bank will emit high or moderate systemic risk. The study applies machine learning models such as logistic regression, random forest (RF), neural networks and gradient boosting machine (GBM), and addresses the issue of imbalanced data sets, to investigate which balance-sheet and stock features of banks may determine systemic risk emission.
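A simplified, historical-quantile reading of ΔCoVaR can be sketched as follows: the system's VaR conditional on a bank being in distress, minus its VaR conditional on the bank being in its median state. This is an illustrative sketch, not the paper's quantile-regression estimator.

```python
import numpy as np

def delta_covar(bank, system, q=5):
    """Empirical ΔCoVaR: system VaR when the bank is at or below its own
    q% VaR, minus system VaR when the bank is in its median-state band."""
    bank, system = np.asarray(bank), np.asarray(system)
    var_b = np.percentile(bank, q)              # bank's own VaR
    lo, hi = np.percentile(bank, [45, 55])      # "median state" band
    covar_distress = np.percentile(system[bank <= var_b], q)
    covar_median = np.percentile(system[(bank >= lo) & (bank <= hi)], q)
    return covar_distress - covar_median

rng = np.random.default_rng(0)
bank = rng.normal(0, 1, 5000)                       # bank returns
system = 0.8 * bank + 0.2 * rng.normal(0, 1, 5000)  # system co-moves with bank
dcv = delta_covar(bank, system)   # negative: distress deepens system losses
```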

Findings

Across various performance metrics, the authors find that two specifications are preferred: RF and GBM. The study identifies the lag of the systemic risk estimator, stock beta, stock volatility and return on equity as important features explaining the emission of systemic risk.

Practical implications

The findings will provide banks and regulators with the key features that can be used to formulate policy decisions.

Originality/value

This study contributes to the existing literature by suggesting classification algorithms that can be used to model the probability of systemic risk emission in a classification problem setting. Further, the study identifies the features responsible for the likelihood of systemic risk.

Details

Journal of Modelling in Management, vol. 19 no. 2
Type: Research Article
ISSN: 1746-5664
