Search results
1 – 10 of 81
Francina Malan and Johannes Lodewyk Jooste
Abstract
Purpose
The purpose of this paper is to compare the effectiveness of the various text mining techniques that can be used to classify maintenance work-order records into their respective failure modes, focussing on the choice of algorithm and preprocessing transforms. Three algorithms are evaluated, namely Bernoulli Naïve Bayes, multinomial Naïve Bayes and support vector machines.
Design/methodology/approach
The paper has both a theoretical and an experimental component. In the literature review, the various algorithms and preprocessing techniques used in text classification are considered from three perspectives: the domain-specific maintenance literature, the broader short-form literature and the general text classification literature. The experimental component consists of a 5 × 2 nested cross-validation with an inner optimisation loop performed using a randomised search procedure.
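As a hedged illustration only, the 5 × 2 nested cross-validation with an inner randomised search described above can be sketched with scikit-learn. The toy work-order snippets, labels and parameter ranges below are invented placeholders, not the authors' data or settings.

```python
# Sketch: 5x2 nested CV (outer) with a randomised search (inner) over a
# TF-IDF + multinomial Naive Bayes pipeline, in the spirit of the study.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import (RandomizedSearchCV,
                                     RepeatedStratifiedKFold,
                                     cross_val_score)

# Invented maintenance-style snippets with two failure-mode labels.
docs = ["pump seal leak", "motor bearing noise", "seal failure on pump",
        "bearing overheating", "pump leaking badly", "noisy motor bearing"] * 5
labels = np.array([0, 1, 0, 1, 0, 1] * 5)

pipe = Pipeline([("tfidf", TfidfVectorizer()), ("nb", MultinomialNB())])
param_dist = {"tfidf__ngram_range": [(1, 1), (1, 2)],
              "nb__alpha": [0.01, 0.1, 1.0]}

# Inner loop: randomised search over preprocessing and smoothing choices.
inner = RandomizedSearchCV(pipe, param_dist, n_iter=4, cv=2, random_state=0)
# Outer loop: 5 repeats of 2-fold CV gives the 5x2 scheme (10 scores).
outer = RepeatedStratifiedKFold(n_splits=2, n_repeats=5, random_state=0)
scores = cross_val_score(inner, docs, labels, cv=outer)
print(round(scores.mean(), 3))
```

The outer loop estimates generalisation performance while the inner search picks preprocessing transforms and hyperparameters per fold, which is what keeps the optimisation from leaking into the evaluation.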
Findings
From the literature review, the aspects most affected by short document length are identified as the feature representation scheme, higher-order n-grams, document length normalisation, stemming, stop-word removal and algorithm selection. However, from the experimental analysis, the selection of preprocessing transforms seemed more dependent on the particular algorithm than on short document length. Multinomial Naïve Bayes performs marginally better than the other algorithms, but overall, the performances of the optimised models are comparable.
Originality/value
This work highlights the importance of model optimisation, including the selection of preprocessing transforms. Not only did the optimisation improve the performance of all the algorithms substantially, but it also affected model comparisons, with multinomial Naïve Bayes going from the worst to the best performing algorithm.
Hossein Sohrabi and Esmatullah Noorzai
Abstract
Purpose
The present study aims to develop a risk-supported case-based reasoning (RS-CBR) approach for water-related projects by incorporating various uncertainties and risks in the revision step.
Design/methodology/approach
The cases were extracted by studying 68 water-related projects. This research employs earned value management (EVM) factors to capture time and cost features; economic, natural, technical and project risks to account for uncertainties; and supervised learning models to estimate cost overrun. Time-series algorithms were also used to predict construction cost indexes (CCI) and improve future forecasts. Outliers were removed during preprocessing. Next, the datasets were split into training and testing sets and the algorithms were implemented. The accuracy of the different models was measured with the mean absolute percentage error (MAPE) and normalized root mean square error (NRMSE) criteria.
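The two accuracy criteria named above can be written out in a few lines. This is a minimal sketch in plain NumPy; the normalisation convention for NRMSE (dividing the RMSE by the range of the observed values) is an assumption, since the abstract does not state which normalisation the authors used, and the cost figures are invented.

```python
import numpy as np

def mape(y_true, y_pred):
    # Mean absolute percentage error, in percent.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

def nrmse(y_true, y_pred):
    # RMSE normalised by the range of the observations (one common choice).
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return float(rmse / (y_true.max() - y_true.min()))

# Invented cost-overrun figures, purely for illustration.
actual = [100.0, 120.0, 150.0, 130.0]
predicted = [110.0, 115.0, 140.0, 135.0]
print(round(mape(actual, predicted), 3), round(nrmse(actual, predicted), 4))
```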
Findings
The findings show an improvement in the accuracy of predictions using datasets that consider uncertainties, and ensemble algorithms such as Random Forest and AdaBoost had higher accuracy. Also, among the single algorithms, the support vector regressor (SVR) with the sigmoid kernel outperformed the others.
Originality/value
This research is the first attempt to develop a case-based reasoning model based on various risks and uncertainties. The developed model has provided an approving overlap with machine learning models to predict cost overruns. The model has been implemented in collected water-related projects and results have been reported.
Aliakbar Marandi, Misagh Tasavori and Manoochehr Najmi
Abstract
Purpose
This study uses big data analysis to shed light on key hotel features that play a role in customers' revisit intention. In addition, it endeavors to highlight hotel features for different customer segments.
Design/methodology/approach
This study uses a machine learning method to analyze around 100,000 Trip Advisor reviews from customers of 100 selected hotels around the world in which the reviewers indicated their intention to return to a particular hotel. The important hotel features are then extracted in terms of the 7Ps of the marketing mix. The study then segments customers intending to revisit hotels based on similarities in their reviews.
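A rough sketch of this kind of pipeline, vectorising review text and clustering reviewers into segments, is shown below. The reviews, the number of segments and the vectoriser settings are placeholders, not the authors' actual configuration (which used word-frequency counts over ~100,000 reviews and 15 segments).

```python
# Sketch: turn short hotel reviews into TF-IDF vectors, then cluster the
# reviewers into segments by review similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

reviews = ["great room and friendly staff",
           "food was excellent, will return",
           "easy access to the city centre",
           "staff made our stay, lovely room",
           "breakfast and dinner were superb",
           "convenient location, kind staff"]

X = TfidfVectorizer(stop_words="english").fit_transform(reviews)
segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(list(segments))
```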
Findings
In total, 71 important hotel features are extracted using text analysis of comments. The most important features are the room, staff, food and accessibility. Also, customers are segmented into 15 groups, and key hotel features important for each segment are highlighted.
Research limitations/implications
In this research, word-frequency counts were used to identify key hotel features; sentence-based analysis or group analysis of adjacent words could be used in future work.
Practical implications
This study highlights key hotel features that are crucial for customers’ revisit intention and identifies related market segments that can support managers in better designing their strategies and allocating their resources.
Originality/value
By using text mining analysis, this study identifies and classifies important hotel features that are crucial for the revisit intention of customers based on the 7Ps. Methodologically, the authors suggest a comprehensive method to describe the revisit intention of hotel customers based on customer reviews.
Thanh-Nghi Do and Minh-Thu Tran-Nguyen
Abstract
Purpose
This study aims to propose novel edge device-tailored federated learning algorithms of local classifiers (stochastic gradient descent, support vector machines), namely, FL-lSGD and FL-lSVM. These algorithms are designed to address the challenge of large-scale ImageNet classification.
Design/methodology/approach
The authors’ FL-lSGD and FL-lSVM train in a parallel and incremental manner to build an ensemble of local classifiers on Raspberry Pis without requiring data exchange. The algorithms sequentially load small data blocks of the local training subset stored on the Raspberry Pi to train the local classifiers. Each data block is split into k partitions using the k-means algorithm, and models are trained in parallel on each partition to enable local data classification.
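The partition-then-train idea can be sketched as below: a data block is split into k partitions with k-means and one small linear classifier (here, a hinge-loss SGD model) is fitted per partition. The synthetic data, k, the choice of classifier and the majority-vote combination are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # toy binary labels

# Split the block into k partitions with k-means, as in FL-lSGD/FL-lSVM.
k = 4
parts = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

# Fit one local classifier per partition (these could run in parallel),
# then combine them by majority vote, one plain way to use the ensemble.
local_models = [SGDClassifier(random_state=0).fit(X[parts == p], y[parts == p])
                for p in range(k)]
votes = np.stack([m.predict(X) for m in local_models])
pred = (votes.mean(axis=0) >= 0.5).astype(int)
print(round(float((pred == y).mean()), 3))
```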
Findings
Empirical test results on the ImageNet data set show that the authors’ FL-lSGD and FL-lSVM algorithms with 4 Raspberry Pis (Quad core Cortex-A72, ARM v8, 64-bit SoC @ 1.5GHz, 4GB RAM) are faster than the state-of-the-art LIBLINEAR algorithm run on a PC (Intel(R) Core i7-4790 CPU, 3.6 GHz, 4 cores, 32GB RAM).
Originality/value
Efficiently addressing the challenge of large-scale ImageNet classification, the authors’ novel federated learning algorithms of local classifiers have been tailored to work on the Raspberry Pi. These algorithms can handle 1,281,167 images and 1,000 classes effectively.
Christian Nnaemeka Egwim, Hafiz Alaka, Youlu Pan, Habeeb Balogun, Saheed Ajayi, Abdul Hye and Oluwapelumi Oluwaseun Egunjobi
Abstract
Purpose
The study aims to develop a multilayer high-effective ensemble of ensembles predictive model (stacking ensemble) using several hyperparameter optimized ensemble machine learning (ML) methods (bagging and boosting ensembles) trained with high-volume data points retrieved from Internet of Things (IoT) emission sensors, time-corresponding meteorology and traffic data.
Design/methodology/approach
First, the study tested the big data hypothesis by developing sample ensemble predictive models on different data sample sizes and comparing their results. Second, it developed a standalone model and several bagging and boosting ensemble models and compared their results. Finally, it used the best-performing bagging and boosting predictive models as input estimators to develop a novel multilayer stacking ensemble predictive model.
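The final step, feeding bagging and boosting ensembles into a stacking layer, can be sketched as follows. Synthetic regression data stands in for the sensor, meteorology and traffic features, and the specific estimators and settings are assumptions, not the study's tuned models.

```python
# Sketch: an "ensemble of ensembles" where a bagging model (random forest)
# and a boosting model (gradient boosting) are stacked under a linear
# meta-learner.
from sklearn.datasets import make_regression
from sklearn.ensemble import (RandomForestRegressor,
                              GradientBoostingRegressor, StackingRegressor)
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=400, n_features=10, noise=5.0,
                       random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingRegressor(
    estimators=[("bagging", RandomForestRegressor(n_estimators=50,
                                                  random_state=0)),
                ("boosting", GradientBoostingRegressor(random_state=0))],
    final_estimator=Ridge())
score = stack.fit(X_tr, y_tr).score(X_te, y_te)
print(round(score, 3))
```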
Findings
Results proved data size to be one of the main determinants of ensemble ML predictive power. Second, they showed that, compared to using a single algorithm, the cumulative result from ensemble ML algorithms is consistently better in terms of predictive accuracy. Finally, the stacking ensemble proved to be a better model for predicting PM2.5 concentration levels than the bagging and boosting ensemble models.
Research limitations/implications
A limitation of this study is the trade-off between performance of this novel model and the computational time required to train it. Whether this gap can be closed remains an open research question. As a result, future research should attempt to close this gap. Also, future studies can integrate this novel model to a personal air quality messaging system to inform public of pollution levels and improve public access to air quality forecast.
Practical implications
The outcome of this study will help the public proactively identify highly polluted areas, potentially reducing deaths, complications and transmission from pollution-associated or pollution-triggered COVID-19 and other lung diseases by encouraging avoidance behavior, and will support informed lockdown decisions by government bodies when integrated into an air pollution monitoring system.
Originality/value
This study fills a gap in the literature by providing a justification for selecting appropriate ensemble ML algorithms for PM2.5 concentration level predictive modeling. Second, it contributes to the big data hypothesis theory, which suggests that data size is one of the most important factors of ML predictive capability. Third, it supports the premise that when using ensemble ML algorithms, the cumulative output is consistently better in terms of predicted accuracy than using a single algorithm. Finally, it develops a novel multilayer high-performance hyperparameter-optimized ensemble of ensembles predictive model that can accurately predict PM2.5 concentration levels with improved model interpretability and enhanced generalizability, and provides a novel databank of historic pollution data from IoT emission sensors that can be purchased for research, consultancy and policymaking.
Ibrahim Karatas and Abdulkadir Budak
Abstract
Purpose
The study aims to compare the prediction success of basic machine learning and ensemble machine learning models and, accordingly, to create novel prediction models by combining machine learning models to increase prediction success in construction labor productivity prediction.
Design/methodology/approach
Categorical and numerical data used in prediction models in many studies in the literature on construction labor productivity were prepared for analysis by preprocessing. The Python programming language was used to develop the machine learning models. After many variation trials, the models were combined to constitute the proposed novel voting and stacking meta-ensemble machine learning models. Finally, the models were compared using Target and Taylor diagrams.
Findings
Meta-ensemble models were developed for labor productivity prediction by combining machine learning models. A voting ensemble combining the et, gbm, xgboost, lightgbm, catboost and mlp models and a stacking ensemble combining the et, gbm, xgboost, catboost and mlp models were created, with the et model selected as the meta-learner. Considering prediction success, the voting and stacking meta-ensemble algorithms achieved higher prediction success than the other machine learning algorithms. The model evaluation metrics MAE, MSE, RMSE and R2 were selected to measure prediction success. For the voting meta-ensemble algorithm, the values of MAE, MSE, RMSE and R2 are 0.0499, 0.0045, 0.0671 and 0.7886, respectively; for the stacking meta-ensemble algorithm, they are 0.0469, 0.0043, 0.0658 and 0.7967.
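A voting meta-ensemble in the spirit of the et/gbm/mlp combination above can be sketched as follows. The synthetic data, the subset of base learners and their hyperparameters are placeholders, not the authors' tuned configuration, and the printed MAE and R2 values will not match the paper's figures.

```python
# Sketch: average the predictions of tree-based and neural base learners
# with a voting regressor, then evaluate with MAE and R2.
from sklearn.datasets import make_regression
from sklearn.ensemble import (VotingRegressor, ExtraTreesRegressor,
                              GradientBoostingRegressor)
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=8, noise=0.1,
                       random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

voter = VotingRegressor([
    ("et", ExtraTreesRegressor(n_estimators=100, random_state=1)),
    ("gbm", GradientBoostingRegressor(random_state=1)),
    ("mlp", MLPRegressor(max_iter=2000, random_state=1)),
]).fit(X_tr, y_tr)

pred = voter.predict(X_te)
print(round(mean_absolute_error(y_te, pred), 3),
      round(r2_score(y_te, pred), 3))
```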
Research limitations/implications
The study compares machine learning algorithms with the created novel meta-ensemble machine learning algorithms for predicting the labor productivity of construction formwork activity. Practitioners and project planners can use this model as a reliable and accurate tool for predicting the labor productivity of construction formwork activity prior to construction planning.
Originality/value
The study provides insight into the application of ensemble machine learning algorithms in predicting construction labor productivity. Additionally, novel meta-ensemble algorithms have been used and proposed. Therefore, it is hoped that predicting the labor productivity of construction formwork activity with high accuracy will make a great contribution to construction project management.
Hossam Mohamed Toma, Ahmed H. Abdeen and Ahmed Ibrahim
Abstract
Purpose
The equipment resale price plays an important role in calculating the optimum time for equipment replacement. Some of the existing models that predict the equipment resale price do not take many of the factors influencing the resale price into account. Other models consider more of these factors, but they still have low accuracy because of the modeling techniques used. An easy tool is required to help forecast the resale price and support efficient equipment replacement decisions. This research presents a machine learning (ML) computer model that helps forecast the equipment resale price accurately.
Design/methodology/approach
A method for measuring the factors that influence the equipment resale price was determined. The values of those factors were measured for 1,700 pieces of equipment, together with their corresponding resale prices. The data were used to develop a ML model that covers three types of equipment (loaders, excavators and bulldozers). The methodology applied three ML algorithms, namely the random forest regressor, extra trees regressor and decision tree regressor, to find an accurate model of the equipment resale price. The three algorithms were verified and tested with data from 340 pieces of equipment.
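Comparing the three tree-based regressors named above might look like the sketch below. The "resale price" data, the influencing factors and their relationship are invented placeholders, not the study's 1,700-machine dataset.

```python
# Sketch: fit and compare decision tree, random forest and extra trees
# regressors on synthetic resale-price data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 500
age = rng.uniform(1, 15, n)          # years in service (assumed factor)
hours = rng.uniform(500, 20000, n)   # operating hours (assumed factor)
condition = rng.integers(1, 6, n)    # 1 (poor) .. 5 (excellent)
price = (200000 - 8000 * age - 3 * hours + 10000 * condition
         + rng.normal(0, 5000, n))   # toy pricing rule, not real data

X = np.column_stack([age, hours, condition])
X_tr, X_te, y_tr, y_te = train_test_split(X, price, random_state=42)

scores = {}
for name, model in [("decision tree", DecisionTreeRegressor(random_state=42)),
                    ("random forest", RandomForestRegressor(random_state=42)),
                    ("extra trees", ExtraTreesRegressor(random_state=42))]:
    scores[name] = model.fit(X_tr, y_tr).score(X_te, y_te)  # R^2 on test set
print({k: round(v, 3) for k, v in scores.items()})
```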
Findings
Using a large amount of data to train the ML model resulted in a high-accuracy predictive model. Among the three algorithms used to develop the ML model, the extra trees regressor achieved the highest accuracy. The accuracy of the model is 98%. A computer interface was designed to make the model easier to use.
Originality/value
The proposed model is accurate and makes it easy to predict the equipment resale price. The predicted resale price can be used to calculate equipment elements that are essential for developing a dependable equipment replacement plan. The proposed model was developed based on the most influencing factors on the equipment resale price and evaluation of those factors was done using reliable methods. The technique used to develop the model is the ML that proved its accuracy in modeling. The accuracy of the model, which is 98%, enhances the value of the model.
Wan-Chen Lee, Li-Min Cassandra Huang and Juliana Hirt
Abstract
Purpose
This study aims to explore the application of emojis to mood descriptions of fiction. The three goals are: investigating whether Cho et al.'s (2023) model is a sound conceptual framework for implementing emojis and mood categories in information systems; mapping 30 mood categories to 115 face emojis; and exploring and visualizing the relationships between mood categories based on the emoji mapping.
Design/methodology/approach
An online survey was distributed to a US public university to recruit adult fiction readers. In total, 64 participants completed the survey.
Findings
The results show that the participants distinguished between the three families of fiction mood categories. The three families model is a promising option to improve mood descriptions for fiction. Through mapping emojis to 30 mood categories, the authors identified the most popular emojis for each category, analyzed the relationships between mood categories and examined participants' consensus on mapping.
Originality/value
This study focuses on applying emojis to fiction reading. Emojis were mapped to mood categories by fiction readers. Emoji mapping contributes to the understanding of the relationships between mood categories. Emojis, as graphic mood descriptors, have the potential to complement textual descriptors and enrich mood metadata for fiction.
Jahanzaib Alvi and Imtiaz Arif
Abstract
Purpose
The crux of this paper is to unveil efficient features and practical tools that can predict credit default.
Design/methodology/approach
Annual data of non-financial listed companies were taken from 2000 to 2020, along with 71 financial ratios. The dataset was bifurcated into three panels with three default assumptions. Logistic regression (LR) and k-nearest neighbor (KNN) binary classification algorithms were used to estimate credit default in this research.
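A minimal sketch of the LR-versus-KNN comparison is shown below on synthetic "financial ratio" features with an imbalanced default flag. The data generation and the 80/20 class weighting are assumptions for illustration, not the study's panel construction.

```python
# Sketch: compare logistic regression and k-nearest neighbors on a toy
# imbalanced binary default-classification task.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# ~20% "default" cases, standing in for the financial-ratio panels.
X, y = make_classification(n_samples=600, n_features=10,
                           weights=[0.8, 0.2], random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=7)

lr_acc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)
knn_acc = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr).score(X_te, y_te)
print(round(lr_acc, 3), round(knn_acc, 3))
```

Note that on this toy data either classifier may come out ahead; the study's finding that KNN outperformed LR applies to its own panels and features.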
Findings
The study’s findings revealed that the features used in Model 3 (Case 3) were comparatively the most efficient. Results also showed that KNN achieved higher accuracy than LR, indicating KNN’s superiority over LR in this setting.
Research limitations/implications
Using only two classifiers limits this research as a comprehensive comparison, and the research was based only on financial data, which leaves sizeable room for including non-financial parameters in default estimation. Both limitations may be directions for future research in this domain.
Originality/value
This study introduces efficient features and tools for credit default prediction using financial data, demonstrating KNN’s superior accuracy over LR and suggesting future research directions.
Annarita Colamatteo, Marcello Sansone and Giuliano Iorio
Abstract
Purpose
This paper aims to examine the impact of the COVID-19 pandemic on private label food products, specifically assessing the stability of, and changes in, the factors influencing purchasing decisions by comparing pre-pandemic and post-pandemic datasets.
Design/methodology/approach
The study employs the Extra Tree Classifier method, a robust quantitative approach, to analyse data collected from questionnaires distributed among two distinct consumer samples. This methodological choice is explicitly adopted to provide a clear classification of factors influencing consumer preferences for private label products, surpassing conventional qualitative methods.
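One way such a classifier can rank influencing factors is through its feature importances, as in the hedged sketch below. The feature names, the toy survey data and the relationship between "quality" and purchase choice are invented for illustration, not the paper's questionnaire items or results.

```python
# Sketch: fit an extra trees classifier and rank factors by importance.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(3)
n = 400
quality = rng.normal(size=n)
price = rng.normal(size=n)
packaging = rng.normal(size=n)
# Toy purchase flag driven mainly by perceived quality in this setup.
buys = (quality + 0.2 * price + rng.normal(scale=0.5, size=n) > 0).astype(int)

X = np.column_stack([quality, price, packaging])
clf = ExtraTreesClassifier(n_estimators=200, random_state=3).fit(X, buys)
for name, imp in zip(["quality", "price", "packaging"],
                     clf.feature_importances_):
    print(name, round(imp, 3))
```

In this toy setup "quality" should receive the largest importance, mirroring how the method surfaces which factors dominate consumer preferences.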
Findings
Despite the profound disruptions caused by the COVID-19 pandemic, this research underscores the persistent hierarchy of factors shaping consumer choices in the private label food market, showing an overall stability in consumer behaviour. At the same time, the analysis of individual variables highlights the positive increase in those related to product quality, health, taste, and communication.
Research limitations/implications
The use of online surveys for data collection may introduce a self-selection bias, and the non-probabilistic sampling method could limit the generalizability of the results.
Practical implications
Practical implications suggest that managers in the private label industry should prioritize enhancing quality control, ensuring effective communication, and dynamically adapting strategies to meet evolving consumer preferences, with a particular emphasis on quality and health attributes.
Originality/value
This study contributes to the existing body of literature by providing insights into the profound transformations induced by the COVID-19 pandemic on consumer behaviour, specifically in relation to their preferences for private label food products.