Search results

1 – 10 of 81
Article
Publication date: 6 February 2023

Francina Malan and Johannes Lodewyk Jooste

Abstract

Purpose

The purpose of this paper is to compare the effectiveness of the various text mining techniques that can be used to classify maintenance work-order records into their respective failure modes, focussing on the choice of algorithm and preprocessing transforms. Three algorithms are evaluated, namely Bernoulli Naïve Bayes, multinomial Naïve Bayes and support vector machines.

Design/methodology/approach

The paper has both a theoretical and an experimental component. In the literature review, the various algorithms and preprocessing techniques used in text classification are considered from three perspectives: the domain-specific maintenance literature, the broader short-form literature and the general text classification literature. The experimental component consists of a 5 × 2 nested cross-validation with an inner optimisation loop performed using a randomised search procedure.
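The evaluation design described above can be sketched in a few lines. This is a minimal illustration only, assuming that "5 × 2" means five repetitions of a two-fold split (the abstract does not spell this out); the search space, the function names and the placeholder scoring function are all hypothetical and are not taken from the paper.

```python
import random

def five_by_two_splits(n_samples, repeats=5, seed=0):
    """Yield (train, test) index lists: each repeat shuffles the data,
    cuts it in half, and uses each half once for training and once for testing."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        half = n_samples // 2
        first, second = idx[:half], idx[half:]
        yield first, second
        yield second, first

def random_search(space, n_iter, score, seed=0):
    """Sample n_iter random configurations and return the best-scoring one."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_iter):
        cfg = {name: rng.choice(options) for name, options in space.items()}
        s = score(cfg)
        if s > best_score:
            best_cfg, best_score = cfg, s
    return best_cfg, best_score

# Hypothetical preprocessing choices, loosely based on those named in the abstract.
SPACE = {"ngram_max": [1, 2, 3], "stemming": [True, False], "stop_words": [True, False]}

def toy_score(cfg):
    # Placeholder: a real inner loop would return a cross-validated
    # metric computed on the outer training indices only.
    return cfg["ngram_max"] - (0.5 if cfg["stemming"] else 0.0)

splits = list(five_by_two_splits(20))
best_cfg, best = random_search(SPACE, n_iter=50, score=toy_score)

outer_results = []
for train_idx, test_idx in splits:
    # Inner optimisation runs once per outer split, before touching test_idx.
    cfg, _ = random_search(SPACE, n_iter=10, score=toy_score, seed=len(outer_results))
    outer_results.append((cfg, len(test_idx)))
```

Keeping the randomised search inside each outer split is what makes the procedure nested: the held-out half is never used to choose preprocessing transforms or hyperparameters.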

Findings

From the literature review, the aspects most affected by short document length are identified as the feature representation scheme, higher-order n-grams, document length normalisation, stemming, stop-word removal and algorithm selection. However, from the experimental analysis, the selection of preprocessing transforms seemed more dependent on the particular algorithm than on short document length. Multinomial Naïve Bayes performs marginally better than the other algorithms, but overall, the performances of the optimised models are comparable.

Originality/value

This work highlights the importance of model optimisation, including the selection of preprocessing transforms. Not only did the optimisation improve the performance of all the algorithms substantially, but it also affected model comparisons, with multinomial Naïve Bayes going from the worst to the best performing algorithm.

Details

Journal of Quality in Maintenance Engineering, vol. 29 no. 3
Type: Research Article
ISSN: 1355-2511

Article
Publication date: 23 September 2022

Hossein Sohrabi and Esmatullah Noorzai

Abstract

Purpose

The present study aims to develop a risk-supported case-based reasoning (RS-CBR) approach for water-related projects by incorporating various uncertainties and risks in the revision step.

Design/methodology/approach

The cases were extracted by studying 68 water-related projects. This research employs earned value management (EVM) factors to capture time and cost features; economic, natural, technical and project risks to account for uncertainties; and supervised learning models to estimate cost overrun. Time-series algorithms were also used to predict construction cost indexes (CCI) and model improvements in future forecasts. Outliers were removed during pre-processing. Next, datasets were split into testing and training sets, and algorithms were implemented. The accuracy of different models was measured with the mean absolute percentage error (MAPE) and the normalized root mean square error (NRMSE) criteria.
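The two accuracy criteria named above can be written out as follows. This is a sketch under stated assumptions: the abstract does not say how the RMSE is normalized, so normalization by the range of the actual values is assumed here (normalization by the mean is another common convention), and the sample numbers are invented for illustration.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def nrmse(actual, predicted):
    """RMSE divided by the range of the actual values (assumed normalization)."""
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
    return rmse / (max(actual) - min(actual))

# Invented cost figures, for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

mape_value = mape(actual, predicted)    # two 10% errors and one exact hit
nrmse_value = nrmse(actual, predicted)
```

Both measures are scale-free, which is what allows models fitted to projects of very different sizes to be compared on one footing.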

Findings

The findings show an improvement in the accuracy of predictions using datasets that consider uncertainties, and ensemble algorithms such as Random Forest and AdaBoost had higher accuracy. Also, among the single algorithms, the support vector regressor (SVR) with the sigmoid kernel outperformed the others.

Originality/value

This research is the first attempt to develop a case-based reasoning model based on various risks and uncertainties. The developed model has provided an approving overlap with machine learning models to predict cost overruns. The model has been implemented in collected water-related projects and results have been reported.

Details

Engineering, Construction and Architectural Management, vol. 31 no. 2
Type: Research Article
ISSN: 0969-9988

Article
Publication date: 2 May 2023

Aliakbar Marandi, Misagh Tasavori and Manoochehr Najmi

Abstract

Purpose

This study aims to use big data analysis to shed light on key hotel features that play a role in the revisit intention of customers. In addition, this study endeavors to highlight hotel features for different customer segments.

Design/methodology/approach

This study uses a machine learning method to analyze around 100,000 customer reviews of 100 selected hotels around the world, in which the reviewers indicated on TripAdvisor their intention to return to a particular hotel. The important features of the hotels are then extracted in terms of the 7Ps of the marketing mix. Customers intending to revisit hotels are then segmented based on the similarities in their reviews.

Findings

In total, 71 important hotel features are extracted using text analysis of comments. The most important features are the room, staff, food and accessibility. Also, customers are segmented into 15 groups, and key hotel features important for each segment are highlighted.

Research limitations/implications

In this research, the number of repetitions of words was used to identify key hotel features, whereas sentence-based analysis or group analysis of adjacent words could be used instead.
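The difference between counting single words and counting groups of adjacent words can be illustrated with a short sketch. The tokenizer, the function names and the two sample reviews below are hypothetical and serve only to show the idea; they are not the authors' pipeline.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_counts(reviews):
    """Count how often each single word occurs across all reviews."""
    counts = Counter()
    for review in reviews:
        counts.update(tokenize(review))
    return counts

def bigram_counts(reviews):
    """Count pairs of adjacent words, which keep some local context."""
    counts = Counter()
    for review in reviews:
        tokens = tokenize(review)
        counts.update(zip(tokens, tokens[1:]))
    return counts

# Invented reviews, for illustration only.
reviews = [
    "The room was clean and the staff were friendly.",
    "Friendly staff, but the room was small.",
]

words = word_counts(reviews)
pairs = bigram_counts(reviews)
```

Here `words["room"]` only shows that rooms are mentioned, while pairs such as `("was", "clean")` and `("was", "small")` start to show what is being said about them.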

Practical implications

This study highlights key hotel features that are crucial for customers’ revisit intention and identifies related market segments that can support managers in better designing their strategies and allocating their resources.

Originality/value

By using text mining analysis, this study identifies and classifies important hotel features that are crucial for the revisit intention of customers based on the 7Ps. Methodologically, the authors suggest a comprehensive method to describe the revisit intention of hotel customers based on customer reviews.

Details

International Journal of Contemporary Hospitality Management, vol. 36 no. 1
Type: Research Article
ISSN: 0959-6119

Article
Publication date: 29 December 2023

Thanh-Nghi Do and Minh-Thu Tran-Nguyen

Abstract

Purpose

This study aims to propose novel edge device-tailored federated learning algorithms of local classifiers (stochastic gradient descent, support vector machines), namely, FL-lSGD and FL-lSVM. These algorithms are designed to address the challenge of large-scale ImageNet classification.

Design/methodology/approach

The authors’ FL-lSGD and FL-lSVM train in a parallel and incremental manner to build an ensemble local classifier on Raspberry Pis without requiring data exchange. The algorithms load small data blocks of the local training subset stored on the Raspberry Pi sequentially to train the local classifiers. The data block is split into k partitions using the k-means algorithm, and models are trained in parallel on each data partition to enable local data classification.
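The partition-then-train idea described above can be sketched as follows. This is not the authors' FL-lSGD or FL-lSVM: it assumes a plain Lloyd-style k-means seeded with the first k points, runs sequentially rather than in parallel, and uses a majority-class vote as a stand-in for each local SGD or SVM model. All names and data are hypothetical.

```python
from collections import Counter

def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )

def kmeans(points, k, iters=10):
    """Plain Lloyd iterations; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for j, group in enumerate(groups):
            if group:  # an empty partition keeps its previous centroid
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids

def train_local(points, labels, k):
    """Partition one data block with k-means, then fit one model per partition."""
    centroids = kmeans(points, k)
    votes = [Counter() for _ in range(k)]
    for p, y in zip(points, labels):
        votes[nearest(p, centroids)][y] += 1
    # Stand-in local model: the majority class of each partition.
    models = [v.most_common(1)[0][0] if v else None for v in votes]
    return centroids, models

def predict(point, centroids, models):
    """Route the point to its nearest partition and use that local model."""
    return models[nearest(point, centroids)]

# Invented two-dimensional data block, for illustration only.
X = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (5.1, 4.9), (0.1, 0.3), (4.8, 5.2)]
y = ["cat", "dog", "cat", "dog", "cat", "dog"]
centroids, models = train_local(X, y, k=2)
```

Because each local model only ever sees its own partition, the per-partition training steps are independent of one another, which is the property that makes parallel training on a multi-core device possible.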

Findings

Empirical test results on the ImageNet data set show that the authors’ FL-lSGD and FL-lSVM algorithms with 4 Raspberry Pis (Quad core Cortex-A72, ARM v8, 64-bit SoC @ 1.5GHz, 4GB RAM) are faster than the state-of-the-art LIBLINEAR algorithm run on a PC (Intel(R) Core i7-4790 CPU, 3.6 GHz, 4 cores, 32GB RAM).

Originality/value

Efficiently addressing the challenge of large-scale ImageNet classification, the authors’ novel federated learning algorithms of local classifiers have been tailored to work on the Raspberry Pi. These algorithms can handle 1,281,167 images and 1,000 classes effectively.

Details

International Journal of Web Information Systems, vol. 20 no. 1
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 7 November 2023

Christian Nnaemeka Egwim, Hafiz Alaka, Youlu Pan, Habeeb Balogun, Saheed Ajayi, Abdul Hye and Oluwapelumi Oluwaseun Egunjobi

Abstract

Purpose

The study aims to develop a multilayer high-effective ensemble of ensembles predictive model (stacking ensemble) using several hyperparameter optimized ensemble machine learning (ML) methods (bagging and boosting ensembles) trained with high-volume data points retrieved from Internet of Things (IoT) emission sensors, time-corresponding meteorology and traffic data.

Design/methodology/approach

First, the study tested the big data hypothesis theory by developing sample ensemble predictive models on different data sample sizes and comparing their results. Second, it developed a standalone model and several bagging and boosting ensemble models and compared their results. Finally, it used the best performing bagging and boosting predictive models as input estimators to develop a novel multilayer high-effective stacking ensemble predictive model.

Findings

Results proved data size to be one of the main determinants of ensemble ML predictive power. Second, they proved that, compared with using a single algorithm, the cumulative result from ensemble ML algorithms is usually better in terms of predictive accuracy. Finally, they proved the stacking ensemble to be a better model for predicting PM2.5 concentration level than bagging and boosting ensemble models.

Research limitations/implications

A limitation of this study is the trade-off between the performance of this novel model and the computational time required to train it. Whether this gap can be closed remains an open research question that future research should address. Also, future studies can integrate this novel model into a personal air quality messaging system to inform the public of pollution levels and improve public access to air quality forecasts.

Practical implications

The outcome of this study will help the public to proactively identify highly polluted areas, thus potentially reducing pollution-associated or pollution-triggered COVID-19 (and other lung disease) deaths, complications and transmission by encouraging avoidance behavior, and will support informed lockdown decisions by government bodies when integrated into an air pollution monitoring system.

Originality/value

This study fills a gap in the literature by providing a justification for selecting appropriate ensemble ML algorithms for PM2.5 concentration level predictive modeling. Second, it contributes to the big data hypothesis theory, which suggests that data size is one of the most important factors of ML predictive capability. Third, it supports the premise that when using ensemble ML algorithms, the cumulative output is usually better in terms of predictive accuracy than using a single algorithm. Finally, it develops a novel multilayer high-performant hyperparameter optimized ensemble of ensembles predictive model that can accurately predict PM2.5 concentration levels with improved model interpretability and enhanced generalizability, and it provides a novel databank of historic pollution data from IoT emission sensors that can be purchased for research, consultancy and policymaking.

Details

Journal of Engineering, Design and Technology, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1726-0531

Article
Publication date: 23 November 2022

Ibrahim Karatas and Abdulkadir Budak

Abstract

Purpose

The study aims to compare the prediction success of basic machine learning and ensemble machine learning models and, accordingly, to create novel prediction models by combining machine learning models to increase the prediction success in construction labor productivity prediction models.

Design/methodology/approach

Categorical and numerical data used in many studies in the literature to predict construction labor productivity were preprocessed for analysis. The Python programming language was used to develop machine learning models. After many variation trials, the models were combined to form the proposed novel voting and stacking meta-ensemble machine learning models. Finally, the models were compared using target and Taylor diagrams.

Findings

Meta-ensemble models have been developed for labor productivity prediction by combining machine learning models. A voting ensemble combining the et, gbm, xgboost, lightgbm, catboost and mlp models and a stacking ensemble combining the et, gbm, xgboost, catboost and mlp models were created, with the et model selected as the meta-learner. In terms of prediction success, the voting and stacking meta-ensemble algorithms outperform the other machine learning algorithms. Model evaluation metrics, namely MAE, MSE, RMSE and R2, were selected to measure the prediction success. For the voting meta-ensemble algorithm, the values of MAE, MSE, RMSE and R2 are 0.0499, 0.0045, 0.0671 and 0.7886, respectively. For the stacking meta-ensemble algorithm, the values of MAE, MSE, RMSE and R2 are 0.0469, 0.0043, 0.0658 and 0.7967, respectively.
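For readers unfamiliar with the four metrics reported above, the sketch below shows how they are conventionally computed and how they relate to one another. The function name and the sample values are hypothetical; the definitions are the standard ones and are assumed, not confirmed, to match those used in the study.

```python
import math

def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and R2 using their standard definitions."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)  # RMSE is the square root of MSE
    mean_actual = sum(actual) / n
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - (mse * n) / ss_total  # 1 - SS_residual / SS_total
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

# Invented productivity values, for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
metrics = regression_metrics(actual, predicted)
```

Since RMSE is the square root of MSE, the reported pairs can be checked against each other: the square root of 0.0045 is about 0.0671, and the square root of 0.0043 is about 0.0656, which is consistent with the reported 0.0658 once the rounding of MSE to four decimals is taken into account.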

Research limitations/implications

The study compares machine learning algorithms with newly created meta-ensemble machine learning algorithms for predicting the labor productivity of construction formwork activity. Practitioners and project planners can use this model as a reliable and accurate tool for predicting the labor productivity of construction formwork activity prior to construction planning.

Originality/value

The study provides insight into the application of ensemble machine learning algorithms in predicting construction labor productivity. Additionally, novel meta-ensemble algorithms have been used and proposed. Therefore, it is hoped that predicting the labor productivity of construction formwork activity with high accuracy will make a great contribution to construction project management.

Details

Engineering, Construction and Architectural Management, vol. 31 no. 3
Type: Research Article
ISSN: 0969-9988

Article
Publication date: 16 February 2024

Hossam Mohamed Toma, Ahmed H. Abdeen and Ahmed Ibrahim

Abstract

Purpose

The equipment resale price plays an important role in calculating the optimum time for equipment replacement. Some of the existing models that predict the equipment resale price do not take many of the factors influencing the resale price into account. Other models consider more of these factors, but they still have low accuracy because of the modeling techniques that were used. An easy tool is required to help forecast the resale price and support efficient decisions for equipment replacement. This research presents a machine learning (ML) computer model that helps forecast the equipment resale price accurately.

Design/methodology/approach

A measuring method for the influencing factors that have impacts on the equipment resale price was determined. The values of those factors were measured for 1,700 pieces of equipment and their corresponding resale price. The data were used to develop a ML model that covers three types of equipment (loaders, excavators and bulldozers). The methodology used to develop the model applied three ML algorithms: the random forest regressor, extra trees regressor and decision tree regressor, to find an accurate model for the equipment resale price. The three algorithms were verified and tested with data of 340 pieces of equipment.

Findings

Using a large amount of data to train the ML model resulted in a high-accuracy predictive model. Of the three algorithms used to develop the ML model, the extra trees regressor had the highest accuracy. The accuracy of the model is 98%. A computer interface was designed to make the model easier to use.

Originality/value

The proposed model is accurate and makes it easy to predict the equipment resale price. The predicted resale price can be used to calculate equipment elements that are essential for developing a dependable equipment replacement plan. The proposed model was developed based on the factors that most influence the equipment resale price, and those factors were evaluated using reliable methods. The model was developed using ML, which has proved accurate in modeling. The accuracy of the model, which is 98%, enhances its value.

Details

Engineering, Construction and Architectural Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0969-9988

Article
Publication date: 9 January 2024

Wan-Chen Lee, Li-Min Cassandra Huang and Juliana Hirt

Abstract

Purpose

This study aims to explore the application of emojis to mood descriptions of fiction. The three goals are investigating whether Cho et al.'s model (2023) is a sound conceptual framework for implementing emojis and mood categories in information systems, mapping 30 mood categories to 115 face emojis and exploring and visualizing the relationships between mood categories based on emojis mapping.

Design/methodology/approach

An online survey was distributed to a US public university to recruit adult fiction readers. In total, 64 participants completed the survey.

Findings

The results show that the participants distinguished between the three families of fiction mood categories. The three families model is a promising option to improve mood descriptions for fiction. Through mapping emojis to 30 mood categories, the authors identified the most popular emojis for each category, analyzed the relationships between mood categories and examined participants' consensus on mapping.

Originality/value

This study focuses on applying emojis to fiction reading. Emojis were mapped to mood categories by fiction readers. Emoji mapping contributes to the understanding of the relationships between mood categories. Emojis, as graphic mood descriptors, have the potential to complement textual descriptors and enrich mood metadata for fiction.

Details

Journal of Documentation, vol. 80 no. 2
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 17 April 2024

Jahanzaib Alvi and Imtiaz Arif

Abstract

Purpose

The crux of this paper is to unveil efficient features and practical tools that can predict credit default.

Design/methodology/approach

Annual data of non-financial listed companies were taken from 2000 to 2020, along with 71 financial ratios. The dataset was divided into three panels with three default assumptions. Logistic regression (LR) and k-nearest neighbor (KNN) binary classification algorithms were used to estimate credit default in this research.
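As a reminder of how the KNN classifier mentioned above reaches a decision, here is a minimal sketch. The two features, the training points and the choice of k = 3 are invented for illustration and have no connection to the study's 71 financial ratios or its data.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    top_labels = [label for _, label in ranked[:k]]
    return Counter(top_labels).most_common(1)[0][0]

# Hypothetical two-ratio feature vectors; 1 = default, 0 = no default.
train_X = [(0.9, 0.2), (0.8, 0.3), (0.85, 0.25), (0.2, 0.9), (0.3, 0.8), (0.25, 0.85)]
train_y = [1, 1, 1, 0, 0, 0]

risky = knn_predict(train_X, train_y, (0.82, 0.28))
safe = knn_predict(train_X, train_y, (0.22, 0.88))
```

Unlike logistic regression, which fits one linear decision boundary, KNN decides from local neighbourhoods, so it can follow non-linear class boundaries at the cost of being sensitive to feature scaling and to the choice of k.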

Findings

The study’s findings revealed that the features used in Model 3 (Case 3) were comparatively the most efficient. Results also showed that KNN achieved higher accuracy than LR, indicating the superiority of KNN over LR.

Research limitations/implications

Using only two classifiers limits how comprehensively the results can be compared. In addition, this research was based only on financial data, which leaves considerable room for including non-financial parameters in default estimation. Both limitations may be a direction for future research in this domain.

Originality/value

This study introduces efficient features and tools for credit default prediction using financial data, demonstrating KNN’s superior accuracy over LR and suggesting future research directions.

Details

Kybernetes, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 23 April 2024

Annarita Colamatteo, Marcello Sansone and Giuliano Iorio

Abstract

Purpose

This paper aims to examine the impact of the COVID-19 pandemic on private label food products, specifically assessing the stability and changes in factors influencing purchasing decisions, and comparing pre-pandemic and post-pandemic datasets.

Design/methodology/approach

The study employs the Extra Tree Classifier method, a robust quantitative approach, to analyse data collected from questionnaires distributed among two distinct consumer samples. This methodological choice is explicitly adopted to provide a clear classification of factors influencing consumer preferences for private label products, surpassing conventional qualitative methods.

Findings

Despite the profound disruptions caused by the COVID-19 pandemic, this research underscores the persistent hierarchy of factors shaping consumer choices in the private label food market, showing an overall stability in consumer behaviour. At the same time, the analysis of individual variables highlights an increase in those related to product quality, health, taste, and communication.

Research limitations/implications

The use of online surveys for data collection may introduce a self-selection bias, and the non-probabilistic sampling method could limit the generalizability of the results.

Practical implications

The findings suggest that managers in the private label industry should prioritize enhancing quality control, ensuring effective communication, and dynamically adapting strategies to meet evolving consumer preferences, with a particular emphasis on quality and health attributes.

Originality/value

This study contributes to the existing body of literature by providing insights into the profound transformations induced by the COVID-19 pandemic on consumer behaviour, specifically in relation to their preferences for private label food products.

Details

British Food Journal, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0007-070X
