Search results
Ibrahim Karatas and Abdulkadir Budak
Abstract
Purpose
The study aims to compare the prediction success of basic machine learning and ensemble machine learning models and, accordingly, to create novel prediction models by combining machine learning models to increase prediction success in construction labor productivity prediction.
Design/methodology/approach
Categorical and numerical data used in many prior studies to predict construction labor productivity were preprocessed for analysis. The Python programming language was used to develop the machine learning models. After many trial variations, the models were combined to form the proposed novel voting and stacking meta-ensemble machine learning models. Finally, the models were compared using Target and Taylor diagrams.
Findings
Meta-ensemble models have been developed for labor productivity prediction by combining machine learning models. A voting ensemble combining the ET, GBM, XGBoost, LightGBM, CatBoost and MLP models and a stacking ensemble combining the ET, GBM, XGBoost, CatBoost and MLP models were created, with the ET model selected as the meta-learner. The voting and stacking meta-ensemble algorithms were found to have higher prediction success than the other machine learning algorithms. Prediction success was measured with the model evaluation metrics MAE, MSE, RMSE and R2. For the voting meta-ensemble algorithm, the values of MAE, MSE, RMSE and R2 are 0.0499, 0.0045, 0.0671 and 0.7886, respectively. For the stacking meta-ensemble algorithm, they are 0.0469, 0.0043, 0.0658 and 0.7967, respectively.
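As a rough illustration of how the four reported metrics relate to one another, the sketch below computes MAE, MSE, RMSE and R2 for an averaged ("voting") prediction. It is not the authors' code: the function names, the sample values and the choice of an unweighted average are all assumptions made for illustration, since the abstract does not say how the votes were combined.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)  # RMSE is the square root of MSE
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot  # 1 - SS_res / SS_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

def voting_prediction(model_predictions):
    """Average per-sample predictions from several models (unweighted; an assumption)."""
    return [sum(col) / len(col) for col in zip(*model_predictions)]

# Hypothetical productivity values and predictions from three made-up models.
y_true = [0.50, 0.62, 0.41, 0.73]
preds = [
    [0.48, 0.60, 0.45, 0.70],
    [0.55, 0.65, 0.40, 0.75],
    [0.47, 0.61, 0.38, 0.71],
]
ensemble = voting_prediction(preds)
metrics = regression_metrics(y_true, ensemble)
```

Because RMSE is defined as the square root of MSE, the two values in any reported row should agree up to rounding.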
Research limitations/implications
The study compares machine learning algorithms and creates novel meta-ensemble machine learning algorithms to predict the labor productivity of construction formwork activity. Practitioners and project planners can use this model as a reliable and accurate tool for predicting the labor productivity of construction formwork activity prior to construction planning.
Originality/value
The study provides insight into the application of ensemble machine learning algorithms in predicting construction labor productivity. Additionally, novel meta-ensemble algorithms have been used and proposed. Therefore, it is hoped that predicting the labor productivity of construction formwork activity with high accuracy will make a great contribution to construction project management.
Daniel Šandor and Marina Bagić Babac
Abstract
Purpose
Sarcasm is a linguistic expression that usually carries the opposite meaning of what is being said by words, thus making it difficult for machines to discover the actual meaning. It is mainly distinguished by the inflection with which it is spoken, with an undercurrent of irony, and is largely dependent on context, which makes it a difficult task for computational analysis. Moreover, sarcasm expresses negative sentiments using positive words, allowing it to easily confuse sentiment analysis models. This paper aims to demonstrate the task of sarcasm detection using the approach of machine and deep learning.
Design/methodology/approach
For the purpose of sarcasm detection, machine and deep learning models were used on a data set consisting of 1.3 million social media comments, including both sarcastic and non-sarcastic comments. The data set was pre-processed using natural language processing methods, and additional features were extracted and analysed. Several machine learning models, including logistic regression, ridge regression, linear support vector and support vector machines, along with two deep learning models based on bidirectional long short-term memory and one bidirectional encoder representations from transformers (BERT)-based model, were implemented, evaluated and compared.
Findings
The performance of machine and deep learning models was compared in the task of sarcasm detection, and possible ways of improvement were discussed. Deep learning models showed more promise, performance-wise, for this type of task. Specifically, a state-of-the-art model in natural language processing, namely, BERT-based model, outperformed other machine and deep learning models.
Originality/value
This study compared the performance of the various machine and deep learning models in the task of sarcasm detection using the data set of 1.3 million comments from social media.
Marko Kureljusic and Jonas Metz
Abstract
Purpose
The accurate prediction of incoming cash flows enables more effective cash management and allows firms to shape their planning based on forward-looking information. Although most firms are aware of the benefits of these forecasts, many still have difficulties identifying and implementing an appropriate prediction model. With the rise of machine learning algorithms, numerous new forecasting techniques have emerged. These new forecasting techniques are theoretically applicable for predicting customer payment behavior but have not yet been adequately investigated. This study aims to close this research gap by examining which machine learning algorithm is the most appropriate for predicting customer payment dates.
Design/methodology/approach
By using various machine learning algorithms, the authors evaluate whether customer payment behavior patterns can be identified and predicted. The study is based on real-world transaction data from a DAX-40 firm with over 1,000,000 invoices in the dataset, with the data covering the period 2017–2019.
Findings
The authors' results show that neural networks in particular are suitable for predicting customers' payment dates. Furthermore, the authors demonstrate that contextual and logical prediction models can provide more accurate forecasts than conventional baseline models, such as linear and multivariate regression.
Research limitations/implications
Future cash flow forecasting studies should incorporate naïve prediction models, as the authors demonstrate that these models can compete with conventional baseline models used in existing machine learning research. However, the authors expect that with more in-depth information about the customer (creditworthiness, accounting structure) the results can be even further improved.
Practical implications
The knowledge of customers' future payment dates enables firms to change their perspective and move from reactive to proactive cash management. This shift leads to a more targeted dunning process.
Originality/value
To the best of the authors' knowledge, no study has yet been conducted that interprets the prediction of incoming payments as a daily rolling forecast by comparing naïve forecasts with forecasts based on machine learning and deep learning models.
Karlo Puh and Marina Bagić Babac
Abstract
Purpose
As the tourism industry becomes more vital for the success of many economies around the world, the importance of technology in tourism grows daily. Alongside increasing tourism importance and popularity, the amount of significant data grows, too. On a daily basis, millions of people write their opinions, suggestions and views about accommodation, services, and much more on various websites. Well-processed and filtered data can provide a lot of useful information that can be used to make tourists' experiences much better and to help them decide when selecting a hotel or a restaurant. Thus, the purpose of this study is to explore machine and deep learning models for predicting sentiment and rating from tourist reviews.
Design/methodology/approach
This paper used machine learning models such as Naïve Bayes, support vector machines (SVM), convolutional neural network (CNN), long short-term memory (LSTM) and bidirectional long short-term memory (BiLSTM) for extracting sentiment and ratings from tourist reviews. These models were trained to classify reviews into positive, negative, or neutral sentiment, and into one to five grades or stars. Data used for training the models were gathered from TripAdvisor, the world's largest travel platform. The models based on multinomial Naïve Bayes (MNB) and SVM were trained using the term frequency-inverse document frequency (TF-IDF) for word representations while deep learning models were trained using global vectors (GloVe) for word representation. The results from testing these models are presented, compared and discussed.
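To make the TF-IDF word representation mentioned above concrete, the sketch below computes textbook TF-IDF weights (term frequency times log(N / document frequency)) in plain Python. It is an illustration only: the function name and the three sample reviews are invented, and the paper's actual tokenization and weighting variant are not stated in the abstract. Library implementations often add smoothing, so their numbers may differ.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical mini-corpus of tourist reviews.
reviews = ["great hotel great staff", "dirty room rude staff", "great room"]
vectors = tf_idf(reviews)
```

Terms that occur in every document receive a weight of zero under this variant, which is why very common words contribute little to the classifier's input.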
Findings
The machine and deep learning models achieved high accuracy in predicting positive, negative, or neutral sentiments and ratings from tourist reviews. The optimal model architecture for both classification tasks was a deep learning model based on BiLSTM. The study’s results confirmed that deep learning models are more efficient and accurate than machine learning algorithms.
Practical implications
The proposed models allow for forecasting the number of tourist arrivals and expenditure, gaining insights into the tourists' profiles, improving overall customer experience, and upgrading marketing strategies. Different service sectors can use the implemented models to get insights into customer satisfaction with the products and services as well as to predict the opinions given a particular context.
Originality/value
This study developed and compared different machine learning models for classifying customer reviews as positive, negative, or neutral, as well as predicting ratings with one to five stars based on a TripAdvisor hotel reviews dataset that contains 20,491 unique hotel reviews.
Aishwarya Narang, Ravi Kumar and Amit Dhiman
Abstract
Purpose
This study seeks to understand the connection of methodology by finding relevant papers and their full review using the “Preferred Reporting Items for Systematic Reviews and Meta-Analyses” (PRISMA).
Design/methodology/approach
Concrete-filled steel tubular (CFST) columns have gained popularity in construction in recent decades as they offer the benefit of constituent materials and cost-effectiveness. Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), Gene Expression Programming (GEP) and Decision Trees (DTs) are some of the approaches that have been widely used in recent decades in structural engineering to construct predictive models, resulting in effective and accurate decision making. Despite the fact that there are numerous research studies on the various parameters that influence the axial compression capacity (ACC) of CFST columns, there is no systematic review of these Machine Learning methods.
Findings
The implications of a variety of structural characteristics on machine learning performance parameters are addressed and reviewed. The comparison analysis of current design codes and machine learning tools to predict the performance of CFST columns is summarized. The discussion results indicate that machine learning tools better understand complex datasets and intricate testing designs.
Originality/value
This study examines machine learning techniques for forecasting the axial bearing capacity of concrete-filled steel tubular (CFST) columns. This paper also highlights the drawbacks of utilizing existing techniques to build CFST columns, and the benefits of Machine Learning approaches over them. This article attempts to introduce beginners and experienced professionals to various research trajectories.
Rajshree Varma, Yugandhara Verma, Priya Vijayvargiya and Prathamesh P. Churi
Abstract
Purpose
The rapid advancement of technology in online communication and fingertip access to the Internet has resulted in the expedited dissemination of fake news to engage a global audience at a low cost by news channels, freelance reporters and websites. Amid the coronavirus disease 2019 (COVID-19) pandemic, individuals are exposed to these false and potentially harmful claims and stories, which may harm the vaccination process. Psychological studies reveal that the human ability to detect deception is only slightly better than chance; therefore, there is a growing need to develop automated strategies to combat fake news that traverses these platforms at an alarming rate. This paper systematically reviews the existing fake news detection technologies by exploring various machine learning and deep learning techniques pre- and post-pandemic, which has never been done before to the best of the authors’ knowledge.
Design/methodology/approach
The detailed literature review on fake news detection is divided into three major parts. The authors searched for papers no later than 2017 on fake news detection approaches based on deep learning and machine learning. The papers were initially searched through the Google Scholar platform and were scrutinized for quality. The authors kept “Scopus” and “Web of Science” as quality indexing parameters. All research gaps and available databases, data pre-processing, feature extraction techniques and evaluation methods for current fake news detection technologies have been explored and illustrated using tables, charts and trees.
Findings
The paper is divided into two approaches, namely machine learning and deep learning, to present a better understanding and a clear objective. Next, the authors present a viewpoint on which approach is better and on future research trends, issues and challenges for researchers, given the relevance and urgency of a detailed and thorough analysis of existing models. This paper also delves into fake news detection during COVID-19, and it can be inferred that research and modeling are shifting toward the use of ensemble approaches.
Originality/value
The study also identifies several novel automated web-based approaches used by researchers to assess the validity of pandemic news that have proven to be successful, although currently reported accuracy has not yet reached consistent levels in the real world.
Simone Massulini Acosta and Angelo Marcio Oliveira Sant'Anna
Abstract
Purpose
Process monitoring is a way to manage the quality characteristics of products in manufacturing processes. Several process monitoring approaches based on machine learning algorithms have been proposed in the literature and have gained the attention of many researchers. In this paper, the authors developed machine learning-based control charts for monitoring the fraction of non-conforming products in smart manufacturing. This study proposed a relevance vector machine using a Bayesian sparse kernel optimized by a differential evolution algorithm for efficient monitoring in manufacturing.
Design/methodology/approach
A new approach to data analysis, modelling and monitoring in the manufacturing industry was carried out. This study developed a relevance vector machine using the Bayesian sparse kernel technique to improve on the support vector machine used for both regression and classification problems. The authors compared the performance of the proposed relevance vector machine with other machine learning algorithms, such as support vector machine, artificial neural network and beta regression model. The proposed approach was evaluated under different shift scenarios in terms of average run length using Monte Carlo simulation.
Findings
The authors analyse a real case study in a manufacturing company, based on the best machine learning algorithms. The results indicate that the proposed relevance vector machine-based process monitoring is an excellent quality tool for monitoring defective products in the manufacturing process. A comparative analysis with four machine learning models is used to evaluate the performance of the proposed approach. The relevance vector machine has slightly better performance than the support vector machine, artificial neural network and beta models.
Originality/value
This research differs from others by providing approaches for monitoring defective products. Machine learning-based control charts are used to monitor product failures in smart manufacturing processes. The key contribution of this study is to develop different models for fault detection and to identify any change point in the manufacturing process. Moreover, the authors’ research indicates that machine learning models are adequate tools for the modelling and monitoring of the fraction of non-conforming products in the industrial process.
Abstract
Purpose
This study updates the literature review of Jones (1987) published in this journal. The study pays particular attention to two important themes that have shaped the field over the past 35 years: (1) the development of a range of innovative new statistical learning methods, particularly advanced machine learning methods such as stochastic gradient boosting, adaptive boosting, random forests and deep learning, and (2) the emergence of a wide variety of bankruptcy predictor variables extending beyond traditional financial ratios, including market-based variables, earnings management proxies, auditor going concern opinions (GCOs) and corporate governance attributes. Several directions for future research are discussed.
Design/methodology/approach
This study provides a systematic review of the corporate failure literature over the past 35 years with a particular focus on the emergence of new statistical learning methodologies and predictor variables. This synthesis of the literature evaluates the strengths and limitations of different modelling approaches under different circumstances and provides an overall evaluation of the relative contribution of alternative predictor variables. The study aims to provide a transparent, reproducible and interpretable review of the literature. The literature review also takes a theme-centric rather than author-centric approach and focuses on structured themes that have dominated the literature since 1987.
Findings
There are several major findings of this study. First, advanced machine learning methods appear to have the most promise for future firm failure research. Not only do these methods predict significantly better than conventional models, but they also possess many appealing statistical properties. Second, there is now a much wider range of variables being used to model and predict firm failure. However, the literature needs to be interpreted with some caution given the many mixed findings. Finally, there are still a number of unresolved methodological issues arising from the Jones (1987) study that require research attention.
Originality/value
The study explains the connections and derivations between a wide range of firm failure models, from simpler linear models to advanced machine learning methods such as gradient boosting, random forests, adaptive boosting and deep learning. The paper highlights the most promising models for future research, particularly in terms of their predictive power, underlying statistical properties and issues of practical implementation. The study also draws together an extensive literature on alternative predictor variables and provides insights into the role and behaviour of alternative predictor variables in firm failure research.
Deepak Suresh Asudani, Naresh Kumar Nagwani and Pradeep Singh
Abstract
Purpose
Classifying emails as ham or spam based on their content is essential. Determining the semantic and syntactic meaning of words and putting them in a high-dimensional feature vector form for processing is the most difficult challenge in email categorization. The purpose of this paper is to examine the effectiveness of the pre-trained embedding model for the classification of emails using deep learning classifiers such as the long short-term memory (LSTM) model and convolutional neural network (CNN) model.
Design/methodology/approach
In this paper, global vectors (GloVe) and Bidirectional Encoder Representations from Transformers (BERT) pre-trained word embeddings are used to identify relationships between words, which helps to classify emails into their relevant categories using machine learning and deep learning models. Two benchmark datasets, SpamAssassin and Enron, are used in the experimentation.
Findings
In the first set of experiments, among the machine learning classifiers, the support vector machine (SVM) model performs better than the other machine learning methodologies. The second set of experiments compares deep learning model performance without embedding, with GloVe embedding and with BERT embedding. The experiments show that GloVe embedding can be helpful for faster execution with better performance on large-sized datasets.
Originality/value
The experiment reveals that the CNN model with GloVe embedding gives slightly better accuracy than the model with BERT embedding and traditional machine learning algorithms when classifying an email as ham or spam. It is concluded that word embedding models improve the accuracy of email classifiers.
Ning Yan and Oliver Tat-Sheung Au
Abstract
Purpose
The purpose of this paper is to analyze the correlation between students’ online learning behavior features and course grade, and to attempt to build an effective prediction model based on limited data.
Design/methodology/approach
The prediction label in this paper is the course grade of students, and the available features are student age, student gender, connection time, hits count and days of access. The machine learning model used in this paper is a classical three-layer feedforward neural network, trained with the scaled conjugate gradient algorithm. The Pearson correlation analysis method is used to find the relationships between course grade and the student features.
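The Pearson correlation analysis described above can be sketched in a few lines of plain Python. This is an illustrative example only: the function name and the five sample records are invented, not taken from the paper's data set.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations, and sums of squared deviations for each variable.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical student records: days of access and course grade.
days_of_access = [5, 12, 20, 8, 15]
course_grade = [55, 68, 90, 60, 75]
r = pearson_r(days_of_access, course_grade)
```

A coefficient near 1 or -1 indicates a strong linear relationship and a value near 0 a weak one, so ranking the student attributes by the absolute value of their coefficients gives the kind of ordering the study reports.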
Findings
Days of access has the highest correlation with course grade, followed by hits count; connection time is less relevant to students’ course grade. Student age and gender have the lowest correlation with course grade. Binary classification models have much higher prediction accuracy than multi-class classification models. Data normalization and data discretization can effectively improve the prediction accuracy of machine learning models, such as the ANN model in this paper.
Originality/value
This paper may help teachers use online learning behavior data to identify students with learning difficulties in advance and give timely help. It shows that acceptable prediction models based on machine learning can be built using a small and limited data set. However, introducing external data into machine learning models to improve their prediction accuracy remains a valuable and difficult problem.