Search results
1 – 10 of over 6000
Abstract
Purpose
This study aims to compare machine learning models, dataset treatments and training-testing splits, using data mining methods, to detect financial statement fraud.
Design/methodology/approach
This study uses a quantitative approach based on secondary data from the financial reports of companies listed on the Indonesia Stock Exchange over the ten years from 2010 to 2019. The research variables include both financial and non-financial variables. Indicators of financial statement fraud are determined based on notes or sanctions from regulators and financial statement restatements with special supervision.
Findings
The findings show that the Extremely Randomized Trees (ERT) model performs better than the other machine learning models. The original-sampling dataset performs best compared to the other dataset treatments, and the 80:10 training-testing split performs best compared to the other splitting treatments. The ERT model with an original-sampling dataset and an 80:10 training-testing split is therefore the most appropriate for detecting future financial statement fraud.
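The comparison of training-testing splits mentioned above can be illustrated with a minimal sketch of a shuffled hold-out split. The function name `split_train_test`, the `test_fraction` parameter, the value 0.2 and the toy records are all assumptions made for illustration; they are not the study's code, data or reported ratio.

```python
import random


def split_train_test(records, test_fraction, seed=0):
    """Shuffle records and hold out `test_fraction` of them for testing."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)               # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = max(1, round(len(shuffled) * test_fraction))
    # First n_test shuffled items form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy data: (firm_id, fraud_label) pairs, not the study's data.
records = [(i, int(i % 10 == 0)) for i in range(100)]
train, test = split_train_test(records, test_fraction=0.2)
```

Repeating the call with different `test_fraction` values and scoring a model on each split is one way such a comparison could be set up.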
Practical implications
This study can be used by regulators, investors, stakeholders and financial crime experts to add insight into better methods of detecting financial statement fraud.
Originality/value
This study proposes a machine learning model that has not been discussed in previous studies and performs comparisons to obtain the best financial statement fraud detection results. Practitioners and academics can use the findings for further research development.
Ibrahim Karatas and Abdulkadir Budak
Abstract
Purpose
The study aims to compare the prediction success of basic and ensemble machine learning models and, accordingly, to create novel prediction models by combining machine learning models in order to increase prediction success in construction labor productivity prediction.
Design/methodology/approach
Categorical and numerical data that many studies in the literature have used to predict construction labor productivity were preprocessed for analysis. The Python programming language was used to develop the machine learning models. After many trial variations, the models were combined to form the proposed novel voting and stacking meta-ensemble machine learning models. Finally, the models were compared using Target and Taylor diagrams.
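The voting idea described above can be sketched in a few lines: a voting regressor averages the predictions of its base models. The function `voting_predict` and the three lambda "models" are hypothetical stand-ins, not the authors' fitted models. A stacking ensemble would instead train a meta-learner on the base predictions, which this sketch does not show.

```python
from statistics import mean


def voting_predict(models, x):
    """Average the predictions of several fitted regressors (simple voting)."""
    return mean(model(x) for model in models)


# Hypothetical stand-ins for fitted base models: each maps a feature to a prediction.
base_models = [
    lambda x: 0.50 * x,
    lambda x: 0.40 * x + 0.1,
    lambda x: 0.60 * x - 0.1,
]
prediction = voting_predict(base_models, 1.0)
```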
Findings
Meta-ensemble models have been developed for labor productivity prediction by combining machine learning models. A voting ensemble combining the et, gbm, xgboost, lightgbm, catboost and mlp models and a stacking ensemble combining the et, gbm, xgboost, catboost and mlp models were created, and the et model was selected as the meta-learner. In terms of prediction success, the voting and stacking meta-ensemble algorithms outperform the other machine learning algorithms. The model evaluation metrics MAE, MSE, RMSE and R2 were selected to measure prediction success. For the voting meta-ensemble algorithm, the values of MAE, MSE, RMSE and R2 are 0.0499, 0.0045, 0.0671 and 0.7886, respectively. For the stacking meta-ensemble algorithm, the values of MAE, MSE, RMSE and R2 are 0.0469, 0.0043, 0.0658 and 0.7967, respectively.
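For readers unfamiliar with the four metrics named above, the following is a minimal sketch of their standard definitions. The function name `regression_metrics` and the toy values are assumptions for illustration, not the study's data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE and R2 for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n      # mean absolute error
    mse = sum(e * e for e in errors) / n       # mean squared error
    rmse = math.sqrt(mse)                      # root mean squared error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot              # 1 - SS_res / SS_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical toy values, not taken from the study.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
metrics = regression_metrics(y_true, y_pred)
```

Because RMSE is the square root of MSE, the reported voting values are mutually consistent: the square root of 0.0045 is approximately 0.0671.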
Research limitations/implications
The study compares machine learning algorithms and creates novel meta-ensemble machine learning algorithms to predict the labor productivity of construction formwork activity. Practitioners and project planners can use this model as a reliable and accurate tool for predicting the labor productivity of construction formwork activity prior to construction planning.
Originality/value
The study provides insight into the application of ensemble machine learning algorithms in predicting construction labor productivity. Additionally, novel meta-ensemble algorithms have been used and proposed. Therefore, it is hoped that predicting the labor productivity of construction formwork activity with high accuracy will make a great contribution to construction project management.
Daniel Šandor and Marina Bagić Babac
Abstract
Purpose
Sarcasm is a linguistic expression that usually carries the opposite meaning of what is being said by words, thus making it difficult for machines to discover the actual meaning. It is mainly distinguished by the inflection with which it is spoken, with an undercurrent of irony, and is largely dependent on context, which makes it a difficult task for computational analysis. Moreover, sarcasm expresses negative sentiments using positive words, allowing it to easily confuse sentiment analysis models. This paper aims to demonstrate the task of sarcasm detection using the approach of machine and deep learning.
Design/methodology/approach
For the purpose of sarcasm detection, machine and deep learning models were used on a data set consisting of 1.3 million social media comments, including both sarcastic and non-sarcastic comments. The data set was pre-processed using natural language processing methods, and additional features were extracted and analysed. Several machine learning models, including logistic regression, ridge regression, linear support vector and support vector machines, along with two deep learning models based on bidirectional long short-term memory and one bidirectional encoder representations from transformers (BERT)-based model, were implemented, evaluated and compared.
Findings
The performance of machine and deep learning models was compared in the task of sarcasm detection, and possible ways of improvement were discussed. Deep learning models showed more promise, performance-wise, for this type of task. Specifically, a state-of-the-art model in natural language processing, namely a BERT-based model, outperformed the other machine and deep learning models.
Originality/value
This study compared the performance of the various machine and deep learning models in the task of sarcasm detection using the data set of 1.3 million comments from social media.
Marko Kureljusic and Jonas Metz
Abstract
Purpose
The accurate prediction of incoming cash flows enables more effective cash management and allows firms to shape their planning based on forward-looking information. Although most firms are aware of the benefits of these forecasts, many still have difficulties identifying and implementing an appropriate prediction model. With the rise of machine learning algorithms, numerous new forecasting techniques have emerged. These new forecasting techniques are theoretically applicable for predicting customer payment behavior but have not yet been adequately investigated. This study aims to close this research gap by examining which machine learning algorithm is the most appropriate for predicting customer payment dates.
Design/methodology/approach
By using various machine learning algorithms, the authors evaluate whether customer payment behavior patterns can be identified and predicted. The study is based on real-world transaction data from a DAX-40 firm with over 1,000,000 invoices in the dataset, with the data covering the period 2017–2019.
Findings
The authors' results show that neural networks in particular are suitable for predicting customers' payment dates. Furthermore, the authors demonstrate that contextual and logical prediction models can provide more accurate forecasts than conventional baseline models, such as linear and multivariate regression.
Research limitations/implications
Future cash flow forecasting studies should incorporate naïve prediction models, as the authors demonstrate that these models can compete with conventional baseline models used in existing machine learning research. However, the authors expect that with more in-depth information about the customer (creditworthiness, accounting structure) the results can be even further improved.
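The abstract does not say which naïve prediction models were used. As one hypothetical illustration of what such a baseline could look like, the sketch below predicts a customer's payment delay as the mean of that customer's past delays. The function `naive_delay_forecast`, the `default_delay` parameter and the toy history are assumptions, not the authors' method or data.

```python
from collections import defaultdict
from statistics import mean


def naive_delay_forecast(history, customer, default_delay=0.0):
    """Predict days-late for a customer as the mean of that customer's past delays."""
    delays = defaultdict(list)
    for cust, days_late in history:
        delays[cust].append(days_late)
    if customer not in delays:
        # No payment history: fall back to an assumed default delay.
        return default_delay
    return mean(delays[customer])


# Hypothetical invoice history: (customer_id, days paid after the due date).
history = [("A", 2), ("A", 4), ("B", 0), ("A", 6)]
forecast_a = naive_delay_forecast(history, "A")
forecast_new = naive_delay_forecast(history, "C")
```

A baseline of this kind gives a reference score that a machine learning model has to beat before its added complexity is justified.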
Practical implications
The knowledge of customers' future payment dates enables firms to change their perspective and move from reactive to proactive cash management. This shift leads to a more targeted dunning process.
Originality/value
To the best of the authors' knowledge, no study has yet been conducted that interprets the prediction of incoming payments as a daily rolling forecast by comparing naïve forecasts with forecasts based on machine learning and deep learning models.
Karlo Puh and Marina Bagić Babac
Abstract
Purpose
As the tourism industry becomes more vital for the success of many economies around the world, the importance of technology in tourism grows daily. Alongside the increasing importance and popularity of tourism, the amount of significant data grows, too. On a daily basis, millions of people write their opinions, suggestions and views about accommodation, services, and much more on various websites. Well-processed and filtered data can provide a lot of useful information that can be used to improve tourists' experiences and to help them decide when selecting a hotel or a restaurant. Thus, the purpose of this study is to explore machine and deep learning models for predicting sentiment and rating from tourist reviews.
Design/methodology/approach
This paper used machine learning models such as Naïve Bayes, support vector machines (SVM), convolutional neural network (CNN), long short-term memory (LSTM) and bidirectional long short-term memory (BiLSTM) for extracting sentiment and ratings from tourist reviews. These models were trained to classify reviews into positive, negative, or neutral sentiment, and into one to five grades or stars. Data used for training the models were gathered from TripAdvisor, the world's largest travel platform. The models based on multinomial Naïve Bayes (MNB) and SVM were trained using the term frequency-inverse document frequency (TF-IDF) for word representations while deep learning models were trained using global vectors (GloVe) for word representation. The results from testing these models are presented, compared and discussed.
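To make the TF-IDF representation mentioned above concrete, here is a minimal sketch using one common variant: term frequency (count divided by document length) multiplied by log(N / document frequency). The abstract does not state which variant or tokenizer the authors used, so the function `tf_idf`, the whitespace tokenization and the toy reviews are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy reviews, not taken from the TripAdvisor dataset.
reviews = ["great hotel great staff", "terrible hotel"]
weights = tf_idf(reviews)
```

In this toy example, "hotel" appears in every review and so receives a weight of zero, while terms that distinguish one review from another receive positive weights.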
Findings
The machine and deep learning models achieved high accuracy in predicting positive, negative, or neutral sentiments and ratings from tourist reviews. The optimal model architecture for both classification tasks was a deep learning model based on BiLSTM. The study’s results confirmed that deep learning models are more efficient and accurate than machine learning algorithms.
Practical implications
The proposed models allow for forecasting the number of tourist arrivals and expenditure, gaining insights into the tourists' profiles, improving overall customer experience, and upgrading marketing strategies. Different service sectors can use the implemented models to get insights into customer satisfaction with the products and services as well as to predict the opinions given a particular context.
Originality/value
This study developed and compared different machine learning models for classifying customer reviews as positive, negative, or neutral, as well as predicting ratings with one to five stars based on a TripAdvisor hotel reviews dataset that contains 20,491 unique hotel reviews.
Xin Huang, Ting Tang, Yu Ning Luo and Ren Wang
Abstract
Purpose
This study aims to examine the impact of board characteristics on firm performance while also exploring the influential mechanisms that help Chinese listed companies establish effective boards of directors and strengthen their corporate governance mechanisms.
Design/methodology/approach
This paper uses machine learning methods to investigate the predictive ability of the board of directors' characteristics on firm performance based on the data from Chinese A-share listed companies on the Shanghai and Shenzhen stock exchanges in China during 2008–2021. This study further analyzes board characteristics with relatively strong predictive ability and their predictive models on firm performance.
Findings
The results show that nonlinear machine learning methods are more effective than traditional linear models in analyzing the impact of board characteristics on Chinese firm performance. Among the board characteristics examined, directors' compensation, director shareholding ratio, the average age of directors and directors' educational level contribute significantly to prediction, and these characteristics have a roughly nonlinear relationship with predicted firm performance. The improvement in the predictive ability of board characteristics for firm performance is greater in state-owned enterprises in China than in private enterprises.
Practical implications
The findings of this study provide valuable suggestions for enriching the theory of board governance, strengthening board construction and optimizing the effectiveness of board governance. Furthermore, these impacts can serve as a valuable reference for board construction and selection, aiding in the rational selection of boards to establish an efficient and high-performing board of directors.
Originality/value
The study findings unequivocally demonstrate the superiority of nonlinear machine learning approaches over traditional linear models in examining the relationship between board characteristics and firm performance in China. Within the suite of board characteristics, director compensation, shareholding ratio, average age and educational level are particularly noteworthy, consistently demonstrating strong, nonlinear associations with firm performance. The study reveals that the predictive performance of board attributes is generally more robust for state-owned enterprises in China in comparison to their counterparts in the private sector.
Abstract
Machine learning is an algorithm-based auto-learning mechanism that improves from its experiences. It makes use of a statistical learning method that trains and develops on its own without the assistance of a person. Data, characteristics deduced from the data, and the model make up the three primary parts of a machine learning solution. Machine learning generates an algorithm from subsets of data that can utilise combinations of features and weights different from those obtained from basic principles. In this paper, customer behaviour is analysed and predicted using different machine learning algorithms. The results of the algorithms are validated using Python programming.
Abstract
Purpose
This study updates the literature review of Jones (1987) published in this journal. The study pays particular attention to two important themes that have shaped the field over the past 35 years: (1) the development of a range of innovative new statistical learning methods, particularly advanced machine learning methods such as stochastic gradient boosting, adaptive boosting, random forests and deep learning, and (2) the emergence of a wide variety of bankruptcy predictor variables extending beyond traditional financial ratios, including market-based variables, earnings management proxies, auditor going concern opinions (GCOs) and corporate governance attributes. Several directions for future research are discussed.
Design/methodology/approach
This study provides a systematic review of the corporate failure literature over the past 35 years with a particular focus on the emergence of new statistical learning methodologies and predictor variables. This synthesis of the literature evaluates the strengths and limitations of different modelling approaches under different circumstances and provides an overall evaluation of the relative contribution of alternative predictor variables. The study aims to provide a transparent, reproducible and interpretable review of the literature. The literature review also takes a theme-centric rather than author-centric approach and focuses on structured themes that have dominated the literature since 1987.
Findings
There are several major findings of this study. First, advanced machine learning methods appear to have the most promise for future firm failure research. Not only do these methods predict significantly better than conventional models, but they also possess many appealing statistical properties. Second, there is now a much wider range of variables being used to model and predict firm failure. However, the literature needs to be interpreted with some caution given the many mixed findings. Finally, there are still a number of unresolved methodological issues arising from the Jones (1987) study that require research attention.
Originality/value
The study explains the connections and derivations between a wide range of firm failure models, from simpler linear models to advanced machine learning methods such as gradient boosting, random forests, adaptive boosting and deep learning. The paper highlights the most promising models for future research, particularly in terms of their predictive power, underlying statistical properties and issues of practical implementation. The study also draws together an extensive literature on alternative predictor variables and provides insights into the role and behaviour of alternative predictor variables in firm failure research.
Abstract
Purpose
The objective of this research work is to design a data-based solution for administering traffic organization in a smart city by using the machine learning algorithm.
Design/methodology/approach
A machine learning framework for managing traffic infrastructure and air pollution in urban centers relies on a predictive analytics model. The model makes use of transportation data to predict traffic patterns based on the information gathered from numerous sources within the city. It can be used to support strategic planning decisions. The data features volume and calendar variables, including hours of the day, week and month. These variables are leveraged to identify time series-based seasonal patterns in the data. To achieve accurate traffic volume forecasting, the long short-term memory (LSTM) method is recommended.
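The calendar variables described above can be derived directly from observation timestamps. The sketch below shows one way to do so. The function `calendar_features`, the feature names and the sample timestamp are assumptions for illustration, not the paper's actual feature set.

```python
from datetime import datetime


def calendar_features(timestamp):
    """Extract simple calendar variables from a traffic observation timestamp."""
    return {
        "hour": timestamp.hour,              # hour of the day, 0-23
        "day_of_week": timestamp.weekday(),  # Monday = 0, Sunday = 6
        "month": timestamp.month,            # 1-12
    }


# Hypothetical observation time: Monday 15 January 2024, 08:30.
features = calendar_features(datetime(2024, 1, 15, 8, 30))
```

Features of this kind let a forecasting model pick up recurring daily, weekly and monthly patterns in traffic volume.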
Findings
The study has produced a model that is appropriate for the transportation sector in the city and other innovative urban applications. The findings indicate that the implementation of smart transportation systems enhances transportation and has a positive impact on air quality. The study's results are explored and connected to practical applications in the areas of air pollution control and smart transportation.
Originality/value
The present paper has created a machine learning framework for the transportation sector of smart cities that achieves a reasonable level of accuracy. Additionally, the paper examines the effects of smart transportation on both the environment and the supply chain.
Djordje Cica, Branislav Sredanovic, Sasa Tesic and Davorin Kramar
Abstract
Sustainable manufacturing is one of the most important and most challenging issues in the present industrial scenario. With the intention of diminishing the negative effects associated with cutting fluids, the machining industries are continuously developing technologies and systems for cooling/lubricating the cutting zone while maintaining machining efficiency. In the present study, three regression-based machine learning techniques, namely polynomial regression (PR), support vector regression (SVR) and Gaussian process regression (GPR), were developed to predict machining force, cutting power and cutting pressure in the turning of AISI 1045. In the development of the predictive models, the machining parameters of cutting speed, depth of cut and feed rate were considered as control factors. Since cooling/lubricating techniques significantly affect machining performance, prediction models of the quality characteristics were developed under minimum quantity lubrication (MQL) and high-pressure coolant (HPC) cutting conditions. The prediction accuracy of the developed models was evaluated by statistical error analysis methods. Results of the regression-based machine learning techniques were also compared with one of the most frequently used machine learning methods, namely artificial neural networks (ANN). Finally, a metaheuristic approach based on a neural network algorithm was utilized to perform an efficient multi-objective optimization of process parameters for both cutting environments.