Search results

1 – 10 of over 1000

Abstract

Details

Machine Learning and Artificial Intelligence in Marketing and Sales
Type: Book
ISBN: 978-1-80043-881-1

Book part
Publication date: 15 March 2021

Jochen Hartmann

Across disciplines, researchers and practitioners employ decision tree ensembles such as random forests and XGBoost with great success. What explains their popularity…

Abstract

Across disciplines, researchers and practitioners employ decision tree ensembles such as random forests and XGBoost with great success. What explains their popularity? This chapter showcases how marketing scholars and decision-makers can harness the power of decision tree ensembles for academic and practical applications. The author discusses the origin of decision tree ensembles, explains their theoretical underpinnings, and illustrates them empirically using a real-world telemarketing case, with the objective of predicting customer conversions. Readers unfamiliar with decision tree ensembles will learn to appreciate them for their versatility, competitive accuracy, ease of application, and computational efficiency and will gain a comprehensive understanding why decision tree ensembles contribute to every data scientist's methodological toolbox.

Details

The Machine Age of Customer Insight
Type: Book
ISBN: 978-1-83909-697-6

Keywords

Book part
Publication date: 6 September 2019

Son Nguyen, Gao Niu, John Quinn, Alan Olinsky, Jonathan Ormsbee, Richard M. Smith and James Bishop

In recent years, the problem of classification with imbalanced data has been growing in popularity in the data-mining and machine-learning communities due to the emergence…

Abstract

In recent years, the problem of classification with imbalanced data has been growing in popularity in the data-mining and machine-learning communities due to the emergence of an abundance of imbalanced data in many fields. In this chapter, we compare the performance of six classification methods on an imbalanced dataset under the influence of four resampling techniques. These classification methods are the random forest, the support vector machine, logistic regression, k-nearest neighbor (KNN), the decision tree, and AdaBoost. Our study has shown that all of the classification methods have difficulty when working with the imbalanced data, with the KNN performing the worst, detecting only 27.4% of the minority class. However, with the help of resampling techniques, all of the classification methods experience improvement on overall performances. In particular, the Random Forest, in combination with the random over-sampling technique, performs the best, achieving 82.8% balanced accuracy (the average of the true-positive rate and true-negative rate).

We then propose a new procedure to resample the data. Our method is based on the idea of eliminating “easy” majority observations before under-sampling them. It has further improved the balanced accuracy of the Random Forest to 83.7%, making it the best approach for the imbalanced data.

Details

Advances in Business and Management Forecasting
Type: Book
ISBN: 978-1-78754-290-7

Keywords

Open Access
Article
Publication date: 19 August 2022

Bedour M. Alshammari, Fairouz Aldhmour, Zainab M. AlQenaei and Haidar Almohri

There is a gap in knowledge about the Gulf Cooperation Council (GCC) because most studies are undertaken in countries outside the Gulf region – such as China, India, the…

1594

Abstract

Purpose

There is a gap in knowledge about the Gulf Cooperation Council (GCC) because most studies are undertaken in countries outside the Gulf region – such as China, India, the US and Taiwan. The stock market contains rich, valuable and considerable data, and these data need careful analysis for good decisions to be made that can lead to increases in the efficiency of a business. Data mining techniques offer data processing tools and applications used to enhance decision-maker decisions. This study aims to predict the Kuwait stock market by applying big data mining.

Design/methodology/approach

The methodology used is quantitative techniques, which are mathematical and statistical models that describe a various array of the relationships of variables. Quantitative methods used to predict the direction of the stock market returns by using four techniques were implemented: logistic regression, decision trees, support vector machine and random forest.

Findings

The results are all variables statistically significant at the 5% level except gold price and oil price. Also, the variables that do not have an influence on the direction of the rate of return of Boursa Kuwait are money supply and gold price, unlike the Kuwait index, which has the highest coefficient. Furthermore, the height score of the variable that affects the direction of the rate of return is the firms, and the accuracy of the overall performance of the four models is nearly 50%.

Research limitations/implications

Some of the limitations identified for this study are as follows: (1) location limitation: Kuwait Stock Exchange; (2) time limitation: the amount of time available to accomplish the study, where the period was completed within the academic year 2019-2020 and the academic year 2020-2021. During 2020, the coronavirus pandemic (COVID-19), which was a major obstacle, occurred during data collection and analysis; (3) data limitation: The Kuwait Stock Exchange data were collected from May 2019 to March 2020, while the factors affecting the stock exchange data were collected in July 2020 due to the corona pandemic.

Originality/value

The study used new titles, variables and techniques such as using data mining to predict the Kuwait stock market. There are no adequate studies that predict the stock market by data mining in the GCC, especially in Kuwait. There is a gap in knowledge in the GCC as most studies are in foreign countries, such as China, India, the US and Taiwan.

Details

Arab Gulf Journal of Scientific Research, vol. 40 no. 2
Type: Research Article
ISSN: 1985-9899

Keywords

Book part
Publication date: 1 September 2021

Son Nguyen, Phyllis Schumacher, Alan Olinsky and John Quinn

We study the performances of various predictive models including decision trees, random forests, neural networks, and linear discriminant analysis on an imbalanced data…

Abstract

We study the performances of various predictive models including decision trees, random forests, neural networks, and linear discriminant analysis on an imbalanced data set of home loan applications. During the process, we propose our undersampling algorithm to cope with the issues created by the imbalance of the data. Our technique is shown to work competitively against popular resampling techniques such as random oversampling, undersampling, synthetic minority oversampling technique (SMOTE), and random oversampling examples (ROSE). We also investigate the relation between the true positive rate, true negative rate, and the imbalance of the data.

Book part
Publication date: 30 September 2020

Hera Khan, Ayush Srivastav and Amit Kumar Mishra

A detailed description will be provided of all the classification algorithms that have been widely used in the domain of medical science. The foundation will be laid by…

Abstract

A detailed description will be provided of all the classification algorithms that have been widely used in the domain of medical science. The foundation will be laid by giving a comprehensive overview pertaining to the background and history of the classification algorithms. This will be followed by an extensive discussion regarding various techniques of classification algorithm in machine learning (ML) hence concluding with their relevant applications in data analysis in medical science and health care. To begin with, the initials of this chapter will deal with the basic fundamentals required for a profound understanding of the classification techniques in ML which will comprise of the underlying differences between Unsupervised and Supervised Learning followed by the basic terminologies of classification and its history. Further, it will include the types of classification algorithms ranging from linear classifiers like Logistic Regression, Naïve Bayes to Nearest Neighbour, Support Vector Machine, Tree-based Classifiers, and Neural Networks, and their respective mathematics. Ensemble algorithms such as Majority Voting, Boosting, Bagging, Stacking will also be discussed at great length along with their relevant applications. Furthermore, this chapter will also incorporate comprehensive elucidation regarding the areas of application of such classification algorithms in the field of biomedicine and health care and their contribution to decision-making systems and predictive analysis. To conclude, this chapter will devote highly in the field of research and development as it will provide a thorough insight to the classification algorithms and their relevant applications used in the cases of the healthcare development sector.

Details

Big Data Analytics and Intelligence: A Perspective for Health Care
Type: Book
ISBN: 978-1-83909-099-8

Keywords

Open Access
Article
Publication date: 23 November 2021

Yueru Xu, Zhirui Ye and Chao Wang

Advanced driving assistance system (ADAS) has been applied in commercial vehicles. This paper aims to evaluate the influence factors of commercial vehicle drivers…

612

Abstract

Purpose

Advanced driving assistance system (ADAS) has been applied in commercial vehicles. This paper aims to evaluate the influence factors of commercial vehicle drivers’ acceptance on ADAS and explore the characteristics of each key factors. Two most widely used functions, forward collision warning (FCW) and lane departure warning (LDW), were considered in this paper.

Design/methodology/approach

A random forests algorithm was applied to evaluate the influence factors of commercial drivers’ acceptance. ADAS data of 24 commercial vehicles were recorded from 1 November to 21 December 2018, in Jiangsu province. Respond or not was set as dependent variables, while six influence factors were considered.

Findings

The acceptance rate for FCW and LDW systems was 69.52% and 38.76%, respectively. The accuracy of random forests model for FCW and LDW systems is 0.816 and 0.820, respectively. For FCW system, vehicle speed, duration time and warning hour are three key factors. Drivers prefer to respond in a short duration during daytime and low vehicle speed. While for LDW system, duration time, vehicle speed and driver age are three key factors. Older drivers have higher respond probability under higher vehicle speed, and the respond time is longer than FCW system.

Originality/value

Few research studies have focused on the attitudes of commercial vehicle drivers, though commercial vehicle accidents were proved to be more severe than passenger vehicles. The results of this study can help researchers to better understand the behavior of commercial vehicle drivers and make corresponding recommendations for ADAS of commercial vehicles.

Details

Journal of Intelligent and Connected Vehicles, vol. 4 no. 3
Type: Research Article
ISSN: 2399-9802

Keywords

Article
Publication date: 3 March 2020

Nesreen El-Rayes, Ming Fang, Michael Smith and Stephen M. Taylor

The purpose of this study is to develop tree-based binary classification models to predict the likelihood of employee attrition based on firm cultural and management attributes.

1121

Abstract

Purpose

The purpose of this study is to develop tree-based binary classification models to predict the likelihood of employee attrition based on firm cultural and management attributes.

Design/methodology/approach

A data set of resumes anonymously submitted through Glassdoor’s online portal is used in tandem with public company review information to fit decision tree, random forest and gradient boosted tree models to predict the probability of an employee leaving a firm during a job transition.

Findings

Random forest and decision tree methods are found to be the strongest attrition prediction models. In addition, compensation, company culture and senior management performance play a primary role in an employee’s decision to leave a firm.

Practical implications

This study may be used by human resources staff to better understand factors which influence employee attrition. In addition, techniques developed in this study may be applied to company-specific data sets to construct customized attrition models.

Originality/value

This study contains several novel contributions which include exploratory studies such as industry job transition percentages, distributional comparisons between factors strongly contributing to employee attrition between those who left or stayed with the firm and the first comprehensive search over binary classification models to identify which provides the strongest predictive performance of employee attrition.

Details

International Journal of Organizational Analysis, vol. 28 no. 6
Type: Research Article
ISSN: 1934-8835

Keywords

Book part
Publication date: 8 November 2021

Taniya Ghosh and Sakshi Agarwal

Significant evidence in the literature points to money demand instability and therefore inaccurate forecasting. In view of this issue, this chapter seeks to use a method…

Abstract

Significant evidence in the literature points to money demand instability and therefore inaccurate forecasting. In view of this issue, this chapter seeks to use a method, innovative for money demand literature, that is, the machine learning model to predict money demand. Specifically, this chapter uses Random Forest Regression to predict money demand using monthly data in the Indian context over the period April-1996 to December-2018 using the variables usually used in literature. The chapter finds that in money demand prediction, the Random Forest Regression performs fairly well. The results are also compared to traditional models and it is found that the Random Forest Regression model has the potential to enhance the prediction of money demand over what traditional models predicts.

Details

Environmental, Social, and Governance Perspectives on Economic Development in Asia
Type: Book
ISBN: 978-1-80117-594-4

Keywords

Article
Publication date: 7 July 2020

Jiaming Liu, Liuan Wang, Linan Zhang, Zeming Zhang and Sicheng Zhang

The primary objective of this study was to recognize critical indicators in predicting blood glucose (BG) through data-driven methods and to compare the prediction…

Abstract

Purpose

The primary objective of this study was to recognize critical indicators in predicting blood glucose (BG) through data-driven methods and to compare the prediction performance of four tree-based ensemble models, i.e. bagging with tree regressors (bagging-decision tree [Bagging-DT]), AdaBoost with tree regressors (Adaboost-DT), random forest (RF) and gradient boosting decision tree (GBDT).

Design/methodology/approach

This study proposed a majority voting feature selection method by combining lasso regression with the Akaike information criterion (AIC) (LR-AIC), lasso regression with the Bayesian information criterion (BIC) (LR-BIC) and RF to select indicators with excellent predictive performance from initial 38 indicators in 5,642 samples. The selected features were deployed to build the tree-based ensemble models. The 10-fold cross-validation (CV) method was used to evaluate the performance of each ensemble model.

Findings

The results of feature selection indicated that age, corpuscular hemoglobin concentration (CHC), red blood cell volume distribution width (RBCVDW), red blood cell volume and leucocyte count are five most important clinical/physical indicators in BG prediction. Furthermore, this study also found that the GBDT ensemble model combined with the proposed majority voting feature selection method is better than other three models with respect to prediction performance and stability.

Practical implications

This study proposed a novel BG prediction framework for better predictive analytics in health care.

Social implications

This study incorporated medical background and machine learning technology to reduce diabetes morbidity and formulate precise medical schemes.

Originality/value

The majority voting feature selection method combined with the GBDT ensemble model provides an effective decision-making tool for predicting BG and detecting diabetes risk in advance.

1 – 10 of over 1000