Search results

Abstract

Across disciplines, researchers and practitioners employ decision tree ensembles such as random forests and XGBoost with great success. What explains their popularity? This chapter showcases how marketing scholars and decision-makers can harness the power of decision tree ensembles for academic and practical applications. The author discusses the origin of decision tree ensembles, explains their theoretical underpinnings, and illustrates them empirically using a real-world telemarketing case, with the objective of predicting customer conversions. Readers unfamiliar with decision tree ensembles will learn to appreciate them for their versatility, competitive accuracy, ease of application, and computational efficiency, and will gain a comprehensive understanding of why decision tree ensembles belong in every data scientist's methodological toolbox.

Details

The Machine Age of Customer Insight

Type: Book

ISBN: 978-1-83909-697-6

Book part

Publication date: 6 September 2019

Detecting Non-injured Passengers and Drivers in Car Accidents: A New Under-resampling Method for Imbalanced Classification

Son Nguyen, Gao Niu, John Quinn, Alan Olinsky, Jonathan Ormsbee, Richard M. Smith and James Bishop

Abstract

In recent years, the problem of classification with imbalanced data has been growing in popularity in the data-mining and machine-learning communities due to the emergence of an abundance of imbalanced data in many fields. In this chapter, we compare the performance of six classification methods on an imbalanced dataset under the influence of four resampling techniques. These classification methods are the random forest, the support vector machine, logistic regression, k-nearest neighbor (KNN), the decision tree, and AdaBoost. Our study has shown that all of the classification methods have difficulty when working with the imbalanced data, with the KNN performing the worst, detecting only 27.4% of the minority class. However, with the help of resampling techniques, all of the classification methods improve in overall performance. In particular, the Random Forest, in combination with the random over-sampling technique, performs the best, achieving 82.8% balanced accuracy (the average of the true-positive rate and true-negative rate).

We then propose a new procedure to resample the data. Our method is based on the idea of eliminating “easy” majority observations before under-sampling them. It has further improved the balanced accuracy of the Random Forest to 83.7%, making it the best approach for the imbalanced data.
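The balanced accuracy quoted in this abstract is defined there as the average of the true-positive rate and the true-negative rate. A minimal sketch of that calculation follows; the function name and the toy labels are made up for illustration and are not taken from the chapter.

```python
def balanced_accuracy(y_true, y_pred, positive=1):
    """Average of the true-positive rate and the true-negative rate."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    tpr = tp / (tp + fn)  # share of positives that were detected
    tnr = tn / (tn + fp)  # share of negatives that were detected
    return (tpr + tnr) / 2


# Hypothetical toy data: 2 minority (1) and 8 majority (0) observations.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                      # 8 of 10 labels match
print(balanced_accuracy(y_true, y_pred))   # (1/2 + 7/8) / 2
```

On imbalanced data the two numbers diverge: plain accuracy is dominated by the majority class, while balanced accuracy weights both classes equally.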

Details

Advances in Business and Management Forecasting

Type: Book

ISBN: 978-1-78754-290-7

Article

Publication date: 14 September 2023

A novel approach to screening patents for securitization: a machine learning-based predictive analysis of high-quality basic asset

Cheng Liu, Yi Shi, Wenjing Xie and Xinzhong Bao

Abstract

Purpose

This paper aims to provide a complete analysis framework and prediction method for the construction of the patent securitization (PS) basic asset pool.

Design/methodology/approach

This paper proposes an integrated classification method based on a genetic algorithm and the random forest algorithm. First, the patent value evaluation model and the SME credit evaluation model are considered together to determine 17 indicators that measure patent value and SME credit. Second, the classification label of high-quality basic assets is established. Then, the genetic algorithm and random forest model are used to predict and screen high-quality basic assets. Finally, the performance of the model is evaluated.

Findings

The machine learning model proposed in this study is mainly used to solve the screening problem of high-quality patents that constitute the underlying asset pool of PS. The empirical research shows that the integrated classification method based on genetic algorithm and random forest has good performance and prediction accuracy, and is superior to either of its constituent methods used alone.

Originality/value

The main contributions of the article are twofold. First, the machine learning model proposed in this article determines the standards for high-quality basic assets. Second, this article addresses the screening issue of basic assets in PS.

Details

Kybernetes, vol. 53 no. 2

Type: Research Article

ISSN: 0368-492X

Book part

Publication date: 30 September 2020

Use of Classification Algorithms in Health Care

Hera Khan, Ayush Srivastav and Amit Kumar Mishra

Abstract

This chapter provides a detailed description of the classification algorithms that have been widely used in medical science. It lays the foundation with a comprehensive overview of the background and history of classification algorithms. This is followed by an extensive discussion of classification techniques in machine learning (ML), concluding with their applications in data analysis in medical science and health care. The chapter opens with the fundamentals required to understand classification techniques in ML: the underlying differences between Unsupervised and Supervised Learning, followed by the basic terminology of classification and its history. It then covers the types of classification algorithms, ranging from linear classifiers such as Logistic Regression and Naïve Bayes to Nearest Neighbour, Support Vector Machine, Tree-based Classifiers, and Neural Networks, along with their respective mathematics. Ensemble algorithms such as Majority Voting, Boosting, Bagging, and Stacking are also discussed at length along with their relevant applications. The chapter further explains the areas of application of such classification algorithms in biomedicine and health care and their contribution to decision-making systems and predictive analysis. In conclusion, this chapter contributes to research and development by providing thorough insight into classification algorithms and their applications in the healthcare development sector.

Details

Big Data Analytics and Intelligence: A Perspective for Health Care

Type: Book

ISBN: 978-1-83909-099-8

Book part

Publication date: 1 September 2021

Effects of Resampling Techniques on Imbalanced Data Classification: A New Under-resampling Method

Son Nguyen, Phyllis Schumacher, Alan Olinsky and John Quinn

Abstract

We study the performances of various predictive models including decision trees, random forests, neural networks, and linear discriminant analysis on an imbalanced data set of home loan applications. During the process, we propose our undersampling algorithm to cope with the issues created by the imbalance of the data. Our technique is shown to work competitively against popular resampling techniques such as random oversampling, undersampling, synthetic minority oversampling technique (SMOTE), and random oversampling examples (ROSE). We also investigate the relation between the true positive rate, true negative rate, and the imbalance of the data.
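Of the resampling techniques named in this abstract, plain random undersampling and random oversampling are simple enough to sketch directly. The code below is only an illustration under stated assumptions: the function name `random_resample`, its parameters and the toy data are hypothetical, and it does not implement the authors' proposed undersampling algorithm, SMOTE or ROSE.

```python
import random
from collections import Counter, defaultdict


def random_resample(rows, labels, mode="under", seed=0):
    """Balance classes by random undersampling or random oversampling.

    mode="under": shrink every class to the size of the smallest class.
    mode="over":  grow every class to the size of the largest class by
                  drawing duplicates with replacement.
    """
    if mode not in ("under", "over"):
        raise ValueError("mode must be 'under' or 'over'")
    rng = random.Random(seed)

    # Group observations by class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    sizes = [len(members) for members in by_class.values()]
    target = min(sizes) if mode == "under" else max(sizes)

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if len(members) > target:
            # Undersampling: keep a random subset, without replacement.
            chosen = rng.sample(members, target)
        else:
            # Oversampling: keep everything and add random duplicates.
            extra = [rng.choice(members) for _ in range(target - len(members))]
            chosen = members + extra
        out_rows.extend(chosen)
        out_labels.extend([label] * len(chosen))
    return out_rows, out_labels


# Hypothetical toy data: 2 minority (1) and 10 majority (0) observations.
rows = list(range(12))
labels = [1, 1] + [0] * 10

_, under_labels = random_resample(rows, labels, mode="under")
_, over_labels = random_resample(rows, labels, mode="over")
print(Counter(under_labels))  # both classes reduced to 2 observations
print(Counter(over_labels))   # both classes raised to 10 observations
```

Resampling is normally applied to the training split only, so that the evaluation data keep their original class balance.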

Details

Advances in Business and Management Forecasting

Type: Book

ISBN: 978-1-83982-091-5

Open Access

Article

Publication date: 19 August 2022

Stock market prediction by applying big data mining

Bedour M. Alshammari, Fairouz Aldhmour, Zainab M. AlQenaei and Haidar Almohri

Downloads

4641

Abstract

Purpose

There is a gap in knowledge about the Gulf Cooperation Council (GCC) because most studies are undertaken in countries outside the Gulf region – such as China, India, the US and Taiwan. The stock market contains rich, valuable and considerable data, and these data need careful analysis for good decisions to be made that can lead to increases in the efficiency of a business. Data mining techniques offer data processing tools and applications used to enhance decision-maker decisions. This study aims to predict the Kuwait stock market by applying big data mining.

Design/methodology/approach

The methodology is quantitative, relying on mathematical and statistical models that describe a wide array of relationships among variables. Four techniques were implemented to predict the direction of stock market returns: logistic regression, decision trees, support vector machine and random forest.

Findings

The results show that all variables are statistically significant at the 5% level except gold price and oil price. Also, the variables that do not have an influence on the direction of the rate of return of Boursa Kuwait are money supply and gold price, unlike the Kuwait index, which has the highest coefficient. Furthermore, the variable with the highest score among those affecting the direction of the rate of return is the firms, and the overall accuracy of the four models is nearly 50%.

Research limitations/implications

Some of the limitations identified for this study are as follows: (1) location limitation: Kuwait Stock Exchange; (2) time limitation: the study had to be completed within the academic years 2019-2020 and 2020-2021, and during 2020 the coronavirus (COVID-19) pandemic was a major obstacle to data collection and analysis; (3) data limitation: the Kuwait Stock Exchange data were collected from May 2019 to March 2020, while the data on factors affecting the stock exchange were collected in July 2020 due to the pandemic.

Originality/value

The study used new titles, variables and techniques such as using data mining to predict the Kuwait stock market. There are no adequate studies that predict the stock market by data mining in the GCC, especially in Kuwait. There is a gap in knowledge in the GCC as most studies are in foreign countries, such as China, India, the US and Taiwan.

Details

Arab Gulf Journal of Scientific Research, vol. 40 no. 2

Type: Research Article

ISSN: 1985-9899

Open Access

Article

Publication date: 23 November 2021

Modeling commercial vehicle drivers’ acceptance of advanced driving assistance system (ADAS)

Yueru Xu, Zhirui Ye and Chao Wang

Downloads

982

Abstract

Purpose

Advanced driving assistance system (ADAS) has been applied in commercial vehicles. This paper aims to evaluate the factors influencing commercial vehicle drivers’ acceptance of ADAS and explore the characteristics of each key factor. The two most widely used functions, forward collision warning (FCW) and lane departure warning (LDW), were considered in this paper.

Design/methodology/approach

A random forests algorithm was applied to evaluate the factors influencing commercial drivers’ acceptance. ADAS data from 24 commercial vehicles were recorded from 1 November to 21 December 2018 in Jiangsu province. Whether the driver responded was set as the dependent variable, while six influence factors were considered.

Findings

The acceptance rates for the FCW and LDW systems were 69.52% and 38.76%, respectively. The accuracy of the random forests model for the FCW and LDW systems is 0.816 and 0.820, respectively. For the FCW system, vehicle speed, duration time and warning hour are the three key factors. Drivers prefer to respond within a short duration, during daytime and at low vehicle speed. For the LDW system, duration time, vehicle speed and driver age are the three key factors. Older drivers have a higher response probability at higher vehicle speeds, and the response time is longer than for the FCW system.

Originality/value

Few studies have focused on the attitudes of commercial vehicle drivers, although commercial vehicle accidents have been shown to be more severe than passenger vehicle accidents. The results of this study can help researchers to better understand the behavior of commercial vehicle drivers and make corresponding recommendations for ADAS of commercial vehicles.

Details

Journal of Intelligent and Connected Vehicles, vol. 4 no. 3

Type: Research Article

ISSN: 2399-9802

Article

Publication date: 3 April 2024

Creditworthiness pattern prediction and detection for GCC Islamic banks using machine learning techniques

Samar Shilbayeh and Rihab Grassa

Abstract

Purpose

Bank creditworthiness refers to the evaluation of a bank’s ability to meet its financial obligations. It is an assessment of the bank’s financial health, stability and capacity to manage risks. This paper aims to investigate the credit rating patterns that are crucial for assessing creditworthiness of the Islamic banks, thereby evaluating the stability of their industry.

Design/methodology/approach

Three distinct machine learning algorithms are exploited and evaluated for the desired objective. This research initially uses the decision tree machine learning algorithm as a base learner, conducting an in-depth comparison with the ensemble decision tree and Random Forest. Subsequently, the Apriori algorithm is deployed to uncover the most significant attributes impacting a bank’s credit rating. To appraise the previously elucidated models, a ten-fold cross-validation method is applied. This method involves segmenting the data set into ten folds, with nine used for training and one for testing, rotating ten times so that each fold serves once as the test set. This approach aims to mitigate any potential biases that could arise during the learning and training phases. Following this process, the accuracy is assessed and depicted in a confusion matrix as outlined in the methodology section.
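The ten-fold cross-validation described in this paragraph can be sketched as an index-splitting routine. This is a generic illustration, not the authors' code: the function name `k_fold_indices`, the shuffling step and the sample size of 25 are assumptions, and no stratification by class is attempted.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Each observation lands in exactly one test fold; the other k - 1 folds
    form the training set for that round.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumption: shuffle before splitting

    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]

    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical data set of 25 observations split into ten folds.
splits = list(k_fold_indices(25, k=10))
for round_number, (train, test) in enumerate(splits, start=1):
    print(round_number, len(train), len(test))
```

In each round a model would be fitted on `train` and scored on `test`, and the ten scores (or the pooled confusion matrix) summarised afterwards.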

Findings

The findings of this investigation reveal that the Random Forest machine learning algorithm outperforms the others, achieving an impressive 90.5% accuracy in predicting credit ratings. Notably, our research sheds light on the significance of the loan-to-deposit ratio as a primary attribute affecting credit rating predictions. Moreover, this study uncovers additional pivotal banking features that intensely impact the measurements under study. This paper’s findings provide evidence that the loan-to-deposit ratio looks to be the purest bank attribute that affects credit rating prediction. In addition, deposit-to-assets ratio and profit sharing investment account ratio criteria are found to be effective in credit rating prediction and the ownership structure criterion came to be viewed as one of the essential bank attributes in credit rating prediction.

Originality/value

These findings contribute significant evidence to the understanding of attributes that strongly influence credit rating predictions within the banking sector. This study uniquely contributes by uncovering patterns that have not been previously documented in the literature, broadening our understanding in this field.

Details

International Journal of Islamic and Middle Eastern Finance and Management, vol. 17 no. 2

Type: Research Article

ISSN: 1753-8394

Article

Publication date: 3 March 2020

Predicting employee attrition using tree-based models

Nesreen El-Rayes, Ming Fang, Michael Smith and Stephen M. Taylor

Downloads

1605

Abstract

Purpose

The purpose of this study is to develop tree-based binary classification models to predict the likelihood of employee attrition based on firm cultural and management attributes.

Design/methodology/approach

A data set of resumes anonymously submitted through Glassdoor’s online portal is used in tandem with public company review information to fit decision tree, random forest and gradient boosted tree models to predict the probability of an employee leaving a firm during a job transition.

Findings

Random forest and decision tree methods are found to be the strongest attrition prediction models. In addition, compensation, company culture and senior management performance play a primary role in an employee’s decision to leave a firm.

Practical implications

This study may be used by human resources staff to better understand factors which influence employee attrition. In addition, techniques developed in this study may be applied to company-specific data sets to construct customized attrition models.

Originality/value

This study contains several novel contributions which include exploratory studies such as industry job transition percentages, distributional comparisons between factors strongly contributing to employee attrition between those who left or stayed with the firm and the first comprehensive search over binary classification models to identify which provides the strongest predictive performance of employee attrition.

Details

International Journal of Organizational Analysis, vol. 28 no. 6

Type: Research Article

ISSN: 1934-8835
