Search results

1 – 10 of 988
Book part
Publication date: 15 March 2021

Jochen Hartmann

Abstract

Across disciplines, researchers and practitioners employ decision tree ensembles such as random forests and XGBoost with great success. What explains their popularity? This chapter showcases how marketing scholars and decision-makers can harness the power of decision tree ensembles for academic and practical applications. The author discusses the origin of decision tree ensembles, explains their theoretical underpinnings, and illustrates them empirically using a real-world telemarketing case, with the objective of predicting customer conversions. Readers unfamiliar with decision tree ensembles will learn to appreciate them for their versatility, competitive accuracy, ease of application, and computational efficiency and will gain a comprehensive understanding why decision tree ensembles contribute to every data scientist's methodological toolbox.
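
As a rough illustration of the workflow described above, the sketch below fits a random forest and a gradient-boosted tree ensemble to a synthetic binary conversion task using scikit-learn; the data, features and hyperparameters are placeholders and do not reproduce the chapter's telemarketing case.

```python
# Hedged sketch: decision tree ensembles on a synthetic "customer conversion" task.
# The dataset and hyperparameters are illustrative assumptions, not the chapter's case study.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for telemarketing features (e.g. call duration, prior contacts).
X, y = make_classification(n_samples=5_000, n_features=20, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(n_estimators=300, learning_rate=0.05,
                                                    random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: test AUC = {auc:.3f}")
```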

Details

The Machine Age of Customer Insight
Type: Book
ISBN: 978-1-83909-697-6

Article
Publication date: 31 May 2022

Osamah M. Al-Qershi, Junbum Kwon, Shuning Zhao and Zhaokun Li

Abstract

Purpose

In the case of many content features, this paper aims to investigate which content features in video and text ads contribute more to accurately predicting the success of crowdfunding, by comparing prediction models.

Design/methodology/approach

With 1,368 features extracted from 15,195 Kickstarter campaigns in the USA, the authors compare base models such as logistic regression (LR) with tree-based homogeneous ensembles such as eXtreme gradient boosting (XGBoost) and heterogeneous ensembles such as XGBoost + LR.
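
One possible way to set up such a comparison is sketched below: logistic regression, XGBoost and a stacked XGBoost + LR combination evaluated with cross-validation. The synthetic features, the stacking interpretation of "XGBoost + LR" and all settings are assumptions for illustration, not the authors' 1,368-feature pipeline.

```python
# Hedged sketch: comparing LR, XGBoost and a stacked XGBoost + LR ensemble.
# Synthetic data stands in for the crowdfunding content features; settings are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=3_000, n_features=100, n_informative=30, random_state=1)

models = {
    "LR": LogisticRegression(max_iter=2_000),
    "XGBoost": XGBClassifier(n_estimators=400, learning_rate=0.05),
    # One possible reading of "XGBoost + LR": stack XGBoost under a logistic-regression meta-learner.
    "XGBoost + LR": StackingClassifier(
        estimators=[("xgb", XGBClassifier(n_estimators=400, learning_rate=0.05))],
        final_estimator=LogisticRegression(max_iter=2_000),
        cv=5,
    ),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: mean CV accuracy = {acc:.3f}")
```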

Findings

XGBoost shows higher prediction accuracy than LR (82% vs 69%), in contrast to the findings of a previous relevant study. Regarding important content features, humans (e.g. founders) are more important than visual objects (e.g. products). In both spoken and written language, words related to experience (e.g. eat) or perception (e.g. hear) are more important than cognitive (e.g. causation) words. In addition, a focus on the future is more important than a present or past time orientation. Speech aids ("see" and "compare") that complement visual content are also effective, and a positive tone matters in speech.

Research limitations/implications

This research makes theoretical contributions by identifying the more important visual features (humans) and language features (experience, perception and future time orientation). Also, in a multimodal context, complementary cues (e.g. speech aids) across different modalities help. Furthermore, noncontent aspects of speech, such as a positive tone or the pace of speech, are important.

Practical implications

Founders are encouraged to assess and revise the content of their video or text ads as well as their basic campaign features (e.g. goal, duration and reward) before they launch their campaigns. Next, overly complex ensembles may suffer from overfitting problems. In practice, model validation using unseen data is recommended.

Originality/value

Rather than reducing the number of content feature dimensions (Kaminski and Hopp, 2020), enabling advanced prediction models to accommodate many content features raises prediction accuracy substantially.

Article
Publication date: 25 July 2019

Xia Li, Ruibin Bai, Peer-Olaf Siebers and Christian Wagner

Abstract

Purpose

Many transport and logistics companies nowadays use raw vehicle GPS data for travel time prediction. However, they face difficult challenges in terms of the costs of information storage, as well as the quality of the prediction. This paper aims to systematically investigate various meta-data (features) that require significantly less storage space but provide sufficient information for high-quality travel time predictions.

Design/methodology/approach

The paper systematically studied the combinatorial effects of features and different model-fitting strategies with two popular decision tree ensemble methods for travel time prediction, namely, random forests and gradient boosting regression trees. First, the investigation was conducted using pseudo travel time data generated by a pseudo travel time sampling algorithm, which allows travel time data to be produced under different noise processes so that prediction performance under different travel conditions and noise characteristics can be studied systematically. The results and findings were then further compared and evaluated through a real-life case.

Findings

The paper provides empirical insights and guidelines about how raw GPS data can be reduced to a small-sized feature vector for the purposes of vehicle travel time prediction. It suggests that adding travel time observations from previous departure time intervals is beneficial to the prediction, particularly when no other types of real-time information (e.g. traffic flow, speed) are available. It was also found that modular model fitting does not improve the quality of the prediction in all experimental settings used in this paper.
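
A minimal sketch of that idea, feeding travel time observations from previous departure intervals into random forest and gradient boosting regressors, is shown below; the synthetic travel time series and lag features are assumptions, not the paper's GPS data.

```python
# Hedged sketch: travel time prediction from lagged departure-interval features.
# The synthetic series and feature choices are assumptions, not the paper's GPS dataset.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2_000
interval = np.arange(n) % 96                      # 15-min departure interval of the day
base = 20 + 10 * np.sin(2 * np.pi * interval / 96)
travel_time = base + rng.normal(0, 2, n)          # noisy "observed" travel times

# Feature vector: departure interval plus observations from the previous intervals.
lag1 = np.roll(travel_time, 1)
lag2 = np.roll(travel_time, 2)
X = np.column_stack([interval, lag1, lag2])[2:]   # drop rows whose lags wrapped around
y = travel_time[2:]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=False)

for model in (RandomForestRegressor(n_estimators=200, random_state=0),
              GradientBoostingRegressor(n_estimators=200, random_state=0)):
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    print(type(model).__name__, f"MAE = {mae:.2f} min")
```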

Research limitations/implications

The findings are primarily based on empirical studies of limited real-life data instances, and the results may lack generalisability. Therefore, researchers are encouraged to test them further on more real-life data instances.

Practical implications

The paper includes implications and guidelines for the development of efficient GPS data storage and high-quality travel time prediction under different types of travel conditions.

Originality/value

This paper systematically studies the combinatorial feature effects for tree-ensemble-based travel time prediction approaches.

Details

VINE Journal of Information and Knowledge Management Systems, vol. 49 no. 3
Type: Research Article
ISSN: 2059-5891

Article
Publication date: 6 June 2008

Norbert Tóth and Béla Pataki

Abstract

Purpose

The purpose of this paper is to provide a classification confidence value for every individual sample classified by decision trees and to use this value to combine the classifiers.

Design/methodology/approach

The proposed system is first explained theoretically, and then its use and effectiveness are demonstrated on sample datasets.

Findings

In this paper, a novel method is proposed to combine decision tree classifiers using calculated classification confidence values. This confidence in the classification is based on distance calculation to the relevant decision boundary (distance conditional), probability density estimation and (distance conditional) classification confidence estimation. It is shown that these values – provided by individual classification trees – can be integrated to derive a consensus decision.
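
The sketch below only illustrates the general idea of combining several trees through per-sample, probability-weighted votes; it uses leaf class probabilities as a crude confidence proxy and does not implement the paper's distance-to-boundary, density-based confidence estimation.

```python
# Hedged sketch of confidence-weighted voting across decision trees.
# The per-sample confidence here is a crude proxy (the tree's leaf class probability),
# not the paper's distance-conditional, density-based confidence estimate.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2_000, n_features=10, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

# Train several trees on bootstrap samples of the training data.
rng = np.random.default_rng(2)
trees = []
for _ in range(15):
    idx = rng.integers(0, len(X_train), len(X_train))
    trees.append(DecisionTreeClassifier(max_depth=5).fit(X_train[idx], y_train[idx]))

# Each tree contributes its class-probability vector as a confidence-weighted vote.
votes = np.mean([t.predict_proba(X_test) for t in trees], axis=0)
consensus = votes.argmax(axis=1)
print("ensemble accuracy:", (consensus == y_test).mean())
```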

Research limitations/implications

The proposed method is not limited to axis-parallel trees; it is applicable not only to oblique trees but also to any kind of classifier system that uses hyperplanes to cluster the input space.

Originality/value

A novel method is presented to extend decision-tree-like classifiers with confidence calculation, and a voting system is proposed that uses this confidence information. The proposed system possesses several novelties (e.g. it gives not only class probabilities but also classification confidences) and advantages over previous (traditional) approaches. The voting system does not require an auxiliary combiner or gating network, as in the mixture-of-experts structure, and the method is not limited to decision trees with axis-parallel splits; it is applicable to any kind of classifier that uses hyperplanes to cluster the input space.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 1 no. 2
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 3 April 2024

Samar Shilbayeh and Rihab Grassa

Abstract

Purpose

Bank creditworthiness refers to the evaluation of a bank’s ability to meet its financial obligations. It is an assessment of the bank’s financial health, stability and capacity to manage risks. This paper aims to investigate the credit rating patterns that are crucial for assessing the creditworthiness of Islamic banks, thereby evaluating the stability of their industry.

Design/methodology/approach

Three distinct machine learning algorithms are exploited and evaluated for the desired objective. This research initially uses the decision tree machine learning algorithm as a base learner, conducting an in-depth comparison with the ensemble decision tree and Random Forest. Subsequently, the Apriori algorithm is deployed to uncover the most significant attributes impacting a bank’s credit rating. To appraise the previously elucidated models, a ten-fold cross-validation method is applied. This method involves segmenting the data set into ten folds, with nine used for training and one for testing, rotating the test fold across ten iterations. This approach aims to mitigate any potential biases that could arise during the learning and training phases. Following this process, the accuracy is assessed and depicted in a confusion matrix as outlined in the methodology section.
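
A minimal sketch of this evaluation protocol, ten-fold cross-validation of a decision tree against a random forest with a confusion matrix, is given below; the synthetic data and settings are placeholders and do not reproduce the paper's bank dataset or its Apriori analysis.

```python
# Hedged sketch: ten-fold cross-validation of a decision tree vs a random forest,
# with a confusion matrix for the random forest. Synthetic data stands in for the
# Islamic-bank credit-rating features; nothing here reproduces the paper's dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1_500, n_features=25, n_classes=3,
                           n_informative=10, random_state=3)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=3)

for name, model in [("decision tree", DecisionTreeClassifier(random_state=3)),
                    ("random forest", RandomForestClassifier(n_estimators=300, random_state=3))]:
    acc = cross_val_score(model, X, y, cv=cv).mean()
    print(f"{name}: mean 10-fold accuracy = {acc:.3f}")

# Out-of-fold predictions aggregated into a confusion matrix.
rf_pred = cross_val_predict(RandomForestClassifier(n_estimators=300, random_state=3), X, y, cv=cv)
print(confusion_matrix(y, rf_pred))
```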

Findings

The findings of this investigation reveal that the Random Forest machine learning algorithm outperforms the others, achieving an impressive 90.5% accuracy in predicting credit ratings. Notably, our research sheds light on the significance of the loan-to-deposit ratio as a primary attribute affecting credit rating predictions. Moreover, this study uncovers additional pivotal banking features that strongly impact the measurements under study. This paper’s findings provide evidence that the loan-to-deposit ratio appears to be the single most influential bank attribute affecting credit rating prediction. In addition, the deposit-to-assets ratio and the profit-sharing investment account ratio are found to be effective in credit rating prediction, and the ownership structure criterion emerges as one of the essential bank attributes in credit rating prediction.

Originality/value

These findings contribute significant evidence to the understanding of attributes that strongly influence credit rating predictions within the banking sector. This study uniquely contributes by uncovering patterns that have not been previously documented in the literature, broadening our understanding in this field.

Details

International Journal of Islamic and Middle Eastern Finance and Management, vol. 17 no. 2
Type: Research Article
ISSN: 1753-8394

Article
Publication date: 10 February 2022

Jameel Ahamed, Roohie Naaz Mir and Mohammad Ahsan Chishti

Abstract

Purpose

The world is shifting towards the fourth industrial revolution (Industry 4.0), symbolising the move to digital, fully automated habitats and cyber-physical systems. Industry 4.0 consists of innovative ideas and techniques in almost all sectors, including smart health care, which recommends technologies and mechanisms for early prediction of life-threatening diseases. Cardiovascular disease (CVD), which includes stroke, is one of the world’s leading causes of sickness and death. As per the American Heart Association, CVDs are a leading cause of death globally, and it is believed that COVID-19 also affected cardiovascular health, increasing the number of patients as a result. Early detection of such diseases is one of the solutions for a lower mortality rate. In this work, early prediction models for CVDs are developed with the help of machine learning (ML), a form of artificial intelligence that allows computers to learn and improve on their own without being explicitly programmed.

Design/methodology/approach

The proposed CVD prediction models are implemented with the help of ML techniques, namely, decision tree, random forest, k-nearest neighbours, support vector machine, logistic regression, AdaBoost and gradient boosting. To mitigate the effect of over-fitting and under-fitting problems, hyperparameter optimisation techniques are used to develop efficient disease prediction models. Furthermore, the ensemble technique using soft voting is also used to gain more insight into the data set and accurate prediction models.
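
The sketch below illustrates two of the ingredients named here, hyperparameter optimisation via a small grid search and a soft-voting ensemble, on synthetic data; the models, grids and features are assumptions, not the paper's heart-disease setup.

```python
# Hedged sketch: hyperparameter tuning of a base model plus a soft-voting ensemble.
# Synthetic data stands in for the heart-disease records; grids and models are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1_000, n_features=13, n_informative=8, random_state=4)

# Tune one base model with a small grid search (an example of hyperparameter optimisation).
rf_search = GridSearchCV(RandomForestClassifier(random_state=4),
                         {"n_estimators": [100, 300], "max_depth": [None, 5]}, cv=5)
rf_search.fit(X, y)

# Combine the tuned model with other base models via soft voting (averaged class probabilities).
ensemble = VotingClassifier(
    estimators=[("rf", rf_search.best_estimator_),
                ("lr", LogisticRegression(max_iter=2_000)),
                ("knn", KNeighborsClassifier(n_neighbors=7))],
    voting="soft",
)
print("soft-voting CV accuracy:", cross_val_score(ensemble, X, y, cv=5).mean())
```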

Findings

The models were developed to help health-care providers with the early diagnosis and prediction of heart disease patients, reducing the risk of developing severe disease. The heart disease risk evaluation model is built as a Jupyter Notebook web application, and its performance is calculated using unbiased indicators such as true positive rate, true negative rate, accuracy, precision, misclassification rate, area under the ROC curve and a cross-validation approach. The results revealed that the ensemble heart disease model outperforms the other proposed and implemented models.

Originality/value

The proposed CVD prediction models aim at predicting CVDs at an early stage, thereby allowing prevention and precautionary measures to be taken very early in the disease, in line with the predictive maintenance recommended in Industry 4.0. Prediction models are developed using the algorithms’ default values, hyperparameter optimisation and ensemble techniques.

Details

Industrial Robot: the international journal of robotics research and application, vol. 49 no. 3
Type: Research Article
ISSN: 0143-991X

Article
Publication date: 30 December 2020

Kushalkumar Thakkar, Suhas Suresh Ambekar and Manoj Hudnurkar

Abstract

Purpose

Longitudinal facial cracks (LFC) are one of the major defects occurring in the continuous-casting stage of a thin slab caster using funnel molds. Longitudinal cracks occur mainly owing to non-uniform cooling, varying thermal conductivity along the mold length, use of high superheat during casting and improper casting powder characteristics. These defects are difficult to capture and are visible only in the final stages of the process, or even at the customer end. In addition, there is seasonality associated with this defect: its intensity increases during the winter season. To address the issue, a model based on data analytics is developed.

Design/methodology/approach

Around six months of steel manufacturing process data are taken and around 60 data collection points are analyzed. The model uses different classification machine learning algorithms, such as logistic regression, decision tree, decision tree ensemble methods, support vector machine and Naïve Bayes (at different cut-off levels), to investigate the data.
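
A minimal sketch of evaluating classifiers at different probability cut-off levels is given below, using logistic regression and a random forest on synthetic, imbalanced data as placeholders for the casting-process measurements.

```python
# Hedged sketch: sweeping the probability cut-off for defect classification.
# Synthetic data stands in for the casting-process measurements; models are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3_000, n_features=60, n_informative=15,
                           weights=[0.9, 0.1], random_state=5)   # defects are the rare class
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=5)

for name, model in [("logistic regression", LogisticRegression(max_iter=2_000)),
                    ("random forest", RandomForestClassifier(n_estimators=300, random_state=5))]:
    proba = model.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    for cutoff in np.arange(0.5, 0.85, 0.1):
        pred = (proba >= cutoff).astype(int)
        print(f"{name}, cut-off {cutoff:.1f}: F1 = {f1_score(y_test, pred):.3f}")
```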

Findings

The proposed research framework shows that most models give good results at cut-off levels between 0.6 and 0.8, and that random forest, gradient boosting of decision trees and the support vector machine model perform better than the other models.

Practical implications

Based on the model's predictions, steel manufacturing companies can identify the optimal operating range within which this defect can be reduced.

Originality/value

An analytical approach to identifying LFC defects provides objective models for the reduction of LFC defects. By reducing LFC defects, the quality of steel can be improved.

Details

International Journal of Innovation Science, vol. 13 no. 1
Type: Research Article
ISSN: 1757-2223

Article
Publication date: 6 October 2023

Vahide Bulut

Abstract

Purpose

Feature extraction from 3D datasets is a current problem. Machine learning is an important tool for classification of complex 3D datasets. Machine learning classification techniques are widely used in various fields, such as text classification, pattern recognition, medical disease analysis, etc. The aim of this study is to apply the most popular classification and regression methods to determine the best classification and regression method based on the geodesics.

Design/methodology/approach

The feature vector is determined by the unit normal vector and the unit principal vector at each point of the 3D surface along with the point coordinates themselves. Moreover, different examples are compared according to the classification methods in terms of accuracy and the regression algorithms in terms of R-squared value.

Findings

Several surface examples are analyzed for the feature vector using classification (31 methods) and regression (23 methods) machine learning algorithms. In addition, two ensemble methods, XGBoost and LightGBM, are used for classification and regression. The scores for each surface example are also compared.
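
The sketch below illustrates how such a comparison might look for a 9-dimensional per-point feature vector (point coordinates, unit normal, unit principal direction), scoring XGBoost and LightGBM for classification accuracy and regression R-squared; the feature values and targets are random placeholders rather than quantities computed from an actual surface.

```python
# Hedged sketch: classification and regression on a 9-dimensional per-point feature
# vector (point coordinates, unit normal, unit principal direction). The features here
# are random placeholders, not computed from a real 3D surface or its geodesics.
import numpy as np
from lightgbm import LGBMClassifier, LGBMRegressor
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier, XGBRegressor

rng = np.random.default_rng(6)
X = rng.normal(size=(2_000, 9))                     # [x, y, z, n_x, n_y, n_z, t_x, t_y, t_z]
y_class = (X[:, :3].sum(axis=1) > 0).astype(int)    # placeholder class label
y_reg = X @ rng.normal(size=9) + rng.normal(0, 0.1, 2_000)   # placeholder regression target

Xtr, Xte, ytr_c, yte_c, ytr_r, yte_r = train_test_split(X, y_class, y_reg, random_state=6)

for clf in (XGBClassifier(n_estimators=200), LGBMClassifier(n_estimators=200)):
    acc = accuracy_score(yte_c, clf.fit(Xtr, ytr_c).predict(Xte))
    print(type(clf).__name__, f"accuracy = {acc:.3f}")

for reg in (XGBRegressor(n_estimators=200), LGBMRegressor(n_estimators=200)):
    r2 = r2_score(yte_r, reg.fit(Xtr, ytr_r).predict(Xte))
    print(type(reg).__name__, f"R^2 = {r2:.3f}")
```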

Originality/value

To the best of the author’s knowledge, this is the first study to analyze datasets based on geodesics using machine learning algorithms for classification and regression.

Details

Engineering Computations, vol. 40 no. 9/10
Type: Research Article
ISSN: 0264-4401

Article
Publication date: 18 October 2022

Hasnae Zerouaoui, Ali Idri and Omar El Alaoui

Abstract

Purpose

Hundreds of thousands of deaths each year in the world are caused by breast cancer (BC). An early-stage diagnosis of this disease can positively reduce the morbidity and mortality rate by helping to select the most appropriate treatment options, especially by using histological BC images for the diagnosis.

Design/methodology/approach

The present study proposes and evaluates a novel approach consisting of 24 deep hybrid heterogeneous ensembles that combine the strength of seven deep learning techniques (DenseNet 201, Inception V3, VGG16, VGG19, Inception-ResNet-V3, MobileNet V2 and ResNet 50) for feature extraction with four well-known classifiers (multi-layer perceptron, support vector machines, K-nearest neighbors and decision tree), by means of hard and weighted voting combination methods, for histological classification of BC medical images. Furthermore, the best deep hybrid heterogeneous ensembles were compared to deep stacked ensembles to determine the best strategy for designing deep ensemble methods. The empirical evaluations used four classification performance criteria (accuracy, sensitivity/recall, precision and F1-score) and fivefold cross-validation over the histological BreakHis public dataset with four magnification factors (40×, 100×, 200× and 400×). The Scott–Knott (SK) statistical test and the Borda count voting method were used to cluster the designed techniques and to rank the techniques belonging to the best SK cluster, respectively.
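
A rough sketch of the hybrid-ensemble idea is given below: classical classifiers trained on (assumed pre-extracted) deep features and combined by hard and weighted soft voting. The feature matrix, labels and voting weights are placeholders; the structure, not the numbers, is the point.

```python
# Hedged sketch: hybrid ensemble of classical classifiers over CNN-extracted features,
# combined by hard and weighted (soft) voting. The feature matrix is a random placeholder
# standing in for deep embeddings (e.g. DenseNet201 outputs) of histological images, and
# the labels are random, so the reported accuracy is only chance level.
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(7)
X = rng.normal(size=(800, 128))                   # placeholder deep features
y = rng.integers(0, 2, 800)                       # placeholder benign/malignant labels

estimators = [("mlp", MLPClassifier(max_iter=1_000, random_state=7)),
              ("svm", SVC(probability=True, random_state=7)),
              ("knn", KNeighborsClassifier()),
              ("dt", DecisionTreeClassifier(random_state=7))]

hard = VotingClassifier(estimators, voting="hard")
weighted = VotingClassifier(estimators, voting="soft", weights=[2, 2, 1, 1])  # illustrative weights

for name, ens in [("hard voting", hard), ("weighted voting", weighted)]:
    print(name, "CV accuracy:", cross_val_score(ens, X, y, cv=5).mean())
```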

Findings

Results showed that the deep hybrid heterogeneous ensembles outperformed both their single constituent models and the deep stacked ensembles, reaching accuracy values of 96.3, 95.6, 96.3 and 94 per cent across the four magnification factors 40×, 100×, 200× and 400×, respectively.

Originality/value

The proposed deep hybrid heterogeneous ensembles can be applied to BC diagnosis to assist pathologists in reducing missed diagnoses and proposing adequate treatments for patients.

Details

Data Technologies and Applications, vol. 57 no. 2
Type: Research Article
ISSN: 2514-9288

Book part
Publication date: 11 September 2020

D. K. Malhotra, Kunal Malhotra and Rashmi Malhotra

Abstract

Traditionally, loan officers use different credit scoring models to complement judgmental methods to classify consumer loan applications. This study explores the use of decision trees, AdaBoost, and support vector machines (SVMs) to identify potential bad loans. Our results show that AdaBoost provides an improvement over simple decision trees as well as SVM models in predicting both good and bad credit clients. To cross-validate our results, we use a k-fold cross-validation methodology.
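
A minimal sketch of this comparison, a single decision tree, AdaBoost and an SVM scored with k-fold cross-validation on a synthetic, imbalanced credit task, is given below; the data and settings are illustrative assumptions, not the study's loan dataset.

```python
# Hedged sketch: AdaBoost vs a single decision tree vs an SVM on a synthetic credit-scoring
# task, compared with k-fold cross-validation. The data are illustrative, not the study's.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1_000, n_features=20, n_informative=10,
                           weights=[0.7, 0.3], random_state=8)   # "good" vs "bad" loans

for name, model in [("decision tree", DecisionTreeClassifier(random_state=8)),
                    ("AdaBoost", AdaBoostClassifier(n_estimators=200, random_state=8)),
                    ("SVM", SVC(random_state=8))]:
    scores = cross_val_score(model, X, y, cv=10)
    print(f"{name}: mean 10-fold accuracy = {scores.mean():.3f}")
```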
