Search results

1 – 10 of 711

Details

Machine Learning and Artificial Intelligence in Marketing and Sales
Type: Book
ISBN: 978-1-80043-881-1

Book part
Publication date: 15 March 2021

Jochen Hartmann

Abstract

Across disciplines, researchers and practitioners employ decision tree ensembles such as random forests and XGBoost with great success. What explains their popularity? This chapter showcases how marketing scholars and decision-makers can harness the power of decision tree ensembles for academic and practical applications. The author discusses the origin of decision tree ensembles, explains their theoretical underpinnings, and illustrates them empirically using a real-world telemarketing case, with the objective of predicting customer conversions. Readers unfamiliar with decision tree ensembles will learn to appreciate them for their versatility, competitive accuracy, ease of application, and computational efficiency, and will gain a comprehensive understanding of why decision tree ensembles belong in every data scientist's methodological toolbox.
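
As a rough illustration of the kind of workflow the chapter describes, the following Python sketch fits two decision tree ensembles (a random forest and a gradient-boosted model) to a hypothetical telemarketing dataset; the file name and the "converted" label column are assumptions for illustration, not the chapter's actual data.

```python
# Hypothetical telemarketing data: feature columns plus a binary
# "converted" label (assumed names, for illustration only).
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("telemarketing.csv")               # assumed file
X = pd.get_dummies(df.drop(columns=["converted"]))  # one-hot encode categoricals
y = df["converted"]

for name, model in [
    ("random forest", RandomForestClassifier(n_estimators=500, random_state=0)),
    ("gradient boosting", GradientBoostingClassifier(random_state=0)),
]:
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean ROC AUC = {scores.mean():.3f}")
```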

Details

The Machine Age of Customer Insight
Type: Book
ISBN: 978-1-83909-697-6

Content available
Book part
Publication date: 6 September 2019

Son Nguyen, Gao Niu, John Quinn, Alan Olinsky, Jonathan Ormsbee, Richard M. Smith and James Bishop

Abstract

In recent years, the problem of classification with imbalanced data has been growing in popularity in the data-mining and machine-learning communities due to the emergence of an abundance of imbalanced data in many fields. In this chapter, we compare the performance of six classification methods on an imbalanced dataset under the influence of four resampling techniques. These classification methods are the random forest, the support vector machine, logistic regression, k-nearest neighbor (KNN), the decision tree, and AdaBoost. Our study has shown that all of the classification methods have difficulty when working with the imbalanced data, with the KNN performing the worst, detecting only 27.4% of the minority class. However, with the help of resampling techniques, all of the classification methods improve their overall performance. In particular, the random forest, in combination with the random over-sampling technique, performs the best, achieving 82.8% balanced accuracy (the average of the true-positive rate and true-negative rate).

We then propose a new procedure to resample the data. Our method is based on the idea of eliminating “easy” majority observations before under-sampling them. It has further improved the balanced accuracy of the random forest to 83.7%, making it the best approach for the imbalanced data.
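
The sketch below illustrates one plausible version of this set-up in scikit-learn: randomly over-sampling the minority class in the training split and scoring a random forest with balanced accuracy. The synthetic data is a stand-in for the chapter's dataset, and the sketch is not the authors' exact procedure (in particular, it omits their proposed majority-elimination step).

```python
# Random over-sampling of the minority class, then a random forest scored
# with balanced accuracy. Synthetic imbalanced data stands in for the
# chapter's dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Over-sample only the training split so the test set keeps its
# original, imbalanced class distribution.
classes, counts = np.unique(y_train, return_counts=True)
minority, majority = classes[np.argmin(counts)], classes[np.argmax(counts)]
X_min_up, y_min_up = resample(
    X_train[y_train == minority], y_train[y_train == minority],
    replace=True, n_samples=int(counts.max()), random_state=0)

X_bal = np.vstack([X_train[y_train == majority], X_min_up])
y_bal = np.concatenate([y_train[y_train == majority], y_min_up])

clf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_bal, y_bal)
print("balanced accuracy:", balanced_accuracy_score(y_test, clf.predict(X_test)))
```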

Details

Advances in Business and Management Forecasting
Type: Book
ISBN: 978-1-78754-290-7

Book part
Publication date: 30 September 2020

Hera Khan, Ayush Srivastav and Amit Kumar Mishra

Abstract

A detailed description will be provided of the classification algorithms that have been widely used in the domain of medical science. The foundation will be laid by giving a comprehensive overview of the background and history of classification algorithms. This will be followed by an extensive discussion of the various classification techniques in machine learning (ML), concluding with their relevant applications to data analysis in medical science and health care. The initial sections of this chapter deal with the basic fundamentals required for a profound understanding of classification techniques in ML, comprising the underlying differences between Unsupervised and Supervised Learning, the basic terminology of classification, and its history. Further, the chapter covers the types of classification algorithms, ranging from linear classifiers such as Logistic Regression and Naïve Bayes to Nearest Neighbour, Support Vector Machine, Tree-based Classifiers, and Neural Networks, and their respective mathematics. Ensemble algorithms such as Majority Voting, Boosting, Bagging, and Stacking will also be discussed at great length along with their relevant applications. Furthermore, this chapter elucidates the areas of application of such classification algorithms in the field of biomedicine and health care and their contribution to decision-making systems and predictive analysis. To conclude, this chapter will be of value for research and development, as it provides a thorough insight into classification algorithms and their applications in the health care sector.
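
As a small companion example, the following scikit-learn sketch builds a majority-voting ensemble over several of the classifier families the chapter lists, using scikit-learn's bundled breast cancer dataset as a stand-in for a medical dataset (an assumption for illustration only).

```python
# Majority-voting ensemble over several classifier families, with
# scikit-learn's bundled breast cancer data standing in for a medical dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

vote = VotingClassifier(
    estimators=[
        ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("nb", GaussianNB()),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
        ("rf", RandomForestClassifier(n_estimators=300, random_state=0)),
    ],
    voting="hard",  # plain majority vote over the individual predictions
)
print("mean CV accuracy:", cross_val_score(vote, X, y, cv=5).mean())
```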

Details

Big Data Analytics and Intelligence: A Perspective for Health Care
Type: Book
ISBN: 978-1-83909-099-8

Article
Publication date: 4 March 2020

Nesreen El-Rayes, Ming Fang, Michael Smith and Stephen M. Taylor

Abstract

Purpose

The purpose of this study is to develop tree-based binary classification models to predict the likelihood of employee attrition based on firm cultural and management attributes.

Design/methodology/approach

A data set of resumes anonymously submitted through Glassdoor’s online portal is used in tandem with public company review information to fit decision tree, random forest and gradient boosted tree models to predict the probability of an employee leaving a firm during a job transition.

Findings

Random forest and decision tree methods are found to be the strongest attrition prediction models. In addition, compensation, company culture and senior management performance play a primary role in an employee’s decision to leave a firm.

Practical implications

This study may be used by human resources staff to better understand factors which influence employee attrition. In addition, techniques developed in this study may be applied to company-specific data sets to construct customized attrition models.

Originality/value

This study contains several novel contributions, which include exploratory analyses such as industry job-transition percentages, distributional comparisons of the factors most strongly contributing to attrition between employees who left and those who stayed with the firm, and the first comprehensive search over binary classification models to identify which provides the strongest predictive performance for employee attrition.
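
For readers who want a concrete starting point, the following hedged Python sketch compares the three tree-based classifiers used in the study on a synthetic binary attrition problem; the data is a placeholder, since the Glassdoor set used in the paper is not reproduced here.

```python
# Comparing decision tree, random forest and gradient boosted tree
# classifiers on a placeholder binary "attrition" problem (synthetic data,
# not the Glassdoor set used in the study).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=15, weights=[0.8, 0.2],
                           random_state=0)  # y = 1: employee left the firm

models = {
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=500, random_state=0),
    "gradient boosted trees": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean ROC AUC = {auc.mean():.3f}")
```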

Details

International Journal of Organizational Analysis, vol. 28 no. 6
Type: Research Article
ISSN: 1934-8835

Article
Publication date: 1 July 2020

Jiaming Liu, Liuan Wang, Linan Zhang, Zeming Zhang and Sicheng Zhang

Abstract

Purpose

The primary objective of this study was to recognize critical indicators in predicting blood glucose (BG) through data-driven methods and to compare the prediction performance of four tree-based ensemble models, i.e. bagging with tree regressors (bagging-decision tree [Bagging-DT]), AdaBoost with tree regressors (Adaboost-DT), random forest (RF) and gradient boosting decision tree (GBDT).

Design/methodology/approach

This study proposed a majority voting feature selection method by combining lasso regression with the Akaike information criterion (AIC) (LR-AIC), lasso regression with the Bayesian information criterion (BIC) (LR-BIC) and RF to select indicators with excellent predictive performance from the initial 38 indicators in 5,642 samples. The selected features were deployed to build the tree-based ensemble models. The 10-fold cross-validation (CV) method was used to evaluate the performance of each ensemble model.

Findings

The results of feature selection indicated that age, corpuscular hemoglobin concentration (CHC), red blood cell volume distribution width (RBCVDW), red blood cell volume and leucocyte count are the five most important clinical/physical indicators in BG prediction. Furthermore, this study also found that the GBDT ensemble model combined with the proposed majority voting feature selection method is better than the other three models with respect to prediction performance and stability.

Practical implications

This study proposed a novel BG prediction framework for better predictive analytics in health care.

Social implications

This study incorporated medical background and machine learning technology to reduce diabetes morbidity and formulate precise medical schemes.

Originality/value

The majority voting feature selection method combined with the GBDT ensemble model provides an effective decision-making tool for predicting BG and detecting diabetes risk in advance.
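
The sketch below gives one possible reading of such a pipeline in scikit-learn: three selectors vote on features (lasso with AIC, lasso with BIC, and random forest importances), and a GBDT model is then evaluated with 10-fold CV on the majority-selected subset. The synthetic data, the top-k cut-off and the selector details are assumptions, not the authors' exact implementation.

```python
# One reading of the pipeline: three selectors (lasso-AIC, lasso-BIC, random
# forest importances) vote on features, then a GBDT regressor is evaluated
# with 10-fold CV on the majority-selected subset. Synthetic data with 38
# features stands in for the clinical indicators.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LassoLarsIC
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=2000, n_features=38, n_informative=5,
                       noise=10.0, random_state=0)
k = 5                               # features kept per selector (illustrative choice)
votes = np.zeros(X.shape[1])

for criterion in ("aic", "bic"):    # lasso with AIC / BIC model selection
    lasso = LassoLarsIC(criterion=criterion).fit(X, y)
    votes[np.argsort(np.abs(lasso.coef_))[-k:]] += 1

rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
votes[np.argsort(rf.feature_importances_)[-k:]] += 1

selected = np.where(votes >= 2)[0]  # majority vote: at least 2 of 3 selectors
if selected.size == 0:              # fallback if the selectors disagree entirely
    selected = np.argsort(votes)[-k:]

gbdt = GradientBoostingRegressor(random_state=0)
scores = cross_val_score(gbdt, X[:, selected], y, cv=10, scoring="r2")
print("selected feature indices:", selected)
print("mean 10-fold CV R^2:", scores.mean())
```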

Article
Publication date: 13 November 2019

Sasanka Choudhury, Dhirendra Nath Thatoi, Jhalak Hota and Mohan D. Rao

Abstract

Purpose

To avoid structural defects, early crack detection is one of the important aspects of current research. The purpose of this paper is to detect a crack, in terms of its position and severity, before failure occurs.

Design/methodology/approach

This paper uses two tree-based regressors, namely, the decision tree (DT) regressor and the random forest (RF) regressor, chosen for their ability to handle different types of parameters and to generate simple rules. This makes it possible to effectively predict crack parameters such as location and depth, with good accuracy, before failure of the beam.

Findings

The crack parameters can be predicted once the relationship between the vibration and crack parameters is established. This relationship yields the beam natural frequencies, which are used as the input values for the regression techniques. It is observed that the RF regressor predicts the parameters with better accuracy than the DT regressor.

Originality/value

The idea is to use the developed regression techniques to identify the crack parameters; since prediction is essentially a regression task, this is more effective than other developed methods. The authors have used the DT regressor and the RF regressor to achieve this target. In this paper, care has been given to the generalization of the model so that its adaptability can be ensured. The robustness of the proposed methods has been verified through numerical and experimental analysis.
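
A minimal Python sketch of this kind of set-up is given below: tree-based regressors mapping a beam's natural frequencies to crack location and depth. The toy data-generating model is a placeholder, not the paper's numerical or experimental beam model.

```python
# Tree-based regressors mapping natural frequencies to crack location and
# depth. The toy "forward model" below (frequencies dropping as the crack
# grows) is a placeholder, not the paper's beam model or measurements.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
targets = rng.uniform(0.1, 0.9, size=(500, 2))   # columns: [location, depth]
freqs = np.column_stack([
    10 - 3 * targets[:, 0] * targets[:, 1],
    25 - 5 * targets[:, 1],
    60 - 8 * targets[:, 0],
]) + rng.normal(scale=0.1, size=(500, 3))        # first three natural frequencies

X_train, X_test, y_train, y_test = train_test_split(
    freqs, targets, test_size=0.2, random_state=0)

for name, reg in [
    ("decision tree", DecisionTreeRegressor(random_state=0)),
    ("random forest", RandomForestRegressor(n_estimators=500, random_state=0)),
]:
    reg.fit(X_train, y_train)   # both regressors handle multi-output targets
    print(f"{name}: R^2 = {reg.score(X_test, y_test):.3f}")
```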

Details

International Journal of Structural Integrity, vol. 11 no. 6
Type: Research Article
ISSN: 1757-9864

Book part
Publication date: 4 December 2020

Gauri Rajendra Virkar and Supriya Sunil Shinde

Abstract

Predictive analytics is the science of decision-making that eliminates guesswork from the decision-making process and applies proven scientific procedures to find the right solutions. Predictive analytics provides insight into the occurrence of future downtimes and rejections and thereby aids in taking preventive actions before abnormalities occur. Considering these advantages, predictive analytics has been adopted in diverse fields such as health care, finance, education, marketing, automotive, etc. Predictive analytics tools can be used to predict various behaviors and patterns, thereby saving the time and money of their users. Many open-source predictive analytics tools, namely R, scikit-learn, Konstanz Information Miner (KNIME), Orange, RapidMiner, Waikato Environment for Knowledge Analysis (WEKA), etc., are freely available to users. This chapter aims to identify the most accurate tools and techniques for the classification task that aid in decision-making. Our experimental results show that no specific tool provides the best results in all scenarios; rather, performance depends upon the dataset and the classifier.
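
The following scikit-learn sketch shows the general shape of such a benchmark (several classifiers cross-validated on several datasets); it uses scikit-learn's bundled toy datasets as stand-ins and is not the chapter's actual experimental setup.

```python
# Several classifiers cross-validated on several datasets, using
# scikit-learn's bundled toy datasets as stand-ins for the chapter's data.
from sklearn.datasets import load_breast_cancer, load_iris, load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

datasets = {"iris": load_iris, "wine": load_wine, "breast cancer": load_breast_cancer}
classifiers = {
    "logistic regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "naive Bayes": GaussianNB(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=300, random_state=0),
}
for dname, loader in datasets.items():
    X, y = loader(return_X_y=True)
    for cname, clf in classifiers.items():
        acc = cross_val_score(clf, X, y, cv=5).mean()
        print(f"{dname:>13} | {cname:<19} mean accuracy = {acc:.3f}")
```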

Article
Publication date: 26 June 2020

Jamal Al Qundus, Adrian Paschke, Shivam Gupta, Ahmad M. Alzouby and Malik Yousef

Abstract

Purpose

The purpose of this paper is to explore to what extent the quality of short social media texts can be investigated without extensions, and what the predictors, if any, are of such short texts that lead readers to trust their content.

Design/methodology/approach

The paper applies a trust model to classify data collections, based on metadata, into four classes: Very Trusted, Trusted, Untrusted and Very Untrusted. These data are collected from the online communities Genius and Stack Overflow. In order to evaluate short texts in terms of their trust levels, the authors have conducted two investigations: (1) a natural language processing (NLP) approach to extract relevant features (i.e. part-of-speech and various readability indexes), for which the authors report relatively good performance; and (2) a machine learning technique, more precisely a random forest (RF) classifier using a bag-of-words (BoW) model.

Findings

The investigation of the RF classifier using BoW shows promising intermediate results (on average, 62% accuracy across both online communities) in identifying short-text quality that leads to trust.

Practical implications

As social media becomes an increasingly attractive source of information, which is mostly provided in the form of short texts, businesses (e.g. search engines for smart data) can filter content without having to apply complex approaches and can continue to deal with information that is considered more trustworthy.

Originality/value

Short-text classifications with regard to a criterion (e.g. quality, readability) are usually supported by an external source or by the text's metadata. This enhancement either changes the original text, if additional text from an external source is appended, or requires text metadata that is not always available. The originality of this study therefore lies in investigating the quality of short texts (i.e. social media texts) without extending or modifying them using external sources, since such modification alters the text and distorts the results of the investigation.
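
A minimal sketch of the BoW-plus-RF set-up described above might look as follows in scikit-learn; the `texts` and `labels` variables are assumed placeholders rather than the Genius or Stack Overflow collections used in the paper.

```python
# Bag-of-words features fed to a random forest classifier. The `texts` and
# `labels` lists are assumed placeholders, not the paper's data collections.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# texts:  list of short social media posts (assumed to be defined elsewhere)
# labels: classes in {"Very Trusted", "Trusted", "Untrusted", "Very Untrusted"}
bow_rf = make_pipeline(
    CountVectorizer(lowercase=True, min_df=2),                  # BoW features
    RandomForestClassifier(n_estimators=300, random_state=0),   # RF classifier
)
print("mean CV accuracy:", cross_val_score(bow_rf, texts, labels, cv=5).mean())
```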

Details

Journal of Enterprise Information Management, vol. 33 no. 6
Type: Research Article
ISSN: 1741-0398

Book part
Publication date: 10 November 2017

Karen Miller

Abstract

This chapter explores differences in fringe, distant, and remote rural public library assets for asset-based community development (ABCD) and the relationships of those assets to geographic regions, governance structures, and demographics.

The author analyzes 2013 data from the Institute of Museum and Library Services (IMLS) and U.S. Department of Agriculture using nonparametric statistics and data mining random forest supervised classification algorithms.

There are statistically significant differences between fringe, distant, and remote library assets. Unexpectedly, median per capita outlets (along with service hours and staff) increase as distances from urban areas increase. The Southeast region ranks high in unemployment and poverty and low in median household income, which aligns with the Southeast’s low median per capita library expenditures, staff, hours, inventory, and programs. However, the Southeast’s relatively high percentage of rural libraries with at least one staff member with a Master of Library and Information Science promises future asset growth in those libraries. State and federal contributions to Alaska libraries propelled the remote Far West to the number one ranking in median per capita staff, inventory, and programs.

This study is based on IMLS library system-wide data and does not include rural library branches operated by nonrural central libraries.

State and federal contributions to rural libraries increase economic, cultural, and social capital creation in the most remote communities. On a per capita basis, economic capital from state and federal agencies assists small, remote rural libraries in providing infrastructure and services that are more closely aligned with libraries in more populated areas and increases library assets available for ABCD initiatives in otherwise underserved communities.

Even the smallest rural library can contribute to ABCD initiatives by connecting their communities to outside resources and creating new economic, cultural, and social assets.

Analyzing rural public library assets within their geographic, political, and demographic contexts highlights their potential contributions to ABCD initiatives.
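
As a hedged illustration of the random forest supervised classification step, the following Python sketch predicts a library's locale class from a few per-capita asset variables and inspects feature importances; the file and column names are assumptions for illustration, not the IMLS field names.

```python
# Random forest supervised classification of a library's locale class
# (fringe / distant / remote) from per-capita asset variables, plus feature
# importances. File and column names are assumptions, not IMLS field names.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("rural_libraries.csv")                   # assumed file
features = ["outlets_pc", "hours_pc", "staff_pc", "expend_pc", "programs_pc"]
X, y = df[features], df["locale"]                         # locale: fringe/distant/remote

rf = RandomForestClassifier(n_estimators=500, random_state=0)
print("mean CV accuracy:", cross_val_score(rf, X, y, cv=5).mean())

importances = pd.Series(rf.fit(X, y).feature_importances_, index=features)
print(importances.sort_values(ascending=False))
```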

Details

Rural and Small Public Libraries: Challenges and Opportunities
Type: Book
ISBN: 978-1-78743-112-6
