Search results

1 – 10 of over 1000
Article
Publication date: 6 June 2008

Norbert Tóth and Béla Pataki

Abstract

Purpose

The purpose of this paper is to provide a classification confidence value for every individual sample classified by decision trees and to use this value to combine the classifiers.

Design/methodology/approach

The proposed system is first explained theoretically, and then its use and effectiveness are demonstrated on sample datasets.

Findings

In this paper, a novel method is proposed to combine decision tree classifiers using calculated classification confidence values. This confidence in the classification is based on distance calculation to the relevant decision boundary (distance conditional), probability density estimation and (distance conditional) classification confidence estimation. It is shown that these values – provided by individual classification trees – can be integrated to derive a consensus decision.
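The consensus idea described above can be sketched in a few lines of Python. This is only an illustration and not the authors' method: the single-split classifier `stump_predict`, the `scale` parameter and the way boundary distance is squashed into a confidence score are all assumptions standing in for the paper's distance-conditional density and confidence estimation.

```python
from collections import defaultdict


def stump_predict(x, feature, threshold, low_label, high_label, scale=1.0):
    """Classify x with one axis-parallel split and attach a confidence.

    Assumption: confidence grows with the distance to the split threshold
    and is squashed into [0, 1). The paper estimates it from data instead.
    """
    distance = abs(x[feature] - threshold)
    confidence = distance / (distance + scale)
    label = low_label if x[feature] <= threshold else high_label
    return label, confidence


def consensus(votes):
    """Sum the confidences per label and return the label with the largest total."""
    totals = defaultdict(float)
    for label, confidence in votes:
        totals[label] += confidence
    return max(totals, key=totals.get)


sample = {"a": 0.9, "b": 5.0}
votes = [
    stump_predict(sample, "a", 1.0, "neg", "pos"),   # sample lies close to this boundary
    stump_predict(sample, "a", 0.95, "neg", "pos"),  # sample lies close to this boundary
    stump_predict(sample, "b", 2.0, "neg", "pos"),   # sample lies far from this boundary
]
decision = consensus(votes)
```

In this toy case a plain majority vote would return "neg", while the confidence-weighted consensus follows the one classifier whose boundary is far from the sample.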

Research limitations/implications

The proposed method is not limited to axis-parallel trees; it is applicable to oblique trees and to any kind of classifier system that uses hyperplanes to cluster the input space.

Originality/value

A novel method is presented to extend decision-tree-like classifiers with confidence calculation, and a voting system is proposed that uses this confidence information. The proposed system possesses several novelties (e.g. it gives not only class probabilities but also classification confidences) and advantages over previous (traditional) approaches. The voting system does not require an auxiliary combiner or gating network, as the mixture of experts structure does, and the method is not limited to decision trees with axis-parallel splits; it is applicable to any kind of classifier that uses hyperplanes to cluster the input space.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 1 no. 2
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 30 October 2018

Shrawan Kumar Trivedi and Prabin Kumar Panigrahi

Abstract

Purpose

Email spam classification is now becoming a challenging area in the domain of text classification. Precise and robust classifiers are not only judged by classification accuracy but also by sensitivity (correctly classified legitimate emails) and specificity (correctly classified unsolicited emails) towards the accurate classification, captured by both false positive and false negative rates. This paper aims to present a comparative study between various decision tree classifiers (such as AD tree, decision stump and REP tree) with/without different boosting algorithms (bagging, boosting with re-sample and AdaBoost).
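The evaluation measures named above can be computed from predicted and true labels as in the following sketch. Following the abstract's wording, legitimate email ("ham") is treated as the positive class; the function name `binary_rates` and the toy labels are illustrative assumptions, not material from the study.

```python
def binary_rates(y_true, y_pred, positive="ham"):
    """Return accuracy, sensitivity, specificity and error rates for two classes."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if truth == positive:
            if guess == positive:
                tp += 1  # legitimate email kept
            else:
                fn += 1  # legitimate email flagged as spam
        else:
            if guess == positive:
                fp += 1  # spam let through
            else:
                tn += 1  # spam caught

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "false_positive_rate": ratio(fp, fp + tn),
        "false_negative_rate": ratio(fn, fn + tp),
    }


# Toy labels, made up for illustration only.
y_true = ["ham", "ham", "ham", "spam", "spam", "spam", "spam", "ham"]
y_pred = ["ham", "ham", "spam", "spam", "spam", "ham", "spam", "ham"]
rates = binary_rates(y_true, y_pred)
```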

Design/methodology/approach

Artificial intelligence and text mining approaches have been incorporated in this study. Each decision tree classifier in this study is tested on informative words/features selected from the two publicly available data sets (SpamAssassin and LingSpam) using a greedy step-wise feature search method.

Findings

Outcomes of this study show that without boosting, the REP tree provides high performance accuracy with the AD tree ranking as the second-best performer. Decision stump is found to be the under-performing classifier of this study. However, with boosting, the combination of REP tree and AdaBoost compares favourably with other classification models. If the metrics false positive rate and performance accuracy are taken together, AD tree and REP tree with AdaBoost were both found to carry out an effective classification task. Greedy stepwise has proven its worth in this study by selecting a subset of valuable features to identify the correct class of emails.

Research limitations/implications

This research focusses on classifying spam emails written in English only. The proposed models work with the content (words/features) of email data, which is mostly found in the body of the mail. Image spam is not included in this study, nor are other message types such as short message service or multimedia messaging service.

Practical implications

In this research, a boosted decision tree approach has been proposed and used to classify email spam and ham files; this is found to be a highly effective approach in comparison with other state-of-the-art models used in other studies. This classifier may be tested for different applications and may provide new insights for developers and researchers.

Originality/value

A comparison of decision tree classifiers with/without ensemble has been presented for spam classification.

Details

Journal of Systems and Information Technology, vol. 20 no. 3
Type: Research Article
ISSN: 1328-7265

Article
Publication date: 31 July 2019

Zhe Zhang and Yue Dai

Abstract

Purpose

For classification problems of customer relationship management (CRM), the purpose of this paper is to propose a method with interpretability of the classification results that combines multiple decision trees based on a genetic algorithm.

Design/methodology/approach

In the proposed method, multiple decision trees are combined in parallel. Subsequently, a genetic algorithm is used to optimize the weight matrix in the combination algorithm.

Findings

The method is applied to customer credit rating assessment and customer response behavior pattern recognition. The results demonstrate that compared to a single decision tree, the proposed combination method improves the predictive accuracy and optimizes the classification rules, while maintaining interpretability of the classification results.

Originality/value

The findings of this study contribute to research methodologies in CRM. It specifically focuses on a new method with interpretability by combining multiple decision trees based on genetic algorithms for customer classification.

Details

Asia Pacific Journal of Marketing and Logistics, vol. 32 no. 5
Type: Research Article
ISSN: 1355-5855

Article
Publication date: 4 April 2022

Shrawan Kumar Trivedi, Amrinder Singh and Somesh Kumar Malhotra

Abstract

Purpose

There is a need to predict whether the consumers liked the stay in the hotel rooms or not, and to remove the aspects the customers did not like. Many customers leave a review after staying in the hotel. These reviews are mostly given on the website used to book the hotel. They can be considered valuable data, which can be analyzed to provide better services in the hotels. The purpose of this study is to use machine learning techniques to analyze the given data and determine the different sentiment polarities of the consumers.

Design/methodology/approach

The data consist of reviews given by hotel customers on the Tripadvisor website, which were made publicly available by Kaggle. Out of 10,000 reviews in the data, a sample of 3,000 negative polarity reviews (customers with bad experiences in the hotel) and 3,000 positive polarity reviews (customers with good experiences in the hotel) is taken to prepare the data set. A two-stage feature selection was applied, which involved first a greedy selection method and then a wrapper method, to generate the 37 most relevant features. An improved stacked decision tree (ISD) classifier is built, which is further compared with state-of-the-art machine learning algorithms. All the tests are done using R-Studio.

Findings

The results of an in-depth study showed that the new model was satisfactory overall, with 80.77% accuracy for a 50–50 split, 80.74% accuracy for a 66–34 split and 80.25% accuracy for an 80–20 split, when predicting whether the customers’ experience in the hotel was positive or negative.

Research limitations/implications

This research showcases how the polarity of potentially popular reviews can be predicted. From the authors’ perspective, this can help the hotel industry to take corrective measures for the betterment of business and to promote useful positive reviews. The study also has some limitations: only English reviews are considered, and the data are restricted to the Tripadvisor website, although new data may be generated to test the credibility of the model. Only aspect-based sentiment classification is considered in this study.

Originality/value

A stacking of machine learning techniques is proposed. First, state-of-the-art classifiers are tested on the given data; then, the three best-performing classifiers (decision tree C5.0, random forest and support vector machine) are taken to build the stack and create the ISD classifier.

Article
Publication date: 3 May 2016

Mohammad Fathian, Yaser Hoseinpoor and Behrouz Minaei-Bidgoli

Abstract

Purpose

Churn management is a fundamental process in firms to keep their customers. Therefore, predicting the customer’s churn is essential to facilitate such processes. The literature has introduced data mining approaches for this purpose. On the other hand, results indicate that performance of classification models increases by combining two or more techniques. The purpose of this paper is to propose a combined model based on clustering and ensemble classifiers.

Design/methodology/approach

Based on the Cell2Cell churn data set, single baseline classifiers and ensemble classifiers are used for comparison. Specifically, the self-organizing map (SOM) clustering technique and four classifier techniques (decision tree, artificial neural networks, support vector machine and K-nearest neighbors) were used. Moreover, principal component analysis (PCA) was employed to reduce the dimensionality of the features.

Findings

In the results, 14 models are compared with each other regarding accuracy, sensitivity, specificity, F-measure and AUC. The results showed that the combination of SOM, PCA and heterogeneous boosting achieved the best performance compared with the other classification models.
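Two of the comparison measures listed above, F-measure and AUC, can be sketched in plain Python as follows. The function names, the `beta` parameter and the toy scores are illustrative assumptions; the AUC here uses the standard pairwise-ranking formulation (ties count as half), which may differ from the tooling used in the study.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-measure from confusion counts; beta=1 gives the usual F1 score."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def auc(scores, labels):
    """Share of (positive, negative) pairs in which the positive scores higher."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy churn scores (1 = churner), made up for illustration only.
example_f1 = f_measure(tp=8, fp=2, fn=2)
example_auc = auc([0.9, 0.8, 0.35, 0.4, 0.1], [1, 1, 1, 0, 0])
```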

Originality/value

This study examined the performance of classifier ensembles in predicting customer churn. In particular, heterogeneous classifier ensembles such as bagging and boosting are compared.

Details

Kybernetes, vol. 45 no. 5
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 29 October 2018

Shrawan Kumar Trivedi and Shubhamoy Dey

Abstract

Purpose

To be sustainable and competitive in the current business environment, it is useful to understand users’ sentiment towards products and services. This critical task can be achieved via natural language processing and machine learning classifiers. This paper aims to propose a novel probabilistic committee selection classifier (PCC) to analyse and classify the sentiment polarities of movie reviews.

Design/methodology/approach

An Indian movie review corpus is assembled for this study. Another publicly available movie review polarity corpus is also involved with regard to validating the results. The greedy stepwise search method is used to extract the features/words of the reviews. The performance of the proposed classifier is measured using different metrics, such as F-measure, false positive rate, receiver operating characteristic (ROC) curve and training time. Further, the proposed classifier is compared with other popular machine-learning classifiers, such as Bayesian, Naïve Bayes, Decision Tree (J48), Support Vector Machine and Random Forest.

Findings

The results of this study show that the proposed classifier is good at predicting the positive or negative polarity of movie reviews. The performance accuracy and ROC curve value of the PCC are found to be the most suitable among all the classifiers tested in this study. The classifier is also found to be efficient at identifying positive sentiments of reviews, giving low false positive rates for both the Indian Movie Review and Review Polarity corpora used in this study. The training time of the proposed classifier is found to be slightly higher than that of Bayesian, Naïve Bayes and J48.

Research limitations/implications

Only movie review sentiments written in English are considered. In addition, the proposed committee selection classifier is prepared only using the committee of probabilistic classifiers; however, other classifier committees can also be built, tested and compared with the present experiment scenario.

Practical implications

In this paper, a novel probabilistic approach is proposed and used for classifying movie reviews, and is found to be highly effective in comparison with other state-of-the-art classifiers. This classifier may be tested for different applications and may provide new insights for developers and researchers.

Social implications

The proposed PCC may be used to classify different product reviews, and hence may be beneficial to organizations to justify users’ reviews about specific products or services. By using authentic positive and negative sentiments of users, the credibility of the specific product, service or event may be enhanced. PCC may also be applied to other applications, such as spam detection, blog mining, news mining and various other data-mining applications.

Originality/value

The constructed PCC is novel and was tested on Indian movie review data.

Article
Publication date: 11 June 2018

Deepika Kishor Nagthane and Archana M. Rajurkar

Abstract

Purpose

One of the main reasons for the increase in the mortality rate in women is breast cancer. Accurate early detection of breast cancer seems to be the only solution for diagnosis. In the field of breast cancer research, many new computer-aided diagnosis systems have been developed to reduce the diagnostic test false positives because of the subtle appearance of breast cancer tissues. The purpose of this study is to develop a diagnosis technique for breast cancer using the LCFS and TreeHiCARe classifier model.

Design/methodology/approach

The proposed diagnosis methodology begins with a pre-processing procedure. Subsequently, feature extraction is performed, in which the image features that preserve the characteristics of the breast tissues are extracted. Feature selection is then performed by the proposed least-mean-square (LMS)-Cuckoo search feature selection (LCFS) algorithm, which selects from the vast range of features extracted from the images with the help of the optimal cut point it provides. Then, the image transaction database table is developed using the keywords of the training images and the feature vectors. Each transaction resembles an itemset, and association rules with high conviction ratio and lift are generated from the transaction representation based on the Apriori algorithm. After association rule generation, the proposed TreeHiCARe classifier model is applied. In the TreeHiCARe classifier, a new feature index is developed to select a central feature for the decision tree, around which the classification of images into normal or abnormal is performed.
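The rule-quality measures mentioned above, lift and conviction, have standard definitions that can be sketched as follows. The function name `rule_metrics` and the item names in the toy transactions are hypothetical and are not taken from the paper's image transaction table.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, lift and conviction for the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= set(t))
    n_c = sum(1 for t in transactions if consequent <= set(t))
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= set(t))

    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    consequent_support = n_c / n
    # Lift above 1 means the antecedent makes the consequent more likely.
    lift = confidence / consequent_support if consequent_support else 0.0
    # Conviction is unbounded for rules that never fail.
    if confidence == 1.0:
        conviction = float("inf")
    else:
        conviction = (1.0 - consequent_support) / (1.0 - confidence)
    return {"support": support, "confidence": confidence,
            "lift": lift, "conviction": conviction}


# Hypothetical transactions: one set of keywords per training image.
transactions = [
    {"dense", "abnormal"},
    {"dense", "abnormal"},
    {"dense", "normal"},
    {"fatty", "normal"},
    {"fatty", "normal"},
]
metrics = rule_metrics(transactions, {"dense"}, {"abnormal"})
```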

Findings

The performance of the proposed method is validated against existing works using accuracy, sensitivity and specificity measures. Experiments with the proposed method on the Mammographic Image Analysis Society database classified normal and abnormal (cancerous) mammogram images with an accuracy of 0.8289, sensitivity of 0.9333 and specificity of 0.7273.

Originality/value

This paper proposes a new approach for the breast cancer diagnosis system by using mammogram images. The proposed method uses two new algorithms: LCFS and TreeHiCARe. LCFS is used to select optimal feature split points, and TreeHiCARe is the decision tree classifier model based on association rule agreements.

Details

Sensor Review, vol. 39 no. 1
Type: Research Article
ISSN: 0260-2288

Article
Publication date: 26 July 2021

João Antônio Dantas de Jesus Ferreira and Ney Rafael Secco

Abstract

Purpose

This paper aims to investigate the possibility of lowering the time taken during the aircraft design for unmanned aerial vehicles by using machine learning (ML) for the configuration selection phase. In this work, a database of unmanned aircraft is compiled, and it is proposed that decision tree classifiers (DTC) can understand the relations between mission and operational requirements and the resulting aircraft configuration.

Design/methodology/approach

This paper presents a ML-based approach to configuration selection of unmanned aircraft. Multiple DTC are built to predict the overall configuration. The classifiers are trained with a database of 118 unmanned aircraft with 57 characteristics, 47 of which are inputs for the classification problem, and 10 are the desired outputs, such as wing configuration or engine type.

Findings

This paper shows that DTC can be used for the configuration selection of unmanned aircraft with reasonable accuracy, understanding the connections between the different mission requirements and the culminating configuration. The framework is also capable of dealing with incomplete databases, maximizing the available knowledge.

Originality/value

This paper increases the computational usage for the aircraft design while retaining requirements’ traceability and increasing decision awareness.

Details

Aircraft Engineering and Aerospace Technology, vol. 93 no. 6
Type: Research Article
ISSN: 1748-8842

Article
Publication date: 14 July 2021

Ouidad Akhrif, Chaymae Benfaress, Mostapha EL Jai, Youness El Bouzekri El Idrissi and Nabil Hmina

Abstract

Purpose

The purpose of this paper is to present the smart collaborative learning service. This concept aims to build teams of learners based on the complementarity of their skills, allowing flexible participation and offering interdisciplinary collaboration opportunities for all the learners. The success of this environment depends on predicting efficient collaboration between the different teammates, allowing smart sharing of knowledge in the Smart University environment.

Design/methodology/approach

A random forest (RF) approach is proposed, which is based on semantic modeling of the learner and of the problem-solving that allows multidisciplinary collaboration, and on heuristic completeness processing to build complementary teams. To achieve that, this paper established a Konstanz Information Miner workflow that integrates the main steps for building and evaluating the RF classifier. This workflow is divided into three steps: extracting knowledge from the smart collaborative learning ontology, calculating the completeness using a novel heuristic and building the RF classifier.

Findings

The smart collaborative learning service enables efficient collaboration and democratized sharing of knowledge between learners by using a semantic decision support system. This service solves a frequent issue related to the composition of learning groups to serve pedagogical perspectives.

Originality/value

The present study harmonizes the integration of ontology, a new heuristic processing and supervised machine learning algorithm aiming at building an intelligent collaborative learning service that includes a qualified classifier of complementary teams of learners.

Article
Publication date: 12 January 2022

Mahesh Babu Mariappan, Kanniga Devi, Yegnanarayanan Venkataraman, Ming K. Lim and Panneerselvam Theivendren

Abstract

Purpose

This paper aims to address the pressing problem of prediction concerning shipment times of therapeutics, diagnostics and vaccines during the ongoing COVID-19 pandemic using a novel artificial intelligence (AI) and machine learning (ML) approach.

Design/methodology/approach

The present study used organic real-world therapeutic supplies data of over 3 million shipments collected during the COVID-19 pandemic through a large real-world e-pharmacy. The researchers built various ML multiclass classification models, namely, random forest (RF), extra trees (XRT), decision tree (DT), multilayer perceptron (MLP), XGBoost (XGB), CatBoost (CB), linear stochastic gradient descent (SGD) and the linear Naïve Bayes (NB) and trained them on striped datasets of (source, destination, shipper) triplets. The study stacked the base models and built stacked meta-models. Subsequently, the researchers built a model zoo with a combination of the base models and stacked meta-models trained on these striped datasets. The study used 10-fold cross-validation (CV) for performance evaluation.
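The 10-fold cross-validation step mentioned above can be sketched as an index generator in plain Python. The function name `k_fold_indices`, the `seed` parameter and the toy sample count are assumptions for illustration; the study's actual tooling and fold assignment are not described in the abstract.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists so that each sample is tested exactly once."""
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the folds repeatable
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Toy run with 25 samples; a model would be fitted on train and scored on test.
splits = list(k_fold_indices(25, k=10))
```

The reported performance would then be the average of the per-fold scores.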

Findings

The findings reveal that the turn-around-time provided by therapeutic supply logistics providers is only 62.91% accurate when compared to reality. In contrast, the solution provided in this study is up to 93.5% accurate compared to reality, resulting in up to 48.62% improvement, with a clear trend of more historic data and better performance growing each week.

Research limitations/implications

The study has shown the efficacy of an ML model zoo that combines base models and stacked meta-models trained on striped datasets of (source, destination, shipper) triplets for predicting the shipment times of therapeutics, diagnostics and vaccines in the e-pharmacy supply chain.

Originality/value

The novelty of the study lies in its focus on the real-world e-pharmacy supply chain under post-COVID-19 lockdown conditions and in a novel model zoo based on ML ensemble stacking for predicting the shipment times of therapeutics. Through this work, it is assumed that there will be greater adoption of AI and ML techniques in shipment time prediction of therapeutics in the logistics industry in pandemic situations.

Details

The International Journal of Logistics Management, vol. 34 no. 2
Type: Research Article
ISSN: 0957-4093
