Search results

1 – 10 of 143
Article
Publication date: 22 September 2021

Samar Ali Shilbayeh and Sunil Vadera


Abstract

Purpose

This paper describes the use of a meta-learning framework for recommending cost-sensitive classification methods, with the aim of answering an important question that arises in machine learning, namely: “Among all the available classification algorithms, and considering a specific type of data and cost, which is the best algorithm for my problem?”

Design/methodology/approach

The framework is based on the idea of applying machine learning techniques to discover knowledge about the performance of different machine learning algorithms. It includes components that repeatedly apply different classification methods to data sets and measure their performance. The characteristics of the data sets, combined with the algorithms and their measured performance, provide the training examples. A decision tree algorithm is applied to the training examples to induce the knowledge, which can then be used to recommend algorithms for new data sets. The paper contributes to both meta-learning and cost-sensitive machine learning; neither field is new, but building a recommender that recommends the optimal cost-sensitive approach for a given data problem is the contribution. Unlike comparable systems, the developed solution takes the misclassification cost into consideration during the learning process.
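
As a concrete illustration of this loop, the minimal Python sketch below trains a decision tree recommender on hypothetical meta-features. The meta-feature choices, numbers and algorithm labels are invented for the example; the paper's actual framework is implemented in WEKA, not scikit-learn.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Each row describes one data set: [n_instances, n_features,
# class_imbalance_ratio, misclassification_cost_ratio]; each label names the
# algorithm that performed best when candidates were run on that data set.
# All values here are illustrative placeholders.
meta_X = np.array([
    [1000,  10, 0.9,  1.0],
    [ 500,  40, 0.5, 10.0],
    [8000, 120, 0.7,  5.0],
])
meta_y = ["naive_bayes", "cost_sensitive_tree", "metacost_svm"]

recommender = DecisionTreeClassifier().fit(meta_X, meta_y)
print(recommender.predict([[2000, 25, 0.8, 8.0]]))  # recommendation for a new data set
```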

Findings

The proposed solution is implemented in WEKA and evaluated by applying it to different data sets and comparing the results with existing studies available in the literature. The results show that the developed meta-learning solution produces better results than METAL, a well-known meta-learning system.

Originality/value

Meta-learning work has been done before, but this paper presents a new meta-learning framework that is cost-sensitive.

Details

Journal of Modelling in Management, vol. 17 no. 3
Type: Research Article
ISSN: 1746-5664


Article
Publication date: 8 April 2014

Kristof Coussement



Abstract

Purpose

Retailers realize that customer churn detection is a critical success factor. However, no research study has taken into consideration that misclassifying a customer as a non-churner (i.e. predicting that (s)he will not leave the company, while in reality (s)he does) results in higher costs than predicting that a staying customer will churn. The aim of this paper is to examine the prediction performance of various cost-sensitive methodologies (direct minimum expected cost (DMECC), MetaCost, thresholding and weighting) that incorporate these different costs of misclassifying customers when predicting churn.

Design/methodology/approach

Cost-sensitive methodologies are benchmarked on six real-life churn datasets from the retail industry.

Findings

This article argues that total misclassification cost, as a churn prediction evaluation measure, is crucial as input for optimizing consumer decision making. The practical classification threshold of 0.5 for churn probabilities (i.e. when the churn probability is greater than 0.5, the customer is predicted as a churner, and otherwise as a non-churner) offers the worst performance. The provided managerial guidelines suggest when to use each cost-sensitive method, depending on churn levels and the cost level discrepancy between misclassifying churners versus non-churners.
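
To make the thresholding idea concrete, here is a minimal sketch of expected-cost thresholding, one of the benchmarked families; the cost figures are illustrative assumptions, not values from the article.

```python
import numpy as np

def churn_threshold(c_fp, c_fn):
    # Predict "churner" when p * c_fn > (1 - p) * c_fp,
    # i.e. when p exceeds c_fp / (c_fp + c_fn).
    return c_fp / (c_fp + c_fn)

p_churn = np.array([0.2, 0.35, 0.6, 0.9])  # model-estimated churn probabilities
t = churn_threshold(c_fp=10.0, c_fn=90.0)  # missing a churner costs 9x more (assumed)
print(t)             # 0.1 -- far below the naive 0.5 cutoff
print(p_churn > t)   # all four customers are flagged for retention action
```

With a false negative nine times as costly as a false alarm, the cost-minimizing cutoff drops to 0.1, which illustrates why the naive 0.5 threshold performs worst when misclassification costs are asymmetric.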

Practical implications

This research emphasizes the importance of cost-sensitive learning to improve customer retention management in the retail context.

Originality/value

This article is the first to use the concept of misclassification costs in a churn prediction setting, and to offer recommendations about the circumstances in which marketing managers should use specific cost-sensitive methodologies.

Details

European Journal of Marketing, vol. 48 no. 3/4
Type: Research Article
ISSN: 0309-0566


Article
Publication date: 8 April 2022

Botond Benedek, Cristina Ciumas and Bálint Zsolt Nagy



Abstract

Purpose

The purpose of this paper is to survey the automobile insurance fraud detection literature in the past 31 years (1990–2021) and present a research agenda that addresses the challenges and opportunities artificial intelligence and machine learning bring to car insurance fraud detection.

Design/methodology/approach

Content analysis methodology is used to analyze 46 peer-reviewed academic papers from 31 journals plus eight conference proceedings to identify their research themes and detect trends and changes in the automobile insurance fraud detection literature according to content characteristics.

Findings

This study found that automobile insurance fraud detection is going through a transformation, where traditional statistics-based detection methods are being replaced by data mining- and artificial intelligence-based approaches. The study also found that cost-sensitive and hybrid approaches are promising avenues for further research.

Practical implications

This paper’s findings not only highlight the rise and benefits of data mining- and artificial intelligence-based automobile insurance fraud detection but also the deficiencies observable in this field, such as the lack of cost-sensitive approaches or the absence of reliable data sets.

Originality/value

This paper offers greater insight into how artificial intelligence and data mining challenge traditional automobile insurance fraud detection models, and it addresses the need to develop new cost-sensitive fraud detection methods and to identify new real-world data sets.

Details

Journal of Financial Regulation and Compliance, vol. 30 no. 4
Type: Research Article
ISSN: 1358-1988


Article
Publication date: 14 May 2021

Zhenyuan Wang, Chih-Fong Tsai and Wei-Chao Lin


Abstract

Purpose

Class imbalance learning, which exists in many domain problem datasets, is an important research topic in data mining and machine learning. One-class classification techniques, which treat the normal data as the majority class and aim to identify anomalies as the minority class, are one representative solution for class imbalanced datasets. Since one-class classifiers are trained using only normal data to create a decision boundary for later anomaly detection, the quality of the training set, i.e. the majority class, is a key factor affecting the performance of one-class classifiers.

Design/methodology/approach

In this paper, we focus on two data cleaning or preprocessing methods to address class imbalanced datasets. The first method examines whether performing instance selection to remove some noisy data from the majority class can improve the performance of one-class classifiers. The second method combines instance selection and missing value imputation, where the latter is used to handle incomplete datasets that contain missing values.

Findings

The experimental results are based on 44 class imbalanced datasets, three instance selection algorithms (IB3, DROP3 and the GA), the CART decision tree for missing value imputation, and three one-class classifiers (OCSVM, IFOREST and LOF). They show that if the instance selection algorithm is carefully chosen, performing this step can improve the quality of the training data, making one-class classifiers outperform baselines trained without instance selection. Moreover, when class imbalanced datasets contain missing values, combining missing value imputation and instance selection, regardless of which step is performed first, can maintain data quality similar to that of datasets without missing values.
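
A minimal sketch of one ordering of this pipeline, assuming numeric normal-class data with missing values: the CART-based imputation and the OCSVM follow the abstract, while the IB3/DROP3/GA instance selection step is stood in for by a simple LocalOutlierFactor noise filter, a plain substitution rather than the authors' algorithms.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

def clean_and_train(X_normal):
    # Step 1: impute missing values with a CART-style regression tree,
    # mirroring the paper's use of a decision tree for imputation.
    imputer = IterativeImputer(estimator=DecisionTreeRegressor(max_depth=5))
    X_filled = imputer.fit_transform(X_normal)

    # Step 2: instance selection -- keep only points LOF does not flag as
    # noise (a stand-in for IB3/DROP3/GA, not the paper's method).
    keep = LocalOutlierFactor(n_neighbors=10).fit_predict(X_filled) == 1
    X_clean = X_filled[keep]

    # Step 3: fit the one-class classifier on the cleaned majority class only.
    return OneClassSVM(nu=0.05, gamma="scale").fit(X_clean)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[rng.random(X.shape) < 0.1] = np.nan          # inject ~10% missing values
model = clean_and_train(X)
print(model.predict(rng.normal(size=(3, 5))))  # +1 = normal, -1 = anomaly
```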

Originality/value

The novelty of this paper is to investigate the effect of performing instance selection on the performance of one-class classifiers, which has not been done before. Moreover, this study is the first attempt to consider the scenario of missing values in the training set when training one-class classifiers. In this case, performing missing value imputation and instance selection in different orders is compared.

Details

Data Technologies and Applications, vol. 55 no. 5
Type: Research Article
ISSN: 2514-9288


Article
Publication date: 2 November 2021

Pengyun Zhao, Shoufeng Ji and Yaoting Xue


Abstract

Purpose

The purpose of this paper is to propose an innovative integration method based on decision-theoretic rough sets and the extended VlseKriterijumska Optimizacija I Kompromisno Resenje (VIKOR) method to address the resilient-sustainable supplier selection and order allocation (SS/OA) problem.

Design/methodology/approach

Specifically, a two-stage approach is designed in this paper. First, the decision-theoretic rough set is employed to calculate rough numbers for coping with the subjective uncertainty of the data and to assign weights to the resilient-sustainable evaluation criteria. On this basis, the suppliers' resilient-sustainable performance is ranked using the extended VIKOR method. Second, a novel multi-objective optimization model is proposed that applies an improved genetic algorithm to select resilient-sustainable suppliers and allocate the corresponding order quantities under a multi-tier supplier network.
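
For the ranking stage, the following sketch shows the classical VIKOR computation on an invented supplier score matrix; the paper's rough-number weighting and other extensions are not reproduced here, and all scores and weights are illustrative.

```python
import numpy as np

def vikor(F, w, v=0.5):
    """F: alternatives x criteria scores (benefit criteria, higher is better);
    w: criteria weights summing to 1; v: group-utility weight."""
    f_best, f_worst = F.max(axis=0), F.min(axis=0)
    regret = w * (f_best - F) / (f_best - f_worst)  # weighted gap to the ideal
    S, R = regret.sum(axis=1), regret.max(axis=1)   # group utility, max regret
    Q = (v * (S - S.min()) / (S.max() - S.min())
         + (1 - v) * (R - R.min()) / (R.max() - R.min()))
    return Q  # rank suppliers by ascending Q (lower = better compromise)

scores = np.array([[0.7, 0.6, 0.9],   # supplier A on three criteria
                   [0.8, 0.5, 0.6],   # supplier B
                   [0.6, 0.9, 0.7]])  # supplier C
print(vikor(scores, w=np.array([0.5, 0.3, 0.2])))
```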

Findings

The results reveal that joint consideration of resilience and sustainability is essential in the SS/OA process. The method proposed in this study based on decision-theoretic rough sets and the extended VIKOR method can handle imprecise information flexibly, reduce information loss and obtain acceptable solutions for decision-makers. Numerical cases validate that this integrated approach can combine resilience and sustainability for effective and efficient SS/OA.

Practical implications

This paper provides industry managers with a new resilience- and sustainability-oriented perspective on SS/OA as a basis for best practices in industry resilience and sustainability. The proposed method helps to evaluate the resilient-sustainable performance of potential suppliers, is applicable to solving real-world SS/OA problems and has important practical implications for the resilient-sustainable development of supply chains.

Originality/value

The two interrelated priorities of resilience and sustainability have emerged as key strategic challenges in SS/OA issues. This paper is the first study of this issue that uses the proposed integrated approach.

Article
Publication date: 25 November 2013

Wu He



Abstract

Purpose

As mobile malware and viruses rapidly increase in frequency and sophistication, mobile social media has recently become a very popular attack vector. The purpose of this paper is to survey the state of the art in the security of mobile social media, identify recent trends and provide recommendations for researchers and practitioners in this fast-moving field.

Design/methodology/approach

This paper reviews disparate discussions in the literature on the security aspects of mobile social media through blog mining and an extensive literature search. Based on this detailed review, the author summarizes key insights to help enterprises understand the security risks associated with mobile social media.

Findings

Risks related to mobile social media are identified based on the results of the review, and best practices, useful tips and guidance are offered to help enterprises mitigate these risks.

Originality/value

The paper consolidates the fragmented discussion in the literature and provides an in-depth review to help researchers understand the latest developments in security risks associated with mobile social media.

Details

Information Management & Computer Security, vol. 21 no. 5
Type: Research Article
ISSN: 0968-5227


Article
Publication date: 26 August 2014

Bilal M’hamed Abidine, Belkacem Fergani, Mourad Oussalah and Lamya Fergani


Abstract

Purpose

The task of identifying activity classes from sensor information in a smart home is very challenging because of the imbalanced nature of such data sets, where some activities occur more frequently than others. Probabilistic models such as the Hidden Markov Model (HMM) and Conditional Random Fields (CRF) are commonly employed for this purpose. The paper aims to discuss these issues.

Design/methodology/approach

In this work, the authors propose a robust strategy combining the Synthetic Minority Over-sampling Technique (SMOTE) with Cost-Sensitive Support Vector Machines (CS-SVM), using adaptive tuning of the cost parameter, to handle the imbalanced data problem.
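
A minimal sketch of the SMOTE-plus-cost-sensitive-SVM combination, assuming the imbalanced-learn library is available and that the minority activity class is labeled 1; the paper's adaptive tuning of the cost parameter is approximated here by a fixed class weight.

```python
from imblearn.over_sampling import SMOTE
from sklearn.svm import SVC

def fit_smote_cssvm(X, y, minority_label=1, cost=5.0):
    # Rebalance first: SMOTE synthesises minority-class activity samples.
    X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
    # Then penalise mistakes on the (originally) rare class more heavily;
    # the paper tunes this cost adaptively, here it is a fixed assumption.
    clf = SVC(kernel="rbf", class_weight={minority_label: cost})
    return clf.fit(X_res, y_res)
```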

Findings

The results demonstrate the usefulness of the approach through comparison with state-of-the-art approaches, including HMM, CRF, traditional C-Support Vector Machines (C-SVM) and Cost-Sensitive SVM (CS-SVM), for classifying activities using binary and ubiquitous sensors.

Originality/value

Performance metrics in the experiments/simulations include accuracy, precision/recall and F-measure.

Details

Kybernetes, vol. 43 no. 8
Type: Research Article
ISSN: 0368-492X


Article
Publication date: 27 May 2021

Sara Tavassoli and Hamidreza Koosha


Abstract

Purpose

Customer churn prediction is one of the most well-known approaches to manage and improve customer retention. Machine learning techniques, especially classification algorithms, are very popular tools for predicting churners. In this paper, three ensemble classifiers based on bagging and boosting are proposed for customer churn prediction.

Design/methodology/approach

The first proposed classifier, called boosted bagging, uses boosting for each bagging sample: before concluding the final results of the bagging algorithm, the authors try to improve the prediction by applying a boosting algorithm to each bootstrap sample. The second proposed ensemble classifier, called bagged bagging, combines bagging with itself; in other words, the authors apply bagging to each sample of the bagging algorithm. Finally, the third approach uses bagging of neural networks trained with a genetic algorithm.
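
A minimal sketch of the boosted bagging idea in scikit-learn terms (parameter names as of scikit-learn 1.2): a bagging ensemble whose base estimator is itself an AdaBoost ensemble, so each bootstrap sample is fitted by boosting before the bagged vote. This is a schematic reconstruction from the abstract, not the authors' implementation.

```python
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Each of the 10 bootstrap samples is fitted by a 50-round AdaBoost ensemble
# of shallow trees; predictions are then aggregated by the bagging vote.
boosted_bagging = BaggingClassifier(
    estimator=AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=2), n_estimators=50),
    n_estimators=10,
    random_state=0,
)
# "Bagged bagging" swaps the inner AdaBoostClassifier for another
# BaggingClassifier. Usage: boosted_bagging.fit(X_train, y_train).
```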

Findings

To examine the performance of the proposed ensemble classifiers, they are applied to two datasets. Numerical simulations illustrate that the proposed hybrid approaches outperform simple bagging and boosting algorithms as well as the base classifiers. In particular, bagged bagging provides high accuracy and precision.

Originality/value

In this paper, three novel ensemble classifiers based on bagging and boosting are proposed for customer churn prediction. The proposed approaches can be applied not only to customer churn prediction but also to any other binary classification problem.

Details

Kybernetes, vol. 51 no. 3
Type: Research Article
ISSN: 0368-492X


Article
Publication date: 4 December 2018

Zhongyi Hu, Raymond Chiong, Ilung Pranata, Yukun Bao and Yuqing Lin


Abstract

Purpose

Malicious web domain identification is of significant importance to the security protection of internet users. With online credibility and performance data, the purpose of this paper is to investigate the use of machine learning techniques for malicious web domain identification while considering the class imbalance issue (i.e. there are more benign web domains than malicious ones).

Design/methodology/approach

The authors propose an integrated resampling approach to handle class imbalance by combining the synthetic minority oversampling technique (SMOTE) with particle swarm optimisation (PSO), a population-based meta-heuristic algorithm: SMOTE is used for oversampling and PSO for undersampling.
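
A minimal sketch of the two-sided resampling idea, assuming imbalanced-learn; the paper's PSO-driven undersampling has no off-the-shelf implementation, so RandomUnderSampler stands in for it here as a deliberate simplification, and the sampling ratios are illustrative.

```python
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

def resample(X, y):
    # Oversample malicious domains up to half the size of the benign class...
    X_mid, y_mid = SMOTE(sampling_strategy=0.5, random_state=0).fit_resample(X, y)
    # ...then shrink the benign class down to parity (the paper instead
    # searches for the subset to drop with particle swarm optimisation).
    under = RandomUnderSampler(sampling_strategy=1.0, random_state=0)
    return under.fit_resample(X_mid, y_mid)
```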

Findings

Using eight well-known machine learning classifiers, the proposed integrated resampling approach is comprehensively examined on several imbalanced web domain data sets with different imbalance ratios. Experimental results comparing it with five other well-known resampling approaches confirm that the proposed approach is highly effective.

Practical implications

This study not only inspires the practical use of online credibility and performance data for identifying malicious web domains but also provides an effective resampling approach for handling the class imbalance issue in the area of malicious web domain identification.

Originality/value

Online credibility and performance data are applied to build malicious web domain identification models using machine learning techniques. An integrated resampling approach is proposed to address the class imbalance issue. The performance of the proposed approach is confirmed based on real-world data sets with different imbalance ratios.

Article
Publication date: 8 March 2024

Satyajit Mahato and Supriyo Roy


Abstract

Purpose

Managing project completion within the stipulated time is significant to all firms' sustainability, and for software start-up firms it is of utmost importance. For any schedule variation, these firms must spend 25 to 40 percent of the development cost reworking quality defects. Significantly, the existing literature does not address defect rework opportunities from a quality perspective among Indian IT start-ups. The present study aims to fill this niche by proposing a unique mathematical model of defect rework aligned with the Six Sigma quality approach.

Design/methodology/approach

An optimization model was formulated comprising two objectives: rework "time" and rework "cost." A relevant case study was developed, and for the model solution the authors used MATLAB and an elitist Non-dominated Sorting Genetic Algorithm (NSGA-II).
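
A minimal sketch of a two-objective time/cost problem solved with NSGA-II, assuming the pymoo library; the decision variables and objective functions below are illustrative placeholders, not the paper's defect-rework model (which was solved in MATLAB).

```python
import numpy as np
from pymoo.core.problem import ElementwiseProblem
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.optimize import minimize

class ReworkProblem(ElementwiseProblem):
    """Hypothetical model: x[0] = rework effort, x[1] = team size."""

    def __init__(self):
        super().__init__(n_var=2, n_obj=2,
                         xl=np.array([1.0, 1.0]), xu=np.array([10.0, 20.0]))

    def _evaluate(self, x, out, *args, **kwargs):
        rework_time = 100.0 / (x[0] * x[1])           # more effort, less time
        rework_cost = 5.0 * x[0] + 2.0 * x[1] ** 1.5  # more effort, more cost
        out["F"] = [rework_time, rework_cost]

res = minimize(ReworkProblem(), NSGA2(pop_size=40), ("n_gen", 60), seed=1)
print(res.F)  # approximated Pareto front of (time, cost) trade-offs
```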

Findings

The output of the proposed approach reduced the “time” by 31 percent at a minimum “cost”. The derived “Pareto Optimal” front can be used to estimate the “cost” for a pre-determined rework “time” and vice versa, thus adding value to the existing literature.

Research limitations/implications

This work deployed a decision tree for defect prediction, a method often criticized for overfitting; this is one limitation of the paper. In addition, comparing the predicted defect count with other prediction models has not been attempted. NSGA-II was applied to solve the optimization problem; however, the optimal results obtained have yet to be compared with those of other algorithms. Further study is envisaged.

Practical implications

The Pareto front provides an effective visual aid for managers to compare multiple strategies to decide the best possible rework “cost” and “time” for their projects. It is beneficial for cost-sensitive start-ups to estimate the rework “cost” and “time” to negotiate with their customers effectively.

Originality/value

This paper proposes a novel quality management framework under the Six Sigma approach, which integrates optimization of critical metrics. As part of this study, a unique mathematical model of the software defect rework process was developed (combined with the proposed framework) to obtain the optimal solution for the perennial problem of schedule slippage in the rework process of software development.

Details

International Journal of Quality & Reliability Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0265-671X

