Search results

1 – 10 of 204

Article

Publication date: 3 September 2024

Customer churn analysis using feature optimization methods and tree-based classifiers

Abstract

Purpose

As internet banking service marketing platforms continue to advance, customers exhibit distinct behaviors. Given the extensive array of options and minimal barriers to switching to competitors, the concept of customer churn behavior has emerged as a subject of considerable debate. This study aims to delineate the scope of feature optimization methods for elucidating customer churn behavior within the context of internet banking service marketing. To achieve this goal, the author aims to predict the attrition and migration of customers who use internet banking services using tree-based classifiers.

Design/methodology/approach

The author used various feature optimization methods in tree-based classifiers to predict customer churn behavior using transaction data from customers who use internet banking services. First, the author conducted feature reduction to eliminate ineffective features and project the data set onto a lower-dimensional space. Next, the author used Recursive Feature Elimination with Cross-Validation (RFECV) to extract the most practical features. Then, the author applied feature importance to assign a score to each input feature. Following this, the author selected C5.0 Decision Tree, Random Forest, XGBoost, AdaBoost, CatBoost and LightGBM as the six tree-based classifier structures.
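The recursive feature elimination step can be illustrated with a minimal sketch. It is not the author's implementation: the nearest-centroid classifier, the class-mean-gap importance score (which assumes binary labels 0 and 1), the interleaved folds and the toy data are all assumptions chosen to keep the example dependency-free, whereas the study ranks features with tree-based classifiers. For brevity the importance is computed on the full data rather than inside each fold.

```python
from statistics import mean

def centroids(X, y, feats):
    """Per-class mean vector restricted to the chosen feature indices."""
    out = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        out[label] = [mean(r[f] for r in rows) for f in feats]
    return out

def predict(cents, x, feats):
    """Assign x to the class whose centroid is nearest (squared distance)."""
    def dist(c):
        return sum((x[f] - c[i]) ** 2 for i, f in enumerate(feats))
    return min(cents, key=lambda label: dist(cents[label]))

def cv_accuracy(X, y, feats, k=5):
    """Mean accuracy over k interleaved folds (row i goes to fold i % k)."""
    n = len(X)
    scores = []
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        cents = centroids([X[i] for i in train], [y[i] for i in train], feats)
        hits = sum(predict(cents, X[i], feats) == y[i] for i in test)
        scores.append(hits / len(test))
    return mean(scores)

def importance(X, y, f):
    """Stand-in importance: gap between the two class means of feature f."""
    a = mean(x[f] for x, t in zip(X, y) if t == 0)
    b = mean(x[f] for x, t in zip(X, y) if t == 1)
    return abs(a - b)

def rfe_cv(X, y, k=5):
    """Drop the weakest feature one at a time; keep the best-scoring subset."""
    feats = list(range(len(X[0])))
    best_feats, best_score = feats[:], cv_accuracy(X, y, feats, k)
    while len(feats) > 1:
        weakest = min(feats, key=lambda f: importance(X, y, f))
        feats.remove(weakest)
        score = cv_accuracy(X, y, feats, k)
        if score >= best_score:  # ties favour the smaller subset
            best_feats, best_score = feats[:], score
    return best_feats, best_score

# Toy data (hypothetical): feature 0 tracks the label, features 1 and 2 are noise.
y = [i % 2 for i in range(20)]
X = [[2 * y[i] + 0.1 * (i % 3), (7 * i) % 11, (3 * i) % 5] for i in range(20)]
selected, score = rfe_cv(X, y)
print(selected, score)
```

The loop records a cross-validated score for every subset size, so the number of features to keep is chosen by the data rather than fixed in advance.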

Findings

This study affirmed that transaction data is a reliable resource for elucidating customer churn behavior within the context of internet banking service marketing. Experimental findings highlight the operational benefits and enhanced customer retention afforded by implementing feature optimization and leveraging a variety of tree-based classifiers. The results indicate the significance of feature reduction, feature selection and feature importance as the three feature optimization methods in comprehending customer churn prediction. This study demonstrated that feature optimization can improve this prediction by increasing the accuracy and precision of tree-based classifiers and decreasing their error rates.

Originality/value

This research aims to enhance the understanding of customer behavior on internet banking service platforms by predicting churn intentions. This study demonstrates how feature optimization methods influence customer churn prediction performance. This approach included feature reduction, feature selection and assessing feature importance to optimize transaction data analysis. Additionally, the author performed feature optimization within tree-based classifiers to improve performance. The novelty of this approach lies in combining feature optimization methods with tree-based classifiers to effectively capture and articulate customer churn experience in internet banking service marketing.

Details

Journal of Services Marketing, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 0887-6045

Article

Publication date: 3 September 2024

Bayesian-optimized extreme gradient boosting models for classification problems: an experimental analysis of product return case

Biplab Bhattacharjee, Kavya Unni and Maheshwar Pratap

Abstract

Purpose

Product returns are a major challenge for e-businesses as they involve huge logistical and operational costs. Therefore, it becomes crucial to predict returns in advance. This study aims to evaluate different types of classifiers for predicting the chance of product return and to further optimize the best-performing model.

Design/methodology/approach

An e-commerce data set with categorical attributes is used for this study. Chi-square-based feature selection yields a reduced feature set that is used as input for model building. Predictive models are built using individual classifiers, ensemble models and deep neural networks. For performance evaluation, a 75:25 train/test split and 10-fold cross-validation are used. To improve the predictive performance of the best-performing classifier, hyperparameter tuning is performed using different optimization methods, such as random search, grid search, the Bayesian approach and evolutionary models (genetic algorithm, differential evolution and particle swarm optimization).
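Chi-square feature selection for categorical attributes can be sketched as below. This is an illustrative example and not the authors' code: the column names `payment` and `colour` and all values are hypothetical, and the score is the plain Pearson chi-square statistic of each feature-by-label contingency table.

```python
from collections import Counter

def chi_square(feature, target):
    """Pearson chi-square statistic of the feature-by-target contingency table."""
    n = len(feature)
    f_counts = Counter(feature)
    t_counts = Counter(target)
    observed = Counter(zip(feature, target))
    stat = 0.0
    for f, fc in f_counts.items():
        for t, tc in t_counts.items():
            expected = fc * tc / n  # count expected under independence
            stat += (observed[(f, t)] - expected) ** 2 / expected
    return stat

def rank_features(columns, target):
    """Return (name, score) pairs sorted from most to least associated."""
    scores = {name: chi_square(vals, target) for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical order data: 1 means the order was returned.
returned = [1, 1, 1, 0, 0, 0, 0, 1]
columns = {
    "payment": ["cod", "cod", "cod", "cod", "card", "card", "card", "card"],
    "colour": ["red", "blue", "red", "blue", "red", "blue", "red", "blue"],
}
ranking = rank_features(columns, returned)
print(ranking)
```

A higher statistic means the feature's categories are distributed less independently of the label, so the top-ranked columns are the ones kept as model inputs.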

Findings

A comparison of F1-scores revealed that the Bayesian approach outperformed all other optimization approaches in terms of accuracy. The predictability of the Bayesian-optimized model is further compared with that of other classifiers using experimental analysis. The Bayesian-optimized XGBoost model possessed superior performance, with accuracies of 77.80% and 70.35% for holdout and 10-fold cross-validation methods, respectively.
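The two evaluation strategies mentioned here, a holdout split and 10-fold cross-validation, differ only in how row indices are partitioned. The sketch below shows that partitioning under simplifying assumptions: rows are not shuffled or stratified (real experiments usually do both), and the function names are hypothetical.

```python
def holdout_split(n, test_fraction=0.25):
    """Split row indices 0..n-1 into one train part and one test part."""
    cut = n - int(n * test_fraction)
    return list(range(cut)), list(range(cut, n))

def k_fold_indices(n, k=10):
    """Yield (train, test) index lists; every row is tested exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

train, test = holdout_split(100)       # 75:25 split
splits = list(k_fold_indices(100))     # ten train/test pairs
print(len(train), len(test), len(splits))
```

Holdout produces a single score from one partition, while k-fold averages k scores, which is why the two accuracies reported for the same model can differ.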

Research limitations/implications

Given the anonymized data, the effects of individual attributes on outcomes could not be investigated in detail. The Bayesian-optimized predictive model may be used in decision support systems, enabling real-time prediction of returns and the implementation of preventive measures.

Originality/value

There are very few reported studies on predicting the chance of order return in e-businesses. To the best of the authors’ knowledge, this study is the first to compare different optimization methods and classifiers, demonstrating the superiority of the Bayesian-optimized XGBoost classification model for returns prediction.

Details

Journal of Systems and Information Technology, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 1328-7265

Article

Publication date: 22 August 2024

Does E-government curb corruption? The moderating role of national culture: a machine learning approach

Senda Belhaj Slimene, Hela Borgi and Hakim Ben Othman

Abstract

Purpose

The study aims to investigate the relationship between E-government and corruption. It also examines the moderator role of national culture through Hofstede’s dimensions on the association between E-government and corruption.

Design/methodology/approach

In addition to panel regression techniques, the authors use the random forest method to assess the order of importance of all significant variables in determining corruption. The sample of this study consists of 55 countries during the 2008–2020 period.

Findings

The results show that E-government is negatively correlated with corruption. The authors also find that both economic and cultural variables play an important role in determining corruption. However, religion has no impact on corruption. The results can potentially assist regulators and policy-makers when trying to control corruption as they should take into consideration the cultural background of citizens when making rules and procedures that aim at reducing corruption.

Originality/value

The current study uses a random forest model, which allows the regression of variables based on the construction of a multitude of decision trees. The main contribution of using this model compared to the other regression models used in prior studies is to extract the relative importance of each significant variable. More precisely, it evaluates the rank of importance for each significant variable that drives corruption rather than merely identifying variables that drive corruption regardless of their relative importance.

Details

Transforming Government: People, Process and Policy, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 1750-6166

Article

Publication date: 26 August 2024

Unveiling the melodic matrix: exploring genre-and-audio dynamics in the digital music popularity using machine learning techniques

Jurui Zhang, Shan Yu, Raymond Liu, Guang-Xin Xie and Leon Zurawicki

Abstract

Purpose

This paper aims to explore factors contributing to music popularity using machine learning approaches.

Design/methodology/approach

A dataset comprising 204,853 songs from Spotify was used for analysis. The popularity of a song was predicted using predictive machine learning models, with the results showing the superiority of the random forest model across key performance metrics.

Findings

The analysis identifies crucial genre and audio features influencing music popularity. Additionally, genre-specific analysis reveals that the impact of music features on music popularity varies across different genres.

Practical implications

The findings offer valuable insights for music artists, digital marketers and music platform researchers to understand and focus on the most impactful music features that drive the success of digital music, to devise more targeted marketing strategies and tactics based on popularity predictions, and more effectively capitalize on popular songs in this digital streaming age.

Originality/value

While previous research has explored different factors that may contribute to the popularity of music, this study makes a pioneering effort as the first to consider the intricate interplay between genre and audio features in predicting digital music popularity.

Details

Marketing Intelligence & Planning, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 0263-4503

Open Access

Article

Publication date: 12 October 2021

Predicting student performance in a blended learning environment using learning management system interaction data

Kiran Fahd, Shah Jahan Miah and Khandakar Ahmed

Downloads: 4207

Abstract

Purpose

Student attrition in tertiary educational institutes may play a significant role in achieving core values that lead towards the strategic mission and financial well-being. Analysis of data generated from student interaction with learning management systems (LMSs) in blended learning (BL) environments may assist with the identification of students at risk of failing, but to what extent this may be possible is unknown. Existing studies are limited in addressing these issues at a significant scale.

Design/methodology/approach

This study develops a new approach that applies machine learning (ML) models to a publicly available dataset relevant to student attrition to identify students potentially at risk. The dataset consists of the data generated by the interaction of students with the LMS for their BL environment.

Findings

Identifying students at risk through an innovative approach will promote timely intervention in the learning process, such as for improving student academic progress. To evaluate the performance of the proposed approach, the accuracy is compared with other representational ML methods.

Originality/value

The best-performing ML algorithm, random forest with 85% accuracy, is selected to support educators in implementing various pedagogical practices to improve students’ learning.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2634-1964

Article

Publication date: 28 March 2023

Predicting song genre with deep learning

Antonijo Marijić and Marina Bagić Babac

Downloads: 193

Abstract

Purpose

Genre classification of songs based on lyrics is a challenging task even for humans; however, state-of-the-art natural language processing has recently offered advanced solutions to this task. The purpose of this study is to advance the understanding and application of natural language processing and deep learning in the domain of music genre classification, while also contributing to the broader themes of global knowledge and communication, and sustainable preservation of cultural heritage.

Design/methodology/approach

The main contribution of this study is the development and evaluation of various machine and deep learning models for song genre classification. Additionally, we investigated the effect of different word embeddings, including Global Vectors for Word Representation (GloVe) and Word2Vec, on the classification performance. The tested models range from benchmarks such as logistic regression, support vector machine and random forest, to more complex neural network architectures and transformer-based models, such as recurrent neural network, long short-term memory, bidirectional long short-term memory and bidirectional encoder representations from transformers (BERT).

Findings

The authors conducted experiments on both English and multilingual data sets for genre classification. The results show that the BERT model achieved the best accuracy on the English data set, whereas cross-lingual language model pretraining based on RoBERTa (XLM-RoBERTa) performed the best on the multilingual data set. This study found that songs in the metal genre were the most accurately labeled, as their text style and topics were the most distinct from other genres. On the contrary, songs from the pop and rock genres were more challenging to differentiate. This study also compared the impact of different word embeddings on the classification task and found that models with GloVe word embeddings outperformed Word2Vec and the learning embedding layer.

Originality/value

This study presents the implementation, testing and comparison of various machine and deep learning models for genre classification. The results demonstrate that transformer models, including BERT, robustly optimized BERT pretraining approach, distilled bidirectional encoder representations from transformers, bidirectional and auto-regressive transformers and XLM-RoBERTa, outperformed other models.

Details

Global Knowledge, Memory and Communication, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2514-9342

Article

Publication date: 28 February 2024

Predicting and analysing initiator crime environments based on machine learning for improving urban safety

Yoonjae Hwang, Sungwon Jung and Eun Joo Park

Downloads: 200

Abstract

Purpose

Initiator crimes, also known as near-repeat crimes, occur in places with known risk factors and vulnerabilities based on prior crime-related experiences or information. Consequently, the environment in which initiator crimes occur might be different from more general crime environments. This study aimed to analyse the differences between the environments of initiator crimes and general crimes, confirming the need for predicting initiator crimes.

Design/methodology/approach

We compared predictive models using data corresponding to initiator crimes and all residential burglaries without considering repetitive crime patterns as dependent variables. Predictive models built with random forest and gradient boosting, two representative ensemble methods, were compared using various environmental factor data. Subsequently, we evaluated the performance of each predictive model to derive feature importance and partial dependence based on a highly predictive model.
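Partial dependence, as used in the last step, is model-agnostic: one feature is forced to each value on a grid for every row, and the model's predictions are averaged. The sketch below assumes a hypothetical scoring function `toy_model` and made-up rows; it does not reproduce the authors' models or variables.

```python
from statistics import mean

def partial_dependence(model, rows, feature, grid):
    """Average prediction when column `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        modified = [row[:feature] + [value] + row[feature + 1:] for row in rows]
        curve.append(mean(model(r) for r in modified))
    return curve

def toy_model(row):
    """Hypothetical fitted model: risk falls with feature_a, rises with feature_b."""
    feature_a, feature_b = row
    return 0.5 - 0.1 * feature_a + 0.02 * feature_b

rows = [[1, 10], [3, 20], [2, 30]]  # made-up observations
curve = partial_dependence(toy_model, rows, feature=0, grid=[0, 1, 2])
print(curve)
```

Because the other columns keep their observed values, the curve shows the average effect of the chosen feature across the data set rather than its effect at a single point.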

Findings

By analysing environmental factors affecting overall residential burglary and initiator crimes, we observed notable differences in high-importance variables. Further analysis of the partial dependence of total residential burglary and initiator crimes based on these variables revealed distinct impacts on each crime. Moreover, initiator crimes took place in environments consistent with well-known theories in the field of environmental criminology.

Originality/value

Our findings indicate that the initiator crime prediction model may identify results that do not appear with the existing theft crime prediction method. Emphasising the importance of investigating the environments in which initiator crimes occur, this study underscores the potential of artificial intelligence (AI)-based approaches in creating a safe urban environment. By effectively preventing potential crimes, AI-driven prediction of initiator crimes can significantly contribute to enhancing urban safety.

Details

Archnet-IJAR: International Journal of Architectural Research, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2631-6862

Article

Publication date: 19 August 2022

AI federated learning based improvised random Forest classifier with error reduction mechanism for skewed data sets

Anjali More and Dipti Rana

Downloads: 106

Abstract

Purpose

The referred data set provides reliable information about network flows and common attacks that meets real-world criteria. Accordingly, this study focuses on the imbalanced intrusion detection benchmark known as the knowledge discovery in database (KDD) data set, which is widely used by researchers for experimentation and analysis. The proposed algorithm, improvised random forest classification with error tuning factors (IRFCETF), is tested on the KDD data set, and its performance is evaluated on the complete set of network traffic features.

Design/methodology/approach

Researchers' attention is currently drawn to a diverse range of real-time applications that deal with imbalanced data classification (ImDC). Real-time application areas such as artificial intelligence (AI) and the Industrial Internet of Things (IIoT) that deal with ImDC suffer from degraded classification performance because of skewed data distribution (SkDD), and numerous other application areas deal with SkDD as well. The volume of computer network data and of related applications has recently expanded exponentially, and intrusion detection is one of the demanding applications of ImDC. The proposed study focusses on the imbalanced intrusion detection benchmark KDD data set and other benchmark data sets, using the proposed IRFCETF approach. IRFCETF shows improved classification performance on imbalanced data sets compared with the existing approach. The purpose of this work is to review imbalanced data applications in numerous application areas, including AI and IIoT, and to tune performance with respect to principal component analysis. This study also focusses on the out-of-bag error as a performance-tuning factor.

Findings

Experimental results on the KDD data set show that the proposed algorithm gives improved performance. For the referred intrusion detection data set, IRFCETF classification accuracy is 99.57% and the error rate is 0.43%.
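The two figures are complements: the error rate is one minus the accuracy, so 99.57% and 0.43% sum to 100%. The sketch below computes both from confusion-matrix counts and adds recall, because on skewed data a high accuracy alone can hide missed minority cases. The counts are hypothetical and are not taken from the KDD experiments.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def error_rate(tp, tn, fp, fn):
    """Share of all predictions that are wrong (complement of accuracy)."""
    return 1.0 - accuracy(tp, tn, fp, fn)

def recall(tp, fn):
    """Share of actual positives that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical skewed test set: 990 normal flows and 10 attacks, scored by a
# trivial classifier that labels every flow as normal.
counts = dict(tp=0, tn=990, fp=0, fn=10)
acc = accuracy(**counts)
err = error_rate(**counts)
rec = recall(counts["tp"], counts["fn"])
print(acc, err, rec)
```

The trivial classifier reaches high accuracy while detecting no attacks, which is why per-class metrics are worth reading alongside accuracy on imbalanced benchmarks.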

Research limitations/implications

This research work can be extended to further improve classification techniques with multiple correspondence analysis (MCA); hierarchical MCA can be explored with classification models for a wide range of skewed data sets.

Practical implications

The metrics enhancement is measurable and helpful in dealing with intrusion detection systems–related imbalanced applications in current application domains such as security, AI and IIoT digitization. Analytical results show improved metrics for the proposed approach compared with traditional machine learning algorithms. Thus, the proposed IRFCETF demonstrates that the error-tuning parameter has a measurable impact on classification accuracy.

Social implications

The proposed algorithm is useful in numerous IIoT applications such as health care and machinery automation.

Originality/value

This research work presents IRFCETF, an approach to enhancing classification metrics. The proposed method yields a test set categorization for each case with an error reduction mechanism.

Details

International Journal of Pervasive Computing and Communications, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 1742-7371

Article

Publication date: 27 August 2024

Recognize and thrive: predicting employees’ satisfaction towards fairness in reward and recognition system using explainable machine learning and text mining

Shrawan Kumar Trivedi, Jaya Srivastava, Pradipta Patra, Shefali Singh and Debashish Jena

Abstract

Purpose

In the current era, retaining the best-performing employees has become essential for businesses to compete in the dynamic technological landscape. Consequently, organizations must ensure that their star performers believe that the company’s reward and recognition (R&R) system is fair and equal. This study aims to use an explainable machine learning (eXML) model to develop a prediction algorithm for employee satisfaction with the fairness of R&R systems.

Design/methodology/approach

The current study uses state-of-the-art machine learning models such as Naive Bayes, Decision Tree C5.0, Random Forest and support vector machine-RBF to predict employee satisfaction towards fairness in R&R. The primary data used in the study has been collected from the employees of a large public sector undertaking from an emerging economy. This study also proposes a novel improved Naïve Bayes (INB) algorithm, the efficiency of which is compared with the state-of-the-art algorithms.

Findings

It is seen that the proposed INB model outperforms the state-of-the-art algorithms in many scenarios. Further, the proposed model and feature interaction are explained using the explainable machine learning (XML) concept. In addition, this study incorporates text mining techniques to corroborate the results from XML and suggests that “Transparency”, “Recognition”, “Unbiasedness”, “Appreciation” and “Timeliness in reward” are the most important features that impact employee satisfaction.

Originality/value

To the best of the authors’ knowledge, this is one of the first studies to use the INB algorithm and mixed-method research (text mining along with machine learning algorithms) for the prediction of employee satisfaction with respect to the R&R system.

Details

Global Knowledge, Memory and Communication, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2514-9342

Article

Publication date: 18 July 2024

Predicting student success with and without library instruction using supervised machine learning methods

Karen Harker, Carol Hargis and Jennifer Rowe

Abstract

Purpose

The main purpose of this analysis was to demonstrate the value of predictive modeling of student success and identify the key groups of students for which library instruction could provide the most impact.

Design/methodology/approach

Data regarding the attendance of library instruction associated with a first-year writing course were combined with student demographic and academic data over a four-year period representing over 10,000 students. We applied supervised machine learning methods to determine the most accurate model for predicting student outcomes, including course outcome, persistence and graduation. We also assessed the impact of library instruction on these outcomes.

Findings

The gradient-boosted decision tree model provided the most accurate predictions. The impact of library instruction was modest but still was second only to the previous grade point average (GPA). The value of this metric, however, was greatest for students who were struggling, especially those who were first-generation students, regardless of ethnicity. More notably, the impact of library instruction was substantially greater for specific student demographics, including students with lower cumulative GPAs.

Research limitations/implications

Features of the models were limited to high-level academic metrics, some of which may not be very useful in predicting outcomes. Measures more closely related to learning styles, the course or course of study could provide for greater accuracy.

Practical implications

Prediction modeling could allow for a more selective approach to outreach and offers information that the librarian can use to customize instruction sessions and reference interactions.

Social implications

Targeting students who may be at risk of not succeeding in a course has ethical implications either way. If used to bias the subjective assessments, these predictions could produce self-fulfilling prophecies. Conversely, to ignore indicators of possible difficulties the student may have with the material is a disservice to the education of that student.

Originality/value

There are few studies that have incorporated library instruction into models of predicting student outcomes. Library resources and services can play a major role in the success of students, particularly those who have had less exposure to the resources and skills needed to use these resources.

Details

Performance Measurement and Metrics, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 1467-8047
