Search results

1 – 10 of 604
Article
Publication date: 6 January 2022

Deepti Sisodia and Dilip Singh Sisodia

Abstract

Purpose

The problem of choosing the most useful features from hundreds of features in time-series user click data arises in online advertising when classifying fraudulent publishers. Selecting feature subsets is a key issue in such classification tasks. In practice, filter approaches are common; however, they neglect the correlations among features. Conversely, wrapper approaches could not be applied due to their complexity. Moreover, existing feature selection methods could not handle such data, which is one of the major causes of instability in feature selection.

Design/methodology/approach

To overcome such issues, a majority voting-based hybrid feature selection method, namely feature distillation and accumulated selection (FDAS), is proposed to investigate the optimal subset of relevant features for analyzing the publisher's fraudulent conduct. FDAS works in two phases: (1) feature distillation, where significant features from standard filter and wrapper feature selection methods are obtained using majority voting; (2) accumulated selection, where an accumulated evaluation of relevant feature subsets is enumerated to search for an optimal feature subset using effective machine learning (ML) models.
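The two phases described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the strict-majority threshold and the reading of "accumulated selection" as scoring growing prefixes of a ranked feature list are all assumptions made for illustration, and `score_fn` is a hypothetical stand-in for training and validating an ML model on a feature subset.

```python
from collections import Counter


def majority_vote_features(selections, min_votes=None):
    """Phase 1 (assumed form): keep features picked by a majority of selectors.

    `selections` is a list of feature-name lists, one per filter/wrapper method.
    """
    if min_votes is None:
        min_votes = len(selections) // 2 + 1  # strict majority of the selectors
    votes = Counter(f for chosen in selections for f in set(chosen))
    return sorted(f for f, n in votes.items() if n >= min_votes)


def accumulated_selection(ranked_features, score_fn):
    """Phase 2 (assumed form): score growing prefixes, keep the best one.

    `score_fn` is hypothetical; in practice it would return a validation
    score of a model trained on the given feature subset.
    """
    best_subset, best_score = [], float("-inf")
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        score = score_fn(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score
```

With three selectors, a feature survives phase 1 only if at least two of them chose it; phase 2 then stops growing the subset once extra features no longer improve the score.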

Findings

Empirical results show enhanced classification performance with the proposed features in terms of average precision, recall, F1-score and AUC in publisher identification and classification.

Originality/value

The FDAS is evaluated on FDMA2012 user-click data and nine other benchmark datasets to gauge its generalizing characteristics: first, considering the original features; second, with relevant feature subsets selected by feature selection (FS) methods; and third, with the optimal feature subset obtained by the proposed approach. An ANOVA significance test is conducted to demonstrate significant differences between independent features.

Details

Data Technologies and Applications, vol. 56 no. 4
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 17 January 2022

Syed Haroon Abdul Gafoor and Padma Theagarajan

Abstract

Purpose

Conventional diagnostic techniques may be prone to subjectivity since they depend on the assessment of motions that are often too subtle for the human eye and hence hard to classify, potentially resulting in misdiagnosis. Meanwhile, early nonmotor signs of Parkinson’s disease (PD) can be mild and may be due to a variety of other conditions. As a result, these signs are usually ignored, making early PD diagnosis difficult. Machine learning approaches for distinguishing PD patients from healthy controls or from individuals with similar medical symptoms (such as movement disorders or other Parkinsonian syndromes) have been introduced to solve these problems and to enhance the diagnostic and assessment processes of PD.

Design/methodology/approach

Medical observations and evaluation of medical symptoms, including characterization of a wide range of motor indications, are commonly used to diagnose PD. As the quantity of data being processed has grown in the last five years, feature selection has become a prerequisite before any classification. This study introduces a feature selection method based on the score-based artificial fish swarm algorithm (SAFSA) to overcome this issue.

Findings

This study adds to the accuracy of PD identification by reducing the number of chosen vocal features while using the most recent and largest publicly accessible database. Feature subset selection in PD detection techniques starts by eliminating features that are irrelevant or redundant. The feature subset chosen should provide the best performance according to a few objective functions.

Research limitations/implications

In many situations, this is a nondeterministic polynomial-time hard (NP-hard) problem. This method enhances the PD detection rate by selecting the most essential features from the database. To begin, the data set's dimensionality is reduced using the singular value decomposition (SVD) technique. Next, biogeography-based optimization (BBO) is applied for feature selection; the weight value is a vital parameter for finding the best features in PD classification.

Originality/value

PD classification is done by using ensemble learning classification approaches such as a hybrid classifier of fuzzy K-nearest neighbor, kernel support vector machines, fuzzy convolutional neural network and random forest. The suggested classifiers are trained using data from the UCI ML repository, and their results are verified using leave-one-person-out cross-validation. The measures employed to assess classifier efficiency include accuracy, F-measure and the Matthews correlation coefficient.
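The evaluation protocol named above can be sketched in a few lines. This is an illustrative outline only, assuming binary labels (1 = PD, 0 = control) and one identifier per recording; the function names are hypothetical and nothing here reproduces the study's classifiers. Leave-one-person-out splitting holds out every recording of one subject per fold, so a model is never tested on a person it has seen during training.

```python
import math


def leave_one_person_out(person_ids):
    """Yield (held_out_person, train_indices, test_indices) for each person."""
    for person in sorted(set(person_ids)):
        test_idx = [i for i, p in enumerate(person_ids) if p == person]
        train_idx = [i for i, p in enumerate(person_ids) if p != person]
        yield person, train_idx, test_idx


def binary_metrics(y_true, y_pred):
    """Accuracy, F-measure and Matthews correlation coefficient for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 when undefined
    return {"accuracy": (tp + tn) / len(pairs),
            "f_measure": f_measure,
            "mcc": mcc}
```

In use, predictions from all folds would typically be pooled before calling `binary_metrics`, since a single held-out person usually carries only one class label.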

Details

International Journal of Intelligent Computing and Cybernetics, vol. 15 no. 4
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 13 August 2020

Chandra Sekhar Kolli and Uma Devi Tatavarthi

Abstract

Purpose

Fraud transaction detection has become a significant factor in communication technologies and electronic commerce systems, as it affects the usage of electronic payment. Even though various fraud detection methods have been developed, enhancing the performance of electronic payment by detecting fraudsters remains a great challenge in bank transactions.

Design/methodology/approach

This paper aims to design a fraud detection mechanism using the proposed Harris water optimization-based deep recurrent neural network (HWO-based deep RNN). The proposed fraud detection strategy includes three phases, namely, pre-processing, feature selection and fraud detection. Initially, the input transactional data is subjected to the pre-processing phase, where the data is pre-processed using the Box-Cox transformation to remove redundant and noisy values from the data. The pre-processed data is passed to the feature selection phase, where the essential and suitable features are selected using the wrapper model. The selected features enable the classifier to achieve better detection performance. Finally, the selected features are fed to the detection phase, where the deep recurrent neural network classifier performs fraud detection; the classifier is trained by the proposed Harris water optimization algorithm, which integrates water wave optimization and Harris hawks optimization.
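For readers unfamiliar with the pre-processing step, the Box-Cox transformation in its standard form is a power transform for strictly positive values: (x^λ − 1)/λ for λ ≠ 0 and ln(x) for λ = 0. The sketch below shows that standard definition only; how the paper applies it is not stated in the abstract, and the function name and the fixed `lam` argument are assumptions (λ is commonly estimated from the data, for example by maximum likelihood).

```python
import math


def box_cox(values, lam):
    """Standard Box-Cox power transform for strictly positive inputs."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive inputs")
    if lam == 0:
        # Limiting case of (x**lam - 1) / lam as lam approaches 0
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]
```

For example, with `lam=0.5` the transform compresses large transaction amounts more strongly than small ones, which tends to reduce skew in the input features.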

Findings

The proposed HWO-based deep RNN obtained better performance in terms of accuracy, sensitivity and specificity, with values of 0.9192, 0.7642 and 0.9943, respectively.

Originality/value

An effective fraud detection method named HWO-based deep RNN is designed to detect fraud in bank transactions. The optimal features selected using the wrapper model enable the classifier to find fraudulent activities more efficiently. The detection result is evaluated through the optimization model based on the fitness measure, such that the solution with the minimal error value is declared the best, as it yields better detection results.

Article
Publication date: 15 December 2017

Farshid Abdi, Kaveh Khalili-Damghani and Shaghayegh Abolmakarem

Abstract

Purpose

The customer insurance coverage sales plan problem, in which loyal customers are recognized and offered special plans, is an essential problem facing insurance companies. In addition, loyal customers who have enough potential to renew their insurance contracts at the end of the contract term should be persuaded to repurchase or renew their contracts. The aim of this paper is to propose a three-stage data-mining approach to recognize high-potential loyal insurance customers and to predict/plan special insurance coverage sales.

Design/methodology/approach

The first stage addresses data cleansing. In the second stage, several filter and wrapper methods are implemented to select proper features. In the third stage, K-nearest neighbor algorithm is used to cluster the customers. The approach aims to select a compact feature subset with the maximal prediction capability. The proposed approach can detect the customers who are more likely to buy a specific insurance coverage at the end of a contract term.

Findings

The proposed approach has been applied in a real case study of an insurance company in Iran. On the basis of the findings, the proposed approach is capable of recognizing customer clusters and planning suitable insurance coverage sales plans for loyal customers with a proper accuracy level. Therefore, the proposed approach can help the insurance company identify its potential clients. Consequently, insurance managers can consider appropriate marketing tactics and allocate the company's resources appropriately to their high-potential loyal customers, preventing them from switching to competitors.

Originality/value

Despite the importance of recognizing high-potential loyal insurance customers, little study has been done in this area. In this paper, data-mining techniques were developed for the prediction of special insurance coverage sales on the basis of customers’ characteristics. The method allows the insurance company to prioritize their customers and focus their attention on high-potential loyal customers. Using the outputs of the proposed approach, the insurance companies can offer the most productive/economic insurance coverage contracts to their customers. The approach proposed by this study can be customized and used in other service companies.

Article
Publication date: 12 June 2017

Taehoon Ko, Je Hyuk Lee, Hyunchang Cho, Sungzoon Cho, Wounjoo Lee and Miji Lee

Abstract

Purpose

Quality management of products is an important part of the manufacturing process. One way to manage and assure product quality is to use machine learning algorithms based on the relationships among various process steps. The purpose of this paper is to integrate manufacturing, inspection and after-sales service data to make full use of machine learning algorithms for estimating the products’ quality in a supervised fashion. The proposed frameworks and methods are applied to actual data associated with heavy machinery engines.

Design/methodology/approach

By following Lenzerini’s formula, manufacturing, inspection and after-sales service data from various sources are integrated. The after-sales service data are used to label each engine as normal or abnormal. In this study, one-class classification algorithms are used due to the class imbalance problem. To address the multi-dimensionality of the time series data, the symbolic aggregate approximation algorithm is used for data segmentation. Then, a binary genetic algorithm-based wrapper approach is applied to the segmented data to find the optimal feature subset.
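Symbolic aggregate approximation (SAX) in its textbook form can be sketched as follows: z-normalize the series, average it over equal-length segments (piecewise aggregate approximation), then map each segment mean to a letter using breakpoints that cut the standard normal distribution into equiprobable regions. The function name, the default alphabet and the requirement that the series length be divisible by the number of segments are assumptions of this sketch; the paper's exact configuration is not given in the abstract.

```python
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet="abcd"):
    """Return the SAX word for a numeric series (textbook variant)."""
    if len(series) % n_segments != 0:
        raise ValueError("series length must be divisible by n_segments")
    # Step 1: z-normalize (a constant series maps to all zeros)
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma else 0.0 for x in series]
    # Step 2: piecewise aggregate approximation, one mean per segment
    seg_len = len(z) // n_segments
    paa = [mean(z[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]
    # Step 3: breakpoints splitting N(0, 1) into len(alphabet) equal-probability bins
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    # Each segment's symbol index is the number of breakpoints it exceeds
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)
```

Each sensor channel would be converted separately, turning a long numeric trace into a short word whose symbols can then serve as candidate features for the wrapper step.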

Findings

By employing machine learning-based anomaly detection models, an anomaly score for each engine is calculated. Experimental results show that the proposed method can detect defective engines with a high probability before they are shipped.

Originality/value

Through data integration, the actual customer-perceived quality from after-sales service is linked to data from manufacturing and inspection process. In terms of business application, data integration and machine learning-based anomaly detection can help manufacturers establish quality management policies that reflect the actual customer-perceived quality by predicting defective engines.

Details

Industrial Management & Data Systems, vol. 117 no. 5
Type: Research Article
ISSN: 0263-5577

Article
Publication date: 9 March 2022

G.L. Infant Cyril and J.P. Ananth

Abstract

Purpose

Banks are an essential part of the market economy. The failure or success of an institution relies on its ability to compute credit risk. The loan eligibility prediction model uses an analysis method that draws on past and current information about the credit user to make predictions. However, precise loan prediction with risk and assessment analysis is a major challenge in loan eligibility prediction.

Design/methodology/approach

The aim of this research is to present a new method, namely Social Border Collie Optimization (SBCO)-based deep neuro fuzzy network, for loan eligibility prediction. In this method, the Box-Cox transformation is applied to the input loan data to make the data suitable for further processing. Wrapper-based feature selection is then applied to the transformed data to choose suitable features and boost the performance of loan eligibility calculation. Once the features are chosen, naive Bayes (NB) is adapted for feature fusion. In NB training, the classifier builds a probability index table with the help of input data features and group values. Here, the testing of the NB classifier is done using the posterior probability ratio considering the conditional probability of the normalization constant with class evidence. Finally, loan eligibility prediction is achieved by the deep neuro fuzzy network, which is trained with the designed SBCO. Here, the SBCO is devised by combining the social ski driver (SSD) algorithm and Border Collie Optimization (BCO) to produce the most precise result.

Findings

The analysis is carried out using the accuracy, sensitivity and specificity parameters. The designed method achieves the highest accuracy of 95%, with sensitivity and specificity of 95.4% and 97.3%, respectively, when compared to existing methods such as fuzzy neural network (Fuzzy NN), multiple partial least squares regression model (Multi_PLS), instance-based entropy fuzzy support vector machine (IEFSVM), deep recurrent neural network (Deep RNN) and whale social optimization algorithm-based deep RNN (WSOA-based Deep RNN).

Originality/value

This paper devises an SBCO-based deep neuro fuzzy network for predicting loan eligibility. Here, the deep neuro fuzzy network is trained with the proposed SBCO, which is devised by combining the SSD and BCO to produce the most precise result for loan eligibility prediction.

Details

Kybernetes, vol. 52 no. 8
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 16 October 2018

Guan Yuan, Zhaohui Wang, Fanrong Meng, Qiuyan Yan and Shixiong Xia

Abstract

Purpose

Currently, ubiquitous smartphones embedded with various sensors provide a convenient way to collect raw sequence data. These data bridge the gap between human activity and multiple sensors. Human activity recognition has been widely used in many aspects of daily life, such as medical security, personal safety and living assistance.

Design/methodology/approach

To provide an overview, the authors survey and summarize some important technologies and key issues of human activity recognition, including activity categorization, feature engineering and typical algorithms presented in recent years. In this paper, the authors first introduce the characteristics of embedded sensors and discuss their features, and survey some data labeling strategies for obtaining ground truth labels. Second, following the process of human activity recognition, the authors discuss the methods and techniques of raw data preprocessing and feature extraction, and summarize some popular algorithms used in model training and activity recognition. Third, they introduce some interesting application scenarios of human activity recognition and provide some available data sets as ground truth data to validate proposed algorithms.

Findings

The authors summarize their viewpoints on human activity recognition, discuss the main challenges and point out some potential research directions.

Originality/value

It is hoped that this work will serve as the steppingstone for those interested in advancing human activity recognition.

Details

Sensor Review, vol. 39 no. 2
Type: Research Article
ISSN: 0260-2288

Article
Publication date: 10 November 2023

Yong Gui and Lanxin Zhang

Abstract

Purpose

Influenced by the constantly changing manufacturing environment, no single dispatching rule (SDR) can consistently obtain better scheduling results than other rules for the dynamic job-shop scheduling problem (DJSP). Although the dynamic SDR selection classifier (DSSC) mined by the traditional data-mining-based scheduling method has shown some improvement in comparison to an SDR, the enhancement is not significant since the rule selected by the DSSC is still an SDR.

Design/methodology/approach

This paper presents a novel data-mining-based scheduling method for the DJSP with machine failure aiming at minimizing the makespan. Firstly, a scheduling priority relation model (SPRM) is constructed to determine the appropriate priority relation between two operations based on the production system state and the difference between their priority values calculated using multiple SDRs. Subsequently, a training sample acquisition mechanism based on the optimal scheduling schemes is proposed to acquire training samples for the SPRM. Furthermore, feature selection and machine learning are conducted using the genetic algorithm and extreme learning machine to mine the SPRM.
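Genetic-algorithm feature selection of the kind mentioned above can be outlined generically. The sketch below is not the paper's method: it encodes each candidate subset as a 0/1 mask and evolves masks with tournament selection, one-point crossover, bit-flip mutation and elitism, all of which are assumed design choices. The `fitness` callable is hypothetical; in a wrapper setting it would train a learner (an extreme learning machine in the paper) on the masked features and return a validation score. The sketch assumes at least two features.

```python
import random


def ga_feature_selection(n_features, fitness, pop_size=20, generations=30,
                         mutation_rate=0.05, seed=0):
    """Evolve a 0/1 feature mask that maximizes `fitness(mask)`."""
    rng = random.Random(seed)  # seeded for reproducible runs
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament():
        # Pick two distinct individuals at random, keep the fitter one
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Flip each bit independently with probability mutation_rate
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best, fitness(best)
```

Because every fitness call stands for a full model fit in a real wrapper, caching scores per mask is usually worthwhile; it is omitted here to keep the sketch short.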

Findings

Results from numerical experiments demonstrate that the SPRM, mined by the proposed method, not only achieves better scheduling results in most manufacturing environments but also maintains a higher level of stability in diverse manufacturing environments than an SDR and the DSSC.

Originality/value

This paper constructs an SPRM and mines it using data mining technologies to obtain better results than an SDR and the DSSC in various manufacturing environments.

Details

Kybernetes, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 30 November 2004

S B Kotsiantis and P E Pintelas

Abstract

Machine learning algorithms fed with data sets that include information such as attendance data, test scores and other student information can provide tutors with powerful tools for decision‐making. Until now, much of the research has been limited to the relation between single variables and student performance. Combining multiple variables as possible predictors of dropout has generally been overlooked. The aim of this work is to present a high-level architecture and a case study for a prototype machine learning tool which can automatically recognize dropout‐prone students in university-level distance learning classes. Tracking student progress is a time‐consuming job which can be handled automatically by such a tool. While the tutors will still have an essential role in monitoring and evaluating student progress, the tool can compile the data required for reasonable and efficient monitoring. What is more, the application of the tool is not restricted to predicting dropout‐prone students: it can also be used for the prediction of students’ marks, for the prediction of how many students will submit a written assignment, etc. It can also help tutors explore data and build models for prediction, forecasting and classification. Finally, the underlying architecture is independent of the data set and as such it can be used to develop other similar tools.

Details

Interactive Technology and Smart Education, vol. 1 no. 4
Type: Research Article
ISSN: 1741-5659

Article
Publication date: 4 April 2022

Shrawan Kumar Trivedi, Amrinder Singh and Somesh Kumar Malhotra

Abstract

Purpose

There is a need to predict whether consumers liked their stay in the hotel rooms or not, and to remove the aspects the customers did not like. Many customers leave a review after staying in the hotel. These reviews are mostly given on the website used to book the hotel. These reviews can be considered valuable data, which can be analyzed to provide better services in the hotels. The purpose of this study is to use machine learning techniques for analyzing the given data to determine different sentiment polarities of the consumers.

Design/methodology/approach

The data consist of reviews given by hotel customers on the Tripadvisor website, which were made publicly available by Kaggle. Out of 10,000 reviews in the data, a sample of 3,000 negative polarity reviews (customers with bad experiences in the hotel) and 3,000 positive polarity reviews (customers with good experiences in the hotel) is taken to prepare the data set. Two-stage feature selection was applied, which first involved a greedy selection method and then a wrapper method to generate the 37 most relevant features. An improved stacked decision tree (ISD) classifier is built, which is further compared with state-of-the-art machine learning algorithms. All the tests are done using R-Studio.

Findings

After in-depth study, the results showed that the new model was satisfactory overall, with 80.77% accuracy for a 50–50 split, 80.74% accuracy for a 66–34 split and 80.25% accuracy for an 80–20 split, when predicting the nature of the customers’ experience in the hotel, i.e. whether it is positive or negative.

Research limitations/implications

The implication of this research is to showcase how the polarity of potentially popular reviews can be predicted. This can help the hotel industry take corrective measures for the betterment of business and promote useful positive reviews. This study also has some limitations: only English reviews are considered, and the study was restricted to data from the Tripadvisor website; however, new data may be generated to test the credibility of the model. Only aspect-based sentiment classification is considered in this study.

Originality/value

Stacking machine learning techniques have been proposed. At first, state-of-the-art classifiers are tested on the given data, and then the three best-performing classifiers (decision tree C5.0, random forest and support vector machine) are taken to build the stack and create the ISD classifier.
