Search results

1 – 10 of over 60000
To view the access options for this content please click here
Article
Publication date: 4 May 2021

Nor Hamizah Miswan, Chee Seng Chan and Chong Guan Ng

This paper develops a robust hospital readmission prediction framework by combining the feature selection algorithm and machine learning (ML) classifiers. The improved…

Abstract

Purpose

This paper develops a robust hospital readmission prediction framework by combining the feature selection algorithm and machine learning (ML) classifiers. The improved feature selection is proposed by considering the uncertainty in patient's attributes that leads to the output variable.

Design/methodology/approach

First, data preprocessing is conducted which includes how raw data is managed. Second, the impactful features are selected through feature selection process. It started with calculating the relational grade of each patient towards readmission using grey relational analysis (GRA) and the grade is used as the target values for feature selection. Then, the influenced features are selected using the Least Absolute Shrinkage and Selection Operator (LASSO) method. This proposed method is termed as Grey-LASSO feature selection. The final task is the readmission prediction using ML classifiers.

Findings

The proposed method offered good performances with a minimum feature subset up to 54–65% discarded features. Multi-Layer Perceptron with Grey-LASSO gave the best performance.

Research limitations/implications

The performance of Grey-LASSO is justified in two readmission datasets. Further research is required to examine the generalisability to other datasets.

Originality/value

In designing the feature selection algorithm, the selection on influenced input variables was based on the integration of GRA and LASSO. Specifically, GRA is a part of the grey system theory, which was employed to analyse the relation between systems under uncertain conditions. The LASSO approach was adopted due to its ability for sparse data representation.

Details

Grey Systems: Theory and Application, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2043-9377

Keywords

Content available
Article
Publication date: 28 July 2020

Noura AlNuaimi, Mohammad Mehedy Masud, Mohamed Adel Serhani and Nazar Zaki

Organizations in many domains generate a considerable amount of heterogeneous data every day. Such data can be processed to enhance these organizations’ decisions in real…

Abstract

Organizations in many domains generate a considerable amount of heterogeneous data every day. Such data can be processed to enhance these organizations’ decisions in real time. However, storing and processing large and varied datasets (known as big data) is challenging to do in real time. In machine learning, streaming feature selection has always been considered a superior technique for selecting the relevant subset features from highly dimensional data and thus reducing learning complexity. In the relevant literature, streaming feature selection refers to the features that arrive consecutively over time; despite a lack of exact figure on the number of features, numbers of instances are well-established. Many scholars in the field have proposed streaming-feature-selection algorithms in attempts to find the proper solution to this problem. This paper presents an exhaustive and methodological introduction of these techniques. This study provides a review of the traditional feature-selection algorithms and then scrutinizes the current algorithms that use streaming feature selection to determine their strengths and weaknesses. The survey also sheds light on the ongoing challenges in big-data research.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Keywords

To view the access options for this content please click here
Article
Publication date: 23 August 2019

Janani Balakumar and S. Vijayarani Mohan

Owing to the huge volume of documents available on the internet, text classification becomes a necessary task to handle these documents. To achieve optimal text…

Abstract

Purpose

Owing to the huge volume of documents available on the internet, text classification becomes a necessary task to handle these documents. To achieve optimal text classification results, feature selection, an important stage, is used to curtail the dimensionality of text documents by choosing suitable features. The main purpose of this research work is to classify the personal computer documents based on their content.

Design/methodology/approach

This paper proposes a new algorithm for feature selection based on artificial bee colony (ABCFS) to enhance the text classification accuracy. The proposed algorithm (ABCFS) is scrutinized with the real and benchmark data sets, which is contrary to the other existing feature selection approaches such as information gain and χ2 statistic. To justify the efficiency of the proposed algorithm, the support vector machine (SVM) and improved SVM classifier are used in this paper.

Findings

The experiment was conducted on real and benchmark data sets. The real data set was collected in the form of documents that were stored in the personal computer, and the benchmark data set was collected from Reuters and 20 Newsgroups corpus. The results prove the performance of the proposed feature selection algorithm by enhancing the text document classification accuracy.

Originality/value

This paper proposes a new ABCFS algorithm for feature selection, evaluates the efficiency of the ABCFS algorithm and improves the support vector machine. In this paper, the ABCFS algorithm is used to select the features from text (unstructured) documents. Although, there is no text feature selection algorithm in the existing work, the ABCFS algorithm is used to select the data (structured) features. The proposed algorithm will classify the documents automatically based on their content.

To view the access options for this content please click here
Article
Publication date: 23 September 2020

Z.F. Zhang, Wei Liu, Egon Ostrosi, Yongjie Tian and Jianping Yi

During the production process of steel strip, some defects may appear on the surface, that is, traditional manual inspection could not meet the requirements of low-cost…

Abstract

Purpose

During the production process of steel strip, some defects may appear on the surface, that is, traditional manual inspection could not meet the requirements of low-cost and high-efficiency production. The purpose of this paper is to propose a method of feature selection based on filter methods combined with hidden Bayesian classifier for improving the efficiency of defect recognition and reduce the complexity of calculation. The method can select the optimal hybrid model for realizing the accurate classification of steel strip surface defects.

Design/methodology/approach

A large image feature set was initially obtained based on the discrete wavelet transform feature extraction method. Three feature selection methods (including correlation-based feature selection, consistency subset evaluator [CSE] and information gain) were then used to optimize the feature space. Parameters for the feature selection methods were based on the classification accuracy results of hidden Naive Bayes (HNB) algorithm. The selected feature subset was then applied to the traditional NB classifier and leading extended NB classifiers.

Findings

The experimental results demonstrated that the HNB model combined with feature selection approaches has better classification performance than other models of defect recognition. Among the results of this study, the proposed hybrid model of CSE + HNB is the most robust and effective and of highest classification accuracy in identifying the optimal subset of the surface defect database.

Originality/value

The main contribution of this paper is the development of a hybrid model combining feature selection and multi-class classification algorithms for steel strip surface inspection. The proposed hybrid model is primarily robust and effective for steel strip surface inspection.

Details

Engineering Computations, vol. 38 no. 4
Type: Research Article
ISSN: 0264-4401

Keywords

To view the access options for this content please click here
Article
Publication date: 12 June 2020

Sandeepkumar Hegde and Monica R. Mundada

According to the World Health Organization, by 2025, the contribution of chronic disease is expected to rise by 73% compared to all deaths and it is considered as global…

Abstract

Purpose

According to the World Health Organization, by 2025, the contribution of chronic disease is expected to rise by 73% compared to all deaths and it is considered as global burden of disease with a rate of 60%. These diseases persist for a longer duration of time, which are almost incurable and can only be controlled. Cardiovascular disease, chronic kidney disease (CKD) and diabetes mellitus are considered as three major chronic diseases that will increase the risk among the adults, as they get older. CKD is considered a major disease among all these chronic diseases, which will increase the risk among the adults as they get older. Overall 10% of the population of the world is affected by CKD and it is likely to double in the year 2030. The paper aims to propose novel feature selection approach in combination with the machine-learning algorithm which can early predict the chronic disease with utmost accuracy. Hence, a novel feature selection adaptive probabilistic divergence-based feature selection (APDFS) algorithm is proposed in combination with the hyper-parameterized logistic regression model (HLRM) for the early prediction of chronic disease.

Design/methodology/approach

A novel feature selection APDFS algorithm is proposed which explicitly handles the feature associated with the class label by relevance and redundancy analysis. The algorithm applies the statistical divergence-based information theory to identify the relationship between the distant features of the chronic disease data set. The data set required to experiment is obtained from several medical labs and hospitals in India. The HLRM is used as a machine-learning classifier. The predictive ability of the framework is compared with the various algorithm and also with the various chronic disease data set. The experimental result illustrates that the proposed framework is efficient and achieved competitive results compared to the existing work in most of the cases.

Findings

The performance of the proposed framework is validated by using the metric such as recall, precision, F1 measure and ROC. The predictive performance of the proposed framework is analyzed by passing the data set belongs to various chronic disease such as CKD, diabetes and heart disease. The diagnostic ability of the proposed approach is demonstrated by comparing its result with existing algorithms. The experimental figures illustrated that the proposed framework performed exceptionally well in prior prediction of CKD disease with an accuracy of 91.6.

Originality/value

The capability of the machine learning algorithms depends on feature selection (FS) algorithms in identifying the relevant traits from the data set, which impact the predictive result. It is considered as a process of choosing the relevant features from the data set by removing redundant and irrelevant features. Although there are many approaches that have been already proposed toward this objective, they are computationally complex because of the strategy of following a one-step scheme in selecting the features. In this paper, a novel feature selection APDFS algorithm is proposed which explicitly handles the feature associated with the class label by relevance and redundancy analysis. The proposed algorithm handles the process of feature selection in two separate indices. Hence, the computational complexity of the algorithm is reduced to O(nk+1). The algorithm applies the statistical divergence-based information theory to identify the relationship between the distant features of the chronic disease data set. The data set required to experiment is obtained from several medical labs and hospitals of karkala taluk ,India. The HLRM is used as a machine learning classifier. The predictive ability of the framework is compared with the various algorithm and also with the various chronic disease data set. The experimental result illustrates that the proposed framework is efficient and achieved competitive results are compared to the existing work in most of the cases.

Details

International Journal of Pervasive Computing and Communications, vol. 17 no. 1
Type: Research Article
ISSN: 1742-7371

Keywords

To view the access options for this content please click here
Article
Publication date: 7 July 2020

Jiaming Liu, Liuan Wang, Linan Zhang, Zeming Zhang and Sicheng Zhang

The primary objective of this study was to recognize critical indicators in predicting blood glucose (BG) through data-driven methods and to compare the prediction…

Abstract

Purpose

The primary objective of this study was to recognize critical indicators in predicting blood glucose (BG) through data-driven methods and to compare the prediction performance of four tree-based ensemble models, i.e. bagging with tree regressors (bagging-decision tree [Bagging-DT]), AdaBoost with tree regressors (Adaboost-DT), random forest (RF) and gradient boosting decision tree (GBDT).

Design/methodology/approach

This study proposed a majority voting feature selection method by combining lasso regression with the Akaike information criterion (AIC) (LR-AIC), lasso regression with the Bayesian information criterion (BIC) (LR-BIC) and RF to select indicators with excellent predictive performance from initial 38 indicators in 5,642 samples. The selected features were deployed to build the tree-based ensemble models. The 10-fold cross-validation (CV) method was used to evaluate the performance of each ensemble model.

Findings

The results of feature selection indicated that age, corpuscular hemoglobin concentration (CHC), red blood cell volume distribution width (RBCVDW), red blood cell volume and leucocyte count are five most important clinical/physical indicators in BG prediction. Furthermore, this study also found that the GBDT ensemble model combined with the proposed majority voting feature selection method is better than other three models with respect to prediction performance and stability.

Practical implications

This study proposed a novel BG prediction framework for better predictive analytics in health care.

Social implications

This study incorporated medical background and machine learning technology to reduce diabetes morbidity and formulate precise medical schemes.

Originality/value

The majority voting feature selection method combined with the GBDT ensemble model provides an effective decision-making tool for predicting BG and detecting diabetes risk in advance.

Content available
Article
Publication date: 11 April 2018

Mohamed A. Tawhid and Kevin B. Dsouza

In this paper, we present a new hybrid binary version of bat and enhanced particle swarm optimization algorithm in order to solve feature selection problems. The proposed…

Abstract

In this paper, we present a new hybrid binary version of bat and enhanced particle swarm optimization algorithm in order to solve feature selection problems. The proposed algorithm is called Hybrid Binary Bat Enhanced Particle Swarm Optimization Algorithm (HBBEPSO). In the proposed HBBEPSO algorithm, we combine the bat algorithm with its capacity for echolocation helping explore the feature space and enhanced version of the particle swarm optimization with its ability to converge to the best global solution in the search space. In order to investigate the general performance of the proposed HBBEPSO algorithm, the proposed algorithm is compared with the original optimizers and other optimizers that have been used for feature selection in the past. A set of assessment indicators are used to evaluate and compare the different optimizers over 20 standard data sets obtained from the UCI repository. Results prove the ability of the proposed HBBEPSO algorithm to search the feature space for optimal feature combinations.

Details

Applied Computing and Informatics, vol. 16 no. 1/2
Type: Research Article
ISSN: 2634-1964

Keywords

To view the access options for this content please click here
Article
Publication date: 11 June 2018

Deepika Kishor Nagthane and Archana M. Rajurkar

One of the main reasons for increase in mortality rate in woman is breast cancer. Accurate early detection of breast cancer seems to be the only solution for diagnosis. In…

Abstract

Purpose

One of the main reasons for increase in mortality rate in woman is breast cancer. Accurate early detection of breast cancer seems to be the only solution for diagnosis. In the field of breast cancer research, many new computer-aided diagnosis systems have been developed to reduce the diagnostic test false positives because of the subtle appearance of breast cancer tissues. The purpose of this study is to develop the diagnosis technique for breast cancer using LCFS and TreeHiCARe classifier model.

Design/methodology/approach

The proposed diagnosis methodology initiates with the pre-processing procedure. Subsequently, feature extraction is performed. In feature extraction, the image features which preserve the characteristics of the breast tissues are extracted. Consequently, feature selection is performed by the proposed least-mean-square (LMS)-Cuckoo search feature selection (LCFS) algorithm. The feature selection from the vast range of the features extracted from the images is performed with the help of the optimal cut point provided by the LCS algorithm. Then, the image transaction database table is developed using the keywords of the training images and feature vectors. The transaction resembles the itemset and the association rules are generated from the transaction representation based on a priori algorithm with high conviction ratio and lift. After association rule generation, the proposed TreeHiCARe classifier model emanates in the diagnosis methodology. In TreeHICARe classifier, a new feature index is developed for the selection of a central feature for the decision tree centered on which the classification of images into normal or abnormal is performed.

Findings

The performance of the proposed method is validated over existing works using accuracy, sensitivity and specificity measures. The experimentation of proposed method on Mammographic Image Analysis Society database resulted in classification of normal and abnormal cancerous mammogram images with an accuracy of 0.8289, sensitivity of 0.9333 and specificity of 0.7273.

Originality/value

This paper proposes a new approach for the breast cancer diagnosis system by using mammogram images. The proposed method uses two new algorithms: LCFS and TreeHiCARe. LCFS is used to select optimal feature split points, and TreeHiCARe is the decision tree classifier model based on association rule agreements.

Details

Sensor Review, vol. 39 no. 1
Type: Research Article
ISSN: 0260-2288

Keywords

To view the access options for this content please click here
Article
Publication date: 16 October 2018

Guan Yuan, Zhaohui Wang, Fanrong Meng, Qiuyan Yan and Shixiong Xia

Currently, ubiquitous smartphones embedded with various sensors provide a convenient way to collect raw sequence data. These data bridges the gap between human activity…

Abstract

Purpose

Currently, ubiquitous smartphones embedded with various sensors provide a convenient way to collect raw sequence data. These data bridges the gap between human activity and multiple sensors. Human activity recognition has been widely used in quite a lot of aspects in our daily life, such as medical security, personal safety, living assistance and so on.

Design/methodology/approach

To provide an overview, the authors survey and summarize some important technologies and involved key issues of human activity recognition, including activity categorization, feature engineering as well as typical algorithms presented in recent years. In this paper, the authors first introduce the character of embedded sensors and dsiscuss their features, as well as survey some data labeling strategies to get ground truth label. Then, following the process of human activity recognition, the authors discuss the methods and techniques of raw data preprocessing and feature extraction, and summarize some popular algorithms used in model training and activity recognizing. Third, they introduce some interesting application scenarios of human activity recognition and provide some available data sets as ground truth data to validate proposed algorithms.

Findings

The authors summarize their viewpoints on human activity recognition, discuss the main challenges and point out some potential research directions.

Originality/value

It is hoped that this work will serve as the steppingstone for those interested in advancing human activity recognition.

Details

Sensor Review, vol. 39 no. 2
Type: Research Article
ISSN: 0260-2288

Keywords

Content available
Article
Publication date: 11 August 2020

Hongfang Zhou, Xiqian Wang and Yao Zhang

Feature selection is an essential step in data mining. The core of it is to analyze and quantize the relevancy and redundancy between the features and the classes. In CFR…

Abstract

Feature selection is an essential step in data mining. The core of it is to analyze and quantize the relevancy and redundancy between the features and the classes. In CFR feature selection method, they rarely consider which feature to choose if two or more features have the same value using evaluation criterion. In order to address this problem, the standard deviation is employed to adjust the importance between relevancy and redundancy. Based on this idea, a novel feature selection method named as Feature Selection Based on Weighted Conditional Mutual Information (WCFR) is introduced. Experimental results on ten datasets show that our proposed method has higher classification accuracy.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Keywords

1 – 10 of over 60000