Search results

1 – 10 of over 17000
Article
Publication date: 28 September 2021

Nageswara Rao Eluri, Gangadhara Rao Kancharla, Suresh Dara and Venkatesulu Dondeti

Gene selection is considered as the fundamental process in the bioinformatics field. The existing methodologies pertain to cancer classification are mostly clinical basis, and its…

Abstract

Purpose

Gene selection is considered as the fundamental process in the bioinformatics field. The existing methodologies pertain to cancer classification are mostly clinical basis, and its diagnosis capability is limited. Nowadays, the significant problems of cancer diagnosis are solved by the utilization of gene expression data. The researchers have been introducing many possibilities to diagnose cancer appropriately and effectively. This paper aims to develop the cancer data classification using gene expression data.

Design/methodology/approach

The proposed classification model involves three main phases: “(1) Feature extraction, (2) Optimal Feature Selection and (3) Classification”. Initially, five benchmark gene expression datasets are collected. From the collected gene expression data, the feature extraction is performed. To diminish the length of the feature vectors, optimal feature selection is performed, for which a new meta-heuristic algorithm termed as quantum-inspired immune clone optimization algorithm (QICO) is used. Once the relevant features are selected, the classification is performed by a deep learning model called recurrent neural network (RNN). Finally, the experimental analysis reveals that the proposed QICO-based feature selection model outperforms the other heuristic-based feature selection and optimized RNN outperforms the other machine learning methods.

Findings

The proposed QICO-RNN is acquiring the best outcomes at any learning percentage. On considering the learning percentage 85, the accuracy of the proposed QICO-RNN was 3.2% excellent than RNN, 4.3% excellent than RF, 3.8% excellent than NB and 2.1% excellent than KNN for Dataset 1. For Dataset 2, at learning percentage 35, the accuracy of the proposed QICO-RNN was 13.3% exclusive than RNN, 8.9% exclusive than RF and 14.8% exclusive than NB and KNN. Hence, the developed QICO algorithm is performing well in classifying the cancer data using gene expression data accurately.

Originality/value

This paper introduces a new optimal feature selection model using QICO and QICO-based RNN for effective classification of cancer data using gene expression data. This is the first work that utilizes an optimal feature selection model using QICO and QICO-RNN for effective classification of cancer data using gene expression data.

Article
Publication date: 11 June 2018

Deepika Kishor Nagthane and Archana M. Rajurkar

One of the main reasons for increase in mortality rate in woman is breast cancer. Accurate early detection of breast cancer seems to be the only solution for diagnosis. In the…

Abstract

Purpose

One of the main reasons for increase in mortality rate in woman is breast cancer. Accurate early detection of breast cancer seems to be the only solution for diagnosis. In the field of breast cancer research, many new computer-aided diagnosis systems have been developed to reduce the diagnostic test false positives because of the subtle appearance of breast cancer tissues. The purpose of this study is to develop the diagnosis technique for breast cancer using LCFS and TreeHiCARe classifier model.

Design/methodology/approach

The proposed diagnosis methodology initiates with the pre-processing procedure. Subsequently, feature extraction is performed. In feature extraction, the image features which preserve the characteristics of the breast tissues are extracted. Consequently, feature selection is performed by the proposed least-mean-square (LMS)-Cuckoo search feature selection (LCFS) algorithm. The feature selection from the vast range of the features extracted from the images is performed with the help of the optimal cut point provided by the LCS algorithm. Then, the image transaction database table is developed using the keywords of the training images and feature vectors. The transaction resembles the itemset and the association rules are generated from the transaction representation based on a priori algorithm with high conviction ratio and lift. After association rule generation, the proposed TreeHiCARe classifier model emanates in the diagnosis methodology. In TreeHICARe classifier, a new feature index is developed for the selection of a central feature for the decision tree centered on which the classification of images into normal or abnormal is performed.

Findings

The performance of the proposed method is validated over existing works using accuracy, sensitivity and specificity measures. The experimentation of proposed method on Mammographic Image Analysis Society database resulted in classification of normal and abnormal cancerous mammogram images with an accuracy of 0.8289, sensitivity of 0.9333 and specificity of 0.7273.

Originality/value

This paper proposes a new approach for the breast cancer diagnosis system by using mammogram images. The proposed method uses two new algorithms: LCFS and TreeHiCARe. LCFS is used to select optimal feature split points, and TreeHiCARe is the decision tree classifier model based on association rule agreements.

Details

Sensor Review, vol. 39 no. 1
Type: Research Article
ISSN: 0260-2288

Keywords

Article
Publication date: 10 March 2022

Jayaram Boga and Dhilip Kumar V.

For achieving the profitable human activity recognition (HAR) method, this paper solves the HAR problem under wireless body area network (WBAN) using a developed ensemble learning…

95

Abstract

Purpose

For achieving the profitable human activity recognition (HAR) method, this paper solves the HAR problem under wireless body area network (WBAN) using a developed ensemble learning approach. The purpose of this study is,to solve the HAR problem under WBAN using a developed ensemble learning approach for achieving the profitable HAR method. There are three data sets used for this HAR in WBAN, namely, human activity recognition using smartphones, wireless sensor data mining and Kaggle. The proposed model undergoes four phases, namely, “pre-processing, feature extraction, feature selection and classification.” Here, the data can be preprocessed by artifacts removal and median filtering techniques. Then, the features are extracted by techniques such as “t-Distributed Stochastic Neighbor Embedding”, “Short-time Fourier transform” and statistical approaches. The weighted optimal feature selection is considered as the next step for selecting the important features based on computing the data variance of each class. This new feature selection is achieved by the hybrid coyote Jaya optimization (HCJO). Finally, the meta-heuristic-based ensemble learning approach is used as a new recognition approach with three classifiers, namely, “support vector machine (SVM), deep neural network (DNN) and fuzzy classifiers.” Experimental analysis is performed.

Design/methodology/approach

The proposed HCJO algorithm was developed for optimizing the membership function of fuzzy, iteration limit of SVM and hidden neuron count of DNN for getting superior classified outcomes and to enhance the performance of ensemble classification.

Findings

The accuracy for enhanced HAR model was pretty high in comparison to conventional models, i.e. higher than 6.66% to fuzzy, 4.34% to DNN, 4.34% to SVM, 7.86% to ensemble and 6.66% to Improved Sealion optimization algorithm-Attention Pyramid-Convolutional Neural Network-AP-CNN, respectively.

Originality/value

The suggested HAR model with WBAN using HCJO algorithm is accurate and improves the effectiveness of the recognition.

Details

International Journal of Pervasive Computing and Communications, vol. 19 no. 4
Type: Research Article
ISSN: 1742-7371

Keywords

Article
Publication date: 2 July 2020

N. Venkata Sailaja, L. Padmasree and N. Mangathayaru

Text mining has been used for various knowledge discovery based applications, and thus, a lot of research has been contributed towards it. Latest trending research in the text…

176

Abstract

Purpose

Text mining has been used for various knowledge discovery based applications, and thus, a lot of research has been contributed towards it. Latest trending research in the text mining is adopting the incremental learning data, as it is economical while dealing with large volume of information.

Design/methodology/approach

The primary intention of this research is to design and develop a technique for incremental text categorization using optimized Support Vector Neural Network (SVNN). The proposed technique involves four major steps, such as pre-processing, feature selection, classification and feature extraction. Initially, the data is pre-processed based on stop word removal and stemming. Then, the feature extraction is done by extracting semantic word-based features and Term Frequency and Inverse Document Frequency (TF-IDF). From the extracted features, the important features are selected using Bhattacharya distance measure and the features are subjected as the input to the proposed classifier. The proposed classifier performs incremental learning using SVNN, wherein the weights are bounded in a limit using rough set theory. Moreover, for the optimal selection of weights in SVNN, Moth Search (MS) algorithm is used. Thus, the proposed classifier, named Rough set MS-SVNN, performs the text categorization for the incremental data, given as the input.

Findings

For the experimentation, the 20 News group dataset, and the Reuters dataset are used. Simulation results indicate that the proposed Rough set based MS-SVNN has achieved 0.7743, 0.7774 and 0.7745 for the precision, recall and F-measure, respectively.

Originality/value

In this paper, an online incremental learner is developed for the text categorization. The text categorization is done by developing the Rough set MS-SVNN classifier, which classifies the incoming texts based on the boundary condition evaluated by the Rough set theory, and the optimal weights from the MS. The proposed online text categorization scheme has the basic steps, like pre-processing, feature extraction, feature selection and classification. The pre-processing is carried out to identify the unique words from the dataset, and the features like semantic word-based features and TF-IDF are obtained from the keyword set. Feature selection is done by setting a minimum Bhattacharya distance measure, and the selected features are provided to the proposed Rough set MS-SVNN for the classification.

Details

Data Technologies and Applications, vol. 54 no. 5
Type: Research Article
ISSN: 2514-9288

Keywords

Open Access
Article
Publication date: 28 July 2020

Noura AlNuaimi, Mohammad Mehedy Masud, Mohamed Adel Serhani and Nazar Zaki

Organizations in many domains generate a considerable amount of heterogeneous data every day. Such data can be processed to enhance these organizations’ decisions in real time…

3576

Abstract

Organizations in many domains generate a considerable amount of heterogeneous data every day. Such data can be processed to enhance these organizations’ decisions in real time. However, storing and processing large and varied datasets (known as big data) is challenging to do in real time. In machine learning, streaming feature selection has always been considered a superior technique for selecting the relevant subset features from highly dimensional data and thus reducing learning complexity. In the relevant literature, streaming feature selection refers to the features that arrive consecutively over time; despite a lack of exact figure on the number of features, numbers of instances are well-established. Many scholars in the field have proposed streaming-feature-selection algorithms in attempts to find the proper solution to this problem. This paper presents an exhaustive and methodological introduction of these techniques. This study provides a review of the traditional feature-selection algorithms and then scrutinizes the current algorithms that use streaming feature selection to determine their strengths and weaknesses. The survey also sheds light on the ongoing challenges in big-data research.

Details

Applied Computing and Informatics, vol. 18 no. 1/2
Type: Research Article
ISSN: 2634-1964

Keywords

Article
Publication date: 28 February 2019

Gabrijela Dimic, Dejan Rancic, Nemanja Macek, Petar Spalevic and Vida Drasute

This paper aims to deal with the previously unknown prediction accuracy of students’ activity pattern in a blended learning environment.

Abstract

Purpose

This paper aims to deal with the previously unknown prediction accuracy of students’ activity pattern in a blended learning environment.

Design/methodology/approach

To extract the most relevant activity feature subset, different feature-selection methods were applied. For different cardinality subsets, classification models were used in the comparison.

Findings

Experimental evaluation oppose the hypothesis that feature vector dimensionality reduction leads to prediction accuracy increasing.

Research limitations/implications

Improving prediction accuracy in a described learning environment was based on applying synthetic minority oversampling technique, which had affected results on correlation-based feature-selection method.

Originality/value

The major contribution of the research is the proposed methodology for selecting the optimal low-cardinal subset of students’ activities and significant prediction accuracy improvement in a blended learning environment.

Details

Information Discovery and Delivery, vol. 47 no. 2
Type: Research Article
ISSN: 2398-6247

Keywords

Book part
Publication date: 30 September 2020

B. G. Deepa and S. Senthil

Breast cancer (BC) is one of the leading cancer in the world, BC risk has been there for women of the middle age also, it is the malignant tumor. However, identifying BC in the…

Abstract

Breast cancer (BC) is one of the leading cancer in the world, BC risk has been there for women of the middle age also, it is the malignant tumor. However, identifying BC in the early stage will save most of the women’s life. As there is an advancement in the technology research used Machine Learning (ML) algorithm Random Forest for ranking the feature, Support Vector Machine (SVM), and Naïve Bayes (NB) supervised classifiers for selection of best optimized features and prediction of BC accuracy. The estimation of prediction accuracy has been done by using the dataset Wisconsin Breast Cancer Data from University of California Irvine (UCI) ML repository. To perform all these operation, Anaconda one of the open source distribution of Python has been used. The proposed work resulted in extemporize improvement in the NB and SVM classifier accuracy. The performance evaluation of the proposed model is estimated by using classification accuracy, confusion matrix, mean, standard deviation, variance, and root mean-squared error.

The experimental results shows that 70-30 data split will result in best accuracy. SVM acts as a feature optimizer of 12 best features with the result of 97.66% accuracy and improvement of 1.17% after feature reduction. NB results with feature optimizer 17 of best features with the result of 96.49% accuracy and improvement of 1.17% after feature reduction.

The study shows that proposal model works very effectively as compare to the existing models with respect to accuracy measures.

Details

Big Data Analytics and Intelligence: A Perspective for Health Care
Type: Book
ISBN: 978-1-83909-099-8

Keywords

Article
Publication date: 28 July 2020

Sathyaraj R, Ramanathan L, Lavanya K, Balasubramanian V and Saira Banu J

The innovation in big data is increasing day by day in such a way that the conventional software tools face several problems in managing the big data. Moreover, the occurrence of…

Abstract

Purpose

The innovation in big data is increasing day by day in such a way that the conventional software tools face several problems in managing the big data. Moreover, the occurrence of the imbalance data in the massive data sets is a major constraint to the research industry.

Design/methodology/approach

The purpose of the paper is to introduce a big data classification technique using the MapReduce framework based on an optimization algorithm. The big data classification is enabled using the MapReduce framework, which utilizes the proposed optimization algorithm, named chicken-based bacterial foraging (CBF) algorithm. The proposed algorithm is generated by integrating the bacterial foraging optimization (BFO) algorithm with the cat swarm optimization (CSO) algorithm. The proposed model executes the process in two stages, namely, training and testing phases. In the training phase, the big data that is produced from different distributed sources is subjected to parallel processing using the mappers in the mapper phase, which perform the preprocessing and feature selection based on the proposed CBF algorithm. The preprocessing step eliminates the redundant and inconsistent data, whereas the feature section step is done on the preprocessed data for extracting the significant features from the data, to provide improved classification accuracy. The selected features are fed into the reducer for data classification using the deep belief network (DBN) classifier, which is trained using the proposed CBF algorithm such that the data are classified into various classes, and finally, at the end of the training process, the individual reducers present the trained models. Thus, the incremental data are handled effectively based on the training model in the training phase. In the testing phase, the incremental data are taken and split into different subsets and fed into the different mappers for the classification. Each mapper contains a trained model which is obtained from the training phase. The trained model is utilized for classifying the incremental data. After classification, the output obtained from each mapper is fused and fed into the reducer for the classification.

Findings

The maximum accuracy and Jaccard coefficient are obtained using the epileptic seizure recognition database. The proposed CBF-DBN produces a maximal accuracy value of 91.129%, whereas the accuracy values of the existing neural network (NN), DBN, naive Bayes classifier-term frequency–inverse document frequency (NBC-TFIDF) are 82.894%, 86.184% and 86.512%, respectively. The Jaccard coefficient of the proposed CBF-DBN produces a maximal Jaccard coefficient value of 88.928%, whereas the Jaccard coefficient values of the existing NN, DBN, NBC-TFIDF are 75.891%, 79.850% and 81.103%, respectively.

Originality/value

In this paper, a big data classification method is proposed for categorizing massive data sets for meeting the constraints of huge data. The big data classification is performed on the MapReduce framework based on training and testing phases in such a way that the data are handled in parallel at the same time. In the training phase, the big data is obtained and partitioned into different subsets of data and fed into the mapper. In the mapper, the features extraction step is performed for extracting the significant features. The obtained features are subjected to the reducers for classifying the data using the obtained features. The DBN classifier is utilized for the classification wherein the DBN is trained using the proposed CBF algorithm. The trained model is obtained as an output after the classification. In the testing phase, the incremental data are considered for the classification. New data are first split into subsets and fed into the mapper for classification. The trained models obtained from the training phase are used for the classification. The classified results from each mapper are fused and fed into the reducer for the classification of big data.

Details

Data Technologies and Applications, vol. 55 no. 3
Type: Research Article
ISSN: 2514-9288

Keywords

Article
Publication date: 4 December 2017

Fuzan Chen, Harris Wu, Runliang Dou and Minqiang Li

The purpose of this paper is to build a compact and accurate classifier for high-dimensional classification.

Abstract

Purpose

The purpose of this paper is to build a compact and accurate classifier for high-dimensional classification.

Design/methodology/approach

A classification approach based on class-dependent feature subspace (CFS) is proposed. CFS is a class-dependent integration of a support vector machine (SVM) classifier and associated discriminative features. For each class, our genetic algorithm (GA)-based approach evolves the best subset of discriminative features and SVM classifier simultaneously. To guarantee convergence and efficiency, the authors customize the GA in terms of encoding strategy, fitness evaluation, and genetic operators.

Findings

Experimental studies demonstrated that the proposed CFS-based approach is superior to other state-of-the-art classification algorithms on UCI data sets in terms of both concise interpretation and predictive power for high-dimensional data.

Research limitations/implications

UCI data sets rather than real industrial data are used to evaluate the proposed approach. In addition, only single-label classification is addressed in the study.

Practical implications

The proposed method not only constructs an accurate classification model but also obtains a compact combination of discriminative features. It is helpful for business makers to get a concise understanding of the high-dimensional data.

Originality/value

The authors propose a compact and effective classification approach for high-dimensional data. Instead of the same feature subset for all the classes, the proposed CFS-based approach obtains the optimal subset of discriminative feature and SVM classifier for each class. The proposed approach enhances both interpretability and predictive power for high-dimensional data.

Details

Industrial Management & Data Systems, vol. 117 no. 10
Type: Research Article
ISSN: 0263-5577

Keywords

Article
Publication date: 22 March 2024

Mohd Mustaqeem, Suhel Mustajab and Mahfooz Alam

Software defect prediction (SDP) is a critical aspect of software quality assurance, aiming to identify and manage potential defects in software systems. In this paper, we have…

Abstract

Purpose

Software defect prediction (SDP) is a critical aspect of software quality assurance, aiming to identify and manage potential defects in software systems. In this paper, we have proposed a novel hybrid approach that combines Gray Wolf Optimization with Feature Selection (GWOFS) and multilayer perceptron (MLP) for SDP. The GWOFS-MLP hybrid model is designed to optimize feature selection, ultimately enhancing the accuracy and efficiency of SDP. Gray Wolf Optimization, inspired by the social hierarchy and hunting behavior of gray wolves, is employed to select a subset of relevant features from an extensive pool of potential predictors. This study investigates the key challenges that traditional SDP approaches encounter and proposes promising solutions to overcome time complexity and the curse of the dimensionality reduction problem.

Design/methodology/approach

The integration of GWOFS and MLP results in a robust hybrid model that can adapt to diverse software datasets. This feature selection process harnesses the cooperative hunting behavior of wolves, allowing for the exploration of critical feature combinations. The selected features are then fed into an MLP, a powerful artificial neural network (ANN) known for its capability to learn intricate patterns within software metrics. MLP serves as the predictive engine, utilizing the curated feature set to model and classify software defects accurately.

Findings

The performance evaluation of the GWOFS-MLP hybrid model on a real-world software defect dataset demonstrates its effectiveness. The model achieves a remarkable training accuracy of 97.69% and a testing accuracy of 97.99%. Additionally, the receiver operating characteristic area under the curve (ROC-AUC) score of 0.89 highlights the model’s ability to discriminate between defective and defect-free software components.

Originality/value

Experimental implementations using machine learning-based techniques with feature reduction are conducted to validate the proposed solutions. The goal is to enhance SDP’s accuracy, relevance and efficiency, ultimately improving software quality assurance processes. The confusion matrix further illustrates the model’s performance, with only a small number of false positives and false negatives.

Details

International Journal of Intelligent Computing and Cybernetics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1756-378X

Keywords

1 – 10 of over 17000