Search results

1 – 10 of over 18000
Article
Publication date: 7 November 2016

Ismail Hmeidi, Mahmoud Al-Ayyoub, Nizar A. Mahyoub and Mohammed A. Shehab

Abstract

Purpose

Multi-label Text Classification (MTC) is one of the most recent research trends in the data mining and information retrieval domains, for many reasons such as the rapid growth of online data and the increasing tendency of internet users to be comfortable with assigning multiple labels/tags to describe documents, emails, posts, etc. The dimensionality of the label space makes MTC more difficult and challenging than traditional single-label text classification (TC). Because MTC is a natural extension of TC, several ways have been proposed to benefit from the rich TC literature through what are called problem transformation (PT) methods. Basically, PT methods transform multi-label data into single-label data that are suitable for traditional single-label classification algorithms. Another approach is to design novel classification algorithms customized for MTC. Over the past decade, several works have appeared on both approaches, focusing mainly on the English language. This work aims to present an elaborate study of MTC of Arabic articles.

Design/methodology/approach

This paper presents a novel lexicon-based method for MTC, where the keywords that are most associated with each label are extracted from the training data along with a threshold that can later be used to determine whether each test document belongs to a certain label.
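
A minimal Python sketch of the lexicon idea follows. It rests on assumptions the abstract does not state: a word's association with a label is scored by how many of that label's training documents contain it, each label keeps its `top_k` most frequent words, and the per-label threshold is the mean lexicon-hit ratio of that label's own training documents. The function names, the toy English data and the scoring rule are hypothetical; Arabic text would also need normalization, stop-word removal and stemming.

```python
from collections import Counter
from fractions import Fraction


def tokenize(text):
    # Hypothetical whitespace tokenizer; no normalization or stemming.
    return set(text.lower().split())


def build_lexicons(docs, labels, top_k=3):
    """docs: list of strings; labels: list of label sets (one set per document)."""
    label_docs = {}
    for text, doc_labels in zip(docs, labels):
        for label in doc_labels:
            label_docs.setdefault(label, []).append(tokenize(text))
    model = {}
    for label, token_sets in label_docs.items():
        # Document frequency of each word among this label's training documents.
        df = Counter(word for tokens in token_sets for word in tokens)
        ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
        lexicon = {word for word, _ in ranked[:top_k]}
        # Threshold: mean fraction of the lexicon found in the label's own documents.
        ratios = [Fraction(len(lexicon & tokens), len(lexicon)) for tokens in token_sets]
        model[label] = (lexicon, sum(ratios) / len(ratios))
    return model


def predict(model, text):
    # Assign every label whose lexicon-hit ratio reaches its learned threshold.
    tokens = tokenize(text)
    return {label for label, (lexicon, threshold) in model.items()
            if Fraction(len(lexicon & tokens), len(lexicon)) >= threshold}


docs = ["goal match team score", "team player goal", "stock market price",
        "market price oil", "team market sponsor"]
labels = [{"sports"}, {"sports"}, {"economy"}, {"economy"}, {"sports", "economy"}]
model = build_lexicons(docs, labels)
print(predict(model, "team goal tonight"))  # {'sports'}
```

Exact fractions are used so that a test document scoring exactly at the threshold is not lost to floating-point rounding; a real system would more likely use a discriminative association score that penalizes words common to all labels.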

Findings

The experiments show that the presented approach outperforms the currently available approaches. Specifically, the best accuracy obtained by existing approaches is only 18 per cent, whereas the presented lexicon-based approach can reach an accuracy of 31 per cent.

Originality/value

Although there exist some tools that can be customized to address the MTC problem for Arabic text, their accuracies are very low when applied to Arabic articles. This paper presents a novel method for MTC. The experiments show that the presented approach outperforms the currently available approaches.

Details

International Journal of Web Information Systems, vol. 12 no. 4
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 28 July 2020

Sathyaraj R, Ramanathan L, Lavanya K, Balasubramanian V and Saira Banu J

Abstract

Purpose

Big data is growing day by day, to the point that conventional software tools face several problems in managing it. Moreover, imbalanced data in massive data sets are a major constraint for the research industry.

Design/methodology/approach

The purpose of the paper is to introduce a big data classification technique using the MapReduce framework based on an optimization algorithm. The big data classification is enabled using the MapReduce framework, which utilizes the proposed optimization algorithm, named the chicken-based bacterial foraging (CBF) algorithm. The proposed algorithm is generated by integrating the bacterial foraging optimization (BFO) algorithm with the cat swarm optimization (CSO) algorithm. The proposed model executes the process in two stages, namely, the training and testing phases. In the training phase, the big data produced from different distributed sources are processed in parallel by the mappers, which perform preprocessing and feature selection based on the proposed CBF algorithm. The preprocessing step eliminates redundant and inconsistent data, whereas the feature selection step extracts the significant features from the preprocessed data to improve classification accuracy. The selected features are fed into the reducer for data classification using the deep belief network (DBN) classifier, which is trained using the proposed CBF algorithm such that the data are classified into various classes; at the end of the training process, the individual reducers present the trained models. Thus, incremental data are handled effectively based on the model trained in the training phase. In the testing phase, the incremental data are split into different subsets and fed into the different mappers for classification. Each mapper contains a trained model obtained from the training phase, which is used to classify the incremental data. After classification, the outputs of the mappers are fused and fed into the reducer for the final classification.
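
The testing-phase data flow described above (split the incremental data, let each mapper classify its subset with a copy of the trained model, then fuse the mapper outputs in a reducer) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: a thread pool stands in for distributed mappers, `trained_model` is a hypothetical one-feature threshold rule standing in for the CBF-trained DBN, and fusion is assumed to mean merging per-record labels and counting classes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def trained_model(record):
    # Hypothetical stand-in for the trained DBN: a single-feature threshold rule.
    return "high" if record["amplitude"] > 100 else "low"


def split(data, n_subsets):
    # Round-robin partition of the incremental data into n_subsets chunks.
    return [data[i::n_subsets] for i in range(n_subsets)]


def mapper(subset):
    # Each mapper holds the trained model and labels only its own subset.
    return [(record["id"], trained_model(record)) for record in subset]


def reducer(mapper_outputs):
    # Fuse per-mapper outputs into one id -> label table plus per-class counts.
    labels = {}
    for output in mapper_outputs:
        labels.update(output)
    return labels, Counter(labels.values())


def classify_incremental(data, n_mappers=3):
    # Run the mappers concurrently, then hand all their outputs to one reducer.
    with ThreadPoolExecutor(max_workers=n_mappers) as pool:
        outputs = list(pool.map(mapper, split(data, n_mappers)))
    return reducer(outputs)


data = [{"id": i, "amplitude": a} for i, a in enumerate([20, 150, 90, 300, 5])]
labels, counts = classify_incremental(data)
print(counts)  # Counter({'low': 3, 'high': 2})
```

The point of the structure is that mappers share nothing but the read-only trained model, so the subsets can be classified independently and only the small per-record results travel to the reducer.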

Findings

The maximum accuracy and Jaccard coefficient are obtained using the epileptic seizure recognition database. The proposed CBF-DBN achieves a maximal accuracy of 91.129%, whereas the accuracy values of the existing neural network (NN), DBN and naive Bayes classifier-term frequency–inverse document frequency (NBC-TFIDF) methods are 82.894%, 86.184% and 86.512%, respectively. The proposed CBF-DBN also achieves a maximal Jaccard coefficient of 88.928%, whereas the Jaccard coefficient values of the existing NN, DBN and NBC-TFIDF methods are 75.891%, 79.850% and 81.103%, respectively.
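
For reference, accuracy and one common single-class form of the Jaccard coefficient, TP / (TP + FP + FN), can be computed as below. The abstract does not say how its Jaccard coefficient was computed or averaged across classes, so this definition is an assumption, and the labels are toy data.

```python
def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def jaccard(y_true, y_pred, positive):
    # Intersection over union for one class: TP / (TP + FP + FN).
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    # Defined here as 1.0 when the class is neither present nor predicted.
    return tp / (tp + fp + fn) if tp + fp + fn else 1.0


y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
print(accuracy(y_true, y_pred))    # 0.6
print(jaccard(y_true, y_pred, 1))  # 0.5
```

Unlike accuracy, this Jaccard form ignores true negatives, which is one reason the two figures can differ on the same predictions.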

Originality/value

In this paper, a big data classification method is proposed for categorizing massive data sets to meet the constraints of huge data. The big data classification is performed on the MapReduce framework in training and testing phases, in such a way that the data are handled in parallel. In the training phase, the big data are obtained, partitioned into different subsets and fed into the mappers. In the mappers, the feature extraction step is performed to extract the significant features. The obtained features are passed to the reducers, which classify the data. The DBN classifier is used for the classification, and the DBN is trained using the proposed CBF algorithm. The trained model is obtained as the output of this phase. In the testing phase, incremental data are considered for classification. New data are first split into subsets and fed into the mappers, which classify them using the trained models obtained from the training phase. The classified results from each mapper are fused and fed into the reducer for the classification of big data.

Details

Data Technologies and Applications, vol. 55 no. 3
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 14 January 2022

Ashutosh Shankhdhar, Pawan Kumar Verma, Prateek Agrawal, Vishu Madaan and Charu Gupta

Abstract

Purpose

The aim of this paper is to explore the brain–computer interface (BCI) as a methodology, raising awareness of it and increasing its reliable use cases, so that an individual's quality of life can be enhanced via neuroscience and neural networks and the risks of certain BCI experiments can be evaluated proactively.

Design/methodology/approach

This paper puts forward an efficient approach for an existing BCI device that can enhance the performance of an electroencephalography (EEG) signal classifier in a composite multiclass problem, and it investigates the effects of sampling rate on feature extraction and of multiple channels on the accuracy of a complex multiclass EEG signal. A one-dimensional convolutional neural network architecture is used to classify the EEG signals and improve their quality, and other algorithms are applied to test their variability. The paper also discusses integrating Internet of Things (IoT) multimedia technology with a custom-designed BCI network based on a widely used protocol, Message Queuing Telemetry Transport (MQTT).
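
To make the one-dimensional convolutional step concrete, here is a minimal forward pass of a single 1-D convolutional layer over one EEG channel: a valid (unpadded) sliding dot product, a ReLU and global max pooling, giving one feature per kernel. The kernels are hand-picked for illustration rather than learned, and the paper's actual architecture, channel count and sampling rate are not given in the abstract. As in most deep-learning libraries, the convolution is computed as a cross-correlation (the kernel is not flipped).

```python
def conv1d(signal, kernel, bias=0.0, stride=1):
    # Valid 1-D cross-correlation: slide the kernel over the signal without padding.
    k = len(kernel)
    return [sum(w * x for w, x in zip(kernel, signal[i:i + k])) + bias
            for i in range(0, len(signal) - k + 1, stride)]


def relu(values):
    # Clamp negative activations to zero.
    return [max(0.0, v) for v in values]


def conv_features(signal, kernels):
    # One feature per kernel: convolution, then ReLU, then global max pooling.
    return [max(relu(conv1d(signal, kernel))) for kernel in kernels]


eeg_channel = [0, 1, 3, 1, 0, -2, 0]   # toy samples, arbitrary units
kernels = [[-1, 1], [1, 1, 1]]         # hypothetical edge and smoothing kernels
print(conv_features(eeg_channel, kernels))  # [2.0, 5.0]
```

In a trained network the kernel weights and biases would be learned by backpropagation and several such layers would be stacked; the sketch only shows what one layer computes.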

Findings

At the end of our implementation stage, 98% accuracy was achieved in a binary classification problem of classifying digit and non-digit stimuli, and 36% accuracy was observed in the classification of signals resulting from stimuli of digits 0 to 9.

Originality/value

BCI, also known as the neural-control interface, is a device that helps a user reliably interact with a computer using only his/her brain activity, which is usually measured via EEG. An EEG machine is a device used for observing the neural activity and electrical signals generated in certain parts of the human brain, which in turn can help us study the core components of the human brain and how it functions, in order to improve the quality of human life in general.

Details

International Journal of Quality & Reliability Management, vol. 39 no. 7
Type: Research Article
ISSN: 0265-671X

Article
Publication date: 16 September 2021

Sireesha Jasti

Abstract

Purpose

The internet has undergone tremendous change with the advancement of new technologies. This change has led internet users to post comments about services and products. Sentiment classification is the process of analyzing such reviews to help users decide whether to purchase a product.

Design/methodology/approach

A rider feedback artificial tree optimization-enabled deep recurrent neural network (RFATO-enabled deep RNN) is developed for the effective classification of sentiments into various grades. The proposed RFATO algorithm is modeled by integrating the feedback artificial tree (FAT) algorithm into the rider optimization algorithm (ROA), and it is used to train the deep RNN classifier that classifies the sentiments in the review data. Pre-processing is performed by stemming and stop-word removal, which remove redundancy for smoother processing of the data. Features, including sentiwordnet-based features, a variant of term frequency–inverse document frequency (TF-IDF) features and spam-word-based features, are extracted from the review data to form the feature vector. Feature fusion is performed based on the entropy of the extracted features. The metrics used to evaluate the proposed RFATO algorithm are accuracy, sensitivity and specificity.
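
The abstract uses an unspecified variant of TF-IDF; for orientation, the sketch below computes the standard form, tf(t, d) · log(N / df(t)), after stop-word removal (stemming is omitted). The stop-word list and the review texts are illustrative assumptions, and the sentiwordnet and spam-word features and the entropy-based fusion are not reproduced.

```python
import math
from collections import Counter

# Hypothetical minimal stop-word list; real systems use a much larger one.
STOP_WORDS = {"the", "is", "a", "and", "of"}


def preprocess(review):
    # Lowercase, split on whitespace and drop stop words (no stemming here).
    return [w for w in review.lower().split() if w not in STOP_WORDS]


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against documents emptied by preprocessing
        vectors.append({term: (c / length) * math.log(n / df[term])
                        for term, c in counts.items()})
    return vectors


reviews = ["The phone is great", "The battery is poor",
           "Great battery and great screen"]
vectors = tf_idf([preprocess(r) for r in reviews])
print(vectors[0])
```

A term that occurs in every review gets weight log(1) = 0, which is why TF-IDF down-weights words that carry little distinguishing information.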

Findings

With the proposed RFATO algorithm, accuracy, sensitivity and specificity are all higher than those of the existing algorithms.

Originality/value

The proposed RFATO algorithm is modeled by integrating the FAT algorithm in the ROA, which is used for training the deep RNN classifier for the classification of sentiments in the review data. The pre-processing is performed by the stemming and the stop word removal process for removing the redundancy for smoother processing of the data. The features including the sentiwordnet-based features, a variant of TF-IDF features and spam words-based features are extracted from the review data to form the feature vector. Feature fusion is performed based on the entropy of the features that are extracted.

Details

International Journal of Web Information Systems, vol. 17 no. 6
Type: Research Article
ISSN: 1744-0084

Open Access
Article
Publication date: 8 March 2021

Mamdouh Abdel Alim Saad Mowafy and Walaa Mohamed Elaraby Mohamed Shallan

Abstract

Purpose

Heart disease has become one of the leading causes of death among Egyptians, with 500 deaths per 100,000 occurring annually in Egypt. Medical data also face a high-dimensionality problem that decreases the classification accuracy of heart data. The purpose of this study is therefore to improve the classification accuracy of heart disease data, helping doctors diagnose heart disease efficiently, by using a hybrid classification technique.

Design/methodology/approach

This paper uses a new approach that integrates dimensionality reduction techniques, namely multiple correspondence analysis (MCA) and principal component analysis (PCA), with fuzzy c-means (FCM), and then with both a multilayer perceptron (MLP) and radial basis function networks (RBFN), which separate patients into different categories based on their diagnosis results. The paper also presents a comparative study of the performance of six structures, MLP, RBFN, MLP via FCM–MCA, MLP via FCM–PCA, RBFN via FCM–MCA and RBFN via FCM–PCA, to identify the best classifier.
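
Of the pipeline above, the fuzzy c-means (FCM) step is the most algorithmic. The following is a minimal, dependency-free sketch of standard FCM with fuzzifier m: memberships are updated as u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)) and centers as membership-weighted means. It assumes the inputs are already reduced (for example, PCA or MCA component scores), which is not shown; the abstract does not state how FCM output is passed to the MLP or RBFN, so using the membership rows as extra classifier inputs would be an assumption.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=300, tol=1e-6, seed=0):
    """Cluster points (tuples of floats) into c fuzzy clusters.

    Returns (centers, memberships), where memberships[i][j] is the degree to
    which point i belongs to cluster j; each membership row sums to 1.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, c)]
    dim = len(points[0])
    memberships = []
    for _ in range(iters):
        # Membership step: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        memberships = []
        for p in points:
            d = [math.dist(p, v) for v in centers]
            zeros = d.count(0.0)
            if zeros:
                # Point sits exactly on one or more centers: split membership among them.
                row = [1.0 / zeros if dj == 0.0 else 0.0 for dj in d]
            else:
                row = [1.0 / sum((dj / dk) ** (2 / (m - 1)) for dk in d) for dj in d]
            memberships.append(row)
        # Center step: mean of the points weighted by u_ij ** m.
        new_centers = []
        for j in range(c):
            weights = [row[j] ** m for row in memberships]
            total = sum(weights)
            new_centers.append([sum(w * p[k] for w, p in zip(weights, points)) / total
                                for k in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, memberships = fuzzy_c_means(points, c=2)
print(sorted(round(v[0], 2) for v in centers))
```

Unlike k-means, every point keeps a graded membership in every cluster; the fuzzifier m controls how soft those memberships are, with m = 2 being a common default.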

Findings

The results show that the MLP via FCM–MCA classifier structure achieves the highest classification accuracy and outperforms the other methods, and that smoking was the most influential factor in heart disease.

Originality/value

This paper shows the importance of integrating statistical methods in increasing the classification accuracy of heart disease data.

Details

Review of Economics and Political Science, vol. 6 no. 3
Type: Research Article
ISSN: 2356-9980

Article
Publication date: 16 April 2020

Mohammad Mahdi Ershadi and Abbas Seifi

Abstract

Purpose

This study aims to perform differential diagnosis of several diseases using classification methods to support effective medical treatment. For this purpose, classification methods based on data, on experts' knowledge and on both are considered in some cases. In addition, feature reduction and several clustering methods are used to improve their performance.

Design/methodology/approach

First, the performance of the classification methods is evaluated for the differential diagnosis of different diseases. Then, experts' knowledge is used to modify the structures of the Bayesian networks. Analyses of the results show that using experts' knowledge is more effective than other algorithms at increasing the accuracy of Bayesian network classification. Datasets for a total of ten different diseases, taken from the University of California, Irvine (UCI) Machine Learning Repository, are used for testing.

Findings

The proposed method improves both the computation time and the accuracy of the classification methods used in this paper. Among all classification methods, Bayesian networks based on experts' knowledge achieve the highest average accuracy, 87 percent, with the lowest average standard deviation, 0.04, over the sample datasets.

Practical implications

The proposed methodology can be applied to perform disease differential diagnosis analysis.

Originality/value

This study presents the usefulness of experts' knowledge in the diagnosis while proposing an adopted improvement method for classifications. Besides, the Bayesian network based on experts' knowledge is useful for different diseases neglected by previous papers.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 13 no. 1
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 28 September 2021

Nageswara Rao Eluri, Gangadhara Rao Kancharla, Suresh Dara and Venkatesulu Dondeti

Abstract

Purpose

Gene selection is considered a fundamental process in the bioinformatics field. Existing methodologies for cancer classification are mostly clinically based, and their diagnostic capability is limited. Nowadays, significant problems in cancer diagnosis are being solved using gene expression data, and researchers have introduced many possibilities for diagnosing cancer appropriately and effectively. This paper aims to develop cancer data classification using gene expression data.

Design/methodology/approach

The proposed classification model involves three main phases: “(1) Feature extraction, (2) Optimal Feature Selection and (3) Classification”. Initially, five benchmark gene expression datasets are collected, and feature extraction is performed on them. To reduce the length of the feature vectors, optimal feature selection is performed using a new meta-heuristic algorithm termed the quantum-inspired immune clone optimization algorithm (QICO). Once the relevant features are selected, classification is performed by a deep learning model, a recurrent neural network (RNN). Finally, the experimental analysis reveals that the proposed QICO-based feature selection model outperforms the other heuristic-based feature selection models and that the optimized RNN outperforms the other machine learning methods.

Findings

The proposed QICO-RNN achieves the best outcomes at every learning percentage. At a learning percentage of 85, the accuracy of the proposed QICO-RNN was 3.2% higher than that of RNN, 4.3% higher than RF, 3.8% higher than NB and 2.1% higher than KNN for Dataset 1. For Dataset 2, at a learning percentage of 35, the accuracy of the proposed QICO-RNN was 13.3% higher than RNN, 8.9% higher than RF and 14.8% higher than both NB and KNN. Hence, the developed QICO algorithm performs well in classifying cancer data accurately using gene expression data.

Originality/value

This paper introduces a new optimal feature selection model using QICO and QICO-based RNN for effective classification of cancer data using gene expression data. This is the first work that utilizes an optimal feature selection model using QICO and QICO-RNN for effective classification of cancer data using gene expression data.

Article
Publication date: 28 February 2019

Wonjoon Kim, Byungki Jin, Sanghyun Choo, Chang S. Nam and Myung Hwan Yun

Abstract

Purpose

Sitting in a chair is a routine activity for modern people. Prolonged sitting and sitting with improper postures can lead to musculoskeletal disorders. Thus, there is a need for a sitting posture monitoring system that can predict a person's sitting posture. The purpose of this paper is to develop a system for classifying children’s sitting postures to help them form correct postural habits.

Design/methodology/approach

For the data analysis, a film-type pressure sensor was installed on the seat of the chair, and image data of the posture were collected. A total of 26 children participated in the experiment, and image data were collected for a total of seven postures. The authors used a convolutional neural network (CNN) algorithm consisting of seven layers. In addition, to compare classification accuracy, an artificial neural network (ANN), one of the machine learning techniques, was used.

Findings

The CNN algorithm was used for sitting posture classification, and the average accuracy obtained by tenfold cross-validation was 97.5 percent. The authors confirmed that the classification accuracy of the CNN algorithm is superior to that of conventional machine learning algorithms such as ANN and DNN. This study confirmed the applicability of a CNN-based algorithm in a smart chair that supports correct posture in children.
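
Tenfold cross-validation, as used in the findings, partitions the samples into ten folds, trains on nine, tests on the held-out one and averages the ten scores. A minimal sketch follows; `fit_majority` and `predict_majority` are hypothetical placeholders (a majority-class baseline) for any classifier, since the CNN itself is not reproduced, and the toy labels are made up. The folds here are contiguous for readability; in practice the samples would be shuffled first.

```python
from collections import Counter


def k_fold_indices(n, k=10):
    """Yield (train_idx, test_idx) for k contiguous folds whose sizes differ by at most one."""
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_val_accuracy(xs, ys, fit, predict, k=10):
    # Train on k-1 folds, score on the held-out fold, and average the k accuracies.
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        correct = sum(predict(model, xs[i]) == ys[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def fit_majority(X, Y):
    # Hypothetical baseline "model": the most frequent training label.
    return Counter(Y).most_common(1)[0][0]


def predict_majority(model, x):
    return model


xs = list(range(30))
ys = [0] * 20 + [1] * 10
print(cross_val_accuracy(xs, ys, fit_majority, predict_majority))
```

Because every sample is tested exactly once by a model that never saw it during training, the averaged score estimates performance on unseen data rather than on the training set.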

Originality/value

This study successfully performed posture classification of children using the CNN technique, which has not been used in related studies. In addition, by focusing on children, it expands the scope of the related research area and is expected to contribute to children's early postural habits.

Details

Data Technologies and Applications, vol. 53 no. 2
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 17 March 2023

Rui Tian, Ruheng Yin and Feng Gan

Abstract

Purpose

Music sentiment analysis helps to promote the diversification of music information retrieval methods. Traditional music emotion classification tasks suffer from high manual workload and low classification accuracy caused by difficulty in feature extraction and inaccurate manual determination of hyperparameters. In this paper, the authors propose an optimized convolutional neural network-random forest (CNN-RF) model for music sentiment classification that optimizes the manually selected hyperparameters to improve the accuracy of music sentiment classification and reduce labor costs and human classification errors.

Design/methodology/approach

A CNN-RF music sentiment classification model is designed based on quantum particle swarm optimization (QPSO). First, the audio data are transformed into a Mel spectrogram, and features are extracted by a CNN. Second, the extracted music features are processed by the RF algorithm to complete a preliminary emotion classification. Finally, the QPSO algorithm is adopted to select suitable hyperparameters for the CNN and obtain the final classification results.
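
The QPSO step can be illustrated with a minimal, standard-library sketch of quantum-behaved particle swarm optimization. Each coordinate moves around a random attractor between the particle's personal best and the global best, with a step scaled by its distance to the mean of all personal bests (mbest) and by a contraction-expansion coefficient beta, here decreased linearly from 1.0 to 0.5 (a common choice, assumed rather than taken from the paper). Training the CNN-RF model is not reproduced: `validation_error` is a made-up smooth stand-in whose minimum sits at a log10 learning rate of -3 and a dropout rate of 0.3, both hypothetical hyperparameters.

```python
import math
import random


def qpso(objective, bounds, n_particles=20, iters=100, seed=0):
    """Minimize objective over a box; bounds is a list of (low, high) pairs."""
    rng = random.Random(seed)
    dim = len(bounds)
    xs = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_val = [objective(x) for x in xs]
    best = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[best][:], pbest_val[best]
    for t in range(iters):
        beta = 1.0 - 0.5 * t / iters  # contraction-expansion coefficient
        # mbest: mean of all personal-best positions, per dimension.
        mbest = [sum(p[d] for p in pbest) / n_particles for d in range(dim)]
        for i, x in enumerate(xs):
            for d in range(dim):
                phi = rng.random()
                attractor = phi * pbest[i][d] + (1 - phi) * gbest[d]
                u = 1.0 - rng.random()  # u in (0, 1], so log(1 / u) is finite
                step = beta * abs(mbest[d] - x[d]) * math.log(1 / u)
                new = attractor + step if rng.random() < 0.5 else attractor - step
                lo, hi = bounds[d]
                x[d] = min(max(new, lo), hi)  # keep the particle inside the box
            value = objective(x)
            if value < pbest_val[i]:
                pbest[i], pbest_val[i] = x[:], value
                if value < gbest_val:
                    gbest, gbest_val = x[:], value
    return gbest, gbest_val


def validation_error(params):
    # Hypothetical stand-in for the CNN-RF validation error.
    log_lr, dropout = params
    return (log_lr + 3) ** 2 + (dropout - 0.3) ** 2


best_params, best_err = qpso(validation_error, bounds=[(-5.0, -1.0), (0.0, 0.8)])
print(best_params, best_err)
```

Unlike standard PSO, QPSO has no velocity term, so the only tuning knob besides swarm size is beta; integer hyperparameters such as filter counts would additionally need rounding before each evaluation.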

Findings

In experimental validation, the model achieved a classification accuracy of 97 per cent across different sentiment categories with a shortened training time. The proposed method with QPSO achieved 1.2 and 1.6 per cent higher accuracy than the same model with particle swarm optimization and with a genetic algorithm, respectively. The proposed model shows great potential for music sentiment classification.

Originality/value

This work makes a dual contribution: the proposed model, which integrates a CNN with an RF, and the introduction of QPSO into model optimization. With these two innovations, the efficiency and accuracy of music emotion recognition and classification have been significantly improved.

Details

Data Technologies and Applications, vol. 57 no. 5
Type: Research Article
ISSN: 2514-9288

Open Access
Article
Publication date: 11 April 2023

Wenhao Yi, Mingnian Wang, Jianjun Tong, Siguang Zhao, Jiawang Li, Dengbin Gui and Xiao Zhang

Abstract

Purpose

The purpose of the study is to quickly identify significant heterogeneity in the surrounding rock at the tunnel face, which generally occurs during the construction of large-section rock tunnels for high-speed railways.

Design/methodology/approach

Relying on a support vector machine (SVM)-based classification model, the nominal classification of blastholes and the nominal zoning and classification terms were used to demonstrate the heterogeneity identification method for the surrounding rock of the tunnel face, and identification calculations were carried out for five test tunnels. Suggestions for local optimization of the support structures of large-section rock tunnels were then put forward.

Findings

The results show that, compared with two classification models based on neural networks, the SVM-based classification model has higher classification accuracy when the sample size is small, with an average accuracy of 87.9%. When the samples are replaced, the SVM-based classification model still reaches the same accuracy, indicating stronger generalization ability.

Originality/value

By applying the identification method described in this paper, the significant heterogeneity characteristics of the surrounding rock were identified during two rounds of blasting. The identification results are largely consistent with the actual condition of the tunnel face at the end of blasting and can provide a basis for local optimization of support parameters.

Details

Railway Sciences, vol. 2 no. 1
Type: Research Article
ISSN: 2755-0907
