Search results

1 – 10 of 101
Article
Publication date: 14 November 2016

Shrawan Kumar Trivedi and Shubhamoy Dey

Abstract

Purpose

The email is an important medium for sharing information rapidly. However, spam, being a nuisance in such communication, motivates the building of a robust filtering system with high classification accuracy and good sensitivity towards false positives. In that context, this paper aims to present a combined classifier technique using a committee selection mechanism where the main objective is to identify a set of classifiers so that their individual decisions can be combined by a committee selection procedure for accurate detection of spam.

Design/methodology/approach

For training and testing of the relevant machine learning classifiers, text mining approaches are used in this research. Three data sets (Enron, SpamAssassin and LingSpam) have been used to test the classifiers. Initially, pre-processing is performed to extract the features associated with the email files. In the next step, the extracted features are taken through a dimensionality reduction method where non-informative features are removed. Subsequently, an informative feature subset is selected using genetic feature search. Thereafter, the proposed classifiers are tested on those informative features and the results compared with those of other classifiers.

Findings

For building the proposed combined classifier, three different studies have been performed. The first study identifies the effect of boosting algorithms on two probabilistic classifiers: Bayesian and Naïve Bayes. In that study, AdaBoost has been found to be the best algorithm for performance boosting. The second study was on the effect of different kernel functions on the support vector machine (SVM) classifier, where SVM with a normalized polynomial (NP) kernel was observed to be the best. The last study was on combining classifiers with committee selection, where the committee members were the best classifiers identified by the first study, i.e. Bayesian and Naïve Bayes with AdaBoost, and the committee president was selected from the second study, i.e. SVM with NP kernel. Results show that combining the identified classifiers to form a committee machine gives excellent accuracy with a low false positive rate.
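The abstract does not state how the committee's votes are combined, so the following is only a minimal sketch of one plausible rule: the two boosted members decide when they agree, and the president (the SVM) decides otherwise. The function name, label encoding and example votes are hypothetical.

```python
from typing import Sequence

SPAM, HAM = 1, 0  # hypothetical label encoding


def committee_decision(member_votes: Sequence[int], president_vote: int) -> int:
    """Return the members' label when they are unanimous, else the president's.

    This tie-breaking rule is an assumption; the abstract does not specify it.
    """
    if not member_votes:
        return president_vote
    first = member_votes[0]
    if all(vote == first for vote in member_votes):
        return first
    return president_vote


# Hypothetical votes for three emails:
# ((boosted Bayesian, boosted Naive Bayes), SVM president)
examples = [((SPAM, SPAM), HAM), ((SPAM, HAM), HAM), ((HAM, SPAM), SPAM)]
decisions = [committee_decision(members, president) for members, president in examples]
```

Under this assumed rule the president only matters when the members disagree, which keeps the committee's behaviour easy to audit.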

Research limitations/implications

This research is focused on the classification of email spam written in English. Only the body (text) parts of the emails have been used. Image spam has not been included in this work. We have restricted our work to email messages only. Other types of messages, such as short message service or multimedia messaging service, were not part of this study.

Practical implications

This research proposes a method of dealing with the issues and challenges faced by internet service providers and organizations that use email. The proposed model provides not only better classification accuracy but also a low false positive rate.

Originality/value

The proposed combined classifier is a novel classifier designed for accurate classification of email spam.

Details

VINE Journal of Information and Knowledge Management Systems, vol. 46 no. 4
Type: Research Article
ISSN: 2059-5891

Article
Publication date: 28 September 2022

Hanene Rouabeh, Sami Gomri and Mohamed Masmoudi

Abstract

Purpose

The purpose of this paper is to design and validate an electronic nose (E-nose) prototype using commercially available metal oxide gas sensors (MOX). This prototype has a sensor array board that integrates eight different MOX gas sensors to handle multi-purpose applications. The number of sensors can be adapted to match different requirements and classification cases. The paper presents the validation of this E-nose prototype when used to identify three gas samples, namely, alcohol, butane and cigarette smoke. At the same time, it discusses the discriminative abilities of the prototype for the identification of alcohol, acetone and a mixture of them. In this respect, the selection of the appropriate type and number of gas sensors, as well as obtaining excellent discriminative abilities with a miniaturized design and minimal computation time, are all drivers for such implementation.

Design/methodology/approach

The suggested prototype contains two main parts: hardware (low-cost components) and software (Machine Learning). An interconnection printed circuit board, a Raspberry Pi and a sensor chamber with the sensor array board make up the first part. Eight sensors were put to the test to see how effective and feasible they were for the classification task at hand, and then the bare minimum of sensors was chosen. The second part consists of machine learning algorithms designed to ensure data acquisition and processing. These algorithms include feature extraction, dimensionality reduction and classification. To perform the classification task, two features taken from the sensors’ transient response were used.

Findings

Results reveal that the system presents high discriminative ability. The K-nearest neighbor (KNN) and radial basis function support vector machine (SVM-RBF) classifiers achieved 97.81% and 98.44% mean accuracy, respectively. These results were obtained after data dimensionality reduction using linear discriminant analysis, which is more effective in terms of discrimination power than principal component analysis. Repeated stratified K-fold cross-validation was used to train and test five different machine learning classifiers. The classifiers were each tested on sets of data to determine their accuracy. The SVM-RBF model had high, stable and consistent accuracy over many repeats and different data splits. The total execution time for detection and identification is about 10 s.
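To illustrate the evaluation protocol mentioned above, here is a minimal standard-library sketch of repeated stratified K-fold splitting. The function name, the fold and repeat counts and the toy labels are assumptions for illustration; the abstract does not report the values the authors used.

```python
import random
from collections import defaultdict
from typing import Iterator, List, Sequence, Tuple


def repeated_stratified_kfold(
    labels: Sequence[str], k: int, repeats: int, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs, k per repeat."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for _ in range(repeats):
        folds: List[List[int]] = [[] for _ in range(k)]
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            # Deal each class round-robin so every fold keeps the class ratio.
            for pos, idx in enumerate(shuffled):
                folds[pos % k].append(idx)
        for i in range(k):
            test = sorted(folds[i])
            train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
            yield train, test


# Toy labels: six samples for each of the three gas classes named in the abstract.
labels = ["alcohol"] * 6 + ["butane"] * 6 + ["smoke"] * 6
splits = list(repeated_stratified_kfold(labels, k=3, repeats=2))
```

Each repeat reshuffles within every class before dealing samples into folds, so accuracy can be averaged over several different but equally balanced partitions.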

Originality/value

Using information extracted from the transient response of the sensors, the system proved able to accurately classify the gas types using only three of the eight MQ-X gas sensors. The training and validation results of the SVM-RBF classifier show a good bias-variance trade-off. This shows that the two transient features are sufficient for this classification purpose. Moreover, all data processing tasks are performed by the Raspberry Pi, which demonstrates real-time data processing with a miniaturized architecture and low cost.

Details

Sensor Review, vol. 42 no. 6
Type: Research Article
ISSN: 0260-2288

Article
Publication date: 8 February 2022

K. Arunkumar and S. Vasundra

Abstract

Purpose

In this research, patient treatment trajectory data are used to predict the outcome of treatment for a particular disease. Existing methodologies have not considered the evolution of the disease in the patient or the changes in health due to treatment. Hence, deep learning models for trajectory data mining can be employed to predict disease with high accuracy and lower computation cost.

Design/methodology/approach

Multifocus deep neural network classifiers have been utilized to detect the novel disease class and the comorbidity class, so that changes in the genome pattern of the patient trajectory data can be identified in the layers of the architecture. The classifier is employed to learn the extracted feature set with activation and weight functions, and the results are then merged on many aspects to classify an undetermined sequence of diseases as a new variant. The performance of disease progression learning relies on the precision of the constituent classifiers, which usually offer larger generalization benefits than the optimized classifiers.

Findings

The deep learning architecture uses a weight function and a bias function on the input layers, together with max pooling. The output of the input layer is applied to the hidden layer to generate the multifocus characteristics of the disease, and the multifocus-characterized disease is processed by a ReLU activation function along with hyperparameter tuning, which produces an effective outcome in the output layer of a fully connected network. Experimental results using cross-validation show that the proposed model outperforms existing methodologies in terms of computation time and accuracy.
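As a rough illustration of the building blocks named above (weights, biases, ReLU and max pooling), the sketch below runs one fully connected layer followed by 1-D max pooling. It is a generic example, not the authors' multifocus architecture; the layer sizes and numbers are made up.

```python
from typing import List, Sequence


def dense_relu(
    inputs: Sequence[float], weights: Sequence[Sequence[float]], biases: Sequence[float]
) -> List[float]:
    """One fully connected layer: ReLU(w . x + b) for each output unit."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def max_pool_1d(values: Sequence[float], size: int) -> List[float]:
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Made-up input, weights and biases for illustration only.
inputs = [1.0, -2.0]
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
biases = [0.0, 0.0, 0.5, -1.0]

hidden = dense_relu(inputs, weights, biases)  # pre-activations: 1.0, -2.0, -0.5, 3.0
pooled = max_pool_1d(hidden, size=2)
```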

Originality/value

The proposed evolving classifier is presented as a robust architecture that uses an objective function to map the data sequence into a class distribution of the evolving disease class for the patient trajectory. Then, the generative output layer of the proposed model produces the progression outcome of the disease for the particular patient trajectory. The model tries to produce accurate prognosis outcomes by employing a data conditional probability function. The originality of the work defines 70% and comparisons of the previous methods the method of values are accurate and increased analysis of the predictions.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 15 no. 4
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 6 February 2017

Aytug Onan

Abstract

Purpose

The immense quantity of available unstructured text documents serves as one of the largest sources of information. Text classification can be an essential task for many purposes in information retrieval, such as document organization, text filtering and sentiment analysis. Ensemble learning has been extensively studied to construct efficient text classification schemes with higher predictive performance and generalization ability. The purpose of this paper is to provide diversity among the classification algorithms of the ensemble, which is a key issue in ensemble design.

Design/methodology/approach

An ensemble scheme based on hybrid supervised clustering is presented for text classification. In the presented scheme, supervised hybrid clustering, which is based on cuckoo search algorithm and k-means, is introduced to partition the data samples of each class into clusters so that training subsets with higher diversities can be provided. Each classifier is trained on the diversified training subsets and the predictions of individual classifiers are combined by the majority voting rule. The predictive performance of the proposed classifier ensemble is compared to conventional classification algorithms (such as Naïve Bayes, logistic regression, support vector machines and C4.5 algorithm) and ensemble learning methods (such as AdaBoost, bagging and random subspace) using 11 text benchmarks.
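The combination step described above, the majority voting rule, can be sketched as follows. The tie-breaking choice (first label seen wins) and the toy predictions are assumptions; the abstract does not say how ties are resolved.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[str]) -> str:
    """Return the most frequent label; ties go to the label that appears first."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable")


# Hypothetical predictions: one row per classifier, one column per document.
per_classifier: List[List[str]] = [
    ["sports", "politics", "tech"],
    ["sports", "tech", "tech"],
    ["politics", "politics", "tech"],
]
ensemble = [majority_vote(votes) for votes in zip(*per_classifier)]
```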

Findings

The experimental results indicate that the presented classifier ensemble outperforms the conventional classification algorithms and ensemble learning methods for text classification.

Originality/value

The presented ensemble scheme is the first to use supervised clustering to obtain a diverse ensemble for text classification.

Details

Kybernetes, vol. 46 no. 2
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 18 November 2021

Yingjie Zhang, Wentao Yan, Geok Soon Hong, Jerry Fuh Hsi Fuh, Di Wang, Xin Lin and Dongsen Ye

Abstract

Purpose

This study aims to develop a data fusion method for powder-bed fusion (PBF) process monitoring based on process image information. The data fusion method can help improve process condition identification performance, which can provide guidance for further PBF process monitoring and control system development.

Design/methodology/approach

Design of reliable process monitoring systems is an essential approach to improving PBF build quality. A data fusion framework based on support vector machine (SVM), convolutional neural network (CNN) and Dempster-Shafer (D-S) evidence theory is proposed in the study. The process images, which include information on the melt pool, plume and spatters, were acquired by a high-speed camera. The features were extracted based on an appropriate image processing method. The three feature vectors corresponding to the three objects, respectively, were used as the inputs of SVM classifiers for process condition identification. Moreover, raw images were also used as the input of a CNN classifier for process condition identification. Then, the information fusion of the three SVM classifiers and the CNN classifier by an improved D-S evidence theory was studied.
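The fusion step can be illustrated with the classical Dempster rule of combination. This is a sketch of the textbook rule, not the improved variant the paper proposes, which the abstract does not describe. The two process conditions and all mass values below are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]


def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Combine two mass functions with the classical Dempster rule."""
    combined: Dict[FrozenSet[str], float] = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Renormalize by the non-conflicting mass.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


# Hypothetical frame of discernment and classifier outputs.
NORMAL = frozenset({"normal"})
ABNORMAL = frozenset({"abnormal"})
EITHER = NORMAL | ABNORMAL  # mass left on "don't know"

m_svm: Mass = {NORMAL: 0.6, ABNORMAL: 0.1, EITHER: 0.3}
m_cnn: Mass = {NORMAL: 0.7, ABNORMAL: 0.2, EITHER: 0.1}
fused = dempster_combine(m_svm, m_cnn)
```

With these made-up inputs the conflicting mass is 0.19, and the fused belief in the "normal" condition rises above either source's individual belief, which is the effect that motivates evidence fusion.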

Findings

The results demonstrate that the sensitivity of the information sources differs between the conditions being identified. Feature fusion based on D-S evidence theory can improve the classification performance; with feature fusion and classifier fusion, the accuracy of condition identification is improved by more than 20%.

Originality/value

An improved D-S evidence theory is proposed for PBF process data fusion monitoring, which is promising for the development of reliable PBF process monitoring systems.

Details

Rapid Prototyping Journal, vol. 28 no. 5
Type: Research Article
ISSN: 1355-2546

Article
Publication date: 17 January 2022

Syed Haroon Abdul Gafoor and Padma Theagarajan

Abstract

Purpose

Conventional diagnostic techniques may be prone to subjectivity since they depend on the assessment of motions that are often too subtle for the human eye and hence hard to classify, potentially resulting in misdiagnosis. Meanwhile, early nonmotor signs of Parkinson’s disease (PD) can be mild and may be due to a variety of other conditions. As a result, these signs are usually ignored, making early PD diagnosis difficult. Machine learning approaches for distinguishing PD from healthy controls or from individuals with similar medical symptoms (such as movement disorders or other Parkinsonian syndromes) have been introduced to solve these problems and to enhance the diagnostic and assessment processes of PD.

Design/methodology/approach

Medical observations and evaluation of medical symptoms, including characterization of a wide range of motor indications, are commonly used to diagnose PD. The quantity of the data being processed has grown in the last five years; feature selection has become a prerequisite before any classification. This study introduces a feature selection method based on the score-based artificial fish swarm algorithm (SAFSA) to overcome this issue.

Findings

This study adds to the accuracy of PD identification by reducing the number of chosen vocal features while using the most recent and largest publicly accessible database. Feature subset selection in PD detection techniques starts by eliminating features that are irrelevant or redundant. According to a few objective functions, the chosen feature subset should provide the best performance.

Research limitations/implications

In many situations, this is a nondeterministic polynomial-time hard (NP-hard) problem. This method enhances the PD detection rate by selecting the most essential features from the database. To begin, the data set's dimensionality is reduced using the singular value decomposition dimensionality reduction technique. Next, biogeography-based optimization (BBO) is used for feature selection; the weight value is a vital parameter for finding the best features in PD classification.

Originality/value

PD classification is done by using ensemble learning classification approaches such as a hybrid classifier of fuzzy K-nearest neighbor, kernel support vector machines, fuzzy convolutional neural network and random forest. The suggested classifiers are trained using data from the UCI ML repository, and their results are verified using leave-one-person-out cross-validation. The measures employed to assess classifier efficiency include accuracy, F-measure and Matthews correlation coefficient.
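For reference, the three evaluation measures named above can be computed from a binary confusion matrix as sketched below. The counts are invented for illustration and are not results from the paper.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, F-measure (F1) and Matthews correlation coefficient."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Invented counts: 50 PD recordings and 50 controls.
scores = binary_metrics(tp=40, fp=5, fn=10, tn=45)
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which makes it more informative when the classes are imbalanced.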

Details

International Journal of Intelligent Computing and Cybernetics, vol. 15 no. 4
Type: Research Article
ISSN: 1756-378X

Open Access
Article
Publication date: 4 December 2020

Sergei O. Kuznetsov, Alexey Masyutin and Aleksandr Ageev

Abstract

Purpose

The purpose of this study is to show that closure-based classification and regression models provide both high accuracy and interpretability.

Design/methodology/approach

Pattern structures allow one to approach the knowledge extraction problem in case of partially ordered descriptions. They provide a way to apply techniques based on closed descriptions to non-binary data. To provide scalability of the approach, the author introduced a lazy (query-based) classification algorithm.

Findings

The experiments support the hypothesis that closure-based classification and regression allow one both to achieve higher accuracy in scoring models compared with results obtained with classical banking models and to retain interpretability of model results, whereas black-box methods grant better accuracy at the cost of losing interpretability.

Originality/value

This is original research showing the advantage of closure-based classification and regression models in the banking sphere.

Details

Asian Journal of Economics and Banking, vol. 4 no. 3
Type: Research Article
ISSN: 2615-9821

Article
Publication date: 28 October 2014

Kyle Dillon Feuz and Diane J. Cook

Abstract

Purpose

The purpose of this paper is to study heterogeneous transfer learning for activity recognition using heuristic search techniques. Many pervasive computing applications require information about the activities currently being performed, but activity recognition algorithms typically require substantial amounts of labeled training data for each setting. One solution to this problem is to leverage transfer learning techniques to reuse available labeled data in new situations.

Design/methodology/approach

This paper introduces three novel heterogeneous transfer learning techniques that reverse the typical transfer model and map the target feature space to the source feature space and apply them to activity recognition in a smart apartment. This paper evaluates the techniques on data from 18 different smart apartments located in an assisted-care facility and compares the results against several baselines.

Findings

The three transfer learning techniques are all able to outperform the baseline comparisons in several situations. Furthermore, the techniques are successfully used in an ensemble approach to achieve even higher levels of accuracy.

Originality/value

The techniques in this paper represent a considerable step forward in heterogeneous transfer learning by removing the need to rely on instance-instance or feature-feature co-occurrence data.

Details

International Journal of Pervasive Computing and Communications, vol. 10 no. 4
Type: Research Article
ISSN: 1742-7371

Article
Publication date: 14 December 2023

Huaxiang Song, Chai Wei and Zhou Yong

Abstract

Purpose

The paper aims to tackle the classification of Remote Sensing Images (RSIs), which presents a significant challenge for computer algorithms due to the inherent characteristics of clustered ground objects and noisy backgrounds. Recent research typically leverages larger volume models to achieve advanced performance. However, the operating environments of remote sensing commonly cannot provide unconstrained computational and storage resources. It requires lightweight algorithms with exceptional generalization capabilities.

Design/methodology/approach

This study introduces an efficient knowledge distillation (KD) method to build a lightweight yet precise convolutional neural network (CNN) classifier. This method also aims to substantially decrease the training time expenses commonly linked with traditional KD techniques. This approach entails extensive alterations to both the model training framework and the distillation process, each tailored to the unique characteristics of RSIs. In particular, this study establishes a robust ensemble teacher by independently training two CNN models using a customized, efficient training algorithm. Following this, this study modifies a KD loss function to mitigate the suppression of non-target category predictions, which are essential for capturing the inter- and intra-similarity of RSIs.
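For orientation, the sketch below computes a conventional logit-based KD loss with a two-model ensemble teacher. It is the standard temperature-scaled formulation, not the modified loss this study proposes, whose details the abstract does not give. Averaging the teachers' softened probabilities, the temperature, the weighting `alpha` and all logits are assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(
    student_logits: Sequence[float],
    teacher_logits_list: Sequence[Sequence[float]],
    target: int,
    temperature: float = 4.0,
    alpha: float = 0.5,
) -> float:
    """alpha * cross-entropy + (1 - alpha) * T^2 * KL(teacher || student)."""
    soft_teachers = [softmax(t, temperature) for t in teacher_logits_list]
    n = len(soft_teachers)
    # Assumed ensemble rule: average the teachers' softened probabilities.
    teacher = [sum(p[i] for p in soft_teachers) / n for i in range(len(student_logits))]
    student_soft = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(teacher, student_soft) if t > 0)
    cross_entropy = -math.log(softmax(student_logits)[target])
    return alpha * cross_entropy + (1 - alpha) * temperature ** 2 * kl


# Made-up logits for a three-class problem; class 0 is the true label.
teachers = [[5.0, 1.0, 0.0], [4.0, 2.0, 0.0]]
loss_close = kd_loss([4.5, 1.5, 0.0], teachers, target=0)
loss_far = kd_loss([0.0, 1.5, 4.5], teachers, target=0)
```

A higher temperature flattens the teacher distribution so the non-target classes contribute more to the KL term, which is the information the abstract says the modified loss is designed to preserve.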

Findings

This study validated the student model, termed KD-enhanced network (KDE-Net), obtained through the KD process on three benchmark RSI data sets. The KDE-Net surpasses 42 other state-of-the-art methods in the literature published from 2020 to 2023. Compared to the top-ranked method’s performance on the challenging NWPU45 data set, KDE-Net demonstrated a noticeable 0.4% increase in overall accuracy with a significant 88% reduction in parameters. Meanwhile, this study’s reformed KD framework significantly enhances the knowledge transfer speed by at least three times.

Originality/value

This study illustrates that the logit-based KD technique can effectively develop lightweight CNN classifiers for RSI classification without substantial sacrifices in computation and storage costs. Compared to neural architecture search or other methods aiming to provide lightweight solutions, this study’s KDE-Net, based on the inherent characteristics of RSIs, is currently more efficient in constructing accurate yet lightweight classifiers for RSI classification.

Details

International Journal of Web Information Systems, vol. 20 no. 2
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 11 July 2023

Abhinandan Chatterjee, Pradip Bala, Shruti Gedam, Sanchita Paul and Nishant Goyal

Abstract

Purpose

Depression is a mental health problem characterized by a persistent sense of sadness and loss of interest. EEG signals are regarded as the most appropriate instruments for diagnosing depression because they reflect the operating status of the human brain. The purpose of this study is the early detection of depression among people using EEG signals.

Design/methodology/approach

(i) Artifacts are removed by filtering, and linear and non-linear features are extracted; (ii) feature scaling is done using a standard scaler, while principal component analysis (PCA) is used for feature reduction; (iii) the linear features, the non-linear features and the combination of both (only for those whose accuracy is highest) are taken for further analysis, where ML and DL classifiers are applied for the classification of depression; and (iv) in total, 15 distinct ML and DL methods that have been effectively utilized as classifiers to handle a variety of real-world issues are used, including KNN, SVM, bagging SVM, RF, GB, Extreme Gradient Boosting, MNB, AdaBoost, Bagging RF, BootAgg, Gaussian NB, RNN, 1DCNN, RBFNN and LSTM.
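Step (ii) above, standard scaling, can be sketched with the standard library as shown below. The sketch assumes "standard scaler" means z-score standardization with the population standard deviation; the feature values are invented, and the PCA step is not shown.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def standardize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale each feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Constant columns have zero spread; divide by 1.0 to leave them at zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Invented feature rows: two features on very different scales.
features = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
scaled = standardize(features)
```

Standardizing first keeps features with large raw ranges from dominating the variance that PCA later tries to capture.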

Findings

1. Among linear features, alpha, alpha asymmetry, gamma and gamma asymmetry give the best results, while among non-linear features, RWE, DFA, CD and AE give the best results.
2. For linear features, gamma and alpha asymmetry gave 99.98% accuracy with Bagging RF, while gamma asymmetry gave 99.98% accuracy with BootAgg.
3. For non-linear features, RWE and DFA achieved 99.84% accuracy with RF, DFA achieved 99.97% accuracy with XGBoost and RWE achieved 99.94% accuracy with BootAgg.
4. Using DL, for linear features, gamma asymmetry gave more than 96% accuracy with RNN and 91% accuracy with LSTM; for non-linear features, CD and AE achieved 89% accuracy with LSTM.
5. By combining linear and non-linear features, the highest accuracy was achieved with Bagging RF (98.50%) for gamma asymmetry + RWE. In DL, alpha + RWE, gamma asymmetry + CD and gamma asymmetry + RWE achieved 98% accuracy with LSTM.

Originality/value

A novel dataset was collected from the Central Institute of Psychiatry (CIP), Ranchi, which was recorded using 128 channels, whereas major previous studies used fewer channels; the details of the study participants are summarized and a model is developed for statistical analysis using N-way ANOVA; artifacts are removed by high- and low-pass filtering of epoch data followed by re-referencing and independent component analysis for noise removal; linear features, namely, band power and interhemispheric asymmetry, and non-linear features, namely, relative wavelet energy, wavelet entropy, approximate entropy, sample entropy, detrended fluctuation analysis and correlation dimension, are extracted; this model utilizes Epoch (213,072) for 5 s EEG data, which allows the model to train for longer, thereby increasing the efficiency of classifiers. Feature scaling is done using a standard scaler rather than normalization because it helps increase the accuracy of the models (especially for deep learning algorithms), while PCA is used for feature reduction; the linear features, the non-linear features and the combination of both are taken for extensive analysis in conjunction with ML and DL classifiers for the classification of depression. The combination of linear and non-linear features (only for those whose accuracy is highest) is used for the best detection results.

Details

Aslib Journal of Information Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2050-3806
