Search results
1 – 10 of 720
Shrawan Kumar Trivedi and Shubhamoy Dey
Abstract
Purpose
Email is an important medium for sharing information rapidly. However, spam is a nuisance in such communication and motivates the building of a robust filtering system with high classification accuracy and good sensitivity towards false positives. In that context, this paper presents a combined classifier technique using a committee selection mechanism, where the main objective is to identify a set of classifiers whose individual decisions can be combined by a committee selection procedure for accurate detection of spam.
Design/methodology/approach
For training and testing the relevant machine learning classifiers, text mining approaches are used in this research. Three data sets (Enron, SpamAssassin and LingSpam) have been used to test the classifiers. Initially, pre-processing is performed to extract the features associated with the email files. Next, the extracted features pass through a dimensionality reduction step in which non-informative features are removed. Subsequently, an informative feature subset is selected using a genetic feature search. Thereafter, the proposed classifiers are tested on those informative features and the results are compared with those of other classifiers.
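As a rough illustration of this pipeline, the sketch below runs a bag-of-words extraction and then a deliberately tiny mutation-only genetic search over feature subsets; the toy corpus, the Naive Bayes fitness function and the search settings are illustrative assumptions, not the authors' configuration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

# Toy corpus standing in for the Enron/SpamAssassin/LingSpam emails.
emails = ["win free money now", "meeting at noon today",
          "free offer click now", "project schedule update",
          "cheap meds online now", "lunch tomorrow with the team"]
labels = np.array([1, 0, 1, 0, 1, 0])          # 1 = spam, 0 = ham

X = TfidfVectorizer().fit_transform(emails).toarray()
n_features = X.shape[1]
rng = np.random.default_rng(0)

def fitness(mask):
    """Cross-validated accuracy of Naive Bayes on the selected features."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(MultinomialNB(), X[:, mask.astype(bool)],
                           labels, cv=2).mean()

# Tiny mutation-only genetic search over binary feature masks.
best = rng.integers(0, 2, n_features)
for _ in range(20):
    child = best.copy()
    child[rng.integers(n_features)] ^= 1        # flip one feature in/out
    if fitness(child) >= fitness(best):
        best = child

print("features kept:", int(best.sum()), "of", n_features)
```

A real genetic search would keep a population of masks with crossover and evaluate fitness on a held-out split of the full corpus.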
Findings
For building the proposed combined classifier, three different studies were performed. The first study examined the effect of boosting algorithms on two probabilistic classifiers, Bayesian and Naïve Bayes, and found AdaBoost to be the best algorithm for performance boosting. The second study examined the effect of different kernel functions on the support vector machine (SVM) classifier, where SVM with the normalized polynomial (NP) kernel was observed to be the best. The last study combined the classifiers through committee selection: the committee members were the best classifiers identified in the first study, i.e. Bayesian and Naïve Bayes with AdaBoost, and the committee president was selected from the second study, i.e. SVM with the NP kernel. Results show that combining the identified classifiers into a committee machine gives excellent classification accuracy with a low false positive rate.
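The committee mechanism could be sketched along these lines, with scikit-learn's AdaBoost over two naïve Bayes variants standing in for the Bayesian/Naïve Bayes members and a plain polynomial-kernel SVC standing in for the normalized-polynomial president (both substitutions are assumptions, as is the tie-breaking rule).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.naive_bayes import BernoulliNB, GaussianNB
from sklearn.svm import SVC

# Synthetic stand-in for the vectorized email features.
X, y = make_classification(n_samples=200, random_state=0)

members = [AdaBoostClassifier(GaussianNB(), n_estimators=10, random_state=0),
           AdaBoostClassifier(BernoulliNB(), n_estimators=10, random_state=0)]
president = SVC(kernel="poly", degree=2, gamma="scale")

for model in members + [president]:
    model.fit(X, y)

def committee_predict(x):
    votes = [m.predict(x)[0] for m in members]
    # Members agree -> take their decision; otherwise the president decides.
    return votes[0] if votes[0] == votes[1] else president.predict(x)[0]

preds = np.array([committee_predict(X[i:i + 1]) for i in range(len(X))])
print("training accuracy:", (preds == y).mean())
```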
Research limitations/implications
This research is focused on the classification of email spam written in the English language. Only the body (text) parts of the emails have been used; image spam has not been included in this work. The work is restricted to email messages, so other message types, such as short message service (SMS) or multimedia messaging service (MMS), were not part of this study.
Practical implications
This research proposes a method of dealing with the issues and challenges faced by internet service providers and organizations that use email. The proposed model provides not only better classification accuracy but also a low false positive rate.
Originality/value
The proposed combined classifier is a novel classifier designed for accurate classification of email spam.
Details
Keywords
Hari Hara Krishna Kumar Viswanathan, Punniyamoorthy Murugesan, Sundar Rengasamy and Lavanya Vilvanathan
Abstract
Purpose
The purpose of this study is to compare the classification learning ability of our algorithm based on boosted support vector machine (B-SVM), against other classification techniques in predicting the credit ratings of banks. The key feature of this study is the usage of an imbalanced dataset (in the response variable/rating) with a smaller number of observations (number of banks).
Design/methodology/approach
In general, datasets in the banking sector are small and imbalanced. In this study, 23 Scheduled Commercial Banks (SCBs) in India were chosen, and their corporate ratings were collated from the Indian subsidiary of a reputed global rating agency. The top management of the rating agency provided 12 quantitative input variables that are considered essential for rating a bank within India. To overcome the challenge of a dataset that is imbalanced and has a small number of observations, this study uses the "Modified Boosted Support Vector Machines" (MBSVM) algorithm proposed by Punniyamoorthy Murugesan and Sundar Rengasamy. The study also compares the classification ability of this algorithm against other classification techniques, such as multi-class SVM, back-propagation neural networks, multi-class linear discriminant analysis (LDA) and k-nearest neighbors (k-NN) classification, on the basis of the geometric mean (GM).
Findings
The performance of each algorithm has been compared on one metric, the geometric mean, also known as GMean (GM). This metric captures class-wise sensitivity: it is the root of the product of the per-class sensitivities, so a low score on any single class drags the whole metric down. The findings of the study show that the proposed MBSVM technique outperforms the other techniques.
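Concretely, GM is the geometric mean of the per-class sensitivities, so a classifier that neglects the minority class drives the score toward zero. A minimal sketch with an invented prediction vector:

```python
import numpy as np

def gmean(y_true, y_pred):
    """Geometric mean of per-class sensitivities (recalls)."""
    classes = np.unique(y_true)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.prod(recalls) ** (1.0 / len(recalls)))

y_true = np.array([0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 1, 0])   # misses one minority sample
print(gmean(y_true, y_pred))            # sqrt(1.0 * 0.5) ≈ 0.707
```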
Research limitations/implications
This study provides an algorithm to predict ratings of banks where the dataset is small and imbalanced. One of the limitations of this research study is that subjective factors have not been included in our model; the sole focus is on the results generated by the models (driven by quantitative parameters). In future, studies may be conducted which may include subjective parameters (proxied by relevant and quantifiable variables).
Practical implications
Various stakeholders such as investors, regulators and central banks can predict the credit ratings of banks by themselves, by inputting appropriate data to the model.
Originality/value
In the process of rating banks, the usage of an imbalanced dataset can lessen the performance of the soft-computing techniques. In order to overcome this, the authors have come up with a novel classification approach based on “MBSVMs”, which can be used as a yardstick for such imbalanced datasets. For this purpose, through primary research, 12 features have been identified that are considered essential by the credit rating agencies.
Abstract
Purpose
Syntax-based text classification (TC) mechanisms have been largely replaced by semantic-based systems in recent years. Semantic-based TC systems are particularly useful in scenarios where similarity among documents is computed by considering semantic relationships among their terms. Kernel functions have received major attention because of the unprecedented popularity of SVMs in the field of TC. Most kernel functions exploit the syntactic structure of the text, but quite a few also use a priori semantic information for knowledge extraction. The purpose of this paper is to investigate semantic kernel functions in the context of TC.
Design/methodology/approach
This work presents a performance and accuracy analysis of seven semantic kernel functions (Semantic Smoothing Kernel, Latent Semantic Kernel, Semantic WordNet-based Kernel, Semantic Smoothing Kernel with Implicit Superconcept Expansions, Compactness-based Disambiguation Kernel Function, Omiotis-based S-VSM semantic kernel function and Top-k S-VSM semantic kernel) implemented with SVM as the kernel method. All seven semantic kernels are implemented in the SVM-Light tool.
Findings
Performance and accuracy parameters of the seven semantic kernel functions have been evaluated and compared. The experimental results show that the Top-k S-VSM semantic kernel has the highest performance and accuracy among all evaluated kernel functions, making it a preferred building block for kernel methods in TC and retrieval.
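The idea shared by several of these kernels is to inject a term-similarity matrix S into the document inner product, k(d1, d2) = d1·S·d2ᵀ (the semantic smoothing construction). The sketch below uses scikit-learn's precomputed-kernel SVC in place of SVM-Light, with toy documents and an invented similarity matrix rather than WordNet or Omiotis values.

```python
import numpy as np
from sklearn.svm import SVC

# Toy doc-term counts over the vocabulary ["car", "auto", "truck"].
D = np.array([[1., 1., 0.],
              [0., 1., 1.],
              [1., 0., 0.],
              [0., 0., 1.]])
y = np.array([0, 1, 0, 1])

# Invented term-term semantic similarity matrix (symmetric, positive definite).
S = np.array([[1.0, 0.9, 0.2],
              [0.9, 1.0, 0.3],
              [0.2, 0.3, 1.0]])

K = D @ S @ D.T                  # semantic kernel (Gram) matrix
clf = SVC(kernel="precomputed").fit(K, y)
print(clf.predict(K))            # predictions on the training documents
```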
Research limitations/implications
A combination of semantic and syntactic kernel functions needs to be investigated, as there is scope for further improvement in accuracy and performance in all seven semantic kernel functions.
Practical implications
This research provides insight into TC using a priori semantic knowledge. Three commonly used data sets are exploited. It would be interesting to explore these kernel functions on live web data, which would test their actual utility in real business scenarios.
Originality/value
Comparison of performance and accuracy parameters is the novel point of this research paper. To the best of the authors’ knowledge, this type of comparison has not been done previously.
Abstract
Purpose
The purpose of this paper is to present a new pattern recognition‐based algorithm to detect high‐impedance faults (HIFs), including those involving only broken conductors and arcs, in distribution networks.
Design/methodology/approach
In the proposed method, time-frequency features of the current waveform are calculated using the discrete wavelet transform. Then, to extract the best subset of the generated time-frequency features, principal component analysis (PCA) is applied, and finally a support vector machine (SVM) is used as a classifier to distinguish HIFs, including those involving only broken conductors and arcs, from similar phenomena such as capacitor bank switching, no-load transformer switching, load switching, insulator leakage current and harmonic loads.
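A rough sketch of this pipeline, with a one-level Haar transform and synthetic signals standing in for the paper's wavelet features and real distribution-network waveforms (both are assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)

def haar_features(sig):
    # One-level Haar DWT: approximation and detail coefficients.
    a = (sig[0::2] + sig[1::2]) / np.sqrt(2)
    d = (sig[0::2] - sig[1::2]) / np.sqrt(2)
    return np.concatenate([a, d])

t = np.linspace(0, 1, 64, endpoint=False)

def make_event(arcing):
    sig = np.sin(2 * np.pi * 5 * t)          # fundamental current
    if arcing:                               # HIF proxy: high-frequency arc noise
        sig += 0.3 * rng.standard_normal(t.size)
    return haar_features(sig)

X = np.array([make_event(i % 2 == 0) for i in range(40)])
y = np.array([i % 2 == 0 for i in range(40)], dtype=int)

clf = make_pipeline(StandardScaler(), PCA(n_components=5),
                    SVC(kernel="rbf", gamma="scale"))
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```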
Findings
The experimental results show that an SVM with PCA as the feature extraction method and a radial basis function (RBF) kernel offers acceptable security and dependability in distinguishing HIFs, including those involving only broken conductors and arcs, from similar phenomena, and is superior to Bayes and multi‐layer perceptron neural network classifiers.
Originality/value
A new combination of time-frequency features with an SVM yields an algorithm for detecting HIFs, including those involving only broken conductors and arcs, with acceptable security and dependability.
Abdullah Alharbi, Wajdi Alhakami, Sami Bourouis, Fatma Najar and Nizar Bouguila
Abstract
This paper proposes a novel, reliable detection method to recognize forged inpainted images. Detecting potential forgeries and authenticating the content of digital images is extremely challenging and important for many applications. The proposed approach develops new probabilistic support vector machine (SVM) kernels from a flexible generative statistical model, the "bounded generalized Gaussian mixture model". The learning framework has the advantage of properly combining the benefits of both discriminative and generative models and of incorporating prior knowledge about the nature of the data. It can effectively recognize whether an image has been tampered with, identifying both forged and authentic images. The results confirm that the framework performs well across numerous inpainted images.
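As a toy stand-in for deriving an SVM kernel from the bounded generalized Gaussian mixture model, the sketch below builds a Bhattacharyya (probability product) kernel from per-image feature histograms and feeds it to a precomputed-kernel SVC; the "images" and the histogram model are invented purely for illustration.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)

def histogram(img, bins=8):
    h, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    return h / h.sum()

# Toy "images": authentic ~ varied texture, inpainted ~ over-smooth patch.
authentic = [rng.uniform(0, 1, 256) for _ in range(10)]
forged = [np.clip(0.5 + 0.05 * rng.standard_normal(256), 0, 1)
          for _ in range(10)]
H = np.array([histogram(x) for x in authentic + forged])
y = np.array([0] * 10 + [1] * 10)

# Bhattacharyya (probability product) kernel between the histograms.
K = np.sqrt(H) @ np.sqrt(H).T
clf = SVC(kernel="precomputed").fit(K, y)
print("training accuracy:", clf.score(K, y))
```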
Abstract
Purpose
Intelligent prediction of node localization in wireless sensor networks (WSNs) is a major concern for researchers. The huge amount of data generated by modern sensor array systems requires computationally efficient calibration techniques. This paper aims to improve localization accuracy by identifying obstacles in the optimization process and network scenarios.
Design/methodology/approach
The proposed method incorporates distance estimation between nodes and packet-transmission hop counts. This estimation feeds the proposed support vector machine (SVM), which finds the network path using a time difference of arrival (TDoA)-based SVM. However, if the data set is noisy, the SVM is prone to poor optimization, which leads to overlap between the target classes and the TDoA pathways. The enhanced grey wolf optimization (EGWO) technique is therefore introduced to eliminate the overlapping target classes in the SVM.
Findings
The performance and efficacy of the model are analyzed against existing TDoA methodologies. The simulation results show that the proposed TDoA-EGWO achieves a higher detection efficiency of 98%, a control overhead of 97.8% and a better packet delivery ratio than other traditional methods.
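For context, TDoA localization rests on the fact that the arrival-time difference between two anchors fixes the difference of the node's distances to them, d1 − d2 = c·Δt. A minimal numeric check with assumed anchor and node positions:

```python
import numpy as np

C = 3e8                               # propagation speed (m/s), RF assumption
node = np.array([30.0, 40.0])         # unknown node (assumed for the check)
anchor1 = np.array([0.0, 0.0])
anchor2 = np.array([100.0, 0.0])

d1 = np.linalg.norm(node - anchor1)   # 50 m
d2 = np.linalg.norm(node - anchor2)   # ~80.6 m
dt = (d1 - d2) / C                    # measured arrival-time difference
print("range difference from TDoA:", C * dt, "m")
```

Each anchor pair constrains the node to one hyperbola; intersecting several such constraints (the part the SVM/EGWO stage handles in the paper) pins down the position.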
Originality/value
The proposed method is successful in detecting the unknown position of the sensor node with a detection rate greater than that of other methods.
Xiaoguang Tian, Robert Pavur, Henry Han and Lili Zhang
Abstract
Purpose
Studies on mining text and generating intelligence from human resource documents are rare. This research aims to use artificial intelligence and machine learning techniques to facilitate the employee selection process through latent semantic analysis (LSA), bidirectional encoder representations from transformers (BERT) and support vector machines (SVM). The research also compares the performance of different machine learning, text vectorization and sampling approaches on human resource (HR) resume data.
Design/methodology/approach
LSA and BERT are used to discover and understand the hidden patterns from a textual resume dataset, and SVM is applied to build the screening model and improve performance.
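A rough sketch of such a screening model, using TF-IDF plus truncated SVD for the LSA step and a linear SVM with cross-validation; the resume snippets, labels and component count are invented for illustration.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy resume snippets; 1 = shortlisted for a data-science role.
resumes = ["python machine learning sql", "java spring microservices",
           "deep learning pytorch nlp", "react javascript frontend",
           "data science pandas statistics", "kotlin android mobile"]
hired = np.array([1, 0, 1, 0, 1, 0])

model = make_pipeline(TfidfVectorizer(),
                      TruncatedSVD(n_components=2, random_state=0),  # LSA
                      LinearSVC())
scores = cross_val_score(model, resumes, hired, cv=2)
print("cross-validated accuracy:", scores.mean())
```

A BERT variant would swap the TF-IDF/SVD front end for sentence embeddings while keeping the same SVM back end.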
Findings
Based on the results of this study, LSA and BERT prove useful in retrieving critical topics, and SVM can optimize the prediction model's performance with the help of cross-validation and variable selection strategies.
Research limitations/implications
The techniques and their empirical conclusions provide a practical and theoretical basis, and a reference point, for HR research.
Practical implications
The novel methods proposed in the study can assist HR practitioners in designing and improving their existing recruitment process. The topic detection techniques used in the study provide HR practitioners insights to identify the skill set of a particular recruiting position.
Originality/value
To the best of the authors’ knowledge, this research is the first study that uses LSA, BERT, SVM and other machine learning models in human resource management and resume classification. Compared with the existing machine learning-based resume screening system, the proposed system can provide more interpretable insights for HR professionals to understand the recommendation results through the topics extracted from the resumes. The findings of this study can also help organizations to find a better and effective approach for resume screening and evaluation.
D. K. Malhotra, Kunal Malhotra and Rashmi Malhotra
Abstract
Traditionally, loan officers use different credit scoring models to complement judgmental methods to classify consumer loan applications. This study explores the use of decision trees, AdaBoost, and support vector machines (SVMs) to identify potential bad loans. Our results show that AdaBoost does provide an improvement over simple decision trees as well as SVM models in predicting good credit clients and bad credit clients. To cross-validate our results, we use k-fold classification methodology.
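The comparison could be sketched as below on simulated "loan" data; the dataset, model settings and fold count are illustrative assumptions, not the study's setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Simulated applicant features; 1 = good credit, 0 = bad credit.
X, y = make_classification(n_samples=300, n_informative=6, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "adaboost": AdaBoostClassifier(n_estimators=50, random_state=0),
    "svm": SVC(kernel="rbf", gamma="scale"),
}
# k-fold cross-validated accuracy for each classifier.
results = {name: cross_val_score(m, X, y, cv=5).mean()
           for name, m in models.items()}
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```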
Rabeb Faleh, Sami Gomri, Khalifa Aguir and Abdennaceur Kachouri
Abstract
Purpose
The purpose of this paper is to improve the classification of pollutants using WO3 gas sensors. To evaluate the discrimination capacity, experiments were performed with three gases (ozone, ethanol and acetone) and a mixture of ozone and ethanol, via four WO3 sensors.
Design/methodology/approach
To improve classification accuracy and enhance selectivity, combined features configured through principal component analysis were used. First, to evaluate the discrimination capacity, experiments were performed with three gases (ozone, ethanol and acetone) and a mixture of ozone and ethanol, via four WO3 sensors. To this end, three features, the derivative, the integral and the time corresponding to the peak of the derivative, were extracted from each transient response of the four WO3 gas sensors. These extracted parameters were then used in a combined array.
Findings
The results show that the proposed feature extraction method extracts robust information. The extreme learning machine (ELM) was used to identify the studied gases and was compared with the support vector machine (SVM). The experimental results prove the superiority of the combined-features method in this e-nose application: it achieves classification rates of 90% using the ELM and 93.03% using an SVM with a radial basis function kernel (SVM-RBF).
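For context, an ELM trains only its output layer: the hidden weights are random and fixed, and the output weights come from a single least-squares solve. A compact sketch on simulated four-sensor responses (the data, class means and network size are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy features: 3 gas classes of 4-sensor responses around different means.
X = np.vstack([rng.normal(mu, 0.1, size=(20, 4)) for mu in (0.2, 0.5, 0.8)])
y = np.repeat([0, 1, 2], 20)
T = np.eye(3)[y]                       # one-hot targets

n_hidden = 30
W = rng.standard_normal((4, n_hidden)) # random, untrained hidden weights
b = rng.standard_normal(n_hidden)
H = np.tanh(X @ W + b)                 # hidden-layer activations

beta = np.linalg.pinv(H) @ T           # least-squares output weights
pred = np.argmax(H @ beta, axis=1)
print("training accuracy:", (pred == y).mean())
```

The absence of iterative training is what makes the ELM fast to fit, at the cost of needing enough random hidden units to span the problem.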
Originality/value
Combined features have been configured from transient response to improve the classification accuracy. The achieved results show that the proposed feature extraction method could extract robust information. The ELM and SVM were used to identify the studied gases.