Search results

1 – 10 of 711
Article
Publication date: 29 October 2018

Shrawan Kumar Trivedi and Shubhamoy Dey

Abstract

Purpose

To be sustainable and competitive in the current business environment, it is useful to understand users’ sentiment towards products and services. This critical task can be achieved via natural language processing and machine learning classifiers. This paper aims to propose a novel probabilistic committee selection classifier (PCC) to analyse and classify the sentiment polarities of movie reviews.

Design/methodology/approach

An Indian movie review corpus is assembled for this study. Another publicly available movie review polarity corpus is also used to validate the results. The greedy stepwise search method is used to extract the features/words of the reviews. The performance of the proposed classifier is measured using different metrics, such as F-measure, false positive rate, receiver operating characteristic (ROC) curve and training time. Further, the proposed classifier is compared with other popular machine-learning classifiers, such as Bayesian, Naïve Bayes, Decision Tree (J48), Support Vector Machine and Random Forest.
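
As a rough illustration of two of the metrics named above, the following Python sketch computes the F-measure and the false positive rate from confusion-matrix counts. The function names and the example counts are hypothetical and are not taken from the paper.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-measure from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def false_positive_rate(fp: int, tn: int) -> float:
    """Share of actual negatives that were wrongly labelled positive."""
    return fp / (fp + tn) if (fp + tn) else 0.0


# Hypothetical counts for a polarity classifier (not results from the paper).
tp, fp, fn, tn = 80, 20, 20, 80
print(f_measure(tp, fp, fn))
print(false_positive_rate(fp, tn))
```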

Findings

The results of this study show that the proposed classifier is good at predicting the positive or negative polarity of movie reviews. The accuracy and ROC curve values of the PCC are found to be the best among all the classifiers tested in this study. This classifier is also found to be efficient at identifying positive sentiments of reviews, where it gives low false positive rates for both the Indian Movie Review and Review Polarity corpora used in this study. The training time of the proposed classifier is found to be slightly higher than that of Bayesian, Naïve Bayes and J48.

Research limitations/implications

Only movie reviews written in English are considered. In addition, the proposed committee selection classifier is built using only a committee of probabilistic classifiers; however, other classifier committees can also be built, tested and compared with the present experimental scenario.

Practical implications

In this paper, a novel probabilistic approach is proposed and used for classifying movie reviews, and is found to be highly effective in comparison with other state-of-the-art classifiers. This classifier may be tested for different applications and may provide new insights for developers and researchers.

Social implications

The proposed PCC may be used to classify different product reviews, and hence may be beneficial to organizations to justify users’ reviews about specific products or services. By using authentic positive and negative sentiments of users, the credibility of the specific product, service or event may be enhanced. PCC may also be applied to other applications, such as spam detection, blog mining, news mining and various other data-mining applications.

Originality/value

The constructed PCC is novel and was tested on Indian movie review data.

Article
Publication date: 14 November 2016

Shrawan Kumar Trivedi and Shubhamoy Dey

Abstract

Purpose

Email is an important medium for sharing information rapidly. However, spam, being a nuisance in such communication, motivates the building of a robust filtering system with high classification accuracy and good sensitivity towards false positives. In that context, this paper aims to present a combined classifier technique using a committee selection mechanism, where the main objective is to identify a set of classifiers whose individual decisions can be combined by a committee selection procedure for accurate detection of spam.

Design/methodology/approach

For training and testing of the relevant machine learning classifiers, text mining approaches are used in this research. Three data sets (Enron, SpamAssassin and LingSpam) have been used to test the classifiers. Initially, pre-processing is performed to extract the features associated with the email files. In the next step, the extracted features are taken through a dimensionality reduction method where non-informative features are removed. Subsequently, an informative feature subset is selected using genetic feature search. Thereafter, the proposed classifiers are tested on those informative features and the results are compared with those of other classifiers.

Findings

For building the proposed combined classifier, three different studies have been performed. The first study identifies the effect of boosting algorithms on two probabilistic classifiers: Bayesian and Naïve Bayes. In that study, AdaBoost has been found to be the best algorithm for performance boosting. The second study was on the effect of different kernel functions on the support vector machine (SVM) classifier, where SVM with the normalized polynomial (NP) kernel was observed to be the best. The last study was on combining classifiers with committee selection, where the committee members were the best classifiers identified by the first study, i.e. Bayesian and Naïve Bayes with AdaBoost, and the committee president was selected from the second study, i.e. SVM with the NP kernel. Results show that combining the identified classifiers to form a committee machine gives excellent performance accuracy with a low false positive rate.
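
The abstract does not spell out how the committee combines its decisions. One plausible reading, sketched below in Python, is that the members vote and the president decides whenever they disagree. The rule, the function names and the keyword-based stand-in classifiers are all assumptions made for illustration; they are not the authors' boosted Bayesian, Naïve Bayes or SVM models.

```python
from typing import Callable, Sequence

# A classifier maps a message to 1 (spam) or 0 (legitimate).
Classifier = Callable[[str], int]


def committee_predict(members: Sequence[Classifier],
                      president: Classifier,
                      message: str) -> int:
    """Assumed rule: accept a unanimous member vote, else defer to the president."""
    votes = [member(message) for member in members]
    if votes and all(vote == votes[0] for vote in votes):
        return votes[0]
    return president(message)


# Hypothetical stand-ins for trained classifiers (simple keyword rules).
def member_a(message: str) -> int:
    return int("free" in message.lower())


def member_b(message: str) -> int:
    return int("winner" in message.lower())


def president(message: str) -> int:
    return int(message.count("!") >= 2)


print(committee_predict([member_a, member_b], president, "Free prize, winner!"))
print(committee_predict([member_a, member_b], president, "Free lunch at noon"))
```

In this sketch the president is consulted only on disagreement, so its cost is paid only for contested messages; other combination rules (weighted votes, for instance) would fit the same interface.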

Research limitations/implications

This research is focused on the classification of email spam written in English. Only the body (text) parts of the emails have been used. Image spam has not been included in this work. The work is restricted to email messages; other types of messages, such as short message service or multimedia messaging service, were not part of this study.

Practical implications

This research proposes a method of dealing with the issues and challenges faced by internet service providers and organizations that use email. The proposed model provides not only better classification accuracy but also a low false positive rate.

Originality/value

The proposed combined classifier is a novel classifier designed for accurate classification of email spam.

Details

VINE Journal of Information and Knowledge Management Systems, vol. 46 no. 4
Type: Research Article
ISSN: 2059-5891

Article
Publication date: 21 January 2019

Issa Alsmadi and Keng Hoon Gan

Abstract

Purpose

Rapid developments in social networks and their usage in everyday life have caused an explosion in the amount of short electronic documents. Thus, the need to classify this type of document based on its content has significant implications in many applications. Classifying these documents into relevant classes according to their text content is of interest for many practical reasons. Short-text classification is an essential step in many applications, such as spam filtering, sentiment analysis, Twitter personalization, customer review and many other applications related to social networks. Reviews on short text and its applications are limited. Thus, this paper aims to discuss the characteristics of short text and the challenges and difficulties in classifying it. The paper attempts to introduce all the principal stages of classification, the techniques used in each stage and the possible development trends in each stage.

Design/methodology/approach

The paper is a review of the main aspects of short-text classification. It is structured according to the stages of the classification task.

Findings

This paper discusses related issues and approaches to these problems. Further research could be conducted to address the challenges in short texts and avoid poor accuracy in classification. Low performance can be addressed by using optimization methods, such as genetic algorithms, which are powerful in enhancing the quality of selected features. Soft computing solutions, such as fuzzy logic, make short-text problems a promising area of research.

Originality/value

Using a powerful short-text classification method significantly affects many applications in terms of efficiency enhancement. Current solutions still have low performance, implying the need for improvement. This paper discusses related issues and approaches to these problems.

Details

International Journal of Web Information Systems, vol. 15 no. 2
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 1 August 2003

Alexandr Seleznyov and Seppo Puuronen

Abstract

Nowadays computer and network intrusions have become more common and more complicated, challenging intrusion detection systems. Also, network traffic has been constantly increasing. As a consequence, the amount of data to be processed by an intrusion detection system has been growing, making it difficult to efficiently detect intrusions online. This paper proposes an approach for continuous user authentication based on the user’s behaviour, aiming at the development of an efficient and portable anomaly intrusion detection system. A prototype of a host‐based intrusion detection system was built. It detects masqueraders by comparing the current user behaviour with his/her stored behavioural model. The model itself is represented by a number of patterns that describe sequential and temporal behavioural regularities of the users. This paper also discusses implementation issues, describes the authors’ solutions, and provides performance results of the prototype.

Details

Information Management & Computer Security, vol. 11 no. 3
Type: Research Article
ISSN: 0968-5227

Article
Publication date: 30 November 2022

Dhanya M. and Sanjana S.

Abstract

Purpose

The purpose of this paper is to understand customer sentiment towards telemedicine apps and to apply machine learning algorithms to analyse sentiment regarding their adoption during the COVID-19 pandemic.

Design/methodology/approach

Text mining that uses natural language processing to extract insights from unstructured text is used to find out the customer sentiment towards the telemedicine apps during the COVID-19 pandemic. Machine learning algorithms like support vector machine (SVM) and Naïve Bayes classifier are used for classification, and their sensitivity and specificity are found using a confusion matrix.
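
To make the last step concrete, the Python sketch below derives sensitivity and specificity from a binary confusion matrix. The labels, the function names and the choice of 1 as the positive class are illustrative assumptions, not data or code from the paper.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for a binary labelling."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int],
                            positive: int = 1) -> Tuple[float, float]:
    """Sensitivity = tp / (tp + fn); specificity = tn / (tn + fp)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical review labels: 1 = positive sentiment, 0 = negative sentiment.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(sensitivity_specificity(y_true, y_pred))
```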

Findings

The paper explores customer sentiment towards telemedicine apps and their adoption during the COVID-19 pandemic. Text mining that uses natural language processing to extract insights from unstructured text is used to find out the customer sentiment towards the telemedicine apps during the COVID-19 pandemic. Machine learning algorithms like SVM and Naïve Bayes classifier are used for classification, and their sensitivity and specificity are found using a confusion matrix. Customers who used telemedicine apps express both positive and negative sentiment towards them. Some customers have concerns about the medicines delivered, their delivery time, the quality of service and other technical difficulties. A small percentage of doctors also feel uncomfortable with online consultation through the application.

Originality/value

The primary value of this paper lies in providing an overview of the customers’ approach towards the telemedicine apps, especially during the COVID-19 pandemic.

Details

Journal of Science and Technology Policy Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2053-4620

Article
Publication date: 14 November 2016

Konstantinos Domdouzis, Babak Akhgar, Simon Andrews, Helen Gibson and Laurence Hirsch

Abstract

Purpose

A number of crisis situations, such as natural disasters, have affected the planet over the past decade. The outcomes of such disasters are catastrophic for the infrastructures of modern societies. Furthermore, after large disasters, societies come face-to-face with important issues, such as the loss of human lives, missing people and an increase in the crime rate. On many occasions, they seem unprepared to face such issues. This paper aims to present an automated social media and crowdsourcing data mining system for the synchronization of the police and law enforcement agencies for the prevention of criminal activities during and after a large crisis situation.

Design/methodology/approach

The paper conducts qualitative research in the form of a review of the literature. This review focuses on the necessity of using social media and crowdsourcing data mining techniques in combination with advanced Web technologies to provide solutions to problems related to criminal activities that occur during and after a crisis. The paper presents the ATHENA crisis management system, which uses a number of data mining techniques to collect and analyze crisis-related data from social media for the purpose of crime prevention.

Findings

Conclusions are drawn on the significance of social media and crowdsourcing data mining techniques for the resolution of problems related to large crisis situations, with emphasis on the ATHENA system.

Originality/value

The paper shows how the integrated use of social media and data mining algorithms can contribute to the resolution of problems that develop during and after a large crisis.

Details

Journal of Systems and Information Technology, vol. 18 no. 4
Type: Research Article
ISSN: 1328-7265

Article
Publication date: 12 October 2012

Policarpo C. deMattos, Daniel M. Miller and Eui H. Park

Abstract

Purpose

This paper aims to examine complex clinical decision‐making processes in trauma center units of hospitals in terms of the immediate impact of complexity on the medical team involved in the trauma event.

Design/methodology/approach

It is proposed to develop a model of decision‐making processes in trauma events that uses a Bayesian classifier model with convolution and deconvolution operators to study real‐time observed trauma data for the decision‐making process under tremendous stress. The objective is to explore and explain physicians' decision‐making processes under stress and time constraints during actual trauma events from the perspective of complexity.

Findings

Because physicians have blurred information and cues that are tainted by random environmental noise during injury‐related events, they must de‐blur (de‐convolute) the collected data to find a best approximation of the real data for decision‐making processes.

Research limitations/implications

The data collection and analysis are innovative, and the permission to access raw audio and video data from an active trauma center will differentiate this study from similar studies that rely on simulations, self-report and case study approaches.

Practical implications

Clinical decision makers in trauma centers are placed in situations that are increasingly complex, making decision‐making and problem‐solving processes multifaceted.

Originality/value

The science of complex adaptive systems, together with human judgment theories, provides important concepts and tools for responding to the challenges of healthcare this century and beyond.

Details

Management Decision, vol. 50 no. 9
Type: Research Article
ISSN: 0025-1747

Article
Publication date: 8 March 2022

Gabriel Caldas Montes and Vítor Manuel Araújo da Fonseca

Abstract

Purpose

Using a fiscal sentiment indicator, this study aims to verify whether fiscal sentiment affects the yield curve in Brazil. Since policymakers highlight the coordination between monetary and fiscal policies and the importance of fiscal policy to the expectations formation process in inflation targeting regimes, the authors also explore the transmission mechanism through inflation expectations. Hence, the study also analyzes the effect of fiscal sentiment on interest rate swap spreads through the inflation expectations channel.

Design/methodology/approach

Based on information obtained from official communiqués about fiscal policies issued by the Central Bank of Brazil and the Brazilian Ministry of Finance, the study builds a fiscal sentiment indicator. The econometric strategy to verify whether fiscal sentiment is related to the short tail of the yield curve is based on time series analysis through ordinary least squares and generalized method of moments estimates. In turn, to estimate the transmission mechanism through inflation expectations, the model uses interaction terms between fiscal sentiment and inflation expectations.
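
One common way to write a specification with such an interaction term is shown below. The symbols ($S_t$ for the swap spread, $FS_t$ for the fiscal sentiment indicator, $E_t[\pi]$ for inflation expectations, $X_t$ for control variables) are notation assumed here for illustration, not the authors' own, and the models actually estimated may include further terms.

```latex
S_t = \beta_0 + \beta_1 FS_t + \beta_2 E_t[\pi]
      + \beta_3 \left( FS_t \times E_t[\pi] \right)
      + \gamma' X_t + \varepsilon_t ,
\qquad
\frac{\partial S_t}{\partial E_t[\pi]} = \beta_2 + \beta_3 FS_t
```

Under this assumed form, the marginal effect of inflation expectations on the spread varies with fiscal sentiment through $\beta_3$.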

Findings

The results suggest a more optimistic (pessimistic) fiscal sentiment reduces (increases) swap spreads. The findings reveal that improvements in fiscal credibility and a more optimistic fiscal sentiment are able to reduce the positive marginal effect that inflation expectations variations have on interest rate swap spreads.

Originality/value

This study contributes to the literature, as, to the best of the authors’ knowledge, it is the first to analyze the content of the communiqués related to fiscal policy, and based on this content, it extracts the sentiment related to the fiscal environment and analyzes the effect of this sentiment on the yield curve. Besides, unlike existing studies that analyze the effect of fiscal backward-looking aspects (such as public debt, budget balance, taxes and public spending) on the yield curve, this study investigates forward-looking aspects related to fiscal policy (such as fiscal credibility and fiscal sentiment).

Details

Journal of Financial Economic Policy, vol. 14 no. 5
Type: Research Article
ISSN: 1757-6385

Article
Publication date: 25 January 2018

Hima Bindu and Manjunathachari K.

Abstract

Purpose

This paper aims to develop a hybrid feature descriptor and a probabilistic neuro-fuzzy system to attain high accuracy in face recognition systems. Facial recognition (FR) systems now play a vital part in several applications such as surveillance, access control and image understanding. Accordingly, various face recognition methods have been developed in the literature, but the applicability of these algorithms is restricted because of unsatisfactory accuracy. Improving face recognition is therefore important.

Design/methodology/approach

This paper proposes a face recognition system through feature extraction and classification. The proposed model extracts the local and the global features of the image. The local features are extracted using the kernel based scale invariant feature transform (K-SIFT) model and the global features are extracted using the proposed m-Co-HOG model (Co-HOG: co-occurrence histograms of oriented gradients). The proposed m-Co-HOG model has the properties of the Co-HOG algorithm. The feature vector database contains the combined local and global feature vectors derived using the K-SIFT model and the proposed m-Co-HOG algorithm. This paper proposes a probabilistic neuro-fuzzy classifier system for finding the identity of the person from the extracted feature vector database.

Findings

The face images required for the simulation of the proposed work are taken from the CVL database. The simulation considers a total of 114 persons from the CVL database. From the results, it is evident that the proposed model has outperformed the existing models with an improved accuracy of 0.98. The false acceptance rate (FAR) and false rejection rate (FRR) values of the proposed model have a low value of 0.01.
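
For readers unfamiliar with these two error rates, the Python sketch below computes FAR and FRR from match scores at a fixed decision threshold. The scores, the threshold and the convention that a higher score means a better match are illustrative assumptions, not data from the paper.

```python
from typing import Sequence, Tuple


def far_frr(genuine_scores: Sequence[float],
            impostor_scores: Sequence[float],
            threshold: float) -> Tuple[float, float]:
    """FAR: share of impostor attempts accepted (score >= threshold).
    FRR: share of genuine attempts rejected (score < threshold)."""
    accepted_impostors = sum(1 for s in impostor_scores if s >= threshold)
    rejected_genuine = sum(1 for s in genuine_scores if s < threshold)
    far = accepted_impostors / len(impostor_scores) if impostor_scores else 0.0
    frr = rejected_genuine / len(genuine_scores) if genuine_scores else 0.0
    return far, frr


# Hypothetical similarity scores in [0, 1].
genuine = [0.9, 0.8, 0.4, 0.95]
impostor = [0.1, 0.3, 0.6, 0.2]
print(far_frr(genuine, impostor, threshold=0.5))
```

Raising the threshold in this sketch lowers FAR at the expense of FRR, which is why the two rates are usually reported together.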

Originality/value

This paper proposes a face recognition system with the proposed m-Co-HOG vector and the hybrid neuro-fuzzy classifier. Feature extraction was based on the proposed m-Co-HOG vector for extracting the global features and the existing K-SIFT model for extracting the local features from the face images. The proposed m-Co-HOG vector utilizes the existing Co-HOG model for feature extraction, along with a new color gradient decomposition method. The major advantage of the proposed m-Co-HOG vector is that it utilizes the color features of the image along with other features during the histogram operation.

Details

Sensor Review, vol. 38 no. 3
Type: Research Article
ISSN: 0260-2288

Article
Publication date: 1 November 2019

Shrawan Kumar Trivedi and Shubhamoy Dey

Abstract

Purpose

Email is a rapid and cheap medium for sharing information, whereas unsolicited email (spam) is a constant problem in email communication. The rapid growth of spam creates a need to build a reliable and robust spam classifier. This paper aims to present a study of evolutionary classifiers (genetic algorithm [GA] and genetic programming [GP]) with and without the help of an ensemble of classifiers method. In this research, the classifier ensemble has been developed with the adaptive boosting technique.

Design/methodology/approach

Text mining methods are applied for classifying spam emails and legitimate emails. Two data sets (Enron and SpamAssassin) are used to test the classifiers concerned. Initially, pre-processing is performed to extract the features/words from email files. An informative feature subset is selected using the greedy stepwise feature subset search method. With the help of the informative features, a comparative study is performed, first among the evolutionary classifiers and then with other popular machine learning classifiers (Bayesian, Naïve Bayes and support vector machine).
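
The abstract names greedy stepwise search but gives no details. The Python sketch below shows one generic forward variant: it keeps adding the feature that most improves a subset score and stops when no addition helps. The scoring function, the feature names and their weights are hypothetical stand-ins for a real subset evaluator and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple


def greedy_forward_selection(
    features: Sequence[str],
    score: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Add the best-scoring feature at each step; stop when nothing improves."""
    selected: List[str] = []
    best = score(selected)
    remaining = list(features)
    while remaining:
        candidates = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(candidates, key=lambda pair: pair[0])
        if top_score <= best:
            break  # no remaining feature improves the subset
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best


# Hypothetical evaluator: per-word usefulness minus a cost per selected word.
WEIGHTS = {"free": 0.30, "meeting": 0.20, "the": 0.01, "viagra": 0.25}


def toy_score(subset: List[str]) -> float:
    return sum(WEIGHTS[f] for f in subset) - 0.05 * len(subset)


print(greedy_forward_selection(list(WEIGHTS), toy_score))
```

In practice the toy evaluator would be replaced by something like cross-validated classifier accuracy on the candidate subset, at a much higher cost per evaluation.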

Findings

This study reveals that evolutionary algorithms are promising in classification and prediction applications, where genetic programming with adaptive boosting turns out to be not only an accurate classifier but also a sensitive one. Results show that GA initially performs better than GP, but after an ensemble of classifiers is applied (a large number of iterations), GP overtakes GA with significantly higher accuracy. Amongst all classifiers, boosted GP turns out to offer not only good classification accuracy but also low false positive (FP) rates, which are considered an important criterion in email spam classification. Also, greedy stepwise feature search is found to be an effective method for feature selection in this application domain.

Research limitations/implications

The research implication of this study is a reduction in the cost incurred because of spam/unsolicited bulk email. Email is a fundamental necessity for sharing information among the units of an organization so that it can remain competitive with its business rivals. In addition, it is a continual challenge for internet service providers to provide the best emailing services to their customers. Although organizations and internet service providers are continuously adopting novel spam filtering approaches to reduce the number of unwanted emails, the desired effect has not been seen to a significant degree because of the cost of installation, customizability and the threat of misclassifying important emails. This research deals with the issues and challenges faced by internet service providers and organizations.

Practical implications

In this research, the proposed models not only provide excellent accuracy, sensitivity with a low FP rate and customizability, but also help reduce the cost of spam. The same models may also be used for other text mining applications, such as sentiment analysis, blog mining and news mining.

Originality/value

A comparison between GP and GA, with and without an ensemble, is presented for the spam classification application domain.
