Article
Publication date: 29 October 2018

Shrawan Kumar Trivedi and Shubhamoy Dey

Abstract

Purpose

To be sustainable and competitive in the current business environment, it is useful to understand users’ sentiment towards products and services. This critical task can be achieved via natural language processing and machine learning classifiers. This paper aims to propose a novel probabilistic committee selection classifier (PCC) to analyse and classify the sentiment polarities of movie reviews.
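The abstract does not say how the PCC combines or selects its committee members. The Python sketch below is only an illustration of the general idea, and it assumes a soft-voting committee. Each probabilistic member returns class probabilities, the probabilities are summed, and the polarity with the largest total wins. The member functions and their two features are hypothetical stand-ins, not the classifiers used in the paper.

```python
from typing import Callable, Dict, List, Sequence

# A probabilistic classifier maps a feature vector to {label: probability}.
ProbClassifier = Callable[[Sequence[float]], Dict[str, float]]


def committee_predict(members: List[ProbClassifier], x: Sequence[float]) -> str:
    """Soft voting: sum each member's class probabilities, return the top label."""
    totals: Dict[str, float] = {}
    for member in members:
        for label, p in member(x).items():
            totals[label] = totals.get(label, 0.0) + p
    # Averaging (dividing by len(members)) would not change the argmax.
    return max(totals, key=lambda label: totals[label])


# Hypothetical members scoring a review from two made-up features:
# x = (count of positive words, count of negative words).
def lexicon_member(x: Sequence[float]) -> Dict[str, float]:
    pos, neg = x
    p = (pos + 1) / (pos + neg + 2)  # Laplace-smoothed share of positive words
    return {"positive": p, "negative": 1 - p}


def majority_member(x: Sequence[float]) -> Dict[str, float]:
    pos, neg = x
    p = 0.6 if pos > neg else 0.4  # weak, fixed-confidence vote
    return {"positive": p, "negative": 1 - p}


committee = [lexicon_member, majority_member]
print(committee_predict(committee, (5, 1)))  # positive
print(committee_predict(committee, (0, 4)))  # negative
```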

Design/methodology/approach

An Indian movie review corpus is assembled for this study, and a second, publicly available movie review polarity corpus is used to validate the results. The greedy stepwise search method is used to select the features/words of the reviews. The performance of the proposed classifier is measured using metrics such as F-measure, false positive rate, receiver operating characteristic (ROC) curve and training time. The proposed classifier is then compared with other popular machine-learning classifiers, such as Bayesian, Naïve Bayes, Decision Tree (J48), Support Vector Machine and Random Forest.
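The greedy stepwise search named above can be sketched as forward hill-climbing over feature subsets. The abstract does not state which subset evaluator was used. In this sketch, `score` is therefore a hypothetical merit function supplied by the caller, and the toy word weights are invented for illustration.

```python
from typing import Callable, FrozenSet, Iterable, List


def greedy_stepwise_forward(features: Iterable[str],
                            score: Callable[[FrozenSet[str]], float]) -> List[str]:
    """Forward hill-climbing over feature subsets: add the single feature that
    most improves score(subset); stop when no addition improves it."""
    remaining = set(features)
    selected: List[str] = []
    best_score = score(frozenset())
    while remaining:
        best_feature = None
        for f in sorted(remaining):  # sorted for deterministic tie-breaking
            s = score(frozenset(selected) | {f})
            if s > best_score:
                best_feature, best_score = f, s
        if best_feature is None:  # local optimum: no single addition helps
            break
        selected.append(best_feature)
        remaining.remove(best_feature)
    return selected


# Hypothetical merit: made-up per-word usefulness minus a size penalty.
usefulness = {"excellent": 0.5, "boring": 0.4, "plot": 0.05, "the": 0.0}


def toy_score(subset: FrozenSet[str]) -> float:
    return sum(usefulness[w] for w in subset) - 0.1 * len(subset)


print(greedy_stepwise_forward(usefulness, toy_score))  # ['excellent', 'boring']
```

Because the search only adds features and never revisits earlier choices, it is fast but can stop at a local optimum. This is the usual trade-off of greedy subset search.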

Findings

The results of this study show that the proposed classifier is good at predicting the positive or negative polarity of movie reviews. The accuracy and ROC value of the PCC are found to be the best among all the classifiers tested in this study. The classifier is also effective at identifying the positive sentiment of reviews, giving low false positive rates for both the Indian Movie Review and Review Polarity corpora used in this study. Its training time is found to be slightly higher than that of Bayesian, Naïve Bayes and J48.
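For reference, the following minimal sketch shows how accuracy, F-measure and false positive rate are derived from a two-class confusion matrix. It assumes that positive polarity is the positive class. The labels and predictions are made up.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                   positive: str = "pos") -> Dict[str, float]:
    """Accuracy, F-measure and false positive rate for a two-class problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1  # negative review predicted as positive
        elif t == positive:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "f_measure": f_measure, "fpr": fpr}


truth = ["pos", "pos", "pos", "neg", "neg", "neg"]
preds = ["pos", "pos", "neg", "pos", "neg", "neg"]
print(binary_metrics(truth, preds))  # accuracy ≈ 0.667, f_measure ≈ 0.667, fpr ≈ 0.333
```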

Research limitations/implications

Only movie review sentiments written in English are considered. In addition, the proposed committee selection classifier is built using only a committee of probabilistic classifiers. Other classifier committees could also be built, tested and compared with the present experimental scenario.

Practical implications

In this paper, a novel probabilistic approach is proposed and used for classifying movie reviews, and is found to be highly effective in comparison with other state-of-the-art classifiers. This classifier may be tested for different applications and may provide new insights for developers and researchers.

Social implications

The proposed PCC may be used to classify different product reviews and hence may help organizations assess users' reviews of specific products or services. By using authentic positive and negative user sentiments, the credibility of a specific product, service or event may be enhanced. PCC may also be applied to other applications, such as spam detection, blog mining, news mining and various other data-mining applications.

Originality/value

The constructed PCC is novel and was tested on Indian movie review data.

Article
Publication date: 4 April 2022

Shrawan Kumar Trivedi, Amrinder Singh and Somesh Kumar Malhotra

Abstract

Purpose

There is a need to predict whether consumers liked their stay in hotel rooms or not, and to remove the aspects they did not like. Many customers leave a review after staying in a hotel, mostly on the website used to book it. These reviews are valuable data that can be analyzed to provide better services in hotels. The purpose of this study is to use machine learning techniques to analyze these data and determine the different sentiment polarities of consumers.

Design/methodology/approach

The data consist of reviews posted by hotel customers on the Tripadvisor website, made publicly available on Kaggle. From the 10,000 reviews in the data, a sample of 3,000 negative-polarity reviews (customers with bad experiences in the hotel) and 3,000 positive-polarity reviews (customers with good experiences in the hotel) is taken to prepare the data set. A two-stage feature selection is applied, first using a greedy selection method and then a wrapper method, to obtain the 37 most relevant features. An improved stacked decision tree (ISD) classifier is built and compared with state-of-the-art machine learning algorithms. All tests are done using RStudio.

Findings

The results show that the new model was satisfactory overall when predicting whether customers' experiences in the hotel were positive or negative. After an in-depth study, it achieved 80.77% accuracy with a 50–50 split, 80.74% with a 66–34 split and 80.25% with an 80–20 split.
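The three ratios above can be read as training/test proportions of the 6,000 sampled reviews. That reading is an assumption, and so is the plain (non-stratified) random shuffle used below, because the abstract does not describe how the splits were drawn. The sketch only shows the resulting set sizes.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(items: Sequence[T], train_fraction: float,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the items and cut it into training and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


reviews = list(range(6000))  # stand-ins for the 3,000 + 3,000 sampled reviews
for fraction in (0.50, 0.66, 0.80):
    train, test = holdout_split(reviews, fraction)
    print(f"{fraction:.0%} split: {len(train)} train / {len(test)} test")
# 50% split: 3000 train / 3000 test
# 66% split: 3960 train / 2040 test
# 80% split: 4800 train / 1200 test
```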

Research limitations/implications

This research shows how the polarity of potentially popular reviews can be predicted. From the authors' perspective, this can help the hotel industry take corrective measures for the betterment of business and promote useful positive reviews. The study also has some limitations. Only English reviews are considered, and the data are restricted to the Tripadvisor website, although new data may be generated to test the credibility of the model. Only aspect-based sentiment classification is considered in this study.

Originality/value

A stacking of machine learning techniques is proposed. First, state-of-the-art classifiers are tested on the given data. Then the three best-performing classifiers (decision tree C5.0, random forest and support vector machine) are stacked to create the ISD classifier.
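The abstract names the base classifiers of the ISD stack (C5.0, random forest and support vector machine) but not the combiner. The sketch below shows one common stacking recipe under stated assumptions. Hypothetical decision-stump base learners stand in for the real ones, and their out-of-fold predictions become the meta-features. A logistic-regression meta-learner is then trained on those meta-features. This is not the authors' ISD configuration.

```python
import math
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[float]], int]
Learner = Callable[[List[Sequence[float]], List[int]], Predictor]


def stump_learner(feature: int) -> Learner:
    """Hypothetical base learner: a one-feature threshold rule (decision stump)."""
    def fit(X: List[Sequence[float]], y: List[int]) -> Predictor:
        best = (len(y) + 1, 0.0, 1)  # (training errors, threshold, sign)
        for t in sorted({x[feature] for x in X}):
            for sign in (1, -1):
                errors = sum((1 if sign * (x[feature] - t) >= 0 else 0) != yi
                             for x, yi in zip(X, y))
                if errors < best[0]:
                    best = (errors, t, sign)
        _, t, sign = best
        return lambda x: 1 if sign * (x[feature] - t) >= 0 else 0
    return fit


def fit_logistic(Z: List[List[int]], y: List[int],
                 lr: float = 0.5, epochs: int = 2000) -> Predictor:
    """Meta-learner: logistic regression on base predictions, batch gradient descent."""
    w = [0.0] * (len(Z[0]) + 1)  # last weight is the bias
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for z, yi in zip(Z, y):
            zb = list(z) + [1.0]
            p = 1.0 / (1.0 + math.exp(-sum(wi * zi for wi, zi in zip(w, zb))))
            for j, zj in enumerate(zb):
                grad[j] += (p - yi) * zj
        w = [wi - lr * g / len(Z) for wi, g in zip(w, grad)]
    return lambda z: 1 if sum(wi * zi for wi, zi in zip(w, list(z) + [1.0])) >= 0 else 0


def fit_stack(X: List[Sequence[float]], y: List[int],
              base_learners: List[Learner], k: int = 3) -> Predictor:
    """Stacking: train the meta-learner on out-of-fold base predictions."""
    n = len(X)
    meta_X = [[0] * len(base_learners) for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        for j, learn in enumerate(base_learners):
            model = learn([X[i] for i in train], [y[i] for i in train])
            for i in range(fold, n, k):  # the held-out indices of this fold
                meta_X[i][j] = model(X[i])
    base_models = [learn(X, y) for learn in base_learners]  # refit on all data
    meta = fit_logistic(meta_X, y)
    return lambda x: meta([m(x) for m in base_models])


# Toy data: feature 0 separates the classes, feature 1 is noise.
values = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
X = [[v, float(rep % 2)] for v in values for rep in range(3)]
y = [1 if x[0] > 0.5 else 0 for x in X]
stack = fit_stack(X, y, [stump_learner(0), stump_learner(1)])
print(stack([0.95, 0.0]), stack([0.05, 1.0]))  # 1 0
```

Because the meta-learner sees only out-of-fold predictions, it learns to trust the stump on feature 0. It also learns to discount the stump on the noise feature, which in this toy case predicts class 1 for every point.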

Article
Publication date: 28 February 2019

Gabrijela Dimic, Dejan Rancic, Nemanja Macek, Petar Spalevic and Vida Drasute

Abstract

Purpose

This paper aims to address the previously unknown accuracy with which students' activity patterns can be predicted in a blended learning environment.

Design/methodology/approach

To extract the most relevant subset of activity features, different feature-selection methods were applied. Classification models built on subsets of different cardinality were then compared.
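Correlation-based feature selection (CFS), which this abstract mentions, is commonly formulated as a score for a whole subset rather than for single features. That is why subsets of different cardinality can be compared directly. The sketch below computes the usual CFS merit from plain Pearson correlations. The activity columns and target are hypothetical, and the abstract does not give the paper's CFS settings.

```python
import math
from itertools import combinations
from statistics import mean
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def cfs_merit(columns: List[Sequence[float]], target: Sequence[float]) -> float:
    """CFS merit of a feature subset: k * r_cf / sqrt(k + k(k-1) * r_ff), where
    r_cf is the mean |feature-class| and r_ff the mean |feature-feature| correlation."""
    k = len(columns)
    r_cf = mean(abs(pearson(c, target)) for c in columns)
    r_ff = (mean(abs(pearson(a, b)) for a, b in combinations(columns, 2))
            if k > 1 else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


target = [1, 2, 3, 4, 5]  # hypothetical outcome scores
strong = [1, 2, 3, 4, 5]  # perfectly correlated activity count
weaker = [1, 3, 2, 5, 4]  # correlation 0.8 with the target and with `strong`
print(cfs_merit([strong], target))          # 1.0
print(cfs_merit([strong, strong], target))  # 1.0: a redundant copy adds nothing
print(cfs_merit([strong, weaker], target))  # ~0.949: larger subset, lower merit
```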

Findings

The experimental evaluation contradicts the hypothesis that reducing feature-vector dimensionality increases prediction accuracy.

Research limitations/implications

Prediction accuracy in the described learning environment was improved by applying the synthetic minority oversampling technique (SMOTE), which affected the results of the correlation-based feature-selection method.
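SMOTE creates synthetic minority-class examples by interpolating between a minority sample and one of its nearest minority-class neighbours. The minimal sketch below assumes Euclidean distance, k = 3 and randomly chosen base samples. The activity vectors are hypothetical, and the abstract does not state the paper's SMOTE parameters.

```python
import math
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_synthetic: int,
          k: int = 3, seed: int = 0) -> List[List[float]]:
    """Minimal SMOTE sketch: each synthetic point lies on the segment between a
    random minority sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic: List[List[float]] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class activity vectors (e.g. logins, forum posts).
minority = [[2.0, 1.0], [3.0, 1.0], [2.0, 2.0], [4.0, 3.0]]
new_points = smote(minority, n_synthetic=5)
print(len(new_points))  # 5
```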

Originality/value

The major contribution of the research is the proposed methodology for selecting the optimal low-cardinality subset of students' activities, together with a significant improvement in prediction accuracy in a blended learning environment.

Details

Information Discovery and Delivery, vol. 47 no. 2
Type: Research Article
ISSN: 2398-6247

Article
Publication date: 1 November 2019

Shrawan Kumar Trivedi and Shubhamoy Dey

Abstract

Purpose

Email is a fast and cheap medium for sharing information, whereas unsolicited email (spam) is a constant problem in email communication. The rapid growth of spam creates a need for a reliable and robust spam classifier. This paper aims to present a study of evolutionary classifiers (genetic algorithm [GA] and genetic programming [GP]) with and without an ensemble-of-classifiers method. In this research, the classifier ensemble is developed with the adaptive boosting technique.
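The following is a minimal sketch of discrete AdaBoost, the boosting scheme named above. It assumes labels in {-1, +1} (for example, +1 for spam). To keep it short, the weak learners are decision stumps rather than the GA or GP classifiers boosted in the paper, and the toy data are invented.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    f, t, s = stump
    return s if x[f] >= t else -s


def best_stump(X, y, w) -> Tuple[Stump, float]:
    """Stump with the lowest weighted error under the current sample weights."""
    best, best_err = (0, 0.0, 1), float("inf")
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for s in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict((f, t, s), xi) != yi)
                if err < best_err:
                    best, best_err = (f, t, s), err
    return best, best_err


def adaboost(X, y, rounds: int = 10) -> List[Tuple[float, Stump]]:
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        if err >= 0.5:  # weak learner no better than chance: stop
            break
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified samples, down-weight correct ones, renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x) -> int:
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy 1-D data: +1 only inside an interval, which no single stump can fit.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [-1, -1, 1, 1, -1, -1]
model = adaboost(X, y, rounds=3)
print([predict(model, x) for x in X])  # [-1, -1, 1, 1, -1, -1]
```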

Design/methodology/approach

Text mining methods are applied to classify spam and legitimate emails. Two data sets (Enron and SpamAssassin) are used to test the classifiers. Initially, pre-processing is performed to extract the features/words from email files. An informative feature subset is selected using the greedy stepwise feature subset search method. With these informative features, a comparative study is performed, first among the evolutionary classifiers and then against other popular machine learning classifiers (Bayesian, naive Bayes and support vector machine).

Findings

This study reveals that evolutionary algorithms are promising in classification and prediction applications, where genetic programming with adaptive boosting turns out to be both an accurate and a sensitive classifier. Results show that GA initially performs better than GP, but after an ensemble of classifiers (a large number of iterations), GP overtakes GA with significantly higher accuracy. Among all classifiers, boosted GP turns out to be good not only in classification accuracy but also in having low false positive (FP) rates, which are considered an important criterion in email spam classification. Greedy stepwise feature search is also found to be an effective feature selection method in this application domain.

Research limitations/implications

The implication of this research lies in reducing the cost incurred because of spam/unsolicited bulk email. Email is a fundamental necessity for sharing information among the units of an organization and for staying competitive with business rivals. In addition, spam is a continual hurdle for internet service providers in providing the best email services to their customers. Organizations and internet service providers are continuously adopting novel spam filtering approaches to reduce the number of unwanted emails. However, the desired effect has not been significantly seen because of the cost of installation, customizability and the threat of misclassifying important emails. This research addresses these issues and challenges faced by internet service providers and organizations.

Practical implications

In this research, the proposed models not only provide excellent accuracy, sensitivity with a low FP rate and customizability but also help reduce the cost of spam. The same models may also be used for other text mining applications, such as sentiment analysis, blog mining, news mining or other text mining research.

Originality/value

A comparison between GP and GA, with and without an ensemble, is presented for the spam classification application domain.

Article
Publication date: 30 October 2018

Shrawan Kumar Trivedi and Prabin Kumar Panigrahi

Abstract

Purpose

Email spam classification is becoming a challenging area in the domain of text classification. Precise and robust classifiers are judged not only by classification accuracy but also by sensitivity (correctly classified legitimate emails) and specificity (correctly classified unsolicited emails), captured by both false positive and false negative rates. This paper aims to present a comparative study of various decision tree classifiers (such as AD tree, decision stump and REP tree) with and without different ensemble algorithms (bagging, boosting with re-sampling and AdaBoost).
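Of the ensemble methods listed, bagging is the simplest to sketch. Each model is trained on a bootstrap sample drawn with replacement, and predictions are combined by majority vote. The 1-nearest-neighbour base learner and the one-dimensional scores are hypothetical stand-ins for the decision trees and email features used in the study.

```python
import math
import random
from collections import Counter
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[float]], str]


def one_nn_learner(X: List[Sequence[float]], y: List[str]) -> Predictor:
    """Hypothetical base learner: 1-nearest-neighbour on the training sample."""
    def predict(x: Sequence[float]) -> str:
        i = min(range(len(X)), key=lambda j: math.dist(X[j], x))
        return y[i]
    return predict


def bagging_fit(X, y, learner, n_models: int = 11, seed: int = 0) -> List[Predictor]:
    """Bagging: train each model on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(learner([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models: List[Predictor], x: Sequence[float]) -> str:
    """Combine the models by plain majority vote."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical 1-D "spamminess" scores for ham and spam emails.
X = [[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0], [13.0]]
y = ["ham"] * 4 + ["spam"] * 4
models = bagging_fit(X, y, one_nn_learner)
print(bagging_predict(models, [0.5]), bagging_predict(models, [12.5]))  # ham spam
```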

Design/methodology/approach

Artificial intelligence and text mining approaches are incorporated in this study. Each decision tree classifier is tested on informative words/features selected from two publicly available data sets (SpamAssassin and LingSpam) using a greedy stepwise feature search method.

Findings

Outcomes of this study show that, without boosting, the REP tree provides the highest accuracy, with the AD tree as the second-best performer. Decision stump is found to be the weakest classifier in this study. With boosting, however, the combination of REP tree and AdaBoost compares favourably with other classification models. When false positive rate and accuracy are considered together, both AD tree and REP tree with AdaBoost are found to carry out the classification task effectively. Greedy stepwise search proves its worth in this study by selecting a subset of valuable features for identifying the correct class of emails.

Research limitations/implications

This research is focussed on the classification of those email spams that are written in the English language only. The proposed models work with content (words/features) of email data that is mostly found in the body of the mail. Image spam has not been included in this study. Other messages such as short message service or multi-media messaging service were not included in this study.

Practical implications

In this research, a boosted decision tree approach is proposed and used to classify email spam and ham files. It is found to be highly effective in comparison with the state-of-the-art models used in other studies. This classifier may be tested in different applications and may provide new insights for developers and researchers.

Originality/value

A comparison of decision tree classifiers with/without ensemble has been presented for spam classification.

Details

Journal of Systems and Information Technology, vol. 20 no. 3
Type: Research Article
ISSN: 1328-7265

Article
Publication date: 24 June 2021

Chanapol Pornpikul and Sampan Nettayanun

Abstract

Purpose

The authors study the explanatory power of investor rationality and irrationality for value and momentum portfolios. We also examine the relationships during financial crisis events, namely, the US subprime mortgage crisis (2007–2009) and the European debt crisis (2011–2013).

Design/methodology/approach

This study examines the influence of investors' rationality and irrationality on the US stock market using a multiple linear regression model and a stepwise regression model. Technically, stepwise regression is used as a machine-learning technique, with three search methods (forward selection, backward selection and stepwise selection), to find the best-fit model according to Akaike's Information Criterion (AIC). This study reports the best model selected by stepwise regression.
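To make the AIC-based search concrete, the pure-Python sketch below implements forward selection only. Backward and bidirectional stepwise search follow the same pattern. The sketch assumes ordinary least squares with an intercept and the Gaussian form AIC = n ln(RSS/n) + 2k. That form omits an additive constant, which does not affect comparisons between models. The variables are hypothetical, not the factors used in the study.

```python
import math
from typing import Dict, List, Sequence


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting (small systems only)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols_rss(columns: List[Sequence[float]], y: Sequence[float]) -> float:
    """Residual sum of squares of an OLS fit of y on an intercept plus columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[c] for r in rows) for c in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def aic(rss: float, n: int, k: int) -> float:
    """Gaussian-likelihood AIC up to an additive constant; k = no. of coefficients."""
    return n * math.log(rss / n) + 2 * k


def forward_stepwise_aic(candidates: Dict[str, Sequence[float]],
                         y: Sequence[float]) -> List[str]:
    """Forward selection: add the predictor that lowers AIC most; stop when none does."""
    n = len(y)
    chosen: List[str] = []
    best = aic(ols_rss([], y), n, 1)
    while len(chosen) < len(candidates):
        trials = {name: aic(ols_rss([candidates[c] for c in chosen + [name]], y),
                            n, len(chosen) + 2)
                  for name in candidates if name not in chosen}
        name = min(trials, key=lambda c: trials[c])
        if trials[name] >= best:
            break
        chosen.append(name)
        best = trials[name]
    return chosen


# Hypothetical data: y depends on x1; x2 carries no extra information.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.0, 1.0, 5.0, 2.0, 3.0, 3.0]
noise = [0.1, -0.1, 0.0, 0.0, -0.1, 0.1]  # orthogonal to 1, x1 and x2
y = [3.0 + 2.0 * a + e for a, e in zip(x1, noise)]
print(forward_stepwise_aic({"x1": x1, "x2": x2}, y))  # ['x1']
```

Adding `x2` leaves the residual sum of squares unchanged, so its extra coefficient only adds the 2k penalty and the search stops after `x1`.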

Findings

Our empirical results underline the importance of reason and emotion for stock-market returns, and they show that rationality and irrationality simultaneously explain the value and momentum portfolios, as well as the ETF portfolios. The rational and irrational explanatory powers also differ across portfolios and periods. Rational factors usually explain return volatility to a greater extent than irrational factors. Moreover, during a financial crisis, irrational factors become markedly more important in explaining returns, especially for the ETF portfolios.

Originality/value

We expect this study not only to make an academic contribution but also to benefit many stakeholders in the financial market. Investors and traders can identify various irrational trading factors, for example, by taking a long position during market panic while following the indicators in the models. Managers can also reconsider the company's cost of equity by adding irrational factors when computing the expected return on equity. Similarly, stock exchanges can adjust their circuit breakers appropriately during periods of investor pessimism. Finally, regulators can obtain a more complete picture of the stock market by adding irrational factors to their considerations.

Details

Review of Behavioral Finance, vol. 14 no. 5
Type: Research Article
ISSN: 1940-5979

Article
Publication date: 15 June 2021

Hoyoung Rho, Keunho Choi and Donghee Yoo

Abstract

Purpose

This study examines whether the Internet search index is effective enough as data for identifying demand for agricultural and livestock products. It also compares the accuracy of models that predict purchases of major agricultural and livestock products using an artificial neural network, linear regression and a decision tree.

Design/methodology/approach

Artificial neural network, linear regression and decision tree algorithms were used in this study to compare the accuracy of predicting purchases of major agricultural and livestock products. The models were evaluated using 10-fold cross-validation.
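The minimal sketch below shows how 10-fold cross-validation partitions the data. It assumes a seeded random shuffle, because the abstract does not say whether folds were shuffled or stratified. Each index appears in exactly one test fold, so every record is predicted exactly once.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; every index lands in exactly one test fold."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    for fold in range(k):
        test = order[fold::k]  # every k-th shuffled index
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        yield train, test


# Hypothetical: 25 weekly purchase records split into 10 folds.
folds = list(k_fold_indices(25, k=10))
print([len(test) for _, test in folds])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```

A model would be fitted on each `train` list and scored on the matching `test` list, and the ten scores averaged.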

Findings

First, the Internet search index was found to be highly important among the 20 explanatory variables for most items, so it can be used as a variable to explain purchases of agricultural and livestock products. Second, when the prediction accuracy for six agricultural and livestock products was compared across the three models, beef was the most predictable, followed by radishes, chicken, Chinese cabbage, garlic and dried peppers. By model, the decision tree showed the highest prediction accuracy, followed by linear regression and the artificial neural network.

Originality/value

This study is meaningful in that it analyzes the purchase of agricultural and livestock products using data from actual consumers' purchases of agricultural and livestock products. In addition, the use of data mining techniques and Internet search index in the analysis of agricultural and livestock purchases contributes to improving the accuracy and efficiency of agricultural and livestock purchase predictions.

Details

Data Technologies and Applications, vol. 55 no. 5
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 13 June 2019

Samia Ben Amarat and Peng Zong

Abstract

Purpose

This paper aims to present a comprehensive review of the major research areas of unmanned air vehicle (UAV) navigation, i.e. three degree-of-freedom (3D) path planning, routing algorithms and routing protocols. It also aims to provide a meaningful comparison among these algorithms and methods and to identify the best ones for particular applications.

Design/methodology/approach

The major UAV navigation research areas are further classified into categories based on methods and models. Each category is discussed in detail, together with recent research work in that domain. Performance evaluation criteria are defined separately for each category. Based on these criteria and the research challenges, research questions are proposed and then answered in the discussion in light of the literature reviewed.

Findings

The research finds that conventional and node-based algorithms are a popular choice for path planning. Similarly, graph-based methods are preferred for route planning, and hybrid routing protocols are shown to perform better. The research also identifies promising areas for future research, i.e. the critical link method for UAV path planning and queuing theory as a routing algorithm for large UAV networks.
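As a generic illustration of graph-based route planning (not a specific method from the review), Dijkstra's algorithm finds the cheapest path over a graph with non-negative edge costs. The waypoint graph and its weights are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]


def shortest_route(graph: Graph, start: str, goal: str) -> Tuple[float, List[str]]:
    """Dijkstra's algorithm: cheapest path over non-negative edge costs."""
    dist: Dict[str, float] = {start: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, start)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        if node == goal:
            break
        for nxt, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nxt, math.inf):
                dist[nxt], prev[nxt] = nd, node
                heapq.heappush(heap, (nd, nxt))
    if goal not in done:
        return math.inf, []
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]


# Hypothetical waypoint graph; edge weights could be distances or energy costs.
waypoints = {
    "base": {"A": 2.0, "B": 5.0},
    "A": {"B": 1.0, "target": 4.0},
    "B": {"target": 1.0},
}
print(shortest_route(waypoints, "base", "target"))  # (4.0, ['base', 'A', 'B', 'target'])
```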

Originality/value

The proposed work is a first attempt to provide a comprehensive study of all research aspects of UAV navigation. In addition, a comparison of these methods, algorithms and techniques based on standard performance criteria is presented for the first time.

Details

Aircraft Engineering and Aerospace Technology, vol. 91 no. 9
Type: Research Article
ISSN: 1748-8842

Article
Publication date: 4 November 2014

Ahmad Mozaffari, Nasser Lashgarian Azad and Alireza Fathi

The purpose of this paper is to demonstrate the applicability of swarm and evolutionary techniques for regularized machine learning. Generally, by defining a proper penalty…

Abstract

Purpose

The purpose of this paper is to demonstrate the applicability of swarm and evolutionary techniques to regularized machine learning. Generally, by defining a proper penalty function, regularization laws are embedded into the structure of common least-squares solutions to increase the numerical stability, sparsity, accuracy and robustness of regression weights. Several regularization techniques have been proposed so far, each with its own advantages and disadvantages, and several efforts have been made to find fast and accurate deterministic solvers for them. However, these numerical and deterministic approaches require some knowledge of mathematical programming, and they do not guarantee the global optimality of the obtained solution. In this research, the authors propose the use of constrained swarm and evolutionary techniques to cope with the demanding requirements of the regularized extreme learning machine (ELM).
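For reference, the block below gives the standard penalized least-squares forms of the classical regularizers considered in this study, written for the ELM output layer. The notation is an assumption: H is the hidden-layer output matrix, β the output weights, T the targets and λ the penalty weights. The paper's exact formulations may differ, and the cascade Lasso-Tikhonov variant is not shown.

```latex
% Assumed notation: H = hidden-layer output matrix, \beta = output weights,
% T = targets, \lambda, \lambda_1, \lambda_2 \ge 0 = penalty weights.
\begin{align*}
\text{Tikhonov (ridge):}\quad & \min_{\beta}\ \lVert H\beta - T\rVert_2^2 + \lambda\lVert\beta\rVert_2^2,
  \qquad \hat{\beta} = (H^\top H + \lambda I)^{-1} H^\top T \\
\text{Lasso:}\quad & \min_{\beta}\ \lVert H\beta - T\rVert_2^2 + \lambda\lVert\beta\rVert_1 \\
\text{Elastic net:}\quad & \min_{\beta}\ \lVert H\beta - T\rVert_2^2 + \lambda_1\lVert\beta\rVert_1 + \lambda_2\lVert\beta\rVert_2^2
\end{align*}
```

Only the Tikhonov form has a closed-form solution. The L1 term in Lasso and elastic net is non-differentiable at zero, which is one reason iterative or metaheuristic solvers are used for them.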

Design/methodology/approach

To implement the required tools for the comparative numerical study, three steps are taken. The considered algorithms include both classical and swarm and evolutionary approaches. For the classical regularization techniques, Lasso regularization, Tikhonov regularization, cascade Lasso-Tikhonov regularization and elastic net are considered. For swarm and evolutionary-based regularization, an efficient constraint handling technique known as self-adaptive penalty function constraint handling is considered. Its algorithmic structure is modified so that it can efficiently perform regularized learning. Several well-known metaheuristics are considered to check the generalization capability of the proposed scheme. To test the efficacy of the proposed constrained evolutionary-based regularization technique, a wide range of regression problems is used. Besides, the proposed framework is applied to a real-life identification problem, i.e. identifying the dominant factors affecting the hydrocarbon emissions of an automotive engine, to further verify the performance of the proposed scheme.
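The paper's self-adaptive penalty constraint-handling scheme is not described here in enough detail to reproduce. As a simpler stand-in, the sketch below uses a (1+1) evolution strategy with the 1/5th success rule to minimize an elastic-net objective directly. H, t and the penalty weights are hypothetical. With lam1 = 0 and lam2 = 1, the result can be checked against the ridge closed form.

```python
import random
from typing import Callable, List, Sequence, Tuple


def elastic_net_loss(beta: Sequence[float], H: Sequence[Sequence[float]],
                     t: Sequence[float], lam1: float, lam2: float) -> float:
    """||H beta - t||^2 + lam1 * ||beta||_1 + lam2 * ||beta||_2^2."""
    residual = sum((sum(b * h for b, h in zip(beta, row)) - ti) ** 2
                   for row, ti in zip(H, t))
    return (residual + lam1 * sum(abs(b) for b in beta)
            + lam2 * sum(b * b for b in beta))


def one_plus_one_es(loss: Callable[[List[float]], float], dim: int,
                    iters: int = 3000, sigma: float = 0.5,
                    seed: int = 0) -> Tuple[List[float], float]:
    """(1+1) evolution strategy with 1/5th-success-rule step-size adaptation."""
    rng = random.Random(seed)
    parent = [0.0] * dim
    f_parent = loss(parent)
    for _ in range(iters):
        child = [p + rng.gauss(0.0, sigma) for p in parent]
        f_child = loss(child)
        if f_child <= f_parent:  # keep the better (or equal) point
            parent, f_parent = child, f_child
            sigma *= 1.5  # success: take bolder steps
        else:
            sigma *= 1.5 ** -0.25  # failure: shrink; equilibrium at 1/5 successes
    return parent, f_parent


# Hypothetical 3x2 hidden-layer output matrix and targets.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
t = [1.0, 2.0, 3.0]
beta, _ = one_plus_one_es(lambda b: elastic_net_loss(b, H, t, 0.0, 1.0), dim=2)
print([round(b, 3) for b in beta])  # ridge closed form (H^T H + I)^-1 H^T t = [0.875, 1.375]
```

The same loop works unchanged when lam1 > 0, where no closed form exists. That is the appeal of a derivative-free search for L1-penalized objectives.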

Findings

Through an extensive numerical study, it is observed that the proposed scheme can easily be used for regularized machine learning. The results indicate that, by defining a proper objective function and an appropriate penalty function, near-globally-optimal values of the regressors can easily be obtained. The results attest to the high potential of swarm and evolutionary techniques for fast, accurate and robust regularized machine learning.

Originality/value

The originality of the research paper lies in the use of a novel constrained metaheuristic computing scheme that can be used for effective regularized optimally pruned extreme learning machine (OP-ELM). The self-adaptation of the proposed method frees the user from needing knowledge of the underlying system and also increases the degree of automation of OP-ELM. Besides, by using different types of metaheuristics, it is demonstrated that the proposed methodology is a general, flexible scheme. It can be combined with different types of swarm and evolutionary-based optimization techniques to form a regularized machine learning approach.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 7 no. 4
Type: Research Article
ISSN: 1756-378X

Details

Documents from the History of Economic Thought
Type: Book
ISBN: 978-0-7623-1423-2
