Search results
1 – 10 of 14
Ismail Hmeidi, Mahmoud Al-Ayyoub, Nizar A. Mahyoub and Mohammed A. Shehab
Abstract
Purpose
Multi-label Text Classification (MTC) is one of the most recent research trends in the data mining and information retrieval domains, for reasons such as the rapid growth of online data and the increasing tendency of internet users to be more comfortable with assigning multiple labels/tags to describe documents, emails, posts, etc. The dimensionality of labels makes MTC more difficult and challenging than traditional single-label text classification (TC). Because MTC is a natural extension of TC, several ways have been proposed to benefit from the rich literature of TC through what are called problem transformation (PT) methods. Basically, PT methods transform multi-label data into single-label data that is suitable for traditional single-label classification algorithms. Another approach is to design novel classification algorithms customized for MTC. Over the past decade, several works have appeared on both approaches, focusing mainly on the English language. This work aims to present an elaborate study of MTC of Arabic articles.
Design/methodology/approach
This paper presents a novel lexicon-based method for MTC, where the keywords that are most associated with each label are extracted from the training data along with a threshold that can later be used to determine whether each test document belongs to a certain label.
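The lexicon idea described above can be sketched roughly as follows. This is a minimal illustration, not the authors' method: the association score (the share of a word's documents that carry a label), the fixed `threshold` (the paper learns its threshold from the training data), the function names `build_lexicons` and `predict`, and the toy English documents are all assumptions.

```python
from collections import Counter

def build_lexicons(docs, top_k=2):
    """docs: list of (tokens, labels). Returns {label: set of keywords}."""
    label_counts = {}   # label -> Counter of per-word document frequencies
    total = Counter()   # word -> document frequency over all documents
    for tokens, labels in docs:
        words = set(tokens)
        total.update(words)
        for label in labels:
            label_counts.setdefault(label, Counter()).update(words)
    lexicons = {}
    for label, counts in label_counts.items():
        # Assumed association score: fraction of a word's documents that
        # carry this label; ties broken by frequency, then alphabetically.
        ranked = sorted(counts, key=lambda w: (-counts[w] / total[w], -counts[w], w))
        lexicons[label] = set(ranked[:top_k])
    return lexicons

def predict(tokens, lexicons, threshold=0.15):
    """Assign every label whose keywords cover at least `threshold` of the tokens."""
    if not tokens:
        return set()
    return {
        label
        for label, lexicon in lexicons.items()
        if sum(t in lexicon for t in tokens) / len(tokens) >= threshold
    }

# Toy training data (hypothetical, for illustration only).
train = [
    ("team wins match goal".split(), {"sports"}),
    ("match goal referee".split(), {"sports"}),
    ("minister vote election".split(), {"politics"}),
    ("election vote parliament".split(), {"politics"}),
]
lexicons = build_lexicons(train, top_k=2)
labels = predict("vote delayed after match goal".split(), lexicons)
```

Because each label is tested independently against its own lexicon, a document can receive zero, one or several labels, which is what distinguishes this from single-label classification.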
Findings
The experiments show that the presented approach outperforms the currently available approaches. Specifically, the best accuracy obtained from existing approaches is only 18 per cent, whereas the presented lexicon-based approach can reach an accuracy of 31 per cent.
Originality/value
Although there exist some tools that can be customized to address the MTC problem for Arabic text, their accuracies are very low when applied to Arabic articles. This paper presents a novel method for MTC. The experiments show that the presented approach outperforms the currently available approaches.
Sonia Osorio Angel, Adriana Peña Pérez Negrón and Aurora Espinoza-Valdez
Abstract
Purpose
Most studies on Sentiment Analysis are performed in English. However, Spanish is the third most spoken language on the Internet, and Sentiment Analysis for Spanish presents its own challenges from a semantic and syntactic point of view. This review presents an overview of the recent advances in this area.
Design/methodology/approach
A systematic literature review on Sentiment Analysis for the Spanish language was conducted on databases recognized by the research community.
Findings
Results show classification systems built on three different approaches: lexicon-based, machine learning-based and hybrid. Additionally, different linguistic resources, such as lexicons and corpora explicitly developed for the Spanish language, were found.
Originality/value
This study provides academics and professionals with a review of advances in Sentiment Analysis for the Spanish language. Most reviews on Sentiment Analysis cover English and other languages such as Chinese or Arabic, but no updated reviews were found for Spanish.
Fuli Zhou, Ming K. Lim, Yandong He and Saurabh Pratap
Abstract
Purpose
Booming e-commerce development has stimulated vehicle consumers to post individual reviews in online forums. The purpose of this paper is to probe into vehicle consumers' consumption behavior and make recommendations for potential consumers based on textual comments.
Design/methodology/approach
A big data analytics-based approach is designed to discover vehicle consumers' consumption behavior from an online perspective. To reduce the subjectivity of expert-based approaches, a parallel Naïve Bayes approach is designed to perform the sentiment analysis, and the Saaty scale-based (SSC) scoring rule is employed to obtain the specific sentiment value of each attribute class, contributing to multi-grade sentiment classification. To achieve intelligent recommendation for potential vehicle customers, a novel SSC-VIKOR approach is developed to prioritize vehicle brand candidates from a big data analytics viewpoint.
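The Naïve Bayes sentiment step mentioned above can be illustrated with a minimal, single-process multinomial sketch. It is not the authors' parallel implementation, and it leaves out the SSC scoring rule and the SSC-VIKOR ranking entirely; the function names, the Laplace (add-one) smoothing and the toy reviews are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(samples):
    """samples: list of (tokens, label). Returns (log_priors, word_counts, vocab)."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    n_docs = sum(class_docs.values())
    log_priors = {c: math.log(k / n_docs) for c, k in class_docs.items()}
    return log_priors, word_counts, vocab

def classify(tokens, model):
    """Return the label with the highest log posterior under add-one smoothing."""
    log_priors, word_counts, vocab = model
    best_label, best_score = None, -math.inf
    for label, log_prior in log_priors.items():
        total = sum(word_counts[label].values())
        score = log_prior
        for token in tokens:
            if token in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][token] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy vehicle reviews (hypothetical, for illustration only).
reviews = [
    ("great fuel economy and low price".split(), "positive"),
    ("comfortable seats great value".split(), "positive"),
    ("noisy engine and poor brakes".split(), "negative"),
    ("poor value high repair cost".split(), "negative"),
]
model = train_nb(reviews)
verdict = classify("great value low price".split(), model)
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and the smoothing keeps a single unseen word-class pair from zeroing out a class.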
Findings
The big data analytics indicate that “cost-effectiveness” is the most important factor that vehicle consumers care about, and the data mining results enable automakers to better understand consumer consumption behavior.
Research limitations/implications
The case study illustrates the effectiveness of the integrated method, contributing to much more precise operations management on marketing strategy, quality improvement and intelligent recommendation.
Originality/value
Research on consumer consumption behavior is usually survey-based, and most previous studies of comment analysis focus on binary analysis. The hybrid SSC-VIKOR approach is developed to fill this gap from the big data perspective.
Javiera M. Guedes, Akinbami Akinwale and María Requemán Fontecha
Abstract
Content marketing is a crucial aspect of digital marketing in modern firms. By generating content that is interesting and engaging, companies have the two-fold advantage of promoting their products in a relatable way, while increasing familiarity and engagement with the brand. As data scientists at Credit Suisse, we value our content teams because their voice is the bank's voice. We strive to provide them with the best tools to increase their articles' success. With the help of machine learning, we have created digital products that allow them to improve articles before publication, recommend them to the most interested readers, and track their performance. The chapter begins with a brief introduction to content marketing, followed by an overview of our data, a review of the business challenges we have encountered, and the machine learning solutions we have developed in order to provide the best data insights to our internal and external stakeholders. We close the chapter with a brief summary of our work.
Meng Zhao, Mengjiao Liu, Chang Xu and Chenxi Zhang
Abstract
Purpose
This study aims to provide a method for classifying travellers’ requirements to help hoteliers understand travellers’ requirements and improve hotel services. Specifically, this study develops a strength-frequency Kano (SF-Kano) model to classify the requirements expressed by travellers in online reviews.
Design/methodology/approach
The strength and frequency of travellers’ requirements are determined through sentiment and statistical analyses of 13,217 crawled online reviews. A method that considers the interaction between strength and frequency is proposed to classify the different travellers’ requirements.
Findings
This study identifies 13 travellers’ requirements by mining online reviews. According to the results of the improved Kano model, six travellers’ requirements are one-dimensional requirements, two are must-be requirements, three are attractive requirements and two are indifferent requirements.
Research limitations/implications
Results of this research can guide hoteliers to address hotel service improvement strategies according to the types of travellers’ requirements. This study can also expand the analysis scope of hotel online reviews and provide a reference for hoteliers to understand travellers’ requirements.
Originality/value
By mining online reviews, this study proposes an SF-Kano model to classify travellers’ requirements by considering both the strength and frequency of requirements. This study uses the optimisation model to determine the classification thresholds. This process maximises travellers’ satisfaction at the lowest cost. The classification results of travellers’ requirements can help hoteliers gain a deeper understanding of travellers’ requirements and prioritise service improvements.
Issa Alsmadi and Keng Hoon Gan
Abstract
Purpose
Rapid developments in social networks and their usage in everyday life have caused an explosion in the number of short electronic documents. Thus, the need to classify this type of document by content has significant implications for many applications. Classifying these documents into relevant classes according to their text content is of interest for many practical reasons. Short-text classification is an essential step in many applications, such as spam filtering, sentiment analysis, Twitter personalization, customer review and many other applications related to social networks. Reviews on short text and its applications are limited. Thus, this paper aims to discuss the characteristics of short text and the challenges and difficulties in classifying it. The paper attempts to introduce all the principal stages of classification, the techniques used in each stage and the possible development trends in each stage.
Design/methodology/approach
The paper is a review of the main aspects of short-text classification. It is structured according to the stages of the classification task.
Findings
This paper discusses related issues and approaches to these problems. Further research could be conducted to address the challenges in short texts and avoid poor accuracy in classification. Low performance can be addressed by using optimization techniques, such as genetic algorithms, which are powerful in enhancing the quality of selected features. Soft computing solutions, such as fuzzy logic, make short-text problems a promising area of research.
Originality/value
Using a powerful short-text classification method significantly affects many applications in terms of efficiency enhancement. Current solutions still have low performance, implying the need for improvement. This paper discusses related issues and approaches to these problems.
Yi-Hung Liu, Sheng-Fong Chen and Dan-Wei (Marian) Wen
Abstract
Purpose
Online medical repositories provide a platform for users to share information and dynamically access abundant electronic health data. It is important to determine whether case report information can assist the general public in appropriately managing their diseases. Therefore, this paper aims to introduce a novel deep learning-based method that allows non-professionals to make inquiries using ordinary vocabulary, retrieving the most relevant case reports for accurate and effective health information.
Design/methodology/approach
The dataset of case reports was collected from both the patient-generated research network and the digital medical journal repository. To enhance the accuracy of obtaining relevant case reports, the authors propose a retrieval approach that combines BERT and BiLSTM methods. The authors identified representative health-related case reports and analyzed the retrieval performance, as well as user judgments.
Findings
This study aims to provide the necessary functionalities to deliver relevant health case reports based on input from ordinary terms. The proposed framework includes features for health management, user feedback acquisition and ranking by weights to obtain the most pertinent case reports.
Originality/value
This study contributes to health information systems by analyzing patients' experiences and treatments with the case report retrieval model. The results of this study can provide immense benefit to the general public who intend to find treatment decisions and experiences from relevant case reports.
Shrawan Kumar Trivedi and Shubhamoy Dey
Abstract
Purpose
To be sustainable and competitive in the current business environment, it is useful to understand users’ sentiment towards products and services. This critical task can be achieved via natural language processing and machine learning classifiers. This paper aims to propose a novel probabilistic committee selection classifier (PCC) to analyse and classify the sentiment polarities of movie reviews.
Design/methodology/approach
An Indian movie review corpus is assembled for this study. Another publicly available movie review polarity corpus is also used to validate the results. The greedy stepwise search method is used to extract the features/words of the reviews. The performance of the proposed classifier is measured using different metrics, such as F-measure, false positive rate, receiver operating characteristic (ROC) curve and training time. Further, the proposed classifier is compared with other popular machine-learning classifiers, such as Bayesian, Naïve Bayes, Decision Tree (J48), Support Vector Machine and Random Forest.
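Two of the metrics named above, F-measure and false positive rate, follow directly from the confusion-matrix counts. The sketch below uses the standard definitions (F-measure taken as the balanced F1 score); the function name `binary_metrics`, the label strings and the toy predictions are hypothetical and are not results from the study.

```python
def binary_metrics(y_true, y_pred, positive="pos"):
    """Compute F-measure (F1) and false positive rate for a binary task."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1   # correctly flagged positive
            else:
                fp += 1   # negative review wrongly flagged positive
        else:
            if truth == positive:
                fn += 1   # positive review missed
            else:
                tn += 1   # correctly flagged negative
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    false_positive_rate = fp / (fp + tn) if fp + tn else 0.0
    return {"f_measure": f_measure, "false_positive_rate": false_positive_rate}

# Toy labels (hypothetical): 3 true positives, 1 miss, 1 false alarm, 3 true negatives.
y_true = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg"]
y_pred = ["pos", "pos", "pos", "neg", "pos", "neg", "neg", "neg"]
metrics = binary_metrics(y_true, y_pred)
```

A low false positive rate means few negative reviews are mistaken for positive ones, which is the sense in which the Findings describe the classifier as efficient at identifying positive sentiment.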
Findings
The results of this study show that the proposed classifier is good at predicting the positive or negative polarity of movie reviews. The accuracy and the ROC curve value of the PCC are found to be the most suitable among all the classifiers tested in this study. This classifier is also found to be efficient at identifying positive sentiments of reviews, where it gives low false positive rates for both the Indian Movie Review and Review Polarity corpora used in this study. The training time of the proposed classifier is found to be slightly higher than that of Bayesian, Naïve Bayes and J48.
Research limitations/implications
Only movie review sentiments written in English are considered. In addition, the proposed committee selection classifier is prepared only using the committee of probabilistic classifiers; however, other classifier committees can also be built, tested and compared with the present experiment scenario.
Practical implications
In this paper, a novel probabilistic approach is proposed and used for classifying movie reviews, and is found to be highly effective in comparison with other state-of-the-art classifiers. This classifier may be tested for different applications and may provide new insights for developers and researchers.
Social implications
The proposed PCC may be used to classify different product reviews, and hence may be beneficial to organizations to justify users’ reviews about specific products or services. By using authentic positive and negative sentiments of users, the credibility of the specific product, service or event may be enhanced. PCC may also be applied to other applications, such as spam detection, blog mining, news mining and various other data-mining applications.
Originality/value
The constructed PCC is novel and was tested on Indian movie review data.
Fatima Zohra Ennaji, Abdelaziz El Fazziki, Hasna El Alaoui El Abdallaoui, Djamal Benslimane and Mohamed Sadgal
Abstract
Purpose
The purpose of this paper is to bring together textual and multimedia opinions, since the use of social data has become the new trend that makes it possible to gather the reputation of products discussed in social media. Integrating a product reputation process into a company's strategy will bring several benefits, such as helping in decision-making regarding the current and the new generation of the product by understanding the customers’ needs. However, image-centric sentiment analysis has received much less attention than text-based sentiment detection.
Design/methodology/approach
In this work, the authors propose a multimedia content-based product reputation framework that helps in detecting opinions from social media. In this framework, a publication is analyzed by combining its textual and multimedia parts.
Findings
To test the effectiveness of the proposed framework, a case study based on YouTube videos has been established, as it brings together the image, the audio and the video processing at the same time.
Originality/value
The key novelty is the inclusion of multimedia content in addition to textual content, with the goal of gathering opinions about a certain product. The multimedia analysis brings together facial sentiment detection, printed text analysis, opinion detection from speech and textual opinion analysis.
Ruichen Ge, Sha Zhang and Hong Zhao
Abstract
Purpose
Extant research shows mixed results on the impact of expressed negative emotions on donations in online charitable crowdfunding. This study solves the puzzle by examining how different types of negative emotions (i.e. sadness, anxiety and fear) expressed in crowdfunding project descriptions affect donations.
Design/methodology/approach
Data on 15,653 projects across four categories (medical assistance, education assistance, disaster assistance and poverty assistance) from September 2013 to May 2019 come from a leading online crowdfunding platform in China. Text analysis and regression models serve to test the hypotheses.
Findings
In the medical assistance category, the expression of sadness has an inverted U-shaped effect on donations, while the expression of anxiety has a negative effect. An appropriate number of sadness words is helpful, but sadness words should not appear more than five times. In the education assistance and disaster assistance categories, the expression of sadness has a positive effect on donations, but disclosure of anxiety and fear has no influence on donations. Expressions of sadness, anxiety and fear have no impact on donations in the poverty assistance category.
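An inverted U-shaped effect is commonly modelled by adding a squared term to a regression, so that the predicted outcome is b0 + b1*x + b2*x^2 with b2 < 0 and the peak sits at x = -b1 / (2*b2). The sketch below only illustrates that arithmetic; the abstract does not give the study's model specification, and the coefficients used here are made up rather than taken from the paper.

```python
def predicted_effect(x, b0, b1, b2):
    """Quadratic regression prediction: b0 + b1*x + b2*x^2."""
    return b0 + b1 * x + b2 * x * x

def peak_of_inverted_u(b1, b2):
    """Turning point of the quadratic; only an inverted U when b2 < 0."""
    if b2 >= 0:
        raise ValueError("an inverted U-shape requires a negative squared-term coefficient")
    return -b1 / (2 * b2)

# Made-up coefficients for illustration (not estimates from the study).
b0, b1, b2 = 1.0, 3.0, -0.5
peak = peak_of_inverted_u(b1, b2)
effect_at_peak = predicted_effect(peak, b0, b1, b2)
```

Below the peak, each additional unit of x raises the prediction; beyond it, further increases lower the prediction, which is the pattern the Findings describe for sadness words in medical assistance projects.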
Research limitations/implications
This work has important implications for fundraisers on how to regulate their expressions of negative emotions in a project's description to attract donations. These insights are also relevant for online crowdfunding platforms.
Originality/value
Online crowdfunding research often studies negative emotions as a whole and does not differentiate project types. The current work contributes by empirically testing the impact of three types of negative emotions on donations across four major online crowdfunding categories.