Search results

1 – 10 of 866
Article
Publication date: 12 April 2022

Mengjuan Zha, Changping Hu and Yu Shi

Abstract

Purpose

A sentiment lexicon is an essential resource for sentiment analysis of user reviews. So far, there is still no large-scale, high-accuracy domain sentiment lexicon for Chinese book reviews. This paper aims to construct a large-scale sentiment lexicon based on ultrashort reviews of Chinese books.

Design/methodology/approach

First, large-scale ultrashort reviews of Chinese books, whose length is no more than six Chinese characters, are collected and preprocessed as candidate sentiment words. Second, non-sentiment words are filtered out through certain rules, such as part of speech rules, context rules, feature word rules and user behaviour rules. Third, the relative frequency is used to select and judge the polarity of sentiment words. Finally, the performance of the sentiment lexicon is evaluated through experiments.
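
The length filter and relative-frequency step described above might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how reviews are labelled or how relative frequency is turned into a polarity decision, so the `pos`/`neg` labels (for example derived from star ratings), the `min_count` and `margin` thresholds, and the scoring formula are all hypothetical. The rule-based filtering of non-sentiment words is omitted.

```python
from collections import Counter

MAX_LEN = 6  # ultrashort review: at most six characters (from the abstract)


def build_lexicon(reviews, min_count=2, margin=0.2):
    """Build a toy polarity lexicon from (text, label) pairs.

    label is "pos" or "neg" (assumed to come from ratings).
    min_count and margin are illustrative thresholds, not the paper's values.
    """
    pos, neg = Counter(), Counter()
    for text, label in reviews:
        text = text.strip()
        # Keep only ultrashort reviews; each one is a candidate sentiment word.
        if 0 < len(text) <= MAX_LEN:
            (pos if label == "pos" else neg)[text] += 1

    n_pos = sum(pos.values()) or 1
    n_neg = sum(neg.values()) or 1

    lexicon = {}
    for word in set(pos) | set(neg):
        if pos[word] + neg[word] < min_count:
            continue  # too rare to judge
        rf_pos = pos[word] / n_pos  # relative frequency among positive reviews
        rf_neg = neg[word] / n_neg  # relative frequency among negative reviews
        score = (rf_pos - rf_neg) / (rf_pos + rf_neg)  # in [-1, 1]
        if score >= margin:
            lexicon[word] = "positive"
        elif score <= -margin:
            lexicon[word] = "negative"
    return lexicon
```

Candidates whose relative frequencies are similar in both classes fall inside the margin and are left out of the lexicon rather than forced into a polarity.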

Findings

This paper proposes a method of sentiment lexicon construction based on ultrashort reviews and successfully builds a lexicon of nearly 40,000 words for Chinese books based on reviews from Douban Book.

Originality/value

Compared with the idea of constructing a sentiment lexicon based on a small number of reviews, the proposed method can give full play to the advantages of data scale to build a corpus. Moreover, different from the computer segmentation method, this method helps to avoid the problems caused by immature segmentation technology and an imperfect N-gram language model.

Details

The Electronic Library, vol. 40 no. 3
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 17 May 2021

Sayeh Bagherzadeh, Sajjad Shokouhyar, Hamed Jahani and Marianna Sigala

Abstract

Purpose

Research analyzing online travelers’ reviews has boomed over the past years, but it lacks efficient methodologies that can provide useful end-user value within time and budget. This study aims to contribute to the field by developing and testing a new methodology for sentiment analysis that surpasses the standard dictionary-based method by creating two hotel-specific word lexicons.

Design/methodology/approach

Big data of hotel customer reviews posted on the TripAdvisor platform were collected and appropriately prepared for conducting a binary sentiment analysis by developing a novel bag-of-words weighted approach. The latter provides a transparent and replicable procedure to prepare, create and assess lexicons for sentiment analysis. This approach resulted in two lexicons (a weighted lexicon, L1 and a manually selected lexicon, L2), which were tested and validated by applying classification accuracy metrics to the TripAdvisor big data. Two popular methodologies (a public dictionary-based method and a complex machine-learning algorithm) were used for comparing the accuracy metrics of the study’s approach for creating the two lexicons.
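
A weighted bag-of-words lexicon for binary sentiment, together with an accuracy check, could look roughly like the sketch below. The abstract does not give the weighting formula, so the smoothed log-ratio of document counts used here is an assumption, and the function names `learn_weights`, `classify` and `accuracy` are hypothetical.

```python
import math
from collections import Counter


def learn_weights(reviews):
    """Learn one weight per word from (tokens, label) pairs.

    label is 1 (positive) or 0 (negative). The weight is a smoothed
    log-ratio of document counts, an illustrative choice only.
    """
    pos, neg = Counter(), Counter()
    for tokens, label in reviews:
        # Count each word once per review (bag-of-words, presence only).
        (pos if label == 1 else neg).update(set(tokens))
    return {
        w: math.log((pos[w] + 1) / (neg[w] + 1))
        for w in set(pos) | set(neg)
    }


def classify(tokens, weights):
    """Return 1 if the summed word weights are positive, else 0."""
    return 1 if sum(weights.get(t, 0.0) for t in tokens) > 0 else 0


def accuracy(reviews, weights):
    """Share of reviews whose predicted label matches the given label."""
    correct = sum(classify(tokens, weights) == label for tokens, label in reviews)
    return correct / len(reviews)
```

In practice the accuracy would be measured on held-out reviews rather than on the reviews used to learn the weights.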

Findings

The results of the accuracy metrics confirmed that the study’s methodology significantly outperforms the dictionary-based method in comparison to the machine-learning algorithm method. The findings also provide evidence that the study’s methodology is generalizable for predicting users’ sentiment.

Practical implications

The study developed and validated a methodology for generating reliable lexicons that can be used for big data analysis aiming to understand and predict customers’ sentiment. The L2 hotel dictionary generated by the study provides a reliable method and a useful tool for analyzing guests’ feedback and enabling managers to understand, anticipate and re-actively respond to customers’ attitudes and changes. The study also proposed a simplified methodology for understanding the sentiment of each user, which, in turn, can be used for conducting comparisons aiming to detect and understand guests’ sentiment changes across time, as well as across users based on their profiles and experiences.

Originality/value

This study contributes to the field by proposing and testing a new methodology for conducting sentiment analysis that addresses previous methodological limitations, as well as the contextual specificities of the tourism industry. Based on the paper’s literature review, this is the first research study using a bag-of-words approach for conducting a sentiment analysis and creating a field-specific lexicon.

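A generalizable sentiment analysis method for creating a hotel dictionary: a big data analysis using TripAdvisor hotel reviews as the sample

Abstract

Purpose

Research on online travellers' reviews has grown steadily over the past few years, but effective methods that deliver end-user value within limited time and budget are still lacking. This paper develops and tests a new sentiment analysis method that creates two hotel-related lexicons and surpasses the standard dictionary-based method.

Design/methodology/approach

The sample consists of big data of hotel customer reviews on TripAdvisor, on which a binary sentiment analysis is carried out by developing a novel weighted bag-of-words approach. This approach offers a transparent and replicable procedure for preparing, creating and testing lexicons for sentiment analysis. It yields two lexicons (a weighted lexicon, L1, and a manually selected lexicon, L2), which are tested and validated by applying classification accuracy metrics to the TripAdvisor big data. Two popular methods (a public dictionary-based method and a complex machine-learning algorithm) are used to compare the accuracy of the lexicons.

Findings

The accuracy comparison confirms that, relative to the machine-learning algorithm, the paper's method significantly outperforms the dictionary-based method. The results also indicate that the method can be generalized to predicting users' sentiment.

Practical implications

This paper develops and validates a method that creates reliable lexicons for big data analysis in order to determine users' sentiment. The L2 hotel lexicon created in this paper is a reliable and useful tool for analysing guest feedback, and it can also help hotel managers understand, anticipate and actively respond to guests' attitudes and changes. The paper also proposes a simple method for understanding each user's sentiment, which can be used in comparisons to detect and understand changes in guests' sentiment over time, as well as differences between users with different backgrounds and experiences.

Originality/value

This paper proposes and tests a new sentiment analysis method that addresses the limitations of earlier methods and is grounded in the tourism industry. Based on the literature review, this is the first study to use a bag-of-words approach to conduct sentiment analysis and create a field-specific lexicon.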

Details

Journal of Hospitality and Tourism Technology, vol. 12 no. 2
Type: Research Article
ISSN: 1757-9880

Article
Publication date: 2 January 2020

Futao Zhao, Zhong Yao, Jing Luan and Hao Liu

Abstract

Purpose

The purpose of this paper is to propose a methodology to construct a stock market sentiment lexicon by incorporating domain-specific knowledge extracted from diverse Chinese media outlets.

Design/methodology/approach

This paper presents a novel method to automatically generate financial lexicons using a unique data set that comprises news articles, analyst reports and social media. Specifically, a novel method based on keyword extraction is used to build a high-quality seed lexicon and an ensemble mechanism is developed to integrate the knowledge derived from distinct language sources. Meanwhile, two different methods, Pointwise Mutual Information and Word2vec, are applied to capture word associations. Finally, an evaluation procedure is performed to validate the effectiveness of the method compared with four traditional lexicons.
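
To illustrate how Pointwise Mutual Information can capture word associations with a seed lexicon, here is a minimal sketch. It uses document-level co-occurrence and a Turney-style semantic-orientation score; both choices, the function names and the handling of unseen pairs are assumptions for illustration and are not taken from the paper. The Word2vec and ensemble steps are not shown.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(docs):
    """Return a function pmi(a, b) based on document-level co-occurrence.

    docs is a list of token lists (for example segmented headlines).
    """
    n = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing each word pair
    for tokens in docs:
        vocab = sorted(set(tokens))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))  # pairs are stored in sorted order

    def pmi(a, b):
        key = (a, b) if a < b else (b, a)
        joint = pair_df[key]
        if joint == 0:
            return 0.0  # simplification: unseen pairs count as no association
        # log2( P(a, b) / (P(a) * P(b)) ) with document frequencies
        return math.log2(joint * n / (word_df[a] * word_df[b]))

    return pmi


def so_pmi(word, pos_seeds, neg_seeds, pmi):
    """Association with positive seed words minus association with negative ones."""
    return sum(pmi(word, s) for s in pos_seeds) - sum(pmi(word, s) for s in neg_seeds)
```

A candidate word with a clearly positive score would be added to the positive side of the lexicon, and one with a clearly negative score to the negative side; where to draw those cut-offs is a tuning decision.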

Findings

The experimental results from the three real-world testing data sets show that the ensemble lexicons can significantly improve sentiment classification performance compared with the four baseline lexicons, suggesting the usefulness of leveraging knowledge derived from diverse media in domain-specific lexicon generation and corresponding sentiment analysis tasks.

Originality/value

This work appears to be the first to construct financial sentiment lexicons from over 2 million posts and headlines collected from more than one language source. Furthermore, the authors believe that the data set established in this study is one of the largest corpora used for Chinese stock market lexicon acquisition. This work is valuable for extracting collective sentiment from multiple media sources and providing decision-making support for stock market participants.

Details

Industrial Management & Data Systems, vol. 120 no. 3
Type: Research Article
ISSN: 0263-5577

Article
Publication date: 2 September 2019

Guellil Imane, Darwish Kareem and Azouaou Faical

Abstract

Purpose

This paper aims to propose an approach to automatically annotate a large corpus in Arabic dialect. This corpus is used to analyse the sentiments of Arabic users on social media. It focuses on the Algerian dialect, which is a sub-dialect of Maghrebi Arabic. Although Algerian is spoken by roughly 40 million speakers, few studies address its automated processing in general and sentiment analysis in particular.

Design/methodology/approach

The approach is based on the construction and use of a sentiment lexicon to automatically annotate a large corpus of Algerian text that is extracted from Facebook. This approach significantly increases the size of the training corpus without resorting to manual annotation. The annotated corpus is then vectorized using document embedding (doc2vec), which is an extension of word embeddings (word2vec). For sentiment classification, the authors used different classifiers such as support vector machines (SVM), Naive Bayes (NB) and logistic regression (LR).
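
Lexicon-based automatic annotation with a confidence threshold can be sketched as below. The abstract does not define the quantity to which the threshold is applied, so the normalized score used here (mean polarity of matched lexicon words) is an assumption, as are the whitespace tokenization and the function name `annotate`. The doc2vec vectorization and the classifiers are not shown.

```python
def annotate(messages, lexicon, threshold=0.6):
    """Label messages using a polarity lexicon and keep only confident ones.

    lexicon maps a word to +1 (positive) or -1 (negative).
    Messages whose score is weaker than the threshold are left unlabelled
    and therefore excluded from the training set.
    """
    labelled = []
    for msg in messages:
        hits = [lexicon[t] for t in msg.lower().split() if t in lexicon]
        if not hits:
            continue  # no sentiment words matched: cannot annotate
        score = sum(hits) / len(hits)  # mean polarity, in [-1, 1]
        if score >= threshold:
            labelled.append((msg, "positive"))
        elif score <= -threshold:
            labelled.append((msg, "negative"))
    return labelled
```

Raising the threshold keeps fewer but cleaner training messages, while lowering it keeps more messages with noisier labels.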

Findings

The results suggest that NB and SVM classifiers generally led to the best results and MLP generally had the worst results. Further, the threshold that the authors use in selecting messages for the training set had a noticeable impact on recall and precision, with a threshold of 0.6 producing the best results. Using PV-DBOW led to slightly higher results than using PV-DM. Combining PV-DBOW and PV-DM representations led to slightly lower results than using PV-DBOW alone. The best results were obtained by the NB classifier with F1 up to 86.9 per cent.

Originality/value

The principal originality of this paper is to determine the right parameters for automatically annotating an Algerian dialect corpus. This annotation is based on a sentiment lexicon that was also constructed automatically.

Details

International Journal of Web Information Systems, vol. 15 no. 5
Type: Research Article
ISSN: 1744-0084

Open Access
Article
Publication date: 31 July 2020

Omar Alqaryouti, Nur Siyam, Azza Abdel Monem and Khaled Shaalan

Abstract

Digital resources such as smart application reviews and online feedback information are important sources of customers’ feedback and input. This paper aims to help government entities gain insights into the needs and expectations of their customers. Towards this end, we propose an aspect-based sentiment analysis hybrid approach that integrates domain lexicons and rules to analyse reviews of the entities’ smart apps. The proposed model aims to extract the important aspects from the reviews and classify the corresponding sentiments. This approach adopts language processing techniques, rules and lexicons to address several sentiment analysis challenges and produce summarized results. According to the reported results, the aspect extraction accuracy improves significantly when the implicit aspects are considered. The integrated classification model outperforms the lexicon-based baseline and the other rule combinations by 5% in terms of accuracy on average. In addition, when using the same dataset, the proposed approach outperforms machine learning approaches that use a support vector machine (SVM). However, using these lexicons and rules as input features to the SVM model achieves higher accuracy than other SVM models.

Details

Applied Computing and Informatics, vol. 20 no. 1/2
Type: Research Article
ISSN: 2634-1964

Article
Publication date: 16 December 2019

Chihli Hung and You-Xin Cao

Abstract

Purpose

This paper aims to propose a novel approach which integrates collocations and domain concepts for Chinese cosmetic word of mouth (WOM) sentiment classification. Most sentiment analysis works by collecting sentiment scores from each unigram or bigram. However, not every unigram or bigram in a WOM document contains sentiments. Chinese collocations carry the main sentiments of WOM. This paper reduces the complexity of the document dimensionality and improves sentiment classification.

Design/methodology/approach

This paper builds two contextual lexicons for feature words and sentiment words, respectively. Based on these contextual lexicons, this paper uses the techniques of associated rules and mutual information to build possible Chinese collocation sets. This paper applies preference vector modelling as the vector representation approach to catch the relationship between Chinese collocations and their associated concepts.

Findings

This paper compares the proposed preference vector models with benchmarks, using three classification techniques (i.e. support vector machine, J48 decision tree and multilayer perceptron). According to the experimental results, the proposed models outperform all benchmarks evaluated by the criterion of accuracy.

Originality/value

This paper focuses on Chinese collocations and proposes a novel research approach for sentiment classification. The Chinese collocations used in this paper are adaptable to the content and domains. Finally, this paper integrates collocations with the preference vector modelling approach, which not only achieves a better sentiment classification performance for Chinese WOM documents but also avoids the curse of dimensionality.

Details

The Electronic Library, vol. 38 no. 1
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 8 July 2020

Yasir Mehmood and Vimala Balakrishnan

Abstract

Purpose

Research on sentiment analysis has mostly been conducted on products and services, resulting in a scarcity of studies focusing on social issues, which may require different mechanisms due to the nature of the issue itself. This paper aims to address this gap by developing an enhanced lexicon-based approach.

Design/methodology/approach

An enhanced lexicon-based approach was employed using General Inquirer, incorporated with multi-level grammatical dependencies and the role of verb. Data on illegal immigration were gathered from Twitter for a period of three months, resulting in 694,141 tweets. Of these, 2,500 tweets were segregated into two datasets for evaluation purposes after filtering and pre-processing.

Findings

The enhanced approach outperformed ten online sentiment analysis tools, with overall accuracies of 81.4% and 82.3% for datasets 1 and 2, respectively.

Originality/value

The study is novel in that it uses data pertaining to a social issue rather than products and services; such data require different mechanisms due to the nature of the issue itself.

Details

Online Information Review, vol. 44 no. 5
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 16 March 2021

P. Padmavathy, S. Pakkir Mohideen and Zameer Gulzar

Abstract

Purpose

The purpose of this paper is to first compute word polarity using SentiWordNet (SWN) and pointwise mutual information (PMI) and then update the polarity on that basis. When the SWN-based and PMI-based polarities do not match, the vote flipping algorithm (VFA) is employed.

Design/methodology/approach

Recently, research on sentiment analysis (SA) has increased massively in domains such as social media (SM), healthcare, hotels, cars and product data. However, there is no approach for analyzing the positive or negative orientation of every single aspect in a document (a tweet, a review or a piece of news, among others). For SA as well as polarity classification, several researchers have used SWN as a lexical resource. Nevertheless, such general lexicons show lower performance for sentiment classification (SC) than domain-specific lexicons (DSL). Likewise, in some scenarios, the same term is used differently in domain and general knowledge lexicons. Across different domains, many words have one sentiment class in SWN while their occurrence in the annotated data set shows a strong inclination towards the other sentiment class. Hence, this paper chiefly concentrates on the drawbacks of adapting a domain-dependent sentiment lexicon (DDSL) from a collection of labeled user reviews and a domain-independent lexicon (DIL), and proposes a framework centered on information theory that can predict the correct polarity of words (positive, neutral and negative). The proposed work first performs SWN- and PMI-based polarity computation and then updates the polarity on that basis. When the SWN-based and PMI-based polarities do not match, the vote flipping algorithm (VFA) is employed. Finally, the predicted polarity is input to the mtf-idf-based SVM-NN classifier for the SC of reviews. The outcomes are examined and compared with other existing techniques to verify that the proposed work predicts the class of the reviews more effectively for different datasets.

Findings

There is no approach for analyzing the positive or negative orientation of every single aspect in a document (a tweet, a review or a piece of news, among others). For SA as well as polarity classification, several researchers have used SWN as a lexical resource. Nevertheless, such general lexicons show lower performance for sentiment classification (SC) than domain-specific lexicons (DSL). Likewise, in some scenarios, the same term is used differently in domain and general knowledge lexicons. Across different domains, many words have one sentiment class in SWN while their occurrence in the annotated data set shows a strong inclination towards the other sentiment class.

Originality/value

The proposed work first performs SWN- and PMI-based polarity computation and then updates the polarity on that basis. When the SWN-based and PMI-based polarities do not match, the vote flipping algorithm (VFA) is employed.

Article
Publication date: 1 June 2015

Yuki Yamamoto, Tadahiko Kumamoto and Akiyo Nadamoto

Abstract

Purpose

The purpose of this paper is to propose a method of calculating the sentiment value of a tweet based on the emoticon role.

Design/methodology/approach

Classification of emoticon roles as four types showing “emphasis”, “assuagement”, “conversion” and “addition”, with roles determined based on the respective relations to sentiment of sentences and emoticons.

Findings

Clustering of users of four types based on emoticon sentiment.

Originality/value

Formalization, using regression analysis, of the relation of sentiment between sentences and emoticons in all roles.

Details

International Journal of Pervasive Computing and Communications, vol. 11 no. 2
Type: Research Article
ISSN: 1742-7371

Article
Publication date: 11 September 2017

Chedia Dhaoui, Cynthia M. Webster and Lay Peng Tan

Abstract

Purpose

With the soaring volumes of brand-related social media conversations, digital marketers have extensive opportunities to track and analyse consumers’ feelings and opinions about brands, products or services embedded within consumer-generated content (CGC). These “Big Data” opportunities render manual approaches to sentiment analysis impractical and raise the need to develop automated tools to analyse consumer sentiment expressed in text format. This paper aims to evaluate and compare the performance of two prominent approaches to automated sentiment analysis applied to CGC on social media and explores the benefits of combining them.

Design/methodology/approach

A sample of 850 consumer comments from 83 Facebook brand pages is used to test and compare lexicon-based and machine learning approaches to sentiment analysis, as well as their combination, using the LIWC2015 lexicon and the RTextTools machine learning package.

Findings

Results show the two approaches are similar in accuracy, both achieving higher accuracy when classifying positive sentiment than negative sentiment. However, they differ substantially in their classification ensembles. The combined approach demonstrates significantly improved performance in classifying positive sentiment.

Research limitations/implications

Further research is required to improve the accuracy of negative sentiment classification. The combined approach needs to be applied to other kinds of CGCs on social media such as tweets.

Practical implications

The findings inform decision-making about which sentiment analysis approach (or combination of approaches) is best suited to analysing CGC on social media.

Originality/value

This study combines two sentiment analysis approaches and demonstrates significantly improved performance.

Details

Journal of Consumer Marketing, vol. 34 no. 6
Type: Research Article
ISSN: 0736-3761
