Search results

1 – 10 of over 3000
Article
Publication date: 17 May 2021

Sayeh Bagherzadeh, Sajjad Shokouhyar, Hamed Jahani and Marianna Sigala

Research analyzing online travelers’ reviews has boomed over the past years, but it lacks efficient methodologies that can provide useful end-user value within time and budget…


Abstract

Purpose

Research analyzing online travelers’ reviews has boomed over the past years, but it lacks efficient methodologies that can provide useful end-user value within time and budget. This study aims to contribute to the field by developing and testing a new methodology for sentiment analysis that surpasses the standard dictionary-based method by creating two hotel-specific word lexicons.

Design/methodology/approach

Big data of hotel customer reviews posted on the TripAdvisor platform were collected and appropriately prepared for conducting a binary sentiment analysis by developing a novel bag-of-words weighted approach. The latter provides a transparent and replicable procedure to prepare, create and assess lexicons for sentiment analysis. This approach resulted in two lexicons (a weighted lexicon, L1 and a manually selected lexicon, L2), which were tested and validated by applying classification accuracy metrics to the TripAdvisor big data. Two popular methodologies (a public dictionary-based method and a complex machine-learning algorithm) were used for comparing the accuracy metrics of the study’s approach for creating the two lexicons.
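The weighted bag-of-words idea can be sketched as follows: weight each word by how much more often it appears in positive than in negative reviews, then score a review by summing the weights of its words. The toy reviews and the relative-frequency weighting below are illustrative assumptions, not the paper's exact procedure.

```python
from collections import Counter

# Toy labelled reviews; in the paper these are TripAdvisor hotel reviews
# labelled positive/negative from their ratings (an assumption here).
reviews = [
    ("great clean room friendly staff", 1),
    ("great location friendly service", 1),
    ("dirty room rude staff", 0),
    ("noisy dirty bathroom rude service", 0),
]

def build_weighted_lexicon(reviews):
    """Weight each word by its relative frequency in positive vs negative reviews."""
    pos, neg = Counter(), Counter()
    for text, label in reviews:
        (pos if label == 1 else neg).update(text.split())
    lexicon = {}
    for word in set(pos) | set(neg):
        p, n = pos[word], neg[word]
        lexicon[word] = (p - n) / (p + n)   # +1 purely positive, -1 purely negative
    return lexicon

def classify(text, lexicon):
    """Binary sentiment: sum word weights and threshold at zero."""
    score = sum(lexicon.get(w, 0.0) for w in text.split())
    return 1 if score >= 0 else 0

lexicon = build_weighted_lexicon(reviews)
print(classify("friendly staff great room", lexicon))   # 1 (positive)
print(classify("dirty noisy room", lexicon))            # 0 (negative)
```

Words that occur equally often on both sides (e.g. "room", "staff") get weight zero, which is one transparent way such an approach can down-weight non-sentiment vocabulary.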

Findings

The results of the accuracy metrics confirmed that the study’s methodology significantly outperforms the dictionary-based method in comparison to the machine-learning algorithm method. The findings also provide evidence that the study’s methodology is generalizable for predicting users’ sentiment.

Practical implications

The study developed and validated a methodology for generating reliable lexicons that can be used for big data analysis aiming to understand and predict customers’ sentiment. The L2 hotel dictionary generated by the study provides a reliable method and a useful tool for analyzing guests’ feedback and enabling managers to understand, anticipate and proactively respond to customers’ attitudes and changes. The study also proposed a simplified methodology for understanding the sentiment of each user, which, in turn, can be used for conducting comparisons aiming to detect and understand guests’ sentiment changes across time, as well as across users based on their profiles and experiences.

Originality/value

This study contributes to the field by proposing and testing a new methodology for conducting sentiment analysis that addresses previous methodological limitations, as well as the contextual specificities of the tourism industry. Based on the paper’s literature review, this is the first research study using a bag-of-words approach for conducting a sentiment analysis and creating a field-specific lexicon.

A generalizable sentiment analysis method for creating a hotel dictionary: a big data analysis using TripAdvisor hotel reviews as the sample

Abstract

Purpose

Research on online traveller reviews has grown rapidly in recent years, but effective methods that can deliver end-user value within limited time and budget are still lacking. This paper develops and tests a new sentiment analysis method that creates two sets of hotel-specific lexicons and surpasses the standard dictionary-based approach.

Design/methodology/approach

The research sample is big data of TripAdvisor hotel customer reviews, on which a binary sentiment analysis is conducted by developing a novel weighted bag-of-words lexicon method. This weighted lexicon method provides a transparent and replicable procedure for preparing, creating and evaluating sentiment lexicons. The method yields two dictionaries (a weighted dictionary, L1, and a manually selected dictionary, L2), which the paper tests and validates by applying classification accuracy metrics to the TripAdvisor big data. Two popular methods (a public dictionary-based method and a complex machine-learning algorithm) are used to compare the dictionaries’ accuracy.

Findings

The accuracy comparison confirms that, relative to the machine-learning algorithm, the paper’s method significantly outperforms the dictionary-based approach. The results also show that the paper’s method can be generalized for predicting users’ sentiment.

Practical implications

The paper develops and validates a method that builds reliable dictionaries for big data analysis to determine user sentiment. The L2 hotel lexicon created in this paper is a reliable and useful tool for analysing guest feedback, and it helps hotel managers understand, anticipate and proactively respond to guests’ attitudes and changes. The paper also proposes a simple method for understanding each user’s sentiment, which can be used comparatively to detect and understand changes in guests’ sentiment over time, as well as across users with different backgrounds and experiences.

Originality/value

The paper proposes and tests a new sentiment analysis method that addresses the limitations of previous methods and is grounded in the tourism industry. Based on the literature review, this is the first study to use a bag-of-words approach to conduct sentiment analysis and create a field-specific dictionary.

Details

Journal of Hospitality and Tourism Technology, vol. 12 no. 2
Type: Research Article
ISSN: 1757-9880


Article
Publication date: 7 November 2016

Ismail Hmeidi, Mahmoud Al-Ayyoub, Nizar A. Mahyoub and Mohammed A. Shehab

Multi-label Text Classification (MTC) is one of the most recent research trends in data mining and information retrieval domains because of many reasons such as the rapid growth…

Abstract

Purpose

Multi-label Text Classification (MTC) is one of the most recent research trends in data mining and information retrieval, driven by the rapid growth of online data and the increasing tendency of internet users to assign multiple labels/tags to describe documents, emails, posts, etc. The dimensionality of labels makes MTC more difficult and challenging than traditional single-labeled text classification (TC). Because MTC is a natural extension of TC, several ways have been proposed to benefit from the rich literature of TC through what are called problem transformation (PT) methods. Basically, PT methods transform the multi-label data into single-label data suitable for traditional single-label classification algorithms. Another approach is to design novel classification algorithms customized for MTC. Over the past decade, several works have appeared on both approaches, focusing mainly on the English language. This work aims to present an elaborate study of MTC of Arabic articles.
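One common problem transformation method, binary relevance, can be sketched as follows: the multi-label dataset becomes one binary single-label dataset per label, so any traditional classifier can be reused. The documents and labels below are toy examples, not from the paper.

```python
# Binary relevance: one (document, 0/1) dataset per label.
data = [
    ("doc1", {"sports", "politics"}),
    ("doc2", {"sports"}),
    ("doc3", {"economy"}),
]

def binary_relevance(data):
    """Transform multi-label data into per-label binary datasets."""
    all_labels = set().union(*(labels for _, labels in data))
    return {label: [(doc, int(label in labels)) for doc, labels in data]
            for label in sorted(all_labels)}

datasets = binary_relevance(data)
print(datasets["sports"])  # [('doc1', 1), ('doc2', 1), ('doc3', 0)]
```

Each resulting dataset is then fed to an ordinary single-label classifier, one per label.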

Design/methodology/approach

This paper presents a novel lexicon-based method for MTC, where the keywords that are most associated with each label are extracted from the training data along with a threshold that can later be used to determine whether each test document belongs to a certain label.
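A minimal sketch of this lexicon-based scheme, assuming per-label keyword counts from training data and a simple hit threshold at test time (the toy documents and the threshold value are invented for illustration; the paper works on Arabic text):

```python
from collections import Counter

# Toy multi-label training documents (token lists plus label sets).
train = [
    (["match", "goal", "team"], {"sports"}),
    (["election", "vote", "team"], {"politics"}),
    (["match", "election"], {"sports", "politics"}),
]

def label_keywords(train):
    """Count how often each word co-occurs with each label."""
    scores = {}
    for words, labels in train:
        for label in labels:
            scores.setdefault(label, Counter()).update(words)
    return scores

def predict(words, scores, threshold=2):
    """Assign every label whose keyword hits reach the threshold."""
    predicted = set()
    for label, counter in scores.items():
        hits = sum(counter[w] > 0 for w in words)
        if hits >= threshold:
            predicted.add(label)
    return predicted

scores = label_keywords(train)
print(predict(["goal", "match", "referee"], scores))  # {'sports'}
```

Because each label is scored independently, a test document can naturally receive zero, one or several labels.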

Findings

The experiments show that the presented approach outperforms the currently available approaches. Specifically, the best accuracy obtained from existing approaches is only 18 per cent, whereas the presented lexicon-based approach reaches an accuracy of 31 per cent.

Originality/value

Although there exist some tools that can be customized to address the MTC problem for Arabic text, their accuracies are very low when applied to Arabic articles. This paper presents a novel method for MTC. The experiments show that the presented approach outperforms the currently available approaches.

Details

International Journal of Web Information Systems, vol. 12 no. 4
Type: Research Article
ISSN: 1744-0084


Article
Publication date: 12 April 2022

Mengjuan Zha, Changping Hu and Yu Shi

Sentiment lexicon is an essential resource for sentiment analysis of user reviews. By far, there is still a lack of domain sentiment lexicon with large scale and high accuracy for…

Abstract

Purpose

Sentiment lexicon is an essential resource for sentiment analysis of user reviews. By far, there is still a lack of domain sentiment lexicon with large scale and high accuracy for Chinese book reviews. This paper aims to construct a large-scale sentiment lexicon based on the ultrashort reviews of Chinese books.

Design/methodology/approach

First, large-scale ultrashort reviews of Chinese books, whose length is no more than six Chinese characters, are collected and preprocessed as candidate sentiment words. Second, non-sentiment words are filtered out through certain rules, such as part of speech rules, context rules, feature word rules and user behaviour rules. Third, the relative frequency is used to select and judge the polarity of sentiment words. Finally, the performance of the sentiment lexicon is evaluated through experiments.
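The relative-frequency step (third in the list above) might look like the sketch below; the counts and the 0.75 cutoff are invented for illustration, not taken from the paper.

```python
# Toy counts of candidate words across positive and negative ultrashort reviews.
counts = {
    "excellent": (40, 2),   # (freq. in positive reviews, freq. in negative reviews)
    "boring":    (3, 30),
    "chapter":   (20, 22),  # appears evenly: likely not a sentiment word
}

def polarity(word, counts, cutoff=0.75):
    """Judge polarity by the word's relative frequency on the positive side."""
    pos, neg = counts[word]
    rel = pos / (pos + neg)
    if rel >= cutoff:
        return "positive"
    if rel <= 1 - cutoff:
        return "negative"
    return None              # filtered out of the lexicon

print(polarity("excellent", counts))  # positive
print(polarity("boring", counts))     # negative
print(polarity("chapter", counts))    # None
```

Words whose relative frequency sits near 0.5 carry little polarity signal and are excluded, which complements the rule-based filtering of non-sentiment words.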

Findings

This paper proposes a method of sentiment lexicon construction based on ultrashort reviews and successfully builds a lexicon for Chinese books of nearly 40,000 words based on reviews from Douban Books.

Originality/value

Compared with the idea of constructing a sentiment lexicon based on a small number of reviews, the proposed method can give full play to the advantages of data scale to build a corpus. Moreover, different from the computer segmentation method, this method helps to avoid the problems caused by immature segmentation technology and an imperfect N-gram language model.

Details

The Electronic Library, vol. 40 no. 3
Type: Research Article
ISSN: 0264-0473


Article
Publication date: 4 October 2019

Carlos Molina Beltrán, Alejandra Andrea Segura Navarrete, Christian Vidal-Castro, Clemente Rubio-Manzano and Claudia Martínez-Araneda

This paper aims to propose a method for automatically labelling an affective lexicon with intensity values by using the WordNet Similarity (WS) software package with the purpose…

Abstract

Purpose

This paper aims to propose a method for automatically labelling an affective lexicon with intensity values by using the WordNet Similarity (WS) software package with the purpose of improving the results of an affective analysis process, which is relevant to interpreting the textual information that is available in social networks. The hypothesis states that it is possible to improve affective analysis by using a lexicon that is enriched with the intensity values obtained from similarity metrics. Encouraging results were obtained when an affective analysis based on a labelled lexicon was compared with that based on another lexicon without intensity values.

Design/methodology/approach

The authors propose a method for the automatic extraction of the affective intensity values of words using the similarity metrics implemented in WS. First, the intensity values were calculated for words having an affective root in WordNet. Then, to evaluate the effectiveness of the proposal, the results of the affective analysis based on a labelled lexicon were compared to the results of an analysis with and without affective intensity values.
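The intensity-labelling step can be sketched as below, with a stub standing in for the WordNet Similarity metrics (the words and similarity values are illustrative assumptions, not outputs of the actual package):

```python
def toy_similarity(a, b):
    """Placeholder for a WordNet Similarity metric, scaled to [0, 1]."""
    table = {("joy", "joy"): 1.0, ("joy", "delight"): 0.8, ("joy", "content"): 0.5}
    return table.get((a, b), table.get((b, a), 0.0))

def label_intensity(lexicon_words, root):
    """Attach to each affective word its similarity to the emotion root word."""
    return {w: toy_similarity(root, w) for w in lexicon_words}

intensities = label_intensity(["joy", "delight", "content"], root="joy")
print(intensities)  # {'joy': 1.0, 'delight': 0.8, 'content': 0.5}
```

Words for which no similarity to an affective root can be computed would receive no intensity label, which is the limitation noted below: the labelled lexicon is a subset of the original one.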

Findings

The main contribution of this research is a method for the automatic extraction of the intensity values of affective words used to enrich a lexicon compared with the manual labelling process. The results obtained from the affective analysis with the new lexicon are encouraging, as they provide a better performance than those achieved using a lexicon without affective intensity values.

Research limitations/implications

Given the restrictions for calculating the similarity between two words, the lexicon labelled with intensity values is a subset of the original lexicon, which means that a large proportion of the words in the corpus are not labelled in the new lexicon.

Practical implications

The practical implications of this work include providing tools to improve the analysis of the feelings of the users of social networks. In particular, it is of interest to provide an affective lexicon that improves attempts to solve the problems of a digital society, such as the detection of cyberbullying. In this case, by achieving greater precision in the detection of emotions, it is possible to detect the roles of participants in a situation of cyberbullying, for example, the bully and victim. Other problems in which the application of affective lexicons is of importance are the detection of aggressiveness against women or gender violence or the detection of depressive states in young people and children.

Social implications

This work is interested in providing an affective lexicon that improves attempts to solve the problems of a digital society, such as the detection of cyberbullying. In this case, by achieving greater precision in the detection of emotions, it is possible to detect the roles of participants in a situation of cyberbullying, for example, the bully and victim. Other problems in which the application of affective lexicons is of importance are the detection of aggressiveness against women or gender violence or the detection of depressive states in young people and children.

Originality/value

The originality of the research lies in the proposed method for automatically labelling the words of an affective lexicon with intensity values by using WS. To date, a lexicon labelled with intensity values has been constructed using the opinions of experts, but that method is more expensive and requires more time than other existing methods. On the other hand, the new method developed herein is applicable to larger lexicons, requires less time and facilitates automatic updating.

Details

The Electronic Library, vol. 37 no. 6
Type: Research Article
ISSN: 0264-0473


Article
Publication date: 14 May 2018

Georgios Kalamatianos, Symeon Symeonidis, Dimitrios Mallis and Avi Arampatzis

The rapid growth of social media has rendered opinion and sentiment mining an important area of research with a wide range of applications. This paper aims to focus on the Greek…

Abstract

Purpose

The rapid growth of social media has rendered opinion and sentiment mining an important area of research with a wide range of applications. This paper aims to focus on the Greek language and the microblogging platform Twitter, investigating methods for extracting emotion of individual tweets as well as population emotion for different subjects (hashtags).

Design/methodology/approach

The authors propose and investigate the use of emotion lexicon-based methods as a means of extracting emotion/sentiment information from social media. The authors compare several approaches for measuring the intensity of six emotions: anger, disgust, fear, happiness, sadness and surprise. To evaluate the effectiveness of the methods, the authors develop a benchmark dataset of tweets, manually rated by two humans.
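A lexicon-based emotion scorer of this kind can be sketched as follows; the lexicon entries are invented (the study works on Greek tweets) and the word-averaging rule is an assumption about how intensities might be aggregated:

```python
# Toy emotion lexicon: per-word intensities for some of the six emotions.
lexicon = {
    "happy":  {"happiness": 0.9, "surprise": 0.1},
    "scared": {"fear": 0.8},
    "angry":  {"anger": 0.9, "disgust": 0.3},
}
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

def tweet_emotions(tweet):
    """Average the lexicon intensities of the tweet's in-lexicon words."""
    words = [w for w in tweet.lower().split() if w in lexicon]
    if not words:
        return dict.fromkeys(EMOTIONS, 0.0)
    return {e: sum(lexicon[w].get(e, 0.0) for w in words) / len(words)
            for e in EMOTIONS}

scores = tweet_emotions("so happy but a bit scared")
print(max(scores, key=scores.get))  # happiness
```

Population emotion for a hashtag could then be estimated by averaging these per-tweet vectors over all tweets carrying that hashtag.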

Findings

The authors develop a new sentiment lexicon for use in Web applications, then assess the performance of the methods with the new lexicon and find improved results.

Research limitations/implications

The automated emotion results seem promising and correlate with real user emotion. At this point, the authors make some interesting observations about the lexicon-based approach, which lead to the need for a new, better emotion lexicon.

Practical implications

The authors examine the variation of emotion intensity over time for selected hashtags and associate it with real-world events.

Originality/value

The originality in this research is the development of a training set of tweets, manually annotated by two independent raters. The authors “transfer” the sentiment information of these annotated tweets, in a meaningful way, to the set of words that appear in them.

Details

Journal of Systems and Information Technology, vol. 20 no. 2
Type: Research Article
ISSN: 1328-7265


Article
Publication date: 8 February 2021

Alejandra Segura Navarrete, Claudia Martinez-Araneda, Christian Vidal-Castro and Clemente Rubio-Manzano

This paper aims to describe the process used to create an emotion lexicon enriched with the emotional intensity of words and focuses on improving the emotion analysis process in…

Abstract

Purpose

This paper aims to describe the process used to create an emotion lexicon enriched with the emotional intensity of words and focuses on improving the emotion analysis process in texts.

Design/methodology/approach

The process includes setting, preparation and labelling stages. In the first stage, a lexicon is selected. It must include a translation to the target language and labelling according to Plutchik’s eight emotions. The second stage starts with the validation of the translations. Then, it is expanded with the synonyms of the emotion synsets of each word. In the labelling stage, the similarity of words is calculated and displayed using WordNet similarity.
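The expansion step in the second stage might be sketched as follows, with a toy synonym table standing in for WordNet emotion synsets (the words and labels are illustrative assumptions):

```python
# Stand-in for WordNet synsets: synonyms of already-labelled emotion words.
synonyms = {"rage": ["fury", "wrath"], "joy": ["delight"]}

def expand(lexicon):
    """Add synonyms of each emotion word, inheriting the word's emotion label."""
    expanded = dict(lexicon)
    for word, emotion in lexicon.items():
        for syn in synonyms.get(word, []):
            expanded.setdefault(syn, emotion)   # never overwrite existing labels
    return expanded

print(expand({"rage": "anger", "joy": "joy"}))
```

The expanded words would then pass through the labelling stage, where similarity scores assign each one an emotional intensity.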

Findings

The authors’ approach performs better at identifying the predominant emotion for the selected corpus. Most relevant is the improvement obtained in the emotion analysis results of a hybrid approach compared with those obtained with a purist approach.

Research limitations/implications

The proposed lexicon can still be enriched by incorporating elements such as emojis, idioms and colloquial expressions.

Practical implications

This work is part of a research project that aids in solving problems in a digital society, such as detecting cyberbullying, abusive language and gender violence in texts, or exercising parental control. It also covers detecting depressive states in young people and children.

Originality/value

This semi-automatic process can be applied to any language to generate an emotion lexicon. This resource will be available in a software tool that implements a crowdsourcing strategy allowing the intensity to be re-labelled and new words to be automatically incorporated into the lexicon.

Details

The Electronic Library, vol. 39 no. 1
Type: Research Article
ISSN: 0264-0473


Article
Publication date: 2 January 2020

Futao Zhao, Zhong Yao, Jing Luan and Hao Liu

The purpose of this paper is to propose a methodology to construct a stock market sentiment lexicon by incorporating domain-specific knowledge extracted from diverse Chinese media…

Abstract

Purpose

The purpose of this paper is to propose a methodology to construct a stock market sentiment lexicon by incorporating domain-specific knowledge extracted from diverse Chinese media outlets.

Design/methodology/approach

This paper presents a novel method to automatically generate financial lexicons using a unique data set that comprises news articles, analyst reports and social media. Specifically, a novel method based on keyword extraction is used to build a high-quality seed lexicon and an ensemble mechanism is developed to integrate the knowledge derived from distinct language sources. Meanwhile, two different methods, Pointwise Mutual Information and Word2vec, are applied to capture word associations. Finally, an evaluation procedure is performed to validate the effectiveness of the method compared with four traditional lexicons.
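The Pointwise Mutual Information step can be illustrated with document-level co-occurrence counts; the toy corpus below is an assumption (the paper estimates associations from its own financial news, analyst-report and social-media corpora, and also uses Word2vec):

```python
import math

# Toy corpus: each document is the set of words it contains.
docs = [
    {"surge", "gain", "stock"},
    {"surge", "rally", "stock"},
    {"loss", "drop", "stock"},
    {"gain", "rally"},
]

def pmi(word, seed, docs):
    """PMI between a candidate word and a seed sentiment word."""
    n = len(docs)
    p_w = sum(word in d for d in docs) / n
    p_s = sum(seed in d for d in docs) / n
    p_ws = sum(word in d and seed in d for d in docs) / n
    if p_ws == 0:
        return float("-inf")      # never co-occur: no association
    return math.log2(p_ws / (p_w * p_s))

print(pmi("surge", "gain", docs) > pmi("drop", "gain", docs))  # True
```

Candidate words with high PMI toward positive seeds and low PMI toward negative seeds would be assigned positive polarity, and vice versa.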

Findings

The experimental results from the three real-world testing data sets show that the ensemble lexicons can significantly improve sentiment classification performance compared with the four baseline lexicons, suggesting the usefulness of leveraging knowledge derived from diverse media in domain-specific lexicon generation and corresponding sentiment analysis tasks.

Originality/value

This work appears to be the first to construct financial sentiment lexicons from over 2m posts and headlines collected from more than one language source. Furthermore, the authors believe that the data set established in this study is one of the largest corpora used for Chinese stock market lexicon acquisition. This work is valuable to extract collective sentiment from multiple media sources and provide decision-making support for stock market participants.

Details

Industrial Management & Data Systems, vol. 120 no. 3
Type: Research Article
ISSN: 0263-5577


Article
Publication date: 28 July 2023

Viriya Taecharungroj and Ioana S. Stoica

The purpose of this paper is to examine and compare the in situ place experiences of people in Luton and Darlington.

Abstract

Purpose

The purpose of this paper is to examine and compare the in situ place experiences of people in Luton and Darlington.

Design/methodology/approach

The study used 109,998 geotagged tweets from Luton and Darlington between 2020 and 2022 and conducted topic modelling using latent Dirichlet allocation. Lexicons were created using GPT-4 to evaluate the eight dimensions of place experience for each topic.
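The lexicon-scoring step might look like the sketch below: for each place-experience dimension, count how many of a topic's top words (as produced by LDA) fall in that dimension's lexicon. The lexicons shown are invented stand-ins for the GPT-4-generated ones.

```python
# Toy per-dimension lexicons; the study covers eight dimensions in total.
dimension_lexicons = {
    "sensorial": {"smell", "taste", "loud", "bright"},
    "affective": {"love", "proud", "angry", "happy"},
}

def score_topic(top_words, lexicons):
    """Count lexicon hits per place-experience dimension for one topic."""
    return {dim: sum(w in lex for w in top_words)
            for dim, lex in lexicons.items()}

topic = ["market", "smell", "taste", "happy", "street"]
print(score_topic(topic, dimension_lexicons))  # {'sensorial': 2, 'affective': 1}
```

Summing such per-topic counts, weighted by topic prevalence in each town's tweets, is one way the dimension counts for Luton and Darlington could be compared.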

Findings

The study found that Darlington had higher counts in the sensorial, behavioural, designed and mundane dimensions of place experience than Luton. Conversely, Luton had a higher prevalence of the affective and intellectual dimensions, attributed to political and faith-related tweets.

Originality/value

The study introduces a novel approach that uses AI-generated lexicons for place experience. These lexicons cover four facets, two intentions and two intensities of place experience, enabling detection of words from any domain. This approach can be useful not only for town and destination brand managers but also for researchers in any field.

Details

Journal of Place Management and Development, vol. 17 no. 1
Type: Research Article
ISSN: 1753-8335


Open Access
Article
Publication date: 31 July 2020

Omar Alqaryouti, Nur Siyam, Azza Abdel Monem and Khaled Shaalan

Digital resources such as smart applications reviews and online feedback information are important sources to seek customers’ feedback and input. This paper aims to help…


Abstract

Digital resources such as smart application reviews and online feedback are important sources of customers’ feedback and input. This paper aims to help government entities gain insights into the needs and expectations of their customers. Towards this end, we propose a hybrid aspect-based sentiment analysis approach that integrates domain lexicons and rules to analyse the entities’ smart app reviews. The proposed model extracts the important aspects from the reviews and classifies the corresponding sentiments. The approach adopts language processing techniques, rules and lexicons to address several sentiment analysis challenges and produce summarized results. According to the reported results, aspect extraction accuracy improves significantly when implicit aspects are considered. The integrated classification model also outperforms the lexicon-based baseline and the other rule combinations by 5% in terms of accuracy, on average. Moreover, on the same dataset, the proposed approach outperforms machine learning approaches that use a support vector machine (SVM). However, using these lexicons and rules as input features to the SVM model achieved higher accuracy than the other SVM models.
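A hybrid lexicon-plus-rules analysis of this sort can be sketched as follows; the aspect terms, sentiment weights, context window and negation rule are all illustrative assumptions, not the paper's actual lexicons or rules:

```python
# Toy domain lexicons mapping aspect terms to aspect categories and
# opinion words to sentiment weights.
aspect_lexicon = {"login": "usability", "crash": "reliability", "screen": "design"}
sentiment_lexicon = {"easy": 1, "fast": 1, "slow": -1, "broken": -1}

def analyse(review):
    """For each aspect term found, score nearby sentiment words with a negation rule."""
    words = review.lower().split()
    results = {}
    for i, w in enumerate(words):
        if w in aspect_lexicon:
            score = 0
            # Rule: look at a +/-3-word window; flip sentiment after "not".
            for j in range(max(0, i - 3), min(len(words), i + 4)):
                s = sentiment_lexicon.get(words[j], 0)
                if j > 0 and words[j - 1] == "not":
                    s = -s
                score += s
            results[aspect_lexicon[w]] = "positive" if score >= 0 else "negative"
    return results

print(analyse("login is not easy and the screen is slow"))
```

Aggregating these per-aspect polarities over many reviews yields the kind of summarized, aspect-level feedback the paper targets for government entities.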

Details

Applied Computing and Informatics, vol. 20 no. 1/2
Type: Research Article
ISSN: 2634-1964


Article
Publication date: 16 December 2019

Chihli Hung and You-Xin Cao

This paper aims to propose a novel approach which integrates collocations and domain concepts for Chinese cosmetic word of mouth (WOM) sentiment classification. Most sentiment…

Abstract

Purpose

This paper aims to propose a novel approach that integrates collocations and domain concepts for Chinese cosmetic word of mouth (WOM) sentiment classification. Most sentiment analysis works by collecting sentiment scores from each unigram or bigram; however, not every unigram or bigram in a WOM document carries sentiment. Chinese collocations carry the main sentiments of WOM. This paper reduces document dimensionality and improves sentiment classification.

Design/methodology/approach

This paper builds two contextual lexicons for feature words and sentiment words, respectively. Based on these contextual lexicons, this paper uses the techniques of associated rules and mutual information to build possible Chinese collocation sets. This paper applies preference vector modelling as the vector representation approach to catch the relationship between Chinese collocations and their associated concepts.
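Mutual-information-based collocation extraction can be sketched as below, on toy English tokens (the paper works on Chinese cosmetic word of mouth); the minimum-frequency filter and the PMI threshold are assumptions:

```python
import math
from collections import Counter

tokens = "very smooth texture very smooth finish nice finish very smooth".split()

def collocations(tokens, threshold=1.0, min_count=2):
    """Keep bigrams whose pointwise mutual information exceeds a threshold."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    found = []
    for (a, b), c in bigrams.items():
        if c < min_count:
            continue   # rare bigrams inflate PMI; filter them out
        score = math.log2((c / (n - 1)) / ((unigrams[a] / n) * (unigrams[b] / n)))
        if score >= threshold:
            found.append(((a, b), round(score, 2)))
    return found

print(collocations(tokens))  # [(('very', 'smooth'), 1.89)]
```

Only word pairs that co-occur far more often than chance survive, so the document can be represented by a much smaller set of sentiment-bearing collocations rather than by all unigrams and bigrams.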

Findings

This paper compares the proposed preference vector models with benchmarks, using three classification techniques (i.e. support vector machine, J48 decision tree and multilayer perceptron). According to the experimental results, the proposed models outperform all benchmarks evaluated by the criterion of accuracy.

Originality/value

This paper focuses on Chinese collocations and proposes a novel research approach for sentiment classification. The Chinese collocations used in this paper are adaptable to the content and domains. Finally, this paper integrates collocations with the preference vector modelling approach, which not only achieves a better sentiment classification performance for Chinese WOM documents but also avoids the curse of dimensionality.

Details

The Electronic Library, vol. 38 no. 1
Type: Research Article
ISSN: 0264-0473

