
A generalizable sentiment analysis method for creating a hotel dictionary: using big data on TripAdvisor hotel reviews

Sayeh Bagherzadeh (Management and Accounting Faculty, Department of Industrial and Information Management, Shahid Beheshti University, Tehran, Iran)
Sajjad Shokouhyar (Management and Accounting Faculty, Department of Industrial and Information Management, Shahid Beheshti University, Tehran, Iran)
Hamed Jahani (School of Accounting, Information Systems and Supply Chain, RMIT University, Melbourne, Australia)
Marianna Sigala (Department of UniSA Business, University of South Australia, Adelaide, Australia)

Journal of Hospitality and Tourism Technology

ISSN: 1757-9880

Article publication date: 17 May 2021

Issue publication date: 15 July 2021




Purpose

Research analyzing online travelers’ reviews has boomed in recent years, but it lacks efficient methodologies that can deliver useful end-user value within time and budget constraints. This study aims to contribute to the field by developing and testing a new methodology for sentiment analysis that surpasses the standard dictionary-based method by creating two hotel-specific word lexicons.


Design/methodology/approach

Big data of hotel customer reviews posted on the TripAdvisor platform were collected and prepared for a binary sentiment analysis based on a novel weighted bag-of-words approach. This approach provides a transparent and replicable procedure to prepare, create and assess lexicons for sentiment analysis. It resulted in two lexicons (a weighted lexicon, L1, and a manually selected lexicon, L2), which were tested and validated by applying classification accuracy metrics to the TripAdvisor big data. Two popular methodologies (a public dictionary-based method and a complex machine-learning algorithm) served as benchmarks against which the accuracy metrics of the study’s approach were compared.
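As a rough illustration of what a weighted bag-of-words lexicon for binary sentiment can look like, the Python sketch below derives word weights from labeled reviews and scores new text with them. The weighting scheme (a smoothed log-odds ratio of document frequencies), the function names, the `min_count` parameter and the toy reviews are all assumptions made for illustration; the abstract does not specify how the study computes the L1 weights.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_weighted_lexicon(reviews, min_count=2):
    """Build a word -> weight lexicon from (text, label) pairs.

    label is 1 for a positive review and 0 for a negative one.
    The weight is a smoothed log-odds ratio of document frequencies
    (an assumed scheme, not necessarily the one used in the study).
    """
    reviews = list(reviews)
    pos_df, neg_df = Counter(), Counter()
    for text, label in reviews:
        # Count each word once per review (document frequency).
        (pos_df if label == 1 else neg_df).update(set(tokenize(text)))
    n_pos = sum(1 for _, label in reviews if label == 1)
    n_neg = len(reviews) - n_pos

    lexicon = {}
    for word in set(pos_df) | set(neg_df):
        if pos_df[word] + neg_df[word] < min_count:
            continue  # drop rare words
        p = (pos_df[word] + 1) / (n_pos + 2)  # add-one smoothing
        q = (neg_df[word] + 1) / (n_neg + 2)
        lexicon[word] = math.log(p) - math.log(q)
    return lexicon


def classify(text, lexicon):
    """Return 1 (positive) if the summed word weights exceed 0, else 0."""
    score = sum(lexicon.get(tok, 0.0) for tok in tokenize(text))
    return 1 if score > 0 else 0


def accuracy(labeled_reviews, lexicon):
    """Share of reviews whose predicted label matches the given label."""
    labeled_reviews = list(labeled_reviews)
    hits = sum(classify(t, lexicon) == y for t, y in labeled_reviews)
    return hits / len(labeled_reviews)


# Toy data, invented for this example only.
train = [
    ("clean room and friendly staff", 1),
    ("friendly staff great location", 1),
    ("dirty room and rude staff", 0),
    ("rude staff dirty bathroom", 0),
]
lexicon = build_weighted_lexicon(train)
held_out = [("friendly staff", 1), ("dirty and rude", 0)]
print(accuracy(held_out, lexicon))
```

In this sketch, words that appear equally often in positive and negative reviews (such as "staff" in the toy data) receive a weight of zero, so only discriminative words influence the predicted label.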


Findings

The accuracy metrics confirmed that, when benchmarked alongside the machine-learning algorithm, the study’s methodology significantly outperforms the dictionary-based method. The findings also provide evidence that the study’s methodology is generalizable for predicting users’ sentiment.

Practical implications

The study developed and validated a methodology for generating reliable lexicons that can be used for big data analysis aiming to understand and predict customers’ sentiment. The L2 hotel dictionary generated by the study provides a reliable and useful tool for analyzing guests’ feedback, enabling managers to understand, anticipate and reactively respond to customers’ attitudes and changes. The study also proposed a simplified methodology for understanding the sentiment of each user, which, in turn, can be used for comparisons that detect and explain changes in guests’ sentiment over time, as well as differences across users based on their profiles and experiences.
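The per-user comparison over time described above could be sketched as follows. The record layout (user id, review date, sentiment score), the function names and the sample values are hypothetical and chosen only to illustrate the idea; the abstract does not describe how the study stores or compares user-level sentiment.

```python
from collections import defaultdict
from datetime import date


def sentiment_trajectories(scored_reviews):
    """Group (user, day, score) records by user, ordered by review date."""
    by_user = defaultdict(list)
    for user, day, score in scored_reviews:
        by_user[user].append((day, score))
    return {
        user: [score for _, score in sorted(items, key=lambda x: x[0])]
        for user, items in by_user.items()
    }


def sentiment_change(trajectory):
    """Difference between a user's latest and earliest sentiment score."""
    return trajectory[-1] - trajectory[0]


# Hypothetical scored reviews: (user id, review date, sentiment score).
records = [
    ("u1", date(2020, 6, 1), -0.5),
    ("u1", date(2019, 3, 15), 1.0),
    ("u2", date(2020, 1, 10), 0.25),
]
trajectories = sentiment_trajectories(records)
print(sentiment_change(trajectories["u1"]))
```

A negative change for a user indicates that their later reviews scored lower than their earlier ones, which is the kind of shift a manager might want to investigate.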


Originality/value

This study contributes to the field by proposing and testing a new methodology for conducting sentiment analysis that addresses previous methodological limitations, as well as the contextual specificities of the tourism industry. Based on the paper’s literature review, this is the first research study to use a bag-of-words approach for conducting a sentiment analysis and creating a field-specific lexicon.




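Research on online travelers’ reviews has grown steadily over the past few years, but it still lacks efficient methods that can deliver end-user value within limited time and budget. This paper develops and tests a new method for sentiment analysis that creates two hotel-specific lexicons and surpasses the standard dictionary-based method.


The sample consists of big data of hotel customer reviews on TripAdvisor, on which a binary sentiment analysis was conducted by developing a novel weighted bag-of-words approach. This approach offers a transparent and replicable procedure for preparing, creating and assessing lexicons for sentiment analysis. It yielded two lexicons (a weighted lexicon, L1, and a manually selected lexicon, L2), which were tested and validated by applying classification accuracy metrics to the TripAdvisor big data. Two popular methods (a public dictionary-based method and a complex machine-learning algorithm) were used to compare the accuracy of the lexicons.


The accuracy comparison confirmed that the paper’s method, compared with the machine-learning algorithm, significantly outperforms the dictionary-based method. The results also indicate that the method can be generalized to predicting users’ sentiment.


This paper developed and validated a method that creates reliable lexicons for big data analysis in order to determine users’ sentiment. The L2 hotel lexicon created in this paper is a reliable and useful tool for analyzing guests’ feedback, and it can help hotel managers understand, anticipate and actively respond to guests’ attitudes and changes. The paper also proposes a simplified method for understanding the sentiment of each user, which can be used in comparisons to detect and understand changes in guests’ sentiment over time, as well as differences across users with different backgrounds and experiences.


This paper proposes and tests a new sentiment analysis method that addresses the limitations of previous methods and is grounded in the tourism industry. Based on the literature review, this is the first study to use a bag-of-words approach to conduct sentiment analysis and create a field-specific lexicon.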



Bagherzadeh, S., Shokouhyar, S., Jahani, H. and Sigala, M. (2021), "A generalizable sentiment analysis method for creating a hotel dictionary: using big data on TripAdvisor hotel reviews", Journal of Hospitality and Tourism Technology, Vol. 12 No. 2, pp. 210-238.



Emerald Publishing Limited

Copyright © 2021, Emerald Publishing Limited
