The purpose of this paper is to determine whether different scales and ways to collect reviews and ratings found on online travel agencies (OTAs) can affect hotels, and whether hotels obtain the same or different evaluations.
Hotel ratings from five OTAs in four European markets were collected and compared in pairs. An initial comparison was made with the hotel scores of each OTA to show what a typical user would see. Then, a rescaled score (0-10) was used to compare all the OTA scales appropriately and to distinguish between what customers observe and what the reality is.
The results reveal that Booking.com, which uses a 2.5-10 scale, and Agoda, which uses a 2-10 scale, seem to give higher rating scores than Atrapalo (1-10), Travel Republic (0-10) and Hotel Reservation Service (HRS, 1-10). However, when the scores are rescaled to 0-10, the lowest ratings are found on Booking.com, followed by Agoda.
OTAs should include, next to the scores, the scale used to rate hotels so as to provide users with better and clearer information. Moreover, rating questionnaires should match the verbal denominations with their numerical values to avoid biased ratings.
OTAs and hotel managers are losing information provided by customers because customers are not aware of the scale when rating hotels. Moreover, hotel ratings are used by potential customers to obtain a clearer image of an establishment. However, if some hotels are being overrated by some scales, customers might have higher expectations, which may not be met.
The unique rating scales of Booking.com and Agoda provide additional insights into their hotel evaluations, which appear higher than they really are.
Martin-Fuentes, E., Mellinas, J.P. and Parra-Lopez, E. (2020), "Online travel review rating scales and effects on hotel scoring and competitiveness", Tourism Review, Vol. ahead-of-print No. ahead-of-print. https://doi.org/10.1108/TR-01-2019-0024
Emerald Publishing Limited
Copyright © 2020, Eva Martin-Fuentes, Juan Pedro Mellinas and Eduardo Parra-Lopez.
Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial & non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode
Online travel reviews (OTRs) have grown exponentially in recent years, transforming the tourism industry (Buhalis and Law, 2008); in particular, the exchange of information and social media have changed consumer behaviour (Femenia-Serra et al., 2019). OTRs are written by tourists who provide opinions and evaluations about their travel experiences on community-based sites or transaction-based online travel agencies (OTAs) (Xiang et al., 2017). These OTRs consist not only of a text space for users to describe their travel experiences but also of a numeric questionnaire that allows customers to rate the services offered or the overall experience. In this sense, recent research shows that, because of an excess of information, more priority is given to rating symbols than to textual material (Aicher et al., 2016).
Nowadays, hotels rely on online distribution channels, especially OTAs (Leung, 2019). Hotels with higher scores on OTA websites are therefore better positioned in the ranking when customers sort by best-reviewed hotels. Consequently, a better score contributes to more reservations (Vermeulen and Seegers, 2009), higher online hotel room sales (Cezar and Ögüt, 2016) and higher occupancy rates (Viglia et al., 2016).
OTAs and consumer opinion platforms use different systems for collecting numerical ratings: some, such as TripAdvisor, Expedia or Hotels.com, use a 1-5 rating scale, and others, such as Booking.com, Agoda or HRS, use an apparent 1-10 rating scale.
A study revealed that Booking.com uses a scale from 2.5-10, inducing apparent distortions in scores given to hotels (Mellinas et al., 2015), but the effects of this unique scale have not been studied. The same authors tested a sample of US hotels by comparing their scores with those on the Priceline website. They concluded that hotels get better scores on Booking.com (Mellinas et al., 2016). Research comparing the scale of Booking.com (2.5-10) with TripAdvisor (1-5) with 20 million reviews of more than 20,000 hotels worldwide concluded that the Booking.com scale benefits one- to three-star hotels in Europe and America and is detrimental to five-star hotels worldwide (Martin-Fuentes et al., 2018). Moreover, important differences have been identified in reviews registered on TripAdvisor, Expedia and Yelp, in several aspects, such as ratings (Xiang et al., 2017) and on different OTAs offering hotels from Hong Kong (Leung et al., 2018).
Thus, there still seems to be an important gap in research into differences in hotel rating scales and scores across websites. This is of some concern because the increasingly frequent use of these information sources requires reliable and precise rating scales to avoid distortions and errors in the results obtained, as has been detected in some cases. This research therefore aims to analyse in depth the hotel rating scales of five OTAs (Booking.com, Agoda, Atrapalo, HRS and Travel Republic [TR]), focusing on those whose systems show an apparent 0-10 or 1-10 scale, to determine whether different scales can lead to significant score variations and which OTA rating scale yields higher or lower hotel scores.
2. Literature review
2.1 Online travel agencies and hotel reviews
OTAs have made great efforts in terms of usability, security and quality of service (Bernardo et al., 2012; Chen and Kao, 2010; Chiou et al., 2011; Cho and Agrusa, 2006; Fu Tsang et al., 2010; Kaynama and Black, 2000; Park et al., 2007). Moreover, one of the attributes most valued by OTA users is the reviews (Kim et al., 2007), which give users a better idea of the services being offered before purchasing.
However, the huge number of reviews published on OTAs and on other platforms about products and services leads to an information overload that makes decision-making difficult (Lamest and Brady, 2019; Martin-Fuentes et al., 2018). Nevertheless, there are different ways to reduce the options when choosing hotel accommodation, such as other users’ ratings or rankings, which serve to reduce the time and effort spent searching for information about products or services (Filieri and McLeay, 2014). Thus, customers use them to make quicker and more efficient decisions (Browning et al., 2013), because ratings not only help customers’ decision-making but also provide visibility for hotels (Nieto-Garcia et al., 2019).
In the academic field, the relevance of hotel reviews in the sector from the point of view of OTAs, users and hotels, has become an increasingly popular subject, generating a large number of publications (Linchi et al., 2017; Schuckert et al., 2015; Serra Cantallops and Salvi, 2014). Large databases of thousands or even millions of hotel reviews are being used (Martin-Fuentes and Mellinas, 2018), usually supported by automatically controlled systems (Radojevic et al., 2015) in a quick, cheap and convenient way. However, the use of this information can imply important errors if the process of capturing this information is not known in sufficient depth.
All OTAs selected for this study use an apparent 0-10 or 1-10 rating scale, but the reality is that two of them, Booking.com and Agoda, start the scale at 2.5 and 2, respectively. This can lead to confusion among users, who could think that the minimum score is 0 or 1 in a typical scale. This confusion has been identified in several types of research erroneously analysing the distinct measurement scales of some OTAs.
Mellinas et al. (2015) identified 13 articles that made the mistake of considering the Booking.com scale to be 0-10 or 1-10. Since then, most authors have taken this scale into account when they have used data from Booking.com (24 citations in 3 years). However, many authors are repeating the mistake of considering that all OTAs use a scale of 0-10 or 1-10 (Abrate and Viglia, 2016; Castro and Ferreira, 2018; Ert et al., 2016; Kim and Park, 2017; Leung et al., 2018; Pokryshevskaya and Antipov, 2017, among others). Even a UNWTO report indicated that Booking.com uses a 1-10 scale (Blomberg-Nygard and Anderson, 2016).
Often, this error does not affect the results of the studies carried out, because they are qualitative studies based on the content of the reviews. The most important inaccuracies occur when using scores from different websites and trying to homogenize scales as stated by Leung et al. (2018):
“By contrast, Booking.com, Agoda, Priceline, and Kayak used a 10-point scale […] To standardize the baseline scores for comparison in this study, the 10-point scores were divided by two to achieve a 5-point scale score.”
OTAs encourage consumers to write reviews about services once they have used them by sending an e-mail to the person who bought and consumed the service. They use different types of questionnaires to collect consumers’ opinions, as summarized in Table I. Booking.com asks guests to rate six categories or attributes, and the hotel’s final score is the arithmetic average of them. The effects of these ratings have been studied by Nieto-Garcia et al. (2019), who conclude that not all the attributes play the same role for revenue maximization. Agoda uses a similar system, also with six categories and Atrapalo asks customers to rate eight categories.
Although categories to rate hotels are quite similar on OTAs, the questionnaires are different when it comes to the number of answers in each question. On Booking.com, before 2015, there were four-point answers in each category in which a designation of “poor” assigned a 2.5 rating to the hotel, “fair” was rated with a 5, “good” with a 7.5 and “excellent” with a 10 (Mellinas et al., 2015). Since 2015, Booking.com has continued to use the same system, but now uses smiley faces instead of the mentioned designations.
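The label-to-value mapping described above can be sketched in a few lines of Python. This is only an illustration: the dictionary and function names are hypothetical, the example answers are made up, and averaging the six category answers within a single review is an assumption about how the arithmetic average is applied.

```python
from statistics import mean

# Pre-2015 Booking.com answer labels and the values reported in the text.
BOOKING_LABEL_VALUES = {"poor": 2.5, "fair": 5.0, "good": 7.5, "excellent": 10.0}


def booking_review_score(answers):
    """Arithmetic average of the category answers of one review (illustrative)."""
    return mean(BOOKING_LABEL_VALUES[answer] for answer in answers)


# A guest who answers "poor" in all six categories still produces a 2.5.
worst_possible = booking_review_score(["poor"] * 6)

# Made-up example with mixed answers across the six categories.
mixed = booking_review_score(["fair", "fair", "good", "good", "excellent", "poor"])
```

Under this mapping no combination of answers can fall below 2.5, which is the floor of the scale discussed in the text.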
All OTAs seem to use a Likert scale in their questionnaires. This was first used in research to measure the five major “attitude areas” in psychology (Likert, 1932). However, there is no consensus as to the number of points for the answers to surveys using a Likert scale (Bisquerra-Alzina and Pérez-Escoda, 2015; Boone and Boone, 2012), nor as to the number of points for the answers in an OTA survey.
Most research that uses a Likert scale applies between three- and seven-point responses, but there is a wide range of points, between 2 and 20 (Bisquerra-Alzina and Pérez-Escoda, 2015), although using more than a five-point scale complicates the naming of each point because tags normally accompany the Likert scale. Bisquerra-Alzina and Pérez-Escoda (2015) recommend the use of an 11-point Likert scale (from 0 to 10) because it increases the sensitivity of the results.
Dawes (2008) analysed the results of the same questions using five-, seven-, and ten-point scales concluding that the latter produces slightly lower scores compared to the former ones. Leung et al. (2018) also concluded that the results of OTA ratings that use a five-point scale were higher than those from a ten-point one, although, as already mentioned, this study did not take into account that the Booking.com scale was not from 0 to 10.
Furthermore, Worcester and Burns (1975) detected that a four-point scale without a midpoint seemed to get more answers towards the most positive part of the scale, whereas Adelson and McCoach (2010) confirmed that there were no differences in the results with a four-point and a five-point scale, so a neutral point was not so important.
Booking.com uses a four-point Likert scale in which each point is multiplied by 2.5, so the maximum score is 10 and the minimum is not 0 but 2.5 (Mellinas et al., 2015). Agoda uses a five-point Likert scale in which each point is multiplied by 2, so the minimum score is 2 and the maximum is 10. TR uses an eleven-point Likert scale from 0 to 10, whereas HRS and Atrapalo each use a ten-point Likert scale from 1 to 10.
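To make the five scales easier to compare, the following Python sketch lists the number of answer points and the score range given in the text for each OTA and derives the distance between adjacent answers. The class and attribute names are hypothetical, and equally spaced answer values are an assumption for TR, HRS and Atrapalo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RatingScale:
    points: int      # number of answer options per question
    minimum: float   # lowest attainable score
    maximum: float   # highest attainable score

    @property
    def step(self):
        """Distance between two adjacent answer values, assuming equal spacing."""
        return (self.maximum - self.minimum) / (self.points - 1)

    def values(self):
        """All attainable answer values, from lowest to highest."""
        return [self.minimum + i * self.step for i in range(self.points)]


# Points and ranges as described in the text.
SCALES = {
    "Booking.com": RatingScale(points=4, minimum=2.5, maximum=10),
    "Agoda": RatingScale(points=5, minimum=2, maximum=10),
    "TR": RatingScale(points=11, minimum=0, maximum=10),
    "HRS": RatingScale(points=10, minimum=1, maximum=10),
    "Atrapalo": RatingScale(points=10, minimum=1, maximum=10),
}

for name, scale in SCALES.items():
    print(name, scale.values())
```

The derived steps (2.5 for Booking.com and 2 for Agoda) match the multipliers mentioned in the text, while the remaining three scales advance in steps of 1.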
Related to the tags that accompany the scale, Worcester and Burns (1975) confirmed that the interpretation sometimes cannot be adjusted not because “different words mean different things, but that the same word can be made to mean different things as the context changes” (Worcester and Burns, 1975: p. 182).
It is worth mentioning that the descriptions of the answers on some OTAs are more positive than negative. Thus, in the four-point Likert scale on Booking.com, the second point used to be described as “fair,” which is a neutral term. The same happens today with the use of smiley faces: the second point is a neutral smiley.
3. Research aim and hypothesis
To provide additional insights about OTA rating scales, this study aims to analyse the rating scales that apparently use a 0-10 or a 1-10 scale by answering the research question: Do OTA rating scale systems provide the same score results for the same hotels? Conversely, do the rating scales lead to significant rating variations? Which OTA rating scale produces better/worse scores for the same hotels?
Moreover, the specific objective of this research is to compare the rating scales of Booking.com (2.5-10), Agoda (2-10) and other OTAs that use a 0-10 or 1-10 measurement scale. This will add to the scarce literature on the effects of these “singular” rating scales.
We propose the following hypotheses:
H1. Booking.com rating scale (2.5-10) and Agoda rating scale (2-10) provide higher hotel scores than the OTAs that use 0-10 or 1-10 scales.
H2. Booking.com and Agoda rating scales rescaled to 0-10 provide lower hotel scores than the OTAs using the original 0-10 scales.
H3. Ten- and eleven-point Likert scales show lower scores than four- and five-point scales.
To determine whether OTA rating scales provide the same rating results for the same hotels, a search was performed among a wide range of OTAs operating in Europe that use an apparent 0-10 or 1-10 scoring system, identifying those that implement “verified review” systems.
The same hotels were selected in each comparison and, to minimize possible biases, each selected hotel was required to have a minimum of 40 reviews. This condition made it difficult to identify valid websites for our study because, although there are OTAs that use this scale, some do not have significant activity in Europe (Bookit, Despegar, Malapronta, Ctrip, etc.) or do not have a significant number of hotels with the minimum of 40 reviews (Hoteliers, Splendia, etc.).
As not all the hotels operate with all OTAs, it was not possible to compare the same hotels at the same time on all OTAs. This is the reason why the comparison of the hotels was performed with OTAs in pairs, so that we could compare exactly the same hotels from the same destinations in each comparison.
Finally, we selected five platforms that met the above conditions, but, despite having apparently identical scoring systems, they showed relevant differences, as can be observed in Table I.
TR uses a 0-10 scale; HRS and Atrapalo use a 1-10 scale, whereas Agoda uses a 2-10 scale and Booking.com a 2.5-10 scale. Moreover, HRS and Booking.com delete reviews after a certain period, but the three other websites do not seem to delete old reviews.
The three selected websites that use a conventional system (scale 0-10 or 1-10) have a limited geographical scope. This made finding a significant number of hotels with 40 reviews on several of these websites unattainable. However, Agoda and Booking.com have worldwide implementation, allowing us to find the same hotels with more than 40 reviews in the websites identified.
Booking.com has been used in numerous studies as a source of information as already mentioned. Agoda has also been used in various investigations, especially focusing on the Asian market, in some cases assuming wrongly that it uses a scale that ranges from 0 to 10 (Zhou et al., 2014) or 1 to 10 (Muangon et al., 2014). In other cases, studies have focused on semantic analysis, so the scoring system does not affect the results obtained (Haruechaiyasak et al., 2010; Patel et al., 2015). HRS has been used in very specific cases focused on its geographical sphere of influence (Jannach et al., 2014; Schütze, 2008), and Atrapalo has been used in studies on the Spanish market (María-Dolores et al., 2012; Poggi et al., 2007). The TR database has not been used for academic research, as far as the authors are aware.
Data were collected manually in May 2015 for different hotel samples across Europe, starting with the largest cities of each sample, analysing all the hotels in each city and randomly selecting hotels with at least 40 reviews. This is why most of the hotels analysed in this research are located in large cities, as can be seen in Table II. Moreover, the hotels were of all categories, and the largest ones operate with different OTAs. The shortage of hotels that fulfilled the conditions prevented the use of larger samples and of a specific sample design, which would have allowed the sample to be segmented by hotel or client type.
When the number of hotels indicated for each sample was reached, the selection of new cities and hotels was stopped. In all cases, there was a reference website that uses a conventional assessment system, whose scores were compared with those of Agoda and Booking.com in three of the cases and exclusively with Booking.com in the fourth case. In addition, we compared the scores obtained from Agoda and Booking.com in the three cases in which this was possible. These were paired samples, i.e. the scores from the different websites in each market were compared for the same hotels.
As the aim of this study is to determine whether the rating scales of OTAs provide the same results for the same hotels, a paired-samples Student’s t-test comparing means was performed by market (Table III) and by OTA (Table IV), both with the mean ratings as announced on each OTA (Row: Rating OTA) and with the ratings rescaled from 0 to 10 (Row: Rescaled rating OTA).
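For readers unfamiliar with the paired-samples t-test, a minimal Python sketch of the statistic follows. It is not the authors' procedure (the study reports using SPSS); the function name is hypothetical, the scores are made-up illustrative values rather than data from the study, and the p-value lookup is omitted.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees of freedom) for two score lists of the same hotels."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    # The test works on the per-hotel differences between the two websites.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # Mean difference divided by its standard error (sample standard deviation).
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Made-up scores of four hotels on two hypothetical websites.
site_a = [8.0, 7.5, 9.0, 6.5]
site_b = [7.0, 7.0, 8.0, 6.0]
t_stat, df = paired_t_statistic(site_a, site_b)
print(f"t = {t_stat:.3f}, df = {df}")
```

Because each hotel serves as its own control, the test isolates the effect of the website from differences between hotels, which is why the same hotels had to be found on both OTAs in each comparison.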
Statistical calculations were performed with SPSS V.24 and the analyses were carried out with normal ratings (ratings assigned to each hotel by each OTA) to compare the results a typical user would see when entering each OTA to look for a hotel. To compare ratings with the same scales, the scores of all hotels in each OTA were rescaled to 0-10 with the min-max normalization method because of its simplicity.
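A minimal Python sketch of min-max rescaling to 0-10 is shown below. It assumes that the minimum and maximum are the theoretical bounds of each OTA's scale as described in this paper; the exact bounds used in the authors' calculation are not spelled out, and the function and dictionary names are hypothetical.

```python
# Lowest attainable score on each OTA, as described in the paper; all scales top out at 10.
SCALE_MINIMUM = {
    "Booking.com": 2.5,
    "Agoda": 2.0,
    "HRS": 1.0,
    "Atrapalo": 1.0,
    "TR": 0.0,
}


def rescale_to_0_10(score, scale_min, scale_max=10.0):
    """Min-max normalization of a score onto a 0-10 range."""
    if not scale_min <= score <= scale_max:
        raise ValueError("score lies outside the stated scale")
    return 10.0 * (score - scale_min) / (scale_max - scale_min)


# The midpoint of the Booking.com range (6.25) maps to the midpoint of 0-10.
booking_mid = rescale_to_0_10(6.25, SCALE_MINIMUM["Booking.com"])

# The lowest possible Agoda score maps to 0.
agoda_floor = rescale_to_0_10(2.0, SCALE_MINIMUM["Agoda"])
```

Scores on a scale that already runs from 0 to 10, such as TR's, are left unchanged by this transformation.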
5.1 Results from scores announced by online travel agencies
The results of the paired-samples Student’s t-test (Table III) show that Booking.com scores are significantly higher than those of other platforms in all markets when the rating taken is the one announced by Booking.com for each hotel, as already confirmed (Parra et al., 2018). The differences are smaller when comparing Booking.com scores with those of Agoda. Worth noting is the case of hotels in Germany, where the results are not significant. It is therefore confirmed that the Booking.com scale produces higher hotel scores.
The results comparing Agoda with HRS in Germany confirm that the Agoda scale also gives higher scores. However, when comparing Agoda with Atrapalo or TR, the results are not statistically significant. The largest statistically significant mean difference is found in Germany, where hotels obtain 7.947 on Booking.com and 7.609 on HRS (t = 10.78; p < 0.001).
The results in Table IV show that, when comparing Booking.com original scores with any of the OTAs analysed, the mean hotel score is higher and statistically significant in all cases. With Agoda, the comparison is made with only three of the four samples, because this site does not have enough hotels with the required minimum of reviews for the Spanish coast hotels sample.
Thus, the results confirm the first hypothesis that the Booking.com rating scale (2.5-10) and Agoda rating scale (2-10) provide higher hotel scores compared to the OTAs that use 0-10 or 1-10 scales.
5.2 Results from rescaled ratings from 0 to 10
This section presents the results obtained with ratings rescaled to 0-10. The results are different, because the lowest scores are obtained on Booking.com, followed by Agoda.
First, Booking.com shows the lowest ratings when compared with any other OTA, even with Agoda across the whole data set, as observed in Table IV, although the two use similar measurement scale systems. However, when the results are separated by market, there are no mean differences between Booking.com and Agoda in Europe or the UK, as can be seen in Table III. The largest statistically significant mean difference is between Booking.com and TR: hotels obtain 7.636 on TR but 7.020 on Booking.com (t = 21.06; p < 0.001).
Second, Agoda obtains the lowest ratings when compared with Atrapalo and TR. When compared with HRS in Germany, although the rating is slightly higher on HRS, the difference is not statistically significant. In this sense, the largest statistically significant mean difference is between TR and Agoda, with 7.815 and 7.233, respectively (t = 15.71; p < 0.001).
Moreover, a Levene’s test was performed to assess the homogeneity of variance of the ratings by market. The results show that there is homogeneity of variance in all cases because the p-value is higher than 0.05, except for the German market between Agoda and Booking.com (p < 0.001) and between Agoda and HRS (p < 0.001), where there are statistically significant differences in variances.
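As an illustration of what Levene's test measures, the following Python sketch computes the mean-centred W statistic for two or more groups of ratings. The function name is hypothetical, the input values are made up and are not data from the study, and the conversion of W into a p-value (via the F distribution) is left out.

```python
from statistics import mean


def levene_w(*groups):
    """Mean-centred Levene W statistic for k groups of ratings."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every rating from its own group's mean.
    z = []
    for g in groups:
        centre = mean(g)
        z.append([abs(x - centre) for x in g])
    z_means = [mean(zi) for zi in z]
    z_grand = mean(v for zi in z for v in zi)
    # Spread of the group-level mean deviations around the grand mean deviation.
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    # Spread of the individual deviations within each group.
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Made-up ratings: the second group is twice as spread out as the first.
narrow = [1, 2, 3, 4, 5]
wide = [2, 4, 6, 8, 10]
w_stat = levene_w(narrow, wide)
print(f"W = {w_stat:.4f}")
```

A large W relative to the corresponding F distribution indicates that the groups differ in variance, which is the situation reported for the German market.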
When the ratings are rescaled from 0 to 10 to compare scores fairly, that is, using the same scale, the results confirm the second hypothesis that Booking.com obtains the lowest hotel scores followed by Agoda. As Booking.com and Agoda use a four- and five-point Likert scale, respectively, and the other OTAs use a ten- or eleven-point scale, the third hypothesis is rejected because the highest results are for the OTAs that use 10- or 11-point scales.
An analysis of several OTAs that use a 0-10 or 1-10 scale for hotel ratings demonstrates that there is a disparity in scoring systems. In addition to the already known Booking.com 2.5-10 scale, Agoda uses a 2-10 scale; TR a 0-10 scale, whereas HRS and Atrapalo use a 1-10 scale. In the last two cases, although they would appear to be fully equivalent systems, we observe that HRS calculates a global score for each hotel as an arithmetic average of up to 12 items, whereas Atrapalo uses 8. The OTAs also use different delete criteria for old reviews. Even how to describe the customer experience is different with some OTAs asking customers about the most positive and negative aspects of the hotel.
This study shows that the OTA rating scale systems provide different score results for the same hotels when comparing the scores calculated by the OTA and when comparing the rescaled scores from 0 to 10, but the differences in each case are in the opposite direction.
The singular scales of Booking.com (2.5-10) and Agoda (2-10) provide apparently better scores for the hotels when compared with the other OTAs analysed. This is also confirmed by Leung et al. (2018) in their study comparing several OTAs, although that study did not take into account that the Booking.com scale starts at 2.5 instead of 1.
This effect is what any user sees when consulting these websites to compare hotel scores. Therefore, these OTAs use a scale that is not the usual one and leads users to believe that a hotel with a score of 5 or 6 on average in the assessment on Agoda and on Booking.com is a hotel with a “pass mark” when it is actually a failure. This perception is reinforced when words such as “passable”, “pleasant”, “acceptable” or “above average” are included next to these scores (Mellinas and Reino, 2019).
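The arithmetic behind this observation can be written out as a short worked example. It assumes min-max rescaling with the theoretical bounds of each scale (2.5-10 for Booking.com and 2-10 for Agoda); the symbols s, s_min and s' are introduced here only for illustration.

```latex
% Rescaled score s' for a displayed score s on a scale with minimum s_min and maximum 10
s' = 10 \cdot \frac{s - s_{\min}}{10 - s_{\min}}

% Booking.com (s_min = 2.5)
s = 5:\quad s' = 10 \cdot \frac{5 - 2.5}{7.5} \approx 3.33
\qquad
s = 6:\quad s' = 10 \cdot \frac{6 - 2.5}{7.5} \approx 4.67

% Agoda (s_min = 2)
s = 5:\quad s' = 10 \cdot \frac{5 - 2}{8} = 3.75
\qquad
s = 6:\quad s' = 10 \cdot \frac{6 - 2}{8} = 5.00
```

Under this assumption, a displayed 5 falls well below the midpoint of a 0-10 scale on both websites, and a displayed 6 stays below it on Booking.com and only just reaches it on Agoda.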
Not only can users of these websites be confused with these peculiar scales, but also researchers that have made studies based on the scale of Booking.com as 0-10 or 1-10 (Abrate and Viglia, 2016; Ert et al., 2016; Kim and Park, 2017; Leung et al., 2018; Pokryshevskaya and Antipov, 2017) even when Mellinas et al. (2015) reported that the scale was from 2.5 to 10. These publications have passed the filter of reviewers and editors, disseminating erroneous data among the scientific community. It has happened with Booking.com, and may happen with the rest of the OTA websites in the future, if prior studies of them are not carried out.
When analysing the results with a normalized scale (0-10), comparable among OTAs, the results are different because the worst scores are obtained by hotels on Booking.com, followed by those on Agoda.
Booking.com uses a four-point Likert scale and Agoda a five-point one; once the scales are normalized, both produce the worst results compared with the scales of 10 or 11 points. This result goes in the opposite direction to Dawes’ study (2008), in which ten-point scales produced slightly lower scores than five- and seven-point ones. It is worth mentioning that, although the Booking.com and Agoda scales resemble four- and five-point Likert scales, their results are multiplied by 2.5 and 2, respectively; therefore, the results obtained do not have to coincide with those of the aforementioned study regarding ten-point Likert scales.
Moreover, if we take into account that a four-point scale without a midpoint seems to attract more answers to the positive side of the scale (Worcester and Burns, 1975), the results could be better for OTAs using this scale, although there is no consensus on the effects of using an even or odd number of points in the response scale (Adelson and McCoach, 2010), nor on the number of points used in the answers of surveys using Likert scales (Bisquerra-Alzina and Pérez-Escoda, 2015; Boone and Boone, 2012).
It is also important to note that, in collecting text responses and reviews, Booking.com and Agoda ask users to evaluate the most positive and negative aspects of the accommodation, which could mean respondents have to recall negative situations experienced in the hotel. As a result, numerical ratings may be lower than on the other OTAs once the scale is normalized. This could explain the fact that Booking.com and Agoda have the lowest results.
Regarding the tags that accompany the numeric Likert scales, the literature confirms that, instead of providing more information, they can sometimes confuse respondents because they can have different meanings (Worcester and Burns, 1975). In the case of OTAs that use verbal denominations or tags in the form of smiley faces instead of numbers to rate the hotel, the authors have confirmed that some OTAs use tags or textual descriptions for each item on the Likert scale that do not correspond to the numerical value. For example, the authors observed that some OTAs use adjectives or elements such as smileys that tend to be more positive than negative. For instance, Booking.com used to describe the second point in a four-point scale as “fair”. At present, it describes this point with a neutral face, but the response is not at the midpoint of the scale because there are only four points, which could induce respondents to rate better than they really would.
OTAs seek to obtain the best ratings for hotels because this implies greater satisfaction among hotel managers with the online reputation they obtain through these websites, which is linked to the success of the hotel and the quality of service (Jalilvand et al., 2017). Also, it has been proven that better online reviews and hotel ratings increase users’ willingness to book rooms (Vermeulen and Seegers, 2009), resulting in better hotel room sales (Cezar and Ögüt, 2016). Being the OTA with the best scores for hotels therefore makes it more attractive both to potential customers and to accommodation establishments looking to distribute their rooms. Booking.com and Agoda’s scales lead customers to believe that hotels are better valued than they really are. Thus, when a user compares hotels on different OTA websites at equal prices, a hotel with an apparently higher score is perceived as better than the same hotel on an OTA with a scale from 0 to 10. For example, a real hotel from our dataset in Calella (Spain) is more likely to make sales through Booking.com (6.3) or Agoda (6.4) than through TR (5.6) or Atrapalo (4.8).
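For illustration, the four displayed scores of the Calella hotel can be placed on a common 0-10 range with a few lines of Python. This sketch assumes min-max rescaling with the theoretical bounds of each scale, which may differ from the authors' exact calculation; the variable names are hypothetical.

```python
# Displayed score of the Calella hotel and lowest attainable score on each OTA.
calella = {
    "Booking.com": (6.3, 2.5),
    "Agoda": (6.4, 2.0),
    "TR": (5.6, 0.0),
    "Atrapalo": (4.8, 1.0),
}

# Min-max rescaling onto 0-10; every scale has 10 as its maximum.
rescaled = {
    ota: round(10.0 * (score - low) / (10.0 - low), 2)
    for ota, (score, low) in calella.items()
}

for ota, value in sorted(rescaled.items(), key=lambda item: item[1], reverse=True):
    print(f"{ota}: {value}")
```

Under this assumption, the gap between the hotel's Booking.com and Agoda scores and its TR score no longer favours the first two websites, which is consistent with the pattern the rescaled comparisons show for the full samples.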
Furthermore, OTAs establish mechanisms for collecting reviews and ratings that favour the best possible results for hotels. Proof of this is that, since the data for this study were collected, the collection format has changed in several OTAs (e.g. Booking.com has changed the description of the four-point response and extended the period after which old reviews are eliminated from 14 to 24 months).
The two websites that use “non-conventional” scoring systems (Booking.com and Agoda) belong to Booking Holdings. These websites also use very positive words together with the scores of the lowest rated hotels (Mellinas and Reino, 2018). We wonder if it is all part of a strategy to improve hotel quality perception that is close to being considered fake or deceptive advertising. The way the final score is calculated seems more honest for OTAs with a real 0-10 or 1-10 scale, such as Atrapalo, HRS or TR.
7. Implications and conclusions
7.1 Theoretical implications
Through an analysis of hotels that operate with several OTAs that use different measurement scales, this study confirms that there are differences in the final scores obtained by hotels, as suggested by various authors comparing only two websites (Martin-Fuentes et al., 2018; Mellinas et al., 2016). This research provides additional insight by analysing the same hotels on five OTAs in different markets.
This finding should be taken into consideration when comparing ratings from these OTAs because comparing scales from 0 to 10, from 1 to 10, from 2 to 10 or from 2.5 to 10, without normalizing them, is like comparing “apples with oranges”.
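The normalization referred to above can be sketched as a linear rescaling of each displayed score onto a common 0-10 range. This is a minimal sketch: the linear formula is an assumption, since the rescaling procedure is not spelled out in this section, although it is consistent with the rescaled means in the t-test tables (for example, 7.933 on Booking.com becomes 7.244). The names `SCALES` and `rescale` are introduced here for illustration only.

```python
# Lowest and highest score each OTA can display, as reported in the paper.
SCALES = {
    "Booking.com": (2.5, 10.0),
    "Agoda": (2.0, 10.0),
    "Atrapalo": (1.0, 10.0),
    "HRS": (1.0, 10.0),
    "TR": (0.0, 10.0),
}


def rescale(score, ota):
    """Map a displayed score linearly onto a 0-10 range (assumed formula)."""
    lowest, highest = SCALES[ota]
    if not lowest <= score <= highest:
        raise ValueError(f"{score} is outside the {ota} scale")
    return 10.0 * (score - lowest) / (highest - lowest)


# Displayed scores of the Calella hotel mentioned in the discussion.
calella = {"Booking.com": 6.3, "Agoda": 6.4, "TR": 5.6, "Atrapalo": 4.8}

rescaled = {ota: round(rescale(score, ota), 2) for ota, score in calella.items()}
print(rescaled)
```

Under this assumption, the Calella hotel scores about 5.07 on Booking.com, 5.5 on Agoda, 5.6 on TR and 4.22 on Atrapalo, so the apparent advantage of the first two websites disappears once the scales are made comparable.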
Although this study analysed the number of points on the Likert scales used by OTAs, the recurrent question in numerous investigations of how many points a Likert scale should offer remains unanswered.
7.2 Managerial implications
This research has several practical implications for OTAs using scales other than 0-10 or 1-10. First, OTAs and hotel managers are losing information provided by customers because guests are not aware of the scale system when rating hotels. Second, online reviews are a valuable source of information for potential customers, allowing them to gain a closer understanding of the services and facilities they will find in the accommodation establishment. With some scales, hotels can be overrated, and customers will form high expectations that are not consistent with reality. This will be detrimental to hotels because they might not be able to satisfy these expectations; therefore, customers will be dissatisfied with their hotel experience. There are better ways to improve hotel ratings, such as training staff in emotional intelligence (Koc and Boz, 2019), answering the online reviews of previous guests (Wei et al., 2013) and encouraging guests to handwrite their opinions, as it has been demonstrated that subsequent ratings are better if the opinion has previously been handwritten (Tassiello et al., 2018).
A third group of implications could come from the competitors of Booking Holdings. These should consider whether it is appropriate to maintain a system of conventional scales and allow hotels to be better valued on other websites, or to change the scale of their systems. In the latter case, competitors could invent new scales that inflate scores even more than those of Agoda or Booking.com, which would lead to a very controversial situation.
In conclusion, these variations in the scales and the confusion that they entail are having negative effects on consumers, hoteliers, researchers and competitors. For this reason, we suggest that OTAs include, next to the score, the type of scale used to rate the hotel to provide users with more information. In this sense, we applaud the initiative of Booking.com, which has recently created a website section titled “How is my review score calculated?” (Booking.com, 2018) explaining its review score system. However, this information is located in a section aimed at Booking.com partners. Therefore, we have doubts about the number of users who know about it and take it into account when choosing a hotel.
Furthermore, OTAs that use textual descriptions or tags in the form of smiley faces to rate the hotel should be honest and match the verbal denomination with its numerical value and not use elements that tend to be more positive than negative because they can cause confusion and biased answers from users.
This study confirms the existence of differences between scores when using different rating scales. Neither hoteliers nor researchers should continue to make the mistake of assuming that the scores provided by different websites are equivalent simply because they seem to use identical information and score collection systems. As has been demonstrated, both the scales and the items used vary, which leads to significant differences in scores. These findings should be considered in future investigations with quantitative analyses using these sources of information, especially if they are designed to combine different sources while assuming that the data are equivalent.
8. Limitations and future directions for research
Evidence shows that scores for the same hotels vary depending on the OTA. With the present study, we cannot point to a single reason, such as the measurement scale, the method of collecting reviews, the textual description of positive and negative aspects, the OTAs’ policies on old reviews or even sample self-selection, since different platforms might attract different users. There are more factors to be considered, such as the user’s nationality, because some nationalities can be more or less demanding of hotel services (Au et al., 2014; Leung et al., 2018), and cultural differences in responses to a Likert scale (Lee et al., 2002). Another factor could be the percentage of reviews collected on each OTA through either mobile phones or personal computers, since the analysis carried out by Mariani et al. (2019) determined that the ratings on Booking.com were higher for responses collected with smartphones.
Finally, it would be interesting to carry out similar research with websites that only use rating scales from 1 to 5. This would make it possible to verify whether there are significant differences between scoring systems, as seen in this research.
Features of the analysed websites
|Web||Scale||Likert points||Categories||Main destinations||Delete reviews|
|TR||0-10||11||Unknown||UK and coastal destinations||No|
|HRS||1-10||10||12||German-speaking countries||24 months|
|Atrapalo||1-10||10||8||Spain & major European cities||No|
|Agoda||2-10||5||6||Asia and major world cities||No|
Source: Own elaboration
Geographical scope and websites
|Germany||Major German cities (Berlin, Frankfurt, Munich, Hamburg or Stuttgart)||N = 150||HRS, Agoda, Booking.com|
|Europe||Major European cities (Barcelona, Madrid, London, Rome, Paris, Amsterdam, Brussels, Lisbon, among others)||N = 100||Atrapalo, Booking.com, Agoda|
|UK||Major UK cities (London and Manchester)||N = 100||TR, Agoda, Booking.com|
|Coast||Spanish coast destinations (Marbella, Torremolinos, Lloret de Mar, Adeje, Benalmadena)||N = 100||TR, Booking.com|
Source: Own elaboration
Student’s t-test for paired samples by market
|Market||OTA 1||OTA 2||Rating OTA 1||Rating OTA 2||p-value||Sig.||Rescaled rating OTA 1||Rescaled rating OTA 2||p-value||Sig.|
|UK||Booking||TR||7.933 (0.705)||7.815 (0.886)||0.004||*||7.244 (0.940)||7.815 (0.886)||0.000||***|
|Coast||Booking||TR||7.597 (0.585)||7.456 (0.743)||0.001||***||6.796 (0.780)||7.456 (0.743)||0.000||***|
|Europe||Booking||Atrapalo||7.956 (0.580)||7.805 (0.740)||0.000||***||7.275 (0.773)||7.561 (0.823)||0.000||***|
|Germany||Booking||HRS||7.947 (0.630)||7.609 (0.836)||0.000||***||7.263 (0.840)||7.343 (0.928)||0.017||*|
|UK||Agoda||TR||7.786 (0.626)||7.815 (0.886)||0.482||NS||7.233 (0.783)||7.815 (0.886)||0.000||***|
|Europe||Agoda||Atrapalo||7.826 (0.537)||7.805 (0.740)||0.623||NS||7.283 (0.672)||7.561 (0.823)||0.000||***|
|Germany||Agoda||HRS||7.911 (0.495)||7.609 (0.836)||0.000||***||7.389 (0.618)||7.343 (0.928)||0.336||NS|
|UK||Booking||Agoda||7.933 (0.705)||7.786 (0.626)||0.000||***||7.244 (0.940)||7.233 (0.783)||0.746||NS|
|Europe||Booking||Agoda||7.956 (0.580)||7.826 (0.537)||0.000||***||7.275 (0.773)||7.283 (0.672)||0.808||NS|
|Germany||Booking||Agoda||7.947 (0.630)||7.911 (0.495)||0.139||NS||7.263 (0.840)||7.389 (0.618)||0.000||***|
Level of significance: *** p < 0.001; * p < 0.05; NS is not significant
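Each row of the table above compares the same hotels on two OTAs, once with the displayed scores and once with the rescaled ones. A minimal sketch of such a paired comparison follows. The hotel scores are HYPOTHETICAL values chosen for illustration, not data from this study, and the helper `paired_t` is a name introduced here. It returns only the t statistic and the degrees of freedom; obtaining a p-value additionally requires the t distribution, which the standard library does not provide.

```python
import math
import statistics


def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for a paired-samples t-test."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [a - b for a, b in zip(first, second)]
    spread = statistics.stdev(diffs)  # sample standard deviation of differences
    if spread == 0:
        raise ValueError("all paired differences are identical")
    t_stat = statistics.mean(diffs) / (spread / math.sqrt(len(diffs)))
    return t_stat, len(diffs) - 1


# HYPOTHETICAL displayed scores for five hotels listed on both websites.
booking = [8.0, 7.5, 8.5, 7.0, 9.0]  # displayed on a 2.5-10 scale
tr = [7.8, 7.6, 8.3, 6.9, 8.7]       # displayed on a 0-10 scale

# Linear rescaling of the Booking.com scores to 0-10 (assumed formula);
# the TR scores already use a 0-10 scale and stay unchanged.
booking_rescaled = [10 * (score - 2.5) / 7.5 for score in booking]

t_displayed, df = paired_t(booking, tr)
t_rescaled, _ = paired_t(booking_rescaled, tr)
print(t_displayed, t_rescaled, df)
```

With these illustrative values the t statistic is positive for the displayed scores and negative for the rescaled ones, which mirrors the reversal reported for Booking.com against TR. In practice, a statistics package such as SciPy (`scipy.stats.ttest_rel`) would also return the p-value.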
Student’s t-test for paired samples by OTA
|N||OTA 1||OTA 2||Rating OTA 1||Rating OTA 2||p-value||Sig.||Rescaled rating OTA 1||Rescaled rating OTA 2||p-value||Sig.|
|200||Booking||TR||7.765 (0.668)||7.636 (0.835)||0.000||***||7.020 (0.890)||7.636 (0.835)||0.000||***|
|100||Booking||Atrapalo||7.956 (0.580)||7.805 (0.740)||0.000||***||7.275 (0.773)||7.561 (0.823)||0.000||***|
|150||Booking||HRS||7.947 (0.630)||7.609 (0.836)||0.000||***||7.263 (0.840)||7.343 (0.928)||0.017||*|
|100||Agoda||TR||7.786 (0.626)||7.815 (0.886)||0.482||NS||7.233 (0.783)||7.815 (0.886)||0.000||***|
|100||Agoda||Atrapalo||7.826 (0.537)||7.805 (0.740)||0.623||NS||7.283 (0.672)||7.561 (0.823)||0.000||***|
|150||Agoda||HRS||7.911 (0.495)||7.609 (0.836)||0.000||***||7.389 (0.618)||7.343 (0.928)||0.336||NS|
|350||Booking||Agoda||7.946 (0.637)||7.851 (0.548)||0.000||***||7.261 (0.850)||7.314 (0.685)||0.008||*|
Level of significance: *** p < 0.001; * p < 0.05; NS is not significant
Abrate, G. and Viglia, G. (2016), “Strategic and tactical price decisions in hotel revenue management”, Tourism Management, Vol. 55, pp. 123-132.
Adelson, J.L. and McCoach, D.B. (2010), “Measuring the mathematical attitudes of elementary students: the effects of a 4-point or 5-point Likert-type scale”, Educational and Psychological Measurement, Vol. 70 No. 5, pp. 796-807.
Aicher, J., Asiimwe, F., Batchuluun, B., Hauschild, M., Zöhrer, M. and Egger, R. (2016), “Online hotel reviews: rating symbols or text … text or rating symbols? That is the question!”, Information and Communication Technologies in Tourism 2016, pp. 369-382.
Au, N., Buhalis, D. and Law, R. (2014), “Online complaining behavior in mainland China hotels: the perception of Chinese and non-Chinese customers”, International Journal of Hospitality & Tourism Administration, Vol. 15 No. 3, pp. 248-274.
Bernardo, M., Marimon, F. and del Mar Alonso-Almeida, M. (2012), “Functional quality and hedonic quality: a study of the dimensions of e-service quality in online travel agencies”, Information & Management, Vol. 49 Nos 7/8, pp. 342-347.
Bisquerra-Alzina, R. and Pérez-Escoda, N. (2015), “¿Pueden las escalas Likert aumentar en sensibilidad?”, REIRE. Revista D'Innovació i Recerca en Educació, Vol. 8 No. 2, pp. 129-147.
Blomberg-Nygard, A. and Anderson, C.K. (2016), “United Nations World Tourism Organization study on online guest reviews and hotel classification systems: an integrated approach”, Service Science, Vol. 8 No. 2, pp. 139-151.
Booking.com (2018), “How is my review score calculated?”, available at: https://partnerhelp.booking.com/hc/en-us/articles/213302185-How-is-my-review-score-calculated- (accessed 3 November 2018).
Boone, H.N. and Boone, D.A. (2012), “Analyzing Likert data”, Journal of Extension, Vol. 50 No. 2, pp. 1-5.
Browning, V., So, K.K.F. and Sparks, B. (2013), “The influence of online reviews on consumers’ attributions of service quality and control for service standards in hotels”, Journal of Travel and Tourism Marketing.
Buhalis, D. and Law, R. (2008), “Progress in information technology and tourism management: 20 years on and 10 years after the internet – the state of eTourism research”, Tourism Management, Vol. 29 No. 4, pp. 609-623.
Castro, C. and Ferreira, F.A. (2018), “Online hotel ratings and its influence on hotel room rates: the case of Lisbon, Portugal”, Tourism & Management Studies, Vol. 14 No. 1, pp. 63-72.
Cezar, A. and Ögüt, H. (2016), “Analyzing conversion rates in online hotel booking”, International Journal of Contemporary Hospitality Management, Vol. 28 No. 2, pp. 286-304.
Chen, C.-F. and Kao, Y.-L. (2010), “Relationships between process quality, outcome quality, satisfaction, and behavioural intentions for online travel agencies – evidence from Taiwan”, The Service Industries Journal, Vol. 30 No. 12, pp. 2081-2092.
Chiou, W.-C., Lin, C.-C. and Perng, C. (2011), “A strategic website evaluation of online travel agencies”, Tourism Management, Vol. 32 No. 6, pp. 1463-1473.
Cho, Y.C. and Agrusa, J. (2006), “Assessing use acceptance and satisfaction toward online travel agencies”, Information Technology & Tourism, Vol. 8 Nos 3/4, pp. 179-195.
Dawes, J. (2008), “Do data characteristics change according to the number of scale points used? An experiment using 5-Point, 7-Point and 10-Point scales”, International Journal of Market Research, Vol. 50 No. 1, pp. 61-104.
Ert, E., Fleischer, A. and Magen, N. (2016), “Trust and reputation in the sharing economy: the role of personal photos in Airbnb”, Tourism Management, Vol. 55, pp. 62-73.
Femenia-Serra, F., Perles-Ribes, J.F. and Ivars-Baidal, J.A. (2019), “Smart destinations and tech-savvy millennial tourists: hype versus reality”, Tourism Review, Vol. 74 No. 1, pp. 63-81.
Filieri, R. and McLeay, F. (2014), “E-WOM and accommodation: an analysis of the factors that influence travelers’ adoption of information from online reviews”, Journal of Travel Research, Vol. 53 No. 1, pp. 44-57.
Fu Tsang, N.K., Lai, M.T.H. and Law, R. (2010), “Measuring e-service quality for online travel agencies”, Journal of Travel & Tourism Marketing, Vol. 27 No. 3, pp. 306-323.
Haruechaiyasak, C., Kongthon, A., Palingoon, P. and Sangkeettrakarn, C. (2010), “Constructing Thai opinion mining resource: a case study on hotel reviews”, Proceedings of the Eighth Workshop on Asian Language Resouces, pp. 64-71.
Jalilvand, M.R., Nasrolahi Vosta, L., Kazemi Mahyari, H. and Khazaei Pool, J. (2017), “Social responsibility influence on customer trust in hotels: mediating effects of reputation and word-of-mouth”, Tourism Review, Vol. 72 No. 1, pp. 1-14.
Jannach, D., Zanker, M. and Fuchs, M. (2014), “Leveraging multi-criteria customer feedback for satisfaction analysis and improved recommendations”, Information Technology & Tourism, Vol. 14 No. 2, pp. 119-149.
Kaynama, S.A. and Black, C.I. (2000), “A proposal to assess the service quality of online travel agencies: an exploratory study”, Journal of Professional Services Marketing, Vol. 21 No. 1, pp. 63-88.
Kim, D.J., Kim, W.G. and Han, J.S. (2007), “A perceptual mapping of online travel agencies and preference attributes”, Tourism Management, Vol. 28 No. 2, pp. 591-603.
Kim, W.G. and Park, S.A. (2017), “Social media review rating versus traditional customer satisfaction”, International Journal of Contemporary Hospitality Management, Vol. 29 No. 2, pp. 784-802.
Koc, E. and Boz, H. (2019), “Development of hospitality and tourism employees’ emotional intelligence through developing their emotion recognition abilities”, Journal of Hospitality Marketing & Management, pp. 1-18.
Lamest, M. and Brady, M. (2019), “Data-focused managerial challenges within the hotel sector”, Tourism Review, Vol. 74 No. 1, pp. 104-115.
Lee, J.W., Jones, P.S., Mineyama, Y. and Zhang, X.E. (2002), “Cultural differences in responses to a Likert scale”, Research in Nursing & Health, Vol. 25 No. 4, pp. 295-306.
Leung, R. (2019), “Smart hospitality: Taiwan hotel stakeholder perspectives”, Tourism Review, Vol. 74 No. 1, pp. 50-62.
Leung, R., Au, N., Liu, J. and Law, R. (2018), “Do customers share the same perspective? A study on online OTAs ratings versus user ratings of Hong Kong hotels”, Journal of Vacation Marketing, Vol. 24 No. 2, pp. 103-117.
Likert, R. (1932), A Technique for the Measurement of Attitudes, Archives of Psychology.
Linchi, K., Karen, L.X. and Tori, R. (2017), “Thematic framework of online review research – a systematic analysis of contemporary literature on seven major hospitality and tourism journals”, International Journal of Contemporary Hospitality Management, Vol. 29 No. 1, pp. 307-354.
María-Dolores, S.M.M., García, J.J.B. and Mellinas, J.P. (2012), “Los hoteles de la región de Murcia ante las redes sociales y la reputación online”, Revista de Análisis Turístico, Vol. 13, pp. 1-10.
Mariani, M.M., Borghi, M. and Gretzel, U. (2019), “Online reviews: differences by submission device”, Tourism Management, Vol. 70, pp. 295-298.
Martin-Fuentes, E. and Mellinas, J.P. (2018), “Hotels that most rely on Booking.com – online travel agencies (OTAs) and hotel distribution channels”, Tourism Review, Vol. 73 No. 4, pp. 465-479.
Martin-Fuentes, E., Fernandez, C., Mateu, C. and Marine-Roig, E. (2018), “Modelling a grading scheme for peer-to-peer accommodation: stars for Airbnb”, International Journal of Hospitality Management, Vol. 69, pp. 75-83.
Martin-Fuentes, E., Mateu, C. and Fernandez, C. (2018), “Does verifying users influence rankings? Analyzing Booking.com and TripAdvisor”, Tourism Analysis, Vol. 23 No. 1, pp. 1-15.
Mellinas, J.P., Martínez María-Dolores, S.-M. and Bernal García, J.J. (2015), “Booking.com: the unexpected scoring system”, Tourism Management, Vol. 49, pp. 72-74.
Mellinas, J.P. and Reino, S. (2018), “Neutrality in descriptions beside overall hotel scores”, Conference Proceedings The INC, pp. 60-62.
Mellinas, J.P. and Reino, S. (2019), “Average scores integration in official star rating scheme”, Journal of Hospitality and Tourism Technology, Vol. 10 No. 3.
Mellinas, J.P., Martínez María-Dolores, S.-M. and Bernal García, J.J. (2016), “Effects of the Booking.com scoring system”, Tourism Management, Vol. 57, pp. 80-83.
Muangon, A., Thammaboosadee, S. and Haruechaiyasak, C. (2014), “A lexiconizing framework of feature-based opinion mining in tourism industry”, 2014 4th International Conference on Digital Information and Communication Technology and Its Applications, DICTAP, pp. 169-173.
Nieto-Garcia, M., Resce, G., Ishizaka, A., Occhiocupo, N. and Viglia, G. (2019), “The dimensions of hotel customer ratings that boost RevPAR”, International Journal of Hospitality Management, Vol. 77, pp. 583-592.
Park, Y.A., Gretzel, U. and Sirakaya-Turk, E. (2007), “Measuring website quality for online travel agencies”, Journal of Travel & Tourism Marketing, Vol. 23 No. 1, pp. 15-30.
Parra, E., Mellinas, J.P., Martínez María-Dolores, S.-M., Bernal Garcia, J.J. and Gutiérrez-Taño, D. (2018), “Effects of reviews scales on hotel online reputation”, in XII Congreso Internacional de Turismo y Tecnologías de la información y las comunicaciones Turitec 2018, Malaga, pp. 98-116.
Patel, B.S., Varma, T. and Patel, P.S. (2015), “A survey on feature based opinion mining for tourism industry”, Journal of Engineering Computers & Applied Sciences, Vol. 4 No. 3, pp. 83-86.
Poggi, N., Moreno, T., Berral, J.L., Gavaldà, R. and Torres, J. (2007), “Web customer modeling for automated session prioritization on high traffic sites”, in Conati, C., McCoy, K. and Paliouras, G. (Eds), User Modeling 2007, Springer Berlin Heidelberg, pp. 450-454.
Pokryshevskaya, E.B. and Antipov, E.A. (2017), “Profiling satisfied and dissatisfied hotel visitors using publicly available data from a booking platform”, International Journal of Hospitality Management, Vol. 67, pp. 1-10.
Radojevic, T., Stanisic, N. and Stanic, N. (2015), “Solo travellers assign higher ratings than families: examining customer satisfaction by demographic group”, Tourism Management Perspectives, Vol. 16, pp. 247-258.
Schuckert, M., Liu, X. and Law, R. (2015), “A segmentation of online reviews by language groups: how English and non-English speakers rate hotels differently”, International Journal of Hospitality Management, Vol. 48, pp. 143-149.
Schütze, J. (2008), “Pricing strategies for perishable products: the case of Vienna and the hotel reservation system HRS”, Central European Journal of Operations Research, Vol. 16 No. 1, pp. 43-66.
Serra Cantallops, A. and Salvi, F. (2014), “New consumer behavior: a review of research on eWOM and hotels”, International Journal of Hospitality Management, Vol. 36, pp. 41-51.
Tassiello, V., Viglia, G. and Mattila, A.S. (2018), “How handwriting reduces negative online ratings”, Annals of Tourism Research, Vol. 73, pp. 171-179.
Vermeulen, I.E. and Seegers, D. (2009), “Tried and tested: the impact of online hotel reviews on consumer consideration”, Tourism Management, Vol. 30 No. 1, pp. 123-127.
Viglia, G., Minazzi, R. and Buhalis, D. (2016), “The influence of e-word-of-mouth on hotel occupancy rate”, International Journal of Contemporary Hospitality Management, Vol. 28 No. 9, pp. 2035-2051.
Wei, W., Miao, L. and Huang, Z.J. (2013), “Customer engagement behaviors and hotel responses”, International Journal of Hospitality Management, Vol. 33, pp. 316-330.
Worcester, R.M. and Burns, T.R. (1975), “Statistical examination of relative precision of verbal scales”, Journal of the Market Research Society, Vol. 17 No. 3, pp. 181-197.
Xiang, Z., Du, Q., Ma, Y. and Fan, W. (2017), “A comparative analysis of major online review platforms: implications for social media analytics in hospitality and tourism”, Tourism Management, Vol. 58, pp. 51-65.
Zhou, L., Ye, S., Pearce, P.L. and Wu, M.-Y. (2014), “Refreshing hotel satisfaction studies by reconfiguring customer review data”, International Journal of Hospitality Management, Vol. 38, pp. 1-10.
About the authors
Eva Martin-Fuentes is based at the Department of Business Management, Universitat de Lleida, Lleida, Spain.
Juan Pedro Mellinas is based at the Department of Business Management, Universidad Politecnica de Cartagena, Cartagena, Spain.
Eduardo Parra-Lopez is based at the Department of Business Management and Economic History, Facultad de Ciencias Economicas y Empresariales, Instituto de investigación social y turismo (ISTUR), Universidad de La Laguna, San Cristóbal de La Laguna.