Twitter's capacity to forecast tourism demand: the case of way of Saint James

Purpose – Radical changes in consumer habits induced by the coronavirus disease (COVID-19) pandemic suggest that the usual demand forecasting techniques based on historical series are questionable. This is particularly true for hospitality demand, which has been dramatically affected by the pandemic. Accordingly, we investigate the suitability of tourists' activity on Twitter as a predictor of hospitality demand in the Way of Saint James, an important pilgrimage tourism destination.

Design/methodology/approach – This study compares the predictive performance of the seasonal autoregressive integrated moving average (SARIMA) time-series model with that of the SARIMA with exogenous variables (SARIMAX) model to forecast hotel tourism demand. For this, 110,456 tweets posted on Twitter between January 2018 and September 2022 are used as exogenous variables.

Findings – The results confirm that the predictions of traditional time-series models for tourist demand can be significantly improved by including tourist activity on Twitter. Twitter data could be an effective tool for improving the forecasting accuracy of tourism demand in real-time, which has relevant implications for tourism management. This study also provides a better understanding of tourists' digital footprints in pilgrimage tourism.

Originality/value – This study contributes to the scarce literature on the digitalisation of pilgrimage tourism and forecasting hotel demand using a new methodological framework based on Twitter user-generated content. This can enable hospitality industry practitioners to convert social media data into relevant information for hospitality management.


Introduction
Research on the use of social media and social networking sites in hospitality and tourism has proliferated in recent years (Buhalis et al., 2017; Jamil et al., 2023; Kozak et al., 2018; Leung et al., 2021; Sigala, 2015). Social media information enables the analysis of user behaviour (Bigne et al., 2016; Küster Boluda et al., 2024; Navío-Marco et al., 2018; Payntar et al., 2021), accelerates the knowledge transfer process, provides a direct link between users and knowledge (Abdollahi et al., 2023; Rita et al., 2022) and helps analyse the relationship between brand equity and social media intensity (Stojanovic et al., 2018).
More recently, social media has begun to be used as a data source for estimating tourism demand, although such use remains incipient. Li et al. (2021), in their review of tourism and hospitality forecasting research using Internet data, have identified only ten studies adopting social media data for forecasting. Since then, studies using social media data to improve predictions in the field of tourism have been increasing (e.g. Hu et al., 2022; Li et al., 2022; Sulong et al., 2022). Regarding Twitter, Bigné et al. (2019) have extracted relevant information from this application to determine how destination marketing organisation (DMO) activities on Twitter affect hotel occupancy forecasting. Assaf et al. (2022), in their investigation to establish an expert-informed agenda for future research on tourism after COVID-19, have considered forecasting an area in which to progress, including the use of scenario forecasts using judgemental and econometric methods based on big data, tourism portals and social media. Several scholars have observed that during and after the pandemic, tourist demand was seriously impacted and the traditional methods of forecasting in these industries have become obsolete (Song and Li, 2021; Utkarsh and Sigala, 2021). Researchers have now begun to seek the best methods to predict the recovery of tourism from the devastating effects of COVID-19 (Polyzos et al., 2020; Zhang et al., 2021).
The relationship between tourism and pilgrimages has been studied in a fragmentary manner (Caber et al., 2021), despite the growing economic importance of this kind of tourism [1]. While motivations and experiences have been analysed (Terzidou et al., 2018), limited attention has been paid to behaviour on online platforms and digital devices (de Ascaniis et al., 2019).
Accordingly, this study aims to fill the gap in the scarce literature on pilgrims' use of social networks and the suitability of user-generated data for accurately predicting hotel demand. The contribution of this research is threefold. First, it evaluates Twitter as a tool for predicting demand (in this case, for pilgrimage tourism to "the Way") and provides insights into the time lag between tweets and demand manifestation. Second, it sheds light on the changes in hospitality demand and explores new forecasting approaches for estimating tourism demand during tumultuous times. Third, it provides new data on the digital footprint of pilgrimage tourism, an area where research is also very scarce.
As a research question, this study examines how hotel demand at a tourist destination can be accurately predicted using Twitter data. In particular, it analyses an international destination of special interest for pilgrimage tourism, namely, Santiago de Compostela, Spain. We assess the predictive performance of the seasonal autoregressive integrated moving average (SARIMA) time-series model with and without the Twitter activity of pilgrims, considering the lagged effect of Twitter data and external factors, such as the Jubilee year in Santiago de Compostela. The analysis covers tourism demand from January 2018 to September 2022 and uses 110,456 tweets posted during that period.
The remainder of this study is structured as follows: Section 2 briefly reviews the literature on techniques for forecasting tourism demand, the use of social network data for forecasting and the digital footprint of pilgrimage tourism. Section 3 presents the empirical analysis, including descriptions of the data and methodology. Section 4 presents and discusses the results. Finally, Section 5 presents the conclusion, major theoretical and managerial implications, study limitations and new avenues for future research.

Literature review

Tourism demand forecasting
Demand forecasting is essential for the hospitality and tourism sectors because of the transient nature of tourism. Therefore, growing interest in tourism demand forecasting is reflected in the literature. Several studies have assessed the performance of different sources of big data generated on the internet for forecasting tourism demand (Li et al., 2021; Mariani and Baggio, 2022; Stylos et al., 2021).
Time-series models have gained increasing acceptance in the literature on tourism demand forecasting (Huang and Zheng, 2023; Teixeira and Gunter, 2023; Wu et al., 2023). This is mainly because of their ability to forecast future values by identifying historical patterns and capturing seasonality and trends in time series (Ma et al., 2023). However, in recent literature, a trend has emerged to incorporate exogenous explanatory variables into time-series models for predicting tourism demand (Hu et al., 2023; Jiao and Chen, 2019; Li et al., 2023a). Thus, SARIMAX models have gained importance among academics, especially after the COVID-19 pandemic. They improve the performance of pure time-series forecasting models during turbulent periods and allow the incorporation of exogenous variables with real-time information. For example, researchers have compared the performance of SARIMA models with exogenous variables using information collected from search engines (Li et al., 2023b; Wickramasinghe and Ratnasiri, 2021), online news (Park et al., 2021) and online reviews (Hu et al., 2022; Li et al., 2023a, b). The results confirm that the incorporation of this type of big data generated on the internet is useful for forecasting tourist demand for destinations or companies.

Social media as a source of prediction data
Studies have demonstrated that social media data measure people's attention and sentiments and provide real-time insights to predict consumer demand in different research areas, including economics and management. The main areas covered include the following: (a) stock market performance accurately predicted based on investors' opinions on social media (Guan et al., 2022; Nofer and Hinz, 2015; Yang et al., 2020), (b) transport and power demand predicted using real-time data from social media (Luna, Nunez-del-Prado, Talavera and Holguin, 2017; Punel and Ermagun, 2018; Roy et al., 2021) and (c) crude oil prices predicted with social media data during periods of sharp fluctuations caused by conflicts or political instability (Elshendy et al., 2017; Wu et al., 2021).
Regarding Internet-structured data in tourism, search engine data (Bangwayo-Skeete and Skeete, 2015; Choi and Varian, 2012; Wu et al., 2022) and web traffic data (Gunter and Önder, 2016) have been widely used to forecast tourism demand. Conversely, social media data are unstructured and require crawler tools to collect and apply big data techniques for extracting useful information from online textual data or images, thereby making them relatively less popular (Li et al., 2021).
Focusing on Twitter, tourism studies have utilised this data source for sentiment analysis to identify tourist preferences and opinions on tourist services (Nadeau et al., 2022; Philander and Zhong, 2016), geographic information (Chua et al., 2016; Piramanayagam and Seal, 2022; Xin and MacEachren, 2020), promotion of tourist attractions (Bokunewicz and Shulman, 2017; Meehan et al., 2016) and international trade show organisation (Geldres-Weiss et al., 2023). However, only a few studies have analysed the usefulness of big data from Twitter to analyse tourism demand (e.g. Bigné et al., 2019; Sulong et al., 2022; Yang et al., 2022) and define management approaches and business responses to the COVID-19 pandemic in real-time (Chen et al., 2023; Yang and Han, 2021). Previous literature has recognised Twitter's representativeness as a concern (Beninger and Lepps, 2014), but some authors recognise its interest if a contextual interpretation is made (Tromble, 2019). Twitter data differ in nature from data collected through traditional quantitative methods, such as surveys or experiments (Chen et al., 2022). Survey data are controlled and designed by researchers, while social media data can be considered organic data (Groves, 2011). The concept of organic data refers to data that are not collected following an explicit research design but documented using a technology that collects natural "digital footprints" of human activities, such as data from sensor devices, mobile applications or online social networks (Xu et al., 2020).
According to Xu et al. (2020), the advantages of these data coexist with challenges regarding data quality that researchers must consider because of their organic nature. First, data quality is more likely to be guaranteed in surveys and experiments because researchers have more control over which participants are recruited and what questions to ask. However, the emergent nature of social media discussions offers researchers opportunities to identify new perspectives and frameworks not previously identified (Klašnja et al., 2018). Although researchers have more control over the data generation process in surveys and experiments, it is expensive to collect surveys. Furthermore, organic data generated on social networks allow information to be extracted in real-time. Traditionally, hotel demand forecasts have been based solely on government statistical reports published annually or monthly (Huang et al., 2017). Nevertheless, hospitality industry professionals need up-to-date information to adjust to changes in tourism demand in real-time and achieve greater efficiency in the sector.
Newness is a strength of social media data, which is especially useful for studying emerging topics. The novelty of the data brings with it a data quality challenge that requires researchers to develop methods to indirectly assess user characteristics, such as user identity and motivations. Similarly, numerous authors have indicated that the pandemic has called into question traditional forecasting methods because data from official sources with guaranteed representativeness are not available in real-time, which makes it even more interesting to explore new data sources that are open and original, as done in this study.

Pilgrimage tourism's digital footprint
Literature on the digital aspects of pilgrimage tourism is scarce, recent and focused on human mobility (Barnett et al., 2016). De Ascaniis et al. (2019) have reviewed 13 academic papers and identified the following four themes: the adoption of information and communication technology (ICT) by religious travellers, usage and functionalities of mobile applications, online travel reviews to understand visitors' experiences at religious sites and online transmission of religious mass events. Research interest in religious tourists' behaviour on digital platforms, such as social media and social networking sites, remains incipient. Caber et al. (2021) have identified a few early works, such as Haq and Jackson (2009) investigating the impact of ICTs on religious tourists' perceptions and Park et al. (2015) surveying American participants to gauge their interest in visiting pilgrimage destinations and willingness to share their experiences on social networking sites.
"The Way" is a pilgrimage tourism destination that generates both religious and tourist interest worldwide (López et al., 2017). Vila et al. (2020) have indicated that religious or spiritual motivation is present but interlinked with other motivations, such as heritage, culture and experience. "The Way" is an international and multiconfessional space where pilgrims and tourists interact to co-create the route's postmodern identity and personality (López and Lois González, 2020). Pilgrims in "the Way" benefit from using mobile phones while walking (Antunes and Amaro, 2016; Nickerson et al., 2014). Fernández-Poyatos et al. (2012) have studied the presence of "the Way" on regional tourism websites in Spain, while Vázquez et al. (2020) have analysed the usage and effectiveness of Facebook fan pages of institutions in Spanish regions through which the French Way of Saint James passes for tourism promotion. No other research has been conducted on social media use pertaining to this topic.
Pilgrimage tourism, gaining popularity since the COVID-19 outbreak, has demonstrated great resilience during the pandemic (Lin and Hsieh, 2022; Mittal and Sinha, 2021). As outdoor activities, pilgrimage routes can provide a safe environment and improve tourist well-being, offering an alternative to mass tourism (Lin et al., 2022). Therefore, tourist destinations have used religious tourism as a key market segment to mitigate disruptions in tourism demand caused by the COVID-19 pandemic (Mittal and Sinha, 2021). In fact, pilgrimage tourism is positioned as a novel travel trend in tourism in the "new normal" (Campos et al., 2022). This makes research that combines tourist demand, social media and pilgrimage tourism particularly interesting.

Empirical analysis

Data
Pilgrimage tourism is in a state of rejuvenation and is gaining importance among various tourism segments (Collins-Kreiner, 2020). This empirical analysis investigates the relationship between the digital footprint of pilgrims on "the Way" and hotel tourism demand for Santiago de Compostela. "The Way" is a major European pilgrimage itinerary recognised as the first European Cultural Route by the Council of Europe. Figure 1 presents the international dimensions of Santiago de Compostela as a tourist destination in 2019 (the year before the COVID-19 pandemic). Graph A reveals that foreign tourism represents 45.5% of the total hospitality demand, whereas Graph B reveals the distribution of international tourism demand by country of origin. The USA, Italy, Germany, Portugal, France and the UK generated 55.5% of international tourism demand.
Figure 2 depicts the framework used in this study to predict tourism demand in Santiago de Compostela based on big data generated on Twitter by pilgrims to the Saint James Way. It presents the data sources, data collection, model specifications and processes used in the empirical analysis.

Twitter and tourism demand
As shown in Figure 2, the tourism demand for Santiago de Compostela is measured using the total number of tourists staying in hotel accommodations (TOUR). Monthly tourist arrivals are collected from the Hotel Occupancy Survey (HOS), published by the Spanish National Statistics Institute (INE) since 1996. It provides disaggregated information on travellers by country of origin and destination (regions, provinces and tourist sites). This measure includes the total number of travellers arriving by any means of transportation and staying in an establishment that provides hotel accommodation services (hotels, aparthotels, motels, hostels, B&Bs, pensions and guesthouses).
Figure 2 shows the digital footprint of tourists on Twitter as a secondary source of data. A crawler created with the programming language Python is used to extract the digital footprints of tourists on Twitter. Specifically, a script is designed to collate tweets posted with target hashtags using Twitter API V2. As Santiago de Compostela is an international pilgrimage destination, the decision to use hashtags was supported by an exhaustive search for hashtags related to tourism. Previous literature has supported the idea that the use of hashtags on Twitter is a powerful and helpful source of data (Geldres-Weiss et al., 2023; Wang et al., 2016). According to Carvache-Franco et al. (2023), using hashtags to gather information is advantageous because it allows the concentration of users' opinions on a specific topic. Although the use of hashtags may exclude some data, it also helps avoid irrelevant data.

EJMBE
Twitter is a massive platform with a large amount of noisy and irrelevant data. Using hashtags helps categorise topics, making it easier to identify users who are talking about the same topic (Bruns and Burgess, 2011). Using hashtags also allows us to filter this noise and focus on the data most relevant to our study. All hashtags included in tweets published during the study period that contained the key search "Santiago de Compostela" were identified. By comparing the most repeated hashtags related to tourism for this destination, the following categories were identified: (1) "Saint James Way", (2) "Pilgrims" and "Pilgrimage" and (3) "Xacobeo" and "Jacobeo".
We excluded hashtags related to "Pilgrims" and "Pilgrimage" because they could include tweets pertaining to other pilgrimage destinations. However, tweets pertaining to St. James Way and Xacobeo were exclusive to tourism in Santiago de Compostela. Therefore, a combination of the 20 most published hashtags related to categories (1) and (3) in Spanish, English, German, French and Portuguese was selected (Table 1). These languages were selected because countries with these languages as their native languages represented 75% of hotel tourism demand in Santiago de Compostela in 2019.
After eliminating duplicate retweets, 110,456 tweets remained, and the monthly number of these tweets was used to derive the explanatory variable, Twitter Data (TD). According to Guizzardi and Mazzocchi (2010), factors that occur at a specific moment in time, such as the Jubilee Year, can determine short- or long-term modifications in tourist flow. Therefore, a temporary dummy was created to control for the effect of an extraordinary increase in tourism demand in 2021 and 2022, the Jubilee years in Santiago de Compostela (Compostela Holy Year, Xacobeo Year or Jacobeo Year). This variable takes the value of one for 2021 and 2022 and zero otherwise.
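The construction of the two explanatory inputs described above can be sketched as follows. This is a minimal illustration, not the authors' crawler: the record layout, the sample tweets, the function names `monthly_tweet_counts` and `jubilee_dummy`, and the rule of keeping only the first retweet of each original are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical tweet records: (tweet_id, retweeted_id or None, ISO timestamp).
tweets = [
    ("1", None, "2021-06-03T10:15:00"),
    ("2", "1", "2021-06-04T08:00:00"),  # first retweet of tweet 1
    ("3", "1", "2021-06-05T09:30:00"),  # second retweet of tweet 1, treated as a duplicate
    ("4", None, "2021-07-01T12:00:00"),
    ("5", None, "2019-07-20T18:45:00"),
]


def monthly_tweet_counts(records):
    """Count tweets per (year, month), skipping repeated retweets of the same original.

    Assumption: "duplicate retweets" means every retweet of an original after the first.
    """
    seen_retweets = set()
    counts = Counter()
    for _tweet_id, retweeted_id, stamp in records:
        if retweeted_id is not None:
            if retweeted_id in seen_retweets:
                continue  # duplicate retweet, not counted
            seen_retweets.add(retweeted_id)
        ts = datetime.fromisoformat(stamp)
        counts[(ts.year, ts.month)] += 1
    return counts


def jubilee_dummy(year, jubilee_years=(2021, 2022)):
    """Return 1 for the Jubilee years named in the text (2021 and 2022), else 0."""
    return 1 if year in jubilee_years else 0


td = monthly_tweet_counts(tweets)
for (year, month), n_tweets in sorted(td.items()):
    print(year, month, n_tweets, jubilee_dummy(year))
```

Each printed row pairs a monthly TD value with the dummy for that month, which is the shape of input an exogenous-variable model would expect.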

Methodology
In this study, we compare two ARIMA-based forecasting models (SARIMA and SARIMAX models) to evaluate the appropriateness of using user-generated content on social media to improve the predictive capacity of time-series models during periods of turmoil. In this exploratory case, we forecast monthly tourism demand for the internationally known destination of Santiago de Compostela.
Comparing these models aligns with the goal of achieving accurate predictions given the specific characteristics of our dataset. We aim to capture the effects of exogenous shocks within the SARIMA framework. To achieve this, we compare the predictive capacity of the pure SARIMA time-series model with that of the SARIMA model with exogenous variables (SARIMAX).
The SARIMA model was selected because of its various statistical advantages, supported by previous research on tourism demand forecasting (Qiu et al., 2021; Song et al., 2019). According to Song et al. (2019), the SARIMA model is the most commonly used model in tourism research because it considers the trends and/or seasonality components of a time series. Additionally, the parsimonious structure of the SARIMA models balances complexity and performance (Lama et al., 2022; Saz, 2011).
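The SARIMA (p,d,q)(P,D,Q) model is as follows:

$$\phi_p(B)\,\Phi_P(B^m)\,(1-B)^d\,(1-B^m)^D\,y_t = \theta_q(B)\,\Theta_Q(B^m)\,\varepsilon_t \qquad (1)$$

where $y_t$ expresses the tourism demand at time t; B is the backshift operator and m is the seasonal period; the autoregressive (AR) and moving average (MA) components are represented by $\phi$ and $\theta$ of orders p and q, respectively; $\Phi(B^m)$ and $\Theta(B^m)$ denote the seasonal AR(P) and seasonal MA(Q) components, respectively; $(1-B)^d$ and $(1-B^m)^D$ represent the difference and seasonal difference operators, respectively; and $\varepsilon_t$ expresses the white noise error term. Using a linear regression, external variables can be added to the SARIMA model to create a SARIMAX model. Eq. (2) indicates that SARIMAX is a regression model with SARIMA errors where the regression is first conducted.

$$y_t = \sum_k \beta_k X_t^k + \frac{\theta_q(B)\,\Theta_Q(B^m)}{\phi_p(B)\,\Phi_P(B^m)\,(1-B)^d\,(1-B^m)^D}\,\varepsilon_t \qquad (2)$$

where $X_t^k$ is the exogenous variable at time t and $\beta_k$ is the corresponding coefficient of the exogenous variable added to the parameters of the SARIMA model described above.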
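To make the operator notation concrete, the following worked expansion writes out one simple case. The orders (1,1,0)(0,1,0) with a seasonal period of 12 months are chosen purely for illustration and are not the orders selected in this study; $x_t$ and $\beta$ stand for a single hypothetical exogenous regressor and its coefficient.

```latex
% Illustrative orders only: SARIMA(1,1,0)(0,1,0) with seasonal period m = 12.
% B is the backshift operator, B^k y_t = y_{t-k}.
\begin{align*}
(1-\phi_1 B)(1-B)(1-B^{12})\,y_t &= \varepsilon_t \\
w_t := (1-B)(1-B^{12})\,y_t &= y_t - y_{t-1} - y_{t-12} + y_{t-13} \\
w_t &= \phi_1 w_{t-1} + \varepsilon_t \\
% With one exogenous regressor (regression with SARIMA errors):
y_t &= \beta x_t + u_t, \qquad (1-\phi_1 B)(1-B)(1-B^{12})\,u_t = \varepsilon_t
\end{align*}
```

In words, the doubly differenced series $w_t$ follows a first-order autoregression, and in the exogenous case the same structure applies to the regression error $u_t$ rather than to $y_t$ itself.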
To validate the models and assess their respective predictive capacities, we fit the models with data from January 2018 to December 2021 and use those from January 2022 to September 2022 to test the accuracy of the predictions. To evaluate the forecast accuracy of the models, we use the following common evaluation measures from tourism and hospitality forecasting research: the mean absolute error (MAE) and root mean square error (RMSE), calculated using Eqs (3) and (4).

$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|\hat{y}_t - y_t\right| \qquad (3)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(\hat{y}_t - y_t\right)^2} \qquad (4)$$

where $\hat{y}_t$ and $y_t$ are the predicted and actual values representing tourism demand in Santiago de Compostela, respectively, and n is the number of periods evaluated.
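The two accuracy measures can be computed as in the short sketch below. The function names `mae` and `rmse` and the three-month demand figures are hypothetical and serve only to show the arithmetic; they are not values from the study.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical hold-out months (illustrative numbers only).
actual = [100.0, 120.0, 90.0]
predicted = [110.0, 115.0, 95.0]

print(round(mae(actual, predicted), 3))   # absolute errors 10, 5, 5 -> 6.667
print(round(rmse(actual, predicted), 3))  # squared errors 100, 25, 25 -> 7.071
```

Because RMSE squares the errors before averaging, the single 10-unit miss weighs more heavily than in MAE, which is why the two measures can rank models differently.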

Results and discussion
An exploratory analysis during the fitting period reveals that the variable TD displays the same trend as the variable TOUR, which denotes the volume of tourists staying in hotels in Santiago de Compostela; however, the peaks of the former occur one month earlier than those of the latter (see Figure 3). This indicates that tourists' Twitter activity is a good predictor of hotel demand. Tourism demand has a high seasonal component, which is adjusted according to the model specifications. The augmented Dickey-Fuller (ADF) and Phillips-Perron (PP) unit root tests confirm the presence of a unit root in the dependent and independent variables at the 1% significance level. Therefore, the first differences of all the variables are considered to ensure a stationary series. Correlograms and partial autocorrelation functions are examined to determine the appropriate order of the AR and MA components.
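The differencing step mentioned above is simple to state in code. The sketch below is illustrative only: the helper name `difference` and the sample series are hypothetical, and the unit root tests themselves are not reproduced here.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: series[t] - series[t - lag]."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be at least 1 and shorter than the series")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical monthly values (illustrative numbers only).
sample = [10, 12, 15, 14, 18]

first_diff = difference(sample)       # month-on-month changes
lag2_diff = difference(sample, lag=2)  # changes over two months

print(first_diff)
print(lag2_diff)
```

With monthly data, calling `difference(series, lag=12)` would give the seasonal difference, which corresponds to the seasonal differencing operator in the SARIMA specification.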
To analyse the dynamic structure of Twitter data for forecasting tourism demand, we use the Akaike information criterion (AIC) and Schwarz Bayesian information criterion (SBIC) to determine the monthly lagged distribution of the explanatory variable. The results indicate that the optimal lag length for the independent variable is two months. Additionally, the Granger causality test confirms a causal relationship between hotel demand and tourists' Twitter activity.
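Lag selection by information criterion can be illustrated with a deliberately simplified sketch. Instead of the SARIMAX setting used in the study, it fits plain least-squares regressions of a synthetic series on a growing number of lags of a synthetic predictor and compares their AIC values. The helpers `solve` and `aic_for_lags` are hypothetical names, and the data are constructed so that exactly two lags matter.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def aic_for_lags(y, x, n_lags, max_lag):
    """AIC of an OLS regression of y[t] on a constant and x[t-1], ..., x[t-n_lags].

    Every candidate uses the observations t >= max_lag, so AIC values are comparable.
    """
    rows = [[1.0] + [x[t - k] for k in range(1, n_lags + 1)]
            for t in range(max_lag, len(y))]
    target = y[max_lag:]
    p = n_lags + 1  # intercept plus one coefficient per lag
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(p)]
    beta = solve(xtx, xty)
    rss = sum((v - sum(b * ri for b, ri in zip(beta, r))) ** 2
              for r, v in zip(rows, target))
    n = len(target)
    return n * math.log(rss / n) + 2 * p


# Synthetic data (assumption): y depends on the first and second lags of x plus small noise.
max_lag = 4
n_obs = 55 + max_lag
x = [float((4 * t) % 11) for t in range(n_obs)]
noise = [0.1 * ((7 * t) % 5 - 2) for t in range(n_obs)]
y = [0.0, 0.0] + [2.0 * x[t - 1] + 3.0 * x[t - 2] + noise[t] for t in range(2, n_obs)]

aic = {k: aic_for_lags(y, x, k, max_lag) for k in range(1, max_lag + 1)}
best_lag = min(aic, key=aic.get)
print(best_lag)
```

The design point worth noting is the common estimation sample: trimming every candidate to the same observations keeps the criterion from favouring shorter lag lengths merely because they retain more data.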
Table 2 presents the forecast errors of the in-sample estimation and the improvement achieved in the final SARIMAX model compared to the SARIMA model [2]. The results indicate that including exogenous variables improves the SARIMA model's fit by 5.75 and 9.05% for the MAE and RMSE evaluation measures, respectively. The performance of the out-of-sample prediction summarised in Table 3 confirms a significant improvement in the SARIMAX model by 20.3 and 18.0% when using the MAE and RMSE evaluation measures, respectively. The robustness of the analysis is tested by modifying the fitting periods of the models and comparing their predictive performance after including Twitter data. This analysis confirmed the goodness of fit of the results.
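The relative improvement reported here follows the ratio defined in Note 2, expressed as a percentage. A minimal sketch is given below; the function name `improvement_pct` and the error values are hypothetical and are not the figures from Tables 2 and 3.

```python
def improvement_pct(metric_sarima, metric_sarimax):
    """Percentage reduction in an error metric when moving from SARIMA to SARIMAX."""
    if metric_sarima == 0:
        raise ValueError("the baseline metric must be non-zero")
    return (metric_sarima - metric_sarimax) * 100.0 / metric_sarima


# Hypothetical error values (illustrative only).
print(improvement_pct(1000.0, 800.0))  # a drop from 1000 to 800 is a 20% improvement
print(improvement_pct(500.0, 550.0))   # a negative value means SARIMAX performed worse
```

A positive value therefore indicates that adding the exogenous variables lowered the error, and the same function applies to either MAE or RMSE.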
Consistent with Yang and Han (2021), this study provides novel perspectives for practitioners to gain relevant hospitality business insights using social media data. Our results' alignment with those of previous studies verifies the utility of using Twitter to improve hotel demand forecasts, as in Bigné et al. (2019), and confirms a significant improvement in prediction accuracy, even during the pandemic, with the inclusion of new real-time data sources. Similarly, Hu et al. (2022) report that incorporating online review data improves the MAE of forecast models by between 2.97 and 6.19% and the RMSE by between -3.41 and 7.98%.
Moreover, our results confirm the importance of the lag structure of data sources in forecasting research, allowing tourism companies and policymakers to accurately anticipate future tourism demand. According to the results of our research, the Twitter activity of pilgrims from the previous two months can help hospitality companies predict the tourism demand for the Saint James Way.
Figures 4 and 5 illustrate the actual and predicted tourism demand for Santiago de Compostela using the SARIMA and SARIMAX models, respectively. The evaluation measures of the SARIMA model and prediction accuracy shown in Figure 4 confirm that pure autoregressive models are inefficient in forecasting tourism demand during and after the pandemic. Therefore, we propose that researchers and stakeholders use Twitter activity data to accurately predict tourism demand (see Figure 5).

Conclusions
The pilgrim's footprint when walking "the Way" becomes a digital footprint in the 21st century. Our investigation contributes both to the scarce literature on digital pilgrimage tourism and research on forecasting hotel demand by proposing a new methodological framework based on user-generated content on Twitter for the case of the internationally known pilgrimage destination "the Way of Saint James". This study demonstrates the importance of regularly refining forecasting methods using new data sources available in the digital world for effective forecasting. Thus, some theoretical implications are derived from this study. First, it improves our understanding of the usefulness of social networks, particularly Twitter, in forecasting tourism models. Second, it identifies the time lag between user information generated on Twitter and consumer demand. Third, it connects the digitalisation of pilgrimage tourists with the use of social networks and digital footprints.
We agree with Gunter and Önder (2015), who suggest that an accurate prediction of the number of tourists visiting a destination has implications for tourism management, such as sustaining tourism demand and efficient planning to accommodate tourists. This study has three primary managerial implications. First, the possibility of accurately predicting tourism demand from publicly shared information by pilgrims can improve hotel management efficiency at tourist destinations and prevent hotel oversupply or undersupply. Second, our findings indicate that content published on Twitter during the previous two months is significant for forecasting hotel demand in Santiago de Compostela. Consistent with Huang et al. (2017) and Liu et al. (2018), the lag time structure of the data enables a better prediction of the demand and management of tourist destinations. This is because it allows the number of visitors to a destination to be known before they arrive. Finally, the COVID-19 pandemic has generated instability in tourism demand, induced by perceived health risks and government-imposed mobility restrictions, forcing managers to modify demand predictions frequently. Therefore, this study provides stakeholders with a methodological framework to accurately forecast real-time tourism demand and anticipate changes during times of crisis and instability.
In summary, Twitter offers two primary practical advantages for tourism management in Santiago de Compostela. First, it provides real-time information, which is particularly important during periods of uncertainty and volatility, such as those caused by the COVID-19 pandemic. Second, it helps accurately predict tourism demand, which can improve the tourism industry's efficiency. Therefore, this study recommends that stakeholders and decision-makers use Twitter as a new source of big data because it can serve as a leading indicator of changes in tourism demand.
This study has some limitations, the main one being its exploratory nature because it is limited to a single destination. One limitation of sampling our data using hashtags is that relevant tweets posted without a hashtag would be ignored. However, the results obtained make it advisable to replicate the study in other tourism environments to observe the feasibility of using Twitter as a source for forecasting tourism demand, especially considering that some of the trends found in this study are promising.
Nevertheless, the exploratory nature of this study does not detract from the relevance of its results, in which we are able to identify opportunities for Santiago de Compostela hotel demand planning. Furthermore, this study is limited to the pilgrimage destination of the Saint James Way and the results for other destinations should be cross-checked in future studies. Thus, the application of the Twitter-based forecasting method to other destinations is a clear avenue for future research.
In any case, we consider that our findings represent a step forward in the search for new forecasting methods that work even in the event of strong demand shocks, such as those caused by the COVID-19 pandemic, and in understanding the relationship between social media data and pilgrimage tourism demand.
Notes
1. The United Nations World Tourism Organization estimates that 330 million people travel for religious reasons each year (https://www.unwto.org). Additionally, it is estimated that global income from religious tourism will increase from a total of $15.1 billion in 2023 to approximately $41 billion in 2033, according to the market analysis firm Future Market Insights (https://www.futuremarketinsights.com).
2. The improvement achieved using Twitter data is measured as follows:

$$\text{Improvement} = \frac{\text{Evaluation Metric (SARIMA)} - \text{Evaluation Metric (SARIMAX)}}{\text{Evaluation Metric (SARIMA)}}$$

Figure 1. Volume and distribution of tourism demand in Santiago de Compostela in 2019 (pre-COVID-19)
Figure 2. Framework for tourism demand predictions based on Twitter data

Table note(s): The values in italic indicate the model with the best evaluation metric. Source(s): Table by authors

The results answer the research questions and confirm our initial assumptions. With an improvement of between 18.0 and 20.3%, depending on the evaluation metric, pilgrim-generated digital content on social media can be used to improve the predictive capacity of time-series models. We agree with Zhang et al. (2021) in that hospitality companies' business planning, including budgeting, resource allocation and marketing, is based on demand forecasts. Consistent with Li et al. (2022), we avoid inaccurate predictions that could result in a supply-demand mismatch of tourism services, significantly affecting management, efficiency, productivity and the tourism sector's profitability. Therefore, this study makes a timely contribution to model development in tourism demand forecasting by proposing Twitter data as an exogenous variable to generate more accurate forecasts. Additionally, the results verify the lag-time structure of Twitter data, enabling the anticipation of changes in tourism demand during uncertain periods.