Housing search activity and quantiles-based predictability of housing price movements in the USA

Purpose – Recent evidence from a linear econometric framework suggests that housing search activity, captured from Google Trends data, can predict housing returns for the USA at a national and regional (metropolitan statistical area [MSA]) level. Based on search theory, the authors, however, postulate that search activity can also predict housing returns volatility. This study aims to explore the possibility of using online search activity to predict both housing returns and volatility.
Design/methodology/approach – Using a k-th order non-parametric causality-in-quantiles test allows us to test for predictability in a robust manner over the entire conditional distribution of both housing price returns and its volatility (i.e. squared returns) by controlling for nonlinearity and structural breaks that exist in the data.
Findings – The analysis over the monthly period of 2004:01 to 2021:01 produces results indicating that while housing search activity continues to predict aggregate US house price returns, barring the extreme ends of the conditional distribution, volatility is relatively strongly predicted over the entire quantile range considered. The results carry over to an alternative (the generalized autoregressive conditional heteroskedasticity-based) metric of volatility, higher (weekly)-frequency data (over January 2018 to March 2021) and to over 84% of the 77 MSAs considered.
Originality/value – To the best of the authors' knowledge, this is the first study regarding predictability of overall and regional US housing price returns and volatility using search activity, based on a non-parametric higher-order causality-in-quantiles framework, which is insightful to investors, policymakers and academics.


Introduction
In a recent paper, Møller et al. (2023) provide statistical evidence in favour of the hypothesis that online housing search activity (measured by a housing search index [HSI] obtained from Google Trends data) [1], which captures people's intentions of buying a house and hence proxies for housing demand, contains predictive information for housing price returns for the overall USA and its regions. This is not surprising, since an increase in search activity is propagated into future periods, which, given various frictions in the housing market, would imply sluggish price adjustment in response to an increase in demand, so that search activity should hold predictive power for future variation in house prices. In this regard, the reader is referred to the conclusions drawn from the theoretical search-based models of Berkovec and Goodman (1996), Genesove and Han (2012) and Carrillo et al. (2015).
Furthermore, Ngai and Sheedy (2022), extending the earlier works of Díaz and Jerez (2013), Ngai and Sheedy (2020) and Smith (2020), used a calibrated search and matching model with both endogenous inflows (new listings) and outflows (sales) to show that a single persistent housing demand shock induces more moving and increases the supply of houses on the market and, hence, can quantitatively match the data on the volatility of various housing market variables, including housing price returns variability. In other words, we can postulate that the HSI of Møller et al. (2023) should contain predictive information not only for house price returns but also for their volatility.
To test our proposition, we use the k-th order non-parametric causality-in-quantiles framework of Balcilar et al. (2018). This econometric model allows us to test the predictability of the entire conditional distributions (capturing regimes) through quantiles of both housing price returns and squared returns, i.e. volatility, simultaneously, by controlling in a non-parametric manner for misspecification due to uncaptured non-linearity and regime changes in the relationship with the HSI, both of which we show to exist in our data set via formal statistical tests. As our focus in this paper is on volatility, being an extension of the work of Møller et al. (2023), to check for the robustness of our results we also apply the first-order version of the test to the conditional volatility as captured by the generalized autoregressive conditional heteroskedasticity (GARCH) model of Bollerslev (1986), which is a well-established approach to obtain an estimate of model-based volatility. While the primary focus is the aggregate US housing price returns and its volatility, just as in Møller et al. (2023), we also analyse the predictive impact of the HSI for the first and second moments of housing prices of 77 metropolitan statistical areas (MSAs), as it is well known that the US housing market is highly segmented (Gupta et al., 2023). Based on data availability, we conduct these predictive experiments over the monthly period of 2004:01 to 2021:01.
Statistically speaking, US residential real estate represents about 85.00% of total household non-financial assets, 32.56% of total household net worth and 35.10% of US net wealth (financial accounts of the USA, Second Quarter, 2023) [2]. Hence, it is not surprising that housing price movements have been historically associated with aggregate and regional business cycles [Balcilar et al. (2014), Apergis et al. (2015), Nyakabawo et al. (2015), Emirmahmutoglu et al. (2016) and Payne and Sun (2023)]. Naturally, predicting the future path of housing price returns and its volatility contingent on the information content of the HSI in our current context is of immense value, not only to real estate consumers and investors but also to policymakers. Understandably, information on the evolution of house price movements at a higher frequency would be of immense value in making timely portfolio decisions (Bollerslev et al., 2016; Nyakabawo et al., 2018), and in particular to policy authorities from the perspective of nowcasting (Bańbura et al., 2011), which will assist in the design of monetary and fiscal responses ahead of time to prevent possible recessions (Balcilar et al., 2020, 2021; Bouri et al., 2021). Hence, we also conduct our analysis at a weekly frequency over the period of January 2018 to March 2021.
To the best of our knowledge, this is the first paper that evaluates the predictive power of HSI for overall and regional US housing price returns and volatility based on a non-parametric higher-order causality-in-quantiles framework, with the hypothesis derived from Ngai and Sheedy (2022), who show that a single persistent housing demand shock induces more moving and increases the supply of houses on the market and, hence, can cause housing price returns variability. In the process, we add to the large existing literature on predicting the first and second moments of US house prices using various types of econometric models and predictors, the review of which is beyond the scope and objective of this paper; the reader is referred to the recent works of Bork and Møller (2015), Bork et al. (2020), Segnon et al. (2021) and Gupta et al. (2022) for this purpose.
The remainder of the paper is structured as follows: Section 2 describes the data used for our analysis and outlines the methodology. Section 3 presents the findings, and Section 4 concludes the paper.

Data sets
This sub-section provides specific details concerning the data set used in the main analysis. Furthermore, it presents an overview of the econometric methodology implemented to perform our investigation.
As mentioned above, we make use of a newly developed HSI introduced by Møller et al. (2023) to test the possibility of using online search activity to predict housing returns and volatility for the aggregate USA and for 77 MSAs [3]. The HSI is constructed using Google Trends data to quantify internet search activity related to housing demand. Google Trends data are available from 2004 onwards, resulting in a sample period of 2004:01 to 2021:01 at the monthly frequency. To obtain a measure of housing demand, Møller et al. (2023) initially used "buying a house" as their main search term and subsequently used a list of 22 related terms, namely: "when buying a house", "buying a home", "buy a house", "mortgage", "buying a new house", "before buying a house", "how to buy a house", "real estate", "steps to buying a house", "buying a house calculator", "first time buying a house", "buying a house process", "house buying process", "homes for sale", "building a house", "buying a house with bad credit", "cost of buying a house", "buying a house to rent", "mortgage calculator", "houses for sale", "buying a house tips" and "buying a foreclosure house".
To filter out the noise and more accurately estimate latent demand, Møller et al. (2023) use the elastic net estimator to select the ten most relevant search indexes and then apply principal component analysis (PCA) to summarize the most important information from these indexes into one common component, which is interpreted as a summary measure for housing search and referred to as the HSI [4]. Note that the same approach as for the overall USA is followed for the MSAs, but now specifying the MSA for which the search is conducted [5]. Recall that the elastic net is an estimation method for linear regressions with many predictors, which allows the econometrician to choose the most relevant predictors in explaining the variability of the dependent variable. PCA, in turn, is a data analysis technique that is particularly useful for reducing the dimensionality of data sets while preserving crucial information.
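To make the dimension-reduction step concrete, the following is a minimal sketch of extracting a first principal component from a handful of standardized series. It is not the code of Møller et al. (2023): the elastic net selection step is skipped, the data are synthetic stand-ins rather than Google Trends series, and the function names (`standardize`, `correlation`, `first_principal_component`) are hypothetical. Power iteration on the correlation matrix is used here only because it needs nothing beyond the Python standard library.

```python
import math
import random


def standardize(column):
    """Demean a series and scale it to unit sample variance."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]


def correlation(a, b):
    """Sample correlation between two equal-length series."""
    sa, sb = standardize(a), standardize(b)
    return sum(u * v for u, v in zip(sa, sb)) / (len(a) - 1)


def first_principal_component(series, iterations=200):
    """Loadings and scores of the first principal component.

    `series` is a list of equal-length lists (one per search index).
    Power iteration is run on the correlation matrix, starting from equal
    weights, which suits series that load with the same sign on the factor.
    """
    cols = [standardize(s) for s in series]
    k, n = len(cols), len(cols[0])
    corr = [[sum(u * v for u, v in zip(cols[i], cols[j])) / (n - 1)
             for j in range(k)] for i in range(k)]
    w = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w_new = [sum(corr[i][j] * w[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(v * v for v in w_new))
        w = [v / norm for v in w_new]
    scores = [sum(w[j] * cols[j][t] for j in range(k)) for t in range(n)]
    return w, scores


# Synthetic stand-ins for (log, deseasonalized) search indexes: one common
# demand factor plus idiosyncratic noise. These are NOT Google Trends data.
rng = random.Random(0)
factor = [rng.gauss(0.0, 1.0) for _ in range(200)]
indexes = [[f + 0.3 * rng.gauss(0.0, 1.0) for f in factor] for _ in range(5)]
loadings, hsi_proxy = first_principal_component(indexes)
print(abs(correlation(hsi_proxy, factor)))
```

In this toy setting the extracted component tracks the simulated common factor closely, which is the sense in which a single component can summarize several noisy search series.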
We use the log returns (HR), i.e. the first difference of the natural logarithm of the seasonally adjusted monthly Federal Housing Finance Agency purchase-only house price index for the USA and the MSAs, to capture housing price returns, with the corresponding squared values measuring volatility [6]. As indicated earlier, a GARCH model was also estimated on the log returns to provide an alternative conditional (model-based) estimate of volatility. As part of our high-frequency analysis, we also use housing log returns at weekly frequency using the smoothed, seasonally adjusted weekly median sale prices from Zillow [7] for the overall USA over the 1st week of January 2018 to the 4th week of March 2021. Given the availability of the HSI data at a high frequency, this analysis will show whether the results depend on the frequency of the data and, hence, on the speed of information transmission from housing search to the moments of house prices.
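As a small illustration of the two measures just defined, the sketch below computes log returns and squared returns from a series of index levels. The index values are hypothetical placeholders, not FHFA or Zillow data, and the function names are chosen for illustration only.

```python
import math


def log_returns(prices):
    """First difference of the natural logarithm of a price index."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]


def squared_returns(returns):
    """Squared returns, used as the model-free volatility measure."""
    return [r * r for r in returns]


# Hypothetical index levels for illustration only.
hpi = [100.0, 101.0, 100.5, 102.0]
hr = log_returns(hpi)
hr2 = squared_returns(hr)
print(hr)
print(hr2)
```

Note that one observation is lost to differencing, so a price series of length T yields T - 1 returns.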
Table A1 and Figure A1 in the Appendix of the paper summarize the HR and HSI variables for the overall USA over 2004:01 to 2021:01. As can be seen from Table A1, HR is negatively skewed and has excess kurtosis, resulting in a non-normal distribution, as indicated by the overwhelming rejection of the null of normality under the Jarque-Bera test. This evidence of heavy tails provides preliminary justification for using a quantiles-based approach to predictability, rather than a conditional mean-reliant method, as the latter is likely to miss important information at various parts of the conditional distribution of housing returns.
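For readers who wish to reproduce this type of summary statistic, the following sketch computes skewness, excess kurtosis and the Jarque-Bera statistic, JB = n/6 × (S² + K²/4), where S is skewness and K is excess kurtosis. Because the statistic is asymptotically chi-square with two degrees of freedom, its p-value has the closed form exp(-JB/2). The sample below is hypothetical and is not the HR series of the paper.

```python
import math


def jarque_bera(x):
    """Skewness, excess kurtosis, Jarque-Bera statistic and its p-value."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # chi-square(2) survival function
    return skew, excess_kurt, jb, p_value


# Hypothetical sample with one large negative outlier (a heavy left tail).
sample = [-10.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
skew, excess_kurt, jb, p_value = jarque_bera(sample)
print(skew, excess_kurt, jb, p_value)
```

A negative skewness together with positive excess kurtosis, as in this toy sample, is the pattern the text reports for HR.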

Econometric model
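In this sub-section, we briefly present the methodology for testing nonlinear causality via a hybrid approach as developed by Balcilar et al. (2018), who combine the higher-order conditional mean-based non-parametric approach of Nishiyama et al. (2011) with the quantiles-based first-moment framework of Jeong et al. (2012). Let $y_t$ denote housing returns and $x_t$ the HSI. Further, let $Y_{t-1} \equiv (y_{t-1}, \ldots, y_{t-p})$, $X_{t-1} \equiv (x_{t-1}, \ldots, x_{t-p})$ and $Z_t = (X_t, Y_t)$, and let $F_{y_t \mid \cdot}(y_t \mid \bullet)$ denote the conditional distribution of $y_t$ given $\bullet$. Defining $Q_\theta(Z_{t-1}) \equiv Q_\theta(y_t \mid Z_{t-1})$ and $Q_\theta(Y_{t-1}) \equiv Q_\theta(y_t \mid Y_{t-1})$, we have $F_{y_t \mid Z_{t-1}}\{Q_\theta(Z_{t-1}) \mid Z_{t-1}\} = \theta$ with probability one. The (non)causality in the $\theta$-th quantile hypotheses to be tested are:
$$H_0: P\{F_{y_t \mid Z_{t-1}}\{Q_\theta(Y_{t-1}) \mid Z_{t-1}\} = \theta\} = 1, \qquad (1)$$
$$H_1: P\{F_{y_t \mid Z_{t-1}}\{Q_\theta(Y_{t-1}) \mid Z_{t-1}\} = \theta\} < 1. \qquad (2)$$
Jeong et al. (2012) show that the feasible kernel-based test statistic has the following format:
$$\hat{J}_T = \frac{1}{T(T-1)h^{2p}} \sum_{t=p+1}^{T} \sum_{s=p+1,\, s \neq t}^{T} K\!\left(\frac{Z_{t-1} - Z_{s-1}}{h}\right) \hat{\varepsilon}_t \hat{\varepsilon}_s, \qquad (3)$$
where $K(\bullet)$ is the kernel function with bandwidth $h$, $T$ is the sample size, $p$ is the lag order and $\hat{\varepsilon}_t = \mathbf{1}\{y_t \leq \hat{Q}_\theta(Y_{t-1})\} - \theta$ is the regression error, with $\hat{Q}_\theta(Y_{t-1})$ an estimate of the $\theta$-th conditional quantile and $\mathbf{1}\{\bullet\}$ the indicator function. The Nadaraya-Watson kernel estimator of $\hat{Q}_\theta(Y_{t-1})$ is given by:
$$\hat{Q}_\theta(Y_{t-1}) = \hat{F}^{-1}_{y_t \mid Y_{t-1}}(\theta \mid Y_{t-1}), \qquad \hat{F}_{y_t \mid Y_{t-1}}(y_t \mid Y_{t-1}) = \frac{\sum_{s=p+1,\, s \neq t}^{T} L\!\left(\frac{Y_{t-1} - Y_{s-1}}{h}\right) \mathbf{1}\{y_s \leq y_t\}}{\sum_{s=p+1,\, s \neq t}^{T} L\!\left(\frac{Y_{t-1} - Y_{s-1}}{h}\right)}, \qquad (4)$$
with $L(\bullet)$ denoting the kernel function.
Balcilar et al. (2018) extend the framework of Jeong et al. (2012), based on Nishiyama et al. (2011), to the second (or higher) moment, which allows us to test the causality between the HSI and housing returns volatility. In this case, the null and alternative hypotheses are given by:
$$H_0: P\{F_{y_t^k \mid Z_{t-1}}\{Q_\theta(Y_{t-1}) \mid Z_{t-1}\} = \theta\} = 1, \quad k = 1, 2, \ldots, K, \qquad (5)$$
$$H_1: P\{F_{y_t^k \mid Z_{t-1}}\{Q_\theta(Y_{t-1}) \mid Z_{t-1}\} = \theta\} < 1, \quad k = 1, 2, \ldots, K. \qquad (6)$$
The causality-in-variance test can then be calculated by replacing $y_t$ in equations (3) and (4) with $y_t^2$. As pointed out by Balcilar et al. (2018), a rescaled version of $\hat{J}_T$ asymptotically follows the standard normal distribution. The testing approach is sequential, and failing to reject the null for $k = 1$ does not automatically imply no causality in the second moment; one can still construct the test for $k = 2$.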
The empirical implementation of causality testing via quantiles entails specifying three key parameters: the bandwidth (h), the lag order (p) and the kernel types for K (•) and L (•).We use a lag order of one based on the Schwarz Information Criterion (SIC).We determine h by the leave-one-out least-squares cross-validation.Finally, for K (•) and L (•), we use Gaussian kernels.
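The implementation choices above can be sketched in code. The block below is an illustrative simplification, not the authors' implementation: it fixes the lag order at one, takes the bandwidth `h` as a user-supplied number instead of choosing it by leave-one-out cross-validation, standardizes both series internally so that a single bandwidth is meaningful, and, for `k = 2`, assumes that the test is applied to the squared series with its own lag as the conditioning variable. The rescaling uses the form commonly attributed to Jeong et al. (2012), in which the variance term is 2θ²(1−θ)² times the normalized sum of squared kernel weights; this should be checked against the original papers before any serious use. All function names are hypothetical, and `theta` denotes the quantile level at which causality is tested.

```python
import math
import random


def gauss_kernel(u):
    """Standard Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def standardize(v):
    """Demean a series and scale it to unit sample variance."""
    n = len(v)
    m = sum(v) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in v) / (n - 1))
    return [(a - m) / sd for a in v]


def conditional_quantile(y_now, y_lag, t, theta, h):
    """Leave-one-out Nadaraya-Watson estimate of Q_theta(y_t | y_{t-1})."""
    pairs = sorted((y_now[s], gauss_kernel((y_lag[t] - y_lag[s]) / h))
                   for s in range(len(y_now)) if s != t)
    total = sum(w for _, w in pairs)
    cum = 0.0
    for value, w in pairs:
        cum += w
        if cum >= theta * total:  # first point where the weighted CDF reaches theta
            return value
    return pairs[-1][0]


def causality_in_quantiles(y, x, theta, h, k=1):
    """Rescaled test statistic for 'x does not cause y**k in the theta-quantile'.

    Assumptions of this sketch: lag order p = 1, fixed bandwidth h,
    Gaussian kernels, internally standardized series.
    """
    z = standardize([v ** k for v in y])
    xs = standardize(x)
    y_now, y_lag, x_lag = z[1:], z[:-1], xs[:-1]
    n = len(y_now)
    # Quantile "errors": indicator of falling below the fitted quantile, minus theta.
    eps = []
    for t in range(n):
        q = conditional_quantile(y_now, y_lag, t, theta, h)
        eps.append((1.0 if y_now[t] <= q else 0.0) - theta)
    num = 0.0   # sum of K(.) * eps_t * eps_s over t != s
    ksq = 0.0   # sum of K(.)^2 over t != s, used for the variance
    for t in range(n):
        for s in range(n):
            if s != t:
                kz = (gauss_kernel((y_lag[t] - y_lag[s]) / h)
                      * gauss_kernel((x_lag[t] - x_lag[s]) / h))
                num += kz * eps[t] * eps[s]
                ksq += kz * kz
    p = 1
    scale = n * (n - 1) * h ** (2 * p)
    j_hat = num / scale
    sigma0 = math.sqrt(2.0) * theta * (1.0 - theta) * math.sqrt(ksq / scale)
    return n * h ** p * j_hat / sigma0


# Simulated data for illustration only (not the HR and HSI series).
rng = random.Random(42)
x = [rng.gauss(0.0, 1.0) for _ in range(121)]
noise = [rng.gauss(0.0, 1.0) for _ in range(121)]
y_causal = [0.0] + [0.8 * x[t - 1] + 0.3 * noise[t] for t in range(1, 121)]
stat_causal = causality_in_quantiles(y_causal, x, theta=0.5, h=0.5)
stat_indep = causality_in_quantiles(noise, x, theta=0.5, h=0.5)
stat_vol = causality_in_quantiles(y_causal, x, theta=0.5, h=0.5, k=2)
print(stat_causal, stat_indep, stat_vol)
```

The resulting statistic would be compared with one-sided standard normal critical values, quantile by quantile. In practice the bandwidth choice matters a great deal, which is why a data-driven rule such as cross-validation is preferred over the fixed value used here.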

Empirical findings
Before we discuss the findings from the causality-in-quantiles test, for the sake of completeness and comparability, we conduct the standard linear Granger causality test with a lag length of 1, as determined by the SIC. The resulting χ²(1) test statistic associated with the causality running from HSI to HR is 61.5012 with a p-value of 0.0000, i.e. the null hypothesis that housing search activity does not Granger cause housing returns is strongly rejected, in line with Møller et al. (2023). However, the linear framework is unable to provide information on regime-specific, i.e. quantiles-based, predictability, besides being silent about the causal influence on volatility, i.e. squared returns. Naturally, we turn to the k-th order non-parametric causality-in-quantiles test next. But to econometrically motivate this framework, we statistically examine the presence of nonlinearity and structural breaks in the relationship between the HSI and HR. Nonlinearity and regime changes, if present, would warrant the use of the non-parametric causality-in-quantiles approach, since this data-driven test formally addresses the issues of nonlinearity and structural breaks in the relationship between the variables under investigation.
For this purpose, we first apply the Brock et al. (1996, BDS) test to the residuals derived from the HR equation involving one lag each of HR and HSI. Table A2 in the Appendix presents the results of the BDS test of non-linearity. As the table shows, we find evidence, at least at the 5% level of significance, for the rejection of the null hypothesis of i.i.d. residuals at various embedding dimensions (m), which, in turn, is indicative of non-linearity in the relationship between housing search activity and housing price returns. To further motivate the causality-in-quantiles approach, we next use the powerful UDmax and WDmax tests of Bai and Perron (2003) to detect 1 to M structural breaks in the relationship between HR and HSI, allowing for heterogeneous error distributions across the breaks. When we apply these tests to the HR equation involving one lag each of HR and HSI, we detect four breaks at 2008:12, 2012:03, 2014:09 and 2017:03, associated with the downturns and lower search activity in the housing market during the global financial crisis and the European sovereign debt crisis, reflecting weak economic conditions, followed by sustained economic recovery and improved HSI since 2014 (see Figure A1).

Given the strong evidence of non-linearity and structural breaks in the relationship between HR and HSI, we now turn our attention to the causality-in-quantiles test, which is robust to misspecification in the linear model due to its non-parametric nature, besides allowing us to test for predictability over the entire conditional distributions of both returns and volatility. The results are reported in Figure 1, whereby we test the regime-specific null hypothesis of no Granger causality running from HSI to HR and HR² over the quantile range of 0.10 to 0.90 based on the standard normal test statistic. As can be seen from the figure, predictability for housing returns from HSI holds over the range of 0.20 to 0.80 at least at the 5% level of significance, with the strongest causal influence observed at the median. Interestingly, there is no evidence of predictability at the extreme quantiles of 0.10 and 0.90. In other words, allowing for a quantiles-based model, we provide more nuanced evidence of the predictability detected by Møller et al. (2023) from a linear (conditional mean-based) predictive regression framework, as we are able to detect varied strength of causality conditional on the regimes of the market. Put alternatively, we can now say that the impact of HSI on HR strengthens as we move from the bearish regime towards the median, where it peaks, and weakens thereafter towards the bullish regime, with no evidence of causal influence at the two extreme ends of the market [8]. These findings tend to support the idea that for exceptionally weak and strong phases of the real estate market, i.e. at the quantiles of 0.10 and 0.90, participants tend to herd (Babalos et al., 2015; Ngene et al., 2017) and, hence, do not require information from a predictor like HSI to gauge the future path of HR. Note that housing markets are also known to co-move during booms and busts (Cotter et al., 2015), corresponding to the unconditional upper and lower quantiles of HR, so the role of a fundamental, like HSI, might become invalid here. Furthermore, the lack of predictability at the upper quantiles could also be signalling market efficiency related to HSI, in line with the quantiles-based test of efficiency of Tiwari et al. (2020) for the overall USA, as well as at the MSA-level.
Interestingly, however, the predictability of HSI for squared returns, i.e. volatility, is observed over its entire conditional distribution, at least at the 5% level of significance for the majority of the quantiles (barring the quantile of 0.90, where causality holds at the 10% level), with a peak at the quantile of 0.40. In other words, we provide strong evidence in favour of our hypothesis that housing search activity can lead to house price volatility over and above housing returns, with the effect holding irrespective of the size of this price variability, unlike for returns.
Although robust predictive inference is derived based on the non-parametric causality-in-quantiles test, it is also interesting to estimate the sign of the effect of the HSI on HR and HR² at various quantiles, especially to validate the theoretical positive relationship outlined in the introduction. But, in a non-parametric framework, this is not straightforward, as we need to use the first-order partial derivatives. Estimation of the partial derivatives for non-parametric models can give rise to complications, because non-parametric methods exhibit slow convergence rates, due to the dimensionality and smoothness of the underlying conditional expectation function. However, one can look at a statistic that summarizes the overall effect or the global curvature (i.e. the global sign and magnitude), but not the entire derivative curve. In this regard, a natural measure of the global curvature is the average derivative (AD) using the conditional pivotal quantile, based on the approximation or coupling approach of Belloni et al. (2019), which allows us to estimate the partial ADs of HR or HR² with respect to HSI. Based on the ADs reported in Table 1, we find consistent evidence of a positive predictive effect of HSI on housing price returns and its volatility.
When we rely on a GARCH-based metric of volatility [9], our findings, as reported in Figure 2, continue to be robust in the sense that predictability is again observed over the entire conditional distribution, with a peak at the median, quite strongly at the 1% level of significance, except at the two ends, where the same holds at the 10% level. Barring the highest quantile of 0.90, a similar result is also obtained for weekly squared returns, as seen from Figure 3. At the same time, it must be noted from Figure 3 that causality from HSI to HR is now restricted to the quantile range of 0.30-0.70, i.e. compared to the monthly data, though the sample periods are different, the lack of predictability at the two ends of the conditional distribution of house price returns extends over a wider range.
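To clarify what a GARCH-based volatility metric looks like, the sketch below runs the GARCH(1,1) conditional variance recursion, σ²_t = ω + α·e²_{t-1} + β·σ²_{t-1}, for given parameter values. It only filters the variance path and does not estimate anything: in practice ω, α and β would be obtained by maximum likelihood, and the values used here are hypothetical, since the paper's estimates are available only on request. The exact GARCH specification used by the authors is also an assumption here.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Conditional variance path of a GARCH(1,1) model with given parameters.

    The shocks are demeaned returns and the recursion starts at the
    unconditional variance omega / (1 - alpha - beta).
    """
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0, alpha + beta < 1")
    mean = sum(returns) / len(returns)
    shocks = [r - mean for r in returns]
    sigma2 = [omega / (1.0 - alpha - beta)]
    for e in shocks[:-1]:
        sigma2.append(omega + alpha * e * e + beta * sigma2[-1])
    return sigma2


# Hypothetical returns and parameter values, for illustration only.
hr = [0.01, -0.02, 0.015, 0.0]
cond_var = garch11_variance(hr, omega=1e-5, alpha=0.1, beta=0.8)
print(cond_var)
```

The filtered series `cond_var` plays the role of the model-based volatility that replaces squared returns in the robustness check.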
At the regional level, as can be seen from Table 2, for 65 of the 77 MSAs considered, i.e. in 84.42% of the cases, there is evidence of predictability running from HSI to HR (for at least one quantile of the conditional distribution at the 1% to 10% level of significance). In line with the results for the overall USA, for these instances, predictability peaks at quantiles closer to the median and fades away at the extreme ends. Furthermore, as reported in Table 3, predictability for housing price returns volatility is detected for 68 of the 77 MSAs, i.e. in 88.31% of instances, again with an inverted u-shaped pattern of the test statistic registering its highest value close to the median. But in this case, just as for the aggregate USA, the coverage of causality over the conditional distribution of volatility is relatively higher compared to that of HR, in terms of the number of quantiles for which predictability is observed.
In sum, we tend to conclude that the predictability of HSI for house price volatility, unlike that for housing returns, is, in general, not regime-specific and tends to be stronger in the sense of its coverage of the entire conditional distribution of volatility, with these observations tending to hold at both the aggregate and MSA level of the US housing market. In other words, we provide, for the first time, robust empirical validation of the theoretical proposition found in the housing search-theoretic models that housing search activity is likely to predict not only housing price returns but also its volatility.

Conclusions
In a recent study, Møller et al. (2023) developed a Google-based online search volume index of housing activity as a measure of underlying housing demand to show that the metric can predict housing price returns of the USA and its MSAs. Based on recent models of housing search theory, in particular that of Ngai and Sheedy (2022), we postulate that this HSI should also be able to predict volatility in house prices. To test our proposition, we use the k-th order non-parametric causality-in-quantiles framework of Balcilar et al. (2018). We show that while housing search activity continues to predict aggregate US house price returns under the misspecified linear Granger causality model, as in Møller et al. (2023), the same, in general, also holds true for the causality-in-quantiles framework, barring the extreme ends of the conditional distribution of returns. Our results thus provide more nuanced evidence of causality running from HSI to housing price returns, with an inverted u-shape in the strength of the underlying standard normal test statistic of the k-th order non-parametric test of causality-in-quantiles, which reaches its peak at the median. In other words, the strongest evidence of causality from HSI to HR is obtained around the conditionally normal state of housing returns. Comparatively, volatility is found to be relatively strongly predicted over the entire quantile range considered for squared returns, with the highest value of the test statistic again registered close to the conditional median, i.e. the normal volatility regime. Our results tend to carry over to an alternative (GARCH-based) metric of volatility, as well as to higher-frequency, i.e. weekly, data over January 2018 to March 2021. When we take a regional perspective by delving into 77 MSAs, we find that the predictive impact of HSI is detected for 65 and 68 of the cases for the first and second moments of house prices, respectively. In other words, the causal influence of HSI is dominant not only for the overall USA but also at the local level, with strong evidence in favour of our hypothesis that housing search activity tends to predict housing returns volatility, over and above returns.
In sum, using a novel methodology involving the k-th order non-parametric causality-in-quantiles framework, we are able to test the predictability of the entire conditional distributions of both housing price returns and squared returns, i.e. volatility, simultaneously, by controlling for misspecification due to uncaptured nonlinearity and regime changes in the relationship with the HSI, both of which we show to exist in our data set via formal statistical tests. In the process, we are able to provide robust evidence of causality running from HSI to not only returns but also volatility at various states or regimes of the two moments of housing prices, which indeed has implications for various market agents. Overall, our paper provides, for the first time, an empirical test of the theoretical proposition that housing search should produce housing market volatility.
As our predictive analysis is performed at the monthly as well as weekly frequencies associated with housing returns, our results can be used by policymakers to obtain high-frequency information about where the housing market is headed due to changes in housing search activity, and to predict, at monthly and weekly levels, the future path of low-frequency, i.e. quarterly, economic activity variables, such as growth of gross domestic product, given that house price movements are known to lead US business cycles. At the same time, since higher search activity results in higher future housing returns and volatility, the monetary authorities might need to undertake contractionary monetary policies to ensure that the inflation target is met, as higher housing returns (and volatility depicting more trading) are likely to be associated with increased aggregate demand via the wealth-effect channel. Moreover, monthly and weekly predictions of housing returns and volatility contingent on online housing search activity, capturing latent demand, would also help investors to make optimal portfolio allocation decisions in a timely manner, as, with an increase in the HSI, both returns and risk in investing in the housing market increase, producing a risk-return trade-off, which, in turn, needs to be compared with other asset markets the agent might be investing in at the same point in time. Finally, from the perspective of a researcher, our results suggest that the housing market is, in fact, inefficient in the semi-strong sense [10], given the predictive role of search activity, but this result is also contingent on the phase of housing returns, which excludes bearish and bullish regimes. This finding also implies that theoretical models relating housing price movements to search activity should take into account the underlying initial regime the market is in. In other words, our results have important implications for policy authorities, investors and academics.
Since in-sample predictability does not necessarily translate into out-of-sample gains, as part of future research, it would be interesting to extend our analysis to a full-fledged forecasting exercise using the k-th order non-parametric causality-in-quantiles test, as outlined in Bonaccolto et al. (2018). One could also go beyond a bivariate set-up by incorporating a large array of other predictors when forecasting housing returns volatility in particular, possibly in a machine learning set-up, and hence compare the relative importance of HSI with the other control variables over the forecasting sample.

Notes
1. A recent report by the National Association of Realtors (NAR, 2023) shows that home buyers use the internet as their main source of information about the housing market, with as many as 96% of home buyers using the internet to search for a home.
4. Before extracting the first principal component, the indexes are used in their logarithms, a sequential testing strategy is used to account for the possibility that the individual Google Trends series could follow different trends, and seasonality is removed by regressing each series on monthly dummy variables to study the residuals from this regression.
5. While search activity for individuals residing in a given MSA counts in the overall search volume for that particular MSA, some individuals may also be interested in buying a home in one of the neighbouring MSAs. To allow for such potential moves across MSA borders, Møller et al. (2023) also include search activity in the state in which the MSA is located.
6. The data is available for download at: www.fhfa.gov/DataTools/Downloads/Pages/House-Price-Index.aspx
7. The data can be accessed at: www.zillow.com/research/data/
8. From the perspective of unconditional HR, it is likely that the result could be reflecting noisy data, especially since the lowest and highest returns correspond to the peak of the global financial crisis and the outbreak of the COVID-19 pandemic, respectively (see Figure A1).
9. Complete details of the parameter estimates of the GARCH model are available upon request from the authors.
10. Note that tests of efficiency in asset markets conducted in the context of univariate models, which highlight the random-walk nature of asset prices (i.e. current price movements explained only by lagged prices) via the non-rejection of the unit root hypothesis, can be considered weak tests, while tests based on the predictability of asset returns involving one and multiple predictors can be termed semi-strong and strong, respectively.

Figure 1. k-th order causality-in-quantiles test results for housing price returns and volatility for the USA using monthly data: 2004:01-2021:01
Notes: Vertical axis reports the standard normal test statistic for the hypothesis that there is no Granger causality for a particular quantile on the horizontal axis running from housing search activity index (HSI) to housing price returns (HR) and squared returns (volatility; HR²); CV 10%, CV 5% and CV 1% correspond to the critical values of 1.645, 1.96 and 2.575, respectively
Source: Figure created by authors

Table 1.
Notes: Entries correspond to average derivative (AD) estimates of the sign of the effect of HSI on housing price returns (HR) and its volatility (HR²) at a particular quantile
Source: Table created by authors

Table 2.
Source: Table created by authors