Macroeconomic Nowcasting Using Google Probabilities
ISBN: 9781789732429, eISBN: 9781789732412
ISSN: 07319053
Publication date: 30 August 2019
Abstract
Many recent papers have investigated whether data from internet search engines such as Google can help improve nowcasts or short-term forecasts of macroeconomic variables. These papers construct variables based on Google searches and use them as explanatory variables in regression models. We add to this literature by nowcasting using dynamic model selection (DMS) methods which allow for model switching between time-varying parameter regression models. This is potentially useful in an environment of coefficient instability and overparameterization which can arise when forecasting with Google variables. We extend the DMS methodology by allowing for the model switching to be controlled by the Google variables through what we call “Google probabilities”: instead of using Google variables as regressors, we allow them to determine which nowcasting model should be used at each point in time. In an empirical exercise involving nine major monthly US macroeconomic variables, we find DMS methods to provide large improvements in nowcasting. Our use of Google model probabilities within DMS often performs better than conventional DMS methods.
Citation
Koop, G. and Onorante, L. (2019), "Macroeconomic Nowcasting Using Google Probabilities
Publisher: Emerald Publishing Limited
Copyright © 2019 Emerald Publishing Limited
1. Introduction
Macroeconomic data are typically published with a time lag. This has led to a growing body of research on nowcasting. Nowcasting uses currently available data to provide timely estimates of macroeconomic variables weeks or even months before their initial estimates are produced. The availability of internet search data has provided a new resource for researchers interested in nowcasts or short-term forecasts of macroeconomic variables. Google search data, available since January 2004, is a particularly popular source. Pioneering papers such as Choi and Varian (2009, 2011) have led to an explosion of nowcasting work using Google data including, among many others, Artola and Galan (2012), Askitas and Zimmermann (2009), Carriere-Swallow and Labbe (2011), Chamberlin (2010), D’Amuri and Marcucci (2009), Hellerstein and Middeldorp (2012), Kholodilin, Podstawski, and Siliverstovs (2010), McLaren and Shanbhogue (2011), Scott and Varian (2012), Schmidt and Vosen (2009), Suhoy (2009) and Wu and Brynjolfsson (2010). ^{1}
These papers report a variety of findings for a range of variables, but a few general themes emerge. First, Google data is potentially useful in nowcasting or short-term forecasting, but there is little evidence that it can be successfully used for long-term forecasting. Second, Google data is only rarely found to be useful for broad macroeconomic variables (e.g. inflation, industrial production, etc.) ^{2} and is more commonly used to nowcast specific variables relating to consumption, housing or labor markets. For instance, Choi and Varian (2011) successfully nowcast motor vehicles and car parts ^{3} , initial claims for unemployment benefits and tourist arrivals in Hong Kong. Third, the existing literature uses linear regression methods.
This chapter deals with the second and third of these points. We nowcast a variety of conventional US monthly macroeconomic variables and see if Google variables provide additional nowcasting power beyond a conventional set of predictors. It is common (see, among many others, Giannone, Lenza, Momferatou, & Onorante, 2010) to forecast inflation using a variety of macro predictors such as unemployment, the term spread, wage inflation, oil price inflation, etc. We use Google variables in different ways as additional information and check whether their inclusion can improve nowcasting power. We do this for nine different macroeconomic variables.
The main innovations in our approach relate to the manner in which we include the Google variables in our regression models. We use Dynamic Model Averaging and Model Selection (DMA and DMS) methods with time-varying parameter (TVP) regressions. DMA methods for TVP regression models were developed by Raftery, Karny, and Ettler (2010) and have been used successfully in several applications (e.g., among others, Dangl & Halling, 2012; Koop & Korobilis, 2012; Koop & Onorante, 2012; Koop & Tole, 2013; Nicoletti & Passaro, 2012).
Initially we implement DMA and DMS in a conventional manner, using Google variables as additional predictors in TVP regressions. This represents a useful extension over existing nowcasting methods, such as Choi and Varian (2009, 2011), which use linear regression methods with constant coefficients. The second innovative aspect of the chapter is that we extend the DMA methodology to use the Google data in a different manner. Instead of simply using a Google variable as an explanatory variable in a regression, we develop a method which allows for the inclusion probability of each macro explanatory variable to depend on the Google data. This motivates the terminology used in the title of this chapter: “Google probabilities”. The rationale behind our approach is that some of the existing literature (e.g. Choi & Varian, 2011) suggests that Google variables might not be good linear predictors. However, they may be good at signalling turning points or other forms of change or model switching. In particular, we hypothesize that Google searches are able to collect “collective wisdom” and can be informative about which macro variables are important in the model at different points in time, either directly or by influencing the outcomes through agents’ expectations. For example, a surge in searches about oil prices may not say much per se about whether oil prices are increasing or decreasing, but may indicate that the variable should be relevant in modeling. This should trigger a switch toward nowcasting models including the oil price as an explanatory variable.
In an empirical exercise involving monthly US data on nine macroeconomic variables, we find DMS methods to nowcast well, regardless of whether they involve Google model probabilities or not. In particular, DMS tends to nowcast slightly better than DMA and much better than the standard benchmarks using OLS methods. The use of Google probabilities to influence model switching often leads to further improvements in nowcast performance.
2. Macroeconomic Nowcasting and Google Data
Table 1 lists the macroeconomic variables we are interested in nowcasting. We use monthly US data from January 1973 through July 2012. Note that, as is commonly done, all of our variables are transformed so as to be rates (e.g. inflation rate, unemployment rate, etc.). All data are taken from the BIS Macroeconomic series databases, OECD Main Economic Indicators (OECD), Hamburg World Economic Archive, and the Federal Reserve Bank of Chicago.
Variable  Raw Variable  Transformation  Source
Inflation  Consumer price index, all items  […]  BIS
Wage inflation  Ave. hourly earnings in manuf.  […]  BIS
Unemployment  Unemployment rate, all employees  None  BIS
Term spread  Long minus short: 10 yr. Treasury minus Fed funds rate  None  BIS, OECD
FCI  Financial Conditions Index^{a}  None  Chicago FED
Commodities price inflation  Price Index, food and energy  […]  Ham World Econ. Archive
Industrial production  Total industrial production excluding construction  […]  BIS
Oil price inflation  Crude oil price (USD per barrel)  […]  BIS
Money supply growth  Money supply (M3)  […]  OECD
^{a} Source: Chicago Fed. The indicator has an average value of zero and a standard deviation of one. Positive/negative numbers indicate tighter/looser than average financial conditions.
Corresponding to each of these variables, we produce a composite Google search variable. Of course, for any concept there are many potential Google search terms and there are different treatments of this in the literature. For example, Scott and Varian (2012) use 151 search categories. ^{4} In this chapter we use a standardized procedure with the aim of minimizing the amount of judgment in the choice of variables. We start by searching for the name of the macro variable of interest and we collect the corresponding Google search volume. Along with this variable, the Google interface supplies a set of related terms. These are the most popular terms related to the search: Google chooses them in a mechanical manner, by examining searches conducted by users immediately before and after. We fetch these related searches, and we repeat the procedure for each of them, finding new terms. Only at this point is some judgment necessary. The related searches in Google are found automatically, therefore terms completely unrelated to economic concepts are removed manually. We could alternatively have chosen to limit our search to some specific Google category, but those are also defined automatically and remaining extraneous variables would have needed manual intervention. It is important, however, to note that variables are not eliminated on the basis of (expected) performance, but only when they are obvious mistakes (e.g. when searching for “spread” in relation to interest rates, results related to food are not retained). We also mechanically deleted all repeated terms, a frequent event when using the concept of “related” more than once. The remaining Google variables are attributed to the macro variable used to start the search.
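The term-collection procedure just described (seed term, related terms, related terms of those, removal of repeats, then manual removal of obvious mistakes) can be sketched as follows. This is only an illustration: `expand_terms`, `drop_extraneous` and the toy lookup table are hypothetical names and invented data, and no real Google interface is called.

```python
from typing import Callable, Iterable, List, Set

def expand_terms(seed: str,
                 related: Callable[[str], Iterable[str]],
                 rounds: int = 2) -> List[str]:
    """Collect the seed plus related terms over `rounds` of expansion, dropping repeats."""
    seen = {seed}
    ordered = [seed]
    frontier = [seed]
    for _ in range(rounds):
        next_frontier = []
        for term in frontier:
            for candidate in related(term):
                if candidate not in seen:      # mechanical removal of repeated terms
                    seen.add(candidate)
                    ordered.append(candidate)
                    next_frontier.append(candidate)
        frontier = next_frontier
    return ordered

def drop_extraneous(terms: List[str], blocklist: Set[str]) -> List[str]:
    """Manual step: remove terms judged unrelated to the economic concept."""
    return [t for t in terms if t not in blocklist]

# Toy stand-in for the related-terms lookup (invented data, for illustration only).
TOY_RELATED = {
    "spread": ["interest rate spread", "cheese spread"],
    "interest rate spread": ["yield curve", "spread"],
    "cheese spread": ["cream cheese"],
}

def toy_related(term: str) -> List[str]:
    return TOY_RELATED.get(term, [])

terms = expand_terms("spread", toy_related)
kept = drop_extraneous(terms, {"cheese spread", "cream cheese"})
print(kept)
```

The blocklist plays the role of the single judgmental step: terms are dropped because they are obvious mistakes, not because of their expected forecasting performance.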
Our final Google database is composed of 259 search results (see the Appendix for a complete list). All series start at the beginning of 2004 and each search volume is separately normalized from 0 to 100. This normalization is done by first dividing the number of searches for a word by the total number of searches being done. This is done to avoid the issues that would arise due to the fact that, overall, the number of Google searches is increasing over the sample period. The result is then normalized to lie between 0 and 100. Variables searched with high volume have weekly frequency; less searched terms are supplied by the Google interface as monthly observations. Our research and the data to be forecast are at most at monthly frequency, therefore we convert the weekly series by taking the last observation available for every month.
Thus, for each of the nine macroeconomic variables in Table 1, we match a number of Google search variables. For each variable, we have, on average, over 20 Google search variables, unevenly distributed. To ensure parsimony, we adopt a strategy of averaging all the Google search variables to produce a single “Google variable” corresponding to each macroeconomic variable. Such a strategy works well, although other more sophisticated methods (e.g. using principal components methods) would be possible.
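A minimal sketch of the data construction described in the last two paragraphs is given below: share of total searches, rescaling to the 0 to 100 range, conversion of weekly series to monthly by taking the last observation of each month, and simple averaging into one composite variable. The function names are hypothetical, and rescaling by the series maximum is an assumption about how the 0 to 100 normalization is done.

```python
from datetime import date

def normalize_0_100(term_counts, total_counts):
    """Divide term searches by total searches, then rescale so the peak equals 100.
    Rescaling by the maximum share is an assumption made for this sketch."""
    shares = [c / t for c, t in zip(term_counts, total_counts)]
    peak = max(shares)
    return [100.0 * s / peak for s in shares]

def last_observation_per_month(dates, values):
    """Map (year, month) to the last weekly value in that month; dates must be sorted."""
    monthly = {}
    for d, v in zip(dates, values):
        monthly[(d.year, d.month)] = v  # later weeks overwrite earlier ones
    return monthly

def composite(monthly_series):
    """Average several monthly series (dicts keyed by (year, month)) over common months."""
    common = set.intersection(*(set(s) for s in monthly_series))
    n = len(monthly_series)
    return {m: sum(s[m] for s in monthly_series) / n for m in sorted(common)}

# Invented weekly counts for one high-volume term.
weeks = [date(2004, 1, 4), date(2004, 1, 11), date(2004, 1, 18), date(2004, 1, 25),
         date(2004, 2, 1), date(2004, 2, 8), date(2004, 2, 15), date(2004, 2, 22)]
weekly_index = normalize_0_100([1, 2, 3, 4, 5, 6, 7, 8], [100] * 8)
term_a = last_observation_per_month(weeks, weekly_index)

# A second, lower-volume term already supplied at monthly frequency (invented values).
term_b = {(2004, 1): 30.0, (2004, 2): 60.0}

google_variable = composite([term_a, term_b])
print(google_variable)
```

Principal components or other weighting schemes could replace the simple average in `composite` without changing the rest of the pipeline.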
The nine Google variables constructed in this fashion are plotted in Fig. 1. The macroeconomic variables themselves are plotted in Fig. 2. A comparison of each Google variable to its macroeconomic counterpart does not tend to indicate a close relationship between the two. There are some exceptions to this. For instance, the increase in Google searches related to unemployment matches up well with the actual unemployment rate, especially as the financial crisis occurs. But overall, the differences are greater than the similarities. For instance, several of the Google search variables exhibit much less variation over time than their actual counterparts (e.g. the Google variables for wages, financial conditions, and industrial production are all roughly constant over the sample). This suggests why regressions involving Google variables might not be good forecasting models for these macroeconomic variables. However, this does not preclude that the Google variables might be useful predictors at particular times. For instance, the Google variable for the term “spread” in general looks very different from the variable “spread” itself. However, it does exhibit an increase in the run-up to the financial crisis that matches the behavior of this variable at this point in time. Our multi-model, dynamic approach is well designed to accommodate such features in a way that single regressions are not.
In summary, the data set we have includes 18 variables. These are the nine variables listed in Table 1 and, corresponding to each, the average Google search variable reflecting internet search activity relating to the underlying macroeconomic concept. ^{5}
3. Models
Each of our models involves using one of the macroeconomic variables as a dependent variable,
3.1. Our Baseline: Regressions with Constant Coefficients
A standard, one-step-ahead regression model for forecasting
Typically, the model would also include lags of the dependent variable and an intercept. All models and all the empirical results in this chapter include these (with a lag length of 2), but for notational simplicity we will not explicitly note this in the formulas in this section.
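As an illustration of the constant-coefficient baseline, the sketch below computes recursive one-step-ahead forecasts from a regression on an intercept and two lags of the dependent variable, together with a no-change forecast. It assumes plain OLS re-estimated on an expanding window; the function names are hypothetical and the additional macro and Google regressors are omitted for brevity.

```python
import numpy as np

def recursive_ar2_forecasts(y, first_forecast):
    """Forecast y[t] for t >= first_forecast, re-estimating an AR(2) with
    intercept by OLS on observations up to t-1 only (expanding window)."""
    y = np.asarray(y, dtype=float)
    forecasts = {}
    for t in range(first_forecast, len(y)):
        # Rows s = 2, ..., t-1: regress y[s] on (1, y[s-1], y[s-2]).
        X = np.column_stack([np.ones(t - 2), y[1:t - 1], y[0:t - 2]])
        target = y[2:t]
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        forecasts[t] = float(beta @ np.array([1.0, y[t - 1], y[t - 2]]))
    return forecasts

def no_change_forecasts(y, first_forecast):
    """Random-walk style benchmark: the forecast of y[t] is y[t-1]."""
    return {t: float(y[t - 1]) for t in range(first_forecast, len(y))}

# Noise-free AR(2) data, so the recursive forecasts should be (numerically) exact.
y = [0.0, 1.0]
for _ in range(10):
    y.append(1.0 + 0.5 * y[-1] - 0.2 * y[-2])

ar2 = recursive_ar2_forecasts(y, first_forecast=6)
naive = no_change_forecasts(y, first_forecast=6)
print(max(abs(ar2[t] - y[t]) for t in ar2))
```

Adding further predictors only requires appending columns to `X` and the matching entries to the forecast vector.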
We then add the Google regressors. We assume the following timing convention: At the end of month
The results in this chapter adopt this timing convention, but other timing conventions (e.g. nowcasting in the middle of a month) can be accommodated with minor alterations of the preceding equation (depending on the release date of the variables in
3.2. TVP Regression Models, Model Averaging and Model Switching with Google Regressors
The regressions in the preceding subsection have two potential problems: (1) they assume coefficients are constant over time which, for many macroeconomic time series, is rejected by the data (see, among many others, Stock & Watson, 1996) and (2) they may be overparameterized since the regressions potentially have many explanatory variables and the time span of the data may be short.
An obvious way to surmount the first problem is to use a TVP regression model. TVP regression models (or multivariate extensions) are increasingly popular in macroeconomics (see, among many others, Canova, 1993; Canova & Ciccarelli, 2009; Canova & Gambetti, 2009; Chan, Koop, Leon-Gonzalez, & Strachan, 2012; Cogley & Sargent, 2001, 2005; Koop, Leon-Gonzalez, & Strachan, 2009; and Primiceri, 2005). Our TVP regression model is specified as:
Before discussing the more innovative part of our modeling approach, we note that, in all of our models, we allow for time variation in the error variance. Thus,
Due to overparameterization concerns, there is a growing literature which uses model averaging or selection methods in TVP regressions. That is, instead of working with one large overparameterized model, parsimony can be achieved by averaging over (or selecting between) smaller models. Thus, model averaging or model selection methods can be used to ensure shrinkage in overparameterized models. With TVP models, it is often desirable to choose models in a time-varying fashion and, thus, DMA or DMS methods can be used (see, e.g., Koop & Korobilis, 2012). These allow for a different model to be selected at each point in time (with DMS) or different weights to be used in model averaging at each point in time (with DMA). For instance, in light of Choi and Varian (2011)’s finding that Google variables predict better at some points in time than others, one may wish to include the Google variables sometimes but not always. DMS allows for this. It can switch between models which include Google variables and models which do not, as necessary.
The pioneering paper which developed methods for DMA and DMS was Raftery et al. (2010). Since that paper describes (and provides motivation for) the DMA algorithm used in this chapter, we will not provide complete details here. Instead we just describe the model space under consideration and the general ideas involved in the algorithm.
Instead of working with the single regression of the form (3), we have
Within a single TVP regression model we estimate
DMA and DMS involve a recursive updating scheme using quantities which we label
Raftery et al. (2010) derive the following model updating equation:
Thus, starting with
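To make the general idea of the algorithm concrete, the following is a compact sketch of DMA and DMS with forgetting factors in the spirit of Raftery et al. (2010). Each model is a TVP regression updated by a Kalman filter in which the state covariance is inflated by `1/lam`; model probabilities are flattened by raising them to the power `alpha` in the prediction step and then multiplied by each model's predictive likelihood in the updating step. Several simplifications are assumptions of this sketch and not features of the chapter: the error variance is held constant at `obs_var`, the prior settings are arbitrary, and the function and argument names are hypothetical.

```python
import itertools
import numpy as np

def dma_dms(y, X, lam=0.99, alpha=0.99, obs_var=1.0, prior_var=100.0):
    """One-step-ahead DMA and DMS predictions over all subsets of the columns of X."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    T, m = X.shape
    # Model space: every subset of the m predictors, each with an intercept.
    models = [np.array(c, dtype=int)
              for r in range(m + 1) for c in itertools.combinations(range(m), r)]
    K = len(models)
    theta = [np.zeros(len(idx) + 1) for idx in models]
    Sigma = [prior_var * np.eye(len(idx) + 1) for idx in models]
    prob = np.full(K, 1.0 / K)
    dma_pred, dms_pred = np.empty(T), np.empty(T)
    pred_probs = np.empty((T, K))
    for t in range(T):
        # Model prediction step: forgetting factor alpha flattens the probabilities.
        w = prob ** alpha
        w /= w.sum()
        pred_probs[t] = w
        preds, like = np.empty(K), np.empty(K)
        for k, idx in enumerate(models):
            x = np.concatenate(([1.0], X[t, idx]))
            R = Sigma[k] / lam                      # coefficient forgetting
            preds[k] = x @ theta[k]
            s = obs_var + x @ R @ x                 # predictive variance
            err = y[t] - preds[k]
            like[k] = np.exp(-0.5 * err ** 2 / s) / np.sqrt(2.0 * np.pi * s)
            gain = R @ x / s                        # Kalman gain
            theta[k] = theta[k] + gain * err
            Sigma[k] = R - np.outer(gain, x @ R)
        dma_pred[t] = w @ preds                     # average over models
        dms_pred[t] = preds[np.argmax(w)]           # single most probable model
        # Model updating step: weight by predictive likelihood and renormalize.
        prob = w * like
        prob /= prob.sum()
    return dma_pred, dms_pred, pred_probs, models

# Simulated example: only the first predictor matters.
rng = np.random.default_rng(0)
X_sim = rng.normal(size=(200, 2))
y_sim = 2.0 * X_sim[:, 0] + 0.1 * rng.normal(size=200)
dma, dms, probs, models = dma_dms(y_sim, X_sim)
with_x0 = [k for k, idx in enumerate(models) if 0 in idx]
print(probs[-1, with_x0].sum())
```

Because no simulation over parameters is involved, the cost per period is one Kalman update per model, which is what makes a model space of this size tractable.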
3.3. DMA and DMS with Google Probabilities
Our final and most original contribution consists of using the Google variables not directly as regressors, but as providing information to determine which macroeconomic variables should be included at each point in time. The underlying intuition is that the search volume might show the relevance of a certain variable for nowcasting at one point in time rather than a precise and signed cause-effect relationship. Therefore even those Google searches showing little direct forecasting power as explanatory variables in a regression might be useful in selecting the explanatory variables of most use for nowcasting at any given point in time. Motivated by these considerations, we propose to modify the conventional DMA/DMS methodology as follows.
Let
Consider the same model space as before, defined in Eq. (5), with
Our modified version of DMA and DMS with Google model probabilities involves implementing the algorithm of Raftery et al. (2010), except with the timevarying model probabilities altered to reflect the Google model probabilities as:
It is worth noting that there exist other approaches which allow for model probabilities to depend upon explanatory variables such as we do with our Google model probabilities. A good example is the smoothly mixing regression model of Geweke and Keane (2007). Our approach differs from these in two main ways. First, unlike the smoothly mixing regression model, our approach is dynamic such that a different model can be selected in each time period. Second, our approach avoids the use of computationally intensive MCMC methods. As noted above, with
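The idea of letting Google variables drive the model probabilities can be illustrated with a short sketch. The functional forms used here are assumptions made purely for illustration and are not taken from the equations of this section: each Google variable is assumed to be rescaled to lie in [0, 1] and read as the inclusion probability of its matching macro variable, a model's Google probability is the product of these inclusion (or exclusion) probabilities, and the result is blended with the usual predicted model probabilities through a hypothetical weight `omega`.

```python
import numpy as np

def google_model_probs(z, models):
    """z[s] in [0, 1] is read as the inclusion probability of predictor s (assumption).
    A model's probability is the product of z[s] over included predictors and
    (1 - z[s]) over excluded ones."""
    z = np.asarray(z, dtype=float)
    p = np.empty(len(models))
    for k, idx in enumerate(models):
        included = np.zeros(z.size, dtype=bool)
        included[np.array(idx, dtype=int)] = True
        p[k] = np.prod(np.where(included, z, 1.0 - z))
    return p

def blend(pred_probs, google_probs, omega):
    """Convex combination of conventional and Google-based probabilities (assumed form):
    omega = 1 ignores the Google data, omega = 0 uses only the Google data."""
    mixed = omega * np.asarray(pred_probs) + (1.0 - omega) * np.asarray(google_probs)
    return mixed / mixed.sum()

# Two predictors, hence four models: none, {0}, {1}, {0, 1}.
models = [[], [0], [1], [0, 1]]
z_t = [0.8, 0.1]            # invented Google variables for one month, rescaled to [0, 1]
g = google_model_probs(z_t, models)
mixed = blend(np.full(4, 0.25), g, omega=0.5)
print(g, mixed)
```

In this toy month, heavy search interest in the first concept pushes probability toward the models that include the first predictor, regardless of the sign of its effect.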
4. Nowcasting Using DMS and DMA with Google Model Probabilities
4.1. Overview
In this section, we present evidence on the nowcasting performance of various implementations of DMA and DMS using the data set described in Section 2. For each of the nine variables in Table 1, we carry out a nowcasting exercise using several different approaches, most of which are either DMA or DMS using Eq. (8). In particular, we consider
We use mean squared forecast errors (MSFEs) to evaluate the quality of point forecasts and sums of log predictive likelihoods to evaluate the quality of the predictive densities produced by the various methods. Remember though, that our macroeconomic data is available from January 1973 through July 2012, but the Google data only exists since January 2004. In light of this mismatch in sample span, we estimate all our models in two different ways. First, we simply discard all pre-2004 data for all variables and estimate our models using this relatively short sample. Second, we use data back to 1973 for the macroeconomic variables, but pre-2004 we do not use versions of the models involving the unavailable Google data. For instance, when doing DMA with
Method  LogPL (DMA)  LogPL (DMS)  MSFE (DMA)  MSFE (DMS)

Inflation
Google Variables Not Used
[…]  −236.73  −235.38  24.95  23.28
Rec. OLS  –  –  30.75  –
Rec. AR(2)  –  –  24.22  –
No change  –  –  31.20  –
Google Variables Used as Probabilities
[…]  −239.41  −232.29  24.69  19.35
[…]  −239.48  −232.36  24.75  19.13
Google Variables Used as Regressors
[…]  −237.64  −233.23  26.28  21.08
Rec. OLS  –  –  37.23  –

Industrial Production
Google Variables Not Used
[…]  −289.10  −287.78  107.04  104.46
Rec. OLS  –  –  165.51  –
Rec. AR(2)  –  –  114.13  –
No change  –  –  113.83  –
Google Variables Used as Probabilities
[…]  −291.74  −286.46  116.96  110.12
[…]  −291.94  −284.42  117.49  109.74
Google Variables Used as Regressors
[…]  −288.28  −284.98  102.88  95.90
Rec. OLS  –  –  158.12  –
Method  LogPL (DMA)  LogPL (DMS)  MSFE (DMA)  MSFE (DMS)

Unemployment
Google Variables Not Used
[…]  −124.10  −123.04  0.033  0.033
Rec. OLS  –  –  0.036  –
Rec. AR(2)  –  –  0.038  –
No change  –  –  5.44  –
Google Variables Used as Probabilities
[…]  −133.75  −127.30  0.032  0.033
[…]  −134.41  −128.38  0.032  0.035
Google Variables Used as Regressors
[…]  −127.91  −123.04  0.034  0.033
Rec. OLS  –  –  0.047  –

Wage Inflation
Google Variables Not Used
[…]  −192.95  −190.10  6.52  5.77
Rec. OLS  –  –  9.78  –
Rec. AR(2)  –  –  6.83  –
No change  –  –  6.15  –
Google Variables Used as Probabilities
[…]  −197.80  −194.11  7.16  5.71
[…]  −198.02  −194.49  7.17  5.89
Google Variables Used as Regressors
[…]  −195.25  −189.50  6.72  5.53
Rec. OLS  –  –  11.48  –

Money
Google Variables Not Used
[…]  −245.69  −244.53  30.02  29.71
Rec. OLS  –  –  33.50  –
Rec. AR(2)  –  –  28.99  –
No change  –  –  28.69  –
Google Variables Used as Probabilities
[…]  −249.97  −242.72  29.28  27.34
[…]  −250.81  −243.97  29.75  26.07
Google Variables Used as Regressors
[…]  −247.07  −242.90  31.20  28.12
Rec. OLS  –  –  42.77  –
Method  LogPL (DMA)  LogPL (DMS)  MSFE (DMA)  MSFE (DMS)

Financial Conditions Index
Google Variables Not Used
[…]  −53.22  −53.64  0.29  0.29
Rec. OLS  –  –  0.30  –
Rec. AR(2)  –  –  0.32  –
No change  –  –  0.45  –
Google Variables Used as Probabilities
[…]  −58.92  −53.29  0.28  0.21
[…]  −59.56  −54.56  0.28  0.21
Google Variables Used as Regressors
[…]  −55.51  −51.56  0.32  0.26
Rec. OLS  –  –  0.48  –

Oil Price Inflation
Google Variables Not Used
[…]  −484.51  −479.54  13,219  10,407
Rec. OLS  –  –  17,465  –
Rec. AR(2)  –  –  11,253  –
No change  –  –  12,185  –
Google Variables Used as Probabilities
[…]  −481.39  −475.00  11,678  8,961
[…]  −481.60  −474.71  11,857  8,555
Google Variables Used as Regressors
[…]  −484.63  −479.72  13,241  10,415
Rec. OLS  –  –  29,333  –
Method  LogPL (DMA)  LogPL (DMS)  MSFE (DMA)  MSFE (DMS)

Commodity Price Inflation
Google Variables Not Used
[…]  −429.24  −425.50  3,115  2,706
Rec. OLS  –  –  3,925  –
Rec. AR(2)  –  –  2,950  –
No change  –  –  3,254  –
Google Variables Used as Probabilities
[…]  −429.74  −427.97  3,169  2,964
[…]  −429.85  −428.49  3,168  2,986
Google Variables Used as Regressors
[…]  −429.23  −424.71  3,120  2,635
Rec. OLS  –  –  5,193  –

Term Spread
Google Variables Not Used
[…]  −87.68  −86.67  0.072  0.072
Rec. OLS  –  –  0.092  –
Rec. AR(2)  –  –  0.068  –
No change  –  –  1.476  –
Google Variables Used as Probabilities
[…]  −99.42  −91.28  0.069  0.081
[…]  −100.53  −93.32  0.069  0.091
Google Variables Used as Regressors
[…]  −91.44  −86.67  0.068  0.072
Rec. OLS  –  –  0.103  –
Method  LogPL (DMA)  LogPL (DMS)  MSFE (DMA)  MSFE (DMS)

Inflation
Google Variables Not Used
[…]  −293.11  −291.56  20.39  19.39
Rec. OLS  –  –  22.80  –
Rec. AR(2)  –  –  24.16  –
No change  –  –  34.10  –
Google Variables Used as Probabilities
[…]  −293.71  −290.95  20.42  18.74
[…]  −293.73  −291.71  20.42  19.09
Method  LogPL (DMA)  LogPL (DMS)  MSFE (DMA)  MSFE (DMS)

Industrial Production
Google Variables Not Used
[…]  −363.09  −360.27  94.88  88.38
Rec. OLS  –  –  90.43  –
Rec. AR(2)  –  –  90.38  –
No change  –  –  104.35  –
Google Variables Used as Probabilities
[…]  −362.54  −361.41  94.79  93.11
[…]  −362.59  −361.04  94.92  92.61

Unemployment
Google Variables Not Used
[…]  48.28  50.83  0.027  0.025
Rec. OLS  –  –  0.025  –
Rec. AR(2)  –  –  0.030  –
No change  –  –  4.408  –
Google Variables Used as Probabilities
[…]  45.85  46.19  0.029  0.028
[…]  45.51  46.15  0.029  0.028

Wage Inflation
Google Variables Not Used
[…]  −232.99  −229.57  6.06  5.44
Rec. OLS  –  –  9.30  –
Rec. AR(2)  –  –  7.71  –
No change  –  –  10.41  –
Google Variables Used as Probabilities
[…]  −233.69  −230.94  6.16  5.42
[…]  −233.77  −230.37  6.13  5.39
Method  LogPL (DMA)  LogPL (DMS)  MSFE (DMA)  MSFE (DMS)

Money
Google Variables Not Used
[…]  −294.56  −293.57  23.46  22.73
Rec. OLS  –  –  23.02  –
Rec. AR(2)  –  –  23.34  –
No change  –  –  24.16  –
Google Variables Used as Probabilities
[…]  −293.99  −290.76  22.97  20.38
[…]  −294.11  −291.24  23.07  20.92

Financial Conditions Index
Google Variables Not Used
[…]  −28.63  −28.02  0.17  0.17
Rec. OLS  –  –  0.18  –
Rec. AR(2)  –  –  0.20  –
No change  –  –  0.36  –
Google Variables Used as Probabilities
[…]  −31.71  −31.17  0.18  0.16
[…]  −31.70  −31.15  0.18  0.16

Oil Price Inflation
Google Variables Not Used
[…]  −610.18  −608.20  10,443  9,836
Rec. OLS  –  –  10,468  –
Rec. AR(2)  –  –  9,740  –
No change  –  –  10,957  –
Google Variables Used as Probabilities
[…]  −609.54  −607.24  10,210  9,269
[…]  −609.63  −606.73  10,230  9,064
Method  LogPL (DMA)  LogPL (DMS)  MSFE (DMA)  MSFE (DMS)

Commodity Price Inflation
Google Variables Not Used
[…]  −531.01  −528.20  2,230  2,080
Rec. OLS  –  –  2,198  –
Rec. AR(2)  –  –  2,200  –
No change  –  –  2,710  –
Google Variables Used as Probabilities
[…]  −529.57  −528.19  2,230  2,079
[…]  −529.64  −527.63  2,235  2,084

Term Spread
Google Variables Not Used
[…]  2.980  6.239  0.062  0.056
Rec. OLS  –  –  0.109  –
Rec. AR(2)  –  –  0.083  –
No change  –  –  1.382  –
Google Variables Used as Probabilities
[…]  2.374  6.785  0.062  0.053
[…]  2.484  6.754  0.061  0.052
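The two evaluation metrics reported in the tables can be computed along the following lines. This is a minimal sketch assuming Gaussian one-step-ahead predictive densities; the function names are hypothetical.

```python
import math

def msfe(actual, forecast):
    """Mean squared forecast error of a sequence of point forecasts."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def sum_log_pred_lik(actual, pred_mean, pred_var):
    """Sum over time of the log predictive density evaluated at the realized value,
    assuming each predictive density is Gaussian with the given mean and variance."""
    total = 0.0
    for a, m, v in zip(actual, pred_mean, pred_var):
        total += -0.5 * math.log(2.0 * math.pi * v) - 0.5 * (a - m) ** 2 / v
    return total

print(msfe([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(sum_log_pred_lik([0.0], [0.0], [1.0]))
```

Lower MSFE and higher (less negative) sums of log predictive likelihoods indicate better performance. Because a density can exceed one when the predictive variance is small, positive log predictive likelihoods are possible.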
4.2. Discussion of Empirical Results
With nine variables, two different forecast metrics and two different sample spans, there are 36 different dimensions in which our approaches can be compared. Not surprisingly, we do not find one approach which nowcasts best in every case. However, there is a strong tendency to find that DMA and DMS methods nowcast better than standard benchmarks and there are many cases where the inclusion of Google data improves nowcast performance relative to the comparable approach excluding the Google data. Inclusion of Google data in the form of model probabilities is typically (although not always) the best way of including Google data. It is typically the case that DMS nowcasts better than the comparable DMA algorithm, presumably since the ability of DMS to switch quickly between different parsimonious models helps improve nowcasts. The remainder of this subsection elaborates on these points, going through one macroeconomic variable at a time.
Inflation. For inflation, we find DMS with
Industrial Production: As with inflation, there is strong evidence that DMS leads to nowcast improvements over benchmark OLS methods. However, evidence conflicts on the best way to include Google variables. If we use only the post-2004 data, the MSFEs indicate the Google variables are best used as regressors (along with DMS methods), whereas predictive likelihoods indicate that DMS with Google model probabilities nowcasts best. If we use data since 1973, MSFEs and predictive likelihoods both indicate that simply doing DMS using the macroeconomic variables nowcasts best. Hence, we find strong support for the use of DMS, but a less clear story on how or whether Google variables should be used with DMS.
Unemployment: With the post-2004 data, MSFEs indicate support for our DMS approach using Google probabilities, but predictive likelihoods indicate a preference for using the Google variables as regressors (or not at all). When using the post-1973 sample, predictive likelihoods also indicate support for DMS using Google probabilities. However, MSFEs indicate omitting the Google variables leads to the best nowcasts, with conventional DMS and recursive OLS being the winning approaches according to this metric.
Wage inflation: This is a variable for which MSFE and predictive likelihood results are in accordance. For the post-2004 sample they indicate conventional DMS, using the Google variables as regressors, is to be preferred. However, for the post-1973 sample, they indicate DMS using Google probabilities (or just the macro dataset) nowcasts best.
Money: The different measures of nowcast performance and sample spans also lead to a consistent story for money supply growth. In particular, DMS with Google probabilities nowcasts best, although there is some disagreement over whether
Financial Conditions Index: Using MSFEs, both sample spans indicate that DMS with Google data nowcasts best. Predictive likelihoods, though, show a conflict between whether the Google variables should be used as regressors (post-2004 data) or not included at all (post-1973 data).
Oil Price Inflation: For this variable, both nowcast metrics and data spans indicate DMS with
Commodity Price Inflation: Using the post-2004 sample, we find the best performance using DMS with the Google variables being used as regressors. However, using the post-1973 sample we find the approaches including the Google model probabilities (either with
Term Spread: Using the smaller post-2004 sample, we find that DMS using Google variables as regressors narrowly beats approaches using Google probabilities to be the best nowcasting model. However, in the longer sample, approaches which use the Google probabilities nowcast best. We note also that this is one of the few variables where a benchmark approach does well. In particular, using the post-2004 sample, an AR(2) model nowcasts quite well (although it does not beat our DMS approach).
With regard to the general question as to whether it is worthwhile to go to the effort of collecting Google data in a macroeconomic forecasting exercise, our results indicate that the answer is yes. Even though the forecaster should take care in investigating the best manner in which the Google variables should be incorporated, we found that incorporating them in some fashion does improve forecasts in almost every case. To dig deeper into this issue, it is informative to look at results for methods which are comparable in every respect except for the way Google variables are included (or not). Thus, if we compare only DMS methods, using post-2004 data, we found that methods which involve the Google data lead to better forecast performance for most of the variables. For inflation, industrial production, the money supply, the FCI and the oil price, we find that the DMS methods which do not use the Google variables always forecast much worse than those that do. For the other variables (i.e. unemployment, wage inflation, commodity price inflation, and the term spread), including Google variables in DMS methods leads to forecasts which are as good as or only slightly better than DMS methods without Google variables. Nevertheless, even in these cases Google variables do seem to be moderately useful. However, we do not find any systematic pattern as to which categories of variables Google data is useful for. For instance, we do not find that Google data is more useful for real variables than for price or financial variables or vice versa.
5. Further Discussion and Conclusions
The preceding discussion reveals a wide variety of findings. The following main conclusions emerge:
1) First, the inclusion of Google data leads to improvements in nowcast performance. This result complements the existing literature by showing that Google search variables are not only useful when dealing with specific disaggregate variables, but can be used to improve nowcasting of broad macroeconomic aggregates.
2) Second, and despite the crude procedure we adopted to create the Google variables, we also find that it is often (albeit not invariably) the case that the information in the Google variables is best included in the form of model probabilities as opposed to simply including Google variables as regressors. The intuition that Google search volumes may provide the econometrician with useful information about which variable is important at each point in time opens the way to a new and more extensive use of this vast database.
3) Third, Google probabilities make sense in a context where the economy is unstable, and are therefore particularly suited to deal with the recent crisis. However, their potential must be exploited with appropriate techniques allowing for model change and parsimony. We compared different techniques responding to such requirements. DMS proved to be a particularly good method for improving nowcast performance in the models we are dealing with, leading to substantial improvements over common benchmarks. It is also worth noting that DMS is a strategy which often nowcasts best, and even when it does not, it does not go too far wrong. Our simple benchmarks, using OLS methods, sometimes also provide reasonable nowcasts but occasionally produce very bad nowcasts.
This is a first and so far successful attempt to use Google variables to improve macroeconomic nowcasting. We proposed two different uses of these variables, one of which is, to our knowledge, completely new and close to the spirit (“what are people concerned about?”) in which these variables are collected. Additional research will be needed to make these results more robust. Our construction of the Google variables, in particular, is extremely simple, and it is quite possible that a more careful choice of searches or a different method of averaging would lead to further improvements in their use.
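To make the second use concrete, the following minimal sketch shows one way Google variables could be turned into model probabilities. It assumes that each Google variable has been rescaled to lie between 0 and 1 and is read as the probability that the corresponding regressor belongs in the nowcasting model, with inclusion treated independently across regressors. The product form, the function name `google_model_probs`, and the numbers are assumptions of this illustration and may differ from the specification used in the empirical work.

```python
from itertools import product


def google_model_probs(z):
    """Map rescaled Google variables to probabilities over regressor subsets.

    z : dict mapping a regressor name to its Google variable at time t,
        rescaled to [0, 1] (hypothetical inputs for illustration).
    """
    names = list(z)
    probs = {}
    # Enumerate every include/exclude pattern, i.e. all 2^K candidate models.
    for pattern in product([False, True], repeat=len(names)):
        p = 1.0
        for name, included in zip(names, pattern):
            # Included regressors contribute z, excluded ones contribute 1 - z.
            p *= z[name] if included else 1.0 - z[name]
        model = tuple(n for n, included in zip(names, pattern) if included)
        probs[model] = p
    return probs


# Toy example: search interest is high for inflation, low for unemployment.
z_t = {"inflation": 0.8, "unemployment": 0.3}
probs = google_model_probs(z_t)
best = max(probs, key=probs.get)  # model with the highest Google probability
```

Under these assumptions the probabilities sum to one across all candidate models, and the model containing only the heavily searched variable receives the largest weight.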
This working chapter should not be reported as representing the views of the ECB. The views expressed are those of the authors and do not necessarily reflect those of the ECB.
Author Biography
Gary Koop is a Professor in the Department of Economics at the University of Strathclyde. He received his PhD at the University of Toronto in 1989. He has held professorial posts at the Universities of Edinburgh, Glasgow, and Leicester and was an Assistant Professor at Boston University, Queen’s University, and the University of Toronto. His research work in Bayesian econometrics has resulted in over a hundred publications in international quality journals. He has also published several textbooks, including Bayesian Econometrics and Bayesian Econometric Methods, and is co-editor of the Oxford Handbook of Bayesian Econometrics. He is on the editorial board of several journals including the Journal of Business and Economic Statistics and the Journal of Applied Econometrics.
Luca Onorante is Senior Economist at the European Central Bank. From 2013 to 2016 he worked as Deputy Head of Research and Head of the Macro Modeling Project and subsequently as Head of the Monetary Policy Division at the Central Bank of Ireland. He holds a PhD from the European University Institute. He was an Economist at the European Central Bank from 2003 to 2013. Prior to this, he was a Research Assistant at the European University Institute and a consultant at UNCTAD. He has taught Macroeconomics at the University of Bolzano and Advanced Macro (master level) at Trinity College in Dublin.
Notes
This list of chapters uses Google data for macroeconomic forecasting. Google data is also being used for nowcasting in other fields such as finance and epidemiology.
A notable exception is the nowcasting of U.S. unemployment in D’Amuri and Marcucci (2009).
Following this chapter, a whole literature has developed focusing on predicting car sales. For instance, Barreira, Godinho, and Melo (2013) apply selected Google Trends data to car sales in Spain, France, Italy and Portugal, finding only mixed evidence that search query data improves prediction. Fantazzini and Toktamysova (2015) also reach mixed conclusions when forecasting car sales in Germany. NymandAndersen and Pantelidis (2018) test an indicator provided by Google Categories in predicting car sales in 12 European countries.
Categories are aggregates of searches that are classified by the Google engine as belonging to a specific category. Examples of top-level categories are ‘Food and beverages’ or ‘News and current events’. Running a regression with 151 explanatory categories, using data beginning in January 2004, is a challenge, raising concerns about overfitting. They address these problems by using Bayesian variable selection methods, involving a spike-and-slab prior, to obtain a more parsimonious model. Their work well illustrates the two problems which must be addressed with Google data: (1) how to select the Google search variables and (2) given the number of Google search variables is typically large, how to ensure parsimony.
Note that the macroeconomic variables and Google variables have different time spans since the internet search data is not available before January 2004. We will discuss how we treat this issue in a subsequent section.
For the case where the Google variables are included as regressors, we only use post-2004 data.
References
Artola, C., & Galan, E. (2012). Tracking the future on the web: Construction of leading indicators using internet searches. Documentos Ocasionales No. 1203, Bank of Spain.
Askitas, N., & Zimmermann, K. (2009). Google econometrics and unemployment forecasting. DIW Berlin, Discussion Paper 899.
Barreira, N., Godinho, P., & Melo, M. (2013). Nowcasting unemployment rate and new car sales in south-western Europe with Google Trends. Netnomics, 14(3), 129–165.
Canova, F. (1993). Modelling and forecasting exchange rates using a Bayesian time varying coefficient model. Journal of Economic Dynamics and Control, 17, 233–262.
Canova, F., & Ciccarelli, M. (2009). Estimating multi-country VAR models. International Economic Review, 50, 929–959.
Canova, F., & Gambetti, L. (2009). Structural changes in the US economy: Is there a role for monetary policy? Journal of Economic Dynamics and Control, 33, 477–490.
Carriere-Swallow, Y., & Labbe, F. (2011). Nowcasting with Google trends in an emerging market. Journal of Forecasting, 32(4), 289–298.
Chamberlin, G. (2010). Googling the present. Economic & Labour Market Review, 4(12), 59–95.
Chan, J., Koop, G., Leon-Gonzalez, R., & Strachan, R. (2012). Time varying dimension models. Journal of Business and Economic Statistics, 30, 358–367.
Choi, H., & Varian, H. (2009). Predicting initial claims for unemployment insurance using Google Trends. Google Technical Report.
Choi, H., & Varian, H. (2011). Predicting the present with Google Trends. Google Technical Report.
Cogley, T., & Sargent, T. (2001). Evolving post-World War II inflation dynamics. NBER Macroeconomics Annual, 16, 331–373.
Cogley, T., & Sargent, T. (2005). Drifts and volatilities: Monetary policies and outcomes in the post WWII U.S. Review of Economic Dynamics, 8, 262–302.
D’Amuri, F., & Marcucci, J. (2009). ‘Google it!’ Forecasting the US unemployment rate with a Google job search index. Institute for Economic and Social Research Discussion Paper 2009-32.
Dangl, T., & Halling, M. (2012). Predictive regressions with time varying coefficients. Journal of Financial Economics, 106, 157–181.
Fantazzini, D., & Toktamysova, Z. (2015). Forecasting German car sales using Google data and multivariate models. International Journal of Production Economics, 170(PA), 97–135.
Geweke, J., & Keane, M. (2007). Smoothly mixing regressions. Journal of Econometrics, 138, 252–290.
Giannone, D., Lenza, M., Momferatou, D., & Onorante, L. (2010). Short-term inflation projections: A Bayesian vector autoregressive approach. ECARES working paper 2010-011.
Hellerstein, R., & Middeldorp, M. (2012). Forecasting with internet search data. Liberty Street Economics Blog of the Federal Reserve Bank of New York, January 4, 2012.
Kholodilin, K., Podstawski, M., & Siliverstovs, B. (2010). Do Google searches help in nowcasting private consumption? A real-time evidence for the US. KOF Swiss Economic Institute Discussion Paper No. 256.
Koop, G., & Korobilis, D. (2012). Forecasting inflation using dynamic model averaging. International Economic Review, 53, 867–886.
Koop, G., Leon-Gonzalez, R., & Strachan, R. (2009). On the evolution of the monetary policy transmission mechanism. Journal of Economic Dynamics and Control, 33, 997–1017.
Koop, G., & Onorante, L. (2012). Estimating Phillips curves in turbulent times using the ECB’s Survey of Professional Forecasters. European Central Bank, working paper number 1422.
Koop, G., & Tole, L. (2013). Forecasting the European carbon market. Journal of the Royal Statistical Society, Series A, 176(Part 3), 723–741.
McLaren, N., & Shanbhoge, R. (2011). Using internet search data as economic indicators. Bank of England Quarterly Bulletin, June 2011.
Nicoletti, G., & Passaro, R. (2012). Sometimes it helps: Evolving predictive power of spreads on GDP. European Central Bank, working paper number 1447.
Nymand-Andersen, P., & Pantelidis, E. (2018). Nowcasting euro area car sales and big data quality requirements. Mimeo.
Primiceri, G. (2005). Time varying structural vector autoregressions and monetary policy. Review of Economic Studies, 72, 821–852.
Raftery, A., Karny, M., & Ettler, P. (2010). Online prediction under model uncertainty via dynamic model averaging: Application to a cold rolling mill. Technometrics, 52, 52–66.
RiskMetrics. (1996). Technical Document (4th ed.). Retrieved from http://www.riskmetrics.com/system/files/private/td4e.pdf
Schmidt, T., & Vosen, S. (2009). Forecasting private consumption: Survey-based indicators vs. Google trends. Ruhr Economic Papers, No. 155.
Scott, S., & Varian, H. (2012). Bayesian variable selection for nowcasting economic time series. Retrieved from http://people.ischool.berkeley.edu/~hal/people/hal/papers.html
Stock, J., & Watson, M. (1996). Evidence on structural instability in macroeconomic time series relations. Journal of Business and Economic Statistics, 14, 11–30.
Suhoy, T. (2009). Query indices and a 2008 downturn: Israeli data. Bank of Israel Discussion Paper No. 2009.06.
West, M., & Harrison, J. (1997). Bayesian forecasting and dynamic models (2nd ed.). New York, NY: Springer.
Wu, L., & Brynjolfsson, E. (2010). The future of prediction: How Google searches foreshadow housing prices and sales. MIT Technical Report.
Appendix: Categorization of Google Search Terms
Terms are grouped by category; each category name starts a new line and is followed by its search terms.
Commodity Price Inflation: steel price, food price, copper price.
Financial Conditions Index: stock compensation, investment banking, growth equity, goldman sachs, equity compensation.
Industrial Production: production, production jobs, production company, production companies, US GDP growth, urban growth, the great depression, tax calculator, small business growth, sales growth, sales compensation, revenue growth, recession, recession inflation, market growth, growth, growth industries, growth financial, growth company, growth companies, great depression, great depression deflation, GDP growth, economy, economic growth, cycle, crisis, business growth, business cycle.
Inflation: what is inflation, what is deflation, US inflation, US inflation rates, US inflation rate, US inflation index, US deflation, United States inflation, U.S. inflation, real inflation, rate of inflation, price inflation, price index, national inflation, investing deflation, inflation, inflation USA, inflation stocks, inflation rates, inflation rate, inflation or deflation, inflation money, inflation index, inflation in US, inflation graph, inflation forecast, inflation deflation, inflation definition, inflation data, inflation chart, inflation calculator, inflation and deflation, India inflation, historical inflation, high inflation, fed deflation, economic inflation, economic deflation, depression deflation, deflation, deflation rate, deflation interest rates, deflation in us, deflation gold, deflation economy, definition inflation, definition deflation, define inflation, debt deflation, current inflation, current inflation rate, cpi, cpi index, cost of inflation, consumer price index.
Money: money, money deflation, monetary policy, monetary deflation.
Oil Price Inflation: oil production, oil prices, oil price, gasoline price, gas price, energy production, energy price, electricity price, diesel price.
Term Spread: US interest rate, the fed, real interest rate, prime rate, prime interest rate, mortgage rate, mortgage interest rates, lower interest rate, libor, libor rate, libor interest rate, interest rates, interest rates inflation, interest rate, interest rate trends, interest rate risk, interest rate reduction, interest rate predictions, interest rate news, interest rate mortgage, interest rate model, interest rate inflation, interest rate history, interest rate forecast, interest rate fed, interest rate drop, interest rate cuts, interest rate cut, interest rate chart, interest rate calculator, feds interest rate, federal reserve, federal interest rate, fed, fed rates, fed rate, fed rate cut, fed interest rates, fed interest rate, fed cut, discount rate, current interest rate.
Unemployment Rate: Washington unemployment, US unemployment, US unemployment rate, unemployment, unemployment statistics, unemployment rates, unemployment rate, unemployment pa, unemployment office, unemployment Michigan, unemployment insurance, unemployment great depression, unemployment extension, unemployment depression, unemployment checks, unemployment check, unemployment benefits, Texas unemployment, subsidies, state compensation fund, Oregon unemployment, Ohio unemployment, NY unemployment, NJ unemployment, New York unemployment, Michigan works, Michigan works unemployment, Michigan state unemployment, Marvin unemployment, Marvin Michigan unemployment, job growth, Florida unemployment, federal unemployment, employee benefits, depression unemployment rate, compensation packages, compensation package, California unemployment.
Wage Inflation: workers compensation, workers compensation Ohio, workers compensation insurance, what is compensation, walmart wages, wages, wages calculator, wage, wage inflation, vice president salary, US wages, unpaid wages, union wages, total compensation, state wages, state employee wages, salary, salary tax calculator, salary survey, salary schedule, salary requirements, salary raise, salary grade, salary comparison, salary calculator hourly, salaries, real wages, project manager salary, pilot salary, paycheck calculator, nfl salary, nfl minimum salary, minimum wages, labor wages, labor and wages, job wages, investment banking salary, incentive compensation, human resources salary, human resources compensation, hr compensation, hourly wages, gross wages, gross salary, federal wages, federal salary, executive compensation, employment wages, employee wages, employee compensation, director compensation, deferred compensation, compensation, compensation time, compensation system, compensation structure, compensation resources, compensation plans, compensation plan, compensation manager, compensation consulting, compensation analyst, China wages, ceo salary, ceo compensation, calculate salary, bonus compensation, benefits and compensation, average wages, average salary, average nfl salary, and annual compensation.
Acknowledgments
This research was supported by the ESRC under grant RES062232646. Gary Koop is a Fellow at the Rimini Centre for Economic Analysis. Address for correspondence: Gary Koop, Department of Economics, University of Strathclyde, 130 Rottenrow, Glasgow G4 0GE, UK. Email: Gary.Koop@strath.ac.uk