This paper compares the multi-period forecasting performance of the direct and iterated methods using Bayesian vector autoregressive (VAR) models.
The paper adopts Bayesian VAR models with three different priors: the independent Normal-Wishart prior, the Minnesota prior and the stochastic search variable selection (SSVS) prior. Monte Carlo simulations are conducted to compare forecasting performance. An empirical study using US macroeconomic data is presented as an illustration.
In theory, direct forecasts are asymptotically more efficient and more robust to model misspecification than iterated forecasts, whereas iterated forecasts tend to be biased but are more efficient if the one-period-ahead model is correctly specified. In the Monte Carlo simulations, iterated forecasts tend to outperform direct forecasts, particularly for models with longer lags and at longer forecast horizons. Implementing the SSVS prior generally improves forecasting performance over the unrestricted VAR model for both nonstationary and stationary data.
The paper finds that iterated forecasts from the model with the SSVS prior generally perform best, suggesting that the SSVS restrictions on insignificant parameters alleviate the over-parameterization problem of the VAR in one-step-ahead forecasting and thus offer an appreciable improvement in the performance of iterated forecasts.
Sugita, K. (2022), "Forecasting with Bayesian vector autoregressive models: comparison of direct and iterated multistep methods", Asian Journal of Economics and Banking, Vol. 6 No. 2, pp. 142-154. https://doi.org/10.1108/AJEB-04-2022-0044
Emerald Publishing Limited
Copyright © 2022, Katsuhiro Sugita
Published in Asian Journal of Economics and Banking. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode.
1. Introduction
Vector autoregressive (VAR) models have been widely used to forecast macroeconomic variables and to analyze macroeconomic dynamics and policy. For one-period-ahead forecasting, one only has to estimate the model. However, forecasts more than one period ahead are often of interest. There are two methods for making a multi-period forecast, the direct method and the iterated method, and several theoretical studies have examined which is better, including Bhansali (1996, 1997), Clements and Hendry (1996), Kang (2003), Chevillon and Hendry (2005) and Ing (2003), among others. This literature tends to conclude that direct forecasts are more robust to model misspecification and asymptotically more efficient, so that the direct method is preferable, while the iterated method is more efficient only if the one-period-ahead model is correctly specified. However, some empirical studies show that iterated forecasts outperform direct forecasts. Ang et al. (2006) find that iterated forecasts of US GDP growth perform better than direct forecasts. Marcellino et al. (2006), using 170 US monthly macroeconomic time series in both univariate and multivariate models, show that iterated forecasts outperform direct forecasts, especially with longer lags and longer forecast horizons. Pesaran et al. (2011) state that whether the direct or the iterated method is better in multi-period forecasting depends on the sample size, the forecast horizon, the underlying data generating process (DGP) and the method used to select the lag length, and thus it is ultimately an empirical matter.
For multivariate VAR models, there is an over-parameterization problem, which leads to imprecise inference and thus degrades forecast performance. Bayesian approaches to VAR models have become increasingly popular because they can shrink VAR models through restrictions imposed by the prior distributions. In this paper we investigate whether restricted, parsimonious VAR models can mitigate the misspecification problem and thus improve the forecasting performance of the iterated method. An independent Normal-Wishart prior is used for the unrestricted VAR, and the Minnesota prior (Minn) and the stochastic search variable selection (SSVS) prior are used as restricted priors, to compare multi-period forecasting performance between the direct and iterated methods.
We conduct numerical simulations using both stationary and nonstationary data generating processes (DGPs) to evaluate forecasting performance at 2-, 4-, 8- and 12-step-ahead horizons, and compute the mean squared forecast error (MSFE) to compare direct forecasts with iterated forecasts using Bayesian VAR models with unrestricted and restricted priors. Iterated forecasts are found to outperform direct forecasts for both unrestricted and restricted VAR models, particularly for models with long lags and at long forecast horizons. Implementing SSVS in a VAR is found to improve forecasting performance appreciably in general. With a relatively long lag length, and thus a large number of parameters in the model, SSVS appears to restrict insignificant parameters effectively and thereby improve forecasting performance.
The plan of this paper is as follows. Section 2 describes multi-period forecasting using a VAR model and the method used to evaluate forecasting performance. Section 3 reviews Bayesian VAR models with three different priors: the independent Normal-Wishart prior, the Minnesota prior and the SSVS prior. Section 4 presents numerical experiments with artificially generated data and examines the results of the simulations. Section 5 presents an application to a simple three-variable VAR for the US macroeconomy. Section 6 concludes. This paper is based on preliminary working papers, Sugita (2018), Sugita (2019a) and Sugita (2019b). All results reported in this paper are generated using Ox version 7.2 for Linux (see Doornik, 2013).
2. Iterated and direct multi-period forecasts for VAR models
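This section describes iterated and direct forecasting methods for VAR models. Let $y_t$ be an n × 1 vector of observations at time t; then a VAR model with p lags is written as
$$y_t = \mu + \sum_{i=1}^{p} \Theta_i y_{t-i} + \varepsilon_t, \qquad (1)$$
where $\mu$ is an n × 1 vector of intercepts, $\Theta_i$ are n × n coefficient matrices and $\varepsilon_t$ is an n × 1 error vector with covariance matrix $\Sigma$.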
The one-step-ahead forecast of the VAR model is obtained by estimating the parameters in eq. (1) as $\hat{y}_{t+1|t} = \hat{\mu} + \sum_{i=1}^{p} \hat{\Theta}_i y_{t+1-i}$, where $\hat{\mu}$ and $\hat{\Theta}_i$ are the estimators of μ and Θi in eq. (1). To forecast more than one period ahead, there are two methods: iterated forecasts and direct forecasts. Iterated forecasts for horizon h are obtained recursively as
$$\hat{y}_{t+h|t} = \hat{\mu} + \sum_{i=1}^{p} \hat{\Theta}_i \hat{y}_{t+h-i|t},$$
where $\hat{y}_{j|t} = y_j$ for j ≤ t.
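Direct forecasts for horizon h are obtained by estimating the model
$$y_{t+h} = \mu^{(h)} + \sum_{i=1}^{p} \Theta_i^{(h)} y_{t+1-i} + \varepsilon_{t+h}^{(h)}, \qquad (4)$$
in which the horizon-specific coefficients $\mu^{(h)}$ and $\Theta_i^{(h)}$ project $y_{t+h}$ directly on information available at time t, so that no intermediate forecasts are needed.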
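The difference between the two methods can be made concrete with a minimal sketch. It assumes that coefficient estimates are already available (estimation is not shown), and the function names `iterated_forecast` and `direct_forecast` are illustrative, not taken from the paper's Ox code. The iterated method feeds each forecast back into the one-step model, whereas the direct method applies horizon-specific coefficients once.

```python
def mat_vec(matrix, vector):
    """Multiply an n x n matrix (list of rows) by an n-vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def iterated_forecast(mu, thetas, history, h):
    """h-step forecast by iterating the one-step VAR(p) model.

    mu: intercept vector; thetas: [Theta_1, ..., Theta_p];
    history: observed vectors, most recent last (at least p of them).
    """
    p = len(thetas)
    path = [list(y) for y in history[-p:]]
    for _ in range(h):
        y_next = list(mu)
        for lag, theta in enumerate(thetas, start=1):
            contribution = mat_vec(theta, path[-lag])
            y_next = [a + b for a, b in zip(y_next, contribution)]
        path.append(y_next)  # the forecast is reused as a regressor
    return path[-1]


def direct_forecast(mu_h, thetas_h, history):
    """h-step forecast from horizon-specific coefficients, applied once."""
    y_hat = list(mu_h)
    for lag, theta in enumerate(thetas_h, start=1):
        contribution = mat_vec(theta, history[-lag])
        y_hat = [a + b for a, b in zip(y_hat, contribution)]
    return y_hat


# Toy VAR(1) with known coefficients (hypothetical numbers).
theta = [[0.5, 0.0], [0.0, 0.5]]
theta_squared = [[0.25, 0.0], [0.0, 0.25]]  # implied 2-step coefficient
y_iter = iterated_forecast([0.0, 0.0], [theta], [[4.0, 8.0]], h=2)
y_dir = direct_forecast([0.0, 0.0], [theta_squared], [[4.0, 8.0]])
print(y_iter, y_dir)
```

In this toy case the two forecasts coincide because the direct coefficient is set to the square of the one-step coefficient. With estimated coefficients they generally differ, which is what the comparison in this paper is about.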
To compare forecasting performance across models, the MSFE is the most widely used measure. Let $y_{\tau+h}$ be the vector of observations at time τ + h for τ = τ0, …, T − h − 1 and h = 2, 4, 8 and 12. Then the forecast $\hat{y}_{\tau+h}$ is computed for both the direct and the iterated method, using information up to τ − 1, for each τ from τ0 up to T − h − 1, and the MSFE in eq. (5) is calculated as the average of the squared forecast errors $y_{\tau+h} - \hat{y}_{\tau+h}$ over these forecast origins.
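As a minimal sketch of this evaluation step, the function below averages squared forecast errors over forecast origins for each variable. It assumes that realized values and forecasts have already been collected, one vector per forecast origin. The helper names are hypothetical, and the summation over variables follows the convention used for the tables in Section 4.

```python
def msfe_per_variable(actuals, forecasts):
    """Mean squared forecast error for each variable.

    actuals, forecasts: lists of equal length, one n-vector per forecast origin.
    """
    if not actuals or len(actuals) != len(forecasts):
        raise ValueError("need the same positive number of actuals and forecasts")
    n_origins = len(actuals)
    n_vars = len(actuals[0])
    return [
        sum((a[k] - f[k]) ** 2 for a, f in zip(actuals, forecasts)) / n_origins
        for k in range(n_vars)
    ]


def total_msfe(actuals, forecasts):
    """Sum of the per-variable MSFEs."""
    return sum(msfe_per_variable(actuals, forecasts))


# Two forecast origins, two variables (hypothetical numbers).
realized = [[1.0, 2.0], [3.0, 4.0]]
predicted = [[0.0, 2.0], [1.0, 2.0]]
print(msfe_per_variable(realized, predicted), total_msfe(realized, predicted))
```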
3. Bayesian VARs
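This section presents Bayesian VAR models with three different priors: the independent Normal-Wishart prior, the Minnesota prior and the SSVS prior. The VAR model in eq. (1) can be written in matrix form as follows:
$$Y = X\Phi + E, \qquad (6)$$
where $Y = (y_1, \ldots, y_T)'$ is T × n, the t-th row of X is $(1, y_{t-1}', \ldots, y_{t-p}')$, $\Phi = (\mu, \Theta_1, \ldots, \Theta_p)'$ is (1 + np) × n and $E = (\varepsilon_1, \ldots, \varepsilon_T)'$.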
3.1 Independent Normal-Wishart prior
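For the VAR model in eq. (6), the independent Normal-Wishart prior places a normal prior on vec(Φ), with prior mean vec(Φ0) and prior covariance V0 (eq. (7)), and a Wishart prior on Σ^{-1}, with hyperparameters Σ0 and ν0 (eq. (8)). The conditional posterior of vec(Φ) given Σ is then normal, and the conditional posterior of Σ^{-1} given Φ is Wishart with posterior degrees of freedom ν* = T + ν0. Given these conditional posterior specifications, the Gibbs sampler generates sample draws.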
Note that, with zero prior mean Φ0 = 0 and a large prior variance V0 in eq. (7), the posterior mean of Φ is almost identical to the maximum likelihood estimator. In this paper the hyperparameters are set at vec(Φ0) = 0 and V0 = 100 in eq. (7), and Σ0 = 0.1I and ν0 = 5 in eq. (8).
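For reference, the conditional posteriors used by such a Gibbs sampler can be written out under one standard parameterization. This is an assumption: the model is taken to be Y = XΦ + E with the rows of E distributed N(0, Σ), and the Wishart prior is written in terms of the inverse of Σ0. Other equivalent parameterizations exist, and the one used in the paper may differ.

```latex
% Assumed prior: vec(\Phi) ~ N(vec(\Phi_0), V_0),  \Sigma^{-1} ~ W(\Sigma_0^{-1}, \nu_0)
\begin{align*}
\operatorname{vec}(\Phi) \mid \Sigma, Y &\sim N\!\left(\operatorname{vec}(\Phi_*),\, V_*\right), \\
V_* &= \left(V_0^{-1} + \Sigma^{-1} \otimes X'X\right)^{-1}, \\
\operatorname{vec}(\Phi_*) &= V_*\left(V_0^{-1}\operatorname{vec}(\Phi_0)
   + (\Sigma^{-1} \otimes X')\operatorname{vec}(Y)\right), \\
\Sigma^{-1} \mid \Phi, Y &\sim W\!\left(\Sigma_*^{-1},\, \nu_*\right), \\
\Sigma_* &= \Sigma_0 + (Y - X\Phi)'(Y - X\Phi), \qquad \nu_* = T + \nu_0 .
\end{align*}
```

Under this parameterization, setting Φ0 = 0 and letting V0^{-1} approach zero gives vec(Φ*) = (Σ^{-1} ⊗ X'X)^{-1}(Σ^{-1} ⊗ X')vec(Y) = vec((X'X)^{-1}X'Y), which is the OLS and Gaussian maximum likelihood estimator. This is the sense in which a diffuse prior reproduces the maximum likelihood estimate.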
3.2 Minnesota prior
Litterman (1986) proposes what is now called the Minnesota prior, a shrinkage prior for Bayesian VAR models with random walk components. For a VAR model with p lags as in eq. (1), the Minnesota prior assumes that the importance of lagged variables shrinks with the lag length, so that the prior becomes tighter around zero as the lag increases. The prior mean is $E(\Theta_1) = I_n$ and $E(\Theta_i) = 0$ for i > 1, that is, one for the coefficient on each variable's own first lag and zero for all other coefficients, and the prior variance of $\Theta_i$ decreases with the lag i.
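To illustrate how such a prior tightens with the lag, the sketch below builds prior means and variances using one common textbook form of the Minnesota prior. This form is an assumption and is not necessarily the specification used in this paper. The hyperparameter names `lam1` (overall tightness) and `lam2` (cross-variable tightness), their default values, and the scale vector `sigma` are all hypothetical.

```python
def minnesota_prior(n_vars, n_lags, sigma, lam1=0.2, lam2=0.5):
    """Prior means and variances for Theta_1, ..., Theta_p.

    Assumed form: own-lag variance (lam1 / lag)^2 and cross-lag variance
    (lam1 * lam2 * sigma[eq] / (lag * sigma[var]))^2, where sigma holds a
    scale (for example a residual standard deviation) for each variable.
    """
    means, variances = [], []
    for lag in range(1, n_lags + 1):
        mean = [
            [1.0 if (lag == 1 and eq == var) else 0.0 for var in range(n_vars)]
            for eq in range(n_vars)
        ]
        variance = [
            [
                (lam1 / lag) ** 2
                if eq == var
                else (lam1 * lam2 * sigma[eq] / (lag * sigma[var])) ** 2
                for var in range(n_vars)
            ]
            for eq in range(n_vars)
        ]
        means.append(mean)
        variances.append(variance)
    return means, variances


prior_means, prior_vars = minnesota_prior(n_vars=2, n_lags=2, sigma=[1.0, 2.0])
print(prior_means[0])                              # random-walk mean on the first lag
print(prior_vars[0][0][0], prior_vars[1][0][0])    # own-lag variance shrinks with the lag
```

When all series are stationary, the prior mean on the first own lag would be set to zero instead of one.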
3.3 SSVS prior
Without any restriction on the regression coefficients and the covariance matrix in eq. (1), VAR models usually suffer from an over-parameterization problem: they contain a very large number of parameters, leading to imprecise inference and a deterioration in forecast performance. To overcome this problem, George et al. (2008) apply the Bayesian SSVS method to a VAR. The SSVS method, developed in George and McCulloch (1997) and George et al. (2008), restricts the parameters of the model by using a hierarchical prior on them.
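SSVS specifies the prior for the VAR coefficients Φ element by element. Let ϕj denote the j-th element of Φ; the prior for ϕj is then a hierarchical mixture of two normal distributions with different variances, conditional on an unknown dummy variable γj that takes the value zero or one:
$$\phi_j \mid \gamma_j \sim (1 - \gamma_j)\, N(0, \tau_{0j}^2) + \gamma_j\, N(0, \tau_{1j}^2),$$
where $\tau_{0j}$ is small, so that ϕj is shrunk toward zero when γj = 0, and $\tau_{1j}$ is large, so that ϕj is left essentially unrestricted when γj = 1.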
George and McCulloch (1997) and George et al. (2008) use a default semiautomatic approach that sets the standard deviations of the two mixture components to $\tau_{kj} = c_k \hat{\sigma}_{\phi_j}$ for k = 0, 1, where $\hat{\sigma}_{\phi_j}$ is the OLS estimate of the standard error of ϕj in an unrestricted VAR, and the pre-selected constants must satisfy c0 < c1, for example c0 = 0.1 and c1 = 10 as used by George et al. (2008), Jochmann et al. (2010) and Jochmann et al. (2013). In this paper, we adopt these values for the hyperparameters.
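The role of the two mixture components can be seen from the conditional posterior of the indicator γj given a coefficient value, which is the quantity a Gibbs sampler for this mixture prior draws from. The sketch below computes that probability on the log scale for numerical stability. The prior inclusion probability of 0.5 and the OLS standard error of 1.0 are assumptions made for illustration, and the function name is hypothetical.

```python
import math


def ssvs_inclusion_probability(phi, tau0, tau1, prior_prob=0.5):
    """P(gamma_j = 1 | phi_j) under the two-component normal mixture prior.

    phi: current coefficient value; tau0 < tau1: prior standard deviations
    of the tight and the diffuse component; prior_prob: P(gamma_j = 1).
    """
    # Log odds of the diffuse component against the tight component.
    log_odds = (
        math.log(prior_prob) - math.log(1.0 - prior_prob)
        + math.log(tau0) - math.log(tau1)
        - 0.5 * phi ** 2 * (1.0 / tau1 ** 2 - 1.0 / tau0 ** 2)
    )
    # Numerically stable logistic transform.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


# Semiautomatic choice tau_k = c_k * se with c0 = 0.1 and c1 = 10.
ols_se = 1.0  # hypothetical OLS standard error of the coefficient
tau0, tau1 = 0.1 * ols_se, 10.0 * ols_se
p_small = ssvs_inclusion_probability(0.0, tau0, tau1)  # coefficient at zero
p_large = ssvs_inclusion_probability(5.0, tau0, tau1)  # coefficient far from zero
print(p_small, p_large)
```

A coefficient close to zero is assigned to the tight component with high probability and is therefore effectively restricted, while a coefficient far from zero is almost surely left unrestricted.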
4. Monte Carlo simulations
This section presents Monte Carlo simulations to illustrate the forecasting performance of both the iterated and the direct forecast methods using VAR models. Two DGPs are considered: one follows a nonstationary process and the other a stationary process. For each DGP, 100 samples of size T = 150 are simulated, and for each sample three priors are compared: (1) the independent Normal-Wishart (INW) prior, (2) the Minnesota (Minn) prior and (3) the SSVS prior.
The following two DGPs for VARs are considered in this experiment. Both DGPs contain an intercept term. DGP 1 is a four-variable VAR with four lags whose parameters imply unit roots.
DGP 2 is also a four-variable VAR with four lags, but its parameters generate stationary data.
Each DGP is simulated 100 times to obtain 100 samples. The lag length p is determined in three ways: (1) p = 4 (fixed), (2) p = 8 (fixed) and (3) p chosen by the Akaike information criterion (AIC) with 0 ≤ p ≤ 12. The first choice, p = 4, is the true lag length. We do not use the Bayesian information criterion (BIC) to determine the lag length, since the BIC generally chooses a short lag length and, with SSVS, short-lag models need not be considered separately. When the lag is selected by the AIC, the AIC is computed at each date τ, where τ0 ≤ τ ≤ T − h, based on the one-step-ahead regression for the iterated forecasts and on the h-step-ahead regression in eq. (4) for the direct forecasts. For each τ in eq. (5), from τ = τ0 up to τ = T − h − 1, the MCMC is run with 20,000 draws after 5,000 burn-in draws to compute the MSFEs in eq. (5) for each estimator in a recursive forecasting exercise using both the iterated and the direct multi-period forecasting methods.
We now examine the results of the Monte Carlo simulations for multi-step forecasting. Table 1 summarizes the MSFEs of both the iterated and the direct forecast methods at forecast horizons of 2, 4, 8 and 12 steps ahead. Each MSFE in the table is the sum of the MSFEs of the individual variables. For all series, pseudo-out-of-sample forecasts are computed for τ = 80 to τ = 150 − h − 1, and the MSFE is then calculated as defined in eq. (5). Each figure in Table 1 is the average over the MSFEs of the 100 samples. Inspection of Table 1 suggests the following:
Among the three estimators (INW, Minn and SSVS), the SSVS produces the lowest MSFE in most cases, though in a very few cases of direct forecasts the Minn performs marginally better than the SSVS.
The forecast performance of the SSVS prior tends to be insensitive to the choice of the lag length, while the performance of the INW estimator deteriorates considerably as the lag length increases. That is, even if the lag length exceeds 4 (the true lag length), the SSVS treats the coefficients on longer lags as zero, while the forecast performance of the other two models depends largely on the selection of the lag length. The Minnesota prior does effectively shrink the parameters on the longer lags.
For the INW and the SSVS, the iterated method is better than the direct method. For the Minn, the iterated method is better than the direct method for DGP 2, but in some cases worse for DGP 1.
For these DGPs, the SSVS model with iterated forecasts performs best at every forecast horizon.
Table 2 reports the distributions of the relative MSFE, that is, the ratio of the MSFE of the direct forecast to the MSFE of the iterated forecast, $\mathrm{MSFE}^{D}_{h} / \mathrm{MSFE}^{I}_{h}$, for each forecast horizon h. The table shows the mean, the standard deviation and the 95% highest posterior density interval (HPDI) of the relative MSFE, together with pr.(<1), the probability that the ratio is less than 1 (that is, that the direct forecasts perform better than the iterated forecasts). The following results are found:
For the INW and the SSVS, the mean values of the relative MSFEs are always greater than 1 (meaning that the iterated forecasts outperform the direct forecasts), while for the Minn the ratios are sometimes greater and sometimes less than 1.
For the INW and the SSVS, the mean values of the relative MSFEs increase with the forecast horizon, meaning that the relative performance of the iterated forecasts improves as the horizon lengthens.
The MSFE ratios for the INW are quite sensitive to the choice of the lag length: the longer the lag length, the larger the relative MSFEs. By contrast, the relative MSFE for the SSVS is not affected by the choice of the lag length, because the SSVS itself is insensitive to the lag length.
For all three estimators, the standard deviation of the relative MSFE increases with the forecast horizon.
Except for the Minn, the probability that the ratio is less than 1 is generally smaller for long lag lengths.
In the case of the Minn with nonstationary data, direct forecasts tend to have lower MSFEs than iterated forecasts, whereas with stationary data the opposite holds and iterated forecasts perform better than direct forecasts.
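The summary statistics behind these comparisons can be sketched as follows, assuming one MSFE per simulated sample for each method. The function name and the numbers are hypothetical, and the use of the sample standard deviation is an assumption; the interval estimate reported in Table 2 is not reproduced here.

```python
import statistics


def relative_msfe_summary(msfe_direct, msfe_iterated):
    """Summarize the ratios MSFE(direct) / MSFE(iterated) across samples."""
    ratios = [d / i for d, i in zip(msfe_direct, msfe_iterated)]
    return {
        "mean": statistics.mean(ratios),
        "sd": statistics.stdev(ratios),  # sample standard deviation
        # Share of samples in which the direct forecast has the lower MSFE.
        "pr_lt_1": sum(r < 1.0 for r in ratios) / len(ratios),
    }


# Four simulated samples (hypothetical MSFEs).
summary = relative_msfe_summary([2.0, 1.0, 3.0, 0.5], [1.0, 2.0, 1.5, 1.0])
print(summary)
```

A mean ratio above 1 together with a small value of `pr_lt_1` corresponds to the case in which iterated forecasts dominate.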
These findings show that, for the given DGPs, the SSVS has almost the same forecasting properties as the INW, in line with Marcellino et al. (2006). That is, with the SSVS the iterated forecast also tends to outperform the direct forecast, especially with long lags and longer forecast horizons. For the Minn, the results are ambiguous. Since the Minnesota prior sets the prior mean of the coefficient on the first own lag to 1 and that of all other coefficients to zero, it is prone to producing misspecified parameter estimates.
5. An empirical analysis
As an empirical example of the comparison of direct and iterated forecasts using Bayesian VAR models, this section considers a multivariate model of the US macroeconomy with three variables: the unemployment rate, the inflation rate and the interest rate. VAR models of these variables have been analyzed by Cogley and Sargent (2005), Primiceri (2005), Koop et al. (2009) and Jochmann et al. (2010), among many others. Our US data are quarterly, from 1953:1 to 2020:1, with sample size T = 268. The unemployment rate is measured by the civilian unemployment rate, inflation by 400 times the difference of the log of the CPI, which is the GDP chain-type price index, and the interest rate by the three-month Treasury bill rate. These data are obtained from the Federal Reserve Bank of St. Louis and are plotted in Figure 1.
The number of lags selected for a VAR affects estimation efficiency and thus forecasting performance. Cogley and Sargent (2005) and Primiceri (2005) work with a VAR(2) to analyze the US macroeconomy with the three variables, without giving any particular reason for this choice of lag length. Jochmann et al. (2010) use a VAR(4) for their SSVS VAR model, because the SSVS can find zero restrictions on the parameters of longer lags even if the true lag length is less than 4. However, the true lag length might be larger than 4. With our data set, the selected number of lags varies widely with the criterion used: VAR(10) by the AIC, VAR(4) by the Hannan–Quinn criterion and VAR(2) by the BIC. Even if the true lag length is less than 10, the SSVS can set zero restrictions on the longer lags; we therefore consider a VAR(12) and a VAR(AIC), in which the lag length is chosen by the AIC. Forecast horizons are 2, 4, 8 and 12 periods ahead. We conduct a recursive forecasting exercise using both the direct and the iterated multistep forecasting methods, with data up to time τ − 1, where τ = τ0, …, T − h − 1 and τ0 = 80.
Table 3 presents the MSFEs in eq. (5) for the three-variable VAR, with the lag length fixed at 12 and with the lag length chosen by the AIC, for the INW, the Minn and the SSVS estimators. At every forecast horizon, iterated forecasts have lower MSFEs than direct forecasts. With a sufficiently long lag length of 12, the SSVS gives better forecast performance than the other methods. However, with the lag selected by the AIC, the VAR with the Minnesota prior has the lowest MSFE in almost half of the cases. The lag length chosen by the AIC is shorter than 12, and the resulting MSFEs are smaller than those of the models with lag length 12. This suggests that a lag length of 12 may be too long, containing unnecessary lags or parameters. The SSVS is designed to restrict such insignificant coefficients to zero, and the results for lag length 12 indicate that it is effective in ensuring parsimony in the over-parameterized VAR(12) model.
The three variables used in this empirical analysis appear to be nonstationary, and thus a transformation to stationary data by first differencing is also considered. In this case, the forecasting models are estimated using Δyt instead of yt in eq. (1), and these models are then used to compute forecasts of the level of yt+h for both the iterated and the direct methods. All elements of the prior mean for the Minnesota prior are set to zero, since all series are made stationary by first differencing. Table 4 presents the MSFEs for the first-difference data. These MSFEs tend to be lower than those for the level data, particularly for the inflation rate. The results again indicate that iterated forecasts outperform direct forecasts and that the SSVS improves forecast performance relative to the other models, though the Minn performs better in some cases.
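One natural way to map forecasts of changes back to a level forecast, assumed here for the iterated case, is to add the cumulated forecast changes to the last observed level. The helper name below is hypothetical, and the sketch does not cover how a direct model for the differenced data would be specified.

```python
def level_forecast_from_differences(last_level, diff_forecasts):
    """Level forecast at horizon h from forecasts of the first differences.

    last_level: last observed vector y_t;
    diff_forecasts: forecasts of the changes for t+1, ..., t+h, in order.
    """
    level = list(last_level)
    for diff in diff_forecasts:
        level = [lv + d for lv, d in zip(level, diff)]
    return level


# Two variables, horizon h = 2 (hypothetical numbers).
y_level = level_forecast_from_differences([1.0, 2.0], [[0.5, 0.0], [0.5, -1.0]])
print(y_level)
```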
6. Conclusion
This paper compares direct and iterated multistep forecasting performance using three estimators for the VAR model: the independent Normal-Wishart (INW) prior, the Minnesota prior and the SSVS prior. Theoretically, the direct method is preferable, since direct forecasts tend to be asymptotically efficient and more robust to model misspecification, whereas iterated forecasts are more efficient if the one-step-ahead model is not misspecified. Since George et al. (2008) show that a VAR with the SSVS prior greatly improves the one-step-ahead forecast, the coefficients are estimated more efficiently, and an iterated multi-period forecast could therefore be more efficient than the direct one. It is thus of interest to compare direct with iterated forecasts using the SSVS VAR model. Pesaran et al. (2011) note that whether the direct or the iterated method produces better forecasts is ultimately an empirical question; accordingly, this paper compares the two methods for three VAR estimators using two DGPs and US macroeconomic data. The results agree with those of Marcellino et al. (2006): iterated forecasts for the INW and SSVS estimators have lower MSFEs than direct forecasts, particularly for models with long lags and at longer forecast horizons, while the results are ambiguous for the Minnesota prior. In most cases, the SSVS estimator appreciably improves forecast performance relative to the INW and Minnesota estimators.
As an empirical example, an application to the US macroeconomy is studied to show the benefit of using the SSVS prior in a VAR. With longer lags, and thus a large number of parameters, many of which may be insignificant, SSVS appears to alleviate the over-parameterization problem of the VAR model by restricting insignificant parameters, and thereby improves forecasting performance. The Minnesota prior nevertheless produces smaller MSFEs than SSVS in some cases, since it also shrinks the coefficients on longer lags.
Overall, iterated forecasts are found to perform better than direct forecasts, and Bayesian methods with restricted priors, namely the Minnesota prior and the SSVS prior, outperform the INW, particularly with longer lags and longer forecast horizons.
Table 1. Monte Carlo simulation: average MSFEs (columns: DGP 1 and DGP 2, each by forecast horizon; row groups: Lag = 4, Lag = 8, Lag by AIC)
Table 2. Distributions of the relative MSFE (columns: DGP 1 and DGP 2, each by forecast horizon; row groups: Lag = 4, Lag = 8, Lag by AIC)
Table 3. MSFEs for US data: level data (row groups: Lag = 12, Lag by AIC)
Table 4. MSFEs for US data: first difference data (row groups: Lag = 11, Lag by AIC)
References
Ang, A., Piazzesi, M. and Wei, M. (2006), “What does the yield curve tell us about GDP growth?”, Journal of Econometrics, Vol. 131 Nos 1-2, pp. 359-403.
Bhansali, R.J. (1996), “Asymptotically efficient autoregressive model selection for multistep prediction”, Annals of the Institute of Statistical Mathematics, Vol. 48 No. 3, pp. 577-602.
Bhansali, R. (1997), “Direct autoregressive predictors for multistep prediction: order selection and performance relative to the plug in predictors”, Statistica Sinica, Vol. 7, pp. 425-449.
Chevillon, G. and Hendry, D.F. (2005), “Non-parametric direct multi-step estimation for forecasting economic processes”, International Journal of Forecasting, Vol. 21 No. 2, pp. 201-218.
Clements, M.P. and Hendry, D.F. (1996), “Multi-step estimation for forecasting”, Oxford Bulletin of Economics and Statistics, Vol. 58 No. 4, pp. 657-684.
Cogley, T. and Sargent, T.J. (2005), “Drifts and volatilities: monetary policies and outcomes in the post WWII US”, Review of Economic Dynamics, Vol. 8 No. 2, pp. 262-302.
Doornik, J.A. (2013), Object-Oriented Matrix Programming Using Ox, Timberlake Consultants Press, London.
George, E.I. and McCulloch, R.E. (1997), “Approaches for Bayesian variable selection”, Statistica Sinica, Vol. 7, pp. 339-373.
George, E.I., Sun, D. and Ni, S. (2008), “Bayesian stochastic search for VAR model restrictions”, Journal of Econometrics, Vol. 142 No. 1, pp. 553-580.
Ing, C.-k. (2003), “Multistep prediction in autoregressive processes”, Econometric Theory, Vol. 19 No. 2, pp. 254-279.
Jochmann, M., Koop, G. and Strachan, R.W. (2010), “Bayesian forecasting using stochastic search variable selection in a VAR subject to breaks”, International Journal of Forecasting, Vol. 26 No. 2, pp. 326-347, doi: 10.1016/j.ijforecast.2009.11.002.
Jochmann, M., Koop, G., León-González, R. and Strachan, R.W. (2013), “Stochastic search variable selection in vector error correction models with an application to a model of the UK macroeconomy”, Journal of Applied Econometrics, Vol. 28 No. 4, pp. 62-81.
Kang, I.B. (2003), “Multi-period forecasting using different models for different horizons: an application to US economic time series data”, International Journal of Forecasting, Vol. 19 No. 3, pp. 387-400.
Koop, G., Leon-Gonzalez, R. and Strachan, R.W. (2009), “On the evolution of the monetary policy transmission mechanism”, Journal of Economic Dynamics and Control, Vol. 33 No. 4, pp. 997-1017.
Litterman, R.B. (1986), “Forecasting with Bayesian vector autoregressions: five years of experience”, Journal of Business & Economic Statistics, Vol. 4 No. 1, pp. 25-38.
Marcellino, M., Stock, J.H. and Watson, M.W. (2006), “A comparison of direct and iterated multistep AR methods for forecasting macroeconomic time series”, Journal of Econometrics, Vol. 135 No. 1, pp. 499-526.
Pesaran, M.H., Pick, A. and Timmermann, A. (2011), “Variable selection, estimation and inference for multi-period forecasting problems”, Journal of Econometrics, Vol. 164 No. 1, pp. 173-187.
Primiceri, G.E. (2005), “Time varying structural vector autoregressions and monetary policy”, Review of Economic Studies, Vol. 72 No. 3, pp. 821-852.
Sugita, K. (2018), Evaluation of forecasting performance using Bayesian stochastic search variable selection in a vector autoregression, Ryukyu Economics Working Paper #1, University of the Ryukyus.
Sugita, K. (2019a), Forecasting with vector autoregressions by Bayesian model averaging, Ryukyu Economics Working Paper #3, University of the Ryukyus.
Sugita, K. (2019b), Forecasting with vector autoregressions using Bayesian variable selection methods: comparison of direct and iterated methods, Ryukyu Economics Working Paper #2, University of the Ryukyus.
This work was supported by JSPS KAKENHI grant number 20K01591.