Topics in Identification, Limited Dependent Variables, Partial Observability, Experimentation, and Flexible Modeling: Part B: Volume 40B


Table of contents

(12 chapters)
Abstract

We consider a semiparametric panel stochastic frontier model where one-sided firm effects representing inefficiencies are correlated with the regressors. A form of the Chamberlain-Mundlak device is used to relate the logarithm of the effects to the regressors, resulting in a lognormal distribution for the effects. The function describing the technology is modeled nonparametrically using penalized splines. Both Bayesian and non-Bayesian approaches to estimation are considered, with an emphasis on Bayesian estimation. A Monte Carlo experiment is used to investigate the consequences of ignoring correlation between the effects and the regressors, and of choosing the wrong functional form for the technology.
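The device described above is easy to sketch in simulation. The snippet below generates data from a simplified linear frontier (the chapter's technology is nonparametric, and every parameter value here is a hypothetical illustration, not the chapter's specification), showing how a Chamberlain-Mundlak equation on the log of the one-sided effects induces lognormal inefficiencies that are correlated with the regressors.

```python
import numpy as np

rng = np.random.default_rng(3)
n, T = 200, 5  # firms and time periods (illustrative sizes)

x = rng.normal(size=(n, T))
x_bar = x.mean(axis=1)  # within-firm mean of the regressor

# Chamberlain-Mundlak device on the log of the one-sided effect:
# log(u_i) = delta0 + delta1 * x_bar_i + w_i, with w_i ~ N(0, tau^2),
# so u_i is lognormal and correlated with the regressors.
delta0, delta1, tau = -1.0, 0.5, 0.3
u = np.exp(delta0 + delta1 * x_bar + rng.normal(0.0, tau, n))

# Production frontier (linear here for the sketch; the chapter
# models the technology nonparametrically with penalized splines).
beta, sigma_v = 1.0, 0.2
y = beta * x - u[:, None] + rng.normal(0.0, sigma_v, (n, T))
```

By construction the effects are strictly positive and positively correlated with `x_bar`, which is exactly the feature whose neglect the chapter's Monte Carlo experiment investigates.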

Abstract

This chapter examines the effect of the division of labor from a Bayesian viewpoint. While organizational reforms are crucial for cost reduction in the Japanese water supply industry, the effect of labor division in intra-organizational units on total costs has, to the best of our knowledge, not been examined empirically. Fortunately, a one-time survey of 79 Japanese water suppliers conducted in 2010 enables us to examine this effect. To do so, a stochastic frontier cost model with endogenous regressors is considered in a cross-sectional setting, because cost and the division of labor are regarded as simultaneously determined. From the empirical analysis, we obtain the following results: (1) total costs rise when the level of labor division becomes high; (2) ignoring the endogeneity leads to underestimation of the impact of labor division on total costs; and (3) the estimation bias on inefficiency can be mitigated for relatively efficient organizations by including the labor division variable in the model, while the bias for relatively inefficient organizations needs to be controlled by accounting for its endogeneity. In summary, our results indicate that integration of internal sections is better than specialization in terms of costs for Japanese water supply organizations.

Abstract

We present a new procedure for nonparametric Bayesian estimation of regression functions. Specifically, our method makes use of an idea described in Frühwirth-Schnatter and Wagner (2010) to impose linearity exactly (conditional upon an unobserved binary indicator), yet also permits departures from linearity while imposing smoothness of the regression curves. An advantage of this approach is that the posterior probability of linearity is produced essentially as a by-product of the procedure. We apply our methods in generated-data experiments as well as in an illustrative application involving the impact of body mass index (BMI) on labor market earnings.

Abstract

In this chapter we consider the “Regularization of Derivative Expectation Operator” (Rodeo) of Lafferty and Wasserman (2008) and propose a modified Rodeo algorithm for semiparametric single index models (SIMs) in a big data environment with many regressors. The method assumes sparsity, in the sense that many of the regressors are irrelevant. It is a greedy algorithm: to estimate the semiparametric SIM of Ichimura (1993), all regressor coefficients are initially set near zero, and we then test iteratively whether the derivative of the regression function estimator with respect to each coefficient is significantly different from zero. The basic idea of the modified Rodeo algorithm for SIMs (called SIM-Rodeo) is to view local bandwidth selection as a variable selection scheme that amplifies the coefficients of relevant variables while keeping the coefficients of irrelevant variables small or at their initial values near zero. For sparse semiparametric SIMs, the SIM-Rodeo algorithm is shown to attain consistency in variable selection, and it completes its greedy steps quickly. We compare SIM-Rodeo with the SIM-Lasso method of Zeng et al. (2012). Our simulation results demonstrate that the proposed SIM-Rodeo method is consistent for variable selection and has smaller integrated mean squared errors (IMSE) than SIM-Lasso.

Abstract

Bayesian additive regression trees (BART) is a fully Bayesian approach to modeling with ensembles of trees. BART can uncover complex regression functions with high-dimensional regressors in a fairly automatic way and provide Bayesian quantification of the uncertainty through the posterior. However, BART assumes independent and identically distributed (i.i.d.) normal errors. This strong parametric assumption can lead to misleading inference and uncertainty quantification. In this chapter we use the classic Dirichlet process mixture (DPM) mechanism to model the error distribution nonparametrically. A key strength of BART is that its default prior settings work reasonably well in a variety of problems. The challenge in extending BART is to choose the parameters of the DPM so that the strengths of the standard BART approach are not lost when the errors are close to normal, while the DPM retains the ability to adapt to non-normal errors.
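The DPM mechanism the abstract refers to can be illustrated with a truncated stick-breaking draw from a Dirichlet process mixture of normals for the error density. This is a generic sketch with hypothetical hyperparameters (`alpha`, the truncation level `K`, and the base-measure distributions are illustrative choices, not the chapter's defaults).

```python
import numpy as np

rng = np.random.default_rng(4)

# Truncated stick-breaking construction of a DP mixture of normals:
# G = sum_k w_k * N(mu_k, sigma_k^2), with weights from broken sticks.
alpha, K = 1.0, 50                 # concentration and truncation level
v = rng.beta(1.0, alpha, K)        # stick-breaking proportions
w = v * np.concatenate(([1.0], np.cumprod(1 - v)[:-1]))  # mixture weights

mu = rng.normal(0.0, 1.0, K)       # base-measure draws for locations
sigma = rng.gamma(2.0, 0.5, K)     # base-measure draws for scales

# Sample errors: pick a component, then draw from that normal.
k = rng.choice(K, size=1000, p=w / w.sum())
eps = rng.normal(mu[k], sigma[k])
```

With a large truncation level the leftover stick mass is negligible, so the draw closely approximates an exact DP sample; near-normal errors correspond to the mass concentrating on a few similar components.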

Abstract

Bayesian A/B inference (BABI) is a method that combines subjective prior information with data from A/B experiments to provide inference for lift – the difference in a measure of response between treatment and control, expressed as a ratio to the measure of response in control. The procedure is embedded in stable code that can be executed in a few seconds per experiment, regardless of sample size, and caters to the objectives and technical background of the owners of experiments. BABI provides more powerful tests of hypotheses about the impact of treatment on lift, and sharper conclusions about the value of lift, than do legacy conventional methods. In an application to 21 large online experiments, the credible interval is 60% to 65% shorter than the conventional confidence interval in the median case, and close to 100% shorter in a significant proportion of cases; in rare cases, BABI credible intervals are longer than conventional confidence intervals, and then by no more than about 10%.
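The lift definition above can be made concrete with a minimal Bayesian A/B sketch for binary responses. The counts and the flat Beta(1, 1) priors below are hypothetical stand-ins; the chapter's BABI procedure uses subjective priors and its own implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical experiment counts (conversions, sample sizes).
control_conv, control_n = 480, 10_000
treat_conv, treat_n = 540, 10_000

# Beta(1, 1) priors on the conversion rates; posterior draws.
p_c = rng.beta(1 + control_conv, 1 + control_n - control_conv, 100_000)
p_t = rng.beta(1 + treat_conv, 1 + treat_n - treat_conv, 100_000)

# Lift: difference in response between treatment and control,
# expressed as a ratio to the control response.
lift = (p_t - p_c) / p_c

lo, hi = np.percentile(lift, [2.5, 97.5])
print(f"posterior mean lift: {lift.mean():.3f}")
print(f"95% credible interval: [{lo:.3f}, {hi:.3f}]")
```

The resulting credible interval for lift is the object the abstract compares against legacy confidence intervals.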

Abstract

Heavy-tailed distributions present a challenging setting for inference. They are also common in industrial applications, particularly in internet transaction datasets, and machine learners often analyze such data without considering the biases and risks associated with the misuse of standard tools. This chapter outlines a procedure for inference about the mean of a (possibly conditional) heavy-tailed distribution that combines nonparametric analysis for the bulk of the support with Bayesian parametric modeling – motivated from extreme value theory – for the heavy tail. The procedure is fast and massively scalable. The work should find application in settings where correct inference is important and reward tails are heavy; we illustrate the framework with causal inference for A/B experiments involving hundreds of millions of users of eBay.com.
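The bulk-plus-tail idea can be sketched in a few lines: treat observations below a high threshold nonparametrically, fit a generalized Pareto distribution (GPD) to the exceedances as extreme value theory suggests, and combine the two pieces into a mean estimate. Everything below – the synthetic Pareto data, the 95% threshold, and the plain maximum-likelihood fit in place of the chapter's Bayesian tail model – is an illustrative assumption.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(1)
x = rng.pareto(2.5, 50_000)       # synthetic heavy-tailed data

u = np.quantile(x, 0.95)          # tail threshold (illustrative choice)
bulk = x[x <= u]
exceed = x[x > u] - u             # peaks over the threshold

# Parametric tail: fit a GPD to the exceedances (location fixed at 0).
xi, _, sigma = genpareto.fit(exceed, floc=0)

p_tail = (x > u).mean()
tail_mean = u + sigma / (1 - xi)  # GPD mean above u (requires xi < 1)

# Semiparametric estimate: nonparametric bulk + parametric tail.
mean_hat = (1 - p_tail) * bulk.mean() + p_tail * tail_mean
```

The tail component stabilizes the estimate precisely where the sample mean is most fragile; the chapter's procedure replaces the point fit with a Bayesian posterior over the tail parameters.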

Abstract

This chapter develops a framework for quantile regression in binary longitudinal data settings. A novel Markov chain Monte Carlo (MCMC) method is designed to fit the model and its computational efficiency is demonstrated in a simulation study. The proposed approach is flexible in that it can account for common and individual-specific parameters, as well as multivariate heterogeneity associated with several covariates. The methodology is applied to study female labor force participation and home ownership in the United States. The results offer new insights at the various quantiles, which are of interest to policymakers and researchers alike.

Abstract

Stochastic volatility models are of great importance in the field of mathematical finance, especially for accurately explaining the dynamics of financial derivatives. A quantile-based estimator for the location parameter of a stochastic volatility model is obtained by solving an optimization problem. In this chapter, the asymptotic distribution of the estimator is derived without assuming that the density function of the noise is positive around the corresponding population quantile. We also discuss a Bayesian approach to the quantile estimation problem and establish a result regarding the nature of the posterior distribution.
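The quantile-as-optimization idea underlying such estimators is that the τ-th sample quantile minimizes the Koenker-Bassett check loss. A minimal sketch on synthetic heavy-tailed data (not the chapter's stochastic volatility model, whose estimator and asymptotics are more involved):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def check_loss(theta, y, tau):
    # Koenker-Bassett check function: rho_tau(u) = u * (tau - 1{u < 0}).
    u = y - theta
    return np.sum(u * (tau - (u < 0)))

rng = np.random.default_rng(2)
y = rng.standard_t(df=3, size=5_000)  # heavy-tailed synthetic noise

tau = 0.5
res = minimize_scalar(check_loss, args=(y, tau),
                      bounds=(y.min(), y.max()), method="bounded")
theta_hat = res.x  # numerically matches the sample tau-quantile
```

Because the check loss is piecewise linear and convex, its minimizer coincides with the sample quantile, which is the starting point for the quantile-based location estimator the chapter studies.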

Abstract

This article is motivated by the lack of flexibility in Bayesian quantile regression for ordinal models where the error follows an asymmetric Laplace (AL) distribution. The inflexibility arises because the skewness of the distribution is completely specified once a quantile is chosen. To overcome this shortcoming, we derive the cumulative distribution function (and the moment-generating function) of the generalized asymmetric Laplace (GAL) distribution – a generalization of the AL distribution that separates the skewness from the quantile parameter – and construct a working likelihood for the ordinal quantile model. The resulting framework is termed flexible Bayesian quantile regression for ordinal (FBQROR) models. However, its estimation is not straightforward. We address the estimation issues and propose an efficient Markov chain Monte Carlo (MCMC) procedure based on Gibbs sampling and a joint Metropolis–Hastings algorithm. The advantages of the proposed model are demonstrated in multiple simulation studies, and the framework is implemented to analyze public opinion on homeownership as the best long-term investment in the United States following the Great Recession.
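The inflexibility of the AL distribution is visible directly from its CDF: the quantile parameter p pins down both the p-th quantile and the skewness at once. A small sketch using the standard parameterization common in Bayesian quantile regression (stated here as an assumption; the chapter's contribution is the CDF of the more general GAL distribution, which decouples the two roles):

```python
import numpy as np

def al_cdf(y, mu=0.0, sigma=1.0, p=0.5):
    # CDF of the asymmetric Laplace distribution with density
    # f(y) = p(1-p)/sigma * exp(-rho_p((y - mu)/sigma)).
    y = np.asarray(y, dtype=float)
    below = p * np.exp((1 - p) * (y - mu) / sigma)
    above = 1 - (1 - p) * np.exp(-p * (y - mu) / sigma)
    return np.where(y <= mu, below, above)

# Choosing the quantile p forces F(mu) = p, so mu is always the
# p-th quantile and the skewness is no longer a free parameter:
for p in (0.1, 0.5, 0.9):
    print(p, al_cdf(0.0, p=p))  # F(mu) equals p in each case
```

In the GAL distribution an extra shape parameter breaks this tie, which is what allows the FBQROR framework to model skewness and the quantile separately.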

DOI
10.1108/S0731-9053201940B
Publication date
2019-10-18
Book series
Advances in Econometrics
Editors
Series copyright holder
Emerald Publishing Limited
ISBN
978-1-83867-420-5
eISBN
978-1-83867-419-9
Book series ISSN
0731-9053