Missing Data Methods: Cross-sectional Methods and Applications: Volume 27 Part 1


Table of contents

(18 chapters)

List of Contributors


“The Elephant in the Corner: A Cautionary Tale About Measurement Error in Treatment Effects Models” by Daniel L. Millimet discusses the current use of the unobserved-outcome framework to estimate population-averaged treatment effects, and it exposes the sensitivity of these estimators to the assumption of no measurement error. The Monte Carlo simulation evidence in this chapter indicates that “nonclassical measurement error in the covariates, mean-reverting measurement error in the outcome, and simultaneous measurement errors in the outcome, treatment assignment, and covariates have a dramatic, adverse effect on the performance of the various estimators even with relatively small and infrequent errors” (Millimet article, p. 1–39). To some extent, all the estimators analyzed by Millimet are based on weak functional form assumptions and use semiparametric or nonparametric methods. Millimet's results indicate the need for measurement error models, whether parametric or nonparametric; see Schennach (2007), Hu and Schennach (2008), and Matzkin (2007) for some recent research on nonparametric approaches. Chapter 7 develops a Bayesian estimator that can handle some of the measurement errors discussed in this chapter.

Researchers in economics and other disciplines are often interested in the causal effect of a binary treatment on outcomes. Econometric methods used to estimate such effects fall into one of two strands depending on whether they require unconfoundedness (i.e., independence of potential outcomes and treatment assignment conditional on a set of observable covariates). When this assumption holds, researchers now have a wide array of estimation techniques from which to choose. However, very little is known about their performance – both in absolute and relative terms – when measurement error is present. In this study, the performance of several estimators that require unconfoundedness, as well as some that do not, is evaluated in a Monte Carlo study. In all cases, the data-generating process is such that unconfoundedness holds with the ‘real’ data. However, measurement error is then introduced. Specifically, three types of measurement error are considered: (i) errors in treatment assignment, (ii) errors in the outcome, and (iii) errors in the vector of covariates. Recommendations for researchers are provided.
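As a rough illustration of the first error type (errors in treatment assignment), the following minimal sketch simulates a randomized binary treatment and then misreports it at random. It is not the chapter's Monte Carlo design: the function names, the sample size, the effect size of 2.0, and the 10% flip probability are all assumptions chosen for illustration, and the estimator is a plain difference in means rather than any of the estimators the chapter evaluates.

```python
import random
import statistics


def diff_in_means(outcomes, treatment):
    """Difference in mean outcome between units flagged treated and untreated."""
    treated = [y for y, d in zip(outcomes, treatment) if d == 1]
    control = [y for y, d in zip(outcomes, treatment) if d == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


def simulate(n=20000, effect=2.0, flip_prob=0.1, seed=0):
    """Return (estimate using true treatment, estimate using misreported treatment).

    All parameter values are illustrative assumptions, not taken from the chapter.
    """
    rng = random.Random(seed)
    true_d, reported_d, outcomes = [], [], []
    for _ in range(n):
        d = 1 if rng.random() < 0.5 else 0           # randomized, so unconfoundedness holds
        y = effect * d + rng.gauss(0.0, 1.0)         # outcome measured without error
        r = 1 - d if rng.random() < flip_prob else d  # treatment misreported at random
        true_d.append(d)
        reported_d.append(r)
        outcomes.append(y)
    return diff_in_means(outcomes, true_d), diff_in_means(outcomes, reported_d)


if __name__ == "__main__":
    clean, noisy = simulate()
    print(f"estimate with true treatment:     {clean:.3f}")
    print(f"estimate with reported treatment: {noisy:.3f}")
```

In this particular setup (half the units treated, symmetric flips), the estimate based on reported treatment is attenuated toward zero by a factor of about 1 - 2 * flip_prob, so even a 10% misreporting rate removes roughly a fifth of the estimated effect.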

This chapter reviews recent developments in the estimation of panel data models in which some variables are only partially observed. Specifically, we consider the issues of censoring, sample selection, attrition, missing data, and measurement error in panel data models. Although most of these issues, except attrition, occur in cross-sectional or time series data as well, panel data models introduce some particular challenges due to the presence of persistent individual effects. The past two decades have seen many stimulating developments in the econometric and statistical methods dealing with these problems. This review focuses on two strands of the rapidly growing literature on semiparametric and nonparametric methods for panel data models: (i) estimation of panel models with discrete or limited dependent variables and (ii) estimation of panel models based on nonparametric deconvolution methods.

Standard stratified sampling (SSS) is a popular non-random sampling scheme. The maximum likelihood estimator (MLE) is inconsistent if some sampled strata depend on the response variable Y (‘endogenous samples’) or if some Y-dependent strata are not sampled at all (‘truncated sample’ – a missing data problem). Various versions of MLE have appeared in the literature, and this paper reviews practical likelihood-based estimators for endogenous or truncated samples in SSS. In addition, a new estimator, ‘Estimated-EX MLE’, is introduced using an extra random sample on X (not on Y) to estimate the distribution EX of X. As information on Y may be hard to get, this estimator's data demand is weaker than that of other estimators requiring an extra random sample on Y. The estimator can greatly improve the efficiency of ‘Fixed-X MLE’, which conditions on X, even if the extra sample size is small. In fact, Estimated-EX MLE does not estimate the full FX, as it needs only a sample average using the extra sample. Estimated-EX MLE can be almost as efficient as the ‘Known-FX MLE’. A small-scale simulation study is provided to illustrate these points.

This chapter studies the large sample properties of a subclassification-based estimator of the dose–response function under ignorability. Employing standard regularity conditions, it is shown that the estimator is root-n consistent, asymptotically linear, and semiparametric efficient in large samples. A consistent estimator of the standard error is also developed under the same assumptions. In a Monte Carlo experiment, we investigate the finite sample performance of this simple and intuitive estimator and compare it to others commonly employed in the literature.

This chapter proposes a simple procedure to estimate average derivatives in nonparametric regression models with incomplete responses. The method consists of replacing the responses with an appropriately weighted version and then using local polynomial estimation for the average derivatives. The resulting estimator is shown to be asymptotically normal, and an estimator of its asymptotic variance–covariance matrix is also shown to be consistent. Monte Carlo experiments show that the proposed estimator has desirable finite sample properties.
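The idea of replacing incomplete responses with a weighted version can be sketched as follows. The abstract does not spell out the weights, so this example assumes inverse-probability weighting with a known observation probability; the chapter's actual weighting scheme may differ. The linear data-generating process, the function names, and the use of a global least-squares slope in place of local polynomial estimation are likewise simplifying assumptions.

```python
import random
import statistics


def observe_prob(x):
    """Assumed (known) probability that the response is observed given x."""
    return 0.2 + 0.6 * x


def simulate(n=50000, seed=1):
    """Return covariates, weighted responses for all units, and complete-case responses."""
    rng = random.Random(seed)
    xs, weighted, complete = [], [], []
    for _ in range(n):
        x = rng.random()                          # covariate, uniform on [0, 1)
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 0.5)   # true derivative with respect to x is 2
        p = observe_prob(x)
        observed = rng.random() < p               # response missing at random given x
        xs.append(x)
        # Weighted response: y / p when observed, 0 when missing.
        weighted.append(y / p if observed else 0.0)
        if observed:
            complete.append(y)
    return xs, weighted, complete


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (a global stand-in for local polynomial fitting)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    xs, weighted, complete = simulate()
    print(f"complete-case mean of y:    {statistics.fmean(complete):.3f}")
    print(f"weighted-response mean:     {statistics.fmean(weighted):.3f}")
    print(f"slope on weighted response: {ols_slope(xs, weighted):.3f}")
```

Because the weighted response has the same conditional mean given x as the fully observed response, both its average and its slope in x target the full-sample quantities, whereas the complete-case mean is pulled toward units that are more likely to be observed.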

Observations in a dataset are rarely missing at random. One can control for this non-random selection of the data by introducing fixed effects or other nuisance parameters. This chapter deals with consistent estimation in the presence of many nuisance parameters. It derives a new orthogonality concept that gives sufficient conditions for consistent estimation of the parameters of interest. It also shows how this orthogonality concept can be used to derive and compare estimators. The chapter then shows how to use the orthogonality concept to derive estimators for unbalanced panels and incomplete data sets (missing data).

This chapter presents a Bayesian analysis of the endogenous treatment model with misclassified treatment participation. Our estimation procedure utilizes a combination of data augmentation, Gibbs sampling, and Metropolis–Hastings to obtain estimates of the misclassification probabilities and the treatment effect. Simulations demonstrate that the proposed Bayesian estimator accurately estimates the treatment effect in light of misclassification and endogeneity.

A common approach to dealing with missing data is to estimate the model on the common subset of data, by necessity throwing away potentially useful data. We derive a new probit-type estimator for models with missing covariate data where the dependent variable is binary. For the benchmark case of conditional multinormality, we show that our estimator is efficient and provide exact formulae for its asymptotic variance. Simulation results show that our estimator outperforms popular alternatives and is robust to departures from the parametric assumptions adopted in the benchmark case. We illustrate our estimator by examining the portfolio allocation decision of Italian households.

This chapter uses the nonlinear difference-in-difference (NL-DID) methodology developed by Athey and Imbens (2006) to estimate the effects of a treatment program on the entire distribution of an outcome variable. The NL-DID estimator recovers the entire counterfactual distribution of the outcome variable that would have occurred in the absence of treatment. This chapter extends the Monte Carlo results in Athey and Imbens (2006) to assess the efficacy of the NL-DID estimators in finite samples. Further, we consider the empirical size and power of test statistics for equality of means, medians, and complete distributions, as suggested by Abadie (2002). The results show that, in finite samples, the NL-DID estimator can effectively be used to recover the average treatment effect, as well as the entire distribution of the treatment effects, when there is no selection during the treatment period.
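A minimal sketch of how the counterfactual distribution can be built with empirical distribution functions is given below. It follows the changes-in-changes construction in Athey and Imbens (2006), in which each pre-period outcome of the treated group is mapped to the post-period control outcome at the same rank within the control group. The simulated design (normal outcomes, a common time shift of 1, an effect of 2) and all function names are assumptions for illustration, not the chapter's Monte Carlo setup. The subscripts follow the convention group (0 = control, 1 = treated) then period (0 = pre, 1 = post).

```python
import bisect
import math
import random
import statistics


def ecdf(sorted_sample, y):
    """Empirical CDF: share of the sample that is <= y."""
    return bisect.bisect_right(sorted_sample, y) / len(sorted_sample)


def quantile(sorted_sample, q):
    """Generalized inverse of the empirical CDF, clamped to the sample range."""
    n = len(sorted_sample)
    idx = min(n - 1, max(0, math.ceil(q * n) - 1))
    return sorted_sample[idx]


def cic_average_effect(y00, y01, y10, y11):
    """Mean of treated post-period outcomes minus mean of the estimated counterfactual."""
    s00, s01 = sorted(y00), sorted(y01)
    # Rank each treated pre-period outcome in the control pre-period distribution,
    # then read off the control post-period outcome at that rank.
    counterfactual = [quantile(s01, ecdf(s00, y)) for y in y10]
    return statistics.fmean(y11) - statistics.fmean(counterfactual)


def simulate_groups(n=5000, effect=2.0, seed=2):
    """Illustrative design: untreated outcomes shift by +1 over time in both groups."""
    rng = random.Random(seed)
    y00 = [rng.gauss(0.0, 1.0) for _ in range(n)]           # control, pre
    y01 = [rng.gauss(1.0, 1.0) for _ in range(n)]           # control, post
    y10 = [rng.gauss(0.5, 1.0) for _ in range(n)]           # treated, pre
    y11 = [rng.gauss(1.5, 1.0) + effect for _ in range(n)]  # treated, post
    return y00, y01, y10, y11


if __name__ == "__main__":
    estimate = cic_average_effect(*simulate_groups())
    print(f"estimated average effect on the treated: {estimate:.3f}")
```

The list `counterfactual` is itself a sample from the estimated no-treatment distribution, so quantile or distributional comparisons with `y11` can be computed from it in the same way as the mean.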

We consider the Bayes estimation of a multivariate sample selection model with p pairs of selection and outcome variables. Each of the variables may be discrete or continuous with a parametric marginal distribution, and their dependence structure is modeled through a Gaussian copula function. Markov chain Monte Carlo methods are used to simulate from the posterior distribution of interest. The methods are illustrated in a simulation study and an application from transportation economics.

In this chapter, we consider the nonparametric estimation of the average treatment effect (ATE) based on direct estimation of the conditional treatment effect. We establish the asymptotic distribution of the proposed ATE estimator. We also consider consistent testing for a parametric functional form for the conditional treatment effect function. A small-scale Monte Carlo simulation study is reported to examine the finite sample performance of the proposed estimator.
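The idea of averaging a directly estimated conditional treatment effect can be sketched in a deliberately simplified setting. The example below assumes a single discrete covariate, so the conditional effect is just a difference of cell means; the chapter's nonparametric estimator for general covariates is not reproduced here. The data-generating process and all function names are hypothetical.

```python
import random
import statistics


def simulate(n=30000, seed=3):
    """Rows of (x, d, y) with treatment probability and effect both depending on x."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.choice([0, 1, 2])
        d = 1 if rng.random() < 0.2 + 0.3 * x else 0          # selection on observables
        y = 3.0 * x + (1.0 + x) * d + rng.gauss(0.0, 1.0)     # conditional effect is 1 + x
        data.append((x, d, y))
    return data


def conditional_effects(data):
    """Estimate the treatment effect separately within each covariate cell."""
    effects = {}
    for x in sorted({row[0] for row in data}):
        treated = [y for xi, d, y in data if xi == x and d == 1]
        control = [y for xi, d, y in data if xi == x and d == 0]
        effects[x] = statistics.fmean(treated) - statistics.fmean(control)
    return effects


def ate_from_conditional(data):
    """Average the estimated conditional effect over the sample distribution of x."""
    effects = conditional_effects(data)
    return statistics.fmean([effects[x] for x, _, _ in data])


def naive_difference(data):
    """Raw treated-minus-control difference, ignoring the covariate."""
    treated = [y for _, d, y in data if d == 1]
    control = [y for _, d, y in data if d == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


if __name__ == "__main__":
    data = simulate()
    print("conditional effects:", conditional_effects(data))
    print(f"ATE from conditional effects: {ate_from_conditional(data):.3f}")
    print(f"naive difference in means:    {naive_difference(data):.3f}")
```

In this design the true ATE is 2 (the conditional effect 1 + x averaged over x uniform on {0, 1, 2}), while the naive difference is inflated because units with larger x are both more likely to be treated and have higher outcomes.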

The objective of this research is to examine, validate, and recommend techniques for handling the problem of missingness in observational data. We use a rich observational data set, the Nielsen HomeScan data set, which allows us to combine elements typical of simulated data sets (large numbers of observations, data sets, and variables, and elements of “design”) with its observational nature. We created random 20% and 50% uniform missingness in our data sets and employed several widely used methods of single imputation, such as mean, regression, and stochastic regression imputations, and multiple imputation methods to fill in the data gaps. We compared these methods by measuring the error of predicting the missing values and the parameter estimates from the subsequent regression analysis using the imputed values. We also compared coverage, that is, the percentage of intervals that covered the true parameter, in both cases. Based on our results, the method of single regression or conditional mean imputation provided the best predictions of the missing price values, with mean absolute percent errors of 28.34 and 28.59 in the 20% and 50% missingness settings, respectively. The imputation from conditional distribution method had the best rate of coverage. The parameter estimates based on data sets imputed by the conditional mean method were consistently unbiased and had the smallest standard deviations. The multiple imputation methods had the best coverage of both the parameter estimates and predictions of the dependent variable.
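The comparison of imputation methods by prediction error can be sketched on synthetic data. The example below does not use the Nielsen HomeScan data and does not attempt to reproduce the reported error figures; the linear price model, the single predictor, and all function names are assumptions made for illustration. It creates 20% uniform missingness, fills the gaps by mean imputation and by regression (conditional mean) imputation, and scores each by mean absolute percent error.

```python
import random
import statistics


def simulate(n=5000, missing_rate=0.2, seed=4):
    """Rows of (x, price, missing) with prices deleted uniformly at random."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(1.0, 10.0)
        price = 5.0 + 2.0 * x + rng.gauss(0.0, 1.0)   # hypothetical price model
        missing = rng.random() < missing_rate
        rows.append((x, price, missing))
    return rows


def fit_line(xs, ys):
    """Least-squares intercept and slope of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope


def mape(truth, predicted):
    """Mean absolute percent error of the imputed values."""
    return 100.0 * statistics.fmean([abs(t - p) / abs(t) for t, p in zip(truth, predicted)])


def compare(rows):
    """Return (MAPE of mean imputation, MAPE of regression imputation)."""
    observed = [(x, p) for x, p, m in rows if not m]
    missing = [(x, p) for x, p, m in rows if m]
    truth = [p for _, p in missing]
    # Mean imputation: every gap gets the average observed price.
    mean_price = statistics.fmean([p for _, p in observed])
    mean_imputed = [mean_price] * len(missing)
    # Regression imputation: every gap gets its fitted conditional mean.
    intercept, slope = fit_line([x for x, _ in observed], [p for _, p in observed])
    reg_imputed = [intercept + slope * x for x, _ in missing]
    return mape(truth, mean_imputed), mape(truth, reg_imputed)


if __name__ == "__main__":
    mean_err, reg_err = compare(simulate())
    print(f"mean imputation MAPE:       {mean_err:.2f}%")
    print(f"regression imputation MAPE: {reg_err:.2f}%")
```

Stochastic regression imputation would add a random residual draw to each fitted value; that typically worsens point prediction but preserves the variability of the imputed variable, which is relevant to the coverage comparisons described above.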

Book series
Advances in Econometrics
Series copyright holder
Emerald Publishing Limited