Search results

1 – 10 of 483
Book part
Publication date: 23 November 2011

Gayaneh Kyureghian, Oral Capps and Rodolfo M. Nayga

Abstract

The objective of this research is to examine, validate, and recommend techniques for handling the problem of missingness in observational data. We use a rich observational data set, the Nielsen HomeScan data set, which lets us combine the advantages of simulated data (large numbers of observations, data sets, and variables, and the elements of “design” that typically come with simulated data) with its observational nature. We introduced uniformly random missingness at rates of 20% and 50% in our data sets and employed several widely used methods of single imputation, such as mean, regression, and stochastic regression imputation, as well as multiple imputation methods, to fill in the data gaps. We compared these methods by measuring the error in predicting the missing values and the error in the parameter estimates from the subsequent regression analysis using the imputed values. We also compared coverage, that is, the percentage of intervals that covered the true parameter, in both cases. Based on our results, the method of single regression or conditional mean imputation provided the best predictions of the missing price values, with mean absolute percent errors of 28.34 and 28.59 in the 20% and 50% missingness settings, respectively. The imputation from conditional distribution method had the best rate of coverage. The parameter estimates based on data sets imputed by the conditional mean method were consistently unbiased and had the smallest standard deviations. The multiple imputation methods had the best coverage of both the parameter estimates and predictions of the dependent variable.
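The comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' code and does not use the Nielsen HomeScan data: the data are simulated, and the names `ols_fit`, `mape`, and `compare_imputations`, the linear price model, and its coefficients are all assumptions made for illustration. The sketch hides a uniformly random share of the values, fills them in by mean imputation and by regression (conditional mean) imputation, and scores each by mean absolute percent error.

```python
import random


def ols_fit(xs, ys):
    """Return (intercept, slope) of a simple least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def mape(true_vals, imputed_vals):
    """Mean absolute percent error of the imputed values."""
    total = sum(abs(t - p) / abs(t) for t, p in zip(true_vals, imputed_vals))
    return 100.0 * total / len(true_vals)


def compare_imputations(n=2000, missing_rate=0.2, seed=0):
    """Return (MAPE of mean imputation, MAPE of regression imputation)."""
    rng = random.Random(seed)
    # Hypothetical data: price depends linearly on one covariate plus noise.
    x = [rng.uniform(1.0, 10.0) for _ in range(n)]
    price = [5.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

    # Uniform missingness: every price is equally likely to be hidden.
    missing = set(rng.sample(range(n), int(missing_rate * n)))
    observed = [i for i in range(n) if i not in missing]
    hidden = sorted(missing)
    true_vals = [price[i] for i in hidden]

    # Mean imputation: every gap gets the mean of the observed prices.
    mean_obs = sum(price[i] for i in observed) / len(observed)
    mean_imputed = [mean_obs] * len(hidden)

    # Regression imputation: every gap gets its fitted conditional mean.
    a, b = ols_fit([x[i] for i in observed], [price[i] for i in observed])
    reg_imputed = [a + b * x[i] for i in hidden]

    return mape(true_vals, mean_imputed), mape(true_vals, reg_imputed)


if __name__ == "__main__":
    for rate in (0.2, 0.5):
        mean_err, reg_err = compare_imputations(missing_rate=rate)
        print(f"missing={rate:.0%}  mean MAPE={mean_err:.2f}  regression MAPE={reg_err:.2f}")
```

Because the simulated covariate explains most of the variation in price, regression imputation scores a lower error here. The size of the gap depends entirely on the assumed model and says nothing about the figures reported in the chapter.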

Details

Missing Data Methods: Cross-sectional Methods and Applications
Type: Book
ISBN: 978-1-78052-525-9

Book part
Publication date: 10 April 2019

Shu Yang and Jae Kwang Kim

Abstract

Nearest neighbor imputation has a long tradition for handling item nonresponse in survey sampling. In this article, we study the asymptotic properties of the nearest neighbor imputation estimator for general population parameters, including population means, proportions, and quantiles. For variance estimation, we propose a novel replication variance estimator, which is asymptotically valid and straightforward to implement. The main idea is to construct replicates of the estimator directly from its asymptotically linear terms, instead of from individual records of variables. The simulation results show that nearest neighbor imputation and the proposed variance estimation provide valid inferences for general population parameters.
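As a rough illustration of the imputation step only, the sketch below fills each missing response with the response of the respondent whose covariate value is closest, then estimates the mean from the completed data. The function name `nn_impute`, the single covariate, and the toy values are assumptions for illustration. The replication variance estimator proposed in the chapter is not reproduced here, and survey weights are ignored.

```python
def nn_impute(x, y):
    """Fill each missing y (None) with the y of the nearest respondent in x."""
    donors = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    completed = []
    for xi, yi in zip(x, y):
        if yi is None:
            # Donor = respondent with the smallest covariate distance.
            _, yi = min(donors, key=lambda d: abs(d[0] - xi))
        completed.append(yi)
    return completed


# Hypothetical sample: covariate x is always observed, y has item nonresponse.
x = [1.0, 2.2, 3.0, 4.4, 5.0]
y = [10.0, None, 30.0, None, 50.0]

completed = nn_impute(x, y)
estimated_mean = sum(completed) / len(completed)
print(completed, estimated_mean)
```

With several covariates, the absolute difference would be replaced by a distance over the covariate vector; which distance to use is a design choice this sketch leaves open.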

Details

The Econometrics of Complex Survey Data
Type: Book
ISBN: 978-1-78756-726-9

Abstract

Details

Transport Survey Quality and Innovation
Type: Book
ISBN: 978-0-08-044096-5

Book part
Publication date: 10 July 2006

Craig Enders, Samantha Dietz, Marjorie Montague and Jennifer Dixon

Abstract

Missing data are a pervasive problem in special education research. The purpose of this chapter is to provide researchers with an overview of two “modern” alternatives for handling missing data, full information maximum likelihood (FIML) and multiple imputation (MI). These techniques are currently considered to be the methodological “state of the art”, and generally provide more accurate parameter estimates than the traditional methods that are still common in published educational studies. The chapter begins with an overview of missing data theory, and provides brief descriptions of some traditional missing data techniques and their requisite assumptions. Detailed descriptions of FIML and MI are given, and the chapter concludes with an analytic example from a longitudinal study of depression.

Details

Applications of Research Methodology
Type: Book
ISBN: 978-0-76231-295-5

Book part
Publication date: 10 April 2019

Gustavo J. Canavire-Bacarreza, Alexander L. Lundberg and Alejandra Montoya-Agudelo

Abstract

In 2014, the Colombian Government commissioned a unique national survey on illegal liquor. Interviewers purchased bottles of liquor from interviewees and tested them for authenticity in a laboratory. Two factors predict whether liquor is contraband (smuggled): (1) the absence of a receipt and (2) the presence of a discount offered by the seller. Neither factor predicts whether a bottle is adulterated. The results back a story in which sellers are complicit with a contraband economy, but whether buyers are complicit remains unclear. However, buyers are more likely to receive adulterated liquor when specifically asking for a discount.

Details

The Econometrics of Complex Survey Data
Type: Book
ISBN: 978-1-78756-726-9

Book part
Publication date: 29 September 2023

Torben Juul Andersen

Abstract

This chapter outlines how the comprehensive North American and European datasets were collected and explains the ensuing data cleaning process, outlining three alternative methods applied to deal with missing values: the complete case, multiple imputation (MI), and K-nearest neighbor (KNN) methods. The complete case method is the conventional approach adopted in many mainstream management studies. We further discuss the assumption implied by the use of this technique, which is rarely assessed or tested in practice, and explain the alternative imputation approaches in detail. Use of North American data is the norm, but we also collected a European dataset, which is rarely done, to enable subsequent comparative analysis between these geographical regions. We introduce the structure of firms organized within different industry classification schemes for use in the ensuing comparative analyses and provide base information on missing values in the original and cleaned datasets. The performance indicators calculated from the sampled data are defined and presented. We show how the three alternative approaches to dealing with missing values have significantly different effects on the calculated performance measures in terms of extreme estimate ranges and mean performance values.

Details

A Study of Risky Business Outcomes: Adapting to Strategic Disruption
Type: Book
ISBN: 978-1-83797-074-2

Book part
Publication date: 30 November 2011

Wensheng Kang

Abstract

A linear interpolation (Lerp) approach, utilizing a common stochastic trend, is explored to impute missing values in nonstationary panel data models. The Lerp algorithm is considerably faster and easier to use than the leading methods recommended in the statistics literature. A set of simulations shows that the Lerp works well, whereas other existing methods fail to perform properly, when the panel data contain a high degree of missingness and/or a strong correlation across cross-sectional units. As an illustration, the method is applied to a cost-of-living-index dataset with missing values. The test on the imputed panel data provides supporting evidence for U.S. economic convergence that depends on the physical spatial proximity of states and the similarity of their industrial development.
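For readers unfamiliar with linear interpolation as an imputation device, the sketch below shows the plain version for a single series. It is an assumption-laden simplification: the chapter's method also uses a common stochastic trend across panel units, which is not modeled here, and the function name `lerp_fill` is hypothetical.

```python
def lerp_fill(series):
    """Fill interior gaps (None) by linear interpolation between known neighbors.

    Gaps before the first or after the last known value are left as None,
    because there is no second endpoint to interpolate toward.
    """
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


# Hypothetical index values for one panel unit, with gaps.
print(lerp_fill([100.0, None, 104.0, None, None, 110.0, None]))
```

Applying `lerp_fill` to each cross-sectional unit separately treats the units as unrelated, which is exactly the simplification that a common-trend approach is meant to avoid.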

Details

Missing Data Methods: Time-Series Methods and Applications
Type: Book
ISBN: 978-1-78052-526-6

Book part
Publication date: 29 September 2023

Torben Juul Andersen

Abstract

This chapter first analyzes how the data-cleaning process affects the share of missing values in the extracted European and North American datasets. It then examines how three different approaches to treating missing values, Complete Case, Multiple Imputation by Chained Equations (MICE), and K-Nearest Neighbor (KNN) imputation, affect the number of firms and their average lifespan in the datasets compared to the original sample, assessed across different SIC industry divisions. The analysis is extended to the implied effects on the distribution of a key performance indicator, return on assets (ROA), by calculating skewness and kurtosis measures for each of the treatment methods and across industry contexts. The results consistently show highly negatively skewed distributions with high positive excess kurtosis across all the industries, and the KNN imputation treatment produces distribution characteristics that are closest to the original untreated data. We further analyze the persistence of the (extreme) left-skewed tails, measured as the share of outliers and extreme outliers. This shows consistent and rather high percentages, with outliers around 15% of the full sample and extreme outliers around 7.5%, indicating pervasive skewness in the data. Of the three alternative approaches to dealing with missing values, the KNN imputation treatment is found to generate final datasets that most closely resemble the original data, even though the Complete Case approach remains the norm in mainstream studies. One consequence of this is that most empirical studies are likely to underestimate the prevalence of extreme negative performance outcomes.
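The skewness and excess kurtosis measures mentioned in this abstract can be computed from central moments, as in the minimal sketch below. The function names and the toy ROA values are assumptions for illustration, and the population (biased) moment formulas are used; the chapter may rely on sample-corrected versions.

```python
def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Third standardized moment; negative means a longer left tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth standardized moment minus 3, the value for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Hypothetical ROA figures (percent): nine ordinary firms and one severe loss.
roa = [-20.0] + [1.0] * 9
print(round(skewness(roa), 3), round(excess_kurtosis(roa), 3))
```

A single large loss is enough to make the skewness strongly negative and the excess kurtosis positive, which is the pattern the abstract describes for the ROA distributions.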

Details

A Study of Risky Business Outcomes: Adapting to Strategic Disruption
Type: Book
ISBN: 978-1-83797-074-2
