Search results
1 – 10 of 108
Abstract
Sampling units for the 2013 Methods-of-Payment survey were selected through an approximate stratified two-stage sampling design. To compensate for nonresponse and noncoverage and to ensure consistency with external population counts, the observations are weighted through a raking procedure. We apply bootstrap resampling methods to estimate the variance, allowing for randomness from both the sampling design and the raking procedure. We find that the bootstrap variance estimate is smaller than that of the naive linearization method, which does not take into account the correlation between the variables used for weighting and the outcome variable of interest.
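The resampling idea described above can be sketched as follows. Everything here is illustrative (the data, weights, and number of replicates are not from the paper), and the paper's full procedure additionally re-runs the raking step on every bootstrap replicate, which this sketch omits:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical survey data: outcome y and calibration weights w
# (purely illustrative, not the survey's actual data).
n = 500
y = rng.normal(50.0, 10.0, size=n)
w = rng.uniform(0.5, 2.0, size=n)

def weighted_mean(y, w):
    return np.sum(w * y) / np.sum(w)

theta_hat = weighted_mean(y, w)

# Bootstrap: resample units with replacement, recompute the weighted
# estimate on each replicate, and take the variance across replicates.
B = 1000
reps = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)
    reps[b] = weighted_mean(y[idx], w[idx])

boot_var = reps.var(ddof=1)
print(round(theta_hat, 2), round(boot_var, 3))
```

Because each replicate recomputes the whole estimator, the bootstrap variance automatically reflects any dependence between the weights and the outcome, which is exactly what the naive linearization method ignores.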
Son Nguyen, Phyllis Schumacher, Alan Olinsky and John Quinn
Abstract
We study the performance of various predictive models, including decision trees, random forests, neural networks, and linear discriminant analysis, on an imbalanced data set of home loan applications. During the process, we propose an undersampling algorithm to cope with the issues created by the imbalance of the data. Our technique is shown to be competitive with popular resampling techniques such as random oversampling, random undersampling, the synthetic minority oversampling technique (SMOTE), and random oversampling examples (ROSE). We also investigate the relation between the true-positive rate, the true-negative rate, and the degree of imbalance in the data.
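Of the techniques named above, SMOTE is the easiest to illustrate: synthetic minority points are interpolated between a minority observation and one of its nearest minority neighbours. A minimal, illustrative version (not the full algorithm from the cited literature, which also handles categorical features and per-point generation counts):

```python
import numpy as np

rng = np.random.default_rng(5)

def smote_like(X_min, n_new, k=3):
    # SMOTE-style synthetic minority points: pick a minority point,
    # pick one of its k nearest minority neighbours, and interpolate
    # a random fraction of the way between them.
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nbrs = np.argsort(d)[1 : k + 1]   # skip the point itself
        j = rng.choice(nbrs)
        lam = rng.random()
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)

X_min = rng.normal(size=(10, 2))          # toy minority class
synth = smote_like(X_min, n_new=20)
print(synth.shape)
```

Because each synthetic point lies on a segment between two real minority points, the method fills in the minority region rather than duplicating observations, which is what distinguishes it from plain random oversampling.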
Son Nguyen, Gao Niu, John Quinn, Alan Olinsky, Jonathan Ormsbee, Richard M. Smith and James Bishop
Abstract
In recent years, the problem of classification with imbalanced data has been growing in popularity in the data-mining and machine-learning communities due to the emergence of an abundance of imbalanced data in many fields. In this chapter, we compare the performance of six classification methods on an imbalanced dataset under the influence of four resampling techniques. These classification methods are the random forest, the support vector machine, logistic regression, k-nearest neighbors (KNN), the decision tree, and AdaBoost. Our study has shown that all of the classification methods have difficulty with the imbalanced data, with KNN performing the worst, detecting only 27.4% of the minority class. However, with the help of resampling techniques, all of the classification methods improve in overall performance. In particular, the random forest, in combination with random oversampling, performs the best, achieving 82.8% balanced accuracy (the average of the true-positive rate and the true-negative rate).
We then propose a new procedure to resample the data. Our method is based on the idea of eliminating “easy” majority observations before undersampling. It further improves the balanced accuracy of the random forest to 83.7%, making it the best approach for the imbalanced data.
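The two building blocks quoted above, the balanced-accuracy metric and random undersampling, can be sketched as follows. The chapter's proposed step of eliminating "easy" majority observations first is not reproduced here; the function names and toy data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def balanced_accuracy(y_true, y_pred):
    # Average of the true-positive and true-negative rates,
    # the metric quoted in the abstract.
    tpr = np.mean(y_pred[y_true == 1] == 1)
    tnr = np.mean(y_pred[y_true == 0] == 0)
    return (tpr + tnr) / 2

def random_undersample(X, y):
    # Keep every minority (y == 1) row plus an equally sized random
    # subset of the majority (y == 0) rows.
    minority = np.where(y == 1)[0]
    majority = np.where(y == 0)[0]
    keep = rng.choice(majority, size=len(minority), replace=False)
    idx = np.concatenate([minority, keep])
    return X[idx], y[idx]

# Toy imbalanced data: 90 majority rows, 10 minority rows.
y = np.array([0] * 90 + [1] * 10)
X = y.reshape(-1, 1).astype(float)

Xb, yb = random_undersample(X, y)
print(len(yb), yb.mean())   # balanced sample: half minority
```

On the original data, a classifier that always predicts the majority class scores 95% plain accuracy but only 50% balanced accuracy, which is why the chapter reports the latter.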
Badi H. Baltagi, Georges Bresson, Anoop Chaturvedi and Guy Lacroix
Abstract
This chapter extends the work of Baltagi, Bresson, Chaturvedi, and Lacroix (2018) to the popular dynamic panel data model. The authors investigate the robustness of Bayesian panel data models to possible misspecification of the prior distribution. The proposed robust Bayesian approach departs from the standard Bayesian framework in two ways. First, the authors consider the ε-contamination class of prior distributions for the model parameters as well as for the individual effects. Second, both the base elicited priors and the ε-contamination priors use Zellner’s (1986) g-priors for the variance–covariance matrices. The authors propose a general “toolbox” for a wide range of specifications which includes the dynamic panel model with random effects, with cross-correlated effects à la Chamberlain, for the Hausman–Taylor world and for dynamic panel data models with homogeneous/heterogeneous slopes and cross-sectional dependence. Using a Monte Carlo simulation study, the authors compare the finite sample properties of the proposed estimator to those of standard classical estimators. The chapter contributes to the dynamic panel data literature by proposing a general robust Bayesian framework which encompasses the conventional frequentist specifications and their associated estimation methods as special cases.
Alicia T. Lamere, Son Nguyen, Gao Niu, Alan Olinsky and John Quinn
Abstract
Predicting a patient's length of stay (LOS) in a hospital setting has been widely researched. Accurately predicting an individual's LOS can have a significant impact on a healthcare provider's ability to care for individuals by allowing them to properly prepare and manage resources. A hospital's productivity requires a delicate balance of maintaining enough staffing and resources without being overly equipped or wasteful. This has become even more important in light of the current COVID-19 pandemic, during which emergency departments around the globe have been inundated with patients and are struggling to manage their resources.
In this study, the authors focus on predicting LOS at the time of admission in emergency departments at Rhode Island hospitals, using discharge data obtained from the Rhode Island Department of Health for the 2012–2013 period. This work also explores the distribution of discharge dispositions in an effort to better characterize the resources patients require upon leaving the emergency department.
Nii Ayi Armah and Norman R. Swanson
Abstract
In this chapter we discuss model selection and predictive accuracy tests in the context of parameter and model uncertainty under recursive and rolling estimation schemes. We begin by summarizing some recent theoretical findings, with particular emphasis on the construction of valid bootstrap procedures for calculating the impact of parameter estimation error. We then discuss the Corradi and Swanson (2002) (CS) test of (non)linear out-of-sample Granger causality. Thereafter, we carry out a series of Monte Carlo experiments examining the properties of the CS and a variety of other related predictive accuracy and model selection type tests. Finally, we present the results of an empirical investigation of the marginal predictive content of money for income, in the spirit of Stock and Watson (1989), Swanson (1998) and Amato and Swanson (2001).
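The recursive and rolling estimation schemes referred to above differ only in how the estimation window moves as forecasts are made. A minimal sketch of the forecasting loop on a simulated AR(1) series (illustrative, not the money-income data; the CS test statistic and its bootstrap are not reproduced):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated AR(1) series.
T = 400
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.6 * y[t - 1] + rng.normal()

def ar1_forecast(window):
    # One-step-ahead forecast from an AR(1) fitted by OLS (no intercept).
    x, z = window[:-1], window[1:]
    phi = np.dot(x, z) / np.dot(x, x)
    return phi * window[-1]

R = 200   # initial estimation sample
err_rec, err_roll = [], []
for t in range(R, T - 1):
    # Recursive scheme: expanding window from the start of the sample.
    err_rec.append(y[t + 1] - ar1_forecast(y[: t + 1]))
    # Rolling scheme: fixed-length window of the most recent R points.
    err_roll.append(y[t + 1] - ar1_forecast(y[t + 1 - R : t + 1]))

mse_rec = np.mean(np.square(err_rec))
mse_roll = np.mean(np.square(err_roll))
print(round(mse_rec, 3), round(mse_roll, 3))
```

Predictive accuracy tests of the kind discussed in the chapter compare such sequences of out-of-sample errors across competing models, which is why the parameter estimation error accumulated inside each window must be accounted for.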
Otávio Bartalotti, Gray Calhoun and Yang He
Abstract
This chapter develops a novel bootstrap procedure to obtain robust bias-corrected confidence intervals in regression discontinuity (RD) designs. The procedure uses a wild bootstrap from a second-order local polynomial to estimate the bias of the local linear RD estimator; the bias is then subtracted from the original estimator. The bias-corrected estimator is then bootstrapped itself to generate valid confidence intervals (CIs). The CIs generated by this procedure are valid under conditions similar to Calonico, Cattaneo, and Titiunik’s (2014) analytical correction – that is, when the bias of the naive RD estimator would otherwise prevent valid inference. This chapter also provides simulation evidence that the method is as accurate as the analytical corrections, and demonstrates its use through a reanalysis of Ludwig and Miller’s (2007) Head Start dataset.
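A minimal sketch of the bias-correction idea, under strong simplifications: it uses global side-by-side polynomial fits rather than the chapter's local (kernel-weighted) estimators, and only the first bootstrap layer (bias estimation), not the second layer that builds the CIs. The simulated design and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def rd_jump(x, y, degree):
    # Separate polynomial fits on each side of the cutoff (x = 0);
    # the RD estimate is the jump in fitted values at the cutoff.
    left = x < 0
    pl = np.polyfit(x[left], y[left], degree)
    pr = np.polyfit(x[~left], y[~left], degree)
    return np.polyval(pr, 0.0) - np.polyval(pl, 0.0)

# Simulated data: curved conditional mean with a true jump of 2 at
# x = 0, so the linear fit at the cutoff is biased.
n = 2000
x = rng.uniform(-1, 1, n)
y = 2.0 * (x >= 0) + x + x**3 + rng.normal(0, 0.3, n)

tau_hat = rd_jump(x, y, degree=1)   # naive linear estimate

# Estimate the bias by wild bootstrap from the quadratic fit:
# regenerate y with sign-flipped (Rademacher) residuals, re-estimate
# the linear jump each draw, and average its deviation from the
# quadratic jump.
left = x < 0
fit2 = np.empty(n)
fit2[left] = np.polyval(np.polyfit(x[left], y[left], 2), x[left])
fit2[~left] = np.polyval(np.polyfit(x[~left], y[~left], 2), x[~left])
resid = y - fit2
tau2 = rd_jump(x, y, degree=2)
B = 200
draws = np.empty(B)
for b in range(B):
    ystar = fit2 + resid * rng.choice([-1.0, 1.0], size=n)
    draws[b] = rd_jump(x, ystar, degree=1)

tau_bc = tau_hat - (draws.mean() - tau2)   # bias-corrected estimate
print(round(tau_hat, 2), round(tau_bc, 2))
```

Sign-flipping the residuals preserves their conditional variance pattern, which is why the wild bootstrap is the natural choice here under heteroskedasticity.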
R. Carter Hill, Lee C. Adkins and Keith A. Bender
Abstract
The Heckman two-step estimator (Heckit) for the selectivity model is widely applied in economics and other social sciences. In this model, a non-zero outcome variable is observed only if a latent variable is positive. The asymptotic covariance matrix for a two-step estimation procedure must account for the estimation error introduced in the first stage. We examine the finite-sample size of tests based on alternative covariance matrix estimators, using Monte Carlo experiments to evaluate bootstrap-generated critical values and critical values based on asymptotic theory.
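The two-step logic can be sketched on simulated data. The data-generating process and all names are illustrative, and the second-stage standard errors, whose correction is the abstract's main concern, are left uncorrected here:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(7)

# Simulated selectivity model: y is observed only when the latent
# index 0.5 + z + u is positive, and the errors u and e are
# correlated, so OLS on the selected sample alone is biased.
n = 5000
z = rng.normal(size=n)
u = rng.normal(size=n)
e = 0.5 * u + rng.normal(scale=0.8, size=n)
observed = 0.5 + z + u > 0
y = 1.0 + 2.0 * z + e

# Step 1: probit of the selection indicator on z, by maximum likelihood.
def negll(b):
    p = norm.cdf(b[0] + b[1] * z).clip(1e-10, 1 - 1e-10)
    return -np.sum(np.where(observed, np.log(p), np.log(1 - p)))

g = minimize(negll, x0=np.zeros(2)).x
xb = g[0] + g[1] * z

# Step 2: OLS of y on z and the inverse Mills ratio, selected sample only.
imr = norm.pdf(xb) / norm.cdf(xb)
X = np.column_stack([np.ones(n), z, imr])[observed]
beta, *_ = np.linalg.lstsq(X, y[observed], rcond=None)
print(np.round(beta, 2))   # [intercept, slope on z, Mills coefficient]
```

The Mills-ratio coefficient absorbs the selection effect, but because the ratio is built from the estimated first-stage coefficients, naive OLS standard errors in the second stage understate the uncertainty, which is exactly the issue the abstract's covariance-matrix estimators address.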