Search results: 1 – 10 of over 26,000
We extend Vuong’s (1989) model-selection statistic to allow for complex survey samples. As a further extension, we use an M-estimation setting so that the tests apply to general estimation problems – such as linear and nonlinear least squares, Poisson regression and fractional response models, to name just a few – and not only to maximum likelihood settings. With stratified sampling, we show how the difference in objective functions should be weighted in order to obtain a suitable test statistic. Interestingly, the weights are needed in computing the model-selection statistic even in cases where stratification is appropriately exogenous, in which case the usual unweighted estimators for the parameters are consistent. With cluster samples and panel data, we show how to combine the weighted objective function with a cluster-robust variance estimator in order to expand the scope of the model-selection tests. A small simulation study shows that the weighted test is promising.
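The weighting idea in the abstract above can be illustrated with a minimal Python sketch. It is not the paper's exact statistic: it assumes independent observations (no clustering), per-observation objective-function values that have already been computed for two fitted models, and sampling weights supplied by the user (for example, inverse selection probabilities). The function name `weighted_vuong_statistic` and its arguments are hypothetical.

```python
import math


def weighted_vuong_statistic(objective_a, objective_b, weights):
    """Vuong-type z statistic built from weighted objective differences.

    objective_a, objective_b: per-observation objective values (e.g.
        log-likelihood contributions) for two competing models.
    weights: sampling weights (assumed here to be inverse selection
        probabilities; any common rescaling leaves the statistic unchanged).
    """
    n = len(weights)
    if not (len(objective_a) == len(objective_b) == n) or n < 2:
        raise ValueError("inputs must have the same length, at least 2")
    # Weighted per-observation difference in objective functions.
    diffs = [w * (a - b) for a, b, w in zip(objective_a, objective_b, weights)]
    mean_diff = sum(diffs) / n
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / n
    if var_diff == 0.0:
        raise ValueError("weighted differences have zero variance")
    # Under the null of equal fit this is compared with a standard normal.
    return math.sqrt(n) * mean_diff / math.sqrt(var_diff)


# Usage: positive values favour model A, negative values favour model B.
z = weighted_vuong_statistic(
    [-1.0, -2.0, -1.5, -0.5],
    [-1.2, -1.9, -2.0, -0.9],
    [1.0, 1.0, 1.0, 1.0],
)
```

For cluster samples or panel data, the variance term would need to be replaced by a cluster-robust estimate, which this sketch does not attempt.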
This paper uses a sample of school-age children from the Nepal Demographic Health Survey (NDHS) to examine the relationship between maternal education and child schooling in Nepal. Taking advantage of the two-stage stratified sample design, we estimate a sample selection model controlling for cluster fixed effects. These results are then compared to OLS and Tobit models. Our analysis shows that being male significantly increases the likelihood of attending school and, for those children attending school, also affects the years of schooling. Parental education has a similarly positive effect on child schooling, but interestingly we find that maternal education has a relatively greater effect on the schooling of girls. Our results also point to household wealth as having a positive effect on both the probability of schooling and the years of schooling in all our models, with the magnitude of these effects being similar for male and female children. Finally, a model that ignores cluster fixed effects produces results that are statistically different from ours, both in sign and in level of significance.
Existing studies show that the accuracy of sampling methods in web accessibility evaluation depends heavily on the evaluation metric. The purpose of this paper is therefore to propose a sampling method, OPS-WAQM, optimized for the Web Accessibility Quantitative Metric (WAQM). Furthermore, to support quick accessibility evaluation and real-time website accessibility monitoring, the authors also provide an online extension of the sampling method.
In the OPS-WAQM method, the authors propose a minimal sampling error model for WAQM and use a greedy algorithm to approximately solve the resulting optimization problem, which determines the number of samples to draw from each layer. To make OPS-WAQM online, the authors adopt a sampling-in-crawling strategy.
The sampling method OPS-WAQM and its online extension can both achieve good sampling quality by choosing the optimal number of samples in each layer. Moreover, the online extension can also support quick accessibility evaluation by sampling and evaluating pages as they are crawled.
To the best of the authors’ knowledge, the sampling method OPS-WAQM in this paper is the first attempt to optimize a sampling method for a specific evaluation metric. Meanwhile, the online extension not only greatly reduces the serious I/O issues in existing web accessibility evaluation, but also supports quick web accessibility evaluation by sampling during crawling.
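The greedy choice of per-layer sample numbers described in the abstract above can be sketched as follows. The abstract does not give the WAQM sampling error model, so this Python sketch substitutes a generic stratified-variance objective, the sum over layers of (layer share)² × (layer variance) / (samples in layer), purely as an assumption. The function name `greedy_allocation` and its parameters are hypothetical.

```python
def greedy_allocation(layer_sizes, layer_variances, budget):
    """Greedily split a sample budget across layers.

    ASSUMPTION: the error being minimised is the textbook stratified
    variance sum_h W_h^2 * S_h^2 / n_h, not the paper's WAQM-specific model.
    layer_sizes: number of pages in each layer (each at least 1).
    layer_variances: assumed within-layer variance of the page-level score.
    budget: total number of pages to sample.
    """
    layers = len(layer_sizes)
    if budget < layers:
        raise ValueError("budget must allow at least one sample per layer")
    total = sum(layer_sizes)
    coeff = [(size / total) ** 2 * var
             for size, var in zip(layer_sizes, layer_variances)]
    alloc = [1] * layers  # start with one sample in every layer

    def gain(h):
        # Error reduction from adding one more sample to layer h.
        if alloc[h] >= layer_sizes[h]:
            return -1.0  # layer exhausted, cannot sample more pages
        return coeff[h] / alloc[h] - coeff[h] / (alloc[h] + 1)

    for _ in range(budget - layers):
        best = max(range(layers), key=gain)
        if gain(best) < 0:
            break  # every layer is fully sampled
        alloc[best] += 1
    return alloc


# Usage: two equally sized layers, the first with a higher assumed variance.
allocation = greedy_allocation([100, 100], [4.0, 1.0], budget=9)
```

Each step gives the next sample to the layer where it reduces the assumed error most, so the result approximates the optimum without solving the allocation problem exactly.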
This paper aims to address the volume, veracity and velocity problems of big data analytics for prediction by raising prediction quality to an acceptable level in the shortest possible time.
This paper is divided into two parts. The first part improves the prediction result through two ideas: a double pruning enhanced random forest algorithm, and a shared learning base extracted by stratified random sampling so that the learning base is representative of all the original data. The second part designs a distributed architecture, supported by new technology solutions, that works coherently and efficiently with the sampling strategy under the supervision of the Map-Reduce algorithm.
The representative learning base, obtained by integrating two learning bases (the partial base and the shared base), represents the original data set very well and gives very good results for big data predictive analytics. These results were further supported by the improved random forest supervised learning method, which played a key role in this context.
The work concerns all companies, especially those that hold large amounts of information and want to screen it to improve their customer knowledge and optimize their campaigns.
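The stratified random sampling step mentioned in the abstract above can be illustrated with a small Python sketch. It shows plain proportional stratified sampling only. How the paper builds and merges its shared and partial learning bases is not described here, so the function name `stratified_sample`, the `stratum_of` callback and the `fraction` parameter are all assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(records, stratum_of, fraction, seed=0):
    """Draw the same fraction of records from every stratum.

    records: any list of items.
    stratum_of: function mapping a record to its stratum label
        (for a classifier this would typically be the class label).
    fraction: share of each stratum to keep, between 0 and 1.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)

    sample = []
    for members in strata.values():
        # Keep at least one record so small strata are never dropped.
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample


# Usage: an 80/20 class split is preserved in a 10% sample.
data = [("a", i) for i in range(80)] + [("b", i) for i in range(20)]
subset = stratified_sample(data, stratum_of=lambda r: r[0], fraction=0.1)
```

Sampling within each stratum keeps the class proportions of the original data in the reduced learning base, which is the sense of "representative" assumed in this sketch.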
Sampling units for the 2013 Methods-of-Payment survey were selected through an approximate stratified two-stage sampling design. To compensate for nonresponse and noncoverage and ensure consistency with external population counts, the observations are weighted through a raking procedure. We apply bootstrap resampling methods to estimate the variance, allowing for randomness from both the sampling design and the raking procedure. We find that the variance is smaller when estimated through the bootstrap resampling method than through the naive linearization method, which does not take into account the correlation between the variables used for weighting and the outcome variable of interest.
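The combination of raking and bootstrap variance estimation described in the abstract above can be sketched in Python. This is a simplified illustration, not the survey's procedure: the bootstrap resamples units with replacement as if they came from a simple random sample, whereas a design-respecting bootstrap would resample within strata and primary sampling units. The names `rake` and `bootstrap_variance_of_mean`, the dict-based record layout and the margin format are all hypothetical.

```python
import random


def rake(units, base_weights, margins, iterations=50):
    """Adjust weights so weighted category totals match known margins.

    units: list of dicts holding each unit's category values.
    base_weights: positive starting weights, one per unit.
    margins: {variable: {category: population_total}}.
    """
    weights = list(base_weights)
    for _ in range(iterations):
        for variable, targets in margins.items():
            totals = {}
            for unit, w in zip(units, weights):
                totals[unit[variable]] = totals.get(unit[variable], 0.0) + w
            # Scale each unit so its category total hits the target.
            weights = [w * targets[unit[variable]] / totals[unit[variable]]
                       for unit, w in zip(units, weights)]
    return weights


def bootstrap_variance_of_mean(units, base_weights, margins, outcome,
                               replicates=200, seed=0):
    """Bootstrap variance of a raked weighted mean.

    SIMPLIFICATION: units are resampled as a simple random sample.
    Raking is repeated inside every replicate so that its randomness
    is reflected in the variance estimate.
    """
    rng = random.Random(seed)
    n = len(units)
    estimates = []
    for _ in range(replicates):
        picks = [rng.randrange(n) for _ in range(n)]
        sample = [units[i] for i in picks]
        w = rake(sample, [base_weights[i] for i in picks], margins)
        total = sum(wi * u[outcome] for wi, u in zip(w, sample))
        estimates.append(total / sum(w))
    centre = sum(estimates) / replicates
    return sum((e - centre) ** 2 for e in estimates) / (replicates - 1)
```

Re-running the raking step inside each replicate is what lets the bootstrap pick up the correlation between the weighting variables and the outcome, which a linearization that treats the weights as fixed would miss.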