Search results

1 – 10 of over 1000
Book part
Publication date: 23 November 2011

Gayaneh Kyureghian, Oral Capps and Rodolfo M. Nayga

Abstract

The objective of this research is to examine, validate, and recommend techniques for handling the problem of missingness in observational data. We use a rich observational data set, the Nielsen HomeScan data set, which allows us to combine elements typical of simulated data sets (large numbers of observations, data sets, and variables, and the elements of “design” that typically come with simulated data) with its observational nature. We created random 20% and 50% uniform missingness in our data sets and employed several widely used single imputation methods, such as mean, regression, and stochastic regression imputation, as well as multiple imputation methods, to fill in the data gaps. We compared these methods by measuring the error in predicting the missing values and in the parameter estimates from the subsequent regression analysis using the imputed values. We also compared coverage, that is, the percentage of intervals that covered the true parameter, in both cases. Based on our results, single regression (conditional mean) imputation provided the best predictions of the missing price values, with mean absolute percent errors of 28.34 and 28.59 in the 20% and 50% missingness settings, respectively. The imputation from conditional distribution method had the best rate of coverage. The parameter estimates based on data sets imputed by the conditional mean method were consistently unbiased and had the smallest standard deviations. The multiple imputation methods had the best coverage of both the parameter estimates and predictions of the dependent variable.
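The single imputation methods compared in this abstract can be illustrated on simulated data. The following is a minimal Python sketch, not the authors’ code: the covariate `x`, the linear “price” model, the sample size and the seed are all hypothetical assumptions. It deletes cells uniformly at random, fills them by mean, regression (conditional mean) and stochastic regression imputation, and scores each method with the mean absolute percent error (MAPE).

```python
import random
import statistics


def fit_line(xs, ys):
    """OLS fit of y = a + b*x; returns intercept, slope and residual SD."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    resid_sd = statistics.pstdev([y - (a + b * x) for x, y in zip(xs, ys)])
    return a, b, resid_sd


def mape(true, pred):
    """Mean absolute percent error, in percent."""
    return 100 * statistics.fmean(abs(t - p) / abs(t) for t, p in zip(true, pred))


def compare_single_imputation(n=500, miss_rate=0.2, seed=1):
    rng = random.Random(seed)
    # Hypothetical data: a fully observed covariate x and a positive "price" y.
    x = [rng.uniform(1, 10) for _ in range(n)]
    y = [5 + 2 * xi + rng.gauss(0, 1) for xi in x]
    # Uniform missingness: every cell is equally likely to be deleted.
    missing = set(rng.sample(range(n), int(miss_rate * n)))
    obs = [i for i in range(n) if i not in missing]
    miss_idx = sorted(missing)
    a, b, sd = fit_line([x[i] for i in obs], [y[i] for i in obs])
    y_mean = statistics.fmean(y[i] for i in obs)

    true = [y[i] for i in miss_idx]
    preds = {
        "mean": [y_mean for _ in miss_idx],
        "regression": [a + b * x[i] for i in miss_idx],
        # Stochastic regression adds a residual draw to the conditional mean.
        "stochastic": [a + b * x[i] + rng.gauss(0, sd) for i in miss_idx],
    }
    return {name: mape(true, p) for name, p in preds.items()}
```

Calling `compare_single_imputation(miss_rate=0.5)` repeats the comparison at 50% missingness. On toy data like this, regression imputation should beat stochastic regression on MAPE, because the added residual draw increases point-prediction error; the stochastic draw is there to preserve the variance of the imputed variable, not to predict individual cells.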

Details

Missing Data Methods: Cross-sectional Methods and Applications
Type: Book
ISBN: 978-1-78052-525-9

Article
Publication date: 24 August 2018

Jewoo Kim and Jongho Im

Abstract

Purpose

The purpose of this paper is to introduce a new multiple imputation method that can effectively manage missing values in online review data, thereby allowing the online review analysis to yield valid results by using all available data.

Design/methodology/approach

This study develops a missing data method based on multivariate imputation by chained equations (MICE) to generate imputed values for online reviews. Sentiment analysis is used to incorporate customers’ textual opinions as auxiliary information in the imputation procedures. To check the validity of the proposed imputation method, the authors apply it to missing values of sub-ratings on hotel attributes in both simulated and real Honolulu hotel review data sets. The estimation results are compared with those of two other missing data techniques, namely, listwise deletion and conventional multiple imputation, which does not consider text reviews.
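The chained-equations idea behind MICE can be sketched in a few dozen lines. This is a hypothetical, simplified illustration rather than the authors’ procedure: each partially observed column is regressed on the other columns plus a fully observed auxiliary score (standing in for a review’s sentiment score), and its missing cells are redrawn from the fitted regression plus residual noise, cycling through the columns several times. Full MICE implementations also draw the regression coefficients from their posterior; this sketch omits that step.

```python
import random
import statistics


def ols(rows, y):
    """Least-squares coefficients via the normal equations (rows include an intercept)."""
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [ak - f * ac for ak, ac in zip(A[k], A[c])]
                b[k] -= f * b[c]
    return [b[i] / A[i][i] for i in range(p)]


def predict(beta, row):
    """Linear predictor beta . row."""
    return sum(bk * xk for bk, xk in zip(beta, row))


def mice_once(data, aux, n_iter=10, seed=0):
    """One simplified chained-equations run.

    data: dict mapping column name -> list of floats, with None marking missing cells
    aux:  fully observed auxiliary column (hypothetically, a review sentiment score)
    Returns a dict of completed columns; observed cells are left unchanged.
    """
    rng = random.Random(seed)
    cols = list(data)
    n = len(aux)
    # Initialize every missing cell with its column's observed mean.
    filled = {}
    for c in cols:
        m = statistics.fmean(v for v in data[c] if v is not None)
        filled[c] = [m if v is None else v for v in data[c]]
    for _ in range(n_iter):
        for c in cols:
            others = [o for o in cols if o != c]
            # Predictors: intercept, auxiliary score, current values of the other columns.
            design = [[1.0, aux[i]] + [filled[o][i] for o in others] for i in range(n)]
            obs = [i for i in range(n) if data[c][i] is not None]
            beta = ols([design[i] for i in obs], [data[c][i] for i in obs])
            sd = statistics.pstdev([data[c][i] - predict(beta, design[i]) for i in obs])
            for i in range(n):
                if data[c][i] is None:
                    # Conditional mean plus a residual draw (stochastic regression step).
                    filled[c][i] = predict(beta, design[i]) + rng.gauss(0, sd)
    return filled
```

Running `mice_once` with several seeds, for example `[mice_once(data, aux, seed=k) for k in range(5)]`, yields the multiple completed data sets on which the analysis is then repeated and pooled.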

Findings

The findings from the simulation analysis show that the authors’ imputation method produces more efficient and less biased estimates than the other two missing data techniques when text reviews are possibly associated with the rating scores and the response mechanism. When the imputation method is applied to the real hotel review data, the findings show that the text sentiment-based propensity score can effectively explain the missingness of sub-ratings on hotel attributes, and that the imputation method considering those propensity scores yields better estimation results than the other techniques, as in the simulation analysis.

Originality/value

This study extends multiple imputation to online data, considering its spontaneous and unstructured nature. This new method helps make fuller use of the observed online data while avoiding potential missing-data problems.

Details

International Journal of Contemporary Hospitality Management, vol. 30 no. 11
Type: Research Article
ISSN: 0959-6119

Article
Publication date: 12 August 2021

Pooja Rani, Rajneesh Kumar and Anurag Jain

Abstract

Purpose

Decision support systems developed using machine learning classifiers have become a valuable tool in predicting various diseases. However, the performance of these systems is adversely affected by the missing values in medical datasets. Imputation methods are used to predict these missing values. In this paper, a new imputation method called hybrid imputation optimized by the classifier (HIOC) is proposed to predict missing values efficiently.

Design/methodology/approach

The proposed HIOC uses a classifier to combine multivariate imputation by chained equations (MICE), K nearest neighbor (KNN), mean and mode imputation methods in an optimal way. The performance of HIOC has been compared to that of MICE, KNN, and the mean and mode methods. Four classifiers were used to evaluate the performance of the imputation methods: support vector machine (SVM), naive Bayes (NB), random forest (RF) and decision tree (DT).
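HIOC’s base imputers are standard techniques. The following shows minimal, hypothetical pure-Python versions of three of them (mean, mode and KNN imputation); it does not reproduce HIOC’s classifier-driven combination step, whose details the abstract does not give. The KNN variant measures distance only over features that both rows observe, which is one common choice among several.

```python
import math
from collections import Counter


def mean_impute(col):
    """Replace None with the mean of the observed values (numeric columns)."""
    obs = [v for v in col if v is not None]
    m = sum(obs) / len(obs)
    return [m if v is None else v for v in col]


def mode_impute(col):
    """Replace None with the most frequent observed value (categorical columns)."""
    mode = Counter(v for v in col if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in col]


def knn_impute(rows, k=3):
    """Fill each None with the mean of that feature over the k nearest donor rows.

    Distance is the root-mean-square difference over features both rows observe.
    The input rows are not modified.
    """
    def dist(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    out = [list(r) for r in rows]
    for i, r in enumerate(rows):
        for j, v in enumerate(r):
            if v is not None:
                continue
            # Donors: other rows that observe feature j, sorted by distance to row i.
            donors = sorted(
                ((dist(r, o), o[j]) for t, o in enumerate(rows)
                 if t != i and o[j] is not None),
                key=lambda d: d[0],
            )
            if not donors:
                raise ValueError(f"no donor observes feature {j}")
            nearest = [val for _, val in donors[:k]]
            out[i][j] = sum(nearest) / len(nearest)
    return out
```

For example, `knn_impute([[1.0, 2.0], [1.1, None], [0.9, 2.2]], k=2)` fills the gap with the mean of the two nearest donors’ values, 2.0 and 2.2.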

Findings

The results show that HIOC performed efficiently even with a high rate of missing values. It reduced root mean square error (RMSE) by up to 17.32% in the heart disease dataset and 34.73% in the breast cancer dataset. Correct prediction of missing values improved the accuracy of the classifiers in predicting diseases, increasing classification accuracy by up to 18.61% in the heart disease dataset and 6.20% in the breast cancer dataset.

Originality/value

The proposed HIOC is a new hybrid imputation method that can efficiently predict missing values in any medical dataset.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 14 no. 4
Type: Research Article
ISSN: 1756-378X

Book part
Publication date: 6 September 2021

Line Ettrich and Torben Juul Andersen

Abstract

The world in which companies operate today is volatile, uncertain, complex, and ambiguous, thus subjecting contemporary firms to an array of risks that challenge their viability in an increasingly competitive landscape. Organizations that cling to their traditional ways of operating impede their ability to survive, while those able to embrace evolving changes and leverage their strategic response capabilities (SRCs) will thrive against the odds. The possession of such capabilities has become a prominent explanation for effective adaptation to impending changes but is rarely analyzed and tested empirically. Strategic adaptation typically assumes innovation as an important component, but we know little about how the innovative processes interact with the firm’s SRCs. Hence, this study investigates these implied relationships to discern their effects on organizational performance and risk outcomes. It explores the effects of SRCs and the role of innovation as intertwined adaptive mechanisms supporting strategic renewal that can attain superior performance and risk effects. The relationships are analyzed based on a large sample of US manufacturing firms over the decade 2010–2019. The study reveals that firms possessing effective SRCs have the ability to exploit opportunities and deflect risky situations to gain favorable performance and risk outcomes. While innovation indeed plays a role, the precise nature and dynamic effect thereof remain inconclusive.

Details

Strategic Responses for a Sustainable Future: New Research in International Management
Type: Book
ISBN: 978-1-80071-929-3

Article
Publication date: 9 May 2016

Sanna Sintonen, Anssi Tarkiainen, John W. Cadogan, Olli Kuivalainen, Nick Lee and Sanna Sundqvist

Abstract

Purpose

The purpose of this paper is to focus on the case where, by design, one needs to impute cross-country cross-survey (CCCS) data (a situation typical, for example, of multinational firms that need to carry out comparative marketing surveys with respondents located in several countries). Importantly, while some work demonstrates approaches for single-item direct measures, no prior research has examined the common situation in international marketing where the researcher needs to use multi-item scales of latent constructs. The paper presents problem areas related to the choices international marketers have to make when doing cross-country/cross-survey research and provides guidance for future research.

Design/methodology/approach

A multi-country sample of real international entrepreneurial orientation (IEO) data (292 New Zealand exporters and 302 Finnish ones) is used as an example of cross-sample imputation. Three variations of the input data are tested: first, imputation based on all the data available for the measurement model; second, imputation based on the set of items selected according to the invariance structure of the joint items shared across the two groups; and third, imputation based both on examination of the invariance structures of the joint items and on the performance of the measurement model in the group where the full data was originally available.

Findings

Based on distribution comparisons, imputation for New Zealand after completing the measurement model with Finnish data (Model C) gave the most promising results. Consequently, using knowledge of between-country measurement qualities may improve the imputation results, but this benefit comes at a cost, since it simultaneously reduces the amount of data used for imputation. However, none of the imputation models leads to the same statistical inferences about covariances between latent constructs as the original full data.

Research limitations/implications

Considering multiple imputation, the present exploratory study suggests that several concerns and issues should be taken into account when planning CCCSs (or split questionnaire or sub-sampling designs). Although well-implemented CCCS designs offer several advantages, such as shorter questionnaires and improved response rates, these concerns lead us to question the appropriateness of the CCCS approach in general, due to the need to impute across the samples.

Originality/value

The combination of cross-country and cross-survey approaches is novel to international marketing, and it is not known how the different procedures utilized in imputation affect the results and their validity and reliability. The authors demonstrate the consequences of the various imputation strategy choices using a real example of a two-country sample. The exploration may have significant implications for international marketing researchers, and the paper offers stimulus for further research in the area.

Details

International Marketing Review, vol. 33 no. 3
Type: Research Article
ISSN: 0265-1335

Article
Publication date: 2 April 2021

Tressy Thomas and Enayat Rajabi

Abstract

Purpose

The primary aim of this study is to review the novel approaches proposed for data imputation, particularly in the machine learning (ML) area, along several dimensions, including type of method, experimentation setup and evaluation metrics. This ultimately provides an understanding of how well the proposed frameworks are evaluated and what type and ratio of missingness the proposals address. The review questions in this study are: (1) What ML-based imputation methods were studied and proposed during 2010–2020? (2) How are the experimentation setup, the characteristics of data sets and missingness employed in these studies? (3) What metrics were used to evaluate the imputation methods?

Design/methodology/approach

The review followed the standard identification, screening and selection process. The initial search of electronic databases for missing value imputation (MVI) based on ML algorithms returned 2,883 papers, most of which did not present an MVI technique relevant to this study. Titles were first screened for relevance, and 306 papers were identified as appropriate. After the abstracts were reviewed, 151 ineligible papers were dropped, leaving 155 research papers for full-text review. Of these, 117 papers were used to assess the review questions.
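The screening counts reported above can be checked with simple arithmetic. The snippet is only that check; the two “excluded” figures are derived from the stated counts rather than reported in the abstract.

```python
# Counts reported in the abstract.
identified = 2883          # initial database search
after_title_screen = 306   # retained after title screening
dropped_at_abstract = 151  # ineligible after abstract review
used_in_assessment = 117   # used to answer the review questions

# Derived counts (the 2,577 and 38 are not stated explicitly in the abstract).
excluded_at_title = identified - after_title_screen
full_text = after_title_screen - dropped_at_abstract
excluded_at_full_text = full_text - used_in_assessment
print(excluded_at_title, full_text, excluded_at_full_text)  # prints: 2577 155 38
```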

Findings

This study shows that clustering- and instance-based algorithms are the most frequently proposed MVI methods. Percentage of correct prediction (PCP) and root mean square error (RMSE) are the most used evaluation metrics in these studies. For experimentation, the majority of the studies sourced their data sets from publicly available repositories. A common approach is to use the complete data set as a baseline for evaluating the effectiveness of imputation on test data sets with artificially induced missingness. The data set size and missingness ratio varied across experiments, while the missing data types and mechanisms addressed depend on the capabilities of the imputation method. Computational expense is a concern, and experimentation using large data sets appears to be a challenge.
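The evaluation protocol described above (start from a complete data set, induce missingness artificially, then score the imputations against the held-back values) can be sketched as follows. This is a hypothetical illustration: the function names, the 20% default ratio and the simple mean/mode imputers are assumptions, and RMSE and PCP are computed in their usual forms for numeric and categorical values respectively.

```python
import math
import random
import statistics
from collections import Counter


def rmse(true, pred):
    """Root mean square error for numeric imputations."""
    return math.sqrt(statistics.fmean((t - p) ** 2 for t, p in zip(true, pred)))


def pcp(true, pred):
    """Percentage of correct prediction (PCP) for categorical imputations."""
    return 100 * sum(t == p for t, p in zip(true, pred)) / len(true)


def mean_impute(col):
    m = statistics.fmean(v for v in col if v is not None)
    return [m if v is None else v for v in col]


def mode_impute(col):
    m = Counter(v for v in col if v is not None).most_common(1)[0][0]
    return [m if v is None else v for v in col]


def evaluate_on_masked(complete, impute, ratio=0.2, seed=0):
    """Mask a random fraction of a complete column, impute it, and return the
    (true, imputed) values at the masked positions for scoring."""
    rng = random.Random(seed)
    hidden = set(rng.sample(range(len(complete)), max(1, int(ratio * len(complete)))))
    masked = [None if i in hidden else v for i, v in enumerate(complete)]
    filled = impute(masked)
    idx = sorted(hidden)
    return [complete[i] for i in idx], [filled[i] for i in idx]
```

A numeric column would then be scored with `rmse(*evaluate_on_masked(col, mean_impute))` and a categorical one with `pcp(*evaluate_on_masked(col, mode_impute))`; any other imputer with the same column-in, column-out signature can be swapped in.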

Originality/value

The review indicates that there is no single universal solution to the missing data problem. Different ML approaches work well for different kinds of missingness, depending on the characteristics of the data set. Most of the methods reviewed lack general applicability. Another concern related to applicability is the complexity of formulating and implementing the algorithm. Imputation methods based on k-nearest neighbors (kNN) and clustering algorithms are simple and easy to implement, which makes them popular across various domains.

Details

Data Technologies and Applications, vol. 55 no. 4
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 21 December 2021

Ling Jiang, Tingsheng Zhao, Chuxuan Feng and Wei Zhang

Abstract

Purpose

This research is aimed at predicting tower crane accident phases with incomplete data.

Design/methodology/approach

Tower crane accident records are collected to train the prediction model, and random forest (RF) is used for prediction. Missing values in new inputs must be filled in before prediction, yet complete data are difficult to collect on construction sites. The authors therefore use the multiple imputation (MI) method to improve RF. Finally, the prediction model is applied to a case study.
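The idea of imputing a new, incomplete input several times and aggregating the resulting predictions can be sketched as below. To stay self-contained, this hypothetical sketch swaps the random forest for a simple nearest-centroid classifier and fills each missing feature with a random observed training value (a hot-deck draw); neither choice is taken from the paper, and the phase labels used in the check are invented.

```python
import random
from collections import Counter


def centroids(X, y):
    """Per-class feature means: a stand-in classifier, not a random forest."""
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    return {c: [sum(col) / len(col) for col in zip(*rows)] for c, rows in groups.items()}


def predict(cents, x):
    """Label of the nearest class centroid (squared Euclidean distance)."""
    return min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(cents[c], x)))


def mi_predict(X, y, x_new, m=20, seed=0):
    """Complete x_new m times, filling each None with a random observed training
    value for that feature, classify every completed input, and majority-vote."""
    rng = random.Random(seed)
    cents = centroids(X, y)
    votes = Counter()
    for _ in range(m):
        completed = [rng.choice([row[j] for row in X]) if v is None else v
                     for j, v in enumerate(x_new)]
        votes[predict(cents, completed)] += 1
    return votes.most_common(1)[0][0], votes
```

Returning the vote counts alongside the label shows how stable the prediction is across imputations; a split vote signals that the missing feature matters for this input.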

Findings

The results show that multiple imputation RF (MIRF) can effectively predict tower crane accidents when the data are incomplete. This research also provides an importance ranking of tower crane safety factors. The critical factors should receive particular attention on site, because missing data on them seriously affect the prediction results. The values of the critical factors also influence tower crane safety.

Practical implications

This research promotes the application of machine learning methods to accident prediction in actual projects. Using on-site data, the authors can predict the accident phase of a tower crane, and the results can be used for tower crane accident prevention.

Originality/value

Previous studies have seldom predicted tower crane accidents, especially the phase of an accident. This research uses tower crane data collected on site to predict the phase of a tower crane accident, and it accounts for incomplete data collection, reflecting actual site conditions.

Details

Engineering, Construction and Architectural Management, vol. 30 no. 3
Type: Research Article
ISSN: 0969-9988

Book part
Publication date: 10 July 2006

Craig Enders, Samantha Dietz, Marjorie Montague and Jennifer Dixon

Abstract

Missing data are a pervasive problem in special education research. The purpose of this chapter is to provide researchers with an overview of two “modern” alternatives for handling missing data, full information maximum likelihood (FIML) and multiple imputation (MI). These techniques are currently considered to be the methodological “state of the art”, and generally provide more accurate parameter estimates than the traditional methods that are still common in published educational studies. The chapter begins with an overview of missing data theory, and provides brief descriptions of some traditional missing data techniques and their requisite assumptions. Detailed descriptions of FIML and MI are given, and the chapter concludes with an analytic example from a longitudinal study of depression.
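With MI, each of the m completed data sets is analyzed separately and the results are then pooled. A minimal sketch of the standard pooling formulas (Rubin’s rules) follows; it is a generic illustration, not code from the chapter, and the example numbers in the note below are made up.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Pool m completed-data results with Rubin's rules.

    estimates: point estimates Q_1..Q_m, one per imputed data set
    variances: their sampling variances U_1..U_m (squared standard errors)
    Returns (pooled estimate, total variance, standard error, degrees of freedom).
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputations")
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w = statistics.fmean(variances)       # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b               # total variance
    # Classic Rubin degrees of freedom for the t reference distribution.
    df = math.inf if b == 0 else (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, math.sqrt(t), df
```

For estimates 1.0, 2.0 and 3.0, each with variance 0.5, the pooled estimate is 2.0 and the total variance is 0.5 + (4/3)(1.0), about 1.833. The between-imputation term is what lets MI intervals reflect the extra uncertainty due to missingness, which single imputation ignores.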

Details

Applications of Research Methodology
Type: Book
ISBN: 978-0-76231-295-5

Article
Publication date: 12 October 2020

Ibrahim Said Ahmad, Azuraliza Abu Bakar, Mohd Ridzwan Yaakub and Mohammad Darwich

Abstract

Purpose

Sequel movies are very popular; however, there are limited studies on sequel movie revenue prediction. The purpose of this paper is to propose a sentiment analysis-based model for sequel movie revenue prediction and a missing value imputation method for the sequel revenue prediction dataset.

Design/methodology/approach

A sequel of a successful movie will most likely also be successful. Therefore, we propose a supervised learning approach in which data are created from sequel movies to predict the box-office revenue of an upcoming sequel. The algorithms used in the prediction are multiple linear regression, support vector machine and multilayer perceptron neural network.

Findings

The results show that using four sequel movies in a franchise to predict the box-office revenue of a fifth sequel achieved better prediction than using three sequels, which was also better than using two sequel movies.

Research limitations/implications

The model produced will be beneficial to movie producers and other stakeholders in the movie industry in deciding the viability of producing a movie sequel.

Originality/value

Previous studies do not prioritize sequel movies in movie revenue prediction. Additionally, a new missing value imputation method was introduced. Finally, a sequel movie revenue prediction dataset was prepared.

Details

Data Technologies and Applications, vol. 54 no. 5
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 30 October 2018

Darryl Ahner and Luke Brantley

Abstract

Purpose

This paper aims to address the reasons behind the varying levels of volatile conflict and peace seen during the Arab Spring of 2011 to 2015. During this time, certain Middle Eastern and North African countries experienced higher rates of conflict transition than previous studies normally observed.

Design/methodology/approach

Previous prediction models lose accuracy during times of volatile conflict transition, and proper strategies for handling the Arab Spring have been highly debated. This paper identifies which countries were affected by the Arab Spring and then applies data analysis techniques to predict a country’s tendency to suffer from high-intensity, violent conflict. A large number of open-source variables are incorporated by implementing an imputation methodology useful to future conflict prediction studies. The imputed variables are used in four model building techniques (purposeful selection of covariates, logical selection of covariates, principal component regression and representative principal component regression), resulting in modeling accuracies exceeding 90 per cent.

Findings

Analysis of the models produced by the four techniques supports hypotheses that propose political opportunity and quality of life factors as causes of increased instability following the Arab Spring.

Originality/value

Of particular note is that the paper addresses the reasons behind the varying levels of volatile conflict and peace as seen during the Arab Spring of 2011 to 2015 through data analytics. This paper considers various open-source, readily available data for inclusion in multiple models of identified Arab Spring nations in addition to implementing a novel imputation methodology useful to conflict prediction studies in the future.

Details

Journal of Defense Analytics and Logistics, vol. 2 no. 2
Type: Research Article
ISSN: 2399-6439
