Search results

1 – 10 of over 1000
Book part
Publication date: 23 November 2011

Gayaneh Kyureghian, Oral Capps and Rodolfo M. Nayga

Abstract

The objective of this research is to examine, validate, and recommend techniques for handling the problem of missingness in observational data. We use a rich observational data set, the Nielsen HomeScan data set, which combines features typical of simulated data (large numbers of observations, data sets, and variables, and scope for elements of “design”) with an observational nature. We created random 20% and 50% uniform missingness in our data sets and employed several widely used methods of single imputation, such as mean, regression, and stochastic regression imputation, as well as multiple imputation methods, to fill in the data gaps. We compared these methods by measuring the error in predicting the missing values and the parameter estimates from the subsequent regression analysis using the imputed values. We also compared coverage, the percentage of intervals that covered the true parameter, in both cases. Based on our results, single regression or conditional mean imputation provided the best predictions of the missing price values, with mean absolute percent errors of 28.34 and 28.59 in the 20% and 50% missingness settings, respectively. Imputation from a conditional distribution had the best rate of coverage. The parameter estimates based on data sets imputed by the conditional mean method were consistently unbiased and had the smallest standard deviations. The multiple imputation methods had the best coverage of both the parameter estimates and the predictions of the dependent variable.
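
As a concrete illustration of the single-imputation methods the abstract names, here is a minimal sketch of conditional mean (regression) and stochastic regression imputation on synthetic price data; the variables and the linear model are illustrative assumptions, not the paper's Nielsen HomeScan specification:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: price depends linearly on one covariate (e.g. package size).
n = 200
size = rng.uniform(1, 10, n)
price = 2.0 + 1.5 * size + rng.normal(0, 0.5, n)

# Introduce roughly 20% uniform (MCAR) missingness in price.
mask = rng.random(n) < 0.2
price_obs = price.copy()
price_obs[mask] = np.nan

# Conditional mean (regression) imputation: fit OLS on complete cases,
# then fill the gaps with the fitted conditional mean.
cc = ~np.isnan(price_obs)
X = np.column_stack([np.ones(cc.sum()), size[cc]])
beta, *_ = np.linalg.lstsq(X, price_obs[cc], rcond=None)
pred = beta[0] + beta[1] * size

imputed = price_obs.copy()
imputed[mask] = pred[mask]

# Stochastic regression imputation adds residual noise to each fill,
# restoring the variance that conditional mean imputation suppresses.
resid_sd = np.std(price_obs[cc] - X @ beta)
stochastic = price_obs.copy()
stochastic[mask] = pred[mask] + rng.normal(0, resid_sd, mask.sum())

mape = 100 * np.mean(np.abs((imputed[mask] - price[mask]) / price[mask]))
print(f"MAPE of conditional mean imputation: {mape:.2f}%")
```

The trade-off the abstract reports falls out of this construction: the conditional mean minimizes prediction error, while the stochastic variant gives better coverage in downstream inference.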

Details

Missing Data Methods: Cross-sectional Methods and Applications
Type: Book
ISBN: 978-1-78052-525-9

Article
Publication date: 12 August 2021

Pooja Rani, Rajneesh Kumar and Anurag Jain

Abstract

Purpose

Decision support systems developed using machine learning classifiers have become a valuable tool in predicting various diseases. However, the performance of these systems is adversely affected by the missing values in medical datasets. Imputation methods are used to predict these missing values. In this paper, a new imputation method called hybrid imputation optimized by the classifier (HIOC) is proposed to predict missing values efficiently.

Design/methodology/approach

The proposed HIOC is developed by using a classifier to combine multivariate imputation by chained equations (MICE), K nearest neighbor (KNN), mean and mode imputation methods in an optimum way. The performance of HIOC has been compared to that of MICE, KNN, and mean and mode methods. Four classifiers, support vector machine (SVM), naive Bayes (NB), random forest (RF) and decision tree (DT), have been used to evaluate the performance of the imputation methods.
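
The abstract does not spell out how the classifier combines the base imputers, so the following is only a sketch of one plausible reading: run several candidate imputations and keep whichever one a downstream classifier scores highest. The data, the helper names and the stand-in least-squares classifier are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "medical" data: two correlated features, binary label.
n = 300
x1 = rng.normal(0, 1, n)
x2 = 0.8 * x1 + rng.normal(0, 0.6, n)
y = (x1 + x2 + rng.normal(0, 0.5, n) > 0).astype(int)
X = np.column_stack([x1, x2])

# Knock out roughly 20% of feature x2 at random (MCAR).
miss = rng.random(n) < 0.2
X_miss = X.copy()
X_miss[miss, 1] = np.nan

def impute_mean(X):
    out = X.copy()
    col = out[:, 1]
    col[np.isnan(col)] = np.nanmean(col)
    return out

def impute_knn(X, k=5):
    # Neighbour distance on the fully observed feature only (a simplification).
    out = X.copy()
    obs = ~np.isnan(out[:, 1])
    for i in np.where(~obs)[0]:
        d = np.abs(out[obs, 0] - out[i, 0])
        nn = np.argsort(d)[:k]
        out[i, 1] = out[obs, 1][nn].mean()
    return out

def holdout_accuracy(Xi, y):
    # Simple linear least-squares classifier as a stand-in for the
    # SVM/NB/RF/DT evaluators used in the paper.
    tr = np.arange(n) < 200
    A = np.column_stack([np.ones(tr.sum()), Xi[tr]])
    w, *_ = np.linalg.lstsq(A, y[tr], rcond=None)
    te = np.column_stack([np.ones(n - tr.sum()), Xi[~tr]])
    pred = (te @ w > 0.5).astype(int)
    return (pred == y[~tr]).mean()

# HIOC-style selection: keep whichever imputation the downstream
# classifier scores highest.
candidates = {"mean": impute_mean(X_miss), "knn": impute_knn(X_miss)}
best = max(candidates, key=lambda name: holdout_accuracy(candidates[name], y))
print("selected imputer:", best)
```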

Findings

The results show that HIOC performed efficiently even with a high rate of missing values. It had reduced root mean square error (RMSE) up to 17.32% in the heart disease dataset and 34.73% in the breast cancer dataset. Correct prediction of missing values improved the accuracy of the classifiers in predicting diseases. It increased classification accuracy up to 18.61% in the heart disease dataset and 6.20% in the breast cancer dataset.

Originality/value

The proposed HIOC is a new hybrid imputation method that can efficiently predict missing values in any medical dataset.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 14 no. 4
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 27 July 2021

Sonia Goel and Meena Tushir

Abstract

Purpose

In real-world decision-making, high accuracy data analysis is essential in a ubiquitous environment. However, missing data are often encountered when collecting user-related information because of users' privacy concerns. This paper aims to handle incomplete data in fuzzy model identification by proposing a new method of parameter estimation for a Takagi–Sugeno model in the presence of missing features.

Design/methodology/approach

In this work, the authors propose a three-fold approach to fuzzy model identification: an imputation-based linear interpolation technique estimates the missing features of the data; fuzzy c-means clustering determines the optimal number of rules and the parameters of the membership functions of the fuzzy model; finally, all antecedent and consequent parameters, along with the widths of the antecedent (Gaussian) membership functions, are optimized by a gradient descent algorithm that minimizes the root mean square error.
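
The first stage of the three-fold approach, imputation by linear interpolation, can be sketched as follows on synthetic ordered data; the signal and the missingness pattern are illustrative assumptions, not the paper's examples:

```python
import numpy as np

rng = np.random.default_rng(2)

# Ordered samples of a smooth feature (e.g. readings over a sample index).
t = np.arange(50, dtype=float)
x = np.sin(t / 8.0)

# Roughly 20% missing at random; keep the endpoints observed so that
# every gap has neighbours on both sides.
miss = rng.random(t.size) < 0.2
miss[0] = miss[-1] = False
x_obs = x.copy()
x_obs[miss] = np.nan

# Linear interpolation between the nearest observed neighbours.
obs = ~np.isnan(x_obs)
x_imp = x_obs.copy()
x_imp[~obs] = np.interp(t[~obs], t[obs], x_obs[obs])

print("max interpolation error:", np.max(np.abs(x_imp - x)))
```

The completed data would then feed the clustering and gradient-descent stages described above.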

Findings

The proposed method is tested on two well-known simulation examples as well as on a real data set, and the performance is compared with several traditional methods. The result analysis and statistical analysis show that the proposed model achieves a considerable improvement in accuracy in the presence of varying degrees of data incompleteness.

Originality/value

The proposed method works well for fuzzy model identification, providing parameter estimates for a Takagi–Sugeno model in the presence of missing features across varying degrees of missing data, as compared with some well-known methods.

Details

International Journal of Pervasive Computing and Communications, vol. 17 no. 4
Type: Research Article
ISSN: 1742-7371

Article
Publication date: 2 April 2021

Tressy Thomas and Enayat Rajabi

Abstract

Purpose

The primary aim of this study is to review the studies from different dimensions, including the type of methods, the experimentation setup and the evaluation metrics used in the novel approaches proposed for data imputation, particularly in the machine learning (ML) area. This ultimately provides an understanding of how well the proposed frameworks are evaluated and what type and ratio of missingness are addressed in the proposals. The review questions in this study are (1) what are the ML-based imputation methods studied and proposed during 2010–2020? (2) How are the experimentation setup, the characteristics of the data sets and the missingness handled in these studies? (3) What metrics were used for the evaluation of the imputation methods?

Design/methodology/approach

The review process went through the standard identification, screening and selection stages. The initial search of electronic databases for missing value imputation (MVI) based on ML algorithms returned 2,883 papers. Most of the papers at this stage did not describe an MVI technique relevant to this study. The papers were first screened by title for relevancy, and 306 were identified as appropriate. Upon reviewing the abstracts, 151 papers that were not eligible for this study were dropped. This resulted in 155 research papers suitable for full-text review, of which 117 were used in the assessment of the review questions.

Findings

This study shows that clustering- and instance-based algorithms are the most proposed MVI methods. Percentage of correct prediction (PCP) and root mean square error (RMSE) are the most used evaluation metrics in these studies. For experimentation, the majority of the studies sourced the data sets from publicly available repositories. A common approach is to use the complete data set as a baseline and to evaluate the effectiveness of imputation on test data sets with artificially induced missingness. The data set size and missingness ratio varied across the experimentations, while the missing-data type and mechanism bear on the capability of the imputation. Computational expense is a concern, and experimentation using large data sets appears to be a challenge.
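
The common evaluation protocol described here (complete data set as baseline, artificially induced missingness, error measured on the masked entries) can be sketched as follows; the mean imputer and the MCAR mask are illustrative stand-ins for the methods surveyed:

```python
import numpy as np

rng = np.random.default_rng(3)

def induce_mcar(X, ratio, rng):
    """Knock out a given fraction of entries completely at random."""
    mask = rng.random(X.shape) < ratio
    Xm = X.copy()
    Xm[mask] = np.nan
    return Xm, mask

def rmse(true, imputed, mask):
    """RMSE restricted to the artificially masked entries."""
    return np.sqrt(np.mean((true[mask] - imputed[mask]) ** 2))

# The complete data set serves as the baseline ground truth.
X = rng.normal(0, 1, (100, 4))
X_miss, mask = induce_mcar(X, 0.1, rng)

# Trivial column-mean imputation as the method under evaluation.
col_means = np.nanmean(X_miss, axis=0)
X_imp = np.where(np.isnan(X_miss), col_means, X_miss)

print("RMSE:", rmse(X, X_imp, mask))
```

Any candidate imputer can be dropped into the same harness, which is what makes this protocol the de facto standard the review describes.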

Originality/value

It is understood from the review that there is no single universal solution to the missing data problem. Variants of ML approaches work well with the missingness depending on the characteristics of the data set. Most of the methods reviewed lack generalization with regard to applicability. Another concern related to applicability is the complexity of the formulation and implementation of the algorithm. Imputations based on k-nearest neighbors (kNN) and clustering algorithms, which are simple and easy to implement, are popular across various domains.

Details

Data Technologies and Applications, vol. 55 no. 4
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 14 August 2017

Panagiotis Loukopoulos, George Zolkiewski, Ian Bennett, Pericles Pilidis, Fang Duan and David Mba

Abstract

Purpose

Centrifugal compressors are integral components in the oil industry, so effective maintenance is required. Condition-based maintenance and prognostics and health management (CBM/PHM) have been gaining popularity. CBM/PHM can also be performed remotely, leading to e-maintenance. Its success depends on the quality of the data used for analysis and decision making. A major issue is missing data, whose presence may compromise the information within a set, causing bias or misleading results. Addressing this matter is crucial. The purpose of this paper is to review and compare the most widely used imputation techniques in a case study using condition monitoring measurements from an operational industrial centrifugal compressor.

Design/methodology/approach

A brief overview and comparison of the most widely used imputation techniques, using a complete data set with artificially introduced missing values. The techniques were tested regarding the effects of the amount of missing data, their location within the set and the variable containing them.

Findings

Univariate and multivariate imputation techniques were compared, with the latter offering the smallest error levels. They seemed unaffected by the amount or location of the missing data, although they were affected by the variable containing them.
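
The gap between univariate and multivariate imputation reported here can be illustrated on synthetic correlated channels; the regression imputer stands in for the multivariate techniques compared in the paper, and all names and parameters are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)

# Two strongly correlated condition-monitoring channels (synthetic).
n = 500
a = rng.normal(0, 1, n)
b = 0.9 * a + rng.normal(0, 0.3, n)

# Only channel b has gaps, matching the paper's assumption that at any
# time a single variable contains the missing data.
miss = rng.random(n) < 0.3
b_obs = b.copy()
b_obs[miss] = np.nan
obs = ~miss

# Univariate: column mean.  Multivariate: regression of b on a.
uni = np.where(miss, np.nanmean(b_obs), b_obs)
beta, *_ = np.linalg.lstsq(np.column_stack([np.ones(obs.sum()), a[obs]]),
                           b_obs[obs], rcond=None)
multi = np.where(miss, beta[0] + beta[1] * a, b_obs)

err_uni = np.sqrt(np.mean((uni[miss] - b[miss]) ** 2))
err_multi = np.sqrt(np.mean((multi[miss] - b[miss]) ** 2))
print(f"univariate RMSE {err_uni:.3f}  multivariate RMSE {err_multi:.3f}")
```

Because the multivariate imputer exploits the cross-channel correlation, its error is bounded by the residual noise rather than the full variance of the channel, which mirrors the finding above.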

Research limitations/implications

During the analysis, it was assumed that at any time only one variable contained missing data. Further research is still required to address this point.

Originality/value

This study can serve as a guide for selecting the appropriate imputation method for missing values in centrifugal compressor condition monitoring data.

Details

Journal of Quality in Maintenance Engineering, vol. 23 no. 3
Type: Research Article
ISSN: 1355-2511

Book part
Publication date: 10 July 2006

Craig Enders, Samantha Dietz, Marjorie Montague and Jennifer Dixon

Abstract

Missing data are a pervasive problem in special education research. The purpose of this chapter is to provide researchers with an overview of two “modern” alternatives for handling missing data, full information maximum likelihood (FIML) and multiple imputation (MI). These techniques are currently considered to be the methodological “state of the art”, and generally provide more accurate parameter estimates than the traditional methods that are still common in published educational studies. The chapter begins with an overview of missing data theory, and provides brief descriptions of some traditional missing data techniques and their requisite assumptions. Detailed descriptions of FIML and MI are given, and the chapter concludes with an analytic example from a longitudinal study of depression.
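
A minimal sketch of multiple imputation as described here: impute stochastically M times, analyse each completed data set, and pool the estimates by averaging (the point-estimate part of Rubin's rules; the variance formula additionally accounts for between-imputation spread). The toy regression model is an illustrative assumption, not the chapter's depression example:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy regression data with roughly 30% of the outcome missing (MCAR).
n = 120
x = rng.normal(0, 1, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, n)
miss = rng.random(n) < 0.3
y_obs = np.where(miss, np.nan, y)
obs = ~miss

# Fit the imputation model on the complete cases.
A = np.column_stack([np.ones(obs.sum()), x[obs]])
beta, *_ = np.linalg.lstsq(A, y_obs[obs], rcond=None)
resid_sd = np.std(y_obs[obs] - A @ beta)

M = 20
slopes = []
for _ in range(M):
    # Each imputation draws from the conditional distribution (stochastic),
    # so the M completed data sets differ.
    y_imp = y_obs.copy()
    y_imp[miss] = beta[0] + beta[1] * x[miss] + rng.normal(0, resid_sd, miss.sum())
    # Analyse each completed data set with the substantive model...
    B = np.column_stack([np.ones(n), x])
    b, *_ = np.linalg.lstsq(B, y_imp, rcond=None)
    slopes.append(b[1])

# ...and pool: the MI point estimate is the average across imputations.
print("pooled slope:", np.mean(slopes))
```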

Details

Applications of Research Methodology
Type: Book
ISBN: 978-0-76231-295-5

Content available
Article
Publication date: 24 October 2023

Jared Nystrom, Raymond R. Hill, Andrew Geyer, Joseph J. Pignatiello and Eric Chicken

Abstract

Purpose

This paper presents a method to impute missing data from a chaotic time series, in this case lightning prediction data, and then uses the completed dataset to create lightning prediction forecasts.

Design/methodology/approach

The technique of spatiotemporal kriging is used to estimate data that are autocorrelated in both space and time. Using the estimated data in an imputation methodology completes a dataset used in lightning prediction.

Findings

The techniques provided prove robust to the chaotic nature of the data, and the resulting time series displays evidence of smoothing while also preserving the signal of interest for lightning prediction.

Research limitations/implications

The research is limited to the data collected in support of weather prediction work through the 45th Weather Squadron of the United States Air Force.

Practical implications

These methods are important due to the increasing reliance on sensor systems. These systems often provide incomplete and chaotic data, which must be used despite collection limitations. This work establishes a viable data imputation methodology.

Social implications

Improved lightning prediction, as with any improved prediction methods for natural weather events, can save lives and resources due to timely, cautious behaviors as a result of the predictions.

Originality/value

To the authors’ knowledge, this is a novel application of these imputation methods and the forecasting methods.

Details

Journal of Defense Analytics and Logistics, vol. 7 no. 2
Type: Research Article
ISSN: 2399-6439

Article
Publication date: 12 October 2020

Ibrahim Said Ahmad, Azuraliza Abu Bakar, Mohd Ridzwan Yaakub and Mohammad Darwich

Abstract

Purpose

Sequel movies are very popular; however, there are limited studies on sequel movie revenue prediction. The purpose of this paper is to propose a sentiment analysis based model for sequel movie revenue prediction and to propose a missing value imputation method for the sequel revenue prediction dataset.

Design/methodology/approach

A sequel of a successful movie will most likely also be successful. Therefore, we propose a supervised learning approach in which data are created from sequel movies to predict the box-office revenue of an upcoming sequel. The algorithms used in the prediction are multiple linear regression, support vector machine and multilayer perceptron neural network.

Findings

The results show that using four sequel movies in a franchise to predict the box-office revenue of a fifth sequel achieved better prediction than using three sequels, which was also better than using two sequel movies.

Research limitations/implications

The model produced will be beneficial to movie producers and other stakeholders in the movie industry in deciding the viability of producing a movie sequel.

Originality/value

Previous studies do not give priority to sequel movies in movie revenue prediction. Additionally, a new missing value imputation method was introduced. Finally, a sequel movie revenue prediction dataset was prepared.

Details

Data Technologies and Applications, vol. 54 no. 5
Type: Research Article
ISSN: 2514-9288

Book part
Publication date: 11 June 2009

Anca E. Cretu and Roderick J. Brodie

Abstract

Companies in all industries are searching for new sources of competitive advantage since the competition in their marketplace is becoming increasingly intensive. The resource-based view of the firm explains the sources of sustainable competitive advantage. From a resource-based view perspective, relational assets (i.e., the assets resulting from a firm's contacts in the marketplace) enable competitive advantage. The relational assets examined in this work are brand image and corporate reputation, as components of brand equity, and customer value. This paper explores how they create value. Despite the relatively large amount of literature describing the benefits to firms of having strong brand equity and delivering customer value, no research has validated the linkage of the brand equity components, brand image and corporate reputation, simultaneously in the customer value–customer loyalty chain. This work presents a model for testing these relationships for consumer goods in a business-to-business context. The results demonstrate the differential roles of brand image and corporate reputation on perceived quality, customer value, and customer loyalty. Brand image influences the perception of the quality of the products and the additional services, whereas corporate reputation acts beyond brand image in shaping customer value and customer loyalty. The effects of corporate reputation are also validated on different samples. The results demonstrate the importance of managing the facets of brand equity, brand image and corporate reputation, given their differential impacts on perceived quality, customer value, and customer loyalty. The results also demonstrate that companies should not limit their investment to brand image. Maintaining and enhancing corporate reputation can have a stronger impact on customer value and customer loyalty, and can create a differential competitive advantage.

Details

Business-To-Business Brand Management: Theory, Research and Executive Case Study Exercises
Type: Book
ISBN: 978-1-84855-671-3

Article
Publication date: 17 May 2011

Doris Gomezelj Omerzel, Boštjan Antončič and Mitja Ruzzier

Abstract

Purpose

The purpose of this paper is to use a structural equation modelling technique to verify a theoretically proposed model of knowledge management (KM).

Design/methodology/approach

Existing studies on KM were reviewed and their limitations were identified. Mailed structured questionnaire data for this study were collected from small‐ and medium‐sized enterprises (SMEs) in Slovenia (168 usable responses). Exploratory and confirmatory factor analysis with structural equation modelling was used to estimate the model.

Findings

The hypotheses on the multidimensionality of the KM model were mainly supported.

Research limitations/implications

The study is limited to Slovenian SMEs, but the findings may be generalisable to other regions. The study offers important contributions for research (the KM construct) and for practice (improvements in SME KM).

Practical implications

KM can have beneficial effects on a firm's growth and profitability. The findings can be used to guide entrepreneurs in the efficient management of different dimensions of knowledge.

Originality/value

This study validated the latent elements of a KM model in SMEs. It provides valuable information that should help SMEs better appreciate the importance of knowledge and KM.

Details

Baltic Journal of Management, vol. 6 no. 2
Type: Research Article
ISSN: 1746-5265
