Search results

1 – 10 of over 76000
Book part
Publication date: 29 September 2023

Torben Juul Andersen

Abstract

This chapter outlines how the comprehensive North American and European datasets were collected and explains the ensuing data cleaning process, outlining three alternative methods applied to deal with missing values: the complete case, multiple imputation (MI), and K-nearest neighbor (KNN) methods. The complete case method is the conventional approach adopted in many mainstream management studies. We further discuss the implied assumption underlying use of this technique, which is rarely assessed or tested in practice, and explain the alternative imputation approaches in detail. Use of North American data is the norm, but we also collected a European dataset, which is rarely done, to enable subsequent comparative analysis between these geographical regions. We introduce the structure of firms organized within different industry classification schemes for use in the ensuing comparative analyses and provide base information on missing values in the original and cleaned datasets. The calculated performance indicators derived from the sampled data are defined and presented. We show how the three alternative approaches considered to deal with missing values have significantly different effects on the calculated performance measures in terms of extreme estimate ranges and mean performance values.
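As an illustration of why the choice of method matters, the following minimal sketch compares a complete case mean with a simple nearest-neighbor fill on toy data. The rows, the function names and the single-predictor distance are assumptions made for illustration and are not taken from the chapter; multiple imputation is omitted for brevity.

```python
# Toy rows of (revenue, margin); None marks a missing margin. Hypothetical data.
rows = [
    (1.0, 0.10),
    (2.0, 0.20),
    (3.0, None),
    (10.0, 0.90),
    (11.0, None),
]


def complete_case_mean(rows):
    """Mean margin using only the rows where the margin is observed."""
    kept = [m for _, m in rows if m is not None]
    return sum(kept) / len(kept)


def knn_impute_mean(rows, k=1):
    """Fill each missing margin with the average margin of the k observed rows
    whose revenue is closest, then take the mean over all rows."""
    observed = [(r, m) for r, m in rows if m is not None]
    filled = []
    for r, m in rows:
        if m is None:
            nearest = sorted(observed, key=lambda o: abs(o[0] - r))[:k]
            m = sum(v for _, v in nearest) / len(nearest)
        filled.append(m)
    return sum(filled) / len(filled)


cc_mean = complete_case_mean(rows)      # uses 3 of the 5 rows
knn_mean = knn_impute_mean(rows, k=1)   # uses all 5 rows after filling
```

On this toy sample the two means differ, which mirrors, in a very small way, the point that the handling of missing values can shift the calculated performance measures.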

Details

A Study of Risky Business Outcomes: Adapting to Strategic Disruption
Type: Book
ISBN: 978-1-83797-074-2

Open Access
Article
Publication date: 22 November 2022

Kedong Yin, Yun Cao, Shiwei Zhou and Xinman Lv

Abstract

Purpose

The purposes of this research are to study the theory and method of multi-attribute index system design and establish a set of systematic, standardized, scientific index systems for the design optimization and inspection process. The research may form the basis for a rational, comprehensive evaluation and provide the most effective way of improving the quality of management decision-making. It is of practical significance to improve the rationality and reliability of the index system and provide standardized, scientific reference standards and theoretical guidance for the design and construction of the index system.

Design/methodology/approach

Using modern methods such as complex networks and machine learning, a system for the quality diagnosis of index data and the classification and stratification of index systems is designed. This guarantees the quality of the index data, realizes the scientific classification and stratification of the index system, reduces the subjectivity and randomness of the design of the index system, enhances its objectivity and rationality and lays a solid foundation for the optimal design of the index system.

Findings

Based on the ideas of statistics, system theory, machine learning and data mining, the present research focuses on “data quality diagnosis” and “index classification and stratification” and clarifies the classification standards and data quality characteristics of index data. A data-quality diagnosis system of “data review – data cleaning – data conversion – data inspection” is established. Using a decision tree, explanatory structural model, cluster analysis, K-means clustering and other methods, a classification and hierarchical method system for indicators is designed to reduce the redundancy of indicator data and improve the quality of the data used. Finally, the scientific and standardized classification and hierarchical design of the index system can be realized.

Originality/value

The innovative contributions and research value of the paper are reflected in three aspects. First, a method system for index data quality diagnosis is designed, and multi-source data fusion technology is adopted to ensure the quality of multi-source, heterogeneous and mixed-frequency data of the index system. The second is to design a systematic quality-inspection process for missing data based on the systematic thinking of the whole and the individual. Aiming at the accuracy, reliability, and feasibility of the patched data, a quality-inspection method of patched data based on inversion thought and a unified representation method of data fusion based on a tensor model are proposed. The third is to use the modern method of unsupervised learning to classify and stratify the index system, which reduces the subjectivity and randomness of the design of the index system and enhances its objectivity and rationality.

Details

Marine Economics and Management, vol. 5 no. 2
Type: Research Article
ISSN: 2516-158X

Open Access
Article
Publication date: 8 November 2022

Yilong Ren and Jianbin Wang

Abstract

Purpose

Missing travel time data for roads is a common problem encountered by traffic management departments. Tensor decomposition, as one of the most widely used methods for completing missing traffic data, plays a significant role in the intelligent transportation system (ITS). However, existing methods of tensor decomposition focus on the global data structure, resulting in relatively low accuracy in fibrosis missing scenarios. Therefore, this paper aims to propose a novel tensor decomposition model which further considers the local spatiotemporal similarity for fibrosis missing to improve travel time completion accuracy.

Design/methodology/approach

The proposed model can aggregate road sections with similar physical attributes by spatial clustering, and then it calculates the temporal association of road sections by the dynamic longest common subsequence. A similarity relationship matrix in the temporal dimension is constructed and incorporated into the tensor completion model, which can enhance the local spatiotemporal relationship of the missing parts of the fibrosis type.
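The abstract does not specify how the "dynamic longest common subsequence" is computed, so the following is only a sketch of the classic longest common subsequence recurrence applied to two numeric series. The tolerance `tol`, the normalisation by the shorter length and the function name are assumptions for illustration, not the authors' formulation.

```python
def lcs_similarity(a, b, tol=0.5):
    """Similarity in [0, 1] between two travel-time series.

    Two points match when they differ by at most `tol`; the score is the
    longest common subsequence length divided by the shorter series length.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        return 0.0
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) <= tol:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m] / min(n, m)


# Hypothetical travel times (minutes) for two road sections.
section_a = [1.0, 5.0, 2.0, 3.0]
section_b = [1.2, 2.1, 3.3]
score = lcs_similarity(section_a, section_b)
```

Pairwise scores of this kind could populate a similarity matrix over road sections; how such a matrix enters the tensor completion model is described only at a high level in the abstract.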

Findings

The experiment shows that this method is superior and robust. Compared with other baseline models, this method has the smallest error and maintains good completion results despite high missing rates.

Originality/value

This model achieves higher accuracy for fibrosis missing and converges well at high missing rates.

Details

Smart and Resilient Transportation, vol. 4 no. 3
Type: Research Article
ISSN: 2632-0487

Article
Publication date: 27 February 2023

Wenfeng Zhang, Ming K. Lim, Mei Yang, Xingzhi Li and Du Ni

Abstract

Purpose

As the supply chain is a highly integrated infrastructure in modern business, risks in the supply chain are also becoming highly contagious for the target company. This motivates researchers to continuously add new features to the datasets for credit risk prediction (CRP). However, adding new features can easily lead to missing data.

Design/methodology/approach

Based on the gaps summarized from the literature in CRP, this study first introduces the approaches to the building of datasets and the framing of the algorithmic models. Then, this study tests the interpolation effects of the algorithmic model in three artificial datasets with different missing rates and compares its predictability before and after the interpolation in a real dataset with the missing data in irregular time-series.

Findings

The algorithmic model of the time-decayed long short-term memory (TD-LSTM) proposed in this study can monitor the missing data in irregular time-series by capturing more and better time-series information, and interpolating the missing data efficiently. Moreover, the algorithmic model of Deep Neural Network can be used in the CRP for the datasets with the missing data in irregular time-series after the interpolation by the TD-LSTM.

Originality/value

This study fully validates the TD-LSTM interpolation effects and demonstrates that the predictability of the dataset after interpolation is improved. Accurate and timely CRP can undoubtedly assist a target company in avoiding losses. Identifying credit risks and taking preventive measures ahead of time, especially in the case of public emergencies, can help the company minimize losses.

Details

Industrial Management & Data Systems, vol. 123 no. 5
Type: Research Article
ISSN: 0263-5577

Article
Publication date: 2 April 2021

Tressy Thomas and Enayat Rajabi

Abstract

Purpose

The primary aim of this study is to review the studies from different dimensions including type of methods, experimentation setup and evaluation metrics used in the novel approaches proposed for data imputation, particularly in the machine learning (ML) area. This ultimately provides an understanding of how well the proposed frameworks are evaluated and what type and ratio of missingness are addressed in the proposals. The review questions in this study are: (1) What ML-based imputation methods were studied and proposed during 2010–2020? (2) How are the experimentation setup, characteristics of data sets and missingness employed in these studies? (3) What metrics were used for the evaluation of the imputation methods?

Design/methodology/approach

The review process went through the standard identification, screening and selection process. The initial search of electronic databases for missing value imputation (MVI) based on ML algorithms returned a large number of papers, totaling 2,883. Most of the papers at this stage did not describe an MVI technique relevant to this study. The papers were first screened by title for relevancy, and 306 were identified as appropriate. Upon reviewing the abstract text, 151 papers that were not eligible for this study were dropped. This resulted in 155 research papers suitable for full-text review. Of these, 117 papers were used in the assessment of the review questions.
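The screening counts can be checked with a few lines of arithmetic. The variable names below are illustrative, and the two derived exclusion counts are computed here rather than reported in the abstract.

```python
identified = 2883         # papers returned by the initial database search
title_relevant = 306      # judged appropriate after title screening
abstract_excluded = 151   # dropped after reading the abstract
used = 117                # used to assess the review questions

# Papers that proceed to full-text review.
full_text = title_relevant - abstract_excluded

# Derived counts (not stated in the abstract).
excluded_at_title = identified - title_relevant
excluded_at_full_text = full_text - used
```

The computed `full_text` value agrees with the 155 papers reported for full-text review.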

Findings

This study shows that clustering- and instance-based algorithms are the most proposed MVI methods. Percentage of correct prediction (PCP) and root mean square error (RMSE) are the most used evaluation metrics in these studies. For experimentation, the majority of the studies sourced the data sets from publicly available data set repositories. A common approach is to set the complete data set as the baseline for evaluating the effectiveness of imputation on test data sets with artificially induced missingness. The data set size and missingness ratio varied across the experiments, while the missing data type and mechanism addressed depend on the capability of the imputation method. Computational expense is a concern, and experimentation using large data sets appears to be a challenge.
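The common evaluation protocol described here can be sketched as follows: start from a complete column, hide some entries, impute them and score the result with RMSE against the hidden truth. The data, the 30% missingness ratio, the random seed and the use of mean imputation as a stand-in method are all assumptions for illustration.

```python
import math
import random


def rmse(truth, estimate):
    """Root mean square error between two equal-length sequences."""
    squared = [(t - e) ** 2 for t, e in zip(truth, estimate)]
    return math.sqrt(sum(squared) / len(squared))


random.seed(0)
complete = [float(i) for i in range(1, 21)]  # hypothetical complete baseline

# Hide roughly 30% of the entries, assumed missing completely at random.
mask = [random.random() < 0.3 for _ in complete]
if not any(mask):
    mask[0] = True  # guarantee at least one hidden entry

observed = [v for v, hidden in zip(complete, mask) if not hidden]
truth = [v for v, hidden in zip(complete, mask) if hidden]

# Mean imputation stands in for whichever method is being evaluated.
fill_value = sum(observed) / len(observed)
error = rmse(truth, [fill_value] * len(truth))
```

Swapping the mean fill for another imputation method while keeping `mask` fixed gives a like-for-like comparison on the same induced missingness.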

Originality/value

It is understood from the review that there is no single universal solution to the missing data problem. Variants of ML approaches work well with the missingness depending on the characteristics of the data set. Most of the methods reviewed lack generalization with regard to applicability. Another concern related to applicability is the complexity of the formulation and implementation of the algorithm. Imputations based on k-nearest neighbors (kNN) and clustering algorithms are simple and easy to implement, which makes them popular across various domains.

Details

Data Technologies and Applications, vol. 55 no. 4
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 27 July 2021

Sonia Goel and Meena Tushir

Abstract

Purpose

In real-world decision-making, high-accuracy data analysis is essential in a ubiquitous environment. However, missing data arise while collecting user-related information because of users' privacy concerns. This paper aims to deal with incomplete data for fuzzy model identification by proposing a new method of parameter estimation for a Takagi–Sugeno model in the presence of missing features.

Design/methodology/approach

In this work, the authors propose a three-fold approach for fuzzy model identification. First, an imputation-based linear interpolation technique is used to estimate missing features of the data. Fuzzy c-means clustering is then used to determine the optimal number of rules and the parameters of the membership functions of the fuzzy model. Finally, all the antecedent and consequent parameters, along with the width of the antecedent (Gaussian) membership function, are optimized by a gradient descent algorithm based on the minimization of root mean square error.
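The first step, linear interpolation of missing entries, might look like the sketch below. The abstract does not state along which dimension the interpolation runs or how gaps at the edges are treated, so the ordered one-dimensional sequence and the nearest-value rule for leading and trailing gaps are assumptions, as is the function name.

```python
def interpolate_missing(values):
    """Fill None entries by linear interpolation between the nearest observed
    neighbours; leading and trailing gaps copy the nearest observed value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("need at least one observed value")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


# Hypothetical feature sequence with gaps at both ends and in the middle.
result = interpolate_missing([None, 1.0, None, None, 4.0, None])
```

The clustering and gradient descent stages are not sketched here because the abstract gives too little detail to reproduce them faithfully.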

Findings

The proposed method is tested on two well-known simulation examples as well as on a real data set, and the performance is compared with some traditional methods. The result analysis and statistical analysis show that the proposed model achieves a considerable improvement in accuracy in the presence of varying degrees of data incompleteness.

Originality/value

The proposed method, a new method of parameter estimation for a Takagi–Sugeno model in the presence of missing features, works well for fuzzy model identification across varying degrees of missing data as compared to some well-known methods.

Details

International Journal of Pervasive Computing and Communications, vol. 17 no. 4
Type: Research Article
ISSN: 1742-7371

Article
Publication date: 1 November 2003

Marvin L. Brown and John F. Kros

Abstract

The actual data mining process deals significantly with prediction, estimation, classification, pattern recognition and the development of association rules. Therefore, the significance of the analysis depends heavily on the accuracy of the database and on the chosen sample data to be used for model training and testing. Data mining is based upon searching the concatenation of multiple databases that usually contain some amount of missing data along with a variable percentage of inaccurate data, pollution, outliers and noise. The issue of missing data must be addressed since ignoring this problem can introduce bias into the models being evaluated and lead to inaccurate data mining conclusions. The objective of this research is to address the impact of missing data on the data mining process.
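A toy calculation shows how ignoring missing data can bias an estimate. The values and the rule that high values go unreported are hypothetical and chosen only to make the effect visible.

```python
# Hypothetical complete column of values.
incomes = [20, 30, 40, 50, 60, 70, 80, 90]

# Assumption: values above 60 are never reported, so missingness depends
# on the value itself rather than occurring at random.
reported = [x for x in incomes if x <= 60]

true_mean = sum(incomes) / len(incomes)
observed_mean = sum(reported) / len(reported)
bias = observed_mean - true_mean  # negative: the observed mean is too low
```

A model trained only on the reported values would inherit this downward shift, which is the kind of distortion the abstract warns about.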

Details

Industrial Management & Data Systems, vol. 103 no. 8
Type: Research Article
ISSN: 0263-5577

Book part
Publication date: 10 July 2006

Craig Enders, Samantha Dietz, Marjorie Montague and Jennifer Dixon

Abstract

Missing data are a pervasive problem in special education research. The purpose of this chapter is to provide researchers with an overview of two “modern” alternatives for handling missing data: full information maximum likelihood (FIML) and multiple imputation (MI). These techniques are currently considered to be the methodological “state of the art” and generally provide more accurate parameter estimates than the traditional methods that are still common in published educational studies. The chapter begins with an overview of missing data theory and provides brief descriptions of some traditional missing data techniques and their requisite assumptions. Detailed descriptions of FIML and MI are given, and the chapter concludes with an analytic example from a longitudinal study of depression.
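In multiple imputation, the same analysis is run on each imputed data set and the results are then combined. The sketch below pools estimates with Rubin's rules, the standard combining rules for MI; the abstract does not mention them by name, and the function name and input numbers are hypothetical.

```python
def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors.

    Returns (pooled_estimate, total_variance), where
    total = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates with matching variances")
    pooled = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical coefficient estimates from three imputed data sets.
pooled, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what carries the extra uncertainty due to the missing values, so the pooled variance is larger than the average within-imputation variance whenever the estimates disagree.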

Details

Applications of Research Methodology
Type: Book
ISBN: 978-0-76231-295-5

Article
Publication date: 24 August 2018

Jewoo Kim and Jongho Im

Abstract

Purpose

The purpose of this paper is to introduce a new multiple imputation method that can effectively manage missing values in online review data, thereby allowing the online review analysis to yield valid results by using all available data.

Design/methodology/approach

This study develops a missing data method based on the multivariate imputation chained equation to generate imputed values for online reviews. Sentiment analysis is used to incorporate customers’ textual opinions as the auxiliary information in the imputation procedures. To check the validity of the proposed imputation method, the authors apply this method to missing values of sub-ratings on hotel attributes in both the simulated and real Honolulu hotel review data sets. The estimation results are compared to those of different missing data techniques, namely, listwise deletion and conventional multiple imputation which does not consider text reviews.

Findings

The findings from the simulation analysis show that the authors' imputation method produces more efficient and less biased estimates compared to the other two missing data techniques when text reviews are possibly associated with the rating scores and response mechanism. When applying the imputation method to the real hotel review data, the findings show that the text sentiment-based propensity score can effectively explain the missingness of sub-ratings on hotel attributes, and that, as in the simulation analysis, the imputation method considering those propensity scores yields better estimation results than the other techniques.

Originality/value

This study extends multiple imputation to online data considering its spontaneous and unstructured nature. This new method helps make fuller use of the observed online data while avoiding potential missing data problems.

Details

International Journal of Contemporary Hospitality Management, vol. 30 no. 11
Type: Research Article
ISSN: 0959-6119
