Feature space heterogeneity exists widely in various application fields of classification techniques, such as customs inspection decision, credit scoring and medical diagnosis. This paper aims to study the relationship between feature space heterogeneity and classification performance.
A measurement is first developed for measuring and identifying any significant heterogeneity that exists in the feature space of a data set. The main idea of this measurement is derived from a meta-analysis. For the data set with significant feature space heterogeneity, a classification algorithm based on factor analysis and clustering is proposed to learn the data patterns, which, in turn, are used for data classification.
The proposed approach has two main advantages over the previous methods. The first advantage lies in feature transform using orthogonal factor analysis, which results in new features without redundancy and irrelevance. The second advantage rests on samples partitioning to capture the feature space heterogeneity reflected by differences of factor scores. The validity and effectiveness of the proposed approach is verified on a number of benchmarking data sets.
Measurement should be used to guide the heterogeneity elimination process, which is an interesting topic in future research. In addition, to develop a classification algorithm that enables scalable and incremental learning for large data sets with significant feature space heterogeneity is also an important issue.
Measuring and eliminating the feature space heterogeneity possibly existing in the data are important for accurate classification. This study provides a systematical approach to feature space heterogeneity measurement and elimination for better classification performance, which is favorable for applications of classification techniques in real-word problems.
A measurement based on meta-analysis for measuring and identifying any significant feature space heterogeneity in a classification problem is developed, and an ensemble classification framework is proposed to deal with the feature space heterogeneity and improve the classification accuracy.
The authors are grateful to the editor and the anonymous reviewers for their constructive comments and suggestions, which have significantly improved the paper. This research was supported by the National Natural Science Foundation of China (Grants No. 71801164 and 71471022) and Fundamental Research Funds for the Central Universities (Project No. 106112016CDJXY020010).
CitationDownload as .RIS
Emerald Publishing Limited
Copyright © 2019, Emerald Publishing Limited