The immense quantity of available unstructured text documents serve as one of the largest source of information. Text classification can be an essential task for many purposes in information retrieval, such as document organization, text filtering and sentiment analysis. Ensemble learning has been extensively studied to construct efficient text classification schemes with higher predictive performance and generalization ability. The purpose of this paper is to provide diversity among the classification algorithms of ensemble, which is a key issue in the ensemble design.
An ensemble scheme based on hybrid supervised clustering is presented for text classification. In the presented scheme, supervised hybrid clustering, which is based on cuckoo search algorithm and k-means, is introduced to partition the data samples of each class into clusters so that training subsets with higher diversities can be provided. Each classifier is trained on the diversified training subsets and the predictions of individual classifiers are combined by the majority voting rule. The predictive performance of the proposed classifier ensemble is compared to conventional classification algorithms (such as Naïve Bayes, logistic regression, support vector machines and C4.5 algorithm) and ensemble learning methods (such as AdaBoost, bagging and random subspace) using 11 text benchmarks.
The experimental results indicate that the presented classifier ensemble outperforms the conventional classification algorithms and ensemble learning methods for text classification.
The presented ensemble scheme is the first to use supervised clustering to obtain diverse ensemble for text classification
CitationDownload as .RIS
Emerald Publishing Limited
Copyright © 2017, Emerald Publishing Limited