The purpose of this study is to alleviate the specified issues to a great extent. To promote patients’ health via early prediction of diseases, knowledge extraction using data mining approaches shows an integral part of e-health system. However, medical databases are highly imbalanced, voluminous, conflicting and complex in nature, and these can lead to erroneous diagnosis of diseases (i.e. detecting class-values of diseases). In literature, numerous standard disease decision support system (DDSS) have been proposed, but most of them are disease specific. Also, they usually suffer from several drawbacks like lack of understandability, incapability of operating rare cases, inefficiency in making quick and correct decision, etc.
Addressing the limitations of the existing systems, the present research introduces a two-step framework for designing a DDSS, in which the first step (data-level optimization) deals in identifying an optimal data-partition (Popt) for each disease data set and then the best training set for Popt in parallel manner. On the other hand, the second step explores a generic predictive model (integrating C4.5 and PRISM learners) over the discovered information for effective diagnosis of disease. The designed model is a generic one (i.e. not disease specific).
The empirical results (in terms of top three measures, namely, accuracy, true positive rate and false positive rate) obtained over 14 benchmark medical data sets (collected from https://archive.ics.uci.edu/ml) demonstrate that the hybrid model outperforms the base learners in almost all cases for initial diagnosis of the diseases. After all, the proposed DDSS may work as an e-doctor to detect diseases.
The model designed in this study is original, and the necessary parallelized methods are implemented in C on Cluster HPC machine (FUJITSU) with total 256 cores (under one Master node).
Compliance with ethical standards: The study is not funded by any agency. It does not involve other human participants and/or animal. The author declares that there is no conflict of interests regarding the publication of this paper.
Sarkar, B.K. and Sana, S.S. (2019), "An e-healthcare system for disease prediction using hybrid data mining technique", Journal of Modelling in Management, Vol. 14 No. 3, pp. 628-661. https://doi.org/10.1108/JM2-05-2018-0069
Emerald Publishing Limited
Copyright © 2019, Emerald Publishing Limited