Statistical monitoring applied to data science in classification: continuous validation in predictive models
Abstract
Purpose
This study analyzes the performance of quality indices for continuously validating a predictive classification model by means of control charts.
Design/methodology/approach
Analytical statistical methods were used to propose a classification model. Concepts from project science research were integrated with statistical process monitoring (SPM) concepts using modeling methods from the data science (DS) area. To develop the integration, SPM Phases I and II were associated, generating models with a structured data analysis process and creating a continuous validation approach.
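The two-phase logic described above can be illustrated with a minimal sketch. It assumes a generic Shewhart-style chart with three-sigma limits on per-sample Kappa values; the abstract does not state how the study constructs its limits, so the function names `phase1_limits` and `phase2_signals`, the `width` parameter, and the example values are all hypothetical.

```python
import statistics


def phase1_limits(kappas, width=3.0):
    """Phase I: estimate the center line and control limits from reference samples.

    Assumption: plain Shewhart limits (mean +/- width * sample std. dev.),
    clipped to [-1, 1], the range Kappa can take.
    """
    center = statistics.mean(kappas)
    sigma = statistics.stdev(kappas)
    lower = max(-1.0, center - width * sigma)
    upper = min(1.0, center + width * sigma)
    return lower, center, upper


def phase2_signals(new_kappas, limits):
    """Phase II: return the indices of new samples that fall outside the limits."""
    lower, _, upper = limits
    return [i for i, k in enumerate(new_kappas) if k < lower or k > upper]


# Hypothetical Kappa values from m = 5 reference samples (Phase I).
reference = [0.60, 0.62, 0.58, 0.61, 0.59]
limits = phase1_limits(reference)

# Hypothetical Kappa values observed later, while the model is in use (Phase II).
signals = phase2_signals([0.61, 0.50, 0.63], limits)
```

In this sketch a signal would prompt a review of the model, for example retraining, which is one way to read "continuous validation"; the study's own decision rules may differ.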
Findings
Validation was performed by simulation and analytical techniques applied to the Cohen’s Kappa index, supported by voluntary comparisons with the Matthews correlation coefficient (MCC) and the Youden index, generating prescriptive criteria for the classification. Kappa-based control charts performed well for m = 5 samples of size n = 500 when Pe (the agreement expected by chance in the Kappa formula) is less than 0.8. The simulations also showed that Kappa control requires fewer samples than the other indices studied.
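For readers unfamiliar with the three indices, the sketch below computes them from a binary confusion matrix using their standard textbook definitions. The function name `classification_indices` and the example counts are illustrative assumptions and are not taken from the study; degenerate tables (an empty row or column) are not handled.

```python
import math


def classification_indices(tp, fp, fn, tn):
    """Return Cohen's Kappa, MCC, and Youden's J for a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    # Observed agreement Po: proportion of cases on the diagonal.
    po = (tp + tn) / n
    # Chance agreement Pe, computed from the row and column marginals.
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (po - pe) / (1 - pe)
    # Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom
    # Youden's J = sensitivity + specificity - 1.
    youden = tp / (tp + fn) + tn / (tn + fp) - 1
    return {"po": po, "pe": pe, "kappa": kappa, "mcc": mcc, "youden": youden}


# Hypothetical counts for one sample of 100 classified items.
result = classification_indices(tp=40, fp=10, fn=10, tn=40)
```

With these balanced counts Pe equals 0.5 and all three indices coincide at 0.6; they diverge once the class distribution or the error types become unbalanced, which is where a comparison among them becomes informative.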
Originality/value
The main contributions of this study to both theory and practice are summarized as follows: (1) it proposes the integration of DS and SPM; (2) it develops a tool for the continuous validation of predictive classification models; (3) it compares different model quality indices, indicating their advantages and disadvantages; (4) it defines sampling criteria and a procedure for applying SPM, considering the technique’s Phases I and II; and (5) the validated approach serves as a basis for various analyses, enabling an objective comparison among all alternative designs.
Keywords
Acknowledgements
Funding: This study was partially supported by the Coordination for the Improvement of Higher Education Personnel (CAPES-Brazil), 0001. The first author was granted a doctoral scholarship (CAPES-Brazil).
Citation
Bueno, C.R., Sordan, J.E., Oprime, P.C., Vicentin, D.C. and Condé, G.C.P. (2024), "Statistical monitoring applied to data science in classification: continuous validation in predictive models", Benchmarking: An International Journal, Vol. ahead-of-print No. ahead-of-print. https://doi.org/10.1108/BIJ-02-2024-0171
Publisher
Emerald Publishing Limited
Copyright © 2024, Emerald Publishing Limited