Statistical monitoring applied to data science in classification: continuous validation in predictive models

Carlos Renato Bueno (Department of Production Engineering, Federal University of Sao Carlos, Sao Carlos, Brazil)
Juliano Endrigo Sordan (Department of Production Engineering, Federal University of Sao Carlos, Sao Carlos, Brazil)
Pedro Carlos Oprime (Department of Production Engineering, Federal University of Sao Carlos, Sao Carlos, Brazil)
Damaris Chieregato Vicentin (Department of Production Engineering, Federal University of Sao Carlos, Sao Carlos, Brazil)
Giovanni Cláudio Pinto Condé (Department of Production Engineering, Federal University of Sao Carlos, Sao Carlos, Brazil)

Benchmarking: An International Journal

ISSN: 1463-5771

Article publication date: 22 October 2024

Abstract

Purpose

This study aims to analyze the performance of quality indices to continuously validate a predictive model focused on the control chart classification.

Design/methodology/approach

The research used analytical statistical methods to propose a classification model. Project science research concepts were integrated with statistical process monitoring (SPM) concepts using modeling methods from the data science (DS) area. To develop the integration, SPM Phases I and II were combined, generating models with a structured data analysis process and yielding a continuous validation approach.
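The two-phase idea can be illustrated with a minimal sketch. It assumes a generic Shewhart-style chart (center line plus or minus three standard deviations) applied to a quality index computed per sample; this is a textbook construction, not necessarily the limits the authors derive. The function names and the Kappa values are hypothetical and chosen only for illustration.

```python
import statistics


def phase1_limits(reference_values, width=3.0):
    """Phase I: estimate the center line and control limits from
    in-control reference samples of a quality index (e.g. Kappa).

    Assumption: generic Shewhart-style limits, mean +/- width * sd,
    clipped to [-1, 1], the range of Cohen's Kappa.
    """
    center = statistics.mean(reference_values)
    sd = statistics.stdev(reference_values)
    lcl = max(-1.0, center - width * sd)
    ucl = min(1.0, center + width * sd)
    return lcl, center, ucl


def phase2_signals(new_values, limits):
    """Phase II: flag each new sample whose index falls outside the limits."""
    lcl, _, ucl = limits
    return [not (lcl <= value <= ucl) for value in new_values]


# Illustrative values only, not data from the study.
reference_kappas = [0.80, 0.82, 0.78, 0.81, 0.79]  # m = 5 Phase I samples
limits = phase1_limits(reference_kappas)
signals = phase2_signals([0.81, 0.70, 0.83], limits)
print(limits, signals)
```

A flagged sample would suggest that the model's classification quality has shifted and that the model may need revalidation.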

Findings

Validation was performed by simulation and analytical techniques applied to Cohen’s Kappa index, supported by voluntary comparisons with the Matthews correlation coefficient (MCC) and the Youden index, generating prescriptive criteria for the classification. Kappa-based control charts performed well with m = 5 samples of size n = 500 when Pe is less than 0.8. The simulations also showed that Kappa control requires fewer samples than the other indices studied.
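For reference, the three indices named above can be computed from a binary confusion matrix using their standard textbook definitions. The sketch below is illustrative only: the function name and the counts are hypothetical, and `pe` denotes the chance-expected agreement in the usual Cohen's Kappa formula, which is assumed (not confirmed by the text) to correspond to the Pe mentioned above.

```python
import math


def classification_indices(tp, fp, fn, tn):
    """Return Cohen's Kappa, MCC and the Youden index for a 2x2
    confusion matrix. Assumes both classes occur among the actual
    and the predicted labels, so no denominator is zero.
    """
    n = tp + fp + fn + tn
    po = (tp + tn) / n  # observed agreement
    # Agreement expected by chance, from the row and column totals.
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (po - pe) / (1 - pe)

    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    youden = sensitivity + specificity - 1

    return {"kappa": kappa, "mcc": mcc, "youden": youden, "pe": pe}


# Hypothetical counts for a sample of 100 classified items.
result = classification_indices(tp=40, fp=10, fn=10, tn=40)
print(result)
```

With balanced classes and symmetric errors, as in this example, the three indices coincide; they diverge as class proportions or error types become unbalanced, which is where comparing them becomes informative.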

Originality/value

The main contributions of this study to both theory and practice are summarized as follows: (1) it proposes the integration of DS and SPM; (2) it develops a tool for the continuous validation of predictive classification models; (3) it compares different indices of model quality, indicating their advantages and disadvantages; (4) it defines sampling criteria and a procedure for applying SPM, considering the technique’s Phases I and II; and (5) the validated approach serves as a basis for various analyses, enabling an objective comparison among all alternative designs.

Acknowledgements

Funding: This study was partially supported by the Coordination for the Improvement of Higher Education Personnel (CAPES-Brazil), 0001. The first author was granted a doctoral scholarship (CAPES-Brazil).

Citation

Bueno, C.R., Sordan, J.E., Oprime, P.C., Vicentin, D.C. and Condé, G.C.P. (2024), "Statistical monitoring applied to data science in classification: continuous validation in predictive models", Benchmarking: An International Journal, Vol. ahead-of-print No. ahead-of-print. https://doi.org/10.1108/BIJ-02-2024-0171

Publisher: Emerald Publishing Limited

Copyright © 2024, Emerald Publishing Limited
