Text mining techniques for identifying failure modes

Francina Malan (Department of Industrial Engineering, Stellenbosch University, Stellenbosch, South Africa)
Johannes Lodewyk Jooste (Department of Industrial Engineering, Stellenbosch University, Stellenbosch, South Africa)

Journal of Quality in Maintenance Engineering

ISSN: 1355-2511

Article publication date: 6 February 2023

Issue publication date: 18 July 2023

Abstract

Purpose

The purpose of this paper is to compare the effectiveness of the various text mining techniques that can be used to classify maintenance work-order records into their respective failure modes, focussing on the choice of algorithm and preprocessing transforms. Three algorithms are evaluated, namely Bernoulli Naïve Bayes, multinomial Naïve Bayes and support vector machines.

Design/methodology/approach

The paper has both a theoretical and an experimental component. In the literature review, the various algorithms and preprocessing techniques used in text classification are considered from three perspectives: the domain-specific maintenance literature, the broader short-form literature and the general text classification literature. The experimental component consists of a 5 × 2 nested cross-validation with an inner optimisation loop performed using a randomised search procedure.
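As an illustration only, the nested procedure described above can be sketched in plain Python. The sketch assumes that "5 × 2" means five repetitions of a two-fold outer split, uses a two-fold inner split, and covers only a multinomial Naïve Bayes classifier. The toy records, the function names and the search space (smoothing parameter `alpha`, optional bigrams) are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical work-order texts and failure-mode labels (not the paper's data).
RECORDS = [
    ("pump bearing noisy replaced bearing", "bearing"),
    ("motor bearing overheated regreased", "bearing"),
    ("fan bearing seized replaced", "bearing"),
    ("conveyor bearing worn vibration high", "bearing"),
    ("gearbox bearing noisy vibration", "bearing"),
    ("bearing failed on drive motor", "bearing"),
    ("pump seal leaking oil", "leak"),
    ("hydraulic hose leaking replaced hose", "leak"),
    ("valve gasket leak water on floor", "leak"),
    ("oil leak at gearbox seal", "leak"),
    ("pipe flange leaking tightened bolts", "leak"),
    ("coolant leak from hose clamp", "leak"),
    ("motor tripped breaker reset", "electrical"),
    ("sensor cable broken no signal", "electrical"),
    ("fuse blown on control panel", "electrical"),
    ("breaker tripped on conveyor motor", "electrical"),
    ("loose wire in panel no power", "electrical"),
    ("no signal from sensor replaced cable", "electrical"),
]
DOCS = [text for text, _ in RECORDS]
LABELS = [label for _, label in RECORDS]

def tokens(text, use_bigrams):
    """Unigram features, optionally extended with bigrams (a preprocessing choice)."""
    words = text.lower().split()
    feats = list(words)
    if use_bigrams:
        feats += [a + "_" + b for a, b in zip(words, words[1:])]
    return feats

def train_mnb(docs, labels, alpha, use_bigrams):
    """Collect per-class term counts for multinomial Naive Bayes."""
    counts, priors, vocab = defaultdict(Counter), Counter(labels), set()
    for doc, label in zip(docs, labels):
        feats = tokens(doc, use_bigrams)
        counts[label].update(feats)
        vocab.update(feats)
    return counts, priors, vocab, alpha, use_bigrams

def predict_mnb(model, doc):
    """Return the class with the highest smoothed log-posterior."""
    counts, priors, vocab, alpha, use_bigrams = model
    n_docs = sum(priors.values())
    best_label, best_lp = None, -math.inf
    for label in priors:
        total = sum(counts[label].values()) + alpha * len(vocab)
        lp = math.log(priors[label] / n_docs)
        for t in tokens(doc, use_bigrams):
            if t in vocab:  # terms unseen in training are ignored
                lp += math.log((counts[label][t] + alpha) / total)
        if lp > best_lp:
            best_label, best_lp = label, lp
    return best_label

def stratified_halves(labels, rng):
    """Split indices into two folds, keeping class proportions roughly equal."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    halves = ([], [])
    for idx in by_class.values():
        rng.shuffle(idx)
        for k, i in enumerate(idx):
            halves[k % 2].append(i)
    return halves

def accuracy(docs, labels, train_idx, test_idx, alpha, use_bigrams):
    model = train_mnb([docs[i] for i in train_idx],
                      [labels[i] for i in train_idx], alpha, use_bigrams)
    hits = sum(predict_mnb(model, docs[i]) == labels[i] for i in test_idx)
    return hits / len(test_idx)

def inner_score(docs, labels, folds, alpha, use_bigrams):
    """Mean two-fold accuracy of one candidate configuration."""
    a, b = folds
    return (accuracy(docs, labels, a, b, alpha, use_bigrams)
            + accuracy(docs, labels, b, a, alpha, use_bigrams)) / 2

def nested_5x2(docs, labels, n_iter=5, seed=0):
    """Five repetitions of a two-fold outer split; random search in the inner loop."""
    rng = random.Random(seed)
    outer_scores = []
    for _ in range(5):
        a, b = stratified_halves(labels, rng)
        for train_idx, test_idx in ((a, b), (b, a)):
            tr_docs = [docs[i] for i in train_idx]
            tr_labels = [labels[i] for i in train_idx]
            inner_folds = stratified_halves(tr_labels, rng)
            # Randomised search: sample alpha log-uniformly and toggle bigrams.
            candidates = [(10 ** rng.uniform(-2, 0), rng.choice([False, True]))
                          for _ in range(n_iter)]
            best = max(candidates, key=lambda c: inner_score(
                tr_docs, tr_labels, inner_folds, c[0], c[1]))
            # Refit on the whole outer training half, score on the held-out half.
            outer_scores.append(accuracy(docs, labels, train_idx, test_idx, *best))
    return outer_scores

scores = nested_5x2(DOCS, LABELS)
print(len(scores), sum(scores) / len(scores))
```

The point of the nesting is that hyperparameters and preprocessing choices are selected using only the outer training half, so the ten outer scores are not biased by the tuning. Comparing several algorithms would mean repeating the same loop with each classifier and its own search space.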

Findings

From the literature review, the aspects most affected by short document length are identified as the feature representation scheme, higher-order n-grams, document length normalisation, stemming, stop-word removal and algorithm selection. In the experimental analysis, however, the selection of preprocessing transforms appears to depend more on the particular algorithm than on short document length. Multinomial Naïve Bayes performs marginally better than the other algorithms, but overall, the performances of the optimised models are comparable.

Originality/value

This work highlights the importance of model optimisation, including the selection of preprocessing transforms. Not only does the optimisation improve the performance of all the algorithms substantially, but it also affects model comparisons, with multinomial Naïve Bayes going from the worst to the best performing algorithm.

Acknowledgements

The authors would like to thank Pragma Africa for providing the data set and partial funding for this research.

Citation

Malan, F. and Jooste, J.L. (2023), "Text mining techniques for identifying failure modes", Journal of Quality in Maintenance Engineering, Vol. 29 No. 3, pp. 666-682. https://doi.org/10.1108/JQME-02-2020-0012

Publisher

Emerald Publishing Limited

Copyright © 2023, Emerald Publishing Limited
