An approach for fault-related monitoring variables selection based on dual-layer correlation networks

Zhenjie Zhang (China-Austria Belt and Road Joint Laboratory on Artificial Intelligence and Advanced Manufacturing, Hangzhou Dianzi University, Hangzhou, China)
Xinjiu Chen (China-Austria Belt and Road Joint Laboratory on Artificial Intelligence and Advanced Manufacturing, Hangzhou Dianzi University, Hangzhou, China) (School of Automation, Hangzhou Dianzi University, Hangzhou, China)
Xiaobin Xu (China-Austria Belt and Road Joint Laboratory on Artificial Intelligence and Advanced Manufacturing, Hangzhou Dianzi University, Hangzhou, China)
Yi Li (China-Austria Belt and Road Joint Laboratory on Artificial Intelligence and Advanced Manufacturing, Hangzhou Dianzi University, Hangzhou, China)
Pingzhi Hou (China-Austria Belt and Road Joint Laboratory on Artificial Intelligence and Advanced Manufacturing, Hangzhou Dianzi University, Hangzhou, China)
Zehui Zhang (China-Austria Belt and Road Joint Laboratory on Artificial Intelligence and Advanced Manufacturing, Hangzhou Dianzi University, Hangzhou, China)
Haohao Guo (China-Austria Belt and Road Joint Laboratory on Artificial Intelligence and Advanced Manufacturing, Hangzhou Dianzi University, Hangzhou, China) (School of Automation, Hangzhou Dianzi University, Hangzhou, China)

Journal of Intelligent Manufacturing and Special Equipment

ISSN: 2633-6596

Article publication date: 20 August 2024

Issue publication date: 4 December 2024


Abstract

Purpose

Fault-related monitoring variables selection is a process of obtaining a subset of variables from the original set, which is of great significance for reducing information redundancy and improving the performance of the fault diagnosis models. This paper aims to propose a novel variables selection approach based on complex networks.

Design/methodology/approach

Firstly, a dual-layer correlation network (DlCN), which consists of a mechanism-oriented correlation sub-network (MoCSN) and a data-oriented correlation sub-network (DoCSN), is constructed. Secondly, an algorithm for identifying critical fault-related monitoring variables based on dual correlations is introduced. In the algorithm, the topological attributes of the MoCSN and the correlation threshold of the DoCSN are used successively.

Findings

In the experiments on vertical elevator fault diagnosis, the critical fault-related monitoring variables selected by the DlCN-based approach are more effective than those selected by the traditional approaches. This indicates that fusing mechanism-oriented correlation can enhance the comprehensiveness of variable correlation analysis. Moreover, the approach has been shown to be adaptable to different fault diagnosis models.

Originality/value

In the DlCN-based variables selection approach, the mechanism-oriented correlation and data-oriented correlation are comprehensively considered. It improves the precision of variables selection. Meanwhile, it is an unsupervised and model-agnostic approach which addresses the shortcomings of some conventional approaches that require data labels and have insufficient adaptability for fault diagnosis models.

Citation

Zhang, Z., Chen, X., Xu, X., Li, Y., Hou, P., Zhang, Z. and Guo, H. (2024), "An approach for fault-related monitoring variables selection based on dual-layer correlation networks", Journal of Intelligent Manufacturing and Special Equipment, Vol. 5 No. 2, pp. 255-264. https://doi.org/10.1108/JIMSE-05-2024-0008

Publisher: Emerald Publishing Limited

Copyright © 2024, Zhenjie Zhang, Xinjiu Chen, Xiaobin Xu, Yi Li, Pingzhi Hou, Zehui Zhang and Haohao Guo

License

Published in Journal of Intelligent Manufacturing and Special Equipment. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode


1. Introduction

Variable selection is the process of obtaining a subset of variables from the original set, serving as a common data preprocessing technique that underpins various data mining and machine learning tasks (Chandrashekar and Sahin, 2014). It is widely used in the fault monitoring and diagnosis process. For fault monitoring, principal component analysis (PCA) and partial least squares (PLS) are two classical variable selection methods, effectively reducing the dimensionality of monitoring variables (Ghosh et al., 2014; Jiang et al., 2015; Li et al., 2015). In fault diagnosis, to decrease model input dimensions, alleviate information redundancy and enhance the performance of fault diagnosis models, selection of fault-related monitoring variables is also imperative. Regarding fault diagnosis, current selection methods primarily operate on two levels: the data-level variables and feature-level variables. The data level is utilized for identifying crucial time-series monitoring variables, while the feature level focuses on selecting fault features extracted from the monitoring variables. This paper concentrates on the selection of monitoring variables at the data level.

Currently, common variables selection approaches include Random forest (Gregorutti et al., 2017), Entropy (Zhang et al., 2016), Mutual information (Verron et al., 2008; Hassani et al., 2021; Liang et al., 2019), LASSO (Deng et al., 2020; Yan and Yao, 2015), mRMR (Li et al., 2017; Zhong et al., 2019), causal correlation (Clavijo et al., 2021) and so on. Among these, correlation analysis of the fault monitoring variables is a pivotal strategy. Nonetheless, prevailing approaches are largely data-driven, that is, analyzing time-series data of fault-related monitoring variables through correlation analysis techniques. These approaches lack a mechanistic perspective in the correlation analysis of fault-related monitoring variables, which undermines the effectiveness of variables selection to some extent. Mechanism-oriented correlation analysis aims to study the operational mechanisms and inherent principles of systems. For instance, within an electromechanical system, different functional modules are interconnected mechanically or electrically. From the operational mechanism of such a system, fault-related monitoring variables exhibit mechanistic correlations.

Therefore, a fault-related monitoring variables selection approach based on dual-layer correlation networks, covering both the mechanism and data perspectives, is proposed by using complex network modeling and analysis techniques. The main contributions are threefold. First, a dual-layer correlation network that encompasses both the mechanism-oriented correlation and the data-oriented correlation of fault-related monitoring variables is developed. It enriches and expands the dimensions of variable correlation modeling. Second, an algorithm for selecting fault-related variables based on dual correlations is designed. It enhances the precision of variable selection through network topological attributes and correlation thresholds. Third, an unsupervised and model-agnostic approach for variables selection is presented, which addresses the shortcomings of some conventional methods that require data labels and have insufficient adaptability for fault diagnosis models.

The remainder of the paper is structured as follows: Section 2 introduces the fault-related monitoring variables selection approach based on the DlCN, Section 3 validates the proposed approach with an elevator case study and Section 4 concludes the paper.

2. Methodology

In this section, complex network theory is employed to conduct a comprehensive modeling and analysis of both mechanism-oriented and data-oriented correlations. Then the fault-related monitoring variables selection approach is introduced based on a dual-layer correlation network model, as shown in Figure 1. The approach is divided into two parts: construction of the dual-layer correlation network model and variables selection based on dual correlations. Firstly, the dual-layer network model consists of a mechanism-oriented correlation sub-network (MoCSN) and a data-oriented correlation sub-network (DoCSN). It represents the dual correlations of the fault-related monitoring variables. In the construction of the MoCSN, the energy flow and information flow of the electromechanical system are analyzed to derive a matrix that represents the mechanism-oriented correlation. In the construction of the DoCSN, Spearman correlation analysis is used to extract the data-oriented correlation from the time series data of the variables. Secondly, the critical fault-related monitoring variables are selected based on dual correlations. The degree attribute of the MoCSN and the Spearman correlation threshold of the DoCSN are used together to find the variables with low correlation.

2.1 Construction of dual-layer correlation networks

For an electromechanical system, suppose it has n functional modules, denoted as R = { ri |i = 1, …, n }. The set of fault-related monitoring variables is expressed as V = {vi |i = 1, …, S}, where S represents the number of the monitoring variables that can be collected. The time series data set of fault-related monitoring variables is expressed as VD = {vdi |i ∈ V}.

  • (1) Construction of MoCSN

Mechanism-oriented correlation analysis is a process of analyzing the internal structure and functional relationships of the system based on expert knowledge. It aims to establish the correlation of the fault-related monitoring variables by analyzing the mechanism influence among the functional modules. The specific steps are as follows.

  • Step 1: According to the internal mechanism-oriented correlation of electromechanical system, the energy correlation matrix and the information correlation matrix are established, denoted as P1 and P2, respectively. The criterion for determining the internal mechanism-oriented correlation is based on whether there is energy flow or information flow transfer among the functional modules of the electromechanical system. The correlation matrix can be expressed as follows:

(1) $P^{i} = \{ p^{i}_{j,k} \mid i \in [1, 2];\ j, k \in R \}$
where j and k represent the functional modules and $p^{i}_{j,k}$ represents the mechanism correlation between j and k (when i = 1, it represents the energy flow correlation, and when i = 2, it represents the information flow correlation). For an electromechanical system, if there is a correlation of energy flow or information flow between j and k, then $p^{i}_{j,k} = 1$; otherwise $p^{i}_{j,k} = 0$.
  • Step 2: By combining the matrices P1 and P2, the mechanism-oriented correlation matrix P is obtained, expressed as $P = \{ p_{j,k} \mid j, k \in [1, n] \}$, where $p_{j,k}$ represents the mechanism-oriented correlation coefficient.

  • Step 3: The mapping matrix Map is established according to the corresponding relationship between the functional modules of the electromechanical system and the fault-related monitoring variables. The mapping matrix is expressed as follows:

(2) $\mathrm{Map} = \{ \mathrm{map}_{i,j} \mid i \in V,\ j \in R \}$
where i represents the fault-related monitoring variable, j represents the functional module, and $\mathrm{map}_{i,j}$ represents the mapping relationship between variable i and module j. When $\mathrm{map}_{i,j} = 1$, the monitoring variable i is collected on module j.
  • Step 4: The MoCSN is established based on the mechanism-oriented correlation matrix P and the mapping matrix Map. It can be formalized as Gα = (V, Eα), where V is the node set representing the fault-related monitoring variables and Eα is the edge set representing the mechanism-oriented correlation of the variables. If there is a mechanism-oriented correlation between variable i and variable j, the edge weight is set to 1, denoted as $e^{\alpha}_{i,j} = 1$. Otherwise, $e^{\alpha}_{i,j} = 0$. It should be pointed out that a mechanism-oriented correlation exists if variable i and variable j are from the same functional module.
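
Steps 1 to 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the source does not say how P1 and P2 are combined, so an element-wise logical OR is assumed; flows are treated as undirected; and the function name `build_mocsn` and the toy system (three modules, four variables) are hypothetical.

```python
import numpy as np

def build_mocsn(P1, P2, Map):
    """Return the 0/1 adjacency matrix of the MoCSN over monitoring variables."""
    P1, P2, Map = (np.asarray(x, dtype=int) for x in (P1, P2, Map))
    # Step 2 (assumed rule): two modules are correlated if they share an
    # energy flow OR an information flow.
    P = np.logical_or(P1, P2)
    # Assumption: flows are undirected, and every module is correlated with
    # itself so that variables collected on the same module are connected.
    P = np.logical_or(P, P.T).astype(int)
    np.fill_diagonal(P, 1)
    # Steps 3-4: variables i and j are linked when their modules are correlated.
    G_alpha = (Map @ P @ Map.T > 0).astype(int)
    np.fill_diagonal(G_alpha, 0)  # no self-loops
    return G_alpha

# Toy system: energy flow between modules 0 and 1, information flow between 1 and 2.
P1 = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
P2 = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]
# Variables 0 and 1 are collected on module 0, variable 2 on module 1, variable 3 on module 2.
Map = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]

G_alpha = build_mocsn(P1, P2, Map)
degrees = G_alpha.sum(axis=1)  # node degrees used later for variable selection
print(G_alpha)
print(degrees)  # [2 2 3 1]
```

In this toy case variable 3 has the smallest degree because its module exchanges flow with only one other module.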

  • (2) Construction of DoCSN

The construction of the DoCSN is based on the correlation analysis of the time series data of the fault-related monitoring variables. It can be formalized as Gβ = (V, Eβ), where V is the node set representing the fault-related monitoring variables and Eβ is the edge set representing the data-oriented correlation of the variables. In this section, the Spearman coefficient is used to characterize the edge weight, denoted as μa,b. It can be calculated as follows:

(3) $\mu_{a,b} = \left| 1 - \dfrac{6 \sum_{i=1}^{M} d_i^{2}}{M (M^{2} - 1)} \right|$
where $d_i$ represents the difference between the rank positions of data point i after the time series data of the fault-related monitoring variables a and b are sorted respectively, and M represents the number of time series data points of each variable.
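
A minimal sketch of equation (3) follows. It assumes that no value is repeated within a series (the closed-form expression is exact only without ties); the function name `spearman_abs` is illustrative and not taken from the paper.

```python
import numpy as np

def spearman_abs(a, b):
    """Absolute Spearman coefficient of two equally long series, per equation (3)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    M = len(a)
    # Rank of each sample within its own series (1 = smallest); assumes no ties.
    rank_a = np.argsort(np.argsort(a)) + 1
    rank_b = np.argsort(np.argsort(b)) + 1
    d = rank_a - rank_b  # rank position differences d_i
    return abs(1.0 - 6.0 * np.sum(d ** 2) / (M * (M ** 2 - 1)))

# Perfectly reversed ordering: the signed coefficient is -1, its absolute value is 1.
mu_reversed = spearman_abs([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
# Two neighbouring swaps: sum of d_i^2 is 4, so mu = |1 - 24/120| = 0.8.
mu_swapped = spearman_abs([10, 20, 30, 40, 50], [2, 1, 4, 3, 5])
print(mu_reversed, mu_swapped)
```

Taking the absolute value means that strongly negative and strongly positive monotonic relationships are both treated as high correlation, i.e. as redundancy between two variables.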
  • (3) Construction of DlCN

According to the two networks MoCSN and DoCSN, the dual-layer correlation networks can be constructed, expressed as the following super-adjacency matrix:

(4) $G = \begin{bmatrix} G^{\alpha} & M^{|1,2|} \\ M^{|2,1|} & G^{\beta} \end{bmatrix}$
where $G^{\alpha}$ and $G^{\beta}$ represent the adjacency matrices of the MoCSN and DoCSN respectively, and $M^{|1,2|}$ and $M^{|2,1|}$ are identity matrices.
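
The block structure of equation (4) can be assembled directly with NumPy. The sketch below is illustrative: the function name `super_adjacency` and the two-variable toy matrices are assumptions, and the identity blocks simply couple each variable in one layer to its own copy in the other layer.

```python
import numpy as np

def super_adjacency(G_alpha, G_beta):
    """Stack the two S x S layers into the 2S x 2S super-adjacency matrix."""
    G_alpha = np.asarray(G_alpha, dtype=float)
    G_beta = np.asarray(G_beta, dtype=float)
    S = G_alpha.shape[0]
    inter = np.eye(S)  # inter-layer coupling: node i in MoCSN <-> node i in DoCSN
    return np.block([[G_alpha, inter], [inter, G_beta]])

# Toy layers for two variables (values are made up for illustration).
G_alpha = [[0, 1], [1, 0]]
G_beta = [[1.0, 0.25], [0.25, 1.0]]
G = super_adjacency(G_alpha, G_beta)
print(G.shape)  # (4, 4)
```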

2.2 Fault-related monitoring variables selection based on dual correlations

When using the monitoring variables in fault diagnosis, the smaller the correlation strength, the less the redundant information. In other words, more fault-related information can be provided for the diagnosis model. Therefore, the monitoring variables with low correlation should be selected. The specific steps are as follows:

  • Step 1: According to the topological attributes of the MoCSN, the degree of each node v in Gα, denoted as $d_v$, is calculated, from which the correlation strength of the variables can be obtained. Specifically, a node with a smaller degree has lower correlation strength with the other nodes in the network. Then the node pairs (i.e. pairs of variables) with the smallest degree sums are selected and added to the set DV1. In this way, several low-correlation variable pairs are obtained. It can be expressed as $DV1 = \{ (v_i, v_j) \mid v_i, v_j \in V \}$.

  • Step 2: Node pairs in the network Gβ are selected based on the correlation threshold. Specifically, if the edge weight satisfies $w^{\beta}_{i,j} \le g_{threshold}$, the corresponding variable pair is added to the set DV2. It can be denoted as $DV2 = \{ (v_i, v_j) \mid v_i, v_j \in V \}$.

  • Step 3: The variable pair set DV3 is obtained as the intersection of DV1 and DV2, denoted as $DV3 = DV1 \cap DV2$. After that, the variable pair with the lowest correlation coefficient in DV3 is selected as the final result.
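
The three steps can be sketched as a single function. Several details are assumptions rather than statements from the paper: DV1 is formed from all pairs among the k lowest-degree nodes (the parameter `k` is hypothetical), node indices are zero-based, and the toy matrices are invented for illustration.

```python
from itertools import combinations
import numpy as np

def select_pair(G_alpha, G_beta, k, g_threshold):
    """Return (best_pair, DV3) following Steps 1-3; best_pair is None if DV3 is empty."""
    G_alpha = np.asarray(G_alpha)
    G_beta = np.asarray(G_beta, dtype=float)
    S = G_beta.shape[0]
    # Step 1: degrees in the MoCSN; keep all pairs among the k lowest-degree nodes.
    degree = G_alpha.sum(axis=1)
    low = sorted(np.argsort(degree, kind="stable")[:k].tolist())
    DV1 = set(combinations(low, 2))
    # Step 2: pairs whose DoCSN edge weight does not exceed the threshold.
    DV2 = {(i, j) for i, j in combinations(range(S), 2) if G_beta[i, j] <= g_threshold}
    # Step 3: intersect, then take the pair with the lowest coefficient.
    DV3 = DV1 & DV2
    if not DV3:
        return None, DV3
    best = min(sorted(DV3), key=lambda p: G_beta[p])
    return best, DV3

# Toy example with four variables; node 2 is the most connected node in the MoCSN.
G_alpha = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
G_beta = [[1.0, 0.9, 0.1, 0.2],
          [0.9, 1.0, 0.5, 0.25],
          [0.1, 0.5, 1.0, 0.05],
          [0.2, 0.25, 0.05, 1.0]]
best, DV3 = select_pair(G_alpha, G_beta, k=3, g_threshold=0.3)
print(best, sorted(DV3))  # (0, 3) [(0, 3), (1, 3)]
```

Note that the pair (2, 3) has the smallest data-oriented coefficient (0.05) in this toy example, yet it is excluded because node 2 has the highest mechanism-oriented degree. This is the effect the dual-correlation filter is meant to have.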

3. Experiments

In this section, vertical elevator fault diagnosis is taken as an example to verify the effectiveness and superiority of the proposed approach.

3.1 Data introduction

The experimental data comes from the elevator operation simulation system built in Simscape environment (Vladić et al., 2011), as shown in Figure 2, which can effectively simulate the dynamics of the actual elevator operation process.

Based on the elevator system, different fault modes are generated through fault injection into different components. In this experiment, three typical fault modes are injected: tractor circuit fault (TF), pulley wear failure (PF) and counterweight wear failure (CF), denoted as F = {TF, PF, CF}. Nine fault-related monitoring variables are collected. For each variable, 400 samples are obtained in the different fault modes. The variables are listed in Table 1.

3.2 Fault-related monitoring variables selection

According to the dual-layer network modeling approach in Section 2.1, the MoCSN(Gα) and DoCSN(Gβ) for vertical elevator are established, as shown in Figures 3 and 4 (where the thickness of the edge represents the correlation strength of the variables) respectively.

Firstly, the degree of the node in Gα is calculated. Then four nodes with the minimum degree value are selected, namely No.5, No.6, No. 8 and No. 9. After that, DV1 can be obtained as {(No.5, No.6), (No.5, No.8), (No.5, No.9), (No.6, No.8), (No.6, No.9), (No.8, No.9)}. Secondly, the value of Spearman correlation threshold is set as gthreshold = 0.3. Then DV2 can be obtained as {(No.1, No.2), (No.1, No.5), (No.1, No.8), (No.2, No.3), (No.2, No.4), (No.2, No.6), (No.2, No.7), (No.2, No.9), (No.3, No.5), (No.3, No.8), (No.4, No.5), (No.4, No.8), (No.5, No.6), (No.5, No.7), (No.5, No.9), (No.6, No.8), (No.7, No.8), (No.8, No.9)}. Finally, DV3 can be obtained as {(No.5, No.6), (No.5, No.9), (No.6, No.8), (No.8, No.9)} and variable pair (No.5, No.6) is the final result since the Spearman correlation coefficient is the smallest in DV3.
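
The sets reported above can be re-derived from the Spearman coefficients in Table 2. In the sketch below, DV1 is taken from the text (all pairs among No.5, No.6, No.8 and No.9), since the MoCSN degrees themselves are only shown in Figure 3; the variable names `upper` and `mu` are illustrative.

```python
from itertools import combinations

# Upper triangle of Table 2: row i lists the coefficients for (i, i+1), ..., (i, 9).
upper = {
    1: [0.2575, 0.6944, 0.8421, 0.2612, 0.8377, 0.4493, 0.2636, 0.8378],
    2: [0.0382, 0.2727, 0.9998, 0.2732, 0.1913, 0.9924, 0.2732],
    3: [0.7235, 0.0361, 0.7493, 0.7644, 0.0342, 0.7493],
    4: [0.2689, 0.9945, 0.4642, 0.2722, 0.9944],
    5: [0.2694, 0.1926, 0.9926, 0.2794],
    6: [0.4891, 0.2728, 0.9996],
    7: [0.1979, 0.4888],
    8: [0.2727],
}
mu = {(i, i + 1 + off): v for i, row in upper.items() for off, v in enumerate(row)}

g_threshold = 0.3
DV1 = set(combinations([5, 6, 8, 9], 2))             # from the four minimum-degree nodes
DV2 = {p for p, v in mu.items() if v <= g_threshold}  # low data-oriented correlation
DV3 = DV1 & DV2
final_pair = min(sorted(DV3), key=mu.get)

print(len(DV2))     # 18
print(sorted(DV3))  # [(5, 6), (5, 9), (6, 8), (8, 9)]
print(final_pair)   # (5, 6)
```

The pair (No.5, No.6) has a coefficient of 0.2694, the smallest of the four pairs in DV3. The smallest coefficient in the whole table, 0.0342 for (No.3, No.8), is not in DV1 and is therefore excluded by the mechanism-oriented layer.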

3.3 Comparative analysis

To verify the effectiveness and superiority of the dual-layer correlation network (DlCN) based approach, the Spearman based, Random Forest (RF) based and Max-Relevance and Min-Redundancy (mRMR) based approaches are selected to make comparison. The results obtained by the Spearman, RF and mRMR are shown in Table 2 and Table 3.

According to Table 2 and Table 3, the critical variables obtained by the Spearman, RF and mRMR based approaches are {No.3, No.8}, {No.2, No.5} and {No.2, No.7}, respectively. To verify the effectiveness of the approach and its generalization ability across different fault diagnosis models, BRB, SVM and BPNN are used, with diagnostic accuracy as the evaluation criterion. The results are shown in Figure 5.
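
The RF and mRMR selections can be read off Table 3 by ranking the scores. The sketch below assumes that each baseline keeps the two variables with the highest score; the paper reports only the resulting pairs, so this rule and the helper `top_two` are illustrative assumptions.

```python
# Scores from Table 3, keyed by variable number.
rf = {1: 0.0507, 2: 0.2126, 3: 0.0631, 4: 0.0743, 5: 0.2184,
      6: 0.0748, 7: 0.0769, 8: 0.1604, 9: 0.0689}
mrmr = {1: 0.2641, 2: 0.6931, 3: 0.0000, 4: 0.2535, 5: 0.2996,
        6: 0.3505, 7: 0.5060, 8: 0.3725, 9: 0.3725}

def top_two(scores):
    """Return the two variable numbers with the highest score (assumed selection rule)."""
    return set(sorted(scores, key=scores.get, reverse=True)[:2])

rf_pair = top_two(rf)      # {2, 5}
mrmr_pair = top_two(mrmr)  # {2, 7}
print(rf_pair, mrmr_pair)
```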

As shown in Figure 5, for every fault diagnosis model, the diagnostic accuracy obtained by the DlCN-based approach is superior to that obtained by the RF based, mRMR based and Spearman based approaches. Besides, it should be pointed out that the fault diagnostic accuracy obtained by the DlCN-based approach is the highest among all variable pair combinations. This is attributed to the fact that the mechanism-oriented correlation and the data-oriented correlation are considered comprehensively in the DlCN, which improves the performance of the fault-related monitoring variables selection. Moreover, it can also be seen that the diagnostic accuracy obtained by the Spearman based approach is the worst. This further indicates that fusing mechanism-oriented correlation enhances the comprehensiveness of variable correlation analysis.

4. Conclusion

To improve the performance of fault-related monitoring variables selection, a novel approach based on the DlCN is proposed in this paper. In the approach, the mechanism-oriented correlation and the data-oriented correlation are comprehensively considered and the dual-layer correlation network is constructed. Then the critical fault-related monitoring variables are selected based on the dual correlations. Through the experiments on vertical elevator fault diagnosis, the effectiveness and superiority of the proposed approach are verified. Moreover, the approach has been shown to be adaptable to different fault diagnosis models. Since the proposed DlCN-based approach can only select two critical variables, future work will focus on optimizing the approach so that any number of variables can be selected to suit different demand scenarios.

Figures

Figure 1. Fault-related monitoring variables selection based on dual-layer correlation networks

Figure 2. Elevator operation simulation system

Figure 3. MoCSN for vertical elevator (Gα)

Figure 4. DoCSN for vertical elevator (Gβ)

Figure 5. Comparison of fault diagnostic accuracy

Table 1. The physical meaning of monitoring variables

No.   Physical meaning
1     Angular acceleration of the motor pulley
2     Velocity of the motor pulley
3     Torque of the motor
4     Rope tension on the elevator cabin side
5     Velocity of the elevator cabin
6     Acceleration of the elevator cabin
7     Rope tension on the counterweight side
8     Velocity of the counterweight
9     Acceleration of the counterweight

Source(s): Authors' own creation

Table 2. Critical variables obtained by Spearman based approach

Variable No.   1        2        3        4        5        6        7        8        9
1              1.0000   0.2575   0.6944   0.8421   0.2612   0.8377   0.4493   0.2636   0.8378
2              0.2575   1.0000   0.0382   0.2727   0.9998   0.2732   0.1913   0.9924   0.2732
3              0.6944   0.0382   1.0000   0.7235   0.0361   0.7493   0.7644   0.0342   0.7493
4              0.8421   0.2727   0.7235   1.0000   0.2689   0.9945   0.4642   0.2722   0.9944
5              0.2612   0.9998   0.0361   0.2689   1.0000   0.2694   0.1926   0.9926   0.2794
6              0.8377   0.2732   0.7493   0.9945   0.2694   1.0000   0.4891   0.2728   0.9996
7              0.4493   0.1913   0.7644   0.4642   0.1926   0.4891   1.0000   0.1979   0.4888
8              0.2636   0.9924   0.0342   0.2722   0.9926   0.2728   0.1979   1.0000   0.2727
9              0.8378   0.2732   0.7493   0.9944   0.2794   0.9996   0.4888   0.2727   1.0000

Source(s): Authors' own creation

Table 3. Critical variables obtained by RF based approach and mRMR based approach

Variable No.   RF       mRMR
1              0.0507   0.2641
2              0.2126   0.6931
3              0.0631   0.0000
4              0.0743   0.2535
5              0.2184   0.2996
6              0.0748   0.3505
7              0.0769   0.5060
8              0.1604   0.3725
9              0.0689   0.3725

Source(s): Authors' own creation

References

Chandrashekar, G. and Sahin, F. (2014), “A survey on feature selection methods”, Computers and Electrical Engineering, Vol. 40 No. 1, pp. 16-28, doi: 10.1016/j.compeleceng.2013.11.024.

Clavijo, N., Melo, A., Soares, R.M., Campos, L.F.D.O., Lemos, T., Câmara, M.M., Anzai, T.K., Diehl, F.C., Thompson, P.H. and Pinto, J.C. (2021), “Variable selection for fault detection based on causal discovery methods: analysis of an actual industrial case”, Processes, Vol. 9 No. 3, p. 544, doi: 10.3390/pr9030544.

Deng, R., Wang, Z. and Fan, Y. (2020), “Fault relevant variable selection for fault diagnosis”, IEEE Access, Vol. 8, pp. 23134-23142, doi: 10.1109/access.2020.2970046.

Ghosh, K., Ramteke, M. and Srinivasan, R. (2014), “Optimal variable selection for effective statistical process monitoring”, Computers and Chemical Engineering, Vol. 60, pp. 260-276, doi: 10.1016/j.compchemeng.2013.09.014.

Gregorutti, B., Michel, B. and Saint-Pierre, P. (2017), “Correlation and variable importance in random forests”, Statistics and Computing, Vol. 27 No. 3, pp. 659-678, doi: 10.1007/s11222-016-9646-1.

Hassani, H., Hallaji, E., Razavi-Far, R. and Saif, M. (2021), “Unsupervised concrete feature selection based on mutual information for diagnosing faults and cyber-attacks in power systems”, Engineering Applications of Artificial Intelligence, Vol. 100, 104150, doi: 10.1016/j.engappai.2020.104150.

Jiang, Q., Yan, X. and Huang, B. (2015), “Performance-driven distributed PCA process monitoring based on fault-relevant variable selection and Bayesian inference”, IEEE Transactions on Industrial Electronics, Vol. 63 No. 1, pp. 377-386, doi: 10.1109/tie.2015.2466557.

Li, J., Duan, C. and Fei, Z. (2015), “A novel variable selection approach for redundant information elimination purpose of process control”, IEEE Transactions on Industrial Electronics, Vol. 63 No. 3, pp. 1737-1744, doi: 10.1109/tie.2015.2498909.

Li, Y., Yang, Y., Li, G., Xu, M. and Huang, W. (2017), “A fault diagnosis scheme for planetary gearboxes using modified multi-scale symbolic dynamic entropy and mRMR feature selection”, Mechanical Systems and Signal Processing, Vol. 91, pp. 295-312, doi: 10.1016/j.ymssp.2016.12.040.

Liang, J., Hou, L., Luan, Z. and Huang, W. (2019), “Feature selection with conditional mutual information considering feature interaction”, Symmetry, Vol. 11 No. 7, p. 858, doi: 10.3390/sym11070858.

Verron, S., Tiplica, T. and Kobi, A. (2008), “Fault detection and identification with a new feature selection based on mutual information”, Journal of Process Control, Vol. 18 No. 5, pp. 479-490, doi: 10.1016/j.jprocont.2007.08.003.

Vladić, J., Đokić, R., Kljajin, M. and Karakašić, M. (2011), “Modelling and simulations of elevator dynamic behaviour”, Tehnicki vjesnik/Technical Gazette, Vol. 18 No. 3, p. 423.

Yan, Z. and Yao, Y. (2015), “Variable selection method for fault isolation using least absolute shrinkage and selection operator (LASSO)”, Chemometrics and Intelligent Laboratory Systems, Vol. 146, pp. 136-146, doi: 10.1016/j.chemolab.2015.05.019.

Zhang, X., Mei, C., Chen, D. and Li, J. (2016), “Feature selection in mixed data: a method using a novel fuzzy rough set-based information entropy”, Pattern Recognition, Vol. 56, pp. 1-15, doi: 10.1016/j.patcog.2016.02.013.

Zhong, K., Han, M., Qiu, T., Han, B. and Chen, Y.W. (2019), “Distributed dynamic process monitoring based on minimal redundancy maximal relevance variable selection and Bayesian inference”, IEEE Transactions on Control Systems Technology, Vol. 28 No. 5, pp. 2037-2044, doi: 10.1109/tcst.2019.2932682.

Acknowledgements

We acknowledge financial support from the National Key R&D Program of China (2022YFE0210700), NSFC (62103121), the “Pioneer” and “Leading Goose” R&D Program of Zhejiang (2024C03254, 2023C01215, 2024C01208).

Corresponding author

Xiaobin Xu can be contacted at: xuxiaobin1980@163.com
