Application of complex network theory in identifying critical elements of CRH2 train system

Purpose – Based on complex network theory, this paper introduces a method for identifying the critical elements of the China Railway High-speed 2 (CRH2) train system.
Design/methodology/approach – First, two approaches, reliability theory and complex network theory, are introduced, and the advantages and disadvantages of applying each to the identification of critical elements in a high-speed train system are summarized. Second, a multi-layer multi-granularity network model including virtual and actual nodes is proposed, and the corresponding fusion rules for the same nodes in different layers are given.
Findings – Finally, taking the CRH2 train system as an example, the critical elements are identified using complex network theory, which provides a reference for train operation and maintenance.
© Huiru Zhang, Limin Jia, Li Wang and Yong Qin. Published in Smart and Resilient Transport. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode
This study is funded by the National Key Research and Development Program of China (2016YFB1200401).

identification method is needed in high-speed railway systems to achieve operational reliability, availability, maintainability and supportability (Saraswat and Yadava, 2008).

Related literature
The critical elements of the high-speed train system are the units that play an important role in maintaining its global topology and normal functions. Importance measures are the common means of identifying the critical elements of a system, and they can be roughly divided into the following two types.
1.2.1 Importance measurement based on reliability theory. In existing research on critical element identification, reliability importance indices have been widely used. Birnbaum's importance is a sensitivity analysis measure widely used in component reliability analysis (Wang et al., 2004). Critical importance is usually combined with fault tree analysis to measure the impact of failed critical components on system failure (Espiritu et al., 2007; Lambert, 1975). The reliability achievement worth (RAW) importance mainly measures how important a component is for maintaining the current reliability level of the system; conversely, the reliability reduction worth (RRW) importance is mainly used to analyze how strongly the current reliability level of the system is affected when the component is always unreliable (Bisanovic et al., 2016). Fussell-Vesely's importance is mainly used to evaluate the contribution to system failure of the minimal cut sets that contain the component (Van Der Borst and Schoonakker, 2001). The Bayesian reliability importance measures the probability that a component has failed given that the system has failed (Zhu and Kuo, 2014).
In general, the following assumptions are made before the analysis of reliability importance.
(1) Failure probabilities and repair times are independent.
(2) Component states and their associated probabilities are known.
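To make the reliability-based measures concrete, the following Python sketch computes them exactly for a small system under these two assumptions (independent components with known reliabilities). The system itself, its structure function `phi` and its minimal cut sets are hypothetical. The formulas are common textbook definitions in reliability form, for example RAW = R(1_i, p)/R(p) and RRW = R(p)/R(0_i, p), and they may differ in detail from the cited works.

```python
from itertools import product
from math import prod

# Hypothetical 3-component system (illustration only, not from the paper):
# component 0 in series with the parallel pair (1, 2).
def phi(x):
    """Structure function: 1 if the system works for component states x."""
    return int(x[0] == 1 and (x[1] == 1 or x[2] == 1))

MIN_CUT_SETS = [{0}, {1, 2}]  # minimal cut sets of phi

def prob_of(x, p):
    """Probability of state vector x, assuming independent components."""
    return prod(pk if xk else 1.0 - pk for xk, pk in zip(x, p))

def states(n):
    return product((0, 1), repeat=n)

def reliability(p):
    """Exact system reliability R(p) by full state enumeration."""
    return sum(prob_of(x, p) for x in states(len(p)) if phi(x))

def with_p(p, i, value):
    """Copy of p with component i forced to reliability `value` (1 = perfect, 0 = failed)."""
    q = list(p)
    q[i] = value
    return q

def ratio(a, b):
    return a / b if b > 0 else float("inf")

def importance_measures(p):
    r = reliability(p)
    unrel = 1.0 - r
    out = {}
    for i in range(len(p)):
        r_up = reliability(with_p(p, i, 1.0))    # R(1_i, p)
        r_down = reliability(with_p(p, i, 0.0))  # R(0_i, p)
        birnbaum = r_up - r_down
        cuts = [c for c in MIN_CUT_SETS if i in c]
        # probability that at least one minimal cut set containing i is fully failed
        fv_num = sum(prob_of(x, p) for x in states(len(p))
                     if any(all(x[k] == 0 for k in c) for c in cuts))
        out[i] = {
            "birnbaum": birnbaum,
            "critical": birnbaum * (1.0 - p[i]) / unrel,
            "raw": ratio(r_up, r),
            "rrw": ratio(r, r_down),
            "fussell_vesely": fv_num / unrel,
            "bayesian": (1.0 - p[i]) * (1.0 - r_down) / unrel,  # P(i failed | system failed)
        }
    return r, out

if __name__ == "__main__":
    r, measures = importance_measures([0.9, 0.8, 0.7])
    print(f"R = {r:.3f}")
    for comp, m in measures.items():
        print(comp, {k: round(v, 3) for k, v in m.items()})
```

Full enumeration costs 2^n evaluations of `phi`. This reflects the infeasibility argument for systems with tens of thousands of components.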
However, in real systems, components are interdependent in the process of implementing functions, and their reliabilities also affect one another (Dobson et al., 2007). Therefore, for a complex system such as the high-speed train system, it is almost infeasible to obtain accurate reliability importance information for each component in order to identify critical components, and doing so would also be uneconomical.
1.2.2 Importance measurement based on complex network theory. Since the groundbreaking work of Watts and Strogatz (1998) on small-world networks and of Barabási and Albert (1999) on scale-free networks, real-world phenomena have been studied from the perspective of networks and network theory. Taking components as nodes and their connections as arcs is the main means of abstracting actual systems into complex networks (Lin et al., 2018; Wang et al., 2017). Kou et al. (2018) proposed a new method that makes better use of network flow theory to represent the network: arcs represent the components, and nodes represent the transitive relations between them. For distributed and complex electromechanical systems, Wang et al. (2016) developed a penetrable visibility graph method combined with phase space reconstruction. Topological features include degree centrality (DC), betweenness centrality (BC), closeness centrality (CC), etc. (Bonacich, 2007; Brandes et al., 2016; Chen et al., 2012; Du et al., 2015; Hu et al., 2015).
The topological approach to measuring importance is quite popular. On the one hand, it has obvious advantages in the analysis of complex systems because it is relatively simple to use. On the other hand, it can identify elements of structural reliability, i.e. network edges and nodes whose failure can severely damage the network by physically disconnecting its parts. However, the traditional complex network approach focuses only on the topological characteristics of the network and ignores the physical significance of the components (Hines and Blumsack, 2008; Zio and Golea, 2012). It is therefore important to overcome these limitations, where possible, by complementing topological analysis with analyses of the actual characteristics of the components of complex systems (Bompard et al., 2009).

Contributions
A method of identifying key elements of the CRH2 train system based on integrated importance indices is introduced, which is a meaningful extension of the application of complex network theory to identify key components. Our work makes two important contributions.
(1) A multi-layer multi-granularity network model suitable for the identification of critical elements in the high-speed train system is presented, including virtual nodes and actual nodes. The rules for merging edges of the same component at different layers are given.
(2) Considering the topology structure, actual function and risk characteristics of the high-speed train system, an integrated importance ranking algorithm based on entropy weight and grey relational analysis is proposed. This algorithm compensates for the traditional complex network approach's neglect of the actual characteristics of components.
The rest of this paper is structured as follows: Section 2 describes the multi-layer multi-granularity network model and details the fusion rules. Section 3 explains the integrated importance ranking algorithm for the proposed network model. Section 4 presents a case study that verifies the effectiveness of the network model and the importance ranking algorithm. Finally, conclusions are drawn in Section 5.

The network model
Based on the definition of dependency relationships between elements of the train system in Wang et al. (2017), we classify the connections between components in the established network as mechanical, electrical and information connections. Accordingly, a mechanical layer, an electrical layer and an information layer are defined. The multi-layer multi-granularity network $S$ is built as follows:
$$S = \{S_i\}, \quad S_i = \{G_{i,\alpha}, G_{i,\beta}, G_{i,\gamma}\}, \quad G_{i,\alpha} = (V_i, E_{i,\alpha}), \quad G_{i,\beta} = (V_i, E_{i,\beta}), \quad G_{i,\gamma} = (V_i, E_{i,\gamma})$$
where $S_i$ is subsystem $i$ of the train system; $G_{i,\alpha}$, $G_{i,\beta}$ and $G_{i,\gamma}$ are the mechanical, electrical and information layers of subsystem $i$; $V_i$ is the set of nodes of subsystem $i$; and $E_{i,\alpha}$, $E_{i,\beta}$ and $E_{i,\gamma}$ are the sets of links of the three layers of subsystem $i$.
The node set of each subsystem is
$$V_i = \{v_i^{vir}\} \cup V_i^{real}, \quad V_i^{real} = \{v_{i,s}^{real}\}$$
where $v_i^{vir}$ is the virtual node of subsystem $i$, $V_i^{real}$ is the set of real nodes of subsystem $i$, and $v_{i,s}^{real}$ is real node $s$ of subsystem $i$. Note that nodes in different layers of the same subsystem are identical. The link set $E_{i,u}$ of layer $u \in \{\alpha, \beta, \gamma\}$ contains the links between real nodes $v_{i,s}^{real}$ and $v_{i,t}^{real}$ in that layer. The structure diagram of the network is shown in Figure 2. In the equipment system, each subsystem $S_i$ has one virtual node $v_i^{vir}$ and several real nodes, and two real nodes may have different relationships in the mechanical, electrical and information layers.
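The multi-layer structure can be held in a simple data structure. The sketch below is one possible representation under stated assumptions, not the authors' implementation. Each subsystem stores one node set shared by all three layers, plus a separate link set per layer. The virtual node links to every real node, which expresses affiliation. The naming convention for the virtual node and the example bogie links are hypothetical.

```python
from dataclasses import dataclass, field

# The three layers of the model: alpha (mechanical), beta (electrical), gamma (information).
LAYERS = ("mechanical", "electrical", "information")

@dataclass
class Subsystem:
    """Subsystem S_i: one virtual node plus real nodes shared by all three layers."""
    name: str
    real_nodes: set
    # E_{i,u} for each layer u; an undirected link is stored as frozenset({s, t}).
    links: dict = field(default_factory=lambda: {layer: set() for layer in LAYERS})

    @property
    def virtual_node(self) -> str:
        # Hypothetical labelling convention for v_i^vir (not taken from the paper).
        return self.name.upper()

    @property
    def nodes(self) -> set:
        """V_i = {v_i^vir} union V_i^real; the same set is used in every layer."""
        return {self.virtual_node} | set(self.real_nodes)

    def add_link(self, layer: str, s: str, t: str) -> None:
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer!r}")
        self.links[layer].add(frozenset((s, t)))

    def affiliation_links(self) -> set:
        """Virtual-node links expressing that each real node belongs to S_i."""
        return {frozenset((self.virtual_node, s)) for s in self.real_nodes}


if __name__ == "__main__":
    # Hypothetical links; only the component names come from the case study.
    bogie = Subsystem("bogie", {"bogie frame", "axle box", "gearbox", "motor bearing"})
    bogie.add_link("mechanical", "bogie frame", "axle box")
    bogie.add_link("mechanical", "gearbox", "motor bearing")
    bogie.add_link("information", "bogie frame", "axle box")
    print(sorted(bogie.nodes))
    print({layer: len(e) for layer, e in bogie.links.items()})
```

The node set is shared rather than copied per layer, which reflects the statement that nodes in different layers of the same subsystem are identical.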

Fusion rules of links of different layers
For the constructed multi-layer complex network, fusion rules for edges in different layers are given (Zhang et al., 2020). Through a common vertex, the connections in different layers can be merged. For example, through the common node 5, the corresponding links of the different layers are merged into one.
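A minimal sketch of layer fusion follows. It assumes that fusion means taking the union of the layer link sets over the shared node set: links joining the same two nodes in several layers collapse into one fused link. This is one plausible reading of the rule above. The full rules are given in Zhang et al. (2020) and may differ. The node labels are hypothetical.

```python
def fuse_layers(layer_links):
    """Merge the links of all layers into a single link set.

    layer_links maps a layer name to an iterable of (s, t) node pairs.
    Links that join the same two nodes in several layers collapse into one
    fused link; the layers it came from are kept as an attribute.
    """
    fused = {}
    for layer, links in layer_links.items():
        for s, t in links:
            key = frozenset((s, t))  # undirected: (5, 6) and (6, 5) are the same link
            fused.setdefault(key, set()).add(layer)
    return {key: sorted(layers) for key, layers in fused.items()}


if __name__ == "__main__":
    # Hypothetical links around the common node 5 (labels are illustrative only).
    layers = {
        "mechanical": [(5, 6), (5, 7)],
        "electrical": [(6, 5)],
        "information": [(5, 8)],
    }
    for link, origin in fuse_layers(layers).items():
        print(sorted(link), origin)
```

Recording which layers each fused link came from keeps the physical meaning of the connection available after fusion, while later computations work on one single-layer graph.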
3. An integrated importance ranking algorithm
Building on the topology structure, an importance ranking algorithm that also considers the actual function and risk characteristics of the complex network is proposed.

3.1 Selection and calculation of indices
3.1.1 Indices of topology characteristics. The topology indicator $K^{topo}_s$ of component $s$ consists of four indices. Topology degree $I^{topo}_{deg}(s)$ is the simplest centrality measure of a node in a complex network: the more links a node has, the more important it is. Topology closeness centrality $I^{topo}_{close}(s)$ represents how "close" a node is to all the others; the larger the value, the more important the node. Topology betweenness $I^{topo}_{betw}(s)$ refers to the number of shortest paths passing through a given node in the complex network; the larger the value, the more important the component. Topology efficiency $I^{topo}_{ne}(s)$ measures the network efficiency when component $s$ has failed; the smaller the value, the more important the component, which is the opposite of the criterion used for the other three indices. In these indices, $a_{st}$ is the entry in the $s$th row and $t$th column of the adjacency matrix; $m$ is the total number of nodes in the complex network $S$; $d_{st}$ is the shortest-path distance between nodes $s$ and $t$, i.e. the number of links on the shortest path between them; $\sigma_{ab}$ is the number of shortest paths between nodes $a$ and $b$, and $\sigma_{ab}(s)$ is the number of those paths that pass through node $s$. The indicator $K^{topo}_s$ mainly reflects the topological characteristics of the complex network: the larger $K^{topo}_s$, the more important the corresponding component is in the topology structure.
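The four topology indices can be computed on the fused, unweighted and undirected network with breadth-first search. The sketch below uses standard textbook definitions, and its normalizations are assumptions. Closeness is (reachable nodes − 1)/Σ d_st. Betweenness sums σ_ab(s)/σ_ab over unordered pairs. Post-failure efficiency averages 1/d_ab over ordered pairs of the m − 1 surviving nodes. The paper's own equations and the way the indices are combined into $K^{topo}_s$ are not reproduced here. The 4-node graph is hypothetical.

```python
from collections import deque


def bfs(adj, src, removed=None):
    """Hop distances and shortest-path counts from src, optionally skipping one failed node."""
    dist, sigma = {src: 0}, {src: 1}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w == removed:
                continue
            if w not in dist:           # first time w is reached: one hop further than u
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u lies on a shortest path to w
                sigma[w] += sigma[u]
    return dist, sigma


def topology_indices(adj):
    """Degree, closeness, betweenness and post-failure efficiency for each node."""
    nodes = list(adj)
    paths = {v: bfs(adj, v) for v in nodes}

    degree = {s: len(adj[s]) for s in nodes}  # sum over t of a_st

    closeness = {}
    for s in nodes:
        dist = paths[s][0]
        total = sum(dist.values())
        closeness[s] = (len(dist) - 1) / total if total else 0.0

    betweenness = {s: 0.0 for s in nodes}
    for i, a in enumerate(nodes):
        dist_a, sig_a = paths[a]
        for b in nodes[i + 1:]:                 # each unordered pair counted once
            if b not in dist_a:
                continue
            for s in nodes:
                if s in (a, b) or s not in dist_a:
                    continue
                dist_s, sig_s = paths[s]
                if b in dist_s and dist_a[s] + dist_s[b] == dist_a[b]:
                    # sigma_ab(s) = sigma_as * sigma_sb when s lies on a shortest a-b path
                    betweenness[s] += sig_a[s] * sig_s[b] / sig_a[b]

    efficiency = {}
    for s in nodes:
        rest = [v for v in nodes if v != s]
        total = 0.0
        for a in rest:
            dist, _ = bfs(adj, a, removed=s)
            total += sum(1.0 / d for v, d in dist.items() if v != a)  # unreachable pairs add 0
        pairs = len(rest) * (len(rest) - 1)
        efficiency[s] = total / pairs if pairs else 0.0
    return degree, closeness, betweenness, efficiency


if __name__ == "__main__":
    # Hypothetical 4-node network: node 1 is a hub linked to nodes 0, 2 and 3.
    adj = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
    for name, values in zip(("deg", "close", "betw", "ne"), topology_indices(adj)):
        print(name, values)
```

In the example, the hub node 1 has the largest degree, closeness and betweenness, and its removal drops the efficiency to 0. This matches the opposite sense of $I^{topo}_{ne}$ noted above.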
3.1.2 Indices of function characteristics. On the basis of $K^{topo}_s$, the function importance indicator $K^{func}_s$ is defined according to the importance of the components to train operation.
In this indicator, $v_s$ is the coefficient representing the function importance of node $s$; its value can be assigned by expert scoring (Table 1).
Drawing on rich practical experience, experts score the importance of each node on a scale from 0 to 1. A large value of $K^{func}_s$ indicates that the corresponding component plays an important role in ensuring the normal operation of the train.
3.1.3 Indices of risk characteristics. Considering both the possibility and the impact of failure, the risk indicator $K^{risk}_s$ is defined. In this indicator, $p_s$ is the occurrence frequency coefficient of node $s$ counted from fault data (Table 2); $q_{sa}$ is the impact on node $a$ after node $s$ fails; $d^{-}_{st}$ is the risk shortest path from node $s$ to node $t$; and $l_s$ is the severity of the impact on train operation when node $s$ fails, obtained from historical text data. Here, an impact on train operation means that the train has to be stopped temporarily. $l_s$ expresses how far the node has degraded from its optimal state, on a scale from 0 to 100. The larger $K^{risk}_s$ is, the greater the impact on train operation when the corresponding component fails.
3.2 Integrated importance measure
An importance ranking algorithm combining the entropy weight method and grey relational analysis is introduced here to measure the integrated importance.
3.2.1 Index preprocessing. Because the indices differ in their goals and directions, the performance values of every component must be transformed into a comparability sequence. If there are $m$ components and $n$ indices, the $s$th component can be expressed as $I_s = (I_{s1}, I_{s2}, \ldots, I_{sx}, \ldots, I_{sn})$, where $I_{sx}$ is the performance value of index $x$ for component $s$. $I_s$ is translated into $D_s$ using equation (7) for the-larger-the-better indices and equation (8) for the-smaller-the-better indices.
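Equations (7) and (8) are not reproduced in this text. The sketch below assumes the common min-max forms used in grey relational analysis. For larger-the-better indices, d_sx = (I_sx − min I_x)/(max I_x − min I_x). For smaller-the-better indices, d_sx = (max I_x − I_sx)/(max I_x − min I_x). The treatment of a constant column is also an assumption, and the input values are hypothetical.

```python
def normalize_column(values, larger_is_better=True):
    """Min-max map one index column onto [0, 1]; 1 is always the 'best' value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant index cannot separate components, so all get 1.
        return [1.0] * len(values)
    if larger_is_better:
        return [(v - lo) / (hi - lo) for v in values]   # the-larger-the-better
    return [(hi - v) / (hi - lo) for v in values]       # the-smaller-the-better


def preprocess(I, larger_is_better):
    """Turn the m x n matrix I[s][x] = I_sx into comparability sequences D[s][x] = d_sx."""
    columns = zip(*I)
    normalized = [normalize_column(list(col), flag)
                  for col, flag in zip(columns, larger_is_better)]
    return [list(row) for row in zip(*normalized)]


if __name__ == "__main__":
    # Hypothetical values: index 0 is larger-the-better, index 1 smaller-the-better
    # (as the topology efficiency index is in Section 3.1.1).
    I = [[1, 10], [3, 20], [2, 30]]
    print(preprocess(I, [True, False]))
```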
3.2.2 Entropy weight calculation. Entropy is well suited to measuring the utility value of indices, representing the average intrinsic information they transmit for decision-making. In general, the smaller the entropy $e_x$ of an index, the greater the variation in its values, the more information it provides and the greater its weight. Conversely, the greater the entropy, the smaller the weight.
In the entropy weight calculation, $m$ is the number of evaluation objects (components), $n$ is the number of selected indices, $f_{sx}$ is the proportion of the $s$th component in the $x$th index, $e_x$ is the entropy of the $x$th index and $w_x$ is the weight of the $x$th index.
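The sketch below uses the standard entropy weight formulas, which are assumed here because the paper's equations are not shown in this text:

- f_sx = d_sx / Σ_s d_sx
- e_x = −(1/ln m) Σ_s f_sx ln f_sx, with 0·ln 0 taken as 0
- w_x = (1 − e_x) / Σ_x (1 − e_x)

The input matrix is hypothetical preprocessed data.

```python
from math import log


def entropy_weights(D):
    """Entropy weights w_x for an m x n matrix of preprocessed values d_sx >= 0.

    Assumes m >= 2 and that every column has a positive sum.
    """
    m, n = len(D), len(D[0])
    k = 1.0 / log(m)                      # scales e_x into [0, 1]
    entropies = []
    for x in range(n):
        column = [D[s][x] for s in range(m)]
        total = sum(column)
        e_x = 0.0
        for value in column:
            f_sx = value / total           # proportion of component s in index x
            if f_sx > 0:                   # convention: 0 * ln 0 = 0
                e_x -= k * f_sx * log(f_sx)
        entropies.append(e_x)
    divergence = [1.0 - e for e in entropies]   # low entropy -> high divergence
    total_div = sum(divergence)
    weights = [d / total_div for d in divergence]
    return weights, entropies


if __name__ == "__main__":
    # Column 0 is identical for both components (no information); column 1 varies.
    w, e = entropy_weights([[1.0, 0.0], [1.0, 1.0]])
    print([round(v, 6) for v in w], [round(v, 6) for v in e])
```

In the usage example, the column with identical values has entropy 1 and receives zero weight. This is the behaviour described above: an index that does not discriminate between components carries no information.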

Grey relational analysis.
The state of the components in the high-speed train system is regarded as a grey system, and the critical elements are identified based on the value of the grey relational grade. The reference sequence is defined as $d_{0x} = \max(d_{sx}, s = 1, 2, \ldots, m)$. In the grey relational analysis, $\gamma(I_{0x}, I_{sx})$ is the grey relational coefficient between $I_{0x}$ and $I_{sx}$; $\rho$ is the distinguishing coefficient, $\rho \in [0,1]$; and $C_s$ is the weighted grey relational grade.
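The grey relational step can be sketched with the standard formulas. These are assumed here because the paper's equations are not shown in this text:

- Δ_sx = |d_0x − d_sx|
- γ_sx = (Δ_min + ρ Δ_max) / (Δ_sx + ρ Δ_max)
- C_s = Σ_x w_x γ_sx

The reference d_0x is the column maximum, and the weights come from the entropy step. The default ρ = 0.5 is a common choice, not a value stated in the paper, and the matrix and weights are hypothetical.

```python
def grey_relational_grades(D, weights, rho=0.5):
    """Weighted grey relational grade C_s of each component against the reference sequence."""
    if not 0 < rho <= 1:
        raise ValueError("rho must lie in (0, 1]")
    m, n = len(D), len(D[0])
    reference = [max(D[s][x] for s in range(m)) for x in range(n)]  # d_0x
    deltas = [[abs(reference[x] - D[s][x]) for x in range(n)] for s in range(m)]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:              # every component equals the reference sequence
        return [float(sum(weights))] * m
    grades = []
    for row in deltas:
        # grey relational coefficient for each index, in (0, 1]
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(w * g for w, g in zip(weights, coeffs)))
    return grades


if __name__ == "__main__":
    D = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    grades = grey_relational_grades(D, [0.5, 0.5])
    ranking = sorted(range(len(D)), key=lambda s: -grades[s])
    print(grades, ranking)
```

Components are then ranked by decreasing C_s. A component that matches the reference on every index gets a grade equal to the sum of the weights, i.e. 1.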

Numerical example
A typical CRH2 train system is taken as an example to illustrate the feasibility of the model and algorithm proposed in this study. The code is implemented in R i386 3.4.3, Gephi is used to draw the graphics, and the computation runs on a 64-bit Windows operating system.

Data and parameters
A rail train comprises more than 40,000 components (Kou et al., 2018). To facilitate the analysis, we select some representative elements for study and finally form the set of elements given in Table 3. Note that virtual and actual nodes are encoded together in the network construction. For example, the virtual node "PANTOGRAPH" is not shown in Table 3, but its label is "2".

Results
4.2.1 Fusion rules of links of different layers. The bogie subsystem is taken as an example to illustrate the fusion rules. The CRH2 train adopts the 4M4T formation (four motor cars and four trailer cars). Its motor cars use the SKMB-200 power bogie and its trailer cars use the SKTB-200 trailer bogie. According to the structure and fault data of the CRH2 bogie and the component extraction rules, a total of 31 components are extracted as nodes in each layer of the multi-layer network model of the bogie subsystem (Table 4).
The node Bogie is a virtual node connected to all the other nodes of the bogie subsystem, representing the affiliation between the subsystem and its components. In Figure 4, the affiliation connections are blue and the actual connections are shown in shades of pink whose intensity is proportional to node degree. In Figure 4(c), the nodes labeled "50" and "57" are the motor and the main windpipe, which are hidden in the fused bogie network because they belong to other subsystems. Figure 4(d) is the result of fusing the mechanical, electrical and information layers and contains all the connection relationships of the bogie subsystem. The fused network takes all connection relationships into account while greatly reducing the complexity of multi-layer network computation. Next, the fusion rules are applied to the complex network of the high-speed train system, and the critical elements are identified.
4.2.2 Indices calculation. After the fusion rules are applied, the complex network of the high-speed train system has 130 nodes and 370 edges. In Figure 5, nodes are colored according to their outdegree, which reflects the activity of a node in the network (Snijders, 2003). The outdegree of each virtual node is relatively large because it has affiliation relationships with all actual nodes in its subsystem.
4.2.2.1 Indices in topology characteristics. $I^{topo}_{deg}$, $I^{topo}_{close}$, $I^{topo}_{ne}$ and $I^{topo}_{betw}$ are calculated as shown in Figure 6. Ranking by each index separately would give quite different results, so the topology index $K^{topo}$ is important for presenting the integrated topological importance (Figure 7). The weights of the indices are $W^{topo} = \{0.280, 0.281, 0.165, 0.274\}$. The nodes are ranked by $K^{topo}$, and the top 30 are shown in Table 5. The bogie frame has the largest value, $K^{topo}_{70} = 0.788$; that is, it is the most important component from the perspective of topology structure. It is followed by the car body, motor bearing, axle box, etc., which together form the topologically critical components. Overall, $K^{topo}$ varies markedly across the whole system, and the bogie and air-brake subsystems score higher than the other parts. Next, $K^{func}$ and $K^{risk}$ are calculated separately based on the result of $K^{topo}$.
4.2.2.2 Indices in function characteristics. Based on the topology indices and the function scores, $K^{func}$ is obtained (Figure 8). The weights of the indices are $W^{func} = \{0.272, 0.235, 0.165, 0.328\}$. The nodes are ranked by $K^{func}$, and the top 30 are shown in Table 6.
From the perspective of function importance, the motor bearing is the most important component, with the largest value $K^{func}_{50} = 0.613$. It is followed by the bogie frame, axle box, brake cylinder, wheel, etc. Whereas $K^{topo}$ differs significantly between components, most components in the network have a high $K^{func}$, with an average of 0.35. That is to say, the components selected in this paper are all very important to train operation.
4.2.2.3 Indices in risk characteristics. Similarly, $K^{risk}$ is obtained as shown in Figure 9, and the top 30 components are shown in Table 7. The weights of the indices are $W^{risk} = \{0.286, 0.266, 0.161, 0.287\}$. In terms of both the probability of risk and the severity of its consequences, the bogie frame is the most important component, with $K^{risk}_{70} = 0.584$. It is followed by the IGBT, gearbox, motor bearing, axle box, etc. Nodes with high $K^{topo}$ and $K^{func}$ are expected to have a small probability of degrading to failure. However, once such components fail, the consequences are very serious, so their $K^{risk}$ can still be very large, as for the bogie frame.
4.2.2.4 The important elements. The maximum values of the topology index $K^{topo}$, the function index $K^{func}$ and the risk index $K^{risk}$ are used as the grey reference sequence. Based on grey relational analysis, the integrated importance indicator $K$ is obtained, from which the critical components are identified (Table 8). Overall, a comprehensive analysis of the three indicators of topology, function and risk shows that the most important element of the CRH2 train system is the bogie frame. The bogie frame is the basic load-bearing structure of the bogie and the mounting base for its various components. It is connected to almost all components of the bogie subsystem, so its topology index value is very high. Its integrated importance, which builds on the topological structure, is also relatively high, so it is a component that requires the focus of the relevant railway departments. The car body, motor bearing, axle box, gearbox and brake cylinder rank in the top 10 in all tables, which means that they are very important and require more attention to keep the train operating safely. Here, the car body is defined as a combination of electromechanical components, i.e. everything in the train system outside the other five subsystems explicitly given in the text. Most of the key elements of the CRH2 train system identified by complex network theory belong to the bogie subsystem, so this subsystem needs special attention.

Conclusions
This study introduces a method for identifying critical elements of the CRH2 train system based on complex network theory. A multi-layer multi-granularity network model suitable for the CRH2 train system, including virtual nodes and actual nodes, is presented. Based on network characteristic indices, entropy weights and grey relational analysis, an integrated importance ranking algorithm that considers the three dimensions of topology, function and risk is proposed. Finally, a case study of a CRH2 train system is presented. Compared with identification by the topology, function and risk indicators separately, the critical elements identified by the integrated importance ranking algorithm are more reasonable because the algorithm comprehensively considers the characteristics of the complex network in all three dimensions. The most important of the identified key elements is the bogie frame, and most of the key components belong to the bogie subsystem of the train, which means that this subsystem needs the most attention from the relevant staff. In summary, the proposed critical element identification method enables decision-makers not only to improve train reliability in the design phase but also to allocate maintenance resources more reasonably during the operation phase. Moreover, the method is general and can be applied to the identification of critical elements of other types of trains, given the corresponding data.