Causality analysis in process control based on denoising and periodicity-removing CCM

Purpose – 1. To improve causality analysis performance, a novel causality detector based on time-delayed convergent cross mapping (TD-CCM) is proposed in this work. 2. To identify the root cause of plant-wide oscillations in process control systems.
Design/methodology/approach – A novel causality analysis framework is proposed based on denoising and periodicity-removing TD-CCM. We first point out that noise and periodicity have adverse effects on causality detection. Then, empirical mode decomposition (EMD) and detrended fluctuation analysis (DFA) are combined to achieve denoising. The periodicities are effectively removed through singular spectrum analysis (SSA). Following this, TD-CCM can accurately capture the causalities and locate the root cause by analyzing the filtered signals.
Findings – 1. A novel causality detector based on denoising and periodicity-removing TD-CCM is proposed. 2. Simulation studies show that the proposed method is able to improve the causality analysis performance. 3. An industrial case study shows that the proposed method can be used to analyze the root cause of plant-wide oscillations in process control systems.
Originality/value – 1. A novel causality detector based on denoising and periodicity-removing TD-CCM is proposed. 2. The influences of noise and periodicity on causality analysis are investigated. 3. Simulations and an industrial case show that the proposed method can improve the causality analysis performance and can be used to identify the root cause of plant-wide oscillations in process control systems.


Introduction
Oscillations are a common cause of performance degradation in process control systems. They may originate in one control loop and propagate to other units (Lang et al., 2018a), so that the whole plant may eventually oscillate. These oscillations are likely to result in wasted energy and materials, fluctuating product quality, and even reduced safety (Chen et al., 2020a). It is important for engineers and operators to correctly identify the root cause of oscillations as soon as possible (Yang et al., 2014; Lang et al., 2019). However, due to the disturbances of noise, periodicities, nonlinearities (Lang et al., 2018b) and nonstationarities (Xie et al., 2016), this goal is difficult to achieve (Chen et al., 2020b). Therefore, many data-driven causality analysis methods have been introduced in the past decade to reveal the propagation paths and locate the root cause (Lindner et al., 2019a). A brief review is provided in the following.
Bauer and Thornhill (2008) analyzed the cross-correlation between variables to detect causality. In this method, the time delay corresponding to the maximum value of the cross-correlation function (CCF) is regarded as the optimal delay. This CCF-based approach is simple and practical, but its results are not reliable in the presence of nonlinearity. Yuan and Qin (2014) combined principal component analysis (PCA) and multivariate Granger causality to find the root cause of plant-wide oscillations. Granger causality is a popular method in various applications. However, it requires the variables to be independent and to satisfy the linear stationary condition. Besides the causality methods in the time domain, some related approaches have been developed from the perspective of the frequency domain, such as spectral Granger causality (SGC), partial directed coherence (PDC) (Zhang et al., 2015) and the directed transfer function (DTF) (Yang et al., 2014), to name a few. Spectral Granger causality is based on the Fourier transform, which is limited to processing linear and stationary signals. Both PDC and DTF can describe the direction of cause and effect, but they are not able to quantify the direct causality.
The above methods are restricted to analyzing linear relationships, which is not consistent with the fact that most industrial processes are nonlinear. Bayesian network learning (Yang and Xiao, 2006) utilizes a graph structure with conditional probabilities to describe the causalities among various process variables. Nevertheless, a Bayesian network is a directed acyclic graph, which is not suitable for modeling dynamic processes. Moreover, the physical interpretation of the probabilities in a Bayesian network is not straightforward, so it is not easily accepted by engineers. Although Richardson et al. (1996) developed a cyclic causal discovery framework that allows the existence of cycles, the massive industrial data it requires are difficult to collect. Lindner et al. (2019a) made a comprehensive analysis of Granger causality and transfer entropy in process control systems, and Lindner et al. (2019b) provided a systematic workflow for oscillation diagnosis using transfer entropy. However, the performance of transfer entropy depends on the probability density function, which is not easy to estimate.
Sugihara et al. (2012) proposed a novel causality method called convergent cross mapping (CCM) for analyzing relationships in complex systems. It is mainly based on Takens' theorem (Takens, 1981): if variable X is a cause of variable Y, then the historical information of X can be recovered from Y alone. CCM uses simplex projection to quantify the correspondence between the states of X and Y. The Pearson correlation coefficient between the states estimated from Y and the real states is adopted to measure the cross mapping ability, which indicates the degree of causality. CCM has been widely used in various cases, but its criterion of convergence is too subjective. Recently, Ye et al. (2015) developed time-delayed convergent cross mapping (TD-CCM) by assuming that there is a time lag between the cause variable and the effect variable. It is reported that CCM shows the best performance when the two variables are matched in accordance with the proper time lag (Luo et al., 2017). TD-CCM overcomes the subjectivity of the conclusions drawn with the original CCM and directly gives the corresponding results. TD-CCM is the most promising development of CCM in recent years. However, when it is applied to analyze oscillations in process control, noise and periodicity degrade its performance. To eliminate these influences, this paper proposes to combine empirical mode decomposition (EMD) (Huang et al., 1998) and detrended fluctuation analysis (DFA) (Kantelhardt et al., 2002) to achieve denoising. The periodicities are removed by singular spectrum analysis (SSA) (Hassani and Thomakos, 2010). Following this, TD-CCM can accurately capture the causalities and locate the root cause by analyzing the filtered signals. Simulations show that the proposed denoising and periodicity-removing framework is able to improve the causality analysis performance of TD-CCM. Finally, the effectiveness and advantages of the proposed method are validated on the Tennessee Eastman process.
The remainder of this paper is organized as follows. Section 2 provides an overview of CCM and TD-CCM. The proposed causality analysis framework based on denoising and periodicity-removing TD-CCM is detailed in Section 3. The industrial case is studied in Section 4, followed by conclusions.
2. Preliminaries
2.1 Convergent cross mapping
Sugihara et al. (2012) proposed convergent cross mapping (CCM), based on Takens' theorem (Takens, 1981), to analyze the causality between different variables in a nonlinear system. CCM assumes that if variable X has an effect on variable Y, denoted as X → Y, then Y will contain information about X. The corresponding causality can be tested by measuring the correlation between the reconstructed manifolds of X and Y. More specifically, for two time series X and Y of length N, the reconstructed states at time t can be expressed as
$$X_t = [X(t), X(t-\tau), X(t-2\tau), \ldots, X(t-(E-1)\tau)],$$
$$Y_t = [Y(t), Y(t-\tau), Y(t-2\tau), \ldots, Y(t-(E-1)\tau)], \quad (1)$$
where $E$ is the embedding dimension and $\tau$ is the time lag (default is 1). The embedding dimension is selected according to the G-curve method (Liu et al., 2008). The sets of $X_t$ and $Y_t$ correspond to the shadow manifolds $M_X$ and $M_Y$, respectively. According to Takens' theorem (Takens, 1981), if X and Y are coupled, $M_X$ and $M_Y$ are different observation forms of the original manifold, i.e. they are diffeomorphic. Conversely, if there is no relationship between X and Y, the reconstructed states of $M_X$ and $M_Y$ will be far apart. When X has an effect on Y, the information of $M_X$ can be accurately estimated from $M_Y$, but $M_Y$ cannot be recovered from $M_X$. CCM uses the Pearson correlation coefficient to quantify the accuracy of the estimation of X from Y, shown as
$$\rho = \mathrm{corr}\left(X, \hat{X} \mid M_Y\right), \quad (2)$$
where $\hat{X} \mid M_Y$ denotes the estimate of X obtained from $M_Y$. The larger $\rho$ is, the stronger the influence of X on Y.
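For readers unfamiliar with simplex projection, the cross-map estimate of X from $M_Y$ is commonly written as below. This follows the standard formulation of Sugihara et al. (2012) rather than anything stated explicitly in this section; the symbols $t_i$, $u_i$ and $w_i$ are introduced here purely for illustration.

```latex
\hat{X}(t) \mid M_Y = \sum_{i=1}^{E+1} w_i \, X(t_i), \qquad
w_i = \frac{u_i}{\sum_{j=1}^{E+1} u_j}, \qquad
u_i = \exp\!\left( - \frac{\lVert Y_t - Y_{t_i} \rVert}{\lVert Y_t - Y_{t_1} \rVert} \right)
```

Here $t_1, \ldots, t_{E+1}$ are the time indices of the $E+1$ nearest neighbors of $Y_t$ on $M_Y$, ordered from closest to farthest. The weights decay exponentially with distance, so the estimate is dominated by the neighbors that lie closest to $Y_t$.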

Time-delayed convergent cross mapping
It is reported that CCM is effective in detecting causality in systems with weak to moderate coupling strength, but strong unidirectional forcing may lead to the phenomenon of generalized synchrony (Sugihara et al., 2012). Besides, Ye et al. (2015) pointed out that CCM suffers from two main limitations: (1) the method only considers the causality between variables at the same time (zero lag), whereas many causal relationships in actual processes involve a time delay, for which CCM meets difficulties or fails; (2) CCM judges causality by observing whether the correlation coefficient curve converges, which is subjective. To tackle these issues, Ye et al. (2015) proposed time-delayed CCM (TD-CCM).

The main difference between CCM and TD-CCM is that the latter uses $M_Y$ to estimate $x_{t+\lambda}$, in which $\lambda$ is the time lag, whereas in CCM, $M_Y$ is used to estimate $x_t$.
In practice, different values of $\lambda$ are tested to calculate the cross mapping index $\rho$. The $\lambda$ corresponding to the maximum value of $\rho$ is the optimal time delay $\lambda^*$. $\lambda^* \le 0$ corresponds to the relationship X → Y, because $\lambda^* \le 0$ means that past information of the cause variable is contained in the effect variable. If $\lambda^* > 0$, it would mean that future information of the cause variable predicts the effect variable, which is physically implausible. Thus, $\lambda^* > 0$ indicates that the causal relationship from X to Y is not tenable. In this way, TD-CCM can automatically and objectively determine causality.
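The lag scan described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function names `delay_embed`, `cross_map_skill` and `td_ccm` are hypothetical, the whole series serves as the neighbor library, and the neighbor weighting follows the usual simplex projection.

```python
import numpy as np

def delay_embed(series, E, tau=1):
    """Rows are [s(t), s(t - tau), ..., s(t - (E-1)*tau)] for t = (E-1)*tau .. N-1."""
    series = np.asarray(series, dtype=float)
    start = (E - 1) * tau
    n = len(series)
    return np.column_stack([series[start - k * tau: n - k * tau] for k in range(E)])

def cross_map_skill(x, y, E, lag=0, tau=1):
    """Pearson correlation between x(t + lag) and its estimate from the shadow manifold of y."""
    x = np.asarray(x, dtype=float)
    My = delay_embed(y, E, tau)
    t = np.arange((E - 1) * tau, len(x))          # time index of every row of My
    valid = (t + lag >= 0) & (t + lag < len(x))   # keep rows whose lagged target exists
    My, target = My[valid], x[t[valid] + lag]
    # Pairwise distances on the manifold; a point is never its own neighbor.
    dist = np.linalg.norm(My[:, None, :] - My[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    nn = np.argsort(dist, axis=1)[:, :E + 1]      # E + 1 nearest neighbors
    d = np.take_along_axis(dist, nn, axis=1)
    w = np.exp(-d / np.maximum(d[:, :1], 1e-12))  # exponential simplex weights
    w /= w.sum(axis=1, keepdims=True)
    estimate = (w * target[nn]).sum(axis=1)
    return np.corrcoef(target, estimate)[0, 1]

def td_ccm(x, y, E, lags, tau=1):
    """Scan the lags for the hypothesis x -> y; return (optimal lag, skill per lag)."""
    skills = np.array([cross_map_skill(x, y, E, lag, tau) for lag in lags])
    return lags[int(np.argmax(skills))], skills
```

As a toy check, one can take a chaotic logistic map for x and set y(t) = x(t - 3): calling `td_ccm(x, y, E=2, lags=list(range(-6, 7)))` should then return a negative optimal lag, while swapping the arguments should return a positive one, so only x → y is accepted.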
Herein, a coupled Lorenz system (3) is constructed to show the performance of CCM and TD-CCM.
where $\xi$ is noise. $X_1$, $Y_1$ and $Z_1$ belong to system 1; $X_2$, $Y_2$ and $Z_2$ belong to system 2; $Y_1$ and $Z_1$, and $Y_2$ and $Z_2$, are respectively coupled. $Y_2$ in system 2 affects $Y_1$ in system 1 in one direction. $Z_1$ and $Z_2$ are selected for causality analysis. Note that $Z_2$ is the cause of $Z_1$. According to the G-curve method (Liu et al., 2008), the embedding dimension is set as $E^* = 3$.
The causality results of CCM and TD-CCM are displayed in Figures 1a and b, respectively. It can be seen from Figure 1a that both the red and blue curves tend to converge, which indicates that $Z_1$ and $Z_2$ are mutually coupled, i.e. two-way causality. This judgment contradicts the real situation of the system. From Figure 1b, it is observed that the optimal time lag of $Z_2 \to Z_1$ is $\lambda^* = -50$, which indicates $Z_2 \to Z_1$; while for $Z_1 \to Z_2$, the optimal time lag is positive, which is not consistent with the fact that the effect must happen after the cause. It is concluded that $Z_2$ is the cause of $Z_1$. Therefore, compared with the original CCM, TD-CCM not only determines causality automatically and objectively, but also solves the problem that the original CCM is not able to deal with strong coupling relationships.

Proposed framework
Although TD-CCM shows better performance than the original CCM, it meets difficulties when processing signals from process control systems. Data collected from industrial environments are contaminated by noise and periodicity (Chen et al., 2019), which have an adverse influence on TD-CCM. Therefore, a denoising and periodicity-removing framework is proposed in this section to improve the performance of TD-CCM.

Adverse influences of noise and periodicity on TD-CCM
A four-input and four-output system with correlated disturbances is taken from Wang et al.
Figure 1. For (a) CCM, the horizontal axis and vertical axis stand for the sample length and causality strength, respectively. From the trends, it is judged that $Z_1$ and $Z_2$ are mutually coupled, i.e. two-way causality. For (b) TD-CCM, the horizontal axis and vertical axis stand for the time lag and causality strength, respectively. It is observed that the optimal time lag of $Z_2 \to Z_1$ is $\lambda^* = -50$, which indicates $Z_2 \to Z_1$; while for $Z_1 \to Z_2$, the optimal time lag is positive, which is not consistent with the fact that the effect must happen after the cause. Therefore, it is concluded that $Z_2 \to Z_1$.
Second, we test the performance of TD-CCM when the signals are contaminated by various periodicities. The amplitudes and periods of the periodicities vary from 12.5% to 100% and from 30 to 100 samples per cycle, respectively. The corresponding causality results obtained from TD-CCM are listed in Table 2. It is observed that, due to the presence of periodicities, TD-CCM often misjudges the causalities in this system. Therefore, the periodicities should be removed before TD-CCM is applied to detect causality.
Remark: The adverse effects of noise and periodicity on TD-CCM can be explained from the following two aspects: (1) TD-CCM involves a nearest-neighbor search. Noise disturbs the nearest-neighbor algorithm, so the causality results obtained from TD-CCM may be wrong; (2) Periodicities strengthen the spurious coordination between variables and thus mask the genuine information transfer, which may lead to two-way causality or missing relationships.

Denoising and periodicity-removing
The previous subsection demonstrates the necessity of a denoising and periodicity-removing procedure. Herein, a denoising and periodicity-removing framework is proposed to achieve this goal.
First, the signals are decomposed into a series of modes by empirical mode decomposition (EMD) (Huang et al., 1998), a modern time-frequency analysis technique. Traditional signal processing methods, such as the Fourier transform, are limited to processing linear and stationary signals (Lang et al., 2020a). Although wavelet decomposition can be used to deal with complex signals, its parameters, such as the decomposition level and the mother wavelet, must be provided in advance. In contrast, EMD is fully adaptive and data-driven. It is capable of extracting the intrinsic mode functions from nonlinear and nonstationary signals through a recursive sifting process that makes use of the signal extrema. More specifically, the upper and lower envelopes are obtained by interpolating the extrema of the signal, and their average gives the local mean of the signal. This local mean is a low-frequency estimate of the data, and it is removed from the input data recursively to generate the high-frequency mode in the signal. The process is repeated until all principal oscillatory modes present in the data are recovered. In EMD, the signal $x(t)$ can be expressed as
$$x(t) = \sum_{i} d_i(t) + r(t),$$
where $d_i(t)$ is the $i$-th mode and $r(t)$ is the trend.
After the signal is decomposed into a set of modes by EMD, the next task is to identify which modes are noise. Many methods can be adopted, such as permutation entropy (Lang et al., 2020b) and the normalized correlation coefficient (Chen et al., 2020c). Herein, detrended fluctuation analysis (DFA) (Bryce and Sprague, 2012) is utilized to identify whether a mode is noise. Specifically, a measurement factor $\alpha$ of each mode is calculated by DFA. If $\alpha \le 0.5$, the corresponding mode is regarded as noise. That is to say, only modes with $\alpha > 0.5$ are retained. The threshold of 0.5 is recommended by Peng et al. (1995).
Now, we turn our attention to removing periodicities.
First, the signal is decomposed by singular spectrum analysis (SSA) to obtain the eigenvalues and eigenvectors. Because the eigenvectors corresponding to a periodicity are a sine and a cosine sequence with the same frequency and phase, the scatter diagram of such a pair of eigenvectors forms an approximately regular polygon. In this way, the periodicities can be distinguished and then removed from the signal.
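The SSA step can be illustrated with basic NumPy, as a sketch under stated assumptions. The helper names `ssa_components` and `is_periodic_pair`, the window length, and the tolerance are all hypothetical. The radius test is only a crude numerical stand-in for the visual polygon check on the scatter diagram: a sine/cosine pair traces points at a nearly constant distance from the origin.

```python
import numpy as np

def ssa_components(signal, window):
    """Return singular values, left singular vectors and elementary reconstructed series."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    k = n - window + 1
    # Trajectory (Hankel) matrix: column j holds signal[j : j + window].
    traj = np.column_stack([signal[j:j + window] for j in range(k)])
    U, s, Vt = np.linalg.svd(traj, full_matrices=False)
    comps = np.empty((len(s), n))
    for i in range(len(s)):
        elem = s[i] * np.outer(U[:, i], Vt[i])
        # Diagonal averaging: average each anti-diagonal to get back a series.
        comps[i] = [np.mean(elem[::-1].diagonal(d)) for d in range(-window + 1, k)]
    return s, U, comps

def is_periodic_pair(U, i, tol=0.1):
    """Heuristic: eigenvectors i and i+1 trace a near-constant radius in their scatter plot."""
    radius = np.hypot(U[:, i], U[:, i + 1])
    return radius.std() / radius.mean() < tol

# Toy example: a constant level plus one sinusoid with a period of 20 samples.
t = np.arange(200)
signal = 2.0 + np.sin(2 * np.pi * t / 20)
s, U, comps = ssa_components(signal, window=40)
if is_periodic_pair(U, 1):
    filtered = signal - comps[1] - comps[2]   # remove the detected periodic pair
```

In this toy case the leading component carries the constant level and the next two form the sine/cosine pair, so `filtered` stays close to 2.0. On real plant data the pairing is less clean, and the tolerance would need tuning or visual confirmation.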
The proposed causality analysis framework is described in Figure 4. It consists of three parts: denoising, periodicity-removing and TD-CCM.
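For the denoising part, the DFA criterion applied to each mode can be sketched as follows. This is a minimal first-order DFA under stated assumptions: the function names `dfa_alpha` and `select_signal_modes` and the choice of scales are hypothetical, and the modes are assumed to come from some EMD implementation that is not shown here.

```python
import numpy as np

def dfa_alpha(signal, scales=(8, 16, 32, 64, 128)):
    """Scaling exponent alpha from first-order detrended fluctuation analysis."""
    signal = np.asarray(signal, dtype=float)
    profile = np.cumsum(signal - signal.mean())      # integrated, mean-removed series
    fluct = []
    for s in scales:
        n_seg = len(profile) // s
        segs = profile[:n_seg * s].reshape(n_seg, s)
        t = np.arange(s)
        # Fit and remove a least-squares line in every segment at once.
        coef = np.polyfit(t, segs.T, 1)              # shape (2, n_seg)
        trend = np.outer(t, coef[0]) + coef[1]       # shape (s, n_seg)
        fluct.append(np.sqrt(np.mean((segs.T - trend) ** 2)))
    # alpha is the slope of log F(s) versus log s.
    return np.polyfit(np.log(scales), np.log(fluct), 1)[0]

def select_signal_modes(modes, threshold=0.5):
    """Keep only the modes whose DFA exponent exceeds the threshold."""
    return [m for m in modes if dfa_alpha(m) > threshold]
```

The denoised signal would then be the sum of the retained modes plus the trend. As a rough sanity check, white noise should give an exponent near 0.5 and a random walk one near 1.5, although finite-length estimates scatter around these values.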

Simulation
To demonstrate the effectiveness and advantages of the proposed framework, the four-input and four-output control system (4) is used as the test subject. Loop 3 is contaminated by noise and periodicity. Its original data, noise, periodicity, and filtered signal are displayed in Figure 5. It is observed that the noise and periodicity are accurately extracted through the EMD-DFA-SSA procedure. Then, we apply TD-CCM to the original data and to the denoised and periodicity-removed data, respectively. The causality results are reported in Figures 6a and b, respectively. It can be seen that the original TD-CCM misjudges the causality $L_4 \to L_2$ (red dotted line) and misses the causality between $L_1$ and

Industrial case study
In this section, the Tennessee Eastman process is used to demonstrate the utility of the proposed method in an industrial situation (Zheng et al., 2020). The corresponding process schematic is shown in Figure 7. It mainly consists of five parts: a reactor, a recycle compressor, a stripper, a product condenser, and a vapor separator. The process is regulated under a decentralized control strategy (Ricker, 1996). The predefined fault is added to loop 16. It can be seen from Figure 8 that the predefined fault propagates to other loops and results in plant-wide oscillations (Lang et al., 2018c). In this test, the sampling period is 0.1 h and the analyzed data are samples 250 to 700. The causality detected by the proposed method is provided in Figure 9. For comparison, multivariate Granger causality (Lindner et al., 2019a) is also tested and the corresponding results are plotted in Figure 10. Because of the large number of process variables, the causal network in Figure 9 is too complex, and it is not easy to distinguish the root cause with the naked eye. A net causal flow (Yuan and Qin, 2014) can be computed for each variable, which is equal to the number of outgoing flows minus the number of incoming flows. According to Yuan and Qin (2014), a node with a high positive causal flow is likely to be a source, while a high negative causal flow represents a likely sink in the causality network. Figures 11a and b depict the causal flow obtained from the proposed method and multivariate Granger causality, respectively. The significance level of the hypothesis test is 0.05 for multivariate Granger causality. In Figure 11a, variables 16 and 17 have the largest positive causal flow and are thus regarded as the potential root causes. Because variable 16 has a one-way causal effect on variable 17 in Figure 9, it can be concluded that variable 16 is the real root cause.
By contrast, the potential root causes given by multivariate Granger causality are variables 6 and 15, which do not include the predefined root cause. Therefore, the proposed method is successfully applied to analyze the root cause in the Tennessee Eastman process, and it shows better performance than the classical multivariate Granger causality.
Figure 11. Causal flow obtained from (a) the proposed method and (b) multivariate Granger causality. The potential root causes are marked with red asterisks. It is observed that the potential root causes of the proposed method contain the real source, while those of the Granger causality method exclude the true source.
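The net causal flow used above is simple to compute once the detected causal network is stored as an adjacency matrix. The sketch below is illustrative only: the function name `net_causal_flow` is hypothetical, and the four-node network is a made-up example, not the network of Figure 9.

```python
import numpy as np

def net_causal_flow(adjacency):
    """adjacency[i, j] = 1 if a causal link i -> j was detected; returns out-degree minus in-degree."""
    a = np.asarray(adjacency, dtype=int)
    return a.sum(axis=1) - a.sum(axis=0)

# Hypothetical network with links 0 -> 1, 0 -> 2, 1 -> 2 and 2 -> 3.
adjacency = np.zeros((4, 4), dtype=int)
for cause, effect in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    adjacency[cause, effect] = 1

flow = net_causal_flow(adjacency)
source_candidate = int(np.argmax(flow))   # node with the largest positive flow
```

Here node 0 has two outgoing and no incoming links, so it has the largest net flow and would be flagged as the likely source. Ties, as with variables 16 and 17 above, still have to be resolved by inspecting the direction of the link between the candidates.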

Conclusions
In this paper, a novel causality analysis framework is proposed based on denoising and periodicity-removing TD-CCM. First, the adverse effects of noise and periodicity are investigated. Then, EMD and DFA are combined to achieve denoising, and SSA is used to remove the periodicities. Simulations demonstrate that the proposed denoising and periodicity-removing procedure can effectively improve the performance of TD-CCM by reducing both false and missed causal relationships. Finally, the proposed causality analysis framework is applied to the Tennessee Eastman process to identify the root cause of plant-wide oscillations. The application results show that the proposed method is effective and promising for root cause diagnosis in process control systems.