A Tukey's g-and-h distribution based approach with PSO for degradation reliability modeling

Purpose – A generalized distribution with a wide range of skewness and elongation is suitable for data mining and robust to misspecification of the distribution. Hence, the purpose of this paper is to present a distribution-based approach for estimating degradation reliability under these conditions. Design/methodology/approach – Tukey's g-and-h distribution, with its quantile expression, is introduced to fit the degradation paths of the population over time. The Newton–Raphson algorithm is used to evaluate the reliability approximately. Simulation verification of the parameter estimation with particle swarm optimization (PSO) is carried out. The effectiveness and validity of the proposed approach for degradation reliability are verified by a two-stage verification and by comparison with others' work. Findings – Simulation studies have proved the effectiveness of PSO in the parameter estimation. Two degradation datasets, of GaAs laser devices and of crack growth, are analyzed by the proposed approach. The results show that the approach matches the initial failure time well and is more compatible than the normal distribution and the Weibull distribution. Originality/value – Tukey's g-and-h distribution is proposed for the first time to investigate the influence of the tail and the skewness on degradation reliability. In addition, the parameters of Tukey's g-and-h distribution are estimated by PSO with the root-mean-square error as the objective function.


Introduction
Reliability is a significant indicator for the performance estimation and the quality control of products. For highly reliable products, failure data are difficult to obtain even in accelerated tests. In this case, degradation data are an efficient alternative for reliability estimation. The existing models for degradation reliability have been summarized in detail (Gorjian et al., 2009; Ye and Xie, 2015). A common approach is to extrapolate the degradation paths to the failure threshold and obtain pseudo-lifetime data for reliability estimation (Lu and Meeker, 1993). The other way to estimate reliability is based on the relationship between the degradation data and the failure threshold (Zuo et al., 1999). The relationship between models based on the degradation data and models based on the pseudo-lifetime data is stated by Meeker and Escobar (1998). However, the present models for degradation reliability are based on the normal distribution, the Weibull distribution and so on, and misspecification of the distribution may occur. Hence, a generalized distribution is required.
The g-and-h distribution, generated by a transformation of the standard normal distribution, was first proposed by Tukey (1977). Its basic properties have been studied (Jorge and Boris, 1984; Hoaglin et al., 1985; Macgillivray, 1992) and modifications have been extensively investigated (Morgenthaler and Tukey, 2000; Fischer, 2010). It covers a wide range of skewness and elongation through corresponding control parameters, which can depict the properties of data in detail. A comparison of goodness-of-fit has been performed among gh-transformed distributions and other distributions, and the results proved that in most cases the original Tukey's g-and-h distribution provides a better fit (Klein and Mittnik, 2002). In practice, Tukey's g-and-h distribution has been widely used in various fields, such as common stock returns (Badrinath and Chatterjee, 1991), modeling extreme wind speeds (Dupuis and Field, 2004), operational risk (Degen et al., 2007), value-at-risk (Jiménez and Arunachalam, 2011), and simulation studies (He and Raghunathan, 2006). Nevertheless, it has not yet been investigated in reliability.
The advantages of Tukey's g-and-h distribution for fitting degradation paths are summarized as follows:
- It has a wide range of skewness and kurtosis, which makes it robust to extreme values and outliers and able to judge whether the distribution is heavy-tailed.
- It is expressed by a quantile function and can easily be introduced into the location-scale model, which deals well with heteroscedasticity over time.
- It can accommodate a rich class of common univariate distributions, such as the normal, lognormal, Weibull and exponential (Morgenthaler and Tukey, 2000; Chaudhuri and Ghosh, 2016).
One prominent problem lies in the parameter estimation of Tukey's g-and-h distribution, especially in the degradation situation where time must be considered. There are mainly three methods for parameter estimation: the quantile method (Hoaglin et al., 1985), the method of moments (Headrick et al., 2008), and the maximum likelihood method (Majumder and Ali, 2008). Maximum likelihood estimation (MLE) is the most popular approach; nevertheless, its likelihood function cannot be written in closed form, which finally leads to intractable computation. Therefore, numerical methods are introduced to solve the MLE by approximation, which may be complicated (Rayner and Macgillivray, 2002a; Xu and Genton, 2015; Bee and Trapin, 2016). The method of moments is another approach for parameter estimation, but it may not be appropriate for situations varying with time.
Quantile-based estimation is a natural alternative for Tukey's g-and-h distribution. A weighted quantile-based estimation performs better in a number of ways than the numerical MLE for small and moderately sized samples (Rayner and Macgillivray, 2002b). In addition, a rapidly convergent quantile-based least squares estimation method (Xu et al., 2014) has been developed to fit the parameters of the g-and-h distributional family. However, when modeling the degradation path, more parameters, including g and h, should be estimated. To overcome this difficulty, particle swarm optimization (PSO) (Kennedy and Eberhart, 1995), a heuristic optimization method, is brought in for the quantile-based estimation. PSO has many advantages, including simplicity of implementation and robustness, and has been successfully used in data-driven fuzzy clustering (Chen and Zhao, 2009) and in solving the likelihood objective function of the three-parameter Weibull distribution for parameter estimation (Örkcü et al., 2015).
In this paper, emphasis is placed on reliability estimation with Tukey's g-and-h distribution for degradation data. The rest of the paper is organized as follows. In Section 2, Tukey's g-and-h distribution and its properties are presented, and the influence of the parameters on the distribution is investigated as well. In addition, the quantile-based estimation is introduced in detail. Section 3 employs an intelligent optimization method, PSO, for the parameter estimation. The practical applications to degradation reliability estimation concerning a GaAs laser device and crack growth are performed in Section 4. Conclusions are given in Section 5.

Tukey's g-and-h distribution
Definition. Let Z be a random variable with the standard normal distribution. A random variable U defined by

U = q_{g,h}(Z) = [(exp(gZ) − 1)/g] exp(hZ²/2)   (1)

is said to have Tukey's g-and-h distribution. q_{g,h}(z) should be a strictly increasing monotonic function of z, that is, q′_{g,h}(z) > 0, with parameters g ∈ ℝ and h ≥ 0. When h < 0, the transformation is no longer strictly monotone increasing and induces a finite support of the data (Klein and Mittnik, 2002), so in this paper the parameter h is constrained to the interval [0, 1).
Two subclasses of distributions based on equation (1) are the g class (h = 0) and the h class (g = 0), defined as

q_g(z) = (exp(gz) − 1)/g,   q_h(z) = z exp(hz²/2).

When g = h = 0, U = Z, so U has a standard normal distribution. Furthermore, the standard normal distribution can be replaced by other symmetric distributions, such as the Laplace distribution, the generalized error distribution and so on (Morgenthaler and Tukey, 2000; Klein and Mittnik, 2002).

Properties
Some properties are introduced for the following verification. The probability density function (pdf) of the standard normal distribution is

φ(z) = (1/√(2π)) exp(−z²/2),

and its cumulative distribution function (cdf) is

Φ(z) = ∫_{−∞}^{z} φ(x) dx.

The formula for the k-th moment of the g-and-h distribution is

E(U^k) = [1/(g^k √(1 − kh))] Σ_{i=0}^{k} (−1)^i C(k, i) exp[(k − i)² g²/(2(1 − kh))],   0 ≤ h < 1/k.

Then the mean of Tukey's g-and-h distribution is

E(U) = [exp(g²/(2(1 − h))) − 1]/(g √(1 − h)),

and the variance is

Var(U) = [exp(2g²/(1 − 2h)) − 2 exp(g²/(2(1 − 2h))) + 1]/(g² √(1 − 2h)) − [E(U)]²,   0 ≤ h < 1/2.
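As a quick numerical check, the mean and variance above can be evaluated directly (a Python sketch assuming g ≠ 0 and the stated ranges of h; the function names are ours, not the paper's):

```python
import math

def gh_mean(g, h):
    """Mean of Tukey's g-and-h distribution (g != 0, 0 <= h < 1)."""
    return (math.exp(g**2 / (2.0 * (1.0 - h))) - 1.0) / (g * math.sqrt(1.0 - h))

def gh_second_moment(g, h):
    """Second raw moment E[U^2] (g != 0, 0 <= h < 1/2)."""
    return (math.exp(2.0 * g**2 / (1.0 - 2.0 * h))
            - 2.0 * math.exp(g**2 / (2.0 * (1.0 - 2.0 * h)))
            + 1.0) / (g**2 * math.sqrt(1.0 - 2.0 * h))

def gh_variance(g, h):
    """Var(U) = E[U^2] - (E[U])^2."""
    m = gh_mean(g, h)
    return gh_second_moment(g, h) - m**2
```

A Monte Carlo draw of U = q_{g,h}(Z) with Z standard normal should agree with these closed forms within sampling error.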

Relationship with reliability
Since the transformation q_{g,h} for h ≥ 0 is strictly increasing, the cdf of a g-and-h random variable U can be written as

F_U(u) = Φ(q_{g,h}^{−1}(u)),

where z = q_{g,h}^{−1}(u) can be solved with the Newton–Raphson method, that is,

z_{n+1} = z_n − [q_{g,h}(z_n) − u]/q′_{g,h}(z_n),

in which n is the iteration number. By differentiating equation (1), we obtain the first derivative

q′_{g,h}(z) = exp(hz²/2) [exp(gz) + hz(exp(gz) − 1)/g].

The corresponding reliability function is

R(u) = 1 − F_U(u) = 1 − Φ(q_{g,h}^{−1}(u)).

The pdf of U is expressed as

f_U(u) = φ(z)/q′_{g,h}(z), with z = q_{g,h}^{−1}(u),

and its log form (Xu and Genton, 2015) is

log f_U(u) = log φ(z) − log q′_{g,h}(z).

The influence of g and h on the pdf and the reliability is illustrated in Figures 1(a) and 1(b). As g increases, the distribution becomes more skewed; when g = 0, U has a symmetric distribution. As h increases, the tails of the distribution get heavier. Moreover, all reliability curves meet at the same point F_U(0) = Φ(0) = 0.5, which is the median value.
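The Newton–Raphson inversion and the resulting reliability function can be sketched as follows (a Python sketch assuming g ≠ 0 and h ≥ 0; the iteration starts from z₀ = 0, where q_{g,h}(0) = 0 and q′_{g,h}(0) = 1):

```python
import math

def q_gh(z, g, h):
    """Tukey's g-and-h quantile transform, equation (1)."""
    return (math.exp(g * z) - 1.0) / g * math.exp(h * z * z / 2.0)

def q_gh_prime(z, g, h):
    """First derivative dq/dz, needed by Newton-Raphson."""
    return math.exp(h * z * z / 2.0) * (
        math.exp(g * z) + h * z * (math.exp(g * z) - 1.0) / g)

def q_gh_inv(u, g, h, tol=1e-10, max_iter=100):
    """Solve q_{g,h}(z) = u for z by Newton-Raphson iteration."""
    z = 0.0
    for _ in range(max_iter):
        step = (q_gh(z, g, h) - u) / q_gh_prime(z, g, h)
        z -= step
        if abs(step) < tol:
            break
    return z

def norm_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def reliability(u, g, h):
    """R(u) = 1 - F_U(u) = 1 - Phi(q^{-1}_{g,h}(u))."""
    return 1.0 - norm_cdf(q_gh_inv(u, g, h))
```

Because q_{g,h}(0) = 0, every reliability curve passes through R(0) = 0.5, matching the median property noted above.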

Quantile-based estimation
Introducing the location parameter μ ∈ ℝ and the scale parameter σ > 0, the regression model is obtained as

Y = μ + σ q_{g,h}(Z).

The quantile-based parameter estimation approach was proposed by Tukey (1977); it uses a letter-value-based sequence of quantiles to estimate the parameters sequentially. In this work, a brief description of the letter-value-based method is given in the following steps (Hoaglin et al., 1985; Rayner and Macgillivray, 2002b; Bee and Trapin, 2016):
- Estimate the parameter μ by the median of the samples.
- Using z_p = −z_{1−p} in the quantile relations of equation (1), calculate

g_{p_i} = −log[(y_{1−p_i} − y_{0.5})/(y_{0.5} − y_{p_i})]/z_{p_i}

with the median rank estimator p_i of the samples, where y_{p_i} is the corresponding i-th sample quantile. The value of g is then obtained as the mean of all g_{p_i}.
- In sequence, the relationship

log UHS*_p = log[g(y_{1−p} − y_{0.5})/(exp(−g z_p) − 1)] = log(σ) + h z_p²/2

is induced, so a plot of log UHS*_p against z_p²/2 has intercept log(σ) and slope h. Regress log[g(y_{1−p} − y_{0.5})/(exp(−g z_p) − 1)] or log[g(y_{0.5} − y_{1−p})/(1 − exp(−g z_p))] on z_p²/2, depending on whether g is positive or negative, to estimate σ and h.

By adjusting the quantile selections, a quantile estimation method that works well for estimating g and h is obtained.
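The letter-value step for g can be sketched as follows (a minimal Python sketch assuming a simple linearly interpolated sample quantile and an arbitrary probability grid; the paper's exact quantile selection may differ):

```python
import math
from statistics import NormalDist, median

def estimate_g(sample, probs=(0.05, 0.10, 0.15, 0.20, 0.25)):
    """Letter-value estimate of g: average of g_p over several quantile pairs.

    g_p = -log[(y_{1-p} - y_0.5) / (y_0.5 - y_p)] / z_p, with z_p = Phi^{-1}(p).
    """
    ys = sorted(sample)
    n = len(ys)
    nd = NormalDist()

    def quantile(p):
        # linearly interpolated order-statistic quantile (an illustrative choice)
        pos = p * (n - 1)
        lo = int(pos)
        frac = pos - lo
        return ys[lo] + frac * (ys[min(lo + 1, n - 1)] - ys[lo])

    y_med = median(ys)
    g_vals = []
    for p in probs:
        z_p = nd.inv_cdf(p)                      # z_p < 0 for p < 0.5
        y_p, y_1p = quantile(p), quantile(1.0 - p)
        g_vals.append(-math.log((y_1p - y_med) / (y_med - y_p)) / z_p)
    return sum(g_vals) / len(g_vals)
```

On a large sample drawn from the pure g class (h = 0), the estimate should recover g up to sampling error.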
Parameter estimation of the g-and-h distribution

Degradation data description
Degradation data are generally obtained from repeated measurements at designed time points on a certain number of units. The measurements are stopped when the degradation reaches the pre-specified failure threshold or the terminal time. The assumptions about the manner in which the test and measurement should be conducted (Lu and Meeker, 1993) are as follows:
- Sample units are randomly selected from a population or production process, and random measurement errors are independent across time.
- Sample units are tested in a particular homogeneous environment, such as the same constant temperature.
- Measurement (or inspection) times are pre-specified, the same across all test assets, and may or may not be equally spaced in time.
- The lifetimes of the sample units are assumed to be independent and identically distributed random variables.
In Figure 2, the degradation data may be censored at the threshold; however, they are of great significance for the regression of the time-dependent mean (Zhou et al., 2012). So the rank of the censored data should be retained, and the degradation data should be ordered at every time point to preserve the ranks. Consider that n units are tested, and the degradation measurements of each unit are available at times t_1, t_2, ..., t_{m_i}. Given the degradation data (y_ij, t_j), where y_ij is the measured deterioration of the i-th unit at time t_j, with i = 1, 2, ..., n and j = 1, 2, ..., m_i, the sample path based on equation (18), considering time, can be rewritten as

ŷ(p_i, t_j) = μ(t_j) + σ(t_j) q_{g,h}(z_{p_i}),

where q_{g,h} is the quantile expression of equation (1), z_{p_i} = Φ^{−1}(p_i), and p_i is the median rank estimator

p_i = (i − 0.3)/(n + 0.4).

Finally, with equations (8) and (9), the mean and the standard deviation (Std.) of Y can be obtained as

E[Y(t)] = μ(t) + σ(t) E(U),   Std[Y(t)] = σ(t) √Var(U).

The location μ(t) and the scale σ(t) can be selected as follows. Linear regression:

μ(t) = a + b t,   σ(t) = c + d t.

Exponential regression:

μ(t) = a exp(b t),   σ(t) = c exp(d t).

The reliability function evaluated at time t is the probability that the failure lifetime T is beyond time t:

R(t) = P(T > t) = 1 − F(t),

where F(·) denotes the cdf of the lifetime. For an increasing degradation path, the relationship between the lifetime and the degradation is defined as

T = inf{t : Y(t) ≥ D},

where D is the failure threshold. Then the reliability can be obtained by equation (15):

R(t) = P(Y(t) < D) = Φ(q_{g,h}^{−1}((D − μ(t))/σ(t))),

with the inverse solved by the iteration of equation (12). According to Section 2.3, the quantile-based estimator is performed without considering the influence of time, which makes it complicated or inappropriate for degradation modeling. Inspired by the quantile estimators (Lu and Meeker, 1993; Xu et al., 2014; Takeuchi et al., 2003), the unknown parameters are instead estimated by minimizing the root-mean-square error (RMSE) over the degradation path of each unit, as illustrated in Figure 2, using equation (34):

RMSE = √[(1/Σ_i m_i) Σ_i Σ_j (y_ij − ŷ(p_i, t_j))²].

The rank of the censored data should be considered in the computation.
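The RMSE objective can be sketched as follows (a simplified Python sketch that assumes linear location and scale, the median rank p_i = (i − 0.3)/(n + 0.4), and ignores censoring; the parameter packing is our own convention):

```python
import math
from statistics import NormalDist

def q_gh(z, g, h):
    """Tukey's g-and-h quantile transform, equation (1)."""
    return (math.exp(g * z) - 1.0) / g * math.exp(h * z * z / 2.0)

def rmse_objective(times, ordered_data, params):
    """RMSE between ordered measurements and model quantiles.

    times        : list of measurement times t_j.
    ordered_data : ordered_data[j] is the sorted list of the n unit
                   measurements at time t_j (ranks preserved).
    params       : (g, h, a, b, c, d) with mu(t) = a + b*t, sigma(t) = c + d*t.
    """
    g, h, a, b, c, d = params
    nd = NormalDist()
    sq, cnt = 0.0, 0
    for t, ys in zip(times, ordered_data):
        n = len(ys)
        for i, y in enumerate(ys, start=1):
            p = (i - 0.3) / (n + 0.4)          # median rank estimator
            yhat = (a + b * t) + (c + d * t) * q_gh(nd.inv_cdf(p), g, h)
            sq += (y - yhat) ** 2
            cnt += 1
    return math.sqrt(sq / cnt)
```

At the true parameters this objective vanishes for data generated exactly from the model, which is the property the PSO search exploits.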
For a convenient solution of the RMSE with many parameters, PSO is a good choice owing to its strong global optimization ability and easy implementation. It can perform parallel computation and estimate all the parameters simultaneously with the objective function.

Particle swarm optimization
PSO (Kennedy and Eberhart, 1995) is an evolutionary technique based on swarm intelligence, rooted in research on bird predation behavior. In a target search space of N dimensions, there is a group composed of M particles. The position vector of particle P_i in the t-th iteration is X_i(t) = (x_{i1}, x_{i2}, ..., x_{iN})^T and the velocity vector is V_i(t) = (v_{i1}, v_{i2}, ..., v_{iN})^T. The individual extreme point pbest_i = (pbest_{i1}, pbest_{i2}, ..., pbest_{iN}) represents the influence of the memory of the particle itself; it gives the particles the ability of global searching and helps avoid local minima. The global extreme point gbest = (gbest_1, gbest_2, ..., gbest_N) represents the influence of the group information, and reflects the information sharing and cooperation among the particles. In the (t + 1)-th iteration, the position and velocity of particle P_i are updated by the following equations:

v_{in}(t + 1) = ω v_{in}(t) + c_1 rand() (pbest_{in} − x_{in}(t)) + c_2 rand() (gbest_n − x_{in}(t)),
x_{in}(t + 1) = x_{in}(t) + v_{in}(t + 1),

where c_1 and c_2 are learning factors, generally chosen to be 2; rand() is a random number between 0 and 1; and ω is the inertia weight, which gives the method better global search ability when large and a tendency toward local search when small. In the computation, the expression of ω is

ω = ω_max − (ω_max − ω_min) · Iter/Iter_max,

where ω_max = 0.9 and ω_min = 0.4 (Shi and Eberhart, 1997); Iter is the iteration number and Iter_max the maximum number of iterations.
To prevent the particles from leaving the search space, every dimensional velocity v_n of the particles is confined to [−v_max, v_max].

Procedure for parameter estimation with particle swarm optimization
(1) Set the population size M, and specify the convergence precision and the maximum number of iterative steps.
(2) Initialize the velocity and the position of the particles.
(3) Set the learning factors c_1 and c_2, as well as the inertia weight ω.
(4) Give the fitness function, namely the RMSE in this paper, and the dimensions of the optimized parameters, including g, h and the parameters in the location μ(t) and the scale σ(t).
(5) Minimize the fitness function and compute the fitness values of all particles.
(6) Obtain the global extreme value and the individual extreme values, and update the velocity and the position of the particles.
(7) Check whether the termination condition is satisfied; if not, repeat steps (5) to (6); if so, stop the computation and output the optimum values.
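The update rules and the procedure above can be sketched as a minimal PSO in Python (the sphere function in the usage below is only a stand-in for the RMSE fitness; the function signature and bounds handling are our own simplifications):

```python
import random

def pso_minimize(f, dim, bounds, n_particles=30, max_iter=300, seed=0):
    """Minimal PSO sketch: linearly decreasing inertia weight (0.9 -> 0.4),
    learning factors c1 = c2 = 2, velocity clamped to the search range."""
    rng = random.Random(seed)
    lo, hi = bounds
    vmax = hi - lo
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in X]
    pbest_val = [f(x) for x in X]
    g_idx = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g_idx][:], pbest_val[g_idx]
    for it in range(max_iter):
        w = 0.9 - (0.9 - 0.4) * it / max_iter        # inertia weight schedule
        for i in range(n_particles):
            for n in range(dim):
                V[i][n] = (w * V[i][n]
                           + 2.0 * rng.random() * (pbest[i][n] - X[i][n])
                           + 2.0 * rng.random() * (gbest[n] - X[i][n]))
                V[i][n] = max(-vmax, min(vmax, V[i][n]))  # velocity clamping
                X[i][n] += V[i][n]
            val = f(X[i])
            if val < pbest_val[i]:                   # update individual best
                pbest[i], pbest_val[i] = X[i][:], val
                if val < gbest_val:                  # update global best
                    gbest, gbest_val = X[i][:], val
    return gbest, gbest_val
```

For the paper's setting, f would be the RMSE of the degradation model as a function of (g, h) and the location and scale parameters.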

Simulation for verification of the estimation algorithm
To verify the effectiveness of PSO in estimating the parameters of the proposed approach based on Tukey's g-and-h distribution, a simulation is performed. Degradation measures are generated at m equispaced time points for n units under the following model:

y_ij = μ(t_j) + σ(t_j) q_{g,h}(z),   z = ξ + ε,

where i = 1, 2, ..., n; j = 1, 2, ..., m; ξ ~ N(0, 1); and ε ~ N(0, 0.05). Different values of the concerned parameters are listed in Table I. Considering the influence of the sample number and the verification of different parameters, four groups of simulations, each repeated ten times, are carried out. To control large fluctuations at different time points, only one group of simulation data obeying the standard normal distribution is generated at every time, but a small disturbance with mean 0 and variance 0.05 is added at the fifteen time points, as illustrated in Figure 2. In PSO, 1,000 particles are set for the computation. The mean and variance of the estimated parameters are computed, as well as the bias between the true values and the means. The results prove that the approach satisfies the estimation-accuracy requirement for the following degradation reliability investigation.
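One reading of the simulation scheme can be sketched as follows (Python; treating ξ as a per-unit N(0, 1) draw and ε as a per-time disturbance with variance 0.05 is our interpretation, and the location/scale functions are placeholders):

```python
import math
import random

def simulate_degradation(g, h, mu, sigma, times, n_units, seed=0):
    """Generate degradation paths y_ij = mu(t_j) + sigma(t_j) * q_{g,h}(xi_i + eps_ij).

    xi_i   ~ N(0, 1)    : one base draw per unit (interpretation of the scheme).
    eps_ij ~ N(0, 0.05) : small disturbance, variance 0.05, per time point.
    Returns a list over times; each entry holds the n_units measurements.
    """
    rng = random.Random(seed)
    xi = [rng.gauss(0.0, 1.0) for _ in range(n_units)]
    data = []
    for t in times:
        col = []
        for i in range(n_units):
            z = xi[i] + rng.gauss(0.0, math.sqrt(0.05))
            u = (math.exp(g * z) - 1.0) / g * math.exp(h * z * z / 2.0)
            col.append(mu(t) + sigma(t) * u)
        data.append(col)
    return data
```

The generated paths can then be fed to the RMSE objective and PSO to check whether the true (g, h) and the location/scale parameters are recovered.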
Applications

Degradation description
GaAs laser device. In this section, a practical case concerning the degradation dataset of GaAs laser devices is presented to demonstrate the validity of the proposed estimation method (Meeker and Escobar, 1998). The operating current is measured as the degradation characteristic at 80°C, and it increases over time. A device fails when the per cent increase in operating current exceeds a pre-specified threshold D = 10. Degradation data from 15 test samples are collected, with observations made at assigned time points and plotted in Figure 3(a). The degradation trend is linear, so the location and the scale are expressed by equation (28), respectively.

Crack growth. Sets of fatigue crack growth data versus time (in millions of cycles) are plotted in Figure 3(b). These data were collected on twenty-one test specimens, all with an initial crack length of 0.90 inch. The failure threshold is set as a critical crack length, and the exponential regression of equation (29) is necessary to fit the location and the scale with time.

Minimizing the RMSE by PSO, the parameters can be estimated. First, the two important variables, the mean and the Std., are computed by equations (26) and (27) and verified by comparison with the experimental data. Second, the proposed approach is applied at lower thresholds where no failure occurs; the pseudo-lifetime data obtained by interpolation at the lower thresholds are used to verify the reliability estimation. Then the proposed approach is applied to the practical threshold and compared with others' results. The two-stage verification can guarantee the validity of the approach, and the whole procedure is illustrated in Figure 4.

Parameter estimation
The particle number in PSO is set to 300 and the convergence precision to 1e−20. For the GaAs laser device, the time is divided by 250 for scale matching, and the iteration stops after 1,593 steps. For the crack growth, the iteration stops after 1,740 steps. The parameters of Tukey's g-and-h distribution estimated with PSO are listed in Table II, and the RMSE versus the iteration number is plotted in Figure 5. The mean and the variance at every test time are computed by equations (26) and (27) and compared with the experimental data in Figure 6, while the relative errors of the mean and the Std. are computed by equation (43), expressed as

δ = |z − ẑ|/z,

where z represents the experiment and ẑ the simulation. The results show that both match well.

Reliability
The KS-test is carried out for the normal, the Weibull, and the g-and-h distribution. The g-and-h distribution fails to pass at time 250 for the GaAs laser device, as well as at 0.01, 0.02 and 0.03 million cycles for the crack growth, as shown in Figure 7. The reason may be that the data at the beginning are too concentrated. However, the g-and-h distribution shows better fitness than the other two distributions at the remaining times for the two practical applications, with the correlation coefficients given in Figure 8. Furthermore, the Q-Q plots at t = 4,000 h for the GaAs laser device and at 0.09 million cycles for the crack growth are extracted in Figure 9, which show the advantage of the g-and-h distribution in fitting the data distributions. Figure 10 gives the pdf of the degradation data varying with time for the GaAs laser device and the crack growth, respectively. By equation (33), the degradation reliability is computed.
To verify the effectiveness of the approach for degradation reliability, pseudo-lifetime data are obtained by interpolation at three different thresholds, D = 4, D = 5 and D = 6 for the GaAs laser device and D = 1.2, D = 1.3 and D = 1.4 for the crack growth, without censored data, in Figure 11. Finally, the practical reliability for the GaAs laser device at failure threshold D = 10 and for the crack at failure threshold D = 1.6 is computed and compared with others' research results, as shown in Figure 12, where the pseudo-lifetimes are obtained by extrapolating the censored data to the failure threshold; in particular, the first two lifetime data of the GaAs laser device and the first 12 lifetime data of the crack growth are actual failures.

Results
Regression of the degradation paths with Tukey's g-and-h distribution shows good agreement with the experimental data. Comparisons of the mean and the Std. between the simulation and the experimental data prove that PSO with the RMSE for parameter estimation is effective. The approach with Tukey's g-and-h distribution for reliability estimation matches the practical lifetime points at the beginning of failure well, which is connected with the tail and the skewness of the distribution. Compared with other methods, Tukey's g-and-h distribution for degradation reliability also gives a good fit to the pseudo-lifetimes at the descent stage of the curve.
From the above analysis, the skewness and the tail of the distribution should be considered to obtain reasonable results when modeling degradation processes, because they can exactly predict the beginning time of the failure. Although the proposed approach may introduce more parameters, they make the model more flexible and yield reliable results. However, the important parameter h has to be restricted to the interval [0, 1), since for h < 0 the transformation is no longer monotone, in order to approximate the distribution of the degradation data.

Conclusions
In this paper, an approach based on Tukey's g-and-h distribution for degradation reliability is proposed. PSO, an intelligent optimization algorithm, is investigated for the parameter estimation and verified with a simulation study. Since Tukey's g-and-h distribution has no explicit closed-form expression, the Newton–Raphson algorithm is adopted for the solution of the reliability. The effectiveness of the proposed approach for reliability estimation has been demonstrated through two practical degradation datasets. Clearly, the approach can estimate the reliability well, as shown by the two-stage verification. Furthermore, it can predict the reliability well with censored data compared with others' results.