## Abstract

### Purpose

The safe operation of the metro power transformer directly relates to the safety and efficiency of the entire metro system. Through voiceprint technology, the sounds emitted by the transformer can be monitored in real-time, thereby achieving real-time monitoring of the transformer’s operational status. However, the environment surrounding power transformers is filled with various interfering sounds that intertwine with both the normal operational voiceprints and faulty voiceprints of the transformer, severely impacting the accuracy and reliability of voiceprint identification. Therefore, effective preprocessing steps are required to identify and separate the sound signals of transformer operation, which is a prerequisite for subsequent analysis.

### Design/methodology/approach

This paper proposes an Adaptive Threshold Repeating Pattern Extraction Technique (REPET) algorithm to separate and denoise the transformer operation sound signals. By analyzing the Short-Time Fourier Transform (STFT) amplitude spectrum, the algorithm identifies and utilizes the repeating periodic structures within the signal to automatically adjust the threshold, effectively distinguishing and extracting stable background signals from transient foreground events. The REPET algorithm first calculates the autocorrelation matrix of the signal to determine the repeating period, then constructs a repeating segment model. Through comparison with the amplitude spectrum of the original signal, repeating patterns are extracted and a soft time-frequency mask is generated.

### Findings

After adaptive thresholding, the target signal is separated. Experiments in which this algorithm was used to separate background sounds from foreground sounds in mixed recordings, with the results compared against those of the FastICA algorithm, demonstrate that the Adaptive Threshold REPET method achieves good separation effects.

### Originality/value

A REPET method with adaptive threshold is proposed, which adopts the dynamic threshold adjustment mechanism, adaptively calculates the threshold for blind source separation and improves the adaptability and robustness of the algorithm to the statistical characteristics of the signal. It also lays the foundation for transformer fault detection based on acoustic fingerprinting.

## Citation

Chen, L., Xiong, L., Zhao, F., Ju, Y. and Jin, A. (2024), "Research on blind source separation of operation sounds of metro power transformer through an Adaptive Threshold REPET algorithm", *Railway Sciences*, Vol. ahead-of-print No. ahead-of-print. https://doi.org/10.1108/RS-07-2024-0026

## Publisher

Emerald Publishing Limited

Copyright © 2024, Liang Chen, Liyi Xiong, Fang Zhao, Yanfei Ju and An Jin

## License

Published in *Railway Sciences*. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode

## 1. Introduction

The metro power transformer is the heart of ensuring stable operation of the metro system. It is responsible for converting high-voltage electricity into voltage levels suitable for metro systems, providing power for metro trains as well as lighting, air conditioning, elevators, and other equipment in stations. Therefore, the safe operation of the transformer directly relates to the safety and efficiency of the entire metro system. During operation, transformers generate specific acoustic patterns. Through voiceprint technology, the sounds emitted by the transformer can be monitored in real-time, analyzing abnormal characteristics in the audio signals, thereby achieving real-time monitoring of the transformer’s operational status. The voiceprint technology can also identify acoustic features emitted by the transformer under different fault conditions, such as winding failures, insulation damage, etc. By conducting in-depth analysis of the audio signals, technicians can accurately diagnose potential faults in the transformer and take preemptive measures to prevent the escalation of faults.

The environment surrounding power transformers is filled with various interfering sounds that intertwine with both the normal operational voiceprints and faulty voiceprints of the transformer, severely impacting the accuracy and reliability of voiceprint identification. In order to accurately detect the operational condition of the transformer in noisy environments, particularly to detect early signs of potential failures, effective technical means are required to separate pure faulty voiceprints from mixed signals. This is the primary demand for the intervention of blind source separation techniques in acoustics.

Initially, research abroad concerning pre-processing of transformer sounds primarily focused on aspects of transformer noise control. Wang *et al.* (2018) investigated the feasibility of using acoustic metamaterials in sound barriers to reduce noise from power transformers. They proposed a sound barrier integrated with active noise control methods to cancel out diffracted sound waves near the edges of the barrier. This method contributes to enhancing the noise reduction performance of sound barriers used for transformer noise reduction. Through systematic research on the mechanisms of noise generation, frequency spectra, types of sound sources, and noise control measures, Ruan, Li, and Wei (2011) provided theoretical support for further improving the prediction accuracy of noise from high-voltage direct current (HVDC) transformers. Marcsa (2019) analyzed and proposed a multiphysics model based on finite elements, investigating the impact of anisotropy and magnetostriction on the performance of silicon steel sheets. Regarding the issue of noise interference encountered during the process of fault diagnosis based on voiceprints, researchers have also put forward some solutions. Secic, Jambrosic, and Kuzle (2018) utilized the FastICA algorithm to separate mixed sound signals near on-load tap-changing transformers. Rafii and Pardo (2013) introduced the Repeating Pattern Extraction Technique (REPET), based on the characteristic of musical pieces having an underlying repetitive structure onto which different elements are superimposed. Accordingly, this technique is employed to separate the repetitive “background” from the non-repetitive “foreground” in mixtures. The fundamental concept involves identifying periodic repeating segments within the audio, comparing them to a repeating segment model derived from these segments, and extracting the repeating patterns via time-frequency masking. 
Experiments on a dataset comprising snippets of 1000 songs and 14 full-length songs demonstrate that this approach can be successfully applied to the blind source separation of background and foreground sounds.

In China, Yu and Wang (2012) adopted the JADE algorithm, utilizing MATLAB and LabVIEW to simulate transformer vibration signals, thereby achieving the separation of winding and core vibration signals. Li and Sun (2007) addressed the problem of eliminating various interfering noises in partial discharge monitoring signals, proposing an Independent Component Analysis algorithm combined with Empirical Mode Decomposition. They constructed a virtual reference signal, forming a multi-dimensional signal with the observed signal for ICA-based separation of partial discharge pulse signals and noise. Simulation experiments indicate that this method exhibits good noise reduction effects. Guo *et al.* (2012) proposed a blind source separation method for transformer winding and core vibration signals based on the Subspace Dependent Independent Component Analysis (SDICA) approach. Firstly, they explained the blind source separation method based on the SDICA algorithm, verified this algorithm using simulated signals they had constructed, and compared its separation results with those of the conventional blind source separation algorithm - the Fast Independent Component Analysis (FastICA) algorithm. Zhou, Wang, Dang, Zhang, and Liu (2020), based on the ensemble empirical mode decomposition (EEMD) results of acoustic signals, separated the UHV AC transformer sound signals from interfering sound signals through constructing a K-singular value decomposition dictionary and applying the orthogonal matching pursuit algorithm. Zhu (2021) conducted a comparative analysis of denoising methods based on wavelet thresholding, denoising methods based on Fast ICA, and spectral subtraction, evaluating their effectiveness in separating transformer sound signals under different types of noise interference. He optimized the wavelet threshold denoising method by hierarchically selecting thresholds for sound signals, enhancing the denoising effect.

In summary, there has been some progress in domestic and international research on transformer fault diagnosis based on voiceprints. However, blind separation of transformer sound and accurate fault diagnosis in the noisy environment of substations is still challenging. Blind source separation techniques based on thresholds often rely on pre-set thresholds, which may lack flexibility when faced with varying intensities and types of noise. Consequently, this study proposes an adaptive threshold REPET method, aiming to adjust the separation strategy according to the real-time statistical characteristics of the signal. This approach effectively deals with the randomness of environmental noise, enhances the robustness of sound preprocessing, increases sensitivity to periodic repeating patterns, adapts to complex and changeable acoustic environments, and ultimately improves the accuracy and efficiency of fault diagnosis.

## 2. Analysis of transformer sound characteristics

### 2.1 Microphone selection

Detecting the operational status of transformers based on acoustic fingerprints requires the selection of suitable microphones to accurately capture and analyze the sound signals produced by transformers. Common types of microphones mainly include electret condenser microphones, dynamic microphones, and condenser microphones (Xu, Zhang, & Zhao, 2002). A comparison of microphones is shown in Table 1, summarizing their characteristics in terms of sensitivity, frequency response, electromagnetic interference resistance, and price.

From Table 1, it is evident that electret condenser microphones offer superior technical specifications but come at a higher price. Dynamic microphones are more economically priced but relatively weaker in technical performance. Condenser microphones, however, maintain good technical performance while offering a wide range of price options. Therefore, this study selects condenser microphones as the acquisition device. Regarding performance parameters, the frequency range of transformer operating sounds is concentrated between 20 Hz and 20 kHz (Ma, Xie, Zhao, Li, & Xu, 2018); according to the Nyquist theorem, a sampling rate of about 50 kHz therefore ensures the effectiveness of data analysis.
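As a quick sanity check on the parameter choice above, the Nyquist condition can be verified numerically (a trivial Python sketch; the numbers come directly from the text):

```python
# Nyquist check: the sampling rate must exceed twice the highest
# frequency of interest (20 kHz for audible transformer sound).
f_max = 20_000                # Hz, upper end of the transformer sound band
fs = 50_000                   # Hz, sampling rate suggested in the text
nyquist_ok = fs > 2 * f_max   # 50 kHz leaves headroom above the 40 kHz minimum
```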

### 2.2 Analysis of sound characteristics of transformer core operation

In the process of analyzing the operational sound data of a 35 kV transformer in a certain metro system, we first conducted a study and analysis of the collected sound data in both the time and frequency domains. These data not only provide an intuitive perception of the operating state of the transformer but also reveal the complex patterns of its internal mechanical and electromagnetic activities (Li, 2022).

Firstly, Figure 1 presents the time-domain plot of the transformer’s sound, illustrating the characteristics of the sound signal as it varies over time. Within the figure, it is evident that the time-domain data of the transformer’s sound exhibits a pronounced periodic nature. This periodicity reflects the cyclical vibrations of the transformer’s internal structure and mechanical components during the transmission of electrical energy, indicating that the transformer is operating in a stable state. This also provides significant clues for the detection of the transformer’s health condition.

Secondly, Figure 2 displays the frequency-domain plot of the transformer’s sound. The frequency-domain plot utilizes techniques such as Fourier transformation to convert the sound signal from the time domain to the frequency domain, thereby revealing the distribution of different frequency components within the sound signal. In the figure, we can observe that the frequency components of the sound signal are primarily distributed between 20 Hz to 20 kHz, which indicates that, although the sound produced by the transformer during operation encompasses a variety of frequency components, the main energy distribution is still concentrated in the low-frequency range. This low-frequency characteristic is related to the vibrations of the transformer’s large mechanical components and changes in the electromagnetic field. The audible sound signals from the transformer mainly originate from the vibrations of the windings, core, tank, and cooling fans (Wu, 2012; Pan, Zhao, & Li, 2009).

Further analysis indicates that the 100 Hz and its harmonic components dominate the frequency-domain plot. These harmonic components may originate from the periodic changes in the electromagnetic field within the transformer and the resonance of mechanical parts. For instance, the fundamental frequency of 100 Hz could correspond to the frequency of the transformer’s main magnetic flux change, while its harmonics might be generated due to the interaction of the electromagnetic field with other structural components. The phenomenon of harmonics not only provides information about the internal electromagnetic field and mechanical structure of the transformer but also aids in identifying potential faults and abnormal conditions.
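The dominance of the 100 Hz component and its harmonics can be illustrated with a small numpy sketch on a synthetic, hypothetical transformer-like tone (the signal model and amplitudes here are illustrative assumptions, not measured data):

```python
import numpy as np

fs = 50_000                                   # Hz, sampling rate used later in the paper
t = np.arange(fs) / fs                        # one second of signal
# Hypothetical transformer-like tone: a 100 Hz fundamental plus harmonics
x = (1.00 * np.sin(2 * np.pi * 100 * t)
     + 0.50 * np.sin(2 * np.pi * 200 * t)
     + 0.25 * np.sin(2 * np.pi * 300 * t))
spec = np.abs(np.fft.rfft(x)) / len(x)        # single-sided magnitude spectrum
freqs = np.fft.rfftfreq(len(x), d=1 / fs)     # 1 Hz resolution for a 1 s window
fundamental = freqs[spec.argmax()]            # strongest spectral component
```

With a one-second window the FFT bin spacing is exactly 1 Hz, so the fundamental and its 200 Hz and 300 Hz harmonics fall on exact bins and are easy to pick out by peak amplitude.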

## 3. The adaptive REPET method

### 3.1 Basic principle

Blind Source Separation (BSS) is a technique aimed at recovering independent source signals from mixed signals. Development in this field has yielded a variety of algorithms and theoretical frameworks designed to address different types of signal mixing and application scenarios. The most widely used algorithm for BSS is Independent Component Analysis (ICA), a statistical method that separates source signals by maximizing their statistical independence. It assumes that the source signals are statistically independent and achieves separation by exploiting the central limit theorem and the non-Gaussianity of the sources.

The Repeating Pattern Extraction Technique (REPET) is a single-channel BSS algorithm primarily used to separate non-stationary interference sounds from continuous and stable signals, especially suitable for signal processing that contains periodic repeating patterns. The core of the REPET algorithm lies in identifying and utilizing the repeating patterns within the signal, which usually correspond to the periodic structure of the signal. By identifying these repeating patterns, the algorithm can construct a model to represent the background characteristics of the signal, thereby separating it from the foreground signal. The flow diagram of blind source separation based on REPET is shown in Figure 3.

The steps of the algorithm are as follows.

#### 3.1.1 Periodicity detection

Initially, the Short-Time Fourier Transform (STFT) of the mixed signal is calculated, using a Hamming window for framing, to obtain the amplitude spectrum; the power spectrum is the element-wise square of the amplitude spectrum. Then, the autocorrelation of each frequency channel of the power spectrum over time is computed to identify the signal's repeating periods. To highlight the peaks that appear periodically, the autocorrelation is normalized, with its first term serving as the normalization factor. The repeating period is identified by observing the peaks in the autocorrelation results, which correspond to the periodic structures within the signal.

Calculate the STFT $X$ of the mixed signal $x(t)$ and its amplitude spectrum $V = |X|$. For each frequency channel $i$, the autocorrelation of the squared amplitudes over time yields a matrix $B$; each row of $B$ is then divided by its first term $B(i,1)$, so that the value at lag zero is normalized to 1. The calculation formula is as follows:

$$B(i,l) = \frac{1}{m-l+1} \sum_{j=1}^{m-l+1} V(i,j)^{2}\, V(i,j+l-1)^{2}$$

In the formula, $B(i,l)$ refers to the element of matrix $B$ at frequency channel $i$ and lag $l$, and $m$ is the number of time frames. The overall autocorrelation vector $b(l)$ is calculated by averaging $B$ over the $n$ frequency channels:

$$b(l) = \frac{1}{n} \sum_{i=1}^{n} B(i,l)$$

$b$ is divided by its first term $b(1)$ to get the normalized value of $b$. The repeating period $p$ is taken as the lag of the highest peak of $b(l)$.
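The periodicity-detection step can be sketched in numpy as follows (a minimal illustration on synthetic data; `beat_spectrum` and `repeating_period` are hypothetical helper names, not code from the paper):

```python
import numpy as np

def beat_spectrum(power_spec):
    """Row-wise autocorrelation of a power spectrogram, each row normalized
    by its lag-0 term, then averaged over frequency channels (REPET step 1)."""
    n_freq, m = power_spec.shape
    b = np.zeros(m)
    for i in range(n_freq):
        row = power_spec[i]
        ac = np.correlate(row, row, mode="full")[m - 1:]   # lags 0 .. m-1
        ac /= np.arange(m, 0, -1)                          # unbiased estimate
        b += ac / ac[0]                                    # normalize by first term
    return b / n_freq

def repeating_period(b):
    """Lag of the highest peak of the beat spectrum, ignoring lag 0."""
    return int(np.argmax(b[1:]) + 1)

# Synthetic power spectrogram that repeats every 8 frames
rng = np.random.default_rng(0)
pattern = rng.random((5, 8))
V2 = np.tile(pattern, (1, 10))        # 5 channels x 80 frames, period 8
b = beat_spectrum(V2)
p = repeating_period(b)
```

On exactly periodic data the beat spectrum peaks equally at every multiple of the period, so in practice the smallest such lag (or a peak-picking heuristic over the first peaks) is chosen as the repeating period.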

#### 3.1.2 Repeat segment modeling

After identifying the repeating period $p$, the amplitude spectrum $V$ is divided into $r$ segments, each with a length equal to the repeating period. By taking the element-wise median of these segments, a repeating segment model $S$ is constructed, which represents the statistical characteristics of the repeating structure. The calculation formula is as follows:

$$S(i,l) = \operatorname{median}_{k=1,\dots,r} V\big(i,\, l+(k-1)p\big), \quad l = 1,\dots,p$$
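A minimal numpy sketch of the median-based segment model, assuming frames beyond the last full period are simply ignored (the helper name `segment_model` is illustrative):

```python
import numpy as np

def segment_model(V, p):
    """Element-wise median over period-length segments of the amplitude
    spectrogram V; trailing frames past the last full period are dropped."""
    n_freq, m = V.shape
    r = m // p                           # number of full repeating segments
    segs = V[:, :r * p].reshape(n_freq, r, p)
    return np.median(segs, axis=1)       # shape (n_freq, p)

# A repeating background plus one sparse foreground burst
rng = np.random.default_rng(1)
bg = rng.random((4, 6))
V = np.tile(bg, (1, 5))                  # 30 frames, period 6
V[:, 7] += 10.0                          # one loud foreground frame
S = segment_model(V, 6)
```

The median makes the model robust to occasional foreground events: a single corrupted repetition out of five leaves the model unchanged, which an average would not.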

#### 3.1.3 Repeat pattern extraction

The repeating spectral model $W$ is obtained by comparing the repeating segment model with the corresponding part of the mixed signal's amplitude spectrum: the segment model $S$ is tiled along the time axis, and the element-wise minimum of the tiled model and $V$ is taken. Time-frequency masking techniques then use $W$ to derive a soft time-frequency mask $M$, which highlights the repeating parts while suppressing the non-repeating foreground content, such as human speech. The aim of this step is to extract the specific manifestation of the repeating pattern.

The calculation formula for the repeating spectral model $W$ is shown in Equation (5):

$$W(i,j) = \min\big(S(i,\, ((j-1) \bmod p) + 1),\, V(i,j)\big) \tag{5}$$

#### 3.1.4 Soft time-frequency mask generation

Normalize the repeating spectral model by the mixed spectrum to create a soft time-frequency mask, which is used to distinguish the foreground and background of the signal. By setting a threshold, the mask can further be binarized into a binary time-frequency mask. The soft time-frequency mask $M$ is deduced by dividing the elements of $W$ by the corresponding elements of the mixed spectrum $V$. The calculation formula is shown in Equation (6):

$$M(i,j) = \frac{W(i,j)}{V(i,j)} \tag{6}$$

Symmetrize the time-frequency mask $M$ and multiply it element-wise by the STFT of the mixed signal $X$. Convert the resulting STFT back to the time domain to obtain the estimated background signal. The foreground signal is then obtained by simply subtracting the time-domain background signal from the mixed signal.
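The extraction and masking steps can be sketched together on a toy spectrogram (illustrative numpy only; `repet_mask` is a hypothetical helper, and in a full implementation the mask would then be symmetrized, applied to the complex STFT, and inverted back to the time domain):

```python
import numpy as np

def repet_mask(V, S, p):
    """Tile the segment model S to the length of V, take the element-wise
    minimum with V (the repeating spectral model W), then divide by V to
    obtain the soft time-frequency mask M."""
    n_freq, m = V.shape
    reps = -(-m // p)                              # ceil(m / p)
    W = np.minimum(np.tile(S, (1, reps))[:, :m], V)
    M = W / np.maximum(V, 1e-12)                   # soft mask in [0, 1]
    return W, M

# Toy amplitude spectrogram: a flat repeating background (period 2) with
# one loud foreground event at frequency bin 0, frame 2
V = np.array([[2.0, 2.0, 8.0, 2.0],
              [1.0, 1.0, 1.0, 1.0]])
S = np.array([[2.0, 2.0],
              [1.0, 1.0]])                         # model for one period
W, M = repet_mask(V, S, 2)
```

The loud foreground bin receives a small mask value (0.25 here), so it is suppressed in the estimated background, while purely repeating bins keep a mask of 1.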

### 3.2 Adaptive threshold

Adaptive threshold computation is a method of dynamically adjusting thresholds that is widely applied in signal analysis, especially when the local characteristics of the signal change significantly. The core principle is to determine the optimal threshold from the statistical properties of the data in a local region, rather than using a single global threshold. Building upon the principles of the REPET algorithm, the introduction of a dynamic threshold adjustment mechanism enhances the algorithm's adaptability and robustness to different signal statistical characteristics. The energy distribution of the mixed signal is measured by the short-time energy, that is, the average of the sum of squares of each frame of the signal, which reflects the local intensity changes of the signal. The dynamic threshold is set according to the global or local energy distribution of the signal. The method adopted in this paper is to use the average energy of the signal plus two standard deviations as the adaptive threshold. The main basis for this choice is statistical: in a normally distributed dataset, approximately 95% of the data points fall within two standard deviations of the mean, so taking the mean plus two standard deviations as the threshold covers the vast majority of normal data points, with outliers exceeding this range. Setting the threshold in this way can sensitively capture signal changes while reducing the impact of noise and improving the accuracy of signal processing. In the REPET algorithm's periodicity detection step, the originally static, fixed threshold is replaced by this dynamically calculated threshold. The specific method is as follows.

In the REPET method, the soft time-frequency mask $M$ is binarized by setting a threshold $t$:

$$M_{b}(i,j) = \begin{cases} 1, & M(i,j) \ge t \\ 0, & M(i,j) < t \end{cases}$$

In this context, $t$ is no longer a fixed constant but is computed adaptively as

$$t = \mu + 2\sigma$$

where $\mu$ and $\sigma$ denote the mean and standard deviation of the signal's short-time energy described above.

This approach effectively addresses issues such as signal strength fluctuations or background complexity, ensuring the adaptability and robustness of the threshold. It solves the problem of fixed-threshold-based separation techniques, which often rely on pre-set fixed thresholds and are not flexible enough when facing noises of different intensities and types, leading to poor separation effects.
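The mean-plus-two-standard-deviations rule on short-time energy can be sketched as follows (frame length and signal are illustrative assumptions; `adaptive_threshold` is a hypothetical helper name):

```python
import numpy as np

def adaptive_threshold(x, frame_len=256):
    """Mean + 2*std threshold over short-time energy. Frames whose energy
    exceeds the threshold are flagged as transient (foreground) activity."""
    n = len(x) // frame_len
    frames = x[:n * frame_len].reshape(n, frame_len)
    energy = (frames ** 2).mean(axis=1)        # short-time energy per frame
    t = energy.mean() + 2.0 * energy.std()     # adaptive threshold
    return energy, t, energy > t

rng = np.random.default_rng(2)
x = 0.1 * rng.standard_normal(256 * 40)        # stationary background
x[256 * 10:256 * 11] += 2.0                    # one loud transient frame
energy, t, flags = adaptive_threshold(x)
```

Because the threshold tracks the signal's own statistics, only the genuinely loud frame is flagged, regardless of the absolute background level.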

## 4. Experiment

To verify the effectiveness of the algorithm, a mixed signal containing human speech and the operational sound of a 35 kV dry-type transformer was separated using the algorithm proposed in this paper, and the results were compared with those obtained from blind source separation using the FastICA algorithm.

The frequency range of the transformer's operating sound (audible sound) is 20 Hz to 20 kHz; therefore, according to the Nyquist theorem, the sampling rate is set to 50 kHz to ensure the effectiveness of data analysis. The maximum sound pressure level range of the microphone is 0 to 120 dB, the sensitivity is 5.5 mV/Pa, and the signal-to-noise ratio is 100 dB. In this paper, the Hamming window is selected as the window function, the short-time Fourier transform length is set to 1024 sampling points, and the frame shift is half the Hamming window length, that is, 512 sampling points.
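The STFT settings described above (Hamming window, 1024-point frames, 512-sample hop at 50 kHz) can be reproduced with a short numpy sketch (`stft_mag` is a hypothetical helper, not the authors' code):

```python
import numpy as np

def stft_mag(x, n_fft=1024, hop=512):
    """Magnitude STFT with a Hamming window of length n_fft and 50% overlap
    (hop = n_fft // 2 = 512 samples), matching the settings in the text."""
    win = np.hamming(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T   # shape: (n_fft//2+1, n_frames)

fs = 50_000                                # 50 kHz sampling rate from the text
t = np.arange(fs) / fs                     # one second of signal
x = np.sin(2 * np.pi * 100 * t)            # 100 Hz fundamental tone
V = stft_mag(x)
peak_bin = int(V.mean(axis=1).argmax())
peak_hz = peak_bin * fs / 1024             # bin spacing = fs / n_fft
```

With these parameters the bin spacing is 50000/1024, approximately 48.8 Hz, so a 100 Hz tone peaks in bin 2; finer frequency resolution would require longer frames at the cost of time resolution.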

### 4.1 Blind source separation results of the FastICA algorithm

Figures 4–6 show the results of separating the mixed sound signal with the FastICA algorithm: in turn, the time-frequency spectrograms of the mixed signal, the separated foreground signal, and the separated background signal. A time-frequency spectrogram visualizes a signal in both the time and frequency dimensions, so the frequency components of the signal at different time points can be observed.

Figure 4 is the time-frequency spectrogram of the mixed signal, showing the time-frequency distribution of the mixture, which includes the sound of a person speaking and the sound of the running transformer; the two overlap in the time-frequency domain. After blind source separation, Figure 5 shows the time-frequency distribution of the foreground signal (human voice) separated by the FastICA algorithm. The human voice is mainly concentrated in the range from a few hundred hertz to several thousand hertz, and the intensity of the separated foreground signal in the period from 1.5 s to 2.5 s is significantly enhanced, but the low-frequency component of the transformer's operating sound still remains. Figure 6 shows the frequency components of the separated transformer operating sound, mainly low-frequency vibrations; considerable interference from the voice can still be seen. The FastICA algorithm is based on the assumption that the source signals are statistically independent of each other, whereas the transformer sound and the related noise are correlated to some degree in their statistical characteristics. Moreover, FastICA exploits the non-Gaussianity of the source signals to achieve separation, so when the non-Gaussianity of the sources in the mixture is not pronounced, the algorithm has difficulty distinguishing the different source signals.
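For reference, the baseline can be reproduced in miniature with a numpy-only symmetric FastICA (tanh nonlinearity) on a synthetic two-source mixture; this is a sketch of the standard algorithm, not the experiment's actual implementation:

```python
import numpy as np

def fastica_2src(X, iters=200, seed=0):
    """Minimal symmetric FastICA (tanh nonlinearity) for a 2 x n mixture."""
    X = X - X.mean(axis=1, keepdims=True)
    # Whiten via eigendecomposition of the covariance matrix
    d, E = np.linalg.eigh(np.cov(X))
    Z = E @ np.diag(d ** -0.5) @ E.T @ X
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((2, 2))
    for _ in range(iters):
        G = np.tanh(W @ Z)
        Gp = 1.0 - G ** 2                          # derivative of tanh
        W = G @ Z.T / Z.shape[1] - np.diag(Gp.mean(axis=1)) @ W
        # Symmetric decorrelation: W <- (W W^T)^(-1/2) W
        d2, E2 = np.linalg.eigh(W @ W.T)
        W = E2 @ np.diag(d2 ** -0.5) @ E2.T @ W
    return W @ Z

# Two strongly non-Gaussian sources, linearly mixed
t = np.linspace(0, 1, 4000)
s1 = np.sign(np.sin(2 * np.pi * 7 * t))            # square wave
s2 = np.sin(2 * np.pi * 100 * t)                   # sine
S = np.stack([s1, s2])
A = np.array([[1.0, 0.6], [0.4, 1.0]])             # mixing matrix
Y = fastica_2src(A @ S)
```

On clean, strongly non-Gaussian sources like these, the separation is nearly perfect (up to sign and order), which is precisely the assumption that breaks down when transformer sound and ambient noise are statistically correlated and weakly non-Gaussian.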

### 4.2 Blind source separation results of the adaptive threshold REPET method

Figures 7 and 8 present the results of separation performed by the algorithm proposed in this paper. The foreground signal, background signal, and the time-frequency spectrum of the mixed sound data obtained from blind source separation are depicted in the figures.

By comparing the time-frequency spectra of the sound signals, it can be observed that the foreground sound (speech) has been essentially completely separated from the mixed signal, and the noise in the background sound (transformer operation sound) has also been suppressed. Comparing Figures 5 and 7 shows that the Adaptive Threshold REPET algorithm separates the human voice more effectively while reducing noise and interference. Comparing Figures 6 and 8 shows that the Adaptive Threshold REPET algorithm performs better in suppressing background noise and extracting the pure transformer operating sound, while retaining the low-frequency components of the transformer's operating sound. Compared to the FastICA method, the algorithm proposed in this paper has largely filtered out the foreground sound, demonstrating that it has a good separation effect in the preprocessing of transformer operation sounds.

## 5. Conclusion

- (1) This paper analyzes the characteristics of transformer sounds, starting with the selection of microphones for collecting transformer operation sounds, and analyzes the frequency-domain and time-domain characteristics of the transformer's own operation sounds.

- (2) This paper proposes an Adaptive Threshold REPET algorithm, which, based on the principle of the REPET algorithm, employs a dynamic threshold adjustment mechanism to adaptively calculate thresholds for blind source separation, thereby enhancing the algorithm's adaptability and robustness to the statistical characteristics of the signals.

- (3) By employing both the FastICA method and the Adaptive Threshold REPET method to perform blind source separation on mixed sound source data, and comparing their separation results in terms of time-frequency spectra, it is concluded that the Adaptive Threshold REPET method achieves good separation results.

- (4) Blind source separation of transformer sound signals is of significant importance for the feature extraction and fault diagnosis of transformer sound signals. Therefore, the proposed method has certain application prospects and practical value.

## Tables

**Table 1.** Microphone comparison

| Microphone type | Sensitivity | Frequency response | Resistance to electromagnetic interference | Price |
|---|---|---|---|---|
| Electret condenser microphone | Higher sensitivity; can capture faint sounds | Wide frequency response range; can cover the entire frequency spectrum potentially generated by transformers | Typically features good electromagnetic shielding design and interference resistance, suitable for complex electromagnetic environments | Relatively high; typically among the more expensive sound acquisition equipment |
| Condenser microphone | Relatively high sensitivity; can effectively capture minute sound signals | Broad frequency response range; reacts well to sound signals of different frequencies from the transformer | Typically designed with certain electromagnetic shielding measures that suppress the impact of electromagnetic interference on signal acquisition to a certain extent | Relatively broad price range, from mid-to-low end to high-end products |
| Dynamic microphone | Generally low; unable to effectively capture minute sound signals | Relatively narrow frequency response range; may fail to cover all frequencies generated by transformers | Relatively poor; prone to interference in strong electromagnetic field environments, which can lead to unstable signals or signal distortion | Relatively low; generally an economical and cost-effective choice |

**Source(s):** Authors' own work

## References

Guo, J., Ji, S., Shen, Q., Zhu, L., Ou, X., & Du, L. (2012). Blind source separation technology for the detection of transformer fault based on vibration method. Transactions of China Electrotechnical Society, 27(10), 68–78.

Li, M. (2022). Research on voiceprint recognition method of transformer fault under high noise environment. MA thesis. Beijing: North China Electric Power University (Beijing).

Li, H., & Sun, Y. (2007). Extraction of partial discharge signals using independent component analysis. Proceedings of the CSU-EPSA, 19(2), 5.

Ma, C., Xie, R., Zhao, L., Li, Z., & Xu, S. (2018). Characteristic analysis of transformer audible acoustic signals. Power Systems and Big Data, 1(2), 9.

Marcsa, D. (2019). Noise and vibration analysis of a distribution transformer. In 2019 Applications of Electromagnetics in Modern Engineering and Medicine (PTZE) (pp. 109–112). New Jersey: IEEE.

Pan, L., Zhao, S., & Li, B. (2009). Electrical equipment fault diagnosis based on acoustic wave signal analysis. Electric Power Automation Equipment, 29(08), 87–90.

Rafii, Z., & Pardo, B. (2013). Repeating pattern extraction technique (REPET): A simple method for music/voice separation. IEEE Transactions on Audio Speech and Language Processing, 21(1), 73–84, doi: 10.1109/tasl.2012.2213249.

Ruan, X., Li, Z., & Wei, H. (2011). Study on noise prediction and control methods of the converter transformer. In 2011 International Conference on Electric Information and Control Engineering (pp. 5392–5394). Wuhan: IEEE.

Secic, A., Jambrosic, K., & Kuzle, I. (2018). Blind source separation as an extraction tool of the useful diagnostic material in on load tap changer audio based diagnostics. In 2018 IEEE PES Innovative Smart Grid Technologies Conference Europe (ISGT-Europe) (pp. 1–6). Bosnia and Herzegovina: IEEE.

Wang, T., Jin, T., Lu, Z., Zhang, C., Liu, G., Zhao, X., & Li, C. (2018). Research on active noise control method compensating for acoustic metamaterial noise barrier in transformer noise reduction. In 2018 IEEE 2nd International Electrical and Energy Conference (CIEEC) (pp. 648–652). Beijing: IEEE.

Wu, S. (2012). Research on fault diagnosis of power transformer based on acoustic features. MA thesis. Wuhan: Huazhong University of Science and Technology.

Xu, Z., Zhang, S., & Zhao, G. (2002). Choice and use of microphones. China Medical Education Technology, 16(2), 2.

Yu, H., & Wang, P. (2012). Research on vibration signal separation of transformer based on JADE algorithm. Modern Electric Power, 29, 42–49.

Zhou, D., Wang, F., Dang, X., Zhang, X., & Liu, S. (2020). Blind separation of UHV power transformer acoustic signal preprocessing based on sparse representation theory. Power System Technology, 44(08), 3139–3148.

Zhu, K. (2021). Research on the extraction method of transformer voiceprint features under high noise environment. MA thesis. Beijing: North China Electric Power University (Beijing).

## Acknowledgements

This research is supported by the China Academy of Railway Sciences Corporation Limited (2023YJ257).