Performance evaluation of direction-finding techniques of an acoustic source with uniform linear array

Purpose – The purpose of this paper is to show a comparative study of different direction-of-arrival (DOA) estimation techniques, namely, multiple signal classification (MUSIC) algorithm, delay-and-sum (DAS) beamforming, support vector regression (SVR), multivariate linear regression (MLR) and multivariate curvilinear regression (MCR).
Design/methodology/approach – The relative delay between the microphone signals is the key attribute for the implementation of any of these techniques. The machine-learning models SVR, MLR and MCR have been trained using correlation coefficient as the feature set. However, MUSIC uses the noise subspace of the covariance matrix of the signals recorded with the microphones, whereas DAS uses the constructive and destructive interference of the microphone signals.
Findings – Variations in root mean square angular error (RMSAE) values are plotted using different DOA estimation techniques at signal-to-noise-ratio (SNR) values of 10, 14, 18, 22 and 26 dB. The RMSAE curve for DAS seems to be smooth as compared to PR1, PR2 and RR but it shows a relatively higher RMSAE at higher SNR. As compared to DAS, PR1, PR2 and RR, SVR has the lowest RMSAE such that the graph is more suppressed towards the bottom.
Originality/value – DAS has a smooth curve but has higher RMSAE at higher SNR values. All the techniques show a higher RMSAE at the end-fire, i.e. angles near 90°, but comparatively, MUSIC has the lowest RMSAE near the end-fire, supporting the claim that MUSIC outperforms all other algorithms considered.


Introduction
The determination of the direction-of-arrival (DOA) of an acoustic signal is a problem studied under the ambit of localization and tracking. It has applications in various domains: robotics, where unmanned vehicles have to move in an unexplored or new environment; radar systems for aerial and underwater target tracking; sonar systems; and surveillance systems, where a camera needs to align to the direction from which a sound is coming (Johnson and Dudgeon, 1993; Godara, 1997; Asaei et al., 2016; Bekkerman and Tabrikian, 2006; Zhao et al., 2010, 2012; Clark and Tarasek, 2006; Bechler et al., 2004; Argentieri and Danes, 2007; Xiao et al., 2014; Delikaris-Manias et al., 2016; Zhang et al., 2008). DOA estimation is challenging because the acquired signals are distorted; possible causes are sensor noise, ambient noise, non-uniformity in array elements, reverberation, interference or a combination of these impairments. Such distortion causes inaccurate estimation of the DOA of an acoustic source. Techniques for DOA estimation with small errors in the presence of impairments use different acoustic vector sensors and microphone/sensor array configurations (Bogaert et al., 2011; Wajid et al., 2017a, b, 2019, 2020a; Yadav et al., 2020; Alam et al., 2021; Liu et al., 2019). DOA estimation techniques are classified as regression-based, beamforming-based and subspace-based. Delay-and-sum (DAS), minimum-variance distortionless-response (MVDR) and fast Fourier transform-effective aperture distribution function (FFT-EADF) fall under the beamforming methods. Polynomial regression of order 1 (PR1), polynomial regression of order 2 (PR2), support vector regression (SVR) and ridge regression (RR) fall under the regression techniques. Multiple signal classification (MUSIC), root-MUSIC, estimation of signal parameters via rotational invariance technique (ESPRIT) and similar algorithms fall under the subspace methods (Shi, 2019; Cui et al., 2019; Gupta et al., 2020; Varma, 2002; Zhou et al., 2017; Tang et al., 2014). DOA estimation has evolved from early methods in which narrow beams were steered in a particular direction to find the incident angle. Digital signal processors have been used as an approach for finding the direction. Methods such as subspace decomposition, eigenvalue analysis and compressed sensing play an important role in achieving better performance in terms of speed, accuracy and robustness (Ge et al., 2021; Zhang et al., 2021).
Given the wide range of applications for DOA estimation, the growing variety of sensor configurations and the constraints imposed by the hardware, research into DOA estimation strategies has continued uninterrupted. Recently, new approaches based on deep neural networks (DNNs) have been proposed for localizing multiple speech sources using arrays of small dimensions. These techniques result in high-resolution DOA estimation. Among the data-driven methods, DNNs show potential for high-precision DOA estimation. Recent deep learning approaches to DOA estimation have been analysed extensively, taking into account various combinations of convolutional neural networks and deep neural networks. The evaluation criteria include root mean squared error, accuracy and mean absolute error. These studies evaluate deep and machine learning technology in DOA estimation and also examine the factors affecting it (signal-to-noise ratio, number of snapshots, number of antennas and number of signal sources). Based on these findings, it is believed that advanced technologies such as deep learning have improved direction-finding techniques to a great extent. Such studies help researchers conduct detailed analysis (Ge et al., 2021; Zhang et al., 2021). This paper is divided into the following sections. Section 2 presents the signal model for the uniform linear array. Section 3 briefly describes the different direction-finding techniques. In Section 4, simulation parameters and results are presented and analysed. Section 5 concludes the paper.

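2. Signal model for uniform linear array

A uniform linear array (ULA) of $M$ microphones is used as a receiver, where adjacent microphones are separated by a distance $d$. Let there be $D$ far-field acoustic sources, and let the transmitted signal $s_1(t)$ be a narrowband signal. Figure 1 depicts the far-field sound source and a ULA of microphones with $M = 4$ and $D = 1$. Assume a single sound source in the far-field whose signal, of wavelength $\lambda_1$, arrives at an angle $\theta_1$ measured clockwise from the y-axis. The signal received by the $i$th microphone is then expressed as

$$r_i(t) = s_1(t)\, e^{-j\gamma(\theta_1)}\, e^{-j\beta_i(\theta_1)} + n_i(t), \quad i = 1, 2, \ldots, M, \tag{1}$$

where $e^{-j\gamma(\theta_1)}$ is the phase component common to all microphone signals, introduced by the wave travelling from the sound source (at an angle $\theta_1$) to the first microphone of the ULA, and $\beta_i(\theta_1) = 2\pi (i-1) \frac{d}{\lambda_1} \sin\theta_1$ is the additional phase difference caused by the path difference between the first microphone and the $i$th microphone. $n_i(t)$ represents the additive white Gaussian noise (AWGN) at the $i$th microphone. AWGN has been used to represent the inherent noise of the microphone sensor, electronics system noise, ambient noise, etc. Equation (1) can be rewritten in matrix form as (Wajid et al., 2020c)

$$\mathbf{r}(t) = \mathbf{A}(\theta)\,\mathbf{s}(t) + \mathbf{n}(t) = [\, r_1(t)\ \ r_2(t)\ \ \ldots\ \ r_M(t) \,]^T, \tag{2}$$

where

$$\mathbf{A}(\theta) = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ e^{-j\beta_2(\theta_1)} & e^{-j\beta_2(\theta_2)} & \cdots & e^{-j\beta_2(\theta_D)} \\ \vdots & \vdots & \ddots & \vdots \\ e^{-j\beta_M(\theta_1)} & e^{-j\beta_M(\theta_2)} & \cdots & e^{-j\beta_M(\theta_D)} \end{bmatrix}, \tag{3}$$

$\mathbf{s}(t)$ collects the $D$ source signals (each with its common phase factor absorbed), $\mathbf{n}(t)$ is the vector representation of the AWGN and $[\cdot]^T$ denotes the transpose. $\mathbf{A}(\theta)$ is the array steering matrix (of order $M \times D$), which has a Vandermonde structure. The correlation matrix of the received signal is

$$\mathbf{R}_{rr} = E[\mathbf{r}(t)\,\mathbf{r}^H(t)] \tag{4}$$

$$\mathbf{R}_{rr} = \mathbf{A}(\theta)\,\mathbf{R}_s\,\mathbf{A}^H(\theta) + \mathbf{N}_n, \tag{5}$$

where $[\cdot]^H$ denotes the conjugate transpose and $E[\cdot]$ denotes the ensemble average. Similarly, $\mathbf{R}_s$ and $\mathbf{N}_n$ represent the signal and noise correlation matrices, respectively. They can be expressed as follows:

$$\mathbf{R}_s = E[\mathbf{s}(t)\,\mathbf{s}^H(t)] \tag{6}$$

and

$$\mathbf{N}_n = E[\mathbf{n}(t)\,\mathbf{n}^H(t)]. \tag{7}$$

Since the noise realizations are mutually uncorrelated, their cross-correlation is zero, and all noise realizations have the same variance. Thus, $\mathbf{N}_n$ is expressed as

$$\mathbf{N}_n = \sigma^2 \mathbf{I}, \tag{8}$$

where $\sigma^2$ is the variance of the zero-mean AWGN and $\mathbf{I}$ is the identity matrix. Substituting the value of $\mathbf{N}_n$ from (8) into (5) results in the following equation (Wajid et al., 2020c):

$$\mathbf{R}_{rr} = \mathbf{A}(\theta)\,\mathbf{R}_s\,\mathbf{A}^H(\theta) + \sigma^2 \mathbf{I}. \tag{9}$$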
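The signal model above can be illustrated with a minimal sketch. It assumes a complex baseband representation with a unit-modulus, random-phase source sample per snapshot, which is a simplification chosen for illustration and not necessarily the authors' simulation; the function names `steering_vector` and `simulate_snapshots`, the seed, and the choice of a negative phase sign are likewise assumptions.

```python
import cmath
import math
import random

def steering_vector(theta_deg, num_mics, spacing_m, wavelength_m):
    """Return [exp(-j*beta_i)], beta_i = 2*pi*(i-1)*(d/lambda)*sin(theta), i = 1..M."""
    theta = math.radians(theta_deg)
    phase_step = 2 * math.pi * spacing_m / wavelength_m * math.sin(theta)
    return [cmath.exp(-1j * i * phase_step) for i in range(num_mics)]

def simulate_snapshots(theta_deg, num_mics, spacing_m, wavelength_m,
                       num_snapshots, snr_db, seed=0):
    """Generate r_n = a(theta) * s_n + noise_n for one unit-power source (assumed model)."""
    rng = random.Random(seed)
    a = steering_vector(theta_deg, num_mics, spacing_m, wavelength_m)
    # Total noise variance for unit signal power, split over real and imaginary parts.
    sigma = math.sqrt(10 ** (-snr_db / 10) / 2)
    snapshots = []
    for _ in range(num_snapshots):
        s = cmath.exp(1j * rng.uniform(0, 2 * math.pi))  # assumed source sample
        snapshots.append([
            a_i * s + complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
            for a_i in a
        ])
    return snapshots

# Example: M = 4 microphones, d = 10 cm, 1 kHz tone at 343 m/s (wavelength 0.343 m).
snapshots = simulate_snapshots(30.0, 4, 0.10, 0.343, num_snapshots=100, snr_db=26.0)
print(len(snapshots), len(snapshots[0]))
```

Each inner list is one snapshot of the vector r(t); averaging the outer products of such snapshots gives a sample estimate of the correlation matrix.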
3. Techniques of the direction of arrival estimation

There are many existing techniques of DOA estimation, which can be categorized into three broad approaches: (1) regression modelling, (2) classical beamforming and (3) subspace methods, as shown in Figure 2. This paper extends the work of Wajid et al. (2020c). Here, polynomial regression, RR, SVR and DAS beamforming are compared with a subspace technique, the MUSIC algorithm, for DOA estimation. The direction-finding techniques being compared are detailed in the subsequent subsections.

Regression technique
Regression is a statistical method that attempts to determine the nature and degree of the relationship between a dependent variable and one or more independent variables. The nature of the relationship is expressed as a mathematical model (equation) between the predictors and the response. The coefficients of the mathematical model are found by training. The training process aims to reduce the error between the predicted and the actual values in the training set by finding the best-fit parameters. The error between the predicted and actual response is assessed by the least-squares method in this work. This error is given by the following equation:

$$E = \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2, \tag{10}$$

where $Y$ is the vector of observed values, $\hat{Y}$ is the vector of predicted values and $n$ is the number of predictions. The regression techniques require the identification of features derived from the independent variables. These features are then used as the input from which the mathematical model is determined.
In this work, we have used the Pearson product-moment correlation coefficient (PPMCC) as the feature that is taken as the input in the training of the mathematical model. PPMCC measures the degree of linear association between two variables. If one variable increases in exact linear proportion to the other, the coefficient is +1. If it decreases in exact linear proportion as the other increases, the coefficient is −1. All other values lie between +1 and −1, commensurate with the degree of association between the variables. As a feature, PPMCC is calculated for each pair of microphone signals. The PPMCC calculated for a microphone pair is indicative of the phase difference between the sinusoidal waves received at the two microphones. The phase difference occurs because of the time delay in the reception of signals across the ULA of microphones. The different regression techniques used are discussed as follows.

3.2 Polynomial-regression of order 1 and order 2

Polynomial regression, of which linear regression is the first-order case, is the simplest machine learning algorithm that can be used for estimating the DOA; it is given in (11). For $k = 1$ and $k = 2$, the models are denoted PR1 and PR2, respectively.

$$y = c + b_1 x + b_2 x^2 + \cdots + b_k x^k, \tag{11}$$

where $b_1, b_2, \ldots, b_k$ are the weights for the different input vectors, $c$ is a constant and $y$ is the output vector that is dependent on $x$.
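To make the PPMCC feature concrete, the following sketch computes it for two noise-free sinusoids received at adjacent microphones and compares it with the cosine of their phase difference. The helper names `ppmcc` and `microphone_signal` are hypothetical, and the default parameter values (1 kHz tone, 48 kHz sampling, 25 ms duration, 10 cm spacing, 343 m/s) are taken from the simulation section; the noise-free setting is an assumption made to keep the relationship visible.

```python
import math

def ppmcc(x, y):
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def microphone_signal(mic_index, theta_deg, freq_hz=1000.0, fs_hz=48000.0,
                      duration_s=0.025, spacing_m=0.10, speed_m_s=343.0):
    """Noise-free sinusoid at microphone `mic_index` (0-based), delayed by the path difference."""
    delay = mic_index * spacing_m * math.sin(math.radians(theta_deg)) / speed_m_s
    num_samples = round(fs_hz * duration_s)
    return [math.sin(2 * math.pi * freq_hz * (k / fs_hz - delay))
            for k in range(num_samples)]

theta = 30.0
r1 = microphone_signal(0, theta)
r2 = microphone_signal(1, theta)
feature = ppmcc(r1, r2)
phase_difference = 2 * math.pi * 1000.0 * 0.10 * math.sin(math.radians(theta)) / 343.0
print(feature, math.cos(phase_difference))
```

Because the 25 ms record holds a whole number of periods, the coefficient of the two clean sinusoids coincides with the cosine of the inter-microphone phase difference, which is why it carries DOA information.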

Support-vector-regression (SVR)
The SVR model uses a non-linear model for the estimation of DOA, trained to relate the input correlation-coefficient features to the output DOA. It uses the Vapnik-Chervonenkis theory of support vectors to form a relationship between predictors and response. Let the predictor variable be denoted by $\mathbf{x}$ and the dependent response variable, which is the variable of interest, by $G(\mathbf{x})$. The variable $\mathbf{x}$ encompasses all the individual variables that determine $G(\mathbf{x})$ after training. $\mathbf{x}$ is $a$-dimensional, indicating that $a$ independent variables are used for prediction. It is defined as follows:

$$\mathbf{x} = [\, x_1, x_2, \ldots, x_a \,]^T. \tag{12}$$

A general regression technique requires that the order of the relationship between the predictors and response be fixed before the training process. The relationship could be linear or polynomial. Fixing it in advance hinders the establishment of a mathematical model that is close to the actual values, as the real relationship could be of a very different order from the one assumed. SVR determines the mathematical model differently. To identify a closer model, it uses a kernel function that projects the input variable to an infinitely high-dimensional space, whose dimensions are derived from those of the input space. To fine-tune the model, training is performed on a set of known predictor and response values. At the end of the training process, a linear hyperplane is identified in this high-dimensional space that helps minimize the prediction error. The hyperplane is linear in the high-dimensional derived space but non-linear when projected back into the $a$-dimensional input space.
Projecting into the high-dimensional space and then back into the input space removes the need to fix the order of the mathematical model before training, thus helping to establish a closer relationship between predictors and response. The radial basis function (RBF) is a popular function for transforming the input space to a high-dimensional space. The RBF has been used in our experimental work. Its mathematical equation is given as follows:

$$K(\mathbf{x}, \mathbf{x}') = \exp\left( -\frac{\lVert \mathbf{x} - \mathbf{x}' \rVert^2}{2\sigma_K^2} \right), \tag{13}$$

where $\mathbf{x}$ and $\mathbf{x}'$ are vectors in the feature space $\mathbb{R}^d$ and $\sigma_K$ is the kernel width. Expanding the function in (13) reveals that it has an infinitely high number of dimensions. The kernel value ranges from 0 to 1 and decreases as the distance $\lVert \mathbf{x} - \mathbf{x}' \rVert$ grows. To establish a close relationship between predictors and response, multiple linear regression is performed with each variable being a derived dimension in the projected higher-dimensional space. The established mathematical model forms a hyper-tube of predicted values in the high-dimensional space such that it encloses the actual values. The linear regression is trained on a set of known values such that the overall loss is minimized. The estimation error is measured using a loss function given by the following equation:

$$L_\varepsilon\big(y, G(\mathbf{x})\big) = \begin{cases} 0, & \lvert y - G(\mathbf{x}) \rvert \le \varepsilon, \\ \lvert y - G(\mathbf{x}) \rvert - \varepsilon, & \text{otherwise}. \end{cases} \tag{14}$$

This function is an ε-insensitive loss function that forms a tube of width ε such that if the predicted value lies within the tube, the loss is 0; otherwise, the loss is the distance between the predicted value and the tube boundary. The training process performs a linear regression on this high-dimensional feature space, starting from an initially arbitrary, wide tube. The training process then reduces the width of this tube. This is done by minimizing the loss between the predicted and actual response using the above-mentioned ε-insensitive loss function. It minimizes $\frac{1}{2}\lVert \mathbf{w} \rVert^2$, where $\mathbf{w}$ is the vector normal to the tube. The emphasis is on finding the flattest tube such that most of the predictions lie within its boundaries (Awad and Khanna, 2015; Alam et al., 2021; Drucker et al., 1997).
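The two building blocks of SVR described above, the RBF kernel and the ε-insensitive loss, can be sketched in a few lines. This is only an illustration of the definitions, not a trained SVR; the function names are hypothetical, and the `kernel_width` parameterisation (a Gaussian width in the denominator) is an assumption, since other equivalent parameterisations of the RBF exist.

```python
import math

def rbf_kernel(x, x_prime, kernel_width=1.0):
    """RBF kernel exp(-||x - x'||^2 / (2 * width^2)); the width form is an assumption."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-squared_distance / (2 * kernel_width ** 2))

def epsilon_insensitive_loss(actual, predicted, epsilon):
    """Zero inside the tube of half-width epsilon, linear distance to the tube outside it."""
    return max(0.0, abs(actual - predicted) - epsilon)

# Identical feature vectors give the maximum kernel value; distant ones decay towards 0.
print(rbf_kernel([0.6, 0.1], [0.6, 0.1]), rbf_kernel([0.6, 0.1], [-0.9, 0.8]))

# A 0.5 degree error is ignored with epsilon = 1; a 3 degree error is penalised.
print(epsilon_insensitive_loss(30.0, 30.5, 1.0), epsilon_insensitive_loss(30.0, 33.0, 1.0))
```

The kernel value shrinks as the feature vectors move apart, and the loss ignores prediction errors smaller than ε, which is what lets the fitted tube stay flat.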

Ridge-regression
RR is used to overcome some of the drawbacks of the linear regression technique. It is intended for the analysis of multiple regression data that suffer from multicollinearity, that is, near-linear relationships among the independent variables. When multicollinearity occurs, the least-squares estimates are unbiased, but their variances are so large that the estimated values may be far from the actual values. By adding a degree of bias to the regression estimates, RR reduces the error and thus provides more suitable results.
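The effect of the added bias can be seen in a small sketch. It assumes the usual closed-form ridge solution w = (XᵀX + αI)⁻¹Xᵀy for two features without an intercept; the function name `ridge_fit_two_features`, the penalty value and the toy data (two nearly collinear features) are hypothetical and chosen only for illustration.

```python
def ridge_fit_two_features(rows, targets, alpha):
    """Solve (X^T X + alpha * I) w = X^T y for two features, no intercept (assumed form)."""
    s11 = sum(x1 * x1 for x1, _ in rows) + alpha
    s12 = sum(x1 * x2 for x1, x2 in rows)
    s22 = sum(x2 * x2 for _, x2 in rows) + alpha
    t1 = sum(x1 * y for (x1, _), y in zip(rows, targets))
    t2 = sum(x2 * y for (_, x2), y in zip(rows, targets))
    det = s11 * s22 - s12 * s12
    # Explicit inverse of the 2x2 system matrix.
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)

# Hypothetical, nearly collinear features: the second column is almost a copy of the first.
rows = [(1.0, 1.01), (2.0, 1.98), (3.0, 3.02), (4.0, 3.99)]
targets = [2.1, 3.9, 6.1, 7.9]

ols_weights = ridge_fit_two_features(rows, targets, alpha=0.0)    # ordinary least squares
ridge_weights = ridge_fit_two_features(rows, targets, alpha=1.0)  # with ridge penalty
print(ols_weights, ridge_weights)
```

With α = 0 the nearly singular system yields large weights of opposite sign, whereas a modest penalty pulls both weights towards similar, moderate values.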

Beamforming technique
The DAS algorithm is a beamforming technique that estimates the DOA using the signal power, $P_{DAS}(\theta)$. The DOA is estimated by searching for the values of $\theta$ at which $P_{DAS}(\theta)$ peaks (Awad and Khanna, 2015; Alam et al., 2021; Drucker et al., 1997). $P_{DAS}(\theta)$ is defined as follows:

$$P_{DAS}(\theta) = \mathbf{a}^H(\theta)\,\mathbf{R}_{rr}\,\mathbf{a}(\theta), \tag{15}$$

where $\mathbf{a}(\theta)$ is the look-angle vector of the ULA. The look-angle vector $\mathbf{a}(\theta)$ is scanned over all possible DOA angles to obtain the estimated DOA (Awad and Khanna, 2015).
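A minimal sketch of the DAS scan follows. It assumes the beamformer power is the quadratic form of the look-angle vector with the sample covariance, which equals the mean of |aᴴrₙ|² over the snapshots, and it uses synthetic complex snapshots for one source at 40°; the function names, the 4-microphone array, the noise level and the seed are assumptions made for illustration.

```python
import cmath
import math
import random

def steering_vector(theta_deg, num_mics, d_over_lambda):
    phase_step = 2 * math.pi * d_over_lambda * math.sin(math.radians(theta_deg))
    return [cmath.exp(-1j * i * phase_step) for i in range(num_mics)]

def das_power(snapshots, theta_deg, d_over_lambda):
    """a^H R a with R the sample covariance, computed as the mean of |a^H r_n|^2."""
    a = steering_vector(theta_deg, len(snapshots[0]), d_over_lambda)
    total = 0.0
    for r in snapshots:
        beam_output = sum(a_i.conjugate() * r_i for a_i, r_i in zip(a, r))
        total += abs(beam_output) ** 2
    return total / len(snapshots)

def das_estimate(snapshots, d_over_lambda, step_deg=0.1):
    """Scan 0..90 degrees and return the look angle with the largest output power."""
    grid = [k * step_deg for k in range(int(round(90 / step_deg)) + 1)]
    return max(grid, key=lambda theta: das_power(snapshots, theta, d_over_lambda))

# Illustrative data: one source at 40 degrees, 4 microphones, SNR of roughly 20 dB.
rng = random.Random(1)
d_over_lambda = 0.10 / 0.343
true_steering = steering_vector(40.0, 4, d_over_lambda)
snapshots = []
for _ in range(200):
    s = cmath.exp(1j * rng.uniform(0, 2 * math.pi))
    snapshots.append([a_i * s + complex(rng.gauss(0, 0.07), rng.gauss(0, 0.07))
                      for a_i in true_steering])

estimate = das_estimate(snapshots, d_over_lambda)
print(estimate)
```

The scan simply evaluates the output power on a 0.1° grid and keeps the maximum, so its resolution is limited both by the grid and by the width of the array's main lobe.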

Subspace algorithm-based DOA estimation
The MUSIC algorithm is a subspace algorithm that uses the data collected from the ULA to estimate the covariance matrix and form subspaces. The steering vector is projected onto the noise-only subspace, which leads to the formation of the pseudo-spectrum; the number of peaks in the pseudo-spectrum represents the number of sources, and the angles at which the peaks occur are the estimated DOAs (Zhang et al., 2021). Eigen-decomposition is used to separate the noise subspace from the signal subspace. In this algorithm, eigen-decomposition is performed on the covariance matrix of the output data of the ULA. This decomposition results in a signal-plus-noise subspace and a noise-only subspace, which are orthogonal to each other. Later in the algorithm, this orthogonality is exploited using a steering vector to form a pseudo-spectrum function. The pseudo-spectrum is searched for peaks, and the angle at which a peak occurs becomes the estimated DOA (Ahmad and Zhang, 2016; Liao and Abouzaid, 2014). The implementation of the MUSIC algorithm is as follows:
(1) Estimation of the covariance matrix from the signal vector acquired by the ULA. In practice, $\mathbf{R}_{rr}$ is estimated by averaging over $N$ snapshots. These snapshots are the output data of the $M$ microphones of the ULA captured at $N$ time instants:

$$\hat{\mathbf{R}}_{rr} = \frac{1}{N} \sum_{n=1}^{N} \mathbf{r}_n \mathbf{r}_n^H, \tag{16}$$

where $\mathbf{r}_n$ (of order $M \times 1$) is the output of the $M$ sensors at the $n$th time instant.
(2) The second step involves the eigen-decomposition of the estimated covariance matrix $\hat{\mathbf{R}}_{rr}$, with the assumption that $\hat{\mathbf{R}}_{rr}$ is a non-singular matrix. Being an $M \times M$ matrix, $\hat{\mathbf{R}}_{rr}$ has $M$ eigenvalues and $M$ corresponding eigenvectors.
(3) The third step is the formation of subspaces, using the eigenvalues obtained in the second step. Among the $M$ eigenvalues, the eigenvectors associated with the $D$ largest eigenvalues span the signal-plus-noise subspace. The remaining $M - D$ eigenvalues and their associated eigenvectors represent the noise-only subspace $\mathbf{Q}_n$. If these eigenvalues are $\{\mu_{D+1}, \mu_{D+2}, \ldots, \mu_M\}$ and their corresponding eigenvectors (column vectors of $\mathbf{Q}_n$) are $\{\mathbf{q}_{D+1}, \mathbf{q}_{D+2}, \ldots, \mathbf{q}_M\}$, the noise subspace becomes

$$\mathbf{Q}_n = [\, \mathbf{q}_{D+1}\ \ \mathbf{q}_{D+2}\ \ \ldots\ \ \mathbf{q}_M \,]. \tag{17}$$

(4) The pseudo-spectrum is formed by projecting the look-angle vector onto the noise subspace (i.e. $\mathbf{a}^H(\theta)\mathbf{Q}_n$) and is given by

$$P_{MUSIC}(\theta) = \frac{1}{\mathbf{a}^H(\theta)\,\mathbf{Q}_n \mathbf{Q}_n^H\,\mathbf{a}(\theta)}. \tag{18}$$

The pseudo-spectrum is scanned for peaks by varying the value of $\theta$. For multiple sources, multiple peaks are observed, and the corresponding values of $\theta$ are the estimated DOAs (Ahmad and Zhang, 2016; Liao and Abouzaid, 2014).
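The four steps above can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it uses synthetic complex snapshots for a single source at 40°, a 4-microphone array, and hypothetical function names; the pseudo-spectrum is taken as the reciprocal of the squared norm of the look-angle vector projected onto the noise subspace.

```python
import numpy as np

def steering_vector(theta_deg, num_mics, d_over_lambda):
    phase_step = 2 * np.pi * d_over_lambda * np.sin(np.deg2rad(theta_deg))
    return np.exp(-1j * phase_step * np.arange(num_mics))

def music_spectrum(snapshots, num_sources, d_over_lambda, grid_deg):
    """snapshots: complex array of shape (M, N), one column per time instant."""
    num_mics, num_snapshots = snapshots.shape
    # Step 1: sample covariance matrix averaged over the N snapshots.
    covariance = snapshots @ snapshots.conj().T / num_snapshots
    # Step 2: eigen-decomposition; eigh returns eigenvalues in ascending order.
    _, eigenvectors = np.linalg.eigh(covariance)
    # Step 3: the M - D eigenvectors with the smallest eigenvalues span the noise subspace.
    noise_subspace = eigenvectors[:, : num_mics - num_sources]
    # Step 4: pseudo-spectrum 1 / ||Q_n^H a(theta)||^2 on the search grid.
    spectrum = np.empty(len(grid_deg))
    for k, theta in enumerate(grid_deg):
        projection = noise_subspace.conj().T @ steering_vector(theta, num_mics, d_over_lambda)
        spectrum[k] = 1.0 / np.real(np.vdot(projection, projection))
    return spectrum

# Illustrative data: one source at 40 degrees, SNR of roughly 20 dB (assumed values).
rng = np.random.default_rng(0)
num_mics, num_snapshots, d_over_lambda = 4, 200, 0.10 / 0.343
source = np.exp(1j * rng.uniform(0, 2 * np.pi, num_snapshots))
noise = 0.07 * (rng.standard_normal((num_mics, num_snapshots))
                + 1j * rng.standard_normal((num_mics, num_snapshots)))
snapshots = np.outer(steering_vector(40.0, num_mics, d_over_lambda), source) + noise

grid = np.arange(0.0, 90.0 + 0.05, 0.1)
estimate = grid[np.argmax(music_spectrum(snapshots, 1, d_over_lambda, grid))]
print(estimate)
```

Because `numpy.linalg.eigh` sorts eigenvalues in ascending order, the noise eigenvectors are the leading columns here; with a descending sort they would be the trailing ones.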

Simulation environment and results
The air medium in which the sound wave propagates is assumed to be quiescent, isotropic and homogeneous. The microphones are placed along the x-axis in a uniform linear manner. The beam patterns of the microphones are assumed to be omnidirectional. The separation $d$ between adjacent microphones is 10 cm. A single point sound source placed at a far distance transmits a sinusoidal signal of frequency 1 kHz, which travels at the speed of sound in air, 343 m/s. It is assumed that the source is transmitting signals from the far-field. A sampling rate of 48 kHz is chosen for the received signal, and the signals are recorded for a duration of 25 ms. The attenuation of the signals impinging on the microphone surface is not considered in this analysis. The DOA is measured in the clockwise direction w.r.t. the positive y-axis. Zero-mean white Gaussian noise is added to the received signal vectors at different values of SNR. For every DOA angle, a total of 2,000 independent noisy-signal vectors have been used, of which 1,400 are used for training the regression model and 600 for testing it, for SNR values ranging from 26 to 10 dB in decrements of 4 dB (Awad and Khanna, 2015). For training, data of the 46-ary system is used, where the DOA varies from 0° to 90° in steps of 2°. For testing the trained models, a 91-ary system has been used, where DOAs range from 0° to 90° in steps of 1°. Training has been performed on the signals acquired at the microphones of the ULA with SNR = 26 dB, with 1,400 independent realizations at each DOA; testing, however, has been performed on the signals acquired with SNR = 10, 14, 18, 22 and 26 dB, with 600 independent realizations at each DOA. The PPMCC for every pair of microphones has been calculated for each signal vector and used as the feature in the training and testing of the regression models.
DAS and MUSIC have also been applied to the ULA microphone signals corresponding to SNR = 10, 14, 18, 22 and 26 dB, with 600 independent realizations at each DOA from 0° to 90° in steps of 1° (91-ary). The spatial scanning/searching for peaks w.r.t. θ is done with a step size of 0.1° over the DOA range from 0° to 90°, as per (15) and (18).
The metrics root mean square angular error (RMSAE) and average root mean square angular error ($\overline{\mathrm{RMSAE}}$) have been used to evaluate the performance of the direction-finding techniques. These evaluation metrics are expressed in (19) and (20):

$$\mathrm{RMSAE}(\theta) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \hat{\theta}_i - \theta \right)^2}, \tag{19}$$

where $\hat{\theta}_i$ is the estimated angle obtained using the $i$th realization of the actual angle $\theta$ and $N$ (= 600) is the total number of times the source was at $\theta$. The formula of $\overline{\mathrm{RMSAE}}$ can be written as follows:

$$\overline{\mathrm{RMSAE}} = \frac{1}{N_T} \sum_{k=1}^{N_T} \mathrm{RMSAE}(\theta_k), \tag{20}$$

where $N_T$ is the total number of possible actual DOAs in a given ary (for 91-ary, $N_T = 91$).
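The two evaluation metrics can be sketched as below. The sketch assumes that RMSAE at a given actual angle is the root of the mean squared angular error over its realizations and that the average RMSAE is the arithmetic mean of those values over all actual DOAs; the function names and the tiny data set are hypothetical.

```python
import math

def rmsae(estimates_deg, actual_deg):
    """Root mean square angular error of the estimates for one actual DOA (degrees)."""
    n = len(estimates_deg)
    return math.sqrt(sum((e - actual_deg) ** 2 for e in estimates_deg) / n)

def average_rmsae(estimates_by_angle):
    """estimates_by_angle maps each actual DOA (degrees) to its list of estimates."""
    values = [rmsae(estimates, actual) for actual, estimates in estimates_by_angle.items()]
    return sum(values) / len(values)

# Hypothetical estimates for two actual DOAs, four realizations each.
example = {10.0: [9.0, 11.0, 10.0, 10.0], 20.0: [22.0, 18.0, 22.0, 18.0]}
print(rmsae(example[10.0], 10.0), rmsae(example[20.0], 20.0), average_rmsae(example))
```

In the setting described above, each list would hold the 600 test estimates for one actual angle, and the dictionary would have 91 entries for the 91-ary test set.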
In Figures 3-8, the graphs represent the results of DOA estimation using the different DOA estimation techniques, tested with 600 independent realizations at the different SNR values described above. Figures 3-7 show the variation in RMSAE at SNR values of 10, 14, 18, 22 and 26 dB, with 600 independent realizations for every value of SNR. Figure 8 compares the mentioned DOA estimation techniques in terms of average RMSAE for signals acquired at each SNR. It can be observed from Figures 3 to 7 that the RMSAE curves for RR and PR1 nearly overlap, with considerably more lobes than the other techniques. They also have a higher RMSAE on average than the other techniques. PR2 follows a similar pattern of a higher number of lobes but shows a lower RMSAE for all SNR values considered. The RMSAE curve for DAS is smooth compared with those of PR1, PR2 and RR, but it shows a relatively higher RMSAE at the higher SNR values (14, 18, 22 and 26 dB). Compared with DAS, PR1, PR2 and RR, SVR has the lowest RMSAE, such that its graph is more suppressed towards the bottom. SVR also shows far fewer lobes than the other regression techniques. Lobes in the RMSAE curve are not indicative of a good machine learning model, as the predicted DOA may have a higher error for a randomly tested angle. A common observation among all the machine learning methods and DAS is that the RMSAE is considerably higher towards the end-fire, i.e. angles near 90°. The MUSIC algorithm proves to be the best, having the lowest RMSAE among all the algorithms considered. It also has the fewest lobes of all the techniques. Even close to the end-fire, where all other techniques have a large RMSAE, the MUSIC algorithm shows a small RMSAE value.
It can be inferred from Figure 8 that the average RMSAE for MUSIC is the lowest of all the techniques. In fact, the RMSAE for MUSIC is much lower than that of the other techniques at all SNR values. Apart from MUSIC, SVR performs best, with a lower RMSAE at all SNR values than the remaining methods. PR2 performs better than PR1 and RR.

Conclusion
A comparative analysis of multiple techniques of DOA estimation, namely, SVR, RR, PR1, PR2, DAS and MUSIC, has been performed in this work. The experiments reveal that the MUSIC algorithm outperforms all other techniques in terms of RMSAE. Amongst the machine learning techniques, SVR performs best in terms of RMSAE. Techniques such as PR1, PR2 and RR have a higher RMSAE and have lobes in the RMSAE curve for DOA estimation. These lobes in the RMSAE curve indicate that the predicted DOA may have a higher error for a randomly tested angle. DAS has a smooth curve but has a higher RMSAE than other techniques at higher SNR values. All the techniques show a higher RMSAE at the end-fire, i.e. angles near 90°, but comparatively, MUSIC has the lowest RMSAE near the end-fire, supporting the claim that MUSIC outperforms all other algorithms considered.
In the future, this work can be extended by implementing the root-MUSIC algorithm, which avoids searching the spectrum for peaks and their corresponding angles; instead, it expresses the steering vector in terms of a variable and estimates the DOA from the roots of the resulting polynomial.