Analysis of data-driven approaches for radar target classification

Purpose – This study focuses on the classification of targets with varying shapes using radar cross section (RCS), which is influenced by the target's shape. This study aims to develop a robust classification method by considering an incident angle with minor random fluctuations and using a physical optics simulation to generate data sets.
Design/methodology/approach – The approach involves several supervised machine learning and classification methods, including traditional algorithms and a deep neural network classifier. It uses histogram-based definitions of the RCS for feature extraction, with an emphasis on resilience against noise in the RCS data. Data enrichment techniques are incorporated, including the use of noise-impacted histogram data sets.
Findings – The classification algorithms are extensively evaluated, highlighting their efficacy in feature extraction from RCS histograms. Among the studied algorithms, the K-nearest neighbour is found to be the most accurate of the traditional methods, but it is surpassed in accuracy by a deep learning network classifier. The results demonstrate the robustness of the feature extraction from the RCS histograms, motivated by mm-wave radar applications.
Originality/value – This study presents a novel approach to target classification that extends beyond traditional methods by integrating deep neural networks and focusing on histogram-based methodologies. It also incorporates data enrichment techniques to enhance the analysis, providing a comprehensive perspective for target detection using RCS.


Introduction
Millimetre-wave radar technology has advanced significantly, offering enhanced detection of subtle movements and objects at high frequencies. The radar cross section (RCS) is vital in this progress, improving object visibility and precision in target identification (Jun et al., 2021). To maximise target identification using RCS, extensive research has focused on the statistical aspects of RCS, including analysing how a target's size, shape, orientation and material influence RCS.
The need for reliable target detection and classification drives research that emphasises minimal environmental impact and reduced computational resources. This shift prioritises the use of machine learning applications in the field (Cai et al., 2021; Arab et al., 2021), also considering different signal pre-processing (Kanona et al., 2022) in radar systems. Histogram statistics are attracting growing interest as they reduce the computation associated with deep learning algorithms (Shuai et al., 2020). Prior work (Coskun and Bilicz, 2023a) has primarily used the K-nearest neighbour (KNN) algorithm for the classification of targets, and the artificial neural network (ANN) (Coskun and Bilicz, 2023b) has been explored for classifying targets based on the RCS.
In this study, we delve deeper into the use of histogram-based methodologies for radar target classification, endeavouring to establish a dependable solution. The study extends beyond traditional machine learning methods, integrating deep neural networks to capture the nuances of target detection. Additionally, the research benefits from data enrichment, which incorporates a noise-impacted histogram data set and is presented alongside two supplementary approaches to provide a thorough analytical perspective for the readers.

Histogram-based target classification methodology
Let us define the benchmark problem of radar classification as follows. The target is a conducting plate whose shape is one of the six possibilities shown in Figure 1. The radar is assumed to measure the RCS, i.e. the intensity of the scattering from the target, as sketched in Figure 2. Note that the RCS is measured in decibels relative to one square metre (dBsm). The distance from the radar ($R$), the mean incident angle ($\theta_m$) and the surface of the target ($S$) are allowed to vary within the ranges given in equation (1).
The data sets for the subsequent studies in this work are generated as follows. For each of the six classes, 5,000 histograms are calculated for uniformly distributed random values of $R$, $\theta_m$ and $S$ in the ranges given in equation (1). In total, 6 × 5,000 samples are generated, each consisting of a histogram, or feature vector, of length $b = 100$.
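To make the notion of a histogram feature vector concrete, the following is a minimal sketch in Python rather than the authors' implementation. The bin range (`lo`, `hi`), the normalisation to relative frequencies and the Gaussian stand-in for simulated RCS values are all assumptions made for illustration; only the feature length of 100 comes from the text.

```python
import random

def rcs_histogram(rcs_dbsm, bins=100, lo=-60.0, hi=20.0):
    """Return a normalised histogram (feature vector) of RCS samples in dBsm.

    The range [lo, hi] is a hypothetical choice, not taken from the paper.
    """
    counts = [0] * bins
    width = (hi - lo) / bins
    for value in rcs_dbsm:
        index = int((value - lo) / width)
        # Clamp out-of-range samples into the first or last bin.
        index = min(max(index, 0), bins - 1)
        counts[index] += 1
    total = len(rcs_dbsm)
    return [c / total for c in counts]

rng = random.Random(0)
# Hypothetical stand-in for RCS values observed under small angle fluctuations.
samples = [rng.gauss(-20.0, 5.0) for _ in range(1000)]
feature_vector = rcs_histogram(samples)
```

Each target observation then contributes one such vector of length 100 to the data set, together with its class label.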
The scattering calculations are performed by a physical optics simulation, implemented in Matlab, as presented in Coskun and Bilicz (2023a). The operating frequency of the radar is 24 GHz, which implies a wavelength of 12.5 mm. Therefore, given the target sizes and distances in equation (1), the illumination cannot be approximated as a plane wave; instead, the near-field RCS is considered. It depends on the distance $R$, which varies in the range given in equation (1). Here this near-field RCS is defined for a given distance $R$ in terms of $S_i$ and $S_r$, the incident power density at the target and the reflected power density at the antenna, respectively. A pioneering work is Taylor and Terzuoli (1997), which defines the near-field RCS in terms of Rx and Tx power (i.e. antenna properties are also taken into account). In the present work, however, the focus is on those attributes of the near-field scattering that are associated with the target.
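As a quick check of the stated wavelength, and as a sketch of the form such a distance-dependent definition conventionally takes, the relations can be written as below. The symbol $\sigma_{\mathrm{nf}}$ and the second expression are assumptions: they follow the usual RCS definition with the far-field limit dropped and are not copied from the original equation.

```latex
\lambda = \frac{c}{f} \approx \frac{3 \times 10^{8}\,\mathrm{m/s}}{24 \times 10^{9}\,\mathrm{Hz}} = 12.5\,\mathrm{mm},
\qquad
\sigma_{\mathrm{nf}}(R) = 4\pi R^{2}\,\frac{S_r}{S_i}
```

In the far-field definition the same ratio is evaluated in the limit $R \to \infty$, which removes the dependence on distance; keeping $R$ finite is what makes the quantity vary with distance.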

Traditional machine learning classification
In this section, we investigate how traditional machine learning classification methods perform on the benchmark problem with the data set generated according to the previous section. We reserve 30% of the data as a hold-out validation set, on which the model's accuracy is assessed in the final stages; the remaining 70% of the data set is used for modelling. This hold-out set boosts confidence in the accuracy estimate for unseen data. Furthermore, we use 10-fold cross-validation to evaluate algorithms based on accuracy, which is defined as the ratio of correctly classified objects to the total number of objects. In this method, the data is split into 10 subsets, with 9 subsets used for training and 1 subset used for validation in each iteration. This process is repeated 10 times, with each subset serving as the validation set exactly once, aiming to provide a robust measure of the model's performance.
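The evaluation protocol described above can be sketched as follows, using only the Python standard library. The function names, the random seed and the shuffling before the split are illustrative assumptions; the text does not state whether the split is shuffled or stratified.

```python
import random

def accuracy(y_true, y_pred):
    """Ratio of correctly classified objects to the total number of objects."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def holdout_split(n_samples, validation_fraction=0.3, seed=0):
    """Shuffle sample indices and reserve a fraction as the hold-out set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = int(round(n_samples * validation_fraction))
    return indices[n_val:], indices[:n_val]

def k_fold_indices(indices, k=10):
    """Yield (train, validation) index lists; each fold validates exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

# 30,000 samples: 70% for modelling, 30% held out for the final assessment.
train_idx, holdout_idx = holdout_split(30000)
splits = list(k_fold_indices(train_idx, k=10))
```

A classifier would be fitted on each `train` list and scored with `accuracy` on the matching validation list; the mean and standard deviation over the 10 scores summarise its performance.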

Radar target classification
In the initial stages, we establish a performance baseline against which the effectiveness of various machine learning algorithms will be compared. It is good practice to think of the performance baseline as the initial score to reach, allowing us to observe the performance of the basic, unoptimised models for the chosen algorithms. We then assess various algorithms: linear discriminant analysis (LDA), KNN, Gaussian Naive Bayes (NB), support vector machines (SVM) (Bishop and Nasrabadi, 2006) and classification and regression trees (CART) (Geron, 2019). All algorithms initially use default parameters, and we compute the mean and standard deviation of their accuracy to quantify their effectiveness. Comprehensive details and theoretical principles of these algorithms can be found in Hastie et al. (2009), which serves as an authoritative reference for such machine learning methods.

Spot-check algorithm
The distribution of accuracy values across the cross-validation folds is crucial. The outcomes reveal compact distributions for each algorithm, indicating minimal variance. Both KNN and SVM stand out as viable options for further consideration owing to their high mean accuracy and low variance, as shown in Table 1.

Effect of standardising data
We suspect that the diverse distribution of the original data may impact the performance of some algorithms. To investigate this, we assess the algorithms using a standardised data set. Standardisation involves transforming each feature to have a mean of zero and a standard deviation of one across the 30,000 samples. In other words, each of the 100 bins is individually scaled so that the values in that bin have zero mean and unit variance. By standardising the data and building the model within each fold of the cross-validation framework, we aim to obtain a more accurate estimate of each model's performance. This transformation has been adopted in the pipeline approach, as presented in Table 2. Each pipeline contains a pre-processing (standardisation) step followed by a classifier.
Upon analysis, KNN has maintained its strong performance and even improved: standardising the data set has slightly boosted KNN's classification accuracy, establishing it as the top performer. The NB and LDA algorithms achieve significantly lower accuracy than the other algorithms, as the mean accuracies in Tables 1 and 2 show. These algorithms appear to be inadequate for our purposes. The overall impact of standardisation on the classification algorithms can be observed in Figure 3.
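The reason for standardising inside each fold is that the scaling statistics should come from the training part only and then be reused unchanged on the validation part. A minimal sketch of that idea is given below; the helper names and the three-bin toy data are hypothetical, and the guard for bins with zero spread is an added assumption.

```python
from statistics import fmean, pstdev

def fit_standardiser(rows):
    """Compute the per-bin mean and standard deviation from training rows only."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Guard against bins with zero spread to avoid division by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds

def apply_standardiser(rows, means, stds):
    """Scale every bin with statistics that were fitted elsewhere."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

# Toy feature vectors with three bins instead of 100.
train = [[1.0, 10.0, 0.0], [3.0, 30.0, 0.0], [5.0, 20.0, 0.0]]
valid = [[3.0, 20.0, 0.0]]

means, stds = fit_standardiser(train)           # fitted on the training fold
train_std = apply_standardiser(train, means, stds)
valid_std = apply_standardiser(valid, means, stds)  # reused on the validation fold
```

Chaining `fit_standardiser` with a classifier and repeating both for every fold is what a pipeline automates.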

Algorithm tuning
This section focuses on fine-tuning the parameters of the two promising algorithms identified in the previous phases: KNN and SVM.
3.3.1 KNN tuning. The initial step involves tuning the number of neighbours, K, for KNN. While the default value is 7, we explore a range of odd values from 1 to 21, encompassing the default setting. Each K value is evaluated through k-fold cross-validation using the standardised training data set. The settings assign equal weight to each point in the neighbourhood and calculate distances using the L1 norm. The optimal configuration is observed to be K = 5, resulting in an accuracy of 82.46%; the accuracy values obtained for all tested K values are shown in Figure 4.
3.3.2 Support vector machines tuning. The tuning process for the SVM algorithm involves optimising two parameters: the value of C (which determines the margin relaxation) and the type of kernel used. Similar to the approach used for KNN, a grid search has been conducted using k-fold cross-validation on the standardised training data set. This grid search covers two kernel types, the radial basis function (RBF) and sigmoid kernels (Bishop and Nasrabadi, 2006), and different C values to identify the optimal combination.
The most accurate version of SVM is obtained with the RBF kernel. For the penalty parameter, C = 2 is found to be optimal, with an accuracy of 80.03%. The difference in performance between the RBF and sigmoid kernels (Figure 5) can be caused by several factors, including kernel stability, data characteristics and the multiclass strategy. The sigmoid kernel may not be suitable for our data set because it tends to behave like a linear classifier on high-dimensional data sets. Furthermore, the RBF kernel tends to have a higher model complexity than the sigmoid kernel, which could contribute to better capturing the underlying patterns in our data set. Since KNN achieves better performance (82.46% versus 80.03%), the model is finalised with KNN.
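The tuning step can be illustrated with a small pure-Python KNN that uses the L1 norm and equal weights. This is a sketch under simplifying assumptions: the two-class toy data are hypothetical, only K = 1, 3 and 5 are swept, and a single train/test split stands in for the k-fold cross-validation used in the study.

```python
from collections import Counter

def manhattan(a, b):
    """L1 distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k nearest training points, all weighted equally."""
    order = sorted(range(len(train_x)), key=lambda i: manhattan(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def knn_accuracy(train_x, train_y, test_x, test_y, k):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train_x, train_y, q, k) == t for q, t in zip(test_x, test_y))
    return hits / len(test_y)

# Hypothetical toy data: two well-separated classes in two dimensions.
train_x = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
train_y = [0, 0, 0, 1, 1, 1]
test_x = [[0.05, 0.1], [1.0, 0.95]]
test_y = [0, 1]

# Sweep odd K values and keep the best-scoring one.
scores = {k: knn_accuracy(train_x, train_y, test_x, test_y, k) for k in range(1, 6, 2)}
best_k = max(scores, key=scores.get)
```

For the study's setting, the sweep would run over the odd values 1 to 21 and each score would be the mean accuracy over the cross-validation folds.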
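For reference, the two kernels compared in the grid search are commonly written as below. The parametrisation with $\gamma$ and an offset $r$ is the usual textbook convention and is an assumption here, since the text does not state the kernel parameters used.

```latex
k_{\mathrm{RBF}}(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\gamma \lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\right),
\qquad
k_{\mathrm{sigmoid}}(\mathbf{x}, \mathbf{x}') = \tanh\!\left(\gamma\, \mathbf{x}^{\top}\mathbf{x}' + r\right)
```

The RBF kernel depends only on the distance between two histograms, whereas the sigmoid kernel depends on their inner product.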

Final model
In this section, we present the finalisation of our chosen model, KNN, which has shown promise as a robust and low-complexity solution for our problem. The objective here is to consolidate our model's performance by conducting comprehensive evaluations. We begin by training the KNN model using the entire training data set and subsequently evaluating its predictive capabilities on a dedicated hold-out test data set. This testing step serves to validate the robustness and generalisation of our model. Furthermore, we investigate the model's robustness in the presence of additive white Gaussian noise (AWGN) with varying standard deviation values. To this end, we explore three distinct scenarios:
(1) Baseline approach: In this scenario, we use a pure data set without AWGN to train and test the KNN model. This setting establishes a baseline for comparison.
(2) Naive approach: The AWGN is introduced solely during the testing phase, while the training data set remains unaltered, consisting of pure RCS histograms. This setup allows us to assess the model's resilience to noise in the testing data.
(3) Data enrichment approach: This approach involves creating an additional data set of noise-affected histograms and fusing it with the original pure histogram data set (Needham, 2021). The combined data set, now consisting of 60,000 histograms, each with 100 bins, is used for training and testing the KNN model. We anticipate that this approach will enhance the model's robustness by enabling it to adapt to more challenging scenarios involving both pure and noisy histograms.
In summary, this section covers the finalisation of our KNN model, addressing its performance, robustness and adaptability to noisy data scenarios. AWGN has been implemented as follows:
$\tilde{s}_i = s_i + z_i$,
where $z_i$ is a realisation of the zero-mean Gaussian random variable $Z \in \mathcal{N}(0, \mathrm{STD})$, $s_i$ is the pure RCS value from the PO simulation and $\tilde{s}_i$ is its noise-corrupted counterpart. STD stands for the standard deviation of the noise, which will be a varying parameter in the subsequent studies.
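The noise model and the data enrichment scenario can be sketched together in a few lines of Python. Everything specific in this example is an assumption for illustration: the uniform stand-in for pure RCS traces, the bin range, the noise level of 2.0 and the toy size of one trace per class.

```python
import random

def add_awgn(rcs_values, std, rng):
    """Noise-corrupted counterpart: each pure sample s_i becomes s_i + z_i."""
    return [s + rng.gauss(0.0, std) for s in rcs_values]

def histogram(values, bins, lo, hi):
    """Count values per bin, clamping out-of-range values to the edge bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1
    return counts

rng = random.Random(1)
# Hypothetical pure RCS traces (dBsm), one per class label 0..5.
pure_traces = [([rng.uniform(-30.0, 0.0) for _ in range(200)], label) for label in range(6)]

# Pure histograms, and histograms recomputed from the noise-corrupted traces.
pure_set = [(histogram(t, 100, -60.0, 20.0), y) for t, y in pure_traces]
noisy_set = [(histogram(add_awgn(t, 2.0, rng), 100, -60.0, 20.0), y) for t, y in pure_traces]

# Data enrichment: fuse the pure and noise-affected histogram sets.
enriched_set = pure_set + noisy_set
```

With 30,000 pure histograms, the same fusion yields the 60,000-histogram set of the data enrichment approach, whereas the naive approach would train on `pure_set` alone and test on noisy histograms.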
Figure 6 presents a visual representation of our model's performance analysis. The baseline method exhibits consistent and stable performance across the tested conditions. Conversely, the naive approach is less effective in handling noisy testing histograms; Figure 6(b) zooms into a narrower range to show the behaviour of the naive approach. The data enrichment strategy emerges as a viable and robust alternative. This approach allows the algorithm to adapt effectively to more challenging scenarios by using an extended data set for both training and testing purposes. Its capability to handle noisy data and yield improved results underscores the efficacy of this approach.
In conclusion, the study compares the performance of the three testing scenarios. The data enrichment approach is the most effective among the three, especially in noisy environments. It not only starts with the highest accuracy at no noise but also degrades gracefully as noise increases. This emphasises the advantages of the data enrichment approach for addressing the challenges posed by noisy testing histograms, which are representative of a real-life scenario.
Table 3 depicts the average prediction time for the models. The training time for KNN can be zero or near zero, as KNN is a lazy learning algorithm, which means that it does not have a training phase in the traditional sense that we observe for SVM or neural networks. In the KNN approach, the actual computation happens during the prediction phase, when the distances between data points are calculated to determine the nearest neighbours. All tests, including the numerical experiments that follow, were conducted on a machine equipped with an 11th Gen Intel(R) Core(TM) i5 processor and 8 GB RAM, running a 64-bit operating system on an x64-based processor.

Deep neural network classification
A neural network classifier has been used for the classification of RCS histograms to achieve greater efficiency. After applying an extensive grid search to optimise hyperparameters such as the number of neurons, the number of layers and the choice of the optimiser, as seen in Table 4, we establish a foundational neural network for our RCS histogram classification task.
The final feed-forward neural network is a fully connected model with five layers containing 30 neurons. The hidden layers use rectified linear unit functions (Goodfellow et al., 2016), while the output layer, representing six different one-hot encoded classes, uses the Softmax activation function to produce a probability distribution. The training uses the ADAM gradient descent optimiser (Kingma and Ba, 2017) and the sparse categorical cross-entropy loss function. We assess the model's learning performance using fivefold cross-validation with data shuffling. The final outcome is the average accuracy across all data set partitions. The optimisation is enhanced by applying learning rate scheduling as a form of regularisation. The learning rate is gradually decreased by a factor of 0.5 every 50 epochs. This introduces noise into the optimisation process, preventing over-fitting by discouraging the model from fitting the training data too closely. During training, an early stopping criterion is used to prevent over-fitting and reduce the training time. The patience parameter is set to 10, so training stops if there are 10 epochs in a row without any improvement in the validation loss. Figure 7 demonstrates the ability of the model to distinguish the different classes. For more challenging cases closer to real-life problems, we also consider the three distinct scenarios defined in Section III-D; this time, however, we introduce early stopping criteria to reduce the computational cost and time for training the neural networks in each scenario.
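The two training controls described above, halving the learning rate every 50 epochs and stopping after 10 epochs without improvement, can be expressed independently of any deep learning framework. The sketch below assumes zero-based epoch counting and a hypothetical initial learning rate of 0.001; neither is stated in the text.

```python
def scheduled_learning_rate(initial_lr, epoch, drop=0.5, every=50):
    """Step decay: multiply the learning rate by `drop` once per `every` epochs."""
    return initial_lr * drop ** (epoch // every)

def early_stopping_epoch(val_losses, patience=10):
    """Return the epoch at which training stops.

    Training halts once `patience` consecutive epochs bring no improvement
    in validation loss; otherwise it runs to the last epoch.
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_losses) - 1

# Hypothetical validation-loss curve: improves for three epochs, then stalls.
losses = [1.0, 0.8, 0.7] + [0.75] * 15
stop_at = early_stopping_epoch(losses, patience=10)
lr_at_epoch_100 = scheduled_learning_rate(0.001, 100)
```

In this toy curve the best loss occurs at epoch 2, so the tenth stale epoch, and hence the stop, falls at epoch 12.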
The accuracy of the baseline approach fluctuates between 82.94% and 84.86% across all noise levels. The baseline model is unaffected by noise, so the performance fluctuations seen in Figure 8 do not correspond to changes in noise levels; instead, they reflect the inherent variability due to data partitioning. At the lowest noise level, the naive approach slightly outperforms the baseline with an accuracy of 84.63%. As the noise level increases, there is a sharp drop in accuracy, indicating that the naive approach is highly sensitive to noise. The enrichment approach at the lowest noise level outperforms both the baseline and naive approaches with an accuracy of 87.45%. As the noise level increases, there is a gradual decline in accuracy. However, even at the highest noise levels, the enrichment approach maintains a relatively high accuracy of 70.58%, suggesting that the data enrichment strategy helps preserve the model's performance under noisy conditions. In summary, when dealing with noisy data or expecting varying noise levels, the enrichment approach is the most suitable. The naive approach, despite its initial promise, quickly deteriorates with noise, while the baseline provides consistent performance. The mean training and prediction times of the three approaches are shown in Table 5.

Conclusion
In this work, a comprehensive analysis of data-driven approaches to radar target classification based on RCS histograms has been presented. The study highlights the robustness and efficacy of traditional machine learning algorithms and deep neural network models in dealing with the inherent challenges. Specifically, three approaches to introducing noise into the data, namely, the baseline, naive and data enrichment approaches, have been studied across varying noise levels. The baseline approach, without noise augmentation, highlights the model's inherent performance variability due to the randomness of data partitioning. The naive approach reveals a notable accuracy degradation with increasing noise when noise is introduced only in the test data set. On the other hand, the data enrichment approach, blending noise-affected and pure data, enhances the model's robustness and significantly improves performance across various noise conditions. Additionally, the study investigates deep neural networks with fine-tuned hyperparameters, which show a promising capability in distinguishing the target classes, even in noise-affected scenarios. Using early stopping criteria during training enhances model efficiency and balances computational resources against performance optimisation. Essentially, the emphasis on data enrichment shows how adding more varied data can affect the classification model's robustness and accuracy. Throughout this work, the achieved results emphasise the potential of data enrichment strategies, along with machine learning and deep learning models, for classification, offering more reliable and noise-resilient radar target classification systems. The insights from this study can help steer future research towards developing robust and reliable radar target classification solutions based on histogram features. Future work will also include the use of real measured data for testing the classifiers.
Figure 1. Shape of the targets in the classes 0...5

Figure 2. Scattering configuration in the benchmark problem
Figure 3. Accuracy comparison
Figure 4. KNN tuning results
Figure 6. Comparative analysis of the effects of AWGN noise on KNN classification accuracy with zero-mean Gaussian random variable $Z \in \mathcal{N}(0, \mathrm{STD})$ and different standard deviation values

COMPEL
Figure 7. Receiver operating characteristics curve analysis for the flat plates (see Figure 1 for classes)
Figure 8. Comparative effects of AWGN noise on ANN classification accuracy performance for three distinct scenarios

Table 1. Performance

With these pipelines, we are able to test the performance of different classifiers on the standardised data. This is especially useful for observing how each classifier performs and for selecting candidates for further steps such as hyperparameter tuning. This strategy enables us to objectively estimate the performance of each model on unseen data, using standardised data.

Table 4. Grid search parameters
Source: Authors' own work