Subjective assessment for an advanced driver assistance system: a case study in China

Purpose – This study aims to propose a novel subjective assessment (SA) method for Level 2 or Level 2+ advanced driver assistance systems (ADAS) with a customized case study in China. Design/methodology/approach – The proposed SA method contains six dimensions: perception, driveability and stability, riding comfort, human-machine interaction, driver workload and trustworthiness, and exceptional operating case. Each dimension comprises several subsections, which describe the corresponding details under this dimension. Findings – Based on the proposed SA, a case study in China is conducted. Eighteen drivers with different driving experiences are invited to give their subjective ratings for each subsection according to a predefined rating standard. The rating results show that the ADAS from Tesla outperforms that of the upcoming electric vehicle in most cases. Originality/value – The proposed SA method is beneficial for original equipment manufacturers developing related technologies in the future.


Introduction
Automated driving (AD) technologies have been researched and developed for over a decade. However, due to costs, reliability, legislation, etc., fully automated driving cannot be put on the market yet. Lower-level AD systems (Level 2 or 2+) are commonly accepted by consumers [for the details of AD system classification, readers can refer to the international standard J3016 from the Society of Automotive Engineers (SAE J3016, 2016)]. This level of AD system is also called the advanced driver assistance system (ADAS). It can reduce driver workload, relieve the driver of physical participation, increase traffic safety, optimize fuel efficiency and so forth (Hasenjäger and Wersing, 2017; Ziebinski et al., 2016, 2017). The benefits of ADAS and AD systems have motivated many well-known Tier 1 suppliers and automotive manufacturers to commercialize ADAS products for many years, such as Robert Bosch, Continental AG, Delphi Technologies, ZF Group, Tesla, Volvo, General Motors, Ford and VW. Most of their products concern lateral vehicle dynamics control and longitudinal speed and spacing control. Adaptive cruise control (ACC), lane-keeping assistance (LKA) and auto lane change (ALC) are three major functions of commercialized ADAS. ACC is an extended cruise control system that automatically adjusts the ego vehicle speed to keep a safe distance from the preceding vehicle. LKA applies automatic steering or braking when the vehicle starts to deviate from the lane. At the same time, visual, audible or haptic warnings, such as seat or steering wheel vibrations, are activated to warn the driver when the vehicle is approaching or crossing the lane markings. ALC helps the driver change lanes safely and efficiently by means of an algorithm that makes decisions according to the perception information. When ACC, LKA and ALC are integrated, the system is generally called an L2 or L2+ ADAS.
The general process for developing ADAS follows a V-model, as shown in Figure 1 (Koopman and Wagner, 2016; Crolla, 1996; Acosta et al., 2018; Perricone et al., 2020). Experimental verification is an unavoidable part of every step before a new technology is launched into the market. Besides, as the vehicles are sold to consumers, human responses have to be considered at the same time. This raises the question of how to assess the developed systems from the perspective of drivers. In general, most developers use the subjective assessment (SA) method to quantify the performance of these systems. Specifically, SA should be conducted throughout the whole development process to optimize the performance of the developed systems or functions.
SAs for traditional steering ability, maneuverability, handling stability, driveability, dynamic comfort, etc., have been developed and accepted by all automotive original equipment manufacturers (OEMs). Plenty of existing papers focus on the assessment of traditional vehicle dynamic responses, while few works describe the SA of ADAS or AD systems. Gil Gómez et al. have conducted in-depth research on the correlation between SA and objective metrics (OM) for steering and handling performance. In Gil Gómez et al. (2016), a comprehensive review of these correlations is presented, and the key parameters for these performances are summarized. To further correlate SA and OM, the authors also describe methods such as linear regression, nonlinear neural networks, phase analysis and fuzzy logic for generating a mathematical function that describes the relationship between SA and OM. This research also details the test procedures, subjects, maneuvers, data processing steps and preferred parameter values. However, as different people have different tastes in a specific vehicle, the authors state that the correlation between SA and OM for vehicle handling and steering still needs deep calibration. Gil Gómez et al. (2015) investigate the verbal and numerical SA results of vehicle handling and steering feel, including the judgment scales, rating trends and evaluation spread of expert drivers. In that work, two types of testing methods are compared and analyzed, namely, short drives of multiple vehicles on a predefined path and extensive free driving of a single vehicle. To maintain the repeatability of subjective ratings, blind tests have to be conducted first. The SA results show that the former method is time-saving and efficient.
For the verbal SA, a word pool containing key factors of steering feel and handling performance, such as on-center, roll control, understeer gradient and yaw damping, is used to record real-time impressions while the driver is performing SA experiments. Besides, counting first-expression comments, directly asking the experts and SA-SA correlation metrics are used to identify the key parameters for handling and steering feel. The results indicate that steering response, steering torque build-up and deadband, understeer gradient, etc., carry more weight than other factors. These key factors can be used to optimize the SA ratings in the further search for SA-OM correlations. Moreover, more advanced machine learning algorithms are applied in Gil Gómez et al. (2018). The authors adopt two artificial neural networks to close the gap between SA and OM for steering feel. The first neural network, a self-organizing map (SOM), is developed to generate an OM data map purely based on the OM data from similar vehicles, which produces a clear visualization of the qualitative data for the measured vehicles. Besides, the map also indicates the subjective key factors of the vehicle under development. The second, a general regression neural network (GRNN), is introduced on top of the SOM. Its regression surface is associated with the SA ratings. The projected regression surface helps engineers identify clusters of similar vehicles and how the SA ratings are distributed within them. By setting the desired values of the key parameters, the SOM + GRNN network map helps engineers foresee the steering feel performance while developing a new vehicle. This work is extremely helpful for the correlation between SA and OM. If a large database can be built, it can be extended to assess handling, driveability, ride comfort (Ao et al., 2020a, 2020b), etc.
This research is effective for OEMs because it saves time and shortens the development period of new vehicles. Winter tests for vehicle handling are analyzed and optimized in Gil Gómez et al. (2017). It is difficult for engineers to acquire repeatable data with a high signal-to-noise ratio in winter tests. That research presents robust maneuvers and OM by using steering robots on low-adhesion ice together with complex simulation models. Bicycle vehicle models with a brush tire model and a magic formula tire model, respectively, are built to generate an optimization model. At the same time, a vehicle model in VI-CarRealtime is built based on experimental results, kinematic and compliance measurements and tire performance on ice, which aims to provide good transient performance and to predict and improve the winter test metrics. This research helps to calibrate and modify the winter test maneuvers and makes the winter test results more robust against weather changes.
However, all these studies focus on the traditional SA of handling stability and steering feel, whereas this research focuses on the SA of ADAS. An early objective and subjective evaluation of ADAS was reported by Auckland et al. (2008). The researchers develop two simple control logics for ACC and LKA. After integrating them, the objective and subjective evaluations of the ADAS are conducted on a hardware-in-the-loop testbed. Experimental results from the testbed and questionnaires show that ride comfort is increased with ADAS, while driving pleasure is decreased. Note that the ADAS developed in that work aims to maintain the vehicle headway and match the velocities of the following and leading cars. Moreover, it also minimizes the jerk induced by acceleration or deceleration. At the same time, the maximum levels of acceleration that the system can exert are constrained to between about −0.2 g and 0.1 g. For these reasons, ride comfort is largely enhanced. Driving pleasure, in contrast, is generally associated with agility and rapid response to the steering wheel and acceleration pedal. These are possible reasons for the opposite perceptions of "ride comfort" and "driving pleasure." In addition, the driver's reaction performance is reduced when the ADAS disengages, while collisions are less likely to happen with the help of ADAS. The SA in that research is based on four dimensions, namely, acceptability, concentration, experience safety and trust. However, detailed descriptions of these dimensions are not given. Besides, the robustness of the ADAS, for example against weather changes and illumination variations, is not disclosed.
The mental stress of drivers when using the LKA system is researched in Schick et al. (2019). A professional team is set up which covers expert engineers from psychology, vehicle dynamics measurement, product design and data processing.
The key parameters of the physiological stress of the subjects are defined in this research. The benchmark LKA performance is recorded with three premium vehicles. The statistical analysis of results from 50 subjects shows that the steering effort is decreased with the help of the LKA system. While this decrease mainly relates to physical stress, the mental stress of the subjects is relatively increased under the LKA system. Unpredicted drop-off of the LKA system, weak tracking performance, high attention effort and unclear system boundaries result in low trust in and acceptance of the system. This work points to a direction for future development of ADAS/AD systems, namely, development based on driving attributes and experience. Acceptance and trust are the most important aspects of these systems. Generally, SA is conducted at the customer level and the expert level. Similar research (Seidler and Schick, 2018) from the same research team states that the customer and expert evaluations of the LKA system show the same tendency and a good correlation. However, these studies focus only on LKA performance and do not account for the integrated ADAS.
In summary, the contributions of this work are as follows: 1) A novel and holistic SA method is proposed for an integrated Level 2 or 2+ ADAS, which covers ACC, LKA, ALC, etc. The SA method contains six assessment dimensions, namely, perception, driveability and stability, riding comfort, human-machine interaction (HMI), driver workload and trustworthiness and exceptional operating case. Each dimension has detailed subsections covering multiple scenarios to give a robust SA of ADAS. The proposed SA method can provide insights for research and development; 2) Based on the proposed SA method, a case study is conducted involving two vehicles, a Tesla Model S and an upcoming vehicle.
The rest of this paper is organized as follows. Section 2 presents the details of every dimension. Section 3 applies the proposed SA method to a case study in China (a comparison between a Tesla Model S and an upcoming electric vehicle [EV]), with result analysis and discussion. Section 4 concludes this paper.

Subjective assessment method for advanced driver assistance system
To develop a Level 2 or Level 2+ ADAS, the most common way of assessing performance in a research and development (R&D) center is SA by many drivers under different working conditions. It is known that perception, motion control, decision-making and planning are four key parts of the AD system. Based on an analytical decomposition of these four key parts, we decompose the ADAS performance into six dimensions, which aim to describe the perception performance and traditional dynamic performance. The details are given below.

Perception
Perception acts as the "eye" of the ADAS and provides fundamental information to the system. The perception performance depends on multiple types of sensing, namely, self-sensing, localization and surrounding sensing. For the SA of ADAS, only surrounding sensing is considered, that is, medium-range radar, ultrasonic radar and camera. Thus, the corresponding performances for SA are lane edge detection and static and moving object detection. The explanations for these two parts are given as follows:
2.1.2.2 Static object detection. Static object detection is a vital problem for AD/ADAS. As is known, multiple traffic accidents with ADAS active have been caused by the false identification of static objects. Thus, it is essential to focus on the precision of recognizing static objects (static vehicles, people, walls, etc.) while the ADAS is running.
2.1.2.3 Robustness of moving and static object detection. Similar to lane edge detection, the robustness of moving and static object detection should also be included in the SA experiment. The robustness verification method is the same as that for lane edge detection.
The detailed points with respect to the perception of the ADAS are described in Table 1.

Driveability and stability
Driveability and stability mainly concern the longitudinal and lateral vehicle dynamics (Ao et al., 2021). Both interact directly with the driver's perception and can determine the acceptance of ADAS. The details are given below:

Driveability
Driveability mainly concerns the longitudinal vehicle dynamics and is highly correlated with the ACC system. The SA focuses on: the performance of the drive motor or engine; the coordination of the mechanical or electric transmission system; and braking performance in different situations.
Besides, the stop-and-go reaction time, the reaction ability and speed in cut-in and cut-out situations and the pitch behavior during acceleration/deceleration should also be assessed. The driveability assessment is conducted both on straights and on curves at different longitudinal speeds.

Lateral stability
The lateral dynamic performance of the vehicle induced by ADAS plays an important role in consumer satisfaction. For the lateral performance, the SA mainly examines the yaw motion control performance and its robustness against external disturbances (Ao et al., 2020a, 2020b). During the experiments, the driver assesses the yaw rate control, the swing vibration of the steering wheel, the overshoot of the steering wheel when LKA or ALC is running, etc. The robustness of the lateral performance of the ADAS is evaluated through the stability in hairpin turns, in crosswind, on slopes and on rough roads. In addition, the roll angle is also taken into account at high vehicle speed. Different longitudinal speeds are tested to give a fair score of the system performance.
Figure 2 Typical and infrequent lane markers
The details under the driveability and stability dimension are summarized in Table 2.

Riding comfort
For the riding comfort of ADAS, the SA concentrates on the effects induced by steering, acceleration and braking. For steering-related comfort, we observe the steering wheel or yaw rate oscillation while the vehicle tracks a variable-curvature path. The longitudinal acceleration surge (also known as jerk) generated by acceleration and braking actions is the key factor causing discomfort to drivers and passengers. Furthermore, the roll motion is expected to be suppressed by the ADAS; that is, the vertical acceleration should also be assessed under this dimension. The details corresponding to this dimension are given in Table 3.
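Jerk is the time derivative of acceleration, so a logged longitudinal acceleration trace can be converted into a jerk trace to accompany the drivers' comfort ratings. The following is a minimal sketch under stated assumptions, not part of the SA procedure itself: the function names `jerk` and `peak_jerk`, the uniform sampling interval and the sample values are all hypothetical.

```python
def jerk(accel, dt):
    """Forward-difference jerk (m/s^3) from uniformly sampled
    longitudinal acceleration (m/s^2) with sampling interval dt (s)."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    return [(a1 - a0) / dt for a0, a1 in zip(accel, accel[1:])]


def peak_jerk(accel, dt):
    """Largest absolute jerk in the trace, a simple comfort indicator."""
    values = jerk(accel, dt)
    if not values:
        raise ValueError("at least two acceleration samples are required")
    return max(abs(j) for j in values)


# Hypothetical acceleration samples taken every 0.25 s during a braking event.
samples = [0.0, 0.5, 1.0, 1.0, 0.25]
print(jerk(samples, 0.25))       # [2.0, 2.0, 0.0, -3.0]
print(peak_jerk(samples, 0.25))  # 3.0
```

A finite difference amplifies sensor noise, so a real trace would normally be low-pass filtered first; that step is omitted here for brevity.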

Human-machine interaction
The HMI of ADAS conveys abundant information to drivers. The HMI of ADAS comprises the alarm sounds from the ADAS and the pretensioning seat belt; the real-time information on interior displays, such as ADAS start, standby, malfunction and off; and the ADAS operating handle. The HMI is considered acceptable if it is clear, convenient, accurate and unobtrusive. This dimension of SA is assessed concurrently with the tests of the preceding dimensions. The summarized assessment details for the HMI are given in Table 4.

Driver workload and trustworthiness
This dimension is related to the physiological and psychological feelings of drivers. When the ADAS is active, two key points largely determine the score on this dimension: the timing and magnitude of the correction torque applied to the steering wheel, and the feeling of tension during autonomous turning or lane-keeping adjustment. Beyond that, according to the Road Traffic Safety Law, drivers must not take their hands off the steering wheel. Hence, the sensitivity of hands-off detection and the duration and warnings of the hands-off state issued by the ADAS should also be included in the SA. Driver trustworthiness of ADAS expresses whether the performance of the system can be trusted with respect to the driver's own driving behavior. The detailed concerns for this dimension are presented in Table 5.

Exceptional operating case
As the traffic environment is highly complex, we cannot cover all scenarios in the SA experiments. Hence, this evaluation dimension is named the exceptional operating case, which comprises some infrequent working conditions, such as overtaking an overlength or extra-wide vehicle and traveling through a motorway construction zone or a motorway guide road, as shown in Figure 3. The SA on this dimension is essential but needs continuous updating to cover more uncommon scenarios. A brief summary of this dimension is given in Table 6.
This section presents an overview of the six SA dimensions for evaluating an L2 or L2+ ADAS. Admittedly, the aforementioned dimensions are still insufficient due to the high complexity and uncertainty of the transportation system. However, what we demonstrate here is a subjective evaluation method and the parts that should be considered. How to define the evaluation dimensions, how to decouple the vehicle responses based on different dynamics and perception performance and how to verify the robustness of the system are key factors in designing the SA method for ADAS. It is compulsory to conduct the related SA as long as this technology is connected to vehicle safety and drivers.

A case study in China to assess the Tesla Model S and an upcoming pure electric vehicle
In this section, a case study in China comparing the SA of two L2+ ADASs is demonstrated based on the method proposed in Section 2. It is well known that Tesla became the most highly valued automotive company in the world in 2020. The Autopilot system from Tesla attracts a lot of consumers. Moreover, its over-the-air update system delivers new features and enhances the existing ADAS over Wi-Fi (Coe et al., 2019). Furthermore, the ADAS from Tesla has become the benchmark in the automotive market, which is why we choose the Tesla Model S as the comparison object. The upcoming pure EV is equipped with a self-developed ADAS algorithm (Yang et al., 2018). These two ADASs have almost the same hardware configuration (millimeter-wave radar supplied by Continental and Bosch + multiple cameras + NVIDIA Tegra and Xavier). The nearly identical hardware configurations are the prerequisite for the comparison and make the SA meaningful and fair. Besides, the states of the vehicles (battery capacity, coolant, tire pressure, etc.) should be checked to keep the assessment data reliable. The details of the SA comparison are given in the following subsections.

The subjective rating standard
The subjective rating standard originates from the international standard J1060 of the Society of Automotive Engineers (SAE J1060, 2000). The proposed subjective rating scale is slightly modified from J1060 according to enterprise standards. The rating scale ranges from 0 to 10, with 10 representing the best level of ADAS. Specific information can be found in Appendix 1. The proposed rating scale has been verified through other types of assessment, such as handling stability and braking pedal feeling tests, and it is also workable for ADAS. To adequately describe the subjective value, the rating scale is divided into 10 segments. Every test driver is required to fill out a questionnaire and assign a suitable score according to the actual performance of the ADAS. In practice, each dimension and its subsections are assessed more than 10 times, and an average score is then assigned to the dimension.
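The averaging of repeated ratings can be sketched as follows. This is an illustration under stated assumptions rather than the authors' exact procedure: the paper does not say how subsection scores are combined, so equal weighting of subsections is assumed here, and the function names, subsection labels and ratings are hypothetical.

```python
from statistics import mean


def subsection_score(ratings):
    """Average the repeated 0-10 ratings one driver gave to one subsection."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        if not 0 <= r <= 10:
            raise ValueError(f"rating {r} is outside the 0-10 scale")
    return mean(ratings)


def dimension_score(subsections):
    """Average the subsection scores of one dimension.

    Assumption: every subsection counts equally within its dimension.
    """
    return mean(subsection_score(r) for r in subsections.values())


# Hypothetical repeated ratings from one driver for the perception dimension.
perception = {
    "lane edge detection": [7, 8, 7.5],
    "static object detection": [6, 6.5, 7],
}
print(dimension_score(perception))  # 7.0
```

Validating each rating against the 0-10 range catches questionnaire transcription errors before they distort the average.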

Subjects
The subjects invited to this SA comparison comprise five novice drivers (who have less than two years of driving experience), six experienced drivers (who have more than five years of driving experience) and seven professional drivers (who work in the vehicle test department of an automotive OEM). The reasons for this allocation of driver experience are: to simulate a hierarchical driver structure; to give a customer-level, medium-level and expert-level SA; and to ensure the reliability and scientific rigor of the experimental results.
The demographic information is shown in Table 7.

Experimental map
To ensure the coherence of the SA data, the route for the SA experiment is almost fixed except for the final dimension (exceptional operating case), because the situations in this dimension occur only occasionally. Hence, this dimension of SA is treated separately. Furthermore, the robustness assessments related to weather change, illumination variation, etc., are also treated separately. For the remaining assessment dimensions, the routes are required to be as similar as possible. The traffic environment in China differs somewhat from that of other countries, and abnormal cases such as unusual lane changes, close cut-ins or cut-outs by vehicles, motorcycles, bicycles or pedestrians, a very large variety of obstacles, dense traffic and irregular traffic patterns occur frequently. Under these circumstances, the selection of the SA experimental route is extremely important. Our research group spent more than one week finding an appropriate route that covers multiple complex traffic environments, as shown in Figure 4. Besides, this route corresponds to everyday driving and contains about 45%-55% urban roads, 20%-30% motorway and about 15%-35% other types of roads. These percentages come from the enterprise standard for vehicle endurance tests. The travel distance and time estimated with Google Maps are about 162 km and 160 min, respectively. This distance and time are generally enough to encounter all the SA dimensions and subsections more than 10 times.
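A candidate route can be checked against the road-type shares quoted above with a few lines of code. The sketch below is only an illustration: the paper gives the target ranges and the total distance of about 162 km but not the actual split, so the per-type distances, the dictionary keys and the function names are hypothetical.

```python
# Target share ranges quoted in the text (fractions of total route length).
TARGET_SHARES = {
    "urban": (0.45, 0.55),
    "motorway": (0.20, 0.30),
    "other": (0.15, 0.35),
}


def route_shares(segments_km):
    """Convert per-road-type distances (km) into fractions of the route."""
    total = sum(segments_km.values())
    if total <= 0:
        raise ValueError("route length must be positive")
    return {road: km / total for road, km in segments_km.items()}


def check_route(segments_km, targets=TARGET_SHARES):
    """Report, per road type, whether its share lies in the target range."""
    shares = route_shares(segments_km)
    return {
        road: low <= shares.get(road, 0.0) <= high
        for road, (low, high) in targets.items()
    }


# Hypothetical split of a 162 km route; the real split is not given.
candidate = {"urban": 81.0, "motorway": 40.5, "other": 40.5}
print(route_shares(candidate))
print(check_route(candidate))
```

Returning one flag per road type, rather than a single pass/fail value, shows which part of a candidate route needs adjusting.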

Subjective assessment results and discussion
The SA results are the overall scores given by the 18 test drivers when driving on the aforementioned route. In this research, the final score of each dimension is derived from the ratings of all 18 drivers. Because the drivers differ in experience, different weights should be given to different driver levels; a weighted average is likely to represent the final scores on the six dimensions more accurately. Based on the collected data, we therefore use a weighted sum to compute the final score, which follows the equation below:

$$S = v_1 x_1 + v_2 x_2 + v_3 x_3$$

where $S$ is the final score of a dimension, $x_1$, $x_2$ and $x_3$ are the scores from novice, experienced and professional drivers, respectively, and $v_1$, $v_2$ and $v_3$ are the corresponding weights. Here we set $v_1 = 0.2$, $v_2 = 0.3$ and $v_3 = 0.5$. The overall average scores of the two vehicles are shown in the radar plot in Figure 5. The results of the case study show that the Tesla Autopilot system performs better than the self-developed system except for the HMI. A possible reason is that its adaptation to the local market, specifically its customization for local drivers, is not sufficient.
Figure 3 Typical and infrequent lane markers
Figure 4 The fixed route for conducting the SA experiments
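The weighted-sum scoring described above can be sketched in a few lines. The weights 0.2, 0.3 and 0.5 come from the text; treating each group score as the mean rating within that driver group is an assumption, and the function name and the ratings below are hypothetical illustration data, not results from the case study.

```python
from statistics import mean

# Weights for novice, experienced and professional drivers, as given in the text.
WEIGHTS = {"novice": 0.2, "experienced": 0.3, "professional": 0.5}


def weighted_dimension_score(group_ratings, weights=WEIGHTS):
    """Weighted sum of per-group mean scores for one SA dimension.

    Assumption: each group score is the mean of the ratings given by
    the drivers in that group.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * mean(group_ratings[group]) for group, w in weights.items())


# Hypothetical ratings for one dimension (5 novice, 6 experienced, 7 professional).
ratings = {
    "novice": [8, 7, 9, 8, 8],
    "experienced": [7, 7, 8, 7, 7, 6],
    "professional": [6, 6, 6, 6, 6, 6, 6],
}
# Group means are 8, 7 and 6, so the result is approximately 6.7.
print(round(weighted_dimension_score(ratings), 2))
```

With these weights, the professional drivers' ratings count for half of the final score, which pulls the result toward expert judgment when the groups disagree.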

Conclusions
This research presents an SA method for an L2 or L2+ ADAS. The overall performance of ADAS is divided into six dimensions, namely, perception, driveability and stability, riding comfort, HMI, driver workload and trustworthiness and exceptional operating case. Each dimension contains several subsections. Eighteen drivers with different driving experiences are invited to give their assessments. The proposed SA method is applied to a case study in China based on the Tesla Autopilot system and a self-developed ADAS for an upcoming vehicle. The case study results show that the Tesla Autopilot system outperforms the self-developed ADAS in most cases except for the HMI. Further, the proposed details and methodology can offer insights to automotive OEMs for future development related to ADAS or autonomous driving technologies, especially for domestic-brand R&D centers that want to develop technologies with independent intellectual property rights.
In future research, we will expand the comparison to include Mobileye, the Volvo Pilot Assist system, GM's Super Cruise, etc. Furthermore, if abundant SA data can be collected, digital twin methods or learning algorithms can be used to bridge the gap with objective experimental results, which has great prospects for industrialization.