Using naturalistic driving data to identify driving style based on longitudinal driving operation conditions

Purpose – An individual’s driving style significantly affects overall traffic safety. However, driving style is difficult to identify due to temporal and spatial differences and scene heterogeneity of driving behavior data. As such, the study of real-time driving-style identification methods is of great significance for formulating personalized driving strategies, improving traffic safety and reducing fuel consumption. This study aims to establish a driving style recognition framework based on longitudinal driving operation conditions (DOCs) using a machine learning model and natural driving data collected by a vehicle equipped with an advanced driving assistance system (ADAS).
Design/methodology/approach – Specifically, a driving style recognition framework based on longitudinal DOCs was established. To train the model, a real-world driving experiment was conducted. First, the driving styles of 44 drivers were preliminarily identified through natural driving data and video data; drivers were categorized through a subjective evaluation as conservative, moderate or aggressive. Second, based on the ADAS driving data, a criterion for extracting longitudinal DOCs was developed. Third, taking the ADAS data from 47 km of the two test expressways as the research object, six DOCs were calibrated and the characteristic data sets of the different DOCs were extracted and constructed. Finally, four machine learning classification (MLC) models were used to classify and predict driving style based on the natural driving data.
Findings – The results showed that six longitudinal DOCs were calibrated according to the proposed calibration criterion. Cautious drivers undertook the largest proportion of the free cruise condition (FCC), while aggressive drivers primarily undertook the FCC, following steady condition and relative approximation condition. Compared with cautious and moderate drivers, aggressive drivers adopted a smaller time headway (THW) and distance headway (DHW). THW, time-to-collision (TTC) and DHW showed highly significant differences in driving style identification, while longitudinal acceleration (LA) showed no significant difference in driving style identification. Speed and TTC showed no significant difference between moderate and aggressive drivers. In consideration of the cross-validation results and model prediction results, the overall hierarchical prediction performance ranking of the four studied machine learning models under the current sample data set was extreme gradient boosting > multi-layer perceptron > logistic regression > support vector machine.
Originality/value – The contribution of this research is to propose a criterion and solution for using longitudinal driving behavior data to label longitudinal DOCs and rapidly identify driving styles based on those DOCs and MLC models. This study provides a reference for real-time online driving style identification in vehicles equipped with onboard data acquisition equipment, such as ADAS.


Introduction
Driving style can be defined as an individual's habitual manner of driving (Elander et al., 1993; Lajunen and Özkan, 2011; Sagberg et al., 2015) (i.e. a person's preference of velocity distribution), which is formed over time as that person accumulates driving experience.
Studies also explicitly describe the importance of acceleration behavior as a key indicator of driving style because individuals have different preferences for speed (Müller et al., 2013; Reiser, 2008). To differentiate between driving skill and driving style (Elander et al., 1993; Taubman-Ben-Ari et al., 2004), "skill" is defined as the driver's ability to maintain control of the vehicle and adapt to complex traffic conditions, and driving skill is expected to improve with practice or training. On the other hand, "style" is defined as the manner in which a driver chooses to drive or habitually drives (i.e. his/her choice of driving speed and headway).
A number of studies have shown that driving style has a significant impact on traffic safety (Evans, 1996), vehicle dynamics control (Plöchl et al., 2007) and the economic and ecological efficiency of driving (Mensing et al., 2014). However, driving style information cannot be directly measured or detected. Existing studies have categorized driving behavior into driving maneuvers (e.g. following, hard braking and lane changing) (Bellem et al., 2016). These studies estimate driving style in terms of the durations or frequencies of individual maneuver states. However, driving style is easily affected by and fluctuates with the road traffic environment. Additionally, relatively static and singular driving data does not fully reflect the true driving style. Moreover, one of the main factors affecting the identification of driving style is the timeliness and effectiveness of data acquisition. Therefore, how to effectively use driving data to comprehensively and quantitatively analyze driving style has become a new field to be further explored (Qi et al., 2019).
In recent years, advanced driving assistance systems (ADASs) have significantly progressed, opening novel horizons in reducing traffic accidents (Rezaei et al., 2021). Specifically, with the rapid development of in-vehicle information systems and collision warning systems, a large amount of natural driving data can be acquired through these types of ADASs (Bao et al., 2020; Orlovska et al., 2020). In response to the great need of driving style identification for traffic safety and fuel economy, naturalistic data collection is becoming ever more feasible as the penetration rate of ADASs increases in vehicles and on roadways around the world. Therefore, to explore the influence of different driving behavior data on driving style identification and realize the rapid and efficient detection of driving style, this study obtained a large amount of naturalistic driving data through an ADAS-equipped vehicle and proposes a solution framework for rapid detection of driving style based on the driver's longitudinal driving operation conditions (DOCs). The proposed framework calibrates the driver's DOCs through naturalistic driving data and rapidly detects driving style through a machine learning model according to the driving behavior parameter characteristics of different DOCs. To achieve the main goal of this research, 44 subjects participated in naturalistic driving experiments and data on driver characteristics, vehicle motion attitude and micro driving operation was collected. The framework for rapid identification of driving styles proposed in this research may be applied in intelligent connected and vehicle-road cooperative scenarios, providing a reference for real-time and efficient identification of driving style to help drivers make real-time driving decisions.

Literature review
In recent years, to discover and present driving style information in a scientific method, many models have been developed that assess driving style from different aspects. Since its publication, the multidimensional driving style inventory (MDSI) (Taubman-Ben-Ari et al., 2004) has been the subject of research around the world. It defines an individual's driving style as a driving-specific factor that can contribute to both crashes and traffic violations directly and in terms of more general socio-demographic and personal factors. The MDSI can increase driver awareness of his/her own and others' driving styles and be used to identify baseline driving styles prior to the implementation of road safety interventions as well as inform post-intervention assessments (Taubman-Ben-Ari et al., 2016). To determine whether the MDSI is consistent with actual driving behavior, Van Huysduynen et al. (2018) conducted a simulation experiment with 88 participants. The objective data retrieved from the simulator was compared with the scores obtained from questionnaire data. The analysis showed that there is a moderate correlation between self-reported driving style and driving behavior in the simulator. This suggests that MDSI can be used as a diagnostic tool to identify typical driving behaviors of individuals in driving simulators. Ishibashi et al. (2007) developed a driving style questionnaire (DSQ) to extract key indicators from self-reports and calibrate different driving styles. However, the DSQ focuses more on preferences for driving behavior, which is limited by sample characteristics and structural validity (van Huysduynen et al., 2018). In other words, the DSQ cannot fully describe an objective condition.
There have been many studies that classify driving style based on actual vehicle operating parameters, such as naturalistic driving and field operational tests (FOTs). For instance, Toledo et al. (2008) developed pattern recognition algorithms to identify more than 20 maneuvers (such as lane change and sudden braking) using naturalistic driving data on different roads; this information was collected by onboard data loggers. On this basis, drivers were divided into three categories combined with the weighted maneuvering frequency. The results showed that this method effectively predicts driving style. Wang et al. (2015) extracted emergency braking maneuver features from naturalistic driving data. On this basis, a classification regression tree model was established to estimate driving style, and drivers were divided into three risk groups according to nine rules. Xu et al. (2015) used naturalistic driving data from American highways and adopted a neural network (NN) model to divide driver styles into three types. In a simulated scenario, Baer et al. (2011) rated five driving styles: aggressive, anxious, economical, sensitive and calm. Judging from the literature described above, it can be observed that driving style classification methods and standards are not uniform. That said, previous studies have found that in naturalistic driving, drivers generically categorized as high-risk drive faster, exhibit shorter time headways (THWs), brake harder and change lanes more frequently than low-risk drivers (Sagberg et al., 2015; Xiong et al., 2012). It was also found from FOTs that low-risk drivers engage in fewer risky maneuvers (Simons-Morton et al., 2015; Kusano et al., 2015).
While the aforementioned studies did identify differences between driving styles, they did not establish evaluation models to estimate driving style through different driving maneuvers. In contrast, Guo and Fang (2013) classified drivers into three risk groups by a K-means clustering method according to the maneuvers detected from naturalistic driving data on different roads in the USA; the authors established a logistic model to predict driving style, which showed that the frequency of emergency braking events was a valid indicator of high-risk drivers. Li et al. (2017) proposed a new method to identify driving style according to the transition patterns between maneuvering states. Driving behavior in highway traffic was divided into 12 maneuvering states. A conditional likelihood maximization method was used to extract typical maneuver transition patterns, which represented driving styles, from 144 probabilities, and the selected features were classified by a random forest algorithm. The results showed that the transitions concerning five maneuver states (free driving, approach, near following, constrained left lane changes and constrained right lane changes) can reliably classify driving style. Another study proposed an online driving style detection model based on a mixed recursive Bayesian estimation with both a normal component and a classification component. Seven driving styles associated with fuel economy were identified using an online estimation algorithm. That algorithm can also be used to model and predict fuel consumption, speed, throttle pedal position and gear selection. Lu et al. (2021) tried to understand the influence of different driving styles (such as cautious, normal and aggressive) on key variables (such as speed) in traffic flow theory and revealed the influence on network efficiency. The characteristics of different driving styles were extracted from high-dimensional data clustering classes and transformed into different vehicle-following models, which were simulated in a SUMO traffic simulator.
The key to the modeling and analysis of driving style is the extraction of driving maneuver features. Driving maneuvers are mainly divided into longitudinal or lateral. Longitudinal maneuvers include free driving, approaching, following, opening and emergency braking. Longitudinal maneuvers are classified according to the value of the THW, longitudinal acceleration (LA) and the perception of changes in the outward size of the vehicle ahead (Toledo et al., 2007). More specifically, the THW and LA are commonly used to describe the following maneuvers (Kondoh et al., 2008). Further, when rapid deceleration is not occurring, a THW of 3 s or less is considered to be car-following (Kusano et al., 2015; Transportation Research Board, The Highway Capacity Manual, 2010). In other words, if the THW between the front and rear vehicles exceeds 3.0 s, it is considered a free drive operation.
This study focuses on longitudinal driving behavior and simplifies the impact of lateral driving behavior. The scope of this study was based on an urban expressway with high traffic flow and speed, and the influence of acceleration and deceleration during the process of vehicle following was considered. According to the literature summary and the understanding and analysis of naturalistic driving data, this study took a 6.0-s THW as one of the criteria to indicate car-following. The longitudinal driving data was extracted from naturalistic driving data to classify different driving conditions, and different machine learning models were selected to construct driving style classification models; the accuracy of the various models was then compared to find the best fit.

Test equipment and test route
To obtain real and reliable driving data, in this study, an automatic GAC Trumpchi passenger car equipped with FOT data acquisition equipment, as shown in Figure 1, was used to perform FOTs on various types of roads in Wuhan, China.
The multi-functional road test vehicle platform is shown in Figure 1 and the data types and parameter descriptions collected by each experimental equipment are shown in Table 1.
The installation of all instruments and equipment did not hinder normal driving, such that the driver could maintain a naturalistic driving state. The sampling frequency of the in-vehicle devices was 20 to 100 Hz and the sampling interval of all devices was set to 0.1 s. The naturalistic driving data was obtained in real time through the onboard laptop and the driving video data was continuously stored in the memory card.
As shown in Figure 2, the experimental route consisted of four sections. Detailed information for each section is provided in Table 2.
As can be observed from Table 1, Section 2 was a highway with dispersed traffic volume. During the FOT drives, the traffic flow on this section was low and the traffic density was sparse, such that the experimental vehicle was in a free-driving state for a long time. As can also be observed, Section 4 was an arterial with congested traffic volume. During the FOT drives, this section of the road had a high traffic flow and density, such that the experimental vehicle was in a car-following state for a long time. Therefore, in both Sections 2 and 4, the motion posture of the experimental vehicle was relatively stable; drivers did not make any significant operations that would make driving style identifiable.
In contrast, Sections 1 and 3 were both expressways with moderate traffic volume, and the road parameters were similar. During the FOT drives, the traffic flow was moderate and traffic density was balanced, such that the experimental vehicle made a variety of motion postures, and the driver's operating characteristics were significantly different, rendering driving style easily identifiable. Therefore, 47 km of Sections 1 and 3 were selected as the expressway test bed from which to observe the naturalistic driving data.

Participants
This study mainly focused on model and data analysis. The experiment consisted of outdoor naturalistic driving: the road environment was good, the traffic volume was moderate and the weather was sunny. During the whole experiment, an experimental assistant monitored the risk factors and explained the experimental requirements in real time. The research plan was discussed with the research group, and all participants were informed of the experimental requirements and impacts.
Figure 1. Multi-functional road test vehicle platform
Sample size selection is critical to obtaining sufficient experimental data. If the sample size is too small, the reliability of the results will be reduced and if the sample size is too large, resources will be wasted. For this study, the required sample size was calculated based on expected variance, target confidence and error margin according to Zhao et al. (2020) as follows:
$$N = \frac{Z^2 \sigma^2}{E^2}$$
where $N$ is the sample size; $Z$ is the standard normal distribution statistic; $\sigma$ is the standard deviation; and $E$ is the maximum error. Generally, a significance level of 10% is chosen to reflect the 90% confidence level of the unknown parameter. In this study, when the confidence level was 90%, $Z = 1.25$, $\sigma$ was 0.25 to 0.5 (Chow, 2007) and $E = 10\%$. Therefore, the minimum sample size required ranged from 10 to 39.
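The reported range can be checked numerically. The sketch below assumes the standard relation N = (Z × σ / E)², which reproduces the stated bounds; the function name is illustrative and not taken from the study.

```python
def sample_size(z: float, sigma: float, error: float) -> float:
    """Unrounded sample size from the assumed relation N = (z * sigma / error) ** 2."""
    return (z * sigma / error) ** 2


# Values quoted in the text: Z = 1.25, sigma between 0.25 and 0.5, E = 10%.
n_low = sample_size(z=1.25, sigma=0.25, error=0.10)
n_high = sample_size(z=1.25, sigma=0.50, error=0.10)

# Rounding to the nearest integer gives the bounds reported in the text.
print(round(n_low), round(n_high))
```

The unrounded values are about 9.77 and 39.06, so rounding to the nearest integer matches the reported range of 10 to 39, whereas rounding up would give 40 for the upper bound. Either way, the 44 recruited participants exceed the requirement.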
For this study, a total of 44 participants were recruited (female = 19; male = 25). The participants' age ranged from 22 to 55 years old (mean = 32.8, SD = 8.2). Their driving experience ranged from 2 to 18 years (mean = 6.9) and their total lifetime driving mileage ranged from 400 to 400,000 km (mean = 110,000). The distribution of gender, age and experience of the sample was consistent with the distribution of the general driving population in China.

Test process
In this study, naturalistic driving data was collected using a single test vehicle and a continuous measurement method. Each subject drove the test vehicle one time along the test road during a weekday. To avoid traffic flow disturbance caused by peak periods, the test was run between 09:00 and 16:00 (outside of rush hour). Each test provided subjects with route guidance only and did not interfere with their daily driving habits so as to keep the subjects in a naturalistic driving state. The test data was preprocessed to facilitate statistical analysis.

Data processing
The raw data collected by the natural driving experimental platform and the other methods is shown in Table 2. Because the original data collected by the onboard sensors inevitably contained defects, such as missing frames, discontinuities and jumps, it was necessary to clean and preprocess the original data to ensure quality. Therefore, this study used cubic spline interpolation to fill in the lost frames, then filtered the noise and corrected the jump data with a Savitzky-Golay filter to obtain an accurate vehicle motion attitude.
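The two cleaning steps can be sketched with SciPy as follows. This is a minimal illustration, not the study's pipeline: missing frames are assumed to be marked as NaN, and the filter window length and polynomial order are hypothetical choices, since the text does not report them.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import savgol_filter


def clean_signal(t, values, window_length=11, polyorder=3):
    """Fill NaN frames by cubic spline interpolation, then apply Savitzky-Golay smoothing."""
    t = np.asarray(t, dtype=float)
    values = np.asarray(values, dtype=float)
    valid = ~np.isnan(values)
    spline = CubicSpline(t[valid], values[valid])    # fitted on observed frames only
    filled = np.where(valid, values, spline(t))      # keep observed values, fill the gaps
    # window_length and polyorder are assumed values, not taken from the study
    return savgol_filter(filled, window_length=window_length, polyorder=polyorder)


t = np.arange(0.0, 10.0, 0.1)              # 0.1 s sampling interval, as in the study
speed = 60.0 + 5.0 * np.sin(0.5 * t)       # synthetic speed trace (km/h)
speed[[20, 21, 57]] = np.nan               # simulated missing frames
cleaned = clean_signal(t, speed)
```

A longer window smooths more aggressively but can flatten genuine short braking events, so the window would need to be tuned against the recorded video.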
The data collected in this study included driver attributes, operation parameters and road characteristics, as shown in Table 3. Driver attributes included driver ID, age and gender. Operation parameters included speed, LA, THW, time to collision (TTC) and distance headway (DHW). Road characteristics included road type and length.
Figure 2. Naturalistic driving experiment data acquisition equipment
Table fragment (road section row): Arterial, 40-60, 2-3, 12, Congested

Subjective driving style evaluation
Because the DSQ relies on subjective responses for driving style calibration, its results are limited by sample characteristics and structural validity; moreover, the data focuses on driving behavior preferences and cannot fully describe a true objective driving condition. In this study, using the three-point scale method (Li et al., 2017), three drivers with rich driving experience (each with an actual driving mileage of more than 60,000 km and more than eight years of driving experience) were selected as the scoring experts. Driving style was scored from the video data on a three-point scale: 1 indicated a conservative driving style, 2 indicated a moderate driving style and 3 indicated an aggressive driving style. The scoring rules were set as follows: where $E_A$ is the scoring value of the first expert, $E_B$ is the scoring value of the second expert and $E_C$ is the scoring value of the third expert. The results from the DSQ are shown in Figure 4. In total, 16 drivers were scored as cautious, 22 drivers were scored as moderate and 6 drivers were scored as aggressive.

Research strategy
In a naturalistic driving environment, due to the influence of road conditions, traffic conditions, driver characteristics and other impactful factors, drivers will make myriad operations, such as accelerating, decelerating, parking, approaching, following and more. However, because different drivers have different driving styles, they make different operations under the same conditions. Therefore, the driver's operating performance under these different driving conditions can be used to identify that driver's style.

Label method of driving operation conditions
Previous studies have shown that relative distance and relative speed are two important indicators of longitudinal driving; they can be used to simulate driver behavior by taking them as elements of a regression function in longitudinal driving scenarios and models (Itkonen et al., 2020). Therefore, in this study, speed and THW were selected as the label basis of the DOCs. The following sections discuss the DOC labels and the labeling process is shown in Figure 6. The labeling of longitudinal driving behavior conditions was performed in two steps:

Label acceleration and deceleration segments
Taking an acceleration segment as an example, a sliding time window was adopted. From the initial moment when the vehicle entered the expressway, a fixed sampling threshold was set to 50 frames. As shown in Figure 7, the abscissa represents the number of frames and the ordinate represents the speed. Within the 50-frame range of the sliding time window ($t_2 - t_1 \ge 50$), if the speed increased, the driving segment $(t_1, t_2)$ was temporarily marked as an accelerating segment; otherwise, the driving segment $(t_1, t_2)$ was marked as a conventional driving segment. If the speed decreased in the range $(t_2, t_3)$, subsequent processing was required. The subsequent processing applied the following rules to the case in which the speed decreased at $t_2$, started to rise again at $t_3$ and reached its peak at $t_4$:
(1) If $t_3 - t_2 \ge 5$ and $v_{t_4} > v_{t_2}$, then $(t_1, t_4)$ was marked as an accelerating segment.
(2) If $t_3 - t_2 > 5$ and $t_4 - t_3 \ge 50$, then $(t_1, t_2)$ and $(t_3, t_4)$ were marked as accelerating segments and $(t_2, t_3)$ was marked as a conventional driving segment for further labeling.
(3) If $t_3 - t_2 > 5$ and $t_4 - t_3 < 50$, then $(t_1, t_2)$ was marked as an accelerating segment and $(t_2, t_4)$ was marked as a conventional driving segment for further labeling.
Figure: Research strategy of driving style identification
Then, THW was used to determine whether the vehicle was following a car in the time window, and the driving segment was labeled as either a following acceleration condition (FAC) or a free acceleration condition (FrAC). Because of the detection equipment, a THW value of 0 meant that no leading vehicle was detected and a non-zero value meant that a leading vehicle was detected ahead. In addition, an accelerating segment with THW ≥ 6 s was also marked as a FrAC because, when THW ≥ 6 s, the vehicle was in a relatively safe driving state. The labeling process for deceleration conditions was similar.

Label other conventional driving segments
The other conventional driving conditions included a free cruise condition (FCC), following steady condition (FSC), relatively distant condition (RDC) and a relative approximation condition (RAC). The sliding window was used to identify and label these continuous driving segments (all except for the FCC, which was labeled based on THW > 6 s or THW = 0), and the threshold and methods were similar to the acceleration labeling process described above. Within the sampling threshold, an increasing or decreasing THW was determined and the FSC, RDC and RAC were automatically labeled in MATLAB.
This study did not consider the impact of lateral vehicle operations (i.e. lane-changing). Only longitudinal driving conditions were considered. To sum up, the eight longitudinal driving conditions are defined as follows.
The FrAC and FrDC (free deceleration condition) indicate that the speed of the host vehicle increased or decreased, respectively, within the sliding window detection time of 50 frames and either no leading vehicle was in front or the headway time between the front and rear vehicles was more than 6.0 s.
The FAC and FDC (following deceleration condition) indicate that the speed of the host vehicle increased or decreased, respectively, within the sliding window detection time of 50 frames and a leading vehicle was in front and the headway between the front and rear cars was within 6.0 s.
The FCC indicates that the speed of the host vehicle changed repeatedly within the sliding window detection time of 50 frames and either a leading vehicle was not detected in front or the headway between the front and rear vehicles was more than 6.0 s.
The RDC and RAC indicate that the speed of the host vehicle changed repeatedly and alternately within the sliding window detection time of 50 frames, a leading vehicle was in front and the headway between the front and rear vehicles was within 6.0 s. Within the 50-frame sliding window detection time, the headway time increased (RDC) or decreased (RAC).
The FSC indicates that the speed of the host vehicle changed repeatedly within the sliding window detection time of 50 frames and there was a leading vehicle in front and the headway between the front and rear vehicles was within 6.0 s. Within the sliding window detection time of 50 frames, the headway time showed repeated and alternate changes.
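The eight definitions above can be summarized as a small decision procedure over one 50-frame window. The following is a sketch under stated assumptions rather than the study's MATLAB implementation: "increased" and "decreased" are read as monotonic change with a net difference, anything else counts as repeated change, a window counts as following only if a lead vehicle is detected with THW below 6.0 s in every frame, and a growing headway is taken to mean RDC while a shrinking one means RAC. All function names are illustrative.

```python
from typing import Sequence

WINDOW = 50       # frames per sliding window (from the text)
THW_LIMIT = 6.0   # s, car-following criterion used in the study


def trend(values: Sequence[float]) -> str:
    """Return 'up', 'down' or 'mixed' for one window of samples (assumed definition)."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    if all(d >= 0 for d in diffs) and values[-1] > values[0]:
        return "up"
    if all(d <= 0 for d in diffs) and values[-1] < values[0]:
        return "down"
    return "mixed"


def is_following(thw: Sequence[float]) -> bool:
    """True if a lead vehicle is detected (THW != 0) with THW < 6.0 s in every frame."""
    return all(0.0 < h < THW_LIMIT for h in thw)


def label_window(speed: Sequence[float], thw: Sequence[float]) -> str:
    """Assign one of the eight longitudinal DOC labels to a window."""
    following = is_following(thw)
    speed_trend = trend(speed)
    if speed_trend == "up":
        return "FAC" if following else "FrAC"
    if speed_trend == "down":
        return "FDC" if following else "FrDC"
    if not following:
        return "FCC"
    # Following with fluctuating speed: the headway trend decides the label.
    return {"up": "RDC", "down": "RAC", "mixed": "FSC"}[trend(thw)]


rising = [20 + 0.1 * i for i in range(WINDOW)]      # steadily increasing speed
wavy = [20 + (i % 2) for i in range(WINDOW)]        # repeatedly changing speed
closing = [4.0 - 0.02 * i for i in range(WINDOW)]   # shrinking headway
print(label_window(rising, [0.0] * WINDOW), label_window(wavy, closing))
```

The merge rules for short speed dips inside an accelerating segment are deliberately left out; they would run as a post-processing pass over consecutive window labels.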

Measurement of index
Drawing on the 10 observable driving style indices described in existing literature (Itkonen et al., 2020), the longitudinal driving behavior analysis indices and particular index measurements, including speed (V), LA, THW, the reciprocal of TTC and DHW, were selected to characterize the driving style. For each driving condition, the index measurement was different. For example, for FCC, because only parameters of the host vehicle were relevant, only speed and LA were calculated. The particular analysis indices and index measurement values of the DOCs are shown in Table 4. The studied naturalistic driving data was captured from 44 participants driving on experimental road section 1 and experimental road section 3. The speed limit of road 1 was 70 km/h and the speed limit of road 3 was 80 km/h. In addition, the length of the two roads and the traffic flow on each road also differed, as observed through video. Therefore, to ensure high quality data analysis, the data of roads 1 and 3 were divided into independent analysis units, and data from the whole process of driving on each section from the beginning to the exit was divided into small units with equal time intervals according to t = 600 frames. Then, the statistical index values in the small units that were split in different sections were analyzed, as shown in Table 3. Fragmented data less than 10 min was removed and subsequent analysis was not carried out. In this way, the naturalistic driving data from the 44 drivers on the two tested expressways was divided into 229 driving segments, and all the statistical analysis indicators were summarized to form a 229 × 211 driving condition index analysis matrix.
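The splitting into equal units and the per-unit statistics can be sketched as follows. The summary statistics chosen here (mean, standard deviation, minimum, maximum) are illustrative placeholders, since the actual index measurements are those listed in Table 4; dropping the incomplete tail is likewise an assumption about how fragmented data was handled.

```python
from statistics import mean, pstdev

UNIT = 600  # frames per analysis unit (from the text)


def split_into_units(frames, unit=UNIT):
    """Split a sequence into consecutive full-length units and drop the short tail."""
    n_full = len(frames) // unit
    return [frames[i * unit:(i + 1) * unit] for i in range(n_full)]


def summarize(values):
    """Illustrative per-unit statistics for one driving parameter."""
    return {
        "mean": mean(values),
        "std": pstdev(values),
        "min": min(values),
        "max": max(values),
    }


speed = [60 + (i % 7) for i in range(1500)]   # synthetic speed trace, 1,500 frames
units = split_into_units(speed)               # two full units, 300-frame tail dropped
features = [summarize(u) for u in units]      # one feature row per unit
```

Stacking such rows for every parameter and every DOC would yield an index matrix of the kind described above, with one row per driving segment.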

Machine learning classification methods
This study aimed to test the feasibility of using longitudinal DOCs to identify driving styles through MLC algorithms. Previous literature indicates that four MLC models, namely, SVM, XGB, LR and MLP, have shown relatively good predictive performance in practical applications. Therefore, this study evaluated the prediction performance of these four machine learning models based on the label analysis of the DOCs:
(1) Support vector machines (SVMs): SVMs are one of the most widely used supervised classification methods in the field of machine learning and artificial intelligence. The SVM proposed by Cortes and Vapnik made full use of the structural risk minimization theory, thus ensuring the strong generalization ability of the model (Cortes and Vapnik, 1995). An SVM predicts the labels of points in the test data set by learning a model from the training data set. This method is well-known in computer science and has been widely used in transportation engineering for tasks such as traffic accident prediction (Tang et al., 2020; Zhang et al., 2018), road risk prediction (Basso et al., 2018), vehicle trajectory state recognition (Siddique et al., 2019), path selection (Sun et al., 2017), driving behavior prediction (Wang et al., 2017) and driving state recognition (Chai et al., 2019; Allahviranloo, 2013). SVMs generally have good predictive performance.
(2) Extreme gradient boosting (XGB): XGB is an integrated machine learning model based on many decision trees that use an optimized gradient boosting system. It has the advantages of performing parallel processing, approximate greedy search and improving the learning process in the shortest time without overfitting. It has been proven that XGB has superior predictive performance and processing time compared with the random forest model (Chen and Boost, 2016). In recent years, XGB models have been proven to have good performance in traffic flow prediction (Mahmoud et al., 2021), rail defects prediction (Mohammadi et al., 2019), driving behavior prediction (Ayoub et al., 2021) and road risk identification and prediction (Das et al., 2020).
(3) Logistic regression (LR): LR is generally used to model the relationship between a categorical dependent variable and categorical, dichotomous or continuous independent variables. These models predict the probability of occurrence of the dependent variable using a set of given independent variables (Venkata et al., 2020). LR is a generalized linear model and has been widely used in accident prediction (Venkata et al., 2020; Dong et al., 2018) and conflict risk prediction (Costela et al., 2020) in traffic safety research. It is also used in traffic system performance tests (Cafiso et al., 2020; Liu et al., 2018) and behavior prediction (Farooq et al., 2021; Ghasemzadeh et al., 2018).
(4) Multi-layer perceptron (MLP): The use of NNs and deep learning optimization algorithms to enhance discrete selection models is an active research area, which has shown encouraging results (Zargarnezhad et al., 2019). In recent years, experimental cases of deep learning methods in discrete choice models have been explored, such as personal travel mode prediction (Omrani, 2015), path tracking prediction (Ge et al., 2021), driving behavior feature recognition (Jasper et al., 2018) and more. Since a basic three-layered back-propagation MLP model was used to develop the first NN (Clark, 1993), the MLP has developed into a non-parametric approach that has been demonstrated to be successful in complex behavioral data modeling (Costa et al., 1997).

Model prediction performance evaluation
After parameter adjustment and model training, it was necessary to evaluate the generalization ability of the model on an independent test set. To evaluate the performance of the prediction model, a confusion matrix was introduced. Taking a binary classification problem as an example, the confusion matrix is shown in Table 5. True positive is the number of samples whose true value was positive and whose predicted value was positive. False negative is the number of samples whose true value was positive, but whose predicted value was negative. False positive is the number of samples whose true value was negative, but whose predicted value was positive. True negative is the number of samples whose true value was negative and whose predicted value was negative.
The indicators of accuracy (ACC), precision (PPV), sensitivity or recall rate (TPR), false positive rate (FPR), specificity (TNR) and the F1-score were used to evaluate the performance of the models. The calculation formula and meaning of the evaluation indices are shown in Table 6.
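The six indicators can be sketched in Python from the four confusion-matrix counts. The formulas below are the standard textbook definitions, which Table 6 is assumed to follow; the counts in the example are invented for illustration.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute ACC, PPV, TPR, FPR, TNR and F1 from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Return 0.0 instead of failing when a denominator is empty.
        return numerator / denominator if denominator else 0.0

    ppv = ratio(tp, tp + fp)   # precision
    tpr = ratio(tp, tp + fn)   # sensitivity / recall
    return {
        "ACC": ratio(tp + tn, tp + fn + fp + tn),
        "PPV": ppv,
        "TPR": tpr,
        "FPR": ratio(fp, fp + tn),
        "TNR": ratio(tn, tn + fp),
        "F1": ratio(2 * ppv * tpr, ppv + tpr),  # harmonic mean of PPV and TPR
    }


# Hypothetical counts, not taken from the study's tables.
example = binary_metrics(tp=8, fn=2, fp=4, tn=6)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

The F1-score ranges from 0 to 1, with 1 being the best output, and FPR and TNR always sum to 1 when at least one negative sample exists.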

Calibration results of longitudinal driving operation conditions
Naturalistic driving data from 47 km of expressway (test routes 1 and 3) were extracted, and the labeling method described in the previous section was used to identify the DOCs of the 44 drivers on the tested expressway. Figure 10 shows the frequency distribution of the DOCs for the different drivers. Unlike the labeling results for the entire experimental section, only six DOCs, namely, FAC, FDC, FCC, RDC, RAC and FSC, appeared on the expressway for all drivers, while FrAC and FrDC did not appear at all. By definition, FrAC and FrDC generally do not appear on expressways, and a review of the natural driving video data confirmed that FrAC and FrDC were not present on the tested expressway. Table 7 shows that the FCC occurred most frequently, indicating that, when driving on the expressway, drivers were most likely to adopt FCC and less likely to adopt FAC and FDC. The reason may be that when formulating the criteria for labeling the DOCs, the model was established based on the naturalistic driving data of the entire experimental road section. The data input took into account driving data from multiple types of roads, whereas only two types of roads were actually analyzed. In addition, the overall pattern of the DOCs distribution among all drivers was roughly the same, but the mean and variance of each DOC ratio were different, which reflects the heterogeneity of the frequency distribution of the different DOCs.

Driving style identification with different machine learning classification methods
With reference to the four machine learning models, a sample set was established to distinguish driving style. The difference in this study is that the samples were divided into driving style labels, namely, conservative driving style, moderate driving style and aggressive driving style, in the data aggregation stage. The sample set was divided into a 70% training set and a 30% test set. At the data level, the problem of the imbalanced number of samples for conservative drivers, moderate drivers and aggressive drivers was addressed. The ENN method was used to undersample the normal samples in the training set. Then, the five-fold cross-validation method was used to train and verify the data of the training set and finally the model was tested on an independent test set. Table 8 shows the confusion matrix predicted by the established model to distinguish driving style on the independent test set. The values in Table 8 represent the number of driving segments in the test set. Table 9 shows that the MLP model had the highest overall accuracy. The most accurate prediction models of aggressive driving style, moderate driving style and conservative driving style were XGB, MLP and MLP (PPV_Aggressive = 1.000; PPV_Ordinary = 0.659; PPV_Conservative = 0.867), respectively. From the perspective of sensitivity (TPR), the detection rate of moderate driving style was higher than that of aggressive driving style and conservative driving style. This shows that these models had better predictive ability for moderate driving styles. The FPR of moderate driving style was also higher than that of aggressive driving style and conservative driving style.
From the point of view of the F1-score, apart from the LR model, the other prediction results exceeded 0.5, indicating that the overall output performance of the models was only average under this sample size. Because it was difficult to clearly define a moderate driving style, its recognition rate was not high, which affected the overall recognition level of all the models. Table 9 also shows that, under the current sample size, a small number of extracted longitudinal driving conditions can be used to effectively identify driving styles through MLC models, and with the increase of sample size, the accuracy of driving style identification is expected to improve significantly. However, different MLC models differ in performance in the identification of driving style. It was found that the four models all showed good performance in the prediction of driving style. However, in terms of accuracy, precision, recall and F1-score, the MLP model had the best prediction results.
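The ENN undersampling step mentioned above can be sketched as follows. This is a simplified, majority-vote variant of edited nearest neighbours, assuming Euclidean distance and k = 3; the function name, the toy feature vectors and the choice of which class to clean are illustrative assumptions, not the configuration used in the study.

```python
import math
from collections import Counter


def edited_nearest_neighbours(X, y, target_label, k=3):
    """Drop target_label samples whose k nearest neighbours mostly disagree.

    Samples of other classes are always kept. On a vote tie, the label that
    appears first among the nearest neighbours wins.
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != target_label:
            keep.append(i)
            continue
        # Sort all other samples by Euclidean distance to sample i.
        neighbours = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        votes = Counter(y[j] for _, j in neighbours[:k])
        if votes.most_common(1)[0][0] == yi:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy 2-D features (hypothetical): one "moderate" point sits inside the
# "aggressive" cluster and should be removed.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
     (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.05, 1.05)]
y = ["moderate"] * 4 + ["aggressive"] * 3 + ["moderate"]

X_clean, y_clean = edited_nearest_neighbours(X, y, target_label="moderate")
print(len(X), "->", len(X_clean))
```

Cleaning only the training set, as the text describes, keeps the independent test set untouched so that the reported scores still reflect the original class distribution.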

Statistical analysis of parameters based on different driving styles
According to the calibration results of the DOCs, a scatter diagram of average longitudinal driving behavior parameters was drawn, as shown in Figure 11. It can be observed that the scatter distribution of THW and DHW was significantly different. Compared with cautious and moderate drivers, aggressive drivers adopted a smaller THW and DHW during the natural driving experiment, indicating that THW and DHW showed high significance for the identification of driving style. However, the significance of the other three parameters for the identification of driving style needed to be further analyzed.
According to normality and lognormality tests, it was found that the longitudinal driving behavior parameters of different driving styles did not conform to the Gaussian distribution. Therefore, a non-parametric test was adopted to analyze the correlation of longitudinal driving behavior parameters to different driving styles. As the number of drivers who exhibited different driving styles was imbalanced, as were the different DOCs parameters, the sample size of each group was asymmetrical. Therefore, the Kruskal-Wallis test method was used for a non-parametric one-way ANOVA of the population sample. Meanwhile, Dunn's multiple comparisons test method was also selected for the non-parametric one-way ANOVA comparative analysis of driving data from drivers who exhibited different driving styles. The results are shown in Table 10.
The four longitudinal driving behavior parameters of speed, THW, TTC and DHW showed significant differences in driving style identification, while the LA showed no significant difference in driving style identification. In particular, THW, TTC and DHW showed highly significant differences in driving style identification. This also indicated that the driver's subjective perception of LA during natural driving was far less strong than the objective factors of speed, THW, TTC and DHW. This distinction is useful for ADAS-equipped vehicles, which can display THW, TTC and DHW in real time through the onboard intelligent display terminal, so that drivers can easily respond to this data and adopt different driving strategies, also in real time.
From the results of the multiple comparison analyses, LA showed no significant difference between the three driving styles. At the same time, speed and TTC showed no significant difference between moderate and aggressive drivers. This also indirectly shows that there was little difference between moderate and aggressive drivers.
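The Kruskal-Wallis test used above can be illustrated with a short, self-contained Python sketch. It computes the tie-corrected H statistic for groups of unequal size and, because three driving styles give two degrees of freedom, uses the closed-form chi-square tail exp(-H/2) for the p-value. The THW values are invented for illustration, and Dunn's post hoc comparisons are not covered.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [v for group in groups for v in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, offset = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[offset:offset + len(group)])
        total += rank_sum ** 2 / len(group)
        offset += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


# Hypothetical mean THW values (s) for three driving styles.
conservative = [3.1, 3.4, 3.8]
moderate = [2.2, 2.5, 2.9]
aggressive = [1.2, 1.5, 1.9]

h_stat = kruskal_wallis_h([conservative, moderate, aggressive])
p_value = math.exp(-h_stat / 2)  # chi-square tail, valid only for 3 groups (df = 2)
print(round(h_stat, 3), round(p_value, 4))
```

The chi-square approximation is rough for groups this small; the sketch is meant to show the ranking logic rather than to reproduce Table 10.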

Statistical analysis of parameters based on different longitudinal driving operation conditions
Based on the 229 segments of naturalistic driving data, the box plots of the mean values of speed, LA, THW, TTCi and DHW were drawn according to the six DOCs, as shown in Figure 12. It should be noted that the FCC lacked the statistics of THW, TTCi and DHW.
It can be observed in Figure 12 that, among all the DOCs, the mean speed of the FDC was the lowest (FAC = 54.3 km/h, FDC = 42.4 km/h, FCC = 47.3 km/h, FSC = 57.5 km/h, RDC = 58.3 km/h and RAC = 60.1 km/h). This shows that when drivers were in the FDC, most drove at a low following speed to maintain safety. However, the average speed was higher in the FAC, which indicates that the following vehicle accelerated when the lead vehicle accelerated. The speed distribution of the FSC, RDC and RAC was relatively uniform. The FCC had the largest range of speed fluctuations. This may be related to the fact that the vehicle entered an expressway from an urban road with a relatively low average speed. During this process, vehicles were required to accelerate. The average values of LA for FAC and FDC had similar distributions, and the average absolute values of LA for FAC and FDC (FAC = 0.42, FDC = 0.47) differed little, but the maximum absolute value for FDC was slightly larger than that for FAC (FAC = 0.78, FDC = 0.90) and significantly higher than those of the other DOCs. This shows that the driver had obvious acceleration or deceleration under these two DOCs, but the driving operation under the other DOCs was relatively smooth. The outlying values of LA also illustrated the operating performance of aggressive drivers under different DOCs.
Meanwhile, the interquartile range of the average THW of FAC (Q3 = 4.40 s, Q1 = 2.92 s, IQR = 1.48 s, mean = 3.46 s) was larger than that of FDC (Q3 = 3.58 s, Q1 = 2.22 s, IQR = 1.36 s, mean = 2.94 s), which indicates that drivers generally maintained a larger THW when following accelerating vehicles than when following decelerating vehicles. This shows that when a rear vehicle followed a front accelerating vehicle, the rear vehicle showed a delay effect. When the rear vehicle followed a front decelerating vehicle, the rear vehicle showed aggressive behavior, resulting in a small THW. This can also be observed from the LA index of FDC. It can be observed from the interquartile range of the average THW of FSC (Q3 = 2.35 s, Q1 = 1.41 s, IQR = 0.94 s, mean = 1.86 s), RDC (Q3 = 1.98 s, Q1 = 1.49 s, IQR = 0.49 s, mean = 1.71 s) and RAC (Q3 = 2.00 s, Q1 = 1.55 s, IQR = 0.45 s, mean = 1.80 s) that when the vehicle was in these three DOCs, although the vehicle was still following, it did not rapidly accelerate or decelerate, but the THW was already less than 3.0 s, which is consistent with existing research conclusions (Xu et al., 2015;.
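As a quick arithmetic check of the quartile figures quoted above, the following Python snippet recomputes each IQR as Q3 minus Q1 and lists the DOCs whose third quartile of average THW lies below 3.0 s. The numbers are copied from the text; the variable names are illustrative.

```python
# (Q3, Q1, reported IQR) of the average THW in seconds, copied from the text.
reported_thw = {
    "FAC": (4.40, 2.92, 1.48),
    "FDC": (3.58, 2.22, 1.36),
    "FSC": (2.35, 1.41, 0.94),
    "RDC": (1.98, 1.49, 0.49),
    "RAC": (2.00, 1.55, 0.45),
}


def iqr(q3, q1):
    """Interquartile range, rounded to the two decimals used in the text."""
    return round(q3 - q1, 2)


recomputed = {doc: iqr(q3, q1) for doc, (q3, q1, _) in reported_thw.items()}
consistent = all(
    recomputed[doc] == reported for doc, (_, _, reported) in reported_thw.items()
)
# DOCs in which at least 75% of the segment averages fall below 3.0 s.
below_3s = [doc for doc, (q3, _, _) in reported_thw.items() if q3 < 3.0]
print(consistent, below_3s)
```

The reported IQR values agree with the quartiles, and FSC, RDC and RAC are the three DOCs whose upper quartile stays under 3.0 s.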
In general, although drivers exhibited different driving styles, they all maintained a large TTC when driving on the expressway. While the TTC index has been widely used for
According to normality and lognormality tests, it was found that the longitudinal driving control data of different DOCs did not conform to the Gaussian distribution, so a non-parametric test and analysis was adopted.
As the number of driving segments was consistent, the Friedman test method was used for a non-parametric one-way ANOVA of the sample population. At the same time, Dunn's multiple comparisons test method was selected to perform a non-parametric one-way ANOVA comparison analysis on the driving segment data from different DOCs. The analysis results are presented in Table 11, which shows that longitudinal driving behavior parameters showed highly significant differences in the calibration of longitudinal DOCs (p < 0.001).
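The Friedman test mentioned above ranks the conditions within each block (here, each driving segment) and then compares the rank sums across conditions. The sketch below computes the Friedman statistic without a tie correction; the block layout, the three-condition example and the speed values are assumptions for illustration only.

```python
def within_block_ranks(block):
    """Rank one block's values from 1..k, averaging ranks over ties."""
    order = sorted(range(len(block)), key=lambda i: block[i])
    ranks = [0.0] * len(block)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and block[order[end + 1]] == block[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def friedman_statistic(blocks):
    """Friedman chi-square statistic for n blocks of k related measurements.

    No tie correction is applied, so the value is exact only when there are
    no ties within a block.
    """
    n = len(blocks)
    k = len(blocks[0])
    rank_sums = [0.0] * k
    for block in blocks:
        for j, rank in enumerate(within_block_ranks(block)):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)


# Hypothetical mean speeds (km/h) under three conditions for four segments.
blocks = [
    [42.0, 47.0, 58.0],
    [40.5, 46.0, 60.0],
    [41.0, 59.0, 48.0],
    [43.0, 50.0, 57.0],
]
q_stat = friedman_statistic(blocks)
print(q_stat)
```

The statistic is compared against a chi-square distribution with k - 1 degrees of freedom; with six DOCs that would be five degrees of freedom.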

Frequency of longitudinal driving operation conditions based on different driving styles
As shown in Figure 13 and Table 11, the results of the DOC calibrations were classified and statistically analyzed according to driving style. In this naturalistic driving test, all drivers regardless of their dominant style preferred FCC. In addition to the influence of road factors (such as less crowded traffic flow and better road alignment), it showed that all drivers preferred free cruising conditions and attempted to avoid complex following conditions. It can be observed in Table 11 that cautious drivers took the largest proportion of FCC and the one-way ANOVA showed no difference (p = 0.073), indicating that cautious drivers tended to maintain FCC for a long time. On the contrary, there were significant differences between moderate and aggressive drivers, indicating that they will change their driving strategies according to the changes of driving environment in the process of naturalistic driving. In particular, the proportion of FSC and RAC by aggressive drivers was higher, indicating that aggressive drivers tended to challenge complex driving conditions.

Discussion of model recognition results
With reference to the four machine learning models, a sample set was established to distinguish driving styles. The difference is that the samples were divided into driving style labels in the data aggregation stage, namely, conservative driving style, moderate driving style and aggressive driving style. The sample set was divided into a 70% training set and a 30% test set. At the data level, the problem of the unbalanced number of conservative driving style, moderate driving style and aggressive driving style samples was addressed. The ENN method was used to undersample the normal samples in the training set. Then, the five-fold cross-validation method was used to train and verify the data of the training set and finally the model was tested on an independent test set. Table 9 shows the confusion matrix predicted by the established model to distinguish driving style on the independent test set. The values in Table 9 represent the number of driving segments in the test set. Figure 14 shows the variation trend and overfitting of the prediction accuracy of the training set and validation set with the increased sample training number in the cross-validation process of the four machine learning models, namely, SVM, XGB, LR and MLP. Table 9 shows the comparison of the prediction results of these models on the test set. For multi-class classification problems, the evaluation indices of the model were redefined. The accuracy of the model was defined as in the binary classification problem, that is, the proportion of correctly classified samples among all samples. As the confusion matrix of the three-way classification was different from that of the binary classification, the PPV, TPR, FPR, TNR and F1-score were also different.
In this study, to directly reflect the prediction of different driving styles, when calculating the evaluation index for any one driving style, the other two driving styles were merged into a single class and the task was then regarded as a binary classification problem. Figure 14 shows that the fitting accuracy of the SVM model on the training set was less than 80%, while the fitting accuracy of the other three models on the training set reached 100%. Moreover, with the gradual increase of the number of samples, and from the point of view of the validation score, all the models were over-fitting. However, as the sample size gradually increased, the scores of all the models on the test set showed an upward trend and the change was most obvious for the XGB model. With the increase of the test sample size, the problem of overfitting of each classification model was gradually alleviated. Compared with the other models, the SVM model had a smaller overfitting gap, but this was because the performance of the SVM model increased on the test set but decreased on the training set. That is to say, the SVM model relied on the decrease of accuracy on the training set and the increase of accuracy on the test set to reduce over-fitting, which is completely inconsistent with the performance of the other three models. Therefore, after analyzing the cross-validation results of the different machine learning models, the hierarchical performance ranking of the four models on the test set and training set was XGB > MLP > LR > SVM.
Considering model cross-validation results and prediction results, the overall hierarchical prediction performance ranking of the four machine learning models under the current sample data set was XGB > MLP > LR > SVM.
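The merging of two driving styles into one class when scoring the third, as described in this discussion, amounts to collapsing the three-way confusion matrix into one-vs-rest counts. A minimal Python sketch follows, assuming rows are true labels and columns are predicted labels; the matrix values are invented and are not those of the study's tables.

```python
def one_vs_rest_counts(matrix, labels, positive):
    """Collapse a multi-class confusion matrix into (TP, FN, FP, TN).

    Rows are true labels and columns are predicted labels; every class other
    than `positive` is merged into a single negative class.
    """
    p = labels.index(positive)
    total = sum(sum(row) for row in matrix)
    tp = matrix[p][p]
    fn = sum(matrix[p]) - tp                  # true positive class, predicted other
    fp = sum(row[p] for row in matrix) - tp   # other classes, predicted positive
    tn = total - tp - fn - fp
    return tp, fn, fp, tn


labels = ["conservative", "moderate", "aggressive"]
# Hypothetical counts of test-set driving segments.
matrix = [
    [5, 2, 0],
    [1, 8, 1],
    [0, 2, 4],
]

overall_accuracy = sum(matrix[i][i] for i in range(len(labels))) / sum(map(sum, matrix))
per_class = {}
for label in labels:
    tp, fn, fp, tn = one_vs_rest_counts(matrix, labels, label)
    per_class[label] = {"PPV": tp / (tp + fp), "TPR": tp / (tp + fn)}

print(round(overall_accuracy, 3), per_class["aggressive"])
```

Overall accuracy is still computed on the full three-way matrix, while PPV and TPR are computed per driving style from the collapsed counts.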

Conclusions
The driving style of each driver is not fixed; it is affected by driving environment, traffic state, psychological state and myriad other influencing factors. This exemplifies the characteristics of temporal and spatial instability and segment heterogeneity. If a real-time evaluation method of driving style based on driving segment change can be constructed, it is of great significance for formulating personalized driving strategies, improving driving safety and reducing fuel consumption. The purpose of this research was to identify DOCs based on longitudinal driving behavior data and rapidly predict and label driving styles through MLC models. The main contributions of this research are as follows:
Based on the longitudinal driving behavior parameters of naturalistic driving data, six DOCs of naturalistic driving on expressways were calibrated by formulating reasonable calibration rules, and the feasibility of the DOC calibrations was verified by naturalistic driving video data.
Compared with cautious and moderate drivers, aggressive drivers adopted a smaller THW and DHW during naturalistic driving. THW, time-to-collision (TTC) and DHW, three well-established longitudinal driving behavior parameters, showed highly significant differences in driving style identification, while LA showed no significant difference in driving style identification. At the same time, speed and TTC showed no significant difference between moderate and aggressive drivers.
Cautious drivers undertook the largest proportion of FCC, while aggressive drivers primarily undertook FCC, FSC and RAC, which indicated that cautious drivers preferred free cruising, but aggressive drivers tended to challenge complex driving conditions.
Four MLC methods, namely, SVM, XGB, LR and MLP, were used to classify and predict driving style based on the six DOCs. In consideration of the cross-validation results and model prediction results, the overall hierarchical prediction performance ranking of the four machine learning models under the current sample data set was XGB > MLP > LR > SVM.
The contribution of this research is to propose a criterion and solution for using longitudinal driving behavior data to label longitudinal DOCs and rapidly identify driving styles based on those DOCs and MLC models. This study provides a reference for real-time online driving style identification in vehicles equipped with onboard data acquisition equipment, such as ADAS.
However, there are still some directions to be further studied:
Naturalistic driving data were heterogeneous due to different road types; as a result, the threshold criterion for labeling the DOCs based on driving data from different road types may be neither portable nor generalizable. Therefore, the DOCs calibration criteria developed in this study may not be fully applicable to driving style identification on all types of road scenes. In addition, the problems of endogeneity among various DOCs and the spatiotemporal correlation also need to be further studied.
The influence of lateral driving behavior was simplified in this research, which may affect the training and test performance of the model. This research was an attempt to quickly label driving style. The multi-dimensional data of the vehicle's longitudinal and lateral driving behavior will be worth considering for modeling in future research.
The amount of sample input in this study was insufficient, which is reflected in the fact that the problem of overfitting was common in the process of model training and testing and the generalization error was large. Future research will carry out more naturalistic driving data collection to verify the model. At the same time, it is also necessary to carry out multi-scenario testing to study the applicability of the model under multiple scenarios.