Effects of feature selection on lane-change maneuver recognition: an analysis of naturalistic driving data

Xiaohan Li (Chair of Human-Machine-Systems, Technische Universität Berlin, Berlin, Germany)

Wenshuo Wang (Department of Mechanical Engineering, Carnegie Mellon University, PA, USA)

Zhang Zhang (Chair of Human-Machine-Systems, Technische Universität Berlin, Berlin, Germany)

Matthias Rötting (Chair of Human-Machine-Systems, Technische Universität Berlin, Berlin, Germany)

Journal of Intelligent and Connected Vehicles

ISSN: 2399-9802

Article publication date: 14 December 2018

Issue publication date: 8 February 2019


Abstract

Purpose

Feature selection is crucial for machine learning to recognize lane-change (LC) maneuvers because there are a large number of feature candidates. Using features blindly can take up large storage and excessive computation time, while insufficient feature selection causes poor performance. Selecting highly contributive features to classify LC and lane-keep behavior is effective for maneuver recognition. This paper aims to propose a feature selection method from a statistical view based on an analysis of naturalistic driving data.

Design/methodology/approach

In total, 1,375 LC cases are analyzed. To comprehensively select features, the authors extract the feature candidates from both time and frequency domains with various LC scenarios segmented by an occupancy schedule grid. Then the effect size (Cohen’s d) and p-value of every feature are computed to assess their contribution for each scenario.

Findings

It has been found that the common lateral features, e.g. yaw rate, lateral acceleration and time-to-lane crossing, are not as strong for recognition of LC maneuvers as empirical knowledge suggests. Finally, cross-validation tests are conducted to evaluate model performance using receiver operating characteristic metrics. Experimental results show that the selected features can achieve better recognition performance than using all the features without purification.

Originality/value

In this paper, the authors investigate the contributions of each feature from the perspective of statistics based on big naturalistic driving data. The aim is to comprehensively figure out different types of features in LC maneuvers and select the most contributive features over various LC scenarios.

Keywords

Citation

Li, X., Wang, W., Zhang, Z. and Rötting, M. (2018), "Effects of feature selection on lane-change maneuver recognition: an analysis of naturalistic driving data", Journal of Intelligent and Connected Vehicles, Vol. 1 No. 3, pp. 85-98. https://doi.org/10.1108/JICV-09-2018-0010

Publisher: Emerald Publishing Limited

License

Published in Journal of Intelligent and Connected Vehicles. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode

1. Introduction

Lane-change (LC) accidents account for about 4-10 per cent of all crashes (Barr and Najm, 2001) and 1.5 per cent of all motor vehicle fatalities in the USA (N. H. T. S. Administration, 2015). With the development of advanced driver assistance systems, functions such as lane departure warning (LDW) and lane-change assist (LCA) have entered the market to help avoid LC-related accidents (Visvikis et al., 2008). One of the key problems is how to correctly recognize the driver’s LC maneuver in advance. When an improper LC maneuver is occurring, the driver assistance system should either give a warning or assist the driver in aborting the maneuver.

Supervised learning is widely used for recognizing LC maneuvers. This approach raises the challenge of how to select the most contributive or efficient features. Longitudinal features [e.g. time to collision (TTC) (Liebner et al., 2013; Peng et al., 2015) and longitudinal acceleration] and lateral features [e.g. steering angle (Xu et al., 2012), yaw rate (Sivaraman and Trivedi, 2014; Doshi et al., 2011) and lateral acceleration (Boubezoul et al., 2009; Kasper et al., 2012)] have been used on the assumption, based on either intuition or empirical knowledge, that they are strong enough for LC maneuver recognition; however, this assumption has not yet been comprehensively studied.

In general, an LC maneuver can be either discretionary or mandatory. A mandatory LC occurs when a driver must leave a lane because of a lane drop, to bypass a blockage, etc. A discretionary LC occurs when a driver prefers a more efficient adjacent lane (J2944, 2013), for example, passing a slow-moving leading vehicle to maintain the current speed (Lee et al., 2004). Recognizing discretionary and mandatory LC maneuvers may therefore require differently weighted features. Leonhardt and Wanielik evaluated the effects of various features in different LC driving scenarios (Leonhardt and Wanielik, 2017) and showed that even the same feature carries a different weight when overtaking a slow vehicle than when merging. Thus, the feature selection process should also take LC scenarios into account.

In this paper, we propose a feature selection method for predicting driver LC behavior. Our aim is to comprehensively figure out different types of features in LC maneuvers and select the most contributive features over various LC scenarios. The main contribution of our work can be summarized as follows:

presenting a feature selection method from the perspective of statistics to investigate the statistical significance of each feature based on big naturalistic driving data;
considering both time-domain and frequency-domain features to fill gaps in existing work on feature selection; and
taking different driving scenarios into consideration in the feature extraction procedure to comprehensively evaluate the extracted features.

The remainder of this paper is organized as follows. Section 2 reviews the related work on feature selection and LC maneuver recognition. Section 3 describes how to model the contextual traffic in each LC scenario. Section 4 details data processing and feature extraction. Section 5 describes the feature evaluation method and the models used. Section 6 shows the results of feature selection and model performance. Section 7 presents the conclusion and future work.

2. Related works

2.1 Lane-Change maneuver recognition

Machine learning techniques, such as support vector machines (SVM) (Mandalia and Salvucci, 2005; Kumar et al., 2013), Naive Bayes (NB), Decision Tree (DT), k-nearest neighbor (KNN) (Lethaus et al., 2013), artificial neural networks (ANN) (Peng et al., 2015) and Bayesian Networks (BN) (Kasper et al., 2012; Li et al., 2016; Weidl et al., 2018), have been implemented to recognize driver LC maneuvers based on a well-trained classifier using labeled datasets. New data are then fed to the classifier, which classifies them as either an LC or a lane-keeping (LK) maneuver. Data classified as LC mean that the driver is, at that moment, likely to make an LC maneuver; otherwise not.

Although most papers include comparisons to show the effectiveness of their methods, this is achieved using less contributive features in the proposed models. To overcome this bias, we evaluate model performance using identical features, namely the comprehensively selected features, and then give them a relatively objective rating of their contribution to maneuver recognition.

2.2 Feature selection

The goal of feature selection is to reduce the dimension of training datasets by removing redundant information. In general, feature selection methods can be grouped into filter and wrapper methods. Filter methods analyze the intrinsic properties of the data, ranking and selecting features without involving learning algorithms. In contrast, wrapper methods use a learning algorithm to score a given subset of features (Guyon et al., 2008), so the ranking of features can vary from model to model. Here, we aim to find the intrinsic properties of feature candidates related to LC and select the most contributive features, rather than ranking and selecting features for a specific learning approach. Therefore, the filter method is selected in this paper.

For LC maneuver recognition, the data collected from sensors are time series data, and the properties of the features in the time domain are the most frequently extracted (Liebner et al., 2013) (Kasper et al., 2012). On the other hand, frequency-domain features have already been used to recognize driver state, for instance, the power spectrum features via wavelet transform were selected for Belief networks (Hajinoroozi et al., 2015; Chen et al., 2015). In other areas of time series recognition such as speech recognition (Thomas et al., 2008) and anomaly detection (Zhang et al., 2008), frequency-domain features play an important role.

In this paper, we consider the properties of the features in both time domain and frequency domain to select the most contributive features for LC maneuver recognition.

2.3 Modeling contextual traffic

To capture the most contributive features in each LC scenario, we first model the contextual traffic of the ego vehicle. A potential field diagram (Woo et al., 2016) composed of bubbles with different dynamic sizes was used to describe the dynamic relationship between the ego vehicle and its surrounding vehicles. However, it is not an intuitive way to analyze driving situations. Leonhardt and Wanielik (2017) developed a probabilistic situation assessment model to judge the safety state of the ego vehicle with respect to its surrounding vehicles; however, it only works as a behavior recognition model with a single input.

To easily describe the relationship between the ego vehicle and its surroundings, one of the most popular approaches is to segment the surrounding traffic into cell grids. The occupancy state of each cell is represented by a binary value, i.e. occupied or empty (Kasper et al., 2012; Do et al., 2017). We model contextual traffic also based on the cell grid method which is detailed in the next section.

3. Lane-Change scenario modeling

In Do et al. (2017), nine cells and 32 cases (2^5) were considered in the driving contextual traffic, but the authors did not give the specific boundaries of the cells. Kasper and Weidl modeled the cells with speed-dependent information on when a cell will be occupied or will become free (Kasper et al., 2012). However, they assumed that the vehicle can move unobstructed toward a certain cell, which does not hold in situations where a car wants to overtake the ego car. In this paper, we model the cell grid by considering the dynamic relationship between the ego car and the surrounding cars. A three-cell grid is used to model the contextual traffic in which the ego vehicle executes the LC maneuver, giving eight cases across left and right LC.

Because our on-board sensors can only detect the traffic in front of the ego vehicle, the traffic situation behind the ego vehicle is not considered. Despite this limitation, our method of modeling contextual traffic can be extended to more cells that include the traffic behind the ego vehicle. Here we only model the contextual traffic in front of the ego vehicle, as depicted in Figure 1(a).

We adopt the theory presented in Karim et al. (2013) to define the middle cell (cell_m) and the theory in Kesting and Treiber (2013) to define the left cell (cell_l) and the right cell (cell_r). The dynamic lengths of the cells are s_1^*, s_2^* and s_3^*, as shown in Figure 1(a). The length of cell_m is defined by a mean safe time gap (MSTG) in Karim et al. (2013) as:

(1) $\mathrm{MSTG} = BT_{EV} - BT_{OV} + RT$

where BT_EV and BT_OV are the brake times of the ego vehicle and object vehicle 1, respectively, and RT is the driver’s perception-reaction time. For a given vehicle, BT is calculated by an empirical equation:

(2) $BT = 0.02321 \cdot v - 0.08785$

where v is the vehicle speed and thus:

(3) $BT_{EV} - BT_{OV} = 0.02321 \cdot \dot{R}$

where Ṙ is the range rate between the ego vehicle and object vehicle 1. So, the dynamic length s_1^* can be written as:

(4) $s_1^* = v \cdot \mathrm{MSTG}$

where v is the longitudinal speed of the ego vehicle.
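Equations (1)-(4) can be chained into a single computation of the middle-cell length. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, RT = 1.9 s is the value listed in Table I, and the units of v and Ṙ are assumed to match those of the empirical coefficient in equation (2), which the text does not state.

```python
def mean_safe_time_gap(range_rate, reaction_time=1.9):
    """MSTG from equations (1) and (3): 0.02321 * range_rate + RT.

    reaction_time defaults to RT = 1.9 s (Table I).
    """
    return 0.02321 * range_rate + reaction_time


def middle_cell_length(ego_speed, range_rate, reaction_time=1.9):
    """Dynamic length s1* of cell_m from equation (4): v * MSTG."""
    return ego_speed * mean_safe_time_gap(range_rate, reaction_time)
```

With a zero range rate the cell length reduces to v times RT, so the cell grows linearly with the ego speed.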

We define cell_l and cell_r based on the Intelligent Driver Model (IDM) (Kesting and Treiber, 2013). Here, the safe distance is derived from considerations such as the leading vehicle, driving at a desired speed and keeping accelerations within a comfortable range. Additionally, kinematic aspects are taken into account, such as the quadratic relation between braking distance and speed. First, the desired distances on the left lane (s_l^*) and the right lane (s_r^*) are defined, respectively, as:

(5) $s_l^* = s_0 + \max\left(0,\; v \cdot T + \frac{\dot{R}_l \cdot R_l}{2 \cdot a^* \cdot b^*}\right)$

(6) $s_r^* = s_0 + \max\left(0,\; v \cdot T + \frac{\dot{R}_r \cdot R_r}{2 \cdot a^* \cdot b^*}\right)$

where s₀ is the minimum (bumper-to-bumper) gap, T is the safe time gap, and a^* and b^* are the acceleration and the comfortable deceleration, respectively. R_l, Ṙ_l and R_r, Ṙ_r are the range and range rate between the ego vehicle and object Vehicle 2 and Vehicle 3 in Figure 1(b), respectively. The dynamic terms Ṙ_l⋅R_l/(2⋅a^*⋅b^*) and Ṙ_r⋅R_r/(2⋅a^*⋅b^*) represent the intelligent braking strategy for the left lane-change (LLC) and right lane-change (RLC) cases.
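As an illustration of equations (5) and (6), the sketch below evaluates the desired distance for one adjacent lane. It follows the dynamic term exactly as written in the text, Ṙ⋅R/(2⋅a^*⋅b^*), and uses the Table I values as defaults. The function name and argument names are hypothetical.

```python
def desired_distance(ego_speed, rng, range_rate,
                     s0=2.0, time_gap=1.0, a_star=1.0, b_star=1.5):
    """Desired distance s* on an adjacent lane, following equations (5)/(6).

    rng and range_rate are the range and range rate to the object vehicle
    on that lane; the defaults are the parameter values from Table I.
    """
    dynamic_term = range_rate * rng / (2.0 * a_star * b_star)
    return s0 + max(0.0, ego_speed * time_gap + dynamic_term)
```

The max(0, ...) clamp means the desired distance never falls below the minimum gap s₀, however negative the dynamic term becomes.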

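Second, based on the desired distances on the left lane (s_l^*) and the right lane (s_r^*), the dynamic safety distances, namely the lengths s_2^* and s_3^*, can be written as:

(7) $s_2^* = s_l^* \left[\left(\frac{s_l^*}{R_l}\right)^2 - \frac{\Delta a + a_{bias}}{a_z}\right]$

(8) $s_3^* = s_r^* \left[\left(\frac{s_r^*}{R_r}\right)^2 - \frac{\Delta a + a_{bias}}{a_z}\right]$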

where az is the longitudinal acceleration of the ego vehicle, Δa is the LC threshold and a_bias represents the asymmetric property of LLC and RLC.

All the values of the parameters in equations (1) and (5)-(8) are listed in Table I (Kesting and Treiber, 2013), and the occupancy states of cells can be given as (Figure 2):

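(9) $\mathrm{cell}_m = \begin{cases} 0 & \text{if } R \ge s_1^* \\ 1 & \text{if } R < s_1^* \end{cases}$

(10) $\mathrm{cell}_l = \begin{cases} 0 & \text{if } R_l \ge s_2^* \\ 1 & \text{if } R_l < s_2^* \end{cases}$

(11) $\mathrm{cell}_r = \begin{cases} 0 & \text{if } R_r \ge s_3^* \\ 1 & \text{if } R_r < s_3^* \end{cases}$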

Depending on the occupancy state of the cell grid, eight scenarios (four for LLC and four for RLC) can be generated, as depicted in Figure 2:

LLC Scenario 0_0: When the ego vehicle makes an LLC, there is no object vehicle in either cell_m or cell_l;
LLC Scenario 0_1: When the ego vehicle makes an LLC, there is no object vehicle in cell_l but cell_m is occupied;
LLC Scenario 1_0: When the ego vehicle makes an LLC, there is no object vehicle in cell_m but cell_l is occupied;
LLC Scenario 1_1: When the ego vehicle makes an LLC, both cell_m and cell_l are occupied;
RLC Scenario 0_0: When the ego vehicle makes an RLC, there is no object vehicle in either cell_m or cell_r;
RLC Scenario 0_1: When the ego vehicle makes an RLC, there is no object vehicle in cell_m but cell_r is occupied;
RLC Scenario 1_0: When the ego vehicle makes an RLC, there is no object vehicle in cell_r but cell_m is occupied; and
RLC Scenario 1_1: When the ego vehicle makes an RLC, both cell_m and cell_r are occupied.

Here, the names of the LC scenarios, such as Scenario 0_1 and Scenario 1_0, follow the binary states of the occupancy cells illustrated in Figure 2.
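The occupancy rule of equations (9)-(11) and the scenario naming can be sketched as follows. The digit order is inferred from the scenario descriptions above (left-to-right cell order: cell_l then cell_m for LLC, cell_m then cell_r for RLC). The function names are hypothetical, and treating a missing detection as an empty cell is an assumption, not something the text specifies.

```python
def occupied(rng, cell_length):
    """Return 1 if the object vehicle lies inside the cell (range < length), else 0.

    A missing detection (rng is None) is treated as an empty cell; this is
    an assumption made for the sketch.
    """
    if rng is None:
        return 0
    return 1 if rng < cell_length else 0


def scenario_label(lc_type, cell_m, cell_side):
    """Build the scenario name from the binary cell states.

    cell_side is cell_l for an LLC and cell_r for an RLC.
    """
    if lc_type == "LLC":
        return f"LLC Scenario {cell_side}_{cell_m}"
    if lc_type == "RLC":
        return f"RLC Scenario {cell_m}_{cell_side}"
    raise ValueError("lc_type must be 'LLC' or 'RLC'")
```

For example, an LLC with an occupied middle cell and an empty left cell yields "LLC Scenario 0_1", matching the description in the list above.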

4. Data processing and feature extraction

4.1 Naturalistic driving data

The naturalistic driving data used in this paper are from the Safety Pilot Model Deployment (SPMD) project.

The on-road test includes multi-modal traffic, hosting approximately 3,000 vehicles equipped with vehicle-to-vehicle (V2V) communication devices (Henclewood et al., 2014). The data sets we used were extracted from 20 vehicles driving in the field test area, which includes 75 miles of roadway (see Figure 3). Roads marked in yellow are the routes driven by the SPMD vehicles. Drivers joined the SPMD project voluntarily. They drove the SPMD vehicles entirely according to their own driving styles, with no restriction on their driving behaviors. Each SPMD vehicle was equipped with data acquisition systems (DAS) such as CAN and GPS and a vision system such as Mobileye. All the signals coming from the different DAS were time-synchronized and recorded at 10 Hz.

Finally, 1,375 LC cases (761 LLC and 614 RLC) were analyzed. The distribution of the LC cases over the corresponding LC scenarios in Figure 2 can be seen in Table II. For LLC, most of the cases took place in LLC Scenario 0_0 (365 cases) and LLC Scenario 0_1 (354 cases). For RLC, the dominant cases are RLC Scenario 0_0 (371 cases) and RLC Scenario 1_0 (214 cases). This result implies that when the driver wants to execute a left/right LC, he/she tends to wait until the destination lane is empty (cell_l/cell_r is unoccupied).

4.2 Feature extraction

4.2.1 Original features from on-board sensors

Vehicle yaw rate and lateral acceleration are usually used as strong features of vehicle lateral behavior. Together with longitudinal acceleration, these signals are necessary for recognizing, predicting and modeling vehicle lateral behaviors (Leonhardt and Wanielik, 2017; Higgs and Abbas, 2015; Li et al., 2015; Luo et al., 2016). Here we also choose the following signals, collected directly from on-board sensors, as our candidate features:

yawRate_t = yaw rate of the ego vehicle at time t;
az_t = longitudinal acceleration of the ego vehicle at time t; and
ax_t = lateral acceleration of the ego vehicle at time t.

4.2.2 Compound features

Time-to-collision (TTC) is the time required for two vehicles to collide if they continue at their present speeds on the same path. It is usually used to evaluate collision risk (Kusano and Gabler, 2011). A small TTC indicates that the driver may execute an LC to overtake the slow leading vehicle. Thus, the TTC can be regarded as a valuable feature for recognizing LC maneuvers (Kasper et al., 2012). Time-to-lane crossing (TLC) represents the time available for a driver until the moment at which any part of the vehicle reaches one of the lane boundaries (Godthelp et al., 1984). It is a parameter for estimating whether the ego vehicle is going to cross the lane. Based on (J2944, 2013), TTC and TLC are given as follows.

TTC with the object vehicle in front on the current lane (TTC_t) at time t:
(12) $TTC_t = \frac{R}{\dot{R}}$

where R and Ṙ [in Figure 1(b)] are the range and the range rate between the front edge of the ego vehicle and the rear edge of the closest object vehicle in the same traveling path as the ego vehicle, respectively. Note that TTC is only calculated for LC cases with cell_m = 1, because cell_m = 0 means there is no vehicle in the cell.

TLC at time t (TLC_t):
(13) $TLC_t = \frac{d_x}{v_x}$

where d_x is the lateral distance between the front wheel of the ego vehicle and the lane boundary, and v_x is the lateral speed.

When Ṙ or v_x is equal to zero, equation (12) or (13), respectively, approaches infinity, so we use the inverses TTC_t^{-1} and TLC_t^{-1} instead.
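The reason for working with the inverse quantities can be seen in a short sketch: the inverses of equations (12) and (13) divide by the range or lateral distance rather than by a rate, so they stay finite when the rate is zero. The function names are hypothetical, and the sign convention of the rates is left as recorded by the sensors.

```python
def inverse_ttc(rng, range_rate):
    """TTC^-1 = range_rate / range; equals 0 when the range rate is 0."""
    if rng <= 0:
        raise ValueError("range must be positive")
    return range_rate / rng


def inverse_tlc(lateral_distance, lateral_speed):
    """TLC^-1 = lateral_speed / lateral_distance; equals 0 when the lateral speed is 0."""
    if lateral_distance <= 0:
        raise ValueError("lateral distance must be positive")
    return lateral_speed / lateral_distance
```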

4.2.3 Time-Window features

Vehicle on-board signals are time series, so using a time window (TW) for feature extraction is effective for capturing the information during the past few seconds (Thissen et al., 2003; Salfner and Malek, 2007). For LC recognition, TW lengths between 1 and 5 s are selected for feature extraction (Mandalia and Salvucci, 2005). To capture the properties of the time series, statistical variables (mean, standard deviation, maximum, minimum and median) are calculated within each TW (Li et al., 2015), as described in Table III, i.e. feature numbers 6-80. The superscript of a feature name is the TW length, so ‘5’ in mean_yaw_t^5 means a TW of 5 s and ‘4’ in mean_yaw_t^4 means a TW of 4 s; see features #6 and #7 as examples.
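A minimal sketch of the time-window statistics is given below, assuming the 10 Hz recording rate reported for the SPMD signals and a signal stored as a list with the most recent sample last. The function names and the key format are hypothetical, and the use of the sample standard deviation is an assumption since the text does not say which estimator was used.

```python
import statistics

SAMPLE_RATE_HZ = 10  # recording rate reported for the SPMD signals


def window_features(signal, window_s):
    """Statistics over the last `window_s` seconds of `signal`."""
    n = int(window_s * SAMPLE_RATE_HZ)
    if len(signal) < n:
        raise ValueError("signal is shorter than the requested window")
    window = signal[-n:]
    return {
        "mean": statistics.mean(window),
        "std": statistics.stdev(window),  # sample std: an assumption
        "max": max(window),
        "min": min(window),
        "med": statistics.median(window),
    }


def all_window_features(signal, name):
    """Five statistics for each TW length from 5 s down to 1 s (25 features)."""
    features = {}
    for window_s in (5, 4, 3, 2, 1):
        for stat, value in window_features(signal, window_s).items():
            features[f"{stat}_{name}_t^{window_s}"] = value
    return features
```

Applied to the three original signals (yaw rate, az and ax), this yields 3 x 25 = 75 features, which matches the count of feature numbers 6-80.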

4.2.4 Frequency-domain features

Frequency-domain feature extraction has already been used in anomaly detection (Chen et al., 2015; Chandola et al., 2009). The fast Fourier transform (FFT) was used to transform time-domain signals into the frequency domain (Heckbert, 1995). The maximum value of the FFT coefficients within a TW is a good indicator of the properties of frequency signals (Mörchen, 2003). The frequency-domain features are described in Table III, with feature numbers 81-95.
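The frequency-domain feature can be illustrated with a small sketch. For self-containment it uses a naive discrete Fourier transform, which gives the same coefficients as an FFT but more slowly; in practice an FFT routine would be used. Taking the magnitude of the coefficients and keeping the zero-frequency (DC) term are assumptions, since the text only states that the maximum FFT coefficient within the TW is used.

```python
import cmath


def dft_magnitudes(window):
    """Magnitudes of the discrete Fourier transform coefficients of `window`."""
    n = len(window)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, x in enumerate(window)))
        for k in range(n)
    ]


def max_spectral_magnitude(window):
    """Largest coefficient magnitude within the time window."""
    return max(dft_magnitudes(window))
```

For a 5 s window at 10 Hz the window holds 50 samples, so the quadratic cost of the naive transform is negligible here.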

4.3 Labeling LC datasets

To evaluate the extracted features, both LC and LK data sets should be labeled. Taking LLC as an example, as shown in Figure 4, the ego vehicle (blue) intends to overtake the slow vehicle (red) by a left lane change. The moment at which the left wheel of the ego car just crosses the central dotted line is marked as the initial LC time t₀. Based on the study in Salvucci and Liu (2002), drivers normally tend to start the LC maneuver approximately 5 s before the actual LC. Thus, in this paper, the time series between 5 s before t₀ and t₀ are labeled as LC behavior. To ensure that the LK data sets are separated from the LC data sets, LK behavior is labeled between 15 and 10 s prior to t₀. RLC data sets are labeled in the same way.
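The labeling rule can be sketched as two slices of the recorded signal around the crossing sample. This assumes the 10 Hz sampling rate of the data set. The function name is hypothetical, and whether the sample at t₀ itself belongs to the LC segment is an assumption (it is excluded here).

```python
SAMPLE_RATE_HZ = 10  # recording rate reported for the SPMD signals


def label_segments(signal, t0_index):
    """Split one recording into LC and LK segments around the crossing time t0.

    LC: the 5 s immediately before t0. LK: from 15 s to 10 s before t0.
    """
    lc_start = t0_index - 5 * SAMPLE_RATE_HZ
    lk_start = t0_index - 15 * SAMPLE_RATE_HZ
    lk_end = t0_index - 10 * SAMPLE_RATE_HZ
    if lk_start < 0 or t0_index > len(signal):
        raise ValueError("recording does not cover 15 s before t0")
    lc_segment = signal[lc_start:t0_index]
    lk_segment = signal[lk_start:lk_end]
    return lc_segment, lk_segment
```

The 5 s gap between the two slices is what keeps the LK samples separated from the LC samples.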

5. Method

5.1 Feature evaluation

From a statistical point of view, the p-value is commonly used to test whether there is a statistically significant difference between two groups. In our case, if there is a statistically significant difference between the LC data sets and the LK data sets, the extracted features are probably good indicators for classifying LC and LK maneuvers. However, using only the p-value to evaluate significance is insufficient (Sullivan and Feinn, 2012). The effect size, such as Cohen’s d (Cohen, 1988), is also used as an important evaluation metric (Cohen, 1990):

(14) $d = \frac{|M_1 - M_2|}{\sqrt{\frac{S_1^2 + S_2^2}{2}}}$

where:

d = Cohen’s index;
M₁ = mean of the first group data;
M₂ = mean of the second group data;
S₁ = standard deviation of the first group data; and
S₂ = standard deviation of the second group data.

To define the significance level, Cohen defines the effect classes as follows (Cohen, 1992):

d < 0.5 = small effect;
0.5 ≤ d < 0.8 = medium effect; and
d ≥ 0.8 = large effect.

For each LC maneuver, we label the LC and LK data sets and calculate both Cohen’s d and the p-value for each feature. Then, over all the LC cases, we average the Cohen’s d values and p-values to obtain the mean for each feature in each scenario.
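The effect-size computation and the effect classes listed above can be sketched as follows. The function names are hypothetical, and the sample standard deviation is assumed for S₁ and S₂. The p-value is not computed here because the text does not name the statistical test used.

```python
import math
import statistics


def cohens_d(group1, group2):
    """Cohen's d: |M1 - M2| divided by sqrt((S1^2 + S2^2) / 2)."""
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    s1, s2 = statistics.stdev(group1), statistics.stdev(group2)
    pooled = math.sqrt((s1 ** 2 + s2 ** 2) / 2)
    return abs(m1 - m2) / pooled


def effect_class(d):
    """Effect class using the thresholds given in the text."""
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    return "small"
```

For one feature, group1 would hold its values in the LC segment and group2 its values in the LK segment of the same maneuver.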

5.2 Models used for feature evaluation

To test if the selected features have advantages for machine learning techniques over all features, the SVM, NB, DT and KNN are chosen to evaluate classification performance. We then built the above learning-based models using the Statistics and Machine Learning Toolbox[1]. Here, the SVM model was set with a Gaussian kernel function, and NB with Kernel smoothing density estimation method, DT with the default setting and KNN using empirical prior with k = 1. The datasets used for training the machine learning models are the same, which are labeled by the method presented in Section 4.3.

6. Results and analysis

6.1 Analysis from effect size and p value

All the evaluation results (Cohen’s d and p-value) for each feature can be found in Table AI. A p-value smaller than 0.05 can be regarded as statistically significant, and a Cohen’s d larger than 0.8 indicates a large effect (Cohen, 1992). Following these two criteria, we mark each feature with a Cohen’s d larger than 0.8 and a p-value smaller than 0.05 in red in Table AI. The red-marked features have a great influence on the corresponding LC maneuver (LLC or RLC) and thus can be selected as strong features for LC maneuver recognition. Overall, based on the features marked in red, we find the following:

Although some features (p < 0.05) show statistical significance (marked in blue), they have only a medium or small effect size (Cohen’s d < 0.8). This result is consistent with the view that using only the p-value to evaluate statistical significance is not enough (Sullivan and Feinn, 2012);
The original features yawRate_t (#1), az_t (#2) and ax_t (#3) and the compound feature TLC_t^{-1} (#5) are not strong features for the LLC case, with no items marked in red. For RLC, only az_t and TLC_t^{-1} in RLC Scenario 0_1 can be regarded as strong features. This implies that the common empirical practice of using these features is not very convincing.
As mentioned, TTC_t^{-1} is only calculated when the front cell of the ego vehicle is occupied by an object vehicle (cell_m = 1). TTC_t^{-1} is marked as a strong feature in the LLC case, which demonstrates that the potential for a rear-end collision does influence drivers’ LC decisions. Many studies hypothesize that if the driver follows a leading vehicle that is too slow, he/she will probably make an LC to overtake it. This analysis of naturalistic driving data indicates that this hypothesis is reasonable.
Features #56-#60, which refer to mean_ax, are the least important features for LC maneuver recognition, with no item marked in red.
To analyze the TW features (#6-#95), we take the marked strong features in LLC Scenario 0_0 and LLC Scenario 0_1 as examples. Here, we segment the table vertically into groups of five features, e.g. features #6-#10 are related to the same feature mean_yaw but with different TW lengths from 5 to 1 s, and so on.

The detailed illustration is shown in Table AI, where the features with the largest Cohen’s d and the smallest p-value (marked with ‘▲’ and ‘▼’, respectively) have the strongest effect on LC. From these peak and valley values we find that features with the largest Cohen’s d are also likely to have the smallest p-values, except for features #31 and #32 in LLC Scenario 0_0. We select the final features for each scenario based on the marked peak and valley features, and for special cases such as features #31 and #32, the feature with the larger Cohen’s d (e.g. #31) is selected.
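One possible reading of this selection step is sketched below: within each group of five TW features, keep the feature with the largest Cohen’s d among those that meet both criteria (d > 0.8 and p < 0.05). This is an interpretation rather than the authors' exact procedure, the function name is hypothetical, and the input maps feature numbers to (d, p) pairs such as those reported in Table AI.

```python
def peak_per_group(stats, first=6, last=95, group_size=5,
                   d_min=0.8, p_max=0.05):
    """Select one peak feature per group of `group_size` TW features.

    stats maps feature number -> (cohens_d, p_value). A group contributes
    a feature only if at least one member has d > d_min and p < p_max.
    """
    selected = []
    for start in range(first, last + 1, group_size):
        group = [n for n in range(start, start + group_size) if n in stats]
        strong = [n for n in group
                  if stats[n][0] > d_min and stats[n][1] < p_max]
        if strong:
            # Ties between d and p are resolved in favour of the larger d,
            # as described for features #31 and #32.
            selected.append(max(strong, key=lambda n: stats[n][0]))
    return selected
```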

6.2 Final selected features for each LC scenario

Based on the marked features and results, the final selected features in each LC scenario are listed in Table IV. Different LC scenarios have different feature sets. The number of features selected from all 95 features for each LC scenario ranges from 8 to 16. No feature is eligible for all LC scenarios. Using only original features and compound features (#2, #4, #5) is far from sufficient, because no such features are selected at all in LLC Scenario 0_0, LLC Scenario 1_0, RLC Scenario 0_0 and RLC Scenario 1_0.

Although original features related to the vehicle’s lateral movement [yawRate_t (#1), az_t (#2) and ax_t (#3)] are not as contributive as expected, their corresponding TW features show large effect sizes. This implies that the properties of the original features within a certain TW carry more important information regarding LC maneuvers. In addition, frequency-domain features do contribute as expected, with at least one feature falling into the strong feature set in nearly every scenario (the exception is LLC Scenario 0_1, with no eligible frequency-domain feature). In what follows, we use these selected features to train models and evaluate their performance.

6.3 Performance evaluation using the selected features

To test whether the selected features can really improve model performance, we compare the classification results of different models trained with the selected features in Table IV (termed ‘Selected’) and with all features (termed ‘All Features’). The data sets used for training are the same as those used for calculating Cohen’s d and the p-value. To guarantee that the training data and testing data are disjoint, a cross-validation (CV) method is used to test the performance of these models. The data sets are evenly divided into ten folds. Nine folds are used to train the models and the remaining fold is used to test them.
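The ten-fold split can be sketched as an index generator. Shuffling before splitting and the fixed seed are assumptions made for reproducibility of the sketch; the text only states that the data are divided evenly into ten folds. The function name is hypothetical.

```python
import random


def ten_fold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffling is an assumption
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test
```

Each sample appears in exactly one test fold, so every round trains and tests on disjoint data.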

The receiver operating characteristic (ROC) curve is used to assess model performance, as it is widely used to illustrate the performance of binary classifiers by considering the true positive rate (TPR) and false positive rate (FPR) over different threshold settings (Lethaus et al., 2013; Morris et al., 2011). TPR and FPR are defined as follows:

(15) $TPR = \frac{TP}{TP + FN}, \quad FPR = \frac{FP}{TN + FP}$

where TP, TN, FP and FN are true positives, true negatives, false positives and false negatives, respectively. A simple way to compare different classifiers is to calculate the area under the curve (AUC), which ranges from 0 to 1. A larger AUC value indicates better performance. All ROC curves are illustrated in Figures 5 and 6. The corresponding AUC values are listed in Table V. Figure 5 shows the classification performance of each classification model in the different LLC scenarios. The blue lines are the ROC curves obtained using the selected features for training, while the red lines are those obtained using all features. Figure 6 presents the same content as Figure 5 but for the RLC scenarios.
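The ROC construction and AUC calculation can be sketched as follows: sweep a threshold over the classifier scores, compute (FPR, TPR) at each threshold as in equation (15), and integrate with the trapezoidal rule. The function names are hypothetical, and labels are assumed to be 1 for LC and 0 for LK.

```python
def roc_points(labels, scores):
    """(FPR, TPR) points obtained by thresholding `scores` at each distinct value."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

A classifier that ranks every LC sample above every LK sample gives an AUC of 1, while one misordered pair out of four gives 0.75.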

In Table V, a comparison is made between all features and selected features for each classification model in each LC scenario. We denote ‘↓’ as the performance deterioration of using selected features compared with using all features. The improvement in percentage is also shown in Table V. Finally, we find the following results:

KNN performs very well in all LC scenarios, with AUC values greater than 0.95, and so does DT, except in LLC Scenario 0_0 with all features (AUC = 0.82). In this case, using the selected features significantly improves the classification performance of DT from 0.82 to 0.98 (an increase of 19.5 per cent). For DT, using the selected features shows only a tiny deterioration (1.0 per cent) in LLC Scenario 1_1 and RLC Scenario 1_1.
For SVM, using the selected features greatly improves classification performance compared with using all features (an increase of between 4.2 and 13.6 per cent); the only decline is in LLC Scenario 0_1.
NB presents a different picture. Using the selected features does not improve model performance compared with using all features (no improvement in any LLC scenario), except in RLC Scenario 0_1 and RLC Scenario 1_1 (Figure 6).

7. Conclusion and future work

In this paper, a statistics-based feature selection method for recognition of LC maneuvers is proposed using naturalistic driving data from the time domain and the frequency domain. The extracted features include original features collected from on-board sensors, compound features such as TTC and TLC, and time-window features. In total, 95 features are extracted as candidate features. We found that the final selected features differ between LC scenarios. No single feature is sufficient for all the LC scenarios. In addition, features related to vehicle lateral movement that are frequently used for LC recognition, such as yaw rate (yawRate_t, #1), lateral acceleration (ax_t, #3) and TLC (TLC_t^{-1}, #5), do not show statistical significance (except for TLC_t^{-1} in RLC Scenario 0_1). This counter-empirical result makes it all the more worthwhile to perform feature selection rather than relying on empirical knowledge alone.

Finally, the classification performance using the final selected features in each LC scenario is compared with that using all features. The results show that, except for the relatively poor performance of Naive Bayes, the performance of SVM, Decision Tree and KNN can be improved to different degrees by using the selected features in most LC scenarios compared with using all features. In summary, the high performance achieved by the classification models using all 95 features comes at the expense of computation time and large storage. In contrast, using the selected features (only about ten features) to train the models can still achieve the same performance or even a significant improvement. In future work, a series of on-road experiments will be conducted to evaluate LC maneuver recognition performance in real-time scenarios.

Figures

Figure 1

Illustration of the occupancy cells of LC scenarios

Figure 2

Illustration of the occupancy cells LC scenarios

Figure 3

On-road test area of SPMD (Bezzina and Sayer, 2014)

Figure 4

Data labeling for LC and LK behaviors

Figure 5

ROC curves of comparison results with different models using the selected features and all features in LLC scenarios

Figure 6

ROC curves of comparison results with different models using the selected features and all features in RLC scenarios

Table I

Values of the defined cell grid

Parameter	Value
RT	1.9 s
T	1.0 s
s₀	2 m
a*	1.0 m/s²
b*	1.5 m/s²
Δa	0.1 m/s²
a_bias	0.3 m/s²

Table II

Total amount of LC cases

LC Type	Scenario	Amount
LLC	0_0	365
	0_1	354
	1_0	15
	1_1	27
RLC	0_0	371
	0_1	10
	1_0	214
	1_1	16

Table III

Description of the extracted features

#	Feature name	Feature description
1	yawRate_t	yaw rate of ego vehicle at time t
2	az_t	az of ego vehicle at time t
3	ax_t	ax of ego vehicle at time t
4	TTCt−1	TTCt−1 at time t
5	TLCt−1	TLCt−1 at time t
6	mean_yawt5	mean of yawRate in TW 5 s
7	mean_yawt4	mean of yawRate in TW 4 s
8-10	⋮	mean_yaw_t in TW 3 s, 2 s, 1 s
11	std_yawt5	std of yawRate in TW 5 s
12-15	⋮	std_yaw_t in TW 4 s, 3 s, 2 s, 1 s
16	max_yawt5	maximum of yawRate in TW 5 s
17-20	⋮	max_yaw_t in TW 4 s, 3 s, 2 s, 1 s
21	min_yawt5	minimum of yawRate in TW 5 s
22-25	⋮	min_yaw_t in TW 4 s, 3 s, 2 s, 1 s
26	med_yawt5	median of yawRate in TW 5 s
27-30	⋮	med_yaw_t in TW 4 s, 3 s, 2 s, 1 s
31	mean_azt5	mean of the az in TW 5 s
32-35	⋮	mean_az_t in TW 4 s, 3 s, 2 s, 1 s
36	std_azt5	standard deviation of az in TW 5 s
37-40	⋮	std_az_t in TW 4 s, 3 s, 2 s, 1 s
41	max_azt5	maximum of az in TW 5 s
42-45	⋮	max_az_t in TW 4 s, 3 s, 2 s, 1 s
46	min_azt5	minimum of az in TW 5 s
47-50	⋮	min_az_t in TW 4 s, 3 s, 2 s, 1 s
51	med_azt5	median of az in TW 5 s
52-55	⋮	med_az_t in TW 4 s, 3 s, 2 s, 1 s
56	mean_axt5	mean of the ax in TW 5 s
57-60	⋮	mean_ax_t in TW 4 s, 3 s, 2 s, 1 s
61	std_axt5	standard deviation of ax in TW 5 s
62-65	⋮	std_ax_t in TW 4 s, 3 s, 2 s, 1 s
66	max_axt5	maximum of ax in TW 5 s
67-70	⋮	max_ax_t in TW 4 s, 3 s, 2 s, 1 s
71	min_axt5	minimum of ax in TW 5 s
72-75	⋮	min_ax_t in TW 4 s, 3 s, 2 s, 1 s
76	med_axt5	median of ax in TW 5 s
77-80	⋮	med_ax_t in TW 4 s, 3 s, 2 s, 1 s
81	max_F_yawt5	max yawRate FFT coefficients in TW 5 s
82-85	⋮	max_F_yaw_t in TW 4 s, 3 s, 2 s, 1 s
86	max_F_azt5	max az FFT coefficients in TW 5 s
87-90	⋮	max_F_az_t in TW 4 s, 3 s, 2 s, 1 s
91	max_F_axt5	max ax FFT coefficients in TW 5 s
92-95	⋮	max_F_ax_t in TW 4 s, 3 s, 2 s, 1 s
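The time-window and frequency-domain features listed in Table III can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' extraction code: the sampling rate `fs_hz=10`, the use of trailing windows ending at time t, the sample (n − 1) standard deviation, and the choice to take the largest DFT magnitude over the non-DC bins are all assumptions. The helper names `max_dft_magnitude` and `window_features` are hypothetical; only the feature-name pattern (for example `mean_yawt5`, `max_F_yawt5`) follows the table.

```python
import cmath
import math
import statistics


def max_dft_magnitude(segment):
    """Largest DFT magnitude over the non-DC bins up to the Nyquist bin."""
    n = len(segment)
    best = 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(segment))
        best = max(best, abs(coeff))
    return best


def window_features(signal, name, fs_hz=10, windows_s=(5, 4, 3, 2, 1)):
    """Statistics of the most recent w seconds of `signal`, for each window w."""
    feats = {}
    for w in windows_s:
        seg = signal[-w * fs_hz:]  # trailing window; shorter signals are used whole
        feats[f"mean_{name}t{w}"] = statistics.fmean(seg)
        feats[f"std_{name}t{w}"] = statistics.stdev(seg)
        feats[f"max_{name}t{w}"] = max(seg)
        feats[f"min_{name}t{w}"] = min(seg)
        feats[f"med_{name}t{w}"] = statistics.median(seg)
        feats[f"max_F_{name}t{w}"] = max_dft_magnitude(seg)
    return feats


# Synthetic 5 s yaw-rate trace: a 1 Hz sine sampled at the assumed 10 Hz.
yaw = [math.sin(2 * math.pi * i / 10) for i in range(50)]
yaw_feats = window_features(yaw, "yaw")
```

Each signal yields 6 statistics × 5 windows = 30 values, so three signals (yaw rate, az, ax) give 90 window features, which together with the five original features matches the 95 candidates in Table III.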

Table IV

Final selected strong features for each LC scenario

#		LLC Scenario								RLC Scenario
		0_0		0_1		1_0		1_1		0_0		0_1		1_0		1_1
	Feature	d	p	d	p	d	p	d	p	d	p	d	p	d	p	d	p
Selected Amount		12		10		11		11		8		16		11		13
2	az_t	–	–	–	–	–	–	–	–	–	–	0.96	0.04	–	–	–	–
4	TTCt−1	–	–	0.92	0.04	–	–	1.18	0.01	–	–	–	–	–	–	1.18	< 0.01
5	TLCt−1	–	–	–	–	–	–	–	–	–	–	0.82	0.02	–	–	–	–
6	mean_yawt5	1.02	0.03	–	–	–	–	1.54	0.04	0.92	0.04	1.29	0.04	0.96	0.04	1.54	< 0.01
7	mean_yawt4	–	–	0.98	0.04	1.14	< 0.01	–	–	–	–	–	–	–	–	–	–
11	std_yawt5	–	–	1.05	0.03	–	–	–	–	1.01	0.03	–	–	1.02	0.03	0.98	0.03
12	std_yawt4	–	–	–	–	–	–	1.03	0.01	–	–	–	–	–	–	–	–
13	std_yawt3	0.99	0.04	–	–	–	–	–	–	–	–	1.60	< 0.01	–	–	–	–
16	max_yawt5	1.00	0.04	–	–	–	–	–	–	0.97	0.03	–	–	0.95	0.04	–	–
17	max_yawt4	–	–	0.97	0.03	–	–	1.36	0.03	–	–	–	–	–	–	–	–
18	max_yawt3	–	–	–	–	–	–	–	–	–	–	–	–	–	–	1.40	< 0.01
21	min_yawt5	–	–	0.95	0.03	–	–	–	–	0.98	0.04	1.10	0.02	1.06	0.02	1.38	< 0.01
22	min_yawt4	1.03	0.03	–	–	–	–	1.20	0.04	–	–	–	–	–	–	–	–
23	min_yawt3	–	–	–	–	1.04	0.03	–	–	–	–	–	–	–	–	–	–
26	med_yawt5	0.92	0.04	–	–	–	–	–	–	–	–	–	–	1.04	0.03	–	–
27	med_yawt4	–	–	–	–	0.95	< 0.01	–	–	–	–	–	–	–	–	–	–
28	med_yawt3	–	–	–	–	–	–	–	–	–	–	–	–	–	–	1.23	0.01
31	mean_azt5	0.92	0.04	–	–	0.87	0.01	–	–	–	–	1.16	< 0.01	–	–	–	–
32	mean_azt4	–	–	0.95	0.04	–	–	–	–	–	–	–	–	0.98	0.04	–	–
36	std_azt5	0.91	0.04	1.05	0.04	–	–	–	–	0.91	0.04	0.85	0.02	0.94	0.04	–	–
38	std_azt3	–	–	–	–	1.23	< 0.01	–	–	–	–	–	–	–	–	0.92	0.03
41	max_azt5	0.98	0.04	–	–	–	–	–	–	–	–	–	–	0.98	0.04	–	–
42	max_azt4	–	–	1.01	0.03	–	–	–	–	0.95	0.04	–	–	–	–	–	–
43	max_azt3	–	–	–	–	–	–	0.85	0.04	–	–	–	–	–	–	–	–
46	min_azt5	0.93	0.03	0.90	0.04	–	–	–	–	0.84	0.04	1.28	0.02	0.94	0.03	–	–
50	min_azt1	–	–	–	–	–	–	1.04	0.03	–	–	–	–	–	–	–	–
51	med_azt5	0.88	0.04	0.94	0.04	0.97	0.01	–	–	–	–	1.01	< 0.01	–	–	0.94	< 0.01
52	med_azt4	–	–	–	–	–	–	–	–	–	–	–	–	1.00	0.04	–	–
54	med_azt2	–	–	–	–	–	–	0.91	0.03	–	–	–	–	–	–	–	–
61	std_axt5	–	–	–	–	1.21	< 0.01	–	–	–	–	1.16	0.02	–	–	–	–
62	std_axt4	–	–	–	–	–	–	1.02	0.03	–	–	–	–	–	–	1.02	0.01
66	max_axt5	–	–	–	–	–	–	–	–	–	–	1.13	0.01	–	–	–	–
68	max_axt3	–	–	–	–	1.06	0.04	–	–	–	–	–	–	–	–	–	–
71	min_axt5	0.94	0.04	–	–	–	–	–	–	–	–	1.10	0.02	–	–	–	–
72	min_axt4	–	–	–	–	0.99	< 0.01	–	–	–	–	–	–	–	–	1.12	0.03
76	med_axt5	–	–	–	–	–	–	–	–	–	–	1.06	0.04	–	–	–	–
81	max_F_yawt5	–	–	–	–	–	–	–	–	–	–	0.95	0.01	–	–	–	–
82	max_F_yawt4	–	–	–	–	0.87	0.01	–	–	0.88	0.04	–	–	–	–	1.20	0.03
83	max_F_yawt3	–	–	–	–	–	–	1.14	0.04	–	–	–	–	–	–	–	–
86	max_F_azt5	0.98	0.04	–	–	–	–	0.83	0.02	–	–	0.88	0.02	0.95	0.04	–	–
87	max_F_azt4	–	–	–	–	–	–	–	–	–	–	–	–	–	–	0.86	0.04
91	max_F_axt5	–	–	–	–	–	–	–	–	–	–	–	–	–	–	1.16	0.02
93	max_F_axt3	–	–	–	–	1.18	< 0.01	–	–	–	–	–	–	–	–	–	–
94	max_F_axt2	–	–	–	–	–	–	–	–	–	–	1.11	< 0.01	–	–	–	–

Table V

AUC values of comparison results with different models using the selected features and all features in each LC scenario

	LLC Scenario 0_0				LLC Scenario 0_1				LLC Scenario 1_0				LLC Scenario 1_1
Feature type	SVM	NB	DT	KNN	SVM	NB	DT	KNN	SVM	NB	DT	KNN	SVM	NB	DT	KNN
All Features	0.90	0.85	0.82	0.96	0.92	0.85	0.94	0.99	0.88	0.94	0.99	0.98	0.93	0.97	0.99	0.98
Selected	0.99	0.81↓	0.98	0.98	0.84↓	0.78↓	0.97	0.99	1	0.93↓	0.99	1	0.99	0.96↓	0.98↓	0.99
Improvement (%)	10.0	−4.7	19.5	2.0	−8.6	−8.2	3.1	0	13.6	−1.0	0	2.0	6.4	−1.0	−1.0	1.0
	RLC Scenario 0_0				RLC Scenario 0_1				RLC Scenario 1_0				RLC Scenario 1_1
Feature type	SVM	NB	DT	KNN	SVM	NB	DT	KNN	SVM	NB	DT	KNN	SVM	NB	DT	KNN
All Features	0.93	0.83	0.92	0.97	0.92	0.93	0.97	0.96	0.94	0.87	0.91	0.97	0.95	0.94	0.99	0.96
Selected	0.97	0.75↓	0.98	0.99	0.99	0.98	0.98	0.98	0.98	0.80↓	0.98	0.99	0.99	0.95	0.98↓	1
Improvement (%)	4.3	−9.6	6.5	2.0	7.6	5.3	1.0	2.0	4.2	−8.0	7.6	1.0	4.2	1.0	−1.0	4.1
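The "Improvement (%)" rows in Table V appear to be the relative change in AUC when moving from all features to the selected features. A minimal sketch of that arithmetic is given below, assuming this definition (it is not stated explicitly in this section). The function name `relative_improvement` is hypothetical. Because the printed AUC values are rounded to two decimals, a recomputed entry can differ from the table in the last digit.

```python
def relative_improvement(auc_all, auc_selected):
    """Relative AUC change in percent, rounded to one decimal place."""
    return round((auc_selected - auc_all) / auc_all * 100.0, 1)


# AUC values for LLC Scenario 0_0, copied from Table V.
auc_all = {"SVM": 0.90, "NB": 0.85, "DT": 0.82}
auc_selected = {"SVM": 0.99, "NB": 0.81, "DT": 0.98}

improvement = {model: relative_improvement(auc_all[model], auc_selected[model])
               for model in auc_all}
# improvement == {"SVM": 10.0, "NB": -4.7, "DT": 19.5}
```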

Table AI

Full-scale effect size of the features

	LLC Scenario								RLC Scenario
	0_0		0_1		1_0		1_1		0_0		0_1		1_0		1_1
#	d	p	d	p	d	p	d	p	d	p	d	p	d	p	d	p
1	0.75	0.10	0.75	0.10	0.81	0.05	0.66	0.14	0.66	0.11	0.60	0.12	0.76	0.09	0.66	0.19
2	0.81	0.09	0.71	0.13	0.74	0.13	0.79	0.10	0.69	0.12	0.96	0.04	0.81	0.09	0.79	0.08
3	0.06	0.78	0.07	0.75	0.04	0.86	0.07	0.75	0.06	0.78	0.07	0.72	0.07	0.76	0.07	0.74
4	–	–	0.92	0.04	–	–	1.18	0.01	–	–	–	–	0.84	0.06	1.18	<0.01
5	0.51	0.18	0.59	0.11	0.31	0.27	0.60	0.16	0.55	0.16	0.82	0.02	0.62	0.13	0.71	0.04
6	1.02 ▲	0.03 ▼	0.98	0.05	0.98	0.02	1.54	0.04	0.92	0.04	1.29	0.04	0.96	0.04	1.54	<0.01
7	0.97	0.06	0.98 ▲	0.04 ▼	1.14	<0.01	1.32	0.07	0.91	0.04	0.01	0.11	0.94	0.06	1.32	0.02
8	0.91	0.06	0.89	0.06	1.00	0.01	0.83	0.07	0.92	0.05	0.72	0.03	0.85	0.07	0.83	0.01
9	0.86	0.06	0.88	0.06	0.89	0.06	0.92	0.09	0.89	0.07	1.11	<0.01	0.80	0.09	0.92	0.01
10	0.82	0.07	0.79	0.09	0.86	0.10	1.04	0.10	0.78	0.08	0.94	0.14	0.78	0.08	1.04	0.08
11	0.98	0.04	1.05 ▲	0.03 ▼	0.91	0.05	0.98	0.08	1.01	0.03	1.29	0.02	1.02	0.03	0.98	0.03
12	0.97	0.04	1.04	0.05	1.21	0.12	1.03	0.01	0.95	0.04	1.28	<0.01	0.99	0.03	1.03	0.11
13	0.99 ▲	0.04 ▼	0.97	0.06	1.08	0.06	0.79	0.02	0.96	0.05	1.60	<0.01	0.97	0.07	0.79	0.02
14	0.91	0.06	0.90	0.08	0.77	0.03	1.07	0.09	0.91	0.09	1.34	<0.01	0.88	0.09	1.07	0.10
15	0.80	0.09	0.73	0.12	0.63	0.17	0.84	0.11	0.79	0.10	1.02	0.03	0.74	0.12	0.84	0.09
16	1.00 ▲	0.04 ▼	0.93	0.04	0.62	0.08	1.34	0.02	0.97	0.03	1.28	<0.01	0.95	0.04	1.34	<0.01
17	0.94	0.05	0.97 ▲	0.03 ▼	0.94	0.08	1.36	0.03	1.02	0.05	1.13	<0.01	0.97	0.05	1.36	<0.01
18	0.98	0.06	0.93	0.04	0.78	0.11	1.40	0.06	0.99	0.05	1.03	<0.01	0.96	0.05	1.40	<0.01
19	0.92	0.07	0.88	0.06	0.74	0.07	1.23	0.09	0.93	0.06	1.38	<0.01	0.91	0.06	1.23	<0.01
20	0.84	0.06	0.84	0.08	0.79	0.10	0.91	0.05	0.88	0.06	0.93	<0.01	0.83	0.06	0.91	0.05
21	0.98	0.04	0.95 ▲	0.03 ▼	0.89	0.06	1.38	0.07	0.98	0.04	1.10	0.02	1.06	0.02	1.38	<0.01
22	1.03 ▲	0.03 ▼	0.94	0.04	0.77	0.14	1.20	0.04	0.93	0.04	0.98	<0.01	1.03	0.04	1.20	0.03
23	1.00	0.04	0.94	0.05	1.04	0.03	0.72	0.05	0.93	0.04	0.60	0.11	0.94	0.06	0.72	0.04
24	0.94	0.05	0.91	0.07	0.91	0.02	0.56	0.01	0.89	0.07	0.24	0.23	0.91	0.06	0.56	0.09
25	0.88	0.07	0.82	0.07	0.90	0.09	0.99	0.05	0.80	0.08	0.69	0.04	0.82	0.07	0.99	0.14
26	0.92 ▲	0.04 ▼	0.95	0.05	1.11	0.06	1.18	0.08	0.90	0.05	0.65	0.06	1.04	0.03	1.18	<0.01
27	0.91	0.06	0.93	0.05	0.95	<0.01	1.15	0.10	0.92	0.05	0.83	0.06	1.00	0.04	1.15	0.03
28	0.87	0.06	0.83	0.06	0.80	0.03	1.23	0.07	0.89	0.05	1.05	0.11	0.85	0.07	1.23	0.01
29	0.80	0.09	0.82	0.06	0.87	0.06	0.98	0.06	0.84	0.08	0.86	0.10	0.75	0.11	0.98	0.03
30	0.79	0.08	0.77	0.09	0.82	0.09	1.01	0.09	0.77	0.08	0.90	0.16	0.77	0.08	1.01	0.07
31	0.92 ▲	0.04 ▼	0.94	0.04	0.87	0.01	0.70	0.02	0.87	0.06	1.16	<0.01	0.96	0.04	0.70	0.02
32	0.91	0.03	0.95 ▲	0.04 ▼	0.67	0.03	0.59	0.02	0.89	0.06	0.88	<0.01	0.98	0.04	0.59	0.05
33	0.87	0.04	0.86	0.06	0.59	0.11	0.85	0.07	0.89	0.07	0.69	0.02	0.95	0.03	0.85	0.06
34	0.86	0.07	0.85	0.06	0.66	0.13	0.80	0.07	0.87	0.06	1.04	0.07	0.95	0.06	0.80	0.06
35	0.83	0.07	0.81	0.10	0.75	0.06	0.71	0.10	0.83	0.06	1.07	0.09	0.83	0.07	0.71	<0.01
36	0.91 ▲	0.04 ▼	1.05 ▲	0.04 ▼	1.06	0.01	1.00	0.05	0.91	0.04	0.85	0.02	0.94	0.04	1.00	0.06
37	0.93	0.05	0.98	0.05	1.07	0.01	1.05	0.09	0.96	0.05	0.66	0.03	0.92	0.05	1.05	0.06
38	0.99	0.05	0.93	0.07	1.23	<0.01	0.92	0.10	0.97	0.05	0.79	0.05	0.90	0.07	0.92	0.03
39	0.94	0.07	0.88	0.08	1.10	<0.01	0.74	0.06	0.90	0.06	0.81	0.21	0.84	0.11	0.74	0.05
40	0.75	0.10	0.75	0.11	0.87	0.01	0.54	0.08	0.71	0.12	0.89	0.11	0.69	0.13	0.54	0.22
41	0.98 ▲	0.04 ▼	0.99	0.04	0.76	0.08	0.73	0.03	0.92	0.04	1.00	0.10	0.98	0.04	0.73	0.07
42	0.96	0.04	1.01 ▲	0.03 ▼	0.78	0.06	0.53	<0.01	0.95	0.04	1.12	0.16	0.87	0.04	0.53	0.09
43	0.90	0.05	0.95	0.05	0.85	0.04	0.79	0.01	0.93	0.05	0.90	0.09	0.90	0.04	0.79	0.11
44	0.89	0.06	0.91	0.04	0.71	0.03	0.60	0.05	0.93	0.04	0.63	0.06	0.84	0.06	0.60	0.13
45	0.87	0.07	0.86	0.09	0.49	0.08	0.65	0.05	0.91	0.05	0.90	0.02	0.83	0.07	0.65	0.01
46	0.93 ▲	0.03 ▼	0.90 ▲	0.04 ▼	0.58	0.14	0.90	0.01	0.84	0.04	1.28	0.02	0.94	0.03	0.90	0.06
47	0.89	0.04	0.88	0.05	0.66	0.11	0.82	<0.01	0.85	0.05	1.30	0.05	0.92	0.03	0.82	0.08
48	0.85	0.06	0.87	0.06	0.74	0.17	1.00	<0.01	0.83	0.05	1.01	0.09	0.95	0.05	1.00	0.04
49	0.81	0.07	0.91	0.06	0.45	0.07	0.89	0.04	0.87	0.07	0.90	0.02	0.95	0.05	0.89	0.05
50	0.84	0.07	0.84	0.08	0.63	0.16	1.04	0.03	0.80	0.09	0.89	0.05	0.90	0.08	1.04	0.01
51	0.88 ▲	0.04 ▼	0.94 ▲	0.04 ▼	0.97	0.01	0.94	0.05	0.91	0.05	1.01	<0.01	0.90	0.05	0.94	<0.01
52	0.86	0.05	0.91	0.05	0.61	0.04	0.83	<0.01	0.88	0.05	0.63	0.09	1.00	0.04	0.83	0.06
53	0.81	0.05	0.88	0.06	0.61	0.15	0.82	<0.01	0.85	0.07	0.54	0.02	0.93	0.04	0.82	0.10
54	0.82	0.07	0.85	0.06	0.74	0.05	0.91	0.03	0.84	0.07	0.92	0.09	0.93	0.08	0.91	0.04
55	0.84	0.07	0.81	0.10	0.79	0.05	0.83	0.05	0.82	0.06	1.07	0.06	0.83	0.07	0.83	0.01
56	0.37	0.34	0.44	0.28	0.31	0.38	0.30	0.39	0.41	0.31	0.58	0.12	0.48	0.26	0.30	0.46
57	0.36	0.34	0.44	0.27	0.33	0.37	0.39	0.31	0.42	0.30	0.54	0.10	0.48	0.26	0.39	0.24
58	0.35	0.33	0.44	0.25	0.30	0.40	0.34	0.25	0.39	0.31	0.71	0.09	0.47	0.25	0.34	0.23
59	0.31	0.35	0.43	0.23	0.28	0.48	0.38	0.19	0.36	0.30	0.58	0.13	0.43	0.26	0.38	0.17
60	0.24	0.40	0.34	0.27	0.23	0.48	0.39	0.25	0.28	0.35	0.38	0.27	0.32	0.30	0.39	0.12
61	0.91	0.05	0.96	0.05	1.21	<0.01	0.76	0.06	0.96	0.05	1.16	0.02	0.95	0.06	0.76	0.03
62	0.90	0.05	0.95	0.05	1.16	0.01	1.02	0.03	0.93	0.07	1.11	0.10	0.91	0.07	1.02	0.01
63	0.93	0.07	0.93	0.05	1.06	0.07	0.93	0.03	0.97	0.08	0.96	0.09	0.86	0.07	0.93	0.01
64	0.91	0.07	0.85	0.08	1.02	0.04	0.89	0.07	0.90	0.05	0.80	0.04	0.79	0.09	0.89	0.07
65	0.73	0.08	0.69	0.12	0.82	0.01	0.70	0.05	0.72	0.09	0.79	0.03	0.70	0.09	0.70	0.17
66	0.88	0.06	0.91	0.06	1.00	<0.01	0.79	0.06	0.92	0.07	1.13	0.01	0.97	0.05	0.79	0.01
67	0.87	0.07	0.90	0.06	0.97	0.07	0.77	0.02	0.91	0.06	0.86	0.15	0.93	0.07	0.77	0.07
68	0.91	0.07	0.89	0.07	1.06	0.04	0.70	0.04	0.90	0.08	0.96	0.08	0.88	0.07	0.70	0.18
69	0.87	0.07	0.82	0.10	0.98	0.05	0.63	0.02	0.84	0.09	0.86	0.05	0.79	0.10	0.63	0.17
70	0.70	0.08	0.67	0.12	0.70	0.03	0.59	0.06	0.65	0.11	0.67	0.17	0.65	0.13	0.59	0.24
71	0.94 ▲	0.04 ▼	0.94	0.06	0.98	<0.01	0.95	0.06	0.94	0.06	1.10	0.02	0.85	0.08	0.95	0.13
72	0.93	0.05	0.94	0.06	0.99	<0.01	1.12	0.05	0.92	0.05	1.09	0.01	0.92	0.08	1.12	0.03
73	0.94	0.06	0.91	0.07	1.03	0.11	0.98	0.06	0.95	0.07	0.88	0.11	0.89	0.08	0.98	0.01
74	0.91	0.07	0.82	0.08	0.98	0.07	0.98	0.07	0.86	0.07	0.76	0.05	0.82	0.09	0.98	0.02
75	0.71	0.10	0.65	0.11	0.81	0.05	0.80	0.05	0.70	0.10	0.72	0.15	0.68	0.10	0.80	0.11
76	0.83	0.06	0.84	0.07	0.71	0.07	0.64	0.04	0.77	0.08	1.06	0.04	0.77	0.10	0.64	0.15
77	0.82	0.07	0.81	0.08	0.76	0.03	0.65	0.08	0.76	0.10	1.08	0.06	0.78	0.11	0.65	0.17
78	0.79	0.09	0.80	0.11	0.62	0.04	0.59	0.07	0.72	0.13	0.91	0.02	0.74	0.12	0.59	0.13
79	0.67	0.11	0.64	0.13	0.59	0.18	0.52	0.09	0.67	0.14	0.80	0.10	0.69	0.14	0.52	0.24
80	0.48	0.20	0.44	0.21	0.47	0.16	0.42	0.13	0.44	0.20	0.53	0.12	0.49	0.23	0.42	0.20
81	0.91	0.05	0.96	0.05	0.81	0.02	0.75	0.06	0.86	0.06	0.95	0.01	0.93	0.05	0.75	0.08
82	0.94	0.05	0.83	0.07	0.87	0.01	1.20	0.08	0.88	0.04	0.43	0.01	0.90	0.06	1.20	0.03
83	0.90	0.07	0.85	0.09	0.70	0.07	1.14	0.04	0.87	0.08	1.36	0.05	0.82	0.08	1.14	0.01
84	0.84	0.09	0.78	0.11	0.59	0.12	0.90	0.02	0.83	0.11	0.88	0.03	0.78	0.10	0.90	0.09
85	0.69	0.14	0.62	0.15	0.83	0.10	0.73	0.13	0.68	0.15	0.62	0.05	0.68	0.14	0.73	0.06
86	0.98 ▲	0.04 ▼	0.97	0.06	0.61	0.07	0.83	0.02	0.90	0.06	0.88	0.02	0.95	0.04	0.83	0.09
87	0.86	0.06	0.89	0.06	0.79	0.08	0.86	0.07	0.88	0.08	0.86	0.14	0.88	0.07	0.86	0.04
88	0.89	0.07	0.87	0.07	1.03	0.05	0.66	0.06	0.81	0.08	0.70	0.12	0.87	0.08	0.66	0.09
89	0.82	0.09	0.79	0.10	0.87	0.07	0.83	0.10	0.72	0.13	0.63	0.23	0.75	0.10	0.83	0.13
90	0.66	0.14	0.59	0.17	0.74	0.10	0.72	0.13	0.58	0.19	0.57	0.10	0.59	0.17	0.72	0.16
91	0.92	0.05	0.84	0.05	0.97	<0.01	1.16	0.10	0.89	0.07	0.52	0.01	0.81	0.07	1.16	0.02
92	0.87	0.05	0.79	0.09	1.03	0.08	0.90	0.14	0.88	0.07	0.88	0.10	0.88	0.06	0.90	0.09
93	0.81	0.09	0.81	0.10	1.18	<0.01	0.65	0.16	0.90	0.07	1.05	0.02	0.86	0.06	0.65	0.14
94	0.79	0.09	0.80	0.09	1.09	0.11	0.74	0.09	0.79	0.08	1.11	<0.01	0.82	0.09	0.74	0.11
95	0.65	0.15	0.61	0.17	1.01	0.11	0.52	0.11	0.59	0.15	1.09	<0.01	0.61	0.15	0.52	0.18
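To show how the values in Table AI can be screened, the sketch below applies a simple threshold rule to the rows for the original features #1 (yaw rate), #2, #3 and #5 (TLC). The (d, p) pairs are copied from Table AI. The thresholds `d_min=0.8` and `alpha=0.05` and the helper name `strong_scenarios` are assumptions for illustration. The rule is at most a necessary condition: Table IV omits some features that would pass it, so the authors' final selection evidently involves further criteria not reproduced here.

```python
SCENARIOS = ["LLC 0_0", "LLC 0_1", "LLC 1_0", "LLC 1_1",
             "RLC 0_0", "RLC 0_1", "RLC 1_0", "RLC 1_1"]

# (d, p) pairs per scenario, copied from Table AI for features #1, #2, #3 and #5.
TABLE_AI = {
    1: [(0.75, 0.10), (0.75, 0.10), (0.81, 0.05), (0.66, 0.14),
        (0.66, 0.11), (0.60, 0.12), (0.76, 0.09), (0.66, 0.19)],
    2: [(0.81, 0.09), (0.71, 0.13), (0.74, 0.13), (0.79, 0.10),
        (0.69, 0.12), (0.96, 0.04), (0.81, 0.09), (0.79, 0.08)],
    3: [(0.06, 0.78), (0.07, 0.75), (0.04, 0.86), (0.07, 0.75),
        (0.06, 0.78), (0.07, 0.72), (0.07, 0.76), (0.07, 0.74)],
    5: [(0.51, 0.18), (0.59, 0.11), (0.31, 0.27), (0.60, 0.16),
        (0.55, 0.16), (0.82, 0.02), (0.62, 0.13), (0.71, 0.04)],
}


def strong_scenarios(rows, d_min=0.8, alpha=0.05):
    """Scenarios in which a feature has a large effect size and a small p-value."""
    return [name for name, (d, p) in zip(SCENARIOS, rows)
            if d >= d_min and p < alpha]


screened = {feature: strong_scenarios(rows) for feature, rows in TABLE_AI.items()}
```

Under these assumed thresholds, features #1 and #3 pass in no scenario, while #2 and #5 pass only in RLC Scenario 0_1, which agrees with their entries in Table IV.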

Note

1. Available at: www.mathworks.com/products/statistics.html

Appendix. All the effect sizes regarding LLC and RLC for all extracted features are listed in Table AI.


References

Barr, L. and Najm, W. (2001), “Crash problem characteristics for the intelligent vehicle initiative”, Transportation Research Board 80th Annual Meeting.

Bezzina, D. and Sayer, J. (2014), “Safety pilot model deployment: test conductor team report”, Report No. DOT HS, Vol. 812, p. 171.

Boubezoul, A., Koita, A. and Daucher, D. (2009), “Vehicle trajectories classification using support vectors machines for failure trajectory prediction”, 2009 International Conference on Advances in Computational Tools for Engineering Applications, IEEE, pp. 486-491.

Chen, L., Zhao, Y., Zhang, J. and Zou, J.Z. (2015), “Automatic detection of alertness/drowsiness from physiological signals using wavelet-based nonlinear features and machine learning”, Expert Systems with Applications, Vol. 42 No. 21, pp. 7344-7355.

Chandola, V., Banerjee, A. and Kumar, V. (2009), “Anomaly detection”, ACM Computing Surveys, Vol. 41 No. 3, p. 15.

Cohen, J. (1988), Statistical Power Analysis for the Behavioral Sciences, 2nd ed.

Cohen, J. (1992), “A power primer”, Psychological Bulletin, Vol. 112 No. 1, p. 155.

Cohen, J. (1990), “Things I have learned (so far)”, American Psychologist, Vol. 45 No. 12, p. 1304.

Do, Q.H., Tehrani, H., Mita, S., Egawa, M., Muto, K. and Yoneda, K. (2017), “Human drivers based active-passive model for automated lane change”, IEEE Intelligent Transportation Systems Magazine, Vol. 9 No. 1, pp. 42-56.

Doshi, A., Morris, B. and Trivedi, M. (2011), “On-road prediction of driver’s intent with multimodal sensory cues”, IEEE Pervasive Computing, Vol. 10 No. 3, pp. 22-34.

Godthelp, H., Milgram, P. and Blaauw, G.J. (1984), “The development of a time-related measure to describe driving strategy”, Human Factors: The Journal of the Human Factors and Ergonomics Society, Vol. 26 No. 3, pp. 257-268.

Guyon, I., Gunn, S., Nikravesh, M. and Zadeh, L.A. (2008), Feature Extraction: Foundations and Applications, Vol. 207, Springer, Berlin.

Hajinoroozi, M., Jung, T.-P., Lin, C.T. and Huang, Y. (2015), “Feature extraction with deep belief networks for driver’s cognitive states prediction from EEG data”, 2015 IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP), IEEE, pp. 812-815.

Heckbert, P. (1995), “Fourier transforms and the fast Fourier transform (FFT) algorithm”, Computer Graphics, Vol. 2, pp. 15-463.

Henclewood, D., Abramovich, M. and Yelchuru, B. (2014), “Safety pilot model deployment-sample data environment data handbook, v. 1.2”, USDOT Research and Technology Innovation Administrations.

Higgs, B. and Abbas, M. (2015), “Segmentation and clustering of car-following behavior: recognition of driving patterns”, IEEE Transactions on Intelligent Transportation Systems, Vol. 16 No. 1, pp. 81-90.

J2944 (2013), “Operational definitions of driving performance measures and statistics”, SAE Technical Paper, Tech. Rep.

Lee, S.E., Olsen, E.C. and Wierwille, W.W. (2004), “A comprehensive examination of naturalistic lane-changes”, Tech. Rep.

Liebner, M., Ruhhammer, C., Klanner, F. and Stiller, C. (2013), “Generic driver intent inference based on parametric models”, 2013 16th International IEEE Conference on Intelligent Transportation Systems (ITSC), IEEE, pp. 268-275.

Karim, M.R., Saifizul, A., Yamanaka, H., Sharizli, A. and Ramli, R. (2013), “Minimum safe time gap (MSTG) as a new safety indicator incorporating vehicle and driver factors”, Journal of the Eastern Asia Society for Transportation Studies, Vol. 10, pp. 2069-2079.

Kasper, D., Weidl, G., Dang, T., Breuel, G., Tamke, A., Wedel, A. and Rosenstiel, W. (2012), “Object-oriented Bayesian networks for detection of lane change maneuvers”, IEEE Intelligent Transportation Systems Magazine, Vol. 4 No. 3, pp. 19-31.

Kesting, A. and Treiber, M. (2013), Traffic Flow Dynamics: Data, Models and Simulation, Springer, Berlin Heidelberg.

Kumar, P., Perrollaz, M., Lefevre, S. and Laugier, C. (2013), “Learning-based approach for online lane change intention prediction”, 2013 IEEE Intelligent Vehicles Symposium (IV), IEEE, pp. 797-802.

Kusano, K.D. and Gabler, H. (2011), “Method for estimating time to collision at braking in real-world, lead vehicle stopped rear-end crashes for use in pre-crash system design”, SAE International Journal of Passenger Cars – Mechanical Systems, no. 2011-01-0576, Vol. 4 No. 1, pp. 435-443.

Leonhardt, V. and Wanielik, G. (2017), “Feature evaluation for lane change prediction based on driving situation and driver behavior”, 2017 20th International Conference on Information Fusion (Fusion), IEEE, pp. 1-7.

Lethaus, F., Baumann, M.R., Köster, F. and Lemmer, K. (2013), “A comparison of selected simple supervised learning algorithms to predict driver intent based on gaze data”, Neurocomputing, Vol. 121, pp. 108-130.

Li, G., Li, S.E., Liao, Y., Wang, W., Cheng, B. and Chen, F. (2015), “Lane change maneuver recognition via vehicle state and driver operation signals results from naturalistic driving data”, 2015 IEEE Intelligent Vehicles Symposium (IV), IEEE, pp. 865-870.

Li, X., Wang, W. and Rötting, M. (2016), “Bayesian network-based identification of driver lane-changing intents using eye tracking and vehicle-based data”, Advanced Vehicle Control: Proceedings of the 13th International Symposium on Advanced Vehicle Control (AVEC’16), September 13-16, 2016, CRC Press, Munich, pp. 229-304.

Luo, Y., Xiang, Y., Cao, K. and Li, K. (2016), “A dynamic automated lane change maneuver based on vehicle-to-vehicle communication”, Transportation Research Part C: Emerging Technologies, Vol. 62, pp. 87-102.

Mandalia, H.M. and Salvucci, D.D. (2005), “Using support vector machines for lane-change detection”, Proceedings of the Human Factors and Ergonomics Society Annual Meeting, Vol. 49 No. 22, Sage, Los Angeles, CA, pp. 1965-1969.

Mörchen, F. (2003), “Time series feature extraction for data mining using DWT and DFT”.

Morris, B., Doshi, A. and Trivedi, M. (2011), “Lane change intent prediction for driver assistance: on-road design and evaluation”, 2011 IEEE Intelligent Vehicles Symposium (IV), IEEE, pp. 895-901.

N.H.T.S. Administration (2015), “Traffic safety facts 2015”, Traffic Safety Facts Research Note, et al. Vol. p. 170, 2017.

Peng, J., Guo, Y., Fu, R., Yuan, W. and Wang, C. (2015), “Multi-parameter prediction of drivers’ lane-changing behaviour with neural network model”, Applied Ergonomics, Vol. 50, pp. 207-217.

Salfner, F. and Malek, M. (2007), “Using hidden semi-Markov models for effective online failure prediction”, 2007 26th IEEE International Symposium on Reliable Distributed Systems (SRDS 2007), IEEE, pp. 161-174.

Salvucci, D.D. and Liu, A. (2002), “The time course of a lane change: driver control and eye-movement behavior”, Transportation Research Part F: Traffic Psychology and Behaviour, Vol. 5 No. 2, pp. 123-132.

Sivaraman, S. and Trivedi, M.M. (2014), “Dynamic probabilistic drivability maps for lane change and merge driver assistance”, IEEE Transactions on Intelligent Transportation Systems, Vol. 15 No. 5, pp. 2063-2073.

Sullivan, G.M. and Feinn, R. (2012), “Using effect size - or why the p value is not enough”, Journal of Graduate Medical Education, Vol. 4 No. 3, pp. 279-282.

Thissen, U., Van Brakel, R., De Weijer, A., Melssen, W. and Buydens, L. (2003), “Using support vector machines for time series prediction”, Chemometrics and Intelligent Laboratory Systems, Vol. 69 No. 1-2, pp. 35-49.

Thomas, S., Ganapathy, S. and Hermansky, H. (2008), “Recognition of reverberant speech using frequency domain linear prediction”, IEEE Signal Processing Letters, Vol. 15, pp. 681-684.

Weidl, G., Madsen, A.L., Wang, S., Kasper, D. and Karlsen, M. (2018), “Early and accurate recognition of highway traffic maneuvers considering real world application: a novel framework using Bayesian networks”.

Woo, H., Ji, Y., Kono, H., Tamura, Y., Kuroda, Y., Sugano, T., Yamamoto, Y., Yamashita, A. and Asama, H. (2016), “Dynamic potential-model-based feature for lane change prediction”, 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), IEEE, pp. 838-843.

Visvikis, C., Smith, T., Pitcher, M. and Smith, R. (2008), “Study on lane departure warning and lane change assistant systems”, Transport Research Laboratory Project Rpt PPR, Vol. 374.

Xu, G., Liu, L., Ou, Y. and Song, Z. (2012), “Dynamic modeling of driver control strategy of lane-change behavior and trajectory planning for collision prediction”, IEEE Transactions on Intelligent Transportation Systems, Vol. 13 No. 3, pp. 1138-1155.

Zhang, B., Georgoulas, G., Orchard, M., Saxena, A., Brown, D., Vachtsevanos, G. and Liang, S. (2008), “Rolling element bearing feature extraction and anomaly detection based on vibration monitoring”, 2008 16th Mediterranean Conference on Control and Automation, IEEE, pp. 1792-1797.

Acknowledgements

The authors would like to thank all the participants who were willing to serve as experimental drivers for this research. This work is supported by the China Scholarship Council.

Corresponding author

Xiaohan Li can be contacted at: xiaohan.li@mms.tu-berlin.de