Dynamic prediction of traffic incident duration on urban expressways: a deep learning approach based on LSTM and MLP

Purpose – Efficient traffic incident management is needed to alleviate the negative impact of traffic incidents, and accurate, reliable estimation of traffic incident duration is of great importance for such management. Previous studies have proposed models for traffic incident duration prediction; however, most of these studies focus on the total duration and cannot update their predictions in real time. From a traveler's perspective, the relevant quantity is the residual duration of the incident's impact. Besides, few (if any) studies have used dynamic traffic flow parameters in their prediction models. This paper aims to propose a framework that fills these gaps.
Design/methodology/approach – This paper proposes a framework based on the multi-layer perceptron (MLP) and long short-term memory (LSTM) models. The proposed methodology integrates traffic incident-related factors and real-time traffic flow parameters to predict the residual traffic incident duration. To validate the effectiveness of the framework, traffic incident data and traffic flow data from the Shanghai Zhonghuan Expressway are used for model training and testing.
Findings – Results show that the model with a 30-min time window taking both traffic volume and speed as inputs performed best. The area under the curve (AUC) values exceed 0.85 and the prediction accuracies exceed 0.75. These indicators demonstrate that the model is appropriate for this study context. The model provides new insights into traffic incident duration prediction.
Research limitations/implications – The incident sample used in this study may not be large enough, and the set of variables is limited. The number of injuries and casualties, a more detailed description of the incident location and other variables are expected to be used to characterize traffic incidents comprehensively. The framework needs to be further validated with a sufficiently large number of variables and locations.
Practical implications – Once implemented in intelligent transport systems and traffic management systems, the framework can help reduce the impact of incidents on the safety and efficiency of road traffic in future practical applications.
Originality/value – This study uses two artificial neural network methods, MLP and LSTM, to establish a framework that provides transportation operators and travelers with accurate and timely information on how long a traffic incident will continue. This study will contribute to the deployment of emergency management and urban traffic navigation planning.


Introduction
Traffic incidents cause casualties, direct economic losses and traffic congestion, and their impacts have been studied over the years (Adler et al., 2013; Hojati et al., 2016). For instance, Skabardonis et al. (2003) found that in California about 72% of non-recurrent congestion and 13%-30% of traffic delays in peak hours are associated with traffic incidents. In addition, traffic incidents lead to a high probability of secondary incidents: the risk of a secondary incident has been estimated to be six times greater than that of a primary incident (Wang et al., 2019). These factors highlight the importance of implementing proper and timely countermeasures for traffic incidents, such as traffic flow control and incident response resource allocation (Haule et al., 2019).
Accurate and time-efficient prediction methods for traffic incident duration are required for formulating and implementing traffic incident countermeasures (Chung, 2010). Traffic incident duration is defined as the period from the time an incident occurs to the time traffic recovers to normal (Hojati et al., 2014; Highway Capacity Manual, 2010). A large number of studies have been devoted to the prediction of traffic incident duration. Dimitriou and Vlahogianni (2015) proposed a fuzzy rule-based system to estimate highway traffic incident durations. Lin et al. (2016) proposed an improved M5P model, combined with a hazard-based duration model, to minimize data heterogeneity in traffic incident duration prediction.
However, most studies use the total incident duration as the prediction target, which means that the prediction is made when the incident occurs and is not updated over time. In fact, the residual incident duration, i.e. how long the negative impact of the incident will continue from the current moment, has the greatest practical value. Given a real-time prediction, transportation operators could adjust and optimize their countermeasures for traffic incidents (Adler et al., 2013), and travelers en route could decide whether to choose an alternative route.
In addition, few (if any) studies have taken into account dynamic traffic flow parameters. Dynamic traffic flow parameters are the real-time temporal sequences of traffic parameters, including speed, traffic volume and time occupancy, over the course of the traffic incident. It is generally accepted that the real-time traffic flow status affects traffic incident duration (Ma et al., 2017; Ru et al., 2017).
Given the research gaps in real-time prediction and in the use of dynamic traffic flow parameters, this study uses two artificial neural network (ANN) methods, the multi-layer perceptron (MLP) and long short-term memory (LSTM), to establish a framework that provides transportation operators and travelers with accurate and time-efficient information on how long a traffic incident will continue. This study will contribute to the deployment of emergency management and urban traffic navigation planning.

Literature review
Various statistical and machine learning methods have been applied to traffic incident duration prediction, including tree-based methods (Lin et al., 2016; Weng et al., 2015), Bayesian classifiers (Cong et al., 2018; Ozbay and Noyan, 2006; Zou et al., 2021), hazard-based methods (Haule et al., 2019; Li et al., 2017; Li et al., 2015) and ANNs (Lee et al., 2017). Among these methods, the accelerated failure time (AFT) model is the most widely used hazard-based method (Li et al., 2015). It assumes that incident-related factors accelerate or decelerate the incident duration, which makes it easy to interpret (Kay and Kinnersley, 2002). Chung (2010) established a log-logistic AFT model based on Korean freeway accident data, and the results yielded a reasonable prediction. Hojati et al. (2013) proposed a series of parametric AFT survival models of incident duration based on three common distributions, i.e. log-logistic, lognormal and Weibull. Zou et al. (2021) proposed a Bayesian model averaging model to predict traffic incident clearance time. Besides, decision tree models have also been widely applied to predict incident clearance time because they can determine the importance of explanatory variables (Weng et al., 2019). Ma et al. (2017) developed a novel approach based on gradient boosting decision trees to predict incident clearance time using different traffic parameters.
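The "accelerate or decelerate" idea behind the AFT model can be made concrete with a small sketch. It assumes a log-logistic AFT model, $\ln T = \beta_0 + \mathbf{x}^\top\boldsymbol{\beta} + \sigma\varepsilon$ with a standard logistic error $\varepsilon$, so that each factor multiplies the median duration by $\exp(\beta_j)$. The helper names and coefficient values are hypothetical and are not taken from Chung (2010) or Hojati et al. (2013).

```python
import math


def median_duration(x, beta0, beta):
    """Scale (= median) of a log-logistic AFT duration: exp(beta0 + x . beta).

    The logistic error term has median 0, so the median of T is exactly this scale.
    """
    return math.exp(beta0 + sum(b * xi for b, xi in zip(beta, x)))


def survival(t, x, beta0, beta, shape):
    """P(T > t | x) for a log-logistic duration; shape = 1 / sigma."""
    return 1.0 / (1.0 + (t / median_duration(x, beta0, beta)) ** shape)


# Hypothetical model: baseline median of 20 min, one binary factor with beta = ln(1.5).
beta0 = math.log(20.0)
beta = [math.log(1.5)]
print(median_duration([0], beta0, beta))  # about 20 min without the factor
print(median_duration([1], beta0, beta))  # about 30 min with it: duration "decelerated" by 1.5
```

A positive coefficient stretches the whole duration distribution by the same factor, which is why AFT coefficients are easy to read as time ratios.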
However, the structures of these methods are ill-suited to temporal sequences, so dynamic traffic flow parameters cannot be used directly as inputs. Instead, previous studies have used traffic status and other variables describing traffic flow conditions as substitutes. Lin et al. (2016) used "congestion or not" as a variable in their prediction model for traffic incident duration. Zou et al. (2018) adopted "peak hour or not", along with 13 other variables, to investigate the dependence between incident clearance and response time. Some studies have also used traffic flow characteristics. For example, Ghosh et al. (2014) applied the "85th/15th percentile speed" and "peak hour volume" to examine the impact of influencing factors on the clearance time of incidents. Hojati et al. (2016) considered posted speed limit, road capacity, recurrent flow, the ratio of speed before and after the incident, etc., as variables to model travel time reliability. However, because a traffic incident is generally a sustained process during which the traffic flow status changes, stationary traffic flow parameters cannot provide an accurate picture of the situation. Therefore, a new method that integrates dynamic traffic flow information is required.
In recent years, ANN methods have been shown to perform well in short- and long-term forecasting applications owing to their robust data-driven capabilities. The LSTM neural network is a type of ANN that performs well on temporal sequences. LSTM has been widely used in short-term traffic flow forecasting and could serve as a reference for processing dynamic traffic flow parameters (Polson and Sokolov, 2017). Short-term traffic flow forecasting uses existing traffic flow data to continuously predict traffic flow and travel time over a future period (usually within 15 min). Ma et al. (2015) used LSTM with data from microwave traffic detectors to predict speed over the next 2 min; with both speed and volume as inputs, the mean absolute percentage error of the model was under 5%. Gu et al. (2019) proposed a two-layer deep learning framework based on LSTM and gated recurrent unit neural networks to predict lane-level traffic speed. A detailed methodological review of clearance time prediction for road incidents can be found in Tang et al. (2020). Nevertheless, the LSTM neural network has not yet been used to predict incident duration, which motivated this study.
The LSTM model provides new insights into the use of real-time traffic flow parameters. Moreover, traffic incident-related factors should also be taken into account, which calls for a hybrid model rather than LSTM alone. MLP is a standard ANN model capable of handling classification problems. In this study, the LSTM and MLP methods are combined into a framework that uses both forms of variables as inputs to predict the short-term traffic incident duration.

Methodology
This section describes the concepts and definitions, the key parameters and factors, and the deep learning modeling approach.

Decomposition of traffic incident
The course of an incident is generally divided into four parts (Highway Capacity Manual, 2010), as shown in Figure 1: detection time (from incident occurrence to incident discovery), response time (from incident discovery to arrival of the response team), clearance time (from arrival of the response team to incident clearance) and recovery time (from incident clearance to traffic normalization). To provide timely decision support to road traffic managers and drivers, this study proposes a real-time method to predict the total traffic incident duration. This study uses the velocity thermogram method, which compares the speed of vehicles affected by the incident with the historical average speed of the road section to determine the impact range of the incident. Details of how the incident duration is obtained can be found in Zhang et al. (2019).
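The four-part decomposition, and the idea of comparing observed speed with the historical average to find when traffic has recovered, can be sketched as below. The timestamps are invented, and `recovery_minute` with its `ratio` and `hold` parameters is a hypothetical simplification for illustration only; it is not the velocity thermogram procedure of Zhang et al. (2019).

```python
from datetime import datetime


def phase_minutes(occurred, detected, arrived, cleared, recovered):
    """Split an incident into the four parts of Figure 1 (values in minutes)."""
    def gap(a, b):
        return (b - a).total_seconds() / 60.0
    return {
        "detection": gap(occurred, detected),
        "response": gap(detected, arrived),
        "clearance": gap(arrived, cleared),
        "recovery": gap(cleared, recovered),
    }


def recovery_minute(speed, hist_speed, start, ratio=0.8, hold=3):
    """Hypothetical rule: first minute index >= start from which the observed speed
    stays at or above ratio x historical average speed for `hold` consecutive minutes."""
    run = 0
    for t in range(start, len(speed)):
        if speed[t] >= ratio * hist_speed[t]:
            run += 1
            if run == hold:
                return t - hold + 1
        else:
            run = 0
    return None  # traffic has not recovered within the observed series


t0 = datetime(2017, 4, 1, 8, 0)
phases = phase_minutes(t0, datetime(2017, 4, 1, 8, 2), datetime(2017, 4, 1, 8, 10),
                       datetime(2017, 4, 1, 8, 20), datetime(2017, 4, 1, 8, 25))
print(phases, sum(phases.values()))  # total duration = 25 min
```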

Dynamic prediction for traffic incident duration
The traffic incident management handbook suggests that systems with recorded information should be updated at least every 5 to 10 min during peak periods in urban areas (Farradyne, 2000). The proposed model can update its information every 1 min. The prediction target of the model is the residual incident duration at each update moment. In this study, the residual incident duration is classified into three categories using two incident duration levels, 5 and 10 min. These levels were chosen for two reasons. First, the levels of 5 and 10 min are practical. On urban expressways, traffic incident durations are mostly short: in the data set used in this study, 35% of incidents last less than 5 min and 70% less than 15 min, so much longer intervals would lose practical significance.
Second, the levels of 5 and 10 min give upstream drivers and road managers sufficient time to make decisions and preparations. The speed limit on urban expressways is usually 80 km/h, at which a car can travel more than 5 km in 5 min and more than 10 km in 10 min. This distance is enough for a driver to leave the expressway at an upstream exit and choose another route.
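The three residual-duration categories and the distance argument above can be checked with a few lines. The function names are hypothetical; the only inputs taken from the text are the 5- and 10-min levels and the 80 km/h speed limit.

```python
SPEED_LIMIT_KMH = 80  # usual urban expressway speed limit quoted in the text


def distance_km(minutes, speed_kmh=SPEED_LIMIT_KMH):
    """Distance covered at a constant speed in the given number of minutes."""
    return speed_kmh * minutes / 60.0


def residual_category(residual_min):
    """Three classes of residual duration Y' defined by the 5- and 10-min levels."""
    if residual_min <= 5:
        return "Y' <= 5 min"
    if residual_min <= 10:
        return "5 < Y' <= 10 min"
    return "Y' > 10 min"


print(round(distance_km(5), 1), round(distance_km(10), 1))
print(residual_category(7))
```

At 80 km/h, 5 min corresponds to about 6.7 km and 10 min to about 13.3 km, consistent with the "more than 5 km" and "more than 10 km" figures above.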

Key parameters and factors

3.2.1 Traffic flow parameters
Across the four parts of the incident duration, the traffic flow status changes constantly. During the detection and response times, the vehicles involved may occupy one or more lanes and reduce traffic capacity, causing upstream traffic speed and traffic flow to decline. During the clearance time, the impact on traffic congestion varies with the handling method: if medical response, police or tow trucks are required, congestion may get worse. After the incident is cleared, traffic congestion gradually eases. Consequently, dynamic traffic flow parameters reflect the process of traffic incident handling, and this study uses them as inputs. A comparison with static parameters is provided in the case study section.
Following previous studies on traffic incident risk estimation with traffic flow conditions (Oh et al., 2005; Fang et al., 2016), this study uses traffic flow parameters (i.e. traffic volume and traffic speed) from 5-30 min before the traffic incident until the end of its traffic impact.

Incident-related factors
Previous studies indicate that incident duration is strongly correlated with incident characteristics, e.g. incident type and incident severity (Adler et al., 2013). Traffic incident-related factors can be categorized as temporal, spatial, environmental, incident detail and operational factors (Abouaïssa et al., 2016). These factors are summarized in Table 1.

3.3 Deep learning modeling approach based on long short-term memory and multi-layer perceptron network
As shown in Figure 2, the proposed framework comprises data processing, an LSTM neural network for incident clearance prediction and an MLP network for incident clearance prediction. Two kinds of variables, incident-related factors and traffic flow data, are used as inputs to the framework to predict the short-term traffic incident duration. Each part of the framework is detailed below.

Data processing
Traffic flow data and incident-related factors are the two inputs of the framework, and both must be processed before being fed into the model. Data processing consists of two steps: sliding the time window to obtain more samples, and data normalization.
3.3.1.1 Slide the time window to get more samples. The input variables of an incident are $\{V_1, V_2, V_3, \ldots, V_n, Vol, Spd, Y\}$, where $V_1, V_2, V_3, \ldots, V_n$ denote the incident-related factors, $Vol$ and $Spd$ denote the traffic volume and speed data (i.e. traffic flow data) and $Y$ denotes the traffic incident duration. As shown in Figure 2, at each update moment the length of the traffic flow data is fixed and equal to the length of the time window. The recommended time window length is 5-30 min. Figure 3 shows the process of sliding the time window when the time window length is 5 min.
For each incident, the time window sequences $Vol_{tw_1}, Vol_{tw_2}, \ldots, Vol_{tw_m}$ are paired with the same incident-related factors $V_1, V_2, V_3, \ldots, V_n$ and incident duration $Y$, where $m$ is the number of time windows corresponding to the incident. After sliding the time window, two variables must be updated or added. First, the incident duration $Y$ is updated: at each update moment, the total incident duration $Y$ is replaced by the residual incident duration $Y'$. As mentioned above, two forms of output are defined. Label 1 indicates whether $Y'$ is greater than 5 min, with categorical variable $y_{label1} \in \{0, 1\}$, where 0 means $Y' \le 5$ min and 1 means $Y' > 5$ min. Label 2 indicates whether $Y'$ is greater than 10 min, with categorical variable $y_{label2} \in \{0, 1\}$, where 0 means $Y' \le 10$ min and 1 means $Y' > 10$ min.
Second, a new variable, elapsed time ($V_{n+1}$), is added and updated at each prediction moment. Elapsed time is how long the incident has lasted; it is a continuous variable measured in minutes.
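The sliding-window sample construction, together with the min-max normalization applied next, can be sketched as follows. This is a minimal sketch under stated assumptions: the 1-min volume and speed series of an incident are assumed to start exactly `window` minutes before the incident and to have length `window + duration`, one sample is produced per minute of elapsed time, and `make_samples` and `min_max` are hypothetical helpers rather than the authors' code.

```python
def min_max(values):
    """Min-max normalization into [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def make_samples(factors, volume, speed, duration, window):
    """Slide a fixed-length window over one incident, one update per minute.

    factors       -- incident-related factors V1..Vn (identical for every sample)
    volume, speed -- 1-min series starting `window` minutes before the incident
    duration      -- total incident duration Y in minutes
    """
    samples = []
    for elapsed in range(duration):
        end = window + elapsed          # series index of the current update moment
        residual = duration - elapsed   # Y' replaces Y at this moment
        samples.append({
            "factors": list(factors) + [elapsed],  # V_{n+1} = elapsed time (min)
            "vol": volume[end - window:end],       # last `window` minutes of volume
            "spd": speed[end - window:end],        # last `window` minutes of speed
            "label1": int(residual > 5),           # 1 if Y' > 5 min
            "label2": int(residual > 10),          # 1 if Y' > 10 min
        })
    return samples


vol = list(range(17))
spd = [60.0] * 17
s = make_samples([1, 0, 2], vol, spd, duration=12, window=5)
print(len(s), s[2]["label1"], s[2]["label2"])  # residual 10 min at elapsed = 2
```

In practice the min and max would be taken per variable over the training data and reused for the test data; the text does not say how this was handled.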
In summary, each sample contains the incident-related factors $V_1, \ldots, V_n$, the elapsed time $V_{n+1}$, the traffic volume and speed data of the current time window, and the labels $y_{label1}$ and $y_{label2}$.
In a multi-factor evaluation system, factors may have different magnitudes or units, and using the raw values directly would enlarge or shrink a factor's influence according to its scale. Therefore, to ensure reliable results and an equal contribution from each factor, the raw data are normalized before being input into the framework. This study applies min-max normalization, which maps the raw data into the interval [0, 1]:
$$x' = \frac{x - \min}{\max - \min}$$
where max and min are the maximum and minimum values of the sample data.
Notes to Table 1: a Similar expressions: peak hour or not, daytime or night; b Similar expression: capacity reduction; c A1: people died during the accident; A2: people were injured during the accident or died after the accident; A3: property damage; HOV = high-occupancy vehicle.

3.3.2 Long short-term memory neural network for incident clearance prediction
LSTM is a powerful type of artificial recurrent neural network (RNN) that is good at processing sequential data (Song et al., 2020). As shown in Figure 4, like the traditional MLP, an RNN consists of an input layer, a hidden layer and an output layer, but the hidden layer of an RNN acts as a recurrent block. For each block, the output $h_t$ is calculated from both $x_t$ (the input of the model at time $t$) and $h_{t-1}$ (the result of the memory cell at the previous time $t-1$):
$$h_t = \phi(U x_t + W h_{t-1} + b)$$
where $U$ and $W$ are weight coefficients, $\phi$ is the activation function and $b$ is the bias.
The output at time $t$ is:
$$z_t = V h_t + c$$
where $V$ is a weight coefficient and $c$ is a bias.
The output of the model is:
$$y_t = \sigma(z_t)$$
where $\sigma$ is the activation function. Because it retains a memory of previously learned content, the RNN is mostly used for machine translation and speech recognition (Su et al., 2019). However, when an RNN is trained, computing the partial derivative of the loss function with respect to a weight coefficient traces back through all previous time steps, which leads to repeated multiplication of the derivative of the activation function. This repeated multiplication can make the gradient too large ("gradient explosion") or too small ("vanishing gradient"), so that learning becomes unstable or earlier information is weakened. To solve this problem, Hochreiter and Schmidhuber (1997) proposed the LSTM, which can capture long-term dependencies and determine the best time window automatically. As shown in Figure 5, an LSTM neural network consists of one input layer, one hidden layer and one output layer. Unlike the RNN, each memory cell in the hidden layer has three gates, namely the "forget gate," the "input gate" and the "output gate," which control whether previous state information passes through and affects subsequent predictions.
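The recurrence and the gradient problem described above can be illustrated with a scalar RNN. The sketch assumes $\phi = \tanh$ and $\sigma$ = logistic sigmoid (the text does not name the activation functions), and `rnn_forward` and `gradient_factor` are hypothetical helpers. For a scalar RNN, $\partial h_T / \partial h_0 = \prod_{t=1}^{T} W\,(1 - h_t^2)$, so the factor shrinks or grows geometrically with $|W|$.

```python
import math


def rnn_forward(xs, U, W, b, V, c):
    """Scalar RNN: h_t = tanh(U x_t + W h_{t-1} + b), y_t = sigmoid(V h_t + c)."""
    h = 0.0
    hs, ys = [], []
    for x in xs:
        h = math.tanh(U * x + W * h + b)
        z = V * h + c
        hs.append(h)
        ys.append(1.0 / (1.0 + math.exp(-z)))
    return hs, ys


def gradient_factor(hs, W):
    """d h_T / d h_0: product over time of W * tanh'(.), with tanh'(a) = 1 - h_t**2."""
    g = 1.0
    for h in hs:
        g *= W * (1.0 - h * h)
    return g


xs = [0.0] * 30  # 30 time steps of zero input keep every h_t at 0
hs_small, ys = rnn_forward(xs, 1.0, 0.5, 0.0, 1.0, 0.0)
hs_large, _ = rnn_forward(xs, 1.0, 3.0, 0.0, 1.0, 0.0)
print(gradient_factor(hs_small, 0.5))  # vanishing
print(gradient_factor(hs_large, 3.0))  # exploding
```

With zero inputs every $h_t$ stays at 0 and $\tanh'$ equals 1, so over 30 steps the factor is $0.5^{30} \approx 9.3 \times 10^{-10}$ for $W = 0.5$ and $3^{30} \approx 2.1 \times 10^{14}$ for $W = 3$.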
The first gate is the forget gate, which decides whether information will be discarded. It reads $h_{t-1}$ and $x_t$ and outputs a value between 0 and 1, where 1 means "completely retained" and 0 means "completely discarded":
$$f_t = \sigma(U_f x_t + W_f h_{t-1} + b_f)$$
The next step determines what new information is stored in the cell state, which is calculated through the input gate:
$$i_t = \sigma(U_i x_t + W_i h_{t-1} + b_i)$$
$$\tilde{C}_t = g(U_C x_t + W_C h_{t-1} + b_C)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
Finally, the output value is calculated through the output gate:
$$o_t = \sigma(U_o x_t + W_o h_{t-1} + b_o)$$
$$h_t = o_t \odot h(C_t)$$
In the output layer, the output value is calculated as:
$$y_t = \sigma(V h_t + c)$$
where $\sigma$, $g$ and $h$ are activation functions ($h(\cdot)$ is distinct from the hidden state $h_t$) and $\odot$ denotes element-wise multiplication.
After normalization, the traffic flow data are input into the LSTM part, which is implemented with the Keras library in Python. Constructing this part involves configuring the following parameters.
Input_shape. The input dimension and length of the model, determined by the time window length and the number of traffic flow parameters used (traffic volume, speed, or both volume and speed). For example, if the time window length is 30 min and both traffic parameters are inputs, the input_shape is 30 × 2.
The LSTM layer is connected to a Dense layer, the fully connected layer in Keras. In a fully connected layer, every neuron receives directed connections from all neurons of the preceding layer. Two parameters of this layer are configured as follows:
Units. The output dimension of this part, set to 1.
Activation. The activation function of this layer, here the ReLU function:
$$\mathrm{ReLU}(x) = \max(0, x)$$
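To make the gate equations and the LSTM-to-Dense configuration concrete, here is a forward pass in plain Python. It assumes $\sigma$ = logistic sigmoid and $g = h = \tanh$, zero initial states, untrained random weights and a hypothetical hidden size of 4. All helper names are hypothetical; the model in this study is built and trained with Keras, so this sketch only shows how a 30 × 2 window flows through the gates to a single ReLU output.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gate_input(U, W, b, x, h):
    """U x_t + W h_{t-1} + b for one gate (U: H x D, W: H x H, b: length H)."""
    return [sum(u * xi for u, xi in zip(U[k], x)) +
            sum(w * hj for w, hj in zip(W[k], h)) + b[k]
            for k in range(len(b))]


def lstm_step(x, h, c, p):
    f = [sigmoid(z) for z in gate_input(p["Uf"], p["Wf"], p["bf"], x, h)]    # forget gate
    i = [sigmoid(z) for z in gate_input(p["Ui"], p["Wi"], p["bi"], x, h)]    # input gate
    g = [math.tanh(z) for z in gate_input(p["Uc"], p["Wc"], p["bc"], x, h)]  # candidate state
    o = [sigmoid(z) for z in gate_input(p["Uo"], p["Wo"], p["bo"], x, h)]    # output gate
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]            # new cell state C_t
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]                          # new hidden state h_t
    return h, c


def init_params(D, H, seed=0):
    """Random, untrained weights for the four gate blocks plus a 1-unit Dense layer."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    p = {}
    for gate in "fico":  # forget, input, candidate, output
        p["U" + gate], p["W" + gate], p["b" + gate] = mat(H, D), mat(H, H), [0.0] * H
    p["dense_w"], p["dense_b"] = [rng.uniform(-0.5, 0.5) for _ in range(H)], 0.0
    return p


def lstm_then_dense(window, p):
    """Run the window through the LSTM, then a Dense layer with units=1 and ReLU."""
    H = len(p["bf"])
    h, c = [0.0] * H, [0.0] * H
    for x in window:
        h, c = lstm_step(x, h, c, p)
    z = sum(w * hk for w, hk in zip(p["dense_w"], h)) + p["dense_b"]
    return max(0.0, z)


p = init_params(D=2, H=4)
window = [[0.1 * (t % 5), 0.9 - 0.01 * t] for t in range(30)]  # 30 min x (volume, speed)
print(lstm_then_dense(window, p))
```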

Multi-layer perceptron network for incident clearance prediction
MLP is a well-known ANN method. As shown in Figure 6, an MLP consists of one input layer, one or more hidden layers and one output layer, and the neurons of each layer are fully connected to the next layer. The input of the model is denoted as $x = \{x_1, x_2, x_3, \ldots, x_i\}$. The nodes of hidden layer $l$ are denoted as $a^l = \{a^l_1, a^l_2, a^l_3, \ldots, a^l_m\}$ and are calculated as follows, with $a^0 = x$:
$$a^l_m = \sigma\left(\sum_i w^l_{mi}\, a^{l-1}_i + b^l_m\right)$$
where $w^l_{mi}$ is the weight coefficient between node $a^{l-1}_i$ and node $a^l_m$, $b^l_m$ is the bias of node $a^l_m$ and $\sigma$ is the activation function. The output of the model is denoted as $y = \{y_1, y_2, y_3, \ldots, y_k\}$ and is calculated as:
$$y_k = \varphi\left(\sum_m w_{km}\, a^L_m + b_k\right)$$
where $a^L$ is the last hidden layer, $w_{km}$ and $b_k$ are the weights and biases of the output layer and $\varphi$ is the activation function of the output layer.
The MLP is trained by backpropagation, adjusting the weight coefficients and biases by gradient descent on the error. The weight coefficients are updated as follows:
$$w_i \leftarrow w_i - \alpha \frac{\partial L(W)}{\partial w_i}$$
where $\alpha$ is the learning rate, $L(W)$ is the loss function and $\partial L(W)/\partial w_i$ is the partial derivative of the loss function $L(W)$ with respect to $w_i$. The loss function evaluates the model's predictions, and the goal of training is to reach the minimum loss $L_{min}(W)$. For binary classification problems, the most common loss function is binary cross-entropy:
$$L(W) = -\frac{1}{q}\sum_{i=1}^{q}\left[Z_i \log y_i + (1 - Z_i)\log(1 - y_i)\right]$$
where $Z_i$ is the true value, $y_i$ is the value predicted by the model and $q$ is the number of samples.
The output of the LSTM part and the normalized incident-related factors are first merged in this part, using the function keras.concatenate in Python. The merged variables are then input into the MLP network with two hidden layers.
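The merge-then-MLP step, the binary cross-entropy loss and the weight update can be sketched as follows. The layer widths (8 and 8), the ReLU hidden activation and the sigmoid output are assumptions, since the text specifies only two hidden layers; all helper names are hypothetical. `gd_step` applies the update rule to a single sigmoid neuron, for which binary cross-entropy gives $\partial L / \partial w_i = \frac{1}{q}\sum (y - Z)\,x_i$.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, W, b, act):
    """Fully connected layer: act(W x + b), with W given as a list of rows."""
    return [act(sum(w * xi for w, xi in zip(row, x)) + bk) for row, bk in zip(W, b)]


def random_layer(n_in, n_out, act, rng):
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out, act


def mlp_forward(lstm_out, factors, layers):
    """Concatenate the LSTM output with the normalized incident factors, then run the MLP."""
    x = [lstm_out] + list(factors)
    for W, b, act in layers:
        x = dense(x, W, b, act)
    return x[0]  # sigmoid output: probability that Y' exceeds the label threshold


def bce(z_true, y_pred, eps=1e-12):
    """Binary cross-entropy averaged over q samples (eps guards against log(0))."""
    q = len(z_true)
    return -sum(z * math.log(max(y, eps)) + (1 - z) * math.log(max(1.0 - y, eps))
                for z, y in zip(z_true, y_pred)) / q


def gd_step(w, b, xs, zs, alpha):
    """One update w_i <- w_i - alpha * dL/dw_i for a single sigmoid neuron under BCE."""
    ys = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in xs]
    q = len(xs)
    grad_w = [sum((y - z) * x[i] for y, z, x in zip(ys, zs, xs)) / q for i in range(len(w))]
    grad_b = sum(y - z for y, z in zip(ys, zs)) / q
    return [wi - alpha * g for wi, g in zip(w, grad_w)], b - alpha * grad_b


rng = random.Random(0)
layers = [random_layer(4, 8, relu, rng), random_layer(8, 8, relu, rng),
          random_layer(8, 1, sigmoid, rng)]
print(mlp_forward(0.3, [0.2, 0.7, 1.0], layers))
```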
To determine the clearance of the incident: if the model predicts $Y' > 5$ min, the time window slides forward and the traffic flow data of the next time window enter the next cycle; if $Y' \le 5$ min, clearance of the incident is expected within the coming 5 min and the program terminates.
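The update-and-terminate cycle can be sketched as a simple loop. `predict_label1` stands in for the trained LSTM-MLP model and is hypothetical, as is the toy speed rule used in the usage lines.

```python
def run_until_clear(windows, predict_label1):
    """Feed successive time windows (one per update minute) to the model.

    predict_label1 -- hypothetical trained model: returns 1 if Y' > 5 min, else 0.
    Returns the update cycle at which clearance within 5 min is first predicted,
    or None if that never happens within the supplied windows.
    """
    for step, window in enumerate(windows, start=1):
        if predict_label1(window) == 0:  # Y' <= 5 min: clearance expected soon, stop
            return step
    return None


def toy_model(window):
    """Toy stand-in: predict Y' > 5 min while the mean speed stays below 40 km/h."""
    return 0 if sum(window) / len(window) >= 40 else 1


print(run_until_clear([[20, 22], [30, 35], [45, 50]], toy_model))
```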
In the case study section, the model is tested and validated. The false positive rate (FPR) and true positive rate (TPR) are commonly used to evaluate the overall ability of a prediction model. The receiver operating characteristic (ROC) curve is created by plotting the TPR against the FPR at various threshold settings, while the Kolmogorov-Smirnov (KS) curve plots the TPR and FPR against the threshold. In addition, the area under the ROC curve (AUC), which provides an aggregate measure of performance across all possible classification thresholds, was used in this study to evaluate prediction performance.
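The ROC, AUC and KS quantities can be computed directly from predicted scores. The sketch below uses the trapezoidal rule for AUC and takes the KS statistic as the maximum gap TPR - FPR across thresholds, a common convention assumed here; the helper names are hypothetical, and the scores and labels in the usage lines are made up for illustration.

```python
def rates_at_threshold(scores, labels, thr):
    """TPR, FPR and accuracy when samples with score >= thr are predicted as label 1."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    tn = neg - fp
    return tp / pos, fp / neg, (tp + tn) / len(labels)


def roc_points(scores, labels):
    """(FPR, TPR) pairs from the strictest to the loosest threshold, starting at (0, 0)."""
    pts = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tpr, fpr, _ = rates_at_threshold(scores, labels, thr)
        pts.append((fpr, tpr))
    return pts


def auc_trapezoid(pts):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))


def ks_statistic(pts):
    """Maximum gap between TPR and FPR across thresholds."""
    return max(tpr - fpr for fpr, tpr in pts)


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
print(auc_trapezoid(pts), ks_statistic(pts))  # 0.8125 0.5
```

For these eight made-up samples, 13 of the 16 positive-negative pairs are ranked correctly, which is why the AUC equals 13/16 = 0.8125.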

Description of sites and data
This study selected traffic incident data and traffic flow parameters from the Shanghai Zhonghuan Expressway as a case study. The expressway is approximately 70 km long with a speed limit of 80 km/h. It is two-directional with four lanes in each direction. Different data sources are used for incident-related factors and traffic flow information: incident-related factors were obtained from the Road Network Monitoring Center, while traffic flow information was collected from the inductive loop detectors along the Zhonghuan Expressway.
A total of 4,041 incident records were originally collected, covering the period from April 1, 2017 to October 7, 2017. Table 2 presents detailed descriptions of the incident-related variables collected in the case study. Traffic flow information was obtained from 176 inductive loop detector sets (loops at the same detection spot are considered one set) distributed along the Shanghai Zhonghuan Expressway at intervals of 800 m on average. These loop data contain both speed and traffic volume information with a 20-s acquisition interval. To be applied in the modeling framework, the traffic data were converted into 1-min intervals by summing the traffic volume and averaging the speed. Data cleansing and matching were conducted to remove vague traffic incident records and to pair the incidents with the associated traffic flow information. After data cleansing and matching, 391 incidents with paired traffic flow information were selected. Note that no secondary incidents were observed, so they were not considered in this study.
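The 20-s to 1-min conversion described above can be sketched as follows. It assumes each loop record is a `(volume, speed)` pair at 20-s resolution for one detector set, so three records make one minute; `aggregate_to_minutes` is a hypothetical helper, the speed is a plain arithmetic mean as the text describes, and an incomplete trailing minute is simply dropped.

```python
def aggregate_to_minutes(records, per_minute=3):
    """Convert 20-s (volume, speed) records into 1-min (total volume, mean speed)."""
    out = []
    for k in range(0, len(records) - per_minute + 1, per_minute):
        chunk = records[k:k + per_minute]
        volume = sum(v for v, _ in chunk)           # vehicles counted in the minute
        speed = sum(s for _, s in chunk) / len(chunk)  # average of the 20-s speeds
        out.append((volume, speed))
    return out


raw = [(5, 60), (4, 62), (6, 58), (3, 70), (3, 70), (3, 70), (1, 1)]
print(aggregate_to_minutes(raw))  # the last, incomplete minute is dropped
```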

Preparation for modeling
With the time window sliding procedure, approximately 4,200 samples are obtained. During this procedure, samples for which the status of a small number of loop detectors was missing or invalid were deleted. After the sample expansion through time window sliding, the ratio of the two categories is 2:3 for Label 1 and 3:2 for Label 2. As the class imbalance was not significant, no further processing was needed. All samples (approximately 4,200) are divided into a training set and a test set in a ratio of 80% to 20%, giving approximately 3,360 samples in the training set and 840 samples in the test set.
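An 80/20 split and a class-balance check can be sketched as follows. The fixed seed and the helper names are assumptions for illustration; the text does not describe how the split was randomized.

```python
import random


def train_test_split(samples, test_frac=0.2, seed=42):
    """Shuffle sample indices with a fixed seed and hold out test_frac of them."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(samples) * test_frac)
    test = [samples[i] for i in idx[:n_test]]
    train = [samples[i] for i in idx[n_test:]]
    return train, test


def positive_share(labels):
    """Share of label-1 samples, used to check class balance."""
    return sum(labels) / len(labels)


samples = list(range(4200))
train, test = train_test_split(samples)
print(len(train), len(test))  # 3360 840
```

Consecutive windows of the same incident are nearly identical, so a random split by sample lets windows of one incident appear in both sets; splitting by incident is a stricter alternative. The text does not state which was used.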

Results and discussions

4.3.1 Selection of modeling inputs and parameters
A comparative analysis was conducted to select the modeling inputs; the results are provided in Table 3.
4.3.1.1 Input of traffic flow parameters. As shown in Table 3, the AUC values are highest (0.84 for Label 1 and 0.89 for Label 2) when both traffic volume and speed are used as model inputs, compared with using only the traffic volume or only the speed. Using both traffic volume and speed as modeling inputs therefore gives the best performance, as together they describe the traffic flow characteristics more fully. The combination of traffic volume and speed is recommended as input for predicting incident clearance.
4.3.1.2 Time window length. Next, both traffic volume and speed are used as model inputs, and different time window lengths were tested and compared. The results in Table 3 indicate that the model with a 30-min time window performed best, with the highest AUC values of 0.86 for Label 1 and 0.94 for Label 2. This is probably because a longer time series better reflects the changes in traffic flow as the traffic incident develops. Therefore, the time window length is set to 30 min.
4.3.1.3 Dynamic traffic parameters and static traffic parameters. The best-performing model, i.e. the model with a 30-min time window using both speed and traffic volume, is applied, and the modeling approach using dynamic traffic parameters is compared with one using static traffic parameters. As shown in Table 3, using dynamic traffic parameters as inputs gave higher AUC values for both prediction tasks, Label 1 (whether clearance occurs within 5 min) and Label 2 (whether clearance occurs within 10 min). Therefore, dynamic traffic parameters were preferred in the modeling approach. After principal component analysis (PCA) of the static traffic parameters, the cumulative variance contribution of the principal components reached 89.1%, which is consistent with previous studies (Ru et al., 2017).
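The cumulative variance contribution of principal components is the running share of the sorted eigenvalues of the covariance matrix. The sketch below, with hypothetical eigenvalues, a hypothetical 85% target and a hypothetical helper name, returns the smallest number of components reaching the target; it does not reproduce the 89.1% figure of this study.

```python
def components_needed(eigenvalues, target=0.85):
    """Smallest k whose cumulative variance contribution reaches target, and that share."""
    vals = sorted(eigenvalues, reverse=True)
    total = sum(vals)
    cumulative = 0.0
    for k, v in enumerate(vals, start=1):
        cumulative += v
        if cumulative / total >= target:
            return k, cumulative / total
    return len(vals), 1.0


print(components_needed([4.0, 2.5, 1.5, 1.0, 0.5, 0.5]))  # 4 components, 90% of variance
```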

Prediction performance with selected parameters
Finally, it can be clearly seen from Ru et al. (2017) that the prediction performance of the model applying dynamic traffic flow parameters is significantly better than that of the model applying traffic flow parameters processed by PCA. This result confirms that dynamic traffic flow parameters outperform static traffic flow parameters.
This paper provides the ROC curve, KS curve and confusion matrix of the proposed model with the optimal parameter combination, as shown in Figure 7, Figure 8 and Table 4. Figure 7 shows that the AUC values for Label 1 and Label 2 are 0.86 and 0.94, respectively. As the ROC curve of Label 2 is closer to the upper left corner, this classifier performs better than that for Label 1. Figure 8, which shows how the FPR and TPR change with the threshold, also indicates good prediction performance. The confusion matrix of the proposed model with the optimal parameter combination is shown in Table 4; the prediction accuracies for Label 1 and Label 2 are 80.7% and 85.7%, respectively.
… framework as categorical variables, integrated with the output of the LSTM part and then processed by the MLP. The framework was tested using traffic incident samples recorded by traffic police and traffic flow data obtained from loop detectors on the Shanghai Zhonghuan Expressway. The trained framework performed well and shows promise for applying dynamic traffic flow parameters to traffic incident duration prediction.

Conclusions
The main contribution of this study is a framework that is among the first to apply dynamic traffic flow parameters to traffic incident duration prediction. Dynamic traffic flow parameters better reflect changes in traffic status over the incident duration than traffic flow characteristics such as posted speed, 85th percentile speed and the ratio of the average speed at the time of the incident to its historical value. Based on the case study of the Shanghai Zhonghuan Expressway, the framework considers the impact of both traffic incident-related factors and real-time traffic flow parameters on traffic incident duration. The results show that a 30-min time window with both dynamic traffic volume and speed as inputs performed best, and this configuration is recommended for future studies.
It is worth mentioning that the framework is computationally efficient and can provide information to road managers and drivers with little delay in future practical applications. However, the model is less interpretable than traditional hazard-based models. Besides, the correlation between input variables is ignored and will be further discussed in future research. The effect of each variable is demonstrated through statistical methods. In addition, the incident sample used in this study is limited and the set of variables is not rich. The number of injuries and casualties, a more detailed description of the incident location and other variables are expected to be used to characterize traffic incidents comprehensively. The framework needs to be further validated with a sufficiently large number of variables and locations.
Further research on sequential prediction is desirable. Not all traffic incident-related factors can be acquired when the incident is reported; rather, more detailed information is acquired gradually over time. For instance, whether tow trucks are required becomes known only when the traffic police arrive at the incident site. A more realistic prediction method that continuously updates model variables and results over time would provide more accurate estimates and more reliable references.