Dynamic model to characterise sectors using machine learning techniques

Purpose – The purpose of this paper is to set out a methodology for characterising the complexity of air traffic control (ATC) sectors based on individual operations. This machine learning methodology also learns from the data on which the model is based.
Design/methodology/approach – The methodology comprises three steps. Firstly, a statistical analysis of individual operations is carried out using elementary or initial variables, and these are combined using machine learning. Secondly, based on the initial statistical analysis and using machine learning techniques, the impact of air traffic flows on an ATC sector is determined. The last step is to calculate the complexity of the ATC sector based on the impact of its air traffic flows.
Findings – The results obtained are logical from an operational point of view and are easy to interpret. The classification of ATC sectors based on complexity is quite accurate.
Research limitations/implications – The methodology is in its preliminary phase and has been tested with very little data. Further refinement is required.
Originality/value – The methodology can be of significant value to ATC in that, when applied to real cases, ATC will be able to anticipate the complexity of the airspace and optimise its resources accordingly.


Introduction and objectives
The objective of the air traffic management (ATM) system is to enable efficient, safe operations for airspace users (Gallego et al., 2018a). The number of flights is forecast to almost double by 2035 (EUROCONTROL, 2010). Within this context, Europe has a very complex transport network, specifically air transport, due to the high level of mobility of its inhabitants (Samolej et al., 2021).
However, simply counting the number of aircraft in the airspace gives a misleading idea of its complexity. It is important to take other factors into account, such as the number of interactions of air traffic control (ATC) with aircraft (Lee et al., 2009). Furthermore, the complexity of the operational scenario depends in part on the variability between the predicted and actual trajectories of the aircraft. The actual trajectory differs from the predicted one due to several factors, including the weather forecast, the integration of operational information and operational uncertainty (Gallego et al., 2018b).
At present, there is no single universal indicator of complexity (Gianazza, 2007). However, over the past 25 years, concern has grown over the need to measure the complexity of airspace (Pejovic et al., 2020). This complexity and the resulting unpredictability for the ATM system are among the main reasons for the deployment of technical and operational solutions to balance the capacity and the demand of ATM (SESAR, 2015). The research and projects on this topic are important for the ATC service, as increasing airspace complexity directly leads to an increase in air traffic controllers' workload (Xie et al., 2021). Thus, the definition of a unique complexity parameter would imply a more exact and earlier evaluation of the controllers' workload.
The current issue and full text archive of this journal is available on Emerald Insight at: https://www.emerald.com/insight/1748-8842.htm
This paper attempts to further the development of a universal definition of the complexity of an ATC sector. A parameter called complexity will be defined based on the main traffic flows within the sector and ultimately based on the data of individual operations. As such, the complexity of a sector will depend on the characteristics of overflying aircraft. This methodology will be data-driven. Data-driven complexity models are being developed to overcome the limitations of expert opinion-based models, such as bias or dependence on air traffic controllers' confirmation (Gianazza and Guittet, 2006). This methodology will thus have some advantages compared with previous airspace complexity evaluation models:
Possibility of using massive data to develop and test the methodology.
Adaptability, as the methodology presents updated results when new data are provided.
Autonomy from controllers and other stakeholders when updating results.
Furthermore, a tool will be developed that, via this definition of complexity, will help ATC to increase its capacity to better cope with demand. This increase in capacity will be due to better management of resources. This tool works by characterising airspace sectors according to their complexity. This will enable ATC to anticipate the operational scenario and react in good time.
For the tool to be of practical value in a real scenario, we will use machine learning. The function of machine learning will be to make an automatic tool that learns based on the data. In this way, if the data are from different sectors, machine learning will learn different patterns and give different results, thereby taking the intrinsic characteristics of each sector into account. If historical data are available, machine learning will be able to capture longer-term patterns, resulting in different predictions. This variability means that the model is dynamic. Furthermore, the aim is to ensure that the results are easy to interpret, thereby assisting ATC.
The addition of machine learning to this tool will also provide some advantages to the ATC service when using it:
Possibility to learn historical trends if the necessary data are available.
Possibility to learn patterns from structural aspects of different ATC sectors based on the characteristics of the operation within the sectors.
The addition of machine learning to this tool will make the application of the complexity characterisation methodology automatic. Therefore, this predictive tool will be the perfect complement to the proposed methodology. The proposed methodology, set out in Section 2, details the process for classifying sectors according to their complexity. Section 3 shows the results obtained using real data from Spanish airspace. Finally, Section 4 gives the conclusions of the paper and outlines future research in this area.

Methodology
The objective of this paper is to develop a methodology that will enable airspace sectors to be characterised based on historical data. The methodology will be such that it is capable of adaptation and learning as per data-based models. The characterisation will be based on a statistical analysis of individual aircraft overflying the specific sector. This statistical analysis will enable us to define parameters to characterise the traffic flows in the sector and, carrying on from that, the complexity of the sector itself. To carry out this work, data for the whole of 2019 from Spanish airspace were used.
The characterisation must be such that it is of genuine help to ATC in managing its resources. For this very reason, it was decided to make a model based on the daily complexity of the sectors themselves. As regards ATC, the complexity of a sector depends both on its structural aspects and on the characteristics of its traffic flows (Sridhar et al., 1998). Specifically, the characteristics of the traffic flow within a sector are related to aspects of individual flights such as the number of aircraft, mix of aircraft models, weather, aircraft separation, aircraft speed and regulations affecting traffic (Oktal and Yaman, 2011).
According to the literature, the process of estimating the complexity of the sectors begins with the flights themselves. In other words, to define the complexity of a sector using a data-based methodology, it is necessary to start with the most basic unit, which is the individual operation.
The flight plans for one day will be used to carry out statistical analysis to characterise which traffic flows will be the most complex for ATC to control. The first step in the methodology is to carry out this statistical analysis. For the statistical analysis, different indicators based on the literature have been used and classified into four fields of study.

Traffic density
The ever-increasing density in air traffic results in a heavier workload for air traffic controllers (Debbache, 2003). Traffic density is key for identifying the main traffic flows, and thus, the complexity of the airspace. The indicators used to define this variable are the number of aircraft per day, the number of aircraft per hour and the maximum number of aircraft in an hour.
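As an illustration only, the three traffic density indicators could be derived from a list of entry hours for one flow on one day. The function name, the input format and the choice to average over all 24 hours of the day are assumptions made for this sketch and are not taken from the methodology itself.

```python
from collections import Counter

def traffic_density_indicators(entry_hours):
    """Return (aircraft per day, aircraft per hour, maximum aircraft in an hour).

    entry_hours is a hypothetical input: the clock hour (0-23) at which each
    aircraft enters the flow on the day analysed.
    """
    per_hour = Counter(entry_hours)
    aircraft_per_day = len(entry_hours)
    # Assumption: the hourly rate is averaged over the 24 hours of the day.
    aircraft_per_hour = aircraft_per_day / 24
    max_in_an_hour = max(per_hour.values(), default=0)
    return aircraft_per_day, aircraft_per_hour, max_in_an_hour

sample_hours = [6, 6, 7, 7, 7, 12, 18, 18]  # made-up sample data
print(traffic_density_indicators(sample_hours))
```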

Vertical density
This is obviously related to traffic density. It is important to understand how the traffic is structured at the different flight levels within the sector. The number of ascending/descending aircraft is important to define the airspace complexity (Cao et al., 2018). This variable has been defined from the percentage of ascending/descending aircraft, the number of ascending/descending aircraft, the number of Flight Levels and the number of aircraft per Flight Level.
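A minimal sketch of the vertical density indicators follows. It assumes that each flight is available as a (flight level, attitude) pair, which is a hypothetical record format chosen for the example; the attitude labels are likewise illustrative.

```python
def vertical_density_indicators(flights):
    """flights: list of (flight_level, attitude) pairs (hypothetical format).

    attitude is one of 'climb', 'descent' or 'level'.
    """
    n = len(flights)
    # Aircraft that are ascending or descending within the flow
    evolving = sum(1 for _, attitude in flights if attitude in ("climb", "descent"))
    levels = {flight_level for flight_level, _ in flights}
    return {
        "n_evolving": evolving,
        "pct_evolving": 100 * evolving / n if n else 0.0,
        "n_flight_levels": len(levels),
        "aircraft_per_level": n / len(levels) if levels else 0.0,
    }

sample = [(340, "level"), (340, "level"), (360, "climb"), (380, "descent")]
print(vertical_density_indicators(sample))
```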

Time distribution
Delays in air transport are a major concern for the industry because of inconvenience to the main actors and the resulting costs (Rodriguez-Sanz et al., 2021). To include possible delays in the overall model, we must study the time distribution and hourly milestones of the aircraft in the sector. To maintain a simple definition of this variable, it is calculated based on the percentage of hours in a day that aircraft are flying within the air traffic flow itself.
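The percentage of hours in a day during which a flow is occupied could be computed as sketched below. The input format (entry and exit times expressed in decimal hours) is an assumption made for the example.

```python
import math

def time_distribution_pct(occupancy_intervals):
    """Percentage of the 24 clock hours in which at least one aircraft is in the flow.

    occupancy_intervals: hypothetical list of (entry, exit) times in decimal
    hours, for example (6.5, 7.25) for 06:30 to 07:15.
    """
    active_hours = set()
    for start, end in occupancy_intervals:
        # Mark every clock hour touched by the interval [start, end)
        for hour in range(math.floor(start), math.ceil(end)):
            active_hours.add(hour % 24)
    return 100 * len(active_hours) / 24

print(time_distribution_pct([(6.5, 7.25), (7.1, 7.9), (12.0, 12.5)]))  # hours 6, 7 and 12
```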

Air traffic flow and capacity management regulations
Air traffic flow and capacity management (ATFCM) regulations are very important when determining the complexity of airspace.
Furthermore, ATFCM regulations reveal mechanisms for balancing capacity and demand (Sanaei et al., 2019). When there is excess demand with respect to ATC capacity, this generates complexity in the sector; therefore, it is important to study the regulations. The ATFCM regulations are directly related to complexity, so special care is taken in the definition of this variable. The indicators that define the ATFCM regulations variable are the percentage of regulated flights, the percentage of delayed flights, the average number of regulations and the average delay faced by flights in their operation.
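The four ATFCM indicators can be illustrated as follows. The record fields `regulations` and `delay_min` are hypothetical names chosen for this sketch, and the averages are assumed to be taken over all flights in the flow rather than only the regulated ones.

```python
from statistics import mean

def atfcm_indicators(flights):
    """flights: non-empty list of dicts with hypothetical keys
    'regulations' (number of regulations applied) and 'delay_min' (delay in minutes)."""
    n = len(flights)
    regulated = sum(1 for f in flights if f["regulations"] > 0)
    delayed = sum(1 for f in flights if f["delay_min"] > 0)
    return {
        "pct_regulated": 100 * regulated / n,
        "pct_delayed": 100 * delayed / n,
        # Assumption: averages are taken over every flight in the flow
        "avg_regulations": mean(f["regulations"] for f in flights),
        "avg_delay_min": mean(f["delay_min"] for f in flights),
    }

sample = [
    {"regulations": 0, "delay_min": 0},
    {"regulations": 1, "delay_min": 10},
    {"regulations": 2, "delay_min": 0},
    {"regulations": 0, "delay_min": 0},
]
print(atfcm_indicators(sample))
```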
The aim of the paper remains the definition of a simple methodology to characterise ATC sectors. We will define the four initial variables in the simplest way possible to be consistent with this objective.
The indicators on which the methodology is based can be extracted directly from the statistical analysis of the data. The mean values of these indicators and their variation coefficients will be calculated on a daily basis for each flow within a sector [equation (1)]. This will ensure a complete description of the future complexity of the sector:
$$CV = \frac{s}{m} \qquad (1)$$
where: CV = Variation coefficient; s = Standard deviation; and m = Mean.
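The variation coefficient, defined as the standard deviation divided by the mean, could be computed as sketched below. Whether the population or the sample standard deviation is intended is not stated, so the use of the population form here is an assumption, as is the sample of hourly aircraft counts.

```python
from statistics import mean, pstdev

def variation_coefficient(values):
    """Standard deviation divided by mean for one indicator of one flow."""
    m = mean(values)
    if m == 0:
        return 0.0  # assumption: an indicator that is always zero has no variability
    # Assumption: population standard deviation (pstdev); stdev would give the sample form
    return pstdev(values) / m

hourly_counts = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up hourly aircraft counts
print(mean(hourly_counts), variation_coefficient(hourly_counts))
```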
The four initial variables will later be defined as the sum of their respective indicators [equations (2) and (3)]. A distinction will be made between mean and variability values, so eight variables will be calculated: for each of the four initial variables, one from the mean values of its indicators and one from their variation coefficients.
Using these mean values and variabilities, we will calculate two parameters that will enable us to know the impact of the flows. These parameters will be called "mean impact" (based on the average values) and "impact variability" (based on the variation coefficients). To obtain the values of mean impact and impact variability, we calculate the weighted sum of the initial variables previously identified. In the mean impact, x, y, z and w are the relative importance of each of the initial variables. In the impact variability [equation (6)], x', y', z' and w' are the relative importance of each of the initial variables. Both sets of relative importance values are the same in the first iteration, but they will vary independently once machine learning is introduced.
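The weighted sums can be sketched as follows. The variable names, the sample values and the equal weights of 0.25 are placeholders chosen for illustration; they are not the expert values of Figure 2.

```python
def weighted_impact(variables, weights):
    """Weighted sum of the four initial variables of one flow."""
    return sum(weights[name] * variables[name] for name in weights)

# Placeholder relative importance (x, y, z, w); the same values are reused for
# (x', y', z', w') because both sets coincide in the first iteration.
weights = {
    "traffic_density": 0.25,
    "vertical_density": 0.25,
    "time_distribution": 0.25,
    "regulations": 0.25,
}
mean_values = {"traffic_density": 2.0, "vertical_density": 1.0,
               "time_distribution": 4.0, "regulations": 1.0}
variabilities = {"traffic_density": 0.4, "vertical_density": 0.2,
                 "time_distribution": 0.8, "regulations": 0.6}

mean_impact = weighted_impact(mean_values, weights)
impact_variability = weighted_impact(variabilities, weights)
print(mean_impact, impact_variability)
```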
Thereafter, the mean impact and impact variability are rescaled so that they fall within the pre-established range of one to five. The rescaling is based on the maximum and minimum values [equation (8)].
Once the mean impact and the impact variability are rescaled, the daily impact of the traffic flow can be ascertained from a table model using a combination of both. The impact of the traffic flow will be discretised into five levels, numbered one to five, which indicate the treatment that each flow requires from ATC. Flows with low levels of impact will hardly require attention from ATC, whereas those of Level 5 will require significantly more.
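One plausible reading of the rescaling step is a linear min-max mapping onto the range one to five, followed by a lookup in a five-by-five table. The sketch below works under that assumption; the equal-width discretisation and the table contents are hypothetical and do not reproduce the table models of Figures 4 and 5.

```python
def rescale_1_to_5(value, vmin, vmax):
    """Linear min-max rescaling onto [1, 5]; assumes vmin <= value <= vmax."""
    if vmax == vmin:
        return 1.0  # degenerate case: all flows share the same value
    return 1 + 4 * (value - vmin) / (vmax - vmin)

def to_level(rescaled):
    """Discretise a value in [1, 5] into five equal-width levels, 1 to 5."""
    return min(5, int((rescaled - 1) / 4 * 5) + 1)

# Hypothetical table: rows are mean impact levels, columns are variability levels
HYPOTHETICAL_IMPACT_TABLE = [
    [1, 1, 2, 2, 3],
    [1, 2, 2, 3, 3],
    [2, 2, 3, 4, 4],
    [2, 3, 4, 4, 5],
    [3, 3, 4, 5, 5],
]

def flow_impact(mean_impact, impact_variability, mean_range, variability_range):
    mean_level = to_level(rescale_1_to_5(mean_impact, *mean_range))
    variability_level = to_level(rescale_1_to_5(impact_variability, *variability_range))
    return HYPOTHETICAL_IMPACT_TABLE[mean_level - 1][variability_level - 1]

print(flow_impact(30.0, 0.9, (10.0, 50.0), (0.1, 0.9)))
```

The ranges passed to `flow_impact` stand for the minimum and maximum observed across the flows being compared.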
The impact of the traffic flows will give a snapshot of the sector, enabling us to easily identify the most important flows and concentrate on them. That being said, the final goal is to be able to characterise the sector using a single variable. This variable is aptly named "complexity".
From an operational point of view, complexity is a function of many variables, the most important of which are the number of flows and their distribution (Comendador et al., 2019). Adapting this to the methodology being developed, we will define the complexity of the sector using a table model (Figure 1) based on the number of flows within the sector on the day in question and the percentage of those that have an impact of Level 5. The complexity variable will be discretised into five classes, ranging from one to five. The lowest level is one and the highest five.
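The complexity lookup could take the following form. The bin edges for the number of flows and for the percentage of Level 5 flows, as well as the table itself, are invented for the sketch and are not the values used in Figures 7 and 8.

```python
from bisect import bisect_right

# Hypothetical bin edges: each list splits its axis into five classes
FLOW_COUNT_EDGES = [5, 10, 15, 20]
PCT_LEVEL5_EDGES = [10, 20, 30, 40]

# Hypothetical table: rows follow the number of flows, columns the share of Level 5 flows
HYPOTHETICAL_COMPLEXITY_TABLE = [
    [1, 1, 2, 2, 3],
    [1, 2, 2, 3, 3],
    [2, 2, 3, 4, 4],
    [2, 3, 4, 4, 5],
    [3, 3, 4, 5, 5],
]

def sector_complexity(flow_impacts):
    """flow_impacts: daily impact level (1 to 5) of each flow in the sector."""
    n_flows = len(flow_impacts)
    level5 = sum(1 for impact in flow_impacts if impact == 5)
    pct_level5 = 100 * level5 / n_flows if n_flows else 0.0
    row = bisect_right(FLOW_COUNT_EDGES, n_flows)
    col = bisect_right(PCT_LEVEL5_EDGES, pct_level5)
    return HYPOTHETICAL_COMPLEXITY_TABLE[row][col]

print(sector_complexity([5, 5, 3, 2, 1, 4, 5, 2]))  # 8 flows, 37.5% at Level 5
```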
Once this parameter has been defined, it is now possible to characterise the airspace volumes. Furthermore, this successive approximation enables us to observe all the steps of the process. It is possible to see the level of complexity of the sector and also the flows that have the greatest impact within it. If necessary, we can also see the main characteristics of a specific flow.
The initial variables of interest, their relative importance and the table models have been established based on expert opinion, through meetings and discussions with operational personnel, along with results from previous projects (SESAR Joint Undertaking, 2019).
That being said, each sector will have unique characteristics. To cater to this, the model will be automatically updated with the help of machine learning. The machine learning model will learn from the data provided once the individual operations they contain have been processed. As these modifications are based on machine learning, they will be uniquely tailored to the specific sector. This will make the model dynamic and means that it will constantly change based on the airspace in question.
Certain learning algorithms enable us to see the relative importance of input variables when predicting the target variable in regression problems (Bi and Chung, 2011). The relative importance of the initial variables will show which of these are most relevant when calculating the mean impact and impact variability. The values assigned as per expert opinion will be updated with those provided by the machine learning model until convergence is reached. The result will be that, for each sector, the variables that are genuinely the most important will stand out in the machine learning model.
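The update loop can be sketched independently of the learning algorithm. In the sketch below, `estimate_importance` is a hypothetical callable standing in for the fitted model, and `toy_estimator` is a made-up stand-in whose target values are invented so that the loop can be run; the tolerance of 0.01 follows the convergence limit reported in the results.

```python
def iterate_until_converged(initial, estimate_importance, tol=0.01, max_iter=50):
    """Replace the weights with newly estimated ones until no weight moves by more than tol."""
    weights = dict(initial)
    for iteration in range(1, max_iter + 1):
        updated = estimate_importance(weights)
        largest_change = max(abs(updated[k] - weights[k]) for k in weights)
        weights = updated
        if largest_change <= tol:
            return weights, iteration
    return weights, max_iter

def toy_estimator(weights):
    """Made-up stand-in for a fitted model: moves each weight halfway to an invented target."""
    target = {"traffic_density": 0.05, "vertical_density": 0.02,
              "time_distribution": 0.03, "regulations": 0.90}
    return {k: weights[k] + 0.5 * (target[k] - weights[k]) for k in weights}

expert_weights = {"traffic_density": 0.25, "vertical_density": 0.25,
                  "time_distribution": 0.25, "regulations": 0.25}  # placeholder values
final_weights, n_iterations = iterate_until_converged(expert_weights, toy_estimator)
print(n_iterations, final_weights)
```

In practice the callable would refit the learning model on the impacts computed with the current weights and return its relative importance values.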
Furthermore, the table models that enable the values of the variables' impact and complexity to be determined can also benefit from machine learning. In this classification, the relative importance of the x- and y-axes can also be calculated (Holz and Loew, 1994). The goal of machine learning here will be different. By studying the relative importance of the parameters in the tables, we will attempt to ensure a balance between the different intermediate and final variables. In this way, we can ensure that the classification of the levels of impact and levels of complexity is influenced by all of the respective variables contemplated.
By using machine learning in the overall model, we not only can tailor the model for each sector but can also update it continuously. In this paper, we used data from 2019. If, however, data from other years had been added, then the values of relative importance would have been different. This model can be used to view the evolution of the characteristics of air traffic and to see how the relative importance of the different variables changes over time.
The overview of the characterisation methodology, starting from the initial variables and arriving at the complexity of the sectors, is given in Figure 1.

Results
In this section, we see the results of applying the methodology to a set of initial sectors of Spanish airspace. The data correspond to operations in five ATC sectors of Spanish airspace during 2019. The data have been obtained based on ENAIRE radar traces and have been provided to the authors after processing and validation by the company CRIDA. The methodology is divided into three steps. The results of each are shown in the respective subsections:
1 Calculate the mean impact and impact variability via statistical analysis of the data: to obtain the values of mean impact and impact variability, we calculate weighted sums of the mean values and variation coefficients using the initial variables identified.
2 Calculate the impact of the flows based on the mean impact and impact variability: following on from the previous step and using a table model, we calculate the "impact" of the traffic flows within a sector during a single day.
3 Calculate the complexity of the sector: once the impact of the flows is known and based on the number of flows in the sector and the percentage of these with an impact of Level 5, we can calculate the "complexity" of the sector with the aid of a table model. Achieving a value for this parameter is the overall aim of the sector characterisation methodology we have developed.

Results: relative importance
The first step is to calculate the mean impact and impact variability. To do this, we start with the initial variables. The initial relative importance of each of these variables is determined by expert opinion and shown in Figure 2. Looking at both graphs in Figure 2, we can clearly see that the values of relative importance used to calculate the mean impact and impact variability are the same. The reason for this is that in both cases, the same four initial variables, i.e. traffic density, vertical density, time distribution and regulations, are involved. According to expert opinion, the values of relative importance should be the same in both cases, and it should be up to the machine learning model to establish the differences going forward.
The model will begin the first iteration using these initial values of relative importance. Thereafter, a random forest model will be used to ascertain the new values of relative importance for each of the initial variables. These values will be continually updated. This algorithm was chosen due to its versatility and the results obtained in delay prediction problems (Rebollo and Balakrishnan, 2014).
Figure 3 shows how the values of the relative importance of the different initial variables evolve over eight iterations.
Relative importance converges from the sixth iteration onwards. In the seventh and eighth iterations, the relative importance of the four initial variables varies by 0.01 or less. This limit has been considered acceptable, and for this reason, the relative importance of the sixth iteration is used to calculate the rest of the variables of the methodology. To validate this regression machine learning model, the mean absolute error (MAE) has been used (Georgiou et al., 2020). This parameter is used in regression models to give an overall picture of the model. This indicator calculates the absolute difference between the estimated and actual value for each element of the test set and then calculates the mean. In the case of the expert opinion-based model, the MAE is 0.02, while in the eighth machine learning iteration, the MAE is 0.007. Both values, and those of the remaining models, which fall within this interval, are acceptable, so we consider all models to be valid.
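For reference, the MAE described above can be computed as in the following sketch. The sample values are invented and unrelated to the reported MAE figures.

```python
def mean_absolute_error(actual, predicted):
    """Mean of the absolute differences between actual and estimated values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual_impact = [3.0, 2.0, 4.0]      # made-up test-set values
estimated_impact = [2.5, 2.0, 5.0]   # made-up model estimates
print(mean_absolute_error(actual_impact, estimated_impact))
```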
As the model progresses, the regulations variable becomes increasingly important in predicting the mean impact and impact variability. In fact, when calculating the mean impact, with the data provided, this initial variable is practically the only one that is relevant, having a final relative importance of more than 0.9. When calculating the impact variability, the regulations variable continues to be the most important factor, with a relative importance of over 0.5, but traffic density is also important, with a relative importance of over 0.4. In neither case does time distribution or vertical density appear to be important.
Relative importance is different from what was expected by expert opinion. Airspace complexity models are typically based on expert opinion and will therefore be subject to bias (Gianazza and Guittet, 2006). Data-driven methodologies try to overcome this limitation by extracting their results directly from the data provided. This difference can be seen in this evaluation, as the relative importance values that machine learning obtains from the data differ from those expected by expert opinion.
From an operational point of view, these results seem to be logical. The existence of regulations in specific airspace is due to the fact that the ATC service provider does not have sufficient capacity to handle the demand. For this reason, the flows in which there are a greater number of regulations will have a greater impact on the airspace by pushing the system to the limit of its capacity. The number of aircraft, which may seem significant, will not really have a great influence on the service provider if it has sufficient capacity to meet the demand. However, variability in the number of aircraft is extremely relevant because an ATC service provider may have more problems in satisfying a demand that is continuously changing.
The project on which this paper is based is in a preliminary phase. Thus far, it has been applied to a small number of sectors of Spanish airspace. Once the model has been expanded to handle data from all sectors within Spanish airspace, it may be necessary to incorporate new initial variables. That being said, with the initial variables already used, it is clear that the model is capable of iterating and distinguishing which initial variables are the most important.

Results: impact
Starting with values for the mean impact and impact variability and with the help of a table model (Figure 1), we can obtain a value for the daily impact of each of the flows within the sector. This calculation is influenced by machine learning and continually updated by it. By knowing the values of relative importance, we can ascertain whether the table model is appropriate or not for the calculation. To be appropriate, the relative importance of both the mean impacts and the impact variabilities must be balanced. In other words, one should not be significantly greater than the other.
Figure 4 gives the table initially proposed and the relative importance of each intermediate variable.
In this model, the relative importance of the mean impact is greater than that of the impact variability (0.65 vs 0.35). Therefore, in this case, the calculation of the impact will be influenced to a greater extent by the mean impact. To compensate for this imbalance, another table model is presented in Figure 5.
In this case, the calculation gives roughly the same relative importance to the mean impact and the impact variability (0.48 vs 0.52). This means that this particular table will be more appropriate for the data analysed. The table model is updated from a variety of initial tables, which the machine learning model will use automatically to obtain the values of relative importance. The model will decide which table best fits the case in question. This means that the model is dynamic because different data sets will have different tables that better adapt to their characteristics.
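The selection among candidate tables can be illustrated with the two pairs of relative importance reported for Figures 4 and 5. The selection rule used here (smallest absolute gap between the two values) is an assumption about how "balanced" might be quantified, and the dictionary keys are labels chosen for the example.

```python
def most_balanced_table(candidates):
    """Return the name of the table whose two relative importance values are closest.

    candidates maps a table name to (importance of mean impact,
    importance of impact variability).
    """
    return min(candidates, key=lambda name: abs(candidates[name][0] - candidates[name][1]))

candidates = {
    "Figure 4 table": (0.65, 0.35),
    "Figure 5 table": (0.48, 0.52),
}
print(most_balanced_table(candidates))
```

The same rule would apply to the complexity tables, where the two values are the relative importance of the number of flows and of the percentage of Level 5 flows.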
This time, the accuracy of the model has been calculated to ensure the quality of the machine learning model. The accuracy is calculated as the number of correct predictions divided by the total number of predictions, or one minus the misclassification rate (Truong and Choi, 2020). This parameter is used in classification problems, giving an overall idea of the behaviour of the model. All the machine learning models used to calculate the impact of the air traffic flows have an accuracy of over 97.5% and can therefore be considered very accurate. The model used to obtain the results in Figure 4 has an accuracy of 98.7%, whereas the model used to obtain the results in Figure 5 has an accuracy of 99.2%.
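The accuracy defined above corresponds to the following calculation. The impact levels in the example are invented for illustration.

```python
def accuracy(actual, predicted):
    """Correct predictions divided by total predictions (one minus the misclassification rate)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)

actual_levels = [1, 2, 3, 5]     # made-up impact levels from the table model
predicted_levels = [1, 2, 4, 5]  # made-up classifier output
print(accuracy(actual_levels, predicted_levels))
```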
Once it has been decided which table will be used to calculate the daily impact of the flows, the methodology can then calculate the impact of the flows in an air traffic sector. The impact of the flows will be calculated on a daily basis to enable ATC to organise its resources, allowing it to allocate more resources to the flows with greater impact and less to those with lesser impact. The final result of this step of the sector characterisation methodology will be an image of the sector in question and the associated traffic flows along with their impact. Figure 6 shows a typical day during 2019 in the LECMPAU control sector.
In the example, the traffic flows in a north-south direction are clearly visible, these mostly being flights between Madrid-Barajas airport and northern Europe. There are also traffic flows that cross LECMPAU in an east-west direction, these being flights between North America and Barcelona-El Prat airport. The results obtained correspond to the main traffic flows expected in LECMPAU on a typical day. Therefore, we can say that, from an operational point of view, the results are correct.
With respect to the impact of the flows, there was a lot of variety on the day in question. As such, it is not possible to draw further conclusions other than to state that the methodology works and can be applied using real operational data.

Results: complexity
The last step of the proposed methodology is to calculate the complexity of the sector based on the air traffic flows (Figure 1). This is similar to the previous step, in which the impact of flows is calculated. The machine learning tool that enables us to obtain the relative importance of the final variables is used again. The aim is that both final variables should have similar relative importance to ensure that their influence on the table model is balanced. In each case, the table model is adjusted to the needs of the data. Figure 7 shows the initial table used in this case with the associated relative importance of the final variables.
The initial table model does not fit the nature of the data. When calculating the complexity, the relative importance of the number of flows is much greater than that of the percentage of flows with an impact of Level 5 (0.82 vs 0.18). For the calculation to treat both final variables equally, another table model is required that results in a similar weighting for both final variables. This is shown in Figure 8.
This table model achieves a balance between the number of flows in the sector (0.55) and the percentage of these flows that have an impact of Level 5 (0.45). The calculation of the complexity of the sector is now balanced with respect to the real data. As in each of the previous steps, machine learning has been used to define a model that adapts to the data and learns from them. In other words, this model is tailored for each possible scenario within the airspace.
To validate the machine learning models used to calculate the complexity of the ATC sectors, accuracy has been used again. All the machine learning models again achieve an accuracy of over 97.5%. The model used to obtain the results in Figure 7 has an accuracy of 99.3%, whereas the model used to obtain the results in Figure 8 has an accuracy of 99.0%. This last step will determine the final complexity of the sectors. The classification ranges from Level 1, the least complex, to Level 5, the most complex. Figure 9 shows the results for five sectors of Spanish airspace on a typical day.
Of the five sectors studied, Castejón (LECMCJI) is the most complex, at Level 5. Arrival and departure traffic to and from Madrid-Barajas is concentrated in this sector, so it is very difficult to control. Santiago (LECMSAN) and central Barcelona (LECBCCC) are of complexity Level 4. These are control sectors that have a lot of traffic. Santiago also has free-route traffic, which increases the difficulty as regards control. The Balearic Islands (LECBBAS) and Pamplona (LECMPAU) sectors are less complex. Even so, they are still above the minimum level of complexity.
The results produced by the model are logical from an operational point of view. They enable ATC to anticipate the real complexity of the airspace and optimise its resources accordingly.

Conclusions and further work
By establishing a methodology that uses one day's worth of operational data within a sector, it is possible to get an idea of the complexity of that sector. This will enable ATC to anticipate the real complexity of the airspace and optimise its resources accordingly. The methodology is automatic; therefore, it is simple to apply, and the results are easy to interpret.
By incorporating the machine learning tool, it is possible to know the relative importance of the different variables (initial, intermediate, final) used in each step of the methodology. This acts as a support to expert opinion and enables the values of relative importance in these steps to be updated to achieve the objectives of the step in question. In addition, this machine learning tool will learn based on the data entered in the model. This is useful because the results obtained will be different depending on the scenario in question. Thus, the characteristics of each volume of airspace are captured with the help of machine learning.
Specifically, when this methodology is evaluated using data captured in 2019 from Spanish airspace, the following conclusions are possible:
When calculating the mean impact of a flow (Figure 3), ATFCM regulations are of paramount importance. When there is excess demand with respect to capacity in the sector, this creates complexity for ATC, and this is when the need for regulations arises.
When calculating the impact variability of a flow (Figure 3), ATFCM regulations are again very important. In this case, the traffic density is also important. The same conclusions as in the calculation of the mean impact can be drawn.
The table models for calculating the impact of the flows (Figure 5) and the complexity of the sectors (Figure 8) behave differently and will, therefore, need to be balanced differently by the machine learning model. To automate these steps, a database with different table models will be required. The machine learning model will then choose the most suitable of these for the scenario in question.
The results obtained when calculating the impact of the flows (Figure 6) are correct from an operational point of view. The flows identified in the LECMPAU sector correspond to the traffic flows actually experienced in the sector. Furthermore, the impact levels that were calculated are consistent with real operations on a typical day.
The results when calculating the complexity of the sectors (Figure 9) also appear to be correct. All the sectors studied have levels of complexity above the minimum, but these vary according to their intrinsic characteristics.
Based on the results obtained in the specific case outlined, it is safe to conclude that the methodology produces realistic and easily interpretable results. That having been said, the methodology is in a preliminary phase, and certain aspects will have to be refined to improve upon it. These include:
Apply to more sectors. This will produce a greater variety of results and will allow more exhaustive testing of the methodology.
Review the definition of the four initial variables. It is clear that, of the four initial variables, "regulations" is by far the most influential. Although this is entirely logical, it would be wise to review the choice of initial variables to ensure that each of them has at least some influence on the overall results.
Study the incorporation of other variables when calculating the complexity of the sectors. The complexity of a sector is currently defined based on the number of flows and the percentage of flows of impact Level 5. There are, however, many other parameters that influence the complexity of a sector. It would be interesting to see if an expanded list of parameters could be incorporated within this table model.
Add more variety to the table models. This would give a more exact fit.
The objective of all future steps should be to improve the model, both by reviewing the current model to see where it needs to be modified and by incorporating additional parameters to make it even more complex and realistic. That being said, the results to date are promising and indicate that the methodology is worthwhile.

Figure 1 Overview of the characterisation methodology

Figure 2 Initial relative importance of the four initial variables using expert opinion

Figure 4 Initial table model used to calculate the impact

Figure 7 Initial table model used to calculate the complexity