data driven modeling of systemic delay propagation under severe meteorological conditions
DESCRIPTION
Report on one of the ComplexWorld Network's PhD research project's. Find further information at http://complexworld.eu/wiki/Uncertainty_in_ATMTRANSCRIPT
http://ifisc.uib-csic.es - Mallorca - Spain
Complex World | Seminar 2013
@ifisc_mallorca
www.facebook.com/ifisc
Data-driven modeling of systemic delay propagation under severe meteorological conditions
Pablo Fleurquin José J. Ramasco Victor M Eguíluz
http://ifisc.uib-csic.es
Outline
Motivation
Air-traffic data
Network & Cluster construction
Data Results
Model definition
Comparison: model – reality
Effect of large scale disruptions on the system
Conclusions
http://ifisc.uib-csic.es
Why is it important?
(http://www.transtats.bts.gov/)
• Total cost of flight delay in US in 2007 was 41B dollars. • Rich transport dynamics. • Cascading failure.
(http://www.eurocontrol.int )
30%
41%
0%
25%
4%Air Carrier Delay
Aircraft Arriving Late
Security Delay
National Aviation System Delay
Extreme Weather
http://ifisc.uib-csic.es
Database & network
Database:
• Airline On-Time Performance Data (www.bts.gov)
Ø Schedule & actual departure (arrival) times Ø Origin & destination airports Ø Airline id Ø Tail number
• 2010 flights:
Ø 6,450,129 flights (74 %) Ø 18 carriers Ø 305 airports
Network: • Nodes: airports • Edges: direct flights between airports • Node attributes: average delay per flight
http://ifisc.uib-csic.es
Cluster definition
Clusters: • Formed by airports in problem
Ø average delay per flight > 29 min
• Must be connected (flight route between them) • A group of airports connected by flights that their average delay is higher than 29 minutes
Cluster(A(size(4((
Cluster(B(size(2((
http://ifisc.uib-csic.es
Clusters: • Formed by airports in problem
Ø average delay per flight > 29 min
• Must be connected (flight route between them) • A group of airports connected by flights that their average delay is higher than 29 minutes
Cluster definition
Cluster(A(size(4((
Cluster(B(size(2((
http://ifisc.uib-csic.es
Largest daily cluster
Clusters: • Formed by airports in problem
Ø average delay per flight > 29 min
• Must be connected (flight route between them) • A group of airports connected by flights that their average delay is higher than 29 minutes • April 19, 2010 • Average delay per delayed flight:
Ø 16.9 min
Cluster(A(size(4((
Cluster(B(size(2((
http://ifisc.uib-csic.es
• March 9, 2010 • Average delay per delayed flight:
Ø 25.7 min
Clusters: • Formed by airports in problem
Ø average delay per flight > 29 min
• Must be connected (flight route between them) • A group of airports connected by flights that their average delay is higher than 29 minutes
Largest daily cluster
Cluster(A(size(4((
Cluster(B(size(2((
http://ifisc.uib-csic.es
Clusters: • Formed by airports in problem
Ø average delay per flight > 29 min
• Must be connected (flight route between them) • A group of airports connected by flights that their average delay is higher than 29 minutes • March 12, 2010 • Average delay per delayed flight:
Ø 53.2 min
Largest daily cluster
Cluster(A(size(4((
Cluster(B(size(2((
http://ifisc.uib-csic.es
• Great variety • Consecutive days are very different each other.
Cluster size
3.4. Cluster and airport dynamics 19
The distribution of �TAT also shows long tails both in the positive and neg-ative values (Figure 3.12). Another indication of the complex nature of this phe-nomenon.
3.4 Cluster and airport dynamics
The focus so far has been on individual flight delays. We define now a metric ofcongestion for the full network. As mentioned in the previous section we consideran airport as congested during a given time period whenever the average delay ofall its departing flights in that period exceeds the 29 minutes threshold. Because ofthe low operating activity in the early morning that disrupts the delay propagationdynamics from day to day (see Section 3.2), a daily airport network is built using theday flights to assess whether congested airports form connected clusters. Note thatbeing in the same cluster is a measure of spatio-temporal correlation of congestionbut not necessarily a sign of a cause-e↵ect relation. Maps with the congested airportsand the connections between them are shown for di↵erent days in Figures 3.14 A-C. We analyzed days with di↵erent level of congestion given by the daily averagedeparture delay: March 12 high congestion, April 19 low congestion (see Table 3.4)and in order to explore what happen at an intermediate level of congestion weselected March 9. The scenario dramatically changes from day to day: in some daysa large cluster surges covering 1/3 of all airports (high congestion), while in othersonly one or two airports cluster together (low congestion). At an intermediate levelsome airports rise as congested but they are not able to merge into a cluster. Thesebehaviors indicates that connectivity is thus an important factor to produce highcongestion and consequently delays are propagating through connected airports inan intra-day time period.
A B
0 60 120 180 240 300 360
Day0
20
40
60
80
100
120
Larg
est c
lust
er s
ize
0 20 40 60 80 100 120
Largest cluster size [Airports]10
-3
10-2
10-1
100
P(>s
ize)
Characteristic Size: 20.1
slope ~ -0.0496
Figure 3.13. (A) Daily size of the largest cluster. (B) Complementary cumulative distri-bution of the size of the largest cluster (log-normal scale).
Taking into account all days of 2010 the largest connected cluster size is ex-plored as a function of the day (Figure 3.13 A). A strong variability is thus the
http://ifisc.uib-csic.es
Intraday cluster evolution
Evolution of clusters for March 12:
http://ifisc.uib-csic.es
• Jaccard Index: • Great variety • Consecutive days are very different each other • For consecutive days not only they differ in the cluster size also the airports comprising the cluster are different.
Cluster composition
0 60 120 180 240 300 360Day
0
0.2
0.4
0.6
0.8
1
Jacc
ard
Inde
x
0 5 10 15 20Ranking
0
0.2
0.4
0.6
Jacc
ard
Inde
x Top 20 (best days)Top 20 (worst days)
A B
http://ifisc.uib-csic.es
Ø Flight rotation (same tail number)
§ Ts § Schedule (arrival/departure)
Ø Flight connectivity (different tail number)
§ ΔT § α
Scheduled departure time
Flight E | Airline X
Actual arrival time. Flight D | Airline X
Actual departure time. Flight E | Airline X Sch. arrival
time. Flight D | Airline X
Departure delay
waiting time
Ø Airport Congestion
§ Schedule Airport Arrival Rate (SAAR) § First Arrived First served § β
Model definition
c Inbound delay Departure delay
Scheduled turn around time
Scheduled arrival time
Scheduled departure time
Actual arrival time Flight A
Actual departure time Flight A
Ts
4am
6am
8am
10am
12am 2p
m4p
m6p
m8p
m10
pm12
pm 2am
Time [EST]
0
20
40
60
80
100
SA
AR
ATLORDDEN
March 12
Initial Conditions § From the data…
- Known à when, where and the departure delay for the first flight of the sequence.
§ Random initial conditions…
- Fixed initial delay (min) - % of initially delayed planes
4.3. Subprocesses 29
The equation that govern the rotation subprocess is given by:
T jact.d(pij) = max[T j
sch.d(pij);Tjact.a(pij) + Ts] (4.2)
where j corresponds to destination airport and i to the origin one. Thesubindexes act.d,act.a and sch.d correspond respectively to Actual Departure, Ac-tual Arrival and Schedule Departure.
4.3.2 Flight connectivity
In addition to rotational reactionary delay, the need to wait for load, connectingpassengers and/or crew from another delayed airplane from the same fleet (airlineid) may cause, as well, reactionary delay.
Figure 4.4: Possible connections within flights of the same airline.
For each flight at a particular airport, connections from that airport are ran-domly chosen as follows. Firstly, we take a �T window prior to the scheduleddeparture time of the flight. Secondly, we distinguish possible connections of thesame airline from other flights, that have a scheduled arrival time within the �Twindow (Flights B and D in the example of Figure 4.4). Finally, from these possi-ble connections we select those with probability ↵ ⇤ flight connectivity factor. Theflight connectivity factor was defined in 3.1.2 and ↵ is an e↵ective parameter ofcontrol that allows to modify the strength of this e↵ect in the model. For instance,↵ = 0 means that there is no connection between flights with di↵erent tail number,while ↵ = 1 makes the fraction of connecting flights of the same airline equal tothe fraction of connecting passengers in the given airport. In the simulations, ↵ isvaried according to the case under study and �T is always taken to be 180 minutes(3 hours).
Let us suppose that from the previous example Flight D was randomly selected. Bythis subprocess an airplane is able to fly if and if only their connections have alreadyarrived to the airport, if not it has to wait until this condition is satisfied (Figure4.5). It is important to note that flight connectivity is the only source of stochasticityin the model due to a lack of knowledge about the real flight connections within theschedule. In this case the Actual Departure time of the next flight leg is given by:
T jact.d(pij) = max[T j
sch.d(pij);Tjact.a(pij) + Ts; max[T j
act.a(pi0j)]], 8i0 6= i (4.3)
4.3. Subprocesses 29
The equation that govern the rotation subprocess is given by:
T jact.d(pij) = max[T j
sch.d(pij);Tjact.a(pij) + Ts] (4.2)
where j corresponds to destination airport and i to the origin one. Thesubindexes act.d,act.a and sch.d correspond respectively to Actual Departure, Ac-tual Arrival and Schedule Departure.
4.3.2 Flight connectivity
In addition to rotational reactionary delay, the need to wait for load, connectingpassengers and/or crew from another delayed airplane from the same fleet (airlineid) may cause, as well, reactionary delay.
Figure 4.4: Possible connections within flights of the same airline.
For each flight at a particular airport, connections from that airport are ran-domly chosen as follows. Firstly, we take a �T window prior to the scheduleddeparture time of the flight. Secondly, we distinguish possible connections of thesame airline from other flights, that have a scheduled arrival time within the �Twindow (Flights B and D in the example of Figure 4.4). Finally, from these possi-ble connections we select those with probability ↵ ⇤ flight connectivity factor. Theflight connectivity factor was defined in 3.1.2 and ↵ is an e↵ective parameter ofcontrol that allows to modify the strength of this e↵ect in the model. For instance,↵ = 0 means that there is no connection between flights with di↵erent tail number,while ↵ = 1 makes the fraction of connecting flights of the same airline equal tothe fraction of connecting passengers in the given airport. In the simulations, ↵ isvaried according to the case under study and �T is always taken to be 180 minutes(3 hours).
Let us suppose that from the previous example Flight D was randomly selected. Bythis subprocess an airplane is able to fly if and if only their connections have alreadyarrived to the airport, if not it has to wait until this condition is satisfied (Figure4.5). It is important to note that flight connectivity is the only source of stochasticityin the model due to a lack of knowledge about the real flight connections within theschedule. In this case the Actual Departure time of the next flight leg is given by:
T jact.d(pij) = max[T j
sch.d(pij);Tjact.a(pij) + Ts; max[T j
act.a(pi0j)]], 8i0 6= i (4.3)
4.4. Initial conditions 31
Figure 4.6. Example of SAAR for three major airports: Atlanta International Airport(ATL), O’Hare International Airport (ORD) and Denver International Airport (DEN).
When aircraft rotation and airport congestion is present the equation is ruledby:
T jact.d(pij) = max[T j
sch.d(pij);Tjq (pij) + T j
act.a(pij) + Ts] (4.4)
where q means the time spent by the aircraft in the queue waiting to beserved. Finally, the full model dynamics is govern by a combination of the threesubprocesses:
T jact.d(pij) = max[T j
sch.d(pij);Tjq (pij) + T j
act.a(pij) + Ts; max[T jact.a(pi0j)]], 8i0 6= i (4.5)
4.4 Initial conditions
Initial condition refers to the situation of the first flight of an aircraft sequence,meaning when, where and the departure delay of this flight. Variations on thissituation can have a great impact on the delay propagation. In other words, thedynamics of delays over the network is highly sensitive to the initial conditions.
We characterized initial conditions by the average delay per flight for the firstflights of all the aircraft sequences and by the fraction of airplanes that their firstflight was delayed. Comparing the ranking of the 20 worst and best days of 2010(Figure 4.7) we can observe that it is most likely that if a day started with unfavor-able initial conditions it will likely produce large congested clusters.
The simulations can be initialized by two di↵erent ways depending on the caseunder study: from data or random initial conditions.
http://ifisc.uib-csic.es
Delay propagation dynamics
http://ifisc.uib-csic.es
Data/Model comparison
Data and model comparison for March 12 and April 19
A) B)
C) D)
Full model Plane rotation
Airport congestion
March 12 April 19
Connections
4am
8am
12am 4p
m8p
m12
pm0
20
40
60
80
100
Clu
ster
siz
e DataModel
4am
8am
12am 4p
m8p
m12
pm0
1
2
3
Time (EST)4a
m8a
m12
am 4pm
8pm
12pm
0
20
40
60
80
100
Clu
ster
siz
e DataModel
4am
8am
12am 4p
m8p
m12
pm0
1
2
3
Time (EST)
March 12 April 19
March 12 April 19 March 12 April 19
4am
8am
12am 4p
m8p
m12
pm0
20
40
60
80
100
Clu
ster
siz
e DataModel
4am
8am
12am 4p
m8p
m12
pm0
1
2
3
Time (EST)4a
m8a
m12
am 4pm
8pm
12pm
0
20
40
60
80
100
Clu
ster
siz
e DataModel
4am
8am
12am 4p
m8p
m12
pm0
1
2
3
Time (EST)
Good agreement between model and reality.
http://ifisc.uib-csic.es
Data/Model comparison
Data and model comparison for March 12 and April 19
A) B)
C) D)
Full model Plane rotation
Airport congestion
March 12 April 19
Connections
4am
8am
12am 4p
m8p
m12
pm0
20
40
60
80
100
Clu
ster
siz
e DataModel
4am
8am
12am 4p
m8p
m12
pm0
1
2
3
Time (EST)4a
m8a
m12
am 4pm
8pm
12pm
0
20
40
60
80
100
Clu
ster
siz
e DataModel
4am
8am
12am 4p
m8p
m12
pm0
1
2
3
Time (EST)
March 12 April 19
March 12 April 19 March 12 April 19
4am
8am
12am 4p
m8p
m12
pm0
20
40
60
80
100
Clu
ster
siz
e DataModel
4am
8am
12am 4p
m8p
m12
pm0
1
2
3
Time (EST)4a
m8a
m12
am 4pm
8pm
12pm
0
20
40
60
80
100
Clu
ster
siz
e DataModel
4am
8am
12am 4p
m8p
m12
pm0
1
2
3
Time (EST)
Good agreement between model and reality.
A) B)
C) D)
Full model Plane rotation
Airport congestion
March 12 April 19
Connections
4am
8am
12am 4p
m8p
m12
pm0
20
40
60
80
100
Clus
ter s
ize Data
Model
4am
8am
12am 4p
m8p
m12
pm0
1
2
3
Time (EST)4a
m8a
m12
am 4pm
8pm
12pm
0
20
40
60
80
100
Clus
ter s
ize Data
Model
4am
8am
12am 4p
m8p
m12
pm0
1
2
3
Time (EST)
March 12 April 19
March 12 April 19 March 12 April 19
4am
8am
12am 4p
m8p
m12
pm0
20
40
60
80
100
Clus
ter s
ize Data
Model
4am
8am
12am 4p
m8p
m12
pm0
1
2
3
Time (EST)4a
m8a
m12
am 4pm
8pm
12pm
0
20
40
60
80
100
Clus
ter s
ize Data
Model
4am
8am
12am 4p
m8p
m12
pm0
1
2
3
Time (EST)
A) B)
C) D)
Full model Plane rotation
Airport congestion
March 12 April 19
Connections
4am
8am
12am 4p
m8p
m12
pm0
20
40
60
80
100
Clus
ter s
ize Data
Model
4am
8am
12am 4p
m8p
m12
pm0
1
2
3
Time (EST)4a
m8a
m12
am 4pm
8pm
12pm
0
20
40
60
80
100
Clus
ter s
ize Data
Model
4am
8am
12am 4p
m8p
m12
pm0
1
2
3
Time (EST)
March 12 April 19
March 12 April 19 March 12 April 19
4am
8am
12am 4p
m8p
m12
pm0
20
40
60
80
100
Clus
ter s
ize Data
Model
4am
8am
12am 4p
m8p
m12
pm0
1
2
3
Time (EST)4a
m8a
m12
am 4pm
8pm
12pm
0
20
40
60
80
100
Clus
ter s
ize Data
Model
4am
8am
12am 4p
m8p
m12
pm0
1
2
3
Time (EST)
A) B)
C) D)
Full model Plane rotation
Airport congestion
March 12 April 19
Connections
4am
8am
12am 4p
m8p
m12
pm0
20
40
60
80
100
Clus
ter s
ize Data
Model
4am
8am
12am 4p
m8p
m12
pm0
1
2
3
Time (EST)4a
m8a
m12
am 4pm
8pm
12pm
0
20
40
60
80
100
Clus
ter s
ize Data
Model
4am
8am
12am 4p
m8p
m12
pm0
1
2
3
Time (EST)
March 12 April 19
March 12 April 19 March 12 April 19
4am
8am
12am 4p
m8p
m12
pm0
20
40
60
80
100
Clus
ter s
ize Data
Model
4am
8am
12am 4p
m8p
m12
pm0
1
2
3
Time (EST)4a
m8a
m12
am 4pm
8pm
12pm
0
20
40
60
80
100
Clus
ter s
ize Data
Model
4am
8am
12am 4p
m8p
m12
pm0
1
2
3
Time (EST)
http://ifisc.uib-csic.es
Data/Model comparison
Data and model comparison for March 12 and April 19
A) B)
C) D)
Full model Plane rotation
Airport congestion
March 12 April 19
Connections
4am
8am
12am 4p
m8p
m12
pm0
20
40
60
80
100
Clu
ster
siz
e DataModel
4am
8am
12am 4p
m8p
m12
pm0
1
2
3
Time (EST)4a
m8a
m12
am 4pm
8pm
12pm
0
20
40
60
80
100
Clu
ster
siz
e DataModel
4am
8am
12am 4p
m8p
m12
pm0
1
2
3
Time (EST)
March 12 April 19
March 12 April 19 March 12 April 19
4am
8am
12am 4p
m8p
m12
pm0
20
40
60
80
100
Clu
ster
siz
e DataModel
4am
8am
12am 4p
m8p
m12
pm0
1
2
3
Time (EST)4a
m8a
m12
am 4pm
8pm
12pm
0
20
40
60
80
100
Clu
ster
siz
e DataModel
4am
8am
12am 4p
m8p
m12
pm0
1
2
3
Time (EST)
Good agreement between model and reality.
http://ifisc.uib-csic.es
System resilience
• With random initial conditions…
• Each day is potentially a bad day, if some initial conditions are met. • Flight connectivity is a key factor for the rise of congestion in the network. • Sensitivity to initial conditions.
April 19 March 12
α =
0.1
α =
0.03
http://ifisc.uib-csic.es
What about October 27 ?
http://ifisc.uib-csic.es
External perturbation
4am
6am
8am
10am
12am 2p
m4p
m6p
m8p
m10
pm12
pm 2am
4am
Time [EST]
0
20
40
60
80
Clu
ster
siz
e p
er h
ou
r DataBasic modelBaseline model
http://ifisc.uib-csic.es
External perturbation: variants
Variant 1: • Baseline + … • Connectivity drops to 0
between 7 pm & 9 pm EST.
Variant 2: • Baseline + … • Connectivity drops to 0.13
between 7 pm & 9 pm EST.
Variant 3: • Baseline + … • Connectivity drops to 0.13
between 6 pm & 10 pm EST.
• Improve the matching. • Make sense to interpret cancelation policies as a decrease on the network connectivity. • Higher sensitivity to time period ΔTα .
Ø What about the declining phase ?
4am
6am
8am
10am
12am 2p
m4p
m6p
m8p
m10
pm12
pm 2am
4am
Time [EST]
0
20
40
60
80
Clu
ster
siz
e per
hour Data
Baseline modelVariant 1
A)
4am
6am
8am
10am
12am 2p
m4p
m6p
m8p
m10
pm12
pm 2am
4am
Time [EST]
0
20
40
60
80C
lust
er s
ize
per
hour Data
Baseline modelVariant 2
B)
4am
6am
8am
10am
12am 2p
m4p
m6p
m8p
m10
pm12
pm 2am
4am
Time [EST]
0
20
40
60
80
Clu
ster
siz
e per
hour Data
Baseline modelVariant 3
C)
http://ifisc.uib-csic.es
4am
6am
8am
10am
12am 2p
m4p
m6p
m8p
m10
pm12
pm 2am
4am
Time [EST]
0
20
40
60
80
100C
lust
er s
ize
per
ho
ur Data
Baseline modelSchedule: Oct 20
Effect of the schedule
• For comparison purposes: schedule of October 20.
§ This day showed a low level of congestion: largest cluster size of 2.
• Figure: Initial conditions of October 27 run using the schedule of October 20.
• Schedule of October 27 was not the reason for the unfolding of the delays. • Real intervention measures on October 27 were a palliative to the delay spreading mechanism.
http://ifisc.uib-csic.es
Conclusions
• We defined a way of measuring the network-wide spread of the delays
Ø Strong variability between days and intraday
• We introduced a model able to reproduce the cluster dynamics in the data
Ø Resilience of the system Ø Non-negligible risk of system instability (systemic delay) Ø Other transport modes
• Mimic external perturbations to the system.
Ø Perturbations could be model as a decrease in the airport capacity parameter. Ø Intervention measures modeled as a decrease in the network connectivity. Articles: Ø P. Fleurquin, J.J. Ramasco, V.M. Eguiluz, “Systemic delay propagation in the US airport
network”, Scientific Reports 3, 1159 (2013). Ø P. Fleurquin, J.J. Ramasco, V.M. Eguiluz, “ Data-driven modeling of systemic delay
propagation under severe meteorological conditions”, Tenth USA/Europe Air Traffic Management Research and Development Seminar 2013.
Ø Spanish patent pending, filed December 14 2012, number P201231942. Ø P. Fleurquin, J.J. Ramasco, V.M. Eguiluz, “Characterization of delay propagation in the airport
network”, submitted to Proceedings of the 2012 Air Transport Research Society Conference.