-
Optimizing Collision Avoidance in Dense Airspace using Deep Reinforcement Learning
Sheng Li, Maxim Egorov and Mykel Kochenderfer
06/19/2019
19 June, 2019 Optimizing Collision Avoidance in Dense Airspace using Deep Reinforcement Learning1
-
High Density Airspace Operations
A dense airspace is defined as one where, for aircraft having encounters, $\Pr(\text{num\_intruders} > 1) > 50\%$
Future airspace simulation for Auckland
-
High Density Airspace Operations
Unmanned Flight Management
CAS: Collision Avoidance Systems
-
Outline
• Current methods
• Proposed solution: deep correction
• Results and analysis
• Summary and future work
-
Current Methods for Multi-agent CAS
| | ORCA [1] | ACAS X [2] |
| Principle | Geometry based | Value table based |
| Advantages | Fast; smooth | Robust; safe; can handle uncertainties |
| Disadvantages | Hard to tune; sensitive to uncertainties; sometimes infeasible | Optimized for pairwise encounters; possibly over-conservative |
Not optimized for dense airspace
A multi-agent encounter
-
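VICAS: resolve pairwise encounter
[Diagram: pairwise encounter geometry between the ownship (velocity $v_{own}$) and the intruder (velocity $v_{int}$), annotated with $\rho$, $\theta$, and $\psi$]
• Markov decision process (MDP)
• Assume horizontal (2D) encounters
• State: $s = (\rho, \theta, \psi, v_{own}, v_{int})$
• Action: $A = \{-10, -5, 0, +5, +10\}$ °/sec for heading $\cup$ {Clear of Conflict (COC)}
• Reward: $R(s, a) = R_{closeness}(s) + R_{colli}(s) + R_{[\ldots]}$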
-
Utility Decomposition and Fusion
Utility Decomposition: dividing the encounter into pairwise encounters
Utility Fusion: “adding-up” pairwise utilities to decide on safe actions
[Diagram: the ownship state $s$ of a multi-intruder encounter is decomposed into pairwise states $s_1, s_2, \ldots, s_n$; each pairwise state yields a utility $Q^*_1, Q^*_2, \ldots, Q^*_n$; fusion combines them into $Q^*_{low}$, and the ownship action is $\arg\max Q^*_{low}$]
-
VICAS for Multi-Intruders
[Diagram: decomposition of $s$ into $s_1, s_2, \ldots, s_n$, pairwise utilities $Q^*_1, Q^*_2, \ldots, Q^*_n$, fusion into $Q^*_{low}$, and $\arg\max Q^*_{low}$ as the ownship action]
VICASClosest: using the closest intruder
$Q^*_{low}(s, a) \approx Q^*_i(s_i, a), \quad i = \arg\min_{j \in \{1, \ldots, n\}} \rho_j$
• A very rough approximation
VICASMulti: using all the $n$ intruders
$Q^*_{low}(s, a) \approx \min_{i \in \{1, \ldots, n\}} Q^*_i(s_i, a)$
• Considers the most dangerous intruder for each action, risk averse
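The two fusion rules on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the table `pairwise_q`, the `ranges` vector, and the function names are hypothetical, and the pairwise utilities are assumed to be already available as one row per intruder and one column per action.

```python
import numpy as np

def fuse_closest(pairwise_q, ranges):
    """VICASClosest-style fusion: keep only the closest intruder's utilities."""
    closest = int(np.argmin(ranges))  # i = argmin_j rho_j
    return pairwise_q[closest]

def fuse_min(pairwise_q):
    """VICASMulti-style fusion: for each action, take the minimum over intruders."""
    return pairwise_q.min(axis=0)

# Hypothetical pairwise utilities Q_i*(s_i, a): rows are intruders, columns are actions.
pairwise_q = np.array([
    [-1.0, -0.2, -3.0],
    [-0.5, -4.0, -0.1],
])
ranges = np.array([800.0, 300.0])  # hypothetical distances rho_j, in metres

action_closest = int(np.argmax(fuse_closest(pairwise_q, ranges)))
action_multi = int(np.argmax(fuse_min(pairwise_q)))
```

In this made-up example the two rules disagree: the closest-intruder rule ignores the first intruder entirely, while the min rule rejects any action that is bad for either intruder.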
-
Airspace Simulation
ORCA with realistic “clamped” dynamics:
• $\tau = 1$ sec, $R = 150$ m
• $v_{max} = 50$ m/sec, $a_{max} = 2$ m/sec²
• $TurnRate_{max} = 10$ °/sec
High NMAC rates due to 2nd-order dynamics
VICASMulti:
• $v = 50$ m/sec
• $TurnRate \in \{\pm 10, \pm 5, 0\}$ °/sec
Lower NMAC rates but unsteady airspace
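To make the VICASMulti motion model concrete, here is a rough sketch of one simulation step with constant speed and a commanded turn rate. The speed and turn-rate set come from the slide; the 1-second time step, the heading convention (degrees counterclockwise from the +x axis), the simple Euler update, and the function name `step` are assumptions made only for illustration.

```python
import math

TURN_RATES_DEG = (-10.0, -5.0, 0.0, 5.0, 10.0)  # deg/sec, from the slide
SPEED = 50.0  # m/sec, from the slide

def step(x, y, heading_deg, turn_rate_deg, dt=1.0):
    """Advance one aircraft by dt seconds (dt = 1.0 is an assumed time step)."""
    # The heading changes at the commanded turn rate; the speed stays constant.
    heading_deg = (heading_deg + turn_rate_deg * dt) % 360.0
    heading = math.radians(heading_deg)
    x_new = x + SPEED * math.cos(heading) * dt
    y_new = y + SPEED * math.sin(heading) * dt
    return x_new, y_new, heading_deg

# Fly straight for one step, then apply the strongest left turn for one step.
state = step(0.0, 0.0, 0.0, TURN_RATES_DEG[2])
state = step(*state, TURN_RATES_DEG[4])
```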
-
Safety and Efficiency
Safety: Near Mid-Air Collision (NMAC) Rate
[Plot: NMACs per flight hour (×10⁻³) vs. take-off rate (flights / km²-hr, 10 to 60) for No CAS, ORCA, VICASClosest, and VICASMulti]
NMAC rates of VICASMulti increase explosively
Efficiency: |Taken Path| / |Shortest Path|
[Plot: normalized route length (1.0 to 2.0) vs. take-off rate (flights / km²-hr, 10 to 60) for the same methods]
Low efficiency causes congestion in the airspace
Region: 10 km × 10 km, Demand: geographically uniform, Simulation Time: 5000 sec
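The two metrics on this slide are simple ratios. The sketch below shows one way they could be computed from logged data; the function names are hypothetical, and it assumes that the shortest path is the straight line from the first to the last logged position and that flight time is logged in seconds.

```python
import math

def path_length(points):
    """Total length of a polyline given as a list of (x, y) positions."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def normalized_route_length(points):
    """|Taken Path| / |Shortest Path|, with the straight line as shortest path (assumption)."""
    shortest = math.dist(points[0], points[-1])
    return path_length(points) / shortest

def nmacs_per_flight_hour(num_nmacs, total_flight_seconds):
    """NMAC count divided by the accumulated flight time in hours."""
    return num_nmacs / (total_flight_seconds / 3600.0)

# Hypothetical detour: 5 m + 5 m flown to cover a 6 m straight-line distance.
detour = [(0.0, 0.0), (3.0, 4.0), (6.0, 0.0)]
ratio = normalized_route_length(detour)
```

A ratio of 1.0 means the aircraft flew the direct route; larger values mean avoidance maneuvers lengthened the trip.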
-
Outline
• Current methods
• Proposed solution: deep correction
• Results and analysis
• Summary and future work
-
What is Deep Correction?
• Multi-fidelity optimization
• The high fidelity model ($f_{hi}$) is expensive to evaluate
• Use a surrogate model:
$f_{hi} \approx f_{low} + \delta$
$Q^* \approx Q^*_{low} + \delta$
• Correction $\delta$ is a deep Q-network
[Diagram for deep correction: the pairwise states $s_1, \ldots, s_n$ give utilities $Q^*_1, \ldots, Q^*_n$, which are fused into $Q^*_{low}$; the output of the correction $\delta$, a DQN with weights $\theta$, is added before the argmax selects the agent's action]
-
Why Deep Correction?
• $Q^*$ is hard to optimize
• $Q^*_{low}$ is easy to obtain
• Deep Q-network is powerful
$Q^* \approx Q^*_{low} + \delta$
[Diagram for deep correction, as on the previous slide]
-
Deep Correction
Utility decomposition / fusion + deep correction:
$\hat{Q}^* = (1 - k)\, Q^*_{low} + k\,\delta$
[Diagram: the state $s$ is decomposed into pairwise states $s_1, s_2, \ldots, s_n$ with utilities $Q^*_1, Q^*_2, \ldots, Q^*_n$; fusion (VICASMulti) gives $Q^*_{low}$; the output of the correction $\delta$, a DQN with weights $\theta$, is added, and the argmax selects the agent's action]
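The weighted combination on this slide can be illustrated with made-up numbers. In the sketch below, `q_low` stands for the fused low-fidelity utilities, `delta` for the outputs of the correction network, and `k` for the mixing weight; all values and the function name `corrected_q` are hypothetical.

```python
import numpy as np

def corrected_q(q_low, delta, k):
    """Blend the low-fidelity utilities with the correction: (1 - k) * Q_low + k * delta."""
    return (1.0 - k) * q_low + k * delta

q_low = np.array([-1.0, -4.0, -3.0])  # hypothetical fused utilities, one per action
delta = np.array([-4.0, 0.0, -3.5])   # hypothetical correction-network outputs
k = 0.5                               # hypothetical mixing weight

q_hat = corrected_q(q_low, delta, k)
action_without_correction = int(np.argmax(q_low))
action_with_correction = int(np.argmax(q_hat))
```

With `k = 0` the policy falls back to the uncorrected fusion result; in this toy example the correction is large enough to change the selected action.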
-
Deep Q-Network
[Diagram: a neural network with weights $\theta$ maps the state $s$ to $Q(s, a; \theta)$]
Neural networks are universal nonlinear function approximators
-
Deep Q-Learning
Example: training DQNs to play Atari games [3]
Deep Q-learning can achieve superhuman performance in Atari games
-
Correction State Formation
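[Diagram: VICASMulti maps the state $s$ to $Q^*_{low}$; the correction $\delta$, a DQN with weights $\theta$, takes a separate correction state as input, and its output is added to $Q^*_{low}$ before the argmax selects the agent's action]
• The correction state needs a fixed size
• Two constructions (shown as figures): CorrectedSector and CorrectedClosest
Add destination info in state:
• Efficiency incentive in the reward
• $R(s, a) = R_{closeness}(s) + R_{colli}(s) + R_{[\ldots]}$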
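The slide does not spell out how a fixed-size correction state is built from a variable number of intruders. One plausible reading of a sector-based construction, given here purely as an assumption and not as the authors' method, is to split the surroundings into equal angular sectors and keep the range of the closest intruder in each. The function name, the sector count, and the `max_range` filler value are all hypothetical.

```python
import math

def sector_state(intruders, num_sectors=4, max_range=5000.0):
    """Build a fixed-length state from (range_m, bearing_rad) pairs.

    Each entry is the range of the closest intruder in one angular sector;
    empty sectors are filled with max_range (an assumed filler value).
    """
    state = [max_range] * num_sectors
    width = 2.0 * math.pi / num_sectors
    for rng, bearing in intruders:
        # Shift the bearing from [-pi, pi) to [0, 2*pi) and pick the sector index.
        idx = int(((bearing + math.pi) % (2.0 * math.pi)) // width)
        idx = min(idx, num_sectors - 1)  # guard against floating-point edge cases
        state[idx] = min(state[idx], rng)
    return state

# Three hypothetical intruders; the output length is 4 regardless of their number.
state = sector_state([(1000.0, 0.0), (400.0, 0.1), (2500.0, -3.0)])
```

The point of the sketch is only that the vector length depends on the number of sectors, not on the number of intruders, which is what a network with a fixed input layer requires.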
-
Outline
• Current methods
• Proposed solution: deep correction
• Results and analysis
• Summary and future work
-
Policy Sensitivity
Advisory maps (pairwise): corrected CAS have more compact alert areas
-
Policy Sensitivity
Advisory maps (multi-threat): corrected CAS have more compact alert areas
-
Policy Sensitivity
Encounter simulations with fixed numbers of aircraft
Corrected CAS have low alert frequencies
-
Trajectories
• No CAS: an encounter with NMAC
• VICASMulti: winding routes and oscillations in actions
• VICASClosest: less winding routes
-
Trajectories
CorrectedClosest CorrectedSector
Avoiding collision with minimal maneuvers and straightforward paths
-
Airspace Simulation
VICASMulti:
• $v = 50$ m/sec
• $TurnRate \in \{\pm 10, \pm 5, 0\}$ °/sec
Unsteady airspace
CorrectedClosest:
• $v = 50$ m/sec
• $TurnRate \in \{\pm 10, \pm 5, 0\}$ °/sec
Lower NMAC rates and steady airspace
-
Safety and Efficiency
Safety: NMAC Rate
[Plot: NMACs per flight hour (×10⁻³) vs. take-off rate (flights / km²-hr, 10 to 60) for No CAS, ORCA, VICASClosest, VICASMulti, CorrectedSector, and CorrectedClosest]
Corrected CAS have low NMAC rates
Efficiency: |Taken Path| / |Shortest Path|
[Plot: normalized route length (1.0 to 2.0) vs. take-off rate (flights / km²-hr, 10 to 60) for the same methods]
Corrected CAS have decent route efficiency
Region: 10 km × 10 km, Demand: geographically uniform, Simulation Time: 5000 sec
-
Trade-off
[Two plots: NMACs per flight hour (×10⁻³) vs. normalized route length at take-off rates of 20, 40, and 60, for No CAS, ORCA, VICASClosest, VICASMulti, CorrectedSector, and CorrectedClosest. The first plot spans normalized route lengths 1 to 8; the second spans 1.0 to 2.0.]
Corrected CAS are the best performing points at the bottom left corners of the Pareto frontiers
-
Impact on Encounters
The impact of the active CAS on the encounter distribution:
$D_{TV}(P^{\circ} \,\|\, P) = \frac{1}{2} \sum_{x} \left| P^{\circ}(x) - P(x) \right|$
$x$ is the number of intruders in an encounter
[Plot: encounter distributions over # intruders]
Corrected CAS have low impact on the encounter distribution
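The distance on this slide, half the summed absolute difference between two distributions over the number of intruders, takes only a few lines to compute. The sketch below assumes each distribution is given as a dictionary from intruder count to probability; the function name and the example numbers are hypothetical.

```python
def total_variation(p0, p):
    """Half the sum of |p0(x) - p(x)| over every intruder count x seen in either input."""
    support = set(p0) | set(p)
    return 0.5 * sum(abs(p0.get(x, 0.0) - p.get(x, 0.0)) for x in support)

# Hypothetical encounter distributions over the number of intruders.
baseline = {1: 0.5, 2: 0.3, 3: 0.2}
with_cas = {1: 0.4, 2: 0.4, 3: 0.2}
distance = total_variation(baseline, with_cas)
```

The result is 0 when the two distributions are identical and 1 when they share no support, so smaller values mean the collision avoidance system changes the mix of encounters less.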
-
Outline
• Current methods
• Proposed solution: deep correction
• Results and analysis
• Summary and future work
-
Summary for Deep Correction
• Using a deep Q-network as correction term
• Trained by deep Q-learning
• Both safety and efficiency are improved in dense airspace
• Impact on encounter distribution is small
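[Diagram for deep correction: the pairwise states $s_1, \ldots, s_n$ give utilities $Q^*_1, \ldots, Q^*_n$, which are fused into $Q^*_{low}$; the output of the correction $\delta$, a DQN with weights $\theta$, is added before the argmax selects the agent's action]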
-
Future Work
• Exploring relationships between strategic deconfliction and on-board collision avoidance
• “End-to-end”: integrating guidance and collision avoidance
• “Reciprocal”: considering the “reactive nature” of the other aircraft
• Using a multi-agent reinforcement learning framework
[Diagram: a shared policy receives observations obs 1, obs 2, …, obs n from the world and returns Action 1, Action 2, …, Action n]
A framework with centralized policy and decentralized control
-
Q&A
-
References
[1] J. van den Berg, et al., “Reciprocal n-body collision avoidance,” in Robotics Research. Berlin, Heidelberg: Springer, 2011, pp. 3–19.
[2] M. J. Kochenderfer, J. E. Holland, and J. P. Chryssanthacopoulos, “Next generation airborne collision avoidance system,” Lincoln Laboratory Journal, vol. 19, no. 1, pp. 17–33, 2012.
[3] V. Mnih, et al., “Playing Atari with deep reinforcement learning,” 2013. Available at https://arxiv.org/abs/1312.5602.