shield synthesis for ai - tumkretinsk/live_2019_shields.pdf · 2019-05-03 · bettina könighofer...
TRANSCRIPT
Bettina Könighofer Shield Synthesis for AI
www.iaik.tugraz.at www.iaik.tugraz.at
Shield Synthesis for AI
Bettina KönighoferRoderick Bloem
Ufuk TopcuScott NiekumMohammed AlshiekSuda Bharadwaj
Rüdiger Ehlers
Nils Jansen
Sebastian Junges
Rayna Dimitrova
Thomas HenzingerGuy AvniKrishnendu Chatterjee
Bettina Könighofer Shield Synthesis for AI
2
LiVe @ ETAPS, PragueApril 6, 2019
Reinforcement Learning
ReactiveSynthesis
Your Controller
Synthesis
Your Specification
𝑮 𝑏𝑙𝑜𝑐𝑘𝑒𝑑 → 𝑭𝑅 ∧ 𝑮 𝑏𝑙𝑜𝑐𝑘𝑒𝑑 → 𝑭𝑆∧ 𝑮 𝑏𝑙𝑜𝑐𝑘𝑒𝑑 → 𝑿 𝐶 𝑼 𝑏𝑙𝑜𝑐𝑘𝑒𝑑
Infinitely often, visit R and S. If S is blocked, go to C. Resume visiting R and S once S is unblocked.
via Game Solving
Bettina Könighofer Shield Synthesis for AI
3
LiVe @ ETAPS, PragueApril 6, 2019
Reinforcement Learning
ReactiveSynthesis
• Large• Complicated• Highly optimized• Many sensors• …
Your Controller
• Large• Hard to write• Greyscale
Your Specification
Bettina Könighofer Shield Synthesis for AI
4
Reinforcement Learning
Reinforcement Learning
ReactiveSynthesis
EnvironmentLearning
Agent𝑠𝑡𝑎𝑡𝑒
𝑎𝑐𝑡𝑖𝑜𝑛
𝑟𝑒𝑤𝑎𝑟𝑑
LiVe @ ETAPS, PragueApril 6, 2019
Bettina Könighofer Shield Synthesis for AI
5
LiVe @ ETAPS, PragueApril 6, 2019
Reinforcement Learning
ReactiveSynthesis
Correctness Guarantees
OptimalityHow?
Bettina Könighofer Shield Synthesis for AI
6
LiVe @ ETAPS, PragueApril 6, 2019
Reinforcement Learning
ReactiveSynthesis
Correctness Guarantees
Optimality
• Large• Complicated• Highly optimized• Many sensors• …
Your Controller
• Large• Hard to write• Greyscale
Your Specification
Shielding
Bettina Könighofer Shield Synthesis for AI
7
LiVe @ ETAPS, PragueApril 6, 2019
Reinforcement Learning
ReactiveSynthesis
• Critical aspects only• Small & sweet
• Large• Complicated• Highly optimized• Many sensors• …
Your Controller Critical Spec
Correctness Guarantees
OptimalityShielding
Bettina Könighofer Shield Synthesis for AI
8
LiVe @ ETAPS, PragueApril 6, 2019
EnvironmentLearning
Agent
Shield
ShieldingPreemptive
Reinforcement Learning
ReactiveSynthesis
Minimal Interference
Bettina Könighofer Shield Synthesis for AI
9
LiVe @ ETAPS, PragueApril 6, 2019
ShieldingPost-Posed
Environment Learning Agent
Shield
Policy Update: for 𝒔𝒂𝒇𝒆_𝒂𝒄𝒕𝒊𝒐𝒏 using 𝑟𝑒𝑤𝑎𝑟𝑑 for 𝑎𝑐𝑡𝑖𝑜𝑛 if 𝑎𝑐𝑡𝑖𝑜𝑛 𝒔𝒂𝒇𝒆_𝒂𝒄𝒕𝒊𝒐𝒏 :
1. Assign a punishment to 𝑎𝑐𝑡𝑖𝑜𝑛2. Assign 𝑟𝑒𝑤𝑎𝑟𝑑 to 𝑎𝑐𝑡𝑖𝑜𝑛
Shield can be added in execution phase
Bettina Könighofer Shield Synthesis for AI
A Shield for PAC-MAN10
LiVe @ ETAPS, PragueApril 6, 2019
M. Alshiekh, R. Bloem, R. Ehlers, B. Könighofer, S. Niekum, U. Topcu: Safe Reinforcement Learning via Shielding. AAAI 2018
Bettina Könighofer Shield Synthesis for AI
Outline11
Safety Shields
Optimal Shields
Safety Shields for Multi-Agent Systems
Probabilistic Safety Shields
LiVe @ ETAPS, PragueApril 6, 2019
AAAI-18
submission
ACC-19
arXiv
Bettina Könighofer Shield Synthesis for AI
Optimal Shields12
LiVe @ ETAPS, PragueApril 6, 2019
Problems of learned controllers (Safety problems)
1. Difficult to add new features2. Poor performance on
un-trained behavior3. No local fairness
Solution:Optimal Shield
Bettina Könighofer Shield Synthesis for AI
Shields for Traffic Light Controllers13
LiVe @ ETAPS, PragueApril 6, 2019
Learned Controller: “minimize total waiting time”
1. Difficult to add new features priority to public transport, changes due to an accident
2. Poor performance on un-trained behavior Uniform traffic congestion meets rush-hour traffic
3. No local fairness Farm road never gets green
Bettina Könighofer Shield Synthesis for AI
Lightweight shields 𝑐 : Cost for behavior 𝑐 : Cost for interference
Optimal Shields Synthesis14
LiVe @ ETAPS, PragueApril 6, 2019
Environment Learned Controller
opt-Shield’
𝝀 ⋅ 𝒄𝑩𝑬𝑯 𝟏 𝝀 ⋅ 𝒄𝑰𝑵𝑻
Mean-Payoff Game with 2 Objectives Mean-Payoff Game
𝜆: tradeoff between objective of controller vs shield
Two cost functions
Bettina Könighofer Shield Synthesis for AI
Controller Deep Convolutional Q-Network
16 dim input vector num approaching cars, waiting time 4 layers (16, 604, 604, 4 nodes), Q-learning: 𝛼 0.001, 𝛾 0.95
“Minimize waiting time of two junctions” Shield 𝑐 : size of maximal queue 𝑐 : 1 for interference, 0 otherwise
Dealing with rush-hour traffic15
LiVe @ ETAPS, PragueApril 6, 2019
𝑎𝑏𝑠𝑡𝑟𝑎𝑐𝑡 𝑠𝑡𝑎𝑡𝑒1,8,1,2
Bettina Könighofer Shield Synthesis for AI
Dealing with rush-hour traffic16
LiVe @ ETAPS, PragueApril 6, 2019
Bettina Könighofer Shield Synthesis for AI
Outline17
Safety Shields
Optimal Shields
Safety Shields for Multi-Agent Systems
Probabilistic Safety Shields
LiVe @ ETAPS, PragueApril 6, 2019
AAAI-18
submission
ACC-19
arXiv
Bettina Könighofer Shield Synthesis for AI
Safety Shields for Multi-Agent Systems18
Task: Enforce global safety property1. Quantitative interference costs Counting cost function Different costs for interferences with different agents
2. Fair Shielding Do not always interfere with the same agent repeatedly
LiVe @ ETAPS, PragueApril 6, 2019
Bettina Könighofer Shield Synthesis for AI
Case Study: Warehouse19
LiVe @ ETAPS, PragueApril 6, 2019
Bettina Könighofer Shield Synthesis for AI
Case Study: Warehouse20
LiVe @ ETAPS, PragueApril 6, 2019
S. Bharadwaj, R. Bloem, R. Dimitrova, B. Könighofer, and U. Topcu: Synthesis of Minimum-Cost Shields for Multi-agent Systems. ACC-19
Bettina Könighofer Shield Synthesis for AI
Outline21
Safety Shields
Optimal Shields
Safety Shields for Multi-Agent Systems
Probabilistic Safety Shields
LiVe @ ETAPS, PragueApril 6, 2019
AAAI-18
submission
ACC-19
arXiv
Bettina Könighofer Shield Synthesis for AI
Shielding original Pacman?22
LiVe @ ETAPS, PragueApril 6, 2019
State space is huge!
Not realizable!
Bettina Könighofer Shield Synthesis for AI
Learning the Adversary Model23
Each ghost has it‘s individual behaviour Observe it, model the behaviour Data augmentation techniques Is PAC-MAN north, south, east, or west?
Results in MDP of environment Guaranteed safety w.r.t. probabilistic temporal
logic spec
LiVe @ ETAPS, PragueApril 6, 2019
Bettina Könighofer Shield Synthesis for AI
MDP is huge! Scalability24
Finite Horizon safety for finite number of steps infinite horizon may cause large errors anyways
Piecewise Construction compute shield for each state independently
LiVe @ ETAPS, PragueApril 6, 2019
Bettina Könighofer Shield Synthesis for AI
MDP is huge! Scalability25
Independent Agents crashing probabilities for different agents are
stochastically independent compute individually, compose shields
Abstractions adversaries may be far away neglect adversary positions that are not relevant
LiVe @ ETAPS, PragueApril 6, 2019
Bettina Könighofer Shield Synthesis for AI
Probabilistic Safety Shield for Pacman26
LiVe @ ETAPS, PragueApril 6, 2019
N. Jansen, B. Könighofer2, S. Junges, and R. Bloem:Shielded Decision-Making in MDPs, arXiv
Bettina Könighofer Shield Synthesis for AI
Future Work27
Safety Shields
Optimal Shields
Safety Shields for Multi-Agent Systems
Probabilistic Safety Shields
LiVe @ ETAPS, PragueApril 6, 2019
Performance in autonomous systems
Shields for CPS, Deal with wrong models
Partially observable MDPs
Distributed Shield Synthesis
Bettina Könighofer Shield Synthesis for AI
28
LiVe @ ETAPS, PragueApril 6, 2019