Ching-Yao Chan
Berkeley DeepDrive, UC Berkeley
Cooperative Interacting Vehicles Summer School 2018
Domaine de Chalès - Nouan-le-Fuzelier, France
September 4, 2018
When Artificial Intelligence (AI)
Meets
Autonomous Vehicles (AV)
• Berkeley DeepDrive, Brief Introduction
• Emergence of AV and AI
• AI in AV, Why and How?
• Reinforcement Learning (RL) and Inverse Reinforcement Learning (IRL)
– Topic to be covered by Pin Wang
• AI for Deployment
• The Ultimate Driving Machine
• Concluding Remarks
Presentation Outline
• Berkeley Vision and Learning Center
– A consortium that started in 2012
– Tremendous advances in computer vision and deep learning
– Open-source Caffe, widely used globally
• Now Berkeley Artificial Intelligence Research (BAIR)
– https://bair.berkeley.edu/
• Berkeley DeepDrive (BDD) Center
– A consortium that started in Spring 2016
– Seeking to apply AI and deep learning technologies to automotive applications
Deep Learning at Berkeley
Berkeley DeepDrive
• Current industrial members include (as of August 2018):
– Automakers and Suppliers:
  • Ford, GM, Honda, Hyundai, SF Motors, Toyota
  • Continental, ZF
– Mobility Operators and Providers:
  • Didi Chuxing, Meituan-Dianping, UISEE, Zenity Mobility
– Technology Providers:
  • Autobrain, Baidu, Huawei, Mapillary, Nexar
  • Nvidia, NXP, Panasonic, Samsung, Sony
Our Mission:
We seek to merge deep learning with automotive perception and bring computer vision technology to the forefront.
Berkeley DeepDrive
See deepdrive.berkeley.edu for lists of projects and researchers
Pushing the scientific forefronts of
• Computer Vision / Autonomous Perception
• Automated Driving Systems
• Robotics
• A.I. / Machine Learning
Berkeley DeepDrive
Deep Learning Autonomy
BDD Research Themes
BDD Research Intelligence for Autonomy
Skill Sets of Intelligent Dynamic Systems
BDD Research and Applications
Autonomy for
Intelligent Systems
BDD-100k Data Release, 05/2018
See bdd-data.berkeley.edu for details and the archived paper
100K Videos
“Autonomous” Vehicles for Real in 2018-2021?
AV Testing in California
As of August 23, 2018,
• There are 56 Autonomous Vehicle Testing permit holders.
• More than 400 test vehicles.
Latest News about Vehicle Automation
• Toyota invests $500M in Uber, aiming for deployment in 2021 (08/2018)
• Waymo pilot program shows self-driving cars can boost transit (07/2018)
• Drive.ai self-driving car hitting the road in Frisco, Texas (07/2018)
• Ford hives off self-driving operations (07/2018)
• Waymo partners with Walmart to shuttle customers in self-driving cars (07/2018)
• Mercedes (+Nvidia+Bosch) will launch self-driving taxi in California next year (07/2018)
• Uber, Waymo in talks about self-driving partnership: Uber CEO (05/2018)
• Ford's self-driving car network will launch 'at scale' in 2021 (05/2018)
• Apple reportedly working with Volkswagen on self-driving vans (05/2018)
• Aptiv, Lyft launch Las Vegas fleet of self-driving cars (05/2018)
• Waymo and Honda reportedly will build a self-driving delivery vehicle (04/2018)
• Auto parts maker Magna invests $200 million in Lyft (03/2018)
• ……
The (Fourth) Wave of A.I.
Doing Better and Better
With Deeper and Deeper Networks
* End-to-End Training of Deep Visuomotor Policies, Levine et al, 2015
Deep Learning: From Image to Control
How can Deep Learning (AI) Help (Self-Driving) Vehicles?
Automobiles A.I.
A Great Enabler
Machine Learning / A.I. & Automated Driving
A Fitting Challenge
Where and How Best to Utilize?
Automated Driving Systems (ADS) - Functional Block Diagram
[Figure: ADS functional block diagram. Blocks: Driving Environment; Sensing (camera, radar, lidar, etc.); Autonomous Perception; Mapping & Localization; Route Planning; Trajectory Planning; Control Commands; Actuation; Vehicle Kinematic & Dynamic Model; Ego Vehicle States; Driver]
Automated Driving Systems (ADS) - Feedforward and Feedback in Control Systems
[Figure: ADS functional block diagram (Driving Environment, Sensing, Autonomous Perception, Mapping & Localization, Route Planning, Trajectory Planning, Control Commands, Actuation, Vehicle Kinematic & Dynamic Model, Ego Vehicle States, Driver), annotated with Feedforward, Feedback, and Conventional Vehicle Control Discipline]
Automated Driving Systems (ADS) - DNN End-to-End Learning for ADS
[Figure: ADS functional block diagram, repeated from the earlier slide]
*End-to-end Learning for Self-Driving Cars, Nvidia, 2016
End-to-End Learning for Self-Driving Cars (NVIDIA, 2016)
• With minimal training data, the system learns to drive in traffic on local roads with or without lane markings and on highways.
• The system learns internal representations, such as detecting useful road features, with only the human steering angle as the training signal.
• A convolutional neural network (CNN) maps raw pixels from a single front-facing camera directly to steering commands.
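As a rough illustration of the pixels-to-steering idea, the sketch below pushes a tiny grayscale "image" through one convolution, a ReLU, global average pooling, and a linear output, then scores the result against a human steering angle with squared error. The layer sizes, kernels, weights, and function names are hypothetical and far smaller than the NVIDIA network; this is a toy under stated assumptions, not their architecture.

```python
from typing import List

Image = List[List[float]]


def conv2d_valid(image: Image, kernel: Image) -> Image:
    """Valid (no padding) 2-D cross-correlation of image with kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def predict_steering(image, kernels, head_weights, head_bias):
    """Pixels -> conv -> ReLU -> global average pool -> linear steering output."""
    pooled = []
    for kernel in kernels:
        fmap = conv2d_valid(image, kernel)
        activations = [max(0.0, v) for row in fmap for v in row]
        pooled.append(sum(activations) / len(activations))
    return head_bias + sum(w * p for w, p in zip(head_weights, pooled))


def steering_loss(predicted, human_angle):
    """Squared error against the recorded human steering angle (the only label)."""
    return (predicted - human_angle) ** 2


# Hypothetical 4x4 frame: bright pixels on the right half.
frame = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
# Two hypothetical 2x2 edge kernels (dark-to-bright and bright-to-dark).
kernels = [[[-1.0, 1.0], [-1.0, 1.0]], [[1.0, -1.0], [1.0, -1.0]]]
steer = predict_steering(frame, kernels, head_weights=[0.5, -0.5], head_bias=0.0)
loss = steering_loss(steer, human_angle=0.2)
```

In a real system the kernels and head weights would be learned by minimizing this loss over many recorded frames; here they are fixed by hand to keep the example short.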
Automated Driving Systems (ADS) - End-to-End Learning for Self-Driving Cars
[Figure: ADS functional block diagram, repeated from the earlier slide, with a "?" annotation]
*End-to-end Learning for Self-Driving Cars, Nvidia, 2016
Automated Driving Systems (ADS) End-to-end to predict future egomotion (UCB Darrell’s Group)
[Figure: ADS functional block diagram, repeated from the earlier slide]
An end-to-end trainable architecture for learning to predict a distribution over future vehicle egomotion
*End-to-end Learning of Driving Models from Large-scale Video Datasets, Xu et al, CVPR 2017
End-to-End Learning of Driving Models (UCB Darrell’s Group, 2017)
• Exploiting large scale online and/or crowdsourced datasets.
• Learning a driving model or policy from uncalibrated sources.
• Predicting the distribution over feasible future actions.
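To make "predicting the distribution over feasible future actions" concrete, the sketch below turns hypothetical network scores into a probability distribution over a discrete action set with a softmax, and scores it against the action the human driver took. The action names and score values are illustrative assumptions, not taken from the paper.

```python
import math

ACTIONS = ["straight", "stop", "left", "right"]  # hypothetical discrete action set


def softmax(scores):
    """Numerically stable softmax: scores -> probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def negative_log_likelihood(probs, taken_action):
    """Training loss: -log of the probability assigned to the observed action."""
    return -math.log(probs[ACTIONS.index(taken_action)])


scores = [2.0, 0.5, -1.0, 0.0]  # hypothetical model outputs for one video frame
action_probs = softmax(scores)
loss = negative_log_likelihood(action_probs, "straight")
most_likely = ACTIONS[action_probs.index(max(action_probs))]
```

Predicting a distribution rather than a single command lets the model express that several actions may be acceptable in the same scene.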
Automated Driving Systems (ADS) - End-to-End Navigation by RL (DeepMind 2018)
[Figure: ADS functional block diagram, repeated from the earlier slide]
*Learning to Navigate in Cities without a Map, DeepMind, 2018
An end-to-end deep reinforcement learning approach that can be applied on a city scale
End-to-End Navigation by Reinforcement Learning (DeepMind 2018)
• Real-world grounded content is built on top of the publicly available Google StreetView.
• Agent never sees the underlying graphs but only the RGB images.
• The goal is represented in terms of its proximity to a set L of fixed landmarks.
• The aim is to show a neural network can learn to traverse entire cities (London, Paris and New York) using only visual observations.
Automated Driving Systems (ADS) Reinforcement Learning for AV (Wang & Chan, 2017)
[Figure: ADS functional block diagram, repeated from the earlier slide]
Maneuver Control based on Reinforcement Learning for Automated Vehicles in An Interactive Environment
*Reinforcement Learning, P. Wang, C-Y Chan, ITSC 17, IV 18
Reinforcement Learning for driving policy in interactive driving environment (Wang and Chan 2017-2018)
Immediate Reward
• Safety: $f_d(\text{distance})$
• Promptness: $f_v(\text{speed})$
• Smoothness: $f_a(\text{acceleration})$
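A minimal sketch of how the three terms could be combined into one immediate reward. The slide names only the terms (safety from distance, promptness from speed, smoothness from acceleration); the functional forms, thresholds, and weights below are assumptions chosen for illustration.

```python
def safety_term(gap_m, safe_gap_m=30.0):
    """f_d: penalty grows linearly as the gap shrinks below a safe gap; 0 otherwise."""
    return -max(0.0, safe_gap_m - gap_m) / safe_gap_m


def promptness_term(speed_mps, desired_speed_mps=30.0):
    """f_v: penalty proportional to the deviation from the desired speed."""
    return -abs(desired_speed_mps - speed_mps) / desired_speed_mps


def smoothness_term(accel_mps2, max_accel_mps2=3.0):
    """f_a: quadratic penalty on acceleration magnitude."""
    return -(accel_mps2 / max_accel_mps2) ** 2


def immediate_reward(gap_m, speed_mps, accel_mps2, weights=(1.0, 0.5, 0.2)):
    """Weighted sum of the safety, promptness, and smoothness terms."""
    w_d, w_v, w_a = weights
    return (w_d * safety_term(gap_m)
            + w_v * promptness_term(speed_mps)
            + w_a * smoothness_term(accel_mps2))


ideal = immediate_reward(gap_m=40.0, speed_mps=30.0, accel_mps2=0.0)
tailgating = immediate_reward(gap_m=10.0, speed_mps=30.0, accel_mps2=0.0)
```

All three terms are non-positive here, so the best achievable immediate reward is zero and every unsafe, slow, or jerky step is penalized.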
Application of Reinforcement Learning and Inverse Reinforcement Learning for Autonomous Driving
Pin Wang
Team Leader
Ching-Yao Chan
Associate Director, Berkeley DeepDrive
Reinforcement Learning for Autonomous Driving
– use cases: Ramp Merge and Lane Change
Reinforcement Learning – Problem Formulation
• Find a safe, comfortable, efficient driving policy under dynamic traffic by maximizing a long-term reward
• Continuous state space
• Continuous action space
• Continuous reward function
• Vehicle control: longitudinal and lateral
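The "long-term reward" is commonly formalized as the expected discounted return. The standard form is written out below; the discount factor $\gamma$ and the symbols $s_t$, $a_t$, and $r$ are notation assumed here rather than taken from the slide.

```latex
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \right], \qquad 0 \le \gamma < 1
```

With continuous states and actions, the maximization over $\pi$ cannot be tabulated, which is why function approximators are needed.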
Reinforcement Learning Algorithms: An Overview
RL algorithms
• Discrete Action Space
– Q-learning
– Dueling Networks
• Continuous Action Space
– Stochastic Continuous Action Space
  • Stochastic policy gradient
  • Actor-critic
  • Trust region policy gradient
  • Natural policy gradients
– Deterministic Continuous Action Space
  • Deterministic policy gradient
  • On-policy DPG
  • Off-policy DPG
  • Normalized Advantage Functions
Quadratic Q-function Approximator
Reward Function
• Reward Function
– Safety
– Comfort
– Efficiency
• Time sequence
• Q-function approximator design
Quadratic Q-function Approximation
$\mu(s)$, $M(s)$, $V(s)$ are values learned from neural networks.
• A Reinforcement Learning Based Approach for Automated Lane Change Maneuvers, 2018 IEEE International Conference on Intelligent Vehicles.
• Formulation of Deep Reinforcement Learning Architecture Toward Autonomous Driving for On-Ramp Merge, 2017 IEEE International Conference on Intelligent Transportation Systems.
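A sketch of why a quadratic Q-function is convenient for continuous actions, assuming the normalized-advantage form $Q(s,a) = V(s) - \tfrac{1}{2} M(s)\,(a - \mu(s))^2$ with $M(s) > 0$ and a scalar action. The slide states only that $\mu(s)$, $M(s)$, and $V(s)$ are learned by neural networks; the exact expression and the hand-written stand-in functions below are assumptions.

```python
def mu(state):
    """Stand-in for the network output mu(s): the action that maximizes Q."""
    return 0.1 * state


def m(state):
    """Stand-in for M(s) > 0: curvature of Q around mu(s)."""
    return 2.0 + state ** 2


def v(state):
    """Stand-in for V(s): the value of the best action in this state."""
    return -abs(state)


def q_value(state, action):
    """Q(s, a) = V(s) - 0.5 * M(s) * (a - mu(s))^2, concave in the action."""
    return v(state) - 0.5 * m(state) * (action - mu(state)) ** 2


def greedy_action(state):
    """The maximizer is available in closed form: argmax_a Q(s, a) = mu(s)."""
    return mu(state)


state = 1.5
best = greedy_action(state)
candidates = [best + delta for delta in (-0.5, -0.1, 0.0, 0.1, 0.5)]
q_values = [q_value(state, a) for a in candidates]
```

Because the maximizing action is read off directly, no search over the continuous action space is needed at each step.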
(1) Scenarios of ramp merging and lane changing
(2) Traffic on highway and ramp
– Highway/ramp traffic: random departure time, random initial speed, individual speed limit
(3) Vehicle behaviors
– Highway vehicles: car-following behavior
– Ego vehicle: ramp merging behavior, lane changing
(4) Simulation rules: vehicle interactions, accepted gap, lane change commands
Simulation Platform
Lane change
Ramp merge
[Plots: loss (decreasing) and reward (increasing)]
Training Results
• Training steps: 600,000
• Lane changing vehicles: 6,000
• Trained on CPU
• Training time: 150 min
[Plots: loss (decreasing) and reward (increasing)]
• Training steps: 400,000
• Ramp merging vehicles: 15,000
• Trained on CPU
• Training time: 100 min
• Verification
– Save 10 models during training
– Play each model with 100 vehicles running
– Calculate the averaged total rewards for each model
Model Verification
[Figure: verification plot, x-axis: training steps]
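The verification procedure above can be sketched as a loop over saved checkpoints. `run_episode` is a hypothetical stand-in for the simulator and returns a made-up total reward, so only the bookkeeping reflects the slide, not the numbers.

```python
import random


def run_episode(checkpoint_step, rng):
    """Hypothetical simulator call: total reward of one vehicle under a checkpoint.

    Placeholder behavior: reward grows with training steps, plus random noise.
    """
    return checkpoint_step / 100_000 + rng.gauss(0.0, 0.5)


def verify_checkpoints(checkpoint_steps, vehicles_per_model=100, seed=0):
    """Average the total reward over many vehicles for each saved model."""
    rng = random.Random(seed)
    averages = {}
    for step in checkpoint_steps:
        totals = [run_episode(step, rng) for _ in range(vehicles_per_model)]
        averages[step] = sum(totals) / len(totals)
    return averages


# Ten checkpoints saved evenly over 600,000 training steps (spacing is assumed).
steps = [60_000 * k for k in range(1, 11)]
average_rewards = verify_checkpoints(steps)
best_step = max(average_rewards, key=average_rewards.get)
```

Averaging over 100 vehicles per checkpoint smooths out the randomness of individual traffic situations before models are compared.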
Verification of Vehicle Performance
Inverse Reinforcement Learning for Reward Function Learning
Inverse Reinforcement Learning
Infer reward function from roll-outs of expert policy/demonstrations
• Given:
– States, actions
– Transition model $p(s' \mid s, a)$ (sometimes)
– Samples from policy $\pi$
• Learn:
– Reward function $r_\phi(s, a)$
– Either a linear combination or a neural network
• Then:
– Use the learned reward function to learn $\pi^*(a \mid s)$
Two Main Methods
• Maximum Margin Based (Ng & Abbeel, 2004)
– Reward function design: $R(s) = w \cdot f(s)$
– Feature function expectation: $\mu_E$
– Max margin and update:
– Drawback:
  • Ambiguity: different policies may lead to the same feature values.
• Max Entropy Based (Ziebart, 2008)
– Learn $p(r \mid \theta)$ from observations
– Based on max. entropy
– Use max. likelihood as approximation
– Drawback: approximation has bias.
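A small sketch of the maximum-entropy idea under a simplifying assumption: the set of candidate trajectories is finite, so the normalizing sum can be computed exactly (practical methods have to approximate it). Trajectory probability is taken proportional to exp(θ · f(τ)), and the log-likelihood gradient is the expert feature vector minus the expected feature vector under the current model. The feature values, learning rate, and names are hypothetical.

```python
import math


def trajectory_probs(theta, features):
    """P(tau | theta) proportional to exp(theta . f(tau)) over a finite candidate set."""
    scores = [sum(t * f for t, f in zip(theta, feat)) for feat in features]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fit_max_entropy(features, expert_index, lr=0.1, steps=500):
    """Gradient ascent on the log-likelihood of the expert trajectory."""
    theta = [0.0] * len(features[0])
    for _ in range(steps):
        probs = trajectory_probs(theta, features)
        expected = [sum(p * feat[k] for p, feat in zip(probs, features))
                    for k in range(len(theta))]
        # Gradient: expert features minus expected features under the model.
        theta = [t + lr * (features[expert_index][k] - expected[k])
                 for k, t in enumerate(theta)]
    return theta


# Hypothetical features per trajectory: (closeness to lead vehicle, jerkiness).
candidates = [(0.9, 0.8), (0.2, 0.1), (0.5, 0.9), (0.8, 0.2)]
theta = fit_max_entropy(candidates, expert_index=1)
probs = trajectory_probs(theta, candidates)
```

The learned weights come out negative for both features, meaning the recovered reward penalizes closeness and jerkiness, which is what the expert trajectory avoided.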
Proposed Method
• Max. Entropy
• Incorporate prior knowledge
– Incorporate prior info. on vehicle kinematics
Kinematic Model
Feature Functions
• Features
– Front vehicle time headway: $\mathrm{THW}_f = \dfrac{y_{front} - y_{ego}}{v}$
– Rear vehicle time headway: $\mathrm{THW}_r = \dfrac{y_{ego} - y_{rear}}{v}$
– AV longitudinal acceleration: $\dot{y}$
– AV lateral acceleration: $\dot{x}$
– AV steering angle rate: $\dot{\delta}_f$
– Speed diff. btw. current speed and desired speed: $|v - v_{des}|$
– Lateral deviation from the target lane: $|y - y_{des}|$
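The feature list above can be computed from consecutive snapshots roughly as follows. The time-headway expressions follow the slide (longitudinal gap divided by speed); the dataclass fields, the use of the ego speed as the divisor in both headways, and the finite-difference rates are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical per-timestep record; positions in m, speeds in m/s."""
    ego_pos: float      # longitudinal position of the ego vehicle
    front_pos: float    # longitudinal position of the front vehicle
    rear_pos: float     # longitudinal position of the rear vehicle
    speed: float        # ego speed, used as v in both headways (assumption)
    lateral_pos: float  # ego lateral position
    steering: float     # front-wheel steering angle, rad


def lane_change_features(now, prev, prev_lon_speed, prev_lat_speed,
                         desired_speed, target_lateral, dt):
    """Return the seven features from the slide for one timestep."""
    lon_speed = (now.ego_pos - prev.ego_pos) / dt
    lat_speed = (now.lateral_pos - prev.lateral_pos) / dt
    return {
        "thw_front": (now.front_pos - now.ego_pos) / now.speed,
        "thw_rear": (now.ego_pos - now.rear_pos) / now.speed,
        "lon_accel": (lon_speed - prev_lon_speed) / dt,
        "lat_accel": (lat_speed - prev_lat_speed) / dt,
        "steer_rate": (now.steering - prev.steering) / dt,
        "speed_diff": abs(now.speed - desired_speed),
        "lateral_dev": abs(now.lateral_pos - target_lateral),
    }


prev = Snapshot(ego_pos=100.0, front_pos=130.0, rear_pos=80.0,
                speed=20.0, lateral_pos=0.0, steering=0.00)
now = Snapshot(ego_pos=102.0, front_pos=132.5, rear_pos=81.5,
               speed=20.0, lateral_pos=0.1, steering=0.01)
features = lane_change_features(now, prev, prev_lon_speed=20.0, prev_lat_speed=0.0,
                                desired_speed=25.0, target_lateral=3.5, dt=0.1)
```

A reward that is a linear combination of these features would then be a weighted sum over the dictionary values.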
Training
• NGSIM Data
– Naturalistic traffic data on I-80
– Coverage of rush hour (5:00pm-5:30pm) and transition period (4:00pm-4:15pm)
– 5000+ vehicle trajectories, 200 lane changes
• Extracted Scenario
– Lane change between two lanes
– Four vehicles as a group
– Target vehicle (blue) is changing lane
[Figure: bird's-eye view of naturalistic traffic recorded on the I-80 freeway (lanes 1-7, driving direction marked), and illustration of the extracted scenario]
• Generated trajectory of left & right lane changes based on the learned reward function
[Figure: two trajectory plots, X/m vs. Y/m, each showing Original Trajectory, Filtered Trajectory, IRL Generated Trajectory, Lane I, and Lane II]
• Research Topics:
– Different formats of reward functions
– Diverse situations to make the model more robust
– Comparison with other IRL methods
Technical Approach
Applying AI to Production Cars
Software 1.0
• Written in code (C++ …)
• Requires domain expertise
1. Decompose problems
2. Design algorithms
3. Compose into a system
• Measure performance
*"Building the Software 2.0 Stack," Andrej Karpathy, Tesla, 05/2018
Software 2.0
• Requires much less domain expertise
1. Design a code skeleton
• Measure performance
"Fill in the Blanks Programming"
*"Building the Software 2.0 Stack," Andrej Karpathy, Tesla, 05/2018
[Figure: three stack diagrams, each from inputs (Cameras, Radar, Ultrasonic, IMU) to outputs (Steering, Acceleration), with regions labeled 1.0 Code and 2.0 Code]
*"Building the Software 2.0 Stack," Andrej Karpathy, Tesla, 05/2018
How to Expedite Learning and Testing?
• The consensus is that it is too resource-consuming and not feasible to conduct ADS testing "completely" by physical cases (>10^8 km).
• Practices of Safety Assurance Testing:
– Learn from a database of "corner cases"
– Collection of challenging scenarios and probable test cases for specifications
• "Fleet" Learning
– Tesla, e.g. (100's M of on-road data)
• "Simulated" Learning
– Waymo, e.g. (8M miles daily, 2.5B miles yearly)
Applying AI in Achieving Safe and Robust AV Performance
• Proving Ground
• Road Testing
• Simulation
• Supervised Learning
• Imitation + Reinforcement Learning
• RL + Supervised Learning
Testing/Validation AI & ML
General Intelligence All Situations Uncharted Territory
Domain Adaptation Transfer Learning Learning to Learn
Philosophically Speaking ….
What Are We (Humans) and Machines Good At?
Human:
• Expression and Gesture
• Intuitive Reflex
• Imagination
• Adaptation
• System One*
Machine:
• Complex & Fast Computation
• Rational Reasoning
• Rule-Abiding
• Vast Data Capacity
• System Two*
* Thinking, Fast and Slow, Daniel Kahneman
Man and Machine are quite complementary
H(orse) Metaphor for Automated Driving Systems (ADS)
[Figure: spectrum from Loose Rein (High Autonomy) to Tight Rein (High Intervention), shown for Horse Riding and Car Driving]
• The H-Metaphor as a Guideline for Vehicle Automation and Interaction by F. Flemisch et al., 2003
H-Metaphor for Automated Driving Systems (ADS)
The horse can run a course well on its own; it also behaves well even if the rider pulls the rein or uses the whip occasionally.
Horse Riding
Car Driving
The car can run the course well on its own; it also behaves well even if the driver steers the wheel or pushes the pedal occasionally.
The Ultimate Driving Machine
The Ultimate Driving Machine?
[Figure: Level of Automation vs. Level of Driver Inputs, with the 5 levels of automation (I-V) per SAE J3016; annotation: Switching of Automation Levels]
Supervisory Control in Automated (Driving) Systems
• Supervisory control*:
Human-machine systems can exist in a spectrum of automation, and shift across the spectrum of control levels in real time to suit the situation at hand.
* T. Sheridan, Telerobotics, Automation, and Human Supervisory Control, Cambridge, MA: MIT Press, 1992.
The Ultimate Driving Machine?
[Figure: Level of Automation vs. Level of Driver Inputs, with the 5 levels of automation (I-V) per SAE J3016; annotation: Supervisory Control at Varying Automation Levels]
Research Questions in Supervisory Concept
Given the foundation below, if there is a lack of clarity and certainty, can an arbitration module learn to make decisions to achieve its goal?
[Figure labels: Safe and Effective Interaction with Surrounding; Vehicle State Measurement Module; Detection and Perception Modules; Actuation Control Modules; an Arbitration Module marked "?" between AV Controller Inputs and Driver Inputs]
[Figure: Automation ODD surrounded by the Minimum Risk Domain, Automation Lock Domain, Automatic Transition Domain, and Singularity Domain, with numbered transitions 1-6]
1. Request In
2. Request Out
3. Auto Transition In
4. Minimum Risk Move
5. Driver Takeover at Will
6. Automation Lock-In
Operational Design Domain (ODD, per SAE)
Concluding Remarks
Opportunities in AI for AV
• Significant advancements in Deep Learning, 2010s
– Text, Voice, Image
– Robotics, Autonomous Driving
• Still a long way to go, to achieve general intelligence, but it is an exciting era for AI+AV
Intelligence ≠ Perfection
Artificial or Human
We, As a Society, Have a High Tolerance of What Humans Do.
Human Behaviors:
• Distraction
• Fatigue
• Poor Judgment
• Mistakes
• Not Knowing What Is in Others' Minds
Machine Performance:
• Misinformation
• Reliability
• Consistency
• Fail-Safe
• Not Understanding Algorithms?
Can We Accept and Live With What Machines Do?
(What we have now is)
Not A.I., but I.A., Intelligence Augmentation
Michael Jordan, UC Berkeley
Thank you.
Ching-Yao Chan