![Page 1: Learning from Demonstrations Jur van den Berg. Kalman Filtering and Smoothing Dynamics and Observation model Kalman Filter: – Compute – Real-time, given](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ce25503460f949adb4e/html5/thumbnails/1.jpg)
Learning from Demonstrations
Jur van den Berg
![Page 2: Learning from Demonstrations Jur van den Berg. Kalman Filtering and Smoothing Dynamics and Observation model Kalman Filter: – Compute – Real-time, given](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ce25503460f949adb4e/html5/thumbnails/2.jpg)
Kalman Filtering and Smoothing
• Dynamics and Observation model
• Kalman Filter:– Compute– Real-time, given data so far
• Kalman Smoother:– Compute – Post-processing, given all data
ttt YYX yy ,,| 00
TtYYX TTt 0,,,| 00 yy
),(~
),(~
,
,1
RN
QN
C
A
t
t
tt
tt
t
t
0v
0w
vx
wx
y
x
![Page 3: Learning from Demonstrations Jur van den Berg. Kalman Filtering and Smoothing Dynamics and Observation model Kalman Filter: – Compute – Real-time, given](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ce25503460f949adb4e/html5/thumbnails/3.jpg)
EM Algorithm
• Kalman smoother: – Compute distributions X0, …, Xt
given parameters A, C, Q, R, and data y0, …, yt.
• EM Algorithm:– Simultaneously optimize X0, …, Xt and A, C, Q, R
given data y0, …, yt.
),(~
),(~
,
,1
RN
QN
C
A
t
t
tt
tt
t
t
0v
0w
vx
wx
y
x
![Page 4: Learning from Demonstrations Jur van den Berg. Kalman Filtering and Smoothing Dynamics and Observation model Kalman Filter: – Compute – Real-time, given](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ce25503460f949adb4e/html5/thumbnails/4.jpg)
Learning from Demonstrations
• Application of EM-algorithm• Example: – Autonomous helicopter aerobatics– Autonomous surgical tasks (knot-tying)
![Page 5: Learning from Demonstrations Jur van den Berg. Kalman Filtering and Smoothing Dynamics and Observation model Kalman Filter: – Compute – Real-time, given](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ce25503460f949adb4e/html5/thumbnails/5.jpg)
Motivation
• Learning an ideal “trajectory” of system• Human provides demonstrations of ideal
trajectory• Human demonstrations imperfect• Multiple demonstrations implicitly encode
ideal trajectory• Task: infer ideal trajectory from
demonstrations
![Page 6: Learning from Demonstrations Jur van den Berg. Kalman Filtering and Smoothing Dynamics and Observation model Kalman Filter: – Compute – Real-time, given](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ce25503460f949adb4e/html5/thumbnails/6.jpg)
Acquiring Demonstrations
• Known system dynamics (A, B, Q)• Observations with known sensors (C, R)– Inertial measurement unit– GPS– Cameras
• Use Kalman smoother to optimally estimate states x along demonstration trajectory
),(~
),(~
,
,1
RN
QN
C
BA
t
t
tt
ttt
t
t
0v
0w
vx
wux
y
x
![Page 7: Learning from Demonstrations Jur van den Berg. Kalman Filtering and Smoothing Dynamics and Observation model Kalman Filter: – Compute – Real-time, given](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ce25503460f949adb4e/html5/thumbnails/7.jpg)
Multiple Demonstrations
• D demonstration trajectories of duration Tj
• Hidden ideal trajectory z of duration T*
j
jt
jtj
t TtDj ,,1,,1
u
xd
Tt
t
tt ,,1
u
xz
![Page 8: Learning from Demonstrations Jur van den Berg. Kalman Filtering and Smoothing Dynamics and Observation model Kalman Filter: – Compute – Real-time, given](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ce25503460f949adb4e/html5/thumbnails/8.jpg)
Model of Ideal Trajectory• Main idea: use demonstrations as noisy
observations of hidden ideal trajectory• Dynamics of hidden trajectory
• Observation of hidden trajectory
)0
0,(~,
01
N
QN
I
BAtttt 0wwzz
)
00
00
00
,(~,
11
Dttt
Dt
t
S
S
Nss
I
I
0z
d
d
![Page 9: Learning from Demonstrations Jur van den Berg. Kalman Filtering and Smoothing Dynamics and Observation model Kalman Filter: – Compute – Real-time, given](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ce25503460f949adb4e/html5/thumbnails/9.jpg)
Inferring Ideal Trajectory
• Dynamics model: Parameter N controls smoothness; A, B, Q known
• Observation model: Parameters S encode relative quality of demonstrations
• Use EM-algorithm with Kalman smoother to simultaneously optimize z and S (and N).• Initialize S with identity matrices
![Page 10: Learning from Demonstrations Jur van den Berg. Kalman Filtering and Smoothing Dynamics and Observation model Kalman Filter: – Compute – Real-time, given](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ce25503460f949adb4e/html5/thumbnails/10.jpg)
Time Warping
• But, this assumes demonstrations are of equal length and uniformly paced
• Include Dynamic Time Warping into EM-algorithm
• Such that demonstrations map temporally
![Page 11: Learning from Demonstrations Jur van den Berg. Kalman Filtering and Smoothing Dynamics and Observation model Kalman Filter: – Compute – Real-time, given](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ce25503460f949adb4e/html5/thumbnails/11.jpg)
Time Warping
• For each demonstration j, we have function tj(t)
• Maps time t along z to time tj(t) along dj
• Adapted observation model:
)
00
00
00
,(~,
1
)(
1
)(1
Dttt
D
t
t
S
S
N
I
I
D
0ssz
d
d
![Page 12: Learning from Demonstrations Jur van den Berg. Kalman Filtering and Smoothing Dynamics and Observation model Kalman Filter: – Compute – Real-time, given](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ce25503460f949adb4e/html5/thumbnails/12.jpg)
Learning Time Warping
• tj(t) is (initially) unknown• Assume (initially):– T* = (T1 + … + TD) / D– tj(t) = (Tj / T*) t
• Adapted EM-algorithm:– Run Kalman smoother with current S and t– Optimize S by maximizing likelihood– Optimize t by maximizing likelihood
(Dynamic Time Warping)
![Page 13: Learning from Demonstrations Jur van den Berg. Kalman Filtering and Smoothing Dynamics and Observation model Kalman Filter: – Compute – Real-time, given](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ce25503460f949adb4e/html5/thumbnails/13.jpg)
Dynamic Time Warping
• Match demonstration j with z• Assume that demonstration moves locally– twice as slow as z– same pace as z– twice as fast as z
• Dynamic Programmingto find optimal “path”
• Cost function: likelihood of),(~,
)(
jttt
j
tSNj 0sszd
![Page 14: Learning from Demonstrations Jur van den Berg. Kalman Filtering and Smoothing Dynamics and Observation model Kalman Filter: – Compute – Real-time, given](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ce25503460f949adb4e/html5/thumbnails/14.jpg)
Example: Helicopter Airshow• Thesis work of Pieter Abbeel• Unaligned demonstrations:– Movie
• Time-aligned demonstrations:– Movie
• Execution of learnt trajectory– Movie
![Page 15: Learning from Demonstrations Jur van den Berg. Kalman Filtering and Smoothing Dynamics and Observation model Kalman Filter: – Compute – Real-time, given](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ce25503460f949adb4e/html5/thumbnails/15.jpg)
Example Surgical Knot-tie
• ICRA 2010 Best Medical Robotics Paper Award• Video of knot-tie
![Page 16: Learning from Demonstrations Jur van den Berg. Kalman Filtering and Smoothing Dynamics and Observation model Kalman Filter: – Compute – Real-time, given](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ce25503460f949adb4e/html5/thumbnails/16.jpg)
Conclusion
• Learning from demonstrations• Includes Dynamic Time Warping into
EM-algorithm