TRANSCRIPT
Assistive Human-Robot Interaction (HRI)

F.L. Lewis, NAI
Moncrief-O'Donnell Chair, UTA Research Institute (UTARI)
The University of Texas at Arlington, USA
Guest Foreign Professor, Guangdong University of Technology, Guangzhou, China

and Dan Popa, University of Louisville
and Isura Ranatunga, Reza Modares, Bakur AlQaudi
and Shengli Xie, Minyue Fu, Derong Liu

Talk available online at http://www.UTA.edu/UTARI/acs

Supported by: NSF NRI Initiative, ONR
Supported by: China NNSF, China Project 111
3. Devices: "distributed skin sensors"
Integration of multi-modal, multi-resolution MEMS skin sensors to include tactile, thermal, pressure, acceleration, and distance IR sensing.
- Sensor design tuned for pHRI
- Fabrication on flexible substrates
- Robust packaging in Frubber® & laminates
- Efficient wire interconnect schemes
[Figures: Microsensor packaging and interconnects; concept for a microactuator array using piezo actuators; sensors and skin on the PR2 and youBot robots at UTARI; electrohydrodynamic sensor printing (maskless lithography).]
Multi-Modal Skin and Garments for Healthcare and Home Robots
University of Texas at Arlington, NRI Grant No. 1208623
Program Manager: Dr. Paul Werbos, ECCS, ENG, NSF
Dan O. Popa¹, Frank L. Lewis¹, Nicoleta Bugnariu², Woo Ho Lee¹ and Muthu Wijesundara³
¹Department of Electrical Engineering, University of Texas at Arlington; ²University of North Texas Health Science Center; ³UT Arlington Research Institute
Partner Companies: Advanced Arm Dynamics, Hanson Robotics, Inc., National Instruments
1. System Design: "where to place sensors on robot?"
Novel algorithms and methods for optimal placement and data management of such devices on several co-robots.
- Statistical adaptive sampling for sensor selection (a minimal sketch follows this list)
- Sensor fusion based on noise and sensor scaling models
- Optimization algorithms for maximizing robot perception
- New sensor simulation models and robot control algorithms
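To make the statistical-sampling bullet concrete, here is a minimal sketch of greedy, information-driven selection of skin-cell sites. It is an illustrative stand-in rather than the project's actual algorithm: the RBF prior, the noise level, and the log-determinant score are all assumptions.

```python
import numpy as np

def greedy_sensor_selection(candidates, k, noise_var, signal_cov):
    """Greedily pick k sensor sites that maximize an information score.

    candidates: (N, d) array of candidate skin-cell locations.
    noise_var:  per-sensor measurement noise variance.
    signal_cov: kernel giving the prior covariance between two sites.
    """
    N = len(candidates)
    # Prior covariance of the sensed field at all candidate sites.
    K = np.array([[signal_cov(candidates[i], candidates[j])
                   for j in range(N)] for i in range(N)])
    chosen = []
    for _ in range(k):
        best, best_gain = None, -np.inf
        for i in range(N):
            if i in chosen:
                continue
            idx = chosen + [i]
            S = K[np.ix_(idx, idx)] + noise_var * np.eye(len(idx))
            gain = np.linalg.slogdet(S)[1]   # log-volume of gathered info
            if gain > best_gain:
                best, best_gain = i, gain
        chosen.append(best)
    return candidates[chosen]

# Example: pick 8 of 100 random sites on a patch, with an RBF prior.
rng = np.random.default_rng(0)
sites = rng.uniform(0.0, 1.0, size=(100, 2))
rbf = lambda a, b: np.exp(-np.sum((a - b) ** 2) / 0.02)
print(greedy_sensor_selection(sites, 8, noise_var=0.01, signal_cov=rbf))
```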
4. Co-Robot Performance: "how does this technology help humans?"
The impact of the new technology on humans will be assessed, including safety, level of assistance to several targeted user groups, ease of use, aesthetics, and therapeutic benefits.
- Clinical testing at UNTHSC and UTARI
- Collaborative work with Advanced Arm Dynamics
[Figure: Robot, tactile sensor array, and the environment in Gazebo.]
[Figure: Diagram of SkinSim, the multi-modal skin simulation environment: Gazebo world, model, and sensor plugins described in SDF; sensor placement for infrared, acceleration, temperature, and tactile sensing on robot, human, and character models; a thermal overlay with temperature profile; meshes collected from SketchUp; and a C++/Python user application receiving the infrared, acceleration, temperature, and tactile outputs.]
Applications: assistance for upper-limb amputees, PR2 teach-by-demonstration, and human interaction data collection.
[Figure: Skin fabrication at UTARI: an array of temperature sensors on a Parylene (polymer) substrate; top and bottom Kapton layers with electric traces and pads, bonded by 50 µm thick pressure-sensitive adhesive tape.]
[Figure: Model Reference Neuroadaptive Controller block diagram, with robustifying signal v(t).]
Neuroadaptive Impedance Control
2. Control and Learning: "both human and robot learn during interaction"
Learning algorithms and adaptive impedance control for efficient use of multimodal sensors to sense human intent and improve the usability of co-robots.
- Online reinforcement learning for pHRI with co-robots wearing skin and garments, given human-centric rewards and cost functions
- Neuroadaptive impedance control with stability and performance guarantees
The robustifying signal is
$$v(t) = -K_z\left(\|\hat Z\|_F + Z_B\right) r$$
where the unknown robot nonlinear function is
$$f(x) = M(q)(\ddot x_m + \Lambda\dot e) + V(q,\dot q)(\dot x_m + \Lambda e) + F(\dot q) + G(q)$$
and the neuroadaptive control takes the form $f_c = \hat W^T\sigma(\hat V^T x) + K_v r - v(t)$.
[Figure: Iterative design loop: task requirements and initial sensor prototypes & robotic hardware feed robot-and-skin simulation; fabrication & integration of skin/garment hardware; interaction learning; and pHRI measurement & simulation. Designs & algorithms are iterated against the perceived impedance.]
Fully Automated Robot vs. Assistive Robot
PR2 meets Isura
Standard Robot Trajectory Tracking Controller
Where is the human?
Impedance Control
- Robot dynamics
- Prescribed error system
- The control torque depends on the impedance model parameters
Human task learning has two components:
1. The human learns a robot dynamics model to compensate for robot nonlinearities.
2. The human learns a task model to properly perform the task.
Inner robot-specific control loop: INDEPENDENT OF TASK
Outer task-specific control loop: INDEPENDENT OF ROBOT DETAILS
Human Performance Factors Studies
Two-loop HRI Design: Robot Control versus Task Control
1. Inner Robot-Specific Control Loop
2. Outer Task-Specific Control Loop
2A. Adaptive Inverse Filter Task Design
2B. Model Reference Adaptive Control Task Design
2C. Reinforcement Learning Task Control for Minimum Human Effort
3. Experiments
Robot control inner loop
Task control outer loop
1. Inner‐Loop Robot Specific Controller
Model‐reference Neuro‐adaptive Control
There is NO prescribed trajectory in the robot control loop design
Make the robot behave like the prescribed model
F.L. Lewis, D.M. Dawson, and C.T. Abdallah, Robot Manipulator Control: Theory and Practice, 2nd ed., revised and expanded, CRC Press, Boca Raton, 2006.
F.L. Lewis, S. Jagannathan, and A. Yesildirek, Neural Network Control of Robot Manipulators and Nonlinear Systems, Taylor and Francis, London, 1999.
A Novel Control Objective Using Neuro‐adaptive Control Techniques
Model-following error formulation:

Robot dynamics: $M(q)\ddot x + V(q,\dot q)\dot x + F(\dot q) + G(q) + f_d = f_c + f_h$

Prescribed robot impedance model: $M_m\ddot x_m + D_m\dot x_m + K_m x_m = f_h$

Model-following error: $e = x_m - x$

Sliding-mode error: $r = \dot e + \Lambda e$

Error dynamics: $M(q)\dot r = -V(q,\dot q)\,r + f(x) + f_d - f_c - f_h$

Unknown robot nonlinear function: $f(x) = M(q)(\ddot x_m + \Lambda\dot e) + V(q,\dot q)(\dot x_m + \Lambda e) + F(\dot q) + G(q)$, with NN input vector $x = [\,e^T\ \dot e^T\ x_m^T\ \dot x_m^T\ \ddot x_m^T\ q^T\ \dot q^T\,]^T$.

There is NO task trajectory here.
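For context on where the admittance trajectories come from, the sketch below integrates the prescribed impedance model $M_m\ddot x_m + D_m\dot x_m + K_m x_m = f_h$ from a measured human force to produce $x_m, \dot x_m, \ddot x_m$. The scalar gains, Euler integration, and step size are illustrative assumptions, not values from the talk.

```python
def admittance_step(x_m, xd_m, f_h, M_m, D_m, K_m, dt):
    """One Euler step of the prescribed impedance (admittance) model
    M_m*xdd_m + D_m*xd_m + K_m*x_m = f_h, driven by the human force."""
    xdd_m = (f_h - D_m * xd_m - K_m * x_m) / M_m
    x_m = x_m + xd_m * dt
    xd_m = xd_m + xdd_m * dt
    return x_m, xd_m, xdd_m

# Example: a constant 1 N push against a 1 kg / 8 Ns/m / 20 N/m impedance.
x, xd = 0.0, 0.0
for _ in range(5000):
    x, xd, xdd = admittance_step(x, xd, f_h=1.0,
                                 M_m=1.0, D_m=8.0, K_m=20.0, dt=0.001)
print(x)   # settles near f_h / K_m = 0.05 m
```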
Control torque: $\tau = J^T(q)\, f_c$

Controller (approximation-based): $f_c = \hat f + K_v r - v(t) - f_h$

Closed-loop error dynamics: $M(q)\dot r = -\left(V(q,\dot q) + K_v\right) r + \tilde f + f_d + v(t)$, so the model-following error is driven by the parameter estimation error.

Unknown nonlinearities parameterized in terms of a function approximator: $f(x) = W^T\sigma(V^T x) + \varepsilon$, with $W, V$ the unknown parameters.

Estimate for the unknown nonlinearities, with estimated parameters $\hat W, \hat V$: $\hat f(x) = \hat W^T\sigma(\hat V^T x)$, giving $\tilde f(x) = f(x) - \hat f(x)$.

Adaptive control structure: $f_c = \hat W^T\sigma(\hat V^T x) + K_v r - v(t) - f_h$

Standard adaptive parameter tuning algorithms:
$$\dot{\hat W} = F\hat\sigma r^T - F\hat\sigma'\hat V^T x\, r^T - \kappa F\|r\|\hat W$$
$$\dot{\hat V} = G x\left(\hat\sigma'^T\hat W r\right)^T - \kappa G\|r\|\hat V$$

Robust control term: $v(t) = -K_z\left(\|\hat Z\|_F + Z_B\right) r$, with NN input vector $x = [\,e^T\ \dot e^T\ x_m^T\ \dot x_m^T\ \ddot x_m^T\ q^T\ \dot q^T\,]^T$.
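A minimal numerical sketch of this adaptive control structure, assuming sigmoid hidden-layer activations; the gain matrices, dimensions, and Euler weight update are illustrative assumptions, not values from the talk.

```python
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))              # sigmoid activations

def nn_controller_step(W, V, x, r, f_h, Kv, Kz, ZB, F, G, kappa, dt):
    """One step of the model-reference neuroadaptive controller.
    W: (L, n) and V: (d, L) NN weights, x: (d,) NN input, r: (n,)
    sliding-mode error, f_h: (n,) measured human force.
    Returns the control force f_c and the updated weights."""
    s = sigma(V.T @ x)                            # hidden-layer output
    sp = np.diag(s * (1.0 - s))                   # sigmoid Jacobian (L, L)
    f_hat = W.T @ s                               # estimate of f(x)
    # Robustifying term v(t) = -Kz*(||Z||_F + ZB)*r.
    Z_fro = np.sqrt(np.sum(W ** 2) + np.sum(V ** 2))
    v = -Kz * (Z_fro + ZB) * r
    # Adaptive control structure: f_c = W^T sigma(V^T x) + Kv r - v - f_h.
    f_c = f_hat + Kv @ r - v - f_h
    # Standard tuning laws with e-modification term kappa*||r||.
    W_dot = (F @ (np.outer(s, r) - np.outer(sp @ (V.T @ x), r))
             - kappa * np.linalg.norm(r) * F @ W)
    V_dot = (G @ np.outer(x, sp.T @ (W @ r))
             - kappa * np.linalg.norm(r) * G @ V)
    return f_c, W + W_dot * dt, V + V_dot * dt
```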
No task reference trajectory is used here. The robot controller makes the model-following error small. The parameters of the admittance model are not needed.
Model-following error: $e = x_m - x$, with $M_m\ddot x_m + D_m\dot x_m + K_m x_m = f_h$ and $r = \dot e + \Lambda e$.
No task trajectory information is used in this inner-loop robot controller. The inner-loop robot controller makes the model-following error small. The admittance model parameters are not needed; only the admittance model trajectories $x_m, \dot x_m, \ddot x_m$ are needed.
2. Outer Task-Specific Control Loop
2A. Adaptive Inverse Filter Task Design
2B. Model Reference Adaptive Control Task Design
2C. Reinforcement Learning Task Control for Minimum Human Effort
Robot control inner loop
Task control outer loop
Three Outer-Loop Designs (to appear, 2016)
2A. Outer‐loop Task Specific Design #1
Want to find $M(s)$ so that $D(s) = M(s)H(s)$, with $H(s)$ and $D(s)$ unknown.
Adaptive Inverse Control and Wiener Filter
B. Widrow Adaptive inverse filter
For a trajectory-following task, e.g. point-to-point motion control.
Work of Isura Ranatunga
Signal to robot controller
Wiener filter solution in terms of power spectral densities:
$$M(s) = \frac{\Phi_{f_h x_d}(s)}{\Phi_{f_h f_h}(s)} = \frac{D(s)}{H(s)}$$
[Figure: Online adaptive realization of the filter in companion (state-variable) form: the human force $f_h(t)$ drives a chain of integrators $1/s$, and the output $x_m(t)$ is formed as a linear combination with adapted numerator coefficients $b_1, b_2, b_3$ and denominator coefficients $a_1, a_2, a_3$.]
Find the Wiener filter online using adaptive learning.
Ideal filter: $x_d(t) = H^T\theta(t)$
Wiener filter solution: $x_m(t) = \hat H^T(t)\,\theta(t)$
Here $\theta(t)$ is a known regression vector and $H$ the unknown coefficients; the error $x_d - x_m$ drives the tuning.
Kalman filter = continuous-time recursive least squares (CT RLS).
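One way to realize the "Kalman filter = CT RLS" tuning is a recursive-least-squares update of the coefficient estimate $\hat H$, sketched below in discrete time; the forgetting factor and the discretization are assumptions.

```python
import numpy as np

def rls_update(H_hat, P, theta, x_d, lam=0.999):
    """One RLS update of the adaptive inverse (Wiener) filter coefficients.
    H_hat: (L, 1) coefficient estimate, P: (L, L) covariance,
    theta: (L,) known regression vector, x_d: desired filter output."""
    theta = theta.reshape(-1, 1)
    x_m = float(H_hat.T @ theta)          # current filter output
    err = x_d - x_m                       # x_d - x_m drives the tuning
    K = P @ theta / (lam + float(theta.T @ P @ theta))
    H_hat = H_hat + K * err               # coefficient update
    P = (P - K @ theta.T @ P) / lam       # covariance update
    return H_hat, P, x_m
```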
Combined stability analysis of the inner robot control loop and the outer task-following loop. Lyapunov function:
$$L = \tfrac{1}{2} r^T M(q)\, r + \tfrac{1}{2}\,\mathrm{tr}\{\tilde W^T F^{-1}\tilde W\} + \tfrac{1}{2}\,\mathrm{tr}\{\tilde V^T G^{-1}\tilde V\} + \tfrac{1}{2}\,\tilde H^T(t)\, P^{-1}(t)\,\tilde H(t)$$
combining the robot model-following error, the NN weight estimation errors, and the outer-loop inverse adaptive filter error.
十年树木，百年树人 (Shi nian shu mu, bai nian shu ren): "It takes ten years to grow a tree, but a hundred years to cultivate a person."
可是，五年树学生 (Keshi, wu nian shu xuesheng): "But only five years to cultivate a student."
2B. Outer‐loop Task Specific Design #2
Model Reference Adaptive Control
K. Astrom
BUT: in standard MRAC, the controller appears before the unknown plant. Here, the unknown plant (e.g. the human) is BEFORE the controller.
So we need to add a human dynamics identifier.
Unknown human model (basic muscle response model): $H(s) = \dfrac{y_h}{u_c} = \dfrac{b}{s+a}$

Task reference model: $H_m(s) = \dfrac{y_m}{u_c} = \dfrac{b_m}{s+a_m}$, a first-order crossover model for the ideal human-plus-robot system.

Nominal robot impedance model: $H_n(s) = \dfrac{y_p}{u} = \dfrac{b_n}{s+a_n}$, used to generate the prescribed model trajectory $x_m, \dot x_m, \ddot x_m$.

Human factors studies show that AFTER learning, the human-plus-robot system obeys simple first-order, high-bandwidth roll-off dynamics.
Human response estimation error: $\tilde y = y - \hat y$
Parameter estimation errors: $\tilde a = a - \hat a$, $\tilde b = b - \hat b$
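A minimal sketch of one possible human dynamics identifier for the first-order model $\dot y = -a\,y + b\,u_c$ with $a, b$ unknown: a series-parallel gradient estimator driven by the response estimation error $\tilde y = y - \hat y$. The observer and adaptation gains are illustrative assumptions.

```python
def human_identifier_step(y, u_c, y_hat, a_hat, b_hat, dt,
                          lam=5.0, ga=10.0, gb=10.0):
    """One Euler step of a series-parallel gradient identifier for the
    first-order human model y_dot = -a*y + b*u_c (a, b unknown)."""
    y_tilde = y - y_hat                      # response estimation error
    y_hat += (-a_hat * y + b_hat * u_c + lam * y_tilde) * dt
    a_hat += -ga * y_tilde * y * dt          # gradient update for a_hat
    b_hat += gb * y_tilde * u_c * dt         # gradient update for b_hat
    return y_hat, a_hat, b_hat
```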
Combined stability proof of the overall two-loop robot-task system, with Lyapunov function
$$L = \tfrac{1}{2} r^T M(q)\, r + \tfrac{1}{2}\,\mathrm{tr}\{\tilde W^T F^{-1}\tilde W\} + \tfrac{1}{2}\,\mathrm{tr}\{\tilde V^T G^{-1}\tilde V\} + \cdots$$
covering the inner model-tracking error, the NN parameter estimation errors, and the adaptive tuning parameter errors $\tilde a, \tilde b$.
Basic muscle response: a PD controller like that provided by the cerebellum.
2C. Outer‐loop Task Specific Design #3
Reinforcement Learning for minimum human effort
Feedforward assistive control term
[Figure: Outer-loop assistive structure: the human, modeled as a PD element $(K_p + K_d s)$ with nonlinearity $l(\cdot)$ and gain $K_h$ acting on the error between the desired trajectory $x_d$ and the model trajectory $x_m$, exerts force $f_h$ on the prescribed impedance model $(Ms^2 + Bs + K)^{-1}$, whose output is $x_m$.]
Find the robot impedance model parameters $M, B, K$ to minimize the human force effort $f_h$ and the task trajectory-following error $e_d$.
Human force amplifier.
Work of Reza Modares.
The force exerted by the human indicates his discontent: a measure of human intent.
Human model: $(K_d s + K_p)\, f_h = k_e e_d$

Tracking error: $e_d = x_d - x_m \in \mathbb{R}^n$

Augmented errors and states: $\bar e_d = [\,e_d^T\ \dot e_d^T\,]^T = \bar x_d - \bar x_m \in \mathbb{R}^{2n}$, with $\bar x_m = [\,x_m^T\ \dot x_m^T\,]^T \in \mathbb{R}^{2n}$ and $\bar x_d = [\,x_d^T\ \dot x_d^T\,]^T \in \mathbb{R}^{2n}$.
Minimize human effort and tracking error
Performance index:
$$J = \int_t^\infty \left(\bar e_d^T Q_d\, \bar e_d + f_h^T Q_h f_h + u_e^T R\, u_e\right) d\tau$$
Then the control is $u_e = K_1 \bar e_d + K_2 f_h$.
In terms of the augmented state $X$,
$$J = \int_t^\infty \left(X^T Q X + u_e^T R\, u_e\right) d\tau$$
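For reference, the offline baseline that reinforcement learning later replaces: if the augmented dynamics $(A, B)$ were known, the optimal gains would come from solving the algebraic Riccati equation. The sketch below does this with SciPy for a hypothetical 1-DOF augmented state; the state ordering and every number are placeholders, not values from the talk.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical augmented state X = [e_d, de_d/dt, f_h]; all numbers are
# illustrative placeholders.
M, B, K = 1.0, 8.0, 20.0               # prescribed impedance parameters
Ah, Eh = -2.0, 1.5                     # assumed human force dynamics
A = np.array([[0.0,     1.0,    0.0],
              [-K / M, -B / M,  1.0 / M],
              [Eh,      0.0,    Ah]])
Bu = np.array([[0.0], [1.0 / M], [0.0]])
Q = np.diag([10.0, 1.0, 2.0])          # weights on e_d, de_d/dt, f_h
R = np.array([[0.1]])
P = solve_continuous_are(A, Bu, Q, R)  # requires full model knowledge
K_gain = np.linalg.inv(R) @ Bu.T @ P   # optimal control u_e = -K_gain @ X
print(K_gain)
```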
How do we get the human force into the performance index?
Feedback linearization loop.
Robot impedance model plus unknown human model give the overall augmented dynamics.
From the human model $(K_d s + K_p)\, f_h = k_e e_d$, i.e. $K_d \dot f_h + K_p f_h = k_e e_d$,
$$\dot f_h = -K_d^{-1} K_p f_h + k_e K_d^{-1} e_d \equiv A_h f_h + E_h e_d$$
We want an online method to learn the optimal control without knowing the system matrix A.
Optimal Design Always Admits Reinforcement Learning for Real‐time Optimal Adaptive Control
Optimal control is an offline method, based on solving the algebraic Riccati equation (ARE) and knowing all the plant dynamics.
Take enough data along the system trajectory to solve this equation using least squares.
OFF-POLICY Reinforcement Learning needs NO knowledge of the system dynamics.
Off‐policy IRL Bellman equation
( ) ( ) [2 ( ) ] ( ) ( ) ( ) ( )t t t t
T T T T Te e
t t
X t PX t X t PBe d X t Q X t u R u d X t t P X t tt t t+D +D
é ù+ = + + +D +Dê úë ûò ò
Off‐policy term
Finds optimal control gains without using ANY system dynamics
Off-policy Reinforcement Learning needs NO knowledge of the system dynamics.
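A minimal sketch of the policy-evaluation step implied by the off-policy IRL Bellman equation above: one equation per data window, stacked and solved for $P$ and $M = PB$ by least squares, using no knowledge of the system matrix $A$. The data layout and feature ordering are illustrative assumptions, with $e = u - u_e$ taken as the difference between the applied control and the current target policy.

```python
import numpy as np

def irl_policy_evaluation(Xt, Xtp, int_Xe, rho, n, m):
    """Off-policy IRL policy evaluation by least squares over N windows.
    Xt, Xtp : (N, n) augmented states at each window's start and end
    int_Xe  : (N, n*m) window integrals of outer(X, e), with e = u - u_e
    rho     : (N,) window integrals of X^T Q X + u_e^T R u_e
    Returns P (value matrix) and M = P @ B; improved gain is R^-1 @ M.T."""
    def phi(X):                               # quadratic features X_i * X_j
        return np.einsum('ni,nj->nij', X, X).reshape(len(X), -1)
    # The Bellman equation is linear in vec(P) and vec(M):
    Phi = np.hstack([phi(Xt) - phi(Xtp), 2.0 * int_Xe])
    theta, *_ = np.linalg.lstsq(Phi, rho, rcond=None)
    P = theta[:n * n].reshape(n, n)
    P = 0.5 * (P + P.T)                       # enforce symmetry
    M = theta[n * n:].reshape(n, m)           # estimate of P @ B
    return P, M
```

Iterating this evaluation with the gain improvement $K \leftarrow R^{-1}M^T$ converges, under the usual conditions, to the same gains the ARE would give, but from trajectory data alone.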
3. Experimental Results on PR2
3. Experimental Results: work of Isura Ranatunga and Sven Cremer
Point‐to‐point tracking error Human force effort
Future Work
Thanks !!