Autonomous Motion Learning for Near Optimal Control
By Alan Jennings, School of Engineering, University of Dayton
Dayton, OH, August 2012
Dissertation defense in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical Engineering
Alan Jennings, Dissertation Defense, July 2012 2
Motivation
Consider human learning:
• Intelligent system: able to solve new problems and become an expert
Consider computer accomplishments:
• Beat chess & Jeopardy! grandmasters
• But one cannot be repurposed for the other
Consider general purpose learning:
• People can grow up to be presidents, design fashion, play croquet, identify liars, train animals, predict weather…
• This has not been accomplished by machines
The foundation for general purpose learning is a developmental framework:
• Shaped by environment & experiences
• Complex value systems guide learning
• Infant stages restrict exploration until basic skills are established
IBM’s Deep Blue beat Kasparov in their second match in 1997; IBM’s Watson beat Jennings and Rutter in 2011.
In 2011, Google gained Nevada licenses for self-driving cars.
Google Prius image: Flickr user Steve Jurvetson
Context in Developmental Learning
• Developmental learning seeks to mimic the progressive learning process
– Infant -> Toddler -> Child -> Young adult -> …
– The solution/knowledge should be unguided by the programmer
• Learning basic tasks supports learning high-level tasks
– Proverbial walking before running
• The robot then learns general tasks of increasing complexity at increasing proficiency
– Does not require reasoning/understanding/consciousness
My Contributions
Autonomous motion learning:
• General purpose rigid body motion optimization
– Provides a novel high-level interface at the robot geometry level
– Allows novice roboticists or computers to design motions
– However, has high computation requirements
• Optimal inverse functions from a global search
– Organizes motions into continuous, optimal inverse functions
– Provides a set of reflexive responses for use online
– Efficiently searches a high-dimensional space using agents & local gradients
• Improving motions by unbounded resolution
– Nodes are added to an interpolation, approaching the optimal continuous function in the limit
– Efficiently collects and “understands” experiences
– Motions are not limited by initial programming resolution or an initial training time limitation
Motivating Example
• Use of general purpose programs to solve control problems
– Use a CAD package to draw the robot
– Use a kinematics program for the equations of motion
– Use an optimal control program to solve
• The optimal control problem is introduced
– Finding the input with the lowest cost among inputs satisfying constraints
Motivating Example
Use of general purpose programs for solving control problems. The optimal control problem: finding the control input with the lowest cost among inputs satisfying constraints.
[Workflow diagram: Draft project -> Dynamics (mass & joints) -> Set up Simulink -> Set up DIDO -> Optimal control. Guiding questions: What does it look like? What are the controls? What is being attempted?]
Human creativity comes in at the design level, not the optimization.
Motivating Example
• Typically solved by discretizing over time
– Optimize a set of variables, not the continuous function
– Local search method
– Applies to an isolated problem
• Changing the final value requires a new optimization
General Optimal Control Problem: find x(t), u(t) minimizing
  J = ϕ(x(t_f)) + ∫ g(x(t), u(t)) dt
subject to the system dynamics, initial conditions ψ_o with x_o ∈ X_o, and final conditions ψ_f with x_f ∈ X_f.
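The discretize-then-optimize approach described above can be sketched in a few lines. This is an illustrative stand-in, not the dissertation's DIDO/Simulink setup: the double-integrator plant, horizon, and forward-Euler integration are all assumptions made for the example.

```python
import numpy as np
from scipy.optimize import minimize

# Discretized optimal control sketch: drive a double integrator from
# x = 1, v = 0 to the origin while minimizing the integrated input squared.
N, T = 20, 1.0
dt = T / N

def simulate(u):
    """Forward-Euler rollout of the double integrator under input sequence u."""
    x, v = 1.0, 0.0
    for uk in u:
        x, v = x + v * dt, v + uk * dt
    return x, v

def cost(u):
    return np.sum(u**2) * dt          # J = sum u_k^2 * dt  (≈ ∫ u^2 dt)

def terminal(u):
    """Equality constraint: the final state must be the origin."""
    x, v = simulate(u)
    return [x, v]

# Optimize a finite set of variables (the u_k), not the continuous function.
res = minimize(cost, np.zeros(N), constraints={"type": "eq", "fun": terminal})
```

As the slide notes, this solves one isolated problem: changing the final value means running the whole optimization again.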
Motivating Example: Motion Primitive
Example Problem:
  System: pendulum actuated at the base
  Cost: (Torque)², J = ∫ u(τ)² dτ
  Output: initial disturbance, y = θ(t₀)
  Constraints: reach the final value, θ(t_f) = 0; saturation, -u_max ≤ u(t) ≤ u_max
The way forward: if the system dynamics and initial state are repeatable, then the problem is really only to find a control signal. Continuous signals can be approximated by parameterization, so motion primitives can be composed solely by a vector function of an output.
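The reduction above can be made concrete: once the control signal is parameterized by a coefficient vector a, a motion primitive is just a vector-valued function a = h(y) of the desired output. The polynomial parameterization and the linear map below are hypothetical stand-ins for the learned primitive, chosen only to illustrate the structure.

```python
import numpy as np

def control_from_params(a, t):
    """u(t) reconstructed from its parameterization: a polynomial in t
    with coefficient vector a (one possible finite parameterization)."""
    return sum(ak * t**k for k, ak in enumerate(a))

def primitive(y_desired):
    """Hypothetical motion primitive h(y): maps a desired output value
    to a full parameter vector, standing in for the learned inverse."""
    return np.array([y_desired, -2.0 * y_desired, y_desired])

# Querying the primitive yields a parameter vector, hence a whole signal.
t = np.linspace(0.0, 1.0, 5)
a = primitive(0.5)
u = control_from_params(a, t)
```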
Diversity and Progression in Motion Primitives
• Continuous, optimal inverse function
– Motion primitives should be continuous so that changes in the system behavior are not abrupt
• Global search required for discovery
– A global search offers the possibility of finding alternative motion primitives
– Finding isolated optima requires testing candidates which local conditions indicate would give worse performance
• Progression via increasing resolution
– After optimizing at a given resolution, the signal is limited by the optimal signal not lying in the space of the parameterization, so the resolution must be increased to improve performance
Optimal Inverse Functions: High-Level Concept
• The population covers a broad area and uses local gradients to improve.
• Converging agents are removed, so the number of agents quickly drops.
• Settled agents create a motion primitive and use the local gradient to expand to new outputs.
• The operator has a choice of inverse functions to select from.
– Can use softer criteria for preference.
• Inverse functions are continuous and easily calculated, making them suited for real-time use.
[Flow diagram]
Optimization: Initialize population -> Move agents (lower J(x), maintain f(x)) -> Check for removal or settling conditions -> Form cluster -> Set of h_k(y_d)’s
Execution: Operator provides y_d -> Evaluate the h_k -> Select an inverse function h_k(y_d) -> Move to the new x*
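The optimization phase of this flow can be sketched as a skeleton loop. Everything here is an illustrative stand-in for the dissertation's procedure: the step rule, settling test, and pruning threshold are supplied by the caller, and the toy usage below uses plain gradient descent with no output constraint.

```python
import numpy as np

def run_population(agents, step, settled, too_close, max_iter=100):
    """Skeleton of the population-based search: move all agents, prune
    agents that have converged onto a neighbor, and let settled agents
    seed clusters (the starting points of motion primitives)."""
    clusters = []
    for _ in range(max_iter):
        agents = [step(x) for x in agents]
        kept = []
        for x in agents:
            if any(np.linalg.norm(x - k) < too_close for k in kept):
                continue                    # remove converging agents
            if settled(x):
                clusters.append(x)          # seed a motion primitive
            else:
                kept.append(x)
        agents = kept
        if not agents:
            break
    return clusters

# Toy usage: minimize J = ||x||^2 by a contraction step; both agents settle.
clusters = run_population(
    agents=[np.array([1.0, 0.0]), np.array([0.0, 1.0])],
    step=lambda x: 0.6 * x,
    settled=lambda x: np.linalg.norm(x) < 1e-3,
    too_close=1e-4)
```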
Optimal Inverse Functions: Mechanics of the Method
Improving a given agent:
1. Restrict motion to the null space of the output gradient
2. Move opposite the cost gradient
• Saturation: large gradients have their effect limited; a small cost gradient gives a small step; a small output gradient eases the null-space restriction
• A boundary constraint reduces the step length
• A minimum step size defines settling
• Particles too close together are removed, quickly reducing the population size
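Steps 1 and 2 above amount to a projected gradient step: descend the cost gradient after projecting out the component along the output gradient, so the output value is maintained to first order. A minimal sketch (the step size and the toy cost/output functions are illustrative assumptions):

```python
import numpy as np

def agent_step(x, grad_J, grad_f, alpha=0.1):
    """One agent update: move opposite the cost gradient, restricted to
    the null space of the output gradient so f(x) is held to first order."""
    g = grad_f / (np.linalg.norm(grad_f) + 1e-12)
    proj = np.eye(len(x)) - np.outer(g, g)   # null-space projector of grad_f
    return x - alpha * proj @ grad_J

# Toy example: cost J = x1^2 + x2^2 with output f = x1 + x2.
x = np.array([1.0, -0.5])
x_new = agent_step(x, grad_J=2 * x, grad_f=np.array([1.0, 1.0]))
```

The step lowers J while leaving f = x1 + x2 unchanged, which is exactly the "lower J(x), maintain f(x)" rule from the flow diagram.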
[Figure: an agent’s step in the parameter plane, showing the output direction, cost direction, and resulting step.]
Optimal Inverse Functions: Mechanics of the Method
Forming a cluster of optimal points:
1. Change the output by moving along the output gradient
2. Repeat the optimizing steps
• Test for continuity/optimality:
– The output changes in the expected direction
– Not too far (discontinuity)
– Not too close (ill-conditioned surface)
– Settled (optimality satisfied)
[Figure: cluster growth for decreasing and increasing y_d.]
Optimal Inverse Functions: Testing of the Method
[Figures: four 2-D test outputs — Peaks, x1²(x2-c1)², (x1-c1)·x2², and x1·sin(c1·x2) — optimized under quadratic, linear/quadratic, and periodic costs. Each case shows the optimal inverse functions h_i (x1 and x2 versus y_d), the cost J versus y_d, and the final clusters in the (x1, x2) plane; final cluster counts of 5, 11, 9, and 7 appear across the cases.]
• Combination of functions: multiple extrema and saddle points; 2-dimensional for verification
• Expected result: clusters between output extrema
Optimal Inverse Functions: Testing of the Method
[Figure: quadratic cost with a periodic-linear output; the optimal inverse functions h_i (x1 and x2 versus y_d), the cost J versus y_d, and the final 5 clusters in the (x1, x2) plane.]
Optimal Inverse Functions: Practical Example
• Robot control problem
– Precision depends on the pose
– Radial precision is optimized via joint angles for varying radial distance
• Planar robot: Motoman HP-3
• Complex robot: Motoman IA-20
Optimal Inverse Functions: Practical Example
• Each link has a different radius to the tip and therefore a different sensitivity
• In addition, the direction of sensitivity is different
• The problem effectively finds the joint locations that reduce sensitivity in the radial distance
[Figure: links are shown by solid arrows; the effective length to the tip is shown by a dashed arrow; the arc showing the sensitivity for each joint is matched by color.]
Optimal Inverse Functions: Practical Example
• The output is adjusted as desired (with the additional task of finding the angle of the plane and the in-plane angle)
• The operator selects an inverse function
Optimal Inverse Functions
• The method searches a large space efficiently by:
– Having agents congregate to locally optimal solutions (increasing the effective search area of each), and
– Eliminating neighboring points (once the locations of optima are sketched out, fewer agents are needed).
• Sets of continuous, optimal inverse functions
– Can be used in real time, and
– Reduce the burden on the operator without reducing optimality.
Unbounded Resolution: High-Level Concept
• To have continuous learning, the resolution must be unbounded.
• Unbounded resolution leads to exponential growth in complexity.
• The method must therefore make efficient use of experience.
[Block diagram, three phases:]
Developing the reflex function: the memory model supplies (y, J) estimates for candidate parameters a; optimization produces the optimal parameters a* and stores the reflex function a*(y_d).
Using the reflex function: the operator or a higher-level planner provides y_d; the reflex function returns a*, the cubic interpolation produces u(t), and the system executes it.
Developing the memory model: queried parameters a_q and the measured (y(a_q), J(a_q)) are added to the memory model.
Unbounded Resolution: Mechanics of the Method
[The same reflex-function block diagram as the previous slide.]
System assumptions:
• t and a are bounded
• y(a) and J(a) are in C² and constant (time-invariant)
Unbounded Resolution: Why Cubic Interpolation
• Adding a node to a cubic interpolation allows all experiences to be transferred.
• Power series parameters become ill-conditioned as the effective area of the basis approaches extremes.
• Fourier series parameters typically create a less smooth optimization surface.
• A radial basis function’s scaling parameter is either too small at low resolutions or too large at high resolutions, and automatically changing it means data cannot be mapped exactly.
• Sigmoid neural network parameters are large with respect to the input magnitude, resulting in poor optimization scaling.
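The first bullet — that node insertion transfers all experiences — can be demonstrated directly: sample the current cubic curve at the new node location and refit, and every existing node value is reproduced exactly. A small sketch using SciPy's `CubicSpline` (the node locations and values are arbitrary illustrative data):

```python
import numpy as np
from scipy.interpolate import CubicSpline

# The control signal is stored as a cubic interpolation of node values.
t_nodes = np.array([0.0, 0.5, 1.0])
u_nodes = np.array([0.2, 0.8, 0.4])
coarse = CubicSpline(t_nodes, u_nodes)

# Insert a node by sampling the current curve at the new location,
# then refit at the higher resolution.
t_new = 0.25
t_fine = np.sort(np.append(t_nodes, t_new))
u_fine = coarse(t_fine)
fine = CubicSpline(t_fine, u_fine)
```

By the interpolation property, the refit curve passes through all of the old nodes and the new one, so no stored experience is lost when the resolution grows.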
Unbounded Resolution: Why Locally Weighted Regression
• Locally weighted regression performs a least-squared-error regression where the error is scaled by the distance to the test point.
– Local weighting allows global nonlinear behavior.
• Quadratic regression accurately models optima.
• Provides the gradient (and Hessian) for optimization.
• Directions with insufficient data are identified from the eigenvalues.
– Allows autonomously determining which samples must be tested.
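A one-dimensional sketch of the idea, assuming Gaussian weights and a quadratic local model (the bandwidth and test function are illustrative, not the dissertation's settings). Fitting in coordinates centered at the query point makes the regression coefficients the local Taylor coefficients, so the value, gradient, and Hessian fall out directly:

```python
import numpy as np

def lwr_quadratic(xq, X, Y, h=0.5):
    """Locally weighted quadratic regression at query point xq.
    Errors are down-weighted with distance to xq, so the quadratic fit
    is local; its coefficients give the value, gradient, and Hessian."""
    w = np.exp(-((X - xq) / h) ** 2)           # Gaussian distance weights
    A = np.vander(X - xq, 3, increasing=True)  # columns: [1, dx, dx^2]
    W = np.diag(w)
    beta = np.linalg.solve(A.T @ W @ A, A.T @ W @ Y)
    value, grad, half_hess = beta              # Taylor coefficients at xq
    return value, grad, 2.0 * half_hess

# On noiseless quadratic data the fit recovers the exact derivatives.
X = np.linspace(-1.0, 1.0, 21)
Y = X**2
v, g, H = lwr_quadratic(0.3, X, Y)
```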
Unbounded Resolution: Testing of the Method
Problem design:
  Cost: (distance to a sine wave)², J₂ = ∫ (u(τ) - (sin(2πτ) + 2)/4)² dτ
  Output: average value, y = ∫ u(τ) dτ
  Saturation applied to u(t)
Results: sinusoidal shape, saturating at the closer side.
Motivation: possibly internal resonance, distance traveled, material processed, …
Internal limitations: flattens peaks in the absolute distance -> minimize RMS.
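The slide's cost and output are easy to evaluate numerically; a sketch with a uniform grid over [0, 1] (the grid size and the clipped candidate signal are illustrative assumptions):

```python
import numpy as np

# Test problem from the slide: J2 = ∫ (u - (sin(2πτ)+2)/4)^2 dτ,
# output y = ∫ u dτ, both over τ ∈ [0, 1] (approximated on a uniform grid).
tau = np.linspace(0.0, 1.0, 1001)
target = (np.sin(2 * np.pi * tau) + 2.0) / 4.0

def cost(u):
    return np.mean((u - target) ** 2)   # ≈ ∫ (u - target)^2 dτ

def output(u):
    return np.mean(u)                   # ≈ ∫ u dτ

# A saturated candidate: clipping the target flattens its peaks,
# which is the "saturate at the closer side" behavior the slide reports.
u = np.clip(target, 0.0, 0.45)
```

The unclipped target has zero cost and an average value of 0.5, so any saturation limit below the target's peak trades cost for feasibility.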
Unbounded Resolution: Testing of the Method
• Near optimal compared to direct optimization
• Exponential learning rate
• Waveform results: the result exploits saturation. Going from 4 to 9 nodes, the cost decreases but the shape appears identical by sight.
Unbounded Resolution: Practical Example
• Objective: control the motor voltage to spin the motor to a given speed at a set time with the minimum peak current.
• Only modifications:
– Adjusted parameters for the ranges of u, y & J
– Increased the measure of data required, to deal with process variation
– Ideal cost based on steady state
[Diagram: the method outputs a voltage u(t) to an amplifier driving the motor (unknown to the method); a tachometer and a current peak detector return y and J, sampled after the run rather than continuously.]
Unbounded Resolution: Practical Example
• Completely automated
• Progressive improvement
• Sizable variation
– Direct optimization on an average of 10 trials still did not converge
– However, LWR provided a sufficiently accurate estimate of the gradients to converge
• Thirteen sets of data
– Multiple runs gave similar results
Unbounded Resolution: Practical Example
• 7 dimensions in 17 hours
– About 40,000 samples
– Method parameters were not optimized
• Results make sense
– The final voltage determines the output
– The initial voltage is very similar
– The initial slope flattens
My Contributions
Related publications and presentations:
• Journal submissions
– “Unbounded Motion Optimization by Developmental Learning,” revision submitted to IEEE Systems, Man and Cybernetics Part B
– “Optimal Inverse Functions Created via Population Based Optimization,” submitted to IEEE Systems, Man and Cybernetics Part B
• Conference presentations
– “Memory-Based Motion Optimization for Unbounded Resolution,” Computational Intelligence and Bioinformatics, IASTED, 753-31, Nov 2011
– “Population Based Optimization for Variable Operating Points,” Congress on Evolutionary Computation, IEEE, Jun 2011
– “Constrained Near-Optimal Control Using a Numerical Kinetic Solver,” Robotics and Applications, IASTED, 706-21, Nov 2010
– “Biomimetic Learning, Not Learning Biomimetics: A Survey of Developmental Learning,” National Aerospace and Electronics Conference (NAECON), IEEE, July 2010
• Posters
– “Memory Based Optimization for Unbounded Learning,” 2011 Great Midwest Regional Space Grant Consortia Meeting, also NASA Futures Forum, Feb 2012
– “Constrained Near-Optimal Control Using a Numerical Kinetic Solver,” 2009 Great Midwest Regional Space Grant Consortia, 3rd place
Commencement
Future applications:
• Implementation for novel locomotion
– Implement on an inchworm robot
– The challenge is automating the tests, such as defining distance traveled
– Would be very interesting to reduce variation
• Learn a control law for regulation
– Develop a control law for the pendulum
– Open questions: what disturbance to use, and what metric for cost or output (possibly response time, with the operator setting the urgency)
• Address multidimensional outputs
– Robots are used to provide multiple outputs
– A manifold of the output may not be representable in the output space (think of a screw thread: despite moving continuously, there are multiple surfaces with the same horizontal coordinates)