machine learning for control of morphing air vehicles john valasek, amanda lampton, adam niksch...

Post on 12-Jan-2016

220 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Machine Learning for Control

of Morphing Air Vehicles

John Valasek, Amanda Lampton, Adam Niksch

Aerospace Engineering DepartmentTexas A&M University

SAE Aerospace Control and Guidance Systems Committee Meeting 99

1 March 2007Boulder, CO

Valasek, Lampton, Niksch - 2

Student Research Team

2006 - 2007

Valasek, Lampton, Niksch - 3

Briefing Agenda

Shape Changing / Morphing for Micro Air Vehicles

Learning

Control

Learning and Control– Adaptive-Reinforcement Learning Control

Some Results

Some Conclusions

Research Issues

Extensions

Valasek, Lampton, Niksch - 4

Biomimetic Example 1

Valasek, Lampton, Niksch - 5

300 million year old design

97% success rate in capturing prey

Flight relies entirely on 3-D unsteady aerodynamics

Unstable in all axes

Can hover as well as fly from a forward speed of +100 body

length/sec (~ 60 mph) to - 3 bl/sec within 5 bl

Can alter heading by 90 degs in less than 0.1 secs

Dragonflies and other flying insects have the fastest visual processing

system in the animal kingdom

Biomimetic Example 2

These feats are accomplished with the neuro-circuitryof a brain smaller than a sesame seed.

Andres Meade, Rice University

Valasek, Lampton, Niksch - 6

Biomimetic Example 3

Courtesy Graham Taylor, Oxford University

Cossack

Valasek, Lampton, Niksch - 7

Biomimetic Example 3

Courtesy Graham Taylor, Oxford University

Valasek, Lampton, Niksch - 8

Biomimetic Example 3Cossack sysid using OKID

Valasek, Lampton, Niksch - 9

Biomimetic Example 3

Courtesy Graham Taylor, Oxford University

Kinematic Engineering Model of Eagle Wing

Valasek, Lampton, Niksch - 10

Reconfigurable; shape changing (morphing); rigid or elastic body plants Multi-Input Multi-Output (MIMO)systems Distributed sensing, actuation, and propulsion

– Multiple, possibly non co-located sensors– Can have large number of actuators control allocation problem– High bandwidth actuators gust alleviation

Robust Adaptive Control– Flow control– Exogenous inputs (turbulence and gusts)– Damage and fault tolerance

Distributed and limited processing capability– Volumetric limit– Mass limit– Power limit– Real-time operation

Integrated guidance, navigation, and control– Self learning, self adapting, self tuning

Biomimetic Flight Summary

Information processing system

Valasek, Lampton, Niksch - 11

A Multi-Role Platform that:

Changes its state substantially to adapt to changing mission environments.

Provides superior system capability not possible without reconfiguration.

Uses a design that integrates innovative combinations of advanced materials, actuators, flow controllers, and mechanisms to achieve the state change.

Morphing Aircraft

Aerospace America, Feb 2004

DARPA’s definition

Changes On The Order Of 50% More Wing Area Or Wing Span And Chord

Valasek, Lampton, Niksch - 12

Morphing for Mission Adaptation

– Large scale, relatively slow, in-flight shape change to enable a single vehicle to perform multiple diverse mission profiles

or:

Morphing for Control

– In-flight physical or virtual shape change to achieve multiple control objectives (maneuvering, flutter suppression, load alleviation, active separation control)

MissionAdaptation

Control

John Davidson, NASA Langley, AFRL Morphing Controls Workshop – Feb 2004

Which Morphing?

Valasek, Lampton, Niksch - 13

Lockheed Martin

NextGen&

Barron Associates

Large Morphing Air Vehicle Models

Valasek, Lampton, Niksch - 14

Small Morphing Air Vehicle Models

Cornell (collaborator) Garcia & Lipson

– Morphing dynamical model and simulation that incorporates aerodynamic and structural effects

– Validated with experimental data– Basic morphing parameters (incidence angle, dihedral angle)

Maryland Hubbard

– Actuating a flapping wing structure with SMA’s– Structure, material, distribution of actuators

Florida Lind

– Morphing flight demonstrator vehicle– Basic morphing parameters (dihedral angle, sweep angle)– H-inf control

lots of modeling, very little control

Valasek, Lampton, Niksch - 15

• Use biologically inspired approach to understand and control the: • morphing (shape changing),

• flight control (stability, flight path, gust tolerance, stall)

• maneuvering (perching and hovering)

of multi-mission micro air vehicles.

• Control the physics, in concert with nonlinearities.• Not as a robustness bandaid for aerodynamic and structural

uncertainties & lack of understanding

• Shape change the entire vehicle, not a component on the vehicle• DARPA definition

Technical Approach and Scope

Valasek, Lampton, Niksch - 16

• Address:

•WHEN to morph, perch, etc.

•HOW to morph, perch, etc.

•LEARNING to morph, perch, etc.

All while keeping the (pointy?) end facing forward.

Big Picture Research Goals

Valasek, Lampton, Niksch - 17

Valasek, Lampton, Niksch - 18

Common Mathematical Domain Helps

To Avoid Ad-hoc Approaches

??

Machine Learning

Reconfiguration Policy

State based methods

learning

Adaptive Control

Parameters in a Known

Functional Relationship

State based methods

adapting

The Mathematical Domain Issue

Valasek, Lampton, Niksch - 19

Adaptive-ReinforcementLearning Control (A-RLC)

Conceptual Control Architecture

for Reconfigurable or Morphing Aircraft

SAMIStructured Adaptive Model Inversion

(Traditional Control)

Flight controller to handle wide variation in dynamic properties

due to shape change

ML Machine Learning(Intelligent Control)

Learn the morphing dynamics and the optimal shape at every flight

condition in real-time

Valasek, Lampton, Niksch - 20

Synthetic Jets for Virtual Shaping and Separation Control

MultiSensor MEMS Arrays for Flow Control Feedback

Adaptive Controller

Sensed Information Aggregation

Control Information Distribution

Reconfiguration Command

Generation

System Performance Evaluation

Knowledge

Base

Environment

Control Architecture

Valasek, Lampton, Niksch - 21

2-D Plate Rectangular Block Ellipsoid Delta Wing Final 2004 2005 2006 2007 Objective

Morphing Air Vehicle Evolution

Valasek, Lampton, Niksch - 22

Morphing Air Vehicle Model - TiiMY

Shape

Ellipsoidal shape with varying axis dimensions.

Constant volume (V) during morphing

2 independent variables: y and z, dependent dimension

Morphing Dynamics

Smart material: shape memory alloy (SMA)

Morphing Dynamics : Simple Nonlinear Differential Equations

6Vx

yz

y-dimension 2

z-dimension 3y

z

y yy V

z zz V

Valasek, Lampton, Niksch - 23

Morphing Dynamics

Y-morphing Z-morphing

Valasek, Lampton, Niksch - 24

Shape Morphing Simulation TiiMY

Valasek, Lampton, Niksch - 25

Optimal Shapes atVarious Flight Conditions

Optimality is defined by identifying a cost function.

J=J (Current shape, Flight condition)

2 2

0.5

( ( )) ( ( ))

3 cos( ) and 2 22

y z y z

Fy z

J J J y S F z S F

S F S e

Valasek, Lampton, Niksch - 26

6-DOF Mathematical Modelfor Dynamic Behavior

Variables

Nonlinear 6–DOF Equations

– Kinematic level:

– Acceleration level:

Drag Force

– Function of air density, square of velocity along axis, and projected area of the vehicle perpendicular to the axis

[ ] [ ]

[ ] [ ]

T Tc x y z c

T T

p d d d v u v w

p q r

;

c l c a

c c d

d

p J v J

mv mv F F

I I I M M

additional dynamics due to morphing

Valasek, Lampton, Niksch - 27

Learning

Valasek, Lampton, Niksch - 28

– Machine Learning

• Learn the optimal control mechanism by wind-tunnel experiments & flight tests

• Possible learning algorithms include Artificial Neural Networks (ANN) , Explanation-Based Learning (EBL), and Reinforcement Learning (RL)

Candidates to develop inference mechanism– Rules-Based Expert System

• Model the knowledge of human experts

• Imitate the natural behaviour of birds

Question: How Many Control Theorists Does It Take To Change A Using

Artificial Neural Networks ?

Knowledge Based Control

– Biologically inspired control process• Mimic the behaviour of birds

Question: How Many Control Theorists Does It Take To Change A Using

Reinforcement Learning ?

Valasek, Lampton, Niksch - 29

Simple Example

Valasek, Lampton, Niksch - 30

Reinforcement Learning - 1

Supervised or unsupervised learning? Sequential decision making.– Knowledge is based on experience and interaction with the environment, not

on input-output data supplied by an external supervisor

Achieves a specific goal by learning from interactions with the environment.– Considers state information (s)– Performs sequences of actions, (a), observing the consequences– Attempts to maximize rewards (r) over time

• These specify what is to be achieved, not how to achieve it

– Constructs a state value function (V)• Learns an optimal control policy

Memory is contained in the state value function

* *arg max[ ( , ) ( ( , ))]a

r s a V s a

21 2 3

0

... it t t t t i

i

R r r r r

*( ) max ( )V s V s for all s

Valasek, Lampton, Niksch - 31

Reinforcement Learning - 2

Learning is done repetitively, by subjecting to different scenarios

Learning is cumulative and lifelong

Formulations are generally based on Finite Markov Decision Processes (MDP)

• 3 major candidate algorithms: – Dynamic programming

– Monte Carlo methods

– Temporal Difference Learning

Valasek, Lampton, Niksch - 32

1. Actor takes action based upon states and preference function2. Critic updates state value function, and evaluates action3. Actor updates preference function

Learning is done repetitively, by subjecting to different scenarios

Reinforcement Learningself training

Valasek, Lampton, Niksch - 34

Two Illustrative Examples

Valasek, Lampton, Niksch - 35

State: – Gain for δa = Kφ (φcmd - φ)

Actions: – Increase gain by small amount– Decrease gain by small amount

Constraints/Boundaries– Max overshoot– Rise time– Settling time

Interesting Features– e-greedy policy incorporated– Upward annealing of γ

incorporated

max os = 2%Tr = 8sTs = 10s

Cessna 208B Super Cargomaster

3 2

4 3 2

3 15 9.0 6.0 19.7

4.8 5.2 11.1 0.01a

e S S SS

S S S S

Matlab: ~250 sec real-time for 1000 learning episodes

familiar example

Reinforcement Learning

Valasek, Lampton, Niksch - 36

Reinforcement Learningfamiliar example

Valasek, Lampton, Niksch - 37

Reinforcement Learningfamiliar example

Valasek, Lampton, Niksch - 38

0

20

40

60

80

100

120

-4-2024

-505

x

z

y

Start

Finish

Smart Block Demo 1

aerial obstacle course

Valasek, Lampton, Niksch - 40

Smart Block: First Try

Valasek, Lampton, Niksch - 41

Smart Block: Second Try

Valasek, Lampton, Niksch - 42

Smart Block: New Course

Valasek, Lampton, Niksch - 51

Adaptive–Reinforcement Learning Control

Valasek, John, Tandale, Monish D., and Rong, Jie, "A Reinforcement Learning Adaptive Control Architecture for Morphing,” Journal of Aerospace Computing, Information, and Communication, Volume 2, Number 4, pp. 174-195, April 2005.

Valasek, John, Doebbler, James, Tandale, Monish D., and Meade, Andrew J., "Improved Adaptive-Reinforcement Learning Control for Morphing Unmanned Air Vehicles,”Journal of Aerospace Computing, Information, and Communication (in review).

Tandale, Monish D., Rong, Jie, and Valasek, John, "Preliminary Results of Adaptive-Reinforcement Learning Control for Morphing Aircraft,” AIAA-2004-5358, Proceedings of the AIAA Guidance, Navigation, and Control Conference, Providence, RI, 16-19 August 2004.

Valasek, Lampton, Niksch - 52

A-RLC Architecture

Valasek, Lampton, Niksch - 53

Air Vehicle Example

Valasek, Lampton, Niksch - 54

Objective– Demonstrate optimal

shape morphing for multiplespecified flight conditions

Method– For every flight condition, learn

optimal policy that commands

voltage producing the optimal shape– Minimize total cost over the entire flight trajectory– Evaluate the learning performance after 200 learning episodes

Air Vehicle Example

RL Module is Completely Ignorant of Optimality Relations and Morphing Control Functions:

It Must Learn On Its Own, From Scratch

Valasek, Lampton, Niksch - 55

Agent: Morphing Air Vehicle Reinforcement Learning Module

Environment: Various flight conditions

Goal: Fly in optimal shape that minimizes cost

States: Flight condition; shape of vehicle

Actions: Discrete voltages applied to change shape of vehicle– Action set:

Rewards: Determined by cost functions

Optimal control policy: Mapping of the state to the voltage leading to the optimal shape

Timmy Demo

reinforcement learning definitions

( , ) [2,4] [0,5]

( , ) [2,4] [0,5]

y FS

z F

( , ) [0, 0.5, 1, 5] ; [0, 0.5, 1, 4]y zA V V

Valasek, Lampton, Niksch - 56

Episodic Learning

Unsupervised learning episode

Single pass through 100 meter flight path in 200 seconds

Reference trajectory is generated arbitrarily

The flight condition changes twice during each episode

Shape change iteration after every 1 second

Exploration-exploitation dilemma:– Explorative early, exploitative later -policy with decreasing

Limited training examples– Only 6 discrete flight conditions:

2000 samples for KNNPI) rand(aa

asQ a

i

a

else

),(maxarg

) -1 ty (probabili if policy greedy-ε

Valasek, Lampton, Niksch - 57

Demo

Valasek, Lampton, Niksch - 58

Comparison of True Optimal Shape and Learned Shape

KNN learns poorly for

several flight conditions

Valasek, Lampton, Niksch - 59

Function Approximation– Errors remained which could not be eliminated with additional training.

– Use Galerkin-based Sequential Function Approximation (SFA) to approximate the action-value function Q(s,a)

What Happened?

Valasek, Lampton, Niksch - 60

New SFA approach learns optimal shape well

Comparison of True Optimal Shape and Learned Shape

Valasek, Lampton, Niksch - 61

Comparison

Normalized RMS error

Y dimension Z dimension

K N N 1.42 0.821

S F A 1.27 0.661

10% reduction 20% reduction

Valasek, Lampton, Niksch - 68

Reinforcement Learning successfully learns the optimal control policy that results in the optimal shape at every flight condition. – Can function in real-time, leading to better performance as system

operates over the long term.

Adaptive-Reinforcement Learning Control is a promising candidate for control of Mission Morphing.– Maintains asymptotic tracking in the presence of parametric

uncertainties and initial condition errors.

Shape Changes for “Mission Morphing” can be treated as piecewise constant parameter changes– SAMI is a favorable method for trajectory tracking control

“Morphing for Control” will require different control strategy– Piecewise constant approximation no longer valid

What Does This Show?

Valasek, Lampton, Niksch - 69

Issues & Future Directions

Realistic structural response effects – Aeroelastic behaviour– SMA models and hysteretic behaviour

• Priesach model is algebraic, only has major hysterisis loops• Solution: roll your own with R-L

SMA Hysteresis

-0.005

0

0.005

0.01

0.015

0.02

0.025

0.03

0.035

0 20 40 60 80 100

Temperature (C)

Str

ain

Experimental Major Math Model Experimental Minor

Valasek, Lampton, Niksch - 70

Issues & Future Directions

Time scale problem: control methodologies to handle faster shape changes– Hovakimyan’s Adaptive Control– Linear Parameter Varying (LPV) control

Novel distributed sensing and distributed actuation on a large(!) scale– Insect and avian inspired sensing

Learning on a continuous domain– Continuous versus discrete

Modify the simulation to include a more advanced aircraft model– Wing-Body, Wing-Body-Empennage, etc.

Build and fly R/C class morphing demonstrator UAV

Valasek, Lampton, Niksch - 71

Issues & Future Directions

Incorporate aerodynamic and structural effects due to large shape changing

Cost Function– Potential components

• Specified CL

• Minimum drag• Minimum peak stress

Degrees of Freedom– Thickness

• 6% to 24%– Camber

• 0% to 10%– Max camber location

• 0.2c to 0.8c– Chord

• 1 unit to ? units– Angle-of-attack

• -5° to 10 °– Within linear range

R-L For Morphing Airfoils & Wings

Valasek, Lampton, Niksch - 72

Morphing Airfoil Demonstration

Valasek, Lampton, Niksch - 73

Questions?

John Valasek

Aerospace Engineering Department

Texas A&M University

3141 TAMU

College Station, TX 77843-3141

(979) 845-1685

valasek@aero.tamu.edu

FSL Web Page

– http://flutie.tamu.edu/~fsl

top related