George Konidaris, gdk@cs.duke.edu, Spring 2015. Machine Learning.
Machine Learning
Subfield of AI concerned with learning from data.

Broadly, using:
• Experience
• To Improve Performance
• On Some Task

(Tom Mitchell, 1997)
Unsupervised Learning
Input: X = {x1, …, xn}

Try to understand the structure of the data.

E.g., how many types of cars? How can they vary?
Dimensionality Reduction
X = {x1, …, xn}

If n is high, data can be hard to deal with.
• High-dimensional decision boundary.
• Need more data.
• But data is often not really high-dimensional.

Dimensionality reduction:
• Reduce or compress the data
• Try not to lose too much!
• Find intrinsic dimensionality
Dimensionality Reduction
For example, imagine if x1 and x2 are meaningful features, and x3 … xn are random noise.

What happens to k-nearest neighbors?
What happens to a decision tree?
What happens to the perceptron algorithm?
What happens if you want to do clustering?
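The effect on k-nearest neighbors can be seen in a small experiment (an illustrative sketch, not from the slides; the toy data, noise scale, and the 1-NN classifier are all made up for the demonstration):

```python
import numpy as np

rng = np.random.default_rng(1)

def nn_accuracy(Xtr, ytr, Xte, yte):
    """1-nearest-neighbor accuracy under Euclidean distance."""
    d = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    return (ytr[d.argmin(axis=1)] == yte).mean()

def make(n):
    """Two well-separated classes in 2 meaningful dimensions."""
    y = rng.integers(0, 2, n)
    X = rng.normal(size=(n, 2)) + 3 * y[:, None]
    return X, y

Xtr, ytr = make(200)
Xte, yte = make(200)
clean = nn_accuracy(Xtr, ytr, Xte, yte)

# Append 50 dimensions of pure noise: distances become dominated by noise,
# so the nearest neighbor is chosen almost at random.
noisy = nn_accuracy(np.hstack([Xtr, 3 * rng.normal(size=(200, 50))]), ytr,
                    np.hstack([Xte, 3 * rng.normal(size=(200, 50))]), yte)
print(clean, noisy)  # accuracy drops once the noise dimensions are added
```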
Dimensionality Reduction
Often can be phrased as a projection:

    f : X → X′, where |X′| ≪ |X|

Our goal: retain as much variance as possible.

Variance captures what varies within the data.
PCA
Principal Components Analysis.

Project data into a new space:
• Dimensions are linearly uncorrelated.
• We have a measure of importance for each dimension.
PCA
PCA
• Gather data X1, …, Xm.
• Adjust data to be zero-mean:

    Xi = Xi − (1/m) Σj Xj

• Compute covariance matrix C.
• Compute unit eigenvectors Vi and eigenvalues vi of C.

Each Vi is a direction, and each vi is its importance - the amount of the data's variance it accounts for.

New data points:

    X̂i = [V1, ..., Vp] Xi
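The steps above can be sketched in NumPy (a minimal sketch; the toy data and the function name `pca` are illustrative):

```python
import numpy as np

def pca(X, p):
    """Project rows of X (m samples, n features) onto the top-p principal components."""
    X = X - X.mean(axis=0)              # adjust data to be zero-mean
    C = np.cov(X, rowvar=False)         # covariance matrix C
    vals, vecs = np.linalg.eigh(C)      # unit eigenvectors (columns), eigenvalues ascending
    order = np.argsort(vals)[::-1]      # sort directions by importance (variance explained)
    V = vecs[:, order[:p]]              # top-p directions V1, ..., Vp
    return X @ V                        # new data points: projections onto V1, ..., Vp

# Example: 2-D data that really lies along one direction, plus a little noise
rng = np.random.default_rng(0)
t = rng.normal(size=(100, 1))
X = np.hstack([t, 3 * t]) + 0.01 * rng.normal(size=(100, 2))
Z = pca(X, 1)
print(Z.shape)  # (100, 1): one dimension retains almost all the variance
```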
Eigenfaces
(courtesy ORL database)
ISOMAP
Another approach:
• Estimate the intrinsic geometric dimensionality of the data.
• Recover the natural distance metric.
ISOMAP
Core idea: the distance metric is locally Euclidean.
• For some small radius r, connect each point to its neighbors.
• Weight each edge by Euclidean distance.
ISOMAP
Solve all-pairs shortest paths:
• Transforms local distances into global distances.
• Compute the embedding.
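The whole pipeline (neighborhood graph, shortest paths, embedding) can be sketched in NumPy; this is a minimal illustration, using Floyd-Warshall for the shortest paths and classical MDS for the embedding step:

```python
import numpy as np

def isomap(X, r, d=2):
    """Minimal ISOMAP sketch: radius-r neighborhood graph, all-pairs shortest
    paths, then a classical MDS embedding into d dimensions."""
    n = len(X)
    # Local Euclidean distances; points farther apart than r are disconnected
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    G = np.where(D <= r, D, np.inf)
    np.fill_diagonal(G, 0.0)
    # Floyd-Warshall: turn local distances into global (geodesic) distances
    for k in range(n):
        G = np.minimum(G, G[:, k:k+1] + G[k:k+1, :])
    # Classical MDS on the squared geodesic distances
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

# Points along a half circle: the geodesic (arc) distance between the endpoints
# is about pi, even though their Euclidean distance is 2.
t = np.linspace(0, np.pi, 30)
X = np.column_stack([np.cos(t), np.sin(t)])
Z = isomap(X, r=0.3, d=1)
```

The 1-D embedding recovers the intrinsic coordinate: the spread of Z should be close to the arc length π rather than the Euclidean diameter 2.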
ISOMAP
From Tenenbaum, de Silva, and Langford, Science 290:2319-2323, December 2000.
Reinforcement Learning
Reinforcement Learning
Learning counterpart of planning.

    π : S → A

    max_π R = Σ_{t=0}^∞ γ^t r_t
RL
The problem of learning how to interact with an environment to maximize reward.
RL
Agent interacts with an environment. At each time t, the agent:
• Receives sensor signal st
• Executes action at
• Transition:
  • new sensor signal st+1
  • reward rt

Goal: find a policy π that maximizes the expected return (sum of discounted future rewards):

    max_π E[ Σ_{t=0}^∞ γ^t r_t ]
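The discounted return itself is simple to compute from a reward sequence (a small sketch; for a finite episode rather than the infinite sum):

```python
def discounted_return(rewards, gamma):
    """R = sum over t of gamma^t * r_t, computed backwards for stability."""
    R = 0.0
    for r in reversed(rewards):
        R = r + gamma * R
    return R

print(discounted_return([1, 1, 1], 0.9))  # 1 + 0.9 + 0.81 = 2.71
```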
RL
This formulation is general enough to encompass a wide variety of learned control problems.
Markov Decision Processes
<S, A, γ, R, T>

S: set of states
A: set of actions
γ: discount factor

R: reward function. R(s, a, s′) is the reward received for taking action a from state s and transitioning to state s′.

T: transition function. T(s′ | s, a) is the probability of transitioning to state s′ after taking action a in state s.

RL: one or both of T, R unknown.
RL
Example:

States: set of grid locations
Actions: up, down, left, right
Transition function: move in direction of action with p=0.9
Reward function: -1 for every step, 1000 for finding the goal
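The grid example can be written down directly. This is a sketch: the grid size and goal location are made up, episode termination is omitted, and "stay put with p=0.1" is one plausible reading of the transition function:

```python
import random

SIZE, GOAL = 5, (4, 4)  # hypothetical 5x5 grid with the goal in a corner
ACTIONS = {'up': (0, 1), 'down': (0, -1), 'left': (-1, 0), 'right': (1, 0)}

def step(state, action, rng=random):
    """One MDP transition. Move in the action's direction with p=0.9
    (otherwise stay put); reward is -1 per step, 1000 on reaching the goal."""
    dx, dy = ACTIONS[action]
    if rng.random() < 0.9:
        nxt = (min(max(state[0] + dx, 0), SIZE - 1),   # clip at the walls
               min(max(state[1] + dy, 0), SIZE - 1))
    else:
        nxt = state
    reward = 1000 if nxt == GOAL else -1
    return nxt, reward
```

A random-policy rollout is then just repeated calls to `step` with sampled actions.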
RL
Example:

States: (θ1, θ̇1, θ2, θ̇2) (real-valued vector)
Actions: +1, -1, 0 units of torque added to the elbow
Transition function: physics!
Reward function: -1 for every step
MDPs
Our target is a policy:

    π : S → A

A policy maps states to actions.

The optimal policy maximizes:

    max_π E[ Σ_{t=0}^∞ γ^t r_t | s0 = s ], ∀s

This means that we wish to find a policy that maximizes the return from every state.
Value Functions
Given a policy, we can estimate R(s) for every state.
• This is a value function.
• It can be used to improve our policy.

    Vπ(s) = E[ Σ_{t=0}^∞ γ^t r_t | π, s0 = s ]

This is the value of state s under policy π.
Value Functions
Similarly, we define a state-action value function as follows:

    Qπ(s, a) = E[ Σ_{t=0}^∞ γ^t r_t | π, s0 = s, a0 = a ]

This is the value of executing a in state s, then following π.

Note that:

    Qπ(s, π(s)) = Vπ(s)
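These expectations can be estimated by averaging sampled returns, i.e. Monte Carlo estimation (a sketch; the two-state chain environment is made up for illustration):

```python
import random

def rollout_value(step, policy, s0, gamma, n=2000, horizon=100, rng=None):
    """Estimate V_pi(s0) by averaging discounted returns over n sampled episodes."""
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(n):
        s, R, discount = s0, 0.0, 1.0
        for _ in range(horizon):           # truncate the infinite sum
            s, r = step(s, policy(s), rng)
            R += discount * r
            discount *= gamma
        total += R
    return total / n

# Hypothetical two-state chain: action 1 moves 0 -> 1 with reward 1; state 1 absorbs.
def step(s, a, rng):
    if s == 0 and a == 1:
        return 1, 1.0
    return s, 0.0

V0 = rollout_value(step, lambda s: 1, 0, gamma=0.9)
print(V0)  # 1.0: the single reward arrives at t=0, so it is undiscounted
```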
Policy Iteration
Recall that we seek the policy that maximizes Vπ(s), ∀s.

Therefore we know that, for the optimal policy π*:

    Vπ*(s) ≥ Vπ(s), ∀π, s
    Qπ*(s, a) ≥ Qπ(s, a), ∀π, s, a

This means that any change to π that increases Q anywhere obtains a better policy.
Policy Iteration
This leads to a general policy improvement framework:
1. Start with a policy π
2. Learn Qπ
3. Improve π:
   a. π(s) = argmax_a Qπ(s, a), ∀s
Repeat.

This is known as policy iteration. It is guaranteed to converge to the optimal policy.

Steps 2 and 3 can be interleaved as rapidly as you like. Usually, step 3a is performed every time step.
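The loop above can be sketched for a small tabular MDP where T and R are known (in RL proper, step 2 would instead estimate Qπ from experience; the tiny two-state chain is made up for illustration):

```python
import numpy as np

def policy_iteration(T, R, gamma, iters=50):
    """Tabular policy iteration. T[s, a, s'] is the transition probability and
    R[s, a, s'] the reward. Returns the converged policy and its Q."""
    nS, nA = T.shape[0], T.shape[1]
    pi = np.zeros(nS, dtype=int)                  # 1. start with a policy
    for _ in range(iters):
        # 2. evaluate pi exactly: solve V = r_pi + gamma * P_pi V
        P = T[np.arange(nS), pi]                  # nS x nS transitions under pi
        r = (P * R[np.arange(nS), pi]).sum(1)     # expected one-step reward
        V = np.linalg.solve(np.eye(nS) - gamma * P, r)
        # 3. improve: pi(s) = argmax_a Q(s, a)
        Q = (T * (R + gamma * V[None, None, :])).sum(2)
        new_pi = Q.argmax(1)
        if np.array_equal(new_pi, pi):
            break                                 # policy stable: converged
        pi = new_pi
    return pi, Q

# Tiny made-up chain: from state 0, action 1 reaches state 1 (reward 1); state 1 absorbs.
T = np.zeros((2, 2, 2)); R = np.zeros((2, 2, 2))
T[0, 0, 0] = T[0, 1, 1] = T[1, 0, 1] = T[1, 1, 1] = 1.0
R[0, 1, 1] = 1.0
pi, Q = policy_iteration(T, R, gamma=0.9)
print(pi)  # action 1 is chosen in state 0
```

Here evaluation is done exactly with a linear solve; interleaving steps 2 and 3 more rapidly, as the slide suggests, corresponds to approximate evaluation between improvements.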
Friday
How to learn Q!