
1

Function Learning and Neural Nets

R&N: Chap. 20, Sec. 20.5

2

Function-Learning Formulation

Goal function f
Training set: (x(i), y(i)), i = 1, …, n, with y(i) = f(x(i))
Inductive inference: find a function h that fits the points well
Same Keep-It-Simple bias

[Figure: sample points of f(x) plotted against x]

3

Least-Squares Fitting

Propose a class of functions g(x, θ) parameterized by θ
Minimize E(θ) = Σi (g(x(i), θ) − y(i))²

[Figure: data points of f(x) vs. x with a candidate fit]

4

Linear Least-Squares

g(x, θ) = x1 θ1 + … + xN θN
Best θ given by θ = (AᵀA)⁻¹ Aᵀ b
where A is the matrix of the x(i)'s (one per row) and b is the vector of the y(i)'s

[Figure: linear fit g(x, θ) to samples of f(x)]

5

Constant offset

Set x0 = 1, g(x, θ) = x0 θ0 + x1 θ1 + … + xN θN
Best θ given by θ = (AᵀA)⁻¹ Aᵀ b
where A is the matrix of the x(i)'s and b is the vector of the y(i)'s

[Figure: fit g(x, θ) with constant offset to samples of f(x)]
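As a minimal sketch of the closed form above, for one input with a constant offset (x0 = 1) the normal equations (AᵀA)θ = Aᵀb reduce to a 2×2 system, solved here by Cramer's rule in pure Python. The helper name `fit_line` is illustrative, not from the slides.

```python
# Linear least squares with a constant offset (x0 = 1): solve the
# 2x2 normal equations (A^T A) theta = A^T b by Cramer's rule.
def fit_line(xs, ys):
    n = len(xs)
    sx, sxx = sum(xs), sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx            # determinant of A^T A
    theta0 = (sy * sxx - sx * sxy) / det
    theta1 = (n * sxy - sx * sy) / det
    return theta0, theta1

theta0, theta1 = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
# the data lies exactly on y = 1 + 2x, so theta0 = 1, theta1 = 2
```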

6

Nonlinear Least-Squares

E.g. quadratic: g(x, θ) = θ0 + x θ1 + x² θ2
E.g. exponential: g(x, θ) = exp(θ0 + x θ1)
Any combination: g(x, θ) = exp(θ0 + x θ1) + θ2 + x θ3

[Figure: linear, quadratic, and other fits to samples of f(x)]

7

Performance of Nonlinear Least-squares

Overfitting: too many parameters
Efficient optimization: often can only find a local minimum of the objective E(θ)
Expensive with lots of data!

8

Neural Networks

9

Perceptron (the goal function f is a boolean one)

y = g(Σi=1,…,n wi xi)

[Figure: a unit with inputs x1, …, xn, weights wi, and output y; a 2D plot of + and − examples separated by the line w1 x1 + w2 x2 = 0]

10

Perceptron (the goal function f is a boolean one)

y = g(Σi=1,…,n wi xi)

[Figure: the same unit; a 2D plot of + and − examples and a point marked "?"]

11

Unit (Neuron)

y = g(Σi=1,…,n wi xi)
g(u) = 1/[1 + exp(−u)]

[Figure: a unit with inputs x1, …, xn, weights wi, and output y]
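The unit above can be sketched directly; the example weights and inputs are illustrative assumptions.

```python
import math

# A single sigmoid unit: y = g(sum_i w_i x_i) with g(u) = 1/(1 + exp(-u)).
def unit(weights, inputs):
    u = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-u))

y = unit([1.0, -2.0], [0.5, 0.25])  # weighted sum is 0, so y = g(0) = 0.5
```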

12

A Single Neuron can learn

A disjunction of boolean literals, e.g. x1 ∨ x2 ∨ x3
Majority function
XOR?
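A minimal sketch of a threshold unit learning such a boolean function with the perceptron rule; the bias input x0 = 1, the learning rate, and the epoch count are illustrative assumptions.

```python
# Perceptron sketch: a threshold unit y = 1 if sum_i w_i x_i > 0,
# trained with the perceptron rule on the disjunction x1 OR x2.
def train(examples, epochs=20, lr=0.1):
    w = [0.0, 0.0, 0.0]  # w0 (bias), w1, w2
    for _ in range(epochs):
        for x1, x2, target in examples:
            x = (1, x1, x2)
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            for i in range(3):
                w[i] += lr * (target - y) * x[i]  # perceptron update
    return w

OR = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
w = train(OR)

def predict(x1, x2):
    return 1 if w[0] + w[1] * x1 + w[2] * x2 > 0 else 0
```

On a linearly separable target like OR the rule converges; on XOR, no weight vector exists, which is the point of the "XOR?" bullet.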

13

Neural Network

Network of interconnected neurons

Acyclic (feed-forward) vs. recurrent networks

[Figure: two connected units, each computing y = g(Σi wi xi)]

14

Two-Layer Feed-Forward Neural Network

[Figure: inputs feeding a hidden layer through weights w1j, which feeds an output layer through weights w2k]
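The forward pass of such a two-layer network can be sketched as below; the layer sizes and weight values are illustrative assumptions.

```python
import math

# Two-layer feed-forward sketch: inputs -> hidden layer (weights w1) ->
# output layer (weights w2), each unit computing g(weighted sum).
def g(u):
    return 1.0 / (1.0 + math.exp(-u))

def forward(x, w1, w2):
    hidden = [g(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return [g(sum(w * h for w, h in zip(row, hidden))) for row in w2]

y = forward([1.0, 0.5],
            [[0.2, -0.4], [0.7, 0.1]],  # w1: two hidden units
            [[0.5, -0.3]])              # w2: one output unit
```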

15

Backpropagation (Principle)

New example: y(k) = f(x(k))
φ(k) = output of the NN with weights w(k−1) on inputs x(k)
Error function: E(k)(w(k−1)) = ||φ(k) − y(k)||²

wij(k) = wij(k−1) − ε ∂E(k)/∂wij   (i.e., w(k) = w(k−1) − ε∇E)

Backpropagation algorithm: update the weights of the inputs to the last layer, then the weights of the inputs to the previous layer, etc.
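A minimal sketch of the weight update above for a single sigmoid unit, where E = (g(w·x) − y)² and the chain rule gives dE/dwi = 2(g(u) − y) g(u)(1 − g(u)) xi using g′(u) = g(u)(1 − g(u)); the step size ε and the training point are assumptions.

```python
import math

# One gradient step w_i <- w_i - eps * dE/dw_i for a single sigmoid unit
# with squared error E = (g(w.x) - y)^2.
def g(u):
    return 1.0 / (1.0 + math.exp(-u))

def step(w, x, y, eps=0.5):
    out = g(sum(wi * xi for wi, xi in zip(w, x)))
    grad = [2.0 * (out - y) * out * (1.0 - out) * xi for xi in x]
    return [wi - eps * gi for wi, gi in zip(w, grad)]

w = [0.0, 0.0]
x, y = [1.0, 1.0], 1.0
for _ in range(200):
    w = step(w, x, y)
# repeated steps drive the output g(w.x) toward the target y = 1
```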

16

Understanding Backpropagation

Minimize E(θ) by gradient descent: compute the gradient of E(θ) and take a step proportional to the negative gradient

[Figure: animation of the E(θ) landscape, its gradient, and a descent step]

19

Understanding Backpropagation

Example of stochastic gradient descent
Minimize E(θ) = e1(θ) + e2(θ) + … + eN(θ), where ei(θ) = (g(x(i), θ) − y(i))²
Take a step to reduce one ei at a time

[Figure: animation of successive steps along the gradients of e1, e2, e3]
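The per-example steps above can be sketched with a toy one-parameter model g(x, θ) = θx, so each ei = (θx(i) − y(i))² has gradient 2(θx(i) − y(i))x(i); the learning rate and epoch count are assumptions.

```python
import random

# Stochastic gradient descent for E(theta) = e1 + ... + eN,
# ei = (g(x(i), theta) - y(i))^2, with the toy model g(x, theta) = theta * x.
def sgd(data, theta=0.0, eps=0.05, epochs=100):
    for _ in range(epochs):
        random.shuffle(data)                  # visit examples in random order
        for x, y in data:
            grad = 2.0 * (theta * x - y) * x  # gradient of one ei
            theta -= eps * grad               # step to reduce that ei
    return theta

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # samples of y = 2x
theta = sgd(data)
```

Each step reduces one term rather than the full objective, which is why the trajectory wanders before settling near the minimum.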

25

Stochastic Gradient Descent

[Figure: parameter values over time, converging to a (local) minimum of E]

26

Stochastic Gradient Descent

[Figure: objective function values over time]

27

Caveats

Choosing a convergent “learning rate” can be hard in practice

[Figure: E(θ) landscape]

28

Comments and Issues

How to choose the size and structure of networks?
• If the network is too large, risk of over-fitting (data caching)
• If the network is too small, the representation may not be rich enough

Role of representation: e.g., learning the concept of an odd number

Incremental learning

29

Role of Marketing

Not a good model of a neuron: spiking behavior and recurrence in real NNs

No special properties above other learning techniques

Like other learning techniques, a convenient way to get results without thinking too hard

30

Incremental (“Online”) Function Learning

31

Incremental (“Online”) Function Learning

Data is streaming into the learner: x1, y1, …, xt, yt with yi = f(xi)

Observes xt+1 and must make a prediction for the next time step, yt+1

Brute-force approach: store all data; at step t, run your learner of choice on all data up to time t and predict for time t+1

32

Example: Mean Estimation

yi = μ + error term (no x's)
Current estimate: μt = (1/t) Σi=1…t yi

μt+1 = (1/(t+1)) Σi=1…t+1 yi
     = (1/(t+1)) (yt+1 + Σi=1…t yi)
     = (1/(t+1)) (yt+1 + t μt)

E.g., μ6 = (5/6) μ5 + (1/6) y6

35

Example: Mean Estimation

μt+1 = (1/(t+1)) (yt+1 + t μt)
Only need to store μt and t
Similar formulas for the standard deviation

E.g., μ6 = (5/6) μ5 + (1/6) y6
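The update above can be sketched in a few lines; the sample values are illustrative.

```python
# Incremental mean: mu_{t+1} = (y_{t+1} + t * mu_t) / (t + 1),
# storing only the running mean mu and the count t.
def update_mean(mu_t, t, y_next):
    return (y_next + t * mu_t) / (t + 1)

mu, t = 0.0, 0
for y in [1.0, 2.0, 3.0, 4.0]:
    mu = update_mean(mu, t, y)
    t += 1
# mu is now 2.5, the mean of the four samples
```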

36

Incremental Least Squares

Recall the least-squares estimate θ = (AᵀA)⁻¹ Aᵀ b
where A is the matrix of the x(i)'s (laid out in rows) and b is the vector of the y(i)'s:

A = [x(1)ᵀ; x(2)ᵀ; …; x(N)ᵀ]   (N×M)
b = [y(1); y(2); …; y(N)]   (N×1)

37

Incremental Least Squares

Let A(t), b(t) be the A matrix and b vector up to time t

θ(t) = (A(t)ᵀA(t))⁻¹ A(t)ᵀ b(t)

A(t+1) appends the row x(t+1)ᵀ to A(t)   ((t+1)×M)
b(t+1) appends y(t+1) to b(t)   ((t+1)×1)

θ(t+1) = (A(t+1)ᵀA(t+1))⁻¹ A(t+1)ᵀ b(t+1)

A(t+1)ᵀ b(t+1) = A(t)ᵀ b(t) + y(t+1) x(t+1)
A(t+1)ᵀ A(t+1) = A(t)ᵀ A(t) + x(t+1) x(t+1)ᵀ

41

Incremental Least Squares

Let A(t), b(t) be the A matrix and b vector up to time t

θ(t+1) = (A(t+1)ᵀA(t+1))⁻¹ A(t+1)ᵀ b(t+1)

A(t+1)ᵀ b(t+1) = A(t)ᵀ b(t) + y(t+1) x(t+1)
A(t+1)ᵀ A(t+1) = A(t)ᵀ A(t) + x(t+1) x(t+1)ᵀ

Sherman-Morrison update: (Y + xxᵀ)⁻¹ = Y⁻¹ − Y⁻¹ xxᵀ Y⁻¹ / (1 + xᵀ Y⁻¹ x)

42

Incremental Least Squares

Putting it all together

Store: p(t) = A(t)ᵀ b(t) and Q(t) = (A(t)ᵀA(t))⁻¹

Update:
p(t+1) = p(t) + y x
Q(t+1) = Q(t) − Q(t) xxᵀ Q(t) / (1 + xᵀ Q(t) x)
θ(t+1) = Q(t+1) p(t+1)
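The stored quantities above can be sketched in pure Python for two parameters. Initializing Q to (1/δ)I with a small regularizer δ is an assumption (the slides start from the batch quantities instead), and the data values are illustrative.

```python
# Recursive least squares for two parameters, maintaining
# p(t) = A(t)^T b(t) and Q(t) = (A(t)^T A(t))^-1 via Sherman-Morrison:
# (Y + x x^T)^-1 = Y^-1 - Y^-1 x x^T Y^-1 / (1 + x^T Y^-1 x).
def rls_step(Q, p, x, y):
    p = [p[i] + y * x[i] for i in range(2)]                  # p <- p + y x
    Qx = [Q[i][0] * x[0] + Q[i][1] * x[1] for i in range(2)]
    denom = 1.0 + x[0] * Qx[0] + x[1] * Qx[1]                # 1 + x^T Q x
    Q = [[Q[i][j] - Qx[i] * Qx[j] / denom for j in range(2)] for i in range(2)]
    theta = [Q[i][0] * p[0] + Q[i][1] * p[1] for i in range(2)]
    return Q, p, theta

delta = 1e-6
Q = [[1.0 / delta, 0.0], [0.0, 1.0 / delta]]  # assumed prior (1/delta) * I
p = [0.0, 0.0]
for x1, y in [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]:
    Q, p, theta = rls_step(Q, p, [1.0, x1], y)  # x0 = 1 gives the offset term
# theta approaches [1, 2] since the data lies on y = 1 + 2x
```

Each step costs O(M²) regardless of how many examples have been seen, which is the whole point of the incremental formulation.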

43

Recap

• Function learning with least squares

• Neural nets, backpropagation, and gradient descent

• Incremental learning

44

Reminder

• HW6 due

• HW7 available on Oncourse

45

Machine Learning Classes

• CS659 (Hauser) Principles of Intelligent Robot Motion

• CS657 (Yu) Computer Vision

• STAT520 (Trosset) Introduction to Statistics

• STAT682 (Rocha) Statistical Model Selection
