Data Assimilation, Machine Learning: Statistical Physics Problems
Introduction, Core Ideas, Applications
2018/08/04

Henry D. I. Abarbanel
Department of Physics
and
Marine Physical Laboratory (Scripps Institution of Oceanography)
Center for Engineered Natural Intelligence
University of California, San Diego
[email protected]


Page 1:

Data Assimilation, Machine Learning:

Statistical Physics Problems

Introduction, Core Ideas, Applications

Henry D. I. Abarbanel

Department of Physics

and

Marine Physical Laboratory (Scripps Institution of Oceanography)

Center for Engineered Natural Intelligence

University of California, San Diego

[email protected]

Page 2:

This is meant to be both an introductory and an advanced set of pedagogical talks on data assimilation and machine learning. These are statistical Physics problems.

I hope you will ask a lot of questions.

My colleague Dan Margoliash will provide a

neurobiological setting for utilizing many of the methods

discussed. Much of what I address has been developed

with him and tested and improved in application to results

obtained in his laboratory.

Page 3:

I received a bipolar transistor for a

New Year’s present. I want to know how

it works so I can use many of them to

build a follow-on K computer (before

2020?).

What do I do?

Page 4:

General answer:

Hook up my nice new transistor to a known RLC

circuit and drive the dynamical variables of the transistor

through their dynamical range. Measure some of the

variables of the circuit V(t); this produces data. Make a

model of the transistor, drive it in precisely the same way, to

get model output Vmodel(t).

Minimize the distance

Σ_{t=0}^{T} (V_data(t) − V_model(t))²

subject to our model equations of motion.

Test the model, completed by the estimated parameters, through prediction for t > T.
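As a concrete illustration, the distance above can be computed directly from sampled traces; the short sketch below uses made-up arrays standing in for V_data and V_model.

```python
import numpy as np

def model_data_distance(v_data, v_model):
    """Sum over t = 0..T of (V_data(t) - V_model(t))^2."""
    v_data = np.asarray(v_data, dtype=float)
    v_model = np.asarray(v_model, dtype=float)
    return float(np.sum((v_data - v_model) ** 2))

# Hypothetical sampled traces standing in for circuit data and model output.
v_data = np.array([0.0, 1.0, 0.5, -0.2])
v_model = np.array([0.1, 0.9, 0.7, -0.2])
print(model_data_distance(v_data, v_model))
```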

Page 5:

What do we need to complete this task?

➢ a model of the origin of the data

➢ data

➢ a way to minimize the distance between the data and

the model variables

We first generate our own data from our model, then use our minimization method to show that it works; this is called a twin experiment.

Then we use the method on experimental data, now with some confidence, to determine the parameters in my new transistor.

Page 6:

[Colpitts oscillator circuit diagram: transistor terminals B, C, E; inductor L; resistor R; capacitors C1 and C2; emitter resistor Ree; supply Vee]

Colpitts Oscillator Circuit: 1920s, 1950s, 1970s, 1990s

I_C(V_E) = (1 mA) exp(−|V_E| / V_th),   V_th = k_B T / e

J. J. Ebers and J. L. Moll, “Large-signal behavior of junction transistors,” Proceedings of the IRE, 42(12):1761–1772, Dec. 1954.

H. K. Gummel and H. C. Poon, “An Integral Charge Control Model of Bipolar Transistors,” Bell System Technical Journal, 49(5):827–852, 1970.

Turn this on, please

Page 7:

Voltage time series VE(t) recorded from a Colpitts circuit operating in the chaotic regime. Δt = 10 μs (time axis in ms).

Page 8:

Data Source: Rescaled Colpitts Oscillator

dx1(t)/dt = α x2(t)        α is kept fixed, then driven: α(t)

dx2(t)/dt = −γ (x1(t) + x3(t)) − q x2(t)

dx3(t)/dt = η (x2(t) + 1 − e^(−x1(t)))
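To produce a data library from these equations, one can integrate them numerically. A sketch using SciPy; the parameter values below are illustrative choices in the chaotic range described later (α > 5), not values quoted in the talk.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative parameters; alpha > 5 is the chaotic regime discussed later.
ALPHA, GAMMA, Q, ETA = 5.1, 0.0797, 0.6898, 6.2723

def colpitts(t, x):
    """Rescaled Colpitts oscillator vector field."""
    x1, x2, x3 = x
    return [ALPHA * x2,
            -GAMMA * (x1 + x3) - Q * x2,
            ETA * (x2 + 1.0 - np.exp(-x1))]

sol = solve_ivp(colpitts, (0.0, 20.0), [0.1, 0.1, 0.1],
                max_step=0.01, rtol=1e-8)
x1_data = sol.y[0]   # the component later passed to the model
```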

Page 9:
Page 10:
Page 11:

No coupling of data into the model: k(t) = 0

Page 12:

Colpitts Oscillator, k(t) = 1.9

Page 13:

Experimental Data from Colpitts Circuit

u = k

Page 14:

Data Source

dx1(t)/dt = α x2(t)

dx2(t)/dt = −γ (x1(t) + x3(t)) − q x2(t)

dx3(t)/dt = η (x2(t) + 1 − e^(−x1(t)))

Model Equations: x1(t) is passed to the model

dy1(t)/dt = α_M y2(t) + u(t)(x1(t) − y1(t))

dy2(t)/dt = −γ_M (y1(t) + y3(t)) − q_M y2(t);   u(t) ≥ 0

dy3(t)/dt = η_M (y2(t) + 1 − e^(−y1(t)))
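A minimal sketch of the coupling idea, assuming a constant coupling u(t) = u ≥ 0 and illustrative parameter values (not the talk's): the data system and the model receiver are integrated together, with u(x1 − y1) added to the first receiver equation.

```python
import numpy as np
from scipy.integrate import solve_ivp

ALPHA, GAMMA, Q, ETA = 5.1, 0.0797, 0.6898, 6.2723  # illustrative values
U = 1.9   # constant coupling, cf. k(t) = 1.9 on the earlier slide

def coupled(t, z):
    """Transmitter x (data) and receiver y (model), coupled through x1."""
    x1, x2, x3, y1, y2, y3 = z
    return [ALPHA * x2,
            -GAMMA * (x1 + x3) - Q * x2,
            ETA * (x2 + 1.0 - np.exp(-x1)),
            ALPHA * y2 + U * (x1 - y1),
            -GAMMA * (y1 + y3) - Q * y2,
            ETA * (y2 + 1.0 - np.exp(-y1))]

z0 = [0.1, 0.1, 0.1, -0.4, 0.3, 0.0]   # transmitter and receiver start apart
sol = solve_ivp(coupled, (0.0, 20.0), z0, max_step=0.01, rtol=1e-8)
err = np.abs(sol.y[0] - sol.y[3])       # |x1 - y1| over the window
```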

Page 15:

Minimize

C(y, u, p) = (1/2N) Σ_{m=0}^{N} [ (x1(m) − y1(m))² + u(m)² ]

subject to

dy1(t)/dt = α_M y2(t) + u(t)(x1(t) − y1(t))

dy2(t)/dt = −γ_M (y1(t) + y3(t)) − q_M y2(t)

dy3(t)/dt = η_M (y2(t) + 1 − e^(−y1(t)))

Data: x1(t)

Page 16:

The solution of the optimization problem is an

iterative process in (y(n),u(n),p) space. Given initial values,

(y0(n),u0(n),p0), iterate, adjusting all (ym(n),um(n),pm) m = 1, 2,

3, … to minimize cost function.

The procedure tracks the state variables and correctly estimates the parameters even when they are time dependent. It tracks accurately through bifurcations in the system behavior: chaotic → fixed point → limit cycle → chaotic.

The number of variables in each optimal estimation calculation is about 3000–5000.

(y(n)_0, u(n)_0, p_0) → (y(n)_1, u(n)_1, p_1) → ... → (y(n)_Final, u(n)_Final, p_Final)

until the objective (cost) function is minimized, subject to the model equations.

We require u(n)_Final → 0.

Page 17:

Chaotic Colpitts Oscillator; Initial Conditions in Optimization are free

x1(t) observed; other state variables evaluated by SNOPT

Page 18:

Chaotic Colpitts Oscillator; External Driving of Parameter α(t)

α > 5.0

Chaotic

α < 5.0

Regular

Page 19:

Model

dy1(t)/dt = F1(y1(t), yR(t), q) + u(t)(x1(t) − y1(t))

dyR(t)/dt = FR(y1(t), yR(t), q),   R = 2, 3, ..., D

Data: x1(t)

Page 20:

Colpitts Oscillator

Page 21:

Model

Data

Page 22:
Page 23:

SNOPT reached a solution

Model

Data

Page 24:
Page 25:

Experimental

Colpitts Oscillator Circuit

Δt = 10 µs

VE(t) is presented to the state and parameter estimation procedure.

VCE(t) and IL(t) are estimated, with 10 ms of data.

Then predictions of VE(t), VCE(t), and IL(t) are made from the estimated state at t = 10 ms.

Page 26:

Measured VE(t) and estimated VCE(t) and IL(t)

Page 27:

Predicted VE(t), VCE(t) and IL(t) from estimates at t = 10ms

Page 28:

Minimize

C(y, u, p) = (1/2N) Σ_{m=0}^{N} [ (x1(m) − y1(m))² + u(m)² ]

subject to

dy1(t)/dt = α_M y2(t) + u(t)(x1(t) − y1(t))

dy2(t)/dt = −γ_M (y1(t) + y3(t)) − q_M y2(t)

dy3(t)/dt = η_M (y2(t) + 1 − e^(−y1(t)))

Data: x1(t)

Where did all this come from?

Why this C(y,u,p)? Why this “nudging” term ?

Page 29:

Minimize

C(y, u, p) = (1/2N) Σ_{m=0}^{N} [ (x1(m) − y1(m))² + u(m)² ]

subject to

dy1(t)/dt = α_M y2(t) + u(t)(x1(t) − y1(t))

dy2(t)/dt = −γ_M (y1(t) + y3(t)) − q_M y2(t)

dy3(t)/dt = η_M (y2(t) + 1 − e^(−y1(t)))

Where did all this come from?

Why this C(y,u,p)? Why this “nudging” term ?

Just for the record, this is the wrong

answer. We will derive the correct answer.

Page 30:

With the electronic circuit in mind, we turn to a general view of the problem of transferring information from observations to a model of the processes producing those observations.

Not actually a new problem. Newton did this in

1687 in determining that elliptical orbits satisfying

Kepler’s laws require a 1/r2 force.

The questions we pose 330 years later are richer: we collect information from many sources in observations of complex systems, and we want to do so in a systematic manner that accommodates large data sets and rich models of the processes producing those data sets.

Page 31:

My transistor is the same, in scientific spirit, as your:

❖ Atmosphere

❖ Neuron

❖ Lake

❖ Ocean

❖ Biological cell

❖ whatever is your complex system of interest

a model of the origin of the data

data

a way to minimize the distance between the data and the model variables

Page 32:

Topics:

❖ Investigating rules of nonlinear dynamics in physical and biological (complex) systems

❖ A complex oscillator: data assimilation

❖ General setting: a neurobiological example (see the Margoliash talks)

❖ General problems: numerical algorithms

❖ Machine learning: statistical Physics and data assimilation

Page 33:

Data Assimilation in a time window [t0, tF]:
Transfer Information from Data Library y(τ) to a Model x(t)

Observe y(τ1) → move model forward → Observe y(τ2) → move model forward → Observe y(τ3) → ... → Observe y(τk) → move model forward → Observe y(τF)

P(X | Y) = P(X, Y) / P(Y)

X = {x(t0), x(t1), ..., x(tF)}   (states/parameters of the model)
Y = {y(t1), y(t2), ..., y(tF)}   (data)

Page 34:

Data Assimilation:

Transfer of Information from Measurements

to a Model of the Observations

We start with noisy measurements yk(t), k = 1, 2, ..., L; errors in the model xa(t), a = 1, 2, ..., D ≫ L; and uncertain initial conditions at x(t0).

We wish to incorporate the information in measurements at t0, t1, ..., tF = T into our statistical estimate of the complete state of the model at these times and into our statistical estimate of the model parameters.

The model has errors; given x(T), we use it to predict x(t > T). This is a validation (or not) of the model.

Page 35:

Times: t = {t0, t1, ..., tn, ..., tF = T}

Measurements: y_l(n),   l = 1, 2, ..., L

Model: x_a(t_{n+1}) = f_a(x(t_n)),   a = 1, 2, ..., D

Observations: y_l(n) = x_l(n),   L ≪ D

Data source: Transmitter
Model: Receiver

Generalized synchronization of the transmitter and receiver

Page 36:

Statistical data assimilation is communication of information from measurements (transmitter) to a dynamical model (receiver).

At the end of an observation window [t0,tF] we want

the conditional probability distribution of the state of the

system P(X|observations); X = {x(t0),x(t1),…,x(tF)} is path of

model through [t0,tF] given measurements during the

window.

We then want to predict the future conditional

probability distribution P(x(t > tF)|observations) for new

forcing of the system.

Typical situation: The measurements are noisy. The model has

errors. We are unsure of the state of the system when we begin observing.

Page 37:

Data Assimilation in a time window [t0, tF]:
Transfer Information from Data Library y(τ) to a Model x(t)

Observe y(τ1) → move model forward → Observe y(τ2) → move model forward → Observe y(τ3) → ... → Observe y(τk) → move model forward → Observe y(τF)

P(X | Y) = P(X, Y) / P(Y)

X = {x(t0), x(t1), ..., x(tF)}   (states/parameters of the model)
Y = {y(t1), y(t2), ..., y(tF)}   (data)

Page 38:

Observation window in time: t0, t1, ..., tN

Model state vectors and parameters at times t0, t1, ..., tN:
X(n) = {x(t0), x(t1), ..., x(tN)} = {x(0), x(1), x(2), ..., x(N)}

Observed data vectors at times t0 ≤ t1, ..., tF ≤ tN = tF:
Y(n) = {y(1), y(2), y(3), ..., y(F)}

We want to express P(X(n+1)|Y(n+1)) in terms of

P(X(n)|Y(n)).

Then we iterate from n=N -1, back to n=0. The

product of these probabilities gives us a representation

of P(X(N)|Y(N)) starting at P(x(0)).
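This recursion has the same structure as the forward filter for a hidden Markov model: move the model forward with the transition probability, weight by the observation likelihood, and normalize. A toy discrete-state sketch, with all matrices invented for illustration:

```python
import numpy as np

# Toy two-state illustration of the recursion (all numbers made up).
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])          # P(x(n+1) | x(n)), rows sum to 1
B = np.array([[0.8, 0.2],
              [0.3, 0.7]])          # P(y | x)
prior = np.array([0.5, 0.5])        # P(x(0))
observations = [0, 1, 1, 0]

p = prior
for y in observations:
    p = A.T @ p                     # move model forward
    p = B[:, y] * p                 # change due to observation
    p /= p.sum()                    # normalize by P(y(n+1) | Y(n))
print(p)
```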

Page 39:

P(x(n+1), X(n), y(n+1), Y(n))
= P(y(n+1) | x(n+1), X(n), Y(n)) P(x(n+1), X(n), Y(n))
= P(y(n+1) | x(n+1), X(n), Y(n)) P(x(n+1) | X(n), Y(n)) P(X(n), Y(n))

Using the Markov property of the model, P(x(n+1) | X(n), Y(n)) = P(x(n+1) | x(n)), this gives

P(X(n+1) | Y(n+1)) = [ P(y(n+1) | x(n+1), X(n), Y(n)) / P(y(n+1) | Y(n)) ] · P(x(n+1) | x(n)) · P(X(n) | Y(n))

Change due to Observation · Move Model Forward

Page 40:

Equivalently, writing the observation factor as an exponential:

P(X(n+1) | Y(n+1)) = [ P(y(n+1), x(n+1), X(n) | Y(n)) / ( P(y(n+1) | Y(n)) P(x(n+1), X(n) | Y(n)) ) ] · P(x(n+1) | x(n)) · P(X(n) | Y(n))

= exp[ CMI(y(n+1); x(n+1), X(n) | Y(n)) ] · P(x(n+1) | x(n)) · P(X(n) | Y(n))

Change due to Observation · Move Model Forward

Page 41:

Shannon (1940s) conditional mutual information:

CMI(a, b | c) = log[ P(a, b | c) / ( P(a | c) P(b | c) ) ]

Here a = y(n+1), b = {x(n+1), X(n)}, c = Y(n):

CMI(y(n+1); x(n+1), X(n) | Y(n)) = log[ P(y(n+1), x(n+1), X(n) | Y(n)) / ( P(y(n+1) | Y(n)) P(x(n+1), X(n) | Y(n)) ) ]
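For discrete variables the expected conditional mutual information can be computed directly from a joint probability table. The sketch below is a generic implementation; the factorized table at the end is made up to check that CMI(a; b | c) vanishes when a and b are independent given c. (The action below uses the log-ratio itself; averaging it over the joint distribution gives the Shannon quantity.)

```python
import numpy as np

def cmi(P):
    """Expected conditional mutual information CMI(a; b | c).

    P is a joint probability table with axes (a, b, c)."""
    P = np.asarray(P, dtype=float)
    P = P / P.sum()
    Pc = P.sum(axis=(0, 1))          # P(c)
    Pac = P.sum(axis=1)              # P(a, c)
    Pbc = P.sum(axis=0)              # P(b, c)
    total = 0.0
    for a in range(P.shape[0]):
        for b in range(P.shape[1]):
            for c in range(P.shape[2]):
                p = P[a, b, c]
                if p > 0:
                    # log[ P(a,b|c) / (P(a|c) P(b|c)) ] weighted by P(a,b,c)
                    total += p * np.log(p * Pc[c] / (Pac[a, c] * Pbc[b, c]))
    return total

# Made-up table where a and b are independent given c: CMI should be ~0.
Pa = np.array([0.3, 0.7])
Pb = np.array([0.6, 0.4])
Pc_marg = np.array([0.5, 0.5])
P_ind = Pa[:, None, None] * Pb[None, :, None] * Pc_marg[None, None, :]
print(cmi(P_ind))
```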

Page 42:

X = {x(t0), x(t1), ..., x(tF)}   Y = {y(t1), y(t2), ..., y(tF)}

P(X | Y) ∝ ∏_{k=0}^{F} P(y(k) | X(k)) · ∏_{n=0}^{F−1} P(x(n+1) | x(n)) · P(x(0)) ≡ e^(−A(X))

Expected value of a function G(X) on the path X (the path of the state in [t0, tF]):

E[G(X) | Y] = <G(X)> = ∫ dX e^(−A(X)) G(X) / ∫ dX e^(−A(X))

Our job is this: given data Y and a model dx(t)/dt = F(x(t)) [equivalently x(n+1) = f(x(n))], do the integral.

Page 43:

Action for State and Parameter Estimation

A(X) = − Σ_{n=0}^{N} CMI(X(n), y(n) | Y(n−1))  − Σ_{n=0}^{N−1} log{ P(x(n+1) | x(n)) }  − log{ P(x(0)) }

(Information Transfer)   (Dynamics)   (Initial Condition)

Page 44:

Everything rests on the structure of A(X) in path space:

E[G(X) | Y] = <G(X)> = ∫ dX e^(−A(X)) G(X) / ∫ dX e^(−A(X))

Page 45:

Two methods for evaluating such high dimensional integrals:

(1) Laplace's method (1774): seek the minima of A(X); there are multiple minima, located where

∂A(X)/∂X_a = 0   and   ∂²A(X)/∂X_a ∂X_b > 0

(2) Monte Carlo searches

The first seeks minima of A0(X); the second samples near these minima.

Page 46:

∂A0(X)/∂X_a |_{X = X^q} = 0,   q = 0, 1, ...,   A0(X^0) < A0(X^{q≠0})

We focus on the Laplace method. How can we find the minima X^q, and which minimum gives the biggest contribution to the integral?

Everything rests on the structure of A(X) in path space.

E[G(X) | Y] = <G(X)> = ∫ dX e^(−A(X)) G(X) / ∫ dX e^(−A(X))

Page 47:

A(X) is nonlinear in X and has multiple minima. Location and

number of these minima depend on number of measurements at

each observation time in [t0,tN].

Standard model, Gaussian Error Action

Observations have Gaussian noise; models have Gaussian

errors, action is

It is not Gaussian in X if f(x) is nonlinear. Finding paths and associated minima at any Rm and Rf is not hard (IPOPT and other public-domain optimization algorithms), but finding the path with the smallest action is a challenge: it is NP-complete, in general.

A_SM(X) = Σ_{n=0}^{N} Σ_{l=1}^{L} [R_m(n,l)/2] (x_l(n) − y_l(n))²  +  Σ_{n=0}^{N−1} Σ_{a=1}^{D} [R_f(a)/2] (x_a(n+1) − f_a(x(n)))²
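The discrete-time action is straightforward to evaluate for a candidate path. The sketch below takes scalar Rm and Rf for simplicity (the slide allows them to vary with n, l, and a); the one-step map and path at the end are hypothetical.

```python
import numpy as np

def action_sm(x, y, f, Rm, Rf, L):
    """Standard-model action A_SM(X) with scalar precisions Rm, Rf.

    x : candidate path, shape (N+1, D)
    y : data for the L measured components, shape (N+1, L)
    f : one-step model map, f(x(n)) -> x(n+1)
    """
    meas = 0.5 * Rm * np.sum((x[:, :L] - y) ** 2)
    fx = np.array([f(row) for row in x[:-1]])   # model forecast of each step
    model = 0.5 * Rf * np.sum((x[1:] - fx) ** 2)
    return meas + model

# Hypothetical one-step map and a path that satisfies it exactly:
f = lambda v: 0.5 * v
x = np.array([[8.0, 4.0], [4.0, 2.0], [2.0, 1.0]])
y = x[:, :1]                     # 'data' equal to the measured first component
print(action_sm(x, y, f, Rm=1.0, Rf=1.0, L=1))
```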

Page 48:

Origin of multiple minima is instability on the synchronization manifold

yl = xl(n). Measurements act to transfer information and stabilize directions

in state space. Looking in continuous time shows this:

The continuous-time action,

A_SM(x(t), dx(t)/dt) = ∫_{t0}^{tf} dt L(x(t), dx(t)/dt, t)

= ∫_{t0}^{tf} dt [ Σ_{l=1}^{L} (R_m(t)/2)(x_l(t) − y_l(t))² + Σ_{a=1}^{D} (R_f(a)/2)(dx_a(t)/dt − F_a(x(t)))² ],

gives the Euler-Lagrange equations

d/dt [ dx_a(t)/dt − F_a(x(t)) ] + DF_ba(x(t)) [ dx_b(t)/dt − F_b(x(t)) ] = (R_m/R_f)(x_l(t) − y_l(t)) δ_al,

a 'nudging' of x toward y, x(n) = y(n). The stationarity condition is δA0(X) = 0, and these equations have the boundary conditions pa(t0) = pa(tf) = 0.

Page 49:

A_SM(x(t), dx(t)/dt) = ∫_{t0}^{tf} dt L(x(t), dx(t)/dt, t)

= ∫_{t0}^{tf} dt [ Σ_{l=1}^{L} (R_m(t)/2)(x_l(t) − y_l(t))² + Σ_{a=1}^{D} (R_f(a)/2)(dx_a(t)/dt − F_a(x(t)))² ]

c(x(t) − y(t)) = Σ_{l=1}^{L} (R_m(t)/2)(x_l(t) − y_l(t))²

W_ab(x(t)) = ∂F_a(x(t))/∂x_b(t) − ∂F_b(x(t))/∂x_a(t)

δ_ab d²x_b(t)/dt² − W_ab(x(t)) dx_b(t)/dt = ∂/∂x_a(t) [ c(x(t) − y(t))/R_f + F(x(t))²/2 ] + ∂F_a(x(t))/∂t

Page 50:

These equations have the form of a charged particle moving in effective electric and magnetic fields:

d²x_a(t)/dt² = [ dx(t)/dt × B(x(t)) ]_a + E_a(x(t))

B_a(x(t)) = ε_abc ∂A_c(x(t))/∂x_b(t) = (ε_abc/2) [ ∂A_c(x(t))/∂x_b(t) − ∂A_b(x(t))/∂x_c(t) ]

E_a(x(t)) = − ∂φ(x(t))/∂x_a(t) − ∂A_a(x(t))/∂t

Page 51:

Now we move on to evaluating the expected value

integrals using Laplace’s method

We do not discuss corrections to the method, but one can; the algebra is complicated.

Page 52:

A(X) is nonlinear in X and has multiple minima.

Location and number of these minima depend on the number

of measurements at each observation time in [t0,tN].

Standard model, Gaussian Error Action

Observations have Gaussian noise; models have Gaussian

errors, action is

Now we are ready to minimize the action—maximize the

probability distribution.

A_SM(X) = Σ_{n=0}^{N} Σ_{l=1}^{L} [R_m(n,l)/2] (x_l(n) − y_l(n))²  +  Σ_{n=0}^{N−1} Σ_{a=1}^{D} [R_f(a)/2] (x_a(n+1) − f_a(x(n)))²

Page 53:

Standard Model

A(X) = Σ_{n=0}^{N} Σ_{l=1}^{L} [R_m(n,l)/2] (x_l(n) − y_l(n))²  +  Σ_{n=0}^{N−1} Σ_{a=1}^{D} [R_f(a)/2] (x_a(n+1) − f_a(x(n)))²

We want to minimize this over all x(n) and the parameters in f(x(n)).

If f(x(n)) is nonlinear the Action = A(x(n),x(n+1)) has many minima in

general.

The search for the smallest minimum of a nonlinear objective

function, such as A(x(n),x(n+1)) is, in general, NP-complete.

An NP-complete problem cannot be solved in polynomial time in any

known way.

For us, that is not good news.

Page 54:

To determine the path giving the lowest minimum of the action: find a minimum for a very small model-error value Rf, then slowly increase Rf to larger values. We call this variational annealing (distinct from standard simulated annealing in statistical Physics). If Rf → ∞, the model error is 0.

We start at the opposite limit, Rf → 0, where the model plays no role and the dynamical phase-space structure is absent.

At Rf = 0 the minimum is degenerate at xl(t) = yl(t); the other, unmeasured states are undetermined. With Rf = Rf0 very small, choose N0 initial starting paths with xl(t) = yl(t) and the other components drawn from a uniform distribution; this is a set of paths X0 for numerical minimization. We call the outcomes X1.

Use the N0 paths X1 as initial starting paths with Rf = αRf0, α > 1, to arrive at N0 paths X2. Increase Rf to α²Rf0, ..., and continue using the outcome paths as initial choices for the next optimizations, slowly increasing Rf by powers of α.

Plot A0(X^q) versus β = log_α[Rf / Rf0]: action level plots.
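A minimal one-dimensional sketch of the annealing schedule, assuming a linear toy map x(n+1) = 0.9 x(n) and invented constants; this shows only the Rf-schedule bookkeeping, not a full implementation.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
f = lambda x: 0.9 * x                           # toy one-step model map
N, Rm, Rf0, alpha = 20, 1.0, 1e-3, 1.5
truth = 0.9 ** np.arange(N + 1)
y = truth + 0.01 * rng.standard_normal(N + 1)   # noisy 'data'

def action(x, Rf):
    meas = 0.5 * Rm * np.sum((x - y) ** 2)
    model = 0.5 * Rf * np.sum((x[1:] - f(x[:-1])) ** 2)
    return meas + model

x = y.copy()            # Rf -> 0 limit: start the path on the data
Rf = Rf0
for beta in range(25):  # Rf = alpha**beta * Rf0, slowly enforcing the model
    x = minimize(lambda z: action(z, Rf), x).x
    Rf *= alpha
print(float(np.max(np.abs(x - truth))))
```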

Page 55:

Simple Model Neuron NaKL

D + NP = 4 + 19, L = 1 (voltage measured)

Page 56:

Twin Experiment on NaKL Neuron

C dV(t)/dt = g_Na m(t)³ h(t) (E_Na − V(t)) + g_K n(t)⁴ (E_K − V(t)) + g_L (E_L − V(t)) + I_applied(t)

dx(t)/dt = (x_∞(V(t)) − x(t)) / τ_x(V(t)),   x(t) = {m(t), h(t), n(t)}

x_∞(V) = (1/2) [ 1 + tanh((V − V_x)/dV_x) ]

τ_x(V) = τ_x0 + τ_x1 (1 − tanh²((V − V_x)/dV_x))

Generate data from the NaKL equations:
y(t) = x(t) + σ N(0,1) noise

D = 4, L = 1
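A sketch of generating the twin-experiment data by integrating the NaKL equations, using the 'Known' parameter values from the table on a later slide; the constant injected current and the initial state are illustrative choices.

```python
import numpy as np
from scipy.integrate import solve_ivp

# 'Known' NaKL parameters from the twin-experiment table.
gNa, ENa, gK, EK, gL, EL, C = 120.0, 50.0, 20.0, -77.0, 0.3, -54.0, 0.8
kin = {'m': (-40.0, 0.0667, 0.1, 0.4),
       'h': (-60.0, -0.0667, 1.0, 7.0),
       'n': (-55.0, 0.0333, 1.0, 5.0)}   # (Vx, dVx, tau_x0, tau_x1)

def x_inf(V, Vx, dVx):
    return 0.5 * (1.0 + np.tanh((V - Vx) / dVx))

def tau_x(V, Vx, dVx, t0, t1):
    return t0 + t1 * (1.0 - np.tanh((V - Vx) / dVx) ** 2)

def nakl(t, s, I_app=15.0):   # I_app: illustrative constant drive
    V, m, h, n = s
    dV = (gNa * m**3 * h * (ENa - V) + gK * n**4 * (EK - V)
          + gL * (EL - V) + I_app) / C
    rates = [(x_inf(V, *kin[g][:2]) - x) / tau_x(V, *kin[g])
             for g, x in zip('mhn', (m, h, n))]
    return [dV] + rates

sol = solve_ivp(nakl, (0.0, 50.0), [-65.0, 0.1, 0.6, 0.3], max_step=0.02)
V_data = sol.y[0]   # the single measured component, L = 1
```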

Page 57:
Page 58:

Annealing in Model Error Accuracy

Paths giving minima of the action depend on the

number of measurements L.

For the Standard Model, when the action levels become independent of Rf, the action level is dictated by the statistics of the measurement error term; this is a consistency check on the action level evaluations.
Page 59:

NaKL Model Action Level Plot: Voltage Measured ONLY

(horizontal axis: β = log_{3/2}[Rf / Rf0])
Page 60:
Page 61:
Page 62:

NaKL Neuron Twin Experiment

Parameter   Known     Estimated   LB       UB
gNa         120.0     108.4       50.0     200.0
ENa         50.0      49.98       0.0      100.0
gK          20.0      21.11       5.0      40.0
EK          −77.0     −77.09      −100.0   −50.0
gL          0.3       0.3028      0.1      1.0
EL          −54.0     −54.05      −60.0    −50.0
C           0.8       0.81        0.5      1.5
Vm          −40.0     −40.24      −60.0    −30.0
dVm         0.0667    0.0669      0.01     0.1
τm0         0.1       0.0949      0.05     0.25
τm1         0.4       0.4120      0.1      1.0
Vh          −60.0     −59.43      −70.0    −40.0
dVh         −0.0667   −0.0702     −0.1     −0.01
τh0         1.0       1.0321      0.1      5.0
τh1         7.0       7.76        1.0      15.0
Vn          −55.0     −54.52      −70.0    −40.0
dVn         0.0333    0.0328      0.01     0.1
τn0         1.0       1.06        0.1      5.0
τn1         5.0       4.97        2.0      12.0

Page 63:

Lorenz96 Model, D = 11

dx_a(t)/dt = x_{a−1}(t) (x_{a+1}(t) − x_{a−2}(t)) − x_a(t) + f

a = 1, 2, ..., D;   x_{−1}(t) = x_{D−1}(t);   x_0(t) = x_D(t);   x_{D+1}(t) = x_1(t).

f is a fixed parameter, f = 10. Solutions are chaotic.

‘Twin Experiments’: used to test methods of data assimilation and to design experiments.

Generate data with a known model; add noise to the model output; present l = 1, 2, ..., L < D noisy time series to the assimilation procedure.
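The Lorenz96 twin-experiment data can be produced as just described; the sketch below integrates the D = 11, f = 10 system and presents L noisy components (the noise level and L are illustrative choices).

```python
import numpy as np
from scipy.integrate import solve_ivp

D, f = 11, 10.0

def lorenz96(t, x):
    # np.roll implements the cyclic boundary conditions on the index a
    return np.roll(x, 1) * (np.roll(x, -1) - np.roll(x, 2)) - x + f

rng = np.random.default_rng(1)
x0 = f + 0.01 * rng.standard_normal(D)       # small perturbation of x = f
sol = solve_ivp(lorenz96, (0.0, 10.0), x0, max_step=0.01)

L = 4                                        # illustrative number of observed series
noisy = sol.y[:L] + 0.2 * rng.standard_normal(sol.y[:L].shape)
```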

Page 64:
Page 65:
Page 66:
Page 67:
Page 68:

Neurobiological Example

Page 69:

Inject

current

Iapplied(t)

Measure

Response

Voltage

V(t)

3

4

( )( ) ( )( ( ))

( ) ( ( )) ( ( )) I ( )

( ( ) ( ))( ) a( ) ( ), ( ), ( )

( ( ))

Na Na

K K L L applied

x

dV tC g m t h t E V t

dt

g n t E V t g E V t t

a V t a tda tt m t h t n t

dt V t

2

1 2

1( ) 1 tanh( )

2

( ) (1 tanh ( ))

x

x

xtx x x

xt

V Va V

dV

V VV t t

dV

D = 4 L = 1 p = 20

Measure V(t) with

selected Iapplied(t)

Evaluate all

parameters and all

unobserved state

variables a(t)

Neuron Model
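The NaKL model above can be integrated directly. The parameter values below are standard Hodgkin-Huxley-style numbers chosen for illustration; they are not the p = 20 values estimated in the lectures.

```python
import numpy as np

# Illustrative NaKL parameters (mV, ms; conductances folded into C = 1)
P = dict(gNa=120.0, ENa=50.0, gK=20.0, EK=-77.0, gL=0.3, EL=-54.4,
         Vm=-40.0, dVm=15.0, tm0=0.1, tm1=0.4,
         Vh=-60.0, dVh=-15.0, th0=1.0, th1=7.0,
         Vn=-55.0, dVn=30.0, tn0=1.0, tn1=5.0)

def x_inf(V, Vx, dVx):
    # steady-state gating value: (1/2)[1 + tanh((V - V_x)/dV_x)]
    return 0.5 * (1.0 + np.tanh((V - Vx) / dVx))

def tau_x(V, Vx, dVx, t0, t1):
    # voltage-dependent time constant: t0 + t1[1 - tanh^2((V - V_x)/dV_x)]
    return t0 + t1 * (1.0 - np.tanh((V - Vx) / dVx) ** 2)

def naKL_rhs(s, Iapp):
    V, m, h, n = s
    dV = (P['gNa'] * m**3 * h * (P['ENa'] - V)
          + P['gK'] * n**4 * (P['EK'] - V)
          + P['gL'] * (P['EL'] - V) + Iapp)
    dm = (x_inf(V, P['Vm'], P['dVm']) - m) / tau_x(V, P['Vm'], P['dVm'], P['tm0'], P['tm1'])
    dh = (x_inf(V, P['Vh'], P['dVh']) - h) / tau_x(V, P['Vh'], P['dVh'], P['th0'], P['th1'])
    dn = (x_inf(V, P['Vn'], P['dVn']) - n) / tau_x(V, P['Vn'], P['dVn'], P['tn0'], P['tn1'])
    return np.array([dV, dm, dh, dn])

def integrate(T=200.0, dt=0.02, Iapp=10.0):
    s = np.array([-65.0, 0.05, 0.6, 0.3])   # the D = 4 state variables
    V_trace = []
    for _ in range(int(T / dt)):
        s = s + dt * naKL_rhs(s, Iapp)      # forward Euler at the 0.02 ms sampling step
        V_trace.append(s[0])
    return np.array(V_trace)

V = integrate()
```

In a twin experiment, only `V` (plus noise) would be shown to the assimilation procedure, which must then recover the parameters and the unobserved m(t), h(t), n(t).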

Page 70:

This is the challenge: using laboratory experiments on individual neurons and on collections of neurons, build biophysically based models of functional neural networks that match experiments and predict the response to new stimuli.

Our strategy is this:

o create a model of the functional network of interest, e.g. the song production network for song birds, and, of course, of the individual neurons in the network. What is a sensible model? We use Hodgkin-Huxley models.

o using the model itself, design experiments that stimulate all degrees of freedom of the neuron/network and measure enough quantities at each observation time; these are numerical simulations.

o use the model along with data (voltage across membranes, and perhaps other measurements) to determine the unknown parameters in the model and the unobserved state variables in the model.

o "validate the neuron model" via prediction. These validated neurons can be used in network construction.

Page 71:

One mainstream view of network modeling and operation is that details do not matter; some form of network "organization" or structure determines network operations.

Our use of data assimilation to design experiments and to test and validate models of cells and systems points to the advantages of other directions. Why would we want the kind of detail of neural or cellular processes that accurate modeling and careful data assimilation provide?

➢ use models of nerve cells (neurons) to compare healthy and diseased cells, providing biophysical targets for therapies

➢ use detailed models of regulatory networks for genetic action to design interventions

➢ use detailed, verified models of functional network connectivity and nodal performance to engineer functions into high-performance electronics, e.g. sequence generation and recognition with human accuracy but machine performance

Page 72:

Figure: the avian song system ("songbox"). Green, motor pathway: HVC → RA → respiration/syrinx → song production, with auditory feedback. Red, anterior forebrain pathway (AFP): HVC → Area X → DLM → LMAN → Area X and HVC; control and song maintenance.

Page 73:

Neurobiological Laboratory Experiments (Margoliash Laboratory, UChicago)

Isolated neurons from the avian song system. On each neuron, many different I_applied(t) measurements in time "epochs" of 2-6 seconds. Membrane voltage observed. Sampling time 0.02 ms (50 kHz); 500-1500 ms of observations.

Use all this to estimate the unknown parameters in the neuron and the unmeasured state variables, then predict the response of the neuron to new stimuli (forcing): Model Validation.

Page 74:

Figure: the applied current waveform. Time axis in units of 0.02 ms; 2000 ms shown. Why this I_appl(t)?

Page 75:

Back to Song System Nucleus HVC: interneurons, L = 1 (voltage).

Page 76:

C dV(t)/dt = g_Na m(t)^3 h(t) (E_Na - V(t)) + g_K n(t)^4 (E_K - V(t)) + g_L (E_L - V(t))
+ g_Ca a(t) b(t) V(t) ([Ca2+]_ext - [Ca2+](t) e^{-V(t)/V_T}) / (1 - e^{-V(t)/V_T})
+ other currents + I_applied(t)

dx(t)/dt = (x_inf(V(t)) - x(t)) / tau_x(V(t)),   x(t) = {m(t), h(t), n(t), a(t), b(t)}

x_inf(V) = (1/2) [1 + tanh((V - V_x)/dV_x)]

tau_x(V) = t_x1 + t_x2 [1 - tanh^2((V - V_x)/dV_x)]

Page 81:

VLSI Neuromorphic Chip

➢ Test parameters on the chip to check the quality of fabrication versus design.

➢ Use a twin experiment to test the method of data assimilation: generate data from the VLSI chip; use "voltages" on chip neurons as measured quantities to estimate the parameters known from the first step.

➢ Use voltage data from a biological neuron to readjust chip parameters and state variables to those for the data, then predict the voltage response to new current stimulation.

Page 82:

Test parameters on the chip to check the quality of fabrication versus design.

Page 83:

Use a twin experiment to test the method of data assimilation: generate data from the VLSI chip; use "voltages" on chip neurons as measured quantities to estimate the parameters known from the first step.

Page 84:

Use voltage data from a biological neuron to readjust chip parameters and state variables to those for the data, then predict the voltage response to new current stimulation.

Page 85:

Unfinished Business

Measurements for networks of neurons: extracellular potentials? Other technology?

Computational capability for the future

Port network models to VLSI

Use principles of network functions to solve similar problems in other space and time domains

Page 87:

Some application areas for Data Assimilation:

genetic regulatory networks
signal transduction pathways
systems biology; synthetic biology; immunology
biophysical modeling of neurons and functional networks
neutrino astrophysics
coastal flows and transport of toxic constituents after storms
electrical and chemical engineering
identifying oil and gas reservoirs
hydrological models of streams and lakes
neuromorphic engineering: neurons and functional networks on a chip
numerical weather prediction

Page 89:

Machine Learning

Feedforward Multi-layer Perceptron

Page 90:

Data Assimilation in a time window [t0, tF]: transfer information from a data library y(τ) to a model x(t).

Observe y(τ1) → move model forward → observe y(τ2) → move model forward → observe y(τ3) → ... → observe y(τk) → ... → observe y(τF).

In probabilities: P(y|x) → P(x_{n+1}|x_n) → P(y|x) → P(x_{n+1}|x_n) → ... → P(y|x).

Page 91: Data Assimilation, Machine Learning: Statistical …2018/08/04  · Data Assimilation, Machine Learning: Statistical Physics Problems Introduction, Core Ideas, Applications Henry D

l0 l1 lF-1 lFj

j=1

j=2

j=N

Page 92:

Multi-Layer Perceptron over layers [l0, lF]: transfer information from a data library {y(l0), y(lF)} to a model x(l+1) = f(W x(l)).

Input y(l0) → move the model forward layer to layer → output y(lF).

In probabilities: P(y|x_{l0}) → P(x_{l+1}|x_l) → P(y|x_{lF}).
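The layer-to-layer rule x(l+1) = f(W x(l)) is an ordinary feedforward pass. A minimal sketch, in which the tanh activation and the layer sizes are illustrative assumptions:

```python
import numpy as np

def forward(x0, weights, f=np.tanh):
    # move the model forward layer to layer: x(l+1) = f(W(l) x(l))
    xs = [x0]
    for W in weights:
        xs.append(f(W @ xs[-1]))
    return xs            # states at every layer l0 ... lF

rng = np.random.default_rng(1)
N, n_layers = 10, 5
weights = [rng.uniform(-0.1, 0.1, (N, N)) for _ in range(n_layers)]
states = forward(rng.standard_normal(N), weights)
```

Keeping the whole list of layer states, rather than just the output, is what lets the data-assimilation view treat every x(l) as an estimation variable.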

Page 93:

Total Probability = P(y|x) P(x_{l+1}|x_l) P(y|x) ... P(y|x)

= exp[log(Total Probability)] = exp(-[Σ -log P(x|y) + Σ -log P(x_{l+1}|x_l)])

= exp[-Action_ML(l)] = e^{-A_ML(l)}

Expected Value of G(X) = ∫ dX G(X) e^{-A_ML(l)} / ∫ dX e^{-A_ML(l)}

Machine Learning requires evaluating a Statistical Physics Integral.

Page 94: Data Assimilation, Machine Learning: Statistical …2018/08/04  · Data Assimilation, Machine Learning: Statistical Physics Problems Introduction, Core Ideas, Applications Henry D

In standard Machine Learning, we have a network with an input l0 layer

and an output layer lF, and between them intermediate “hidden” layers.

Information in noisy pairs {yk(l0),yk(lF)} k = 1,2,…,M are presented to the

network.

We want to minimize the cost function

subject to the network rules with layer l and active units

(“neurons”) xj(l), j = 1, 2, N satisfying

Relax the equality constraint

x

j(l +1) = f

j[W

ji(l)x

i(l)]

,2

1, 1

( , )1( ( ) ( ))

2 2

M Lk kmr r

k r

R r lx l y l

ML

Page 95:

Machine Learning Action:

A_ML(x(l)) = (1/(2ML)) Σ_{k=1,r=1}^{M,L} R_m(r,l) (x_r^{(k)}(l) - y_r^{(k)}(l))^2,  with R_m(l) ≠ 0 when l ∈ {l0, lF},

+ Σ (R_f(l)/2) [x^{(k)}(l+1) - f(W(l) x^{(k)}(l))]^2

Total Probability = exp[-A_ML(x(l))].

To approximate the expected value integral, maximize the overall probability, i.e. minimize A_ML(x(l)) over x(l) and W(l). The model is exact when R_f → ∞.
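A toy version of this minimization can be sketched with scalar units and a finite-difference gradient. Everything numerical here (the chain length, tanh activation, learning-rate schedule, and the R_f annealing ladder) is an illustrative assumption, not the slides' procedure in detail:

```python
import numpy as np

def action(z, y0, yF, Rm, Rf, F):
    # z packs the states x(0..F) and then the weights w(0..F-1); scalar units for clarity
    x, w = z[:F + 1], z[F + 1:]
    model = sum((x[l + 1] - np.tanh(w[l] * x[l])) ** 2 for l in range(F))
    measurement = (x[0] - y0) ** 2 + (x[F] - yF) ** 2   # R_m != 0 only at l0 and lF
    return 0.5 * Rm * measurement + 0.5 * Rf * model

def num_grad(fun, z, eps=1e-6):
    # central finite-difference gradient; fine at this toy size
    g = np.zeros_like(z)
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        g[i] = (fun(z + dz) - fun(z - dz)) / (2 * eps)
    return g

y0, yF, F, Rm = 0.3, 0.1, 4, 1.0
rng = np.random.default_rng(0)
z = 0.1 * rng.standard_normal(2 * F + 1)
A_init = action(z, y0, yF, Rm, 100.0, F)
for Rf in [0.01, 0.1, 1.0, 10.0, 100.0]:      # anneal: enforce the model slowly
    lr = 0.1 / (1.0 + Rf)
    for _ in range(1000):
        z = z - lr * num_grad(lambda zz: action(zz, y0, yF, Rm, Rf, F), z)
A_final = action(z, y0, yF, Rm, 100.0, F)
```

The R_f ladder is the point: starting with a weak model penalty and tightening it avoids being trapped early in a poor local minimum of the action.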

Page 96:

ML example

Page 97:

Data are generated by selecting a network with 100 layers and 10 'neurons' at each layer. Weights are selected from a uniform distribution U[-0.1, 0.1]. Inputs x_k(l0), k = 1, 2, ..., are passed through the network, producing outputs x_k(lF). Gaussian noise N(0, σ^2 = 0.0025) is added to the inputs and the outputs. These make our library of data {y_k(l0), y_k(lF)}.

We then build a network with lF layers and N active units per layer and train this network by minimizing (with t ⇒ l: time in data assimilation corresponds to layer in machine learning):

A_ML(x_r^k(l), W_ji(l)) = (1/M) Σ_{k=1}^{M} Σ_{l=l0}^{lF} { (R_m(l)/(2L)) Σ_{r=1}^{L} (x_r^k(l) - y_r^k(l))^2 + (R_f/2) Σ_{j=1}^{N} (x_j^k(l+1) - f(W_ji(l) x_i^k(l)))^2 }
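The data-generation step above can be sketched directly. The tanh activation is an assumption (the slides do not name f); the sizes and noise variance follow the text:

```python
import numpy as np

def make_library(M=100, N=10, n_layers=100, sigma2=0.0025, seed=0):
    rng = np.random.default_rng(seed)
    # one weight matrix per layer transition, drawn from U[-0.1, 0.1]
    W = rng.uniform(-0.1, 0.1, (n_layers - 1, N, N))
    x_in = rng.standard_normal((M, N))
    x = x_in.copy()
    for Wl in W:                         # pass inputs through the 100-layer network
        x = np.tanh(x @ Wl.T)
    noise = np.sqrt(sigma2)
    y_in = x_in + noise * rng.standard_normal((M, N))
    y_out = x + noise * rng.standard_normal((M, N))
    return y_in, y_out                   # the data library {y_k(l0), y_k(lF)}

y0_lib, yF_lib = make_library()
```

With weights this small the deep network is strongly contracting, so the clean outputs are near zero and the σ² = 0.0025 noise is a substantial part of what the training sees.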

Page 98:

Adversarial Perturbations

Page 99:

The network is trained using variational annealing with M data pairs. Prediction (generalization) is performed using M_P new pairs {y_k(l0), y_k(lF)}: the input y_k(l0) is presented to the trained network at l0, and the output x_k(lF) is compared to the output of the data pair at lF.

Quality of predictions:

E^2(lF, M) = (1/(N M_P)) Σ_{k=1, j=1}^{M_P, N} [x_j^k(lF) - y_j^k(lF)]^2
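The generalization error E²(lF, M) is a mean-squared output mismatch averaged over the M_P prediction pairs and the N units. As a sketch (the random test data are illustrative):

```python
import numpy as np

def prediction_error(x_out, y_out):
    # E^2(lF, M) = (1 / (N * M_P)) * sum_{k,j} [x_j^k(lF) - y_j^k(lF)]^2
    MP, N = y_out.shape
    return np.sum((x_out - y_out) ** 2) / (N * MP)

rng = np.random.default_rng(2)
y = rng.standard_normal((50, 10))     # M_P = 50 prediction pairs, N = 10 units
e_zero = prediction_error(y, y)       # perfect prediction gives E^2 = 0
e2 = prediction_error(y + 0.05, y)    # a uniform 0.05 offset gives E^2 ≈ 0.0025
```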

Page 101:

Deepest Learning: the layer becomes a continuous variable.

A(x(l), x'(l)) = ∫_{l0}^{lF} dl L(x(l), x'(l), l)

L(x(l), x'(l), l) = Σ_{r=1}^{L} (R_m(r,l)/2) (x_r(l) - y_r(l))^2 + Σ_{a=1}^{D} (R_f(a)/2) [x'_a(l) - F_a(x(l), l)]^2

= C(x(l) - y(l)) + Σ_{a=1}^{D} (R_f(a)/2) [x'_a(l) - F_a(x(l), l)]^2

Page 102: Data Assimilation, Machine Learning: Statistical …2018/08/04  · Data Assimilation, Machine Learning: Statistical Physics Problems Introduction, Core Ideas, Applications Henry D

Our variational principle is in Lagrangian

coordinates {x(l),x’(l)} with

Boundary Conditions: p(l0) = p(lF) = 0

Generator of a

rotation in

D-dimensions

d2x(t)

dt 2= "v(t)´B(x(t),t)"+[ÑF(x(t),t)+ ¶t A(x(t),t)]

Page 103: Data Assimilation, Machine Learning: Statistical …2018/08/04  · Data Assimilation, Machine Learning: Statistical Physics Problems Introduction, Core Ideas, Applications Henry D

H (x(l), p(l),l) =p2

2Rf

+ p · F(x(l)) - Measurement Error Term

p(l) =¶L(x(l),x '(l),l)

¶x '(l)

dx(l)

dl= F(x(l),l) +

p(l)

Rf

dp(l)

dl= -

¶F(x(l),l)

¶x(l)p(l) + R

m(r,l)(x

r(l) - y

r(l))

Boundary conditions: p(l0) = p(l

F) = 0

Back Propagation starting at lF
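These equations can be stepped numerically. The sketch below takes a hypothetical linear model F(x) = a·x with data at every layer, integrates x forward with p = 0 as a first guess, then sweeps p backward from p(lF) = 0, which is the back-propagation pass. All values (a, R_m, the sinusoidal "data") are illustrative assumptions:

```python
import numpy as np

def backward_p(x, y, a, Rm, dl):
    # dp/dl = -F'(x) p + R_m (x - y), integrated backward from p(lF) = 0
    p = np.zeros_like(x)
    for n in range(len(x) - 1, 0, -1):
        # Euler step backward in the layer variable l
        p[n - 1] = p[n] + dl * (a * p[n] - Rm * (x[n] - y[n]))
    return p

dl, a, Rm = 0.01, -0.5, 1.0
l = np.arange(0.0, 1.0, dl)
x = 1.0 * np.exp(a * l)          # forward solution of dx/dl = a x (p = 0 first guess)
y = np.sin(2 * np.pi * l)        # illustrative "data" at every layer
p = backward_p(x, y, a, Rm, dl)
```

The resulting p(l) is the correction signal that would then feed back into the dx/dl equation on the next sweep.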

Page 104:

A_ML(x(l), x'(l)) = Σ_{l=l0}^{lF-1} L(x(l), x(l+1))

δA_ML(x(l), x'(l), l) = δx(l0) ∂L(x(l0), x(l1))/∂x(l0) + δx(lF) ∂L(x(lF-1), x(lF))/∂x(lF) + Discrete Euler-Lagrange equation at each layer

The first two terms are boundary-condition terms. The Lagrangian variation satisfies the symplectic symmetry of the problem, giving accurate estimation of minima of the action A(X). More stable than Back Propagation?

Page 106:

Further possibilities:

Recurrent networks

Performance on large libraries of labeled images

Use of the ML method in identification of the functional network connectivity of biophysical neurons

Information input at intermediate layers?

More complex networks; learn by introducing many R_f terms and use them as required?

Page 109:

Expected Value of G(X) = ∫ dX G(X) e^{-A_ML(l)} / ∫ dX e^{-A_ML(l)}

Machine Learning requires evaluating a Statistical Physics Integral.

Page 110:

Data Source (Lorenz63):

dx_1(t)/dt = σ (x_2(t) - x_1(t))

dx_2(t)/dt = x_1(t) (q - x_3(t)) - x_2(t)

dx_3(t)/dt = x_1(t) x_2(t) - b x_3(t)

Model Equations; x_1(t) is passed to the model:

dy_1(t)/dt = σ (y_2(t) - y_1(t)) + u(t) (x_1(t) - y_1(t))

dy_2(t)/dt = y_1(t) (q^M - y_3(t)) - y_2(t)

dy_3(t)/dt = y_1(t) y_2(t) - b^M y_3(t)

α is first treated as fixed, then driven: α(t)

Page 111:

X(n+1) = {x(t_0), x(t_1), ..., x(t_n), x(t_{n+1})} = {X(n), x(n+1)}

Y(n+1) = {y(t_0), y(t_1), ..., y(t_n), y(t_{n+1})} = {Y(n), y(n+1)}

Page 112:

So, what did we learn?

1. Make a model. There are no algorithms for this; use your best knowledge of the (bio)physics.

2. Make a big model. Experiments will prune the model.

3. Do twin experiments to determine how many measurements you need to get the "global" minimum of A_0(X): annealing.

4. Use twin experiments to design laboratory experiments.

5. Do experiments to determine the consistency of the model with data.

6. Use the completed model and the estimated x(T), via the probability distribution or dx/dt = F(x(t)), to predict for t > T. This validates (or not) the model.

Page 113:

So, what did we learn?

7. Use the Laplace method plus computable corrections to determine the consistency of numerical methods.

8. If there are not enough measurements at each observation time, (a) get more; (b) use waveform information via time delays.

9. Using Data Assimilation, one can (a) test new fabrications of VLSI neurons, (b) test DA methods on a verified VLSI chip, and (c) complete a neuron model on a chip from biological data, then predict the response to new forcing.

Page 118:

P(x(n+1), X(n) | y(n+1), Y(n)) = P(X(n+1) | Y(n+1))

= [P(y(n+1) | x(n+1), X(n), Y(n)) / P(y(n+1) | Y(n))] · P(x(n+1), X(n) | Y(n))

= [P(y(n+1) | x(n+1), X(n), Y(n)) / P(y(n+1) | Y(n))] · P(x(n+1) | X(n), Y(n)) · P(X(n) | Y(n))

Markov property: P(x(n+1) | X(n), Y(n)) = P(x(n+1) | x(n)), so

P(X(n+1) | Y(n+1)) = [P(y(n+1) | x(n+1), X(n), Y(n)) / P(y(n+1) | Y(n))] · P(x(n+1) | x(n)) · P(X(n) | Y(n))

Change due to Observation · Move Model Forward
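The two factors can be seen in a toy grid-based filter for a one-dimensional state: each cycle multiplies by a likelihood (change due to observation) and convolves with the transition kernel (move the model forward). Everything here (the random-walk model, Gaussian likelihood, grid, and observation sequence) is an illustrative assumption:

```python
import numpy as np

grid = np.linspace(-5, 5, 201)
dgrid = grid[1] - grid[0]
prior = np.exp(-0.5 * grid ** 2)            # P(X(0)), a standard-normal prior
prior /= prior.sum() * dgrid

def observe(p, y, R=0.5):
    # change due to observation: multiply by P(y(n+1) | x(n+1))
    p = p * np.exp(-0.5 * (grid - y) ** 2 / R)
    return p / (p.sum() * dgrid)

def move_forward(p, q=0.2):
    # move the model forward: apply the Markov kernel P(x(n+1) | x(n))
    kernel = np.exp(-0.5 * (grid[:, None] - grid[None, :]) ** 2 / q)
    p = kernel @ p
    return p / (p.sum() * dgrid)

p = prior
for y in [0.5, 0.8, 1.1]:                   # a short observation sequence
    p = observe(p, y)
    p = move_forward(p)
mean = (grid * p).sum() * dgrid
```

Iterating these two steps is exactly the recursion above, carried out on a discretized probability density instead of in closed form.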

Page 119:

To see how the optimization process is working, we look at the optimization output at various steps during the iteration:

{y(n)_0, u(n)_0, p_0} → {y(n)_1, u(n)_1, p_1} → ... → {y(n)_Final, u(n)_Final, p_Final}

until the objective (cost) function is minimized, subject to the model equations. We require u(n) → 0.

Page 120:

We will discuss the equivalence between machine learning (ML) and data assimilation (DA).

Data Assimilation is the transfer of information from (often sparse) observations to dynamical models of complex systems: numerical weather prediction, neurobiology, ...

We will review the formulation of each, DA and ML. The equivalence will be clear.

Then we will give a variational annealing method for locating the smallest minimum of the action/cost function in variational calculations in each field. Using this in an example from each field gives design insight into "deep learning".

We then formulate "deepest learning" as the ML layers become continuous. This puts back propagation in a familiar perspective: the Euler-Lagrange equations of the variational principle. It also suggests more stable variational approaches.

Page 121:

Page 122:

Total Probability = P(y|x) P(x_{n+1}|x_n) P(y|x) P(x_{n+1}|x_n) ... P(y|x)

= exp(-[Σ -log P(x|y) + Σ -log P(x_{n+1}|x_n)]) = e^{-Action(X)} = e^{-A(X)}

Expected Value of G(X) = ∫ dX G(X) e^{-A(X)} / ∫ dX e^{-A(X)}

Data Assimilation requires evaluating a Statistical Physics Integral.
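For a low-dimensional action, the ratio of integrals can be estimated by Metropolis Monte Carlo. A sketch for a toy double-well action A(X) = (X² - 1)² with G(X) = X²; both choices are illustrative, not the DA action:

```python
import numpy as np

def A(X):
    return (X ** 2 - 1.0) ** 2              # a toy double-well action

def metropolis(n_samples=100000, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    X = 0.0
    samples = np.empty(n_samples)
    for i in range(n_samples):
        Xp = X + step * rng.standard_normal()
        # accept the proposal with probability e^{-[A(Xp) - A(X)]}
        if np.log(rng.uniform()) < A(X) - A(Xp):
            X = Xp
        samples[i] = X
    return samples

s = metropolis()
expected_G = np.mean(s[1000:] ** 2)         # <G(X)> = ∫ dX G e^{-A} / ∫ dX e^{-A}
```

Because the samples are drawn from e^{-A(X)} itself, the normalizing denominator never has to be computed; that is what makes Monte Carlo attractive when X is the full path of states and parameters.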

Page 123:

Observation window in time: t_0, t_1, ..., t_N

X(n) = {x(0), x(1), x(2), ..., x(n)}: model state vectors and parameters at times t_0, t_1, ..., t_n

Y(n) = {y(0), y(1), y(2), ..., y(n)}: observed data vectors at times τ_0, τ_1, ..., τ_n

We want P(X(n+1)|Y(n+1)) in terms of P(X(n)|Y(n)). This recursion will give us a representation of P(X(N)|Y(N)).

Page 124:

P(X(n+1) | Y(n+1)) = P(x(n+1), y(n+1), X(n), Y(n)) / P(y(n+1), Y(n))   (definition of conditional probability)

= [P(x(n+1), y(n+1), X(n) | Y(n)) / (P(y(n+1) | Y(n)) P(x(n+1), X(n) | Y(n)))] · P(x(n+1), X(n) | Y(n))

= exp[CMI(x(n+1), y(n+1), X(n) | Y(n))] · P(x(n+1) | X(n), Y(n)) · P(X(n) | Y(n))

Markov property: x(n+1) depends ONLY on x(n) (true of differential equations in biophysics), so P(y(n+1) | x(n+1), X(n), Y(n)) = P(y(n+1) | x(n+1)) and P(x(n+1) | X(n), Y(n)) = P(x(n+1) | x(n)).

Iterating the recursion, P(X(N) | Y(N)) = exp[-A(X)], up to terms independent of X.

Page 125:

Lorenz96 Model: L = 2, 4, 5, 6; D = 11.