distilling free-form natural laws from experimental...
TRANSCRIPT
Hod Lipson, Cornell University
Distilling Free-Form Natural
Laws from Experimental Data
Lipson & Pollack, Nature 406, 2000
Adapting in simulation
Simulator
Evolve Controller
In Simulation
Try it in reality!
Download
Crossing The
Simulation-Reality
Gap
Adapting in reality
Evolve Controller
In Reality
Try it !
Too many
Physical Trials
Simulation & Reality
Evolve Controller
“Simulator”
Try it in reality!
Build
Collect Sensor Data
Evolve Simulator Evolve Simulators Evolve Robots
Servo
Actuators
Tilt Sensors
Emergent Self-Model
With Josh Bongard and Victor Zykov, Science 2006
Damage Recovery
With Josh Bongard and Victor Zykov, Science 2006
Random
Predicted
Physical
System Identification
?
Perturbations
Symbolic Regression
f(x)=exsin(|x|)
What function describes this data?
John Koza, 1992
Encoding Equations
*
sin –
x1 3
f(x)
+
x2 -7
Building Blocks: + - * / sin cos exp log … etc
sin(x2)
x1*sin(x2)
(x1 – 3)*sin(x2)
(x1 – 3)*sin(-7 + x2)
John Koza, 1992
Michael D. Schmidt, Hod Lipson (2006)
Models: Expression trees
Subject to mutation and selection
Experiments: Data-points
Subject to mutation and selection
{const,+,-,*,/,sin,cos,exp,log,abs}
+
sin×
–1.2
x 2
x
+
sin×
–1.2
x 2
x
Solution Accuracy
Entire Dataset
Coevolved Dataset
15
17
19
21
23
25
27
29
31
33
35
0 5000 10000 15000
Generations
Solu
tion S
ize (
# o
f nodes)
Coevolution
Exact
Solution Complexity
Entire Dataset
Coevolved Dataset
Semi-empirical mass formula
Modeling the binding energy of an atomic nucleus
ZA
A
ZAa
A
ZZaAaAaE ACSVB ,
212
31
32
odd ,
odd
even ,
0,
0
0
NZ
A
NZ
ZA
210
A
aP
R2 = 0.999915
Weizsäcker’s Formula:
A
ZN.
A
Z.A.A..E
.
.
B
2
260
2640 )(2917390
391243138314
R2 = 0.99944
Inferred Formula:
Systems of Differential Equations
• Regress on derivative
time x1 x2 …
0 3.4 -1.7 …
0.1 3.2 -0.9 …
0.2 3.1 -0.1 …
0.3 2.7 1.2 …
… … … …
State Variables
dx1/dt x2/dt …
-2.0 8.0 …
-1.0 8.0 …
-4.0 1.3 …
-5.7 1.9 …
… … …
Derivatives
0
1
2 6
1
2 3
3
1 31
43
1 322 1 2 14
3
2*
32 1 3 2
42 3
*2.5 100
1 13.6769*
*200 6* * 12* *
1 13.6769*
6* * 16* *
16* * 100*
J
v
v v
v
v v
v
S AdS
dt A
S AdSS N S N
dt A
dSS N S A
dt
dSA S
dt
4
2 4
3 5
1
2 4
22 1 2 4
3 1 32 3 34
3 2
2*
55
*
6* * 100* *
*200 32* * 1.28*
1 13.6769
1.3*
v
v v
v v
v
J
N S
dNS N N S
dt
dA S AA S A
dt A
dSS
dt
Original Equations Inferred Equations
Inferring Biological Networks
With Michael Schmidt, John Wikswo (Vanderbilt), Jerry Jenkins (CFDRC)
0
1
2 6
1
2
1 31
43
1 322 1 2 14
3
2*
32 1
*2.42114 99.2721
1 13.5956*
*199.935 5.99475* * 11.9895* *
1 13.6734*
5.99857* * 1
J
v
v v
v
v
S AdS
dt A
S AdSS N S N
dt A
dSS N
dt
3
43
2 4
3 2
42 3 2 4
22 1 2 4
3 1 3
3
43
2*
5.99606* *
15.997* * 100.0
0.01
15* *
5.99857* * 99.9963* *
*197.781
1 13.263
286
3
*
v
vv
v
extran
v
v
eous
S A
dSA S N S
dt
dNS N N S
dt
dA S
dt
S
A
A
3 5
1
2 3 3
2
55
31.9682* * 1.29659*
1.29626*
v v
J
A S A
dSS
dt
Charles Richter
Wet Data, Unknown System
With Michael Schmidt (Cornell) and Gurol Suel (UT Southwestern)
Wet Data, Unknown System
With Michael Schmidt (Cornell) and Gurol Suel (UT Southwestern)
Cell #1
Cell #2
Cell #3-60 …
Blue Dots = data points, Green Line = regressed fit
=
=
0
1
1
11
/ /
/ //
n
K Kk Kn n
K S
S k
S Sp
K S
K KdKK
dt k K K S
SdSS
dt K SK k
Biologist’s Inferred Model: Gurol Suel, et. al., Science 2007
Symbolic Regression Inferred Time-Delay Model:
S
Kcba
dt
dS
K
Scba
dt
dK
SSS
KKK
7423.163214.170.1582
18
51
t
tt
G
S
dt
dG
05.33019.0922.114
15
25
t
tt
S
G
dt
dS
Withheld Test Set #1 Fit
1355.10312.2192.3526
17
54
t
tt
G
S
dt
dG
9693.20178.0271.132
18
57
t
tt
S
G
dt
dS
Withheld Test Set #2 Fit
4406.67452.391.5057
21
46
t
tt
G
S
dt
dG
871.32965.0426.295
20
54
t
tt
S
G
dt
dS
Withheld Test Set #3 Fit
Looking For Invariants
42
42+x-x
42+1/(1000+x2)
?
From Data:
x y …
0.1 2.3
0.2 4.5
0.3 9.7
0.4 5.1
0.5 3.3
0.6 1.0
… … …
δx
δy
δy
δx , , …
Calculate partial derivatives Numerically:
δf
δx
δf
δy
From Equation:
δx’
δy’
δy’
δx’ , , …
Calculate predicted partial
derivatives Symbolically:
-5 0 5
-3
-2
-1
0
1
2
3
x
y
x
y
x3 + x – y2 – 1.5 = 0 x2 + y2 – 16 = 0 x2 + y2 + z2 – 1 = 0
-2 -1 0 1 2-3
-2
-1
0
1
2
3
x
y
x
y
-1 -0.5 0 0.5 1-1
-0.5
0
0.5
1
x
z
x
z
Homework
Circle Elliptic Curve Sphere
Linear Oscillator
2
2 61.591 369.495dx
H xdt
* *
2
2 114 28 692 322
. * . *dx
L xdt
2
2 61.591 369.495dx
H xdt
* *
• Coefficients may have different
scales and offsets each run
2
2 114 28 692 322
. * . *dx
L xdt
Pendulum
2
2.42847*cos( )
dL
dt 2
3.52768* 9.43429*cos( ) d
Hdt
H
L
Double Linear Oscillator
2 2
2 2 2 11 2 1 214.691* 15.551* 21.676* 8.3808* 2.6046*
dx dxH x x x x
dt dt
would be plus for Lagrangian
-k1ω1 – k2ω2 + k3ω1 cos(θ1 – θ2) + k4ω2 cos(θ1 – θ2)
k1 ω1 ω2 – k2 cos(θ1 – θ2)
k1ω12 + k2ω2
2 – k3ω1ω2 cos(θ1 – θ2) – k4 cos(θ1) – k5 cos(θ2)
Parsimony [-nodes]
Pre
dic
tive A
bil
ity [
-lo
g e
rro
r]
A k1 θ2 – k2 ω12 – k3 ω2
2 + k4 ω1 ω2 cos(θ1 – k5 θ2) + k6 cos(θ2) + k7 cos(θ1) – k8 cos(k9 θ2) – k10
cos(k11 – k12 θ2)
-k1 ω12 – k1ω2
2 + k1 ω1ω2 cos(θ2) + k1 cos(θ2) + k1 cos(θ1)
ω2·cos(θ1 θ2) +ω1
0
-2
-0.4
-0.8
-1.2
-1.6
SimpleComplex
Less
Pre
dic
tive
Mo
re
Pre
dic
tive
1
2
3
4
K K t t
K
t t
S S t t
S
t t
SK
t K
KS
t S
b cda
d
b cda
d
55 60 65 70 75 80 85 90 95 100
-20
0
20
40
60
80
100
120
0 4010 20 30
-20
100
20
60
0 10 20 30 40 50 60 70 80
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 10 20
0
1
0.2
0.4
0.6
0.8
30 40 50 60 70 80 90 100 110 120 1300
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 4010 20 300
1
0.2
0.4
0.6
0.8
0 5 10 15 20 25 30 35 4010
1
102
103
101
102
103
0 105
Time [hours]Time [hours]
Time [hours]Time [hours]
No
rmal
ize
d C
om
KN
orm
aliz
ed
Co
mK
Inva
rian
tIn
vari
ant
Simulation:
Experiment:
Von Thomas Hermanowski, Dr. Andreas Rick, Dr. Jochen Weber
Particle Accelerators
angle
Distortion Correction (GE X-Ray Detectors)
5x Resolution Improvement
Jay Schuren
David R. Stoutemyer
NOISE?
Stochastic Models
f(x)=sin (x + noise) ×
+
sin ×
– 1.2
x
x
+
sin ×
– 1.2
x
x noise
600 700 800 900 1000 1100 1200 1300 1400600
700
800
900
1000
1100
1200
1300
1400
1500
x
y
Periodic Samples of a Stochastic System
x
x + y
y
2 x
2 y
ø
0.1
10
10
Maximum Likelihood Stochastic Model
Simulation…
Ishanu Chattopadhyay
Ishanu Chattopadhyay Data: OGLE-II Light curves
Ishanu Chattopadhyay
Ishanu Chattopadhyay
Concluding Remarks
Correlation is enough. Faced with massive data, [the Scientific Method] is becoming obsolete. We can stop looking for models.
“
” Chris Anderson
Wired 16.07
The data deluge accelerates our ability to hypothesize, model, and test.
Theoretical physicists are not yet obsolete,
but scientists have taken steps toward
replacing themselves