TRANSCRIPT
This work is supported by DARPA grant FA8750-14-2-0007.
1
Conditioning and density, mathematically and computationally
Chung-chieh Shan (with Wazim Mohammed Ismail)
Mathematical Foundations of Programming Semantics
June 14, 2014
2
Probabilistic programming
Alice beat Bob at a game. Is she better than him at it?
Generative story
a <- normal 10 3
b <- normal 10 3
l <- normal 0 2
Observed effect
condition (a-b > l)
Hidden cause
return (a > b)
[Figures: histogram of a; scatter plot of (a, b); 3D scatter of (a, b, noise)]
Denoted measure:
λc. ∫ N(10,3)(da) ∫ N(10,3)(db) ∫ N(0,2)(dl) ⟨a − b > l⟩ c(a > b)
2
Importance sampling
Alice beat Bob at a game. Is she better than him at it?
Generative story
a <- normal 10 3
b <- normal 10 3
weight ((1 + erf((a-b) / (2√2))) / 2)
Observed effect
condition (a-b > l)
Hidden cause
return (a > b)
[Figure: weighted scatter plot of (a, b)]
Denoted measure:
λc. ∫ N(10,3)(da) ∫ N(10,3)(db) · (1 + erf((a−b)/(2√2)))/2 · c(a > b)
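As a sanity check on the erf weight, here is a minimal Python sketch (not from the talk) that estimates P(a > b | a−b > l) by self-normalized importance sampling; the weight (1 + erf((a−b)/(2√2)))/2 is exactly P(l < a−b) for l ~ N(0, 2):

```python
import math
import random

random.seed(0)

def weight(a, b):
    # P(a - b > l) for l ~ N(0, 2): the normal CDF at (a - b) / 2,
    # i.e. (1 + erf((a - b) / (2 * sqrt(2)))) / 2 as on the slide.
    return (1 + math.erf((a - b) / (2 * math.sqrt(2)))) / 2

num = den = 0.0
for _ in range(100_000):
    a = random.gauss(10, 3)
    b = random.gauss(10, 3)
    w = weight(a, b)
    num += w * (a > b)
    den += w

print(num / den)  # self-normalized estimate of P(a > b | a - b > l)
```

Rejection sampling would discard roughly half the samples; the weighted version keeps them all.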
3
Conditioning on a real number
The bike was moving near location 9. Where's it now?
Generative story
x <- normal 10 3
m <- normal x 1
x' <- normal (x+5) 2
Observed effect
condition (m = 9)
Hidden cause
return x'
[Figures: histogram of x; scatter plot of (x, m); 3D scatter of (x, m, x')]
Denoted measure:
λc. ∫ N(10,3)(dx) ∫ N(x,1)(dm) ∫ N(x+5,2)(dx') c(x, m, x')
Express the measure by choosing m first:
m <- normal 10 √10
x <- normal ((9/10)·m + (1/10)·10) √(9/10)
x' <- normal (x+5) 2
λc. ∫ N(10,√10)(dm) ∫ N((9/10)m + (1/10)·10, √(9/10))(dx) ∫ N(x+5,2)(dx') c(x, m, x')
Denoted conditional measure:
λm. λc. ∫ N((9/10)m + (1/10)·10, √(9/10))(dx) ∫ N(x+5,2)(dx') c(x, m, x')
Substituting m = 9 and returning x', the conditioned program becomes
x <- normal (91/10) √(9/10)
x' <- normal (x+5) 2
Denoted marginal measure:
λc. ∫ N(91/10, √(9/10))(dx) ∫ N(x+5,2)(dx') c(x')
= λc. ∫ N(141/10, √(49/10))(dx') c(x')
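The fractions 91/10, 9/10, 141/10 and 49/10 follow from the standard conjugate-Gaussian update; a quick exact check in Python (working with variances, so standard deviations 3 and 2 become 9 and 4; not code from the talk):

```python
from fractions import Fraction as F

mu0, var0 = F(10), F(9)   # prior x ~ N(10, 3), variance 3^2 = 9
m, var_m = F(9), F(1)     # likelihood m ~ N(x, 1), observed m = 9

# Conjugate update: precisions add; the posterior mean is the
# precision-weighted average of prior mean and observation.
post_var = 1 / (1 / var0 + 1 / var_m)
post_mean = post_var * (mu0 / var0 + m / var_m)
print(post_mean, post_var)          # 91/10 9/10

# Push forward through x' ~ N(x + 5, 2): means shift by 5, variances add 4.
print(post_mean + 5, post_var + 4)  # 141/10 49/10
```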
4
Importance sampling
Each sample has an importance weight: how much did we rig our random choices to avoid rejection?
Generative story
x <- normal 10 3
m <- normal x 1
x' <- normal (x+5) 2
Observed effect
condition (m = 9)
Replacing the draw of m with a weight:
x <- normal 10 3
weight (e^(-(9-x)²/2) / √(2π))
x' <- normal (x+5) 2
[Figure: weighted scatter plot of (x, x')]
Denoted measure:
λc. ∫ N(10,3)(dx) · (e^(−(9−x)²/2)/√(2π)) · ∫ N(x+5,2)(dx') c(x, m, x')
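This weight is runnable as-is; a sketch (not from the talk) that recovers the posterior mean of x', 141/10 = 14.1, by self-normalized importance sampling:

```python
import math
import random

random.seed(1)

num = den = 0.0
for _ in range(200_000):
    x = random.gauss(10, 3)
    # Importance weight: density of the observation m = 9 under m ~ N(x, 1).
    w = math.exp(-(9 - x) ** 2 / 2) / math.sqrt(2 * math.pi)
    xp = random.gauss(x + 5, 2)
    num += w * xp
    den += w

print(num / den)  # ≈ 141/10 = 14.1
```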
5
Probability or density?
Generative story
x <- flip True False
m <- (if x then flip 0 1
      else normal 0 1)
Observed effect
condition (m = 1)
Hidden cause
return x
Replacing the draw of m with a weight:
x <- flip True False
weight (if x then 1/2
        else e^(-1/2)/√(2π))
[Figure: the line of m values, marking the points 0 and 1]
6
Ambient measures
Let ξ be a measure over pairs (x, m).
ξ = ... <- ...
    ... <- ...
    ...
    return (x,m)
  = λc. ∫∫ ··· c(x, m)
Condition ξ on m = express ξ by choosing m first! (easy when m is drawn from a fixed measure)
The conditional ξ(m) is the rest of the program once m is chosen:
ξ = m <- μ
    ... <- ...
    ...
    return (x,m)
  = λc. ∫ μ(dm) ∫ ··· c(x, m)
It doesn't matter what the ambient measure μ is, but there must be one, such as Lebesgue.
7
Discrete importance sampling revisited
a <- normal 10 3
b <- normal 10 3
l <- normal 0 2
w <- dirac (a-b > l)
return (a > b)

a <- normal 10 3
b <- normal 10 3
l <- normal 0 2
w <- counting
weight (if a-b > l then 1
        else 0)
return (a > b)

[Figures: 3D scatter of (a, b, noise); scatter plot of (a, b)]
probability density (absolute continuity) w.r.t. counting
8
Continuous importance sampling revisited
x <- normal 10 3
m <- normal x 1
x' <- normal (x+5) 2
...

x <- normal 10 3
m <- lebesgue
weight (e^(-(m-x)²/2) / √(2π))
x' <- normal (x+5) 2
...

[Figures: 3D scatter of (x, m, x'); scatter plot of (x, x')]
probability density (absolute continuity) w.r.t. lebesgue
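The weight in the transformed program is exactly the density of `normal x 1`, i.e. the Radon–Nikodym derivative of the original choice of m with respect to `lebesgue`. A quick numeric sanity check (a plain Riemann sum standing in for the Lebesgue integral; not from the talk):

```python
import math

def weight(m, x):
    # Density of N(x, 1) at m: the factor attached to "m <- lebesgue".
    return math.exp(-(m - x) ** 2 / 2) / math.sqrt(2 * math.pi)

# Integrating the weight over m recovers total mass 1, so the reweighted
# lebesgue draw denotes the same probability measure as "m <- normal x 1".
x, dm = 3.0, 0.001
total = sum(weight(-20 + i * dm, x) * dm for i in range(40_000))
print(total)  # ≈ 1.0
```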
9
Discrete+continuous importance sampling revisited
x <- flip True False
m <- (if x then flip 0 1 else normal 0 1)
...

x <- flip True False
m <- (x' <- flip True False
      if x' then flip 0 1 else normal 0 1)
weight (if x then if m==0 || m==1 then 2 else 0
             else if m==0 || m==1 then 0 else 2)
...

[Figure: the distribution of m, with point masses at 0 and 1 atop a normal density]
probability density (absolute continuity) w.r.t. marginal
10
Borel's paradox
Requires change of variable:
z <- uniform -1 1
θ <- uniform -π/2 π/2
y <- dirac (√(1-z²) · sin θ)
...

z <- uniform -1 1
y <- uniform (-√(1-z²)) (√(1-z²))
θ <- dirac (arcsin (y / √(1-z²)))
weight (2 / cos θ)
...

[Figure: sphere with coordinates x, y, z and angles θ, φ]
11
Another desideratum
Generative story
x <- normal 0 1
m <- dirac (x * 2)
Observed effect
condition (m = 32)
Hidden cause
return (x == 16)
Should give True with 100% probability.
11
Another desideratum
Generative story
x <- normal 0 1
m <- dirac (x / 2)
Observed effect
condition (m = 32)
Hidden cause
return (x == 64)
Should give True with 100% probability.
Interim summary
Conditioning means disintegration. (Chang & Pollard)
If the conditioned choice has a density with respect to an ambient measure, then we can transform the program locally into importance sampling.
- Simple ambient measures: counting, lebesgue
- Complex ambient measures: marginal, …
How to infer an ambient measure? How to calculate density with respect to it? (Pfeffer, Bhat et al.)
13
Density calculator demo
x <- uniform 0 1
x + x + uniform 0 1
⇓
∫₀¹ ⟨0 < t − x − x < 1⟩ dx
⇓
0          if t < 0
t/2        if t < 1
1/2        if t < 2
3/2 − t/2  if t < 3
0          if 3 ≤ t
[Figure: plot of this density against t from 0 to 3]
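The piecewise output can be sanity-checked by simulation; a short sketch (not part of the demo) comparing Monte Carlo mass near t = 2 against the integral of the calculated density:

```python
import random

random.seed(2)

def density(t):
    # The calculator's output for: x <- uniform 0 1; x + x + uniform 0 1
    if t < 0:
        return 0.0
    if t < 1:
        return t / 2
    if t < 2:
        return 0.5
    if t < 3:
        return 1.5 - t / 2
    return 0.0

# Mass in [1.9, 2.1] from the density: 0.5*0.1 + ((0.5+0.45)/2)*0.1 = 0.0975.
n = 200_000
hits = sum(1 for _ in range(n)
           if 1.9 < 2 * random.random() + random.random() < 2.1)
print(hits / n)  # ≈ 0.0975
```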
14
Density calculator demo
exp (log (uniform 0 1) + log (uniform 0 1))
⇓
(1/t) ∫₀¹ ⟨0 < e^(ln t − ln x) < 1⟩ e^(ln t − ln x) dx   if 0 < t
0                                                        otherwise
⇓
−ln t   if 0 < t < 1
0       otherwise
[Figure: plot of this density against t from 0 to 1]
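Since exp (log u₁ + log u₂) is just the product of two uniforms, the reported density −ln t is easy to test empirically (a sketch, not part of the demo):

```python
import math
import random

random.seed(3)

# Density of (uniform 0 1) * (uniform 0 1) at t should be -ln t on (0, 1).
t, h = 0.25, 0.01
n = 400_000
hits = sum(1 for _ in range(n)
           if t - h < random.random() * random.random() < t + h)
est = hits / (n * 2 * h)
print(est, -math.log(t))  # both ≈ 1.386
```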
15
Density calculator demo
x <- uniform 0 1
(if x < .2 then 0 else uniform 0 1)
+ (if x > .7 then 1 else uniform 0 1)
Pfeffer's approximate algorithm randomly samples from one term, then calculates the density of the other term.
16
Mochastic vs stochastic lambda calculus
Mochastic:
  x <- normal 0 1
  return (exp x + 1)
Stochastic:
  exp (normal 0 1) + 1

Mochastic:
  x <- normal 0 1
  y <- normal 0 1
  return (x + y)
Stochastic:
  normal 0 1 + normal 0 1

Mochastic:
  x <- flip True False
  if x
  then flip 0 1
  else normal 0 1
Stochastic:
  if flip True False
  then flip 0 1
  else normal 0 1

Mochastic only:
  x <- normal 0 1
  return (x + x)
Stochastic:
  none
17
Stochastic lambda calculus: density
⟦uniform 0 1⟧ σ c = ∫₀¹ c(t) dt
⟦exp ε⟧ σ c = ⟦ε⟧ σ (λx. c(eˣ))
Goal: Given ε, find D(ε) such that
⟦ε⟧ σ c = ∫ D(ε) σ t · c(t) dt   for all σ, c.
Approach: Define D(ε) by recursion on ε.
One base case:
D(uniform 0 1) σ t = ⟨0 < t < 1⟩
One recursive case:
D(exp ε) σ t = ⟨t > 0⟩ · D(ε) σ (ln t) / t
This is correct because
⟦exp ε⟧ σ c = ∫_{−∞}^{∞} ⟨t > 0⟩ · (D(ε) σ (ln t) / t) · c(t) dt
which, substituting t = eˣ, equals
⟦ε⟧ σ (λx. c(eˣ)) = ∫_{−∞}^{∞} D(ε) σ x · c(eˣ) dx.
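The base case and the exp rule already make a tiny executable density calculator; a sketch in Python (hypothetical tuple encoding of expressions; the environment σ is dropped because these cases ignore it):

```python
import math

def D(expr):
    """Density of expr w.r.t. Lebesgue measure, by recursion on syntax."""
    tag = expr[0]
    if tag == 'uniform01':
        # D(uniform 0 1) t = <0 < t < 1>
        return lambda t: 1.0 if 0 < t < 1 else 0.0
    if tag == 'exp':
        # D(exp e) t = <t > 0> * D(e)(ln t) / t
        d = D(expr[1])
        return lambda t: d(math.log(t)) / t if t > 0 else 0.0
    raise ValueError(tag)

d = D(('exp', ('uniform01',)))
print(d(2.0))  # 0.5: ln 2 lies in (0, 1), so the density is 1/2
print(d(0.5))  # 0.0: ln 0.5 < 0
```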
18
Stochastic lambda calculus: density
⟦ε₁ + ε₂⟧ σ c = ⟦ε₁⟧ σ (λx₁. ⟦ε₂⟧ σ (λx₂. c(x₁ + x₂)))
             = ⟦ε₂⟧ σ (λx₂. ⟦ε₁⟧ σ (λx₁. c(x₁ + x₂)))
Randomized algorithm for binary operators:
D(ε₁ + ε₂) σ t = ⟦ε₁⟧ σ (λx. D(ε₂) σ (t − x))
              = ⟦ε₂⟧ σ (λx. D(ε₁) σ (t − x))
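The randomized rule translates into code directly: sample one operand, evaluate the other's density at the remainder, and average over runs; a sketch reusing a hypothetical tuple encoding (not the talk's implementation):

```python
import random

random.seed(4)

def sample(expr):
    if expr[0] == 'uniform01':
        return random.random()
    if expr[0] == 'add':
        return sample(expr[1]) + sample(expr[2])
    raise ValueError(expr[0])

def D(expr):
    tag = expr[0]
    if tag == 'uniform01':
        return lambda t: 1.0 if 0 < t < 1 else 0.0
    if tag == 'add':
        # D(e1 + e2) t = E_{x ~ e1}[ D(e2)(t - x) ]: one-sample estimate.
        d2 = D(expr[2])
        return lambda t: d2(t - sample(expr[1]))
    raise ValueError(tag)

# The density of uniform + uniform at 0.5 is 0.5; each call to d(0.5)
# is an unbiased single-sample estimate, so the average converges to it.
d = D(('add', ('uniform01',), ('uniform01',)))
n = 100_000
avg = sum(d(0.5) for _ in range(n)) / n
print(avg)  # ≈ 0.5
```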
19
Stochastic lambda calculus: density
⟦42⟧ σ c = c(42)
⟦var⟧ σ c = c(σ var)
⟦var <- ε₁; ε₂⟧ σ c = ⟦ε₁⟧ σ (λx. ⟦ε₂⟧ (σ{var ↦ x}) c)
Three strategies for bound variables:
D(bool_var) σ t = ⟨σ bool_var = t⟩
D(var <- ε₁; ε₂) σ t = D(ε₂{var ↦ ε₁}) σ t   if ε₂ uses var at most once
D(var <- ε₁; ε₂) σ t = ⟦ε₁⟧ σ (λx. D(ε₂) (σ{var ↦ x}) t)
Summary
Probabilistic programming
- Denote measure by generative story
- Run backwards to infer cause from effect
Mathematical reasoning
- Define conditioning as disintegration
- Perform importance sampling
- Derive density calculator
Need semantics and inference for loops
- Bag of words
- Brownian motion
- Probabilistic context-free grammars
Come to Indiana University to create essential abstractions and practical languages for clear, robust and efficient programs.
Dan Friedman: relational & logic languages, meta-circularity & reflection
Ryan Newton: streaming, distributed & GPU DSLs, Haskell deterministic parallelism
Amr Sabry: quantum computing, type theory, information effects
Chung-chieh Shan: probabilistic programming, semantics
Jeremy Siek: gradual typing, mechanized metatheory, high performance
Sam Tobin-Hochstadt: types for untyped languages, contracts, languages for the Web
Check out our work: Boost Libraries · Build to Order BLAS · C++ Concepts · Chapel Generics · HANSEI · JavaScript Modules · Racket & Typed Racket · miniKanren · LVars · monad-par · meta-par · WaveScript
http://lambda.soic.indiana.edu/