Trustworthy, Useful Languages for Probabilistic Modeling and Inference

Neil Toronto

Dissertation Defense

Brigham Young University

2014/06/11

Master’s Research: Super-Resolution

Toronto et al. Super-Resolution via Recapture and Bayesian Effect Modeling. CVPR 2009

• Model and query: Half a page of beautiful math

• Query implementation: 600 lines of Python

Main Results: Super-Resolution

• Competitor and BEI on 4x super-resolution:

[Figure panels: Resolution Synthesis; Bayesian Edge Inference]

• Beat state-of-the-art on “objective” measures

• Was capable of other reconstruction tasks with few changes

Only Mostly Satisfying

Problem 1: Still not sure the program is right

Problem 2: Smooth edges instead of discontinuous ones

“To approximate blurring with a spatially varying point-spread function (PSF), we assign each facet a Gaussian PSF and convolve each analytically before combining outputs.”

i.e. “We can’t model it correctly so here’s a hack.”

Solution Idea: Probabilistic Language

• Also somehow let me model correctly

Prior Work

Defined by an implementation:

• Designed for Bayesian practice

• Mimic human translation

• Can’t tell error from feature

• Limited: usually no recursion or loops; conditions

Defined by a semantics (i.e. mathematically):

• Designed for functional programmers or FP theorists

• May not be implemented

• Behavior is well-defined

• Limited: usually finite distributions, no conditioning

Best of all worlds: define language using functional programming theory, make it for Bayesians, and remove limitations

Thesis Statement

Functional programming theory and measure-theoretic probability provide a solid foundation for trustworthy, useful languages for constructive probabilistic modeling and inference.

• Useful: let you think abstractly and handle details for you

• Trustworthy: defined mathematically

• Functional programming theory has the tools to define programming languages mathematically

• Measure-theoretic probability is the most complete account of probability; should allow shedding common limitations

Simple Example Process

• Example process: Normal-Normal

• Intuition: Sample x, then sample y using x

• Density model:

• Compute query by integrating:
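The slide's equations did not survive extraction, so here is a minimal sketch of the density-based approach, assuming the Normal-Normal process is x ~ Normal(0, 1) and y ~ Normal(x, 1) (the parameters used in the demo code later in the talk). The function names, the integration bounds, and the midpoint rule are illustrative choices, not taken from the slides.

```python
import math

def normal_pdf(v, mean, stddev):
    """Density of Normal(mean, stddev) at v."""
    z = (v - mean) / stddev
    return math.exp(-0.5 * z * z) / (stddev * math.sqrt(2.0 * math.pi))

def joint_density(x, y):
    # Assumed model: x ~ Normal(0, 1), then y ~ Normal(x, 1)
    return normal_pdf(x, 0.0, 1.0) * normal_pdf(y, x, 1.0)

def prob_y_in(a, b, lo=-8.0, hi=8.0, n=400):
    """Midpoint-rule estimate of P[a < y < b]: integrate the joint
    density over x in (lo, hi) and y in (a, b)."""
    dx = (hi - lo) / n
    dy = (b - a) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        for j in range(n):
            y = a + (j + 0.5) * dy
            total += joint_density(x, y)
    return total * dx * dy

if __name__ == "__main__":
    print(prob_y_in(-1.0, 1.0))
```

Under these assumptions y is marginally Normal with variance 2, so the estimate for the interval (-1, 1) should be close to erf(0.5).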

Conditional Queries

• Compute query using Bayes’ law:
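A hedged sketch of a conditional query via Bayes' law, again assuming x ~ Normal(0, 1) and y ~ Normal(x, 1). It computes the posterior mean of x given an observed y by normalizing prior times likelihood on a grid; the function names and grid settings are hypothetical.

```python
import math

def normal_pdf(v, mean, stddev):
    """Density of Normal(mean, stddev) at v."""
    z = (v - mean) / stddev
    return math.exp(-0.5 * z * z) / (stddev * math.sqrt(2.0 * math.pi))

def posterior_mean_x(y_obs, lo=-10.0, hi=10.0, n=4000):
    """Bayes' law on a grid: p(x | y) = p(x) p(y | x) / integral of p(x') p(y | x') dx'."""
    dx = (hi - lo) / n
    xs = [lo + (i + 0.5) * dx for i in range(n)]
    # Unnormalized posterior: prior density times likelihood (assumed model)
    weights = [normal_pdf(x, 0.0, 1.0) * normal_pdf(y_obs, x, 1.0) for x in xs]
    evidence = sum(weights) * dx
    return sum(x * w for x, w in zip(xs, weights)) * dx / evidence

if __name__ == "__main__":
    print(posterior_mean_x(2.0))
```

For this conjugate pair the exact posterior is Normal(y/2, sqrt(1/2)), which gives a check on the numerical result. Note that the condition y = y_obs has probability zero, which is why a density is needed here.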

So What Can’t Densities Model?

• Tons of useful things that are easy to write down

Distributions given non-axial, zero-probability conditions

Discontinuous change of variable (e.g. a thermometer)

Distributions of variable-dimension random variables

Nontrivial distributions on infinite products

• Tricks to get around limitations aren’t general enough

Measure-Theoretic Probability

• Main ideas:

Don’t assign probability-like quantities to values, assign probabilities to sets; the probability query is king

Confine assumed randomness to one place by making random variables deterministic functions that observe a random source

• Measure-theoretic model of example process:

Measure-Theoretic Queries

• Specific query:

• Generalized:

• Conditional query: if P[B] > 0 then P[A | B] = P[A ∩ B] / P[B]

Can we avoid densities when P[B] = 0?

Zero-Probability Conditions (Axial)

Zero-Probability Conditions (Circular)

Contribution: Don’t Integrate, Compute Backwards (1)

• Integration is hard!

• But the random variables are an abstraction boundary hiding the random source and its probability measure, so we can choose convenient ones

A uniform random source model (defined using the Normal CDF):

• Stretches instead of integrates
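One way to read "stretches instead of integrates" is inverse-CDF sampling. The sketch below assumes the random source is a point in the unit square and that the process is x ~ Normal(0, 1), y ~ Normal(x, 1); the names X, Y, and STD_NORMAL are illustrative, and the exact model on the slide was not recoverable.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)

def X(omega):
    """Random variable X as a deterministic function of the source.
    omega is a pair of numbers strictly between 0 and 1."""
    # The inverse Normal CDF stretches the uniform coordinate onto the reals
    return STD_NORMAL.inv_cdf(omega[0])

def Y(omega):
    """Y observes the same source: X plus independent standard Normal noise."""
    return X(omega) + STD_NORMAL.inv_cdf(omega[1])

if __name__ == "__main__":
    print(X((0.975, 0.5)), Y((0.975, 0.5)))
```

All randomness lives in the choice of omega; X and Y themselves are pure functions, which matches the "confine assumed randomness to one place" idea from the earlier slide.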

• Generalized query:

i.e. output distributions are defined by preimages

• For a uniform random source model,

Compute probabilities by computing preimage areas

Compute conditional probabilities as quotients of preimage areas

• Is this really more feasible than integrating?
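To make "probabilities are preimage areas" concrete, here is a brute-force sketch that estimates the area of a preimage in the unit square by testing grid midpoints. It assumes the same uniform-source model as above (x ~ Normal(0, 1), y ~ Normal(x, 1)); it is an illustration of the definition only, not the preimage computation developed later in the talk, and all names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)

def X(omega):
    return STD_NORMAL.inv_cdf(omega[0])

def Y(omega):
    return X(omega) + STD_NORMAL.inv_cdf(omega[1])

def preimage_area(event, n=200):
    """Approximate the area of {omega in (0,1)^2 : event(omega)}
    by counting the midpoints of an n-by-n grid that satisfy the event."""
    hits = 0
    for i in range(n):
        for j in range(n):
            omega = ((i + 0.5) / n, (j + 0.5) / n)
            if event(omega):
                hits += 1
    return hits / (n * n)

def conditional(event, condition, n=200):
    """Conditional probability as a quotient of preimage areas.
    Only meaningful when the condition's preimage has nonzero area."""
    joint = preimage_area(lambda w: event(w) and condition(w), n)
    return joint / preimage_area(condition, n)

if __name__ == "__main__":
    print(preimage_area(lambda w: X(w) < 0))
    print(conditional(lambda w: X(w) > 0, lambda w: Y(w) > 0))
```

Under the assumed model, the second query should come out near 0.75. The grid costs n squared evaluations, which hints at why a smarter way to find preimages is needed.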

Queries Using Preimages

[Figure panels: Uniform Random Source; Original Model]

Crazy Idea is Feasible If...

• Seems like we need:

Standard interpretation of programs as pure functions from a random source

Efficient way to compute preimage sets

Efficient representation of arbitrary sets

Efficient way to compute areas of preimage sets

Proof of correctness w.r.t. standard interpretation

• Completely infeasible! But...

What About Approximating?

Conservative approximation with rectangles:

Restricting preimages to rectangular subdomains:

Sampling: exponential to quadratic (e.g. days to minutes)

Crazy Idea is Actually Feasible If...

• Standard interpretation of programs as pure functions from a random source

• Efficient way to compute approximate preimage subsets (instead of exact preimage sets)

• Efficient representation of approximating sets (instead of arbitrary sets)

• Efficient way to sample uniformly in preimage sets (instead of computing volumes of preimage sets)

Efficient domain partition sampling

Efficient way to determine whether a domain sample is actually in the preimage (just use standard interpretation)

• Proof of correctness w.r.t. standard interpretation

Standard Interpretation

• Grammar:

• Semantic function:

• Math has no general recursion, so the interpretation of a program is a λ-calculus term

• Easy implementation in any language with lambdas
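As an illustration of "easy implementation in any language with lambdas", here is a sketch of a semantic function for a tiny, hypothetical term grammar (the slide's actual grammar was not recoverable). Each term is a tagged tuple, and its meaning is a function from a random source omega to a value.

```python
def interp(term):
    """Map a term to its meaning: a pure function of the random source.
    The tags below are an invented mini-grammar for illustration only."""
    tag = term[0]
    if tag == "const":
        c = term[1]
        return lambda omega: c
    if tag == "random":
        # Read coordinate i of the random source
        i = term[1]
        return lambda omega: omega[i]
    if tag == "pair":
        f, g = interp(term[1]), interp(term[2])
        return lambda omega: (f(omega), g(omega))
    if tag == "add":
        f, g = interp(term[1]), interp(term[2])
        return lambda omega: f(omega) + g(omega)
    if tag == "less":
        f, g = interp(term[1]), interp(term[2])
        return lambda omega: f(omega) < g(omega)
    if tag == "if":
        c, t, e = interp(term[1]), interp(term[2]), interp(term[3])
        return lambda omega: t(omega) if c(omega) else e(omega)
    raise ValueError(f"unknown term tag: {tag!r}")

if __name__ == "__main__":
    program = ("pair", ("random", 0), ("add", ("random", 0), ("random", 1)))
    print(interp(program)((0.25, 0.5)))
```

Each case builds a term's meaning only from the meanings of its immediate subterms, which is the compositional style the next slide describes.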

Compositional Semantics

• Compositional: every term’s meaning depends only on its immediate subterms’ meanings

• Advantage: proofs about all programs by structural induction

• Example: meaning of

• Nonexample:

• Can preimages be computed compositionally?

Pair Preimages

Nonstandard Interpretation: Computing Preimages

• Preimage computation:

Nonstandard Interpretation: Preimages Under Pairing

• Pairing types:

Theorem (correctness under pairing). If each component computation computes preimages under its component function, then the paired computation computes preimages under the paired function.

Proof sketch. Preimages distribute over cartesian products.

• Similar theorems for every kind of term
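The proof sketch relies on the fact that the preimage of a product A × B under a paired function is the intersection of the component preimages. A small check on finite sets, with hypothetical helper names and an arbitrary example domain, might look like this:

```python
def preimage(f, domain, target):
    """All points of the domain that f maps into the target set."""
    return {w for w in domain if f(w) in target}

def pair(f, g):
    """The paired function w -> (f(w), g(w))."""
    return lambda w: (f(w), g(w))

def pair_preimage(f, g, domain, target_a, target_b):
    """Preimage of target_a x target_b under pair(f, g), computed
    compositionally from the two component preimages."""
    return preimage(f, domain, target_a) & preimage(g, domain, target_b)

if __name__ == "__main__":
    domain = set(range(12))
    f = lambda w: w % 3
    g = lambda w: w % 4
    a, b = {0, 1}, {2}
    product = {(x, y) for x in a for y in b}
    direct = preimage(pair(f, g), domain, product)
    print(direct == pair_preimage(f, g, domain, a, b))
```

The compositional version never enumerates the product set, which is the property that lets preimage computation follow the structure of the program.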

Nonstandard Interpretation: Correctness

Theorem. For all programs, the nonstandard interpretation computes preimages under the standard interpretation.

Proof. By structural induction on program terms.

Wait a Minute

• Q. Don’t the interpretations of programs do uncountable things?

A. Yes. Yes, they do.

• Q. Where do I get a computer that runs them?

A. Nowhere, but we’ll approximate them soon.

• Q. Why interpret programs as uncomputable functions, then?

A. So we know exactly what to approximate.

• Q. Where did you get a λ-calculus that could operate on arbitrary, possibly infinite sets, anyway?

A. Well...

Lambda-ZFC

λ calculus

Infinite sets and operations

• Contemporary math, but with lambdas and general recursion; or functional programming, but with infinite sets

• Can express uncountably infinite operations, can’t solve its own halting problem

• Can use contemporary mathematical theorems directly

Rectangular Approximation

• A rectangle is

An interval or union of intervals

A × B for rectangles A and B

• Easy representation; easy intersection and join (union-like) operation, empty test, other operations

• Recall:

• Define:

• Derive:
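A minimal sketch of the rectangle representation, assuming closed intervals only (the slide also allows unions of intervals, which are omitted here for brevity). The class and method names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi]; empty when lo > hi."""
    lo: float
    hi: float

    def is_empty(self):
        return self.lo > self.hi

    def intersect(self, other):
        return Interval(max(self.lo, other.lo), min(self.hi, other.hi))

    def join(self, other):
        # Smallest interval containing both: union-like, may add points
        if self.is_empty():
            return other
        if other.is_empty():
            return self
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

@dataclass(frozen=True)
class Rect:
    """Product of two rectangles, each an Interval or a nested Rect."""
    fst: object
    snd: object

    def is_empty(self):
        return self.fst.is_empty() or self.snd.is_empty()

    def intersect(self, other):
        return Rect(self.fst.intersect(other.fst), self.snd.intersect(other.snd))

    def join(self, other):
        if self.is_empty():
            return other
        if other.is_empty():
            return self
        return Rect(self.fst.join(other.fst), self.snd.join(other.snd))

if __name__ == "__main__":
    a = Rect(Interval(0, 2), Interval(0, 2))
    b = Rect(Interval(1, 3), Interval(1, 3))
    print(a.intersect(b), a.join(b))
```

Intersection is exact for rectangles, while join overapproximates the union, which is what makes the approximation conservative.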

In Theory...

Theorem (sound). The approximating interpretation computes overapproximations of the preimages computed by the exact interpretation.

• Consequence: Sampling within preimages doesn’t leave anything out

Theorem (monotone). The approximate preimage computation is monotone.

• Consequence: Partitioning the domain never increases approximate preimages

Theorem (decreasing). The approximate preimage computation never returns preimages larger than the given subdomain.

• Consequence: Refining preimage partitions never explodes
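The three properties can be seen on a single primitive. The sketch below uses ordinary interval constraint propagation for addition, which is a standard technique and not necessarily the algorithm used in the talk: given a rectangular subdomain for (x, y) and a target interval for x + y, it returns a smaller rectangle that still contains every solution. Intervals are (lo, hi) tuples and all names are hypothetical.

```python
def intersect(a, b):
    """Intersection of closed intervals given as (lo, hi) tuples."""
    return (max(a[0], b[0]), min(a[1], b[1]))

def is_empty(a):
    return a[0] > a[1]

def add_preimage(xs, ys, zs):
    """Rectangle overapproximating {(x, y) in xs x ys : x + y in zs}.
    Returns None when the approximate preimage is empty."""
    # x = z - y must lie in [zs.lo - ys.hi, zs.hi - ys.lo]
    xs2 = intersect(xs, (zs[0] - ys[1], zs[1] - ys[0]))
    # y = z - x must lie in [zs.lo - xs.hi, zs.hi - xs.lo]
    ys2 = intersect(ys, (zs[0] - xs[1], zs[1] - xs[0]))
    if is_empty(xs2) or is_empty(ys2):
        return None
    return xs2, ys2

if __name__ == "__main__":
    print(add_preimage((0, 1), (0, 1), (1.5, 2)))
    print(add_preimage((0, 0.6), (0, 1), (1.5, 2)))
```

The result only ever intersects with the given subdomain (decreasing), shrinking the subdomain shrinks the result (monotone), and every true solution stays inside the returned rectangle (sound).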

In Practice...

Theorems prove this always works:

Importance Sampling

• Alternative to arbitrarily low-rate rejection sampling:

First, refine using preimage computation:

Second, randomly choose from arbitrarily fine partition:

Third, refine again:

Fourth, sample uniformly:

Do process “in the limit”; i.e. choose:

What About Recursion?

• General recursion, programs that halt with probability 1; e.g.

(define/drbayes (geometric p)
  (if (bernoulli p)
      0
      (+ 1 (geometric p))))

• Consider programs as being fully inlined (thus infinite):

(if (bernoulli p)
    0
    (+ 1 (if (bernoulli p)
             0
             (+ 1 (if (bernoulli p)
                      0
                      (+ 1 ...))))))

• Random domain needs to be big enough and the right shape

Program Domain Values

• Values are infinite binary trees:

• Every expression in a program is assigned a node

• Implemented using lazy trees of random values

• No probability density for domain, but there is a measure
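A hedged sketch of a lazy tree of random values, with a geometric sampler that reads from it. The class name, the use of Python's random module, and the rule for which node each sub-expression reads are all illustrative assumptions; the talk does not show its implementation.

```python
import random

class LazyRandomTree:
    """Conceptually infinite binary tree of uniform random values.
    Nodes and values are created on first access and then remembered."""

    def __init__(self, rng):
        self._rng = rng
        self._value = None
        self._left = None
        self._right = None

    @property
    def value(self):
        if self._value is None:
            self._value = self._rng.random()
        return self._value

    @property
    def left(self):
        if self._left is None:
            self._left = LazyRandomTree(self._rng)
        return self._left

    @property
    def right(self):
        if self._right is None:
            self._right = LazyRandomTree(self._rng)
        return self._right

def geometric(p, node):
    """Recursive geometric sampler as a pure function of the tree.
    Assumed node assignment: the bernoulli test reads this node's value
    and the recursive call uses the right subtree."""
    if node.value < p:
        return 0
    return 1 + geometric(p, node.right)

if __name__ == "__main__":
    tree = LazyRandomTree(random.Random(0))
    print(geometric(0.5, tree), geometric(0.5, tree))
```

Because values are memoized, running the program twice on the same tree gives the same answer, so the program remains a deterministic function of its random source even though it recurses.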

Demo: Normal-Normal With Circular Condition

• Normal-Normal process:

• Objective: Find the distribution of

• Implementation:

(define/drbayes e
  (let* ([x (normal 0 1)]
         [y (normal x 1)])
    (list x y (sqrt (+ (sqr x) (sqr y))))))

• Goal: Sample in the preimage of

(set-list reals reals (interval (- 1 ε) (+ 1 ε)))
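For contrast, here is the naive baseline for this demo: plain rejection sampling, which runs the program forward and keeps only samples whose third output lands in the interval [1 - ε, 1 + ε]. This is not the preimage-based sampler from the talk; it is the low-rate method the talk aims to replace, and the function names and the max_tries cap are hypothetical.

```python
import math
import random

def sample_program(rng):
    """Forward run of the demo program: x ~ Normal(0, 1), y ~ Normal(x, 1)."""
    x = rng.gauss(0.0, 1.0)
    y = rng.gauss(x, 1.0)
    return x, y, math.sqrt(x * x + y * y)

def rejection_sample(eps, n_accept, rng, max_tries=1_000_000):
    """Collect up to n_accept samples whose radius is within eps of 1.
    Returns the accepted (x, y) pairs and the number of forward runs used."""
    accepted = []
    tries = 0
    while len(accepted) < n_accept and tries < max_tries:
        tries += 1
        x, y, r = sample_program(rng)
        if 1.0 - eps <= r <= 1.0 + eps:
            accepted.append((x, y))
    return accepted, tries

if __name__ == "__main__":
    rng = random.Random(1)
    for eps in (0.1, 0.01):
        samples, tries = rejection_sample(eps, 50, rng)
        print(eps, len(samples), tries)
```

The number of forward runs per accepted sample grows roughly in proportion to 1/ε, so the acceptance rate goes to zero as the condition approaches the zero-probability circle.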

For ε = 0.01:

[Figure panels: Preimage rectangles; Preimage samples; Output samples]

• Works fine with much smaller ε

Demo: Thermometer

• Normal-Normal thermometer process:

• Implementation:

         [y (normal x 1)])
    (list x (if (> y 100) 100 y))))

• Goal: Sample in the preimage of

(set-list reals (interval 100 100))

[Figure panels: Preimage rectangles; Preimage samples; Density of]

Calculated from samples: mean 105.1, stddev 4.6

Demo: Stochastic Ray Tracing

• Idea: Model light transmission and reflection, condition on paths that pass through aperture

• Part of the implementation (totals ~50 lines):

(define/drbayes (ray-plane-intersect p0 v n d)
  (let ([denom (- (vec-dot v n))])
    ;; Only rays heading against the plane normal n can hit the plane
    (if (positive? denom)
        (let ([t (/ (+ d (vec-dot p0 n)) denom)])
          (if (positive? t)
              (collision t (vec+ p0 (vec-scale v t)) n)
              #f))
        ;; Assumed: the slide cuts off here; #f mirrors the inner no-hit case
        #f)))

• Constrained light path outputs:

[Figure panels: Paths Through Aperture; Projected and Accumulated]

Other Inference Tasks

• Typical

Hierarchical models

Bayesian regression

Model selection

• Atypical

Programs that halt with probability < 1, or never halt

Probabilistic program verification (sample in preimage of error condition)

• Was it falsifiable?

Measurability

• Only measurable sets can have probabilities

• Computing preimages under a program’s interpretation must preserve measurability; we say the interpretation itself is measurable

Theorem (measurability). For all programs, the interpretation is measurable, regardless of errors or nontermination, if language primitives are measurable.

• Primitives include uncomputable operations like limits

• Applies to all probabilistic programming languages

What I Did

The core calculus for this:

Future Work

• Expressiveness

Lambdas and macros

Exceptions, parameters (or continuations and marks)

• Optimization

Direct implementation is in depth; cut to

Incremental computation

Adaptive sampling algorithms

Static analysis

• Branching out: investigate preimage computation connection with type systems and predicate transformer semantics
