trustworthy, useful languages for probabilistic modeling and...

Post on 15-May-2020

2 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Trustworthy, Useful Languages forTrustworthy, Useful Languages forTrustworthy, Useful Languages for

Probabilistic Modeling and InferenceProbabilistic Modeling and InferenceProbabilistic Modeling and Inference

Neil Toronto

Dissertation Defense

Brigham Young University

2014/06/11

Master’s Research: Super-ResolutionMaster’s Research: Super-ResolutionMaster’s Research: Super-Resolution

Toronto et al. Super-Resolution via Recapture and Bayesian EffectModeling. CVPR 2009 1111111111111111

Master’s Research: Super-ResolutionMaster’s Research: Super-ResolutionMaster’s Research: Super-Resolution

• Model and query: Half a page of beautiful math

2222222222222222

Master’s Research: Super-ResolutionMaster’s Research: Super-ResolutionMaster’s Research: Super-Resolution

• Query implementation: 600 lines of Python

2222222222222222

Main Results: Super-ResolutionMain Results: Super-ResolutionMain Results: Super-Resolution

• Competitor and BEI on 4x super-resolution:

Resolution Synthesis

3333333333333333

Main Results: Super-ResolutionMain Results: Super-ResolutionMain Results: Super-Resolution

• Competitor and BEI on 4x super-resolution:

Resolution Synthesis Bayesian Edge Inference

3333333333333333

Main Results: Super-ResolutionMain Results: Super-ResolutionMain Results: Super-Resolution

• Competitor and BEI on 4x super-resolution:

Resolution Synthesis Bayesian Edge Inference

• Beat state-of-the-art on “objective” measures

3333333333333333

Main Results: Super-ResolutionMain Results: Super-ResolutionMain Results: Super-Resolution

• Competitor and BEI on 4x super-resolution:

Resolution Synthesis Bayesian Edge Inference

• Beat state-of-the-art on “objective” measures

• Was capable of other reconstruction tasks with few changes 3333333333333333

Only Mostly SatisfyingOnly Mostly SatisfyingOnly Mostly Satisfying

Problem 1: Still not sure the program is right

4444444444444444

Only Mostly SatisfyingOnly Mostly SatisfyingOnly Mostly Satisfying

Problem 1: Still not sure the program is right

Problem 2: smooth edges instead of discontinuous

4444444444444444

Only Mostly SatisfyingOnly Mostly SatisfyingOnly Mostly Satisfying

Problem 1: Still not sure the program is right

Problem 2: smooth edges instead of discontinuous

“To approximate blurring with a spatially varying point-spreadfunction (PSF), we assign each facet a Gaussian PSF andconvolve each analytically before combining outputs.”

4444444444444444

Only Mostly SatisfyingOnly Mostly SatisfyingOnly Mostly Satisfying

Problem 1: Still not sure the program is right

Problem 2: smooth edges instead of discontinuous

“To approximate blurring with a spatially varying point-spreadfunction (PSF), we assign each facet a Gaussian PSF andconvolve each analytically before combining outputs.”

i.e. “We can’t model it correctly so here’s a hack.” 4444444444444444

Solution Idea: Probabilistic LanguageSolution Idea: Probabilistic LanguageSolution Idea: Probabilistic Language

5555555555555555

Solution Idea: Probabilistic LanguageSolution Idea: Probabilistic LanguageSolution Idea: Probabilistic Language

• Also somehow let me model correctly 5555555555555555

Prior WorkPrior WorkPrior Work

Defined by animplementation

Defined by a semantics (i.e. mathematically)

6666666666666666

Prior WorkPrior WorkPrior Work

Defined by animplementation

Defined by a semantics (i.e. mathematically)

Designed for Bayesian practice

6666666666666666

Prior WorkPrior WorkPrior Work

Defined by animplementation

Defined by a semantics (i.e. mathematically)

Designed for Bayesian practice

Mimic human translation

6666666666666666

Prior WorkPrior WorkPrior Work

Defined by animplementation

Defined by a semantics (i.e. mathematically)

Designed for Bayesian practice

Mimic human translation

Can’t tell error from feature

6666666666666666

Prior WorkPrior WorkPrior Work

Defined by animplementation

Defined by a semantics (i.e. mathematically)

Designed for Bayesian practice

Mimic human translation

Can’t tell error from feature

Limited: usually no recursion orloops; conditions

6666666666666666

Prior WorkPrior WorkPrior Work

Defined by animplementation

Defined by a semantics (i.e. mathematically)

Designed for Bayesian practice Designed for functionalprogrammers or FP theorists

Mimic human translation

Can’t tell error from feature

Limited: usually no recursion orloops; conditions

6666666666666666

Prior WorkPrior WorkPrior Work

Defined by animplementation

Defined by a semantics (i.e. mathematically)

Designed for Bayesian practice Designed for functionalprogrammers or FP theorists

Mimic human translation May not be implemented

Can’t tell error from feature

Limited: usually no recursion orloops; conditions

6666666666666666

Prior WorkPrior WorkPrior Work

Defined by animplementation

Defined by a semantics (i.e. mathematically)

Designed for Bayesian practice Designed for functionalprogrammers or FP theorists

Mimic human translation May not be implemented

Can’t tell error from feature Behavior is well-defined

Limited: usually no recursion orloops; conditions

6666666666666666

Prior WorkPrior WorkPrior Work

Defined by animplementation

Defined by a semantics (i.e. mathematically)

Designed for Bayesian practice Designed for functionalprogrammers or FP theorists

Mimic human translation May not be implemented

Can’t tell error from feature Behavior is well-defined

Limited: usually no recursion orloops; conditions

Limited: usually finitedistributions, no conditioning

6666666666666666

Prior WorkPrior WorkPrior Work

Defined by animplementation

Defined by a semantics (i.e. mathematically)

Designed for Bayesian practice Designed for functionalprogrammers or FP theorists

Mimic human translation May not be implemented

Can’t tell error from feature Behavior is well-defined

Limited: usually no recursion orloops; conditions

Limited: usually finitedistributions, no conditioning

Best of all worlds: define language using functional programmingtheory, make it for Bayesians, and remove limitations 6666666666666666

Thesis StatementThesis StatementThesis Statement

Functional programming theory and measure-theoreticprobability provide a solid foundation

for trustworthy, useful languages for constructiveprobabilistic modeling and inference.

7777777777777777

Thesis StatementThesis StatementThesis Statement

Functional programming theory and measure-theoreticprobability provide a solid foundation

for trustworthy, useful languages for constructiveprobabilistic modeling and inference.

• Useful: let you think abstractly and handle details for you

7777777777777777

Thesis StatementThesis StatementThesis Statement

Functional programming theory and measure-theoreticprobability provide a solid foundation

for trustworthy, useful languages for constructiveprobabilistic modeling and inference.

• Useful: let you think abstractly and handle details for you

• Trustworthy: defined mathematically

7777777777777777

Thesis StatementThesis StatementThesis Statement

Functional programming theory and measure-theoreticprobability provide a solid foundation

for trustworthy, useful languages for constructiveprobabilistic modeling and inference.

• Useful: let you think abstractly and handle details for you

• Trustworthy: defined mathematically

• Functional programming theory has the tools to defineprogramming languages mathematically

7777777777777777

Thesis StatementThesis StatementThesis Statement

Functional programming theory and measure-theoreticprobability provide a solid foundation

for trustworthy, useful languages for constructiveprobabilistic modeling and inference.

• Useful: let you think abstractly and handle details for you

• Trustworthy: defined mathematically

• Functional programming theory has the tools to defineprogramming languages mathematically

• Measure-theoretic probability is the most complete account ofprobability; should allow shedding common limitations 7777777777777777

Simple Example ProcessSimple Example ProcessSimple Example Process

• Example process: Normal-Normal

8888888888888888

Simple Example ProcessSimple Example ProcessSimple Example Process

• Example process: Normal-Normal

• Intuition: Sample , then sample using

8888888888888888

Simple Example ProcessSimple Example ProcessSimple Example Process

• Example process: Normal-Normal

• Intuition: Sample , then sample using

• Density model :

8888888888888888

Simple Example ProcessSimple Example ProcessSimple Example Process

• Example process: Normal-Normal

• Intuition: Sample , then sample using

• Compute query by integrating:

8888888888888888

Conditional QueriesConditional QueriesConditional Queries

• Compute query using Bayes’ law:

9999999999999999

Conditional QueriesConditional QueriesConditional Queries

• Compute query using Bayes’ law:

9999999999999999

Conditional QueriesConditional QueriesConditional Queries

• Compute query using Bayes’ law:

9999999999999999

Conditional QueriesConditional QueriesConditional Queries

• Compute query using Bayes’ law:

9999999999999999

Conditional QueriesConditional QueriesConditional Queries

• Compute query using Bayes’ law:

9999999999999999

Conditional QueriesConditional QueriesConditional Queries

• Compute query using Bayes’ law:

9999999999999999

So What Can’t Densities Model?So What Can’t Densities Model?So What Can’t Densities Model?

10101010101010101010101010101010

So What Can’t Densities Model?So What Can’t Densities Model?So What Can’t Densities Model?

• Tons of useful things that are easy to write down

10101010101010101010101010101010

So What Can’t Densities Model?So What Can’t Densities Model?So What Can’t Densities Model?

• Tons of useful things that are easy to write down

Distributions given non-axial, zero-probability conditions

10101010101010101010101010101010

So What Can’t Densities Model?So What Can’t Densities Model?So What Can’t Densities Model?

• Tons of useful things that are easy to write down

Distributions given non-axial, zero-probability conditions

Discontinuous change of variable (e.g. a thermometer)

10101010101010101010101010101010

So What Can’t Densities Model?So What Can’t Densities Model?So What Can’t Densities Model?

• Tons of useful things that are easy to write down

Distributions given non-axial, zero-probability conditions

Discontinuous change of variable (e.g. a thermometer)

Distributions of variable-dimension random variables

10101010101010101010101010101010

So What Can’t Densities Model?So What Can’t Densities Model?So What Can’t Densities Model?

• Tons of useful things that are easy to write down

Distributions given non-axial, zero-probability conditions

Discontinuous change of variable (e.g. a thermometer)

Distributions of variable-dimension random variables

Nontrivial distributions on infinite products

10101010101010101010101010101010

So What Can’t Densities Model?So What Can’t Densities Model?So What Can’t Densities Model?

• Tons of useful things that are easy to write down

Distributions given non-axial, zero-probability conditions

Discontinuous change of variable (e.g. a thermometer)

Distributions of variable-dimension random variables

Nontrivial distributions on infinite products

• Tricks to get around limitations aren’t general enough 10101010101010101010101010101010

Measure-Theoretic ProbabilityMeasure-Theoretic ProbabilityMeasure-Theoretic Probability

• Main ideas:

Don’t assign probability-like quantities to values, assignprobabilities to sets — the probability query is king

11111111111111111111111111111111

Measure-Theoretic ProbabilityMeasure-Theoretic ProbabilityMeasure-Theoretic Probability

• Main ideas:

Don’t assign probability-like quantities to values, assignprobabilities to sets — the probability query is king

Confine assumed randomness to one place by makingrandom variables deterministic functions that observe arandom source

11111111111111111111111111111111

Measure-Theoretic ProbabilityMeasure-Theoretic ProbabilityMeasure-Theoretic Probability

• Main ideas:

Don’t assign probability-like quantities to values, assignprobabilities to sets — the probability query is king

Confine assumed randomness to one place by makingrandom variables deterministic functions that observe arandom source

• Measure-theoretic model of example process:

11111111111111111111111111111111

Measure-Theoretic ProbabilityMeasure-Theoretic ProbabilityMeasure-Theoretic Probability

• Main ideas:

Don’t assign probability-like quantities to values, assignprobabilities to sets — the probability query is king

Confine assumed randomness to one place by makingrandom variables deterministic functions that observe arandom source

• Measure-theoretic model of example process:

11111111111111111111111111111111

Measure-Theoretic QueriesMeasure-Theoretic QueriesMeasure-Theoretic Queries

• Specific query:

12121212121212121212121212121212

Measure-Theoretic QueriesMeasure-Theoretic QueriesMeasure-Theoretic Queries

• Specific query:

• Generalized:

12121212121212121212121212121212

Measure-Theoretic QueriesMeasure-Theoretic QueriesMeasure-Theoretic Queries

• Specific query:

• Generalized:

• Conditional query: if then

12121212121212121212121212121212

Measure-Theoretic QueriesMeasure-Theoretic QueriesMeasure-Theoretic Queries

• Specific query:

• Generalized:

• Conditional query: if then

Can we avoid densities when ?

12121212121212121212121212121212

Zero-Probability Conditions (Axial)Zero-Probability Conditions (Axial)Zero-Probability Conditions (Axial)

13131313131313131313131313131313

Zero-Probability Conditions (Axial)Zero-Probability Conditions (Axial)Zero-Probability Conditions (Axial)

13131313131313131313131313131313

Zero-Probability Conditions (Axial)Zero-Probability Conditions (Axial)Zero-Probability Conditions (Axial)

13131313131313131313131313131313

Zero-Probability Conditions (Axial)Zero-Probability Conditions (Axial)Zero-Probability Conditions (Axial)

13131313131313131313131313131313

Zero-Probability Conditions (Axial)Zero-Probability Conditions (Axial)Zero-Probability Conditions (Axial)

13131313131313131313131313131313

Zero-Probability Conditions (Axial)Zero-Probability Conditions (Axial)Zero-Probability Conditions (Axial)

13131313131313131313131313131313

Zero-Probability Conditions (Axial)Zero-Probability Conditions (Axial)Zero-Probability Conditions (Axial)

13131313131313131313131313131313

Zero-Probability Conditions (Circular)Zero-Probability Conditions (Circular)Zero-Probability Conditions (Circular)

14141414141414141414141414141414

Zero-Probability Conditions (Circular)Zero-Probability Conditions (Circular)Zero-Probability Conditions (Circular)

14141414141414141414141414141414

Zero-Probability Conditions (Circular)Zero-Probability Conditions (Circular)Zero-Probability Conditions (Circular)

14141414141414141414141414141414

Zero-Probability Conditions (Circular)Zero-Probability Conditions (Circular)Zero-Probability Conditions (Circular)

14141414141414141414141414141414

Zero-Probability Conditions (Circular)Zero-Probability Conditions (Circular)Zero-Probability Conditions (Circular)

14141414141414141414141414141414

Contribution: Don’t Integrate, Compute Backwards (1)Contribution: Don’t Integrate, Compute Backwards (1)Contribution: Don’t Integrate, Compute Backwards (1)

• Integration is hard!

15151515151515151515151515151515

Contribution: Don’t Integrate, Compute Backwards (1)Contribution: Don’t Integrate, Compute Backwards (1)Contribution: Don’t Integrate, Compute Backwards (1)

• Integration is hard!

• But random variables and are an abstraction boundaryhiding and , so we can choose convenient ones

15151515151515151515151515151515

Contribution: Don’t Integrate, Compute Backwards (1)Contribution: Don’t Integrate, Compute Backwards (1)Contribution: Don’t Integrate, Compute Backwards (1)

• Integration is hard!

• But random variables and are an abstraction boundaryhiding and , so we can choose convenient ones

A uniform random source model:

15151515151515151515151515151515

Contribution: Don’t Integrate, Compute Backwards (1)Contribution: Don’t Integrate, Compute Backwards (1)Contribution: Don’t Integrate, Compute Backwards (1)

• Integration is hard!

• But random variables and are an abstraction boundaryhiding and , so we can choose convenient ones

A uniform random source model:

where is the Normal CDF

15151515151515151515151515151515

Contribution: Don’t Integrate, Compute Backwards (1)Contribution: Don’t Integrate, Compute Backwards (1)Contribution: Don’t Integrate, Compute Backwards (1)

• Integration is hard!

• But random variables and are an abstraction boundaryhiding and , so we can choose convenient ones

A uniform random source model:

where is the Normal CDF

• Stretches instead of integrates 15151515151515151515151515151515

Contribution: Don’t Integrate, Compute Backwards (2)Contribution: Don’t Integrate, Compute Backwards (2)Contribution: Don’t Integrate, Compute Backwards (2)

• Generalized query:

16161616161616161616161616161616

Contribution: Don’t Integrate, Compute Backwards (2)Contribution: Don’t Integrate, Compute Backwards (2)Contribution: Don’t Integrate, Compute Backwards (2)

• Generalized query:

i.e. output distributions are defined by preimages

16161616161616161616161616161616

Contribution: Don’t Integrate, Compute Backwards (2)Contribution: Don’t Integrate, Compute Backwards (2)Contribution: Don’t Integrate, Compute Backwards (2)

• Generalized query:

i.e. output distributions are defined by preimages

• For a uniform random source model,

Compute probabilities by computing preimage areas

16161616161616161616161616161616

Contribution: Don’t Integrate, Compute Backwards (2)Contribution: Don’t Integrate, Compute Backwards (2)Contribution: Don’t Integrate, Compute Backwards (2)

• Generalized query:

i.e. output distributions are defined by preimages

• For a uniform random source model,

Compute probabilities by computing preimage areas

Compute conditional probabilities as quotients of preimageareas

16161616161616161616161616161616

Contribution: Don’t Integrate, Compute Backwards (2)Contribution: Don’t Integrate, Compute Backwards (2)Contribution: Don’t Integrate, Compute Backwards (2)

• Generalized query:

i.e. output distributions are defined by preimages

• For a uniform random source model,

Compute probabilities by computing preimage areas

Compute conditional probabilities as quotients of preimageareas

• Is this really more feasible than integrating?

16161616161616161616161616161616

Queries Using PreimagesQueries Using PreimagesQueries Using Preimages

17171717171717171717171717171717

Queries Using PreimagesQueries Using PreimagesQueries Using Preimages

Uniform Random Source Original Model

17171717171717171717171717171717

Queries Using PreimagesQueries Using PreimagesQueries Using Preimages

Uniform Random Source Original Model

17171717171717171717171717171717

Queries Using PreimagesQueries Using PreimagesQueries Using Preimages

Uniform Random Source Original Model

17171717171717171717171717171717

Queries Using PreimagesQueries Using PreimagesQueries Using Preimages

Uniform Random Source Original Model

17171717171717171717171717171717

Crazy Idea is Feasible If...Crazy Idea is Feasible If...Crazy Idea is Feasible If...

• Seems like we need:

Standard interpretation of programs as pure functions from arandom source

18181818181818181818181818181818

Crazy Idea is Feasible If...Crazy Idea is Feasible If...Crazy Idea is Feasible If...

• Seems like we need:

Standard interpretation of programs as pure functions from arandom source

Efficient way to compute preimage sets

18181818181818181818181818181818

Crazy Idea is Feasible If...Crazy Idea is Feasible If...Crazy Idea is Feasible If...

• Seems like we need:

Standard interpretation of programs as pure functions from arandom source

Efficient way to compute preimage sets

Efficient representation of arbitrary sets

18181818181818181818181818181818

Crazy Idea is Feasible If...Crazy Idea is Feasible If...Crazy Idea is Feasible If...

• Seems like we need:

Standard interpretation of programs as pure functions from arandom source

Efficient way to compute preimage sets

Efficient representation of arbitrary sets

Efficient way to compute areas of preimage sets

18181818181818181818181818181818

Crazy Idea is Feasible If...Crazy Idea is Feasible If...Crazy Idea is Feasible If...

• Seems like we need:

Standard interpretation of programs as pure functions from arandom source

Efficient way to compute preimage sets

Efficient representation of arbitrary sets

Efficient way to compute areas of preimage sets

Proof of correctness w.r.t. standard interpretation

18181818181818181818181818181818

Crazy Idea is Feasible If...Crazy Idea is Feasible If...Crazy Idea is Feasible If...

• Seems like we need:

Standard interpretation of programs as pure functions from arandom source

Efficient way to compute preimage sets

Efficient representation of arbitrary sets

Efficient way to compute areas of preimage sets

Proof of correctness w.r.t. standard interpretation

• Completely infeasible! But...

18181818181818181818181818181818

What About Approximating?What About Approximating?What About Approximating?

Conservative approximation with rectangles:

19191919191919191919191919191919

What About Approximating?What About Approximating?What About Approximating?

Conservative approximation with rectangles:

19191919191919191919191919191919

What About Approximating?What About Approximating?What About Approximating?

Restricting preimages to rectangular subdomains:

20202020202020202020202020202020

What About Approximating?What About Approximating?What About Approximating?

Restricting preimages to rectangular subdomains:

20202020202020202020202020202020

What About Approximating?What About Approximating?What About Approximating?

Restricting preimages to rectangular subdomains:

20202020202020202020202020202020

What About Approximating?What About Approximating?What About Approximating?

Restricting preimages to rectangular subdomains:

20202020202020202020202020202020

What About Approximating?What About Approximating?What About Approximating?

Restricting preimages to rectangular subdomains:

20202020202020202020202020202020

What About Approximating?What About Approximating?What About Approximating?

Restricting preimages to rectangular subdomains:

20202020202020202020202020202020

What About Approximating?What About Approximating?What About Approximating?

Restricting preimages to rectangular subdomains:

20202020202020202020202020202020

What About Approximating?What About Approximating?What About Approximating?

Restricting preimages to rectangular subdomains:

20202020202020202020202020202020

What About Approximating?What About Approximating?What About Approximating?

Restricting preimages to rectangular subdomains:

20202020202020202020202020202020

What About Approximating?What About Approximating?What About Approximating?

Restricting preimages to rectangular subdomains:

20202020202020202020202020202020

What About Approximating?What About Approximating?What About Approximating?

Restricting preimages to rectangular subdomains:

20202020202020202020202020202020

What About Approximating?What About Approximating?What About Approximating?

Restricting preimages to rectangular subdomains:

20202020202020202020202020202020

What About Approximating?What About Approximating?What About Approximating?

Sampling: exponential to quadratic (e.g. days to minutes)

21212121212121212121212121212121

What About Approximating?What About Approximating?What About Approximating?

Sampling: exponential to quadratic (e.g. days to minutes)

21212121212121212121212121212121

Crazy Idea is Actually Feasible If...Crazy Idea is Actually Feasible If...Crazy Idea is Actually Feasible If...

• Standard interpretation of programs as pure functions from arandom source

• Efficient way to compute preimage sets

• Efficient representation of arbitrary sets

• Efficient way to compute volumes of preimage sets

• Proof of correctness w.r.t. standard interpretation

22222222222222222222222222222222

Crazy Idea is Actually Feasible If...Crazy Idea is Actually Feasible If...Crazy Idea is Actually Feasible If...

• Standard interpretation of programs as pure functions from arandom source

• Efficient way to compute approximate preimage subsets

• Efficient representation of arbitrary sets

• Efficient way to compute volumes of preimage sets

• Proof of correctness w.r.t. standard interpretation

22222222222222222222222222222222

Crazy Idea is Actually Feasible If...Crazy Idea is Actually Feasible If...Crazy Idea is Actually Feasible If...

• Standard interpretation of programs as pure functions from arandom source

• Efficient way to compute approximate preimage subsets

• Efficient representation of approximating sets

• Efficient way to compute volumes of preimage sets

• Proof of correctness w.r.t. standard interpretation

22222222222222222222222222222222

Crazy Idea is Actually Feasible If...Crazy Idea is Actually Feasible If...Crazy Idea is Actually Feasible If...

• Standard interpretation of programs as pure functions from arandom source

• Efficient way to compute approximate preimage subsets

• Efficient representation of approximating sets

• Efficient way to sample uniformly in preimage sets

• Proof of correctness w.r.t. standard interpretation

22222222222222222222222222222222

Crazy Idea is Actually Feasible If...Crazy Idea is Actually Feasible If...Crazy Idea is Actually Feasible If...

• Standard interpretation of programs as pure functions from arandom source

• Efficient way to compute approximate preimage subsets

• Efficient representation of approximating sets

• Efficient way to sample uniformly in preimage sets

Efficient domain partition sampling

• Proof of correctness w.r.t. standard interpretation

22222222222222222222222222222222

Crazy Idea is Actually Feasible If...Crazy Idea is Actually Feasible If...Crazy Idea is Actually Feasible If...

• Standard interpretation of programs as pure functions from arandom source

• Efficient way to compute approximate preimage subsets

• Efficient representation of approximating sets

• Efficient way to sample uniformly in preimage sets

Efficient domain partition sampling

Efficient way to determine whether a domain sample isactually in the preimage (just use standard interpretation)

• Proof of correctness w.r.t. standard interpretation

22222222222222222222222222222222

Standard InterpretationStandard InterpretationStandard Interpretation

• Grammar:

23232323232323232323232323232323

Standard InterpretationStandard InterpretationStandard Interpretation

• Grammar:

• Semantic function

23232323232323232323232323232323

Standard InterpretationStandard InterpretationStandard Interpretation

• Grammar:

• Semantic function

• Math has no general recursion, so (i.e. interpretation ofprogram ) is a λ-calculus term

23232323232323232323232323232323

Standard InterpretationStandard InterpretationStandard Interpretation

• Grammar:

• Semantic function

• Math has no general recursion, so (i.e. interpretation ofprogram ) is a λ-calculus term

• Easy implementation in any language with lambdas

23232323232323232323232323232323

Compositional SemanticsCompositional SemanticsCompositional Semantics

• Compositional: every term’s meaning depends only on itsimmediate subterms’ meanings

24242424242424242424242424242424

Compositional SemanticsCompositional SemanticsCompositional Semantics

• Compositional: every term’s meaning depends only on itsimmediate subterms’ meanings

• Advantage: proofs about all programs by structural induction

24242424242424242424242424242424

Compositional SemanticsCompositional SemanticsCompositional Semantics

• Compositional: every term’s meaning depends only on itsimmediate subterms’ meanings

• Advantage: proofs about all programs by structural induction

• Example: meaning of

24242424242424242424242424242424

Compositional SemanticsCompositional SemanticsCompositional Semantics

• Compositional: every term’s meaning depends only on itsimmediate subterms’ meanings

• Advantage: proofs about all programs by structural induction

• Example: meaning of

24242424242424242424242424242424

Compositional SemanticsCompositional SemanticsCompositional Semantics

• Compositional: every term’s meaning depends only on itsimmediate subterms’ meanings

• Advantage: proofs about all programs by structural induction

• Example: meaning of

• Nonexample:

24242424242424242424242424242424

Compositional SemanticsCompositional SemanticsCompositional Semantics

• Compositional: every term’s meaning depends only on itsimmediate subterms’ meanings

• Advantage: proofs about all programs by structural induction

• Example: meaning of

• Nonexample:

24242424242424242424242424242424

Compositional SemanticsCompositional SemanticsCompositional Semantics

• Compositional: every term’s meaning depends only on itsimmediate subterms’ meanings

• Advantage: proofs about all programs by structural induction

• Example: meaning of

• Nonexample:

• Can preimages be computed compositionally? 24242424242424242424242424242424

Pair PreimagesPair PreimagesPair Preimages

25252525252525252525252525252525

Pair PreimagesPair PreimagesPair Preimages

25252525252525252525252525252525

Pair PreimagesPair PreimagesPair Preimages

:

25252525252525252525252525252525

Pair PreimagesPair PreimagesPair Preimages

and :

25252525252525252525252525252525

Pair PreimagesPair PreimagesPair Preimages

:

25252525252525252525252525252525

Nonstandard Interpretation: Computing PreimagesNonstandard Interpretation: Computing PreimagesNonstandard Interpretation: Computing Preimages

• Preimage computation:

26262626262626262626262626262626

Nonstandard Interpretation: Computing PreimagesNonstandard Interpretation: Computing PreimagesNonstandard Interpretation: Computing Preimages

• Preimage computation:

26262626262626262626262626262626

Nonstandard Interpretation: Preimages Under PairingNonstandard Interpretation: Preimages Under PairingNonstandard Interpretation: Preimages Under Pairing

• Pairing types:

27272727272727272727272727272727

Nonstandard Interpretation: Preimages Under PairingNonstandard Interpretation: Preimages Under PairingNonstandard Interpretation: Preimages Under Pairing

• Pairing types:

Theorem (correctness under pairing). If

computes preimages under

27272727272727272727272727272727

Nonstandard Interpretation: Preimages Under PairingNonstandard Interpretation: Preimages Under PairingNonstandard Interpretation: Preimages Under Pairing

• Pairing types:

Theorem (correctness under pairing). If

computes preimages under

computes preimages under

27272727272727272727272727272727

Nonstandard Interpretation: Preimages Under PairingNonstandard Interpretation: Preimages Under PairingNonstandard Interpretation: Preimages Under Pairing

• Pairing types:

Theorem (correctness under pairing). If

computes preimages under

computes preimages under

then computes preimages under .

27272727272727272727272727272727

Nonstandard Interpretation: Preimages Under PairingNonstandard Interpretation: Preimages Under PairingNonstandard Interpretation: Preimages Under Pairing

• Pairing types:

Theorem (correctness under pairing). If

computes preimages under

computes preimages under

then computes preimages under .

Proof sketch. Preimages distribute over cartesian products.

27272727272727272727272727272727

Nonstandard Interpretation: Preimages Under PairingNonstandard Interpretation: Preimages Under PairingNonstandard Interpretation: Preimages Under Pairing

• Pairing types:

Theorem (correctness under pairing). If

computes preimages under

computes preimages under

then computes preimages under .

Proof sketch. Preimages distribute over cartesian products.

• Similar theorems for every kind of term 27272727272727272727272727272727

Nonstandard Interpretation: CorrectnessNonstandard Interpretation: CorrectnessNonstandard Interpretation: Correctness

Theorem. For all programs , computes preimages under.

Proof. By structural induction on program terms.

28282828282828282828282828282828

Wait a MinuteWait a MinuteWait a Minute

• Q. Don’t the interpretations of do uncountable things?

29292929292929292929292929292929

Wait a MinuteWait a MinuteWait a Minute

• Q. Don’t the interpretations of do uncountable things?

A. Yes. Yes, they do.

29292929292929292929292929292929

Wait a MinuteWait a MinuteWait a Minute

• Q. Don’t the interpretations of do uncountable things?

A. Yes. Yes, they do.

• Q. Where do I get a computer that runs them?

29292929292929292929292929292929

Wait a MinuteWait a MinuteWait a Minute

• Q. Don’t the interpretations of do uncountable things?

A. Yes. Yes, they do.

• Q. Where do I get a computer that runs them?

A. Nowhere, but we’ll approximate them soon.

29292929292929292929292929292929

Wait a MinuteWait a MinuteWait a Minute

• Q. Don’t the interpretations of do uncountable things?

A. Yes. Yes, they do.

• Q. Where do I get a computer that runs them?

A. Nowhere, but we’ll approximate them soon.

• Q. Why interpret programs as uncomputable functions, then?

29292929292929292929292929292929

Wait a MinuteWait a MinuteWait a Minute

• Q. Don’t the interpretations of do uncountable things?

A. Yes. Yes, they do.

• Q. Where do I get a computer that runs them?

A. Nowhere, but we’ll approximate them soon.

• Q. Why interpret programs as uncomputable functions, then?

A. So we know exactly what to approximate.

29292929292929292929292929292929

Wait a MinuteWait a MinuteWait a Minute

• Q. Don’t the interpretations of do uncountable things?

A. Yes. Yes, they do.

• Q. Where do I get a computer that runs them?

A. Nowhere, but we’ll approximate them soon.

• Q. Why interpret programs as uncomputable functions, then?

A. So we know exactly what to approximate.

• Q. Where did you get a λ-calculus that could operate on arbitrary,possibly infinite sets, anyway?

A. Well...

29292929292929292929292929292929

Lambda-ZFCLambda-ZFCLambda-ZFC

λ calculus

30303030303030303030303030303030

Lambda-ZFCLambda-ZFCLambda-ZFC

λ calculus

+

Infinite sets and operations

30303030303030303030303030303030

Lambda-ZFCLambda-ZFCLambda-ZFC

λ calculus

+

Infinite sets and operations

=

λZFC

30303030303030303030303030303030

Lambda-ZFCLambda-ZFCLambda-ZFC

λ calculus

+

Infinite sets and operations

=

λZFC

• Contemporary math, but with lambdas and general recursion; orfunctional programming, but with infinite sets

30303030303030303030303030303030

Lambda-ZFCLambda-ZFCLambda-ZFC

λ calculus

+

Infinite sets and operations

=

λZFC

• Contemporary math, but with lambdas and general recursion; orfunctional programming, but with infinite sets

• Can express uncountably infinite operations, can’t solve its ownhalting problem

30303030303030303030303030303030

Lambda-ZFCLambda-ZFCLambda-ZFC

λ calculus

+

Infinite sets and operations

=

λZFC

• Contemporary math, but with lambdas and general recursion; orfunctional programming, but with infinite sets

• Can express uncountably infinite operations, can’t solve its ownhalting problem

• Can use contemporary mathematical theorems directly

30303030303030303030303030303030

Rectangular ApproximationRectangular ApproximationRectangular Approximation

• A rectangle is

An interval or union of intervals

for rectangles and

31313131313131313131313131313131

Rectangular ApproximationRectangular ApproximationRectangular Approximation

• A rectangle is

An interval or union of intervals

for rectangles and

• Easy representation; easy intersection and join (union-like)operation, empty test, other operations

31313131313131313131313131313131

Rectangular ApproximationRectangular ApproximationRectangular Approximation

• A rectangle is

An interval or union of intervals

for rectangles and

• Easy representation; easy intersection and join (union-like)operation, empty test, other operations

• Recall:

31313131313131313131313131313131

Rectangular ApproximationRectangular ApproximationRectangular Approximation

• A rectangle is

An interval or union of intervals

for rectangles and

• Easy representation; easy intersection and join (union-like)operation, empty test, other operations

• Recall:

• Define:

31313131313131313131313131313131

Rectangular ApproximationRectangular ApproximationRectangular Approximation

• A rectangle is

An interval or union of intervals

for rectangles and

• Easy representation; easy intersection and join (union-like)operation, empty test, other operations

• Recall:

• Define:

• Derive 31313131313131313131313131313131

In Theory...In Theory...In Theory...

Theorem (sound). computes overapproximations of thepreimages computed by .

• Consequence: Sampling within preimages doesn’t leave anythingout

32323232323232323232323232323232

In Theory...In Theory...In Theory...

Theorem (sound). computes overapproximations of thepreimages computed by .

• Consequence: Sampling within preimages doesn’t leave anythingout

Theorem (monotone). is monotone.

• Consequence: Partitioning the domain never increasesapproximate preimages

32323232323232323232323232323232

In Theory...In Theory...In Theory...

Theorem (sound). computes overapproximations of thepreimages computed by .

• Consequence: Sampling within preimages doesn’t leave anythingout

Theorem (monotone). is monotone.

• Consequence: Partitioning the domain never increasesapproximate preimages

Theorem (decreasing). never returns preimages larger thanthe given subdomain.

• Consequence: Refining preimage partitions never explodes32323232323232323232323232323232

In Practice...In Practice...In Practice...

Theorems prove this always works:

33333333333333333333333333333333

In Practice...In Practice...In Practice...

Theorems prove this always works:

33333333333333333333333333333333

In Practice...In Practice...In Practice...

Theorems prove this always works:

33333333333333333333333333333333

In Practice...In Practice...In Practice...

Theorems prove this always works:

33333333333333333333333333333333

In Practice...In Practice...In Practice...

Theorems prove this always works:

33333333333333333333333333333333

In Practice...In Practice...In Practice...

Theorems prove this always works:

33333333333333333333333333333333

In Practice...In Practice...In Practice...

Theorems prove this always works:

33333333333333333333333333333333

In Practice...In Practice...In Practice...

Theorems prove this always works:

33333333333333333333333333333333

In Practice...In Practice...In Practice...

Theorems prove this always works:

33333333333333333333333333333333

Importance SamplingImportance SamplingImportance Sampling

• Alternative to arbitrarily low-rate rejection sampling:

34343434343434343434343434343434

Importance SamplingImportance SamplingImportance Sampling

• Alternative to arbitrarily low-rate rejection sampling:

First, refine using preimage computation:

34343434343434343434343434343434

Importance SamplingImportance SamplingImportance Sampling

• Alternative to arbitrarily low-rate rejection sampling:

Second, randomly choose from arbitrarily fine partition:

34343434343434343434343434343434

Importance SamplingImportance SamplingImportance Sampling

• Alternative to arbitrarily low-rate rejection sampling:

Third, refine again:

34343434343434343434343434343434

Importance SamplingImportance SamplingImportance Sampling

• Alternative to arbitrarily low-rate rejection sampling:

Fourth, sample uniformly:

34343434343434343434343434343434

Importance SamplingImportance SamplingImportance Sampling

• Alternative to arbitrarily low-rate rejection sampling:

Do process “in the limit”; i.e. choose :

34343434343434343434343434343434

What About Recursion?What About Recursion?What About Recursion?

• General recursion, programs that halt with probability 1; e.g.

(define/drbayes (geometric p) (if (bernoulli p)

0(+ 1 (geometric p))))

35353535353535353535353535353535

What About Recursion?What About Recursion?What About Recursion?

• General recursion, programs that halt with probability 1; e.g.

(define/drbayes (geometric p) (if (bernoulli p)

0(+ 1 (geometric p))))

• Consider programs as being fully inlined (thus infinite):

(if (bernoulli p)0(+ 1 (if (bernoulli p)

0(+ 1 (if (bernoulli p)

0(+ 1 ...))))))

35353535353535353535353535353535

What About Recursion?What About Recursion?What About Recursion?

• General recursion, programs that halt with probability 1; e.g.

(define/drbayes (geometric p) (if (bernoulli p)

0(+ 1 (geometric p))))

• Consider programs as being fully inlined (thus infinite):

(if (bernoulli p)0(+ 1 (if (bernoulli p)

0(+ 1 (if (bernoulli p)

0(+ 1 ...))))))

• Random domain needs to be big enough and the right shape35353535353535353535353535353535

Program Domain ValuesProgram Domain ValuesProgram Domain Values

• Values are infinite binary trees:

36363636363636363636363636363636

Program Domain ValuesProgram Domain ValuesProgram Domain Values

• Values are infinite binary trees:

• Every expression in a program is assigned a node

36363636363636363636363636363636

Program Domain ValuesProgram Domain ValuesProgram Domain Values

• Values are infinite binary trees:

• Every expression in a program is assigned a node

• Implemented using lazy trees of random values

36363636363636363636363636363636

Program Domain ValuesProgram Domain ValuesProgram Domain Values

• Values are infinite binary trees:

• Every expression in a program is assigned a node

• Implemented using lazy trees of random values

• No probability density for domain, but there is a measure 36363636363636363636363636363636

Demo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular Condition

• Normal-Normal process:

37373737373737373737373737373737

Demo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular Condition

• Normal-Normal process:

• Objective: Find the distribution of

37373737373737373737373737373737

Demo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular Condition

• Normal-Normal process:

• Objective: Find the distribution of

• Implementation:

(define/drbayes e (let* ([x (normal 0 1)]

[y (normal x 1)]) (list x y (sqrt (+ (sqr x) (sqr y))))))

37373737373737373737373737373737

Demo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular Condition

• Normal-Normal process:

• Objective: Find the distribution of

• Implementation:

(define/drbayes e (let* ([x (normal 0 1)]

[y (normal x 1)]) (list x y (sqrt (+ (sqr x) (sqr y))))))

• Goal: Sample in the preimage of

(set-list reals reals (interval (- 1 ε) (+ 1 ε)))

37373737373737373737373737373737

Demo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular Condition

For ε = 0.01:

Preimage rectangles

38383838383838383838383838383838

Demo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular Condition

For ε = 0.01:

Preimage samples

38383838383838383838383838383838

Demo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular Condition

For ε = 0.01:

Preimage samples Output samples

38383838383838383838383838383838

Demo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular Condition

For ε = 0.01:

Preimage samples Output samples

• Works fine with much smaller ε38383838383838383838383838383838

Demo: ThermometerDemo: ThermometerDemo: Thermometer

• Normal-Normal thermometer process:

39393939393939393939393939393939

Demo: ThermometerDemo: ThermometerDemo: Thermometer

• Normal-Normal thermometer process:

• Objective: Find the distribution of

39393939393939393939393939393939

Demo: ThermometerDemo: ThermometerDemo: Thermometer

• Normal-Normal thermometer process:

• Objective: Find the distribution of

• Implementation:

(define/drbayes e (let* ([x (normal 90 10)]

[y (normal x 1)]) (list x (if (> y 100) 100 y))))

39393939393939393939393939393939

Demo: ThermometerDemo: ThermometerDemo: Thermometer

• Normal-Normal thermometer process:

• Objective: Find the distribution of

• Implementation:

(define/drbayes e (let* ([x (normal 90 10)]

[y (normal x 1)]) (list x (if (> y 100) 100 y))))

• Goal: Sample in the preimage of

(set-list reals (interval 100 100)) 39393939393939393939393939393939

Demo: ThermometerDemo: ThermometerDemo: Thermometer

Preimage rectangles

40404040404040404040404040404040

Demo: ThermometerDemo: ThermometerDemo: Thermometer

Preimage samples

40404040404040404040404040404040

Demo: ThermometerDemo: ThermometerDemo: Thermometer

Preimage samples Density of

40404040404040404040404040404040

Demo: ThermometerDemo: ThermometerDemo: Thermometer

Preimage samples Density of

Calculated from samples: mean 105.1, stddev 4.6

40404040404040404040404040404040

Demo: Stochastic Ray TracingDemo: Stochastic Ray TracingDemo: Stochastic Ray Tracing

• Idea: Model light transmission and reflection, condition on pathsthat pass through aperture

41414141414141414141414141414141

Demo: Stochastic Ray TracingDemo: Stochastic Ray TracingDemo: Stochastic Ray Tracing

• Idea: Model light transmission and reflection, condition on pathsthat pass through aperture

41414141414141414141414141414141

Demo: Stochastic Ray TracingDemo: Stochastic Ray TracingDemo: Stochastic Ray Tracing

• Part of the implementation (totals ~50 lines):(define/drbayes (ray-plane-intersect p0 v n d) (let ([denom (- (vec-dot v n))])

(if (positive? denom)(let ([t (/ (+ d (vec-dot p0 n)) denom)]) (if (positive? t)

(collision t (vec+ p0 (vec-scale v t)) n)#f))

#f)))

42424242424242424242424242424242

Demo: Stochastic Ray TracingDemo: Stochastic Ray TracingDemo: Stochastic Ray Tracing

• Part of the implementation (totals ~50 lines):(define/drbayes (ray-plane-intersect p0 v n d) (let ([denom (- (vec-dot v n))])

(if (positive? denom)(let ([t (/ (+ d (vec-dot p0 n)) denom)]) (if (positive? t)

(collision t (vec+ p0 (vec-scale v t)) n)#f))

#f)))

• Constrained light path outputs:

Paths Through Aperture Projected and Accumulated

42424242424242424242424242424242

Other Inference TasksOther Inference TasksOther Inference Tasks

• Typical

Hierarchical models

Bayesian regression

Model selection

43434343434343434343434343434343

Other Inference TasksOther Inference TasksOther Inference Tasks

• Typical

Hierarchical models

Bayesian regression

Model selection

• Atypical

Programs that halt with probability < 1, or never halt

Probabilistic program verification (sample in preimage of errorcondition)

43434343434343434343434343434343

Thesis StatementThesis StatementThesis Statement

Functional programming theory and measure-theoreticprobability provide a solid foundation

for trustworthy, useful languages for constructiveprobabilistic modeling and inference.

44444444444444444444444444444444

Thesis StatementThesis StatementThesis Statement

Functional programming theory and measure-theoreticprobability provide a solid foundation

for trustworthy, useful languages for constructiveprobabilistic modeling and inference.

True.

44444444444444444444444444444444

Thesis StatementThesis StatementThesis Statement

Functional programming theory and measure-theoreticprobability provide a solid foundation

for trustworthy, useful languages for constructiveprobabilistic modeling and inference.

True.

• Was it falsifiable?

44444444444444444444444444444444

MeasurabilityMeasurabilityMeasurability

• Only measurable sets can have probabilities

45454545454545454545454545454545

MeasurabilityMeasurabilityMeasurability

• Only measurable sets can have probabilities

• Computing preimages under must preserve measurability—wesay itself is measurable

45454545454545454545454545454545

MeasurabilityMeasurabilityMeasurability

• Only measurable sets can have probabilities

• Computing preimages under must preserve measurability—wesay itself is measurable

Theorem (measurability). For all programs , is measurable,regardless of errors or nontermination, if language primitives aremeasurable.

45454545454545454545454545454545

MeasurabilityMeasurabilityMeasurability

• Only measurable sets can have probabilities

• Computing preimages under must preserve measurability—wesay itself is measurable

Theorem (measurability). For all programs , is measurable,regardless of errors or nontermination, if language primitives aremeasurable.

• Primitives include uncomputable operations like limits

45454545454545454545454545454545

MeasurabilityMeasurabilityMeasurability

• Only measurable sets can have probabilities

• Computing preimages under must preserve measurability—wesay itself is measurable

Theorem (measurability). For all programs , is measurable,regardless of errors or nontermination, if language primitives aremeasurable.

• Primitives include uncomputable operations like limits

• Applies to all probabilistic programming languages45454545454545454545454545454545

What I DidWhat I DidWhat I Did

45454545454545454545454545454545

What I DidWhat I DidWhat I Did

The core calculus for this:

45454545454545454545454545454545

Future WorkFuture WorkFuture Work

• Expressiveness

Lambdas and macros

Exceptions, parameters (or continuations and marks)

46464646464646464646464646464646

Future WorkFuture WorkFuture Work

• Expressiveness

Lambdas and macros

Exceptions, parameters (or continuations and marks)

• Optimization

Direct implementation is in depth; cut to

Incremental computation

Adaptive sampling algorithms

Static analysis

46464646464646464646464646464646

Future WorkFuture WorkFuture Work

• Expressiveness

Lambdas and macros

Exceptions, parameters (or continuations and marks)

• Optimization

Direct implementation is in depth; cut to

Incremental computation

Adaptive sampling algorithms

Static analysis

• Branching out: investigate preimage computation connection withtype systems and predicate transformer semantics 46464646464646464646464646464646

top related