TRANSCRIPT
Probabilistic Inference and Learning with
Stein’s Method
Lester Mackey
Microsoft Research New England
November 6, 2019
Collaborators: Jackson Gorham, Andrew Duncan, Sebastian Vollmer, Jonathan Huggins, Wilson Chen, Alessandro Barp, François-Xavier Briol, Mark Girolami, Chris Oates, Murat Erdogdu, and Ohad Shamir
Mackey (MSR) Inference and Learning with Stein’s Method November 6, 2019 1 / 33
Motivation: Large-scale Posterior Inference

Example: Bayesian logistic regression
1. Fixed feature vectors: v_l ∈ R^d for each datapoint l = 1, . . . , L
2. Binary class labels: Y_l ∈ {0, 1}, P(Y_l = 1 | v_l, β) = 1 / (1 + e^{−⟨β, v_l⟩})
3. Unknown parameter vector: β ∼ N(0, I)

Generative model simple to express; posterior distribution over the unknown parameters is complex: normalization constant unknown, exact integration intractable.

Standard inferential approach: use Markov chain Monte Carlo (MCMC) to (eventually) draw samples from the posterior distribution.

Benefit: approximates intractable posterior expectations E_P[h(Z)] = ∫_X p(x) h(x) dx with asymptotically exact sample estimates E_Q[h(X)] = (1/n) ∑_{i=1}^n h(x_i).

Problem: each new MCMC sample point x_i requires iterating over the entire observed dataset: prohibitive when the dataset is large!
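The posterior here is known only up to its normalization constant, but its log density and gradient are cheap to write down, which is all that MCMC and the Stein machinery below require. A minimal sketch (not from the talk; function names and interface are illustrative) of the unnormalized log posterior and its gradient for this Bayesian logistic regression model:

```python
import numpy as np

def log_posterior_unnorm(beta, V, y):
    """Unnormalized log posterior for Bayesian logistic regression.

    V: (L, d) array of fixed feature vectors v_l; y: (L,) binary labels.
    Prior: beta ~ N(0, I). The normalization constant is omitted: it is
    not needed for MCMC or for the Stein operators discussed later.
    """
    logits = V @ beta  # <beta, v_l> for each datapoint
    # log-likelihood: sum_l [ y_l * logit_l - log(1 + exp(logit_l)) ]
    loglik = np.sum(y * logits - np.logaddexp(0.0, logits))
    logprior = -0.5 * np.dot(beta, beta)  # N(0, I) prior, up to a constant
    return loglik + logprior

def grad_log_posterior(beta, V, y):
    """Gradient of the unnormalized log posterior in beta."""
    probs = 1.0 / (1.0 + np.exp(-(V @ beta)))  # P(Y_l = 1 | v_l, beta)
    return V.T @ (y - probs) - beta
```

Each gradient evaluation touches every one of the L datapoints, which is exactly the per-sample cost the talk flags as prohibitive at scale.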
Motivation: Large-scale Posterior Inference

Question: How do we scale Markov chain Monte Carlo (MCMC) posterior inference to massive datasets?

MCMC benefit: approximates intractable posterior expectations E_P[h(Z)] = ∫_X p(x) h(x) dx with asymptotically exact sample estimates E_Q[h(X)] = (1/n) ∑_{i=1}^n h(x_i)

Problem: each point x_i requires iterating over the entire dataset!

Template solution: approximate MCMC with subset posteriors [Welling and Teh, 2011, Ahn, Korattikara, and Welling, 2012, Korattikara, Chen, and Welling, 2014]
Approximate the standard MCMC procedure in a manner that uses only a small subset of datapoints per sample
Reduced computational overhead leads to faster sampling and reduced Monte Carlo variance
Introduces asymptotic bias: the target distribution is no longer stationary
Hope: for a fixed amount of sampling time, the variance reduction will outweigh the bias introduced
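The canonical instance of this template is stochastic gradient Langevin dynamics [Welling and Teh, 2011]. A minimal sketch, assuming user-supplied prior and per-datapoint likelihood gradients (the interface is illustrative, not any particular implementation):

```python
import numpy as np

def sgld(grad_log_prior, grad_log_lik, data, beta0, step, n_steps, batch_size, seed=0):
    """Stochastic Gradient Langevin Dynamics, sketched.

    Each update forms an unbiased estimate of the full-data gradient of
    log p(beta | data) from a random minibatch, then adds Gaussian noise:
        beta <- beta + (step/2) * grad_est + N(0, step * I).
    Subsampling makes each step cheap, but with a fixed step size the
    chain has asymptotic bias: the posterior is no longer stationary.
    """
    rng = np.random.default_rng(seed)
    L = len(data)
    beta = np.array(beta0, dtype=float)
    samples = []
    for _ in range(n_steps):
        idx = rng.choice(L, size=batch_size, replace=False)
        # unbiased full-gradient estimate: prior term + rescaled minibatch sum
        grad = grad_log_prior(beta) + (L / batch_size) * sum(
            grad_log_lik(beta, data[i]) for i in idx)
        beta = beta + 0.5 * step * grad + rng.normal(scale=np.sqrt(step), size=beta.shape)
        samples.append(beta.copy())
    return np.array(samples)
```

Even in this idealized form, the fixed step size and the minibatch gradient noise perturb the stationary distribution, which is precisely why the talk asks for quality measures that account for asymptotic bias.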
Motivation: Large-scale Posterior Inference

Template solution: approximate MCMC with subset posteriors [Welling and Teh, 2011, Ahn, Korattikara, and Welling, 2012, Korattikara, Chen, and Welling, 2014]
Hope: for a fixed amount of sampling time, the variance reduction will outweigh the bias introduced

Introduces new challenges:
How do we compare and evaluate samples from approximate MCMC procedures?
How do we select samplers and their tuning parameters?
How do we quantify the bias-variance trade-off explicitly?

Difficulty: standard evaluation criteria like effective sample size, trace plots, and variance diagnostics assume convergence to the target distribution and do not account for asymptotic bias.

This talk: introduce new measures suitable for comparing the quality of approximate MCMC samples.
Quality Measures for Samples

Challenge: develop a measure suitable for comparing the quality of any two samples approximating a common target distribution

Given:
Continuous target distribution P with support X = R^d and density p
p known up to normalization; integration under P is intractable
Sample points x_1, . . . , x_n ∈ X
Define the discrete distribution Q_n with, for any function h, E_{Q_n}[h(X)] = (1/n) ∑_{i=1}^n h(x_i), used to approximate E_P[h(Z)]
We make no assumption about the provenance of the x_i

Goal: quantify how well E_{Q_n} approximates E_P in a manner that
I. Detects when a sample sequence is converging to the target
II. Detects when a sample sequence is not converging to the target
III. Is computationally feasible
Integral Probability Metrics

Goal: quantify how well E_{Q_n} approximates E_P
Idea: consider an integral probability metric (IPM) [Müller, 1997]

d_H(Q_n, P) = sup_{h ∈ H} |E_{Q_n}[h(X)] − E_P[h(Z)]|

Measures the maximum discrepancy between sample and target expectations over a class H of real-valued test functions
When H is sufficiently large, convergence of d_H(Q_n, P) to zero implies that (Q_n)_{n≥1} converges weakly to P (Requirement II)

Problem: integration under P is intractable! ⇒ Most IPMs cannot be computed in practice

Idea: only consider functions with E_P[h(Z)] known a priori to be 0
Then the IPM computation depends only on Q_n!
How do we select this class of test functions?
Will the resulting discrepancy measure track sample sequence convergence (Requirements I and II)?
How do we solve the resulting optimization problem in practice?
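To make the definition concrete, here is a toy computation of d_H over a small finite class H. This is an illustration only, not from the talk: a genuine IPM takes the supremum over an infinite class (e.g. all 1-Lipschitz functions for the Wasserstein distance), and the target means E_P[h(Z)] are exactly the intractable integrals the talk wants to avoid.

```python
import numpy as np

def ipm_finite(sample, test_fns, target_means):
    """Toy IPM over a finite class H = {h_1, ..., h_k}:
        d_H(Q_n, P) = max_h |E_Qn[h(X)] - E_P[h(Z)]|.
    target_means supplies E_P[h(Z)], which in general is unavailable."""
    return max(abs(np.mean(h(sample)) - m)
               for h, m in zip(test_fns, target_means))

# Target P = N(0, 1): E_P[Z] = 0, E_P[Z^2] = 1, E_P[cos Z] = exp(-1/2)
fns = [lambda x: x, lambda x: x ** 2, np.cos]
means = [0.0, 1.0, np.exp(-0.5)]
```

A large sample drawn from N(0, 1) drives this toy discrepancy toward zero, while a sample from N(1, 1) keeps it bounded away from zero, mirroring Requirements I and II on a miniature scale.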
Stein’s Method

Stein’s method [1972] provides a recipe for controlling convergence:

1. Identify an operator T and a set G of functions g : X → R^d with E_P[(T g)(Z)] = 0 for all g ∈ G.
T and G together define the Stein discrepancy [Gorham and Mackey, 2015]

S(Q_n, T, G) ≜ sup_{g ∈ G} |E_{Q_n}[(T g)(X)]| = d_{T G}(Q_n, P),

an IPM-type measure with no explicit integration under P.

2. Lower-bound S(Q_n, T, G) by a reference IPM d_H(Q_n, P) ⇒ (Q_n)_{n≥1} converges to P whenever S(Q_n, T, G) → 0 (Requirement II)
Performed once, in advance, for large classes of distributions

3. Upper-bound S(Q_n, T, G) by any means necessary to demonstrate convergence to 0 (Requirement I)

Standard use: as an analytical tool to prove convergence
Our goal: develop the Stein discrepancy into a practical quality measure
Identifying a Stein Operator T

Goal: identify an operator T for which E_P[(T g)(Z)] = 0 for all g ∈ G

Approach: generator method of Barbour [1988, 1990], Götze [1991]
Identify a Markov process (Z_t)_{t≥0} with stationary distribution P
Under mild conditions, its infinitesimal generator

(A u)(x) = lim_{t→0} (E[u(Z_t) | Z_0 = x] − u(x)) / t

satisfies E_P[(A u)(Z)] = 0

Overdamped Langevin diffusion: dZ_t = (1/2) ∇ log p(Z_t) dt + dW_t
Generator: (A_P u)(x) = (1/2) ⟨∇u(x), ∇ log p(x)⟩ + (1/2) ⟨∇, ∇u(x)⟩
Stein operator: (T_P g)(x) ≜ ⟨g(x), ∇ log p(x)⟩ + ⟨∇, g(x)⟩ [Gorham and Mackey, 2015, Oates, Girolami, and Chopin, 2016]
Depends on P only through ∇ log p; computable even if p cannot be normalized!
Multivariate generalization of the density method operator (T g)(x) = g(x) (d/dx) log p(x) + g′(x) [Stein, Diaconis, Holmes, and Reinert, 2004]
E_P[(T_P g)(Z)] = 0 for all g : X → R^d in the classical Stein set

G_{‖·‖} = { g : sup_{x ≠ y} max( ‖g(x)‖_*, ‖∇g(x)‖_*, ‖∇g(x) − ∇g(y)‖_* / ‖x − y‖ ) ≤ 1 }
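In one dimension the Langevin Stein operator is easy to verify numerically. A sketch (not from the talk; the choice g = tanh is illustrative, chosen because it satisfies the classical Stein set bounds) checking E_P[(T_P g)(Z)] ≈ 0 for P = N(0, 1):

```python
import numpy as np

def langevin_stein(g, dg, grad_log_p, x):
    """Univariate Langevin Stein operator:
        (T_P g)(x) = g(x) * (d/dx) log p(x) + g'(x).
    It touches P only through grad log p, so an unnormalized density
    suffices -- the normalization constant never appears."""
    return g(x) * grad_log_p(x) + dg(x)

# Check E_P[(T_P g)(Z)] = 0 for P = N(0, 1), where grad log p(x) = -x.
rng = np.random.default_rng(0)
z = rng.normal(size=100000)
vals = langevin_stein(np.tanh, lambda x: 1.0 - np.tanh(x) ** 2, lambda x: -x, z)
print(np.mean(vals))  # approximately 0, up to Monte Carlo error
```

The zero mean here is the univariate Stein identity E[Z g(Z)] = E[g′(Z)] in disguise, which is exactly what makes expectations under T_P G computable from Q_n alone.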
Detecting Convergence and Non-convergence

Goal: show that the classical Stein discrepancy S(Q_n, T_P, G_{‖·‖}) → 0 if and only if (Q_n)_{n≥1} converges to P

In the univariate case (d = 1), it is known that for many targets P, S(Q_n, T_P, G_{‖·‖}) → 0 only if the Wasserstein distance d_{W‖·‖}(Q_n, P) → 0 [Stein, Diaconis, Holmes, and Reinert, 2004, Chatterjee and Shao, 2011, Chen, Goldstein, and Shao, 2011]
Few multivariate targets have been analyzed (see [Reinert and Röllin, 2009, Chatterjee and Meckes, 2008, Meckes, 2009] for the multivariate Gaussian)

New contribution [Gorham, Duncan, Vollmer, and Mackey, 2019]

Theorem (Stein Discrepancy-Wasserstein Equivalence)
If the Langevin diffusion couples at an integrable rate and ∇ log p is Lipschitz, then S(Q_n, T_P, G_{‖·‖}) → 0 ⇔ d_{W‖·‖}(Q_n, P) → 0.

Examples: strongly log-concave P, Bayesian logistic regression or robust Student's t regression with Gaussian priors, Gaussian mixtures
The conditions are not necessary: the proof provides a template for bounding S(Q_n, T_P, G_{‖·‖})
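The Wasserstein side of the equivalence can be illustrated numerically. The sketch below (my own, not from the talk) estimates the 1-Wasserstein distance from an empirical sample to P = N(0, 1) via the one-dimensional quantile coupling, discretized at quantile midpoints: for i.i.d. N(0, 1) draws the distance shrinks with n, while for a variance-matched scaled Student's t sample (illustrative choice: ν = 5) it levels off at a positive value, the same qualitative behavior the theorem predicts for the Stein discrepancy.

```python
import numpy as np
from statistics import NormalDist

def w1_to_std_normal(x):
    """Approximate d_W(empirical(x), N(0,1)) by matching sorted samples to
    standard normal quantiles at midpoints (i + 0.5)/n (quantile coupling in 1-D)."""
    n = len(x)
    grid = (np.arange(n) + 0.5) / n
    quantiles = np.array([NormalDist().inv_cdf(u) for u in grid])
    return float(np.mean(np.abs(np.sort(x) - quantiles)))

rng = np.random.default_rng(0)
n, nu = 10_000, 5
gauss = rng.standard_normal(n)
t_scaled = rng.standard_t(nu, n) * np.sqrt((nu - 2) / nu)  # rescaled to unit variance

print(w1_to_std_normal(gauss))     # small, shrinking as n grows
print(w1_to_std_normal(t_scaled))  # bounded away from 0
```

The scaled t sequence converges weakly to the scaled t distribution, not to N(0, 1), so its Wasserstein distance to the target plateaus; by the theorem, its Stein discrepancy must too.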
A Simple Example
[Figure: top panel shows Stein discrepancy vs. number of sample points n (100 to 10000, log-log scale, discrepancy values from roughly 0.01 to 0.10) for the Gaussian and scaled Student's t sequences; lower panels show the optimal Stein functions g and the induced test functions h = TP g over x ∈ [−6, 6] for n = 300, 3000, 30000.]
For target P = N (0, 1), compare an i.i.d. N (0, 1) sample sequence Q1:n to a scaled Student's t sequence Q′1:n with matching variance
Expect S(Q1:n, TP, G‖·‖) → 0 and S(Q′1:n, TP, G‖·‖) ↛ 0
Mackey (MSR) Inference and Learning with Stein’s Method November 6, 2019 10 / 33
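The classical Stein discrepancy above is computed by optimizing over the Stein set G‖·‖. A related discrepancy that needs no optimization is the kernel Stein discrepancy (KSD) of Chwialkowski et al. [2016], Liu et al. [2016], and Gorham and Mackey [2017], which replaces G‖·‖ with the unit ball of a reproducing kernel Hilbert space and reduces to a closed-form double sum. The sketch below (my own, with an inverse multiquadric base kernel and an illustrative ν = 3 Student's t) reproduces the qualitative picture: the Gaussian sample's discrepancy shrinks while the variance-matched scaled t sample's does not.

```python
import numpy as np

def ksd_imq(x):
    """V-statistic kernel Stein discrepancy for target N(0,1) with IMQ base
    kernel k(x,y) = (1 + (x-y)^2)^(-1/2); score of the target is s(x) = -x."""
    s = -x
    u = x[:, None] - x[None, :]
    q = 1.0 + u**2
    k = q**-0.5                         # k(x, y)
    kx = -u * q**-1.5                   # d/dx k
    ky = u * q**-1.5                    # d/dy k
    kxy = q**-1.5 - 3 * u**2 * q**-2.5  # d2/dxdy k
    # Stein kernel: k0(x,y) = kxy + s(x) ky + s(y) kx + s(x) s(y) k
    h = kxy + s[:, None] * ky + s[None, :] * kx + s[:, None] * s[None, :] * k
    return float(np.sqrt(h.mean()))

rng = np.random.default_rng(0)
n, nu = 1500, 3
gauss = rng.standard_normal(n)
t_scaled = rng.standard_t(nu, n) * np.sqrt((nu - 2) / nu)  # rescaled to unit variance

print(ksd_imq(gauss))     # decays toward 0 as n grows
print(ksd_imq(t_scaled))  # stays bounded away from 0
```

Like the classical discrepancy, the KSD depends on P only through the score −x, so the same computation applies to unnormalized targets.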
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
−2−1
012
−2024
−2.50.02.55.0
n = 300
n = 3000
n = 30000
−6 −3 0 3 6 −6 −3 0 3 6x
h=
TP g Sample
● GaussianScaledStudent's t
Middle: Recovered optimal functions g
Right: Associated test functions h(x) ≜ (T_P g)(x) which best discriminate sample Q from target P
Mackey (MSR) Inference and Learning with Stein’s Method November 6, 2019 11 / 33
Selecting Sampler Hyperparameters
[Figure. Top: log median diagnostic vs. step size ε for diagnostic = ESS and diagnostic = Spanner Stein. Bottom: sample scatter plots in (x1, x2) for step sizes ε = 5e−05, 5e−03, 5e−02]
Target posterior density: p(x) ∝ π(x) ∏_{l=1}^L π(y_l | x)

Stochastic Gradient Langevin Dynamics [Welling and Teh, 2011]

x_{k+1} ∼ N(x_k + (ε/2)(∇ log π(x_k) + (L/|B_k|) ∑_{l∈B_k} ∇ log π(y_l | x_k)), εI)

Random batch B_k of datapoints used to draw each sample point
Step size ε too small ⇒ slow mixing
Step size ε too large ⇒ sampling from very different distribution
Standard MCMC selection criteria like effective sample size (ESS) and asymptotic variance do not account for this bias
ESS maximized at ε = 5 × 10⁻², Stein minimized at ε = 5 × 10⁻³
Mackey (MSR) Inference and Learning with Stein’s Method November 6, 2019 12 / 33
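The SGLD update above can be sketched in a few lines of NumPy. This is our illustrative rendering, not code from the talk; the function names and the toy model in the usage note are assumptions.

```python
import numpy as np

def sgld_step(x, eps, grad_log_prior, grad_log_lik, data, batch_size, rng):
    """One SGLD update (Welling and Teh, 2011): a noisy gradient step whose
    likelihood term is estimated from a random minibatch and rescaled by L/|B_k|."""
    L = len(data)
    batch = data[rng.choice(L, size=batch_size, replace=False)]
    grad = grad_log_prior(x) + (L / batch_size) * sum(grad_log_lik(y, x) for y in batch)
    # Draw from the Gaussian proposal N(x + (eps/2) * grad, eps * I)
    return x + 0.5 * eps * grad + np.sqrt(eps) * rng.standard_normal(x.shape)
```

For instance, for a normal location model with prior N(0, 1) and likelihood N(x, 1), one would pass `grad_log_prior = lambda x: -x` and `grad_log_lik = lambda y, x: y - x`.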
Alternative Stein Sets G

Goal: Identify a more “user-friendly” Stein set G than the classical

Approach: Reproducing kernels k : X × X → R [Oates, Girolami, and Chopin, 2016, Chwialkowski, Strathmann, and Gretton, 2016, Liu, Lee, and Jordan, 2016]

A reproducing kernel k is symmetric (k(x, y) = k(y, x)) and positive semidefinite (∑_{i,l} c_i c_l k(z_i, z_l) ≥ 0, ∀ z_i ∈ X, c_i ∈ R)

Gaussian: k(x, y) = e^{−(1/2)‖x−y‖₂²}, IMQ: k(x, y) = (1 + ‖x−y‖₂²)^{−1/2}

Generates a reproducing kernel Hilbert space (RKHS) K_k

Define the kernel Stein set [Gorham and Mackey, 2017]

G_k ≜ {g = (g_1, . . . , g_d) | ‖v‖_* ≤ 1 for v_j ≜ ‖g_j‖_{K_k}}

Yields closed-form kernel Stein discrepancy (KSD)

S(Q_n, T_P, G_k) = ‖w‖ for w_j ≜ √(∑_{i,i′=1}^n k_j^0(x_i, x_{i′}))

Reduces to parallelizable pairwise evaluations of Stein kernels

k_j^0(x, y) ≜ (1/(p(x)p(y))) ∇_{x_j} ∇_{y_j} (p(x) k(x, y) p(y))
Mackey (MSR) Inference and Learning with Stein’s Method November 6, 2019 13 / 33
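For targets with a tractable score ∇ log p, the summed Stein kernel ∑_j k_j^0 has a short closed form for the IMQ base kernel, so the KSD is just a pairwise kernel sum. A minimal sketch (our code and function names, not from the talk), normalizing by n² for the uniform empirical measure:

```python
import numpy as np

def imq_stein_kernel(X, score, c=1.0, beta=-0.5):
    """Matrix of summed Stein kernels sum_j k_j^0(x_i, x_j) for the IMQ base
    kernel k(x, y) = (c^2 + ||x - y||^2)^beta and score function ∇ log p."""
    S = score(X)                            # (n, d) scores ∇ log p(x_i)
    R = X[:, None, :] - X[None, :, :]       # R[i, j] = x_i - x_j
    r2 = np.sum(R**2, axis=-1)
    u = c**2 + r2
    d = X.shape[1]
    sy_r = np.einsum('jd,ijd->ij', S, R)    # s(x_j) · (x_i - x_j)
    sx_r = np.einsum('id,ijd->ij', S, R)    # s(x_i) · (x_i - x_j)
    return (u**beta * (S @ S.T)             # s(x)·s(y) k(x, y) term
            + 2*beta*u**(beta - 1) * (sy_r - sx_r)      # cross terms
            - 4*beta*(beta - 1)*u**(beta - 2) * r2      # trace term, part 1
            - 2*beta*d*u**(beta - 1))                   # trace term, part 2

def ksd(X, score, **kw):
    """KSD of the uniform empirical measure on the rows of X."""
    return np.sqrt(imq_stein_kernel(X, score, **kw).mean())
```

For a standard normal target, `score = lambda X: -X`; a sample shifted away from the target should then receive a larger KSD than a well-matched one.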
Detecting Non-convergence

Goal: Show (Q_n)_{n≥1} converges to P whenever S(Q_n, T_P, G_k) → 0

Theorem (Univariate KSD detects non-convergence [Gorham and Mackey, 2017])

Suppose P ∈ P and k(x, y) = Φ(x − y) for Φ ∈ C² with a non-vanishing generalized Fourier transform. If d = 1, then (Q_n)_{n≥1} converges weakly to P whenever S(Q_n, T_P, G_k) → 0.

P is the set of targets P with Lipschitz ∇ log p and distant strong log concavity (⟨∇ log(p(x)/p(y)), y − x⟩ / ‖x − y‖₂² ≥ k for ‖x − y‖₂ ≥ r)

Includes Bayesian logistic and Student’s t regression with Gaussian priors, Gaussian mixtures with common covariance, ...

Justifies use of KSD with popular Gaussian, Matérn, or inverse multiquadric kernels k in the univariate case
Mackey (MSR) Inference and Learning with Stein’s Method November 6, 2019 14 / 33
Detecting Non-convergence

Goal: Show (Q_n)_{n≥1} converges to P whenever S(Q_n, T_P, G_k) → 0

In higher dimensions, KSDs based on common kernels fail to detect non-convergence, even for Gaussian targets P

Theorem (KSD fails with light kernel tails [Gorham and Mackey, 2017])

Suppose d ≥ 3, P = N(0, I_d), and α ≜ (1/2 − 1/d)^{−1}. If k(x, y) and its derivatives decay at an o(‖x − y‖₂^{−α}) rate as ‖x − y‖₂ → ∞, then S(Q_n, T_P, G_k) → 0 for some (Q_n)_{n≥1} not converging to P.

Gaussian (k(x, y) = e^{−(1/2)‖x−y‖₂²}) and Matérn kernels fail for d ≥ 3

Inverse multiquadric kernels (k(x, y) = (1 + ‖x − y‖₂²)^β) with β < −1 fail for d > 2β/(1 + β)

The violating sample sequences (Q_n)_{n≥1} are simple to construct

Problem: Kernels with light tails ignore excess mass in the tails
Mackey (MSR) Inference and Learning with Stein’s Method November 6, 2019 15 / 33
Detecting Non-convergence

Goal: Show (Q_n)_{n≥1} converges to P whenever S(Q_n, T_P, G_k) → 0

Consider the inverse multiquadric (IMQ) kernel

k(x, y) = (c² + ‖x − y‖₂²)^β for some β < 0, c ∈ R.

IMQ KSD fails to detect non-convergence when β < −1

However, IMQ KSD detects non-convergence when β ∈ (−1, 0)

Theorem (IMQ KSD detects non-convergence [Gorham and Mackey, 2017])

Suppose P ∈ P and k(x, y) = (c² + ‖x − y‖₂²)^β for β ∈ (−1, 0). If S(Q_n, T_P, G_k) → 0, then (Q_n)_{n≥1} converges weakly to P.
Mackey (MSR) Inference and Learning with Stein’s Method November 6, 2019 16 / 33
Detecting Convergence

Goal: Show S(Q_n, T_P, G_k) → 0 whenever (Q_n)_{n≥1} converges to P

Proposition (KSD detects convergence [Gorham and Mackey, 2017])

If k ∈ C_b^{(2,2)} and ∇ log p is Lipschitz and square integrable under P, then S(Q_n, T_P, G_k) → 0 whenever the Wasserstein distance d_{W,‖·‖₂}(Q_n, P) → 0.

Covers Gaussian, Matérn, IMQ, and other common bounded kernels k
Mackey (MSR) Inference and Learning with Stein’s Method November 6, 2019 17 / 33
Selecting Samplers

Stochastic Gradient Fisher Scoring (SGFS) [Ahn, Korattikara, and Welling, 2012]

Approximate MCMC procedure designed for scalability
Approximates Metropolis-adjusted Langevin algorithm but does not use Metropolis-Hastings correction
Target P is not stationary distribution

Goal: Choose between two variants

SGFS-f inverts a d × d matrix for each new sample point
SGFS-d inverts a diagonal matrix to reduce sampling time

MNIST handwritten digits [Ahn, Korattikara, and Welling, 2012]

10000 images, 51 features, binary label indicating whether image of a 7 or a 9

Bayesian logistic regression posterior P
Mackey (MSR) Inference and Learning with Stein’s Method November 6, 2019 18 / 33
Selecting Samplers

[Figure: IMQ kernel Stein discrepancy vs. number of sample points n for SGFS-d and SGFS-f, and best/worst bivariate marginal scatter plots for each sampler]
Left: IMQ KSD quality comparison for SGFS Bayesian logistic regression (no surrogate ground truth used)
Right: SGFS sample points (n = 5 × 10⁴) with bivariate marginal means and 95% confidence ellipses (blue) that align best and worst with surrogate ground truth sample (red)
Both suggest small speed-up of SGFS-d (0.0017s per sample vs. 0.0019s for SGFS-f) outweighed by loss in inferential accuracy
Mackey (MSR) Inference and Learning with Stein’s Method November 6, 2019 19 / 33
Beyond Sample Quality Comparison

Goodness-of-fit testing

Chwialkowski, Strathmann, and Gretton [2016] used the KSD S(Q_n, T_P, G_k) to test whether a sample was drawn from a target distribution P (see also Liu, Lee, and Jordan [2016])
Test with default Gaussian kernel k experienced considerable loss of power as the dimension d increased
We recreate their experiment with IMQ kernel (β = −1/2, c = 1)
For n = 500, generate sample (x_i)_{i=1}^n with x_i = z_i + u_i e_1, z_i iid∼ N(0, I_d) and u_i iid∼ Unif[0, 1]. Target P = N(0, I_d).
Compare with standard normality test of Baringhaus and Henze [1988]

Table: Mean power of multivariate normality tests across 400 simulations

          d=2   d=5   d=10  d=15  d=20  d=25
B&H       1.0   1.0   1.0   0.91  0.57  0.26
Gaussian  1.0   1.0   0.88  0.29  0.12  0.02
IMQ       1.0   1.0   1.0   1.0   1.0   1.0
Mackey (MSR) Inference and Learning with Stein’s Method November 6, 2019 20 / 33
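The alternative distribution in this experiment is easy to reproduce. The sketch below (our code, not the authors’) generates it and highlights why the departure from N(0, I_d) is subtle: only one coordinate is perturbed, its mean shifts by just E[u] = 1/2, and its variance barely changes (1 + 1/12).

```python
import numpy as np

# Recreate the alternative: x_i = z_i + u_i e_1 with z_i ~ N(0, I_d), u_i ~ Unif[0, 1]
rng = np.random.default_rng(0)
n, d = 500, 10
Z = rng.standard_normal((n, d))
U = rng.uniform(0.0, 1.0, size=n)
X = Z.copy()
X[:, 0] += U  # only the first coordinate departs from N(0, I_d)
```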
Beyond Sample Quality Comparison

Improving sample quality

Given sample points (x_i)_{i=1}^n, can minimize KSD S(Q_n, T_P, G_k) over all weighted samples Q_n = ∑_{i=1}^n q_n(x_i) δ_{x_i} for q_n a probability mass function
Liu and Lee [2016] do this with Gaussian kernel k(x, y) = e^{−(1/h)‖x−y‖₂²}
Bandwidth h set to median of the squared Euclidean distance between pairs of sample points
We recreate their experiment with the IMQ kernel k(x, y) = (1 + (1/h)‖x − y‖₂²)^{−1/2}
Mackey (MSR) Inference and Learning with Stein’s Method November 6, 2019 21 / 33
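Given the matrix K0 of Stein kernel evaluations k0(x_i, x_j), the squared KSD of a weighted sample is the quadratic form qᵀK0 q, so improving the sample reduces to minimizing it over the probability simplex. A minimal sketch of that step; the solver choice here (exponentiated gradient descent) is ours, not necessarily the authors’:

```python
import numpy as np

def ksd_weights(K0, steps=2000, lr=0.1):
    """Minimize the squared weighted KSD q^T K0 q over the probability simplex
    by exponentiated gradient descent, starting from uniform weights."""
    n = K0.shape[0]
    q = np.full(n, 1.0 / n)
    for _ in range(steps):
        grad = 2.0 * K0 @ q
        # Multiplicative update with a normalized step keeps q positive;
        # renormalizing keeps it on the simplex.
        q = q * np.exp(-lr * grad / (np.abs(grad).max() + 1e-12))
        q /= q.sum()
    return q
```

The multiplicative update preserves nonnegativity automatically, which is why mirror descent is a natural fit for simplex-constrained problems like this one.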
Improving Sample Quality

[Figure: average MSE ‖E_P Z − E_{Q̃_n} X‖₂²/d vs. dimension d ∈ {2, 10, 50, 75, 100} for the initial Q_n, Gaussian KSD, and IMQ KSD weightings]

MSE averaged over 500 simulations (±2 standard errors)
Target P = N(0, I_d)
Starting sample Q_n = (1/n) ∑_{i=1}^n δ_{x_i} for x_i iid∼ P, n = 100.
Mackey (MSR) Inference and Learning with Stein’s Method November 6, 2019 22 / 33
Generating High-quality Samples

Stein Variational Gradient Descent (SVGD) [Liu and Wang, 2016]

Uses KSD to repeatedly update locations of n sample points:
x_i ← x_i + (ε/n) ∑_{l=1}^n (k(x_l, x_i) ∇ log p(x_l) + ∇_{x_l} k(x_l, x_i))
Approximates gradient step in KL divergence (but convergence is unclear)
Simple to implement (but each update costs n² time)

Stein Points [Chen, Mackey, Gorham, Briol, and Oates, 2018]

Greedily minimizes KSD by constructing Q_n = (1/n) ∑_{i=1}^n δ_{x_i} with
x_n ∈ argmin_x S((n−1)/n Q_{n−1} + (1/n) δ_x, T_P, G_k)
    = argmin_x ∑_{j=1}^d (k_j^0(x, x)/2 + ∑_{i=1}^{n−1} k_j^0(x_i, x))
Can generate sample sequence from scratch or, under budget constraints, optimize existing locations by coordinate descent
Sends KSD to zero at O(√(log(n)/n)) rate

Stein Point MCMC [Chen, Barp, Briol, Gorham, Girolami, Mackey, and Oates, 2019]

Suffices to optimize over iterates of a Markov chain
Mackey (MSR) Inference and Learning with Stein’s Method November 6, 2019 23 / 33
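The SVGD update lends itself to a direct vectorized rendering. This is our sketch with the RBF kernel k(x, y) = exp(−‖x − y‖²/h); the default bandwidth and step size are illustrative, not from the talk:

```python
import numpy as np

def svgd_step(X, score, eps=0.5, h=1.0):
    """One SVGD update (Liu and Wang, 2016):
    x_i <- x_i + (eps/n) sum_l [k(x_l, x_i) ∇ log p(x_l) + ∇_{x_l} k(x_l, x_i)]."""
    diffs = X[:, None, :] - X[None, :, :]          # diffs[l, i] = x_l - x_i
    K = np.exp(-np.sum(diffs**2, axis=-1) / h)     # K[l, i] = k(x_l, x_i)
    drive = K.T @ score(X)                         # kernel-smoothed gradient term
    repulse = -(2.0 / h) * np.einsum('li,lid->id', K, diffs)  # sum_l ∇_{x_l} k(x_l, x_i)
    return X + (eps / len(X)) * (drive + repulse)
```

The first term transports particles toward high-density regions; the second (the kernel gradient) pushes them apart, which is what keeps the particles from collapsing onto the mode.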
Generating High-quality Samples

[Figure: SP-MCMC vs. MCMC sample points]

Stein Point MCMC [Chen, Barp, Briol, Gorham, Girolami, Mackey, and Oates, 2019]

Greedily minimizes KSD by constructing Q_n = (1/n) ∑_{i=1}^n δ_{x_i} with
x_n ∈ argmin_x S((n−1)/n Q_{n−1} + (1/n) δ_x, T_P, G_k)
    = argmin_x ∑_{j=1}^d (k_j^0(x, x)/2 + ∑_{i=1}^{n−1} k_j^0(x_i, x))
Suffices to optimize over iterates of a Markov chain
Mackey (MSR) Inference and Learning with Stein’s Method November 6, 2019 24 / 33
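The greedy step is easy to realize when the argmin is restricted to a finite candidate set. Below is our sketch for the standard normal target N(0, I), whose score is s(x) = −x, using the IMQ Stein kernel; the candidate grid and kernel parameters are illustrative assumptions:

```python
import numpy as np

def k0(x, y, c=1.0, beta=-0.5):
    """Summed IMQ Stein kernel for target N(0, I), score s(x) = -x."""
    r = x - y
    r2 = float(r @ r)
    u = c**2 + r2
    d = x.size
    return (u**beta * float(x @ y)                  # s(x)·s(y) term
            + 2*beta*u**(beta - 1)*r2               # cross terms (s(y)-s(x))·(x-y)
            - 4*beta*(beta - 1)*u**(beta - 2)*r2    # trace term, part 1
            - 2*beta*d*u**(beta - 1))               # trace term, part 2

def stein_points(candidates, n):
    """Greedy Stein Points (Chen et al., 2018) restricted to a candidate set:
    each step picks x minimizing k0(x, x)/2 + sum_{i<n} k0(x_i, x)."""
    chosen = []
    for _ in range(n):
        best = min(candidates,
                   key=lambda x: 0.5*k0(x, x) + sum(k0(xi, x) for xi in chosen))
        chosen.append(best)
    return np.array(chosen)
```

Each greedy step here costs O(|candidates| · n) Stein kernel evaluations; Stein Point MCMC replaces the fixed grid with iterates of a Markov chain.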
Generating High-quality Samples

Goodwin oscillator: kinetic model of oscillatory enzymatic control

Measure mRNA and protein product (y_i)_{i=1}^{40} at times (t_i)_{i=1}^{40}
y_i = g(u(t_i)) + ε_i, ε_i i.i.d.∼ N(0, σ²I)
u̇(t) = f_θ(t, u), u(0) = u_0 ∈ R⁸
P is posterior of log(θ) ∈ R¹⁰ given y, t and log Γ(2, 1) priors
Evaluating likelihood requires solving the ODE numerically

[Figure: log KSD vs. log n_eval for MALA, RWM, SVGD, MED, SP, SP-MALA LAST, SP-MALA INFL, SP-RWM LAST, SP-RWM INFL]
Mackey (MSR) Inference and Learning with Stein’s Method November 6, 2019 25 / 33
Generating High-quality Samples

[Figure: log E_P vs. log n_eval for RWM, SVGD, SP-RWM INFL]

IGARCH model of financial time series with time-varying volatility
Daily percentage returns y = (y_t)_{t=1}^{2000} from S&P 500 modeled as
y_t = σ_t ε_t, ε_t i.i.d.∼ N(0, 1)
σ_t² = θ_1 + θ_2 y_{t−1}² + (1 − θ_2) σ_{t−1}²
P is posterior of θ_1 > 0, θ_2 ∈ (0, 1) given y and uniform priors
Mackey (MSR) Inference and Learning with Stein’s Method November 6, 2019 26 / 33
Future Directions

Many opportunities for future development

1 Improving scalability while maintaining convergence control

Subsampling of likelihood terms in ∇ log p
Inexpensive approximations of kernel matrix
Finite set Stein discrepancies [Jitkrittum, Xu, Szabo, Fukumizu, and Gretton, 2017]: low-rank kernel, linear runtime (but convergence control unclear)
Random feature Stein discrepancies [Huggins and Mackey, 2018]: stochastic low-rank kernel, near-linear runtime + high probability convergence control when (Q_n)_{n≥1} moments uniformly bounded

2 Exploring the impact of Stein operator choice

An infinite number of operators T characterize P
How is discrepancy impacted? How do we select the best T?
Thm: If ∇ log p bounded and k ∈ C_0^{(1,1)}, then S(Q_n, T_P, G_k) → 0 for some (Q_n)_{n≥1} not converging to P
Diffusion Stein operators (T g)(x) = (1/p(x)) ⟨∇, p(x) a(x) g(x)⟩ of Gorham, Duncan, Vollmer, and Mackey [2019] may be appropriate for heavy tails
Mackey (MSR) Inference and Learning with Stein’s Method November 6, 2019 27 / 33
Future Directions

Many opportunities for future development

3 Addressing other inferential tasks

Training generative adversarial networks [Wang and Liu, 2016] and variational autoencoders [Pu, Gan, Henao, Li, Han, and Carin, 2017]
Non-convex optimization

Example (Optimization with Discretized Diffusions [Erdogdu, Mackey, and Shamir, 2018])

To minimize f(x), choose a(x) ⪰ cI with a(x)∇f(x) Lipschitz and distantly dissipative (⟨a(x)∇f(x) − a(y)∇f(y), x − y⟩ / ‖x − y‖₂² ≥ k for ‖x − y‖₂ ≥ r)
Approximate target sequence p_n(x) ∝ e^{−γ_n f(x)} using Markov chain
x_{n+1} ∼ N(x_n − (ε_n/2) a(x_n) ∇f(x_n) + (ε_n/(2γ_n)) ⟨∇, a(x_n)⟩, (ε_n/γ_n) a(x_n))
Thm: min_{1≤i≤n} E f(x_i) → min_x f(x) (with explicit error bounds) for appropriate ε_n and γ_n when ∇f, ∇a, and a^{1/2} are Lipschitz
Future Directions

Many opportunities for future development

3 Addressing other inferential tasks
Training generative adversarial networks [Wang and Liu, 2016] and variational autoencoders [Pu, Gan, Henao, Li, Han, and Carin, 2017]
Non-convex optimization [Erdogdu, Mackey, and Shamir, 2018]

min_x f(x) = 5 log(1 + ‖x‖₂²/2),  a(x) = (1 + ‖x‖₂²/2) I,  a(x)∇f(x) = 5x
Future Directions

Many opportunities for future development

1 Improving scalability while maintaining convergence control
Subsampling of likelihood terms in ∇ log p
Inexpensive approximations of kernel matrix
Finite set Stein discrepancies [Jitkrittum, Xu, Szabo, Fukumizu, and Gretton, 2017]
Random feature Stein discrepancies [Huggins and Mackey, 2018]

2 Exploring the impact of Stein operator choice
An infinite number of operators T characterize P
How is discrepancy impacted? How do we select the best T?
Thm: If ∇ log p bounded and k ∈ C₀^(1,1), then S(Qn, TP, Gk) → 0 for some (Qn)n≥1 not converging to P
Diffusion Stein operators (T g)(x) = (1/p(x)) ⟨∇, p(x) a(x) g(x)⟩ of Gorham, Duncan, Vollmer, and Mackey [2019] may be appropriate for heavy tails

3 Addressing other inferential tasks
Training generative adversarial networks [Wang and Liu, 2016] and variational autoencoders [Pu, Gan, Henao, Li, Han, and Carin, 2017]
Non-convex optimization [Erdogdu, Mackey, and Shamir, 2018]
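The first bullet, subsampling likelihood terms in ∇ log p, amounts to replacing the full-data score with an unbiased minibatch estimate. A minimal sketch on a hypothetical toy model (y_l ∼ N(x, 1) with prior x ∼ N(0, 1)); the model, dataset size, and batch size are illustrative stand-ins, not from the talk:

```python
import numpy as np

def full_score(x, ys):
    # toy model (illustrative): y_l ~ N(x, 1), prior x ~ N(0, 1)
    # grad_x log p(x | y) = -x + sum_l (y_l - x)
    return -x + np.sum(ys - x)

def subsampled_score(x, ys, batch, L):
    # unbiased estimate: grad log prior + (L/|B|) * sum_{l in B} grad log p(y_l | x)
    return -x + (L / len(batch)) * np.sum(ys[batch] - x)

rng = np.random.default_rng(0)
L = 1000
ys = rng.normal(1.0, 1.0, L)
x = 0.5
draws = [subsampled_score(x, ys, rng.choice(L, 50, replace=False), L)
         for _ in range(8000)]
est = np.mean(draws)   # averages to the full-data score, at 5% of the per-draw cost
```

Each draw touches only 50 of the 1000 likelihood terms yet is unbiased for the full score, which is what makes subsampled Stein discrepancies attractive at scale; the open question on the slide is whether convergence control survives the extra variance.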
References I

S. Ahn, A. Korattikara, and M. Welling. Bayesian posterior sampling via stochastic gradient Fisher scoring. In Proc. 29th ICML, ICML'12, 2012.
A. D. Barbour. Stein's method and Poisson process convergence. J. Appl. Probab., (Special Vol. 25A):175–184, 1988. ISSN 0021-9002. A celebration of applied probability.
A. D. Barbour. Stein's method for diffusion approximations. Probab. Theory Related Fields, 84(3):297–322, 1990. ISSN 0178-8051. doi: 10.1007/BF01197887.
L. Baringhaus and N. Henze. A consistent test for multivariate normality based on the empirical characteristic function. Metrika, 35(1):339–348, 1988.
S. Chatterjee and E. Meckes. Multivariate normal approximation using exchangeable pairs. ALEA Lat. Am. J. Probab. Math. Stat., 4:257–283, 2008. ISSN 1980-0436.
S. Chatterjee and Q. Shao. Nonnormal approximation by Stein's method of exchangeable pairs with application to the Curie-Weiss model. Ann. Appl. Probab., 21(2):464–483, 2011. ISSN 1050-5164. doi: 10.1214/10-AAP712.
L. Chen, L. Goldstein, and Q. Shao. Normal approximation by Stein's method. Probability and its Applications. Springer, Heidelberg, 2011. ISBN 978-3-642-15006-7. doi: 10.1007/978-3-642-15007-4.
W. Y. Chen, L. Mackey, J. Gorham, F.-X. Briol, and C. Oates. Stein points. In J. Dy and A. Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 844–853, Stockholmsmässan, Stockholm, Sweden, 10–15 Jul 2018. PMLR.
W. Y. Chen, A. Barp, F.-X. Briol, J. Gorham, M. Girolami, L. Mackey, and C. Oates. Stein point Markov chain Monte Carlo. In K. Chaudhuri and R. Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 1011–1021, Long Beach, California, USA, 09–15 Jun 2019. PMLR. URL http://proceedings.mlr.press/v97/chen19b.html.
K. Chwialkowski, H. Strathmann, and A. Gretton. A kernel test of goodness of fit. In Proc. 33rd ICML, ICML, 2016.
M. A. Erdogdu, L. Mackey, and O. Shamir. Global non-convex optimization with discretized diffusions. In Advances in Neural Information Processing Systems, pages 9694–9703, 2018.
J. Gorham and L. Mackey. Measuring sample quality with Stein's method. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Adv. NIPS 28, pages 226–234. Curran Associates, Inc., 2015.
J. Gorham and L. Mackey. Measuring sample quality with kernels. In ICML, volume 70 of Proceedings of Machine Learning Research, pages 1292–1301. PMLR, 2017.
References II

J. Gorham, A. B. Duncan, S. J. Vollmer, and L. Mackey. Measuring sample quality with diffusions. Ann. Appl. Probab., 29(5):2884–2928, 10 2019. doi: 10.1214/19-AAP1467. URL https://doi.org/10.1214/19-AAP1467.
F. Götze. On the rate of convergence in the multivariate CLT. Ann. Probab., 19(2):724–739, 1991.
J. Huggins and L. Mackey. Random feature Stein discrepancies. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 1903–1913. Curran Associates, Inc., 2018.
W. Jitkrittum, W. Xu, Z. Szabó, K. Fukumizu, and A. Gretton. A linear-time kernel goodness-of-fit test. In Advances in Neural Information Processing Systems, 2017.
A. Korattikara, Y. Chen, and M. Welling. Austerity in MCMC land: Cutting the Metropolis-Hastings budget. In Proc. of 31st ICML, ICML'14, 2014.
Q. Liu and J. Lee. Black-box importance sampling. arXiv:1610.05247, Oct. 2016. To appear in AISTATS 2017.
Q. Liu and D. Wang. Stein variational gradient descent: A general purpose Bayesian inference algorithm. arXiv:1608.04471, Aug. 2016.
Q. Liu, J. Lee, and M. Jordan. A kernelized Stein discrepancy for goodness-of-fit tests. In Proc. of 33rd ICML, volume 48 of ICML, pages 276–284, 2016.
L. Mackey and J. Gorham. Multivariate Stein factors for a class of strongly log-concave distributions. Electron. Commun. Probab., 21:14 pp., 2016. doi: 10.1214/16-ECP15.
E. Meckes. On Stein's method for multivariate normal approximation. In High dimensional probability V: the Luminy volume, volume 5 of Inst. Math. Stat. Collect., pages 153–178. Inst. Math. Statist., Beachwood, OH, 2009. doi: 10.1214/09-IMSCOLL511.
A. Müller. Integral probability metrics and their generating classes of functions. Adv. in Appl. Probab., 29(2):429–443, 1997.
C. J. Oates, M. Girolami, and N. Chopin. Control functionals for Monte Carlo integration. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2016. ISSN 1467-9868. doi: 10.1111/rssb.12185.
Y. Pu, Z. Gan, R. Henao, C. Li, S. Han, and L. Carin. VAE learning via Stein variational gradient descent. In Advances in Neural Information Processing Systems, pages 4237–4246, 2017.
References III

G. Reinert and A. Röllin. Multivariate normal approximation with Stein's method of exchangeable pairs under a general linearity condition. Ann. Probab., 37(6):2150–2173, 2009. ISSN 0091-1798. doi: 10.1214/09-AOP467.
C. Stein. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symposium on Mathematical Statistics and Probability (Univ. California, Berkeley, Calif., 1970/1971), Vol. II: Probability theory, pages 583–602. Univ. California Press, Berkeley, Calif., 1972.
C. Stein, P. Diaconis, S. Holmes, and G. Reinert. Use of exchangeable pairs in the analysis of simulations. In Stein's method: expository lectures and applications, volume 46 of IMS Lecture Notes Monogr. Ser., pages 1–26. Inst. Math. Statist., Beachwood, OH, 2004.
D. Wang and Q. Liu. Learning to draw samples: With application to amortized MLE for generative adversarial learning. arXiv:1611.01722, Nov. 2016.
M. Welling and Y. Teh. Bayesian learning via stochastic gradient Langevin dynamics. In ICML, 2011.
Comparing Discrepancies
[Figure: Left panels, discrepancy value (IMQ KSD, graph Stein discrepancy, Wasserstein) vs. number of sample points n, for samples drawn i.i.d. from the mixture target P and from a single mixture component, with the associated test functions g and h = TP g plotted over x ∈ [−3, 3]; right panels, computation time (sec) vs. n for d = 1 and d = 4.]
Left: Samples drawn i.i.d. from either the bimodal Gaussian mixture target p(x) ∝ e^{−(x+1.5)²/2} + e^{−(x−1.5)²/2} or a single mixture component.
Right: Discrepancy computation time using d cores in d dimensions.
The Importance of Kernel Choice
[Figure: Kernel Stein discrepancy vs. number of sample points n (10² to 10⁵) for Gaussian, Matérn, and inverse multiquadric kernels, in dimensions d = 5, 8, and 20, for samples drawn i.i.d. from the target P and for the off-target sample.]
Target P = N(0, Id)
Off-target Qn has all ‖xi‖₂ ≤ 2n^{1/d} log n, ‖xi − xj‖₂ ≥ 2 log n
Gaussian and Matérn KSDs driven to 0 by an off-target sequence that does not converge to P
IMQ KSD (β = −1/2, c = 1) does not have this deficiency
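The IMQ KSD referenced here admits a compact closed form via the Langevin Stein kernel k₀(x, y) = ⟨s(x), s(y)⟩k + ⟨s(x), ∇_y k⟩ + ⟨s(y), ∇_x k⟩ + tr(∇_x ∇_y k), with s = ∇ log p. A minimal NumPy sketch (my own derivation for the IMQ base kernel, not the authors' code), demonstrated on a standard normal target where s(x) = −x; the sample sizes and the shifted off-target sample are illustrative:

```python
import numpy as np

def imq_ksd(X, score, beta=-0.5, c=1.0):
    """KSD with the IMQ base kernel k(x, y) = (c^2 + ||x - y||^2)^beta."""
    n, d = X.shape
    S = score(X)                                   # rows s(x_i) = grad log p(x_i)
    diff = X[:, None, :] - X[None, :, :]           # pairwise x_i - x_j
    r2 = np.sum(diff ** 2, axis=-1)
    w = c ** 2 + r2
    k = w ** beta
    gx = 2.0 * beta * w[..., None] ** (beta - 1) * diff   # grad_x k(x_i, x_j)
    term_ss = (S @ S.T) * k
    term_sy = -np.einsum('id,ijd->ij', S, gx)      # grad_y k = -grad_x k (radial kernel)
    term_sx = np.einsum('jd,ijd->ij', S, gx)
    trace = (-2.0 * beta * d * w ** (beta - 1)
             - 4.0 * beta * (beta - 1) * w ** (beta - 2) * r2)
    k0 = term_ss + term_sy + term_sx + trace       # Stein kernel matrix
    return np.sqrt(k0.mean())                      # V-statistic estimate of the KSD

# Standard normal target in d = 2: score(x) = -x
rng = np.random.default_rng(0)
on_target = rng.standard_normal((300, 2))
off_target = on_target + 2.0                       # same points shifted away from P
ksd_on = imq_ksd(on_target, lambda X: -X)
ksd_off = imq_ksd(off_target, lambda X: -X)
# the on-target sample should score markedly lower
```

Because the Stein kernel is positive semidefinite, the squared discrepancy is a nonnegative double sum over sample pairs, computable without the normalizing constant of p.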
Selecting Sampler Hyperparameters
Setup [Welling and Teh, 2011]
Consider the posterior distribution P induced by L datapoints yl drawn i.i.d. from a Gaussian mixture likelihood
Yl | X iid∼ (1/2) N(X1, 2) + (1/2) N(X1 + X2, 2)
under Gaussian priors on the parameters X ∈ R²:
X1 ∼ N(0, 10) ⊥⊥ X2 ∼ N(0, 1)
Draw L = 100 datapoints yl with parameters (x1, x2) = (0, 1); this induces a posterior with a second mode at (x1, x2) = (1, −1)
For a range of tolerance parameters ε, run approximate slice sampling for 148,000 datapoint likelihood evaluations and store the resulting posterior sample Qn
Use minimum IMQ KSD (β = −1/2, c = 1) to select an appropriate ε
Compare with a standard MCMC parameter selection criterion, effective sample size (ESS), a measure of Markov chain autocorrelation
Compute the median of each diagnostic over 50 random sequences
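The KSD diagnostic needs only the (unnormalized) posterior score ∇_x log p(x | y). A sketch of that score for this mixture model, derived by hand from the stated likelihood and priors (not code from the talk); the evaluation point and the even data split between components are illustrative:

```python
import numpy as np

def posterior_score(x, ys):
    # likelihood: y ~ (1/2) N(x1, 2) + (1/2) N(x1 + x2, 2)
    # priors:     x1 ~ N(0, 10), x2 ~ N(0, 1)
    x1, x2 = x
    e1 = np.exp(-(ys - x1) ** 2 / 4.0)        # component kernels (shared constants cancel)
    e2 = np.exp(-(ys - x1 - x2) ** 2 / 4.0)
    w1 = e1 / (e1 + e2)                       # per-point component responsibilities
    w2 = 1.0 - w1
    d1 = np.sum(w1 * (ys - x1) / 2.0 + w2 * (ys - x1 - x2) / 2.0) - x1 / 10.0
    d2 = np.sum(w2 * (ys - x1 - x2) / 2.0) - x2
    return np.array([d1, d2])

rng = np.random.default_rng(0)
# illustrative data consistent with (x1, x2) = (0, 1): 50 points from each component
ys = np.concatenate([rng.normal(0.0, np.sqrt(2.0), 50),
                     rng.normal(1.0, np.sqrt(2.0), 50)])
g = posterior_score(np.array([0.5, -0.5]), ys)
```

Note the normalizing constant of the posterior never appears: only sums of per-datapoint gradient terms, which is exactly what makes score-based discrepancies usable here.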
Selecting Samplers
Setup
MNIST handwritten digits [Ahn, Korattikara, and Welling, 2012]
10,000 images, 51 features, binary label indicating whether an image shows a 7 or a 9
Bayesian logistic regression posterior P
L independent observations (yl, vl) ∈ {1, −1} × Rd with
P(Yl = 1 | vl, X) = 1 / (1 + exp(−⟨vl, X⟩))
Flat improper prior on the parameters X ∈ Rd
Use IMQ KSD (β = −1/2, c = 1) to compare SGFS-f to SGFS-d, drawing 10⁵ sample points and discarding the first half as burn-in
For external support, compare bivariate marginal means and 95% confidence ellipses with a surrogate ground truth Hamiltonian Monte Carlo chain of 10⁵ sample points [Ahn, Korattikara, and Welling, 2012]
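For this Bayesian logistic regression target with labels in {+1, −1} and a flat prior, the score that feeds the KSD has a simple closed form. A minimal sketch on hypothetical synthetic data (the feature matrix V, dimensions, and true parameter are stand-ins for illustration, not the MNIST features):

```python
import numpy as np

def logistic_score(x, V, y):
    # flat prior, so grad_x log p(x | data) = grad_x log-likelihood
    # log-likelihood: sum_l -log(1 + exp(-y_l <v_l, x>)), y_l in {+1, -1}
    z = y * (V @ x)                        # y_l <v_l, x>
    return V.T @ (y / (1.0 + np.exp(z)))   # sum_l y_l v_l sigma(-y_l <v_l, x>)

# tiny synthetic check: d = 3 features, L = 200 observations (illustrative)
rng = np.random.default_rng(0)
V = rng.standard_normal((200, 3))
x_true = np.array([1.0, -2.0, 0.5])
y = np.where(rng.random(200) < 1.0 / (1.0 + np.exp(-V @ x_true)), 1.0, -1.0)
g = logistic_score(x_true, V, y)
```

Each likelihood term contributes y_l v_l weighted by the probability the model assigns to the opposite label, so the score is a single matrix-vector pass over the data.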
The Importance of Tightness
Goal: Show S(Qn, TP, Gk) → 0 only if Qn converges to P

A sequence (Qn)n≥1 is uniformly tight if for every ε > 0, there is a finite number R(ε) such that supn Qn(‖X‖₂ > R(ε)) ≤ ε
Intuitively, no mass in the sequence escapes to infinity

Theorem (KSD detects tight non-convergence [Gorham and Mackey, 2017])
Suppose that P ∈ P and k(x, y) = Φ(x − y) for Φ ∈ C² with a non-vanishing generalized Fourier transform. If (Qn)n≥1 is uniformly tight and S(Qn, TP, Gk) → 0, then (Qn)n≥1 converges weakly to P.

Good news, but, ideally, KSD would detect non-tight sequences automatically...