
Page 1: Generalized inferential models: basics and beyond

Generalized inferential models: basics and beyond

Ryan Martin [1]

North Carolina State University / Researchers.One

Statistics Seminar, Northwestern Polytechnical University, China

November 19, 2021

[1] www4.stat.ncsu.edu/~rmartin

Page 2: Generalized inferential models: basics and beyond

Introduction

Statistics aims to give reliable/valid uncertainty quantification about unknowns based on data, models, etc.

Two dominant schools of thought:

- frequentist
- Bayesian

Both have familiar pros and cons...

Do we have to accept the cons? Can’t we just have all pros?

e.g., Efron (2013):

“Perhaps the most important unresolved problem in statistical inference is the use of Bayes theorem in the absence of prior information.”

Page 3: Generalized inferential models: basics and beyond

Introduction, cont.

Chuanhai Liu [2] and I developed a pros-focused approach.

Objectives:

- data-dependent “probabilities,” without priors
- calibration properties to make inference reliable

Our framework: inferential models (IMs).

Some similarities to what Fisher & others did.

Key difference:

reliability requires “probabilities” to be imprecise

[2] https://www.stat.purdue.edu/people/faculty/chuanhai.html

Page 4: Generalized inferential models: basics and beyond

This talk

Background / inferential models (IMs)

Generalized IMs:

- easier construction
- still valid

Applications:

- meta-analysis
- survival analysis

Next-generation generalized IMs...

Page 5: Generalized inferential models: basics and beyond

Inferential models

Observable data: Y in sample space 𝕐

Statistical model: Y ∼ P_{Y|θ}, θ in parameter space Θ

Goal: learn about unknown θ from observed data, y

That is, quantify uncertainty about θ based on y

- Bayes, Fisher, and others use probability distributions
- Dempster & Shafer use “belief functions”

These are special cases of IMs...

Page 6: Generalized inferential models: basics and beyond

IMs, cont.

Mathematically, an IM is a mapping that takes data y to a pair of lower and upper probabilities [3][4]:

Π̲_y(A) = degree of belief in “θ ∈ A”

Π̄_y(A) = degree of plausibility of “θ ∈ A”

→ Probabilities are additive, Π̲_y = Π̄_y.

→ Belief functions, etc., are non-additive, Π̲_y ≤ Π̄_y.

Clearly lots of options; how to choose?

Recommend an IM that’s “statistically reliable”

[3] Technically, these are super/sub-additive and monotone capacities.
[4] Linked via the duality Π̄_y(A) = 1 − Π̲_y(Aᶜ).

Page 7: Generalized inferential models: basics and beyond

IMs, cont.

“Reliable” in what sense?

Basic principle: if Π̲_Y(A) is large, infer A.

Reid & Cox: “it is unacceptable if a procedure... of representing uncertain knowledge would, if used repeatedly, give systematically misleading conclusions.”

We don’t want, e.g., Π̲_Y(A) to be large if A is false.

Idea: require that y ↦ Π̲_y(·) satisfy

{θ ∉ A and Y ∼ P_{Y|θ}} ⟹ Π̲_Y(A) tends to be small.

Page 8: Generalized inferential models: basics and beyond

IMs, cont.

Definition.

An IM y ↦ (Π̲_y, Π̄_y) is valid if

sup_{θ∉A} P_{Y|θ}{Π̲_Y(A) > 1 − α} ≤ α, for all A ⊆ Θ, α ∈ [0, 1].

Validity controls the frequency at which the IM assigns relatively high beliefs to false assertions.

There’s an equivalent statement in terms of Π̄_y:

sup_{θ∈A} P_{Y|θ}{Π̄_Y(A) ≤ α} ≤ α, for all A ⊆ Θ, α ∈ [0, 1].

False confidence theorem [5]: additive IMs can’t be valid.

[5] Balch, M., and Ferson, arXiv:1706.08565

Page 9: Generalized inferential models: basics and beyond

IMs, cont.

Theorem.

If (Π̲_Y, Π̄_Y) are valid, then derived procedures control error rates:

- “reject H₀: θ ∈ A if Π̄_y(A) ≤ α” is a size-α test
- the 100(1 − α)% plausibility region {ϑ : Π̄_y({ϑ}) > α} has coverage probability ≥ 1 − α

IM validity ⟹ usual frequentist validity

The connection is mutually beneficial:

- IMs help with interpretation of frequentist output
- calibration makes the IM’s (Π̲_y, Π̄_y) real-world relevant
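Concretely, the derived procedures are easy to read off a computed contour. A minimal sketch (my own illustration, not from the slides), for a contour evaluated on a grid of candidate parameter values:

```python
# Sketch only: derived test and plausibility region from a grid-evaluated contour.
import numpy as np

def plausibility_region(grid, contour, alpha=0.05):
    """100(1 - alpha)% plausibility region: grid points with contour > alpha."""
    return grid[contour > alpha]

def reject_H0(contour_on_A, alpha=0.05):
    """Reject H0: theta in A iff the upper probability of A is <= alpha;
    for a contour, the upper probability of A is the sup of the contour over A."""
    return np.max(contour_on_A) <= alpha
```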


Page 10: Generalized inferential models: basics and beyond

IMs, cont.

How to construct a valid IM?

A. Associate: Y = a(θ, U), U ∼ P_U.
P. Predict: use a random set 𝒰 to “guess” the unobserved U.
C. Combine: form the data-dependent random set

Θ_y(𝒰) = ⋃_{u∈𝒰} {ϑ : y = a(ϑ, u)},

which leads to lower and upper probabilities

Π̲_y(A) = P_𝒰{Θ_y(𝒰) ⊆ A}, Π̄_y(A) = P_𝒰{Θ_y(𝒰) ∩ A ≠ ∅}.

This is what the book is about!

Page 11: Generalized inferential models: basics and beyond

IMs, cont.

Problems first considered by Fisher:

- scalar Y ∼ P_{Y|θ} and scalar θ
- continuous distribution function F_θ
- range of F_θ(y) unconstrained by fixed y or fixed θ

IM construction [6]:

A. Y = F_θ⁻¹(U), U ∼ P_U = Unif(0, 1)
P. 𝒰 = {u ∈ [0, 1] : |u − 0.5| ≤ |Unif(0, 1) − 0.5|}
C. Θ_y(𝒰) = {ϑ : F_ϑ(y) ∈ 𝒰}, with plausibility contour

π_y(ϑ) = P_𝒰{Θ_y(𝒰) ∋ ϑ} = 1 − |2 F_ϑ(y) − 1|.

Lots of examples can be covered by this analysis.

[6] For original details, see M. and Liu, arXiv:1206.4091
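Since the contour has a closed form here, it is cheap to compute. A minimal sketch of my own, using scipy’s CDFs (taking Gamma(θ, 1) to mean shape θ, an assumption), that reproduces the two contours plotted on the next slide:

```python
# Sketch: single-observation plausibility contour pi_y(theta) = 1 - |2 F_theta(y) - 1|.
import numpy as np
from scipy import stats

def plaus_contour(y, cdf, grid):
    F = np.array([cdf(y, th) for th in grid])
    return 1.0 - np.abs(2.0 * F - 1.0)

# Cauchy(theta, 1) with y = 0
grid_c = np.linspace(-30, 30, 601)
pl_cauchy = plaus_contour(0.0, lambda y, th: stats.cauchy.cdf(y, loc=th), grid_c)

# Gamma(theta, 1) with y = 1, theta treated as the shape parameter
grid_g = np.linspace(0.01, 7, 400)
pl_gamma = plaus_contour(1.0, lambda y, th: stats.gamma.cdf(y, a=th), grid_g)
```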

Page 12: Generalized inferential models: basics and beyond

IMs, cont.

Two examples: Cauchy(θ, 1) and Gamma(θ, 1)

Plots below show the plausibility contour, π_y(ϑ).

How is this used?

- confidence interval: {ϑ : π_y(ϑ) > α}
- upper probabilities: Π̄_y(A) = sup_{ϑ∈A} π_y(ϑ)

[Figure: plausibility contours π_y(ϑ) vs. ϑ. (a) y = 0, Cauchy; (b) y = 1, gamma.]

Page 13: Generalized inferential models: basics and beyond

IMs, cont.

Despite the IM’s nice features, practical challenges can arise.

Basic issue: the A-step must determine P_{Y|θ}.

Challenges:

- efficiency-motivated auxiliary variable dimension reduction [7]
- eliminating nuisance parameters [8]
- (big) requires a fully-specified statistical model...

Formal remedies are difficult to carry out.

Idea: do these dimension-reduction-related tasks less formally before starting the IM construction.

Leads to a generalized IM...

[7] M. and Liu, arXiv:1211.1530
[8] M. and Liu, arXiv:1306.3092

Page 14: Generalized inferential models: basics and beyond

Generalized IMs

The idea [9] is to connect some function of (Y, θ) to an auxiliary variable with known distribution.

Let T_{y,ϑ} be a real-valued function of (y, ϑ).

Good example to keep in mind:

T_{y,ϑ} = L_y(ϑ) / L_y(θ̂_y), the relative likelihood.

Note: the value of T_{Y,θ} does not determine (Y, θ).

So, an association in terms of T_{Y,θ} amounts to a “loss of information” in a sense that turns out to be irrelevant.

[9] M., arXiv:1203.6665 and arXiv:1511.06733

Page 15: Generalized inferential models: basics and beyond

Generalized IMs, cont.

Generalized association:

T_{Y,θ} = F_θ⁻¹(U), U ∼ Unif(0, 1),

where F_θ(t) = P_{Y|θ}(T_{Y,θ} ≤ t), t ∈ ℝ.

Unlike before, the generalized association doesn’t determine the distribution of Y, but that’s not important.

Key benefits:

- U is a scalar, so no dimension reduction is needed!
- the ordering in ϑ ↦ T_{y,ϑ} suggests a particular random set 𝒰

Page 16: Generalized inferential models: basics and beyond

Generalized IMs, cont.

Generalized IM construction.

A. T_{Y,θ} = F_θ⁻¹(U) for U ∼ Unif(0, 1).
P. Introduce a suitable random set 𝒰 on [0, 1].
C. Combine to get a new random set on Θ:

Θ_y(𝒰) = {ϑ : F_ϑ(T_{y,ϑ}) ∈ 𝒰}.

For the special case 𝒰 = [U′, 1] with U′ ∼ Unif(0, 1), some simplification is possible:

π_y(ϑ) = P_𝒰{Θ_y(𝒰) ∋ ϑ} = F_ϑ(T_{y,ϑ}), ϑ ∈ Θ.

Immediately gives valid, prior-free probabilistic inference across a wide range of problems!
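For intuition, here is a hedged Monte Carlo sketch of that contour with T the relative likelihood; the helper names (loglik, sample) are hypothetical, and the MLE is approximated crudely by maximizing over the grid:

```python
# Sketch: Monte Carlo approximation of pi_y(theta) = F_theta(T_{y,theta}).
import numpy as np

def gim_contour(y, loglik, sample, grid, M=2000, rng=None):
    """loglik(data, theta): log-likelihood; sample(theta, rng): draw Y ~ P_{Y|theta}."""
    rng = rng or np.random.default_rng()
    obs_max = max(loglik(y, th) for th in grid)        # crude grid-based MLE
    contour = np.empty(len(grid))
    for j, th in enumerate(grid):
        t_obs = loglik(y, th) - obs_max                # observed log T_{y,theta}
        t_sim = np.empty(M)
        for m in range(M):
            ynew = sample(th, rng)
            t_sim[m] = loglik(ynew, th) - max(loglik(ynew, tt) for tt in grid)
        contour[j] = np.mean(t_sim <= t_obs)           # Monte Carlo F_theta
    return contour
```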


Page 17: Generalized inferential models: basics and beyond

Generalized IMs, cont.

Simple binomial example.

Left: plot of π_y(ϑ) based on (n, y) = (25, 15)

Right: GIM’s and Clopper–Pearson’s coverage probability.
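A self-contained sketch of that binomial contour (my own illustration, taking T to be the relative likelihood as on the earlier slides):

```python
# Sketch: GIM plausibility contour for Binomial(n, theta), (n, y) = (25, 15).
import numpy as np
from scipy import stats

n, y = 25, 15
grid = np.linspace(0.01, 0.99, 99)
M = 5000
rng = np.random.default_rng(1)

def log_rel_lik(yy, th):
    # log relative likelihood; MLE of theta is yy/n, clipped away from 0 and 1
    phat = np.clip(yy / n, 1e-9, 1 - 1e-9)
    return stats.binom.logpmf(yy, n, th) - stats.binom.logpmf(yy, n, phat)

contour = np.empty(len(grid))
for j, th in enumerate(grid):
    t_obs = log_rel_lik(y, th)
    t_sim = log_rel_lik(rng.binomial(n, th, size=M), th)
    contour[j] = np.mean(t_sim <= t_obs)

interval = grid[contour > 0.05]    # approximate 95% plausibility interval
```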


Page 18: Generalized inferential models: basics and beyond

Generalized IMs, cont.

Too good to be true?

Computation of F_θ can be challenging.

Lots of sampling-based methods are available for this.

To evaluate π_y(ϑ) on a grid:

- do a separate Monte Carlo run at each grid point
- Monte Carlo + importance sampling adjustments
- other things... [10]

Better/general strategies for GIM computation would be an interesting and welcome contribution!

[10] Syring and M., arXiv:2103.02659

Page 19: Generalized inferential models: basics and beyond

Generalized IMs, cont.

Often we’re only interested in some feature of θ.

Split θ = (φ, λ), interest in φ.

Now the idea is to connect a function of (Y, φ) to an auxiliary variable with known distribution.

Let T_{y,ϕ} be a real-valued function of (y, ϕ).

Good example to keep in mind:

T_{y,ϕ} = L_y(ϕ, λ̂_ϕ) / L_y(φ̂, λ̂), the relative profile likelihood.

Page 20: Generalized inferential models: basics and beyond

Generalized IMs, cont.

Generalized association:

T_{Y,φ} = F_{φ,λ}⁻¹(U), U ∼ Unif(0, 1),

where F_{φ,λ}(t) = P_{Y|φ,λ}(T_{Y,φ} ≤ t).

T_{Y,φ} doesn’t directly depend on λ, but its distribution does.

If λ were known, or if the dependence on λ dropped out, [11] then this would be exactly like before.

That is, we would end up with

“π_y(ϕ)” = F_{ϕ,λ}(T_{y,ϕ}), ϕ ∈ φ(Θ).

[11] e.g., bivariate normal with φ the correlation and λ the means and variances

Page 21: Generalized inferential models: basics and beyond

Generalized IMs, cont.

Natural idea is to use a plug-in estimate.

Define λ̂_ϕ = arg max_λ L_y(ϕ, λ).

The generalized IM has (plug-in) plausibility contour

π_y(ϕ) = F_{ϕ,λ̂_ϕ}(T_{y,ϕ}).

Plug-in means it can’t be exactly valid, but one can usually prove asymptotic validity, i.e.,

lim_{n→∞} P_{Yⁿ|φ,λ}{π_{Yⁿ}(φ) ≤ α} = α.

Open question: empirically, this convergence is very fast, but is there a built-in “higher-order accuracy”?
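To make the plug-in recipe concrete, here is a toy sketch of my own (not from the talk): normal data with interest parameter φ the mean, nuisance λ the variance, the relative profile likelihood as T, and Monte Carlo for F_{ϕ,λ̂_ϕ}:

```python
# Toy sketch: plug-in GIM contour for a normal mean with unknown variance.
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(1.0, 2.0, size=20)                  # toy data
n, M = len(y), 2000

def log_rel_profile(data, phi):
    # relative profile log-likelihood for the mean: the variance profiled
    # at phi is mean((data - phi)^2); the global MLE variance is var(data)
    return -0.5 * len(data) * (np.log(np.mean((data - phi) ** 2))
                               - np.log(np.var(data)))

grid = np.linspace(y.mean() - 3, y.mean() + 3, 121)
contour = np.empty(len(grid))
for j, phi in enumerate(grid):
    t_obs = log_rel_profile(y, phi)
    lam_hat = np.mean((y - phi) ** 2)              # plug-in nuisance value
    t_sim = np.array([log_rel_profile(rng.normal(phi, np.sqrt(lam_hat), n), phi)
                      for _ in range(M)])
    contour[j] = np.mean(t_sim <= t_obs)
```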


Page 22: Generalized inferential models: basics and beyond

Applications

Two recent applications of generalized IMs:

1. Meta-analysis with few studies. [12]
2. Survival analysis. [13]

Both involve nuisance parameters and non-trivial computation.

Generalized IM methods outperform existing methods.

Below are some details for each in turn.

[12] Cahoon and M., arXiv:1910.00533
[13] Cahoon and M., arXiv:1912.00037

Page 23: Generalized inferential models: basics and beyond

Meta-analysis

It’s natural in science for multiple researchers to carry out their own analyses related to the same question.

Pool these separate analyses into a “meta-analysis”?

K independent studies produce data (Y_k, σ_k²):

- Y_k is an estimate of µ from study-k data
- σ_k is the study-k standard error, treated as fixed

Basic model: Y_k ∼ N(µ, ν + σ_k²), k = 1, ..., K.

ν > 0 is the across-study variance, unknown.

Goal is inference on µ.
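A hedged sketch of the model and its profile-likelihood ingredient (my own toy code, not the paper’s implementation), with ν profiled out numerically:

```python
# Toy sketch: meta-analysis model Y_k ~ N(mu, nu + sigma_k^2) and the
# profile negative log-likelihood for mu.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(42)
K, mu_true, nu_true = 5, 0.5, 0.2
sigma2 = rng.uniform(0.05, 0.3, size=K)             # within-study variances, fixed
y = rng.normal(mu_true, np.sqrt(nu_true + sigma2))  # one estimate per study

def neg_loglik(mu, nu):
    v = nu + sigma2
    return 0.5 * np.sum(np.log(v) + (y - mu) ** 2 / v)

def profile_neg_loglik(mu):
    # profile out the between-study variance nu >= 0 numerically
    return minimize_scalar(lambda nu: neg_loglik(mu, nu),
                           bounds=(0.0, 10.0), method="bounded").fun
```

The relative profile likelihood exp{profile_neg_loglik(µ̂) − profile_neg_loglik(µ)} then plays the role of T_{y,µ}, with its distribution approximated by Monte Carlo as before.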


Page 24: Generalized inferential models: basics and beyond

Meta-analysis, cont.

Estimating µ is easy; valid inference on µ is difficult.

The challenge comes from the nuisance parameter ν.

Asymptotic confidence intervals for µ are available, [14] but these require large K.

I was unsuccessful trying to work out basic IM details for this, but the aforementioned generalized IM works.

Take-away messages:

- probabilistic inference on µ
- asymptotically valid, empirically accurate for small K
- outperforms other methods we tried

[14] DerSimonian & Laird is the classic

Page 25: Generalized inferential models: basics and beyond

Meta-analysis, cont.

Left: 3 individual plausibility contours & combined

Right: empirical CDF of π_Y(µ) for K = 5 studies.

Page 26: Generalized inferential models: basics and beyond

Meta-analysis, cont.

Simulation comparison of GIM against competitors.

As functions of K, compare 95% CIs for µ:

- coverage probability (left)
- average length (right)

e.g., GIM (black), oracle (green), DL (purple)

Page 27: Generalized inferential models: basics and beyond

Survival analysis

Data may be incomplete in some applications.

e.g., in time-to-event studies, event times may be censored.

Survival analysis deals with such things.

Basic right-censoring model:

- X_i ∼ H_φ and C_i ∼ G, with θ = (φ, G)
- T_i = X_i ∧ C_i, D_i = 1(X_i ≤ C_i)

Data: Y_i = (T_i, D_i) for i = 1, ..., n.

Goal is inference on φ

G is an (infinite-dim) nuisance parameter.
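For concreteness, a small simulation of this setup (the distributional choices are my own illustration, not the paper’s):

```python
# Sketch: simulate right-censored survival data Y_i = (T_i, D_i).
import numpy as np

rng = np.random.default_rng(0)
n = 100
shape, scale = 1.5, 2.0                  # phi = (shape, scale) for H_phi
x = scale * rng.weibull(shape, size=n)   # event times X_i ~ H_phi (Weibull)
c = rng.exponential(3.0, size=n)         # censoring times C_i ~ G (exponential)
t = np.minimum(x, c)                     # T_i = X_i ∧ C_i
d = (x <= c).astype(int)                 # D_i = 1(X_i <= C_i)
```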


Page 28: Generalized inferential models: basics and beyond

Survival analysis, cont.

There’s a likelihood function, hence MLEs.

Asymptotic normality of the MLE φ̂ can be used for inference.

Bayesian methods are also available.

I was unsuccessful trying to work out basic IM details for this, but the aforementioned generalized IM works.

Take-away messages:

- probabilistic inference on φ
- asymptotically valid, empirically accurate for small n
- outperforms other methods we tried

Page 29: Generalized inferential models: basics and beyond

Survival analysis, cont.

Some computational details:

- Monte Carlo to evaluate π_y(ϕ)
- to simulate censoring times for the Monte Carlo, use the plug-in Kaplan–Meier Ĝ, the nonparametric MLE (see the sketch below)

Some theoretical details:

- challenging to handle the infinite-dimensional plug-in Ĝ
- works because Ĝ is root-n consistent
- empirical results suggest higher-order accuracy...
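A minimal Kaplan–Meier sketch of my own (no tie handling, which is adequate for continuous times): to estimate the censoring distribution G, flip the indicator so that censorings are the “events”:

```python
# Sketch: Kaplan-Meier estimate of the censoring survival function.
import numpy as np

rng = np.random.default_rng(0)
x = 2.0 * rng.weibull(1.5, size=100)        # toy event times
c = rng.exponential(3.0, size=100)          # toy censoring times
t, d = np.minimum(x, c), (x <= c).astype(int)

def km_survival(times, event):
    """Kaplan-Meier survival estimate at the sorted observation times."""
    order = np.argsort(times)
    times, event = times[order], event[order]
    at_risk = len(times) - np.arange(len(times))   # risk-set sizes n, n-1, ..., 1
    return times, np.cumprod(1.0 - event / at_risk)

# a censored observation (d = 0) is an "event" for G
grid_t, G_hat_surv = km_survival(t, 1 - d)
```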


Page 30: Generalized inferential models: basics and beyond

Survival analysis, cont.

H_φ is Weibull with φ = (shape α, scale β)

Compare coverage probability with existing methods

GIM (black), MLE (red), Bayes (green)

Not an extensive simulation...


Page 31: Generalized inferential models: basics and beyond

Survival analysis, cont.

Real-data example:

- chemical concentrations in soil
- left-censored: the measuring instrument has limited precision
- H_φ is log-normal, φ = (µ, σ²)

Plots:

- left: joint plausibility contour for (µ, σ²)
- right: derived marginal for ψ = exp(µ + σ²/2)

Page 32: Generalized inferential models: basics and beyond

Next-gen GIMs

Generalized IMs relaxed the basic IM’s construction by not requiring the association to determine the model for Y.

But it still requires a statistical model.

In machine learning applications, it’s common to work without a statistical model.

This has been a barrier for IMs and other probabilistic inference frameworks.

It turns out the generalized IM can be generalized even further to cover certain no-model cases.

I’ll talk briefly about two such extensions.

Page 33: Generalized inferential models: basics and beyond

No-model inference

Often the quantity of interest isn’t a model parameter.

Common situation: θ = arg min_ϑ E{ℓ_ϑ(Y)}, where the loss could be squared error, classification error, etc.

Analogue of the relative likelihood:

T_{y,ϑ} = e^{−{R_y(ϑ) − R_y(θ̂_y)}}, where R_y(ϑ) = n⁻¹ ∑_{i=1}^n ℓ_ϑ(y_i)

and θ̂_y is the empirical risk minimizer.

Similar in principle to the generalized IM before, but there are several new challenges. [15]

e.g., the bootstrap is needed to compute the distribution of T_{Y,θ}

Asymptotically valid under regularity conditions.

[15] Cella and M., almost done...
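A hedged sketch of one way this can look, with squared-error loss so that θ is a mean; the bootstrap recentering below (the observed minimizer plays the role of the true θ in each resample) is my reading of the bootstrap step, not necessarily the paper’s:

```python
# Sketch: no-model plausibility contour via excess empirical risk + bootstrap.
import numpy as np

rng = np.random.default_rng(0)
y = 1.0 + rng.standard_normal(50)      # toy data; squared error => theta = E[Y]
n, B = len(y), 1000

def excess_risk(data, th):
    # R(th) - R(theta_hat) under squared-error loss; theta_hat = sample mean
    return np.mean((data - th) ** 2) - np.mean((data - data.mean()) ** 2)

t_boot = np.array([np.exp(-excess_risk(rng.choice(y, size=n, replace=True),
                                       y.mean())) for _ in range(B)])

grid = np.linspace(0.0, 2.0, 81)
contour = np.array([np.mean(t_boot <= np.exp(-excess_risk(y, th))) for th in grid])
```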

Page 34: Generalized inferential models: basics and beyond

No-model inference, cont.

Quantile regression is an important example.

Conditional τ-th quantile: Q_τ(Y | x) = x⊤β_τ

“Check loss” defines the risk minimization problem; a minimal sketch follows below.
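For reference, the check loss and the empirical risk it defines (my own illustration):

```python
# Sketch: check (pinball) loss rho_tau and the quantile-regression risk.
import numpy as np

def check_loss(u, tau):
    """rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

def empirical_risk(beta, X, y, tau):
    """Average check loss for the linear model Q_tau(Y | x) = x^T beta."""
    return np.mean(check_loss(y - X @ beta, tau))
```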


Page 35: Generalized inferential models: basics and beyond

No-model prediction

Prediction of future observations is important.

Can do model-based IM prediction. [16]

Model assumptions are restrictive in applications, so what about a no-model version?

Recent development using a new type of generalized IM. [17]

Only assumes exchangeability, provably valid.

Close connections to conformal prediction. [18]

[16] M. and Lingham, arXiv:1403.7589
[17] Cella and M., https://researchers.one/articles/20.01.00010
[18] Vovk, Shafer, etc.

Page 36: Generalized inferential models: basics and beyond

No-model prediction, cont.

Latest developments for supervised learning, e.g., regression. [19]

Very general relationships between response and predictors.

Leads to valid probabilistic inference!

[19] Cella and M., almost done...

Page 37: Generalized inferential models: basics and beyond

Conclusion

The IM framework is a promising potential solution to Efron’s “most important unresolved problem”.

The necessary breakthrough for getting past the old Bayesian-vs.-frequentist debates is the importance of non-additivity/imprecision.

I didn’t fully appreciate how fundamental imprecision was until after the book was finished.

So I’ve written quite a bit about this recently...

Page 38: Generalized inferential models: basics and beyond

Conclusion, cont.

Two recent papers about validity and the importance of imprecision/non-additivity.

Page 39: Generalized inferential models: basics and beyond

Conclusion, cont.

The generalized IM framework is also powerful; in some sense it might be the “right” way to do IMs.

I’m excited about the latest developments.

Tons of potential applications!

Open questions/problems:

- General computational strategies?
- Possible higher-order accuracy?
- Anything lost in the move IM → GIM?
- Likelihood-based inference is “optimal,” so can GIMs shed light on “optimal IM constructions”?

Page 40: Generalized inferential models: basics and beyond

The end

Thanks for your attention!

Questions? [email protected]

Links to papers: www4.stat.ncsu.edu/~rmartin/

100% open peer-review & publication: https://researchers.one

www.twitter.com/ResearchersOne