

Challenges in Fiducial Inference

Parts of this talk are joint work with T. C. M. Lee (UC Davis), Randy Lai (U of Maine), H. Iyer (NIST), J. Williams, and Y. Cui (UNC)

BFF 2017

Jan Hannig

University of North Carolina at Chapel Hill

NSF support acknowledged.


Outline

Introduction

Definition

Sparsity

Regularization

Conclusions


Introduction

Fiducial?

▶ Oxford English Dictionary
▶ adjective, technical: (of a point or line) used as a fixed basis of comparison
▶ origin: from Latin fiducia ‘trust, confidence’
▶ Merriam-Webster dictionary
1. taken as a standard of reference: a fiducial mark
2. founded on faith or trust
3. having the nature of a trust: fiduciary


Aim of this talk

▶ Explain the definition of the generalized fiducial distribution
▶ Challenge of extra information:
▶ Sparsity
▶ Regularization
▶ My point of view: frequentist
▶ Justified using asymptotic theorems and simulations
▶ GFI tends to work well


Definition

Comparison to likelihood

▶ The density is the function f(x, ξ), where ξ is fixed and x is variable.
▶ The likelihood is the function f(x, ξ), where ξ is variable and x is fixed.
▶ Likelihood as a distribution?


General Definition

▶ Data generating equation: X = G(U, ξ), e.g., Xi = µ + σUi
▶ A distribution on the parameter space is a generalized fiducial distribution (GFD) if it can be obtained as a limit (as ε ↓ 0) of

$$\operatorname*{arg\,min}_{\xi} \|x - G(U^{\star}, \xi)\| \;\Big|\; \Big\{ \min_{\xi} \|x - G(U^{\star}, \xi)\| \le \varepsilon \Big\} \tag{1}$$

▶ Similar to ABC; generating from the prior is replaced by the min.
▶ Is this practical? Can we compute it?
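To make the definition concrete, here is a minimal, ABC-flavored sketch of the ε-approximation in (1), written by me for illustration rather than taken from the talk. It targets the simple location-scale equation Xi = µ + σUi with Ui ∼ N(0,1): draw a fresh U⋆, minimize ∥x − µ − σU⋆∥ over (µ, σ) by least squares, and keep the minimizer whenever the minimum falls below ε. The tolerance, sample sizes, and the rejection of negative σ̂ are my own ad hoc choices.

```python
# Sketch of the epsilon-approximation to the GFD in (1) for X_i = mu + sigma*U_i, U_i ~ N(0,1).
# Illustrative only: eps, n, and n_draws are arbitrary choices, not values from the talk.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.5, size=5)            # small n keeps acceptances feasible

def fiducial_abc(x, eps, n_draws=100_000):
    n, keep = len(x), []
    for _ in range(n_draws):
        u = rng.normal(size=n)                        # fresh copy U*
        A = np.column_stack([np.ones(n), u])
        theta = np.linalg.lstsq(A, x, rcond=None)[0]  # argmin of ||x - mu - sigma*u|| over (mu, sigma)
        resid = np.linalg.norm(x - A @ theta)
        if resid <= eps and theta[1] > 0:             # keep the minimizer when the fit is close enough
            keep.append(theta)
    return np.array(keep)

draws = fiducial_abc(x, eps=1.0)
print(len(draws), "accepted draws;",
      "mean (mu, sigma) =", draws.mean(axis=0).round(3) if len(draws) else "increase eps")
```

Acceptances become vanishingly rare as n grows and ε shrinks, which is exactly why the explicit density on the next slide matters in practice.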


Explicit limit (1)

▶ Assume $X \in \mathbb{R}^n$ is continuous; the parameter is $\xi \in \mathbb{R}^p$
▶ The limit in (1) has density (H, Iyer, Lai & Lee, 2016)

$$r(\xi \mid x) = \frac{f_X(x \mid \xi)\, J(x, \xi)}{\int_{\Xi} f_X(x \mid \xi')\, J(x, \xi')\, d\xi'}, \qquad J(x, \xi) = D\!\left(\frac{d}{d\xi}\, G(u, \xi)\,\Big|_{u = G^{-1}(x, \xi)}\right)$$

▶ n = p gives $D(A) = |\det A|$
▶ $\|\cdot\|_2$ gives $D(A) = (\det A^{\top}A)^{1/2}$; compare to Fraser, Reid, Marras & Yi (2010)
▶ $\|\cdot\|_{\infty}$ gives $D(A) = \sum_{i = (i_1, \ldots, i_p)} |\det (A)_i|$, the sum over all $p \times p$ submatrices $(A)_i$ of $A$
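To make the three choices of D(·) concrete, here is a small numerical illustration of my own (not from the talk): for a random n × p matrix standing in for d/dξ G(u, ξ), it evaluates the L2 version √det(AᵀA) and the L∞ version, the sum of |det| over all p × p submatrices, and checks that both reduce to |det A| when n = p.

```python
# Numerical illustration of the D(.) functionals in J(x, xi); the random matrix A
# stands in for d/dxi G(u, xi).
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 2
A = rng.normal(size=(n, p))

D_l2 = np.sqrt(np.linalg.det(A.T @ A))                          # ||.||_2 choice
D_linf = sum(abs(np.linalg.det(A[list(rows), :]))               # ||.||_inf choice:
             for rows in itertools.combinations(range(n), p))   # sum over p x p submatrices
print(f"L2: {D_l2:.4f}   Linf: {D_linf:.4f}")                   # Cauchy-Binet implies L2 <= Linf

# When n = p both reduce to |det A|.
B = rng.normal(size=(p, p))
assert np.isclose(np.sqrt(np.linalg.det(B.T @ B)), abs(np.linalg.det(B)))
```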


Example -- Linear Regression

▶ Data generating equation: Y = Xβ + σZ
▶ $\frac{d}{d\theta} Y = (X, Z)$ and $Z = (Y - X\beta)/\sigma$
▶ The $L_2$ Jacobian is

$$J(y, \beta, \sigma) = \left(\det\!\left[\Big(X,\ \tfrac{y - X\beta}{\sigma}\Big)^{\!\top}\Big(X,\ \tfrac{y - X\beta}{\sigma}\Big)\right]\right)^{1/2} = \sigma^{-1}\, |\det(X^{\top}X)|^{1/2}\, (\mathrm{RSS})^{1/2}$$

▶ The fiducial distribution happens to be the same as the independence Jeffreys posterior, with an explicit normalizing constant
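Since the GFD here coincides with the independence Jeffreys posterior, it can be sampled exactly: σ² | y is RSS/χ²_{n−p} and β | σ, y is N(β̂, σ²(XᵀX)⁻¹). The sketch below is my own illustration of that standard form on simulated data.

```python
# Exact sampling of the GFD for (beta, sigma) in Y = X beta + sigma Z, using the
# independence-Jeffreys form it coincides with.  Simulated data for illustration.
import numpy as np

rng = np.random.default_rng(42)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true, sigma_true = np.array([1.0, 2.0, -0.5]), 0.7
y = X @ beta_true + sigma_true * rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
rss = float(np.sum((y - X @ beta_hat) ** 2))

n_draws = 10_000
sigma2 = rss / rng.chisquare(df=n - p, size=n_draws)            # sigma^2 | y
L = np.linalg.cholesky(XtX_inv)
beta = beta_hat + np.sqrt(sigma2)[:, None] * (rng.normal(size=(n_draws, p)) @ L.T)  # beta | sigma, y

lo, hi = np.quantile(beta[:, 1], [0.025, 0.975])                # 95% fiducial CI for the first slope
print(f"beta_1 = {beta_hat[1]:.3f}, 95% fiducial CI = ({lo:.3f}, {hi:.3f})")
```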


Example -- Uniform(θ, θ²)

▶ Xi i.i.d. U(θ, θ²), θ > 1
▶ Data generating equation: Xi = θ + (θ² − θ)Ui, Ui ∼ U(0, 1)
▶ Compute the Jacobian: $\frac{d}{d\theta}\left[\theta + (\theta^2 - \theta)U_i\right] = 1 + (2\theta - 1)U_i$, with $U_i = \frac{X_i - \theta}{\theta^2 - \theta}$
▶ Using $\|\cdot\|_{\infty}$ we have $J(x, \theta) = n\,\dfrac{\bar{x}(2\theta - 1) - \theta^2}{\theta^2 - \theta}$

▶ Reference prior: $\pi(\theta) = e^{\psi\left(\frac{2\theta}{2\theta - 1}\right)}\,\dfrac{2\theta - 1}{\theta(\theta - 1)}$, Berger, Bernardo & Sun (2009); complicated to derive
▶ In simulations the fiducial solution was marginally better than the reference prior, which was much better than the flat prior
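With a single parameter the GFD is easiest to handle on a grid: r(θ | x) ∝ (θ² − θ)⁻ⁿ · n(x̄(2θ−1) − θ²)/(θ² − θ) on √(max xᵢ) ≤ θ ≤ min xᵢ. The sketch below (mine, on made-up data) normalizes this numerically and reads off an equal-tailed fiducial interval.

```python
# Grid evaluation of the GFD for X_i ~ U(theta, theta^2): likelihood (theta^2-theta)^(-n)
# times the Jacobian n*(xbar*(2*theta-1) - theta^2)/(theta^2-theta), on the support
# sqrt(max x) <= theta <= min x.  Simulated data for illustration.
import numpy as np

rng = np.random.default_rng(7)
theta_true, n = 3.0, 30
x = rng.uniform(theta_true, theta_true**2, size=n)

grid = np.linspace(np.sqrt(x.max()), x.min(), 5001)
xbar = x.mean()
log_dens = (-n * np.log(grid**2 - grid)                       # likelihood
            + np.log(n * (xbar * (2 * grid - 1) - grid**2))   # Jacobian numerator
            - np.log(grid**2 - grid))                         # Jacobian denominator
dens = np.exp(log_dens - log_dens.max())
dens /= dens.sum() * (grid[1] - grid[0])                      # normalize on the grid

cdf = np.cumsum(dens) * (grid[1] - grid[0])
ci = np.interp([0.025, 0.975], cdf, grid)                     # equal-tailed 95% fiducial interval
print(f"theta_true = {theta_true}, 95% fiducial CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```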


Important Simple Observations

▶ The GFD is always proper
▶ The GFD is invariant to re-parametrizations (same as Jeffreys)
▶ The GFD is not invariant to smooth transformations of the data if n > p
▶ It does not satisfy the likelihood principle


Various Asymptotic Results

$$r(\xi \mid x) \propto f_X(x \mid \xi)\, J(x, \xi), \qquad J(x, \xi) = D\!\left(\frac{d}{d\xi}\, G(u, \xi)\,\Big|_{u = G^{-1}(x, \xi)}\right)$$

▶ Most results start with $C_n^{-1} J(x, \xi) \to J(\xi_0, \xi)$
▶ A Bernstein-von Mises theorem for fiducial distributions provides asymptotic correctness of fiducial CIs: H (2009, 2013), Sonderegger & H (2013)
▶ Consistency of model selection: H & Lee (2009), Lai, H & Lee (2015), H, Iyer, Lai & Lee (2016)
▶ Regular higher-order asymptotics in Pal Majumdar & H (2016+)

Sparsity

Model Selection

▶ $X = G(M, \xi_M, U)$, $M \in \mathcal{M}$, $\xi_M \in \Xi_M$

Theorem (H, Iyer, Lai & Lee, 2016): under assumptions,

$$r(M \mid y) \propto q^{|M|} \int_{\Xi_M} f_M(y, \xi_M)\, J_M(y, \xi_M)\, d\xi_M$$

▶ Need for a penalty: in the fiducial framework it comes from additional equations 0 = P_k, k = 1, ..., min(|M|, n)
▶ Default value q = n^{−1/2} (motivated by MDL)


Alternative to penalty

▶ A penalty is used to discourage models with many parameters
▶ The real issue is not too many parameters per se, but that a smaller model can do almost the same job

$$r(M \mid y) \propto \int_{\Xi_M} f_M(y, \xi_M)\, J_M(y, \xi_M)\, h_M(\xi_M)\, d\xi_M, \qquad h_M(\xi_M) = \begin{cases} 0 & \text{a smaller model predicts nearly as well} \\ 1 & \text{otherwise} \end{cases}$$

▶ Motivated by the non-local priors of Johnson & Rossell (2009)


Regression

▶ Y = Xβ + σZ
▶ First idea: $h_M(\beta_M) = I_{\{|\beta_i| > \epsilon,\ i \in M\}}$; issue: collinearity
▶ Better:

$$h_M(\beta_M) := I_{\left\{\frac{1}{2}\|X^{\top}(X_M\beta_M - X b_{\min})\|_2^2 \,\ge\, \varepsilon(n, |M|)\right\}}$$

where $b_{\min}$ solves

$$\min_{b \in \mathbb{R}^p} \frac{1}{2}\|X^{\top}(X_M\beta_M - X b)\|_2^2 \quad \text{subject to } \|b\|_0 \le |M| - 1.$$

▶ Algorithm: Bertsimas et al. (2016)
▶ Similar to the Dantzig selector of Candes & Tao (2007), but with a different norm and target
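For small p the (|M|−1)-sparse fit that defines b_min can be found by brute-force enumeration instead of the mixed-integer approach of Bertsimas et al. (2016). The sketch below is my own illustration of h_M at a given β_M; the threshold eps is an arbitrary stand-in for ε(n, |M|).

```python
# Brute-force h_M(beta_M): indicator that no (|M|-1)-sparse b reproduces X' X_M beta_M
# nearly as well.  Enumeration over supports stands in for the mixed-integer algorithm;
# eps is an arbitrary placeholder for eps(n, |M|).
import itertools
import numpy as np

def h_M(X, M, beta_M, eps):
    target = X.T @ (X[:, M] @ beta_M)                       # X' X_M beta_M
    best = np.inf
    for S in itertools.combinations(range(X.shape[1]), len(M) - 1):
        XS = X.T @ X[:, list(S)]
        b_S = np.linalg.lstsq(XS, target, rcond=None)[0]    # best fit on support S
        best = min(best, 0.5 * np.sum((target - XS @ b_S) ** 2))
    return int(best >= eps)                                 # 1 = no smaller model comes close

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 6))
M = [0, 2]
print(h_M(X, M, beta_M=np.array([2.0, -1.5]), eps=1.0))     # both coefficients matter: 1
print(h_M(X, M, beta_M=np.array([2.0, 1e-4]), eps=1.0))     # effectively 1-sparse: 0
```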


GFD

$$r(M \mid y) \propto \pi^{|M|/2}\, \Gamma\!\left(\frac{n - |M|}{2}\right) \mathrm{RSS}_M^{-\frac{n - |M| - 1}{2}}\, E\!\left[h^{\varepsilon}_M(\beta^{\star}_M)\right]$$

Observations:

▶ The expectation is with respect to the within-model GFD (the usual multivariate t)
▶ r(M | y) is negligibly small for large models because of h; e.g., |M| > n implies r(M | y) = 0
▶ Implemented using Grouped Independence Metropolis-Hastings (Andrieu & Roberts, 2009)
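A rough end-to-end sketch of this formula, entirely my own: the closed-form factor is computed directly and E[h^ε_M(β⋆_M)] is estimated by Monte Carlo from the within-model GFD (σ⋆² = RSS_M/χ²_{n−|M|}, β⋆_M | σ⋆ ∼ N(β̂_M, σ⋆²(X_MᵀX_M)⁻¹)), with the brute-force h_M from the previous sketch repeated here and a hand-picked eps standing in for ε(n, |M|).

```python
# Monte Carlo sketch of r(M|y) ∝ pi^{|M|/2} Gamma((n-|M|)/2) RSS_M^{-(n-|M|-1)/2} E[h(beta*_M)].
# The within-model GFD supplies the beta*_M draws; h_M and eps are ad hoc stand-ins,
# and normalization is only over the candidate models supplied.
import itertools
import numpy as np
from scipy.special import gammaln

def h_M(X, M, beta_M, eps):
    target = X.T @ (X[:, M] @ beta_M)
    best = np.inf
    for S in itertools.combinations(range(X.shape[1]), len(M) - 1):
        XS = X.T @ X[:, list(S)]
        b_S = np.linalg.lstsq(XS, target, rcond=None)[0]
        best = min(best, 0.5 * np.sum((target - XS @ b_S) ** 2))
    return int(best >= eps)

def log_model_weight(X, y, M, eps, n_mc=200, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    n, XM = len(y), X[:, M]
    G = np.linalg.inv(XM.T @ XM)
    beta_hat = G @ XM.T @ y
    rss = float(np.sum((y - XM @ beta_hat) ** 2))
    logw = (0.5 * len(M) * np.log(np.pi) + gammaln(0.5 * (n - len(M)))
            - 0.5 * (n - len(M) - 1) * np.log(rss))          # closed-form factor
    L = np.linalg.cholesky(G)
    sigma = np.sqrt(rss / rng.chisquare(n - len(M), size=n_mc))
    h_vals = [h_M(X, M, beta_hat + s * (L @ rng.normal(size=len(M))), eps) for s in sigma]
    return logw + np.log(max(np.mean(h_vals), 1e-12))        # floor avoids log(0)

rng = np.random.default_rng(5)
n, p = 60, 5
X = rng.normal(size=(n, p))
y = X[:, [0, 2]] @ np.array([2.0, -1.5]) + rng.normal(size=n)
logw = {tuple(M): log_model_weight(X, y, M, eps=500.0, rng=rng) for M in ([0, 2], [0, 1, 2])}
w = np.exp(np.array(list(logw.values())) - max(logw.values()))
print({k: round(v, 3) for k, v in zip(logw, w / w.sum())})   # the true model should dominate
```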


Main Result

Theorem (Williams & H, 2017+). Suppose the true model is $M_T$. Then under certain conditions, for a fixed positive constant α < 1,

$$r(M_T \mid y) = \frac{r(M_T \mid y)}{\sum_{j=1}^{n^{\alpha}} \sum_{M : |M| = j} r(M \mid y)} \;\xrightarrow{P}\; 1 \quad \text{as } n, p \to \infty.$$

Some Conditions

▶ Number of predictors: $\displaystyle \liminf_{n, p \to \infty} \frac{n^{1-\alpha}}{\log(p)} > 2$
▶ For the true model/parameter: $p_T < \log n^{\gamma}$ and

$$\varepsilon_{M_T}(n, p) \le \tfrac{1}{18}\, \|X^{\top}(\mu_T - X b_{\min})\|_2^2,$$

where $b_{\min}$ minimizes the norm subject to $\|b\|_0 \le p_T - 1$
▶ For a large model $|M| > p_T$ and large enough n or p,

$$\tfrac{9}{2}\, \|X^{\top}(H_M - H_{M(-1)})\, \mu_T\|_2^2 < \varepsilon_M(n, p),$$

where $H_M$ and $H_{M(-1)}$ are the projection matrices for M and for M with one covariate removed, respectively


Simulation

▶ Setup from Rockova & George (2015)
▶ n = 100, p = 1000, p_T = 8
▶ Columns of X either (a) independent or (b) correlated with ρ = 0.6
▶ $\varepsilon_M(n, p) = \Lambda_M\, \hat{\sigma}^2_M \left(\dfrac{n^{0.51}}{9} + \dfrac{|M|\, \log(p\pi)^{1.1}}{9} - \log(n)^{\gamma}\right)_{+}$ with γ = 1.45

Highlight of simulation results

▶ See Jon Williams' poster for details on theory and simulation
▶ When the columns of X are independent: usually selects the correct model
▶ When the columns of X are correlated: usually selects too small a model
▶ In that case the conditions of the Theorem are violated
▶ Guided by the conditions, decreasing p to 500 so that they hold improves performance


Comments

▶ Is there a standardized way of measuring closeness in other models?
▶ What if a small model is not the right target, e.g., gene interactions?


Regularization

Recall

▶ A distribution on the parameter space is a generalized fiducial distribution if it can be obtained as a limit (as ε ↓ 0) of

$$\operatorname*{arg\,min}_{\xi} \|x - G(U^{\star}, \xi)\| \;\Big|\; \Big\{ \min_{\xi} \|x - G(U^{\star}, \xi)\| \le \varepsilon \Big\}$$

▶ Conditioning $U^{\star}$ on $\{x = G(U^{\star}, \xi)\}$: “regularization by model”


Most general iid model

▶ Data generating equation: $X_i = F^{-1}(U_i)$, with $U_i$ i.i.d. Uniform(0,1)
▶ Inverting (solving for F) we get

$$F^{*}(x_i^{-}) \le U_i^{*} \le F^{*}(x_i).$$

There is a solution iff the order of the $U_i^{*}$ matches the order of the $x_i$.

[Figure: lower and upper step-function envelopes for $F^{*}$ at the observed $x_i$]

▶ See Yifan Cui's poster for an extension to censored data
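A Monte Carlo sketch of this nonparametric GFD, again my own reading of the slide: conditioning the uniforms on the order-matching event amounts to pairing sorted uniforms with sorted data, and each such draw pins F(t) between the step-function envelopes U⋆_(k) ≤ F(t) ≤ U⋆_(k+1) for x_(k) ≤ t < x_(k+1). The pointwise band below is my own conservative summary of those envelopes.

```python
# Monte Carlo sketch of the nonparametric GFD for F: each draw of sorted uniforms U*
# constrains F(t) to lie between step-function envelopes determined by the data.
# The pointwise band is a conservative summary (quantiles of the envelopes).
import numpy as np

rng = np.random.default_rng(11)
x = np.sort(rng.exponential(scale=2.0, size=25))     # observed data (sorted)
t_grid = np.linspace(0.0, 10.0, 101)
n, n_draws, alpha = len(x), 5000, 0.05
k = np.searchsorted(x, t_grid, side="right")         # number of x_i <= t, for each t

lower = np.empty((n_draws, t_grid.size))
upper = np.empty((n_draws, t_grid.size))
for b in range(n_draws):
    u = np.concatenate([[0.0], np.sort(rng.uniform(size=n)), [1.0]])  # U*_(0), ..., U*_(n+1)
    lower[b], upper[b] = u[k], u[k + 1]              # F(t) in [U*_(k), U*_(k+1)]

band_lo = np.quantile(lower, alpha / 2, axis=0)
band_hi = np.quantile(upper, 1 - alpha / 2, axis=0)
print(np.round(np.column_stack([t_grid, band_lo, band_hi])[::20], 3))  # t, lower, upper
```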


Additional Constraints

▶ Location-scale family with known density f(x) and cdf F(x), e.g., N(µ, σ²)
▶ Condition $U_i^{*}$ on the existence of $\mu^{*}, \sigma^{*}$ such that

$$F\big(\sigma^{*-1}(x_i - \mu^{*})\big) = U_i^{*} \quad \text{for all } i$$

[Figure: step-function bounds at the observed $x_i$ with an N(4.5, 3²) cdf passing between the lower and upper envelopes]

▶ The GFD is $r(\mu, \sigma) \propto \sigma^{-1} \prod_{i=1}^{n} \sigma^{-1} f\big(\sigma^{-1}(x_i - \mu)\big)$
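For a non-normal f this GFD has no closed form, but it is a two-dimensional density, so a plain random-walk Metropolis sampler is enough for a sketch. The example below is mine: f is taken to be Cauchy purely for illustration, and the starting values and step sizes are ad hoc.

```python
# Random-walk Metropolis sketch targeting r(mu, sigma) ∝ sigma^{-1} prod_i sigma^{-1} f((x_i - mu)/sigma),
# illustrated with a Cauchy f; tuning constants are ad hoc.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = stats.cauchy.rvs(loc=1.0, scale=2.0, size=40, random_state=rng)

def log_gfd(mu, sigma, x):
    if sigma <= 0:
        return -np.inf
    return -(len(x) + 1) * np.log(sigma) + np.sum(stats.cauchy.logpdf((x - mu) / sigma))

def metropolis(x, n_iter=20_000, step=(0.4, 0.4)):
    mu, sigma = np.median(x), stats.iqr(x) / 2       # rough starting values
    cur, out = log_gfd(mu, sigma, x), np.empty((n_iter, 2))
    for i in range(n_iter):
        prop = (mu + step[0] * rng.normal(), sigma + step[1] * rng.normal())
        new = log_gfd(*prop, x)
        if np.log(rng.uniform()) < new - cur:        # symmetric proposal: plain Metropolis ratio
            (mu, sigma), cur = prop, new
        out[i] = mu, sigma
    return out[n_iter // 4:]                         # drop burn-in

draws = metropolis(x)
print("95% fiducial intervals for (mu, sigma):")
print(np.quantile(draws, [0.025, 0.975], axis=0).T.round(3))
```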


Constraint complications

Toy example: X = µ + Z, Z ∼ N(0,1), µ > 0.

▶ Option 1: condition $Z^{\star}$ on $\{x - Z^{\star} > 0\}$
▶ $r(\mu) = \frac{\varphi(x - \mu)}{\Phi(x)}\, I_{\{\mu > 0\}}$
▶ Lower confidence bounds do not have correct coverage
▶ Option 2: projection onto µ > 0
▶ $r(\mu) = (1 - \Phi(x))\, I_{\{0\}} + \varphi(x - \mu)\, I_{\{\mu > 0\}}$
▶ Correct coverage; possible to get {0} as the CI, a sure bet against
▶ Option 3: mixture
▶ $r(\mu) = \min\!\big(\tfrac{1}{2},\, 1 - \Phi(x)\big)\, I_{\{0\}} + \max\!\big(\tfrac{1}{2\Phi(x)},\, 1\big)\, \varphi(x - \mu)\, I_{\{\mu > 0\}}$
▶ Correct or conservative coverage; no {0} for reasonable α CIs
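A quick coverage check, written by me to make the claims concrete: with one observation per replicate, the 95% lower bound from Option 1 is the 0.05-quantile of the truncated normal, x − Φ⁻¹(0.95 Φ(x)), while Option 2's 0.05-quantile works out to max(0, x − z₀.₉₅).

```python
# Monte Carlo check of the coverage of 95% lower fiducial bounds for X = mu + Z, Z ~ N(0,1), mu > 0:
# Option 1 (conditioning) vs Option 2 (projection).  One observation per replicate.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(8)
alpha, n_rep = 0.05, 200_000
z = norm.ppf(1 - alpha)

for mu in (0.1, 0.5, 1.0, 3.0):
    x = mu + rng.normal(size=n_rep)
    lb1 = x - norm.ppf((1 - alpha) * norm.cdf(x))    # Option 1: alpha-quantile of the truncated normal
    lb2 = np.maximum(0.0, x - z)                     # Option 2: alpha-quantile of the projected GFD
    print(f"mu = {mu:3.1f}   coverage, option 1: {np.mean(lb1 <= mu):.3f}   "
          f"option 2: {np.mean(lb2 <= mu):.3f}")
```

Option 2 sits at the nominal 0.95 for every µ, while Option 1 falls below it when µ is close to the boundary, which is the point of the slide.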


Shape restrictions - preliminary results

▶ Example: positive iid data with a concave cdf (the MLE is the Grenander estimator)
▶ Condition U⋆ on a concave solution existing (Gibbs sampler)
▶ Project the unrestricted GFD onto the space of concave functions (quadratic program)

[Figures: resulting fiducial distributions, panels labeled “Condition” and “Projection”]
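The projection route can be phrased as a small quadratic program on a grid. The sketch below is my own minimal stand-in (not the talk's implementation) and assumes the cvxpy package: it projects a single unrestricted cdf, here just an empirical cdf for illustration, onto the set of nondecreasing, concave functions with F(0) = 0 and F ≤ 1.

```python
# Projection of one (unrestricted) cdf draw onto concave cdfs on [0, T], as a quadratic
# program; a stand-in for the slide's projection step, using cvxpy (an assumed dependency).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(4)
x = rng.exponential(scale=2.0, size=30)               # positive iid data, concave true cdf
grid = np.linspace(0.0, 12.0, 121)                    # equally spaced grid on [0, T]
g = np.searchsorted(np.sort(x), grid, side="right") / len(x)   # stand-in "unrestricted" cdf values

f = cp.Variable(len(grid))
constraints = [f[0] == 0, f >= 0, f <= 1,
               cp.diff(f) >= 0,                       # nondecreasing
               cp.diff(f, 2) <= 0]                    # concave (second differences, equal spacing)
cp.Problem(cp.Minimize(cp.sum_squares(f - g)), constraints).solve()

concave_cdf = f.value
print("max concavity violation after projection:", float(np.diff(concave_cdf, 2).max()))
```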


Comments

▶ When should we use conditioning vs. projection?
▶ What is the connection to ancillarity and to IMs (inferential models)?
▶ Is computational cost a consideration?


Conclusions

Fiducial Future

▶ What is it that we provide?
▶ GFI: a general-purpose method that often works well
▶ Computational convenience and efficiency
▶ Fiducial options in software
▶ Theoretical guarantees
▶ Applications
▶ The proof is in the pudding


List of successful applications

▶ General linear mixed models: E, H & Iyer (2008); Cisewski & H (2012)
▶ Confidence sets for wavelet regression: H & Lee (2009); and free-knot splines: Sonderegger & H (2014)
▶ Extreme value data (generalized Pareto), maximum mean, and model comparison: Wandler & H (2011, 2012ab)
▶ Uncertainty quantification for ultra-high dimensional regression: Lai, H & Lee (2015), Wandler & H (2017+)
▶ Volatility estimation for high frequency data: Katsoridas & H (2016+)
▶ Logistic regression with random effects (response models): Liu & H (2016, 2017)

I have a dream …

▶ One famous statistician said (I paraphrase):
“I use Bayes because there is no need to prove an asymptotic theorem; it is correct.”
▶ I have a dream that by the time I retire people will have similar trust in fiducial-inspired approaches.

Thank you!

