Download - A Dirichlet Process Functional Approach to Heteroscedastic … · 2016. 6. 22. · (Ioannidis, 2008: “Why most discovered true associations are inflated”, Epid.), to investigate

A Dirichlet Process Functional

Approach to

Heteroscedastic-Consistent

Covariance Estimation

with Applications

George Karabatsos

University of Illinois-Chicago (UIC)

International Workshop on Applied Probability

Monday, June 20, 2016, 13:15 - 14:45

Session 1B-1 on

Random Measures and Bayesian Nonparametrics

(Organizer and Chair: Fabrizio Leisen)

Supported by NSF grant SES-1156372.

2

• A line of research deals with the problem of determining the

expression for the distribution of functionals of the Dirichlet process,

with any prescribed error of approximation (Cifarelli & Regazzini ,1979;

reviews: Regazzini et al., 2002; Lijoi & Prunster, 2009).

This research has focused on mean/linear functionals of DP.

• I will show that the ordinary least squares (OLS) estimator,

as a functional of the Mixture of Dirichlet process (MDP),

has (approximately) posterior mean given by weighted least squares (WLS),

and posterior covariance given by a weighted version of

White’s (1980) heteroscedastic-consistent sandwich covariance estimator.

• This finding is based on a Taylor series approximation of a

bootstrap distribution (i.e., Pólya urn) representation of

the posterior of the Dirichlet process (DP),

using the multivariate delta method.

• Under a non-informative DP prior, this sandwich estimator closely

approximates White's (1980) original (unweighted) sandwich estimator.

Introduction

3

• For a particular choice of baseline distribution in the MDP,

the WLS estimator corresponds to a generalized ridge regression estimator,

which, as known, can handle:

multicollinear or singular covariate design matrices, including p > n settings,

while shrinking the coefficient estimates of irrelevant covariates towards zero;

the fitting of nonlinear regressions via basis expansions.

• Bayesian ridge regression is tough to beat in terms of prediction

(Griffin & Brown, 2013, Bayesian Analysis).

• The current study is the first to:

make connections between the DP and ridge regression,

provide heteroscedastic-consistent posterior covariance

estimation for the coefficients in (Bayesian) ridge regression.

• I will then illustrate the MDP functional methodology through the

analysis of simulated data sets, and two real data sets.

• These illustrations will also include Vibration of Effects (VoE) analyses

(Ioannidis, 2008: “Why most discovered true associations are inflated”, Epid.),

to investigate how the covariate effect changes as a function

of what other covariates are included in the model, and choice of prior.

Introduction

4

• All of these posterior quantities of this MDP functional methodology

(the WLS and sandwich estimators)

are analytically manageable and permit fast computations for large data sets.

• This work on MDP functional methodology

provides a contribution to likelihood-free approximate Bayesian inference.

• This is important step, because full MCMC inference and (esp. likelihood)

computations with DP (infinite-) mixture models can be non-trivial in practice.

• Originally, the DP prior was introduced with the motivation (Ferguson, 1973, p.209)

that the DP posterior distribution:

(1) "should be manageable analytically," and that

(2) the "support of the prior distribution should be large-with respect to some

suitable topology on the space of probability distributions on the sample space."

• This seminal paper then provided explicit analytical (closed form)

solutions to a list of nonparametric statistical problems based on the DP posterior,

including the estimation of a distribution function, median, quantiles, variance,

covariance, and the probability that one variable exceeds another.

• The MDP functional methodology (of this talk ) adds to this list.

Introduction

5

• Data:

n observations of Z = (X, Y), sample space .

• cn < n dist. Clusters:

• Cluster frequencies: , with .

• n observations

assumed

exchangeable

from

MDP model:

• A-priori, 𝔼[F()] = F0(), and 𝕍(F) = F0()[1 F0()].

• Ridge baseline prior:

Unit ridge baseline prior: has vxk = 1, for k = 2, ..., K.

• →

all k ≥ 1 partitions B1,…,Bk of .

• π(α) is either a U(0,) prior, or

a truncated Cauchy-type prior: π(α) = 1(0 < α < ξ)/(α + 1)² (Nandram & Yin, 2016).

• (the sn(k) are the signless Stirling numbers.)

Zn zi xi

, yii1n X, y Xn, yn

Zcn

Xcn

, ycn

zc xc

, ycc1

cnn

ncn n1, , ncn

zi |F iid F, i 1, , n,

F | DP, F0,

F0z NK1xi , yi |mz , Vz

, 0.

#

#

#

#

F0z Nx1 |0, 0k2

KNxk |0, vxkNy |0, 0

F | DP, F0 FB1, , FBk | DikF0B1, ,F0Bk

Z Z K1

c1

cn nc n

Z Z K1

PCn k | snkk n

, 0

MDP Model (prior)

MDP Model (posterior)

6

• Conditional posterior:

all k ≥ 1 partitions B1,…,Bk of .

• DP cond.

posterior

expectation

(predictive;

Pólya urn

scheme):

• DP cond.

posterior

variance:

• posterior:

Linear Regression Theory (Basic Review)

7

• Data:

• cn < n dist. clusters:

• Cluster frequencies:

• Regression equation: Yi = β1xi1 + ⋯ + βKxiK + εi , i = 1, … , n.

• Regression errors: ,

• Corresponding variances: ,

• Assuming exogeneity:

• OLS estimator:

• Sampling covariance matrix:

• Under homoscedasticity: ( , )

( , , consistent only under homoscedasticity)

• Sandwich estimator:

(White, 1980) , i = 1,..., n

Consistent under hetero/homoscedasticity. (HC0 = under homosc.)

• , n → .

8

• Observed data: ,

has empirical dist. (and clusters )

with mean and (K+1) (K+1) covariance matrix ,

including the K K covariance matrix of X,

and the K1 vector of covariances between X and y (resp.).

• Imaginary data: with eff. sample size ,

has MDP baseline distribution, F0, with mean ,

with (K+1) (K+1) covariance matrix , with K K covariance matrix

of X, and the K1 vector of covariances between X and Y.

• The OLS estimator from the combined data set, , is WLS:

Matrix Identities

mz mx, mY

X

XS

diag1n ,

SIS X

XS

1XXS

diag1n,

SIS

yyS

Xcn

XS

diagncn

, SIS

Xcn

XS

1Xcn

XS

diagncn

, SIS

ycn

yS

Xdiagn, SISX

1Xdiagn ,

SISY

X 1n diagn,

SISX

1X 1

n diagn , SISY

nVx mxmx

Vx mxmx

1

nVxY mYmx VxY mYmx

#

#

#

#

#

9

• The OLS estimator from

the combined data set, , is WLS:

• Diagonal elements of (α / S)1S sum to α, the prior sample size of the DP.

• If α is a positive integer with S = α,

then: (α / S)IS = (α / α)Iα = Iα,

and

Matrix Identities

Xn

Xn1Xn yn, where Xn, yn

Xn

X

yn

y

X

XS

diag1n ,

SIS X

XS

1XXS

diag1n,

SIS

yyS

Xcn

XS

diagncn

, SIS

Xcn

XS

1Xcn

XS

diagncn

, SIS

ycn

yS

Xdiagn, SISX

1Xdiagn ,

SISY

X 1n diagn,

SISX

1X 1

n diagn , SISY

nVx mxmx

Vx mxmx

1

nVxY mYmx VxY mYmx

#

#

#

#

#

X,Y Xcn

XS

,ycn

yS

10

• The OLS estimator from

the combined data set, , is WLS:

• For any α > 0, possibly S ≠ α, an extension of the fractional imputation

method (e.g.,Kim & Kim, 2012) can be used to simulate the imaginary data set.

• Draw a large number (S) of samples .

• by the law of large numbers.

• Then plug in , , into the OLS estimator above.

.

Matrix Identities

X

XS

diag1n ,

SIS X

XS

1XXS

diag1n,

SIS

yyS

Xcn

XS

diagncn

, SIS

Xcn

XS

1Xcn

XS

diagncn

, SIS

ycn

yS

Xdiagn, SISX

1Xdiagn ,

SISY

X 1n diagn,

SISX

1X 1

n diagn , SISY

nVx mxmx

Vx mxmx

1

nVxY mYmx VxY mYmx

#

#

#

#

#

X,Y Xcn

XS

,ycn

yS

zcn1,s xcn1,s, ycn1,s F0, for s 1, , S

plimS1

S

s1

Szcn1,szcn1,s

Vz mzmz

XS : xcn1,s s1

S yS : ycn1,ss1S

• Under the ridge baseline prior, the OLS estimator also satisfies the equalities:

• Uses imaginary data , obtained by deterministic single imputation.

• The OLS solution above, obtained by the MDP ridge baseline prior,

is the generalized ridge regression estimator with shrinkage parameters

α ⋅ (0, v₂,…,vK); and if further v₂ = = vK = 1, then the OLS solution above

is the ridge regression estimator (Hoerl & Kennard, 1970) with

shrinkage parameter α (Hastie et al., 2009, p. 96, observed the latter

without any reference to the DP).

• Under a ridge baseline prior, we may just use: .

Matrix Identities

11

XK, yK

X,Y Xcn

XK

,ycn

yK

MDP Bootstrap Procedure• Given , a bootstrap draw b of a random c.d.f. F* can be constructed by taking:

given n + ceil(α) + 1 draws zcb* from

and a posterior draw ~ ( | Zn).

• This bootstrap procedure (for each stage b) is the same as taking a multinomial

draw: ~

and then drawing zc* with probability nc / ( + n) [nc* times], for c = 1 ,…, cn,

and drawing z*c(n)+1 ~ F0 with probability / ( + n) [nc(n)+1* times].

• This bootstrap sampling process may be repeated for b = 1,…, B, for large B.

• Then:

, (≐ denotes near equality);

, ;

F has twice the skewness of F*, but is small anyways;

G(φ(F) | Zn, α) ≐ G*(φ(F*) | Zn, α), (c.d.f.s), any well-behaved functional φ;

All above are true marginally over the posterior ( | Zn).

See Hjort (1985). 12

EF |Zn, c1

cn nc

n z c

n F0

ncn1,b ncb

cn11

EFB |Zn, EFB |Zn,

Fbt

c1

cn1 ncb

n11zc

t floor

n11zcn11ceil,b

t

VFB |Zn, VFB |Zn,B BZB BZ

Multn ceil 1;n1

n , , nc

n , ,ncn

n , n

MDP Bootstrap Procedure• The, for the OLS functional, we may adopt the bootstrap sampling scheme:

• The bottom S (or K) rows of (𝕏, 𝕐) already contain the

imaginary observations sampled from the MDP baseline, F0.

• The factor (n + + 1) / (n + ceil() + 1) addresses continuous .

• We have:

• For a ridge baseline prior, use (α/(α + n)) ⊗ 1K⊺ instead of ({α/S}/(α + n)) ⊗ 1S

⊺.

• Lancaster (2003) made rather similar observations as above (without using α):

Classical (Efron) bootstrap uses: n* ~ Multinomial(n; n1/n,…, nc(n)/n);

Bayesian bootstrap uses: n* ~ Dirichlet(11,…,1n).13

F

n X diagnX1X diagnY,

n n1

nceil1ncb

cn11

ncbcn11 |Zn, MucnSn ceil 1;

n1

n , ,ncn

n ,/Sn 1S

,

|Zn |cn.

#

#

#

#

En |Zn, n n 1 n1

n , ,ncn

n ,/Sn 1S

,

Vn |Zn, diagn n 11n

n.

#

#

MDP Bootstrap Procedure• Recall:

• Again, for a ridge baseline prior, use (α/(α + n)) ⊗ 1K⊺ instead of ({α/S}/(α + n)) ⊗ 1S

⊺.

• π(α) is either a U(0,) prior, or a Cauchy type prior truncated above by .

Let A be a fine grid of α defined over the support of the prior, π(α).

(The grid A may range from .005 to in intervals of .005).

• Then, marginalizing over the posterior, π(α | cn), and by the total law of

probability for expectations and covariances, the marginal expectation and

covariance matrix can be approximated and rapidly computed by:

14

En |Zn, n n 1 n1

n , ,ncn

n ,/Sn 1S

,

Vn |Zn, diagn n 11n

n.

#

#

En |Zn n A

En |Zn, |cn,

Vn |Zn A

Vn |Zn, |cn A

En |Zn,En |Zn, |cn

En |ZnEn |Zn.

#

#

#

Multivariate Delta Method• We now consider a deterministic approach to evaluating the distribution of a

functional (e.g., β*(F*)) of the MDP posterior, using a prescribed error of

approximation (as in research on DP functionals; Regazzini, et al. 2002).

• We employ the multivariate delta method to approximate the (MDP bootstrap)

posterior distribution of β(F*) via a Taylor series approximation β*(n*) ≈ β(n*)

of β(n*) around the posterior mean, . With ∂β(n) /∂n a K× (cn + S) matrix of

first derivatives (S = K for ridge), this Taylor series approximation is given by:

15

n

n n nn

nnn n

n Y Xn XdiagnX1Xvecdiagn

n nnn n

n un XXdiagnX1

e1e1

ecnSecnS

n n

n Rnn n,

#

#

#

#

un Y Xn; ec 1c 1, , 1c cn S;

Rn e1e1, , ecnSecnS

un XXdiagnX1

diagunXXdiagnX1, is cn S K.

#

#

#

where:

• Then the posterior distribution of n* implies that the approximation has exact

posterior mean given by the WLS estimator:

and exact covariance matrix given by the heteroscedastic-consistent sandwich

estimator for WLS (Greene, 2012, p. 319):

where ∘ denotes the Hadamard product operator.

Poirier (2011, Economic Reviews) obtained a similar (but not identical) result.

• Then the posterior variances from this matrix,

provide the asymptotic-consistent 95% posterior credible interval,

, for k = 1,…,K.16

En |Zn n XdiagnX1XdiagnY

Vn |Zn nn

nnVn |Zn

nn

nn

RnVn |ZnRn

XdiagnX1Xdiagn un2XXdiagnX1

#

#

kn 1. 96Vk

n |Zn1/2

Multivariate Delta Method

diagVn |Zn V1n |Zn, ,VK

n |Zn,

• Furthermore, suppose that:

the MDP model assumes a non-informative DP prior, defined by the specification α → 0;

The number of distinct cluster in the data is cn = n, so that (S = K for ridge baseline prior).

• Then the conditional posterior distribution of the MDP is Dirichlet (Di),

with support points the cn observed cluster values ,

where the random probabilities, θc = Pr(z = zc∗), c = 1, …, cn,

coincide with those of the Bayesian Bootstrap (Rubin, 1981).

• Then,

the DP posterior predictive distribution function reduces to:

,

the distribution function used by the classical bootstrap (Efron, 1979);

the posterior mean (WLS) estimate of is approximately equal to the usual OLS estimator, ;

and the posterior covariance matrix of reduces to White’s (1980) heteroscedastic consistent covariance estimator:

17

Non-informative DP prior case

cn 1, , cn |Zn Dicnncn,zc

c1cnn

Przn1 B |Zn FnB

c1

cn nc

n z cB,

n0 XX1Xy

n n0 1n, 0S

B BZ,

Vn |Zn XX1Xdiagun02XXX1 HC0.

Simulation Study (coverage rates)

18

Heteroscedasticity Level

ah 0 (0) ah 1 (. 05) ah 2 (. 1) ah 2. 5 (. 15)

X dist. n c u h c u h c u h c u h

U(0,1) 10 . 59 . 58 . 83 . 72 . 71 . 82 . 87 . 87 . 79 . 93 . 92 . 78

Ex(1) 10 . 78 . 78 . 79 . 74 . 74 . 73 . 70 . 69 . 68 . 66 . 66 . 64

N(0,25) 10 . 81 . 81 . 82 . 65 . 65 . 67 . 64 . 64 . 67 . 66 . 66 . 70

AR(1) 10 . 82 . 82 . 81 . 81 . 80 . 80 . 78 . 79 . 78 . 77 . 77 . 77

U(0,1) 20 . 74 . 74 . 90 . 84 . 84 . 89 . 92 . 92 . 87 . 95 . 95 . 86

Ex(1) 20 . 86 . 86 . 86 . 81 . 81 . 80 . 75 . 75 . 75 . 71 . 71 . 70

N(0,25) 20 . 88 . 88 . 88 . 84 . 84 . 85 . 88 . 88 . 89 . 90 . 90 . 91

AR(1) 20 . 88 . 88 . 89 . 87 . 87 . 87 . 86 . 86 . 86 . 85 . 85 . 86

U(0,1) 50 . 88 . 88 . 93 . 92 . 92 . 93 . 94 . 94 . 92 . 94 . 94 . 92

Ex(1) 50 . 90 . 90 . 91 . 86 . 86 . 86 . 82 . 82 . 82 . 82 . 82 . 82

N(0,25) 50 . 92 . 92 . 92 . 93 . 93 . 93 . 96 . 96 . 97 . 98 . 98 . 98

AR(1) 50 . 93 . 93 . 93 . 92 . 92 . 92 . 91 . 91 . 92 . 91 . 91 . 92

U(0,1) 100 . 92 . 92 . 94 . 94 . 94 . 94 . 94 . 94 . 94 . 94 . 94 . 94

Ex(1) 100 . 92 . 92 . 92 . 89 . 89 . 90 . 89 . 89 . 89 . 90 . 90 . 90

N(0,25) 100 . 94 . 94 . 94 . 96 . 96 . 96 . 98 . 98 . 98 . 99 . 99 . 99

AR(1) 100 . 94 . 94 . 94 . 93 . 93 . 94 . 93 . 93 . 93 . 93 . 93 . 93

U(0,1) 500 . 94 . 94 . 95 . 95 . 95 . 95 . 95 . 95 . 95 . 94 . 94 . 95

Ex(1) 500 . 95 . 95 . 95 . 93 . 93 . 93 . 95 . 95 . 95 . 96 . 96 . 96

N(0,25) 500 . 95 . 95 . 95 . 98 . 98 . 98 1. 0 1. 0 1. 0 1. 0 1. 0 1. 0

AR(1) 500 . 95 . 95 . 95 . 95 . 95 . 95 . 95 . 95 . 95 . 95 . 95 . 95

Models fitted to data:

c: MDP with ridge base and

truncated Cauchy-type prior

π(α) = 1(0 < α < ξ)/(α + 1)².

u: MDP with ridge base and

U(α | 0, 3) prior.

h: HC0 (White, 1980).

Data Generating model:

yi | xi, β, σi² ∼ N(β₀ + β₁xi, σi²);

β₀ = β₁ = 1,

σi² = exp(ahxi + ahxi²), i = 1,…,n.

U(0,1) X dist: ah = 0, 1, 2, 2.5.

Else: ah = 0, .05, .1, .15.

4 × 4 × 5 simulation design:

• X distribution type (4),

by heterosced. level (4),

by sample size level (5).

• 10,000 data sets simulated

for each of 80 conditions.

Results: Rates similar across

models and .95, esp. for

larger n.

Simulation Study (rate means, s.d.s, by condition)

19

Models fitted to data:

c: MDP with ridge base and

truncated Cauchy-type prior

π(α) = 1(0 < α < ξ)/(α + 1)².

u: MDP with ridge base and

U(α | 0, 3) prior.

h: HC0 (White, 1980).

Data Generating model:

yi | xi, β, σi² ∼ N(β₀ + β₁xi, σi²);

β₀ = β₁ = 1,

σi² = exp(ahxi + ahxi²), i = 1,…,n.

U(0,1) X dist: ah = 0, 1, 2, 2.5.

Else: ah = 0, .05, .1, .15.

4 × 4 × 5 simulation design:

• X distribution type (4),

by heterosced. level (4),

by sample size level (5).

• 10,000 data sets simulated

for each of 80 conditions.

Results: Rates similar across

models and .95, esp. for

larger n.

Heteroscedasticity Level

ah 0 (0) ah 1 (. 05) ah 2 (. 1) ah 2. 5 (. 15)

c u h c u h c u h c u h

U(0,1) . 81 . 81 . 91 . 87 . 87 . 91 . 92 . 92 . 89 . 94 . 94 . 89

. 13 . 14 . 04 . 09 . 09 . 05 . 03 . 03 . 06 . 01 . 01 . 06

Ex(1) . 88 . 88 . 89 . 85 . 85 . 84 . 82 . 82 . 82 . 81 . 81 . 80

. 06 . 06 . 06 . 07 . 07 . 07 . 09 . 09 . 09 . 11 . 11 . 12

N(0,25) . 90 . 90 . 90 . 87 . 87 . 88 . 89 . 89 . 90 . 91 . 91 . 92

. 05 . 05 . 05 . 12 . 12 . 11 . 13 . 13 . 12 . 13 . 13 . 11

AR(1) . 90 . 90 . 90 . 90 . 90 . 90 . 89 . 89 . 89 . 88 . 88 . 88

. 05 . 05 . 05 . 05 . 05 . 05 . 06 . 06 . 06 . 07 . 07 . 07

n 10 . 75 . 75 . 81 . 73 . 72 . 76 . 75 . 75 . 73 . 75 . 75 . 72

. 09 . 10 . 01 . 06 . 06 . 06 . 09 . 09 . 06 . 11 . 11 . 06

n 20 . 84 . 84 . 88 . 84 . 84 . 85 . 85 . 85 . 84 . 85 . 85 . 83

. 06 . 06 . 01 . 02 . 02 . 03 . 06 . 06 . 06 . 09 . 09 . 08

n 50 . 91 . 91 . 92 . 91 . 91 . 91 . 91 . 91 . 91 . 91 . 91 . 91

. 02 . 02 . 01 . 03 . 03 . 03 . 05 . 05 . 05 . 06 . 06 . 06

n 100 . 93 . 93 . 94 . 93 . 93 . 93 . 94 . 94 . 94 . 94 . 94 . 94

. 01 . 01 . 01 . 02 . 02 . 02 . 03 . 03 . 03 . 03 . 03 . 03

n 500 . 95 . 95 . 95 . 95 . 95 . 95 . 96 . 96 . 96 . 96 . 96 . 96

. 00 . 00 . 00 . 02 . 02 . 02 . 02 . 02 . 02 . 02 . 02 . 02

Total . 87 . 87 . 90 . 87 . 87 . 88 . 88 . 88 . 88 . 88 . 88 . 87

. 09 . 09 . 05 . 09 . 09 . 08 . 10 . 10 . 09 . 10 . 10 . 10

LMT Data Analysis Example

20

• n = 347 observations from undergraduate teacher education students (89.9%

female) who each attended one of four Chicago universities between the Fall

2007 semester and Fall 2013 spring semesters, inclusive, excluding summers.

• Dependent variable (Y): math teaching ability score

(25-item test of Learning Math for Teaching; LMT, 2012).

• Covariates: Year, Year², and CTPP = 1(Year ≥ 2010.9),

an indicator of the administration of the new (versus old) teaching curriculum.

• All covariates rescaled to have mean zero and variance 1 before data analysis.

n pSD ES 95%PI n OLS SE

Intercept 12. 90 . 18 72. 76 12. 55, 13. 24 12. 90 . 18

Year . 69 . 57 1. 21 1. 81, . 43 478. 17 426. 54

Year2 . 65 . 57 1. 14 1. 78, . 47 476. 75 426. 54

CTPP . 60 . 28 2. 10 . 04, 1. 15 . 67 . 28


21

• Vibration of effects (VoE) analysis (Ioannidis, 2008) provides a

way to assess how much a covariate’s effect size differs (or vibrates)

over different ways that the analysis can be done, e.g. with respect to different:

variables included and/or excluded in the analysis (statistical adjustments);

models used;

definitions of outcomes and predictors;

and inclusion and exclusion criteria for the study population.

• We will consider a VoE analysis to investigate how

the effect size of the CTPP treatment variable (T),

varies as a function of the other covariates that are included

in the regression model, and α.

• is a WLS estimate that excludes a missing variable, U,

from the regression. See next slide for more details (on bias analysis).

• Covariate selection is undertaken using the fast

LARS algorithm (Efron et al. 2004),

applied to the data augmented by the

DP baseline prior imaginary observations.

EST

T

/VT

1/2

T

n

/VTn

|Zn,1/2

β~


22

• We may add a quantitative bias analysis into the VoE analysis.

• Let (X, T) as covariates in the regression equation, incl. treatment variable T,

and let U be a missing covariate from the regression equation, a confounder

such that εiU = γui + εi (i =1,…,n) is correlated with T (i.e., hidden bias).

• If there is hidden bias, then least-squares estimates are inconsistent.

• If no interactions between (T, U, X), then:

(see VanderWeele & Arah, 2011)

where:

is T’s WLS slope estimator in a regression fit that excludes U (given ),

is the WLS slope estimator in a regression fit that includes U (given ).

• If the missing variable U is binary, we may evaluate how sensitive

the effect size of T is, using the formula displayed above,

where FU( u | T = 1, x,; λ) and FU( u | T = 0, x; λ) are based on a

binary logistic regression, over independent standard normal samples of (, λ).

• Then also observe the distribution of effect sizes

over in the support of (), and over LARS covariate subsets incl. T.

Tn

Tn

udFUu |T 1, x udFUu |T 0, xdFXx

Tn

Tn

ES n/VT

1/2


Vibration of Effects Analysis

23

PIRLS Data Analysis Example

24

• Data on n = 565 low-income students from 21 U.S. elem schools (PIRLS 2006).

• Dependent variable: Student literacy score (zREAD).

• 8 covariates: student male status (1 if MALE, or 0); AGE; class size (SIZE);

class percent of English language learners (ELL); teacher years of experience

(TEXP4) and education level (EDLEVEL = 5 if bachelor's; = 6 if at least

master's); school enrollment (ENROL), safety (SAFE=1 high; SAFE = 3 low).

• Each variable in the data set was rescaled to have mean 0 and variance 1.

n pSD ES 95%PI n OLS SE

Intercept . 48 . 04 12. 67 . 56, . 41 . 48 . 04

MALE . 06 . 04 1. 44 . 13, . 02 . 06 . 04

AGE . 20 . 04 5. 22 . 27, . 12 . 20 . 04

SIZE . 07 . 05 1. 48 . 16, . 02 . 07 . 05

ELL . 13 . 05 2. 81 . 22, . 04 . 13 . 05

TEXP4 . 14 . 04 3. 23 . 05, . 22 . 14 . 04

EDLEVEL . 03 . 04 . 77 . 05, . 11 . 03 . 04

ENROL . 30 . 04 7. 60 . 22, . 37 . 30 . 04

SAFE . 16 . 04 3. 71 . 24, . 07 . 16 . 04

PIRLS Data Analysis Example

Vibration of Effects Analysis

25

Conclusions• First study to show that the OLS estimator, as a posterior functional of the MDP,

has (approximately) posterior mean given by WLS, and posterior covariance

given by a weighted heteroscedastic-consistent sandwich covariance estimator.

• These posterior quantities (the WLS and sandwich estimators)

are analytically manageable and permit fast computations for large data sets,

and correspond to ridge regression (shrinkage) estimators for a

particular choice of baseline distribution in the MDP.

• Illustrated MDP functional methodology through the

analysis of simulated and real data (including VoE analyses).

• Manuscript: http://arxiv.org/abs/1602.05155

• MDP methodology can be run using the “VoE analysis” menu option in my

Bayesian Regression software: http://georgek.people.uic.edu/BayesSoftware.html

• In principle, the MDP functional methodology can be extended to

processes of other Gibbs-type priors, such as the NIG process.

• For each of these other processes, the process variance is not available in

closed form, which would then make Monte Carlo simulation necessary

for posterior inference, and would make it more challenging to

correspond the mean and variance of the process with

its Polya urn scheme (posterior predictive) used for bootstrapping.26

http://arxiv.org/abs/1602.05155

http://georgek.people.uic.edu/BayesSoftware.html

Download - A Dirichlet Process Functional Approach to Heteroscedastic … · 2016. 6. 22. · (Ioannidis, 2008: “Why most discovered true associations are inflated”, Epid.), to investigate

Top Related