A Dirichlet Process Functional Approach to Heteroscedastic-Consistent Covariance Estimation with Applications
George Karabatsos
University of Illinois-Chicago (UIC)
International Workshop on Applied Probability
Monday, June 20, 2016, 13:15 - 14:45
Session 1B-1 on
Random Measures and Bayesian Nonparametrics
(Organizer and Chair: Fabrizio Leisen)
Supported by NSF grant SES-1156372.
• A line of research deals with the problem of determining the
expression for the distribution of functionals of the Dirichlet process,
with any prescribed error of approximation (Cifarelli & Regazzini, 1979;
reviews: Regazzini et al., 2002; Lijoi & Prünster, 2009).
This research has focused on mean/linear functionals of DP.
• I will show that the ordinary least squares (OLS) estimator,
as a functional of the Mixture of Dirichlet process (MDP),
has (approximately) posterior mean given by weighted least squares (WLS),
and posterior covariance given by a weighted version of
White’s (1980) heteroscedastic-consistent sandwich covariance estimator.
• This finding is based on a Taylor series approximation of a
bootstrap distribution (i.e., Pólya urn) representation of
the posterior of the Dirichlet process (DP),
using the multivariate delta method.
• Under a non-informative DP prior, this sandwich estimator closely
approximates White's (1980) original (unweighted) sandwich estimator.
Introduction
• For a particular choice of baseline distribution in the MDP,
the WLS estimator corresponds to a generalized ridge regression estimator,
which, as is well known, can handle:
  - multicollinear or singular covariate design matrices, including p > n settings,
    while shrinking the coefficient estimates of irrelevant covariates towards zero;
  - the fitting of nonlinear regressions via basis expansions.
• Bayesian ridge regression is tough to beat in terms of prediction
(Griffin & Brown, 2013, Bayesian Analysis).
• The current study is the first to:
  - make connections between the DP and ridge regression;
  - provide heteroscedastic-consistent posterior covariance
    estimation for the coefficients in (Bayesian) ridge regression.
• I will then illustrate the MDP functional methodology through the
analysis of simulated data sets, and two real data sets.
• These illustrations will also include Vibration of Effects (VoE) analyses
(Ioannidis, 2008: “Why most discovered true associations are inflated”, Epid.),
to investigate how the covariate effect changes as a function
of what other covariates are included in the model, and choice of prior.
• All of these posterior quantities of this MDP functional methodology
(the WLS and sandwich estimators)
are analytically manageable and permit fast computations for large data sets.
• This work on MDP functional methodology
provides a contribution to likelihood-free approximate Bayesian inference.
• This is an important step, because full MCMC inference and (esp. likelihood)
computations with DP (infinite-) mixture models can be non-trivial in practice.
• Originally, the DP prior was introduced with the motivation (Ferguson, 1973, p.209)
that the DP posterior distribution:
(1) "should be manageable analytically," and that
(2) the "support of the prior distribution should be large-with respect to some
suitable topology on the space of probability distributions on the sample space."
• This seminal paper then provided explicit analytical (closed form)
solutions to a list of nonparametric statistical problems based on the DP posterior,
including the estimation of a distribution function, median, quantiles, variance,
covariance, and the probability that one variable exceeds another.
• The MDP functional methodology (of this talk) adds to this list.
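MDP Model (prior)
• Data: $Z_n = (z_i = (x_i^\top, y_i)^\top)_{i=1}^{n} = (X, y) = (X_n, y_n)$;
n observations of Z = (X, Y), sample space $\mathcal{Z} \subseteq \mathbb{R}^{K+1}$.
• $c_n \le n$ distinct clusters: $Z^*_{c_n} = (X^*_{c_n}, y^*_{c_n}) = (z^*_c = (x^{*\top}_c, y^*_c)^\top)_{c=1}^{c_n}$.
• Cluster frequencies: $\mathbf{n}_{c_n} = (n_1, \ldots, n_{c_n})^\top$, with $\sum_{c=1}^{c_n} n_c = n$.
• The n observations are assumed exchangeable from the MDP model:
$$z_i \mid F \overset{iid}{\sim} F,\ i = 1, \ldots, n; \qquad F \mid \alpha \sim \mathrm{DP}(\alpha, F_0); \qquad F_0(z) = N_{K+1}(x_i, y_i \mid m_z, V_z); \qquad \alpha \sim \pi(\alpha),\ \alpha > 0.$$
• A priori, $\mathbb{E}[F(\cdot)] = F_0(\cdot)$, and $\mathbb{V}[F(\cdot)] = F_0(\cdot)[1 - F_0(\cdot)]/(\alpha + 1)$.
• Ridge baseline prior:
$$F_0(z) = N(x_1 \mid 0, 0) \prod_{k=2}^{K} N(x_k \mid 0, v_{xk})\, N(y \mid 0, 0).$$
Unit ridge baseline prior: has $v_{xk} = 1$, for k = 2, ..., K.
• $F \mid \alpha \sim \mathrm{DP}(\alpha, F_0) \Rightarrow (F(B_1), \ldots, F(B_k)) \mid \alpha \sim \mathrm{Di}_k(\alpha F_0(B_1), \ldots, \alpha F_0(B_k))$,
for all k ≥ 1 partitions B1,…,Bk of $\mathcal{Z}$.
• π(α) is either a U(0, ξ) prior, or
a truncated Cauchy-type prior: π(α) = 1(0 < α < ξ)/(α + 1)² (Nandram & Yin, 2016).
• $\Pr(C_n = k \mid \alpha) = s_n(k)\, \alpha^k\, \Gamma(\alpha)/\Gamma(\alpha + n)$, α > 0
(the $s_n(k)$ are the signless Stirling numbers).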
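The prior on the number of clusters can be evaluated exactly for small n. The sketch below assumes the standard Antoniak form Pr(C_n = k | α) = s_n(k) α^k Γ(α)/Γ(α + n); the function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def signless_stirling_row(n):
    """Return [s_n(0), ..., s_n(n)], signless Stirling numbers of the first kind."""
    row = [1]  # s_0(0) = 1
    for m in range(n):
        # Recurrence: s_{m+1}(k) = m * s_m(k) + s_m(k - 1)
        nxt = [0] * (len(row) + 1)
        for k, value in enumerate(row):
            nxt[k] += m * value
            nxt[k + 1] += value
        row = nxt
    return row


def cluster_count_pmf(n, alpha):
    """Pr(C_n = k | alpha) for k = 1..n (assumed Antoniak form)."""
    alpha = Fraction(alpha)
    rising = Fraction(1)
    for j in range(n):
        # alpha (alpha + 1) ... (alpha + n - 1) = Gamma(alpha + n) / Gamma(alpha)
        rising *= alpha + j
    s = signless_stirling_row(n)
    return {k: s[k] * alpha**k / rising for k in range(1, n + 1)}


pmf = cluster_count_pmf(n=5, alpha=Fraction(1, 2))
```

Exact fractions are used so that the probabilities sum to one without rounding error; for large n a log-scale floating-point version would be more practical.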
MDP Model (posterior)
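• Conditional posterior:
$$F \mid Z_n, \alpha \sim \mathrm{DP}\big(\alpha + n,\ \mathbb{E}[F \mid Z_n, \alpha]\big),$$
so that $(F(B_1), \ldots, F(B_k)) \mid Z_n, \alpha$ is Dirichlet for
all k ≥ 1 partitions B1,…,Bk of $\mathcal{Z}$.
• DP conditional posterior expectation (predictive; Pólya urn scheme):
$$\mathbb{E}[F(B) \mid Z_n, \alpha] = \sum_{c=1}^{c_n} \frac{n_c}{\alpha + n}\, \delta_{z^*_c}(B) + \frac{\alpha}{\alpha + n}\, F_0(B).$$
• DP conditional posterior variance:
$$\mathbb{V}[F(B) \mid Z_n, \alpha] = \frac{\mathbb{E}[F(B) \mid Z_n, \alpha]\,\{1 - \mathbb{E}[F(B) \mid Z_n, \alpha]\}}{\alpha + n + 1}.$$
• α posterior: $\pi(\alpha \mid Z_n) = \pi(\alpha \mid c_n) \propto \pi(\alpha)\, \alpha^{c_n}\, \Gamma(\alpha)/\Gamma(\alpha + n)$.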
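As a small numerical illustration of the conditional posterior moments of F(B), the sketch below uses the standard DP results (posterior mean as a Pólya urn mixture of empirical and baseline mass, variance scaled by α + n + 1). The function name and the example inputs are hypothetical.

```python
def dp_posterior_moments(n_in_B, n, alpha, F0_B):
    """Posterior mean and variance of F(B) under a DP(alpha, F0) prior.

    n_in_B: number of the n observations falling in the set B
    F0_B:   baseline probability F0(B)
    """
    mean = (n_in_B + alpha * F0_B) / (alpha + n)
    var = mean * (1.0 - mean) / (alpha + n + 1.0)
    return mean, var


# Example: 3 of 10 observations in B, alpha = 2, F0(B) = 0.5.
mean, var = dp_posterior_moments(n_in_B=3, n=10, alpha=2.0, F0_B=0.5)
```

As α shrinks towards zero the mean tends to the empirical proportion n_in_B / n, which is the non-informative case discussed later in the talk.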
Linear Regression Theory (Basic Review)
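• Data: $Z_n = (X, y)$.
• $c_n \le n$ distinct clusters: $Z^*_{c_n} = (X^*_{c_n}, y^*_{c_n})$.
• Cluster frequencies: $\mathbf{n}_{c_n} = (n_1, \ldots, n_{c_n})^\top$.
• Regression equation: Yi = β1xi1 + ⋯ + βKxiK + εi , i = 1, … , n.
• Regression errors: $\varepsilon_i = y_i - x_i^\top \beta$, i = 1, …, n.
• Corresponding variances: $\sigma_i^2 = \mathbb{V}(\varepsilon_i \mid x_i)$, i = 1, …, n.
• Assuming exogeneity: $\mathbb{E}(\varepsilon_i \mid x_i) = 0$.
• OLS estimator: $\hat\beta = (X^\top X)^{-1} X^\top y$.
• Sampling covariance matrix:
$$\mathbb{V}(\hat\beta \mid X) = (X^\top X)^{-1} X^\top \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2)\, X (X^\top X)^{-1}.$$
• Under homoscedasticity ($\sigma_i^2 = \sigma^2$, i = 1, …, n): $\hat{\mathbb{V}}(\hat\beta) = \hat\sigma^2 (X^\top X)^{-1}$
($\hat\sigma^2 = \sum_{i=1}^{n} \hat u_i^2/(n - K)$, $\hat u_i = y_i - x_i^\top \hat\beta$; consistent only under homoscedasticity).
• Sandwich estimator (White, 1980):
$$\mathrm{HC0} = (X^\top X)^{-1} X^\top \mathrm{diag}(\hat u_1^2, \ldots, \hat u_n^2)\, X (X^\top X)^{-1}, \qquad \hat u_i = y_i - x_i^\top \hat\beta,\ i = 1, \ldots, n.$$
Consistent under hetero/homoscedasticity. (HC0 and $\hat\sigma^2 (X^\top X)^{-1}$ agree asymptotically under homoscedasticity.)
• Under standard regularity conditions, $\hat\beta$ is consistent and asymptotically normal as n → ∞.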
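A minimal NumPy sketch of the two covariance estimators reviewed above: the classical estimator and White's (1980) HC0 sandwich. The function name `ols_hc0` and the simulated data are illustrative assumptions; the error variance is chosen to echo the exp(a x + a x²) form used later in the simulation study.

```python
import numpy as np


def ols_hc0(X, y):
    """OLS coefficients with classical and HC0 (White, 1980) covariance estimates."""
    n, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Classical estimator: sigma^2 (X'X)^{-1}, valid under homoscedasticity only.
    V_classical = (resid @ resid) / (n - K) * XtX_inv
    # HC0 sandwich: (X'X)^{-1} X' diag(resid^2) X (X'X)^{-1}.
    meat = (X * resid[:, None] ** 2).T @ X
    V_hc0 = XtX_inv @ meat @ XtX_inv
    return beta, V_classical, V_hc0


rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])
# Heteroscedastic errors: the variance grows with x.
sigma = np.sqrt(np.exp(2.0 * x + 2.0 * x**2))
y = 1.0 + 1.0 * x + sigma * rng.standard_normal(n)
beta, V_classical, V_hc0 = ols_hc0(X, y)
```

Comparing the diagonals of `V_classical` and `V_hc0` on such data gives a quick sense of how much the homoscedastic formula can misstate the coefficient variances.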
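Matrix Identities
• Observed data: $Z_n = (X, y)$,
has empirical dist. $\hat F_n$ (and clusters $Z^*_{c_n}$)
with mean $\hat m_z = (\hat m_x^\top, \hat m_Y)^\top$ and (K+1) × (K+1) covariance matrix $\hat V_z$,
including the K × K covariance matrix $\hat V_x$ of X,
and the K × 1 vector $\hat V_{xY}$ of covariances between X and y (resp.).
• Imaginary data: $(X_S, y_S)$ with eff. sample size α,
has MDP baseline distribution, F0, with mean $m_z = (m_x^\top, m_Y)^\top$,
with (K+1) × (K+1) covariance matrix $V_z$, with K × K covariance matrix $V_x$
of X, and the K × 1 vector $V_{xY}$ of covariances between X and Y.
• The OLS estimator from the combined data set, $(\mathbb{X}, \mathbb{Y})$, is WLS:
$$\begin{aligned}
\bar\beta
&= \left[\begin{pmatrix} X \\ X_S \end{pmatrix}^{\!\top} \mathrm{diag}\big(\mathbf{1}_n, (\alpha/S)\mathbf{1}_S\big) \begin{pmatrix} X \\ X_S \end{pmatrix}\right]^{-1} \begin{pmatrix} X \\ X_S \end{pmatrix}^{\!\top} \mathrm{diag}\big(\mathbf{1}_n, (\alpha/S)\mathbf{1}_S\big) \begin{pmatrix} y \\ y_S \end{pmatrix} \\
&= \left[\begin{pmatrix} X^*_{c_n} \\ X_S \end{pmatrix}^{\!\top} \mathrm{diag}\big(\mathbf{n}_{c_n}, (\alpha/S)\mathbf{1}_S\big) \begin{pmatrix} X^*_{c_n} \\ X_S \end{pmatrix}\right]^{-1} \begin{pmatrix} X^*_{c_n} \\ X_S \end{pmatrix}^{\!\top} \mathrm{diag}\big(\mathbf{n}_{c_n}, (\alpha/S)\mathbf{1}_S\big) \begin{pmatrix} y^*_{c_n} \\ y_S \end{pmatrix} \\
&= \big[\mathbb{X}^\top \mathrm{diag}\big(\mathbf{n}_{c_n}, (\alpha/S)\mathbf{1}_S\big)\, \mathbb{X}\big]^{-1} \mathbb{X}^\top \mathrm{diag}\big(\mathbf{n}_{c_n}, (\alpha/S)\mathbf{1}_S\big)\, \mathbb{Y} \\
&= \big[\mathbb{X}^\top \tfrac{1}{\alpha + n} \mathrm{diag}\big(\mathbf{n}_{c_n}, (\alpha/S)\mathbf{1}_S\big)\, \mathbb{X}\big]^{-1} \mathbb{X}^\top \tfrac{1}{\alpha + n} \mathrm{diag}\big(\mathbf{n}_{c_n}, (\alpha/S)\mathbf{1}_S\big)\, \mathbb{Y} \\
&= \big[n(\hat V_x + \hat m_x \hat m_x^\top) + \alpha (V_x + m_x m_x^\top)\big]^{-1} \big[n(\hat V_{xY} + \hat m_Y \hat m_x) + \alpha (V_{xY} + m_Y m_x)\big].
\end{aligned}$$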
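Matrix Identities
• The OLS estimator from the combined data set,
$(\mathbb{X}, \mathbb{Y}) = \left(\begin{pmatrix} X^*_{c_n} \\ X_S \end{pmatrix}, \begin{pmatrix} y^*_{c_n} \\ y_S \end{pmatrix}\right)$, is WLS:
$$\bar\beta = \big[\mathbb{X}^\top \mathrm{diag}\big(\mathbf{n}_{c_n}, (\alpha/S)\mathbf{1}_S\big)\, \mathbb{X}\big]^{-1} \mathbb{X}^\top \mathrm{diag}\big(\mathbf{n}_{c_n}, (\alpha/S)\mathbf{1}_S\big)\, \mathbb{Y}.$$
• Diagonal elements of $(\alpha/S)\mathbf{1}_S$ sum to α, the prior sample size of the DP.
• If α is a positive integer with S = α,
then: $(\alpha/S) I_S = (\alpha/\alpha) I_\alpha = I_\alpha$,
and
$$\bar\beta = (\mathbb{X}_n^\top \mathbb{X}_n)^{-1} \mathbb{X}_n^\top \mathbb{Y}_n, \quad \text{where } (\mathbb{X}_n, \mathbb{Y}_n) = \left(\begin{pmatrix} X \\ X_S \end{pmatrix}, \begin{pmatrix} y \\ y_S \end{pmatrix}\right).$$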
Matrix Identities
• The OLS estimator from the combined data set,
$(\mathbb{X}, \mathbb{Y}) = \left(\begin{pmatrix} X^*_{c_n} \\ X_S \end{pmatrix}, \begin{pmatrix} y^*_{c_n} \\ y_S \end{pmatrix}\right)$, is WLS:
$$\bar\beta = \big[\mathbb{X}^\top \mathrm{diag}\big(\mathbf{n}_{c_n}, (\alpha/S)\mathbf{1}_S\big)\, \mathbb{X}\big]^{-1} \mathbb{X}^\top \mathrm{diag}\big(\mathbf{n}_{c_n}, (\alpha/S)\mathbf{1}_S\big)\, \mathbb{Y}.$$
• For any α > 0, possibly S ≠ α, an extension of the fractional imputation
method (e.g., Kim & Kim, 2012) can be used to simulate the imaginary data set.
• Draw a large number (S) of samples $z_{c_n+1,s} = (x_{c_n+1,s}^\top, y_{c_n+1,s})^\top \sim F_0$, for s = 1, …, S.
• $\operatorname{plim}_{S \to \infty} \frac{1}{S} \sum_{s=1}^{S} z_{c_n+1,s}\, z_{c_n+1,s}^\top = V_z + m_z m_z^\top$, by the law of large numbers.
• Then plug in $X_S := (x_{c_n+1,s}^\top)_{s=1}^{S}$, $y_S := (y_{c_n+1,s})_{s=1}^{S}$, into the OLS estimator above.
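Matrix Identities
• Under the ridge baseline prior, the OLS estimator also satisfies the equalities:
$$\bar\beta = \big[\mathbb{X}^\top \mathrm{diag}(\mathbf{n}_{c_n}, \alpha \mathbf{1}_K)\, \mathbb{X}\big]^{-1} \mathbb{X}^\top \mathrm{diag}(\mathbf{n}_{c_n}, \alpha \mathbf{1}_K)\, \mathbb{Y} = \big[X^\top X + \alpha\, \mathrm{diag}(0, v_2, \ldots, v_K)\big]^{-1} X^\top y.$$
• Uses imaginary data $(X_K, y_K)$, obtained by deterministic single imputation.
• The OLS solution above, obtained by the MDP ridge baseline prior,
is the generalized ridge regression estimator with shrinkage parameters
α ⋅ (0, v₂,…,vK); and if further v₂ = ⋯ = vK = 1, then the OLS solution above
is the ridge regression estimator (Hoerl & Kennard, 1970) with
shrinkage parameter α (Hastie et al., 2009, p. 96, observed the latter
without any reference to the DP).
• Under a ridge baseline prior, we may just use:
$(\mathbb{X}, \mathbb{Y}) = \left(\begin{pmatrix} X^*_{c_n} \\ X_K \end{pmatrix}, \begin{pmatrix} y^*_{c_n} \\ y_K \end{pmatrix}\right)$.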
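The link between the augmented-data WLS estimator and (generalized) ridge regression can be checked numerically. The sketch below is written under stated assumptions: all n observations are distinct (so every cluster frequency is 1), and the K imaginary rows are taken to be sqrt(v_k) times the k-th unit vector with zero responses and weight α each. That choice of imaginary rows is an illustrative assumption, not a form confirmed by the slides.

```python
import numpy as np


def wls(X, y, w):
    """Weighted least squares: (X' diag(w) X)^{-1} X' diag(w) y."""
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)


rng = np.random.default_rng(1)
n, K, alpha = 50, 4, 2.5
X = np.column_stack([np.ones(n), rng.standard_normal((n, K - 1))])
y = X @ np.array([1.0, 0.5, 0.0, -0.5]) + rng.standard_normal(n)
v = np.array([0.0, 1.0, 1.0, 1.0])  # (0, v_2, ..., v_K): no shrinkage on the intercept

# Assumed imaginary data: row k is sqrt(v_k) * e_k, responses are zero.
X_K = np.diag(np.sqrt(v))
y_K = np.zeros(K)

X_comb = np.vstack([X, X_K])
y_comb = np.concatenate([y, y_K])
w = np.concatenate([np.ones(n), np.full(K, alpha)])

beta_wls = wls(X_comb, y_comb, w)
beta_ridge = np.linalg.solve(X.T @ X + alpha * np.diag(v), X.T @ y)
```

With v₂ = ⋯ = v_K = 1, as here, `beta_ridge` is the ordinary ridge estimator with shrinkage parameter α and an unpenalized intercept.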
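MDP Bootstrap Procedure
• Given α, a bootstrap draw b of a random c.d.f. F* can be constructed by taking:
$$F^*_b(t) = \sum_{c=1}^{c_n+1} \frac{n^*_{cb}}{n + \alpha + 1}\, \mathbf{1}(z^*_c \le t) + \frac{\alpha - \lfloor\alpha\rfloor}{n + \alpha + 1}\, \mathbf{1}\big(z^*_{n+1+\lceil\alpha\rceil,\, b} \le t\big),$$
given n + ceil(α) + 1 draws $z^*_{cb}$ from
$$\mathbb{E}[F \mid Z_n, \alpha] = \sum_{c=1}^{c_n} \frac{n_c}{\alpha + n}\, \delta_{z^*_c} + \frac{\alpha}{\alpha + n}\, F_0,$$
and a posterior draw α ~ π(α | Zn).
• This bootstrap procedure (for each stage b) is the same as taking a multinomial draw:
$$(n^*_{cb})_{c=1}^{c_n+1} \sim \mathrm{Mult}\Big(n + \lceil\alpha\rceil + 1;\ \frac{n_1}{\alpha + n}, \ldots, \frac{n_c}{\alpha + n}, \ldots, \frac{n_{c_n}}{\alpha + n}, \frac{\alpha}{\alpha + n}\Big),$$
and then drawing zc* with probability nc / (α + n) [nc* times], for c = 1,…, cn,
and drawing z*c(n)+1 ~ F0 with probability α / (α + n) [nc(n)+1* times].
• This bootstrap sampling process may be repeated for b = 1,…, B, for large B.
• Then:
  - $\mathbb{E}[F^*(B) \mid Z_n, \alpha] \doteq \mathbb{E}[F(B) \mid Z_n, \alpha]$ (≐ denotes near equality);
  - $\mathbb{V}[F^*(B) \mid Z_n, \alpha] \doteq \mathbb{V}[F(B) \mid Z_n, \alpha]$, for all $B \in \mathcal{B}(\mathcal{Z})$;
  - F has twice the skewness of F*, but the skewness is small anyway;
  - G(φ(F) | Zn, α) ≐ G*(φ(F*) | Zn, α) (c.d.f.s), for any well-behaved functional φ.
All of the above hold marginally over the posterior π(α | Zn).
See Hjort (1985).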
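MDP Bootstrap Procedure
• Then, for the OLS functional, we may adopt the bootstrap sampling scheme:
$$\beta(F^*) = \beta(\mathbf{n}^*) = \big(\mathbb{X}^\top \mathrm{diag}(\mathbf{n}^*)\, \mathbb{X}\big)^{-1} \mathbb{X}^\top \mathrm{diag}(\mathbf{n}^*)\, \mathbb{Y},$$
$$\mathbf{n}^* = \frac{n + \alpha + 1}{n + \lceil\alpha\rceil + 1}\, (n^*_{cb})_{c=1}^{c_n+S},$$
$$(n^*_{cb})_{c=1}^{c_n+S} \mid Z_n, \alpha \sim \mathrm{Mu}_{c_n+S}\Big(n + \lceil\alpha\rceil + 1;\ \frac{n_1}{\alpha + n}, \ldots, \frac{n_{c_n}}{\alpha + n},\ \frac{\alpha/S}{\alpha + n} \otimes \mathbf{1}_S^\top\Big),$$
$$\alpha \mid Z_n \sim \pi(\alpha \mid c_n).$$
• The bottom S (or K) rows of (𝕏, 𝕐) already contain the
imaginary observations sampled from the MDP baseline, F0.
• The factor (n + α + 1) / (n + ceil(α) + 1) addresses continuous α.
• We have:
$$\mathbb{E}(\mathbf{n}^* \mid Z_n, \alpha) = \bar{\mathbf{n}}_\alpha = (n + \alpha + 1) \Big(\frac{n_1}{\alpha + n}, \ldots, \frac{n_{c_n}}{\alpha + n},\ \frac{\alpha/S}{\alpha + n} \otimes \mathbf{1}_S^\top\Big)^{\!\top},$$
$$\mathbb{V}(\mathbf{n}^* \mid Z_n, \alpha) = \mathrm{diag}(\bar{\mathbf{n}}_\alpha) - (n + \alpha + 1)^{-1}\, \bar{\mathbf{n}}_\alpha \bar{\mathbf{n}}_\alpha^\top.$$
• For a ridge baseline prior, use (α/(α + n)) ⊗ 1K⊺ instead of ({α/S}/(α + n)) ⊗ 1S⊺.
• Lancaster (2003) made rather similar observations as above (without using α):
  - Classical (Efron) bootstrap uses: n* ~ Multinomial(n; n1/n,…, nc(n)/n);
  - Bayesian bootstrap uses: n* ~ Dirichlet(1₁,…,1ₙ).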
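The conditional moments of the bootstrap weight vector n* can be computed directly, and a single rescaled multinomial draw is easy to simulate. This is a sketch under the general (non-ridge) baseline with S imaginary rows sharing total mass α; the function names and the example cluster frequencies are hypothetical. The covariance follows the expression diag(n̄) − n̄n̄ᵀ/(n + α + 1); for non-integer α the exact covariance of the rescaled draw differs from it by the factor (n + α + 1)/(n + ceil(α) + 1), which is close to 1.

```python
import numpy as np


def cell_probabilities(counts, alpha, S):
    """Multinomial cell probabilities: observed clusters, then S imaginary rows."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    return np.concatenate([counts, np.full(S, alpha / S)]) / (alpha + n), n


def nstar_moments(counts, alpha, S):
    """Mean vector and covariance matrix of the bootstrap weights n* given alpha."""
    p, n = cell_probabilities(counts, alpha, S)
    n_bar = (n + alpha + 1.0) * p
    V = np.diag(n_bar) - np.outer(n_bar, n_bar) / (n + alpha + 1.0)
    return n_bar, V


def draw_nstar(counts, alpha, S, rng):
    """One multinomial draw of n*, rescaled to handle non-integer alpha."""
    p, n = cell_probabilities(counts, alpha, S)
    trials = int(n + np.ceil(alpha) + 1)
    factor = (n + alpha + 1.0) / trials
    return factor * rng.multinomial(trials, p)


counts = [3, 1, 2, 4]  # hypothetical cluster frequencies n_1, ..., n_{c_n}
n_bar, V = nstar_moments(counts, alpha=1.5, S=3)
draw = draw_nstar(counts, alpha=1.5, S=3, rng=np.random.default_rng(2))
```

Each draw can be plugged into the WLS formula as the weight vector to obtain one bootstrap replicate of the coefficient vector.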
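MDP Bootstrap Procedure
• Recall:
$$\mathbb{E}(\mathbf{n}^* \mid Z_n, \alpha) = \bar{\mathbf{n}}_\alpha = (n + \alpha + 1) \Big(\frac{n_1}{\alpha + n}, \ldots, \frac{n_{c_n}}{\alpha + n},\ \frac{\alpha/S}{\alpha + n} \otimes \mathbf{1}_S^\top\Big)^{\!\top},$$
$$\mathbb{V}(\mathbf{n}^* \mid Z_n, \alpha) = \mathrm{diag}(\bar{\mathbf{n}}_\alpha) - (n + \alpha + 1)^{-1}\, \bar{\mathbf{n}}_\alpha \bar{\mathbf{n}}_\alpha^\top.$$
• Again, for a ridge baseline prior, use (α/(α + n)) ⊗ 1K⊺ instead of ({α/S}/(α + n)) ⊗ 1S⊺.
• π(α) is either a U(0, ξ) prior, or a Cauchy-type prior truncated above by ξ.
Let A be a fine grid of α defined over the support of the prior, π(α).
(The grid A may range from .005 to ξ in intervals of .005.)
• Then, marginalizing over the posterior, π(α | cn), and by the laws of
total expectation and total covariance, the marginal expectation and
covariance matrix can be approximated and rapidly computed by:
$$\mathbb{E}(\mathbf{n}^* \mid Z_n) = \bar{\mathbf{n}} \approx \sum_{\alpha \in A} \mathbb{E}(\mathbf{n}^* \mid Z_n, \alpha)\, \pi(\alpha \mid c_n),$$
$$\mathbb{V}(\mathbf{n}^* \mid Z_n) \approx \sum_{\alpha \in A} \mathbb{V}(\mathbf{n}^* \mid Z_n, \alpha)\, \pi(\alpha \mid c_n) + \sum_{\alpha \in A} \mathbb{E}(\mathbf{n}^* \mid Z_n, \alpha)\, \mathbb{E}(\mathbf{n}^* \mid Z_n, \alpha)^\top \pi(\alpha \mid c_n) - \mathbb{E}(\mathbf{n}^* \mid Z_n)\, \mathbb{E}(\mathbf{n}^* \mid Z_n)^\top.$$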
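The grid marginalization over α can be sketched as follows. The code assumes the truncated Cauchy-type prior 1(0 < α < ξ)/(α + 1)² with ξ = 3, the standard likelihood kernel α^{c_n} Γ(α)/Γ(α + n) for the number of clusters, and grid weights normalized to sum to one. Function names and the example cluster frequencies are hypothetical.

```python
import numpy as np
from math import lgamma


def alpha_posterior_on_grid(c_n, n, xi=3.0, step=0.005):
    """Normalized grid approximation to pi(alpha | c_n)."""
    grid = np.arange(step, xi + step / 2, step)
    log_prior = -2.0 * np.log(grid + 1.0)  # truncated Cauchy-type prior kernel
    log_lik = np.array([c_n * np.log(a) + lgamma(a) - lgamma(a + n) for a in grid])
    log_post = log_prior + log_lik
    post = np.exp(log_post - log_post.max())  # subtract the max for stability
    return grid, post / post.sum()


def marginal_nstar_moments(counts, S, grid, post):
    """Marginal mean and covariance of n* by total expectation and covariance."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    dim = counts.size + S
    mean = np.zeros(dim)
    second = np.zeros((dim, dim))  # accumulates E[V_alpha + n_bar_alpha n_bar_alpha']
    for a, w in zip(grid, post):
        p = np.concatenate([counts, np.full(S, a / S)]) / (a + n)
        n_bar = (n + a + 1.0) * p
        V = np.diag(n_bar) - np.outer(n_bar, n_bar) / (n + a + 1.0)
        mean += w * n_bar
        second += w * (V + np.outer(n_bar, n_bar))
    return mean, second - np.outer(mean, mean)


counts = [3, 1, 2, 4]  # hypothetical cluster frequencies
grid, post = alpha_posterior_on_grid(c_n=len(counts), n=sum(counts))
mean, cov = marginal_nstar_moments(counts, S=3, grid=grid, post=post)
```

Because everything is a weighted sum over a fixed grid, the cost is linear in the grid size and involves no MCMC, which is the source of the speed claimed for the method.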
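Multivariate Delta Method
• We now consider a deterministic approach to evaluating the distribution of a
functional (e.g., β(F*)) of the MDP posterior, using a prescribed error of
approximation (as in research on DP functionals; Regazzini et al., 2002).
• We employ the multivariate delta method to approximate the (MDP bootstrap)
posterior distribution of β(F*) via a Taylor series approximation β*(n*) ≈ β(n*)
of β(n*) around the posterior mean, $\bar{\mathbf{n}}$. With ∂β(n)/∂n a K × (cn + S) matrix of
first derivatives (S = K for ridge), this Taylor series approximation is given by:
$$\begin{aligned}
\beta^*(\mathbf{n}^*) &= \beta(\bar{\mathbf{n}}) + \frac{\partial \beta(\bar{\mathbf{n}})}{\partial \mathbf{n}}\, (\mathbf{n}^* - \bar{\mathbf{n}}) \\
&= \beta(\bar{\mathbf{n}}) + \Big\{[\mathbb{Y} - \mathbb{X}\beta(\bar{\mathbf{n}})]^\top \otimes \big(\mathbb{X}^\top \mathrm{diag}(\bar{\mathbf{n}})\, \mathbb{X}\big)^{-1} \mathbb{X}^\top\Big\}\, \frac{\partial\, \mathrm{vec}\, \mathrm{diag}(\mathbf{n})}{\partial \mathbf{n}}\, (\mathbf{n}^* - \bar{\mathbf{n}}) \\
&= \beta(\bar{\mathbf{n}}) + \Big\{\mathbf{u}(\bar{\mathbf{n}}) \otimes \mathbb{X}\big(\mathbb{X}^\top \mathrm{diag}(\bar{\mathbf{n}})\, \mathbb{X}\big)^{-1}\Big\}^{\!\top} \big[\mathbf{e}_1 \otimes \mathbf{e}_1, \ldots, \mathbf{e}_{c_n+S} \otimes \mathbf{e}_{c_n+S}\big]\, (\mathbf{n}^* - \bar{\mathbf{n}}) \\
&= \beta(\bar{\mathbf{n}}) + R(\bar{\mathbf{n}})^\top (\mathbf{n}^* - \bar{\mathbf{n}}),
\end{aligned}$$
where:
$$\mathbf{u}(\bar{\mathbf{n}}) = \mathbb{Y} - \mathbb{X}\beta(\bar{\mathbf{n}}); \qquad \mathbf{e}_c = \big(\mathbf{1}(c = 1), \ldots, \mathbf{1}(c = c_n + S)\big)^\top;$$
$$R(\bar{\mathbf{n}}) = \big[\mathbf{e}_1 \otimes \mathbf{e}_1, \ldots, \mathbf{e}_{c_n+S} \otimes \mathbf{e}_{c_n+S}\big]^\top \Big\{\mathbf{u}(\bar{\mathbf{n}}) \otimes \mathbb{X}\big(\mathbb{X}^\top \mathrm{diag}(\bar{\mathbf{n}})\, \mathbb{X}\big)^{-1}\Big\} = \mathrm{diag}(\mathbf{u}(\bar{\mathbf{n}}))\, \mathbb{X}\big(\mathbb{X}^\top \mathrm{diag}(\bar{\mathbf{n}})\, \mathbb{X}\big)^{-1}, \text{ which is } (c_n + S) \times K.$$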
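Multivariate Delta Method
• Then the posterior distribution of n* implies that the approximation has exact
posterior mean given by the WLS estimator:
$$\mathbb{E}[\beta^*(\mathbf{n}^*) \mid Z_n] = \beta(\bar{\mathbf{n}}) = \big(\mathbb{X}^\top \mathrm{diag}(\bar{\mathbf{n}})\, \mathbb{X}\big)^{-1} \mathbb{X}^\top \mathrm{diag}(\bar{\mathbf{n}})\, \mathbb{Y},$$
and exact covariance matrix given by the heteroscedastic-consistent sandwich
estimator for WLS (Greene, 2012, p. 319):
$$\begin{aligned}
\mathbb{V}[\beta^*(\mathbf{n}^*) \mid Z_n] &= \frac{\partial \beta(\bar{\mathbf{n}})}{\partial \mathbf{n}}\, \mathbb{V}(\mathbf{n}^* \mid Z_n)\, \Big[\frac{\partial \beta(\bar{\mathbf{n}})}{\partial \mathbf{n}}\Big]^{\!\top} = R(\bar{\mathbf{n}})^\top\, \mathbb{V}(\mathbf{n}^* \mid Z_n)\, R(\bar{\mathbf{n}}) \\
&= \big(\mathbb{X}^\top \mathrm{diag}(\bar{\mathbf{n}})\, \mathbb{X}\big)^{-1} \mathbb{X}^\top \mathrm{diag}\big(\bar{\mathbf{n}} \circ \mathbf{u}(\bar{\mathbf{n}})^2\big)\, \mathbb{X} \big(\mathbb{X}^\top \mathrm{diag}(\bar{\mathbf{n}})\, \mathbb{X}\big)^{-1},
\end{aligned}$$
where ∘ denotes the Hadamard product operator.
Poirier (2011, Econometric Reviews) obtained a similar (but not identical) result.
• Then the posterior variances from this matrix,
$$\mathrm{diag}\big(\mathbb{V}[\beta^*(\mathbf{n}^*) \mid Z_n]\big) = \big(\mathbb{V}_1(\beta^* \mid Z_n), \ldots, \mathbb{V}_K(\beta^* \mid Z_n)\big)^\top,$$
provide the asymptotic-consistent 95% posterior credible interval,
$\beta_k(\bar{\mathbf{n}}) \pm 1.96\, \mathbb{V}_k(\beta^* \mid Z_n)^{1/2}$, for k = 1,…,K.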
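Non-informative DP prior case
• Furthermore, suppose that:
  - the MDP model assumes a non-informative DP prior, defined by the specification α → 0;
  - the number of distinct clusters in the data is cn = n, so that $\bar{\mathbf{n}} = \bar{\mathbf{n}}_{\alpha \to 0} = (\mathbf{1}_n^\top, \mathbf{0}_S^\top)^\top$ (S = K for ridge baseline prior).
• Then the conditional posterior distribution of the MDP is Dirichlet (Di),
$$(\theta_1, \ldots, \theta_{c_n}) \mid Z_n \sim \mathrm{Di}_{c_n}(\mathbf{n}_{c_n}),$$
with support points the cn observed cluster values $(z^*_c)_{c=1}^{c_n = n}$,
where the random probabilities, θc = Pr(z = zc∗), c = 1, …, cn,
coincide with those of the Bayesian Bootstrap (Rubin, 1981).
• Then,
  - the DP posterior predictive distribution function reduces to:
$$\Pr(z_{n+1} \in B \mid Z_n) = \hat F_n(B) = \sum_{c=1}^{c_n} \frac{n_c}{n}\, \delta_{z^*_c}(B), \quad \forall B \in \mathcal{B}(\mathcal{Z}),$$
    the distribution function used by the classical bootstrap (Efron, 1979);
  - the posterior mean (WLS) estimate of β is approximately equal to the usual OLS estimator, $\hat\beta = \beta(\bar{\mathbf{n}}_{\alpha \to 0}) = (X^\top X)^{-1} X^\top y$;
  - and the posterior covariance matrix of β reduces to White's (1980) heteroscedastic-consistent covariance estimator:
$$\mathbb{V}[\beta^*(\mathbf{n}^*) \mid Z_n] = (X^\top X)^{-1} X^\top \mathrm{diag}\big(\mathbf{u}(\bar{\mathbf{n}}_{\alpha \to 0})^2\big)\, X (X^\top X)^{-1} = \mathrm{HC0}.$$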
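The reduction to OLS and HC0 in the non-informative limit is easy to confirm numerically: putting weight 1 on each observed row and weight 0 on every imaginary row makes the weighted sandwich coincide with White's estimator. The imaginary rows used below are arbitrary placeholders (an assumption for illustration), since they receive zero weight.

```python
import numpy as np


def weighted_sandwich(X, y, w):
    """WLS estimate and its sandwich covariance for a weight vector w."""
    Xw = X * w[:, None]
    bread = np.linalg.inv(Xw.T @ X)
    beta = bread @ (Xw.T @ y)
    u = y - X @ beta
    meat = (X * (w * u**2)[:, None]).T @ X
    return beta, bread @ meat @ bread


rng = np.random.default_rng(4)
n, K = 30, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, K - 1))])
y = X @ np.array([1.0, -1.0, 0.5]) + (1.0 + np.abs(X[:, 1])) * rng.standard_normal(n)

# Placeholder imaginary rows; their weight is zero in the alpha -> 0 limit.
X_comb = np.vstack([X, np.eye(K)])
y_comb = np.concatenate([y, np.zeros(K)])
w_limit = np.concatenate([np.ones(n), np.zeros(K)])
beta_limit, V_limit = weighted_sandwich(X_comb, y_comb, w_limit)

# Direct OLS and HC0 on the observed data only.
XtX_inv = np.linalg.inv(X.T @ X)
beta_ols = XtX_inv @ X.T @ y
resid = y - X @ beta_ols
V_hc0 = XtX_inv @ (X * resid[:, None] ** 2).T @ X @ XtX_inv
```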
Simulation Study (coverage rates)

                    a_h = 0 (0)       a_h = 1 (.05)     a_h = 2 (.1)      a_h = 2.5 (.15)
X dist.     n       c    u    h       c    u    h       c    u    h       c    u    h
U(0,1)     10     .59  .58  .83     .72  .71  .82     .87  .87  .79     .93  .92  .78
Ex(1)      10     .78  .78  .79     .74  .74  .73     .70  .69  .68     .66  .66  .64
N(0,25)    10     .81  .81  .82     .65  .65  .67     .64  .64  .67     .66  .66  .70
AR(1)      10     .82  .82  .81     .81  .80  .80     .78  .79  .78     .77  .77  .77
U(0,1)     20     .74  .74  .90     .84  .84  .89     .92  .92  .87     .95  .95  .86
Ex(1)      20     .86  .86  .86     .81  .81  .80     .75  .75  .75     .71  .71  .70
N(0,25)    20     .88  .88  .88     .84  .84  .85     .88  .88  .89     .90  .90  .91
AR(1)      20     .88  .88  .89     .87  .87  .87     .86  .86  .86     .85  .85  .86
U(0,1)     50     .88  .88  .93     .92  .92  .93     .94  .94  .92     .94  .94  .92
Ex(1)      50     .90  .90  .91     .86  .86  .86     .82  .82  .82     .82  .82  .82
N(0,25)    50     .92  .92  .92     .93  .93  .93     .96  .96  .97     .98  .98  .98
AR(1)      50     .93  .93  .93     .92  .92  .92     .91  .91  .92     .91  .91  .92
U(0,1)    100     .92  .92  .94     .94  .94  .94     .94  .94  .94     .94  .94  .94
Ex(1)     100     .92  .92  .92     .89  .89  .90     .89  .89  .89     .90  .90  .90
N(0,25)   100     .94  .94  .94     .96  .96  .96     .98  .98  .98     .99  .99  .99
AR(1)     100     .94  .94  .94     .93  .93  .94     .93  .93  .93     .93  .93  .93
U(0,1)    500     .94  .94  .95     .95  .95  .95     .95  .95  .95     .94  .94  .95
Ex(1)     500     .95  .95  .95     .93  .93  .93     .95  .95  .95     .96  .96  .96
N(0,25)   500     .95  .95  .95     .98  .98  .98     1.0  1.0  1.0     1.0  1.0  1.0
AR(1)     500     .95  .95  .95     .95  .95  .95     .95  .95  .95     .95  .95  .95

Models fitted to data:
c: MDP with ridge base and truncated Cauchy-type prior π(α) = 1(0 < α < ξ)/(α + 1)².
u: MDP with ridge base and U(α | 0, 3) prior.
h: HC0 (White, 1980).
Data generating model:
yi | xi, β, σi² ∼ N(β₀ + β₁xi, σi²);
β₀ = β₁ = 1,
σi² = exp(ahxi + ahxi²), i = 1,…,n.
U(0,1) X dist: ah = 0, 1, 2, 2.5.
Else: ah = 0, .05, .1, .15 (shown in parentheses in the table header).
4 × 4 × 5 simulation design:
• X distribution type (4), by heteroscedasticity level (4), by sample size level (5).
• 10,000 data sets simulated for each of 80 conditions.
Results: Rates are similar across models and approach .95, especially for larger n.
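One cell of a coverage study of this kind can be reproduced in miniature. The sketch below covers only the HC0 column, for the U(0,1) covariate design, and it assumes that coverage is measured for the slope β₁ with a normal-quantile 95% interval; the slides do not state which coefficient the reported rates refer to. The function name and the reduced replication count are illustrative choices.

```python
import numpy as np


def hc0_coverage(n, a_h, reps, rng, beta=(1.0, 1.0)):
    """Share of replications whose 95% HC0 interval covers the true slope."""
    hits = 0
    for _ in range(reps):
        x = rng.uniform(0.0, 1.0, size=n)
        X = np.column_stack([np.ones(n), x])
        sd = np.sqrt(np.exp(a_h * x + a_h * x**2))
        y = beta[0] + beta[1] * x + sd * rng.standard_normal(n)
        XtX_inv = np.linalg.inv(X.T @ X)
        b = XtX_inv @ X.T @ y
        u = y - X @ b
        V = XtX_inv @ (X * u[:, None] ** 2).T @ X @ XtX_inv
        half = 1.96 * np.sqrt(V[1, 1])
        hits += int(b[1] - half <= beta[1] <= b[1] + half)
    return hits / reps


rate_small = hc0_coverage(n=10, a_h=0.0, reps=2000, rng=np.random.default_rng(5))
rate_large = hc0_coverage(n=500, a_h=0.0, reps=2000, rng=np.random.default_rng(5))
```

With far fewer replications than the 10,000 per condition used in the study, the resulting rates are noisier but show the same qualitative pattern of undercoverage at small n.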
Simulation Study (rate means, s.d.s, by condition)
Models: c = MDP, ridge base, truncated Cauchy-type prior; u = MDP, ridge base, U(α | 0, 3) prior; h = HC0 (White, 1980).
For each condition, the first row gives the mean coverage rate and the second row the s.d.

                    a_h = 0 (0)       a_h = 1 (.05)     a_h = 2 (.1)      a_h = 2.5 (.15)
                    c    u    h       c    u    h       c    u    h       c    u    h
U(0,1)   mean     .81  .81  .91     .87  .87  .91     .92  .92  .89     .94  .94  .89
         s.d.     .13  .14  .04     .09  .09  .05     .03  .03  .06     .01  .01  .06
Ex(1)    mean     .88  .88  .89     .85  .85  .84     .82  .82  .82     .81  .81  .80
         s.d.     .06  .06  .06     .07  .07  .07     .09  .09  .09     .11  .11  .12
N(0,25)  mean     .90  .90  .90     .87  .87  .88     .89  .89  .90     .91  .91  .92
         s.d.     .05  .05  .05     .12  .12  .11     .13  .13  .12     .13  .13  .11
AR(1)    mean     .90  .90  .90     .90  .90  .90     .89  .89  .89     .88  .88  .88
         s.d.     .05  .05  .05     .05  .05  .05     .06  .06  .06     .07  .07  .07
n = 10   mean     .75  .75  .81     .73  .72  .76     .75  .75  .73     .75  .75  .72
         s.d.     .09  .10  .01     .06  .06  .06     .09  .09  .06     .11  .11  .06
n = 20   mean     .84  .84  .88     .84  .84  .85     .85  .85  .84     .85  .85  .83
         s.d.     .06  .06  .01     .02  .02  .03     .06  .06  .06     .09  .09  .08
n = 50   mean     .91  .91  .92     .91  .91  .91     .91  .91  .91     .91  .91  .91
         s.d.     .02  .02  .01     .03  .03  .03     .05  .05  .05     .06  .06  .06
n = 100  mean     .93  .93  .94     .93  .93  .93     .94  .94  .94     .94  .94  .94
         s.d.     .01  .01  .01     .02  .02  .02     .03  .03  .03     .03  .03  .03
n = 500  mean     .95  .95  .95     .95  .95  .95     .96  .96  .96     .96  .96  .96
         s.d.     .00  .00  .00     .02  .02  .02     .02  .02  .02     .02  .02  .02
Total    mean     .87  .87  .90     .87  .87  .88     .88  .88  .88     .88  .88  .87
         s.d.     .09  .09  .05     .09  .09  .08     .10  .10  .09     .10  .10  .10
LMT Data Analysis Example
• n = 347 observations from undergraduate teacher education students (89.9%
female) who each attended one of four Chicago universities between the Fall
2007 semester and Fall 2013 spring semesters, inclusive, excluding summers.
• Dependent variable (Y): math teaching ability score
(25-item test of Learning Math for Teaching; LMT, 2012).
• Covariates: Year, Year², and CTPP = 1(Year ≥ 2010.9),
an indicator of the administration of the new (versus old) teaching curriculum.
• All covariates rescaled to have mean zero and variance 1 before data analysis.
             Post. mean    pSD       ES     95% PI              OLS       SE
Intercept       12.90      .18    72.76     (12.55, 13.24)     12.90      .18
Year             -.69      .57    -1.21     (-1.81, .43)      478.17   426.54
Year²            -.65      .57    -1.14     (-1.78, .47)      476.75   426.54
CTPP              .60      .28     2.10     (.04, 1.15)          .67      .28
• Vibration of effects (VoE) analysis (Ioannidis, 2008) provides a
way to assess how much a covariate’s effect size differs (or vibrates)
over different ways that the analysis can be done, e.g., with respect to different:
  - variables included and/or excluded in the analysis (statistical adjustments);
  - models used;
  - definitions of outcomes and predictors;
  - inclusion and exclusion criteria for the study population.
• We will consider a VoE analysis to investigate how
the effect size of the CTPP treatment variable (T),
$$\mathrm{ES}_T = \tilde\beta_T / \mathbb{V}_T^{1/2} = \tilde\beta_T(\bar{\mathbf{n}}) \big/ \mathbb{V}_T\big(\tilde\beta(\mathbf{n}^*) \mid Z_n, \alpha\big)^{1/2},$$
varies as a function of the other covariates that are included
in the regression model, and α.
• $\tilde\beta_T$ is a WLS estimate that excludes a missing variable, U,
from the regression. See next slide for more details (on bias analysis).
• Covariate selection is undertaken using the fast
LARS algorithm (Efron et al., 2004),
applied to the data augmented by the
DP baseline prior imaginary observations.
• We may add a quantitative bias analysis into the VoE analysis.
• Let (X, T) be the covariates in the regression equation, incl. treatment variable T,
and let U be a missing covariate from the regression equation, a confounder
such that εiU = γui + εi (i = 1,…,n) is correlated with T (i.e., hidden bias).
• If there is hidden bias, then least-squares estimates are inconsistent.
• If there are no interactions between (T, U, X), then:
$$\tilde\beta_T(\bar{\mathbf{n}}) - \beta_T(\bar{\mathbf{n}}) = \gamma \int \Big\{ \int u\, \mathrm{d}F_U(u \mid T = 1, x) - \int u\, \mathrm{d}F_U(u \mid T = 0, x) \Big\}\, \mathrm{d}F_X(x)$$
(see VanderWeele & Arah, 2011)
where:
$\tilde\beta_T(\bar{\mathbf{n}})$ is T’s WLS slope estimator in a regression fit that excludes U (given α),
$\beta_T(\bar{\mathbf{n}})$ is the WLS slope estimator in a regression fit that includes U (given α).
• If the missing variable U is binary, we may evaluate how sensitive
the effect size of T is, using the formula displayed above,
where FU(u | T = 1, x; λ) and FU(u | T = 0, x; λ) are based on a
binary logistic regression, over independent standard normal samples of (γ, λ).
• Then also observe the distribution of effect sizes $\mathrm{ES} = \beta_T(\bar{\mathbf{n}}) / \mathbb{V}_T^{1/2}$
over α in the support of π(α), and over LARS covariate subsets incl. T.
[Figure: LMT Data Analysis Example, Vibration of Effects Analysis]
PIRLS Data Analysis Example
• Data on n = 565 low-income students from 21 U.S. elem schools (PIRLS 2006).
• Dependent variable: Student literacy score (zREAD).
• 8 covariates: student male status (1 if MALE, or 0); AGE; class size (SIZE);
class percent of English language learners (ELL); teacher years of experience
(TEXP4) and education level (EDLEVEL = 5 if bachelor's; = 6 if at least
master's); school enrollment (ENROL), safety (SAFE=1 high; SAFE = 3 low).
• Each variable in the data set was rescaled to have mean 0 and variance 1.
             Post. mean    pSD       ES     95% PI            OLS      SE
Intercept        -.48      .04   -12.67     (-.56, -.41)     -.48     .04
MALE             -.06      .04    -1.44     (-.13, .02)      -.06     .04
AGE              -.20      .04    -5.22     (-.27, -.12)     -.20     .04
SIZE             -.07      .05    -1.48     (-.16, .02)      -.07     .05
ELL              -.13      .05    -2.81     (-.22, -.04)     -.13     .05
TEXP4             .14      .04     3.23     (.05, .22)        .14     .04
EDLEVEL           .03      .04      .77     (-.05, .11)       .03     .04
ENROL             .30      .04     7.60     (.22, .37)        .30     .04
SAFE             -.16      .04    -3.71     (-.24, -.07)     -.16     .04
[Figure: PIRLS Data Analysis Example, Vibration of Effects Analysis]
Conclusions
• First study to show that the OLS estimator, as a posterior functional of the MDP,
has (approximately) posterior mean given by WLS, and posterior covariance
given by a weighted heteroscedastic-consistent sandwich covariance estimator.
• These posterior quantities (the WLS and sandwich estimators)
are analytically manageable and permit fast computations for large data sets,
and correspond to ridge regression (shrinkage) estimators for a
particular choice of baseline distribution in the MDP.
• Illustrated MDP functional methodology through the
analysis of simulated and real data (including VoE analyses).
• Manuscript: http://arxiv.org/abs/1602.05155
• MDP methodology can be run using the “VoE analysis” menu option in my
Bayesian Regression software: http://georgek.people.uic.edu/BayesSoftware.html
• In principle, the MDP functional methodology can be extended to
processes of other Gibbs-type priors, such as the NIG process.
• For each of these other processes, the process variance is not available in
closed form, which would make Monte Carlo simulation necessary
for posterior inference, and would make it more challenging to
match the mean and variance of the process with
the Pólya urn scheme (posterior predictive) used for bootstrapping.