An Introduction to Algebraic Statistics
Mathias Drton
Department of Statistics, University of Chicago
January, 2010
‘Algebraic statistics’
Application and development of techniques in
Algebraic Geometry, Commutative Algebra, and Combinatorics
to address problems in Statistics.
Instrumental paper:
Diaconis, Persi; Sturmfels, Bernd. Algebraic algorithms for sampling from conditional distributions. Annals of Statistics 26 (1998), no. 1, 363–397.
Applied-minded algebraists get involved with Statistics
(AMS meetings, SIAM activity group, . . . ).
Some literature
Pistone, Riccomagno & Wynn: Algebraic Statistics (Exp. Design)
Pachter & Sturmfels: Algebraic Statistics for Computational Biology
Gibilisco et al. (Eds.): Algebraic and Geometric Methods in Statistics
Viana & Richards (Eds.): Algebraic Methods in Statistics and Probability (2nd volume in prep.)
These lectures
Material from Chapters 1, 2 and 5 in
Drton, Sullivant & Sturmfels: Lectures on Algebraic Statistics
Chapter 3: Conditional independence, graphical models
Chapter 4: Hidden variable models
Chapter 6: Worked exercises
Chapter 7: Open problems
Lectures
Lecture I: Markov Bases for Exact Inference in Contingency Tables
(Chapter 1 in lecture notes)
Lecture II: Likelihood Ratio Tests and Singularities
(Section 2.3 in lecture notes)
Lecture III: Bayesian Integrals
(Section 5.1 in lecture notes)
Part I
Markov Bases for Exact Inference in Contingency Tables
1 Fisher’s exact test for 2 × 2 contingency tables
2 Log-linear models for multi-way tables
3 Markov bases for exact conditional inference
Mathias Drton Lecture 1: Fisher’s exact test 2 / 110
Example: Cancer treatment
Surgery versus radiation treatment for cancer patients:

| | Cancer controlled | Cancer not controlled | Total |
|---|---|---|---|
| Surgery | 21 | 0 | 21 |
| Radiation therapy | 15 | 3 | 18 |
| Total | 36 | 3 | 39 |
Disease outcome independent of treatment?
Chi-square test p-value = 0.1788
Fisher’s exact test p-value = 0.08929
Independence model
Two discrete/categorical random variables
$X \in [r] := \{1, 2, \dots, r\}$ and $Y \in [c] := \{1, 2, \dots, c\}$
Joint and marginal probabilities:
$p_{ij} = P(X = i, Y = j)$, $p_{i+} = P(X = i)$, $p_{+j} = P(Y = j)$
$X$ and $Y$ independent ($X \perp\!\!\!\perp Y$) iff
$p_{ij} = p_{i+} p_{+j}$ for all $i \in [r]$, $j \in [c]$
or, equivalently, the matrix $P = (p_{ij})$ has rank 1.
Chi-square test of independence
Counts from $n$ i.i.d. copies $(X^{(k)}, Y^{(k)})$, $k = 1, \dots, n$, of $(X, Y)$:
$$U_{ij} = \sum_{k=1}^{n} 1_{\{X^{(k)} = i,\; Y^{(k)} = j\}}, \quad i \in [r],\ j \in [c].$$
Contingency table $U = (U_{ij})$ has multinomial distribution:
$$P(U = u) = \frac{n!}{u_{11}!\, u_{12}! \cdots u_{rc}!} \prod_{i=1}^{r} \prod_{j=1}^{c} p_{ij}^{u_{ij}}.$$
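Chi-square statistic
$$X^2(U) = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(U_{ij} - \hat u_{ij})^2}{\hat u_{ij}} \ \overset{d}{\longrightarrow}\ \chi^2_{(r-1)(c-1)} \quad \text{under } H_0 \text{ as } n \to \infty,$$
where $\hat u_{ij} = U_{i+} U_{+j} / n$ are the estimated expected counts under independence.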
Fisher’s exact test for 2× 2 table
Hypergeometric distribution:
If X⊥⊥Y , then
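$$P(U_{11} = u_{11} \mid U_{1+} = u_{1+},\ U_{+1} = u_{+1}) = \frac{\binom{u_{1+}}{u_{11}} \binom{n - u_{1+}}{u_{+1} - u_{11}}}{\binom{n}{u_{+1}}}$$
for $u_{11} \in \{\max(0, u_{1+} + u_{+1} - n), \dots, \min(u_{1+}, u_{+1})\}$.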
Exact test:
1 Choose a test statistic $T(u)$ (e.g., $X^2(u)$, $P(U_{11} = u_{11} \mid U_{1+} = u_{1+},\ U_{+1} = u_{+1})$, ...)
2 P-value:
$$P(T(U) \ge T(u) \mid U_{1+}, U_{+1}) = \sum_{v \,:\, T(v) \ge T(u)} \frac{\binom{U_{1+}}{v_{11}} \binom{n - U_{1+}}{U_{+1} - v_{11}}}{\binom{n}{U_{+1}}}$$
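As a small illustration of the exact test, the following Python sketch sums hypergeometric probabilities for a 2 × 2 table. It assumes one particular choice of ordering, namely that a table counts as at least as extreme as the observed one when its conditional probability is no larger; the function names are illustrative and not part of the lecture.

```python
from math import comb

def hypergeom_pmf(u11, row1, col1, n):
    """P(U11 = u11 | U1+ = row1, U+1 = col1) under independence."""
    return comb(row1, u11) * comb(n - row1, col1 - u11) / comb(n, col1)

def fisher_exact_p(table):
    """Exact p-value; tables are ranked by their conditional probability."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_obs = hypergeom_pmf(a, row1, col1, n)
    probs = [hypergeom_pmf(k, row1, col1, n) for k in range(lo, hi + 1)]
    # Sum over all tables that are at most as likely as the observed one.
    return sum(p for p in probs if p <= p_obs * (1 + 1e-9))

# Cancer treatment example: surgery (21, 0), radiation therapy (15, 3).
p_value = fisher_exact_p([[21, 0], [15, 3]])
print(round(p_value, 5))  # 0.08929
```

For the cancer treatment table only the observed table is this unlikely, so the p-value equals the single hypergeometric probability 816/9139.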
2 Log-linear models for multi-way tables
Three-way table (Agresti, 2002)
White subjects were asked about:
(1) “Black children on school bus”, (2) “Black candidate for presidency”,
(3) “Black friend for dinner at home”
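| President | Busing | Home: Yes | Home: No | Home: ??? |
|---|---|---|---|---|
| Yes | Yes | 41 | 65 | 0 |
| Yes | No | 71 | 157 | 1 |
| Yes | ??? | 1 | 17 | 0 |
| No | Yes | 2 | 5 | 0 |
| No | No | 3 | 44 | 0 |
| No | ??? | 1 | 0 | 0 |
| ??? | Yes | 0 | 3 | 1 |
| ??? | No | 0 | 10 | 0 |
| ??? | ??? | 0 | 0 | 1 |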
??? = ‘don’t know’
Log-linear models
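Discrete r.v. $X_1, \dots, X_m$; $X_\ell \in [r_\ell]$
State space: $\mathcal{R} = \prod_{\ell=1}^{m} [r_\ell]$
Joint probability table: $p = (p_i \mid i \in \mathcal{R})$
Probability simplex: $\Delta_{R-1}$, where $R = \#\mathcal{R}$ is the number of cells
Definition
Fix a matrix $A \in \mathbb{Z}^{d \times R}$ whose columns all sum to the same value. The log-linear model associated with $A$ is the set of positive probability tables
$$\mathcal{M}_A = \{\, p = (p_i) \in \operatorname{int}(\Delta_{R-1}) : \log p = (\log p_i) \in \operatorname{rowspan}(A) \,\},$$
where $\operatorname{rowspan}(A)$ is the linear space spanned by the rows of $A$.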
Example: Independence model
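$X$, $Y$: two discrete r.v. with joint probabilities $p_{ij} > 0$
$X \perp\!\!\!\perp Y$ is equivalent to
$$\log p_{ij} = \log p_{i+} + \log p_{+j} = \alpha_i + \beta_j, \quad i \in [r],\ j \in [c].$$
Suppose $r = 2$ and $c = 3$. Then $\log p \in \mathbb{R}^{2 \times 3}$ is in the row span of the $(r + c) \times rc = 5 \times 6$ matrix
$$A = \begin{array}{c|cccccc}
 & 11 & 12 & 13 & 21 & 22 & 23 \\ \hline
\alpha_1 & 1 & 1 & 1 & 0 & 0 & 0 \\
\alpha_2 & 0 & 0 & 0 & 1 & 1 & 1 \\
\beta_1 & 1 & 0 & 0 & 1 & 0 & 0 \\
\beta_2 & 0 & 1 & 0 & 0 & 1 & 0 \\
\beta_3 & 0 & 0 & 1 & 0 & 0 & 1
\end{array}.$$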
Contingency tables
Based on an $n$-sample, define the $m$-way contingency table $U$:
$$U_i = \sum_{k=1}^{n} 1_{\{X_1^{(k)} = i_1, \dots, X_m^{(k)} = i_m\}}, \quad i = (i_1, \dots, i_m) \in \mathcal{R}$$
Let $\mathcal{T}(n)$ be the space of non-negative integer tables summing to $n$.
Definition
We call the vector $Au$ the minimal sufficient statistics for the model $\mathcal{M}_A$, and the set of tables
$$\mathcal{F}(u) = \{\, v \in \mathbb{N}^{\mathcal{R}} : Av = Au \,\}$$
is the fiber of a contingency table $u \in \mathcal{T}(n)$ with respect to the model $\mathcal{M}_A$.
Example: Independence model
Let u be an r × c table.
For the matrix A encoding the independence model X⊥⊥Y :
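$$Au = \begin{pmatrix} u_{\cdot +} \\ u_{+ \cdot} \end{pmatrix},$$
where $u_{\cdot +}$ and $u_{+ \cdot}$ are the row and column sums of table $u$.
If $r = 2$ and $c = 3$:
$$Au = \begin{pmatrix}
1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 \\
1 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} u_{11} \\ u_{12} \\ u_{13} \\ u_{21} \\ u_{22} \\ u_{23} \end{pmatrix}
=
\begin{pmatrix} u_{1+} \\ u_{2+} \\ u_{+1} \\ u_{+2} \\ u_{+3} \end{pmatrix}.$$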
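The design matrix of the independence model can be generated for any $r$ and $c$. The sketch below is plain Python with illustrative function names and a made-up 2 × 3 table; it builds $A$ with cells in row-major order and checks that $Au$ returns the row sums followed by the column sums.

```python
def independence_matrix(r, c):
    """Rows alpha_1..alpha_r, then beta_1..beta_c; columns are cells (i, j), row-major."""
    cells = [(i, j) for i in range(r) for j in range(c)]
    row_part = [[1 if i == a else 0 for (i, j) in cells] for a in range(r)]
    col_part = [[1 if j == b else 0 for (i, j) in cells] for b in range(c)]
    return row_part + col_part

def sufficient_statistic(A, u_flat):
    """Matrix-vector product A u for a table flattened in row-major order."""
    return [sum(a * x for a, x in zip(row, u_flat)) for row in A]

A = independence_matrix(2, 3)
u = [[4, 1, 2], [0, 3, 5]]  # hypothetical counts, not from the lecture
u_flat = [x for row in u for x in row]
Au = sufficient_statistic(A, u_flat)
print(Au)  # [7, 8, 4, 4, 7]: row sums, then column sums
```

Every column of this matrix sums to 2, as the definition of a log-linear model requires.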
Hierarchical models
Conditional independence:
$X_1$ and $X_2$ conditionally independent given $X_3$ if
$$P(X_1 = i, X_2 = j \mid X_3 = k) = P(X_1 = i \mid X_3 = k)\, P(X_2 = j \mid X_3 = k).$$
Equivalent to the matrices $P_k = (p_{ijk})_{i,j}$ having rank at most 1 for all $k$.
Log-linear formulation:
$$\log p_{ijk} = \alpha^{(13)}_{ik} + \alpha^{(23)}_{jk}$$
No three-way interaction:
$$\log p_{ijk} = \alpha^{(12)}_{ij} + \alpha^{(13)}_{ik} + \alpha^{(23)}_{jk}$$
Conditional inference
Lemma
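If $p = e^{A^T \alpha} \in \mathcal{M}_A$ and $u \in \mathcal{T}(n)$, then
$$P(U = u) = \frac{n!}{\prod_{i \in \mathcal{R}} u_i!}\, e^{\alpha^T (Au)}.$$
Corollary
Conditional distribution is multivariate hypergeometric:
$$P(U = u \mid AU = Au) = \frac{1 / \prod_{i \in \mathcal{R}} u_i!}{\sum_{v \in \mathcal{F}(u)} 1 / \prod_{i \in \mathcal{R}} v_i!},$$
and does not depend on $p$.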
Exact test
Consider the hypothesis testing problem
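$H_0 : p \in \mathcal{M}_A$ versus $H_1 : p \notin \mathcal{M}_A$.
Maximum likelihood estimates $\hat p_i$
Expected counts $\hat u_i = n \hat p_i$ (same for all tables in a fiber $\mathcal{F}(u)$)
Chi-square statistic
$$X^2(U) = \sum_{i \in \mathcal{R}} \frac{(U_i - \hat u_i)^2}{\hat u_i}$$
Exact p-value
$$P(X^2(U) \ge X^2(u) \mid AU = Au)$$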
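For very small tables the exact p-value can be computed by listing the whole fiber. The sketch below does this for the independence model on a 2 × 3 table, weighting each table $v$ by $1/\prod_i v_i!$ as in the conditional distribution above. It assumes all margins are positive; the function names and the example counts are illustrative only.

```python
from itertools import product
from math import factorial

def fiber_2x3(rows, cols):
    """All non-negative integer 2 x 3 tables with the given row and column sums."""
    fiber = []
    for a, b in product(range(cols[0] + 1), range(cols[1] + 1)):
        c = rows[0] - a - b
        if 0 <= c <= cols[2]:
            fiber.append(((a, b, c), (cols[0] - a, cols[1] - b, cols[2] - c)))
    return fiber

def chi_square(v, rows, cols):
    """Chi-square statistic with expected counts rows[i] * cols[j] / n."""
    n = sum(rows)
    return sum((v[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(2) for j in range(3))

def weight(v):
    """Unnormalized conditional probability 1 / prod(v_i!)."""
    w = 1.0
    for row in v:
        for x in row:
            w /= factorial(x)
    return w

def exact_p_value(u):
    rows = [sum(r) for r in u]
    cols = [sum(col) for col in zip(*u)]
    fiber = fiber_2x3(rows, cols)
    x2_obs = chi_square(u, rows, cols)
    num = sum(weight(v) for v in fiber if chi_square(v, rows, cols) >= x2_obs - 1e-9)
    return num / sum(weight(v) for v in fiber)

u = ((3, 0, 1), (0, 2, 1))  # hypothetical counts, not from the lecture
print(exact_p_value(u))
```

The number of tables in a fiber grows quickly with the counts and the table dimensions, so full enumeration is only practical for toy examples like this one.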
Markov chain Monte Carlo
Exact p-value is equal to
$$\frac{\sum_{v \in \mathcal{F}(u)} 1_{\{X^2(v) \ge X^2(u)\}} \big/ \prod_{i \in \mathcal{R}} v_i!}{\sum_{v \in \mathcal{F}(u)} 1 \big/ \prod_{i \in \mathcal{R}} v_i!}.$$
Larger counts or tables: prohibitive to sum over entire fiber
Approximate p-value by Markov chain Monte Carlo algorithms for sampling tables from the conditional distribution
With prob 1, MCMC yields a sequence of tables $v_t \in \mathcal{F}(u)$ such that the proportion of tables with $X^2(v_t) \ge X^2(u)$ converges to the p-value.
Problem
For an irreducible Metropolis-Hastings sampler, find
Finite set of moves that connect any two tables in any fiber.
3 Markov bases for exact conditional inference
Markov basis – Definition
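Log-linear model $\mathcal{M}_A$ associated with matrix $A$
Integer kernel $\ker_{\mathbb{Z}}(A)$
Definition
A finite subset $\mathcal{B} \subset \ker_{\mathbb{Z}}(A)$ is a Markov basis for $\mathcal{M}_A$ if for all $u \in \mathcal{T}(n)$ and all pairs $v, v' \in \mathcal{F}(u)$ there exists a sequence $u_1, \dots, u_L \in \mathcal{B}$ such that
$$v' = v + \sum_{k=1}^{L} u_k \quad \text{and} \quad v + \sum_{k=1}^{l} u_k \ge 0 \ \text{ for all } l = 1, \dots, L.$$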
The elements of the Markov basis are called moves.
Metropolis-Hastings algorithm
Input: Contingency table $u$; Markov basis $\mathcal{B}$ for the model $\mathcal{M}_A$.
Output: Sequence $(X^2(v_t))_{t=1}^{\infty}$ for tables $v_t$ in fiber $\mathcal{F}(u)$.
Step 1: Initialize $v_1 = u$.
Step 2: For $t = 1, 2, \dots$ repeat the following steps:
(i) Select uniformly at random a move $u_t \in \mathcal{B}$.
(ii) If $\min(v_t + u_t) < 0$, then set $v_{t+1} = v_t$, else set
$$v_{t+1} = \begin{cases} v_t + u_t & \text{with probability } q, \\ v_t & \text{with probability } 1 - q, \end{cases}$$
where
$$q = \min\left\{ 1,\ \frac{P(U = v_t + u_t \mid AU = Au)}{P(U = v_t \mid AU = Au)} \right\}.$$
(iii) Compute $X^2(v_t)$.
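The steps above can be sketched in Python for the independence model on an $r \times c$ table. This is a minimal illustration, not the lecture's implementation: it assumes the moves $\pm(e_{ij} + e_{kl} - e_{il} - e_{kj})$ as the Markov basis, positive margins, and a fixed number of steps in place of an infinite sequence; all names and the example counts are hypothetical.

```python
import random
from math import factorial

def basic_moves(r, c):
    """Moves +-(e_ij + e_kl - e_il - e_kj), stored as lists of (row, col, delta)."""
    moves = []
    for i in range(r):
        for k in range(i + 1, r):
            for j in range(c):
                for l in range(j + 1, c):
                    for s in (1, -1):
                        moves.append([(i, j, s), (k, l, s), (i, l, -s), (k, j, -s)])
    return moves

def chi_square(v, rows, cols, n):
    return sum((v[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(len(rows)) for j in range(len(cols)))

def mh_p_value(u, steps, seed=0):
    """Estimate P(X^2(U) >= X^2(u) | AU = Au); also return the last table visited."""
    rng = random.Random(seed)
    rows = [sum(row) for row in u]
    cols = [sum(col) for col in zip(*u)]
    n = sum(rows)
    moves = basic_moves(len(rows), len(cols))
    x2_obs = chi_square(u, rows, cols, n)
    v = [list(row) for row in u]
    hits = 0
    for _ in range(steps):
        move = rng.choice(moves)
        if all(v[i][j] + d >= 0 for i, j, d in move):
            # Ratio of conditional probabilities: prod v_i! / prod (v + move)_i!
            ratio = 1.0
            for i, j, d in move:
                ratio *= factorial(v[i][j]) / factorial(v[i][j] + d)
            if rng.random() < min(1.0, ratio):
                for i, j, d in move:
                    v[i][j] += d
        hits += chi_square(v, rows, cols, n) >= x2_obs - 1e-9
    return hits / steps, v

u = [[4, 1, 2], [0, 3, 5]]  # hypothetical counts, not from the lecture
estimate, last_table = mh_p_value(u, steps=20000)
print(estimate)
```

Only the four cells touched by a move enter the acceptance ratio, so the normalizing sum over the fiber never has to be computed.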
Markov basis for independence model
Let $e_{ij}$ be the $r \times c$ table with entry 1 in row $i$, column $j$ and 0 elsewhere.
Proposition
The (unique minimal) Markov basis for the independence model $\mathcal{M}_{X \perp\!\!\!\perp Y}$ consists of the following $2 \binom{r}{2} \binom{c}{2}$ moves, each having one-norm 4:
$$\mathcal{B} = \{\, \pm(e_{ij} + e_{kl} - e_{il} - e_{kj}) : 1 \le i < k \le r,\ 1 \le j < l \le c \,\}.$$
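The moves in the proposition are easy to generate and check. The following sketch, with an illustrative function name, builds them as $r \times c$ integer tables and verifies the count, the one-norm, and that every move leaves the row and column sums unchanged, that is, lies in the integer kernel of $A$.

```python
from itertools import combinations
from math import comb

def independence_markov_basis(r, c):
    """All moves +-(e_ij + e_kl - e_il - e_kj) with i < k and j < l, as r x c tables."""
    basis = []
    for i, k in combinations(range(r), 2):
        for j, l in combinations(range(c), 2):
            move = [[0] * c for _ in range(r)]
            move[i][j] = move[k][l] = 1
            move[i][l] = move[k][j] = -1
            basis.append(move)
            basis.append([[-x for x in row] for row in move])
    return basis

basis = independence_markov_basis(3, 4)
print(len(basis), 2 * comb(3, 2) * comb(4, 2))  # 36 36
```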
Independence model – Proof
Idea: Show that we can use elements of $\mathcal{B}$ to bring any two distinct tables in the same fiber closer to one another.
Claim: Given $v \ne u$, $v \in \mathcal{F}(u)$, there is $b \in \mathcal{B}$ such that (i) $u + b \ge 0$ and (ii) $\|u - v\|_1 > \|u + b - v\|_1$.
Proof: Recall that $Au$ yields the row and column sums:
(a) Since $u \ne v$ and $Au = Av$, there is at least one positive entry in $u - v$. WLOG, $u_{11} - v_{11} > 0$.
(b) Since $Au = Av$, the first row of $u - v$ sums to zero, so it has a negative entry. WLOG, $u_{12} - v_{12} < 0$.
(c) Similarly, the second column of $u - v$ sums to zero, so WLOG $u_{22} - v_{22} > 0$.
(d) Let $b = e_{12} + e_{21} - e_{11} - e_{22}$. Then $\|u - v\|_1 > \|u + b - v\|_1$, and $u + b \ge 0$ because $u_{11} > v_{11} \ge 0$ and $u_{22} > v_{22} \ge 0$, as desired.
Symbolic computation – 4ti2
Markov basis of ‘no 3-way interaction model’ for 2 × 2 × 2 table?
Matrix representing model has format 12 × 8 (store in file no3way):
12 8
1 1 0 0 0 0 0 0
0 0 1 1 0 0 0 0
0 0 0 0 1 1 0 0
0 0 0 0 0 0 1 1
1 0 1 0 0 0 0 0
0 1 0 1 0 0 0 0
0 0 0 0 1 0 1 0
0 0 0 0 0 1 0 1
1 0 0 0 1 0 0 0
0 1 0 0 0 1 0 0
0 0 1 0 0 0 1 0
0 0 0 1 0 0 0 1
Symbolic computation – 4ti2
Compute Markov basis (up to sign) using the command markov no3way
Output in file no3way.mar:
1 8
1 -1 -1 1 -1 1 1 -1
Two moves
$$\pm(e_{111} + e_{122} + e_{212} + e_{221} - e_{112} - e_{121} - e_{211} - e_{222})$$
correspond to the quartic equation
$$p_{111}\, p_{122}\, p_{212}\, p_{221} = p_{112}\, p_{121}\, p_{211}\, p_{222}$$
Recall: $p_{ijk} \propto \theta^{(12)}_{ij}\, \theta^{(13)}_{ik}\, \theta^{(23)}_{jk}$
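As a quick sanity check of the 4ti2 output, the move can be multiplied by the 12 × 8 matrix from the input file. The sketch below is plain Python and assumes the cell order 111, 112, 121, 122, 211, 212, 221, 222 for the columns; it confirms that the move lies in the integer kernel.

```python
# Matrix of the no 3-way interaction model, copied from the file no3way.
A = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1],
    [1, 0, 1, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 1],
]
move = [1, -1, -1, 1, -1, 1, 1, -1]  # the single row of no3way.mar

# A * move: every two-way margin of the move must vanish.
image = [sum(a * m for a, m in zip(row, move)) for row in A]
print(image)  # twelve zeros
```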
Polynomial algebra
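Polynomial ring $\mathbb{R}[p] = \mathbb{R}[p_1, p_2, \dots, p_k]$
For non-negative integer table $u = (u_1, \dots, u_k) \in \mathbb{N}^k$ define monomial
$$p^u = p_1^{u_1} p_2^{u_2} \cdots p_k^{u_k}$$
For integer table $u = u^+ - u^- \in \mathbb{Z}^k$ with positive and negative parts $u^+, u^- \in \mathbb{N}^k$ define binomial
$$p^{u^+} - p^{u^-}$$
Example:
$$p = \begin{pmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{pmatrix}, \quad u = \begin{pmatrix} 2 & -2 \\ -1 & 1 \end{pmatrix} \implies p_{11}^2\, p_{22} - p_{12}^2\, p_{21}$$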
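The passage from an integer table to its binomial is mechanical. The sketch below uses illustrative helper names and a plain string representation of polynomials; it splits $u$ into positive and negative parts and reproduces the example above.

```python
def monomial(exponents, names):
    """Format p^u for a non-negative exponent vector u."""
    factors = []
    for name, e in zip(names, exponents):
        if e == 1:
            factors.append(name)
        elif e > 1:
            factors.append(f"{name}^{e}")
    return "*".join(factors) if factors else "1"

def binomial(u, names):
    """Format p^(u+) - p^(u-) for an integer vector u."""
    plus = [max(x, 0) for x in u]
    minus = [max(-x, 0) for x in u]
    return f"{monomial(plus, names)} - {monomial(minus, names)}"

names = ["p11", "p12", "p21", "p22"]
print(binomial([2, -2, -1, 1], names))  # p11^2*p22 - p12^2*p21
```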
Polynomial algebra
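A subset $I \subset \mathbb{R}[p]$ is an ideal if
$f, g \in I \implies f + g \in I$
$f \in I,\ h \in \mathbb{R}[p] \implies hf \in I$
Hilbert’s basis theorem: Every ideal $I$ has a finite generating set $f_1, \dots, f_m \in \mathbb{R}[p]$, that is,
$$I = \langle f_1, \dots, f_m \rangle = \left\{ \sum_{i=1}^{m} h_i f_i : h_1, \dots, h_m \in \mathbb{R}[p] \right\}$$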
Fundamental theorem
Given a matrix $A \in \mathbb{N}^{d \times k}$ for a log-linear model, define the (toric) ideal
$$I_A := \langle\, p^{u^+} - p^{u^-} : u \in \ker_{\mathbb{Z}}(A) \,\rangle \subset \mathbb{R}[p].$$
Theorem (Fundamental theorem of Markov bases)
A subset $\mathcal{B}$ of $\ker_{\mathbb{Z}}(A)$ is a Markov basis if and only if the corresponding set of binomials $\{\, p^{b^+} - p^{b^-} : b \in \mathcal{B} \,\}$ generates the ideal $I_A$. In particular, a (finite) Markov basis always exists.
Example: Independence model for 2× 2 table
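We have shown that a Markov basis (up to sign) is given by
$$b = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}$$
Hence, $I_A = I^* := \langle p_{11} p_{22} - p_{12} p_{21} \rangle$
Example for $I_A \subseteq I^*$: Consider the tables
$$u = \begin{pmatrix} 4 & 1 \\ 2 & 5 \end{pmatrix}, \quad v = \begin{pmatrix} 3 & 2 \\ 3 & 4 \end{pmatrix}.$$
Since $u - b = v$, we have $u - b^+ = v - b^-$ and thus
$$p_{11}^4\, p_{12}^1\, p_{21}^2\, p_{22}^5 - p_{11}^3\, p_{12}^2\, p_{21}^3\, p_{22}^4 = p_{11}^3\, p_{12}^1\, p_{21}^2\, p_{22}^4\, (p_{11} p_{22} - p_{12} p_{21}) \in I^*$$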
Computing Markov bases
Theorem
The ideal $I_A$ is a homogeneous ideal and its homogeneous elements are exactly the homogeneous polynomials $f$ in $\mathbb{R}[p]$ that vanish on the log-linear model $\mathcal{M}_A$:
$$f(p) = 0 \ \text{ for all } p \in \mathcal{M}_A.$$
For a matrix $A = (a_{ij}) \in \mathbb{N}^{d \times k}$, compute a Markov basis by eliminating the variables $\theta_1, \dots, \theta_d$ from the equation system
$$p_j - \theta_1^{a_{1j}}\, \theta_2^{a_{2j}} \cdots \theta_d^{a_{dj}} = 0, \quad j = 1, \dots, k.$$
Software for Gröbner basis calculations: Macaulay 2, Singular, 4ti2
Example: No 3-way interaction in 2× 2× 2 table
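Equation system:
$$\begin{aligned}
p_{111} &= \alpha_{11} \beta_{11} \gamma_{11}, & p_{112} &= \alpha_{11} \beta_{12} \gamma_{12}, \\
p_{121} &= \alpha_{12} \beta_{11} \gamma_{21}, & p_{122} &= \alpha_{12} \beta_{12} \gamma_{22}, \\
p_{211} &= \alpha_{21} \beta_{21} \gamma_{11}, & p_{212} &= \alpha_{21} \beta_{22} \gamma_{12}, \\
p_{221} &= \alpha_{22} \beta_{21} \gamma_{21}, & p_{222} &= \alpha_{22} \beta_{22} \gamma_{22}.
\end{aligned}$$
Variable elimination: Every relation among the $p_{ijk}$ is a polynomial multiple of
$$p_{111}\, p_{122}\, p_{212}\, p_{221} - p_{112}\, p_{121}\, p_{211}\, p_{222}$$
Markov basis:
$$\pm(e_{111} + e_{122} + e_{212} + e_{221} - e_{112} - e_{121} - e_{211} - e_{222})$$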
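That the quartic vanishes on the parametrization can be checked with exact rational arithmetic. The sketch below draws arbitrary positive rational values for the parameters $\alpha_{ij}$, $\beta_{ik}$, $\gamma_{jk}$ (the random values and the variable names are illustrative) and compares the two monomials.

```python
import random
from fractions import Fraction

rng = random.Random(1)

def rand_param():
    """Arbitrary positive rational parameter value."""
    return Fraction(rng.randint(1, 9), rng.randint(1, 9))

idx = (1, 2)
alpha = {(i, j): rand_param() for i in idx for j in idx}  # interaction of X1, X2
beta = {(i, k): rand_param() for i in idx for k in idx}   # interaction of X1, X3
gamma = {(j, k): rand_param() for j in idx for k in idx}  # interaction of X2, X3

# p_ijk = alpha_ij * beta_ik * gamma_jk, as in the equation system.
p = {(i, j, k): alpha[i, j] * beta[i, k] * gamma[j, k]
     for i in idx for j in idx for k in idx}

lhs = p[1, 1, 1] * p[1, 2, 2] * p[2, 1, 2] * p[2, 2, 1]
rhs = p[1, 1, 2] * p[1, 2, 1] * p[2, 1, 1] * p[2, 2, 2]
print(lhs == rhs)  # True
```

Both products contain each of the twelve parameters exactly once, which is why the equality holds for every parameter choice.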
Singular session
LIB "elim.lib";
ring R = 0,(p111,p112,p121,p122,p211,p212,p221,p222,
            a11,a12,a21,a22,b11,b12,b21,b22,c11,c12,c21,c22),dp;
ideal M =
  p111 - a11*b11*c11,
  p112 - a11*b12*c12,
  p121 - a12*b11*c21,
  p122 - a12*b12*c22,
  p211 - a21*b21*c11,
  p212 - a21*b22*c12,
  p221 - a22*b21*c21,
  p222 - a22*b22*c22;
eliminate(M, a11*a12*a21*a22*b11*b12*b21*b22*
             c11*c12*c21*c22);
Background reading
Cox, D.; Little, J.; O’Shea, D. (2007). Ideals, varieties, and algorithms. Springer, New York.
Database: http://mbdb.mis.mpg.de
Slim and long tables
Theorem
Let $X_1$ be a r.v. with 3 states, and $X_2$ and $X_3$ r.v. with $r_2$ and $r_3$ states, resp. Let $v \in \mathbb{Z}^k$ be any integer vector. There are $r_2, r_3 \in \mathbb{N}$ and a coordinate projection $\pi : \mathbb{Z}^{3 \times r_2 \times r_3} \to \mathbb{Z}^k$ such that every minimal Markov basis for the no 3-way interaction model contains a table $u$ with $\pi(u) = v$.
Theorem
Fix a set of interactions $\Gamma$ for a hierarchical log-linear model, and fix $r_2, \dots, r_m$. There exists a number $b(\Gamma, r_2, \dots, r_m) < \infty$ such that the one-norms of the elements of any minimal Markov basis for $\Gamma$ on $s \times r_2 \times \cdots \times r_m$ tables are less than or equal to $b(\Gamma, r_2, \dots, r_m)$. This bound is independent of $s$, which can grow large.
Exercise
Exercises 6.1 and 6.2 in the lecture notes
Perform an exact test for your favorite table
e.g. test ‘no 3-way interaction’ in the three-way table from Agresti (2002) shown earlier (??? = ‘don’t know’).
Part II
Likelihood Ratio Tests and Singularities
4 Algebraic statistical models
5 Large-sample asymptotics and Chernoff’s theorem
6 Examples
4 Algebraic statistical models
Example: Bayesian network
Sachs et al. (2005): Analysis of flow cytometry data
Expression values for 11 proteins discretized −→ ternary variables
Large sample size (observational part: n = 1200)
Bayesian network (conditional independence model):
Typical task: test absence of edges
Likelihood ratio test of absence of ‘PKC → PKA’ can be based on the $\chi^2_4$ distribution
See Chapter 3 in the lecture notes
Chi-square asymptotics
Theorem
Suppose
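(i) $\{P_\theta : \theta \in \Theta\}$ is a regular exponential family ($\Theta \subset \mathbb{R}^k$ open),
(ii) $\Theta_0 \subset \Theta_1$ are smooth submanifolds of $\Theta$,
(iii) True parameter point $\theta_0 \in \Theta_0$.
Then the likelihood ratio statistic for testing
$$H_0 : \theta \in \Theta_0 \quad \text{vs.} \quad H_1 : \theta \in \Theta_1 \setminus \Theta_0$$
converges in distribution to $\chi^2_{\dim(\Theta_1) - \dim(\Theta_0)}$ as $n \to \infty$.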
Theorem covers Bayesian network example because
interior of probability simplex is regular exponential family, and
Bayesian networks define smooth submanifolds.
Regular exponential families
Definition
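Let $\mathcal{P}_\Theta = \{P_\theta : \theta \in \Theta\}$ be a family of probability distributions on $\mathcal{X} \subseteq \mathbb{R}^m$ that have densities with respect to a measure $\nu$. We call $\mathcal{P}_\Theta$ an exponential family if there is a statistic $T : \mathcal{X} \to \mathbb{R}^k$ and functions $h : \Theta \to \mathbb{R}^k$ and $Z : \Theta \to \mathbb{R}$ such that each distribution $P_\theta$ has $\nu$-density
$$p_\theta(x) = \frac{1}{Z(\theta)} \exp\{\langle h(\theta), T(x) \rangle\}, \quad x \in \mathcal{X}.$$
If
$$H = \left\{ \eta \in \mathbb{R}^k : \int_{\mathcal{X}} \exp\{\langle \eta, T(x) \rangle\}\, d\nu(x) < \infty \right\}$$
is an open subset of $\mathbb{R}^k$ and $h$ a diffeomorphism between $\Theta$ and $H$, then we say that $\mathcal{P}_\Theta$ is a regular exponential family of order $k$.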
Curved exponential families
Definition
Suppose $\{P_\theta : \theta \in \Theta\}$ is a regular exponential family. If $\Theta_0$ is a smooth submanifold of $\Theta$, then $\{P_\theta : \theta \in \Theta_0\}$ is a curved exponential family.
Well-developed large-sample theory for CEFs
Estimation and confidence intervals:
Maximum likelihood estimators are asymptotically normal.
Hypothesis testing:
Likelihood ratio statistics have asymptotic chi-square distributions. Wald statistics have asymptotic chi-square distributions.
Model selection:
Bayesian information criterion (BIC) is consistent and connected to the asymptotics of marginal likelihood integrals.
Example: Instrumental variables
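Estimate coefficient $\gamma_{43}$ in the system
$$\begin{aligned}
X_3 &= \gamma_{35} X_5 + \varepsilon_3, \\
X_4 &= \gamma_{43} X_3 + \gamma_{45} X_5 + \varepsilon_4, \\
X_5 &= \varepsilon_5
\end{aligned}$$
with $\varepsilon_i \sim N(0, \omega_i)$ independent
[Figure: graph on the nodes X3, X4, X5]
Variable $X_5$ hidden
Consider distributions $(X_1, \dots, X_4) \sim N(0, \Sigma(\gamma, \omega))$
$(\gamma, \omega) \mapsto \Sigma(\gamma, \omega)$ polynomial parametrization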
Example: Instrumental variables
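Estimate coefficient $\gamma_{43}$ in the system
$$\begin{aligned}
X_1 &= \varepsilon_1, \\
X_2 &= \varepsilon_2, \\
X_3 &= \gamma_{31} X_1 + \gamma_{32} X_2 + \gamma_{35} X_5 + \varepsilon_3, \\
X_4 &= \gamma_{43} X_3 + \gamma_{45} X_5 + \varepsilon_4, \\
X_5 &= \varepsilon_5
\end{aligned}$$
with $\varepsilon_i \sim N(0, \omega_i)$ independent
[Figure: graph on the nodes X1, X2, X3, X4, X5]
Variable $X_5$ hidden
Marginal distribution
$$(X_1, \dots, X_4) \sim N(0, \Sigma(\gamma, \omega))$$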
Example: Instrumental variables
Covariance matrix parametrization is a polynomial map:
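$$
\Sigma(\gamma, \omega) =
\begin{pmatrix}
\omega_1 & 0 & \gamma_{31}\omega_1 & \gamma_{43}\gamma_{31}\omega_1 \\
 & \omega_2 & \gamma_{32}\omega_2 & \gamma_{43}\gamma_{32}\omega_2 \\
 & & \operatorname{Var}[X_3] & \gamma_{43}\operatorname{Var}[X_3] + \gamma_{35}\gamma_{45}\omega_5 \\
 & & & \omega_4 + \gamma_{43}^2\operatorname{Var}[X_3] + \gamma_{45}^2\omega_5 + 2\gamma_{45}\gamma_{43}\gamma_{35}\omega_5
\end{pmatrix}
$$
(symmetric matrix, upper triangle shown) with
$$\operatorname{Var}[X_3] = \omega_3 + \gamma_{31}^2\omega_1 + \gamma_{32}^2\omega_2 + \gamma_{35}^2\omega_5.$$
Coordinate $\sigma_{ij}$ is a combinatorial expression summing terms associated with ‘treks’
$$i \leftarrow \ell_1 \leftarrow \ell_2 \leftarrow \dots \leftarrow t \rightarrow \dots \rightarrow r_2 \rightarrow r_1 \rightarrow j$$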
Example: Instrumental variables
In this hidden variable model test
H0 : γ31 = γ32 = 0
Null distribution of the LR statistic (n = 1000):
[Figure: graph on the nodes X1, X2, X3, X4, X5, and a plot of the CDF F(x) of the null distribution for x between 0 and 12]
Algebraic exponential families
Asymptotic behavior of the LRT in instrumental variables example?
Hidden variable models ≠ curved exponential family
What is a suitable general framework to study hidden variable models?
Definition
Suppose $\{P_\theta : \theta \in \Theta\}$ is a regular exponential family. If $\Theta_0$ is a semi-algebraic subset of $\Theta$, then the submodel $\{P_\theta : \theta \in \Theta_0\}$ is an algebraic exponential family.
Semi-algebraic sets
Definition
Let $\mathbb{R}[t_1, \dots, t_k]$ be the ring of polynomials in the indeterminates $t_1, \dots, t_k$ with real coefficients. A semi-algebraic set is a finite union of the form
$$\Theta_0 = \bigcup_{i=1}^{m} \{\theta \in \mathbb{R}^k \mid f(\theta) = 0 \text{ for } f \in F_i \text{ and } h(\theta) > 0 \text{ for } h \in H_i\},$$
where $F_i, H_i \subset \mathbb{R}[t_1, \dots, t_k]$ are collections of polynomials and all $H_i$ are finite.
Theorem (Tarski-Seidenberg)
If $g : \mathbb{R}^d \to \mathbb{R}^k$ is a polynomial map and $\Gamma$ is a semi-algebraic set, then $\Theta_0 = g(\Gamma)$ is semi-algebraic.
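As a small illustration of the theorem (an example chosen here, not taken from the slides), consider the polynomial map $g(\gamma) = \gamma^2$ on $\Gamma = \mathbb{R}$:

$$g(\mathbb{R}) = [0, \infty) = \{\theta \in \mathbb{R} : \theta = 0\} \cup \{\theta \in \mathbb{R} : \theta > 0\}.$$

The image is a union of two sets of the form used in the definition, so it is semi-algebraic. It is not the zero set of any collection of polynomials, since a nonzero polynomial in one variable has only finitely many roots. This suggests why inequalities are needed to describe images of polynomial parametrizations.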
Lecture outline
4 Algebraic statistical models
5 Large-sample asymptotics and Chernoff’s theorem
6 Examples
Mathias Drton Lecture 2: Large-sample asymptotics and Chernoff’s theorem 47 / 110
Likelihood ratio test
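Independent observations $X^{(1)}, \dots, X^{(n)}$ with unknown distribution
Statistical model $\{P_\theta : \theta \in \Theta\}$, $\Theta \subseteq \mathbb{R}^k$
Suppose the $P_\theta$ have density functions $p_\theta(x)$. Define the likelihood function
$$L_n : \Theta \to \mathbb{R}, \quad \theta \mapsto \prod_{i=1}^{n} p_\theta(X^{(i)}).$$
Test $H_0 : \theta \in \Theta_0$ vs. $H_1 : \theta \in \Theta_1 \setminus \Theta_0$ for some $\Theta_0 \subset \Theta_1 \subset \Theta$.
Definition
The likelihood ratio test rejects $H_0$ if the likelihood ratio statistic
$$\lambda_n = 2 \log \frac{\sup_{\theta \in \Theta_1} L_n(\theta)}{\sup_{\theta \in \Theta_0} L_n(\theta)}$$
is “too large” ⟹ p-value $P_{H_0}(\lambda_n \ge \lambda_{\mathrm{obs}})$.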
Canonical example: Normal means
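Normal mean model $\{N(\theta, I_k) : \theta \in \mathbb{R}^k\}$
Log-likelihood function
$$\ell_n(\theta) = -\frac{nk}{2}\log(2\pi) - \frac{n}{2}\|\bar X_n - \theta\|_2^2 - \frac{1}{2}\sum_{i=1}^{n}\|X^{(i)} - \bar X_n\|_2^2.$$
Sample mean
$$\bar X_n = \frac{1}{n}\sum_{i=1}^{n} X^{(i)}$$
Likelihood ratio statistic for testing $H_0 : \theta \in \Theta_0$ vs. $H_1 : \theta \notin \Theta_0$:
$$\lambda_n = n \cdot \inf_{\theta \in \Theta_0}\|\bar X_n - \theta\|_2^2 = \inf_{\theta \in \Theta_0}\|\sqrt{n}(\bar X_n - \theta_0) - \sqrt{n}(\theta - \theta_0)\|_2^2$$
where $\theta_0$ is the true parameter.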
Canonical example: Normal means
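Asymptotics of the LR statistic are determined by the squared Euclidean distance between an $N(0, I_k)$-point and the “limit of $\sqrt{n}(\Theta_0 - \theta_0)$”.
Example: Cuspidal cubic
Bivariate normal mean model
$\Theta_0$ is the cuspidal cubic $\{(\theta_1, \theta_2) : \theta_1^3 = \theta_2^2\}$
Tangent cone at $\theta_0 = 0$ is the half-ray $\{(\theta_1, \theta_2) : \theta_1 \ge 0,\ \theta_2 = 0\}$
Limiting distribution of the LRT is a mixture of chi-squares:
$$\lambda_n \xrightarrow{D} \tfrac{1}{2}\chi_1^2 + \tfrac{1}{2}\chi_2^2.$$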
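The mixture above can be checked numerically. The following is a minimal Python sketch, assuming the limit is the squared distance from a standard normal point in $\mathbb{R}^2$ to the half-ray tangent cone; the function and variable names are illustrative and not from the lecture.

```python
import random


def squared_distance_to_half_ray(z1, z2):
    """Squared Euclidean distance from (z1, z2) to the half-ray {t1 >= 0, t2 = 0}."""
    if z1 >= 0:
        # The nearest point is (z1, 0), so only the second coordinate contributes.
        return z2 * z2
    # Otherwise the nearest point of the half-ray is the origin.
    return z1 * z1 + z2 * z2


def simulate_limit(num_draws, seed=0):
    """Draw from the assumed limiting distribution of the LR statistic."""
    rng = random.Random(seed)
    return [
        squared_distance_to_half_ray(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        for _ in range(num_draws)
    ]


draws = simulate_limit(200_000)
sample_mean = sum(draws) / len(draws)
print("simulated mean:", round(sample_mean, 3))
```

With probability 1/2 the first coordinate is non-negative and the squared distance is $Z_2^2 \sim \chi_1^2$; otherwise it is $Z_1^2 + Z_2^2 \sim \chi_2^2$. The simulated mean should therefore be close to $0.5 \cdot 1 + 0.5 \cdot 2 = 1.5$.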
Chernoff’s theorem: Preparation
Definition (Tangent cone)
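$$TC_{\theta_0}(\Theta_0) = \left\{ \lim_{n \to \infty} \frac{\theta_n - \theta_0}{\beta_n} : \beta_n > 0,\ \theta_n \in \Theta_0,\ \theta_n \to \theta_0 \right\}$$
Definition (Fisher-information matrix)
Positive semi-definite matrix $I(\theta)$ with entries
$$I(\theta)_{ij} = E_\theta\left[\left(\frac{\partial}{\partial \theta_i}\log p_\theta(X)\right)\left(\frac{\partial}{\partial \theta_j}\log p_\theta(X)\right)\right], \quad i, j \in [k].$$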
Chernoff’s theorem (for exponential families)
Theorem
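Suppose $\{P_\theta : \theta \in \Theta\}$ is a regular exponential family with $\Theta \subseteq \mathbb{R}^k$. Let $\theta_0 \in \Theta_0 \subseteq \Theta$ be the true parameter point. If $\Theta_0$ is Chernoff-regular at $\theta_0$ and $n \to \infty$, then the LR statistic $\lambda_n$ for $H_0 : \theta \in \Theta_0$ vs. $H_1 : \theta \notin \Theta_0$ converges in distribution to
$$\min_{\tau \in TC_{\theta_0}(\Theta_0)} \|Z - I(\theta_0)^{1/2}\tau\|_2^2$$
where $Z \sim N(0, I_k)$ and $I(\theta_0)^{1/2}$ is any matrix square root of the Fisher-information $I(\theta_0)$.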
What is Chernoff-regularity?
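Condition on how the tangent cone $TC_{\theta_0}(\Theta_0)$ approximates the set $\Theta_0$ locally at $\theta_0 \in \Theta_0$.
Allows one to pass from $\sup_{\theta \in \Theta_0} \dots$ to $\sup_{\tau \in TC_{\theta_0}(\Theta_0)} \dots$.
For $\theta_0 = 0$:
$$\operatorname{distance}(\theta, TC_0(\Theta_0)) = o(\|\theta\|), \quad \theta \in \Theta_0,$$
$$\operatorname{distance}(\tau, \Theta_0) = o(\|\tau\|), \quad \tau \in TC_0(\Theta_0)$$
Definition
A set $\Theta_0 \subseteq \mathbb{R}^k$ is Chernoff-regular at $\theta_0$ if for all $\tau \in TC_{\theta_0}(\Theta_0)$ and $\beta_n \searrow 0$ there exists a sequence $\theta_n \to \theta_0$ in $\Theta_0$ such that
$$\lim_{n \to \infty} \frac{\theta_n - \theta_0}{\beta_n} = \tau.$$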
Chernoff-regularity of semi-algebraic sets
Lemma
Semi-algebraic sets are everywhere Chernoff-regular.
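Follows from the ‘curve selection lemma’, which implies that for all $\tau \in T_\Theta(\theta_0)$ there exists a (real analytic) map $\alpha : [0, \varepsilon) \to \Theta$ with $\alpha(0) = \theta_0$ such that
$$\tau = \lim_{t \to 0^+} \frac{\alpha(t) - \alpha(0)}{t}.$$
Corollary (Testing in a submodel)
Suppose $\{P_\theta : \theta \in \Theta\}$ is a regular exponential family with $\Theta \subseteq \mathbb{R}^k$. Let $\Theta_0, \Theta_1$ be semi-algebraic subsets of $\Theta$. If the true parameter $\theta_0$ is in $\Theta_0$ and $n \to \infty$, then the LR statistic for $H_0 : \theta \in \Theta_0$ vs. $H_1 : \theta \in \Theta_1 \setminus \Theta_0$ converges to
$$\min_{\tau \in TC_{\theta_0}(\Theta_0)} \|Z - I(\theta_0)^{1/2}\tau\|_2^2 - \min_{\tau \in TC_{\theta_0}(\Theta_1)} \|Z - I(\theta_0)^{1/2}\tau\|_2^2, \quad Z \sim N(0, I_k).$$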
Lecture outline
4 Algebraic statistical models
5 Large-sample asymptotics and Chernoff’s theorem
6 Examples
Mathias Drton Lecture 2: Examples 55 / 110
Linear spaces
Lemma
If $\Theta_0$ is a $d$-dimensional linear subspace of $\mathbb{R}^k$ and $X \sim N(0, \Sigma)$ with positive definite covariance matrix $\Sigma$, then
$$\inf_{\theta \in \Theta_0} (X - \theta)^T \Sigma^{-1} (X - \theta) \sim \chi^2_{k-d}.$$
Corollary
The likelihood ratio statistic is asymptotically chi-square when testing linear or smooth hypotheses.
Order-restricted inference
Example:
$X_1$: Difference in blood pressure before and after taking 1 pill
$X_2$: Difference in blood pressure before and after taking 2 pills
Suppose $X_1 \sim N(\mu_1, \sigma_0^2)$ and $X_2 \sim N(\mu_2, \sigma_0^2)$ and test:
H0 : µ2 ≥ µ1 ≥ 0 versus H1 : (µ2 < µ1 or µ1 < 0)
or possibly,
H0 : µ2 = µ1 = 0 versus H1 : µ2 ≥ µ1 ≥ 0
Mixture of chi-square distributions
$$\frac{1}{8} \cdot \chi_0^2 + \frac{1}{2} \cdot \chi_1^2 + \frac{3}{8} \cdot \chi_2^2$$
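These weights can be reproduced by simulation. The sketch below is written under several assumptions that the slide does not state: the mixture is read as the limiting null distribution for $H_0 : \mu_2 \ge \mu_1 \ge 0$ when the true mean is the apex $\mu = 0$, the two coordinates are independent, and the variance is scaled to 1. All names are illustrative.

```python
import math
import random

SQRT_HALF = math.sqrt(0.5)
# Unit vectors along the two boundary rays of the cone {0 <= x1 <= x2}.
RAYS = ((0.0, 1.0), (SQRT_HALF, SQRT_HALF))


def classify(z1, z2):
    """Return (squared distance to the cone, chi-square degrees of freedom)."""
    if 0.0 <= z1 <= z2:
        return 0.0, 0  # the point lies in the cone
    norm_sq = z1 * z1 + z2 * z2
    # Outside the cone, the nearest cone point lies on one of the boundary rays.
    best = max(max(0.0, z1 * r1 + z2 * r2) for r1, r2 in RAYS)
    if best == 0.0:
        return norm_sq, 2  # nearest point is the apex
    return max(0.0, norm_sq - best * best), 1  # nearest point is on a ray


def estimate_weights(num_draws=200_000, seed=1):
    """Estimate the mixture weights on chi^2_0, chi^2_1 and chi^2_2."""
    rng = random.Random(seed)
    counts = [0, 0, 0]
    for _ in range(num_draws):
        _, df = classify(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        counts[df] += 1
    return [count / num_draws for count in counts]


weights = estimate_weights()
print("estimated weights on chi2_0, chi2_1, chi2_2:", weights)
```

The cone has opening angle 45 degrees, so a rotationally symmetric normal point falls inside it with probability 1/8 and inside its polar cone with probability 3/8, which matches the outer weights of the mixture.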
Convex cones – ‘Boundary problems’
Lemma
The squared distance between a standard normal random vector and a convex cone is distributed like a mixture of chi-square distributions.
Theorem (Miles, 1959; Drton & Klivans, 2009)
(a) $H_0 : \theta \in \{x \in \mathbb{R}^k : x_1 \le x_2 \le \dots \le x_k\}$
Mixture weights ∝ coefficients (in absolute value) of $t(t - 1) \cdots (t - k + 1)$
(b) $H_0 : \theta \in \{x \in \mathbb{R}^k : 0 \le x_1 \le x_2 \le \dots \le x_k\}$
Mixture weights ∝ coefficients (in absolute value) of $(t - 1)(t - 3) \cdots (t - 2k + 1)$.
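The weights in the theorem can be computed by expanding the two polynomials. The following Python sketch does this with exact fractions. The pairing of the coefficient of $t^j$ with $\chi^2_{k-j}$ is an assumption made here; it reproduces the mixture $\frac{1}{8}, \frac{1}{2}, \frac{3}{8}$ for case (b) with $k = 2$. The function names are hypothetical.

```python
from fractions import Fraction


def expand_from_roots(roots):
    """Coefficients [c_0, ..., c_k] of prod (t - r) in increasing powers of t."""
    coeffs = [1]
    for root in roots:
        shifted = [0] + coeffs                      # coefficients of t * p(t)
        scaled = [root * c for c in coeffs] + [0]   # coefficients of root * p(t)
        coeffs = [a - b for a, b in zip(shifted, scaled)]
    return coeffs


def mixture_weights(roots):
    """List w with w[df] the weight on chi^2_df (assumed pairing: t^j <-> chi^2_{k-j})."""
    k = len(roots)
    magnitudes = [abs(c) for c in expand_from_roots(roots)]
    total = sum(magnitudes)
    return [Fraction(magnitudes[k - df], total) for df in range(k + 1)]


def weights_case_a(k):
    """Case (a): polynomial t (t - 1) ... (t - k + 1)."""
    return mixture_weights(list(range(k)))


def weights_case_b(k):
    """Case (b): polynomial (t - 1)(t - 3) ... (t - 2k + 1)."""
    return mixture_weights([2 * i + 1 for i in range(k)])


print("case (a), k = 3:", weights_case_a(3))
print("case (b), k = 2:", weights_case_b(2))
```

For case (a) with $k = 2$ the cone is a half-plane, and the computed weights $\frac{1}{2}, \frac{1}{2}, 0$ agree with the fact that a normal point lies in a half-plane through the origin with probability 1/2.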
Singularities
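Geometry of a semi-algebraic set $\Theta_0 \subseteq \mathbb{R}^k$ expresses itself algebraically in the vanishing ideal
$$\mathcal{I}(\Theta_0) = \{f \in \mathbb{R}[t_1, \dots, t_k] : f(\theta) = 0 \text{ for all } \theta \in \Theta_0\}.$$
Finite generating set
$$\langle f_1, \dots, f_s \rangle = \mathcal{I}(\Theta_0), \quad \{f_1, \dots, f_s\} \subset \mathbb{R}[t_1, \dots, t_k]$$
Definition
A point $\theta_0$ in $\Theta_0$ is a singularity if the rank of the Jacobian matrix
$$J_f(\theta_0) = \left(\frac{\partial f_i(t)}{\partial t_j}\right)_{t = \theta_0} \in \mathbb{R}^{s \times k}$$
is smaller than $k - \dim \Theta_0$.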
Algebraic tangent cone
Let $\theta_0$ be a root of the polynomial $f \in \mathbb{R}[t_1, \dots, t_k]$ and write
$$f(t) = \sum_{h=l}^{L} f_h(t - \theta_0),$$
where each $f_h$ is homogeneous, $\deg(f_h) = h$, and $f_l \ne 0$.
Since $f(\theta_0) = 0$, the minimal degree satisfies $l \ge 1$, and we define $f_{\theta_0,\min} = f_l$.
Tangent cone ideal:
$$\{f_{\theta_0,\min} : f \in \mathcal{I}(\Theta_0)\} \subset \mathbb{R}[t_1, \dots, t_k].$$
Lemma
Suppose $\theta_0$ is a point in the semi-algebraic set $\Theta_0$ and $f \in \mathbb{R}[t_1, \dots, t_k]$ is a polynomial such that $f(\theta_0) = 0$ and $f(\theta) \ge 0$ for all $\theta \in \Theta_0$. Then every tangent vector $\tau \in TC_{\theta_0}(\Theta_0)$ satisfies $f_{\theta_0,\min}(\tau) \ge 0$.
Example: Cuspidal cubic
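$\Theta_0 = \{(\theta_1, \theta_2) : \theta_1^3 = \theta_2^2\}$
Tangent cone ideal for $\theta_0 = 0$ is generated by $t_2^2$
Associated algebraic tangent cone
$$\{\theta : \theta_2^2 = 0\} = \{\theta : \theta_2 = 0\}$$
Tangent cone at $\theta_0 = 0$ is the half-ray
$$\{\theta : \theta_1 \ge 0,\ \theta_2 = 0\}$$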
Instrumental variables – Singularities
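Covariance matrix
$$
\begin{pmatrix}
\omega_1 & 0 & \gamma_{31}\omega_1 & \gamma_{43}\gamma_{31}\omega_1 \\
 & \omega_2 & \gamma_{32}\omega_2 & \gamma_{43}\gamma_{32}\omega_2 \\
 & & \omega_3 + \dots & \gamma_{35}\gamma_{45}\omega_5 + \dots \\
 & & & \omega_4 + \dots
\end{pmatrix}
$$
[Figure: graph on the nodes X1, X2, X3, X4, X5]
Vanishing ideal
$$I = \langle \sigma_{12},\ \sigma_{13}\sigma_{24} - \sigma_{14}\sigma_{23} \rangle$$
Singular locus:
$$\{\Sigma = (\sigma_{ij}) : \sigma_{12} = \sigma_{13} = \sigma_{14} = \sigma_{23} = \sigma_{24} = 0\}$$
coincides with $H_0 : \gamma_{31} = \gamma_{32} = 0$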
Instrumental variables – Tangent cone
Singularities are ‘zero’
Vanishing ideal is homogeneous and thus equal to tangent cone ideal
Algebraic tangent cone at a singularity:
$$
\begin{pmatrix}
\text{diag}_{2 \times 2} & \text{rank} \le 1 \\
 & \text{arbitrary}_{2 \times 2}
\end{pmatrix}
$$
Geometric tangent cone TC is a closed cone that contains all derivative directions. It is equal to the algebraic cone.
Instrumental variables – Asymptotics
Proposition
Consider testing
$$H_0 : \gamma_{31} = \gamma_{32} = 0$$
in the instrumental variables example. Under the null and as $n \to \infty$,
$$\lambda_n \xrightarrow{d} \max\{\text{eigenvalues}(W_{2 \times 2}(2, I))\}$$
where $W_{2 \times 2}(2, I)$ is a standard Wishart matrix with 2 degrees of freedom.
‘Proof’ (Details in worked exercises 6.4 and 6.5 in lecture notes)
Tangent cone invariant under transformation with matrix square root of Fisher-information
Distance between 2×2-matrix A and {rank ≤ 1} given by smaller singular value of A
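One way to connect the two proof steps to the Wishart limit is sketched below in Python. This is an interpretation, not spelled out on the slide: it assumes that only the off-diagonal 2×2 block $A$ of the normal matrix matters, that the null tangent cone forces this block to be zero (squared distance $\|A\|_F^2$), and that the model tangent cone allows any block of rank at most 1 (squared distance equal to the smaller squared singular value). The difference is then the larger squared singular value, which is the largest eigenvalue of $AA^T$. Function names are illustrative.

```python
import math
import random


def squared_singular_values(a, b, c, d):
    """Squared singular values of [[a, b], [c, d]], largest first.

    These are the eigenvalues of A A^T, computed from its trace and determinant.
    """
    frob_sq = a * a + b * b + c * c + d * d   # trace of A A^T
    det = a * d - b * c                       # det(A A^T) equals det(A)^2
    gap = math.sqrt(max(frob_sq * frob_sq - 4.0 * det * det, 0.0))
    return (frob_sq + gap) / 2.0, (frob_sq - gap) / 2.0


def limiting_lr_draw(rng):
    """Return (difference of squared distances, largest eigenvalue of A A^T)."""
    a, b, c, d = (rng.gauss(0.0, 1.0) for _ in range(4))
    largest, smallest = squared_singular_values(a, b, c, d)
    dist_sq_null = a * a + b * b + c * c + d * d   # distance to {A = 0}
    dist_sq_model = smallest                        # distance to {rank A <= 1}
    return dist_sq_null - dist_sq_model, largest


rng = random.Random(2)
draws = [limiting_lr_draw(rng) for _ in range(1000)]
print("largest discrepancy:", max(abs(x - y) for x, y in draws))
```

Since $A$ has independent standard normal entries, $AA^T$ is a standard 2×2 Wishart matrix with 2 degrees of freedom, so both returned quantities are draws of its largest eigenvalue.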
Factor analysis
Factor analysis (conditional independence given hidden variable)
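$$
\begin{aligned}
X_1 &= \gamma_1 H + \varepsilon_1, \\
X_2 &= \gamma_2 H + \varepsilon_2, \\
X_3 &= \gamma_3 H + \varepsilon_3, \\
X_4 &= \gamma_4 H + \varepsilon_4
\end{aligned}
$$
[Figure: graph with hidden node H and observed nodes X1, X2, X3, X4]
Multivariate normal distributions $N_4(\mu, \Sigma)$ with $\mu \in \mathbb{R}^4$ and $\Sigma$ in
$$\Theta_0 = \{\Delta + \gamma\gamma^t \mid \Delta \in \mathbb{R}^{4 \times 4} \text{ positive definite diagonal},\ \gamma \in \mathbb{R}^4\}$$
Software (e.g. factanal in R) for testing
$$H_0 : \Sigma \in \Theta_0 \quad \text{vs.} \quad H_1 : \Sigma \notin \Theta_0$$
uses the LRT and a $\chi_2^2$-approximation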
Factor analysis
Histograms of 20,000 simulated p-values for sample size n = 1000:
[Figure: four histograms of p-values on [0, 1], one panel for each of Γ = (1, 1, 1, 1)^t, Γ = (1, 1, 1, 0)^t, Γ = (1, 1, 0, 0)^t, and Γ = (1, 0, 0, 0)^t]
Factor loadings 0 or 1, cond. variances 1/3 ⟹ correlations 0 or 3/4.
Three types of limiting distributions?
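The correlation values quoted for the simulation follow from the one-factor parametrization. A short check, assuming (as is conventional, though not stated on the slide) that the hidden factor has variance 1, so that $\Sigma = \Delta + \Gamma\Gamma^t$ with $\Delta = \tfrac{1}{3} I$:

$$
\operatorname{Corr}(X_i, X_j) = \frac{\Gamma_i \Gamma_j}{\sqrt{(\Gamma_i^2 + \tfrac{1}{3})(\Gamma_j^2 + \tfrac{1}{3})}} =
\begin{cases}
\dfrac{1}{4/3} = \dfrac{3}{4} & \text{if } \Gamma_i = \Gamma_j = 1, \\
0 & \text{if } \Gamma_i = 0 \text{ or } \Gamma_j = 0.
\end{cases}
$$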
Factor analysis – Singular session
LIB "sing.lib";
LIB "linalg.lib";
ring R = 0,(s11,s12,s13,s14, s22,s23,s24, s33,s34, s44,d1,d2,d3,d4, g1,g2,g3,g4),dp;
// Compute the vanishing ideal by elimination
ideal F = s11-(d1+g1^2), s12-g1*g2, s13-g1*g3, s14-g1*g4,
          s22-(d2+g2^2), s23-g2*g3, s24-g2*g4,
          s33-(d3+g3^2), s34-g3*g4,
          s44-(d4+g4^2);
ideal I = eliminate(F, d1*d2*d3*d4*g1*g2*g3*g4);
I;
Factor analysis – Singular session
ring RR = 0,(s11,s12,s13,s14, s22,s23,s24, s33,s34, s44),dp;
ideal I = fetch(R,I);
dim(groebner(I));
// Compute the singularities
ideal S = slocus(I);
S;
primdecGTZ(S);
// Tangent cone at diagonal matrix
tangentcone(I);
// at matrix with s12=1
tangentcone( subst(I,s12,s12+1) );
// at regular point with s12=s13=1
tangentcone( subst(I,s12,s12+1,s13,s13+1) );
Factor analysis: Singularities and tangent cones
Theorem (D, 2009)
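(i) A covariance matrix $\Sigma$ is a singularity of the one-factor model if and only if $\Sigma$ has at most one non-zero off-diagonal entry $\sigma_{ij}$, $i < j$.
(ii) If $\Sigma$ is diagonal then the tangent cone is the topological closure of $\{\Delta + \gamma\gamma^t \mid \Delta \in \mathbb{R}^{m \times m} \text{ diagonal},\ \gamma \in \mathbb{R}^m\}$.
(iii) If $\Sigma$ has exactly one non-zero off-diagonal entry that is positive, say $\sigma_{12} > 0$, then the tangent cone is the set of symmetric matrices
$$
\theta =
\begin{pmatrix}
\theta_{11} & \theta_{12} & \theta_{13} & \dots & \theta_{1m} \\
\theta_{12} & \theta_{22} & c\,\theta_{13} & \dots & c\,\theta_{1m} \\
 & & \theta_{33} & & \\
 & & & \ddots & \\
 & & & & \theta_{mm}
\end{pmatrix},
\quad c \in \left[\frac{\sigma_{12}}{\sigma_{11}}, \frac{\sigma_{22}}{\sigma_{12}}\right].
$$
The case $\sigma_{12} < 0$ is similar with $c < 0$.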
Exercise: RC association model (Haberman, 1981)
Two discrete r.v. $X_1$ and $X_2$ with $r_1$ and $r_2$ states, respectively.
Logarithmic parametrization
$$\log p_{ij} = \alpha_i + \beta_j + \gamma_i \delta_j, \quad i \in [r_1],\ j \in [r_2]$$
What are the singularities? (in log-prob coordinates)
What do the tangent cones at the singularities look like?
What is the asymptotic distribution for the likelihood ratio statistic when testing the independence model $X_1 \perp\!\!\!\perp X_2$ against the RC association model?
Part III
Bayesian Integrals
7 Information criteria for model selection
8 Marginal likelihood integrals
9 Resolution of singularities and Newton polyhedra
10 Reduced rank regression
Lecture outline
7 Information criteria for model selection
8 Marginal likelihood integrals
9 Resolution of singularities and Newton polyhedra
10 Reduced rank regression
Mathias Drton Lecture 3: Information criteria for model selection 73 / 110
Model selection: Setup
Observations $X^{(1)}, \dots, X^{(n)} \sim P$ i.i.d.
Unknown $P$ assumed to be in (identifiable) ambient statistical model
$$\{P_\theta : \theta \in \Theta\}, \quad \Theta \subseteq \mathbb{R}^k.$$
True parameter $\theta_0$ is such that $P_{\theta_0} = P$.
Call the submodel given by $\Theta_0 \subset \Theta$ true if $\theta_0 \in \Theta_0$.
Model selection problem
Find the “simplest” true model from a set of competing submodels associated with
$$\Theta_1, \Theta_2, \dots, \Theta_M \subseteq \Theta.$$
Score-based search
Strategy
Assign a score to each model and maximize the score.
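Assume densities $p_\theta(x)$, and define the likelihood function
$$L_n : \Theta \to \mathbb{R}, \quad \theta \mapsto \prod_{i=1}^{n} p_\theta(X^{(i)}).$$
For submodel $\Theta_i$, let
$$\hat\ell_n(i) = \sup\{\log L_n(\theta) \mid \theta \in \Theta_i\}, \quad i = 1, \dots, M.$$
If $\Theta_1 \subseteq \Theta_2$, then $\hat\ell_n(1) \le \hat\ell_n(2)$.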
Information criteria
Definition
The information criterion associated with a penalty function $\pi_n : [M] \to \mathbb{R}$ assigns the score
$$\tau_n(i) = \hat\ell_n(i) - \pi_n(i)$$
to the $i$-th model, $i = 1, \dots, M$.
Example
AIC: $\pi_n(i) = \dim(\Theta_i)$ (Akaike)
BIC: $\pi_n(i) = \frac{\dim(\Theta_i)}{2} \log(n)$ (Bayesian, Schwarz)
Information criteria strike a balance between model fit and model dimensionality.
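The scores above are simple to compute once the maximized log-likelihoods are known. The following Python sketch illustrates the definition with made-up numbers; the function names and the example values are hypothetical and only serve to show how the two penalties can lead to different selections.

```python
import math


def penalty(dim, n, criterion):
    """Penalty pi_n for a model of the given dimension."""
    if criterion == "AIC":
        return dim
    if criterion == "BIC":
        return dim / 2 * math.log(n)
    raise ValueError(f"unknown criterion: {criterion}")


def scores(max_logliks, dims, n, criterion):
    """Scores tau_n(i) = maximized log-likelihood minus penalty, for each model."""
    return [
        loglik - penalty(dim, n, criterion)
        for loglik, dim in zip(max_logliks, dims)
    ]


def select_model(max_logliks, dims, n, criterion):
    """Index of the model with the largest score."""
    values = scores(max_logliks, dims, n, criterion)
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical example: two nested models of dimension 1 and 3, n = 1000.
example_logliks = [-1520.0, -1516.0]
example_dims = [1, 3]
aic_choice = select_model(example_logliks, example_dims, 1000, "AIC")
bic_choice = select_model(example_logliks, example_dims, 1000, "BIC")
print("AIC selects model index", aic_choice, "and BIC selects model index", bic_choice)
```

In this made-up example the larger model gains 4 units of log-likelihood for 2 extra parameters. The AIC penalty difference is 2, while the BIC penalty difference is $\log(1000) \approx 6.9$, so the two criteria disagree.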
Basic consistency result
Theorem (compare Haughton, 1988)
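Consider a regular exponential family $(P_\theta \mid \theta \in \Theta)$. In particular, $\Theta \subseteq \mathbb{R}^k$ is open. Let $\Theta_1, \Theta_2 \subseteq \Theta$ be any two sets.
1 Suppose $\theta_0 \in \Theta_2 \setminus \Theta_1$. If $\frac{1}{n}|\pi_n(2) - \pi_n(1)| \to 0$ as $n \to \infty$, then
$$\lim_{n \to \infty} P_{\theta_0}(\tau_n(1) < \tau_n(2)) = 1.$$
2 Suppose $\theta_0 \in \Theta_1 \cap \Theta_2$. If $\pi_n(1) - \pi_n(2) \to \infty$ as $n \to \infty$, then
$$\lim_{n \to \infty} P_{\theta_0}(\tau_n(1) < \tau_n(2)) = 1.$$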
Consistency
Corollary
Suppose a collection of models is given by closed sets $\Theta_1, \Theta_2, \dots, \Theta_M$. If the collection is closed under intersections, and $\Theta_i \subset \Theta_j$ implies $\dim(\Theta_i) < \dim(\Theta_j)$, then:
1 AIC identifies a true model with prob one as n→∞.
2 BIC identifies smallest true model with prob one as n→∞.
Example
1 Linear regression (random design)
2 Undirected graphical models
3 Determining rank in reduced-rank regression (‘singularities’)
4 Determining number of factors in factor analysis (‘singularities’)
5 Directed graphical models (‘faithfulness’), hidden var’s (‘singularities’)
Lecture outline
7 Information criteria for model selection
8 Marginal likelihood integrals
9 Resolution of singularities and Newton polyhedra
10 Reduced rank regression
Mathias Drton Lecture 3: Marginal likelihood integrals 79 / 110
Bayesian model determination
Prior probability of model $i$:
$$P(\Theta_i), \quad i = 1, \dots, M$$
Prior distribution of parameter in model $i$:
$$Q_i(\theta), \quad \theta \in \Theta_i$$
Likelihood function:
$$L_n(\theta \mid X^{(1)}, \dots, X^{(n)}) = \prod_{i=1}^{n} p_\theta(X^{(i)})$$
Posterior probability of model $i$:
$$P(\Theta_i \mid X^{(1)}, \dots, X^{(n)}) \propto P(\Theta_i) \underbrace{\int_{\Theta_i} L_n(\theta \mid X^{(1)}, \dots, X^{(n)}) \, dQ_i(\theta)}_{\text{marginal/integrated likelihood}}$$
Marginal likelihood
In typical applications, the models are parametrized:
$$\theta = g_i(\gamma), \quad \gamma \in \mathbb{R}^d$$
Priors $Q_i$ specified via distributions on $\gamma$ that have densities $p_i(\gamma)$
Marginal likelihood for one model (suppressing index $i$):
$$\mu_n = \int_{\mathbb{R}^d} L_n\big(g(\gamma) \mid X^{(1)}, \dots, X^{(n)}\big)\, p(\gamma)\, d\gamma = \int_{\mathbb{R}^d} e^{\ell_n(g(\gamma) \mid X^{(1)}, \dots, X^{(n)})}\, p(\gamma)\, d\gamma$$
Frequentist view
Suppose $X^{(1)}, \dots, X^{(n)}, \dots \sim P_{\theta_0}$ are i.i.d. with $\theta_0 = g(\gamma_0)$.
What is the asymptotic behavior of the sequence $(\mu_n)$?
Asymptotics for marginal likelihood integrals
Theorem (Laplace approximation; Haughton, 1988)
Let $\{P_\theta : \theta \in \Theta\}$ be a regular exponential family with $\Theta \subseteq \mathbb{R}^k$. Consider an open set $\Gamma \subseteq \mathbb{R}^d$ and a smooth injective map $g : \Gamma \to \mathbb{R}^k$ with continuous inverse. Let $\theta_0 = g(\gamma_0)$ be the true parameter, and assume that the prior density $p(\gamma)$ is smooth and positive in a neighborhood of $\gamma_0$. Then
$$\log \mu_n = \hat\ell_n - \frac{d}{2}\log(n) + O_p(1),$$
where
$$\hat\ell_n = \sup_{\gamma \in \Gamma} \ell_n\big(g(\gamma) \mid X^{(1)}, \dots, X^{(n)}\big).$$
Recall: $R_n = O_p(1)$ if $\forall \varepsilon > 0\ \exists M_\varepsilon\ \forall n:\ P(|R_n| > M_\varepsilon) < \varepsilon$
Haughton actually gives an expansion of $\log \mu_n$ up to $O_p(n^{-1/2})$
Example: Normal means model
Observations:
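$$X^{(1)}, \dots, X^{(n)} \sim N(\theta, I_{k \times k}), \quad \theta \in \Theta = \mathbb{R}^k$$
Likelihood function (omitting the factor $\exp\{-\tfrac{1}{2}\sum_{i=1}^{n}\|X^{(i)} - \bar X_n\|^2\}$, which does not depend on $\theta$):
$$L_n(\theta \mid X^{(1)}, \dots, X^{(n)}) = \left(\frac{1}{\sqrt{(2\pi)^k}}\right)^{n} \exp\left\{-n \cdot \tfrac{1}{2}\|\bar X_n - \theta\|^2\right\}$$
Model parametrization $g : \mathbb{R}^d \to \mathbb{R}^k$
Marginal likelihood
$$\mu_n = C^n \int_{\mathbb{R}^d} \exp\left\{-n \cdot \tfrac{1}{2}\|\bar X_n - g(\gamma)\|^2\right\} p(\gamma)\, d\gamma$$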
Cuspidal cubic
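Model $\Theta_0 = \{\theta \in \mathbb{R}^2 : \theta_2^2 = \theta_1^3\}$
Parametrized by $g(\gamma) = (\gamma^2, \gamma^3)$
If $\gamma_0 \ne 0$, i.e., $g(\gamma_0) \ne 0$, then Haughton’s Theorem applies.
If $\theta_0 = g(\gamma_0) \ne 0$, then
$$\log \int_{-\infty}^{\infty} \exp\left\{-n \cdot \tfrac{1}{2}\|\bar X_n - g(\gamma)\|^2\right\} p(\gamma)\, d\gamma = -\tfrac{1}{2}\log(n) + O_p(1).$$
(Exponent ≈ quadratic in $\gamma$, Gaussian density with variance $c/n$)
What if $\theta_0 = 0 \iff \gamma_0 = 0$?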
Cuspidal cubic
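Integral with normalizing constant omitted:
$$\int_{-\infty}^{\infty} \exp\left\{-\tfrac{1}{2}\left[(\sqrt{n}\,\gamma^2 - \sqrt{n}\,\bar X_{n,1})^2 + (\sqrt{n}\,\gamma^3 - \sqrt{n}\,\bar X_{n,2})^2\right]\right\} p(\gamma)\, d\gamma$$
Change of variables $\tilde\gamma = n^{1/4}\gamma$:
$$n^{-1/4} \int_{-\infty}^{\infty} \exp\left\{-\tfrac{1}{2}\left[(\tilde\gamma^2 - \sqrt{n}\,\bar X_{n,1})^2 + \left(\frac{\tilde\gamma^3}{n^{1/4}} - \sqrt{n}\,\bar X_{n,2}\right)^2\right]\right\} p\left(\frac{\tilde\gamma}{n^{1/4}}\right) d\tilde\gamma.$$
Let $\theta_0 = 0$ and $Z_1, Z_2 \overset{\text{ind}}{\sim} N(0, 1)$. Limit when multiplying by $n^{1/4}$:
$$\int_{-\infty}^{\infty} \exp\left\{-\tfrac{1}{2}\left[(\tilde\gamma^2 - Z_1)^2 + Z_2^2\right]\right\} p(0)\, d\tilde\gamma.$$
Hence, $\log \mu_n = \hat\ell_n - \tfrac{1}{4}\log(n) + O_p(1)$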
Observation
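Sequence of random integrals:
$$
\log \mu_n = \log \int_{\mathbb{R}^d} C^n \exp\left\{-n \cdot \tfrac{1}{2}\|\bar X_n - g(\gamma)\|^2\right\} p(\gamma)\, d\gamma =
\begin{cases}
\hat\ell_n - \tfrac{1}{2}\log(n) + O_p(1) & \text{if } \gamma_0 \ne 0, \\
\hat\ell_n - \tfrac{1}{4}\log(n) + O_p(1) & \text{if } \gamma_0 = 0
\end{cases}
$$
Deterministic integrals (replace $\bar X_n$ by its expectation $\theta_0 = g(\gamma_0)$):
$$
\log \int_{-\infty}^{\infty} C^n \exp\left\{-n \cdot \tfrac{1}{2}\|g(\gamma_0) - g(\gamma)\|^2\right\} p(\gamma)\, d\gamma =
\begin{cases}
n\log(C) - \tfrac{1}{2}\log(n) + O(1) & \text{if } \gamma_0 \ne 0, \\
n\log(C) - \tfrac{1}{4}\log(n) + O(1) & \text{if } \gamma_0 = 0
\end{cases}
$$
Same asymptotics!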
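The two growth rates of the deterministic integral can be observed numerically. The Python sketch below approximates the integral for the cuspidal cubic map $g(\gamma) = (\gamma^2, \gamma^3)$ with a midpoint rule. It makes simplifying assumptions that are not in the slides: the constant $C^n$ is dropped, the prior is uniform with density 1 on an interval of half-width 1 around $\gamma_0$, and the exponent $q$ in "integral ≈ const · $n^{-q}$" is estimated from two sample sizes. All names are illustrative.

```python
import math


def cusp_map(gamma):
    """Parametrization g(gamma) = (gamma^2, gamma^3) of the cuspidal cubic."""
    return gamma ** 2, gamma ** 3


def deterministic_integral(n, gamma0, half_width=1.0, num_points=40_000):
    """Midpoint-rule value of the integral of exp(-n/2 * ||g(gamma0) - g(gamma)||^2)
    over [gamma0 - half_width, gamma0 + half_width] (uniform prior density 1)."""
    t1, t2 = cusp_map(gamma0)
    step = 2.0 * half_width / num_points
    total = 0.0
    for i in range(num_points):
        gamma = gamma0 - half_width + (i + 0.5) * step
        g1, g2 = cusp_map(gamma)
        total += math.exp(-0.5 * n * ((g1 - t1) ** 2 + (g2 - t2) ** 2))
    return total * step


def growth_exponent(gamma0, n=10_000, factor=16):
    """Estimate q from the ratio of the integral at n and at factor * n."""
    ratio = deterministic_integral(n, gamma0) / deterministic_integral(factor * n, gamma0)
    return math.log(ratio) / math.log(factor)


q_regular = growth_exponent(1.0)   # gamma0 != 0: smooth point of the curve
q_singular = growth_exponent(0.0)  # gamma0 = 0: the cusp
print("estimated exponent at a smooth point:", round(q_regular, 3))
print("estimated exponent at the cusp:", round(q_singular, 3))
```

At a smooth point the exponent is close to $d/2 = 1/2$, while at the cusp it is close to $1/4$, because the exponent behaves like $\gamma^4$ rather than a quadratic near $\gamma_0 = 0$.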
Laplace integrals
Theorem
Let $\{P_\theta : \theta \in \Theta\}$ be a regular exponential family. Consider a polynomial map $g : \mathbb{R}^d \to \Theta$, and let $\theta_0 = g(\gamma_0)$ be the true parameter. Assume that the prior density $p(\gamma)$ is smooth and positive on a compact and semi-analytic supporting set. Then
$$\log \mu_n = \hat\ell_n - q\log(n) + (s - 1)\log\log(n) + O_p(1),$$
where the rational number $q \in (0, d/2]$ and the integer $s \in [d]$ satisfy
$$\log \int e^{-n\|g(\gamma) - \theta_0\|^2} p(\gamma)\, d\gamma = -q\log(n) + (s - 1)\log\log(n) + O(1).$$
Remark
The remainder can be shown to converge in distribution.
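As a small illustration of how the expansion would be used for model comparison, the sketch below evaluates its leading terms. The function name is hypothetical and the $O_p(1)$ remainder is simply dropped; with $q = d/2$ and $s = 1$ the expression reduces to the familiar BIC-type approximation $\hat\ell_n - \tfrac{d}{2}\log(n)$.

```python
import math

def log_marginal_approx(loglik_hat, n, q, s):
    # Leading terms of the expansion in the theorem:
    #   l_hat - q*log(n) + (s - 1)*log(log(n)).
    # The bounded remainder O_p(1) is ignored (assumption of this sketch).
    return loglik_hat - q * math.log(n) + (s - 1) * math.log(math.log(n))

if __name__ == "__main__":
    # Regular model with d = 3 parameters: q = d/2, s = 1.
    print(log_marginal_approx(-100.0, 1000, 1.5, 1))
    # Singular point with a smaller growth index and a double pole.
    print(log_marginal_approx(-100.0, 1000, 1.0, 2))
```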
![Page 94: An Introduction to Algebraic Statisticsmd5/Papers/algstat.pdf · 2010-01-13 · ‘Algebraic statistics’ Application and development of techniques in Algebraic Geometry, Commutative](https://reader033.vdocument.in/reader033/viewer/2022042222/5ec8669398fe501cbb4a232c/html5/thumbnails/94.jpg)
Watanabe’s book
The theorem is proven in the book by Watanabe.
Watanabe also discusses algebraic techniques for computing the learning coefficient (= growth index $q$) and the multiplicity $s$.
Singular integrals:
Arnol'd, V.I.; Gusein-Zade, S.M.; Varchenko, A.N. Singularities of differentiable maps. Vol. I & II, 1985/88.
Work by Michael Greenblatt at UIC
![Page 95: An Introduction to Algebraic Statisticsmd5/Papers/algstat.pdf · 2010-01-13 · ‘Algebraic statistics’ Application and development of techniques in Algebraic Geometry, Commutative](https://reader033.vdocument.in/reader033/viewer/2022042222/5ec8669398fe501cbb4a232c/html5/thumbnails/95.jpg)
Example: Sample vs true mean in normal means model
Random integral
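$$\log\mu_n = \log\int_{\mathbb{R}^d} \exp\Big\{-n\cdot\tfrac{1}{2}\|X_n - g(\gamma)\|^2\Big\}\, p(\gamma)\, d\gamma$$

Simple bound for any $a > 0$:

$$2\,|\langle X_n - \theta_0,\; g(\gamma) - \theta_0\rangle| \le a\,\|X_n - \theta_0\|^2 + \frac{1}{a}\,\|g(\gamma) - \theta_0\|^2$$

Bound in exponent:

$$\|X_n - g(\gamma)\|^2 \overset{a=1}{\le} 2\,\|g(\gamma) - \theta_0\|^2 + 2\,\|X_n - \theta_0\|^2$$

$$\|X_n - g(\gamma)\|^2 \overset{a=2}{\ge} \tfrac{1}{2}\,\|g(\gamma) - \theta_0\|^2 - \|X_n - \theta_0\|^2$$

If the deterministic integral based on $e^{-n\|g(\gamma)-\theta_0\|^2}$ has an asymptotic expansion, then the random integrals have the same growth behavior.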
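The step from the two bounds to the conclusion can be spelled out as follows. This is a sketch of the argument rather than a quotation from the lecture; it assumes that $X_n$ is a sample mean with expectation $\theta_0$, so that $n\|X_n - \theta_0\|^2 = O_p(1)$.

```latex
\begin{align*}
\|X_n - g(\gamma)\|^2
  &= \|X_n-\theta_0\|^2 + \|g(\gamma)-\theta_0\|^2
     - 2\langle X_n-\theta_0,\; g(\gamma)-\theta_0\rangle, \\[4pt]
e^{-n\|X_n-\theta_0\|^2}\int e^{-n\|g(\gamma)-\theta_0\|^2}\,p(\gamma)\,d\gamma
  &\le \int e^{-\frac{n}{2}\|X_n-g(\gamma)\|^2}\,p(\gamma)\,d\gamma \\
  &\le e^{\frac{n}{2}\|X_n-\theta_0\|^2}\int e^{-\frac{n}{4}\|g(\gamma)-\theta_0\|^2}\,p(\gamma)\,d\gamma .
\end{align*}
```

Both prefactors are bounded in probability, and replacing $n$ by $n/4$ changes $-q\log(n) + (s-1)\log\log(n)$ only by a bounded amount, so the logarithm of the random integral shares the growth index $q$ and multiplicity $s$ of the deterministic one.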
![Page 96: An Introduction to Algebraic Statisticsmd5/Papers/algstat.pdf · 2010-01-13 · ‘Algebraic statistics’ Application and development of techniques in Algebraic Geometry, Commutative](https://reader033.vdocument.in/reader033/viewer/2022042222/5ec8669398fe501cbb4a232c/html5/thumbnails/96.jpg)
Lecture outline
7 Information criteria for model selection
8 Marginal likelihood integrals
9 Resolution of singularities and Newton polyhedra
10 Reduced rank regression
Mathias Drton Lecture 3: Resolution of singularities and Newton polyhedra 90 / 110
![Page 97: An Introduction to Algebraic Statisticsmd5/Papers/algstat.pdf · 2010-01-13 · ‘Algebraic statistics’ Application and development of techniques in Algebraic Geometry, Commutative](https://reader033.vdocument.in/reader033/viewer/2022042222/5ec8669398fe501cbb4a232c/html5/thumbnails/97.jpg)
Zeta function
Polynomial map $f : \mathbb{R}^d \to [0,\infty)$
Smooth prior $p(\gamma)$, positive on compact semi-analytic support
Laplace integral $\int e^{-nf(\gamma)}\, p(\gamma)\, d\gamma$
Zeta function:

$$\zeta(\lambda) = \int f(\gamma)^{\lambda}\, p(\gamma)\, d\gamma, \qquad \lambda \in \mathbb{C},\ \operatorname{Re}(\lambda) > 0$$

Theorem
The zeta function $\zeta(\lambda)$ can be continued (uniquely) to a meromorphic function on all of $\mathbb{C}$. All poles are negative rational numbers. The negated growth index $-q$ is the largest pole of $\zeta(\lambda)$, and the multiplicity $s$ is the multiplicity of this pole.
![Page 98: An Introduction to Algebraic Statisticsmd5/Papers/algstat.pdf · 2010-01-13 · ‘Algebraic statistics’ Application and development of techniques in Algebraic Geometry, Commutative](https://reader033.vdocument.in/reader033/viewer/2022042222/5ec8669398fe501cbb4a232c/html5/thumbnails/98.jpg)
Local view
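For large $n$, the main contribution to $\int e^{-nf(\gamma)}\, p(\gamma)\, d\gamma$ comes from a neighborhood of

$$V_f = \{\gamma : f(\gamma) = 0\} \cap \operatorname{supp}(p).$$

Since the prior support is assumed compact, study the asymptotics of

$$\int_{U(\gamma_0)} e^{-nf(\gamma)}\, p(\gamma)\, d\gamma, \qquad U(\gamma_0) \text{ a small neighborhood of } \gamma_0,$$

for all $\gamma_0 \in V_f$.
Note: For the marginal likelihood, $f(\gamma) = 0 \iff g(\gamma) = \theta_0$ (‘identifiability’ issues)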
![Page 99: An Introduction to Algebraic Statisticsmd5/Papers/algstat.pdf · 2010-01-13 · ‘Algebraic statistics’ Application and development of techniques in Algebraic Geometry, Commutative](https://reader033.vdocument.in/reader033/viewer/2022042222/5ec8669398fe501cbb4a232c/html5/thumbnails/99.jpg)
Resolution of singularities
Theorem (Hironaka, 1964; Atiyah, 1970)
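In the considered setup, for every $\gamma_0 \in V_f$, there exist
a neighborhood $U(\gamma_0)$ of $\gamma_0 \in \mathbb{R}^d$ and
changes of coordinates
such that the zeta function becomes a finite sum of the form

$$\int_{U(\gamma_0)} f(\gamma)^{\lambda}\, p(\gamma)\, d\gamma = \sum_{\alpha} \int_{[0,b]^d} \Big(u_1^{2k_1(\alpha)} \cdots u_d^{2k_d(\alpha)}\Big)^{\lambda}\, \varphi_\alpha(u)\, u_1^{h_1(\alpha)} \cdots u_d^{h_d(\alpha)}\, du,$$

where the $\varphi_\alpha$ are smooth and bounded away from zero on $[0,b]^d$.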
![Page 100: An Introduction to Algebraic Statisticsmd5/Papers/algstat.pdf · 2010-01-13 · ‘Algebraic statistics’ Application and development of techniques in Algebraic Geometry, Commutative](https://reader033.vdocument.in/reader033/viewer/2022042222/5ec8669398fe501cbb4a232c/html5/thumbnails/100.jpg)
Largest pole and multiplicity
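Once in ‘normal crossing form’, meromorphic continuation and determination of poles are clear.
Example:

$$\int (u^{2k})^{\lambda}\, u^{h}\, du = \frac{u^{2k\lambda+h+1}}{2k\lambda+h+1}, \qquad \text{pole } \lambda = -\frac{h+1}{2k}$$

Growth index:

$$q = \min_{\alpha}\ \min_{1\le j\le d}\ \frac{h_j(\alpha)+1}{2k_j(\alpha)}$$

Multiplicity:

$$s = \max_{\alpha}\ \#\Big\{ j : \frac{h_j(\alpha)+1}{2k_j(\alpha)} = q \Big\}$$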
![Page 101: An Introduction to Algebraic Statisticsmd5/Papers/algstat.pdf · 2010-01-13 · ‘Algebraic statistics’ Application and development of techniques in Algebraic Geometry, Commutative](https://reader033.vdocument.in/reader033/viewer/2022042222/5ec8669398fe501cbb4a232c/html5/thumbnails/101.jpg)
Example: Blow-up transformations
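Product interval

$$\int_{-1}^{1}\int_{-1}^{1} e^{-n\cdot(x^4+y^6)}\, dy\, dx \sim n^{-1/4}\, n^{-1/6}\cdot C = n^{-5/12}\cdot C$$

Resolve by repeatedly applying the blow-up transformation, i.e., the pair

$$x = x_1,\ y = x_1 y_1; \qquad x = x_2 y_2,\ y = y_2.$$

(Figure: illustration of the blow-up chart $x = x'y'$, $y = y'$.)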
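Because both charts are monomial substitutions, one blow-up step can be carried out on exponent vectors alone. The sketch below is an illustration with hypothetical helper names, restricted to bivariate polynomials stored as dictionaries from exponent pairs to coefficients; it applies a chart and then pulls out the common monomial factor.

```python
def blow_up(poly, chart):
    # poly maps exponent pairs (i, j) of x^i y^j to coefficients.
    # chart 1: x = x1,    y = x1*y1  sends x^i y^j to x1^(i+j) y1^j
    # chart 2: x = x2*y2, y = y2     sends x^i y^j to x2^i y2^(i+j)
    out = {}
    for (i, j), c in poly.items():
        key = (i + j, j) if chart == 1 else (i, i + j)
        out[key] = out.get(key, 0) + c
    return out

def factor_monomial(poly):
    # Pull out the largest monomial dividing every term.
    m0 = min(i for i, _ in poly)
    m1 = min(j for _, j in poly)
    rest = {(i - m0, j - m1): c for (i, j), c in poly.items()}
    return (m0, m1), rest

f = {(4, 0): 1, (0, 6): 1}  # x^4 + y^6

if __name__ == "__main__":
    print(factor_monomial(blow_up(f, 1)))  # x1^4 * (1 + x1^2 y1^6)
    print(factor_monomial(blow_up(f, 2)))  # y2^4 * (x2^4 + y2^2)
```

A chart is in normal crossing form once the remaining factor has a nonzero constant term; in the second chart above it does not, so that chart has to be blown up again.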
![Page 102: An Introduction to Algebraic Statisticsmd5/Papers/algstat.pdf · 2010-01-13 · ‘Algebraic statistics’ Application and development of techniques in Algebraic Geometry, Commutative](https://reader033.vdocument.in/reader033/viewer/2022042222/5ec8669398fe501cbb4a232c/html5/thumbnails/102.jpg)
Example: Blow-up transformations
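First blow-up transformation gives

$$\begin{aligned} x^4 + y^6 &= x_1^4\,(1 + x_1^2 y_1^6) && \text{Jacobian: } x_1 \\ &= y_2^4\,(x_2^4 + y_2^2) && \text{Jacobian: } y_2 \end{aligned}$$

In 1st coordinates normal crossing, $4\lambda + 2 = 0$, pole $-\tfrac{1}{2}$
In 2nd coordinates not normal crossing, repeat

$$\begin{aligned} y^4\,(x^4 + y^2) &= x_1^6 y_1^4\,(x_1^2 + y_1^2) && \text{Jacobian: } x_1^2 y_1 \\ &= y_2^6\,(1 + x_2^4 y_2^2) && \text{Jacobian: } y_2^2 \end{aligned}$$

In 2nd coordinates normal crossing, $6\lambda + 3 = 0$, pole $-\tfrac{1}{2}$
In 1st coordinates not normal crossing, repeat

$$\begin{aligned} x^6 y^4\,(x^2 + y^2) &= x_1^{12} y_1^4\,(1 + y_1^2) && \text{Jacobian: } x_1^4 y_1 \\ &= x_2^6 y_2^{12}\,(1 + x_2^2) && \text{Jacobian: } x_2^2 y_2^4 \end{aligned}$$

Normal crossing in both coordinates: $q = \tfrac{5}{12}$, $s = 1$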
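The bookkeeping that turns normal crossing charts into $q$ and $s$ is mechanical, so a short sketch may help. It is only an illustration with a hypothetical function name: each chart is described by the exponents $2k_j$ of its monomial and the exponents $h_j$ of its Jacobian factor, and the four normal crossing charts of $x^4 + y^6$ are entered by hand.

```python
from fractions import Fraction

def growth_index_and_multiplicity(charts):
    # Each chart is a pair (two_k, h): exponents 2k_j of the monomial and
    # exponents h_j of the Jacobian factor, one entry per coordinate.
    best_q, best_s = None, 0
    for two_k, h in charts:
        # Coordinates with 2k_j = 0 contribute no pole.
        ratios = [Fraction(hj + 1, kj) for kj, hj in zip(two_k, h) if kj > 0]
        if not ratios:
            continue
        q = min(ratios)
        s = ratios.count(q)
        if best_q is None or q < best_q:
            best_q, best_s = q, s
        elif q == best_q:
            best_s = max(best_s, s)
    return best_q, best_s

charts = [
    ((4, 0), (1, 0)),    # x1^4 (1 + x1^2 y1^6),  Jacobian x1
    ((0, 6), (0, 2)),    # y2^6 (1 + x2^4 y2^2),  Jacobian y2^2
    ((12, 4), (4, 1)),   # x1^12 y1^4 (1 + y1^2), Jacobian x1^4 y1
    ((6, 12), (2, 4)),   # x2^6 y2^12 (1 + x2^2), Jacobian x2^2 y2^4
]

if __name__ == "__main__":
    print(growth_index_and_multiplicity(charts))  # q = 5/12, s = 1
```

Exact rational arithmetic is used so that ties between poles, which decide the multiplicity, are detected reliably.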
![Page 106: An Introduction to Algebraic Statisticsmd5/Papers/algstat.pdf · 2010-01-13 · ‘Algebraic statistics’ Application and development of techniques in Algebraic Geometry, Commutative](https://reader033.vdocument.in/reader033/viewer/2022042222/5ec8669398fe501cbb4a232c/html5/thumbnails/106.jpg)
Resolution – Singular session
    LIB "resolve.lib";
    ring R = 0,(x,y),dp;      // Q[x,y], degree reverse lexicographic ordering
    ideal J = x4+y6;          // the hypersurface x^4 + y^6
    list L=resolve(J);
    presentTree(L);

    list L=resolve(J,0,"A");
    presentTree(L);
    LIB "reszeta.lib";
    list coll=collectDiv(L);
    LIB "resgraph.lib";
    ResTree(L,coll[1]);
![Page 107: An Introduction to Algebraic Statisticsmd5/Papers/algstat.pdf · 2010-01-13 · ‘Algebraic statistics’ Application and development of techniques in Algebraic Geometry, Commutative](https://reader033.vdocument.in/reader033/viewer/2022042222/5ec8669398fe501cbb4a232c/html5/thumbnails/107.jpg)
Distance of Newton polyhedron
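$$\int_{-1}^{1}\int_{-1}^{1} e^{-n\cdot(x^4+y^6)}\, dy\, dx \sim n^{-1/4}\, n^{-1/6}\cdot C = n^{-5/12}\cdot C$$

(Figure: Newton polyhedron with marked points (6,0), (0,4) and (12/5, 12/5).)

Distance:

$$\rho = 4\cdot\tfrac{3}{5} = 6\cdot\tfrac{2}{5} = \tfrac{12}{5} \implies q = \frac{1}{\rho}$$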
![Page 108: An Introduction to Algebraic Statisticsmd5/Papers/algstat.pdf · 2010-01-13 · ‘Algebraic statistics’ Application and development of techniques in Algebraic Geometry, Commutative](https://reader033.vdocument.in/reader033/viewer/2022042222/5ec8669398fe501cbb4a232c/html5/thumbnails/108.jpg)
Newton polyhedron
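Polynomial

$$f(x) = \sum_{a \in \mathbb{N}^d} c_a x^a, \qquad x^a = x_1^{a_1} \cdots x_d^{a_d}$$

Newton polyhedron $P_f$ is the convex hull of the set

$$\bigcup_{a\,:\,c_a \neq 0} \big(\{a\} + [0,\infty)^d\big)$$

Distance:

$$\rho = \min\{ r : r\cdot \mathbf{1}_d \in P_f \}$$

For $A \subset \mathbb{R}^d$, define

$$f_A(x) = \sum_{a \in A \cap \mathbb{N}^d} c_a x^a$$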
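For bivariate polynomials the distance $\rho$ can be computed directly from the exponent vectors. The sketch below is an illustration restricted to $d = 2$, with a hypothetical function name: the diagonal point $(r, r)$ enters the polyhedron either through the shifted orthant of a single exponent or through a segment joining two exponents on opposite sides of the diagonal.

```python
from fractions import Fraction
from itertools import combinations

def newton_distance_2d(exponents):
    # Smallest r with (r, r) in the Newton polyhedron of a bivariate
    # polynomial, given the exponent vectors (a1, a2) with c_a != 0.
    # A single exponent a puts (r, r) in a + [0, inf)^2 once r >= max(a).
    best = min(Fraction(max(a)) for a in exponents)
    # Otherwise the diagonal crosses a segment between two exponents
    # lying strictly on opposite sides of it.
    for a, b in combinations(exponents, 2):
        da, db = a[0] - a[1], b[0] - b[1]
        if da * db < 0:
            t = Fraction(db, db - da)  # weight on a: t*da + (1 - t)*db = 0
            r = t * a[0] + (1 - t) * b[0]
            best = min(best, r)
    return best

if __name__ == "__main__":
    rho = newton_distance_2d([(4, 0), (0, 6)])  # x^4 + y^6
    print(rho, 1 / rho)  # 12/5 and 5/12
```

In higher dimensions the same minimum is a small linear program over convex weights on the exponent vectors; that generalization is not attempted here.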
![Page 109: An Introduction to Algebraic Statisticsmd5/Papers/algstat.pdf · 2010-01-13 · ‘Algebraic statistics’ Application and development of techniques in Algebraic Geometry, Commutative](https://reader033.vdocument.in/reader033/viewer/2022042222/5ec8669398fe501cbb4a232c/html5/thumbnails/109.jpg)
Non-degenerate exponents and remoteness
Theorem
If the polynomial $f$ has a minimum at zero and is non-degenerate, that is, for any compact face $A$ of the Newton polyhedron the equation system

$$\frac{\partial f_A(x)}{\partial x_1} = \dots = \frac{\partial f_A(x)}{\partial x_d} = 0$$

has no solution in $(\mathbb{R} \setminus \{0\})^d$, then for small $\varepsilon$ the growth index for the integral

$$\int_{[-\varepsilon,\varepsilon]^d} e^{-nf(\gamma)}\, p(\gamma)\, d\gamma$$

is $q = 1/\rho$, and the multiplicity $s$ is the codimension of the lowest-dimensional face containing the point at which the ray spanned by $\mathbf{1}_d$ first intersects the Newton polyhedron.
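As a check of the hypotheses on the running example $f(x, y) = x^4 + y^6$, the compact faces of its Newton polyhedron are the two vertices $(4,0)$ and $(0,6)$, written in $(x, y)$ exponent order, and the edge joining them. The following worked verification is a sketch added for illustration.

```latex
\begin{align*}
A = \{(4,0)\}:      &\quad f_A = x^4,       &\partial_x f_A &= 4x^3, &\partial_y f_A &= 0,\\
A = \{(0,6)\}:      &\quad f_A = y^6,       &\partial_x f_A &= 0,    &\partial_y f_A &= 6y^5,\\
A = \text{edge}:    &\quad f_A = x^4 + y^6, &\partial_x f_A &= 4x^3, &\partial_y f_A &= 6y^5.
\end{align*}
```

Each system forces $x = 0$ or $y = 0$, so none has a solution in $(\mathbb{R}\setminus\{0\})^2$ and $f$ is non-degenerate. The diagonal first meets the polyhedron at $(12/5, 12/5)$, which lies in the interior of the edge, a face of codimension 1. The theorem then gives $q = 1/\rho = 5/12$ and $s = 1$, in agreement with the blow-up computation.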
![Page 110: An Introduction to Algebraic Statisticsmd5/Papers/algstat.pdf · 2010-01-13 · ‘Algebraic statistics’ Application and development of techniques in Algebraic Geometry, Commutative](https://reader033.vdocument.in/reader033/viewer/2022042222/5ec8669398fe501cbb4a232c/html5/thumbnails/110.jpg)
Lecture outline
7 Information criteria for model selection
8 Marginal likelihood integrals
9 Resolution of singularities and Newton polyhedra
10 Reduced rank regression
Mathias Drton Lecture 3: Reduced rank regression 101 / 110
![Page 111: An Introduction to Algebraic Statisticsmd5/Papers/algstat.pdf · 2010-01-13 · ‘Algebraic statistics’ Application and development of techniques in Algebraic Geometry, Commutative](https://reader033.vdocument.in/reader033/viewer/2022042222/5ec8669398fe501cbb4a232c/html5/thumbnails/111.jpg)
Reduced rank regression
Multivariate regression model
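$$Y = \theta X + \varepsilon, \qquad \theta \in \mathbb{R}^{a\times b}, \qquad \operatorname{rank}(\theta) \le h$$

(Figure: graph with nodes $X_1$, $X_2$, $H$, $Y_1$, $Y_2$.)

Multivariate normal model (random design $X$)
Parametrize

$$\theta = g(\alpha,\beta) = \alpha\beta^T, \qquad \alpha \in \mathbb{R}^{a\times h},\ \beta \in \mathbb{R}^{b\times h}$$

Model selection problem: Determine $h$
WLOG: Assume coordinates of $X$ and $\varepsilon$ mutually independent with known variances.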
![Page 112: An Introduction to Algebraic Statisticsmd5/Papers/algstat.pdf · 2010-01-13 · ‘Algebraic statistics’ Application and development of techniques in Algebraic Geometry, Commutative](https://reader033.vdocument.in/reader033/viewer/2022042222/5ec8669398fe501cbb4a232c/html5/thumbnails/112.jpg)
Asymptotics – regular case
Consider model given by rank h.
Suppose true matrix θ0 has rank r ≤ h.
Interested in the asymptotics of the integral

$$\int\!\!\int \exp\{-n\,\|\alpha\beta^T - \theta_0\|^2\}\, d\alpha\, d\beta$$

Regular case:
The Jacobian of the map $g(\alpha,\beta) = \alpha\beta^T$ achieves its maximal rank $h(a+b-h)$ at a point $(\alpha_0,\beta_0)$ if and only if $\alpha_0\beta_0^T$ has full rank $h$.
If $\theta_0$ has rank $r = h$, then the set $g^{-1}(\theta_0) \subseteq \mathbb{R}^{ah+bh}$ is a smooth manifold of dimension $h^2$.
Reparametrize and apply Laplace approximation (Haughton's result) to obtain

$$q = h(a+b-h)/2, \qquad s = 1.$$
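The rank statement can be probed on small examples with exact arithmetic. The sketch below is an illustration only; the helper names and the test matrices are made up. It assembles the Jacobian of $(\alpha,\beta) \mapsto \alpha\beta^T$ entry by entry and computes its rank by Gaussian elimination over the rationals.

```python
from fractions import Fraction

def jacobian(alpha, beta):
    # Jacobian of (alpha, beta) -> alpha * beta^T. Rows are indexed by the
    # entries theta_ij, columns by the entries of alpha followed by beta.
    a, h, b = len(alpha), len(alpha[0]), len(beta)
    rows = []
    for i in range(a):
        for j in range(b):
            row = [Fraction(0)] * (a * h + b * h)
            for k in range(h):
                row[i * h + k] = Fraction(beta[j][k])           # d theta_ij / d alpha_ik
                row[a * h + j * h + k] = Fraction(alpha[i][k])  # d theta_ij / d beta_jk
            rows.append(row)
    return rows

def rank(matrix):
    # Rank by exact Gaussian elimination over the rationals.
    m = [row[:] for row in matrix]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

if __name__ == "__main__":
    beta = [[1, 2], [3, 4], [5, 6]]
    full = [[1, 0], [0, 1], [1, 1]]       # alpha * beta^T has rank h = 2
    deficient = [[1, 0], [0, 0], [1, 0]]  # alpha * beta^T has rank 1
    print(rank(jacobian(full, beta)))       # h(a + b - h) = 8
    print(rank(jacobian(deficient, beta)))  # strictly smaller
```

The gap between the $ah + bh$ parameters and the rank $h(a+b-h)$ is $h^2$, matching the stated dimension of the fiber $g^{-1}(\theta_0)$.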
![Page 113: An Introduction to Algebraic Statisticsmd5/Papers/algstat.pdf · 2010-01-13 · ‘Algebraic statistics’ Application and development of techniques in Algebraic Geometry, Commutative](https://reader033.vdocument.in/reader033/viewer/2022042222/5ec8669398fe501cbb4a232c/html5/thumbnails/113.jpg)
Asymptotics – singular case
Interested in the asymptotics of the integral

$$\int\!\!\int \exp\{-n\,\|\alpha\beta^T - \theta_0\|^2\}\, d\alpha\, d\beta$$

Singular case: rank of $\theta_0$ is equal to $r < h$
Aoyagi & Watanabe (2005): Found growth index $q$ and multiplicity $s$ as a function of $(a, b, h, r)$
Simplest case with singularities is model rank h = 1
![Page 114: An Introduction to Algebraic Statisticsmd5/Papers/algstat.pdf · 2010-01-13 · ‘Algebraic statistics’ Application and development of techniques in Algebraic Geometry, Commutative](https://reader033.vdocument.in/reader033/viewer/2022042222/5ec8669398fe501cbb4a232c/html5/thumbnails/114.jpg)
Asymptotics – singular case for rank 1
Model rank h = 1
Only one singular point: θ0 = 0
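Fiber

$$g^{-1}(\theta_0) = \{(\alpha_0,\beta_0) : \alpha_0 = 0 \text{ or } \beta_0 = 0\}$$

singular at the origin $(\alpha_0,\beta_0) = 0$ and smooth elsewhere.
Local integrals are

$$\int_{U(\alpha_0)}\int_{U(\beta_0)} \exp\{-n\,(\alpha_1^2 + \dots + \alpha_a^2)(\beta_1^2 + \dots + \beta_b^2)\}\, d\alpha\, d\beta, \qquad (\alpha_0,\beta_0) \in g^{-1}(0).$$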
![Page 115: An Introduction to Algebraic Statisticsmd5/Papers/algstat.pdf · 2010-01-13 · ‘Algebraic statistics’ Application and development of techniques in Algebraic Geometry, Commutative](https://reader033.vdocument.in/reader033/viewer/2022042222/5ec8669398fe501cbb4a232c/html5/thumbnails/115.jpg)
Case 1
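Suppose $\alpha_0 = (\alpha_{01}, \dots, \alpha_{0k}, 0, \dots, 0) \neq 0$. Then $\beta_0 = 0$.
Shift $(\alpha_0, \beta_0)$ to the origin by the transformation $\bar\alpha_i = \alpha_i - \alpha_{0i}$
Local integral becomes

$$\int_{U(0)} \exp\Big\{-n\,\big[(\bar\alpha_1 + \alpha_{01})^2 + \dots + (\bar\alpha_k + \alpha_{0k})^2 + \bar\alpha_{k+1}^2 + \dots + \bar\alpha_a^2\big]\,(\beta_1^2 + \dots + \beta_b^2)\Big\}\, d(\bar\alpha,\beta)$$

Function of $\bar\alpha$ in the exponent is bounded away from zero in a neighborhood $U(0)$.
Asymptotics determined by that of

$$\int_{U(0)} \exp\{-n\,(\beta_1^2 + \dots + \beta_b^2)\}\, d\beta$$

which is a regular integral with growth index $b/2$ and multiplicity 1.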
![Page 116: An Introduction to Algebraic Statisticsmd5/Papers/algstat.pdf · 2010-01-13 · ‘Algebraic statistics’ Application and development of techniques in Algebraic Geometry, Commutative](https://reader033.vdocument.in/reader033/viewer/2022042222/5ec8669398fe501cbb4a232c/html5/thumbnails/116.jpg)
Case 2
Suppose α0 = β0 = 0.
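Resolve $(\alpha_1^2 + \dots + \alpha_a^2)(\beta_1^2 + \dots + \beta_b^2)$ by applying a blow-up to the first term and a blow-up to the second term.
We obtain

$$\int_{U(0,0)} \alpha_1^{2\lambda}\, \beta_1^{2\lambda}\, \alpha_1^{a-1}\, \beta_1^{b-1}\, \big(1 + \alpha_2^2 + \dots\big)^{\lambda} \big(1 + \beta_2^2 + \dots\big)^{\lambda}\, d\alpha\, d\beta.$$

Consider

$$\int \alpha_1^{2\lambda+a-1}\, \beta_1^{2\lambda+b-1}\, d\alpha_1\, d\beta_1 = \frac{\alpha_1^{2\lambda+a}\, \beta_1^{2\lambda+b}}{(2\lambda+a)(2\lambda+b)}.$$

Poles $\lambda = -a/2$ and $\lambda = -b/2$.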
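Reading off the growth index and multiplicity at the origin from these two poles takes one more step, sketched here under the assumption that the remaining charts of the two blow-ups have the same form by symmetry.

```latex
\frac{1}{(2\lambda+a)(2\lambda+b)}
\quad\Longrightarrow\quad
q = \tfrac{1}{2}\min\{a,b\},
\qquad
s = \begin{cases} 1 & \text{if } a \neq b,\\ 2 & \text{if } a = b. \end{cases}
```

The largest pole is $-\min\{a,b\}/2$, and it is a double pole exactly when the two factors coincide, that is, when $a = b$. The local integrals at the smooth points of the fiber have growth index $b/2$ (Case 1) or, by symmetry, $a/2$, which is never smaller, so the origin determines the behavior for $\theta_0 = 0$.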
![Page 117: An Introduction to Algebraic Statisticsmd5/Papers/algstat.pdf · 2010-01-13 · ‘Algebraic statistics’ Application and development of techniques in Algebraic Geometry, Commutative](https://reader033.vdocument.in/reader033/viewer/2022042222/5ec8669398fe501cbb4a232c/html5/thumbnails/117.jpg)
Asymptotics for rank 1
Proposition
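The marginal likelihood for the reduced rank regression model for rank $h = 1$ has growth index and multiplicity

$$(q, s) = \begin{cases} \big(\tfrac{a+b-1}{2},\, 1\big) & \text{if } \theta_0 \neq 0,\\[2pt] \big(\tfrac{\min\{a,b\}}{2},\, 1\big) & \text{if } \theta_0 = 0 \text{ and } a \neq b,\\[2pt] \big(\tfrac{a}{2} = \tfrac{b}{2},\, 2\big) & \text{if } \theta_0 = 0 \text{ and } a = b. \end{cases}$$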
This can also be shown by looking at the Newton diagrams
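For quick reference, the case distinction in the proposition can be written as a small lookup. The function name is hypothetical and the sketch merely restates the proposition for rank $h = 1$.

```python
from fractions import Fraction

def rank_one_growth(a, b, theta0_is_zero):
    # (q, s) for the rank-one reduced rank regression model, following
    # the three cases of the proposition.
    if not theta0_is_zero:
        return Fraction(a + b - 1, 2), 1
    if a != b:
        return Fraction(min(a, b), 2), 1
    return Fraction(a, 2), 2

if __name__ == "__main__":
    print(rank_one_growth(3, 2, False))  # regular case: q = 2, s = 1
    print(rank_one_growth(3, 2, True))   # singular, a != b: q = 1, s = 1
    print(rank_one_growth(3, 3, True))   # singular, a == b: q = 3/2, s = 2
```

Since $\min\{a,b\} \le a + b - 1$ for $a, b \ge 1$, the growth index at $\theta_0 = 0$ never exceeds the regular value, so a penalty based on the parameter count $(a+b-1)/2$ is at least as large as the one given by the proposition.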
![Page 118: An Introduction to Algebraic Statisticsmd5/Papers/algstat.pdf · 2010-01-13 · ‘Algebraic statistics’ Application and development of techniques in Algebraic Geometry, Commutative](https://reader033.vdocument.in/reader033/viewer/2022042222/5ec8669398fe501cbb4a232c/html5/thumbnails/118.jpg)
Exercise: Factor analysis
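Let $H$ and $\varepsilon_1, \dots, \varepsilon_d$ be mutually independent $N(0,1)$ random variables.
Define

$$X = \alpha H + \varepsilon, \qquad \alpha \in \mathbb{R}^d$$

Then $X \sim N(0, \theta)$ with covariance matrix $\theta = I + \alpha\alpha^T$, $\alpha \in \mathbb{R}^d$

(Figure: graph with hidden node $H$ and observed nodes $X_1$, $X_2$, $X_3$, $X_4$.)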
What is the growth behaviour of marginal likelihood of this model?
![Page 119: An Introduction to Algebraic Statisticsmd5/Papers/algstat.pdf · 2010-01-13 · ‘Algebraic statistics’ Application and development of techniques in Algebraic Geometry, Commutative](https://reader033.vdocument.in/reader033/viewer/2022042222/5ec8669398fe501cbb4a232c/html5/thumbnails/119.jpg)
Conclusion
Algebraic statistical models: useful framework for discussing non-smooth statistical models.
Computational algebra: Markov bases, vanishing ideals, singular loci, tangent cones, resolution of singularities, . . .
Many open questions about classical statistical models . . .
Mathias Drton 110 / 110