TRANSCRIPT
Large-Deviations and Applications for Learning Tree-Structured Graphical Models
Vincent Tan
Stochastic Systems Group, Lab of Information and Decision Systems,
Massachusetts Institute of Technology
Thesis Defense (Nov 16, 2010)
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 1 / 52
Acknowledgements
The following is joint work with:
Alan Willsky (MIT)
Lang Tong (Cornell)
Animashree Anandkumar (UC Irvine)
John Fisher (MIT)
Sujay Sanghavi (UT Austin)
Matt Johnson (MIT)
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 2 / 52
Outline
1 Motivation, Background and Main Contributions
2 Learning Discrete Tree Models: Error Exponent Analysis
3 Learning Gaussian Tree Models: Extremal Structures
4 Learning High-Dimensional Forest-Structured Models
5 Related Topics and Conclusion
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 3 / 52
Motivation: A Real-Life Example
Manchester Asthma and Allergy Study (MAAS)
More than 1000 children (n ≈ 1000)
Number of variables d ≈ 10^6
Environmental, Physiological and Genetic (SNP)
www.maas.org.uk
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 5 / 52
Motivation: Modeling Large Datasets I
How do we model such data to make useful inferences?
Model the relationships between variables by a sparse graph
Reduce the number of interdependencies between the variables
[Graph over the variables: Airway Obstruction, Viral Infection, Airway Inflammation, Bronchial Hyperresponsiveness, Acquired Immune Response, Immune Response to Virus, Obesity, Smoking, Prematurity, Lung Function]
Simpson*, VYFT* et al., “Beyond Atopy: Multiple Patterns of Sensitization in Relation to Asthma in a Birth Cohort Study,” Am. J. Respir. Crit. Care Med., Feb 2010.
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 6 / 52
Motivation: Modeling Large Datasets II
Reduce the dimensionality of the covariates (features) for predicting a variable of interest (e.g., asthma)
Information-theoretic limits†?
Learning graphical models tailored specifically for hypothesis testing
Can we learn better models in the finite-sample setting‡?
† VYFT, Johnson and Willsky, “Necessary and Sufficient Conditions for Salient Subset Recovery,” Intl. Symp. on Info. Theory, Jul 2010.
‡ VYFT, Sanghavi, Fisher and Willsky, “Learning Graphical Models for Hypothesis Testing and Classification,” IEEE Trans. on Signal Processing, Nov 2010.
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 7 / 52
Graphical Models: Introduction
Graph structure G = (V, E) represents a multivariate distribution of a random vector X = (X1, . . . , Xd) indexed by V = {1, . . . , d}
Node i ∈ V corresponds to random variable Xi
Edge set E corresponds to conditional independencies
Local Markov property: Xi ⊥⊥ X_{V \ (nbd(i) ∪ {i})} | X_{nbd(i)}
Global Markov property: X_A ⊥⊥ X_B | X_S when the set S separates A from B
[Figure adapted from Anima Anandkumar (UCI), “Trees, Latent Trees & Beyond,” 11/08/2010]
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 8 / 52
From Conditional Independence to Gibbs Distribution
Hammersley-Clifford Theorem (1971): Let P be the joint pmf of a graphical model Markov on G = (V, E). Then

P(x) = (1/Z) exp[ ∑_{c ∈ C} Ψ_c(x_c) ],

where C is the set of maximal cliques of G and Z is the normalization constant (partition function).

Gaussian Graphical Models: the sparsity pattern of the inverse covariance (precision) matrix matches the dependency graph; for i ≠ j, (Σ⁻¹)_{ij} = 0 if and only if (i, j) ∉ E.

[Figure: an 8-node dependency graph and the sparsity pattern of the corresponding inverse covariance matrix. Adapted from Anima Anandkumar (UCI), “Trees, Latent Trees & Beyond,” 11/08/2010]
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 9 / 52
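The Gaussian statement above can be checked numerically. The following is a minimal sketch (not from the slides; the chain 1-2-3-4 and its precision values are illustrative assumptions): a tridiagonal precision matrix encodes a Markov chain, and zeros in J = Σ⁻¹ line up exactly with the missing edges even though all variables remain marginally correlated.

```python
import numpy as np

# Precision matrix J for a Gaussian Markov chain 1 - 2 - 3 - 4:
# only consecutive nodes interact, so J is tridiagonal.
J = np.array([
    [1.0, 0.4, 0.0, 0.0],
    [0.4, 1.0, 0.4, 0.0],
    [0.0, 0.4, 1.0, 0.4],
    [0.0, 0.0, 0.4, 1.0],
])
Sigma = np.linalg.inv(J)  # covariance matrix

# Non-edges (1,3), (1,4), (2,4) correspond to zeros in the precision
# matrix, yet the covariance entries there are nonzero (marginal
# correlation propagates along the chain).
for i, j in [(0, 2), (0, 3), (1, 3)]:
    assert abs(J[i, j]) < 1e-12      # no edge => zero in J
    assert abs(Sigma[i, j]) > 1e-3   # but still marginally correlated
```

This is the precision-matrix/graph correspondence stated on the slide, specialized to a 4-node chain rather than the 8-node example in the figure.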
Tree-Structured Graphical Models
[Figure: a star tree with center X1 and leaves X2, X3, X4]

P(x) = ∏_{i ∈ V} Pi(xi) ∏_{(i,j) ∈ E} Pi,j(xi, xj) / (Pi(xi) Pj(xj))
     = P1(x1) · [P1,2(x1, x2)/P1(x1)] · [P1,3(x1, x3)/P1(x1)] · [P1,4(x1, x4)/P1(x1)]
Tree-structured Graphical Models: Tractable Learning and Inference
Maximum-likelihood learning of tree structure is tractable: the Chow-Liu algorithm (1968)
Inference on trees is tractable: the sum-product algorithm
Which other classes of graphical models are tractable for learning?
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 10 / 52
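The Chow-Liu algorithm mentioned above reduces ML tree learning to a max-weight spanning tree over empirical mutual informations. A minimal sketch for binary variables follows (illustrative only; the helper names and the toy 3-node chain are my own, not from the thesis):

```python
import math
import random
from collections import Counter
from itertools import combinations

def mutual_information(samples, i, j):
    """Empirical mutual information I(X_i; X_j), samples = list of tuples."""
    n = len(samples)
    pij = Counter((s[i], s[j]) for s in samples)
    pi = Counter(s[i] for s in samples)
    pj = Counter(s[j] for s in samples)
    # sum_{a,b} p(a,b) log[ p(a,b) / (p(a) p(b)) ], in nats
    return sum((c / n) * math.log(c * n / (pi[a] * pj[b]))
               for (a, b), c in pij.items())

def chow_liu(samples, d):
    """Max-weight spanning tree (Kruskal + union-find) with MI edge weights."""
    weights = sorted(((mutual_information(samples, i, j), i, j)
                      for i, j in combinations(range(d), 2)), reverse=True)
    parent = list(range(d))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    edges = []
    for _, i, j in weights:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            edges.append((i, j))
    return edges

# Toy data from a Markov chain X0 -> X1 -> X2 (bit copied, flipped w.p. 0.1)
random.seed(0)
samples = []
for _ in range(5000):
    x0 = random.randint(0, 1)
    x1 = x0 ^ (random.random() < 0.1)
    x2 = x1 ^ (random.random() < 0.1)
    samples.append((x0, x1, x2))

edges = sorted(tuple(sorted(e)) for e in chow_liu(samples, 3))
```

With dependence this strong and n = 5000, the recovered edge set should be the true chain {(0, 1), (1, 2)}; the error-exponent analysis in the next section quantifies how fast such errors vanish with n.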
Main Contributions in Thesis: I
Error Exponent Analysis of Tree Structure Learning (Ch. 3 and 4)
High-Dimensional Structure Learning for Forest Models (Ch. 5)
[Figure: extremal tree structures for learning (star, Markov chain) and structure learning beyond trees (forests, latent trees, random graphs). Adapted from Anima Anandkumar (UCI), “Trees, Latent Trees & Beyond,” 11/08/2010]
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 11 / 52
Main Contributions in Thesis: II
Learning Graphical Models for Hypothesis Testing (Ch. 6)
Devised algorithms for learning trees for hypothesis testing
[Fig. 1 from ‡: Tp is the set of tree distributions marginally consistent with p̃, the empirical distribution of the positively labeled samples; p̃ and q̃ are not trees. The generatively learned (Chow-Liu) distribution p_CL is the projection of p̃ onto Tp, minimizing D(p̃ || ·). The discriminatively learned distribution p_DT is “further” (in the KL-divergence sense) from q̃ because of the −D(q̃ || p) term in the objective.]

Proposition 3 (Edge Weights for Discriminative Trees): the edge set of the discriminative tree can be selected by a max-weight spanning tree (MWST) procedure with weights, for each pair of nodes (i, j),

ψ^{(+)}_{i,j} := E_{p̃i,j}[ log( p̃i,j / (p̃i p̃j) ) ] − E_{q̃i,j}[ log( p̃i,j / (p̃i p̃j) ) ].

Only marginal and pairwise statistics are needed to compute these weights; interchanging p̃ and q̃ yields the weights ψ^{(−)}_{i,j} for the second model. The discriminative tree (DT) procedure produces at most n − 1 edges per model (some weights may be negative, so it can terminate early), and its complexity is exactly that of learning a TAN network.
Information-Theoretic Limits for Salient Subset Recovery (Ch. 7)
Devised necessary and sufficient conditions for estimating the salient set of features
We will focus on Chapters 3-5 here. See thesis for Chapters 6 and 7.
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 12 / 52
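Proposition 3's edge weights depend only on pairwise empirical marginals, so they are cheap to compute. A small hypothetical sketch (the 2×2 joint pmfs are arbitrary illustrations, not data from the thesis) evaluating ψ^{(+)}_{i,j} for one node pair:

```python
import numpy as np

def psi_plus(p_ij, q_ij):
    """Discriminative edge weight for one pair (i, j):
    E_p[log p_ij/(p_i p_j)] - E_q[log p_ij/(p_i p_j)],
    where p_ij, q_ij are empirical pairwise joint pmfs as 2D arrays."""
    pi = p_ij.sum(axis=1, keepdims=True)   # marginal of X_i under p
    pj = p_ij.sum(axis=0, keepdims=True)   # marginal of X_j under p
    llr = np.log(p_ij / (pi * pj))         # log p_ij / (p_i p_j)
    return float((p_ij * llr).sum() - (q_ij * llr).sum())

# Under p the pair is strongly dependent; under q it is independent.
p_ij = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
q_ij = np.array([[0.25, 0.25],
                 [0.25, 0.25]])

w = psi_plus(p_ij, q_ij)   # positive: this edge discriminates p from q
```

The first term is the mutual information under p̃, and subtracting the q̃-expectation rewards edges that separate the two hypotheses, which is what the MWST then maximizes over trees.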
Outline
1 Motivation, Background and Main Contributions
2 Learning Discrete Tree Models: Error Exponent Analysis
3 Learning Gaussian Tree Models: Extremal Structures
4 Learning High-Dimensional Forest-Structured Models
5 Related Topics and Conclusion
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 13 / 52
Motivation

ML learning of tree structure given n i.i.d. X^d-valued samples

[Figure: a 4-node tree (X1, X2, X3, X4) and a plot of the error probability P^n(err) decaying with the number of samples n]

P^n(err) ≐ exp(−n · Rate)
When does the error probability decay exponentially?
What is the exact rate of decay of the probability of error?
How does the error exponent depend on the parameters and structure of the true distribution?
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 14 / 52
![Page 37: Large-Deviations and Applications for Learning Tree ...Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of](https://reader035.vdocument.in/reader035/viewer/2022071006/5fc3b962dfd94c7e685681fd/html5/thumbnails/37.jpg)
Main Contributions
Discrete case:
Provide the exact rate of decay for a given P
Rate of decay ≈ SNR for learning
Gaussian case:
Extremal structures: Star (worst) and chain (best) for learning
[Figure: a star graph (worst case) and a Markov chain (best case)]
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 15 / 52
![Page 39: Large-Deviations and Applications for Learning Tree ...Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of](https://reader035.vdocument.in/reader035/viewer/2022071006/5fc3b962dfd94c7e685681fd/html5/thumbnails/39.jpg)
Related Work in Structure Learning
ML for trees: Max-weight spanning tree with mutual information edge weights (Chow & Liu 1968)

Causal dependence trees: directed mutual information (Quinn, Coleman & Kiyavash 2010)

Convex relaxation methods: ℓ1 regularization

Gaussian graphical models (Meinshausen and Buehlmann 2006)

Logistic regression for Ising models (Ravikumar et al. 2010)

Learning thin junction trees through conditional mutual information tests (Chechetka et al. 2007)

Conditional independence tests for bounded degree graphs (Bresler et al. 2008)

We obtain and analyze error exponents for the ML learning of trees (and extensions to forests)
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 16 / 52
![Page 41: Large-Deviations and Applications for Learning Tree ...Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of](https://reader035.vdocument.in/reader035/viewer/2022071006/5fc3b962dfd94c7e685681fd/html5/thumbnails/41.jpg)
ML Learning of Trees (Chow-Liu) I
Samples x^n = x_1, . . . , x_n drawn i.i.d. from P ∈ P(X^d), X is finite

Solve the ML problem given the data x^n:

P_ML ≜ argmax_{Q ∈ Trees} (1/n) Σ_{k=1}^{n} log Q(x_k)

Denote P̂(a) = P̂(a; x^n) as the empirical distribution of x^n

Reduces to a max-weight spanning tree problem (Chow-Liu 1968):

E_ML = argmax_{E_Q : Q ∈ Trees} Σ_{e ∈ E_Q} I(P̂_e)

P̂_e is the marginal of the empirical distribution on the pair e = (i, j)

I(P̂_e) is the mutual information of the empirical P̂_e
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 17 / 52
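The two steps above (empirical pairwise mutual informations, then a max-weight spanning tree over them) can be sketched in a few lines. This is an illustrative reimplementation, not the thesis code; the function names and the Kruskal-style union-find are my own choices.

```python
import math
from collections import Counter
from itertools import combinations

def empirical_mi(samples, i, j):
    """Empirical mutual information I(P-hat_{i,j}) in nats from i.i.d. samples."""
    n = len(samples)
    cij = Counter((x[i], x[j]) for x in samples)
    ci = Counter(x[i] for x in samples)
    cj = Counter(x[j] for x in samples)
    mi = 0.0
    for (a, b), c in cij.items():
        # p(a,b) * log( p(a,b) / (p(a) p(b)) ), with counts c, ci[a], cj[b]
        mi += (c / n) * math.log(c * n / (ci[a] * cj[b]))
    return mi

def chow_liu_edges(samples, d):
    """Max-weight spanning tree over d variables with empirical-MI weights (Kruskal)."""
    weights = {(i, j): empirical_mi(samples, i, j)
               for i, j in combinations(range(d), 2)}
    parent = list(range(d))
    def find(u):                      # union-find with path compression
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    edges = []
    for (i, j) in sorted(weights, key=weights.get, reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:                  # adding (i, j) keeps the graph acyclic
            parent[ri] = rj
            edges.append((i, j))
    return edges
```

On a binary chain X1 – X2 – X3 with strong edge correlations, the recovered edge set matches the true chain once n is moderately large.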
![Page 45: Large-Deviations and Applications for Learning Tree ...Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of](https://reader035.vdocument.in/reader035/viewer/2022071006/5fc3b962dfd94c7e685681fd/html5/thumbnails/45.jpg)
ML Learning of Trees (Chow-Liu) II
[Figure: complete graph on X1, X2, X3, X4 with true MI edge weights I(P_e) ∈ {5, 6, 4, 1, 3, 2}, and the resulting max-weight spanning tree E_P (edge weights 5, 6, 4)]

True MI I(P_e) → Max-weight spanning tree E_P

[Figure: the same complete graph with empirical MI edge weights I(P̂_e) ∈ {4.9, 6.3, 3.5, 1.1, 3.6, 2.2} computed from x^n, and the resulting max-weight spanning tree E_ML ≠ E_P (edge weights 4.9, 6.3, 3.6)]

Empirical MI I(P̂_e) from x^n → Max-weight spanning tree E_ML ≠ E_P
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 18 / 52
![Page 49: Large-Deviations and Applications for Learning Tree ...Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of](https://reader035.vdocument.in/reader035/viewer/2022071006/5fc3b962dfd94c7e685681fd/html5/thumbnails/49.jpg)
Problem Statement
Define P_ML to be the ML tree-structured distribution with edge set E_ML; the error event is E_ML ≠ E_P

[Figure: the learned tree E_ML (empirical MI weights 4.9, 6.3, 3.6) next to the true tree E_P (true MI weights 5, 6, 4) on X1, X2, X3, X4]

Find the error exponent K_P:

K_P ≜ lim_{n→∞} −(1/n) log P^n(E_ML ≠ E_P),   so that P^n(E_ML ≠ E_P) ≐ exp(−n K_P)

Naïvely, what could we do to compute K_P?
I-projections onto all trees?
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 19 / 52
![Page 55: Large-Deviations and Applications for Learning Tree ...Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of](https://reader035.vdocument.in/reader035/viewer/2022071006/5fc3b962dfd94c7e685681fd/html5/thumbnails/55.jpg)
The Crossover Rate I
Correct Structure
True MI I(P_e):   6    5    4    3    2    1
Emp MI I(P̂_e): 6.2  5.6  4.5  2.8  2.2  1.1
[Figure: learned star with edge weights 6.2, 5.6, 4.5 — same structure as the true tree]

Incorrect Structure!
True MI I(P_e):   6    5    4    3    2    1
Emp MI I(P̂_e): 6.3  4.9  3.5  3.6  2.2  1.1
[Figure: learned tree with edge weights 4.9, 6.3, 3.6 — the empirical MI 3.6 of a non-edge overtakes the 3.5 of a true edge]

Structure Unaffected
True MI I(P_e):   6    5    4    3    2    1
Emp MI I(P̂_e): 5.5  5.6  4.5  3.0  2.2  1.1
[Figure: learned tree with edge weights 5.6, 5.5, 4.5 — two true-edge MIs swap order but the structure is unchanged]
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 20 / 52
![Page 58: Large-Deviations and Applications for Learning Tree ...Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of](https://reader035.vdocument.in/reader035/viewer/2022071006/5fc3b962dfd94c7e685681fd/html5/thumbnails/58.jpg)
The Crossover Rate I
[Figure: two disjoint node pairs, e and e′]

Given two node pairs e, e′ ∈ (V choose 2) with joint distribution P_{e,e′} ∈ P(X^4), s.t. I(P_e) > I(P_{e′}).

Consider the crossover event of the empirical MI:

I(P̂_e) ≤ I(P̂_{e′})

Def: Crossover Rate

J_{e,e′} ≜ lim_{n→∞} −(1/n) log P^n( I(P̂_e) ≤ I(P̂_{e′}) )
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 21 / 52
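The crossover event is easy to simulate. The sketch below is my own toy setup, not from the thesis: e and e′ are two independent binary pairs with I(P_e) > I(P_{e′}), and the Monte Carlo frequency of the event I(P̂_e) ≤ I(P̂_{e′}) decays roughly as exp(−n J_{e,e′}) as n grows.

```python
import math
import random
from collections import Counter

def mi2(pairs):
    """Empirical mutual information (in nats) of a list of (x, y) pairs."""
    n = len(pairs)
    cxy = Counter(pairs)
    cx = Counter(x for x, _ in pairs)
    cy = Counter(y for _, y in pairs)
    return sum((c / n) * math.log(c * n / (cx[a] * cy[b]))
               for (a, b), c in cxy.items())

def crossover_freq(n, trials, eps_e=0.15, eps_ep=0.3, rng=random):
    """Fraction of trials in which the empirical MIs cross: I(P-hat_e) <= I(P-hat_e')."""
    def noisy_pair(eps):
        # (X, Y): Y equals X flipped with probability eps
        x = rng.randint(0, 1)
        y = x if rng.random() > eps else 1 - x
        return (x, y)
    crossings = 0
    for _ in range(trials):
        sample_e = [noisy_pair(eps_e) for _ in range(n)]    # strong pair e
        sample_ep = [noisy_pair(eps_ep) for _ in range(n)]  # weak pair e'
        if mi2(sample_e) <= mi2(sample_ep):
            crossings += 1
    return crossings / trials
```

Plotting −(1/n) log of the estimated frequency against n gives a rough numerical read on the crossover rate J_{e,e′}.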
![Page 62: Large-Deviations and Applications for Learning Tree ...Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of](https://reader035.vdocument.in/reader035/viewer/2022071006/5fc3b962dfd94c7e685681fd/html5/thumbnails/62.jpg)
The Crossover Rate II
[Figure: disjoint pairs e and e′; I(P_e) > I(P_{e′}) crosses over to I(P̂_e) ≤ I(P̂_{e′})]

Proposition
The crossover rate for empirical mutual informations is

J_{e,e′} = min_{Q ∈ P(X^4)} { D(Q || P_{e,e′}) : I(Q_{e′}) = I(Q_e) }

[Figure: I-projection of P_{e,e′} onto the constraint set {Q : I(Q_e) = I(Q_{e′})} in P(X^4); the minimizer Q*_{e,e′} attains divergence D(Q*_{e,e′} || P_{e,e′})]

I-projection (Csiszár)
Sanov's Theorem
Exact but not intuitive
Non-convex
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 22 / 52
![Page 69: Large-Deviations and Applications for Learning Tree ...Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of](https://reader035.vdocument.in/reader035/viewer/2022071006/5fc3b962dfd94c7e685681fd/html5/thumbnails/69.jpg)
Error Exponent for Structure Learning I
How to calculate the error exponent K_P with the crossover rates J_{e,e′}?

Easy only in some very special cases

"Star" graph with I(Q_a) > I(Q_b) > 0
There is a unique crossover rate
The unique crossover rate is the error exponent

[Figure: a star graph; an edge pair carries joint distribution Q_a and a non-edge pair carries Q_b]

K_P = min_{R ∈ P(X^4)} { D(R || Q_{a,b}) : I(R_e) = I(R_{e′}) }
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 23 / 52
![Page 74: Large-Deviations and Applications for Learning Tree ...Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of](https://reader035.vdocument.in/reader035/viewer/2022071006/5fc3b962dfd94c7e685681fd/html5/thumbnails/74.jpg)
Error Exponent for Structure Learning II

"A large deviation is done in the least unlikely of all unlikely ways."
– "Large Deviations" by F. den Hollander

[Figure: the true tree $T_P \in \mathcal{T}$ with a non-edge $e' \notin \mathcal{E}_P$; the crossover with the dominant edge on $\mathrm{Path}(e'; \mathcal{E}_P)$ yields an incorrect tree $T'_P \neq T_P$.]

Theorem (Error Exponent)
$$K_P = \min_{e' \notin \mathcal{E}_P} \left( \min_{e \in \mathrm{Path}(e'; \mathcal{E}_P)} J_{e,e'} \right)$$
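The theorem's double minimization is straightforward to compute once the crossover rates are known. Below is a minimal sketch, not the thesis code: the four-node tree and the crossover-rate values `J` are made-up illustrative inputs.

```python
# Sketch of K_P = min over non-edges e' of min over e in Path(e'; E_P) of J_{e,e'}.
from collections import deque

def path_edges(tree_adj, u, v):
    """Return the edges on the unique path from u to v in a tree (BFS)."""
    parent = {u: None}
    q = deque([u])
    while q:
        x = q.popleft()
        for y in tree_adj[x]:
            if y not in parent:
                parent[y] = x
                q.append(y)
    edges = []
    while parent[v] is not None:
        edges.append(frozenset((v, parent[v])))
        v = parent[v]
    return edges

def error_exponent(nodes, tree_edges, J):
    """K_P as the double minimization over non-edges and their path edges."""
    adj = {n: [] for n in nodes}
    E_P = {frozenset(e) for e in tree_edges}
    for a, b in tree_edges:
        adj[a].append(b)
        adj[b].append(a)
    K_P = float("inf")
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            e_prime = frozenset((u, v))
            if e_prime in E_P:
                continue  # only a non-edge can replace a true edge in the MWST
            for e in path_edges(adj, u, v):
                K_P = min(K_P, J[(e, e_prime)])
    return K_P
```

Feeding in per-pair crossover rates (however obtained) returns the overall exponent, i.e., the rate of the dominant error event.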
Error Exponent for Structure Learning III

$$\mathbb{P}^n(\mathcal{E}_{\mathrm{ML}} \neq \mathcal{E}_P) \doteq \exp\left[ -n \min_{e' \notin \mathcal{E}_P} \left( \min_{e \in \mathrm{Path}(e'; \mathcal{E}_P)} J_{e,e'} \right) \right]$$

We have a finite-sample result too! See the thesis.

Proposition. The following statements are equivalent:
(a) The error probability decays exponentially, i.e., $K_P > 0$.
(b) $T_P$ is a connected tree, i.e., not a proper forest.

[Figure: sketch of $\mathbb{P}^n(\mathrm{err})$ versus the number of samples $n$: exponential decay when $K_P > 0$ (connected tree), no decay when $K_P = 0$ (proper forest).]
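Condition (b) is a purely graph-theoretic check. A minimal sketch of it, with made-up example graphs, decides whether the exponent can be positive at all:

```python
# K_P > 0 exactly when the true structure T_P is a single connected
# spanning tree, not a proper forest (per the proposition above).
def is_connected_tree(num_nodes, edges):
    """True iff the edge set forms one spanning tree on all the nodes."""
    if len(edges) != num_nodes - 1:
        return False  # a tree on d nodes has exactly d - 1 edges
    adj = {n: [] for n in range(num_nodes)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    seen, stack = {0}, [0]
    while stack:  # depth-first traversal from node 0
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return len(seen) == num_nodes
```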
Approximating The Crossover Rate I

Definition (very-noisy learning condition on $P_{e,e'}$): $P_e \approx P_{e'}$, so that $I(P_e) \approx I(P_{e'})$. [Figure: $P_e$ and $P_{e'}$ lie close together.]

Euclidean Information Theory [Borade & Zheng '08]:
$$P \approx Q \;\Rightarrow\; D(P \,\|\, Q) \approx \frac{1}{2} \sum_a \frac{(P(a) - Q(a))^2}{P(a)}$$

Definition: given $P_e = P_{i,j}$, the information density is
$$S_e(X_i; X_j) \triangleq \log \frac{P_{i,j}(X_i, X_j)}{P_i(X_i)\, P_j(X_j)}, \qquad \mathbb{E}[S_e] = I(P_e).$$
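Both displayed facts are easy to check numerically. The sketch below, with hand-picked distributions, verifies that for nearby pmfs the KL divergence is close to the quadratic approximation, and that the expected information density equals the mutual information:

```python
import math

def kl(P, Q):
    """KL divergence D(P||Q) in nats for pmfs given as lists."""
    return sum(p * math.log(p / q) for p, q in zip(P, Q) if p > 0)

def euclidean_approx(P, Q):
    """The Euclidean approximation (1/2) * sum_a (P(a)-Q(a))^2 / P(a)."""
    return 0.5 * sum((p - q) ** 2 / p for p, q in zip(P, Q))

def mutual_information(P_joint):
    """I(P_e) = E[S_e] for a joint pmf given as {(x_i, x_j): prob}."""
    Pi, Pj = {}, {}
    for (xi, xj), p in P_joint.items():
        Pi[xi] = Pi.get(xi, 0.0) + p
        Pj[xj] = Pj.get(xj, 0.0) + p
    # E[S_e] = sum_{x_i, x_j} P(x_i, x_j) log( P(x_i, x_j) / (P(x_i) P(x_j)) )
    return sum(p * math.log(p / (Pi[xi] * Pj[xj]))
               for (xi, xj), p in P_joint.items() if p > 0)
```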
Approximating The Crossover Rate II

Idea: convexify the optimization problem by linearizing the constraints. [Figure: the constraint set $\{Q : I(Q_e) = I(Q_{e'})\}$ near $P_{e,e'}$; the optimizer $Q^*_{e,e'}$ attains $D(Q^*_{e,e'} \,\|\, P_{e,e'})$, which the linearized problem approximates by $\frac{1}{2}\|Q^*_{e,e'} - P_{e,e'}\|^2$ over the linearized set $\mathcal{Q}(P_{e,e'})$.]

Theorem (Euclidean Approximation of Crossover Rate)
$$J_{e,e'} = \frac{(I(P_{e'}) - I(P_e))^2}{2\,\mathrm{Var}(S_{e'} - S_e)} = \frac{(\mathbb{E}[S_{e'} - S_e])^2}{2\,\mathrm{Var}(S_{e'} - S_e)} = \frac{1}{2}\,\mathrm{SNR}$$
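The approximate rate is a ratio of a squared mean to a variance of the information-density difference, so it can be evaluated directly on a small example. The sketch below uses a made-up binary three-node chain joint pmf (not a distribution from the thesis), with $e = (1,2)$ a true edge and $e' = (1,3)$ a non-edge, and checks that $\mathbb{E}[S_{e'} - S_e] = I(P_{e'}) - I(P_e)$:

```python
import itertools, math

# Hand-picked joint pmf P(x1, x2, x3) for a binary chain 1 - 2 - 3.
P = {}
for x1, x2, x3 in itertools.product((0, 1), repeat=3):
    p12 = 0.4 if x1 == x2 else 0.1           # strong true edge e = (1, 2)
    p3_given_2 = 0.7 if x3 == x2 else 0.3    # weaker edge (2, 3)
    P[(x1, x2, x3)] = p12 * p3_given_2

def marginal(P, idxs):
    out = {}
    for x, p in P.items():
        key = tuple(x[i] for i in idxs)
        out[key] = out.get(key, 0.0) + p
    return out

P1, P2, P3 = marginal(P, [0]), marginal(P, [1]), marginal(P, [2])
P12, P13 = marginal(P, [0, 1]), marginal(P, [0, 2])

def info_density(pair, pi, pj, xi, xj):
    # S_e(x_i; x_j) = log( P_{i,j}(x_i, x_j) / (P_i(x_i) P_j(x_j)) )
    return math.log(pair[(xi, xj)] / (pi[(xi,)] * pj[(xj,)]))

def mutual_info(pair, pi, pj):
    return sum(p * math.log(p / (pi[(a,)] * pj[(b,)]))
               for (a, b), p in pair.items() if p > 0)

# Mean and variance of S_{e'} - S_e under the full joint pmf.
diff = {x: info_density(P13, P1, P3, x[0], x[2])
           - info_density(P12, P1, P2, x[0], x[1]) for x in P}
mean = sum(P[x] * d for x, d in diff.items())
var = sum(P[x] * (d - mean) ** 2 for x, d in diff.items())
J_approx = mean ** 2 / (2.0 * var)  # the Euclidean approximation above
```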
The Crossover Rate

How good is the approximation? We consider a binary model.

[Figure: the true rate and the approximate rate $J_{e,e'}$ plotted against $I(P_e) - I(P_{e'})$.]

Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 28 / 52
Remarks for Learning Discrete Trees

Characterized precisely the error exponent for structure learning:
$$\mathbb{P}^n(\mathcal{E}_{\mathrm{ML}} \neq \mathcal{E}_P) \doteq \exp(-n K_P)$$

Analysis tools include the method of types (large deviations) and simple properties of trees.

Analyzed the very-noisy learning regime (Euclidean Information Theory), where learning is error-prone.

Extensions to learning the tree projection of non-tree distributions have also been studied.

V. Y. F. Tan, A. Anandkumar, L. Tong, and A. S. Willsky, "A Large-Deviation Analysis of the Maximum-Likelihood Learning of Markov Tree Structures," ISIT 2009; submitted to IEEE Trans. on Information Theory, revised Oct 2010.
Outline

1 Motivation, Background and Main Contributions
2 Learning Discrete Tree Models: Error Exponent Analysis
3 Learning Gaussian Tree Models: Extremal Structures
4 Learning High-Dimensional Forest-Structured Models
5 Related Topics and Conclusion
Setup

Jointly Gaussian distribution in the very-noisy learning regime:
$$p(x) \propto \exp\left( -\frac{1}{2} x^T \Sigma^{-1} x \right), \qquad x \in \mathbb{R}^d.$$

Zero mean, unit variances. Keeping the correlation coefficients on the edges fixed specifies the Gaussian graphical model, by Markovianity: $\rho_i$ is the correlation coefficient on edge $e_i$ for $i = 1, \ldots, d-1$. [Figure: a tree with edge correlations $\rho_1, \rho_2, \rho_3$.]

We compare the error exponents associated with different structures.
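Under the stated unit-variance assumption, fixing the edge correlations does specify the whole model: by the Markov property on a tree, the correlation between any two nodes is the product of the edge correlations along the path joining them. A sketch of building $\Sigma$ this way, with an illustrative chain and correlation values:

```python
from collections import deque

def tree_covariance(num_nodes, edges, rho):
    """Covariance of a unit-variance Gaussian tree model.

    edges: list of (u, v); rho: dict frozenset({u, v}) -> edge correlation.
    Off-diagonal entries are products of edge correlations along paths.
    """
    adj = {n: [] for n in range(num_nodes)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    Sigma = [[0.0] * num_nodes for _ in range(num_nodes)]
    for s in range(num_nodes):
        corr = {s: 1.0}        # unit variance on the diagonal
        q = deque([s])
        while q:               # BFS, multiplying correlations along each path
            x = q.popleft()
            for y in adj[x]:
                if y not in corr:
                    corr[y] = corr[x] * rho[frozenset((x, y))]
                    q.append(y)
        for t in range(num_nodes):
            Sigma[s][t] = corr[t]
    return Sigma
```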
The Gaussian Case: Extremal Tree Structures

Theorem (Extremal Structures). Under the very-noisy assumption,
(a) Star graphs are the hardest to learn (smallest approximate error exponent).
(b) Markov chains are the easiest to learn (largest approximate error exponent).

[Figure: a star with edge correlations $\rho_1, \ldots, \rho_4$ and a chain with permuted edge correlations $\rho_{\pi(1)}, \ldots, \rho_{\pi(4)}$, where $\pi$ is a permutation; a sketch of $\mathbb{P}^n(\mathrm{err})$ versus $n$ shows the chain decaying fastest, the star slowest, and any other tree in between.]
Numerical Simulations

Chain, star and hybrid structures for $d = 10$, with correlations $\rho_i = 0.1 \times i$ for $i \in [1:9]$.

[Figure: the three structures, and two plots against the number of samples $n$: the simulated probability of error $\mathbb{P}(\mathrm{error})$ and the simulated error exponent $-\frac{1}{n} \log \mathbb{P}(\mathrm{error})$ for the chain, hybrid and star.]
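The experiment behind these plots fits a maximum-weight spanning tree to pairwise Gaussian mutual informations $I = -\frac{1}{2}\log(1 - \rho^2)$, i.e., the Chow-Liu / ML structure estimate. A sketch of that step (not the thesis code; sizes and correlations in the test are illustrative) takes a correlation matrix, empirical or exact, and returns the learned edge set:

```python
import math

def chow_liu_from_corr(R):
    """Max-weight spanning tree (Kruskal) with weights -0.5*log(1 - R[i][j]^2)."""
    d = len(R)
    # Sort candidate edges by decreasing Gaussian mutual information.
    pairs = sorted(((i, j) for i in range(d) for j in range(i + 1, d)),
                   key=lambda e: 0.5 * math.log(1.0 - R[e[0]][e[1]] ** 2))
    parent = list(range(d))
    def find(x):                      # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    edges = set()
    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:                  # greedily add edges that join components
            parent[ri] = rj
            edges.add(frozenset((i, j)))
    return edges
```

Feeding in the exact correlation matrix (the infinite-sample limit) recovers the true tree; replacing it with empirical correlations from $n$ samples and repeating over trials estimates $\mathbb{P}(\mathrm{error})$ as in the plots.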
Proof Idea and Intuition
Correlation decay
[Diagram: a star and a Markov chain on d nodes, with non-edges e′ ∉ EP indicated between distance-two node pairs]
Number of distance-two node pairs in:
Star is O(d^2)
Markov chain is O(d)
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 34 / 52
![Page 123: Large-Deviations and Applications for Learning Tree ...Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of](https://reader035.vdocument.in/reader035/viewer/2022071006/5fc3b962dfd94c7e685681fd/html5/thumbnails/123.jpg)
Proof Idea and Intuition
Correlation decay
tt
t
t t
t
t
t
tt
t tt
t
@@@
@@@
@@@
e′ /∈ EPp p p p p p p p p p p p p p p p p p p p p p p p p p p p p p p p p p p p p p p p p p p p p p p p p pe′ /∈ EP
p p p p p p p p p p p p p p p p p p p p p p p p p uuuu
uO(d2)p p p p p p p p p p p u u u u u
O(d)
p p p p p p p p p pp p p p p p p p p p
Number of distance-two node pairs in:
Star is O(d2)
Markov chain is O(d)
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 34 / 52
![Page 124: Large-Deviations and Applications for Learning Tree ...Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of](https://reader035.vdocument.in/reader035/viewer/2022071006/5fc3b962dfd94c7e685681fd/html5/thumbnails/124.jpg)
Concluding Remarks for Learning Gaussian Trees
Gaussianity allows us to perform further analysis to find the extremal structures for learning

Allows us to derive a data-processing inequality for crossover rates

Universal result – not (strongly) dependent on the choice of correlations ρ = (ρ1, . . . , ρd−1)

V. Y. F. Tan, A. Anandkumar, A. S. Willsky, “Learning Gaussian Tree Models: Analysis of Error Exponents and Extremal Structures,” Allerton 2009; IEEE Trans. on Signal Processing, May 2010.
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 35 / 52
![Page 127: Large-Deviations and Applications for Learning Tree ...Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of](https://reader035.vdocument.in/reader035/viewer/2022071006/5fc3b962dfd94c7e685681fd/html5/thumbnails/127.jpg)
Outline
1 Motivation, Background and Main Contributions
2 Learning Discrete Tree Models: Error Exponent Analysis

3 Learning Gaussian Tree Models: Extremal Structures
4 Learning High-Dimensional Forest-Structured Models
5 Related Topics and Conclusion
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 36 / 52
![Page 128: Large-Deviations and Applications for Learning Tree ...Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of](https://reader035.vdocument.in/reader035/viewer/2022071006/5fc3b962dfd94c7e685681fd/html5/thumbnails/128.jpg)
Motivation: Prevent Overfitting
Chow-Liu algorithm tells us how to learn trees

Suppose we are in the high-dimensional setting where

Samples n ≪ Variables d

Learning forest-structured graphical models may reduce overfitting vis-à-vis trees [Liu, Lafferty and Wasserman, 2010]

Extend Liu et al.’s work to discrete models and improve the convergence results

Strategy: Remove “weak” edges

[Diagram: a tree on X1, X2, X3, X4 ⇒ Reduce Num Params ⇒ the same graph with a weak edge removed, leaving a forest]
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 37 / 52
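To make the “Reduce Num Params” step concrete, here is a back-of-the-envelope count (an illustration, not from the slides): in the Chow-Liu factorization over an alphabet of size m, each node marginal contributes m − 1 free parameters and each edge an extra (m − 1)^2, so every pruned edge saves (m − 1)^2 parameters.

```python
# Illustrative count (not from the thesis): free parameters of a
# forest-structured distribution over d variables with alphabet size m.
def num_params(d, num_edges, m):
    # d marginals with (m - 1) free entries each, plus one
    # (m - 1)^2 interaction block per edge of the forest.
    return d * (m - 1) + num_edges * (m - 1) ** 2

d, m = 10, 4
tree = num_params(d, d - 1, m)   # spanning tree: d - 1 = 9 edges
forest = num_params(d, 3, m)     # forest keeping only 3 strong edges
print(tree, forest)  # 111 57
```

Pruning six weak edges here nearly halves the number of parameters to estimate, which is the sense in which forests overfit less than trees.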
![Page 133: Large-Deviations and Applications for Learning Tree ...Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of](https://reader035.vdocument.in/reader035/viewer/2022071006/5fc3b962dfd94c7e685681fd/html5/thumbnails/133.jpg)
Main Contributions
Propose CLThres, a thresholding algorithm, for consistently learning forest-structured models

Prove convergence rates (“moderate deviations”) for a fixed discrete graphical model P ∈ P(X^d)

Prove achievable scaling laws on (n, d, k) (k is the number of edges) for consistent recovery in high dimensions. Roughly speaking,

n ≳ log^{1+δ}(d − k)

is achievable
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 38 / 52
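To get a feel for how mild this sample requirement is, here is an illustrative evaluation (not from the slides; the constant in the bound is ignored and δ is the small slack exponent in the scaling law):

```python
import math

# Illustrative: the sample-size scale n ~ log^(1+delta)(d - k) from the
# achievability result, evaluated for a huge number of non-edge nodes.
def sample_scale(d_minus_k, delta):
    return math.log(d_minus_k) ** (1 + delta)

print(round(sample_scale(10**6, 0.1), 1))  # about 18
```

Even with a million isolated nodes, the required number of samples grows only polylogarithmically in d − k.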
![Page 136: Large-Deviations and Applications for Learning Tree ...Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of](https://reader035.vdocument.in/reader035/viewer/2022071006/5fc3b962dfd94c7e685681fd/html5/thumbnails/136.jpg)
Main Difficulty
Unknown minimum mutual information Imin in the forest model
Markov order estimation [Merhav, Gutman, Ziv 1989]
If Imin were known, we could easily use it as a threshold, i.e.,

if I(Pi,j) < Imin, remove (i, j)

How to deal with the classic tradeoff between over- and underestimation errors?
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 39 / 52
![Page 140: Large-Deviations and Applications for Learning Tree ...Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of](https://reader035.vdocument.in/reader035/viewer/2022071006/5fc3b962dfd94c7e685681fd/html5/thumbnails/140.jpg)
The CLThres Algorithm
Compute the empirical mutual informations I(P̂i,j) for all (i, j) ∈ V × V

Max-weight spanning tree:

Êd−1 := argmax_{E : tree} Σ_{(i,j)∈E} I(P̂i,j)

Estimate the number of edges given a threshold εn:

k̂n := |{(i, j) ∈ Êd−1 : I(P̂i,j) ≥ εn}|

Output the forest with the top k̂n edges

Computational Complexity = O((n + log d) d^2)
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 40 / 52
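The four steps above can be sketched in code. A minimal, hypothetical implementation for binary variables (pure Python, Kruskal’s algorithm for the max-weight spanning tree; not the thesis code, and the toy model at the bottom is invented for illustration):

```python
import math
import random
from collections import Counter
from itertools import combinations

def empirical_mi(samples, i, j):
    """Empirical mutual information I(P^_{i,j}) in nats."""
    n = len(samples)
    pij = Counter((x[i], x[j]) for x in samples)
    pi = Counter(x[i] for x in samples)
    pj = Counter(x[j] for x in samples)
    return sum((c / n) * math.log(c * n / (pi[a] * pj[b]))
               for (a, b), c in pij.items())

def clthres(samples, d, eps_n):
    # Step 1: empirical MI for every node pair
    w = {(i, j): empirical_mi(samples, i, j)
         for i, j in combinations(range(d), 2)}
    # Step 2: max-weight spanning tree via Kruskal with union-find
    parent = list(range(d))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    tree = []
    for (i, j), _ in sorted(w.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    # Steps 3-4: keep only edges whose empirical MI clears the threshold
    return [e for e in tree if w[e] >= eps_n]

# Toy forest on d = 4 binary variables: one edge (0, 1), nodes 2 and 3 isolated
random.seed(0)
def draw():
    x0 = random.randint(0, 1)
    x1 = x0 if random.random() < 0.9 else 1 - x0   # strong edge
    return (x0, x1, random.randint(0, 1), random.randint(0, 1))

n = 2000
data = [draw() for _ in range(n)]
print(clthres(data, 4, n ** -0.5))  # recovers the single true edge: [(0, 1)]
```

The spanning tree necessarily connects the isolated nodes too, but their empirical mutual informations are O(1/n) and fall below εn = n^{−1/2}, so the thresholding step prunes them away.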
![Page 144: Large-Deviations and Applications for Learning Tree ...Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of](https://reader035.vdocument.in/reader035/viewer/2022071006/5fc3b962dfd94c7e685681fd/html5/thumbnails/144.jpg)
A Convergence Result for CLThres
Assume that P ∈ P(X^d) is a fixed forest-structured graphical model (d does not grow with n)

Theorem (“Moderate Deviations”)

Assume that the sequence {εn}_{n=1}^∞ satisfies

lim_{n→∞} εn = 0,   lim_{n→∞} nεn / log n = ∞   (εn := n^{−1/2} works)

Then

limsup_{n→∞} (1/(nεn)) log P(Êk̂n ≠ EP) ≤ −1,   i.e., P(Êk̂n ≠ EP) ≈ exp(−nεn)

Also have a “liminf” lower bound
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 41 / 52
![Page 149: Large-Deviations and Applications for Learning Tree ...Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of](https://reader035.vdocument.in/reader035/viewer/2022071006/5fc3b962dfd94c7e685681fd/html5/thumbnails/149.jpg)
Remarks: A Convergence Result for CLThres
The Chow-Liu phase is consistent with an exponential rate of convergence

The sequence can be taken to be εn := n^{−β} for β ∈ (0, 1)

For all n sufficiently large,

εn < Imin

which implies no underestimation asymptotically

Note that for two independent random variables Xi and Xj with product pmf Qi × Qj,

std(I(P̂i,j)) = Θ(1/n)

Since the sequence εn = ω(log n / n) decays more slowly than std(I(P̂i,j)), there is no overestimation asymptotically
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 42 / 52
![Page 153: Large-Deviations and Applications for Learning Tree ...Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of](https://reader035.vdocument.in/reader035/viewer/2022071006/5fc3b962dfd94c7e685681fd/html5/thumbnails/153.jpg)
Pruning Away Weak Empirical Mutual Informations
[Figure: schematic of the relevant quantities versus n — the unknown Imin is a constant; the empirical mutual information I(P̂i,j) of a non-edge decays like 1/n; the threshold εn = ω(log n / n) ∩ o(1) eventually lies between them]

Asymptotically, εn will be smaller than Imin and larger than I(P̂i,j) with high probability
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 43 / 52
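A quick numerical check of this sandwich (illustrative values, not from the slides; Imin = 0.3 is an arbitrary stand-in for the unknown minimum mutual information):

```python
# Illustrative check that eps_n = n^(-1/2) eventually falls between the
# O(1/n) empirical MI of a non-edge and a constant I_min (here 0.3).
I_min = 0.3  # arbitrary constant, stands in for the unknown minimum MI
for n in (10**2, 10**4, 10**6):
    non_edge_mi = 1.0 / n     # scale of a non-edge's empirical MI
    eps_n = n ** -0.5         # threshold sequence
    print(n, non_edge_mi < eps_n < I_min)
```

For every n shown the ordering 1/n < εn < Imin holds, which is exactly why thresholding at εn avoids both over- and underestimation asymptotically.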
![Page 156: Large-Deviations and Applications for Learning Tree ...Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of](https://reader035.vdocument.in/reader035/viewer/2022071006/5fc3b962dfd94c7e685681fd/html5/thumbnails/156.jpg)
Proof Idea
Based fully on the method of types

Estimate the Chow-Liu learning error

Estimate the underestimation error:

P(k̂n < k) ≐ exp(−nLP)

Estimate the overestimation error, which decays subexponentially but faster than any polynomial:

P(k̂n > k) ≈ exp(−nεn)

The upper bound has no dependence on P (there exists a duality gap)

Additional Technique: Euclidean Information Theory
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 44 / 52
![Page 161: Large-Deviations and Applications for Learning Tree ...Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of](https://reader035.vdocument.in/reader035/viewer/2022071006/5fc3b962dfd94c7e685681fd/html5/thumbnails/161.jpg)
High-Dimensional Learning
Consider a sequence of structure learning problems indexed by the number of samples n
For each problem, we have data xⁿ = {x1, . . . , xn}
Each sample xi ∈ X^d is drawn independently from a forest-structured model with d nodes and k edges
Sequence of tuples {(n, dn, kn)}∞n=1
Assumptions
(A1) Iinf := inf_{d ∈ N} min_{(i,j) ∈ E_P} I(Pi,j) > 0 (edge mutual informations uniformly bounded away from zero)
(A2) κ := inf_{d ∈ N} min_{(xi,xj) ∈ X²} Pi,j(xi, xj) > 0 (joint pmf entries uniformly bounded away from zero)
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 45 / 52
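As a concrete illustration of assumptions (A1) and (A2), a minimal sketch that computes the two per-edge quantities for a single hypothetical binary edge distribution (the pmf values below are placeholders, not from the thesis):

```python
import numpy as np

def mutual_information(P):
    """I(P_ij) in nats for a joint pmf P over X x X."""
    Pi = P.sum(axis=1, keepdims=True)   # marginal of node i
    Pj = P.sum(axis=0, keepdims=True)   # marginal of node j
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / (Pi * Pj)[mask])))

# Hypothetical pairwise pmf of one edge in the forest (binary alphabet)
P_edge = np.array([[0.4, 0.1],
                   [0.1, 0.4]])

I_edge = mutual_information(P_edge)  # one term in the min defining I_inf (A1)
kappa = P_edge.min()                 # one term in the min defining kappa (A2)
```

Requiring both infima to be strictly positive uniformly in d rules out edges that become undetectable, or joint states that become unobservable, as the dimension grows.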
![Page 166: Large-Deviations and Applications for Learning Tree ...Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of](https://reader035.vdocument.in/reader035/viewer/2022071006/5fc3b962dfd94c7e685681fd/html5/thumbnails/166.jpg)
An Achievable Scaling Law for CLThres
Theorem (Sufficient Conditions)
Assume (A1) and (A2). Fix δ > 0. There exist constants C1, C2 > 0 such that if
n > max{ C1 log d, C2 log k, (2 log(d − k))^{1+δ} }
then the error probability of structure learning
P(error) → 0
as (n, dn, kn) → ∞
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 46 / 52
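To get a feel for the theorem, a small numeric sketch; the constants C1, C2 and the choice δ = 0.5 below are illustrative placeholders, not the constants from the thesis:

```python
import math

def n_sufficient(d, k, delta=0.5, C1=10.0, C2=10.0):
    """Sample-size threshold from the sufficient condition:
    n > max{C1 log d, C2 log k, (2 log(d - k))^(1 + delta)}.
    C1, C2 and delta are hypothetical values chosen for illustration."""
    terms = [C1 * math.log(d), C2 * math.log(k)]
    if d - k > 1:
        terms.append((2.0 * math.log(d - k)) ** (1.0 + delta))
    return max(terms)

# n need only scale logarithmically (up to the 1 + delta power) in d and k:
ten_thousand_nodes = n_sufficient(d=10**4, k=10**3)   # ~92 samples
million_nodes = n_sufficient(d=10**6, k=10**3)        # ~145 samples
```

The point of the sketch is the growth rate: multiplying d by a factor of 100 raises the threshold only modestly, which is what makes the high-dimensional regime d ≫ n feasible.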
![Page 169: Large-Deviations and Applications for Learning Tree ...Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of](https://reader035.vdocument.in/reader035/viewer/2022071006/5fc3b962dfd94c7e685681fd/html5/thumbnails/169.jpg)
Remarks on the Achievable Scaling Law for CLThres
If the model parameters (d, k) grow with n such that
d is subexponential in n
k is subexponential in n
d − k is subexponential in n
then structure recovery is asymptotically possible
d can grow much faster than n
Proof uses:
1 The previous fixed-d result
2 The exponents in the limsup upper bound do not vanish as the problem size (n, dn, kn) → ∞
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 47 / 52
![Page 172: Large-Deviations and Applications for Learning Tree ...Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of](https://reader035.vdocument.in/reader035/viewer/2022071006/5fc3b962dfd94c7e685681fd/html5/thumbnails/172.jpg)
A Simple Strong Converse Result
Proposition (A Necessary Condition)
Assume forests with d nodes are chosen uniformly at random. Fix η > 0. Then if
n < (1 − η) log d / log |X|
the error probability of structure learning
P(error) → 1
as (n, dn) → ∞ (independent of kn)
Ω(log d) samples are necessary for successful recovery
This lower bound is independent of the model parameters
The dependence on the number of edges kn can be made more explicit
Close to the sufficient condition
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 48 / 52
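Plugging numbers into the converse shows how mild the necessary condition is; the values of η and the alphabet size below are placeholders for illustration:

```python
import math

def n_necessary(d, alphabet_size=2, eta=0.1):
    """Converse threshold: with fewer than (1 - eta) log d / log |X|
    samples, P(error) -> 1 for forests chosen uniformly at random."""
    return (1.0 - eta) * math.log(d) / math.log(alphabet_size)

# Even a million binary variables force only ~18 samples:
bound = n_necessary(10**6, alphabet_size=2, eta=0.1)
```

Comparing against the sufficient condition, both thresholds are logarithmic in d, which is why the two are close.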
![Page 174: Large-Deviations and Applications for Learning Tree ...Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of](https://reader035.vdocument.in/reader035/viewer/2022071006/5fc3b962dfd94c7e685681fd/html5/thumbnails/174.jpg)
Concluding Remarks for Learning Forests
Proposed a simple extension of the Chow-Liu MWST algorithm to learn forests consistently
Error rates in the form of a “moderate deviations” result
Scaling laws on (n, d, k) for structural consistency in high dimensions
Extensions:
Risk consistency has also been analyzed (see thesis for details)
R(P∗) = Op( d log d / n^{1−γ} )
Need to find the right balance between over- and underestimation in the finite-sample case
V. Y. F. Tan, A. Anandkumar and A. S. Willsky, “Learning High-Dimensional Markov Forest Distributions: Analysis of Error Rates,” Allerton 2010; submitted to JMLR.
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 49 / 52
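A minimal sketch of the CLThres idea: run Chow-Liu's MWST on empirical mutual informations (Kruskal's algorithm is assumed here for the MWST step), then prune weak edges. The fixed threshold eps below is an illustrative stand-in for the data-dependent threshold εn from the thesis:

```python
import numpy as np
from itertools import combinations

def empirical_mi(x, y):
    """Empirical mutual information (nats) between two discrete samples."""
    n = len(x)
    counts = {}
    for pair in zip(x.tolist(), y.tolist()):
        counts[pair] = counts.get(pair, 0) + 1
    px, py = {}, {}
    for (a, b), c in counts.items():
        px[a] = px.get(a, 0) + c
        py[b] = py.get(b, 0) + c
    return sum((c / n) * np.log(c * n / (px[a] * py[b]))
               for (a, b), c in counts.items())

def learn_forest(X, eps):
    """Chow-Liu MWST on empirical MIs (Kruskal), then prune weak edges."""
    n, d = X.shape
    weights = sorted(((empirical_mi(X[:, i], X[:, j]), i, j)
                      for i, j in combinations(range(d), 2)), reverse=True)
    parent = list(range(d))
    def find(u):                       # union-find with path halving
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    tree = []
    for w, i, j in weights:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j, w))
    # thresholding step: keep only edges with empirical MI above eps
    return [(i, j) for i, j, w in tree if w > eps]

# Toy data: x1 is a noisy copy of x0, x2 is independent noise
rng = np.random.default_rng(0)
x0 = rng.integers(0, 2, 2000)
x1 = (x0 + (rng.random(2000) < 0.1).astype(int)) % 2
x2 = rng.integers(0, 2, 2000)
edges = learn_forest(np.column_stack([x0, x1, x2]), eps=0.05)
```

With a data-dependent εn shrinking at the right rate, this pruning step is exactly what balances over- against underestimation of the number of edges kn.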
![Page 180: Large-Deviations and Applications for Learning Tree ...Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of](https://reader035.vdocument.in/reader035/viewer/2022071006/5fc3b962dfd94c7e685681fd/html5/thumbnails/180.jpg)
Outline
1 Motivation, Background and Main Contributions
2 Learning Discrete Tree Models: Error Exponent Analysis
3 Learning Gaussian Tree Models: Extremal Structures
4 Learning High-Dimensional Forest-Structured Models
5 Related Topics and Conclusion
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 50 / 52
![Page 181: Large-Deviations and Applications for Learning Tree ...Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of](https://reader035.vdocument.in/reader035/viewer/2022071006/5fc3b962dfd94c7e685681fd/html5/thumbnails/181.jpg)
Beyond Trees
Structure Learning in Graphical Models Beyond Trees
Techniques extend to learning other classes of graphical models
[Figure: extremal tree structures for learning — a star and a chain]
Forests, Latent Trees, Random Graphs
Anima Anandkumar (UCI), “Trees, Latent Trees & Beyond,” 11/08/2010
Learn latent trees, where only a subset of the nodes is observed
If the original graph is drawn from the Erdős–Rényi ensemble G(n, c/n), we can also provide guarantees for structure learning
Utilize the fact that the model is locally tree-like
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 51 / 52
![Page 184: Large-Deviations and Applications for Learning Tree ...Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of](https://reader035.vdocument.in/reader035/viewer/2022071006/5fc3b962dfd94c7e685681fd/html5/thumbnails/184.jpg)
Conclusions
Graphical models provide a powerful and parsimonious representation of high-dimensional data
(Ch. 3) Provided a large-deviation analysis of ML learning of tree-structured distributions
(Ch. 4) Identified extremal structures for tree-structured Gaussian graphical models
(Ch. 5) Extended the analysis to forest-structured graphical models
Derived scaling laws on the number of variables, edges and samples for consistent learning in high dimensions
(Ch. 6) Also proposed algorithms for learning tree models for hypothesis testing
Vincent Tan (MIT) Large-Deviations for Learning Trees Thesis Defense 52 / 52