Parameter and Structure Learning
Dhruv Batra,10-708 Recitation
10/02/2008
Overview

• Parameter Learning
  – Classical view, estimation task
  – Estimators, properties of estimators
  – MLE, why MLE?
  – MLE in BNs, decomposability

• Structure Learning
  – Structure score, decomposable scores
  – TAN, Chow-Liu
  – HW2 implementation steps
Note

• Plagiarism alert
  – Some slides taken from others
  – Credits/references at the end
Parameter Learning

• Classical statistics view / point estimation
  – Parameters are unknown but not random
  – Point estimation = "find the right parameter"
  – Estimate parameters (or functions of parameters) of the model from data

• Estimators
  – Any statistic
  – A function of the data alone

• Say you have a dataset
  – You need to estimate the mean
  – Is the constant 5 an estimator?
  – What would you do?
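The "is 5 an estimator?" question can be made concrete. A minimal sketch (illustrative names, not from the slides): a constant is technically an estimator, because it is a function of the data alone, one that happens to ignore the data; the sample mean is the estimator you would actually use.

```python
# Two "estimators" for the mean of a dataset.
# A constant qualifies as an estimator (a function of the data that
# ignores it), but it has no reason to be close to the true mean.

def constant_estimator(data):
    """Always returns 5, regardless of the data -- still an estimator!"""
    return 5.0

def sample_mean(data):
    """The usual estimator: the average of the sample points."""
    return sum(data) / len(data)

data = [1.0, 2.0, 3.0, 6.0]
print(constant_estimator(data))  # 5.0
print(sample_mean(data))         # 3.0
```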
Properties of estimators

• Since an estimator produces an estimate from the sample points (x1, x2, …, xn), the estimate is a function of the sample points.

• The sample points are random variables, so the estimate is itself a random variable and has a probability distribution.

• We want an estimator to have several desirable properties:
  – Consistency
  – Unbiasedness
  – Minimum variance

• In general, it is not possible for an estimator to have all of these properties at once.
So why MLE?

• MLE has some nice properties
  – MLEs are often simple and easy to compute.
  – MLEs have asymptotic optimality properties (consistency and efficiency).
  – MLEs are invariant under reparameterization.
  – and more...
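The simplicity and invariance points can be illustrated with the Bernoulli case, a minimal sketch (illustrative data, not from the slides): the log likelihood h·log θ + t·log(1−θ) is maximized at the fraction of heads, and the MLE of any reparameterization, such as the odds, is just the transform of that estimate.

```python
# MLE for a Bernoulli parameter: the closed form is the fraction of 1s.

def bernoulli_mle(flips):
    """MLE of P(heads): the fraction of 1s in the data."""
    return sum(flips) / len(flips)

flips = [1, 0, 1, 1, 0, 1, 1, 0]  # 5 heads, 3 tails
theta_hat = bernoulli_mle(flips)
print(theta_hat)  # 0.625

# Invariance under reparameterization: the MLE of the odds
# theta / (1 - theta) is simply the transform of theta_hat.
odds_hat = theta_hat / (1 - theta_hat)
```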
Let’s try
Back to BNs

• MLE in BN
  – Data: x(1), …, x(m)
  – Model: structure (DAG G) and parameters (CPTs P(Xi | PaXi))
  – Learn the parameters from the data
10-708 – Carlos Guestrin 2006-2008
Learning the CPTs

• Data: x(1), …, x(m)
• For each discrete variable Xi, estimate its CPT from the data
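The per-variable estimation step can be sketched by counting (a minimal illustration with hypothetical variable names, not the HW2 code): the MLE of each CPT entry is Count(xi, pa) / Count(pa).

```python
# MLE CPT estimation by counting. Each sample is a dict mapping
# variable names to values; the CPT maps (parent assignment, child
# value) pairs to conditional probabilities.
from collections import Counter

def mle_cpt(data, child, parents):
    """Estimate P(child | parents) from a list of samples (dicts)."""
    joint = Counter()   # counts of (parent assignment, child value)
    marg = Counter()    # counts of the parent assignment alone
    for sample in data:
        pa = tuple(sample[p] for p in parents)
        joint[(pa, sample[child])] += 1
        marg[pa] += 1
    return {(pa, x): c / marg[pa] for (pa, x), c in joint.items()}

data = [
    {"Flu": 1, "Sinus": 1},
    {"Flu": 1, "Sinus": 1},
    {"Flu": 1, "Sinus": 0},
    {"Flu": 0, "Sinus": 0},
]
cpt = mle_cpt(data, "Sinus", ["Flu"])
print(cpt[((1,), 1)])  # 2/3: of the 3 Flu=1 samples, 2 have Sinus=1
```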
Example

• Learning MLE parameters
Maximum likelihood estimation (MLE) of BN parameters – example

• Given the structure, the log likelihood of the data:

(BN in figure: Flu and Allergy are parents of Sinus; Sinus is the parent of Nose)
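The slide's equation was lost in extraction; a reconstruction consistent with this network (using f, a, s, n for the values of Flu, Allergy, Sinus, Nose in sample m) decomposes the log likelihood over the CPTs:

```latex
\log P(D \mid \theta)
  = \sum_{m=1}^{M} \Big[ \log \theta_{F}\big(f^{(m)}\big)
  + \log \theta_{A}\big(a^{(m)}\big)
  + \log \theta_{S \mid F,A}\big(s^{(m)} \mid f^{(m)}, a^{(m)}\big)
  + \log \theta_{N \mid S}\big(n^{(m)} \mid s^{(m)}\big) \Big]
```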
Decomposability

• Likelihood decomposition

• Local likelihood function

• What's the difference? Global parameter independence!
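In the general form (reconstructed here, since the slide's equations did not survive extraction), the likelihood factors into one local likelihood per variable, each depending only on that variable's CPT:

```latex
L(\theta : D) \;=\; \prod_{i} L_i\big(\theta_{X_i \mid \mathrm{Pa}_{X_i}} : D\big),
\qquad
L_i\big(\theta_{X_i \mid \mathrm{Pa}_{X_i}} : D\big)
  \;=\; \prod_{m=1}^{M} P\big(x_i^{(m)} \mid \mathrm{pa}_{X_i}^{(m)} : \theta_{X_i \mid \mathrm{Pa}_{X_i}}\big)
```

Under global parameter independence, each local likelihood can then be maximized separately.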
Taking derivatives of MLE of BN parameters – General case
Structure Learning

• Constraint based
  – Check independences, learn a PDAG
  – HW1

• Score based
  – Assign a score to each possible structure
  – Maximize the score
Score Based

• What's a good score function?

• How about our old friend, log likelihood?

• So here's our score function:
Score Based

• [Defn]: Decomposable scores

• Why do we care about decomposable scores?

• The log-likelihood-based score decomposes!
  – but it needs regularization
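Concretely (a reconstruction from the standard course material, since the slide's equations were lost in extraction), the maximized log-likelihood score decomposes into per-variable empirical mutual information and entropy terms, and one standard regularized score, BIC, penalizes the number of parameters:

```latex
\mathrm{score}(G : D) \;=\; \ell(\hat{\theta}_G : D)
  \;=\; M \sum_i \hat{I}\big(X_i ;\, \mathrm{Pa}^{G}_{X_i}\big)
  \;-\; M \sum_i \hat{H}(X_i)
```

```latex
\mathrm{score}_{\mathrm{BIC}}(G : D) \;=\; \ell(\hat{\theta}_G : D)
  \;-\; \frac{\log M}{2}\,\mathrm{Dim}(G)
```

Without the penalty, adding edges can only increase the mutual information term, so the unregularized score always prefers denser graphs.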
Score Based

• Chow-Liu
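A hedged sketch of the Chow-Liu skeleton (illustrative code, not the HW2 solution): weight each pair of variables by empirical mutual information, then take a maximum-weight spanning tree; directing the edges away from an arbitrary root then gives the tree-structured BN.

```python
# Chow-Liu: pairwise empirical mutual information + maximum-weight
# spanning tree (Kruskal's algorithm with union-find).
from collections import Counter
from itertools import combinations
from math import log

def mutual_info(data, i, j):
    """Empirical mutual information between columns i and j of data."""
    n = len(data)
    pij = Counter((row[i], row[j]) for row in data)
    pi = Counter(row[i] for row in data)
    pj = Counter(row[j] for row in data)
    return sum((c / n) * log((c / n) / ((pi[a] / n) * (pj[b] / n)))
               for (a, b), c in pij.items())

def chow_liu_tree(data, num_vars):
    """Return the edges of the maximum-MI spanning tree (undirected)."""
    edges = sorted(((mutual_info(data, i, j), i, j)
                    for i, j in combinations(range(num_vars), 2)),
                   reverse=True)
    parent = list(range(num_vars))  # union-find forest
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # adding (i, j) creates no cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Toy data: X0 and X1 perfectly correlated; X2 is independent noise.
data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
print(chow_liu_tree(data, 3))  # the high-MI edge (0, 1) is always kept
```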
Score Based

• Chow-Liu modification for TAN (HW2)
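For TAN, the edge weights become conditional mutual information given the class, I(Xi; Xj | C), and every attribute additionally gets the class as a parent. A hedged sketch of the weight computation (illustrative code, not the HW2 solution):

```python
# Conditional mutual information I(Xi; Xj | C): a P(c)-weighted average
# of the within-class mutual information, estimated from counts.
from collections import Counter
from math import log

def cond_mutual_info(data, labels, i, j):
    """Empirical I(Xi; Xj | C) for attribute columns i, j and class labels."""
    n = len(data)
    total = 0.0
    for c in set(labels):
        rows = [row for row, y in zip(data, labels) if y == c]
        nc = len(rows)
        pij = Counter((r[i], r[j]) for r in rows)
        pi = Counter(r[i] for r in rows)
        pj = Counter(r[j] for r in rows)
        # P(c) times the mutual information computed within class c
        total += (nc / n) * sum(
            (cnt / nc) * log((cnt / nc) / ((pi[a] / nc) * (pj[b] / nc)))
            for (a, b), cnt in pij.items())
    return total

# Toy data: within each class, X0 determines X1, so I(X0; X1 | C) = log 2.
data = [(0, 0), (1, 1), (0, 0), (1, 1)]
labels = [0, 0, 1, 1]
print(cond_mutual_info(data, labels, 0, 1))
```

The rest of the algorithm is unchanged: run the same maximum-spanning-tree step over these conditional weights.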
Slide and other credits

• Zoubin Ghahramani, guest lectures in 10-702
• Andrew Moore tutorial
  – http://www.autonlab.org/tutorials/mle.html
• http://cnx.org/content/m11446/latest/
• Lecture slides by Carlos Guestrin