Graphical Models - Learning
Wolfram Burgard, Luc De Raedt, Kristian Kersting, Bernhard Nebel
Albert-Ludwigs University Freiburg, Germany
[Figure: the ALARM network, a Bayesian network for patient monitoring, with nodes such as MINVOLSET, VENTMACH, INTUBATION, PULMEMBOLUS, SAO2, CATECHOL, LVFAILURE, HYPOVOLEMIA, CVP, and BP]
Based on: J. A. Bilmes, "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models", TR-97-021, U.C. Berkeley, April 1998; G. J. McLachlan and T. Krishnan, "The EM Algorithm and Extensions", John Wiley & Sons, Inc., 1997; D. Koller, course CS-228 handouts, Stanford University, 2001; and N. Friedman and D. Koller's NIPS'99 tutorial.
Structure Learning
Learning With Bayesian Networks
Fixed structure (A → B), fully observed:
– Easiest problem: counting

Fixed structure, partially observed:
– Numerical, nonlinear optimization; multiple calls to BN inference
– Difficult for large networks

Fixed variables, unknown structure (A ?→? B), fully observed:
– Selection of arcs; new domain with no domain expert
– Data mining

Fixed variables, unknown structure, partially observed:
– Encompasses the difficult subproblem above
– "Only" Structural EM is known

Hidden variables (A, B with hidden H):
– Scientific discovery
Structure learning? Parameter estimation?
Why Struggle for Accurate Structure?

[Figure: three variants of a network over Earthquake, Burglary, Alarm Set, and Sound: the true structure, one with an added arc, and one with a missing arc]

Adding an arc:
• Increases the number of parameters to be estimated
• Wrong assumptions about domain structure

Missing an arc:
• Cannot be compensated for by fitting parameters
• Wrong assumptions about domain structure
Unknown Structure, (In)complete Data
[Figure: a learning algorithm maps the data to the network E → A ← B together with its conditional probability table]

B E     P(A | E,B)
b e     .9   .1
b ¬e    .7   .3
¬b e    .8   .2
¬b ¬e   .99  .01
Complete data:
• Network structure is not specified
  – Learner needs to select arcs and estimate parameters
• Data does not contain missing values

E, B, A: <Y,N,N> <Y,N,Y> <N,N,Y> <N,Y,Y> ... <N,Y,Y>

Incomplete data:
• Network structure is not specified
• Data contains missing values
  – Need to consider assignments to missing values

E, B, A: <Y,?,N> <Y,N,?> <N,N,Y> <N,Y,Y> ... <?,Y,Y>

In both cases, all entries of the table P(A | E,B) are initially unknown ("?").
Score-based Learning
Define a scoring function that evaluates how well a structure matches the data, then search for the structure that maximizes the score.

E, B, A: <Y,N,N> <Y,Y,Y> <N,N,Y> <N,Y,Y> ... <N,Y,Y>

[Figure: candidate structures over E, B, and A, ranked by score]
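As a concrete illustration of such a scoring function (a minimal sketch, not from the slides: the toy dataset, names, and the choice of a plain log-likelihood score are all assumptions), complete data lets the score decompose into per-family terms that are computed by counting:

```python
from collections import Counter
from math import log

# Toy complete data over the variables (E, B, A); each row is a joint assignment.
names = ("E", "B", "A")
data = [("Y","N","N"), ("Y","Y","Y"), ("N","N","Y"), ("N","Y","Y"), ("N","Y","Y")]
idx = {n: i for i, n in enumerate(names)}

def family_loglik(data, child, parents):
    """Log-likelihood contribution of one family under maximum-likelihood
    CPT estimates; counts are all that is needed."""
    joint = Counter(tuple(r[idx[p]] for p in parents) + (r[idx[child]],) for r in data)
    marg = Counter(tuple(r[idx[p]] for p in parents) for r in data)
    return sum(n * log(n / marg[key[:-1]]) for key, n in joint.items())

def score(structure):
    """Decomposable score: a sum of family terms. structure: child -> parent tuple."""
    return sum(family_loglik(data, c, ps) for c, ps in structure.items())

# Two candidates: E and B as parents of A, versus the fully disconnected network.
print(score({"E": (), "B": (), "A": ("E", "B")}))
print(score({"E": (), "B": (), "A": ()}))
```

Because the score is a sum over families, a local change to the structure only affects a few terms; the search procedures below exploit exactly this.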
Structure Search as Optimization
Input:
– Training data
– Scoring function
– Set of possible structures

Output:
– A network that maximizes the score
Heuristic Search
• Define a search space:
  – search states are possible structures
  – operators make small changes to structure
• Traverse space looking for high-scoring structures
• Search techniques:
  – Greedy hill-climbing
  – Best-first search
  – Simulated annealing
  – ...

Theorem: finding the maximal scoring structure with at most k parents per node is NP-hard for k > 1.
Typically: Local Search
[Figure: current network over S, C, E, D]

• Start with a given network
  – empty network, best tree, a random network
• At each iteration
  – Evaluate all possible changes
  – Apply change based on score
• Stop when no modification improves score
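A greedy hill-climbing skeleton matching these bullets (a sketch under assumptions: a structure is a frozenset of arcs, `score` is any function of such a set, e.g. one derived from the counting score sketched earlier, and the toy score in the usage lines is invented):

```python
def neighbors(arcs, nodes):
    """All structures one arc-change away: delete, reverse, or add a single arc."""
    for a in list(arcs):
        yield arcs - {a}                               # delete arc
        yield (arcs - {a}) | {(a[1], a[0])}            # reverse arc
    for u in nodes:
        for v in nodes:
            if u != v and (u, v) not in arcs:
                yield arcs | {(u, v)}                  # add arc

def is_dag(arcs, nodes):
    """Kahn's algorithm: acyclic iff every node can be removed in topological order."""
    indeg = {n: 0 for n in nodes}
    for _, v in arcs:
        indeg[v] += 1
    frontier = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while frontier:
        n = frontier.pop()
        seen += 1
        for u, v in arcs:
            if u == n:
                indeg[v] -= 1
                if indeg[v] == 0:
                    frontier.append(v)
    return seen == len(nodes)

def hill_climb(nodes, score, start=frozenset()):
    """Start from a given network, evaluate all one-arc changes, apply the best
    one, and stop when no modification improves the score."""
    current, best = start, score(start)
    while True:
        cands = [frozenset(c) for c in neighbors(current, nodes)
                 if is_dag(frozenset(c), nodes)]
        if not cands:
            return current, best
        top = max(cands, key=score)
        if score(top) <= best:
            return current, best
        current, best = top, score(top)

# Toy usage: a made-up score that rewards the arc E -> A and penalizes size.
toy_score = lambda arcs: (("E", "A") in arcs) - 0.1 * len(arcs)
print(hill_climb({"E", "B", "A"}, toy_score))
```

Best-first search or simulated annealing would replace the greedy `max` step with a different acceptance rule.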
[Figure, built up over several slides: starting from the current network over S, C, E, D, the operators Add C→D, Reverse C→E, and Delete C→E each produce a candidate network, and every candidate is re-scored]
If data is complete: to update the score after a local change, only re-score (by counting) the families that changed.

If data is incomplete: to update the score after a local change, the parameter estimation algorithm must be rerun.
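The complete-data case can be made concrete (an illustrative sketch reusing the hypothetical `family_loglik` and `data` from the earlier scoring sketch; a structure is a dict mapping each child to its tuple of parents):

```python
def rescore_changed_families(old_structure, new_structure, cache):
    """Update a decomposable score after a local change: only families whose
    parent set changed are re-counted; all other cached terms are reused."""
    for child, parents in new_structure.items():
        if old_structure.get(child) != parents:
            cache[child] = family_loglik(data, child, parents)
    return sum(cache.values())

# Usage: adding the arc E -> A re-scores only A's family.
s0 = {"E": (), "B": (), "A": ()}
cache = {c: family_loglik(data, c, ps) for c, ps in s0.items()}
s1 = {"E": (), "B": (), "A": ("E",)}
print(rescore_changed_families(s0, s1, cache))
```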
Local Search in Practice

• Local search can get stuck in:
  – Local maxima: all one-edge changes reduce the score
  – Plateaux: some one-edge changes leave the score unchanged
• Standard heuristics can escape both:
  – Random restarts
  – TABU search
  – Simulated annealing
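The first of these escapes can be layered directly on the hill climber sketched above (again illustrative only; `hill_climb` and `is_dag` are the assumed helpers from that sketch, and the arc probability is an arbitrary choice):

```python
import random

def restart_hill_climb(nodes, score, n_restarts=10, seed=0):
    """Run hill climbing from several random acyclic starting networks and
    keep the best local maximum found across all runs."""
    rng = random.Random(seed)
    order = sorted(nodes)
    best_net, best_score = None, float("-inf")
    for _ in range(n_restarts):
        while True:
            # Random start: include each possible arc with probability 0.2,
            # resampling until the result is acyclic.
            start = frozenset((u, v) for u in order for v in order
                              if u != v and rng.random() < 0.2)
            if is_dag(start, nodes):
                break
        net, s = hill_climb(nodes, score, start)
        if s > best_score:
            best_net, best_score = net, s
    return best_net, best_score
```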
Local Search in Practice

• Using log-likelihood (LL) as the score, adding arcs always helps:
  – the maximum score is attained by the fully connected network
  – overfitting: a bad idea
• Minimum Description Length (MDL):
  – learning as data compression
• Others: BIC (Bayesian Information Criterion), Bayesian score (BDe)
$$\mathrm{MDL}(BN \mid D) \;=\; \underbrace{-\log P(D \mid G, \Theta)}_{DL(\text{Data}\,\mid\,\text{model})} \;+\; \underbrace{\frac{\log N}{2}\,|\Theta|}_{DL(\text{Model})}$$
<9.7  0.6  8   14  18>
<0.2  1.3  5   ??  ??>
<1.3  2.8  ??  0   1>
<??   5.6  0   10  ??>
...
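In code, the two description lengths can be sketched as follows (illustrative only: `family_loglik` and `data` are the hypothetical ones from the scoring sketch, and all variables are assumed discrete with known arities):

```python
from math import log

def n_free_params(structure, arity):
    """|Theta|: independent CPT parameters; for each child, (r_child - 1)
    parameters per joint configuration of its parents."""
    total = 0
    for child, parents in structure.items():
        q = 1
        for p in parents:
            q *= arity[p]
        total += (arity[child] - 1) * q
    return total

def mdl(structure, data, arity):
    """MDL(BN | D) = -log P(D | G, Theta) + (log N / 2) * |Theta|."""
    loglik = sum(family_loglik(data, c, ps) for c, ps in structure.items())
    return -loglik + (log(len(data)) / 2) * n_free_params(structure, arity)

# Usage: the penalty term now punishes the extra parameters of denser graphs.
arity = {"E": 2, "B": 2, "A": 2}
print(mdl({"E": (), "B": (), "A": ("E", "B")}, data, arity))
print(mdl({"E": (), "B": (), "A": ()}, data, arity))
```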
Local Search in Practice
• Perform EM for each candidate graph

[Figure: candidate graphs G1, G2, G3, G4, ..., Gn; for each candidate, parametric optimization (EM) searches the parameter space for a local maximum]
Computationally expensive:
– parameter optimization via EM is non-trivial
– EM must be run for every candidate structure
– time is spent even on poor candidates
⇒ In practice, consider only a few candidates
Structural EM [Friedman et al. 98]
Recall: with complete data, decomposition of the score enabled efficient search.

Idea:
• Instead of optimizing the real score directly,
• find a decomposable alternative score
• such that maximizing the new score yields an improvement in the real score.
Structural EM [Friedman et al. 98]
Idea:
• Use the current model to help evaluate new structures

Outline:
• Perform search in (Structure, Parameters) space
• At each iteration, use the current model for finding either:
  – better scoring parameters: "parametric" EM step, or
  – better scoring structure: "structural" EM step
Structural EM [Friedman et al. 98]

[Figure: training data plus the current network, with X1, X2, X3 as parents of the hidden variable H, and Y1, Y2, Y3 as children of H]

Computation of expected counts for the current structure's families:
N(X1), N(X2), N(X3), N(H, X1, X2, X3), N(Y1, H), N(Y2, H), N(Y3, H)

Score & parameterize. Expected counts are also computed for the families of alternative structures, e.g.:
N(X2, X1), N(H, X1, X3), N(Y1, X2), N(Y2, Y1, H)

Reiterate.
Structure Learning: Incomplete Data

EM algorithm: iterate until convergence.

Data (S, X, D, C, B), with missing values:
<? 0 1 0 1>
<1 1 ? 0 1>
<0 0 0 ? ?>
<? ? 0 ? 1>
...

Expectation step: inference in the current model, e.g. P(S | X=0, D=1, C=0, B=1), completes the data in expectation:
S X D C B
1 0 1 0 1
1 1 1 0 1
0 0 0 0 0
1 0 0 0 1
...
yielding the expected counts.

Maximization step: re-estimate the parameters from the expected counts.
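A fully worked miniature of this loop for parameters only (everything here is invented for illustration: a two-node network E → A with binary values and a few rows where E is missing):

```python
# EM for the parameters of the tiny network E -> A, with some E values missing.
data = [(1, 1), (0, 0), ("?", 1), (1, 1), ("?", 0), (0, 0)]  # (E, A) pairs

p_e = 0.5                      # current estimate of P(E = 1)
p_a = {0: 0.5, 1: 0.5}         # current estimate of P(A = 1 | E = e)

for _ in range(50):            # iterate until (practically) converged
    n_par = {0: 0.0, 1: 0.0}   # expected count of E = e
    n_ea = {0: 0.0, 1: 0.0}    # expected count of (E = e, A = 1)
    for e, a in data:
        if e == "?":
            # Expectation step for this row: infer P(E = 1 | A = a) under the
            # current model, the analogue of the P(S | X, D, C, B) call above.
            num = p_e * (p_a[1] if a else 1 - p_a[1])
            den = num + (1 - p_e) * (p_a[0] if a else 1 - p_a[0])
            w1 = num / den
        else:
            w1 = float(e)      # observed rows contribute a hard count
        for ev, w in ((1, w1), (0, 1.0 - w1)):
            n_par[ev] += w
            n_ea[ev] += w * a
    # Maximization step: maximum-likelihood re-estimation from expected counts.
    p_e = n_par[1] / len(data)
    p_a = {ev: n_ea[ev] / n_par[ev] for ev in (0, 1)}

print(p_e, p_a)
```

Each E-step fills in the missing values fractionally with posteriors computed from the current parameters, which is exactly the inference call the slide shows.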
Structure Learning: Incomplete Data

SEM algorithm: iterate until convergence. The loop is the same as for EM above, with one addition: the maximization step may improve either the parameters or the structure.

Maximization:
– Parameters, or
– Structure

[Figure: candidate network structures over E, B, A evaluated in the structural maximization step]
Structure Learning: Summary
• Expert knowledge + learning from data
• Structure learning involves parameter estimation (e.g. EM)
• Optimization with score functions:
  – likelihood + complexity penalty = MDL
• Local traversal of the space of possible structures:
  – add, reverse, delete (single) arcs
• Speed-up: Structural EM
  – score candidates w.r.t. the current best model