Daphne Koller
Semantics & Factorization
Probabilistic Graphical Models: Bayesian Networks
Representation

Random variables:
• Grade
• Course Difficulty
• Student Intelligence
• Student SAT
• Reference Letter
Joint distribution: P(G,D,I,S,L)

[Figure: network over Intelligence, Difficulty, Grade, Letter, SAT]

[Figure: Student network over Intelligence, Difficulty, Grade, Letter, SAT, with one CPD per node]

P(D):
d0     d1
0.6    0.4

P(I):
i0     i1
0.7    0.3

P(G | I,D):
         g1 (A)   g2 (B)   g3 (C)
i0,d0    0.3      0.4      0.3
i0,d1    0.05     0.25     0.7
i1,d0    0.9      0.08     0.02
i1,d1    0.5      0.3      0.2

P(S | I):
      s0      s1
i0    0.95    0.05
i1    0.2     0.8

P(L | G):
      l0      l1
g1    0.1     0.9
g2    0.4     0.6
g3    0.99    0.01

[Figure: Student network with each node annotated by its CPD: P(D), P(I), P(G|I,D), P(S|I), P(L|G)]
Chain Rule for Bayesian Networks
P(D,I,G,S,L) = P(D) P(I) P(G|I,D) P(S|I) P(L|G)
Distribution defined as a product of factors!

[Figure: Student network with its CPDs, repeated]
P(d0, i1, g3, s1, l1) = P(d0) P(i1) P(g3 | i1,d0) P(s1 | i1) P(l1 | g3)
= 0.6 × 0.3 × 0.02 × 0.8 × 0.01
= 0.0000288
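The chain-rule product above can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the function name `joint` are assumptions made for the example, while the numbers are the Student network CPDs from the slides.

```python
# CPDs of the Student network (values from the slides).
# The dict layout and all names here are illustrative assumptions.
P_D = {"d0": 0.6, "d1": 0.4}
P_I = {"i0": 0.7, "i1": 0.3}
P_G = {  # P(G | I, D), keyed by (i, d)
    ("i0", "d0"): {"g1": 0.3, "g2": 0.4, "g3": 0.3},
    ("i0", "d1"): {"g1": 0.05, "g2": 0.25, "g3": 0.7},
    ("i1", "d0"): {"g1": 0.9, "g2": 0.08, "g3": 0.02},
    ("i1", "d1"): {"g1": 0.5, "g2": 0.3, "g3": 0.2},
}
P_S = {"i0": {"s0": 0.95, "s1": 0.05}, "i1": {"s0": 0.2, "s1": 0.8}}
P_L = {
    "g1": {"l0": 0.1, "l1": 0.9},
    "g2": {"l0": 0.4, "l1": 0.6},
    "g3": {"l0": 0.99, "l1": 0.01},
}


def joint(d, i, g, s, l):
    """P(D,I,G,S,L) = P(D) P(I) P(G|I,D) P(S|I) P(L|G)."""
    return P_D[d] * P_I[i] * P_G[(i, d)][g] * P_S[i][s] * P_L[g][l]


# The assignment worked on the slide: 0.6 * 0.3 * 0.02 * 0.8 * 0.01
p = joint("d0", "i1", "g3", "s1", "l1")
print(p)
```

Each factor is a single table lookup, so evaluating one full assignment costs five multiplications regardless of how large the full joint table would be.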

Bayesian Network
• A Bayesian network is:
  – A directed acyclic graph (DAG) G whose nodes represent the random variables X1,…,Xn
  – For each node Xi, a CPD P(Xi | ParG(Xi))
• The BN represents a joint distribution via the chain rule for Bayesian networks:
  P(X1,…,Xn) = Πi P(Xi | ParG(Xi))

BN Is a Legal Distribution: P ≥ 0
• P is a product of CPDs, and every CPD entry is non-negative

BN Is a Legal Distribution: ∑ P = 1
∑D,I,G,S,L P(D,I,G,S,L)
= ∑D,I,G,S,L P(D) P(I) P(G|I,D) P(S|I) P(L|G)
= ∑D,I,G,S P(D) P(I) P(G|I,D) P(S|I) ∑L P(L|G)
= ∑D,I,G,S P(D) P(I) P(G|I,D) P(S|I)
= ∑D,I,G P(D) P(I) P(G|I,D) ∑S P(S|I)
= ∑D,I P(D) P(I) ∑G P(G|I,D)
= ∑D,I P(D) P(I) = 1
(Each inner sum equals 1 because every CPD sums to 1 over its child variable.)
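As a numerical sanity check of this derivation, the sketch below enumerates all 2 × 2 × 3 × 2 × 2 = 48 assignments of the Student network and adds up the chain-rule product. The table layout and the name `total` are assumptions for the example; the values are the CPDs from the slides.

```python
from itertools import product

# Student network CPDs (values from the slides); layout is an assumption.
P_D = {"d0": 0.6, "d1": 0.4}
P_I = {"i0": 0.7, "i1": 0.3}
P_G = {
    ("i0", "d0"): {"g1": 0.3, "g2": 0.4, "g3": 0.3},
    ("i0", "d1"): {"g1": 0.05, "g2": 0.25, "g3": 0.7},
    ("i1", "d0"): {"g1": 0.9, "g2": 0.08, "g3": 0.02},
    ("i1", "d1"): {"g1": 0.5, "g2": 0.3, "g3": 0.2},
}
P_S = {"i0": {"s0": 0.95, "s1": 0.05}, "i1": {"s0": 0.2, "s1": 0.8}}
P_L = {
    "g1": {"l0": 0.1, "l1": 0.9},
    "g2": {"l0": 0.4, "l1": 0.6},
    "g3": {"l0": 0.99, "l1": 0.01},
}

# One chain-rule product per full assignment (d, i, g, s, l).
entries = [
    P_D[d] * P_I[i] * P_G[(i, d)][g] * P_S[i][s] * P_L[g][l]
    for d, i, g, s, l in product(
        P_D, P_I, ("g1", "g2", "g3"), ("s0", "s1"), ("l0", "l1")
    )
]

total = sum(entries)
print(len(entries), total)  # number of assignments and their summed probability
```

Up to floating-point rounding, `total` should come out as 1, and every entry is non-negative because it is a product of non-negative CPD values.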

P Factorizes over G
• Let G be a graph over X1,…,Xn.
• P factorizes over G if
  P(X1,…,Xn) = Πi P(Xi | ParG(Xi))

Genetic Inheritance
[Figure: family tree over Clancy, Jackie, Selma, Marge, Homer, Bart, Lisa, Maggie]
Genotype: AA, AB, AO, BO, BB, OO
Phenotype: A, B, AB, O

BNs for Genetic Inheritance
[Figure: Bayesian network with a genotype node G and a blood-type node B for each person: Clancy, Jackie, Selma, Homer, Marge, Bart, Lisa, Maggie]

Reasoning Patterns
Probabilistic Graphical Models: Bayesian Networks
Representation

The Student Network
[Figure: Student network with its CPDs, repeated]

Causal Reasoning
[Figure: Student network]
P(l1) ≈ 0.5
P(l1 | i0) ≈ 0.39
P(l1 | i0, d0) ≈ 0.51

Evidential Reasoning
[Figure: Student network with the CPD P(G | I,D)]
Evidence: student gets a C (g3)
P(d1) = 0.4    P(d1 | g3) ≈ 0.63
P(i1) = 0.3    P(i1 | g3) ≈ 0.08

Intercausal Reasoning
[Figure: Student network]
Evidence: student gets a C (g3); class is hard (d1)
P(d1) = 0.4    P(d1 | g3) ≈ 0.63
P(i1) = 0.3    P(i1 | g3) ≈ 0.08    P(i1 | g3, d1) ≈ 0.11

Intercausal Reasoning Explained
[Figure: X1 → Y ← X2, where Y is the OR of X1 and X2]
X1   X2   Y    Prob
0    0    0    0.25
0    1    1    0.25
1    0    1    0.25
1    1    1    0.25
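A small sketch of why the OR table produces intercausal reasoning ("explaining away"): conditioning on Y = 1 raises the probability of X1 = 1, and additionally observing X2 = 1 brings it back down. The helper `cond` and the variable names are assumptions for this example; the four rows are the table from the slide.

```python
# Rows (x1, x2, y, prob) of the OR table from the slide.
ROWS = [
    (0, 0, 0, 0.25),
    (0, 1, 1, 0.25),
    (1, 0, 1, 0.25),
    (1, 1, 1, 0.25),
]


def cond(query, evidence=lambda x1, x2, y: True):
    """P(query | evidence), where both are predicates over (x1, x2, y)."""
    den = sum(p for x1, x2, y, p in ROWS if evidence(x1, x2, y))
    num = sum(
        p for x1, x2, y, p in ROWS if evidence(x1, x2, y) and query(x1, x2, y)
    )
    return num / den


def x1_is_one(x1, x2, y):
    return x1 == 1


prior = cond(x1_is_one)                                         # P(x1 = 1)
given_y = cond(x1_is_one, lambda x1, x2, y: y == 1)             # P(x1 = 1 | y = 1)
given_y_x2 = cond(x1_is_one, lambda x1, x2, y: y == 1 and x2 == 1)
print(prior, given_y, given_y_x2)
```

Here the prior is 1/2, observing Y = 1 raises it to 2/3, and then observing X2 = 1 returns it to 1/2, since X2 alone already accounts for Y = 1.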

Intercausal Reasoning II
[Figure: Student network]
Evidence: student gets a B (g2); class is hard (d1)
P(i1) = 0.3    P(i1 | g2) ≈ 0.175    P(i1 | g2, d1) ≈ 0.34

Student Aces the SAT
[Figure: Student network]
Evidence: student gets a C (g3); student aces the SAT (s1)
• What happens to the posterior probability that the class is hard?

Student Aces the SAT
[Figure: Student network]
Evidence: student gets a C (g3); student aces the SAT (s1)
P(d1) = 0.4    P(d1 | g3) ≈ 0.63    P(d1 | g3, s1) ≈ 0.76
P(i1) = 0.3    P(i1 | g3) ≈ 0.08    P(i1 | g3, s1) ≈ 0.58
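The posteriors quoted on the reasoning-pattern slides can be reproduced by brute-force enumeration over the joint distribution. The sketch below is one way to do that; the function names `joint` and `prob` and the dictionary layout are assumptions for the example, and enumeration is used only because the network is tiny, not because it is how such queries are normally answered.

```python
from itertools import product

# Student network CPDs (values from the slides); layout is an assumption.
VALS = {
    "D": ("d0", "d1"), "I": ("i0", "i1"), "G": ("g1", "g2", "g3"),
    "S": ("s0", "s1"), "L": ("l0", "l1"),
}
P_D = {"d0": 0.6, "d1": 0.4}
P_I = {"i0": 0.7, "i1": 0.3}
P_G = {
    ("i0", "d0"): {"g1": 0.3, "g2": 0.4, "g3": 0.3},
    ("i0", "d1"): {"g1": 0.05, "g2": 0.25, "g3": 0.7},
    ("i1", "d0"): {"g1": 0.9, "g2": 0.08, "g3": 0.02},
    ("i1", "d1"): {"g1": 0.5, "g2": 0.3, "g3": 0.2},
}
P_S = {"i0": {"s0": 0.95, "s1": 0.05}, "i1": {"s0": 0.2, "s1": 0.8}}
P_L = {
    "g1": {"l0": 0.1, "l1": 0.9},
    "g2": {"l0": 0.4, "l1": 0.6},
    "g3": {"l0": 0.99, "l1": 0.01},
}


def joint(a):
    """Chain rule for one full assignment a = {"D": ..., "I": ..., ...}."""
    return (P_D[a["D"]] * P_I[a["I"]] * P_G[(a["I"], a["D"])][a["G"]]
            * P_S[a["I"]][a["S"]] * P_L[a["G"]][a["L"]])


def prob(query, evidence=None):
    """P(query | evidence) by summing the joint over consistent assignments."""
    evidence = evidence or {}
    names = list(VALS)
    num = den = 0.0
    for combo in product(*(VALS[n] for n in names)):
        a = dict(zip(names, combo))
        if any(a[k] != v for k, v in evidence.items()):
            continue
        p = joint(a)
        den += p
        if all(a[k] == v for k, v in query.items()):
            num += p
    return num / den


# Causal, evidential, and intercausal queries from the slides.
print(round(prob({"L": "l1"}), 2), round(prob({"L": "l1"}, {"I": "i0"}), 2))
print(round(prob({"D": "d1"}, {"G": "g3"}), 2))
print(round(prob({"I": "i1"}, {"G": "g3", "D": "d1"}), 2))
print(round(prob({"D": "d1"}, {"G": "g3", "S": "s1"}), 2))
```

Enumeration visits all 48 assignments per query, which is fine here but grows exponentially with the number of variables.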

Flow of Probabilistic Influence
Probabilistic Graphical Models: Bayesian Networks
Representation

When can X influence Y?
• X → Y: yes
• X ← Y: yes
• X → W → Y: yes
• X ← W ← Y: yes
• X ← W → Y: yes
• X → W ← Y: no (v-structure)
[Figure: Student network]

Active Trails
• A trail X1 ─ … ─ Xn is active if it has no v-structures Xi-1 → Xi ← Xi+1

When can X influence Y given evidence about Z?
Trail          W ∉ Z    W ∈ Z
X → Y          yes      yes
X ← Y          yes      yes
X → W → Y      yes      no
X ← W ← Y      yes      no
X ← W → Y      yes      no
X → W ← Y      no*      yes
* unless one of W's descendants is in Z
[Figure: Student network]

When can X influence Y given evidence about Z?
• S ― I ― G ― D allows influence to flow when I is not observed and G (or its descendant L) is observed
[Figure: Student network]

Active Trails
• A trail X1 ─ … ─ Xn is active given Z if:
  – for any v-structure Xi-1 → Xi ← Xi+1 we have that Xi or one of its descendants ∈ Z
  – no other Xi is in Z
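The two conditions above translate almost directly into code. The following is a minimal sketch, assuming the Student network edges D → G, I → G, I → S, G → L (read off the chain rule) and a child-to-parents map as the graph representation. The names `PARENTS`, `descendants`, and `is_active` are hypothetical, and the function does not verify that consecutive trail nodes are actually adjacent or check the endpoints against Z.

```python
# Student network as a child -> parents map (an assumed representation).
PARENTS = {"D": [], "I": [], "G": ["I", "D"], "S": ["I"], "L": ["G"]}


def descendants(node):
    """All nodes reachable from `node` by following directed edges."""
    children = {n: [c for c, ps in PARENTS.items() if n in ps] for n in PARENTS}
    seen, stack = set(), [node]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def is_active(trail, observed):
    """True if `trail` (a list of adjacent nodes) is active given `observed`."""
    observed = set(observed)
    # Slide over every interior node together with its two trail neighbors.
    for prev, mid, nxt in zip(trail, trail[1:], trail[2:]):
        v_structure = prev in PARENTS[mid] and nxt in PARENTS[mid]
        if v_structure:
            # Needs the middle node or one of its descendants in Z.
            if not (({mid} | descendants(mid)) & observed):
                return False
        elif mid in observed:
            # Any other observed interior node blocks the trail.
            return False
    return True


trail = ["S", "I", "G", "D"]
print(is_active(trail, set()))        # v-structure at G, nothing observed
print(is_active(trail, {"G"}))        # G observed
print(is_active(trail, {"L"}))        # descendant of G observed
print(is_active(trail, {"G", "I"}))   # I observed as well
```

For the trail S ― I ― G ― D this gives inactive, active, active, inactive in that order: influence flows only when I is unobserved and G or its descendant L is observed.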

Preliminaries
Probabilistic Graphical Models: Independencies
Representation

Independence
• For events α, β, P ⊨ α ⊥ β if:
  – P(α, β) = P(α) P(β)
  – P(α | β) = P(α)
  – P(β | α) = P(β)
• For random variables X, Y, P ⊨ X ⊥ Y if:
  – P(X, Y) = P(X) P(Y)
  – P(X | Y) = P(X)
  – P(Y | X) = P(Y)

Independence
[Figure: network over I, D, G]

P(I,D):
I     D     Prob
i0    d0    0.42
i0    d1    0.18
i1    d0    0.28
i1    d1    0.12

P(I):
I     Prob
i0    0.6
i1    0.4

P(D):
D     Prob
d0    0.7
d1    0.3

P(I,D,G):
I     D     G     Prob
i0    d0    g1    0.126
i0    d0    g2    0.168
i0    d0    g3    0.126
i0    d1    g1    0.009
i0    d1    g2    0.045
i0    d1    g3    0.126
i1    d0    g1    0.252
i1    d0    g2    0.0224
i1    d0    g3    0.0056
i1    d1    g1    0.06
i1    d1    g2    0.036
i1    d1    g3    0.024

Conditional Independence
• For (sets of) random variables X, Y, Z, P ⊨ (X ⊥ Y | Z) if:
  – P(X, Y | Z) = P(X | Z) P(Y | Z)
  – P(X | Y, Z) = P(X | Z)
  – P(Y | X, Z) = P(Y | Z)
  – P(X, Y, Z) ∝ φ1(X,Z) φ2(Y,Z)

Conditional Independence
[Figure: network over Coin, X1, X2]

Conditional Independence
[Figure: network over I, S, G]

P(I,S,G):
I     S     G     Prob
i0    s0    g1    0.114
i0    s0    g2    0.1938
i0    s0    g3    0.2622
i0    s1    g1    0.006
i0    s1    g2    0.0102
i0    s1    g3    0.0138
i1    s0    g1    0.252
i1    s0    g2    0.0224
i1    s0    g3    0.0056
i1    s1    g1    0.108
i1    s1    g2    0.0096
i1    s1    g3    0.0024

P(S,G | i0):
S     G     Prob
s0    g1    0.19
s0    g2    0.323
s0    g3    0.437
s1    g1    0.01
s1    g2    0.017
s1    g3    0.023

P(S | i0):
S     Prob
s0    0.95
s1    0.05

P(G | i0):
G     Prob
g1    0.2
g2    0.34
g3    0.46
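The conditional independence shown in these tables can be checked numerically. The sketch below compares P(S,G | i) with the product P(S | i) P(G | i) for each value of I, and does the same for the unconditioned marginal P(S,G). The helper names `independence_gap` and `conditional_on` are assumptions for the example; the joint is the table from the slide.

```python
from itertools import product

# Joint P(I, S, G), copied from the slide's table.
P = {
    ("i0", "s0", "g1"): 0.114, ("i0", "s0", "g2"): 0.1938,
    ("i0", "s0", "g3"): 0.2622, ("i0", "s1", "g1"): 0.006,
    ("i0", "s1", "g2"): 0.0102, ("i0", "s1", "g3"): 0.0138,
    ("i1", "s0", "g1"): 0.252, ("i1", "s0", "g2"): 0.0224,
    ("i1", "s0", "g3"): 0.0056, ("i1", "s1", "g1"): 0.108,
    ("i1", "s1", "g2"): 0.0096, ("i1", "s1", "g3"): 0.0024,
}
I_VALS, S_VALS, G_VALS = ("i0", "i1"), ("s0", "s1"), ("g1", "g2", "g3")
SG = list(product(S_VALS, G_VALS))


def independence_gap(table):
    """Largest |P(s,g) - P(s) P(g)| for a normalized table over (S, G)."""
    p_s = {s: sum(table[(s, g)] for g in G_VALS) for s in S_VALS}
    p_g = {g: sum(table[(s, g)] for s in S_VALS) for g in G_VALS}
    return max(abs(table[(s, g)] - p_s[s] * p_g[g]) for s, g in SG)


def conditional_on(i):
    """P(S, G | I = i): select the rows with I = i and renormalize."""
    z = sum(P[(i, s, g)] for s, g in SG)
    return {(s, g): P[(i, s, g)] / z for s, g in SG}


# Marginal P(S, G): sum I out of the joint.
marginal = {(s, g): sum(P[(i, s, g)] for i in I_VALS) for s, g in SG}

for i in I_VALS:
    print(i, independence_gap(conditional_on(i)))  # near zero: S ⊥ G | I
print("marginal", independence_gap(marginal))      # clearly non-zero
```

The gap is zero (up to rounding) for both values of I, while the marginal gap is not, so in this distribution S and G are dependent until I is observed.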

Conditioning can Lose Independences
[Table: joint P(I,D,G), repeated from the Independence slide]

P(I,D | g1):
I     D     Prob.
i0    d0    0.282
i0    d1    0.02
i1    d0    0.564
i1    d1    0.134
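To see the loss of independence concretely, the sketch below derives both P(I,D) and P(I,D | g1) from the joint P(I,D,G) and measures how far each is from the product of its marginals. The helper names are assumptions for the example; the joint values are those on the slide.

```python
from itertools import product

# Joint P(I, D, G), copied from the slide's table.
P = {
    ("i0", "d0", "g1"): 0.126, ("i0", "d0", "g2"): 0.168,
    ("i0", "d0", "g3"): 0.126, ("i0", "d1", "g1"): 0.009,
    ("i0", "d1", "g2"): 0.045, ("i0", "d1", "g3"): 0.126,
    ("i1", "d0", "g1"): 0.252, ("i1", "d0", "g2"): 0.0224,
    ("i1", "d0", "g3"): 0.0056, ("i1", "d1", "g1"): 0.06,
    ("i1", "d1", "g2"): 0.036, ("i1", "d1", "g3"): 0.024,
}
I_VALS, D_VALS, G_VALS = ("i0", "i1"), ("d0", "d1"), ("g1", "g2", "g3")
ID = list(product(I_VALS, D_VALS))


def independence_gap(table):
    """Largest |P(i,d) - P(i) P(d)| for a normalized table over (I, D)."""
    p_i = {i: sum(table[(i, d)] for d in D_VALS) for i in I_VALS}
    p_d = {d: sum(table[(i, d)] for i in I_VALS) for d in D_VALS}
    return max(abs(table[(i, d)] - p_i[i] * p_d[d]) for i, d in ID)


# Marginal P(I, D): sum G out of the joint.
marginal = {(i, d): sum(P[(i, d, g)] for g in G_VALS) for i, d in ID}

# Conditional P(I, D | g1): keep the g1 rows and renormalize.
z = sum(P[(i, d, "g1")] for i, d in ID)
given_g1 = {(i, d): P[(i, d, "g1")] / z for i, d in ID}

print(independence_gap(marginal))   # near zero: I and D are independent
print(independence_gap(given_g1))   # non-zero: dependent once G is observed
print({k: round(v, 3) for k, v in given_g1.items()})
```

The marginal gap vanishes, whereas conditioning on g1 leaves a gap of a few hundredths, which matches the slide's point that observing G couples I and D.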

Bayesian Networks
Probabilistic Graphical Models: Independencies
Representation

Independence & Factorization
• Factorization of a distribution P implies independencies that hold in P:
  – P(X,Y) = P(X) P(Y) ⇒ X, Y independent
  – P(X,Y,Z) ∝ φ1(X,Z) φ2(Y,Z) ⇒ (X ⊥ Y | Z)
• If P factorizes over G, can we read these independencies from the structure of G?

Flow of influence & d-separation
Definition: X and Y are d-separated in G given Z if there is no active trail in G between X and Y given Z
Notation: d-sepG(X, Y | Z)
[Figure: Student network over I, D, G, L, S]

Factorization ⇒ Independence: BNs
Theorem: If P factorizes over G, and d-sepG(X, Y | Z), then P satisfies (X ⊥ Y | Z)
[Figure: Student network over I, D, G, L, S]

Any node is d-separated from its non-descendants given its parents
[Figure: network over Coherence, Difficulty, Intelligence, Grade, SAT, Letter, Job, Happy]
If P factorizes over G, then in P, any variable is independent of its non-descendants given its parents

I-maps
• d-separation in G ⇒ P satisfies the corresponding independence statement
  I(G) = {(X ⊥ Y | Z) : d-sepG(X, Y | Z)}
• Definition: If P satisfies I(G), we say that G is an I-map (independency map) of P

I-maps
[Figure: two graphs over I and D, labeled G1 and G2]

P1:
I     D     Prob
i0    d0    0.42
i0    d1    0.18
i1    d0    0.28
i1    d1    0.12

P2:
I     D     Prob.
i0    d0    0.282
i0    d1    0.02
i1    d0    0.564
i1    d1    0.134
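A sketch of how the I-map definition applies to these two distributions. It assumes, since the figure itself did not survive extraction, that G1 is the graph with no edge between I and D, so I(G1) = {(I ⊥ D)}, and that G2 has an edge between them, so I(G2) is empty. The function names are hypothetical.

```python
# The two distributions over (I, D) from the slide.
P1 = {("i0", "d0"): 0.42, ("i0", "d1"): 0.18,
      ("i1", "d0"): 0.28, ("i1", "d1"): 0.12}
P2 = {("i0", "d0"): 0.282, ("i0", "d1"): 0.02,
      ("i1", "d0"): 0.564, ("i1", "d1"): 0.134}


def i_indep_d(P, tol=1e-9):
    """True if P(i, d) = P(i) P(d) for every entry, i.e. P satisfies (I ⊥ D)."""
    p_i = {i: sum(p for (a, _), p in P.items() if a == i) for i in ("i0", "i1")}
    p_d = {d: sum(p for (_, b), p in P.items() if b == d) for d in ("d0", "d1")}
    return all(abs(P[(i, d)] - p_i[i] * p_d[d]) <= tol for i in p_i for d in p_d)


# ASSUMPTION: G1 has no edge (asserts I ⊥ D); G2 has an edge (asserts nothing).
I_G1 = [i_indep_d]
I_G2 = []


def is_imap(independencies, P):
    """G is an I-map of P if P satisfies every independence in I(G)."""
    return all(holds(P) for holds in independencies)


for name, P in (("P1", P1), ("P2", P2)):
    print(name, "G1:", is_imap(I_G1, P), "G2:", is_imap(I_G2, P))
```

Under these assumptions G1 is an I-map of P1 but not of P2, while G2, which asserts no independencies, is trivially an I-map of both.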

Factorization ⇒ Independence: BNs
Theorem: If P factorizes over G, then G is an I-map for P

Independence ⇒ Factorization
Theorem: If G is an I-map for P, then P factorizes over G
[Figure: Student network over I, D, G, L, S]

Summary
Two equivalent views of graph structure:
• Factorization: G allows P to be represented
• I-map: Independencies encoded by G hold in P
If P factorizes over a graph G, we can read from the graph independencies that must hold in P (an independency map)

Naïve Bayes
Probabilistic Graphical Models: Bayesian Networks
Representation

Naïve Bayes Model
[Figure: Class node with children X1, X2, …, Xn]
(Xi ⊥ Xj | C) for all Xi, Xj

Naïve Bayes Classifier
[Figure: Class node with children X1, X2, …, Xn]

Bernoulli Naïve Bayes for Text
[Figure: Document Label node with one child per dictionary word: cat, dog, buy, …]
P(“cat” appears | Label):
             cat      dog      buy      sell      …
Financial    0.001    0.001    0.2      0.3       …
Pets         0.3      0.4      0.02     0.0001    …
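A minimal sketch of Bernoulli naïve Bayes classification using the four word probabilities listed above. Two things are assumptions rather than slide content: the uniform prior over labels, and the restriction of the vocabulary to the four words shown (the elided "…" columns are simply left out). The function name `posterior` is hypothetical.

```python
# P(word appears | label) from the slide; the "..." columns are omitted.
P_WORD = {
    "Financial": {"cat": 0.001, "dog": 0.001, "buy": 0.2, "sell": 0.3},
    "Pets": {"cat": 0.3, "dog": 0.4, "buy": 0.02, "sell": 0.0001},
}
# ASSUMPTION: uniform prior; the slides do not give P(Label).
PRIOR = {"Financial": 0.5, "Pets": 0.5}


def posterior(words_present):
    """P(label | which vocabulary words appear), Bernoulli naive Bayes."""
    present = set(words_present)
    scores = {}
    for label, probs in P_WORD.items():
        score = PRIOR[label]
        for word, p in probs.items():
            # Every vocabulary word contributes: p if present, 1 - p if absent.
            score *= p if word in present else 1.0 - p
        scores[label] = score
    z = sum(scores.values())
    return {label: s / z for label, s in scores.items()}


print(posterior({"buy", "sell"}))
print(posterior({"cat"}))
```

Note that absent words matter in the Bernoulli model: a document that mentions only "cat" is also penalized under Financial for not containing "buy" or "sell".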

Multinomial Naïve Bayes for Text
[Figure: Document Label node with one child per word position: W1, W2, …, Wn]
             cat      dog      buy      sell     …
Financial    0.001    0.001    0.02     0.02     …
Pets         0.02     0.03     0.003    0.001    …
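For comparison, here is a sketch of the multinomial variant, where each word position W1, …, Wn draws a word from a per-label distribution and repeated words count multiple times. As before, the uniform prior is an assumption, only the four listed words are modeled, and words outside that table are skipped, which is a simplification made for this example. The function names are hypothetical.

```python
import math

# P(word at a position | label) from the slide; "..." columns are omitted.
P_WORD = {
    "Financial": {"cat": 0.001, "dog": 0.001, "buy": 0.02, "sell": 0.02},
    "Pets": {"cat": 0.02, "dog": 0.03, "buy": 0.003, "sell": 0.001},
}
# ASSUMPTION: uniform prior; the slides do not give P(Label).
PRIOR = {"Financial": 0.5, "Pets": 0.5}


def log_score(label, words):
    """log P(label) + sum over word positions of log P(word | label)."""
    total = math.log(PRIOR[label])
    for word in words:
        if word in P_WORD[label]:  # words outside the table are skipped
            total += math.log(P_WORD[label][word])
    return total


def classify(words):
    """Return the label with the highest log score for the word sequence."""
    return max(P_WORD, key=lambda label: log_score(label, words))


print(classify(["buy", "sell", "buy"]))
print(classify(["dog", "cat", "buy"]))
```

Working in log space avoids underflow when documents are long, since the raw product of many small per-word probabilities quickly becomes too small to represent.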

Summary
• Simple approach for classification
  – Computationally efficient
  – Easy to construct
• Surprisingly effective in domains with many weakly relevant features
• Strong independence assumptions reduce performance when many features are strongly correlated

Application: Diagnosis
Probabilistic Graphical Models: Bayesian Networks
Representation

Medical Diagnosis: Pathfinder (1992)
• Help pathologist diagnose lymph node pathologies (60 different diseases)
• Pathfinder I: Rule-based system
• Pathfinder II used naïve Bayes and got superior performance
Heckerman et al.

Medical Diagnosis: Pathfinder (1992)
• Pathfinder III: Naïve Bayes with better knowledge engineering
• No incorrect zero probabilities
• Better calibration of conditional probabilities
  – P(finding | disease1) to P(finding | disease2)
  – Not P(finding1 | disease) to P(finding2 | disease)
Heckerman et al.

Medical Diagnosis: Pathfinder (1992)
• Pathfinder IV: Full Bayesian network
  – Removed incorrect independencies
  – Additional parents led to more accurate estimation of probabilities
• BN model agreed with expert panel in 50/53 cases, vs 47/53 for naïve Bayes model
• Accuracy as high as expert that designed the model
Heckerman et al.

CPCS
# of parameters: 2^1000 to 133,931,430 to 8254
M. Pradhan, G. Provan, B. Middleton, M. Henrion, UAI 94

Medical Diagnosis (Microsoft)
Thanks to: Eric Horvitz, Microsoft Research

Fault Diagnosis
• Many examples:
  – Microsoft troubleshooters
  – Car repair
• Benefits:
  – Flexible user interface
  – Easy to design and maintain