bayesian networks c. learning / c.3 structure learning
Post on 28-May-2022
2 Views
Preview:
TRANSCRIPT
Bayesian Networks
C. Learning / C.3 Structure Learning / PC Algorithm
Lars Schmidt-Thieme
Information Systems and Machine Learning Lab (ISMLL)Institute of Computer Science
University of Hildesheimhttp://www.ismll.uni-hildesheim.de
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 1/31
Bayesian Networks
Syllabus
Mon. 24.10. (1) 0. Introduction
A. Fundamentals& A.1. Probability Calculus revisited
Mon. 31.10. (2) A.2. Separation in GraphsMon. 7.11. (3) A.3. Bayesian and Markov NetworksMon. 14.11. (4) (ctd.)
B. InferenceMon. 21.11. (5) B.1. Exact InferenceMon. 28.11. (6) B.2. Exact Inference / Join TreeMon. 5.12. (7) (ctd.)Mon. 12.12. (8) B.3. Approximate Inference / SamplingMon. 19.12. (9) B.4. Approximate Inference / Propagation
C. LearningMon. 9.1. (10) C.1. Parameter LearningMon. 16.1. (11) C.2. Parameter Learning with Missing ValuesMon. 23.1. (12) C.3. Structure Learning / PC AlgorithmMon. 30.1. (13) C.4. Structure Learning / Search Methods
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 1/31
Bayesian Networks
1. PC Algorithm
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 1/31
Bayesian Networks / 1. PC Algorithm
Types of Methods for Structure Learning
There are three types of structure learning algorithms forBayesian networks:
1. constrained-based learning (e.g., PC),2. searching with a target function (e.g., K2),3. hybrid methods (e.g., sparse candidate).
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 1/31
Bayesian Networks / 1. PC Algorithm
Computing the Skeleton
Lemma 1 (Edge Criterion). Let G := (V,E) be a DAG andX, Y ∈ V . Then it is equivalent:
(i) X and Y cannot be separated by any Z, i.e.,
¬IG(X, Y | Z) ∀Z ⊆ V \ {X, Y }(ii) There is an edge between X and Y , i.e.,
(X, Y ) ∈ E or (Y,X) ∈ E
Definition 1. Any Z ⊆ V \ {X, Y } with IG(X, Y | Z) iscalled a separator of X and Y .
Sep(X, Y ) := {Z ⊆ V \ {X, Y } | IG(X, Y | Z)}
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 2/31
Bayesian Networks / 1. PC Algorithm
Computing the Skeleton / Separators
1 separators-basic(set of variables V, independency relation I) :2 Allocate S : P2(V ) → P(V ) ∪ {none}3 for {X, Y } ⊆ V do4 S({X, Y }) := none5 for T ⊆ V \ {X, Y } do6 if I(X, Y |T )7 S({X, Y }) := T8 break9 fi
10 od11 od12 return S
Figure 1: Compute a separator for each pair of variables.
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 3/31
Bayesian Networks / 1. PC Algorithm
Example / 1/3 – Computing the Skeleton
Let I be the following independency structure:
I(A,D |C), I(A,D | {C,B}), I(B,D)
Then we can compute the followingseparators:
S(A,B) := noneS(A,C) := noneS(A,D) := {C}S(B,C) := noneS(B,D) := ∅S(C,D) := none
Thus, the skeleton of the BayesianNetwork representing I looks like
D
B
A
C
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 4/31
Bayesian Networks / 1. PC Algorithm
Computing the V-structure
Lemma 2 (Uncoupled Head-to-head Meeting Criterion). LetG := (V,E) be a DAG, X, Y, Z ∈ V with
x
Z
YX
Then it is equivalent:
(i) X → Z ← Y is an uncoupled head-to-head meeting, i.e.,
(X,Z), (Y, Z) ∈ E, (X, Y ), (Y,X) 6∈ E(ii) Z is not contained in any separator of X and Y , i.e.,
Z 6∈ S ∀S ∈ Sep(X, Y )
(iii) Z is not contained in at least one separator of X and Y , i.e.,
Z 6∈ S ∃S ∈ Sep(X, Y )
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 5/31
Bayesian Networks / 1. PC Algorithm
Computing Skeleton and V-structure
1 vstructure(set of variables V, independency relation I) :2 S := separators(V, I)3 G := (V,E) with E := {{X, Y } |S({X, Y }) = none}4 for X, Y, Z ∈ V with X − Z − Y,X−6 −Y do5 if Z 6∈ S(X, Y )6 orient X − Z − Y as X → Z ← Y7 fi8 od9 return G
Figure 2: Compute skeleton and v-structure.
1 learn-structure-pc(set of variables V, independency relation I) :2 G := vstructure(V, I)3 saturate(G)4 return G
Figure 3: Learn structure of a Bayesian Network (SGS/PC algorithm,[SGS00]).
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 6/31
Bayesian Networks / 1. PC Algorithm
Example / 2/3 – Computing the V-Structure
Separators:S(A,B) := noneS(A,C) := noneS(A,D) := {C}S(B,C) := noneS(B,D) := ∅S(C,D) := none
Skeleton:
D
B
A
C
Checking A–C–D:
D
B
A
C
Checking B–C–D:
D
B
A
C
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 7/31
Bayesian Networks / 1. PC Algorithm
Example / 3/3 – Saturating
Skeleton and v-structure:
C
A
B
D
Saturating:
rule 1 rule 2
C
A
B
D
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 8/31
Bayesian Networks / 1. PC Algorithm
Number of Independency Tests
Let there be n variables.For each of the
(n2
)pairs of variables, there are 2n−2
candidates for possible separators.
number of I-tests =
(n
2
)2n−2
Example: n = 4:(n
2
)2n−2 =
(4
2
)22 = 6 · 4 = 24
If we start with small separators and stop once aseparator has been found, we still have to check
4 · (1 + 2 + 1) + 1 · (1 + 2) + 1 · 1 = 20
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 9/31
Bayesian Networks / 1. PC Algorithm
Number of Independency Tests
Can we reduce the number of tests for a given pair ofvariable by reusing results for other pairs of variables?
Lemma 3. Let G := (V,E) be a DAG and X, Y ∈ Vseparated. Then
I(X, Y | pa(X)) or I(X, Y | pa(Y ))
As we do not know directions of edges at the skeletonrecovery step, we use the weaker result:
I(X, Y | fan(X)) or I(X, Y | fan(Y ))
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 10/31
Bayesian Networks / 1. PC Algorithm
Computing the Skeleton / Separators(1) separators-remove-edges(separator map S, skeleton graph G, independency relation I) :(2) i := 0(3) while ∃X ∈ V : | fanG(X)| > i do(4) for {X, Y } ∈ E with | fanG(X)| > i or | fanG(Y )| > i do(5) for T ∈ P i(fanG(X) \ {Y }) ∪ P i(fanG(Y ) \ {X}) do(6) if I(X, Y | T )(7) S({X, Y }) := T(8) E := E \ {{X, Y }}(9) break
(10) fi(11) od(12) od(13) i := i+ 1(14) od(15) return S
1 separators-interlaced(set of variables V, independency relation I) :2 Allocate S : P2(V ) → P(V ) ∪ {none}3 S({X, Y }) := none ∀{X, Y } ⊆ V4 G := (V,E) with E := P2(V )5 separators-remove-edges(S,G, I)6 return S
Figure 4: Compute a separator for each pair of variables.Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 11/31
Bayesian Networks / 1. PC Algorithm
Example / Computing the Separators (1/3)
I(A,D |C), I(A,D | {C,B}), I(B,D)
i = 0 :A, B, T = ∅: —
C, T = ∅: —D, T = ∅: —
B, C, T = ∅: —D, T = ∅: D({B,D}) = ∅
C, D, T = ∅: —
initial graph:
D
B
A
C
after update for B,D:
D
B
A
C
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 12/31
Bayesian Networks / 1. PC Algorithm
Example / Computing the Separators (2/3)
I(A,D |C), I(A,D | {C,B}), I(B,D)
i = 1 :A, B, T = {C}, {D}: —
C, T = {B}, {D}: —D, T = {B}, {C}: S({A,D}) = {C}
B, C, T = {A}, {D}: —C, D, T = {A}, {B}: —
after update for B,D:
D
B
A
C
after update for A,D:
C
A
B
D
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 13/31
Bayesian Networks / 1. PC Algorithm
Example / Computing the Separators (3/3)
I(A,D |C), I(A,D | {C,B}), I(B,D)
i = 2 :A, C, T = {B,D}: —B, C, T = {A,D}: —C, D, T = {A,B}: —
total: 19 I-tests.
after update for A,D:
C
A
B
D
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 14/31
Bayesian Networks / 1. PC Algorithm
Algorithms – SGS vs. PC
SGS/PC with separators-basic is called SGSalgorithm ([SGS00], 1990).
SGS/PC with separators-interlaced is called PCalgorithm ([SGS00], 1991).
Implementations are available:
• in Tetradhttp://www.phil.cmu.edu/projects/tetrad/(class files & javadocs, no sources)• in Hugin (commercial).
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 15/31
Bayesian Networks / 1. PC Algorithm
Another Example
Let I be the following independency structure:
I(A,D |B), I(B,D)
PC computes the following DAG pattern:
A
B
D
But this is not even a representation of I, as it implies
IG(A,D | ∅)
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 16/31
Bayesian Networks / 1. PC Algorithm
Representation and Faithfulness Tests
PC computes the DAG pattern of an independencyrelation
if there exists one at all !
Remember: not every independency relation has afaithful DAG representation.
But how do we know if an independency relation has afaithful DAG representation?
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 17/31
Bayesian Networks / 1. PC Algorithm
Representation and Faithfulness Tests
There is no easy way to decide if a given independencyrelation has a faithful DAG representation.
So just check ex-post if the DAG pattern we have foundis a faithful representation.
Check:
1. compute all IG(X, Y | Z) and check if I(X, Y | Z)(representation),
2. check for each I(X, Y | Z) if IG(X, Y | Z) (faithfulness).
As there is no easy way to enumerate IG(X, Y | Z) for aDAG pattern G directly, we draw a representative H andthen enumerate IH(X, Y | Z) (remember that IH = IG).
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 18/31
Bayesian Networks / 1. PC Algorithm
Draw a Representative of a DAG pattern
1 draw-representative(DAG pattern G) :2 saturate(G)3 while G has unoriented edges do4 draw an edge from G an orient it arbitrarily5 saturate(G)6 od7 return G
Figure 5: Draw a representative of a DAG pattern.
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 19/31
Bayesian Networks / 1. PC Algorithm
Draw a Representative of a DAG pattern
Here, rule 4 is necessary in saturate.
W
Y
Z
X
W
Y
Z
X
Start with DAG pattern
X
Z
Y
W
that is already saturated
and choose as next edge X–W andorient it as X → W . Then rule 4 has tobe applied to saturate the resultingDAG pattern
X
Z
Y
W
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 20/31
Bayesian Networks / 1. PC Algorithm
Draw a Representative of a DAG pattern / Example
step 1a: saturate.X
Z
Y
W
step 1b: orient any unoriented edge:
W
Y
Z
X
step 2a: saturate.
rule 4
W
Y
Z
X
step 2b: orient any unoriented edge:
rule 4
X
Z
Y
W
step 3a: saturate.
rule 1
rule 4
X
Z
Y
W
done.
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 21/31
Bayesian Networks / 1. PC Algorithm
Representation and Faithfulness Tests
If I = Ip the probabilistic independency relation of a JPDp, then for (1) it suffices to check the Markov property,i.e.,
for all X check if Ip(X, nondesc(X) \ pa(X) | pa(X))
It even suffices to check for any topological orderingσ = (X1, . . . , Xn) if
Ip(σ(i), σ({1, . . . , i− 1}) \ pa(σ(i)) | pa(σ(i)))
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 22/31
Bayesian Networks / 1. PC Algorithm
A Third Method to Compute Separators
separators-interlaced computes separatorstop-down by
• starting with a complete graph and then• successively thining the graph.
Therefore, we have to start checking with lots ofcandidates for possible separators.
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 23/31
Bayesian Networks / 1. PC Algorithm
A Third Method to Compute Separators
1 separators-add-edges(separator map S, skeleton graph G, independency relation I) :2 S({X, Y }) := argminT⊆fanG(X),T⊆fanG(Y ) g(X, Y, T ) ∀{X, Y } ∈ P2(V )
3 {X0, Y0} := argmax{X,Y }∈P2(V ) g(X, Y, S({X, Y }))4 while ¬I(X0, Y0 |S({X0, Y0})) do5 E := E ∪ {{X0, Y0}}6 S({X0, Y0}) := none7 S({X, Y }) := argminT⊆fanG(X),T⊆fanG(Y ) g(X, Y, T ) ∀{X, Y } ∈ P2(V ) \ E,X ∈ {X0, Y0}8 {X0, Y0} := argmax{X,Y }∈P2(V )\E g(X, Y, S({X, Y }))9 od
10 return S
1 separators-bottumup(set of variables V, independency relation I) :2 Allocate S : P2(V ) → P(V ) ∪ {none}3 G := (V,E) with E := ∅4 separators-add-edges(S,G, I)5 separators-remove-edges(S,G, I)6 return S
Figure 6: Compute a separator for each pair of variables [TASB03].
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 24/31
Bayesian Networks / 1. PC Algorithm
Embedded Faithfulness
Let I be the following independency structure:
I(A,D), I(A,E), I(A,E |B), I(A,E |D), I(B,E)
EA C
B D
Assume C to be hidden.
d-separations between variables A,B,D, and E:
I(A,D), I(A,E), I(A,E |B), I(A,E |D), I(B,E)
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 25/31
Bayesian Networks / 1. PC Algorithm
Embedded Faithfulness
Definition 2. Let I be an independency relation on thevariables V and G be a DAG with vertices W ⊇ V .I is embedded in G, if all independency statementsentailed by G between variables from V hold in I:
IG(X ,Y |Z)⇒ I(X ,Y |Z) ∀X ,Y ,Z ⊆ V
I is embedded faithfully in G, if the independencystatements entailed by G are exactly I:
IG(X ,Y |Z)⇔ I(X ,Y |Z) ∀X ,Y ,Z ⊆ V
Many independency relations w./o. faithful DAGrepresentation can be embedded faithfully.
But not every independency relation can be embeddedfaithfully.
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 26/31
Bayesian Networks / 1. PC Algorithm
Embedded Faithfulness / Example
Let I be the independency relation [Nea03, ex. 2.13,p. 103]
I(X, Y ), I(X, Y |Z)
Z
X YAssume, I can be embedded faithfully in a DAG G.
• As ¬I(X,Z), there must be chain X ∼ Z w./o.head-to-head meetings.• As ¬I(Z, Y ), there must be chain Y ∼ Z w./o.
head-to-head meetings.
Now, concatenate both chains X ∼ Z ∼ Y :
• eihter X ∼→ Z ←∼ Y , and then the chain is notblocked by Z, i.e., not I(X, Y |Z),• or not X ∼→ Z ←∼ Y , and then the chain is not
blocked by ∅, i.e., not I(X, Y ).
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 27/31
Bayesian Networks / 1. PC Algorithm
There are variants of the PC algorithm for finding faithfulembeddings of a given independency relation:
• Causal Inference (CI) and• Fast Causal Inference Algorithms (FCI; [SGS00])
See also [Nea03, ch. 10.2].
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 28/31
Bayesian Networks / 1. PC Algorithm
Outline of Part 1 : Learning Bayesian Networks
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 29/31
Bayesian Networks / 1. PC Algorithm
section key concepts
I. ProbabilisticIndependence andSeparation in Graphs
Prob. independence,separation in graphs,Bayesian Network
II. Learning Parameters Max. likelihoodestimation, max. aposterior estimation
III. Learning Parameterswith Missing Values
EM algo.
IV. Learning Structure byConstrained-basedLearning
Markov equivalence,graph patterns, PCalgo.
V. Learning Structure byLocal Search
Bayesian InformationCriterion (BIC), K2algo., GES algo.
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 30/31
Bayesian Networks / 1. PC Algorithm
References
[Nea03] Richard E. Neapolitan. Learning Bayesian Networks. Prentice Hall, 2003.
[SGS00] Peter Spirtes, Clark Glymour, and Richard Scheines. Causation, Prediction, andSearch. MIT Press, 2 edition, 2000.
[TASB03] I. Tsamardinos, C. F. Aliferis, A. Statnikov, and L. E. Brown. Scaling-up bayesiannetwork learning to thousands of variables using local learning technique. Technicalreport, DSL TR-03-02, March 12, 2003, Vanderbilt University, Nashville, TN, USA,2003.
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 31/31
top related