
Survey of contemporary Bayesian Network Structure Learning methods

Ligon Liu

September 2015

Ligon Liu (CUNY) Survey on Bayesian Network Structure Learning (slide 1) September 2015 1 / 38

Page 2: Survey of contemporary Bayesian Network Structure Learning ... · Survey of contemporary Bayesian Network Structure Learning methods Ligon Liu September 2015 Ligon Liu (CUNY) Survey

Bayesian Network

Definition. Let V be a set of variables. A Bayesian Network consists of a discrete structure part and a continuous parameter part:

Structural part: a Directed Acyclic Graph (V, E), with V the random variables and E ⊂ V × V.
Parameter part: the conditional probability distribution of every variable given its parents in the DAG.

Example. The Y-shaped Bayesian Network: V = {0, 1, 2, 3}, E = {(0, 2), (1, 2), (2, 3)}

[Figure: the Y-shaped DAG over nodes 0, 1, 2, 3]


Bayesian Network

Example. Conditional Probability Tables

P(x2 | x0, x1):

x0  x1  x2  Pr
0   0   1   0.1
0   0   2   0.25
0   0   3   0.65
0   1   1   0.2
0   1   2   0.35
0   1   3   0.45
1   0   2   0.5
1   0   3   0.5
1   2   1   0.1
1   2   2   0.4
1   2   3   0.5

P(x0):

x0  Pr
0   0.6
1   0.4

P(x1):

x1  Pr
0   0.3
1   0.4
2   0.3

P(x3 | x2):

x2  x3  Pr
1   0   0.2
1   1   0.8
2   0   0.5
2   1   0.5
3   1   1
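To make the factorization concrete, the sketch below evaluates the joint probability of the Y-shaped network as P(x0) P(x1) P(x2 | x0, x1) P(x3 | x2), using only the CPT rows listed above. It is a minimal illustration (Python is chosen here because the slides contain no code); the dictionary names are hypothetical, and parent configurations the slide does not list, such as (x0, x1) = (1, 1), are simply absent and would raise a KeyError.

```python
# CPTs copied from the slide; only the rows shown there are included.
P_X0 = {0: 0.6, 1: 0.4}
P_X1 = {0: 0.3, 1: 0.4, 2: 0.3}
P_X2 = {  # (x0, x1) -> {x2: Pr}
    (0, 0): {1: 0.1, 2: 0.25, 3: 0.65},
    (0, 1): {1: 0.2, 2: 0.35, 3: 0.45},
    (1, 0): {2: 0.5, 3: 0.5},
    (1, 2): {1: 0.1, 2: 0.4, 3: 0.5},
}
P_X3 = {  # x2 -> {x3: Pr}
    1: {0: 0.2, 1: 0.8},
    2: {0: 0.5, 1: 0.5},
    3: {1: 1.0},
}


def joint(x0, x1, x2, x3):
    """P(x0, x1, x2, x3) = P(x0) P(x1) P(x2 | x0, x1) P(x3 | x2) for the Y-shaped DAG."""
    return (P_X0[x0] * P_X1[x1]
            * P_X2[(x0, x1)].get(x2, 0.0)   # unlisted x2 values have probability 0
            * P_X3[x2].get(x3, 0.0))


print(joint(0, 0, 3, 1))  # 0.6 * 0.3 * 0.65 * 1.0
```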


BN Structure Learning Problem

Input: a counted, indexed relational dataset (V, R, c)

Scoring function s(V, R, c, E), abbreviated s(V, E). Usually s(V, E) is required to be decomposable:

$$ s(V, E) = \sum_{v \in V} S(v, \mathrm{Pa}(v)) $$

Goal: find the DAG (V, E) that maximizes s(V, E)
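For very small V the problem statement can be checked by exhaustive search: enumerate every parent-set assignment, keep the acyclic ones, and maximize the decomposable sum. The sketch below assumes a caller-supplied local score S(v, P); `toy_score` and its `GAIN` table are hypothetical numbers, chosen so that the unconstrained best parents would form a cycle, which shows why the acyclicity constraint matters. The search is exponential and is only meant as a reference point.

```python
from itertools import combinations, product


def is_acyclic(parents):
    """Repeatedly remove nodes whose remaining parents are all gone; a cycle blocks this."""
    remaining = set(parents)
    while remaining:
        roots = {v for v in remaining if not (parents[v] & remaining)}
        if not roots:
            return False  # every remaining node still has a remaining parent
        remaining -= roots
    return True


def brute_force_structure_search(nodes, local_score):
    """Maximize s(V, E) = sum_v S(v, Pa(v)) over all DAGs (only feasible for tiny V)."""
    choices = []
    for v in nodes:
        others = [u for u in nodes if u != v]
        choices.append([frozenset(c) for k in range(len(others) + 1)
                        for c in combinations(others, k)])
    best = None
    for pas in product(*choices):
        parents = dict(zip(nodes, pas))
        if is_acyclic(parents):
            score = sum(local_score(v, parents[v]) for v in nodes)
            if best is None or score > best[0]:
                best = (score, parents)
    return best


# Hypothetical toy score: gain GAIN[(u, v)] for each parent u of v, minus a size penalty.
GAIN = {(0, 1): 2.0, (0, 2): 1.0, (1, 0): 1.5, (1, 2): -1.0, (2, 0): 0.5, (2, 1): 3.0}


def toy_score(v, pa):
    return sum(GAIN[(u, v)] for u in pa) - 0.7 * len(pa) ** 2


print(brute_force_structure_search([0, 1, 2], toy_score))
```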


Clusters of surveyed articles

Conditional Independence (C.I.) constraint-based algorithms [16], [1], [11], [12], [20]

Ordering-based search [10], [19], [15]; Branch and bound [4], [14]; Parent Graph shortest path [22], [5, 7, 6]

Integer Linear Programming and LP-relaxation-based approximate algorithms [8], [9], [17, 18], [2, 3]


Conditional Independence and its testing

Definition. Let P be a distribution over a variable set V and X, Y, M ⊂ V. X and Y are said to be conditionally independent given M if

P(X, Y | M) = P(X | M) · P(Y | M)

Conditional independence statements can be tested on data or inferred from already known conditional independences.

Testing (for discrete variables), e.g.:

χ² test
G² test
Monte Carlo permutation test

Inferring, e.g. by the semi-graphoid rules:

(1) Symmetry: CI(A, B | C) ⇔ CI(B, A | C)
(2) Decomposition: CI(A, B ∪ D | C) ⇒ CI(A, B | C)
(3) Weak union: CI(A, B ∪ C | D) ⇒ CI(A, B | C ∪ D)
(4) Contraction: CI(A, B | C ∪ D) ∧ CI(A, C | D) ⇒ CI(A, B ∪ C | D)
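As an illustration of the testing side, here is a sketch of the G² statistic for x ⊥ y | M on discrete data, G² = 2 Σ N_xyz ln(N_xyz N_z / (N_xz N_yz)), with the simple degrees-of-freedom count (|X| − 1)(|Y| − 1) ∏|Z| (no correction for empty cells, which is an assumption of this sketch). Rows are tuples, variables are column indices, and the function name is hypothetical. Comparing G² with a χ² distribution gives the p-value, for example via `scipy.stats.chi2.sf(g2, df)` if SciPy is available.

```python
import math
from collections import Counter


def g2_statistic(data, x, y, cond=()):
    """Return (G2, df) for testing x ⊥ y | cond; rows are tuples, variables are column indices."""
    def z_of(row):
        return tuple(row[c] for c in cond)

    n_xyz = Counter((row[x], row[y], z_of(row)) for row in data)
    n_xz = Counter((row[x], z_of(row)) for row in data)
    n_yz = Counter((row[y], z_of(row)) for row in data)
    n_z = Counter(z_of(row) for row in data)
    # G2 = 2 * sum N_xyz * ln(N_xyz * N_z / (N_xz * N_yz)); only observed cells contribute.
    g2 = 2.0 * sum(n * math.log(n * n_z[zv] / (n_xz[(xv, zv)] * n_yz[(yv, zv)]))
                   for (xv, yv, zv), n in n_xyz.items())

    def levels(col):
        return len({row[col] for row in data})

    df = (levels(x) - 1) * (levels(y) - 1)
    for c in cond:
        df *= levels(c)
    return g2, df


print(g2_statistic([(0, 0), (1, 1)] * 5, 0, 1))  # strongly dependent columns
```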


Conditional Independence and Bayesian Network DAG

Notation. Let (V, E) be a Directed Acyclic Graph and v ∈ V. The parent set of v is denoted Pa(v), i.e.

Pa(v) = {u | (u, v) ∈ E}

Lemma (local Markov property). P(v | Nd(v)) = P(v | Pa(v)), where Nd(v) denotes the non-descendants of v.

Definition. Let (V, E) be a DAG, u, v ∈ V and M ⊆ V − {u, v}. Vertices u and v are said to be d-separated given M if every undirected path between u and v is blocked by M, i.e. the path contains either a non-collider w ∈ M (a chain a → w → b or a fork a ← w → b), or a collider a → w ← b such that neither w nor any descendant of w is in M.

On a Bayesian Network DAG, u and v being d-separated by M implies u ⊥ v | M in the distribution represented by the BN.
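One standard way to decide d-separation, equivalent to the path-blocking definition, is the moral ancestral graph criterion: restrict the DAG to the ancestors of {u, v} ∪ M, moralize that subgraph, delete M, and check whether u and v are still connected. The sketch below assumes u, v ∉ M and edges given as (parent, child) pairs; the names are illustrative.

```python
def d_separated(edges, u, v, given):
    """True if u and v are d-separated by `given` in the DAG with (parent, child) edges."""
    given = set(given)
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)

    # 1. Ancestral set of {u, v} ∪ given (each node counts as its own ancestor).
    ancestors, stack = set(), [u, v, *given]
    while stack:
        node = stack.pop()
        if node not in ancestors:
            ancestors.add(node)
            stack.extend(parents.get(node, ()))

    # 2. Moralize the induced subgraph: undirected parent-child links plus married parents.
    adj = {node: set() for node in ancestors}
    for child in ancestors:
        pa = sorted(parents.get(child, ()))
        for i, p in enumerate(pa):
            adj[p].add(child)
            adj[child].add(p)
            for q in pa[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)

    # 3. Delete `given` and test whether u can still reach v.
    seen, stack = {u}, [u]
    while stack:
        node = stack.pop()
        if node == v:
            return False
        for nxt in adj[node]:
            if nxt not in given and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return True


Y_EDGES = [(0, 2), (1, 2), (2, 3)]  # the Y-shaped example network
print(d_separated(Y_EDGES, 0, 1, set()), d_separated(Y_EDGES, 0, 1, {3}))
```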


Early Conditional Independence-based algorithms

C.I.-based algorithms rely on the following facts:

d-separation in the Bayesian Network DAG ⇐⇒ conditional independence (assuming the distribution is faithful to the DAG)

The absence of an edge between u and v can be inferred from a single conditional independence u ⊥ v | M. Testing more C.I. triples (u, v, M) may increase confidence.

C.I. tests are computationally expensive to perform on datasets, so the number of tests should be minimized.


Early Conditional Independence-based algorithms

SGS (the first C.I.-based algorithm for learning BNs)

PC (PC*, Stable-PC and Conservative-PC)

Grow-Shrink, IAMB, SRS


PC algorithm in brief

PC is an iterative algorithm that learns a Bayesian Network from C.I. tests, maintaining the edge set E as its state:

1. Start with the complete undirected graph.
2. Edge elimination: for each pair of adjacent variables u, v, run C.I. tests with conditioning sets of increasing size, starting with size 0 (unconditional, | Ø), then size 1 (| {i}, | {j}, ...), then size 2 (| {i, j}, | {i, k}, | {j, k}, ...), and so on up to larger conditioning sets (at most | V − {u, v}), until a conditional independence u ⊥ v | M is found. Remove the edge between any two variables that are conditionally independent given some conditioning set. In PC the conditioning sets are drawn from the current neighbours of u or v; PC* further restricts them to variables on paths between the pair.
3. Orient the edges with the "unshielded collider rule" and the "loop removal rule".
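The edge-elimination phase can be sketched as follows. The C.I. test is passed in as a function; here it is a hypothetical oracle that answers exactly the independences of the Y-shaped example network (0 ⊥ 1 | Ø, 0 ⊥ 3 | {2} or {1, 2}, 1 ⊥ 3 | {2} or {0, 2}), so the sketch recovers that network's skeleton and separating sets. Conditioning sets are drawn from the current neighbours of u, as in basic PC; with a real statistical test the result also depends on sample size and testing order.

```python
from itertools import combinations


def pc_skeleton(nodes, indep):
    """Edge-elimination phase of PC; indep(u, v, cond) -> bool is a C.I. test or oracle."""
    adj = {v: set(nodes) - {v} for v in nodes}
    sepset = {}
    size = 0
    # Keep going while some node still has enough neighbours to form a conditioning set.
    while any(len(adj[u]) - 1 >= size for u in nodes):
        for u in nodes:
            for v in sorted(adj[u]):
                for cond in combinations(sorted(adj[u] - {v}), size):
                    if indep(u, v, set(cond)):
                        adj[u].discard(v)
                        adj[v].discard(u)
                        sepset[frozenset((u, v))] = set(cond)
                        break
        size += 1
    edges = {frozenset((u, v)) for u in nodes for v in adj[u]}
    return edges, sepset


# Hypothetical C.I. oracle for the Y-shaped network 0 -> 2 <- 1, 2 -> 3.
Y_INDEPENDENCES = {(0, 1): [set()], (0, 3): [{2}, {1, 2}], (1, 3): [{2}, {0, 2}]}


def y_oracle(x, y, cond):
    return set(cond) in Y_INDEPENDENCES.get((min(x, y), max(x, y)), [])


print(pc_skeleton([0, 1, 2, 3], y_oracle))
```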


Edge direction rules

Unshielded Collider Rule: if two variables u, w are not directly connected but are connected as u → v − w, orient v − w as v → w to avoid forming the unshielded collider u → v ← w.

Loop Removal Rule: if two variables u and v are connected both by an undirected edge and by a directed path from u to v, orient the undirected edge as u → v (to avoid a directed cycle).
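A sketch of applying the two rules repeatedly until no edge changes. Directed edges are (a, b) pairs meaning a → b; undirected edges are unordered pairs. This is a simplified illustration of only the two rules above (the full PC orientation phase first identifies unshielded colliders from the separating sets); the function name is hypothetical.

```python
def orient_edges(directed, undirected):
    """Apply the unshielded-collider and loop-removal rules until nothing changes."""
    directed = set(directed)                       # (a, b) means a -> b
    undirected = {frozenset(e) for e in undirected}

    def adjacent(a, b):
        return (a, b) in directed or (b, a) in directed or frozenset((a, b)) in undirected

    def has_directed_path(a, b):
        stack, seen = [a], {a}
        while stack:
            node = stack.pop()
            if node == b:
                return True
            for x, y in directed:
                if x == node and y not in seen:
                    seen.add(y)
                    stack.append(y)
        return False

    changed = True
    while changed:
        changed = False
        for edge in list(undirected):
            v, w = tuple(edge)
            for a, b in ((v, w), (w, v)):
                # Unshielded collider rule: x -> a - b with x, b non-adjacent  =>  a -> b
                collider_rule = any(y == a and x != b and not adjacent(x, b)
                                    for x, y in directed)
                # Loop removal rule: directed path a ~> b plus a - b  =>  a -> b
                if collider_rule or has_directed_path(a, b):
                    undirected.discard(edge)
                    directed.add((a, b))
                    changed = True
                    break
    return directed, undirected


print(orient_edges({(0, 2), (1, 2)}, {(2, 3)}))
```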


Advantages of PC-based algorithms

1. Speed: on sparse graphs (bounded node degree), PC runs in polynomial time.

2. Compared to SGS, propagating C.I. constraints with the semi-graphoid rules saves many C.I. tests. In addition, if a parallel machine is available, redundant C.I. tests can be run to improve confidence [1].

3. By testing smaller conditioning sets M first, the C.I. tests have higher confidence on high-dimensional datasets.


Robustness of PC-based algorithms

The robustness of C.I.-based algorithms has been questioned by researchers [11], [1]. Factors that undermine the robustness of PC on high-dimensional datasets:

Sampling loss: when the marginal dataset is small relative to the graph complexity, local C.I. tests are usually less accurate; this is also called non-faithfulness of the C.I. relations to the distribution.

C.I. testing order: when an earlier independence test happens to have lower confidence, its result can prevent later tests from producing higher-confidence, contradictory C.I. results.

Two algorithms, Conservative-PC [11] and Stable-PC [1], were proposed to overcome the instability with respect to C.I. testing order. They use redundant C.I. tests to detect unfaithfulness and a voting mechanism to find the most likely C.I.


Markov blanket

Definition. Let (V, E) be a DAG. The Markov Blanket of v ∈ V, denoted MB(v), is the set of nodes consisting of v's parents, v's children, and the other parents of v's children in the DAG.

Theorem. v is d-separated from V − {v} − MB(v) by MB(v).

Definition. Let (V, E) be a DAG. The Moral Graph (V, F) of (V, E) is formed by connecting nodes that have a common child and then making all edges in the graph undirected, i.e.

F = {{u, v} | (u, v) ∈ E or (v, u) ∈ E or ∃w : (u, w) ∈ E and (v, w) ∈ E}


Markov blanket

Corollary. Let (V, E) be a DAG and (V, F) its Moral Graph. The Markov Blanket of v ∈ V is the set of neighbors of v in (V, F).

[Figure: a DAG, its Moral Graph, and the Markov Blanket of node E]
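Following the Moral Graph definition and the corollary, a Markov Blanket can be read off the moralized DAG. A sketch with edges as (parent, child) pairs and illustrative names; on the Y-shaped example, MB(0) = {1, 2}, because 1 is the other parent of 0's child 2.

```python
from itertools import combinations


def moral_graph(nodes, edges):
    """Adjacency sets of the moral graph of a DAG given as (parent, child) pairs."""
    parents = {v: set() for v in nodes}
    for u, v in edges:
        parents[v].add(u)
    adj = {v: set() for v in nodes}
    for child, pa in parents.items():
        for p in pa:                              # drop edge directions
            adj[p].add(child)
            adj[child].add(p)
        for p, q in combinations(sorted(pa), 2):  # marry parents of a common child
            adj[p].add(q)
            adj[q].add(p)
    return adj


def markov_blanket(nodes, edges, v):
    """MB(v) = neighbours of v in the moral graph."""
    return moral_graph(nodes, edges)[v]


print(markov_blanket([0, 1, 2, 3], [(0, 2), (1, 2), (2, 3)], 0))
```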


Grow-Shrink and IAMB algorithms

Definition. Let (V, E) be a directed graph. The Markov Blanket of v ∈ V is the set of nodes consisting of v's parents, children, and children's other parents. Equivalently, a Markov Blanket M is a minimal subset of V − {v} that satisfies:

∀U ⊆ V − {v} − M : v ⊥ U | M

Consequently, finding every variable's Markov Blanket is equivalent to finding the DAG's Moral Graph.

Grow-Shrink algorithm

IAMB algorithm: Grow-Shrink with a greedy ordering of the conditioning sets
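A sketch of the Grow-Shrink idea: grow a candidate blanket by adding any variable that is still dependent on the target given the current candidate, then shrink it by removing variables that turn out independent given the rest. The C.I. test is a hypothetical oracle for the Y-shaped example network, so the sketch returns exact blankets; an IAMB-style variant would instead add, at each grow step, the candidate with the strongest association.

```python
def grow_shrink(target, variables, indep):
    """Estimate MB(target); indep(x, y, cond) -> bool is a C.I. test or oracle."""
    mb = set()
    # Grow: add any variable still dependent on the target given the current candidate.
    changed = True
    while changed:
        changed = False
        for y in variables:
            if y != target and y not in mb and not indep(target, y, set(mb)):
                mb.add(y)
                changed = True
    # Shrink: remove false positives that are independent given the rest of the candidate.
    for y in sorted(mb):
        if indep(target, y, mb - {y}):
            mb.discard(y)
    return mb


# Hypothetical C.I. oracle for the Y-shaped network 0 -> 2 <- 1, 2 -> 3.
Y_INDEPENDENCES = {(0, 1): [set()], (0, 3): [{2}, {1, 2}], (1, 3): [{2}, {0, 2}]}


def y_oracle(x, y, cond):
    return set(cond) in Y_INDEPENDENCES.get((min(x, y), max(x, y)), [])


print({v: grow_shrink(v, [0, 1, 2, 3], y_oracle) for v in range(4)})
```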


Multiple Markov Blankets

Since the Bayesian Network decomposition of a data distribution is usually not unique, one variable may have multiple different Markov Blankets.

For example, both M = M1 and M = M2 may satisfy:

∀U ⊆ V − {v} − M : v ⊥ U | M

Definition. Let (V, E) be a directed graph. A variable u ∈ V is called Strongly Relevant to v ∈ V if and only if

∀S ⊆ V − {v, u} : P(v | S) ≠ P(v | S ∪ {u})

A variable u ∈ V is called Weakly Relevant to v ∈ V if and only if

∃S ⊆ V − {v, u} : P(v | S) ≠ P(v | S ∪ {u})

and u is not Strongly Relevant to v.


"Selection via Representative Sets" (SRS) Algorithm

Definition. Let V be the variable set. A representative set of v ∈ V consists of a variable u in v's Markov blanket together with u's correlated features.

Proposition. u is strongly relevant to v if and only if u belongs to the set of parents and children of v in a faithful Bayesian Network.

SRS Algorithm

Step 1: G_v ← Get-PC(v) (here PC means Parents & Children)
for u in G_v:
    G_u ← {u} ∪ Get-PC(u)

Step 2: Search for a group of Parent-Child sets of strongly relevant variables, {G_i} ⊆ {G_u | u ∈ SR(v)}, such that ∪_i G_i is the best Markov Blanket under the given measure.


Decomposable Scoring Function

Let V be the variables and s(V, E): P(V × V) → ℝ a scoring function.

s is called decomposable if and only if

$$ s(V, E) = \sum_{v \in V} S(v, \mathrm{Pa}(v)), \quad \text{where } \mathrm{Pa}(v) = \{u \mid (u, v) \in E\} $$

Commonly used decomposable scoring functions: log-likelihood-based (AIC, BIC) and Bayesian Dirichlet (BDe, BDeu)

Define BN learning as finding the edge set E for V that maximizes s(V, E)
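A sketch of one decomposable score, BIC, written as a family score S(v, Pa) = Σ_{j,k} N_vjk log(N_vjk / N_vj) − (log N / 2)(r_v − 1) q_v, where r_v is the number of states of v and q_v the number of parent configurations. Here r_v and q_v are counted from the values observed in the data, which is an assumption of this sketch; the function names are hypothetical. The total score is just the sum of family scores, which is exactly what decomposability provides.

```python
import math
from collections import Counter


def bic_local_score(data, v, parents):
    """BIC family score S(v, Pa); rows are tuples, v and parents are column indices."""
    parents = tuple(parents)

    def config(row):
        return tuple(row[p] for p in parents)

    n_jk = Counter((config(row), row[v]) for row in data)   # N_vjk
    n_j = Counter(config(row) for row in data)              # N_vj
    loglik = sum(c * math.log(c / n_j[j]) for (j, _), c in n_jk.items())
    r_v = len({row[v] for row in data})                     # observed states of v
    q_v = 1
    for p in parents:
        q_v *= len({row[p] for row in data})                # observed parent configurations
    return loglik - 0.5 * math.log(len(data)) * (r_v - 1) * q_v


def bic_score(data, parent_sets):
    """Decomposable total s(V, E) = sum_v S(v, Pa(v)); parent_sets maps v -> Pa(v)."""
    return sum(bic_local_score(data, v, pa) for v, pa in parent_sets.items())


independent = [(a, b) for a in (0, 1) for b in (0, 1)] * 5
print(bic_local_score(independent, 0, ()), bic_local_score(independent, 0, (1,)))
```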


Ordering Search

Given an order O of the variables and a decomposable scoring function s, the best DAG consistent with O can be found simply by choosing, for each variable from sink to source, the best parent set among the variables that precede it in O [15]. With a bound on the number of parents per node, this takes polynomial time in the number of variables.

Modern ordering search algorithms propagate constraints inferred from the scoring function's properties and from background knowledge to reduce the search space.

Algorithms: branch-and-bound search, A∗ heuristic search
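Given an order, each variable's parents can be chosen independently among its predecessors. The sketch below caps the parent-set size at a hypothetical `max_parents` = k, so each variable examines O(n^k) candidate sets; the `BONUS` score table is made up for illustration.

```python
from itertools import combinations


def best_dag_for_order(order, local_score, max_parents):
    """For a fixed order, pick each variable's best parents among its predecessors."""
    parents = {}
    for i, v in enumerate(order):
        preds = order[:i]
        candidates = [frozenset(c)
                      for k in range(min(max_parents, len(preds)) + 1)
                      for c in combinations(preds, k)]
        parents[v] = max(candidates, key=lambda pa: local_score(v, pa))
    return parents


# Hypothetical local score: -|Pa| unless the (variable, parent set) pair has a bonus.
BONUS = {(2, frozenset({0, 1})): 5.0}


def toy_score(v, pa):
    return BONUS.get((v, pa), -len(pa))


print(best_dag_for_order([0, 1, 2], toy_score, max_parents=2))
```

With `max_parents=1` the bonus set {0, 1} is no longer reachable, so variable 2 falls back to the empty parent set.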


Dynamic Programming of Parent Sets

Lemma [13]. Let v ∈ V, Q ⊆ V, v ∉ Q. Then

$$ \max_{P \subseteq Q} S(v, P) = \max\Big( S(v, Q),\ \max_{u \in Q}\ \max_{P \subseteq Q - \{u\}} S(v, P) \Big) $$

This enables dynamic programming to propagate argmax_{P ⊆ Q} S(v, P) over all subsets Q of V − {v}. The dynamic programming algorithms surveyed here all use this step to obtain the optimal parent set of every variable.
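The recursion of the lemma can be run over bitmasks: processing subsets in increasing numeric order guarantees that every Q − {u} is computed before Q. A sketch with a hypothetical toy local score (`GAIN` is made-up data):

```python
GAIN = {(0, 1): 2.0, (0, 2): 1.0, (1, 0): 1.5, (1, 2): -1.0, (2, 0): 0.5, (2, 1): 3.0}


def toy_score(v, pa):
    """Hypothetical local score: pairwise gains u -> v minus a quadratic size penalty."""
    return sum(GAIN[(u, v)] for u in pa) - 0.7 * len(pa) ** 2


def best_parent_sets(v, candidates, local_score):
    """For every Q ⊆ candidates, return (max_{P ⊆ Q} S(v, P), argmax) via the lemma."""
    n = len(candidates)
    table = {}
    for mask in range(1 << n):  # Q - {u} always has a smaller mask, so it is already done
        q = frozenset(candidates[i] for i in range(n) if mask >> i & 1)
        best = (local_score(v, q), q)          # option 1: P = Q itself
        for i in range(n):
            if mask >> i & 1:                  # option 2: best P inside Q - {u}
                sub = table[mask & ~(1 << i)]
                if sub[0] > best[0]:
                    best = sub
        table[mask] = best
    return {frozenset(candidates[i] for i in range(n) if m >> i & 1): b
            for m, b in table.items()}


print(best_parent_sets(0, [1, 2], toy_score)[frozenset({1, 2})])
```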


The 2006 ordering search algorithm [15]

Variable set: V = {1, . . . , N}.

Parent candidate set of variable v: Pa(v) ⊆ V − {v}.

1. Calculate the local scores for all N · 2^{N−1} different (v, Pa(v)) pairs: [S(v, Pa(v)) | v ∈ V, Pa(v) ⊆ V − {v}].

2. For every v = 1, . . . , N and every candidate set G ⊆ V − {v}, find the best parent set P_α(v, G) ⊆ G by comparing G itself with the best parent sets of the subsets of G that are smaller by one:

$$ P_\alpha(v, G) = \arg\max_{P \in \{G\} \cup \{P_\alpha(v,\, G - \{u\}) \,:\, u \in G\}} S(v, P) $$

3. Find the best sink of each of the 2^N variable sets W ⊆ V:

$$ \mathrm{sink}(W) = \arg\max_{s \in W} \mathrm{skore}(W, s) $$

where skore(W, s) is the sink score of s in W.

4. Using the best sinks, find a best ordering of the variables, from the last position backwards:

$$ O_i = \mathrm{sink}\Big(V - \bigcup_{j=i+1}^{N} \{O_j\}\Big) $$

5. Compute the best network using the best parents, best sinks and best ordering above.
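Putting the five steps together gives a compact exact solver for small N. This is a sketch in the spirit of [15], not the authors' implementation: subsets are bitmasks, `local_score` is supplied by the caller, and the hypothetical toy score has a cyclic unconstrained optimum, so the sink recursion has real work to do. Time and memory grow as 2^N, so it is only meant for a handful of variables.

```python
def silander_myllymaki(n, local_score):
    """Exact DP over subsets for variables 0..n-1; returns (score, order, parents)."""
    full = (1 << n) - 1

    def members(mask):
        return frozenset(i for i in range(n) if mask >> i & 1)

    # Steps 1-2: bps[v][g] = (best score, parent mask) with parents drawn from G (v not in G).
    bps = []
    for v in range(n):
        table = {}
        for g in range(full + 1):
            if g >> v & 1:
                continue
            best = (local_score(v, members(g)), g)       # P = G itself
            for u in range(n):
                if g >> u & 1:                           # best P inside G - {u}
                    sub = table[g & ~(1 << u)]
                    if sub[0] > best[0]:
                        best = sub
            table[g] = best
        bps.append(table)

    # Step 3: net[w] = best network score over W; sink[w] = its best sink.
    net, sink = {0: 0.0}, {}
    for w in range(1, full + 1):
        for k in range(n):
            if w >> k & 1:
                rest = w & ~(1 << k)
                cand = bps[k][rest][0] + net[rest]       # sink score of k in W
                if w not in net or cand > net[w]:
                    net[w], sink[w] = cand, k

    # Steps 4-5: peel sinks off V to obtain the ordering and each variable's parents.
    order, parents, w = [], {}, full
    while w:
        k = sink[w]
        w &= ~(1 << k)
        order.insert(0, k)
        parents[k] = members(bps[k][w][1])
    return net[full], order, parents


GAIN = {(0, 1): 2.0, (0, 2): 1.0, (1, 0): 1.5, (1, 2): -1.0, (2, 0): 0.5, (2, 1): 3.0}


def toy_score(v, pa):
    """Hypothetical local score: pairwise gains u -> v minus a quadratic size penalty."""
    return sum(GAIN[(u, v)] for u in pa) - 0.7 * len(pa) ** 2


print(silander_myllymaki(3, toy_score))
```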


Ordering by Sink Score

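Lemma [15]. Let W ⊆ V. Then k is the last variable (called the sink) in an optimal order of W if and only if

$$ k = \arg\max_{k' \in W} \Big( \max_{P \subseteq W - \{k'\}} S(k', P) + S(W - \{k'\}) \Big) $$

where S(W − {k}) denotes the score of the best network over W − {k}. This enables dynamic programming over subsets to compute optimal sinks. The quantity max_{P ⊆ W − {k}} S(k, P) + S(W − {k}) is called the sink score.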


Example of Optimal Parents and Optimal Sink

Add graphic example


Optimization techniques

AD-Tree [19]

If U′ ⊂ U and S(v, U′) ≥ S(v, U), remove U from the parent-set candidates of v [19]

Partition parent sets by size: reduces space to $2^n (3/4)^p\, n^{O(1)}$, where p is the degree of parents [14]
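The dominance rule above can be applied per variable once its local scores are computed. A sketch, assuming the scores of one variable are given as a dict from parent set to score (the function name and the `SCORES` data are hypothetical); comparing only against surviving sets is enough because dominance is transitive: if U″ ⊂ U′ prunes U′, then U″ also prunes every superset U that U′ would have pruned.

```python
def prune_dominated_parent_sets(scores):
    """Drop every parent set U for which some U' ⊂ U has S(v, U') >= S(v, U)."""
    kept = {}
    for pa, s in sorted(scores.items(), key=lambda item: len(item[0])):
        # Subsets are visited first; checking surviving subsets suffices by transitivity.
        if not any(sub < pa and sub_s >= s for sub, sub_s in kept.items()):
            kept[pa] = s
    return kept


SCORES = {frozenset(): -10.0, frozenset({"a"}): -8.0,
          frozenset({"b"}): -12.0, frozenset({"a", "b"}): -9.0}
print(prune_dominated_parent_sets(SCORES))
```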


Structural constraints

Optimal DAGs under common scoring functions (MDL, BDeu) share structural constraints [4] that can be used for pruning:

Hard limits on in-degree

Corollary. Using BIC or AIC as the criterion, the optimal graph (V, E) has at most ⌈log₂ N⌉ parents per node.

Upper bounds on the optimal parent-set score: various heuristics

Upper bounds on the optimal parent set


Upper bound of optimal parent set [4]

Theorem. Let N be the total count of (V, R, c) and $|L_U| = \prod_{u \in U} |L_u|$. With BIC or AIC as the scoring function, if

$$ \left| L_{\mathrm{Pa}(v)} \right| > \frac{N}{w} \cdot \frac{\log |L_v|}{|L_v| - 1} $$

then no proper superset of Pa(v) is the parent set of vertex v in an optimal structure.

Theorem. Given a BD score and two parent sets Pa′(v) and Pa(v) for a node v such that Pa′(v) ⊂ Pa(v), let $K_{vj} = \left| L_{\mathrm{Pa}(v) \mid p_j} \right|$. If

$$ S(v, \mathrm{Pa}'(v)) > \sum_{j \ge 1,\; K_{vj} > 2} f\big(K_{vj}, (\alpha_{vjk})_{\forall k}\big) + \sum_{j \ge 1,\; K_{vj} = 1} \log \frac{\alpha_{vjk}}{\alpha_{vj}} $$

then Pa(v) is not an optimal parent set of v.


Upper bound of optimal parent set score [22, 21]

Theorem. Given a BD score S and two parent sets Pa′(v) and Pa(v) for a node v such that Pa′(v) ⊂ Pa(v), let $K_{vj} = \left| L_{\mathrm{Pa}(v) \mid p_j} \right|$. If

$$ S(v, \mathrm{Pa}'(v)) > \sum_{j \ge 1,\; K_{vj} > 2} f\big(K_{vj}, (\alpha_{vjk})_{\forall k}\big) + \sum_{j \ge 1,\; K_{vj} = 1} \log \frac{\alpha_{vjk}}{\alpha_{vj}} $$

then Pa(v) is not an optimal parent set of v.


Order Graph

Definition. Let V = {1, . . . , n} be the index set of variables. The order graph (𝒱, ℰ) is the graph whose vertex set 𝒱 is the powerset of V and whose edge set is ℰ = {(X, Y) | X, Y ∈ P(V), X ⊂ Y, |X| + 1 = |Y|}.

Every order graph is a DAG, since every edge goes from a set to a strictly larger set.

Example. Order graph of V = {1, 2, 3, 4}


Shortest Path formulation of the Optimal Parent Set Problem

Let S(v, Pa(v)) be the scoring-function term for v ∈ V and its parents, treated here as a cost to be minimized (e.g. a negated score). Finding the optimal BN is then equivalent to finding the shortest path on the Order Graph from Ø to V, if the length of edge (X, Y) is defined as:

$$ d(X, Y) = \min_{Pa \subseteq X} S(Y - X, Pa) $$

Advantages:

Shortest paths on directed graphs have well-studied algorithms (Dijkstra, BFBnB, A∗, etc.)

The whole graph does not need to be generated in advance; vertices and edges can be computed on the fly.
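Because every path from Ø to V in the order graph has exactly |V| edges, one per variable, the cheapest such path spells out an optimal network. The sketch below runs Dijkstra's algorithm with the edge length d(X, Y) defined above, generating edges on the fly. It assumes non-negative costs, as Dijkstra requires; the hypothetical `toy_cost` is 5 minus a pairwise-gain score, which stays non-negative for every parent set.

```python
import heapq
from itertools import combinations

GAIN = {(0, 1): 2.0, (0, 2): 1.0, (1, 0): 1.5, (1, 2): -1.0, (2, 0): 0.5, (2, 1): 3.0}


def toy_cost(v, pa):
    """Hypothetical non-negative local cost: 5 minus a pairwise-gain score."""
    return 5.0 - (sum(GAIN[(u, v)] for u in pa) - 0.7 * len(pa) ** 2)


def order_graph_shortest_path(n, cost):
    """Dijkstra from Ø to V on the order graph of variables 0..n-1; returns (cost, order)."""
    def edge_length(x, v):  # d(X, X ∪ {v}) = min over Pa ⊆ X of cost(v, Pa)
        return min(cost(v, frozenset(c))
                   for k in range(len(x) + 1) for c in combinations(sorted(x), k))

    start, goal = frozenset(), frozenset(range(n))
    dist, prev, done = {start: 0.0}, {}, set()
    heap, tie = [(0.0, 0, start)], 1  # `tie` keeps heap entries comparable
    while heap:
        g, _, x = heapq.heappop(heap)
        if x in done:
            continue
        done.add(x)
        if x == goal:
            break
        for v in range(n):
            if v not in x:
                y = x | {v}
                g_new = g + edge_length(x, v)
                if g_new < dist.get(y, float("inf")):
                    dist[y], prev[y] = g_new, (x, v)
                    heapq.heappush(heap, (g_new, tie, y))
                    tie += 1
    order, node = [], goal
    while node != start:  # walk back along the stored predecessors
        node, v = prev[node]
        order.append(v)
    return dist[goal], order[::-1]


print(order_graph_shortest_path(3, toy_cost))
```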


Shortest Path Example

Add an example of shortest path on order graph ⇐⇒ optimal parent set


A∗ best-first search algorithm

A heuristic-enhanced variant of Dijkstra's algorithm that uses a "priority function" to decide the next step of the search when finding the shortest path from vertex x to y on (V, E), where the length d(u, v) of each edge (u, v) ∈ E can be computed at a fixed time cost.

The "priority function" on vertex v ∈ V:

f(v) = d(x, v) + h(v, y)

d(x, v): the already computed distance from x to v

h(v, y): the heuristically estimated distance from v to y
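A generic sketch of A∗ with the priority f(v) = d(x, v) + h(v, y). It keeps a closed set, which is only safe when h is monotonic (consistent), i.e. h(a) ≤ d(a, b) + h(b) for every edge (a, b); the graph and heuristic values in the example are made up to satisfy that condition.

```python
import heapq


def a_star(graph, start, goal, h):
    """graph: {node: [(neighbor, length), ...]}; h(node) estimates the distance to goal.

    Returns (distance, path), or None if goal is unreachable."""
    g = {start: 0.0}
    prev = {}
    frontier = [(h(start), start)]
    closed = set()
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == goal:
            path = [node]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return g[goal], path[::-1]
        if node in closed:
            continue
        closed.add(node)
        for nxt, length in graph.get(node, []):
            tentative = g[node] + length
            if tentative < g.get(nxt, float("inf")):
                g[nxt] = tentative
                prev[nxt] = node
                heapq.heappush(frontier, (tentative + h(nxt), nxt))  # f = g + h
    return None


GRAPH = {"s": [("a", 1), ("b", 4)], "a": [("b", 2), ("t", 6)], "b": [("t", 1)]}
H = {"s": 3, "a": 3, "b": 1, "t": 0}  # made-up, consistent estimates
print(a_star(GRAPH, "s", "t", H.get))
```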


One A∗ heuristic function for d(X, Y) = min_{Pa ⊆ X} S(Y − X, Pa)

Definition. Let (V, E) be the order graph of the vertex set V and U ⊆ V. The heuristic function used in [22], denoted h(U), is defined by

$$ h(U) = \sum_{v \in V - U} \min_{Pa \subseteq V - \{v\}} S(\{v\}, Pa) $$

Remark: h(U) uses the best parent set of each vertex in V − U, regardless of whether the resulting graph is a DAG (the acyclicity constraint is relaxed).

Theorem [22]. h(U) is monotonic (consistent), and therefore admissible.
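The heuristic can be sketched and checked numerically. For every order-graph edge X → X ∪ {v}, h(X) − h(X ∪ {v}) is the unconstrained best cost of v, which can never exceed the constrained edge length d(X, X ∪ {v}); that is exactly the monotonicity the theorem states. The helper names and the cost table are hypothetical.

```python
from itertools import combinations

GAIN = {(0, 1): 2.0, (0, 2): 1.0, (1, 0): 1.5, (1, 2): -1.0, (2, 0): 0.5, (2, 1): 3.0}


def toy_cost(v, pa):
    """Hypothetical non-negative local cost (5 minus a pairwise-gain score)."""
    return 5.0 - (sum(GAIN[(u, v)] for u in pa) - 0.7 * len(pa) ** 2)


def subsets(items):
    items = sorted(items)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]


def edge_length(cost, x, v):
    """d(X, X ∪ {v}) = min over Pa ⊆ X of cost(v, Pa)."""
    return min(cost(v, pa) for pa in subsets(x))


def make_heuristic(n, cost):
    """h(U) = sum over v in V - U of the best cost of v with any parents from V - {v}."""
    everything = frozenset(range(n))
    relaxed_best = {v: min(cost(v, pa) for pa in subsets(everything - {v}))
                    for v in everything}

    def h(u):
        return sum(relaxed_best[v] for v in everything - frozenset(u))
    return h


h = make_heuristic(3, toy_cost)
print(h(frozenset()))  # a lower bound on the optimal total cost
```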


Integer-valued multisets (imsets)

Definition. Let V be a set of integers. An integer-valued multiset (imset) is a mapping from P(V) to the set of integers ℤ.

Example. Let a, b, c be integers. An example imset u with V = {a, b, c}:

$$ u = \delta_{\{b\}} - \delta_{\{a,b\}} - \delta_{\{b,c\}} + \delta_{\{a,b,c\}} $$

δ_U: the Kronecker delta imset, defined on the following slide.


Arithmetic notation of imsets

Definition. Let V be a set of integers and U ⊆ V. The U Kronecker delta imset, denoted δ_U, is defined by

$$ \delta_U(X) = \begin{cases} 1 & X = U \\ 0 & X \neq U \end{cases} $$

Definition. Let V be a set of integers and let a, b: P(V) → ℤ be imsets. Then

(a + b)(X) = a(X) + b(X)

Subtraction is defined analogously: (a − b)(X) = a(X) − b(X).


DAG to {0,1} to [0,1]

Family Variable Vector

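$$ \phi_{vU} = \begin{cases} 1 & \text{if } \mathrm{Pa}(v) = U \\ 0 & \text{otherwise} \end{cases} $$

Standard Imset

$$ u_{(V,E)} = \delta_V - \delta_\emptyset + \sum_{v \in V} \big( \delta_{\mathrm{Pa}(v)} - \delta_{\{v\} \cup \mathrm{Pa}(v)} \big) $$

Characteristic Imset

$$ c_{(V,E)}(U) = 1 - \sum_{W \subseteq V,\; U \subseteq W} u_{(V,E)}(W) $$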
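A sketch of the standard and characteristic imsets of a DAG, with imsets stored as `collections.Counter` objects keyed by frozensets (a representation chosen here for illustration). Counter's binary `+` and `-` operators discard non-positive entries, so the sketch updates counts in place to keep the negative values that imsets need. The example DAG is the Y-shaped network.

```python
from collections import Counter
from itertools import combinations


def standard_imset(nodes, parents):
    """u = δ_V − δ_∅ + Σ_v (δ_Pa(v) − δ_{v}∪Pa(v)), as a Counter {frozenset: int}."""
    u = Counter()
    u[frozenset(nodes)] += 1
    u[frozenset()] -= 1
    for v in nodes:
        pa = frozenset(parents.get(v, ()))
        u[pa] += 1          # in-place updates keep negative entries
        u[pa | {v}] -= 1
    return u


def characteristic_imset(nodes, u):
    """c(U) = 1 − Σ_{W ⊇ U} u(W) for every U ⊆ V."""
    all_subsets = [frozenset(c) for k in range(len(nodes) + 1)
                   for c in combinations(nodes, k)]
    return {s: 1 - sum(val for w, val in u.items() if s <= w) for s in all_subsets}


Y_NODES = [0, 1, 2, 3]
Y_PARENTS = {2: {0, 1}, 3: {2}}  # the Y-shaped example DAG
c = characteristic_imset(Y_NODES, standard_imset(Y_NODES, Y_PARENTS))
print(c[frozenset({0, 2})], c[frozenset({0, 1})])  # prints: 1 0
```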


Linear Program of Family Variable Vector

Family Variable Vector

$$ \phi_{vU} = \begin{cases} 1 & \text{if } \mathrm{Pa}(v) = U \\ 0 & \text{otherwise} \end{cases} $$
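The point of the family variable vector is that a decomposable score becomes linear in φ: s(V, E) = Σ_v Σ_U φ_vU S(v, U). A sketch that builds φ for a DAG and evaluates that linear objective; each variable has exactly one U with φ_vU = 1, which is one of the constraints an ILP formulation imposes (the acyclicity constraints are not shown). Names and the toy score are hypothetical.

```python
from itertools import combinations


def family_vector(nodes, parents):
    """phi[(v, U)] = 1 if Pa(v) = U else 0, for every U ⊆ V − {v}."""
    phi = {}
    for v in nodes:
        others = [u for u in nodes if u != v]
        actual = frozenset(parents.get(v, ()))
        for k in range(len(others) + 1):
            for c in combinations(others, k):
                phi[(v, frozenset(c))] = int(frozenset(c) == actual)
    return phi


def linear_objective(phi, local_score):
    """Σ_v Σ_U phi_vU · S(v, U): the decomposable score as a linear function of phi."""
    return sum(x * local_score(v, u) for (v, u), x in phi.items())


Y_PARENTS = {2: {0, 1}, 3: {2}}  # the Y-shaped example DAG
phi = family_vector([0, 1, 2, 3], Y_PARENTS)
print(linear_objective(phi, lambda v, u: -len(u)))  # toy score: minus the in-degree; -3
```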


References

[1] Diego Colombo and Marloes H. Maathuis. Order-independent constraint-based causal structure learning. The Journal of Machine Learning Research, 15(1):3741–3782, 2014.

[2] James Cussens. Integer programming for Bayesian network structure learning. 2014.

[3] James Cussens, David Haws, and Milan Studeny. Polyhedral aspects of score equivalence in Bayesian network structure learning. arXiv preprint arXiv:1503.00829, 2015.

[4] Cassio P. De Campos and Qiang Ji. Efficient structure learning of Bayesian networks using constraints. The Journal of Machine Learning Research, 12:663–689, 2011.

[5] Xiannian Fan, Brandon Malone, and Changhe Yuan. Finding optimal Bayesian network structures with constraints learned from data. In Proceedings of the 30th Annual Conference on Uncertainty in Artificial Intelligence (UAI-14), 2014.

[6] Xiannian Fan and Changhe Yuan. An improved lower bound for Bayesian network structure learning. In Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.

[7] Xiannian Fan, Changhe Yuan, and Brandon Malone. Tightening bounds for Bayesian network structure learning. In Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014.

[8] Yuhong Guo and Dale Schuurmans. Convex structure learning for Bayesian networks: Polynomial feature selection and approximate ordering.

[9] Tommi Jaakkola, David Sontag, Amir Globerson, and Marina Meila. Learning Bayesian network structure using LP relaxations. In International Conference on Artificial Intelligence and Statistics, pages 358–365, 2010.

[10] Mikko Koivisto and Kismat Sood. Exact Bayesian structure discovery in Bayesian networks. The Journal of Machine Learning Research, 5:549–573, 2004.

[11] Jan Lemeire, Stijn Meganck, and Francesco Cartella. Robust independence-based causal structure learning in absence of adjacency faithfulness. On Probabilistic Graphical Models, page 169, 2010.

[12] Dimitris Margaritis. Learning Bayesian network model structure from data. PhD thesis, US Army, 2003.

[13] Sascha Ott and Satoru Miyano. Finding optimal gene networks using biological constraints. Genome Informatics, 14:124–133, 2003.

[14] Pekka Parviainen and Mikko Koivisto. Bayesian structure discovery in Bayesian networks with less space. In International Conference on Artificial Intelligence and Statistics, pages 589–596, 2010.

[15] Tomi Silander and Petri Myllymaki. A simple approach for finding the globally optimal Bayesian network structure. arXiv preprint arXiv:1206.6875, 2006.

[16] Peter Spirtes, Clark N. Glymour, and Richard Scheines. Causation, Prediction, and Search, volume 81. MIT Press, 2000.

[17] Milan Studeny and David Haws. On polyhedral approximations of polytopes for learning Bayes nets. Technical report, 2011.

[18] Milan Studeny and David Haws. Learning Bayesian network structure: Towards the essential graph by integer linear programming tools. International Journal of Approximate Reasoning, 55(4):1043–1071, 2014.

[19] Marc Teyssier. Ordering-based search: A simple and effective algorithm for learning Bayesian networks. In UAI, 2005.

[20] Kui Yu, Xindong Wu, Zan Zhang, Yang Mu, Hao Wang, and Wei Ding. Markov blanket feature selection with non-faithful data distributions. In Data Mining (ICDM), 2013 IEEE 13th International Conference on, pages 857–866. IEEE, 2013.

[21] Changhe Yuan and Brandon Malone. An improved admissible heuristic for learning optimal Bayesian networks.

[22] Changhe Yuan, Brandon Malone, and Xiaojian Wu. Learning optimal Bayesian networks using A* search. In IJCAI Proceedings-International Joint Conference on Artificial Intelligence, volume 22, page 2186, 2011.
