Bayesian Networks
Chris Amato, Northeastern University
Some images and slides are used from: Rob Platt, CS188 UC Berkeley, AIMA, Mykel Kochenderfer
Probabilistic models

Models describe how (a portion of) the world works. Models are always simplifications:
- May not account for every variable
- May not account for all interactions between variables

"All models are wrong; but some are useful." - George E. P. Box

What do we do with probabilistic models? We (or our agents) need to reason about unknown variables, given evidence.
- Example: explanation (diagnostic reasoning)
- Example: prediction (causal reasoning)
- Example: value of information
Bayes' nets: Big picture

Two problems with using full joint distribution tables as our probabilistic models:
- Unless there are only a few variables, the joint is WAY too big to represent explicitly
- Hard to learn (estimate) anything empirically about more than a few variables at a time

Bayes' nets: a technique for describing complex joint distributions (models) using simple, local distributions (conditional probabilities).
- More properly called graphical models
- We describe how variables locally interact
- Local interactions chain together to give global, indirect interactions
Graphical model notation

Nodes: variables (with domains)
- Can be assigned (observed) or unassigned (unobserved)

Arcs: interactions
- Similar to CSP constraints
- Indicate "direct influence" between variables
- Formally: encode conditional independence (more later)
Bayes' net semantics

- A directed, acyclic graph, one node per random variable
- A conditional probability table (CPT) for each node: a collection of distributions over X, one for each combination of parents' values
- Bayes' nets implicitly encode joint distributions as a product of local conditional distributions
- To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together:

P(x1, ..., xn) = ∏i P(xi | parents(Xi))
Bayesian networks

A compact representation of a joint probability distribution:
- Each node corresponds to a random variable
- Arrows connect pairs of nodes, and there are no directed cycles
- Associated with each node Xi is a distribution P(Xi | Pa(Xi))

[Network: B (battery failure) and S (solar panel failure) are parents of E (electrical system failure), which is parent of D (trajectory deviation) and C (communication loss)] (DMU 2.1.4)

Representation of joint distribution

[Same network, each node annotated with its independent-parameter count: P(B): 1, P(S): 1, P(E | B, S): 4, P(D | E): 2, P(C | E): 2]

- Chain rule: P(B, S, E, D, C) = P(B) P(S) P(E | B, S) P(D | E) P(C | E)
- How many independent parameters in general? 2^5 - 1 = 31
- How many independent parameters in the Bayes net? 1 + 1 + 4 + 2 + 2 = 10

Bayesian networks can reduce the number of parameters. (DMU 2.1.4)
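To make the factorization concrete, here is a minimal sketch in Python. The graph structure and parameter counts come from the slide; the CPT numbers are hypothetical placeholders, since the slides give only the structure.

```python
from itertools import product

# Hypothetical CPTs (placeholders: the slides give the structure, not numbers).
# Each entry stores P(X = true | parents); false is the complement.
P_B = 0.01                                         # P(B=true)
P_S = 0.02                                         # P(S=true)
P_E = {(True, True): 0.90, (True, False): 0.80,
       (False, True): 0.70, (False, False): 0.05}  # P(E=true | B, S)
P_D = {True: 0.60, False: 0.10}                    # P(D=true | E)
P_C = {True: 0.50, False: 0.02}                    # P(C=true | E)

def bern(p, value):
    """Probability that a Boolean variable takes `value` when P(true) = p."""
    return p if value else 1.0 - p

def joint(b, s, e, d, c):
    """Chain rule: P(B) P(S) P(E|B,S) P(D|E) P(C|E)."""
    return (bern(P_B, b) * bern(P_S, s) * bern(P_E[(b, s)], e)
            * bern(P_D[e], d) * bern(P_C[e], c))

# The 2^5 = 32 joint entries sum to 1, using only 1+1+4+2+2 = 10 parameters.
assert abs(sum(joint(*v) for v in product([False, True], repeat=5)) - 1.0) < 1e-12
```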
Example: Coin flips

N independent coin flips: X1, X2, ..., Xn. No interactions between variables: absolute independence.
Example: Traffic

Variables:
- R: It rains
- T: There is traffic

Model 1: independence (R and T as separate nodes, no arc)
Model 2: rain causes traffic (arc R → T)

Why is an agent using model 2 better?
Probabilities in Bayes' nets

Bayes' nets implicitly encode joint distributions as a product of local conditional distributions. To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together:

P(x1, ..., xn) = ∏i P(xi | parents(Xi))

Example: the traffic network on the following slides.
Probabilities in Bayes' nets

Why are we guaranteed that setting P(x1, ..., xn) = ∏i P(xi | parents(Xi)) results in a proper joint distribution?
- Chain rule (valid for all distributions): P(x1, ..., xn) = ∏i P(xi | x1, ..., xi-1)
- Assume conditional independences: P(xi | x1, ..., xi-1) = P(xi | parents(Xi))
- Consequence: P(x1, ..., xn) = ∏i P(xi | parents(Xi))

Not every BN can represent every joint distribution: the topology enforces certain conditional independencies.
Example: Coin flips

Only distributions whose variables are absolutely independent can be represented by a Bayes' net with no arcs.

X1, X2, ..., Xn each have the same CPT:
h 0.5
t 0.5
Example: Traffic

R → T

P(R):
+r 1/4
-r 3/4

P(T | R):
+r +t 3/4
+r -t 1/4
-r +t 1/2
-r -t 1/2
Example: Alarm network

[Network: Burglary and Earthquake are parents of Alarm; Alarm is parent of JohnCalls and MaryCalls]
B P(B)
+b 0.001
-b 0.999
E P(E)
+e 0.002
-e 0.998
B E A P(A|B,E)
+b +e +a 0.95
+b +e -a 0.05
+b -e +a 0.94
+b -e -a 0.06
-b +e +a 0.29
-b +e -a 0.71
-b -e +a 0.001
-b -e -a 0.999
A J P(J|A)
+a +j 0.9
+a -j 0.1
-a +j 0.05
-a -j 0.95
A M P(M|A)
+a +m 0.7
+a -m 0.3
-a +m 0.01
-a -m 0.99
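As a worked instance of the product rule on these CPTs (the particular assignment is our own illustrative choice):

P(+j, +m, +a, -b, -e) = P(-b) P(-e) P(+a | -b, -e) P(+j | +a) P(+m | +a)
                      = 0.999 × 0.998 × 0.001 × 0.9 × 0.7
                      ≈ 6.28 × 10^-4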
Example: Traffic

Causal direction: R → T

P(R):
+r 1/4
-r 3/4

P(T | R):
+r +t 3/4
+r -t 1/4
-r +t 1/2
-r -t 1/2

Joint P(R, T):
+r +t 3/16
+r -t 1/16
-r +t 6/16
-r -t 6/16
Example: Reverse traffic

Reverse causality? T → R

P(T):
+t 9/16
-t 7/16

P(R | T):
+t +r 1/3
+t -r 2/3
-t +r 1/7
-t -r 6/7

Joint P(R, T):
+r +t 3/16
+r -t 1/16
-r +t 6/16
-r -t 6/16
Causality?

When Bayes' nets reflect the true causal patterns:
- Often simpler (nodes have fewer parents)
- Often easier to think about
- Often easier to elicit from experts

BNs need not actually be causal:
- Sometimes no causal net exists over the domain (especially if variables are missing)
- E.g. consider the variable Traffic
- End up with arrows that reflect correlation, not causation

What do the arrows really mean?
- Topology may happen to encode causal structure
- Topology really encodes conditional independence
Size of a Bayes' net

- How big is a joint distribution over N Boolean variables? 2^N
- How big is an N-node net if nodes have up to k parents? O(N · 2^(k+1))
- Both give you the power to calculate any query P(X1, ..., Xn)
- BNs: huge space savings!
- Also easier to elicit local CPTs
- Also faster to answer queries (coming)
Bayes' nets: Assumptions

Assumptions we are required to make to define the Bayes net when given the graph:

P(xi | x1, ..., xi-1) = P(xi | parents(Xi))

Beyond the above "chain rule → Bayes net" conditional independence assumptions:
- Often additional conditional independences
- They can be read off the graph
- Important for modeling: understand assumptions made when choosing a Bayes net graph
Example

Conditional independence assumptions come directly from simplifications in the chain rule. Additional implied conditional independence assumptions?
Independence in a Bayes' net

Important question about a BN: are two nodes independent given certain evidence?
- If yes, can prove using algebra (tedious in general)
- If no, can prove with a counterexample

Example: X → Y → Z. Question: are X and Z necessarily independent?
- Answer: no. Example: low pressure causes rain, which causes traffic.
- X can influence Z, Z can influence X (via Y)
- Addendum: they could be independent: how?
D-separation: Outline

- Study independence properties for triples
- Analyze complex cases in terms of member triples
- D-separation: a condition/algorithm for answering such queries
Causal chains

This configuration is a "causal chain": X → Y → Z
(X: Low pressure, Y: Rain, Z: Traffic)

Is X guaranteed to be independent of Z? No!
- One example set of CPTs for which X is not independent of Z is sufficient to show this independence is not guaranteed.
- Example: low pressure causes rain causes traffic; high pressure causes no rain causes no traffic.
- In numbers: P(+y | +x) = 1, P(-y | -x) = 1, P(+z | +y) = 1, P(-z | -y) = 1
Causal chains

This configuration is a "causal chain": X → Y → Z
(X: Low pressure, Y: Rain, Z: Traffic)

Is X guaranteed to be independent of Z given Y? Yes!
- Evidence along the chain "blocks" the influence: P(z | x, y) = P(x) P(y | x) P(z | y) / (P(x) P(y | x)) = P(z | y), which does not depend on x.
Common cause

This configuration is a "common cause": X ← Y → Z
(Y: Project due, X: Forums busy, Z: Lab full)

Is X guaranteed to be independent of Z? No!
- One example set of CPTs for which X is not independent of Z is sufficient to show this independence is not guaranteed.
- Example: project due causes both forums busy and lab full.
- In numbers: P(+x | +y) = 1, P(-x | -y) = 1, P(+z | +y) = 1, P(-z | -y) = 1
Common cause

This configuration is a "common cause": X ← Y → Z
(Y: Project due, X: Forums busy, Z: Lab full)

Are X and Z guaranteed to be independent given Y? Yes!
- Observing the cause blocks influence between effects: P(z | x, y) = P(y) P(x | y) P(z | y) / (P(y) P(x | y)) = P(z | y).
Common effect

Last configuration: two causes of one effect (v-structure): X → Z ← Y
(X: Raining, Y: Ball game, Z: Traffic)

Are X and Y independent?
- Yes: the ball game and the rain cause traffic, but they are not correlated.
- Still need to prove they must be (try it!)

Are X and Y independent given Z?
- No: seeing traffic puts the rain and the ball game in competition as explanations.
- This is backwards from the other cases: observing an effect activates influence between possible causes.
Conditional independence: Battery example
Bayes nets: conditional independence example

The structure of a Bayes net encodes conditional independence assumptions:
- X and Y are independent if and only if P(X, Y) = P(X) P(Y); equivalently, P(X) = P(X | Y)
- X and Y are conditionally independent given Z if and only if P(X, Y | Z) = P(X | Z) P(Y | Z); equivalently, P(X | Z) = P(X | Y, Z)

Independence assumptions reduce the number of parameters. (DMU 2.1.5)

In the battery network:
- C is independent of B given E: knowing that you have a battery failure does not affect your belief that there is a communication loss if you know that there has been an electrical system failure.
- D is independent of S given E: if you know there is an electrical failure, observing a trajectory deviation does not affect your belief that there has been a solar panel failure.
Conditional independence in v-structures

- B is independent of S: knowing there is a battery failure does not affect your belief about whether there has been a solar panel failure.
- B is not independent of S given E: if you know there has been an electrical failure and there has not been a solar panel failure, then it is more likely there was a battery failure.
- Influence only flows through B → E ← S (a v-structure) when E (or one of its descendants) is known.
Independence concepts: General case

Two sets of nodes, A and B, are conditionally independent given a node set C (they are called d-separated in the Bayes net) if every path between A and B is blocked, i.e., each path satisfies at least one of:
- The path contains a chain of nodes, A → X → B, such that X is in C
- The path contains a fork, A ← X → B, such that X is in C
- The path contains a v-structure, A → X ← B, such that X is not in C and no descendant of X is in C

Markov blanket: the parents, the children, and the parents of the children.
- A node is independent of all other nodes in the graph given its Markov blanket.

(A reachability-based sketch of the d-separation check follows.)
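Below is a compact, illustrative d-separation check in the Bayes-ball / reachability style; the graph encoding and all helper names are our own, not from the slides.

```python
from collections import deque

def ancestors_of(nodes, parents):
    """All ancestors of `nodes`, including the nodes themselves."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(x, y, given, parents):
    """True if x and y are d-separated by the evidence set `given`."""
    children = {n: [] for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].append(n)
    anc_evidence = ancestors_of(given, parents)
    # Explore (node, direction) pairs: 'up' = arrived from a child,
    # 'down' = arrived from a parent.
    frontier, visited = deque([(x, 'up')]), set()
    while frontier:
        node, direction = frontier.popleft()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node not in given and node == y:
            return False  # found an active path to y
        if direction == 'up' and node not in given:
            for p in parents.get(node, []):   # chain upward
                frontier.append((p, 'up'))
            for c in children[node]:          # fork at this node
                frontier.append((c, 'down'))
        elif direction == 'down':
            if node not in given:             # chain through a non-evidence node
                for c in children[node]:
                    frontier.append((c, 'down'))
            if node in anc_evidence:          # v-structure activated by evidence
                for p in parents.get(node, []):
                    frontier.append((p, 'up'))
    return True

# Battery network from the slides: B → E ← S, E → D, E → C.
parents = {'B': [], 'S': [], 'E': ['B', 'S'], 'D': ['E'], 'C': ['E']}
print(d_separated('B', 'S', set(), parents))   # True: B independent of S
print(d_separated('B', 'S', {'E'}, parents))   # False: not independent given E
print(d_separated('C', 'B', {'E'}, parents))   # True: C independent of B given E
```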
Example

Variables:
- R: Raining
- T: Traffic
- D: Roof drips
- S: I'm sad

Questions: [d-separation queries over this network's diagram, which is not shown; answer given: Yes]
Structure implications

Given a Bayes net structure, we can run the d-separation algorithm to build a complete list of conditional independences that are necessarily true, of the form

Xi ⊥ Xj | {Xk1, ..., Xkn}

This list determines the set of probability distributions that can be represented.
Topology limits distributions

- Given some graph topology G, only certain joint distributions can be encoded
- The graph structure guarantees certain (conditional) independences
- (There might be more independence)
- Adding arcs increases the set of distributions, but has several costs
- Full conditioning can encode any distribution
Dynamic Bayes Nets (DBNs)

- We want to track multiple variables over time, using multiple sources of evidence
- Idea: repeat a fixed Bayes net structure at each time
- Variables from time t can condition on those from t-1
- Dynamic Bayes nets are a generalization of HMMs
DBN particle filters

- A particle is a complete sample for a time step
- Initialize: generate prior samples for the t=1 Bayes net
  - Example particle: G1_a = (3,3), G1_b = (5,3)
- Elapse time: sample a successor for each particle
  - Example successor: G2_a = (2,3), G2_b = (6,3)
- Observe: weight each entire sample by the likelihood of the evidence conditioned on the sample
  - Likelihood: P(E1_a | G1_a) * P(E1_b | G1_b)
- Resample: select prior samples (tuples of values) in proportion to their likelihood

(See the sketch after this list.)
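A minimal sketch of these four steps, in a ghost-tracking flavor; the grid size, transition model, and observation model are illustrative assumptions, not the lecture's actual Pacman models.

```python
import random

GRID = 10  # illustrative grid size

def transition(pos):
    """Sample a successor position: one random step (or stay put)."""
    x, y = pos
    dx, dy = random.choice([(1, 0), (-1, 0), (0, 1), (0, -1), (0, 0)])
    return (max(0, min(GRID - 1, x + dx)), max(0, min(GRID - 1, y + dy)))

def likelihood(evidence, pos):
    """P(evidence | position): a made-up noisy distance-reading model."""
    return 1.0 / (1.0 + (evidence - (pos[0] + pos[1])) ** 2)

def particle_filter_step(particles, evidence):
    """One DBN particle-filter update. Each particle is a tuple of ghost
    positions, e.g. ((3, 3), (5, 3)); `evidence` has one reading per ghost."""
    # Elapse time: sample a successor for each particle.
    particles = [tuple(transition(g) for g in p) for p in particles]
    # Observe: weight each entire sample, w = P(E_a | G_a) * P(E_b | G_b).
    weights = []
    for p in particles:
        w = 1.0
        for g, e in zip(p, evidence):
            w *= likelihood(e, g)
        weights.append(w)
    # Resample: select samples in proportion to their likelihood.
    return random.choices(particles, weights=weights, k=len(particles))

# Initialize: prior samples for the t=1 net (two ghosts per particle).
particles = [((random.randrange(GRID), random.randrange(GRID)),
              (random.randrange(GRID), random.randrange(GRID)))
             for _ in range(200)]
particles = particle_filter_step(particles, evidence=(6, 8))
```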
Video: Pacman sonar - Ghost DBN
Inference

Inference: calculating some useful quantity from a joint probability distribution.

Examples:
- Posterior probability: P(Q | E1 = e1, ..., Ek = ek)
- Most likely explanation: argmax_q P(Q = q | E1 = e1, ...)
Example: Car diagnosis

Initial evidence: car won't start.
- Testable variables (green), "broken, so fix it" variables (orange)
- Hidden variables (gray) ensure sparse structure, reduce parameters

[Bayes net for car diagnosis; nodes include: battery age, alternator broken, fanbelt broken, battery dead, no charging, battery flat, no oil, no gas, fuel line blocked, starter broken, lights, oil light, gas gauge, battery meter, dipstick, car won't start] (AIMA Chapter 14.1-3)
Inference

Suppose we want to infer the distribution P(B=true | D=true, C=true) in the battery network.
- Query variables: B
- Evidence variables: D, C
- Hidden variables: S, E
(DMU 2.2.3)

Example: Inference in temporal models

[Chain: X0 → X1 → X2, with observations O0, O1, O2]
- Filtering: P(Xk | O0:k) (Kalman filter / particle filter)
- Prediction: P(Xk | O0:t) where k > t
- Smoothing: P(Xk | O0:t) where k < t (forward-backward)
- Most likely explanation: argmax_{X0:t} P(X0:t | O0:t) (Viterbi)
(DMU 2.2.2)
Burglar alarm example

[Network: Burglary and Earthquake are parents of Alarm; Alarm is parent of JohnCalls and MaryCalls, with the CPTs tabulated below] (AIMA Chapter 14.1-3)
B P(B)
+b 0.001
-b 0.999
E P(E)
+e 0.002
-e 0.998
B E A P(A|B,E)
+b +e +a 0.95
+b +e -a 0.05
+b -e +a 0.94
+b -e -a 0.06
-b +e +a 0.29
-b +e -a 0.71
-b -e +a 0.001
-b -e -a 0.999
A J P(J|A)
+a +j 0.9
+a -j 0.1
-a +j 0.05
-a -j 0.95
A M P(M|A)
+a +m 0.7
+a -m 0.3
-a +m 0.01
-a -m 0.99
What if I want to calculate P(B|j,m)?
Inference by enumeration

General case:
- Evidence variables: E1, ..., Ek = e1, ..., ek
- Query* variable: Q
- Hidden variables: H1, ..., Hr
(together: all variables)
*Works fine with multiple query variables, too

We want: P(Q | e1, ..., ek)

- Step 1: Select the entries consistent with the evidence
- Step 2: Sum out H1, ..., Hr to get the joint of query and evidence
- Step 3: Normalize: multiply by 1/Z, where Z = Σ_q P(q, e1, ..., ek)
Inference by enumeration

P(W)?
P(W | winter)?
P(W | winter, hot)?

S       T     W     P
summer  hot   sun   0.30
summer  hot   rain  0.05
summer  cold  sun   0.10
summer  cold  rain  0.05
winter  hot   sun   0.10
winter  hot   rain  0.05
winter  cold  sun   0.15
winter  cold  rain  0.20
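A small sketch of the three-step recipe (select, sum out, normalize) on the joint table above; the helper names are our own.

```python
joint = {  # (season, temperature, weather) -> probability
    ('summer', 'hot', 'sun'): 0.30, ('summer', 'hot', 'rain'): 0.05,
    ('summer', 'cold', 'sun'): 0.10, ('summer', 'cold', 'rain'): 0.05,
    ('winter', 'hot', 'sun'): 0.10, ('winter', 'hot', 'rain'): 0.05,
    ('winter', 'cold', 'sun'): 0.15, ('winter', 'cold', 'rain'): 0.20,
}

def query_w(**evidence):
    """P(W | evidence): select consistent entries, sum out, normalize."""
    vars_ = ('S', 'T', 'W')
    totals = {}
    for assignment, p in joint.items():
        row = dict(zip(vars_, assignment))
        if all(row[k] == v for k, v in evidence.items()):
            totals[row['W']] = totals.get(row['W'], 0.0) + p
    z = sum(totals.values())
    return {w: p / z for w, p in totals.items()}

print(query_w())                      # P(W): {'sun': 0.65, 'rain': 0.35}
print(query_w(S='winter'))            # P(W | winter): {'sun': 0.5, 'rain': 0.5}
print(query_w(S='winter', T='hot'))   # P(W | winter, hot) ≈ {'sun': 0.667, 'rain': 0.333}
```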
Inference by enumeration

Slightly intelligent way to sum out variables from the joint without actually constructing its explicit representation:

P(B | j, m) = P(B, j, m) / P(j, m)
            = α P(B, j, m)
            = α Σ_e Σ_a P(B, e, a, j, m)
Inference by enumeration

Rewrite full joint entries using products of CPT entries:

P(B | j, m) = α Σ_e Σ_a P(B) P(e) P(a | B, e) P(j | a) P(m | a)
            = α P(B) Σ_e P(e) Σ_a P(a | B, e) P(j | a) P(m | a)

Recursive depth-first enumeration: O(n) space, O(d^n) time.
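A sketch of the recursive depth-first enumeration (in the style of AIMA's enumeration-ask) for P(B | j, m), using the alarm-network CPTs from the slides; the dict-based encoding is our own choice.

```python
def p(var, value, assignment):
    """P(var = value | parents(var)) from the alarm-network CPTs."""
    t = {
        'B': lambda a: 0.001,
        'E': lambda a: 0.002,
        'A': lambda a: {(True, True): 0.95, (True, False): 0.94,
                        (False, True): 0.29, (False, False): 0.001}[(a['B'], a['E'])],
        'J': lambda a: 0.9 if a['A'] else 0.05,
        'M': lambda a: 0.7 if a['A'] else 0.01,
    }[var](assignment)
    return t if value else 1.0 - t

ORDER = ['B', 'E', 'A', 'J', 'M']  # topological order, parents first

def enumerate_all(vars_, assignment):
    """Sum the joint over all unassigned variables, depth first."""
    if not vars_:
        return 1.0
    v, rest = vars_[0], vars_[1:]
    if v in assignment:
        return p(v, assignment[v], assignment) * enumerate_all(rest, assignment)
    return sum(p(v, val, assignment) * enumerate_all(rest, {**assignment, v: val})
               for val in (True, False))

def enumeration_ask(query, evidence):
    dist = {val: enumerate_all(ORDER, {**evidence, query: val})
            for val in (True, False)}
    z = sum(dist.values())
    return {val: q / z for val, q in dist.items()}

# P(B | +j, +m) ≈ {True: 0.284, False: 0.716}, the classic answer.
print(enumeration_ask('B', {'J': True, 'M': True}))
```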
Evaluation tree

Enumeration is inefficient: repeated computation. E.g., it computes P(j|a)P(m|a) for each value of e.

[Evaluation tree for P(B | j, m): the tree branches on the values of e and then a, with leaf products P(j|a)P(m|a); the identical subtree is evaluated once under each value of e] (AIMA Chapter 14.4-5)
Inference by enumeration?

P(Antilock | observed variables) = ?
Inference by enumeration vs. variable elimination

Why is inference by enumeration so slow?
- You join up the whole joint distribution before you sum out the hidden variables

Idea: interleave joining and marginalizing!
- Called "Variable Elimination"
- Sum right-to-left, storing intermediate results (factors) to avoid recomputation
- Still NP-hard, but usually much faster than inference by enumeration

First we'll need some new notation: factors.
Factors

In general, when we write P(Y1 ... YN | X1 ... XM):
- It is a "factor," a multi-dimensional array
- Its values are P(y1 ... yN | x1 ... xM)
- Any assigned (= lower-case) X or Y is a dimension missing (selected) from the array
Example: Traffic domain

Random variables:
- R: Raining
- T: Traffic
- L: Late for class!

P(R):
+r 0.1
-r 0.9

P(T | R):
+r +t 0.8
+r -t 0.2
-r +t 0.1
-r -t 0.9

P(L | T):
+t +l 0.3
+t -l 0.7
-t +l 0.1
-t -l 0.9

P(L) = ?
     = Σ_{r,t} P(r, t, L)
     = Σ_{r,t} P(r) P(t | r) P(L | t)
Inference by enumeration: Procedural outline

- Track objects called factors. Initial factors are local CPTs (one per node): P(R), P(T | R), P(L | T) above.
- Any known values are selected. E.g. if we know L = +l, the initial factors become P(R), P(T | R), and the selected factor P(+l | T):
+t +l 0.3
-t +l 0.1
- Procedure: join all factors, then eliminate all hidden variables.
Operation 1: Join factors

First basic operation: joining (combining) factors.
- Just like a database join
- Get all factors over the joining variable
- Build a new factor over the union of the variables involved
- Computation for each entry: pointwise products

Example: join on R, combining P(R) and P(T | R) into P(R, T):
+r +t 0.08
+r -t 0.02
-r +t 0.09
-r -t 0.81
Example: Multiple joins

Join R: combine P(R) and P(T | R) into P(R, T):
+r +t 0.08
+r -t 0.02
-r +t 0.09
-r -t 0.81

Join T: combine P(R, T) and P(L | T) into P(R, T, L):
+r +t +l 0.024
+r +t -l 0.056
+r -t +l 0.002
+r -t -l 0.018
-r +t +l 0.027
-r +t -l 0.063
-r -t +l 0.081
-r -t -l 0.729
Operation 2: Eliminate

Second basic operation: marginalization.
- Take a factor and sum out a variable
- Shrinks a factor to a smaller one
- A projection operation

Example: sum out R from P(R, T) to get P(T):
+t 0.17
-t 0.83
Multiple elimination

Sum out R from P(R, T, L) to get P(T, L):
+t +l 0.051
+t -l 0.119
-t +l 0.083
-t -l 0.747

Then sum out T to get P(L):
+l 0.134
-l 0.866

(The sketch below implements these two factor operations.)
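A sketch of the two factor operations on the traffic domain, reproducing the numbers above; the factor encoding (a tuple of variable names plus a table dict) is our own.

```python
from itertools import product

# Factors from the traffic domain, encoded as (variables, table) pairs.
pR   = (('R',), {('+r',): 0.1, ('-r',): 0.9})
pTgR = (('R', 'T'), {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
                     ('-r', '+t'): 0.1, ('-r', '-t'): 0.9})
pLgT = (('T', 'L'), {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
                     ('-t', '+l'): 0.1, ('-t', '-l'): 0.9})

def domains(*factors):
    """Collect each variable's set of observed values across the factors."""
    d = {}
    for vars_, table in factors:
        for row in table:
            for v, val in zip(vars_, row):
                d.setdefault(v, set()).add(val)
    return d

def join(f, g):
    """Pointwise product over the union of the two factors' variables."""
    vars_ = f[0] + tuple(v for v in g[0] if v not in f[0])
    dom = domains(f, g)
    table = {}
    for row in product(*(sorted(dom[v]) for v in vars_)):
        a = dict(zip(vars_, row))
        # Rows selected away by evidence simply contribute probability 0.
        table[row] = (f[1].get(tuple(a[v] for v in f[0]), 0.0)
                      * g[1].get(tuple(a[v] for v in g[0]), 0.0))
    return (vars_, table)

def sum_out(var, f):
    """Marginalize `var` out of factor f (a projection)."""
    vars_ = tuple(v for v in f[0] if v != var)
    table = {}
    for row, p in f[1].items():
        key = tuple(val for v, val in zip(f[0], row) if v != var)
        table[key] = table.get(key, 0.0) + p
    return (vars_, table)

pRTL = join(join(pR, pTgR), pLgT)   # P(R, T, L): the 8-row table above
pTL  = sum_out('R', pRTL)           # P(T, L): +t+l 0.051, +t-l 0.119, ...
pL   = sum_out('T', pTL)            # P(L):    +l 0.134,  -l 0.866
print(pL[1])
```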
Thus far: Multiple join, multiple eliminate (= inference by enumeration)
Marginalizing early (= variable elimination)
Traffic domain

P(L) = ?

Inference by enumeration:
P(L) = Σ_t Σ_r P(L | t) P(r) P(t | r)

Variable elimination:
P(L) = Σ_t P(L | t) Σ_r P(r) P(t | r)
Variable elimination

Initial factors: P(R), P(T | R), P(L | T) as above.

Join R → P(R, T), keeping P(L | T) aside:
+r +t 0.08
+r -t 0.02
-r +t 0.09
-r -t 0.81

Sum out R → P(T):
+t 0.17
-t 0.83

Join T → P(T, L):
+t +l 0.051
+t -l 0.119
-t +l 0.083
-t -l 0.747

Sum out T → P(L):
+l 0.134
-l 0.866
Evidence

If there is evidence, start with factors that select that evidence. No evidence uses the initial factors P(R), P(T | R), P(L | T) in full.

Computing P(L | +r), the initial factors become:

P(+r):
+r 0.1

P(T | +r):
+r +t 0.8
+r -t 0.2

P(L | T):
+t +l 0.3
+t -l 0.7
-t +l 0.1
-t -l 0.9

We eliminate all variables other than query + evidence.
Evidence

The result will be a selected joint of query and evidence. E.g. for P(L | +r), we would end up with:

P(+r, L):
+r +l 0.026
+r -l 0.074

Normalize → P(L | +r):
+l 0.26
-l 0.74

To get our answer, just normalize this! That's it!
General variable elimination

Query: P(Q | E1 = e1, ..., Ek = ek)

- Start with initial factors: local CPTs (but instantiated by evidence)
- While there are still hidden variables (not Q or evidence):
  - Pick a hidden variable H
  - Join all factors mentioning H
  - Eliminate (sum out) H
- Join all remaining factors and normalize

(A sketch of this loop in code follows.)
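A sketch of this loop, reusing the join and sum_out helpers from the factor sketch above (run that block first); the driver is our own illustrative wrapper, not lecture code.

```python
def select(factor, evidence):
    """Instantiate a factor by the evidence: keep only consistent rows."""
    vars_, table = factor
    kept = {row: p for row, p in table.items()
            if all(evidence.get(v, val) == val for v, val in zip(vars_, row))}
    return (vars_, kept)

def variable_elimination(query, evidence, factors):
    # Local CPTs, instantiated by the evidence.
    factors = [select(f, evidence) for f in factors]
    hidden = {v for f in factors for v in f[0]} - {query} - set(evidence)
    for h in hidden:                  # ordering heuristic: arbitrary here
        related = [f for f in factors if h in f[0]]
        rest = [f for f in factors if h not in f[0]]
        joined = related[0]
        for f in related[1:]:
            joined = join(joined, f)  # join all factors mentioning h...
        factors = rest + [sum_out(h, joined)]  # ...then sum h out
    # Join all remaining factors and normalize.
    result = factors[0]
    for f in factors[1:]:
        result = join(result, f)
    z = sum(result[1].values())
    return {row: p / z for row, p in result[1].items()}

# P(L | +r) on the traffic domain: {('+r', '+l'): 0.26, ('+r', '-l'): 0.74}
print(variable_elimination('L', {'R': '+r'}, [pR, pTgR, pLgT]))
```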
Example

For P(B | j, m) on the alarm network, choose A: join all factors mentioning A (P(A | B, E), P(j | A), P(m | A)), then sum A out.
Choose E: join all factors mentioning E, then sum E out. Finish with B: join the remaining factors and normalize.
Same example in equations

- The marginal can be obtained from the joint by summing out
- Use the Bayes' net joint distribution expression
- Use x(y + z) = xy + xz: joining on a, and then summing out, gives f1
- Use x(y + z) = xy + xz again: joining on e, and then summing out, gives f2

All we are doing is exploiting uwy + uwz + uxy + uxz + vwy + vwz + vxy + vxz = (u+v)(w+x)(y+z) to improve computational efficiency!
Variable ordering

P(B | j, m) = α P(B) Σ_e P(e) Σ_a P(a | B, e) P(j | a) P(m | a)
P(B | j, m) = α P(B) Σ_a P(j | a) P(m | a) Σ_e P(e) P(a | B, e)

Complexity depends on factor size.
Irrelevant variables

Consider the query P(JohnCalls | Burglary = true):

P(J | b) = α P(b) Σ_e P(e) Σ_a P(a | b, e) P(J | a) Σ_m P(m | a)

The sum over m is identically 1; M is irrelevant to the query.

Thm 1: Y is irrelevant unless Y ∈ Ancestors(Q ∪ E). Here, Q = {JohnCalls}, E = {Burglary}, and Ancestors(Q ∪ E) = {Alarm, Earthquake}, so MaryCalls is irrelevant.
VE: Computation and space complexity

- The computational and space complexity of variable elimination is determined by the largest factor
- The elimination ordering can greatly affect the size of the largest factor
  - E.g., 2^n vs. 2, depending on the ordering
- Does there always exist an ordering that only results in small factors? No!
Complexity of exact inference

- Complexity in polytrees is linear in the number of variables
  - A polytree has no undirected cycles
- In general, inference in Bayesian networks is NP-hard
  - In the worst case, it is probably exponential
- The variable elimination algorithm relies on a heuristic ordering of variables to eliminate in sequence
  - Often linear, sometimes exponential
- Belief propagation propagates "messages" through the network: linear for polytrees, not exact for other graphs
Approximate inference

We return to the query P(B=true | D=true, C=true) in the battery network:
- Query variables: B
- Evidence variables: D, C
- Hidden variables: S, E
Topological sort

List the nodes in order: if there is an edge A → B, then A comes before B in the list. (A sketch follows.)
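A minimal topological sort sketch (Kahn's algorithm) over the battery network; the helper names are our own.

```python
from collections import deque

def topological_order(parents):
    """Order nodes so every node appears after all of its parents."""
    children = {n: [] for n in parents}
    indegree = {n: len(ps) for n, ps in parents.items()}
    for n, ps in parents.items():
        for p in ps:
            children[p].append(n)
    queue = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for c in children[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return order

# Battery network: prints ['B', 'S', 'E', 'D', 'C'].
print(topological_order({'B': [], 'S': [], 'E': ['B', 'S'],
                         'D': ['E'], 'C': ['E']}))
```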
Approximate inference through sampling

Direct sampling in a Bayes net:
Approximate inference through sampling (Direct sampling)

In topological order, sample each variable from its conditional probability distribution, given the already-sampled parent values. A sample of the battery network is built up one variable at a time:

B  S  E  D  C
1
1  1
1  1  1
1  1  1  0
1  1  1  0  0

Repeating the process yields a table of samples:

B  S  E  D  C
1  1  1  0  0
0  1  0  1  0
0  0  0  1  1
1  0  1  1  1
1  0  0  0  0
0  0  0  0  1
1  0  0  0  1
0  0  1  0  0

Only 2 rows are useful for estimating P(b1 | d1, c1)! If the likelihood of the evidence is small, then many samples are required. (DMU 2.2.5) A code sketch of direct sampling with rejection follows.
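A sketch of direct sampling with rejection for this query, reusing the hypothetical CPTs from the first code sketch (the slides give the network structure but no numbers).

```python
import random  # reuses P_B, P_S, P_E, P_D, P_C from the first sketch

def sample_once():
    """Sample all variables in topological order, parents first."""
    b = random.random() < P_B
    s = random.random() < P_S
    e = random.random() < P_E[(b, s)]
    d = random.random() < P_D[e]
    c = random.random() < P_C[e]
    return b, s, e, d, c

def estimate_b_given_dc(n=100_000):
    """Rejection estimate of P(b1 | d1, c1): keep only rows with D=C=1."""
    consistent = b_true = 0
    for _ in range(n):
        b, s, e, d, c = sample_once()
        if d and c:
            consistent += 1
            b_true += b
    return b_true / consistent if consistent else float('nan')

print(estimate_b_given_dc())  # few samples survive when the evidence is unlikely
```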
What is the current approximation for P(B=true | D=true, C=true)? Two of the eight samples are consistent with the evidence (D=1, C=1), and one of them has B=1, so the current estimate is 1/2.
Approximate sampling approaches

- Likelihood weighting involves generating samples that are consistent with the evidence and weighting them (see the sketch below)
- Gibbs sampling involves making random local changes to samples (a form of Markov chain Monte Carlo)
- Many other approaches
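A sketch of likelihood weighting for the same query: evidence variables are fixed rather than sampled, and each sample is weighted by the likelihood of the evidence given its sampled parents. It again reuses the hypothetical CPTs and the bern helper from the first sketch.

```python
import random  # reuses P_B, P_S, P_E, P_D, P_C and bern from the first sketch

def weighted_sample(d_obs=True, c_obs=True):
    """Sample non-evidence variables; weight by P(evidence | parents)."""
    b = random.random() < P_B
    s = random.random() < P_S
    e = random.random() < P_E[(b, s)]
    # Evidence is fixed, not sampled; it contributes its likelihood instead.
    w = bern(P_D[e], d_obs) * bern(P_C[e], c_obs)
    return b, w

def lw_estimate(n=100_000):
    """Weighted estimate of P(b1 | d1, c1); no samples are wasted."""
    num = den = 0.0
    for _ in range(n):
        b, w = weighted_sample()
        num += w * b
        den += w
    return num / den

print(lw_estimate())
```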