CPSC 422, Lecture 18 Slide 1
Intelligent Systems (AI-2)
Computer Science, CPSC 422, Lecture 18
Feb 25, 2015
Slide sources: Raymond J. Mooney, University of Texas at Austin
D. Koller, Stanford CS, Probabilistic Graphical Models
CPSC 422, Lecture 17 2
Lecture Overview
Probabilistic Graphical Models
• Recap: Markov Networks
• Applications of Markov Networks
• Inference in Markov Networks (exact and approximate)
• Conditional Random Fields
Parameterization of Markov Networks
CPSC 422, Lecture 17 Slide 3
Factors define the local interactions (like CPTs in Bnets)
What about the global model? What do you do with Bnets?
How do we combine local models?
As in BNets by multiplying them!
CPSC 422, Lecture 17 Slide 4
Step back… from structure to factors/potentials
In a Bnet the joint is factorized….
CPSC 422, Lecture 17 Slide 5
In a Markov Network you have one factor for each maximal clique
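Written out side by side, the two factorizations are (standard notation: $\mathrm{Pa}(X_i)$ denotes the parents of $X_i$, $C$ ranges over the maximal cliques, and $Z$ is the partition function):

```latex
P(X_1,\dots,X_n) \;=\; \prod_{i=1}^{n} P\big(X_i \mid \mathrm{Pa}(X_i)\big)
\qquad \text{(Bnet)}

P(X_1,\dots,X_n) \;=\; \frac{1}{Z}\prod_{C} \phi_C(X_C),
\qquad Z = \sum_{X_1,\dots,X_n}\prod_{C}\phi_C(X_C)
\qquad \text{(Markov network)}
```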
CPSC 422, Lecture 17 6
General definitions
Two nodes in a Markov network are independent if and only if every path between them is cut off by evidence.
So the Markov blanket of a node is…?
E.g., for C
E.g., for A and C
CPSC 422, Lecture 17 7
Lecture Overview
Probabilistic Graphical Models
• Recap: Markov Networks
• Applications of Markov Networks
• Inference in Markov Networks (exact and approximate)
• Conditional Random Fields
Markov Networks Applications (1): Computer Vision
Called Markov Random Fields
• Stereo reconstruction
• Image segmentation
• Object recognition
CPSC 422, Lecture 17
Typically pairwise MRFs
• Each variable corresponds to a pixel (or superpixel)
• Edges (factors) correspond to interactions between adjacent pixels in the image
• E.g., in segmentation: from generically penalizing discontinuities, to relations such as "road under car"
CPSC 422, Lecture 17 9
Image segmentation
Markov Networks Applications (2): Sequence Labeling in NLP and Bioinformatics
Conditional Random Fields
CPSC 422, Lecture 17
CPSC 422, Lecture 17 11
Lecture Overview
Probabilistic Graphical Models
• Recap: Markov Networks
• Applications of Markov Networks
• Inference in Markov Networks (exact and approximate)
• Conditional Random Fields
CPSC 422, Lecture 17 Slide 12
Variable elimination algorithm for Bnets
To compute P(Z | Y1=v1, …, Yj=vj):
1. Construct a factor for each conditional probability.
2. Set the observed variables to their observed values.
3. Given an elimination ordering, simplify/decompose the sum of products.
4. Perform products and sum out the Zi.
5. Multiply the remaining factors.
6. Normalize: divide the resulting factor f(Z) by Σ_Z f(Z).

Variable elimination algorithm for Markov Networks…
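For Markov networks the recipe is identical, except that the initial factors are the clique potentials (there are no CPTs), and the final normalization in step 6 also absorbs the partition function Z. A minimal sketch in Python over binary variables, with an illustrative two-factor chain A–B–C (the factor tables are made up for the example):

```python
from itertools import product

# A factor maps each assignment (a tuple of 0/1 values for its variables) to a weight.

def multiply(f1, vars1, f2, vars2):
    """Pointwise product of two factors; returns (factor, variable list)."""
    allvars = list(dict.fromkeys(vars1 + vars2))  # union, order-preserving
    out = {}
    for assign in product([0, 1], repeat=len(allvars)):
        a = dict(zip(allvars, assign))
        out[assign] = f1[tuple(a[v] for v in vars1)] * f2[tuple(a[v] for v in vars2)]
    return out, allvars

def sum_out(f, vars_, var):
    """Marginalize var out of factor f."""
    keep = [v for v in vars_ if v != var]
    out = {}
    for assign, val in f.items():
        a = dict(zip(vars_, assign))
        key = tuple(a[v] for v in keep)
        out[key] = out.get(key, 0.0) + val
    return out, keep

# Illustrative clique potentials for the chain A - B - C.
phi1 = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}  # over (A, B)
phi2 = {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 2.0, (1, 1): 1.0}  # over (B, C)

# Query P(C): multiply, eliminate A then B, normalize (this replaces Z).
f, vs = multiply(phi1, ['A', 'B'], phi2, ['B', 'C'])
f, vs = sum_out(f, vs, 'A')
f, vs = sum_out(f, vs, 'B')
Z = sum(f.values())
p = {c: f[(c,)] / Z for c in (0, 1)}
print(p)
```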
Gibbs sampling for Markov Networks
Example: P(D | C=0)
Resample non-evidence variables in a pre-defined order or in a random order.
Suppose we begin with A.
[Network figure: nodes A; B, C; D, E; F]
A B C D E F
1 0 0 1 1 0
Note: never change the evidence!
What do we need to sample?
A. P(A | B=0)
B. P(A | B=0, C=0)
C. P(B=0, C=0 | A)
CPSC 422, Lecture 17
CPSC 422, Lecture 17 14
Example: Gibbs sampling
Resample from the probability distribution P(A | B, C)
[Network figure: nodes A; B, C; D, E; F, with factors ϕ1, ϕ2, ϕ3]

Factor over (A, C):
        A=1   A=0
C=1      1     2
C=0      3     4

Factor over (A, B):
        A=1   A=0
B=1      1     5
B=0     4.3   0.2

A B C D E F
1 0 0 1 1 0
? 0 0 1 1 0

Φ1 × Φ2 × Φ3 =   A=1: 12.9   A=0: 0.8
Normalized result:   A=1: 0.94   A=0: 0.06
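This resampling step can be reproduced directly: with B = 0 and C = 0 fixed, only the factors that mention A matter, and A's new value is drawn from their normalized product. A minimal sketch in Python, using the two factor tables from the slide:

```python
# Factors touching A, keyed by (a, c) and (a, b) with values in {1, 0}.
phi_ac = {(1, 1): 1.0, (0, 1): 2.0, (1, 0): 3.0, (0, 0): 4.0}  # over (A, C)
phi_ab = {(1, 1): 1.0, (0, 1): 5.0, (1, 0): 4.3, (0, 0): 0.2}  # over (A, B)

b, c = 0, 0  # current values of A's Markov blanket (C is evidence, so it stays 0)

# Unnormalized product of the factors touching A, for each value of A.
unnorm = {a: phi_ac[(a, c)] * phi_ab[(a, b)] for a in (1, 0)}
Z = sum(unnorm.values())
p = {a: v / Z for a, v in unnorm.items()}

print(unnorm)  # A=1: 12.9, A=0: 0.8, as on the slide
print(p)       # normalized distribution to sample A's new value from
```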
CPSC 422, Lecture 17 15
Example: Gibbs sampling
Resample from the probability distribution of B given A and D
[Network figure: nodes A; B, C; D, E; F, with factors ϕ1, ϕ2, ϕ4]

Factor over (B, D):
        D=1   D=0
B=1      1     2
B=0      2     1

Factor over (A, B):
        A=1   A=0
B=1      1     5
B=0     4.3   0.2

A B C D E F
1 0 0 1 1 0
1 0 0 1 1 0
1 ? 0 1 1 0

Φ1 × Φ2 × Φ4 =   B=1: 1   B=0: 8.6
Normalized result:   B=1: 0.10   B=0: 0.90
CPSC 422, Lecture 17 16
Lecture Overview
Probabilistic Graphical Models
• Recap: Markov Networks
• Applications of Markov Networks
• Inference in Markov Networks (exact and approximate)
• Conditional Random Fields
We want to model P(Y1 | X1 … Xn), where all the Xi are always observed.
• Which model is simpler, MN or BN?
CPSC 422, Lecture 18 Slide 17
[Figure: Y1 connected to X1, X2, …, Xn, once as a Markov network (MN) and once as a Bayesian network (BN)]
• Naturally aggregates the influence of different parents
Conditional Random Fields (CRFs)
• Model P(Y1 … Yk | X1 … Xn)
• Special case of Markov networks where all the Xi are always observed
• Simple case: P(Y1 | X1 … Xn)
CPSC 422, Lecture 18 Slide 18
What are the Parameters?
CPSC 422, Lecture 18 Slide 19
Let’s derive the probabilities we need
CPSC 422, Lecture 18 Slide 20
$\phi_i(X_i, Y_1) = \exp\{\, w_i\, \mathbf{1}\{X_i = 1,\ Y_1 = 1\} \,\}$
$\phi_0(Y_1) = \exp\{\, w_0\, \mathbf{1}\{Y_1 = 1\} \,\}$
Y1
X1 X2 … Xn
Let’s derive the probabilities we need
CPSC 422, Lecture 18 Slide 21
$\phi_i(X_i, Y_1) = \exp\{\, w_i\, \mathbf{1}\{X_i = 1,\ Y_1 = 1\} \,\}$
$\phi_0(Y_1) = \exp\{\, w_0\, \mathbf{1}\{Y_1 = 1\} \,\}$
Y1
X1 X2 … Xn
Let’s derive the probabilities we need
CPSC 422, Lecture 18 Slide 22
$\phi_i(X_i, Y_1) = \exp\{\, w_i\, \mathbf{1}\{X_i = 1,\ Y_1 = 1\} \,\}$
$\phi_0(Y_1) = \exp\{\, w_0\, \mathbf{1}\{Y_1 = 1\} \,\}$
Y1
X1 X2 … Xn
Let’s derive the probabilities we need
CPSC 422, Lecture 18 Slide 23
Y1
X1 X2 … Xn
$P(Y_1 = 1, x_1, \dots, x_n) = \exp\!\left(w_0 + \sum_{i=1}^{n} w_i x_i\right)$
$P(Y_1 = 0, x_1, \dots, x_n) = 1$
$P(Y_1 = 1 \mid x_1, \dots, x_n) = \;?$
Let’s derive the probabilities we need
CPSC 422, Lecture 18 Slide 24
Y1
X1 X2 … Xn
$P(Y_1 = 1, x_1, \dots, x_n) = \exp\!\left(w_0 + \sum_{i=1}^{n} w_i x_i\right)$
$P(Y_1 = 0, x_1, \dots, x_n) = 1$
$P(Y_1 = 1 \mid x_1, \dots, x_n) = \dfrac{\exp\!\left(w_0 + \sum_{i=1}^{n} w_i x_i\right)}{1 + \exp\!\left(w_0 + \sum_{i=1}^{n} w_i x_i\right)} = \sigma\!\left(w_0 + \sum_{i=1}^{n} w_i x_i\right)$
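Dividing the unnormalized measure for Y1 = 1 by the sum of both cases gives the conditional directly: P(Y1 = 1 | x) is the sigmoid of the weighted feature sum. A small numerical check (the weights and features below are illustrative, not from the lecture):

```python
import math

def p_y1_given_x(w0, w, x):
    """P(Y1 = 1 | x) for the naive Markov (logistic regression) model."""
    s = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return math.exp(s) / (1.0 + math.exp(s))  # = sigmoid(s)

# Illustrative parameters: bias w0 = -1 and three binary features.
p = p_y1_given_x(-1.0, [2.0, -0.5, 1.0], [1, 0, 1])
print(p)  # sigmoid(-1 + 2 + 1) = sigmoid(2) ≈ 0.88
```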
Sigmoid Function used in Logistic Regression
• Of great practical interest
• The number of parameters wi is linear instead of exponential in the number of parents
• A natural model for many real-world applications
• Naturally aggregates the influence of different parents
CPSC 422, Lecture 18 Slide 25
Y1
X1 X2 … Xn
Y1
X1 X2 … Xn
Logistic Regression as a Markov Net (CRF)
Logistic regression is a simple Markov net (a CRF), aka a naïve Markov model
Y
X1 X2… Xn
• But it only models the conditional distribution P(Y | X), not the full joint P(X, Y)
CPSC 422, Lecture 18 Slide 26
Naïve Bayes vs. Logistic Regression

Naïve Bayes (Generative):
Y
X1 X2 … Xn

Logistic Regression / Naïve Markov (Discriminative, Conditional):
Y
X1 X2 … Xn
CPSC 422, Lecture 18 Slide 27
CPSC 422, Lecture 18 Slide 28
Learning Goals for today’s class
You can:
• Perform Exact and Approx. Inference in Markov Networks
• Describe a few applications of Markov Networks
• Describe a natural parameterization for a Naïve Markov model (which is a simple CRF)
• Derive how P(Y|X) can be computed for a Naïve Markov model
• Explain the discriminative vs. generative distinction and its implications
CPSC 422, Lecture 18 Slide 29
Next class
Revise generative temporal models (HMM)
To Do
Linear-chain CRFs
Midterm will be marked by Fri
Generative vs. Discriminative Models
Generative models (like Naïve Bayes): not directly designed to maximize performance on classification. They model the joint distribution P(X,Y).
Classification is then done using Bayesian inference
But a generative model can also be used to perform any other inference task, e.g., P(X1 | X2, …, Xn).
• "Jack of all trades, master of none."
Discriminative models (like CRFs): specifically designed and trained to maximize performance of classification. They only model the conditional distribution P(Y | X ).
By focusing on modeling the conditional distribution, they generally perform better on classification than generative models when given a reasonable amount of training data.
CPSC 422, Lecture 18
On Fri: Sequence Labeling
HMM (Generative):
Y1 Y2 … YT
X1 X2 … XT

Linear-chain CRF (Discriminative, Conditional):
Y1 Y2 … YT
X1 X2 … XT
CPSC 422, Lecture 18 Slide 31
CPSC 422, Lecture 18 32
Lecture Overview
• Indicator function
• P(X,Y) vs. P(Y|X) and Naïve Bayes
• Model P(Y|X) explicitly with Markov Networks
  • Parameterization
  • Inference
• Generative vs. Discriminative models
P(X,Y) vs. P(Y|X)
Assume that you always observe a set of variables X = {X1…Xn} and you want to predict one or more variables Y = {Y1…Ym}.
You can model P(X,Y) and then infer P(Y|X).
Slide 33, CPSC 422, Lecture 18
P(X,Y) vs. P(Y|X)
With a Bnet we can represent a joint as the product of conditional probabilities.
With a Markov Network we can represent a joint as the product of factors.
Slide 34, CPSC 422, Lecture 18
We will see that Markov Networks are also suitable for representing the conditional probability P(Y|X) directly.
Directed vs. Undirected
CPSC 422, Lecture 18 Slide 35
Factorization
Naïve Bayesian Classifier P(Y,X)
A very simple and successful Bnet that classifies entities into one of a set of classes (the values of Y1), given a set of features (X1…Xn).

Example:
• Determine whether an email is spam (only two classes: spam=T and spam=F)
• Useful attributes of an email?

Assumptions:
• The value of each attribute depends on the classification
• (Naïve) The attributes are independent of each other given the classification:
P("bank" | "account", spam=T) = P("bank" | spam=T)
Slide 36, CPSC 422, Lecture 18
Naïve Bayesian Classifier for Email Spam
• What is the structure?
[Figure: class node "Email Spam" with feature nodes: Email contains "free", Email contains "money", Email contains "ubc", Email contains "midterm"]
The corresponding Bnet represents: P(Y1, X1…Xn)
Slide 37, CPSC 422, Lecture 18
NB Classifier for Email Spam: Usage
Can we derive P(Y1 | X1…Xn) for any x1…xn?
"free money for you now"
[Figure: class node "Email Spam" with feature nodes: Email contains "free", "money", "ubc", "midterm"]
But you can also perform any other inference, e.g., P(X1 | X3)
Slide 38, CPSC 422, Lecture 18
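The spam query can be sketched as code: by Bayes' rule with the naïve independence assumption, P(spam | attributes) is proportional to the class prior times the per-attribute conditionals. All numerical parameters below are made up for illustration (the lecture gives none):

```python
# Hypothetical NB parameters for spam classification (illustrative numbers only).
prior = {True: 0.4, False: 0.6}  # P(spam)
# P(word appears in the email | spam) for each attribute.
p_word = {
    "free":    {True: 0.60, False: 0.10},
    "money":   {True: 0.50, False: 0.05},
    "ubc":     {True: 0.02, False: 0.30},
    "midterm": {True: 0.01, False: 0.20},
}

def p_spam(observed):
    """P(spam = T | attribute values), via Bayes' rule and the naive assumption."""
    unnorm = {}
    for s in (True, False):
        v = prior[s]
        for word, present in observed.items():
            pw = p_word[word][s]
            v *= pw if present else (1.0 - pw)
        unnorm[s] = v
    return unnorm[True] / (unnorm[True] + unnorm[False])

# "free money for you now": contains "free" and "money", not "ubc" or "midterm".
result = p_spam({"free": True, "money": True, "ubc": False, "midterm": False})
print(result)
```

With these made-up parameters the posterior comes out strongly in favor of spam, as expected for an email containing "free" and "money" but no course-related words.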
NB Classifier for Email Spam: Usage
Can we derive P(Y1 | X1…Xn)?
"free money for you now"
[Figure: class node "Email Spam" with feature nodes: Email contains "free", "money", "ubc", "midterm"]
But you can also perform any other inference, e.g., P(X1 | X3)
Slide 39, CPSC 422, Lecture 18
CPSC 422, Lecture 18 Slide 40