Presented by Jian-Shiun Tzeng 4/9/2009
Chapter 6. Hidden Markov and Maximum Entropy Models
Daniel Jurafsky and James H. Martin, 2008
Introduction
• Maximum Entropy (MaxEnt)
– More widely known as multinomial logistic regression
• Begin from a non-sequential classifier
– A probabilistic classifier
– An exponential or log-linear classifier
– Text classification
– Sentiment analysis
• Positive or negative opinion
– Sentence boundary detection
Linear Regression
• x(j): a particular training instance
• y(j)obs: the observed label of x(j) in the training set
• y(j)pred: the value predicted by the linear regression model
• Training minimizes the sum-squared error between the predicted and observed values
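Reconstructed from these definitions (the slide's own formulas are images), the model and its sum-squared error cost in the textbook's notation:

```latex
y_{pred} = \sum_{i} w_i f_i
\qquad
cost(W) = \sum_{j} \left( y^{(j)}_{pred} - y^{(j)}_{obs} \right)^2
```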
Logistic Regression – simplest case of binary classification
• Consider whether x is in class (1, true) or not (0, false)
• A probability p = P(y=1|x) must lie in [0, 1], but the linear score w · f lies in (−∞, ∞), so the two cannot be equated directly
• The odds p/(1−p) lie in [0, ∞)
• The log odds (logit) ln(p/(1−p)) lie in (−∞, ∞), the same range as w · f
• Setting ln(p/(1−p)) = w · f and solving for p yields the logistic (sigmoid) function
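As a minimal sketch of the resulting classifier (weights and feature values below are illustrative, not from the slides):

```python
import math

def sigmoid(z):
    """Logistic function: maps a score in (-inf, inf) to a probability in [0, 1]."""
    return 1.0 / (1.0 + math.exp(-z))

def p_true(weights, features):
    """P(y=1 | x) = sigmoid(w . f), the binary logistic regression model."""
    z = sum(w * f for w, f in zip(weights, features))
    return sigmoid(z)

# Hypothetical weights and binary feature values for one observation.
w = [1.5, -0.8, 0.3]
f = [1, 0, 1]
print(p_true(w, f))  # ~0.858
```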
Logistic Regression – Classification
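The slide body is an image; the standard decision rule it presumably illustrates is to classify x as true when P(y=1|x) > 0.5, which, since the logistic function crosses 0.5 at input 0, is equivalent to checking w · f > 0.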
Advanced: Learning in logistic regression
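The slide content is an image. As a minimal sketch (all names and data hypothetical): the weights are learned by maximizing the conditional likelihood of the training data, for example with stochastic gradient ascent:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(data, n_features, lr=0.1, epochs=100):
    """Stochastic gradient ascent on the conditional log-likelihood.
    data: list of (features, label) pairs with label in {0, 1}."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for f, y in data:
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)))
            # Gradient of log P(y|x) with respect to w_i is (y - p) * f_i
            for i in range(n_features):
                w[i] += lr * (y - p) * f[i]
    return w

# Tiny hypothetical training set.
data = [([1, 0], 1), ([0, 1], 0), ([1, 1], 1)]
print(train_logreg(data, n_features=2))
```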
Maximum Entropy Modeling
• Input: x (a word to be tagged, or a document to be classified)
– Features, for example:
• Ends in -ing
• Previous word is "the"
– Each feature fi has a weight wi
– A particular class c
– Z is a normalizing factor, used to make the probabilities sum to 1
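A minimal sketch of the two indicator features named above (function names are illustrative):

```python
def f1(word, prev_word):
    """Indicator feature: word ends in -ing."""
    return 1 if word.endswith("ing") else 0

def f2(word, prev_word):
    """Indicator feature: previous word is "the"."""
    return 1 if prev_word == "the" else 0

print(f1("racing", "the"), f2("racing", "the"))  # 1 1
```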
Maximum Entropy Modeling
C = {c1, c2, …, cC}
Normalization
fi: A feature that takes on only the values 0 and 1 is also called an indicator function
In MaxEnt, instead of the notation fi, we will often use the notation fi(c, x), meaning the feature i for a particular class c and a given observation x
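With this notation, the slide's formula (lost as an image) can be reconstructed as the textbook's MaxEnt model, where the denominator is the normalizing factor Z:

```latex
P(c \mid x) = \frac{\exp\left( \sum_{i} w_{ci}\, f_i(c, x) \right)}
                   {\sum_{c' \in C} \exp\left( \sum_{i} w_{c'i}\, f_i(c', x) \right)}
```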
Maximum Entropy Modeling – Assume C = {NN, VB}
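The worked example on this slide is an image; a hypothetical stand-in (feature definitions and weights invented for illustration) shows how the class probabilities are computed:

```python
import math

# Hypothetical indicator features f_i(c, x) and weights for C = {NN, VB},
# where x is the word "racing" with previous word "the".
features = {
    "NN": [1, 1],   # ends in -ing and c=NN; prev word "the" and c=NN
    "VB": [1, 0],   # ends in -ing and c=VB; prev word "to" and c=VB
}
weights = {
    "NN": [0.8, 1.2],
    "VB": [0.9, 0.5],
}

# Unnormalized score exp(sum_i w_i * f_i(c, x)) for each class.
scores = {c: math.exp(sum(w * f for w, f in zip(weights[c], features[c])))
          for c in features}
Z = sum(scores.values())                 # normalizing factor
probs = {c: s / Z for c, s in scores.items()}
print(probs)  # NN ~0.75, VB ~0.25
```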
Learning Maximum Entropy Model
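The slide body is an image; per the textbook, MaxEnt models are trained by choosing the weights that maximize the conditional log-likelihood of the training data (a convex optimization problem):

```latex
\hat{w} = \operatorname*{argmax}_{w} \sum_{j} \log P\bigl(c^{(j)} \mid x^{(j)}\bigr)
```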
HMM vs. MEMM
• An MEMM can condition on any useful feature of the input observation; in an HMM this isn't possible
[Figure: HMM vs. MEMM graphical structures, with word and class nodes]
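Reconstructed from the textbook (the slide's formulas are part of the lost figure), the contrast for tagging is:

```latex
\text{HMM:} \quad \hat{T} = \operatorname*{argmax}_{T} \prod_i P(word_i \mid tag_i)\, P(tag_i \mid tag_{i-1})
\\
\text{MEMM:} \quad \hat{T} = \operatorname*{argmax}_{T} \prod_i P(tag_i \mid word_i, tag_{i-1})
```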
Conditional Random Fields (CRFs)
• CRFs (Lafferty, McCallum, and Pereira, 2001) constitute another conditional model based on maximum entropy
• Like MEMMs, CRFs can accommodate many possibly correlated features of the observation
• However, CRFs are better able to trade off decisions at different sequence positions
• MEMMs were found to suffer from the label bias problem
Label Bias
• The problem appears when the MEMM contains states with different numbers of outgoing transitions (out-degrees)
• Because the probabilities of the transitions out of any given state must sum to 1, transitions from low out-degree states receive higher probabilities than transitions from high out-degree states
• In the extreme case, the single transition out of a state with out-degree 1 always gets probability 1, effectively ignoring the observation; the sketch after this list illustrates this numerically
• CRFs do not have this problem because they define a single maximum-entropy distribution over the whole label sequence
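A minimal numeric sketch of the label bias problem, adapted from the classic "rib"/"rob" example of Lafferty et al. (the probabilities below are hypothetical):

```python
# Two label paths compete for the observation sequence "r i b":
#   path "rib": r -> i -> b
#   path "rob": r -> o -> b
# In an MEMM each state locally normalizes its outgoing transitions.

# The start state has two outgoing transitions, so the observation
# matters there; assume it slightly prefers the wrong branch.
p_start = {"rib": 0.4, "rob": 0.6}

# Each branch state has out-degree 1: its single transition gets
# probability 1.0 no matter how well the observation matches.
p_branch = 1.0

for path in ("rib", "rob"):
    # Total path probability = product of locally normalized steps.
    total = p_start[path] * p_branch * p_branch
    print(path, total)

# The "rob" path wins (0.6 vs 0.4) even though the middle observation
# is "i": the degree-1 states were forced to ignore it.
```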