TRANSCRIPT
MAXIMUM ENTROPY MARKOV MODEL
Adapted from: Heshaam Faili, University of Tehran
100050052 – Dikkala Sai Nishanth
100050056 – Ashwin P. Paranjape
100050057 – Vipul Singh
Introduction
• Need for MEMM in NLP
• MEMM and the feature and weight vectors
• Linear and Logistic Regression (MEMM)
• Learning in logistic regression
• Why call it Maximum Entropy?
Need for MEMM in NLP
• HMM – the tag depends only on the previous tag, and the observed word depends only on its own tag
• Need to account for the dependency of the tag on the observed word
• Need to extract "features" from the word and use them
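The contrast between the two factorizations can be sketched in code. A toy illustration, in which all probability tables are invented for the example:

```python
# Toy comparison of the HMM vs. MEMM factorization for one tagging step.
# All probability values below are made-up illustrative numbers.

# HMM: the tag depends on the previous tag; the word is generated from the tag.
p_tag_given_prev = {("DET", "NOUN"): 0.7, ("DET", "VERB"): 0.3}
p_word_given_tag = {("race", "NOUN"): 0.02, ("race", "VERB"): 0.01}

def hmm_score(prev_tag, tag, word):
    """P(tag | prev_tag) * P(word | tag): the word depends only on its tag."""
    return p_tag_given_prev[(prev_tag, tag)] * p_word_given_tag[(word, tag)]

# MEMM: the tag is conditioned directly on BOTH the previous tag and the
# observed word, so word-based evidence can influence the tag choice.
p_tag_given_prev_and_word = {
    ("DET", "race", "NOUN"): 0.8,
    ("DET", "race", "VERB"): 0.2,
}

def memm_score(prev_tag, tag, word):
    """P(tag | prev_tag, word): a single conditional distribution."""
    return p_tag_given_prev_and_word[(prev_tag, word, tag)]

print(hmm_score("DET", "NOUN", "race"))
print(memm_score("DET", "NOUN", "race"))
```

Note that the MEMM scores for a fixed (previous tag, word) pair already sum to one over the candidate tags, whereas the HMM scores are joint probabilities that must be normalized separately.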
MAXIMUM ENTROPY MODELS
• Machine learning framework called Maximum Entropy modeling, MaxEnt
• Used for classification
– The task of classification is to take a single observation, extract some useful features describing the observation, and then, based on these features, classify the observation into one of a set of discrete classes.
• Probabilistic classifier: gives the probability of the observation being in each class
• Non-sequential classification
– In text classification we might need to decide whether a particular email should be classified as spam or not
– In sentiment analysis we have to determine whether a particular sentence or document expresses a positive or negative opinion
– We'll need to classify a period character ('.') as either a sentence boundary or not
MaxEnt
• MaxEnt belongs to the family of classifiers known as exponential or log-linear classifiers
• MaxEnt works by extracting a set of features from the input, combining them linearly (multiplying each by a weight and adding them up), and then using this sum as an exponent
• Example: tagging
– A feature for tagging might be "this word ends in -ing" or "the previous word was 'the'"
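The "extract features, weight them, sum, exponentiate" recipe can be sketched as follows; the feature names and weight values are hypothetical:

```python
import math

# Hypothetical binary features for tagging the word "racing", with
# invented weights for one candidate tag.
features = {"ends_in_ing": 1, "prev_word_is_the": 0, "is_capitalized": 0}
weights  = {"ends_in_ing": 1.2, "prev_word_is_the": -0.5, "is_capitalized": 0.3}

# Linear combination: multiply each feature by its weight and add them up...
score = sum(weights[f] * v for f, v in features.items())

# ...then use that sum as an exponent to get the unnormalized log-linear score.
unnormalized = math.exp(score)
print(score, unnormalized)
```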
Linear Regression
• Two different names for tasks that map some input features into some output value: regression when the output is real-valued, and classification when the output is one of a discrete set of classes
price = w0 + w1 * Num Adjectives
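As a sketch, the one-feature regression can be evaluated directly; the weight values below are invented for illustration:

```python
# price = w0 + w1 * Num_Adjectives, with hypothetical example weights:
# an intercept and a negative slope (more adjectives, lower price).
w0, w1 = 16550.0, -4900.0

def predict_price(num_adjectives):
    """Predicted price for a listing with the given number of adjectives."""
    return w0 + w1 * num_adjectives

print(predict_price(3))  # 16550 - 4900*3 = 1850.0
```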
Multiple linear regression
• price = w0 + w1 * Num Adjectives + w2 * Mortgage Rate + w3 * Num Unsold Houses
Learning in linear regression
• Minimize the sum-squared error of the predictions over the training set
• X is the matrix of feature vectors
• y is the vector of costs
• Closed-form solution (the normal equations): W = (XᵀX)⁻¹ Xᵀ y
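As a sketch of learning by minimizing the sum-squared error, the one-feature case has a simple closed-form fit; the data points below are made up:

```python
# Fit y = w0 + w1 * x by minimizing the sum-squared error.
# For a single feature, the least-squares solution has a closed form.
xs = [1.0, 2.0, 3.0, 4.0]   # hypothetical feature values
ys = [3.1, 5.0, 6.9, 9.1]   # hypothetical observed costs

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# w1 = covariance(x, y) / variance(x); w0 makes the line pass through the means.
w1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
     sum((x - mean_x) ** 2 for x in xs)
w0 = mean_y - w1 * mean_x
print(w0, w1)
```

The same fit in matrix form is the normal-equations solution with an all-ones intercept column in X.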
Logistic regression
• Classification in which the output y we are trying to predict takes on one of a small set of discrete values
• Binary classification: y ∈ {0, 1}
• Odds: p / (1 − p)
• Logit function: logit(p) = ln( p / (1 − p) )
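The odds and logit transforms can be sketched directly; the input probabilities are chosen arbitrarily:

```python
import math

def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1.0 - p)

def logit(p):
    """The logit (log-odds) function: maps (0, 1) onto the whole real line."""
    return math.log(odds(p))

def sigmoid(z):
    """Inverse of the logit: maps any real score back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

print(odds(0.8))            # about 4.0
print(logit(0.5))           # 0.0
print(sigmoid(logit(0.8)))  # recovers about 0.8
```

Because the logit is unbounded, a linear model w · f can be fit to it directly, which is exactly what logistic regression does.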
Logistic regression
Logistic regression
Logistic regression: Classification
• The decision boundary learned by logistic regression is a hyperplane in feature space
Learning in logistic regression
• The weights are learned by conditional maximum likelihood estimation: choose the weights that maximize the log probability of the observed labels given the observations
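A toy sketch of conditional maximum likelihood estimation by gradient ascent for binary logistic regression; the data set and learning rate are invented:

```python
import math

# Toy training set: one feature per example, binary labels.
data = [(0.0, 0), (1.0, 0), (2.0, 1), (3.0, 1)]  # hypothetical
w, b = 0.0, 0.0   # weight and bias, initialized to zero
lr = 0.5          # learning rate

for _ in range(2000):
    # Gradient of the conditional log-likelihood sum of log P(y | x).
    gw = gb = 0.0
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # P(y = 1 | x)
        gw += (y - p) * x
        gb += (y - p)
    w += lr * gw   # ascend the gradient to increase the likelihood
    b += lr * gb

# After training, the model should separate the two classes.
p_low  = 1.0 / (1.0 + math.exp(-(w * 0.0 + b)))
p_high = 1.0 / (1.0 + math.exp(-(w * 3.0 + b)))
print(p_low, p_high)
```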
Learning in logistic regression
Convex Optimization
PS – GIS (Generalized Iterative Scaling), if needed, will be inserted here.
MAXIMUM ENTROPY MODELING
• Multinomial logistic regression (MaxEnt)
– Most of the time, classification problems that come up in language processing involve larger numbers of classes (e.g., part-of-speech classes)
• y takes on one of C different values corresponding to the classes c1, …, cC
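In the multinomial case, the exponentiated scores are normalized over all C classes with a softmax; a minimal sketch in which the class names and scores are hypothetical:

```python
import math

def softmax(scores):
    """Turn per-class log-linear scores into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores w . f(x, c) for three part-of-speech classes.
classes = ["NN", "VB", "JJ"]
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
print(dict(zip(classes, probs)))
```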
Maximum Entropy Modeling
• Indicator function: A feature that only takes on the values 0 and 1
Maximum Entropy Modeling
• Example
– Secretariat/NNP is/BEZ expected/VBN to/TO race/?? tomorrow/
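The example can be phrased as indicator features over the (observation, class) pair; the specific feature templates below are illustrative, not taken from the slides:

```python
# Indicator features f(x, c) for tagging "race" in:
#   Secretariat/NNP is/BEZ expected/VBN to/TO race/?? tomorrow
# Each feature fires (value 1) only for a particular class.

def f1(x, c):
    """Fires when the previous tag is TO and the candidate class is VB."""
    return 1 if x["prev_tag"] == "TO" and c == "VB" else 0

def f2(x, c):
    """Fires when the word ends in -s and the candidate class is NNS."""
    return 1 if x["word"].endswith("s") and c == "NNS" else 0

x = {"word": "race", "prev_tag": "TO"}
print(f1(x, "VB"), f1(x, "NN"), f2(x, "NNS"))  # 1 0 0
```

Binding each feature to a specific class is what lets a single weight vector score every (observation, class) pair in the multinomial model.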
Maximum Entropy Modeling
Occam's Razor
• Adopting the least complex hypothesis possible is embodied in Occam's razor ("Nunquam ponenda est pluralitas sine necessitate": plurality should never be posited without necessity)
Why do we call it Maximum Entropy?
• Of all possible distributions, the equiprobable distribution has the maximum entropy
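This can be checked numerically: the Shannon entropy H(p) = −Σ p log₂ p is largest for the equiprobable distribution. The example distributions below are arbitrary:

```python
import math

def entropy(p):
    """Shannon entropy in bits; the 0 * log(0) terms are taken as 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

uniform = [0.25, 0.25, 0.25, 0.25]   # equiprobable over 4 outcomes
skewed  = [0.7, 0.1, 0.1, 0.1]
peaked  = [1.0, 0.0, 0.0, 0.0]

print(entropy(uniform))  # 2.0 bits, the maximum for 4 outcomes
print(entropy(skewed))   # less than 2 bits
print(entropy(peaked))   # 0 bits: the outcome is certain
```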
Why do we call it Maximum Entropy?
Maximum Entropy
• The maximum entropy distribution subject to the feature constraints is exactly the probability distribution of a multinomial logistic regression model whose weights W maximize the likelihood of the training data. Thus the exponential model is also the maximum entropy model.
References
• Jurafsky, Daniel and Martin, James H. (2006). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice-Hall.
• Berger, Adam L., Della Pietra, Vincent J., and Della Pietra, Stephen A. (1996). A maximum entropy approach to natural language processing. Computational Linguistics, 22(1), 39-71.
• Ratnaparkhi, Adwait. (1996). A Maximum Entropy Model for Part-of-Speech Tagging. Proceedings of EMNLP 1996, pp. 133-142.
THANK YOU