natural language processingrobotics.cs.tamu.edu/dshell/cs420/nlp.pdf · 2019-12-01 · natural...
Natural Language Processing
Nov 19, 2019

Outline:
Introduction
Regular Expression
    Notations
    Examples
Text Classification
    Naive Bayes
    Example
Language Modelling
    Unigram, Bigram and N-gram
Who wrote the Federalist Papers?
1787-8: anonymous essays try to convince New York to ratify the U.S. Constitution. Authors: Jay, Madison, Hamilton.
Authorship of 12 of the letters is in dispute.
1963: solved by Mosteller and Wallace using Bayesian methods.
By the end of this lecture we will see how to do that.
What makes it hard?
Formal languages are unambiguous.
Natural languages are ambiguous:
“He saw her duck.”
“Time flies like an arrow. Fruit flies like a banana.”
By the end of the class
By the end of the class we will see how to do:
1. Text classification, e.g. spam detection, authorship identification.
2. Spell correction, e.g. auto-correct.
3. Word suggestion.
Regular Expressions
A formal language for specifying text strings.
Notations
• Disjunctions []:
Pattern         Matches
[Ww]oodchuck    woodchuck, Woodchuck
[0123456789]    Any single digit
• Disjunctions |:
Pattern         Matches
abc|def         Find ‘abc’ or ‘def’.
a|b|ab          Find ‘a’ or ‘b’ or ‘ab’. Example: ‘abc’
Notations
• Ranges:
Pattern    Matches
[A-Z]      An uppercase letter.
[a-z]      A lowercase letter.
[0-9]      A single digit.
• Negation ^ (note: the caret means negation only when it is the first character inside []):
Pattern    Matches
[^A-Z]     Not an uppercase letter.
[^Ss]      Neither ‘S’ nor ‘s’.
[^e^]      Neither ‘e’ nor ‘^’.
a^b        Search for the pattern ‘a^b’.
Notations (? * . + ^ $)
Symbol    Meaning
?         0 or 1 of the previous character
*         0 or more of the previous character
+         1 or more of the previous character
.         Any character
^         Start anchor
$         End anchor
\         Escape character
Examples
• Pattern: ^[A-Z]
  Which of them are matches? “Class”, “cSCE”, “420”.
  Answer: Class.
• Pattern: ^[^A-Z]
  Which of them are matches? “Class”, “cSCE”, “420”.
  Answer: cSCE, 420.
• Pattern: .$
  Which of them are matches? “end”, “end?”, “end!”, “end.”.
  Answer: all four (end, end?, end!, end.).
• Pattern: \.$
  Which of them are matches? “end”, “end?”, “end!”, “end.”.
  Answer: only “end.”.
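The anchor exercises above can be checked mechanically. The following is a minimal sketch using Python's standard `re` module. It assumes search semantics (`re.search`, a match anywhere in the string), and the helper name `matches` is introduced here only for illustration.

```python
import re

def matches(pattern, candidates):
    """Return the candidates in which the pattern is found (re.search semantics)."""
    return [s for s in candidates if re.search(pattern, s)]

words = ["Class", "cSCE", "420"]
endings = ["end", "end?", "end!", "end."]

print(matches(r"^[A-Z]", words))    # strings that start with an uppercase letter
print(matches(r"^[^A-Z]", words))   # strings that start with anything else
print(matches(r".$", endings))      # any character right before the end
print(matches(r"\.$", endings))     # a literal period right before the end
```

The only difference between the last two patterns is the backslash: unescaped, `.` matches any character; escaped, it matches only a period.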
Examples
• Pattern: colou?r
  Which of them are matches? “color”, “colour”, “colouur”.
  Answer: color, colour.
• Pattern: colou+r
  Which of them are matches? “color”, “colour”, “colouur”.
  Answer: colour, colouur.
• Pattern: colou*r
  Which of them are matches? “color”, “colour”, “colouur”.
  Answer: color, colour, colouur.
• Pattern: colou.r
  Which of them are matches? “color”, “colour”, “colouur”.
  Answer: colouur.
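The quantifier exercises above can be reproduced with a short script. This sketch assumes that each pattern must match the whole word, which is why it uses `re.fullmatch`; the helper name `full_matches` is hypothetical.

```python
import re

candidates = ["color", "colour", "colouur"]

def full_matches(pattern):
    """Return the candidates that the pattern matches in full (re.fullmatch)."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

# ? = 0 or 1 'u', + = 1 or more 'u', * = 0 or more 'u', . = exactly one extra character
for pattern in ["colou?r", "colou+r", "colou*r", "colou.r"]:
    print(pattern, full_matches(pattern))
```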
Example
We need to find instances of “the” in a text.
Pattern                     Problem
the                         × misses ‘The’
[Tt]he                      × also matches inside ‘Theology’
[^A-Za-z][Tt]he[^A-Za-z]    requires a non-letter on both sides
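The three attempts above can be compared on a sample sentence. This is a sketch with a made-up test string, not part of the original slides. The last pattern, `\b[Tt]he\b`, is an addition that uses the word-boundary escape of Python's `re` module; it is shown because the third pattern needs a character on each side and therefore misses a “The” at the very start of the text.

```python
import re

# Hypothetical sample text, chosen so that each pattern behaves differently.
text = "The other day the theology class met."

naive = re.findall(r"the", text)                          # misses 'The', hits 'other' and 'theology'
either_case = re.findall(r"[Tt]he", text)                 # finds 'The', still hits inside words
delimited = re.findall(r"[^A-Za-z][Tt]he[^A-Za-z]", text) # needs a non-letter on both sides
boundary = re.findall(r"\b[Tt]he\b", text)                # word boundaries, no extra characters consumed

print(naive, either_case, delimited, boundary, sep="\n")
```

Note that `\b` treats digits and underscores as word characters, so it is not exactly equivalent to `[^A-Za-z]`.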
Text classification
Assigning subject categories, topics, or genres.
Spam detection.
Authorship identification.
Age/gender identification.
Language Identification.
· · ·
Text Classification
Inputs:
A document $d$.
A fixed set of classes $C = \{c_1, c_2, \ldots, c_n\}$.
Output:
A predicted class $c \in C$.
Naive Bayes
Relies on a simple representation of the document: bag of words.
For a document $d$ and a class $c$:
$$P(c \mid d) = \frac{P(d \mid c)\,P(c)}{P(d)}$$
Naive Bayes Classifier
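$$c_{MAP} = \arg\max_{c \in C} P(c \mid d)$$
MAP: maximum a posteriori (the most likely class).
Applying Bayes' rule:
$$c_{MAP} = \arg\max_{c \in C} \frac{P(d \mid c)\,P(c)}{P(d)}$$
$P(d)$ is the same for every class, so it can be dropped:
$$c_{MAP} = \arg\max_{c \in C} P(d \mid c)\,P(c)$$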
Naive Bayes Classifier
$$c_{MAP} = \arg\max_{c \in C} P(d \mid c)\,P(c)$$
Let's say that the document is represented by $n$ features $x_1, x_2, \ldots, x_n$:
$$c_{MAP} = \arg\max_{c \in C} P(x_1, x_2, \ldots, x_n \mid c)\,P(c)$$
Assumptions
Bag of words: the position of words does not matter.
Conditional independence: the feature probabilities $P(x_i \mid c)$ are independent given the class $c$.
$$P(x_1, x_2, \ldots, x_n \mid c) = \prod_{i=1}^{n} P(x_i \mid c)$$
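Both assumptions can be illustrated in a few lines. This is a minimal sketch: the sample documents and the per-class word probabilities in `p_given_pos` are made up for illustration, and the helper names are hypothetical.

```python
from collections import Counter
from math import prod

def bag_of_words(text):
    """Map a document to word counts; word order is discarded."""
    return Counter(text.lower().split())

def likelihood(bag, word_probs):
    """P(x_1..x_n | c) under conditional independence: the product of P(x_i | c)."""
    return prod(word_probs[w] ** n for w, n in bag.items())

# Two documents with the same words in a different order get the same bag.
d1 = bag_of_words("great movie great cast")
d2 = bag_of_words("cast great great movie")
print(d1 == d2)

# Hypothetical word probabilities for one class, chosen only for illustration.
p_given_pos = {"great": 0.5, "movie": 0.25, "cast": 0.25}
print(likelihood(d1, p_given_pos))   # 0.5 * 0.5 * 0.25 * 0.25
```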
Bag of words representation
[Figure: bag of words representation of a document]
Naive Bayes: Learning
What do we need?
A training set of $m$ hand-labeled documents $(d_1, c_1), \ldots, (d_m, c_m)$.
Naive Bayes: Learning
Let $N_D$ be the number of documents, and $N_{c_j}$ be the number of documents in class $c_j$.
Let $V_{c_j}$ be the set of all words in the documents of class $c_j$.
Now we find the maximum likelihood estimates:
$$\hat{P}(c_j) = \frac{N_{c_j}}{N_D}$$
$$\hat{P}(w_i \mid c_j) = \frac{\mathrm{count}(w_i, c_j)}{\sum_{w \in V_{c_j}} \mathrm{count}(w, c_j)}$$
Now we can classify a document $d$ by:
$$c_d = \arg\max_{c_j \in C} \hat{P}(c_j) \prod_{w_i \in d} \hat{P}(w_i \mid c_j)$$
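The estimates and the decision rule above can be sketched directly from counts. This is an illustrative, unsmoothed implementation; the function names and the toy spam/ham training set are assumptions made for the example and are not part of the lecture.

```python
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (text, label). Returns unsmoothed MLE priors and likelihoods."""
    n_docs = len(docs)
    docs_per_class = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for text, label in docs:
        word_counts[label].update(text.split())
    priors = {c: docs_per_class[c] / n_docs for c in docs_per_class}
    likelihoods = {
        c: {w: n / sum(counts.values()) for w, n in counts.items()}
        for c, counts in word_counts.items()
    }
    return priors, likelihoods

def classify(text, priors, likelihoods):
    """Return argmax_c P(c) * prod_i P(w_i | c); unseen words get probability 0."""
    scores = {}
    for c in priors:
        score = priors[c]
        for w in text.split():
            score *= likelihoods[c].get(w, 0.0)
        scores[c] = score
    return max(scores, key=scores.get), scores

# Hypothetical toy training set, for illustration only.
training = [
    ("buy cheap pills", "spam"),
    ("cheap cheap offer", "spam"),
    ("meeting at noon", "ham"),
]
priors, likelihoods = train(training)
label, scores = classify("cheap pills", priors, likelihoods)
print(label, scores)
```

A word that never occurs in a class drives that class's whole product to zero, which is the weakness of the unsmoothed estimate.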
Naive Bayes: Learning
What if we come across an unknown word in the document $d$?
Let $w_u$ be the unknown word. Then $\hat{P}(w_u \mid c_j) = 0$ for all $c_j$, so the product is zero for every class.
Laplace smoothing
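Let $V$ be the set of all words in the training documents, i.e. $V = \bigcup_{c_j} V_{c_j}$.
Add one extra entry to the vocabulary for the unknown word.
$$\hat{P}(w_i \mid c_j) = \frac{\mathrm{count}(w_i, c_j) + 1}{\sum_{w \in V_{c_j}} \mathrm{count}(w, c_j) + |V| + 1}$$
So, for all unknown words, we have:
$$\hat{P}(w_u \mid c_j) = \frac{1}{\sum_{w \in V_{c_j}} \mathrm{count}(w, c_j) + |V| + 1}$$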
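The smoothed estimate can be written as a small function. This is a sketch under the formula above (add one to every count, plus one extra vocabulary slot for the unknown word); the counts and the vocabulary size are hypothetical, and exact fractions are used so the results are easy to read.

```python
from collections import Counter
from fractions import Fraction

def smoothed_likelihood(word, class_counts, vocab_size):
    """Add-one estimate with one extra vocabulary slot for the unknown word."""
    total = sum(class_counts.values())
    return Fraction(class_counts.get(word, 0) + 1, total + vocab_size + 1)

# Hypothetical word counts for one class; |V| = 4 over all classes.
counts = Counter({"cheap": 3, "pills": 1, "offer": 1})
V = 4

print(smoothed_likelihood("cheap", counts, V))   # seen word: (3 + 1) / (5 + 4 + 1)
print(smoothed_likelihood("zebra", counts, V))   # unknown word: 1 / (5 + 4 + 1)
```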
Example
Training set:
#    Text                    Class
1    Carla Betty Carla       a
2    Carla Carla Suzanne     a
3    Carla Matt              a
4    Taylor Jessica Carla    b
$\hat{P}(a) = 3/4$
$\hat{P}(b) = 1/4$
$\hat{P}(\text{Carla} \mid a) = (5 + 1)/(8 + 6 + 1) = 6/15$
$\hat{P}(\text{Taylor} \mid a) = (0 + 1)/(8 + 6 + 1) = 1/15$
Example
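Document $d_5$: Carla Carla Carla Taylor Jessica
$\hat{P}(a) = 3/4$ and $\hat{P}(b) = 1/4$
$\hat{P}(\text{Carla} \mid a) = 6/15$, $\hat{P}(\text{Carla} \mid b) = 2/10$
$\hat{P}(\text{Taylor} \mid a) = 1/15$, $\hat{P}(\text{Taylor} \mid b) = 2/10$
$\hat{P}(\text{Jessica} \mid a) = 1/15$, $\hat{P}(\text{Jessica} \mid b) = 2/10$
$P(a \mid d_5) \propto 3/4 \times (6/15)^3 \times 1/15 \times 1/15 \approx 0.0002$
$P(b \mid d_5) \propto 1/4 \times (2/10)^3 \times 2/10 \times 2/10 \approx 0.00008$
So $d_5$ is assigned to class $a$.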
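The worked example above can be reproduced with exact arithmetic. This sketch uses the training set from the slides and add-one smoothing with one extra slot for the unknown word; the function names are hypothetical, and the scores are unnormalized posteriors.

```python
from collections import Counter
from fractions import Fraction

training = [
    ("Carla Betty Carla", "a"),
    ("Carla Carla Suzanne", "a"),
    ("Carla Matt", "a"),
    ("Taylor Jessica Carla", "b"),
]

classes = sorted({label for _, label in training})
vocab = {w for text, _ in training for w in text.split()}   # |V| = 6
counts = {c: Counter() for c in classes}
for text, label in training:
    counts[label].update(text.split())

def prior(c):
    """Fraction of training documents labeled c."""
    return Fraction(sum(1 for _, label in training if label == c), len(training))

def likelihood(w, c):
    """Add-one smoothed estimate with one extra slot for the unknown word."""
    return Fraction(counts[c][w] + 1, sum(counts[c].values()) + len(vocab) + 1)

def score(doc, c):
    """Unnormalized posterior: the prior times the product of word likelihoods."""
    result = prior(c)
    for w in doc.split():
        result *= likelihood(w, c)
    return result

d5 = "Carla Carla Carla Taylor Jessica"
scores = {c: score(d5, c) for c in classes}
for c in classes:
    print(c, scores[c], float(scores[c]))
print("predicted class:", max(scores, key=scores.get))
```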
Naive Bayes
Naive Bayes is not so naive!!
Robust to Irrelevant Features.
Optimal if the independence assumptions hold.
A good, dependable baseline for text classification, although other classifiers exist that give better accuracy.
Federalist Papers
Discussion: Federalist papers.
E.g. what training set do we need?
Language Modelling
Goal: assign a probability to a sentence.
Why?
Machine translation: P(high winds tonight) > P(large winds tonight)
Spell correction: P(about fifteen minutes from) > P(about fifteen minuets from)
Speech recognition: P(I saw a van) > P(eyes awe of an)
Language Modelling
Goal: compute the probability of a sentence or sequence of words:
$$P(W) = P(w_1, w_2, \ldots, w_n)$$
Related task: the probability of an upcoming word:
$$P(w_i \mid w_1, w_2, \ldots, w_{i-1})$$
Chain rule:
$$P(x_1, x_2, \ldots, x_n) = P(x_1)\,P(x_2 \mid x_1)\,P(x_3 \mid x_1, x_2) \cdots P(x_n \mid x_1, x_2, \ldots, x_{n-1}) = \prod_{i=1}^{n} P(x_i \mid x_1, x_2, \ldots, x_{i-1})$$
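To make the chain rule concrete, the factors for a given word sequence can be listed mechanically. This is only a symbolic sketch: it prints the terms of the product and does not estimate any probability. The function name `chain_rule_terms` is hypothetical.

```python
def chain_rule_terms(words):
    """List the conditional-probability factors of P(w_1, ..., w_n) as strings."""
    terms = []
    for i, w in enumerate(words):
        history = ", ".join(words[:i])          # all words before position i
        terms.append(f"P({w} | {history})" if history else f"P({w})")
    return terms

terms = chain_rule_terms("its water is so transparent".split())
print(" * ".join(terms))
```

Each word contributes exactly one factor, conditioned on the full history before it.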
Example
P(its water is so transparent that) = P(its) × P(water | its) × P(is | its, water) × P(so | its, water, is) × P(transparent | its, water, is, so) × P(that | its, water, is, so, transparent)
Can we count?
$$P(\text{that} \mid \text{its, water, is, so, transparent}) = \frac{P(\text{its, water, is, so, transparent, that})}{P(\text{its, water, is, so, transparent})}$$
No. Too many possibilities.
Markov Assumption
Take only the $k$ words preceding it.
$$P(w_i \mid w_1, w_2, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-k}, \ldots, w_{i-1})$$
With $k = 1$:
P(that | its, water, is, so, transparent) ≈ P(that | transparent)
or, with $k = 2$:
P(that | its, water, is, so, transparent) ≈ P(that | so, transparent)
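In code, the Markov assumption amounts to truncating the history before it is used as a conditioning context. A minimal sketch, with a hypothetical helper name:

```python
def markov_context(history, k):
    """Keep only the last k words of the history (k = 0 gives an empty context)."""
    # history[-0:] would return the whole list, so k = 0 is handled separately.
    return tuple(history[-k:]) if k > 0 else ()

history = ["its", "water", "is", "so", "transparent"]
for k in (0, 1, 2):
    print(k, markov_context(history, k))
```

With `k = 1` the context is the single preceding word, and with `k = 2` the two preceding words, matching the two approximations above.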
Unigram, Bigram and N-gram
Unigram model:
$$P(w_1, w_2, \ldots, w_{n-1}, w_n) \approx \prod_{i} P(w_i)$$
Bigram model:
$$P(w_i \mid w_1, w_2, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-1})$$
N-gram model: extension to trigram, 4-gram, 5-gram, etc.
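A bigram model can be estimated from counts in a few lines. This is a sketch under several assumptions that the slides do not state: probabilities are unsmoothed maximum likelihood estimates, count(prev, word) / count(prev); each sentence is padded with `<s>` and `</s>` markers; and the three-sentence corpus is made up for illustration.

```python
from collections import Counter

def train_bigram(sentences):
    """Count context words and bigrams, padding each sentence with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])             # every token that starts a bigram
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """Maximum likelihood estimate: count(prev, word) / count(prev)."""
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

def sentence_prob(sentence, unigrams, bigrams):
    """Product of bigram probabilities over the padded sentence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= bigram_prob(prev, word, unigrams, bigrams)
    return p

# Hypothetical three-sentence corpus, for illustration only.
corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
unigrams, bigrams = train_bigram(corpus)

print(bigram_prob("<s>", "I", unigrams, bigrams))
print(sentence_prob("I am Sam", unigrams, bigrams))
print(sentence_prob("Sam likes ham", unigrams, bigrams))
```

As with Naive Bayes, any unseen bigram makes the whole sentence probability zero unless the counts are smoothed.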
Discussion
1. Spell correction.
2. Word suggestion.