Parts of Speech - Carnegie Mellon University (demo.clab.cs.cmu.edu/11711fa19/slides/fa19...)
-
Parts of Speech
-
More Fine-Grained Classes
Actually, I ran home extremely quickly yesterday
-
The closed classes
-
Example of POS tagging
-
The Penn Treebank Part-of-Speech Tagset
-
The Universal POS tagset
https://universaldependencies.org
-
POS tagging
goal: resolve POS ambiguities
-
Most Frequent Class Baseline
Trained on the WSJ training corpus and tested on sections 22-24 of the same corpus, the most-frequent-tag baseline achieves an accuracy of 92.34%.
▪ 97% tag accuracy is achievable by most algorithms (HMMs, MEMMs, neural networks, rule-based algorithms)
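The baseline is simple enough to sketch in a few lines. The version below is a minimal illustration, not the evaluation setup behind the 92.34% figure: the toy training data is invented, and falling back to the overall most frequent tag for unseen words is an assumption.

```python
from collections import Counter, defaultdict

def train_most_frequent_tag(tagged_sentences):
    """Map each word to its most frequent training tag."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            per_word[word][tag] += 1
            overall[tag] += 1
    best_tag = {w: c.most_common(1)[0][0] for w, c in per_word.items()}
    # Assumption: unseen words receive the overall most frequent tag.
    default_tag = overall.most_common(1)[0][0]
    return best_tag, default_tag

def tag_baseline(words, best_tag, default_tag):
    """Tag each word independently, ignoring context."""
    return [best_tag.get(w, default_tag) for w in words]

# Toy data, made up for illustration (not WSJ).
train = [
    [("the", "DT"), ("back", "NN"), ("door", "NN")],
    [("the", "DT"), ("back", "NN"), ("hurts", "VBZ")],
    [("back", "VB"), ("the", "DT"), ("bill", "NN")],
]
best_tag, default_tag = train_most_frequent_tag(train)
predicted = tag_baseline(["back", "the", "plan"], best_tag, default_tag)
print(predicted)
```

Because each word is tagged in isolation, "back" always receives NN here, even in a context where VB is correct. That is the ambiguity the sequence models on the following slides try to resolve.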
-
Why POS tagging
▪ Text-to-speech
  ▪ record, lead, protest
▪ Lemmatization
  ▪ saw/V → see, saw/N → saw
▪ Preprocessing for harder disambiguation problems
  ▪ syntactic parsing
  ▪ semantic parsing
-
Generative sequence labeling: Hidden Markov Models
-
Hidden Markov Models
▪ In the real world many events are not observable
  ▪ Speech recognition: we observe acoustic features but not the phones
  ▪ POS tagging: we observe words but not the POS tags
[Figure: hidden states q1, q2, ..., qn and observations o1, o2, ..., on]
-
HMM
From J&M
-
HMM example
From J&M
-
HMMs: Algorithms
From J&M
Forward
Viterbi
Forward–Backward; Baum–Welch
-
HMM tagging as decoding
How many possible choices?
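The slide's formula did not survive extraction. As a sketch, the decoding objective under the usual bigram HMM assumptions can be written as follows, where the notation (tags t, words w, sentence length n, tagset size N) follows J&M and is an assumption here:

```latex
\hat{t}_{1:n}
  = \operatorname*{argmax}_{t_{1:n}} P(t_{1:n} \mid w_{1:n})
  \approx \operatorname*{argmax}_{t_{1:n}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})

\text{number of candidate tag sequences} = N^{n}
```

With the 45 Penn Treebank tags and a 10-word sentence, that is 45^10, roughly 3.4 × 10^16 sequences, so enumerating them is not an option.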
-
Part of speech tagging example
Slide credit: Noah Smith
-
The Viterbi Algorithm
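A minimal sketch of Viterbi decoding for a bigram HMM, assuming dictionary-based parameters. The names `start_p`, `trans_p`, `emit_p` and the two-tag toy model are made up for illustration and are not taken from the slides.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return (best_tag_path, probability) under a bigram HMM."""
    # v[i][t]: probability of the best tag sequence for words[:i+1] ending in t
    v = [{t: start_p[t] * emit_p[t].get(words[0], 0.0) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        v.append({})
        back.append({})
        for t in tags:
            # Best previous tag for reaching t at position i.
            prev = max(tags, key=lambda p: v[i - 1][p] * trans_p[p][t])
            v[i][t] = v[i - 1][prev] * trans_p[prev][t] * emit_p[t].get(words[i], 0.0)
            back[i][t] = prev
    # Follow backpointers from the best final tag.
    last = max(tags, key=lambda t: v[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path, v[-1][last]

# Toy parameters (illustrative numbers only).
tags = ["N", "V"]
start_p = {"N": 0.8, "V": 0.2}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {"N": {"fish": 0.7, "sleep": 0.3}, "V": {"fish": 0.4, "sleep": 0.6}}

best_path, best_prob = viterbi(["fish", "sleep"], tags, start_p, trans_p, emit_p)
print(best_path, best_prob)
```

The chart has one cell per (position, tag) pair and each cell looks at every previous tag, so the cost is O(N²n) for N tags and n words rather than N^n. A practical implementation would work with log probabilities to avoid underflow on long sentences.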
-
Beam search
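A minimal sketch of beam search over partial tag sequences, assuming the same kind of dictionary-based HMM parameters as a toy model. All names and numbers are illustrative. Unlike exact Viterbi, only the `beam_width` best hypotheses survive each step, so the result is not guaranteed to be optimal.

```python
def beam_search(words, tags, start_p, trans_p, emit_p, beam_width=2):
    """Return (score, tag_path) of the best hypothesis left in the beam."""
    beam = [(start_p[t] * emit_p[t].get(words[0], 0.0), [t]) for t in tags]
    beam = sorted(beam, key=lambda h: h[0], reverse=True)[:beam_width]
    for word in words[1:]:
        # Extend every surviving hypothesis with every tag, then prune.
        candidates = [
            (score * trans_p[path[-1]][t] * emit_p[t].get(word, 0.0), path + [t])
            for score, path in beam
            for t in tags
        ]
        beam = sorted(candidates, key=lambda h: h[0], reverse=True)[:beam_width]
    return beam[0]

# Toy parameters (illustrative numbers only).
tags = ["N", "V"]
start_p = {"N": 0.8, "V": 0.2}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {"N": {"fish": 0.7, "sleep": 0.3}, "V": {"fish": 0.4, "sleep": 0.6}}

score, path = beam_search(["fish", "sleep"], tags, start_p, trans_p, emit_p, beam_width=2)
greedy_score, greedy_path = beam_search(["fish", "sleep"], tags, start_p, trans_p, emit_p, beam_width=1)
print(path, score)
```

With `beam_width=1` this reduces to greedy left-to-right tagging. A wider beam trades speed for a smaller risk of pruning the best sequence.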
-
HMMs: Algorithms
From J&M
Forward
Viterbi
Forward–Backward; Baum–Welch
-
The Forward Algorithm
sum instead of max
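"Sum instead of max" can be made concrete with a short sketch: the chart has the same shape as in Viterbi, but each cell sums over previous states, which gives the total probability of the observations. Parameter names and toy numbers are made up for illustration.

```python
def forward(words, tags, start_p, trans_p, emit_p):
    """Return P(words) under a bigram HMM, summing over all tag sequences."""
    # alpha[t]: probability of the words seen so far, ending in tag t
    alpha = {t: start_p[t] * emit_p[t].get(words[0], 0.0) for t in tags}
    for word in words[1:]:
        alpha = {
            t: sum(alpha[p] * trans_p[p][t] for p in tags) * emit_p[t].get(word, 0.0)
            for t in tags
        }
    return sum(alpha.values())

# Toy parameters (illustrative numbers only).
tags = ["N", "V"]
start_p = {"N": 0.8, "V": 0.2}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {"N": {"fish": 0.7, "sleep": 0.3}, "V": {"fish": 0.4, "sleep": 0.6}}

likelihood = forward(["fish", "sleep"], tags, start_p, trans_p, emit_p)
print(likelihood)
```

For this two-word example the result equals the sum over the four possible tag sequences (0.0504 + 0.2352 + 0.0144 + 0.0192).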
-
Viterbi
▪ n-best decoding
▪ relationship to sequence alignment
-
Extending the HMM Algorithm to Trigrams
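As a sketch of what the extension changes, the transition term conditions on the two previous tags instead of one. The notation is assumed, following J&M:

```latex
\hat{t}_{1:n}
  = \operatorname*{argmax}_{t_{1:n}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1}, t_{i-2})
```

Viterbi still applies if each chart state is a pair of tags, which raises the cost from O(N²n) to O(N³n) for N tags and n words.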
-
Unknown Words
▪ Word shape
  ▪ lower case → x
  ▪ upper case → X
  ▪ numbers → d
  ▪ punctuation → .
  ▪ I.M.F → X.X.X
  ▪ DC10-30 → XXdd-dd
▪ Word shape + consecutive character types are removed
  ▪ DC10-30 → Xd-d
▪ Prefixes & suffixes
  ▪ -s, -ed, -ing
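The word-shape mappings can be sketched as two small functions. This is an illustration under one assumption: punctuation characters are kept as they are, which is what reproduces the slide's examples (the hyphen in DC10-30 survives). The function names are hypothetical.

```python
import re

def word_shape(word):
    """Map lower case to x, upper case to X, digits to d; keep other characters."""
    shape = []
    for ch in word:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)  # assumption: punctuation is kept unchanged
    return "".join(shape)

def short_word_shape(word):
    """Word shape with runs of the same character type collapsed."""
    return re.sub(r"(.)\1+", r"\1", word_shape(word))

print(word_shape("I.M.F"), word_shape("DC10-30"), short_word_shape("DC10-30"))
```

Shapes let the tagger generalize to words it has never seen: an unseen token with shape Xd-d can share statistics with other tokens of that shape.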
-
Brants (2000)
▪ a trigram HMM
▪ handling unknown words
▪ 96.7% on the Penn Treebank
-
Generative vs. Discriminative models
▪ Generative models specify a joint distribution over the labels and the data. With this you could generate new data
▪ Discriminative models specify the conditional distribution of the label y given the data x. These models focus on how to discriminate between the classes
From Bamman
-
Maximum Entropy Markov Models (MEMM)
▪ HMM
▪ MEMM
-
Features in a MEMM
▪ well-dressed
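A sketch of the word-internal features a MEMM might extract for a word such as well-dressed. The feature names and the affix length limit of 4 are assumptions. A real tagger would add context features such as neighboring words and the previous tag.

```python
def word_features(word, max_affix=4):
    """Binary word-internal features for a feature-based tagger (illustrative)."""
    feats = {}
    for k in range(1, min(max_affix, len(word)) + 1):
        feats[f"prefix={word[:k]}"] = 1
        feats[f"suffix={word[-k:]}"] = 1
    feats["has-hyphen"] = int("-" in word)
    feats["has-upper"] = int(any(c.isupper() for c in word))
    feats["has-digit"] = int(any(c.isdigit() for c in word))
    return feats

features = word_features("well-dressed")
print(sorted(features))
```

For well-dressed this yields the prefixes w, we, wel, well, the suffixes d, ed, sed, ssed, and a positive has-hyphen indicator.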
-
Decoding and Training MEMMs
-
Decoding MEMMs
greedy approach:
doesn’t use evidence from future decisions
-
Decoding MEMMs
Viterbi
▪ filling the chart with
  ▪ HMM
  ▪ MEMM
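The chart formulas were lost in extraction. As a sketch in J&M's notation (assumed here), the two recurrences differ only in the local factor: the HMM multiplies a transition and an emission probability, while the MEMM uses a single conditional probability of the next state given the previous state and the observation.

```latex
\text{HMM:}\quad  v_t(j) = \max_{i=1}^{N} v_{t-1}(i)\, P(s_j \mid s_i)\, P(o_t \mid s_j)

\text{MEMM:}\quad v_t(j) = \max_{i=1}^{N} v_{t-1}(i)\, P(s_j \mid s_i, o_t)
```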
-
Bidirectionality
▪ Label bias or observation bias problem
  ▪ will/NN to/TO fight/VB
▪ Linear-chain CRF (Lafferty et al. 2001)
▪ A bidirectional version of the MEMM (Toutanova et al. 2003)
▪ bi-LSTM
-
Neural sequence tagger
▪ Lample et al. 2016
  ▪ Neural Architectures for NER
-
Multilingual POS tagging
▪ In morphologically-rich languages like Czech, Hungarian, Turkish
  ▪ a 250,000 word token corpus of Hungarian has more than twice as many word types as a similarly sized corpus of English
  ▪ a 10 million word token corpus of Turkish contains four times as many word types as a similarly sized English corpus
▪ ⇒ many UNKs
  ▪ more information is coded in morphology
-
Multilingual POS tagging
▪ In non-word-space languages like Chinese, word segmentation is either applied before tagging or done jointly
  ▪ UNKs are difficult: the majority of unknown words are common nouns and verbs because of extensive compounding
▪ Universal POS tagset accounts for cross-linguistic differences
-
Named Entity Recognition
-
Named Entity tags
-
Ambiguity in NER
-
NER as Sequence Labeling
IOB tagging scheme
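A minimal sketch of how entity spans turn into per-token IOB tags, which is what lets NER reuse the sequence labelers from POS tagging. The span format (start, exclusive end, label) and the example sentence are assumptions for illustration.

```python
def spans_to_iob(n_tokens, spans):
    """Convert (start, end_exclusive, label) spans into one IOB tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"      # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"      # remaining tokens inside the entity
    return tags

tokens = ["American", "Airlines", "flies", "to", "Chicago"]
iob_tags = spans_to_iob(len(tokens), [(0, 2, "ORG"), (4, 5, "LOC")])
print(list(zip(tokens, iob_tags)))
```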
-
A feature-based algorithm for NER
▪ gazetteers
  ▪ a list of place names providing millions of entries for locations with detailed geographical and political information
  ▪ binary indicator features
-
Evaluation of NER
▪ F-score
▪ segmentation is a confound
  ▪ e.g., American/B-ORG Airlines
  ▪ 2 errors: false positive for O and a false negative for I-ORG
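A sketch of entity-level scoring, assuming exact span match: IOB tags are converted back to spans, and precision, recall and F-score are computed over whole entities. The function names are hypothetical, and stray I- tags with no preceding B- tag are simply ignored in this simplified version.

```python
def iob_to_spans(tags):
    """Collect (start, end_exclusive, label) spans from a list of IOB tags."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing entity
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans

def span_f1(gold_tags, pred_tags):
    """Entity-level precision, recall and F1 with exact span matching."""
    gold, pred = iob_to_spans(gold_tags), iob_to_spans(pred_tags)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = ["B-ORG", "I-ORG", "O"]   # American Airlines flies
pred = ["B-ORG", "O", "O"]       # only "American" is labeled
scores = span_f1(gold, pred)
print(scores)
```

Two of the three token tags are correct here, yet the predicted entity does not match the gold span, so it counts as a miss at the entity level. This is the segmentation effect the slide points to.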
-
HMMs in Automatic Speech Recognition
ssssssssppppeeeeeeetshshshshllllaeaeaebbbbb
“speech lab”
-
HMMs in Automatic Speech Recognition
[Figure: three layers: words (w1, w2), sound types (s1 ... s7), and acoustic observations (a1 ... a7); labeled components: language model, acoustic model]