Parts of Speech - Carnegie Mellon University
demo.clab.cs.cmu.edu/11711fa19/slides/fa19...

Parts of Speech

More Fine-Grained Classes

Actually, I ran home extremely quickly yesterday

The closed classes

Example of POS tagging

The Penn Treebank Part-of-Speech Tagset

The Universal POS tagset

https://universaldependencies.org

POS tagging

goal: resolve POS ambiguities

Most Frequent Class Baseline

Trained on the WSJ training corpus and tested on sections 22-24 of the same corpus, the most-frequent-tag baseline achieves an accuracy of 92.34%.

▪ 97% tag accuracy is achievable by most algorithms (HMMs, MEMMs, neural networks, rule-based algorithms)
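A minimal sketch of the most-frequent-class baseline: each known word gets the tag it carried most often in training, and unknown words fall back to the overall most frequent tag. The fallback rule, the function names, and the toy training data are assumptions made for illustration; this is not the WSJ setup behind the 92.34% figure.

```python
from collections import Counter, defaultdict


def train_most_frequent_tag(tagged_sentences):
    """Return (word -> most frequent tag, overall most frequent tag)."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            per_word[word][tag] += 1
            overall[tag] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in per_word.items()}
    default_tag = overall.most_common(1)[0][0]
    return lexicon, default_tag


def tag_sentence(words, lexicon, default_tag):
    """Tag each word independently; unknown words get the default tag."""
    return [lexicon.get(w, default_tag) for w in words]


def accuracy(tagged_sentences, lexicon, default_tag):
    """Token-level accuracy of the baseline on gold-tagged sentences."""
    correct = total = 0
    for sentence in tagged_sentences:
        words = [w for w, _ in sentence]
        gold = [t for _, t in sentence]
        pred = tag_sentence(words, lexicon, default_tag)
        correct += sum(p == g for p, g in zip(pred, gold))
        total += len(gold)
    return correct / total


# Toy data, made up for illustration (not the WSJ corpus).
TRAIN = [
    [("the", "DT"), ("back", "NN"), ("door", "NN")],
    [("I", "PRP"), ("back", "VBP"), ("the", "DT"), ("bill", "NN")],
    [("on", "IN"), ("my", "PRP$"), ("back", "NN")],
]
LEXICON, DEFAULT_TAG = train_most_frequent_tag(TRAIN)
```

Calling `tag_sentence(["the", "back", "zebra"], LEXICON, DEFAULT_TAG)` tags the ambiguous word "back" with its majority tag and the unseen word "zebra" with the default tag.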

Why POS tagging

▪ Text-to-speech
  ▪ record, lead, protest
▪ Lemmatization
  ▪ saw/V → see, saw/N → saw
▪ Preprocessing for harder disambiguation problems
  ▪ syntactic parsing
  ▪ semantic parsing

Generative sequence labeling: Hidden Markov Models

Hidden Markov Models

▪ In the real world, many events are not observable
  ▪ Speech recognition: we observe acoustic features but not the phones
  ▪ POS tagging: we observe words but not the POS tags

[Figure: hidden states q1, q2, ..., qn and observations o1, o2, ..., on]

HMM

From J&M

HMM example

From J&M

HMMs: Algorithms

From J&M

Forward

Viterbi

Forward–Backward; Baum–Welch

HMM tagging as decoding

How many possible choices?
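The decoding objective and the size of the search space can be written out as follows. This is the standard bigram HMM formulation as given in J&M; the symbols N (tagset size) and n (sentence length) are notation introduced here, and the 45-tag, 5-word figure is only an illustrative calculation.

```latex
\hat{t}_{1:n}
  = \operatorname*{argmax}_{t_{1:n}} P(t_{1:n} \mid w_{1:n})
  = \operatorname*{argmax}_{t_{1:n}} P(w_{1:n} \mid t_{1:n})\, P(t_{1:n})
  \approx \operatorname*{argmax}_{t_{1:n}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})

% Number of candidate tag sequences for n words and N tags:
\lvert \text{candidates} \rvert = N^{n},
\qquad \text{e.g. } N = 45,\ n = 5 \;\Rightarrow\; 45^{5} = 184{,}528{,}125
```

Enumerating every sequence is exponential in the sentence length, which is why dynamic programming is used for decoding.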

Part of speech tagging example

Slide credit: Noah Smith

The Viterbi Algorithm
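A minimal sketch of Viterbi decoding for a bigram HMM, working in log space with backpointers. The two-state toy model and all its probabilities are made up for illustration, and the function assumes a non-empty observation sequence.

```python
import math


def _log(p):
    """Log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best state sequence, its log probability) under an HMM."""
    # chart[t][s]: best log score of any path that ends in state s at time t
    chart = [{s: _log(start_p[s]) + _log(emit_p[s].get(observations[0], 0.0))
              for s in states}]
    backptr = [{}]
    for t in range(1, len(observations)):
        chart.append({})
        backptr.append({})
        for s in states:
            # max over previous states (the forward algorithm sums here instead)
            best_prev = max(states,
                            key=lambda r: chart[t - 1][r] + _log(trans_p[r][s]))
            chart[t][s] = (chart[t - 1][best_prev]
                           + _log(trans_p[best_prev][s])
                           + _log(emit_p[s].get(observations[t], 0.0)))
            backptr[t][s] = best_prev
    # Follow backpointers from the best final state.
    last = max(states, key=lambda s: chart[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    path.reverse()
    return path, chart[-1][last]


# Toy HMM; every number below is invented for the example.
STATES = ["N", "V"]
START_P = {"N": 0.8, "V": 0.2}
TRANS_P = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
EMIT_P = {"N": {"fish": 0.6, "sleep": 0.1, "dogs": 0.3},
          "V": {"fish": 0.3, "sleep": 0.7}}

BEST_PATH, BEST_LOGP = viterbi(["fish", "sleep"], STATES, START_P, TRANS_P, EMIT_P)
```

The chart has one cell per (time step, state) pair, so the running time is O(N²n) for N states and n words rather than exponential in n.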

Beam search
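One common way to add a beam to Viterbi is to keep only the best few states of each chart column before extending to the next word. The sketch below assumes that variant; the `beam_width` parameter name and the toy model are illustrative. With a beam as wide as the state set it reduces to exact Viterbi, and with a narrower beam it can miss the best path.

```python
import math


def _log(p):
    """Log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def beam_viterbi(observations, states, start_p, trans_p, emit_p, beam_width):
    """Viterbi with pruning: only the top beam_width states survive each step."""
    # column[s] = (log score, best path found so far that ends in state s)
    column = {s: (_log(start_p[s]) + _log(emit_p[s].get(observations[0], 0.0)), [s])
              for s in states}
    for obs in observations[1:]:
        # Prune the previous column down to the beam.
        survivors = sorted(column, key=lambda s: column[s][0],
                           reverse=True)[:beam_width]
        new_column = {}
        for s in states:
            prev = max(survivors,
                       key=lambda r: column[r][0] + _log(trans_p[r][s]))
            score = (column[prev][0] + _log(trans_p[prev][s])
                     + _log(emit_p[s].get(obs, 0.0)))
            new_column[s] = (score, column[prev][1] + [s])
        column = new_column
    best = max(column, key=lambda s: column[s][0])
    return column[best][1], column[best][0]


# Toy HMM; every number below is invented for the example.
STATES = ["N", "V"]
START_P = {"N": 0.8, "V": 0.2}
TRANS_P = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
EMIT_P = {"N": {"fish": 0.6, "sleep": 0.1, "dogs": 0.3},
          "V": {"fish": 0.3, "sleep": 0.7}}

WIDE_PATH, _ = beam_viterbi(["fish", "sleep"], STATES, START_P, TRANS_P, EMIT_P, 2)
NARROW_PATH, _ = beam_viterbi(["fish", "sleep"], STATES, START_P, TRANS_P, EMIT_P, 1)
```

Each step now costs O(beam_width · N) instead of O(N²), which matters when the state space is large (for example, tag pairs in a trigram tagger).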

The Forward Algorithm

sum instead of max
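A minimal sketch of the forward algorithm, which has the same chart structure as Viterbi but sums over previous states instead of taking the max. It computes the total probability of the observation sequence. The toy model is made up, and plain probabilities are used for readability; a practical version would use log space or scaling to avoid underflow on long inputs.

```python
def forward(observations, states, start_p, trans_p, emit_p):
    """Return P(observations) under the HMM, summed over all state paths."""
    # alpha[s]: probability of the observations so far, ending in state s
    alpha = {s: start_p[s] * emit_p[s].get(observations[0], 0.0) for s in states}
    for obs in observations[1:]:
        alpha = {
            s: sum(alpha[r] * trans_p[r][s] for r in states)
               * emit_p[s].get(obs, 0.0)
            for s in states
        }
    return sum(alpha.values())


# Toy HMM; every number below is invented for the example.
STATES = ["N", "V"]
START_P = {"N": 0.8, "V": 0.2}
TRANS_P = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
EMIT_P = {"N": {"fish": 0.6, "sleep": 0.1, "dogs": 0.3},
          "V": {"fish": 0.3, "sleep": 0.7}}

LIKELIHOOD = forward(["fish", "sleep"], STATES, START_P, TRANS_P, EMIT_P)
```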

Viterbi

▪ n-best decoding
▪ relationship to sequence alignment

Extending the HMM Algorithm to Trigrams
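For reference, the trigram extension conditions each tag on the two preceding tags, and sparse trigram counts are commonly smoothed by linear interpolation, as in J&M and Brants (2000). The weights λ are notation assumed here; how they are estimated is not shown.

```latex
% Bigram assumption replaced by a trigram assumption:
P(t_{1:n}) \approx \prod_{i=1}^{n} P(t_i \mid t_{i-1}, t_{i-2})

% Interpolated estimate from trigram, bigram, and unigram relative frequencies:
P(t_i \mid t_{i-1}, t_{i-2})
  = \lambda_3\, \hat{P}(t_i \mid t_{i-1}, t_{i-2})
  + \lambda_2\, \hat{P}(t_i \mid t_{i-1})
  + \lambda_1\, \hat{P}(t_i),
\qquad \lambda_1 + \lambda_2 + \lambda_3 = 1
```

The Viterbi chart then ranges over pairs of tags, so each column has N² cells instead of N.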

Unknown Words

▪ Word shape
  ▪ lower case → x
  ▪ upper case → X
  ▪ numbers → d
  ▪ punctuation → .
  ▪ I.M.F → X.X.X
  ▪ DC10-30 → XXdd-dd
▪ Short word shape: consecutive character types are removed
  ▪ DC10-30 → Xd-d
▪ Prefixes & suffixes
  ▪ -s, -ed, -ing
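The word-shape mappings can be sketched in a few lines. The function names are hypothetical, and the code assumes punctuation is kept unchanged, which is what the I.M.F and DC10-30 examples imply.

```python
def word_shape(word):
    """Map upper case to X, lower case to x, digits to d; keep other characters."""
    shape = []
    for ch in word:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)  # punctuation is retained as-is (assumption)
    return "".join(shape)


def short_word_shape(word):
    """Word shape with runs of the same character type collapsed to one symbol."""
    collapsed = []
    for ch in word_shape(word):
        if not collapsed or collapsed[-1] != ch:
            collapsed.append(ch)
    return "".join(collapsed)
```

For example, `word_shape("DC10-30")` gives `"XXdd-dd"` and `short_word_shape("DC10-30")` gives `"Xd-d"`, matching the examples on the slide.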

Brants (2000)

▪ a trigram HMM
▪ handling unknown words
▪ 96.7% accuracy on the Penn Treebank

Generative vs. Discriminative models

▪ Generative models specify a joint distribution over the labels and the data. With this you could generate new data

▪ Discriminative models specify the conditional distribution of the label y given the data x. These models focus on how to discriminate between the classes

From Bamman

Maximum Entropy Markov Models (MEMM)

▪ HMM

▪ MEMM
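The contrast between the two models can be written as follows, using the bigram formulation from J&M. The feature functions f_j and weights θ_j are generic notation assumed here for the logistic regression form.

```latex
% HMM (generative): likelihood times prior
\hat{T} = \operatorname*{argmax}_{T} P(T \mid W)
        = \operatorname*{argmax}_{T} \prod_{i} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})

% MEMM (discriminative): the posterior is modeled directly
\hat{T} = \operatorname*{argmax}_{T} P(T \mid W)
        = \operatorname*{argmax}_{T} \prod_{i} P(t_i \mid w_i, t_{i-1})

% Each local distribution is a multinomial logistic regression:
P(t_i \mid w_i, t_{i-1})
  = \frac{\exp\!\big(\sum_{j} \theta_j f_j(t_i, w_i, t_{i-1})\big)}
         {\sum_{t'} \exp\!\big(\sum_{j} \theta_j f_j(t', w_i, t_{i-1})\big)}
```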

Features in a MEMM

▪ well-dressed
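A minimal sketch of the kind of feature templates a MEMM tagger might extract for a word such as "well-dressed": neighboring words, the previous tag, prefixes and suffixes up to four characters, and a few binary spelling indicators. The function name, feature strings, and the `max_affix` parameter are hypothetical and not taken from a particular tagger.

```python
def word_features(words, i, prev_tag, max_affix=4):
    """Return a dict of binary features for position i given the previous tag."""
    word = words[i]
    feats = {
        "word=" + word.lower(): 1,
        "prev_tag=" + prev_tag: 1,
        "prev_word=" + (words[i - 1].lower() if i > 0 else "<s>"): 1,
        "next_word=" + (words[i + 1].lower() if i + 1 < len(words) else "</s>"): 1,
    }
    # Prefixes and suffixes of length 1..max_affix help with unknown words.
    for k in range(1, min(max_affix, len(word)) + 1):
        feats["prefix=" + word[:k]] = 1
        feats["suffix=" + word[-k:]] = 1
    if "-" in word:
        feats["has_hyphen"] = 1
    if any(c.isdigit() for c in word):
        feats["has_digit"] = 1
    if any(c.isupper() for c in word):
        feats["has_upper"] = 1
    return feats


FEATS = word_features(["a", "well-dressed", "man"], 1, "DT")
```

Here `FEATS` contains, among others, `prefix=well`, `suffix=ssed`, and `has_hyphen`.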

Decoding and Training MEMMs

Decoding MEMMs

▪ Greedy approach: doesn’t use evidence from future decisions
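A minimal sketch of greedy left-to-right decoding: at each position the tagger commits to the locally best tag given the word and the previously chosen tag, and never revisits that choice. The scoring function and its hand-set weights are made up for illustration; a real MEMM would use learned logistic regression weights.

```python
def greedy_decode(words, tags, local_score):
    """Tag left to right, committing to the best local decision each time."""
    output = []
    prev_tag = "<s>"
    for i in range(len(words)):
        best = max(tags, key=lambda t: local_score(words, i, prev_tag, t))
        output.append(best)
        prev_tag = best  # later evidence can no longer change this choice
    return output


# Hypothetical hand-set weights standing in for a trained classifier.
TOY_WEIGHTS = {
    ("word=janet", "NNP"): 2.0,
    ("word=will", "MD"): 2.0,
    ("word=will", "NN"): 1.0,
    ("word=back", "VB"): 1.0,
    ("word=back", "NN"): 1.0,
    ("word=the", "DT"): 3.0,
    ("word=bill", "NN"): 2.0,
    ("word=bill", "VB"): 1.0,
    ("prev_tag=MD", "VB"): 1.0,
    ("prev_tag=DT", "NN"): 1.0,
}


def toy_score(words, i, prev_tag, tag):
    """Sum the weights of the two features that fire for this candidate tag."""
    return (TOY_WEIGHTS.get(("word=" + words[i].lower(), tag), 0.0)
            + TOY_WEIGHTS.get(("prev_tag=" + prev_tag, tag), 0.0))


TAGS = ["NNP", "MD", "VB", "DT", "NN"]
GREEDY_TAGS = greedy_decode(["Janet", "will", "back", "the", "bill"], TAGS, toy_score)
```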

Viterbi

▪ filling the chart with
  ▪ HMM
  ▪ MEMM
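The two chart-filling recurrences, in the notation of J&M (v_t(j) is the best score of any path ending in state s_j at time t, and o_t is the observation at time t):

```latex
% HMM: transition probability times emission probability
v_t(j) = \max_{i=1}^{N} \; v_{t-1}(i)\, P(s_j \mid s_i)\, P(o_t \mid s_j),
\qquad 1 \le j \le N,\; 1 < t \le T

% MEMM: a single conditional probability of the state given
% the previous state and the current observation
v_t(j) = \max_{i=1}^{N} \; v_{t-1}(i)\, P(s_j \mid s_i, o_t),
\qquad 1 \le j \le N,\; 1 < t \le T
```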

Bidirectionality

▪ Label bias or observation bias problem
  ▪ will/NN to/TO fight/VB
▪ Linear-chain CRF (Lafferty et al. 2001)
▪ A bidirectional version of the MEMM (Toutanova et al. 2003)
▪ bi-LSTM

Neural sequence tagger

▪ Lample et al. 2016
  ▪ Neural Architectures for NER

Multilingual POS tagging

▪ In morphologically-rich languages like Czech, Hungarian, Turkish
  ▪ a 250,000 word token corpus of Hungarian has more than twice as many word types as a similarly sized corpus of English
  ▪ a 10 million word token corpus of Turkish contains four times as many word types as a similarly sized English corpus
▪ ⇒ many UNKs
▪ more information is coded in morphology

▪ In non-word-space languages like Chinese, word segmentation is either applied before tagging or done jointly
  ▪ UNKs are difficult: the majority of unknown words are common nouns and verbs because of extensive compounding
▪ Universal POS tagset accounts for cross-linguistic differences

Named Entity Recognition

Named Entity tags

Ambiguity in NER

NER as Sequence Labeling

IOB tagging scheme
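A minimal sketch of the IOB encoding: the first token of each entity gets a B- tag, the remaining tokens of the entity get I- tags, and everything else is O. Spans are represented here as (start, end, label) with an exclusive end, which is a convention chosen for this example, and a stray I- tag with no preceding B- is leniently treated as starting a span.

```python
def spans_to_iob(n_tokens, spans):
    """Convert (start, end_exclusive, label) spans to one IOB tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


def iob_to_spans(tags):
    """Recover (start, end_exclusive, label) spans from a sequence of IOB tags."""
    spans = []
    start = label = None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a final span
        # Close the open span unless this tag continues it.
        if start is not None and tag != "I-" + label:
            spans.append((start, i, label))
            start = label = None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            start, label = i, tag[2:]  # lenient handling of a stray I- tag
    return spans


TOKENS = ["American", "Airlines", "flies", "to", "Chicago"]
IOB_TAGS = spans_to_iob(len(TOKENS), [(0, 2, "ORG"), (4, 5, "LOC")])
```

With this encoding, NER becomes a per-token tagging problem, so the same sequence models used for POS tagging apply directly.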

A feature-based algorithm for NER

▪ gazetteers
  ▪ a list of place names providing millions of entries for locations with detailed geographical and political information
  ▪ binary indicator features
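A minimal sketch of turning a gazetteer into binary indicator features: each token gets a 1 if it falls inside any phrase found in the list. The tiny gazetteer, the function name, and the `max_len` limit on phrase length are assumptions for illustration; a real gazetteer would be far larger.

```python
# Made-up miniature gazetteer; entries are lowercased token tuples.
TOY_GAZETTEER = {("chicago",), ("new", "york"), ("los", "angeles")}


def gazetteer_features(tokens, gazetteer, max_len=3):
    """Return one 0/1 indicator per token: is it part of a gazetteer match?"""
    lowered = [t.lower() for t in tokens]
    in_gazetteer = [0] * len(tokens)
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            end = start + length
            if end <= len(tokens) and tuple(lowered[start:end]) in gazetteer:
                for i in range(start, end):
                    in_gazetteer[i] = 1
    return in_gazetteer


INDICATORS = gazetteer_features(["She", "moved", "to", "New", "York"], TOY_GAZETTEER)
```

The indicator is only one feature among many: a match does not settle the label, since the same string can name a person, a place, or an organization.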

Evaluation of NER

▪ F-score
▪ segmentation is a confound
  ▪ e.g., American/B-ORG Airlines
  ▪ 2 errors: false positive for O and a false negative for I-ORG
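A minimal sketch of the two views of the same mistake. `span_prf` scores whole entities, where a predicted entity counts only if its boundaries and label both match exactly, and `token_label_errors` counts per-label false positives and false negatives at the token level. The function names and the span representation (start, exclusive end, label) are assumptions for this example.

```python
from collections import Counter


def span_prf(gold_spans, pred_spans):
    """Entity-level precision, recall, and F1 with exact span matching."""
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def token_label_errors(gold_tags, pred_tags):
    """Count false positives and false negatives per label, token by token."""
    false_pos, false_neg = Counter(), Counter()
    for gold, pred in zip(gold_tags, pred_tags):
        if gold != pred:
            false_pos[pred] += 1  # the predicted label was wrongly asserted
            false_neg[gold] += 1  # the gold label was missed
    return false_pos, false_neg


# Gold: American/B-ORG Airlines/I-ORG; the system tags only "American".
FP, FN = token_label_errors(["B-ORG", "I-ORG"], ["B-ORG", "O"])
ENTITY_SCORES = span_prf([(0, 2, "ORG")], [(0, 1, "ORG")])
```

In this example a single mislabeled token produces one false positive (for O) and one false negative (for I-ORG), while the entity-level score drops to zero because the predicted boundary does not match.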

HMMs in Automatic Speech Recognition

ssssssssppppeeeeeeetshshshshllllaeaeaebbbbb

“speech lab”

[Figure: three layers. Words (w1, w2), sound types (s1 ... s7), and acoustic observations (a1 ... a7); labeled components: language model, acoustic model]
