unsupervised part-of-speech tagging with bilingual graph-based projections june 21 acl 2011 slav...

67
Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Upload: hiltraud-gaul

Post on 05-Apr-2015

104 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Unsupervised Part-of-Speech Tagging

with Bilingual Graph-Based Projections

June 21ACL 2011

Slav PetrovGoogle Research

Dipanjan DasCarnegie Mellon University

Page 2: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Part-of-Speech Tagging

Portland has a thriving music scene .NOUN VERB DET ADJ NOUN NOUN .

2

Page 3: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Supervised POS Tagging

Arabic

Bulgarian

Chinese

Danish

French

Greek

Italian

Korean

Russian

Spanish

Turkish

84 86 88 90 92 94 96 98 100

96.993.7 97.8

98.293.4 99.1

96.496.896.7

98.197.5

95.695.8

98.797.5

96.896.8

94.696.3

94.789.1

Supervised setting: average accuracy is 96.2% with TnT 3 (Brants, 2000)

Page 4: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Resource-Poor Languages

Several major languages with no or little annotated data

Oriya

Indonesian-MalayAzerbaijani

e.g.

See http://www.ethnologue.org/ethno_docs/distribution.asp?by=size

Haitian

However, lots of parallel and

unannotated data!Basic NLP tools like POS

tagging essential for development of

language technologies

4

Punjabi

VietnamesePolish

32 million

37 million20 million

Native speakers

7.7 million

109 million

69 million40 million

Page 5: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

(Nearly) Universal Part-of-Speech Tags

VERB DET

NOUN CONJ

PRON NUM

ADJ PRT

ADV .

ADP X

5

Page 6: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

(Nearly) Universal Part-of-Speech Tags

Example Penn Treebank tag maps:

NN NOUNNNP NOUNNNPS NOUNNNS NOUN

PRP PRONPRP$ PRONWP PRONWP$ PRON

np NOUNnc NOUN

Example Spanish Treebank tag maps:

p0 PRONpd PRONpe PRONpi PRONpn PRON

pp PRONpr PRONpt PRONpx PRON

See Petrov, Das and McDonald (2011)

Page 7: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

(Nearly) Universal Part-of-Speech Tags

Portland has a thriving music scene .NOUN VERB DET ADJ NOUN NOUN .

Portland hat eine prächtig gedeihende Musikszene .NOUN VERB DET ADJ ADJ NOUN .

পো��র্ট�ল্যা��ন্ড শহর এর সঙ্গী�ত �রিরবে�শ পো�শ উন্নত | NOUN NOUN ADP NOUN NOUN ADJ ADJ .

7

Page 8: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

State of the Art in Unsupervised POS Tagging

8

Page 9: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Unsupervised Part-of-Speech Tagging

Portland

hat eineprächti

g gedeihendeMusiksze

ne.

? ? ? ? ? ? ?

Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm

: observation sequence: state sequence

9Merialdo (1994)

Page 10: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Unsupervised Part-of-Speech Tagging

Portland

hat eineprächti

g gedeihendeMusiksze

ne.

? ? ? ? ? ? ?

Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm

one of the 12 coarse

tags

: observation sequence: state sequence

10Merialdo (1994)

Page 11: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Unsupervised Part-of-Speech Tagging

Portland

hat

? ?

Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm

transition multinomials

: observation sequence: state sequence

11Merialdo (1994)

Page 12: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Unsupervised Part-of-Speech Tagging

Portland

hat

? ?

Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm

emission multinomials

: observation sequence: state sequence

12Merialdo (1994)

Page 13: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Unsupervised Part-of-Speech Tagging

Portland

hat eineprächti

g gedeihendeMusiksze

ne.

? ? ? ? ? ? ?

Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm

Danish

Dutch German

Greek Italian Portuguese

Spanish

Swedish

Average

68.7 57.0 75.9 65.8 63.7 62.9 71.5 68.4 66.7EM-HMM

Poor average result 13

Johnson (2007)

Page 14: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Unsupervised Part-of-Speech Tagging

Hidden Markov Model (HMM) with locally normalized log-linear models

: observation sequence

Portland

hat

? ?emission multinomials

: state sequence

14

Berg-Kirkpatrick et al. (2010)

Page 15: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Unsupervised Part-of-Speech Tagging

Hidden Markov Model (HMM) with locally normalized log-linear models

: observation sequence

Portland

hat

? ?emission multinomials

suffixhyphen

capital lettersnumbers

...: state sequence

15

Berg-Kirkpatrick et al. (2010)

Page 16: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Unsupervised Part-of-Speech Tagging

Hidden Markov Model (HMM) with locally normalized log-linear models

: observation sequence

Portland

hat

? ?emission multinomials

suffixhyphen

capital lettersnumbers

...

Estimated using gradient-based methods

: state sequence

16

Berg-Kirkpatrick et al. (2010)

Page 17: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Unsupervised Part-of-Speech Tagging

Hidden Markov Model (HMM) with locally normalized log-linear models

Portland

hat

? ?emission multinomials

Danish

Dutch German

Greek Italian Portuguese

Spanish

Swedish

Average

68.7 57.0 75.9 65.8 63.7 62.9 71.5 68.4 66.7

69.1

65.1 81.3 71.8 68.1 78.4 80.2 70.1 73.0

EM-HMM

Feature-HMM

Estimated using gradient-based methods

Improvements across all languages 17

Page 18: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Portland

hat eineprächti

g gedeihendeMusiksze

ne.

NOUN VERB

PRONDETADJNUM

ADJADV ADJ NOUN .

Unsupervised POS Tagging with Dictionaries

Hidden Markov Model (HMM) with locally normalized log-linear modelsState space constrained by possible gold

tags

18

Page 19: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Portland

hat eineprächti

g gedeihendeMusiksze

ne.

NOUN VERB

PRONDETADJNUM

ADJADV ADJ NOUN .

Unsupervised POS Tagging with Dictionaries

Hidden Markov Model (HMM) with locally normalized log-linear modelsState space constrained by possible gold

tags

Danish

Dutch German

Greek Italian Portuguese

Spanish

Swedish

Average

68.7 57.0 75.9 65.8 63.7 62.9 71.5 68.4 66.7

69.1 65.1 81.3 71.8 68.1 78.4 80.2 70.1 73.0

93.1 94.7 93.5 96.6 96.4 94.0 95.8 85.5 93.7

EM-HMM

Feature-HMM

w/ gold dictionary

19

Page 20: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

For most languages, access to high-quality

tag dictionaries is not realistic.

Ideas: 1) Use supervision in resource-rich

languages2) Use parallel data

3) Construct projected tag lexicons

20

Morphologically rich languages only have base

forms in dictionaries

Page 21: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Bilingual Projection

Portland has a thriving music scene .

NOUN VERB DET ADJ NOUN NOUN .

automatic labels from supervised tagger, 97% accuracy

21

Page 22: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Bilingual Projection

Portland has a thriving music scene .

NOUN VERB DET ADJ NOUN NOUN .

Portland hat eine prächtig gedeihende Musikszene .

Automatic unsupervised alignments from translation data(available for more than 50 languages)

22

Page 23: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Bilingual Projection

Portland has a thriving music scene .

NOUN VERB DET ADJ NOUN NOUN .

Portland hat eine prächtig gedeihende Musikszene .

Baseline1: direct projection

unaligned word

NOUN(most frequent tag)

23

Yarowsky and Ngai (2001)

Page 24: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Bilingual Projection

Portland hat eine prächtig gedeihende Musikszene .NOUN VERB DET NOUN ADJ NOUN .

+more projected tagged sentences

supervised training

tagger

24

(Brants, 2000)

Yarowsky and Ngai (2001)

Baseline1: direct projection

Page 25: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Bilingual Projection

Baseline 1: direct projection

25

Danish

Dutch German

Greek Italian Portuguese

Spanish

Swedish

Average

68.7 57.0 75.9 65.8 63.7 62.9 71.5 68.4 66.7

69.1 65.1 81.3 71.8 68.1 78.4 80.2 70.1 73.0

73.6

77.0 83.2 79.3 79.7 82.6 80.1 74.7 78.8

EM-HMM

Directprojection

Feature-HMM

Yarowsky and Ngai (2001)

Page 26: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Bilingual Projection

Baseline 1: direct projection

26

Yarowsky and Ngai (2001)

consistent improvements over unsupervised models

Danish

Dutch German

Greek Italian Portuguese

Spanish

Swedish

Average

68.7 57.0 75.9 65.8 63.7 62.9 71.5 68.4 66.7

69.1 65.1 81.3 71.8 68.1 78.4 80.2 70.1 73.0

73.6

77.0 83.2 79.3 79.7 82.6 80.1 74.7 78.8

EM-HMM

Directprojection

Feature-HMM

Page 27: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Bilingual ProjectionBaseline 2: lexicon

projection

27

Page 28: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Bilingual ProjectionBaseline 2: lexicon

projection

NOUN

Portland

VERB

has

DET

aADJ

thriving

NOUN

music

NOUN

scene

.

.

Portland

hat

eine

prächtig

gedeihende

Musikszene

.

28

Page 29: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Bilingual ProjectionBaseline 2: lexicon

projectionNOUN

Portland

Portland

ADJ

thriving

gedeihende

prächtigVERB

has

hat

DET

a

eine

NOUN

scene

Musikszene

NOUN

music

.

.

.

ignore unaligned word

29

Page 30: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Bilingual ProjectionBaseline 2: lexicon

projectionNOUN

Portland

Portland

ADJ

thriving

gedeihende

VERB

has

hat

DET

a

eine

NOUN

scene

Musikszene

NOUN

music

.

.

.

Bag of alignments

30

Page 31: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Bilingual ProjectionBaseline 2: lexicon

projectionNOUN

Portland

Portland

ADJ

thriving

gedeihende

VERB

has

hat

eine

NOUN

scene

Musikszene

NOUN

music

.

.

.

ADJNOUN

XNUM

PRONVERB

DET

DET

a31ADJ

NOUNX

NUMPRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

Page 32: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Bilingual ProjectionBaseline 2: lexicon

projectionNOUN

Portland

Portland

ADJ

thriving

gedeihende

VERB

has

hat

eine

NOUN

scene

NOUN

music

.

.

.

DET

a NUMone

PRONone

Musikszene

32ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

Page 33: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Bilingual ProjectionBaseline 2: lexicon

projectionNOUN

Portland

Portland

ADJ

thriving

gedeihende

VERB

has

hat

eine

NOUN

scene

NOUN

music

.

.

.

DET

a NUMone

PRONone

Musikszene

VERB

thriving

33ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DETADJ

NOUNX

NUMPRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

Page 34: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Bilingual ProjectionBaseline 2: lexicon

projection

Portland

gedeihende

hat

eine

Musikszene

.

After scanning all the parallel data:

= probability of a tag given a word

34

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

Page 35: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Bilingual ProjectionBaseline 2: lexicon

projection

Feature HMM constrained with projected dictionary

Improvements over simple projection for majority of the languages35

Danish

Dutch German

Greek Italian Portuguese

Spanish

Swedish

Average

68.7 57.0 75.9 65.8 63.7 62.9 71.5 68.4 66.7

69.1 65.1 81.3 71.8 68.1 78.4 80.2 70.1 73.0

73.6 77.0 83.2 79.3 79.7 82.6 80.1 74.7 78.8

79.0

78.8 82.4 76.3 84.8 87.0 82.8 79.4 81.3

EM-HMM

Directprojection

ProjectedDictionar

y

Feature-HMM

Page 36: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Can coverage be improved?

Idea:Projected lexicon expansion and

refinement using label propagation

No information about unaligned words

36

Portland hat eine prächtig gedeihende Musikszene .

Portland has a thriving music scene .

NOUN VERB DET ADJ NOUN NOUN .

Page 37: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

How can label propagation help? For a language:

• Build graph over a 2M trigram types as vertices• compute similarity matrix using co-occurrence

statistics

• Label distribution at each vertex tag distribution over the trigram’s middle word

Subramanya, Petrov and Pereira (2010)

Our Model: Graph-Based Projections

37

Page 38: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

ist gut bei

ist lebhafter bei

ist wichtig bei

ist fein bei

gutem Essen zugetan

fuers Essen drauf

1000 Essen pro

schlechtes Essen und

zum Essen niederlassen

zu realisieren ,

zu erreichen ,

zu stecken , zu essen ,

Example Graph in German

38

Page 39: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

ist gut bei

ist lebhafter bei

ist wichtig bei

ist fein bei

gutem Essen zugetan

fuers Essen drauf

1000 Essen pro

schlechtes Essen und

zum Essen niederlassen

zu realisieren ,

zu erreichen ,

zu stecken , zu essen ,

Example Graph in German

39

NOUN

VERB

Page 40: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

How can label propagation help? For a target language:

• Build graph over a 2M trigram types as vertices• compute similarity matrix using co-occurrence

statistics

• Label distribution at each vertex tag distribution over the trigram’s middle word• Plug in auto-tagged words from a source language

• Links between source and target language units are word alignments 40

Our Model: Graph-Based Projections

Page 41: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

ist gut bei

ist lebhafter bei

ist wichtig bei

ist fein bei

gutem Essen zugetan

fuers Essen drauf

1000 Essen pro

schlechtes Essen und

zum Essen niederlassen

zu realisieren ,

zu erreichen ,

zu stecken , zu essen ,eat

food

eat

eating

NOUNVERB

VERBVERB

goodADJ

nicelyADV

fineADJ

importantADJBilingual Graph

41

Page 42: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

How can label propagation help?

For a target language:

• Plug in auto-tagged words from a source language

• Links between source and target language units are word alignments

• Run first stage of label propagation• Source language target language

42

Our Model: Graph-Based Projections

Page 43: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

ist gut bei

ist lebhafter bei

ist wichtig bei

ist fein bei

gutem Essen zugetan

fuers Essen drauf

1000 Essen pro

schlechtes Essen und

zum Essen niederlassen

zu realisieren ,

zu erreichen ,

zu stecken , zu essen ,eat

food

eat

eating

NOUNVERB

VERBVERB

goodADJ

nicelyADV

fineADJ

importantADJ

First Stage of Label Propagation

43

Page 44: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

ist gut bei

ist lebhafter bei

ist wichtig bei

ist fein bei

gutem Essen zugetan

fuers Essen drauf

1000 Essen pro

schlechtes Essen und

zum Essen niederlassen

zu realisieren ,

zu erreichen ,

zu stecken , zu essen ,eat

food

eat

eating

NOUNVERB

VERBVERB

goodADJ

nicelyADV

fineADJ

importantADJ

First Stage of Label Propagation

VERBNOUN

44

ADJADV

ADJADV

Page 45: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

How can label propagation help?

For a target language:

• Plug in auto-tagged words from a source language

• Links between source and target language units are word alignments

• Run first stage of label propagation• Source language target language

• Run second stage of label propagation • within target language vertices• graph objective function with squared

penalties 45

Our Model: Graph-Based Projections

Page 46: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

ist gut bei

ist lebhafter bei

ist wichtig bei

ist fein bei

gutem Essen zugetan

fuers Essen drauf

1000 Essen pro

schlechtes Essen und

zum Essen niederlassen

zu realisieren ,

zu erreichen ,

zu stecken , zu essen ,eat

food

eat

eating

NOUNVERB

VERBVERB

goodADJ

nicelyADV

fineADJ

importantADJ

Second Stage of Label Propagation

VERBNOUN

46

ADJADV

ADJADV

Page 47: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

ADJ

ist gut bei

ist lebhafter bei

ist wichtig bei

ist fein bei

gutem Essen zugetan

fuers Essen drauf

1000 Essen pro

schlechtes Essen und

zum Essen niederlassen

zu realisieren ,

zu erreichen ,

zu stecken , zu essen ,eat

food

eat

eating

NOUNVERB

VERBVERB

goodADJ

nicelyADV

fineADJ

importantADJ

Second Stage of Label Propagation

VERBNOUN

ADJ

VERBNOUN

47

ADJADV

ADJADV

Page 48: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

ADJ

ist gut bei

ist lebhafter bei

ist wichtig bei

ist fein bei

gutem Essen zugetan

fuers Essen drauf

1000 Essen pro

schlechtes Essen und

zum Essen niederlassen

zu realisieren ,

zu erreichen ,

zu stecken , zu essen ,eat

food

eat

eating

NOUNVERB

VERBVERB

goodADJ

nicelyADV

fineADJ

importantADJ

Second Stage of Label Propagation

VERBNOUN

ADJ

VERBNOUN

VERB

VERBNOUN

48

ADJADV

ADJADV

Page 49: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

ADJ

ist gut bei

ist lebhafter bei

ist wichtig bei

ist fein bei

gutem Essen zugetan

fuers Essen drauf

1000 Essen pro

schlechtes Essen und

zum Essen niederlassen

zu realisieren ,

zu erreichen ,

zu stecken , zu essen ,eat

food

eat

eating

NOUNVERB

VERBVERB

goodADJ

nicelyADV

fineADJ

importantADJ

Second Stage of Label Propagation

VERBNOUN

ADJ

VERBNOUN

VERB

VERBNOUN

49

ADJADV

ADJADV

Continues till convergence...

Page 50: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

fein lebhafter realisieren

Portland

gedeihende

hat

eine

Musikszene

.

End result?

50

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

ADVNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

Our Model: Graph-Based Projections

Page 51: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

fein lebhafter realisieren

Portland

gedeihende

hat

eine

Musikszene

.

51

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

ADVNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

A larger set of tag distributions better and larger dictionary

ADJNOUN

XNUM

PRONVERB

DET

Our Model: Graph-Based Projections

Page 52: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

52

Lexicon Expansion

Danish

Dutch

Germ

an

Greek

Italia

n

Portu

gues

e

Span

ish

Swed

ish0

20000

40000

60000

80000

100000

120000

140000

Projected Dictionary Graph-Based Projections

thousandsof words

Our Model: Graph-Based Projections

Page 53: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Brief Overview:Graph-Based Learning

with Labeled and Unlabeled Data

53

Page 54: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

labeled datapoints unlabeled datapoints

supervised label distributions

distributions to be found

Zhu, Ghahramani and Lafferty (2003) 54

0.9

0.01

0.8

0.9

0.1

= symmetric weight matrix

0.05

Page 55: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

0.9

0.01

0.8

0.9

0.1

Label Propagation

Zhu, Ghahramani and Lafferty (2003) 55

0.05

Page 56: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

0.9

0.01

0.8

0.9

0.1

set of distributions over unlabeled vertices56

Label Propagation

0.05

Page 57: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

0.9

0.01

0.8

0.9

0.1

unlabeled vertices57

Label Propagation

0.05

Page 58: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

0.9

0.01

0.8

0.9

0.1

brings the distributions of similarvertices closer 58

Label Propagation

0.05

Page 59: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

0.9

0.01

0.8

0.9

0.1

brings the distributions of uncertain neighborhoods

close to the uniform distribution

Size of the label set

59

Label Propagation

0.05

Page 60: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

0.9

0.01

0.8

0.9

0.1

Iterative updates for optimization60

Label Propagation

0.05

Page 61: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

61

Final Results

Page 62: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Feature HMM constrained with graph-based dictionary

Danish

Dutch German

Greek Italian Portuguese

Spanish

Swedish

Average

68.7 57.0 75.9 65.8 63.7 62.9 71.5 68.4 66.7

69.1 65.1 81.3 71.8 68.1 78.4 80.2 70.1 73.0

73.6 77.0 83.2 79.3 79.7 82.6 80.1 74.7 78.8

79.0 78.8 82.4 76.3 84.8 87.0 82.8 79.4 81.3

83.2

79.5 82.8 82.5 86.8 87.9 84.2 80.5 83.4

EM-HMM

Feature-HMM

Directprojection

ProjectedDictionar

yGraph-Based

Projections

62

Our Model: Graph-Based Projections

Page 63: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Feature HMM constrained with graph-based dictionary

Danish

Dutch German

Greek Italian Portuguese

Spanish

Swedish

Average

68.7 57.0 75.9 65.8 63.7 62.9 71.5 68.4 66.7

69.1 65.1 81.3 71.8 68.1 78.4 80.2 70.1 73.0

73.6 77.0 83.2 79.3 79.7 82.6 80.1 74.7 78.8

79.0 78.8 82.4 76.3 84.8 87.0 82.8 79.4 81.3

83.2

79.5 82.8 82.5 86.8 87.9 84.2 80.5 83.4

EM-HMM

Feature-HMM

Directprojection

ProjectedDictionar

yGraph-Based

Projections

93.1 94.7 93.5 96.6 96.4 94.0 95.8 85.5 93.7w/ gold dictionary

96.9 94.9 98.2 97.8 95.8 97.2 96.8 94.8 96.6supervised

63

Our Model: Graph-Based Projections

Page 64: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Concluding Notes

• Reasonably accurate POS taggers without direct supervision–Evaluated on major European languages

• Towards a standard of universal POS tags

• Traditional evaluation of unsupervised POS taggers done using greedy metrics that use labeled data–Our presented models avoid these evaluation methods

64

Page 65: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Future Directions

• Scaling up the number of nodes in the graph from 2M to billions may help create larger lexicons

• Including penalties in the graph objective that induce sparse tag distributions at each graph vertex

• Inclusion of multiple languages in the graph may further improve results–Label propagation in one huge multilingual graph

65

Page 66: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

66

http://code.google.com/p/pos-projection/

Projected POS Tagged dataavailable at:

Page 67: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Portland has a thriving music scene .

NOUNADJADJ

Portland hat eine prächtig gedeihende Musikszene .

পো��র্ট�ল্যা��ন্ড শহর এর সঙ্গী�ত �রিরবে�শ পো�শ উন্নত | 

NOUN VERB DET ADJ NOUN NOUN .

Portland tiene una escena musical vibrante .

波特兰 有 一个 生机勃勃的 音乐 场景

Portland a une scène musicale florissante .

ADJNOUNNOUN NOUN

ADP

.NOUN VERB DET ADJ

ADJ

.NOUN VERB DET ADJ NOUN NOUNNOUN VERB DET NOUN

ADJ

ADJ .

Questions?

67