How much do word embeddings encode about syntax?
Jacob Andreas and Dan Klein, UC Berkeley

Page 1

How much do word embeddings encode about syntax?

Jacob Andreas and Dan Klein, UC Berkeley

Page 2

Everybody loves word embeddings

[Figure: word-embedding plot of the determiners and quantifiers few, most, that, the, a, each, this, every]

[Collobert 2011, Mikolov 2013, Freitag 2004, Schuetze 1995, Turian 2010]

Page 3

What might embeddings bring?

Cathleen complained about the magazine’s shoddy editorial quality .

[Figure: the sentence annotated with the related words Mary, executive, and average]

Page 4

Three hypotheses

• Vocabulary expansion (good for OOV words)

• Statistic pooling (good for medium-frequency words)

• Embedding structure (good for features)

[Figure: one illustration per hypothesis: Cathleen and Mary; editorial, average, and executive; "transitivity" and "tense"]

Page 5

Vocabulary expansion:

Embeddings help with handling out-of-vocabulary (OOV) words

[Figure: Cathleen paired with Mary]

Page 6

Vocabulary expansion

[Figure: embedding space containing John, Mary, Pierre, yellow, enormous, hungry, and Cathleen]

Page 7

Vocabulary expansion

[Figure: embedding space containing John, Mary, Pierre, yellow, enormous, and hungry]

Cathleen complained about the magazine’s shoddy editorial quality.

[Figure: Cathleen from the sentence linked to its neighbor Mary]
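A minimal sketch of the vocabulary-expansion idea, assuming cosine similarity as the neighbor metric and toy 3-dimensional vectors invented for illustration (the helper expand, the vocabulary, and all vectors are hypothetical, not the authors' code). An out-of-vocabulary word borrows the identity of its nearest in-vocabulary neighbor, so the parser can reuse that neighbor's statistics.

```python
import math

# Toy embeddings, invented for illustration only.
EMBEDDINGS = {
    "John": [0.9, 0.1, 0.0],
    "Mary": [0.8, 0.2, 0.1],
    "Pierre": [0.85, 0.05, 0.1],
    "yellow": [0.0, 0.9, 0.3],
    "enormous": [0.1, 0.8, 0.4],
    "hungry": [0.05, 0.85, 0.2],
    "Cathleen": [0.82, 0.18, 0.12],  # has an embedding but was never seen in training
}

# Words a hypothetical parser saw in its training data.
VOCAB = {"John", "Mary", "Pierre", "yellow", "enormous", "hungry"}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand(word):
    """Return word if known, else its nearest in-vocabulary neighbor, else None."""
    if word in VOCAB:
        return word
    if word not in EMBEDDINGS:
        return None  # no embedding either: leave it to the parser's usual OOV model
    return max(VOCAB, key=lambda w: cosine(EMBEDDINGS[word], EMBEDDINGS[w]))
```

With these toy vectors, expand("Cathleen") returns "Mary", while a word with no embedding at all returns None and falls through to whatever OOV handling the parser already has.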

Page 8

Vocab. expansion results

[Bar chart, y-axis 60–100]
Baseline: 91.13
+OOV: 91.22

Page 9

Vocab. expansion results

[Bar chart, y-axis 70–75]
Baseline: 71.88
+OOV: 72.20

(300 sentences)

Page 10

Statistic pooling hypothesis:

Embeddings help with handling medium-frequency words

[Figure: the words average, editorial, and executive]

Page 11

Statistic pooling

[Figure: the words executive, kind, giant, editorial, and average, shown with their observed tag sets {NN, JJ}, {NN}, {NN, JJ}, {JJ}, {NN}]

Page 12

Statistic pooling

[Figure: the same words after pooling, each now with the tag set {NN, JJ}]

Page 13

Statistic pooling

[Figure: the words with their original tag sets {NN, JJ}, {NN}, {NN, JJ}, {JJ}, {NN}; editorial is highlighted with the tag NN]
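One way to picture statistic pooling is sketched below. The sketch assumes invented tag counts, a hand-written neighbor list, and a fixed down-weighting of neighbor counts (TAG_COUNTS, NEIGHBORS, pooled_tags, and weight are all hypothetical). A real system would derive neighbors from embedding similarity, and this is not the authors' pooling scheme. A word's own tag counts are combined with scaled counts from its embedding neighbors, so a rarely seen word can pick up tags it was never observed with.

```python
from collections import Counter

# Toy per-word tag counts, invented for illustration (not from any treebank).
TAG_COUNTS = {
    "executive": Counter({"NN": 40, "JJ": 25}),
    "kind": Counter({"NN": 30}),
    "giant": Counter({"NN": 12, "JJ": 8}),
    "editorial": Counter({"JJ": 3}),
    "average": Counter({"NN": 5}),
}

# Hypothetical embedding neighbors, listed by hand for the example.
NEIGHBORS = {
    "editorial": ["executive", "average"],
    "average": ["editorial", "executive"],
}


def pooled_tags(word, weight=0.5):
    """Add each neighbor's tag counts, scaled by weight, to a copy of the word's own counts."""
    pooled = Counter(TAG_COUNTS.get(word, Counter()))  # copy, so TAG_COUNTS is untouched
    for neighbor in NEIGHBORS.get(word, []):
        for tag, count in TAG_COUNTS.get(neighbor, Counter()).items():
            pooled[tag] += weight * count
    return pooled
```

Here editorial was only ever seen as JJ, but pooled_tags("editorial") yields NN 22.5 and JJ 15.5, so the NN reading now has support. The weight parameter controls how much a word trusts its neighbors over its own evidence.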

Page 14

Statistic pooling results

[Bar chart, y-axis 60–100]
Baseline: 91.13
+Pooling: 91.11

Page 15

Statistic pooling results

[Bar chart, y-axis 70–75]
Baseline: 71.88
+Pooling: 72.21

(300 sentences)

Page 16

Embedding structure hypothesis:

The organization of the embedding space directly encodes useful features

[Figure: “transitivity” and “tense” as directions in the embedding space]

Page 17

Embedding structure

[Figure: the verbs vanished, vanishing, dined, dining, devoured, devouring, assassinated, and assassinating, with a “transitivity” direction and a “tense” direction; dined is shown with the tag VBD]

[Huang 2011]
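The following is a sketch of how directions like “tense” and “transitivity” could be turned into parser features. It assumes invented 2-D vectors and estimates each direction as the mean difference over hand-picked word pairs, then emits the sign of a word's projection conjoined with its tag (EMB, mean_difference, direction_features, and the feature strings are hypothetical; this is not the feature set used in the talk or in [Huang 2011]).

```python
# Toy 2-D embeddings, invented for illustration; real embeddings have many
# more dimensions and no coordinate is guaranteed to mean anything.
EMB = {
    "vanished": [-1.0, 1.0],
    "vanishing": [-1.0, -1.0],
    "dined": [-0.5, 0.9],
    "dining": [-0.5, -1.1],
    "devoured": [1.0, 1.1],
    "devouring": [1.0, -0.9],
    "assassinated": [0.9, 1.0],
    "assassinating": [0.9, -1.0],
}


def mean_difference(pairs):
    """Estimate a direction as the average of EMB[a] - EMB[b] over (a, b) pairs."""
    diffs = [[x - y for x, y in zip(EMB[a], EMB[b])] for a, b in pairs]
    return [sum(col) / len(diffs) for col in zip(*diffs)]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


# Past-tense form minus -ing form of the same verb.
TENSE = mean_difference([("vanished", "vanishing"), ("dined", "dining"),
                         ("devoured", "devouring"), ("assassinated", "assassinating")])
# Transitive verb minus intransitive verb in the same form.
TRANSITIVITY = mean_difference([("devoured", "dined"), ("assassinated", "vanished"),
                                ("devouring", "dining"), ("assassinating", "vanishing")])


def direction_features(word, tag):
    """Sign of the word's projection on each direction, conjoined with its tag.

    Thresholding at zero assumes the vectors are roughly centered.
    """
    feats = []
    for name, direction in (("tense", TENSE), ("trans", TRANSITIVITY)):
        sign = "+" if dot(EMB[word], direction) > 0 else "-"
        feats.append(f"{name}{sign}_{tag}")
    return feats
```

With these vectors, direction_features("dined", "VBD") gives ["tense+_VBD", "trans-_VBD"]: dined projects toward the past-tense side and away from the transitive side. Conjoining with the tag mirrors the "dined / VBD" pairing on the slide, but the binarization scheme is only one of several plausible choices.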

Page 18

Embedding structure results

[Bar chart, y-axis 60–100]
Baseline: 91.13
+Features: 91.08

Page 19

Embedding structure results

[Bar chart, y-axis 70–75]
Baseline: 71.88
+Features: 70.32

(300 sentences)

Page 20

To summarize

[Bar chart, y-axis 60–100, comparing Baseline, +OOV, +Pooling, and +Features]

(300 sentences)
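The summary chart's bar values did not survive extraction, so the sketch below recomputes each system's change relative to the baseline from the scores printed on the individual results slides. The assumptions are that bar values pair with labels in slide order, and that "default" means the slides carrying no "(300 sentences)" note. The slides do not name the metric, and the helper deltas is hypothetical.

```python
# Scores read off the individual results slides. "default" = slides without a
# "(300 sentences)" note; the slides do not say which metric these are.
RESULTS = {
    "default": {"Baseline": 91.13, "+OOV": 91.22, "+Pooling": 91.11, "+Features": 91.08},
    "300 sentences": {"Baseline": 71.88, "+OOV": 72.20, "+Pooling": 72.21, "+Features": 70.32},
}


def deltas(scores):
    """Change relative to Baseline for every other system, rounded to 2 decimals."""
    base = scores["Baseline"]
    return {name: round(value - base, 2) for name, value in scores.items() if name != "Baseline"}


for setting, scores in RESULTS.items():
    for name, change in deltas(scores).items():
        print(f"{setting:<14} {name:<10} {change:+.2f}")
```

In the default setting every change is under a tenth of a point (+0.09, -0.02, -0.05). With 300 sentences, +OOV and +Pooling each gain about a third of a point, while +Features loses 1.56.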

Page 21

Combined results

[Bar chart, y-axis 60–100]
Baseline: 90.70
+OOV+Pooling: 90.11

Page 22

Combined results

[Bar chart, y-axis 70–75]
Baseline: 71.88
+OOV+Pooling: 72.21

(300 sentences)

Page 23

What about…

• Domain adaptation? (no significant gain)

• French? (no significant gain)

• Other kinds of embeddings? (no significant gain)

Page 24

Why didn’t it work?

• Context clues often provide enough information to reason around words with incomplete or incorrect statistics

• The parser already has robust models for OOV and small-count words

• Sometimes the “help” from embeddings is worse than nothing:

bifurcate Soap homered Paschi tuning unrecognized

Page 25

What about other parsers?

• Dependency parsers (continuous representations as syntactic abstraction)

• Neural networks (continuous representations as structural requirement)

[Henderson 2004, Socher 2013, Koo 2008, Bansal 2014]

Page 26

Conclusion

• Embeddings provide no apparent benefit to a state-of-the-art parser for:
  – OOV handling
  – Parameter pooling
  – Lexicon features

• Code online at http://cs.berkeley.edu/~jda