Vector space word representations
Rani Nelken, PhD, Director of Research, Outbrain
@RaniNelken


Page 1:

Vector space word representations

Rani Nelken, PhD Director of Research, Outbrain

@RaniNelken

Page 2:

https://www.flickr.com/photos/hyku/295930906/in/photolist-EbXgJ-ajDBs8-9hevWb-s9HX1-5hZqnb-a1Jk8H-a1Mcx7-7QiUWL-6AFs53-9TRtkz-bqt2GQ-F574u-F56EA-3imqK7/

Page 3:

Words = atoms?

Page 4:
Page 5:

That would be crazy for numbers

https://www.flickr.com/photos/proimos/4199675334/

Page 6:

The distributional hypothesis

What is a word?

Wittgenstein (1953): The meaning of a word is its use in the language

Firth (1957): You shall know a word by the company it keeps

Page 7:

From atomic symbols to vectors

• Map words to dense numerical vectors “representing” their contexts

• Map words with similar contexts to vectors with small angle
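A minimal sketch of that "small angle" idea, using numpy and made-up 3-dimensional vectors (real embeddings are typically 100–300 dimensional):

```python
import numpy as np

def cosine(u, v):
    # cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# toy vectors; in practice these come from a trained model
cat = np.array([0.8, 0.1, 0.3])
dog = np.array([0.7, 0.2, 0.3])
car = np.array([0.1, 0.9, 0.2])

print(cosine(cat, dog))  # high: similar contexts, small angle
print(cosine(cat, car))  # lower: different contexts, larger angle
```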

Page 8:

History

• Hard clustering: Brown clustering

• Soft clustering: LSA, Random projections, LDA

• Neural nets

Page 9:

Feedforward Neural Net Language Model

Page 10:

Training

• Inputs are one-hot vectors of the context words (0…0, 1, 0…0)

• We’re trying to learn a vector for each word (“projection”)

• Such that the output is close to the one-hot vector of w(t)
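A toy numpy sketch of the projection step described above; vocabulary size, dimensions, and the random matrix are purely illustrative:

```python
import numpy as np

V, d = 10000, 100            # vocabulary size, embedding dimension (illustrative)
P = np.random.randn(V, d)    # projection matrix: row i is the learned vector for word i

def one_hot(i, size=V):
    x = np.zeros(size)
    x[i] = 1.0
    return x

# multiplying a one-hot vector by P just selects one row:
# the dense vector for that word
w_context = one_hot(42)
projected = w_context @ P    # shape (d,)
assert np.allclose(projected, P[42])

# the context vectors are then combined (concatenated in the classic model),
# passed through a hidden layer, and a softmax over V predicts w(t);
# training nudges P so the softmax puts high probability on the true w(t)
```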

Page 11:

Simpler model: Word2Vec
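The slide's diagram isn't reproduced here, but roughly what makes Word2Vec simpler: the skip-gram variant with negative sampling drops the hidden layer, so each training step is just a few logistic-regression-style updates. A hedged numpy sketch, with all sizes, rates, and the sampled word ids invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, lr = 1000, 50, 0.025                  # vocabulary size, dimension, learning rate (illustrative)
W_in = rng.normal(scale=0.1, size=(V, d))   # "input" vectors: the embeddings we keep
W_out = np.zeros((V, d))                    # "output" (context) vectors

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgns_step(center, context, negatives):
    # one skip-gram / negative-sampling update: push the center word's vector
    # towards its observed context word and away from a few sampled "noise" words
    v = W_in[center].copy()
    grad_v = np.zeros_like(v)
    for word, label in [(context, 1.0)] + [(neg, 0.0) for neg in negatives]:
        u = W_out[word]
        g = sigmoid(np.dot(v, u)) - label   # gradient of the logistic loss w.r.t. the score
        grad_v += g * u
        W_out[word] -= lr * g * v
    W_in[center] -= lr * grad_v

# one illustrative update: center word 3 observed with context word 7
sgns_step(3, 7, negatives=rng.integers(0, V, size=5))
```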

Page 12:
Page 13:

What can we do with these representations?

• Plug them into your existing classifier

• Plug them into further neural nets – better!

• Improves accuracy on many NLP tasks:
– Named entity recognition
– POS tagging
– Sentiment analysis
– Semantic role labeling
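As a sketch of the first option above, one common recipe is to average the word vectors of a text and feed the result to an off-the-shelf classifier; the tiny vocabulary, random "embeddings", and labels below are fake, just enough to make the snippet run:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# `vectors` stands in for a word -> embedding lookup (e.g. pre-trained word2vec);
# here it is faked with random 50-d vectors so the snippet is self-contained
rng = np.random.default_rng(0)
vectors = {w: rng.standard_normal(50)
           for w in ["great", "movie", "terrible", "plot", "loved", "boring"]}

def doc_vector(tokens, dim=50):
    # average the vectors of the known words in a document
    vecs = [vectors[t] for t in tokens if t in vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

# toy labeled data: tokenized texts with sentiment labels
docs = [["loved", "great", "movie"], ["boring", "terrible", "plot"],
        ["great", "plot"], ["terrible", "movie"]]
labels = [1, 0, 1, 0]

X = np.vstack([doc_vector(d) for d in docs])
clf = LogisticRegression().fit(X, labels)
print(clf.predict([doc_vector(["loved", "plot"])]))
```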

Page 14:

Back to cheese…

• cos(crumbled, cheese) = 0.042

• cos(crumpled, cheese) = 0.203

Page 15:

http://en.wikipedia.org/wiki/Penn_%26_Teller#mediaviewer/File:Penn_and_Teller_(1988).jpg

And now for the magic

Page 16:

“Magical” property

• [Paris] - [France] + [Italy] ≈ [Rome]

• [king] - [man] + [woman] ≈ [queen]

• We can use it to solve word analogy problems: Boston : Red_Sox = New_York : ?

• Demo
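The demo can be reproduced with gensim's KeyedVectors (recent versions); the vectors file name below is a placeholder for whatever pre-trained embeddings you load:

```python
from gensim.models import KeyedVectors

# path is a placeholder, e.g. the pre-trained GoogleNews vectors distributed with word2vec
model = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

# [king] - [man] + [woman] ≈ ?
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=1))

# Boston : Red_Sox = New_York : ?
print(model.most_similar(positive=["Red_Sox", "New_York"], negative=["Boston"], topn=1))
```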

Page 17:
Page 18:

Why does it work?

[king] - [man] + [woman] ≈ [queen]

If all vectors are unit-normalized, cos(x, [king] – [man] + [woman]) is proportional to cos(x, [king]) – cos(x, [man]) + cos(x, [woman]), so we can rank candidate words x by the sum on the right

[queen] is a good candidate
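A quick numpy check of that decomposition, assuming all word vectors are unit-normalized so cosine reduces to a dot product; the vectors here are random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)
unit = lambda v: v / np.linalg.norm(v)

# stand-ins for the (unit-normalized) vectors of x, king, man, woman
x, king, man, woman = (unit(rng.standard_normal(100)) for _ in range(4))

combined = king - man + woman
lhs = np.dot(unit(combined), x)                                   # cos(x, king - man + woman)
rhs = np.dot(king, x) - np.dot(man, x) + np.dot(woman, x)         # additive combination of cosines

# lhs equals rhs divided by ||king - man + woman||, a constant that does not
# depend on x, so ranking candidates by either expression gives the same winner
print(lhs, rhs / np.linalg.norm(combined))
```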

Page 19:

It doesn’t always work

• London : England = Baghdad : ?

• We expect Iraq, but get Mosul

• We’re looking for a word that is close to Baghdad, and to England, but not to London

Page 20:

Why did it fail?

• London : England = Baghdad : ?

• cos(Mosul, Baghdad) >> cos(Iraq, London)

• Instead of adding the cosines, multiply them

• Improves accuracy
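A sketch of the multiplicative variant (3CosMul, from Levy and Goldberg 2014, presumably what the slide refers to): shift each cosine into [0, 1] and combine them by multiplication and division instead of addition.

```python
import numpy as np

def cos01(u, v):
    # cosine shifted from [-1, 1] into [0, 1] so the terms can be multiplied safely
    return (np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)) + 1) / 2

def cosmul_score(candidate, a, b, a_star, eps=1e-3):
    # analogy a : b = a_star : ?   (e.g. London : England = Baghdad : ?)
    # reward closeness to b and a_star, penalize closeness to a
    return cos01(candidate, b) * cos01(candidate, a_star) / (cos01(candidate, a) + eps)

# with gensim, the equivalent built-in is:
# model.most_similar_cosmul(positive=["England", "Baghdad"], negative=["London"])
```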

Page 21:

Word2Vec

• Open source C implementation from Google

• Comes with pre-learned embeddings

• Gensim: fast Python implementation
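A minimal gensim sketch covering both routes; parameter names (e.g. vector_size vs the older size) differ across gensim versions, and the toy corpus and file path are placeholders:

```python
from gensim.models import Word2Vec, KeyedVectors

# train your own embeddings on a tokenized corpus (toy corpus for illustration)
sentences = [["the", "cheese", "was", "crumbled"], ["the", "paper", "was", "crumpled"]]
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)
print(model.wv.most_similar("cheese", topn=3))

# or load pre-learned embeddings, e.g. the GoogleNews vectors (path is a placeholder)
# kv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)
```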

Page 22:

Active field of research

• Bilingual embeddings

• Joint word and image embeddings

• Embeddings for sentiment

• Phrase and document embeddings

Page 23:

Bigger picture: how can we make NLP less fragile?

• ’90s: Linguistic engineering

• ’00s: Feature engineering

• ’10s: Unsupervised preprocessing

Page 25:

Thanks

@RaniNelken

We’re hiring for NLP positions