Vector space word representations
Rani Nelken, PhD, Director of Research, Outbrain
@RaniNelken


Page 1:

Vector space word representations

Rani Nelken, PhD Director of Research, Outbrain

@RaniNelken

Page 2:

https://www.flickr.com/photos/hyku/295930906/in/photolist-EbXgJ-ajDBs8-9hevWb-s9HX1-5hZqnb-a1Jk8H-a1Mcx7-7QiUWL-6AFs53-9TRtkz-bqt2GQ-F574u-F56EA-3imqK7/

Page 3:

Words = atoms?

Page 4:
Page 5:

That would be crazy for numbers

https://www.flickr.com/photos/proimos/4199675334/

Page 6:

The distributional hypothesis

What is a word?

Wittgenstein (1953): The meaning of a word is its use in the language

Firth (1957): You shall know a word by the company it keeps

Page 7:

From atomic symbols to vectors

• Map words to dense numerical vectors “representing” their contexts

• Map words with similar contexts to vectors with small angle
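A minimal sketch of that "small angle" idea, using numpy and made-up 3-dimensional vectors (real embeddings are typically 100–300 dimensional):

```python
import numpy as np

def cosine(u, v):
    # cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# toy vectors; in practice these come from a trained model
cat = np.array([0.8, 0.1, 0.3])
dog = np.array([0.7, 0.2, 0.3])
car = np.array([0.1, 0.9, 0.2])

print(cosine(cat, dog))  # high: similar contexts, small angle
print(cosine(cat, car))  # lower: different contexts, larger angle
```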

Page 8:

History

• Hard clustering: Brown clustering

• Soft clustering: LSA, Random projections, LDA

• Neural nets

Page 9:

Feedforward Neural Net Language Model

Page 10:

Training

• Inputs are one-hot vectors of the context words (0…0, 1, 0…0)

• We’re trying to learn a vector for each word (“projection”)

• Such that the output is close to the one-hot vector of w(t)
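A toy numpy sketch of the projection step described above; vocabulary size, dimensions, and the random matrix are purely illustrative:

```python
import numpy as np

V, d = 10000, 100            # vocabulary size, embedding dimension (illustrative)
P = np.random.randn(V, d)    # projection matrix: row i is the learned vector for word i

def one_hot(i, size=V):
    x = np.zeros(size)
    x[i] = 1.0
    return x

# multiplying a one-hot vector by P just selects one row:
# the dense vector for that word
w_context = one_hot(42)
projected = w_context @ P    # shape (d,)
assert np.allclose(projected, P[42])

# the context vectors are then combined (concatenated in the classic model),
# passed through a hidden layer, and a softmax over V predicts w(t);
# training nudges P so the softmax puts high probability on the true w(t)
```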

Page 11:

Simpler model: Word2Vec
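The slide's diagram isn't reproduced here, but roughly what makes Word2Vec simpler: the skip-gram variant with negative sampling drops the hidden layer, so each training step is just a few logistic-regression-style updates. A hedged numpy sketch, with all sizes, rates, and the sampled word ids invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, lr = 1000, 50, 0.025                  # vocabulary size, dimension, learning rate (illustrative)
W_in = rng.normal(scale=0.1, size=(V, d))   # "input" vectors: the embeddings we keep
W_out = np.zeros((V, d))                    # "output" (context) vectors

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgns_step(center, context, negatives):
    # one skip-gram / negative-sampling update: push the center word's vector
    # towards its observed context word and away from a few sampled "noise" words
    v = W_in[center].copy()
    grad_v = np.zeros_like(v)
    for word, label in [(context, 1.0)] + [(neg, 0.0) for neg in negatives]:
        u = W_out[word]
        g = sigmoid(np.dot(v, u)) - label   # gradient of the logistic loss w.r.t. the score
        grad_v += g * u
        W_out[word] -= lr * g * v
    W_in[center] -= lr * grad_v

# one illustrative update: center word 3 observed with context word 7
sgns_step(3, 7, negatives=rng.integers(0, V, size=5))
```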

Page 12:
Page 13:

What can we do with these representations?

• Plug them into your existing classifier

• Plug them into further neural nets – better!

• Improves accuracy on many NLP tasks:
– Named entity recognition
– POS tagging
– Sentiment analysis
– Semantic role labeling
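As a sketch of the first option above, one common recipe is to average the word vectors of a text and feed the result to an off-the-shelf classifier; the tiny vocabulary, random "embeddings", and labels below are fake, just enough to make the snippet run:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# `vectors` stands in for a word -> embedding lookup (e.g. pre-trained word2vec);
# here it is faked with random 50-d vectors so the snippet is self-contained
rng = np.random.default_rng(0)
vectors = {w: rng.standard_normal(50)
           for w in ["great", "movie", "terrible", "plot", "loved", "boring"]}

def doc_vector(tokens, dim=50):
    # average the vectors of the known words in a document
    vecs = [vectors[t] for t in tokens if t in vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

# toy labeled data: tokenized texts with sentiment labels
docs = [["loved", "great", "movie"], ["boring", "terrible", "plot"],
        ["great", "plot"], ["terrible", "movie"]]
labels = [1, 0, 1, 0]

X = np.vstack([doc_vector(d) for d in docs])
clf = LogisticRegression().fit(X, labels)
print(clf.predict([doc_vector(["loved", "plot"])]))
```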

Page 14:

Back to cheese…

• cos(crumbled, cheese) = 0.042

• cos(crumpled, cheese) = 0.203

Page 15:

http://en.wikipedia.org/wiki/Penn_%26_Teller#mediaviewer/File:Penn_and_Teller_(1988).jpg

And now for the magic

Page 16:

“Magical” property

• [Paris] - [France] + [Italy] ≈ [Rome]

• [king] - [man] + [woman] ≈ [queen]

• We can use it to solve word analogy problems: Boston : Red_Sox = New_York : ?

• Demo
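The demo can be reproduced with gensim's KeyedVectors (recent versions); the vectors file name below is a placeholder for whatever pre-trained embeddings you load:

```python
from gensim.models import KeyedVectors

# path is a placeholder, e.g. the pre-trained GoogleNews vectors distributed with word2vec
model = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

# [king] - [man] + [woman] ≈ ?
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=1))

# Boston : Red_Sox = New_York : ?
print(model.most_similar(positive=["Red_Sox", "New_York"], negative=["Boston"], topn=1))
```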

Page 17:
Page 18:

Why does it work?

[king] - [man] + [woman] ≈ [queen]

If all vectors are unit-normalized, cos(x, [king] – [man] + [woman]) is proportional to cos(x, [king]) – cos(x, [man]) + cos(x, [woman]), so we can rank candidate words x by the sum on the right

[queen] is a good candidate
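A quick numpy check of that decomposition, assuming all word vectors are unit-normalized so cosine reduces to a dot product; the vectors here are random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)
unit = lambda v: v / np.linalg.norm(v)

# stand-ins for the (unit-normalized) vectors of x, king, man, woman
x, king, man, woman = (unit(rng.standard_normal(100)) for _ in range(4))

combined = king - man + woman
lhs = np.dot(unit(combined), x)                                   # cos(x, king - man + woman)
rhs = np.dot(king, x) - np.dot(man, x) + np.dot(woman, x)         # additive combination of cosines

# lhs equals rhs divided by ||king - man + woman||, a constant that does not
# depend on x, so ranking candidates by either expression gives the same winner
print(lhs, rhs / np.linalg.norm(combined))
```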

Page 19:

It doesn’t always work

• London : England = Baghdad : ?

• We expect Iraq, but get Mosul

• We’re looking for a word that is close to Baghdad, and to England, but not to London

Page 20:

Why did it fail?

• London : England = Baghdad : ?

• cos(Mosul, Baghdad) >> cos(Iraq, London)

• Instead of adding the cosines, multiply them

• Improves accuracy
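A sketch of the multiplicative variant (3CosMul, from Levy and Goldberg 2014, presumably what the slide refers to): shift each cosine into [0, 1] and combine them by multiplication and division instead of addition.

```python
import numpy as np

def cos01(u, v):
    # cosine shifted from [-1, 1] into [0, 1] so the terms can be multiplied safely
    return (np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)) + 1) / 2

def cosmul_score(candidate, a, b, a_star, eps=1e-3):
    # analogy a : b = a_star : ?   (e.g. London : England = Baghdad : ?)
    # reward closeness to b and a_star, penalize closeness to a
    return cos01(candidate, b) * cos01(candidate, a_star) / (cos01(candidate, a) + eps)

# with gensim, the equivalent built-in is:
# model.most_similar_cosmul(positive=["England", "Baghdad"], negative=["London"])
```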

Page 21:

Word2Vec

• Open source C implementation from Google

• Comes with pre-learned embeddings

• Gensim: fast Python implementation
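A minimal gensim sketch covering both routes; parameter names (e.g. vector_size vs the older size) differ across gensim versions, and the toy corpus and file path are placeholders:

```python
from gensim.models import Word2Vec, KeyedVectors

# train your own embeddings on a tokenized corpus (toy corpus for illustration)
sentences = [["the", "cheese", "was", "crumbled"], ["the", "paper", "was", "crumpled"]]
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)
print(model.wv.most_similar("cheese", topn=3))

# or load pre-learned embeddings, e.g. the GoogleNews vectors (path is a placeholder)
# kv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)
```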

Page 22:

Active field of research

• Bilingual embeddings

• Joint word and image embeddings

• Embeddings for sentiment

• Phrase and document embeddings

Page 23:

Bigger picture: how can we make NLP less fragile?

• ’90s: Linguistic engineering

• ’00s: Feature engineering

• ’10s: Unsupervised preprocessing

Page 25:

Thanks

@RaniNelken

We’re hiring for NLP positions