lda2vec text by the bay 2016 with notes
TRANSCRIPT
![Page 1: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/1.jpg)
lda2vec (word2vec, and lda)
Christopher Moody @ Stitch Fix
Welcome — thanks for coming, thanks for having me, and thanks to the organizers.
NLP can be a messy affair because you have to teach a computer about the irregularities and ambiguities of the English language
and you have to teach it the hierarchical and sparse nature of English grammar and vocabulary
3rd trimester, pregnant; “wears scrubs” — medicine; taking a trip — a fix for vacation clothing
The power and promise of word vectors is that they sweep away a lot of these issues.
![Page 2: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/2.jpg)
About
@chrisemoody: Caltech Physics PhD in astro-statistics, supercomputing, sklearn t-SNE contributor, Data Labs at Stitch Fix. github.com/cemoody
Gaussian Processes t-SNE
chainer deep learning
Tensor Decomposition
![Page 3: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/3.jpg)
1. word2vec
2. lda
3. lda2vec
![Page 4: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/4.jpg)
1. king - man + woman = queen
2. Huge splash in the NLP world
3. Learns from raw text
4. Pretty simple algorithm
5. Comes pretrained
word2vec
1. Learns what words mean — can solve analogies cleanly. Words aren't treated as opaque blocks; instead we model the relationships between them.
2. Distributed representations form the basis of more complicated deep learning systems.
![Page 5: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/5.jpg)
1. Set up an objective function 2. Randomly initialize vectors 3. Do gradient descent
word2vec
![Page 6: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/6.jpg)
word2vec
word2vec: learn the word vector w from its surrounding context
w
1. No mention of neural networks.
2. Let's talk about training first.
3. n-gram transition probabilities vs tf-idf / LSI co-occurrence matrices.
4. Here we will learn the embedded representation directly, with no intermediates, updating it with every example.
![Page 7: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/7.jpg)
word2vec
“The fox jumped over the lazy dog”
Maximize the likelihood of seeing these words given the word over.
P(the|over) P(fox|over)
P(jumped|over) P(the|over) P(lazy|over) P(dog|over)
…instead of maximizing the likelihood of co-occurrence counts.
1. Context: the words surrounding the training word.
2. Naively assume a bag of words: not recurrent, no state.
3. Still a pretty simple assumption!
Conditioning on just *over*; no other secret parameters or anything.
![Page 8: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/8.jpg)
word2vec
P(fox|over)
What should this be?
![Page 9: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/9.jpg)
word2vec
P(v_fox | v_over)
Should depend on the word vectors.
P(fox|over)
Trying to learn the word vectors, so let's start with those (we'll randomly initialize them to begin with)
![Page 10: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/10.jpg)
word2vec
“The fox jumped over the lazy dog”
P(w|c)
Extract pairs from context window around every input word.
![Page 11: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/11.jpg)
IN = training word, c = context
![Page 12: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/12.jpg)
![Page 13: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/13.jpg)
![Page 14: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/14.jpg)
![Page 15: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/15.jpg)
![Page 16: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/16.jpg)
![Page 17: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/17.jpg)
Innermost for-loop: v_in was fixed over the loop; now increment v_in to point to the next word.
![Page 18: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/18.jpg)
![Page 19: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/19.jpg)
…So that, at a high level, is what we want word2vec to do.
![Page 20: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/20.jpg)
![Page 21: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/21.jpg)
![Page 22: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/22.jpg)
![Page 23: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/23.jpg)
These (word, context) pairs are called skip-grams.
Extracting them is just two for-loops: an outer loop over pivot words and an inner loop over the context window.
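The two for-loops can be sketched in a few lines of Python (the `skip_grams` helper name and the window size of 2 are my own illustration; word2vec's default window is larger):

```python
def skip_grams(tokens, window=2):
    """Yield (pivot, context) pairs from a moving window over the tokens."""
    pairs = []
    for i, pivot in enumerate(tokens):                      # outer loop: pivot word
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):                             # inner loop: the window
            if j != i:
                pairs.append((pivot, tokens[j]))
    return pairs

sentence = "The fox jumped over the lazy dog".split()
pairs = skip_grams(sentence, window=2)
# pairs with pivot 'over': ('over', 'fox'), ('over', 'jumped'), ('over', 'the'), ('over', 'lazy')
```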
![Page 24: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/24.jpg)
objective
Measure loss between w and c?
How should we define P(w|c)?
Now we've defined the high-level update path for the algorithm.
We need to define this probability exactly in order to define our updates.
It boils down to the difference between the word and context vectors: make them as similar as possible, and the probability goes up.
![Page 25: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/25.jpg)
objective
w . c
How should we define P(w|c)?
Measure loss between w and c?
Use cosine sim.
We could also imagine Euclidean distance or Mahalanobis distance.
![Page 26: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/26.jpg)
word2vec
objective
w · c ~ 1
v_canada · v_snow ~ 1
The dot product has these properties: similar vectors have similarity near 1 (if normed).
![Page 27: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/27.jpg)
word2vec
objective
w · c ~ 0
v_canada · v_desert ~ 0
Orthogonal vectors have similarity near 0.
![Page 28: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/28.jpg)
word2vec
objective
w · c ~ -1
Opposite vectors have similarity near -1.
![Page 29: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/29.jpg)
word2vec
objective
w · c ∈ [-1, 1]
But the inner product ranges from -1 to 1 (when normalized).
![Page 30: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/30.jpg)
word2vec
objective
w · c ∈ [-1, 1]
But we'd like to measure a probability.
![Page 31: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/31.jpg)
word2vec
objective
σ(c·w) ∈ [0, 1]
But we'd like to measure a probability, so transform again using the sigmoid.
![Page 32: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/32.jpg)
Similar pair: σ(c·w) near 1. Dissimilar pair: σ(c·w) near 0.
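As a quick sketch of the squashing step (the `sigmoid` and `dot` helper names are mine), using hand-picked unit vectors:

```python
import math

def sigmoid(x):
    # squashes the dot product from [-1, 1] (for unit vectors) into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

w = (1.0, 0.0)
print(sigmoid(dot(w, (1.0, 0.0))))   # similar:    ~0.73
print(sigmoid(dot(w, (0.0, 1.0))))   # orthogonal:  0.5
print(sigmoid(dot(w, (-1.0, 0.0))))  # opposite:   ~0.27
```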
![Page 33: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/33.jpg)
word2vec
objective
Loss function: L = σ(c·w)
A logistic (binary) choice: is this (context, word) combination from our dataset? Are these two similar?
But this is optimized by making all vectors identical… then everything looks exactly the same.
![Page 34: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/34.jpg)
word2vec
objective
The skip-gram negative-sampling model
L = σ(c·w)
The trivial solution is context = word for all vectors. No contrast! Add negative samples.
![Page 35: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/35.jpg)
word2vec
objective
The skip-gram negative-sampling model
L = σ(c·w) + σ(-c·w_neg)
Draw random words from the vocabulary as negatives.
Discriminate: this (w, c) pair is observed in the corpus; that one pairs w with a randomly drawn word.
Example: (fox, jumped) is a positive pair, but (fox, career) is a negative one.
No popularity guessing as in the softmax.
![Page 36: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/36.jpg)
word2vec
objective
The skip-gram negative-sampling model
L = σ(c·w) + σ(-c·w_neg) + … + σ(-c·w_neg)
Discriminate positive from multiple negative samples.
Positive: (fox, jumped). Not: (fox, career), (fox, android), (fox, sunlight).
Reminder: this L is being computed, and we nudge all the values of c and w via gradient descent to optimize L.
So that's SGNS / word2vec. That's it! It's a bit of a disservice to draw this as a giant network.
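A minimal sketch of the per-pair objective as written on the slides (the helper names are mine; real word2vec implementations maximize the log of each σ term, summed over many pairs, but the idea is the same):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sgns_objective(w, c, negatives):
    """L = sigma(c.w) + sum_k sigma(-c.w_neg); higher is better.

    w: pivot word vector, c: context vector, negatives: randomly drawn word vectors.
    """
    score = sigmoid(dot(c, w))              # reward: observed (context, word) pair
    for w_neg in negatives:
        score += sigmoid(-dot(c, w_neg))    # reward: context far from random words
    return score
```

Gradient descent nudges c and w (and the negative vectors) to push this score up for observed pairs.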
![Page 37: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/37.jpg)
word2vec
The SGNS model and PMI (Levy & Goldberg 2014)
L = σ(c·w) + σ(-c·w_neg)
c_i · w_j = PMI(M_ij) - log k
…is extremely similar to matrix factorization!
Explain what the matrix M is: rows are words, columns are contexts, entries are co-occurrence statistics.
e.g. you can solve word2vec deterministically via SVD if you want.
One of the most cited NLP papers of 2014.
![Page 38: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/38.jpg)
‘traditional’ NLP
Indices dropped for readability.
Most cited because of the connection to PMI, an information-theoretic measure of the association of w and c.
![Page 39: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/39.jpg)
word2vec
The SGNS model and PMI (Levy & Goldberg 2014)
L = σ(c·w) + Σ σ(-c·w_neg)
c_i · w_j = log [ (#(c_i, w_j)/n) / (k · (#(w_j)/n) · (#(c_i)/n)) ]
‘traditional’ NLP
Instead of looping over all words, you can count the pairs and roll them up this way.
The probabilities are just counts divided by the number of observations.
word2vec adds this k term.
Cool that all the for-looping we did before reduces to this form.
![Page 40: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/40.jpg)
word2vec
The SGNS model and PMI (Levy & Goldberg 2014)
L = σ(c·w) + Σ σ(-c·w_neg)
c_i · w_j = log [ popularity of (c, w) / (k · popularity of c · popularity of w) ]
‘traditional’ NLP
The probabilities are just counts divided by the number of observations; word2vec adds the k term, which downweights rare terms.
More frequent words are weighted more than infrequent words; rare words are downweighted. That makes sense given that word frequencies follow Zipf's law.
This weighting is what makes SGNS so powerful. The theory is nice, but the practical payoff is even better.
![Page 41: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/41.jpg)
word2vec: PMI
99% of word2vec is counting. And you can count words in SQL.
![Page 42: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/42.jpg)
word2vec: PMI
Count how many times you saw (c, w).
Count how many times you saw c.
Count how many times you saw w.
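Those three counts are everything PMI needs. A minimal sketch in plain Python (the `pmi_table` helper name is mine; the `- log k` shift follows the Levy & Goldberg formulation above):

```python
import math
from collections import Counter

def pmi_table(pairs, k=1):
    """PMI(c, w) - log k, computed from raw (context, word) pair counts."""
    pair_n = Counter(pairs)                # how many times you saw (c, w)
    c_n = Counter(c for c, _ in pairs)     # how many times you saw c
    w_n = Counter(w for _, w in pairs)     # how many times you saw w
    n = len(pairs)
    return {
        (c, w): math.log((cnt / n) / ((c_n[c] / n) * (w_n[w] / n))) - math.log(k)
        for (c, w), cnt in pair_n.items()
    }

# independent (c, w) pairs give PMI = 0; associated pairs give PMI > 0
table = pmi_table([("over", "fox"), ("over", "jumped"), ("lazy", "dog")])
```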
![Page 43: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/43.jpg)
word2vec: PMI
…and this takes ~5 minutes to compute on a single core. The SVD is a completely standard math-library routine.
![Page 44: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/44.jpg)
word2vec
explain table
![Page 45: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/45.jpg)
intuition about word vectors
![Page 46: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/46.jpg)
Showing just 2 of the ~500 dimensions (effectively we've PCA'd it), and only 4 of 100k words.
![Page 47: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/47.jpg)
![Page 48: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/48.jpg)
![Page 49: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/49.jpg)
![Page 50: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/50.jpg)
![Page 51: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/51.jpg)
![Page 52: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/52.jpg)
![Page 53: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/53.jpg)
![Page 54: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/54.jpg)
![Page 55: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/55.jpg)
![Page 56: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/56.jpg)
![Page 57: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/57.jpg)
If we only had locality and not regularity, this wouldn’t necessarily be true
![Page 58: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/58.jpg)
![Page 59: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/59.jpg)
![Page 60: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/60.jpg)
So we live in a vector space where operations like addition and subtraction are semantically meaningful.
So here’s a few examples of this working.
Really get the idea of these vectors as being ‘mixes’ of other ideas & vectors
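A toy sketch of that vector arithmetic (the 2D vectors and helper names here are invented purely for illustration; real learned vectors are high-dimensional):

```python
import math

# Toy 2D vectors, invented for illustration only.
vec = {
    "king":  (0.9, 0.9),
    "queen": (0.1, 0.9),
    "man":   (0.9, 0.1),
    "woman": (0.1, 0.1),
    "apple": (0.5, 0.0),
}

def cosine(a, b):
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))

def analogy(a, b, c):
    """Return the word nearest to vec[a] - vec[b] + vec[c]."""
    target = tuple(x - y + z for x, y, z in zip(vec[a], vec[b], vec[c]))
    candidates = [w for w in vec if w not in (a, b, c)]  # exclude the inputs
    return max(candidates, key=lambda w: cosine(vec[w], target))

print(analogy("king", "man", "woman"))  # -> queen
```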
![Page 61: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/61.jpg)
ITEM_3469 + ‘Pregnant’
SF is a personal service
Box
![Page 62: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/62.jpg)
+ ‘Pregnant’
I love the stripes and the cut around my neckline was amazing
someone else might write ‘grey and black’
subtlety and nuance in that language
For some items, we have several times the collected works of Shakespeare.
![Page 63: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/63.jpg)
= ITEM_701333 = ITEM_901004 = ITEM_800456
![Page 64: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/64.jpg)
Stripes are safe for maternity. And similar tones and flowy styles are still great for expecting mothers.
![Page 65: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/65.jpg)
what about LDA?
![Page 66: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/66.jpg)
LDA on Client Item Descriptions
This shows the incredible amount of structure
![Page 67: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/67.jpg)
LDA on Item
Descriptions (with Jay)
Clunky jewelry here; dangling, delicate jewelry elsewhere.
![Page 68: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/68.jpg)
LDA on Item
Descriptions (with Jay)
topics on patterns, styles — this cluster is similarly described as high contrast tops with popping colors
![Page 69: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/69.jpg)
LDA on Item
Descriptions (with Jay)
bright dresses for a warm summer
LDA helps us model topics over documents in an interpretable way
![Page 70: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/70.jpg)
lda vs word2vec
![Page 71: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/71.jpg)
Bayesian graphical model vs. ML neural model
![Page 72: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/72.jpg)
word2vec is local: one word predicts a nearby word
“I love finding new designer brands for jeans”
as if the world were one very long text string: no end of documents, no end of sentences, etc.
and a window across words
![Page 73: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/73.jpg)
“I love finding new designer brands for jeans”
But text is usually organized.
![Page 74: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/74.jpg)
![Page 75: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/75.jpg)
“I love finding new designer brands for jeans”
In LDA, documents globally predict words.
doc 7681
These are client comments, which are short: only dozens of words to predict. But they could be legal documents or medical documents, 10k words long.
![Page 76: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/76.jpg)
typical LDA document vector
[ 0%, 9%, 78%, 11% ] (all sum to 100%)
typical word2vec vector
[ -0.75, -1.25, -0.55, -0.12, +2.2 ] (all real values)
![Page 77: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/77.jpg)
5D LDA document vector
[ 0%, 9%, 78%, 11% ] (sparse; all sum to 100%; dimensions are absolute)
5D word2vec vector
[ -0.75, -1.25, -0.55, -0.12, +2.2 ] (dense; all real values; dimensions are relative)
It's much easier to tell another human “78% of some ingredient” than “+2.2 of something and -1.25 of something else”.
w2v is like a street address (200 Main St.): you figure out what it means from the neighbors.
LDA is a *mixture* model: 78% of some ingredient, where ingredient = topic. w2v isn't “-1.25 of some ingredient”.
![Page 78: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/78.jpg)
100D LDA document vector
[ 0% 0% 0% 0% 0% … 0%, 9%, 78%, 11% ] (sparse; all sum to 100%; dimensions are absolute)
100D word2vec vector
[ -0.75, -1.25, -0.55, -0.27, -0.94, 0.44, 0.05, 0.31 … -0.12, +2.2 ] (dense; all real values; dimensions are relative)
LDA is sparse: ~95 of the 100 dims are close to zero.
![Page 79: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/79.jpg)
100D LDA document vector
[ 0% 0% 0% 0% 0% … 0%, 9%, 78%, 11% ] (similar in fewer ways: more interpretable)
100D word2vec vector
[ -0.75, -1.25, -0.55, -0.27, -0.94, 0.44, 0.05, 0.31 … -0.12, +2.2 ] (similar in 100D ways: very flexible)
Goal: +mixture, +sparse
![Page 80: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/80.jpg)
can we do both? lda2vec
A series of experiments; take with a grain of salt.
![Page 81: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/81.jpg)
Diagram: skip-grams from sentences feed a word vector (#hidden units) into a negative-sampling loss.
“Lufthansa is a German airline and when …”; pivot word: German; word vector: [-1.9, 0.85, -0.6, -0.3, -0.5].
word2vec predicts locally: one word predicts a nearby word.
We extract pairs of pivot and target words that occur in a moving window that scans across the corpus. For every pair, the pivot word is used to predict the nearby target word.
![Page 82: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/82.jpg)
Diagram (lda2vec): skip-grams from sentences, as before, but the pivot's word vector is combined with a document vector.
“Lufthansa is a German airline and when …”; pivot word: German.
Word vector (#hidden units): [-1.9, 0.85, -0.6, -0.3, -0.5]
Document weight (#topics): [0.34, -0.1, 0.17]
Document proportion (#topics): [41%, 26%, 34%]
Topic matrix (#topics × #hidden units)
Document vector (#hidden units) = document proportion × topic matrix: [-0.7, -0.4, -0.7, -0.3, -0.3]
Context vector = word vector + document vector: [-2.6, 0.45, -1.3, -0.6, -0.8]
The context vector feeds the negative-sampling loss.
The document vector predicts a word from a global context.
We know we can add vectors. ‘German’ alone is close to French or Spanish; add ‘airline’ (a document vector similar to the word vector for airline) and German + airline is close to Lufthansa, Condor Flugdienst, and Aero Lloyd.
A latent vector is randomly initialized for every document in the corpus. Very similar to doc2vec and paragraph vectors.
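The × and + steps in the diagram can be sketched directly (the `context_vector` helper name is mine):

```python
def context_vector(word_vec, doc_proportion, topic_matrix):
    """context = word vector + (document proportions x topic matrix).

    topic_matrix is a list of topic vectors, each with #hidden-units entries.
    """
    n_units = len(word_vec)
    # document vector: a mixture of topic vectors, weighted by the proportions
    doc_vec = [
        sum(p * topic[j] for p, topic in zip(doc_proportion, topic_matrix))
        for j in range(n_units)
    ]
    return [w + d for w, d in zip(word_vec, doc_vec)]
```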
![Page 83: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/83.jpg)
We’re missing mixtures & sparsity!
Pure document vectors are good for training sentiment models (great scores), but we want interpretability.
![Page 84: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/84.jpg)
Too many documents: a raw document vector is about as interpretable as a hash.
![Page 85: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/85.jpg)
Now it’s a mixture.
Document X is +0.34 in topic 0, -0.1 in topic 1, and +0.17 in topic 2: a model with 3 topics.
Before, the document vector had 500 degrees of freedom; now it has just a few. You had better choose really good topics, because only a few are available to summarize the entire document.
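The document-weight row [0.34, -0.1, 0.17] becomes the proportion row [41%, 26%, 34%] (up to rounding) via a softmax; a minimal sketch (lda2vec additionally encourages sparsity in these proportions, which this omits):

```python
import math

def softmax(weights):
    # dense, real-valued document weights -> proportions that sum to 100%
    exps = [math.exp(w) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]

doc_weight = [0.34, -0.1, 0.17]        # free parameters, learned by gradient descent
doc_proportion = softmax(doc_weight)   # ~[0.40, 0.26, 0.34]
```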
![Page 86: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/86.jpg)
Trinitarian, baptismal, Pentecostals, Bede, schismatics, excommunication
Each topic has a distributed representation that lives in the same space as the word vectors. While each topic is not literally a token present in the corpus, it is similar to other tokens.
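Because topic vectors live in the word-vector space, you can read off a topic's meaning from its nearest word vectors. A sketch (the `nearest_words` helper and the toy 2D vectors are invented for illustration):

```python
import math

def nearest_words(topic_vec, word_vecs, top=3):
    """Interpret a topic by its nearest word vectors (cosine similarity)."""
    def cosine(a, b):
        norm = lambda v: math.sqrt(sum(x * x for x in v))
        return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))
    return sorted(word_vecs, key=lambda w: cosine(word_vecs[w], topic_vec),
                  reverse=True)[:top]

# toy 2D word vectors, invented for illustration
word_vecs = {"baptismal": (0.9, 0.1), "Pentecostals": (0.8, 0.2),
             "Milosevic": (0.1, 0.9), "absentee": (0.2, 0.8)}
print(nearest_words((1.0, 0.0), word_vecs, top=2))  # ['baptismal', 'Pentecostals']
```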
![Page 87: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/87.jpg)
topic 1 = “religion”: Trinitarian, baptismal, Pentecostals, Bede, schismatics, excommunication
![Page 88: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/88.jpg)
Milosevic, absentee, Indonesia, Lebanese, Israelis, Karadzic
notice one column over in the topic matrix
![Page 89: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/89.jpg)
topic 2 = “politics”: Milosevic, absentee, Indonesia, Lebanese, Israelis, Karadzic
topic vectors, document vectors, and word vectors all live in the same space
![Page 90: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/90.jpg)
![Page 91: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/91.jpg)
The document weights are softmax-transformed to yield the document proportions.
similar to a logistic, but instead of a single 0-1 value we now get a vector of percentages
the entries sum to 100% and indicate the topic proportions of a single document
one document might be 41% topic 0, 26% topic 1, and 34% topic 2
very close to LDA-like representations
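The transform above is just a softmax over the document weights; a minimal sketch, using the slide's example weights [0.34, -0.1, 0.17]:

```python
import numpy as np

def softmax(x):
    """Exponentiate and normalize so the entries sum to 1."""
    e = np.exp(x - np.max(x))  # subtract max for numerical stability
    return e / e.sum()

# document weights (unconstrained reals) -> document proportions (a % mix)
doc_weights = np.array([0.34, -0.10, 0.17])
proportions = softmax(doc_weights)
print(proportions.round(2))  # ~[0.40, 0.26, 0.34], the slide's 41% / 26% / 34% up to rounding
```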
![Page 92: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/92.jpg)
the 1st time i did this, the proportions were still very dense: with 100 topics, each document had about 1% in every topic
it might work, but it's still dense!
a zillion categories: 1% of this, 1% of that, 1% of this…
mathematically it works, as the addition of lots of little bits (a distributed representation), but it isn't sparse
![Page 93: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/93.jpg)
Sparsity!
t=0: 34% 32% 34%
t=10: 41% 26% 34%
t=∞: 99% 1% 0%
init balanced, but dense
Dirichlet likelihood loss encourages proportion vectors to become sparser over time.
relatively simple
start pink, go to white and sparse
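The Dirichlet likelihood term is, up to constants, (α − 1) Σ_k log p_k; with concentration α < 1, minimizing its negative rewards proportion vectors that concentrate on a few topics. A minimal sketch (the α value and the λ weight are hypothetical knobs, and the 0% entry is nudged away from zero since log 0 is undefined):

```python
import numpy as np

def dirichlet_loss(proportions, alpha=0.7, lam=1.0):
    """Negative symmetric-Dirichlet log-likelihood (up to a constant).

    For alpha < 1 this is lower for sparse proportion vectors,
    so gradient descent pushes documents toward few topics."""
    return -lam * (alpha - 1.0) * np.sum(np.log(proportions))

dense  = np.array([0.34, 0.32, 0.34])    # t=0: balanced but dense
sparse = np.array([0.99, 0.005, 0.005])  # t=inf: concentrated on one topic
print(dirichlet_loss(dense), dirichlet_loss(sparse))
# the sparse vector gets the lower loss
```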
![Page 94: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/94.jpg)
German
we end up with something that's quite a bit more complicated
but it achieves our goals: it mixes word vectors with sparse, interpretable document representations
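Putting the pieces together, a sketch of how the context vector is assembled (the dimensions and random vectors are stand-ins, not the real trained parameters):

```python
import numpy as np

n_topics, dim = 3, 5
rng = np.random.default_rng(0)

topic_matrix = rng.normal(size=(n_topics, dim))  # one row per topic
doc_weights  = np.array([0.34, -0.10, 0.17])     # free parameters, one set per document

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

doc_proportions = softmax(doc_weights)           # sparse-ish topic mixture
doc_vector = doc_proportions @ topic_matrix      # lives in word-vector space
word_vector = rng.normal(size=dim)               # pivot word vector
context_vector = word_vector + doc_vector        # fed to the skip-gram loss
print(context_vector.shape)  # (5,)
```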
![Page 95: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/95.jpg)
@chrisemoody Example Hacker News comments
Word vectors: https://github.com/cemoody/
lda2vec/blob/master/examples/hacker_news/lda2vec/
word_vectors.ipynb
read examples
play around at home — on
![Page 96: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/96.jpg)
@chrisemoody Example Hacker News comments
Topics: http://nbviewer.jupyter.org/github/cemoody/lda2vec/blob/master/examples/
hacker_news/lda2vec/lda2vec.ipynb
http://nbviewer.jupyter.org/github/cemoody/lda2vec/blob/master/examples/hacker_news/lda2vec/lda2vec.ipynb
topic 16 — sci, phys
topic 1 — housing
topic 8 — finance, bitcoin
topic 23 — programming languages
topic 6 — transportation
topic 3 — education
![Page 97: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/97.jpg)
@chrisemoody
lda2vec.com
http://nbviewer.jupyter.org/github/cemoody/lda2vec/blob/master/examples/twenty_newsgroups/lda.ipynb
![Page 98: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/98.jpg)
+ API docs + Examples + GPU + Tests
@chrisemoody
lda2vec.com
http://nbviewer.jupyter.org/github/cemoody/lda2vec/blob/master/examples/twenty_newsgroups/lda.ipynb
![Page 99: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/99.jpg)
@chrisemoody
lda2vec.com
If you want…
human-interpretable doc topics, use LDA.
machine-usable word-level features, use word2vec.
if you like to experiment a lot, and have topics over user / doc / region / etc. features, use lda2vec. (and you have a GPU)
http://nbviewer.jupyter.org/github/cemoody/lda2vec/blob/master/examples/twenty_newsgroups/lda.ipynb
![Page 101: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/101.jpg)
@chrisemoody
lda2vec.com
http://nbviewer.jupyter.org/github/cemoody/lda2vec/blob/master/examples/twenty_newsgroups/lda.ipynb
![Page 102: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/102.jpg)
Credit
Large swathes of this talk are from previous presentations by:
• Tomas Mikolov • David Blei • Christopher Olah • Radim Rehurek • Omer Levy & Yoav Goldberg • Richard Socher • Xin Rong • Tim Hopper
Richard & Xin Rong both have lucid explanations of the word2vec gradient
![Page 103: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/103.jpg)
“PS! Thank you for such an awesome idea”
@chrisemoody
doc_id=1846
Can we model topics to sentences? lda2lstm
Data Labs @ SF is all about mixing cutting edge algorithms but we absolutely need interpretability.
initial vector is a dirichlet mixture — moves us from bag of words to sentence-level LDA
give a sentence that's 80% religion, 10% politics
word2vec on word level, LSTM on the sentence level, LDA on document level
Dirichlet-squeeze internal states and manipulations, that’ll help us understand the science of LSTM dynamics
![Page 104: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/104.jpg)
Can we model topics to sentences? lda2lstm
“PS! Thank you for such an awesome idea”doc_id=1846
@chrisemoody
Can we model topics to images? lda2ae
TJ Torres
![Page 105: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/105.jpg)
4 Fun Stuff
and now for something completely crazy
![Page 106: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/106.jpg)
translation
(using just a rotation matrix)
Mikolov 2013
English
Spanish
Matrix Rotation
Blow mind
Explain plot
Not a complicated NN here
Still have to learn the rotation matrix — but it generalizes very nicely.
Have analogies for every linalg op as a linguistic operator
Robust framework and tools to do science on words
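A sketch of the Mikolov-style translation setup: learn a linear map from English to Spanish vectors by least squares over dictionary pairs, then translate a new word by mapping its vector across. The vectors here are random stand-ins; a plain least-squares fit is the simplest version of the constrained rotation the slide mentions:

```python
import numpy as np

rng = np.random.default_rng(1)
dim, n_pairs = 4, 50

# stand-in embeddings for dictionary pairs (english[i] translates to spanish[i])
english = rng.normal(size=(n_pairs, dim))
true_W = rng.normal(size=(dim, dim))
spanish = english @ true_W.T + 0.01 * rng.normal(size=(n_pairs, dim))

# least-squares fit of the translation matrix: minimize ||english @ W - spanish||
W, *_ = np.linalg.lstsq(english, spanish, rcond=None)

# translate a held-out English vector; it lands near its Spanish counterpart
x = rng.normal(size=dim)
pred = x @ W
target = x @ true_W.T
print(np.abs(pred - target).max())  # small residual
```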
![Page 107: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/107.jpg)
deepwalk
Perozzi et al 2014
learn word vectors from sentences
“The fox jumped over the lazy dog”
vOUT vOUT vOUT vOUT vOUTvOUT
‘words’ are graph vertices ‘sentences’ are random walks on the graph
word2vec
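The deepwalk recipe can be sketched in a few lines: generate random walks over the graph and treat each walk as a sentence for word2vec (the toy graph and walk parameters are made up):

```python
import random

graph = {            # toy undirected graph as an adjacency list
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

def random_walk(graph, start, length, rng=random):
    """Generate one 'sentence': a random walk of the given length."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(graph[walk[-1]]))  # hop to a random neighbor
    return walk

# a corpus of walks, ready to hand to any word2vec implementation
corpus = [random_walk(graph, v, length=5) for v in graph for _ in range(10)]
print(len(corpus))  # 40
```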
![Page 108: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/108.jpg)
Playlists at Spotify
context
sequence learning
‘words’ are song indices ‘sentences’ are playlists
![Page 109: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/109.jpg)
Playlists at Spotify
context
Erik Bernhardsson
Great performance on ‘related artists’
![Page 110: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/110.jpg)
Fixes at Stitch Fix
sequence learning
Let’s try: ‘words’ are items ‘sentences’ are fixes
![Page 111: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/111.jpg)
Fixes at Stitch Fix
context
Learn similarity between styles because they co-occur
Learn ‘coherent’ styles
sequence learning
![Page 112: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/112.jpg)
Fixes at Stitch Fix?
context
sequence learning
Got lots of structure!
![Page 113: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/113.jpg)
Fixes at Stitch Fix?
context
sequence learning
![Page 114: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/114.jpg)
Fixes at Stitch Fix?
context
sequence learning
Nearby regions are consistent ‘closets’
What sorts of sequences do you have at Quora? What kinds of things can you learn from context?
![Page 116: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/116.jpg)
context dependent
Levy & Goldberg 2014
Australian scientist discovers star with telescope
context +/- 2 words
![Page 117: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/117.jpg)
context dependent
context
Australian scientist discovers star with telescope
Levy & Goldberg 2014
What if we
![Page 118: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/118.jpg)
context dependent
context
Australian scientist discovers star with telescope
context
Levy & Goldberg 2014
![Page 119: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/119.jpg)
context dependent
context
BoW DEPS
topically-similar vs ‘functionally’ similar
Levy & Goldberg 2014
![Page 121: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/121.jpg)
![Page 122: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/122.jpg)
Crazy Approaches
Paragraph Vectors (Just extend the context window)
Context dependency (Change the window grammatically)
Social word2vec (deepwalk) (Sentence is a walk on the graph)
Spotify (Sentence is a playlist of song_ids)
Stitch Fix (Sentence is a shipment of five items)
![Page 123: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/123.jpg)
See previous
![Page 124: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/124.jpg)
CBOW
“The fox jumped over the lazy dog”
Guess the word given the context
~20x faster. (this is the alternative.)
vOUT
vIN vIN vIN vIN vIN vIN
SkipGram
“The fox jumped over the lazy dog”
vOUT vOUT
vIN
vOUT vOUT vOUTvOUT
Guess the context given the word
Better at syntax. (this is the one we went over)
CBOW sums word vectors and loses the order in the sentence.
Both are good at semantic relationships: child and kid are nearby, and so is the gender relation in man and woman.
If you blur words over the scale of the context (5-ish words), you lose a lot of grammatical nuance.
But skipgram preserves order; it preserves the relationship in pluralizing, for example.
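The two objectives differ mainly in how training examples are cut from the window; a minimal sketch of both pairings (the window size here is arbitrary):

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: one (center word -> context word) pair per neighbor."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """CBOW: one (context words -> center word) example per position;
    the context vectors get summed, so their order is lost."""
    examples = []
    for i, center in enumerate(tokens):
        ctx = [tokens[j] for j in range(max(0, i - window),
                                        min(len(tokens), i + window + 1)) if j != i]
        examples.append((ctx, center))
    return examples

sent = "the fox jumped over the lazy dog".split()
print(skipgram_pairs(sent)[:3])
# [('the', 'fox'), ('the', 'jumped'), ('fox', 'the')]
```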
![Page 125: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/125.jpg)
lda2vec
vDOC = a vtopic1 + b vtopic2 +…
Let’s make vDOC sparse
Too many documents. I really like that document X is 70% in topic 0, 30% in topic1, …
![Page 126: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/126.jpg)
lda2vec
This works! 😀 But vDOC isn't as interpretable as the topic vectors. 😔
vDOC = topic0 + topic1
Let's say that vDOC adds
Too many documents. I really like that document X is 70% in topic 0, 30% in topic1, …
![Page 127: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/127.jpg)
lda2vec
softmax(vOUT * (vIN + vDOC))
we want k *sparse* topics
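That objective can be sketched in its negative-sampling form: pull the true output word toward the combined context vIN + vDOC, push sampled noise words away. The vectors below are random stand-ins for the trained parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(v_out, v_in, v_doc, negatives):
    """SGNS-style loss on the lda2vec context (v_in + v_doc)."""
    ctx = v_in + v_doc
    loss = -np.log(sigmoid(v_out @ ctx))          # attract the true word
    for v_neg in negatives:
        loss -= np.log(sigmoid(-(v_neg @ ctx)))   # repel each noise word
    return loss

rng = np.random.default_rng(2)
dim = 5
v_in, v_doc, v_out = (rng.normal(size=dim) for _ in range(3))
negatives = rng.normal(size=(3, dim))             # 3 sampled noise words
print(neg_sampling_loss(v_out, v_in, v_doc, negatives))  # always positive
```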
![Page 128: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/128.jpg)
Shows that the many words similar to vacation actually come in lots of flavors:
— wedding words (bachelorette, rehearsals)
— holiday/event words (birthdays, brunch, christmas, thanksgiving)
— seasonal words (spring, summer)
— trip words (getaway)
— destinations
![Page 129: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/129.jpg)
theory of lda2vec
lda2vec
![Page 130: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/130.jpg)
pyLDAvis of lda2vec
lda2vec
![Page 131: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/131.jpg)
LDA Results
context
History
I loved every choice in this fix!! Great job!
Great Stylist Perfect
There are k tags
"Issues", "Cancel Disappointed", "Delivery", "Profile, Pinterest", "Weather Vacation", "Corrections for Next", "Wardrobe Mix", "Requesting Specific", "Requesting Department", "Requesting Style", "Style, Positive", "Style, Neutral"
![Page 132: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/132.jpg)
LDA Results
context
History
Body Fit
My measurements are 36-28-32. If that helps. I like wearing some clothing that is fitted.
Very hard for me to find pants that fit right.
There are k tags
![Page 133: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/133.jpg)
LDA Results
context
History
Sizing
Really enjoyed the experience and the pieces, sizing for tops was too big.
Looking forward to my next box!
Excited for next
There are k tags
![Page 134: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/134.jpg)
LDA Results
context
History
Almost Bought
It was a great fix. Loved the two items I kept and the three I sent back were close!
Perfect
There are k tags
![Page 135: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/135.jpg)
All of the following ideas will change what ‘words’ and ‘context’ represent.
But we’ll still use the same w2v algo
![Page 136: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/136.jpg)
paragraph vector
What about summarizing documents?
On the day he took office, President Obama reached out to America’s enemies, offering in his first inaugural address to extend a hand if you are willing to unclench your fist. More than six years later, he has arrived at a moment of truth in testing that
![Page 137: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/137.jpg)
On the day he took office, President Obama reached out to America’s enemies, offering in his first inaugural address to extend a hand if you are willing to unclench your fist. More than six years later, he has arrived at a moment of truth in testing that
The framework nuclear agreement he reached with Iran on Thursday did not provide the definitive answer to whether Mr. Obama’s audacious gamble will pay off. The fist Iran has shaken at the so-called Great Satan since 1979 has not completely relaxed.
paragraph vector
Normal skipgram extends C words before, and C words after.
IN
OUT OUT
Except we stay inside a sentence
![Page 138: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/138.jpg)
On the day he took office, President Obama reached out to America’s enemies, offering in his first inaugural address to extend a hand if you are willing to unclench your fist. More than six years later, he has arrived at a moment of truth in testing that
The framework nuclear agreement he reached with Iran on Thursday did not provide the definitive answer to whether Mr. Obama’s audacious gamble will pay off. The fist Iran has shaken at the so-called Great Satan since 1979 has not completely relaxed.
paragraph vector
A document vector simply extends the context to the whole document.
IN
OUT OUT
OUT OUTdoc_1347
![Page 139: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/139.jpg)
from gensim.models import Doc2Vec
fn = "item_document_vectors"
model = Doc2Vec.load(fn)
matches = model.most_similar('pregnant')
matches = list(filter(lambda x: 'SENT_' in x[0], matches))

# ['...I am currently 23 weeks pregnant...',
#  "...I'm now 10 weeks pregnant...",
#  '...not showing too much yet...',
#  '...15 weeks now. Baby bump...',
#  '...6 weeks postpartum!...',
#  '...12 weeks postpartum and am nursing...',
#  '...I have my baby shower that...',
#  '...am still breastfeeding...',
#  '...I would love an outfit for a baby shower...']
sentence search
![Page 140: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/140.jpg)
![Page 141: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/141.jpg)
![Page 142: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/142.jpg)
![Page 143: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/143.jpg)
![Page 144: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/144.jpg)
![Page 145: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/145.jpg)
![Page 146: Lda2vec text by the bay 2016 with notes](https://reader034.vdocument.in/reader034/viewer/2022051521/587a52a51a28ab520b8b4aef/html5/thumbnails/146.jpg)