Presented by Eleni Triantafillou, March 1, 2016 (CSC2523). Based on: Mikolov, Tomas, et al., "Efficient estimation of word representations in vector space."
Word2vec and beyond
presented by Eleni Triantafillou
March 1, 2016
The Big Picture
There is a long history of word representations
- Techniques from information retrieval: Latent Semantic Analysis (LSA)
- Self-Organizing Maps (SOM)
- Distributional count-based methods
- Neural Language Models
Important take-aways:
1. Don’t need deep models to get good embeddings
2. Count-based models and neural net predictive models are not qualitatively different
source: http://gavagai.se/blog/2015/09/30/a-brief-history-of-word-embeddings/
Continuous Word Representations
- Contrast with simple n-gram models (words as atomic units)
- Simple models have the potential to perform very well...
- ... if we had enough data
- Need more complicated models
- Continuous representations take better advantage of data by modelling the similarity between words
Continuous Representations
source: http://www.codeproject.com/Tips/788739/Visualization-of-High-Dimensional-Data-using-t-SNE
Skip Gram
- Learn to predict surrounding words
- Use a large training corpus to maximize:

  $$\frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\, j \ne 0} \log p(w_{t+j} \mid w_t)$$

  where:
  - T: training set size
  - c: context size
  - w_j: vector representation of the j-th word
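The double sum enumerates, for each position t in the corpus, the context words within distance c of the center word. A stdlib-only sketch (the helper name is an assumption, not from the paper) listing exactly the (center, context) pairs the objective sums log p(context | center) over:

```python
# Hypothetical helper (not from the paper): enumerate the (center, context)
# pairs that the skip-gram objective sums log p(context | center) over.
def skipgram_pairs(tokens, c):
    pairs = []
    for t, center in enumerate(tokens):
        for j in range(-c, c + 1):
            # Skip j = 0 (the center itself) and out-of-corpus positions.
            if j != 0 and 0 <= t + j < len(tokens):
                pairs.append((center, tokens[t + j]))
    return pairs

pairs = skipgram_pairs(["the", "quick", "brown", "fox"], c=1)
```

With c = 1 each interior word contributes two pairs and each boundary word one, which is why larger context sizes multiply the number of prediction terms per corpus position.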
Skip Gram: Think of it as a Neural Network
Learn W and W' in order to maximize the previous objective
[Figure: skip-gram as a neural network. A V-dim one-hot input x_k is mapped by W (V×N) to the N-dim hidden layer h_i; W' (N×V) then produces C V-dim output distributions y_{1,j}, y_{2,j}, ..., y_{C,j}, one per context position.]
source: "word2vec parameter learning explained" [4]
CBOW
[Figure: CBOW as a neural network. C V-dim one-hot context inputs x_{1k}, x_{2k}, ..., x_{Ck} share the matrix W (V×N) and are averaged into the N-dim hidden layer h_i; W' (N×V) then produces a single V-dim output y_j.]
source: "word2vec parameter learning explained" [4]
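The CBOW forward pass can be made concrete in a few lines. This is a stdlib-only sketch with made-up random weights (not the paper's implementation): the context words' rows of W are averaged into the hidden layer, then W' and a softmax give a distribution over the V candidate center words.

```python
import math
import random

# Toy CBOW forward pass (sketch with made-up random weights; V = vocab
# size, N = hidden size — tiny values chosen only for illustration).
random.seed(0)
V, N = 5, 3
W = [[random.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(V)]   # input embeddings, V x N
Wp = [[random.uniform(-0.5, 0.5) for _ in range(V)] for _ in range(N)]  # output weights, N x V

def cbow_probs(context_ids):
    # Hidden layer: average the context words' rows of W.
    h = [sum(W[i][k] for i in context_ids) / len(context_ids) for k in range(N)]
    # Output scores: h @ W', then a (numerically stabilized) softmax.
    scores = [sum(h[k] * Wp[k][j] for k in range(N)) for j in range(V)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

probs = cbow_probs([0, 2, 3])  # distribution over the V candidate center words
```

Training adjusts W and W' so that the probability of the true center word rises; the skip-gram network is the same picture with input and outputs swapped.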
word2vec Experiments
- Evaluate how well syntactic/semantic word relationships are captured
- Understand the effect of increasing training set size / dimensionality
- Microsoft Research Sentence Completion Challenge
Semantic / Syntactic Word Relationships Task
Semantic / Syntactic Word Relationships Results
Learned Relationships
Microsoft Research Sentence Completion
Linguistic Regularities
- "king" - "man" + "woman" = "queen"!
- Demo
- Check out gensim (Python library for topic modelling): https://radimrehurek.com/gensim/models/word2vec.html
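The analogy works by vector offsets: form the target vector king - man + woman, then pick the vocabulary word whose embedding has the highest cosine similarity to it. A toy sketch with hand-crafted 2-D vectors (assumptions for illustration, not learned embeddings):

```python
import math

# Hand-crafted 2-D vectors (assumptions, not learned embeddings):
# dimension 0 ~ "royalty", dimension 1 ~ "male gender".
vecs = {
    "king":  [0.9, 0.9],
    "queen": [0.9, -0.9],
    "man":   [0.1, 0.9],
    "woman": [0.1, -0.9],
    "apple": [-0.8, 0.1],
}

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# king - man + woman, then pick the nearest word that isn't in the query.
target = [k - m + w for k, m, w in zip(vecs["king"], vecs["man"], vecs["woman"])]
best = max((w for w in vecs if w not in ("king", "man", "woman")),
           key=lambda w: cos(target, vecs[w]))
```

With gensim and a trained model, the equivalent query is `model.wv.most_similar(positive=["king", "woman"], negative=["man"])`.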
Multimodal Word Embeddings: Motivation
Are these two objects similar?
Multimodal Word Embeddings: Motivation
And these?
Multimodal Word Embeddings: Motivation
What do you think should be the case?
[Figure: two candidate orderings of the pairwise similarity between the pictured objects; the images are not recoverable from the transcript]
When do we need image features?
It's surely task-specific, but many tasks can benefit from visual features!
- Text-based Image Retrieval
- Visual Paraphrasing
- Common Sense Assertion Classification
- They are better suited for zero-shot learning (learn a mapping between text and images)
Two Multimodal Word Embeddings approaches...
1. Combining Language and Vision with a Multimodal Skip-gram Model (Lazaridou et al., 2015)
2. Visual Word2Vec (vis-w2v): Learning Visually Grounded Word Embeddings Using Abstract Scenes (Kottur et al., 2015)
Multimodal Skip-Gram

- The main idea: use visual features for the (very) small subset of the training data for which images are available.
- Visual vectors are obtained by a CNN and are fixed during training!
- Recall the Skip-Gram objective:

  $$\mathcal{L}_{ling}(w_t) = \sum_{t=1}^{T} \sum_{-c \le j \le c,\, j \ne 0} \log p(w_{t+j} \mid w_t)$$

- New Multimodal Skip-Gram objective:

  $$\mathcal{L} = \frac{1}{T} \sum_{t=1}^{T} \left( \mathcal{L}_{ling}(w_t) + \mathcal{L}_{vision}(w_t) \right)$$

  where:
  - L_vision(w_t) = 0 if w_t does not have an entry in ImageNet, and otherwise

  $$\mathcal{L}_{vision}(w_t) = -\sum_{w' \sim P(w)} \max\left(0,\ \gamma - \cos(u_{w_t}, v_{w_t}) + \cos(u_{w_t}, v_{w'})\right)$$
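The vision term is a max-margin ranking loss: it rewards the word embedding u_{w_t} for being closer in cosine similarity to its own visual vector v_{w_t} than to the visual vectors of negatively sampled words w', by at least the margin γ. A stdlib-only sketch (function names and toy vectors are assumptions):

```python
import math

# Sketch of the max-margin vision term (names are assumptions).
# u = word embedding, v = fixed CNN visual vector,
# neg_visual = visual vectors of negatively sampled words w' ~ P(w).
def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def l_vision(u_wt, v_wt, neg_visual, gamma=0.5):
    # -sum_{w'} max(0, gamma - cos(u_wt, v_wt) + cos(u_wt, v_w'))
    return -sum(max(0.0, gamma - cos(u_wt, v_wt) + cos(u_wt, v_neg))
                for v_neg in neg_visual)

u = [1.0, 0.0]        # word embedding already aligned with its image
v_pos = [1.0, 0.0]    # its own visual vector: cos = 1
negs = [[0.0, 1.0]]   # an orthogonal negative: cos = 0
loss = l_vision(u, v_pos, negs)  # margin satisfied, so the penalty is zero
```

Because the term is added to a maximized objective, it is zero (no penalty) once every negative is beaten by the margin, and negative otherwise.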
Multimodal Skip-Gram: An example
Multimodal Skip-Gram: Comparing to Human Judgements

MEN: general relatedness ("pickles", "hamburgers"); SimLex-999: taxonomic similarity ("pickles", "food"); SemSim: semantic similarity ("pickles", "onions"); VisSim: visual similarity ("pen", "screwdriver")
Multimodal Skip-Gram: Examples of Nearest Neighbors
Only "donut" and "owl" were trained with direct visual information.
Multimodal Skip-Gram: Zero-shot image labelling and image retrieval
Multimodal Skip-Gram: Survey to evaluate on Abstract Words

Metric: proportion (percentage) of words for which the number of votes in favour of the "neighbour" image is significantly above chance. Unseen: discard words for which visual info was accessible during training.
Multimodal Skip-Gram: Survey to evaluate on Abstract Words

Left: subject preferred the nearest neighbour to the random image

[Figure: example image pairs for the abstract words "freedom", "theory", "god", "together", "place", "wrong"]
Two Multimodal Word Embeddings approaches...
1. Combining Language and Vision with a Multimodal Skip-gram Model (Lazaridou et al., 2015)
2. Visual Word2Vec (vis-w2v): Learning Visually Grounded Word Embeddings Using Abstract Scenes (Kottur et al., 2015)
Visual Word2Vec (vis-w2v): Motivation
[Figure: for the captions "girl eating ice cream" and "girl stares at ice cream", w2v places "eating" and "stares at" farther apart in the word embedding space, while vis-w2v places them closer]
Visual Word2Vec (vis-w2v): Approach

- Multimodal train set: tuples of (description, abstract scene)
- Finetune word2vec to add visual features obtained by abstract scenes (clipart)
- Obtain surrogate (visual) classes by clustering those features
- W_I: initialized from word2vec
- N_K: number of clusters of abstract scene features
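Surrogate classes come from clustering the abstract-scene features; each scene's cluster id then acts as its visual label during finetuning. A naive stdlib-only two-cluster k-means sketch (the toy 2-D "scene features" and deterministic initialization are assumptions for illustration):

```python
# Sketch of forming surrogate visual classes by clustering scene features.
def kmeans2(points, iters=20):
    # Two clusters; deterministic init (first and last point) for the sketch.
    centers = [list(points[0]), list(points[-1])]
    for _ in range(iters):
        clusters = [[], []]
        for p in points:
            i = min((0, 1), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        centers = [
            [sum(xs) / len(cl) for xs in zip(*cl)] if cl else centers[c]
            for c, cl in enumerate(clusters)
        ]
    return centers

def surrogate_label(p, centers):
    # The cluster id serves as the scene's N_K-way "visual class" label.
    return min(range(len(centers)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))

scenes = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]  # toy scene features
centers = kmeans2(scenes)
labels = [surrogate_label(p, centers) for p in scenes]
```

Descriptions of visually similar scenes end up sharing a label, which is what pulls phrases like "stares at" and "eating" together during finetuning.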
Clustering abstract scenes
Interestingly, "prepare to cut", "hold", "give" are clustered together with "stare at", etc. It would be hard to infer these semantic relationships from text alone.

[Figure: example scenes from one cluster, labelled "lay next to", "stand near", "stare at", "enjoy"]
Visual Word2Vec (vis-w2v): Relationship to CBOW (word2vec)
Surrogate labels play the role of visual context.
Visual Word2Vec (vis-w2v): Visual Paraphrasing Results
| Approach | Visual Paraphrasing AP (%) |
| --- | --- |
| w2v-wiki | 94.1 |
| w2v-wiki | 94.4 |
| w2v-coco | 94.6 |
| vis-w2v-wiki | 95.1 |
| vis-w2v-coco | 95.3 |

Table: Performance on the visual paraphrasing task
Visual Word2Vec (vis-w2v): Common Sense Assertion Classification Results

Given a tuple (Primary Object, Relation, Secondary Object), decide whether it is plausible or not.
| Approach | Common sense AP (%) |
| --- | --- |
| w2v-coco | 72.2 |
| w2v-wiki | 68.1 |
| w2v-coco + vision | 73.6 |
| vis-w2v-coco (shared) | 74.5 |
| vis-w2v-coco (shared) + vision | 74.2 |
| vis-w2v-coco (separate) | 74.8 |
| vis-w2v-coco (separate) + vision | 75.2 |
| vis-w2v-wiki (shared) | 72.2 |
| vis-w2v-wiki (separate) | 74.2 |

Table: Performance on the common sense task
Thank you!
Bibliography
Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space." arXiv preprint arXiv:1301.3781 (2013).

Kottur, Satwik, et al. "Visual Word2Vec (vis-w2v): Learning Visually Grounded Word Embeddings Using Abstract Scenes." arXiv preprint arXiv:1511.07067 (2015).

Lazaridou, Angeliki, Nghia The Pham, and Marco Baroni. "Combining language and vision with a multimodal skip-gram model." arXiv preprint arXiv:1501.02598 (2015).

Rong, Xin. "word2vec parameter learning explained." arXiv preprint arXiv:1411.2738 (2014).

Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in Neural Information Processing Systems. 2013.