DeViSE: A Deep Visual-Semantic Embedding Model Presenters: Ji Gao, Fandi Lin


TRANSCRIPT

Page 1: DeViSE: A Deep Visual-Semantic Embedding Model

DeViSE: A Deep Visual-Semantic Embedding Model

Presenters: Ji Gao, Fandi Lin

Page 2: DeViSE: A Deep Visual-Semantic Embedding Model

Motivation

Visual recognition systems run into problems as the number of categories grows:

● Insufficient labeled training data

● Blurred distinction between classes

How do we improve predictions for unseen categories?

Page 3: DeViSE: A Deep Visual-Semantic Embedding Model

Background

N-way discrete classifiers

● Labels treated as unrelated

● Semantic information not captured

Result: these systems cannot make zero-shot predictions without additional information, such as text data.
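To see why one-hot labels carry no semantics, note that every pair of distinct labels is equally dissimilar, whereas word embeddings place related labels close together. A minimal sketch, using made-up 3-d embeddings (real ones are 500-1000 dimensions):

```python
import numpy as np

# One-hot labels: "cat", "dog", "truck" are pairwise orthogonal,
# so the label space says cat is no closer to dog than to truck.
cat, dog, truck = np.eye(3)
print(cat @ dog, cat @ truck)  # 0.0 0.0 -- all pairs look equally unrelated

# Hypothetical word embeddings: semantic similarity survives as cosine similarity.
emb = {
    "cat":   np.array([0.9, 0.8, 0.1]),
    "dog":   np.array([0.8, 0.9, 0.2]),
    "truck": np.array([0.1, 0.2, 0.9]),
}
cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos(emb["cat"], emb["dog"]))    # high: cats and dogs are related
print(cos(emb["cat"], emb["truck"]))  # low: cats and trucks are not
```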

Page 4: DeViSE: A Deep Visual-Semantic Embedding Model

Related Work

WSABIE: Linear map from image features to embedding space. Only used training labels.

Socher et al.: Linear map from image features to embedding space, with outlier detection to identify unseen classes. Only 8 known and 2 unknown classes.

Other work demonstrating zero-shot classification relies on hand-curated side information.

Page 5: DeViSE: A Deep Visual-Semantic Embedding Model

Proposed Method

Combine a traditional visual model with a language model.

Page 6: DeViSE: A Deep Visual-Semantic Embedding Model

Proposed Method

1. Train a language model for semantic information.
2. At the same time, train a CNN for images.
3. Initialize the combined model using the pre-trained parameters.
4. Train the combined model.

Page 7: DeViSE: A Deep Visual-Semantic Embedding Model

Skip-gram language model

● Efficient Estimation of Word Representations in Vector Space, ICLR 2013
● Skip-gram: a generalization of n-grams that skips over the words in between
● Skip-gram model: learn a neural network that, given a word, predicts the words near it
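To make the skip-gram objective concrete, here is a minimal sketch (not the paper's code) that generates (center, context) training pairs from a toy corpus; a real model would feed these pairs to a shallow network with an embedding layer:

```python
# Minimal sketch of skip-gram training-pair generation (illustrative only).
# Window size and corpus are toy values; the paper trains on ~5.4B words.
corpus = "the quick brown fox jumps over the lazy dog".split()
window = 2  # look up to 2 words on each side of the center word

pairs = []
for i, center in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            pairs.append((center, corpus[j]))  # (input word, word to predict)

print(pairs[:4])
# [('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ('quick', 'brown')]
```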

Page 8: DeViSE: A Deep Visual-Semantic Embedding Model

Skip-gram language model

Learn the relationship between labels.

● Data: 5.7 million documents (5.4 billion words) extracted from wikipedia.org

Page 9: DeViSE: A Deep Visual-Semantic Embedding Model

CNN model

● AlexNet
● Winner of ILSVRC 2012
● 5 conv layers
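As a rough sketch of the visual side (assuming torchvision's AlexNet as a stand-in for the paper's own model), the 4096-d activations feeding the final softmax can be exposed by dropping the last classifier layer:

```python
import torch
import torch.nn as nn
from torchvision.models import alexnet, AlexNet_Weights

# Pre-trained AlexNet; the paper trains its own, this is just a stand-in.
model = alexnet(weights=AlexNet_Weights.DEFAULT)

# Drop the final 4096 -> 1000 softmax layer so the network outputs the
# 4096-d features that DeViSE maps into the word-embedding space.
model.classifier = nn.Sequential(*list(model.classifier.children())[:-1])
model.eval()

with torch.no_grad():
    images = torch.randn(1, 3, 224, 224)  # dummy batch; use real images in practice
    features = model(images)
print(features.shape)  # torch.Size([1, 4096])
```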

Page 10: DeViSE: A Deep Visual-Semantic Embedding Model

Combined model

Use a linear embedding layer to map the features extracted before the softmax (4096-d) to match the size of the language model embeddings (500 or 1000 dimensions).

Loss function:
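For reference, the hinge rank loss defined in the DeViSE paper is:

$$\mathrm{loss}(\text{image}, \text{label}) = \sum_{j \neq \text{label}} \max\left[0,\ \mathrm{margin} - \vec{t}_{\text{label}}\, M\, \vec{v}(\text{image}) + \vec{t}_j\, M\, \vec{v}(\text{image})\right]$$

where $\vec{v}(\text{image})$ is the 4096-d feature vector from the CNN, $M$ is the trainable linear map, $\vec{t}_{\text{label}}$ is the word embedding of the correct label, and $\vec{t}_j$ ranges over the embeddings of the other labels; the paper uses a margin of 0.1.

Below is a minimal PyTorch sketch of the projection plus this loss; the function and variable names are our own, not the paper's code:

```python
import torch
import torch.nn as nn

def hinge_rank_loss(visual, M, label_emb, all_emb, margin=0.1):
    """DeViSE-style hinge rank loss (illustrative sketch, not the paper's code).

    visual:    (batch, 4096) CNN features taken before the softmax
    M:         nn.Linear(4096, emb_dim) trainable projection
    label_emb: (batch, emb_dim) word embeddings of the true labels
    all_emb:   (num_labels, emb_dim) word embeddings of every label
    """
    proj = M(visual)                                          # (batch, emb_dim)
    true_score = (proj * label_emb).sum(dim=1, keepdim=True)  # t_label . M v
    all_scores = proj @ all_emb.T                             # t_j . M v for all j
    hinge = (margin - true_score + all_scores).clamp(min=0)   # (batch, num_labels)
    # The true label's own column contributes exactly `margin`; remove it.
    return (hinge.sum(dim=1) - margin).mean()

# Toy usage with random tensors standing in for real features and embeddings.
M = nn.Linear(4096, 500, bias=False)
all_emb = torch.randn(1000, 500)            # one 500-d vector per label
labels = torch.randint(0, 1000, (8,))       # true labels for a batch of 8
loss = hinge_rank_loss(torch.randn(8, 4096), M, all_emb[labels], all_emb)
print(loss)
```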

Page 11: DeViSE: A Deep Visual-Semantic Embedding Model

Experiment

Tasks:

● Image classification
● Zero-shot image classification (see the sketch below)
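In the zero-shot setting, prediction is a nearest-label search in the embedding space: the projected image vector is compared against the word embeddings of all labels, including labels with zero training images. A minimal sketch, with toy names and random tensors standing in for trained parameters:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins: a trained projection M and word embeddings for all labels,
# including "zebra", which we pretend has no training images.
M = nn.Linear(4096, 500, bias=False)
label_names = ["cat", "dog", "truck", "zebra"]
label_embs = torch.randn(4, 500)

def zero_shot_predict(visual):
    """Rank every label (seen or unseen) by cosine similarity between
    the projected image feature and the label's word embedding."""
    proj = F.normalize(M(visual), dim=1)            # (batch, 500)
    embs = F.normalize(label_embs, dim=1)           # (4, 500)
    sims = proj @ embs.T                            # cosine similarities
    return [label_names[i] for i in sims.argmax(dim=1)]

print(zero_shot_predict(torch.randn(2, 4096)))  # e.g. ['dog', 'zebra']
```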

Page 12: DeViSE: A Deep Visual-Semantic Embedding Model

Experiment: Same label set (not zero-shot)

Baselines:

● AlexNet
● Random embedding: AlexNet + random vectors (instead of the language model)

Page 13: DeViSE: A Deep Visual-Semantic Embedding Model

Experiment: Zero-shot

Datasets:

● 2-hop: two clusters of labels
● 3-hop: three clusters of labels
● ImageNet 2011: labels in ImageNet 2011 that do not appear in ImageNet 2012

Page 14: DeViSE: A Deep Visual-Semantic Embedding Model

Experiment: Zero-shot

Comparison to a pure CNN:

Page 15: DeViSE: A Deep Visual-Semantic Embedding Model

Experiment: Zero-shot

Comparison to previous zero-shot results:

Page 16: DeViSE: A Deep Visual-Semantic Embedding Model

Conclusion

DeViSE achieves state-of-the-art performance on the classification task, and it is also able to do zero-shot learning.

It is well suited to large amounts of data and can handle labels that lack sufficient training examples.

This shows the power of combining image and semantic data.