Learning Concept Taxonomies from Multi-modal Data (hzhang2/projects/learning-taxonomies/slides.pdf)
TRANSCRIPT
Slide 1
Learning Concept Taxonomies from Multi-modal Data
Carnegie Mellon University
Hao Zhang, Zhiting Hu, Yuntian Deng, Mrinmaya Sachan, Zhicheng Yan and Eric P. Xing
Slide 2
Outline
• Problem
• Taxonomy Induction Model
• Features
• Evaluation and Analysis
Slide 3
Problem
• Taxonomy induction
  A set of lexical terms = {consumer goods, fashion, uniform, neckpiece, handwear, finery, disguise, ...}
• Human knowledge
• Interpretability
• Question answering
• Information extraction
• Computer vision
Slide 4
Problem
• Existing taxonomies
  – Knowledge/time intensive to build
  – Limited coverage
  – Unavailable
Slide 5
Related Works (NLP)
• Automatic induction of taxonomies
Widdows [2003], Snow et al. [2006], Yang and Callan [2009], Kozareva and Hovy [2010], Poon and Domingos [2010], Navigli et al. [2011], Fu et al. [2014], Bansal et al. [2014]
Slide 6
Problem
• What evidence helps taxonomy induction?
  – Surface features
    • Ends with
    • Contains
    • Suffix match
    • …
Examples: shark → white shark; bird → bird of prey
Slide 7
Problem
• What evidence helps taxonomy induction?
  – Semantics from text descriptions
    • Parent-child relation
    • Sibling relation [Bansal 2014]
(Figure: taxonomy seafish → {shark, ray})
“seafish, such as shark…”
“rays are a group of seafishes…”
“Either shark or ray…” “Both shark and ray…”
Slide 8
Problem
• What evidence helps taxonomy induction?
  – Semantics from text descriptions
    • Parent-child relation
    • Sibling relation [Bansal 2014]
“seafish, such as shark…”
“rays are a group of seafishes…”
“Either shark or ray…” “Both shark and ray…”
These cues are extracted as features:
• Wikipedia abstract
  – Presence and distance
  – Patterns
• Web-ngrams
• …
Slide 9
Problem
• What evidence helps taxonomy induction?
  – word2vec
  – Projections between parent and child [Fu 2014]
d(v(king), v(queen)) ≈ d(v(man), v(woman))
v(seafish) − v(shark) ≈ v(human) − v(woman)?
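The offset intuition above can be sketched with toy vectors. The 3-d embeddings below are invented for illustration; real word2vec vectors are learned and high-dimensional:

```python
import numpy as np

# Hypothetical toy embeddings, chosen so the two pair offsets coincide.
vecs = {
    "king":  np.array([0.5, 1.0, 0.0]),
    "queen": np.array([0.5, 0.0, 1.0]),
    "man":   np.array([0.25, 1.0, 0.0]),
    "woman": np.array([0.25, 0.0, 1.0]),
}

def offset(a, b):
    """Difference vector v(a) - v(b): the 'projection' between a word pair."""
    return vecs[a] - vecs[b]

# If a uniform parent-child (or analogy) offset existed, these would be close:
d1 = offset("king", "queen")
d2 = offset("man", "woman")
print(np.linalg.norm(d1 - d2))  # → 0.0
```

The question on the slide is whether hypernym pairs such as (seafish, shark) follow the same kind of regularity.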
Slide 10
Motivation
• How about images?
(Figure: taxonomy seafish → {shark, ray}, alongside example images labeled Seafish, Shark, and Ray)
Slide 11
Motivation
• Our motivation
  – Images may include perceptual semantics
  – Jointly leverage text and visual information (from the web)
• Problems to be addressed:
  – How to design visual features to capture the perceptual semantics?
  – How to design models to integrate visual and text information?
Slide 12
Related Works (CV)
• Building visual hierarchies
Sivic et al. [2008], Griffin and Perona [2008], Chen et al. [2013]
Slide 13
Task Definition
• Assume a set of N categories x = {x_1, x_2, …, x_N}
  – Each category has a name and a set of images
• Goal: induce a taxonomy tree over x
  – Using both text & visual features
• Setting: supervised learning of category hierarchies from data
x = {Animal, Fish, Shark, Cat, Tiger, Terrestrial animal, Seafish, Feline}
Slide 14
Model
• Let z_n (1 ≤ z_n ≤ N) be the index of the parent of category x_n
  – The set z = {z_1, z_2, …, z_N} encodes the whole tree structure
• Our goal → infer the conditional distribution p(z|x)
x = {Animal, Fish, Shark, Cat, Tiger, Terrestrial animal, Seafish, Feline}
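The parent-index encoding z can be sketched in a few lines. The particular tree below over the example categories is a hypothetical illustration:

```python
# Encode a taxonomy over x by one parent index z[n] per category,
# mirroring z = {z_1, ..., z_N}. The tree chosen here is illustrative.
x = ["Animal", "Fish", "Shark", "Cat", "Tiger",
     "Terrestrial animal", "Seafish", "Feline"]

# z[n] = index of the parent of x[n]; the root points to itself.
z = [0, 0, 6, 7, 7, 0, 1, 5]

def children(n):
    """Recover the direct children of node n from the parent indexes."""
    return [m for m in range(len(x)) if z[m] == n and m != n]

print(children(0))  # → [1, 5]  (Fish, Terrestrial animal)
```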
Slide 15
Model Overview
• Intuition: categories tend to be closely related to parents and siblings
  – (text) hypernym-hyponym relation: shark → cat shark
  – (visual) similarity: images of shark ⇔ images of ray
• Method: induce features from distributed representations of images and text
  – image: deep convnet
  – text: word embedding
Slide 16
Taxonomy Induction Model
• Notation:
  – c_n: child nodes of x_n
  – x_{n'} ∈ c_n
  – g: consistency term depending on features
  – w: model weights to be learned
(Formula annotations: parent indexes of categories; popularity (#children) of categories; a prior on popularity; the consistency of x_{n'} with parent x_n and siblings c_n \ x_{n'})
Slide 17
Taxonomy Induction Model
• Looking into g:
  – g(x_n, x_{n'}, c_n \ x_{n'}) evaluates how consistent a parent-child group is: the consistency of x_{n'} with its parent x_n and siblings c_n \ x_{n'}.
  – The whole model is a factorization of the consistency terms over all local parent-child groups.
Slide 18
Model: Develop g
• Notation:
  – c_n: child nodes of x_n
  – x_{n'} ∈ c_n
  – g: consistency term depending on features
  – w: model weights to be learned
(Formula annotations: the consistency of x_{n'} with parent x_n and siblings c_n \ x_{n'} is scored by the weight vector w, to be learned, applied to the feature vector of x_{n'} with parent x_n and siblings c_n \ x_{n'})
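A minimal sketch of a weighted consistency term, assuming the common log-linear form g = exp(w · f); the feature values and weights below are made up for illustration:

```python
import numpy as np

def g(w, f):
    """Consistency of one local parent-child group: exp(w . f)."""
    return np.exp(np.dot(w, f))

def tree_score(w, feats):
    """Unnormalized model score: product of local consistency terms."""
    return np.prod([g(w, f) for f in feats])

# Made-up weights and per-group feature vectors.
w = np.array([0.5, 1.0])
feats = [np.array([1.0, 0.2]), np.array([0.3, 0.8])]
print(tree_score(w, feats))
```

Because the score factorizes over local groups, changing one parent assignment only touches the terms that mention it.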
Slide 19
Feature: Develop f
• Visual features:
  – Sibling similarity
  – Parent-child similarity
  – Parent prediction
• Text features:
  – Parent prediction [Fu et al.]
  – Sibling similarity
  – Surface features [Bansal et al.]
Slide 20
Feature: Develop f
• Visual features: Sibling similarity (S-V1*)
  – Step 1: fit a Gaussian to the images of each category
  – Step 2: derive the pairwise similarity vissim(x_i, x_j)
  – Step 3: derive the groupwise similarity by averaging
S-V1 evaluates the visual similarity between siblings
* S: Siblings, V: Visual
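The three steps can be sketched as follows, under simplifying assumptions: synthetic image features, per-category Gaussians, and exp of the negative distance between means as the pairwise similarity (the slides do not pin down the exact similarity function):

```python
import numpy as np

# Synthetic stand-ins for convnet image features: 50 images x 4 dims
# per category; shark and ray are made visually close, cat distant.
rng = np.random.default_rng(0)
cats = {name: rng.normal(loc=mu, size=(50, 4))
        for name, mu in [("shark", 0.0), ("ray", 0.2), ("cat", 3.0)]}

def fit_gaussian(feats):
    """Step 1: fit a (diagonal) Gaussian to a category's images."""
    return feats.mean(axis=0), feats.var(axis=0)

def vissim(a, b):
    """Step 2: pairwise similarity between two fitted Gaussians."""
    mu_a, _ = fit_gaussian(cats[a])
    mu_b, _ = fit_gaussian(cats[b])
    return float(np.exp(-np.linalg.norm(mu_a - mu_b)))

def group_sim(names):
    """Step 3: groupwise similarity as the average over sibling pairs."""
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    return sum(vissim(a, b) for a, b in pairs) / len(pairs)

print(group_sim(["shark", "ray"]) > group_sim(["shark", "cat"]))  # True
```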
Slide 21
Feature: Develop f
• Visual features: Parent-child similarity (PC-V1*)
  – Step 1: fit a Gaussian for each child category
  – Step 2: fit a Gaussian for only the top-K images of the parent category
  – Steps 3-4: same as S-V1
* PC: Parent-child, V: Visual
(Figure: example images of the parent category Seafish and the child category Shark)
Slide 22
Feature: Develop f
• Visual features: Parent prediction (PC-V2*)
  – Step 1: learn a projection matrix to map the mean image of a child category to the word embedding of its parent category
  – Step 2: calculate the distance
  – Step 3: bin the distance as a feature vector
* PC: Parent-child, V: Visual
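A sketch of these three steps on synthetic data; the dimensions, bin edges, and the plain least-squares fit are illustrative assumptions, not the exact training procedure from the slides:

```python
import numpy as np

# Synthetic data: mean image features (5-d) and parent word embeddings
# (3-d) generated by a hidden linear map, so the fit can recover it.
rng = np.random.default_rng(1)
W_true = rng.normal(size=(3, 5))
child_means = rng.normal(size=(100, 5))
parent_embs = child_means @ W_true.T

# Step 1: learn the projection matrix by least squares.
W, *_ = np.linalg.lstsq(child_means, parent_embs, rcond=None)

def distance(child_mean, parent_emb):
    """Step 2: distance between the projected child and a candidate parent."""
    return float(np.linalg.norm(child_mean @ W - parent_emb))

def bin_distance(d, edges=(0.5, 1.0, 2.0)):
    """Step 3: bin the distance into a one-hot feature vector."""
    f = np.zeros(len(edges) + 1)
    f[np.searchsorted(edges, d)] = 1.0
    return f

d = distance(child_means[0], parent_embs[0])
print(bin_distance(d))
```

On this noiseless data the residual distance is essentially zero, so the first bin fires.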
Slide 23
Feature: Develop f
• Text features
  – Parent prediction [Fu et al.]
    • Projection from child to parent
  – Sibling similarity
    • Distance between word vectors
  – Surface features [Bansal et al.]
    • Ends with (e.g. catshark is a sub-category of shark), LCS, capitalization, etc.
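Two of the surface cues can be sketched directly; the helper names are ours:

```python
def ends_with(child: str, parent: str) -> bool:
    """'Ends with' cue: 'catshark' ends with 'shark' -> likely sub-category."""
    return child != parent and child.endswith(parent)

def lcs_len(a: str, b: str) -> int:
    """Length of the longest common substring (dynamic programming)."""
    best = 0
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
                best = max(best, dp[i][j])
    return best

print(ends_with("catshark", "shark"))  # True
print(lcs_len("seafish", "shark"))
```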
Slide 24
Parameter Estimation
• Inference
  – Gibbs sampling
• Learning
  – Supervised learning from gold taxonomies of training data
  – Gradient descent-based maximum likelihood estimation
• Output taxonomies
  – Chu-Liu/Edmonds algorithm
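Gibbs sampling over parent assignments can be sketched as below. The scoring function is a made-up stand-in for the learned model, and a real implementation must also rule out cycles:

```python
import math
import random

def gibbs_step(z, score, n_cats, rng):
    """One Gibbs sweep: resample each node's parent from the conditional
    distribution implied by the local score (node 0 is kept as the root)."""
    for n in range(1, n_cats):
        weights = [math.exp(score(n, p)) if p != n else 0.0
                   for p in range(n_cats)]
        total = sum(weights)
        r, acc = rng.random() * total, 0.0
        for p, wgt in enumerate(weights):
            acc += wgt
            if r <= acc:
                z[n] = p
                break
    return z

rng = random.Random(0)
score = lambda n, p: 1.0 if p == 0 else -1.0  # toy score: prefer the root
z = [0, 0, 0, 0]                              # start everything at the root
for _ in range(20):
    z = gibbs_step(z, score, 4, rng)
print(z)
```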
Slide 25
Experiment Setup
• Implementation
  – Word vectors: Google word2vec
  – Convnet: VGG-16
• Evaluation metric: Ancestor-F1 = 2PR / (P + R)
• Data
  – Training set: ImageNet taxonomies
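Ancestor-F1 compares the sets of (ancestor, descendant) pairs in the predicted and gold trees; a minimal sketch, where the example trees are made up:

```python
def ancestor_pairs(parent):
    """All (ancestor, descendant) pairs of a tree given as {child: parent}."""
    pairs = set()
    for c in parent:
        a = parent[c]
        while a is not None:
            pairs.add((a, c))
            a = parent.get(a)   # root has no entry -> None ends the walk
    return pairs

def ancestor_f1(pred, gold):
    """F1 = 2PR / (P + R) over ancestor pairs."""
    p_pairs, g_pairs = ancestor_pairs(pred), ancestor_pairs(gold)
    correct = len(p_pairs & g_pairs)
    P = correct / len(p_pairs)
    R = correct / len(g_pairs)
    return 2 * P * R / (P + R)

gold = {"fish": "animal", "shark": "fish", "cat": "animal"}
pred = {"fish": "animal", "shark": "animal", "cat": "animal"}
print(ancestor_f1(pred, gold))
```

Flattening shark under animal loses the (fish, shark) pair, so recall drops while precision stays perfect.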
Slide 26
Evaluation
Results: comparison to baseline methods
• The embedding-based feature set (LV) is comparable to the state of the art
• The full feature set (LVB) achieves the best performance
Legend:
• L: language features (surface features, embedding features)
• V: visual features
• B: Bansal2014 features (web ngrams etc.)
• E: embedding features
Slide 27
Evaluation
Results: how much do visual features help?
Messages:
• Visual similarity (S-V1, PC-V1) helps a lot
• The complexity of the visual representation does not matter much
Slide 28
Evaluation
Results: investigating PC-V1
• Images of a parent category are not all necessarily visually similar to images of a child category
(Figure: example images of the parent category Seafish and the child category Shark)
Slide 29
Evaluation
Results: when/where do visual features help?
• Messages:
  – Shallow layers ↔ abstract categories ↔ text features more effective
  – Deep layers ↔ specific categories ↔ visual features more effective
(Figure: feature weights vs. depth)
Slide 30
Take-home Message
• Visual similarity helps taxonomy induction a lot
  – Sibling similarity
  – Parent-child similarity
• Which features are more important?
  – Visual features are more indicative in near-leaf layers
  – Text features are more evident in near-root layers
• Embedding features augment word-count features
Slide 31
Thank You!
Q & A
Slide 32
Evaluation
Results: visualization