![Page 1: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/1.jpg)
Distributional semantics with eyes
Enriching corpus-based models of word meaning with automatically extracted visual features
Marco Baroni
Center for Mind/Brain Sciences
University of Trento
Computational Linguistics Colloquium
Computational Linguistics & Phonetics Department
Saarland University
May 2012
1
![Page 2: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/2.jpg)
Collaborators
award
Jasper Uijlings
Giang Binh Tran
Nam Khanh Tran
Gemma Boleda
Elia Bruni
2
![Page 3: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/3.jpg)
Warning!
This talk is NOT about:
- image retrieval
- image annotation
- object recognition
- connecting specific images to captions/phrases/words
- improving computer vision

The talk is about using information extracted from images to improve the semantic representation of word types
3
![Page 4: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/4.jpg)
The distributional hypothesis
Harris, Charles and Miller, Firth, Wittgenstein? . . .
The meaning of a word is (can be approximated by, derived from) the set of contexts in which it occurs in texts
4
![Page 5: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/5.jpg)
The distributional hypothesis in real life
McDonald & Ramscar 2001
He filled the wampimuk, passed it around and we all drunk some
We found a little, hairy wampimuk sleeping behind the tree
5
![Page 6: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/6.jpg)
Distributional semantics
Landauer and Dumais (1997), Turney and Pantel (2010), . . .
he curtains open and the moon shining in on the barely
ars and the cold , close moon " . And neither of the w
rough the night with the moon shining so brightly , it
made in the light of the moon . It all boils down , wr
surely under a crescent moon , thrilled by ice-white
sun , the seasons of the moon ? Home , alone , Jay pla
m is dazzling snow , the moon has risen full and cold
un and the temple of the moon , driving out of the hug
in the dark and now the moon rises , full and amber a
bird on the shape of the moon over the trees in front
But I could n’t see the moon or the stars , only the
rning , with a sliver of moon hanging among the stars
they love the sun , the moon and the stars . None of
the light of an enormous moon . The plash of flowing w
man ’s first step on the moon ; various exhibits , aer
the inevitable piece of moon rock . Housing The Airsh
oud obscured part of the moon . The Allied guns behind
6
![Page 7: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/7.jpg)
Distributional semantics
Distributional meaning as co-occurrence vector

| | planet | night | full | shadow | shine | crescent |
|---|---|---|---|---|---|---|
| moon | 10 | 22 | 43 | 16 | 29 | 12 |
| sun | 14 | 10 | 4 | 15 | 45 | 0 |
| dog | 0 | 4 | 2 | 10 | 0 | 0 |

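
As a rough illustration of how such co-occurrence vectors can be compared, the sketch below computes cosine similarity over the three rows of the table above. Cosine is a common choice in distributional semantics, but the slide does not name a similarity measure, so treat it as an assumption; the function name `cosine` is illustrative.

```python
import math

# Co-occurrence counts copied from the slide; column order:
# planet, night, full, shadow, shine, crescent
vectors = {
    "moon": [10, 22, 43, 16, 29, 12],
    "sun": [14, 10, 4, 15, 45, 0],
    "dog": [0, 4, 2, 10, 0, 0],
}


def cosine(u, v):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: an all-zero vector is similar to nothing
    return dot / (norm_u * norm_v)


for a, b in [("moon", "sun"), ("moon", "dog"), ("sun", "dog")]:
    print(f"{a}-{b}: {cosine(vectors[a], vectors[b]):.3f}")
```

On these counts, moon comes out closer to sun than to dog, which matches the intuition the slide is building towards.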
7
![Page 8: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/8.jpg)
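Distributional semantics
The geometry of meaning

| | shadow | shine |
|---|---|---|
| moon | 16 | 29 |
| sun | 15 | 45 |
| dog | 10 | 0 |

[Figure: the three words plotted as points in the plane, with shadow on the x-axis and shine on the y-axis: dog (10,0), sun (15,45), moon (16,29)]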
8
![Page 9: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/9.jpg)
Distributional semantics: A general-purpose representation of lexical meaning
Baroni and Lenci, 2010
- Similarity (cord-string vs. cord-smile)
- Synonymy (zenith-pinnacle)
- Concept categorization (car ISA vehicle; banana ISA fruit)
- Selectional preferences (eat topinambur vs. *eat sympathy)
- Analogy (mason is to stone like carpenter is to wood)
- Relation classification (exam-anxiety are in CAUSE-EFFECT relation)
- Qualia (TELIC ROLE of novel is to entertain)
- Salient properties (car-wheels, dog-barking)
- Argument alternations (John broke the vase - the vase broke, John minces the meat - *the meat minced)
9
![Page 10: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/10.jpg)
The ungrounded nature of distributional semantics
Glenberg and Robertson 2000, Andrews, Vigliocco and Vinson 2009, Riordan and Jones 2010. . .
Describing tigers. . .
Humans (McRae et al., 2005):
- have stripes
- have teeth
- are black
- . . .

State-of-the-art distributional model (Baroni et al., 2010):
- live in jungle
- can kill
- risk extinction
- . . .
10
![Page 11: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/11.jpg)
The ungrounded nature of distributional semantics
[Figure. Labels: SV, DS, UB, JEC, TS. Property types: TAXO, RELENT, PART, QUALITY, ACTIVITY, FUNCTION, LOCATION]
Baroni and Lenci 2008
11
![Page 12: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/12.jpg)
Interlude: “blind” semantics?
[Figure. Groups: SIGHTED, BLIND. Property types: ABS_ENT, ABS_PROP, CONC_ENT, CONC_PROP, EVENT, PART, SPACE, TAXO, TIME]
Cazzolli, Baroni, Lenci and Marotta in preparation
12
![Page 13: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/13.jpg)
The distributional hypothesis, generalized
The meaning of a word is (can be approximated by, derived from) the set of contexts in which it occurs ~~in texts~~
13
![Page 14: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/14.jpg)
Context in the 2010s
14
![Page 15: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/15.jpg)
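Multimodal distributional semantics
using textual and visual collocates

| | planet | night | (visual collocate 1) | (visual collocate 2) |
|---|---|---|---|---|
| moon | 10 | 22 | 22 | 0 |
| sun | 14 | 10 | 15 | 0 |
| dog | 0 | 4 | 0 | 20 |
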
15
![Page 16: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/16.jpg)
Bags of visual words
Sivic and Zisserman, 2003
3 3 0 0 2 3 1 1
16
![Page 17: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/17.jpg)
Determining the visual vocabulary
17
![Page 18: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/18.jpg)
Representing images as bags-of-visual-word vectors
[Figure: an image with its bag-of-visual-words count vector; values not recoverable]
18
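
A minimal sketch of the bag-of-visual-words idea, assuming a visual vocabulary is already available as a list of centroid vectors (commonly obtained by clustering local descriptors, e.g. with k-means) and that each image has been reduced to a set of local descriptors. The toy 2-D descriptors, the three-word vocabulary, and the function names are hypothetical; real descriptors such as SIFT have many more dimensions.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary centroid closest to the descriptor."""
    def sq_dist(centroid):
        # Squared Euclidean distance is enough for picking the minimum.
        return sum((d - c) ** 2 for d, c in zip(descriptor, centroid))
    return min(range(len(vocabulary)), key=lambda i: sq_dist(vocabulary[i]))


def bag_of_visual_words(descriptors, vocabulary):
    """Count how many descriptors of one image fall on each visual word."""
    counts = [0] * len(vocabulary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, vocabulary)] += 1
    return counts


# Hypothetical vocabulary of three visual words and one image's descriptors.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
image_descriptors = [(0.1, -0.2), (0.9, 1.2), (1.1, 0.8), (4.7, 5.3)]

print(bag_of_visual_words(image_descriptors, vocabulary))  # [1, 2, 1]
```

The resulting count vector plays the same role for an image that a row of textual co-occurrence counts plays for a word.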
![Page 19: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/19.jpg)
Associating bags-of-visual-word vectors to word labels
[Figure: a labeled image with its bag-of-visual-words count vector; values not recoverable]
19
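
One simple way to turn per-image vectors into word-level visual vectors is sketched here, assuming that each image comes with a set of word labels and that a word's visual vector is the sum of the count vectors of all images labeled with it. The slide does not spell out the aggregation, so summation is an assumption; the data and the name `word_visual_vectors` are hypothetical.

```python
def word_visual_vectors(labeled_images):
    """Sum bag-of-visual-words counts over all images carrying each label.

    labeled_images: iterable of (labels, counts) pairs, where labels is a
    collection of words and counts is that image's count vector.
    """
    totals = {}
    for labels, counts in labeled_images:
        for label in labels:
            previous = totals.get(label, [0] * len(counts))
            totals[label] = [p + c for p, c in zip(previous, counts)]
    return totals


# Hypothetical labeled images with 4-dimensional count vectors.
labeled_images = [
    ({"moon", "night"}, [3, 0, 2, 0]),
    ({"moon"}, [1, 0, 4, 0]),
    ({"dog"}, [0, 5, 0, 1]),
]

for word, vector in sorted(word_visual_vectors(labeled_images).items()):
    print(word, vector)
```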
![Page 20: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/20.jpg)
Simple multimodal matrices
Bruni, G.B. Tran and Baroni 2011
[Table: multimodal matrix; cell values not recoverable]
See also Feng and Lapata 2010, Leong and Mihalcea 2011
20
![Page 21: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/21.jpg)
The ESP labeled-image data set
von Ahn, 2003
Image labels: frame, face, alien, bold, grey, smile
Image labels: mall, devil, red, picture, man
Image labels: bed, motel, white, lamp, flower, breakfast, pillows, fruit, hotel
21
![Page 22: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/22.jpg)
Distribution of related concepts in text- vs. image-based space
BLESS data set, 184 concrete concepts
[Figure: two panels, Text and Image, each showing the distribution over relation types COORD, HYPER, MERO, ATTRI, EVENT, RAN.N, RAN.J, RAN.V; y-axis from −2 to 2]
22
![Page 23: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/23.jpg)
Nearest attributes of BLESS concepts

| concept | text | image |
|---|---|---|
| cabbage | leafy | white |
| carrot | fresh | orange |
| cherry | ripe | red |
| deer | wild | brown |
| dishwasher | electric | white |
| hat | white | old |
| hatchet | sharp | short |
| onion | fresh | white |
| oven | electric | new |
| plum | juicy | red |
| sparrow | wild | little |
| tanker | heavy | grey |

23
![Page 24: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/24.jpg)
Multimodal fusion
Bruni, N.K. Tran and Baroni submitted
[Diagram in three stages]
(1) Text corpus → Textual feature matrix; Labeled image data → Visual feature matrix
(2) Latent multimodal smoothing: normalize and concatenate the two matrices, then split blocks into a Textual smoothed matrix and a Visual smoothed matrix
(3) Multimodal similarity estimation, two routes:
- Textual features + Visual features → Feature combination → Similarity estimate
- Textual features → Similarity estimate; Visual features → Similarity estimate; both estimates → Score combination
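
The two routes in stage (3) can be sketched as follows. This is only an illustration under stated assumptions: the smoothing of stage (2) is left out, cosine is assumed as the similarity measure, and the weight `alpha` and all function names are hypothetical. The toy counts are the ones from the textual and visual collocate table earlier in the talk.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (all-zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def fused_vector(text_vec, visual_vec, alpha=0.5):
    """Normalize each block, weight it, and concatenate the two blocks."""
    return ([alpha * x for x in l2_normalize(text_vec)]
            + [(1 - alpha) * x for x in l2_normalize(visual_vec)])


def feature_combination(word_a, word_b, alpha=0.5):
    """One similarity estimate computed on the concatenated vectors."""
    return cosine(fused_vector(word_a[0], word_a[1], alpha),
                  fused_vector(word_b[0], word_b[1], alpha))


def score_combination(word_a, word_b, alpha=0.5):
    """Separate per-modality estimates, then a weighted average of the scores."""
    return (alpha * cosine(word_a[0], word_b[0])
            + (1 - alpha) * cosine(word_a[1], word_b[1]))


# (textual counts, visual counts) per word, taken from the earlier slide.
words = {
    "moon": ([10, 22], [22, 0]),
    "sun": ([14, 10], [15, 0]),
    "dog": ([0, 4], [0, 20]),
}

for other in ("sun", "dog"):
    print(other,
          round(feature_combination(words["moon"], words[other]), 3),
          round(score_combination(words["moon"], words[other]), 3))
```

In this simplified form, with unit-normalized blocks and `alpha = 0.5`, the two routes give the same number; they diverge once the modalities are weighted unequally.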
24
![Page 25: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/25.jpg)
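Predicting human semantic relatedness judgments

| Model | MEN (Window 2) | WordSim (Window 2) | MEN (Window 20) | WordSim (Window 20) |
|---|---|---|---|---|
| Text | 0.73 | 0.70 | 0.68 | 0.70 |
| Image | 0.43 | 0.36 | 0.43 | 0.36 |
| SmoothedText | 0.77 | 0.73 | 0.74 | 0.75 |
| FeatureCombine | 0.78 | 0.72 | 0.76 | 0.75 |
| ScoreCombine | 0.78 | 0.71 | 0.77 | 0.72 |
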
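
Scores like these are typically correlations between model similarities and human ratings over the same word pairs. The sketch below assumes Spearman rank correlation (the measure named later in the talk for the image-based models; whether it was used for this table is an assumption) and uses made-up ratings purely for illustration.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical human ratings and model cosines for five word pairs.
human = [4.8, 4.1, 3.0, 1.5, 0.7]
model = [0.81, 0.64, 0.70, 0.22, 0.10]
print(round(spearman(human, model), 2))
```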
25
![Page 26: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/26.jpg)
Similar concepts
best captured by Text vs. FeatureCombine

| Text | FeatureCombine |
|---|---|
| dawn/dusk | pet/puppy |
| sunrise/sunset | candy/chocolate |
| canine/dog | paw/pet |
| grape/wine | bicycle/bike |
| foliage/plant | apple/cherry |
| foliage/petal | copper/metal |
| skyscraper/tall | military/soldier |
| cat/feline | paws/whiskers |
| pregnancy/pregnant | stream/waterfall |
| misty/rain | cheetah/lion |

26
![Page 27: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/27.jpg)
Distributional semantics in Technicolor
Bruni, Boleda, Baroni and N.K. Tran 2012
Experiment 1: find typical color of 52 concrete objects: cardboard is brown, coal is black, forest is green (typical colors assigned by two judges by consensus)
Experiment 2: distinguish literal and non-literal usages of color adjectives: blue uniform, blue shark, blue note (342 adjective-noun pairs, 227 literal, 115 non-literal, as decided by two judges by consensus)
27
![Page 28: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/28.jpg)
Experiment 1 results
Median rank of correct color and number of top matches

| Model | Exp 1 |
|---|---|
| TEXT30K | 3 (11) |
| LAB128 | 1 (27) |
| SIFT40K | 3 (15) |
| TEXT+LAB128 | 1 (27) |
| TEXT+SIFT40K | 2 (17) |

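
The two figures in each cell (median rank of the correct color, and number of objects whose correct color is ranked first) can be computed along these lines. The sketch assumes a model assigns each object a score per candidate color, for example a similarity between the object and the color term; the scores, the three-color inventory, and the function names are hypothetical.

```python
from statistics import median


def rank_of_gold(scores, gold_color):
    """1-based rank of the gold color when colors are sorted by descending score."""
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking.index(gold_color) + 1


def evaluate(predictions, gold):
    """Return (median rank of the gold color, number of top matches)."""
    ranks = [rank_of_gold(predictions[word], gold[word]) for word in gold]
    return median(ranks), sum(1 for r in ranks if r == 1)


# Hypothetical model scores; the gold colors are the examples given in the talk.
predictions = {
    "coal": {"black": 0.9, "brown": 0.4, "green": 0.1},
    "forest": {"black": 0.2, "brown": 0.5, "green": 0.8},
    "cardboard": {"black": 0.6, "brown": 0.5, "green": 0.1},
}
gold = {"coal": "black", "forest": "green", "cardboard": "brown"}

print(evaluate(predictions, gold))  # (1, 2)
```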
28
![Page 29: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/29.jpg)
Experiment 1 examples

| word | gold | LAB | SIFT | TEXT |
|---|---|---|---|---|
| banana | yellow | yellow | blue | orange |
| cauliflower | white | green | yellow | orange |
| cello | brown | brown | black | blue |
| deer | brown | green | blue | red |
| froth | white | brown | black | orange |
| gorilla | black | black | red | grey |
| grass | green | green | green | green |
| pig | pink | pink | brown | brown |
| sea | blue | blue | blue | grey |
| weed | green | green | yellow | purple |

29
![Page 30: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/30.jpg)
Experiment 2 results
Average difference in normalized adj-noun cosines in literal vs. non-literal conditions with t-test significance

| Model | Exp 1 | Exp 2 |
|---|---|---|
| TEXT30K | 3 (11) | .53*** |
| LAB128 | 1 (27) | .25* |
| SIFT40K | 3 (15) | .57*** |
| TEXT+LAB128 | 1 (27) | .36*** |
| TEXT+SIFT40K | 2 (17) | .73*** |

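
A rough sketch of the kind of comparison behind the Exp 2 column: the difference between mean adjective-noun cosines in the two conditions, together with a t statistic. The slide does not say which t-test variant or which normalization was used, so the Welch statistic below and the omission of any normalization are assumptions, and the cosine values are invented for illustration.

```python
import math
from statistics import mean, variance


def mean_difference(literal, non_literal):
    """Difference between the mean cosine in the two conditions."""
    return mean(literal) - mean(non_literal)


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    standard_error = math.sqrt(variance(sample_a) / len(sample_a)
                               + variance(sample_b) / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Hypothetical adjective-noun cosines for literal and non-literal pairs.
literal = [0.30, 0.28, 0.35, 0.31]
non_literal = [0.12, 0.15, 0.10, 0.18]

print(round(mean_difference(literal, non_literal), 4))
print(round(welch_t(literal, non_literal), 2))
```

Turning the statistic into a significance level additionally needs the t distribution with the appropriate degrees of freedom, which is left out here.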
30
![Page 31: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/31.jpg)
Experiment 2 color breakdown
[Figure: one Vision panel and one Text panel for each of black, blue, green, red and white, each comparing the L and N conditions]
black: issue, culture, business
blue: note, shark, shield
green: future, politics, energy
red: meat, belt, face
31
![Page 32: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/32.jpg)
What would you rather eat?
Bergsma and Goebel 2011
- migas?
- zeolite?
- carillons?
- a ficus?
- a mamey?
- manioc?
32
![Page 33: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/33.jpg)
What would you rather eat?
Bergsma and Goebel 2011

Figure 1: Which out-of-vocabulary nouns are plausible direct objects for the verb eat? Each row corresponds to a noun: 1. migas, 2. zeolite, 3. carillon, 4. ficus, 5. mamey and 6. manioc.

sponding classifier that scores noun arguments on the basis of various textual features. We use this discriminative framework to incorporate the visual information as new, visual features.

Our experiments evaluate the ability of these classifiers to correctly predict the selectional preferences of a small set of verbs. We evaluate two cases: 1) the case where the nouns are all assumed to be out-of-vocabulary, and the classifiers must make predictions without any corpus-based co-occurrence information, and 2) the case where we assume access to noun-verb co-occurrence information derived from web-scale N-gram data.

We show that visual features are useful for some verbs, but not for others. For verbs taking abstract arguments without definitive visual features, the classifier can often learn to disregard the visual data. On the other hand, for verbs taking physical arguments (such as food, animals, or people), the classifier can make accurate predictions using the nouns’ visual properties. In these cases, visual information remains useful even after incorporating the web-scale statistics.

2 Visual Selectional Preference

Consider determining whether the nouns carillon, migas and mamey are plausible arguments for the verb eat. Existing systems are unlikely to have such words in their training data, let alone information about their edibility. However, after inspecting a few images returned by a Google search for these words (Figure 1), a human might reasonably predict which words are edible. Humans make this determination by observing both intrinsic visual properties (pits, skins, rounded shapes and fruity colors) and extrinsic visual context (circular plates, bowls, and other food-related tools) (Oliva and Torralba, 2007).

We propose using similar information to predict the plausibility of arbitrary verb-noun pairs. That is, we aim to learn the distinguishing visual features of all nouns that are plausible arguments for a given verb. This differs from work that has aimed to recognize, annotate and retrieve objects defined by a single phrase, such as tree or wrist watch (Feng and Lapata, 2010a). These approaches learn from labeled images during training in order to assign words to unlabeled images during testing. In contrast, we analyze labeled images (during training and testing) in order to determine their visual compatibility with a given predicate. Our approach does not need labeled training images for a specific noun in order to assess that noun during testing; e.g. we can make a reasonable prediction for the plausibility of eat mamey even if we’ve never encountered mamey before.

We now specify how we automatically 1) download a set of images for each noun, 2) extract visual features from each image, and 3) combine the visual features from multiple images into plausibility scores. Scripts, code and data are available at: www.clsp.jhu.edu/∼sbergsma/ImageSP/.

2.1 Mining noun images from the web

To obtain a set of images for a particular noun argument, we submit the noun as a query to either the Flickr photo-sharing website (www.flickr.com), or Google’s image search (www.google.com/imghp). In both cases, we download the thumbnails on the results page directly rather than downloading the source images. Flickr returns images by matching the query against user-provided tags and accompanying text. Google retrieves images based on the image caption, file-name, and surrounding text (Feng and Lapata, 2010a). Images obtained from Google are known to be competitive with “hand prepared datasets” for training object recognizers (Fergus et al., 2005).

400
33
![Page 34: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/34.jpg)
Coda: the illustrated distributional hypothesis
Bruni, Uijlings and Baroni rejected
The meaning of a visually depicted concept is (can be approximated by, derived from) the set of contexts in which it occurs in images
34
![Page 35: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation](https://reader034.vdocument.in/reader034/viewer/2022052017/602fef97f1f168505343b379/html5/thumbnails/35.jpg)
The illustrated distributional hypothesis
Spearman correlation of image-based models with semantic relatedness intuitions for 20 concrete Pascal concepts

| Area | No segmentation | Manual segmentation | Automatic segmentation |
|---|---|---|---|
| Concept | NA | 39 | 36 |
| Surround | NA | 50 | 51 |
| Concept+Surround | 47 | 54 | 54 |

[Figure labels: concept, surround]
35