Image Labeling on a Network: Using Social-Network Metadata for Image Classification
Julian McAuley, Jure Leskovec
Stanford
Abstract
We study the use of social-network metadata for image classification. Existing multimodal classification frameworks use metadata such as GPS, EXIF, tags, and user profiles. However, online photo-sharing networks like Flickr include several additional sources of metadata that can be harnessed for image classification. We build relational models for such types of metadata.
Building a graph of related images
annotated with the same tag
submitted to the same group
taken from the same location
posted by the same user
We form edges between images with common metadata. Edge features include the number of common tags, groups, collections, and galleries, as well as location and user profile information.
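As a sketch of how such a graph might be assembled (the field names and toy records here are illustrative assumptions, not the authors' implementation):

```python
from itertools import combinations

# Toy metadata records; in practice these come from Flickr metadata.
photos = {
    "a": {"tags": {"beach", "sunset"}, "groups": {"landscapes"}, "user": "u1"},
    "b": {"tags": {"beach", "surf"},   "groups": {"landscapes"}, "user": "u2"},
    "c": {"tags": {"city"},            "groups": {"urban"},      "user": "u1"},
}

def edge_features(p, q):
    """Feature vector for an image pair: counts of shared metadata."""
    return {
        "common_tags":   len(p["tags"] & q["tags"]),
        "common_groups": len(p["groups"] & q["groups"]),
        "same_user":     int(p["user"] == q["user"]),
    }

# Keep an edge only if the pair shares at least one piece of metadata.
edges = {}
for i, j in combinations(sorted(photos), 2):
    f = edge_features(photos[i], photos[j])
    if any(f.values()):
        edges[(i, j)] = f

for pair, f in sorted(edges.items()):
    print(pair, f)
```

Pairs with no shared metadata are dropped, keeping the graph sparse.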
Model
We model image labels in terms of image features φ(x_i) and image relationships φ(x_i, x_j). We then label an entire dataset according to
\[
\operatorname*{argmax}_{Y \in \{-1,1\}^N} \;
\sum_{i=1}^{N} y_i \, \underbrace{\langle \phi(x_i), \theta_{\text{node}} \rangle}_{w_i}
\; + \;
\sum_{i=1}^{N} \sum_{j=1}^{N} \delta(y_i = y_j) \, \underbrace{\langle \phi(x_i, x_j), \theta_{\text{edge}} \rangle}_{w_{ij}},
\]
which can be done efficiently using graph cuts so long as related images prefer to have similar labels. We learn the optimal model $(\theta_{\text{node}}, \theta_{\text{edge}})$ using structured learning approaches.
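For intuition, the objective can be evaluated directly; the brute-force maximization below is feasible only for toy N (graph cuts solve it efficiently at scale), and all weights are made-up illustrative values:

```python
from itertools import product

# Toy per-image scores w_i = <phi(x_i), theta_node> and pairwise
# scores w_ij = <phi(x_i, x_j), theta_edge>; illustrative values only.
w_node = [0.5, -0.2, 0.3]              # w_i
w_edge = {(0, 1): 1.0, (1, 2): 0.8}    # w_ij for related image pairs

def score(Y):
    """Objective: sum_i y_i * w_i + sum_ij delta(y_i == y_j) * w_ij."""
    s = sum(y * w for y, w in zip(Y, w_node))
    s += sum(w for (i, j), w in w_edge.items() if Y[i] == Y[j])
    return s

# Exhaustive search over {-1, +1}^N; graph cuts find this maximum
# efficiently when all w_ij >= 0 (related images prefer equal labels).
best = max(product([-1, 1], repeat=len(w_node)), key=score)
print(best, score(best))  # (1, 1, 1) 2.4
```

With non-negative pairwise weights the objective is supermodular, which is exactly the condition under which graph cuts recover the global optimum.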
Data
                      CLEF   PASCAL     MIR      NUS       ALL
Number of photos      4546    10189   14460   244762    268587
Number of users       2663     8698    5661    48870     58522
Number of tags       21192    27250   51040   422364    450003
Number of groups     10575     6951   21894    95358     98659
Number of comments   77837    16669  248803  9837732  10071439
Number of sets        6066     8070   15854   165039    182734
Number of galleries   1026      155    3728   100189    102116
Number of locations   1007     1222    2755    22106     23745
Number of labels        99       20      14       81       214
We augment four popular datasets using metadata from Flickr. Our data is available at i.stanford.edu/~julian/
Evaluation
We evaluate our method using published classification results from the four benchmark datasets we consider. For computational reasons our method is optimized to minimize the Balanced Error Rate
\[
\Delta(Y, Y_c) = \frac{1}{2} \Bigg[
\underbrace{\frac{|Y^{\text{pos}} \setminus Y_c^{\text{pos}}|}{|Y_c^{\text{pos}}|}}_{\text{false positive rate}}
+
\underbrace{\frac{|Y^{\text{neg}} \setminus Y_c^{\text{neg}}|}{|Y_c^{\text{neg}}|}}_{\text{false negative rate}}
\Bigg],
\]
though we also report the Mean Average Precision for the sake of comparison.
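A minimal sketch of the Balanced Error Rate as defined above, with $Y_c$ the correct labeling (the example labels are made up):

```python
def balanced_error_rate(y_pred, y_true):
    """Delta(Y, Y_c): half the sum of the fraction of spurious positives
    and the fraction of spurious negatives, per the definition above."""
    pred_pos = {i for i, y in enumerate(y_pred) if y == 1}
    pred_neg = {i for i, y in enumerate(y_pred) if y != 1}
    true_pos = {i for i, y in enumerate(y_true) if y == 1}
    true_neg = {i for i, y in enumerate(y_true) if y != 1}
    return 0.5 * (len(pred_pos - true_pos) / len(true_pos)
                  + len(pred_neg - true_neg) / len(true_neg))

# Example: 4 items, one false positive and one false negative.
y_true = [1, 1, -1, -1]
y_pred = [1, -1, 1, -1]
print(balanced_error_rate(y_pred, y_true))  # 0.5 * (1/2 + 1/2) = 0.5
```

Unlike plain accuracy, this loss weights the positive and negative classes equally, which matters when labels are rare.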
In addition, we use our model to recommend tags and groups for each image.
Image labeling results
[Figure: label prediction performance, reported as MAP and 1−∆ on CLEF, PASCAL, MIR, and NUS. Methods compared: best text-only methods (CLEF, from [4]); best visual-only methods (CLEF, PASCAL, from [2,4]); low-level features with an SVM (MIR, from [3]); low-level features and tags with an SVM (MIR, from [3]); low-level image features; a tag-only 'flat' model; an all-features flat model; and our graphical model with social metadata. Highest bars: MAP of .37 (CLEF), .55 (PASCAL), .60 (MIR), .49 (NUS); 1−∆ of .64 (CLEF), .74 (PASCAL), .83 (MIR), .61 (NUS).]
Tag and group prediction
[Figure: tag recommendation and group recommendation performance, reported as 1−∆ and MAP on CLEF, PASCAL, and MIR. Tag recommendation compares low-level image features, groups and words, and our graphical model with social metadata; highest bars: 1−∆ of .69 (CLEF), .66 (PASCAL), .69 (MIR) and MAP of .18, .15, .20. Group recommendation compares low-level image features, tags and words, and our graphical model with social metadata; highest bars: 1−∆ of .69 (CLEF), .74 (PASCAL), .71 (MIR) and MAP of .22, .24, .22.]
Which social network features are useful?
[Figure: learned edge weights for each edge feature — taken by friends; taken by the same person; taken in the same location; number of common galleries; number of common collections; number of common groups; number of common tags — across six tasks: CLEF, PASCAL, MIR, and NUS labels, plus MIR tags and MIR groups.]
We confirm existing findings that tag and GPS data are useful for classification, while also finding that other sources of metadata are informative.
Do images with similar metadata have similar labels?
[Figure: for PASCAL, MIR, and NUS (plus their average), the number of shared labels between image pairs, plotted against shared tags, shared groups, shared collections, shared galleries, shared location, and shared user — one panel per metadata type, per dataset.]
For all forms of metadata, we find that images with similar metadata tend to have similar labels.
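A sketch of how such curves could be computed (the records below are toy data; the real curves aggregate over all image pairs in each dataset):

```python
from collections import defaultdict
from itertools import combinations

# Toy records: each image has a set of tags and a set of ground-truth labels.
images = [
    {"tags": {"dog", "park"},  "labels": {"animal", "outdoor"}},
    {"tags": {"dog", "ball"},  "labels": {"animal"}},
    {"tags": {"sea", "beach"}, "labels": {"outdoor", "water"}},
    {"tags": {"sea"},          "labels": {"water"}},
]

# Bucket image pairs by number of shared tags, then average the
# number of shared labels within each bucket.
buckets = defaultdict(list)
for a, b in combinations(images, 2):
    shared_tags = len(a["tags"] & b["tags"])
    shared_labels = len(a["labels"] & b["labels"])
    buckets[shared_tags].append(shared_labels)

curve = {k: sum(v) / len(v) for k, v in sorted(buckets.items())}
print(curve)  # {0: 0.25, 1: 1.0}
```

On this toy data, as in the figure, pairs that share more metadata share more labels on average.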
References
[1] J. J. McAuley and J. Leskovec. Image labeling on a network: Using social-network metadata for image classification. In ECCV, 2012.
[2] M. Everingham, L. Van Gool, C. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes (VOC) challenge. IJCV, 2010.
[3] M. Huiskes and M. Lew. The MIR Flickr retrieval evaluation. In CIVR, 2008.
[4] S. Nowak and M. Huiskes. New strategies for image annotation: Overview of the photo annotation task at ImageCLEF 2010. In CLEF, 2010.
[5] T.-S. Chua, J. Tang, R. Hong, H. Li, Z. Luo, and Y.-T. Zheng. NUS-WIDE: A real-world web image database from the National University of Singapore. In CIVR, 2009.