

Image Labeling on a Network: Using Social-Network Metadata for Image Classification
Julian McAuley, Jure Leskovec

Stanford

Abstract

We study the use of social-network metadata for image classification. Existing multimodal classification frameworks use metadata such as GPS, EXIF, tags, and user profiles. However, online photo-sharing networks like Flickr include several additional sources of metadata that can be harnessed for image classification. We build relational models for such types of metadata.

Building a graph of related images

- annotated with the same tag
- submitted to the same group
- taken from the same location
- posted by the same user

We form edges between images with common metadata. Edge features include the number of common tags, groups, collections, and galleries, as well as location and user profile information.
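As an illustration, this edge-construction step can be sketched as follows. The photo records, field names, and feature names here are hypothetical examples, not the poster's actual schema:

```python
from itertools import combinations

# Toy photos with Flickr-style metadata (hypothetical example data).
photos = {
    "p1": {"tags": {"beach", "sunset"}, "groups": {"landscapes"}, "user": "alice"},
    "p2": {"tags": {"beach", "surf"},   "groups": {"landscapes"}, "user": "bob"},
    "p3": {"tags": {"cat"},             "groups": {"pets"},       "user": "alice"},
}

def edge_features(a, b):
    """Features of the edge between two photos: counts of shared metadata."""
    return {
        "common_tags": len(a["tags"] & b["tags"]),
        "common_groups": len(a["groups"] & b["groups"]),
        "same_user": int(a["user"] == b["user"]),
    }

# Form an edge whenever two photos share any metadata at all.
edges = {}
for (i, a), (j, b) in combinations(photos.items(), 2):
    f = edge_features(a, b)
    if any(f.values()):
        edges[(i, j)] = f
```

Here `p1` and `p2` get an edge (one shared tag, one shared group), `p1` and `p3` get an edge (same user), and `p2` and `p3` share nothing, so no edge is formed.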

Model

We model image labels in terms of image features φ(x_i) and image relationships φ(x_i, x_j). We then label an entire dataset according to

\[
\hat{Y} = \operatorname*{argmax}_{Y \in \{-1,1\}^N} \; \sum_{i=1}^{N} y_i \underbrace{\langle \phi(x_i), \theta_{\mathrm{node}} \rangle}_{w_i} \;+\; \sum_{i=1}^{N} \sum_{j=1}^{N} \delta(y_i = y_j)\, \underbrace{\langle \phi(x_i, x_j), \theta_{\mathrm{edge}} \rangle}_{w_{ij}},
\]

which can be done efficiently using graph cuts so long as related images prefer to have similar labels. We learn the optimal model (θ_node; θ_edge) using Structured Learning approaches.
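The labeling objective can be sketched on a toy instance. Brute-force enumeration stands in for graph cuts below, and the node and edge scores are made-up numbers rather than learned weights:

```python
import itertools
import numpy as np

# Toy instance of the labeling objective. The node scores
# w_i = <phi(x_i), theta_node> and edge scores w_ij = <phi(x_i, x_j), theta_edge>
# are invented values here, not learned ones.
w = np.array([0.5, -0.2, 0.3])           # one node score per image
w_edge = {(0, 1): 0.4, (1, 2): 0.6}      # scores on edges between related images

def score(y):
    """sum_i y_i * w_i  +  sum_{(i,j)} [y_i == y_j] * w_ij."""
    s = float(np.dot(y, w))
    s += sum(wij for (i, j), wij in w_edge.items() if y[i] == y[j])
    return s

# Brute-force argmax over Y in {-1, 1}^N; graph cuts solve the same problem
# exactly (and in polynomial time) when all w_ij >= 0, i.e. when related
# images prefer to share a label.
best = max(itertools.product([-1, 1], repeat=len(w)), key=score)
```

On this toy instance the positive edge weights pull all three images to the same label, so `best` is `(1, 1, 1)`.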

Data

                     CLEF    PASCAL   MIR      NUS       ALL
Number of photos     4546    10189    14460    244762    268587
Number of users      2663    8698     5661     48870     58522
Number of tags       21192   27250    51040    422364    450003
Number of groups     10575   6951     21894    95358     98659
Number of comments   77837   16669    248803   9837732   10071439
Number of sets       6066    8070     15854    165039    182734
Number of galleries  1026    155      3728     100189    102116
Number of locations  1007    1222     2755     22106     23745
Number of labels     99      20       14       81        214

We augment four popular datasets using metadata from Flickr. Our data is available at i.stanford.edu/~julian/

Evaluation

We evaluate our method using published classification results from the four benchmark datasets we consider. For computational reasons our method is optimized to minimize the Balanced Error Rate

\[
\Delta(Y, Y_c) = \frac{1}{2}\left[ \underbrace{\frac{|Y^{\mathrm{pos}} \setminus Y_c^{\mathrm{pos}}|}{|Y_c^{\mathrm{neg}}|}}_{\text{false positive rate}} + \underbrace{\frac{|Y^{\mathrm{neg}} \setminus Y_c^{\mathrm{neg}}|}{|Y_c^{\mathrm{pos}}|}}_{\text{false negative rate}} \right],
\]

though we also report the Mean Average Precision for the sake of comparison.

In addition, we use our model to recommend tags and groups for each image.
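The Balanced Error Rate reduces to a few set operations; a minimal sketch, with hypothetical example sets:

```python
def balanced_error_rate(pred_pos, true_pos, items):
    """Delta(Y, Y_c): mean of the false positive and false negative rates.
    pred_pos / true_pos are the sets of items predicted / labeled positive."""
    pred_neg, true_neg = items - pred_pos, items - true_pos
    fpr = len(pred_pos - true_pos) / len(true_neg)  # FP / #true negatives
    fnr = len(pred_neg - true_neg) / len(true_pos)  # FN / #true positives
    return 0.5 * (fpr + fnr)

# Hypothetical example: 10 items, 4 truly positive, 3 predicted positive.
items = set(range(10))
true_pos = {0, 1, 2, 3}
pred_pos = {0, 1, 4}
ber = balanced_error_rate(pred_pos, true_pos, items)
```

With one false positive out of six true negatives and two false negatives out of four true positives, this gives (1/6 + 1/2) / 2 = 1/3.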

Image labeling results

[Bar charts: label prediction performance. MAP: .37 (CLEF), .55 (PASCAL), .60 (MIR), .49 (NUS); 1−Δ: .64 (CLEF), .74 (PASCAL), .83 (MIR), .61 (NUS). Methods compared: best text-only methods (CLEF, from [4]); best visual-only methods (CLEF, PASCAL, from [2,4]); low-level features, SVM (MIR, from [3]); low-level features and tags, SVM (MIR, from [3]); low-level image features; tag-only 'flat' model; all-features flat model; graphical model with social metadata.]

Tag and group prediction

[Bar charts: tag recommendation — 1−Δ: .69 (CLEF), .66 (PASCAL), .69 (MIR); MAP: .18 (CLEF), .15 (PASCAL), .20 (MIR). Group recommendation — 1−Δ: .69 (CLEF), .74 (PASCAL), .71 (MIR); MAP: .22 (CLEF), .24 (PASCAL), .22 (MIR). Methods compared: low-level image features; groups and words / tags and words; graphical model with social metadata.]

Which social network features are useful?

[Bar charts: learned edge-feature weights — taken by friends; taken by the same person; taken in the same location; number of common galleries; number of common collections; number of common groups; number of common tags — shown for CLEF, PASCAL, MIR, and NUS labels, and for MIR tags and groups.]

We confirm existing findings that tag and GPS data are useful for classification, while also finding that other sources of metadata are informative.

Do images with similar metadata have similar labels?

[Scatter plots: number of shared labels as a function of shared tags, shared groups, shared collections, shared galleries, shared location, and shared user, for PASCAL, MIR, NUS, and their average.]

For all forms of metadata, we find that images with similar metadata tend to have similar labels.
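This trend can be quantified with a simple correlation check; a dependency-free sketch, where the per-pair counts are invented for illustration rather than taken from the poster's data:

```python
# Toy check of the reported trend: image pairs that share more tags
# also tend to share more labels (hypothetical example counts).
shared_tags   = [0, 1, 2, 3, 5, 8]   # per image pair
shared_labels = [0, 1, 1, 2, 3, 4]

def pearson(xs, ys):
    """Pearson correlation, written out to keep the sketch dependency-free."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson(shared_tags, shared_labels)  # strongly positive on this toy data
```

A strongly positive correlation like this is what the scatter plots above show qualitatively for every metadata type.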

References

[1] J. J. McAuley and J. Leskovec. Image labeling on a network: Using social-network metadata for image classification. In ECCV, 2012.

[2] M. Everingham, L. Van Gool, C. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes (VOC) challenge. IJCV, 2010.

[3] M. Huiskes and M. Lew. The MIR Flickr retrieval evaluation. In CIVR, 2008.

[4] S. Nowak and M. Huiskes. New strategies for image annotation: Overview of the photo annotation task at ImageCLEF 2010. In CLEF, 2010.

[5] T.-S. Chua, J. Tang, R. Hong, H. Li, Z. Luo, and Y.-T. Zheng. NUS-WIDE: A real-world web image database from the National University of Singapore. In CIVR, 2009.

[email protected], [email protected]