

Image Labeling on a Network: Using Social-Network Metadata for Image Classification
Julian McAuley, Jure Leskovec

Stanford

Abstract

We study the use of social-network metadata for image classification. Existing multimodal classification frameworks use metadata such as GPS, EXIF, tags, and user profiles. However, online photo-sharing networks like Flickr include several additional sources of metadata that can be harnessed for image classification. We build relational models for such types of metadata.

Building a graph of related images

- annotated with the same tag
- submitted to the same group
- taken from the same location
- posted by the same user

We form edges between images with common metadata. Edge features include the number of common tags, groups, collections, and galleries, as well as location and user profile information.
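As an illustration, this edge-construction step can be sketched as follows. The photo records, field names, and feature names here are hypothetical examples, not the poster's actual schema:

```python
from itertools import combinations

# Toy photos with Flickr-style metadata (hypothetical example data).
photos = {
    "p1": {"tags": {"beach", "sunset"}, "groups": {"landscapes"}, "user": "alice"},
    "p2": {"tags": {"beach", "surf"},   "groups": {"landscapes"}, "user": "bob"},
    "p3": {"tags": {"cat"},             "groups": {"pets"},       "user": "alice"},
}

def edge_features(a, b):
    """Features of the edge between two photos: counts of shared metadata."""
    return {
        "common_tags": len(a["tags"] & b["tags"]),
        "common_groups": len(a["groups"] & b["groups"]),
        "same_user": int(a["user"] == b["user"]),
    }

# Form an edge whenever two photos share any metadata at all.
edges = {}
for (i, a), (j, b) in combinations(photos.items(), 2):
    f = edge_features(a, b)
    if any(f.values()):
        edges[(i, j)] = f
```

Here `p1` and `p2` get an edge (one shared tag, one shared group), `p1` and `p3` get an edge (same user), and `p2` and `p3` share nothing, so no edge is formed.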

Model

We model image labels in terms of image features φ(x_i) and image relationships φ(x_i, x_j). We then label an entire dataset according to

\[
\hat{Y} = \operatorname*{argmax}_{Y \in \{-1,1\}^N} \; \sum_{i=1}^{N} y_i \underbrace{\langle \phi(x_i), \theta_{\mathrm{node}} \rangle}_{w_i} \;+\; \sum_{i=1}^{N} \sum_{j=1}^{N} \delta(y_i = y_j)\, \underbrace{\langle \phi(x_i, x_j), \theta_{\mathrm{edge}} \rangle}_{w_{ij}},
\]

which can be done efficiently using graph cuts so long as related images prefer to have similar labels. We learn the optimal model (θ_node; θ_edge) using Structured Learning approaches.
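The labeling objective can be sketched on a toy instance. Brute-force enumeration stands in for graph cuts below, and the node and edge scores are made-up numbers rather than learned weights:

```python
import itertools
import numpy as np

# Toy instance of the labeling objective. The node scores
# w_i = <phi(x_i), theta_node> and edge scores w_ij = <phi(x_i, x_j), theta_edge>
# are invented values here, not learned ones.
w = np.array([0.5, -0.2, 0.3])           # one node score per image
w_edge = {(0, 1): 0.4, (1, 2): 0.6}      # scores on edges between related images

def score(y):
    """sum_i y_i * w_i  +  sum_{(i,j)} [y_i == y_j] * w_ij."""
    s = float(np.dot(y, w))
    s += sum(wij for (i, j), wij in w_edge.items() if y[i] == y[j])
    return s

# Brute-force argmax over Y in {-1, 1}^N; graph cuts solve the same problem
# exactly (and in polynomial time) when all w_ij >= 0, i.e. when related
# images prefer to share a label.
best = max(itertools.product([-1, 1], repeat=len(w)), key=score)
```

On this toy instance the positive edge weights pull all three images to the same label, so `best` is `(1, 1, 1)`.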

Data

                     CLEF    PASCAL   MIR      NUS       ALL
Number of photos     4546    10189    14460    244762    268587
Number of users      2663    8698     5661     48870     58522
Number of tags       21192   27250    51040    422364    450003
Number of groups     10575   6951     21894    95358     98659
Number of comments   77837   16669    248803   9837732   10071439
Number of sets       6066    8070     15854    165039    182734
Number of galleries  1026    155      3728     100189    102116
Number of locations  1007    1222     2755     22106     23745
Number of labels     99      20       14       81        214

We augment four popular datasets using metadata from Flickr. Our data is available at i.stanford.edu/~julian/

Evaluation

We evaluate our method using published classification results from the four benchmark datasets we consider. For computational reasons our method is optimized to minimize the Balanced Error Rate

\[
\Delta(Y, Y_c) = \frac{1}{2}\left[ \underbrace{\frac{|Y^{\mathrm{pos}} \setminus Y_c^{\mathrm{pos}}|}{|Y_c^{\mathrm{neg}}|}}_{\text{false positive rate}} + \underbrace{\frac{|Y^{\mathrm{neg}} \setminus Y_c^{\mathrm{neg}}|}{|Y_c^{\mathrm{pos}}|}}_{\text{false negative rate}} \right],
\]

though we also report the Mean Average Precision for the sake of comparison.

In addition, we use our model to recommend tags and groups for each image.
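The Balanced Error Rate reduces to a few set operations; a minimal sketch, with hypothetical example sets:

```python
def balanced_error_rate(pred_pos, true_pos, items):
    """Delta(Y, Y_c): mean of the false positive and false negative rates.
    pred_pos / true_pos are the sets of items predicted / labeled positive."""
    pred_neg, true_neg = items - pred_pos, items - true_pos
    fpr = len(pred_pos - true_pos) / len(true_neg)  # FP / #true negatives
    fnr = len(pred_neg - true_neg) / len(true_pos)  # FN / #true positives
    return 0.5 * (fpr + fnr)

# Hypothetical example: 10 items, 4 truly positive, 3 predicted positive.
items = set(range(10))
true_pos = {0, 1, 2, 3}
pred_pos = {0, 1, 4}
ber = balanced_error_rate(pred_pos, true_pos, items)
```

With one false positive out of six true negatives and two false negatives out of four true positives, this gives (1/6 + 1/2) / 2 = 1/3.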

Image labeling results

[Bar charts: label prediction performance. MAP: .37 (CLEF), .55 (PASCAL), .60 (MIR), .49 (NUS); 1−Δ: .64 (CLEF), .74 (PASCAL), .83 (MIR), .61 (NUS). Methods compared: best text-only methods (CLEF, from [4]); best visual-only methods (CLEF, PASCAL, from [2,4]); low-level features, SVM (MIR, from [3]); low-level features and tags, SVM (MIR, from [3]); low-level image features; tag-only 'flat' model; all-features flat model; graphical model with social metadata.]

Tag and group prediction

[Bar charts: tag recommendation — 1−Δ: .69 (CLEF), .66 (PASCAL), .69 (MIR); MAP: .18 (CLEF), .15 (PASCAL), .20 (MIR). Group recommendation — 1−Δ: .69 (CLEF), .74 (PASCAL), .71 (MIR); MAP: .22 (CLEF), .24 (PASCAL), .22 (MIR). Methods compared: low-level image features; groups and words / tags and words; graphical model with social metadata.]

Which social network features are useful?

[Bar charts: learned edge-feature weights — taken by friends; taken by the same person; taken in the same location; number of common galleries; number of common collections; number of common groups; number of common tags — shown for CLEF, PASCAL, MIR, and NUS labels, and for MIR tags and groups.]

We confirm existing findings that tag and GPS data are useful for classification, while also finding that other sources of metadata are informative.

Do images with similar metadata have similar labels?

[Scatter plots: number of shared labels as a function of shared tags, shared groups, shared collections, shared galleries, shared location, and shared user, for PASCAL, MIR, NUS, and their average.]

For all forms of metadata, we find that images with similar metadata tend to have similar labels.
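This trend can be quantified with a simple correlation check; a dependency-free sketch, where the per-pair counts are invented for illustration rather than taken from the poster's data:

```python
# Toy check of the reported trend: image pairs that share more tags
# also tend to share more labels (hypothetical example counts).
shared_tags   = [0, 1, 2, 3, 5, 8]   # per image pair
shared_labels = [0, 1, 1, 2, 3, 4]

def pearson(xs, ys):
    """Pearson correlation, written out to keep the sketch dependency-free."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson(shared_tags, shared_labels)  # strongly positive on this toy data
```

A strongly positive correlation like this is what the scatter plots above show qualitatively for every metadata type.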

References

[1] J. J. McAuley and J. Leskovec. Image labeling on a network: Using social-network metadata for image classification. In ECCV, 2012.

[2] M. Everingham, L. Van Gool, C. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes (VOC) challenge. IJCV, 2010.

[3] M. Huiskes and M. Lew. The MIR Flickr retrieval evaluation. In CIVR, 2008.

[4] S. Nowak and M. Huiskes. New strategies for image annotation: Overview of the photo annotation task at ImageCLEF 2010. In CLEF, 2010.

[5] T.-S. Chua, J. Tang, R. Hong, H. Li, Z. Luo, and Y.-T. Zheng. NUS-WIDE: A real-world web image database from the National University of Singapore. In CIVR, 2009.

[email protected], [email protected]