


What makes an image memorable?

Phillip Isola, Jianxiong Xiao, Antonio Torralba, Aude Oliva, MIT

Motivation

Intrinsic memorability?

What image content matters?

Prediction algorithm

[Figure: example photographs grouped by the algorithm's output: predicted memorable, predicted average, and predicted forgettable.]

Memory Game

[Figure: memory game timeline: a long stream of images separated by fixation crosses; vigilance repeats recur 1-7 images back, memory repeats recur 91-109 images back.]

665 participants on Amazon's Mechanical Turk.
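The poster does not spell out the full protocol, but as a rough illustration of how such a stream could be assembled (vigilance repeats 1-7 images back, memory repeats 91-109 images back), a sketch follows; the stream length, repeat probabilities, and function names are assumptions, not the authors' implementation.

```python
import random

def build_stream(images, length=300, p_vigilance=0.1, p_memory=0.1, seed=0):
    """Assemble a presentation stream with vigilance repeats (1-7 back)
    and memory repeats (91-109 back). Illustrative sketch only: the
    probabilities and stream length here are made-up parameters."""
    rng = random.Random(seed)
    stream, fresh = [], list(images)
    rng.shuffle(fresh)
    for t in range(length):
        r = rng.random()
        if r < p_memory and t >= 109:
            # Memory repeat: re-show an image seen 91-109 positions earlier.
            stream.append(stream[t - rng.randint(91, 109)])
        elif r < p_memory + p_vigilance and t >= 7:
            # Vigilance repeat: re-show an image seen 1-7 positions earlier.
            stream.append(stream[t - rng.randint(1, 7)])
        else:
            # Otherwise show a new image (fall back to a reshuffle if exhausted).
            stream.append(fresh.pop() if fresh else rng.choice(stream))
    return stream

# e.g. build_stream([f"img_{i}.jpg" for i in range(200)])
```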

[Figure: intrinsic memorability consistency. x-axis: image rank N, according to the specified group (up to ~1800); y-axis: average % memorability, according to Group 1, of 25 images centered about rank N (40% to 100%). Curves compare two independent groups of participants (Group 1, Group 2) against chance, with example images shown at memorable, average, and forgettable ranks. Rank correlation between groups: ρ = 0.75.]
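The ρ values quoted on the poster are rank correlations; assuming Spearman's ρ and a simple split-half design (two groups of participants, each image scored per group), the consistency measurement could be sketched as below. The response layout and group assignment are assumptions.

```python
import numpy as np
from scipy.stats import spearmanr

def split_half_consistency(hits, seed=0):
    """hits: dict mapping image id -> list of 0/1 repeat detections, one per
    participant who saw that image's memory repeat (assumed layout; assumes
    at least two responses per image). Splits the responses in half and
    returns Spearman's rho between the two groups' per-image scores."""
    rng = np.random.default_rng(seed)
    group1, group2 = [], []
    for responses in hits.values():
        responses = np.asarray(responses)
        perm = rng.permutation(len(responses))
        half = len(responses) // 2
        group1.append(responses[perm[:half]].mean())   # Group 1 memorability
        group2.append(responses[perm[half:]].mean())   # Group 2 memorability
    rho, _ = spearmanr(group1, group2)
    return rho
```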

Conclusions

People, close-ups, and human-scale objects predict memorability

Predictable from global features

Stable intrinsic memorability

What content makes an image memorable?

Object score = (prediction when object included in image’s feature vector) - (prediction when object removed)

Objects shaded according to object score (computed per image)

Objects ranked according to object score (averaged across images)

Scenes ranked according to their average memorability
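A minimal sketch of the object score defined above, assuming a trained memorability regressor and a feature vector in which a known slice of entries describes one object class: predict with the object's features included, predict again with them removed (zeroed here), and take the difference. The helper names are hypothetical, not the authors' code.

```python
import numpy as np

def object_score(features, object_slice, predict_memorability):
    """Object score = prediction with the object included minus the
    prediction with that object's features removed.
    features: 1-D numpy feature vector for one image (assumed layout where
    object_slice indexes the entries describing one object class).
    predict_memorability: a fitted regressor's predict function."""
    full = predict_memorability(features.reshape(1, -1))[0]
    without = features.copy()
    without[object_slice] = 0.0   # "remove" the object's contribution
    removed = predict_memorability(without.reshape(1, -1))[0]
    return full - removed
```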

What classes of information predict memorability?

Prediction algorithm: SVM regression with non-linear kernels (a sketch follows the list below).

1) Simple image stats? e.g. mean hue, brightness, number of objects. ρ = 0.16

2) Global image features? Pixel histograms, GIST, SIFT, HOG, SSIM. ρ = 0.46

3) Object segmentation statistics? Histograms of object segment counts, sizes, and rough positions. ρ = 0.20

4) Object and scene semantics? Number, size, and rough position of each object class; scene category (e.g. "Aquarium"). ρ = 0.50
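A minimal sketch of the prediction step named above (support vector regression with a non-linear kernel), using scikit-learn's SVR as a stand-in; the RBF kernel choice and hyperparameters are assumptions, not the poster's exact settings.

```python
from sklearn.svm import SVR
from scipy.stats import spearmanr

# X_*: one row of image features per photograph (e.g. the features listed above)
# y_*: measured memorability per photograph, from the memory game
def fit_and_evaluate(X_train, y_train, X_test, y_test):
    model = SVR(kernel="rbf", C=1.0, epsilon=0.05)   # SVM regression, non-linear (RBF) kernel
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    rho, _ = spearmanr(predictions, y_test)          # rank correlation, as reported on the poster
    return model, rho
```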

[Figure: average memorability (%) of the top N ranked images vs. predicted rank N (100 to 1100). Predictions from global features and annotations reach ρ = 0.54; other humans reach ρ = 0.75; chance shown for reference.]
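The curves above report, for each N, the average measured memorability of the N images a ranking places as most memorable. Assuming per-image arrays of predicted and measured memorability, that evaluation can be sketched as:

```python
import numpy as np

def top_n_curve(predicted, measured):
    """For each N, the average measured memorability (%) of the N images
    ranked most memorable by `predicted`. Both inputs are per-image arrays."""
    order = np.argsort(predicted)[::-1]          # best-ranked images first
    sorted_measured = np.asarray(measured)[order]
    cumulative_mean = np.cumsum(sorted_measured) / np.arange(1, len(sorted_measured) + 1)
    return 100.0 * cumulative_mean               # percent, as on the y-axis
```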


Database

2222 photographs from the SUN database (Xiao et al. 2010).

Memorability = probability of correctly detecting a repeat after a single view of an image in a long stream.
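Following that definition, a per-image score is simply the fraction of memory repeats that participants correctly detected; the response format below is an assumed layout.

```python
def memorability(responses):
    """responses: list of booleans, one per participant who was shown this
    image's memory repeat: True if they correctly flagged the repeat after
    a single prior view. Returns the image's memorability in percent."""
    return 100.0 * sum(responses) / len(responses)

# e.g. memorability([True, True, False, True]) -> 75.0
```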

Rich features necessary

Wide range of memorabilities and high inter-subject consistency

Standardized database

[Figure: prediction errors: examples of overpredicted and underpredicted images.]

[Color scale for the object-score shading, from -0.15 to +0.09, with 0 marked.]

[Figure: objects ranked by object score; labels include person sitting, building, mountain, person, floor, sky, tree, and seats.]

[Figure: scenes ranked by average memorability, e.g. bathroom (84%), art studio (81%), bakery shop (81%), bedroom (76%), campus (53%), natural lake (52%), broadleaf forest (52%), botanical garden (52%).]

Automatic predictions

[Figure: average memorability (%) of the top N ranked images vs. predicted rank N (up to ~900) for the combined predictor using all global features, ρ = 0.46.]

1) Pixel histograms: stacked RGB histograms. ρ = 0.22

2) GIST: filter bank with 8 orientations and 4 scales, averaged over a 4x4 grid; RBF kernel. ρ = 0.38

3) Dense SIFT: dense descriptors quantized into visual words and summarized in a spatial pyramid histogram. ρ = 0.41

4) SSIM: same construction as SIFT, but with SSIM descriptors. ρ = 0.43

5) HOG2x2: same again, but with HOG2x2 descriptors, each of which is a stack of 2x2 neighboring HOG descriptors. ρ = 0.43
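For context on items 3-5, here is a minimal sketch of the "quantize dense descriptors into visual words and pool them into a spatial pyramid histogram" construction; the pyramid depth, vocabulary size, and normalization are assumptions rather than the poster's exact settings.

```python
import numpy as np

def spatial_pyramid_histogram(words, positions, vocab_size=200, levels=3):
    """words: visual-word index per dense descriptor (after k-means quantization,
    values in [0, vocab_size)). positions: (N, 2) array of descriptor (x, y)
    locations normalized to [0, 1). Returns concatenated bag-of-words
    histograms over 1x1, 2x2, 4x4, ... grids."""
    words = np.asarray(words)
    positions = np.asarray(positions)
    parts = []
    for level in range(levels):                  # level 0 = whole image
        cells = 2 ** level
        cell_x = np.minimum((positions[:, 0] * cells).astype(int), cells - 1)
        cell_y = np.minimum((positions[:, 1] * cells).astype(int), cells - 1)
        for cy in range(cells):
            for cx in range(cells):
                mask = (cell_x == cx) & (cell_y == cy)
                parts.append(np.bincount(words[mask], minlength=vocab_size))
    feature = np.concatenate(parts).astype(float)
    return feature / max(feature.sum(), 1.0)     # normalize to sum to 1
```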