


What makes an image memorable?

Phillip Isola, Jianxiong Xiao, Antonio Torralba, Aude Oliva, MIT

Motivation

Intrinsic memorability?

What image content matters?

Prediction algorithm

[Figure: example photographs grouped by the algorithm's output: predicted memorable, predicted average, and predicted forgettable.]

Memory Game

[Figure: memory game timeline: a long stream of images separated by fixation crosses; vigilance repeats recur 1-7 images back, memory repeats recur 91-109 images back.]

665 participants on Amazon's Mechanical Turk.
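The poster does not spell out the full protocol, but as a rough illustration of how such a stream could be assembled (vigilance repeats 1-7 images back, memory repeats 91-109 images back), a sketch follows; the stream length, repeat probabilities, and function names are assumptions, not the authors' implementation.

```python
import random

def build_stream(images, length=300, p_vigilance=0.1, p_memory=0.1, seed=0):
    """Assemble a presentation stream with vigilance repeats (1-7 back)
    and memory repeats (91-109 back). Illustrative sketch only: the
    probabilities and stream length here are made-up parameters."""
    rng = random.Random(seed)
    stream, fresh = [], list(images)
    rng.shuffle(fresh)
    for t in range(length):
        r = rng.random()
        if r < p_memory and t >= 109:
            # Memory repeat: re-show an image seen 91-109 positions earlier.
            stream.append(stream[t - rng.randint(91, 109)])
        elif r < p_memory + p_vigilance and t >= 7:
            # Vigilance repeat: re-show an image seen 1-7 positions earlier.
            stream.append(stream[t - rng.randint(1, 7)])
        else:
            # Otherwise show a new image (fall back to a reshuffle if exhausted).
            stream.append(fresh.pop() if fresh else rng.choice(stream))
    return stream

# e.g. build_stream([f"img_{i}.jpg" for i in range(200)])
```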

[Figure: intrinsic memorability consistency. x-axis: image rank N, according to the specified group (up to ~1800); y-axis: average % memorability, according to Group 1, of 25 images centered about rank N (40% to 100%). Curves compare two independent groups of participants (Group 1, Group 2) against chance, with example images shown at memorable, average, and forgettable ranks. Rank correlation between groups: ρ = 0.75.]
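The ρ values quoted on the poster are rank correlations; assuming Spearman's ρ and a simple split-half design (two groups of participants, each image scored per group), the consistency measurement could be sketched as below. The response layout and group assignment are assumptions.

```python
import numpy as np
from scipy.stats import spearmanr

def split_half_consistency(hits, seed=0):
    """hits: dict mapping image id -> list of 0/1 repeat detections, one per
    participant who saw that image's memory repeat (assumed layout; assumes
    at least two responses per image). Splits the responses in half and
    returns Spearman's rho between the two groups' per-image scores."""
    rng = np.random.default_rng(seed)
    group1, group2 = [], []
    for responses in hits.values():
        responses = np.asarray(responses)
        perm = rng.permutation(len(responses))
        half = len(responses) // 2
        group1.append(responses[perm[:half]].mean())   # Group 1 memorability
        group2.append(responses[perm[half:]].mean())   # Group 2 memorability
    rho, _ = spearmanr(group1, group2)
    return rho
```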

Conclusions

People, close-ups, and human-scale objects predict memorability

Predictable from global features

Stable intrinsic memorability

What content makes an image memorable?

Object score = (prediction when object included in image’s feature vector) - (prediction when object removed)

Objects shaded according to object score (computed per image)

Objects ranked according to object score (averaged across images)

Scenes ranked according to their average memorability
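A minimal sketch of the object score defined above, assuming a trained memorability regressor and a feature vector in which a known slice of entries describes one object class: predict with the object's features included, predict again with them removed (zeroed here), and take the difference. The helper names are hypothetical, not the authors' code.

```python
import numpy as np

def object_score(features, object_slice, predict_memorability):
    """Object score = prediction with the object included minus the
    prediction with that object's features removed.
    features: 1-D numpy feature vector for one image (assumed layout where
    object_slice indexes the entries describing one object class).
    predict_memorability: a fitted regressor's predict function."""
    full = predict_memorability(features.reshape(1, -1))[0]
    without = features.copy()
    without[object_slice] = 0.0   # "remove" the object's contribution
    removed = predict_memorability(without.reshape(1, -1))[0]
    return full - removed
```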

What classes of information predict memorability?

Prediction algorithm: SVM regression with non-linear kernels (a sketch follows the list below).

1) Simple image stats? e.g. mean hue, brightness, number of objects. ρ = 0.16

2) Global image features? Pixel histograms, GIST, SIFT, HOG, SSIM. ρ = 0.46

3) Object segmentation statistics? Histograms of object segment counts, sizes, and rough positions. ρ = 0.20

4) Object and scene semantics? Number, size, and rough position of each object class; scene category (e.g. "Aquarium"). ρ = 0.50
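A minimal sketch of the prediction step named above (support vector regression with a non-linear kernel), using scikit-learn's SVR as a stand-in; the RBF kernel choice and hyperparameters are assumptions, not the poster's exact settings.

```python
from sklearn.svm import SVR
from scipy.stats import spearmanr

# X_*: one row of image features per photograph (e.g. the features listed above)
# y_*: measured memorability per photograph, from the memory game
def fit_and_evaluate(X_train, y_train, X_test, y_test):
    model = SVR(kernel="rbf", C=1.0, epsilon=0.05)   # SVM regression, non-linear (RBF) kernel
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    rho, _ = spearmanr(predictions, y_test)          # rank correlation, as reported on the poster
    return model, rho
```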

[Figure: average memorability (%) of the top N ranked images vs. predicted rank N (100 to 1100). Predictions from global features and annotations reach ρ = 0.54; other humans reach ρ = 0.75; chance shown for reference.]
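The curves above report, for each N, the average measured memorability of the N images a ranking places as most memorable. Assuming per-image arrays of predicted and measured memorability, that evaluation can be sketched as:

```python
import numpy as np

def top_n_curve(predicted, measured):
    """For each N, the average measured memorability (%) of the N images
    ranked most memorable by `predicted`. Both inputs are per-image arrays."""
    order = np.argsort(predicted)[::-1]          # best-ranked images first
    sorted_measured = np.asarray(measured)[order]
    cumulative_mean = np.cumsum(sorted_measured) / np.arange(1, len(sorted_measured) + 1)
    return 100.0 * cumulative_mean               # percent, as on the y-axis
```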


Database

2222 photographs from the SUN database (Xiao et al. 2010).

Memorability = probability of correctly detecting a repeat after a single view of an image in a long stream.
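Following that definition, a per-image score is simply the fraction of memory repeats that participants correctly detected; the response format below is an assumed layout.

```python
def memorability(responses):
    """responses: list of booleans, one per participant who was shown this
    image's memory repeat: True if they correctly flagged the repeat after
    a single prior view. Returns the image's memorability in percent."""
    return 100.0 * sum(responses) / len(responses)

# e.g. memorability([True, True, False, True]) -> 75.0
```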

Rich features necessary

Wide range of memorabilities and high inter-subject consistency

Standardized database

[Figure: prediction errors: examples of overpredicted and underpredicted images.]

[Color scale for the object-score shading, from -0.15 to +0.09, with 0 marked.]

[Figure: objects ranked by object score; labels include person sitting, building, mountain, person, floor, sky, tree, and seats.]

[Figure: scenes ranked by average memorability, e.g. bathroom (84%), art studio (81%), bakery shop (81%), bedroom (76%), campus (53%), natural lake (52%), broadleaf forest (52%), botanical garden (52%).]

Automatic predictions

[Figure: average memorability (%) of the top N ranked images vs. predicted rank N (up to ~900) for the combined predictor using all global features, ρ = 0.46.]

1) Pixel histograms: stacked RGB histograms. ρ = 0.22

2) GIST: filter bank with 8 orientations and 4 scales, averaged over a 4x4 grid; RBF kernel. ρ = 0.38

3) Dense SIFT: dense descriptors quantized into visual words and summarized in a spatial pyramid histogram. ρ = 0.41

4) SSIM: same construction as SIFT, but with SSIM descriptors. ρ = 0.43

5) HOG2x2: same again, but with HOG2x2 descriptors, each of which is a stack of 2x2 neighboring HOG descriptors. ρ = 0.43
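For context on items 3-5, here is a minimal sketch of the "quantize dense descriptors into visual words and pool them into a spatial pyramid histogram" construction; the pyramid depth, vocabulary size, and normalization are assumptions rather than the poster's exact settings.

```python
import numpy as np

def spatial_pyramid_histogram(words, positions, vocab_size=200, levels=3):
    """words: visual-word index per dense descriptor (after k-means quantization,
    values in [0, vocab_size)). positions: (N, 2) array of descriptor (x, y)
    locations normalized to [0, 1). Returns concatenated bag-of-words
    histograms over 1x1, 2x2, 4x4, ... grids."""
    words = np.asarray(words)
    positions = np.asarray(positions)
    parts = []
    for level in range(levels):                  # level 0 = whole image
        cells = 2 ** level
        cell_x = np.minimum((positions[:, 0] * cells).astype(int), cells - 1)
        cell_y = np.minimum((positions[:, 1] * cells).astype(int), cells - 1)
        for cy in range(cells):
            for cx in range(cells):
                mask = (cell_x == cx) & (cell_y == cy)
                parts.append(np.bincount(words[mask], minlength=vocab_size))
    feature = np.concatenate(parts).astype(float)
    return feature / max(feature.sum(), 1.0)     # normalize to sum to 1
```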