phillip isola, jianxiong xiao, antonio torralba, aude...
TRANSCRIPT
Phillip Isola, Jianxiong Xiao, Antonio Torralba, Aude Oliva, MIT
What makes an image memorable?
Motivation
Intrinsic memorability?
What image content matters?
Prediction algorithm
Predicted memorable
Predicted average
Predicted forgettable
Memory Game
...Vigilance repeat
Memory repeat
100
1-7 back
91-109 back time
+ + + + +
665 participants on Amazon’s Mechanical Turk.
200 1000 1800
40%
50%
60%
70%
80%
90%
100%
Image rank N, according to specified group
Average % memorability, according to
Group 1, of 25 images
centered about rank N
Group 1
Group 2
Chance
= 0.75Memorable
Average
Forgettable
Conclusions
People, close-ups, and human-scale objects predict memorability
Predictable from global features
Stable intrinsic memorability
What content makes an image memorable?
Object score = (prediction when object included in image’s feature vector) - (prediction when object removed)
Objects shaded according to object score (computed per
image)
Objects ranked according
to object score (averaged
across images)
Scenes ranked according to their average
memorability
Prediction algorithm: SVM Regression with non-linear kernels
3) Object segmentation statistics?
histograms of object segments counts, sizes, and rough positions
= 0.20
GIST
SIFT HOG SSIM
Pixels2) Global image features?
pixel histograms, GIST, SIFT, HOG, SSIM = 0.46
“Aquarium”
4) Object and scene semantics?
number, size, and rough position of each object class; scene category
= 0.50
e.g. mean hue, brightness, number of objects
1) Simple image stats? = 0.16
Predicted rank N
Average memorability
for top N ranked
images (%)
Global features and annotationsOther humans
Chance
100 300 500 700 900 1100
70
75
80
85 = 0.75
= 0.54
What classes of information predict memorability?
Database: 2222 photographs from SUN database (Xiao et al. 2010).
Memorability = probability of correctly detecting a repeat after a single view of an image in a long stream.
Rich features necessary
Wide range of memorabilities and high inter-subject consistency
Errors -- Overpredicted
Errors -- Underpredicted
Database
Standardized database
! 0.15 + 0.090
...
person sittingbuildingmountain personfloorskytree seats
natural lake (52%)
broadleaf forest (52%)
art studio (81%)
campus (53%)
bedroom (76%)
bakery shop (81%)
botanical garden (52%)
bathroom (84%)
...
75
80
85
70
200 400 600 800 9000
All Global Features
Average memorability
for top N ranked
images (%)
Predicted rank N
= 0.46
Automatic predictions
2) GISTFilter bank with 8 orientations, 4 scales, averaged over a 4x4 grid. RBF kernel.
= 0.38
1) Pixel histograms = 0.22
3) Dense SIFTDense descriptors quantized into visual words and summarized in spatial pyramid histogram.
= 0.41
4) SSIMSame construction as SIFT, but with SSIM descriptors.
= 0.43
5) HOG2x2 = 0.43
Stacked RGB histograms.
Same again, but with HOG2x2 descriptors, each of whcih is a stack of 2x2 neighboring HOG descriptors.