
Page 1: 6.819 / 6.869: Advances in Computer Vision6.869.csail.mit.edu/fa15/lecture/6.869-ImageRetrieval.pdf · • Search must be both fast, accurate and scalable to large data set • Fast

6.819 / 6.869: Advances in Computer Vision

Image Retrieval

Retrieval: Information, images, objects, large-scale

Website: http://6.869.csail.mit.edu/fa15/

Instructor: Yusuf Aytar

Lecture TR 9:30AM – 11:00AM (Room 34-101)

Page 2

What is Image Retrieval?

[Figure: a user submits a query as text (e.g. "Fall in Boston"), an image, or speech, and receives retrieval results.]

Page 3

Applications

• Art Retrieval
• Medical Image Retrieval
• Product Image Retrieval (reviews, other prices, etc.)
• Fashion Image Retrieval

Page 4

Overview

• Information Retrieval: Bag of Words, TF-IDF, Cosine Similarity, Inverted Index
• Object Instance Retrieval: Bag of Visual Words, Video Google, Object Instance Retrieval
• Fast Object Detection/Retrieval: Fast detection, Part representations, Generalization from exemplar
• Large Scale Image Search: KD-trees, Locality Sensitive Hashing, Semantic Hashing, Compact Codes

Page 5

Information Retrieval

Page 6

Bag of Words (BOW)

A widely used document representation method

Doc-1: The last duel. After quarrelling over a bank loan, two men took part in the last fatal duel staged on Scottish soil. BBC News's James Landale retraces the steps of his ancestor, who made that final challenge.

Doc-2: West Bank water row. Palestinians have accused Israel of diverting water away from their towns in order to keep Jewish settlements in the occupied territories fully supplied. Israel denies the charge saying Palestinian farmers are to blame for using illegal connections to irrigate their fields.

Word counts:

          Bank   Loan   Water  Farmer
Doc-1     1      1      0      0
Doc-2     1      0      2      0
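A minimal sketch of the counting step, assuming lowercase exact-token matching with no stemming (which is why "farmers" in Doc-2 does not count toward "Farmer", as on the slide). The function name `bag_of_words` and the shortened document excerpts are illustrative choices, not part of the lecture.

```python
import re
from collections import Counter

LEXICON = ["bank", "loan", "water", "farmer"]  # lexicon from the slide, lowercased

def bag_of_words(text, lexicon=LEXICON):
    """Count how often each lexicon word occurs in `text` (exact tokens, no stemming)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return [counts[word] for word in lexicon]

# Excerpts of Doc-1 and Doc-2 that contain all the lexicon occurrences.
doc1 = "After quarrelling over a bank loan, two men took part in the last fatal duel"
doc2 = "West Bank water row Palestinians have accused Israel of diverting water away from their towns"

print(bag_of_words(doc1))  # [1, 1, 0, 0]
print(bag_of_words(doc2))  # [1, 0, 2, 0]
```

Word order is discarded entirely; only the counts per lexicon entry survive.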

Page 7

Term Frequency (TF)

Word counts (lexicon in rows, documents in columns):

          Doc-1  Doc-2  Doc-3  Doc-4
Bank      1      1      0      0
Loan      1      0      0      3
Water     0      2      1      1
Farmer    0      0      0      1

Normalization

          Doc-1  Doc-2  Doc-3  Doc-4
Bank      0.5    0.33   0      0
Loan      0.5    0      0      0.6
Water     0      0.66   1      0.2
Farmer    0      0      0      0.2
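The normalized values on this slide are consistent with dividing each document's counts by that document's total count. A small sketch under that assumption (the helper name `term_frequency` is hypothetical):

```python
# Raw counts from the slide: one row per lexicon word, one column per document (Doc-1..Doc-4).
counts = {
    "Bank":   [1, 1, 0, 0],
    "Loan":   [1, 0, 0, 3],
    "Water":  [0, 2, 1, 1],
    "Farmer": [0, 0, 0, 1],
}

def term_frequency(counts):
    """Divide every count by the total number of lexicon words in its document."""
    n_docs = len(next(iter(counts.values())))
    totals = [sum(row[d] for row in counts.values()) for d in range(n_docs)]
    return {word: [row[d] / totals[d] for d in range(n_docs)]
            for word, row in counts.items()}

tf = term_frequency(counts)
for word, row in tf.items():
    print(word, [round(x, 2) for x in row])
```

Rounding gives 0.67 for Water in Doc-2 where the slide shows the truncated value 0.66.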

Page 8

Inverse Document Frequency (IDF)

IDF of the i-th word:

$$\mathrm{idf}_i = \log\left(\frac{n}{\mathrm{df}_i}\right)$$

where n is the number of documents and df_i is the number of documents that contain word i.

tf x idf

[Figure: Doc-1 ("The last duel ...") represented as a tf x idf vector over the lexicon: Bank, Loan, Water, Farmer, ...]
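As a worked example, the sketch below computes df, idf and tf x idf for the four-document count table from the Term Frequency slide. The natural logarithm is an assumption; the slide does not fix the base.

```python
import math

# Raw counts from the Term Frequency slide: one column per document (Doc-1..Doc-4).
lexicon = ["Bank", "Loan", "Water", "Farmer"]
counts = [
    [1, 1, 0, 0],  # Bank
    [1, 0, 0, 3],  # Loan
    [0, 2, 1, 1],  # Water
    [0, 0, 0, 1],  # Farmer
]
n_docs = len(counts[0])

# df_i: number of documents in which word i occurs at least once.
df = [sum(1 for c in row if c > 0) for row in counts]
# idf_i = log(n / df_i), natural log assumed.
idf = [math.log(n_docs / d) for d in df]

# tf: counts normalized by each document's total; tf x idf weights each entry by idf_i.
totals = [sum(row[d] for row in counts) for d in range(n_docs)]
tfidf = [[(row[d] / totals[d]) * idf[i] for d in range(n_docs)]
         for i, row in enumerate(counts)]

for word, d, w in zip(lexicon, df, idf):
    print(f"{word}: df={d}, idf={w:.3f}")
```

"Farmer" occurs in a single document and so receives the largest idf, while "Water", present in three of the four documents, is down-weighted.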

Page 9

Cosine Similarity

Query: fall in Boston

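Cosine Similarity Score = $\frac{q^{T} d}{\lVert q \rVert \, \lVert d \rVert}$, where q is the query vector and d is a document vector.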
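A short sketch of ranking by cosine similarity. The document vectors are the normalized term frequencies from the TF slide (lexicon order Bank, Loan, Water, Farmer); the one-word query "water" is a hypothetical example, not the "fall in Boston" query shown here.

```python
import math

def cosine_similarity(q, d):
    """Cosine of the angle between q and d; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(q, d))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_d = math.sqrt(sum(b * b for b in d))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)

docs = {
    "Doc-1": [0.5, 0.5, 0.0, 0.0],
    "Doc-2": [0.33, 0.0, 0.66, 0.0],
    "Doc-3": [0.0, 0.0, 1.0, 0.0],
    "Doc-4": [0.0, 0.6, 0.2, 0.2],
}
query = [0, 0, 1, 0]  # hypothetical query containing only "water"

ranked = sorted(docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True)
print(ranked)  # ['Doc-3', 'Doc-2', 'Doc-4', 'Doc-1']
```

Because the score is normalized by both vector lengths, long documents are not favoured simply for containing more words.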

Page 10

Inverted Index

Allows quick lookup of document ids with a particular word

lexicon/dictionary    postings list
bank                  PL(bank):  3 8 10 13 16 20
loan                  PL(loan):  1 2 3 9 16 18
water                 PL(water): 4 5 8 10 13 19 20 22
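A minimal sketch of an inverted index and a conjunctive query over it. The toy collection and the function names `build_inverted_index` and `search` are hypothetical and do not reproduce the postings lists on the slide.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each word to the sorted list of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

def search(index, query):
    """Return ids of documents containing every query word (postings-list intersection)."""
    postings = [set(index.get(word, [])) for word in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []

# Hypothetical toy collection: document id -> text.
docs = {
    1: "bank loan",
    2: "bank water water",
    3: "water",
    4: "loan loan loan water farmer",
}
index = build_inverted_index(docs)
print(index["water"])               # [2, 3, 4]
print(search(index, "loan water"))  # [4]
```

Only the postings lists of the query words are touched, so documents that share no word with the query are never visited.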

Page 11

Bag of Words & Object Instance Retrieval

Page 12

Slide Credit - F. Perronnin, LSVR tutorial at CVPR’13

Feature Detectors

Page 13

Feature Descriptors

Slide Credit - F. Perronnin, LSVR tutorial at CVPR’13

Page 14

Video Google Feature Detectors / Descriptors

• Harris-Affine & Hessian-Affine as the feature detectors
• SIFT as the feature descriptor

Video Google: A Text Retrieval Approach to Object Matching in Videos, Josef Sivic and Andrew Zisserman, ICCV 2003

Page 15

Video Google Bag of Visual Words

Video Google: A Text Retrieval Approach to Object Matching in Videos, Josef Sivic and Andrew Zisserman, ICCV 2003

Pipeline: Images → Affine Invariant Feature Detectors (Harris-Affine & MSER) → SIFT → ~300K Feature Descriptors → k-means → 2,237 Visual Words

• viewpoint invariance
• illumination invariance

Example visual words: a wheel of an airplane; a motorbike handle; back of a motorbike or tip of the wings (polysemy)

Discovering objects and their location in images, Sivic et al., ICCV 2005

http://www.robots.ox.ac.uk/~vgg/publications/2012/Arandjelovic12/presentation.pdf
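Once k-means has produced the visual vocabulary, each descriptor is assigned to its nearest cluster centre and the image becomes a histogram of visual-word counts. The sketch below illustrates only that quantization step; the 2-D "descriptors" and the three-word vocabulary are hypothetical stand-ins (SIFT descriptors are 128-D, and the slide's vocabulary has 2,237 words).

```python
from collections import Counter

def nearest_word(descriptor, vocabulary):
    """Index of the visual word (cluster centre) closest to the descriptor in squared L2 distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(descriptor, center))
    return min(range(len(vocabulary)), key=lambda k: sq_dist(vocabulary[k]))

def bag_of_visual_words(descriptors, vocabulary):
    """Histogram of visual-word occurrences for one image."""
    counts = Counter(nearest_word(d, vocabulary) for d in descriptors)
    return [counts[k] for k in range(len(vocabulary))]

# Hypothetical vocabulary (as if produced by k-means) and descriptors from one image.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, -0.2), (0.9, 1.2), (4.0, 6.0), (1.1, 0.8)]

print(bag_of_visual_words(descriptors, vocabulary))  # [1, 2, 1]
```

The resulting histogram plays the role of the word-count vector in text retrieval, so TF-IDF weighting and an inverted index apply unchanged.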

Page 16

Video Google Large scale object instance retrieval

• Find all instances of the query object in a large scale dataset
• Do it instantly (< 1 sec), and be robust to scale, viewpoint, lighting, partial occlusion

Video Google: A Text Retrieval Approach to Object Matching in Videos, Josef Sivic and Andrew Zisserman, ICCV 2003
http://www.robots.ox.ac.uk/~vgg/publications/2012/Arandjelovic12/presentation.pdf

Page 17

Video Google Particular object retrieval - Bag of visual words

Video Google: A Text Retrieval Approach to Object Matching in Videos, Josef Sivic and Andrew Zisserman, ICCV 2003
http://www.robots.ox.ac.uk/~vgg/publications/2012/Arandjelovic12/presentation.pdf

Object retrieval with large vocabularies and fast spatial matching, Philbin, Chum, Isard, Sivic, Zisserman, CVPR 2007

Page 18

Video Google BOW + Inverted File Indexing

Slide Credit - Chum, LSVR tutorial at CVPR’13

Page 19

Spatial Verification

1. Image Query
2. Initial Retrieval Set
3. Spatial Verification

http://www.robots.ox.ac.uk/~vgg/publications/2012/Arandjelovic12/presentation.pdf

Object retrieval with large vocabularies and fast spatial matching, Philbin, Chum, Isard, Sivic, Zisserman, CVPR 2007

Page 20

Query Expansion

Slide Credit - Chum, LSVR tutorial at CVPR’13

Page 21

Query Expansion

Slide Credit - Chum, LSVR tutorial at CVPR’13

Page 22

Query Expansion

Slide Credit - Chum, LSVR tutorial at CVPR’13

Page 23

http://www.robots.ox.ac.uk/~vgg/demo/

Video Google - Object Instance Retrieval

Page 24

Immediate, scalable object category detection

Page 25

Motivation: Object Detection

• Running a detector fast on a single image (Cascades, PQ, etc.) [Felzenszwalb-CVPR10, Vedaldi-CVPR12, Sadeghi-NIPS13].
• Running multiple detectors fast on a single image (Sparselets, etc.) [Song-ECCV12, Dean-CVPR13].
• Running a detector fast (~1 sec) on a large-scale image dataset, similar to Video Google [Sivic03] but for category detection.

Page 26

Large Scale Object Instance Retrieval
• Retrieve instantly (< 1 sec)
• Robust to: scale, viewpoint, lighting, partial occlusion

versus

Large Scale Object Category Detection
• Retrieve instantly (~ 1 sec)
• Robust to: scale, viewpoint, lighting, partial occlusion, and intra-class variance

[Figure: a query and its retrieval results for each task; credit [Arandjelovic-CVPR12]]

Page 27

Overview

Uses the three stages of Video Google revamped for object category detection:

1. Indexing and inverted file
2. Shortlisting
3. Reranking: (a) Spatial Reranking, (b) HOG Scoring

[Figure labels: Classifier Part (CP) Dictionary; Query HOG Template; Reconstructed Template]

Page 28

Query Representation

• Query (HOG template) is first represented as a sparse combination of CPs.
• 𝛂 and the spatial layouts of CPs define the reconstructed template.

[Figure labels: Query HOG Template; CP Dictionary; Reconstructed Template]

Page 29

Image Representation

Page 30

Shortlisting

 

Page 31

Spatial Reranking

Page 32

Reranking (Spatial + Original Template)

• Spatial Reranking: Shortlisted images are reranked via fast Hough-like voting of bounding box candidates suggested by each CP.
• HOG Scoring: Retrieved bounding box candidates are re-scored using the original HOG template with fast and memory-efficient PQ compression.

Page 33

Dictionary & Dataset

10K dictionaries of sizes 3x3 – 7x7 HOG cells are extracted from DPMs trained on 1000 ImageNet categories.

Tests are performed on the PASCAL VOC07 test set (5K images) and the validation sets (100K images) of the ImageNet 2011 and 2012 challenges.

Page 34

Detection Results

[Figure: for each query (Person, TV/Monitor, Cow, Motorbike), the original template, the reconstructed template, and the top 3 retrievals]

Immediate, scalable object category detection, Y. Aytar, A. Zisserman, CVPR 2014

Page 35

Exemplar SVM Results

Page 36

Large Scale Image Search

Page 37

• Find similar images in a large database

Slide Credit - Kristen Grauman et al

Large Scale Image Search

Fast & Accurate

Page 38

Large Scale Image Search

The Challenge: Search the internet

• Internet contains billions of images
• Needs to scale to Internet (How?)
• Need a way of measuring similarity between images (distance metric learning)

Page 39

Requirements for image search

• Search must be fast, accurate, and scalable to large data sets
• Fast
  – Kd-trees: tree data structure to improve search speed
  – Locality Sensitive Hashing: hash tables to improve search speed
  – Small code: binary small code (010101101)
• Scalable
  – Require very little memory, enabling their use on standard hardware or even on handheld devices
• Accurate
  – Learned distance metric

Page 40

Categorization of existing large scale image search algorithms

• Tree Based Structure
  – Spatial partitions (e.g. kd-tree) and recursive hyperplane decomposition provide an efficient means to search low-dimensional vector data exactly.
• Hashing
  – Locality-sensitive hashing offers sub-linear time search by hashing highly similar examples together.
• Binary Small Code
  – Compact binary code, with a few hundred bits per image

Page 41

Tree Based Structure

• Kd-tree
  – The kd-tree is a binary tree in which every node is a k-dimensional point
• No theoretical guarantee! Kd-trees are known to break down in practice for high-dimensional data, and cannot provide better than a worst-case linear query time guarantee.
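A compact sketch of a kd-tree with exact nearest-neighbour search, assuming median splits along axes that cycle with depth (one common construction; the slide does not specify one). The names `Node`, `build_kdtree` and `nearest` are illustrative.

```python
from collections import namedtuple

Node = namedtuple("Node", ["point", "axis", "left", "right"])

def build_kdtree(points, depth=0):
    """Recursively split on the median along an axis that cycles with depth."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis,
                build_kdtree(points[:mid], depth + 1),
                build_kdtree(points[mid + 1:], depth + 1))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(node, query, best=None):
    """Exact nearest neighbour; a subtree is skipped when its splitting plane is too far away."""
    if node is None:
        return best
    if best is None or sq_dist(query, node.point) < sq_dist(query, best):
        best = node.point
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    if diff ** 2 < sq_dist(query, best):  # the far side may still hold a closer point
        best = nearest(far, query, best)
    return best

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
print(nearest(tree, (9, 2)))  # (8, 1)
```

In high dimensions the pruning test rarely succeeds, so the search visits most nodes and degrades toward a linear scan, which is the breakdown the slide refers to.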

Page 42

Locality Sensitive Hashing

• Take random projections of data
• Quantize each projection with a few bits
• No learning involved

[Figure: a feature vector is mapped by random projections to a short binary code, e.g. 101]

Slide Credit - Fergus et al
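A minimal sketch of the two steps, assuming Gaussian random directions and one bit per projection (the sign). Both choices are assumptions for illustration; the slide only says "few bits". The names `make_projections` and `lsh_code` are hypothetical.

```python
import random

def make_projections(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian directions; no learning is involved."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def lsh_code(vector, projections):
    """Quantize each random projection to one bit: 1 if it is non-negative, else 0."""
    bits = []
    for direction in projections:
        value = sum(v * w for v, w in zip(vector, direction))
        bits.append("1" if value >= 0 else "0")
    return "".join(bits)

projections = make_projections(dim=4, n_bits=6)
x = [0.2, -1.0, 0.5, 3.0]
print(lsh_code(x, projections))
# Scaling a vector does not change the signs of its projections, so the code is identical.
print(lsh_code([2 * v for v in x], projections) == lsh_code(x, projections))  # True
```

Vectors separated by a small angle fall on the same side of most random hyperplanes, so they tend to share most bits of their codes.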

Page 43

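How to search from hash table?

[Figure: a set of N data points X_i is passed through the hash function h_{r1…rk} and stored in a hash table (buckets such as 110101, 110111, 111101). A new query Q is hashed with the same function, and only the small set of images in its bucket (<< N) is searched to produce the results.]

Slide Credit - Kristen Grauman et al.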
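The lookup can be sketched as a dictionary from hash code to bucket, followed by an exact comparison restricted to that bucket. The database entries, their codes and the feature vectors below are hypothetical; the codes are assumed to have been produced beforehand by the hash function.

```python
from collections import defaultdict

# Hypothetical database: image id -> (hash code, feature vector).
database = {
    "img0": ("110101", (0.9, 0.1)),
    "img1": ("110111", (0.2, 0.8)),
    "img2": ("110101", (1.0, 0.3)),
    "img3": ("111101", (0.5, 0.5)),
}

def build_table(database):
    """Group image ids by hash code so that each code addresses one bucket."""
    table = defaultdict(list)
    for image_id, (code, _) in database.items():
        table[code].append(image_id)
    return table

def search(table, database, query_code, query_vector):
    """Rank only the images in the query's bucket by exact squared distance."""
    def sq_dist(image_id):
        return sum((a - b) ** 2 for a, b in zip(database[image_id][1], query_vector))
    return sorted(table.get(query_code, []), key=sq_dist)

table = build_table(database)
print(search(table, database, "110101", (1.0, 0.2)))  # ['img2', 'img0']
```

Only two of the four images are compared against the query here; with N data points the bucket is typically much smaller than N.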

Page 44

Binary codes for images

• Want images with similar content to have similar binary codes
• Use Hamming distance between codes
  – Number of bit flips
  – E.g.: Ham_Dist(10001010, 10001110) = 1; Ham_Dist(10001010, 11101110) = 3
• Semantic Hashing [Salakhutdinov & Hinton, 2007]
  – Text documents

Slide Credit - Fergus et al
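The two Hamming distances on this slide can be checked with an XOR followed by a bit count. A small sketch, with the function name `hamming_distance` chosen for illustration:

```python
def hamming_distance(a, b):
    """Number of bit positions at which two equal-length binary strings differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    # XOR leaves a 1 exactly where the codes disagree; count those 1s.
    return bin(int(a, 2) ^ int(b, 2)).count("1")

print(hamming_distance("10001010", "10001110"))  # 1
print(hamming_distance("10001010", "11101110"))  # 3
```

On integer codes this is a single XOR plus a population count, which is why comparing compact binary codes is so cheap.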

Page 45

Semantic Hashing

[Figure: the semantic hash function maps the query image to a binary code, which is used as the query address in the address space; semantically similar images in the database lie at nearby addresses.]

• Quite different from a (conventional) randomizing hash
• Find neighbors by exploring Hamming ball around query address
• Lookup time depends on radius of ball, NOT on # data points

Slide Credit - Fergus et al
Semantic Hashing, Ruslan Salakhutdinov and Geoffrey Hinton, International Journal of Approximate Reasoning, 2009
Small Codes and Large Databases for Recognition, A. Torralba, R. Fergus, and Y. Weiss, CVPR 2008
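Exploring the Hamming ball amounts to probing every address that differs from the query address in at most r bits. A sketch under the assumption that codes are stored as integers in a dictionary keyed by address; the table contents and function names are hypothetical.

```python
from itertools import combinations

def hamming_ball(address, n_bits, radius):
    """Yield all n_bits-wide addresses within `radius` bit flips of `address`."""
    for r in range(radius + 1):
        for positions in combinations(range(n_bits), r):
            flipped = address
            for p in positions:
                flipped ^= 1 << p  # flip bit p
            yield flipped

def lookup(table, query_address, n_bits, radius):
    """Collect images stored at any address inside the Hamming ball around the query."""
    results = []
    for address in hamming_ball(query_address, n_bits, radius):
        results.extend(table.get(address, []))
    return results

# Hypothetical 8-bit address space: address -> images stored there.
table = {0b10001010: ["a"], 0b10001110: ["b"], 0b11101110: ["c"]}
print(lookup(table, 0b10001010, n_bits=8, radius=1))  # ['a', 'b']
```

The number of probes is the sum of C(n_bits, r) over the radii searched (9 probes for 8 bits and radius 1), independent of how many images are stored.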

Page 46

Compact Binary Codes

• Google has a few billion images (10^9)
• PC has ~10 Gbytes (10^11 bits)
• Codes must fit in memory (disk too slow)

Budget of 10^2 bits/image

• 1 Megapixel image is 10^7 bits
• 32x32 color image is 10^4 bits

Semantic hash function must also reduce dimensionality

Slide Credit - Fergus et al
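The orders of magnitude on this slide can be checked with a few lines of arithmetic. The sketch assumes 8 bits per byte and, for the image sizes, 3 colour channels at 8 bits each; neither assumption is stated on the slide.

```python
n_images = 10**9                  # "a few billion images", order of magnitude only
memory_bits = 10 * 10**9 * 8      # ~10 Gbytes in bits: 8e10, roughly 10^11
budget = memory_bits / n_images   # bits available per image
print(budget)                     # 80.0, i.e. on the order of 10^2 bits/image

megapixel_bits = 10**6 * 3 * 8    # 1 Megapixel image, 3 channels x 8 bits (assumed)
tiny_bits = 32 * 32 * 3 * 8       # 32x32 color image
print(megapixel_bits, tiny_bits)  # 24000000 24576
```

Even a 32x32 thumbnail is about two orders of magnitude over the per-image budget, which is why the hash function must also reduce dimensionality.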

Page 47

RBM architecture

• Network of binary stochastic units
• Hinton & Salakhutdinov, Science 2006
• Hidden units: h
• Visible units: v
• Symmetric weights w
• Parameters: Weights w, Biases b
• Learn weights and biases using Contrastive Divergence

Convenient conditional distributions:

Slide Credit - Fergus et al
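For a standard binary RBM the conditional distributions factorize over units. A sketch in common notation, assuming hidden biases b_j and visible biases c_i (the split into two bias symbols is an assumption; the slide only names "Biases b"):

```latex
p(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_i w_{ij}\, v_i\Big), \qquad
p(v_i = 1 \mid h) = \sigma\Big(c_i + \sum_j w_{ij}\, h_j\Big), \qquad
\sigma(x) = \frac{1}{1 + e^{-x}}
```

Because the same weights w_{ij} appear in both directions, all hidden units can be sampled in parallel given v and all visible units given h, which is what makes Contrastive Divergence training practical.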

Page 48

12 closest neighbors under different distance metrics

Examples of LabelMe retrieval

Slide Credit - Fergus et al