Efficient Image Search and Retrieval using Compact Binary Codes

Post on 05-Jan-2016


  • Efficient Image Search and Retrieval using Compact Binary Codes

    Rob Fergus (NYU), Antonio Torralba (MIT), Yair Weiss (Hebrew U.)

  • Large scale image search

    Internet contains many billions of images

    How can we search them, based on visual content?

    The Challenge:

      Need way of measuring similarity between images

      Needs to scale to Internet

  • Existing approaches to Content-Based Image Retrieval

    Focus on scaling rather than understanding image

    Variety of simple/hand-designed cues: color and/or texture histograms, shape, PCA, etc.

    Various distance metrics, e.g. Earth Mover's Distance (Rubner et al. 98)

    Most recognition approaches slow (~1sec/image)

  • Our Approach

    Learn the metric from training data

    DO BOTH TOGETHER

    Use compact binary codes for speed

  • Large scale image/video search

    Representation must fit in memory (disk too slow)

    Facebook has ~10 billion images (10^10); PC has ~10 Gbytes of memory (10^11 bits); budget of 10^1 bits/image

    YouTube has ~ a trillion video frames (10^12); big cluster of PCs has ~10 Tbytes (10^14 bits); budget of 10^2 bits/frame
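The budget arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: `bits_per_item` is a helper name invented here, and the inputs are the rough figures quoted on the slide, which rounds 8 bits per byte up to an order of magnitude.

```python
def bits_per_item(memory_bytes: float, num_items: float) -> float:
    """Bits available per item if the whole collection must fit in memory."""
    return memory_bytes * 8 / num_items

# Rough figures quoted on the slide (orders of magnitude only).
pc_budget = bits_per_item(memory_bytes=10e9, num_items=1e10)        # ~10 GB, ~10^10 images
cluster_budget = bits_per_item(memory_bytes=10e12, num_items=1e12)  # ~10 TB, ~10^12 frames

print(f"PC: {pc_budget:.0f} bits/image")
print(f"Cluster: {cluster_budget:.0f} bits/frame")
```

The exact results are 8 and 80 bits, which the slide rounds to roughly 10 and 100.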

  • Binary codes for images

    Want images with similar content to have similar binary codes

    Semantic Hashing [Salakhutdinov & Hinton, 2007]: text documents

    Use Hamming distance between codes (number of bit flips), e.g.:

    Ham_Dist(10001010,10001110)=1

    Ham_Dist(10001010,11101110)=3
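The two Hamming distances quoted on this slide can be reproduced with XOR followed by a bit count. A minimal sketch, assuming codes are stored as Python integers; `hamming_distance` is a name chosen here, not one from the talk.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two codes differ (XOR, then count set bits)."""
    return bin(a ^ b).count("1")

print(hamming_distance(0b10001010, 0b10001110))  # 1
print(hamming_distance(0b10001010, 0b11101110))  # 3
```

Both operations map to single machine instructions on most hardware, which is one reason binary codes are attractive for speed.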

  • Semantic Hashing

    [Figure: a semantic hash function maps the query image to a binary code, which is used as the query address in the address space; semantically similar images in the database sit at nearby addresses]

    [Salakhutdinov & Hinton, 2007] for text documents

    Quite different to a (conventional) randomizing hash

  • Semantic Hashing

    Each image code is a memory address

    Find neighbors by exploring Hamming ball around query address

    [Figure: address space showing the query address and the images in database]

    Lookup time is independent of # of data points

    Depends on radius of ball & length of code: (code length) choose (radius)
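The lookup described here can be sketched with a dictionary standing in for the address space. This is an illustrative sketch, not the authors' implementation: all function names are invented for the example, and the codes are toy 8-bit values. The number of addresses probed depends only on the code length and the radius, not on how many images are stored.

```python
from itertools import combinations
from math import comb

def hamming_ball(code: int, n_bits: int, radius: int):
    """Yield every n_bits-long address within `radius` bit flips of `code`."""
    for r in range(radius + 1):
        for positions in combinations(range(n_bits), r):
            flipped = code
            for p in positions:
                flipped ^= 1 << p
            yield flipped

def build_table(codes):
    """Map each binary code (used as an address) to the ids of images stored there."""
    table = {}
    for image_id, code in enumerate(codes):
        table.setdefault(code, []).append(image_id)
    return table

def lookup(table, query_code: int, n_bits: int, radius: int):
    """Collect ids of all images whose code lies in the Hamming ball around the query."""
    hits = []
    for address in hamming_ball(query_code, n_bits, radius):
        hits.extend(table.get(address, []))
    return sorted(hits)

def ball_size(n_bits: int, radius: int) -> int:
    """Number of addresses probed: sum of C(n_bits, r) for r = 0..radius."""
    return sum(comb(n_bits, r) for r in range(radius + 1))

codes = [0b10001010, 0b10001110, 0b11101110, 0b00000000]
table = build_table(codes)
neighbors = lookup(table, 0b10001010, n_bits=8, radius=1)
print(neighbors)          # [0, 1]
print(ball_size(30, 2))   # 466 addresses for a 30-bit code and radius 2
```

The ball grows quickly with the radius, so in practice the radius has to stay small relative to the code length.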

  • Code requirements

    Similar images should get similar codes

    Very compact (

  • Input Image representation: Gist vectors

    Pixels not a convenient representation

    Use Gist descriptor instead (Oliva & Torralba, IJCV 2001)

    512 dimensions/image (real-valued; 16,384 bits)

    L2 distance btw. Gist vectors not bad substitute for human perceptual distance

    NO COLOR INFORMATION

  • 1. Locality Sensitive Hashing

    Gionis, A. & Indyk, P. & Motwani, R. (1999)

    Input: Gist descriptor

    Take random projections of data

    Quantize each projection with few bits

    No learning involved
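A minimal sketch of the random-projection idea, under two assumptions that are not stated on the slide: the projection directions are drawn from a Gaussian, and each projection is quantized to a single bit by its sign. The function names and the 4-dimensional toy input are invented for the example; a real Gist descriptor would have 512 dimensions.

```python
import random

def make_projections(n_bits: int, dim: int, seed: int = 0):
    """Draw n_bits random Gaussian directions; no training data is used."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def lsh_code(x, projections):
    """One bit per projection: 1 if the projection of x is positive, else 0."""
    bits = []
    for w in projections:
        dot = sum(wi * xi for wi, xi in zip(w, x))
        bits.append(1 if dot > 0 else 0)
    return bits

projections = make_projections(n_bits=16, dim=4)
x = [0.2, -0.1, 0.7, 0.4]
code = lsh_code(x, projections)
code_scaled = lsh_code([2 * v for v in x], projections)
print(code)
```

With sign quantization the code depends only on the direction of the input, so `code` and `code_scaled` are identical.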

  • 2. Boosting

    Modified form of BoostSSC [Shakhnarovich, Viola & Darrell, 2003]

    Positive examples are pairs of similar images

    Negative examples are pairs of unrelated images

    Learn threshold & dimension for each bit (weak classifier)
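The per-bit weak classifier can be sketched as a threshold on one feature dimension, judged on labeled pairs. This is a generic boosting-style stump selection written for illustration; it is not the modified BoostSSC procedure itself, it omits the reweighting between rounds, and every name and number below is hypothetical.

```python
def stump_bit(x, dim: int, threshold: float) -> int:
    """One bit of the code: is feature `dim` above `threshold`?"""
    return 1 if x[dim] > threshold else 0

def pair_agrees(x, y, dim: int, threshold: float) -> bool:
    """Weak classifier on a pair: predicts 'similar' when both images get the same bit."""
    return stump_bit(x, dim, threshold) == stump_bit(y, dim, threshold)

def best_stump(pairs, labels, weights, candidates):
    """Pick the (dim, threshold) with the lowest weighted error over labeled pairs.

    labels[i] is True for a similar pair and False for an unrelated pair.
    """
    def weighted_error(stump):
        dim, threshold = stump
        return sum(
            w for (x, y), label, w in zip(pairs, labels, weights)
            if pair_agrees(x, y, dim, threshold) != label
        )
    return min(candidates, key=weighted_error)

pairs = [([0.9, 0.1], [0.8, 0.7]), ([0.9, 0.1], [0.2, 0.2])]
labels = [True, False]          # first pair similar, second pair unrelated
weights = [0.5, 0.5]
candidates = [(0, 0.5), (1, 0.5)]
chosen = best_stump(pairs, labels, weights, candidates)
print(chosen)  # (0, 0.5)
```

A full boosting loop would increase the weights of misclassified pairs after each round and pick the next bit against the updated weights.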

  • 3. Restricted Boltzmann Machine (RBM)

    Type of Deep Belief Network (Hinton & Salakhutdinov, Science 2006)

    Single RBM layer: attempts to reconstruct input at visible layer from activation of hidden layer (weights W)
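The reconstruction step of a single RBM layer can be sketched as one upward pass followed by one downward pass through the same weight matrix. The sketch assumes binary visible units, logistic activations, zero biases and small random weights in place of trained ones; the layer sizes and names are toy values chosen here.

```python
import math
import random

def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))

def hidden_probs(v, W, hidden_bias):
    """P(h_j = 1 | v) for each hidden unit j; W[j][i] links visible i to hidden j."""
    return [sigmoid(sum(w * vi for w, vi in zip(row, v)) + b)
            for row, b in zip(W, hidden_bias)]

def reconstruct(h, W, visible_bias):
    """Reconstruct the visible layer from hidden activations using the same weights W."""
    return [sigmoid(sum(W[j][i] * h[j] for j in range(len(h))) + visible_bias[i])
            for i in range(len(visible_bias))]

rng = random.Random(0)
n_visible, n_hidden = 6, 3
W = [[rng.gauss(0.0, 0.1) for _ in range(n_visible)] for _ in range(n_hidden)]
v = [1, 0, 1, 1, 0, 0]
h = hidden_probs(v, W, [0.0] * n_hidden)
v_recon = reconstruct(h, W, [0.0] * n_visible)
error = sum((a - b) ** 2 for a, b in zip(v, v_recon))
print(error)
```

Training would adjust `W` and the biases so that this reconstruction error falls over the training set; that update rule is left out here.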

  • Multi-Layer RBM: non-linear dimensionality reduction

    Input Gist vector (512 dimensions)

    Layer 1: 512 to 512 units (weights w1)

    Layer 2: 512 to 256 units (weights w2)

    Layer 3: 256 to N units (weights w3)

    Output binary code (N dimensions)

    Linear units at first layer
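The 512-512-256-N stack can be sketched as a plain feed-forward pass. Several things here are assumptions made for illustration: the weights are small random placeholders rather than trained parameters, every hidden unit is logistic, N is set to 32 arbitrarily, and the input is random numbers standing in for a real Gist descriptor.

```python
import math
import random

def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))

def init_layer(n_in: int, n_out: int, rng):
    """Small random weights and zero biases (placeholders for trained parameters)."""
    weights = [[rng.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(x, layers):
    """Propagate x through each layer, applying a logistic unit at every hidden unit."""
    activation = x
    for weights, biases in layers:
        activation = [
            sigmoid(sum(w * a for w, a in zip(row, activation)) + b)
            for row, b in zip(weights, biases)
        ]
    return activation

rng = random.Random(0)
n_code = 32  # N, the code length; 32 is an arbitrary choice for this sketch
sizes = [512, 512, 256, n_code]
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
gist = [rng.random() for _ in range(512)]  # stand-in for a real Gist descriptor
probabilities = forward(gist, layers)
print(len(probabilities))  # 32
```

The top-layer outputs are activation probabilities between 0 and 1; a separate thresholding step turns them into bits.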

  • Training RBM models

    1st Phase: Pre-training

      Unsupervised

      Can use unlabeled data (unlimited quantity)

      Learn parameters greedily per layer

      Gets them to right ballpark

    2nd Phase: Fine-tuning

      Supervised

      Requires labeled data (limited quantity)

      Back propagate gradients of chosen error function

      Moves parameters to local minimum

  • Greedy pre-training (Unsupervised)

    Layer 1 (weights w1): input Gist vector (512 real dimensions) to 512 hidden units

    Layer 2 (weights w2): activations of hidden units from layer 1 (512 binary dimensions) to 256 hidden units

    Layer 3 (weights w3): activations of hidden units from layer 2 (256 binary dimensions) to N hidden units

  • Fine-tuning: back-propagation of Neighborhood Components Analysis objective

    [Figure: full network, from input Gist vector (512 real dimensions) through Layer 1 (512 to 512, w1), Layer 2 (512 to 256, w2) and Layer 3 (256 to N, w3) to output binary code (N dimensions)]

  • Neighborhood Components Analysis

    Goldberger, Roweis, Salakhutdinov & Hinton, NIPS 2004

    Tries to preserve neighborhood structure of input space

    Assumes this structure is given (will explain later)

    Toy example with 2 classes & N=2 units at top of network: points in output space (coordinate is activation probability of unit)

  • Neighborhood Components Analysis

    Adjust network parameters (weights and biases) to move:

      Points of SAME class closer

      Points of DIFFERENT class away

    Points close in input space (Gist) will be close in output code space
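The quantity being pushed up can be sketched with the standard NCA objective from Goldberger et al.: each point picks a neighbor with probability proportional to exp(-squared distance), and the objective sums the probability of picking a same-class neighbor. The toy points below mimic the 2-class, N=2 example; they and the function names are invented for illustration, and the talk's version may define neighborhoods differently from plain class labels.

```python
import math

def squared_distance(a, b) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def nca_objective(points, labels) -> float:
    """Sum over i of the probability that point i picks a same-class neighbor."""
    total = 0.0
    for i, (yi, li) in enumerate(zip(points, labels)):
        # Unnormalized neighbor weights; a point never picks itself.
        weights = [
            0.0 if j == i else math.exp(-squared_distance(yi, yj))
            for j, yj in enumerate(points)
        ]
        normalizer = sum(weights)
        same_class = sum(w for w, lj in zip(weights, labels) if lj == li)
        total += same_class / normalizer
    return total

labels = [0, 1, 0, 1]
mixed = [[0.5, 0.5], [0.5, 0.6], [0.6, 0.5], [0.6, 0.6]]      # classes interleaved
separated = [[0.1, 0.1], [0.9, 0.9], [0.1, 0.2], [0.9, 0.8]]  # classes pulled apart
print(nca_objective(mixed, labels))
print(nca_objective(separated, labels))
```

The separated layout scores higher than the mixed one, which is the direction fine-tuning moves the network outputs.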

  • Simple Binarization Strategy

    Set threshold, e.g. use median

    Deliberately add noise
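The median-threshold option can be sketched as follows, assuming the network outputs one activation probability per code unit. Only the thresholding variant is shown; the toy activations and function names are made up for the example.

```python
from statistics import median

def fit_thresholds(activations):
    """Per-unit median over the training set, so each bit splits the data roughly in half."""
    n_units = len(activations[0])
    return [median(row[u] for row in activations) for u in range(n_units)]

def binarize(activation, thresholds):
    """Set a bit to 1 when the unit's activation exceeds that unit's threshold."""
    return [1 if a > t else 0 for a, t in zip(activation, thresholds)]

train = [[0.1, 0.9], [0.4, 0.8], [0.6, 0.2], [0.7, 0.3]]
thresholds = fit_thresholds(train)
codes = [binarize(row, thresholds) for row in train]
print(codes)  # [[0, 1], [0, 1], [1, 0], [1, 0]]
```

Using the median rather than a fixed 0.5 keeps each bit balanced, so no bit is wasted on a value that almost every image shares.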

  • Overall Query Scheme

    Query image -> compute Gist -> Gist descriptor -> RBM -> binary code -> semantic hash -> retrieved images

  • Retrieval Experiments

  • Test set 1: LabelMe

    22,000 images (20,000 train | 2,000 test)

    Ground truth segmentations for all

    Can define ground truth distance btw. images using these segmentations

  • Defining ground truth

    Boosting and NCA back-propagation require ground truth distance between images

    Define this using labeled images from LabelMe

  • Defining ground truth

    Pyramid Match (Lazebnik et al. 2006, Grauman & Darrell 2005)

    Varying spatial resolution to capture approximate spatial correspondence

  • Examples of LabelMe retrieval

    12 closest neighbors under different distance metrics

  • LabelMe Retrieval

    [Plot: % of 50 true neighbors in retrieval set vs. size of retrieval set (0 to 20,000)]

    [Plot: % of 50 true neighbors in first 500 retrieved vs. number of bits]

  • Test set 2: Web images

    12.9 million images

    Collected from Internet

    No labels, so use Euclidean distance between Gist vectors as ground truth distance
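The evaluation measure used in these experiments, the percentage of true neighbors recovered in a retrieval set, can be sketched as below. True neighbors are defined by Euclidean distance between descriptors, and retrieval ranks the database by Hamming distance to the query code. The 2-D vectors, 2-bit codes and k=2 are toy stand-ins (the talk uses Gist vectors and 50 true neighbors), and the function names are invented here.

```python
import math

def true_neighbors(query, database, k: int):
    """Indices of the k database vectors closest to the query in Euclidean distance."""
    order = sorted(range(len(database)), key=lambda i: math.dist(query, database[i]))
    return set(order[:k])

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def percent_recovered(query_code: int, codes, truth, retrieval_size: int) -> float:
    """Percentage of the true neighbors found among the retrieval_size closest codes."""
    order = sorted(range(len(codes)), key=lambda i: hamming(query_code, codes[i]))
    retrieved = set(order[:retrieval_size])
    return 100.0 * len(truth & retrieved) / len(truth)

database = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
codes = [0b00, 0b00, 0b11, 0b10]
truth = true_neighbors([0.0, 0.1], database, k=2)
score = percent_recovered(0b00, codes, truth, retrieval_size=2)
print(truth, score)  # {0, 1} 100.0
```

Sweeping `retrieval_size` and plotting the score gives the kind of curve shown in the retrieval plots.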

  • Web images retrieval

    [Plots: % of 50 true neighbors in retrieval set vs. size of retrieval set]

  • Examples of Web retrieval

    12 neighbors using different distance metrics

  • Retrieval Timings

  • Summary

    Explored various approaches to learning binary codes for hashing-based retrieval

    Very quick with performance comparable to complex descriptors

    More recent work on binarization: Spectral Hashing (Weiss, Torralba, Fergus NIPS 2009)
