compact descriptors for visual search
Compact Descriptors 4 Visual Search
Danilo Pau ([email protected])
Senior Principal Engineer
Senior Member of Technical Staff
SMIEEE
SI/CVRP
STMicroelectronics/AST
Courtesy: M. Funamizu
Agenda
• Visual Search: Context
• MPEG initiative on Visual Search
• Compact Descriptors for Visual Search
• Implementation
• Use Cases
• Visual Search Evolution: Moving Pictures and 3D
• Question and Answers
15/01/2013Presentation Title
Visual Search Context
• Millions of images and videos continue to be uploaded to remote servers all over the world
• Each day on Facebook 300 million photos are uploaded
• roughly 58 photos uploaded each second
• One hour of video uploaded to YouTube every second
Content-Based Image Retrieval (CBIR)
• CBIR covers the concept of search that analyzes the actual content in the image, rather than relying on metadata.
• The development of this concept incorporated many algorithms and techniques from fields such as statistics, pattern recognition and computer vision.
• CBIR attracted a lot of attention and, after many years of research, it has expanded towards the marketplace.
• CBIR’s application in the mobile market is called Mobile Visual Search
• Visual Search is the capability to initiate a search using an image of a rigid object as the query
• The market potential of mobile visual search covers any mobile device with a camera (phones, tablets and hybrids).
CBIR vs QR Codes
• Quick Response (QR) codes are a type of two-dimensional barcode.
• The code is scanned by the mobile imager to produce a URL address for redirection and browsing.
• QR codes are being used by 6.2% of smartphone users in the USA
Lots of Existing Applications
• Google’s Goggles
• Nokia’s Point and Find
• oMoby
• Like.com
• Kooaba
• Moodstocks
• Snaptell
• pixlinQ
• Bing
Existing Apps Use JPEG
• Existing applications use the mobile imager to capture a picture and send it as a JPEG-compressed query
[Figure: the mobile device sends JPEG images to the remote server, which searches its database and returns the visual search result]
An Example of Visual Search
Courtesy Telecom Italia
[Figure: a query image processed in three panels: Interest Point Description, Descriptor pairing, Inliers]
The Rise of Compressed Descriptors
• Alternatively, send “compact features” extracted from the raw images
• For example, Scale Invariant Feature Transform (SIFT) visual descriptors
• Consider 1200 descriptors per frame, each one 128 bytes plus 4 bytes for coordinates, at 30 fps → network load of nearly 38 Mbit/s → unacceptable
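The 38 Mbit/s figure can be checked with a few lines of arithmetic. The sketch below only restates the numbers on the slide; the constant and function names are illustrative, and it assumes one byte per SIFT element.

```python
# Back-of-the-envelope network load for streaming uncompressed SIFT descriptors.
DESCRIPTORS_PER_FRAME = 1200   # from the slide
DESCRIPTOR_BYTES = 128         # assumption: one byte per SIFT element
COORDINATE_BYTES = 4           # per keypoint, from the slide
FRAMES_PER_SECOND = 30         # from the slide


def sift_stream_load():
    """Return (bytes per frame, bits per second) for uncompressed SIFT."""
    bytes_per_frame = DESCRIPTORS_PER_FRAME * (DESCRIPTOR_BYTES + COORDINATE_BYTES)
    bits_per_second = bytes_per_frame * FRAMES_PER_SECOND * 8
    return bytes_per_frame, bits_per_second


bytes_per_frame, bits_per_second = sift_stream_load()
print(f"{bytes_per_frame / 1024:.1f} KiB per frame")   # 154.7 KiB per frame
print(f"{bits_per_second / 1e6:.1f} Mbit/s")           # 38.0 Mbit/s
```

At roughly 155 KiB per frame, raw SIFT is larger than a typical JPEG of the same VGA image, which is the motivation for compressing the descriptors.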
[Chart: size in KB of a VGA image sent as JPEG High, JPEG Low, or SIFT descriptors; vertical axis from 0 to 160 KB]
Systems Considered
• Instead of sending images (a),
• the application can send compact descriptors (b)
• or even perform the search locally (c).
Previous Attempts
• Hashing
  • Locality Sensitive Hashing [Yeo et al., 2008]
  • Similarity Sensitive Coding [Torralba et al., 2008]
  • Spectral Hashing [Weiss et al., 2008]
• Transform Coding
  • Karhunen-Loève Transform [Chandrasekhar et al., 2009]
  • ICA based Transform [Narozny et al., 2008]
• Vector Quantization
  • Product Quantization [Jegou et al., 2010]
  • Tree Structured Vector Quantization [Nistér et al., 2006]
• Alternative to SIFT
  • Compressed Histogram of Gradients [Chandrasekhar et al., 2011]
Is a standard on Visual Search needed?
• Reduce the load on wireless networks carrying visual search-related information
• Ensure interoperability of visual search applications and databases
• Enable hardware support for descriptor extraction and matching in mobile devices
• Enable a high level of performance of implementations conformant to the standard
• Simplify the design of descriptor extraction and matching for visual search applications
What is a suitable standardization body?
• Informal title:
  • Moving Picture Experts Group (MPEG)
• Formal title:
  • ISO/IEC JTC1 SC29 WG11 (Coding of Moving Pictures and Audio)
• Parent SDOs:
  • ISO: International Organization for Standardization
  • IEC: International Electrotechnical Commission
  • JTC 1: Joint Technical Committee One
  • SC29: Subcommittee 29: Coding of Audio, Picture, Multimedia and Hypermedia Information
• Members: National Bodies (25 voting, 16 observers)
[Figure: hierarchy JTC 1 → SC29 → WG11 (MPEG)]
CDVS : Scope
• Descriptor extraction process needed to ensure interoperability.
• Bitstream of compact descriptors
[Figure: Query Image → Descriptor extraction → Descriptor bitstream → Descriptor matching → Geometric verification → List of results, with a Database feeding the matching; part of the chain is marked “Standard”]
Requirements
• Robustness
  • High matching accuracy shall be achieved at least for images of textured rigid objects, landmarks, and printed documents.
  • The matching accuracy shall be robust to changes in vantage points, camera parameters, lighting conditions, as well as in the presence of partial occlusions.
• Sufficiency
  • Descriptors shall be self-contained, in the sense that no other data are necessary for matching.
• Compactness
  • Shall minimize lengths/size of image descriptors
• Scalability
  • Shall allow adaptation of descriptor lengths to support the required performance level and database size.
  • Shall enable design of web-scale visual search applications and databases.
How to achieve robustness
• Image content is transformed into visual features, with coordinates, that are invariant to illumination, scale, rotation, affine and perspective transforms
Types of invariance
• Illumination
• Scale
• Rotation
• Affine Transform
• Full Perspective
Compactness
[Chart: size in KB of a VGA image sent as JPEG High, JPEG Low, SIFT, or compact descriptors of 512B, 1KB, 2KB, 4KB, 8KB and 16KB; vertical axis from 0 to 160 KB]
Extraction Pipeline
[Figure: block diagram from Image to Compact descriptors, with the blocks Resizing, SIFT DoG, Keypoint selection, Local Description Extraction, Encoding (H Mode: Transform & SQ; S Mode: MSVQ encoding), Coordinate coding, Arithmetic coding and SCFV Descriptor]
• H-Mode uses SQ encoding (256B)
• S-Mode uses MSVQ encoding (38KB)
• Both modes use SCFV (49KB)
Properties of SIFT
David Lowe’s local descriptor detection and extraction (1999-2004)
Extraordinarily robust matching technique
• Can handle changes in viewpoint
  • Up to about 30 degree out of plane rotation
• Can handle significant changes in illumination
  • Sometimes even day vs. night (below)
• Lots of code available → http://www.vlfeat.org (BSD license)
Pyramid of DoG
[Figure: pyramid of Difference-of-Gaussians (DoG) images, organized in octaves 1 to n and scales 1 to m]
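A DoG pyramid of this kind can be sketched in a few lines of NumPy. This is a simplified illustration, not the detector used in the CDVS test model: the function names, the default of 3 scales per octave, and the starting sigma of 1.6 are assumptions, and each level is blurred directly from the octave's base image rather than incrementally as in Lowe's implementation.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def gaussian_blur(image, sigma):
    """Separable Gaussian blur with edge replication at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    padded = np.pad(image, radius, mode="edge")
    blur_1d = lambda v: np.convolve(v, kernel, mode="valid")
    rows = np.apply_along_axis(blur_1d, 1, padded)   # blur along each row
    return np.apply_along_axis(blur_1d, 0, rows)     # then along each column


def dog_pyramid(image, num_octaves=3, scales_per_octave=3, sigma0=1.6):
    """Return a list of octaves; each octave is a list of DoG images."""
    base = np.asarray(image, dtype=np.float64)
    step = 2.0 ** (1.0 / scales_per_octave)          # sigma ratio between scales
    pyramid = []
    for _ in range(num_octaves):
        blurred = [gaussian_blur(base, sigma0 * step ** s)
                   for s in range(scales_per_octave + 1)]
        # Each DoG image is the difference of two adjacent Gaussian levels.
        pyramid.append([blurred[s + 1] - blurred[s]
                        for s in range(scales_per_octave)])
        # The most blurred level, subsampled by two, seeds the next octave.
        base = blurred[-1][::2, ::2]
    return pyramid
```

Interest points are then the local extrema of these DoG images across position and scale; that search is omitted here.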
Actual Interest Point Detector Output
Building a Descriptor
• Take a 16x16 patch window around the detected interest point
• Subdivide the patch into a 4x4 grid of sub-patches
• For each sub-patch, create an 8-bin histogram over edge orientations, weighted by gradient magnitude
• These lead to a 4x4x8 = 128 element vector → the SIFT descriptor
[Figure: angle histogram over 0 to 2π]
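The steps above can be sketched as follows. This is a minimal illustration of the 4x4x8 histogram layout only: the function name is hypothetical, and the Gaussian weighting, interpolation between bins, rotation to the dominant orientation and clipping used by real SIFT implementations are all left out.

```python
import numpy as np


def sift_like_descriptor(patch):
    """Map a 16x16 grayscale patch to a 128-element orientation histogram."""
    patch = np.asarray(patch, dtype=np.float64)
    if patch.shape != (16, 16):
        raise ValueError("expected a 16x16 patch")

    # Image gradients: np.gradient returns d/d(row) first, then d/d(column).
    gy, gx = np.gradient(patch)
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx) % (2 * np.pi)          # orientation in [0, 2*pi]

    # Quantize the orientation into 8 bins; clamp guards the 2*pi edge case.
    bins = np.minimum((angle / (2 * np.pi) * 8).astype(int), 7)

    # One 8-bin histogram per 4x4-pixel sub-patch, weighted by magnitude.
    histograms = np.zeros((4, 4, 8))
    for row in range(16):
        for col in range(16):
            histograms[row // 4, col // 4, bins[row, col]] += magnitude[row, col]

    vector = histograms.ravel()                       # 4 * 4 * 8 = 128 elements
    norm = np.linalg.norm(vector)
    return vector / norm if norm > 0 else vector
```

For a patch that is a pure horizontal ramp, every gradient points along the positive x axis, so all the weight of each sub-patch falls into its first orientation bin.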
Key point selection
• Basic idea: inlier features behave differently, in a statistical sense, from outlier features.
• Each keypoint gets a relevance value that takes into account its distance from the image center, scale, orientation, peak, and the mean and variance of the SIFT descriptor.
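One way to picture relevance-based selection is to score each characteristic separately, multiply the scores, and keep the best keypoints. The sketch below assumes exactly that; the function name, the dictionary layout and the toy score functions in the usage lines are hypothetical stand-ins for statistics that would be learned from training data.

```python
import numpy as np


def select_keypoints(features, score_functions, keep):
    """Rank keypoints by the product of per-characteristic relevance scores.

    features:        dict mapping a characteristic name to an array of n values
    score_functions: dict mapping the same names to callables returning scores
    keep:            number of keypoints to retain
    Returns (indices of the retained keypoints, relevance of every keypoint).
    """
    n = len(next(iter(features.values())))
    relevance = np.ones(n)
    for name, score in score_functions.items():
        relevance *= score(np.asarray(features[name], dtype=np.float64))
    order = np.argsort(-relevance, kind="stable")[:keep]
    return order, relevance


# Toy usage with made-up score functions (assumptions, not learned statistics).
features = {
    "center_distance": [0.1, 0.9, 0.5],   # normalized distance from image center
    "peak": [0.5, 0.5, 0.4],              # detector response
}
score_functions = {
    "center_distance": lambda d: 1.0 - d,  # closer to the center scores higher
    "peak": lambda p: p,                   # stronger response scores higher
}
selected, relevance = select_keypoints(features, score_functions, keep=2)
```

Multiplying the scores treats the characteristics as independent, which is a simplification chosen here for brevity.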
Local Descriptor Compression: H Mode
• Main idea is to generate a compressed descriptor from uncompressed SIFT by
  • Simple linear combinations of histograms
  • Scalar quantisation of resultant values
  • Adaptive arithmetic coding
• Main benefits
  • Very low computational complexity
  • Negligible memory requirements
  • Highly scalable
  • Allows for very efficient matching and retrieval
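The first two steps can be illustrated with a short sketch. The particular linear combination used here (differences between circularly adjacent bins), the ternary quantiser and its threshold are assumptions chosen for illustration; the slides do not give the actual transform, and the arithmetic coding stage is omitted.

```python
import numpy as np


def compress_descriptor(sift, threshold=0.05):
    """Transform and scalar-quantise a 128-element SIFT descriptor.

    Returns 128 symbols in {-1, 0, +1}.
    """
    # View the descriptor as 16 sub-patch histograms of 8 bins each.
    cells = np.asarray(sift, dtype=np.float64).reshape(16, 8)

    # Hypothetical linear combination: each bin minus its circular predecessor.
    combos = cells - np.roll(cells, 1, axis=1)

    # Scalar quantisation to three levels around zero.
    symbols = np.zeros(combos.shape, dtype=np.int8)
    symbols[combos > threshold] = 1
    symbols[combos < -threshold] = -1
    return symbols.ravel()


def symbol_distance(a, b):
    """Number of positions where two quantised descriptors differ."""
    return int(np.count_nonzero(np.asarray(a) != np.asarray(b)))
```

Because the symbols are small integers, two compressed descriptors can be compared directly without decoding, which is one reason a scheme of this kind allows fast matching.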
Vector Quantizer Scheme: S-Mode
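S-Mode relies on multi-stage vector quantisation (MSVQ). In a generic MSVQ, each stage quantises the residual left by the previous stage, so only one codebook index per stage needs to be sent. The sketch below shows that generic idea with tiny made-up codebooks; the function names and codebook contents are illustrative and do not reproduce the CDVS codebooks.

```python
import numpy as np


def msvq_encode(vector, codebooks):
    """Return one codeword index per stage for the given vector."""
    residual = np.asarray(vector, dtype=np.float64).copy()
    indices = []
    for codebook in codebooks:
        # Pick the codeword closest to what is still unexplained.
        distances = np.sum((codebook - residual) ** 2, axis=1)
        best = int(np.argmin(distances))
        indices.append(best)
        residual = residual - codebook[best]
    return indices


def msvq_decode(indices, codebooks):
    """Rebuild the approximation by summing the selected codewords."""
    return sum(codebook[i] for codebook, i in zip(codebooks, indices))


# Toy two-stage codebooks for 2-D vectors (illustrative values only).
codebooks = [
    np.array([[0.0, 0.0], [4.0, 4.0]]),                 # coarse stage
    np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]),     # refinement stage
]
indices = msvq_encode([4.0, 5.0], codebooks)
approximation = msvq_decode(indices, codebooks)
```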
Location Encoding
• Histogram Map: The positions of the nonzero bins are encoded as binary words by scanning the columns, and the words are compressed by arithmetic coding.
• Histogram Count: The number of coordinates in the nonzero bins is encoded iteratively, by specifying first which bins contain more than 1 keypoint, then which among these contain more than 2 keypoints, and so forth
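The two parts of the location code can be sketched as below. The block size of 4 pixels and the function names are assumptions, and the final arithmetic coding of the binary words is left out; the sketch only shows which bits would be handed to the entropy coder.

```python
import numpy as np


def location_histogram(points, width, height, block=4):
    """Count keypoints falling into each block x block spatial bin."""
    cols = (width + block - 1) // block
    rows = (height + block - 1) // block
    hist = np.zeros((rows, cols), dtype=int)
    for x, y in points:
        hist[int(y) // block, int(x) // block] += 1
    return hist


def histogram_map(hist):
    """One binary word per column: 1 where the bin is nonzero."""
    return [(hist[:, c] > 0).astype(int).tolist() for c in range(hist.shape[1])]


def histogram_count(hist):
    """Layered flags: layer k marks surviving bins with more than k keypoints."""
    counts = hist.T[hist.T > 0]          # nonzero bins in column-scan order
    layers = []
    threshold = 1
    while counts.size:
        layers.append((counts > threshold).astype(int).tolist())
        counts = counts[counts > threshold]   # only these bins continue
        threshold += 1
    return layers


points = [(0, 0), (1, 1), (2, 2), (5, 0), (9, 9), (9, 8)]
hist = location_histogram(points, width=12, height=12)
word_per_column = histogram_map(hist)
count_layers = histogram_count(hist)
```

In this example the first layer flags the two bins that hold more than one keypoint, the second layer flags the one that holds more than two, and the last all-zero layer ends the iteration.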
Extraction times
• SIFT interest point detection and feature extraction make the biggest contribution to extraction time
• Global descriptor computation is as complex as interest point detection
• Local descriptor encoding and coordinate encoding are very fast
15/01/2013 Quantitative evaluation of CDVS extraction and pairwise matching
Mobile Visual Search: Music CDs
[Figure: Query → Stream Music]
Visual Search: eReaders, Printers
[Figure: workflow with the labels Snapshot, Paper copy, Initiate Visual Search, Send Compact Query, Mass Storage, Multimedia Content Retrieval from the cloud, Selective quality & content printing, Content Augmentation, Augmentation 3D models and markers, Transmission of markers and 3D models, 2D / 3D Rendering, Augmentation Rendering, Composition of augmentations and image]
News Finder
Still Pictures - Visual Search
Applications and Use Cases from a Broadcaster’s Point of View
• Logo Detection
• Interactive content consumption
Courtesy RAI
Automotive 3D Top View
[Figure: four cameras (Cam) and an ECU]
Moving Pictures Visual Search
Courtesy Telecom Design
Inter Predicted Descriptors
• Desirable Properties:
  • An inter descriptor is coded in a compact visual stream
  • Expressed in terms of one or more temporally neighboring descriptors.
  • The "inter" part of the term refers to the use of Inter Frame Prediction.
  • Designed to achieve higher compression rates and/or better precision-recall performance
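As an illustration of the idea, a descriptor in the current frame can be coded as a reference to its closest descriptor in a neighboring frame plus a quantised residual. The sketch below assumes a single reference descriptor, an L1 search and a uniform quantiser with step 4; none of these choices comes from the slides, and the function names are hypothetical.

```python
import numpy as np


def encode_inter(current, reference_set, step=4):
    """Code a descriptor relative to its nearest reference descriptor.

    Returns (index of the chosen reference, quantised residual).
    """
    current = np.asarray(current, dtype=np.int32)
    references = np.asarray(reference_set, dtype=np.int32)
    # Choose the temporally neighboring descriptor with the smallest L1 distance.
    distances = np.abs(references - current).sum(axis=1)
    ref_index = int(np.argmin(distances))
    # Uniform scalar quantisation of the prediction residual.
    residual = np.rint((current - references[ref_index]) / step).astype(np.int32)
    return ref_index, residual


def decode_inter(ref_index, residual, reference_set, step=4):
    """Rebuild the descriptor from the reference and the quantised residual."""
    references = np.asarray(reference_set, dtype=np.int32)
    return references[ref_index] + residual * step


previous_frame = [[10, 20, 30], [100, 100, 100]]
current = [12, 20, 26]
ref_index, residual = encode_inter(current, previous_frame)
decoded = decode_inter(ref_index, residual, previous_frame)
```

When consecutive frames are similar, most residual values are zero or small, which is what makes the residual cheaper to entropy-code than the descriptor itself.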
3D Mobile Devices Will Surpass 148 Million in 2015
• Advances in 3D technology are very fast
• Industry adoption opens new opportunities → 3D Visual Search
• From In-Stat studies:
  • ~30% of all handheld game consoles will be 3D by 2015.
  • 3D mobile devices will increase demand for image sensors by 130%.
  • In 2012, notebooks will be the first 3D-enabled mobile devices to reach 1 million units.
  • By 2014, 18% of all tablets will be 3D.
• Nintendo, Fuji, GoPro, Sony, ViewSonic, LG, Origin, Toshiba, Fujitsu, HP, ASUS, Lenovo, Dell, Alienware, HTC and Sharp are focusing on autostereoscopic mobile technologies
[Figure: Microsoft Kinect, Asus Xtion, Google 3D Warehouse, LG Optimus 3D P920, LG Optimus Pad, HTC EVO 3D, Sharp Aquos SH-12C, 3DS by Nintendo]
3D Object Recognition with Kinect
http://www.youtube.com/watch?v=eRW1zG_aONk
Courtesy: CV laboratory University of Bologna
SHOT: Unique Signatures of Histograms for Local Surface Description