Image and video descriptors Advanced Topics in Computer Vision Spring 2010 Weizmann Institute of Science Oded Shahar and Gil Levi

Image and video descriptors

Advanced Topics in Computer Vision

Spring 2010

Weizmann Institute of Science

Oded Shahar and Gil Levi

Outline

• Overview

• Image Descriptors
  – Histograms of Oriented Gradients Descriptors
  – Shape Descriptors
  – Color Descriptors

• Video Descriptors

Overview - Motivation

• The problem we are trying to solve is image similarity.

• Given two images (or image regions), are they similar or not?

Overview - Motivation

• Solution: Image Descriptors.

• An image descriptor "describes" a region in an image.

• To compare two such regions we will compare their descriptors.

Overview - Descriptor

[Figure: two images, each passed through a descriptor function; are the resulting descriptors similar?]

To compare two images, we will compare their descriptors

Overview - Similarity

• But what counts as "similar"?

• It depends on the application!

Overview

• Image (or region) similarity is used in many CV applications, for example:

  – Object recognition
  – Scene classification
  – Image registration
  – Image retrieval
  – Robot localization
  – Template matching
  – Building panoramas
  – And many more…

Overview

• Example – 3D reconstruction from stereo images.

[Figure: grids of raw pixel values from the two images]

• Comparing the raw pixel values directly will not work!

Overview

• Descriptors provide a means for comparing images or image regions.

• Descriptors tolerate certain differences between the regions: scale, rotation, illumination changes, noise, shape, etc.

Overview - Motivation

[Figure: two images, each passed through a descriptor function; are they similar?]

• Again, we cannot rely on the pixels alone…

Overview

Commonly used as follows:

1. Extract features from the image as small regions

2. Describe each region using a feature descriptor

3. Use the descriptors in the application (comparison, training a classifier, etc.)

Overview

• Main problems

  – Feature Detection: where to compute the descriptors? (will cover briefly)

  – Feature Description (Descriptors): how to compute descriptors? (today)

  – Feature Comparison: how to compare two descriptors? (will cover briefly)

Overview - Feature Detection

Detection Methods

Where to compute the descriptors?

• Grid

• Key-Points

• Global

Overview - Feature Detection

Key-Points as Detector Output

• Can be
  – Points
  – Regions (of different orientation, scale and affine transformation)
    • Squares
    • Ellipses
    • Circles
    • Etc.

Overview – Descriptor Comparison

Given two region descriptions, how do we compare them?

• Usually a descriptor comes with its own distance function

• Many descriptors use L2 distance
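
As a minimal sketch of descriptor comparison with the L2 distance mentioned above, assuming descriptors are plain lists of numbers of equal length. The helper names `l2_distance` and `nearest_descriptor` are hypothetical and not part of the lecture.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length descriptor vectors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_descriptor(query, candidates):
    """Index of the candidate descriptor closest to `query` under L2."""
    return min(range(len(candidates)),
               key=lambda i: l2_distance(query, candidates[i]))

print(l2_distance([0.0, 3.0, 4.0], [0.0, 0.0, 0.0]))  # 5.0
```

Matching a region against a set of regions then reduces to a nearest-neighbour search over their descriptors.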

Overview – Descriptor Invariance

• Different applications require different invariances and therefore different descriptors

Similar?

• Different descriptors measure different kinds of similarity

• Descriptors can be invariant to visual effects
  – Illumination
  – Noise
  – Colors
  – Texture

Outline

• Overview

• Image Descriptors
  – Histograms of Oriented Gradients Descriptors
  – Shape Descriptors
  – Color Descriptors

• Video Descriptors

Descriptor

[Figure: two images, each passed through a descriptor function; are the resulting descriptors similar?]

To compare two images, we will compare their descriptors

Descriptors

Types of descriptors

• Intensity based
• Histogram
• Gradient based
• Color based
• Frequency
• Shape
• Combination of the above

Descriptors

Why not use patches?

• Very large representation.
• Not invariant to small deformations in the descriptor location.
• Not invariant to changes in illumination.

Descriptors

Intensity Histogram

[Figure: histogram over intensity values from 0 to 255]

• Not invariant to light intensity change

• Does not capture geometric information
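
A small sketch illustrating both limitations, assuming 8-bit intensities given as a flat list. The function name `intensity_histogram` and the choice of 8 bins are illustrative assumptions, not values from the slides.

```python
def intensity_histogram(pixels, bins=8):
    """Histogram of 8-bit intensities (0..255) using `bins` equal-width bins."""
    hist = [0] * bins
    width = 256 / bins
    for p in pixels:
        hist[min(int(p / width), bins - 1)] += 1
    return hist

patch = [10, 10, 200, 200]           # flattened 2x2 patch
shuffled = [200, 10, 200, 10]        # same pixels, different spatial layout
brighter = [p + 40 for p in patch]   # light intensity shift

# Same histogram although the layout differs: no geometric information.
print(intensity_histogram(patch) == intensity_histogram(shuffled))  # True
# Different histogram for the same content under brighter light.
print(intensity_histogram(patch) == intensity_histogram(brighter))  # False
```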

Descriptors

Histogram of image gradients

• Does not capture geometric information

• Normalize for light intensity invariance

Descriptors

Solution:

• Divide the area into sections

• For each section, compute its own histogram

SIFT: David Lowe, 1999

Descriptors - SIFT

How to compute the SIFT descriptor

Input: an image and a location at which to compute the descriptor

Step 1: Warp the image to the correct orientation and scale, and then extract the feature as a 16x16 pixel patch

Descriptors - SIFT

Step 2: Compute the gradient for each pixel (direction and magnitude)

Step 3: Divide the 16x16 pixels into 16 squares of 4x4 pixels each

Descriptors - SIFT

Step 4: For each square, compute a histogram of gradient directions over 8 directions.

The result: a 128-dimensional feature vector (16 squares x 8 directions).
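
Steps 2 to 4 can be sketched as follows. This is a simplified illustration, not Lowe's implementation: it assumes the 16x16 patch has already been warped (Step 1), uses plain central differences, and omits Gaussian weighting and interpolation between bins. The name `sift_like_descriptor` is hypothetical.

```python
import math

def sift_like_descriptor(patch):
    """Map a 16x16 patch (list of 16 rows of 16 intensities) to a 128-dim unit vector."""
    n = 16
    if len(patch) != n or any(len(row) != n for row in patch):
        raise ValueError("expected a 16x16 patch")
    desc = [0.0] * 128
    for y in range(n):
        for x in range(n):
            # Step 2: gradient by central differences (indices clamped at the border)
            dx = patch[y][min(x + 1, n - 1)] - patch[y][max(x - 1, 0)]
            dy = patch[min(y + 1, n - 1)][x] - patch[max(y - 1, 0)][x]
            magnitude = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2 * math.pi)
            # Step 3: index of the 4x4 square this pixel belongs to (0..15)
            cell = (y // 4) * 4 + (x // 4)
            # Step 4: vote into one of 8 direction bins, weighted by magnitude
            direction = int(angle / (2 * math.pi) * 8) % 8
            desc[cell * 8 + direction] += magnitude
    norm = math.sqrt(sum(v * v for v in desc))
    return [v / norm for v in desc] if norm > 0 else desc

# A horizontal intensity ramp: every gradient points in the +x direction.
ramp = [[10 * x for x in range(16)] for _ in range(16)]
descriptor = sift_like_descriptor(ramp)
print(len(descriptor))  # 128
```

For the ramp patch only the first direction bin of each square is non-zero, which shows how the 16 x 8 layout keeps coarse spatial structure alongside gradient direction.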

Descriptors - SIFT

• Warp the feature into a 16x16 square.
• Divide it into 16 squares of 4x4 pixels.
• For each square, compute a histogram of the gradient directions.

=> Feature vector (128)

Descriptors - SIFT

• Use L2 distance to compare features

• Can use other distance functions
  – χ² (chi-square)
  – Earth mover's distance

• Histogram entries are weighted by gradient magnitude and a Gaussian window (σ is half the window size)

• Normalize the feature to a unit vector
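
A short sketch of unit-length normalization and of the chi-square distance as an alternative to L2. It assumes non-negative histogram entries and the common form 0.5 * Σ (a − b)² / (a + b); the slides do not state which variant is meant, and the function names are hypothetical.

```python
import math

def normalize_l2(v):
    """Scale a descriptor to unit Euclidean length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def chi_square_distance(h1, h2):
    """0.5 * sum((a - b)^2 / (a + b)) over bins; empty bin pairs contribute 0."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total

print(normalize_l2([3.0, 4.0]))                               # [0.6, 0.8]
print(chi_square_distance([1.0, 0.0, 3.0], [1.0, 2.0, 1.0]))  # 1.5
```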

Descriptors - SIFT

Invariance to shift and rotation

• A single histogram does not contain any geometric information

• Using 16 histograms preserves coarse geometric information

Invariance to illumination

• Gradients are invariant to a light intensity shift (i.e. adding a scalar to all the pixels)

• Normalization to unit length adds invariance to a light intensity change (i.e. multiplying all the pixels by a scalar)
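
The two illumination claims can be checked on a toy example. The sketch below uses a single row of pixels and forward differences as the gradient, which is an illustrative simplification; the helper names are hypothetical.

```python
import math

def gradient(signal):
    """Finite differences between neighbouring samples."""
    return [b - a for a, b in zip(signal, signal[1:])]

def unit(v):
    """Scale a non-zero vector to unit L2 length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def close(u, v, tol=1e-12):
    """Element-wise comparison with a small tolerance."""
    return len(u) == len(v) and all(abs(a - b) <= tol for a, b in zip(u, v))

row = [10.0, 12.0, 20.0, 15.0]
shifted = [p + 50.0 for p in row]   # light intensity shift
scaled = [p * 2.0 for p in row]     # light intensity change

print(close(gradient(shifted), gradient(row)))             # True: the shift cancels
print(close(gradient(scaled), gradient(row)))              # False: gradients are doubled
print(close(unit(gradient(scaled)), unit(gradient(row))))  # True after normalization
```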

Descriptors - GLOH

K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. TPAMI 2005

• Similar to SIFT

• Divide the feature into log-polar bins instead of squares
  – 17 log-polar location bins
  – 16 orientation bins
  – We get 17x16=272 dimensions

• Apply PCA to the 272 dimensions and keep 128 components
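
A sketch of how a pixel offset could be mapped to one of the 17 log-polar location bins. It assumes three radial rings with 8 angular sectors in the two outer rings and an undivided central disc (1 + 8 + 8 = 17); the radii 6, 11 and 15 are an assumption not given in the slides, and `gloh_location_bin` is a hypothetical name. The PCA step is not shown.

```python
import math

def gloh_location_bin(dx, dy, radii=(6.0, 11.0, 15.0)):
    """Return a location bin in 0..16 for an offset from the patch centre, or None if outside."""
    r = math.hypot(dx, dy)
    if r > radii[2]:
        return None              # outside the described region
    if r <= radii[0]:
        return 0                 # central disc is not split into sectors
    ring = 0 if r <= radii[1] else 1
    angle = math.atan2(dy, dx) % (2 * math.pi)
    sector = int(angle / (2 * math.pi) * 8) % 8
    return 1 + ring * 8 + sector

print(gloh_location_bin(0, 0))    # 0
print(gloh_location_bin(8, 0))    # 1
print(gloh_location_bin(0, 13))   # 11
print(17 * 16)                    # 272 dimensions before PCA
```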

SURF

• Uses integral images to detect and describe SIFT-like features

• SURF computes descriptors about 3 times faster than SIFT

• SURF is not as good as SIFT at invariance to illumination change and viewpoint change
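
The speed of SURF rests on integral images, which give the sum over any axis-aligned rectangle from four lookups. A minimal sketch of that building block follows (not of SURF itself); the function names are hypothetical.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows < y and columns < x (zero-padded border)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, top, left, bottom, right):
    """Sum of img over rows top..bottom-1 and columns left..right-1, in constant time."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(box_sum(ii, 0, 0, 3, 3))  # 45
print(box_sum(ii, 1, 1, 3, 3))  # 28
```

Because the cost of `box_sum` does not depend on the rectangle size, box filters of any scale cost the same, which is what makes the approach fast.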

Descriptors

Histograms of Oriented Gradients Descriptors

SIFT: Lowe D., 1999

GLOH: Mikolajczyk K., Schmid C., 2005

SURF: Bay H., Ess A., Tuytelaars T., Van Gool L., 2008

Outline

• Overview

• Image Descriptors
  – Histograms of Oriented Gradients Descriptors
  – Shape Descriptors
  – Color Descriptors

• Video Descriptors

Descriptors

Descriptors - Shape Context

Assume we have a good edge detector

Take a patch of edges?

• Not invariant to small deformations in the shape

Descriptors - Shape Context

• Quantize the area around a point using log-polar binning

• In each bin, count the number of edge points
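
A sketch of the log-polar counting described above, assuming edge points are given as (x, y) tuples. The 5 radial by 12 angular layout and the radius limits are illustrative assumptions not stated in the slides, and the usual normalization of distances by the mean point distance is omitted.

```python
import math

def shape_context(center, edge_points, n_r=5, n_theta=12, r_min=1.0, r_max=32.0):
    """Count edge points around `center` in an n_r x n_theta log-polar histogram."""
    hist = [[0] * n_theta for _ in range(n_r)]
    cx, cy = center
    log_span = math.log(r_max / r_min)
    for x, y in edge_points:
        dx, dy = x - cx, y - cy
        r = math.hypot(dx, dy)
        if r == 0 or r >= r_max:
            continue  # skip the centre itself and points beyond the outer ring
        # Radial bin on a log scale; points closer than r_min fall into bin 0.
        r_bin = int(math.log(r / r_min) / log_span * n_r)
        r_bin = min(max(r_bin, 0), n_r - 1)
        angle = math.atan2(dy, dx) % (2 * math.pi)
        t_bin = int(angle / (2 * math.pi) * n_theta) % n_theta
        hist[r_bin][t_bin] += 1
    return hist

points = [(3, 0), (0, 3), (-20, 0), (100, 0)]
h = shape_context((0, 0), points)
print(sum(sum(row) for row in h))  # 3 (the point at distance 100 is out of range)
```

The log scale makes the bins near the centre small and the outer bins large, so the count is tolerant to small deformations far from the reference point.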

Complex Notion of Similarity

The Local Self-Similarity Descriptor

[Figure: input image, correlation surface (from SSD), image descriptor]

The Local Self-Similarity Descriptor

Properties & Benefits:

1. A unified treatment of repetitive patterns, color, texture, edges

2. Captures the shape of a local region

3. Invariant to appearance

4. Accounts for small local affine & non-rigid deformations

[Figure: input image, correlation surface, image descriptor (MAX); examples for edges, color and texture]
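
A sketch of the correlation-surface step only: the central patch is compared by SSD with every patch in its surrounding region, and the SSD is mapped to a similarity in (0, 1]. The exponential mapping with a fixed `variance`, the square region, and all names and default sizes are assumptions made for illustration; the log-polar binning with a maximum per bin is not shown.

```python
import math

def ssd(img, y0, x0, y1, x1, size):
    """Sum of squared differences between two size x size patches (top-left corners given)."""
    return sum((img[y0 + dy][x0 + dx] - img[y1 + dy][x1 + dx]) ** 2
               for dy in range(size) for dx in range(size))

def correlation_surface(img, cy, cx, patch=3, radius=2, variance=100.0):
    """Similarity of the patch centred at (cy, cx) to each patch centred within `radius`.

    The whole region must lie inside the image.
    """
    half = patch // 2
    surface = []
    for oy in range(-radius, radius + 1):
        row = []
        for ox in range(-radius, radius + 1):
            d = ssd(img, cy - half, cx - half, cy + oy - half, cx + ox - half, patch)
            row.append(math.exp(-d / variance))
        surface.append(row)
    return surface

img = [[(3 * x + 5 * y) % 7 for x in range(7)] for y in range(7)]
inverted = [[255 - p for p in row] for row in img]

surface = correlation_surface(img, 3, 3)
print(surface[2][2])                                  # 1.0 (the patch matches itself)
print(correlation_surface(inverted, 3, 3) == surface)  # True
```

The second check illustrates "invariant to appearance": inverting the intensities changes every pixel but leaves the internal layout of similarities unchanged.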

Template image:

Descriptors

Shape Descriptors

Allow measuring shape similarity

• Shape Context: Belongie S., Malik J., Puzicha J. Shape Matching and Object Recognition Using Shape Contexts. PAMI, 2002.

• Local Self-Similarity: Shechtman E., Irani M. Matching Local Self-Similarities across Images and Videos. CVPR, 2007.

• Geometric Blur: Berg A. C., Malik J. Geometric Blur for Template Matching. CVPR, 2001.

Outperform the commonly used SIFT in an object classification task

Horster E., Greif T., Lienhart R., Slaney M. Comparing local feature descriptors in pLSA-based image models.

Outline

• Overview

• Image Descriptors
  – Histograms of Oriented Gradients Descriptors
  – Shape Descriptors
  – Color Descriptors

• Video Descriptors

Color Descriptors

Color spaces

• RGB

• HSV

• Opponent

Color Descriptors

Opponent color space

• Color information is represented by channels O1 and O2

• O1 and O2 are invariant to an offset that is equal in all channels

• Intensity information is represented by channel O3
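
The transcript does not include the slide's formula, so the sketch below assumes the definition commonly used in the color-descriptor literature: O1 = (R − G)/√2, O2 = (R + G − 2B)/√6, O3 = (R + G + B)/√3. The function name `rgb_to_opponent` is hypothetical.

```python
import math

def rgb_to_opponent(r, g, b):
    """Convert one RGB pixel to opponent channels (O1, O2, O3)."""
    o1 = (r - g) / math.sqrt(2)
    o2 = (r + g - 2 * b) / math.sqrt(6)
    o3 = (r + g + b) / math.sqrt(3)
    return o1, o2, o3

pixel = (100, 50, 25)
offset = tuple(c + 20 for c in pixel)   # the same offset added to every channel

o = rgb_to_opponent(*pixel)
p = rgb_to_opponent(*offset)
print(o[0] == p[0], o[1] == p[1])  # True True: O1 and O2 ignore the offset
print(o[2] == p[2])                # False: O3 carries the intensity
```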

Color Descriptors

• RGB color histogram

• Opponent O1, O2

• Color moments
  – Use all generalized color moments up to the second degree and the first order
  – Give information on the distribution of the colors

Color Descriptors

• RGB-SIFT - SIFT descriptors are computed for every RGB channel independently
  – Normalize each channel separately
  – Invariant to light color change

• rg-SIFT - SIFT descriptors over the r and g channels of the normalized-RGB space (2x128 dimensions per descriptor)

• OpponentSIFT - describes all the channels in the opponent color space

• C-SIFT - uses O1/O3 and O2/O3 of the opponent color space (2x128 dimensions per descriptor)
  – Scale-invariant with respect to light intensity
  – Due to the definition of the color space, the offset does not cancel out when taking the derivative

G. J. Burghouts and J. M. Geusebroek. Performance evaluation of local color invariants. 2009
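
As a sketch of the normalized-RGB channels that rg-SIFT is computed over, assuming the usual definition r = R/(R+G+B) and g = G/(R+G+B). The function name and the handling of black pixels are illustrative choices, not taken from the slides.

```python
def normalized_rg(r, g, b):
    """Return the (r, g) chromaticity of an RGB pixel; black pixels map to (0.0, 0.0)."""
    s = r + g + b
    if s == 0:
        return (0.0, 0.0)
    return (r / s, g / s)

pixel = (60, 30, 10)
brighter = tuple(2 * c for c in pixel)   # light intensity change (scaling)

print(normalized_rg(*pixel))     # (0.6, 0.3)
print(normalized_rg(*brighter))  # (0.6, 0.3)
```

Scaling all channels by the same factor leaves r and g unchanged, so descriptors built on them are insensitive to that kind of intensity change.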

Color Descriptors

Studies the invariance properties and the distinctiveness of color descriptors under:

• Light intensity change
• Light intensity shift
• Light intensity shift and change
• Light color change
• Light color change and shift

Color Descriptors

Increased invariance can reduce discriminative power

Color Descriptors

Descriptor performance on image benchmark

Descriptors

How to choose your descriptor?

What is the similarity that you need for your application?

Descriptors

Name | Description | Captures
SIFT | Gradient histograms | Texture, gradients
GLOH | Variant of SIFT, log-polar descriptor | Texture, gradients
SURF | Faster variant of SIFT with lower performance | Texture, gradients
Shape Context | Histogram of edges, good for shape description | Shape, edges
Self-Similarity | Higher-level shape description, invariant to appearance | Shape
RGB-SIFT | SIFT descriptors computed for every RGB channel independently | Texture, gradients
C-SIFT | SIFT based on the opponent color space, shown to be better than SIFT for object and scene recognition | Texture, gradients, color

Outline

• Overview

• Image Descriptors
  – Histograms of Oriented Gradients Descriptors
  – Shape Descriptors
  – Color Descriptors

• Video Descriptors

Video Descriptors

Application: Action recognition

Video: More than just a sequence of images

Want to capture temporal information

Video Descriptors

• Space-Time SIFT 

P. Scovanner, S. Ali, M. Shah. A 3-dimensional SIFT descriptor and its application to action recognition. 2007

64-directions histogram

Video Descriptors

Actions as Space-Time Shapes

3D Shape Context

M. Grundmann, F. Meier, and I. Essa (2008) “3D Shape Context and Distance Transform for Action Recognition”

Represent an action in a video sequence by a 3D point cloud extracted by sampling 2D silhouettes over time

The Local Self-Similarity Descriptor in Video

Action detection

[Figure: a space-time patch is correlated with its surrounding space-time region of the input video (axes x, y, time), giving a correlation volume and then the video descriptor]

Video Descriptors

• On Space-Time Interest Points; Ivan Laptev
  – Local image features provide compact and abstract representations of images, e.g. corners
  – Extend the concept of a spatial corner detector to a spatio-temporal corner detector

Space-Time Interest Points

• Consider a synthetic sequence of a ball moving towards a wall and colliding with it

• An interest point is detected at the collision point

Space-Time Interest Points

• Consider a synthetic sequence of 2 balls moving towards each other

• Different interest points are detected at different spatial and temporal scales

[Figure: interest points at a coarser scale]

Conclusion

• The problem we are trying to solve is similarity between images and videos.

• Descriptors provide a solution

Conclusion

• Tradeoff between preserving information and obtaining invariance.

• Tradeoff between keeping the geometric structure and obtaining invariance properties (perturbations & rotations).

Thank You