
What should be done at the Low Level?

16-721: Learning-Based Methods in Vision

A. Efros, CMU, Spring 2009

Class Introductions

• Name:
• Research area / project / advisor
• What do you want to learn in this class?
• When I am not working, I ______________
• Favorite fruit:

Analysis Projects / Presentations

Wed: Varun

note-taker: Dan

Next Wed: Dan

note-taker: Edward

Dan and Edward need to meet with me ASAP

Varun needs to meet second time

Four Stages of Visual Perception

Image-Based Processing

Surface-Based Processing

Object-Based Processing

Category-Based Processing

Vision, Audition, Light, Movement, Odor (etc.)

Ceramic cup on a table

David Marr, 1982

The Retinal Image

An Image (blowup) Receptor Output

Image-based Representation

Primal Sketch (Marr)

An Image (Line Drawing)

Retinal Image

Image-based processes

Edges, Lines, Blobs, etc.

Surface-based Representation

Primal Sketch, 2.5-D Sketch

Image-based Representation

Surface-based processes

Stereo, Shading, Motion

Koenderink’s trick

Object-based Representation

Object-based processes

Grouping, Parsing, Completion, etc.

Surface-based Representation

2.5-D Sketch, Volumetric Sketch

Geons (Biederman '87)

Category-based Representation

Category-based processes

Pattern-Recognition

Spatial-description

Object-based Representation

Volumetric Sketch, Basic-level Category

Category: cup

Color: light-gray

Size: 6”

Location: table

We likely throw away a lot

line drawings are universal

However, things are not so simple…

● Problems with feed-forward model of processing…

two-tone images

hair (not shadow!)

inferred external contours

“attached shadow” contour

“cast shadow” contour

Finding 3D structure in two-tone images requires distinguishing cast shadows, attached shadows, and areas of low reflectivity

The images do not contain this information a priori (at low level)

Cavanagh's argument

Marr's model (circa 1980) Cavanagh’s Model (circa 1990s)

Feedforward vs. feedback models

stimulus

2D shape

memory

3D shape

2½D sketch

Object

3D model

feedback

basic recognition with 2D primitives

reconstruction of shape from image features

object recognition by matching 3D models

primal sketch

A Classical View of Vision

Grouping / Segmentation

Figure/Ground Organization

Object and Scene Recognition

Low-level: pixels, features, edges, etc.

Mid-level

High-level

A Contemporary View of Vision

Figure/Ground Organization

Grouping / Segmentation

Object and Scene Recognition

Low-level: pixels, features, edges, etc.

Mid-level

High-level

But where do we draw this line?

Question #1: What (if anything) should be done at the “Low-Level”?

N.B. I have already told you everything that is known. From now on, there aren’t any answers. Only questions…

Who cares? Why not just use pixels?

Pixel differences vs. Perceptual differences

Eye is not a photometer!

"Every light is a shade, compared to the higher lights, till you come to the sun; and every shade is a light, compared to the deeper shades, till you come to the night."

— John Ruskin, 1879

Cornsweet Illusion

Campbell-Robson contrast sensitivity curve

Sine wave

Metamers

Question #1: What (if anything) should be done at the “Low-Level”?

i.e. What input stimulus should we be invariant to?

Invariant to:

• Brightness / Color changes?

small brightness / color changes

low-frequency changes

But one can be too invariant
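One simple way to get invariance to global brightness and contrast changes is to normalize each patch to zero mean and unit variance. The sketch below assumes that choice for illustration only; the slides do not prescribe it, and the names `normalize_patch` and `eps` are hypothetical. It also hints at the "too invariant" caveat: a nearly flat patch with tiny fluctuations gets stretched to full contrast.

```python
import math

def normalize_patch(patch, eps=1e-8):
    """Zero-mean, unit-variance normalization of a flat list of pixel values."""
    n = len(patch)
    mean = sum(patch) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in patch) / n)
    # eps (an assumed small constant) avoids division by zero on constant patches
    return [(p - mean) / (std + eps) for p in patch]

patch = [10.0, 20.0, 30.0, 40.0]
# The same patch under a gain (x2) and an offset (+15)
brighter = [2.0 * p + 15.0 for p in patch]

a = normalize_patch(patch)
b = normalize_patch(brighter)
max_diff = max(abs(x - y) for x, y in zip(a, b))
print(max_diff)  # very close to zero: the two patches now look the same

# Too invariant: a nearly uniform patch with tiny fluctuations
almost_flat = [100.0, 100.1, 99.9, 100.0]
stretched = normalize_patch(almost_flat)
print(max(abs(s) for s in stretched))  # the tiny fluctuation is blown up to order 1
```

After normalization the representation can no longer tell a strong pattern from a faint one, which may or may not be what a later stage wants.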

Invariant to:

• Edge contrast / reversal?

I shouldn’t care what background I am on!

but be careful of exaggerating noise

Representation choices

Raw Pixels

Gradients:

Gradient Magnitude:

Thresholded gradients (edge + sign):

Thresholded gradient mag. (edges):
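The representation choices listed above can be sketched on a toy image. This is a minimal illustration, assuming central-difference gradients and a hand-picked threshold; the helper names (`gradients`, `magnitude`, `signed_edges`, `edges`) are hypothetical and not from the lecture. It compares a dark-to-bright step edge with its contrast-reversed version.

```python
import math

def gradients(img):
    """Central-difference gradients (gx, gy); border pixels are left at 0."""
    h, w = len(img), len(img[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return gx, gy

def magnitude(gx, gy):
    """Gradient magnitude: discards the sign (contrast polarity)."""
    return [[math.hypot(a, b) for a, b in zip(rx, ry)] for rx, ry in zip(gx, gy)]

def signed_edges(g, t):
    """Thresholded gradient (edge + sign): +1, -1, or 0 per pixel."""
    return [[1 if v > t else (-1 if v < -t else 0) for v in row] for row in g]

def edges(mag, t):
    """Thresholded gradient magnitude (edges): 1 or 0 per pixel."""
    return [[1 if v > t else 0 for v in row] for row in mag]

# Raw pixels: a vertical step edge and its contrast-reversed twin
dark_to_bright = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
bright_to_dark = [[10, 10, 10, 0, 0, 0] for _ in range(5)]

gx1, gy1 = gradients(dark_to_bright)
gx2, gy2 = gradients(bright_to_dark)

signed1 = signed_edges(gx1, 1.0)
signed2 = signed_edges(gx2, 1.0)
edges1 = edges(magnitude(gx1, gy1), 1.0)
edges2 = edges(magnitude(gx2, gy2), 1.0)

print(signed1[2])              # [0, 0, 1, 1, 0, 0]
print(signed2[2])              # [0, 0, -1, -1, 0, 0]
print(edges1[2] == edges2[2])  # True
```

The signed representation still distinguishes the two polarities, while the thresholded magnitude treats them identically. The threshold is what keeps small, noisy gradients from being promoted to edges.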

Typical filter bank

pyramid (e.g. wavelet, steerable, etc.)

Filters

Input image

What does it capture?

v = F * Patch (where F is filter matrix)
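Reading `v = F * Patch` as a matrix-vector product, each row of F is one flattened filter and each entry of v is that filter's response to the patch. The sketch below assumes a made-up bank of three 2x2 filters purely for illustration; it is not the filter bank shown on the slide, and `filter_responses` is a hypothetical name.

```python
def filter_responses(F, patch):
    """Return v = F * patch, where each row of F is one flattened filter."""
    return [sum(f * p for f, p in zip(row, patch)) for row in F]

# Hypothetical bank for 2x2 patches, flattened row-major:
# [top-left, top-right, bottom-left, bottom-right]
F = [
    [0.25, 0.25, 0.25, 0.25],  # local average
    [-0.5, 0.5, -0.5, 0.5],    # horizontal derivative (right minus left)
    [-0.5, -0.5, 0.5, 0.5],    # vertical derivative (bottom minus top)
]

# A vertical edge: dark left column, bright right column
patch = [1.0, 3.0, 1.0, 3.0]

v = filter_responses(F, patch)
print(v)  # [2.0, 2.0, 0.0]
```

The response vector says the patch has mean brightness 2, changes horizontally, and does not change vertically; that is what this particular bank captures about it.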

Why these filters?

Learned filters

Spatial invariance

• Rotation, Translation, Scale
• Yes, but not too much…

• In brain: complex cells – partial invariance

• In Comp. Vision: histogram-binning methods (SIFT, GIST, Shape Context, etc) or, equivalently, blurring (e.g. Geometric Blur) -- will discuss later
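The idea behind histogram binning can be shown with a toy example: counting edge pixels inside coarse spatial cells gives the same descriptor when an edge moves a little, but a different one when it moves a lot. This is only a sketch of the binning principle, not an implementation of SIFT, GIST, or Shape Context; `cell_counts` and `vertical_edge` are hypothetical helpers, and the cell size of 4 is an arbitrary assumption.

```python
def cell_counts(edge_map, cell):
    """Sum a binary edge map over non-overlapping cell x cell blocks."""
    h, w = len(edge_map), len(edge_map[0])
    return [
        [
            sum(edge_map[y][x]
                for y in range(cy, cy + cell)
                for x in range(cx, cx + cell))
            for cx in range(0, w, cell)
        ]
        for cy in range(0, h, cell)
    ]

def vertical_edge(col, h=4, w=8):
    """Binary edge map with a single vertical edge at column `col`."""
    return [[1 if x == col else 0 for x in range(w)] for _ in range(h)]

base = vertical_edge(1)
small_shift = vertical_edge(2)   # edge moved by one pixel
large_shift = vertical_edge(5)   # edge moved into the next cell

print(base == small_shift)                   # False: raw maps differ
print(cell_counts(base, 4))                  # [[4, 0]]
print(cell_counts(small_shift, 4))           # [[4, 0]]
print(cell_counts(large_shift, 4))           # [[0, 4]]
```

The binned descriptor ignores the one-pixel shift but still registers the large one, which matches the "yes, but not too much" requirement on the slide.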

Many lives of a boundary

Often, context-dependent…

input, Canny, human

Maybe low-level is never enough?
