cs 764 seminar in computer vision

CORNELL UNIVERSITY

CS 764 Seminar in Computer Vision

Ramin Zabih

Fall 1998

Course mechanics

Meeting time will be Tue/Thu 11-12, here
• Starting a week from today

Home page is now up: www/CS764

Assignment: present one paper
• You’ll have a lot of freedom, but you need to talk to me in advance
• Some possible papers will be posted shortly

Topic of this seminar

The use of “knowledge” in the analysis of visual data
• Sometimes called “context”

Clearly this is vital
• On both psychological and technical grounds
• But how? No one has much of an idea…

What is the interface between reasoning and perception? (Or, mind and body?)

What is the visual system’s “contract”?

Two standard (bad) answers

Answer 1: describe the scene in terms of surfaces [low-level vision]
• There is a green patch 2” wide 1’ away

Answer 2: describe the scene in terms of objects [model-based recognition]
• Start with a set of 3D models (modelbase)
• Determine position and pose

Why are these answers wrong?

They are almost purely data-driven
• Bottom-up (from the data) versus top-down (from somewhere else)

They report “objective fact”, with no room for the task at hand
• For a given image, there is only one right answer

Other problems as well
• Not very useful, etc.

Technical and psychological arguments

There are technical arguments against this
• Vision is an inverse problem
– Many 3D scenes could explain a single 2D image
• On engineering grounds, this makes no sense
– Ultimately, perception is used for some task

The human perceptual system has both top-down and bottom-up elements
• Various optical illusions
– Two people can look at the same picture and see something completely different
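A minimal sketch of why vision is an inverse problem, assuming an idealized pinhole camera with focal length f and all points in front of the camera (Z > 0). The names `project` and `scale_scene` are hypothetical, chosen for illustration. Scaling every scene point by the same factor k makes the scene k times bigger and k times farther away, yet leaves every ratio X/Z and Y/Z unchanged, so two different 3D scenes produce the same 2D image.

```python
import math

# Idealized pinhole camera (an assumption for this sketch):
# a 3D point (X, Y, Z) with Z > 0 projects to (f*X/Z, f*Y/Z).
def project(point, f=1.0):
    x, y, z = point
    return (f * x / z, f * y / z)

# Hypothetical helper: multiply every coordinate by k, giving a scene
# that is k times larger and k times farther from the camera.
def scale_scene(points, k):
    return [(k * x, k * y, k * z) for (x, y, z) in points]

scene = [(1.0, 2.0, 4.0), (-1.0, 0.5, 2.0), (0.0, 0.0, 5.0)]
bigger_and_farther = scale_scene(scene, 3.0)

image_a = [project(p) for p in scene]
image_b = [project(p) for p in bigger_and_farther]

# The two different 3D scenes yield the same image (up to rounding).
same_image = all(
    math.isclose(ua, ub) and math.isclose(va, vb)
    for (ua, va), (ub, vb) in zip(image_a, image_b)
)
print(same_image)  # True
```

Any positive k works, so the image alone cannot tell a small nearby object from a large distant one; some assumption has to break the tie.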

Your vision system doesn’t listen

It makes “reasonable” assumptions

Low-level vision has its solution

Inverse problems require assumptions

The assumptions for low-level vision are extremely general (i.e., weak)
• They reflect the physics of the visible world
• For example, motion, depth, and intensity tend to be “coherent”
– Saying that every pixel is moving differently from its neighbors is a very unlikely answer
– The world we live in tends not to do that
– Helmholtz’s “unconscious inference”
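The coherence assumption is often written as a smoothness penalty that grows when neighboring pixels disagree. A minimal sketch, assuming a 2D field of per-pixel values (for example, horizontal motion) and a sum of squared differences over 4-connected neighbors; `smoothness_cost` is a hypothetical name, and squared differences are only one common choice of penalty. A field in which every pixel moves like its neighbors costs nothing, while one in which every pixel moves differently from its neighbors is heavily penalized, i.e., treated as a very unlikely answer.

```python
# Hypothetical smoothness penalty: sum of squared differences between
# each pixel and its right and lower neighbors (4-connectivity).
def smoothness_cost(field):
    rows, cols = len(field), len(field[0])
    cost = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # right neighbor
                cost += (field[r][c] - field[r][c + 1]) ** 2
            if r + 1 < rows:  # lower neighbor
                cost += (field[r][c] - field[r + 1][c]) ** 2
    return cost

# Per-pixel horizontal motion (in pixels) for two candidate answers.
coherent = [[2, 2, 2],
            [2, 2, 2],
            [2, 2, 2]]
incoherent = [[0, 5, 1],
              [4, 2, 6],
              [3, 7, 0]]

print(smoothness_cost(coherent))    # 0.0
print(smoothness_cost(incoherent))  # 238.0
```

Read probabilistically, a prior proportional to exp(-cost) makes the coherent field far more probable than the incoherent one, which is one common way to turn the coherence assumption into a preference among the many answers an inverse problem allows.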

We’ll need high-level vision

Most of the field is low-level vision or model-based recognition
• Partly to avoid the confusion CS764 is about

Key question: how to avoid brittleness?
• Can make the visual system compute just what we need for our task (e.g., berries)
• But how to handle the unexpected (e.g., lions)?

A short historical perspective

In the 1960s, vision was completely task-specific
• A black blob in the center of the image is a telephone
• These efforts are now considered “hacks”

In the 1970s, vision became completely general
• Marr pushed the field towards precise technical questions
• Low-level vision and recognition became dominant

Tasks strike back

In the mid-1980s, several attempts were made to reintroduce a notion of task
• Active/animate/purposive vision

These attempts are widely viewed as failures, for good reasons
• We’ll look at them briefly next week

It’s not enough to have good intuitions
• There needs to be technical merit as well

Desiderata

Technical solutions (algorithms) that are very roughly consistent with human data
• Goal is not AI, psychology, or philosophy

Provide visual summaries useful for tasks, but degrade gracefully
• Handle open/unstructured environments
• Deal with expectations and breakdown

Our path for 764

No good computational work to read
• Perhaps Vera will fix this?

We will examine papers along these lines:
• Computational approaches that failed
• Psychological data that is highly suggestive
• Neurologically inspired architectures
• Cognitive scientists and philosophers
– Their goal is argument, not algorithm!
– They’ve thought the most about these issues
