
Page 1:

Recap

• Low Level Vision
  – Input: pixel values from the imaging device
  – Data structure: 2D array, homogeneous
  – Processing:
    • 2D neighborhood operations
    • Histogram-based operations
    • Image enhancements
    • Feature extraction
      – Edges
      – Regions

Page 2:

Recap

• Mid Level Vision
  – Input: features from low level processing
  – Data structures: lists, arrays, heterogeneous
  – Processing:
    • Pixel/feature grouping operations
    • Model based operations
    • Object descriptions
      – Lines – orientation, location
      – Regions – central moments
      – Relationships amongst objects

Page 3:

What’s Left To Do?

• High Level Vision
  – Input:
    • Symbolic objects from mid level processing
    • Models of the objects to be recognized (a priori knowledge)
  – Data structures: lists, arrays, heterogeneous
  – Processing:
    • Object correspondence
      – Local correspondences represent individual object (or component) recognitions
      – Global correspondences represent recognition of an object in the context of the scene
    • Search problem
      – Typified by the “NP-Complete” question

Page 4:

High Level Vision

• Goal is to interpret the 2D structure (image pixels) as a 3D scene

• Humans do this very well
  – This is a problem for computer vision researchers
  – Our competition is fierce and unrelenting
• To achieve our goal we don’t necessarily need to mimic the biological system
  – We’re not trying to explain how human vision works, we’re just trying to achieve comparable results

Page 5:

High Level Vision

• The 3D scene is made up of
  – Objects
  – Illumination due to light sources
• The appearance of boundaries between object surfaces is dependent on their orientation relative to the light source (and surface material, sensing device…)
  – This is where we get edges from

Page 6:

Labeling Edges

• In a 3D scene, edges have very specific meanings
  – They separate objects from one another
    • Occlusion
  – They demark sudden changes in surface orientation within a single object
  – They demark sudden changes in surface albedo (light reflectance)
  – They mark a shadow cast by an object

Page 7:

Labeling Edges

• Edge detectors can provide some information regarding the meaning

Page 8:

Labeling Edges

• But additional information must be provided through logic and reasoning

• Under some constrained “worlds” we can identify all possible edge and vertex types
  – “Blocks World” assumption
    • Toy blocks
    • Trihedral vertices
  – Sounds simple, but much has been learned from studying such worlds

Page 9:

Labeling Edges

• Blade edges
  – One continuous surface occludes another
  – Surface normal changes smoothly
  – Curved surfaces
  – Single arrowhead

Page 10:

Labeling Edges

• Limb edges
  – One continuous surface occludes another
  – Surface normal changes smoothly and becomes perpendicular to the viewing angle
  – Surface ultimately occludes itself
  – Curved surfaces
  – Double arrowhead

Page 11:

Labeling Edges

• Mark edges
  – Change of albedo (reflectance) on the surface
  – A marking on the surface
  – No occlusion is involved
  – Surfaces of any shape
  – Labeled “M”

Page 12:

Labeling Edges

• Crease edges
  – Change in surface due to the joining of two surfaces
  – Can be convex or concave
  – No occlusion is involved
  – Abrupt changes – not smooth, curved surfaces
  – Labeled Convex (+) or Concave (-)

Page 13:

Labeling Edges

• Illumination edges
  – Change in illumination
  – Typically a shadow
  – No surface changes
  – Labeled “S”

Page 14:

Labeling Edges

• Jump edges
  – A blade or limb with a large depth discontinuity across the boundary

Page 15:

Labeling Edges

• This edge labeling scheme has been proposed by a few researchers
• There are variations
• You don’t have to do it this way if it doesn’t suit the needs of your application
• Choose whatever scheme you want
  – Just make sure you are consistent (one way to encode such a scheme is sketched below)
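To make the advice concrete, here is a minimal sketch of one way to pin the labels down as data in Python. The enum name, the character codes, and the example pairing are illustrative choices, not part of the lecture.

    from enum import Enum

    class EdgeLabel(Enum):
        """One possible edge-label vocabulary, mirroring the slides above."""
        BLADE = ">"      # occluding edge, single arrowhead
        LIMB = ">>"      # smooth self-occluding surface, double arrowhead
        MARK = "M"       # albedo change (surface marking)
        CONVEX = "+"     # crease, convex
        CONCAVE = "-"    # crease, concave
        SHADOW = "S"     # illumination (shadow) edge
        # Jump edges are blades or limbs with a large depth discontinuity,
        # so they reuse the BLADE/LIMB labels in this sketch.

    # A labeled scene edge could then be stored as a simple pair
    labeled_edge = ("e17", EdgeLabel.CONVEX)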

Page 16:

Vertices

• A Vertex is the place where two or more lines intersect

• Observations have been made regarding the types of vertices that are possible when mapping 3D objects into a 2D space
• Assumes trihedral corners
  – Restricted to a “blocks world” but may be useful elsewhere

Page 17:

Vertices

• L junctions

[Figure: the legal edge labelings (+, -, arrows) for L junctions]

Page 18:

Vertices

• Arrow junctions

[Figure: the legal edge labelings (+, -, arrows) for Arrow junctions]

Page 19:

Vertices

• Fork junctions

[Figure: the legal edge labelings (+, -, arrows) for Fork junctions]

Page 20:

Vertices

• T junctions

[Figure: the legal edge labelings (+, -, arrows) for T junctions]

Page 21:

Vertex Labeling

• Assume the shape is sitting on a table

[Figure: a block resting on a table, with its edges labeled +, -, and S; one edge is marked “?”]

• The “?” is a special edge called a “crack”
  – It may be labeled as an “S”

Page 22:

Edge/Vertex Labeling

• To do such labeling programmatically one would employ a relaxation algorithm
  – Essentially, you start by assigning all labels to all edges
  – Eliminate inconsistent labels based on the vertex type, working on all edges in parallel
  – Repeat until no more changes occur (a minimal sketch follows)
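The sketch below is one minimal way to implement this discrete relaxation (Waltz-style constraint propagation). The junction catalog shown is a tiny illustrative subset rather than the full trihedral-world catalog, and all edge and junction names are made up for the example.

    # Discrete relaxation: start with every label on every edge, then repeatedly
    # discard labels that no junction interpretation supports.
    ALL_LABELS = {"+", "-", ">", "<"}    # convex, concave, and two occlusion directions

    # Hypothetical catalog: junction type -> allowed label tuples, one label per
    # incident edge in a fixed order. A real system would use the full catalog
    # of legal junction labelings.
    CATALOG = {
        "L":     {("+", "+"), ("-", "-"), (">", "<")},
        "ARROW": {("+", "+", "-"), (">", "-", "<")},
    }

    def relax(junctions, edge_labels):
        """junctions: list of (junction type, [edge ids in order]).
        edge_labels: dict edge id -> set of still-possible labels (mutated in place)."""
        changed = True
        while changed:                       # repeat until no more changes occur
            changed = False
            for jtype, edges in junctions:
                allowed = CATALOG[jtype]
                for i, e in enumerate(edges):
                    # keep a label for edge e only if some catalog entry uses it at
                    # position i while staying consistent with the other edges' labels
                    supported = {combo[i] for combo in allowed
                                 if all(combo[k] in edge_labels[edges[k]]
                                        for k in range(len(edges)))}
                    pruned = edge_labels[e] & supported
                    if pruned != edge_labels[e]:
                        edge_labels[e] = pruned
                        changed = True
        return edge_labels

    # Usage: every edge starts with every label; junction constraints prune
    # the sets until a fixed point is reached.
    edge_labels = {e: set(ALL_LABELS) for e in ("e1", "e2", "e3")}
    junctions = [("ARROW", ["e1", "e2", "e3"]), ("L", ["e1", "e2"])]
    print(relax(junctions, edge_labels))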

Page 23:

Perceptual Objects

• Outside of the blocks world
  – Long, thin, ribbon-like objects
    • Made of [nearly] parallel lines (a grouping test is sketched below)
    • Straight or curved
  – Region objects
    • Homogeneous intensity
    • Homogeneous texture
    • Bounded by well defined edges
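As a concrete illustration of the ribbon idea, the sketch below tests whether two line segments are nearly parallel and close enough to be grouped as the two sides of a ribbon. The angle and gap thresholds are arbitrary assumptions, not values from the lecture.

    import math

    def ribbon_candidate(seg_a, seg_b, max_angle_deg=10.0, max_gap=20.0):
        """Return True if two segments ((x1, y1), (x2, y2)) could be the two
        [nearly] parallel sides of a ribbon. Thresholds are illustrative."""
        def angle(seg):
            (x1, y1), (x2, y2) = seg
            return math.atan2(y2 - y1, x2 - x1)

        def midpoint(seg):
            (x1, y1), (x2, y2) = seg
            return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

        # nearly parallel: small orientation difference, modulo 180 degrees
        da = abs(angle(seg_a) - angle(seg_b)) % math.pi
        da = min(da, math.pi - da)
        if math.degrees(da) > max_angle_deg:
            return False

        # close together: midpoints separated by at most max_gap pixels
        (ax, ay), (bx, by) = midpoint(seg_a), midpoint(seg_b)
        return math.hypot(ax - bx, ay - by) <= max_gap

    # Two horizontal-ish segments about 8 pixels apart -> a ribbon candidate
    print(ribbon_candidate(((0, 0), (100, 2)), ((0, 8), (100, 9))))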

Page 24:

Perceptual Objects

Page 25:

Perceptual Organization

Page 26:

Model Graph Representation

[Figure: a model graph with ribbon nodes R1, R2, R3 and a region node G1, linked by edges labeled Intersects(70), Intersects(70), Intersects(40), and Bounds]

Rx – Ribbon structure
Gx – Region structure

Page 27:

Model Graph Representation

• Each node may have additional information (one possible encoding is sketched below)
  – Ribbon nodes may have
    • Width
    • Orientation
    • Intensity
  – Region nodes may have
    • Moments
    • Intensity
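Purely as an illustration of how such an attributed graph might be written down, here is a plain-Python encoding. The attribute values and edge endpoints are invented for the example rather than read off the slide’s figure.

    # Nodes carry feature attributes; edges carry a relation name plus an
    # optional parameter (e.g. the intersection angle in degrees).
    model_graph = {
        "nodes": {
            "R1": {"kind": "ribbon", "width": 12, "orientation": 30, "intensity": 180},
            "R2": {"kind": "ribbon", "width": 10, "orientation": 100, "intensity": 175},
            "R3": {"kind": "ribbon", "width": 11, "orientation": 70, "intensity": 182},
            "G1": {"kind": "region", "moments": (0.20, 0.05), "intensity": 90},
        },
        "edges": [
            ("R1", "R2", "intersects", 70),
            ("R2", "R3", "intersects", 70),
            ("R1", "R3", "intersects", 40),
            ("R1", "G1", "bounds", None),
            ("R2", "G1", "bounds", None),
            ("R3", "G1", "bounds", None),
        ],
    }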

Page 28:

Model Contents

• The Model may be a 3D representation
• If camera information such as
  – Orientation
  – Distance
  – Lens
  – etc.
  is available, then…
• This information can be used to transform the model into the image space (a projection sketch follows)
  – Create a 2D rendering of the 3D model
  – The book refers to this as Pose Consistency
  – The reverse problem can also be estimated
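A minimal sketch of the forward direction: projecting a 3D model point into image coordinates with a simple pinhole camera. The rotation, translation, and focal-length values are placeholders; a real system would use a fully calibrated camera model.

    import numpy as np

    def project(point_3d, R, t, focal, principal):
        """Project a 3D model point into 2D image coordinates.
        R (3x3) and t (3-vector) place the model in the camera frame;
        focal and principal define a basic pinhole intrinsic model."""
        Xc = R @ np.asarray(point_3d, dtype=float) + t   # model space -> camera space
        u = focal * Xc[0] / Xc[2] + principal[0]         # perspective divide
        v = focal * Xc[1] / Xc[2] + principal[1]
        return (u, v)

    # Placeholder camera: aligned with the model axes, model 5 units in front
    R = np.eye(3)
    t = np.array([0.0, 0.0, 5.0])
    print(project((1.0, 0.5, 0.0), R, t, focal=800, principal=(320, 240)))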

Page 29:

Model Matching

• Matching can proceed after feature extraction
  – Extract features from the image
  – Create the scene graph
  – Match the scene graph to the model graph using graph theoretic methods
• Matching can proceed during feature extraction
  – Guide the areas of concentration for the feature detectors

Page 30:

Model Matching

• Matching that proceeds after feature extraction
  – Depth first tree search (a bare-bones sketch follows)
  – Can be a very, very slow process
  – Heuristics may help
    • Anchor to the most important/likely objects
• Matching that proceeds during feature extraction
  – Set up processing windows within the image
  – System may hallucinate (see things that aren’t really there)
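Below is a bare-bones sketch of the depth-first matching idea: assign scene nodes to model nodes one at a time and backtrack whenever an edge relation fails to hold. The graph format follows the illustrative model_graph above; this is not the lecture’s prescribed algorithm, just one way the search could look.

    def match(model, scene):
        """Depth-first search for a mapping model node -> scene node that
        preserves node kind and edge relations. Returns one mapping or None."""
        m_nodes = list(model["nodes"])

        def consistent(assign):
            # every model edge with both endpoints assigned must have a matching
            # relation between the corresponding scene nodes
            for a, b, rel, _ in model["edges"]:
                if a in assign and b in assign:
                    if not any({assign[a], assign[b]} == {u, v} and r == rel
                               for u, v, r, _ in scene["edges"]):
                        return False
            return True

        def dfs(i, assign):
            if i == len(m_nodes):
                return dict(assign)
            m = m_nodes[i]
            for s, attrs in scene["nodes"].items():
                if s in assign.values():
                    continue                              # scene node already used
                if attrs["kind"] != model["nodes"][m]["kind"]:
                    continue                              # only match like with like
                assign[m] = s
                if consistent(assign):
                    result = dfs(i + 1, assign)
                    if result is not None:
                        return result
                del assign[m]                             # backtrack
            return None

        return dfs(0, {})

Anchoring the search at the most important or distinctive model object first, as the slide suggests, simply amounts to ordering m_nodes before the search starts.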

Page 31:

Model Matching

• It is difficult to make the system…
  – Orientation independent
  – Illumination independent
• “Difficult” doesn’t mean “impossible”
  – It just means it’ll take more time to perform the search

Page 32:

Relaxation Labeling

• Formally stated
  – An iterative process that attempts to assign labels to objects based on local constraints (based on an object’s description) and global constraints (how the assignment affects the labeling of other objects)
• The technique has been used for many matching applications (a small numeric sketch follows the list)
  – Object labeling
  – Stereo correspondence
  – Motion correspondence
  – Model matching
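For the continuous flavor of the same idea, the sketch below keeps a probability distribution over labels for every object and lets the neighbors’ current beliefs nudge each distribution every iteration. The update rule, the compatibility convention (values in [-1, 1]), and all names are illustrative assumptions rather than the lecture’s formulation.

    def relax_probabilities(prob, compat, neighbors, iterations=10):
        """prob[i][l]: current probability that object i takes label l.
        compat[(l, m)]: compatibility in [-1, 1] of label l with a neighbor's label m.
        neighbors[i]: objects adjacent to i (the source of the global constraints)."""
        labels = list(next(iter(prob.values())))
        for _ in range(iterations):
            new_prob = {}
            for i, p_i in prob.items():
                # support for each label: compatibility weighted by the neighbors'
                # current label probabilities (the global constraint)
                support = {l: sum(compat[(l, m)] * prob[j][m]
                                  for j in neighbors[i] for m in labels)
                           for l in labels}
                # multiplicative update, clamped and renormalized to a distribution
                raw = {l: max(0.0, p_i[l] * (1.0 + support[l])) for l in labels}
                total = sum(raw.values()) or 1.0
                new_prob[i] = {l: raw[l] / total for l in labels}
            prob = new_prob
        return prob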

Page 33:

Perceptual Grouping

• How many rectangles are in the picture?

Page 34:

Perceptual Grouping

• How many rectangles are in the picture?
  – One possible answer

Page 35:

Perceptual Grouping

• It depends on what the picture represents
  – What is the underlying model?
    • Toy blocks?
    • Projective aerial view of a building complex?
    • Rectangles drawn on a flat piece of paper?
  – Was the image sensor noisy? (long lines got broken up)
  – Depending on the answer, you may solve the problem with
    • Relaxation labeling
    • Graph matching
    • Neural network based training/learning

Page 36:

Summary

• High level vision is probably the least understood
  – It requires more than just an understanding of detectors
  – It requires an understanding of the data structures used to represent the objects and the logic structures for reasoning about the objects

• This is where computer vision moves from the highly mathematical field of image processing into the symbolic field of artificial intelligence

Page 37:

Things To Do

• Reading for Next Week
  – Multi-Frame Processing
    • Chapter 10 – The Geometry of Multiple Views
    • Chapter 11 – Stereopsis

Page 38:

Final Exam

• See handout