
Page 1:

Recap

• Low Level Vision
  – Input: pixel values from the imaging device
  – Data structure: 2D array, homogeneous
  – Processing:
    • 2D neighborhood operations
    • Histogram-based operations
    • Image enhancements
    • Feature extraction
      – Edges
      – Regions

Page 2:

Recap

• Mid Level Vision
  – Input: features from low level processing
  – Data structures: lists, arrays, heterogeneous
  – Processing:
    • Pixel/feature grouping operations
    • Model based operations
    • Object descriptions
      – Lines – orientation, location
      – Regions – central moments
      – Relationships amongst objects

Page 3:

What’s Left To Do?

• High Level Vision
  – Input:
    • Symbolic objects from mid level processing
    • Models of the objects to be recognized (a priori knowledge)
  – Data structures: lists, arrays, heterogeneous
  – Processing:
    • Object correspondence
      – Local correspondences represent individual object (or component) recognitions
      – Global correspondences represent recognition of an object in the context of the scene
    • Search problem
      – Typified by the “NP-Complete” question

Page 4:

High Level Vision

• Goal is to interpret the 2D structure (image pixels) as a 3D scene

• Humans do this very well
  – This is a problem for computer vision researchers
  – Our competition is fierce and unrelenting
• To achieve our goal we don’t necessarily need to mimic the biological system
  – We’re not trying to explain how human vision works, we’re just trying to achieve comparable results

Page 5:

High Level Vision

• The 3D scene is made up of
  – Objects
  – Illumination due to light sources
• The appearance of boundaries between object surfaces is dependent on their orientation relative to the light source (and surface material, sensing device…)
  – This is where we get edges from

Page 6:

Labeling Edges

• In a 3D scene, edges have very specific meanings
  – They separate objects from one another
    • Occlusion
  – They demark sudden changes in surface orientation within a single object
  – They demark sudden changes in surface albedo (light reflectance)
  – They mark a shadow cast by an object

Page 7:

Labeling Edges

• Edge detectors can provide some information regarding the meaning

Page 8:

Labeling Edges

• But additional information must be provided through logic and reasoning

• Under some constrained “worlds” we can identify all possible edge and vertex types
  – “Blocks World” assumption
    • Toy blocks
    • Trihedral vertices
  – Sounds simple, but much has been learned from studying such worlds

Page 9:

Labeling Edges

• Blade edges
  – One continuous surface occludes another
  – Surface normal changes smoothly
  – Curved surfaces
  – Single arrowhead

Page 10:

Labeling Edges

• Limb edges
  – One continuous surface occludes another
  – Surface normal changes smoothly and becomes perpendicular to the viewing angle
  – Surface ultimately occludes itself
  – Curved surfaces
  – Double arrowhead

Page 11:

Labeling Edges

• Mark edges
  – Change of albedo (reflectance) on the surface
  – A marking on the surface
  – No occlusion is involved
  – Surfaces of any shape
  – Labeled “M”

Page 12:

Labeling Edges

• Crease edges
  – Change in surface due to the joining of two surfaces
  – Can be convex or concave
  – No occlusion is involved
  – Abrupt changes – not smooth, curved surfaces
  – Labeled Convex (+) or Concave (-)

Page 13:

Labeling Edges

• Illumination edges
  – Change in illumination
  – Typically a shadow
  – No surface changes
  – Labeled “S”

Page 14:

Labeling Edges

• Jump edges
  – A blade or limb with a large depth discontinuity across the boundary

Page 15:

Labeling Edges

• This edge labeling scheme has been proposed by a few researchers
• There are variations
• You don’t have to do it this way if it doesn’t suit the needs of your application
• Choose whatever scheme you want
  – Just make sure you are consistent (one way to encode such a scheme is sketched below)
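To make the advice concrete, here is a minimal sketch of one way to pin the labels down as data in Python. The enum name, the character codes, and the example pairing are illustrative choices, not part of the lecture.

    from enum import Enum

    class EdgeLabel(Enum):
        """One possible edge-label vocabulary, mirroring the slides above."""
        BLADE = ">"      # occluding edge, single arrowhead
        LIMB = ">>"      # smooth self-occluding surface, double arrowhead
        MARK = "M"       # albedo change (surface marking)
        CONVEX = "+"     # crease, convex
        CONCAVE = "-"    # crease, concave
        SHADOW = "S"     # illumination (shadow) edge
        # Jump edges are blades or limbs with a large depth discontinuity,
        # so they reuse the BLADE/LIMB labels in this sketch.

    # A labeled scene edge could then be stored as a simple pair
    labeled_edge = ("e17", EdgeLabel.CONVEX)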

Page 16:

Vertices

• A Vertex is the place where two or more lines intersect

• Observations have been made regarding the types of vertices that are possible when mapping 3D objects into a 2D space
• Assumes trihedral corners
  – Restricted to a “blocks world” but may be useful elsewhere

Page 17:

Vertices

• L junctions

[Figure: the legal edge labelings (+, -, arrows) for L junctions]

Page 18:

Vertices

• Arrow junctions

[Figure: the legal edge labelings (+, -, arrows) for Arrow junctions]

Page 19:

Vertices

• Fork junctions

[Figure: the legal edge labelings (+, -, arrows) for Fork junctions]

Page 20:

Vertices

• T junctions

[Figure: the legal edge labelings (+, -, arrows) for T junctions]

Page 21:

Vertex Labeling

• Assume the shape is sitting on a table

[Figure: a block resting on a table, with its edges labeled +, -, and S; one edge is marked “?”]

• The “?” is a special edge called a “crack”
  – It may be labeled as an “S”

Page 22:

Edge/Vertex Labeling

• To do such labeling programmatically one would employ a relaxation algorithm
  – Essentially, you start by assigning all labels to all edges
  – Eliminate inconsistent labels based on the vertex type, working on all edges in parallel
  – Repeat until no more changes occur (a minimal sketch follows)
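The sketch below is one minimal way to implement this discrete relaxation (Waltz-style constraint propagation). The junction catalog shown is a tiny illustrative subset rather than the full trihedral-world catalog, and all edge and junction names are made up for the example.

    # Discrete relaxation: start with every label on every edge, then repeatedly
    # discard labels that no junction interpretation supports.
    ALL_LABELS = {"+", "-", ">", "<"}    # convex, concave, and two occlusion directions

    # Hypothetical catalog: junction type -> allowed label tuples, one label per
    # incident edge in a fixed order. A real system would use the full catalog
    # of legal junction labelings.
    CATALOG = {
        "L":     {("+", "+"), ("-", "-"), (">", "<")},
        "ARROW": {("+", "+", "-"), (">", "-", "<")},
    }

    def relax(junctions, edge_labels):
        """junctions: list of (junction type, [edge ids in order]).
        edge_labels: dict edge id -> set of still-possible labels (mutated in place)."""
        changed = True
        while changed:                       # repeat until no more changes occur
            changed = False
            for jtype, edges in junctions:
                allowed = CATALOG[jtype]
                for i, e in enumerate(edges):
                    # keep a label for edge e only if some catalog entry uses it at
                    # position i while staying consistent with the other edges' labels
                    supported = {combo[i] for combo in allowed
                                 if all(combo[k] in edge_labels[edges[k]]
                                        for k in range(len(edges)))}
                    pruned = edge_labels[e] & supported
                    if pruned != edge_labels[e]:
                        edge_labels[e] = pruned
                        changed = True
        return edge_labels

    # Usage: every edge starts with every label; junction constraints prune
    # the sets until a fixed point is reached.
    edge_labels = {e: set(ALL_LABELS) for e in ("e1", "e2", "e3")}
    junctions = [("ARROW", ["e1", "e2", "e3"]), ("L", ["e1", "e2"])]
    print(relax(junctions, edge_labels))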

Page 23:

Perceptual Objects

• Outside of the blocks world
  – Long, thin, ribbon-like objects
    • Made of [nearly] parallel lines (a grouping test is sketched below)
    • Straight or curved
  – Region objects
    • Homogeneous intensity
    • Homogeneous texture
    • Bounded by well defined edges
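As a concrete illustration of the ribbon idea, the sketch below tests whether two line segments are nearly parallel and close enough to be grouped as the two sides of a ribbon. The angle and gap thresholds are arbitrary assumptions, not values from the lecture.

    import math

    def ribbon_candidate(seg_a, seg_b, max_angle_deg=10.0, max_gap=20.0):
        """Return True if two segments ((x1, y1), (x2, y2)) could be the two
        [nearly] parallel sides of a ribbon. Thresholds are illustrative."""
        def angle(seg):
            (x1, y1), (x2, y2) = seg
            return math.atan2(y2 - y1, x2 - x1)

        def midpoint(seg):
            (x1, y1), (x2, y2) = seg
            return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

        # nearly parallel: small orientation difference, modulo 180 degrees
        da = abs(angle(seg_a) - angle(seg_b)) % math.pi
        da = min(da, math.pi - da)
        if math.degrees(da) > max_angle_deg:
            return False

        # close together: midpoints separated by at most max_gap pixels
        (ax, ay), (bx, by) = midpoint(seg_a), midpoint(seg_b)
        return math.hypot(ax - bx, ay - by) <= max_gap

    # Two horizontal-ish segments about 8 pixels apart -> a ribbon candidate
    print(ribbon_candidate(((0, 0), (100, 2)), ((0, 8), (100, 9))))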

Page 24:

Perceptual Objects

Page 25:

Perceptual Organization

Page 26:

Model Graph Representation

[Figure: a model graph with ribbon nodes R1, R2, R3 and a region node G1, linked by edges labeled Intersects(70), Intersects(70), Intersects(40), and Bounds]

Rx – Ribbon structure
Gx – Region structure

Page 27:

Model Graph Representation

• Each node may have additional information (one possible encoding is sketched below)
  – Ribbon nodes may have
    • Width
    • Orientation
    • Intensity
  – Region nodes may have
    • Moments
    • Intensity
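Purely as an illustration of how such an attributed graph might be written down, here is a plain-Python encoding. The attribute values and edge endpoints are invented for the example rather than read off the slide’s figure.

    # Nodes carry feature attributes; edges carry a relation name plus an
    # optional parameter (e.g. the intersection angle in degrees).
    model_graph = {
        "nodes": {
            "R1": {"kind": "ribbon", "width": 12, "orientation": 30, "intensity": 180},
            "R2": {"kind": "ribbon", "width": 10, "orientation": 100, "intensity": 175},
            "R3": {"kind": "ribbon", "width": 11, "orientation": 70, "intensity": 182},
            "G1": {"kind": "region", "moments": (0.20, 0.05), "intensity": 90},
        },
        "edges": [
            ("R1", "R2", "intersects", 70),
            ("R2", "R3", "intersects", 70),
            ("R1", "R3", "intersects", 40),
            ("R1", "G1", "bounds", None),
            ("R2", "G1", "bounds", None),
            ("R3", "G1", "bounds", None),
        ],
    }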

Page 28:

Model Contents

• The Model may be a 3D representation
• If camera information such as
  – Orientation
  – Distance
  – Lens
  – etc.
  is available, then…
• This information can be used to transform the model into the image space (a projection sketch follows)
  – Create a 2D rendering of the 3D model
  – The book refers to this as Pose Consistency
  – The reverse problem can also be estimated
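A minimal sketch of the forward direction: projecting a 3D model point into image coordinates with a simple pinhole camera. The rotation, translation, and focal-length values are placeholders; a real system would use a fully calibrated camera model.

    import numpy as np

    def project(point_3d, R, t, focal, principal):
        """Project a 3D model point into 2D image coordinates.
        R (3x3) and t (3-vector) place the model in the camera frame;
        focal and principal define a basic pinhole intrinsic model."""
        Xc = R @ np.asarray(point_3d, dtype=float) + t   # model space -> camera space
        u = focal * Xc[0] / Xc[2] + principal[0]         # perspective divide
        v = focal * Xc[1] / Xc[2] + principal[1]
        return (u, v)

    # Placeholder camera: aligned with the model axes, model 5 units in front
    R = np.eye(3)
    t = np.array([0.0, 0.0, 5.0])
    print(project((1.0, 0.5, 0.0), R, t, focal=800, principal=(320, 240)))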

Page 29:

Model Matching

• Matching can proceed after feature extraction
  – Extract features from the image
  – Create the scene graph
  – Match the scene graph to the model graph using graph theoretic methods
• Matching can proceed during feature extraction
  – Guide the areas of concentration for the feature detectors

Page 30:

Model Matching

• Matching that proceeds after feature extraction
  – Depth first tree search (a bare-bones sketch follows)
  – Can be a very, very slow process
  – Heuristics may help
    • Anchor to the most important/likely objects
• Matching that proceeds during feature extraction
  – Set up processing windows within the image
  – System may hallucinate (see things that aren’t really there)
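Below is a bare-bones sketch of the depth-first matching idea: assign scene nodes to model nodes one at a time and backtrack whenever an edge relation fails to hold. The graph format follows the illustrative model_graph above; this is not the lecture’s prescribed algorithm, just one way the search could look.

    def match(model, scene):
        """Depth-first search for a mapping model node -> scene node that
        preserves node kind and edge relations. Returns one mapping or None."""
        m_nodes = list(model["nodes"])

        def consistent(assign):
            # every model edge with both endpoints assigned must have a matching
            # relation between the corresponding scene nodes
            for a, b, rel, _ in model["edges"]:
                if a in assign and b in assign:
                    if not any({assign[a], assign[b]} == {u, v} and r == rel
                               for u, v, r, _ in scene["edges"]):
                        return False
            return True

        def dfs(i, assign):
            if i == len(m_nodes):
                return dict(assign)
            m = m_nodes[i]
            for s, attrs in scene["nodes"].items():
                if s in assign.values():
                    continue                              # scene node already used
                if attrs["kind"] != model["nodes"][m]["kind"]:
                    continue                              # only match like with like
                assign[m] = s
                if consistent(assign):
                    result = dfs(i + 1, assign)
                    if result is not None:
                        return result
                del assign[m]                             # backtrack
            return None

        return dfs(0, {})

Anchoring the search at the most important or distinctive model object first, as the slide suggests, simply amounts to ordering m_nodes before the search starts.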

Page 31:

Model Matching

• It is difficult to make the system…
  – Orientation independent
  – Illumination independent
• “Difficult” doesn’t mean “impossible”
  – It just means it’ll take more time to perform the search

Page 32:

Relaxation Labeling

• Formally stated
  – An iterative process that attempts to assign labels to objects based on local constraints (based on an object’s description) and global constraints (how the assignment affects the labeling of other objects)
• The technique has been used for many matching applications (a small numeric sketch follows the list)
  – Object labeling
  – Stereo correspondence
  – Motion correspondence
  – Model matching
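For the continuous flavor of the same idea, the sketch below keeps a probability distribution over labels for every object and lets the neighbors’ current beliefs nudge each distribution every iteration. The update rule, the compatibility convention (values in [-1, 1]), and all names are illustrative assumptions rather than the lecture’s formulation.

    def relax_probabilities(prob, compat, neighbors, iterations=10):
        """prob[i][l]: current probability that object i takes label l.
        compat[(l, m)]: compatibility in [-1, 1] of label l with a neighbor's label m.
        neighbors[i]: objects adjacent to i (the source of the global constraints)."""
        labels = list(next(iter(prob.values())))
        for _ in range(iterations):
            new_prob = {}
            for i, p_i in prob.items():
                # support for each label: compatibility weighted by the neighbors'
                # current label probabilities (the global constraint)
                support = {l: sum(compat[(l, m)] * prob[j][m]
                                  for j in neighbors[i] for m in labels)
                           for l in labels}
                # multiplicative update, clamped and renormalized to a distribution
                raw = {l: max(0.0, p_i[l] * (1.0 + support[l])) for l in labels}
                total = sum(raw.values()) or 1.0
                new_prob[i] = {l: raw[l] / total for l in labels}
            prob = new_prob
        return prob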

Page 33:

Perceptual Grouping

• How many rectangles are in the picture?

Page 34:

Perceptual Grouping

• How many rectangles are in the picture?
  – One possible answer

Page 35:

Perceptual Grouping

• It depends on what the picture represents
  – What is the underlying model?
    • Toy blocks?
    • Projective aerial view of a building complex?
    • Rectangles drawn on a flat piece of paper?
  – Was the image sensor noisy? (long lines got broken up)
  – Depending on the answer, you may solve the problem with
    • Relaxation labeling
    • Graph matching
    • Neural network based training/learning

Page 36:

Summary

• High level vision is probably the least understood
  – It requires more than just an understanding of detectors
  – It requires an understanding of the data structures used to represent the objects and the logic structures for reasoning about the objects

• This is where computer vision moves from the highly mathematical field of image processing into the symbolic field of artificial intelligence

Page 37:

Things To Do

• Reading for Next Week
  – Multi-Frame Processing
    • Chapter 10 – The Geometry of Multiple Views
    • Chapter 11 – Stereopsis

Page 38:

Final Exam

• See handout