PERCEPTION
Damien Blond, Alim Fazal, Tory Richard
April 11th, 2000


TRANSCRIPT

Page 1

Damien Blond

Alim Fazal

Tory Richard

April 11th, 2000

PERCEPTION

Page 2

Outline

9.1: Introduction
9.2: Image Formation
9.3: Image Processing Operations for Early Vision
9.4: Extracting 3D Information using Vision
9.5: Using Vision for Manipulation and Navigation
9.6: Object Representation and Recognition
9.7: Summary

Page 3

Introduction

• Perception provides agents with information about the world they inhabit.

• A sensor is anything that can change the computational state of the agent in response to a change in the state of the world.

• The sensory modalities that agents share with humans include vision, hearing, and touch.

Page 4

Introduction

• The main focus here is on the processing of the raw information that the sensors provide.

• Sensing can be described by the equation

S = f(W)

where S is the sensory stimulus and W is the state of the world.

• To gain information about the world, the straightforward approach is to invert the equation:

W = f⁻¹(S)

Page 5

Introduction

• A drawback of the straightforward approach is that it is trying to solve too difficult a problem.

• In many cases, the agent does not need to know everything about the world.

• Sometimes just one or two predicates are needed.

Page 6

Introduction

Some of the possible uses for Vision:

• Manipulation – Grasping and insertion require local shape information and feedback for motor control.

• Navigation – Finding clear paths, avoiding obstacles, calculating one’s current velocity and orientation.

• Object Recognition – A useful skill for distinguishing between multiple objects.

Page 7

Outline

9.1: Introduction
9.2: Image Formation
9.3: Image Processing Operations for Early Vision
9.4: Extracting 3D Information using Vision
9.5: Using Vision for Manipulation and Navigation
9.6: Object Representation and Recognition
9.7: Summary

Page 8

Outline

9.2: Image Formation
Pinhole Camera
Lens Systems
Photometry of Image Formation

Page 9

Image Formation

• Vision works by gathering light scattered from objects in the scene and creating a 2-D image.

• It is important to understand the geometry of the process in order to obtain information about the scene.

Page 10

Image Formation

Page 11

Image Formation

Perspective Projection Equations

-x/f = X/Z,  -y/f = Y/Z

=>  x = -fX/Z,  y = -fY/Z
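
As a concrete illustration (not from the slides), here is a minimal sketch of these equations in code; the function name and the focal-length value are invented for the example.

```python
def perspective_project(X, Y, Z, f=0.05):
    """Pinhole perspective projection of a scene point (X, Y, Z) onto the
    image plane a distance f behind the pinhole: x = -f*X/Z, y = -f*Y/Z."""
    if Z == 0:
        raise ValueError("Z = 0: the point lies in the plane of the pinhole.")
    return -f * X / Z, -f * Y / Z

# Example: a point 10 m away, viewed with an (assumed) 50 mm focal length.
print(perspective_project(2.0, 1.0, 10.0))  # -> (-0.01, -0.005)
```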

Page 12

Image Formation

• Perspective projection is often approximated by orthographic projection, but there is an important difference.

• Orthographic projection does not project rays through a pinhole.

• Instead, the rays run parallel to one another, perpendicular to (or at a consistent angle to) the image plane.
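
For contrast, here is a minimal sketch of (scaled) orthographic projection under the same conventions; the scale factor is an illustrative stand-in. Unlike the perspective sketch above, points at different depths project to the same image location.

```python
def orthographic_project(X, Y, Z, s=1.0):
    """Scaled orthographic projection: parallel rays, so the depth Z is ignored."""
    return s * X, s * Y

# Two points at very different depths project identically under orthography.
print(orthographic_project(2.0, 1.0, 10.0))   # -> (2.0, 1.0)
print(orthographic_project(2.0, 1.0, 100.0))  # -> (2.0, 1.0)
```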

Page 13

Lens Systems

• Both human and artificial eyes use a lens.

• The lens is wider than a pinhole, allowing more light to enter, increasing the information collected.

• The human eye focuses by changing the shape of the lens.

• Artificial eyes focus by changing the distance between the lens and the image plane.

Page 14

Photometry of Image Formation

Page 15

Photometry of Image Formation

Page 16

Photometry of Image Formation

• A processed image plane contains a brightness value for each pixel.

• The brightness of a pixel p in the image is proportional to the amount of light directed toward the camera by the surface patch Sp that projects to pixel p.

• Reflected light is characterized as either diffuse or specular reflection.

Page 17

Photometry of Image Formation

• Diffuse reflection redirects light equally in all directions, and is common for dull surfaces.

• It is described by the following equation, known as Lambert's formula:

E = ρ E0 cos θ

where ρ is the diffuse reflection coefficient (albedo) of the surface, E0 is the intensity of the light source, and θ is the angle between the light direction and the surface normal.

Page 18

Photometry of Image Formation

• Specular reflection is described by Phong's formula:

E = ρ E0 cos^m θ

• ρ is the coefficient of specular reflection
• E0 is the intensity of the light source
• m is the 'shininess' of the surface
• θ is the angle between the direction of perfect mirror reflection and the viewing direction
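
As an illustration (not from the slides), the sketch below combines a Lambert diffuse term and a Phong specular term for a single point light; the vectors, coefficients, and shininess value are assumed example inputs.

```python
import numpy as np

def normalize(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def shade(normal, to_light, to_viewer, E0=1.0, rho_d=0.7, rho_s=0.3, m=20):
    """Brightness from one point light: Lambert diffuse plus Phong specular."""
    n, l, v = (normalize(u) for u in (normal, to_light, to_viewer))
    # Diffuse term: rho_d * E0 * cos(theta), clamped to zero past grazing angles.
    diffuse = rho_d * E0 * max(np.dot(n, l), 0.0)
    # Direction of perfect mirror reflection of the light about the normal.
    r = 2.0 * np.dot(n, l) * n - l
    # Specular term: rho_s * E0 * cos^m(alpha), alpha between viewer and r.
    specular = rho_s * E0 * max(np.dot(r, v), 0.0) ** m
    return diffuse + specular

print(shade(normal=[0, 0, 1], to_light=[0, 1, 1], to_viewer=[0, -1, 1]))
```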

Page 19

Photometry of Image Formation

• In real life, surfaces exhibit a combination of diffuse and specular properties.

• Modeling this on the computer is what computer graphics is all about.

• Rendering realistic images is usually done by ray tracing.

Page 20

Outline

9.1: Introduction
9.2: Image Formation
9.3: Image Processing Operations for Early Vision
9.4: Extracting 3D Information using Vision
9.5: Using Vision for Manipulation and Navigation
9.6: Object Representation and Recognition
9.7: Summary

Page 21

Outline

9.3: Image Processing Operations for Early Vision
Edge Detection

Page 22

Image-Processing Operations

Edge Detection

• Edges are curves in the image plane across which there is a “significant” change in image brightness.

• The goal of edge detection is the construction of an idealized line drawing.

Page 23

Image-Processing Operations

• One idea for detecting edges is to differentiate the image and look for places where the brightness undergoes a sharp change.

• Consider a 1-D example. Below is an intensity profile for a 1-D image.

Page 24

Image-Processing Operations

• Below we have the derivative of the previous graph.

• It has peaks at x=18, x=50 and x=75, but only the peak at x=50 corresponds to a real edge.

• The spurious peaks are due to the presence of noise in the image.

Page 25

Image-Processing Operations

• This problem is countered by combining the differentiation operation with convolution by a smoothing function.

• The mathematical concept of convolution allows us to perform many useful image-processing operations.

Page 26

Image-Processing Operations

• One standard form of smoothing is to use a Gaussian function.

• Now, using the idea of convolving with the Gaussian function, we can revisit the 1-D example.
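
A minimal sketch of this 1-D example (the profile and noise values are invented): smoothing and differentiation are combined by convolving the noisy profile with the derivative of a Gaussian, and the strongest response marks the edge.

```python
import numpy as np

# Synthetic 1-D intensity profile: a step edge at x = 50 plus noise.
rng = np.random.default_rng(0)
x = np.arange(100)
signal = np.where(x < 50, 10.0, 20.0) + rng.normal(0.0, 0.8, size=x.size)

# Derivative-of-Gaussian kernel: convolving with it smooths and differentiates
# in one step, since d/dx (G * I) = (dG/dx) * I.
sigma = 3.0
t = np.arange(-10, 11)
dgauss = -t / sigma**2 * np.exp(-t**2 / (2 * sigma**2))

response = np.convolve(signal, dgauss, mode="valid")    # 'valid' avoids boundary artifacts
edge = int(np.argmax(np.abs(response))) + len(t) // 2   # shift back to x coordinates
print("strongest edge response at x =", edge)           # expect about x = 50
```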

Page 27

Image-Processing Operations

• With the convolution applied we can more easily see the edge at x=50.

• Using convolution we are able to discover where edges are located, and this allows us to make an accurate line drawing.

Page 28

Image-Processing Operations

• Here is an example of applying convolution to a 2-D picture of the Mona Lisa.

Page 29

Outline

9.1: Introduction
9.2: Image Formation
9.3: Image Processing Operations for Early Vision
9.4: Extracting 3D Information using Vision
9.5: Using Vision for Manipulation and Navigation
9.6: Object Representation and Recognition
9.7: Summary

Page 30

Outline

9.4: Extracting 3D Information using Vision
Motion
Binocular Stereopsis
Texture Gradient
Shading
Contour

Page 31

Extracting 3-D Information Using Vision

We need to extract 3-D information for performing certain tasks such as manipulation, navigation, and recognition.

Three aspects:
1. Segmentation
2. Position & Orientation
3. Shape

To recover 3-D information there are a number of cues available including motion, binocular stereopsis, texture, shading and contour.

Page 32

Extracting 3-D Information Using Vision

Motion

• Optical Flow - the apparent motion in the image that results when a camera moves relative to the 3-D scene.

Page 33

Extracting 3-D Information Using Vision

• To measure Optical Flow, we need to find corresponding points between one time frame and the next.

• One measure is the Sum of Squared Differences (SSD), summed over the pixels (x, y) in a window:

SSD(Dx, Dy) = Σ(x,y) ( I(x, y, t) - I(x+Dx, y+Dy, t+Dt) )²

Page 34

Extracting 3-D Information Using Vision

• The other measure is Cross-Correlation (CC):

CC(Dx, Dy) = Σ(x,y) I(x, y, t) I(x+Dx, y+Dy, t+Dt)

• Cross-correlation works best when there is texture in the scene, because then there is significant brightness variation among the pixels.
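
A minimal sketch of block matching with these two measures; the synthetic frames, window size, and search range are invented for illustration.

```python
import numpy as np

def ssd(a, b):
    return np.sum((a - b) ** 2)

def cross_correlation(a, b):
    return np.sum(a * b)

def flow_at(prev, curr, x, y, half=3, search=5):
    """Estimate the flow (Dx, Dy) of the window centred at (x, y) by minimising
    SSD over a small search range (maximise cross_correlation to use CC instead)."""
    window = prev[y - half:y + half + 1, x - half:x + half + 1]
    best_score, best_d = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cand = curr[y + dy - half:y + dy + half + 1,
                        x + dx - half:x + dx + half + 1]
            score = ssd(window, cand)
            if best_score is None or score < best_score:
                best_score, best_d = score, (dx, dy)
    return best_d

# Two synthetic textured frames, the second shifted right by 2 and down by 1 pixel.
prev = np.random.default_rng(1).random((40, 40))
curr = np.roll(np.roll(prev, 1, axis=0), 2, axis=1)
print(flow_at(prev, curr, x=20, y=20))  # expect (2, 1)
```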

Page 35

Extracting 3-D Information Using Vision

Binocular Stereopsis

• Binocular stereopsis uses multiple images taken from different positions in space, whereas motion uses multiple images taken over time.

• Because each camera views the scene from a slightly different position, if we superpose the two images there will be a disparity in the locations of important features.

Page 36

Extracting 3-D Information Using Vision

• This also allows us to easily determine depth. Knowing the distance between the cameras, and the point at which their lines of sight intersect, it only requires a few simple geometric calculations to determine the depth coordinate z for any given (x, y) coordinate.
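
A minimal sketch of that calculation for the common parallel, rectified two-camera arrangement, where depth follows from the disparity, the baseline b between the cameras, and the focal length f; the numbers below are illustrative.

```python
def depth_from_disparity(x_left, x_right, focal_length, baseline):
    """Depth of a scene point from its horizontal image coordinates in two
    parallel, rectified cameras: Z = f * b / (x_left - x_right)."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("Disparity must be positive for a point in front of the cameras.")
    return focal_length * baseline / disparity

# Example: f = 700 pixels, baseline = 0.12 m, disparity = 14 pixels  ->  Z = 6.0 m.
print(depth_from_disparity(x_left=352.0, x_right=338.0, focal_length=700.0, baseline=0.12))
```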

Page 37

Extracting 3-D Information Using Vision

Texture Gradient

• Texture refers to a spatially repeating pattern on a surface that can be sensed visually.

• In the images, the apparent size, shape, and spacing of the repeating texture elements (texels) vary.

Page 38

Extracting 3-D Information Using Vision

The two main causes for this variation in size are:

• Varying distance from the camera to the different texture elements.

• Varying orientation of the texels relative to the line of sight from the camera.

It is possible to express the rate of change of these texel features mathematically; these rates of change are called texture gradients.

Page 39

Extracting 3-D Information Using Vision

Texture can be used to determine shape via a two-step process: (a) measure the texture gradients and (b) estimate the surface shape, slant and tilt that could give rise to them.
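
As a rough illustration of step (a) only, the sketch below fits a linear trend to measured texel areas at known image positions; the measurements are invented, and recovering slant and tilt from the fitted gradient (step b) is not shown.

```python
import numpy as np

# Invented measurements: texel areas (in pixels^2) observed at image positions (x, y).
positions = np.array([[10, 10], [30, 10], [50, 10],
                      [10, 30], [30, 30], [50, 30],
                      [10, 50], [30, 50], [50, 50]], dtype=float)
areas = np.array([40.0, 40.5, 39.8, 30.2, 29.7, 30.1, 20.3, 19.9, 20.1])

# Least-squares fit of area ~ a*x + b*y + c; (a, b) is the measured texture gradient.
A = np.column_stack([positions, np.ones(len(positions))])
(a, b, c), *_ = np.linalg.lstsq(A, areas, rcond=None)
print("texture gradient ~", (round(float(a), 3), round(float(b), 3)))  # area shrinks along y
```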

Page 40

Extracting 3-D Information Using Vision

Shading

• Shading is the variation in the intensity of light received from different portions of a surface in the scene.

• Given the image brightness, I (x, y), our hope is to recover the scene geometry and the reflectance properties of the object.

• But this has proved difficult to do in anything but the simplest cases.

Page 41

Extracting 3-D Information Using Vision

• The main problem is dealing with interreflections.

• In most scenes the surfaces are not only illuminated by the light sources, but also by the light reflected from other surfaces which serve as a secondary light source.

• These mutual illumination effects are quite significant.

Page 42

Extracting 3-D Information Using Vision

Contour

• Contour refers to the use of lines in a line drawing to convey a vivid perception of 3-D shape and layout.

• The task is to determine the exact significance of each line in the image.

• This is also called the line labeling problem, since the task is to label each line according to its significance.

Page 43

Extracting 3-D Information Using Vision

• In a simplified world, where all surface marks and shadows have been removed, all lines can be classified as either limbs or edges.

• A limb is the locus of points on the surface where the line of sight is tangent to the surface.

• An edge is a surface-normal discontinuity.

• Edges can be further classified as convex, concave, or occluding.

Page 44

Extracting 3-D Information Using Vision

• "+" and "-" labels represent convex and concave edges respectively.

• "<-" and "->" labels represent occluding edges.

• "<-<-" and "->->" labels represent limbs.

Page 45

Extracting 3-D Information Using Vision

In 1971, Huffman and Clowes independently studied the line labeling problem for trihedral solids – objects in which exactly three plane surfaces come together at each vertex.

Page 46

Extracting 3-D Information Using Vision

For this particular trihedral world, Huffman and Clowes made an exhaustive list of all the different vertex types and the different ways in which they could be viewed from a general viewpoint.

Page 47

Extracting 3-D Information Using Vision

They created a junction dictionary to find a labeling for the line drawing. Later this work was generalized to arbitrary polyhedra and to piecewise smooth curved objects.

Page 48

Outline

9.1: Introduction
9.2: Image Formation
9.3: Image Processing Operations for Early Vision
9.4: Extracting 3D Information using Vision
9.5: Using Vision for Manipulation and Navigation
9.6: Object Representation and Recognition
9.7: Summary

Page 49

Outline

9.5: Using Vision for Manipulation and Navigation
Driving Example
Lateral Control
Longitudinal Control

Page 50

Using Vision for Manipulation and Navigation

• One of the main uses of vision is to provide information for manipulating objects as well as navigating in a scene while avoiding obstacles.

• A good example of the use of vision is driving.

Page 51

Using Vision for Manipulation and Navigation

Figure 24.24: The information needed for visual control of a vehicle on a freeway.

Page 52

Using Vision for Manipulation and Navigation

The tasks for the driver in Figure 24.24:

1. Keep moving at a reasonable speed (v0).
2. Lateral control (dl = dr).
3. Longitudinal control (d2 = safe distance).
4. Monitor vehicles in neighboring lanes and be prepared for action if one of them decides to change lanes.

Page 53

Using Vision for Manipulation and Navigation

• The problem for the driver is to generate appropriate steering, acceleration, or braking actions to best accomplish these tasks.

• Focusing specifically on lateral and longitudinal control, what information is needed for these tasks?

Page 54

Using Vision for Manipulation and Navigation

Lateral Control:

• The steering control system for the vehicle needs to detect edges corresponding to the lane-marker segments and then fit smooth curves to them.

• The parameters of these curves carry information about the lateral position of the car, the direction it is pointing relative to the lane, and the curvature of the lane.

• The dynamics of the vehicle are also needed.
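
The slides give no code for this; the following is a minimal sketch of the curve-fitting step, with invented lane-marker measurements, showing how the fitted parameters carry the lateral position, relative direction, and curvature mentioned above.

```python
import numpy as np

# Invented lane-marker points from an edge detector:
# 'ahead' is distance in front of the car (m), 'lateral' is sideways offset (m).
ahead   = np.array([ 5.0, 10.0, 15.0, 20.0, 25.0, 30.0])
lateral = np.array([1.52, 1.60, 1.73, 1.91, 2.13, 2.40])

# Fit a smooth quadratic curve lateral(d) = c2*d^2 + c1*d + c0 to the marker points.
c2, c1, c0 = np.polyfit(ahead, lateral, deg=2)

print("lateral offset of the marker at the car (m):", round(c0, 2))
print("marker direction relative to the car's heading (rad):", round(c1, 3))
print("approximate curvature of the lane (1/m):", round(2 * c2, 4))
```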

Page 55

Using Vision for Manipulation and Navigation

Longitudinal Control:

• The driver needs to know the distance to the vehicles in front.

• This can be accomplished using binocular stereopsis or optical flow.

• The driving example makes one point very clear: for a specific task, one does not need to recover all the information that in principle can be recovered from an image.

Page 56

Outline

9.1: Introduction
9.2: Image Formation
9.3: Image Processing Operations for Early Vision
9.4: Extracting 3D Information using Vision
9.5: Using Vision for Manipulation and Navigation
9.6: Object Representation and Recognition
9.7: Summary

Page 57

Outline

9.6: Object Representation and Recognition
Alignment Method
Projective Invariants
Representation of Models
Matching Models to Images

Page 58

Object Representation and Recognition

Problem:

• Given: a scene consisting of one or more objects chosen from a collection of objects and an image of the scene taken from an unknown viewer position and orientation.

• Determine: Which of the objects from the collection are present in the scene and for each object, determine its position and orientation relative to the viewer.

Page 59

Object Representation and Recognition

• The two fundamental issues that any object recognition scheme must address are the representation of the models and the matching of models to images.

• The most common way of representing 3-D objects in a recognition system is by using generalized cylinders.

Page 60

Object Representation and Recognition

• Examples of Generalized Cylinders:

Page 61

Object Representation and Recognition

Alignment Method:

• Handy for identifying 3-D objects without knowing their position or orientation with respect to the observer.

• It accomplishes this by representing the object with a set of m features or distinguishing points in 3-D.

• The model points are then subjected to an unknown 3-D rotation R, followed by a translation by an unknown amount t and a projection, giving rise to the feature points observed in the image plane.
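
A minimal sketch of that forward model (rotation, translation, then projection), using a scaled orthographic projection for simplicity; the model points, rotation angle, and translation are invented, and a real alignment method would instead solve for R and t from a few point correspondences.

```python
import numpy as np

def project_model(points, R, t, scale=1.0):
    """Rotate and translate 3-D model points, then project onto the image
    plane with a scaled orthographic projection (drop the z coordinate)."""
    transformed = points @ R.T + t
    return scale * transformed[:, :2]

# Invented model: four distinguishing points of an object in its own frame.
model = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])

theta = np.radians(30)                        # rotation about the z-axis
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([0.5, -0.2, 4.0])

print(project_model(model, R, t))             # predicted image feature points
```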

Page 62

Object Representation and Recognition

• A disadvantage of this approach is that it involves trying each model in the model library, resulting in a recognition complexity proportional to the number of models in the library.

• A solution is provided by using geometric invariants as the shape representation. This model uses the projective invariants measured from the image curves.
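
The slides do not name a particular invariant; as one classic example, the cross-ratio of four collinear points keeps its value under any projective transformation of the line. The points and the transformation below are invented.

```python
def cross_ratio(a, b, c, d):
    """Cross-ratio (a, b; c, d) of four collinear points, given their 1-D
    coordinates along the line; invariant under projective maps."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))

def projective(x, p=2.0, q=1.0, r=0.3, s=1.0):
    """An arbitrary 1-D projective transformation x -> (p*x + q) / (r*x + s)."""
    return (p * x + q) / (r * x + s)

pts = [0.0, 1.0, 2.5, 4.0]
print(cross_ratio(*pts))                           # value measured on the object
print(cross_ratio(*(projective(x) for x in pts)))  # same value measured in the image
```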

Page 63

Object Representation and Recognition

• When a measured invariant corresponds to a value in the library, a recognition hypothesis is generated. This is verified by back-projecting the outline, just as in the alignment method.

• An advantage of invariant shape representation is that models can be acquired directly from images without making measurements on the actual objects because the shape descriptors have the same value when measured in any image.

Page 64

Object Representation and Recognition

• Although the computer is capable of recognizing a broad array of images, there are some images that are currently nearly impossible for the computer to recognize.

Page 65

Object Representation and Recognition

• Other images show ambiguities that humans are capable of handling with little difficulty.

• Can a computer algorithm distinguish which object is intended when there are a number of possible objects?

Page 66

Object Representation and Recognition

• Further Examples:

Page 67

Object Representation and Recognition

• There are also images of objects that can exist in 2-D but cannot exist in 3-D; can a computer algorithm detect this?

Page 68

Summary

• Perception Agents

• Perception Sensors

• The Straightforward Approach

• Manipulation

• Navigation

• Object Recognition

Page 69

Summary

• Perspective Projection

• Orthographic Projection

• Lens Systems

• Photometry of Image Formation

Page 70

Summary

• Edges

• Convolving

• The smoothing Gaussian function

Page 71

Summary

• Motion

• Binocular stereopsis

• Texture

• Shading

• Contour

Page 72

Summary

• Some main uses of vision

• Representation of models

• Matching of models to images

• Alignment Method

• Projective Invariants

• Problems with Object Recognition

Page 73

Sources

• CPSC 533 textbook
• http://sern.ucalgary.ca/courses/CPSC/533/W99/presentations/L2_24A_Lee_Wang/
• http://sern.ucalgary.ca/courses/CPSC/533/W99/presentations/L1_24A_Kaasten_Steller_Hoang/main.htm
• http://sern.ucalgary.ca/courses/CPSC/533/W99/presentations/L1_24_Schebywolok/index.html
• http://sern.ucalgary.ca/courses/CPSC/533/W99/presentations/L2_24B_Doering_Grenier/
• http://www.geocities.com/SoHo/Museum/3828/optical.html
• http://members.spree.com/funNgames/katbug/