PERCEPTION
Damien Blond, Alim Fazal, Tory Richard
April 11th, 2000


TRANSCRIPT

Page 1

Damien Blond

Alim Fazal

Tory Richard

April 11th, 2000

PERCEPTION

Page 2

Outline

9.1: Introduction
9.2: Image Formation
9.3: Image Processing Operations for Early Vision
9.4: Extracting 3D Information using Vision
9.5: Using Vision for Manipulation and Navigation
9.6: Object Representation and Recognition
9.7: Summary

Page 3

Introduction

• Perception provides agents with information about the world they inhabit.

• A sensor is anything that can change the computational state of the agent in response to a change in the state of the world.

• The sensory modalities that agents share with humans include vision, hearing, and touch.

Page 4

Introduction

• The main focus here is on the processing of the raw information that the sensors provide.

• Sensing can be described by the equation

S = f(W)

where S is the sensory stimulus and W is the state of the world.

• To gain information about the world, the straightforward approach is to invert the equation:

W = f⁻¹(S)

Page 5

Introduction

• A drawback of the straightforward approach is that it is trying to solve too difficult a problem.

• In many cases, the agent does not need to know everything about the world.

• Sometimes just one or two predicates are needed.

Page 6

Introduction

Some of the possible uses for Vision:

• Manipulation – Grasping and insertion require local shape information and feedback for motor control.

• Navigation – Finding clear paths, avoiding obstacles, calculating one’s current velocity and orientation.

• Object Recognition – A useful skill for distinguishing between multiple objects.

Page 7

Outline

9.1: Introduction
9.2: Image Formation
9.3: Image Processing Operations for Early Vision
9.4: Extracting 3D Information using Vision
9.5: Using Vision for Manipulation and Navigation
9.6: Object Representation and Recognition
9.7: Summary

Page 8

Outline

9.2: Image Formation
Pinhole Camera
Lens Systems
Photometry of Image Formation

Page 9

Image Formation

• Vision works by gathering light scattered from objects in the scene and creating a 2-D image.

• It is important to understand the geometry of the process in order to obtain information about the scene.

Page 10

Image Formation

Page 11

Image Formation

Perspective Projection Equations

-x/f = X/Z,  -y/f = Y/Z

=>  x = -fX/Z,  y = -fY/Z
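
As a concrete illustration (not from the slides), here is a minimal sketch of these equations in code; the function name and the focal-length value are invented for the example.

```python
def perspective_project(X, Y, Z, f=0.05):
    """Pinhole perspective projection of a scene point (X, Y, Z) onto the
    image plane a distance f behind the pinhole: x = -f*X/Z, y = -f*Y/Z."""
    if Z == 0:
        raise ValueError("Z = 0: the point lies in the plane of the pinhole.")
    return -f * X / Z, -f * Y / Z

# Example: a point 10 m away, viewed with an (assumed) 50 mm focal length.
print(perspective_project(2.0, 1.0, 10.0))  # -> (-0.01, -0.005)
```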

Page 12

Image Formation

• Perspective projection is often approximated by orthographic projection, but there is an important difference.

• Orthographic projection does not project rays through a pinhole.

• Instead, the rays run parallel to one another, perpendicular to (or at a consistent angle to) the image plane.
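
For contrast, here is a minimal sketch of (scaled) orthographic projection under the same conventions; the scale factor is an illustrative stand-in. Unlike the perspective sketch above, points at different depths project to the same image location.

```python
def orthographic_project(X, Y, Z, s=1.0):
    """Scaled orthographic projection: parallel rays, so the depth Z is ignored."""
    return s * X, s * Y

# Two points at very different depths project identically under orthography.
print(orthographic_project(2.0, 1.0, 10.0))   # -> (2.0, 1.0)
print(orthographic_project(2.0, 1.0, 100.0))  # -> (2.0, 1.0)
```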

Page 13

Lens Systems

• Both human and artificial eyes use a lens.

• The lens is wider than a pinhole, allowing more light to enter, increasing the information collected.

• The human eye focuses by changing the shape of the lens.

• Artificial eyes focus by changing the distance between the lens and the image plane.

Page 14

Photometry of Image Formation

Page 15

Photometry of Image Formation

Page 16

Photometry of Image Formation

• A processed image plane contains a brightness value for each pixel.

• The brightness of a pixel p in the image is proportional to the amount of light directed toward the camera by the surface patch Sp that projects to pixel p.

• Reflected light is characterized as either diffuse or specular reflection.

Page 17

Photometry of Image Formation

• Diffuse reflection redirects light equally in all directions, and is common for dull surfaces.

• It is described by the following equation, known as Lambert's formula:

E = ρ E0 cos θ

where ρ is the diffuse reflection coefficient (albedo) of the surface, E0 is the intensity of the light source, and θ is the angle between the light direction and the surface normal.

Page 18

Photometry of Image Formation

• Specular reflection is described by Phong's formula:

E = ρ E0 cos^m θ

• ρ is the coefficient of specular reflection
• E0 is the intensity of the light source
• m is the 'shininess' of the surface
• θ is the angle between the direction of perfect mirror reflection and the viewing direction
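
As an illustration (not from the slides), the sketch below combines a Lambert diffuse term and a Phong specular term for a single point light; the vectors, coefficients, and shininess value are assumed example inputs.

```python
import numpy as np

def normalize(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def shade(normal, to_light, to_viewer, E0=1.0, rho_d=0.7, rho_s=0.3, m=20):
    """Brightness from one point light: Lambert diffuse plus Phong specular."""
    n, l, v = (normalize(u) for u in (normal, to_light, to_viewer))
    # Diffuse term: rho_d * E0 * cos(theta), clamped to zero past grazing angles.
    diffuse = rho_d * E0 * max(np.dot(n, l), 0.0)
    # Direction of perfect mirror reflection of the light about the normal.
    r = 2.0 * np.dot(n, l) * n - l
    # Specular term: rho_s * E0 * cos^m(alpha), alpha between viewer and r.
    specular = rho_s * E0 * max(np.dot(r, v), 0.0) ** m
    return diffuse + specular

print(shade(normal=[0, 0, 1], to_light=[0, 1, 1], to_viewer=[0, -1, 1]))
```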

Page 19

Photometry of Image Formation

• In real life, surfaces exhibit a combination of diffuse and specular properties.

• Modeling this on the computer is what computer graphics is all about.

• Rendering realistic images is usually done by ray tracing.

Page 20

Outline

9.1: Introduction
9.2: Image Formation
9.3: Image Processing Operations for Early Vision
9.4: Extracting 3D Information using Vision
9.5: Using Vision for Manipulation and Navigation
9.6: Object Representation and Recognition
9.7: Summary

Page 21

Outline

9.3: Image Processing Operations for Early Vision
Edge Detection

Page 22

Image-Processing Operations

Edge Detection

• Edges are curves in the image plane across which there is a “significant” change in image brightness.

• The goal of edge detection is the construction of an idealized line drawing.

Page 23

Image-Processing Operations

• One idea for detecting edges is to differentiate the image and look for places where the brightness undergoes a sharp change.

• Consider a 1-D example. Below is an intensity profile for a 1-D image.

Page 24

Image-Processing Operations

• Below we have the derivative of the previous graph.

• It has peaks at x=18, x=50 and x=75, but only the peak at x=50 corresponds to a real edge.

• The spurious peaks are due to the presence of noise in the image.

Page 25

Image-Processing Operations

• This problem is countered by combining the differentiation operation with convolution by a smoothing function.

• The mathematical concept of convolution allows us to perform many useful image-processing operations.

Page 26

Image-Processing Operations

• One standard form of smoothing is to use a Gaussian function.

• Now, using the idea of convolving with the Gaussian function, we can revisit the 1-D example.
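
A minimal sketch of this 1-D example (the profile and noise values are invented): smoothing and differentiation are combined by convolving the noisy profile with the derivative of a Gaussian, and the strongest response marks the edge.

```python
import numpy as np

# Synthetic 1-D intensity profile: a step edge at x = 50 plus noise.
rng = np.random.default_rng(0)
x = np.arange(100)
signal = np.where(x < 50, 10.0, 20.0) + rng.normal(0.0, 0.8, size=x.size)

# Derivative-of-Gaussian kernel: convolving with it smooths and differentiates
# in one step, since d/dx (G * I) = (dG/dx) * I.
sigma = 3.0
t = np.arange(-10, 11)
dgauss = -t / sigma**2 * np.exp(-t**2 / (2 * sigma**2))

response = np.convolve(signal, dgauss, mode="valid")    # 'valid' avoids boundary artifacts
edge = int(np.argmax(np.abs(response))) + len(t) // 2   # shift back to x coordinates
print("strongest edge response at x =", edge)           # expect about x = 50
```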

Page 27

Image-Processing Operations

• With the convolution applied we can more easily see the edge at x=50.

• Using convolution we are able to discover where edges are located, and this allows us to make an accurate line drawing.

Page 28

Image-Processing Operations

• Here is an example of applying convolution to a 2-D picture of the Mona Lisa.

Page 29

Outline

9.1: Introduction
9.2: Image Formation
9.3: Image Processing Operations for Early Vision
9.4: Extracting 3D Information using Vision
9.5: Using Vision for Manipulation and Navigation
9.6: Object Representation and Recognition
9.7: Summary

Page 30

Outline

9.4: Extracting 3D Information using Vision
Motion
Binocular Stereopsis
Texture Gradient
Shading
Contour

Page 31

Extracting 3-D Information Using Vision

We need to extract 3-D information for performing certain tasks such as manipulation, navigation, and recognition.

Three aspects:
1. Segmentation
2. Position & Orientation
3. Shape

To recover 3-D information there are a number of cues available including motion, binocular stereopsis, texture, shading and contour.

Page 32

Extracting 3-D Information Using Vision

Motion

• Optical Flow - the apparent motion in the image that results when a camera moves relative to the 3-D scene.

Page 33

Extracting 3-D Information Using Vision

• To measure Optical Flow, we need to find corresponding points between one time frame and the next.

• One measure is the Sum of Squared Differences (SSD), summed over the pixels (x, y) in a window:

SSD(Dx, Dy) = Σ(x,y) ( I(x, y, t) - I(x+Dx, y+Dy, t+Dt) )²

Page 34

Extracting 3-D Information Using Vision

• The other measure is Cross-Correlation (CC):

CC(Dx, Dy) = Σ(x,y) I(x, y, t) I(x+Dx, y+Dy, t+Dt)

• Cross-correlation works best when there is texture in the scene, because then there is significant brightness variation among the pixels.
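
A minimal sketch of block matching with these two measures; the synthetic frames, window size, and search range are invented for illustration.

```python
import numpy as np

def ssd(a, b):
    return np.sum((a - b) ** 2)

def cross_correlation(a, b):
    return np.sum(a * b)

def flow_at(prev, curr, x, y, half=3, search=5):
    """Estimate the flow (Dx, Dy) of the window centred at (x, y) by minimising
    SSD over a small search range (maximise cross_correlation to use CC instead)."""
    window = prev[y - half:y + half + 1, x - half:x + half + 1]
    best_score, best_d = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cand = curr[y + dy - half:y + dy + half + 1,
                        x + dx - half:x + dx + half + 1]
            score = ssd(window, cand)
            if best_score is None or score < best_score:
                best_score, best_d = score, (dx, dy)
    return best_d

# Two synthetic textured frames, the second shifted right by 2 and down by 1 pixel.
prev = np.random.default_rng(1).random((40, 40))
curr = np.roll(np.roll(prev, 1, axis=0), 2, axis=1)
print(flow_at(prev, curr, x=20, y=20))  # expect (2, 1)
```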

Page 35

Extracting 3-D Information Using Vision

Binocular Stereopsis

• Binocular stereopsis uses multiple images taken from different positions in space, whereas motion uses multiple images taken over time.

• Because each camera views the scene from a slightly different position, if we superpose the two images there will be a disparity in the locations of important features.

Page 36

Extracting 3-D Information Using Vision

• This also allows us to easily determine depth. Knowing the distance between the cameras, and the point at which their lines of sight intersect, it only requires a few simple geometric calculations to determine the depth coordinate z for any given (x, y) coordinate.
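
A minimal sketch of that calculation for the common parallel, rectified two-camera arrangement, where depth follows from the disparity, the baseline b between the cameras, and the focal length f; the numbers below are illustrative.

```python
def depth_from_disparity(x_left, x_right, focal_length, baseline):
    """Depth of a scene point from its horizontal image coordinates in two
    parallel, rectified cameras: Z = f * b / (x_left - x_right)."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("Disparity must be positive for a point in front of the cameras.")
    return focal_length * baseline / disparity

# Example: f = 700 pixels, baseline = 0.12 m, disparity = 14 pixels  ->  Z = 6.0 m.
print(depth_from_disparity(x_left=352.0, x_right=338.0, focal_length=700.0, baseline=0.12))
```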

Page 37

Extracting 3-D Information Using Vision

Texture Gradient

• Texture refers to a spatially repeating pattern on a surface that can be sensed visually.

• In the images, the apparent size, shape, and spacing of the repeating texture elements (texels) vary.

Page 38

Extracting 3-D Information Using Vision

The two main causes for this variation in size are:

• Varying distance from the camera to the different texture elements.

• Varying orientation of the texels relative to the line of sight from the camera.

It is possible to express the rate of change of these texel features mathematically; these rates of change are called texture gradients.

Page 39

Extracting 3-D Information Using Vision

Texture can be used to determine shape via a two-step process: (a) measure the texture gradients and (b) estimate the surface shape, slant and tilt that could give rise to them.
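
As a rough illustration of step (a) only, the sketch below fits a linear trend to measured texel areas at known image positions; the measurements are invented, and recovering slant and tilt from the fitted gradient (step b) is not shown.

```python
import numpy as np

# Invented measurements: texel areas (in pixels^2) observed at image positions (x, y).
positions = np.array([[10, 10], [30, 10], [50, 10],
                      [10, 30], [30, 30], [50, 30],
                      [10, 50], [30, 50], [50, 50]], dtype=float)
areas = np.array([40.0, 40.5, 39.8, 30.2, 29.7, 30.1, 20.3, 19.9, 20.1])

# Least-squares fit of area ~ a*x + b*y + c; (a, b) is the measured texture gradient.
A = np.column_stack([positions, np.ones(len(positions))])
(a, b, c), *_ = np.linalg.lstsq(A, areas, rcond=None)
print("texture gradient ~", (round(float(a), 3), round(float(b), 3)))  # area shrinks along y
```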

Page 40

Extracting 3-D Information Using Vision

Shading

• Shading is the variation in the intensity of light received from different portions of a surface in the scene.

• Given the image brightness, I (x, y), our hope is to recover the scene geometry and the reflectance properties of the object.

• But this has proved difficult to do in anything but the simplest cases.

Page 41

Extracting 3-D Information Using Vision

• The main problem is dealing with interreflections.

• In most scenes the surfaces are not only illuminated by the light sources, but also by the light reflected from other surfaces which serve as a secondary light source.

• These mutual illumination effects are quite significant.

Page 42

Extracting 3-D Information Using Vision

Contour

• Contour refers to the use of lines in a line drawing to convey a vivid perception of 3-D shape and layout.

• The task is to determine the exact significance of each line in the image.

• This is also called the line labeling problem, since the task is to label each line according to its significance.

Page 43

Extracting 3-D Information Using Vision

• In a simplified world, where all surface marks and shadows have been removed, all lines can be classified as either limbs or edges.

• A limb is the locus of points on the surface where the line of sight is tangent to the surface.

• An edge is a surface-normal discontinuity.

• Edges can be further classified as convex, concave, or occluding.

Page 44

Extracting 3-D Information Using Vision

• "+" and "-" labels represent convex and concave edges respectively.

• "<-" and "->" labels represent occluding edges.

• "<-<-" and "->->" labels represent limbs.

Page 45

Extracting 3-D Information Using Vision

In 1971, Huffman and Clowes independently studied the line labeling problem for trihedral solids – objects in which exactly three plane surfaces come together at each vertex.

Page 46

Extracting 3-D Information Using Vision

For this particular trihedral world, Huffman and Clowes made an exhaustive list of all the different vertex types and the different ways in which they could be viewed from a general viewpoint.

Page 47

Extracting 3-D Information Using Vision

They created a junction dictionary to find a labeling for the line drawing. Later this work was generalized to arbitrary polyhedra and to piecewise smooth curved objects.

Page 48

Outline

9.1: Introduction
9.2: Image Formation
9.3: Image Processing Operations for Early Vision
9.4: Extracting 3D Information using Vision
9.5: Using Vision for Manipulation and Navigation
9.6: Object Representation and Recognition
9.7: Summary

Page 49

Outline

9.5: Using Vision for Manipulation and Navigation
Driving Example
Lateral Control
Longitudinal Control

Page 50

Using Vision for Manipulation and Navigation

• One of the main uses of vision is to provide information for manipulating objects as well as navigating in a scene while avoiding obstacles.

• A good example of the use of vision is driving.

Page 51

Using Vision for Manipulation and Navigation

Figure 24.24: The information needed for visual control of a vehicle on a freeway.

Page 52

Using Vision for Manipulation and Navigation

The tasks for the driver in Figure 24.24:

1. Keep moving at a reasonable speed (v0).
2. Lateral control (dl = dr).
3. Longitudinal control (d2 = safe distance).
4. Monitor vehicles in neighboring lanes and be prepared for action if one of them decides to change lanes.

Page 53

Using Vision for Manipulation and Navigation

• The problem for the driver is to generate appropriate steering, acceleration, or braking actions to best accomplish these tasks.

• Focusing specifically on lateral and longitudinal control, what information is needed for these tasks?

Page 54

Using Vision for Manipulation and Navigation

Lateral Control:

• The steering control system for the vehicle needs to detect edges corresponding to the lane-marker segments and then fit smooth curves to them.

• The parameters of these curves carry information about the lateral position of the car, the direction it is pointing relative to the lane, and the curvature of the lane.

• The dynamics of the vehicle are also needed.
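
The slides give no code for this; the following is a minimal sketch of the curve-fitting step, with invented lane-marker measurements, showing how the fitted parameters carry the lateral position, relative direction, and curvature mentioned above.

```python
import numpy as np

# Invented lane-marker points from an edge detector:
# 'ahead' is distance in front of the car (m), 'lateral' is sideways offset (m).
ahead   = np.array([ 5.0, 10.0, 15.0, 20.0, 25.0, 30.0])
lateral = np.array([1.52, 1.60, 1.73, 1.91, 2.13, 2.40])

# Fit a smooth quadratic curve lateral(d) = c2*d^2 + c1*d + c0 to the marker points.
c2, c1, c0 = np.polyfit(ahead, lateral, deg=2)

print("lateral offset of the marker at the car (m):", round(c0, 2))
print("marker direction relative to the car's heading (rad):", round(c1, 3))
print("approximate curvature of the lane (1/m):", round(2 * c2, 4))
```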

Page 55

Using Vision for Manipulation and Navigation

Longitudinal Control:

• The driver needs to know the distance to the vehicles in front.

• This can be accomplished using binocular stereopsis or optical flow.

• The driving example makes one point very clear: for a specific task, one does not need to recover all the information that in principle can be recovered from an image.

Page 56

Outline

9.1: Introduction
9.2: Image Formation
9.3: Image Processing Operations for Early Vision
9.4: Extracting 3D Information using Vision
9.5: Using Vision for Manipulation and Navigation
9.6: Object Representation and Recognition
9.7: Summary

Page 57

Outline

9.6: Object Representation and Recognition
Alignment Method
Projective Invariants
Representation of Models
Matching Models to Images

Page 58

Object Representation and Recognition

Problem:

• Given: a scene consisting of one or more objects chosen from a collection of objects and an image of the scene taken from an unknown viewer position and orientation.

• Determine: Which of the objects from the collection are present in the scene and for each object, determine its position and orientation relative to the viewer.

Page 59

Object Representation and Recognition

• The two fundamental issues that any object recognition scheme must address are the representation of the models and the matching of models to images.

• The most common way of representing 3-D objects in a recognition system is by using generalized cylinders.

Page 60

Object Representation and Recognition

• Examples of Generalized Cylinders:

Page 61

Object Representation and Recognition

Alignment Method:

• Handy for identifying 3-D objects without knowing their position or orientation with respect to the observer.

• It accomplishes this by representing the object with a set of m features or distinguishing points in 3-D.

• The model points are then subjected to an unknown 3-D rotation R, followed by a translation by an unknown amount t and a projection, giving rise to the feature points observed in the image plane.
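
A minimal sketch of that forward model (rotation, translation, then projection), using a scaled orthographic projection for simplicity; the model points, rotation angle, and translation are invented, and a real alignment method would instead solve for R and t from a few point correspondences.

```python
import numpy as np

def project_model(points, R, t, scale=1.0):
    """Rotate and translate 3-D model points, then project onto the image
    plane with a scaled orthographic projection (drop the z coordinate)."""
    transformed = points @ R.T + t
    return scale * transformed[:, :2]

# Invented model: four distinguishing points of an object in its own frame.
model = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])

theta = np.radians(30)                        # rotation about the z-axis
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([0.5, -0.2, 4.0])

print(project_model(model, R, t))             # predicted image feature points
```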

Page 62

Object Representation and Recognition

• A disadvantage of this approach is that it involves trying each model in the model library, resulting in a recognition complexity proportional to the number of models in the library.

• A solution is provided by using geometric invariants as the shape representation. This model uses the projective invariants measured from the image curves.
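
The slides do not name a particular invariant; as one classic example, the cross-ratio of four collinear points keeps its value under any projective transformation of the line. The points and the transformation below are invented.

```python
def cross_ratio(a, b, c, d):
    """Cross-ratio (a, b; c, d) of four collinear points, given their 1-D
    coordinates along the line; invariant under projective maps."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))

def projective(x, p=2.0, q=1.0, r=0.3, s=1.0):
    """An arbitrary 1-D projective transformation x -> (p*x + q) / (r*x + s)."""
    return (p * x + q) / (r * x + s)

pts = [0.0, 1.0, 2.5, 4.0]
print(cross_ratio(*pts))                           # value measured on the object
print(cross_ratio(*(projective(x) for x in pts)))  # same value measured in the image
```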

Page 63

Object Representation and Recognition

• When a measured invariant corresponds to a value in the library, a recognition hypothesis is generated. This is verified by back-projecting the outline, just as in the alignment method.

• An advantage of invariant shape representation is that models can be acquired directly from images without making measurements on the actual objects because the shape descriptors have the same value when measured in any image.

Page 64

Object Representation and Recognition

• Although the computer is capable of recognizing a broad array of images, there are some images that are currently nearly impossible for the computer to recognize.

Page 65

Object Representation and Recognition

• Other images show ambiguities that humans are capable of handling with little difficulty.

• Can a computer algorithm distinguish which object is intended when there are a number of possible objects?

Page 66

Object Representation and Recognition

• Further Examples:

Page 67

Object Representation and Recognition

• There are also images of objects that can exist in 2-D but cannot exist in 3-D; can a computer algorithm detect this?

Page 68

Summary

• Perception Agents

• Perception Sensors

• The Straightforward Approach

• Manipulation

• Navigation

• Object Recognition

Page 69

Summary

• Perspective Projection

• Orthographic Projection

• Lens Systems

• Photometry of Image Formation

Page 70

Summary

• Edges

• Convolving

• The smoothing Gaussian function

Page 71

Summary

• Motion

• Binocular stereopsis

• Texture

• Shading

• Contour

Page 72

Summary

• Some main uses of vision

• Representation of models

• Matching of models to images

• Alignment Method

• Projective Invariants

• Problems with Object Recognition

Page 73

Sources

• CPSC 533 textbook
• http://sern.ucalgary.ca/courses/CPSC/533/W99/presentations/L2_24A_Lee_Wang/
• http://sern.ucalgary.ca/courses/CPSC/533/W99/presentations/L1_24A_Kaasten_Steller_Hoang/main.htm
• http://sern.ucalgary.ca/courses/CPSC/533/W99/presentations/L1_24_Schebywolok/index.html
• http://sern.ucalgary.ca/courses/CPSC/533/W99/presentations/L2_24B_Doering_Grenier/
• http://www.geocities.com/SoHo/Museum/3828/optical.html
• http://members.spree.com/funNgames/katbug/