2010 International Conference on Optoelectronics and Image Processing (ICOIP), Haikou, Hainan, China, 11-12 November 2010

The Obstacle Avoidance and Navigation based on Stereo Vision for Mobile Robot

ZHAO Yong-guo¹, CHENG Wei¹, JIA Lei², MA Si-le²
1. Institute of Automation, Shandong Academy of Sciences, Jinan, China 250014
2. School of Control Science and Engineering, Shandong University, Jinan, China 250061
[email protected]

Abstract—In this paper, an overview of stereo vision and stereo image processing is given, and a method of obstacle avoidance and navigation based on stereo vision for a mobile robot is presented. Making use of 3-D stereo reconstruction, the recognition of the visible terrain in front of the mobile robot is solved for subsequent obstacle avoidance. An area-based stereo reconstruction algorithm that combines a pyramidal data structure with dynamic programming is used to recognize the local environment. The vision system can thus be used to identify, locate, and approach mechanical objects autonomously.

Keywords—Stereo vision; Obstacle avoidance; Mobile robot

I. INTRODUCTION

Machine vision is a useful robotic sensor since it mimics the human sense of vision and allows for non-contact measurement of the environment. A 3-D object gives rise to an infinite variety of 2-D images or views, because of the infinite number of possible poses relative to the viewer.

Two eyes or cameras looking at the same scene from different perspectives provide a means of determining three-dimensional shape and position. Stereo is an important method for machine perception because it leads to direct depth measurements. Additionally, unlike monocular techniques, stereo does not infer depth from weak or unverifiable photometric and statistical assumptions, nor does it require specific, detailed object models. Once stereo images have been brought into point-to-point correspondence, recovering depth by triangulation is straightforward.

The heart of a vision-based obstacle avoidance and navigation system is stereo reconstruction of the surface relief. Extensive research experience in the field of stereo analysis has been accumulated worldwide. The known algorithms for passive stereo matching fall into two basic categories: feature-based and area-based [1-5]. Algorithms of both categories often use special methods to improve matching reliability. The central issue in vision-guided navigation is the design of a relatively stable and fast algorithm for stereo reconstruction.

II. STEREO VISION

Viewing a scene from two (or more) different positions simultaneously allows us to make inferences about 3-D structure, provided that we can match up corresponding points in the images. This technique is called stereo vision. The key geometric problem in stereo vision is to find corresponding points in stereo images. Corresponding points are the projections of a single 3-D point in the different image spaces. The difference in the position of corresponding points in their respective images is called disparity (see Fig. 1). Disparity is a function both of the position of the 3-D scene point and of the position, orientation, and physical characteristics of the stereo devices (e.g., the cameras).

Fig. 1: (a) A system with two cameras: the focal points are Fl and Fr, the image planes are Il and Ir. A point P in the 3-D scene is projected onto Pl in the left image and onto Pr in the right image. (b) Cyclopean view: the disparity Δ is the difference in the position of the projections of the point P onto the two stereo image planes.

Fig. 2: Epipolar lines and epipolar planes

In addition to providing the function that maps pairs of corresponding image points onto scene points, a camera model can be used to constrain the search for a corresponding image point to one dimension. Any point in the 3-D world space, together with the centers of projection of two camera


systems defines an epipolar plane. The intersection of such a plane with an image plane is called an epipolar line (see Fig. 2). Every point of a given epipolar line must correspond to a single point on the corresponding epipolar line. The search for a match of a point in the first image may therefore be reduced to a one-dimensional neighborhood in the second image plane (as opposed to a 2-D neighborhood).
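As an illustration of how this constraint is used in practice (a minimal sketch, not from the paper; the fundamental matrix F below is a hypothetical stand-in that would normally come from calibration), the epipolar line in the second image for a point in the first is obtained by a single matrix product:

```python
import numpy as np

# Hypothetical fundamental matrix relating the two views; in a real system
# it is obtained from camera calibration or eight-point estimation.
F = np.array([[0.0,   -1e-4,  0.02],
              [1e-4,   0.0,  -0.03],
              [-0.02,  0.03,  1.0]])

def epipolar_line(F, x, y):
    """For a pixel (x, y) in image 1, return coefficients (a, b, c) of the
    epipolar line a*u + b*v + c = 0 in image 2."""
    line = F @ np.array([x, y, 1.0])
    return line / np.linalg.norm(line[:2])  # scale so (a, b) is a unit normal

a, b, c = epipolar_line(F, 64.0, 64.0)
print(f"match for (64, 64) must lie on {a:.3f}*u + {b:.3f}*v + {c:.2f} = 0")
```

Restricting the correlation search to this line is what reduces matching from a 2-D problem to a 1-D one.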

When the stereo cameras are oriented such that there is a known, purely horizontal displacement between them, disparity can only occur in the horizontal direction and the stereo images are said to be in correspondence. When a stereo pair is in correspondence, the epipolar lines coincide with the horizontal scan lines of the digitized pictures.
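For such a rectified pair, triangulation becomes a one-line computation. With focal length f (in pixels), baseline b, and horizontal disparity Δ (the symbols are ours; the paper does not spell this standard formula out), the depth of the scene point is

$$Z = \frac{f\,b}{\Delta},$$

so nearby points produce large disparities and distant points small ones, which is why disparity resolution limits the useful sensing range.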

Ideally, one would like to find the correspondence of every individual pixel in both images of a stereo pair. However, the information content in the intensity value of a single pixel is too low for unambiguous matching. In practice, continuous areas of image intensity are the basic units that are matched. This approach (called area matching) usually involves some form of cross-correlation to establish correspondences.

III. STEREO MATCHING

The main problem in matching is to find an effective definition of what we call a valid correlation [8]. Correlation scores are computed by comparing a fixed window in the first image against a shifting window in the second. The second window is moved in the second image by integer increments along the corresponding epipolar line, and a correlation score curve is generated for integer disparity values. The measured disparity can then be taken to be the one that provides the largest peak.

To quantify the similarity between two correlation windows, we must choose among many different criteria one that produces reliable results in minimum computation time. We denote by I1(x, y) and I2(x, y) the intensity values at pixel (x, y) in the first and second images. The correlation window has dimensions (2n+1) x (2m+1); therefore, the indexes in the formula below vary between -n and +n for the i-index and between -m and +m for the j-index:

$$C_1(x, y, \Delta) = \frac{\displaystyle\sum_{i=-n}^{n} \sum_{j=-m}^{m} \left[ I_1(x+i,\, y+j) - I_2(x+i+\Delta,\, y+j) \right]^2}{\sqrt{\displaystyle\sum_{i,j} I_1^2(x+i,\, y+j) \times \sum_{i,j} I_2^2(x+i+\Delta,\, y+j)}} \tag{1}$$

It is important to know whether a match is reliable or not. The form of the correlation curve (for example, that of C1) can be used to decide whether the probability that the match is an error is high. Indeed, errors occur when a wrong peak slightly higher than the right one is chosen. Thus, if the correlation curve contains several peaks of approximately the same height, the risk of choosing the wrong one increases, especially if the image is noisy. However, a confidence coefficient proportional to the difference in height between the most important peaks may be defined. Other important information may also be extracted from the correlation curve, for instance the presence of bland (textureless) areas.
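The following NumPy sketch illustrates this procedure (an illustration only, not the authors' code; the window size, disparity range, and peak-ratio confidence formula are assumptions). Note that criterion (1) is a normalized difference, so the best match corresponds to the smallest score:

```python
import numpy as np

def c1(I1, I2, x, y, d, n=3, m=3):
    """Criterion (1): normalized squared difference between a window centered
    at (x, y) in I1 and the window shifted by integer disparity d in I2.
    The caller must keep (x, y, d) far enough from the image borders."""
    w1 = I1[y - m:y + m + 1, x - n:x + n + 1].astype(float)
    w2 = I2[y - m:y + m + 1, x + d - n:x + d + n + 1].astype(float)
    num = np.sum((w1 - w2) ** 2)
    den = np.sqrt(np.sum(w1 ** 2) * np.sum(w2 ** 2)) + 1e-12
    return num / den

def match_along_scanline(I1, I2, x, y, disparities):
    """Evaluate C1 for each candidate disparity along the epipolar scan line;
    return the best disparity and a peak-ratio confidence in [0, 1]."""
    scores = np.array([c1(I1, I2, x, y, d) for d in disparities])
    order = np.argsort(scores)              # ascending: best match first
    best, second = scores[order[0]], scores[order[1]]
    confidence = (second - best) / (second + 1e-12)
    return disparities[order[0]], confidence
```

A confidence near zero signals several near-equal peaks (or a bland area), and such matches can be rejected before reconstruction.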

IV. OBSTACLE AVOIDANCE AND NAVIGATION ALGORITHM

The general principle applied to match points in the right and left images is correlation [9]. It consists of comparing the gray-level values of the images over a small (3x3 pixel) local window centered on each point of the left image to find the most similar window in the right image. The disparities (parallaxes: pixel shifts from the left to the right image) thus obtained are then used to reconstruct the distance between the robot and the points of the terrain surface, based on the given camera geometry.

The stereo matching process is implemented under the following assumptions concerning the stereo images and the surface to be reconstructed:

1) The original images are always noisy due to geometric and photometric distortions;

2) The reconstructed surface is mainly smooth (the "continuity constraint");

3) The stereo pair is taken as a perspective projection of the scene, which means that disparity values generally decrease upwards.

The search for correspondences is based on:

1) A fast procedure to extract the brightness features which are most stable with respect to brightness distortions;

2) Construction of a data pyramid for both stereo images. The original image is taken as the zero pyramid layer; each next layer is derived from the previous one by downscaling it by a factor of 1/2.

A local correlation analysis is implemented iteratively on the image pyramid. The identification process starts at the top pyramid layer and continues over the pyramid layers from top to bottom. The parallax values obtained are stored and then used as initial shift values when passing to the next, higher-resolution layer. Local correlation analysis is combined with the dynamic programming method to reconstruct the relief along the scan lines at each pyramid layer. The method satisfies the continuity principle; it also introduces regularization to extract a smooth relief. A simplified sketch of this coarse-to-fine loop is given below.
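In this sketch (illustrative only), the per-pixel squared difference stands in for the windowed criterion (1), the dynamic-programming pass along scan lines is omitted for brevity, and power-of-two image sizes (e.g. 128x128, as in the paper) are assumed:

```python
import numpy as np

def build_pyramid(img, levels):
    """Zero layer is the original image; each next layer halves the
    resolution by 2x2 block averaging."""
    pyr = [img.astype(float)]
    for _ in range(levels - 1):
        a = pyr[-1]
        h, w = (a.shape[0] // 2) * 2, (a.shape[1] // 2) * 2
        pyr.append(a[:h:2, :w:2] / 4 + a[1:h:2, :w:2] / 4 +
                   a[:h:2, 1:w:2] / 4 + a[1:h:2, 1:w:2] / 4)
    return pyr

def coarse_to_fine(I1, I2, levels=3, search=2):
    """Estimate a disparity map from the top (coarsest) layer down to
    layer 0; each layer's estimate, doubled, seeds a small local search."""
    p1, p2 = build_pyramid(I1, levels), build_pyramid(I2, levels)
    disp = np.zeros_like(p1[-1])
    for l in range(levels - 1, -1, -1):
        a, b = p1[l], p2[l]
        if l < levels - 1:             # upsample and rescale previous result
            disp = 2 * np.kron(disp, np.ones((2, 2)))[:a.shape[0], :a.shape[1]]
        new = np.zeros_like(disp)
        for y in range(a.shape[0]):
            for x in range(a.shape[1]):
                d0 = int(disp[y, x])
                best, best_d = np.inf, d0
                for d in range(d0 - search, d0 + search + 1):
                    if 0 <= x + d < a.shape[1]:
                        cost = (a[y, x] - b[y, x + d]) ** 2
                        if cost < best:
                            best, best_d = cost, d
                new[y, x] = best_d
        disp = new
    return disp
```

Because each layer only refines the previous estimate within a small range, the total search effort stays nearly constant per pixel even for large disparities.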

Finally, the parallax map is converted to a real-scale distance map according to the geometry of the acquisition system. The value of each point on the map defines the distance from the robot to the corresponding point on the surface. The path planning algorithm is implemented in two steps:

1) In the first step, obstacles are detected at the scale of the original images in the following way. The distance map is recalculated to create an elevation map. Obstacles are detected on the elevation map according to the robot's locomotion capacity to get over them. The regions detected as obstacles represent prohibited zones in the robot's field of view.

2) In the second step, the path from the start point to the feasible destination points is generated by applying Dijkstra's algorithm to the elevation map. To implement this step, we consider those pixels which do not belong to obstacles as the nodes of a directed graph. The start point of the path (see Fig. 3) is taken as the graph origin.

Fig. 3: Path planning task: a directed graph on the image field, showing the obstacle (prohibited area), the feasible target points, the start point, and rows Y and Y+1.

As usual, the length of a particular path joining any two given vertices of the graph is defined as the sum of the weights of the edges composing the path. Because the real path must pass continuously through the image field, the graph edges are connected in the following way: each node in image row Y can be connected only with the three nearest nodes in the adjacent row Y+1 (see Fig. 3). Under this restriction, the number of operations for searching the shortest path from the start point to the destination points is strictly less than the number of operations used in the Voronoi diagram method [6]; a sketch of the search is given below.
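Under this connectivity rule the shortest-path search is compact. The sketch below is illustrative only: the edge cost of 1 plus the absolute height difference is an assumption, since the paper does not specify its weights.

```python
import heapq
import numpy as np

def plan_path(elev, obstacle, start_x):
    """Dijkstra over an H x W elevation map. obstacle is a boolean mask of
    prohibited pixels; motion goes from row 0 (start) toward the last row,
    each node connecting to the three nearest nodes of the next row.
    Returns the cheapest path as a list of (y, x), or None."""
    H, W = elev.shape
    dist = np.full((H, W), np.inf)
    prev = {}
    dist[0, start_x] = 0.0
    pq = [(0.0, 0, start_x)]
    while pq:
        d, y, x = heapq.heappop(pq)
        if d > dist[y, x] or y == H - 1:       # stale entry, or last row
            continue
        for dx in (-1, 0, 1):                  # three nodes of the next row
            nx = x + dx
            if 0 <= nx < W and not obstacle[y + 1, nx]:
                nd = d + 1.0 + abs(elev[y + 1, nx] - elev[y, x])
                if nd < dist[y + 1, nx]:
                    dist[y + 1, nx] = nd
                    prev[(y + 1, nx)] = (y, x)
                    heapq.heappush(pq, (nd, y + 1, nx))
    goal = (H - 1, int(np.argmin(dist[H - 1])))
    if not np.isfinite(dist[goal]):
        return None                            # no feasible path
    path = [goal]                              # walk back to the start
    while path[-1] != (0, start_x):
        path.append(prev[path[-1]])
    return path[::-1]
```

The backward walk over the stored predecessors corresponds to the paper's reconstruction of the whole path from the target to the start position.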

Finally, a virtual destination is selected as a target point, keeping the robot's displacement within the direction defined by the mission task. The whole path is then reconstructed from the target back to the start position according to the best direction stored for each graph node.

V. SYSTEM ARCHITECTURE

The robot stereo vision system consists of two cameras and an onboard computer providing image capture and processing facilities.

The stereo cameras are installed on the robot in such a way as to let them analyze the nearest robot environment (Fig. 4). The blind area extends approximately 2 m from the robot. The cameras are installed on a vertical rack 1 m high and inclined 10° toward the horizon. A rather large stereo basis (50 cm) made it possible to process stereo pairs at a resolution of 128x128 pixels. This is enough to recognize major obstacles during robot motion: the difference between the parallax values corresponding to the top and the bottom of a stone 30 cm high is 3 pixels at a distance of 14 m from the robot's center.

Since the stereo matching process is based on the assumption of epipolar geometry of the original stereo images, the optical axes of the cameras must be strictly parallel to each other. Let us estimate the alignment accuracy required of the cameras, assuming that the vertical parallax of corresponding points must stay within one pixel. Suppose that the world coordinate system X, Y, Z coincides with the left camera position (Fig. 5). Denote the camera viewing angles as a pan angle ψ about the Y-direction, a tilt angle θ about the X-direction, and a roll angle α about the Z-direction (Δψ, Δθ, and Δα denote the deviations of these angles from the parallel-optical-axes configuration). The pixel mismatch due to camera misalignment can be calculated by the following formula:

$$\begin{aligned} x' - x = \Delta x &= \left( f + \frac{x^2}{f} \right) \Delta\psi - \frac{xy}{f}\, \Delta\theta + y\, \Delta\alpha \\ y' - y = \Delta y &= \frac{xy}{f}\, \Delta\psi - \left( f + \frac{y^2}{f} \right) \Delta\theta - x\, \Delta\alpha \end{aligned} \tag{2}$$

where f is the camera focal length, (x, y) is the ideal projection, and (x', y') is the real projection of the point P (Fig. 5).
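Formula (2) gives a quick way to budget the mechanical alignment tolerance. The sketch below evaluates the vertical mismatch Δy at an image corner, where it is largest, for a few trial misalignment angles; the focal length is derived from the paper's 32° vertical visual angle and 128-pixel image height, while the trial angles are assumptions:

```python
import numpy as np

def mismatch(f, x, y, dpsi, dtheta, dalpha):
    """Pixel displacement (dx, dy) from formula (2); angles in radians,
    f and (x, y) in pixels."""
    dx = (f + x * x / f) * dpsi - (x * y / f) * dtheta + y * dalpha
    dy = (x * y / f) * dpsi - (f + y * y / f) * dtheta - x * dalpha
    return dx, dy

# f = (128 / 2) / tan(32 deg / 2) ~= 223 pixels for the paper's camera set-up.
f = 64.0 / np.tan(np.radians(16.0))
x = y = 64.0                      # image corner, where the mismatch is largest
for deg in (0.1, 0.25, 0.5):      # trial misalignment applied to all 3 angles
    a = np.radians(deg)
    _, dy = mismatch(f, x, y, a, a, a)
    print(f"{deg:5.2f} deg misalignment -> vertical parallax {abs(dy):.2f} px")
```

With these numbers the one-pixel budget is exhausted at roughly 0.2° of combined misalignment, which shows how strict the mechanical alignment requirement is.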

Fig. 4: The robot stereo vision system (cameras mounted at a height of 1 m with an initial pitch of 10°; vertical visual angle 32°; the viewed strip of terrain extends from about 2 m to about 11 m in front of the robot).

Fig. 5: The scheme of the cameras' orientation in the stereo vision system (baseline b along the x-axis; camera axes x, y, z and x', y', z'; projections Pl and Pr of the point P(x, y, z); angles ψ, θ, α).

The steps involved in the developed vision system are shown in Fig. 6. Data is first acquired by two CCD cameras in a stereo set-up (Fig. 4). The images are then processed for features; the processing includes region detection and corner detection. Using stereo vision theory, 3-D information about the object is obtained. Each detected region (facet) is first identified. This is done by breaking each region into its smallest elements and finding each element's relationship with its adjacent elements. Three levels of facet matching are used: the first check is whether the numbers of sides are similar, the next whether all the angles are similar, and the last whether adjacent sides are similar. Each recognized facet is assigned a type number which lies between 0 and 1. Once facets are identified, the relationships between them are found. This is done using a grid which represents how each surface is connected to every other region touching it. These data are input as a vector into the artificial neural network, which represents it as a node. The object is rotated slowly to simulate robot movement. A small illustration of such a connectivity grid follows.
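In this sketch the facet labels, type numbers, and data layout are hypothetical; the paper does not give a concrete format. It builds an adjacency grid over detected facets and flattens it into the input vector for the network:

```python
import numpy as np

# Hypothetical facets: a type number in [0, 1] per facet, plus which
# facets share an edge on the observed object.
facet_types = [0.25, 0.50, 0.75]
adjacent = [(0, 1), (1, 2)]

n = len(facet_types)
grid = np.zeros((n, n))
for i, j in adjacent:                     # symmetric connectivity grid
    grid[i, j] = grid[j, i] = 1.0
np.fill_diagonal(grid, facet_types)       # store each facet's type on the diagonal

ann_input = grid.flatten()                # one vector per characteristic view
print(ann_input)
```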

Fig. 6: System architecture (data acquisition → image processing → facet detection → connectivity generation → ARTMAP, comprising ART-A, the map field, and ART-B).

Fig. 7: Example of an object represented in the developed software.

Each characteristic view of the object is shown to the camera, and the information is fed into the artificial neural network (ANN). Similar surface relationships access the same node, while new relationships are learnt as new nodes. This is a feature of all ART (adaptive resonance theory) based neural networks [10]. If the ANN has previously learned to recognize an input vector, then a resonant state is achieved quickly when that input vector is presented. During resonance, the adaptation process reinforces the memory of the stored pattern. If the input vector is not immediately recognized, the network rapidly searches through its stored patterns looking for a match. If no match is found, the network enters a resonant state whereupon the new pattern is stored for the first time. Thus the network responds quickly to previously learned data yet remains able to learn when novel data are presented. The many-to-one learning feature of Fuzzy ARTMAP is used, whereby relationships from several views of an object can be associated with a single vector at the second (ART-B) module [11]. When a single novel image of an object is presented, the correct object can be recognized. Fig. 7 is an example of an object represented in the developed software.
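The resonance-and-search behaviour described above can be seen in a compact Fuzzy ART sketch (one module only; a full Fuzzy ARTMAP couples two such modules through a map field, and all parameter values here are assumptions):

```python
import numpy as np

class FuzzyART:
    """Minimal Fuzzy ART: category choice, vigilance test, fast learning."""
    def __init__(self, vigilance=0.8, alpha=0.001, beta=1.0):
        self.rho, self.alpha, self.beta = vigilance, alpha, beta
        self.w = []                                   # one weight vector per node

    def train(self, I):
        """I must lie in [0, 1]; complement coding keeps |I| constant."""
        I = np.concatenate([I, 1.0 - I])
        # rank existing categories by the choice function T_j
        T = [np.minimum(I, w).sum() / (self.alpha + w.sum()) for w in self.w]
        for j in np.argsort(T)[::-1]:
            match = np.minimum(I, self.w[j]).sum() / I.sum()
            if match >= self.rho:                     # resonance: reinforce memory
                self.w[j] = self.beta * np.minimum(I, self.w[j]) \
                            + (1 - self.beta) * self.w[j]
                return j
        self.w.append(I.copy())                       # no match: learn a new node
        return len(self.w) - 1

net = FuzzyART()
print(net.train(np.array([0.2, 0.9])))   # 0: first pattern stored as node 0
print(net.train(np.array([0.21, 0.88]))) # 0: resonates with the stored node
print(net.train(np.array([0.9, 0.1])))   # 1: novel pattern -> new node
```

Raising the vigilance parameter makes the network split finer categories, mirroring the trade-off between recognizing known views and learning new ones.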

VI. CONCLUSIONS

In this paper we have discussed the concept of a stereo vision system and an obstacle avoidance and navigation algorithm for mobile robots. Both the good quality and the high performance of the software implementation demonstrate the feasibility of real-time automatic navigation. The navigation software is written in C++. The overall processing time is 10 s at an image resolution of 128x128 pixels. This result is comparable to the French hardware-based solution [7], but does not require additional costs.

REFERENCES

[1] P. Fua, "A parallel stereo algorithm that produces dense depth maps and preserves image features," Machine Vision and Applications, vol. 6, no. 1, pp. 35-49, 1993.

[2] W. E. L. Grimson, "Computational experiments with a feature based stereo algorithm," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-7, no. 1, pp. 17-33, Jan. 1985.

[3] M. J. Hannah, "A system for digital stereo image matching," Photogrammetric Engineering and Remote Sensing, vol. 55, no. 12, pp. 1765-1770, 1989.

[4] N. H. Kim and A. C. Bovik, "A contour-based stereo matching algorithm using disparity continuity," Pattern Recognition, vol. 21, no. 5, pp. 505-514, 1988.

[5] G. Medioni and R. Nevatia, "Matching images using linear features," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-6, no. 6, pp. 675-685, 1984.

[6] C. Proy et al., "Improving autonomy of Marsokhod 96," in Proc. 44th Congress of the International Astronautical Federation, Graz, Austria, Oct. 1993.

[7] V. Bruce and P. Green, Visual Perception: Physiology, Psychology and Ecology. Lawrence Erlbaum Associates Ltd., 1998.

[8] N. Chauvin, G. Marti, and K. Konolige, "Contour maps for real-time range image parsing," unpublished, Jan. 1997.

[9] G. Marti, "Stereoscopic camera real time processing and robot navigation," Diploma work, Mar. 1997.

[10] C. D'Souza, K. Sivayoganathan, D. Al-Dabass, V. Balendran, and J. Keat, "Machine vision for robotic assembly: issues and experiments," in Proc. 13th National Conference on Manufacturing Research, Glasgow, 9-11 Sept. 1997, pp. 114-118. ISBN 1-9012-4811-9.

[11] G. A. Carpenter and S. Grossberg, "Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps," IEEE Transactions on Neural Networks, vol. 3, no. 5, pp. 698-713, 1992.
