

The authors are solely responsible for the content of this technical presentation. The technical presentation does not necessarily reflect the official position of the American Society of Agricultural and Biological Engineers (ASABE), and its printing and distribution does not constitute an endorsement of views which may be expressed. Technical presentations are not subject to the formal peer review process by ASABE editorial committees; therefore, they are not to be presented as refereed publications. Citation of this work should state that it is from an ASABE meeting paper. EXAMPLE: Author's Last Name, Initials. 2012. Title of Presentation. ASABE Paper No. 12----. St. Joseph, Mich.: ASABE. For information about securing permission to reprint or reproduce a technical presentation, please contact ASABE at [email protected] or 269-932-7004 (2950 Niles Road, St. Joseph, MI 49085-9659 USA).

Author(s)

Ta-Te Lin, ASABE member, [email protected]
Affiliation: Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei 106, Taiwan, ROC

Kai-Chiang Chuang, [email protected]
Affiliation: Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei 106, Taiwan, ROC

An-Chih Tsai, [email protected]
Affiliation: Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei 106, Taiwan, ROC


Yu-Sung Chen, [email protected]
Affiliation: R&D Division, Automotive Research & Testing Center, No.6, Lugong S. 7th Rd., Lukang Township, Changhua County 505, Taiwan, ROC

Publication Information

Pub ID: 131597654
Publication: 2013 ASABE Annual Meeting Paper


An ASABE Conference Presentation
Paper Number: 131597654

A Real-Time Stereo Vision System for Obstacle Recognition and Motion Estimation

Ta-Te Lin, Professor Dept. of Bio-Industrial Mechatronics Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei 106, Taiwan, ROC, [email protected]

Kai-Chiang Chuang, MS Graduate Student Dept. of Bio-Industrial Mechatronics Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei 106, Taiwan, ROC, [email protected]

An-Chih Tsai, PhD Graduate Student Dept. of Bio-Industrial Mechatronics Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei 106, Taiwan, ROC, [email protected]

Yu-Sung Chen, R&D Engineer R&D Division, Automotive Research & Testing Center, Changhua, Taiwan, ROC, [email protected]

Written for presentation at the 2013 ASABE Annual International Meeting

Sponsored by ASABE Kansas City, Missouri

July 21–24, 2013

Abstract. Obstacle recognition and motion state estimation are important elements in autonomous navigation systems. In this research, a stereo vision system consisting of dual cameras was designed, and efficient algorithms for obstacle recognition and motion estimation are proposed. To satisfy the geometric constraints of ideal stereo vision theory, the dual cameras are mounted on a specially designed mechanism that keeps the optical axes of the cameras parallel. Once the lens distortion of the images is corrected by calibration, the disparity image can be estimated by the correspondence matching method; thus the distance and three-dimensional information of obstacles can be obtained in real time. To detect


and locate obstacles, the estimated three-dimensional information of each pixel in the image plane is projected onto a non-linear top-view map, and blob segmentation is used to define obstacle candidates in the top-view map. Pre-defined three-dimensional constraints are then imposed to filter out noisy blobs and thus accurately detect obstacles. Following detection, an obstacle recognition strategy is applied. The obstacles are first separated into elongated and non-elongated obstacles by a geometrical feature, the ratio of height to width. The elongated obstacle candidates include pedestrians and unknown objects, while the non-elongated candidates consist of small vehicles, large vehicles, and unknown objects. To recognize the obstacle types, the histogram of oriented gradients (HOG) is applied to extract features from the obstacle images. Before a support vector machine (SVM) is employed, linear discriminant analysis (LDA) is used to reduce the dimensionality of the HOG feature and speed up recognition. To track an obstacle across video frames, the Bhattacharyya distance is used as the matching index. When an obstacle can be tracked across frames, its motion can also be estimated using the Kalman filter, and the trajectories of obstacles are recorded. In our experiments, the developed stereo vision system operates at over 10 frames per second at a resolution of 640 x 480. The obstacle detection rate and the accuracy of obstacle recognition were both above 90% under various conditions. The error of motion estimation was about 50 cm.

Keywords. Stereo Vision, Support Vector Machine, Histogram of Oriented Gradient, Kalman Filter, Motion Estimation


Introduction

Obstacle recognition and motion state estimation are two leading topics in autonomous navigation and tracking. Conventionally, obstacle information is extracted from range sensors such as radar, laser range finders, ultrasonic sensors, or infrared sensors. However, these sensors provide only distance information and no color information, which is critical for obstacle recognition. With current technologies, stereo vision systems have been widely applied to distance estimation. Combining two cameras with stereo vision technology, a stereo vision system provides both the distance and the image information of an obstacle. Based on this information, obstacle recognition and motion state estimation become achievable in complex applications. Thus, a stereo vision based obstacle collision warning system was developed to recognize the type of obstacle and warn the driver of obstacle motion to help avoid traffic accidents.

In agricultural applications, farmers often use tractors or other vehicles to perform tasks in the field. To apply automatic machinery in agricultural environments, maintaining awareness of the surrounding working environment is important. For safety reasons, when tractors or vehicles are steered in an agricultural environment, warning signals about surrounding obstacles would increase driver safety. Hence, obstacle recognition and tracking are important in agricultural automation. Food insufficiency has become a serious problem as the world population keeps growing, and agricultural automation is a way to reduce labor and increase productivity. Ollis and Stentz (1996) applied stereo vision to guide an automatic harvester by tracking the line between cut and uncut crops; stereo vision can also provide an estimate of the crop harvesting area. Billingsley (2005) tracked the distance of wild animals by stereo vision to prevent them from damaging crops.

Recently, stereo vision technology has been applied to obstacle recognition and tracking in many studies. Foggia et al. (2005) presented a real-time system for moving object and obstacle detection (MOOD) based on stereo vision. In this system, moving objects and obstacles were segmented out based on the optical flow method; in addition, by applying a blob algorithm, moving objects were detected from the disparity image. Pantilie et al. (2010) fused dense stereo vision and dense optical flow in a depth-adaptive occupancy grid framework to accurately detect moving obstacles; the proposed algorithm was then applied to obstacle detection in an intersection assistance system. Once obstacles are detected and recognized, the motion state of moving obstacles can be further estimated; in other words, obstacle tracking can be achieved. Many approaches to obstacle tracking have been proposed recently. Yilmaz et al. (2006) reviewed tracking methods and classified them into three general categories: point, silhouette, and kernel tracking. The Kalman filter is a tracking algorithm that uses position and velocity measurements of objects to filter out noise and estimate object positions.

In our previous study, we developed the hardware and software of a real-time system for obstacle detection and recognition based on stereo vision (Lin et al., 2012). In this paper, we focus on the motion state estimation of detected obstacles. The system is capable of recognizing and tracking the obstacles ahead of agricultural vehicles. With this information, collisions between obstacles and agricultural vehicles can be avoided, and the safety of driving agricultural vehicles can be improved. The stereo vision system we designed is introduced in the “System Configuration” section. The “Obstacle Detection and Recognition Methods” section details how to estimate 3D information and detect obstacles via the stereo vision scheme. In addition, the obstacle recognition and motion estimation methods, including the obstacle matching method and the use of the Kalman filter, are


presented. The performance of both obstacle recognition and motion estimation is presented in the “Experimental Results and Discussion” section.

System Configuration

The stereo vision system consists of four parts: two cameras, a video acquisition device, a base mechanism, and lens fixer mechanisms. The focal length of the two cameras is 16 mm, and the horizontal field of view is around 19.8°. The video acquisition device, manufactured by VideoHome®, captures frames from the cameras into a computer. The frame rate is 30 frames per second, and the frame resolution is 640 x 480 pixels. The acquisition device uses a USB interface, which provides the convenience and flexibility of using a portable computer (i.e., a notebook computer) in outdoor environments. Figure 1 shows the whole stereo vision system, where the two cameras are mounted on a precision-manufactured mechanism. The base mechanism keeps the two cameras on the same horizontal axis. According to stereo vision theory, keeping both cameras on the same horizontal axis is a vital condition that increases the performance of stereo vision and reduces computational cost.

In addition, the distance between the cameras is another important factor in estimating the 3D information of each pixel in the image plane. The lens fixers are employed to avoid any shift between the lens and the CCD. During camera calibration, the relationship between the lens and the CCD is estimated; if their relative position changes, the accuracy of the estimated 3D information will suffer. To keep the system stable and simplify the calibration procedure, the lens fixers are designed to fix the relative position of the lens and the CCD. This mechanism benefits the stability of a system mounted on vehicles, where vibration is unavoidable. With the mechanism in place, camera calibration only needs to be conducted once, since the lenses are fixed; the parameters estimated by the calibration procedure then remain valid for later experiments.

Figure 1. The flexible binocular stereo vision system

In this approach, the stereo vision scheme is applied to detect the obstacles in front of the vehicle. Obstacles are defined as objects that are vertical with respect to the ground. Based on this definition and the 3D information estimated by the stereo vision scheme, obstacles can be detected and their images segmented out for later use. Applying stereo vision to detect obstacles not only localizes the obstacles but also captures their image information, from which the type of obstacle can be recognized by extracting features and applying machine learning methods. Beyond obstacle detection, an obstacle's velocity can also be measured from the estimated 3D information. Using pattern matching methods, the same obstacle can be identified in different image frames. The Kalman filter, a widely used and well-known tracking method, is then applied to estimate the motion status of obstacles. The obstacles' movement information is valuable for alerting drivers to the environment ahead. The whole system flowchart is illustrated in Figure 2.


Figure 2. The obstacle recognition and tracking flowchart

The Obstacle Detection and Recognition Methods

The stereo vision scheme consists of epipolar geometry and a corresponding matching method. Epipolar geometry can estimate the 3D information of pixels in the left image plane when the pixels' disparity values are given. The disparity value represents the relationship between the projections of a real-world point onto the left and right image planes. To determine this relationship and the disparity values, a corresponding matching method is applied. In the remainder of this section, the epipolar geometry and the corresponding matching method are explained.

Epipolar Geometry

Epipolar geometry is a basic and important geometry used in the stereo vision scheme. Both cameras are assumed to have parallel optical axes and to lie on the same horizontal line; the distance between them is called the baseline B. Using equations (1) and (2), the 3D position of a point P(X, Y, Z) can be estimated. The parameters Δx1 and Δx2 denote the horizontal offsets of the epipolar points from the center of the image plane, where the epipolar points are the projections of P(X, Y, Z) onto the left and right image planes. In these equations, f is the focal length of the camera, and (X, Y, Z) represents the three-dimensional coordinates of the object in the real world. (x, y) and (x_o, y_o) are the point of interest and the center point of the left image plane, respectively. Δx1 − Δx2 is the disparity value.

X / Z = (x − x_o) / f,   Y / Z = (y − y_o) / f   (1)

Z = B · f / (Δx1 − Δx2)   (2)

Corresponding Matching Method for Stereo Vision

The corresponding matching method is a vital part of the stereo vision scheme. A point in the real world is projected onto the left and right image planes, and the relationship between its two projections, which is the disparity value, has to be determined by the corresponding matching method. However, finding this correspondence for every pixel is computationally expensive, since the search area could be the whole image. To narrow the search area, the left and right images are assumed to be aligned on the same horizontal axis. In this study, OpenCV, an open-source computer vision library, was used; its block matching (BM) function, a type of corresponding matching method, provides efficient performance. The core of this function is the Sum of Absolute Differences (SAD) given in equation (3), where I_L(x_i) and I_R(x_i + u) denote the intensities of pixels in the left and right images, and u represents the displacement:

E_SAD(u) = Σ_i | I_L(x_i) − I_R(x_i + u) |   (3)
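The SAD criterion of equation (3) can be illustrated with a toy one-dimensional matcher on a rectified scanline pair. This is only a sketch of the block matching idea; the actual system uses the optimized OpenCV BM implementation mentioned above, and the synthetic scanlines are made-up data.

```python
import numpy as np

# Toy block matcher: for a block centered at x in the left scanline, slide
# over the right scanline and pick the displacement u minimizing the SAD
# cost of equation (3).

def sad_disparity(left, right, x, block=5, max_disp=16):
    """Disparity of the block centered at x on a rectified scanline pair."""
    half = block // 2
    patch_l = left[x - half:x + half + 1].astype(np.int64)
    best_u, best_cost = 0, np.inf
    for u in range(max_disp + 1):                    # candidate displacements
        if x - u - half < 0:
            break
        patch_r = right[x - u - half:x - u + half + 1].astype(np.int64)
        cost = np.abs(patch_l - patch_r).sum()       # E_SAD(u), equation (3)
        if cost < best_cost:
            best_cost, best_u = cost, u
    return best_u

# Synthetic scanlines: the right line is the left one shifted by 4 pixels,
# as for an object 4 pixels of disparity away.
left = np.zeros(64); left[30:35] = 255
right = np.roll(left, -4)
print(sad_disparity(left, right, x=32))  # recovered disparity: 4
```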

Top-View Plot Projection

Obstacles are assumed to be vertical to the floor plane. Based on this assumption, every obstacle must occupy an area on the floor plane (the X–Z plane). The X direction is lateral and the Y direction is vertical with respect to the floor plane; the Z direction indicates depth. When a disparity image of the left image is obtained, the 3D information of the pixels that satisfy the predefined constraints is projected onto the top-view plot proposed by Pocol et al. (2008). The top-view plot is essentially a two-dimensional histogram in which each grid cell is a trapezoid; the width and height of each cell represent a physical width and depth in the real world. Figure 3 shows a typical top-view plot produced by this method; the shaded grid cells represent the occupied area of obstacles in the X–Z plane.


Figure 3. A typical top-view plot (occupied grid) for obstacle detection

The width of each grid cell in the top-view plot is modeled with a linear relationship; the height of each cell, H_{r,c}, however, is modeled with a logarithmic relationship given by equation (6). H_{r,c} denotes the height of the cell at (c, r). The minimum and maximum depths of the detection area are Z_min and Z_max. The parameter K is user-defined and was set to 0.05 in this study.

Z_1 = Z_min,   Z_max = Z_{n_R} = Z_min · (1 + K)^{n_R}   (4)

n_R = log_{(1+K)} (Z_max / Z_min)   (5)

H_{r,c} = Z_n − Z_{n−1}   (6)
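Equations (4) through (6) describe depth boundaries that grow geometrically by the factor (1 + K), so near rows of the top-view grid are fine and far rows are coarse. A minimal sketch, with K = 0.05 as in the text but an assumed depth range:

```python
import math

# Sketch of equations (4)-(6): row boundaries and row heights of the
# non-linear top-view grid. The depth range below is illustrative.

def topview_rows(z_min, z_max, K=0.05):
    n_R = math.ceil(math.log(z_max / z_min, 1 + K))      # equation (5)
    Z = [z_min * (1 + K) ** n for n in range(n_R + 1)]   # equation (4)
    H = [Z[n] - Z[n - 1] for n in range(1, n_R + 1)]     # equation (6)
    return Z, H

Z, H = topview_rows(z_min=1.0, z_max=30.0)
# The nearest row is only K * z_min deep; the farthest row is much deeper.
print(len(H), round(H[0], 3), round(H[-1], 3))
```

This geometric spacing compensates for the fact that depth resolution from disparity also degrades with distance.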

Obstacle Detection

In the top-view plot, a grid cell is ignored if the number of pixels projected into it from the disparity image is less than a predefined threshold. Obstacles can then be segmented from the top-view plot by applying the blob method. The 3D information of each obstacle candidate must also satisfy two constraints. The first constraint filters out obstacles that are too small, meaning those containing fewer than 300 pixels; such small blobs may appear due to corresponding matching errors. The second constraint keeps only obstacles whose height is greater than a predefined threshold (30 cm). After the obstacles in the top-view plot are detected, obstacles with similar 3D information are merged into one larger obstacle, since a lack of texture information may cause one obstacle to be detected as several small ones. This merge method is applied as a post-processing step of obstacle detection to overcome this drawback.


Feature Extraction and Obstacle Recognition

Apart from detecting the obstacles ahead of the vehicle, knowing the type of each obstacle is another important application. In this approach, four types of obstacles are recognized: human (pedestrian), large agricultural vehicle (tractor), small agricultural vehicle (cultivator), and unknown obstacle, as shown in Figure 4. To achieve obstacle recognition, a hierarchical decision tree is applied, as illustrated in Figure 5. In this decision tree, obstacles are first separated into two types according to whether the obstacle is human or not. To recognize an obstacle as human, a typical and widely used feature, the histogram of oriented gradients (HOG) proposed by Dalal and Triggs (2005), is applied. The HOG feature represents the obstacle with the magnitude and orientation of the gradients of the obstacle image. In this research, the HOG feature is used to build a recognition model via a support vector machine (SVM) that distinguishes humans from other obstacles, including agricultural vehicles and unknown obstacles.
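The HOG computation behind the human/non-human decision can be sketched compactly: gradient magnitude and orientation are accumulated into per-cell orientation histograms and concatenated. The cell size, bin count, and normalization below are generic illustrative choices, not the paper's exact parameters, and the trained SVM is not reproduced.

```python
import numpy as np

# Minimal HOG sketch: centered gradients, unsigned orientations, 9-bin
# magnitude-weighted histograms per non-overlapping cell, L2-normalized.

def hog_feature(img, cell=8, bins=9):
    img = img.astype(np.float64)
    gx = np.zeros_like(img); gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]       # centered horizontal gradient
    gy[1:-1, :] = img[2:, :] - img[:-2, :]       # centered vertical gradient
    mag = np.hypot(gx, gy)                       # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180   # unsigned orientation in [0, 180)
    h, w = img.shape
    feats = []
    for r in range(0, h - cell + 1, cell):
        for c in range(0, w - cell + 1, cell):
            m = mag[r:r + cell, c:c + cell].ravel()
            a = ang[r:r + cell, c:c + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
            feats.append(hist)
    v = np.concatenate(feats)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v                 # L2-normalized descriptor

feat = hog_feature(np.random.default_rng(0).integers(0, 256, (64, 32)))
print(feat.shape)  # (64/8) * (32/8) cells * 9 bins = 288 features
```

The resulting vector would then be projected by LDA and fed to the SVM, as described in the text.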

To further distinguish large agricultural vehicles, small agricultural vehicles, and unknown obstacles, geometrical features are useful. The estimated distance and the aspect ratio of the obstacle image are combined into a feature vector. The feature vectors of the agricultural vehicle data are then processed by LDA, which reduces their dimensionality; the resulting distributions of points in LDA space become more separable. In the LDA space, the mean and standard deviation of each of the two classes are estimated: L_c and S_c denote the means of the large and small agricultural vehicle classes, and Std_L and Std_S their standard deviations. A newly projected point O_p can then be classified as a large vehicle, a small vehicle, or an unknown obstacle by equation (7).

The vehicle is:
    large vehicle,     if |L_c − O_p| ≤ 3 · Std_L
    small vehicle,     if |S_c − O_p| ≤ 3 · Std_S     (7)
    unknown obstacle,  otherwise
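The decision rule of equation (7) is a direct three-sigma test in the 1-D LDA space. A sketch, with made-up class statistics for illustration:

```python
# Equation (7): a projected point O_p is assigned to a class if it lies
# within three standard deviations of that class mean in LDA space.
# The class means and standard deviations below are hypothetical.

def classify_vehicle(O_p, L_c, Std_L, S_c, Std_S):
    if abs(L_c - O_p) <= 3 * Std_L:
        return "large vehicle"
    if abs(S_c - O_p) <= 3 * Std_S:
        return "small vehicle"
    return "unknown obstacle"

# Hypothetical class statistics in the 1-D LDA space
print(classify_vehicle(O_p=4.1, L_c=4.0, Std_L=0.2, S_c=-3.0, Std_S=0.3))  # large vehicle
print(classify_vehicle(O_p=9.0, L_c=4.0, Std_L=0.2, S_c=-3.0, Std_S=0.3))  # unknown obstacle
```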

Figure 4. The defined obstacles: (a) human, (b) tractor, (c) cultivator, (d) unknown obstacle


Figure 5. The strategy of recognizing obstacles: HOG + LDA first separates humans from other obstacles; geometrical features + LDA then classify the remaining obstacles as large vehicles (tractors), small vehicles (cultivators), or unknown obstacles

The Obstacle Tracking Method

Once an obstacle is detected by the method described above, the state of its motion is important and helpful for surveillance and collision avoidance. To achieve these tasks, a tracking method is proposed. The purpose of obstacle tracking is to continuously determine the position of the obstacle in the field of view, so the tracking method must work reliably in dynamic scenes. By the obstacle detection method, obstacles are represented as simple geometric shapes, and the tracking method, called kernel tracking, is based on these shapes; the kernel refers to the shape and appearance of the obstacle. Kernel tracking is typically employed because of its relatively low computational cost. In this study, the tracking method includes two steps: feature extraction and feature matching. From the stereo vision system, the average depth of each obstacle is estimated, so both distance and image information are available for tracking. Histograms are often used in image processing to represent features, and some computer vision approaches require multiple histogram comparisons over rectangular patches; a histogram contains much of the information in an image. The Bhattacharyya distance is a method for measuring the similarity of two discrete probability distributions. In the tracking scheme, color and distance histograms of each rectangular patch are established, and the Bhattacharyya distance is calculated as the criterion for matching an obstacle across frames. However, as an obstacle's velocity changes, the average Bhattacharyya distance may also change and cause tracking failure. An adaptive Bhattacharyya method is proposed to overcome this drawback. Figure 6 shows the flowchart of this approach.


Figure 6. The procedure of the proposed tracking approach: after image acquisition and obstacle detection, color and depth histograms are built and their Bhattacharyya distances computed and fused; features are matched when the fused Bhattacharyya distance is below the threshold (0.3), the velocity is estimated, and the threshold is updated

In equation (8), the two compared color histograms H_1 and H_2, each with N (N = 256) bins, represent the features of an obstacle in two different frames (t = 0 and t = 1). d(H_1, H_2) is the Bhattacharyya distance, which indicates how well the two histograms match; H̄_1 and H̄_2 denote the mean bin values of the histograms.

d_color(H_1, H_2) = sqrt( 1 − (1 / sqrt(H̄_1 · H̄_2 · N²)) · Σ_I sqrt(H_1(I) · H_2(I)) )   (8)

The representations of the obstacles mentioned above are rectangular patches, so some redundant background information is also included in the color histograms. Here, the background information of the obstacle image is treated as noise, and the background is ignored when comparing the histograms; the range of I in the histogram is [1, 255]. If the two compared images are highly similar, the Bhattacharyya distance will be close to zero. In the same way, the depth histogram can also be used to calculate a Bhattacharyya distance.

H_3 and H_4 are the depth histograms of the two patches, with the 30 m detection range divided into 256 bins (N = 256); the formula is given in equation (9).

d_distance(H_3, H_4) = sqrt( 1 − (1 / sqrt(H̄_3 · H̄_4 · N²)) · Σ_D sqrt(H_3(D) · H_4(D)) )   (9)


Either equation (8) or (9) can be used for feature matching; however, using only one of them does not achieve good performance. Many studies have proposed approaches based on data fusion, which has been widely applied in many fields. Generally, feature fusion strategies can be divided into two types, serial and parallel, and there is no clear answer as to which is best. To combine the two feature sets here, the serial feature fusion strategy is applied. Compared with parallel feature fusion, serial feature fusion may increase the dimension; however, the serial method provides more reasonable weightings to the two features and does not take much computation time. The feature fusion is defined by

d_match = sqrt( d_color² + d_distance² )   (10)

Figure 7. One example of cross matching between frames t = 0 and t = 1

Each obstacle has several candidate matching values corresponding to the other frame. Due to the properties of the Bhattacharyya distance, the minimum value may fall below the predefined threshold (0.3); in that case, the relationship between the obstacle in the two frames is established and the obstacle is tracked in the current frame. Cross matching, a two-way matching method, is applied in this study. An outstanding advantage of this matching approach is that it can cope with different numbers of obstacles in consecutive frames, as in the situation illustrated in Figure 7.
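The two-way cross matching idea can be sketched as a mutual-best-match test on a cost matrix of fused distances. The cost values below are a made-up example; the 0.3 threshold follows the text.

```python
import numpy as np

# Cross matching sketch: previous-frame obstacle i and current-frame
# obstacle j are paired only if each is the other's minimum fused
# Bhattacharyya distance and that distance is below the threshold.
# This naturally handles differing obstacle counts between frames.

def cross_match(cost, threshold=0.3):
    """cost[i, j]: fused distance between previous obstacle i and current j."""
    matches = []
    for i in range(cost.shape[0]):
        j = int(np.argmin(cost[i]))                       # best current match for i
        if cost[i, j] < threshold and int(np.argmin(cost[:, j])) == i:
            matches.append((i, j))                        # mutual best pair
    return matches

# Two obstacles in the previous frame, three in the current frame
cost = np.array([[0.10, 0.50, 0.45],
                 [0.40, 0.60, 0.20]])
print(cross_match(cost))  # [(0, 0), (1, 2)]
```

The unmatched current-frame obstacle would be treated as newly appeared.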

Empirical results showed that the obstacle's velocity has a great impact on the proposed tracking strategy. Statistics showed that the Bhattacharyya distance and the obstacle's velocity are linearly related. In addition, the velocity estimated by the stereo vision system deviates from the true velocity. Therefore, a calibration equation was developed, shown in (11). The actual velocity magnitude of each obstacle is defined as

v_{ob} = 0.3323 \times \frac{\Delta x}{\Delta t} - 0.4254    (11)

Δx is the displacement on the X-Z plane computed from the stereo triangulation equations over the time interval Δt, where Δt is the reciprocal of the image capture rate of 15 frames per second. Once the velocity is estimated, the matching threshold is updated by

threshold_{update} = 0.0165 \times v_{ob} + 0.1601    (12)
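Equations (11) and (12) are simple affine formulas; a minimal sketch (the function names are ours):

```python
def obstacle_velocity(dx, dt=1.0 / 15.0):
    """Calibrated obstacle speed from stereo displacement, equation (11).

    dx: displacement on the X-Z plane between consecutive frames (m).
    dt: frame interval, the reciprocal of the 15 fps capture rate.
    """
    return 0.3323 * (dx / dt) - 0.4254

def updated_threshold(v_ob):
    """Velocity-dependent matching threshold, equation (12)."""
    return 0.0165 * v_ob + 0.1601
```

Faster obstacles thus receive a looser matching threshold, compensating for the larger appearance change between frames.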

Motion Estimation

Motion estimation is useful for understanding the dynamic environment in front of the vehicle. From the previous steps, information about the obstacles, such as their locations, velocities and classes, is obtained, so the traveling state can be calculated. In this study, the traveling state is classified into two types: approaching and moving away. In actual operation, the real location is computed from the current position and the positions in the last four frames; the mean of these five positions is taken as the current real location. This data preprocessing is equivalent to the 5-point smoothing of equation (13). As shown in Figure 8, a downward arrow denotes an approaching obstacle and an upward arrow one moving away, while the length of the arrow represents the velocity magnitude.

f'(x_L, z_L) = \frac{1}{5}\sum_{i=0}^{4} f(x_{L-i}, z_{L-i})    (13)
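The smoothing in equation (13) amounts to averaging the current position with the previous four. A minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def smooth_position(positions):
    """5-point smoothing of obstacle positions, per equation (13).

    positions: sequence of (x, z) tuples, oldest first.  The smoothed
    location for the newest frame is the mean of the current position
    and the four preceding ones.
    """
    recent = np.asarray(positions[-5:], dtype=float)
    return recent.mean(axis=0)
```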

Figure 8. Two types of traveling state: (a) approaching, (b) moving away

In 1960, Kalman proposed a recursive solution to the discrete-data linear filtering problem, and extensive research and applications have followed, particularly in autonomous navigation, which requires prediction and position correction. The Kalman filter is an algorithm that optimally uses imprecise data in a noisy linear system to continuously update the best estimate of the current state. In this study, the motion of each obstacle is assumed to be linear; the Kalman filter is applied to estimate the position of the obstacle and then predict its trajectory. The main equations of the Kalman filter fall into two groups: the time update equations and the measurement update equations. The former act as the predictor and the latter as the corrector. The specific equations for the time and measurement updates are described below.

x(k|k-1) = A\,x(k-1|k-1) + B\,U(k)    (14)

P(k|k-1) = A\,P(k-1|k-1)\,A' + Q    (15)

The two equations above project the state estimate x(k|k-1) and the covariance estimate P(k|k-1) from time step k-1 to k. Q is the process noise covariance matrix, and A and B are the matrices relating the state and the control input at each time step. These are called the time update equations.

x(k|k) = x(k|k-1) + K_g(k)\,[z(k) - H\,x(k|k-1)]    (16)

The difference z(k) - H\,x(k|k-1) in equation (16) is called the measurement residual. The Kalman gain matrix K_g is determined by

K_g(k) = P(k|k-1)\,H'\,[H\,P(k|k-1)\,H' + R]^{-1}    (17)

P(k|k) = [I - K_g(k)\,H]\,P(k|k-1)    (18)

H is the matrix relating the state to the measurement. Equations (16) to (18) are called the measurement update equations.
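Equations (14) through (18) can be sketched as a small predict/correct loop. The following is an illustrative NumPy implementation; the class name and matrix choices are our assumptions, not the paper's code:

```python
import numpy as np

class Kalman:
    """Minimal Kalman filter implementing equations (14)-(18)."""

    def __init__(self, A, B, H, Q, R, x0, P0):
        self.A, self.B, self.H, self.Q, self.R = A, B, H, Q, R
        self.x, self.P = x0, P0  # initial state estimate and covariance

    def predict(self, u):
        # Time update, equations (14) and (15).
        self.x = self.A @ self.x + self.B @ u
        self.P = self.A @ self.P @ self.A.T + self.Q
        return self.x

    def correct(self, z):
        # Measurement update, equations (16)-(18).
        S = self.H @ self.P @ self.H.T + self.R
        Kg = self.P @ self.H.T @ np.linalg.inv(S)      # equation (17)
        self.x = self.x + Kg @ (z - self.H @ self.x)   # equation (16)
        self.P = (np.eye(self.P.shape[0]) - Kg @ self.H) @ self.P  # (18)
        return self.x
```

For a constant-velocity obstacle on the X-Z plane, the state could be [x, z, vx, vz], with A encoding x ← x + vx·Δt; the velocity from equation (11) can then seed x0, which is the fast-convergence initialization discussed next.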


Under general conditions, Kalman filters are usually initialized randomly in the absence of covariance data. This paper presents a method to initialize the Kalman filter using the reliable velocity estimation approach described above. Once the velocity is obtained, the position at time step k can also be estimated. A notable advantage of this approach is fast convergence, because the error of the initial guess is reduced.

Experimental Results and Discussion

Experiments were performed to evaluate our methods. In the experiments, the baseline, the distance between the two cameras, is 15 cm. Before calculating the disparity image, camera calibration has to be performed first; the calibration procedure produces the intrinsic and extrinsic matrices of the cameras to correct lens distortion. In our system, the estimated distance error is around 1.8% on average over a detection range of 2.5 to 20 m.

As described previously, obstacles are detected using the top-view plot and the blob method. The resolution of the top-view plot is 100 x 125 pixels; compared with processing the entire left image (640 x 480), this reduces computation time, and the obstacle detection speed of the system exceeds 10 fps. For the obstacle detection experiments, three obstacle types were considered: humans, large agricultural vehicles, and small agricultural vehicles. The resulting detection accuracy is shown in Table 1.

Table 1. Accuracy of obstacle detection

                                  Detected obstacle
                                  Negative    Positive
Actual obstacle     Negative      N/A         3
                    Positive      6           989

Average detection rate: 99.1%

For obstacle recognition, two models were built and cascaded to classify the four kinds of obstacles. The first training set contains 194 human images and 274 non-human images; the non-human images include large agricultural vehicles, small agricultural vehicles and unknown obstacles. We trained this new model on a training set whose obstacle images were collected with the proposed obstacle detection method. For agricultural vehicle and unknown obstacle recognition, the second training set consists of 262 large agricultural vehicle images, 250 small agricultural vehicle images and 52 unknown-obstacle images. The recognition speed is faster than 20 fps per obstacle on average. The accuracies of obstacle recognition are listed in Tables 2 and 3; both performances exceed 85%.
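The cascade described above, a human/non-human model followed by a vehicle/unknown model, can be sketched as follows. The classifier objects here stand in for the trained SVM models; the names and label strings are ours, not the paper's:

```python
def classify_obstacle(features, human_model, vehicle_model):
    """Cascade the two recognition models.

    human_model(features)   -> "human" or "non-human"
    vehicle_model(features) -> "large vehicle", "small vehicle" or "unknown"
    Only non-human obstacles are passed on to the second model.
    """
    if human_model(features) == "human":
        return "human"
    return vehicle_model(features)
```

Cascading keeps each model a simpler problem: the first is a binary human detector, and the second only ever sees non-human blobs.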

Table 2. Accuracy of recognizing human and non-human obstacles

                                        Recognized obstacle
                                  Human    Unknown obstacle    Total    Recognition rate
Actual      Human                 121      16                  137      88.3%
obstacle    Unknown obstacle      10       533                 543      98.2%

Average recognition rate: 96.6%


Table 3. Accuracy of recognizing vehicles and unknown obstacles

                                             Recognized obstacle
                               Large Ag.   Small Ag.   Unknown              Recognition
                               Vehicle     Vehicle     obstacle    Total    rate
Actual    Large Ag. Vehicle    128         13          0           141      90.8%
obstacle  Small Ag. Vehicle    12          106         0           118      89.8%
          Unknown obstacle     6           5           10          21       47.6%

Average recognition rate: 87.1%

Figure 9 shows the empirical results for different obstacles in the NTU farm. In Figure 9(a), the black margin area results from the calibration procedure. The disparity image was calculated and rendered in pseudo-color: red, green and blue represent obstacles at near, middle and far distances, respectively. The distance information is also given in the top view. In addition, the green and blue rectangles denote a human and a tractor, respectively. The results of obstacle recognition are shown in Figure 9(d).

For the tracking experiments, three types of obstacles were tracked in Exp 1 to Exp 3, shown in Figure 10(a). In the experiments, a man and two agricultural vehicles were moving in the field, and a sequence of images was recorded by the stereo vision system. In the results, the two colors represent different tracked targets; if the tracking strategy succeeds, the same obstacle keeps the same rectangle color across frames. The estimated velocities agreed with the measured actual velocities with an R² of 0.9703. The tracking algorithm ran at about 12 frames per second on average.

Figure 9. The result of obstacle detection and recognition with human and unknown obstacle. (a) Calibrated left image. (b) Disparity image with pseudo-color. (c) Top-view plot. (d) Obstacle detection and recognition.


Figure 10. Three examples of obstacle tracking and corresponding velocity estimation results. (a) Three experiments (Exp 1 to Exp 3), one per row; the images are successful tracking results from the video sequences. (b)-(d) Actual velocity estimation over time for the three experiments, respectively.


For motion estimation, the information offered to drivers includes position predictions and traveling states. As described above, the Kalman filter can be used to estimate traveling states; in practical applications, however, a better initial guess should be given for fast convergence. With a reliable velocity estimate, the location of an obstacle in the next frame can be calculated beforehand, which yields a high-precision prediction. Thus, in Figure 10, two different initial guesses are compared, and the result shows the trend of the average error. With the worse guess, the prediction error grew to about 200 cm, and the obstacle disappeared before the system converged, which is not the expected behavior. On the other hand, applying the velocity estimate in the estimator achieved a significant error reduction: based on the statistical data, the position prediction error was less than 50 cm.

Figure 10. Comparison of two different initial guesses of the Kalman filter

Figure 11. Comparison of predicted trajectories with real tracks. (a) Top-view trajectories. (b) Path of the obstacle on the map and the actual small agricultural vehicle image.


In Figure 11, the actual and predicted tracks were recorded simultaneously. In this experiment, an agricultural vehicle moved about 10 m in the NTU farm, as shown in Figure 11(b). As expected, the vehicle was tracked successfully, and with the improved initial guess for position prediction, the predicted trajectory was extremely close to the actual track.

In Figure 12, the traveling states (motion states) are divided into two types, "Approach" and "Away", with different colors corresponding to different obstacles. A downward arrow indicates that the obstacle is getting closer, while an upward arrow denotes moving away. The state of each obstacle is projected and displayed at the top right of the left image. In the top-right image of Figure 12, two obstacles were tracked and their motion states shown: both arrows point down, and their relative positions are shown on the black bar.

Figure 12. Experiments of obstacle motion state

Conclusions

A stereo vision system for obstacle recognition and tracking was developed and evaluated with designed experiments. The experimental results demonstrate the system's functionality, durability and efficiency in farms. For obstacle recognition, three types of obstacles were examined: large agricultural vehicles, small agricultural vehicles and humans. Integrating HOG and LDA, a recognition model was built via SVM to classify whether an obstacle is a human; the performance of recognizing humans is over 90%. In order to classify agricultural vehicles, the geometrical feature combined with LDA is proposed as a new feature, and the experimental results indicate a performance over 85%. In obstacle tracking, the Bhattacharyya distance was employed to register obstacles across frames, and the 3D information and inter-frame relationships of the obstacles were applied to estimate the motion model. The results show that both the tracking and the recorded trajectories worked well. For future work, combining various information, including color images and motion states, is a possible way to develop collision avoidance mechanisms for different types of obstacles. With such avoidance mechanisms, the motion estimation would make driving agricultural vehicles safer.

References

Billingsley, J. 2005. Machine vision applications in agriculture. The 9th International Conference on Mechatronics Technology. Kuala Lumpur, 433-435.

Dalal, N., and B. Triggs. 2005. Histograms of oriented gradients for human detection. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 20-26.

Foggia, P., A. Limongiello, and M. Vento. 2005. A real-time stereo-vision system for moving object and obstacle detection in AGV and AMR applications. Proceedings of the 7th International Workshop on Computer Architecture for Machine Perception. 58-63.

Ollis, M., and A. Stentz. 1996. First results in vision-based crop line tracking. IEEE International Conference on Robotics and Automation. 951-956.

Pantilie, C. D., S. Bota, I. Haller, and S. Nedevschi. 2010. Real-time obstacle detection using dense stereo vision and dense optical flow. IEEE International Conference on Computer Communication and Processing. 191-196.

Pocol, C., S. Nedevschi, and M. M. Meinecke. 2008. Obstacle detection based on dense stereo vision for urban ACC systems. Proceedings of the 5th International Workshop on Intelligent Transportation. 13-18.

Lin, T. T., A. C. Tsai, K. C. Chuang, Y. C. Chen, and Y. S. Chen. 2012. A real-time stereo vision system for obstacle detection and recognition. ASABE Paper No. 121340958. St. Joseph, Mich.: ASABE.

Yilmaz, A., O. Javed, and M. Shah. 2006. Object tracking: A survey. ACM Computing Surveys 38(4):1-45.