
Proceedings of the 2003 IEEE International Conference on Robotics & Automation, Taipei, Taiwan, September 14-19, 2003

Robust Model-based 3D Object Recognition by Combining Feature Matching with Tracking

Sungho Kim, Inso Kweon

Dept. of Electrical Engineering & Computer Science, Korea Advanced Institute of Science and Technology, 373-1 Guseong-dong, Yuseong-gu, Daejeon, Korea
[email protected], [email protected]

Abstract - We propose a vision-based 3D object recognition and tracking system which provides high-level scene descriptions such as object identification and 3D pose information. The system is composed of an object recognition part and a real-time tracking part. In object recognition, we propose a feature which is robust to scale, rotation, illumination change and background clutter. A probabilistic voting scheme maximizes the conditional probability defined by the features in correspondence to recognize an object of interest. As a result of object recognition, we obtain the homography between the model image and the input scene. In tracking, a Lie group formalism is used to cast the motion computation problem into simple geometric terms so that tracking becomes a simple optimization problem. An initial object pose is estimated using correspondences between the model image and the 3D CAD model, which are predefined, and the homography which relates the model image to the input scene. Results from the experiments show the robustness of the proposed system.


I. INTRODUCTION

One of the most important tasks for an intelligent service robot is to identify objects of interest and to track them in indoor environments. If a mobile robot is commanded to bring a certain object, it has to recognize what objects are in the scene in advance, then track the recognized object for further actions such as grasping and manipulation.

There have been many methods to solve object recognition problems and tracking problems separately, but seldom to solve both problems simultaneously. Object recognition is defined as the process of extracting information such as name, size, position, pose, and functions related to the object. In this paper, we restrict the definition to extracting object identification and pose information only. Recently, there has been much development in object recognition, but it still remains at the level of recognizing objects in a well-controlled environment. This stems from object variations such as view angle changes and illumination changes. Furthermore, it becomes a difficult task when an object is occluded by others or placed in a cluttered environment [1-2]. To solve these problems, various approaches have been proposed using invariants, 3D CAD models and appearance [3-5]. Recently, reflecting the characteristics of human visual object recognition, methods based on local image features

Incheol Kim
Agency for Defense Development
Yuseong P.O. Box 35, Daejon 305-152, Korea

    [email protected]


such as local differential invariants, SIFT (scale invariant feature transform) and eigen windows have been suggested [6-9]. But these approaches have shown only limited success on some problems like illumination change. We propose a unified framework of object recognition and tracking for a service robot working in indoor environments. To handle the object variations, we propose modified local Zernike moments robust to various environmental changes such as illumination changes and pose changes. The recognized object can be tracked using Lie group methodologies to simplify the representation and computation of motion. The initial 3D CAD model alignment is performed using the homography computed in the process of object recognition.

II. RECOGNITION AND TRACKING

Figure 1 shows the proposed system of 3D object recognition by combining feature matching with tracking. The system consists of three components: model DB generation, object recognition and tracking. In model generation, four points of an image model are pre-selected corresponding to the 3D CAD model points. By performing object recognition using local Zernike moments, we can obtain a homography between the image model and the input scene. By transforming the image model points using this homography, we can establish correspondences between the input scene feature points and the 3D CAD feature points. Using these correspondences, we can estimate the initial pose of the recognized object. Then we perform object tracking using the Lie group SE(3) in real time.

Fig. 1. Proposed recognition and tracking system

0-7803-7736-2/03/$17.00 © 2003 IEEE


III. GENERAL OBJECT RECOGNITION

We propose a new object recognition method which is robust to scale, planar rotation, illumination change, occlusion and background clutter.

A. Proposed object recognition system

The object recognition system is composed of a robust feature extraction part and a feature matching part. Figure 2 shows the overall system of the object recognition part.

The basis functions are V_nm(x, y) = R_nm(ρ) e^(jmθ), with the conditions that (n − |m|) is even and |m| ≤ n. As the Zernike moments are calculated using the radial polynomials shown in figure 3, they have an inherent rotation-invariance property. In particular, Zernike moments have superior properties in terms of image representation, information redundancy and noise characteristics [14].

Fig. 2. The proposed object recognition system

In the off-line process, Zernike moments are calculated around interest points detected from the scale space of the image model and are stored in a database. We can recognize an object by probabilistic voting of these Zernike moments in the on-line process. The locality of the Zernike moments provides some robustness to occlusion and background clutter. We verify the recognition by aligning the model features to the input scene. In this process, the homography between the image model and the input scene is calculated. We determine the success of the recognition by the percentage of outliers. An outlier is determined from the distance between the scene feature position and the model feature position transformed by the homography.

B. Robust local feature: Zernike moments

The proposed object recognition method is based on the fact that the human visual system concentrates on a certain object during recognition [10]. Interest points in the image model should be placed at the same positions in the input scene; this is called the repeatability of interest points. We use the Harris corner detector, which has superior repeatability [11-12]. We use Zernike moments to represent the local characteristics of an image segment. Zernike moments are defined over a set of complex polynomials which form a complete orthogonal set over the unit disk x² + y² ≤ 1 [13]. Zernike moments are the projections of the image intensity f(x, y) onto the orthogonal basis functions V_nm(x, y).
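As an illustrative sketch (not the authors' implementation), the projection above can be computed for a square grayscale patch mapped onto the unit disk, using the standard Zernike radial polynomial:

```python
import numpy as np
from math import factorial

def radial_poly(n, m, rho):
    """Standard Zernike radial polynomial R_nm(rho)."""
    m = abs(m)
    R = np.zeros_like(rho)
    for s in range((n - m) // 2 + 1):
        c = ((-1) ** s * factorial(n - s)
             / (factorial(s)
                * factorial((n + m) // 2 - s)
                * factorial((n - m) // 2 - s)))
        R = R + c * rho ** (n - 2 * s)
    return R

def zernike_moment(patch, n, m):
    """Discrete projection of patch intensities onto V_nm over the unit disk:
    Z_nm ~= (n+1) * mean over the disk of f(x,y) * conj(V_nm(x,y))."""
    h, w = patch.shape
    y, x = np.mgrid[-1:1:h * 1j, -1:1:w * 1j]   # map the patch onto [-1,1]^2
    rho, theta = np.hypot(x, y), np.arctan2(y, x)
    disk = rho <= 1.0                           # keep only the unit disk
    V = radial_poly(n, m, rho) * np.exp(1j * m * theta)
    return (n + 1) * np.mean(patch[disk] * np.conj(V[disk]))
```

For a constant patch, Z_00 equals the average intensity and the higher-order odd moments vanish, matching the orthogonality of the basis.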

Fig. 3. Radial polynomials of order n = 0, 1, ..., 9

But they are sensitive to scale and illumination changes. We reduced the scale problem by applying scale-space theory to the image model [15]. This method makes the corner extraction process more effective than using an image pyramid. The problem of illumination change is simply solved by normalizing the moments by the Z_00 moment, which is equivalent to the average intensity. Since a local illumination change can be modeled as

f′(x_i, y_j) = a_L f(x_i, y_j)    (4)

we can make an illumination-invariant feature as

Z(f′(x, y)) / m_f′ = a_L Z(f(x, y)) / (a_L m_f) = Z(f(x, y)) / m_f    (5)

where f(x, y) represents the intensity at (x, y), f′(x, y) is the new intensity after the illumination change, a_L is the rate of illumination change, m_f denotes the average intensity, and Z means the Zernike moment operator. Figure 4 shows the robustness of the modified Zernike moments to illumination changes. We can observe that the normalized Zernike moments are almost constant even under illumination changes.
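A quick numerical check of eqs. (4)-(5) (an illustrative sketch, not the authors' code; only a few low-order radial polynomials are hard-coded, and the illumination factor 1.7 is arbitrary):

```python
import numpy as np

def zmoment(f, n, m):
    """Compact discrete Zernike moment over the unit disk (demo only)."""
    h, w = f.shape
    y, x = np.mgrid[-1:1:h * 1j, -1:1:w * 1j]
    rho, th = np.hypot(x, y), np.arctan2(y, x)
    # hard-coded low-order radial polynomials, enough for the demo
    R = {(0, 0): np.ones_like(rho),
         (2, 0): 2 * rho**2 - 1,
         (2, 2): rho**2}[(n, abs(m))]
    V = R * np.exp(1j * m * th)
    d = rho <= 1.0
    return (n + 1) * np.mean(f[d] * np.conj(V[d]))

rng = np.random.default_rng(0)
f = rng.random((64, 64))
g = 1.7 * f                          # illumination change f' = a_L * f, eq. (4)
for (n, m) in [(2, 0), (2, 2)]:
    z_f = zmoment(f, n, m) / zmoment(f, 0, 0)   # normalize by Z_00 (avg intensity)
    z_g = zmoment(g, n, m) / zmoment(g, 0, 0)
    assert np.allclose(z_f, z_g)                # invariant under a_L, eq. (5)
```

Because the Zernike operator is linear, the factor a_L cancels in the ratio exactly, which is what eq. (5) states.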



If all the objects are equally probable and mutually independent, equation (10) becomes

P(M_i | H) = Σ_{h=1..N_H} P(H_h | M_i) P(M_i) / Σ_{j=1..N_M} P(H_h | M_j) P(M_j)    (11)

Fig. 4. Zernike moments and illumination changes: (a) without the illumination invariant; (b) with the illumination invariant

C. Probabilistic voting

In this paper, we propose a probabilistic-voting-based recognition method considering the stability of the Zernike moments of the image model and the similarity of the Zernike moments between the image model and the input scene. Object recognition using probabilistic voting means finding a model M_i that maximizes

argmax_{M_i} P(M_i | S)    (6)

where S represents the input scene. In this paper, we assumed only one model M_i for each object in frontal view. But the model images can be extended to represent full 3D views.

For each input feature, we form a set of matching pairs consisting of the corresponding model features whose Zernike moments are similar to those of the input feature. For the k-th input feature, the set of matching pairs is

W_k = { (Z_k, Z̄_kj) | Z_k ∈ S, Z̄_kj ∈ M, j = 1, 2, ..., N_k }    (7)

where Z_k denotes the Zernike moments of the k-th input feature, Z̄_kj means the Zernike moments of a model feature, and N_k is the number of corresponding model features. The hypothesis is

H_h = { (Z_j, Z̄_j) | (Z_j, Z̄_j) ∈ W_j, j = 1, 2, ..., N_S }    (8)

where N_S is the number of interest points of the input scene. The set of total feature pairs can be written as

H = { H_1 ∪ H_2 ∪ ... ∪ H_{N_H} }    (9)

where N_H is the size of the product space of the W_k. In each hypothesis, a model point can be assigned to multiple input points because corresponding pairs are made using the Euclidean distance measure. Since H is composed of the model Zernike moments corresponding to the input scene S, we can substitute H for S. By Bayes' theorem, equation (6) becomes

P(M_i | H) = Σ_{h=1..N_H} P(H_h | M_i) P(M_i) / P(H_h)    (10)

By the theorem of total probability, the denominator can be written as

P(H_h) = Σ_{i=1..N_M} P(H_h | M_i) P(M_i)    (12)

The term P(H_h | M_i) is computed by considering the stability and similarity measures of the matching pairs. Basically, P(H_h | M_i) has to be large when both the

stability and the similarity measures are high. The stability measure reflects the incompleteness of the repeatability of interest points, and the similarity measure reflects the closeness of feature vectors in Euclidean space.

(1) Stability (ω_s): As in equation (13), the stability of the Zernike moments is inversely proportional to the sensitivity, which is the standard deviation of the Zernike moments calculated at 5 positions around an interest point. The smaller the sensitivity is, the more stable the Zernike moments of the image model are.

(2) Similarity (ω_c): As defined in equation (14), the similarity of a feature pair is inversely proportional to the Euclidean distance between the Zernike moments of the input scene and those of the corresponding image model.

Therefore, P(H_h | M_i) is defined as

P(H_h | M_i) = Π_{j=1..N_S} P((Z_j, Z̄_j) | M_i)    (15)

P((Z_j, Z̄_j) | M_i) = k ω_s ω_c  if Z̄_j ∈ f(M_i), and ε otherwise    (16)

where k is the normalization factor and ε is the value assigned when the corresponding model feature doesn't belong to the given model. We use the approximate nearest neighbor search algorithm to find matching pairs [16]. It takes logarithmic time compared with the linear search space.
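The voting idea can be sketched as follows. This is a toy illustration with made-up feature data, not the paper's exact scoring: the stability weight ω_s is omitted, and only an inverse-distance similarity weight in the spirit of eq. (14) is used:

```python
import numpy as np

rng = np.random.default_rng(1)
models = [rng.random((20, 8)) for _ in range(5)]      # N_M = 5 model feature sets
scene = models[2][:12] + 0.01 * rng.random((12, 8))   # scene features drawn from model 2

def vote(scene, models, eps=1e-6):
    """Score each model by summing inverse-distance similarity weights of
    every scene feature's nearest model feature, then normalize so the
    scores behave like the posterior of eq. (11)."""
    scores = np.zeros(len(models))
    for z in scene:
        for i, M in enumerate(models):
            d = np.linalg.norm(M - z, axis=1).min()   # nearest-neighbor distance
            scores[i] += 1.0 / (d + eps)              # similarity weight (omega_c)
    return scores / scores.sum()

posterior = vote(scene, models)
assert posterior.argmax() == 2   # the scene votes for the correct model
```

Because the scene features sit very close to model 2's features, their inverse-distance weights dominate the normalized score, so the argmax of eq. (6) picks the right model.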

D. Recognition verification

Recognition results are verified using the feature pairs. We find optimal feature pairs by rejecting outliers using the area ratio, which is preserved under affine



transformation. For the four given points (P_1, P_2, P_3, P_4) shown in figure 5, we calculate the area ratio S_1 / S_2 = △P_1P_2P_4 / △P_1P_2P_3 in the image model and S_1' / S_2' = △P_1'P_2'P_4' / △P_1'P_2'P_3' in the input scene. If the two ratios differ by more than a predetermined threshold, we reject the fourth feature point. We assume the first three points are matched.
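As a small numerical check (an illustrative sketch with made-up points and an arbitrary affine map), the ratio of the two triangle areas is unchanged by any affine transformation:

```python
import numpy as np

def tri_area(a, b, c):
    """Unsigned triangle area via the 2D cross product."""
    u, v = b - a, c - a
    return 0.5 * abs(u[0] * v[1] - u[1] * v[0])

def area_ratio(p1, p2, p3, p4):
    """S1/S2 = area(P1 P2 P4) / area(P1 P2 P3), the affine-invariant test."""
    return tri_area(p1, p2, p4) / tri_area(p1, p2, p3)

pts = [np.array(p, float) for p in [(0, 0), (4, 0), (1, 3), (3, 2)]]
A = np.array([[2.0, 0.5],
              [0.3, 1.5]])            # arbitrary non-singular linear part
t = np.array([10.0, -4.0])            # arbitrary translation
mapped = [A @ p + t for p in pts]     # apply the affine map to all four points

# both triangles scale by |det A|, so the ratio is preserved
assert np.isclose(area_ratio(*pts), area_ratio(*mapped))
```

An affine map multiplies every area by |det A|, so the common factor cancels in the ratio; a mismatched fourth point breaks this equality, which is exactly the outlier test.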

Fig. 5. Rejection of outliers using the area ratio: (a) local feature of the model; (b) local feature of the scene

Then we calculate an initial homography based on randomly selected optimal feature pairs. An LMedS-based method selects an optimal homography from the feature pairs. This homography is used as the initial pose estimate of the 3D CAD model. We can also determine the percentage of outliers by applying this homography to the remaining image model points.

E. Object recognition results

Figure 6 shows the image models, consisting of twenty objects such as books, CDs, telephones, boxes, dolls and photos, which can be found in our surroundings.

Table 1. Statistical object recognition results for model 10: 118 iterations, 105 successes, recognition rate 88.9%.

Fig. 6. Image models used for object recognition

First, we evaluated the recognition system's performance for independent environmental changes: scale change, illumination change, background clutter and occlusion. The system can recognize objects under scale changes up to a factor of 3, illumination intensity changes up to a factor of 2, somewhat complex backgrounds, and half occlusion. For these cases, the success rate is above 95%.


Second, we experimented under complex environments. Figure 7-(a) shows some examples of recognition results under view changes against a cluttered background. Figure 7-(b) shows experimental results when cluttered background, illumination change and occlusion coexist.

Fig. 7. Sample object recognition results for images taken under various conditions

Table 1 shows the statistical results of the object recognition for model 10. It fails to recognize some scenes where severe specularity, blurring, background clutter and low illumination exist.

IV. REAL-TIME ALIGNMENT AND TRACKING

By aligning the projected 3D CAD model of the recognized object and computing the motion, we can track



the recognized object continuously. We utilize the general paradigm of the Lie group formalism [17]. Conventional approaches often estimate an initial pose of the 3D CAD model manually. In this paper, we initialize the pose of the 3D CAD model using the homography computed in the recognition stage.
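The Lie group machinery used for tracking can be illustrated with a minimal exponential map from a twist in se(3) to a rigid transform in SE(3). This is an illustrative sketch, not the paper's tracker; the closed form is the standard Rodrigues-style formula:

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix of a 3-vector (the so(3) hat operator)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def se3_exp(xi):
    """Exponential map se(3) -> SE(3) for a twist xi = (v, w)."""
    v, w = np.asarray(xi[:3], float), np.asarray(xi[3:], float)
    th = np.linalg.norm(w)
    W = hat(w)
    if th < 1e-12:                               # no rotation: pure translation
        R, V = np.eye(3), np.eye(3)
    else:
        A = np.sin(th) / th
        B = (1.0 - np.cos(th)) / th**2
        C = (th - np.sin(th)) / th**3
        R = np.eye(3) + A * W + B * (W @ W)      # Rodrigues rotation formula
        V = np.eye(3) + B * W + C * (W @ W)      # left Jacobian of SO(3)
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, V @ v
    return T

# pure translation: the resulting pose is just a shift
T = se3_exp([1.0, 2.0, 3.0, 0.0, 0.0, 0.0])
assert np.allclose(T[:3, 3], [1.0, 2.0, 3.0])
```

In a Drummond-Cipolla-style tracker, each frame's motion is a small twist estimated by least squares and composed onto the current pose via this map, which is what "casts the motion computation into simple geometric terms".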

Figure 8-(a) shows the input image, which contains model 10. The interest points computed by the Harris corner detector are shown in figure 8-(b). By the probabilistic voting scheme proposed in this paper, we correctly recognized model 10, as shown in figure 8-(c). Finally, figure 8-(d) shows the image model features transformed to the input scene using the homography computed from the feature correspondences. Since we have the initial pose matrix, we can project the 3D CAD model onto the input scene as shown in figure 9.

If we have n correspondences between the recognized image model points x_m = (u_m, v_m, 1)^T and the input scene points x_s = (u_s, v_s, 1)^T, the relation is written as

x_s = H x_m    (17)

where H is the 3-by-3 homography. From equation (17), we obtain

H = X_s X_m^T (X_m X_m^T)^{-1}    (18)
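Equation (18) can be checked numerically. The sketch below (illustrative, with a made-up ground truth) uses an affine H so that each x_s already has unit third coordinate; for a general homography one would first normalize the per-point projective scales:

```python
import numpy as np

rng = np.random.default_rng(2)
# ground-truth map; chosen affine (last row 0, 0, 1) so the third
# homogeneous coordinate stays 1 and the linear solve of eq. (18) is exact
H_true = np.array([[1.2, 0.1, 5.0],
                   [-0.2, 0.9, 3.0],
                   [0.0, 0.0, 1.0]])

n = 8                                             # n >= 4 correspondences
Xm = np.vstack([rng.random((2, n)) * 100.0,       # model points as columns (3 x n)
                np.ones(n)])
Xs = H_true @ Xm                                  # scene points, x_s = H x_m

H = Xs @ Xm.T @ np.linalg.inv(Xm @ Xm.T)          # eq. (18), least-squares solve
assert np.allclose(H, H_true)
```

Stacking the homogeneous points as columns of X_m and X_s turns eq. (17) into the overdetermined system X_s = H X_m, whose normal-equation solution is exactly eq. (18).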

The steps to compute the initial pose of the 3D CAD model are:

1. Specify 4 corresponding points between a 3D CAD model and the corresponding image model.
2. Compute the homography through object recognition.
3. Transform the pre-selected model image points to the input scene using the homography obtained in step 2.
4. Obtain the image feature points corresponding to the 3D CAD model points and transform them to the camera coordinate system using the pre-determined camera intrinsic parameters [18].
5. Calculate the rotation and translation matrices from these feature pairs.

V. EXPERIMENTAL RESULTS

We have tested the proposed system using the various models shown in figure 6. Figure 8 shows the results of object recognition at each stage.


Fig. 9. Object tracking results for model 10 (frames #10 through #90)

Fig. 8. Object recognition process for model 10

The tracking system shows fairly good performance for various object motions through a long sequence of images. Object recognition takes 1-2 sec, and the tracking



performance reaches 10-15 frames/sec on an AMD 1.4 GHz processor.

VI. CONCLUSIONS

In this paper we have proposed a practical object recognition and tracking system. The main contributions of the proposed system are as follows. First, we proposed an object recognition system which is robust to view position, illumination change, occlusion and background clutter, based on modified local Zernike moments. Second, we proposed an automatic pose initialization technique to track the recognized object in real time by using the homography computed from the feature correspondences. The experimental results demonstrate that robust object recognition is feasible by combining feature matching with 3D object tracking. For future work, we will develop a recognition system that can recognize and track full 3D objects laid in random poses.

VII. ACKNOWLEDGMENTS

    This research has been partly supported by ADD andHWRS-ERC grant.

VIII. REFERENCES

[1] S. Ullman, High-level Vision: Object Recognition and Visual Cognition, The MIT Press, 1997.
[2] C. Olson, Fast Object Recognition by Selectively Examining Hypotheses, Doctoral Thesis, University of California, 1994.
[3] K. Roh and I. Kweon, "2-D Occluded and Curved Object Recognition using Invariant Descriptor and Projective Refinement," IEICE Transactions on Information and Systems, vol. E81-D, no. 8, 1998.
[4] M. Stevens and J. Beveridge, "Precise Matching of 3-D Target Models to Multisensor Data," CSU, Jan. 20, 1997.
[5] D. Keren, M. Osadchy and C. Gotsman, "Antifaces: A Novel, Fast Method for Image Detection," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 23, no. 7, pp. 747-761, 2001.
[6] C. Schmid, A. Zisserman and R. Mohr, "Integrating Geometric and Photometric Information for Image Retrieval," International Workshop on Shape, Contour and Grouping in Computer Vision, 1998.
[7] D. Lowe, "Object Recognition from Local Scale-Invariant Features," Proc. International Conference on Computer Vision, IEEE Press, 1999.
[8] K. Ohba and K. Ikeuchi, "Detectability, Uniqueness, and Reliability of Eigen-Windows for Robust Recognition of Partially Occluded Objects," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 19, no. 9, pp. 1043-1048, 1997.
[9] D. Jugessur, Robust Object Recognition using Local Appearance based Methods, Master's Thesis, McGill Univ., 2000.
[10] A. Treisman, "Features and Objects in Visual Processing," Scientific American, November 1986.
[11] C. Schmid, R. Mohr and C. Bauckhage, "Comparing and Evaluating Interest Points," ICCV, pp. 230-235, 1998.
[12] C. Harris and M. Stephens, "A Combined Corner and Edge Detector," Alvey Vision Conference, pp. 147-151, 1988.
[13] S. Abdallah, Object Recognition via Invariance, Doctoral Thesis, The Univ. of Sydney, 2000.
[14] C. Teh and R. Chin, "On Image Analysis by the Methods of Moments," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 10, no. 4, 1988.
[15] T. Lindeberg, "Scale-space Theory: A Basic Tool for Analysing Structures at Different Scales," Journal of Applied Statistics, vol. 21, no. 2, pp. 224-270, 1994.
[16] S. Arya et al., "An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions," J. ACM, 45(6):891-923, Nov. 1998.
[17] T. Drummond and R. Cipolla, "Real-time Visual Tracking of Complex Structures," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 932-946, July 2002.
[18] Z. Zhang, "Flexible Camera Calibration by Viewing a Plane from Unknown Orientations," ICCV, 1999.