
Real Time Tracking of Borescope Tip Pose

Ken Martin, Charles V. Stewart*
Kitware Inc., Clifton Park, NY 12065

*Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY 12180
[email protected], [email protected]

Keywords

pose estimation, borescope inspection, industrial inspection, lens distortion

Abstract

In this paper we present a technique for tracking borescope tip pose in real-time. While borescopes are used regularly to inspect machinery for wear or damage, knowing the exact location of a borescope is difficult due to its flexibility. We present a technique for incremental borescope pose determination consisting of off-line feature extraction and on-line pose determination. The off-line feature extraction precomputes from a CAD model of the object the features visible in a selected set of views. These views cover the region over which the borescope should travel. The on-line pose determination starts from a current pose estimate, determines the visible model features, and projects them into a two-dimensional image coordinate system. It then matches each to the current borescope video image (without explicitly extracting features from this image), and uses the differences between the predicted and matched feature positions in a least squares technique to iteratively refine the pose estimate. Our approach supports the mixed use of both matched feature positions and errors along the gradient within the pose determination. It handles the radial lens distortion inherent in borescopes and executes at video frame rates regardless of CAD model size. The complete algorithm provides a continual indication of borescope tip pose.

1. Introduction

A borescope is a flexible tube which allows a person at one end of the tube to view images acquired at the other end. This can be accomplished with coherent fiber bundles running through the tube, or a small video camera located at the tip of the borescope and a video monitor at the other end. Borescopes are typically a centimeter or less in diameter, a few meters long, and used to inspect inaccessible regions of objects. For example, when inspecting the internal structure of a jet engine for cracks or fatigue, small openings from the outside allow the borescope to be snaked into the engine without having to drop the engine from the plane. In such an inspection, it is often difficult for the inspector to know the exact borescope tip location within the engine, making it difficult to identify the location of new cracks found or to return to previously identified trouble spots. When a crack is found, the inspector must measure its length and width, which is problematic if the borescope's pose is unknown.

Traditional, non-video techniques for tracking an object do not work well within the environment a borescope typically encounters. Electromagnetic fields are corrupted by the surrounding presence of metal. Acoustic techniques are susceptible to ringing caused by metal parts and convoluted passages, and any technique that relies on a clear line of sight is by definition implausible in the borescope inspection environment. The only option is tracking based on the borescope images themselves.

2. Approach

Borescope tip tracking is essentially the camera pose problem with added constraints and difficulties. First, the light source for a borescope is collocated with the camera, so the light is moving but always coincident with the camera. Second, object recognition is non-standard: while the complete object is known, the position within the object (the jet engine) is not, and only a small fraction of the complete object will appear in any single image. Third, the object contains structural repetition, making an approach based solely on dead reckoning unrealistic. Fourth, tracking (pose estimate updates) must occur at near frame-rates, preferably without specialized hardware. Combined, these constraints make borescope tip tracking a novel and challenging problem.

Our solution to these problems is a twofold approach consisting of off-line feature extraction from a CAD model and on-line pose estimation. The off-line extraction precomputes from the CAD model 3D edges, each consisting of a 3D position and normal direction in the coordinate system of the CAD model. These features are computed for local regions of the model and consider visibility constraints. This off-line process need only be done once for a given CAD model. The on-line process starts from a known landmark and compares the precomputed features for that region of the model to the borescope video. From this comparison an error vector is created which is used to compute an updated pose. This process repeats for each frame of video. The performance of the on-line algorithm is unaffected by the size or complexity of the CAD model.

Beyond borescope inspection, this approach would be of value for any model-based pose-estimation problem where the camera is placed within the model.

3. Related Work

While borescope tip tracking is a novel problem, there are a number of related techniques which could be considered. One approach is to subdivide the CAD model into many smaller parts and then determine the borescope pose based on standard object recognition strategies. Unfortunately, this approach, which is employed in a different context by Kuno, Okamoto and Okada [8], suffers from aperture problems caused by the structural repetition inside a jet engine. Further, in the time required for part recognition and associated pose estimation, the borescope may have moved too far for unique localization. By contrast, our algorithm determines pose incrementally based on the image locations of precomputed features, allowing it to run nearly at frame rate.

While camera pose and object pose have been treated extensively [5][12][15], existing techniques are difficult to apply in our situation. These techniques match 3D object features with a set of 2D features extracted from a single image. In contrast, since we must estimate pose in real time, we cannot extract complicated image features and the matching process must be rapid. To accomplish this, we match simple 3D edgels to 2D intensity images. We use the current pose estimate to project these 3D edgels into the image and match by searching along the projected edgel's normal direction, usually deriving only one constraint per projected edgel. This implies that the match for a 3D feature moves and changes during pose determination, making our technique similar in spirit to other techniques that determine pose without specific point matches [16].

The first of such techniques is that of Branca [1], who presents a passive navigation technique based on calculating the focus of expansion (FOE). Essentially, Branca calculates the image movement vectors between two frames, performs an iterative error minimization to find a set of feature matches between the two frames, then calculates the FOE and camera translation from them. This approach uses planar invariants to find the point matches by making the condition of planarity part of the error minimization process.

This approach is certainly novel and it doesn't require a model to work from, but it is not suitable for borescope inspection. It requires planar surfaces with at least five features on them, while borescopes typically inspect curved surfaces such as the inside of a jet engine. Furthermore, it only calculates camera translation (not rotation), and it has no method to prevent incremental errors from accumulating. This last point is critical in incremental pose estimation. Many techniques use interframe differences without any form of dead reckoning. The errors from each pose estimate accumulate and create stability problems and gross errors. This is part of the motivation behind using model based pose estimation. The model can be used for dead reckoning as long as its 3D features are static. This paper does not directly address deformable models.

Another paper worth noting is Khan's [7] paper on vision based navigation for an endoscope. This is a very similar problem to borescope navigation, although Khan et al. solve it in a different manner. Instead of starting with an MR or CT model of the patient's colon, they construct the model as they go. When the endoscope enters an uncharted region of the colon they start extending the model. Navigation is accomplished by using two features that are commonly found in colonoscopies: rings of tissue along the colon wall and a dark spot (the lumen) in the image where the colon extends away from the endoscope. This approach is real-time and once it has built the model it can support dead reckoning. It requires no modification of current endoscope hardware or precomputed models. Unfortunately, its choice of features restricts its usage to inspection of ribbed tubes.

Lowe presents a model based motion technique that goes beyond pose estimation to also handle models with limited moving parts [10]. His technique is essentially a modified hypothesize and verify search tree based on line segment matches. It uses probabilities from the previous pose estimate to order the evaluation of the possible matches; this significantly improves the typically excessive computational requirements. There are three significant drawbacks to such an approach. The first is that it relies on line segments as features. As will be demonstrated later, there are few line segments in a jet engine. Second, it requires complete feature extraction from the input image, which is time consuming. In his application, dedicated image processing hardware was required and yet the resulting performance was limited to three to five frames per second. Finally, it is still sensitive to the number of potential features in the model. Results were reported for a model consisting of only a few lines. Unfortunately, pose estimation inside a complex part typically results in millions of features in the model, nearly all of which will be obscured in a given view.

Some well known related work was performed by Dickmanns and his colleagues. His work has focused on real-time, model-based pose estimation and navigation. In an early paper [2] he describes a new approach based on measuring error vectors between predicted and actual feature locations, and then using these errors as input to modify the pose estimate. In later work the state estimation is handled by a Kalman filter [3] and also incorporates non-vision inputs such as airspeed [4]. The latter paper presents a good overview of his technique.

There are two significant differences between Dickmanns' work and this work. First, Dickmanns uses the CAD model to develop his control mechanism, i.e. his software, feature selection, and Kalman filter are tuned to a specific task with a specific model. His approach does not support supplying a general CAD model to work from. This customization can be used to reduce the computation load and improve robustness. For example, when landing an airplane, Dickmanns uses ten predefined image features that correspond to specific parts of a runway and the horizon. Incoming images are analyzed only at the predicted locations of those ten features. Likewise, those ten features represent unique defined inputs to the Kalman filter. This paper does address the issue of automatically calculating a set of pertinent features from a CAD model, and it assigns no semantic meaning to those features.

The second difference is Dickmanns' choice of features. He uses constant features, such as oriented edge templates, which yield one constraint. Our paper considers the interpretation of a feature as providing zero to two constraints as appropriate. This is especially important as the features are automatically generated, not manually defined as in Dickmanns' work. This impacts the central pose estimation calculation, which must be adapted to the number of constraints provided by the features. There are also some minor differences, such as Dickmanns' work not incorporating the wide angle lens distortion found in a borescope, but most remaining differences would require only minor modifications to either approach.

One of the most closely related approaches is that of Chris Harris, which uses control points from the CAD model combined with a Kalman filter [6]. The first key difference is that he doesn't indicate how his control points are generated. It is implied that they are hand selected from the 3D model, which is impractical in many situations. Second, since his approach is focused on estimating the pose of an object from the outside, its visibility tests for the control points are too limited for internal pose estimation. Finally, his approach deals with features as providing zero or one constraint. As will be discussed later, it is sometimes necessary to have features provide two constraints (position) instead of just a gradient vector displacement.

4. Feature Extraction

As summarized above, our approach consists of two primary pieces: off-line feature extraction and on-line pose determination. Off-line feature extraction must take the CAD model, which could be greater than one gigabyte in size, and produce 3D features that can be loaded quickly based on a current pose. This process is critical because just traversing the CAD model in memory would consume more time than allowed for incremental pose estimation. The major components of the feature extraction process are depicted in Figure 1, and the process is summarized in the following five steps (a code sketch follows the list):

1. Based on the CAD model, a list of 3D sample points is generated over the 3D region where the borescope might travel. This could be either a uniform sampling of the 3D region completely enclosing the CAD model, or it could be a non-uniform sampling of all or part of the region.

2. At each sample point, computer graphics hardware is used to render a collection of 2D images of the CAD model from that location. The view directions of these images are selected to ensure that the collected images form a mosaic completely enclosing the location in question.

3. For each synthetic image generated in step two, edge detection is used to extract edgel chains having large intensity gradients, and then a fixed size subset of edgels is selected. The selected edgels have the largest intensity gradients but also satisfy a minimum pairwise image separation.

4. The 3D location and direction on the CAD model are computed for each selected edgel. These become the 3D features for the current sample point.

5. The process repeats for each image and sample point.
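To make steps one through five concrete, here is a minimal sketch of the off-line pipeline in Python. The helpers render_view, detect_edgels, and backproject are hypothetical stand-ins for the graphics hardware rendering, edge detection, and ray-polygon intersection stages described in this section; only the orchestration is shown.

```python
import numpy as np

# Hypothetical helpers, standing in for the systems described above:
#   render_view(model, eye, vdir) -> 2D grayscale image (graphics hardware)
#   detect_edgels(image)          -> list of (u, v, gradient_vector) tuples
#   backproject(model, eye, vdir, u, v, grad) -> (3D position, 3D direction)
#                                    via the closest ray-polygon intersection

def extract_features(model, sample_points, max_per_view=50):
    """Off-line preprocessing: precompute 3D edge features (steps 1-5)."""
    # Step 2 renders an image cube: six 90-degree views along +/- x, y, z.
    view_dirs = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                 (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    features = {}                                  # sample index -> 3D features
    for idx, eye in enumerate(sample_points):      # step 1: sample locations
        feats = []
        for vdir in view_dirs:                     # step 2: render each view
            image = render_view(model, eye, vdir)
            edgels = detect_edgels(image)          # step 3: edgel chains
            # Keep the strongest-gradient edgels (the minimum pairwise
            # separation test is omitted here for brevity).
            edgels.sort(key=lambda e: -np.linalg.norm(e[2]))
            for (u, v, grad) in edgels[:max_per_view]:
                feats.append(backproject(model, eye, vdir, u, v, grad))  # step 4
        features[idx] = feats                      # step 5: repeat per point
    return features
```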

The first step is to determine where the sample points will be and how many of them will be used. This operation must generate enough sample points and 3D features that there will be one near any position encountered where the borescope travels. For the results presented here, the sample points were generated simply as a regular grid.

In the second step computer graphics hardware is used to render images of the CAD model. The eventual goal is to produce features useful during on-line pose estimation. The problem then becomes determining how to "predict" a useful feature from a CAD model. From the perspective of the on-line algorithm, a useful feature is one that can be used to provide an image movement vector. As such, image intensity discontinuities, or edges, are likely candidates. One possibility is that range or normal discontinuities in the CAD model will form image intensity discontinuities in the video. This is often, but not always, true, and some intensity discontinuities will arise solely from markings and changes in materials. A more general and more reliable approach is to use computer rendering to create a realistic image of what the part should look like. This can then be examined for intensity edges.

In practice, step two involves rendering six images for each sample point. These images are rendered with a view angle of ninety degrees and oriented along the positive and negative directions of the three cartesian axes to form an image cube. To obtain the best features possible, the specular and diffuse material properties of the CAD model are manually adjusted so that the computer rendered images closely resemble video of the physical part to be inspected.

In step three, standard techniques are used to extract edgel chains from each rendered image. From the edgel chains, up to fifty features are extracted with a minimum angular separation of three degrees relative to the borescope tip. That gives fifty features for each view and six views for each sample point, yielding up to 300 features per sample point. If the synthetic images contain few edgels then there will be fewer than 300 features. Likewise, more features could be used if available. Limiting the number of features selected effectively limits the processing time required for the on-line pose estimation.

These image features are then converted to 3D edges by casting a ray from the sample point, through the edgel in the image plane, and into the CAD model. The closest ray-polygon intersection provides the 3D position of the edgel. The gradient in 3D can be found in a similar manner. These 3D positions and orientations are recorded in the global coordinate system of the CAD model so they can be projected onto any image, not just the ones from which they were generated.
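The backprojection itself reduces to constructing a ray through the edgel and intersecting it with the model. Below is a sketch under the ninety degree view angle used for the image cube, with intersect as a hypothetical ray-polygon intersection routine.

```python
import numpy as np

def edgel_to_3d(eye, view_rot, u, v, width, height, intersect):
    """Cast a ray from the sample point through image pixel (u, v) into the model.

    view_rot: 3x3 rotation whose columns are the camera's right, up, and
    forward axes. With a 90 degree view angle the image plane sits at unit
    distance from the eye and spans [-1, 1] in both directions.
    intersect(origin, direction): assumed helper returning the closest
    ray-polygon hit as a 3D point, or None.
    """
    # Pixel -> normalized image plane coordinates for a 90 degree FOV.
    x = 2.0 * (u + 0.5) / width - 1.0
    y = 2.0 * (v + 0.5) / height - 1.0
    ray = view_rot @ np.array([x, y, 1.0])
    ray /= np.linalg.norm(ray)
    # Closest ray-polygon intersection gives the edgel's 3D position.
    return intersect(eye, ray)
```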

Overall, this off-line process is very time consuming, necessitating the use of ray-polygon acceleration techniques, such as spatial hashing [11].

5. On-line Pose Determination

The on-line pose estimation algorithm combines an initial pose estimate, the 3D features computed off-line, and the live video from the borescope to produce a running pose estimate. This algorithm uses the current pose estimate to select the appropriate subset of the 3D features extracted during preprocessing, project these features into the image coordinate system, match these features to the borescope image, and refine the pose estimate based on differences between the projected and matched feature positions. By using precomputed features, by avoiding the need for explicit feature extraction in the borescope images, and by iterative pose refinement, the algorithm achieves video rate pose determination. An important aspect of this pose determination algorithm is that a projected 3D feature (edgel) may either (a) be matched exactly, giving a 2D position error vector, (b) be matched along the gradient direction, giving only the 1D component of position error along this direction, or (c) be ignored entirely as an outlier. The approach also accounts for the significant lens distortion of a borescope. An overview of the algorithm follows:

1. Obtain the previous borescope location and orientation. Initially this comes from the operator positioning the borescope at a known location or landmark. Subsequently, it is taken from the estimated pose for the previous borescope image.

2. Determine the 3D sample point closest to the previous borescope location.

3. Determine which features for that point would be visible to the borescope based on its previous pose.

4. Repeat the following three steps until the change in the pose estimate falls below a threshold or the interframe time has expired.

5. Project the N 3D features selected in step three onto a 2D image coordinate system based on the current pose estimate (initially the estimate from step one), computing both the 2D position and gradient (normal) direction of each feature.

6. For each feature, estimate the error in its projected position by finding the position of the best match between the projected feature and the borescope image region near the projected position. The difference between the projected and matched image positions forms a 1D or 2D error term for each feature, depending on the results of the matching process.

7. Use the N error terms to update the borescope pose estimate.

8. Return to step one and start working on the next frame of video.

In step one, there typically are a few predefined landmarks for the inspector to choose as a starting point. Step two is a very simple calculation to find the closest sample point to the current pose. Step three selects only the features that could be seen by the borescope in its previous pose. Since the features are 3D locations, this involves determining if they are in the borescope's view frustum. This set is further restricted to eliminate features near the edges of the view frustum by selecting an angle slightly smaller than the borescope's view angle. Step four starts an iterative error minimization process to determine the optimal pose estimate (a code sketch of this loop follows below). This process is limited to the interframe time of the borescope. In practice the optimal pose may not have been found when the time has elapsed, but the current pose estimate is typically close enough to the optimum to provide a suitable pose for projecting features for the next frame. In addition, first order motion prediction is used to aid the projection of features for the next frame. In step five the 3D features are projected onto the current image plane, yielding 2D edgels. This projection is the standard pinhole perspective projection followed by a second order radial lens distortion (see below).
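The iterative loop of steps four through seven might be organized as in the following sketch. The helper names project_feature, match_feature, and solve_delta_pose are assumptions corresponding to the projection, matching, and delta pose steps described here and in the next section; they are not the authors' API.

```python
import time
import numpy as np

def refine_pose(pose, features, image, frame_budget_s, tol=1e-4):
    """Steps 4-7: iterate until convergence or the interframe time expires."""
    deadline = time.monotonic() + frame_budget_s
    while time.monotonic() < deadline:
        jacobians, matches = [], []
        for f in features:
            u, grad, J_i = project_feature(pose, f)   # step 5: project + Jacobian
            match = match_feature(image, u, grad)     # step 6: 0, 1, or 2 constraints
            if match is not None:
                jacobians.append(J_i)
                matches.append(match)
        if not matches:                               # nothing to constrain the pose
            break
        delta = solve_delta_pose(jacobians, matches)  # step 7 (Section 6)
        pose = pose + delta
        if np.linalg.norm(delta) < tol:               # converged within tolerance
            break
    return pose
```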

Steps six and seven warrant additional detail. Step six starts with N 2D edgel locations, their corresponding 2D edgel gradients, and the current borescope image. For each 2D location, the normalized cross-correlation between a one dimensional step function template (as shown in Figure 2), oriented along the edgel gradient, and the video at that location (see Figure 3) is measured. This process is repeated at locations along the positive and negative edgel gradient direction up to a maximum distance determined from the camera's uncertainty. If the maximum correlation found is greater than a threshold, then the feature is considered a gradient feature. It provides one constraint to the pose refinement computation: the optimal gradient displacement. This displacement is calculated using a weighted average of the correlations, essentially giving a subpixel location to the matched position. In practice the gradient direction will not be axis aligned as in Figure 2, so calculating the normalized cross-correlation involves resampling the step function template onto the pixel array. For maximum performance, a set of step function templates can be precomputed for a set of discrete orientations and subpixel positions; this is what is done in the implementation to avoid the cost of on-line resampling. The step function template selected provides a balance between discrimination and noise resistance for the 300x300 pixel images produced, but other templates could be used.
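A sketch of the gradient-direction search in step six follows, assuming a precomputed template bank (templates.lookup) and a normalized_cross_correlation helper, neither of which is specified in the paper. The weighted average of the correlations supplies the subpixel displacement, as described above.

```python
import numpy as np

def gradient_search(image, pos, grad, templates, max_dist, threshold):
    """Search along +/- the gradient direction for the best template match.

    Returns the subpixel displacement along the gradient (one constraint),
    or None if no correlation exceeds the threshold.
    """
    g = grad / np.linalg.norm(grad)
    template = templates.lookup(g)        # nearest precomputed orientation
    dists = np.arange(-max_dist, max_dist + 1)
    corrs = np.array([
        normalized_cross_correlation(image, pos + d * g, template)
        for d in dists
    ])
    if corrs.max() < threshold:
        return None                       # caller may try a tangential search
    w = np.clip(corrs, 0.0, None)         # weight by (non-negative) correlation
    return float(np.sum(w * dists) / np.sum(w))  # subpixel gradient displacement
```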

The initial template matching is restricted to image positions along the feature's gradient direction because each feature consists of a location and a direction: there is nothing to limit tangential movement. Only if a correlation above the threshold isn't found along the gradient direction is the search extended to include tangential displacements. For example, if the feature shown in Figure 3 were moved to the left, eventually the gradient direction search would no longer intersect the edge it previously had. In this situation the tangential search can provide the necessary error vector: a two component error vector as opposed to the earlier one component gradient displacement. This error vector will drive incremental pose estimation toward a pose where the projected feature location will produce a match along its gradient in subsequent iterations of steps five through seven.

After calculating the normalized cross-correlation over the entire region, the maximum value may still be quite small. This corresponds to the situation where an edge is expected within the region but nothing suitable is found. In this case no error vector is produced, but the feature still provides information by its lack of an error vector. Its low correlation will impact the average correlation, which is used as a confidence measure for the algorithm. This process is repeated independently for all N features, each contributing zero, one, or two components to the overall local error vector $E^t$ at iteration $t$. This is the error vector to be minimized by the delta pose calculation. Step seven, calculating the change in pose based on the error vector $E^t$, is described in the next section.

6. Delta Pose Calculation

The change in pose is computed from the error vector $E^t$. For simplicity in deriving the delta pose estimate, we first consider the case where each of the features is a 2D match producing two constraints. This will then be extended to handle all three conditions. We start with the following definitions:

$$\begin{aligned}
P^t &= \text{the 6D borescope pose vector at iteration } t \\
u_i^t &= \text{the 2D image position of feature } i \text{ at iteration } t \\
x_i &= \text{the 3D position of feature } i \\
F &= \text{the borescope projection function}
\end{aligned}$$

where for $F$ we are using the simple perspective camera model with known intrinsic parameters (this is extended to handle radial lens distortion below). Starting from the equation

$$u_i^t = F(P^t, x_i)$$

we can derive an expression for the change in a feature's image coordinates based on changes in the borescope pose as follows:

$$\Delta u_i^t = u_i^t - u_i^{t-1} = F(P^{t-1} + \Delta P, x_i) - F(P^{t-1}, x_i) = J_i(P^{t-1}, x_i)\,\Delta P + \text{H.O.T.}$$

where $J_i$ is the Jacobian of the projection with respect to the pose. Dropping the higher order terms yields a constraint on the pose for each error vector.
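For illustration, here is a numerical sketch of $F$ and $J_i$ under a plain pinhole model. The pose parameterization (three translations plus small-angle rotations) and the finite-difference Jacobian are assumptions made for the sketch; the paper computes the Jacobians analytically along the lines of Lowe [9].

```python
import numpy as np

def project(pose, x, focal):
    """F: pinhole projection of 3D point x under a 6D pose (tx, ty, tz, rx, ry, rz)."""
    t, r = pose[:3], pose[3:]
    # Small-angle rotation approximation: R ~= I + [r]_x.
    rx = np.array([[0.0, -r[2], r[1]],
                   [r[2], 0.0, -r[0]],
                   [-r[1], r[0], 0.0]])
    xc = (np.eye(3) + rx) @ x + t        # point in camera coordinates
    return focal * xc[:2] / xc[2]        # perspective divide

def jacobian(pose, x, focal, eps=1e-6):
    """J_i: 2x6 central-difference Jacobian of the projection w.r.t. the pose."""
    J = np.zeros((2, 6))
    for k in range(6):
        dp = np.zeros(6)
        dp[k] = eps
        J[:, k] = (project(pose + dp, x, focal) -
                   project(pose - dp, x, focal)) / (2.0 * eps)
    return J
```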


We can combine the constraints for all $N \ge 3$ matches to solve for the pose. First we combine the $\Delta u_i^t$ vectors into a $2N$ error vector $E^t$. Likewise we combine the Jacobians, $J_i$, into a $2N \times 6$ matrix $J$. Then we determine the pose error $\Delta P^t$ by minimizing the error norm

$$\left\| J\,\Delta P^t - E^t \right\|^2$$

yielding

$$\Delta P^t = (J^T J)^{-1} J^T E^t$$

The resulting $\Delta P^t$ provides an error vector for the borescope pose and can be computed using singular value decomposition (SVD). The Jacobians can be computed along the lines of the technique described in Lowe [9]. For a given borescope image frame this technique is applied in an iterative manner to account for non-linearities and feature rematching, as discussed above.

This technique is easily extended to handle the radial lens distortion typical of borescopes. The following distortion model [13] solves for the distorted coordinates as a function of the non-distorted coordinates:

$$\bar{u} = u(1 + kr^2) \qquad \bar{v} = v(1 + kr^2) \qquad \bar{r} = r(1 + kr^2)$$

In this, $u$ and $v$ are the non-distorted pixel coordinates in consideration, $r$ is the non-distorted radius of that point (measured from the center of the image), and $k$ is the distortion constant computed off-line during camera calibration. This is a second order approximation and it can easily be extended to higher orders. This model is less commonly used than the traditional formulation, which computes non-distorted coordinates from distorted values. Our motivation is that this formulation is easily differentiated and hence can be incorporated into the above Jacobian calculations using the chain rule. The derivatives are:

$$\frac{\partial \bar{u}}{\partial u} = 1 + 3ku^2 + kv^2 \qquad
\frac{\partial \bar{u}}{\partial v} = 2kuv \qquad
\frac{\partial \bar{v}}{\partial u} = 2kuv \qquad
\frac{\partial \bar{v}}{\partial v} = 1 + 3kv^2 + ku^2$$

The second extension to the above derivation handles the situation where some features provide one constraint ("gradient features"), others provide two constraints ("position features"), and the remainder are ignored. Starting with the equation for gradient displacement

$$\Delta d_i^t = g_i^T\,\Delta u_i^t$$

where $g_i$ is the unit gradient direction, we can construct an error vector

$$E^t = \begin{bmatrix} \Delta d_i^t \ \text{ for all } i \text{ that are gradient features} \\ \Delta u_i^t \ \text{ for all } i \text{ that are position features} \end{bmatrix}$$

The error norm becomes $\left\| HJ\,\Delta P^t - E^t \right\|^2$, where we have introduced the matrix $H$, constructed as follows:

$$H = \begin{bmatrix}
g_{1u} & g_{1v} & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & g_{2u} & g_{2v} & \cdots & 0 & 0 & 0 \\
 & & & & \cdots & & & \\
0 & 0 & 0 & 0 & \cdots & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1
\end{bmatrix}$$

If $a$ is the number of gradient features then $H$ will have $a$ rows similar to the first two shown. These rows map 2D displacements into gradient displacements, essentially by performing a dot product with the unit gradient. Likewise, if $b$ is the number of position features then the bottom right $2b \times 2b$ corner of $H$ will be the identity matrix. The resulting size of $H$ will be $(a + 2b)$ by $2N$. We can then solve this in the same manner as before, yielding:

$$\Delta P^t = \left( (HJ)^T (HJ) \right)^{-1} (HJ)^T E^t$$

While $H^T H$ is not invertible, in general $(HJ)^T (HJ)$ is.
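The mixed-constraint solve can be written compactly by assembling the rows of $HJ$ and $E^t$ feature by feature. This sketch assumes each matched feature has already been tagged as a gradient or position feature (matching the loop sketched in Section 5); numpy's least-squares routine supplies the SVD-based solution.

```python
import numpy as np

def solve_delta_pose(jacobians, matches):
    """Solve for the 6D pose update from mixed 1D/2D feature constraints.

    jacobians: list of 2x6 Jacobians J_i (one per matched feature).
    matches:   list of ('grad', g, dd) or ('pos', du) tuples, where g is the
               unit gradient, dd the 1D gradient displacement, and du the
               2D position error vector.
    """
    HJ_rows, E = [], []
    for J_i, m in zip(jacobians, matches):
        if m[0] == 'grad':                 # one constraint: g^T J_i dP = dd
            _, g, dd = m
            HJ_rows.append(g @ J_i)
            E.append(dd)
        elif m[0] == 'pos':                # two constraints: J_i dP = du
            _, du = m
            HJ_rows.extend(J_i)
            E.extend(du)
        # unmatched (outlier) features contribute nothing
    HJ = np.vstack(HJ_rows)
    E = np.asarray(E)
    # Least-squares solution dP = ((HJ)^T (HJ))^-1 (HJ)^T E, computed via SVD.
    dP, *_ = np.linalg.lstsq(HJ, E, rcond=None)
    return dP
```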

7. Results

We have implemented and tested the feature extraction and pose determination algorithms on both real and synthetic data. The off-line feature extraction was developed on a UNIX platform using the Visualization Toolkit [14] as a framework. The on-line system is PC based and, with a simple frame capture card, maintains pose updates at over ten frames per second independent of the size of the CAD model.

The top image in Figure 4 shows a portion of a CAD model from an F110-A exhaust duct and liner. The liner is an undulating surface with numerous cooling holes. The 3D features extracted for a portion of this CAD model are shown as small white spheres, typically surrounding the cooling holes. Note that there are no line segments or flat surfaces to use for complex features. The bottom image in Figure 4 shows a computer rendering of a simple CAD model with some of the 3D feature positions shown as white spheres. Note that some of the features are located at changes in material properties, not just at range or normal discontinuities.

Figure 5 shows two sample images from a Welch Allyn VideoProbe 2000 borescope. The image quality suffers from the borescope's small lens, narrow depth of field, and relatively low resolution pixel array. These images are typical of the input to the on-line pose estimation algorithm.

To test the robustness of the on-line algorithm, a set of twelve video test sequences was captured that included a wide variety of borescope motions within a CAD part. An effort was made to collect a variety of movements without consideration for how the pose estimation would perform. For each frame of each sequence the probable errors in position and orientation were recorded along with the average correlation of the projected 3D features. The probable error is calculated from the residuals of the SVD solution. The data from sequences two and three are shown in Figure 6. In sequence two the positional error averages one millimeter while the rotational error averages a quarter degree (the volume being inspected is about one cubic decimeter). The average confidence is about 0.07, which may seem low given that an ideal match would be 1.0. This low correlation isn't due to misalignment but rather to the small gradients found in typical borescope images. The results for sequence three were similar, except that near the end the confidence starts to fall while the positional and rotational errors start to climb. Eventually this test sequence failed due to a lack of features as the borescope was brought up against a wall.

Of the twelve test sequences, the on-line algorithm was able to track nine without difficulty. Of the three failed runs, one failed due to an interframe movement too large for the algorithm to handle; the other two failed due to a lack of features. First consider the failure due to interframe movement. The on-line algorithm uses first order motion prediction to handle borescope velocities that could otherwise cause it to fail. However, given a large enough acceleration, the algorithm will fail because the features are simply too far from their predicted locations. The other two failures, which include sequence three, occurred as the borescope was moved up to a wall. As expected, when the borescope's input video contains little or no variability, there is no information to use in determining the pose. Given that the borescope's pose has six degrees of freedom, the on-line algorithm requires at least six "edgels" in the input video.

To validate the resulting pose estimates, videos were made containing the original borescope video overlaid with the CAD model according to the pose estimate. One frame of an overlaid video sequence is shown in Figure 7. The overlaid CAD model is shown as thick white lines. Generating the overlaid image consists of rendering the CAD model from the pose estimate, extracting edgel chains from the rendered image, warping the edgel chains according to the lens distortion model, and finally rendering the warped edgel chains as white lines on top of the borescope video. On-line robustness measures include display of the SVD's condition number as well as the average normalized cross-correlation of the projected 3D features onto the video.

8. Conclusions

This paper has made three contributions to the area of model based pose estimation. The first is its use of computer graphics hardware to generate local features for pose estimation. While there has been work on directly extracting features from a CAD model, it has focused on categorizing this information based on the direction of view, not on location. It has also been based on visibility constraints and geometric features. This paper goes further to consider view direction, view position, material properties and lighting conditions in determining features. The novel use of computer graphics hardware makes this computationally feasible for the large models found in industry. This work has shown that computer renderings can be used as a predictor of features for pose estimation. This approach can be extended to provide more accurate feature prediction by using more advanced rendering and lighting techniques.

The second contribution is in the on-line pose estimation, where a number of weaknesses in the current pose estimation literature have been addressed. First, neither high order features nor explicit feature extraction is required from the input image. Instead, features precomputed from the CAD model and predicted from the previous pose are projected into the current image and matched using simple, efficient correlation techniques. Based on the matching results, each feature provides from zero to two constraints on the pose estimate. The framework incorporates radial lens distortion and the algorithm runs at near frame rates on a personal computer.

The third contribution is in the consideration of large industrial datasets. Many model based pose techniques have not been designed to scale to models composed of a million or more polygons. Issues include the combinatorics of matching, the reliability of matching, and the small percentage of model features typically visible in any single view. Even though the requirements for borescope inspection differ from other applications, many of these issues are universal when dealing with CAD models of industrial parts. The techniques presented in this paper allow for on-line pose estimation without any performance or accuracy degradation for larger, more complex CAD models.

A few open issues arise from this work. The first is how to improve feature selection from the rendered images. When backprojecting detected image features onto the CAD model using ray-polygon intersections, the stability of the 3D position of the feature can be determined. For example, features caused by tangency between the camera line of sight and a surface are unstable. These unstable features should be removed from the feature set. In addition, the selection of features could consider the distribution of gradient directions in addition to the gradient intensities. This can prevent singular matrix problems that occur when the gradient directions are closely aligned.

A related issue in the feature extraction is determining where the sample points should be located and how many of them to use. While we have used a simple regular grid, this is inefficient for the structural variety found in CAD models. A more advanced approach would start with a sparse set of sample points and then locally subdivide the volume based on some measure of feature quality. This is basically an octree decomposition of the volume where the error measure is based on the feature variability across the octant. One measure of the feature variability is the test presented in the previous paragraph: if the ratio of features rejected to the total number of features exceeds a specified threshold, then the octant should be subdivided. This would provide an adaptive distribution of sample points throughout the model.

An open issue with the on-line pose estimation algorithm involves the weighted average used in the feature matching. Currently, if a predicted feature lies between two viable matches, the resulting error vector from the weighted average will be roughly zero. This is not really correct, because the true error vector is the vector to one of the two possible matches. What is correct is to say that the feature has no meaningful contribution to the error vector, only to the overall confidence. As the weighted average transitions from a bimodal to a unimodal distribution, the relevance of the error vector will increase. Somehow this change in distribution must be taken into account in addition to the final result. One possibility is to modify the error metric to include a weighting vector that represents the confidence in the elements of the error vector.

While there are open issues to consider, this paper has contributed a practical approach for model based pose estimation. This approach handles industrial CAD models, runs at frame rates, and requires no special hardware. The techniques developed to handle borescope and endoscope inspection are valuable to most pose estimation applications involving industrial CAD models.

Acknowledgments

This work was performed while the first author was employed at General Electric Corporate Research and Development. The second author would like to acknowledge the financial support of the National Science Foundation under grants IRI-9217195 and IRI-9408700.

References

[1] Branca A, Stella E, and Distante A, Passive Navigation using Focus of Expansion, Proc. Third WACV (IEEE Computer Society Press, 1996) 64-69.

[2] Dickmanns E D, An Integrated Approach to Feature Based Dynamic Vision, Proc. CVPR '88 (IEEE Computer Society Press, 1988) 820-825.

[3] Dickmanns E D, Mysliwetz B, and Christians T, An Integrated Spatio-Temporal Approach to Automatic Visual Guidance of Autonomous Vehicles, IEEE Transactions on Systems, Man, and Cybernetics, 20(6) (1990) 1273-1284.

[4] Dickmanns E D, and Schell F R, Autonomous Landing of Airplanes by Dynamic Machine Vision, Proc. WACV (IEEE Computer Society Press, 1992) 172-179.

[5] Gilg A, and Schmidt G, Landmark-Oriented Visual Navigation of a Mobile Robot, IEEE Transactions on Industrial Electronics, 41(4) (1994) 392-397.

[6] Harris C, Tracking with Rigid Models, in: Blake A, Yuille A, ed., Active Vision (MIT Press, Cambridge, 1992) 59-73.

[7] Khan G N, and Gillies D F, Vision based navigation system for an endoscope, Image and Vision Computing, 14 (Elsevier Science, 1996) 763-772.

[8] Kuno Y, Okamoto Y, and Okada S, Robot Vision Using a Feature Search Strategy Generated from a 3-D Object Model, IEEE PAMI, 13(10) (1991) 1085-1097.

[9] Lowe D G, Three-Dimensional Object Recognition from Single Two-Dimensional Images, Artificial Intelligence, 31 (1987) 355-395.

[10] Lowe D G, Robust Model-based Motion Tracking Through the Integration of Search and Estimation, International Journal of Computer Vision, 8(2) (Kluwer Academic Publishers, 1992) 113-122.

[11] Mortenson M E, Geometric Modelling (John Wiley & Sons, New York, 1985).

[12] Phong T Q, Horaud R, Yassine A, and Pham D T, Optimal Estimation of Object Pose from a Single Perspective View, Proc. Fourth ICCV (1993) 534-539.

[13] Puskorius G V, and Feldkamp L A, Camera Calibration Methodology Based on a Linear Perspective Transformation Error Model, Proc. IEEE International Conference on Robotics and Automation (IEEE Computer Society Press, 1988) 1858-1860.

[14] Schroeder W, Martin K, and Lorensen B, The Visualization Toolkit: An Object-Oriented Approach to 3D Graphics (Prentice-Hall, 1996).

[15] Shakunaga T, Robust Line-Based Pose Enumeration From A Single Image, Proc. Fourth ICCV (1993) 545-550.

[16] Viola P, and Wells W M III, Alignment by Maximization of Mutual Information, Proc. Fifth ICCV (1995) 16-23.


Figure 1. Feature extraction process. [Flow diagram: the CAD model and sample points are rendered by graphics hardware into images; edge detection yields edgel chains; 3D feature extraction produces the resulting features.]

Figure 2. Step function feature template. [Plot: the template steps from -1.0 to 1.0 across the edge, oriented along the gradient direction.]

Figure 3. A simple example of matching a feature to an image. [Diagram: the step function template, oriented along its gradient direction, is correlated with the image edgel; the gradient and tangent directions of the template, the feature location, and the optimal (matched) location are marked.]


Figure 6. Results from test sequences two and three. The horizontal axis is the frame number of the test run. The positional error is the probable error in centimeters for the borescope's position as determined by the singular value decomposition. The rotational error is the total probable rotational error for the three axes as determined by the singular value decomposition. The confidence is the average correlation of the projected features to the image. (Note that the graphs have different scales.)


Figure 4. Computer rendered images (partial views from inside) of two CAD models with extracted 3D feature locations overlaid as small white spheres. The top image is from an F110-A exhaust duct and the bottom is from a test phantom.

Figure 5. Two sample images from a Welch Allyn borescope placed inside two different objects. The top image is from an F110-A exhaust duct and the bottom is from a test phantom.


Figure 7. CAD model overlaid onto borescope video.