
Int J Comput Vis · DOI 10.1007/s11263-009-0207-3

A Multi-Image Shape-from-Shading Framework for Near-Lighting Perspective Endoscopes

Chenyu Wu · Srinivasa G. Narasimhan · Branislav Jaramaz

Received: 10 February 2008 / Accepted: 6 January 2009
© Springer Science+Business Media, LLC 2009

Abstract This article formulates a near-lighting shape-from-shading problem with a pinhole camera (perspective projection) and presents a solution for reconstructing the Lambertian surface of bones from a sequence of overlapping endoscopic images, each containing only partial boundaries. First, we extend the shape-from-shading problem to deal with perspective projection and near point light sources that are not co-located with the camera center. Second, we propose a multi-image framework that aligns the partial shapes obtained from different images in the world coordinates by tracking the endoscope. An iterative closest point (ICP) algorithm is used to improve the matching and to recover the complete occluding boundaries of the bone. Finally, a complete and consistent shape is obtained by simultaneously re-growing the surface normals and depths in all views. To support the shape-from-shading algorithm, we also calibrate both the geometry and the photometry of an oblique-viewing endoscope, neither of which has been well addressed in the previous literature. We demonstrate the accuracy of our technique using simulations and experiments with artificial bones.

Keywords Multi-image · Shape-from-shading · Near-lighting · Perspective projection · Calibration · Endoscope · Bone

C. Wu (✉) · S.G. Narasimhan · B. Jaramaz
Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA
e-mail: [email protected]

S.G. Narasimhan
e-mail: [email protected]

B. Jaramaz
Institute of Computer Assisted Surgery, 3380 Boulevard of the Allies, Suite 270, Pittsburgh, PA 15213, USA
e-mail: [email protected]

1 Introduction

One of the main goals of orthopedic surgery is to enable minimally invasive procedures. As a key tool, the endoscope is attracting increasing attention for its potential role in computer-aided surgeries. Besides visualizing the interior of the anatomy, its capability can be significantly enhanced by tracking the endoscope in conjunction with surgical navigation systems.

Figure 1 shows an endoscope illuminating and observing an artificial spine. The endoscope uses one or more point light sources placed at its tip for illumination. Due to the small distance between the sources, the camera and the scene, endoscopic images are very different from the images we experience of natural scenes under distant lighting such as the sun or the sky. Since the field of view of the endoscope is small and the bone is usually only a few millimeters away, we observe only a small part of the bone. As a result, it can be difficult even for a skilled surgeon to infer bone shape from a single endoscopic image. Better visualizations are achieved by overlaying endoscopic images with 3D surfaces obtained from CT (Computed Tomography) scans (Clarkson et al. 1999; Dey et al. 2002). However, this technique requires solving a complex registration problem between the CT and endoscopic images during the surgery. Thus, there is an immediate need for explicit computer reconstruction of bone shapes from endoscopic images.

Since bone surfaces have few identifiable features, surface shading is the primary cue for shape. Shape-from-shading has a long and rich history in computer vision and biological perception (Zhang et al. 1999; Durou et al. 2008), with most of the work focusing on distant lighting, orthographic projection and Lambertian surfaces (Horn and Brooks 1989; Kozera 1998; Zhang et al. 1999).


Fig. 1 Illustration of an endoscope (a) and endoscopic images of an artificial spine (b, c, d). (It is difficult to perceive the shape of the object in the images because of the small field of view, distortion, lack of texture, partial boundaries, and shading caused by near-field lighting.)

Few papers have considered more realistic scenarios. Real materials can be non-Lambertian (Lee and Kuo 1997). With a pinhole camera, the projection is perspective instead of orthographic (Penna 1989; Lee and Kuo 1994; Hasegawa and Tozzi 1996; Okatani and Deguchi 1997; Samaras and Metaxas 1999; Tankus et al. 2004). For special imaging devices such as endoscopes, the light sources are not located at infinity (Okatani and Deguchi 1997; Prados and Faugeras 2005), so the 1/r^2 attenuation of the illumination needs to be taken into account (Prados and Faugeras 2005). Three papers have established the equations for perspective projection (Prados and Faugeras 2003; Tankus et al. 2003; Courteille et al. 2004). A variety of mathematical tools have been utilized to solve the perspective shape-from-shading (PSFS) problem (Kimmel and Sethian 2001; Courteille et al. 2004; Tankus et al. 2004, 2005; Prados and Faugeras 2005). Most relevant to this work, shape-from-shading under a near point light source and perspective projection for an endoscope has been addressed by Okatani and Deguchi (1997), Forster and Tozzi (2000), Mourgues et al. (2001) and Tankus et al. (2004). But all of these works assume that the light source and the camera center are co-located. This assumption is inaccurate in our setting, since the scene is 5–10 mm away from the endoscope and the separation between the source and the camera (3.5 mm) is of the same magnitude as the distance to the scene. In this paper we consider perspective projection and near point lighting without assuming that the light source is located at the center of projection. Other techniques such as photometric stereo can also estimate surface orientation (Woodham 1980; Kozera 1991, 1992), but they require switching between two light sources, which is hard to achieve with a conventional endoscope since its light sources are designed to be switched on or off together. Structured light has also been used to recover bone shape (Fuchs et al. 1998; Iyengar et al. 2001), but such systems require specialized setups that are expensive and cumbersome to use in the operating room.

Due to the small field of view of the endoscope, only a partial shape can be obtained from a single image. By capturing image sequences as the endoscope is moved, it is possible to cover a larger part of the shape that can be more easily perceived (Seshamani et al. 2006). Once again, there is a long and rich history of shape-from-motion in vision (both calibrated and uncalibrated) (Poelman and Kanade 1997; Pollefeys et al. 1999; Hartley and Zisserman 2004). In the context of anatomical reconstruction, the heart coronary (Mourgues et al. 2001) and a tissue phantom (Stoyanov et al. 2005) have been reconstructed using motion cues (image features and correspondences). However, it is difficult to establish correspondences for textureless (or featureless) bones.

Neither shape-from-shading nor shape-from-motion can individually solve the problem of bone reconstruction from endoscopic images. The key idea in this work is to utilize the strengths of both to develop a global shape-from-shading approach using multiple partial views. There are two contributions in this paper.

1. We establish the reflectance function R for perspective projection and near point light sources, without assuming that the light sources are located at the center of projection. We solve an image irradiance equation based on this reflectance function. In our near-lighting perspective SFS problem, we explicitly include the depth z in the reflectance function R, in the image irradiance equation, and later as an unknown in the minimization. In order to compute the reflectance map, we also calibrate the photometry of the endoscope in advance, including the camera response function, the light source intensity and the spatial distribution of the sources.

2. We present a multi-image shape-from-shading framework. The global boundary constraints are obtained by aligning the individual contours in the world coordinates, which is possible because the endoscope is tracked while the image sequence is captured. As a clarification, note that our global shape-from-shading framework does not merge shape-from-shading and shape-from-motion in the classical sense of those terms. We perform shape-from-shading for all images simultaneously and use the global boundary constraints in each iteration. In order to align the individual shapes, we also calibrate the geometric parameters of the endoscope.

Fig. 2 Perspective projection model for the endoscope imaging system with two near point light sources: \vec{O} is the camera projection center, and \vec{s}_1 and \vec{s}_2 are the two light sources. We assume the plane containing \vec{O}, \vec{s}_1 and \vec{s}_2 is parallel to the image plane. The camera coordinate system (X–Y–Z) is centered at \vec{O}; the Z-axis is parallel to the optical axis and points toward the image plane, and the X- and Y-axes are parallel to the image plane. F is the focal length, and a and b are two parameters related to the position of the light sources. Given a scene point \vec{P} = (x_c, y_c, z) in camera coordinates, the projected image pixel is \vec{p} = (x, y, F), where (x, y) are image coordinates. Assuming a Lambertian surface, the surface illumination depends on the surface albedo, the light source intensity and fall-off, and the angle between the normal and the light rays

2 Shape-from-Shading under Near Point Sources and Perspective Projection

Given the special design of endoscopes, we formulate shape-from-shading under near point lighting and perspective projection, where the light sources are not located at the projection center. As Fig. 2 shows, given two point light sources \vec{s}_1 and \vec{s}_2, we can compute the scene radiance emitted by the surface point \vec{P} according to the Lambertian cosine law and the inverse-square distance fall-off law of isotropic point sources (Horn and Sjoberg 1979):

R = I_0 \rho \left( \frac{\cos\theta_1}{r_1^2} + \frac{\cos\theta_2}{r_2^2} \right)    (1)

where I_0 is the light intensity of \vec{s}_1 and \vec{s}_2 and \rho is the surface albedo. We use the unit vector \vec{n} to represent the surface normal at \vec{P}; \vec{l}_1 and \vec{l}_2 are the two light rays incident at \vec{P}, and r_1 and r_2 are the distances from each light source to the surface:

\cos\theta_1 = \frac{\vec{n} \cdot \vec{l}_1}{\|\vec{l}_1\|}, \qquad \vec{l}_1 = \vec{s}_1 - \vec{P}, \qquad r_1 = \|\vec{l}_1\|

\cos\theta_2 = \frac{\vec{n} \cdot \vec{l}_2}{\|\vec{l}_2\|}, \qquad \vec{l}_2 = \vec{s}_2 - \vec{P}, \qquad r_2 = \|\vec{l}_2\|    (2)

According to Horn and Brooks (1989), the surface normal \vec{n} can be represented in terms of the partial derivatives of the depth z with respect to the camera coordinates, where (x_c, y_c, z) denote the camera coordinates of the scene point \vec{P}:

\vec{n} = \left[ -\frac{\partial z}{\partial x_c}, \; -\frac{\partial z}{\partial y_c}, \; 1 \right] \bigg/ \sqrt{ \left(\frac{\partial z}{\partial x_c}\right)^2 + \left(\frac{\partial z}{\partial y_c}\right)^2 + 1 }    (3)

Note that for orthographic projection, \partial z/\partial x and \partial z/\partial y are used to represent the surface normal, where (x, y) are image coordinates. Under perspective projection we have

x_c = \frac{x z}{F}, \qquad y_c = \frac{y z}{F}    (4)

where F is the focal length and z < 0. Taking the derivative of both sides of (4) with respect to x_c and y_c gives

\frac{\partial z}{\partial x_c} = \frac{1}{x}\left( F - z\,\frac{\partial x}{\partial x_c} \right), \qquad \frac{\partial z}{\partial y_c} = \frac{1}{y}\left( F - z\,\frac{\partial y}{\partial y_c} \right)    (5)

Taking the derivative of both sides of (4) with respect to x and y gives

\frac{\partial x_c}{\partial x} = \frac{1}{F}\left( z + x\,\frac{\partial z}{\partial x} \right), \qquad \frac{\partial y_c}{\partial y} = \frac{1}{F}\left( z + y\,\frac{\partial z}{\partial y} \right)    (6)

Let p = \partial z/\partial x and q = \partial z/\partial y. From (5) and (6) we obtain

\frac{\partial z}{\partial x_c} = \frac{F p}{z + x p}, \qquad \frac{\partial z}{\partial y_c} = \frac{F q}{z + y q}    (7)

Given the two light sources \vec{s}_1 = [-a, b, 0] and \vec{s}_2 = [a, b, 0] (the calibration of a, b and F will be discussed later in Sect. 3.1), we can write the light-ray vectors explicitly as:

\vec{l}_1 = \left[ -a - \frac{x z}{F}, \; b - \frac{y z}{F}, \; -z \right]

\vec{l}_2 = \left[ a - \frac{x z}{F}, \; b - \frac{y z}{F}, \; -z \right]    (8)

Combining (1), (2), (3), (7) and (8), we obtain the reflectance map R as a function of x, y, z, p and q:

R(x, y, z, p, q) = I_0 \rho \left( \frac{\vec{n}(x, y, z, p, q) \cdot \vec{l}_1(x, y, z)}{r_1(x, y, z)^3} + \frac{\vec{n}(x, y, z, p, q) \cdot \vec{l}_2(x, y, z)}{r_2(x, y, z)^3} \right)

= I_0 \rho \left( \frac{ p(z + yq)\left(a + \frac{xz}{F}\right) - q(z + xp)\left(b - \frac{yz}{F}\right) - \frac{z}{F}(z + xp)(z + yq) }{ \sqrt{ p^2(z + yq)^2 + q^2(z + xp)^2 + \frac{(z + xp)^2 (z + yq)^2}{F^2} } \cdot \left( \sqrt{ \left(-a - \frac{xz}{F}\right)^2 + \left(b - \frac{yz}{F}\right)^2 + z^2 } \right)^3 } \right.

\left. \quad + \frac{ -p(z + yq)\left(a - \frac{xz}{F}\right) - q(z + xp)\left(b - \frac{yz}{F}\right) - \frac{z}{F}(z + xp)(z + yq) }{ \sqrt{ p^2(z + yq)^2 + q^2(z + xp)^2 + \frac{(z + xp)^2 (z + yq)^2}{F^2} } \cdot \left( \sqrt{ \left(a - \frac{xz}{F}\right)^2 + \left(b - \frac{yz}{F}\right)^2 + z^2 } \right)^3 } \right)    (9)

Equation (9) is similar to (2) in Tankus et al. (2004) and (2) in Prados and Faugeras (2003), but we consider the 1/r^2 fall-off since the light sources are very close to the bone surface. Equation (9) can easily be extended to multiple point light sources as follows (M is the number of sources):

R(x, y, z, p, q) = I_0 \rho \sum_{i}^{M} \frac{\vec{n}(x, y, z, p, q) \cdot \vec{l}_i(x, y, z)}{r_i(x, y, z)^3}    (10)
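To make the evaluation of (9)–(10) concrete, the following sketch computes the reflectance map on the pixel grid (Python/NumPy for illustration only; our actual implementation is in Matlab, and the function and variable names here are our own illustrative choices):

import numpy as np

def reflectance_map(x, y, z, p, q, sources, I0_rho, F):
    """Near-lighting perspective reflectance map, cf. (9)-(10).

    x, y    : image coordinates of each pixel
    z       : current depth estimate (negative, following the convention z < 0)
    p, q    : derivatives of z with respect to the image coordinates
    sources : list of 3D source positions, e.g. [(-a, b, 0), (a, b, 0)]
    I0_rho  : product of source intensity I0 and albedo rho
    F       : focal length
    """
    # Scene point in camera coordinates (eq. 4) and the unnormalized normal:
    # using (3) and (7), n ~ [-Fp/(z+xp), -Fq/(z+yq), 1].
    xc, yc = x * z / F, y * z / F
    n = np.stack([-F * p / (z + x * p),
                  -F * q / (z + y * q),
                  np.ones_like(z)], axis=-1)
    n /= np.linalg.norm(n, axis=-1, keepdims=True)

    R = np.zeros_like(z, dtype=float)
    for s in sources:
        # Light ray from the surface point to the source (eq. 8) and fall-off.
        l = np.stack([s[0] - xc, s[1] - yc, s[2] - z], axis=-1)
        r = np.linalg.norm(l, axis=-1)
        R += I0_rho * np.einsum('...k,...k->...', n, l) / r**3
    return R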

2.1 Solving the Image Irradiance Equation

Given the image irradiance function I(x, y), the image irradiance equation (Horn and Brooks 1989) is

R(x, y, z, p, q) = I(x, y)    (11)

According to Durou et al. (2008), different mathematical methods have been proposed to solve this equation for PSFS, including methods based on the resolution of PDEs (Tankus et al. 2004, 2005; Kimmel and Sethian 2001; Prados and Faugeras 2005) and methods using optimization (Courteille et al. 2004). The optimization methods based on the variational approach work in the most general cases of PSFS and do not require setting values at the singular and local minimum points (Durou et al. 2008), so we adopt the optimization approach and plug it into the multi-image framework described later in Sect. 3. Specifically, we solve (11) by minimizing the error between the image irradiance I(x, y) and the reflectance map R(x, y, z, p, q) given in (9). The image irradiance is obtained from the image intensity using the camera response function (see details in Sect. 2.2). Unlike previous optimization methods (Ikeuchi and Horn 1981), the depth z is explicitly included in R. The error can be computed as:

e(z, p, q) = \lambda e_i(z, p, q) + (1 - \lambda) e_s(z, p, q)    (12)

with

e_i(z, p, q) = \iint_{\text{image}} [I(x, y) - R(x, y, z, p, q)]^2 \, dx \, dy

e_s(z, p, q) = \iint_{\text{image}} \big[ (z_x^2 + z_y^2) + (p_x^2 + p_y^2) + (q_x^2 + q_y^2) \big] \, dx \, dy    (13)

where e_i(z, p, q) is the irradiance error, e_s(z, p, q) is the smoothness constraint on z, p and q, and \lambda is a Lagrange multiplier. We find a solution for (z, p, q) by minimizing the error e(z, p, q) (Leclerc and Bobick 1991):

[z^*, p^*, q^*] = \arg\min_{z, p, q} \big( \lambda e_i + (1 - \lambda) e_s \big)    (14)

Similarly to Horn and Brooks (1986), we discretize (12) and obtain:

e(z_{k,l}, p_{k,l}, q_{k,l}) = \sum_{k} \sum_{l} \big( \lambda e_{i_{k,l}} + (1 - \lambda) e_{s_{k,l}} \big)    (15)

The solution to (15) is the triple [p^*_{k,l}, q^*_{k,l}, z^*_{k,l}] that minimizes e:

\frac{\partial e}{\partial p_{k,l}}\Big|_{p^*_{k,l}} = 0, \qquad \frac{\partial e}{\partial q_{k,l}}\Big|_{q^*_{k,l}} = 0, \qquad \frac{\partial e}{\partial z_{k,l}}\Big|_{z^*_{k,l}} = 0    (16)

Combining (15) and (16) yields the update functions for p_{k,l}, q_{k,l} and z_{k,l} in each iteration (Horn and Brooks 1986):

p^{n+1}_{k,l} = \bar{p}^{n}_{k,l} + \frac{\lambda}{4(1-\lambda)} \big[ I_{k,l} - R(k, l, \bar{z}^{n}_{k,l}, \bar{p}^{n}_{k,l}, \bar{q}^{n}_{k,l}) \big] \frac{\partial R}{\partial p_{k,l}}\Big|_{\bar{p}^{n}_{k,l}}

q^{n+1}_{k,l} = \bar{q}^{n}_{k,l} + \frac{\lambda}{4(1-\lambda)} \big[ I_{k,l} - R(k, l, \bar{z}^{n}_{k,l}, \bar{p}^{n}_{k,l}, \bar{q}^{n}_{k,l}) \big] \frac{\partial R}{\partial q_{k,l}}\Big|_{\bar{q}^{n}_{k,l}}

z^{n+1}_{k,l} = \bar{z}^{n}_{k,l} + \frac{\lambda}{4(1-\lambda)} \big[ I_{k,l} - R(k, l, \bar{z}^{n}_{k,l}, \bar{p}^{n}_{k,l}, \bar{q}^{n}_{k,l}) \big] \frac{\partial R}{\partial z_{k,l}}\Big|_{\bar{z}^{n}_{k,l}}    (17)

Here \bar{p}^{n}_{k,l}, \bar{q}^{n}_{k,l} and \bar{z}^{n}_{k,l} are local 8-neighborhood means of the values around pixel position (k, l); a detailed derivation is given in Appendix A. The value for the (n+1)th iteration is estimated from the nth iteration. During the iterations, the Lagrange multiplier \lambda is gradually increased so that the influence of the smoothness constraint is correspondingly reduced. In our experiments the setting of the Lagrange multiplier is empirical: we set it to 0.005 at the beginning and increase it by 0.02 whenever the error in (15) is reduced by 1%.

Given initial values of p, q and z on the boundary, we can compute a numerical solution for the shape. Variational methods usually rely on good initial guesses, so we manually label the boundaries in each image. After that, z is set to 0 for every pixel in the image, and p and q are set to 0 for pixels that are not on the occluding boundaries. The values of p and q at each pixel on the occluding boundaries are computed from the cross product of the viewing vector (starting from the optical center) and the edge vector. Since it is difficult to estimate the initial value of z, we recompute the z values by integrating the normals using the method of Frankot and Chellappa (1988) after several iterations, whenever the error in (15) has been reduced by 10%. We keep performing this adjustment until the algorithm converges. Results for real endoscopic images are shown in Fig. 3. It remains challenging to recover the entire shape of the spine from the individual reconstructions due to the small field of view.
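A minimal sketch of one sweep of the update (17), assuming a callable reflectance map R(x, y, z, p, q) such as the one in Sect. 2 and using numerical derivatives of R (Python/NumPy for illustration; the names are our own and boundary handling is simplified):

import numpy as np

def neighbor_mean(u):
    """8-neighborhood weighted mean of (A.4): weight 1/5 for the 4-neighbors
    and 1/20 for the diagonal neighbors (edges handled by replication)."""
    up = np.pad(u, 1, mode='edge')
    four = up[:-2, 1:-1] + up[2:, 1:-1] + up[1:-1, :-2] + up[1:-1, 2:]
    diag = up[:-2, :-2] + up[:-2, 2:] + up[2:, :-2] + up[2:, 2:]
    return four / 5.0 + diag / 20.0

def sfs_sweep(I, x, y, z, p, q, lam, R, eps=1e-4):
    """One iteration of (17); R is a callable R(x, y, z, p, q)."""
    zb, pb, qb = neighbor_mean(z), neighbor_mean(p), neighbor_mean(q)
    res = I - R(x, y, zb, pb, qb)              # irradiance residual
    gain = lam / (4.0 * (1.0 - lam))
    # Numerical partial derivatives of R evaluated at the neighborhood means.
    dRdp = (R(x, y, zb, pb + eps, qb) - R(x, y, zb, pb - eps, qb)) / (2 * eps)
    dRdq = (R(x, y, zb, pb, qb + eps) - R(x, y, zb, pb, qb - eps)) / (2 * eps)
    dRdz = (R(x, y, zb + eps, pb, qb) - R(x, y, zb - eps, pb, qb)) / (2 * eps)
    return zb + gain * res * dRdz, pb + gain * res * dRdp, qb + gain * res * dRdq

After each sweep, the boundary pixels are reset to their initial (p, q, z), λ is increased on the schedule described above, and the depths are periodically re-integrated from the normals as described in the text.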

2.2 Photometric Calibration of the Endoscope for Image Irradiance

In order to compute the image irradiance I(x, y), we need to calibrate the photometry of the endoscope, which is seldom addressed in the literature. Inspired by the work of Litvinov and Schechner (2005), we propose a method to compute the radiometric response function of the endoscope and, at the same time, to calibrate the directional/spatial intensity distribution of the near light sources (the camera and the sources are switched on and off together) and the light intensities.

The image irradiance I(x, y) is related to the image intensity (gray level) v via the camera response function H(\cdot):

I(x, y) = \frac{H^{-1}[v(x, y)]}{M(x, y)}    (18)

where M(x, y) represents the anisotropy of the source intensity. The two sources are identical and oriented in the same way, so we assume their intensity distributions M(x, y) are the same. From (1), (2), (11) and (18) we have:

H^{-1}[v(x, y)] = \rho \cdot I_0 \cdot \tilde{M}(x, y), \quad \text{where} \quad \tilde{M}(x, y) = M(x, y) \cdot \left( \frac{\vec{n} \cdot \vec{l}_1}{r_1^3} + \frac{\vec{n} \cdot \vec{l}_2}{r_2^3} \right)    (19)

For calibration, we use a Macbeth color chart with a known albedo for each patch. We capture a set of images by varying the source intensity for each patch (the exposure time of the camera cannot be controlled in our endoscope). We apply the logarithm to both sides of (19) to obtain a linear system:

h[v^{j}_{i}(x, y)] = \eta_i + \gamma_j + m(x, y)    (20)

where i indicates the surface albedo and j indexes the light intensity, h[v^{j}_{i}(x, y)] = \log\{H^{-1}[v^{j}_{i}(x, y)]\}, \eta_i = \log(\rho_i), \gamma_j = \log(I_{0_j}) and m(x, y) = \log[\tilde{M}(x, y)]. The unknowns (h(\cdot), \gamma_j, m(x, y)) can be estimated by solving this linear system of equations. The cosine term is then estimated by physically measuring the distance from the scope tip to the chart, and finally M(x, y) is recovered. Details of the solutions for h(\cdot), \gamma_j and m(x, y) are given in Appendix B.
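Once h(·), γ_j and m(x, y) have been estimated, converting a captured frame into the irradiance used in (11) amounts to applying (18). A minimal sketch, assuming the calibration is stored as a 256-entry lookup table for h(·) and an image-sized array for M(x, y):

import numpy as np

def intensity_to_irradiance(v, h_lut, M):
    """Apply (18): I(x, y) = H^{-1}[v(x, y)] / M(x, y).

    v     : image intensities (integers in 0..255), one channel
    h_lut : calibrated table with h_lut[g] = h(g) = log(H^{-1}(g))
    M     : calibrated spatial intensity distribution of the sources
    """
    Hinv = np.exp(h_lut[v.astype(int)])   # H^{-1}[v], per pixel
    Hinv[v == 0] = 0.0                    # gray level 0 is mapped to zero irradiance
    return Hinv / M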

A series of images of the color chart is used for photometric calibration. We use 6 different levels of light intensity (see Appendix B). Figure 11(a), (b) and (c) show the camera response functions for the Red, Green and Blue channels. Figure 11(d) shows the recovered light intensities at the different levels and a comparison to the ground truth; smaller numbers on the x-axis correspond to higher intensities. We observe a small amount of variation when the light intensity is high, which can be caused by saturation. Figure 11(e) shows the original image, Fig. 11(f) shows \tilde{M}(x, y), Fig. 11(g) shows the cosine term \vec{n}\cdot\vec{l}_1/r_1^3 + \vec{n}\cdot\vec{l}_2/r_2^3, and Fig. 11(h) shows the spatial distribution function M(x, y) = \tilde{M}(x, y) / (\vec{n}\cdot\vec{l}_1/r_1^3 + \vec{n}\cdot\vec{l}_2/r_2^3).

Fig. 3 Results of shape-from-shading from a single image. (a) Input image. (b) Shape from shading. (1)–(5) are captured from different positions

3 Global Shape-from-Shading Using Multiple Partial Views

As shown in Fig. 3, most images include only a small fraction of the visible contours. How do we estimate the complete shape reliably? An intuitive solution is to merge the individual shapes recovered from different views, as shown in Fig. 4.

However, each image is defined in view-dependent local coordinates. We must first transform the individual shapes to the world coordinates before combining them. We solve this problem by using a navigation tracking system: we attach optical markers to the endoscope (as shown in Fig. 10) and track them to obtain the transformation between the local and world coordinates for each image.

Fig. 4 Simulation results of shape-from-shading from multiple views. (a)–(d) Synthesized images of different parts of a sphere. (e) Reconstructed sphere

Consider a 3D point set [S^i]_{3 \times N_i} reconstructed from image i, where N_i is the number of points in shape S^i. If all the shapes are reconstructed correctly, and there are no errors from the tracking system and the calibration process, they should be perfectly aligned in the world coordinates. Let {}^{w}T_{m_i} denote the transformation from the marker at the position of the ith image to the world coordinates, and let M denote the transformation from the marker coordinates to the camera coordinates (image distortion has been corrected in advance). Thus, any 3D point set S^i in the local camera coordinates can be transformed to the world coordinates S^i_w by:

S^i_w = {}^{w}T_{m_i} \cdot [M^{-1} \cdot S^i; \; \mathbf{1}_{1 \times N_i}]    (21)

where S^i_w is represented using homogeneous coordinates in the world reference frame. As shown in Fig. 5, each shape obtained from a single image is transformed to the world coordinates. However, due to the tracking error (less than 0.3 mm) and the calibration error (less than 6 pixels), the S^i_w are not initially well matched (e.g. Fig. 5(c) and (d)). Since the bone is rigid, we use ICP (Besl and McKay 1992) to improve the matching between different views by ensuring overlapping fields of view in adjacent images. Then, for S^i_w, ICP yields an aligned shape {}^{i-1}S^i_w with respect to the adjacent shape S^{i-1}_w:

{}^{i-1}S^i_w = {}^{i-1}T_i \cdot S^i_w    (22)

where {}^{i-1}T_i is the rigid transformation computed by ICP. Using ICP, we can align all the shapes with respect to the shape in the reference view, i.e. the shape S^1_w in the first image. Then, by combining (21) and (22), we have

{}^{1}S^i_w = \prod_{j=1}^{i-1} {}^{j}T_{j+1} \cdot {}^{w}T_{m_i} \cdot [M^{-1} \cdot S^i; \; \mathbf{1}_{1 \times N_i}]    (23)

where {}^{1}S^i_w is the aligned shape of S^i_w with respect to the reference shape S^1_w, as shown in Fig. 6(a) and (b).

Once the contours are aligned, we "re-grow" the shape in the local views but under the global constraints. We compute the average values in (17) using the neighbors in the world coordinates. The key idea is to keep the global constraints updated according to the shape developed in the local views, so that the SFS in each local view is guided towards convergence under the global constraints.
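The alignment of (21)–(23), which produces these global constraints, can be sketched as follows (Python/NumPy with 4 × 4 homogeneous transforms; icp_rigid stands for any rigid ICP routine in the spirit of Besl and McKay (1992) and is assumed, not shown):

import numpy as np

def to_world(S, T_w_m, M_c_m):
    """Eq. (21): map a 3xN point set S from camera to world coordinates.
    T_w_m : 4x4 marker-to-world transform from the tracker
    M_c_m : 4x4 marker-to-camera transform (its inverse maps camera -> marker)"""
    S_h = np.vstack([S, np.ones((1, S.shape[1]))])        # homogeneous coordinates
    return T_w_m @ np.linalg.inv(M_c_m) @ S_h

def align_all(shapes, T_w_m_list, M_c_m, icp_rigid):
    """Eqs. (22)-(23): chain pairwise ICP registrations back to the first view."""
    world = [to_world(S, T, M_c_m) for S, T in zip(shapes, T_w_m_list)]
    aligned = [world[0]]
    T_chain = np.eye(4)                                    # product of the {}^{j}T_{j+1}
    for i in range(1, len(world)):
        T_i = icp_rigid(src=world[i][:3].T, dst=world[i - 1][:3].T)   # {}^{i-1}T_i (4x4)
        T_chain = T_chain @ T_i
        aligned.append(T_chain @ world[i])                 # {}^{1}S^i_w
    return aligned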

Table 1 Global shape-from-shading: key steps

* Loop k = 1 : number of images
    (1) Compute SFS from the single image using (17)
    (2) Track the endoscope motion
    (3) Transform the local view to the world coordinates using (21)
    (4) Register the shape to the previous shape with ICP using (22)
  End loop
* Compute the global boundaries using (23)
* Compute the global constraints (z, p, q) in the world coordinates
* Loop until convergence
    (1) Loop k = 1 : number of images
        (a) Update (z, p, q) in the local view according to the global constraints
        (b) Compute new (z, p, q) using (17)
      End loop
    (2) Update (z, p, q) in the world coordinates
  End loop

We restart the SFS in the local coordinates simultaneously for all images. After initializing (z, p, q) in each image, we initialize the values on the corresponding global boundaries. If two or more points from different views map to the same point in the world coordinates, (z, p, q) at that point are computed as the average. In each iteration, we grow each image and update the global constraints. Compared with the constraints for single-image shape-from-shading (see Fig. 5(a)), we obtain more information from the global alignment (see Fig. 6(c)). At the end of each iteration, we update the local (z, p, q) according to the global constraints. We proceed in this manner until convergence. With the global constraints, our algorithm converges quickly, though not in real time (around 5 minutes for 18 images in Matlab on a P4 2.4 GHz CPU). Table 1 lists the key steps of global shape-from-shading.
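One simple way to realize this averaging, which the paper does not spell out, is to quantize the world coordinates into small cells and average (z, p, q) per cell; the voxel size and the dictionary-based accumulation below are our own illustrative choices:

import numpy as np
from collections import defaultdict

def average_constraints(points_w, zpq, voxel=0.5):
    """Average (z, p, q) over world points that fall into the same cell.

    points_w : (N, 3) world coordinates gathered from all views
    zpq      : (N, 3) corresponding (z, p, q) values
    voxel    : cell size (in mm) deciding when two points coincide
    """
    acc = defaultdict(lambda: [np.zeros(3), 0])
    for pt, val in zip(points_w, zpq):
        key = tuple(np.floor(pt / voxel).astype(int))
        acc[key][0] += val
        acc[key][1] += 1
    return {k: s / n for k, (s, n) in acc.items()}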

For comparison, we choose only the points that are on the surface of the spine. We use a laser range scanner to obtain a ground-truth shape of the spine.

The maximum, minimum, mean and RMS errors are then 2.8 mm, 0.0 mm, 1.24 mm and 1.45 mm, respectively. This level of accuracy is practical for surgery. More views of the reconstructed surface against the ground truth are shown in Fig. 7. We also compare our result with the result obtained under the orthographic projection assumption in Fig. 8. Compared with the real shape captured by a regular camera, the shape recovered under orthographic projection is over-grown near the boundary.

Fig. 5 Illustration of the problem with directly merging the individual shapes in the world coordinates. 18 images are captured by moving the endoscope horizontally (translation only); four of them are shown as an illustration. (a) After removing the distortion and illumination effects, the boundaries in each image are labelled by hand, and the initial (p, q) are computed automatically on the boundaries. (b) The shape from each single image is reconstructed using the method described in Sect. 2. (c) Unaligned shapes in the world coordinates. (d) Unaligned 3D contours in the world coordinates

3.1 Geometric Calibration of the Endoscope for Aligning Multiple Views

In order to align the individual shapes, we need to know the camera parameters. Oblique-viewing endoscopes (oblique scopes) are widely used in computer-assisted surgeries. The viewing direction of an oblique scope can be changed by rotating the scope cylinder, which extends the field of view but makes the geometric calibration process more difficult. An oblique endoscope is illustrated in Fig. 9.

Fig. 6 Illustration of the multi-image shape-from-shading. (a) Aligned shapes in the world coordinates. (b) Aligned 3D contours in the world coordinates. (c) Projection of the global constraints onto each image. (d) Final shape reconstructed using the method described in Sect. 3

Unlike regular cameras with a forward viewing direction, most current endoscopes are designed for oblique viewing, with a tilt from the scope axis at the tip (see Fig. 9(a)). The viewing direction can be changed by simply rotating the scope cylinder without having to move it. To increase the flexibility of imaging in terms of focus and resolution, the lens system (in the cylinder) and the CCD plane (in the camera head) can be rotated with respect to one another (see Figs. 9 and 10(a)). While these properties are desirable, they make the geometric calibration harder than for regular cameras.

Fig. 7 (Color online) Different views of the reconstructed surface (yellow surface) against the ground truth (red points). (a) View from the top, (b) view from the bottom, (c) view from the left side, (d) view from the right side

Fig. 8 (a) Real shape captured by a regular camera. (b) Reconstructed shape under orthographic projection. (c) Reconstructed shape under perspective projection

Yamaguchi et al. first modelled and calibrated the oblique scope (Yamaguchi et al. 2004). They formulated the rotation parameter of the scope cylinder as an additional external parameter in Tsai's camera model (Tsai 1987; Zhang 1998). They used two extra transformations to compensate for the rotation θ of the lens system while the camera head remains still. Yamaguchi et al.'s camera model successfully compensates for the rotation effect, but their method requires five additional parameters and the model is complicated.

Yamaguchi et al.'s camera model is based on Tsai's model:

\lambda \vec{\mu}_i = A \, {}^{c}T_{m}(\theta) \, {}^{m}T_{w} \, \vec{P}_w

{}^{c}T_{m}(\theta) = T_R(-\theta; \vec{l}_h(\theta)) \, T_R(\theta; \vec{l}_s) \, {}^{c}T_{m}(0)    (24)

where \lambda is an arbitrary scale factor, \vec{P}_w is a 3D point in the world coordinates and \vec{\mu}_i is the corresponding 2D image pixel. A is the camera intrinsic matrix, {}^{m}T_{w} is the rigid transformation from the world coordinates to the optical marker coordinates, and {}^{c}T_{m}(\theta) is the rigid transformation from the marker (camera head) to the camera coordinates; {}^{c}T_{m}(\theta) depends on the rotation angle \theta. By considering the marker (camera head) coordinates as a reference, only the lens system rotates while the camera head, i.e. the image plane, remains fixed irrespective of the rotation. They describe this transformation by decomposing the one physical rotation into two mathematical rotations: T_R(\theta; \vec{l}_s) is a rotation of both the scope cylinder and the camera head (image plane) around the axis of the cylinder \vec{l}_s, and T_R(-\theta; \vec{l}_h(\theta)) is an inverse rotation of the image plane around the z-axis of the lens system \vec{l}_h. Both \vec{l}_s and \vec{l}_h have two unknown parameters. Although this model works well, it is very complicated.

As Fig. 10(b) shows, in our work we attach an optical marker to the scope cylinder instead. Our model is still an extension of Tsai's model, but much simpler. The geometric model illustrated in Fig. 10(a) can be written as:

\lambda \vec{\mu}'_i = A \cdot {}^{c}T_{m} \cdot {}^{m}T_{w} \cdot \vec{P}_w

\vec{\mu}_i = R(\theta) \cdot (\vec{\mu}'_i - c_c) + c_c    (25)

where \vec{P}_w is a 3D point in the world coordinates, \vec{\mu}'_i is the corresponding 2D image pixel without any rotation, and \vec{\mu}_i is the image pixel with rotation \theta. {}^{m}T_{w} is the rigid transformation from the world coordinates to the optical marker coordinates (scope cylinder). {}^{c}T_{m} is the rigid transformation from the marker to the camera coordinates and is independent of \theta. c_c is the principal point, which is an intrinsic parameter, and R(\theta) represents a rotation of the image plane around c_c by \theta. The camera intrinsic matrix A and the external matrix {}^{c}T_{m} can be calibrated using Zhang's (1998) method, and {}^{m}T_{w} can be obtained directly from the tracking system. In our model we only need to estimate the rotation angle.

Fig. 9 An oblique endoscope consists of a scope cylinder with a lens and two point light sources at the tip (the tip has a tilt from the scope cylinder) and a camera head that captures video. The scope cylinder is connected flexibly to the camera head via a coupler. The scope cylinder and the camera head can be rotated together or separately

Fig. 10 The geometric model of the endoscope based on a tracking system. A new coupler (see Fig. 9(b)) is designed to mount an optical marker on the scope cylinder, which ensures that the transformation from the scope cylinder (marker) coordinates O2 to the lens system (camera) coordinates O3 is fixed. The world coordinates O1 are defined by the optical tracker. Two optical markers are attached to the coupler and the camera head separately in order to compute the rotation θ between them
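A direct transcription of the projection model (25) (Python/NumPy; the transform and variable names are ours, and the tracker is assumed to provide 4 × 4 homogeneous matrices):

import numpy as np

def project_oblique(P_w, A, T_c_m, T_m_w, theta, cc):
    """Eq. (25): project a 3D world point through the oblique-scope model.

    P_w   : 3D point in world coordinates
    A     : 3x3 camera intrinsic matrix
    T_c_m : 4x4 marker(cylinder)-to-camera transform (fixed)
    T_m_w : 4x4 world-to-marker transform from the optical tracker
    theta : rotation angle of the camera head with respect to the scope cylinder
    cc    : principal point (2-vector)
    """
    P_c = (T_c_m @ T_m_w @ np.append(P_w, 1.0))[:3]      # point in camera coordinates
    mu_h = A @ P_c
    mu_prime = mu_h[:2] / mu_h[2]                        # pixel without rotation
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return R @ (mu_prime - np.asarray(cc)) + np.asarray(cc)   # rotate about cc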

The rotation angle can be estimated using a rotary encoder, as Yamaguchi et al. (2004) suggested. When an encoder is not available, the rotation angle can be estimated using two optical markers, one attached to the scope cylinder and the other to the camera head; see the details in Appendices C and D.

We tested our algorithm using two different setups. In the first experiment, we tested the system in a controlled environment, using a Stryker 344-71 arthroscope Vista (70 degree, 4 mm) oblique-viewing endoscope, a DYONICS DyoCam 750 video camera, a DYONICS DYOBRITE 3000 light source, and a Polaris (Northern Digital Inc., Ontario, Canada) optical tracker. In the second experiment we tested it in the operating room, using a Smith & Nephew video arthroscope, autoclavable SN-OH 272589 (30 degree, 4 mm), a DYONICS video camera and light source, and an OPTOTRAK (Northern Digital Inc., Ontario, Canada) optical tracker.

For each rotation angle of the endoscope, we computed the average back-projection error for that angle:

\varepsilon(\theta) = \frac{1}{M} \sum_{i=1}^{M} \left| \vec{\mu}_i - \vec{\mu}(\vec{P}_i, \theta) \right|    (26)

where \vec{P}_i is a 3D point in the world coordinates, \vec{\mu}_i is the corresponding 2D image pixel, \vec{\mu}(\vec{P}_i, \theta) is the back-projected 2D image pixel of \vec{P}_i computed using (25), and M is the number of corners on the calibration pattern. We used different grid patterns (3 × 4, 4 × 5, 5 × 6, 6 × 7; the size of each checker is 2 mm × 2 mm). In order to obtain enough light on the grid pattern, the endoscope needs to be placed very close to the target (usually 5–15 mm). A smaller grid cannot capture the radial distortion, while a bigger grid exceeds the field of view. The 5 × 6 grid gave the best results.
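The error measure (26) then follows directly, reusing the projection sketch given after (25) (mu_obs denotes the detected corner pixels):

import numpy as np

def back_projection_error(P_list, mu_obs, theta, A, T_c_m, T_m_w, cc):
    """Eq. (26): mean distance between detected and back-projected corners."""
    proj = np.array([project_oblique(P, A, T_c_m, T_m_w, theta, cc) for P in P_list])
    return np.mean(np.linalg.norm(np.asarray(mu_obs) - proj, axis=1))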

We performed many trials by moving and rotating the endoscope randomly and estimating θ simultaneously. The average back-projection errors with respect to different rotation angles are shown in Fig. 14 (see Appendix D). Figure 14(a) shows the result using the Stryker 344-71 arthroscope Vista (70 degree, 4 mm) and the Polaris optical tracker. Figure 14(b) shows the result using the Smith & Nephew video arthroscope, autoclavable SN-OH 272589 (30 degree, 4 mm), and the OPTOTRAK optical tracker. The red curve (square marks) represents the back-projection error without taking the rotation angle into account, and the blue curve (circular marks) shows the error when the rotation angle is considered. The results show that including the rotation angle in the camera model significantly improves the accuracy of the calibration.

Figure 14 also shows that the two experiments have different accuracies. The reason is that the endoscopes have different magnifications and the optical trackers have different accuracies (according to the manufacturer, the RMS error is 0.1 mm for OPTOTRAK and 0.3 mm for Polaris). Yamaguchi et al. (2004) used an OTV-S5C laparoscope (Olympus Optical Co. Ltd., Tokyo, Japan) and the Polaris optical tracker. They achieved a high accuracy of less than 5 mm back-projection error when the rotation angle is within 140 degrees. Our results show that we can achieve the same level of accuracy when the rotation angle is within 75 degrees. Beyond this range, due to the larger magnification, larger radial distortion and poorer lighting, the back-projection error increases to 13 mm when the rotation angle is 100 degrees. Given endoscopes of the same quality, we should be able to achieve the same level of accuracy.

4 Conclusion and Discussions

Shape-from-shading and shape-from-motion have been successfully used in many vision applications, but both have difficulties for 3D reconstruction from orthopedic endoscopy due to the featureless bone surface and the partial occluding boundaries in a small field of view. In this work, we propose a method that combines the strengths of both approaches: we formulate shape-from-shading for endoscopes under near point light sources and perspective projection, and develop a multi-image shape-from-shading framework. To deal with tracking and calibration errors, we use ICP to improve the alignment of the partial shapes and contours in the world coordinates. As a result, we can reconstruct the shape of a larger bone area and provide useful visualizations for surgical navigation in minimally invasive procedures.

A couple of issues in our framework are worth further discussion. We build a near-lighting perspective shape-from-shading (NLPSFS) model without assuming that the light sources are located at the optical center. Our contribution is to present a new model for medical endoscopes, rather than a new solution to any existing model. We adopt the variational optimization method since it is effective in the most general cases and no pre-defined values are needed at the singular and local minimum points. Since our multi-frame approach is very general, other PSFS solutions, such as the approaches of Prados and Faugeras (2005) and Tankus et al. (2004), could be employed to solve the NLPSFS problem as well. We plan to compare different SFS methods in future work. In this paper we focus on the formulation of the NLPSFS model and the multi-image framework.

As mentioned in the previous section, in the real operating environment the bone surfaces are usually 5–10 mm away from the endoscope, and the separation between the light source and the camera is 3.5 mm in our case. These two numbers are of the same magnitude as the distance to the scene, so the approximation becomes inaccurate if we ignore the parameters a and b in (9). It is, however, a reasonable approximation when the endoscope is far away from the scene. In this paper we attempt to build a general and accurate model that handles the more realistic cases, since we can calibrate all parameters beforehand.

One advantage of our method is that the computation of the single-image SFS in the first step of Table 1 can be parallelized, since the error function in (15) uses only a single image. Note that in the last step of Table 1 the computation of the single-image SFS cannot be parallelized, because the global constraints need to be updated in each iteration.

By using ICP in our multi-image framework, we are able to deal with the tracking and calibration errors. Moreover, we can tolerate some shape errors caused by the single-image SFS in the first step of Table 1. Therefore we use an early-stopping strategy during the single-image SFS optimizations, terminating the iterations before the error in (15) reaches zero (we set the threshold to 15% of the initial error). By this means we significantly speed up the algorithm. This is also one of the reasons why the result from a single image tends to be flat; other reasons include the partial occluding boundaries and the smoothness constraints.

The results from the multi-image SFS still lack high-frequency shape details due to over-smoothing. We are working on applying shape priors in our multi-image framework and will describe the details in future work. Dealing with image noise due to blood and tissues (Kozera and Noakes 2004) will also be addressed in future work.

Acknowledgements This work is supported in part by an NSF Grant #0325920 ITR: Data-driven Human Knee Modeling for Expert Surgical Planning Systems, an NSF CAREER Award #IIS-0643628, an NSF Award #CCF-0541307 and an ONR Award #N00014-05-1-0188.

Appendix A: Iteration of Solutions to the Shape Parameters (z, p, q)

In this appendix we show the derivation of the update functions in (17). To minimize the error e, we take the derivative of e with respect to p_{k,l}, where (k, l) is the position of an image pixel:

\frac{\partial e}{\partial p_{k,l}} = \frac{\partial}{\partial p_{k,l}} [(1 - \lambda) e_s + \lambda e_i] = (1 - \lambda) \frac{\partial e_s}{\partial p_{k,l}} + \lambda \frac{\partial e_i}{\partial p_{k,l}}    (A.1)

where for each pixel (k, l) we have

e_{s_{k,l}} = (p_{k+1,l} - p_{k,l})^2 + (p_{k,l+1} - p_{k,l})^2 + (q_{k+1,l} - q_{k,l})^2 + (q_{k,l+1} - q_{k,l})^2 + (z_{k+1,l} - z_{k,l})^2 + (z_{k,l+1} - z_{k,l})^2

e_{i_{k,l}} = [I_{k,l} - R(k, l, z_{k,l}, p_{k,l}, q_{k,l})]^2    (A.2)

For the smoothness constraint:

\frac{\partial e_s}{\partial p_{k,l}} = \frac{\partial e_{s_{k,l}}}{\partial p_{k,l}} + \frac{\partial e_{s_{k-1,l}}}{\partial p_{k,l}} + \frac{\partial e_{s_{k,l-1}}}{\partial p_{k,l}}
= 2\big( (p_{k,l} - p_{k+1,l}) + (p_{k,l} - p_{k,l+1}) \big) + 2(p_{k,l} - p_{k-1,l}) + 2(p_{k,l} - p_{k,l-1})
= 8 \left( p_{k,l} - \tfrac{1}{4}(p_{k+1,l} + p_{k,l+1} + p_{k-1,l} + p_{k,l-1}) \right)
= 8 (p_{k,l} - \bar{p}_{k,l})    (A.3)

where \bar{p}_{k,l} is the 4-neighbor average; we extend it to 8 neighbors:

\bar{p}^{n}_{k,l} = \frac{1}{5} \big( p^{n}_{k,l-1} + p^{n}_{k,l+1} + p^{n}_{k+1,l} + p^{n}_{k-1,l} \big) + \frac{1}{20} \big( p^{n}_{k-1,l-1} + p^{n}_{k-1,l+1} + p^{n}_{k+1,l-1} + p^{n}_{k+1,l+1} \big)    (A.4)

For the brightness error:

\frac{\partial e_i}{\partial p_{k,l}} = \frac{\partial e_{i_{k,l}}}{\partial p_{k,l}} = -2 \big( I_{k,l} - R(k, l, z_{k,l}, p_{k,l}, q_{k,l}) \big) \frac{\partial R}{\partial p_{k,l}}    (A.5)

From (A.3) and (A.5) we have

\frac{\partial e}{\partial p_{k,l}} = 8(1 - \lambda)(p_{k,l} - \bar{p}_{k,l}) - 2\lambda (I_{k,l} - R_{k,l}) \frac{\partial R}{\partial p_{k,l}}    (A.6)

and setting it to zero yields:

p_{k,l} = \bar{p}_{k,l} + \frac{\lambda}{4(1 - \lambda)} \big( I_{k,l} - R(k, l, z_{k,l}, p_{k,l}, q_{k,l}) \big) \frac{\partial R}{\partial p_{k,l}}    (A.7)

In order to make the algorithm robust, we also use the averages \bar{p}, \bar{q} and \bar{z} when computing the reflectance map R(x, y, z, p, q) in (A.7). We use the same method to derive the updates for q and z, and finally we have:

p_{k,l} = \bar{p}_{k,l} + \frac{\lambda}{4(1-\lambda)} \big( I_{k,l} - R(k, l, \bar{z}_{k,l}, \bar{p}_{k,l}, \bar{q}_{k,l}) \big) \frac{\partial R}{\partial p_{k,l}}\Big|_{\bar{p}_{k,l}}

q_{k,l} = \bar{q}_{k,l} + \frac{\lambda}{4(1-\lambda)} \big( I_{k,l} - R(k, l, \bar{z}_{k,l}, \bar{p}_{k,l}, \bar{q}_{k,l}) \big) \frac{\partial R}{\partial q_{k,l}}\Big|_{\bar{q}_{k,l}}

z_{k,l} = \bar{z}_{k,l} + \frac{\lambda}{4(1-\lambda)} \big( I_{k,l} - R(k, l, \bar{z}_{k,l}, \bar{p}_{k,l}, \bar{q}_{k,l}) \big) \frac{\partial R}{\partial z_{k,l}}\Big|_{\bar{z}_{k,l}}    (A.8)

Appendix B: Solutions to Photometric Calibration

In this appendix we show how we separate and solve for the three unknown photometric parameters (response function, light intensity and light distribution).

B.1 Solution to h(·)

Given the same light intensity \gamma_j and pixel value v(x, y) but two different albedos \eta_{i_1} and \eta_{i_2}, we have

h[v^{j}_{i_1}(x, y)] - \gamma_j - m(x, y) - \eta_{i_1} = 0

h[v^{j}_{i_2}(x, y)] - \gamma_j - m(x, y) - \eta_{i_2} = 0    (B.1)

Subtracting the first line from the second line of (B.1), we obtain

h[v^{j}_{i_2}(x, y)] - h[v^{j}_{i_1}(x, y)] = \eta_{i_2} - \eta_{i_1}    (B.2)

We use different pixels in the same image (same albedo) or in different images (different albedos) to generate many equations of the form (B.2), as long as we fix the light intensity for each pair of albedos. Since v^{j}_{i_1}(x, y) ranges from 0 to 255 (image intensity), we only need 256 such equations, which we stack as:

\begin{bmatrix} \cdots & 1^{*} & -1^{*} & \cdots \\ \cdots & -1^{\#} & 1^{\#} & \cdots \\ & \cdots & \cdots & \end{bmatrix} \cdot \begin{bmatrix} h(0) \\ h(1) \\ \vdots \\ h(255) \end{bmatrix} = \begin{bmatrix} \eta_{i_2} - \eta_{i_1} \\ \eta_{i_4} - \eta_{i_3} \\ \vdots \end{bmatrix}    (B.3)

where 1^{*} and -1^{*} are placed in the columns v^{j}_{i_2}(x, y) + 1 and v^{j}_{i_1}(x, y) + 1, respectively, and 1^{\#} and -1^{\#} are placed in the columns v^{j}_{i_4}(x, y) + 1 and v^{j}_{i_3}(x, y) + 1, respectively. Therefore h(v) is solved from (B.3) and H(\cdot) = \{\exp[h(v)]\}^{-1}.
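In code, the stacked system (B.3) is an ordinary least-squares problem; a small illustrative sketch (dense NumPy for clarity, with each row built from a pixel pair observed under the same light intensity and the known log-albedo difference; note that h is determined only up to an additive constant):

import numpy as np

def solve_h(pairs, n_levels=256):
    """Solve (B.3) for h(0..255) in the least-squares sense.

    pairs : iterable of (v1, v2, d_eta) with d_eta = log(rho2) - log(rho1),
            taken from pixel pairs observed under the same light intensity.
    """
    rows, rhs = [], []
    for v1, v2, d_eta in pairs:
        r = np.zeros(n_levels)
        r[int(v2)] += 1.0      # +1 in the column for gray level v2
        r[int(v1)] -= 1.0      # -1 in the column for gray level v1
        rows.append(r)
        rhs.append(d_eta)
    h, *_ = np.linalg.lstsq(np.asarray(rows), np.asarray(rhs), rcond=None)
    return h                   # h(v) = log H^{-1}(v), up to an additive constant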

B.2 Solution to γj

Given the same albedo \eta_i and pixel value v(x, y) but two different light intensities \gamma_{j_1} and \gamma_{j_2}, we have

h[v^{j_1}_{i}(x, y)] - \gamma_{j_1} - m(x, y) - \eta_i = 0

h[v^{j_2}_{i}(x, y)] - \gamma_{j_2} - m(x, y) - \eta_i = 0    (B.4)

Subtracting the first line from the second line of (B.4):

h[v^{j_2}_{i}(x, y)] - h[v^{j_1}_{i}(x, y)] = \gamma_{j_2} - \gamma_{j_1}    (B.5)

We use the minimum light intensity \gamma_1 as a reference; for the other light intensities \gamma_j, j = 2, \ldots, N_{light}, we have

\gamma_j = \gamma_1 + \big\{ h[v^{j}_{i}(x, y)] - h[v^{1}_{i}(x, y)] \big\}    (B.6)

Given the estimated h[v(x, y)], and by varying the albedos and pixels, we compute the average value \bar{\gamma}_j for each light level as follows, with the corresponding light intensity \bar{I}_{0_j} = \exp(\bar{\gamma}_j):

\bar{\gamma}_j = \gamma_1 + \frac{1}{N_{albedo}} \cdot \frac{1}{N_{pixels}} \sum_{i}^{N_{albedo}} \sum_{x,y}^{N_{pixels}} \big\{ h[v^{j}_{i}(x, y)] - h[v^{1}_{i}(x, y)] \big\}    (B.7)

B.3 Solution to m(x, y)

Again, given the same albedo \eta_i and light intensity \gamma_j but two different pixels (x_p, y_p) and (x_q, y_q), we have

h[v^{j}_{i}(x_p, y_p)] - \gamma_j - m(x_p, y_p) - \eta_i = 0

h[v^{j}_{i}(x_q, y_q)] - \gamma_j - m(x_q, y_q) - \eta_i = 0    (B.8)

Fig. 11 (Color online) Results of photometric calibration. (a) Camera response function in the Red channel. Red dots represent the data points and the magenta line the nonlinear fit. The x-axis represents the image irradiance and the y-axis the image intensity, as in (b) and (c). When the gray level is equal to 0, the image irradiance is restricted to 0 regardless of the nonlinear fit. (b) Camera response function in the Green channel. Green dots represent the data points and the magenta line the nonlinear fit. (c) Camera response function in the Blue channel. Blue dots represent the data points and the magenta line the nonlinear fit. (d) Calibrated light intensities at the different levels (blue) and the ground truth (green). The x-axis is the trial index and the y-axis the relative light intensity. We use level 6 as a reference and plot levels 1–5, with smaller levels corresponding to higher light intensities. The slight variation in the range of high intensities may be caused by saturation. (e) Original image of the color chart. (f) \tilde{M}(x, y). (g) Cosine term \vec{n}\cdot\vec{l}_1/r_1^3 + \vec{n}\cdot\vec{l}_2/r_2^3. (h) Spatial distribution function M(x, y) = \tilde{M}(x, y) / (\vec{n}\cdot\vec{l}_1/r_1^3 + \vec{n}\cdot\vec{l}_2/r_2^3)

Subtracting the first line from the second line of (B.8):

h[v^{j}_{i}(x_q, y_q)] - h[v^{j}_{i}(x_p, y_p)] = m(x_q, y_q) - m(x_p, y_p)    (B.9)

Fig. 12 Illustration of the relationship between the rotation angle θ and the two markers' coordinates. Marker 1 is attached to the scope cylinder and Marker 2 is attached to the camera head. "A" indicates the position of Marker 2 when θ = 0 and "B" indicates the position of Marker 2 after a rotation by θ. For any point \vec{P}_r in Marker 2's coordinates, its trace under rotation of the camera head is a circle in Marker 1's coordinates (it moves from position \vec{P}^{A}_r to \vec{P}^{B}_r in the figure). This circle also lies in the plane perpendicular to the axis of the scope cylinder. \vec{O} indicates the center of the circle

For each \gamma_j and \eta_i, instead of using all C^{2}_{N_{pixels}} different pairs of pixels, we choose only N_{pixels} = 720 \times 480 pairs and stack the equations as follows:

\underbrace{\begin{bmatrix} 1 & -1 & & & \\ & 1 & -1 & & \\ & & \ddots & \ddots & \\ & & & 1 & -1 \\ -1 & & & & 1 \end{bmatrix}}_{A} \cdot \begin{bmatrix} m(x_1, y_1) \\ m(x_2, y_2) \\ \vdots \\ m(x_{N_{pixels}}, y_{N_{pixels}}) \end{bmatrix} = \begin{bmatrix} h[v^{j}_{i}(x_1, y_1)] - h[v^{j}_{i}(x_2, y_2)] \\ h[v^{j}_{i}(x_2, y_2)] - h[v^{j}_{i}(x_3, y_3)] \\ \vdots \\ h[v^{j}_{i}(x_{N_{pixels}}, y_{N_{pixels}})] - h[v^{j}_{i}(x_1, y_1)] \end{bmatrix}    (B.10)

It is not practical to solve (B.10) using SVD directly, since the matrix would require a huge amount of memory. However, we notice that the matrix A in (B.10) is a special N_{pixels} \times N_{pixels} matrix whose inverse can be obtained directly by Gauss-Jordan elimination:

A^{-1} = \begin{bmatrix} 0 & 0 & & & & -1 \\ -1 & 0 & 0 & & & -1 \\ -1 & -1 & 0 & 0 & & -1 \\ \vdots & \vdots & \vdots & \ddots & 0 & -1 \\ -1 & -1 & -1 & \cdots & -1 & -1 \end{bmatrix}    (B.11)

Thus we can efficiently compute each element of m(x, y) independently, and again \tilde{M}(x, y) = \exp[m(x, y)].
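Equivalently, and without ever forming the N_pixels × N_pixels matrix, the consecutive-difference structure of (B.10) lets m be recovered up to an additive constant by a cumulative sum. This is a simple alternative reading of the same system, not necessarily the exact implementation used here:

import numpy as np

def solve_m(diffs):
    """Recover m from the consecutive differences d_k = m_k - m_{k+1}
    given by the first N_pixels - 1 rows of (B.10); m is determined only up
    to an additive constant, so m_1 is set to 0 here."""
    d = np.asarray(diffs)
    m = np.concatenate([[0.0], -np.cumsum(d)])   # m_{k+1} = m_1 - sum_{j<=k} d_j
    return m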

Appendix C: Estimation of the Rotation Angle Using Two Optical Markers

Let the marker attached to the scope cylinder be Marker 1 and the marker attached to the camera head be Marker 2 (Fig. 10(b)). As Fig. 12 shows, when we rotate the camera head around the scope cylinder from position "A" to position "B" by θ, a point \vec{P}_r in Marker 2's coordinate system O_2 moves along a circle about a point \vec{O} on the axis of the scope cylinder, expressed in Marker 1's coordinate system O_1. We first estimate the center \vec{O} of the circle and then compute θ as

\theta = \arccos \frac{ \|\vec{O}\vec{P}^{A}_r\|^2 + \|\vec{O}\vec{P}^{B}_r\|^2 - \|\vec{P}^{A}_r \vec{P}^{B}_r\|^2 }{ 2\,\|\vec{O}\vec{P}^{A}_r\| \cdot \|\vec{O}\vec{P}^{B}_r\| }    (C.1)

The center of the circle can be represented in terms of the transformations from the world coordinates O_w to Marker 1's coordinates O_1 and to Marker 2's coordinates O_2. At least 3 different positions of Marker 2 (i.e. different values of θ) are necessary.

Appendix D: Estimation of the Center of the Rotation Circle in 3D

We rotate the camera head around the cylinder to acquire 3 different positions of Marker 2. Let the transformation matrices from the world coordinates O_w to Marker 1's coordinates O_1 and to Marker 2's coordinates O_2 at position i be ({}^{o_1}T_{o_w})_i and ({}^{o_2}T_{o_w})_i (i = 1, 2, 3). Given any point \vec{P}_r in O_2, we first compute the position \vec{P}_i in O_1 corresponding to each rotation as:

\vec{P}_i = ({}^{o_1}T_{o_w})_i \cdot \big( ({}^{o_2}T_{o_w})_i \big)^{-1} \cdot \vec{P}_r, \quad i = 1, 2, 3.    (D.1)

\vec{O} is the center of the circumcircle of the triangle (\vec{P}_1 \vec{P}_2 \vec{P}_3). Let \vec{R}_1 = \vec{P}_1 - \vec{P}_3 and \vec{R}_2 = \vec{P}_2 - \vec{P}_3; the normal of the triangle is \vec{n} = \vec{R}_1 \times \vec{R}_2. The perpendicular bisector \vec{L}_1 of \vec{R}_1 and \vec{L}_2 of \vec{R}_2 can be written as:

\vec{L}_1 = \vec{P}_3 + \vec{R}_1/2 + \lambda_1 \cdot \vec{n} \times \vec{R}_1

\vec{L}_2 = \vec{P}_3 + \vec{R}_2/2 + \lambda_2 \cdot \vec{n} \times \vec{R}_2    (D.2)

where \lambda_1 and \lambda_2 are the parameters of the lines \vec{L}_1 and \vec{L}_2. The intersection of these two lines is the center of the circle.

Fig. 13 (Color online) Estimated rotation angles for two endoscopes. In each trial we rotated the camera head with respect to the scope cylinder and captured an image. We captured a few images for the initial position; after that we took two images for each rotation angle. The red curves are the rotation angles estimated in different RANSAC iterations and the black curve is the average rotation angle

Fig. 14 (Color online) Back-projection errors with respect to the rotation angles for two systems. (a) Stryker 344-71 arthroscope Vista and Polaris optical tracker. (b) Smith & Nephew video arthroscope and OPTOTRAK optical tracker in the operating room. Images in the top row of (a) and (b) correspond to different rotation angles (the angle is shown at the top of each image). The red curves in (a) and (b) represent the errors without rotation compensation; the blue curves represent the errors with rotation compensation

From (D.2) we can derive the center of the circle:

\vec{O} = \frac{ (\vec{R}_2 - \vec{R}_1) \cdot \vec{R}_1 / 2 }{ |\vec{R}_1 \times \vec{R}_2|^2 } \cdot (\vec{R}_1 \times \vec{R}_2) \times \vec{R}_2 + \vec{R}_2/2 + \vec{P}_3    (D.3)

It can easily be shown that \vec{O} does not depend on the selection of \vec{P}_r. Since at least 3 different positions are necessary, we rotate the camera head around the scope cylinder by N different angles. We apply a RANSAC algorithm to estimate \vec{O} from random subsets of positions and select the \vec{O}^* corresponding to the smallest variance as the center of the circle. A similar RANSAC procedure is also used to compute θ.
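A direct transcription of (D.1)–(D.3) and (C.1) (Python/NumPy; the tracker is assumed to report 4 × 4 homogeneous transforms, and the function names are ours):

import numpy as np

def circle_center(P1, P2, P3):
    """Eq. (D.3): center of the circumcircle through three 3D points."""
    R1, R2 = P1 - P3, P2 - P3
    n2 = np.linalg.norm(np.cross(R1, R2)) ** 2
    return (np.dot(R2 - R1, R1) / 2.0 / n2) * np.cross(np.cross(R1, R2), R2) + R2 / 2.0 + P3

def rotation_angle(O, Pa, Pb):
    """Eq. (C.1): angle subtended at the circle center O by positions Pa and Pb."""
    a, b, c = np.linalg.norm(Pa - O), np.linalg.norm(Pb - O), np.linalg.norm(Pa - Pb)
    return np.arccos((a**2 + b**2 - c**2) / (2 * a * b))

def marker_point_in_cylinder_frame(T_o1_ow, T_o2_ow, P_r):
    """Eq. (D.1): express a fixed point of Marker 2 in Marker 1's coordinates."""
    P_h = np.append(P_r, 1.0)
    return (T_o1_ow @ np.linalg.inv(T_o2_ow) @ P_h)[:3]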

Figure 13 shows the estimated rotation angles obtained with the RANSAC algorithm for two different endoscopes. The red curves are the output angles from different RANSAC iterations, and the black curve is the average angle. We can see that the variance of the estimate is very small (less than 0.2 degree).

References

Besl, P. J., & McKay, N. D. (1992). A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2), 239–256.

Clarkson, M. J., Rueckert, D., King, A. P., Edwards, P. J., Hill, D. L. G., & Hawkes, D. J. (1999). Registration of video images to tomographic images by optimising mutual information using texture mapping. In LNCS: Vol. 1679. Proceedings of the second international conference on medical image computing and computer-assisted intervention (MICCAI'99) (pp. 579–588). Berlin: Springer.

Courteille, F., Crouzil, A., Durou, J.-D., & Gurdjos, P. (2004). Towards shape from shading under realistic photographic conditions. Proceedings of the 17th International Conference on Pattern Recognition (ICPR'04), 2, 277–280.

Dey, D., Gobbi, D. G., Slomka, P. J., Surry, K. J. M., & Peters, T. M. (2002). Automatic fusion of freehand endoscopic brain images to three-dimensional surfaces: Creating stereoscopic panoramas. IEEE Transactions on Medical Imaging, 21(1), 23–30.

Durou, J.-D., Falcone, M., & Sagona, M. (2008). Numerical methods for shape-from-shading: A new survey with benchmarks. Computer Vision and Image Understanding, 109(1), 22–43.

Forster, C. H. Q., & Tozzi, C. L. (2000). Toward 3D reconstruction of endoscope images using shape from shading. In Proceedings of the 13th Brazilian symposium on computer graphics and image processing (SIBGRAPHI'00) (pp. 90–96).

Frankot, R. T., & Chellappa, R. (1988). A method for enforcing integrability in shape from shading algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(4), 439–451.

Fuchs, H., Livingston, M. A., Raskar, R., Colucci, D., Keller, K., State, A., Crawford, J. R., Rademacher, P., Drake, S. H., & Meyer, A. A. (1998). Augmented reality visualization for laparoscopic surgery. In LNCS: Vol. 1496. Proceedings of the first international conference on medical image computing and computer-assisted intervention (MICCAI'98) (pp. 934–943). Berlin: Springer.

Hartley, R., & Zisserman, A. (2004). Multiple view geometry in computer vision (2nd edn.). Cambridge: Cambridge University Press.

Hasegawa, J. K., & Tozzi, C. L. (1996). Shape from shading with perspective projection and camera calibration. Computers and Graphics, 20(3), 351–364.

Horn, B. K. P., & Brooks, M. J. (1986). The variational approach to shape from shading. Computer Vision, Graphics, and Image Processing, 33(2), 174–208.

Horn, B. K. P., & Brooks, M. J. (1989). Shape from shading. Cambridge: MIT Press.

Horn, B. K. P., & Sjoberg, R. W. (1979). Calculating the reflectance map. Applied Optics, 18(11), 1770–1779.

Ikeuchi, K., & Horn, B. K. P. (1981). Numerical shape from shading and occluding boundaries. Artificial Intelligence, 17(1–3), 141–184.

Iyengar, A. K. S., Sugimoto, H., Smith, D. B., & Sacks, M. S. (2001). Dynamic in vitro quantification of bioprosthetic heart valve leaflet motion using structured light projection. Annals of Biomedical Engineering, 29(11), 963–973.

Kimmel, R., & Sethian, J. A. (2001). Optimal algorithm for shape from shading and path planning. Journal of Mathematical Imaging and Vision, 14(3), 237–244.

Kozera, R. (1991). Existence and uniqueness in photometric stereo. Applied Maths and Computation, 44(1), 1–103.

Kozera, R. (1992). On shape recovery from two shading patterns. International Journal of Pattern Recognition and Artificial Intelligence, 6(4), 673–698.

Kozera, R. (1998). An overview of the shape from shading problem. Machine Graphics and Vision, 7(1), 291–312.

Kozera, R., & Noakes, L. (2004). Noise reduction in photometric stereo with non-distant light sources. Proceedings of the International Conference on Computer Vision and Graphics (ICCVG'04), 32, 103–110.

Leclerc, Y. G., & Bobick, A. F. (1991). The direct computation of height from shading. In Proceedings of the IEEE computer society conference on computer vision and pattern recognition (CVPR'91) (pp. 552–558).

Lee, K. M., & Kuo, C.-C. J. (1994). Shape from shading with perspective projection. Computer Vision, Graphics, and Image Processing: Image Understanding, 59(2), 202–212.

Lee, K. M., & Kuo, C.-C. J. (1997). Shape from shading with a generalized reflectance map model. Computer Vision and Image Understanding, 67(2), 143–160.

Litvinov, A., & Schechner, Y. Y. (2005). Addressing radiometric nonidealities: a unified framework. Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2, 52–59.

Mourgues, F., Devernay, F., Malandain, G., & Coste-Manière, E. (2001). 3D reconstruction of the operating field for image overlay in 3D-endoscopic surgery. In Proceedings of the IEEE and ACM international symposium on augmented reality (ISAR'01) (pp. 191–192).

Okatani, T., & Deguchi, K. (1997). Shape reconstruction from an endoscope image by shape from shading technique for a point light source at the projection center. Computer Vision and Image Understanding, 66(2), 119–131.

Penna, M. A. (1989). A shape from shading analysis for a single perspective image of a polyhedron. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(6), 545–554.

Poelman, C. J., & Kanade, T. (1997). A paraperspective factorization method for shape and motion recovery. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(3), 206–218.

Pollefeys, M., Koch, R., & Gool, L. V. (1999). Self-calibration and metric reconstruction in spite of varying and unknown intrinsic camera parameters. International Journal of Computer Vision, 32(1), 7–25.

Prados, E., & Faugeras, O. (2003). Perspective shape from shading and viscosity solution. Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV'03), 2, 826–831.

Prados, E., & Faugeras, O. (2005). Shape from shading: a well-posed problem? Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2, 870–877.

Samaras, D., & Metaxas, D. N. (1999). Coupled lighting direction and shape estimation from single images. Proceedings of the International Conference on Computer Vision (ICCV'99), 2, 868–874.

Seshamani, S., Lau, W., & Hager, G. (2006). Real-time endoscopic mosaicking. In LNCS: Vol. 4190. Proceedings of the ninth international conference on medical image computing and computer-assisted intervention (MICCAI'06) (pp. 355–363). Berlin: Springer.

Stoyanov, D., Darzi, A., & Yang, G. Z. (2005). A practical approach towards accurate dense 3D depth recovery for robotic laparoscopic surgery. Computer Aided Surgery, 10(4), 199–208.

Tankus, A., Sochen, N., & Yeshurun, Y. (2003). A new perspective [on] shape-from-shading. Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV'03), 2, 862–869.

Tankus, A., Sochen, N., & Yeshurun, Y. (2004). Reconstruction of medical images by perspective shape-from-shading. Proceedings of the 17th International Conference on Pattern Recognition (ICPR'04), 3, 778–781.

Tankus, A., Sochen, N., & Yeshurun, Y. (2005). Shape-from-shading under perspective projection. International Journal of Computer Vision, 63(1), 21–43.

Tsai, R. Y. (1987). A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses. IEEE Journal of Robotics and Automation, 3, 323–344.

Woodham, R. J. (1980). Photometric method for determining surface orientation from multiple images. Optical Engineering, 19(1), 139–144.

Yamaguchi, T., Nakamoto, M., Sato, Y., Konishi, K., Hashizume, M., Sugano, N., Yoshikawa, H., & Tamura, S. (2004). Development of a camera model and calibration procedure for oblique-viewing endoscopes. Computer Aided Surgery, 9(5), 203–214.

Zhang, R., Tsai, P. S., Cryer, J. E., & Shah, M. (1999). Shape from shading: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(8), 690–706.

Zhang, Z. (1998). A flexible new technique for camera calibration (Technical Report MSR-TR-98-71). Microsoft Research.