Immersive Electronic Books for Surgical Training

Greg Welch, Andrei State, Adrian Ilie, Kok-Lim Low, Anselmo Lastra, Bruce Cairns, Herman Towles, and Henry Fuchs, University of North Carolina at Chapel Hill
Ruigang Yang, University of Kentucky
Sascha Becker, Daniel Russo, Jesse Funaro, and Andries van Dam, Brown University

Immersive electronic books (IEBooks) for surgical training will let surgeons explore previous surgical procedures in 3D. The authors describe the techniques and tools for creating a preliminary IEBook embodying some of the basic concepts.

Surgical training has traditionally followed the adage, “See one, do one, teach one,” but the complexity and rarity of some procedures limit a surgeon-in-training’s ability to learn them. In trauma-related procedures, for example, surgeons must quickly stabilize the patient, and teaching is secondary to patient care. Such procedures are unplanned, and some critical traumas are rare.

This issue is not trivial. For most surgical specialties, physicians require at least five years of postmedical school training before they are eligible for board certification, and it is frequently many years after board certification before most surgeons are entirely comfortable with their craft. In addition, as in many other technology-driven fields, practicing surgeons must keep up-to-date with a dizzying array of new devices and procedures designed to improve care. Combined with an overall increased focus on limiting medical errors is an intense interest in improving and maintaining surgical competency. Currently, there is no educational environment that can replace the apprenticeship environment of the operating room. Schools and hospitals can use some teaching modules to teach and assess various technical skills, but there is nothing similar to the flight simulator for pilot training that can replace being in the operating room (see the “Visualizing Procedures in 3D” sidebar).

In 1994, Fuchs et al. outlined a vision and some methods for using a “sea of cameras” to achieve 3D telepresence for remote medical procedures.1 We have since pursued related work formally and informally with multiple collaborators. In 2002, we presented an intermediate summary of results from a three-year effort incorporating teleimmersion technologies into surgical training.2 Our goal is to create technologies that let surgeons witness and explore a previous surgical procedure in 3D as if they were present at the place and time of the actual event, with the following added benefits: nonlinear control over time; instruction from the original surgeon or other instructor; and integrated 3D illustrations, annotations, and relevant medical metadata. This would let surgeons integrate the technical aspects of the procedure with the anatomy, radiography, and physiology of the patient’s condition, thus best simulating and in many ways augmenting the real-time educational experience of learning how to perform an operative procedure. Figure 1 shows an early artist’s sketch of our immersive electronic book (IEBook) and a screen shot from our current prototype.

Here we present a current look at our efforts, describing a set of tools that together comprise a system for creating and viewing an IEBook, and presenting some results.

Equipment
An understanding of the physical apparatus involved should help readers better understand how the system lets users create and view content.

Acquisition
The primary piece of acquisition equipment is the camera cube shown in Figure 2. The unit is constructed from rigid modular aluminum framing manufactured by 80/20 (http://www.8020.net). Each side is approximately one meter.

On the unit are four fluorescent high-frequency linear lights from Edmund Industrial Optics, and eight 640 × 480 Dragonfly FireWire (IEEE 1394) color cameras from Point Grey Research (PGR). The lights reduce specular reflections and provide even illumination.


The cameras support FireWire-based synchronization when used with a PGR sync unit, which ensures that all eight cameras image simultaneously, eliminating the need for temporal rectification prior to reconstruction of dynamic 3D models from the 2D images, as we describe in the next section.

To accommodate the FireWire bandwidth for 15-frame-per-second acquisition, we configured the cameras into four camera-pair groups.


Visualizing Procedures in 3D

The most obvious substitute for being in the operating room, conventional 2D video, has long been available, but surgeons universally consider these videos marginally effective at best for several reasons:

❚ Video relegates the viewer to the role of passive observer who cannot interactively change the view.

❚ Subtle, complex motions that can be critical in an operation are easy to miss.

❚ Video provides few depth cues and offers only linear control over the playback’s timing.

❚ Replaying video only provides the same experience a second time, rather than permitting a new or changed perspective.

In short, watching a video is not much like experiencing the procedure. Video is data from the procedure, but as Davis points out, “experiences are not data.”1 Davis describes experience as “an intangible process of interaction among humans and the world that has its existence in human minds,” noting that we can only capture and archive the materials (data) that “occasion experiences in human minds.”1

Three-dimensional computer graphics systems let sighted people see real or virtual models with the depth, occlusion relationships, and motion parallax they are used to in everyday life. When coupled with interaction devices and techniques, such systems can indeed occasion personal experiences in human minds. Several disciplines have leveraged the power of this paradigm in training for many years. Arguably the most successful example is flight simulation for pilot training.2 Today flight simulators are considered so effective (and cost-effective) that pilots sometimes use them to train on a new version of a plane, and then fly the real plane for the first time on a regularly scheduled flight with regular passengers.

Medical training and education programs have also used 3D graphics techniques.3–5 Previous training efforts have primarily combined realistic models with interaction devices (sometimes haptic) to simulate the experience of performing a particular procedure. Rather than aiming to simulate the “do one” of the adage, “see one, do one, teach one,” this work aims to improve the opportunities to “see one,” and improve the associated learning experience’s effectiveness.

References
1. M. Davis, “Theoretical Foundations for Experiential Systems Design,” Proc. ACM SIG MultiMedia 2003 Workshop on Experiential Telepresence (ETP 2003), ACM Press, 2003, pp. 45-52.
2. B.J. Schachter, “Computer Image Generation for Flight Simulation,” IEEE Computer Graphics and Applications, vol. 1, no. 4, Oct. 1981, pp. 29-68.
3. J.F. Brinkley, “The Digital Anatomist Project,” Proc. Am. Assoc. Advancement of Science Ann. Meeting, AAAS Press, 1994, p. 6.
4. R. Zajtchuk and R.M. Satava, “Medical Applications of Virtual Reality,” Comm. ACM, vol. 40, no. 9, 1997, pp. 63-64.
5. A. Liu et al., “A Survey of Surgical Simulation: Applications, Technology, and Education,” Presence: Teleoperators and Virtual Environments, vol. 12, no. 6, 2003, pp. 599-614.

Figure 1. Immersive electronic book (IEBook) in surgical training: (a) early conceptual sketch of a blunt liver trauma surgery, (b) the acquired 3D scene of the surgery with annotations and trainee’s hands (sketches by Andrei State), and (c) an IEBook in our present immersive display environment. The third of six annotated snapshots is selected and playing. The red box magnifies the snapshot for the fourth item.

Each group includes two PGR cameras, a FireWire hub, a PGR sync unit, and a Dell PowerEdge rack-mounted server running Microsoft Windows 2000 Professional. Figure 3 depicts the FireWire connections within a group and between groups.

Reconstruction
For reconstructing 3D models from the 2D images, we use a standard personal computer running Microsoft Windows XP Professional. We currently use a Dell Precision 340 with a 2.2-GHz Pentium 4 and 2 Gbytes of RAM.

Authoring
Both content authoring and viewing take place in a Barco TAN VR-Cube located at Brown University. The VR-Cube, which is approximately three meters on each side, has four projection surfaces (three walls and a floor) and four corresponding Marquee 9500LC projectors configured for a 1,024 × 768 screen with 60-Hz field-sequential stereo. The system uses five commodity Linux workstations: one to gather device data and four to render. The rendering workstations are dual 2-GHz Intel Xeons with genlocked nVidia Quadro FX 3000G graphics cards. Myricom Myrinet connects the workstations for high-speed communication. We use CrystalEyes active liquid crystal shutter glasses, and Polhemus FasTrak and Intersense IS900 tracking systems.

The primary authoring input device is a handheld Tablet PC. We currently use a Toshiba Portege M200 to navigate the dynamic 3D points/meshes in time, initiate snapshots, highlight and annotate, and arrange snapshots hierarchically.

Viewing
Our primary system for fully immersive interaction with a complete IEBook is a combination of the Barco TAN VR-Cube projection system and the Acer Tablet PC. This equipment lets users enjoy both full immersion and the complete set of IEBook features available. This is currently the only viewing option providing a complete experience.

We also explored a hybrid display system combining head-mounted and projector-based displays to simultaneously provide a high-quality stereoscopic view of the procedure and a lower-quality monoscopic view for peripheral awareness (see the “Visualizing Procedures Up Close and All Around” sidebar).

We also support a relatively limited but more accessible Web-based paradigm. The paradigm is limited in that it provides reduced or no immersion, and we currently only show dynamic 3D points/meshes. We use VRML 97 to support Web-based viewing of the dynamic 3D points/meshes with viewpoint control. The VRML data sets use embedded JavaScript functions to implement the user interface actions. The functions are compatible with VRML plug-ins such as the Cortona plug-in (available at http://www.parallelgraphics.com).


Figure 2. Content acquisition: (a) camera cube used for acquisition with a medical training model in place, and (b) a surgeon (coauthor Bruce Cairns) performing a mock procedure to manage severe blunt liver trauma.

Figure 3. A single camera-pair group, and the FireWire intra- and intergroup connections. The four groups are connected from previous to next.

We generate anaglyphic (red-blue) stereo images and movies from the dynamic 3D points/meshes so users can view the data in stereo, using an inexpensive pair of red-blue 3D glasses. We select representative moments within each sequence and render stereoscopic still images of the points/meshes at those moments. We then make these available with the Java Stereoscope applet (see http://www.stereofoto.de). We also create anaglyphic stereo movies in QuickTime and other formats.
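Composing such an anaglyph from left- and right-eye renders is straightforward. The minimal sketch below shows the channel packing; the file names are placeholders, and which eye feeds the red channel depends on how the glasses are worn.

```python
# Sketch: packing left- and right-eye renders of the points/meshes into a
# red-blue anaglyph. File names are placeholders; which eye feeds the red
# channel depends on how the glasses are worn.
import numpy as np
from PIL import Image

left = np.asarray(Image.open("left_eye.png").convert("L"))
right = np.asarray(Image.open("right_eye.png").convert("L"))

anaglyph = np.zeros(left.shape + (3,), dtype=np.uint8)
anaglyph[..., 0] = left      # red channel from one eye's view
anaglyph[..., 2] = right     # blue channel from the other eye's view
Image.fromarray(anaglyph).save("anaglyph.png")
```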

Content creation
Figure 4 gives an overview of the two major phases involved in IEBook production: content creation and content viewing. During content creation, the system acquires 2D video of an actual surgical event from multiple cameras, reconstructs 3D models, and authors an IEBook. Figure 5 illustrates these processes.


Visualizing Procedures Up Close and All Around

Clearly, a student trying to understand a new surgical procedure would appreciate a high-quality, up-close view of the procedure. But what is not so obvious is that learning how to interact and deal with events and confusion around the patient and in the room is also critical. A display with high-fidelity stereo imagery at arm’s length and a wide (surrounding) view field can engender this complete experience. Head-mounted displays (HMDs) can provide the former, and projector-based displays the latter, but unfortunately we are not aware of any display system that simultaneously meets both needs.

We merged the two paradigms and developed a hybrid display system, shown in Figure A, that combines head-mounted and projector-based displays to simultaneously provide a high-quality stereoscopic view of the procedure and a lower-quality monoscopic view for peripheral awareness.1 We use a Kaiser ProView 30 stereo HMD, which does not have baffling material around the displays that would block the wearer’s peripheral view of the real world, and four projectors to render view-dependent monoscopic imagery on Gatorboard and Styrofoam arranged to match the operating table and nearby walls.

We evaluated the display’s efficacy with a formal human subject study. We asked the subjects to solve a series of word puzzles directly in front of them while simultaneously detecting and identifying objects entering and exiting the periphery. Our study indicated that peripheral distraction and associated head turn rates were significantly less when solving both sets of tasks using the hybrid display than an HMD alone.1

Figure A. Our hybrid display system combining head-mounted and projector-based displays.

Reference
1. A. Ilie et al., “Combining Head-Mounted and Projector-Based Displays for Surgical Training,” Presence: Teleoperators and Virtual Environments, vol. 13, no. 2, Apr. 2004, pp. 128-145.

Figure 4. Overview of the phases involved in creating and viewing an immersive electronic book (IEBook). (RGB = Red, Green, Blue; VRML = Virtual Reality Modeling Language; and WWW = World Wide Web.)

Acquisition
The first step in creating an IEBook is capturing (acquiring) the event of interest using the camera cube. The basic process, depicted in Figure 5a, involves calibrating the cameras, capturing synchronized video, and converting the raw images into the RGB color space for reconstruction.

Camera calibration. To perform the vision-based 3D reconstruction (see Figure 5b), we need estimates of analytical models for the cameras’ geometric and photometric properties. We perform respective calibration procedures to prepare for event capture.

For geometric calibration, we use Bouguet’s Camera Calibration Toolbox for Matlab.3 We take a sequence of frame sets (sets of eight synchronized images, one from each camera on the camera cube) of a moving black-and-white checked calibration pattern (see Figure 6), and then use the toolbox to estimate the intrinsic (focal length, image center, and pixel skew) and extrinsic (rotation and translation) parameters and lens distortion coefficients for each camera.
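For readers who want a concrete picture of this step, the following minimal sketch performs the same kind of checkerboard-based calibration using OpenCV rather than the Matlab toolbox. The board geometry, square size, and file names are assumptions for illustration, not details of the actual setup.

```python
# Sketch: checkerboard-based geometric calibration with OpenCV, analogous to
# the Matlab toolbox procedure described above. Board size, square size, and
# file names are assumed for illustration.
import glob
import cv2
import numpy as np

BOARD = (7, 9)        # inner corners per row and column (assumed)
SQUARE_MM = 30.0      # size of one checker square (assumed)

# Reference 3D positions of the board corners in the board's own plane (z = 0)
objp = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2) * SQUARE_MM

obj_pts, img_pts, image_size = [], [], None
for path in sorted(glob.glob("cam0_frame*.png")):   # one camera's frame set
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    image_size = gray.shape[::-1]
    found, corners = cv2.findChessboardCorners(gray, BOARD)
    if not found:
        continue
    corners = cv2.cornerSubPix(
        gray, corners, (11, 11), (-1, -1),
        (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
    obj_pts.append(objp)
    img_pts.append(corners)

# Intrinsics (camera matrix K encodes focal length, image center, and skew),
# lens distortion coefficients, and per-view extrinsics (rotation, translation).
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_pts, img_pts, image_size, None, None)
print("reprojection RMS error:", rms)
```

Running this once per camera yields the per-camera geometric parameters the reconstruction stage needs.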

Because we use a color-based reconstruction approach, the color responses of the individual cameras must match. To achieve this, we use a closed-loop approach that seeks to adjust some hardware registers in the PGR cameras for optimal color matching and then compensates for remaining errors in software. By setting the hardware registers to their optimal values during acquisition, we in effect apply part of the photometric calibration. Residual errors are corrected during a software refinement step.


Figure 5. Content creation involves three steps: (a) acquisition of the 2D video of an actual surgical event, (b) reconstruction of 3D models, and (c) authoring an IEBook.

To determine optimal values for the eight cameras’ hardware registers, we developed software that iteratively adjusts the registers while observing the effects in images of a GretagMacbeth “ColorChecker Color Rendition Chart” (see Figure 6). The software adjusts the parameters in each camera and color channel such that the resulting images are as alike as possible. Specifically, we use Powell’s method for nonlinear optimization4 to minimize a cost function defined as the weighted sum of the squared differences between color values, and the variances across the sampling window of each color sample. The parameters of the software refinement step are computed in a onetime final calibration stage.
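The sketch below conveys the flavor of such a refinement step using SciPy’s Powell implementation. The per-channel gain/offset correction model, the weights, and the file names are simplifying assumptions, not the system’s actual parameterization.

```python
# Sketch of a Powell-based color-matching refinement. The gain/offset model,
# weights, and file names are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

# patch_mean[c, p] = mean RGB of ColorChecker patch p in camera c   (8, 24, 3)
# patch_var[c, p]  = per-channel variance over that patch's window  (8, 24, 3)
patch_mean = np.load("patch_means.npy")     # assumed pre-extracted samples
patch_var = np.load("patch_vars.npy")
ref = patch_mean[0]                         # camera 0 as the color reference
W_DIFF, W_VAR = 1.0, 0.1                    # assumed weights

def cost(params, cam):
    gain, offset = params[:3], params[3:]
    corrected = patch_mean[cam] * gain + offset
    diff = np.sum((corrected - ref) ** 2)        # color differences to reference
    var = np.sum(patch_var[cam] * gain ** 2)     # window variance scales as gain^2
    return W_DIFF * diff + W_VAR * var

corrections = []
for cam in range(1, patch_mean.shape[0]):
    res = minimize(cost, x0=np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0]),
                   args=(cam,), method="Powell")
    corrections.append(res.x)                    # per-channel gain and offset
```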

Synchronized capture. To improve the quality of the reconstruction results, we set the camera cube on top of black fabric as shown in Figure 6. Before capturing an event, we acquire imagery to estimate the actual color of the fabric; then, during reconstruction, we mark as “background” any scene points that appear to have that color.
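A minimal sketch of this background estimate and test follows; the distance threshold is an assumed illustrative value, not a parameter from our system.

```python
# Sketch: estimate the fabric's mean color from a few empty-scene frames, then
# flag pixels near that color as background during reconstruction. The
# threshold is an assumed illustrative value.
import numpy as np

def estimate_background(empty_frames):
    """empty_frames: list of (H, W, 3) images of the fabric with no subject."""
    stack = np.stack([f.astype(float) for f in empty_frames])
    return stack.mean(axis=(0, 1, 2))                # mean RGB of the fabric

def background_mask(image, bg_rgb, threshold=25.0):
    """True where a pixel is close enough to the fabric color to be ignored."""
    dist = np.linalg.norm(image.astype(float) - bg_rgb, axis=-1)
    return dist < threshold
```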

To capture an event, we first start the image capture server software on each of the four Dell servers, and start a master synchronization program that sets off the synchronized capture to disk. The capture continues for a preset number of frame sets.

We address server bandwidth and disk space concerns both by design (the camera-group configuration of Figure 3) and by removing old files and defragmenting disks prior to capture. Each 640 × 480 camera image is approximately 300 Kbytes. At a capture rate of 15 images per second, with two cameras per server, we need approximately 10 Mbytes per second of bandwidth from the cameras to the server disks, and approximately 0.5 Gbytes of disk space per minute of capture per server.
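A back-of-the-envelope check of these figures:

```python
# Quick check of the bandwidth and storage figures quoted above.
image_bytes = 300 * 1024                  # ~300 Kbytes per 640 x 480 image
fps, cams_per_server = 15, 2

bytes_per_sec = image_bytes * fps * cams_per_server
print(bytes_per_sec / 1e6, "MB/s per server")            # ~9.2, i.e. roughly 10

bytes_per_min = bytes_per_sec * 60
print(bytes_per_min / 1e9, "GB per minute per server")   # ~0.55, i.e. roughly 0.5
```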

Color space conversion and distortion removal. As with most single-sensor cameras, the charge-coupled device (CCD) in the PGR cameras achieves color imagery using optical filters that spatially divide the 640 × 480 pixels (picture elements) among red, green, and blue (RGB). The CCD in our PGR cameras uses a spatial arrangement called the Bayer tile pattern. As such, each native captured 8-bit image contains interspersed data from all three color channels that a demosaicing algorithm combines to create a 640 × 480 24-bit RGB image. We developed software that uses PGR libraries to convert the native 8-bit Bayer images into 24-bit RGB images. Our code also uses the Intel Image Processing Library to remove the lens distortion estimated during the geometric calibration.
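For illustration, the following sketch performs an equivalent demosaicing and undistortion step with OpenCV in place of the PGR and Intel libraries. The Bayer tile order and the file names are assumptions, and the calibration data are assumed to have been saved from the geometric calibration step.

```python
# Sketch: Bayer-to-color conversion and lens distortion removal with OpenCV.
# The Bayer tile order (RG) and file names are assumptions; K and dist are
# assumed to have been saved from the geometric calibration step.
import cv2
import numpy as np

K = np.load("cam0_intrinsics.npy")         # 3 x 3 camera matrix
dist = np.load("cam0_distortion.npy")      # lens distortion coefficients

bayer = cv2.imread("cam0_frame_0000.pgm", cv2.IMREAD_GRAYSCALE)   # 8-bit Bayer
color = cv2.cvtColor(bayer, cv2.COLOR_BayerRG2BGR)                # demosaic to 24-bit
undistorted = cv2.undistort(color, K, dist)                       # remove lens distortion
cv2.imwrite("cam0_frame_0000_rgb.png", undistorted)
```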

Reconstruction
The 3D reconstruction process involves two major steps:

❚ reconstruction of 3D points from 2D images using view-dependent pixel coloring (VDPC), and

❚ reconstruction of 3D surfaces from the 3D points using application-specific point filtering and triangulation to create 3D meshes.

Figure 5b gives an overview of the reconstruction process.

View-dependent pixel coloring. VDPC is a hybrid image-based and geometric approach that estimates the most likely color for every pixel of an image that would be seen from some desired viewpoint while simultaneously estimating a view-dependent 3D model (height field) of the underlying scene. (Further theory and implementation details of VDPC and a discussion of related work are available elsewhere.5) By considering a variety of factors, including object occlusions, surface geometry and materials, and lighting effects, VDPC produces results where other methods fail: in the presence of textureless regions and specular highlights, which are common conditions in surgery (skin, organs, and bodily fluids).

Figure 7 illustrates VDPC’s fundamental concepts. We begin by defining a 3D perspective voxel grid from the desired viewpoint, which is typically situated above and looking down into the camera cube, as Figure 7a demonstrates.


Figure 6. The geometric and photometric calibration patterns, as seen from one of the eight cameras. The patterns shown together here are actually used separately for geometric and photometric calibration.

As Figure 7b illustrates, for each x-y pixel in the desired viewpoint image, we effectively traverse the ray away from the desired viewpoint to estimate the most likely color for that pixel. To do this, we test each voxel along the ray by back-projecting it into each of the eight cameras and looking at the actual camera image color at that point. We choose one “winner” voxel (the voxel with the most plausible back-projected appearance in all camera samples). We then use the median of the winner’s back-projected camera sample colors as the surface (voxel) color estimate, and the position on the ray as the 3D coordinate of the surface point. We mark all winner voxels as opaque surfaces, and all others along the ray between the desired view and the surface as empty. Like Kutulakos’s and Seitz’s seminal space-carving work,6 this effectively carves away voxels that do not appear to be part of a surface.
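The sketch below illustrates this per-ray sweep in simplified form. It uses a plain color-variance score for plausibility and omits occlusion handling, the smoothness constraint, and the specular measure discussed below, so it should be read as a conceptual outline rather than the actual VDPC implementation; the camera dictionary layout is an assumption.

```python
# Conceptual sketch of the per-ray sweep in VDPC. Each camera is a dict holding
# its calibration (K, R, t) and current RGB image; the variance-based score and
# thresholds are illustrative simplifications.
import numpy as np

def backproject(voxel_xyz, cam):
    """Project a 3D voxel center into one camera; return (u, v) or None."""
    p = cam["K"] @ (cam["R"] @ voxel_xyz + cam["t"])
    if p[2] <= 0:
        return None
    u, v = int(p[0] / p[2]), int(p[1] / p[2])
    h, w, _ = cam["image"].shape
    return (u, v) if 0 <= u < w and 0 <= v < h else None

def sweep_ray(voxel_centers, cameras, max_score=200.0):
    """Pick the 'winner' voxel along one ray: the candidate whose back-projected
    camera samples agree best. Returns (xyz, median_color) or None (no surface)."""
    best = None
    for xyz in voxel_centers:                       # near to far along the ray
        samples = []
        for cam in cameras:
            uv = backproject(xyz, cam)
            if uv is not None:
                samples.append(cam["image"][uv[1], uv[0]].astype(float))
        if len(samples) < 3:                        # too few cameras see this voxel
            continue
        samples = np.array(samples)
        score = samples.var(axis=0).sum()           # smaller = more consistent
        if score < max_score and (best is None or score < best[0]):
            best = (score, xyz, np.median(samples, axis=0))
    return None if best is None else (best[1], best[2])
```

Rays for which no winner emerges (or that hit the background) return nothing, which is why the resulting surface voxel model is sparse, as described below.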

We repeat this volume-sweeping process, allowing estimated opaque surface points to occlude other voxels, progressively refining an estimate of a height field corresponding to the first visible surfaces. The dashed box in Figure 5b illustrates this cycle.

Two aspects of VDPC contribute to its relative robustness in the presence of textureless and specular surfaces.

First, we use a view-dependent smoothness constraint, assuming that 3D surfaces are more likely to be continuous without abrupt change in depth. We implement the constraint with a new volumetric formulation of the relatively well-known disparity gradient principle. VDPC uses this constraint, updating the likelihoods as it progressively refines the surface voxel model.

Second, VDPC uses a physically based consistency measure. When sweeping along x-y desired viewpoint rays, we declare the voxel with the most plausible back-projected appearance the winner. One measure of plausibility is the variance of the back-projected color samples: the smaller the variance, the more likely a voxel is to be a surface voxel. Our physically based consistency measure goes farther, allowing for back-projected sample color distributions that are plausible in the presence of specular highlights on many types of surfaces. In these cases, the colors of back-projected samples occurring near specular reflections of a light source tend toward the light’s color. In fact, in the presence of these specular highlights, the eight back-projected sample colors will lie along a line in the RGB color space, extending from the surface’s inherent color toward the light’s color. By allowing for this specular distribution of back-projected sample colors, we can detect surfaces where simple variance tests would fail.
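The following sketch conveys the idea of such a specular-tolerant score. It anchors the line at the darkest back-projected sample as a stand-in for the surface’s inherent color, which is an illustrative simplification rather than the published formulation.

```python
# Sketch of a specular-tolerant consistency score: how tightly do the samples
# cluster around the line from the (approximate) inherent surface color toward
# the known light color? Anchoring at the darkest sample is a simplification.
import numpy as np

def specular_consistency(samples, light_rgb=(255.0, 255.0, 255.0)):
    """samples: (N, 3) back-projected RGB colors for one voxel. Returns the mean
    squared distance of the samples from the inherent-color -> light-color line
    (smaller = more plausible surface voxel)."""
    samples = np.asarray(samples, dtype=float)
    base = samples[np.argmin(samples.sum(axis=1))]   # darkest sample as base color
    direction = np.asarray(light_rgb) - base
    direction = direction / (np.linalg.norm(direction) + 1e-9)
    rel = samples - base
    along = rel @ direction                          # components along the line
    perp = rel - np.outer(along, direction)          # residuals off the line
    return float((perp ** 2).sum(axis=1).mean())
```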

When we apply VDPC to a frame set, we get a static surface voxel model from the desired viewpoint’s perspective, that is, a height field. In fact, because some x-y rays finish without winner voxels and some are declared part of the background, the result is a sparse surface voxel model. We use the estimated camera geometry to transform this into a 3D point cloud in the camera cube space, and repeatedly apply the overall approach to all of the frame sets to obtain a dynamic 3D point cloud.

Filtering and triangulation. After reconstruction, some holes will exist in the 3D point data, and some 3D points will visually appear to be outliers. If we filter and triangulate the 3D points where we believe there are indeed surfaces, we can obtain more continuous dynamic models, further improving the appearance.

To this end, we developed new postprocessing methods to filter and triangulate the dynamic sparse 3D point data.


Figure 7. View-dependent pixel coloring (VDPC): (a) the desired viewpoint and perspective voxel grid concepts, and (b) the color consistency check for a particular voxel along a ray from the desired viewpoint image. The arrows extending from the hand toward the cameras depict the reflectance or imaging of the surface (voxel) into or by the cameras.

Figure 8 shows some results. Although we could use various off-the-shelf methods, VDPC allows for more specific constraints. For example, because the reconstructed points are on a 2D view-dependent grid, we can use 2D algorithms for filtering and triangulation. In addition, VDPC assigns a unique color to points estimated to be on a surface’s edges, letting us “break” meshes at edges and helping us identify holes within the reconstructed shapes.

Our postprocessing filtering and triangulation software applies three steps in order (a sketch of the steps appears after the list):

1. Median filtering. The software uses a 5 × 5 median filter to smooth the z values. For each sample, the software inspects the 24 grid locations in the surrounding 5 × 5 x-y square. If at least three of these locations contain samples (including edge points), we replace the original sample’s z value with the median value.

2. Hole filling. Exploiting the edge-color contouring, the software fills holes by examining neighboring samples. For each unpopulated sample located in a hole, the software uses a 3 × 3 box filter to average both color and depth values from up to eight neighboring samples.

3. Triangulation. Finally, the software examines the array in square groups of 2 × 2 samples. If any group is fully populated, we can use two possible triangulation techniques. We pick the one that minimizes the color difference along the diagonal.
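The sketch below outlines the three steps on an assumed grid representation (an H × W depth array with NaN for missing samples plus an H × W × 3 color array); thresholds and border handling are simplified, and holes are assumed to lie in the interior of the grid.

```python
# Sketch of the three postprocessing steps on an assumed grid representation.
import numpy as np

def median_filter_z(z):
    """Step 1: smooth depths with a 5 x 5 median filter, applied only where at
    least three of the 24 surrounding grid locations are populated."""
    out = z.copy()
    H, W = z.shape
    for y in range(2, H - 2):
        for x in range(2, W - 2):
            if np.isnan(z[y, x]):
                continue
            window = z[y - 2:y + 3, x - 2:x + 3].ravel()
            populated = window[~np.isnan(window)]
            if len(populated) - 1 >= 3:          # exclude the center sample
                out[y, x] = np.median(populated)
    return out

def fill_holes(z, color, hole_mask):
    """Step 2: fill interior holes by 3 x 3 box-averaging depth and color
    from up to eight populated neighbors."""
    zf, cf = z.copy(), color.copy()
    for y, x in zip(*np.nonzero(hole_mask)):
        zn = z[y - 1:y + 2, x - 1:x + 2]
        cn = color[y - 1:y + 2, x - 1:x + 2]
        valid = ~np.isnan(zn)
        if valid.any():
            zf[y, x] = zn[valid].mean()
            cf[y, x] = cn[valid].mean(axis=0)
    return zf, cf

def triangulate(z, color):
    """Step 3: emit two triangles per fully populated 2 x 2 cell, splitting
    along the diagonal whose endpoint colors differ least."""
    tris = []
    H, W = z.shape
    for y in range(H - 1):
        for x in range(W - 1):
            if np.isnan(z[y:y + 2, x:x + 2]).any():
                continue
            a, b, c, d = (y, x), (y, x + 1), (y + 1, x), (y + 1, x + 1)
            if np.abs(color[a] - color[d]).sum() <= np.abs(color[b] - color[c]).sum():
                tris += [(a, d, b), (a, c, d)]    # split along the a-d diagonal
            else:
                tris += [(a, c, b), (b, c, d)]    # split along the b-c diagonal
    return tris
```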

Authoring
Our current approach to authoring includes a novel combination of 2D and 3D interaction techniques. The primary motivation for a hybrid 2D/3D approach is to provide a familiar and tangible means of sketching (the notepad paradigm) while simultaneously offering a natural and immersive means of viewing the dynamic 3D data and the evolving IEBook.

Figure 5c illustrates the authoring process. Figure 9a shows an author annotating an IEBook in the VR-Cube, and Figure 9b is a screen shot of the authoring interface on the Tablet PC.



Figure 8. Filtering and triangulation: (a) triangulation without filtering, (b) triangulation with filtering, (c) filtered points colored in green and filled holes in red, and (d) absolute filter correction values as a height field. The highest spikes correspond to pointy outliers in (a).

Figure 9. Authoring an IEBook: (a) an author in the VR-Cube and (b) a screen shot of the interface on the Tablet PC.

Using VCR-like time controls (see the right side of Figure 9b), the author navigates time in the captured sequence, looking for an interesting or important event. The author moves to a viewpoint where he or she has a good view of the surgeon’s actions at that moment, and takes a snapshot of it using a button on the Tablet PC. Using the same Tablet PC interface, the author can highlight features, annotate the snapshot, and save the results to a virtual gallery in the IEBook. The author can arrange the snapshots hierarchically by dragging their titles on the Tablet PC application.

Unfortunately, conventional methods for selection and highlighting in 2D images do not accommodate depth or time. We therefore adapted some conventional methods to enable the selection and highlighting of dynamic 3D point/mesh data.7 Specifically, we implemented three distinct dynamic 3D highlighting paradigms: marquee, freeform, and fill, which Figure 10 illustrates. The implementations preserve highlights throughout a sequence and are adaptive, reacting to changes in the 3D topology over time.

To create a marquee highlight, the author drags a rectangle (or oval) around an area of interest in a 2D snapshot. The authoring application renders the selection into the current frame of the dynamic 3D data, darkening all points outside the selected area by an adjustable amount. The effect is akin to a square (or oval) spotlight illuminating the region of interest. The author can make as many selections as desired in the current frame. When the user presses play, the authoring software attempts to highlight the same regions in subsequent frames until the animation pauses. We also implemented a surface-specific variation that only highlights actual surfaces within the marquee region, omitting holes or background regions.

The freeform highlight is a paintbrush-style tool for creating colored marks on a fixed surface within the point cloud. To create a freeform highlight, the author clicks and drags the mouse to create a mark in the tablet display. When the author releases the mouse, the system renders the highlight onto the current frame as a tint of the selected color. When the author highlights subsequent frames, portions of the highlight are obscured whenever their corresponding surface points are obscured. The freeform highlight is surface specific in that the highlight’s location is based entirely on the surface points’ positions.

The fill highlight is similar to a “paint bucket” tool. It applies a tint of the current highlight color to a group of contiguous points around a selected location, all of which have similar depth and color values. Like the freeform highlight, the fill is surface specific. Unlike the freeform highlight, however, the fill highlight can shrink and stretch, letting it compensate for small amounts of movement in the highlighted region.

The IEBook stores the highlight information as a series of 2D Boolean matrices indicating which vertices should be highlighted in each frame. Currently, we have 10 independent highlight layers, and can assign each to a different color. We can also hide the layers, letting the user see the results with or without their effects.
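A minimal sketch of this bookkeeping follows; the grid dimensions, the blend factor, and the small API are illustrative assumptions rather than the IEBook’s actual data structures.

```python
# Minimal sketch of per-frame, per-layer highlight masks over the
# view-dependent vertex grid. Sizes and API are illustrative assumptions.
import numpy as np

N_LAYERS, GRID_H, GRID_W = 10, 480, 640

class HighlightLayers:
    def __init__(self, n_frames):
        # masks[frame, layer] marks which vertices are highlighted in that frame
        self.masks = np.zeros((n_frames, N_LAYERS, GRID_H, GRID_W), dtype=bool)
        self.colors = [(1.0, 1.0, 0.0)] * N_LAYERS    # per-layer tint (RGB, 0-1)
        self.visible = [True] * N_LAYERS              # layers can be hidden

    def set_highlight(self, frame, layer, vertex_mask):
        self.masks[frame, layer] = vertex_mask

    def tint(self, frame, vertex_colors):
        """Blend visible layers' tints into an (H, W, 3) vertex color array."""
        out = vertex_colors.astype(float).copy()
        for layer in range(N_LAYERS):
            if not self.visible[layer]:
                continue
            m = self.masks[frame, layer]
            out[m] = 0.5 * out[m] + 0.5 * 255.0 * np.array(self.colors[layer])
        return out
```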

Content viewing
Our primary paradigm for experiencing an IEBook is head-tracked stereo 3D, in which a student uses the VR-Cube and a subset of the authoring interface. This IEBook environment combines immersive 3D imagery with Tablet PC-based navigation and a hierarchy of annotated snapshots.

A student views the snapshot gallery in the VR-Cube, holding the Tablet PC as in Figure 9.


Figure 10. Examples of our dynamic 3D highlighting paradigms: (a) marquee, (b) freeform, and (c) fill. The tip of a freeform arrow in (b) is partially occluded by the surgeon’s hands, illustrating the 3D nature of the highlights. Note that this data set has not been filtered and triangulated.

The user can navigate to a specific point in time using the VCR-like controls or by choosing a snapshot in the hierarchy. A blue frame provides feedback about the active snapshot during the selection process. When the user selects a snapshot, the model jumps to the time associated with it.

We built the software for our hybrid display system prototype (see the “Visualizing Procedures Up Close and All Around” sidebar) on the same scene graph software used in the Barco TAN VR-Cube implementation. However, we added a rendering mode to allow rendering both in a head-mounted display and with projectors illuminating disjoint and casually arranged (not in a cube) planar surfaces. (We currently use this equipment to show only the dynamic 3D points/meshes from the reconstruction phase of content creation and not the results of authoring.)

Although we believe the head-tracked stereo 3D systems and VR-Cube provide the greatest sense of immersion, they are typically available only in visualization labs. So that more people can experience our system, we made it available on the Web. In consultation with our medical partners, we chose three media: anaglyphic stereo images, anaglyphic movies, and nonstereo but dynamic (in space and time) VRML models. We developed software to generate anaglyphic stills and movies from the 3D point/mesh data and separate software that creates “boxed” VRML data sets. Each of the latter features a 3D VCR-like user interface to play and step through a sequence, including a slider bar for random access similar to typical computer-based movie players.

Technical results
Figure 11 shows several views of a dynamic reconstruction of Bruce Cairns performing a mock procedure to manage blunt liver trauma (from Figure 2b), and another coauthor manipulating Rhesus monkey and human skulls.

Figure 12 shows a preliminary IEBook in the VR-Cube. The six annotated snapshots correspond to six (mock) items of interest. The third item is selected and playing. Zoom boxes magnify the imagery in the screen capture for several items. (The annotations are clear to an actual user, but do not reproduce well in a relatively small image.)


Figure 11. Reconstruction results: (a) a mock procedure to manage blunt liver trauma, (b) manipulating a Rhesus monkey skull, and (c) a human skull from a 12-year-old child. The human skull is a teaching artifact that can be separated into multiple parts. Both skull reconstruction sequences include simulated dynamic shadows.


One of the skills most needed by new surgeons, yet difficult to learn, is suturing. Knot-tying skills in particular can be critical: the wrong knot can cause a wound to open or result in tissue damage. The surgeons we know agree that 2D knot-tying images and movies are woefully inadequate. We therefore attempted to acquire and reconstruct some fundamental knot-tying demonstrations. We can reconstruct hands and moderately thick rope, and have done so for a few basic surgical knots as well as some common sailing knots. Figure 13 shows an anaglyphic stereo movie and the VRML interface of a ring knot tying reconstruction.

Results showing basic sailing knots and common surgical knots are available at http://www.cs.unc.edu/~ootf/Projects/ebooks/reconstructions/. We will continue to add reconstruction results as we obtain them.

User reactions
We have shown several surgeons our lab-based (stereo and head-tracked) and Web-based (movies and VRML) reconstructions (such as those in Figure 11, not the complete IEBook results in Figure 12). The surgeons’ reactions have been very positive.

Most of the surgeons cited size, resolution, and visibility of the reconstructions as the primary limitations, which is no surprise to us at the project’s relatively early stage. We did not optimize the choice and arrangement of cameras in our camera cube (see Figure 2) for any particular procedure. Rather, we based our choices on what was practical for a proof-of-concept system supporting reconstructions of modest detail throughout a modest working volume.

As the “Visualizing Procedures Up Close and All Around” sidebar explains, both the procedure and the surrounding events are important. Achieving appropriate reconstruction resolution is a huge challenge. Ideally, we would have 3D reconstructions on the order of millimeters over the entire operating room, which might be 5–10 meters per side (125–1,000 cubic meters). Because this is currently impractical, near-term future challenges will include choosing camera configurations and algorithms that can achieve high-resolution reconstructions where most critical, and relatively low-resolution reconstructions everywhere else.


Figure 12. Screen shot from the VR-Cube showing several annotated snapshots arranged in an IEBook virtual gallery. Zoom boxes magnify three of the snapshots, and the blue frame and red marker indicate the current selection.

Figure 13. Ring knot tying: (a) anaglyphic stereo movie (“red over right”), and (b) VRML interface. The movie includes simulated dynamic shadows; the VRML model does not.


Future work
Despite our achievements over several years, we are still in the early stages of a long-term effort. Many problems remain to be solved, and there is much work to be done to achieve the scale, fidelity, flexibility, and completeness that we envision.

In the near term, we will continue to reconstruct and process various objects and mock procedures. For example, we are procuring some rapid-prototype models of a real skull for a mock sagittal synostosis procedure. Soon we hope to provide anaglyphic stereo 3D data sets on the Web, letting viewers control the viewpoint and time. We are trying to solve problems with the network connection to the Tablet PC used for authoring and viewing. Although a wireless paradigm is clearly desirable, the VR-Cube hardware seems to interfere with the wireless connection.

A long-term primary goal is to scale our system up to enable the acquisition and reconstruction of (at least) a surgical table, the surrounding area, and the involved medical personnel. In fact, within several years we plan to equip an intensive care unit with numerous heterogeneous cameras to capture a variety of medical procedures. This involves solving nontrivial problems such as resolution, visibility, computational complexity, and massive model management. To assist with camera placement, we are working on a mathematical and graphical tool to help estimate and visualize acquisition information/uncertainty throughout the acquisition volume for a particular candidate set of cameras.

In light of our goals for fidelity and scale, we continue to improve our existing methods for 3D reconstruction and to investigate new methods. For example, we might combine pre-acquired laser scans of an operating room with camera-based dynamic reconstructions so we can better allocate cameras for dynamic events. We are also investigating possibilities for capturing real-time data from medical monitoring equipment, and plan to include it as metadata in an IEBook.

Whereas in the past we chose to hold off on audio acquisition, instead attempting to address the more difficult problems related to visual reconstruction, we plan to acquire audio as well.

We are also rethinking our choice of a Tablet PC as the primary interface to the immersive authoring system. The tablet display is hard to read while wearing stereo glasses, and both hands are typically busy holding the tablet and the stylus, so removing the stereo glasses when necessary is not an attractive solution. In our current paradigm, authors choose a view with their heads (looking at the data of interest) while trying to use the tablet stylus to initiate snapshots, which is awkward. One possibility is to track the Tablet PC so users can choose snapshots in a “viewfinder” mode.8

Finally, we hope to increase the impact on the medical community by making complete IEBooks available on the Web. The primary difficulty here is in determining which interaction techniques are appropriate and how to implement them. Rather than simply “dumbing down” the fully immersive interfaces, we want to use the best interfaces for each paradigm and authoring tools that appropriately target each.

Acknowledgments
At the University of North Carolina, we acknowledge Marc Pollefeys for VDPC collaboration; Jim Mahaney and John Thomas for technical support; and surgeons Ramon Ruiz and Anthony Meyer for general collaboration. At Brown University, we thank Melih Betim and Mark Oribello for their systems and video support.

This research was primarily supported by US National Science Foundation Information Technology Research grant IIS0121657, and in part by US National Library of Medicine contract N01LM33514 and NSF Research Infrastructure grant EIA0303590.

References
1. H. Fuchs et al., “Virtual Space Teleconferencing Using a Sea of Cameras,” Proc. 1st Int’l Symp. Medical Robotics and Computer Assisted Surgery, Shadyside Hospital, Pittsburgh, 1994, pp. 161-167.
2. A. van Dam et al., “Immersive Electronic Books for Teaching Surgical Procedures,” Telecomm., Teleimmersion, and Telexistence, S. Tachi, ed., Ohmsha, 2002, pp. 99-132.
3. J.Y. Bouguet, “Camera Calibration Toolbox for Matlab,” http://www.vision.caltech.edu/bouguetj/calib_doc/index.html.
4. W. Press et al., Numerical Recipes in C: The Art of Scientific Computing, 2nd ed., Cambridge Univ. Press, 1993.
5. R. Yang, View-Dependent Pixel Coloring: A Physically Based Approach for 2D View Synthesis, doctoral dissertation, Univ. of North Carolina at Chapel Hill, 2003.
6. K. Kutulakos and S. Seitz, “A Theory of Shape by Space Carving,” Int’l J. Computer Vision, vol. 38, no. 3, 2000, pp. 199-218.
7. D.M. Russo, “Real-Time Highlighting Techniques for Point Cloud Animations,” master’s thesis, Brown Univ., 2004.
8. M. Tsang et al., “Boom Chameleon: Simultaneous Capture of 3D Viewpoint, Voice, and Gesture Annotations on a Spatially Aware Display,” Proc. 15th Ann. ACM Symp. User Interface Software and Technology, ACM Press, 2002, pp. 111-120.

Greg Welch is a research associate professor of computer science at the University of North Carolina (UNC) at Chapel Hill. His research interests include tracking and sensing systems for virtual environments and 3D telepresence. Welch has a PhD in computer science from UNC-Chapel Hill. He is a member of the IEEE Computer Society and the ACM.

Andrei State is a senior research scientist in the Department of Computer Science at UNC-Chapel Hill. His technical interests include 3D graphics and mixed reality. State has a Dipl.-Ing. degree from the University of Stuttgart and a master’s degree in computer science from UNC-Chapel Hill.

Adrian Ilie is a graduate research assistant in the Department of Computer Science at UNC-Chapel Hill. His research interests include camera placement for 3D reconstruction, photometric calibration, and image processing. Ilie has an MS in computer science from UNC-Chapel Hill.

Kok-Lim Low is a doctoral candidate in the Department of Computer Science at UNC-Chapel Hill. His research interests include projector-based rendering, augmented reality, and 3D imaging and modeling using active range sensing. Low has an MS in computer science from UNC-Chapel Hill.

Anselmo Lastra is an associate professor of computer science at UNC-Chapel Hill. His research interests are computer graphics, specifically image-based modeling and rendering, and graphics hardware architectures. Lastra has a PhD in computer science from Duke University.

Bruce Cairns is an assistant professor at UNC-Chapel Hill and director of the North Carolina Jaycee Burn Center. His research interests include trauma, burns, critical care, cellular immunology, and shock. Cairns has an MD from the University of Pennsylvania.

Herman Towles is a senior research associate in the Department of Computer Science at UNC-Chapel Hill. His research interests include graphics hardware architecture, large-format projective displays, 3D scene reconstruction, and telepresence. Towles has an MS in electrical engineering from the University of Memphis.

Henry Fuchs is the Federico Gil Professor of Computer Science, adjunct professor of biomedical engineering, and adjunct professor of radiation oncology at UNC-Chapel Hill. His research interests include computer graphics and vision and their application to augmented reality, telepresence, medicine, and the office of the future. Fuchs has a PhD in computer science from the University of Utah. He is a member of the National Academy of Engineering and a Fellow of the ACM and of the American Academy of Arts and Sciences.

Ruigang Yang is an assistant professor in the Computer Science Department at the University of Kentucky. His research interests include computer graphics, computer vision, and multimedia. Yang has a PhD in computer science from UNC-Chapel Hill. He is a recipient of the NSF Career Award, and is a member of the IEEE and the ACM.

Sascha Becker is a research scientist for the Brown Computer Graphics Group and a senior software engineer at Laszlo Systems in San Mateo, California. Her work focuses on using technology as a tool for art, education, and play. Becker has a BA in computer science from Brown University. More information is available at http://sbshine.net.

Daniel Russo is a software developer at Filangy, a start-up intent on revolutionizing Web searching. Russo has an ScM in computer science from Brown University.

Jesse Funaro recently earned an ScM in computer science from Brown University, with research in optimization techniques for search result ranking.

Andries van Dam is the vice president for research and professor of computer science at Brown University. His principal research interests are computer graphics, computer-based education, and hypertext and electronic books. Van Dam has a PhD in electrical engineering from the University of Pennsylvania. He is a member of the National Academy of Engineering and a Fellow of the ACM and the IEEE.

Readers may contact Greg Welch at the University of North Carolina at Chapel Hill, Department of Computer Science, CB# 3175, Sitterson Hall, Chapel Hill, NC 27510-3175; [email protected].

