MEDICAL AUGMENTED REALITY SYSTEM FOR IMAGE-GUIDED AND ROBOTIC SURGERY :
DEVELOPMENT AND SURGEON FACTORS ANALYSIS
by
ABHILASH PANDYA
DISSERTATION
Submitted to the Graduate School
of Wayne State University,
Detroit, Michigan
in partial fulfillment of the requirements
for the degree of
DOCTOR OF PHILOSOPHY
2004
MAJOR: Biomedical Engineering (Scientific Computing)
Approved by:
_____________________________
Advisor Date
_____________________________
_____________________________
_____________________________
_____________________________
DEDICATION
To my parents
Dad has shown me that a smile and positive attitude go a long way.
Mom is pure love.
ACKNOWLEDGEMENTS
I would like to start by first acknowledging my closest friend and best
supporter – my wife Alka. Alka has been very patient from start to finish as I was
completing this thesis. I know it has not been easy trying to raise two little ones
while also going to school herself. Thanks Alka for all the love and support.
I would like to thank my advisor, Professor Greg Auner. He has been
very supportive of my work and has always provided me the encouragement to
develop my career with independence. He has pushed me to write proposals
as the PI and, through both direct conversation and his own example, has really
taken me to the next level. Thanks Greg for your confidence and support. I
would like to especially thank Professor Darin Ellis. He has been full of energy
and has helped me tremendously with setting up, conducting and writing up
the Human Factors Studies. I think that we will be involved in many more
studies in the future. Thank you very much Dr. Klein for the detailed
comments on the thesis. Your pragmatic and insightful comments have kept
me well-grounded in “reality”. Dr. Robert Erlandson, thanks for the reviews
and comments on my thesis. Jim Maida, I must thank not just for the software
(mathlib.c) help and consultation on this thesis, but for much, much more.
Jim, over the 14 years that I have known you (since the days in 1989 at the
space center in Houston), I have learned so much from you. Thanks for being
a fantastic NASA Technical monitor, a great friend/mentor and the best PUNdit
around!
There are several colleagues that I must recognize. Mohammad Siadat
has been my friend, colleague and teacher. I have learned a lot from him and
he has contributed immensely to this work. I cannot thank you enough. Dr.
Qinghang Li, I also wish to thank you for all the hard work we did together on
the Neuromate Robot. You have been a good friend and tremendously
supportive as I was learning the trade of how to be a Neurosurgical Engineer
in the OR. Dr. Gong was my first teacher in the Neurosurgery department and
he shared his knowledge. He has also helped me immensely with software
issues and other insights for this work. I would also like to thank Dr. Lucia
Zamorano for hiring me and introducing me to the field of Stereotactic
Neurosurgery. I feel that I have grown intellectually as well as emotionally
while working at the Neurosurgery department. Dr. Enrico Marchese, I would
like to thank you for the help in the head-up display vs. monitor study. I still
think that we should have stated that we used your children’s operation game
in our paper instead of a “metal plate that registered the touch of the
endoscope”. I would also like to thank our summer co-ops and GRAs Mark
Hanna, Elizabeth Klein and Mohammad Kalash for help on this project.
Lastly, but never least, I would like to thank my family. My uncle Piyush
has been very supportive of my education from when I was very young, as has my
grandfather, who taught me Algebra and Geometry. To my Mom and Dad,
words cannot express my gratitude, and to Keena (what is this universe?) and
Maya (illusion) for providing me with joyous and comic relief when I needed it
the most.
TABLE OF CONTENTS
DEDICATION..............................................................................................................................................II
ACKNOWLEDGEMENTS....................................................................................................................... III
LIST OF FIGURES ................................................................................................................................ VIII
CHAPTER 1: INTRODUCTION AND MOTIVATION.......................................................................1
1.1 MOTIVATION AND PROBLEM STATEMENT.....................................................................................5
1.2 RESEARCH OBJECTIVE AND SPECIFIC AIMS. .................................................................................7
1.3 OUTLINE OF THE THESIS ...............................................................................................................8
CHAPTER 2: BACKGROUND.............................................................................................................10
2.1 IMAGE GUIDED SURGERY -- CURRENT TECHNOLOGY IN THE OPERATING ROOM. .......................11
2.2 WHAT’S THE DIFFERENCE BETWEEN AUGMENTED REALITY AND VIRTUAL REALITY? ...............12
2.3 WHY IS VISUALIZATION IMPORTANT FOR MEDICAL ROBOTICS? .................................................15
2.4 MEDICAL IMAGING, SEGMENTATION AND 3D MODEL CREATION...............................................21
2.4.1 Medical Imaging Data...........................................................................................................22
2.4.2 Methods of Segmentation.......................................................................................................26
2.4.3 From Segmentation to 3D Model Creation............................................................30
2.5 WHAT IS THE IMPORTANCE OF HUMAN FACTORS IN MEDICINE AND ENGINEERING? ..................39
CHAPTER 3: IMAGE GUIDED SURGERY (IGS) ............................................................................41
3.1 LITERATURE REVIEW AND DESCRIPTION OF SYSTEM..................................................................41
3.2 IMPLEMENTATION OF IMAGE GUIDANCE SYSTEM.......................................................................44
3.2.1 Passive Robot Arm Used as the Tracking system ..................................................................44
3.2.2 Patient Registration with Fiducial Mapping .........................................................................48
3.2.3 Software Architecture............................................................................................................57
3.3 DISCUSSION ................................................................................................................................60
CHAPTER 4: MEDICAL AUGMENTED REALITY SYSTEM (MARS) ........................................62
4.1 LITERATURE REVIEW AND DESCRIPTION OF SYSTEM..................................................................62
4.1.1 Medical Augmented Reality...................................................................................................66
4.1.2 Research in Camera Calibration...........................................................................................72
4.1.3 AR in Telepresence ................................................................................................................72
4.1.4 Live View with Registered Data ............................................................................................73
4.2 IMPLEMENTATION OF AUGMENTED REALITY...............................................................74
4.2.1 Coordinate Systems ...............................................................................................................75
4.2.2 Robotic-based Tracking of the Camera.................................................................................76
4.2.3 Computing the Pose of the Camera Relative to the End-Effector (TEE-C) ..............................78
4.2.4 Camera Calibration Used to Determine TP-C ........................................................................81
4.2.5 How to measure AR accuracy ...............................................................................................86
4.2.6 Software Architecture............................................................................................................90
4.3 AR SYSTEM ACCURACY.......................................................................................................91
4.3.1 Accuracy of the Microscribe..................................................................................................92
4.3.2 Accuracy of Camera Calibration...........................................................................................93
4.3.3 Total Application Error Dependencies..................................................................................94
4.4 DISCUSSION ............................................................................................................................97
CHAPTER 5: SURGEON FACTOR TESTING ................................................................................101
5.1 IMAGE GUIDED SURGERY VS. AUGMENTED REALITY—THE HUMAN FACTORS........................102
5.1.1 Introduction/Motivation ........................................................................................103
5.1.2 Method.................................................................................................................................104
5.1.3 Results .................................................................................................................................108
5.1.4 Conclusions .........................................................................................................................112
5.2 HUMAN FACTORS TESTING (HEADS-UP DISPLAY VS. MONITORS).........................114
5.2.1 Introduction/Motivation ......................................................................................................114
5.2.2 Method.................................................................................................................................117
5.2.3 Results .................................................................................................................................119
5.2.4 Conclusions .........................................................................................................................122
CHAPTER 6: SUMMARY, CONTRIBUTIONS AND FUTURE WORK ......................................124
6.1 SUMMARY/CONTRIBUTIONS......................................................................................................124
6.2 FUTURE WORK..........................................................................................................................127
6.2.1 Stereo Augmentation ...........................................................................................................128
6.2.2 Sensor Technology/Data at the End Effectors of Robots.....................................................128
6.2.3 Continuous Zoom Camera Calibration ...............................................................................129
6.2.4 3D Ultrasound for AR .........................................................................................................130
6.2.5 Space Station Robotics (to infinity and beyond)..................................................................130
APPENDIX ................................................................................................................................................132
BIBLIOGRAPHY .....................................................................................................................................136
ABSTRACT ...............................................................................................................................................154
AUTOBIOGRAPHICAL STATEMENT................................................................................................156
LIST OF FIGURES
Figure 1-1: The surgeon uses tools (shown on the right), and needs to visualize that data
using advanced display technology (as shown on the left). The highlighted portions of
the figure are the focus of this thesis. ................................................................................. 4
Figure 1-2: (a) is typical data displayed during Image Guided Surgery. It represents a
Virtual Environment. Notice that the vessel is represented by two dots in the axial
slice and the cube is represented by a triangle in the coronal slice. (b) is 3D
geometry data (models) registered and displayed on a live video view of the same
phantom viewed with the AR prototype system developed in this thesis. This represents
an Augmented Reality view. Note the difference between AR and VR and the associated
differences in visualization. ................................................................................................ 6
Figure 2-1: The foundation technologies behind both Augmented Reality and Image
Guidance which will be discussed in this chapter. At the heart of this thesis is a Human
Factors Analysis that compares Augmented Reality and Image Guidance technology. .. 10
Figure 2-2: Two different forms of "Virtual Reality": On the left, a tracked tool’s pose is
displayed in a 3D image and orthogonal slices of a CT scan of the phantom brain. This
system was a technology developed as part of this thesis. On the right, the tracked user of
the VR system becomes a model of a space-suited astronaut performing tasks on a 3D
model of the space station. Note that in both systems, the user is viewing a virtual world.
........................................................................................................................................... 13
Figure 2-3: This is an example of an Augmented Reality Scene. The live view of the
phantom is augmented with virtual objects derived from CT scans of the phantom brain.
This augmentation overlays the actual objects with 3D wireframe models of the actual
objects. This technology (also developed as part of this thesis) uses the same tracking
device as the image guided system (See Figure 2-2a) but uses the end-effector mounted
camera to generate the AR scene. ..................................................................................... 16
Figure 2-4: Surgical site (right) and the remote surgeon site (left) for the master/slave
Zeus Robot (Computer Motion inc.). The surgeon is using hand controllers and voice
recognition to control the three arms of the laparoscopic instrumentation. He relies on
raw video images from the endoscopic camera to perform his operation. It is the premise
of this thesis that advanced forms of visualization may increase the surgeon’s
performance. Picture with permission from Computer Motions Inc. .............................. 16
Figure 2-5: Neurosurgical Robotic Device (Neuromate, Integrated Surgical Systems Inc.).
Here the robotic device holds a tool for the surgeon at a very precise position and
orientation, and the surgeon can then perform a biopsy of the patient’s tumor. The
biopsy needle can be tracked on an image guided system (shown on the right) to allow
the surgeon to know when the target is achieved and that no other important structures
(like vessels) are in the way. Augmentation techniques could easily be added to such a
system. .............................................................................................................................. 20
Figure 2-6: The process of segmentation and model generation: (a) A plastic phantom
skull with simple objects fixed inside the skull was scanned with a CT scanner at 2mm
slices. (b). This is one coronal slice of the raw CT data. (c) This is a label-map of that
one slice overlaid on the raw CT image. (d) This is the 3D model generated using the
Marching Cubes algorithm after processing all the CT slice data. It shows a transparent
skin model through which the internal front view of the phantom is visible. .................. 24
Figure 2-7: 3D models generated from CT scans of our phantom skull. Each patch of the
3D model contains a corresponding file item which specifies its vertex list and edge list.
........................................................................................................................................... 32
Figure 2-8: Data Flow for an AR Scene Synthesis. .......................................................... 35
Figure 2-9: Fiber Optics technology that was studied as a possible candidate for AR/IGS
tracking. ............................................................................................................................ 38
Figure 2-10: Successful technology development for medicine must include extensive
user testing and surgical feedback. ................................................................................... 40
Figure 3-1: The three planes of the image data set (Axial, Coronal, and Sagittal). ............ 43
Figure 3-2: In this experiment, we captured the same point in space using different arm
configurations. We found that there is a 0.91 mm standard deviation. The red line
represents the average. ...................................................................................................... 47
Figure 3-3: This is the transformation structure of a typical robotic device. The green
arrows show the degrees of freedom of the arm. The blue labels (with red arrows) show
the transformations needed to compute the Base to End-effector Transform. ................. 47
Figure 3-4: The transformation between the patient coordinate system and the image
coordinate system can be represented by a translation vector (T) and a Rotation vector
(R). These entities form the homogeneous 4x4 transformation matrix. .............................
Figure 3-5: Fiducial markers (visible in CT scans) are applied to the phantom before the
scan. .................................................................................................................................
Figure 3-6: A Fiducial marker in the imaging space is located on all three slices of the CT
scan and is displayed (in blue) on the 3D model. ............................................................. 52
Figure 3-7: On the left, the fiducial marker on the actual skull is being digitized by the
robotic articulated arm. This point corresponds to the one shown on the 3D model (right).
........................................................................................................................................... 53
Figure 3-8: Parameter estimation for pair-point matching algorithm. The paired points
(fiducials in image coordinates and patient coordinates) are used to derive a rigid body
transformation that will translate a vector in patient space to a vector in image/3D model
space.................................................................................................................................. 54
Figure 3-9: This is the model for coordinate transform estimation. The points at the input
of the model are converted to a new coordinate system using the given estimated
parameters. ........................................................................................................................ 56
Figure 3-10: A surgeon is using an image-guided system where the tool that she is using
is tracked by an infrared camera system and displayed on the orthogonal slices of the
preoperative MRI scan...................................................................................................... 57
Figure 3-11: The image guidance system that is used in the OR was re-implemented in this
thesis to use the same articulated-arm tracker as the AR system, as a testbed for evaluation
of this current technology against the up-coming AR technology........................
Figure 3-12: Software implementation of the Image Guidance System. The system is
implemented as a client-server system in which multiple clients anywhere on the internet
can view the scene as seen by the main client. ................................................................. 59
Figure 3-13: This is the Tracker Server interface. This software does the pair-point
matching on the image data and also handles communication of various tracking
information to multiple clients on the network................................................................. 61
Figure 4-1: An AR scene can be generated by the alignment of the camera’s trajectory
and the 3D graphics virtual camera’s trajectory. Once the two cameras are aligned, the
actual objects will match their 3D modeled replicas. ....................................................... 64
Figure 4-2: The precursor of Augmented Reality/Augmented Robotics. We use the
Microscribe as the tracked tool (a). The position and orientation of the end-effector are
shown on the orthogonal slices and 3D model of the phantom skull. After adding a
calibrated and registered camera (b), we can generate a monoscopic Robotic
Augmentation Scene (c)....................................................
Figure 4-3: These are the steps needed to generate both a Neuro Navigation (NN) System
and an Augmented Reality System. Note that AR represents an extension to
Neuronavigation and can be performed simultaneously with NN.................................... 75
Figure 4-4: A series of coordinate systems for the AR development ............................... 76
Figure 4-5: The transformations needed to compute an AR scene. The main transform
(TEE-C, the transform from the end-effector to the camera coordinate system) is the
primary transformation matrix that is computed for AR. ..................................................
Figure 4-6 Camera Calibration Model. Objects in the World Coordinate System need to
be transformed using two sets of parameters - extrinsic and intrinsic parameters. .......... 82
Figure 4-7: Camera Parameter Estimation. An initial guess of the extrinsic parameters
comes from the DLT method. The observed CCD array points and the corresponding
computed values are compared to determine if they are within a certain tolerance. If so,
the iteration ends ............................................................................................................... 86
Figure 4-9: A cube is augmented on the live video from the Microscribe. Three
orthogonal views are used to compute the error: (A) represents a closeup view of the
pointer (the known location of the cube corner) with the video camera on the x axis, (B)
the pointer here is viewed from the y axis and (C) the pointer viewed from the z axis.
(D) represents an oblique view of the scene which shows the pointer, the camera and the
video view on the monitor showing the cube superimposed. ........................................... 89
Figure 4-10: Software implementation of the AR system with the Image Guidance
System. Here, the Kinematic server supplies position orientation to both the AR and IGS
clients. Each client has their own models of display preloaded. With this software
architecture, simultaneous AR and IGS are possible........................................................ 91
Figure 4-11 The errors of the distorted image. The contours represent error boundaries.
(left) Note that for radial distortion at the center there are less than 5 pixels of error and at
the corners the errors exceed 25 pixels of error. The Tangential distortion is an order of
magnitude less than the radial distortion. ......................................................................... 94
Figure 4-12 Errors contributing to the total error involved in Augmented Reality .......... 97
Figure 5-1: Computer Motion (Zeus Robot) provides either a Heads-up Display view of
the surgical site or a monitor view. The Davinci robot provides an immersive
stereoscopic view of the remote video. Which configuration provides the best
performance?................................................................................................................... 102
Figure 5-2: This figure shows a screen shot of a subject looking at the live video view of the
skull overlaid with the 3D graphics objects on the monitor. Their marker is then placed
on the edges of the overlaid object and the object is traced on the surface of the draped
skull................................................................................................................................. 107
Figure 5-3: This figure shows screen shots of a subject looking at the orthogonal slices in
an image guidance system to find the extents and shape of the object for which the skull
opening has to be made................................................................................................... 107
Figure 5-4: Errors made by the subjects during the testing period. ................................ 109
Figure 5-5: The time required for each subject to complete the craniotomy task and
answer questions. .......................................................
Figure 5-6: Different hardware methods to display a remote camera view.................... 116
Figure 5-7: The phantom skull with a black track of velcro and several blue plastic
pieces of tubing that the subject was asked to remove through the opening in the foam.
......................................................................................................................................... 118
Figure 5-8: A subject performing the experiment using the monitor and the heads-up
display. ............................................................................................................................ 118
Figure 5-9: Time for testing using the HUD subtracted from time for testing using the
Monitor. .......................................................................................................................... 119
Figure 5-10: Average time to perform the same task for 15 subjects for the Heads Up
Display, a monitor placed directly in front of the subject and a monitor placed at a 45
degree angle from the subject's task. ...............................................................................
Figure 5-11: Average number of errors to perform the same task for 15 subjects for the
Heads Up Display, a monitor placed directly in front of the subject and a monitor placed
at a 45 degree angle from the subject's task..................................................................... 121
Figure 5-12: This is the t-statistic analysis that was done on the data. Note that the two
monitor conditions 0 and 45 degrees have a very high p value indicating that there is no
statistical significance. When comparing the 45 degree monitor with the HUD, there is
statistical significance at alpha = 0.05, and the monitor at 0 degrees is on the borderline of
significance. Note the break in the y axis to show the difference in magnitude............. 121
Chapter 1: INTRODUCTION AND MOTIVATION
Knowing something, seeing something, enquiring into something in different
ways from different angles is insight. - The Buddha
In medicine, Intelligence Amplification (IA) (Azuma R. 1997) describes the use of
computers and other associated technology to gain insight about the state of the patient
and to make tasks easier to perform (Ayache 1995). Engineering has many tools
(Pransky 2001) that are proving to be extremely useful throughout the medical field.
Medical robotics (Burkart A. and Fu FH 2001;Cleary K. 2001) is beginning to
demonstrate tremendous potential for surgeons to improve their performance. One
important application of robotics is to augment the surgeon’s motor performance
(Riviere et al. 2003), especially in performing small delicate tasks, by tremor filtration
and motion scaling. The current clinical application is in minimally invasive surgery
(MIS), also known as laparoscopic, thoracoscopic, or endoscopic surgery. Minimally
invasive surgery in many cases results in less tissue trauma, less scarring, less pain,
and a quicker return to normal activities for patients (Taylor et al. 1999;Taylor et al.
2003). The surgical robots give the surgeon a wrist at the end of the MIS
instrument, which has no wrist when used without the robot (Taylor and Stoianovici 2003).
However, MIS poses its own problem: the surgeries become more difficult to perform. In MIS the
magnification and therefore the size of the field of view changes with the proximity of
the scope to the objects being viewed (Burkart A. and Fu FH 2001). Because of the
small incisions and camera view, the surgeon is no longer able to see inside the patient
directly. Visualization is critical for these systems that use a robotic interface as the
surgeon typically operates from a remote location and relies almost entirely on indirect
limited field-of-view video of the surgery (Ayache 1995;Burkart A. and Fu FH
2001;Cleary K. 2001).
During complex operations, a surgeon must maintain a precise sense of three-
dimensional anatomical relationships (Bucholz et al. 2001). It is of great importance to
see what is usually hidden. Hence, imaging is especially critical in medicine. Surgeons
can now "see" on a 3Dimentional image where their tracked surgical tools are with
respect to the lesion responsible for the patient's problems. A relatively new field, image
guided surgery, blends the use of computer-based medical imaging data with real-time
instrument position data capture to assist the surgeons in localizing and removing
lesions. This technology is now starting to be used in several branches of surgery (e.g.
Neurosurgery, Spinal surgeries (Holly and Foley 2003), Orthopedic surgery (DiGioia
1998), Dental surgery, and even some examples of general surgery (Cash et al. 2003)).
Computer based medical imaging has revolutionized surgery. Relatively new imaging
modalities (to be discussed later) help pinpoint specific structures and substructures. Newer
imaging modalities such as ultrasound, Computed Tomography (CT) scan, Magnetic
Resonance Imaging (MRI), and Positron Emission Tomography (PET) scan go even
further in demonstrating specific structures such as nerves and blood vessels (Roberts
et al. 2001). In the case of functional MRI (fMRI) even variable function within a single
structure can often be mapped. Technology that integrates and brings this imaging and
sensor information to the surgeon in real-time as she performs procedures will add new
dimensions to what can be done to diagnose and treat patients (Nakao et al.
2003;Samset and Hirschberg 2003).
There are two different types of visualization technology that are being analyzed
here for the medical domain: Augmented Reality (AR) and Virtual Reality (VR) (A.
Pandya 2003;Pandya A.K. ;Pandya A.K. 2000a;Pandya A.K. 2001a;Pandya A.K. 2002).
Image guidance is an example of Virtual Reality. In Image Guidance Surgery (IGS), the
surgeon views a computer-generated world of image data and 3D models. In contrast,
the AR system generates a composite view for the user that includes the live view fused
(registered) with either pre-computed data (e.g. 3D geometry) or other registered
sensed data (Pandya A. K. 2002). Augmented Reality is a variation and extension of
Virtual Reality and represents a middle ground between computer graphics in a
completely synthetically generated world (as in VR) and the real world (Pandya A.K.
2001f;Pandya A.K. 2001e;Pandya A.K. 2001c;Pandya A.K. 2003c;Pandya A.K. 2003a).
The current technique of image guidance does not allow the surgeon to use both real
and synthetic data simultaneously (Azuma R. 1997). The surgeon can detect
anomalies using advanced imaging and sensors and can accurately place their tools
within the surgical environment using robots. Nevertheless, the surgeon also needs his own
vision to detect other features that may not be available from the sensor information.
This, we conjecture, is one of the advantages of AR.
The overall goal of this research is to improve the 3-dimensional
visualization aspects of Robotics-based operations and Image Guided Surgery. The
highlighted portions of Figure 1-1 represent the areas of focus for this research. The
modern operating room is beginning to fill with visualization tools that enhance the
surgeon’s understanding of the patient’s medical situation (Pransky 2001). These tools
include image-guided systems, ultrasound, open Magnetic Resonance Imaging (MRI)
systems, fluoroscopes, microscopes, and endoscopes, all aimed to help the surgeon
see with greater clarity and from different points of view the problem that the patient is
facing. These tools have associated visual information that needs to be integrated and
optimally visualized by the surgeon (See Figure 1-1).
Figure 1-1: The surgeon uses tools (shown on the right), and needs to visualize that data using advanced display technology (as shown on the left). The highlighted portions of the figure are the focus of this thesis.
Visualization of multimodal sensor information can assist the surgeon in
synthesizing and integrating relationships that are not readily formulated. Hence, the goal
of this work is to create new visualization technology that integrates with the imaging,
sensing and robotics systems that the surgeon uses, and compare it with existing
technology. We use the tools of Human Factors Analysis to gauge whether the new
technology improves the surgeon’s performance by helping him see and understand the
reality of the patient’s condition.
1.1 Motivation and Problem Statement
One of the main new technologies that is developed and tested here is
Augmented Reality (AR). It blends the real world with 3D models generated from
medical scans or other data. AR technology fuses both the real view and the sensor
view to provide the surgeon both types of information. The virtual objects display
information that the user of the system may not be able to see directly due to occlusion
or other factors. The hypothesis of this work is that applying advanced technology for
the visualization of real-time medical data will enhance the performance, comfort and
insight of the surgeon. It will then also reduce the morbidity and mortality of patients.
In this thesis, we focus on the development, the accuracy testing and the comparison of
a relatively new technology (Augmented Reality) to what is currently used in the OR
(Image guidance). We have used medical human factors testing and evaluation to
determine the validity and usability of the developed systems to gauge whether the AR
technology actually improves surgeon performance.
Currently image guidance provides primarily three 2-D views (coronal, axial and
sagittal views) to gain awareness of the patient geometry (Hinsche and Smith
2001;Wadley and Thomas 2000). The surgeon has to perform the 2D (image) to 3D
transformation in their minds while projecting the envisioned data on the view of the
patient. There are advantages (which we will prove) to registered visualization
techniques that are able to help fuse the 3D data with what the surgeon is actually
seeing. We believe that AR generation is a natural extension for the surgeon because it
does both the 2D to 3D transformation and projects the views directly on the patient
view. To illustrate the difficulty of interpreting 2D slices, in Figure 1-2(a) a simple 3D shape
like the cube is represented by a triangle in the coronal slice and a vessel is
represented by two dots in the axial view of the CT scan due to the orientation of that
particular CT slice. Currently, the surgeon must convert these views to a 3D
representation and merge it with what she physically sees on the patient. This can be a
very heavy mental load on the surgeon. This scene represents a Virtual Reality (VR)
scene where the actual live view is not presented. Figure 1-2(b) represents a live video
view of the same phantom skull with the models of interest displayed directly on the
view. This view, generated using our prototype, represents an AR view because the
live view is presented and augmented with additional geometrical information.
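To make this 2D-to-3D burden concrete, the short sketch below shows how a single point of interest in the scan volume falls on one axial, one coronal and one sagittal slice; the voxel spacing and coordinates are invented for illustration and are not the phantom data used in this thesis.

    import numpy as np

    spacing = np.array([0.5, 0.5, 2.0])       # illustrative voxel size in mm (x, y, z)
    point_mm = np.array([64.0, 80.0, 30.0])   # illustrative point of interest in image space

    i, j, k = np.round(point_mm / spacing).astype(int)
    print("sagittal slice:", i)   # plane of constant x
    print("coronal slice:", j)    # plane of constant y
    print("axial slice:", k)      # plane of constant z
    # The same 3D structure therefore appears as three separate 2D cross-sections,
    # which the surgeon must mentally fuse back into a single 3D shape.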
Figure 1-2: (a) is typical data displayed during Image Guided Surgery. It represents a Virtual Environment. Notice that the vessel is represented by two dots in the axial slice and the cube is represented by a triangle in the coronal slice. (b) is 3D geometric data (models) registered and displayed on a live video view of the same phantom viewed with the AR prototype system developed in this thesis. This represents an Augmented Reality view. Note the difference between AR and VR and the associated differences in visualization.
1.2 Research Objective and Specific Aims.
This research describes the implementation, accuracy assessment and usability
testing of two prototype systems. The first, an image guidance system, represents what
the surgeons currently use in the neurosurgery operating theater. The second, a medical
Augmented Reality system, represents a potentially new or upgraded visualization
system. A passive articulated robotic arm (Microscribe, Immersion Technology) is used
to develop both systems. The AR system uses the arm with a mounted camera system
at its end-effector. This thesis covers the steps needed to build and compute both the
real-time Image Guided (IG) and Augmented Reality (AR) scenes for objects of interest.
It also provides an in-depth error analysis of the built prototypes. In addition, we
provide all the software, procedures and a hardware list to recreate both prototypes upon
request.
Our research focus is on accurate registration that must be maintained while a
user, a robot or the needed tools move within the real environment. The optical
parameters (focal length and lens distortion) of the camera and the geometrical (position
and pose) parameters of the surgeon, robot or tool determine exactly what is projected
onto the image plane (Tsai 1987). The main technical objective for this work is to develop a
testbed AR system and test its effectiveness in assisting the operator in the performance
of the task as compared to a testbed Image Guided Surgery (IGS)
system. In addition, advanced display technology will also be investigated for its
effectiveness in the display of remote information. The underlying hypotheses are that
(1) an Augmented Reality System will significantly improve the performance of the
surgeon and (2) that advanced visualization hardware (e.g. heads up display) can
improve the performance of the surgeon.
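To make the camera model referenced above (Tsai 1987) concrete, the following minimal sketch shows how a point in the world coordinate system is mapped onto the image plane given extrinsic (position and pose) and intrinsic (focal length, principal point) parameters. The values and names here are illustrative placeholders, not the calibrated parameters reported in Chapter 4.

    import numpy as np

    def project_point(p_world, R, t, fx, fy, cx, cy):
        # R (3x3) and t (3,) are the extrinsic parameters (world -> camera frame);
        # fx, fy are focal lengths and (cx, cy) is the principal point, in pixels.
        x, y, z = R @ p_world + t                   # transform into the camera frame
        return fx * x / z + cx, fy * y / z + cy     # perspective projection to pixels

    # Placeholder values: camera axes aligned with the world, object 500 mm away
    R = np.eye(3)
    t = np.array([0.0, 0.0, 500.0])
    print(project_point(np.array([10.0, -5.0, 0.0]), R, t, 800.0, 800.0, 320.0, 240.0))

Lens distortion, ignored in this sketch, is handled during camera calibration in Chapter 4.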
The specific aims of this thesis are as follows:
1. Describe the implementation of a prototype Augmented Reality system
for robotics.
2. Describe the implementation of a prototype Image guidance system.
3. Perform a subject study to determine the pros and cons of Augmented
Reality vs. Image guidance.
4. Perform subject studies to determine if heads-up displays provide any
advantage over monitor viewing.
The ultimate aim is to extrapolate the findings and development of this thesis to
active medical robotic systems and IGS systems in existence. Examples of current
robotic systems that could take advantage of this technology include systems such as
the Neuromate and Robodoc systems (Integrated Surgical Systems Inc.) (Taylor and
Stoianovici 2003), daVinci (Intuitive Surgical Inc.) (Hoznek et al. 2002;Tewari et al. 2002), and
the Zeus (Computer Motions Inc.) systems (Nio et al. 2002)(Knight et al. 2003b).
1.3 Outline of the Thesis
The outline of the thesis is as follows: Chapter two provides background
information on Image Guidance and Augmented Reality technology, Medical Imaging
technology, Segmentation, and 3D modeling and on the discipline of Human Factors
Engineering as applied to medical technology development. A section of background
is also provided on tracking devices. Successful augmentation must have accurate and
reliable tracking methods. Tracking is a key component and various tracking methods
along with their advantages and disadvantages are discussed. The next two chapters
(chapters three and four) deal with technology development for both Robotics-based
augmentation and image guidance. Both of these technologies were developed during
the course of this thesis. Image guidance represents what is currently used in the
operating room while AR is a technology that we contend may become the next step for
the operating room. Chapter five describes the human factors studies that were done. It
is not enough to develop technology, as Engineers, we must validate and prove the
performance benefits of the technology. The first study deals with the comparison of
Image guidance with Augmented Reality. The major questions are— “Does Augmented
Reality offer any improvement in the surgical performance over using an Image
Guidance system?” “What are the advantages and disadvantages of Medical AR
technology?” The second study compares the use of head mounted displays to monitor
viewing for endoscopic surgery. Endoscopic views are the ones primarily used in robotic
surgery and an important question is, “Does head-mounted display provide any
improvement over monitor viewing?” Chapter six concludes with the contributions of
this thesis and more importantly, the future applications and research directions that
can be used to build on this work. There are many avenues that we predict will lead to
very useful medical tools of the future that will allow the medical doctor to gain more
insight into the patient’s condition. In this last chapter, the potential future directions will
be provided.
Chapter 2: BACKGROUND
Research is the act of going up alleys to see if they are blind. - Plutarch
In this chapter, basic definitions and background on the foundation
technologies being used to develop the ideas in this thesis are given. First, a brief
introduction to image-guided surgery (Virtual Reality) is given along with its
impact on surgery. Second, a description of what Augmented Reality is and how it
differs from Virtual Reality is provided. Next, medical robots and
their advantages and disadvantages are described, along with the reasons that
visualization technology is considered a critical factor for these systems. In addition,
the major foundation technologies on which the technologies of Augmented Reality
and Image Guidance Surgery (IGS) are based (3D modeling/imaging, tool/user
tracking and accurate registration and calibration of objects/cameras within the
environment) will be presented (See Figure 2-1).
Figure 2-1: The foundation technologies behind both Augmented Reality and Image Guidance which will be
discussed in this chapter. At the heart of this thesis is a Human Factors Analysis that compares Augmented Reality and Image Guidance technology.
At the end (and at the heart of this work), the discipline of Human Factors
Engineering is introduced, and its critical importance in the early stages of medical
technology development is presented.
2.1 Image Guided Surgery -- Current technology in the Operating Room.
Throughout every operation, a surgeon must maintain a precise sense of
complex three-dimensional relationships. Computer image processing and real-time
3D visualization were first used in the field of Neurosurgery (Gallen et al.
1994;Zamorano et al. 1987a;Zamorano et al. 1987b). It is, and will have to remain,
an integral part of the surgical field to increase the chance that highly delicate
surgeries will be smooth and successful. Computer based medical imaging has
revolutionized Neurosurgery. Surgeons can now "see" on a 3-dimensional image
where their tracked surgical tools are with respect to the lesion responsible for the
patient's neurological problem. A relatively new field, image guided surgery, blends
the use of computer-based medical imaging data with real-time instrument position
data capture to assist the surgeons in localizing and removing lesions (See Figure
2-2). The contributions of computer image processing and 3-dimensional visualization
to neurosurgery are becoming widely recognized, and attempts are being made to
apply the benefits to other surgical disciplines (e.g. spine surgery (Holly and Foley
2003), and orthopedic surgery (DiGioia 1998)). Surgeons are now starting to
recognize the importance of enhancing intra-operative visualization technology and
are requesting and doing more research in this area.
A Virtual Reality scene is a completely computer-generated scene and
requires high-performance graphics workstations to generate acceptable levels of
realism (Chmielewski C. 1999;Goldsby M.E. 1994;Pandya A.K. 1994). Typically,
Virtual Reality systems are interactive. A user’s requests and responses are
captured and the scene is updated. In typical VR scenes, the user is completely
immersed in the environment wearing a head-mounted display and interacts with
virtual objects. On the right side of Figure 2-2 a user of the VR system wears a head-
mounted display. The user’s arms, torso and head are tracked using magnetic
tracking, and a pair of cybergloves is used to track his fingers (Chmielewski et al.
1998). The user then becomes an ‘avatar’ (in this case a space-suited astronaut) and
can navigate and interact with (via graphical collision detection) and move all the
virtual objects of interest. In image-guidance, another example of a virtual
environment, the surgeon’s tool is tracked and its actual position and orientation are
displayed in a virtual 3D model of the patient’s brain that is created through 3D image
segmentation techniques from imaging data. In this case, the surgeon views the
virtual world on a monitor (See Figure 2-2) or can use a heads-up display.
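The computation at the core of such an image-guided display can be sketched as follows: the tracked tool tip, reported in the tracker (patient-space) coordinate system, is pushed through the patient-to-image registration transform, and the result selects the slices and the 3D-model location to draw. The 4x4 matrix below is a placeholder; the actual transform is estimated from fiducial pairs as described in Chapter 3.

    import numpy as np

    # Placeholder rigid registration: tracker/patient coordinates -> image coordinates
    T_patient_to_image = np.array([[1.0, 0.0, 0.0, 12.3],
                                   [0.0, 1.0, 0.0, -4.1],
                                   [0.0, 0.0, 1.0, 57.8],
                                   [0.0, 0.0, 0.0,  1.0]])

    def to_image_space(tip_patient_mm):
        # Apply the homogeneous 4x4 transform to the tracked tool-tip position.
        tip_h = np.append(tip_patient_mm, 1.0)
        return (T_patient_to_image @ tip_h)[:3]

    tip_image = to_image_space(np.array([100.0, 42.5, 10.0]))
    print(tip_image)   # this point selects the axial/coronal/sagittal slices to display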
2.2 What’s the difference between Augmented Reality and Virtual Reality?
Augmented Reality is a variation and extension of Virtual Reality. AR is a
middle ground between computer graphics in a completely synthetically generated
world and the real world. It adds to the view of the real world a computer graphics
image generated from real-world data specific to a patient. In AR a surgeon views
a model of a patient’s brain that has been created from imaging data. This imaging
data “augments” the surgeon’s simultaneous visualization of the real brain. In some
cases it may even substitute for this real-time visualization of the real brain. The AR
scene is built on much of the same technology as a VR scene. 3D models of interest
and patient registration are still needed. In contrast, an AR system involves the use
of either a video camera or a see-through head mounted display, both of which allow
a window to the real world. The AR system generates a composite view for the user
that includes the live view fused (registered) with either pre-computed data (e.g. 3D
geometry) or other registered sensed data. It is a combination of the real scene
viewed by the user and a virtual scene generated from 3D geometry/segmentation
accurately co-registered on the display that augments the scene with additional
information. As opposed to VR, in the AR world, the user’s sense of being in the real
world is maintained. AR supplements reality, rather than completely replacing it as in
VR (Billinghurst et al. 2001;Broll et al. 2001).
Figure 2-2: Two different forms of "Virtual Reality": On the left, a tracked tool’s pose is displayed in a 3D
image and orthogonal slices of a CT scan of the phantom brain. This system was a technology developed as part of this thesis. On the right, the tracked user of the VR system becomes a model of a space-suited astronaut performing tasks on a 3D model of the space station. Note that in both systems, the user is viewing a virtual world.
AR technology is not to be confused with 2-D virtual overlays on top of live
video. 2-D overlays are constructed without registration to the real 3-D world and
represent a static display on video monitors. In AR, by contrast, the virtual scene
generated from 3D geometry/segmentation is accurately co-registered with the real
scene viewed by the user, so the augmenting information appears in the same visual
environment.
Recently real-time video processing and computer graphics have provided us with the
capability of augmenting the video stream with geometrical replicas of the actual
objects or sensor data. Critical objects of interest within the patient’s brain (for
instance) determine exactly the size, shape, location and orientation of the
craniotomy (skull opening) to be performed. The objects of interest can be tumors,
major vessels, or anatomically/physiologically important brain structures. Before an
operation there are usually several sets of image data available (MRI, CT, SPECT,
Functional MRI, etc.). The image data provides very important information about the
spatial arrangements and the functional importance of objects of interest within the
brain in the image space. Figure 2-3 illustrates an example of AR. The live view of
the phantom is augmented with virtual objects derived from medical imaging scans of
the phantom brain. This augmentation shows where the various objects and
"vessels" are located directly on the live video view. As the user moves the arm, the
objects are regenerated to correspond to the actual objects.
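A minimal sketch of how one such augmented frame can be composed is given below, under the assumption that the model vertices have already been registered into the arm's base coordinate system. The camera pose is obtained by chaining the arm's base-to-end-effector transform with the end-effector-to-camera calibration, and the projected wireframe is then drawn over the live video frame; all matrices and values here are placeholders for the transforms derived in Chapter 4.

    import numpy as np

    def project_model(vertices_base, T_base_ee, T_ee_cam, K):
        # T_base_ee: 4x4 end-effector pose from the arm's kinematics.
        # T_ee_cam : 4x4 end-effector-to-camera transform from calibration.
        # K        : 3x3 intrinsic matrix of the end-effector camera.
        T_base_cam = T_base_ee @ T_ee_cam              # camera pose in the base frame
        T_cam_base = np.linalg.inv(T_base_cam)         # base frame -> camera frame
        verts_h = np.hstack([vertices_base, np.ones((len(vertices_base), 1))])
        verts_cam = (T_cam_base @ verts_h.T).T[:, :3]
        pix = (K @ verts_cam.T).T
        return pix[:, :2] / pix[:, 2:3]                # pixel position of each model vertex

    # Placeholder intrinsics and three model vertices 400 mm in front of the camera
    K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
    verts = np.array([[0.0, 0.0, 400.0], [10.0, 0.0, 400.0], [0.0, 10.0, 400.0]])
    print(project_model(verts, np.eye(4), np.eye(4), K))

Each time the arm moves, the base-to-end-effector transform is re-read from the arm and the wireframe is re-projected, which is what keeps the overlay locked onto the real objects.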
Through the process of segmentation and model extraction, 3D computer
graphics models that can be used as the input to the AR system can be generated
(Cline et al. 1987). In a futuristic vision, Robinett (Robinett 1992) speculates that AR
may be useful in applications that require displaying any information not directly
available or sensed by the human by making that information visible, audible or even
felt. Examples of this kind of data could be spectroscopy data, Doppler (blood flow
velocities), temperature, chemical concentrations, pressure information etc.
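The segmentation-to-model step mentioned above can be sketched with an off-the-shelf Marching Cubes implementation; the synthetic sphere volume and the scikit-image call below are illustrative stand-ins for the CT label-maps and the model-generation pipeline described in Section 2.4.

    import numpy as np
    from skimage import measure

    # Synthetic binary volume standing in for one segmented structure (a sphere)
    zz, yy, xx = np.ogrid[:64, :64, :64]
    volume = ((xx - 32) ** 2 + (yy - 32) ** 2 + (zz - 32) ** 2 < 20 ** 2).astype(float)

    # Extract a triangle mesh (vertex list plus face connectivity) at the 0.5 iso-surface
    verts, faces, normals, values = measure.marching_cubes(volume, level=0.5)
    print(verts.shape, faces.shape)   # the mesh can then be rendered in the VR or AR scene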
2.3 Why is visualization important for Medical Robotics?
Medical Robotic systems like the Zeus (Computer Motion Inc.) and DaVinci (Intuitive
Surgical Inc.) are master-slave systems for minimally invasive surgery (See Figure
2-4). They offer the advantages of motion scaling, tremor filtration and comfortable
surgeon interfaces. They also provide a very functional wrist at the end of the MIS
instrument which is not available in standard MIS (Knight et al. 2003a). Direct linkage
of medical robotic systems to patient data and the optimal visualization of that data
for the surgical team are important for successful operations. In their review article on
medical robots, (Cleary K. 2001) state that if medical robots are to reach their full
potential, they need to be more integrated systems in which the robots are linked to
the imaging modalities or to the patient anatomy directly. They state further that
robotics systems need to be developed in an “Image-Compatible” way; that is, these
systems must operate within the constraints of various image modalities such as CT
and MRI. Visual information from the patient (i.e., remote) site needs to be augmented
in a way that allows greater situational awareness, accuracy and confidence. This link,
they conjecture, is essential if the potential advantages of robots
are to be realized in the medical domain.
Figure 2-3: This is an example of an Augmented Reality Scene. The live view of the phantom is augmented with virtual objects derived from CT scans of the phantom brain. This augmentation overlays the actual objects with 3D wireframe models of the actual objects. This technology (also developed as part of this thesis) uses the same tracking device as the image guided system (See Figure 2-2a) but uses the end-effector mounted camera to generate the AR scene.
Figure 2-4: Surgical site (right) and the remote surgeon site (left) for the master/slave Zeus Robot (Computer Motion Inc.). The surgeon is using hand controllers and voice recognition to control the three arms of the laparoscopic instrumentation. He relies on raw video images from the endoscopic camera to perform his operation. It is the premise of this thesis that advanced forms of visualization may increase the surgeon’s performance. Picture with permission from Computer Motion Inc.
Master-slave robotic systems represent an evolutionary leap from traditional and
laparoscopic surgery (Taylor et al. 2003). The surgeon is comfortably seated at a
well-designed master-slave controller interface and console (See Figure 2-4). At the
surgeon site (remote site) the system consists of a stereoscopic monitor, a foot pedal
for control, and hand controllers for robotic end-effector (tool) manipulation. At the
robot-patient interface, the system consists of three robotic arms. Two arms (with
tools mounted on them) are used for surgical manipulation and the third arm has a
camera system for visualization. The third camera arm is controlled effectively with,
for example, voice recognition technology or with foot controllers. The robotic system
has several advantages over traditional surgery. The system is able to modulate the
surgeon's motions by tremor filtration. Inadvertent high frequency motions made by
the surgeon can be filtered out allowing for finer and smoother control. The system
can also use a technique of motion scaling to allow centimeters of motions made by
the surgeon to be translated to sub-millimeter motions at the robot-patient interface.
This allows for more precise microsurgery. It can also be used to compensate for the
body’s own motion. For example, a beating heart can be followed by the robot’s
motion resulting in the heart appearing stationary to the remote operator (Kappert et
al. 2001). This would allow delicate surgery to be performed on a beating heart with
precision and without the dangers of stopping the heart. (Bowersox et al. 1998)
studied the feasibility of the use of telepresence surgery to perform basic operations
in vascular surgery, including tissue dissection, vessel manipulation, and suturing.
They used a prototype telepresence surgery system with bimanual force-reflective
manipulators, interchangeable surgical instruments, and stereoscopic video input.
Arteriotomies created ex vivo in segments of bovine aortae or in vivo in femoral
arteries of anesthetized swine were closed with telepresence surgery or by
conventional techniques. Time required, technical quality and subjective difficulty
were compared for the two methods. All attempted procedures were successfully
completed with telepresence surgery and the precision attained with telepresence
surgery was equal to that of conventional techniques. They concluded that blood-
vessel manipulation and suturing with telepresence surgery are feasible. In fact, this
robotic technology has reached sufficient maturity to allow FDA approval for surgical
use on humans.
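A toy sketch of two of the motion-conditioning ideas described earlier in this section, motion scaling and tremor filtration, is shown below; the scale factor and filter constant are arbitrary illustrative values and do not describe any particular commercial system.

    import numpy as np

    SCALE = 0.2    # 5:1 motion scaling: a 10 mm master motion becomes 2 mm at the patient
    ALPHA = 0.1    # simple low-pass constant: smaller values suppress more tremor

    def condition_motion(master_increments):
        # Scale each master hand increment and low-pass filter the result so that
        # high-frequency tremor is attenuated before it reaches the slave arm.
        filtered = np.zeros(3)
        outputs = []
        for inc in master_increments:
            target = SCALE * np.asarray(inc)
            filtered = ALPHA * target + (1.0 - ALPHA) * filtered
            outputs.append(filtered.copy())
        return outputs

    # Example: a steady 1 mm/step motion in x with an alternating 0.5 mm tremor on top
    steps = [np.array([1.0 + 0.5 * (-1) ** n, 0.0, 0.0]) for n in range(10)]
    print(condition_motion(steps)[-1])   # the alternating tremor component is strongly attenuated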
Even with enormous technological gains, robotic surgery is still in its infancy.
There are some major areas of technological improvement needed for this technology
to reach its ultimate potential (Cleary K. 2001). Because the surgeon is remotely
located and relies almost entirely on indirect visual information, we believe that
visualization technology is one of the major areas that will make these systems of the
future more useful, powerful, easier to use and ultimately lead to better surgical
outcomes.
Medical robots are typically used where the surgeon is remotely located from
the patient. Visual information from the patient (remote) site needs to be augmented
in a way that allows greater situational awareness and confidence. In addition,
surgical planning and information management for these robotic systems are essential
for successful operations (Cleary K. 2001). Two main problems encountered in
robotic surgery are non-optimal port placements and robotic arm collisions. Robotic
arm collisions often require manual repositioning of the robotic arms on the operating
table, which unnecessarily adds to the operative time. Incorrect port placement typically
results in robotic arm collisions, can lead to damage to robotic instruments and can
also lead to inaccessibility of the operative site. Improved accessibility can enhance
patient safety (Partin et al. 1995). These problems can be avoided in the pre-
operative stages given the appropriate visualization tools. For these reasons, it is
important that a robust visualization system be built that is linked to patient imaging
data that offers the surgeon tools for visualization, robotic system setup and port
placement. Computer modeling tools that help visualize the anatomical structures of
the patient would greatly aid the surgeon in the pre-operative stage. Visualization
tools can help the surgeon determine optimal port placement sites. In addition, these
tools will help determine the placement of the robotic arms on the operating table in
order to avoid collisions between arms during the procedure while maximizing the
range of motion of the instruments. A significant potential exists to impact medical
robotics with the pre-operative planning and intra-operative visualization tools (Taylor
et al. 2003).
Medical robotic systems are playing an increasing role in different image-
guided surgical procedures (Taylor et al. 2003). The key advantages are that robots
can effectively position, orient, and manipulate surgical tools in 3D space with a high
level of accuracy. The NeuroMate™ robot system (Integrated Surgical Systems,
Davis, CA) is a commercially available, image-guided robotic-assisted system used
for stereotactic procedures in neurosurgery (See Figure 2-5). This robotic device is
able to precisely hold tools at predetermined configurations and allows the surgeon to
perform very delicate and accurate placement of tools. We have performed a very
detailed accuracy study of the Neuromate robotic system, details of which are
available in several of our published papers (Li Q 2001;Pandya A.K. 2000b;Pandya
A.K. 2000a;Zamorano L. 2000). These papers contribute to this field by showing a
method to measure the accuracy of robotic devices and also prove the utility of the
Neuromate system for Neurosurgery. As seen in Figure 2-5, the device also comes
configured with an image-guided system that allows the tool being placed to be
accurately tracked within the surgical space. It is primarily used for Neurosurgeries
and assists the surgeon by accurately and stereotactically placing tools in the surgical
field. Its sister robot, the Robodoc, has a very similar design and is used for knee and
hip replacement surgeries. It is an active robot and is approved to cut grooves into the
patient's bone for joint replacement.
Figure 2-5: Neurosurgical Robotic Device (Neuromate, Integrated Surgical Systems Inc.). Here the robotic device holds a tool for the surgeon at a very precise position and orientation, and the surgeon can then
perform a biopsy of the patient’s tumor. The biopsy needle can be tracked on an image-guided system (shown on the right) to allow the surgeon to know when the target is reached and that no other important structures (like vessels) are in the way. Augmentation techniques could easily be added to such a system.
Wayne State University at Harper Hospital (where the author was the
engineering lead of the team) was the first to perform a clinical case using the
Neuromate system in the United States. This system takes advantage of links to the
patient image data and connects the robotic movements to the knowledge base of
patient-specific image data and structures. For instance, the kinematic positioning
software system knows if the arm is about to intersect with the patient and restricts
movement accordingly. The system software (VoXim™, IVS Software Engineering)
allows precise image-based planning and visualization of multiple trajectories.
Although this system is image data linked, it is not a master-slave dexterous device
and does not have an AR interface.
2.4 Medical Imaging, Segmentation and 3D Model Creation
The topic of 3D model generation is important to Image Guidance Surgery
(IGS), and even more important to Augmented Reality systems. For IGS, surgeons
primarily use the orthogonal scans of the imaging data to perform the surgery. The
available 3D models are typically used as secondary information. In Augmented
Reality, the 3D models are the primary source of visualization. Because the role of
3D modeling is central for the medical visualization domain, in this section, a
discussion of how 3D models are generated from medical imaging data is provided, and
commonly used imaging techniques are briefly described. In addition,
a literature survey is conducted which shows the current state-of-the-art in
segmentation technology. Finally, some of the segmentation results that show the
models that were used to conduct studies for this thesis are shown.
The overall process of segmentation and model creation is illustrated in Figure 2-6. In our
experiments with segmentation, we fixed some simple objects inside the plastic
phantom of the skull. We took a CT scan of the phantom at 2mm increments through
the entire phantom. Once the imaging data had been collected, the next and very
important step was to segment the data. Segmentation is defined as the process by
which a label map is generated for each slice of imaging data to represent the
different regions of interest. There is an enormous and important research area for
segmentation that can be used for improving the accuracy and ease of segmentation
(Chen et al. 2003;Harders and Szekely 2003;Horkaew and Yang 2003;Lee et al.
2003;Tsai et al. 2003). We created the 3D models using the marching cubes algorithm,
which will be discussed in this chapter (See Figure 3-7).
Creating the virtual objects is a necessary step for generating an Augmented
Reality environment that is based on imaging data. For medical applications, each
patient has a unique set of objects that can be used for augmentation. Usually these
objects are tumors, skin surfaces, a set of major vessels, and relevant normal or
abnormal structures. We define our object model by a segmentation procedure that
uses medical imaging data. Medical imaging examples include computed
tomography (CT, sometimes referred to as CAT scan for computerized axial
tomography), Magnetic Resonance Imaging (MRI), single-photon emission computed
tomography (SPECT) and Ultrasound. Such techniques allow the surgeon to peer
inside the body and are now in routine use for patients. A brief description of some of
the common imaging technologies is provided in the next section as background
information as it is considered essential for the understanding of segmentation and
3D rendering technology.
2.4.1 Medical Imaging Data
There are many different imaging technologies that can be utilized and each
provides a unique view of the system that the physician is studying. CT was
developed in 1967 by Godfrey Hounsfield (Hounsfield 1980). His contribution was
that he linked x-ray sensors to a computer and worked out a mathematical technique
called algebraic reconstruction for assembling images from transmission data. A CT
scan of the body provides a density-dependent differential absorption of x-rays.
These views are obtained by the exposure of photographic plates placed beyond the
patient. On the other side of the patient is the x-ray source.
Multiple X-rays are taken as the X-ray tube revolves around the patient.
Computations are done which decipher the amount of X-ray penetration through
specific planes of the system being examined. This computation gives each pixel of
the image a density coefficient which corresponds to the material being penetrated
and is translated into a gray scale. CT scans are best for bones and other rigid
structures. The outputs of the system are gray-scale images sliced at a physician-
prescribed distance apart (usually 2mm – 5mm slices).
In 1946, Felix Bloch (Bloch et al. 1991) and Edward Purcell independently
discovered (and later received a Nobel Prize for the discovery) that when a magnetically
energized substance is exposed to radio-frequency energy it emits a particular frequency.
This process is similar to a tuning fork. They found that the nuclei of different atoms
absorbed radio waves at different frequencies. In 1970, a major discovery was made
that significantly changed the imaging world. Damadian discovered that the structure
and abundance of water in the human body was the key to MR imaging, and that the
water (hydrogen) emitted a signal that was both detectable and recordable. The basis
for the MRI scans is the magnetic properties and dipolar nature of the hydrogen
nucleus. These nuclei (ubiquitous in soft tissue as part of water) change their alignment when a
pulsed magnetic field is imposed. In the alignment process these nuclei absorb
energy from tuned radiofrequency pulses. As their excitation decays, they emit
radiofrequency signals. These signals vary in intensity due to nuclear abundance or
the molecular chemical environment and can be imaged and converted using field
gradients in the magnetic field into sets of tomographs.
Figure 2-6: The process of segmentation and model generation: (a) A plastic phantom skull with simple objects fixed inside the skull was scanned with a CT scanner at 2mm slices. (b) This is one coronal slice of the raw CT data. (c) This is a label-map of that one slice overlaid on the raw CT image. (d) This is the 3D model generated using the Marching Cubes algorithm after processing all the CT slice data. It shows a transparent skin model through which the internal front view of the phantom is visible.
The magnetic field needed for typical MRI scans is on the order of 1-4 Tesla, and the higher the imposed magnetic
field, the higher the resolution of the image. MRI differs from CT in that it images
differences in tissue based on chemical rather than density properties. Hence, MRI
scans are important for soft tissue anomalies. Another very interesting aspect of the
MRI scan is that it can be used to observe real-time changes for instance of brain
activity as a particular task is performed by the patient. This type of MRI is called
Functional MRI (FMRI). FMRI is important because it helps the physician understand
the relationship between structure and patient function. This can help (for instance)
neurosurgeons determine functional areas within the brain (language, motor skills,
hearing, etc.) that should be avoided during surgery so that damage to an important
center does not occur. In typical imaging sessions, a patient would be asked to move
or speak or react in some way and imaging would be done to determine what portions
of the brain are being activated for that particular function. Functional MRI uses MRI
equipment to detect regional changes in cerebral metabolism or in blood flow, volume
or oxygenation in response to these types of tasks. A common technique called
blood oxygenation level dependent (BOLD) contrast is often used. This technique is
based on the differing magnetic properties of oxygenated (diamagnetic) and
deoxygenated (paramagnetic) blood which lead to detectable changes in MR image
intensity (D'Esposito et al. 2003).
Another form of very useful imaging is positron emission tomography (PET). It
is based on the detection of subatomic particles and produces physiologic images.
These subatomic particles are emitted from a radioactive source administered to the
patient and typically will gather at the organ of interest. Similar to FMRI, the views
obtained from PET scans can be used to evaluate function. They can also be used to
detect cancer or even characterize cellular biochemical changes in order to examine
the effects of cancer therapy. There are other uses of the PET scan. For example,
PET scans of the heart can be used to determine blood flow to the heart muscle and
help evaluate signs of coronary artery disease.
A very recent development in imaging is 3D ultrasound. A conventional 2D
ultrasound probe (which has been used for decades) can be used in a novel way to
produce 3D imaging (Delcker and Tegeler 1998). If the 2D probe is equipped with a
six degrees-of-freedom tracking device, spatially registered 2D scans can be
acquired. These scans can then be mathematically combined to create a tomographic
3D image set. The resulting 3D data image planes can be visualized by either
volume rendering or by the process of segmentation and surface rendering
techniques. The volumes of the structures can also be measured accurately.
All of these imaging methods have a common feature. They produce slices of
imaging data that can be segmented and viewed using advanced 3D modeling
technology, can be used for image-guidance procedures, and are potential
candidates for Augmented Reality interfacing. However, before algorithms and
methods can be used to create 3D models of imaging data, the user must segment
the imaged data set to determine exactly what portion of the images is to be
converted into graphics models. This can be a painstaking process. There are
various methods of segmentation which can simplify the process, and these will be
covered in the next section.
2.4.2 Methods of Segmentation
Although advanced automatic segmentation techniques are not the main topic
of this thesis, they are considered here as background, because AR and image
guidance rely on the ease of use and generation of segmented objects. We point to
numerous pockets of important research that aim to make segmentation easier for
use in very complex environments. There are at least four major categories of
segmentation algorithms. These are boundary localization, voxel classification,
knowledge-based segmentation and deformable atlases (Miller et al. 1993). There is
an enormous set of literature in each of these areas of segmentation. Other
techniques proposed make use of a combination of gray-level based systems that
simultaneously incorporate information about anatomical boundaries (shape) and
tissue signature (gray scale) using scale and edge-detection algorithms and some a
priori knowledge to provide an unsupervised segmentation. In his thesis, (Leventon
2000) provides a very good survey of segmentation. A general overview of these
methods will be covered here to provide the background necessary to understand
segmentation technology and its importance to image guidance.
Voxel classification subdivides each image into elements (called voxels). Each
voxel has associated with it an intensity distribution, a decision on its tissue type,
and the tissue-type decisions of its neighboring voxels. The decisions for tissue types are
determined with a thresholding scheme which the user inputs based upon the
imaging data and on properties of the imaging modality. The method relies on
knowing accurate information about the pixel ranges that different structures (for
example, gray matter) may have. Each voxel is then classified according to this and
other information. Other information includes, for instance, how the neighbors of this
voxel were classified and the properties of the imaging modality being used. The
weakness of this method is that the distribution of intensity values corresponding to
one structure may vary throughout the structure and overlap those of another
structure. In this case, this segmentation technique does not produce accurate and
optimized results.
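A minimal sketch of such a threshold-based classifier is shown below; the intensity ranges and labels are illustrative placeholders rather than validated values for any particular modality.

    import numpy as np

    def classify_voxels(image, ranges):
        """Assign each voxel the first tissue label whose intensity range contains it.

        image:  array of scalar intensities (e.g., Hounsfield units for CT).
        ranges: mapping label -> (low, high), chosen by the user from the imaging
                data and the properties of the imaging modality.
        """
        label_map = np.zeros(image.shape, dtype=np.uint8)
        for label, (low, high) in ranges.items():
            mask = (image >= low) & (image <= high) & (label_map == 0)
            label_map[mask] = label
        return label_map

    # Illustrative ranges only (label 1 = soft tissue, label 2 = bone).
    ranges = {1: (-100, 300), 2: (301, 3000)}
    label_map = classify_voxels(np.random.randint(-1000, 2000, (512, 512)), ranges)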
Segmentation performed with boundary detection techniques uses some
property of the border between the object of interest and the objects adjacent to it.
Generally high-gradient features are indicative of a boundary. In general terms, a
gradient is a vector field formed by the first partial derivatives of a multi-variable function,
giving the magnitude and direction of change at each point in space (Kaplan 1981). In the image
space, a gradient field could be used to describe high-frequency changes at the
borders of different objects because the partial derivatives give the rates of change.
These changes are the criterion for the formation of boundaries between objects.
The parameters of the gradient computation can be controlled to produce
segmentations of structures at different resolutions.
The third approach, which can be classified as a model-based approach, uses
atlas-mapping technology to assist in the segmentation process. An atlas contains a
normalized set of labeled scans of a particular organ type. The atlas
mapping/warping is among the most common methods used for human brain
segmentation. A brain atlas is a database of structural and positional information
that is scalable to a particular person's brain. It is usually based on the real scans of
several subjects. When registered with actual data of a patient’s brain, this atlas
provides various levels of information about the patient's brain structures. Many
researchers have worked on such atlases (Nowinski et al. 2003;Vayssiere et al.
2002). The research in this field has progressed to such an extent that clinical use of
these systems has begun. A traditional atlas is acquired from a sample of actual
brain data. Typically, these normalized databases are embedded with registration
algorithms that allow matching between an actual brain and the theoretical data. This
approach attempts to deform a given labeled atlas to that of the new image data that
is to be segmented. So, given a new image set, the algorithm computes a non-rigid
transformation such that it is in correspondence with a normalized set of atlas data. If
the correspondences are computed correctly, then the warped atlas can be
successfully used in the structure labeling or segmentation of the new scan.
Computing deformations of this category that correctly warp one person's anatomy into
another's is quite challenging and can result in correspondence mismatches. This is especially
true for relatively small structures that are highly variable between subjects, and in
patients with anomalies that significantly change the shape of the organ beyond
normal.
Differentiation of tube-formed tissue such as blood vessels, trachea,
pancreatic duct structures and the ability to independently render them have a variety
of potential applications in the head, neck, lungs, heart, abdomen, and lower
extremities. Disorders in which artery–vein separation is most critical in the
cerebrovascular system include brain arteriovenous malformation. However, tube-
tissue segmentation represents one of the most challenging problems in
segmentation. (Lei et al. 2003) present a near-automatic process for separating
vessels from background and other clutter. They report on separating arteries and
veins in contrast-enhanced magnetic resonance angiographic (CE-MRA) image data.
Their separation process utilizes fuzzy connected object delineation principles and
algorithms. The critical step was to separate artery from vein within this entire vessel
structure via iterative relative fuzzy connectedness. After seed voxels are specified
inside artery and vein in the CE-MRA image, the small regions of the bigger aspects
of artery and vein are separated in the initial iterations, and further detailed aspects of
artery and vein are included in later iterations. At each iteration, the artery and vein
compete among themselves to grab membership of each voxel in the vessel structure
based on the relative strength of connectedness of the voxel in the artery and vein.
This process produced correct artery–vein separation. When compared with
manual segmentation/separation, their algorithm was able to separate higher-order
branches and therefore produce many more details in the segmented vascular
structure.
(Erdi et al. 1997) have developed an automatic image segmentation schema to
determine the volume of metastases to the lung from PET images, under conditions
of variable background activity. An elliptical Jaszczak phantom containing a set of
spheres with volumes ranging from 0.4 to 5.5 mL was filled with F-18 activity (2–3
mCi/mL) corresponding to activities clinically observed in lung lesions. The adaptive
thresholding method applied to PET scans enabled the definition of tumor volumes.
This method can also be applied to small lesions. It should enable physicians to track
objectively changes in disease status that could otherwise be obscured by the
uncertainties in the region-of-interest drawing.
2.4.3 From Segmentation to 3D Model Creation
Segmentation is an important step toward the creation of 3D models; however, an
important component is still missing—the actual generation of the 3D polygonal structures
that represent the segmentation created in each of the slices. It was in 1987 that
Lorensen and Cline (Cline et al. 1987;Cline et al. 1991) developed a robust method
that enabled the creation of 3D models from scanned data. Marching Cubes is their
algorithm for rendering isosurfaces in volumetric data and is the most significant
contribution to this field.
The basic notion in the Marching Cubes algorithm is that we can define a voxel
(cube) by the pixel values at the eight corners of the cube. The idea is to 'march'
through each of the cubes testing the corner points and replacing the cube with an
appropriate set of polygons. If the cube's corner values straddle the user-specified
isovalue, that particular cube must contribute some component of the isosurface. The
intersecting edges of the cube can then form triangular patches that divide the cube
into inside and outside regions. By connecting the patches from all cubes, a 3D
surface can then be created. If we classify each of the corners of the cubes as
either being below or above the isovalue, there are 256 possible configurations of
corner classifications. The key is to decide where along each of the cube edges the
isosurface crosses, and use these edge intersection points to create one or more
triangular patches for the isosurface. The genius of Lorensen and Cline is that they
realized that if you account for symmetries, there are really only 14 unique
configurations among the remaining 254 possibilities. When there is only one corner less
than the isovalue, this forms a single triangle which intersects the edges which meet
at this corner, with the patch normal facing away from the corner.
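The corner-classification step can be sketched as follows. The full 256-entry edge and triangle lookup tables are omitted; the cube-index computation and the linear edge interpolation shown here are the standard ingredients of the algorithm and are not taken from the thesis implementation.

    import numpy as np

    # Corner offsets of one cube within the volume (corner ordering conventions vary
    # between implementations; this one is only illustrative).
    CORNERS = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],
                        [0, 0, 1], [1, 0, 1], [1, 1, 1], [0, 1, 1]])

    def cube_index(volume, x, y, z, isovalue):
        """Build the 8-bit case index: bit i is set if corner i is below the isovalue."""
        index = 0
        for i, (dx, dy, dz) in enumerate(CORNERS):
            if volume[x + dx, y + dy, z + dz] < isovalue:
                index |= 1 << i
        return index  # 0..255; used to look up which edges are intersected

    def interpolate_edge(p1, v1, p2, v2, isovalue):
        """Linearly interpolate the isosurface crossing along one cube edge."""
        t = (isovalue - v1) / (v2 - v1)
        return np.asarray(p1, dtype=float) + t * (np.asarray(p2, dtype=float) - np.asarray(p1, dtype=float))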
Hence, the volume can be processed in slabs, where each slab is comprised
of 2 slices of pixels. We can either treat each cube independently, or we can
propagate edge intersections between cubes which share the edges. This sharing
can also be done between adjacent slabs which increases storage and complexity a
bit but saves in computation time. The sharing of edge/vertex information also results
in a more compact model, and one that is more amenable to interpolated shading
(Watt 1985).
Each 3D model that is created is represented by a 3D data set. The first
segment of the data is a vertex list, which is simply a numbered list of all the points
(x,y,z) in the data in the object's own coordinate system. The second segment
contains an edge list. This list provides information as to how each of the vertices is
connected to form triangular patches. For instance, the list could specify that vertices
(from the numbered list in segment 1) 1, 2 and 3 are connected to form a triangle, as
do 3, 5, and 9, etc. The last segment of the file contains vertex normal vectors which
are used in the 3D rendering process to ensure correct rendering features such as
proper lighting.
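The three-segment file layout described above can be mirrored by a simple in-memory structure; the field names below are hypothetical and chosen only to match the vertex-list, triangle-list and normal-list description.

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class SurfaceModel:
        # Segment 1: numbered vertex list, (x, y, z) in the object's own coordinate system.
        vertices: List[Tuple[float, float, float]] = field(default_factory=list)
        # Segment 2: triangle list; each entry holds three 1-based vertex indices.
        triangles: List[Tuple[int, int, int]] = field(default_factory=list)
        # Segment 3: per-vertex normal vectors used for interpolated (smooth) shading.
        normals: List[Tuple[float, float, float]] = field(default_factory=list)

    # A single triangular patch connecting vertices 1, 2 and 3.
    model = SurfaceModel(
        vertices=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
        triangles=[(1, 2, 3)],
        normals=[(0.0, 0.0, 1.0)] * 3,
    )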
Figure 2-7: 3D models generated from CT scans of our phantom skull. Each patch of the 3D model contains a
corresponding file item which specifies its vertex list and edge list.
2.4.3.1 Tracking Technology
One of the most important issues to consider for a very accurate AR and VR
application is the method for tracking the various elements of the environment such
as the video camera and the tools and the patient (Pandya A.K. 2001d) (Azuma R.
1997). It is provided here as background information. In the Augmented world, since
slight deviations of the virtual and actual world are very noticeable, the requirement
for tracking accuracy is critical. Trackers also determine exactly how accurately the
registration between the virtual and actual object will be. Not many trackers can meet
the required specifications for AR systems and each technology has particular
strengths and drawbacks. Our research has focused on at least four different general
methods for camera and object tracking: (1) tracking using a stereoscopic infrared
camera system, (2) using a precise robotic arm with a camera mounted on it, (3) using
image processing methods and pattern recognition techniques for camera calibration
tracking and (4) fiber optics tracking. Hybrid methods have also been considered and
offer the advantage of redundancy at the expense of computational cost and
complexity.
AR scene synthesis needs several pieces of information including a
segmented object model, camera parameters, camera pose, a video stream and a
transformation that describes the object position and orientation (sensed or known).
The generation of this scene entails correct registration of the graphical viewpoint
with the actual camera view and a mixing of the video frames with the exact graphical
view of the object of interest.
Figure 2-8 represents the data flow in our implementation of AR. Given these
data, an AR scene can be synthesized. An
important aspect of this figure is camera and object tracking. We are researching
what method or combinations of methods are appropriate for tracking components of
the environment. Camera tracking, which is needed to provide the geometrical (pose)
parameters, can be done in many ways. Tracking can be achieved by mounting a
camera on a robot, which provides the geometrical information through forward
kinematics, or by tracking the camera itself by some measurement method for
example using an infrared tracking system. Image processing methods can also
derive camera position and orientation (extrinsic parameters). The camera calibration
procedure (Image Processing) is also needed because it provides the optical (focal
length and lens distortion) parameters of the camera (intrinsic parameters).
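Conceptually, once the intrinsic parameters and the camera pose are available, each vertex of the segmented model can be projected into the current video frame and drawn over it. The sketch below assumes a simple pinhole model with hypothetical matrices and ignores lens distortion.

    import numpy as np

    def project_points(model_pts, T_cam_obj, K):
        """Project 3D object-space points into pixel coordinates.

        model_pts: (N, 3) vertices of the segmented object.
        T_cam_obj: 4x4 object-to-camera transform (from tracking and registration).
        K:         3x3 intrinsic matrix (focal lengths and principal point).
        """
        pts_h = np.hstack([np.asarray(model_pts, dtype=float),
                           np.ones((len(model_pts), 1))])
        cam = (T_cam_obj @ pts_h.T)[:3]      # points in the camera frame
        pix = K @ cam                        # pinhole projection
        return (pix[:2] / pix[2]).T          # (N, 2) pixel coordinates

    def overlay(frame, pixels, value=255):
        """Mark projected vertices in the video frame (a stand-in for real rendering)."""
        h, w = frame.shape[:2]
        for u, v in np.round(pixels).astype(int):
            if 0 <= v < h and 0 <= u < w:
                frame[v, u] = value
        return frame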
The following are some general comments derived from our research on each
of the tracking methods with which we have experimented. There are some
limitations and strengths for using each of the systems outlined. Recently, infrared-
based sensors have been developed that are based on three cameras fitted with
linear charged coupled devices (CCDs) and cylindrical lenses that detect tiny infrared
light-emitting diodes (LEDs). The systems consist of an array of CCD cameras that
track instrument position by localizing LEDs located on the object of interest. We
have used this method for routine tracking of tools and patients for neurosurgery
applications (Li Q. 1999a). It is a very efficient technique with very high accuracy.
Line-of-sight and lighting conditions are major drawbacks of infrared tracking. The
virtual objects will only appear when the tracking marks are in view and the lighting
conditions are properly adjusted. There have been incidents in the OR where the
surgeon’s headlight would disable the infrared tracking system. When the surgeon
looked away from the monitor/tracker, the system would be operational; when he
needed the information and looked toward the monitor, the system would fail because the
light from the headlight would interfere with the infrared cameras. Because of the
cluttered OR environment, there have been several cases where the instrumentation/
personnel have been in the line-of-sight of the infrared system. When this occurs, the
system cannot function. Another limitation of infrared cameras is their range. Tracked
objects must be within the optimal tracking volume of the tracker, which is roughly one
cubic meter.
Figure 2-8: Data Flow for an AR Scene Synthesis.
Pattern Recognition technology uses computer vision techniques to calculate
the camera orientation relative to a pattern (Kato H. 2000). (Billinghurst et al. 2001)
are giants in the field of AR with pattern recognition. They provide a useful AR toolkit
for software development with which we have experimented. We use certain features
of this toolkit (like video mixing) in our implementation. In this form of AR, the video frame
is turned into a binary image based on some predefined threshold value. This value
can be influenced by the lighting conditions of the environment. The binary image is
then searched for certain regions that include the tracking markers. The tracking
pattern is then captured and compared to a database of pre-trained pattern templates
of that particular pattern. If there is a match, the software then has to calculate the
position of the real video camera by knowing the particular pattern parameters and
pattern orientation relative to the physical marker. Once the coordinate system of the
pattern relative to the camera is computed, any tracked or sensed objects within this
coordinate can be placed in their corresponding position. In pattern recognition, the
larger the physical pattern, the further away the pattern can be detected and thus the
greater the tracking volume.
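A rough sketch of this pipeline is given below using OpenCV as a stand-in for the toolkit's own pose computation; the marker size, threshold value, and the corner-detection/template-matching step (not shown) are illustrative assumptions.

    import cv2
    import numpy as np

    MARKER_MM = 80.0  # assumed physical edge length of the printed pattern
    # 3D corner coordinates of the marker in its own (planar) coordinate system.
    OBJECT_CORNERS = np.array([[0, 0, 0], [MARKER_MM, 0, 0],
                               [MARKER_MM, MARKER_MM, 0], [0, MARKER_MM, 0]],
                              dtype=np.float32)

    def binarize(frame_gray, threshold=100):
        """Threshold the video frame into a binary image (value is scene dependent)."""
        _, binary = cv2.threshold(frame_gray, threshold, 255, cv2.THRESH_BINARY)
        return binary

    def marker_pose(image_corners, K, dist_coeffs):
        """Estimate camera pose from the four detected marker corners.

        image_corners: (4, 2) pixel coordinates of the marker corners, ordered to
        correspond with OBJECT_CORNERS.
        """
        ok, rvec, tvec = cv2.solvePnP(OBJECT_CORNERS,
                                      np.asarray(image_corners, dtype=np.float32),
                                      K, dist_coeffs)
        R, _ = cv2.Rodrigues(rvec)  # rotation of the marker in the camera frame
        return ok, R, tvec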
One of the other tracking technologies that we have investigated to track
various objects in the environment (e.g. the camera) is fiber optics. The first step we
have taken is to understand the complex shape data that is generated and to
ascertain its accuracy. One of the possible uses of this kind of tracking device is to
track camera position and orientation of a flexible endoscope. If it was accurate
enough, this technology could be used to augment the view of a flexible endoscope
with the graphical models of structures in the environment. The “Shape Tape” is a
fiber optic device with characteristics well-suited to this work. It is based on fiberoptic
technology and can report 3D information including flexion/extension and
bending/twisting motions. It does this with specially treated fibers to sense curvature
(bend and twist). These sensors have been treated on one side to lose light
proportional to bending (Danisch 1997). The lost light is contained in absorptive
layers that prevent interaction of light with the environment. Modulation of light
throughput is very linear with curvature, and uses over 30% of available throughput
over a typical sensor range. Although this technology has great potential, our
measurements of the accuracy of this sensor indicate that the accuracy (on the order
of 1-2cm) is not sufficient for the medical domain. There are more advanced fiberoptic
sensors that are being developed and may have greater potential application in this
field in the future. This sensor is mentioned here as a placeholder for future work
when the technology is improved.
Electromagnetic sensors are attractive because they are relatively inexpensive
and do not require a “line of sight” between the transmitter and the receiver.
Magnetic digitizers use a transmitter that generates an electromagnetic field over the
operative field. Since the magnetic field is very regular and well known, the position and
orientation of three orthogonal coils (in which the magnetic field induces proportional
currents) can be determined. These probes can detect gradients in the magnetic field
in three dimensions. Ferrous metals within the environment distort the
electromagnetic field and render the system inaccurate. However, Louis et al. (Louis
1999) used a magnetic tracker (Flock of Birds, Ascension Technologies, Inc.) for a
virtual reality based system for cervical spine measurements. The magnetic sensors
were attached to the head and torso and enabled measurement of the translational
and rotational movements of the head with respect to the torso. We have used
magnetic sensors in our work in the development of a VR system for space station
applications (See Figure 2-2) (Chmielewski et al. 1998;Goldsby M.E. 1994;Pandya
A.K. 1994). These sensors have on the order of 5 mm of accuracy and are on the
borderline of what is needed for medical applications.
Figure 2-9: Fiber Optics technology that was studied as a possible candidate for AR/IGS tracking.
Fluoroscopy is another method used by some researchers to measure
movement. Fluoroscopic images are based on X-rays and hence are invasive.
Acquiring these images can be time consuming, and dynamic situations can be difficult to
capture. The accuracy of determining the relative vertebral motion is good; however,
the fluoroscopic image has a limitation in that it is a two-dimensional snap shot of a
three dimensional motion. If careful analysis is not done, 3D information is difficult to
capture and see in fluoroscopic images. One way to get around this problem is to
use multiple axes of fluoroscopic images and fuse the images together. (Komistek et
al. 2003) studied cervical disc degeneration. Their study focused on the
determination of the in vivo kinematics during active flexion and extension of normal,
degenerated, and fused cervical spines.
There are some researchers that have come to the conclusion that hybrid
methods of tracking must be used. (Rosenthal et al. 2002) have used fiducial
tracking in combination with standard magnetic tracking. In this system, the hardware
tracks fiducials in the video images where the locations of each of the fiducials are
known. They have achieved superior tracking using this methodology. The position
and orientation of the viewer is computed by inverting the projection operation. The
position data from the magnetic tracking system aids in the localization of the tracking
markers. This technique will work well when lighting conditions are stable and fiducial
trackers can easily be placed on objects of interest. In the medical domain, this may
not always be possible.
The solution that is chosen in this thesis is to use a robotic device with a
precisely mounted end-effector camera. A robotics-based camera overcomes the
problems of line-of-sight and lighting issues, but adds the limitation of range. The
robotic solution is dependent on the robotic kinematics and the range of motion of
each of the joints and their accuracy. This aspect of tracking will be considered in
detail in Chapter 4. For the restricted volume needed for robotic surgery applications,
this solution is especially attractive. Also, since the robot is already in place for this
application, this tracking method would be practical and relatively straight-forward to
implement.
2.5 What is the Importance of Human Factors in Medicine and Engineering?
It is the premise of this work that technology development is necessary, but not
sufficient for successful application in medicine. In the development paradigm
chosen, important components are the technology development, user testing and
surgical testing. As the technology is developed, the new technology is first tested for
accuracy, and then it is tested with user feedback against the conventional
technology. Performance data is gathered on the subjects performing the tests. The
metrics used can be, for example, the number of errors made during the test, the time to
complete the task, and tests to see what insight is gained by the user. An example of
this method is the way in which we studied the technology of using a heads-up display
as compared to traditional monitor viewing for endoscopic surgery.
In this study, after all the components were tested and configured, a set of
subjects (22) were tested to see if there was any increase in performance on a
particular simulation of a surgical task. Statistical significance was shown for the
increase in performance, and the technology was then successfully tested in the
operating room with positive results. The surgeon provided feedback that was used
to enhance the technology and improve the user testing. Using this methodology,
the technology development is grounded and balanced with subject testing and
extensive end-user feedback. This is bound to lead to solutions that provide more
return on investment and will be closer to actual use in surgery for the benefit of
patients (see Figure 2-10).
Figure 2-10: Successful technology development for medicine must include extensive user testing and surgical feedback.
Chapter 3: Image Guided Surgery (IGS)
Any sufficiently advanced technology is indistinguishable from magic.
- Arthur C. Clarke
The science of presenting and displaying complex 3D images in an operationally
meaningful way to a surgeon needs to be studied systematically. For this thesis several
test beds were developed to evaluate different techniques of image data visualization.
One of the forms implemented was an Image Guided Surgery (IGS) system. This
chapter covers the description, implementation, operations and accuracy testing of such
systems.
3.1 Literature Review and Description of System
Throughout every operation, a surgeon must maintain a precise sense of
complex three-dimensional anatomical relationships. IGS was first used for
neurosurgery. Accurate visualization is crucial in neurosurgery because visual
landmarks are relatively rare, and they are completely missing within gray matter.
Damage to eloquent portions of the brain anatomy can severely impair the patient.
Although neurosurgeons were the first to embrace this technology (due to the relatively
static nature of the brain) this technology is starting to impact all areas of medicine that
use an image-based approach (Berry et al. 2003;Hinsche and Smith 2001;Holly and
Foley 2003). IGS has made a tremendous impact and is here to stay. As imaging
systems become more integrated with the operating room and imaging becomes more
real-time, the tools of IGS will become even more useful.
Often lesions are surrounded by vital neurological and vascular structures and
have irregular configurations. This poses real problems during surgery in terms of
orientation, visualization and optimal tumor resection. Although stereotactic systems
provide the necessary position and orientation, the type of imaging data plays a key role
in the effectiveness of these operations. Diverse modes and types of imaging provide
alternative types of information. For instance, if the anomaly is close to the speech or
motor center, the FMRI scan will be done while the patient is speaking. The speech
center is enhanced on the scan and the surgeon, during image guidance, can avoid this
area that is displayed relative to his tool’s trajectory.
Advances in stereotactic science are transforming medicine and surgery (Wadley
and Thomas 2000) in preoperative evaluation, operative technique, intraoperative
monitoring and data collection. Computational power enables effective application of
technologies requiring high-resolution visualization and precise control and
manipulation of surgical instrumentation. Image Guidance Systems use stereotaxis
which divides the brain into three intersecting orthogonal spatial planes (sagittal,
coronal and axial, See Figure 3-1). These planes provide a rigid coordinate system
from which all the slices of the scan can be referenced. In other words, each point in
the patient’s MRI scan has a coordinate value relative to the imaging study's reference
frame.
As described in section 2.1, Image guidance involves pre-operative imaging
studies such as CT or MRI scans of the patient. These scans provide the surgeon with
different views of the pathology. IGS uses a methodology that translates into accurate
and reliable image-to-surgical space guidance. It is analogous to having a global
positioning system for the human body. These systems are primarily used as
navigation systems, but, as intelligence and a knowledge base is built into the systems
of the future, it may be possible to give optimal and least dangerous paths, warnings of
approaching dangers, signals and annotations of different organs or even information
from embedded smart sensors as guidance to a surgeon in real-time.
Figure 3-1: The three planes of an image data set. (Axial, Coronal, and Sagittal)
A relatively new field, IGS blends the use of computer-based medical imaging
data with real-time instrument position data capture to assist the surgeons in localizing
and removing lesions. The methodology involves three components: image acquisition,
with definition of a coordinate space from one or several imaging modalities; planning or
simulation of the surgical procedure, and intraoperative patient registration procedures
(Li Q. 1999a;Li Q. 1999b). Because the field is relatively new and much of the
technology is related to Augmented Reality, in the next section, the technical details of
how such a system can be implemented are given.
3.2 Implementation of Image Guidance System
The implementation of an IGS system has three basic components—tracking,
registration and image display. As described in section 2.4.3.1, there can be several
tracking technologies used for navigation systems. In this thesis, due to the emphasis
on ultimately enhancing robotic technology, the tracker that was chosen was a passive
robot arm (also considered an articulated arm). Due to its kinematic similarities to
active robotic devices, we have chosen the Microscribe arm which will streamline the
translation of the developed imaging technology from this test bed to active robots. A
section on the Microscribe tracker is given first, along with some kinematic details; next, a
section describes how the patient or phantom is registered to the imaging data; and finally a
section describes how the software is implemented.
3.2.1 Passive Robot Arm Used as the Tracking system
The critical component in interactive image-guided surgery is the use of an
intraoperative localizer system or a digitizer, which ultimately provides the surgeon
useful navigational information usually in the form of position and/or orientation of
surgical instruments. Infrared tracking is one of the most popular methods used in
stereotactic neurosurgery. The typical IGS system allows the surgeon to view the
trajectory that their tracked tool is taking through the tissue it is penetrating (See Figure
3-10). The Microscribe device can be considered a passive robot (articulated arm). Its
geometric and transformation structure similarities make it an inexpensive and useful
analog / test bed to current robotic systems. It has the advantage of being readily
accessible and amenable to quick prototype development and evaluation. We have
integrated the serial interface to the five degree-of-freedom Microscribe and can get a
precise position and orientation of its end-effector and any tool rigidly fixed to its end
point.
Figure 3-3 illustrates a forward kinematics solution (i.e., the coordinates of the robot's
end-effector in terms of the base coordinates) for the Microscribe. The dotted
green arrows on the figure show the 5 degrees of freedom (DOF) (rotation axes) of the
device. Each individual transformation matrix (T) (See Equation 3-1) specifies how the
first joint is related to the second joint by describing the rotations (in terms of the
direction cosines) and the translations needed to transform one joint coordinate to
another. Knowing the joint angles of each DOF, a forward kinematics solution can
be computed by matrix multiplication to form a homogeneous (T) transformation matrix
that defines the position and orientation of the end-effector in the base coordinate
system.
T_{B\text{-}EE} = T_{B\text{-}J0}\, T_{J0\text{-}J1}\, T_{J1\text{-}J2}\, T_{J2\text{-}J3}\, T_{J3\text{-}EE} \qquad \text{(Equation 3-1)}
Each of the transformation matrices described above is a homogeneous
transform which has both the rotation component and a translational component as
specified by the equation below. The unique features of the homogeneous transform
are that the vectors of the rotation matrix (unit vectors) of the transformed coordinate
system are referenced in the row vectors of the matrix in the base coordinate system.
In other words, the inverse of the matrix is simply formed from the transpose of the rotation
matrix and the correspondingly rotated and negated translation vector, which makes these transforms mathematically elegant
T = \begin{bmatrix} r_{00} & r_{01} & r_{02} & t_0 \\ r_{10} & r_{11} & r_{12} & t_1 \\ r_{20} & r_{21} & r_{22} & t_2 \\ 0 & 0 & 0 & 1 \end{bmatrix}
  = \begin{bmatrix} R & t \\ 0\;0\;0 & 1 \end{bmatrix},
\qquad
T^{-1} = \begin{bmatrix} R^{\mathsf T} & -R^{\mathsf T} t \\ 0\;0\;0 & 1 \end{bmatrix}
\qquad \text{(Equation 3-2)}
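The chain multiplication of Equation 3-1 and the inverse form implied by Equation 3-2 can be written compactly; this is a generic sketch that does not encode the Microscribe's actual link geometry, whose individual joint transforms would have to be supplied.

    import numpy as np

    def homogeneous(R, t):
        """Assemble a 4x4 transform from a 3x3 rotation and a 3-vector translation."""
        T = np.eye(4)
        T[:3, :3] = R
        T[:3, 3] = t
        return T

    def invert(T):
        """Inverse of a rigid transform: transpose the rotation, rotate and negate t."""
        R, t = T[:3, :3], T[:3, 3]
        return homogeneous(R.T, -R.T @ t)

    def forward_kinematics(joint_transforms):
        """T_B-EE = T_B-J0 @ T_J0-J1 @ ... @ T_J3-EE (Equation 3-1)."""
        T = np.eye(4)
        for Ti in joint_transforms:
            T = T @ Ti
        return T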
Because the measurement of the end-point of this system involves several joints,
we performed an experiment to evaluate with what precision the arm could capture the
same point using various arm configurations. This is an issue because a user of the
system who is asked to digitize a certain point can do so with a variety of joint angles.
This is important for the next step (patient registration) where several points need to be
captured from special markers on the patient's skull. Our results indicate that there is an
rms average error between all these points of about 0.73 mm in the capture of a
particular point. This is a gauge of the precision that we can expect from this instrument.
In another experiment we used a 3D ruler (which is accurate to about 0.5mm) and
measured a known point. For the measurements we made, the average error was
0.84mm from the actual point (see Figure 3-2). We can conclude that the accuracy
of the Microscribe is within 1mm. More details of the error measurements are provided
in Chapter 4 where error is considered in more detail.
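The repeatability figures quoted above can be computed directly from the captured points; a minimal sketch (with synthetic sample data in place of the actual measurements) is:

    import numpy as np

    def repeatability(points_mm):
        """RMS and standard deviation of distances from the mean of repeated captures."""
        points = np.asarray(points_mm, dtype=float)
        center = points.mean(axis=0)
        d = np.linalg.norm(points - center, axis=1)
        return np.sqrt(np.mean(d ** 2)), d.std()

    # Fifteen captures of the same physical point in different arm configurations
    # (synthetic values, for illustration only).
    rng = np.random.default_rng(1)
    captures = np.array([326.8, 40.0, 12.5]) + 0.5 * rng.standard_normal((15, 3))
    rms_mm, std_mm = repeatability(captures)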
Figure 3-2: In this experiment, we captured the same point in space using different arm configurations. We found a standard deviation of 0.91mm. The red line represents the average.
Figure 3-3: This is the transformation structure of a typical robotic device. The green arrows show the degrees of freedom of the arm. The blue labels (with red arrows) show the transformations needed to compute the base to end-effector transform.
3.2.2 Patient Registration with Fiducial Mapping
One of the most fundamental problems in IGS systems is the registration of
objects in the scene (Gong J. 1999). Patient registration involves the determination of
the spatial relationship between the image and the surgical coordinate space (Alp et al.
1998;Ferrant et al. 2002;Fleute and Lavallee 1999;Weese et al. 1998). The objects in
the real and virtual worlds must be accurately aligned. Without this alignment, the scene
will not be accurate. Certain applications (such as brain surgery or biopsy) require
even higher fidelity of registration. Registration is the process of defining special points
based on the fiducial markers or anatomical landmarks from CT, MRI, or PET scan data
and relating them to the corresponding patient data. In Figure 3-5 the 3D model of the
phantom skull (generated by the segmentation process of the CT images of the
phantom) shows the locations (in blue) of the corresponding fiducial markers (which
were visible on the CT scan). These points are then correlated with the points located
on the head of the patient in the “real world” in the operating room. The goal is to match
and correlate data from the medical images to the “real world” (i.e., the coordinate
space of the surgical instruments) (See Figure 3-5). Each fiducial in the image space is
located in each of the 3 orthogonal slices and is also verified on the 3D model of the
phantom. The exact position of each of these fiducials in the image coordinate system
is recorded in a file.
The next step is the patient/model registration step. A tracking device (the
articulated arm) is attached to the instruments to continually relay information regarding
the position and orientation of its tip (end-effector). Each of the fiducial points is located
in the image space and corresponds to an actual point on the surface of the model. All
the actual points are then digitized as shown in Figure 3-7. The paired points (image
space coordinates and robot space coordinates) need to be matched in such a way as
to provide a rigid-body transformation matrix which describes their relationship.
Coordinate matching ensures that any point seen in a medical image corresponds to an
actual point in the patient’s anatomy. To avoid problems of accuracy and also to reduce
the effect of noise, we usually capture more information (points) than is theoretically
needed. The computation can be done with as few as 3 points; however, there is a risk
of computing the wrong transformation. There are various errors that make the
computation of an iterative solution justified. There are errors not only in the tracking
device, but also in the capturing of points in both the image space and the patient space.
These inaccuracies in point position make the computation of the exact transformation
between the coordinate systems a little tenuous. Also, typical fiducials that are used for
patient registration have a design which is conducive to error. The flat middle portion of
the fiducial has a 2-3mm area in which the user can place the digitizer probe anywhere.
A funnel shaped design would allow for a much more accurate measurement as it
would allow the tool tip to be guided to the exact position. Also, just how many fiducial
points are needed and in what configuration they should be placed is an active area of research
(Cash et al. 2003). Surface matching is also a technique that is used, and the results
from this form of registration are as good as fiducial matching (Gong J. 1999).
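For reference, a closed-form alternative to the iterative estimation described in the next section is the well-known SVD-based least-squares solution for paired points; the sketch below is a generic implementation of that technique and is not the VoXim or thesis code.

    import numpy as np

    def register_paired_points(patient_pts, image_pts):
        """Least-squares rigid transform mapping patient-space points to image space.

        patient_pts, image_pts: (N, 3) arrays of corresponding fiducial positions, N >= 3.
        Returns a 4x4 homogeneous transform T such that image ~= T @ patient.
        """
        P = np.asarray(patient_pts, dtype=float)
        Q = np.asarray(image_pts, dtype=float)
        Pc, Qc = P - P.mean(axis=0), Q - Q.mean(axis=0)
        U, _, Vt = np.linalg.svd(Pc.T @ Qc)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:          # guard against a reflection solution
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = Q.mean(axis=0) - R @ P.mean(axis=0)
        T = np.eye(4)
        T[:3, :3], T[:3, 3] = R, t
        return T

    def fiducial_registration_error(T, patient_pts, image_pts):
        """RMS distance between transformed patient fiducials and the image fiducials."""
        P = np.asarray(patient_pts, dtype=float)
        Q = np.asarray(image_pts, dtype=float)
        Ph = np.hstack([P, np.ones((len(P), 1))])
        residual = (T @ Ph.T).T[:, :3] - Q
        return np.sqrt((np.linalg.norm(residual, axis=1) ** 2).mean())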
3.2.2.1 Rigid-body transformations
In order to compute a transformation matrix that can translate/rotate a vector in
the patient space to the image space, 6 parameters must be known (3 translation and 3
rotation) (See Figure 3-4). For computation flexibility, these 6 parameters can be
converted into a 3x3 rotation matrix (consisting of the direction cosine vectors) and a
3x1 translation vector combining to make a 4x4 homogeneous transformation matrix
that can easily allow transformation between these coordinate systems by matrix
multiplication. Figure 3-4 describes the relationship between the same two points in
different coordinate systems. The equation below describes how a point in the patient
coordinate system can be transformed to a point in the image coordinate system using
a simple vector-to-matrix multiplication. The 4×4 transformation matrix is the one that is
estimated from the pair-point matching procedure (Gong J. 1999) to be discussed next.
Figure 3-4: The transformation between the patient coordinate system and the image coordinate system can be represented by a translation vector (T) and a rotation vector (R). These entities form the homogeneous 4x4 transformation matrix.
\begin{bmatrix} p_x \\ p_y \\ p_z \\ 1 \end{bmatrix}_I
=
\begin{bmatrix} r_{00} & r_{01} & r_{02} & t_0 \\ r_{10} & r_{11} & r_{12} & t_1 \\ r_{20} & r_{21} & r_{22} & t_2 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} p_x \\ p_y \\ p_z \\ 1 \end{bmatrix}_P
\qquad \text{(Equation 3-3)}
In rigid-body transformations, there are two transformation entities – a translation
and a rotation. Given a rotation vector which represents rotations around the x, y and z
axes respectively, a rotation matrix can be computed which relates two orthogonal
coordinate systems having the same origin but differing in these rotation angles. Each
individual rotation is given by the following three sets of equations:
R_x(\alpha)=\begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{bmatrix},\quad
R_y(\beta)=\begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix},\quad
R_z(\gamma)=\begin{bmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{bmatrix}
\qquad \text{(Equation 3-4)}
The combined effect of these rotation matrices is given by the following
equation:
R_z(\gamma)R_y(\beta)R_x(\alpha)=
\begin{bmatrix}
\cos\gamma\cos\beta & \cos\gamma\sin\beta\sin\alpha-\sin\gamma\cos\alpha & \cos\gamma\sin\beta\cos\alpha+\sin\gamma\sin\alpha \\
\sin\gamma\cos\beta & \sin\gamma\sin\beta\sin\alpha+\cos\gamma\cos\alpha & \sin\gamma\sin\beta\cos\alpha-\cos\gamma\sin\alpha \\
-\sin\beta & \cos\beta\sin\alpha & \cos\beta\cos\alpha
\end{bmatrix}
\qquad \text{(Equation 3-5)}
In order to convert a matrix back to Euler angles, one has to work from the
components of the above matrix and convert the direction cosine values back to angles.
For instance, the last component of the third row (2,2) of the cosine matrix can be used
to recover β from its cosine, from which we can use β to compute α by using
component (2,0) of the above matrix. Then β is used again to get γ by using
components (0,2), (1,2) and (2,1).
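Both directions of the conversion can be sketched compactly. The sketch below follows the Rz·Ry·Rx convention of Equation 3-5, so the specific matrix components used to recover the angles are those of that convention and may be indexed differently from the description above.

    import numpy as np

    def euler_to_matrix(alpha, beta, gamma):
        """Compose R = Rz(gamma) @ Ry(beta) @ Rx(alpha) as in Equation 3-5."""
        ca, sa = np.cos(alpha), np.sin(alpha)
        cb, sb = np.cos(beta), np.sin(beta)
        cg, sg = np.cos(gamma), np.sin(gamma)
        Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
        Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
        Rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
        return Rz @ Ry @ Rx

    def matrix_to_euler(R):
        """Recover (alpha, beta, gamma) from the direction-cosine matrix (non-degenerate case)."""
        beta = -np.arcsin(R[2, 0])               # R[2,0] = -sin(beta)
        alpha = np.arctan2(R[2, 1], R[2, 2])     # cos(beta)sin(alpha), cos(beta)cos(alpha)
        gamma = np.arctan2(R[1, 0], R[0, 0])     # sin(gamma)cos(beta), cos(gamma)cos(beta)
        return alpha, beta, gamma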
Figure 3-5: Fiducial markers (visible in CT scans) are applied to the phantom before the scan.
Figure 3-6: A Fiducial marker in the imaging space is located on all three slices of the CT scan and is displayed (in blue) on the 3D model.
In order to convert rotation angles into a transformation matrix and vice versa,
equation 3-5 must be used. This conversion in effect reduces the number of
optimization parameters from 12 down to 6 as the entire direction cosine matrix and the
translation vector no longer need to be used in this minimization process. This is very
important for the convergence and complexity of the algorithm that we will cover next
(Van Loan 2000).
Figure 3-7: On the left, the fiducial marker on the actual skull is being digitized by the robotic articulated arm. This point corresponds to the one shown on the 3D model (right).
Figure 3-8 illustrates, at a conceptual level, how parameter estimation is applied
to the image space to patient space registration and what linear equation sets need to
be solved. Central to parameter estimation is a model of the system (See Figure 3-9).
This model takes a set of parameters (which need to be optimized) and the input set of
observed values. Given this information, the model will compute the corresponding
output set of values. The output set is compared to the corresponding observed values
to ascertain if the values are within a certain tolerance. If not, the parameters will be
updated using a steepest-descent algorithm and the process will be iterated upon until a
solution has been reached (Van Loan 2000) (Gong J. 1999).
In general, in order to optimize a particular set of parameters, we must first
define an objective function (χ²) which corresponds to the model of the system. In
order to use a steepest descent approach, we must be able to compute the gradient of
the objective function at the particular estimated parameter set (∇χ²(a)). The gradient
of a function gives the vector where the function takes the steepest slope (Press W.H.
1992). It is akin to a mountain stream which follows the gradient of the mountain to get
to the lowest possible minimum. There are dangers in that the function can get “stuck”
in a local minimum and not find a global minimum value. In order to take a functional
step in the direction of the gradient, the following equation must be used (Kaplan 1981).
a_{\text{next}} = a_{\text{current}} - \lambda\, \nabla\chi^2(a_{\text{current}}) \qquad \text{(Equation 3-6)}
Figure 3-8: Parameter estimation for pair-point matching algorithm. The paired points (fiducials in image coordinates and patient coordinates) are used to derive a rigid body transformation that will translate a vector in patient space to a vector in image/3D model space.
Once the function can be evaluated at its gradient, the next iteration can be
taken and the value of the parameter vector (a) can be refined such that the correct
values can be achieved. We use the Levenberg-Marquardt method (Van Loan 2000)
for the computation of the descent steps, and it produces fast convergence
and sufficient accuracy for this application.
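A bare-bones version of the update loop of Equation 3-6 is sketched below with a numerically estimated gradient; the actual system uses the Levenberg-Marquardt refinement mentioned above, and the step size, tolerance and model callable here are arbitrary placeholders.

    import numpy as np

    def chi_squared(params, patient_pts, image_pts, model):
        """Sum of squared residuals between transformed patient points and image points."""
        predicted = model(params, patient_pts)
        return np.sum((predicted - image_pts) ** 2)

    def steepest_descent(params, patient_pts, image_pts, model,
                         step=1e-3, tol=1e-8, max_iter=5000, h=1e-6):
        """a_next = a_current - step * grad(chi^2)(a_current)   (Equation 3-6)."""
        a = np.asarray(params, dtype=float)
        for _ in range(max_iter):
            base = chi_squared(a, patient_pts, image_pts, model)
            grad = np.zeros_like(a)
            for i in range(a.size):          # finite-difference estimate of the gradient
                ai = a.copy()
                ai[i] += h
                grad[i] = (chi_squared(ai, patient_pts, image_pts, model) - base) / h
            a_next = a - step * grad
            if abs(chi_squared(a_next, patient_pts, image_pts, model) - base) < tol:
                return a_next
            a = a_next
        return a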
Once all the observed pairs can be predicted by the adjusted parameters to
within a certain tolerance, the results are reported. The 6 parameters reported (namely
the three translations and three rotation angles) are then converted to the
homogeneous transformation matrix. The real-time software system is then able to
apply this transformation to all the positions and orientations of the patient coordinate
system to produce the same point in the image coordinate system. Hence, the end-
effector vector of the tool being tracked can then be displayed in the image space or the
3D model space using this transformation.
If multiple sets of images are taken (for instance CT/MRI and PET scans),
then the same methodology as above can be used to compute a transformation matrix
to align or overlay the images from multiple modalities. This is called image fusion and
is an active area of research. All available preoperative image data are fused into a
uniform coordinate system that corresponds to the individual patient's brain. The
imaging data can then be presented to the surgeon in a single display for surgical
planning and computer-based operations, giving the surgeon an optimal viewing
environment.
Each imaging modality displays anatomical structures and lesions in a unique
way. This benefits the surgeon by providing several different ways to view the same
anatomical structure, and requires the development of an interactive relationship
between the images and the real world. Registration is used to build this relationship
and enables the surgeon to use each imaging modality to its greatest advantage for
localizing the anatomical structure. The registration is mathematically identical to the
patient registration already described (Press W.H. 1992). The difference is that the
pairs of matched points come from each of the image modality coordinate systems.
Figure 3-9: The model for coordinate transform estimation. The points at the input of the model are converted to a new coordinate system using the given estimated parameters.
Figure 3-10 shows a surgeon using an image-guided system in the operating
room on an actual patient (Zamorano L. 2001). In this case, the tool that she is holding
is tracked by an infrared camera. The registration has already been performed on the
fiducials applied to the patient before the imaging studies were done, and the tool's
trajectory is displayed in the image space on all three orthogonal scans of the patient's
MRI and also on reconstructed orthogonal slices relative to the tool trajectory.
Figure 3-11 shows a similar system developed in this thesis, showing the robotic
end-effector on orthogonal slices of the phantom's CT scans as well as on a 3D model of
the phantom. In this case, the differences are that the tracking system is a robotic device, the
imaging study is a CT scan, and the tool is also displayed in a 3D model of the phantom.
[Figure 3-9 diagram: model input $(X_i, Y_i, Z_i)$; parameters to be estimated $(T_0, T_1, T_2, \alpha, \beta, \gamma)$, internally converted to a 4×4 homogeneous transformation matrix with rotation elements $r_{ij}$ and translations $t_x, t_y, t_z$; model output $(X(M), Y(M), Z(M))$.]
Figure 3-10: A surgeon is using an image-guided system where the tool that she is using is tracked by an infrared camera system and displayed on the orthogonal slices of the preoperative MRI scan.
Figure 3-11: The image guidance system used in the OR was re-implemented in this thesis to use the same articulated-arm tracker as the AR system, providing a testbed for evaluating this current technology against the up-coming AR technology.
3.2.3 Software Architecture
The implementation of the software system is described in Figure 3-12. There
are three main software components: the server, the main client, and the mirror clients.
This type of system architecture is important in telemedicine, where expert collaboration
and teaching are emphasized (Pandya A.K. 2002). The server subsystem interfaces to
the robotic hardware and reads the kinematic data that the robot provides. The server
interface (Figure 3-13) allows the user to control the registration process and provides
information about the operation of the server software. It performs a read of the robot
whenever the main client requests data. Only the main client can request a read of the
robot. All other clients (anywhere on the internet) can only follow the lead of the
main client and mirror the data that it has processed. The reason for this is that each
connection to the server is implemented as a separate thread (process). Multiple
processes are forbidden from opening and requesting information from the robotic hardware
serial interface at the same time; only one thread (the main client's) can interact with the
serial port. Moreover, the main client is the software that is running where the
hardware resides. Although it is possible to control the robotic reads (and, if it were an
active robot, the joints) from a remote location, it only makes sense to do so from the surgery
site.
In addition, each client (including the main client) has a complete set of imaging
data and the associated 3D models. As computational power and graphics throughput
can be very different, each client is free to choose the fidelity of information that it can
handle. If the machine the client is executing on is very fast and has a high-end
graphics card for 3D rendering, it can display high-resolution 3D models. On the other
hand, a simpler PC may be able to handle only low-resolution image data and
simple wire-frame models of the 3D data to achieve near real-time performance. The
design implemented for this software allows this flexibility and also opens a door for
research in this very fertile area of telemedicine.
Before it is open for requests by clients, the server prompts the user to do the
pair-point registration (discussed in detail in section 3.2.2). This registration
process takes the pairs of matched points (image-space points and their corresponding
actual points) and computes the transformation matrix necessary to convert the raw
end-effector pose data to the image-space coordinate system. The server then waits
for the main client to log in. The main client (another program running on the
internet) must be the first to initiate a request. It first must send an authentication signal
and a password (for data security reasons) to the server. The server checks its list of
users and then allows the main client to send requests. The main client then initiates
the sequence by sending a request for the kinematic data that it requires. The server then
queries the robot, computes the necessary data, and sends it to the main client. Up until
this point, all mirror clients are blocked from logging in. The server then waits on the
same internet port for more requests.
Figure 3-12: Software implementation of the Image Guidance System. The system is implemented as a client-server system in which multiple clients anywhere on the internet can view the scene as seen by the main client.
If another client wants information, it must also provide authentication information to the server and must identify itself as a mirror
client. The server then checks that a main client has requested information and sends
any of its already computed information (from the latest main client request) to the
mirror clients. In other words, a mirror client can only echo what the main client is
requesting and updating. Any mirror client can request a variety of pre-computed
information from the server, which includes the individual joint angles, the end-effector
transformation in the robot's coordinate system, the end-effector transformation in image
coordinates, or even the transformation matrix that the server has computed after it has
done the pair-point matching. As this is patient information crossing
the internet, the same client/server paradigm allows an encryption scheme to be
set up so that the data can travel the internet securely in a Virtual Private Network (which is
beyond the scope of this thesis). The server software has been implemented on a PC
platform using Visual C++. The client software uses the 3D Slicer software platform
(Grimson W.E.L. 1996), written in Tcl/Tk, onto which several modules have been added
to log in to the server software and to display the end-effector in the 3D display. All
software, methodologies, and hardware specifications can be provided for non-
operating-room use (as the software has not gone through the rigorous testing required for OR
use) upon request to the author.
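The following C++ sketch is a simplified, non-networked illustration of this request policy (the names and structure are hypothetical; the real server is a multithreaded Visual C++ program with socket and serial-port handling): only the main client triggers a robot read, and mirror clients merely receive the cached result.

#include <cstdio>
#include <string>

// Simplified sketch of the request policy described above.
struct Pose { double x, y, z; };

class TrackerServer {
public:
    Pose handleRequest(const std::string& clientType) {
        if (clientType == "main") {
            lastPose_ = readRobot();      // only the main client touches the serial port
            hasData_ = true;
        } else if (!hasData_) {
            std::puts("mirror client blocked: no main-client data yet");
        }
        return lastPose_;
    }
private:
    Pose readRobot() {
        // Placeholder for the serial-port read and the registration transform.
        static double t = 0.0; t += 1.0;
        return {t, 2 * t, 3 * t};
    }
    Pose lastPose_{};
    bool hasData_ = false;
};

int main() {
    TrackerServer server;
    server.handleRequest("mirror");            // blocked until the main client has led
    Pose p1 = server.handleRequest("main");    // triggers a robot read
    Pose p2 = server.handleRequest("mirror");  // echoes the cached result
    std::printf("main: %.0f %.0f %.0f  mirror: %.0f %.0f %.0f\n",
                p1.x, p1.y, p1.z, p2.x, p2.y, p2.z);
    return 0;
}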
3.3 Discussion
The goal of this portion of the thesis was to create a test-bed IGS system for use
in subject testing. It was important that the test-bed hardware be the same as that used for the AR
system development, to maintain consistency and provide a baseline for comparative
user testing. It is important to note that the novel feature of the work in this section is the
client/server interfacing. This can be useful in a telepresence type of application where
multiple remote users of the system are needed for consultation. Many of the other
features implemented are readily available in commercial systems, as IGS has
developed substantially over the last 5 years. The major benefits of this development in
the context of this thesis are (1) that this implementation provides a common platform
that allows direct comparison of AR technology with IGS, taking away the
inconsistencies of multiple platforms, and (2) that AR is based on much of the same
technology, so an understanding of how IGS works provides a basis for understanding
AR. In the next chapter, implementation details of the AR system will be provided.
Figure 3-13: This is the Tracker Server interface. This software does the pair-point matching on the image data and also handles communication of various tracking information to multiple clients on the network.
[Figure 3-13 interface elements: the fiducial name, the image-space matched-points file, information on which clients are logged in and what they are requesting, registration and other status messages (e.g., "Registration Successful. Error = 1.2mm", "Server Listening for Requests on port 1024"), and the internet port on which the server is listening. Example requests: the main client requests position in image space; mirror client 1 requests position in image space; mirror client 2 requests the registration matrix.]
Chapter 4: MEDICAL AUGMENTED REALITY SYSTEM (MARS)
Reality is merely an illusion, albeit a very persistent one.
- Albert Einstein
As with the image guidance system described in the previous chapter, another
technical aim of this thesis is to create a test-bed Augmented Reality system (Pandya A.K.
2001e; Pandya A.K. 2001c) to be used to evaluate the comparative utility and
accuracy of this technology and to provide an easily translatable kinematics-based
AR prototype. In this chapter, an introduction to AR technology is followed by an in-
depth literature survey. Then, the details of how an AR system is implemented are
given, along with a component-by-component error analysis of the system.
4.1 Literature Review and Description of System
An Augmented Reality (AR) system generates a composite view for the user
that includes the live view fused (registered) with either pre-computed data (e.g. 3D
geometry) or other sensed data. Several researchers have
studied AR techniques for various fields within the medical domain (Azuma
1997;Blackwell et al. 2000;Cho and Neumann 2001;Freysinger et al. 1997). One of
the novel features presented in this thesis is the use of medical robotics
with AR. For convenience, a new term is introduced--Augmented Robotics--(Pandya
A. K. 2002) which is defined as an Augmented Reality scene generated using the
kinematics of a robotic device that has a camera system mounted on or near the
end-effector. It is a combination of the real scene viewed by the user and a virtual
scene generated by the computer that augments the scene with additional
information.
Real-time video image processing and computer graphics technologies
have converged to make possible the display of a virtual graphical image correctly
registered with a live video view of an environment of interest. To our knowledge, no
other medically related AR system based on robotic tracking has been reported
(Pandya A.K. 2001e;Pandya A.K. 2001d;Siadat M. 2002). Such a system potentially represents an
efficient and intuitive way to link robotic systems and the surgeon to the patient data.
We envision a system in which the surgeon can visualize critical imaged or sensed
data on demand, directly overlaid on the video stream at the remote site. The research
activities in Augmented Reality center around the development of methods to register
the two distinct image/data sets and to keep them registered in real time
(Billinghurst et al. 2001). The computer-generated virtual objects must be accurately
registered with the real world in all dimensions. Errors in this registration will prevent
the user from seeing the real and virtual images as fused (See Figure 4-1) (Azuma R.
1997).
Robotic systems have advanced dramatically over the last few years, as
described in the introductory chapter. However, they typically lack a link to the patient
imaging information and therefore have no Augmented Reality capability.
Visualization of the structures of interest before the surgery would greatly enhance
the surgeon's ability to position the three small laparoscopic ports on the patient.
Incorrect placement of these ports can drastically affect the success of these
surgeries. The narrow field of view in laparoscopic surgery makes it hard for the
surgeon to recognize the internal organs. In addition, because the video
views obtained from the scope are very near-field, and not always in the surgeon's
frame of reference, the surgeon can become disoriented. Therefore, an AR system
may bring a significant improvement to the robotic laparoscopic surgery procedure
(Cleary K. 2001).
Figure 4-1: An AR scene can be generated by the alignment of the camera’s trajectory and the 3D graphics virtual camera’s trajectory. Once the two cameras are aligned, the actual objects will match their 3D modeled replicas.
During surgeries, fixed critical objects overlaid on the video stream could
provide additional situational-awareness cues. For instance, the robot arms could
be augmented when not in direct view, simple coordinate systems could be added to
give orientation cues to the surgeon, or registered sensor data could be overlaid.
It is the thesis of this work that a potential exists for robotic and IGS systems to use
augmentation technology to provide the surgeon with a direct link to patient pre-
and/or intra-operative images (e.g. ultrasound data, open MRI data) and other sensor
data.
We have built a system in which the surgeon can visualize critical imaging data on
demand, directly overlaid on the video stream. The system is built upon the IGS technology
described in Chapter 3. The AR system implementation adds a video
camera mounted at the end of the Microscribe (a passive robot arm); the camera is calibrated
and registered and is able to generate an augmented view of the phantom skull. The
augmentation in this case is made up of 3D models of the various structures within
the phantom, generated from a CT scan of the phantom using the segmentation techniques described in
Chapter 2 (See Figure 4-2). In the next section, a
detailed literature review of medical AR technology is provided.
Figure 4-2: The precursor of Augmented Reality/Augmented Robotics. We use the Microscribe as the
tracked tool (a). The position and orientation of the end-effector is shown on the orthogonal slices and 3D model of the phantom skull. After adding a calibrated and registered camera (b), we can generate a monoscopic Robotic Augmentation scene (c).
(a) Robotics-based Neuronavigation
(b) Camera System at End-effector
(c)Augmented Robotics
Augmented reality is an up-and-coming field in the medical world. Its use in
medical disciplines that require accurate 3D visualization (particularly surgery) should
be seriously researched and implemented. AR systems are currently being
researched for clinical usage. We predict that AR will become an important tool in
medical training, preoperative planning, preoperative and intraoperative data
visualization, and intraoperative tool guidance. It is a technology that uses some form
of three-dimensional (3-D) position sensing, a 3D reconstruction of the patient data of
interest, and real-time overlay of this information on an actual view of the patient. This
section reviews pioneering research in this field and provides the framework and
context on which the current prototype AR system provided in this thesis is based.
4.1.1 Medical Augmented Reality
Medical augmentation has been studied by numerous other researchers (Iseki
et al. 2001) (Maurer et al. 1999). However, a landmark paper on registration methods
for Image Guided Surgery and Enhanced Visualization was presented by (Grimson
W.E.L. 1996). In this paper, an augmentation scene from a static camera system was
produced using a laser range scanner and a video camera. In their approach, a laser
scan of the patient produced a set of range data. This data was manually separated
to include just the patient and region of interest. Next, the data was manually matched
to the video frame, and a computer-controlled refinement stage that cycled over all the
possible pairings of the MRI points to laser points produced the needed AR
transformations. In their paper, they state that "Augmentation of a stationary video
camera is relatively straightforward; however, [dynamic] tracking of the camera is
more relevant and more challenging". They report a registration RMS error of their
system of 1.6mm, but admit that this error is the error of the data fitting, that it was
difficult to ascertain the actual registration error, and that their future work would
include "some kind of phantom study". In our study we provide both the dynamic
tracking and a phantom accuracy study, which will be covered in detail.
Raya et al. (Raya M. A. 2003) have proposed an AR prototype to replace the
traditional optical microscope view with a digital one. Their AR prototype uses
a) an infrared tracking system and b) two video cameras tracked by infrared LEDs. They
consider two types of error in their prototype error analysis: 1) object-space error and 2)
camera calibration error. The first measure is obtained by finding the closest
approach between the point in object space and the line of sight formed by back-
projecting the measured 2D coordinates out through the camera model. The second
measure is defined as the distance between the actual and image points as projected on the
screen, which is obviously not caused solely by camera calibration error. They have
reported errors of 0.2±0.15 mm and 0.4±0.2 pixels for what they call "object
space error" and "camera calibration error," respectively. Our experience has been
that typical tracking devices alone have on the order of 1mm of error. Moreover, they have
used anatomical structures/points on the skull surface to measure the error, which
is, based on our experience, very subjective and inaccurate.
Hattori et al. (Hattori A. 2003) have developed a data fusion system for the
"daVinci" robotic surgery system, composed of an optical 3D location sensor and a
digital video processing system. Their proposed system needs to be
calibrated/registered to calculate the transformation from the optical marker to the
camera. This extra step must be taken each time before surgery. In
their paper, they do not present a comprehensive results section or any accuracy
study of the daVinci system and their infrared tracking system in generating an AR
scene. Their system differs from what we propose here because it did not use the
robot's kinematics and involved additional infrared hardware that needed active LEDs
in continuous line-of-sight view of an infrared camera. No rigorous error analysis was
reported.
(Khamene A. 2003) have developed an AR system for MRI-guided needle
biopsy. Their main goal was to reduce or completely remove the need for the
interventional scanning (by a high field closed magnet MRI) as well as the need for an
open MRI scanner from the biopsy procedure. Their system consists of: 1) one video-
see-through head mounted display (HMD), 2) two video cameras attached to the
HMD to provide a stereo view of the scene, 3) a third video camera for tracking, 4) a
set of optical markers attached to the patient’s bed, 5) a set of optical markers
attached to the needle. In their analysis, they overlay the model of the skin of the
patient on the patient. They have reported an accuracy study as good as 1 mm for
the whole system. Our experience suggests that typical tracking devices alone are on
this order of error. They have pointed out that, for the small number of cases where the
error is substantially larger than 1 mm, the errors were most likely caused by
needle bending. The line-of-sight problem is also a limitation for this type of tracking
method.
In their paper on a data fusion environment for multimodal neuronavigation,
Jannin et al (Jannin P. 2000) briefly experimented with AR techniques as applied to
the Zeiss Microscope. They used projected 2D contours in the focal plane of the right
ocular of the microscope. The main limitation that they noted was that no information
about structures in front of or behind this plane was visible and that different contours
could not be visualized with different colors, line widths, or labels. No error estimates
were provided for the augmentation technique they used.
(Iseki et al. 2001) used augmentation techniques with endoscopes. They
present an endoscopic augmented reality (AR) navigation system. The system consisted
of a rigid endoscope with light-emitting diodes, an optical tracking system, and a
controller. Three-dimensional virtual images of the tumor and nearby anatomic
structures (including the internal carotid arteries, sphenoid sinuses, and optic nerves)
were superimposed on real-time endoscopic live images. An interesting aspect of
their work was that as the device approached the object of interest, the
object would change colors to reflect the distance. No error estimates were provided
for the AR scene generation.
(Masutani et al. 1998) also constructed an AR-based visualization system to
support intravascular neurosurgery and evaluated it in clinical environments. Three-
dimensional vascular models were overlaid on video images from X-ray fluoroscopy
by 2D/3D registration using fiducial markers. The models were reconstructed from 3D
data obtained from X-ray computed tomographic angiography using standard
techniques. Here, the virtual camera position (camera tracking) was calculated using
the coordinates of the fiducial markers so that the projected view geometry of the 3D
computer graphics corresponded to the X-ray fluoroscopy that they used. They report
an error of 3mm, but also state that these errors are computed using only pseudo-3D
estimation of errors, unlike the true 3D values for stereo camera measurements.
They report problems with very slow frame rates.
(Iseki et al. 1997) have developed an overlaid three-dimensional image
(Volumegraph)-guided navigation system that allows navigation during operative
procedures. The three-dimensional image is superimposed on the patient's head and
body via a semi-transparent mirror. The Volumegraph can display three-dimensional
images in the air by a light beam that is based on CT/MRI data. Based on clinical
application in 7 cases, the system was found to be advantageous because the
surgical procedures could be navigated easily by augmented reality in the surgical
field. Invisible parts of the surgical field were supplemented with the overlaid three-
dimensional images (Volumegraph) as if it were the virtual operative field. The
disadvantage of this work is that it relies on the printing and processing of CT or MRI
files into holographic films (which takes a day or so). The registration procedure and
error analysis are not well documented.
(Sato et al. 1998) in their paper describe AR visualization for the guidance of
breast-conservative cancer surgery using ultrasonic images acquired in the operating
room just before surgical resection. In their application, the 3-D position and
orientation of a video camera are obtained to integrate video and ultrasonic images in
a geometrically accurate manner. Superimposing the 3-D tumor models onto live
video images of the patient's breast enables the surgeon to perceive the exact 3-D
position of the tumor, including irregular cancer invasions which cannot be perceived
by touch, as if it were visible through the breast skin. The system was shown to be
effective in experiments using phantom and clinical data.
(Wagner et al. 1995) present a new visualization system for image-guided
stereotactic navigation in tumor surgery. The combination of frameless stereotactic
localization technology with real-time video processing permits the visualization of
medical imaging data as a video overlay during the actual surgical procedures. Virtual
computer-generated anatomical structures were displayed intraoperatively in a semi-
immersive heads-up display. This results in surgical navigation assistance without
limiting the judgment of the physician based on the continuous observation of the
operating field.
Another example of AR in the medical field has been reported by the following
group (Fuchs et al. 1998;Fuchs et al. 1996;Livingston and State 1997). They used an
optical see-through display with which the physician was able to view a volume-
rendered image of the fetus overlaid on the abdomen of the pregnant woman. The
image appears as if it were inside the abdomen and is correctly positioned as the
physician moves within the environment.
A very interesting and useful AR implementation is for the craniofacial surgeon.
In this implementation (Patel et al. 1996) the surgeon was able to view the final
results of a surgery directly on the patient rather than only with the volume
visualization.
As illustrated by this sample of AR-related work, many researchers have
worked on this new technology; however, there are very few examples of rigorous
error analysis of AR, and there have been no comparisons of AR with its older sister
technology, Image Guidance. In addition, there have been no medical robotic kinematics-
based AR systems.
4.1.2 Research in Camera Calibration
In order to generate an accurate AR scene, one needs to set up a virtual
camera that models the actual camera accurately. There is a large body of research
in this regard (Abdel-Aziz Y. I. 1971;Heikkila 2000;Tsai 1987.;Wang L. 1991). Among
them, a key paper is the one published by Roger Tsai (Tsai 1987). We used the
camera model originally proposed by (Tsai 1987), used by (Weng J. 1992), and
refined by Heikkila (Heikkila 2000). Heikkila et al. also provided a very useful
implementation. Details of the camera calibration process (which is the foundation
upon which AR is built) are described in the implementation section of this chapter.
4.1.3 AR in Telepresence
The telepresence aspect of robotics allows the surgeon to perform surgery
remotely (Freysinger et al. 1997). This may allow the surgeon, in time-critical
situations, to apply his/her skills in remote locations. Telepresence is a
technology that projects the operator's motions and dexterity to a remote location
while providing tactile, visual, and auditory feedback. This is a very challenging
operation for the user (Knight et al. 2003b), especially when the robot is remote from
the user and there are time delays. Under such conditions, it may be advantageous, for
instance, to manipulate a virtual version of the robot arm and practice the operation
directly on a model of the arm. Researchers at the University of Toronto have built a
system (Drascic and Milgram 1996) for path planning using AR technology. Others
have also used dynamic overlays with telepresence systems. (Satava
1999) described the future of using telepresence technology in a "digital
battlefield" type of environment, in which surgeons of the future may not have to live and work on
the battlefield. He claims that approximately 90% of the knowledge a physician
requires can be obtained through electronic means, such as diagnostic sensors and
imaging modalities, directly seeing the patient with a video camera for medical
consultation, or using electronic medical records. Using these modalities remotely
through a telepresence interface is a natural evolution of medical systems of the
future. In the methods section, a detailed client-server software design is provided
which shows how the current IGS and AR system architecture can be used for
telepresence applications.
4.1.4 Live View with Registered Data
Combining video with graphics can be done in a number of ways. Once the
position and orientation of the camera and the objects are known, an AR scene can be
generated. In computer graphics, AR is achieved by the alignment of the virtual
(graphical) camera with the actual camera. The techniques of texture mapping and
3D rendering are used to position the virtual segmented objects within the augmented
scene. In addition, a technique of chroma-keying can be used (Azuma 1997). The
background of the virtual scene can be set to a particular color (say orange). None of
the objects in the video scene should have that particular color. An algorithm which
takes all the orange areas and replaces them with the video view will produce a
picture which shows the virtual object on the video view. If the 3D
coordinates of all surfaces in the scene were known, a depth search at each pixel
could be done to determine whether the virtual object or the video object was closer. The
closer object would be drawn and the other discarded. In this type of display, the
virtual object could then be displayed as behind the real object and vice-versa.
Although this form of visualization is beneficial in certain situations, it was not
performed in this thesis because it is advantageous to see the projected image of the
objects of interest on the video view in order to perform, for instance, a craniotomy.
4.2 IMPLEMENTATION OF AUGMENTED REALITY
The steps taken to generate a medical AR scene are illustrated in Figure 4-3. There
After image data collection, segmentation of the objects of interest in the image space
allows the creation of the 3D graphical models that are needed for augmentation. This
step can be performed for image guidance also, but it is not a necessary step, since
the orthogonal views of the imaging modality can suffice for navigation (and frequently
are the only ones used for navigation). The key differences between IGS and AR are
the camera parameter estimation, the camera mounting, and the mixing of the live
video and graphics. Camera parameter estimation (camera calibration), which builds a
virtual camera that closely models the actual camera, and pose estimation of the
camera need to be done only once for the system, but can be done periodically in
order to verify the camera system. These two steps will be covered in detail. When
generating the graphical view of the virtual objects, the relative spatial positions and
orientations of the virtual camera and virtual objects are the same as
those of the actual camera and actual objects of interest. Furthermore, the virtual
camera and virtual objects very accurately model their actual counterparts. This allows the
graphical view of the virtual objects to be seen from the actual camera point of view
on the live video if the objects of interest are visible. The mixing of the live video (H.
Kato 2000) from the actual camera with the graphical view from the virtual camera
enables the surgeon to see the segmented objects of interest projected on the video
view from any perspective and at any depth.
Figure 4-3: These are the steps needed to generate both a Neuro Navigation (NN) System and an Augmented Reality System. Note that AR represents an extension to Neuronavigation and can be performed simultaneously with NN.
4.2.1 Coordinate Systems
At a conceptual level, there are several coordinate systems in AR that need to
be understood in order to correctly achieve the results. Figure 4-4 illustrates the
coordinate systems. The first coordinate system is the world coordinate system. The robot
and all the other components reside in the world coordinate system. In dynamic
systems, the base of the robot is also tracked (sometimes with infrared technology) or
aligned and fixed to the world coordinate system. In situations where the robot base
is mobile, there must be a means of tracking the robot base. The other coordinate
systems, which include the robot base, its end-effector, the object(s), the camera, the
pattern used for camera calibration, and the image/video, all must have the correct
transformations relating one to the other, as shown below.
Figure 4-4: A series of coordinate systems for the AR development
4.2.2 Robotic-based Tracking of the Camera
We have researched view augmentation of a camera that can be mounted at
the end-effector of a robot. The live views from the robot end-effector camera will
have synthetically generated 3D graphical views of the structures overlaid on them.
We have chosen to implement the AR system using the same hardware platform as
the IGS system (described in Chapter 3) to make comparison and implementation
easier. As a testbed for the robotic-based AR system, we have mounted a miniature
camera (Watec LCL-628, 7.5 mm diameter, 330 TV lines, 3.9mm f2.8 lens (H-52º, V-
40º)) on the end-effector of the Microscribe device. This device can be considered a
passive robot (articulated arm). Its geometric and transformational similarities
make it an inexpensive and useful analog to current active robotic systems. It has the
advantage of being readily accessible and amenable to quick prototype development
and evaluation. We have integrated the serial interface to the five-degree-of-freedom
Microscribe, obtain a precise position and orientation of the camera for the AR
interface, and have generated a prototype Augmented Robotics system using this
device.
The method to ascertain the end-effector position and orientation is exactly
the same as described in the chapter on image guidance (section 3.2.1), where details
of how the forward kinematics solution (i.e. the end-effector of the robot coordinates in
terms of the base coordinates) is defined are given. Each individual transform specifies how
the first joint is related to the second joint. The combination of the 4×4 homogeneous
transformation matrices defines the position and orientation of the end-effector in the
base coordinate system as follows:
$T_{B-EE} = T_{B-J0} \times T_{J0-J1} \times T_{J1-J2} \times T_{J2-J3} \times T_{J3-EE}$ (Equation 4-1)
Additional transformations are needed to compute the needed AR
transformation. A relationship between the object (O) that has to be augmented and
the camera coordinate system (C) needs to be derived ($T_{O-C}$). This transformation is
computed by knowing the transformation from the object to the base of the robot
($T_{O-B}$), the measured transformation from the base of the robot to the end-effector
($T_{B-EE}$) and also the transformation from the end-effector of the robot to the
coordinates of the camera ($T_{EE-C}$) as follows:
$T_{O-C} = T_{O-B} \times T_{B-EE} \times T_{EE-C}$ (Equation 4-2)
The computed relationship of Equation 4-2 allows the alignment of the actual
camera coordinates with the coordinate system of the virtual graphics camera. One of the
key registrations that needs to be done is the object (patient) registration ($T_{O-B}$, object to
base of robot). This transformation is computed using the pair-point matching that is
also done in standard neuronavigation systems and is described in detail in section
3.2.2. Knowing a set of matched pair-points (we use 5 pair-points; the minimum is 3)
between the actual patient (phantom) and the tomographic image space, an iterative
algorithm to optimize the transformation parameters can produce the needed
transformation. The Levenberg-Marquardt optimization method has been shown to
provide fast convergence, and we have used this algorithm in our implementation of
pair-point matching. In the next section, details of the method (based on the pattern-
based estimation of the extrinsic camera parameters) for the essential
computation of the transformation $T_{EE-C}$ will be provided. This is the key
transformation because it determines exactly how the end-effector of the robot (a
measured entity) is related to the mounted camera system.
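A minimal C++ sketch of the transform composition in Equation 4-2 is shown below; the matrices are hypothetical placeholders and the 4×4 matrix product is written out explicitly.

#include <cstdio>

// Sketch of Equation 4-2: the object-to-camera transform is the product of the
// object-to-base, base-to-end-effector (measured by the robot) and
// end-effector-to-camera transforms. All matrices here are hypothetical.
void matMul4(const double A[4][4], const double B[4][4], double C[4][4]) {
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            C[i][j] = 0.0;
            for (int k = 0; k < 4; ++k) C[i][j] += A[i][k] * B[k][j];
        }
}

int main() {
    double T_OB[4][4]  = {{1,0,0,0},{0,1,0,0},{0,0,1,0},{0,0,0,1}};
    double T_BEE[4][4] = {{1,0,0,100},{0,1,0,0},{0,0,1,50},{0,0,0,1}};
    double T_EEC[4][4] = {{1,0,0,0},{0,1,0,20},{0,0,1,0},{0,0,0,1}};

    double tmp[4][4], T_OC[4][4];
    matMul4(T_OB, T_BEE, tmp);
    matMul4(tmp, T_EEC, T_OC);       // T_O-C = T_O-B * T_B-EE * T_EE-C
    std::printf("translation of T_O-C: %.0f %.0f %.0f\n",
                T_OC[0][3], T_OC[1][3], T_OC[2][3]);
    return 0;
}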
4.2.3 Computing the Pose of the Camera Relative to the End-Effector ($T_{EE-C}$)
In some AR systems, the transformation from the tip of the tracking device to
the CCD array of the camera is determined by manual methods in a very non-
systematic way. In these systems, the real object is placed in view of the camera and
an augmentation is done. The model is then manually adjusted until it is aligned.
Then different views are chosen and the steps are repeated until it appears to be
correct. These methods do not achieve robust results, with registration occurring at only a
subset of locations and orientations. This transformation can also be measured;
however, the results are again not robust (Azuma 1997).
Figure 4-5 illustrates the different coordinate transformations that need to be
performed. In this work, we take advantage of the estimation procedure to calculate
the transformation matrix from the pattern to the camera coordinate system ($T_{P-C}$).
First, the pattern used for camera calibration is rigidly fixed relative to the base of the
Microscribe. Then the location and orientation of the pattern's coordinate system
(relative to the robot's base) is calculated using three points collected on the
pattern (digitized with the Microscribe): the first point ($P_0$) as the origin of the
coordinate system, the second point ($P_x$) a point along the x-axis, and the third
point ($P_y$) a point along the y-axis of the pattern coordinate system. From these points,
the unit vectors in the x and y directions ($V_x$ and $V_y$) are computed as follows:
$V_x = \dfrac{P_x - P_0}{|P_x - P_0|}$ (Equation 4-3)

$V_y = \dfrac{P_y - P_0}{|P_y - P_0|}$ (Equation 4-4)
The vector in the z-direction was computed simply as the following cross
product.
$V_z = V_x \times V_y$ (Equation 4-5)
These three vectors ($V_x$, $V_y$, $V_z$), along with the point $P_0$, form the 4×4
homogeneous transformation matrix from the base coordinate system to the pattern
coordinate system ($T_{B-P}$) in the following way:
$T_{B-P} = [V_x, V_y, V_z, P_0]$ (Equation 4-6)
Figure 4-5: The transformations needed to compute an AR scene. The main transform ($T_{EE-C}$, the transform from the end-effector to the camera coordinate system) is the primary transformation matrix computed for AR.
The above transformation matrix ($T_{B-P}$) can be described in terms of the
projection of the unit vectors of the pattern coordinate system onto the base
coordinate system and a translation vector from the origin of the base to the origin of the
pattern coordinate system. Knowing this transformation from the base of the robot to
the pattern ($T_{B-P}$), knowing also the transformation from the base of the robot to the
end-effector ($T_{B-EE}$) (which is measured by the robot), and also knowing the
transformation from the pattern to the camera coordinates (to be covered later, $T_{P-C}$),
the needed transform from the end-effector of the robot to the camera ($T_{EE-C}$) can be
derived with Equation 4-7. How to compute the transformation from the pattern to the
camera is covered in detail in the next section.
$T_{EE-C} = T_{B-EE}^{-1} \times T_{B-P} \times T_{P-C}$ (Equation 4-7)
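The following C++ sketch illustrates Equations 4-3 through 4-6 with hypothetical digitized points: the pattern frame is assembled from the three points and packed into the homogeneous matrix $T_{B-P}$, which (together with the measured $T_{B-EE}$ and the calibrated $T_{P-C}$) feeds Equation 4-7.

#include <cmath>
#include <cstdio>

// Sketch of Equations 4-3 to 4-6: the pattern coordinate frame is built from
// three digitized points (origin, a point on the x axis, a point on the y axis).
struct Vec3 { double x, y, z; };

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 normalize(Vec3 v) {
    double n = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return {v.x / n, v.y / n, v.z / n};
}
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

int main() {
    Vec3 P0 = {100, 50, 20}, Px = {180, 52, 21}, Py = {101, 130, 19};  // hypothetical digitized points
    Vec3 Vx = normalize(sub(Px, P0));      // Equation 4-3
    Vec3 Vy = normalize(sub(Py, P0));      // Equation 4-4
    Vec3 Vz = cross(Vx, Vy);               // Equation 4-5
    // T_B-P = [Vx Vy Vz P0] as a 4x4 homogeneous matrix (Equation 4-6).
    double T_BP[4][4] = {{Vx.x, Vy.x, Vz.x, P0.x},
                         {Vx.y, Vy.y, Vz.y, P0.y},
                         {Vx.z, Vy.z, Vz.z, P0.z},
                         {0,    0,    0,    1  }};
    std::printf("z axis of pattern frame: %.3f %.3f %.3f\n", Vz.x, Vz.y, Vz.z);
    (void)T_BP;
    return 0;
}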
4.2.4 Camera Calibration Used to Determine $T_{P-C}$
Camera calibration is a very important issue for AR and hence is described
briefly here for clarity (Tsai 1987). The camera calibration algorithm is used in this
research to (1) model the actual camera to be used for graphical view generation and
(2) estimate the transformation matrix between the end-effector and the camera
coordinates by using this technology to compute the transformation $T_{P-C}$.
Figure 4-6 represents a camera model and the associated variables for the
estimation of the needed parameters.
Camera parameters are divided into intrinsic parameters (focal length, principal point, skew
coefficient, and the radial/tangential distortions) and extrinsic parameters (the position
and orientation of the camera). The extrinsic parameter matrix ($T_{wc}$), represented in
Equation 4-8 by the matrix of r and t parameters, is needed to transform objects from
the world-centered to the camera-centered coordinate system.
Figure 4-6 Camera Calibration Model. Objects in the World Coordinate System need to be transformed using two sets of parameters - extrinsic and intrinsic parameters.
$P_c = T_{wc} \times P_w : \quad \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} = \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix} \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix}$ (Equation 4-8)
The intrinsic and the extrinsic parameters are derived from a two-step process
that first involves a closed-form (pinhole-model) solution to approximate the
parameters and then an iterative non-linear solution to obtain accurate parameters. In
a pinhole model of the camera (which ignores the radial and tangential distortion
parameters), each point in the world coordinate system is projected via a straight line
through the projection center to the image plane. This model of the camera system
only approximates the real camera projection, as follows:
$\begin{bmatrix} u \\ v \end{bmatrix} = \dfrac{f}{z_c} \begin{bmatrix} x_c \\ y_c \end{bmatrix}$ (Equation 4-9)
Note here that u and v are the coordinates of the projection of the
imaged/viewed point on the camera CCD.
By using the pinhole model, the initial estimates for the minimization process to
be described can be provided by using a direct linear transformation (DLT). The DLT
was first described by Abdel-Aziz and Karara (Abdel-Aziz Y. I. 1971). In the first step,
given N control points in the world coordinate system $(x_w, y_w, z_w)$ and their
corresponding CCD array points $(u_i, v_i)$, the camera parameters of interest in the vector a
can be computed by solving the following equations:
$M = \begin{bmatrix} x_1 & y_1 & z_1 & 1 & 0 & 0 & 0 & 0 & -u_1 x_1 & -u_1 y_1 & -u_1 z_1 & -u_1 \\ 0 & 0 & 0 & 0 & x_1 & y_1 & z_1 & 1 & -v_1 x_1 & -v_1 y_1 & -v_1 z_1 & -v_1 \\ \vdots & & & & & & & & & & & \vdots \\ x_n & y_n & z_n & 1 & 0 & 0 & 0 & 0 & -u_n x_n & -u_n y_n & -u_n z_n & -u_n \\ 0 & 0 & 0 & 0 & x_n & y_n & z_n & 1 & -v_n x_n & -v_n y_n & -v_n z_n & -v_n \end{bmatrix}$ (Equation 4-10)
where $Ma = 0$ and $a = [\,r_{11}\ r_{12}\ r_{13}\ t_1\ r_{21}\ r_{22}\ r_{23}\ t_2\ r_{31}\ r_{32}\ r_{33}\ t_3\,]^T$ (Equation 4-11)
The parameter vector a can be estimated with a least-squares method, since
Equation 4-11 is homogeneous (the right-hand side is a zero vector). Note that because this solution is
based on a pinhole model, lens distortion is not considered. This step only gives an
initial guess for the iterative nonlinear optimization method discussed next.
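One standard way to solve the homogeneous system of Equations 4-10 and 4-11 (a common choice, not necessarily the exact routine used here) is via the singular value decomposition of M: the least-squares solution under the constraint $\|a\| = 1$ is the right singular vector associated with the smallest singular value,

$a^{*} = \arg\min_{\|a\|=1}\|Ma\|^{2} = v_{12}, \qquad M = U\,\Sigma\,V^{T},\quad V = [\,v_1 \cdots v_{12}\,],\quad \sigma_1 \ge \cdots \ge \sigma_{12}.$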
A brief overview of the typical non-linear optimization/calibration process is given
here. Let $(x_w, y_w, z_w)$ represent the coordinates of any visible point P in a fixed
coordinate system (the world coordinate system) and let $(x_c, y_c, z_c)$ represent the
coordinates of the same point in a camera-centered coordinate system. It is assumed
that the origin of the camera-centered coordinate system coincides with the
optical center of the camera and that the $z_c$ axis coincides with its optical axis. An
extension to the standard calibration methods that does not make this simplification is provided in our conference paper (Siadat M. 2002). The image
plane, which corresponds to the image-sensing array, is assumed to be parallel to the
$(x_c, y_c)$ plane at a distance f from the origin (see Figure 4-6). Tsai's model (Tsai 1987)
governs the relationship between a point in the world space $(x_w, y_w, z_w)$ and its
projection on the camera CCD $(r_i, c_i)$. R (the rotation component) is a 3×3 rotation
matrix with elements $r_{i,j}$ defining the camera orientation, and $(t_1, t_2, t_3)$ is the translation component.
The following sets of equations represent the camera model that is used. Here, the
translation and rotation components (extrinsic parameters) and the intrinsic
parameters combine to form the model of the camera system (see Figure 4-6) as
follows:
$u = f\,x_c/z_c, \quad v = f\,y_c/z_c, \quad r - r_0 = s_u u, \quad c - c_0 = s_v v$ (Equation 4-12)

$\dfrac{r_{1,1}x_w + r_{1,2}y_w + r_{1,3}z_w + t_1}{r_{3,1}x_w + r_{3,2}y_w + r_{3,3}z_w + t_3} = \hat{u} + (g_1 + g_3)\hat{u}^2 + g_4\hat{u}\hat{v} + g_1\hat{v}^2 + k_1\hat{u}(\hat{u}^2 + \hat{v}^2)$ (Equation 4-13)

$\dfrac{r_{2,1}x_w + r_{2,2}y_w + r_{2,3}z_w + t_2}{r_{3,1}x_w + r_{3,2}y_w + r_{3,3}z_w + t_3} = \hat{v} + g_2\hat{u}^2 + g_3\hat{u}\hat{v} + (g_2 + g_4)\hat{v}^2 + k_1\hat{v}(\hat{u}^2 + \hat{v}^2)$ (Equation 4-14)
where $\hat{u} = (r - r_0)/f_u$, $\hat{v} = (c - c_0)/f_v$, $f_u > 0$, $f_v < 0$, and $g_1, \ldots, g_4$ and $k_1$ are the tangential
and radial distortion coefficients.
These equations need to be formulated into an objective function so that finding
the optimum of the function leads us to the camera parameters. The camera
parameters are $r_0$, $c_0$ (image center), $f_u$, $f_v$ (focal lengths), and the radial and tangential
distortions (all intrinsic parameters), together with $T_{wc}$ (the extrinsic parameters). Here, m is
Tsai's model of the distortion-free camera, $(\hat{r}_i, \hat{c}_i)$ is our observation of the projection of the
i-th point on the CCD, and $(r_i(m), c_i(m))$ is its estimate based on the current estimate
of the camera model. The objective function below is a linear minimum-variance
estimator (Press W.H. 1992). Note that we have n observed points in the world space.
Our objective function is as follows:
$\chi^2 = \sum_{i=1}^{n} \left\{ \left[\hat{r}_i - r_i(m)\right]^2 + \left[\hat{c}_i - c_i(m)\right]^2 \right\}$ (Equation 4-15)
Figure 4-7 illustrates the optimization of these parameters. n pairs of points are
collected for which the world coordinates $(x_w, y_w, z_w)$ and the CCD coordinates $(r_i, c_i)$
are known. An initial guess of the extrinsic parameters is provided by the DLT method
described above (Equations 4-10, 4-11). After the initial guess, the camera model
parameters (a) are used to compute model-estimated values of (r, c) for the entire
data set (Equations 4-12 to 4-14). These values are then compared with the
measured values using the objective function (Equation 4-15). If a certain tolerance is
met, the final parameters are reported. The Levenberg-Marquardt optimization
method has been shown to provide fast convergence. It works very well in practice
and is considered the standard for non-linear least-squares routines (Press W.H.
1992). Note here that the extrinsic parameters estimated here (R and T) form
the matrix needed in Equation 4-7, that is, $T_{P-C}$. The entire camera calibration
procedure just described is used to compute this very important matrix. Once this
matrix is computed, it allows the computation of the transformation from the end-
effector of the robot to the camera coordinate system. More details of the camera
calibration technology can be found in the publication (Heikkila 2000).
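As a small illustration (with hypothetical numbers), the C++ sketch below evaluates the reprojection objective of Equation 4-15 for a set of observed and model-predicted CCD coordinates; in practice the predicted values would come from Equations 4-12 to 4-14.

#include <cstddef>
#include <cstdio>
#include <vector>

// Sum of squared differences between observed CCD coordinates (r_i, c_i) and
// the coordinates predicted by the current camera model (Equation 4-15).
struct Pixel { double r, c; };

double chi2(const std::vector<Pixel>& observed, const std::vector<Pixel>& predicted) {
    double sum = 0.0;
    for (std::size_t i = 0; i < observed.size(); ++i) {
        double dr = observed[i].r - predicted[i].r;
        double dc = observed[i].c - predicted[i].c;
        sum += dr * dr + dc * dc;
    }
    return sum;
}

int main() {
    std::vector<Pixel> observed  = {{120.4, 88.1}, {241.0, 190.7}};  // hypothetical CCD observations
    std::vector<Pixel> predicted = {{120.0, 88.5}, {240.6, 191.0}};  // model predictions
    std::printf("chi^2 = %.3f\n", chi2(observed, predicted));
    return 0;
}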
Figure 4-7: Camera Parameter Estimation. An initial guess of the extrinsic parameters comes from the DLT method. The observed CCD array points and the corresponding computed values are compared to determine if they are within a certain tolerance. If so, the iteration ends
4.2.5 How to Measure AR Accuracy
There are many components of error observed in an AR scene.
In this thesis, we have tried to explain where the errors of the observed AR scene
come from. The sources we have considered include
the complexities of camera calibration, the accuracy of tracking, and the imaging/3D
segmentation. In order to measure the accuracy of the AR system with imaging data,
we first needed to evaluate the error of the AR portion of the technology. In order to
isolate the error of the AR system from the errors inherent in imaging and
segmentation, we first chose a method that did not involve either the imaging or the
segmentation components. We then performed a similar experiment using a phantom
skull imaged with CT data. For the non-imaging test, we used a stereotactic phantom
ring for the evaluation of the error of the system. This ring is equipped with a
moveable pointer for which the 3D position can be accurately read. The phantom is
known to be mechanically very accurate (within 0.5 millimeter) and has been used as
a gold standard in several of our previous accuracy studies that measured the
accuracy of robotic devices and also of standard stereotactic systems (Li Q. 2002).
This device allowed us to pinpoint markers in the video image using the stereotactic
phantom's 3D position pointer. We first chose several points on the surface of the
ring for which we knew the exact location relative to the ZD ring's coordinate system.
These were the points used for the pair-point matching in order to
determine the pose of the ring relative to the Microscribe's base. We then created a
model of a cube with known dimensions and placed it at the center of the phantom
space. We first placed the phantom's pointer at the known location of each of the
corners of the cube. We then viewed the corner of each of the cube points from
orthogonal view directions to determine the error in each individual axis. In order to
compute the error for each point, we needed to know the distance from the apex of
the pointer to the corner of the virtual cube in all three axes. Applying the
distance equation (Equation 4-16 below) to these distances gives the error for that point.
However, we took three orthogonal views of the cube corners, and each view
contained two of the error measures. For instance, a view from the x direction gives
the errors in both the y and z axes. In order to derive the needed equation
(Equation 4-16), we used all three orthogonal views that we captured and determined
the two deltas for each view (See Figure 4-9). The distance error, with the camera
positioned in turn on each of the orthogonal axes (x, y, and z), was computed as
follows:
$d(error) = \sqrt{\Delta x^2 + \Delta y^2 + \Delta z^2}$ (Equation 4-16)
It is clear from these equations that the measured view errors reduce to Equation
4-16 after taking into consideration all three views of the cube corner. Note that each
axis is counted twice, so once the division by 2 is performed, the actual distance error can
be computed as follows:
$\Delta_x^2 = \Delta y^2 + \Delta z^2;\quad \Delta_y^2 = \Delta x^2 + \Delta z^2;\quad \Delta_z^2 = \Delta x^2 + \Delta y^2$ (Equation 4-17)
$d(error) = \sqrt{(\Delta_x^2 + \Delta_y^2 + \Delta_z^2)/2}$ (Equation 4-18)
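A minimal worked example of Equations 4-16 to 4-18 is sketched below in C++ with hypothetical view measurements; each orthogonal view contributes two in-plane error components, so the three views together count every axis twice.

#include <cmath>
#include <cstdio>

// Hypothetical 2D error components measured in each orthogonal view.
int main() {
    double xView[2] = {0.8, 1.0};   // errors in y and z seen from the x view
    double yView[2] = {0.6, 1.0};   // errors in x and z seen from the y view
    double zView[2] = {0.6, 0.8};   // errors in x and y seen from the z view
    double sumSq = xView[0]*xView[0] + xView[1]*xView[1]
                 + yView[0]*yView[0] + yView[1]*yView[1]
                 + zView[0]*zView[0] + zView[1]*zView[1];
    double error = std::sqrt(sumSq / 2.0);   // Equation 4-18
    std::printf("3D error = %.2f mm\n", error);
    return 0;
}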
Since the orthogonality of the view direction was critical, we positioned the
camera until the axis orthogonal to the image plane of the cube disappeared. In
Figure 4-9, panel D, the robot arm is shown positioned to view the ZD arc's pointer. The
actual views from the camera are then augmented with a virtual cube positioned at a
known location. The apex of the pointer is then positioned at the known location of a
particular corner of the cube being viewed. If the cube is augmented at the correct
position, the apex of the pointer will point at the corner that is being viewed. The
captured video frames were analyzed to ascertain the pixel error observed between
where the actual cube corners should be located (on the apex of the pointer) and
where they are in the augmentation (cube corner). The error is displayed in the view
as black dotted lines. This distance represents the 2D error from that view (See
Figure 4-9).
The deviation can be easily seen and measured very accurately, both in terms of
pixel error and in actual mm error when scaled from a known distance in the image. The
cylindrical geometry of the ZD frame pointer provides us with a scaling reference to
transfer from the pixel-wise error measurement to an absolute-value error space.
The diameter of the pointer is exactly 5mm; hence, a pixel error can easily be
converted to a distance measure by a computed scale factor. Several different-sized
cubes were constructed to represent the operational volume of the AR system.
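As a small worked example (with hypothetical pixel values), the conversion from pixel error to millimetres using the 5 mm pointer diameter could look as follows:

#include <cstdio>

// Pixel-to-millimetre conversion sketch: the ZD pointer is 5 mm in diameter,
// so its apparent width in pixels gives a scale factor for that view.
int main() {
    const double pointerDiameterMM = 5.0;
    double pointerDiameterPx = 38.0;      // hypothetical measured width of the pointer in the frame
    double observedErrorPx   = 12.5;      // hypothetical 2D offset between apex and cube corner
    double scale = pointerDiameterMM / pointerDiameterPx;   // mm per pixel
    std::printf("error = %.2f mm\n", observedErrorPx * scale);
    return 0;
}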
Figure 4-9: A cube is augmented on the live video from the Microscribe. Three orthogonal views are used to compute the error: (A) represents a close-up view of the pointer (the known location of the cube corner) with the video camera on the x axis, (B) the pointer viewed from the y axis, and (C) the pointer viewed from the z axis. (D) represents an oblique view of the scene showing the pointer, the camera, and the video view on the monitor with the cube superimposed.
Figure 4-8: The ZD ring. This device is a
(A) X View (B) Y View
(C) Z View (D) Oblique
Y & Z Error X & Z Error
X & Y Error
4.2.6 Software Architecture
The major subsystems of the AR software implementation are shown in Figure
4-10. Here, the server software supplies each client (AR or IGS) on the network with
position and orientation information. The camera system (which is mounted on the
robotic device) provides the video stream for both the AR clients and the
camera calibration. The server software as well as the registration software remains
identical to the IGS system described in the previous chapter. It is in theory possible
to supply the video stream data to any AR client on the network and have the AR view
displayed remotely. This feature is not fully implemented for this thesis, as it is more
related to telepresence and is beyond the scope of this work. It is possible, however,
to have both an IGS scene and an AR scene generated together in near real-time
while serving multiple IGS clients on the network. The AR system architecture is
similar to the one described in Chapter 3 for the IGS system, except that it involves a new
AR client and additional camera information.
Figure 4-10: Software implementation of the AR system with the Image Guidance System. Here, the kinematic server supplies position and orientation to both the AR and IGS clients. Each client has its own display models preloaded. With this software architecture, simultaneous AR and IGS are possible.
4.3 AR SYSTEM ACCURACY
This section provides an in-depth review of the errors involved in AR
technology. Error is of critical importance in the medical field, and especially in
neurosurgery, where a few millimeters of error can impair a patient. We dissect the
observed error into its components in an attempt to identify avenues of error
improvement for future systems.
There are two types of errors in AR: static and dynamic. Static errors, as the
name implies, are the errors that exist in the augmentation even if the user/robot
is still. In any AR system, there are at least four different types of static errors:
optical distortion, errors in tracking systems, mechanical misalignments, and incorrect
field of view or camera parameters (Azuma R. 1997). We have considered each type
of error in this study. Optical distortion is part of every camera system; radial
distortion has the largest component. It is dependent on the radial distance from the
optical axis, and hence wide field-of-view systems are more prone to this type of error
(Heikkila 2000). Errors that are the result of the tracking hardware are often the
principal problem. Because the tracking system is used not only for tracking, but
also for the calibration of the camera and (in the case of pair-point registration) for
determining the position of the objects in space, tracking errors are of significant
importance.
Dynamic errors occur due to a lag in the system's ability to compute and fuse
the required frames. Generally, the computation is not real-time, and for very dynamic
movements the system exhibits a lag between the real scene and the computed
scene. This type of error is not considered here, as it is a function of hardware
speeds. For medical applications, generally, surgeons do not move very fast, and
the update rates that we observe are adequate. In this section, we first describe the
accuracy of the Microscribe system and the accuracy of the camera calibration
procedure, and then discuss the contribution of all the individual errors to the
application error of the entire prototype.
4.3.1 Accuracy of the Microscribe
In order to gauge the accuracy of the Augmented Reality scene, we first
determined the accuracy of the measurement device, the Microscribe. First, 40
measurements were taken of the same point using the Microscribe. It was found that
the average rms difference between all these points was 0.732mm. Another
experiment, in which a known 50mm distance was measured ten times using the
Microscribe, was also performed. The average error for this measurement was 0.841 mm.
Although a more rigorous accuracy study could be conducted, we are confident that
the error of the Microscribe is within 1mm.
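A minimal sketch of the repeatability computation (with hypothetical sample values, not the measured data) is shown below: the repeated digitizations are compared against their mean and the RMS deviation is reported.

#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    // Hypothetical repeated digitizations of the same physical point (mm).
    std::vector<double> x = {10.1, 10.6, 9.8, 10.3}, y = {5.0, 5.4, 4.9, 5.1}, z = {2.2, 1.9, 2.4, 2.1};
    double mx = 0, my = 0, mz = 0;
    for (std::size_t i = 0; i < x.size(); ++i) { mx += x[i]; my += y[i]; mz += z[i]; }
    mx /= x.size(); my /= y.size(); mz /= z.size();
    double sumSq = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        double dx = x[i] - mx, dy = y[i] - my, dz = z[i] - mz;
        sumSq += dx * dx + dy * dy + dz * dz;       // squared distance from the mean point
    }
    std::printf("rms deviation = %.3f mm\n", std::sqrt(sumSq / x.size()));
    return 0;
}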
4.3.2 Accuracy of Camera Calibration
In camera calibration, a set of parameters is defined which serve to model the
optics of the actual camera. These measures are derived by the minimization of error
as described in the Methods section. For the camera, we used the set of computed parameter values shown in Table 4-1. These parameters allow for the
correction of various image distortions that occur with this camera. The major
parameters of this camera model are the focal length, principal point, and the
radial/tangential distortions. After using the modeled values, the image can be
regenerated to produce the undistorted image. When this undistorted image is
compared to the ideal image (based on the grid pattern), a pixel error can be
computed for the images. The average pixel error is 0.350 pixels along the x axis and 0.356 pixels along the y axis. Figure 4-11 below illustrates the errors of the distorted image. The contour lines in the images represent the error boundaries, in pixels, at each location. Notice that for the radial distortion (left panel of Figure 4-11) the errors near the center of the image are all less than 5 pixels and converge to 0 pixels at the very center, while the errors at the corners of the image are in the range of 25 pixels. The tangential components (right panel of Figure 4-11) are in the range of about 1 pixel and play a minor role relative to the radial distortion. Hence, the overall camera calibration errors are primarily due to radial distortion.
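For illustration, the sketch below shows how calibration coefficients of the kind listed in Table 4-1 are typically applied to undistort an image, here using OpenCV's pinhole camera model. The principal point and file names are placeholders, and the thesis prototype used its own calibration code rather than this particular library.

```python
# Sketch: correcting radial and tangential distortion with OpenCV, using
# placeholder values patterned after Table 4-1. Illustrative only.
import cv2
import numpy as np

fx, fy = 704.643, 708.079        # focal length in pixels
cx, cy = 320.0, 240.0            # principal point (placeholder: image center)
k1, k2 = -0.288000, -0.003537    # radial distortion coefficients
p1, p2 = -0.0003991, 0.001116    # tangential distortion coefficients

K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])
dist = np.array([k1, k2, p1, p2, 0.0])    # OpenCV order: k1, k2, p1, p2, k3

img = cv2.imread("frame.png")             # hypothetical captured video frame
undistorted = cv2.undistort(img, K, dist)
cv2.imwrite("frame_undistorted.png", undistorted)
```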
Figure 4-11: The errors of the distorted image. The contours represent error boundaries. (Left) For the radial distortion, there are less than 5 pixels of error at the center, and the errors exceed 25 pixels at the corners. (Right) The tangential distortion is an order of magnitude less than the radial distortion.
Table 4-1: Example of camera calibration coefficients
Focal Length: (704.643, 708.079) +/- [10.52, 15.94]
Principal Point: 0 +/- 0
Radial Coefficients: (-0.288000, -0.003537) +/- [0.0616, 0.4255, 0]
Tangential Coefficients: (-0.0003991, 0.001116) +/- [0.005146, 0.002516]
Pixel Error: (0.3500, 0.3562) +/- [16.72, 13.55]
4.3.3 Total Application Error Dependencies
The total error of the system was computed both with and without using image data. First, an error measurement was made without using imaging data. For this method, the average error over the 10 measurements taken across the field of view of the camera and covering the maximum extent of the phantom was 2.74 mm, with a standard deviation of 0.81 mm and a maximum error of 4.05 mm. In order to ascertain the additional error introduced by including the imaging data, another experiment was performed in which 10 fiducials visible in the CT scan were digitized and their locations recorded. One-millimeter spheres were then modeled and placed on top of each of the markers at the location predicted by the augmentation. The pixel error as well as the scaled millimeter error was computed for each of the fiducials. The measured error of the location of the fiducials in the AR scene was on average 2.75 mm, with a standard deviation of 1.19 mm and a maximum error of 5.18 mm. The effect of adding the CT scan data was negligible, although a slightly higher standard deviation and maximum error were observed.
Figure 4-12 below illustrates the individual component errors involved and their dependencies and contributions to the overall application error. The total Augmented Reality error depends on several different types of error. The tracker (Microscribe) is a key source of error, and it influences several other error estimates (camera pose determination and registration error). Registration error (the error involved in determining the transformation from the base of the tracker to the objects being augmented) depends on how accurately the actual points on the patient can be captured. In this method, actual points are captured and matched with their image counterparts; hence, digitizer accuracy is an important factor. In addition, registration error also involves human digitization error, which reflects the fact that the human operator can only reach the actual point with an accuracy of 0.43 mm on average.
We ascertained this error by digitizing the same point from several different angles
(10 different values) and report the variation in the measured values. In addition, the
center of a typical fiducial is not represented by a single point, but by a circle about
0.5mm in diameter. This geometry can also lead to digitization errors. A better
fiducial design would be to have the center of the fiducial be a cone (rather than a
cylinder) that culminates to a single point. Furthermore, the computation of the
camera pose relative to the end-effector of the tracker requires the use of
digitized/tracked data also. In order to determine the camera pose relative to the end-
effector of the microscribe, we used a method that required both the extrinsic camera
parameters to be computed using the camera calibration techniques and also the
tracker end-effector transformation information (See the Methods section). Since it
was not possible to measure the actual camera position and orientation with the
precision required, we computed the pose matrix from 5 different camera angles and
report the standard deviation of the measured values. The homogeneous transformation matrix from the tracker end-effector to the camera, T_EE-C, was first decomposed into Euler angles and a position vector. For the position component, the values had standard deviations of 0.13, 0.30 and 0.13 mm in the x, y and z axes. Similarly, the x, y, z (Euler) angle standard deviations were 0.83, 1.25 and 0.44 degrees. These values indicate an RMS positional standard deviation of about 0.35 mm. Although the RMS error for the angle measurements exceeds 1.5 degrees, this represents a spread of error over all the computations. We cycle through the various computations and select the one that produces the best AR result. Minor adjustments to the models can then be made to correct residual errors within the viewing volume if necessary. For camera calibration, there was a pixel error of approximately 0.35 pixels, which corresponds to approximately 0.21 mm of error attributable to the
camera calibration. Imaging data plays a critical role in the accuracy; however, we have found no significant distortion error in the CT we have taken. We conjecture that MRI or ultrasound data will have more distortion and would have a greater influence on the application accuracy. We have accounted for the majority (2.43 mm of the total 2.75 mm) of the observed error, the 2.43 mm figure corresponding to the sum of the tracker, object registration, camera pose and camera calibration components shown in Figure 4-12. It is possible that the imaging data has played a role in producing some of the unaccounted error of the system.
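The decomposition described above can be illustrated with a short sketch (not the thesis code) that converts each estimated end-effector-to-camera transform into a translation vector and x-y-z Euler angles and then reports the per-axis standard deviations across the repeated estimates; the 4x4 matrices below are placeholders.

```python
# Sketch: decompose repeated 4x4 pose estimates (T_EE-C) into translation and
# Euler angles, and report the spread across estimates. Placeholder data.
import numpy as np
from scipy.spatial.transform import Rotation as R

def decompose(T):
    """4x4 homogeneous transform -> (translation, x-y-z Euler angles in deg)."""
    translation = T[:3, 3]
    euler_deg = R.from_matrix(T[:3, :3]).as_euler("xyz", degrees=True)
    return translation, euler_deg

def pose_spread(T_list):
    translations, eulers = zip(*(decompose(T) for T in T_list))
    return np.std(translations, axis=0), np.std(eulers, axis=0)

# Hypothetical: five pose estimates obtained from five different camera angles.
T_estimates = [np.eye(4) for _ in range(5)]
t_std_mm, euler_std_deg = pose_spread(T_estimates)
print("translation std (mm):", t_std_mm, " Euler angle std (deg):", euler_std_deg)
```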
Figure 4-12: Errors contributing to the total error involved in Augmented Reality. [Diagram summary: Total AR error (avg 2.75 mm) comprises Tracker error (avg 0.87 mm), Object Registration error (avg 1 mm), Human Digitization error (avg 0.43 mm), Camera Pose error (RMS 0.35 mm), Camera Calibration error (0.21 mm), and Imaging data (2 mm CT slices, no apparent error).]
4.4 DISCUSSION
Currently, neuronavigation systems provide primarily three 2-D views (coronal, axial and sagittal) to give the surgeon awareness of the patient geometry. The surgeon has to perform the 2D (image) to 3D transformation mentally and also project the envisioned data onto the view of the patient. We believe that Augmented Reality (AR)
generation is a natural extension for the surgeon because it does both the 2D to 3D transformation and projects the views directly onto the patient view. We have already illustrated the difficulty of interpreting 2D slices. In Figure 1-2, a simple 3D shape like a cube is represented by a triangle in the coronal slice, and a vessel is represented by two dots in the axial view. This does not seem like a natural method of visualization. This was part of the motivation for this work.
This chapter focused on the prototype development and accuracy evaluation of a medical robotic Augmented Reality system. We used a passive articulated arm (Microscribe) to track a calibrated end-effector-mounted video camera. In our prototype, the user, after registration, is able to navigate in and around a phantom and, in real time, to visualize the objects of interest highlighted with wire-frame models from any angle and distance. In this case, we superimpose the live video view with the synchronized graphical view of CT-derived segmented object(s) of interest within a phantom skull. Since the accuracy of such a system is of critical importance for medical use, it was considered in detail. It is to be noted that the errors
here represent static error. Errors due to brain or organ shifting are not considered.
Intraoperative imaging methods (Open MRI or ultrasound) need to be integrated for
dynamic changes during surgery. We analyze the individual contributions and system
accuracy of the prototype. The AR accuracy mostly depends on the accuracy of: (1)
tracking technology (2) camera calibration (3) the image scanning device (e.g. CT or
MRI scanner) and (4) the object segmentation. The various accuracy measurements
are shown in Figure 4-12. After using data from a 2mm thickness CT scan the AR
error was measured at 2.75mm with a max error of 5.2mm. This error is on the
borderline of acceptability for neurosurgery applications (one of the most demanding
in terms of accuracy requirements).
The overall accuracy of the system can be improved in several ways. A higher
fidelity robotic tracking system could be used. The tracking device in this case
contributes about one third of the error because it is involved not only in the tracking
of the camera, but also for the registration of object space relative to its base. In the
registration phase, at least 4 points have to be digitized using the tracker. The
process of camera calibration can also be improved by choosing a more accurate
algorithm, e.g., our proposed method (Siadat M. 2002). From this study, we have
built a prototype that approaches the error requirements imposed by Neurosurgery.
We have also given details of the implementation of this prototype such that it can be
recreated. We conjecture that medical robotic devices of the future should be able to
use this technology to directly link these systems to patient data and provide the
optimal visualization of that data for the surgical team. The design and methods of
this prototype device can also be extrapolated for current medical robotics systems
and to Neuronavigation systems.
For neurosurgery the acceptable error is approximately 2-3mm. Our prototype
approaches the accuracy requirements. The accuracy can be improved with a higher
fidelity robotic tracking system and improved calibration and object registration. The
design and methods of this prototype device can be extrapolated for current medical
robotics and neuronavigation systems. It has already been translated for a space
station robotics application.
For the purposes of this thesis, the implementation of the AR system is
adequate for the main task of comparing the relative advantages of this new
technology with image guidance (developed in the previous chapter). The next
chapters will show the details of our Human Factors studies.
Chapter 5: Surgeon Factor Testing
"Nothing succeeds like a good display." Don Norman
It is not enough to develop new technology. The technology must be verified in
terms of accuracy (especially in the medical domain) and it must also be tested with
end-user/subject testing to verify if the new technology positively affects the
performance of the user (Drascic and Milgram 1996;Thompson et al. 1998;Walsh and
Beatty 2002;Weinger et al. 1998). In this phase of the thesis, there are two different,
but related Human Factors tests that were performed to understand if the developed
visualization technologies improve the performance of the surgeon. The two tests were
(1) a test to determine whether an Augmented Reality display interface to surgery increases performance and decreases the error rates of surgeons as compared to an Image Guided Surgery system display, and (2) a comparison of display hardware for the video stream that is being viewed from the remote site. In order to perform the first
comparison, a state-of-the-art image guidance system (as described in chapter 3) and
an Augmented Reality system (as described in chapter 4) were both developed using
the same hardware platform. Subject testing was performed to compare the two
systems. The second study is related to the method of display of the remote video data
to the end-user. The main question is: Does visualization of the remote video at the
surgical site by a Head Mounted Display improve the performance of the test subject
over viewing a monitor? The current robot vendors provide three variations for the
surgeon interface (See Figure 5-1). In the daVinci Medical Robotic System, the
surgeon is completely immersed in his world (middle panel of Figure 5-1). Here, the surgeon must sit at a terminal and place his face inside the display device to get a stereoscopic view of the remote surgical site. In the Zeus system by Computer Motion, the surgeon views the remote surgery using a stereoscopic monitor, or can choose a heads-up display (HUD), where the surgeon can glance up to see her remote view (without moving her head) and can also see her worksite and hands from under the monitors of the HUD. Hence, whether a HUD offers any advantages over a monitor for viewing a remote site is an important question for robotic surgery (Drascic et al. 1989). In the next two sections, the two studies (AR vs. IGS) and (HUD vs. Monitor
viewing) will be presented in detail.
Figure 5-1: Surgeon display options (panels: heads-up display, immersive console, monitor). Computer Motion (Zeus robot) provides either a heads-up display view of the surgical site or a monitor view; the daVinci robot provides an immersive stereoscopic view of the remote video. Which configuration provides the best performance?
5.1 Image Guided Surgery vs. Augmented Reality—The Human Factors.
In this section, details of our Human Factors testing comparing the current state-of-the-art technology (Image Guided Surgery) with our vision of the technology's future (Augmented Reality) are explained. First, an introduction and motivation are given for conducting such a study; next, the methods used for the study are discussed, along with the results and conclusions drawn from our experiments.
5.1.1 Introduction/Motivation
We could find no studies that have compared Image Guidance Technology with
Augmented Reality technology using human subjects. We feel it is important that
technology development (especially medical technology) be followed with (or be
developed in conjunction with) subject testing and evaluation (Taylor-Adams et al.
1999;Terazzi et al. 1998). This process will not only prove (or disprove) the utility of the newly developed technology, but also help in the development of optimal user interfaces and assist with the selection of relevant data and the appropriate display formats.
As described in chapter 3, we have developed an Image Guidance System able
to show the end-effector of a passive articulated arm (Microscribe, Immersion
technology) on orthogonal CT scans and 3D model of the phantom. Image guidance is
now a standard practice in the field of Neurosurgery. Surgeons performing complex
neurosurgery cases rely on the technology and would (or should) not attempt the
procedures without it, due to patient safety and liability issues (Bernstein et al. 2003;Carthey et al. 2001;Cuschieri 2003). As described in Chapter 4, we use the same passive articulated arm (Microscribe, Immersion technology) to track a calibrated end-effector-mounted video camera. In real time, we superimpose the live video view with the synchronized graphical view of CT-derived segmented object(s) of interest within a phantom skull. Both the AR and the IGS systems have been shown to be accurate to within 3 mm (Li 2000;Li Q 2000;Li Q 2001;Pandya A.K. ). In this section, the details of a Human Factors study are presented in which twenty-one subjects (including 3 surgeons) were tested using the techniques of Human Factors Engineering (A. Pandya 2003;Pandya A.K. 2003b).
It is our premise that using an image guided system imposes a substantial mental load on the surgeon, who has to interpret 3 orthogonal slices to form a picture of the 3D geometry and then mentally fuse that information onto what he is viewing (Pandya A.K. 2001e;Pandya A.K. 2003b). We believe excessive mental load can lead to fatigue, errors and longer surgical (and organ exposure) times, all of which may be linked to the safety of the patient. This was the motivation to develop AR technology that could help the surgeon fuse the 3D information onto a live view of what he is observing, alleviate some of the mental load, and to test it against the standard practice of Image Guidance. In addition, current techniques of image guidance do not allow the surgeon to use both real and synthetic data simultaneously. This section is focused on the human factors analysis comparing a standard Image Guidance system with an Augmented Reality system. This evaluation was undertaken to support or refute our hypothesis that Augmented Reality is a viable technique that can improve the performance of the surgeon over the current techniques of Image Guidance.
5.1.2 Method
For this study, there were 21 subjects, 7 female and 14 male, ranging in age from 25 to 45. Three surgeons were involved in the study. The subjects were seated in front of the Microscribe arm and a phantom skull. They were asked to manipulate the arm in and around the phantom skull and were given detailed instructions on how to work the device. First, the subjects were given time to train on both the IGS system and the AR system. The training sessions consisted of a series of two trials and lasted as long as the subjects required to become familiar with the system; typically about 10 minutes each (although it took longer for the IGS). The actual tests were conducted in 2 trials (AR and IGS) with a 5-minute rest between each trial. The order of administration of the AR portion and the IGS portion for both the training session and the actual test was counterbalanced (the orders were alternated). The total test lasted about 1 hour. During
each of the actual trials, each subject was first asked to locate certain objects inside the
un-opened phantom skull and asked to then draw an opening (craniotomy) for that
object from a particular location on the skull surface on the “surgical drape” wrapped
around the outside of the skull. The object that they were asked to draw for both the AR
and IGS were different objects of similar size and geometrical complexity. Drawing the craniotomy is typically one of the very first steps (and a very critical one) during neurosurgery. If the opening is not correctly positioned, full resection is compromised, not to mention the patient's safety. A similar issue of optimal port placement also exists for robotic surgery. Hence, this task was chosen for testing. In the
AR system, the subjects were asked to position the camera away from the skull using
their non-dominant hand, locate the overlaid object in the video/object overlaid view at
the location specified by the test, place their marker in the video view with their
dominant hand and draw the object on the surface of the skull (See Figure 5-2).
For the Image Guided System, they had to determine the extent of the objects in each of the orthogonal slices and then draw the outline of the object from that information. Image Guidance systems primarily use three 2-D views (coronal, axial and sagittal) to gain awareness of the patient geometry. The subjects had to perform the 2D (image) to 3D transformation in their minds and also project the envisioned data on the view of the patient. Figure 5-3 shows several screen shots of the subject's display during an IGS session at various pointer locations, from which he can ascertain the location and shape of the craniotomy. Note that the objects that were chosen for this study were relatively simple
in shape and the location of the craniotomy was chosen to be orthogonal to the
orientation of the CT scans/pointer direction. Oblique/non-orthogonal craniotomies are
very difficult to perform in standard Image Guidance systems and hence were not even
tested. It is worth noting that this is relatively easy to do in an AR system. In the
display for Image Guidance, the subjects had both the 3D view with all the relevant
objects segmented, along with orthogonal 3D CT slices. Although the 3D model information was provided here, in practice it is typically (in 99% of the cases I have been involved in) either not provided or not used by the surgeons.
After the drawing sessions for each of the techniques (IGS and AR) were
completed, the subject was asked a set of questions to gauge his understanding of the 3D arrangement of the objects within the phantom. He was asked to use the visualization tool at hand (either Augmented Reality or Image Guidance) to determine whether he was able to understand the relationships of each of the objects that were inside the skull.
The objects inside of the skull were fabricated to be neutral objects with no anatomical
basis (cubes, bolts, cylinders, and tubes). This allowed us to test non-surgeons as well
as surgeons as anatomical knowledge did not give an advantage for the objects in
question. Examples of these questions are as follows:
Where is the pyramid? Top View: a. Front b. Middle c. Back
Side View: a. Top b. Middle c. Bottom
Front View: a. Left b. Middle c. Right
Is the bolt touching the vessel?
Is the bolt above or below the Cube? The complete set of questions given to
each subject during the test is provided in the appendix.
Figure 5-2: A screen shot of a subject looking at the live video view of the skull overlaid with the 3D graphics objects on the monitor. The marker is then placed on the edges of the overlaid object and the object is traced on the surface of the draped skull.
Figure 5-3: Screen shots of a subject looking at the orthogonal slices in an image guidance system to find the extents and shape of the object for which the skull opening has to be made.
Each subject was timed on how long the entire test took, which included drawing the craniotomy and answering all the questions during the test. The questions were graded to determine how many errors were made. The movements of the robot arm were also recorded for later analysis (although this was not found to be very useful). In addition, a questionnaire was given at the end of the test to capture some subjective and comparative impressions of the two systems. This post-test questionnaire is provided in the appendix.
5.1.3 Results
Figure 5-4 shows the errors made by each of the 21 subjects. The two bars directly above each subject number on the x axis represent the number of errors made using the Image Guidance System (first bar) and the Augmented Reality System (second bar). There are a few instances where no errors were made (as in the case of the AR task for the first subject), in which case no bar appears in that location.
Figure 5-5 shows the times required for each subject to complete the testing. The
testing phase consisted of drawing the craniotomy, and answering all the questions
given. All but three of the subjects tested were non-medical (mostly Engineering
Graduate Students). Tables 5-1 and 5-2 show the summary statistics as well as the
paired t-test results for each of the data sets. It is interesting to note that all three of the
surgeons made more errors while taking more time during their IGS tasks. In addition
to the objective data, there was a questionnaire of paired questions that was given to
each subject. The results, which include the average response and a pair-wise t statistic
to understand if the answers are different, are given in Table 5-3. In addition, there were
several subjective questions for which answers are provided in Tables 5-4 to 5-6.
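For reference, the paired (within-subject) comparison summarized in Tables 5-1 and 5-2 can be computed as in the sketch below; the per-subject arrays are hypothetical placeholders, not the study data, which are reported here only in aggregate.

```python
# Sketch: paired t-test of IGS vs. AR completion times. Placeholder data only.
import numpy as np
from scipy.stats import ttest_rel

igs_time_min = np.array([8.5, 9.1, 7.8, 10.2, 8.0, 7.5, 9.3])  # hypothetical
ar_time_min  = np.array([5.2, 6.0, 4.9,  7.1, 5.5, 4.8, 6.3])  # hypothetical

t_stat, p_value = ttest_rel(igs_time_min, ar_time_min)  # paired, within subject
print(f"t = {t_stat:.3f}, p = {p_value:.5f}")
```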
Figure 5-4: Errors made by the subjects during the testing period.
Figure 5-5: The time required for each subject to complete the craniotomy task and answer the questions.
[Chart for Figure 5-4: Errors in Tasks (IGS vs. AR); y axis: errors (0 to 4.5), x axis: subject number (1-21), paired bars for IGS and AR.]
[Chart for Figure 5-5: Time to Complete Tasks (IGS vs. AR); y axis: time (min, 0 to 14), x axis: subject number (1-21), bars for IGS and AR.]
Table 5-1: Errors made during testing (paired t-test p = 0.068)
NN (IGS): average 1.7, STD 1.27
AR: average 1.04, STD 1.02

Table 5-2: Time taken during testing, minutes (paired t-test p = 0.00013)
NN (IGS): average 8.47, STD 1.90
AR: average 5.38, STD 2.74
Table 5-3: Paired Questionnaire Analysis between AR and IGS (average answer, paired t-test p value, significance at α = 0.05).

1. How was the test to perform using AR? Average answer: 4.7
   How was the test to perform using IGS? Average answer: 3.4
   Scale: 1 - Extremely difficult. 2 - Reasonably difficult. 3 - Somewhat difficult. 4 - So-so. 5 - Somewhat easy. 6 - Reasonably easy. 7 - Extremely easy.
   p = 0.001. Yes, the answers are different.

2. How accurate do you think you were in the test session using AR? Average answer: 5.1
   How accurate do you think you were in the test session using IGS? Average answer: 4.7
   Scale: 1 - Completely inaccurate. 2 - Reasonably inaccurate. 3 - Barely inaccurate. 4 - Borderline. 5 - Barely accurate. 6 - Reasonably accurate. 7 - Completely accurate.
   p = 0.250. No, the answers are not different.

3. How well do you think you got a feel for the position and orientation of the items in the phantom using AR? Average answer: 4.3
   How well do you think you got a feel for the position and orientation of the items in the phantom using IGS? Average answer: 4.0
   Scale: 1 - Extremely poor. 2 - Remarkably poor. 3 - Poor. 4 - So-so. 5 - Well. 6 - Remarkable. 7 - Extremely good.
   p = 0.015. Yes, the answers are different.

4. How often were you confused by the information you were presented using AR? Average answer: 4.0
   How often were you confused by the information you were presented using IGS? Average answer: 3.4
   Scale: 1 - >90% of the time. 2 - >60% of the time. 3 - 50% of the time. 4 - <30% of the time. 5 - <10% of the time. 6 - <5% of the time. 7 - Never.
   p = 0.055. The answers are on the borderline of being different.
Table 5-4: Table of Responses for Subjective Question 1
Question: Did you use any strategies to perform this task? Please describe how you completed this test (i.e., what did you look at, what did you think about, what did you pay attention to? Or any other comments)?
* I was trying to pay attention to the area I was looking at and aligning the pointer as best as I could. The 3 views offered by IGS were very helpful. (Surgeon comment)
* Color of the objects made it easier in AR.
* Using the pointer view of the skull to determine position was required more for IGS, while AR provided enough information to make a direct analysis of the object relative to the probe.
* In IGS, had difficulty viewing all the directional views at the same time to figure out the object locations. AR reduced the complexity.
* For IGS I used the 3D images and then tried to move the pointer from there. For AR, I looked for where the image seemed the biggest and assumed that was the plane.
* IGS is better than AR.

Table 5-5: Table of Responses for Subjective Question 2
Question: Were there any parts of the simulations that you found particularly helpful or to which you paid particular attention? Please describe.
* The AR was harder to tell what I was seeing and where. (Surgeon comment)
* The wire mesh rendering of the object in AR provided useful information. It aided in the initial location of the object.
* While using AR, having the blood vessels highlighted assisted me more so than the CT image.

Table 5-6: Table of Responses for Subjective Question 3
Question: Any other comments?
* Incredible - this is really a breakthrough. (Surgeon comment)
* Would help if AR video was in 3D stereo. (Surgeon comment)
* In Image Guidance it was pretty difficult in recognizing position because of a lack of colors.
* I feel that AR was much easier than the IGS. For AR I was sure of my answers, as for IGS I had to guess.
* The robot arm needs more degrees of freedom to be more comfortable.
* I think that AR is a great surgical navigation tool, but I think that one may become an expert at using both techniques simultaneously.
* With AR, it was pretty easy to identify the objects with respect to one another; however, IGS may be better at getting the exact location of the objects.
* If the image remained fixed and the crosshairs moved, this may help for IGS.
* The skill of using the equipment seems to be very important.
* AR was a bit more confusing from the visual stance. If given the choice, I would prefer using Image Guidance.
5.1.4 Conclusions
We believe that Augmented Reality generation is a natural extension for the
surgeon because it does both the 2D to 3D transformation and projects the views
directly onto the patient view. Our Human Factors study indicated that IGS took a
statistically significant longer time than did AR. In addition, although on the border of statistical significance (p value of 0.068), IGS did produce on average a greater number of errors. It is interesting to note that all three of the surgeons tested made more errors using the IGS system and took longer to perform the tasks than they did with the AR system (Pandya A.K. 2003), suggesting that AR does improve the performance and error rates of the surgeon.
Each technique tested here (AR and IGS), however, has its advantages and
disadvantages. We have already illustrated the difficulty of interpreting 2D slices for
IGS. For example, a simple 3D shape like the cube is represented by a triangle in an
oblique CT slice. In fact, this shape was confused by some of the subjects with the
pyramid shape also present in the skull. The bolt is viewed as a circle in the axial view
and a disjoint line in the sagittal view and also confused several subjects. This does not
seem like the natural method of visualization. This was part of the motivation for trying
to improve IGS with the techniques of AR. On the other hand, AR is limited to what
objects are chosen for segmentation for that particular surgery. This can be viewed as an advantage because it simplifies the data set before the surgery starts; however, all the relevant data may not have been selected. It is impossible to segment all the
objects as this is a very time-consuming process. Moreover, having all the detailed
structures present would make the AR scene very cluttered and confusing. Image
Guidance uses the original raw imaging data and the advantage is that all the
information (albeit confusing) is present.
We speculate that certain phases of the surgery (like the craniotomy) may be
easier to execute using the Augmented Reality system. Here, the gross relative location
of the anomaly along with major vessels can easily be drawn on the surface of the skull
(even if they are non-orthogonal trajectories). Using an IGS system, this task can be
time-consuming and confusing. Other phases of the surgery where specific and very
detailed information at the resection site is needed may well be suited for an IGS. At the
risk of inundating the surgeon, perhaps a hybrid on-demand system where all three
modalities of information (3D models, live video AR scene, and an image-guidance
system) would be simultaneously available to the surgeon would be beneficial. There is
still a great deal of uncertainty with regard to the best form of visualization; however, the current state-of-the-art systems need to seriously consider AR technology as a significant method of visualization. Medical robotic devices of the future should be able
to use this technology to directly link these systems to patient data and provide the
optimal visualization of that data for the surgical team. The design and methods of this
prototype device can, we believe, be extrapolated for both current medical robotics
systems and Image Guidance systems. We plan on improving the AR technology using
the results and comments of this study and performing another evaluation in the near
future.
5.2 HUMAN FACTORS TESTING (HEADS-UP DISPLAY VS. MONITORS)
5.2.1 Introduction/Motivation
When watching a few surgeons using a heads up display (HUD) system to view
the remote live video of test surgeries, I noticed an interesting behavior. Even though
moving the head in a particular orientation would not change the view point (the video
view) of the surgeon, all the surgeons moved their heads to a comfortable location as if
aligning some type of internal body coordinate system. This provided the motivation to
perform a study to understand if there were any performance advantages to using a
head-mounted display for doing endoscopic tasks vs. a monitor view.
The use of the HUD in endoscopic neurosurgical procedures has previously been described in the literature, but there are very few studies reporting an objective evaluation of HUD use (Drascic et al. 1989). In this study, we developed an experimental model to assess
the real impact of HUD on the performance of the endoscopic procedure. We believe
that this is the first study where a systematic analysis is done to understand the
difference between these two modalities of visualization.
Diverse types of display devices can give a different level of immersion in the
Virtual or Augmented world (Azuma R. 1997). There are several distinct types of
Augmented Reality displays. One is referred to as monitor-based viewing of augmented scenes; the other uses a head-mounted display (HMD). In the monitor form of augmented reality, a video image of the actual environment is overlaid with graphics, text or other images to enhance the data being seen and is displayed on a monitor. To increase the sense of presence, a head-mounted display (HMD), also referred to as a see-through display, can be used. In this case, either the actual physical world or a video image of the world is fused with graphics, text or other images and presented to the operator. In optical see-through displays, an optical combiner is placed in front of the user's eyes. These combiners, which are partially transmissive, allow the user to see the actual objects of interest and also show the virtual objects (which are projected from mounted monitors). Because the combiners are partially transmissive, the real world appears darker, as not all of the light is let into the viewer's eyes [Wanstall89]. Typically,
the user’s head is also tracked and head movement is reflected in the real video image.
The advantages and disadvantages of each of the different types of viewing systems is
explained in more detail in (Azuma R. 1997).
In contrast, some HMDs use cameras to display video signals from the real
world. They differ from HUDs in that they do not allow the user to see around the
display. There are commercially available products that consist of a set of goggles or a
helmet with tiny monitors in front of each eye to generate images seen by the wearer as
three-dimensional. This HMD is combined with a head tracker so that the images
displayed in the HMD change as the head moves. In addition, these displays can allow
the subject to see the virtual image superimposed over the real world. The wearer can
"see through" the virtual image.
The purpose of this study was to evaluate the use of heads-up display (HUD) technology for endoscopic operations (Marchese M. 2003;Pandya A.K. 2001b). As opposed to Head Mounted Displays (HMDs), HUDs allow the surgeon to view both the video image (by glancing up at the monitors) and the surgical field. In endoscopic operations, the surgeon's hand-eye coordination is critical for success, and there are very subtle issues involved. Slight motions of the head and hands can result in amplified endoscopic motions due to the lever-arm effect, as the entry point of the endoscope acts as a fulcrum (the tip displacement scales roughly with the ratio of the shaft length inside the body to the length outside it). In traditional endoscopic surgery, the surgeon performing the surgery uses the room's overhead monitors or obliquely placed monitors. Usually, he is looking away from the surgical site while performing the surgery. With the heads-up display, the surgeon is able to see the surgical field as well as the endoscopic view at the same time without any head movement. This, we believe, is its main advantage.
Figure 5-6: Different hardware methods to display a remote camera view. [Diagram: real-world video and sensed data are combined with computer graphics, either digitally or by optical merging (mirror), to form the augmented scene, which is presented on a monitor, a see-through display, or a head-mounted display.]
5.2.2 Method
The endoscope is a surgical instrument with a camera mounted at the end of a rigid shaft. The shaft has various sleeves through which tools can be threaded for use at the end-effector site. The major questions for this research are: Can the heads-up display system improve human performance (in terms of time and accuracy) as compared to the traditional use of a monitor? Can it reduce neck strain (a major issue for multi-hour procedures)? And can it provide the user with more focus/attention on the selected task?
The seven test subjects of the study were asked to use an endoscope in a
phantom brain and pick up five targets (distributed throughout the environment) using a
heads-up display or a monitor for viewing. In this case a small cutting (biopsy) tool was
used for the testing purposes. The phantom was covered with a piece of sponge and a
small hole was placed in the center of the sponge to create an opening in which the
user would guide an endoscope (See Figure 5-7). The glasses used for this study were the iglasses (LCX2) from iO Display Systems. These glasses have TV resolution (comparable to the monitor we used for the study). A phantom model of a brain was prepared in which a Velcro strip was fixed in a circular track. Five small blue pieces of plastic tubing, cut at one end to provide edges to grasp, were loosely fixed on the Velcro strip. The use of the endoscope was explained to each test subject with written instructions (Pandya A.K. 2001b).
There were 3 females and 4 males in the study, ages 25-55. The test was counterbalanced in terms of which system (heads-up display or monitor) was used first. The subjects were then asked to perform the task using the two modes of visualization.
Figure 5-7: The phantom skull with a black track of velcro and several blue plastic pieces of tubing that the
subject was asked to remove through the opening in the foam.
Figure 5-8: A subject performing the experiment using the monitor and the heads-up display.
In a follow-up study, to experiment further with the angle of the monitor, a test was conducted in which the subjects picked up six small objects from six tiny cavities located on the bottom of a closed box using a 0° rigid endoscope and biopsy forceps. An electrical plate surrounding the cavities' edges registered any contact with the endoscope or with the forceps with an electronic buzz. The test was performed in three different parts: in the first, the TV monitor was located in front of the subject (0°); in the second, the monitor was angled at 45°; and in the third, a HUD was used. To ensure the data were free of a "learning curve" bias, the three trials were sequenced randomly. The time to complete the tasks and the error counts (contacts with the surrounding plates) were both measured (Marchese M. 2003).
5.2.3 Results
In the first study, five of the seven subjects did the task faster using the heads-up
display. On average, the subjects performed 8 percent better with the heads-up display
(See Figure 5-9). This particular study did not have enough subjects to be a statistically
valid comparison. Also, there was no objective method to measure the number of errors
that the subjects made. In addition, there was no comparison of placement of the
monitor for viewing. However, in the questionnaire provided, several of the subjects commented that the use of the heads-up display helped them concentrate on the task. It reduced external input and helped them focus on the task at hand.
Figure 5-9: Time for testing using the HUD subtracted from time for testing using the Monitor.
[Chart for Figure 5-9: Difference in time to perform the task between monitor and head-mounted display; y axis: time (monitor - HMD), approximately -40 to 60; x axis: subject number (1-7).]
In order to improve on the results of the previous study and provide better control of the placement of the monitor, a follow-up study was conducted in which fifteen subjects participated. This time, both the time to complete the tasks and the errors involved were measured. The average time and errors for the test were: HUD, 194 sec and 6.06 errors; monitor at 0°, 224 sec and 8.33 errors; monitor angled at 45°, 234 sec and 8.46 errors (See Figure 5-10 and Figure 5-11). Statistically significant results (p=0.016 and p=0.014) were found when matching both the time and error data of the HUD with those of the monitor angled at 45° (See Figure 5-12). The comparison with the monitor at 0 degrees was on the borderline of significance, whereas the two monitor conditions compared with each other had very high p values, indicating that there was no difference between them. Figure 5-13 shows a surgeon using the HUD in surgery; she reported very good results in terms of comfort during the surgery, focus (reduced distractions), and accuracy.
Figure 5-10: Average time to perform the same task for 15 subjects using the heads-up display, a monitor placed directly in front of the subject, and a monitor placed at a 45-degree angle from the subject's task. [Chart values: HUD 194 sec; monitor 0° 224 sec; monitor 45° 234 sec.]
Figure 5-11: Average number of errors for the same task for 15 subjects using the heads-up display, a monitor placed directly in front of the subject, and a monitor placed at a 45-degree angle from the subject's task. [Chart values: HUD 6.0 errors; monitor 0° 8.3 errors; monitor 45° 8.4 errors.]
Figure 5-12: Pairwise t-test comparison of the conditions (p values). HMD vs. monitor 0°: time 0.077, error 0.067. HMD vs. monitor 45°: time 0.016, error 0.015. Monitor 0° vs. 45°: time 0.643, error 0.910. The two monitor conditions (0° and 45°) have very high p values, indicating no statistically significant difference between them. When comparing the 45° monitor with the HUD, there is statistical significance at alpha = 0.05, and the comparison with the monitor at 0° is on the borderline of significance.
5.2.4 Conclusions
The first and preliminary study that compared using a heads-up display with the
standard technique of a monitor suggested that using a heads up display to perform
endoscopic ("remote") tasks was faster and caused less strain. In addition, it assisted the user by drawing his attention to the task, hence minimizing external distractions.
The subject study indicated that the heads-up display may improve the focus and
efficiency of the operator. There were not enough subjects in this study to make a
conclusive decision. The heads-up display was used in several surgical cases with very
good results. The heads-up display allowed the surgeon to maintain a clear view of the surgical field without moving her head, while still being able to glance at the endoscopic view. The surgeon commented that there was less neck strain and that she was able to guide the surgery with a little more focus and confidence. The surgeon also said that this was the fastest endoscopic neurosurgery she had ever performed.
Figure 5-13: The HUD was tried in a neurosurgery case with very good results.
The results obtained in the second and more rigorous study are more definitive. The use of the HUD, compared with a 45° angled monitor, confirms that the HUD positively influences the performance of the endoscopic procedure. It was shown in this study that
there was no difference in error or timing between the two monitor cases. However, the
HUD does have a positive influence on both time and number of errors.
In this chapter, two examples of Human Factors testing were given which helped us understand some of the advantages and disadvantages of our developed technology. This is essential as we move forward to improve on this work. The user testing has given us several avenues of future work. In the next and last chapter, we focus on the contributions of this work along with the future directions.
Chapter 6: Summary, Contributions and Future Work
What we call the beginning is often the end and to make an end is to make a
beginning. The end is where we start from. -- T. S. Eliot
This chapter will first provide a summary of the main results/contributions of the
thesis, followed by a description of what future avenues of research have opened up as
a result of this work.
6.1 Summary/Contributions
Image-guided medicine is beginning to make real contributions to patient care.
As imaging systems become more integrated with the operating room and imaging
becomes more real-time, the tools of IGS and AR will become even more useful. We
feel that there is a remarkable research potential in this field with a tremendous return
on the investment for efforts spent in improving this technology for the OR.
There are three novel contributions for this thesis. (1) A comparison of IGT with
AR. This has not been done before, probably for several reasons. One is the complexity
involved in building both types of systems for direct comparison and the other is the
relative newness of the field and the fact that Human Factors Engineering is still not in
the mainstream of medical technology development. (2) A comparison of various
display modalities for endoscopic surgery. Although various display devices have been
reported to be used in endoscopic and robotic surgery, there have been no known
studies that objectively test subjects/surgeons to compare their performance using
various methods of visualizations. What hardware is used for visualization of robotic or
endoscopic views of the end-effector camera is very important. (3) We have created a
novel medical robotic system which uses the kinematics of the robotic device to track a
video camera and create an Augmented Reality scene which can supplement the video
view with geometrical or sensor information (Pandya A. K. 2002;Pandya A.K.
2003b;Pandya A.K. 2003c). AR blends the real world with 3D models generated from
medical scans or other data and has been demonstrated for a passive robotic system in
this thesis.
The technological contributions reported here are the implementation details of
both an Image Guidance System able to show the end-effector of a passive articulated
arm on orthogonal imaging scans and 3D models and an Augmented Reality system on
which registered 3D models are merged with live video. These are both relatively new
technologies and this detailed description of the methods for implementation is
important for future research/researchers. The implementation details provide for an in
depth understanding of the nature of the technologies and are instructive for new
researchers joining this field. Included in this thesis is a detailed section demonstrating
how 3D models are created from imaging data. This is the foundation technology of
Image Guidance and AR.
The two prototype developments were designed to use the same hardware, and the software was structured in such a way that it could impact both the Medical Robotics world as well as the Image Guidance world. A passive articulated robotic arm (the
Microscribe) is used to develop both systems. This thesis covers the steps needed to
rebuild and compute both the real-time Image guided and Augmented Reality scenes
for objects of interest.
As emphasized several times in this thesis, we maintain that successful
technology development for medicine must include extensive user testing and surgical
feedback. We provide an example of how the development and testing cycle should
proceed with development, user testing, surgical feedback and further research and
development. The Human Factors contribution of this thesis is that we have provided
the details of our usability testing of two prototype systems. First, a subject test using 21 subjects provided a comparison of IGS vs. AR. IGS represents what surgeons currently use in the operating theater, and AR represents a potentially new or upgraded visualization system. The second study used a total of 22 subjects to test the use of
Heads-Up Displays vs. Monitor viewing of the remote video of either an endoscopic or
robotic interface.
The underlying hypotheses for both studies were that (1) an Augmented Reality
System will significantly improve the performance of the surgeon over an IGS system
(2) that advanced visualization hardware (i.e. a heads up display) can improve the
performance of the surgeon over a monitor view. Both our hypotheses have been
affirmed by our subject testing, albeit with some important caveats that are noted in the
respective chapters. The first Human Factors study indicated that the subjects performed faster, with more accuracy and fewer errors (the error results being on the borderline of significance), using the Augmented Reality interface. The results of the HUD vs. Monitor study revealed that the HUD positively impacted both the performance and the error rate of the subjects.
The ultimate aim of this work will be the extrapolation of the findings to the development of AR and IGT for active medical robotic systems and the incorporation of AR techniques into existing image guidance systems. Commercial systems should have both technologies available on demand for the surgeon. Examples of current robotic systems that could take advantage of this technology include the Neuromate and Robodoc systems (Integrated Surgical Systems Inc.), the daVinci (Intuitive Inc.), and the Zeus (Computer Motion Inc.) systems. Research rarely gives definitive
answers to our questions and although positive results are provided for both
hypotheses, we feel that several more rounds of technology enhancements along with
subject testing will be essential as these technologies begin to become adopted into the
operating rooms of the future.
6.2 Future Work
There is a substantial amount of future work that needs to be performed to make
AR a routine part of main-stream surgery. Some of the major obstacles that remain are
continuous zoom camera calibration, the modeling of deformable objects, and
producing real-time AR scenes. There are many technical problems with the
development of AR. However, we recommend that future researchers also focus more
on psychophysical experiments in which one can answer important usability and performance questions such as: How much error is tolerable by the surgeon? How much dynamic error can the surgeon handle? What is the performance of the surgeon when certain types of features are disabled? What are the limits of information presented to the surgeon, and would on-demand systems be better? These are just a
few of the very relevant and important questions that need to be asked as the
technology is being further developed.
The following sections contain potential research directions that we can envision
at this juncture.
6.2.1 Stereo Augmentation
One of the surgeons that used the AR system suggested that a stereo AR
system would add value to the AR experience. This is a very valid observation
because, in order to get real depth perception in an AR scene, several points of view
are needed. One possible research direction is to pursue eyewear-free (auto-
stereoscopic/lenticular display) stereo display system for real-time video augmented
with computer generated dynamic overlays. A stereo image system should further
enhance the technology and the human computer interface. Again, after the
development of technology, subject testing for improved depth perception should be
used to quantify any improvements.
6.2.2 Sensor Technology/Data at the End Effectors of Robots
Experimentation should be done on visualizing sensor information. For instance, sensors can be mounted on the end-effector of a robotic device, and data taken with the sensor can then be viewed using augmentation techniques. For example, tumor removal requires that the surgeon be able to accurately define the tumor resection margins. Current techniques in IGS rely primarily on visual feedback from the surgical site. Here, the goal would be to enhance the sensing modalities of the surgeon by
adding sensor technology to the surgical robotic environment for the sensing of, for instance, pH, O2 and glucose levels, or even higher-order molecular signals such as Raman spectroscopy. It is hypothesized that other modalities of information from the surgical/tumor site, based on these non-visual/biochemical aspects, will enhance the surgeon's ability to more completely define resection margins. Other
examples include augmenting the 3D force/torque sensor information at the end-
effector of robots. The 3D nature of force and torque information is difficult to interpret
by the operator and perhaps registered vector flows directly on the video stream from
the remote site would be beneficial.
6.2.3 Continuous Zoom Camera Calibration
Camera calibration is a very important issue for AR technology to work correctly.
A process that can track real-time variations of camera parameters (intrinsic, camera-to-
camera pose, camera-to-worksite pose) that may occur due to operator adjustment of
camera zoom will be of utmost importance. Research to calibrate the cameras for
multiple zoom angles and to provide a robust method of interpolation in order to be able
to derive the camera parameter estimation for all zoom angles of the camera, would be
very important work. There has been extensive work done on calibrating a camera with variable zoom (D. Liebowitz 1998;M. Li;O. D Faugeras 1992;Sturm 2002). The complexity of the calibration comes from the fact that both the intrinsic and extrinsic parameters of the camera are dependent on the zoom, focus, and aperture settings. In contrast, static cameras (on which our prototype is built) have only one zoom, focus and aperture setting. The number of calibration points needed for a zoom camera will be substantially greater than for a static camera. Various techniques have been proposed
to reduce the number of data points (R. Atienza 2001). In these approaches, intrinsic
and extrinsic parameters are estimated for constant aperture settings for a range of
zoom angles. High-order polynomials are then used to approximate the camera
parameters in continuous mode for other zoom-focus combinations. Other researchers
have used neural networks to closely approximate the camera model (M. Ahmed 2000).
Experimentation with these methods and the integration of a robust solution to the continuous camera calibration problem will be essential.
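As a sketch of the polynomial-approximation idea described above, the camera could be calibrated at a handful of discrete zoom settings and each intrinsic parameter then fitted as a low-order polynomial of the zoom position; the values below are hypothetical and purely illustrative.

```python
# Sketch: interpolate a zoom-dependent intrinsic parameter (here, focal length)
# from calibrations at a few discrete zoom settings. Hypothetical values.
import numpy as np

zoom_settings = np.array([0.0, 0.25, 0.5, 0.75, 1.0])        # normalized zoom
focal_px = np.array([700.0, 860.0, 1050.0, 1290.0, 1600.0])  # calibrated values

coeffs = np.polyfit(zoom_settings, focal_px, deg=3)  # low-order polynomial fit
focal_model = np.poly1d(coeffs)

print("estimated focal length at zoom 0.6:", focal_model(0.6))
# The same fitting can be repeated for each intrinsic (and extrinsic) parameter
# as a function of the zoom/focus setting.
```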
6.2.4 3D Ultrasound for AR
A very recent development in imaging is 3D ultrasound (PM. et al. 2000). A conventional 2D ultrasound probe (which has been used for decades) can be used in a novel way to produce 3D imaging. If the 2D probe is equipped with a six-degrees-of-freedom tracking device, spatially registered 2D scans can be acquired. These scans can then be mathematically combined to create a tomographic 3D image set. The
resulting 3D data image planes can be visualized by either volume rendering or by the
process of segmentation and surface rendering techniques. This data could provide live
updates to an AR system in use for surgery.
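The freehand 3D-ultrasound idea can be sketched as follows: each tracked 2D scan is mapped into a regular 3D volume using the probe's six-degree-of-freedom pose. The transform, spacings, and image below are placeholders for illustration and do not represent a specific scanner or tracking system.

```python
# Sketch: compound tracked 2D ultrasound slices into a 3D volume. Placeholder
# pose, spacings, and image; illustrative only.
import numpy as np

def insert_slice(volume, origin_mm, voxel_mm, image, pixel_mm, T_probe_to_world):
    """Scatter one 2D image (lying in the probe plane z = 0) into the volume."""
    height, width = image.shape
    for v in range(height):
        for u in range(width):
            p_probe = np.array([u * pixel_mm, v * pixel_mm, 0.0, 1.0])
            p_world = T_probe_to_world @ p_probe          # apply tracked pose
            idx = np.round((p_world[:3] - origin_mm) / voxel_mm).astype(int)
            if np.all(idx >= 0) and np.all(idx < volume.shape):
                volume[tuple(idx)] = max(volume[tuple(idx)], image[v, u])
    return volume

vol = np.zeros((100, 100, 100), dtype=np.float32)       # 100 mm cube, 1 mm voxels
slice_img = np.random.rand(60, 80).astype(np.float32)   # stand-in ultrasound frame
vol = insert_slice(vol, origin_mm=np.zeros(3), voxel_mm=1.0,
                   image=slice_img, pixel_mm=0.5, T_probe_to_world=np.eye(4))
```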
6.2.5 Space Station Robotics (to infinity and beyond)
Although the emphasis of this thesis is on Medical Robotics, it is interesting to
point out the dual-use/technology transfer of the same technology to NASA’s space
station robotic arm (Pandya A.K. 2001f;Pandya A.K. 2002;Pandya A.K. 2003a). In fact,
NASA has collaborated with us (supplied consultation and software) and financially
supported this work. The same technology that has been developed here to assist the
medical domain, has also been shared with NASA’s Graphics Research and Analysis
Facility at the Johnson Space Center where the technology is being used to augment a
Space Station Robot’s camera view with the appropriate graphics (for example a space
Shuttle docking target) to assist the astronaut. They have also followed suit and
performed human factors tests that concluded that Augmented Reality does improve
the performance of the robotic operator. Based on work done both as part of this thesis and at the Johnson Space Center, NASA wants to implement this system for its Space Station Mockup and has provided us an additional round of funding to assist
them. Possible future avenues could include merging the AR and the VR worlds such
that the operators could first practice the procedure in the overlaid virtual world and then
perform the actual operation once all the parameters were correctly adjusted.
Appendix Human Factors Study Subject Testing Material (AR vs. IGS)
Subject Instructions
1. You will be given detailed instructions depending on which portion of the experiment you will be performing.
2. You will be given a robotic instrument which is linked to a particular imaging system.
3. You will first be given some practice time to see how the visualization works.
4. You will then be given the task of outlining on the surface of a plastic skull where you think the location of various objects are, and will be asked to answer some questions as you are performing the visualization.
5. You will be timed on how long it takes you to identify and draw the projections of these objects on the skull surface and answer all the questions.
6. The movements of the robot arm are also recorded for later analysis.
7. Your data will be completely confidential as only the subject number is recorded on the data sheet.
8. A brief questionnaire will be given at the end of the test.

Practice Session: Please try and locate the center of each of the objects specified from the front, side and top views using the navigation tool given. You have about 5 minutes to practice each of the techniques. In the Neuronavigation, you will be provided 3 orthogonal slices from where your pointer is located. Imagine looking at all three plane slices from the point of contact. In the AR, you will see a projected view on a live video scene of the objects of interest.

Subject # ………    M    F    Age …..
Image Guidance / Augmented Reality
Start time AR        End Time AR
Start time NN        End Time NN
Figure 6-3
During-Test Questions

Please answer the following questions as you are performing the AR task. Hint: Use the camera/pointer in all directions (orthogonal views help the most) when determining the position of the objects you are viewing as you answer these questions (for both parts, NN and AR). Tell the operator when you are starting.

1. Identify the pyramid structure and mark (color in) the projection of this object from location A (marked), viewed perpendicular to the surface.
2. Where is the pyramid? Circle one from each of the three lines below.
   Top View:   a. Front   b. Middle   c. Back
   Side View:  a. Top     b. Middle   c. Bottom
   Front View: a. Left    b. Middle   c. Right
3. Where is the cylinder (facing the skull)?
   Top View:   a. Front   b. Middle   c. Back
   Side View:  a. Top     b. Middle   c. Bottom
   Front View: a. Left    b. Middle   c. Right
4. Is the bolt touching the vessel?
5. Is the bolt above or below the cube?

Tell the operator that you have finished.

Please answer the following questions as you are performing the Neuro Navigation task. Tell the operator when you are starting.

1. Identify the cube structure and mark (color in) the projection of this object from location B (marked), viewed perpendicular to the surface.
2. Where is the cube (facing the skull)?
   Top View:   a. Front   b. Middle   c. Back
   Side View:  a. Top     b. Middle   c. Bottom
   Front View: a. Left    b. Middle   c. Right
3. Where is the bolt (facing the skull)?
   Top View:   a. Front   b. Middle   c. Back
   Side View:  a. Top     b. Middle   c. Bottom
   Front View: a. Left    b. Middle   c. Right
4. Is the cylinder above or below the cube?
5. Is the bolt in front of or behind the vessels?

Tell the operator that you have finished.
Post-Test Questionnaire for the project
1. How was the test to perform using Augmented Reality?
   1 - Extremely difficult.  2 - Reasonably difficult.  3 - Somewhat difficult.  4 - So-so.  5 - Somewhat easy.  6 - Reasonably easy.  7 - Extremely easy.
2. How was the test to perform using Neuro Navigation?
   1 - Extremely difficult.  2 - Reasonably difficult.  3 - Somewhat difficult.  4 - So-so.  5 - Somewhat easy.  6 - Reasonably easy.  7 - Extremely easy.
3. How accurate do you think you were in the test session using Augmented Reality?
   1 - Completely inaccurate  2 - Reasonably inaccurate  3 - Barely inaccurate  4 - Borderline  5 - Barely accurate  6 - Reasonably accurate  7 - Completely accurate
4. How accurate do you think you were in the test session using Neuro Navigation?
   1 - Completely inaccurate  2 - Reasonably inaccurate  3 - Barely inaccurate  4 - Borderline  5 - Barely accurate  6 - Reasonably accurate  7 - Completely accurate
5. How well do you think you got a feel for the position and orientation of the items in the phantom using AR?
   1 - Extremely poor  2 - Remarkably poor  3 - Poor  4 - So-so  5 - Well  6 - Remarkable  7 - Extremely good
6. How well do you think you got a feel for the position and orientation of the items in the phantom using NN?
   1 - Extremely poor  2 - Remarkably poor  3 - Poor  4 - So-so  5 - Well  6 - Remarkable  7 - Extremely good
7. How often would you say that you were confused by the information you were presented in AR?
   1 - >90% of the time  2 - >60% of the time  3 - 50% of the time  4 - <30% of the time  5 - <10% of the time  6 - <5% of the time  7 - Never
8. How often would you say that you were confused by the information you were presented in NN?
   1 - >90% of the time  2 - >60% of the time  3 - 50% of the time  4 - <30% of the time  5 - <10% of the time  6 - <5% of the time  7 - Never
9. Did you use any strategies to perform this task? Please describe how you completed this test (i.e., what did you look at, what did you think about, what did you pay attention to? Or any other comments).
10. Were there any parts of the simulations that you found particularly helpful or to which you paid particular attention? Please describe.
11. Any other comments?
BIBLIOGRAPHY

A. Pandya, M. Siadat, G. Auner (Invited Speaker). 2003 Augmented Reality vs. Neuronavigation: a Comparison of Surgeon Performance. In: Biomedical Engineering Symposium 2003, Wayne State University.
Abdel-Aziz Y. I. , Karara H. M. 1971 Direct Linear Transformation into Object Space
Coordinates in Close-Range Photogrammetry.In: Proc. Symposium on Close-
Range Photogrammetry, Urbana, Illinois, 1-18.
Alp, M. S., Dujovny, M., Misra, M., Charbel, F. T., and Ausman, J. I. 1998. Head
registration techniques for image-guided surgery. Neurological Research; 20(1);
31-37.
Ayache, N. 1995. Medical Computer Vision, Virtual-Reality and Robotics. Image and
Vision Computing; 13(4); 295-313.
Azuma, R. T. 1997. A survey of augmented reality. Presence-Teleoperators and Virtual
Environments; 6(4); 355-385.
Bernstein, M., Hebert, P. C., and Etchells, E. 2003. Patient safety in neurosurgery:
Detection of errors, prevention of errors, and disclosure of errors. Neurosurgery
Quarterly; 13(2); 125-137.
Berry, J., O'Malley, B. W., Humphries, S., and Staecker, H. 2003. Making image
guidance work: Understanding control of accuracy. Annals of Otology Rhinology
and Laryngology; 112(8); 689-692.
Billinghurst, M., Kato, H., and Poupyrev, I. 2001. The MagicBook: a transitional AR
interface. Computers & Graphics-Uk; 25(5); 745-753.
Blackwell, M., Nikou, C., DiGioia, A. M., and Kanade, T. 2000. An Image Overlay
system for medical data visualization. Medical Image Analysis; 4(1); 67-72.
Bloch, P., Lenkinski, R. E., Buhle, E. L., Hendrix, R., Bryer, M., and Mckenna, W. G.
1991. The Use of T2-Distribution to Study Tumor Extent and Heterogeneity in
Head and Neck-Cancer. Magnetic Resonance Imaging; 9(2); 205-211.
Bowersox, J. C., Cordts, P. R., and LaPorta, A. J. 1998. Use of an intuitive
telemanipulator system for remote trauma surgery: An experimental study.
Journal of the American College of Surgeons; 186(6); 615-621.
Broll, W., Schafer, L., Hollerer, T., and Bowman, D. 2001. Interface with angels: The
future of VR and AR interfaces. Ieee Computer Graphics and Applications; 21(6);
14-17.
Bucholz, R. D., Smith, K. R., Laycock, K. A., and McDurmont, L. L. 2001. Three-
dimensional localization: From image-guided surgery to information-guided
therapy. Methods; 25(2); 186-200.
Burkart A., Debski RE., McMahon PJ., Rudy T., and Fu FH, M. V., van Scyoc A, Woo SL. 2001. Precision of ACL Tunnel Placement Using Traditional and Robotic Techniques. Computer Aided Surgery; 6; 270-278.
Carthey, J., de Leval, M. R., and Reason, J. T. 2001. The human factor in cardiac
surgery: Errors and near misses in a high technology medical domain. Annals of
Thoracic Surgery; 72(1); 300-305.
Cash, D. M., Sinha, T. K., Chapman, W. C., Terawaki, H., Dawant, B. M., Galloway, R.
L., and Miga, M. I. 2003. Incorporation of a laser range scanner into image-
guided liver surgery: Surface acquisition, registration, and tracking. Medical
Physics; 30(7); 1671-1682.
Chen, Y. M., Guo, W. H., Huang, F., Wilson, D., and Geiser, E. A. 2003. Using prior
shape and points in medical image segmentation. Energy Minimization Methods
in Computer Vision and Pattern Recognition, Proceedings; 2683; 291-305.
Chmielewski, C., Pandya, A., Woolford, B., Adolf, J., Whitmore, M., Berman, A. H., and
Maida, J. (1998). "Comparison of the Features of Multimedia and Virtual Reality
for use in Learning." LMSMSS 32906, NASA, Houston.
Chmielewski C., Pandya, A., Adolf, J., Whitmore, M., Berman, A., Woolford, B., and
Maida, J. C. 1999 Comparison Of The Features Of Multimedia And Virtual
Reality For Use In Learning.In: Electronic Proceedings of the 1999 International
Conference on Computer-Aided Ergonomics and Safety, Barcelona, Spain.
Cho, Y. K., and Neumann, U. 2001. Multiring fiducial systems for scalable fiducial-
tracking augmented reality. Presence-Teleoperators and Virtual Environments;
10(6); 599-612.
Cleary K. , Nguyen C. 2001. State of the Art in Surgical Robotics: Clinical Applications
and Technology Challenges. Computer Aided Surgery; 6; 312-328.
Cline, H. E., Dumoulin, C. L., Hart, H. R., Lorensen, W. E., and Ludke, S. 1987. 3d
Reconstruction of the Brain from Magnetic-Resonance Images Using a
Connectivity Algorithm. Magnetic Resonance Imaging; 5(5); 345-352.
Cline, H. E., Dumoulin, C. L., Lorensen, W. E., Souza, S. P., and Adams, W. J. 1991.
Volume Rendering and Connectivity Algorithms for Mr Angiography. Magnetic
Resonance in Medicine; 18(2); 384-394.
Cuschieri, A. 2003. Medical errors, incidents, accidents and violations. Minimally
Invasive Therapy & Allied Technologies; 12(3-4); 111-120.
D. Liebowitz , A. Zisserman. 1998 Metric Rectification for Perspective Images of
Planes.In: in Proc. IEEE Conf. on CVPR, 482-488.
Danisch, L. A. 1997. Fiber-optic shape sensors (TM) and shape tape (TM).
Measurements & Control (186); 99-102.
Delcker, A., and Tegeler, C. 1998. Development and application of 3D ultrasound in
neurology. Aktuelle Neurologie; 25(2); 56-62.
D'Esposito, M., Deouell, L. Y., and Gazzaley, A. 2003. Alterations in the bold FMRI
signal with ageing and disease: A challenge for neuroimaging. Nature Reviews
Neuroscience; 4(11); 863-872.
DiGioia, A. M. 1998. Computer assisted orthopaedic surgery: Medical robotics and
image guided surgery - Comment. Clinical Orthopaedics and Related Research
(354); 2-4.
Drascic, D., and Milgram, P. 1996 Perceptual Issues in Augmented Reality.In: SPIE,
San Jose, 123-134.
Drascic, D., Milgram, P., and Grodski, J. J. 1989 Learning Effects in Telemanipulation
With Monoscopic Versus Stereoscopic Remote Viewing.In: IEEE International
Conference on Systems, Man, and Cybernetics, Boston.
Erdi, Y. E., Humm, J. L., Imbriaco, M., Yeung, H., and Larson, S. M. 1997. Quantitative
bone metastases analysis based on image segmentation. Journal of Nuclear
Medicine; 38(9); 1401-1406.
Ferrant, M., Nabavi, A., Macq, B., Black, P. M., Jolesz, F. A., Kikinis, R., and Warfield,
S. K. 2002. Serial registration of intraoperative MR images of the brain. Medical
Image Analysis; 6(4); 337-359.
Fleute, M., and Lavallee, S. 1999. Nonrigid 3-D/2-D registration of images using
statistical models. Medical Image Computing and Computer-Assisted
Intervention, Miccai'99, Proceedings; 1679; 138-147.
Freysinger, W., Truppe, M. J., Gunkel, A. R., Thumfart, W. F., Pongracz, F., and
Maierbaeuerl, J. 1997. Interactive telepresence and augmented reality in ENT
surgery: Interventional Video Tomography. Cvrmed-Mrcas'97; 1205; 817-820.
Fuchs, H., Livingston, M. A., Raskar, R., Colucci, D., Keller, K., State, A., Crawford, J.
R., Rademacher, P., Drake, S. H., and Meyer, A. A. 1998. Augmented reality
visualization for laparoscopic surgery. Medical Image Computing and Computer-
Assisted Intervention - Miccai'98; 1496; 934-943.
Fuchs, H., State, A., Pisano, E. D., Garrett, W. F., Hirota, G., Livingston, M., Whitton, M.
C., and Pizer, S. M. 1996. Towards performing ultrasound-guided needle
biopsies from within a head-mounted display. Visualization in Biomedical
Computing; 1131; 591-600.
Gallen, C. C., Bucholz, R., and Sobel, D. F. 1994. Intracranial Neurosurgery Guided by
Functional Imaging. Surgical Neurology; 42(6); 523-530.
Goldsby M.E., Pandya A.K., Maida J.C., Hancock L.H. 1994 Scripting Human Animations in a Virtual Environment. In: Proceedings of the Fourth International Symposium on Measurements and Control in Robotics: Topical Workshop on Virtual Reality, pp. 1434-150.
Gong J., Zamorano L., Li Q.H., Pandya A.K., Diaz F. 1999 Tradeoff Analysis of
Medical Image Registration Strategy.In: 49th Annual Meeting of the Congress of
Neurological Surgeons, Boston, Massachusetts, pp 519-520.
Grimson W.E.L., Ettinger G.J., White S.J., Lozano-Pérez T., Wells III W.M. , Kikinis R.
1996. An Automatic Registration Method for Frameless Stereotaxy, Image
Guided Surgery, and Enhanced Reality Visualization. IEEE Trans. Medical
Imaging; 15(2); 129--140.
H. Kato, M. Billinghurst, I. Poupyrev, K. Imamoto, K. Tachibana. 2000 Virtual Object
Manipulation on a Table-Top AR Environment.In: Proceedings of ISAR 2000.
Harders, M., and Szekely, G. 2003. Enhancing human-computer interaction in medical
segmentation. Proceedings of the Ieee; 91(9); 1430-1442.
Hattori A. , Suzuki N., Hashizume M., Akahoshi T. , Konishi K., Yamaguchi S. ,
Shimada M. , Hayashibe M. 2003 A Robotic Surgery System (da Vinci) with
Image Guided Function - System Architecture and Cholecystectomy
Application.In: Medicine Meets Virtual Reality 11, 110-116.
Heikkila, J. 2000. Geometric Camera Calibration Using Circular Control Points. IEEE
PAMI; 22(10).
Hinsche, A. F., and Smith, R. M. 2001. Image-guided surgery. Current Orthopaedics;
15(4); 296-303.
Holly, L. T., and Foley, K. T. 2003. Intraoperative spinal navigation. Spine; 28(15); S54-
S61.
Horkaew, P., and Yang, G. Z. 2003. Optimal deformable surface models for 3D medical
image analysis. Information Processing in Medical Imaging, Proceedings; 2732;
13-24.
Hounsfield, G. N. 1980. Computed Medical Imaging - Nobel Lecture, December 8,
1979. Journal of Computer Assisted Tomography; 4(5); 665-674.
Hoznek, A., Zaki, S. K., Samadi, D. B., Salomon, L., Lobontiu, A., Lang, P., and Abbou,
C. C. 2002. Robotic assisted kidney transplantation: An initial experience.
Journal of Urology; 167(4); 1604-1606.
Iseki, H., Masutani, Y., Iwahara, M., Tanikawa, T., Muragaki, Y., Taira, T., Dohi, T., and
Takakura, K. 1997. Volumegraph (Overlaid three-dimensional image-guided
navigation) - Clinical application of augmented reality in neurosurgery.
Stereotactic and Functional Neurosurgery; 68(1-4); 18-24.
Iseki, H., Muragaki, Y., Taira, T., Kawamata, T., Maruyama, T., Naemura, K., Nambu,
K., Sugiura, M., Hirai, N., Hori, T., and Takakura, K. 2001. New possibilities for
stereotaxis - Information-guided stereotaxis. Stereotactic and Functional
Neurosurgery; 76(3-4); 159-167.
Jannin P. , Fleig O. J., Seigneuret E. , Grova C., Morandi X. , Scarabin J.M. 2000. A
Data Fusion Environment for Multimodal and Multi-Informational Neuro-
Navigation. Journal of Computer Aided Surgery; 5(1); 11-17.
Kaplan, W. (1981). Advanced Mathematics for Engineers, Addison Wesley, Reading MA.
Kappert, U., Cichon, R., Schneider, J., Gulielmos, V., Ahmadzade, T., Nicolai, J.,
Tugtekin, S. M., and Schueler, S. 2001. Technique of closed chest coronary
artery surgery on the beating heart. European Journal of Cardio-Thoracic
Surgery; 20(4); 765-769.
Khamene A., Wacker F., Lewin J. 2003. An Augmented Reality System for MRI-Guided
Needle Biopsies.In: Medicine meets Virtual Reality 11, 151-157.
Knight, C., Cao, A., Lorincz, A., Gidell, K., Langenburg, S., and Klein, M. 2003a.
Application of a Surgical Robot to Open Microsurgery:. Pediatric Endosurgery &
Innovative Technique; 7(3); 227-232.
Knight, C., Cao, A., Lorincz , A., Gidell K., Langenburg, S., and Klein, M. 2003b.
Application of a Surgical Robot to Open Microsurgery: The Equipment. Pediatric
Endosurgery & Innovative Technique; 7(3); 227-232.
Komistek, R. D., Dennis, D. A., and Mahfouz, M. 2003. In vivo fluoroscopic analysis of
the normal human knee. Clinical Orthopaedics and Related Research (410); 69-
81.
Lee, C. C., Chung, P. C., and Tsai, H. M. 2003. Identifying multiple abdominal organs
from CT image series using a multimodule contextual neural network and spatial
fuzzy rules. Ieee Transactions on Information Technology in Biomedicine; 7(3);
208-217.
Lei, T. H., Udupa, J. K., Odhner, D., Nyul, L. G., and Saha, P. K. 2003. 3DVIEWNIX-
AVS: a software package for the separate visualization of arteries and veins in
CE-MRA images. Computerized Medical Imaging and Graphics; 27(5); 351-362.
Leventon, M. E. (2000). "Statistical Models in Medical Image Analysis," Ph.D.Thesis,
MIT, Boston.
Li Q., Zamorano L., Gong J., Pandya A.K., Diaz F. 2000 Application Accuracy of
Different Registration Methods in Frameless Computer-Assisted Surgery.In:
American Association of Neurological Surgeons Annual Meeting, San Francisco,
California, 125.
Li, Q., Zamorano L., Guthikonda M., Pandya A.K., Perez R., Diaz F. 2000 The
Application Accuracy of Intraoperative Registration and Related Factors in Image
Guided Neurosurgery.In: Congress of Neurological Surgeons Annual Meeting.
Li Q., Zamorano L., Pandya A., Gong J., Elkhatib E., Perez R., Diaz F. 2001 The Application
Accuracy of the NeuroMate Robot System.In: AANS Annual Meeting, Toronto,
Ontario, 92.
Li Q. , Zamorano L. , Jiang Z. , Gong J., Pandya A.K., Perez R., Diaz F. 1999a. Effect
of optical digitizer selection on the application accuracy of a surgical localization
system - a quantitative comparison between the OPTOTRAK and flashpoint
tracking systems. Computer Aided Surgery; 4; 322-327.
Li Q. , Zamorano L., Pandya A.K., Perez R. ,Gong J., Diaz F. 2002. The Application
Accuracy of the NeuroMate Robot - A Quantitative Comparison with Frameless
and Frame-based Surgical Localization Systems. Computer Aided Surgery; 7;
90-98.
Li Q., Zamorano L., Perez R., Gong J., Pandya A.K., Diaz F. 1999b Endoscopic
Transnasal, Transseptal, and Transsphenoidal Approach for Pituitary tumors
Guided by Infrared Tracking System.In: 49th Annual Meeting of the Congress of
Neurological Surgeons, Boston, Massachusetts, 325 - 326.
Livingston, M. A., and State, A. 1997. Magnetic tracker calibration for improved
augmented reality registration. Presence-Teleoperators and Virtual
Environments; 6(5); 532-546.
M. Li. "Camera Calibration of the KTH Head-Eye System." TRITR-NA-9407, Dept. of
Numerical Analysis and Computer Science.
M. Ahmed, A. Farag. 2000 A Neural Optimization Framework for Zoom Lens Camera
Calibration.In: IEEE CVPR.
Marchese M., Li Q., Zamorano L., Pandya A. 2003 Quantitative Comparison between the Heads-up-display (HUD) and Common Monitor in Endoscopic Surgery. In: The 71st Annual Meeting of The American Association of Neurological Surgeons, San Diego, California.
Marchese M., Pandya A., Mahmoud M., Higgins M., Li Q., Zamorano L. 2003
Quantitative Comparison between the Heads-up-display (HUD) and Common
Monitor in Endoscopic Surgery.In: Congress of Neurological Surgeons Annual
Meeting, Philadelphia.
Masutani, Y., Doshi, T., Yamane, F., Iseki, H., and Takakura, K. 1998. Augmented
reality based visualization system for intravascular neurosurgery,. Journal of
Computer Aided Surgery; 3(5); 239-47.
Maurer, C. R., Gaston, R. P., Hill, D. L. G., Gleeson, M. J., Taylor, M. G., Fenlon, M. R.,
Edwards, P. J., and Hawkes, D. J. 1999. AcouStick: A tracked A-mode
ultrasonography system for registration in image-guided surgery. Medical Image
Computing and Computer-Assisted Intervention, Miccai'99, Proceedings; 1679;
953-962.
Miller, M. I., Christensen, G. E., Amit, Y., and Grenander, U. 1993. Mathematical
Textbook of Deformable Neuroanatomies. Proceedings of the National Academy
of Sciences of the United States of America; 90(24); 11944-11948.
Nakao, N., Nakai, K., and Itakura, T. 2003. Updating of neuronavigation based on
images intraoperatively acquired with a mobile computerized tomographic
scanner: Technical note. Minimally Invasive Neurosurgery; 46(2); 117-120.
Nio, D., Bemelman, W. A., den Boer, K. T., Dunker, M. S., Gouma, D. J., and van Gulik,
T. M. 2002. Efficiency of manual vs robotical (Zeus) assisted laparoscopic
surgery in the performance of standardized tasks. Surgical Endoscopy and Other
Interventional Techniques; 16(3); 412-415.
Nowinski, W. L., Belov, D., and Benabid, A. L. 2003. An algorithm for rapid calculation
of a probabilistic functional atlas of subcortical structures from
electrophysiological data collected during functional neurosurgery procedures.
Neuroimage; 18(1); 143-155.
O. D. Faugeras, Luong, Q. T., and Maybank, S. J. 1992 Camera Self-Calibration: Theory and Experiments. In: Proc. of European Conf. on Computer Vision, Santa Margherita Ligure, 321-334.
Pandya A. K. , Zamorano L. (patent authors in alphabetical order). (2002). "Augmented
Tracking Using Video, Computer Data and/or Sensing Technologies." Application
No. 10/101421 Customer Number 26646, Wayne State University, USA.
Pandya A.K., Zamorano, L. (2001a). "The Development and Human Factors Analysis
of Advanced 3-D Visualization for Telepresence." Galveston, Tx.
Pandya A.K., Aldridge Ann, Goldsby M., Maida J. 1994 Analysis of Human Posture
using a Strength Model and a Virtual Environment.In: Houston Society for
Medicine and Engineering, Houston, Tx.
Pandya A.K., Li Q., Zamorano L., Perez-de la Torre R. 2000a The Application
Accuracy of the Neuromate Robot ---- A Quantitative Comparison with Frameless
Infrared and Frame Based Surgical Localization Systems.In: Computer Assisted
Orthopaedic Surgery (CAOS), Pittsburgh, Pennsylvania, 261.
Pandya A.K. , M. Siadat, G. Auner. Design, Implementation and Accuracy of a
Prototype for Medical Robotic Vision Augmentation. Computer Aided Surgery;
(Submitted).
Pandya A.K. , Siadat M., Li Q., Gong J., Zamorano L., Martinez J., Perez R., Maida
J.C. 2001b Does Heads-up Display Improve Neurosurgical Endoscopic
Procedures? In: Congress of Neurological Surgeons, San Diego, California.
Pandya A.K., Siadat M., Gong J., Li. Q, Zamorano, L , Maida J.C. 2001c Towards
Using Augmented Reality for Neurosurgery.In: Medicine Meets Virtual Reality 9:
Outer Space, Inner Space, Virtual Space, Newport Beach, CA.
Pandya A.K., Siadat M., Maida J., Auner G., Zamorano L. 2003a Robotic Vision
Registration and Live-Video Augmentation-- A Prototype for Medical and Space
Station Robots.In: Bioastronautics Investigators Workshop, Galveston, Texas,
27.
Pandya A.K. , Siadat M., Zamorano L. ,Gong J.,Li Q. , Maida J.C., Kakadiaris I. 2001d
Tracking Methods for Medical Augmented Reality.In: Medical Image Computing
and Computer-Assisted Intervention - MICCAI 2001 (Lecture Notes in Computer
Science), Utrecht, The Netherlands, 1406-1408.
Pandya A.K., Siadat M., Zamorano L., Gong J., Li Q. , Maida J.C., Kakadiaris I. 2001e
Augmented Robotics for Neurosurgery.In: American Association of Neurological
Surgeons, Toronto, Ontario.
Pandya A.K., Siadat M., Auner G., Kalash M., Ellis R.D. 2003b Development and Human Factors Analysis of Neuronavigation vs. Augmented Reality. In: Medicine Meets Virtual Reality, Newport Beach, CA.
Pandya A.K., Siadat M., Ye Z., Prasad M., Auner G., Zamorano L., Klein M. 2003c
Medical Robot Vision Augmentation--A Prototype.In: Medicine Meets Virtual
Reality, Newport Beach, California, 85.
Pandya A.K., Zamorano L., Siadat M., Gong J., Li Q., Maida J.C., Daryan L. 2002 The
Development and Human Factors Analysis of Advanced 3-D Visualization for
Telepresence-- NASA Grant Report Year 2.In: NASA's Space Human Factors
Workshop, Center for Advanced Space Studies, Houston , Tx.
Pandya A.K., Zamorano L., Li Q., Gong J., Grosky W., Maida J.C. 2000b Advanced Surgical Image Environments. In: Detroit Neurosurgery Conference, Detroit, MI.
Pandya A.K., Zamorano L., Siadat M. , Li Q. , Gong J., Maida J.C. 2001f Augmented
Robotics for Medical and Space Applications.In: Human Systems 2001, NASA
Johnson Space Center, Houston, Tx.
Partin, A. W., Adams, J. B., Moore, R. G., and Kavoussi, L. R. 1995. Complete robot-
assisted laparoscopic urologic surgery: a preliminary report. J Am Coll Surg;
181(6); 552-7.
Patel, V., Vannier, M., Marsh, J., and Lo, L. 1996. Assessing Craniofacial Surgical
Simulation. IEEE Computer Graphics and Applications; 46-54.
PM., T., Gee, A., Prager, R., and Berman, L. 2000. Body-centered visualisation for freehand 3D ultrasound. Ultrasound in Medicine and Biology; 26(4); 539–550.
Pransky, J. 2001. An intelligent operating room of the future - an interview with the
University of California Los Angeles Medical Center. Industrial Robot; 28(5); 376-
380.
Press W.H., Teukolsky S.A., Vetterling W.T. , Flannery. (1992). Numerical Recipes in
C++: The Art of Scientific Computing, Press Syndicate of the University of
Cambridge.
R. Atienza, A. Zelinsky. 2001 A Practical Zoom Camera Calibration Technique: An
Application of Active Vision for Human-Robot Interaction.In: Australian
Conference on Robotics and Automation, Sydney, Australia.
Raya M. A., Marcinek H. V., Saez J. M. M., Sanchez R. T., Lizandra M. C. J., Aranda, Gomez J. A. G. 2003 Mixed Reality for Neurosurgery: A Novel Prototype. In: Medicine Meets Virtual Reality 11, 11-15.
Riviere, C. N., Ang, W. T., and Khosla, P. K. 2003. Toward active tremor canceling in
handheld microsurgical instruments. Ieee Transactions on Robotics and
Automation; 19(5); 793-800.
Roberts, D. W., Lunn, K., Sun, H., Hartov, A., Miga, M., Kennedy, F., and Paulsen, K.
2001. Intra-operative image updating. Stereotactic and Functional Neurosurgery;
76(3-4); 148-150.
Robinett, W. 1992. Synthetic Experience: A Proposed Taxonomy. Presence; 1(2); 229-
247.
Rosenthal, M., State, A., Lee, J., Hirota, G., Ackerman, J., Keller, K., Pisano, E. D.,
Jiroutek, M., Muller, K., and Fuchs, H. 2002. Augmented reality guidance for
needle biopsies: An initial randomized, controlled trial in phantoms. Medical
Image Analysis; 6(3); 313-320.
Samset, E., and Hirschberg, H. 2003. Image-guided stereotaxy in the interventional
MRI. Minimally Invasive Neurosurgery; 46(1); 5-10.
Satava, R. M. 1999. Emerging technologies for surgery in the 21st century. Archives of
Surgery; 134(11); 1197-1202.
Sato, Y., Nakamoto, M., Tamaki, Y., Sasama, T., Sakita, I., Nakajima, Y., Monden, M.,
and Tamura, S. 1998. Image guidance of breast cancer surgery using 3-D
ultrasound images and augmented reality visualization. Ieee Transactions on
Medical Imaging; 17(5); 681-693.
Siadat M., Pandya A.K., Zamorano L., Li Q., Gong J., Maida J. 2002 Camera
Calibration for Neurosurgery Augmented Reality.In: World Multiconference on
Systemics, Cybernetics and Informatics, Orlando Florida, July 14-18.
Sturm, P. 2002. Critical Motion Sequences for the Self-Calibration of Cameras and
Stereo Systems with Variable Focal Length. Image and Vision Computing; 20;
415-426.
Taylor, R., Jensen, P., Whitcomb, L., Barnes, A., Kumar, R., Stoianovici, D., Gupta, P.,
Wang, Z. X., deJuan, E., and Kavoussi, L. 1999. A steady-hand robotic system
for microsurgical augmentation. International Journal of Robotics Research;
18(12); 1201-1210.
Taylor, R. H., Dario, P., and Troccaz, J. 2003. Special issue on medical robotics. Ieee
Transactions on Robotics and Automation; 19(5); 763-764.
Taylor, R. H., and Stoianovici, D. 2003. Medical robotics in computer-integrated
surgery. Ieee Transactions on Robotics and Automation; 19(5); 765-781.
Taylor-Adams, S., Vincent, C., and Stanhope, N. 1999. Applying human factors
methods to the investigation and analysis of clinical adverse events. Safety
Science; 31(2); 143-159.
Terazzi, A., Giordano, A., and Minuco, G. 1998. How can usability measurement affect
the re-engineering process of clinical software procedures? International Journal
of Medical Informatics; 52(1-3); 229-234.
Tewari, A., Peabody, J., Sarle, R., Balakrishnan, G., Hemal, A., Shrivastava, A., and
Menon, M. 2002. Technique of da Vinci robot-assisted anatomic radical
prostatectomy. Urology; 60(4); 569-572.
Thompson, J. M., Ottensmeyer, M. P., and Sheridan, T. B. 1998. Human factors in tele-
inspection and tele-surgery: Cooperative manipulation under asynchronous video
and control feedback. Medical Image Computing and Computer-Assisted
Intervention - Miccai'98; 1496; 368-376.
Tsai, A., Wells, W., Tempany, C., Grimson, E., and Willsky, A. 2003. Coupled multi-
shape model and mutual information for medical image segmentation.
Information Processing in Medical Imaging, Proceedings; 2732; 185-197.
Tsai, R. 1987. A Versatile Camera Calibration Technique for High-Accuracy 3D
Machine Vision Metrology Using Off-The-Shelf TV Cameras and Lenses. IEEE J.
of Robotics and Automation; RA-3(4).
Van Loan, F. (2000). Introduction to Scientific Computing: A Matrix-Vector Approach Using MATLAB, Prentice-Hall, Upper Saddle River, NJ.
Vayssiere, N., Hemm, S., Cif, L., Picot, M. C., Diakonova, N., El Fertit, H., Frerebeau,
P., and Coubes, P. 2002. Comparison of atlas- and magnetic resonance
imaging-based stereotactic targeting of the globus pallidus internus in the
performance of deep brain stimulation for treatment of dystonia. Journal of
Neurosurgery; 96(4); 673-679.
Wadley, J. P., and Thomas, D. G. T. 2000. Neuronavigation: Accuracy, benefits, and
pitfalls. Neurosurgery Quarterly; 10(4); 276-310.
Wagner, A., Ploder, O., Enislidis, G., Truppe, M., and Ewers, R. 1995. Virtual Image-
Guided Navigation in Tumor Surgery - Technical Innovation. Journal of Cranio-
Maxillo-Facial Surgery; 23(5); 271-273.
Walsh, T., and Beatty, P. C. W. 2002. Human factors error and patient monitoring.
Physiological Measurement; 23(3); R111-R132.
Wang L., Tsai W. 1991. Camera Calibration by Vanishing Lines for 3-D Computer
Vision. IEEE PAMI; 13(4).
Watt, R. J. 1985. Image Segmentation at Contour Intersections in Human Focal Vision.
Journal of the Optical Society of America a-Optics Image Science and Vision;
2(7); 1200-1204.
Weese, J., Buzug, T. M., Penney, G. P., and Desmedt, P. 1998. 2D/3D registration and
motion tracking for surgical interventions. Philips Journal of Research; 51(2);
299-316.
Weinger, M. B., Pantiskas, C., Wiklund, M. E., and Carstensen, P. 1998. Incorporating
human factors into the design of medical devices. Jama-Journal of the American
Medical Association; 280(17); 1484-1484.
Weng J., Cohen P., Herniou M. 1992. Camera Calibration with Distortion Models and
Accuracy Evaluation. IEEE PAMI; 14,(10).
Zamorano, L., Dujovny, M., Malik, G., Mehta, B., and Yakar, D. 1987a. Factors Affecting
Measurements in Computed-Tomography-Guided Stereotactic Procedures.
Applied Neurophysiology; 50(1-6); 53-56.
Zamorano, L., Dujovny, M., Malik, G., Yakar, D., and Mehta, B. 1987b. Multiplanar Ct-
Guided Stereotaxis and I125 Interstitial Radiotherapy - Image-Guided Tumor
Volume Assessment, Planning, Dosimetric Calculations, Stereotactic Biopsy and
Implantation of Removable Catheters. Applied Neurophysiology; 50(1-6); 281-
286.
Zamorano L., Li Q., Pandya A.K.,Gong J., Diaz F. 2001 Interactive Image-Guided
Neurosurgery with Stryker Wireless Navigation System.In: CNS, San Diego,
California.
ABSTRACT
MEDICAL AUGMENTED REALITY SYSTEM FOR IMAGE-GUIDED AND ROBOTIC SURGERY :
DEVELOPMENT AND SURGEON FACTORS ANALYSIS
by
ABHILASH PANDYA
May 2004
Advisor: Dr. Gregory Auner
Major: Biomedical Engineering (Scientific Computing)
Degree: Doctor of Philosophy
This research is focused on the development and surgeon factors analysis of advanced visualization technology for the operating room. The hypothesis of this work is that applying advanced technology for the visualization of real-time medical data will enhance the performance, comfort, and insight of the surgeon and, in turn, reduce patient morbidity and mortality.
In the first study, we use a passive robot arm to track a calibrated, end-effector-mounted video camera. In real time, we superimpose the live video view with the synchronized graphical view of CT-derived segmented object(s) of interest within a phantom skull (Augmented Reality (AR)). Using the same arm, we have also developed an Image Guided Surgery system (IGS) (Virtual Reality) able to show a tracked tool's trajectory on orthogonal image data scans and 3D models. Both systems are designed with a client/server architecture for potential use in telepresence. A human factors study was conducted using 21 subjects (3 surgeons) to determine whether differences in time, errors, and level of awareness of the patient's 3D anatomy existed between the two systems. This study indicated that IGS took a statistically significantly longer time than did AR. In addition (although on the border of statistical significance, p = 0.068), IGS did have on average a greater number of errors, indicating gaps in awareness of the phantom's anatomy.
In a second study, a comparison of display hardware for the video stream viewed from the remote surgical site was conducted. The main question was: does visualization of the remote video at the surgical site on a head-up display improve the performance of the test subject over viewing a monitor? In this study we concluded (using 22 subjects) that the use of a head-up display, compared with a 45° angled monitor, positively influences the performance of the surgeon.
We believe, and have shown via subject testing, that Augmented Reality generation is a natural extension for the surgeon because it both performs the 2D-to-3D transformation and projects the views directly onto the patient view. We conjecture that medical robotic devices of the future should be able to use this technology to directly link these systems to patient data and provide the optimal visualization of that data for the surgical team. The design and methods of the AR prototype device can, we believe, be extrapolated to current medical robotics and IGS systems. There are distinct advantages and disadvantages to both AR and IGS systems; hence, as future work we propose a hybrid, on-demand AR/VR system for use in Robotic and Image Guided Surgery.
AUTOBIOGRAPHICAL STATEMENT

Abhilash Pandya was born in Kenya, East Africa on July 16th, 1965 and came to Michigan in 1972. He is married and has two daughters (Maya, 5, and Keena, 3). His research interest has always been in the utilization of Computation and Engineering principles to study and impact Science and Medicine (Bioengineering).

He completed his Master's degree in 1988 in Bioengineering (with a concentration in Computer Science) at the University of Michigan at Ann Arbor, where his research was in modeling and simulation of the signal processing (from sound waves to action potentials) of the inner ear. For this work he worked at the "Bionic Ear" (ear implants) laboratory at UM-Ann Arbor. He also has a certification in Scientific Computing from Wayne State University. His undergraduate education was a combination of Biochemistry with a concentration in Computer Science from the University of Michigan (attending both the Ann Arbor and Dearborn campuses), where his research and education focused on graphical modeling and simulation of biochemical reactions.

In 1987 he was a key original member of a small (15-person) start-up biotechnology company (Virogen Inc.) in which he developed a computer genomic model of the AIDS virus and also built a graphical simulator for robotic processing of AIDS samples. From 1988 to 1998 (10 years), he worked at NASA Johnson Space Center under various (Lockheed) contracts for NASA's Flight Crew Support Division. His major projects included building software for a laser-based 3D scanning system for scanning astronauts for 3D modeling, developing software for a Space Station robotics hand-controller commonality study, and computer-graphics-based modeling of the kinematics, dynamics and strength of space-suited and unsuited astronauts. He was also the primary member of a team of 3 who developed the software for a fully immersive, human-model-based Virtual Reality system for Space Station applications.

From 1998 to 2002, he worked at the Neurosurgery Department (Harper Hospital) at Wayne State University, where he developed and supported Image Guided Surgery software for use in the operating room and led a team of engineers in research on Robotic and Image Guided Neurosurgery and Augmented Reality. He has been working at the Smart Sensors and Integrated Microsystems Laboratory (SSIM) in the Electrical and Computer Engineering Department at Wayne State University, leading a group of engineers doing research on sensor fusion, advanced visualization and interface optimization for Robotic Assisted Surgery.

He has been involved in the preparation and execution of 6 NASA grants. He is the Principal Investigator on a recent NASA grant (starting March 2004) in which Augmented Reality technology will be developed for Space Station robotics. He has over 60 publications (including conference papers, invited talks (2), journals (5) and NASA Technical Reports (3)) in the fields of Virtual Reality, Augmented Reality, Robotics, Human/Space Suit Modeling and Human Factors. He has also filed for a patent on AR techniques related to robotics.