MEDICAL AUGMENTED REALITY SYSTEM FOR IMAGE-GUIDED AND ROBOTIC SURGERY :
DEVELOPMENT AND SURGEON FACTORS ANALYSIS
by
ABHILASH PANDYA
DISSERTATION
Submitted to the Graduate School
of Wayne State University,
Detroit, Michigan
in partial fulfillment of the requirements
for the degree of
DOCTOR OF PHILOSOPHY
2004
MAJOR: Biomedical Engineering (Scientific Computing)
Approved by:
_____________________________
Advisor Date
_____________________________
_____________________________
_____________________________
_____________________________
DEDICATION
To my parents
Dad has shown me that a smile and positive attitude go a long way.
Mom is pure love.
ACKNOWLEDGEMENTS
I would like to start by first acknowledging my closest friend and best
supporter – my wife Alka. Alka has been very patient from start to finish as I was
completing this thesis. I know it has not been easy trying to raise two little ones
while also going to school herself. Thanks Alka for all the love and support.
I would like to thank my advisor, Professor Greg Auner. He has been
very supportive of my work and has always provided me the encouragement to
develop my career with independence. He has pushed me to write proposals
as the PI and, through both direct conversation and his own example, has really
taken me to the next level. Thanks Greg for your confidence and support. I
would like to especially thank Professor Darin Ellis. He has been full of energy
and has helped me tremendously with setting up, conducting and writing up
the Human Factors Studies. I think that we will be involved in many more
studies in the future. Thank you very much Dr. Klein for the detailed
comments on the thesis. Your pragmatic and insightful comments have kept
me well-grounded in “reality”. Dr. Robert Erlandson, thanks for the reviews
and comments on my thesis. Jim Maida, I must thank not just for the software
(mathlib.c) help and consultation on this thesis, but for much, much more.
Jim, over the 14 years that I have known you (since the days in 1989 at the
space center in Houston), I have learned so much from you. Thanks for being
a fantastic NASA Technical monitor, a great friend/mentor and the best PUNdit
around!
There are several colleagues that I must recognize. Mohammad Siadat
has been my friend, colleague and teacher. I have learned a lot from him and
he has contributed immensely to this work. I cannot thank you enough. Dr.
Qinghang Li, I also wish to thank you for all the hard work we did together on
the Neuromate Robot. You have been a good friend and tremendously
supportive as I was learning the trade of how to be a Neurosurgical Engineer
in the OR. Dr. Gong was my first teacher in the Neurosurgery department and
he shared his knowledge. He has also helped me immensely with software
issues and other insights for this work. I would also like to thank Dr. Lucia
Zamorano for hiring me and introducing me to the field of Stereotactic
Neurosurgery. I feel that I have grown intellectually as well as emotionally
while working at the Neurosurgery department. Dr. Enrico Marchese, I would
like to thank you for the help in the head-up display vs. monitor study. I still
think that we should have stated that we used your children’s operation game
in our paper instead of a “metal plate that registered the touch of the
endoscope”. I would also like to thank our summer co-ops and GRAs Mark
Hanna, Elizabeth Klein and Mohammad Kalash for help on this project.
Lastly, but never least, I would like to thank my family. My uncle Piyush
has been very supportive of my education from when I was very young, as has my
grandfather, who taught me Algebra and Geometry. To my Mom and Dad,
words cannot express my gratitude, and to Keena (what is this universe?) and
Maya (illusion) for providing me with joyous and comic relief when I needed it
the most.
TABLE OF CONTENTS
DEDICATION..............................................................................................................................................II
ACKNOWLEDGEMENTS....................................................................................................................... III
LIST OF FIGURES ................................................................................................................................ VIII
CHAPTER 1: INTRODUCTION AND MOTIVATION.......................................................................1
1.1 MOTIVATION AND PROBLEM STATEMENT.....................................................................................5
1.2 RESEARCH OBJECTIVE AND SPECIFIC AIMS. .................................................................................7
1.3 OUTLINE OF THE THESIS ...............................................................................................................8
CHAPTER 2: BACKGROUND.............................................................................................................10
2.1 IMAGE GUIDED SURGERY -- CURRENT TECHNOLOGY IN THE OPERATING ROOM. .......................11
2.2 WHAT’S THE DIFFERENCE BETWEEN AUGMENTED REALITY AND VIRTUAL REALITY? ...............12
2.3 WHY IS VISUALIZATION IMPORTANT FOR MEDICAL ROBOTICS? .................................................15
2.4 MEDICAL IMAGING, SEGMENTATION AND 3D MODEL CREATION...............................................21
2.4.1 Medical Imaging Data...........................................................................................................22
2.4.2 Methods of Segmentation.......................................................................................................26
2.4.3 From Segmentation to 3D Model Creation............................................................30
2.5 WHAT IS THE IMPORTANCE OF HUMAN FACTORS IN MEDICINE AND ENGINEERING? ..................39
CHAPTER 3: IMAGE GUIDED SURGERY (IGS) ............................................................................41
3.1 LITERATURE REVIEW AND DESCRIPTION OF SYSTEM..................................................................41
3.2 IMPLEMENTATION OF IMAGE GUIDANCE SYSTEM.......................................................................44
3.2.1 Passive Robot Arm Used as the Tracking system ..................................................................44
3.2.2 Patient Registration with Fiducial Mapping .........................................................................48
3.2.3 Software Architecture............................................................................................................57
3.3 DISCUSSION ................................................................................................................................60
CHAPTER 4: MEDICAL AUGMENTED REALITY SYSTEM (MARS) ........................................62
4.1 LITERATURE REVIEW AND DESCRIPTION OF SYSTEM..................................................................62
4.1.1 Medical Augmented Reality...................................................................................................66
4.1.2 Research in Camera Calibration...........................................................................................72
4.1.3 AR in Telepresence ................................................................................................................72
4.1.4 Live View with Registered Data ............................................................................................73
4.2 IMPLEMENTATION OF AUGMENTED REALITY...............................................................74
4.2.1 Coordinate Systems ...............................................................................................................75
4.2.2 Robotic-based Tracking of the Camera.................................................................................76
4.2.3 Computing the Pose of the Camera Relative to the End-Effector (TEE-C) ..............................78
4.2.4 Camera Calibration Used to Determine TP-C ........................................................................81
4.2.5 How to measure AR accuracy ...............................................................................................86
4.2.6 Software Architecture............................................................................................................90
4.3 AR SYSTEM ACCURACY.......................................................................................................91
4.3.1 Accuracy of the Microscribe..................................................................................................92
4.3.2 Accuracy of Camera Calibration...........................................................................................93
4.3.3 Total Application Error Dependencies..................................................................................94
4.4 DISCUSSION ............................................................................................................................97
CHAPTER 5: SURGEON FACTOR TESTING ................................................................................101
5.1 IMAGE GUIDED SURGERY VS. AUGMENTED REALITY—THE HUMAN FACTORS........................102
5.1.1 Introduction/Motivation ........................................................................................103
5.1.2 Method.................................................................................................................................104
5.1.3 Results .................................................................................................................................108
5.1.4 Conclusions .........................................................................................................................112
5.2 HUMAN FACTORS TESTING (HEADS-UP DISPLAY VS. MONITORS).........................114
5.2.1 Introduction/Motivation ......................................................................................................114
5.2.2 Method.................................................................................................................................117
5.2.3 Results .................................................................................................................................119
5.2.4 Conclusions .........................................................................................................................122
CHAPTER 6: SUMMARY, CONTRIBUTIONS AND FUTURE WORK ......................................124
6.1 SUMMARY/CONTRIBUTIONS......................................................................................................124
6.2 FUTURE WORK..........................................................................................................................127
6.2.1 Stereo Augmentation ...........................................................................................................128
6.2.2 Sensor Technology/Data at the End Effectors of Robots.....................................................128
6.2.3 Continuous Zoom Camera Calibration ...............................................................................129
6.2.4 3D Ultrasound for AR .........................................................................................................130
6.2.5 Space Station Robotics (to infinity and beyond)..................................................................130
APPENDIX ................................................................................................................................................132
BIBLIOGRAPHY .....................................................................................................................................136
ABSTRACT ...............................................................................................................................................154
AUTOBIOGRAPHICAL STATEMENT................................................................................................156
LIST OF FIGURES
Figure 1-1: The surgeon uses tools (shown on the right), and needs to visualize that data
using advanced display technology (as shown on the left). The highlighted portions of
the figure are the focus of this thesis. ................................................................................. 4
Figure 1-2: (a) is typical data displayed during Image Guided Surgery. It represents a
Virtual Environment. Notice that the vessel is represented by two dots in the axial
slice and the cube is represented by a triangle in the coronal slice. (b) is 3D
geometry data (models) registered and displayed on a live video view of the same
phantom viewed with the AR prototype system developed in this thesis. This represents
an Augmented Reality view. Note the difference between AR and VR and the associated
differences in visualization. ................................................................................................ 6
Figure 2-1: The foundation technologies behind both Augmented Reality and Image
Guidance which will be discussed in this chapter. At the heart of this thesis is a Human
Factors Analysis that compares Augmented Reality and Image Guidance technology. .. 10
Figure 2-2: Two different forms of "Virtual Reality": On the left, a tracked tool’s pose is
displayed in a 3D image and orthogonal slices of a CT scan of the phantom brain. This
system was a technology developed as part of this thesis. On the right, the tracked user of
the VR system becomes a model of a space-suited astronaut performing tasks on a 3D
model of the space station. Note that in both systems, the user is viewing a virtual world.
........................................................................................................................................... 13
Figure 2-3: This is an example of an Augmented Reality Scene. The live view of the
phantom is augmented with virtual objects derived from CT scans of the phantom brain.
This augmentation overlays the actual objects with 3D wireframe models of the actual
objects. This technology (also developed as part of this thesis) uses the same tracking
device as the image guided system (See Figure 2-2a) but uses the end-effector mounted
camera to generate the AR scene. ..................................................................................... 16
Figure 2-4: Surgical site (right) and the remote surgeon site (left) for the master/slave
Zeus Robot (Computer Motion inc.). The surgeon is using hand controllers and voice
recognition to control the three arms of the laparoscopic instrumentation. He relies on
raw video images from the endoscopic camera to perform his operation. It is the premise
of this thesis that advanced forms of visualization may increase the surgeon’s
performance. Picture with permission from Computer Motions Inc. .............................. 16
Figure 2-5: Neurosurgical Robotic Device (Neuromate, Integrated Surgical Systems Inc.).
Here the robotic device holds a tool for the surgeon at a very precise position and
orientation, and the surgeon can then perform a biopsy of the patient’s tumor. The
biopsy needle can be tracked on an image guided system (shown on the right) to allow
the surgeon to know when the target is achieved and that no other important structures
(like vessels) are in the way. Augmentation techniques could easily be added to such a
system. .............................................................................................................................. 20
Figure 2-6: The process of segmentation and model generation: (a) A plastic phantom
skull with simple objects fixed inside the skull was scanned with a CT scanner at 2mm
slices. (b). This is one coronal slice of the raw CT data. (c) This is a label-map of that
one slice overlaid on the raw CT image. (d) This is the 3D model generated using the
Marching Cubes algorithm after processing all the CT slice data. It shows a transparent
skin model through which the internal front view of the phantom is visible. .................. 24
Figure 2-7: 3D models generated from CT scans of our phantom skull. Each patch of the
3D model contains a corresponding file item which specifies its vertex list and edge list.
........................................................................................................................................... 32
Figure 2-8: Data Flow for an AR Scene Synthesis. .......................................................... 35
Figure 2-9: Fiber Optics technology that was studied as a possible candidate for AR/IGS
tracking. ............................................................................................................................ 38
Figure 2-10: Successful technology development for medicine must include extensive
user testing and surgical feedback. ................................................................................... 40
Figure 3-1: The three planes of the image data set (Axial, Coronal, and Sagittal). ............ 43
Figure 3-2: In this experiment, we captured the same point in space using different arm
configurations. We found that there is a 0.91 mm standard deviation. The red line
represents the average. ...................................................................................................... 47
Figure 3-3: This is the transformation structure of a typical robotic device. The green
arrows show the degrees of freedom of the arm. The blue labels (with red arrows) show
the transformations needed to compute the Base to End-effector Transform. ................. 47
Figure 3-4: The transformation between the patient coordinate system and the image
coordinate system can be represented by a translation vector (T) and a Rotation vector
(R). These entities form the homogeneous 4x4 transformation matrix. .............................
Figure 3-5: Fiducial markers (visible in CT scans) are applied to the phantom before the
scan. .................................................................................................................................
Figure 3-6: A Fiducial marker in the imaging space is located on all three slices of the CT
scan and is displayed (in blue) on the 3D model. ............................................................. 52
Figure 3-7: On the left, the fiducial marker on the actual skull is being digitized by the
robotic articulated arm. This point corresponds to the one shown on the 3D model (right).
........................................................................................................................................... 53
Figure 3-8: Parameter estimation for pair-point matching algorithm. The paired points
(fiducials in image coordinates and patient coordinates) are used to derive a rigid body
transformation that will translate a vector in patient space to a vector in image/3D model
space.................................................................................................................................. 54
Figure 3-9: This is the model for coordinate transform estimation. The points at the input
of the model are converted to a new coordinate system using the given estimated
parameters. ........................................................................................................................ 56
Figure 3-10: A surgeon is using an image-guided system where the tool that she is using
is tracked by an infrared camera system and displayed on the orthogonal slices of the
preoperative MRI scan...................................................................................................... 57
Figure 3-11: The image guidance system that is used in the OR was re-implemented in this
thesis to use the same articulated-arm tracker as the AR system, as a testbed for evaluation
of this current technology against the up-coming AR technology........................
Figure 3-12: Software implementation of the Image Guidance System. The system is
implemented as a client-server system in which multiple clients anywhere on the internet
can view the scene as seen by the main client. ................................................................. 59
Figure 3-13: This is the Tracker Server interface. This software does the pair-point
matching on the image data and also handles communication of various tracking
information to multiple clients on the network................................................................. 61
Figure 4-1: An AR scene can be generated by the alignment of the camera’s trajectory
and the 3D graphics virtual camera’s trajectory. Once the two cameras are aligned, the
actual objects will match their 3D modeled replicas. ....................................................... 64
Figure 4-2: The precursor of Augmented Reality/Augmented Robotics. We use the
Microscribe as the tracked tool (a). The position and orientation of the end-effector are
shown on the orthogonal slices and 3D model of the phantom skull. After adding a
calibrated and registered camera (b), we can generate a monoscopic Robotic
Augmentation Scene (c)....................................................
Figure 4-3: These are the steps needed to generate both a Neuro Navigation (NN) System
and an Augmented Reality System. Note that AR represents an extension to
Neuronavigation and can be performed simultaneously with NN.................................... 75
Figure 4-4: A series of coordinate systems for the AR development ............................... 76
Figure 4-5: The transformations needed to compute an AR scene. The main transform
(TEE-C, the transform from the end-effector to the camera coordinate system) is the
primary transformation matrix that is computed for AR. ..................................................
Figure 4-6 Camera Calibration Model. Objects in the World Coordinate System need to
be transformed using two sets of parameters - extrinsic and intrinsic parameters. .......... 82
Figure 4-7: Camera Parameter Estimation. An initial guess of the extrinsic parameters
comes from the DLT method. The observed CCD array points and the corresponding
computed values are compared to determine if they are within a certain tolerance. If so,
the iteration ends ............................................................................................................... 86
Figure 4-9: A cube is augmented on the live video from the Microscribe. Three
orthogonal views are used to compute the error: (A) represents a closeup view of the
pointer (the known location of the cube corner) with the video camera on the x axis, (B)
the pointer here is viewed from the y axis and (C) the pointer viewed from the z axis.
(D) represents an oblique view of the scene which shows the pointer, the camera and the
video view on the monitor showing the cube superimposed. ........................................... 89
Figure 4-10: Software implementation of the AR system with the Image Guidance
System. Here, the Kinematic server supplies position orientation to both the AR and IGS
clients. Each client has their own models of display preloaded. With this software
architecture, simultaneous AR and IGS are possible........................................................ 91
Figure 4-11 The errors of the distorted image. The contours represent error boundaries.
(left) Note that for radial distortion at the center there are less than 5 pixels of error and at
the corners the errors exceed 25 pixels of error. The Tangential distortion is an order of
magnitude less than the radial distortion. ......................................................................... 94
Figure 4-12 Errors contributing to the total error involved in Augmented Reality .......... 97
Figure 5-1: Computer Motion (Zeus Robot) provides either a Heads-up Display view of
the surgical site or a monitor view. The Davinci robot provides an immersive
stereoscopic view of the remote video. Which configuration provides the best
performance?................................................................................................................... 102
Figure 5-2: This figure shows a screen shot of a subject looking at the live video view of the
skull overlaid with the 3D graphics objects on the monitor. Their marker is then placed
on the edges of the overlaid object and the object is traced on the surface of the draped
skull................................................................................................................................. 107
Figure 5-3: This figure shows screen shots of a subject looking at the orthogonal slices in
an image guidance system to find the extents and shape of the object for which the skull
opening has to be made................................................................................................... 107
Figure 5-4: Errors made by the subjects during the testing period. ................................ 109
Figure 5-5: The time required for each subject to complete the craniotomy task and
answer questions. .......................................................
Figure 5-6: Different hardware methods to display a remote camera view.................... 116
Figure 5-7: The phantom skull with a black track of velcro and several blue plastic
pieces of tubing that the subject was asked to remove through the opening in the foam.
......................................................................................................................................... 118
Figure 5-8: A subject performing the experiment using the monitor and the heads-up
display. ............................................................................................................................ 118
Figure 5-9: Time for testing using the HUD subtracted from time for testing using the
Monitor. .......................................................................................................................... 119
Figure 5-10: Average time to perform the same task for 15 subjects for the Heads Up
Display, a monitor placed directly in front of the subject and a monitor placed at a 45
degree angle from the subject's task. ...............................................................................
Figure 5-11: Average number of errors to perform the same task for 15 subjects for the
Heads Up Display, a monitor placed directly in front of the subject and a monitor placed
at a 45 degree angle from the subject's task..................................................................... 121
Figure 5-12: This is the t-statistic analysis that was done on the data. Note that the two
monitor conditions 0 and 45 degrees have a very high p value indicating that there is no
statistical significance. When comparing the 45 degree monitor with the HUD, there is
statistical significance at alpha = 0.05, and the monitor at 0 degrees is on the borderline of
significance. Note the break in the y axis to show the difference in magnitude............. 121
Chapter 1: INTRODUCTION AND MOTIVATION
Knowing something, seeing something, enquiring into something in different
ways from different angles is insight. - The Buddha
In medicine, Intelligence Amplification (IA) (Azuma R. 1997) describes the use of
computers and other associated technology to gain insight about the state of the patient
and to make tasks easier to perform (Ayache 1995). Engineering has many tools
(Pransky 2001) that are proving to be extremely useful throughout the medical field.
Medical robotics (Burkart A. and Fu FH 2001;Cleary K. 2001) is beginning to
demonstrate tremendous potential for surgeons to improve their performance. One
important application of robotics is to augment the surgeon’s motor performance
(Riviere et al. 2003), especially in performing small delicate tasks, by tremor filtration
and motion scaling. The current clinical application is in minimally invasive surgery
(MIS), also known as laparoscopic, thoracoscopic, or endoscopic surgery. Minimally
invasive surgery in many cases results in less tissue trauma, less scarring, less pain,
and a quicker return to normal activities for patients (Taylor et al. 1999;Taylor et al.
2003). The surgical robots give the surgeon a wrist at the end of the MIS
instrument, which has no wrist when used without the robot (Taylor and Stoianovici 2003).
However, MIS poses its own problem: the surgeries become more difficult to perform. In MIS the
magnification and therefore the size of the field of view changes with the proximity of
the scope to the objects being viewed (Burkart A. and Fu FH 2001). Because of the
small incisions and camera view, the surgeon is no longer able to see inside the patient
directly. Visualization is critical for these systems that use a robotic interface as the
surgeon typically operates from a remote location and relies almost entirely on indirect
limited field-of-view video of the surgery (Ayache 1995;Burkart A. and Fu FH
2001;Cleary K. 2001).
During complex operations, a surgeon must maintain a precise sense of three-
dimensional anatomical relationships (Bucholz et al. 2001). It is of great importance to
see what is usually hidden. Hence, imaging is especially critical in medicine. Surgeons
can now "see" on a 3Dimentional image where their tracked surgical tools are with
respect to the lesion responsible for the patient's problems. A relatively new field, image
guided surgery, blends the use of computer-based medical imaging data with real-time
instrument position data capture to assist the surgeons in localizing and removing
lesions. This technology is now starting to be used in several branches of surgery (e.g.
Neurosurgery, Spinal surgeries (Holly and Foley 2003), Orthopedic surgery (DiGioia
1998), Dental surgery, and even some examples of general surgery (Cash et al. 2003)).
Computer based medical imaging has revolutionized surgery. Relatively new imaging
modalities (to be discussed later) help pinpoint specific structures and substructures. Newer
imaging modalities such as ultrasound, Computed Tomography (CT) scan, Magnetic
Resonance Imaging (MRI), and Positron Emission Tomography (PET) scan go even
further in demonstrating specific structures such as nerves and blood vessels (Roberts
et al. 2001). In the case of functional MRI (fMRI) even variable function within a single
structure can often be mapped. Technology that integrates and brings this imaging and
sensor information to the surgeon in real-time as she performs procedures will add new
dimensions to what can be done to diagnose and treat patients (Nakao et al.
2003;Samset and Hirschberg 2003).
There are two different types of visualization technology that are being analyzed
here for the medical domain: Augmented Reality (AR) and Virtual Reality (VR) (A.
Pandya 2003;Pandya A.K. ;Pandya A.K. 2000a;Pandya A.K. 2001a;Pandya A.K. 2002).
Image guidance is an example of Virtual Reality. In Image Guidance Surgery (IGS), the
surgeon views a computer-generated world of image data and 3D models. In contrast,
the AR system generates a composite view for the user that includes the live view fused
(registered) with either pre-computed data (e.g. 3D geometry) or other registered
sensed data (Pandya A. K. 2002). Augmented Reality is a variation and extension of
Virtual Reality and represents a middle ground between computer graphics in a
completely synthetically generated world (as in VR) and the real world (Pandya A.K.
2001f;Pandya A.K. 2001e;Pandya A.K. 2001c;Pandya A.K. 2003c;Pandya A.K. 2003a).
The current technique of image guidance does not allow the surgeon to use both real
and synthetic data simultaneously (Azuma R. 1997). The surgeon can detect
anomalies using advanced imaging and sensors and can accurately place their tools
within the surgical environment using robots. Nevertheless, the surgeon also needs his own
vision to detect other features that may not be available from the sensor information.
This, we conjecture, is one of the advantages of AR.
The overall goal of this research is to improve the 3-dimensional
visualization aspects of Robotics-based operations and Image Guided Surgery. The
highlighted portions of Figure 1-1 represent the areas of focus for this research. The
modern operating room is beginning to fill with visualization tools that enhance the
surgeon’s understanding of the patient’s medical situation (Pransky 2001). These tools
include image-guided systems, ultrasound, open Magnetic Resonance Imaging (MRI)
systems, fluoroscopes, microscopes, and endoscopes, all aimed to help the surgeon
see with greater clarity and from different points of view the problem that the patient is
facing. These tools have associated visual information that needs to be integrated and
optimally visualized by the surgeon (See Figure 1-1).
Figure 1-1: The surgeon uses tools (shown on the right), and needs to visualize that data using advanced display technology (as shown on the left). The highlighted portions of the figure are the focus of this thesis.
Visualization of multimodal sensor information can assist the surgeon in
synthesizing and integrating relationships that are not readily formulated. Hence, the goal
of this work is to create new visualization technology that integrates with the imaging,
sensing and robotics systems that the surgeon uses, and compare it with existing
technology. We use the tools of Human Factors Analysis to gauge whether the new
technology improves the surgeon’s performance by helping him see and understand the
reality of the patient’s condition.
1.1 Motivation and Problem Statement
One of the main new technologies that is developed and tested here is
Augmented Reality (AR). It blends the real world with 3D models generated from
medical scans or other data. AR technology fuses both the real view and the sensor
view to provide the surgeon both types of information. The virtual objects display
information that the user of the system may not be able to see directly due to occlusion
or other factors. The hypothesis of this work is that applying advanced technology for
the visualization of real-time medical data will enhance the performance, comfort and
insight of the surgeon. It will then also reduce the morbidity and mortality of patients.
In this thesis, we focus on the development, the accuracy testing and the comparison of
a relatively new technology (Augmented Reality) to what is currently used in the OR
(Image guidance). We have used medical human factors testing and evaluation to
determine the validity and usability of the developed systems to gauge whether the AR
technology actually improves surgeon performance.
Currently image guidance provides primarily three 2-D views (coronal, axial and
sagittal views) to gain awareness of the patient geometry (Hinsche and Smith
2001;Wadley and Thomas 2000). The surgeon has to perform the 2D (image) to 3D
transformation in their minds while projecting the envisioned data on the view of the
patient. There are advantages (which we will prove) to registered visualization
techniques that are able to help fuse the 3D data with what the surgeon is actually
seeing. We believe that AR generation is a natural extension for the surgeon because it
does both the 2D to 3D transformation and projects the views directly on the patient
view. To illustrate the difficulty of interpreting 2D slices, in Figure 1-2(a) a simple 3D shape
like the cube is represented by a triangle in the coronal slice and a vessel is
represented by two dots in the axial view of the CT scan due to the orientation of that
particular CT slice. Currently, the surgeon must convert these views to a 3D
representation and merge it with what she physically sees on the patient. This can be a
very heavy mental load on the surgeon. This scene represents a Virtual Reality (VR)
scene where the actual live view is not presented. Figure 1-2(b) represents a live video
view of the same phantom skull with the models of interest displayed directly on the
view. This view, generated using our prototype, represents an AR view because the
live view is presented and augmented with additional geometrical information.
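To make this 2D-to-3D burden concrete, the short sketch below shows how a single point of interest in the scan volume falls on one axial, one coronal and one sagittal slice; the voxel spacing and coordinates are invented for illustration and are not the phantom data used in this thesis.

    import numpy as np

    spacing = np.array([0.5, 0.5, 2.0])       # illustrative voxel size in mm (x, y, z)
    point_mm = np.array([64.0, 80.0, 30.0])   # illustrative point of interest in image space

    i, j, k = np.round(point_mm / spacing).astype(int)
    print("sagittal slice:", i)   # plane of constant x
    print("coronal slice:", j)    # plane of constant y
    print("axial slice:", k)      # plane of constant z
    # The same 3D structure therefore appears as three separate 2D cross-sections,
    # which the surgeon must mentally fuse back into a single 3D shape.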
Figure 1-2: (a) is typical data displayed during Image Guided Surgery. It represents a Virtual Environment. Notice that the vessel is represented by two dots in the axial slice and the cube is represented by a triangle in the coronal slice. (b) is 3D geometric data (models) registered and displayed on a live video view of the same phantom viewed with the AR prototype system developed in this thesis. This represents an Augmented Reality view. Note the difference between AR and VR and the associated differences in visualization.
1.2 Research Objective and Specific Aims.
This research describes the implementation, accuracy assessment and usability
testing of two prototype systems. The first, an image guidance system, represents what
the surgeons currently use in the neurosurgery operating theater. The second, a medical
Augmented Reality system, represents a potentially new or upgraded visualization
system. A passive articulated robotic arm (Microscribe, Immersion Technology) is used
to develop both systems. The AR system uses the arm with a mounted camera system
at its end-effector. This thesis covers the steps needed to build and compute both the
real-time Image Guided (IG) and Augmented Reality (AR) scenes for objects of interest.
It also provides an in-depth error analysis of the built prototypes. In addition, we
provide all the software, procedures and a hardware list to recreate both prototypes upon
request.
Our research focus is on accurate registration that must be maintained while a
user, a robot or the needed tools move within the real environment. The optical
parameters (focal length and lens distortion) of the camera and the geometrical (position
and pose) parameters of the surgeon, robot or tool determine exactly what is projected
onto the image plane (Tsai 1987). The main technical objective for this work is to develop a
testbed AR system and test its effectiveness in assisting the operator in the performance
of the task as compared to a testbed Image Guided Surgery (IGS)
system. In addition, advanced display technology will also be investigated for its
effectiveness in the display of remote information. The underlying hypotheses are that
(1) an Augmented Reality System will significantly improve the performance of the
surgeon and (2) that advanced visualization hardware (e.g. heads up display) can
improve the performance of the surgeon.
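To make the camera model referenced above (Tsai 1987) concrete, the following minimal sketch shows how a point in the world coordinate system is mapped onto the image plane given extrinsic (position and pose) and intrinsic (focal length, principal point) parameters. The values and names here are illustrative placeholders, not the calibrated parameters reported in Chapter 4.

    import numpy as np

    def project_point(p_world, R, t, fx, fy, cx, cy):
        # R (3x3) and t (3,) are the extrinsic parameters (world -> camera frame);
        # fx, fy are focal lengths and (cx, cy) is the principal point, in pixels.
        x, y, z = R @ p_world + t                   # transform into the camera frame
        return fx * x / z + cx, fy * y / z + cy     # perspective projection to pixels

    # Placeholder values: camera axes aligned with the world, object 500 mm away
    R = np.eye(3)
    t = np.array([0.0, 0.0, 500.0])
    print(project_point(np.array([10.0, -5.0, 0.0]), R, t, 800.0, 800.0, 320.0, 240.0))

Lens distortion, ignored in this sketch, is handled during camera calibration in Chapter 4.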
The specific aims of this thesis are as follows:
1. Describe the implementation of a prototype Augmented Reality system
for robotics.
2. Describe the implementation of a prototype Image guidance system.
3. Perform a subject study to determine the pros and cons of Augmented
Reality vs. Image guidance.
4. Perform subject studies to determine if heads-up displays provide any
advantage over monitor viewing.
The ultimate aim is to extrapolate the findings and development of this thesis to
active medical robotic systems and IGS systems in existence. Examples of current
robotic systems that could take advantage of this technology include systems such as
the Neuromate and Robodoc systems (Integrated Surgical Systems Inc.) (Taylor and
Stoianovici 2003), daVinci (Intuitive Surgical Inc.) (Hoznek et al. 2002;Tewari et al. 2002), and
the Zeus (Computer Motions Inc.) systems (Nio et al. 2002)(Knight et al. 2003b).
1.3 Outline of the Thesis
The outline of the thesis is as follows: Chapter two provides background
information on Image Guidance and Augmented Reality technology, Medical Imaging
technology, Segmentation, and 3D modeling and on the discipline of Human Factors
Engineering as applied to medical technology development. A section of background
is also provided on tracking devices. Successful augmentation must have accurate and
reliable tracking methods. Tracking is a key component and various tracking methods
along with their advantages and disadvantages are discussed. The next two chapters
(chapters three and four) deal with technology development for both Robotics-based
augmentation and image guidance. Both of these technologies were developed during
the course of this thesis. Image guidance represents what is currently used in the
operating room while AR is a technology that we contend may become the next step for
the operating room. Chapter five describes the human factors studies that were done. It
is not enough to develop technology, as Engineers, we must validate and prove the
performance benefits of the technology. The first study deals with the comparison of
Image guidance with Augmented Reality. The major questions are— “Does Augmented
Reality offer any improvement in the surgical performance over using an Image
Guidance system?” “What are the advantages and disadvantages of Medical AR
technology?” The second study compares the use of head mounted displays to monitor
viewing for endoscopic surgery. Endoscopic views are the ones primarily used in robotic
surgery and an important question is, “Does head-mounted display provide any
improvement over monitor viewing?” Chapter six concludes with the contributions of
this thesis and more importantly, the future applications and research directions that
can be used to build on this work. There are many avenues that we predict will lead to
very useful medical tools of the future that will allow the medical doctor to gain more
insight into the patient’s condition. In this last chapter, the potential future directions will
be provided.
Chapter 2: BACKGROUND
Research is the act of going up alleys to see if they are blind. - Plutarch
In this chapter, basic definitions and background on the foundation
technologies being used to develop the ideas in this thesis are given. First, a brief
introduction to image-guided surgery (Virtual Reality) is given along with its
impact on surgery. Second, a description of what Augmented Reality is and how it
differs from Virtual Reality is provided. Next, medical robots and
their advantages and disadvantages are described, along with the reasons that
visualization technology is considered a critical factor for these systems. In addition,
the major foundation technologies on which the technologies of Augmented Reality
and Image Guidance Surgery (IGS) are based (3D modeling/imaging, tool/user
tracking and accurate registration and calibration of objects/cameras within the
environment) will be presented (See Figure 2-1).
Figure 2-1: The foundation technologies behind both Augmented Reality and Image Guidance which will be
discussed in this chapter. At the heart of this thesis is a Human Factors Analysis that compares Augmented Reality and Image Guidance technology.
At the end (and at the heart of this work), the discipline of Human Factors
Engineering is introduced, and its critical importance in the early stages of medical
technology development is presented.
2.1 Image Guided Surgery -- Current technology in the Operating Room.
Throughout every operation, a surgeon must maintain a precise sense of
complex three-dimensional relationships. Computer image processing and real-time
3D visualization were first used in the field of Neurosurgery (Gallen et al.
1994;Zamorano et al. 1987a;Zamorano et al. 1987b). It is, and will have to remain,
an integral part of the surgical field to increase the chance that highly delicate
surgeries will be smooth and successful. Computer based medical imaging has
revolutionized Neurosurgery. Surgeons can now "see" on a 3-dimensional image
where their tracked surgical tools are with respect to the lesion responsible for the
patient's neurological problem. A relatively new field, image guided surgery, blends
the use of computer-based medical imaging data with real-time instrument position
data capture to assist the surgeons in localizing and removing lesions (See Figure
2-2). The contributions of computer image processing and 3-dimensional visualization
to neurosurgery are becoming widely recognized, and attempts are being made to
apply the benefits to other surgical disciplines (e.g. spine surgery (Holly and Foley
2003), and orthopedic surgery (DiGioia 1998)). Surgeons are now starting to
recognize the importance of enhancing intra-operative visualization technology and
are requesting and doing more research in this area.
A Virtual Reality scene is a completely computer-generated scene and
requires high-performance graphics workstations to generate acceptable levels of
realism (Chmielewski C. 1999;Goldsby M.E. 1994;Pandya A.K. 1994). Typically,
Virtual Reality systems are interactive. A user’s requests and responses are
captured and the scene is updated. In typical VR scenes, the user is completely
immersed in the environment wearing a head-mounted display and interacts with
virtual objects. On the right side of Figure 2-2 a user of the VR system wears a head-
mounted display. The user’s arms, torso and head are tracked using magnetic
tracking, and a pair of cybergloves is used to track his fingers (Chmielewski et al.
1998). The user then becomes an ‘avatar’ (in this case a space-suited astronaut) and
can navigate and interact with (via graphical collision detection) and move all the
virtual objects of interest. In image-guidance, another example of a virtual
environment, the surgeon’s tool is tracked and its actual position and orientation are
displayed in a virtual 3D model of the patient’s brain that is created through 3D image
segmentation techniques from imaging data. In this case, the surgeon views the
virtual world on a monitor (See Figure 2-2) or can use a heads-up display.
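The computation at the core of such an image-guided display can be sketched as follows: the tracked tool tip, reported in the tracker (patient-space) coordinate system, is pushed through the patient-to-image registration transform, and the result selects the slices and the 3D-model location to draw. The 4x4 matrix below is a placeholder; the actual transform is estimated from fiducial pairs as described in Chapter 3.

    import numpy as np

    # Placeholder rigid registration: tracker/patient coordinates -> image coordinates
    T_patient_to_image = np.array([[1.0, 0.0, 0.0, 12.3],
                                   [0.0, 1.0, 0.0, -4.1],
                                   [0.0, 0.0, 1.0, 57.8],
                                   [0.0, 0.0, 0.0,  1.0]])

    def to_image_space(tip_patient_mm):
        # Apply the homogeneous 4x4 transform to the tracked tool-tip position.
        tip_h = np.append(tip_patient_mm, 1.0)
        return (T_patient_to_image @ tip_h)[:3]

    tip_image = to_image_space(np.array([100.0, 42.5, 10.0]))
    print(tip_image)   # this point selects the axial/coronal/sagittal slices to display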
2.2 What’s the difference between Augmented Reality and Virtual Reality?
Augmented Reality is a variation and extension of Virtual Reality. AR is a
middle ground between computer graphics in a completely synthetically generated
world and the real world. It adds to the view of the real world a computer graphics
image generated from real-world data specific to a patient. In AR a surgeon views
a model of a patient’s brain that has been created from imaging data. This imaging
data “augments” the surgeon’s simultaneous visualization of the real brain. In some
cases it may even substitute for this real-time visualization of the real brain. The AR
scene is built on much of the same technology as a VR scene. 3D models of interest
and patient registration are still needed. In contrast, an AR system involves the use
of either a video camera or a see-through head mounted display, both of which allow
a window to the real world. The AR system generates a composite view for the user
that includes the live view fused (registered) with either pre-computed data (e.g. 3D
geometry) or other registered sensed data. It is a combination of the real scene
viewed by the user and a virtual scene generated from 3D geometry/segmentation
accurately co-registered on the display that augments the scene with additional
information. As opposed to VR, in the AR world, the user’s sense of being in the real
world is maintained. AR supplements reality, rather than completely replacing it as in
VR (Billinghurst et al. 2001;Broll et al. 2001).
Figure 2-2: Two different forms of "Virtual Reality": On the left, a tracked tool’s pose is displayed in a 3D
image and orthogonal slices of a CT scan of the phantom brain. This system was a technology developed as part of this thesis. On the right, the tracked user of the VR system becomes a model of a space-suited astronaut performing tasks on a 3D model of the space station. Note that in both systems, the user is viewing a virtual world.
AR technology is not to be confused with 2-D virtual overlays on top of live
video. 2-D overlays are constructed without registration to the real 3-D world and
represent a static display on video monitors. In AR, by contrast, the virtual scene
generated from 3D geometry/segmentation is accurately co-registered with the real
scene viewed by the user, so the augmenting information appears in the same visual
environment.
Recently real-time video processing and computer graphics have provided us with the
capability of augmenting the video stream with geometrical replicas of the actual
objects or sensor data. Critical objects of interest within the patient’s brain (for
instance) determine exactly the size, shape, location and orientation of the
craniotomy (skull opening) to be performed. The objects of interest can be tumors,
major vessels, or anatomically/physiologically important brain structures. Before an
operation there are usually several sets of image data available (MRI, CT, SPECT,
Functional MRI, etc.). The image data provides very important information about the
spatial arrangements and the functional importance of objects of interest within the
brain in the image space. Figure 2-3 illustrates an example of AR. The live view of
the phantom is augmented with virtual objects derived from medical imaging scans of
the phantom brain. This augmentation shows where the various objects and
"vessels" are located directly on the live video view. As the user moves the arm, the
objects are regenerated to correspond to the actual objects.
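A minimal sketch of how one such augmented frame can be composed is given below, under the assumption that the model vertices have already been registered into the arm's base coordinate system. The camera pose is obtained by chaining the arm's base-to-end-effector transform with the end-effector-to-camera calibration, and the projected wireframe is then drawn over the live video frame; all matrices and values here are placeholders for the transforms derived in Chapter 4.

    import numpy as np

    def project_model(vertices_base, T_base_ee, T_ee_cam, K):
        # T_base_ee: 4x4 end-effector pose from the arm's kinematics.
        # T_ee_cam : 4x4 end-effector-to-camera transform from calibration.
        # K        : 3x3 intrinsic matrix of the end-effector camera.
        T_base_cam = T_base_ee @ T_ee_cam              # camera pose in the base frame
        T_cam_base = np.linalg.inv(T_base_cam)         # base frame -> camera frame
        verts_h = np.hstack([vertices_base, np.ones((len(vertices_base), 1))])
        verts_cam = (T_cam_base @ verts_h.T).T[:, :3]
        pix = (K @ verts_cam.T).T
        return pix[:, :2] / pix[:, 2:3]                # pixel position of each model vertex

    # Placeholder intrinsics and three model vertices 400 mm in front of the camera
    K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
    verts = np.array([[0.0, 0.0, 400.0], [10.0, 0.0, 400.0], [0.0, 10.0, 400.0]])
    print(project_model(verts, np.eye(4), np.eye(4), K))

Each time the arm moves, the base-to-end-effector transform is re-read from the arm and the wireframe is re-projected, which is what keeps the overlay locked onto the real objects.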
Through the process of segmentation and model extraction, 3D computer
graphics models that can be used as the input to the AR system can be generated
(Cline et al. 1987). In a futuristic vision, Robinett (Robinett 1992) speculates that AR
may be useful in applications that require displaying any information not directly
available or sensed by the human by making that information visible, audible or even
felt. Examples of this kind of data could be spectroscopy data, Doppler (blood flow
velocities), temperature, chemical concentrations, pressure information etc.
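The segmentation-to-model step mentioned above can be sketched with an off-the-shelf Marching Cubes implementation; the synthetic sphere volume and the scikit-image call below are illustrative stand-ins for the CT label-maps and the model-generation pipeline described in Section 2.4.

    import numpy as np
    from skimage import measure

    # Synthetic binary volume standing in for one segmented structure (a sphere)
    zz, yy, xx = np.ogrid[:64, :64, :64]
    volume = ((xx - 32) ** 2 + (yy - 32) ** 2 + (zz - 32) ** 2 < 20 ** 2).astype(float)

    # Extract a triangle mesh (vertex list plus face connectivity) at the 0.5 iso-surface
    verts, faces, normals, values = measure.marching_cubes(volume, level=0.5)
    print(verts.shape, faces.shape)   # the mesh can then be rendered in the VR or AR scene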
2.3 Why is visualization important for Medical Robotics?
Medical Robotic systems like the Zeus (Computer Motion Inc.) and DaVinci (Intuitive
Surgical Inc.) are master-slave systems for minimally invasive surgery (See Figure
2-4). They offer the advantages of motion scaling, tremor filtration and comfortable
surgeon interfaces. They also provide a very functional wrist at the end of the MIS
instrument which is not available in standard MIS (Knight et al. 2003a). Direct linkage
of medical robotic systems to patient data and the optimal visualization of that data
for the surgical team are important for successful operations. In their review article on
medical robots, (Cleary K. 2001) state that if medical robots are to reach their full
potential, they need to be more integrated systems in which the robots are linked to
the imaging modalities or to the patient anatomy directly. They state further that
robotics systems need to be developed in an “Image-Compatible” way; that is, these
systems must operate within the constraints of various image modalities such as CT
and MRI. Visual information from the patient (i.e., remote) site needs to be augmented
in a way that allows greater situational awareness, accuracy and confidence. This link,
they conjecture, is essential if the potential advantages of robots
are to be realized in the medical domain.
Figure 2-3: This is an example of an Augmented Reality Scene. The live view of the phantom is augmented with virtual objects derived from CT scans of the phantom brain. This augmentation overlays the actual objects with 3D wireframe models of the actual objects. This technology (also developed as part of this thesis) uses the same tracking device as the image guided system (See Figure 2-2a) but uses the end-effector mounted camera to generate the AR scene.
Figure 2-4: Surgical site (right) and the remote surgeon site (left) for the master/slave Zeus Robot (Computer Motion Inc.). The surgeon is using hand controllers and voice recognition to control the three arms of the laparoscopic instrumentation. He relies on raw video images from the endoscopic camera to perform his operation. It is the premise of this thesis that advanced forms of visualization may increase the surgeon’s performance. Picture with permission from Computer Motion Inc.
Master-slave robotic systems represent an evolutionary leap from traditional and
laparoscopic surgery (Taylor et al. 2003). The surgeon is comfortably seated at a
well-designed master-slave controller interface and console (See Figure 2-4). At the
surgeon site (remote site) the system consists of a stereoscopic monitor, a foot pedal
for control, and hand controllers for robotic end-effector (tool) manipulation. At the
robot-patient interface, the system consists of three robotic arms. Two arms (with
tools mounted on them) are used for surgical manipulation and the third arm has a
camera system for visualization. The third camera arm is controlled effectively with,
for example, voice recognition technology or with foot controllers. The robotic system
has several advantages over traditional surgery. The system is able to modulate the
surgeon's motions by tremor filtration. Inadvertent high frequency motions made by
the surgeon can be filtered out allowing for finer and smoother control. The system
can also use a technique of motion scaling to allow centimeters of motions made by
the surgeon to be translated to sub-millimeter motions at the robot-patient interface.
This allows for more precise microsurgery. It can also be used to compensate for the
body’s own motion. For example, a beating heart can be followed by the robot’s
motion resulting in the heart appearing stationary to the remote operator (Kappert et
al. 2001). This would allow delicate surgery to be performed on a beating heart with
precision and without the dangers of stopping the heart. (Bowersox et al. 1998)
studied the feasibility of the use of telepresence surgery to perform basic operations
in vascular surgery, including tissue dissection, vessel manipulation, and suturing.
They used a prototype telepresence surgery system with bimanual force-reflective
manipulators, interchangeable surgical instruments, and stereoscopic video input.
Arteriotomies created ex vivo in segments of bovine aortae or in vivo in femoral
arteries of anesthetized swine were closed with telepresence surgery or by
conventional techniques. Time required, technical quality and subjective difficulty
were compared for the two methods. All attempted procedures were successfully
completed with telepresence surgery and the precision attained with telepresence
surgery was equal to that of conventional techniques. They concluded that blood-
vessel manipulation and suturing with telepresence surgery are feasible. In fact, this
robotic technology has reached sufficient maturity to allow FDA approval for surgical
use on humans.
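A toy sketch of two of the motion-conditioning ideas described earlier in this section, motion scaling and tremor filtration, is shown below; the scale factor and filter constant are arbitrary illustrative values and do not describe any particular commercial system.

    import numpy as np

    SCALE = 0.2    # 5:1 motion scaling: a 10 mm master motion becomes 2 mm at the patient
    ALPHA = 0.1    # simple low-pass constant: smaller values suppress more tremor

    def condition_motion(master_increments):
        # Scale each master hand increment and low-pass filter the result so that
        # high-frequency tremor is attenuated before it reaches the slave arm.
        filtered = np.zeros(3)
        outputs = []
        for inc in master_increments:
            target = SCALE * np.asarray(inc)
            filtered = ALPHA * target + (1.0 - ALPHA) * filtered
            outputs.append(filtered.copy())
        return outputs

    # Example: a steady 1 mm/step motion in x with an alternating 0.5 mm tremor on top
    steps = [np.array([1.0 + 0.5 * (-1) ** n, 0.0, 0.0]) for n in range(10)]
    print(condition_motion(steps)[-1])   # the alternating tremor component is strongly attenuated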
Even with enormous technological gains, robotic surgery is still in its infancy.
There are some major areas of technological improvement needed for this technology
to reach its ultimate potential (Cleary K. 2001). Because the surgeon is remotely
located and relies almost entirely on indirect visual information, we believe that
visualization technology is one of the major areas that will make these systems of the
future more useful, powerful, easier to use and ultimately lead to better surgical
outcomes.
Medical robots are typically used where the surgeon is remotely located from
the patient. Visual information from the patient (remote) site needs to be augmented
in a way that allows greater situational awareness and confidence. In addition,
surgical planning and information management for these robotic systems are essential
for successful operations (Cleary K. 2001). Two main problems encountered in
robotic surgery are non-optimal port placements and robotic arm collisions. Robotic
arm collisions often require manual repositioning of the robotic arms on the operating
table, which unnecessarily adds to the operative time. Incorrect port placement typically
results in robotic arm collisions, can lead to damage to robotic instruments and can
also lead to inaccessibility of the operative site. Improved accessibility can enhance
patient safety (Partin et al. 1995). These problems can be avoided in the pre-
operative stages given the appropriate visualization tools. For these reasons, it is
important that a robust visualization system be built that is linked to patient imaging
data that offers the surgeon tools for visualization, robotic system setup and port
placement. Computer modeling tools that help visualize the anatomical structures of
the patient would greatly aid the surgeon in the pre-operative stage. Visualization
tools can help the surgeon determine optimal port placement sites. In addition, these
tools will help determine the placement of the robotic arms on the operating table in
order to avoid collisions between arms during the procedure while maximizing the
range of motion of the instruments. A significant potential exists to impact medical
robotics with the pre-operative planning and intra-operative visualization tools (Taylor
et al. 2003).
Medical robotic systems are playing an increasing role in different image-
guided surgical procedures (Taylor et al. 2003). The key advantages are that robots
can effectively position, orient, and manipulate surgical tools in 3D space with a high
level of accuracy. The NeuroMate™ robot system (Integrated Surgical Systems,
Davis, CA) is a commercially available, image-guided robotic-assisted system used
for stereotactic procedures in neurosurgery (See Figure 2-5). This robotic device is
able to precisely hold tools at predetermined configurations and allows the surgeon to
perform very delicate and accurate placement of tools. We have performed a very
detailed accuracy study of the Neuromate robotic system, details of which are
available in several of our published papers (Li Q 2001;Pandya A.K. 2000b;Pandya
A.K. 2000a;Zamorano L. 2000). These papers contribute to this field by showing a
method to measure the accuracy of robotic devices and also prove the utility of the
Neuromate system for Neurosurgery. As seen in Figure 2-5, the device also comes
configured with an image-guided system that allows the tool being placed to be
accurately tracked within the surgical space. It is primarily used for Neurosurgeries
and assists the surgeon by accurately and stereotactically placing tools in the surgical
field. Its sister robot, the Robodoc, has a very similar design and is used for knee and
hip replacement surgeries. It is an active robot and is approved to cut grooves into the
patient's bone for joint replacement.
Figure 2-5: Neurosurgical Robotic Device (Neuromate, Integrated Surgical Systems Inc.). Here the robotic device holds a tool for the surgeon at a very precise position and orientation, and the surgeon can then
perform a biopsy of the patient’s tumor. The biopsy needle can be tracked on an image-guided system (shown on the right) to allow the surgeon to know when the target is reached and that no other important structures (like vessels) are in the way. Augmentation techniques could easily be added to such a system.
Wayne State University at Harper Hospital (where the author was the
engineering lead of the team) was the first to perform a clinical case using the
Neuromate system in the United States. This system takes advantage of links to the
patient image data and connects the robotic movements to the knowledge base of
patient-specific image data and structures. For instance, the kinematic positioning
software system knows if the arm is about to intersect with the patient and restricts
movement accordingly. The system software (VoXim™, IVS Software Engineering)
allows precise image-based planning and visualization of multiple trajectories.
Although this system is image data linked, it is not a master-slave dexterous device
and does not have an AR interface.
2.4 Medical Imaging, Segmentation and 3D Model Creation
The topic of 3D model generation is important to Image Guidance Surgery
(IGS), and even more important to Augmented Reality systems. For IGS, surgeons
primarily use the orthogonal scans of the imaging data to perform the surgery. The
available 3D models are typically used as secondary information. In Augmented
Reality, the 3D models are the primary source of visualization. Because the role of
3D modeling is central for the medical visualization domain, in this section, a
discussion of how 3D models are generated from medical imaging data is provided, and
commonly used imaging techniques are briefly described. In addition,
a literature survey is conducted which shows the current state-of-the-art in
segmentation technology. Finally, some of the segmentation results that show the
models that were used to conduct studies for this thesis are shown.
The overall process of segmentation and model creation is illustrated in Figure 2-6. In our
experiments with segmentation, we fixed some simple objects inside the plastic
phantom of the skull. We took a CT scan of the phantom at 2mm increments through
the entire phantom. Once the imaging data had been collected, the next and very
important step was to segment the data. Segmentation is defined as the process by
which a label map is generated for each slice of imaging data to represent the
different regions of interest. There is an enormous and important research area for
segmentation that can be used for improving the accuracy and ease of segmentation
(Chen et al. 2003;Harders and Szekely 2003;Horkaew and Yang 2003;Lee et al.
2003;Tsai et al. 2003). We created the 3D models using the marching cubes algorithm,
which will be discussed in this chapter (See Figure 3-7).
Creating the virtual objects is a necessary step for generating an Augmented
Reality environment that is based on imaging data. For medical applications, each
patient has a unique set of objects that can be used for augmentation. Usually these
objects are tumors, skin surfaces, a set of major vessels, and relevant normal or
abnormal structures. We define our object model by a segmentation procedure that
uses medical imaging data. Medical imaging examples include computed
tomography (CT, sometimes referred to as CAT scan for computerized axial
tomography), Magnetic Resonance Imaging (MRI), single-photon emission computed
tomography (SPECT) and Ultrasound. Such techniques allow the surgeon to peer
inside the body and are now in routine use for patients. A brief description of some of
the common imaging technologies is provided in the next section as background
information as it is considered essential for the understanding of segmentation and
3D rendering technology.
2.4.1 Medical Imaging Data
There are many different imaging technologies that can be utilized and each
provides a unique view of the system that the physician is studying. CT was
developed in 1967 by Godfrey Hounsfield (Hounsfield 1980). His contribution was
that he linked x-ray sensors to a computer and worked out a mathematical technique
called algebraic reconstruction for assembling images from transmission data. A CT
scan of the body provides a density-dependent differential absorption of x-rays.
These views are obtained by the exposure of photographic plates placed beyond the
patient. On the other side of the patient is the x-ray source.
Multiple X-rays are taken as the X-ray tube revolves around the patient.
Computations are done which decipher the amount of X-ray penetration through
specific planes of the system being examined. This computation gives each pixel of
the image a density coefficient which corresponds to the material being penetrated
and is translated into a gray scale. CT scans are best for bones and other rigid
structures. The outputs of the system are gray-scale images sliced at a physician-
prescribed distance apart (usually 2mm – 5mm slices).
In 1946, Felix Bloch (Bloch et al. 1991) and Edward Purcell independently
discovered (and later received a Nobel Prize for the discovery) that when a magnetically
energized substance is exposed to radio-frequency energy it emits a particular frequency.
This process is similar to a tuning fork. They found that the nuclei of different atoms
absorbed radio waves at different frequencies. In 1970, a major discovery was made
that significantly changed the imaging world. Damadian discovered that the structure
and abundance of water in the human body was the key to MR imaging, and that the
water (hydrogen) emitted a signal that was both detectable and recordable. The basis
for the MRI scans is the magnetic properties and dipolar nature of the hydrogen
nucleus. These nuclei (ubiquitous in soft tissue as part of water) change their alignment when a
pulsed magnetic field is imposed. In the alignment process these nuclei absorb
energy from tuned radiofrequency pulses. As their excitation decays, they emit
radiofrequency signals. These signals vary in intensity due to nuclear abundance or
the molecular chemical environment and can be imaged and converted using field
gradients in the magnetic field into sets of tomographs.
Figure 2-6: The process of segmentation and model generation: (a) A plastic phantom skull with simple objects fixed inside the skull was scanned with a CT scanner at 2mm slices. (b) This is one coronal slice of the raw CT data. (c) This is a label-map of that one slice overlaid on the raw CT image. (d) This is the 3D model generated using the Marching Cubes algorithm after processing all the CT slice data. It shows a transparent skin model through which the internal front view of the phantom is visible.
The magnetic field needed for typical MRI scans is on the order of 1-4 Tesla, and the higher the imposed magnetic
field, the higher the resolution of the image. MRI differs from CT in that it images
differences in tissue based on chemical rather than density properties. Hence, MRI
scans are important for soft tissue anomalies. Another very interesting aspect of the
MRI scan is that it can be used to observe real-time changes for instance of brain
activity as a particular task is performed by the patient. This type of MRI is called
Functional MRI (FMRI). FMRI is important because it helps the physician understand
the relationship between structure and patient function. This can help (for instance)
neurosurgeons determine functional areas within the brain (language, motor skills,
hearing, etc.) that should be avoided during surgery so that damage to an important
center does not occur. In typical imaging sessions, a patient would be asked to move
or speak or react in some way and imaging would be done to determine what portions
of the brain are being activated for that particular function. Functional MRI uses MRI
equipment to detect regional changes in cerebral metabolism or in blood flow, volume
or oxygenation in response to these types of tasks. A common technique called
blood oxygenation level dependent (BOLD) contrast is often used. This technique is
based on the differing magnetic properties of oxygenated (diamagnetic) and
deoxygenated (paramagnetic) blood which lead to detectable changes in MR image
intensity (D'Esposito et al. 2003).
Another form of very useful imaging is positron emission tomography (PET). It
is based on the detection of subatomic particles and produces physiologic images.
These subatomic particles are emitted from a radioactive source administered to the
patient and typically will gather at the organ of interest. Similar to FMRI, the views
obtained from PET scans can be used to evaluate function. They can also be used to
detect cancer or even characterize cellular biochemical changes in order to examine
the effects of cancer therapy. There are other uses of the PET scan. For example,
PET scans of the heart can be used to determine blood flow to the heart muscle and
help evaluate signs of coronary artery disease.
A very recent development in imaging is 3D ultrasound. A conventional 2D
ultrasound probe (which has been used for decades) can be used in a novel way to
produce 3D imaging (Delcker and Tegeler 1998). If the 2D probe is equipped with a
six degrees-of-freedom tracking device, spatially registered 2D scans can be
acquired. These scans can then be mathematically combined to create a tomographic
3D image set. The resulting 3D data image planes can be visualized by either
volume rendering or by the process of segmentation and surface rendering
techniques. The volumes of the structures can also be measured accurately.
All of these imaging methods have a common feature. They produce slices of
imaging data that can be segmented and viewed using advanced 3D modeling
technology, can be used for image-guidance procedures, and are potential
candidates for Augmented Reality interfacing. However, before algorithms and
methods can be used to create 3D models of imaging data, the user must segment
the imaged data set to determine exactly what portion of the images is to be
converted into graphics models. This can be a painstaking process. There are
various methods of segmentation which can simplify the process, and these will be
covered in the next section.
2.4.2 Methods of Segmentation
Although advanced automatic segmentation techniques are not the main topic
of this thesis, they are considered here as background, because AR and image
guidance rely on the ease of use and generation of segmented objects. We point to
numerous pockets of important research that aim to make segmentation easier for
use in very complex environments. There are at least four major categories of
segmentation algorithms. These are boundary localization, voxel classification,
knowledge-based segmentation and deformable atlases (Miller et al. 1993). There is
an enormous set of literature in each of these areas of segmentation. Other
techniques proposed make use of a combination of gray-level based systems that
simultaneously incorporate information about anatomical boundaries (shape) and
tissue signature (gray scale) using scale and edge-detection algorithms and some a
priori knowledge to provide an unsupervised segmentation. In his thesis, (Leventon
2000) provides a very good survey of segmentation. A general overview of these
methods will be covered here to provide the background necessary to understand
segmentation technology and its importance to image guidance.
Voxel classification subdivides each image into elements (called voxels). Each
voxel has associated with it an intensity distribution, a decision on its tissue type,
and the tissue-type decisions of its neighboring voxels. The decisions for tissue types are
determined with a thresholding scheme which the user inputs based upon the
imaging data and on properties of the imaging modality. The method relies on
knowing accurate information about the pixel ranges that different structures (for
example, gray matter) may have. Each voxel is then classified according to this and
other information. Other information includes, for instance, how the neighbors of this
voxel were classified and the properties of the imaging modality being used. The
weakness of this method is that the distribution of intensity values corresponding to
one structure may vary throughout the structure and overlap those of another
structure. In this case, this segmentation technique does not produce accurate and
optimized results.
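A minimal sketch of such a threshold-based classifier is shown below; the intensity ranges and labels are illustrative placeholders rather than validated values for any particular modality.

    import numpy as np

    def classify_voxels(image, ranges):
        """Assign each voxel the first tissue label whose intensity range contains it.

        image:  array of scalar intensities (e.g., Hounsfield units for CT).
        ranges: mapping label -> (low, high), chosen by the user from the imaging
                data and the properties of the imaging modality.
        """
        label_map = np.zeros(image.shape, dtype=np.uint8)
        for label, (low, high) in ranges.items():
            mask = (image >= low) & (image <= high) & (label_map == 0)
            label_map[mask] = label
        return label_map

    # Illustrative ranges only (label 1 = soft tissue, label 2 = bone).
    ranges = {1: (-100, 300), 2: (301, 3000)}
    label_map = classify_voxels(np.random.randint(-1000, 2000, (512, 512)), ranges)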
Segmentation performed with boundary detection techniques uses some
property of the border between the object of interest and the objects adjacent to it.
Generally high-gradient features are indicative of a boundary. In general terms, a
gradient is a vector field formed by the first partial derivatives of a multi-variable function,
giving the magnitude and direction of change at each point in space (Kaplan 1981). In the image
space, a gradient field could be used to describe high-frequency changes at the
borders of different objects because the partial derivatives give the rates of change.
These changes are the criterion for the formation of boundaries between objects.
The parameters of the gradient computation can be controlled to produce
segmentations of structures at different resolutions.
The third approach, which can be classified as a model-based approach, uses
atlas-mapping technology to assist in the segmentation process. An atlas contains a
normalized set of labeled scans of a particular organ type. The atlas
mapping/warping is among the most common methods used for human brain
segmentation. A brain atlas is a database of structural and positional information
that is scalable to a particular person's brain. It is usually based on the real scans of
several subjects. When registered with actual data of a patient’s brain, this atlas
provides various levels of information about the patient's brain structures. Many
researchers have worked on such atlases (Nowinski et al. 2003;Vayssiere et al.
2002). The research in this field has progressed to such an extent that clinical use of
these systems has begun. A traditional atlas is acquired from a sample of actual
brain data. Typically, these normalized databases are embedded with registration
algorithms that allow matching between an actual brain and the theoretical data. This
approach attempts to deform a given labeled atlas to that of the new image data that
is to be segmented. So, given a new image set, the algorithm computes a non-rigid
transformation such that it is in correspondence with a normalized set of atlas data. If
the correspondences are computed correctly, then the warped atlas can be
successfully used in the structure labeling or segmentation of the new scan.
Computing deformations of this category that correctly warp one person's anatomy into
another's is quite challenging and can result in correspondence mismatches. This is especially
true for relatively small structures that are highly variable between subjects, and in
patients with anomalies that significantly change the shape of the organ beyond
normal.
Differentiation of tube-formed tissue such as blood vessels, trachea,
pancreatic duct structures and the ability to independently render them have a variety
of potential applications in the head, neck, lungs, heart, abdomen, and lower
extremities. Disorders in which artery–vein separation is most critical in the
cerebrovascular system include brain arteriovenous malformation. However, tube-
tissue segmentation represents one of the most challenging problems in
segmentation. (Lei et al. 2003) present a near-automatic process for separating
vessels from background and other clutter. They report on separating arteries and
veins in contrast-enhanced magnetic resonance angiographic (CE-MRA) image data.
Their separation process utilizes fuzzy connected object delineation principles and
algorithms. The critical step was to separate artery from vein within this entire vessel
structure via iterative relative fuzzy connectedness. After seed voxels are specified
inside artery and vein in the CE-MRA image, the small regions of the bigger aspects
of artery and vein are separated in the initial iterations, and further detailed aspects of
artery and vein are included in later iterations. At each iteration, the artery and vein
compete among themselves to grab membership of each voxel in the vessel structure
based on the relative strength of connectedness of the voxel in the artery and vein.
This process produced correct artery–vein separation. When compared with
manual segmentation/separation, their algorithm was able to separate higher-order
branches and therefore produce many more details in the segmented vascular
structure.
(Erdi et al. 1997) have developed an automatic image segmentation schema to
determine the volume of metastases to the lung from PET images, under conditions
of variable background activity. An elliptical Jaszczak phantom containing a set of
spheres with volumes ranging from 0.4 to 5.5 mL was filled with F-18 activity (2–3
mCi/mL) corresponding to activities clinically observed in lung lesions. The adaptive
thresholding method applied to PET scans enabled the definition of tumor volumes.
This method can also be applied to small lesions. It should enable physicians to track
objectively changes in disease status that could otherwise be obscured by the
uncertainties in the region-of-interest drawing.
2.4.3 From Segmentation to 3D Model Creation
Segmentation is an important step toward the creation of 3D models; however, an
important component is still missing—the actual generation of the 3D polygonal structures
that represent the segmentation created in each of the slices. It was in 1987 that
Lorensen and Cline (Cline et al. 1987;Cline et al. 1991) developed a robust method
that enabled the creation of 3D models from scanned data. Marching Cubes is their
algorithm for rendering isosurfaces in volumetric data and is the most significant
contribution to this field.
The basic notion in the Marching Cubes algorithm is that we can define a voxel
(cube) by the pixel values at the eight corners of the cube. The idea is to 'march'
through each of the cubes testing the corner points and replacing the cube with an
appropriate set of polygons. If the cube's corner values straddle the user-specified
isovalue, that particular cube must contribute some component of the isosurface. The
intersecting edges of the cube can then form triangular patches that divide the cube
into inside and outside regions. By connecting the patches from all cubes, a 3D
surface can then be created. If we classify each of the corners of the cubes as
either being below or above the isovalue, there are 256 possible configurations of
corner classifications. The key is to decide where along each of the cube edges the
isosurface crosses, and use these edge intersection points to create one or more
triangular patches for the isosurface. The genius of Lorensen and Cline is that they
realized that if you account for symmetries, there are really only 14 unique
configurations among the remaining 254 possibilities. When there is only one corner less
than the isovalue, this forms a single triangle which intersects the edges which meet
at this corner, with the patch normal facing away from the corner.
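The corner-classification step can be sketched as follows. The full 256-entry edge and triangle lookup tables are omitted; the cube-index computation and the linear edge interpolation shown here are the standard ingredients of the algorithm and are not taken from the thesis implementation.

    import numpy as np

    # Corner offsets of one cube within the volume (corner ordering conventions vary
    # between implementations; this one is only illustrative).
    CORNERS = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],
                        [0, 0, 1], [1, 0, 1], [1, 1, 1], [0, 1, 1]])

    def cube_index(volume, x, y, z, isovalue):
        """Build the 8-bit case index: bit i is set if corner i is below the isovalue."""
        index = 0
        for i, (dx, dy, dz) in enumerate(CORNERS):
            if volume[x + dx, y + dy, z + dz] < isovalue:
                index |= 1 << i
        return index  # 0..255; used to look up which edges are intersected

    def interpolate_edge(p1, v1, p2, v2, isovalue):
        """Linearly interpolate the isosurface crossing along one cube edge."""
        t = (isovalue - v1) / (v2 - v1)
        return np.asarray(p1, dtype=float) + t * (np.asarray(p2, dtype=float) - np.asarray(p1, dtype=float))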
Hence, the volume can be processed in slabs, where each slab is comprised
of 2 slices of pixels. We can either treat each cube independently, or we can
propagate edge intersections between cubes which share the edges. This sharing
can also be done between adjacent slabs which increases storage and complexity a
bit but saves in computation time. The sharing of edge/vertex information also results
in a more compact model, and one that is more amenable to interpolated shading
(Watt 1985).
Each 3D model that is created is represented by a 3D data set. The first
segment of the data is a vertex list, which is simply a numbered list of all the points
(x,y,z) in the data in the object's own coordinate system. The second segment
contains an edge list. This list provides information as to how each of the vertices is
connected to form triangular patches. For instance, the list could specify that vertices
(from the numbered list in segment 1) 1, 2 and 3 are connected to form a triangle, as
do 3, 5, and 9, etc. The last segment of the file contains vertex normal vectors which
are used in the 3D rendering process to ensure correct rendering features such as
proper lighting.
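The three-segment file layout described above can be mirrored by a simple in-memory structure; the field names below are hypothetical and chosen only to match the vertex-list, triangle-list and normal-list description.

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class SurfaceModel:
        # Segment 1: numbered vertex list, (x, y, z) in the object's own coordinate system.
        vertices: List[Tuple[float, float, float]] = field(default_factory=list)
        # Segment 2: triangle list; each entry holds three 1-based vertex indices.
        triangles: List[Tuple[int, int, int]] = field(default_factory=list)
        # Segment 3: per-vertex normal vectors used for interpolated (smooth) shading.
        normals: List[Tuple[float, float, float]] = field(default_factory=list)

    # A single triangular patch connecting vertices 1, 2 and 3.
    model = SurfaceModel(
        vertices=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
        triangles=[(1, 2, 3)],
        normals=[(0.0, 0.0, 1.0)] * 3,
    )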
Figure 2-7: 3D models generated from CT scans of our phantom skull. Each patch of the 3D model contains a
corresponding file item which specifies its vertex list and edge list.
2.4.3.1 Tracking Technology
One of the most important issues to consider for a very accurate AR and VR
application is the method for tracking the various elements of the environment such
as the video camera and the tools and the patient (Pandya A.K. 2001d) (Azuma R.
1997). It is provided here as background information. In the Augmented world, since
slight deviations of the virtual and actual world are very noticeable, the requirement
for tracking accuracy is critical. Trackers also determine exactly how accurately the
registration between the virtual and actual object will be. Not many trackers can meet
the required specifications for AR systems and each technology has particular
strengths and drawbacks. Our research has focused on at least four different general
methods for camera and object tracking: (1) tracking using a stereoscopic infrared
camera system, (2) using a precise robotic arm with a camera mounted on it, (3) using
image processing methods and pattern recognition techniques for camera calibration
tracking and (4) fiber optics tracking. Hybrid methods have also been considered and
offer the advantage of redundancy at the expense of computational cost and
complexity.
AR scene synthesis needs several pieces of information including a
segmented object model, camera parameters, camera pose, a video stream and a
transformation that describes the object position and orientation (sensed or known).
The generation of this scene entails correct registration of the graphical viewpoint
with the actual camera view and a mixing of the video frames with the exact graphical
view of the object of interest.
Figure 2-8 represents the data flow in our implementation of AR. Given these
data, an AR scene can be synthesized. An
important aspect of this figure is camera and object tracking. We are researching
what method or combinations of methods are appropriate for tracking components of
the environment. Camera tracking, which is needed to provide the geometrical (pose)
parameters, can be done in many ways. Tracking can be achieved by mounting a
camera on a robot, which provides the geometrical information through forward
kinematics, or by tracking the camera itself by some measurement method for
example using an infrared tracking system. Image processing methods can also
derive camera position and orientation (extrinsic parameters). The camera calibration
procedure (Image Processing) is also needed because it provides the optical (focal
length and lens distortion) parameters of the camera (intrinsic parameters).
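Conceptually, once the intrinsic parameters and the camera pose are available, each vertex of the segmented model can be projected into the current video frame and drawn over it. The sketch below assumes a simple pinhole model with hypothetical matrices and ignores lens distortion.

    import numpy as np

    def project_points(model_pts, T_cam_obj, K):
        """Project 3D object-space points into pixel coordinates.

        model_pts: (N, 3) vertices of the segmented object.
        T_cam_obj: 4x4 object-to-camera transform (from tracking and registration).
        K:         3x3 intrinsic matrix (focal lengths and principal point).
        """
        pts_h = np.hstack([np.asarray(model_pts, dtype=float),
                           np.ones((len(model_pts), 1))])
        cam = (T_cam_obj @ pts_h.T)[:3]      # points in the camera frame
        pix = K @ cam                        # pinhole projection
        return (pix[:2] / pix[2]).T          # (N, 2) pixel coordinates

    def overlay(frame, pixels, value=255):
        """Mark projected vertices in the video frame (a stand-in for real rendering)."""
        h, w = frame.shape[:2]
        for u, v in np.round(pixels).astype(int):
            if 0 <= v < h and 0 <= u < w:
                frame[v, u] = value
        return frame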
The following are some general comments derived from our research on each
of the tracking methods with which we have experimented. There are some
limitations and strengths for using each of the systems outlined. Recently, infrared-
based sensors have been developed that are based on three cameras fitted with
linear charged coupled devices (CCDs) and cylindrical lenses that detect tiny infrared
light-emitting diodes (LEDs). The systems consist of an array of CCD cameras that
track instrument position by localizing LEDs located on the object of interest. We
have used this method for routine tracking of tools and patients for neurosurgery
applications (Li Q. 1999a). It is a very efficient technique with very high accuracy.
Line-of-sight and lighting conditions are major drawbacks of infrared tracking. The
virtual objects will only appear when the tracking marks are in view and the lighting
conditions are properly adjusted. There have been incidents in the OR where the
surgeon’s headlight would disable the infrared tracking system. When the surgeon
looked away from the monitor/tracker, the system would be operational; when he
needed the information and looked toward the monitor, the system would fail because the
light from the headlight would interfere with the infrared cameras. Because of the
cluttered OR environment, there have been several cases where the instrumentation/
personnel have been in the line-of-sight of the infrared system. When this occurs, the
system cannot function. Another limitation of infrared cameras is their range. Tracked
objects must be within the optimal tracking volume of the tracker, which is roughly one
cubic meter.
Figure 2-8: Data Flow for an AR Scene Synthesis.
Pattern Recognition technology uses computer vision techniques to calculate
the camera orientation relative to a pattern (Kato H. 2000). (Billinghurst et al. 2001)
are giants in the field of AR with pattern recognition. They provide a useful AR toolkit
for software development with which we have experimented. We use certain features
of this toolkit (like video mixing) in our implementation. In this form of AR, the video frame
is turned into a binary image based on some predefined threshold value. This value
can be influenced by the lighting conditions of the environment. The binary image is
then searched for certain regions that include the tracking markers. The tracking
pattern is then captured and compared to a database of pre-trained pattern templates
of that particular pattern. If there is a match, the software then has to calculate the
position of the real video camera by knowing the particular pattern parameters and
pattern orientation relative to the physical marker. Once the coordinate system of the
pattern relative to the camera is computed, any tracked or sensed objects within this
coordinate can be placed in their corresponding position. In pattern recognition, the
larger the physical pattern, the further away the pattern can be detected and thus the
greater the tracking volume.
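A rough sketch of this pipeline is given below using OpenCV as a stand-in for the toolkit's own pose computation; the marker size, threshold value, and the corner-detection/template-matching step (not shown) are illustrative assumptions.

    import cv2
    import numpy as np

    MARKER_MM = 80.0  # assumed physical edge length of the printed pattern
    # 3D corner coordinates of the marker in its own (planar) coordinate system.
    OBJECT_CORNERS = np.array([[0, 0, 0], [MARKER_MM, 0, 0],
                               [MARKER_MM, MARKER_MM, 0], [0, MARKER_MM, 0]],
                              dtype=np.float32)

    def binarize(frame_gray, threshold=100):
        """Threshold the video frame into a binary image (value is scene dependent)."""
        _, binary = cv2.threshold(frame_gray, threshold, 255, cv2.THRESH_BINARY)
        return binary

    def marker_pose(image_corners, K, dist_coeffs):
        """Estimate camera pose from the four detected marker corners.

        image_corners: (4, 2) pixel coordinates of the marker corners, ordered to
        correspond with OBJECT_CORNERS.
        """
        ok, rvec, tvec = cv2.solvePnP(OBJECT_CORNERS,
                                      np.asarray(image_corners, dtype=np.float32),
                                      K, dist_coeffs)
        R, _ = cv2.Rodrigues(rvec)  # rotation of the marker in the camera frame
        return ok, R, tvec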
One of the other tracking technologies that we have investigated to track
various objects in the environment (e.g. the camera) is fiber optics. The first step we
have taken is to understand the complex shape data that is generated and to
ascertain its accuracy. One of the possible uses of this kind of tracking device is to
track camera position and orientation of a flexible endoscope. If it was accurate
enough, this technology could be used to augment the view of a flexible endoscope
with the graphical models of structures in the environment. The “Shape Tape” is a
fiber optic device with characteristics well-suited to this work. It is based on fiberoptic
technology and can report 3D information including flexion/extension and
bending/twisting motions. It does this with specially treated fibers to sense curvature
(bend and twist). These sensors have been treated on one side to lose light
proportional to bending (Danisch 1997). The lost light is contained in absorptive
layers that prevent interaction of light with the environment. Modulation of light
throughput is very linear with curvature, and uses over 30% of available throughput
over a typical sensor range. Although this technology has great potential, our
measurements of the accuracy of this sensor indicate that the accuracy (on the order
of 1-2cm) is not sufficient for the medical domain. There are more advanced fiberoptic
sensors that are being developed and may have greater potential application in this
field in the future. This sensor is mentioned here as a placeholder for future work
when the technology is improved.
Electromagnetic sensors are attractive because they are relatively inexpensive
and do not require a “line of sight” between the transmitter and the receiver.
Magnetic digitizers use a transmitter that generates an electromagnetic field over the
operative field. Since the magnetic field is very regular and well known, the position and
orientation of three orthogonal coils (in which the magnetic field induces proportional
currents) can be determined. These probes can detect gradients in the magnetic field
in three dimensions. Ferrous metals within the environment distort the
electromagnetic field and render the system inaccurate. However, Louis et al. (Louis
1999) used a magnetic tracker (Flock of Birds, Ascension Technologies, Inc.) for a
virtual reality based system for cervical spine measurements. The magnetic sensors
were attached to the head and torso and enabled measurement of the translational
and rotational movements of the head with respect to the torso. We have used
magnetic sensors in our work in the development of a VR system for space station
applications (See Figure 2-2) (Chmielewski et al. 1998;Goldsby M.E. 1994;Pandya
A.K. 1994). These sensors have on the order of 5 mm of accuracy and are on the
borderline of what is needed for medical applications.
Figure 2-9: Fiber Optics technology that was studied as a possible candidate for AR/IGS tracking.
Fluoroscopy is another method used by some researchers to measure
movement. Fluoroscopic images are based on X-rays and hence are invasive.
Acquiring these images can be time consuming, and dynamic situations can be difficult to
capture. The accuracy of determining the relative vertebral motion is good; however,
the fluoroscopic image has a limitation in that it is a two-dimensional snap shot of a
three dimensional motion. If careful analysis is not done, 3D information is difficult to
capture and see in fluoroscopic images. One way to get around this problem is to
use multiple axes of fluoroscopic images and fuse the images together. (Komistek et
al. 2003) studied cervical disc degeneration. Their study focused on the
determination of the in vivo kinematics during active flexion and extension of normal,
degenerated, and fused cervical spines.
There are some researchers that have come to the conclusion that hybrid
methods of tracking must be used. (Rosenthal et al. 2002) have used fiducial
tracking in combination with standard magnetic tracking. In this system, the hardware
tracks fiducials in the video images where the locations of each of the fiducials are
known. They have achieved superior tracking using this methodology. The position
and orientation of the viewer is computed by inverting the projection operation. The
position data from the magnetic tracking system aids in the localization of the tracking
markers. This technique will work well when lighting conditions are stable and fiducial
trackers can easily be placed on objects of interest. In the medical domain, this may
not always be possible.
The solution that is chosen in this thesis is to use a robotic device with a
precisely mounted end-effector camera. A robotics-based camera overcomes the
problems of line-of-sight and lighting issues, but adds the limitation of range. The
robotic solution is dependent on the robotic kinematics and the range of motion of
each of the joints and their accuracy. This aspect of tracking will be considered in
detail in Chapter 4. For the restricted volume needed for robotic surgery applications,
this solution is especially attractive. Also, since the robot is already in place for this
application, this tracking method would be practical and relatively straight-forward to
implement.
2.5 What is the Importance of Human Factors in Medicine and Engineering?
It is the premise of this work that technology development is necessary, but not
sufficient for successful application in medicine. In the development paradigm
chosen, important components are the technology development, user testing and
surgical testing. As the technology is developed, the new technology is first tested for
accuracy, and then it is tested with user feedback against the conventional
technology. Performance data is gathered on the subjects performing the tests. The
metrics used can be, for example, the number of errors made during the test, the time to
complete the task, and tests to see what insight is gained by the user. An example of
this method is the way in which we studied the technology of using a heads-up display
as compared to traditional monitor viewing for endoscopic surgery.
In this study, after all the components were tested and configured, a set of
subjects (22) were tested to see if there was any increase in performance on a
particular simulation of a surgical task. Statistical significance was shown for the
increase in performance, and the technology was then successfully tested in the
operating room with positive results. The surgeon provided feedback that was used
to enhance the technology and improve the user testing. Using this methodology,
the technology development is grounded and balanced with subject testing and
extensive end-user feedback. This is bound to lead to solutions that provide more
return on investment and will be closer to actual use in surgery for the benefit of
patients (see Figure 2-10).
Figure 2-10: Successful technology development for medicine must include extensive user testing and surgical feedback.
Chapter 3: Image Guided Surgery (IGS)
Any sufficiently advanced technology is indistinguishable from magic.
- Arthur C. Clarke
The science of presenting and displaying complex 3D images in an operationally
meaningful way to a surgeon needs to be studied systematically. For this thesis several
test beds were developed to evaluate different techniques of image data visualization.
One of the forms implemented was an Image Guided Surgery (IGS) system. This
chapter covers the description, implementation, operations and accuracy testing of such
systems.
3.1 Literature Review and Description of System
Throughout every operation, a surgeon must maintain a precise sense of
complex three-dimensional anatomical relationships. IGS was first used for
neurosurgery. Accurate visualization is crucial in neurosurgery because visual
landmarks are relatively rare, and they are completely missing within gray matter.
Damage to eloquent portions of the brain anatomy can severely impair the patient.
Although neurosurgeons were the first to embrace this technology (due to the relatively
static nature of the brain) this technology is starting to impact all areas of medicine that
use an image-based approach (Berry et al. 2003;Hinsche and Smith 2001;Holly and
Foley 2003). IGS has made a tremendous impact and is here to stay. As imaging
systems become more integrated with the operating room and imaging becomes more
real-time, the tools of IGS will become even more useful.
Often lesions are surrounded by vital neurological and vascular structures and
have irregular configurations. This poses real problems during surgery in terms of
orientation, visualization and optimal tumor resection. Although stereotactic systems
provide the necessary position and orientation, the type of imaging data plays a key role
in the effectiveness of these operations. Diverse modes and types of imaging provide
alternative types of information. For instance, if the anomaly is close to the speech or
motor center, the FMRI scan will be done while the patient is speaking. The speech
center is enhanced on the scan and the surgeon, during image guidance, can avoid this
area that is displayed relative to his tool’s trajectory.
Advances in stereotactic science are transforming medicine and surgery (Wadley
and Thomas 2000) in preoperative evaluation, operative technique, intraoperative
monitoring and data collection. Computational power enables effective application of
technologies requiring high-resolution visualization and precise control and
manipulation of surgical instrumentation. Image Guidance Systems use stereotaxis
which divides the brain into three intersecting orthogonal spatial planes (sagittal,
coronal and axial, See Figure 3-1). These planes provide a rigid coordinate system
from which all the slices of the scan can be referenced. In other words, each point in
the patient’s MRI scan has a coordinate value relative to the imaging study's reference
frame.
As described in section 2.1, Image guidance involves pre-operative imaging
studies such as CT or MRI scans of the patient. These scans provide the surgeon with
different views of the pathology. IGS uses a methodology that translates into accurate
and reliable image-to-surgical space guidance. It is analogous to having a global
positioning system for the human body. These systems are primarily used as
navigation systems, but, as intelligence and a knowledge base is built into the systems
of the future, it may be possible to give optimal and least dangerous paths, warnings of
approaching dangers, signals and annotations of different organs or even information
from embedded smart sensors as guidance to a surgeon in real-time.
Figure 3-1: The three planes of an image data set. (Axial, Coronal, and Sagittal)
A relatively new field, IGS blends the use of computer-based medical imaging
data with real-time instrument position data capture to assist the surgeons in localizing
and removing lesions. The methodology involves three components: image acquisition,
with definition of a coordinate space from one or several imaging modalities; planning or
simulation of the surgical procedure, and intraoperative patient registration procedures
(Li Q. 1999a;Li Q. 1999b). Because the field is relatively new and much of the
technology is related to Augmented Reality, in the next section, the technical details of
how such a system can be implemented are given.
3.2 Implementation of Image Guidance System
The implementation of an IGS system has three basic components—tracking,
registration and image display. As described in section 2.4.3.1, there can be several
tracking technologies used for navigation systems. In this thesis, due to the emphasis
on ultimately enhancing robotic technology, the tracker that was chosen was a passive
robot arm (also considered an articulated arm). Due to its kinematic similarities to
active robotic devices, we have chosen the Microscribe arm which will streamline the
translation of the developed imaging technology from this test bed to active robots. A
section on the Microscribe tracker is given first, along with some kinematic details; next, a
section describes how the patient or phantom is registered to the imaging data; and finally a
section describes how the software is implemented.
3.2.1 Passive Robot Arm Used as the Tracking system
The critical component in interactive image-guided surgery is the use of an
intraoperative localizer system or a digitizer, which ultimately provides the surgeon
useful navigational information usually in the form of position and/or orientation of
surgical instruments. Infrared tracking is one of the most popular methods used in
stereotactic neurosurgery. The typical IGS system allows the surgeon to view the
trajectory that their tracked tool is taking through the tissue it is penetrating (See Figure
3-10). The Microscribe device can be considered a passive robot (articulated arm). Its
geometric and transformation structure similarities make it an inexpensive and useful
analog / test bed to current robotic systems. It has the advantage of being readily
accessible and amenable to quick prototype development and evaluation. We have
integrated the serial interface to the five degree-of-freedom Microscribe and can get a
precise position and orientation of its end-effector and any tool rigidly fixed to its end
point.
Figure 3-3 illustrates a forward kinematics solution (i.e., the coordinates of the robot's
end-effector in terms of the base coordinates) for the Microscribe. The dotted
green arrows on the figure show the 5 degrees of freedom (DOF) (rotation axes) of the
device. Each individual transformation matrix (T) (See Equation 3-1) specifies how the
first joint is related to the second joint by describing the rotations (in terms of the
direction cosines) and the translations needed to transform one joint coordinate to
another. Knowing the joint angles of each DOF, a forward kinematics solution can
be computed by matrix multiplication to form a homogeneous (T) transformation matrix
that defines the position and orientation of the end-effector in the base coordinate
system.
T_{B\text{-}EE} = T_{B\text{-}J0}\, T_{J0\text{-}J1}\, T_{J1\text{-}J2}\, T_{J2\text{-}J3}\, T_{J3\text{-}EE} \qquad \text{(Equation 3-1)}
Each of the transformation matrices described above is a homogeneous
transform which has both the rotation component and a translational component as
specified by the equation below. The unique features of the homogeneous transform
are that the vectors of the rotation matrix (unit vectors) of the transformed coordinate
system are referenced in the row vectors of the matrix in the base coordinate system.
In other words, the inverse of the matrix is simply formed from the transpose of the rotation
matrix and the correspondingly rotated and negated translation vector, which makes these transforms mathematically elegant
T = \begin{bmatrix} r_{00} & r_{01} & r_{02} & t_0 \\ r_{10} & r_{11} & r_{12} & t_1 \\ r_{20} & r_{21} & r_{22} & t_2 \\ 0 & 0 & 0 & 1 \end{bmatrix}
  = \begin{bmatrix} R & t \\ 0\;0\;0 & 1 \end{bmatrix},
\qquad
T^{-1} = \begin{bmatrix} R^{\mathsf T} & -R^{\mathsf T} t \\ 0\;0\;0 & 1 \end{bmatrix}
\qquad \text{(Equation 3-2)}
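The chain multiplication of Equation 3-1 and the inverse form implied by Equation 3-2 can be written compactly; this is a generic sketch that does not encode the Microscribe's actual link geometry, whose individual joint transforms would have to be supplied.

    import numpy as np

    def homogeneous(R, t):
        """Assemble a 4x4 transform from a 3x3 rotation and a 3-vector translation."""
        T = np.eye(4)
        T[:3, :3] = R
        T[:3, 3] = t
        return T

    def invert(T):
        """Inverse of a rigid transform: transpose the rotation, rotate and negate t."""
        R, t = T[:3, :3], T[:3, 3]
        return homogeneous(R.T, -R.T @ t)

    def forward_kinematics(joint_transforms):
        """T_B-EE = T_B-J0 @ T_J0-J1 @ ... @ T_J3-EE (Equation 3-1)."""
        T = np.eye(4)
        for Ti in joint_transforms:
            T = T @ Ti
        return T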
Because the measurement of the end-point of this system involves several joints,
we performed an experiment to evaluate with what precision the arm could capture the
same point using various arm configurations. This is an issue because a user of the
system who is asked to digitize a certain point can do so with a variety of joint angles.
This is important for the next step (patient registration) where several points need to be
captured from special markers on the patient's skull. Our results indicate that there is an
rms average error between all these points of about 0.73 mm in the capture of a
particular point. This is a gauge of the precision that we can expect from this instrument.
In another experiment we used a 3D ruler (which is accurate to about 0.5mm) and
measured a known point. For the measurements we made, the average error was
0.84mm from the actual point (see Figure 3-2). We can conclude that the accuracy
of the Microscribe is within 1mm. More details of the error measurements are provided
in Chapter 4 where error is considered in more detail.
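The repeatability figures quoted above can be computed directly from the captured points; a minimal sketch (with synthetic sample data in place of the actual measurements) is:

    import numpy as np

    def repeatability(points_mm):
        """RMS and standard deviation of distances from the mean of repeated captures."""
        points = np.asarray(points_mm, dtype=float)
        center = points.mean(axis=0)
        d = np.linalg.norm(points - center, axis=1)
        return np.sqrt(np.mean(d ** 2)), d.std()

    # Fifteen captures of the same physical point in different arm configurations
    # (synthetic values, for illustration only).
    rng = np.random.default_rng(1)
    captures = np.array([326.8, 40.0, 12.5]) + 0.5 * rng.standard_normal((15, 3))
    rms_mm, std_mm = repeatability(captures)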
Figure 3-2: In this experiment, we captured the same point in space using different arm configurations. We found a standard deviation of 0.91mm. The red line represents the average.
Figure 3-3: This is the transformation structure of a typical robotic device. The green arrows show the degrees of freedom of the arm. The blue labels (with red arrows) show the transformations needed to compute the base to end-effector transform.
3.2.2 Patient Registration with Fiducial Mapping
One of the most fundamental problems in IGS systems is the registration of
objects in the scene (Gong J. 1999). Patient registration involves the determination of
the spatial relationship between the image and the surgical coordinate space (Alp et al.
1998;Ferrant et al. 2002;Fleute and Lavallee 1999;Weese et al. 1998). The objects in
the real and virtual worlds must be accurately aligned. Without this alignment, the scene
will not be accurate. Certain applications (such as brain surgery or biopsy) require
even higher fidelity of registration. Registration is the process of defining special points
based on the fiducial markers or anatomical landmarks from CT, MRI, or PET scan data
and relating them to the corresponding patient data. In Figure 3-5 the 3D model of the
phantom skull (generated by the segmentation process of the CT images of the
phantom) shows the locations (in blue) of the corresponding fiducial markers (which
were visible on the CT scan). These points are then correlated with the points located
on the head of the patient in the “real world” in the operating room. The goal is to match
and correlate data from the medical images to the “real world” (i.e., the coordinate
space of the surgical instruments) (See Figure 3-5). Each fiducial in the image space is
located in each of the 3 orthogonal slices and is also verified on the 3D model of the
phantom. The exact position of each of these fiducials in the image coordinate system
is recorded in a file.
The next step is the patient/model registration step. A tracking device (the
articulated arm) is attached to the instruments to continually relay information regarding
the position and orientation of its tip (end-effector). Each of the fiducial points is located
in the image space and corresponds to an actual point on the surface of the model. All
the actual points are then digitized as shown in Figure 3-7. The paired points (image
space coordinates and robot space coordinates) need to be matched in such a way as
to provide a rigid-body transformation matrix which describes their relationship.
Coordinate matching ensures that any point seen in a medical image corresponds to an
actual point in the patient’s anatomy. To avoid problems of accuracy and also to reduce
the effect of noise, we usually capture more information (points) than is theoretically
needed. The computation can be done with as few as 3 points; however, there is a risk
of computing the wrong transformation. There are various errors that make the
computation of an iterative solution justified. There are errors not only in the tracking
device, but also in the capturing of points in both the image space and the patient space.
These inaccuracies in point position make the computation of the exact transformation
between the coordinate systems a little tenuous. Also, typical fiducials that are used for
patient registration have a design which is conducive to error. The flat middle portion of
the fiducial has a 2-3mm area in which the user can place the digitizer probe anywhere.
A funnel shaped design would allow for a much more accurate measurement as it
would allow the tool tip to be guided to the exact position. Also, just how many fiducial
points are needed and in what configuration they should be placed is an active area of research
(Cash et al. 2003). Surface matching is also a technique that is used, and the results
from this form of registration are as good as fiducial matching (Gong J. 1999).
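For reference, a closed-form alternative to the iterative estimation described in the next section is the well-known SVD-based least-squares solution for paired points; the sketch below is a generic implementation of that technique and is not the VoXim or thesis code.

    import numpy as np

    def register_paired_points(patient_pts, image_pts):
        """Least-squares rigid transform mapping patient-space points to image space.

        patient_pts, image_pts: (N, 3) arrays of corresponding fiducial positions, N >= 3.
        Returns a 4x4 homogeneous transform T such that image ~= T @ patient.
        """
        P = np.asarray(patient_pts, dtype=float)
        Q = np.asarray(image_pts, dtype=float)
        Pc, Qc = P - P.mean(axis=0), Q - Q.mean(axis=0)
        U, _, Vt = np.linalg.svd(Pc.T @ Qc)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:          # guard against a reflection solution
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = Q.mean(axis=0) - R @ P.mean(axis=0)
        T = np.eye(4)
        T[:3, :3], T[:3, 3] = R, t
        return T

    def fiducial_registration_error(T, patient_pts, image_pts):
        """RMS distance between transformed patient fiducials and the image fiducials."""
        P = np.asarray(patient_pts, dtype=float)
        Q = np.asarray(image_pts, dtype=float)
        Ph = np.hstack([P, np.ones((len(P), 1))])
        residual = (T @ Ph.T).T[:, :3] - Q
        return np.sqrt((np.linalg.norm(residual, axis=1) ** 2).mean())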
3.2.2.1 Rigid-body transformations
In order to compute a transformation matrix that can translate/rotate a vector in
the patient space to the image space, 6 parameters must be known (3 translation and 3
rotation) (See Figure 3-4). For computation flexibility, these 6 parameters can be
converted into a 3x3 rotation matrix (consisting of the direction cosine vectors) and a
3x1 translation vector combining to make a 4x4 homogeneous transformation matrix
that can easily allow transformation between these coordinate systems by matrix
multiplication. Figure 3-4 describes the relationship between the same two points in
different coordinate systems. The equation below describes how a point in the patient
coordinate system can be transformed to a point in the image coordinate system using
a simple vector-to-matrix multiplication. The 4×4 transformation matrix is the one that is
estimated from the pair-point matching procedure (Gong J. 1999) to be discussed next.
Figure 3-4: The transformation between the patient coordinate system and the image coordinate system can be represented by a translation vector (T) and a rotation vector (R). These entities form the homogeneous 4x4 transformation matrix.
\begin{bmatrix} p_x \\ p_y \\ p_z \\ 1 \end{bmatrix}_I
=
\begin{bmatrix} r_{00} & r_{01} & r_{02} & t_0 \\ r_{10} & r_{11} & r_{12} & t_1 \\ r_{20} & r_{21} & r_{22} & t_2 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} p_x \\ p_y \\ p_z \\ 1 \end{bmatrix}_P
\qquad \text{(Equation 3-3)}
In rigid-body transformations, there are two transformation entities – a translation
and a rotation. Given a rotation vector which represents rotations around the x, y and z
axes respectively, a rotation matrix can be computed which relates two orthogonal
coordinate systems having the same origin but differing in these rotation angles. Each
individual rotation is given by the following three sets of equations:
R_x(\alpha)=\begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{bmatrix},\quad
R_y(\beta)=\begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix},\quad
R_z(\gamma)=\begin{bmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{bmatrix}
\qquad \text{(Equation 3-4)}
The combined effect of these rotation matrices is given by the following
equation:
R_z(\gamma)R_y(\beta)R_x(\alpha)=
\begin{bmatrix}
\cos\gamma\cos\beta & \cos\gamma\sin\beta\sin\alpha-\sin\gamma\cos\alpha & \cos\gamma\sin\beta\cos\alpha+\sin\gamma\sin\alpha \\
\sin\gamma\cos\beta & \sin\gamma\sin\beta\sin\alpha+\cos\gamma\cos\alpha & \sin\gamma\sin\beta\cos\alpha-\cos\gamma\sin\alpha \\
-\sin\beta & \cos\beta\sin\alpha & \cos\beta\cos\alpha
\end{bmatrix}
\qquad \text{(Equation 3-5)}
In order to convert a matrix back to Euler angles, one has to work from the
components of the above matrix and convert the direction cosine values back to angles.
For instance, the last component of the third row (2,2) of the cosine matrix can be used
to recover β from its cosine, from which we can use β to compute α by using
component (2,0) of the above matrix. Then β is used again to get γ by using
components (0,2), (1,2) and (2,1).
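Both directions of the conversion can be sketched compactly. The sketch below follows the Rz·Ry·Rx convention of Equation 3-5, so the specific matrix components used to recover the angles are those of that convention and may be indexed differently from the description above.

    import numpy as np

    def euler_to_matrix(alpha, beta, gamma):
        """Compose R = Rz(gamma) @ Ry(beta) @ Rx(alpha) as in Equation 3-5."""
        ca, sa = np.cos(alpha), np.sin(alpha)
        cb, sb = np.cos(beta), np.sin(beta)
        cg, sg = np.cos(gamma), np.sin(gamma)
        Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
        Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
        Rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
        return Rz @ Ry @ Rx

    def matrix_to_euler(R):
        """Recover (alpha, beta, gamma) from the direction-cosine matrix (non-degenerate case)."""
        beta = -np.arcsin(R[2, 0])               # R[2,0] = -sin(beta)
        alpha = np.arctan2(R[2, 1], R[2, 2])     # cos(beta)sin(alpha), cos(beta)cos(alpha)
        gamma = np.arctan2(R[1, 0], R[0, 0])     # sin(gamma)cos(beta), cos(gamma)cos(beta)
        return alpha, beta, gamma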
Figure 3-5: Fiducial markers (visible in CT scans) are applied to the phantom before the scan.
Figure 3-6: A Fiducial marker in the imaging space is located on all three slices of the CT scan and is displayed (in blue) on the 3D model.
In order to convert rotation angles into a transformation matrix and vice versa,
equation 3-5 must be used. This conversion in effect reduces the number of
optimization parameters from 12 down to 6 as the entire direction cosine matrix and the
translation vector no longer need to be used in this minimization process. This is very
important for the convergence and complexity of the algorithm that we will cover next
(Van Loan 2000).
Figure 3-7: On the left, the fiducial marker on the actual skull is being digitized by the robotic articulated arm. This point corresponds to the one shown on the 3D model (right).
Figure 3-8 illustrates, at a conceptual level, how parameter estimation is applied
to the image space to patient space registration and what linear equation sets need to
be solved. Central to parameter estimation is a model of the system (See Figure 3-9).
This model takes a set of parameters (which need to be optimized) and the input set of
observed values. Given this information, the model will compute the corresponding
output set of values. The output set is compared to the corresponding observed values
to ascertain if the values are within a certain tolerance. If not, the parameters will be
updated using a steepest-descent algorithm and the process will be iterated upon until a
solution has been reached (Van Loan 2000) (Gong J. 1999).
In general, in order to optimize a particular set of parameters, we must first
define an objective function (χ²) which corresponds to the model of the system. In
order to use a steepest descent approach, we must be able to compute the gradient of
the objective function at the particular estimated parameter set (∇χ²(a)). The gradient
of a function gives the vector where the function takes the steepest slope (Press W.H.
1992). It is akin to a mountain stream which follows the gradient of the mountain to get
to the lowest possible minimum. There are dangers in that the function can get “stuck”
in a local minimum and not find a global minimum value. In order to take a functional
step in the direction of the gradient, the following equation must be used (Kaplan 1981).
a_{\text{next}} = a_{\text{current}} - \lambda\, \nabla\chi^2(a_{\text{current}}) \qquad \text{(Equation 3-6)}
Figure 3-8: Parameter estimation for pair-point matching algorithm. The paired points (fiducials in image coordinates and patient coordinates) are used to derive a rigid body transformation that will translate a vector in patient space to a vector in image/3D model space.
Once the function can be evaluated at its gradient, the next iteration can be
taken and the value of the parameter vector (a) can be refined such that the correct
values can be achieved. We use the Levenberg-Marquardt method (Van Loan 2000)
for the computation of the descent steps, and it produces fast convergence
and sufficient accuracy for this application.
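A bare-bones version of the update loop of Equation 3-6 is sketched below with a numerically estimated gradient; the actual system uses the Levenberg-Marquardt refinement mentioned above, and the step size, tolerance and model callable here are arbitrary placeholders.

    import numpy as np

    def chi_squared(params, patient_pts, image_pts, model):
        """Sum of squared residuals between transformed patient points and image points."""
        predicted = model(params, patient_pts)
        return np.sum((predicted - image_pts) ** 2)

    def steepest_descent(params, patient_pts, image_pts, model,
                         step=1e-3, tol=1e-8, max_iter=5000, h=1e-6):
        """a_next = a_current - step * grad(chi^2)(a_current)   (Equation 3-6)."""
        a = np.asarray(params, dtype=float)
        for _ in range(max_iter):
            base = chi_squared(a, patient_pts, image_pts, model)
            grad = np.zeros_like(a)
            for i in range(a.size):          # finite-difference estimate of the gradient
                ai = a.copy()
                ai[i] += h
                grad[i] = (chi_squared(ai, patient_pts, image_pts, model) - base) / h
            a_next = a - step * grad
            if abs(chi_squared(a_next, patient_pts, image_pts, model) - base) < tol:
                return a_next
            a = a_next
        return a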
Once all the observed pairs can be predicted by the adjusted parameters to
within a certain tolerance, the results are reported. The 6 parameters reported (namely
the three translations and three rotation angles) are then converted to the
homogeneous transformation matrix. The real-time software system is then able to
apply this transformation to all the positions and orientations of the patient coordinate
system to produce the same point in the image coordinate system. Hence, the end-
effector vector of the tool being tracked can then be displayed in the image space or the
3D model space using this transformation.
If multiple sets of images are taken (for instance CT/MRI and PET scans),
then the same methodology as above can be used to compute a transformation matrix
to align or overlay the images from multiple modalities. This is called image fusion and
is an active area of research. All available preoperative image data are fused into a
uniform coordinate system that corresponds to the individual patient's brain. The
imaging data can then be presented to the surgeon in a single display for surgical
planning and computer-based operations, giving the surgeon an optimal viewing
environment.
Each imaging modality displays anatomical structures and lesions in a unique
way. This benefits the surgeon by providing several different ways to view the same
anatomical structure, and requires the development of an interactive relationship
between the images and the real world. Registration is used to build this relationship
and enables the surgeon to use each imaging modality to its greatest advantage for
localizing the anatomical structure. The registration is mathematically identical to the
patient registration already described (Press W.H. 1992). The difference is that the
pairs of matched points come from each of the image modality coordinate systems.
Figure 3-9: The model for coordinate transform estimation. The points at the input of the model are converted to a new coordinate system using the given estimated parameters.
Figure 3-10 shows a surgeon using an image-guided system in the operating
room on an actual patient (Zamorano L. 2001). In this case, the tool that she is holding
is tracked by an infrared camera. The registration has already been performed on the
fiducials applied to the patient before the imaging studies were done, and the tool's
trajectory is displayed in the image space on all three orthogonal scans of the patient's
MRI and also on reconstructed orthogonal slices relative to the tool trajectory.
Figure 3-11 shows a similar system developed in this thesis, showing the robotic
end-effector on orthogonal slices of the phantom's CT scans as well as on a 3D model of
the phantom. In this case, the differences are that the tracking system is a robotic device, the
imaging study is a CT scan, and the tool is also displayed in a 3D model of the phantom.
[Figure 3-9 diagram: model input $(X_i, Y_i, Z_i)$; parameters to be estimated $(T_0, T_1, T_2, \alpha, \beta, \gamma)$, internally converted to a 4×4 homogeneous transformation matrix with rotation elements $r_{ij}$ and translations $t_x, t_y, t_z$; model output $(X(M), Y(M), Z(M))$.]
Figure 3-10: A surgeon is using an image-guided system where the tool that she is using is tracked by an infrared camera system and displayed on the orthogonal slices of the preoperative MRI scan.
Figure 3-11: The image guidance system used in the OR was re-implemented in this thesis to use the same articulated-arm tracker as the AR system, providing a testbed for evaluating this current technology against the up-coming AR technology.
3.2.3 Software Architecture
The implementation of the software system is described in Figure 3-12. There
are three main software components: the server, the main client, and the mirror clients.
This type of system architecture is important in telemedicine, where expert collaboration
and teaching are emphasized (Pandya A.K. 2002). The server subsystem interfaces to
the robotic hardware and reads the kinematic data that the robot provides. The server
interface (Figure 3-13) allows the user to control the registration process and provides
information about the operation of the server software. It performs a read of the robot
whenever the main client requests data. Only the main client can request a read of the
robot. All other clients (anywhere on the internet) can only follow the lead of the
main client and mirror the data that it has processed. The reason for this is that each
connection to the server is implemented as a separate thread (process). Multiple
processes are forbidden from opening and requesting information from the robotic hardware
serial interface at the same time; only one thread (the main client's) can interact with the
serial port. Moreover, the main client is the software that is running where the
hardware resides. Although it is possible to control the robotic reads (and, if it were an
active robot, the joints) from a remote location, it only makes sense to do so from the surgery
site.
In addition, each client (including the main client) has a complete set of imaging
data and the associated 3D models. As computational power and graphics throughput
can be very different, each client is free to choose the fidelity of information that it can
handle. If the machine the client is executing on is very fast and has a high-end
graphics card for 3D rendering, it can display high-resolution 3D models. On the other
hand, a simpler PC may be able to handle only low-resolution image data and
simple wire-frame models of the 3D data to achieve near real-time performance. The
design implemented for this software allows this flexibility and also opens a door for
research in this very fertile area of telemedicine.
Before it is open for requests by clients, the server prompts the user to do the
pair-point registration (discussed in detail in section 3.2.2). This registration
process takes the pairs of matched points (image-space points and their corresponding
actual points) and computes the transformation matrix necessary to convert the raw
end-effector pose data to the image-space coordinate system. The server then waits
for the main client to log in. The main client (another program running on the
internet) must be the first to initiate a request. It first must send an authentication signal
and a password (for data security reasons) to the server. The server checks its list of
users and then allows the main client to send requests. The main client then initiates
the sequence by sending a request for the kinematic data that it requires. The server then
queries the robot, computes the necessary data, and sends it to the main client. Up until
this point, all mirror clients are blocked from logging in. The server then waits on the
same internet port for more requests.
Figure 3-12: Software implementation of the Image Guidance System. The system is implemented as a client-server system in which multiple clients anywhere on the internet can view the scene as seen by the main client.
If another client wants information, it must also provide authentication information to the server and must identify itself as a mirror
client. The server then checks that a main client has requested information and sends
any of its already computed information (from the latest main client request) to the
mirror clients. In other words, a mirror client can only echo what the main client is
requesting and updating. Any mirror client can request a variety of pre-computed
information from the server, which includes the individual joint angles, the end-effector
transformation in the robot's coordinate system, the end-effector transformation in image
coordinates, or even the transformation matrix that the server has computed after it has
done the pair-point matching. As this is patient information crossing
the internet, the same client/server paradigm allows an encryption scheme to be
set up so that the data can travel the internet securely in a Virtual Private Network (which is
beyond the scope of this thesis). The server software has been implemented on a PC
platform using Visual C++. The client software uses the 3D Slicer software platform
(Grimson W.E.L. 1996), written in Tcl/Tk, onto which several modules have been added
to log in to the server software and to display the end-effector in the 3D display. All
software, methodologies, and hardware specifications can be provided for non-
operating-room use (as the software has not gone through the rigorous testing required for OR
use) upon request to the author.
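The following C++ sketch is a simplified, non-networked illustration of this request policy (the names and structure are hypothetical; the real server is a multithreaded Visual C++ program with socket and serial-port handling): only the main client triggers a robot read, and mirror clients merely receive the cached result.

#include <cstdio>
#include <string>

// Simplified sketch of the request policy described above.
struct Pose { double x, y, z; };

class TrackerServer {
public:
    Pose handleRequest(const std::string& clientType) {
        if (clientType == "main") {
            lastPose_ = readRobot();      // only the main client touches the serial port
            hasData_ = true;
        } else if (!hasData_) {
            std::puts("mirror client blocked: no main-client data yet");
        }
        return lastPose_;
    }
private:
    Pose readRobot() {
        // Placeholder for the serial-port read and the registration transform.
        static double t = 0.0; t += 1.0;
        return {t, 2 * t, 3 * t};
    }
    Pose lastPose_{};
    bool hasData_ = false;
};

int main() {
    TrackerServer server;
    server.handleRequest("mirror");            // blocked until the main client has led
    Pose p1 = server.handleRequest("main");    // triggers a robot read
    Pose p2 = server.handleRequest("mirror");  // echoes the cached result
    std::printf("main: %.0f %.0f %.0f  mirror: %.0f %.0f %.0f\n",
                p1.x, p1.y, p1.z, p2.x, p2.y, p2.z);
    return 0;
}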
3.3 Discussion
The goal of this portion of the thesis was to create a test-bed IGS system for use
in subject testing. It was important that the test-bed hardware be the same as that used for the AR
system development, to maintain consistency and provide a baseline for comparative
user testing. It is important to note that the novel feature of the work in this section is the
client/server interfacing. This can be useful in a telepresence type of application where
multiple remote users of the system are needed for consultation. Many of the other
features implemented are readily available in commercial systems, as IGS has
developed substantially over the last 5 years. The major benefits of this development in
the context of this thesis are (1) that this implementation provides a common platform
that allows direct comparison of AR technology with IGS, taking away the
inconsistencies of multiple platforms, and (2) that AR is based on much of the same
technology, so an understanding of how IGS works provides a basis for understanding
AR. In the next chapter, implementation details of the AR system will be provided.
Figure 3-13: This is the Tracker Server interface. This software does the pair-point matching on the image data and also handles communication of various tracking information to multiple clients on the network.
[Figure 3-13 interface elements: the fiducial name, the image-space matched-points file, information on which clients are logged in and what they are requesting, registration and other status messages (e.g., "Registration Successful. Error = 1.2mm", "Server Listening for Requests on port 1024"), and the internet port on which the server is listening. Example requests: the main client requests position in image space; mirror client 1 requests position in image space; mirror client 2 requests the registration matrix.]
Chapter 4: MEDICAL AUGMENTED REALITY SYSTEM (MARS)
Reality is merely an illusion, albeit a very persistent one.
- Albert Einstein
As with the image guidance system described in the previous chapter, another
technical aim of this thesis is to create a test-bed Augmented Reality system (Pandya A.K.
2001e; Pandya A.K. 2001c) to be used to evaluate the comparative utility and
accuracy of this technology and to provide an easily translatable kinematics-based
AR prototype. In this chapter, an introduction to AR technology is followed by an in-
depth literature survey. Then, the details of how an AR system is implemented are
given, along with a component-by-component error analysis of the system.
4.1 Literature Review and Description of System
An Augmented Reality (AR) system generates a composite view for the user
that includes the live view fused (registered) with either pre-computed data (e.g. 3D
geometry) or other sensed data. Several researchers have
studied AR techniques for various fields within the medical domain (Azuma
1997;Blackwell et al. 2000;Cho and Neumann 2001;Freysinger et al. 1997). One of
the novel features presented in this thesis is the use of medical robotics
with AR. For convenience, a new term is introduced--Augmented Robotics--(Pandya
A. K. 2002) which is defined as an Augmented Reality scene generated using the
kinematics of a robotic device that has a camera system mounted on or near the
end-effector. It is a combination of the real scene viewed by the user and a virtual
scene generated by the computer that augments the scene with additional
information.
Real-time video image processing and computer graphics technologies
have converged to make possible the display of a virtual graphical image correctly
registered with a live video view of an environment of interest. To our knowledge, no
other medically related AR system based on robotic tracking has been reported
(Pandya A.K. 2001e;Pandya A.K. 2001d;Siadat M. 2002). Such a system potentially represents an
efficient and intuitive way to link robotic systems and the surgeon to the patient data.
We envision a system in which the surgeon can visualize critical imaged or sensed
data on demand, directly overlaid on the video stream at the remote site. The research
activities in Augmented Reality center around the development of methods to register
the two distinct image/data sets and to keep them registered in real time
(Billinghurst et al. 2001). The computer-generated virtual objects must be accurately
registered with the real world in all dimensions. Errors in this registration will prevent
the user from seeing the real and virtual images as fused (See Figure 4-1) (Azuma R.
1997).
Robotic systems have advanced dramatically over the last few years, as
described in the introductory chapter. However, they typically lack a link to the patient
imaging information and therefore have no Augmented Reality capability.
Visualization of the structures of interest before the surgery would greatly enhance
the surgeon's ability to position the three small laparoscopic ports on the patient.
Incorrect placement of these ports can drastically affect the success of these
surgeries. The narrow field of view in laparoscopic surgery makes it hard for the
surgeon to recognize the internal organs. In addition, because the video
views obtained from the scope are very near-field, and not always in the surgeon's
frame of reference, the surgeon can become disoriented. Therefore, an AR system
may bring a significant improvement to the robotic laparoscopic surgery procedure
(Cleary K. 2001).
Figure 4-1: An AR scene can be generated by the alignment of the camera’s trajectory and the 3D graphics virtual camera’s trajectory. Once the two cameras are aligned, the actual objects will match their 3D modeled replicas.
During surgeries, fixed critical objects overlaid on the video stream could
provide additional situational-awareness cues. For instance, the robot arms could
be augmented when not in direct view, simple coordinate systems could be added to
give orientation cues to the surgeon, or registered sensor data could be overlaid.
It is the thesis of this work that a potential exists for robotic and IGS systems to use
augmentation technology to provide the surgeon with a direct link to patient pre-
and/or intra-operative images (e.g. ultrasound data, open MRI data) and other sensor
data.
We have built a system in which the surgeon can visualize critical imaging data on
demand, directly overlaid on the video stream. The system is built upon the IGS technology
described in Chapter 3. The AR system implementation adds a video
camera mounted at the end of the Microscribe (a passive robot arm); the camera is calibrated
and registered and is able to generate an augmented view of the phantom skull. The
augmentation in this case is made up of 3D models of the various structures within
the phantom, generated from a CT scan of the phantom using the segmentation techniques described in
Chapter 2 (See Figure 4-2). In the next section, a
detailed literature review of medical AR technology is provided.
Figure 4-2: The precursor of Augmented Reality/Augmented Robotics. We use the Microscribe as the
tracked tool (a). The position and orientation of the end-effector is shown on the orthogonal slices and 3D model of the phantom skull. After adding a calibrated and registered camera (b), we can generate a monoscopic Robotic Augmentation scene (c).
(a) Robotics-based Neuronavigation
(b) Camera System at End-effector
(c)Augmented Robotics
Augmented reality is an up-and-coming field in the medical world. Its use in
medical disciplines that require accurate 3D visualization (particularly surgery) should
be seriously researched and implemented. AR systems are currently being
researched for clinical usage. We predict that AR will become an important tool in
medical training, preoperative planning, preoperative and intraoperative data
visualization, and intraoperative tool guidance. It is a technology that uses some form
of three-dimensional (3-D) position sensing, a 3D reconstruction of the patient data of
interest, and real-time overlay of this information on an actual view of the patient. This
section reviews pioneering research in this field and provides the framework and
context on which the current prototype AR system provided in this thesis is based.
4.1.1 Medical Augmented Reality
Medical augmentation has been studied by numerous other researchers (Iseki
et al. 2001) (Maurer et al. 1999). However, a landmark paper on registration methods
for Image Guided Surgery and Enhanced Visualization was presented by (Grimson
W.E.L. 1996). In this paper, an augmentation scene from a static camera system was
produced using a laser range scanner and a video camera. In their approach, a laser
scan of the patient produced a set of range data. This data was manually separated
to include just the patient and region of interest. Next, the data was manually matched
to the video frame, and a computer-controlled refinement stage that cycled over all the
possible pairings of the MRI points to laser points produced the needed AR
transformations. In their paper, they state that "Augmentation of a stationary video
camera is relatively straightforward; however, [dynamic] tracking of the camera is
more relevant and more challenging". They report a registration RMS error of their
system of 1.6mm, but admit that this error is the error of the data fitting, that it was
difficult to ascertain the actual registration error, and that their future work would
include "some kind of phantom study". In our study we provide both the dynamic
tracking and a phantom accuracy study, which will be covered in detail.
Raya et al. (Raya M. A. 2003) have proposed an AR prototype to replace the
traditional optical microscope view with a digital one. Their AR prototype uses
a) an infrared tracking system and b) two video cameras tracked by infrared LEDs. They
consider two types of error in their prototype error analysis: 1) object-space error and 2)
camera calibration error. The first measure is obtained by finding the closest
approach between the point in object space and the line of sight formed by back-
projecting the measured 2D coordinates out through the camera model. The second
measure is defined as the distance between the actual and image points as projected on the
screen, which is obviously not caused solely by camera calibration error. They have
reported errors of 0.2±0.15 mm and 0.4±0.2 pixels for what they call "object
space error" and "camera calibration error," respectively. Our experience has been
that typical tracking devices alone have on the order of 1mm of error. Moreover, they have
used anatomical structures/points on the skull surface to measure the error, which
is, based on our experience, very subjective and inaccurate.
Hattori et al. (Hattori A. 2003) have developed a data fusion system for the
"daVinci" robotic surgery system, composed of an optical 3D location sensor and a
digital video processing system. Their proposed system needs to be
calibrated/registered to calculate the transformation from the optical marker to the
camera. This extra step must be taken each time before surgery. In
their paper, they do not present a comprehensive results section or any accuracy
study of the daVinci system and their infrared tracking system in generating an AR
scene. Their system differs from what we propose here because it did not use the
robot's kinematics and involved additional infrared hardware that needed active LEDs
in continuous line-of-sight view of an infrared camera. No rigorous error analysis was
reported.
(Khamene A. 2003) have developed an AR system for MRI-guided needle
biopsy. Their main goal was to reduce or completely remove the need for the
interventional scanning (by a high field closed magnet MRI) as well as the need for an
open MRI scanner from the biopsy procedure. Their system consists of: 1) one video-
see-through head mounted display (HMD), 2) two video cameras attached to the
HMD to provide a stereo view of the scene, 3) a third video camera for tracking, 4) a
set of optical markers attached to the patient’s bed, 5) a set of optical markers
attached to the needle. In their analysis, they overlay the model of the skin of the
patient on the patient. They have reported an accuracy study as good as 1 mm for
the whole system. Our experience suggests that typical tracking devices alone are on
this order of error. They have pointed out that, for the small number of cases where the
error is substantially larger than 1 mm, the errors were most likely caused by
needle bending. The line-of-sight problem is also a limitation for this type of tracking
method.
In their paper on a data fusion environment for multimodal neuronavigation,
Jannin et al (Jannin P. 2000) briefly experimented with AR techniques as applied to
the Zeiss Microscope. They used projected 2D contours in the focal plane of the right
ocular of the microscope. The main limitation that they noted was that no information
about structures in front of or behind this plane was visible and that different contours
could not be visualized with different colors, line widths, or labels. No error estimates
were provided for the augmentation technique they used.
(Iseki et al. 2001) used augmentation techniques with endoscopes. They
present an endoscopic augmented reality (AR) navigation system. The system consisted
of a rigid endoscope with light-emitting diodes, an optical tracking system, and a
controller. Three-dimensional virtual images of the tumor and nearby anatomic
structures (including the internal carotid arteries, sphenoid sinuses, and optic nerves)
were superimposed on real-time endoscopic live images. An interesting aspect of
their work was that as the device approached the object of interest, the
object would change colors to reflect the distance. No error estimates were provided
for the AR scene generation.
(Masutani et al. 1998) also constructed an AR-based visualization system to
support intravascular neurosurgery and evaluated it in clinical environments. Three-
dimensional vascular models were overlaid on video images from X-ray fluoroscopy
by 2D/3D registration using fiducial markers. The models were reconstructed from 3D
data obtained from X-ray computed tomographic angiography using standard
techniques. Here, the virtual camera position (camera tracking) was calculated using
the coordinates of the fiducial markers so that the projected view geometry of the 3D
computer graphics corresponded to the X-ray fluoroscopy that they used. They report
an error of 3mm, but also state that these errors are computed using only pseudo-3D
estimation of errors, unlike the true 3D values for stereo camera measurements.
They report problems with very slow frame rates.
(Iseki et al. 1997) have developed an overlaid three-dimensional image
(Volumegraph)-guided navigation system that allows navigation during operative
procedures. The three-dimensional image is superimposed on the patient's head and
body via a semi-transparent mirror. The Volumegraph can display three-dimensional
images in the air by a light beam that is based on CT/MRI data. Based on clinical
application in 7 cases, the system was found to be advantageous because the
surgical procedures could be navigated easily by augmented reality in the surgical
field. Invisible parts of the surgical field were supplemented with the overlaid three-
dimensional images (Volumegraph) as if it were the virtual operative field. The
disadvantage of this work is that it relies on the printing and processing of CT or MRI
files into holographic films (which takes a day or so). The registration procedure and
error analysis are not well documented.
(Sato et al. 1998) in their paper describe AR visualization for the guidance of
breast-conservative cancer surgery using ultrasonic images acquired in the operating
room just before surgical resection. In their application, the 3-D position and
orientation of a video camera are obtained to integrate video and ultrasonic images in
a geometrically accurate manner. Superimposing the 3-D tumor models onto live
video images of the patient's breast enables the surgeon to perceive the exact 3-D
position of the tumor, including irregular cancer invasions which cannot be perceived
by touch, as if it were visible through the breast skin. The system was shown to be
effective in experiments using phantom and clinical data.
(Wagner et al. 1995) present a new visualization system for image-guided
stereotactic navigation in tumor surgery. The combination of frameless stereotactic
localization technology with real-time video processing permits the visualization of
medical imaging data as a video overlay during the actual surgical procedures. Virtual
computer-generated anatomical structures were displayed intraoperatively in a semi-
immersive heads-up display. This results in surgical navigation assistance without
limiting the judgment of the physician based on the continuous observation of the
operating field.
Another example of AR in the medical field has been reported by the following
group (Fuchs et al. 1998;Fuchs et al. 1996;Livingston and State 1997). They used an
optical see-through display with which the physician was able to view a volume-
rendered image of the fetus overlaid on the abdomen of the pregnant woman. The
image appears as if it were inside the abdomen and is correctly positioned as the
physician moves within the environment.
A very interesting and useful AR implementation is for the craniofacial surgeon.
In this implementation (Patel et al. 1996) the surgeon was able to view the final
results of a surgery directly on the patient rather than only with the volume
visualization.
As illustrated by this sample of AR-related work, many researchers have
worked on this new technology; however, there are very few examples of rigorous
error analysis of AR, and there have been no comparisons of AR with its older sister
technology, Image Guidance. In addition, there have been no medical robotic kinematics-
based AR systems.
4.1.2 Research in Camera Calibration
In order to generate an accurate AR scene, one needs to set up a virtual
camera that models the actual camera accurately. There is a large body of research
in this regard (Abdel-Aziz Y. I. 1971;Heikkila 2000;Tsai 1987.;Wang L. 1991). Among
them, a key paper is the one published by Roger Tsai (Tsai 1987). We used the
camera model originally proposed by (Tsai 1987), used by (Weng J. 1992), and
refined by Heikkila (Heikkila 2000). Heikkila et al. also provided a very useful
implementation. Details of the camera calibration process (which is the foundation
upon which AR is built) are described in the implementation section of this chapter.
4.1.3 AR in Telepresence
The telepresence aspect of robotics allows the surgeon to perform surgery
remotely (Freysinger et al. 1997). This may allow the surgeon, in time-critical
situations, to apply his/her skills in remote locations. Telepresence is a
technology that projects the operator's motions and dexterity to a remote location
while providing tactile, visual, and auditory feedback. This is a very challenging
operation for the user (Knight et al. 2003b), especially when the robot is remote from
the user and there are time delays. Under such conditions, it may be advantageous, for
instance, to manipulate a virtual version of the robot arm and practice the operation
directly on a model of the arm. Researchers at the University of Toronto have built a
system (Drascic and Milgram 1996) for path planning using AR technology. Others
have also used dynamic overlays with telepresence systems. (Satava
1999) described the future of using telepresence technology in a "digital
battlefield" type of environment, in which surgeons of the future may not have to live and work on
the battlefield. He claims that approximately 90% of the knowledge a physician
requires can be obtained through electronic means, such as diagnostic sensors and
imaging modalities, directly seeing the patient with a video camera for medical
consultation, or using electronic medical records. Using these modalities remotely
through a telepresence interface is a natural evolution of medical systems of the
future. In the methods section, a detailed client-server software design is provided
which shows how the current IGS and AR system architecture can be used for
telepresence applications.
4.1.4 Live View with Registered Data
Combining video with graphics can be done in a number of ways. Once the
position and orientation of the camera and the objects are known, an AR scene can be
generated. In computer graphics, AR is achieved by the alignment of the virtual
(graphical) camera with the actual camera. The techniques of texture mapping and
3D rendering are used to position the virtual segmented objects within the augmented
scene. In addition, a technique of chroma-keying can be used (Azuma 1997). The
background of the virtual scene can be set to a particular color (say orange). None of
the objects in the video scene should have that particular color. An algorithm which
takes all the orange areas and replaces them with the video view will produce a
picture which shows the virtual object on the video view. If the 3D
coordinates of all surfaces in the scene were known, a depth search at each pixel
could be done to determine whether the virtual object or the video object was closer. The
closer object would be drawn and the other discarded. In this type of display, the
virtual object could then be displayed as behind the real object and vice-versa.
Although this form of visualization is beneficial in certain situations, it was not
performed in this thesis because it is advantageous to see the projected image of the
objects of interest on the video view in order to perform, for instance, a craniotomy.
4.2 IMPLEMENTATION OF AUGMENTED REALITY
The steps taken to generate a medical AR scene are illustrated in Figure 4-3. There
After image data collection, segmentation of the objects of interest in the image space
allows the creation of the 3D graphical models that are needed for augmentation. This
step can be performed for image guidance also, but it is not a necessary step, since
the orthogonal views of the imaging modality can suffice for navigation (and frequently
are the only ones used for navigation). The key differences between IGS and AR are
the camera parameter estimation, the camera mounting, and the mixing of the live
video and graphics. Camera parameter estimation (camera calibration), which builds a
virtual camera that closely models the actual camera, and pose estimation of the
camera need to be done only once for the system, but can be done periodically in
order to verify the camera system. These two steps will be covered in detail. When
generating the graphical view of the virtual objects, the relative spatial positions and
orientations of the virtual camera and virtual objects are the same as
those of the actual camera and actual objects of interest. Furthermore, the virtual
camera and virtual objects very accurately model their actual counterparts. This allows the
graphical view of the virtual objects to be seen from the actual camera point of view
on the live video if the objects of interest are visible. The mixing of the live video (H.
Kato 2000) from the actual camera with the graphical view from the virtual camera
enables the surgeon to see the segmented objects of interest projected on the video
view from any perspective and at any depth.
Figure 4-3: These are the steps needed to generate both a Neuro Navigation (NN) System and an Augmented Reality System. Note that AR represents an extension to Neuronavigation and can be performed simultaneously with NN.
4.2.1 Coordinate Systems
At a conceptual level, there are several coordinate systems in AR that need to
be understood in order to correctly achieve the results. Figure 4-4 illustrates the
coordinate systems. The first coordinate system is the world coordinate system. The robot
and all the other components reside in the world coordinate system. In dynamic
systems, the base of the robot is also tracked (sometimes with infrared technology) or
aligned and fixed to the world coordinate system. In situations where the robot base
is mobile, there must be a means of tracking the robot base. The other coordinate
systems, which include the robot base, its end-effector, the object(s), the camera, the
pattern used for camera calibration, and the image/video, all must have the correct
transformations relating one to the other, as shown below.
Figure 4-4: A series of coordinate systems for the AR development
4.2.2 Robotic-based Tracking of the Camera
We have researched view augmentation of a camera that can be mounted at
the end-effector of a robot. The live views from the robot end-effector camera will
have synthetically generated 3D graphical views of the structures overlaid on them.
We have chosen to implement the AR system using the same hardware platform as
the IGS system (described in Chapter 3) to make comparison and implementation
easier. As a testbed for the robotic-based AR system, we have mounted a miniature
camera (Watec LCL-628, 7.5 mm diameter, 330 TV lines, 3.9mm f2.8 lens (H-52º, V-
40º)) on the end-effector of the Microscribe device. This device can be considered a
passive robot (articulated arm). Its geometric and transformational similarities
make it an inexpensive and useful analog to current active robotic systems. It has the
advantage of being readily accessible and amenable to quick prototype development
and evaluation. We have integrated the serial interface to the five-degree-of-freedom
Microscribe, obtain a precise position and orientation of the camera for the AR
interface, and have generated a prototype Augmented Robotics system using this
device.
The method to ascertain the end-effector position and orientation is exactly
the same as described in the chapter on image guidance (section 3.2.1), where details
of how the forward kinematics solution (i.e. the end-effector of the robot coordinates in
terms of the base coordinates) is defined are given. Each individual transform specifies how
the first joint is related to the second joint. The combination of the 4×4 homogeneous
transformation matrices defines the position and orientation of the end-effector in the
base coordinate system as follows:
$T_{B-EE} = T_{B-J0} \times T_{J0-J1} \times T_{J1-J2} \times T_{J2-J3} \times T_{J3-EE}$ (Equation 4-1)
Additional transformations are needed to compute the needed AR
transformation. A relationship between the object (O) that has to be augmented and
the camera coordinate system (C) needs to be derived ($T_{O-C}$). This transformation is
computed by knowing the transformation from the object to the base of the robot
($T_{O-B}$), the measured transformation from the base of the robot to the end-effector
($T_{B-EE}$) and also the transformation from the end-effector of the robot to the
coordinates of the camera ($T_{EE-C}$) as follows:
$T_{O-C} = T_{O-B} \times T_{B-EE} \times T_{EE-C}$ (Equation 4-2)
The computed relationship of Equation 4-2 allows the alignment of the actual
camera coordinates with the coordinate system of the virtual graphics camera. One of the
key registrations that needs to be done is the object (patient) registration ($T_{O-B}$, object to
base of robot). This transformation is computed using the pair-point matching that is
also done in standard neuronavigation systems and is described in detail in section
3.2.2. Knowing a set of matched pair-points (we use 5 pair-points; the minimum is 3)
between the actual patient (phantom) and the tomographic image space, an iterative
algorithm to optimize the transformation parameters can produce the needed
transformation. The Levenberg-Marquardt optimization method has been shown to
provide fast convergence, and we have used this algorithm in our implementation of
pair-point matching. In the next section, details of the method (based on the pattern-
based estimation of the extrinsic camera parameters) for the essential
computation of the transformation $T_{EE-C}$ will be provided. This is the key
transformation because it determines exactly how the end-effector of the robot (a
measured entity) is related to the mounted camera system.
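A minimal C++ sketch of the transform composition in Equation 4-2 is shown below; the matrices are hypothetical placeholders and the 4×4 matrix product is written out explicitly.

#include <cstdio>

// Sketch of Equation 4-2: the object-to-camera transform is the product of the
// object-to-base, base-to-end-effector (measured by the robot) and
// end-effector-to-camera transforms. All matrices here are hypothetical.
void matMul4(const double A[4][4], const double B[4][4], double C[4][4]) {
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            C[i][j] = 0.0;
            for (int k = 0; k < 4; ++k) C[i][j] += A[i][k] * B[k][j];
        }
}

int main() {
    double T_OB[4][4]  = {{1,0,0,0},{0,1,0,0},{0,0,1,0},{0,0,0,1}};
    double T_BEE[4][4] = {{1,0,0,100},{0,1,0,0},{0,0,1,50},{0,0,0,1}};
    double T_EEC[4][4] = {{1,0,0,0},{0,1,0,20},{0,0,1,0},{0,0,0,1}};

    double tmp[4][4], T_OC[4][4];
    matMul4(T_OB, T_BEE, tmp);
    matMul4(tmp, T_EEC, T_OC);       // T_O-C = T_O-B * T_B-EE * T_EE-C
    std::printf("translation of T_O-C: %.0f %.0f %.0f\n",
                T_OC[0][3], T_OC[1][3], T_OC[2][3]);
    return 0;
}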
4.2.3 Computing the Pose of the Camera Relative to the End-Effector ($T_{EE-C}$)
In some AR systems, the transformation from the tip of the tracking device to
the CCD array of the camera is determined by manual methods in a very non-
systematic way. In these systems, the real object is placed in view of the camera and
an augmentation is done. The model is then manually adjusted until it is aligned.
Then different views are chosen and the steps are repeated until it appears to be
correct. These methods do not achieve robust results, with registration occurring at only a
subset of locations and orientations. This transformation can also be measured;
however, the results are again not robust (Azuma 1997).
Figure 4-5 illustrates the different coordinate transformations that need to be
performed. In this work, we take advantage of the estimation procedure to calculate
the transformation matrix from the pattern to the camera coordinate system ($T_{P-C}$).
First, the pattern used for camera calibration is rigidly fixed relative to the base of the
Microscribe. Then the location and orientation of the pattern's coordinate system
(relative to the robot's base) is calculated using three points collected on the
pattern (digitized with the Microscribe): the first point ($P_0$) as the origin of the
coordinate system, the second point ($P_x$) a point along the x-axis, and the third
point ($P_y$) a point along the y-axis of the pattern coordinate system. From these points,
the unit vectors in the x and y directions ($V_x$ and $V_y$) are computed as follows:
$V_x = \dfrac{P_x - P_0}{|P_x - P_0|}$ (Equation 4-3)

$V_y = \dfrac{P_y - P_0}{|P_y - P_0|}$ (Equation 4-4)
The vector in the z-direction was computed simply as the following cross
product.
$V_z = V_x \times V_y$ (Equation 4-5)
These three vectors ($V_x$, $V_y$, $V_z$), along with the point $P_0$, form the 4×4
homogeneous transformation matrix from the base coordinate system to the pattern
coordinate system ($T_{B-P}$) in the following way:
$T_{B-P} = [V_x, V_y, V_z, P_0]$ (Equation 4-6)
Figure 4-5: The transformations needed to compute an AR scene. The main transform ($T_{EE-C}$, the transform from the end-effector to the camera coordinate system) is the primary transformation matrix computed for AR.
The above transformation matrix ($T_{B-P}$) can be described in terms of the
projection of the unit vectors of the pattern coordinate system onto the base
coordinate system and a translation vector from the origin of the base to the origin of the
pattern coordinate system. Knowing this transformation from the base of the robot to
the pattern ($T_{B-P}$), knowing also the transformation from the base of the robot to the
end-effector ($T_{B-EE}$) (which is measured by the robot), and also knowing the
transformation from the pattern to the camera coordinates (to be covered later, $T_{P-C}$),
the needed transform from the end-effector of the robot to the camera ($T_{EE-C}$) can be
derived with Equation 4-7. How to compute the transformation from the pattern to the
camera is covered in detail in the next section.
$T_{EE-C} = T_{B-EE}^{-1} \times T_{B-P} \times T_{P-C}$ (Equation 4-7)
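The following C++ sketch illustrates Equations 4-3 through 4-6 with hypothetical digitized points: the pattern frame is assembled from the three points and packed into the homogeneous matrix $T_{B-P}$, which (together with the measured $T_{B-EE}$ and the calibrated $T_{P-C}$) feeds Equation 4-7.

#include <cmath>
#include <cstdio>

// Sketch of Equations 4-3 to 4-6: the pattern coordinate frame is built from
// three digitized points (origin, a point on the x axis, a point on the y axis).
struct Vec3 { double x, y, z; };

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 normalize(Vec3 v) {
    double n = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return {v.x / n, v.y / n, v.z / n};
}
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

int main() {
    Vec3 P0 = {100, 50, 20}, Px = {180, 52, 21}, Py = {101, 130, 19};  // hypothetical digitized points
    Vec3 Vx = normalize(sub(Px, P0));      // Equation 4-3
    Vec3 Vy = normalize(sub(Py, P0));      // Equation 4-4
    Vec3 Vz = cross(Vx, Vy);               // Equation 4-5
    // T_B-P = [Vx Vy Vz P0] as a 4x4 homogeneous matrix (Equation 4-6).
    double T_BP[4][4] = {{Vx.x, Vy.x, Vz.x, P0.x},
                         {Vx.y, Vy.y, Vz.y, P0.y},
                         {Vx.z, Vy.z, Vz.z, P0.z},
                         {0,    0,    0,    1  }};
    std::printf("z axis of pattern frame: %.3f %.3f %.3f\n", Vz.x, Vz.y, Vz.z);
    (void)T_BP;
    return 0;
}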
4.2.4 Camera Calibration Used to Determine $T_{P-C}$
Camera calibration is a very important issue for AR and hence is described
briefly here for clarity (Tsai 1987). The camera calibration algorithm is used in this
research to (1) model the actual camera to be used for graphical view generation and
(2) estimate the transformation matrix between the end-effector and the camera
coordinates by using this technology to compute the transformation $T_{P-C}$.
Figure 4-6 represents a camera model and the associated variables for the
estimation of the needed parameters.
Camera parameters are divided into intrinsic parameters (focal length, principal point, skew
coefficient, and the radial/tangential distortions) and extrinsic parameters (the position
and orientation of the camera). The extrinsic parameter matrix ($T_{wc}$), represented in
Equation 4-8 by the matrix of r and t parameters, is needed to transform objects from
the world-centered to the camera-centered coordinate system.
Figure 4-6 Camera Calibration Model. Objects in the World Coordinate System need to be transformed using two sets of parameters - extrinsic and intrinsic parameters.
$P_c = T_{wc} \times P_w : \quad \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} = \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix} \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix}$ (Equation 4-8)
The intrinsic and the extrinsic parameters are derived from a two-step process
that first involves a closed-form (pinhole-model) solution to approximate the
parameters and then an iterative non-linear solution to obtain accurate parameters. In
a pinhole model of the camera (which ignores the radial and tangential distortion
parameters), each point in the world coordinate system is projected via a straight line
through the projection center to the image plane. This model of the camera system
only approximates the real camera projection, as follows:
$\begin{bmatrix} u \\ v \end{bmatrix} = \dfrac{f}{z_c} \begin{bmatrix} x_c \\ y_c \end{bmatrix}$ (Equation 4-9)
Note here that u and v are the coordinates of the projection of the
imaged/viewed point on the camera CCD.
By using the pinhole model, the initial estimates for the minimization process to
be described can be provided by using a direct linear transformation (DLT). The DLT
was first described by Abdel-Aziz and Karara (Abdel-Aziz Y. I. 1971). In the first step,
given N control points in the world coordinate system $(x_w, y_w, z_w)$ and their
corresponding CCD array points $(u_i, v_i)$, the camera parameters of interest in the vector a
can be computed by solving the following equations:
$M = \begin{bmatrix} x_1 & y_1 & z_1 & 1 & 0 & 0 & 0 & 0 & -u_1 x_1 & -u_1 y_1 & -u_1 z_1 & -u_1 \\ 0 & 0 & 0 & 0 & x_1 & y_1 & z_1 & 1 & -v_1 x_1 & -v_1 y_1 & -v_1 z_1 & -v_1 \\ \vdots & & & & & & & & & & & \vdots \\ x_n & y_n & z_n & 1 & 0 & 0 & 0 & 0 & -u_n x_n & -u_n y_n & -u_n z_n & -u_n \\ 0 & 0 & 0 & 0 & x_n & y_n & z_n & 1 & -v_n x_n & -v_n y_n & -v_n z_n & -v_n \end{bmatrix}$ (Equation 4-10)
where $Ma = 0$ and $a = [\,r_{11}\ r_{12}\ r_{13}\ t_1\ r_{21}\ r_{22}\ r_{23}\ t_2\ r_{31}\ r_{32}\ r_{33}\ t_3\,]^T$ (Equation 4-11)
The parameter vector a can be estimated with a least-squares method, since
Equation 4-11 is homogeneous (the right-hand side is a zero vector). Note that because this solution is
based on a pinhole model, lens distortion is not considered. This step only gives an
initial guess for the iterative nonlinear optimization method discussed next.
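One standard way to solve the homogeneous system of Equations 4-10 and 4-11 (a common choice, not necessarily the exact routine used here) is via the singular value decomposition of M: the least-squares solution under the constraint $\|a\| = 1$ is the right singular vector associated with the smallest singular value,

$a^{*} = \arg\min_{\|a\|=1}\|Ma\|^{2} = v_{12}, \qquad M = U\,\Sigma\,V^{T},\quad V = [\,v_1 \cdots v_{12}\,],\quad \sigma_1 \ge \cdots \ge \sigma_{12}.$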
A brief overview of the typical non-linear optimization/calibration process is given
here. Let $(x_w, y_w, z_w)$ represent the coordinates of any visible point P in a fixed
coordinate system (the world coordinate system) and let $(x_c, y_c, z_c)$ represent the
coordinates of the same point in a camera-centered coordinate system. It is assumed
that the origin of the camera-centered coordinate system coincides with the
optical center of the camera and that the $z_c$ axis coincides with its optical axis. An
extension to the standard calibration methods that does not make this simplification is provided in our conference paper (Siadat M. 2002). The image
plane, which corresponds to the image-sensing array, is assumed to be parallel to the
$(x_c, y_c)$ plane at a distance f from the origin (see Figure 4-6). Tsai's model (Tsai 1987)
governs the relationship between a point in the world space $(x_w, y_w, z_w)$ and its
projection on the camera CCD $(r_i, c_i)$. R (the rotation component) is a 3×3 rotation
matrix with elements $r_{i,j}$ defining the camera orientation, and $(t_1, t_2, t_3)$ is the translation component.
The following sets of equations represent the camera model that is used. Here, the
translation and rotation components (extrinsic parameters) and the intrinsic
parameters combine to form the model of the camera system (see Figure 4-6) as
follows:
$u = f\,x_c/z_c, \quad v = f\,y_c/z_c, \quad r - r_0 = s_u u, \quad c - c_0 = s_v v$ (Equation 4-12)

$\dfrac{r_{1,1}x_w + r_{1,2}y_w + r_{1,3}z_w + t_1}{r_{3,1}x_w + r_{3,2}y_w + r_{3,3}z_w + t_3} = \hat{u} + (g_1 + g_3)\hat{u}^2 + g_4\hat{u}\hat{v} + g_1\hat{v}^2 + k_1\hat{u}(\hat{u}^2 + \hat{v}^2)$ (Equation 4-13)

$\dfrac{r_{2,1}x_w + r_{2,2}y_w + r_{2,3}z_w + t_2}{r_{3,1}x_w + r_{3,2}y_w + r_{3,3}z_w + t_3} = \hat{v} + g_2\hat{u}^2 + g_3\hat{u}\hat{v} + (g_2 + g_4)\hat{v}^2 + k_1\hat{v}(\hat{u}^2 + \hat{v}^2)$ (Equation 4-14)
where $\hat{u} = (r - r_0)/f_u$, $\hat{v} = (c - c_0)/f_v$, $f_u > 0$, $f_v < 0$, and $g_1, \ldots, g_4$ and $k_1$ are the tangential
and radial distortion coefficients.
These equations need to be formulated into an objective function so that finding
the optimum of the function leads us to the camera parameters. The camera
parameters are $r_0$, $c_0$ (image center), $f_u$, $f_v$ (focal lengths), and the radial and tangential
distortions (all intrinsic parameters), together with $T_{wc}$ (the extrinsic parameters). Here, m is
Tsai's model of the distortion-free camera, $(\hat{r}_i, \hat{c}_i)$ is our observation of the projection of the
i-th point on the CCD, and $(r_i(m), c_i(m))$ is its estimate based on the current estimate
of the camera model. The objective function below is a linear minimum-variance
estimator (Press W.H. 1992). Note that we have n observed points in the world space.
Our objective function is as follows:
$\chi^2 = \sum_{i=1}^{n} \left\{ \left[\hat{r}_i - r_i(m)\right]^2 + \left[\hat{c}_i - c_i(m)\right]^2 \right\}$ (Equation 4-15)
Figure 4-7 illustrates the optimization of these parameters. n pairs of points are
collected for which the world coordinates $(x_w, y_w, z_w)$ and the CCD coordinates $(r_i, c_i)$
are known. An initial guess of the extrinsic parameters is provided by the DLT method
described above (Equations 4-10, 4-11). After the initial guess, the camera model
parameters (a) are used to compute model-estimated values of (r, c) for the entire
data set (Equations 4-12 to 4-14). These values are then compared with the
measured values using the objective function (Equation 4-15). If a certain tolerance is
met, the final parameters are reported. The Levenberg-Marquardt optimization
method has been shown to provide fast convergence. It works very well in practice
and is considered the standard for non-linear least-squares routines (Press W.H.
1992). Note here that the extrinsic parameters estimated here (R and T) form
the matrix needed in Equation 4-7, that is, $T_{P-C}$. The entire camera calibration
procedure just described is used to compute this very important matrix. Once this
matrix is computed, it allows the computation of the transformation from the end-
effector of the robot to the camera coordinate system. More details of the camera
calibration technology can be found in the publication (Heikkila 2000).
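As a small illustration (with hypothetical numbers), the C++ sketch below evaluates the reprojection objective of Equation 4-15 for a set of observed and model-predicted CCD coordinates; in practice the predicted values would come from Equations 4-12 to 4-14.

#include <cstddef>
#include <cstdio>
#include <vector>

// Sum of squared differences between observed CCD coordinates (r_i, c_i) and
// the coordinates predicted by the current camera model (Equation 4-15).
struct Pixel { double r, c; };

double chi2(const std::vector<Pixel>& observed, const std::vector<Pixel>& predicted) {
    double sum = 0.0;
    for (std::size_t i = 0; i < observed.size(); ++i) {
        double dr = observed[i].r - predicted[i].r;
        double dc = observed[i].c - predicted[i].c;
        sum += dr * dr + dc * dc;
    }
    return sum;
}

int main() {
    std::vector<Pixel> observed  = {{120.4, 88.1}, {241.0, 190.7}};  // hypothetical CCD observations
    std::vector<Pixel> predicted = {{120.0, 88.5}, {240.6, 191.0}};  // model predictions
    std::printf("chi^2 = %.3f\n", chi2(observed, predicted));
    return 0;
}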
Figure 4-7: Camera Parameter Estimation. An initial guess of the extrinsic parameters comes from the DLT method. The observed CCD array points and the corresponding computed values are compared to determine if they are within a certain tolerance. If so, the iteration ends
4.2.5 How to Measure AR Accuracy
There are many components of error observed in an AR scene.
In this thesis, we have tried to explain where the errors of the observed AR scene
come from. The sources we have considered include
the complexities of camera calibration, the accuracy of tracking, and the imaging/3D
segmentation. In order to measure the accuracy of the AR system with imaging data,
we first needed to evaluate the error of the AR portion of the technology. In order to
isolate the error of the AR system from the errors inherent in imaging and
segmentation, we first chose a method that did not involve either the imaging or the
segmentation components. We then performed a similar experiment using a phantom
skull imaged with CT data. For the non-imaging test, we used a stereotactic phantom
ring for the evaluation of the error of the system. This ring is equipped with a
moveable pointer for which the 3D position can be accurately read. The phantom is
known to be mechanically very accurate (within 0.5 millimeter) and has been used as
a gold standard in several of our previous accuracy studies that measured the
accuracy of robotic devices and also of standard stereotactic systems (Li Q. 2002).
This device allowed us to pinpoint markers in the video image using the stereotactic
phantom's 3D position pointer. We first chose several points on the surface of the
ring for which we knew the exact location relative to the ZD ring's coordinate system.
These were the points used for the pair-point matching in order to
determine the pose of the ring relative to the Microscribe's base. We then created a
model of a cube with known dimensions and placed it at the center of the phantom
space. We first placed the phantom's pointer at the known location of each of the
corners of the cube. We then viewed the corner of each of the cube points from
orthogonal view directions to determine the error in each individual axis. In order to
compute the error for each point, we needed to know the distance from the apex of
the pointer to the corner of the virtual cube in all three axes. Applying the
distance equation (Equation 4-16 below) to these distances gives the error for that point.
However, we took three orthogonal views of the cube corners, and each view
contained two of the error measures. For instance, a view from the x direction gives
the errors in both the y and z axes. In order to derive the needed equation
(Equation 4-16), we used all three orthogonal views that we captured and determined
the two deltas for each view (See Figure 4-9). The distance error, with the camera
positioned in turn on each of the orthogonal axes (x, y, and z), was computed as
follows:
$d(error) = \sqrt{\Delta x^2 + \Delta y^2 + \Delta z^2}$ (Equation 4-16)
It is clear from these equations that the measured view errors reduce to Equation
4-16 after taking into consideration all three views of the cube corner. Note that each
axis is counted twice, so once the division by 2 is performed, the actual distance error can
be computed as follows:
$\Delta_x^2 = \Delta y^2 + \Delta z^2;\quad \Delta_y^2 = \Delta x^2 + \Delta z^2;\quad \Delta_z^2 = \Delta x^2 + \Delta y^2$ (Equation 4-17)
$d(error) = \sqrt{(\Delta_x^2 + \Delta_y^2 + \Delta_z^2)/2}$ (Equation 4-18)
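A minimal worked example of Equations 4-16 to 4-18 is sketched below in C++ with hypothetical view measurements; each orthogonal view contributes two in-plane error components, so the three views together count every axis twice.

#include <cmath>
#include <cstdio>

// Hypothetical 2D error components measured in each orthogonal view.
int main() {
    double xView[2] = {0.8, 1.0};   // errors in y and z seen from the x view
    double yView[2] = {0.6, 1.0};   // errors in x and z seen from the y view
    double zView[2] = {0.6, 0.8};   // errors in x and y seen from the z view
    double sumSq = xView[0]*xView[0] + xView[1]*xView[1]
                 + yView[0]*yView[0] + yView[1]*yView[1]
                 + zView[0]*zView[0] + zView[1]*zView[1];
    double error = std::sqrt(sumSq / 2.0);   // Equation 4-18
    std::printf("3D error = %.2f mm\n", error);
    return 0;
}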
Since the orthogonality of the view direction was critical, we positioned the
camera until the axis orthogonal to the image plane of the cube disappeared. In
Figure 4-9, panel D, the robot arm is shown positioned to view the ZD arc's pointer. The
actual views from the camera are then augmented with a virtual cube positioned at a
known location. The apex of the pointer is then positioned at the known location of a
particular corner of the cube being viewed. If the cube is augmented at the correct
position, the apex of the pointer will point at the corner that is being viewed. The
captured video frames were analyzed to ascertain the pixel error observed between
where the actual cube corners should be located (on the apex of the pointer) and
where they are in the augmentation (cube corner). The error is displayed in the view
as black dotted lines. This distance represents the 2D error from that view (See
Figure 4-9).
The deviation can be easily seen and measured very accurately, both in terms of
pixel error and in actual mm error when scaled from a known distance in the image. The
cylindrical geometry of the ZD frame pointer provides us with a scaling reference to
transfer from the pixel-wise error measurement to an absolute-value error space.
The diameter of the pointer is exactly 5mm; hence, a pixel error can easily be
converted to a distance measure by a computed scale factor. Several different-sized
cubes were constructed to represent the operational volume of the AR system.
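As a small worked example (with hypothetical pixel values), the conversion from pixel error to millimetres using the 5 mm pointer diameter could look as follows:

#include <cstdio>

// Pixel-to-millimetre conversion sketch: the ZD pointer is 5 mm in diameter,
// so its apparent width in pixels gives a scale factor for that view.
int main() {
    const double pointerDiameterMM = 5.0;
    double pointerDiameterPx = 38.0;      // hypothetical measured width of the pointer in the frame
    double observedErrorPx   = 12.5;      // hypothetical 2D offset between apex and cube corner
    double scale = pointerDiameterMM / pointerDiameterPx;   // mm per pixel
    std::printf("error = %.2f mm\n", observedErrorPx * scale);
    return 0;
}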
Figure 4-9: A cube is augmented on the live video from the Microscribe. Three orthogonal views are used to compute the error: (A) represents a close-up view of the pointer (the known location of the cube corner) with the video camera on the x axis, (B) the pointer viewed from the y axis, and (C) the pointer viewed from the z axis. (D) represents an oblique view of the scene showing the pointer, the camera, and the video view on the monitor with the cube superimposed.
Figure 4-8: The ZD ring. This device is a
(A) X View (B) Y View
(C) Z View (D) Oblique
Y & Z Error X & Z Error
X & Y Error
4.2.6 Software Architecture
The major subsystems of the AR software implementation are shown in Figure
4-10. Here, the server software supplies each client (AR or IGS) on the network with
position and orientation information. The camera system (which is mounted on the
robotic device) provides the video stream for both the AR clients and the
camera calibration. The server software as well as the registration software remains
identical to the IGS system described in the previous chapter. It is in theory possible
to supply the video stream data to any AR client on the network and have the AR view
displayed remotely. This feature is not fully implemented for this thesis, as it is more
related to telepresence and is beyond the scope of this work. It is possible, however,
to have both an IGS scene and an AR scene generated together in near real-time
while serving multiple IGS clients on the network. The AR system architecture is
similar to the one described in Chapter 3 for the IGS system, except that it involves a new
AR client and additional camera information.
Figure 4-10: Software implementation of the AR system with the Image Guidance System. Here, the kinematic server supplies position and orientation to both the AR and IGS clients. Each client has its own display models preloaded. With this software architecture, simultaneous AR and IGS are possible.
4.3 AR SYSTEM ACCURACY
This section provides an in-depth review of the errors involved in AR
technology. Error is of critical importance in the medical field, and especially in
neurosurgery, where a few millimeters of error can impair a patient. We dissect the
observed error into its components in an attempt to identify avenues of error
improvement for future systems.
There are two types of errors in AR: static and dynamic. Static errors, as the
name implies, are the errors that exist in the augmentation even if the user/robot
is still. In any AR system, there are at least four different types of static errors:
optical distortion, errors in tracking systems, mechanical misalignments, and incorrect
field of view or camera parameters (Azuma R. 1997). We have considered each type
of error in this study. Optical distortion is part of every camera system; radial
distortion has the largest component. It is dependent on the radial distance from the
optical axis, and hence wide field-of-view systems are more prone to this type of error
(Heikkila 2000). Errors that are the result of the tracking hardware are often the
principal problem. Because the tracking system is used not only for tracking, but
also for the calibration of the camera and (in the case of pair-point registration) for
determining the position of the objects in space, tracking errors are of significant
importance.
Dynamic errors occur due to a lag in the system's ability to compute and fuse
the required frames. Generally, the computation is not real-time, and for very dynamic
movements the system exhibits a lag between the real scene and the computed
scene. This type of error is not considered here, as it is a function of hardware
speeds. For medical applications, generally, surgeons do not move very fast, and
the update rates that we observe are adequate. In this section, we first describe the
accuracy of the Microscribe system and the accuracy of the camera calibration
procedure, and then discuss the contribution of all the individual errors to the
application error of the entire prototype.
4.3.1 Accuracy of the Microscribe
In order to gauge the accuracy of the Augmented Reality scene, we first
determined the accuracy of the measurement device, the Microscribe. First, 40
measurements were taken of the same point using the Microscribe. It was found that
the average rms difference between all these points was 0.732mm. Another
experiment, in which a known 50mm distance was measured ten times using the
Microscribe, was also performed. The average error for this measurement was 0.841 mm.
Although a more rigorous accuracy study could be conducted, we are confident that
the error of the Microscribe is within 1mm.
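A minimal sketch of the repeatability computation (with hypothetical sample values, not the measured data) is shown below: the repeated digitizations are compared against their mean and the RMS deviation is reported.

#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    // Hypothetical repeated digitizations of the same physical point (mm).
    std::vector<double> x = {10.1, 10.6, 9.8, 10.3}, y = {5.0, 5.4, 4.9, 5.1}, z = {2.2, 1.9, 2.4, 2.1};
    double mx = 0, my = 0, mz = 0;
    for (std::size_t i = 0; i < x.size(); ++i) { mx += x[i]; my += y[i]; mz += z[i]; }
    mx /= x.size(); my /= y.size(); mz /= z.size();
    double sumSq = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        double dx = x[i] - mx, dy = y[i] - my, dz = z[i] - mz;
        sumSq += dx * dx + dy * dy + dz * dz;       // squared distance from the mean point
    }
    std::printf("rms deviation = %.3f mm\n", std::sqrt(sumSq / x.size()));
    return 0;
}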
4.3.2 Accuracy of Camera Calibration
In camera calibration, a set of parameters is defined which serve to model the
optics of the actual camera. These measures are derived by the minimization of error
as described in the Methods section. For the camera, we used the set of computed parameter values shown in Table 4-1. These parameters allow for the
correction of various image distortions that occur with this camera. The major
parameters of this camera model are the focal length, principal point, and the
radial/tangential distortions. After using the modeled values, the image can be
regenerated to produce the undistorted image. When this undistorted image is
compared to the ideal image (based on the grid pattern), a pixel error can be
computed for the images. The average pixel error is 0.350 pixels along the x axis and 0.356 pixels along the y axis. Figure 4-11 below illustrates the errors of the distorted image. The contour lines in the images represent the error boundaries, in pixels, at each location. Notice that for the radial distortion (left panel of Figure 4-11) the errors near the center of the image are all less than 5 pixels and converge to 0 pixels at the very center, while the errors at the corners of the image are in the range of 25 pixels. The tangential components (right panel of Figure 4-11) are in the range of about 1 pixel and play a minor role relative to the radial distortion. Hence, the overall camera calibration errors are primarily due to radial distortion.
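For illustration, the sketch below shows how calibration coefficients of the kind listed in Table 4-1 are typically applied to undistort an image, here using OpenCV's pinhole camera model. The principal point and file names are placeholders, and the thesis prototype used its own calibration code rather than this particular library.

```python
# Sketch: correcting radial and tangential distortion with OpenCV, using
# placeholder values patterned after Table 4-1. Illustrative only.
import cv2
import numpy as np

fx, fy = 704.643, 708.079        # focal length in pixels
cx, cy = 320.0, 240.0            # principal point (placeholder: image center)
k1, k2 = -0.288000, -0.003537    # radial distortion coefficients
p1, p2 = -0.0003991, 0.001116    # tangential distortion coefficients

K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])
dist = np.array([k1, k2, p1, p2, 0.0])    # OpenCV order: k1, k2, p1, p2, k3

img = cv2.imread("frame.png")             # hypothetical captured video frame
undistorted = cv2.undistort(img, K, dist)
cv2.imwrite("frame_undistorted.png", undistorted)
```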
Figure 4-11: The errors of the distorted image. The contours represent error boundaries. (Left) For the radial distortion, there are less than 5 pixels of error at the center, and the errors exceed 25 pixels at the corners. (Right) The tangential distortion is an order of magnitude less than the radial distortion.
Table 4-1: Example of camera calibration coefficients
Focal Length: (704.643, 708.079) +/- [10.52, 15.94]
Principal Point: 0 +/- 0
Radial Coefficients: (-0.288000, -0.003537) +/- [0.0616, 0.4255, 0]
Tangential Coefficients: (-0.0003991, 0.001116) +/- [0.005146, 0.002516]
Pixel Error: (0.3500, 0.3562) +/- [16.72, 13.55]
4.3.3 Total Application Error Dependencies
The total error of the system was computed both with and without using image data. First, an error measurement was made without using imaging data. For this method, the average error over the 10 measurements taken across the field of view of the camera and covering the maximum extent of the phantom was 2.74 mm, with a standard deviation of 0.81 mm and a maximum error of 4.05 mm. In order to ascertain the additional error introduced by including the imaging data, another experiment was performed in which 10 fiducials visible in the CT scan were digitized and their locations recorded. One-millimeter spheres were then modeled and placed on top of each of the markers at the location predicted by the augmentation. The pixel error as well as the scaled millimeter error was computed for each of the fiducials. The measured error of the location of the fiducials in the AR scene was on average 2.75 mm, with a standard deviation of 1.19 mm and a maximum error of 5.18 mm. The effect of adding the CT scan data was negligible, although a slightly higher standard deviation and maximum error were observed.
Figure 4-12 below illustrates the individual component errors involved and their dependencies and contributions to the overall application error. The total Augmented Reality error depends on several different types of error. The tracker (Microscribe) is a key source of error, and it influences several other error estimates (camera pose determination and registration error). Registration error (the error involved in determining the transformation from the base of the tracker to the objects being augmented) depends on how accurately the actual points on the patient can be captured. In this method, actual points are captured and matched with their image counterparts; hence, digitizer accuracy is an important factor. In addition, registration error also involves human digitization error, which reflects the fact that the human operator can only reach the actual point with an accuracy of 0.43 mm on average.
We ascertained this error by digitizing the same point from several different angles
(10 different values) and report the variation in the measured values. In addition, the
center of a typical fiducial is not represented by a single point, but by a circle about
0.5mm in diameter. This geometry can also lead to digitization errors. A better
fiducial design would be to have the center of the fiducial be a cone (rather than a
cylinder) that culminates to a single point. Furthermore, the computation of the
camera pose relative to the end-effector of the tracker requires the use of
digitized/tracked data also. In order to determine the camera pose relative to the end-
effector of the microscribe, we used a method that required both the extrinsic camera
parameters to be computed using the camera calibration techniques and also the
tracker end-effector transformation information (See the Methods section). Since it
was not possible to measure the actual camera position and orientation with the
precision required, we computed the pose matrix from 5 different camera angles and
report the standard deviation of the measured values. The homogeneous transformation matrix from the tracker end-effector to the camera, T_EE-C, was first decomposed into Euler angles and a position vector. For the position component, the values had standard deviations of 0.13, 0.30 and 0.13 mm in the x, y and z axes. Similarly, the x, y, z (Euler) angle standard deviations were 0.83, 1.25 and 0.44 degrees. These values indicate an RMS positional standard deviation of about 0.35 mm. Although the RMS error for the angle measurements exceeds 1.5 degrees, this represents a spread of error over all the computations. We cycle through the various computations and select the one that produces the best AR result. Minor adjustments to the models can then be made to correct residual errors within the viewing volume if necessary. For camera calibration, there was a pixel error of approximately 0.35 pixels, which corresponds to approximately 0.21 mm of error attributable to the
camera calibration. Imaging data plays a critical role in the accuracy; however, we have found no significant distortion error in the CT we have taken. We conjecture that MRI or ultrasound data will have more distortion and would have a greater influence on the application accuracy. We have accounted for the majority (2.43 mm of the total 2.75 mm) of the observed error, the 2.43 mm figure corresponding to the sum of the tracker, object registration, camera pose and camera calibration components shown in Figure 4-12. It is possible that the imaging data has played a role in producing some of the unaccounted error of the system.
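The decomposition described above can be illustrated with a short sketch (not the thesis code) that converts each estimated end-effector-to-camera transform into a translation vector and x-y-z Euler angles and then reports the per-axis standard deviations across the repeated estimates; the 4x4 matrices below are placeholders.

```python
# Sketch: decompose repeated 4x4 pose estimates (T_EE-C) into translation and
# Euler angles, and report the spread across estimates. Placeholder data.
import numpy as np
from scipy.spatial.transform import Rotation as R

def decompose(T):
    """4x4 homogeneous transform -> (translation, x-y-z Euler angles in deg)."""
    translation = T[:3, 3]
    euler_deg = R.from_matrix(T[:3, :3]).as_euler("xyz", degrees=True)
    return translation, euler_deg

def pose_spread(T_list):
    translations, eulers = zip(*(decompose(T) for T in T_list))
    return np.std(translations, axis=0), np.std(eulers, axis=0)

# Hypothetical: five pose estimates obtained from five different camera angles.
T_estimates = [np.eye(4) for _ in range(5)]
t_std_mm, euler_std_deg = pose_spread(T_estimates)
print("translation std (mm):", t_std_mm, " Euler angle std (deg):", euler_std_deg)
```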
Figure 4-12: Errors contributing to the total error involved in Augmented Reality. [Diagram summary: Total AR error (avg 2.75 mm) comprises Tracker error (avg 0.87 mm), Object Registration error (avg 1 mm), Human Digitization error (avg 0.43 mm), Camera Pose error (RMS 0.35 mm), Camera Calibration error (0.21 mm), and Imaging data (2 mm CT slices, no apparent error).]
4.4 DISCUSSION
Currently, neuronavigation systems provide primarily three 2-D views (coronal, axial and sagittal) to give the surgeon awareness of the patient geometry. The surgeon has to perform the 2D (image) to 3D transformation mentally and also project the envisioned data onto the view of the patient. We believe that Augmented Reality (AR)
generation is a natural extension for the surgeon because it does both the 2D to 3D transformation and projects the views directly onto the patient view. We have already illustrated the difficulty of interpreting 2D slices. In Figure 1-2, a simple 3D shape like a cube is represented by a triangle in the coronal slice, and a vessel is represented by two dots in the axial view. This does not seem like a natural method of visualization. This was part of the motivation for this work.
This chapter focused on the prototype development and accuracy evaluation of a medical robotic Augmented Reality system. We used a passive articulated arm (Microscribe) to track a calibrated end-effector-mounted video camera. In our prototype, the user, after registration, is able to navigate in and around a phantom and, in real time, to visualize the objects of interest highlighted with wire-frame models from any angle and distance. In this case, we superimpose the live video view with the synchronized graphical view of CT-derived segmented object(s) of interest within a phantom skull. Since the accuracy of such a system is of critical importance for medical use, it was considered in detail. It is to be noted that the errors
here represent static error. Errors due to brain or organ shifting are not considered.
Intraoperative imaging methods (Open MRI or ultrasound) need to be integrated for
dynamic changes during surgery. We analyze the individual contributions and system
accuracy of the prototype. The AR accuracy mostly depends on the accuracy of: (1)
tracking technology (2) camera calibration (3) the image scanning device (e.g. CT or
MRI scanner) and (4) the object segmentation. The various accuracy measurements
are shown in Figure 4-12. After using data from a 2mm thickness CT scan the AR
error was measured at 2.75mm with a max error of 5.2mm. This error is on the
borderline of acceptability for neurosurgery applications (one of the most demanding
in terms of accuracy requirements).
The overall accuracy of the system can be improved in several ways. A higher
fidelity robotic tracking system could be used. The tracking device in this case
contributes about one third of the error because it is involved not only in the tracking
of the camera, but also for the registration of object space relative to its base. In the
registration phase, at least 4 points have to be digitized using the tracker. The
process of camera calibration can also be improved by choosing a more accurate
algorithm, e.g., our proposed method (Siadat M. 2002). From this study, we have
built a prototype that approaches the error requirements imposed by Neurosurgery.
We have also given details of the implementation of this prototype such that it can be
recreated. We conjecture that medical robotic devices of the future should be able to
use this technology to directly link these systems to patient data and provide the
optimal visualization of that data for the surgical team. The design and methods of
this prototype device can also be extrapolated for current medical robotics systems
and to Neuronavigation systems.
For neurosurgery the acceptable error is approximately 2-3mm. Our prototype
approaches the accuracy requirements. The accuracy can be improved with a higher
fidelity robotic tracking system and improved calibration and object registration. The
design and methods of this prototype device can be extrapolated for current medical
robotics and neuronavigation systems. It has already been translated for a space
station robotics application.
For the purposes of this thesis, the implementation of the AR system is
adequate for the main task of comparing the relative advantages of this new
technology with image guidance (developed in the previous chapter). The next
chapters will show the details of our Human Factors studies.
Chapter 5: Surgeon Factor Testing
"Nothing succeeds like a good display." Don Norman
It is not enough to develop new technology. The technology must be verified in
terms of accuracy (especially in the medical domain) and it must also be tested with
end-user/subject testing to verify if the new technology positively affects the
performance of the user (Drascic and Milgram 1996;Thompson et al. 1998;Walsh and
Beatty 2002;Weinger et al. 1998). In this phase of the thesis, there are two different,
but related Human Factors tests that were performed to understand if the developed
visualization technologies improve the performance of the surgeon. The two tests were
(1) a test to determine whether an Augmented Reality display interface to surgery increases performance and decreases the error rates of surgeons as compared to an Image Guided Surgery system display, and (2) a comparison of display hardware for the video stream that is being viewed from the remote site. In order to perform the first
comparison, a state-of-the-art image guidance system (as described in chapter 3) and
an Augmented Reality system (as described in chapter 4) were both developed using
the same hardware platform. Subject testing was performed to compare the two
systems. The second study is related to the method of display of the remote video data
to the end-user. The main question is: Does visualization of the remote video at the
surgical site by a Head Mounted Display improve the performance of the test subject
over viewing a monitor? The current robot vendors provide three variations for the
surgeon interface (See Figure 5-1). In the daVinci Medical Robotic System, the
surgeon is completely immersed in his world (middle panel of Figure 5-1). Here, the surgeon must sit at a terminal and place his face inside the display device to get a stereoscopic view of the remote surgical site. In the Zeus system by Computer Motion, the surgeon views the remote surgery using a stereoscopic monitor, or can choose a heads-up display (HUD), where the surgeon can glance up to see her remote view (without moving her head) and can also see her worksite and hands from under the monitors of the HUD. Hence, whether a HUD offers any advantages over a monitor for viewing a remote site is an important question for robotic surgery (Drascic et al. 1989). In the next two sections, the two studies (AR vs. IGS) and (HUD vs. Monitor
viewing) will be presented in detail.
Figure 5-1: Surgeon display options (panels: heads-up display, immersive console, monitor). Computer Motion (Zeus robot) provides either a heads-up display view of the surgical site or a monitor view; the daVinci robot provides an immersive stereoscopic view of the remote video. Which configuration provides the best performance?
5.1 Image Guided Surgery vs. Augmented Reality—The Human Factors.
In this section, details of our Human Factors testing comparing the current state-of-the-art technology (Image Guided Surgery) with our vision of the technology's future (Augmented Reality) are explained. First, an introduction and motivation are given for conducting such a study; next, the methods used for the study are discussed, along with the results and conclusions drawn from our experiments.
5.1.1 Introduction/Motivation
We could find no studies that have compared Image Guidance Technology with
Augmented Reality technology using human subjects. We feel it is important that
technology development (especially medical technology) be followed with (or be
developed in conjunction with) subject testing and evaluation (Taylor-Adams et al.
1999;Terazzi et al. 1998). This process will not only prove (or disprove) the utility of the newly developed technology, but also help in the development of optimal user interfaces and assist with the selection of relevant data and the appropriate display formats.
As described in chapter 3, we have developed an Image Guidance System able
to show the end-effector of a passive articulated arm (Microscribe, Immersion
technology) on orthogonal CT scans and 3D model of the phantom. Image guidance is
now a standard practice in the field of Neurosurgery. Surgeons performing complex
neurosurgery cases rely on the technology and would (or should) not attempt the
procedures without it, due to patient safety and liability issues (Bernstein et al. 2003;Carthey et al. 2001;Cuschieri 2003). As described in Chapter 4, we use the same passive articulated arm (Microscribe, Immersion technology) to track a calibrated end-effector-mounted video camera. In real time, we superimpose the live video view with the synchronized graphical view of CT-derived segmented object(s) of interest within a phantom skull. Both the AR and the IGS systems have been shown to be accurate to within 3 mm (Li 2000;Li Q 2000;Li Q 2001;Pandya A.K. ). In this section, the details of a Human Factors study are presented in which twenty-one subjects (including 3 surgeons) were tested using the techniques of Human Factors Engineering (A. Pandya 2003;Pandya A.K. 2003b).
It is our premise that using an image guided system imposes a substantial mental load on the surgeon, who has to interpret 3 orthogonal slices to form a picture of the 3D geometry and then mentally fuse that information onto what he is viewing (Pandya A.K. 2001e;Pandya A.K. 2003b). We believe excessive mental load can lead to fatigue, errors and longer surgical (and organ exposure) times, all of which may be linked to the safety of the patient. This was the motivation to develop AR technology that could help the surgeon fuse the 3D information onto a live view of what he is observing, alleviate some of the mental load, and to test it against the standard practice of Image Guidance. In addition, current techniques of image guidance do not allow the surgeon to use both real and synthetic data simultaneously. This section is focused on the human factors analysis comparing a standard Image Guidance system with an Augmented Reality system. This evaluation was undertaken to support or refute our hypothesis that Augmented Reality is a viable technique that can improve the performance of the surgeon over the current techniques of Image Guidance.
5.1.2 Method
For this study, there were 21 subjects, 7 female and 14 male, ranging in age from 25 to 45. Three surgeons were involved in the study. The subjects were seated in front of the Microscribe arm and a phantom skull. They were asked to manipulate the arm in and around the phantom skull and were given detailed instructions on how to work the device. First, the subjects were given time to train on both the IGS system and the AR system. The training sessions consisted of a series of two trials and lasted as long as the subjects required to become familiar with the system; typically about 10 minutes each (although it took longer for the IGS). The actual tests were conducted in 2 trials (AR and IGS) with a 5-minute rest between each trial. The order of administration of the AR portion and the IGS portion for both the training session and the actual test was counterbalanced (the orders were alternated). The total test lasted about 1 hour. During
each of the actual trials, each subject was first asked to locate certain objects inside the
un-opened phantom skull and asked to then draw an opening (craniotomy) for that
object from a particular location on the skull surface on the “surgical drape” wrapped
around the outside of the skull. The object that they were asked to draw for both the AR
and IGS were different objects of similar size and geometrical complexity. Drawing the craniotomy is typically one of the very first steps (and a very critical one) during neurosurgery. If the opening is not correctly positioned, full resection is compromised, not to mention the patient's safety. A similar issue of optimal port placement also exists for robotic surgery. Hence, this task was chosen for testing. In the
AR system, the subjects were asked to position the camera away from the skull using
their non-dominant hand, locate the overlaid object in the video/object overlaid view at
the location specified by the test, place their marker in the video view with their
dominant hand and draw the object on the surface of the skull (See Figure 5-2).
For the Image Guided System, they had to determine the extent of the objects in each of the orthogonal slices and then draw the outline of the object from that information. Image Guidance systems primarily use three 2-D views (coronal, axial and sagittal) to gain awareness of the patient geometry. The subjects had to perform the 2D (image) to 3D transformation in their minds and also project the envisioned data on the view of the patient. Figure 5-3 shows several screen shots of the subject's display during an IGS session at various pointer locations, from which he can ascertain the location and shape of the craniotomy. Note that the objects that were chosen for this study were relatively simple
in shape and the location of the craniotomy was chosen to be orthogonal to the
orientation of the CT scans/pointer direction. Oblique/non-orthogonal craniotomies are
very difficult to perform in standard Image Guidance systems and hence were not even
tested. It is worth noting that this is relatively easy to do in an AR system. In the
display for Image Guidance, the subjects had both the 3D view with all the relevant
objects segmented, along with orthogonal 3D CT slices. Although the 3D model information was provided here, in practice it is typically (in 99% of the cases I have been involved in) either not provided or not used by the surgeons.
After the drawing sessions for each of the techniques (IGS and AR) were
completed, the subject was asked a set of questions to gauge his understanding of the 3D arrangement of the objects within the phantom. He was asked to use the visualization tool at hand (either Augmented Reality or Image Guidance) to determine whether he was able to understand the relationships of each of the objects that were inside the skull.
The objects inside of the skull were fabricated to be neutral objects with no anatomical
basis (cubes, bolts, cylinders, and tubes). This allowed us to test non-surgeons as well
as surgeons as anatomical knowledge did not give an advantage for the objects in
question. Examples of these questions are as follows:
Where is the pyramid? Top View: a. Front b. Middle c. Back
Side View: a. Top b. Middle c. Bottom
Front View: a. Left b. Middle c. Right
Is the bolt touching the vessel?
Is the bolt above or below the Cube? The complete set of questions given to
each subject during the test is provided in the appendix.
Figure 5-2: A screen shot of a subject looking at the live video view of the skull overlaid with the 3D graphics objects on the monitor. The marker is then placed on the edges of the overlaid object and the object is traced on the surface of the draped skull.
Figure 5-3: Screen shots of a subject looking at the orthogonal slices in an image guidance system to find the extents and shape of the object for which the skull opening has to be made.
Each subject was timed on how long the entire test took, which included drawing the craniotomy and answering all the questions during the test. The questions were graded to determine how many errors were made. The movements of the robot arm were also recorded for later analysis (although this was not found to be very useful). In addition, a questionnaire was given at the end of the test to capture some subjective and comparative impressions of the two systems. This post-test questionnaire is provided in the appendix.
5.1.3 Results
Figure 5-4 shows the errors made by each of the 21 subjects. The two bars directly above each subject number on the x axis represent the number of errors made using the Image Guidance System (first bar) and the Augmented Reality System (second bar). There are a few instances where no errors were made (as in the case of the AR task for the first subject), in which case no bar appears in that location.
Figure 5-5 shows the times required for each subject to complete the testing. The
testing phase consisted of drawing the craniotomy, and answering all the questions
given. All but three of the subjects tested were non-medical (mostly Engineering
Graduate Students). Tables 5-1 and 5-2 show the summary statistics as well as the
paired t-test results for each of the data sets. It is interesting to note that all three of the
surgeons made more errors while taking more time during their IGS tasks. In addition
to the objective data, there was a questionnaire of paired questions that was given to
each subject. The results, which include the average response and a pair-wise t statistic
to understand if the answers are different, are given in Table 5-3. In addition, there were
several subjective questions for which answers are provided in Tables 5-4 to 5-6.
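For reference, the paired (within-subject) comparison summarized in Tables 5-1 and 5-2 can be computed as in the sketch below; the per-subject arrays are hypothetical placeholders, not the study data, which are reported here only in aggregate.

```python
# Sketch: paired t-test of IGS vs. AR completion times. Placeholder data only.
import numpy as np
from scipy.stats import ttest_rel

igs_time_min = np.array([8.5, 9.1, 7.8, 10.2, 8.0, 7.5, 9.3])  # hypothetical
ar_time_min  = np.array([5.2, 6.0, 4.9,  7.1, 5.5, 4.8, 6.3])  # hypothetical

t_stat, p_value = ttest_rel(igs_time_min, ar_time_min)  # paired, within subject
print(f"t = {t_stat:.3f}, p = {p_value:.5f}")
```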
Figure 5-4: Errors made by the subjects during the testing period.
Figure 5-5: The time required for each subject to complete the craniotomy task and answer the questions.
[Chart for Figure 5-4: Errors in Tasks (IGS vs. AR); y axis: errors (0 to 4.5), x axis: subject number (1-21), paired bars for IGS and AR.]
[Chart for Figure 5-5: Time to Complete Tasks (IGS vs. AR); y axis: time (min, 0 to 14), x axis: subject number (1-21), bars for IGS and AR.]
Table 5-1: Errors made during testing (paired t-test p = 0.068)
NN (IGS): average 1.7, STD 1.27
AR: average 1.04, STD 1.02

Table 5-2: Time taken during testing, minutes (paired t-test p = 0.00013)
NN (IGS): average 8.47, STD 1.90
AR: average 5.38, STD 2.74
Table 5-3: Paired Questionnaire Analysis between AR and IGS (average answer, paired t-test p value, significance at α = 0.05).

1. How was the test to perform using AR? Average answer: 4.7
   How was the test to perform using IGS? Average answer: 3.4
   Scale: 1 - Extremely difficult. 2 - Reasonably difficult. 3 - Somewhat difficult. 4 - So-so. 5 - Somewhat easy. 6 - Reasonably easy. 7 - Extremely easy.
   p = 0.001. Yes, the answers are different.

2. How accurate do you think you were in the test session using AR? Average answer: 5.1
   How accurate do you think you were in the test session using IGS? Average answer: 4.7
   Scale: 1 - Completely inaccurate. 2 - Reasonably inaccurate. 3 - Barely inaccurate. 4 - Borderline. 5 - Barely accurate. 6 - Reasonably accurate. 7 - Completely accurate.
   p = 0.250. No, the answers are not different.

3. How well do you think you got a feel for the position and orientation of the items in the phantom using AR? Average answer: 4.3
   How well do you think you got a feel for the position and orientation of the items in the phantom using IGS? Average answer: 4.0
   Scale: 1 - Extremely poor. 2 - Remarkably poor. 3 - Poor. 4 - So-so. 5 - Well. 6 - Remarkable. 7 - Extremely good.
   p = 0.015. Yes, the answers are different.

4. How often were you confused by the information you were presented using AR? Average answer: 4.0
   How often were you confused by the information you were presented using IGS? Average answer: 3.4
   Scale: 1 - >90% of the time. 2 - >60% of the time. 3 - 50% of the time. 4 - <30% of the time. 5 - <10% of the time. 6 - <5% of the time. 7 - Never.
   p = 0.055. The answers are on the borderline of being different.
Table 5-4: Table of Responses for Subjective Question 1
Question: Did you use any strategies to perform this task? Please describe how you completed this test (i.e., what did you look at, what did you think about, what did you pay attention to? Or any other comments)?
* I was trying to pay attention to the area I was looking at and aligning the pointer as best as I could. The 3 views offered by IGS were very helpful. (Surgeon comment)
* Color of the objects made it easier in AR.
* Using the pointer view of the skull to determine position was required more for IGS, while AR provided enough information to make a direct analysis of the object relative to the probe.
* In IGS, had difficulty viewing all the directional views at the same time to figure out the object locations. AR reduced the complexity.
* For IGS I used the 3D images and then tried to move the pointer from there. For AR, I looked for where the image seemed the biggest and assumed that was the plane.
* IGS is better than AR.

Table 5-5: Table of Responses for Subjective Question 2
Question: Were there any parts of the simulations that you found particularly helpful or to which you paid particular attention? Please describe.
* The AR was harder to tell what I was seeing and where. (Surgeon comment)
* The wire mesh rendering of the object in AR provided useful information. It aided in the initial location of the object.
* While using AR, having the blood vessels highlighted assisted me more so than the CT image.

Table 5-6: Table of Responses for Subjective Question 3
Question: Any other comments?
* Incredible - this is really a breakthrough. (Surgeon comment)
* Would help if AR video was in 3D stereo. (Surgeon comment)
* In Image Guidance it was pretty difficult in recognizing position because of a lack of colors.
* I feel that AR was much easier than the IGS. For AR I was sure of my answers, as for IGS I had to guess.
* The robot arm needs more degrees of freedom to be more comfortable.
* I think that AR is a great surgical navigation tool, but I think that one may become an expert at using both techniques simultaneously.
* With AR, it was pretty easy to identify the objects with respect to one another; however, IGS may be better at getting the exact location of the objects.
* If the image remained fixed and the crosshairs moved, this may help for IGS.
* The skill of using the equipment seems to be very important.
* AR was a bit more confusing from the visual stance. If given the choice, I would prefer using Image Guidance.
5.1.4 Conclusions
We believe that Augmented Reality generation is a natural extension for the
surgeon because it does both the 2D to 3D transformation and projects the views
directly onto the patient view. Our Human Factors study indicated that IGS took a
statistically significant longer time than did AR. In addition, although on the border of statistical significance (p value of 0.068), IGS did produce on average a greater number of errors. It is interesting to note that all three of the surgeons tested made more errors using the IGS system and took longer to perform the tasks than they did with the AR system (Pandya A.K. 2003), suggesting that AR does improve the performance and error rates of the surgeon.
Each technique tested here (AR and IGS), however, has its advantages and
disadvantages. We have already illustrated the difficulty of interpreting 2D slices for
IGS. For example, a simple 3D shape like the cube is represented by a triangle in an
oblique CT slice. In fact, this shape was confused by some of the subjects with the
pyramid shape also present in the skull. The bolt is viewed as a circle in the axial view
and a disjoint line in the sagittal view and also confused several subjects. This does not
seem like the natural method of visualization. This was part of the motivation for trying
to improve IGS with the techniques of AR. On the other hand, AR is limited to what
objects are chosen for segmentation for that particular surgery. This can be viewed as an advantage because it simplifies the data set before the surgery starts; however, all the relevant data may not have been selected. It is impossible to segment all the
objects as this is a very time-consuming process. Moreover, having all the detailed
structures present would make the AR scene very cluttered and confusing. Image
Guidance uses the original raw imaging data and the advantage is that all the
information (albeit confusing) is present.
We speculate that certain phases of the surgery (like the craniotomy) may be
easier to execute using the Augmented Reality system. Here, the gross relative location
of the anomaly along with major vessels can easily be drawn on the surface of the skull
(even if they are non-orthogonal trajectories). Using an IGS system, this task can be
time-consuming and confusing. Other phases of the surgery where specific and very
detailed information at the resection site is needed may well be suited for an IGS. At the
risk of inundating the surgeon, perhaps a hybrid on-demand system where all three
modalities of information (3D models, live video AR scene, and an image-guidance
system) would be simultaneously available to the surgeon would be beneficial. There is
still a great deal of uncertainty with regard to the best form of visualization; however, the current state-of-the-art systems need to seriously consider AR technology as a significant method of visualization. Medical robotic devices of the future should be able
to use this technology to directly link these systems to patient data and provide the
optimal visualization of that data for the surgical team. The design and methods of this
prototype device can, we believe, be extrapolated for both current medical robotics
systems and Image Guidance systems. We plan on improving the AR technology using
the results and comments of this study and performing another evaluation in the near
future.
5.2 HUMAN FACTORS TESTING (HEADS-UP DISPLAY VS. MONITORS)
5.2.1 Introduction/Motivation
When watching a few surgeons using a heads up display (HUD) system to view
the remote live video of test surgeries, I noticed an interesting behavior. Even though
moving the head in a particular orientation would not change the view point (the video
view) of the surgeon, all the surgeons moved their heads to a comfortable location as if
aligning some type of internal body coordinate system. This provided the motivation to
perform a study to understand if there were any performance advantages to using a
head-mounted display for doing endoscopic tasks vs. a monitor view.
The use of the HUD in endoscopic neurosurgical procedures has previously been described in the literature, but there are very few studies reporting an objective evaluation of HUD use (Drascic et al. 1989). In this study, we developed an experimental model to assess
the real impact of HUD on the performance of the endoscopic procedure. We believe
that this is the first study where a systematic analysis is done to understand the
difference between these two modalities of visualization.
Diverse types of display devices can give a different level of immersion in the
Virtual or Augmented world (Azuma R. 1997). There are several distinct types of
Augmented Reality displays. One is referred to as monitor-based viewing of augmented scenes; the other uses a head-mounted display (HMD). In the monitor form of augmented reality, a video image of the actual environment is overlaid with graphics, text or other images to enhance the data being seen and is displayed on a monitor. To increase the sense of presence, a head-mounted display (HMD), also referred to as a see-through display, can be used. In this case, either the actual physical world or a video image of the world is fused with graphics, text or other images and presented to the operator. In optical see-through displays, an optical combiner is placed in front of the user's eyes. These combiners, which are partially transmissive, allow the user to see the actual objects of interest and also show the virtual objects (which are projected from mounted monitors). Because the combiners are partially transmissive, the real world appears darker, as not all of the light is let into the viewer's eyes [Wanstall89]. Typically,
the user’s head is also tracked and head movement is reflected in the real video image.
The advantages and disadvantages of each of the different types of viewing systems is
explained in more detail in (Azuma R. 1997).
In contrast, some HMDs use cameras to display video signals from the real
world. They differ from HUDs in that they do not allow the user to see around the
display. There are commercially available products that consist of a set of goggles or a
helmet with tiny monitors in front of each eye to generate images seen by the wearer as
three-dimensional. This HMD is combined with a head tracker so that the images
displayed in the HMD change as the head moves. In addition, these displays can allow
the subject to see the virtual image superimposed over the real world. The wearer can
"see through" the virtual image.
The purpose of this study was to evaluate the use of heads-up display (HUD) technology for endoscopic operations (Marchese M. 2003;Pandya A.K. 2001b). As opposed to Head Mounted Displays (HMDs), HUDs allow the surgeon to view both the video image (by glancing up at the monitors) and the surgical field. In endoscopic operations, the surgeon's hand-eye coordination is critical for success, and there are very subtle issues involved. Slight motions of the head and hands can result in amplified endoscopic motions due to the lever-arm effect, as the entry point of the endoscope acts as a fulcrum (the tip displacement scales roughly with the ratio of the shaft length inside the body to the length outside it). In traditional endoscopic surgery, the surgeon performing the surgery uses the room's overhead monitors or obliquely placed monitors. Usually, he is looking away from the surgical site while performing the surgery. With the heads-up display, the surgeon is able to see the surgical field as well as the endoscopic view at the same time without any head movement. This, we believe, is its main advantage.
Figure 5-6: Different hardware methods to display a remote camera view. [Diagram: real-world video and sensed data are combined with computer graphics, either digitally or by optical merging (mirror), to form the augmented scene, which is presented on a monitor, a see-through display, or a head-mounted display.]
5.2.2 Method
The endoscope is a surgical instrument with a camera mounted at the end of a rigid shaft. The shaft has various sleeves through which tools can be threaded for use at the end-effector site. The major questions for this research are: Can the heads-up display system improve human performance (in terms of time and accuracy) as compared to the traditional use of a monitor? Can it reduce neck strain (a major issue for multi-hour procedures)? And can it provide the user with more focus/attention on the selected task?
The seven test subjects of the study were asked to use an endoscope in a
phantom brain and pick up five targets (distributed throughout the environment) using a
heads-up display or a monitor for viewing. In this case a small cutting (biopsy) tool was
used for the testing purposes. The phantom was covered with a piece of sponge and a
small hole was placed in the center of the sponge to create an opening in which the
user would guide an endoscope (See Figure 5-7). The glasses used for this study were the iglasses (LCX2) from iO Display Systems. These glasses have TV resolution (comparable to the monitor we used for the study). A phantom model of a brain was prepared in which a Velcro strip was fixed in a circular track. Five small blue pieces of plastic tubing, cut at one end to provide edges to grasp, were loosely fixed on the Velcro strip. The use of the endoscope was explained to each test subject with written instructions (Pandya A.K. 2001b).
There were 3 females and 4 males in the study, ages 25-55. The test was counterbalanced in terms of which system (heads-up display or monitor) was used first. The subjects were then asked to perform the task using the two modes of visualization.
Figure 5-7: The phantom skull with a black track of velcro and several blue plastic pieces of tubing that the
subject was asked to remove through the opening in the foam.
Figure 5-8: A subject performing the experiment using the monitor and the heads-up display.
In a follow-up study, to experiment further with the angle of the monitor, a test was conducted in which the subjects picked up six small objects from six tiny cavities located on the bottom of a closed box using a 0° rigid endoscope and biopsy forceps. An electrical plate surrounding the cavities' edges registered any contact with the endoscope or with the forceps with an electronic buzz. The test was performed in three different parts: in the first, the TV monitor was located in front of the subject (0°); in the second, the monitor was angled at 45°; and in the third, a HUD was used. To ensure the data were free of a "learning curve" bias, the three trials were sequenced randomly. The time to complete the tasks and the error counts (contacts with the surrounding plates) were both measured (Marchese M. 2003).
5.2.3 Results
In the first study, five of the seven subjects did the task faster using the heads-up
display. On average, the subjects performed 8 percent better with the heads-up display
(See Figure 5-9). This particular study did not have enough subjects to be a statistically
valid comparison. Also, there was no objective method to measure the number of errors
that the subjects made. In addition, there was no comparison of placement of the
monitor for viewing. However, in the questionnaire provided, several of the subjects commented that the use of the heads-up display helped them concentrate on the task. It reduced external input and helped them focus on the task at hand.
Figure 5-9: Time for testing using the HUD subtracted from time for testing using the Monitor.
[Chart for Figure 5-9: Difference in time to perform the task between monitor and head-mounted display; y axis: time (monitor - HMD), approximately -40 to 60; x axis: subject number (1-7).]
In order to improve on the results of the previous study and provide better control of the placement of the monitor, a follow-up study was conducted in which fifteen subjects participated. This time, both the time to complete the tasks and the errors involved were measured. The average time and errors for the test were: HUD, 194 sec and 6.06 errors; monitor at 0°, 224 sec and 8.33 errors; monitor angled at 45°, 234 sec and 8.46 errors (See Figure 5-10 and Figure 5-11). Statistically significant results (p=0.016 and p=0.014) were found when matching both the time and error data of the HUD with those of the monitor angled at 45° (See Figure 5-12). The comparison with the monitor at 0 degrees was on the borderline of significance, whereas the two monitor conditions compared with each other had very high p values, indicating that there was no difference between them. Figure 5-13 shows a surgeon using the HUD in surgery; she reported very good results in terms of comfort during the surgery, focus (reduced distractions), and accuracy.
Figure 5-10: Average time to perform the same task for 15 subjects using the heads-up display, a monitor placed directly in front of the subject, and a monitor placed at a 45-degree angle from the subject's task. [Chart values: HUD 194 sec; monitor 0° 224 sec; monitor 45° 234 sec.]
Figure 5-11: Average number of errors for the same task for 15 subjects using the heads-up display, a monitor placed directly in front of the subject, and a monitor placed at a 45-degree angle from the subject's task. [Chart values: HUD 6.0 errors; monitor 0° 8.3 errors; monitor 45° 8.4 errors.]
Figure 5-12: Pairwise t-test comparison of the conditions (p values). HMD vs. monitor 0°: time 0.077, error 0.067. HMD vs. monitor 45°: time 0.016, error 0.015. Monitor 0° vs. 45°: time 0.643, error 0.910. The two monitor conditions (0° and 45°) have very high p values, indicating no statistically significant difference between them. When comparing the 45° monitor with the HUD, there is statistical significance at alpha = 0.05, and the comparison with the monitor at 0° is on the borderline of significance.
5.2.4 Conclusions
The first and preliminary study that compared using a heads-up display with the
standard technique of a monitor suggested that using a heads up display to perform
endoscopic ("remote") tasks was faster and caused less strain. In addition, it assisted the user by drawing his attention to the task, hence minimizing external distractions.
The subject study indicated that the heads-up display may improve the focus and
efficiency of the operator. There were not enough subjects in this study to make a
conclusive decision. The heads-up display was used in several surgical cases with very
good results. The heads-up display allowed the surgeon to maintain a clear view of the surgical field without moving her head, while still being able to glance at the endoscopic view. The surgeon commented that there was less neck strain and that she was able to guide the surgery with a little more focus and confidence. The surgeon also said that this was the fastest endoscopic neurosurgery she had ever performed.
Figure 5-13: The HUD was tried in a neurosurgery case with very good results.
The results obtained in the second and more rigorous study are more definitive. The use of the HUD, compared with a 45° angled monitor, confirms that the HUD positively influences the performance of the endoscopic procedure. It was shown in this study that
there was no difference in error or timing between the two monitor cases. However, the
HUD does have a positive influence on both time and number of errors.
In this chapter, two examples of Human Factors testing were given which helped us understand some of the advantages and disadvantages of our developed technology. This is essential as we move forward to improve on this work. The user testing has given us several avenues of future work. In the next and last chapter, we focus on the contributions of this work along with the future directions.
Chapter 6: Summary, Contributions and Future Work
What we call the beginning is often the end and to make an end is to make a
beginning. The end is where we start from. -- T. S. Eliot
This chapter will first provide a summary of the main results/contributions of the
thesis, followed by a description of what future avenues of research have opened up as
a result of this work.
6.1 Summary/Contributions
Image-guided medicine is beginning to make real contributions to patient care.
As imaging systems become more integrated with the operating room and imaging
becomes more real-time, the tools of IGS and AR will become even more useful. We
feel that there is a remarkable research potential in this field with a tremendous return
on the investment for efforts spent in improving this technology for the OR.
There are three novel contributions for this thesis. (1) A comparison of IGT with
AR. This has not been done before, probably for several reasons. One is the complexity
involved in building both types of systems for direct comparison and the other is the
relative newness of the field and the fact that Human Factors Engineering is still not in
the mainstream of medical technology development. (2) A comparison of various
display modalities for endoscopic surgery. Although various display devices have been
reported to be used in endoscopic and robotic surgery, there have been no known
studies that objectively test subjects/surgeons to compare their performance using
various methods of visualizations. What hardware is used for visualization of robotic or
endoscopic views of the end-effector camera is very important. (3) We have created a
novel medical robotic system which uses the kinematics of the robotic device to track a
video camera and create an Augmented Reality scene which can supplement the video
view with geometrical or sensor information (Pandya A. K. 2002;Pandya A.K.
2003b;Pandya A.K. 2003c). AR blends the real world with 3D models generated from
medical scans or other data and has been demonstrated for a passive robotic system in
this thesis.
The technological contributions reported here are the implementation details of
both an Image Guidance System able to show the end-effector of a passive articulated
arm on orthogonal imaging scans and 3D models and an Augmented Reality system on
which registered 3D models are merged with live video. These are both relatively new
technologies and this detailed description of the methods for implementation is
important for future research/researchers. The implementation details provide for an in
depth understanding of the nature of the technologies and are instructive for new
researchers joining this field. Included in this thesis is a detailed section demonstrating
how 3D models are created from imaging data. This is the foundation technology of
Image Guidance and AR.
The two prototype developments were designed to use the same hardware, and the software was structured in such a way that it could impact both the Medical Robotics world as well as the Image Guidance world. A passive articulated robotic arm (the
Microscribe) is used to develop both systems. This thesis covers the steps needed to
rebuild and compute both the real-time Image guided and Augmented Reality scenes
for objects of interest.
As emphasized several times in this thesis, we maintain that successful
technology development for medicine must include extensive user testing and surgical
feedback. We provide an example of how the development and testing cycle should
proceed with development, user testing, surgical feedback and further research and
development. The Human Factors contribution of this thesis is that we have provided
the details of our usability testing of two prototype systems. First, a subject test using 21 subjects provided a comparison of IGS vs. AR. IGS represents what surgeons currently use in the operating theater, and AR represents a potentially new or upgraded visualization system. The second study used a total of 22 subjects to test the use of
Heads-Up Displays vs. Monitor viewing of the remote video of either an endoscopic or
robotic interface.
The underlying hypotheses for both studies were that (1) an Augmented Reality
System will significantly improve the performance of the surgeon over an IGS system
(2) that advanced visualization hardware (i.e. a heads up display) can improve the
performance of the surgeon over a monitor view. Both our hypotheses have been
affirmed by our subject testing, albeit with some important caveats that are noted in the
respective chapters. The first Human Factors study indicated that the subjects performed faster, with more accuracy and fewer errors (the error results being on the borderline of significance), using the Augmented Reality interface. The results of the HUD vs. Monitor study revealed that the HUD positively impacted both the performance and the error rate of the subjects.
The ultimate aim of this work will be the extrapolation of the findings to the development of AR and IGT for active medical robotic systems and the incorporation of AR techniques into existing image guidance systems. Commercial systems should have both technologies available on demand for the surgeon. Examples of current robotic systems that could take advantage of this technology include the Neuromate and Robodoc systems (Integrated Surgical Systems Inc.), the daVinci (Intuitive Inc.), and the Zeus (Computer Motion Inc.) systems. Research rarely gives definitive
answers to our questions and although positive results are provided for both
hypotheses, we feel that several more rounds of technology enhancements along with
subject testing will be essential as these technologies begin to become adopted into the
operating rooms of the future.
6.2 Future Work
There is a substantial amount of future work that needs to be performed to make
AR a routine part of main-stream surgery. Some of the major obstacles that remain are
continuous zoom camera calibration, the modeling of deformable objects, and
producing real-time AR scenes. There are many technical problems with the
development of AR. However, we recommend that future researchers also focus more
on psychophysical experiments in which one can answer important usability and performance questions such as: How much error is tolerable by the surgeon? How much dynamic error can the surgeon handle? What is the performance of the surgeon when certain types of features are disabled? What are the limits of information presented to the surgeon, and would on-demand systems be better? These are just a
few of the very relevant and important questions that need to be asked as the
technology is being further developed.
The following sections contain potential research directions that we can envision
at this juncture.
6.2.1 Stereo Augmentation
One of the surgeons that used the AR system suggested that a stereo AR
system would add value to the AR experience. This is a very valid observation
because, in order to get real depth perception in an AR scene, several points of view
are needed. One possible research direction is to pursue eyewear-free (auto-
stereoscopic/lenticular display) stereo display system for real-time video augmented
with computer generated dynamic overlays. A stereo image system should further
enhance the technology and the human computer interface. Again, after the
development of technology, subject testing for improved depth perception should be
used to quantify any improvements.
6.2.2 Sensor Technology/Data at the End Effectors of Robots
Experimentation should be done on visualizing sensor information. For instance, sensors can be mounted on the end-effector of a robotic device, and data taken with the sensor can then be viewed using augmentation techniques. For example, tumor removal requires that the surgeon be able to accurately define the tumor resection margins. Current techniques in IGS rely primarily on visual feedback from the surgical site. Here, the goal would be to enhance the sensing modalities of the surgeon by
adding sensor technology to the surgical robotic environment for the sensing of, for instance, pH, O2 and glucose levels, or even higher-order molecular signals such as Raman spectroscopy. It is hypothesized that other modalities of information from the surgical/tumor site, based on these non-visual/biochemical aspects, will enhance the surgeon's ability to more completely define resection margins. Other
examples include augmenting the 3D force/torque sensor information at the end-
effector of robots. The 3D nature of force and torque information is difficult to interpret
by the operator and perhaps registered vector flows directly on the video stream from
the remote site would be beneficial.
6.2.3 Continuous Zoom Camera Calibration
Camera calibration is a very important issue for AR technology to work correctly.
A process that can track real-time variations of camera parameters (intrinsic, camera-to-
camera pose, camera-to-worksite pose) that may occur due to operator adjustment of
camera zoom will be of utmost importance. Research to calibrate the cameras for
multiple zoom angles and to provide a robust method of interpolation in order to be able
to derive the camera parameter estimation for all zoom angles of the camera, would be
very important work. There has been extensive work done on calibrating a camera with variable zoom (D. Liebowitz 1998;M. Li;O. D Faugeras 1992;Sturm 2002). The complexity of the calibration comes from the fact that both the intrinsic and extrinsic parameters of the camera are dependent on the zoom, focus, and aperture settings. In contrast, static cameras (on which our prototype is built) have only one zoom, focus and aperture setting. The number of calibration points needed for a zoom camera will be substantially greater than for a static camera. Various techniques have been proposed
to reduce the number of data points (R. Atienza 2001). In these approaches, intrinsic
and extrinsic parameters are estimated for constant aperture settings for a range of
zoom angles. High-order polynomials are then used to approximate the camera
parameters in continuous mode for other zoom-focus combinations. Other researchers
have used neural networks to closely approximate the camera model (M. Ahmed 2000).
Experimentation with these methods and the integration of a robust solution to the continuous camera calibration problem will be essential.
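As a sketch of the polynomial-approximation idea described above, the camera could be calibrated at a handful of discrete zoom settings and each intrinsic parameter then fitted as a low-order polynomial of the zoom position; the values below are hypothetical and purely illustrative.

```python
# Sketch: interpolate a zoom-dependent intrinsic parameter (here, focal length)
# from calibrations at a few discrete zoom settings. Hypothetical values.
import numpy as np

zoom_settings = np.array([0.0, 0.25, 0.5, 0.75, 1.0])        # normalized zoom
focal_px = np.array([700.0, 860.0, 1050.0, 1290.0, 1600.0])  # calibrated values

coeffs = np.polyfit(zoom_settings, focal_px, deg=3)  # low-order polynomial fit
focal_model = np.poly1d(coeffs)

print("estimated focal length at zoom 0.6:", focal_model(0.6))
# The same fitting can be repeated for each intrinsic (and extrinsic) parameter
# as a function of the zoom/focus setting.
```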
6.2.4 3D Ultrasound for AR
A very recent development in imaging is 3D ultrasound (PM. et al. 2000). A conventional 2D ultrasound probe (which has been used for decades) can be used in a novel way to produce 3D imaging. If the 2D probe is equipped with a six-degrees-of-freedom tracking device, spatially registered 2D scans can be acquired. These scans can then be mathematically combined to create a tomographic 3D image set. The
resulting 3D data image planes can be visualized by either volume rendering or by the
process of segmentation and surface rendering techniques. This data could provide live
updates to an AR system in use for surgery.
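The freehand 3D-ultrasound idea can be sketched as follows: each tracked 2D scan is mapped into a regular 3D volume using the probe's six-degree-of-freedom pose. The transform, spacings, and image below are placeholders for illustration and do not represent a specific scanner or tracking system.

```python
# Sketch: compound tracked 2D ultrasound slices into a 3D volume. Placeholder
# pose, spacings, and image; illustrative only.
import numpy as np

def insert_slice(volume, origin_mm, voxel_mm, image, pixel_mm, T_probe_to_world):
    """Scatter one 2D image (lying in the probe plane z = 0) into the volume."""
    height, width = image.shape
    for v in range(height):
        for u in range(width):
            p_probe = np.array([u * pixel_mm, v * pixel_mm, 0.0, 1.0])
            p_world = T_probe_to_world @ p_probe          # apply tracked pose
            idx = np.round((p_world[:3] - origin_mm) / voxel_mm).astype(int)
            if np.all(idx >= 0) and np.all(idx < volume.shape):
                volume[tuple(idx)] = max(volume[tuple(idx)], image[v, u])
    return volume

vol = np.zeros((100, 100, 100), dtype=np.float32)       # 100 mm cube, 1 mm voxels
slice_img = np.random.rand(60, 80).astype(np.float32)   # stand-in ultrasound frame
vol = insert_slice(vol, origin_mm=np.zeros(3), voxel_mm=1.0,
                   image=slice_img, pixel_mm=0.5, T_probe_to_world=np.eye(4))
```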
6.2.5 Space Station Robotics (to infinity and beyond)
Although the emphasis of this thesis is on Medical Robotics, it is interesting to
point out the dual-use/technology transfer of the same technology to NASA’s space
station robotic arm (Pandya A.K. 2001f;Pandya A.K. 2002;Pandya A.K. 2003a). In fact,
NASA has collaborated with us (supplied consultation and software) and financially
supported this work. The same technology that has been developed here to assist the
medical domain, has also been shared with NASA’s Graphics Research and Analysis
Facility at the Johnson Space Center where the technology is being used to augment a
Space Station Robot’s camera view with the appropriate graphics (for example a space
Shuttle docking target) to assist the astronaut. They have also followed suit and
performed human factors tests that concluded that Augmented Reality does improve
the performance of the robotic operator. Based on work done both as part of this thesis and at the Johnson Space Center, NASA wants to implement this system for its Space Station Mockup and has provided us an additional round of funding to assist
them. Possible future avenues could include merging the AR and the VR worlds such
that the operators could first practice the procedure in the overlaid virtual world and then
perform the actual operation once all the parameters were correctly adjusted.
Appendix Human Factors Study Subject Testing Material (AR vs. IGS)
Subject Instructions
1. You will be given detailed instructions depending on which portion of the experiment you will be performing.
2. You will be given a robotic instrument which is linked to a particular imaging system.
3. You will first be given some practice time to see how the visualization works.
4. You will then be given the task of outlining on the surface of a plastic skull where you think the location of various objects are, and will be asked to answer some questions as you are performing the visualization.
5. You will be timed on how long it takes you to identify and draw the projections of these objects on the skull surface and answer all the questions.
6. The movements of the robot arm are also recorded for later analysis.
7. Your data will be completely confidential as only the subject number is recorded on the data sheet.
8. A brief questionnaire will be given at the end of the test.

Practice Session: Please try and locate the center of each of the objects specified from the front, side and top views using the navigation tool given. You have about 5 minutes to practice each of the techniques. In the Neuronavigation, you will be provided 3 orthogonal slices from where your pointer is located. Imagine looking at all three plane slices from the point of contact. In the AR, you will see a projected view on a live video scene of the objects of interest.

Subject # ………    M    F    Age …..
Image Guidance / Augmented Reality
Start time AR        End Time AR
Start time NN        End Time NN
Figure 6-3
During-Test Questions

Please answer the following questions as you are performing the AR task. Hint: Use the camera/pointer in all directions (orthogonal views help the most) when determining the position of the objects you are viewing as you answer these questions (for both parts, NN and AR). Tell the operator when you are starting.

1. Identify the pyramid structure and mark (color in) the projection of this object from location A (marked), viewed perpendicular to the surface.
2. Where is the pyramid? Circle one from each of the three lines below.
   Top View:   a. Front   b. Middle   c. Back
   Side View:  a. Top     b. Middle   c. Bottom
   Front View: a. Left    b. Middle   c. Right
3. Where is the cylinder (facing the skull)?
   Top View:   a. Front   b. Middle   c. Back
   Side View:  a. Top     b. Middle   c. Bottom
   Front View: a. Left    b. Middle   c. Right
4. Is the bolt touching the vessel?
5. Is the bolt above or below the cube?

Tell the operator that you have finished.

Please answer the following questions as you are performing the Neuro Navigation task. Tell the operator when you are starting.

1. Identify the cube structure and mark (color in) the projection of this object from location B (marked), viewed perpendicular to the surface.
2. Where is the cube (facing the skull)?
   Top View:   a. Front   b. Middle   c. Back
   Side View:  a. Top     b. Middle   c. Bottom
   Front View: a. Left    b. Middle   c. Right
3. Where is the bolt (facing the skull)?
   Top View:   a. Front   b. Middle   c. Back
   Side View:  a. Top     b. Middle   c. Bottom
   Front View: a. Left    b. Middle   c. Right
4. Is the cylinder above or below the cube?
5. Is the bolt in front of or behind the vessels?

Tell the operator that you have finished.
Post-Test Questionnaire for the project
1. How was the test to perform using Augmented Reality?
   1 - Extremely difficult.  2 - Reasonably difficult.  3 - Somewhat difficult.  4 - So-so.  5 - Somewhat easy.  6 - Reasonably easy.  7 - Extremely easy.
2. How was the test to perform using Neuro Navigation?
   1 - Extremely difficult.  2 - Reasonably difficult.  3 - Somewhat difficult.  4 - So-so.  5 - Somewhat easy.  6 - Reasonably easy.  7 - Extremely easy.
3. How accurate do you think you were in the test session using Augmented Reality?
   1 - Completely inaccurate  2 - Reasonably inaccurate  3 - Barely inaccurate  4 - Borderline  5 - Barely accurate  6 - Reasonably accurate  7 - Completely accurate
4. How accurate do you think you were in the test session using Neuro Navigation?
   1 - Completely inaccurate  2 - Reasonably inaccurate  3 - Barely inaccurate  4 - Borderline  5 - Barely accurate  6 - Reasonably accurate  7 - Completely accurate
5. How well do you think you got a feel for the position and orientation of the items in the phantom using AR?
   1 - Extremely poor  2 - Remarkably poor  3 - Poor  4 - So-so  5 - Well  6 - Remarkable  7 - Extremely good
6. How well do you think you got a feel for the position and orientation of the items in the phantom using NN?
   1 - Extremely poor  2 - Remarkably poor  3 - Poor  4 - So-so  5 - Well  6 - Remarkable  7 - Extremely good
7. How often would you say that you were confused by the information you were presented in AR?
   1 - >90% of the time  2 - >60% of the time  3 - 50% of the time  4 - <30% of the time  5 - <10% of the time  6 - <5% of the time  7 - Never
8. How often would you say that you were confused by the information you were presented in NN?
   1 - >90% of the time  2 - >60% of the time  3 - 50% of the time  4 - <30% of the time  5 - <10% of the time  6 - <5% of the time  7 - Never
9. Did you use any strategies to perform this task? Please describe how you completed this test (i.e., what did you look at, what did you think about, what did you pay attention to? Or any other comments).
10. Were there any parts of the simulations that you found particularly helpful or to which you paid particular attention? Please describe.
11. Any other comments?
BIBLIOGRAPHY

A. Pandya, M. Siadat, G. Auner (Invited Speaker). 2003 Augmented Reality vs. Neuronavigation: a Comparison of Surgeon Performance. In: Biomedical Engineering Symposium 2003, Wayne State University.
Abdel-Aziz Y. I. , Karara H. M. 1971 Direct Linear Transformation into Object Space
Coordinates in Close-Range Photogrammetry.In: Proc. Symposium on Close-
Range Photogrammetry, Urbana, Illinois, 1-18.
Alp, M. S., Dujovny, M., Misra, M., Charbel, F. T., and Ausman, J. I. 1998. Head
registration techniques for image-guided surgery. Neurological Research; 20(1);
31-37.
Ayache, N. 1995. Medical Computer Vision, Virtual-Reality and Robotics. Image and
Vision Computing; 13(4); 295-313.
Azuma, R. T. 1997. A survey of augmented reality. Presence-Teleoperators and Virtual
Environments; 6(4); 355-385.
Bernstein, M., Hebert, P. C., and Etchells, E. 2003. Patient safety in neurosurgery:
Detection of errors, prevention of errors, and disclosure of errors. Neurosurgery
Quarterly; 13(2); 125-137.
Berry, J., O'Malley, B. W., Humphries, S., and Staecker, H. 2003. Making image
guidance work: Understanding control of accuracy. Annals of Otology Rhinology
and Laryngology; 112(8); 689-692.
Billinghurst, M., Kato, H., and Poupyrev, I. 2001. The MagicBook: a transitional AR
interface. Computers & Graphics-Uk; 25(5); 745-753.
Blackwell, M., Nikou, C., DiGioia, A. M., and Kanade, T. 2000. An Image Overlay
system for medical data visualization. Medical Image Analysis; 4(1); 67-72.
Bloch, P., Lenkinski, R. E., Buhle, E. L., Hendrix, R., Bryer, M., and Mckenna, W. G.
1991. The Use of T2-Distribution to Study Tumor Extent and Heterogeneity in
Head and Neck-Cancer. Magnetic Resonance Imaging; 9(2); 205-211.
Bowersox, J. C., Cordts, P. R., and LaPorta, A. J. 1998. Use of an intuitive
telemanipulator system for remote trauma surgery: An experimental study.
Journal of the American College of Surgeons; 186(6); 615-621.
Broll, W., Schafer, L., Hollerer, T., and Bowman, D. 2001. Interface with angels: The
future of VR and AR interfaces. Ieee Computer Graphics and Applications; 21(6);
14-17.
Bucholz, R. D., Smith, K. R., Laycock, K. A., and McDurmont, L. L. 2001. Three-
dimensional localization: From image-guided surgery to information-guided
therapy. Methods; 25(2); 186-200.
Burkart A., Debski RE., McMahon PJ., Rudy T., and Fu FH, M. V., van Scyoc A, Woo SL. 2001. Precision of ACL Tunnel Placement Using Traditional and Robotic Techniques. Computer Aided Surgery; 6; 270-278.
Carthey, J., de Leval, M. R., and Reason, J. T. 2001. The human factor in cardiac
surgery: Errors and near misses in a high technology medical domain. Annals of
Thoracic Surgery; 72(1); 300-305.
Cash, D. M., Sinha, T. K., Chapman, W. C., Terawaki, H., Dawant, B. M., Galloway, R.
L., and Miga, M. I. 2003. Incorporation of a laser range scanner into image-
guided liver surgery: Surface acquisition, registration, and tracking. Medical
Physics; 30(7); 1671-1682.
Chen, Y. M., Guo, W. H., Huang, F., Wilson, D., and Geiser, E. A. 2003. Using prior
shape and points in medical image segmentation. Energy Minimization Methods
in Computer Vision and Pattern Recognition, Proceedings; 2683; 291-305.
Chmielewski, C., Pandya, A., Woolford, B., Adolf, J., Whitmore, M., Berman, A. H., and
Maida, J. (1998). "Comparison of the Features of Multimedia and Virtual Reality
for use in Learning." LMSMSS 32906, NASA, Houston.
Chmielewski C., Pandya, A., Adolf, J., Whitmore, M., Berman, A., Woolford, B., and
Maida, J. C. 1999 Comparison Of The Features Of Multimedia And Virtual
Reality For Use In Learning.In: Electronic Proceedings of the 1999 International
Conference on Computer-Aided Ergonomics and Safety, Barcelona, Spain.
Cho, Y. K., and Neumann, U. 2001. Multiring fiducial systems for scalable fiducial-
tracking augmented reality. Presence-Teleoperators and Virtual Environments;
10(6); 599-612.
Cleary K. , Nguyen C. 2001. State of the Art in Surgical Robotics: Clinical Applications
and Technology Challenges. Computer Aided Surgery; 6; 312-328.
Cline, H. E., Dumoulin, C. L., Hart, H. R., Lorensen, W. E., and Ludke, S. 1987. 3d
Reconstruction of the Brain from Magnetic-Resonance Images Using a
Connectivity Algorithm. Magnetic Resonance Imaging; 5(5); 345-352.
Cline, H. E., Dumoulin, C. L., Lorensen, W. E., Souza, S. P., and Adams, W. J. 1991.
Volume Rendering and Connectivity Algorithms for Mr Angiography. Magnetic
Resonance in Medicine; 18(2); 384-394.
Cuschieri, A. 2003. Medical errors, incidents, accidents and violations. Minimally
Invasive Therapy & Allied Technologies; 12(3-4); 111-120.
D. Liebowitz , A. Zisserman. 1998 Metric Rectification for Perspective Images of
Planes.In: in Proc. IEEE Conf. on CVPR, 482-488.
Danisch, L. A. 1997. Fiber-optic shape sensors (TM) and shape tape (TM).
Measurements & Control (186); 99-102.
Delcker, A., and Tegeler, C. 1998. Development and application of 3D ultrasound in
neurology. Aktuelle Neurologie; 25(2); 56-62.
D'Esposito, M., Deouell, L. Y., and Gazzaley, A. 2003. Alterations in the bold FMRI
signal with ageing and disease: A challenge for neuroimaging. Nature Reviews
Neuroscience; 4(11); 863-872.
DiGioia, A. M. 1998. Computer assisted orthopaedic surgery: Medical robotics and
image guided surgery - Comment. Clinical Orthopaedics and Related Research
(354); 2-4.
Drascic, D., and Milgram, P. 1996 Perceptual Issues in Augmented Reality.In: SPIE,
San Jose, 123-134.
Drascic, D., Milgram, P., and Grodski, J. J. 1989 Learning Effects in Telemanipulation
With Monoscopic Versus Stereoscopic Remote Viewing.In: IEEE International
Conference on Systems, Man, and Cybernetics, Boston.
Erdi, Y. E., Humm, J. L., Imbriaco, M., Yeung, H., and Larson, S. M. 1997. Quantitative
bone metastases analysis based on image segmentation. Journal of Nuclear
Medicine; 38(9); 1401-1406.
Ferrant, M., Nabavi, A., Macq, B., Black, P. M., Jolesz, F. A., Kikinis, R., and Warfield,
S. K. 2002. Serial registration of intraoperative MR images of the brain. Medical
Image Analysis; 6(4); 337-359.
Fleute, M., and Lavallee, S. 1999. Nonrigid 3-D/2-D registration of images using
statistical models. Medical Image Computing and Computer-Assisted
Intervention, Miccai'99, Proceedings; 1679; 138-147.
Freysinger, W., Truppe, M. J., Gunkel, A. R., Thumfart, W. F., Pongracz, F., and
Maierbaeuerl, J. 1997. Interactive telepresence and augmented reality in ENT
surgery: Interventional Video Tomography. Cvrmed-Mrcas'97; 1205; 817-820.
Fuchs, H., Livingston, M. A., Raskar, R., Colucci, D., Keller, K., State, A., Crawford, J.
R., Rademacher, P., Drake, S. H., and Meyer, A. A. 1998. Augmented reality
visualization for laparoscopic surgery. Medical Image Computing and Computer-
Assisted Intervention - Miccai'98; 1496; 934-943.
Fuchs, H., State, A., Pisano, E. D., Garrett, W. F., Hirota, G., Livingston, M., Whitton, M.
C., and Pizer, S. M. 1996. Towards performing ultrasound-guided needle
biopsies from within a head-mounted display. Visualization in Biomedical
Computing; 1131; 591-600.
Gallen, C. C., Bucholz, R., and Sobel, D. F. 1994. Intracranial Neurosurgery Guided by
Functional Imaging. Surgical Neurology; 42(6); 523-530.
Goldsby M.E., Pandya A.K., Maida J.C., Hancock L.H. 1994 Scripting Human Animations in a Virtual Environment. In: Proceedings of the Fourth International Symposium on Measurements and Control in Robotics: Topical Workshop on Virtual Reality, pp. 1434-150.
Gong J., Zamorano L., Li Q.H., Pandya A.K., Diaz F. 1999 Tradeoff Analysis of
Medical Image Registration Strategy.In: 49th Annual Meeting of the Congress of
Neurological Surgeons, Boston, Massachusetts, pp 519-520.
Grimson W.E.L., Ettinger G.J., White S.J., Lozano-Pérez T., Wells III W.M. , Kikinis R.
1996. An Automatic Registration Method for Frameless Stereotaxy, Image
Guided Surgery, and Enhanced Reality Visualization. IEEE Trans. Medical
Imaging; 15(2); 129--140.
H. Kato, M. Billinghurst, I. Poupyrev, K. Imamoto, K. Tachibana. 2000 Virtual Object
Manipulation on a Table-Top AR Environment.In: Proceedings of ISAR 2000.
Harders, M., and Szekely, G. 2003. Enhancing human-computer interaction in medical
segmentation. Proceedings of the Ieee; 91(9); 1430-1442.
Hattori A. , Suzuki N., Hashizume M., Akahoshi T. , Konishi K., Yamaguchi S. ,
Shimada M. , Hayashibe M. 2003 A Robotic Surgery System (da Vinci) with
Image Guided Function - System Architecture and Cholecystectomy
Application.In: Medicine Meets Virtual Reality 11, 110-116.
Heikkila, J. 2000. Geometric Camera Calibration Using Circular Control Points. IEEE
PAMI; 22(10).
Hinsche, A. F., and Smith, R. M. 2001. Image-guided surgery. Current Orthopaedics;
15(4); 296-303.
Holly, L. T., and Foley, K. T. 2003. Intraoperative spinal navigation. Spine; 28(15); S54-
S61.
Horkaew, P., and Yang, G. Z. 2003. Optimal deformable surface models for 3D medical
image analysis. Information Processing in Medical Imaging, Proceedings; 2732;
13-24.
Hounsfield, G. N. 1980. Computed Medical Imaging - Nobel Lecture, December 8,
1979. Journal of Computer Assisted Tomography; 4(5); 665-674.
Hoznek, A., Zaki, S. K., Samadi, D. B., Salomon, L., Lobontiu, A., Lang, P., and Abbou,
C. C. 2002. Robotic assisted kidney transplantation: An initial experience.
Journal of Urology; 167(4); 1604-1606.
Iseki, H., Masutani, Y., Iwahara, M., Tanikawa, T., Muragaki, Y., Taira, T., Dohi, T., and
Takakura, K. 1997. Volumegraph (Overlaid three-dimensional image-guided
navigation) - Clinical application of augmented reality in neurosurgery.
Stereotactic and Functional Neurosurgery; 68(1-4); 18-24.
Iseki, H., Muragaki, Y., Taira, T., Kawamata, T., Maruyama, T., Naemura, K., Nambu,
K., Sugiura, M., Hirai, N., Hori, T., and Takakura, K. 2001. New possibilities for
stereotaxis - Information-guided stereotaxis. Stereotactic and Functional
Neurosurgery; 76(3-4); 159-167.
Jannin P. , Fleig O. J., Seigneuret E. , Grova C., Morandi X. , Scarabin J.M. 2000. A
Data Fusion Environment for Multimodal and Multi-Informational Neuro-
Navigation. Journal of Computer Aided Surgery; 5(1); 11-17.
Kaplan, W. (1981). Advanced Mathematics for Engineers, Addison Wesley, Reading MA.
Kappert, U., Cichon, R., Schneider, J., Gulielmos, V., Ahmadzade, T., Nicolai, J.,
Tugtekin, S. M., and Schueler, S. 2001. Technique of closed chest coronary
artery surgery on the beating heart. European Journal of Cardio-Thoracic
Surgery; 20(4); 765-769.
Khamene A., Wacker F., Lewin J. 2003. An Augmented Reality System for MRI-Guided
Needle Biopsies.In: Medicine meets Virtual Reality 11, 151-157.
Knight, C., Cao, A., Lorincz, A., Gidell, K., Langenburg, S., and Klein, M. 2003a.
Application of a Surgical Robot to Open Microsurgery:. Pediatric Endosurgery &
Innovative Technique; 7(3); 227-232.
Knight, C., Cao, A., Lorincz , A., Gidell K., Langenburg, S., and Klein, M. 2003b.
Application of a Surgical Robot to Open Microsurgery: The Equipment. Pediatric
Endosurgery & Innovative Technique; 7(3); 227-232.
Komistek, R. D., Dennis, D. A., and Mahfouz, M. 2003. In vivo fluoroscopic analysis of
the normal human knee. Clinical Orthopaedics and Related Research (410); 69-
81.
Lee, C. C., Chung, P. C., and Tsai, H. M. 2003. Identifying multiple abdominal organs
from CT image series using a multimodule contextual neural network and spatial
fuzzy rules. Ieee Transactions on Information Technology in Biomedicine; 7(3);
208-217.
Lei, T. H., Udupa, J. K., Odhner, D., Nyul, L. G., and Saha, P. K. 2003. 3DVIEWNIX-
AVS: a software package for the separate visualization of arteries and veins in
CE-MRA images. Computerized Medical Imaging and Graphics; 27(5); 351-362.
Leventon, M. E. (2000). "Statistical Models in Medical Image Analysis," Ph.D.Thesis,
MIT, Boston.
Li Q., Zamorano L., Gong J., Pandya A.K., Diaz F. 2000 Application Accuracy of
Different Registration Methods in Frameless Computer-Assisted Surgery.In:
American Association of Neurological Surgeons Annual Meeting, San Francisco,
California, 125.
Li, Q., Zamorano L., Guthikonda M., Pandya A.K., Perez R., Diaz F. 2000 The
Application Accuracy of Intraoperative Registration and Related Factors in Image
Guided Neurosurgery.In: Congress of Neurological Surgeons Annual Meeting.
Li Q., Zamorano L., Pandya A., Gong J., Elkhatib E., Perez R., Diaz F. 2001 The Application
Accuracy of the NeuroMate Robot System.In: AANS Annual Meeting, Toronto,
Ontario, 92.
Li Q. , Zamorano L. , Jiang Z. , Gong J., Pandya A.K., Perez R., Diaz F. 1999a. Effect
of optical digitizer selection on the application accuracy of a surgical localization
system - a quantitative comparison between the OPTOTRAK and flashpoint
tracking systems. Computer Aided Surgery; 4; 322-327.
Li Q. , Zamorano L., Pandya A.K., Perez R. ,Gong J., Diaz F. 2002. The Application
Accuracy of the NeuroMate Robot - A Quantitative Comparison with Frameless
and Frame-based Surgical Localization Systems. Computer Aided Surgery; 7;
90-98.
Li Q., Zamorano L., Perez R., Gong J., Pandya A.K., Diaz F. 1999b Endoscopic
Transnasal, Transseptal, and Transsphenoidal Approach for Pituitary tumors
Guided by Infrared Tracking System.In: 49th Annual Meeting of the Congress of
Neurological Surgeons, Boston, Massachusetts, 325 - 326.
Livingston, M. A., and State, A. 1997. Magnetic tracker calibration for improved
augmented reality registration. Presence-Teleoperators and Virtual
Environments; 6(5); 532-546.
M. Li. "Camera Calibration of the KTH Head-Eye System." TRITR-NA-9407, Dept. of
Numerical Analysis and Computer Science.
M. Ahmed, A. Farag. 2000 A Neural Optimization Framework for Zoom Lens Camera
Calibration.In: IEEE CVPR.
Marchese M., Li Q., Zamorano L., Pandya A. 2003 Quantitative Comparison between the Heads-up-display (HUD) and Common Monitor in Endoscopic Surgery. In: The 71st Annual Meeting of The American Association of Neurological Surgeons, San Diego, California.
Marchese M., Pandya A., Mahmoud M., Higgins M., Li Q., Zamorano L. 2003
Quantitative Comparison between the Heads-up-display (HUD) and Common
Monitor in Endoscopic Surgery.In: Congress of Neurological Surgeons Annual
Meeting, Philadelphia.
Masutani, Y., Doshi, T., Yamane, F., Iseki, H., and Takakura, K. 1998. Augmented
reality based visualization system for intravascular neurosurgery,. Journal of
Computer Aided Surgery; 3(5); 239-47.
Maurer, C. R., Gaston, R. P., Hill, D. L. G., Gleeson, M. J., Taylor, M. G., Fenlon, M. R.,
Edwards, P. J., and Hawkes, D. J. 1999. AcouStick: A tracked A-mode
ultrasonography system for registration in image-guided surgery. Medical Image
Computing and Computer-Assisted Intervention, Miccai'99, Proceedings; 1679;
953-962.
Miller, M. I., Christensen, G. E., Amit, Y., and Grenander, U. 1993. Mathematical
Textbook of Deformable Neuroanatomies. Proceedings of the National Academy
of Sciences of the United States of America; 90(24); 11944-11948.
Nakao, N., Nakai, K., and Itakura, T. 2003. Updating of neuronavigation based on
images intraoperatively acquired with a mobile computerized tomographic
scanner: Technical note. Minimally Invasive Neurosurgery; 46(2); 117-120.
Nio, D., Bemelman, W. A., den Boer, K. T., Dunker, M. S., Gouma, D. J., and van Gulik,
T. M. 2002. Efficiency of manual vs robotical (Zeus) assisted laparoscopic
surgery in the performance of standardized tasks. Surgical Endoscopy and Other
Interventional Techniques; 16(3); 412-415.
Nowinski, W. L., Belov, D., and Benabid, A. L. 2003. An algorithm for rapid calculation
of a probabilistic functional atlas of subcortical structures from
electrophysiological data collected during functional neurosurgery procedures.
Neuroimage; 18(1); 143-155.
O. D. Faugeras, Luong, Q. T., and Maybank, S. J. 1992 Camera Self-Calibration: Theory and Experiments. In: Proc. of European Conf. on Computer Vision, Santa Margherita Ligure, 321-334.
Pandya A. K. , Zamorano L. (patent authors in alphabetical order). (2002). "Augmented
Tracking Using Video, Computer Data and/or Sensing Technologies." Application
No. 10/101421 Customer Number 26646, Wayne State University, USA.
Pandya A.K., Zamorano, L. (2001a). "The Development and Human Factors Analysis
of Advanced 3-D Visualization for Telepresence." Galveston, Tx.
Pandya A.K., Aldridge Ann, Goldsby M., Maida J. 1994 Analysis of Human Posture
using a Strength Model and a Virtual Environment.In: Houston Society for
Medicine and Engineering, Houston, Tx.
Pandya A.K., Li Q., Zamorano L., Perez-de la Torre R. 2000a The Application
Accuracy of the Neuromate Robot ---- A Quantitative Comparison with Frameless
Infrared and Frame Based Surgical Localization Systems.In: Computer Assisted
Orthopaedic Surgery (CAOS), Pittsburgh, Pennsylvania, 261.
Pandya A.K. , M. Siadat, G. Auner. Design, Implementation and Accuracy of a
Prototype for Medical Robotic Vision Augmentation. Computer Aided Surgery;
(Submitted).
Pandya A.K. , Siadat M., Li Q., Gong J., Zamorano L., Martinez J., Perez R., Maida
J.C. 2001b Does Heads-up Display Improve Neurosurgical Endoscopic
Procedures? In: Congress of Neurological Surgeons, San Diego, California.
Pandya A.K., Siadat M., Gong J., Li. Q, Zamorano, L , Maida J.C. 2001c Towards
Using Augmented Reality for Neurosurgery.In: Medicine Meets Virtual Reality 9:
Outer Space, Inner Space, Virtual Space, Newport Beach, CA.
Pandya A.K., Siadat M., Maida J., Auner G., Zamorano L. 2003a Robotic Vision
Registration and Live-Video Augmentation-- A Prototype for Medical and Space
Station Robots.In: Bioastronautics Investigators Workshop, Galveston, Texas,
27.
Pandya A.K. , Siadat M., Zamorano L. ,Gong J.,Li Q. , Maida J.C., Kakadiaris I. 2001d
Tracking Methods for Medical Augmented Reality.In: Medical Image Computing
and Computer-Assisted Intervention - MICCAI 2001 (Lecture Notes in Computer
Science), Utrecht, The Netherlands, 1406-1408.
Pandya A.K., Siadat M., Zamorano L., Gong J., Li Q. , Maida J.C., Kakadiaris I. 2001e
Augmented Robotics for Neurosurgery.In: American Association of Neurological
Surgeons, Toronto, Ontario.
Pandya A.K., Siadat M., Auner G., Kalash M., Ellis R.D. 2003b Development and Human Factors Analysis of Neuronavigation vs. Augmented Reality. In: Medicine Meets Virtual Reality, Newport Beach, CA.
Pandya A.K., Siadat M., Ye Z., Prasad M., Auner G., Zamorano L., Klein M. 2003c
Medical Robot Vision Augmentation--A Prototype.In: Medicine Meets Virtual
Reality, Newport Beach, California, 85.
Pandya A.K., Zamorano L., Siadat M., Gong J., Li Q., Maida J.C., Daryan L. 2002 The
Development and Human Factors Analysis of Advanced 3-D Visualization for
Telepresence-- NASA Grant Report Year 2.In: NASA's Space Human Factors
Workshop, Center for Advanced Space Studies, Houston , Tx.
Pandya A.K., Zamorano L., Li Q., Gong J., Grosky W., Maida J.C. 2000b Advanced Surgical Image Environments. In: Detroit Neurosurgery Conference, Detroit, MI.
Pandya A.K., Zamorano L., Siadat M. , Li Q. , Gong J., Maida J.C. 2001f Augmented
Robotics for Medical and Space Applications.In: Human Systems 2001, NASA
Johnson Space Center, Houston, Tx.
Partin, A. W., Adams, J. B., Moore, R. G., and Kavoussi, L. R. 1995. Complete robot-
assisted laparoscopic urologic surgery: a preliminary report. J Am Coll Surg;
181(6); 552-7.
Patel, V., Vannier, M., Marsh, J., and Lo, L. 1996. Assessing Craniofacial Surgical
Simulation. IEEE Computer Graphics and Applications; 46-54.
PM., T., Gee, A., Prager, R., and Berman, L. 2000. Body-centered visualisation for freehand 3D ultrasound. Ultrasound in Medicine and Biology; 26(4); 539–550.
Pransky, J. 2001. An intelligent operating room of the future - an interview with the
University of California Los Angeles Medical Center. Industrial Robot; 28(5); 376-
380.
Press W.H., Teukolsky S.A., Vetterling W.T. , Flannery. (1992). Numerical Recipes in
C++: The Art of Scientific Computing, Press Syndicate of the University of
Cambridge.
R. Atienza, A. Zelinsky. 2001 A Practical Zoom Camera Calibration Technique: An
Application of Active Vision for Human-Robot Interaction.In: Australian
Conference on Robotics and Automation, Sydney, Australia.
Raya M. A., Marcinek H. V., Saez J. M. M., Sanchez R. T., Lizandra M. C. J., Aranda, Gomez J. A. G. 2003 Mixed Reality for Neurosurgery: A Novel Prototype. In: Medicine Meets Virtual Reality 11, 11-15.
Riviere, C. N., Ang, W. T., and Khosla, P. K. 2003. Toward active tremor canceling in
handheld microsurgical instruments. Ieee Transactions on Robotics and
Automation; 19(5); 793-800.
Roberts, D. W., Lunn, K., Sun, H., Hartov, A., Miga, M., Kennedy, F., and Paulsen, K.
2001. Intra-operative image updating. Stereotactic and Functional Neurosurgery;
76(3-4); 148-150.
Robinett, W. 1992. Synthetic Experience: A Proposed Taxonomy. Presence; 1(2); 229-
247.
Rosenthal, M., State, A., Lee, J., Hirota, G., Ackerman, J., Keller, K., Pisano, E. D.,
Jiroutek, M., Muller, K., and Fuchs, H. 2002. Augmented reality guidance for
needle biopsies: An initial randomized, controlled trial in phantoms. Medical
Image Analysis; 6(3); 313-320.
Samset, E., and Hirschberg, H. 2003. Image-guided stereotaxy in the interventional
MRI. Minimally Invasive Neurosurgery; 46(1); 5-10.
Satava, R. M. 1999. Emerging technologies for surgery in the 21st century. Archives of
Surgery; 134(11); 1197-1202.
Sato, Y., Nakamoto, M., Tamaki, Y., Sasama, T., Sakita, I., Nakajima, Y., Monden, M.,
and Tamura, S. 1998. Image guidance of breast cancer surgery using 3-D
ultrasound images and augmented reality visualization. Ieee Transactions on
Medical Imaging; 17(5); 681-693.
Siadat M., Pandya A.K., Zamorano L., Li Q., Gong J., Maida J. 2002 Camera
Calibration for Neurosurgery Augmented Reality.In: World Multiconference on
Systemics, Cybernetics and Informatics, Orlando Florida, July 14-18.
Sturm, P. 2002. Critical Motion Sequences for the Self-Calibration of Cameras and
Stereo Systems with Variable Focal Length. Image and Vision Computing; 20;
415-426.
Taylor, R., Jensen, P., Whitcomb, L., Barnes, A., Kumar, R., Stoianovici, D., Gupta, P.,
Wang, Z. X., deJuan, E., and Kavoussi, L. 1999. A steady-hand robotic system
for microsurgical augmentation. International Journal of Robotics Research;
18(12); 1201-1210.
Taylor, R. H., Dario, P., and Troccaz, J. 2003. Special issue on medical robotics. Ieee
Transactions on Robotics and Automation; 19(5); 763-764.
Taylor, R. H., and Stoianovici, D. 2003. Medical robotics in computer-integrated
surgery. Ieee Transactions on Robotics and Automation; 19(5); 765-781.
Taylor-Adams, S., Vincent, C., and Stanhope, N. 1999. Applying human factors
methods to the investigation and analysis of clinical adverse events. Safety
Science; 31(2); 143-159.
Terazzi, A., Giordano, A., and Minuco, G. 1998. How can usability measurement affect
the re-engineering process of clinical software procedures? International Journal
of Medical Informatics; 52(1-3); 229-234.
Tewari, A., Peabody, J., Sarle, R., Balakrishnan, G., Hemal, A., Shrivastava, A., and
Menon, M. 2002. Technique of da Vinci robot-assisted anatomic radical
prostatectomy. Urology; 60(4); 569-572.
Thompson, J. M., Ottensmeyer, M. P., and Sheridan, T. B. 1998. Human factors in tele-
inspection and tele-surgery: Cooperative manipulation under asynchronous video
and control feedback. Medical Image Computing and Computer-Assisted
Intervention - Miccai'98; 1496; 368-376.
Tsai, A., Wells, W., Tempany, C., Grimson, E., and Willsky, A. 2003. Coupled multi-
shape model and mutual information for medical image segmentation.
Information Processing in Medical Imaging, Proceedings; 2732; 185-197.
Tsai, R. 1987. A Versatile Camera Calibration Technique for High-Accuracy 3D
Machine Vision Metrology Using Off-The-Shelf TV Cameras and Lenses. IEEE J.
of Robotics and Automation; RA-3(4).
Van Loan, F. (2000). Introduction to Scientific Computing: A Matrix-Vector Approach Using MATLAB, Prentice-Hall, Upper Saddle River, NJ.
Vayssiere, N., Hemm, S., Cif, L., Picot, M. C., Diakonova, N., El Fertit, H., Frerebeau,
P., and Coubes, P. 2002. Comparison of atlas- and magnetic resonance
imaging-based stereotactic targeting of the globus pallidus internus in the
performance of deep brain stimulation for treatment of dystonia. Journal of
Neurosurgery; 96(4); 673-679.
Wadley, J. P., and Thomas, D. G. T. 2000. Neuronavigation: Accuracy, benefits, and
pitfalls. Neurosurgery Quarterly; 10(4); 276-310.
Wagner, A., Ploder, O., Enislidis, G., Truppe, M., and Ewers, R. 1995. Virtual Image-
Guided Navigation in Tumor Surgery - Technical Innovation. Journal of Cranio-
Maxillo-Facial Surgery; 23(5); 271-273.
Walsh, T., and Beatty, P. C. W. 2002. Human factors error and patient monitoring.
Physiological Measurement; 23(3); R111-R132.
Wang L., Tsai W. 1991. Camera Calibration by Vanishing Lines for 3-D Computer
Vision. IEEE PAMI; 13(4).
Watt, R. J. 1985. Image Segmentation at Contour Intersections in Human Focal Vision.
Journal of the Optical Society of America a-Optics Image Science and Vision;
2(7); 1200-1204.
Weese, J., Buzug, T. M., Penney, G. P., and Desmedt, P. 1998. 2D/3D registration and
motion tracking for surgical interventions. Philips Journal of Research; 51(2);
299-316.
Weinger, M. B., Pantiskas, C., Wiklund, M. E., and Carstensen, P. 1998. Incorporating
human factors into the design of medical devices. Jama-Journal of the American
Medical Association; 280(17); 1484-1484.
Weng J., Cohen P., Herniou M. 1992. Camera Calibration with Distortion Models and
Accuracy Evaluation. IEEE PAMI; 14,(10).
Zamorano, L., Dujovny, M., Malik, G., Mehta, B., and Yakar, D. 1987a. Factors Affecting
Measurements in Computed-Tomography-Guided Stereotactic Procedures.
Applied Neurophysiology; 50(1-6); 53-56.
Zamorano, L., Dujovny, M., Malik, G., Yakar, D., and Mehta, B. 1987b. Multiplanar Ct-
Guided Stereotaxis and I125 Interstitial Radiotherapy - Image-Guided Tumor
Volume Assessment, Planning, Dosimetric Calculations, Stereotactic Biopsy and
Implantation of Removable Catheters. Applied Neurophysiology; 50(1-6); 281-
286.
Zamorano L., Li Q., Pandya A.K.,Gong J., Diaz F. 2001 Interactive Image-Guided
Neurosurgery with Stryker Wireless Navigation System.In: CNS, San Diego,
California.
ABSTRACT
MEDICAL AUGMENTED REALITY SYSTEM FOR IMAGE-GUIDED AND ROBOTIC SURGERY :
DEVELOPMENT AND SURGEON FACTORS ANALYSIS
by
ABHILASH PANDYA
May 2004
Advisor: Dr. Gregory Auner
Major: Biomedical Engineering (Scientific Computing)
Degree: Doctor of Philosophy
This research is focused on the development and surgeon factors analysis of advanced visualization technology for the operating room. The hypothesis of this work is that applying advanced technology for the visualization of real-time medical data will enhance the performance, comfort, and insight of the surgeon and, in turn, reduce patient morbidity and mortality.
In the first study, we use a passive robot arm to track a calibrated, end-effector-mounted video camera. In real time, we superimpose the live video view with the synchronized graphical view of CT-derived segmented object(s) of interest within a phantom skull (Augmented Reality (AR)). Using the same arm, we have also developed an Image Guided Surgery system (IGS) (Virtual Reality) able to show a tracked tool's trajectory on orthogonal image data scans and 3D models. Both systems are designed with a client/server architecture for potential use in telepresence. A human factors study was conducted using 21 subjects (3 surgeons) to determine whether differences in time, errors, and level of awareness of the patient's 3D anatomy existed between the two systems. This study indicated that IGS took a statistically significantly longer time than did AR. In addition (although on the border of statistical significance, p = 0.068), IGS did have on average a greater number of errors, indicating gaps in awareness of the phantom's anatomy.
In a second study, a comparison of display hardware for the video stream viewed from the remote surgical site was conducted. The main question was: does visualization of the remote video at the surgical site on a head-up display improve the performance of the test subject over viewing a monitor? In this study we concluded (using 22 subjects) that the use of a head-up display, compared with a 45° angled monitor, positively influences the performance of the surgeon.
We believe, and have shown via subject testing, that Augmented Reality generation is a natural extension for the surgeon because it both performs the 2D-to-3D transformation and projects the views directly onto the patient view. We conjecture that medical robotic devices of the future should be able to use this technology to directly link these systems to patient data and provide the optimal visualization of that data for the surgical team. The design and methods of the AR prototype device can, we believe, be extrapolated to current medical robotics and IGS systems. There are distinct advantages and disadvantages to both AR and IGS systems; hence, as future work we propose a hybrid, on-demand AR/VR system for use in Robotic and Image Guided Surgery.
AUTOBIOGRAPHICAL STATEMENT

Abhilash Pandya was born in Kenya, East Africa on July 16th, 1965 and came to Michigan in 1972. He is married and has two daughters (Maya, 5, and Keena, 3). His research interest has always been in the utilization of Computation and Engineering principles to study and impact Science and Medicine (Bioengineering).

He completed his Master's degree in 1988 in Bioengineering (with a concentration in Computer Science) at the University of Michigan at Ann Arbor, where his research was in modeling and simulation of the signal processing (from sound waves to action potentials) of the inner ear. For this work he worked at the "Bionic Ear" (ear implants) laboratory at UM-Ann Arbor. He also has a certification in Scientific Computing from Wayne State University. His undergraduate education was a combination of Biochemistry with a concentration in Computer Science from the University of Michigan (attending both the Ann Arbor and Dearborn campuses), where his research and education focused on graphical modeling and simulation of biochemical reactions.

In 1987 he was a key original member of a small (15-person) start-up biotechnology company (Virogen Inc.) in which he developed a computer genomic model of the AIDS virus and also built a graphical simulator for robotic processing of AIDS samples. From 1988 to 1998 (10 years), he worked at NASA Johnson Space Center under various (Lockheed) contracts for NASA's Flight Crew Support Division. His major projects included building software for a laser-based 3D scanning system for scanning astronauts for 3D modeling, developing software for a Space Station robotics hand-controller commonality study, and computer-graphics-based modeling of the kinematics, dynamics and strength of space-suited and unsuited astronauts. He was also the primary member of a team of 3 who developed the software for a fully immersive, human-model-based Virtual Reality system for Space Station applications.

From 1998 to 2002, he worked at the Neurosurgery Department (Harper Hospital) at Wayne State University, where he developed and supported Image Guided Surgery software for use in the operating room and led a team of engineers in research on Robotic and Image Guided Neurosurgery and Augmented Reality. He has been working at the Smart Sensors and Integrated Microsystems Laboratory (SSIM) in the Electrical and Computer Engineering Department at Wayne State University, leading a group of engineers doing research on sensor fusion, advanced visualization and interface optimization for Robotic Assisted Surgery.

He has been involved in the preparation and execution of 6 NASA grants. He is the Principal Investigator on a recent NASA grant (starting March 2004) in which Augmented Reality technology will be developed for Space Station robotics. He has over 60 publications (including conference papers, invited talks (2), journals (5) and NASA Technical Reports (3)) in the fields of Virtual Reality, Augmented Reality, Robotics, Human/Space Suit Modeling and Human Factors. He has also filed for a patent on AR techniques related to robotics.