
SIMULATIONS OF PROSTATE BIOPSY METHODS

by
Catherine Colby Pellish
B.S.E.E., Marquette University, 1985

A thesis submitted to the University of Colorado at Denver in partial fulfillment of the requirements for the degree of Master of Science, Applied Mathematics

1997

This thesis for the Master of Science degree by Catherine Pellish has been approved by

William L. Briggs
James R. Koehler
Weldon A. Lodwick

Date

Pellish, Catherine Colby (M.S., Applied Mathematics)
Simulations of Prostate Biopsy Methods
Thesis directed by Associate Professor William L. Briggs

Abstract

An accepted practice in screening for prostate cancer involves a needle core biopsy of the prostate gland, which can provide information regarding if, and how much, cancer is present in a gland. This paper documents several investigations into prostate gland biopsy techniques. The first phase of study involves a geometric model of a prostate gland containing one to three tumors. This mathematical model of the gland is then used to simulate various biopsy techniques and compare the resulting data. Secondly, the best biopsy procedure, as determined from the geometric model, is simulated on actual specimen data which have been digitized. These specimen data are also used for simulation of the six random systematic core biopsy technique (SRSCB) currently in clinical use. The results of the geometric model are compared to the results of the simulation on actual data. Finally, the geometric model is used in another series of simulations that investigate the number of needle samples needed to estimate the tumor to gland volume ratio.

This abstract accurately represents the content of the candidate's thesis. I recommend its publication.

Signed: William L. Briggs


ACKNOWLEDGEMENTS

I would like to sincerely thank a number of people who consistently provided me with their support, encouragement and guidance as I pursued the completion of this thesis. Dr. Bill Briggs, my advisor, served as a constant source of insight and motivation, as well as providing considerable direction throughout this process. I am also grateful for the time spent with Dr. Jim Koehler, who had to teach me the finer points of statistics again and again. My thanks to both of these professors for proving to be excellent academic sources. I also would like to thank Norm LeMay who, out of the generosity of his heart and his need for a free lunch, assisted me in running the ANOVA analysis which this thesis required.

Finally, I must thank my family, Mark, Eric and Corinne, for encouraging me and making me laugh through every crisis.

CONTENTS

Chapter
1 Introduction
  1.1 Clinical Prostate Biopsy Analysis
  1.2 Summary of Mathematical Methods
2 The Geometric Model
  2.1 Geometric Model of gland and tumor
  2.2 Simulations
  2.3 Statistical Analysis of Results
  2.4 Simulation Results
    2.4.1 Applying the ANOVA to the Biopsy Simulation Data
    2.4.2 ANOVA Mechanics
    2.4.3 Residuals
    2.4.4 The Null and Alternate Hypotheses
    2.4.5 Are the Main Effects all Equal?
    2.4.6 Recognizing Interaction between Factors
    2.4.7 Clinical Distribution of Tumors
3 Digitized Specimen Data
  3.1 Summary of Software Tool
  3.2 Specific Algorithms
    3.2.1 Locating the Apex
    3.2.2 Establishing Needle Positions
  3.3 Simulations
  3.4 Geometric Model vs Clinical Model
  3.5 Optimal Technique vs SRSCB
4 Geometric Model - Volume Estimates
  4.1 Tumor Volume Estimates
    4.1.1 One-Dimensional Analysis - Line Model
    4.1.2 Two-Dimensional - Strip Model
    4.1.3 Three-Dimensional - Cylinder Model
  4.2 Experiment Setup
  4.3 Results
  4.4 Interactive Utility

Appendix
A ANOVA Definitions

1. Introduction

1.1 Clinical Prostate Biopsy Analysis

Currently the standard method of determining whether a given prostate gland is cancerous involves two procedures. The first is the prostate-specific antigen (PSA) test, which measures the level of antigens in the patient's blood; a high level indicates a higher possibility of cancerous tissue. The second procedure is the needle biopsy, which is carried out if the PSA test so indicates. The clinician conducts this biopsy by inserting a needle-tool, equipped with ultrasound capabilities, into the patient's rectum. The gland is located and the urologist fires three needles into the right lobe of the gland and three needles into the left lobe at approximately symmetric positions. The left-right division of the gland is determined by the position of the urethra in the gland. This physical landmark is used as the visual dividing line, enabling clinicians to execute the biopsy in a systematic manner. The needle-tool is rotated to the left or right depending on the targeted lobe. This rotation corresponds to the angle $\phi$ used in the mathematical analysis. Following this slight rotation, the needles are inserted at a second independent angle, referred to as $\theta$. The choice of a six-needle biopsy is based on the six random systematic core biopsies (SRSCB) method developed by Hodge et al. [1] and currently thought to achieve the best detection rates.

The results from this diagnostic biopsy are then analyzed in order to determine the best treatment plan for the patient. Several factors help the urologist choose the optimal treatment plan. The first factor is whether the biopsy shows any tumor cells at all. According to the Hodge study, 96% of the 83 men diagnosed with cancer had the cancer detected by SRSCB. However, as investigated by Daneshgari et al. [2], in prostate glands with low tumor volume, the SRSCB fails to achieve such a high percentage of detection. This study concluded that "an improved biopsy strategy may be needed in detection of CaP (carcinoma of the prostate) in patients with low volume cancer". Secondly, the volume of the tumor itself is a deciding factor in determining treatment. Thirdly, the location of the tumor, specifically whether the tumor penetrates the capsule of the gland, can define a specific treatment plan. Some of this information is available from a single needle-core biopsy; more information is gleaned from successive, strategically placed biopsies.

1.2 Summary of Mathematical Methods

As an aid in understanding this problem, as well as researching ways to improve diagnosis, two methods of analysis are undertaken. The first method relies on a geometric model of the prostate gland containing one to three tumors. Various biopsy methods are simulated with this mathematical model and results are tabulated. The second method involves running the same biopsy simulations on actual prostate glands which have been digitized and stored as three-dimensional objects in a computer. The experimental results from these two methods are then compared. All of the simulations were executed using software created for this purpose primarily by this author, although the skeletons of these software tools were engineered during the Spring 1995 Math Clinic on this topic by several participants. The simulations are written in C and C++, running on a UNIX-based computer. They are extensively documented and flexible enough to be useful in a variety of experiments within this realm of research.

2. The Geometric Model

2.1 Geometric Model of gland and tumor

An actual prostate gland is about the size of a walnut with volumes ranging from 22 cc to 61 cc [3]. The geometry of an ellipsoid closely models this gland and any tumors present within it. Therefore, an ellipsoid of the form

$$\frac{x^2}{A^2} + \frac{y^2}{B^2} + \frac{z^2}{C^2} = 1$$

is used to represent the prostate gland. Ellipsoids are also used to represent each of the tumors. The dimensions of the gland, $A$, $B$, and $C$, are chosen randomly in the following experimentally determined ranges:

• 3.0 cm < A < 4.8 cm
• 3.8 cm < B < 4.6 cm
• 3.8 cm < C < 5.2 cm
• 22 cc < [gland volume] < 61 cc.

The prostate is divided into 3 zones: the peripheral, the central and the transition region. The peripheral zone comprises approximately 70% of the mass of the prostate gland. It is located in the lower area of the gland, closest to the rectum. This region is the "site of origin of most carcinomas" [3]. The central region makes up approximately 25% of the glandular mass and is "resistant to both carcinoma and inflammation" [3]. The transition region contains the remaining 5% of prostate gland tissue and can be the site of some cancers. Figure 2.1 shows these regions of the prostate gland. Based on this clinical information, the software-generated tumors are located in the lower part of the elliptical gland model to simulate tumors residing in the peripheral zone. Figure 2.2 depicts the geometrical gland and tumor model in the xyz system. Since the gland model is centered at the origin, the y-coordinate of the tumor center, $y_c$, is always negative in order to place the tumor in the peripheral zone of the gland. However, other distributions of $y$ could be used to improve the model.

Tumors are modeled by an equation of the form

$$\frac{(x - x_c)^2}{a^2} + \frac{(y - y_c)^2}{b^2} + \frac{(z - z_c)^2}{c^2} = 1$$

where $x_c$, $y_c$ and $z_c$ specify the center of the tumor.

The biopsy needle is modeled as a line with the parametric equations

$$\begin{aligned} x(t) &= x_0 + t \sin\theta \sin\phi \\ y(t) &= y_0 + t \sin\theta \cos\phi \\ z(t) &= z_0 + t \cos\theta, \end{aligned}$$

Figure 2.1. The peripheral (PZ), central (CZ) and transition (TZ) regions divide the prostate gland into 3 major zones.

Figure 2.2. The gland and tumor are modeled by ellipsoids in the xyz coordinate system. (Figure labels: gland ellipsoid with dimensions A, B and C; tumor ellipsoid; axes X, Y and Z.)

where $x_0$, $y_0$, and $z_0$ are the coordinates of the entry point of the needles (Figure 2.3 and Figure 2.4). The angle $\phi$ is measured from the y-axis and determines a plane. The angle $\theta$ is then assumed to remain in this plane and is measured from the z-axis. From these definitions, the parametric equations for the line are determined. The parameter $t$ measures the length of the needle.
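The statement that the parameter measures needle length follows from the direction vector of the parametric line having unit length. A minimal Python sketch of the needle line is given here as an illustration only; the function names `needle_direction` and `needle_point` are hypothetical and are not taken from the thesis software, which was written in C and C++.

```python
import math

def needle_direction(theta_deg, phi_deg):
    """Unit direction of the needle: theta is measured from the z-axis, phi from the y-axis."""
    theta = math.radians(theta_deg)
    phi = math.radians(phi_deg)
    return (math.sin(theta) * math.sin(phi),
            math.sin(theta) * math.cos(phi),
            math.cos(theta))

def needle_point(entry, theta_deg, phi_deg, t):
    """Point reached after advancing a distance t from the entry point (x0, y0, z0)."""
    direction = needle_direction(theta_deg, phi_deg)
    return tuple(p + t * d for p, d in zip(entry, direction))
```

Because the direction has length one for every pair of angles, advancing the parameter by one unit moves the needle tip by one unit of distance.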

Figure 2.3. This figure of the xy plane and needle illustrates measurement of $\phi$. (Figure labels: gland ellipsoid; needle; entry point $(x_0, y_0, z_0)$; angle $\Phi$; axes X and Y.)

Substituting the parametric equations of the needle into the equation for the tumor, it is possible to determine values of $t$ corresponding to an intersection. The equation of the tumor is

$$\frac{(x(t) - x_c)^2}{a^2} + \frac{(y(t) - y_c)^2}{b^2} + \frac{(z(t) - z_c)^2}{c^2} = 1.$$


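Figure 2.4. This figure of the yz plane and needle illustrates measurement of $\theta$. (Figure labels: gland ellipsoid; needle; entry point $(x_0, y_0, z_0)$; angle $\Theta$; axes Y and Z.)

Replacing $x(t)$, $y(t)$ and $z(t)$ by the parametric equations of the needle gives

$$t^2 \left( \frac{\sin^2\theta \sin^2\phi}{a^2} + \frac{\sin^2\theta \cos^2\phi}{b^2} + \frac{\cos^2\theta}{c^2} \right) + t \left( \frac{2(x_0 - x_c) \sin\phi \sin\theta}{a^2} + \frac{2(y_0 - y_c) \sin\theta \cos\phi}{b^2} + \frac{2(z_0 - z_c) \cos\theta}{c^2} \right) + \left( \frac{(x_0 - x_c)^2}{a^2} + \frac{(y_0 - y_c)^2}{b^2} + \frac{(z_0 - z_c)^2}{c^2} \right) = 1. \qquad (2.1)$$

Moving the 1 to the left-hand side gives a quadratic of the form $A't^2 + B't + C' = 0$, with

$$A' = \frac{\sin^2\theta \sin^2\phi}{a^2} + \frac{\sin^2\theta \cos^2\phi}{b^2} + \frac{\cos^2\theta}{c^2}$$

$$B' = \frac{2(x_0 - x_c) \sin\phi \sin\theta}{a^2} + \frac{2(y_0 - y_c) \sin\theta \cos\phi}{b^2} + \frac{2(z_0 - z_c) \cos\theta}{c^2}$$

$$C' = \frac{(x_0 - x_c)^2}{a^2} + \frac{(y_0 - y_c)^2}{b^2} + \frac{(z_0 - z_c)^2}{c^2} - 1.$$

If the discriminant $(B'^2 - 4A'C')$ is positive, two real roots exist.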
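The root computation can be sketched in a few lines of Python. This is an illustrative sketch rather than the thesis code: the function name `needle_tumor_roots` and its argument layout are assumptions, and the constant term is taken with the 1 from the right-hand side of equation (2.1) moved to the left, so that the quadratic is in the standard form with zero on the right.

```python
import math

def needle_tumor_roots(entry, theta_deg, phi_deg, center, semi_axes):
    """Solve A' t^2 + B' t + C' = 0 for a needle line and a tumor ellipsoid.

    entry     -- needle entry point (x0, y0, z0)
    center    -- tumor center (xc, yc, zc)
    semi_axes -- tumor semi-axes (a, b, c)
    Returns the two real roots (t1 <= t2), or None when the line misses the tumor.
    """
    theta, phi = math.radians(theta_deg), math.radians(phi_deg)
    direction = (math.sin(theta) * math.sin(phi),
                 math.sin(theta) * math.cos(phi),
                 math.cos(theta))
    a_q = sum(d * d / s ** 2 for d, s in zip(direction, semi_axes))
    b_q = sum(2.0 * (p - c) * d / s ** 2
              for p, c, d, s in zip(entry, center, direction, semi_axes))
    # Constant term: the bracketed sum in (2.1) minus the 1 on the right-hand side.
    c_q = sum((p - c) ** 2 / s ** 2
              for p, c, s in zip(entry, center, semi_axes)) - 1.0
    disc = b_q * b_q - 4.0 * a_q * c_q
    if disc <= 0.0:
        return None  # no intersection (a tangent line is treated as a miss)
    root = math.sqrt(disc)
    return ((-b_q - root) / (2.0 * a_q), (-b_q + root) / (2.0 * a_q))
```

For instance, a needle entering at (0, -2, 0) and travelling along the y-axis toward a unit sphere at the origin should enter the sphere at t = 1 and leave it at t = 3.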

If real roots $t_1$ and $t_2$ exist, they give the points where the tumor ellipsoid and the line intersect. If these values are greater than 0 and less than the actual needle length, the needle has intersected the tumor. The amount of tumor extracted by the needle is proportional to the difference between the two roots of the quadratic, $|t_1 - t_2|$. By comparing the two roots, an estimate of the volume of the tumor that is contained in the needle can be made. If real roots do not exist, the needle does not intersect the tumor ellipsoid and no tumor information is gained by that needle.

In this analysis, each biopsy procedure was simulated on 1000 different gland models and the number of times a tumor was detected per procedure was recorded. This method does not differentiate between one or more needles detecting the tumor. It simply records a hit or miss per biopsy procedure. In addition, an estimate of the tumor volume is made whenever a tumor is detected.

2.2 Simulations

Since a fundamental goal of any biopsy is to determine whether or not the gland contains cancerous cells, the first series of simulations is intended to compare the detection rate of several biopsy techniques. The detection rate is defined as the ratio of the number of times a biopsy procedure detects a tumor to the total number of biopsies conducted. A set of 54 different biopsy procedures is simulated with variation in the following parameters: number of needles, offset between needles in the z direction, $\theta$, and $\phi$.

The distance in the z direction between needles can be a relative spacing based on the gland dimension in the z direction or an absolute spacing of 1 cm between each needle. The first method is referred to as relative spacing since it depends on the gland size and separates the needles by equal distance. The second is referred to as the absolute spacing and has its basis in the SRSCB procedure.

As a means of clarification, Figures 2.5 and 2.6 illustrate the analysis of a single specimen and the execution of the entire experiment. Each of the 54 biopsy procedures is simulated on 1000 different gland models. The random number generator is seeded once for each series of 1000 simulations using a specific biopsy technique. Prior to the next technique, the random number generator is reseeded with the same number, thereby yielding the identical set of 1000 prostate models. This ensures that each of the biopsies is conducted on the same set of 1000 simulated glands. The detection rate is determined for each of these procedures and the results of the simulation are documented in Table 2.1.
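The reseeding scheme can be illustrated with a short Python sketch. Several details here are assumptions rather than facts from the thesis: the dimensions are drawn uniformly, the seed value is arbitrary, the names `make_gland` and `gland_set` are hypothetical, and A, B and C are treated as full axis lengths so that the volume is (pi/6)ABC, a reading under which the quoted dimension ranges give volumes of roughly 22 to 61 cc.

```python
import math
import random

# Ranges for the gland dimensions (cm) and volume (cc) quoted in Section 2.1.
A_RANGE, B_RANGE, C_RANGE = (3.0, 4.8), (3.8, 4.6), (3.8, 5.2)
VOLUME_RANGE = (22.0, 61.0)

def make_gland(rng):
    """Draw gland dimensions, rejecting any draw whose volume is out of range.

    Assumption: A, B, C are full axis lengths, so volume = (pi / 6) * A * B * C.
    """
    while True:
        dims = tuple(rng.uniform(lo, hi) for lo, hi in (A_RANGE, B_RANGE, C_RANGE))
        volume = math.pi / 6.0 * dims[0] * dims[1] * dims[2]
        if VOLUME_RANGE[0] < volume < VOLUME_RANGE[1]:
            return dims

def gland_set(seed, count=1000):
    """Reseed the generator, then build the gland models used for one biopsy technique."""
    rng = random.Random(seed)
    return [make_gland(rng) for _ in range(count)]

SEED = 1997  # hypothetical seed value
glands_for_technique_1 = gland_set(SEED)
glands_for_technique_2 = gland_set(SEED)  # identical to the first set
```

Reusing one seed means that differences in detection rate between techniques cannot be attributed to differences in the glands that happened to be generated.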

Figure 2.5. This flow chart depicts the top-level algorithm for modeling a single biopsy with several needles. (Flow chart steps: make a gland model; make tumor(s); determine starting location for all needles; simulate a single needle biopsy by solving equation (2.1) for roots using the initial needle position, and store hit and volume results; if all needles are done, the simulation is over, otherwise continue with the next needle.)

Figure 2.6. This flow chart depicts the process for the entire simulation; each biopsy procedure is simulated on 1000 geometric gland models. (Flow chart steps: read in biopsy parameters for a given procedure; simulate this biopsy on a single gland model; repeat until 1000 glands are done; repeat until all 54 biopsy procedures are done; simulation over.)

2.3 Statistical Analysis of Results

In order to interpret the output from the simulations legitimately, a statistical tool is needed. First, we must determine whether or not the various biopsy settings influence the observed detection rate. In other words, is there a relationship between the settings of any one or combination of the four factors (number of needles, z-spacing, $\theta$ and $\phi$) and the detection rate, or are the results completely random, implying that the biopsy specification does not determine the detection rate? We need a mathematically sound method to compare the detection rates provided by the simulation and to infer some conclusions. The statistical model known as Analysis of Variance (ANOVA) was used to compare the population means between various treatments, thus resulting in a statistically valid conclusion. This model can be employed to determine whether the various factors interact and which factors have the most impact on the outcome.

In order to describe the ANOVA model, a few definitions are required.

(1) Factors are the independent variables that are under investigation. In this instance, the biopsy parameters (number of needles, spacing method, $\theta$ and $\phi$) are the factors for the ANOVA model.

    Factor    Number of Needles    Spacing Method    $\theta$    $\phi$
    Levels    4                    Absolute          30°         30°
              6                    Relative          45°         45°
              8                                      60°         60°

(2) Factor levels are the values that each of the factors can take on during a single simulation. As shown in the list of biopsy simulation factors and levels, the factors do not all have the same number of levels. The factor Spacing Method has only two factor levels, whereas the other three factors each have three factor levels.

(3) A treatment is a particular combination of levels of each of the factors involved in the experiment, where an experiment is the simulation of the treatment on 1000 geometric specimens. In this example, a treatment refers to a biopsy with specific settings (for example, 4 needles, absolute spacing, $\theta = 45^\circ$, $\phi = 45^\circ$). For the simulation, there are 54 different treatments and therefore 54 different experiments, corresponding to all the combinations of the levels of the four factors.

(4) A trial is defined to be a simulation of one treatment on one geometric model. The outcome of a trial is either 1, the biopsy procedure detected the tumor, or 0, the tumor remained undetected. The outcome of the experiment is the detection rate achieved by a specific treatment simulated on 1000 geometric specimens. In other words, the outcome of the experiment is the ratio of the number of specimens in which tumor is detected to the total number of specimens simulated, and is referred to as the outcome for the remainder of this thesis.

2.4 Simulation Results

For each of the 54 treatments, the simulation is conducted on 1000 different gland models. The following table summarizes the treatment parameters as well as the results:

                 Treatment Parameters                          Outcome
    Experiment   Number of   Spacing    $\theta$   $\phi$     Detection
                 Needles     Method                           Rate
     1           4           Relative   45°        45°        0.252
     2           6           Relative   45°        45°        0.307
     3           8           Relative   45°        45°        0.335
     4           4           Absolute   45°        45°        0.263
     5           6           Absolute   45°        45°        0.293
     6           8           Absolute   45°        45°        0.298
     7           4           Relative   60°        45°        0.267
     8           6           Relative   60°        45°        0.341
     9           8           Relative   60°        45°        0.369
    10           4           Absolute   60°        45°        0.270
    11           6           Absolute   60°        45°        0.320
    12           8           Absolute   60°        45°        0.339
    13           4           Relative   30°        45°        0.196
    14           6           Relative   30°        45°        0.225
    15           8           Relative   30°        45°        0.255
    16           4           Absolute   30°        45°        0.207
    17           6           Absolute   30°        45°        0.221
    18           8           Absolute   30°        45°        0.221
    19           4           Relative   45°        60°        0.200
    20           6           Relative   45°        60°        0.234
    21           8           Relative   45°        60°        0.268
    22           4           Absolute   45°        60°        0.211
    23           6           Absolute   45°        60°        0.225
    24           8           Absolute   45°        60°        0.228
    25           4           Relative   60°        60°        0.191
    26           6           Relative   60°        60°        0.254
    27           8           Relative   60°        60°        0.268
    28           4           Absolute   60°        60°        0.209
    29           6           Absolute   60°        60°        0.240
    30           8           Absolute   60°        60°        0.246
    31           4           Relative   30°        60°        0.172
    32           6           Relative   30°        60°        0.194
    33           8           Relative   30°        60°        0.219
    34           4           Absolute   30°        60°        0.188
    35           6           Absolute   30°        60°        0.197
    36           8           Absolute   30°        60°        0.197
    37           4           Relative   45°        30°        0.260
    38           6           Relative   45°        30°        0.316
    39           8           Relative   45°        30°        0.341
    40           4           Absolute   45°        30°        0.264
    41           6           Absolute   45°        30°        0.305
    42           8           Absolute   45°        30°        0.316
    43           4           Relative   60°        30°        0.283
    44           6           Relative   60°        30°        0.351
    45           8           Relative   60°        30°        0.385
    46           4           Absolute   60°        30°        0.279
    47           6           Absolute   60°        30°        0.346
    48           8           Absolute   60°        30°        0.372
    49           4           Relative   30°        30°        0.210
    50           6           Relative   30°        30°        0.247
    51           8           Relative   30°        30°        0.273
    52           4           Absolute   30°        30°        0.225
    53           6           Absolute   30°        30°        0.245
    54           8           Absolute   30°        30°        0.247

Table 2.1. The results from the 54 geometric model experiments are displayed.
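The count of 54 treatments follows from taking every combination of the factor levels (3 x 2 x 3 x 3). The Python sketch below enumerates them and shows how the outcome of one experiment is formed from its trials. The variable and function names are hypothetical, and the enumeration order is not the experiment numbering used in Table 2.1.

```python
from itertools import product

NEEDLES = (4, 6, 8)
SPACING = ("Absolute", "Relative")
THETA_DEG = (30, 45, 60)
PHI_DEG = (30, 45, 60)

# Every combination of factor levels is one treatment (a complete factorial design).
treatments = list(product(NEEDLES, SPACING, THETA_DEG, PHI_DEG))

def detection_rate(trial_outcomes):
    """Outcome of one experiment: fraction of trials (1 = hit, 0 = miss) that detected a tumor."""
    return sum(trial_outcomes) / len(trial_outcomes)
```

In the thesis each experiment consists of 1000 trials, so `detection_rate` would be applied to a list of 1000 zeros and ones per treatment.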

2.4.1 Applying the ANOVA to the Biopsy Simulation Data

The biopsy simulation is a multi-factored system, in which the four parameters (number of needles, spacing, $\theta$ and $\phi$), individually and perhaps in some combinations, may have a measurable effect on the detection rate. Therefore a factor effects model is used in order to determine the impact of and interactions between these four parameters. This biopsy simulation is considered a complete factorial study since all possible combinations of the four parameters were simulated and evaluated. The indices $i, j, k, l$ refer to the levels of the factors number of needles, spacing method, $\theta$ and $\phi$, respectively.

In this multi-factored system, a true overall mean, $\mu$, which is equivalent to the true overall detection rate, is assumed to exist. The entire simulation results in 54 observed detection rates, $p_{ijkl}$, each of which indicates the observed detection rate for a given experiment. This set of 54 observed detection rates is used in the ANOVA to determine estimated factor effects and an estimated overall mean, which are used in the factor effects model. The factor effects model is used to predict a detection rate, a probability of detection, $\hat{p}_{ijkl}$, given the levels of the four factors.

A factor level mean is the average detection rate for a group of treatments that have one common factor level held constant while all others vary. For example, all outcomes from experiments with Number of Needles = 6 are averaged to yield the factor level mean for the factor Number of Needles at the level $i = 2$ (6 needles). The estimated overall mean, $\hat{\mu}$, is simply the average outcome of all experiments. The difference between each factor level mean and the overall mean yields the main effect for that factor level. Because this model has 4 factors, each with either 2 or 3 levels, the following main effects are designated.

• $\alpha_i$ - the main effect for the factor Number of Needles at each of its levels (4, 6, 8): $1 \le i \le 3$.
• $\beta_j$ - the main effect for the factor Spacing Method at each of its levels (0, 1): $1 \le j \le 2$.
• $\gamma_k$ - the main effect for the factor $\theta$ at each of its levels (30°, 45°, 60°): $1 \le k \le 3$.
• $\delta_l$ - the main effect for the factor $\phi$ at each of its levels (30°, 45°, 60°): $1 \le l \le 3$.

A factor at a particular level may influence another factor either by inhibiting or enhancing its impact. Because of these interactions between factors, the interaction effects are included in the model. Pairwise interaction effects are a measure of the combined effect of two factors, across the different levels, minus the main effects of these factors. We define these two-way effects as follows.

• $(\alpha\beta)_{ij}$ - number of needles and spacing method
• $(\alpha\gamma)_{ik}$ - number of needles and $\theta$
• $(\alpha\delta)_{il}$ - number of needles and $\phi$
• $(\beta\gamma)_{jk}$ - spacing method and $\theta$
• $(\beta\delta)_{jl}$ - spacing method and $\phi$
• $(\gamma\delta)_{kl}$ - $\theta$ and $\phi$.

Three-way factor effects are a measure of the interaction effect of three factors.

• $(\alpha\beta\gamma)_{ijk}$ - number of needles, spacing method and $\theta$
• $(\alpha\beta\delta)_{ijl}$ - number of needles, spacing method and $\phi$
• $(\beta\gamma\delta)_{jkl}$ - spacing method, $\theta$ and $\phi$
• $(\alpha\gamma\delta)_{ikl}$ - number of needles, $\theta$ and $\phi$.

The four-way effect is the measure of the interaction effect of all four factors.

• $(\alpha\beta\gamma\delta)_{ijkl}$ - number of needles, spacing method, $\theta$ and $\phi$.
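As a worked illustration of factor level means and main effects, the Python sketch below computes the main effects of the factor Number of Needles from the detection rates transcribed from Table 2.1, listed in experiment order. Two caveats apply: the function name `main_effects` is hypothetical, and the calculation uses the untransformed detection rates, so the numbers are not those of the thesis ANOVA, which works on transformed data.

```python
from collections import defaultdict

# Detection rates for experiments 1..54, transcribed from Table 2.1.
RATES = [
    0.252, 0.307, 0.335, 0.263, 0.293, 0.298, 0.267, 0.341, 0.369,
    0.270, 0.320, 0.339, 0.196, 0.225, 0.255, 0.207, 0.221, 0.221,
    0.200, 0.234, 0.268, 0.211, 0.225, 0.228, 0.191, 0.254, 0.268,
    0.209, 0.240, 0.246, 0.172, 0.194, 0.219, 0.188, 0.197, 0.197,
    0.260, 0.316, 0.341, 0.264, 0.305, 0.316, 0.283, 0.351, 0.385,
    0.279, 0.346, 0.372, 0.210, 0.247, 0.273, 0.225, 0.245, 0.247,
]
NEEDLE_LEVELS = (4, 6, 8)  # the needle count cycles 4, 6, 8 down the rows of the table

def main_effects(levels, outcomes):
    """Return the overall mean and, per level, (factor level mean - overall mean)."""
    overall = sum(outcomes) / len(outcomes)
    groups = defaultdict(list)
    for level, outcome in zip(levels, outcomes):
        groups[level].append(outcome)
    effects = {lvl: sum(vals) / len(vals) - overall for lvl, vals in groups.items()}
    return overall, effects

needles = [NEEDLE_LEVELS[e % 3] for e in range(len(RATES))]
overall_mean, needle_effects = main_effects(needles, RATES)
```

Because every level appears in the same number of experiments, the three main effects sum to zero, which is a quick check on the arithmetic.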


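Summary of Variables

    True overall mean                                  $\mu$
    Estimated overall mean                             $\hat{\mu}$
    True treatment mean                                $\mu_{ijkl}$
    Estimated treatment mean                           $\hat{\mu}_{ijkl}$
    Observed treatment detection rate                  $p_{ijkl}$
    Transformed observed treatment detection rate      $Y_{ijkl}$
    Estimated treatment detection rate                 $\hat{p}_{ijkl}$
    Transformed estimated treatment detection rate     $\hat{Y}_{ijkl}$
    Average observed detection rate                    $\bar{p}$
    True main factor level effects                     $\alpha_i$, $\beta_j$, $\gamma_k$, $\delta_l$
    Estimated main factor level effects                $\hat{\alpha}_i$, $\hat{\beta}_j$, $\hat{\gamma}_k$, $\hat{\delta}_l$
    True two-way effects                               $(\alpha\beta)_{ij}$, $(\alpha\gamma)_{ik}$, $(\alpha\delta)_{il}$, $(\beta\gamma)_{jk}$, $(\beta\delta)_{jl}$, $(\gamma\delta)_{kl}$
    Estimated two-way effects                          $\widehat{(\alpha\beta)}_{ij}$, $\widehat{(\alpha\gamma)}_{ik}$, $\widehat{(\alpha\delta)}_{il}$, $\widehat{(\beta\gamma)}_{jk}$, $\widehat{(\beta\delta)}_{jl}$, $\widehat{(\gamma\delta)}_{kl}$

Table 2.2. A list of the variables used in the ANOVA analysis is displayed.

The factor effects model takes the general form

$$\mu_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + \delta_l + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\alpha\delta)_{il} + (\beta\gamma)_{jk} + (\beta\delta)_{jl} + (\gamma\delta)_{kl} + (\alpha\beta\gamma)_{ijk} + (\alpha\beta\delta)_{ijl} + (\beta\gamma\delta)_{jkl} + (\alpha\gamma\delta)_{ikl} + (\alpha\beta\gamma\delta)_{ijkl}.$$

The observed outcome, the detection rate for a particular treatment, as given in Table 2.1, is $p_{ijkl}$ and is the sum of the true mean for that treatment and a residual term:

$$p_{ijkl} = \mu_{ijkl} + \varepsilon_{ijkl}.$$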

The goal of the analysis is to formulate a model that predicts the outcome of a given treatment. Since the true means and true factor effects are not known, estimates of these terms are determined from the simulation and used in the model. Estimated values are indicated with the hat (^) notation. The predicted outcome $\hat{p}_{ijkl}$ is represented by the following relationship:

$$\hat{p}_{ijkl} = \hat{\mu} + \hat{\alpha}_i + \hat{\beta}_j + \hat{\gamma}_k + \hat{\delta}_l + \widehat{(\alpha\beta)}_{ij} + \widehat{(\alpha\gamma)}_{ik} + \widehat{(\alpha\delta)}_{il} + \widehat{(\beta\gamma)}_{jk} + \widehat{(\beta\delta)}_{jl} + \widehat{(\gamma\delta)}_{kl} + \widehat{(\alpha\beta\gamma)}_{ijk} + \widehat{(\alpha\beta\delta)}_{ijl} + \widehat{(\beta\gamma\delta)}_{jkl} + \widehat{(\alpha\gamma\delta)}_{ikl} + \widehat{(\alpha\beta\gamma\delta)}_{ijkl}.$$

In this equation $\hat{p}_{ijkl}$ is the estimated probability of detecting a tumor at the factor levels indicated by $i, j, k, l$. This probability is predicted by the model using least-squares estimators for the terms in the equation. The probability of detection is a function of the estimated overall mean, $\hat{\mu}$, and the estimated effects from the four factors, alone and in combination with one another. Not all of these effects may be significant. In order to determine which of the factors do significantly affect the detection rate and therefore belong in the final model, various means are evaluated. If all the means for a particular factor (or combination of factors) are equal, varying a factor level does not add to or subtract from the overall mean and therefore the factor does not belong in the final model. This equality question is put not only to each factor individually, but to all the combinations of factors as well.

2.4.2 ANOVA Mechanics

Use of the ANOVA model is founded on several assumptions:

(1) The outcomes follow a normal probability distribution.
(2) Each distribution has the same variance.
(3) The outcomes for each factor level are independent of the other factor level outcomes.

With these assumptions in mind, note that the probability distributions of a factor at each of its levels differ only with respect to the mean [4]. Therefore, the first step in executing the analysis is to determine whether the detection rates are statistically different. Secondly, if they are different, one of the intents of the ANOVA model is to determine whether the difference between the detection rates of two or more treatments is sufficient, after examining the variability within the treatments, to conclude that one treatment does indeed produce a higher detection rate. In addition, by evaluating the statistical data, conclusions may be drawn as to how each factor, both independently and within established interaction groups (pairwise, three-way or four-way), influences the outcome.

2.4.3 Residuals

We define $\bar{p}$ to be the average of all observations. The model states that $p_{ijkl} = \mu_{ijkl} + \varepsilon_{ijkl}$; therefore the residual term is $\varepsilon_{ijkl} = p_{ijkl} - \mu_{ijkl}$. Since $\mu_{ijkl}$ is estimated by $\hat{\mu}_{ijkl}$, the estimated residual term is $e_{ijkl} = p_{ijkl} - \hat{\mu}_{ijkl}$, the difference between the observed and the estimated average detection rate. The set of all 54 residuals, $e_{ijkl}$, for all $i$, $j$, $k$ and $l$ is evaluated for three characteristics which indicate whether the fitted data are well-suited for the analysis. These characteristics are:

1. Normality of error terms.
2. Constancy of error variance.
3. Independence of error terms.

Several statistical tests and plots used on the residual data determine whether one of these three assumptions is violated. These tests revealed that the error variances were not stable, thus violating the second characteristic. A transformation was employed to preserve the statistical information in the output but stabilize the error variances. Since nothing is lost by employing a transformation and the error variances are stabilized, the detection rate data $p$ is transformed to $Y$ via the following relationship:

$$Y = 2 \arcsin(\sqrt{p}).$$
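The transformation Y = 2 arcsin(sqrt(p)) and its inverse, p = sin^2(Y/2), can be sketched in Python as follows. The function names are hypothetical; the inverse shown is valid for Y between 0 and pi, which is the range the forward transformation produces for proportions between 0 and 1.

```python
import math

def arcsine_transform(p):
    """Variance-stabilizing transform of a proportion p in [0, 1]."""
    return 2.0 * math.asin(math.sqrt(p))

def inverse_arcsine_transform(y):
    """Recover the proportion from a transformed value y in [0, pi]."""
    return math.sin(y / 2.0) ** 2
```

Applying the inverse to a fitted value on the transformed scale returns an estimate on the original probability scale.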

The outcome from these simulations is the detection rate, the ratio of the number of specimens where tumor is detected to the total number of specimens. The arcsine transformation is the most appropriate transformation when the outcome is a proportion [4]. All ANOVA data referenced from this point on are transformed unless noted otherwise. The inverse transformation is calculated at the conclusion of this analysis to get a true estimate of the probability.

2.4.4 The Null and Alternate Hypotheses

A starting point in the ANOVA process is to establish two hypotheses, a null and an alternate hypothesis. The null hypothesis assumes that all effects are equal, therefore indicating that specific factor levels do not influence the outcome. The alternate hypothesis assumes that at least two of the effects are not the same.

The F-test is used to decide which of these two hypotheses concerning the data will be accepted. The test consists of computing the ratio of between-effect variation to within-effect variation. This between-effect variation, which changes depending on the effect, is called the treatment sum of squares and is denoted SSA, SSB, SSC, and SSD (see Appendix also). It is a measure of the difference between the detection rate of a set of treatments and the average detection rate over all treatments. The within-effect variation is called the error sum of squares and is denoted SSE. It is a measure of the difference between the individual outcome for a given treatment and the estimated detection rate over that treatment. The error sum of squares measures variability that is not explained by the SSA, SSB, SSC, or SSD terms and therefore occurs within the set of treatments. Both of these variation measurements are evaluated using sum of squares expressions as detailed in the Appendix. The means of the SSA, SSB, SSC, SSD and SSE terms are MSA, MSB, MSC, MSD and MSE respectively, and are computed by dividing by the degrees of freedom, df, associated with each term. This results in $F = MSA/MSE$, where $MSA = SSA/df_A$ ($MSB = SSB/df_B$, etc.) and $MSE = SSE/df_E$. Large values of $F$ tend to support the conclusion that not all the effects are equal ($H_a$), whereas values of $F$ near 1 support the null hypothesis ($H_0$). In the event that the alternate hypothesis is indicated via the F-test, the ANOVA also provides the probability of a Type I error. A Type I error occurs when it is concluded that differences between means exist when, in fact, they do not (i.e. $H_a$ is accepted when in fact $H_0$ is true). This information is given in the column labelled Pr(F) in the ANOVA output in Table 2.3.

2.4.5 Are the Main E�ects all Equal?Following the general process of establishing null and alternate hy-pothesis as described above, a pair of null and alternate hypotheses are statedfor each factor in the biopsy model. The null hypothesis assumes that themain e�ects for a given factor at each of its levels are equivalent. The alter-nate hypothesis obviously assumes that the main e�ects di�er.H0: �1 = �2 = �3 Ha: not all �i are equal.�1 = �2 not all �i are equal.�1 = �2 = �3 not all i are equal. 1 = 2 = 3 not all �i are equal.The F-test statistic is applied to determine which hypothesis to ac-cept in each case. The factor sum of squares for each factor, number of nee-dles, spacing, � and �, denoted SSA, SSB, SSC and SSD, respectively,is computed as shown in the Appendix. The mean of each of these fac-tor sum of square terms is computed by dividing each term by its associ-ated degrees of freedom so that MSA = SSA=dfA, MSB = SSB=dfB, etc.as detailed in the Appendix. The test statistic is formed for each hypoth-esis in the following manner. To test the e�ect of the �rst factor, Num-ber of Needles, F = MSA=MSE; to test the e�ect of the spacing factor,27

F = MSB=MSE; to test the e�ect of �, F = MSC=MSE; and to test thee�ect of �, F =MSD=MSE. Accepting the alternate hypothesis means thata speci�c setting of the given factor corresponds to a change in detection rate;thus that factor has an e�ect on the overall outcome of the biopsy.Df Sum of Sq Mean Sq F Value Pr(F)Needles 2 0.15862 0.07931 607.427 0.0000000Main Spacing 1 0.00498 0.00498 38.209 0.0000011E�ects � 2 0.29249 0.14624 1120.073 0.0000000� 2 0.28115 0.14057 1076.661 0.0000000Ndls:Spc 2 0.1641 0.00820 62.846 0.0000000Needles: � 4 0.01444 0.00361 27.653 0.00000002-Way Spacing: � 2 0.00059 0.00029 2.283 0.1206068E�ects Needles: � 4 0.00395 0.00098 7.569 0.0002892Spacing: � 2 0.00046 0.00023 1.794 0.1848710�: � 4 0.02867 0.00716 54.902 0.0000000Residuals 28 0.00365 0.00013Table 2.3. The output from the ANOVA is displayed above. SeeAppendix for details of the calculations.Refering to this ANOVA output, the column of numbers labelled Sumof Sq refers to the parameters SSA, SSB, SSC and SSD detailed in the Ap-pendix. The column labelled Mean Square lists the parameters MSA, MSB,MSC, MSD. The F Value column lists the F-test outcome for each row: (Nee-dles F Value = MSA/MSE). The larger values in this column tend to supportthe alternate hypothesis that the main e�ect for a given factor di�ers across28

the possible levels for that factor. The final column, Pr(F), gives the probability of a Type I error, that is, the probability of obtaining an F value at least this large when the null hypothesis is true. Again, a Type I error occurs if the alternate hypothesis is concluded when, in fact, the null hypothesis is true. The row labelled Residuals lists the error degrees of freedom, the SSE and the MSE for this analysis.

Based on the numbers in the table, each of the four main effects has a significant effect on the outcome, with the factor θ having the greatest influence on the detection rate, followed by the factors φ and Number of Needles. This is indicated by the high F value that corresponds to each of the four factors. The rows labelled with two factor names (for example, Needles:Spacing) give the ANOVA output for the pairwise interactions and include the sum of squares computed for each pair of factors. The sums of squares for all of the pairwise interaction terms (SSAB, SSAC, SSAD, SSBC, SSBD, SSCD) are computed as detailed in the Appendix. The total treatment sum of squares is

$$SSTR = SSA + SSB + SSC + SSD + SSAB + SSAC + SSAD + SSBC + SSBD + SSCD.$$

This sum does not include the sum of squares terms due to the three-way and four-way interactions because there are not enough degrees of freedom in the experiment to use the full model.
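As a rough check on the ANOVA output, the F values for the main effects can be approximately reproduced from the printed sums of squares and degrees of freedom. The sketch below is illustrative only: the dictionary and function names are invented here, and because the printed sums of squares are rounded, the computed ratios agree with the table only to within about one percent.

```python
# Sums of squares and degrees of freedom for the four main effects,
# as printed in the ANOVA table (rounded values).
main_effects = {
    "Needles": (0.15862, 2),
    "Spacing": (0.00498, 1),
    "theta": (0.29249, 2),
    "phi": (0.28115, 2),
}
SSE, DF_ERROR = 0.00365, 28  # Residuals row


def f_statistic(ss, df, sse=SSE, df_error=DF_ERROR):
    """Return F = (SS / df) / (SSE / df_error), i.e. MS_factor / MSE."""
    mean_square = ss / df
    mse = sse / df_error
    return mean_square / mse


f_values = {name: f_statistic(ss, df) for name, (ss, df) in main_effects.items()}

# List the factors from largest to smallest F value.
for name, f in sorted(f_values.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s} F = {f:8.1f}")
```

Sorting by F value gives the same ordering of influence that the text describes.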

2.4.6 Recognizing Interaction between Factors

At this point, the F-test has determined that each of the main factor effects contributes to the overall detection rate. To evaluate the interaction effects, the F-test is applied again, in this case to interactions between two, three or four factors. A null and an alternate hypothesis are formulated for every possible combination of factors, and sum of squares terms are computed for the factor groups and used in each F-test. The null and alternate hypotheses are constructed for each of the pairwise interactions.

    H0: all $(\alpha\beta)_{ij} = 0$        Ha: not all $(\alpha\beta)_{ij} = 0$
        all $(\alpha\gamma)_{ik} = 0$           not all $(\alpha\gamma)_{ik} = 0$
        all $(\alpha\delta)_{il} = 0$           not all $(\alpha\delta)_{il} = 0$
        all $(\beta\gamma)_{jk} = 0$            not all $(\beta\gamma)_{jk} = 0$
        all $(\beta\delta)_{jl} = 0$            not all $(\beta\delta)_{jl} = 0$
        all $(\gamma\delta)_{kl} = 0$           not all $(\gamma\delta)_{kl} = 0$

All three-way combinations are formed, hypotheses are constructed and F-test results are evaluated.

    H0: all $(\alpha\beta\gamma)_{ijk} = 0$     Ha: not all $(\alpha\beta\gamma)_{ijk} = 0$
        all $(\alpha\beta\delta)_{ijl} = 0$         not all $(\alpha\beta\delta)_{ijl} = 0$
        all $(\alpha\gamma\delta)_{ikl} = 0$        not all $(\alpha\gamma\delta)_{ikl} = 0$
        all $(\beta\gamma\delta)_{jkl} = 0$         not all $(\beta\gamma\delta)_{jkl} = 0$

The null and alternate hypotheses are also constructed for the four-way interaction.

    H0: all $(\alpha\beta\gamma\delta)_{ijkl} = 0$
    Ha: not all $(\alpha\beta\gamma\delta)_{ijkl}$ equal 0

Based on the actual ANOVA results in the preceding table, four of the pairwise interactions appear strongly significant: Needles:Spacing, Needles:θ, Needles:φ and θ:φ. The other two pairwise interactions are included in the final model even though the strength of their significance is uncertain. The ANOVA was executed once to include all three-way interactions. Since these interactions proved insignificant, they are not included in the model. There are not enough degrees of freedom in the experiment to estimate the residuals and test for the four-way interaction.

As stated previously, the Y notation indicates the transformed detection rate (p). At this point the general model, of the form

$$
\begin{aligned}
Y_{ijklm} = {} & \mu_{....} + \alpha_i + \beta_j + \gamma_k + \delta_l && \text{(main effects)} \\
& + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\alpha\delta)_{il} + (\beta\gamma)_{jk} + (\beta\delta)_{jl} + (\gamma\delta)_{kl} && \text{(pairwise effects)} \\
& + (\alpha\beta\gamma)_{ijk} + (\alpha\beta\delta)_{ijl} + (\alpha\gamma\delta)_{ikl} + (\beta\gamma\delta)_{jkl} && \text{(three-way effects)} \\
& + (\alpha\beta\gamma\delta)_{ijkl} && \text{(four-way effect)} \\
& + \epsilon_{ijklm} && \text{(residual error)}
\end{aligned}
$$

is reduced to the final model for this analysis:

$$Y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + \delta_l + \widehat{(\alpha\beta)}_{ij} + \widehat{(\alpha\gamma)}_{ik} + \widehat{(\alpha\delta)}_{il} + \widehat{(\beta\gamma)}_{jk} + \widehat{(\beta\delta)}_{jl} + \widehat{(\gamma\delta)}_{kl}.$$

This model yields the transformed probability of detection at the given levels for i, j, k and l.

Now that the factor effects have been identified, the analysis turns to determining the factor levels that result in the highest detection rate. For this part of the analysis, the tables of means and tables of effects are evaluated.

    Grand Mean ($\mu_{....}$)     1.072

    Needles               4         6         8
    $\mu_{i...}$        0.999     1.09      1.128

    Spacing            Relative  Absolute
    $\mu_{.j..}$        1.082     1.063

    θ                    30°       45°       60°
    $\mu_{..k.}$        0.9723    1.098     1.147

    φ                    30°       45°       60°
    $\mu_{...l}$        1.14      1.104     0.9724

Table 2.4. The ANOVA tables of means list the transformed values.

                      θ                                        θ
    Needles    30°     45°     60°        Spacing     30°     45°     60°
       4      0.926   1.027   1.045       Relative   0.978   1.111   1.157
       6      0.979   1.113   1.176       Absolute   0.967   1.084   1.137
       8      1.012   1.152   1.221

                      φ                                        φ
    Needles    30°     45°     60°        Spacing     30°     45°     60°
       4      1.054   1.028   0.915       Relative   1.148   1.118   0.980
       6      1.161   1.123   0.985       Absolute   1.132   1.091   0.965
       8      1.205   1.163   1.017

                   Spacing                                     φ
    Needles   Relative  Absolute          θ           30°     45°     60°
       4       0.987     1.011            30°        1.026   0.978   0.913
       6       1.099     1.080            45°        1.159   1.139   0.994
       8       1.159     1.097            60°        1.235   1.196   1.010

Table 2.5. The transformed values of the pairwise means are shown.

Referring to the ANOVA tables of means, the highest numbers in each category reflect the best setting for a particular factor. On reading through the tables of means, the conclusion is that a technique of 8 needles, relative spacing, θ = 60° and φ = 30° yields the best detection rate. To corroborate this more fully, the interactions that are deemed significant are analysed to verify that the main effect is not contradicted by an interaction. Therefore, the table for Needles:θ is reviewed and it is found that the setting of 8 needles and θ = 60° again yields the highest mean. The tables for all of the pairwise combinations are reviewed to determine that the best settings yield the highest means in the interaction tables just as they did in the main effect tables. This proves to be the case, so none of the interactions contradict the conclusion drawn from the main effect information.
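The consistency check described above can be sketched in a few lines. The example below is only an illustration, not the procedure actually used: it copies three of the six pairwise tables of means from Table 2.5, and the variable and function names are invented here. For each table it finds the cell with the largest mean and compares it with the levels chosen from the main effects.

```python
# Pairwise means on the transformed scale, copied from Table 2.5.
# Keys are (needles, angle in degrees) or (needles, spacing method).
needles_by_theta = {
    (4, 30): 0.926, (4, 45): 1.027, (4, 60): 1.045,
    (6, 30): 0.979, (6, 45): 1.113, (6, 60): 1.176,
    (8, 30): 1.012, (8, 45): 1.152, (8, 60): 1.221,
}
needles_by_phi = {
    (4, 30): 1.054, (4, 45): 1.028, (4, 60): 0.915,
    (6, 30): 1.161, (6, 45): 1.123, (6, 60): 0.985,
    (8, 30): 1.205, (8, 45): 1.163, (8, 60): 1.017,
}
needles_by_spacing = {
    (4, "Relative"): 0.987, (4, "Absolute"): 1.011,
    (6, "Relative"): 1.099, (6, "Absolute"): 1.080,
    (8, "Relative"): 1.159, (8, "Absolute"): 1.097,
}

tables = {
    "needles x theta": needles_by_theta,
    "needles x phi": needles_by_phi,
    "needles x spacing": needles_by_spacing,
}

# Best levels suggested by the main-effect means:
# 8 needles, relative spacing, theta = 60 degrees, phi = 30 degrees.
expected = {
    "needles x theta": (8, 60),
    "needles x phi": (8, 30),
    "needles x spacing": (8, "Relative"),
}


def best_cell(table):
    """Return the key of the largest mean in an interaction table."""
    return max(table, key=table.get)


agreement = {name: best_cell(table) == expected[name] for name, table in tables.items()}
print(agreement)
```

If any entry were False, the corresponding interaction would contradict the main-effect choice and the combination would need a closer look.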

    Number of Needles (4, 6, or 8)       $\alpha_1$    $\alpha_2$    $\alpha_3$
    Effect                               -0.07329      0.01723       0.05607

    Spacing (Relative or Absolute)       $\beta_1$     $\beta_2$
    Effect                                0.009612    -0.009612

    θ (30°, 45°, or 60°)                 $\gamma_1$    $\gamma_2$    $\gamma_3$
    Effect                               -0.1001       0.02519       0.07486

    φ (30°, 45°, or 60°)                 $\delta_1$    $\delta_2$    $\delta_3$
    Effect                                0.0678       0.03215      -0.09995

Table 2.6. The main factor level effects from the ANOVA output are documented.

                           Spacing
    Needles         Relative     Absolute
       4            -0.02127      0.02127
       6            -0.00017      0.00017
       8             0.02143     -0.02143

                              θ
    Needles           30°          45°          60°
       4             0.02680      0.00244     -0.02925
       6            -0.01031     -0.00127      0.01158
       8            -0.01649     -0.00118      0.01767

                              θ
    Spacing           30°          45°          60°
    Relative        -0.004354     0.003708     0.000646
    Absolute         0.004354    -0.003708    -0.000646

                              φ
    Needles           30°          45°          60°
       4            -0.01271     -0.00292      0.01563
       6             0.00363      0.00087     -0.00450
       8             0.00907      0.00206     -0.01113

                              φ
    Spacing           30°          45°          60°
    Relative        -0.001740     0.004148    -0.002407
    Absolute         0.001740    -0.004148     0.002407

                              φ
    θ                 30°          45°          60°
    30°             -0.01404     -0.02664      0.04067
    45°             -0.00621      0.00978     -0.00357
    60°              0.02025      0.01686     -0.03711

Table 2.7. The ANOVA table of effects for pairwise interactions is displayed.

By using the values from the tables of effects, a probability of detection is calculated for the optimal setting:

$$Y_{3131} = \mu + \alpha_3 + \beta_1 + \gamma_3 + \delta_1 + \widehat{(\alpha\beta)}_{31} + \widehat{(\alpha\gamma)}_{33} + \widehat{(\alpha\delta)}_{31} + \widehat{(\beta\gamma)}_{13} + \widehat{(\beta\delta)}_{11} + \widehat{(\gamma\delta)}_{31}$$

$$1.347918 = 1.072 + 0.05607 + 0.009612 + 0.07486 + 0.0678 + 0.02143 + 0.01767 + 0.00907 + 0.000646 - 0.00174 + 0.02025$$

This result of 1.347918 is then transformed back (arcsine equation) to yield a probability of 0.38948 for this setting.

$$1.347918 = 2 \arcsin\sqrt{p}, \qquad p = \left(\sin(1.347918/2)\right)^2 = 0.38948.$$

Therefore, with the factors set to 8 needles, relative spacing, θ = 60° and φ = 30°, the biopsy procedure has a 38.9% probability of detecting the cancer given the tumor distribution model used. This estimated probability is best used in comparisons with the other estimated probabilities rather than as an absolute measure of detection rate. The conclusion from this analysis is therefore a relative ranking of treatments in terms of their detection rate. Since the 1000 simulated specimens were the same for each treatment, the ANOVA model determined the relative differences between the detection rates of the various treatments, not necessarily providing enough data and results to draw

conclusions about absolute detection rates. Table 2.8 lists each experiment and the probability of detection predicted from the factor effects model.

                        Treatment Parameters
                 Number of   Spacing                     Predicted
    Experiment   Needles     Method      θ      φ        Probability
         1          4        Relative    45°    45°      0.247
         2          6        Relative    45°    45°      0.297
         3          8        Relative    45°    45°      0.327
         4          4        Absolute    45°    45°      0.251
         5          6        Absolute    45°    45°      0.281
         6          8        Absolute    45°    45°      0.291
         7          4        Relative    60°    45°      0.265
         8          6        Relative    60°    45°      0.337
         9          8        Relative    60°    45°      0.369
        10          4        Absolute    60°    45°      0.271
        11          6        Absolute    60°    45°      0.324
        12          8        Absolute    60°    45°      0.335
        13          4        Relative    30°    45°      0.195
        14          6        Relative    30°    45°      0.227
        15          8        Relative    30°    45°      0.251
        16          4        Absolute    30°    45°      0.205
        17          6        Absolute    30°    45°      0.219
        18          8        Absolute    30°    45°      0.224
        19          4        Relative    45°    60°      0.200
        20          6        Relative    45°    60°      0.236
        21          8        Relative    45°    60°      0.260
        22          4        Absolute    45°    60°      0.208
        23          6        Absolute    45°    60°      0.227
        24          8        Absolute    45°    60°      0.232
        25          4        Relative    60°    60°      0.192
        26          6        Relative    60°    60°      0.247
        27          8        Relative    60°    60°      0.273
        28          4        Absolute    60°    60°      0.203
        29          6        Absolute    60°    60°      0.241

Table 2.8. The probabilities of detection for one tumor simulations are displayed.

                        Treatment Parameters
                 Number of   Spacing                     Predicted
    Experiment   Needles     Method      θ      φ        Probability
        30          8        Absolute    60°    60°      0.248
        31          4        Relative    30°    60°      0.175
        32          6        Relative    30°    60°      0.196
        33          8        Relative    30°    60°      0.215
        34          4        Absolute    30°    60°      0.189
        35          6        Absolute    30°    60°      0.194
        36          8        Absolute    30°    60°      0.195
        37          4        Relative    45°    30°      0.257
        38          6        Relative    45°    30°      0.314
        39          8        Relative    45°    30°      0.346
        40          4        Absolute    45°    30°      0.266
        41          6        Absolute    45°    30°      0.303
        42          8        Absolute    45°    30°      0.315
        43          4        Relative    60°    30°      0.276
        44          6        Relative    60°    30°      0.354
        45          8        Relative    60°    30°      0.389
        46          4        Absolute    60°    30°      0.287
        47          6        Absolute    60°    30°      0.346
        48          8        Absolute    60°    30°      0.360
        49          4        Relative    30°    30°      0.208
        50          6        Relative    30°    30°      0.246
        51          8        Relative    30°    30°      0.272
        52          4        Absolute    30°    30°      0.223
        53          6        Absolute    30°    30°      0.243
        54          8        Absolute    30°    30°      0.250

Table 2.8. (Cont.) The probabilities of detection for one tumor simulations are displayed.

2.4.7 Clinical Distribution of Tumors

The biopsy simulations were conducted a second time on more realistic geometric glands. By using a clinically derived distribution of the number of tumors per gland, a better population was available for these biopsy simulations. A sample size of 1000 was again used but in this experiment, 1/4

of the glands had a single tumor, 1/2 had two tumors and the remaining 1/4 had three tumors. The total gland volume was again held to be less than 6.4 cc. This distribution is based on the analysis done by Daneshagari [2]. The ANOVA results are found in the Appendix and yield the same optimal biopsy procedure with a slightly different probability resulting from the factor effects model. By using the values from this second table of effects, a probability of detection is calculated for the optimal setting:

$$Y_{3131} = \mu + \alpha_3 + \beta_1 + \gamma_3 + \delta_1 + \widehat{(\alpha\beta)}_{31} + \widehat{(\alpha\gamma)}_{33} + \widehat{(\alpha\delta)}_{31} + \widehat{(\beta\gamma)}_{13} + \widehat{(\beta\delta)}_{11} + \widehat{(\gamma\delta)}_{31}$$

$$1.7535 = 1.429 + 0.0733 + 0.01507 + 0.07456 + 0.07091 + 0.02321 + 0.02650 + 0.01412 - 0.005442 - 0.004094 + 0.03638$$

Transforming this value (arcsine) yields a probability of detection for the optimal setting of 0.5908. As would be expected, this probability of 59.08% is higher than the 38.9% achieved by the simulation using geometric models of one tumor. The predicted probabilities for each of the 54 experiments given this distribution of tumors are shown in Table 2.9.
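The calculation above can be sketched as follows, assuming the arcsine transform $Y = 2\arcsin\sqrt{p}$ that is used for the detection rate in this analysis. The list of terms is copied from the sum shown above; the function and variable names are chosen here for illustration only.

```python
import math

# Grand mean and effect estimates for the optimal treatment, copied from
# the sum shown above (one-to-three tumor distribution).
terms = [1.429, 0.0733, 0.01507, 0.07456, 0.07091,
         0.02321, 0.02650, 0.01412, -0.005442, -0.004094, 0.03638]


def back_transform(y):
    """Invert Y = 2 * arcsin(sqrt(p)) to recover the probability p."""
    return math.sin(y / 2.0) ** 2


y_hat = sum(terms)            # transformed detection rate
p_hat = back_transform(y_hat)  # predicted probability of detection
print(f"Y = {y_hat:.4f}, p = {p_hat:.4f}")
```

Because the printed effects are rounded, the last digit of the recovered probability may differ slightly from the value quoted in the text.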

                        Treatment Parameters
                 Number of   Spacing                     Predicted
    Experiment   Needles     Method      θ      φ        Probability
         1          4        Relative    45°    45°      0.417
         2          6        Relative    45°    45°      0.489
         3          8        Relative    45°    45°      0.526
         4          4        Absolute    45°    45°      0.417
         5          6        Absolute    45°    45°      0.470
         6          8        Absolute    45°    45°      0.482
         7          4        Relative    60°    45°      0.427
         8          6        Relative    60°    45°      0.524
         9          8        Relative    60°    45°      0.569
        10          4        Absolute    60°    45°      0.436
        11          6        Absolute    60°    45°      0.514
        12          8        Absolute    60°    45°      0.533
        13          4        Relative    30°    45°      0.353
        14          6        Relative    30°    45°      0.405
        15          8        Relative    30°    45°      0.431
        16          4        Absolute    30°    45°      0.354
        17          6        Absolute    30°    45°      0.387
        18          8        Absolute    30°    45°      0.388
        19          4        Relative    45°    60°      0.358
        20          6        Relative    45°    60°      0.408
        21          8        Relative    45°    60°      0.443
        22          4        Absolute    45°    60°      0.360
        23          6        Absolute    45°    60°      0.391
        24          8        Absolute    45°    60°      0.401
        25          4        Relative    60°    60°      0.322
        26          6        Relative    60°    60°      0.395
        27          8        Relative    60°    60°      0.437
        28          4        Absolute    60°    60°      0.332
        29          6        Absolute    60°    60°      0.386
        30          8        Absolute    60°    60°      0.403

Table 2.9. Given the distribution of one to three tumors, the probabilities of detection predicted by the ANOVA model are displayed.

                        Treatment Parameters
                 Number of   Spacing                     Predicted
    Experiment   Needles     Method      θ      φ        Probability
        31          4        Relative    30°    60°      0.326
        32          6        Relative    30°    60°      0.357
        33          8        Relative    30°    60°      0.381
        34          4        Absolute    30°    60°      0.329
        35          6        Absolute    30°    60°      0.341
        36          8        Absolute    30°    60°      0.340
        37          4        Relative    45°    30°      0.417
        38          6        Relative    45°    30°      0.498
        39          8        Relative    45°    30°      0.541
        40          4        Absolute    45°    30°      0.425
        41          6        Absolute    45°    30°      0.486
        42          8        Absolute    45°    30°      0.504
        43          4        Relative    60°    30°      0.436
        44          6        Relative    60°    30°      0.541
        45          8        Relative    60°    30°      0.590
        46          4        Absolute    60°    30°      0.451
        47          6        Absolute    60°    30°      0.537
        48          8        Absolute    60°    30°      0.562
        49          4        Relative    30°    30°      0.351
        50          6        Relative    30°    30°      0.412
        51          8        Relative    30°    30°      0.444
        52          4        Absolute    30°    30°      0.359
        53          6        Absolute    30°    30°      0.401
        54          8        Absolute    30°    30°      0.407

Table 2.9. (Cont.) Given the distribution of one to three tumors, the probabilities of detection predicted by the ANOVA model are displayed.

A selection of detection rates is graphed in Figure 2.7 to provide a visualization of the relative ranking of various treatments. The plots show 6 and 8 needles, relative spacing and all of the levels for θ and φ.

[Figure 2.7: plot of hit rate (vertical axis, 0.35 to 0.6) against φ (horizontal axis, 30°, 45°, 60°), with one curve for each legend entry: θ = 30°, 6 needles; θ = 30°, 8 needles; θ = 45°, 6 needles; θ = 45°, 8 needles; θ = 60°, 6 needles; θ = 60°, 8 needles.]

Figure 2.7. The detection rates for several experiments are graphed and the common treatment parameters are noted for each experiment. This gives a visual understanding of the ranking of these treatments in terms of their detection rate.

3. Digitized Specimen Data

3.1 Summary of Software Tool

An analysis program, written in C, was created to simulate needle biopsies on clinical data provided by the University of Colorado Health Sciences Center, Pathology Department. The clinical data were gathered from autopsies, pathologically investigated and digitized [2].

The data for each specimen are stored as a 3-dimensional array of information. The software uses an input file to determine the characteristics of a given experiment. These characteristics include the number of needles, the initial placement of the first needle, the angles θ and φ, the spacing between needles, and the needle diameter and length. In this manner, the analysis software is flexible enough to handle a variety of simulations. The goal of this biopsy simulation tool is to provide the means to experiment realistically with various needle parameters on clinical data in order to determine any correspondence between biopsy methods and detection rates.

The initial needle position is offset by the distance requested (the z-offset entered by the user), with half of the needles entering the right lobe

and the other half entering the left lobe, in symmetry with each other. The initial position is determined as an absolute offset (in cm) from the apex of the gland. The other parameters are used to position each needle on the specimen data set and determine how much of the specimen data is to be returned in the needle biopsy. These specimen data are analyzed to determine whether, and how much, tumor data are present in the needle. This information is available to the user.

Having read the input file with parameter values, the code begins a loop on the specimen data files requested for simulation. In this loop, the three-dimensional specimen data file is opened, the data are read into a 3-d array with all of the background trimmed off, the apex of the gland is located, and the needle positions are translated into array coordinates. These coordinates are fed to the biopsy routine, which extracts the specimen data coinciding with the needle and analyzes the data for tumor information. The information for the entire experiment is stored in an output file that documents the needle parameters and the results for each image data set.

3.2 Specific Algorithms

3.2.1 Locating the Apex

The apex is defined as the first contact with the prostate when approaching it through the rectum, as done clinically. This location is used as a landmark for positioning each biopsy needle. In the data set, the algorithm that searches for this landmark proceeds as follows. The planes are defined as shown in Figure 3.1.

Each pixel in the three-dimensional specimen file contains a number indicating the type of data at that location. The possible types are gland, tumor, capsule or background. Capsule data indicate those pixels defining the boundary of the gland. The apex is indicated by the first pixel pointing to capsule data. Therefore one plane of specimen data is evaluated at a time, until a pixel that points to capsule data is found. This location is recorded as the apex location.
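The plane-by-plane search can be sketched as below. This is a minimal illustration and not the thesis software, which was written in C: the numeric label codes are hypothetical, since the values stored in the specimen files are not given here, and treating the first array index as the scan direction is an assumption.

```python
# Hypothetical label codes; the actual values stored in the specimen
# files are not given in the text.
BACKGROUND, GLAND, TUMOR, CAPSULE = 0, 1, 2, 3


def find_apex(volume):
    """Return (plane, row, col) of the first capsule pixel, or None.

    `volume` is a 3-d nested list indexed as volume[plane][row][col].
    Planes are examined one at a time, in order; using the first index
    as the scan direction is an assumption of this sketch.
    """
    for p, plane in enumerate(volume):
        for r, row in enumerate(plane):
            for c, label in enumerate(row):
                if label == CAPSULE:
                    return (p, r, c)
    return None


# Tiny 3 x 3 x 3 example: plane 0 is all background and capsule data
# first appear in plane 1.
volume = [[[BACKGROUND] * 3 for _ in range(3)] for _ in range(3)]
volume[1][1][1] = CAPSULE
volume[2][0][0] = CAPSULE

apex = find_apex(volume)
print(apex)
```

Returning None for a volume without capsule data lets the caller skip a specimen rather than position needles from an undefined landmark.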

[Figure 3.1: diagram of the digitized gland showing the X, Y and Z axes and the location of the apex.]

Figure 3.1. The x, y, z axes, as defined for the digital data, mimic those defined for the geometric models.

3.2.2 Establishing Needle Positions

The starting position, the location of the apex, serves as the landmark for each additional needle. From this starting point and the additional user-supplied parameters (z-offset, distance between needles) all of the needle positions are calculated in terms of a vector. This vector, represented by (x, y, z) coordinates, along with the needle angle, is a pointer to a specific pixel of image data. The z-offset is assumed to be in centimeters and is added to the initial (x, y, z) of the starting position to locate the first needle position. Each time any coordinate is changed, the new vector may be pointing to gland, tumor, background, urethra or capsule data. The pixel represented by the vector is read to ensure that the needle entry position remains located on capsule data. If it does not, the y coordinate is adjusted so that the entry position of the needle is on capsule data.

At this point in the algorithm, the first needle position is determined. There are two ways to space the remaining needles. The user may enter absolute distances in centimeters or a relative measure taken to be a percentage of the z dimension of the gland. In addition, a zero percentage indicates that

the spacing is based on the number of needles in the biopsy; the needles are equally spaced across the z-axis of the gland. The remaining needle positions are calculated from the initial needle position: half of the needles are rotated into the right lobe by the needle angle, and the remainder are rotated by the negative of that angle into the left lobe. All of the needles have the x coordinate set to the midpoint of the gland in the x dimension.

The user-entered distance, in centimeters, is converted to a specific number of pixels. This z distance is added to the first needle position to obtain the second needle position, added to the second to obtain the third, etc. Each time a needle position is calculated, the coordinates are evaluated to ensure that they point to capsule data. If the gland is too short in the z direction to handle all the needles requested, the experiment proceeds with the number of needles that do stay within the gland.

The experiments that depend on a relative distance between needles require additional analysis of the yz slice before determining the z offset. The z diameter of the particular yz slice is calculated. The z distance required for a needle of a specific length, inserted at a specific angle, is then subtracted from this z diameter. Rather than having the last needle pierce more background than gland data, this subtraction enables the full number of needles to be

inserted into the gland. This new z diameter is then divided into the number of segments required by the specified percentage. If the user indicates 0% for the distance spacing, the software calculates the distance based on the number of needles requested and the diameter of the yz plane.

3.3 Simulations

The 54 treatments used in the geometric model were used as biopsy procedures on a maximum of 53 digitized clinical specimens. Some of the biopsy techniques were simulated on only 52 of these clinical specimens. Table 3.1 shows the results from these simulations on the digitized clinical data. The table documents both the multiple-tumor geometric model hit rate and the number of hits resulting from the same biopsy on the digitized clinical data. The first five columns indicate the experiment number and the biopsy parameter settings for the four variables: number of needles, spacing method, θ and φ. The column labelled Detection Rate is the number of hits per 1000 simulations of the geometric model. The column labelled Number of Hits is the number of hits per number of digitized clinical samples. Most experiments were run on all 53 of the digitized specimens. However, some of the simulations resulted in an error on one or more of the specimens and these specimens were then removed from the experiment. The final column,

labelled Clinical Detection Rate, is the rate for the experiments on the digitized specimens.

                 Number                                              Number    Clinical
                 of         Spacing                     Detection   of        Detection
    Experiment   Needles    Method      θ      φ        Rate        Hits      Rate
         1          4       Relative    45°    45°      0.417        8/53     0.1509
         2          6       Relative    45°    45°      0.489       11/53     0.2075
         3          8       Relative    45°    45°      0.526        8/52     0.1538
         4          4       Absolute    45°    45°      0.417        9/53     0.1698
         5          6       Absolute    45°    45°      0.470       11/53     0.2075
         6          8       Absolute    45°    45°      0.482       10/52     0.1923
         7          4       Relative    60°    45°      0.427        9/53     0.1698
         8          6       Relative    60°    45°      0.524        9/52     0.1731
         9          8       Relative    60°    45°      0.569       13/53     0.2453
        10          4       Absolute    60°    45°      0.436       10/53     0.1887
        11          6       Absolute    60°    45°      0.514       12/53     0.2264
        12          8       Absolute    60°    45°      0.533       12/53     0.2264
        13          4       Relative    30°    45°      0.353        7/53     0.1321
        14          6       Relative    30°    45°      0.405       12/53     0.2264
        15          8       Relative    30°    45°      0.431        9/53     0.1698
        16          4       Absolute    30°    45°      0.354        7/53     0.1321
        17          6       Absolute    30°    45°      0.387        7/53     0.1321
        18          8       Absolute    30°    45°      0.388        9/53     0.1698
        19          4       Relative    45°    60°      0.358        6/53     0.1132
        20          6       Relative    45°    60°      0.408        9/53     0.1698
        21          8       Relative    45°    60°      0.443       11/52     0.2115
        22          4       Absolute    45°    60°      0.360        8/53     0.1509
        23          6       Absolute    45°    60°      0.391       10/53     0.1887
        24          8       Absolute    45°    60°      0.401       10/53     0.1887
        25          4       Relative    60°    60°      0.322        8/53     0.1509
        26          6       Relative    60°    60°      0.395        8/52     0.1538
        27          8       Relative    60°    60°      0.437        9/52     0.1731
        28          4       Absolute    60°    60°      0.332        6/52     0.1154
        29          6       Absolute    60°    60°      0.386        9/52     0.1731
        30          8       Absolute    60°    60°      0.403        9/52     0.1731

Table 3.1. The detection rates for the geometric and clinical simulations are displayed.

                 Number                                              Number    Clinical
                 of         Spacing                     Detection   of        Detection
    Experiment   Needles    Method      θ      φ        Rate        Hits      Rate
        31          4       Relative    30°    60°      0.326        5/52     0.0962
        32          6       Relative    30°    60°      0.357        5/52     0.0962
        33          8       Relative    30°    60°      0.381        9/52     0.1731
        34          4       Absolute    30°    60°      0.329        4/52     0.0769
        35          6       Absolute    30°    60°      0.341        4/52     0.0769
        36          8       Absolute    30°    60°      0.340        4/52     0.0769
        37          4       Relative    45°    30°      0.417        6/52     0.1154
        38          6       Relative    45°    30°      0.498       10/52     0.1923
        39          8       Relative    45°    30°      0.541       12/52     0.2308
        40          4       Absolute    45°    30°      0.425        8/52     0.1538
        41          6       Absolute    45°    30°      0.486       10/52     0.1923
        42          8       Absolute    45°    30°      0.504       11/52     0.2115
        43          4       Relative    60°    30°      0.436        6/52     0.1154
        44          6       Relative    60°    30°      0.541       10/52     0.1923
        45          8       Relative    60°    30°      0.590       10/53     0.1887
        46          4       Absolute    60°    30°      0.451        8/52     0.1538
        47          6       Absolute    60°    30°      0.537       12/52     0.2308
        48          8       Absolute    60°    30°      0.562       12/52     0.2308
        49          4       Relative    30°    30°      0.351        3/30     0.1000
        50          6       Relative    30°    30°      0.412       11/52     0.2115
        51          8       Relative    30°    30°      0.444       10/52     0.1923
        52          4       Absolute    30°    30°      0.359        6/52     0.1154
        53          6       Absolute    30°    30°      0.401        8/52     0.1538
        54          8       Absolute    30°    30°      0.407       10/52     0.1923

Table 3.1. (Cont.) The detection rates for the geometric and clinical simulations are displayed.

3.4 Geometric Model vs Clinical Model

Comparison of the detection rates between the geometric model and the clinical model reveals that the geometric simulation produces much higher rates than its clinical counterpart. In attempting to explain this discrepancy, several characteristics of the experiment are noted.

The distribution of the tumors and the total tumor volume in a given specimen can impact the detection rate of a treatment. A comparison of the tumor volumes is graphically displayed in Figures 3.2 and 3.3. As shown by the histograms, the tumor volumes for the autopsy data tend strongly toward small volumes (under 0.5 cc). In contrast, the geometric model produces tumors with volumes more evenly spread across the spectrum of possible volumes. In fact, 80% of the autopsy specimens have a total tumor volume less than 0.5 cc, whereas only 49% of the geometric gland models have a total tumor volume in this range. This difference in the size of the tumors can explain some of the difference in detection rate between the clinical and geometric models.

A second difference is that the relative ranking of detection rates for the digital data simulations differs from the ranking of detection rates for the geometric simulations. An example of this discrepancy is that experiment 9 (8 needles, relative spacing, θ = 60°, φ = 45°) achieved a detection rate of 0.2453, or 13 hits out of 53 samples. This detection rate is better than the detection rate of experiment 45 (8 needles, relative spacing, θ = 60°, φ = 30°), which is the optimal biopsy as indicated by the geometric simulation. This difference may be due to the fact that only 53 specimens were used in the

digital simulation in contrast to the 1000 models constructed for the geometric simulation.

3.5 Optimal Technique vs SRSCB

The optimal technique, determined by the geometric model, consists of 8 needles, relative spacing, θ = 60° and φ = 30°. The SRSCB procedure uses 6 needles, absolute spacing, θ = 45° and φ = 45°. Both techniques were simulated on the geometric model as well as the digitized clinical data. The optimal technique actually proved slightly worse at tumor detection than the SRSCB procedure when simulated on the clinical data. In fact, the optimal method detected tumor in 10 out of 53 specimens (0.189). The SRSCB method detected tumor in 11 out of 53 specimens (0.207). These results compare with the overall results from the geometric simulation as follows. The SRSCB had a detection rate of 0.47 and the optimal technique had a detection rate of 0.59 on the 1000 geometric models. This discrepancy is addressed by noting the sample size available in the two simulations and the distribution of tumor volumes as noted earlier.
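To illustrate the sample-size point, the short calculation below estimates the binomial standard error of each clinical detection rate. It is a back-of-the-envelope sketch added for illustration, not part of the original analysis: it treats the two rates as independent binomial proportions, which is only an approximation because both techniques were simulated on the same specimens.

```python
import math


def proportion_and_se(hits, n):
    """Return the sample proportion and its binomial standard error."""
    p = hits / n
    return p, math.sqrt(p * (1.0 - p) / n)


p_opt, se_opt = proportion_and_se(10, 53)      # optimal technique, clinical data
p_srscb, se_srscb = proportion_and_se(11, 53)  # SRSCB, clinical data
diff = p_srscb - p_opt

print(f"optimal {p_opt:.3f} +/- {se_opt:.3f}")
print(f"SRSCB   {p_srscb:.3f} +/- {se_srscb:.3f}")
print(f"difference {diff:.3f}")
```

Under these assumptions the one-hit difference between the two techniques is smaller than the standard error of either rate, so 53 specimens cannot separate them.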

[Figure 3.2: histogram titled "Autopsy Specimens"; horizontal axis: sum of tumor volume (0.05, 0.5, 1, 1.5, ..., 5); vertical axis: number of tumors (0 to 25).]

Figure 3.2. The histogram of the clinical data shows the tumor distribution by volume.

[Figure 3.3: histogram titled "Geometric Specimens"; horizontal axis: sum of tumor volume (0.05, 0.5, 1, 1.5, ..., 5); vertical axis: number of tumors (0 to 400).]

Figure 3.3. The histogram of the geometric data shows the tumor distribution by volume.

4. Geometric Model - Volume Estimates

4.1 Tumor Volume Estimates

The total volume of tumor in a gland is an important piece of information for clinicians, who use it to improve both the diagnosis and the treatment plan for a patient. The ultrasound used during a biopsy accurately measures the prostate gland volume, so that an approximate ratio of tumor to gland volume can be used to estimate the volume of tumor in a gland. These simulations offered an avenue to explore a means of approximating this volume ratio by using the volume of the needle that contains tumor information and the total volume of the needle.

Three methods are used to estimate the amount of tumor intersected by the needle. The needle can be modeled by a line, a strip, or a cylinder in one, two, and three dimensions, respectively. The length and diameter of the needle are constant and are set by clinical limits. This incremental approach began in one dimension in order to simplify aspects of the simulation during software verification. As the research progressed, the two- and three-dimensional needles were introduced in order to model the actual biopsy more

closely.

The first method of estimating the volume ratio is $R = \frac{1}{n}\sum_{i=1}^{n} \frac{v_i}{V_i}$, where $v_i$ represents the tumor volume within a single needle, $V_i$ represents the volume of that same needle, and $n$ is the number of needles. This ratio is referred to as the average of the ratios. A second estimator of the volume ratio is $r = \frac{\sum v_i}{\sum V_i}$, where $v_i$ is the tumor volume within a single needle and $V_i$ is the total volume of that needle. This ratio is considered the ratio of the average volumes since $\frac{1}{n}\sum_{i=1}^{n} v_i$ is the average tumor volume and $\frac{1}{n}\sum_{i=1}^{n} V_i$ is the average needle volume. This yields

$$r = \frac{\frac{1}{n}\sum v_i}{\frac{1}{n}\sum V_i} = \frac{\sum v_i}{\sum V_i}.$$

Both methods of estimating the ratio are documented below.
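The two estimators can be compared directly in a few lines. In the sketch below the function names and the needle data are invented for illustration; only the formulas for $R$ and $r$ come from the text.

```python
def average_of_ratios(tumor_vols, needle_vols):
    """R = (1/n) * sum(v_i / V_i)."""
    n = len(tumor_vols)
    return sum(v / V for v, V in zip(tumor_vols, needle_vols)) / n


def ratio_of_averages(tumor_vols, needle_vols):
    """r = sum(v_i) / sum(V_i)."""
    return sum(tumor_vols) / sum(needle_vols)


# Hypothetical data for four needles (arbitrary volume units).
v = [0.0, 0.002, 0.0, 0.001]       # tumor volume found in each needle
V = [0.010, 0.010, 0.008, 0.010]   # total volume of each needle

R = average_of_ratios(v, V)
r = ratio_of_averages(v, V)
print(f"R = {R:.4f}, r = {r:.4f}")

# When every needle has the same volume the two estimators coincide.
V_equal = [0.010] * 4
R_equal = average_of_ratios(v, V_equal)
r_equal = ratio_of_averages(v, V_equal)
```

The estimators differ only when the needle volumes $V_i$ are unequal; with identical $V_i$ both reduce to the total tumor volume divided by the total needle volume.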

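[Figure 4.1: cross-section in the Y-Z plane showing the gland ellipsoid, the tumor, and a one-dimensional needle; the intersection points $t_1$ and $t_2$ and the tumor length $l_T$ along the needle are marked.]

Figure 4.1. This illustration of the gland, tumor and one-dimensional needle depicts the variables used in determining the volume ratio estimator.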

4.1.1 One-Dimensional Analysis - Line Model

In this first model, we represent the needle by a line segment as shown in Figure 4.1. The length of the needle that contains tumor pixels, $l_T$, is the difference between $t_1$ and $t_2$, the two roots of equation (2.1): $l_T = |t_1 - t_2|$. A needle length, $L$, of 1.25 cm is used in the estimate of the volume ratio. Thus the ratio $l_T/L$ is an approximation of the true volume ratio $TV/PGV$; that is, $l_T/L \approx TV/PGV$.

4.1.2 Two-Dimensional - Strip Model

In the two-dimensional case we represent the needle by a strip. The needle entry point $(x_0, y_0, z_0)$ is used as a starting point in the two-dimensional analysis. Two lines are created, each offset from this starting coordinate by the needle radius. The intersection between these two lines and the tumor ellipse is determined, and the roots of the two resulting quadratics are used to compute both the occurrence of a detection and the amount of tumor within the needle. In this case, the estimate of the volume ratio is the area of the tumor over the area of the needle. Figure 4.2 defines the lengths used in determining the area. The area of the tumor is calculated by estimating the needle length which contains tumor data from the roots of intersection: $l_{t1} = |t_{11} - t_{12}|$; $l_{t2} = |t_{21} - t_{22}|$.

The area of tumor is then given by $a_T = \frac{d}{2}(l_{t1} + l_{t2})$, where $d$ is the diameter of the needle. The area of the needle is calculated in the same way using the length of the needle: $a_N = \frac{d}{2}(L + L)$. Thus $a_T/a_N$ serves as an estimate of the true tumor to gland volume ratio, $TV/PGV$.

[Figure 4.2: cross-section in the Y-Z plane showing the gland ellipsoid, the tumor, and a two-dimensional (strip) needle; the intersection points $t_{11}$, $t_{12}$, $t_{21}$, $t_{22}$ and the tumor lengths $l_{t1}$ and $l_{t2}$ are marked.]

Figure 4.2. This illustration of the gland, tumor and two-dimensional needle depicts the variables used in determining the volume ratio estimator.

4.1.3 Three-Dimensional - Cylinder Model

The three-dimensional analysis models the needle as a cylinder and is similar to the two-dimensional case in that the entry point of the needle is again used as a center coordinate for four needles. In this case, the four needles are constructed symmetrically about this point to generate a cylindrical needle. Then intersections and roots are computed. A more accurate representation of the volume ratio is obtained using the volume of the tumor within the needle

over the volume of the needle. In this case, the length is estimated to be the maximum of the lengths determined from the four sets of intersection roots:

$$l_t = \max\left(|t_{11} - t_{12}|,\ |t_{21} - t_{22}|,\ |t_{31} - t_{32}|,\ |t_{41} - t_{42}|\right).$$

The volume of the needle depends on the known diameter and length: $v_N = \pi \left(\frac{d}{2}\right)^2 L$. The estimated volume of the tumor depends on the needle lengths which contain tumor data as shown in Figure 4.3. This leads to the tumor volume estimate $v_T = \pi \left(\frac{d}{2}\right)^2 l_t$. The ratio $v_T/v_N$ estimates the true volume ratio, $TV/PGV$.

[Figure 4.3: two views (Y-Z and X-Y) of the gland ellipsoid, the tumor, and a three-dimensional (cylinder) needle; the intersection points $t_{11}$, $t_{12}$, $t_{21}$, $t_{22}$, $t_{31}$, $t_{32}$, $t_{41}$, $t_{42}$ and the tumor length $l_t$ are marked.]

Figure 4.3. This illustration of the gland, tumor and three-dimensional needle depicts the variables used in determining the volume ratio estimator.

4.2 Experiment Setup

A second set of experiments utilizing the geometric model explored the question of accurately estimating the tumor volume to gland volume ratio. The experiment simulated a biopsy on a single specimen, increasing the number of needles each iteration and comparing the volume ratio obtained from the biopsy sample to the known volume ratio. The parameters for the biopsy include the optimal angles θ and φ determined from the ANOVA investigation. The optimal number of needles and distancing method determined from the ANOVA analysis do not apply to this experiment since the number of needles increases from 6 to 20 and the distancing of these needles is done so that the maximum number, 20, are equally spaced. The maximum number of needles was set at 20 due to clinical limitations. The spacing of the needles is dependent on the maximum number so that from one iteration to the next $n - 2$ needles are in exactly the same location, yielding the same detection information. In this manner the comparison between a specimen biopsied by 6 needles and the same specimen biopsied by 10 needles is not dependent on needle position, but instead compares the gain made by the four additional needles. The simulation is executed on 1000 specimens, varying the number of needles from 6 to 20 in increments of 2. The output from this experiment consists of a file for each specimen that contains the results of each set of needles, including the tumor to needle volume ratio achieved and the associated estimates ($R = \frac{1}{n}\sum (v_t/v_n)$ and $r = \frac{\sum v_t}{\sum v_n}$). In addition, the actual tumor to gland volume ratio is noted.

4.3 Results

The results of this experiment were not as anticipated, as there appears to be no pattern of convergence to the actual tumor to gland volume ratio within the limit of 20 total needles. However, much was learned from this exercise that provided insight into the next series of investigations. First, it is noted that in the great majority of cases, a single 8-needle biopsy tends

to overestimate the true tumor to gland volume ratio. Secondly, a comparisonbetween the two methods of calculating the error leads to the conclusion thatthe sum of the ratios is the more accurate method at least in this set of limitedtrials.4.4 Interactive UtilityUsing the preceding idea as a starting point, an interactive softwaretool was created to investigate the volume ratio question in greater detail.This tool prompts the user for a random number, seeds the random numbergenerator, creates a gland containing a single tumor and conducts the optimal8-needle biopsy. This optimal biopsy has 8 needles, relative spacing betweenthe needles, � = 60� and � = 30�. The results, which include each needleposition, the amount of tumor volume contained in the needle and an estimateas to the volume ratio of tumor to gland, are displayed for the user. At thispoint, the user is able to choose the location for the next needle. This newneedle is then simulated and the tumor volume information it retrieves isincorporated into the volume ratio. The user can continue this process ofrequesting additional needles and evaluate the estimated volume ratio and itserror from the true ratio. A maximum of 20 needles can be simulated ona single gland, beginning with the 8 original needles and accumulating the63

additional 12 based on user speci�cations.This area of research is full of open-ended questions where tools suchas this interactive utility can help shed light on answers. With involvementfrom clinicians and medical researchers, experiments can be designed to gathermore information regarding the two issues of volume ratio and optimal biopsytechnique. In addition, using the results of this body of research, more real-istic tumor distributions and geometric models can be constructed to betterunderstand the impact of treatment parameters on detection rate.
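The estimators of Sections 4.1.3 and 4.2 can be illustrated with a minimal Python sketch. This is not the simulation code used in this study: the function names, the representation of the roots as pairs, the convention that a line missing the tumor contributes a pair of equal roots, and all numerical values are assumptions made only for illustration.

```python
import math


def tumor_length(roots):
    """Return l_t, the largest |t_k1 - t_k2| over the four pairs of roots.

    `roots` is a list of (t_k1, t_k2) pairs, one per line of the cylinder.
    Assumption: a line that misses the tumor is given as two equal roots.
    """
    return max(abs(t1 - t2) for t1, t2 in roots)


def cylinder_volume(d, length):
    """Volume of a cylinder of diameter d and the given length."""
    return math.pi * (d / 2.0) ** 2 * length


def needle_volumes(roots, d, L):
    """Return (v_T, v_N) for one needle of diameter d and length L."""
    v_T = cylinder_volume(d, tumor_length(roots))
    v_N = cylinder_volume(d, L)
    return v_T, v_N


def biopsy_estimates(samples):
    """Return (R, r) for a list of (v_t, v_n) pairs, one pair per needle.

    R is the mean of the per-needle ratios; r is the ratio of the sums.
    """
    n = len(samples)
    R = sum(v_t / v_n for v_t, v_n in samples) / n
    r = sum(v_t for v_t, _ in samples) / sum(v_n for _, v_n in samples)
    return R, r


# Hypothetical needle: diameter 1, length 15, tumor met by three of four lines.
example_roots = [(2.0, 5.0), (2.5, 4.5), (0.0, 0.0), (3.0, 4.0)]
v_T, v_N = needle_volumes(example_roots, d=1.0, L=15.0)
print("single-needle ratio:", v_T / v_N)

# Hypothetical biopsy with two needles of different volume.
print("R, r:", biopsy_estimates([(1.0, 10.0), (1.0, 5.0)]))
```

If every needle has the same volume $v_n$, the two estimates coincide algebraically, so in this sketch they differ only when $v_n$ varies between needles.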


A. APPENDIX ANOVA Definitions

A dot in the subscript indicates averaging over the variable represented by that index.

- The number of levels for Number of Needles: $a = 3$.
- The number of levels for Distancing Method: $b = 2$.
- The number of levels for the first angle parameter: $c = 3$.
- The number of levels for the second angle parameter: $d = 3$.
- The number of specimens = 1000.
- The number of experiments: $abcd = 54$.

In general, $Y$ is an observation, $\bar{Y}$ is the mean of observations, $\mu$ is the true mean and $\hat{\mu}$ is the least squares estimate of the true mean.

$Y_{ijkl}$ is the observed detection rate at the factor levels indicated by $i$, $j$, $k$ and $l$.

$\bar{Y}_{....}$ is the mean of all specimens over all treatment levels $i, j, k, l$. It indicates the overall detection rate for the entire experiment.

$$\bar{Y}_{....} = \frac{1}{abcd} \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} \sum_{l=1}^{d} Y_{ijkl}$$

SSTO, or total sum of squares, is a measure of the total variability of the observations without consideration of factor level.

$$SSTO = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} \sum_{l=1}^{d} (Y_{ijkl} - \bar{Y}_{....})^2$$

dfSSTO is the total degrees of freedom. The SSTO has $abcd - 1 = 54 - 1 = 53$ degrees of freedom. One degree of freedom is lost due to the lack of independence between the deviations.

SSTR, or treatment sum of squares, measures the extent of differences between the estimated factor level means $\hat{Y}_{ijkl}$ and the mean over all treatments. The greater the difference between factor level means (treatment means), the greater the value of SSTR.

$$SSTR = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} \sum_{l=1}^{d} (\hat{Y}_{ijkl} - \bar{Y}_{....})^2$$

dfSSTR is the treatment degrees of freedom. There are $r - 1$ degrees of freedom for the SSTR, where $r$ is the number of parameters in the model. In the full model, $r = abcd = 54$, the total number of combinations of factor levels. The model used for this simulation contains the overall mean, the main effects and the pairwise interactions, so that

$$r = 1 + (a-1) + (b-1) + (c-1) + (d-1) + (a-1)(b-1) + (a-1)(c-1) + (a-1)(d-1) + (b-1)(c-1) + (b-1)(d-1) + (c-1)(d-1) = 26.$$

One degree of freedom is lost due to the lack of independence between the deviations.
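The parameter count above can be checked with a few lines of Python. This is only an arithmetic sketch: the dictionary keys are hypothetical labels for the four factors, and the overall mean is counted as one parameter, an assumption that reproduces the stated value $r = 26$.

```python
from itertools import combinations

# Number of levels of each factor (a, b, c, d); the keys are illustrative labels.
levels = {"needles": 3, "spacing": 2, "angle_1": 3, "angle_2": 3}

n_cells = 1
for n in levels.values():
    n_cells *= n                      # abcd, the number of experiments

# Degrees of freedom of the main effects and of the pairwise interactions.
main_df = sum(n - 1 for n in levels.values())
pair_df = sum((m - 1) * (n - 1) for m, n in combinations(levels.values(), 2))

r = 1 + main_df + pair_df             # overall mean + main effects + pairs
df_ssto = n_cells - 1                 # total degrees of freedom
df_sstr = r - 1                       # treatment degrees of freedom
df_error = n_cells - r                # degrees of freedom left for error

print(n_cells, main_df, pair_df, r, df_ssto, df_sstr, df_error)
```

The treatment and error degrees of freedom add up to the total, mirroring the partitioning of the total sum of squares.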

SSE, or error sum of squares, measures variability which is not explained by the differences between sample means. It is a measure of the variation within treatments, that is, of the deviations of the observations from the estimated treatment means $\hat{Y}_{ijkl}$. A smaller value of SSE indicates less variation within simulations at the same factor level.

$$SSE = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} \sum_{l=1}^{d} (Y_{ijkl} - \hat{Y}_{ijkl})^2$$

dfSSE is the error degrees of freedom. Since SSE is the sum of the errors across factor levels, the degrees of freedom is the sum of the degrees of freedom for each factor level. It is the total number of simulations minus $r$, that is, $abcd - r$.

MSE is the mean square for error, defined by $MSE = SSE / df_{SSE}$.

Note: The above definitions imply $SSTO = SSTR + SSE$. Due to this relationship, this process is referred to as the partitioning of the total sum of squares.

In order to measure the variability within a factor level, the factor sum of squares terms are computed. These terms are integral to the test statistic applied to determine whether a factor main effect is significant. In addition, interaction sums of squares are computed to measure the variability of the interactions.

The factor A sum of squares corresponds to the number of needles factor.

$$SSA = bcd \sum_{i=1}^{a} (\bar{Y}_{i...} - \bar{Y}_{....})^2$$

Similar factor sums of squares are computed for each of the factors:

Factor                    Sum of Squares                                                     Mean Sum of Squares
Number of Needles         $SSA = bcd \sum_{i=1}^{a} (\bar{Y}_{i...} - \bar{Y}_{....})^2$     $MSA = SSA/(a-1)$
Spacing Method            $SSB = acd \sum_{j=1}^{b} (\bar{Y}_{.j..} - \bar{Y}_{....})^2$     $MSB = SSB/(b-1)$
First angle parameter     $SSC = abd \sum_{k=1}^{c} (\bar{Y}_{..k.} - \bar{Y}_{....})^2$     $MSC = SSC/(c-1)$
Second angle parameter    $SSD = abc \sum_{l=1}^{d} (\bar{Y}_{...l} - \bar{Y}_{....})^2$     $MSD = SSD/(d-1)$

The interaction sums of squares are computed as well for use in the F-test on the interactions. The first three pairwise interaction sums of squares are shown below. The others are computed in the same manner.


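Number of Needles: Spacing

$$SSAB = cd \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{Y}_{ij..} - \bar{Y}_{i...} - \bar{Y}_{.j..} + \bar{Y}_{....})^2, \qquad MSAB = \frac{SSAB}{(a-1)(b-1)}$$

Number of Needles: First angle parameter

$$SSAC = bd \sum_{i=1}^{a} \sum_{k=1}^{c} (\bar{Y}_{i.k.} - \bar{Y}_{i...} - \bar{Y}_{..k.} + \bar{Y}_{....})^2, \qquad MSAC = \frac{SSAC}{(a-1)(c-1)}$$

Number of Needles: Second angle parameter

$$SSAD = bc \sum_{i=1}^{a} \sum_{l=1}^{d} (\bar{Y}_{i..l} - \bar{Y}_{i...} - \bar{Y}_{...l} + \bar{Y}_{....})^2, \qquad MSAD = \frac{SSAD}{(a-1)(d-1)}$$

The treatment means, $\mu_{ijkl}$, indicate the mean for the treatment at the $ijkl$ levels of the respective factors.

The overall mean, $\mu$, is the mean across all factors and all levels (across all $i, j, k, l$).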
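The factor and pairwise interaction sums of squares defined in this appendix can be sketched with NumPy as follows. This is an illustrative sketch, not the ANOVA software used for the thesis: the function names `effect_ss` and `partition` are hypothetical, the data are synthetic random numbers rather than simulation results, and the error sum of squares is obtained as SSTO minus the main-effect and pairwise terms, which relies on the design being balanced with one observation per cell.

```python
import itertools

import numpy as np


def effect_ss(Y, axes):
    """Sum of squares for a main effect (one axis) or a pairwise interaction.

    Y has one axis per factor, e.g. shape (a, b, c, d).
    """
    axes = tuple(axes)
    other = tuple(ax for ax in range(Y.ndim) if ax not in axes)
    weight = int(np.prod([Y.shape[ax] for ax in other]))  # e.g. bcd for factor A
    grand = Y.mean()                                      # Ybar_....
    cell = Y.mean(axis=other, keepdims=True)              # e.g. Ybar_i... or Ybar_ij..
    if len(axes) == 1:
        dev = cell - grand
    else:
        first = Y.mean(axis=other + (axes[1],), keepdims=True)   # e.g. Ybar_i...
        second = Y.mean(axis=other + (axes[0],), keepdims=True)  # e.g. Ybar_.j..
        dev = cell - first - second + grand
    return weight * float(np.sum(dev ** 2))


def partition(Y):
    """Return (ss, df, ssto, sse, df_error) for main effects plus pairwise terms."""
    effects = [ax for k in (1, 2)
               for ax in itertools.combinations(range(Y.ndim), k)]
    ss = {ax: effect_ss(Y, ax) for ax in effects}
    df = {ax: int(np.prod([Y.shape[i] - 1 for i in ax])) for ax in effects}
    ssto = float(np.sum((Y - Y.mean()) ** 2))
    sse = ssto - sum(ss.values())            # valid for a balanced design
    df_error = Y.size - 1 - sum(df.values())
    return ss, df, ssto, sse, df_error


# Synthetic detection rates for a 3 x 2 x 3 x 3 design (illustration only).
rng = np.random.default_rng(0)
Y = rng.uniform(0.2, 0.9, size=(3, 2, 3, 3))
ss, df, ssto, sse, df_error = partition(Y)
mse = sse / df_error
for ax in ss:
    ms = ss[ax] / df[ax]
    print(ax, "SS =", ss[ax], "MS =", ms, "F =", ms / mse)
```

Here axis 0 plays the role of factor A (number of needles), axis 1 of factor B, and so on, so the key `(0, 1)` corresponds to SSAB.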


