Optimization in Action: Helping Treat Cancers
Yin Zhang
Department of Computational and Applied Mathematics
Rice University
February 27, 2004
Outline
• Part I. Intensity Modulated Radiation Therapy (IMRT)
  - How does IMRT work?
  - Planning data before optimization
  - Current approaches
• Part II. Our Most Recent Work in Progress
  - A geometric formulation and fast algorithms
  - Preliminary results and demo
Collaborators and Student:
- Dr. Radhe Mohan and group, M.D. Anderson Cancer Center
- Optimization Collaborative Working Group: 3 oncologists, 3 physicists, 4 optimizers
- Graduate student: Michael Merritt, CAAM Dept., Rice University
Part I: IMRT
Who gets Cancer?
“Approximately one out of every two American men and one out of every three American women will have some type of cancer at some point during their lifetime.” -- American Cancer Society
How is Cancer Treated?
“The four major types of treatment for cancer are surgery, radiation, chemotherapy, and biologic therapies.” -- American Cancer Society
What is IMRT?
“IMRT (Intensity Modulated Radiation Therapy) is a state-of-the-art cancer treatment method that delivers high doses of radiation directly to cancer cells in a very targeted way, much more precisely than is possible with conventional radiotherapy ... while sparing more of the surrounding healthy tissue.” -- A Vendor
Major IMRT Equipment
• a medical linear accelerator with
• a computer-controlled multi-leaf collimator
A Rotating Medical Linear Accelerator
Crossfire: Focus on Tumors and Spare the Eyes
The multi-leaf collimator forms a sequence of patterns at each angle
Major Optimization Variables in IMRT
4 angles & 4 intensity profiles
• Beam angles:
  - How many? Usually 5 to 9.
  - Which angles? Needs optimization.
• Intensity Profiles (Maps):
  \[ I_a(x, y), \quad a = 1, 2, \ldots \]
  A 2D function for each angle.
• Leaf sequences for the MLC to deliver the intensity maps.
We’ll concentrate on the 2nd issue.
Intensity Profile Optimization
General Statement:
Given the number and values of the beam angles, determine the optimal intensity profiles
\[ I_a(x, y), \quad a = 1, 2, \ldots, \]
for all the angles such that
• tumor cells receive the prescribed amounts of radiation;
• healthy cells receive as little radiation as possible.
Discretization
• A 2D grid on the MLC for each beam angle $a$:
  \[ I_a(x, y) \;\Longrightarrow\; I_a(x_i, y_j), \quad a = 1, 2, \ldots \]
  Each little rectangle is a “beamlet”, with an unknown intensity value.
• A 3D grid on the region of treatment (tumors, critical organs, ...).
  Each little cube is a “voxel”, with a known desired dose value (tumor) or a known dose upper bound (healthy tissue).
Re-arrange voxels and beamlets into 1D vectors:
• voxels: $i = 1, 2, \ldots, m$ (with $m$ on the order of $10^5$ to $10^6$)
• beamlets: $j = 1, 2, \ldots, n$ (with $n$ between 2000 and 20000)
Planning variables and Data
Variables (Unknowns):
- Intensities $I_a(x_i, y_j)$, rearranged and renamed as the beamlet intensities $x_j$, $j = 1, 2, \ldots, n$.
Data:
• CT-scan images, geometric contours for structures, ......
• Desired dose values and bounds, prescribed by oncologists
• Influence Matrix, A, for dose calculation.
(Computed by radiation physicists; a research issue by itself)
Influence Matrix A
At a fixed energy level for the linear accelerator, the amount of outlet radiation is determined by the exposure time.
$A_{ij}$ = radiation received at voxel $i$ due to a unit emission from beamlet $j$.
With scattering, A is fairly dense.
Dose Calculation
The total dose accumulated at voxel $i$ for beamlet intensity values $x = (x_1, x_2, \ldots, x_n)$ is
\[ d_i = \sum_{j=1}^{n} A_{ij} x_j, \quad i = 1, 2, \ldots, m, \]
or simply $d = Ax$:
[dose] = [influence matrix] * [beamlet intensity]
- The formula is obviously a first-order approximation.
- The influence matrix $A$ is $m \times n$ (say, $500{,}000 \times 20{,}000$).
- There is a row for each voxel and a column for each beamlet.
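As a small illustration of the dose formula, the following Python sketch computes d = Ax for a toy influence matrix. The matrix entries, the sizes (3 voxels, 2 beamlets), and the helper name `dose` are made up for illustration; a clinical A would be far larger and supplied by radiation physicists.

```python
def dose(A, x):
    """Return d = A x for an influence matrix stored as a list of rows.

    Row i belongs to voxel i, column j to beamlet j.
    """
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


# Toy data (illustrative values only): 3 voxels, 2 beamlets.
A = [[1.0, 0.5],
     [0.2, 1.0],
     [0.0, 0.3]]
x = [2.0, 4.0]          # beamlet intensities, must be nonnegative
d = dose(A, x)          # one dose value per voxel
print(d)
```

Because d is linear in x, doubling every beamlet intensity doubles every voxel dose, which is what makes the least-squares and LP formulations later in the talk possible.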
Beamlet Intensity Optimization
Given data:
1. a “prescription vector” b from physicians;
2. an influence matrix A from physicists.
Ideal Formulation: Find a beamlet-intensity vector $x$ such that
\[ d = Ax, \quad x \ge 0, \]
\[ d_i = b_i, \; i \in \{\text{tumor voxels}\}, \qquad d_i \le b_i, \; i \in \{\text{healthy voxels}\}. \]
This over-determined system has no solution. Something’s gotta give.
No Pain, No Gain: Dose-Volume Constraints
• Unfortunately, killing cancers requires sacrifices.
• Some organs can sustain a certain degree of damage while still functioning, and can eventually recover.
• Dose-volume constraints allow carefully controlled overdoses.
Dose-volume constraint (DVC):
A given percentage of the volume of a structure can exceed its prescribed dose upper bound. E.g.,
• 30% of the right lung may receive a dose greater than 19 Gy; or, equivalently, 70% of the right lung should receive a dose less than 19 Gy.
• Which 30% should be sacrificed? Need optimization.
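Checking a DVC for a given dose vector is simple; choosing which voxels to overdose is the hard part. A minimal Python sketch of the check, assuming equal-volume voxels so that "percentage of volume" becomes "percentage of voxels" (the function name and the dose values are hypothetical):

```python
def dvc_satisfied(doses, bound_gy, max_percent_over):
    """True if at most max_percent_over % of the voxels exceed bound_gy."""
    n_over = sum(1 for d in doses if d > bound_gy)
    # Integer arithmetic avoids rounding surprises in the percentage test.
    return 100 * n_over <= max_percent_over * len(doses)


# Hypothetical right-lung doses in Gy for 10 equal-volume voxels.
rt_lung = [5.0, 8.0, 12.0, 15.0, 18.0, 18.5, 19.0, 22.0, 25.0, 30.0]

print(dvc_satisfied(rt_lung, bound_gy=19.0, max_percent_over=30))
print(dvc_satisfied(rt_lung, bound_gy=19.0, max_percent_over=20))
```

Three of the ten voxels exceed 19 Gy here, so the 30% constraint holds and a 20% constraint would not.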
Physicians’ Prescriptions are DVC-Based
Simple Case: one tumor in the right lung
Prescription:
-------------------------------------
> 95% of Tumor       receives > 63 Gy
<  1% of Tumor       receives > 72 Gy
> 95% of Ext_Tumor   receives > 60 Gy
<  1% of Ext_Tumor   receives > 70 Gy
-------------------------------------
<  1% of Cord        receives > 43 Gy
< 15% of Heart       receives > 30 Gy
< 20% of Esophagus   receives > 10 Gy
<  2% of Lt_Lung     receives > 20 Gy
<  8% of Lt_Lung     receives > 10 Gy
< 30% of Rt_Lung     receives > 19 Gy
< 40% of Rt_Lung     receives > 10 Gy
< 50% of Norm_Tissue receives > 54 Gy
-------------------------------------
(FDA-Approved) Current Practice
Assume there are 1 tumor and 5 healthy structures, labelled Structures 0 to 5. The $k$-th structure consists of the voxels in the set $S_k$.
Weighted least-squares formulation used by IMRT vendors:
\[ \min_{x \ge 0} \; \sum_{k=0}^{5} w_k f_k(x), \]
where the $f_k$ are quadratic penalty functions,
\[ f_0(x) = \sum_{i \in S_0} [(Ax)_i - b_0]^2, \]
\[ f_k(x) = \sum_{i \in S_k} [\max(0, (Ax)_i - b_k)]^2, \quad k = 1, 2, \ldots, 5. \]
(i) Ad hoc terms may be added to “encourage” DVC satisfaction.
(ii) Method of choice: gradient descent (+ projection?)
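The weighted least-squares objective and a projected gradient iteration can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a vendor implementation: the toy matrix, the weights, the fixed step size, and all function names are made up, and no ad hoc DVC terms are included.

```python
def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def penalty_and_gradient(A, x, structures):
    """structures: list of (weight, voxel_indices, bound, is_tumor) tuples."""
    d = matvec(A, x)
    value = 0.0
    slope = [0.0] * len(d)            # derivative of the objective w.r.t. each dose d_i
    for w, voxels, bound, is_tumor in structures:
        for i in voxels:
            r = d[i] - bound
            if not is_tumor:
                r = max(0.0, r)       # healthy voxels are penalized only for overdose
            value += w * r * r
            slope[i] += 2.0 * w * r
    grad = [sum(A[i][j] * slope[i] for i in range(len(A))) for j in range(len(x))]
    return value, grad


def projected_gradient(A, structures, x0, step, iters):
    x = list(x0)
    for _ in range(iters):
        _, g = penalty_and_gradient(A, x, structures)
        x = [max(0.0, xj - step * gj) for xj, gj in zip(x, g)]   # project onto x >= 0
    return x


# Toy problem: voxel 0 is tumor (target 4), voxels 1 and 2 are healthy (bound 1).
A = [[1.0, 0.5], [0.2, 1.0], [0.0, 0.3]]
structures = [(1.0, [0], 4.0, True), (1.0, [1, 2], 1.0, False)]
x0 = [0.0, 0.0]
f_start, _ = penalty_and_gradient(A, x0, structures)
x = projected_gradient(A, structures, x0, step=0.1, iters=200)
f_end, _ = penalty_and_gradient(A, x, structures)
print(f_start, f_end, x)
```

Changing the weights in `structures` changes the computed plan, which is exactly the tuning burden discussed on the next slide.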
Current Practice: Pros & Cons
• The relationship between a prescription (DVCs) and the weights is neither transparent nor predictable.
• Manual trial and error requires experienced personnel and time. Even so, it often fails to produce acceptable plans.
• Planning requires multiple meetings between physicians and physicists (on average 2 weeks from arrival to treatment, including CT scans, image processing, and planning).
• For fixed weights, least-squares problems can be approximately solved quickly by algorithms that are easy to implement.
Summarize Cons in 3 words? Weights! Weights! Weights!
MIP & LP: favored by Optimizers
MIP: Mixed Integer Programming
LP: Linear Programming
Example DVC: 75 out of 100 voxels receive 50 Gy or less.
\[ (Ax)_i \le 50 + 500\, y_i, \quad i = 1, \ldots, 100, \]
\[ \sum_{i=1}^{100} y_i \le 25, \quad y_i \in \{0, 1\} \;\; \forall i. \]
- Rigorous formulations, but hard to solve. (E. Lee et al., R. Rardin et al., ...)
- LP approximations: faster but still costly in practice. (Ahuja et al., Holder, Merritt/Z et al., ...)
- Need sophisticated optimization software like Cplex.
- High-accuracy constraint satisfaction. (Is it necessary?)
- Interior-point methods for LP are not warm-start friendly.
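To see how the binary variables encode the example DVC, the Python sketch below takes a given dose vector and builds the smallest y that the big-M constraints allow. It does not solve the MIP; it only evaluates feasibility for fixed doses. The function name and the dose values are hypothetical.

```python
def dvc_indicators(doses, bound=50.0, big_m=500.0, max_over=25):
    """Return (y, feasible) for the constraints d_i <= bound + big_m * y_i, sum(y) <= max_over."""
    # y_i must be 1 wherever the dose exceeds the bound; 0 is best elsewhere.
    y = [1 if d > bound else 0 for d in doses]
    within_big_m = all(d <= bound + big_m * yi for d, yi in zip(doses, y))
    return y, within_big_m and sum(y) <= max_over


# Hypothetical doses for 100 voxels: 20 overdosed voxels pass, 30 do not.
y_ok, feasible_ok = dvc_indicators([40.0] * 80 + [60.0] * 20)
y_bad, feasible_bad = dvc_indicators([40.0] * 70 + [60.0] * 30)
print(sum(y_ok), feasible_ok, sum(y_bad), feasible_bad)
```

In the actual MIP both x and y are unknown, so the solver has to search over which 25 voxels may be overdosed; that combinatorial choice is what makes the formulation hard.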
Part II
Our Most Recent Work in Progress
Our Goals:
• Simple, “weightless”, formulations & very fast algorithms
• Optimization process driven directly by prescriptions (DVCs).
Model in Higher Space
Prescription: tumor dose $= b_t$; DVCs are for healthy structures only.
Influence matrix partition (tumor and healthy):
\[ A = \begin{bmatrix} A_t \\ A_h \end{bmatrix} \]
(Tumor: deliverable)    $A_t x = b_t$        (desirable, given)
(Healthy: deliverable)  $A_h x + s = u$      (feasible, unknown)
(Nonnegativity)         $x \ge 0,\ s \ge 0$  ($s$ = slacks for healthy voxels)
(Healthy: DVCs)         $u \in D_v$          ($D_v$ = {doses satisfying DVCs})
• s, u in 3D “dose space” while x in 2D “beamlet space”. More work?
• Better to have more degrees of freedom than weights.
• Still over-determined.
DVC set Dv: Non-convex but “nice”
$D_v$ is a union of “boxes”, and non-convex. E.g., 50% of $u \le 1$ in $\mathbb{R}^2_+$:
\[ D_v = \{u \in \mathbb{R}^2_+ : u_1 \le 1\} \cup \{u \in \mathbb{R}^2_+ : u_2 \le 1\}. \]
“Projection” onto $D_v$ is easy. E.g., 70% of $u \le 5$:
\[ \mathrm{Proj}_{D_v}([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) = [1, 2, 3, 4, 5, 5, 5, 8, 9, 10]. \]
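The projection in the example can be reproduced with a short Python sketch. It assumes a single DVC on one structure and nonnegative doses; `project_dvc` is a hypothetical name, and ties between equal doses are broken by voxel index, so the result is one nearest point rather than the only one.

```python
def project_dvc(u, bound, percent_within):
    """Nearest point to u with at least percent_within % of entries <= bound."""
    need = -(-percent_within * len(u) // 100)            # ceiling division
    short = need - sum(1 for v in u if v <= bound)       # how many more must comply
    out = list(u)
    # Clip the cheapest violators first: the smallest doses above the bound.
    over = sorted((i for i, v in enumerate(u) if v > bound), key=lambda i: u[i])
    for i in over[:max(0, short)]:
        out[i] = bound
    return out


print(project_dvc([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], bound=5, percent_within=70))
# → [1, 2, 3, 4, 5, 5, 5, 8, 9, 10]
```

Only the entries 6 and 7 move, because lowering them to the bound costs less than lowering 8, 9, or 10; the three largest doses are the ones "sacrificed".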
A Geometric Formulation
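• Physician’s dose prescription: a non-convex set
  Prescription Set: \[ H = \left\{ \begin{bmatrix} b_t \\ u \end{bmatrix} : u \in D_v \right\} \subset \mathbb{R}^m_+ \]
• Physicists’ dose calculation: a closed convex cone
  Physical Set: \[ K = \left\{ \begin{bmatrix} A_t x \\ A_h x + s \end{bmatrix} : (x, s) \ge 0 \right\} \subset \mathbb{R}^m_+ \]
• Find $d_H \in H$ and $d_K \in K$ such that
  \[ \mathrm{dist}(d_H, d_K) = \mathrm{dist}(H, K). \]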
Optimization in Dose Space
Define the objective function:
\[ f(u) = \min_{x, s \ge 0} \frac{1}{2} \left( \|A_t x - b_t\|^2 + \|A_h x + s - u\|^2 \right) \equiv \frac{1}{2}\, \mathrm{dist}^2\!\left( \begin{bmatrix} b_t \\ u \end{bmatrix}, K \right). \]
\[ \min f(u), \quad \text{s.t. } u \in D_v. \]
Theoretically hard: the feasible set $D_v$ is non-convex. It may not be practically hard if approximate solutions are allowed.
• $f(u)$ is the optimal value of an NNLS problem.
• $f(u)$ is continuous and “practically” differentiable.
• $\nabla f(u) = -\max(0, A_h x(u) - u) \le 0$.
• $f(u)$ monotonically decreases as $u$ increases.
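The gradient formula can be checked with a short calculation, sketched here under the assumption that $f$ is differentiable at $u$ and that $(x(u), s(u))$ denotes a minimizer of the inner NNLS problem (notation introduced for this sketch).

```latex
% For fixed x, the best nonnegative slack is s = \max(0, u - A_h x), hence
\[
A_h x + s - u = \max(0,\; A_h x - u).
\]
% Differentiating the inner objective with respect to u at the minimizer:
\[
\nabla f(u) = -\bigl(A_h x(u) + s(u) - u\bigr)
            = -\max\bigl(0,\; A_h x(u) - u\bigr) \le 0 .
\]
```

Since every component of the gradient is nonpositive, raising any healthy-voxel dose bound $u_i$ cannot increase $f$, which is the monotonicity stated in the last bullet.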
Two Simple Algorithms
Recall that $\mathrm{Proj}_{D_v}$ is easy (so is $\mathrm{Proj}_H$).
- Gradient Projection Algorithm: Given $u^0 \in D_v$,
  \[ u^{k+1} = \mathrm{Proj}_{D_v}\left[ u^k - \alpha \nabla f(u^k) \right], \quad k = 0, 1, 2, \ldots \]
- Successive Projection Algorithm: Given $d^0 \in H$,
  \[ d^{k+1} = \mathrm{Proj}_H\left( \mathrm{Proj}_K(d^k) \right), \quad k = 0, 1, 2, \ldots \]
  A very old algorithm for convex sets, extended to our non-convex case. (Conjecture: it guarantees convergence to a local minimum.)
Theorem: The two algorithms are equivalent if $\alpha \equiv 1$.
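One way to see why the theorem is plausible is the following sketch, which assumes $d^0 = [b_t;\, u^0]$ and writes $(x^k, s^k)$ for the NNLS solution that defines $\mathrm{Proj}_K(d^k)$ (notation introduced here, not taken from the slides).

```latex
% Gradient step with \alpha = 1, using \nabla f(u) = -\max(0, A_h x(u) - u):
\[
u^k - \nabla f(u^k) = u^k + \max(0,\; A_h x^k - u^k)
                    = \max(u^k,\; A_h x^k) = A_h x^k + s^k ,
\]
% which is the healthy block of the projection onto K:
\[
\mathrm{Proj}_K\!\left(\begin{bmatrix} b_t \\ u^k \end{bmatrix}\right)
  = \begin{bmatrix} A_t x^k \\ A_h x^k + s^k \end{bmatrix}.
\]
% Projecting onto H resets the tumor block to b_t and projects the healthy block onto D_v:
\[
d^{k+1} = \begin{bmatrix} b_t \\ \mathrm{Proj}_{D_v}(A_h x^k + s^k) \end{bmatrix}
        = \begin{bmatrix} b_t \\ u^{k+1} \end{bmatrix}.
\]
```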
Which healthy voxels are sacrificed?
• Starting points must satisfy all dose upper bounds.
• The algorithms then automatically select voxels to sacrifice.
• Selections are based on voxels’ sensitivity w.r.t. f(u).
• The algorithms are GREEDY.
How close the computed solutions are to the optimum depends on how correctly the sacrificed voxels are chosen.
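The greedy selection can be watched on a tiny example. The Python sketch below runs successive projection with one tumor voxel, two healthy voxels, two beamlets, and the DVC "50% of healthy voxels at or below 0.5". Everything here is an assumption made for illustration: the data, the function names, and the inner solver, which is a plain projected gradient method standing in for a fast NNLS solver.

```python
def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def project_dvc(u, bound, percent_within):
    """Nearest point to u with at least percent_within % of entries <= bound."""
    need = -(-percent_within * len(u) // 100)            # ceiling division
    short = need - sum(1 for v in u if v <= bound)
    out = list(u)
    over = sorted((i for i, v in enumerate(u) if v > bound), key=lambda i: u[i])
    for i in over[:max(0, short)]:                       # clip the smallest overdoses
        out[i] = bound
    return out


def project_cone(At, Ah, dt, dh, iters=200):
    """Approximate Proj_K([dt; dh]) for K = {[At x; Ah x + s] : x, s >= 0}.

    The slack is eliminated (s = max(0, dh - Ah x)) and the remaining convex
    problem in x is solved by projected gradient with step 1 / ||A||_F^2.
    """
    n = len(At[0])
    step = 1.0 / sum(a * a for row in At + Ah for a in row)
    x = [0.0] * n
    for _ in range(iters):
        rt = [p - q for p, q in zip(matvec(At, x), dt)]             # At x - dt
        rh = [max(0.0, p - q) for p, q in zip(matvec(Ah, x), dh)]   # overdose part
        grad = [sum(At[i][j] * rt[i] for i in range(len(At))) +
                sum(Ah[i][j] * rh[i] for i in range(len(Ah)))
                for j in range(n)]
        x = [max(0.0, xj - step * gj) for xj, gj in zip(x, grad)]
    healthy = [max(p, q) for p, q in zip(matvec(Ah, x), dh)]        # Ah x + s
    return x, healthy


def successive_projection(At, Ah, bt, bound, percent_within, u0, outer=60):
    """Iterate d <- Proj_H(Proj_K(d)) with d = [bt; u]; returns (x, u)."""
    u, x = list(u0), None
    for _ in range(outer):
        x, healthy = project_cone(At, Ah, bt, u)
        u = project_dvc(healthy, bound, percent_within)
    return x, u


At = [[1.0, 1.0]]                    # the tumor voxel sees both beamlets
Ah = [[1.0, 0.0], [0.0, 1.0]]        # each beamlet crosses one healthy voxel
x, u = successive_projection(At, Ah, bt=[2.0], bound=0.5,
                             percent_within=50, u0=[0.5, 0.5])
print(x, u)
```

The starting point satisfies both upper bounds, as the slide requires. In this symmetric toy case the iteration settles on sacrificing the second healthy voxel (the tie is broken by index), with u approaching [0.5, 1.5] and the tumor dose approaching 2.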
Non-Negative Least Squares (NNLS)
Projection onto K requires solving NNLS of the form:
\[ \min\; q(x) := \frac{1}{2} \|Bx - b\|^2 \quad \text{s.t. } x \ge 0. \]
• In our cases, $B$ can have sizes up to $O(10^6) \times O(10^6)$.
• Classic active-set algorithms (e.g., Lawson & Hanson) are too slow.
• Newton-type interior-point algorithms are too costly.
• We need very fast algorithms for solving NNLS.
IPSG for NNLS
Interior-Point Scaled Gradient (IPSG) Algorithm:
\[ x_0 > 0, \qquad x_{k+1} = x_k - \alpha_k D_k^{-1} \nabla q(x_k) > 0, \]
where $D_k > 0$ is diagonal, and $\alpha_k$ either minimizes $q(x_k - \alpha D_k^{-1} \nabla q(x_k))$ over $\alpha$ or is a step just short of the boundary, whichever is smaller. IPSG decreases $q(x)$ monotonically.
• We have studied $D_k = \mathrm{Diag}((B^T B x_k + r_k) \,./\, x_k) > 0$, where ./ denotes elementwise division.
• Theorem: If $B^T B x_k + r_k > 0$, $B^T b + r_k > 0$ and $\{r_k\}$ is bounded, then (i) $\alpha_k \ge 1$; (ii) if $\{x_k\}$ converges, it converges to the optimum of NNLS.
• In our case, $b > 0$ and $B \ge 0$, so we can set $r_k \equiv 0$.
• It has always converged so far in our experiments, and fairly quickly.
IPSG vs. Matlab “lsqnonneg”
>> driver
Generating problem: size [m n] = [150 120]
ipsg time:      4.2000e-01  Residual = 4.140135e+00
lsqnonneg time: 9.4400e+00  Residual = 4.139811e+00

>> driver
Generating problem: size [m n] = [10000 200]
ipsg time:      4.9600e+00  Residual = 4.500604e+01
lsqnonneg time: 1.0120e+04  Residual = 4.500214e+01
(5 sec. vs. 2.8 hrs. Relative accuracy: 8.67e-05)

>> driver
Generating problem: size [m n] = [1000 800]
ipsg time:      8.3500e+00  Residual = 8.674440e+00
lsqnonneg time: 7.0144e+03  Residual = 8.674433e+00
(8+ sec. vs. 1.95 hrs. Relative accuracy: 7.55e-07)
Matrices are dense & random. Comparisons with other methods are needed.
Concluding Remarks
• Helping save lives is an optimizer’s dream application.
• Can we really make a difference? Absolutely, given time.
• More and more optimization problems are appearing in medicine.
Preliminary Simulations Demo:
• 2D phantom; 1 tumor, 1 or 2 OAR (organ at risk).
• Geometries constructed to represent difficult test cases
• Matlab implementation using successive projection and IPSG.
• Key: Does the algorithm choose the right voxels to sacrifice?