Optimization in Action: Helping Treat Cancers

Yin Zhang

Department of Computational and Applied Mathematics

Rice University

February 27, 2004

Outline

• Part I. Intensity Modulated Radiation Therapy (IMRT)

  - How does IMRT work?
  - Planning data before optimization
  - Current approaches

• Part II. Our Most Recent Work in Progress

  - A geometric formulation and fast algorithms
  - Preliminary results and demo

Collaborators and Student:

- Dr. Radhe Mohan and group, M.D. Anderson Cancer Center

- Optimization Collaborative Working Group: 3 oncologists, 3 physicists, 4 optimizers

- Graduate student: Michael Merritt, CAAM Department, Rice University

Part I: IMRT

Who gets Cancer?

“Approximately one out of every two American men and one out of every three American women will have some type of cancer at some point during their lifetime.” - American Cancer Society

How is Cancer Treated?

“The four major types of treatment for cancer are surgery, radiation, chemotherapy, and biologic therapies.” - American Cancer Society

What is IMRT?

“IMRT (Intensity Modulated Radiation Therapy) is a state-of-the-art cancer treatment method that delivers high doses of radiation directly to cancer cells in a very targeted way, much more precisely than is possible with conventional radiotherapy ... while sparing more of the surrounding healthy tissue.” - A vendor

Major IMRT Equipment

• a medical linear accelerator with

• a computer-controlled multi-leaf collimator

A Rotating Medical Linear Accelerator

Computer-Controlled Multi-Leaf Collimator

Leaves move up and down

Another Multi-Leaf Collimator (MLC)

Leaves move left and right

Crossfire: Focus on Tumors and Spare the Eyes

The leaves form a sequence of patterns at each angle

Major Optimization Variables in IMRT

4 angles & 4 intensity profiles

• Beam angles:
  - How many? Usually 5 to 9.
  - Which angles? Needs optimization.

• Intensity profiles (maps):

$I_a(x, y)$, $a = 1, 2, \dots$

A 2D function for each angle.

• Leaf sequences for the MLC to deliver the intensity maps.

We’ll concentrate on the 2nd issue.

Intensity Profile Optimization

General Statement:

Given the number and values of beam angles, determine the optimal intensity profiles

$I_a(x, y)$, $a = 1, 2, \dots$,

for all the angles such that

• tumor cells receive prescribed amounts of radiation;

• healthy cells receive as little radiation as possible.

Discretization

• A 2D grid on the MLC for each beam angle $a$:

$I_a(x, y) \Longrightarrow I_a(x_i, y_j)$, $a = 1, 2, \dots$

Each little rectangle is a “beamlet”, with an unknown intensity value.

• A 3D grid on the region of treatment (tumors, critical organs, ...).

Each little cube is a “voxel”, with a known desired dose value (tumor) or a known dose upper bound (healthy tissue).

Re-arrange voxels and beamlets into 1D vectors:

• voxels: $i = 1, 2, \dots, m$ ($m \approx 10^5$ to $10^6$)

• beamlets: $j = 1, 2, \dots, n$ ($n \approx 2000$ to $20000$)

Planning variables and Data

Variables (unknowns):

- Intensities $I_a(x_i, y_j)$, rearranged and renamed as beamlet intensities $x_j$, $j = 1, 2, \dots, n$.

Data:

• CT-scan images, geometric contours for structures, ......

• Desired dose values and bounds, prescribed by oncologists

• Influence Matrix, A, for dose calculation.

(Computed by radiation physicists; a research issue by itself)

Influence Matrix A

At a fixed energy level for the linear accelerator, the amount of outlet radiation is determined by the exposure time.

$A_{ij}$ = radiation received at voxel $i$ due to a unit emission from beamlet $j$.

With scattering, A is fairly dense.

Dose Calculation

The total dose accumulated at voxel $i$ corresponding to beamlet intensity values $x = (x_1, x_2, \dots, x_n)$ is

$$d_i = \sum_{j=1}^{n} A_{ij} x_j, \quad i = 1, 2, \dots, m,$$

or simply put $d = Ax$.

[dose] = [influence matrix] * [beamlet intensity]

- The formula is obviously a first-order approximation.

- Influence matrix $A$ is $m \times n$ (say, $500{,}000 \times 20{,}000$).

- A row for each voxel and a column for each beamlet.
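
As a minimal sketch of the dose calculation $d = Ax$, the following plain-Python function multiplies a small influence matrix by a beamlet-intensity vector. The function name `dose` and all numbers are illustrative assumptions, not data from the talk; a real $A$ is far too large for nested lists.

```python
def dose(A, x):
    """Return d = A x: the dose at each voxel for beamlet intensities x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


# Hypothetical toy data: 3 voxels (rows) and 2 beamlets (columns).
A = [
    [0.5, 0.1],
    [0.2, 0.4],
    [0.0, 0.3],
]
x = [10.0, 20.0]  # beamlet intensities

d = dose(A, x)
print(d)
```

Each row of `A` produces one voxel dose, so the output has one entry per voxel regardless of the number of beamlets.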

Beamlet Intensity Optimization

Given data:

1. a “prescription vector” b from physicians;

2. an influence matrix A from physicists.

Ideal formulation: find a beamlet-intensity vector $x$ such that

$$d = Ax, \quad x \ge 0,$$

$$d_i = b_i, \ i \in \{\text{tumor voxels}\}, \qquad d_i \le b_i, \ i \in \{\text{healthy voxels}\}.$$

This over-determined system generally has no solution. Something’s gotta give.

No Pain, No Gain: Dose-Volume Constraints

• Unfortunately, killing cancers requires sacrifices.

• Some organs can sustain a certain degree of damage while still functioning, and can eventually recover.

• Dose-volume constraints allow carefully controlled overdoses.

Dose-volume constraint (DVC):

A given percentage of the volume of a structure can exceed its prescribed dose upper bound. E.g.,

• 30% of the right lung may receive a dose greater than 19 Gy; or, equivalently, 70% of the right lung should receive a dose of no more than 19 Gy.

• Which 30% should be sacrificed? Need optimization.
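
Checking a dose-volume constraint for a given dose vector is simple. The hard part is choosing which voxels to sacrifice. A minimal sketch of the check, assuming doses are given per voxel in Gy (the function name `satisfies_dvc` and the dose values are hypothetical):

```python
def satisfies_dvc(doses, bound, max_percent_over):
    """True if at most max_percent_over percent of voxels exceed bound."""
    n_over = sum(1 for d in doses if d > bound)
    # Integer arithmetic avoids floating-point trouble at the threshold.
    return 100 * n_over <= max_percent_over * len(doses)


# Hypothetical right-lung doses in Gy for 10 voxels; 3 of them exceed 19 Gy.
rt_lung = [5.0, 8.0, 12.0, 15.0, 18.0, 19.0, 19.0, 22.0, 30.0, 41.0]

print(satisfies_dvc(rt_lung, bound=19.0, max_percent_over=30))
print(satisfies_dvc(rt_lung, bound=10.0, max_percent_over=40))
```

The check only counts violating voxels; it does not care which ones they are, which is why the constraint set is a union of many alternatives.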

Physicians’ Prescriptions are DVC-Based

Simple Case: one tumor in the right lung

Prescription:
-------------------------------------
> 95% of Tumor receives > 63 Gy
< 1% of Tumor receives > 72 Gy
> 95% of Ext_Tumor receives > 60 Gy
< 1% of Ext_Tumor receives > 70 Gy
-------------------------------------
< 1% of Cord receives > 43 Gy
< 15% of Heart receives > 30 Gy
< 20% of Esophagus receives > 10 Gy
< 2% of Lt_Lung receives > 20 Gy
< 8% of Lt_Lung receives > 10 Gy
< 30% of Rt_Lung receives > 19 Gy
< 40% of Rt_Lung receives > 10 Gy
< 50% of Norm_Tissue receives > 54 Gy
-------------------------------------

(FDA-Approved) Current Practice

Assume there are 1 tumor and 5 healthy structures, labelled Structures 0 to 5. The $k$-th structure consists of the voxels in the set $S_k$.

Weighted least-squares formulation used by IMRT vendors:

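$$\min_{x \ge 0} \ \sum_{k=0}^{5} w_k f_k(x),$$

where the $f_k$ are quadratic penalty functions,

$$f_0(x) = \sum_{i \in S_0} \left[ (Ax)_i - b_0 \right]^2,$$

$$f_k(x) = \sum_{i \in S_k} \left[ \max(0, (Ax)_i - b_k) \right]^2, \quad k = 1, 2, \dots, 5.$$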

(i) Ad hoc terms may be added to “encourage” DVC satisfaction.

(ii) Method of choice: gradient descent (+ projection?)
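
To make the weighted least-squares objective concrete, here is a small sketch that evaluates $\sum_k w_k f_k(x)$ for given weights. The function name `weighted_penalty`, the data layout (a list of voxel-index lists, with structure 0 as the tumor), and the toy numbers are assumptions for illustration only, not a vendor implementation.

```python
def weighted_penalty(A, x, structures, bounds, weights):
    """Weighted least-squares objective sum_k w_k f_k(x).

    structures[k] lists the voxel indices in S_k; structure 0 is the tumor.
    """
    d = [sum(a * xj for a, xj in zip(row, x)) for row in A]  # d = A x
    total = 0.0
    for k, (voxels, b_k, w_k) in enumerate(zip(structures, bounds, weights)):
        if k == 0:
            # Tumor: penalize deviation from b_0 in both directions.
            f_k = sum((d[i] - b_k) ** 2 for i in voxels)
        else:
            # Healthy structure: penalize only doses above b_k.
            f_k = sum(max(0.0, d[i] - b_k) ** 2 for i in voxels)
        total += w_k * f_k
    return total


# Hypothetical toy case: 1 tumor voxel, 2 healthy voxels, 2 beamlets.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
structures = [[0], [1, 2]]
bounds = [3.0, 2.0]
weights = [1.0, 0.5]

print(weighted_penalty(A, [2.0, 1.0], structures, bounds, weights))
```

Changing `weights` changes which plan minimizes the objective, but nothing in the formula ties a weight to a particular dose-volume percentage.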

Current Practice: Pros & Cons

• The relationship between a prescription (DVCs) and the weights is neither transparent nor predictable.

• Manual trial-and-error requires experienced personnel and time. Even so, it often struggles to produce acceptable plans.

• Planning requires multiple meetings between physicians and physicists (on average 2 weeks from arrival to treatment, including CT scans, image processing, and planning).

• For fixed weights, the least-squares problems can be approximately solved quickly by algorithms that are easy to implement.

Summarize Cons in 3 words? Weights! Weights! Weights!

MIP & LP: favored by Optimizers

MIP: Mixed Integer Programming LP: Linear Programming

Example DVC: 75 out of 100 voxels receive 50Gy or less.

$$(Ax)_i \le 50 + 500\, y_i, \quad i = 1, \dots, 100,$$

$$\sum_{i=1}^{100} y_i \le 25, \qquad y_i \in \{0, 1\} \ \ \forall i.$$

- Rigorous formulations, but hard to solve. (E. Lee et al, R. Rardin et al, ......)

- LP approximations: faster but still costly in practice. (Ahuja et al, Holder, Merritt/Z et al.....)

- Need sophisticated optimization software like CPLEX.

- High-accuracy constraint satisfaction. (Is it necessary?)

- Interior-point methods for LP are not warm-start friendly.
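
The role of the binary variables in the example DVC can be seen by fixing the doses and reading off the indicators. In this sketch, $y_i = 1$ marks a voxel allowed to exceed 50 Gy; the relaxed cap $50 + 500$ is inactive as long as 500 Gy exceeds any achievable overdose (an assumption). The function name `big_m_check` and the dose values are hypothetical. In the real MIP the solver chooses $y$ together with $x$.

```python
def big_m_check(doses, cap=50.0, big_m=500.0, max_over=25):
    """Build indicators y for the big-M DVC and test both constraints."""
    # y_i = 1 marks voxel i as "sacrificed" (allowed to exceed the cap).
    y = [1 if d > cap else 0 for d in doses]
    within_relaxed_cap = all(d <= cap + big_m * y_i for d, y_i in zip(doses, y))
    within_budget = sum(y) <= max_over
    return y, within_relaxed_cap and within_budget


# Hypothetical doses in Gy for 100 voxels: 80 at 40 Gy and 20 at 60 Gy.
doses = [40.0] * 80 + [60.0] * 20
y, feasible = big_m_check(doses)
print(sum(y), feasible)
```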

Part II

Our Most Recent Work in Progress

Our Goals:

• Simple, “weightless”, formulations & very fast algorithms

• Optimization process driven directly by prescriptions (DVCs).

Model in Higher Space

Prescription: tumor dose $= b_t$; DVCs are for healthy structures only.

Influence matrix partition: $A = \begin{bmatrix} A_t \\ A_h \end{bmatrix}$ (tumor and healthy).

(Tumor: deliverable) $A_t x = b_t$ (desirable, given)

(Healthy: deliverable) $A_h x + s = u$ (feasible, unknown)

(Nonnegativity) $x \ge 0$, $s \ge 0$ ($s$ = slacks for healthy voxels)

(Healthy: DVCs) $u \in D_v$ ($D_v$ = {doses satisfying DVCs})

• s, u in 3D “dose space” while x in 2D “beamlet space”. More work?

• Better to have more degrees of freedom than weights.

• Still over-determined.

DVC set $D_v$: Non-convex but “nice”

$D_v$ is a union of “boxes”, and non-convex. E.g., 50% of $u \le 1$ in $\mathbb{R}^2_+$:

$$D_v = \{u \in \mathbb{R}^2_+ : u_1 \le 1\} \cup \{u \in \mathbb{R}^2_+ : u_2 \le 1\}.$$

“Projection” onto $D_v$ is easy. E.g., 70% of $u \le 5$:

$$\mathrm{Proj}_{D_v}([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) = [1, 2, 3, 4, 5, 5, 5, 8, 9, 10].$$
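
The projection in this example can be sketched in a few lines: keep every entry that already complies, and if too few comply, clip the smallest violators down to the bound, since those are the cheapest to move. The function name `project_onto_dvc` and its integer-percent interface are assumptions for illustration; the input is assumed nonnegative and subject to a single DVC.

```python
def project_onto_dvc(u, bound, percent_within):
    """Project u onto {v : at least percent_within% of the v_i are <= bound}.

    Entries already within the bound are kept. If too few comply, the
    smallest violators are clipped down to the bound, which moves u the
    least in Euclidean distance.
    """
    n = len(u)
    required = (percent_within * n + 99) // 100  # ceil(percent * n / 100)
    violators = sorted((i for i in range(n) if u[i] > bound), key=lambda i: u[i])
    shortfall = required - (n - len(violators))
    v = list(u)
    for i in violators[:max(0, shortfall)]:
        v[i] = bound
    return v


# The example from the slide: 70% of u must be <= 5.
print(project_onto_dvc([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], bound=5, percent_within=70))
# -> [1, 2, 3, 4, 5, 5, 5, 8, 9, 10]
```

The cost is one sort of the violating entries, which is why projecting onto this non-convex set stays cheap even for large dose vectors.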

A Geometric Formulation

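• Physician’s dose prescription: a non-convex set

Prescription set: $H = \left\{ \begin{bmatrix} b_t \\ u \end{bmatrix} : u \in D_v \right\} \subset \mathbb{R}^m_+$

• Physicists’ dose calculation: a closed convex cone

Physical set: $K = \left\{ \begin{bmatrix} A_t x \\ A_h x + s \end{bmatrix} : (x, s) \ge 0 \right\} \subset \mathbb{R}^m_+$

• Find $d_H \in H$ and $d_K \in K$ such that

$$\mathrm{dist}(d_H, d_K) = \mathrm{dist}(H, K).$$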

Geometry in Dose Space

Find a physical dose closest to prescription

Optimization in Dose Space

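Define the objective function:

$$f(u) = \min_{x,\, s \ge 0} \ \tfrac{1}{2} \left( \|A_t x - b_t\|^2 + \|A_h x + s - u\|^2 \right) \equiv \tfrac{1}{2}\, \mathrm{dist}^2\!\left( \begin{bmatrix} b_t \\ u \end{bmatrix}, K \right).$$

$$\min f(u) \quad \text{s.t.} \quad u \in D_v.$$

Theoretically hard: the feasible set $D_v$ is non-convex.

May not be practically hard if approximate solutions are allowed.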

• f(u) is the optimal value of an NNLS problem.

• f(u) is continuous and “practically” differentiable.

• $\nabla f(u) = -\max(0, A_h x(u) - u) \le 0$.

• f(u) monotonically decreases as u increases.
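
The gradient formula can be recovered in two short steps, sketched here under the assumption that the inner minimization has a solution $(x(u), s(u))$ (only the resulting dose needs to be unique). First eliminate the slack, then use the standard fact that half the squared distance to a closed convex set has gradient "point minus projection".

```latex
% Step 1: for fixed x, the best slack is s = max(0, u - A_h x), so
A_h x + s - u = \max(0,\, A_h x - u).
% Step 2: f is half the squared distance to the closed convex cone K;
% the u-block of "point minus projection" gives
\nabla f(u) = u - \bigl(A_h x(u) + s(u)\bigr) = -\max\bigl(0,\, A_h x(u) - u\bigr) \le 0.
% Consequence: a unit gradient step lands on the healthy block of Proj_K,
u - \nabla f(u) = \max\bigl(u,\, A_h x(u)\bigr) = A_h x(u) + s(u).
```

Since every component of the gradient is nonpositive, raising any entry of $u$ can only lower $f$, which matches the monotonicity stated above.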

Two Simple Algorithms

Recall that $\mathrm{Proj}_{D_v}$ is easy (so is $\mathrm{Proj}_H$).

- Gradient Projection Algorithm: given $u^0 \in D_v$,

$$u^{k+1} = \mathrm{Proj}_{D_v}\!\left[ u^k - \alpha \nabla f(u^k) \right], \quad k = 0, 1, 2, \dots$$

- Successive Projection Algorithm: given $d^0 \in H$,

$$d^{k+1} = \mathrm{Proj}_H\!\left( \mathrm{Proj}_K(d^k) \right), \quad k = 0, 1, 2, \dots$$

A very old algorithm for convex sets, extended to our non-convex case. (Conjecture: it guarantees convergence to a local minimum.)

Theorem: The two algorithms are equivalent if $\alpha \equiv 1$.
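
A toy version of the successive projection iteration might look as follows. This is a sketch under several assumptions: all function names and data are hypothetical, there is a single DVC covering all healthy voxels, and $\mathrm{Proj}_K$ is only approximated by a plain projected-gradient NNLS loop, used as a simple stand-in rather than the fast solver the talk calls for.

```python
def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def project_onto_K(B, d, iters=2000):
    """Approximate Proj_K(d) = B z*, where z* solves min ||B z - d||^2, z >= 0.

    Stand-in solver: projected gradient with step 1/L, L = ||B||_F^2.
    """
    Bt = [list(col) for col in zip(*B)]
    L = sum(a * a for row in B for a in row)
    z = [0.0] * len(Bt)
    for _ in range(iters):
        r = [p - q for p, q in zip(matvec(B, z), d)]  # residual B z - d
        g = matvec(Bt, r)  # gradient B^T (B z - d)
        z = [max(0.0, zj - gj / L) for zj, gj in zip(z, g)]
    return matvec(B, z)


def project_onto_H(d, b_t, bound, percent_within):
    """Proj_H: reset the tumor block to b_t, project the healthy block onto D_v."""
    u = d[len(b_t):]
    n = len(u)
    required = (percent_within * n + 99) // 100  # ceil(percent * n / 100)
    violators = sorted((i for i in range(n) if u[i] > bound), key=lambda i: u[i])
    for i in violators[:max(0, required - (n - len(violators)))]:
        u[i] = bound  # clip the smallest violators
    return list(b_t) + u


def successive_projection(A_t, A_h, b_t, bound, percent_within, steps=10):
    m_h = len(A_h)
    # B = [A_t 0; A_h I] maps (x, s) to the dose vector [A_t x; A_h x + s].
    B = [row + [0.0] * m_h for row in A_t]
    B += [row + [1.0 if i == j else 0.0 for j in range(m_h)]
          for i, row in enumerate(A_h)]
    d = list(b_t) + [0.0] * m_h  # d0 in H: zero healthy dose meets every bound
    for _ in range(steps):
        d = project_onto_H(project_onto_K(B, d), b_t, bound, percent_within)
    return d


# Hypothetical toy data: 1 tumor voxel, 4 healthy voxels, 2 beamlets.
A_t = [[1.0, 1.0]]
A_h = [[0.6, 0.0], [0.0, 0.6], [0.3, 0.3], [0.1, 0.1]]
d = successive_projection(A_t, A_h, b_t=[60.0], bound=20.0, percent_within=50)
print(d)
```

Because the last operation in every sweep is $\mathrm{Proj}_H$, the returned vector always satisfies the prescription; how close it is to a deliverable dose depends on the distance to $K$ that remains.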

Successive Projection Algorithm

Also called Alternating Projections (von Neumann, 1930s)

Which healthy voxels are sacrificed?

• Starting points must satisfy all dose upper bounds.

• The algorithms then automatically select voxels to sacrifice.

• Selections are based on voxels’ sensitivity w.r.t. f(u).

• The algorithms are GREEDY.

How close to optimum the computed solutions are depends on how correctly the sacrificed voxels are chosen.

Non-Negative Least Squares (NNLS)

Projection onto K requires solving NNLS of the form:

$$\min \ q(x) := \tfrac{1}{2} \|Bx - b\|^2 \quad \text{s.t.} \quad x \ge 0.$$

• In our cases, $B$ can have sizes up to $O(10^6) \times O(10^6)$.

• Classic active-set algorithms (e.g., Lawson & Hanson) are too slow.

• Newton-type interior-point algorithms are too costly.

• We need very fast algorithms for solving NNLS.

IPSG for NNLS

Interior-Point Scaled Gradient (IPSG) Algorithm:

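$$x^0 > 0, \qquad x^{k+1} = x^k - \alpha_k D_k^{-1} \nabla q(x^k) > 0,$$

where $D_k > 0$ is diagonal, and $\alpha_k$ either minimizes $q(x^k - \alpha D_k^{-1} \nabla q(x^k))$ or is a step just short of the boundary, whichever is smaller. IPSG decreases $q(x)$ monotonically.

• We have studied $D_k = \mathrm{Diag}\left( (B^T B x^k + r_k) \,./\, x^k \right) > 0$, where $./$ denotes elementwise division.

• Theorem: If $B^T B x^k + r_k > 0$, $B^T b + r_k > 0$ and $\{r_k\}$ is bounded, then (i) $\alpha_k \ge 1$; (ii) if $\{x^k\}$ converges, it converges to the optimum of NNLS.

• In our case, $b > 0$ and $B \ge 0$, so we can set $r_k \equiv 0$.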

• It has always converged so far in our experiments, and fairly quickly.
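
The iteration above can be sketched in plain Python for small dense problems. This is an illustrative reading of the slide, not the authors' Matlab code: it assumes $B \ge 0$ with no zero column and $b > 0$ (so $r_k \equiv 0$), and it interprets "a step before boundary" as a fixed fraction `tau` of the largest step that keeps $x > 0$. The name `ipsg_nnls`, the value of `tau`, the stopping tolerance and the starting point are all assumptions.

```python
def _matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * c for a, c in zip(row, v)) for row in M]


def ipsg_nnls(B, b, iters=500, tau=0.99, tol=1e-16):
    """Interior-point scaled gradient sketch for min 0.5*||Bx - b||^2, x >= 0."""
    Bt = [list(col) for col in zip(*B)]
    x = [1.0] * len(Bt)  # strictly positive starting point x0 > 0
    for _ in range(iters):
        Bx = _matvec(B, x)
        h = _matvec(Bt, Bx)  # B^T B x
        g = _matvec(Bt, [p - q for p, q in zip(Bx, b)])  # gradient B^T (B x - b)
        # Scaled direction p = -D^{-1} g, with (D^{-1})_jj = x_j / h_j.
        p = [-xj * gj / hj for xj, gj, hj in zip(x, g, h)]
        decrease = -sum(gj * pj for gj, pj in zip(g, p))  # equals -g^T p >= 0
        if decrease <= tol:
            break
        Bp = _matvec(B, p)
        alpha = decrease / sum(v * v for v in Bp)  # exact minimizer along p
        # Stay strictly inside x > 0: go at most tau of the way to the boundary.
        to_boundary = [-xj / pj for xj, pj in zip(x, p) if pj < 0]
        if to_boundary:
            alpha = min(alpha, tau * min(to_boundary))
        x = [xj + alpha * pj for xj, pj in zip(x, p)]
    return x


# Toy problems with known solutions: an interior one and one with x_2 = 0.
print(ipsg_nnls([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 2.0, 3.0]))
print(ipsg_nnls([[1.0, 1.0], [1.0, 0.0]], [1.0, 3.0]))
```

With $r_k \equiv 0$ and $\alpha_k = 1$ the update simplifies to $x_j \leftarrow x_j \,(B^T b)_j / (B^T B x)_j$, a purely multiplicative rule, which makes it easy to see why the iterates stay positive when $B \ge 0$ and $b > 0$.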

IPSG vs. Matlab “lsqnonneg”

>> driver
Generating problem: size [m n] = [150 120]

ipsg time: 4.2000e-01 Residual = 4.140135e+00
lsqnonneg time: 9.4400e+00 Residual = 4.139811e+00

ipsg time: 4.9600e+00 Residual = 4.500604e+01
lsqnonneg time: 1.0120e+04 Residual = 4.500214e+01
(5 sec. vs. 2.8 hrs. Relative accuracy: 8.67e-05)

ipsg time: 8.3500e+00 Residual = 8.674440e+00
lsqnonneg time: 7.0144e+03 Residual = 8.674433e+00
(8+ sec. vs. 1.95 hrs. Relative accuracy: 7.55e-07)

Matrices are dense & random. Comparisons with other methods are needed.

Concluding Remarks

• Helping save lives is an optimizer’s dream application.

• Can we really make a difference? Absolutely, given time.

• More and more optimization problems are appearing in medicine.

Preliminary Simulations Demo:

• 2D phantom; 1 tumor, 1 or 2 OAR (organ at risk).

• Geometries constructed to represent difficult test cases

• Matlab implementation using successive projection and IPSG.

• Key: Does the algorithm choose the right voxels to sacrifice?
