Fast GPU Monte Carlo Simulation for Radiotherapy, DNA Ionization and Beyond
2017 GPU Technology Conference
Shogo Okada, Koichi Murakami, Nick Henderson
Outline
• Geant4
• GPU experimentation
• MPEXS
• Algorithm research
• Application development
• Geant4 multi-threading
Big Picture
$(\vec{x}, \vec{p}, k)$, $k \in \{\gamma, e^-, e^+, \ldots\}$
Goal: record effect of particle interaction in material
Geant4
• Toolkit for simulation of particles traveling through and interacting with matter
• Supports a wide variety of physics models, geometries, and materials
• Extendable: users can add new models
• Used in numerous and diverse application areas
  • high energy physics
  • medical physics
  • spacecraft
  • semiconductor devices
  • biology research
Examples pictured: ATLAS, LISA, gMocren
Parallelism
• Simulations require many events for statistical significance
• Events are IID
• Each computation thread processes an event
Challenges:
• Random nature of simulation leads to thread divergence
• Storage of secondary particles
• Recording of energy deposition
If you want to consider the full capability of Geant4:
• Very complicated geometry, with non-uniform data structures
• Many material types
• Large data tables to support physics processes
MPEXS
• MPEXS is an adaptation of the core simulation algorithm from Geant4 for GPU
• Target application: X-ray radiotherapy
• Geometry: uniformly discretized box
• Material: water with variable density
• Physics: low energy electromagnetics
  • Gamma: Compton scattering, photoelectric effect, pair production
  • Electron/positron: ionization, multiple scattering, Bremsstrahlung, positron annihilation
• Each GPU thread tracks an active particle
• Secondary particles are stored on thread-local secondary stacks
• Threads deposit energy to a shared global domain (via atomicAdd)
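The per-thread tracking loop described above can be sketched serially in Python. This is a toy model under stated assumptions, not the MPEXS code: the 1D grid, `CUTOFF`, `STACK_CAPACITY` and the "lose a random fraction of the energy" interaction are all hypothetical. It only illustrates the pattern of one worker owning a fixed-capacity secondary stack while adding energy to a shared dose array (the step that needs `atomicAdd` on a GPU).

```python
import random

N_VOXELS = 10         # hypothetical 1D dose grid
STACK_CAPACITY = 100  # fixed stack size per thread (illustrative value)
CUTOFF = 0.1          # hypothetical tracking cutoff, arbitrary energy units


def track_primary(energy, dose, rng):
    """Track one primary and all of its secondaries.

    Returns (number of particles tracked, energy that left the grid).
    """
    stack = [(0, energy)]  # thread-local stack of (voxel index, energy)
    tracked, escaped = 0, 0.0
    while stack:
        voxel, e = stack.pop()
        tracked += 1
        while e > CUTOFF and voxel < N_VOXELS:
            lost = e * rng.uniform(0.1, 0.5)  # toy interaction, not real physics
            e -= lost
            if lost > 2 * CUTOFF and len(stack) < STACK_CAPACITY:
                # Half of the loss leaves as a secondary, half is deposited here.
                stack.append((voxel, lost / 2))
                dose[voxel] += lost / 2  # would be an atomicAdd on the GPU
            else:
                dose[voxel] += lost
            voxel += 1
        if voxel < N_VOXELS:
            dose[voxel] += e  # below cutoff: deposit the remainder locally
        else:
            escaped += e      # particle left the grid
    return tracked, escaped


rng = random.Random(1)
dose = [0.0] * N_VOXELS
total_escaped = 0.0
for _ in range(100):  # 100 primaries of 6.0 energy units each
    _, esc = track_primary(6.0, dose, rng)
    total_escaped += esc
print(sum(dose) + total_escaped)  # energy bookkeeping: should total 600
```

The fixed `STACK_CAPACITY` is the design constraint that the later slides come back to: every thread reserves the same amount of stack memory whether or not it uses it.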
MPEXS - Performance & Validation
Verification for Dose Distribution
Dose distribution of slab phantoms
- phantom size : 30.5 x 30.5 x 30 cm
- voxel size : 5 x 5 x 2 mm
- field size : 10 cm2
- SSD : 100 cm
- slab materials : (1) water (2) lung (3) bone

Density:
- water 1.0 g/cm3
- lung 0.26 g/cm3
- bone 1.85 g/cm3
- air 0.0012 g/cm3

Beam particle and its initial kinetic energy:
- electron with 20 MeV
- photon with 6 MV linac
- photon with 18 MV linac

[Figure: phantom geometry in the y-z plane, with the source, an air gap and the slab phantom]
Comparison of depth dose for γ 6 MV
Legend: G4 v9.6.3, G4CU (MPEXS)
Panels: (1) water, (2) lung, (3) bone
• x-axis: z-direction (cm)
• y-axis: dose (Gy)
• residual = (G4CU − G4) / G4
[Figure: for each panel, the depth dose distribution (dose in Gy against depth, 0 to 30 cm) with the residual (−0.2 to 0.2) plotted below]
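The residual defined on this slide is a relative difference per depth bin. A minimal Python sketch follows; the dose values are made up for illustration and are not read off the plots.

```python
def residuals(g4, g4cu):
    """Relative residual (G4CU - G4) / G4 for each depth bin."""
    return [(gpu - cpu) / cpu for cpu, gpu in zip(g4, g4cu)]


# Hypothetical doses in Gy for two depth bins (illustrative numbers only).
g4_dose = [0.20e-3, 0.10e-3]
g4cu_dose = [0.21e-3, 0.099e-3]
print(residuals(g4_dose, g4cu_dose))  # roughly +0.05 and -0.01
```

A residual of +0.05 means the GPU result is 5% above the Geant4 reference in that bin; the plots use a ±0.2 window.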
Comparison of depth dose for γ 18 MV
Legend: G4 v9.6.3, G4CU (MPEXS)
Panels: (1) water, (2) lung, (3) bone
• x-axis: z-direction (cm)
• y-axis: dose (Gy)
• residual = (G4CU − G4) / G4
[Figure: for each panel, the depth dose distribution (dose in Gy against depth, 0 to 30 cm) with the residual (−0.2 to 0.2) plotted below]
Comparison of depth dose for e- 20 MeV
Legend: G4 v9.6.3, G4CU (MPEXS)
Panels: (1) water, (2) lung, (3) bone
• x-axis: z-direction (cm)
• y-axis: dose (Gy)
• residual = (G4CU − G4) / G4
[Figure: for each panel, the depth dose distribution (dose in Gy against depth, 0 to 30 cm), the residual (−0.2 to 0.2), and the same depth dose distribution on a log scale]
Computation Time Performance
γ beams (6 MV and 18 MV)

| | 6 MV (1) water | 6 MV (2) lung | 6 MV (3) bone | 18 MV (1) water | 18 MV (2) lung | 18 MV (3) bone |
|---|---|---|---|---|---|---|
| G4 [msec/particle] | 0.780 | 0.822 | 0.819 | 0.803 | 0.857 | 0.924 |
| G4CU (MPEXS) [msec/particle] | 0.00336 | 0.00331 | 0.00341 | 0.00433 | 0.00425 | 0.00443 |
| Speedup factor (= G4 / MPEXS) | 232 | 248 | 240 | 185 | 201 | 208 |

e- beam (20 MeV)

| | (1) water | (2) lung | (3) bone |
|---|---|---|---|
| G4 [msec/particle] | 1.84 | 1.87 | 1.65 |
| G4CU (MPEXS) [msec/particle] | 0.00881 | 0.00958 | 0.00885 |
| Speedup factor (= G4 / MPEXS) | 208 | 195 | 193 |

GPU:
- Tesla K20c (Kepler architecture)
- 2496 cores, 706 MHz
- 4096 x 128 threads
CPU:
- Xeon E5-2643 v2 3.50 GHz
Number of primaries:
- 50M particles for e- 20 MeV
- 500M particles for γ 6 MV, 18 MV

185 to 250 times speedup against single-core G4 simulation!
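The speedup row is the ratio of the two timing rows. The short Python check below recomputes it from the rounded table entries; because the published timings are rounded, the recomputed ratios match the quoted factors only to within a few percent.

```python
# (G4 msec/particle, G4CU msec/particle, quoted speedup), copied from the slide
TIMINGS = {
    ("gamma 6 MV", "water"): (0.780, 0.00336, 232),
    ("gamma 6 MV", "lung"): (0.822, 0.00331, 248),
    ("gamma 6 MV", "bone"): (0.819, 0.00341, 240),
    ("gamma 18 MV", "water"): (0.803, 0.00433, 185),
    ("gamma 18 MV", "lung"): (0.857, 0.00425, 201),
    ("gamma 18 MV", "bone"): (0.924, 0.00443, 208),
    ("e- 20 MeV", "water"): (1.84, 0.00881, 208),
    ("e- 20 MeV", "lung"): (1.87, 0.00958, 195),
    ("e- 20 MeV", "bone"): (1.65, 0.00885, 193),
}


def recomputed_speedups(timings):
    """Speedup = CPU time per particle / GPU time per particle."""
    return {key: g4 / g4cu for key, (g4, g4cu, _quoted) in timings.items()}


speedups = recomputed_speedups(TIMINGS)
for (beam, slab), value in speedups.items():
    quoted = TIMINGS[(beam, slab)][2]
    print(f"{beam:>11} {slab:<5} recomputed {value:6.1f}  quoted {quoted}")
```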
Algorithm Research
• MPEXS does not attempt to sort particles
• Thread divergence: if threads in the same warp are tracking different particle kinds, then thread divergence occurs in physics process code
• Size of particle stack is the same for each thread and is fixed at run-time. Some applications call for the generation of many secondary particles. This restriction meant that we could only run with a small number of active threads.
[Figure: eight particles in memory at index 0 to 7, in the order γ e- e+ γ γ e- e- γ. The computation is split into a γ process, an e- process and an e+ process. Grouped by kind, the same particles in memory are γ γ γ γ, e- e- e-, e+.]
MPEXS Experiments
• Initialize each thread with the same random number generator state. This leads to a non-physical simulation, but eliminates thread divergence. We saw a factor 3x speedup in these runs.
• Measure the time it takes to sort the particle indices by selected process and run-length encode the result, and compare it with the time for a single trip through the event loop. Calculations indicate we should expect a factor 2x speedup if implemented in the full simulation.
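The sort-then-run-length-encode step in the second experiment can be illustrated with a few lines of Python. This is a serial sketch with hypothetical process ids; a GPU implementation would use parallel sort and reduce-by-key primitives instead.

```python
from itertools import groupby


def sort_and_rle(selected):
    """Sort particle indices by selected process id, then run-length encode.

    Returns (order, runs): `order` lists particle indices grouped by process,
    `runs` lists (process id, count) so each process can be applied to one
    contiguous block of indices.
    """
    order = sorted(range(len(selected)), key=lambda i: selected[i])  # stable sort
    runs = [(pid, len(list(group)))
            for pid, group in groupby(selected[i] for i in order)]
    return order, runs


# Hypothetical process ids chosen by eight particles (0, 1, 2 are placeholders).
selected = [2, 0, 1, 0, 0, 2, 1, 0]
order, runs = sort_and_rle(selected)
print(order)  # [1, 3, 4, 7, 2, 6, 0, 5]
print(runs)   # [(0, 4), (1, 2), (2, 2)]
```

With the runs known, neighbouring threads can be assigned particles that all selected the same process, which is what removes the divergence in process code.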
New Architecture
• Goal 1: minimize/eliminate thread divergence
• Goal 2: eliminate need for fixed-size and thread-local secondary stacks
• Goal 3: maintain extensibility
How it works
[Figure sequence, built up over several slides:
1. input buffer: a buffer of γ particles
2. pop: a block of γ is popped from the input buffer
3. process selection: each popped γ selects Compton scattering or the photoelectric effect
4. sort by selected process
5. secondary generation: the processes produce secondary particles (γ and e-)
6. secondary storage: the secondaries are appended to the output buffers, one for γ and one for e-]
Features
• Store particles on a generalized stack that allows pushing and popping a block of particles in one operation.
• Group particles by kind (gamma, e-, e+). When we pop a block of particles, we know they are all the same kind, thus we can apply the same (non-divergent) operations.
• Maintain separate input and output buffers. Physics processes know the input and output particles. For example, in Compton scattering the input is a photon and the output is a scattered photon and a recoil electron. Thus, we can read from the active input photon buffers and write to output electron and photon buffers that are pushed onto appropriate stacks.
• The sort and run-length encode operations are applied after process selection so that after-step processes are applied only to particles that call for it.
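The block-oriented stack and the separate input and output buffers can be sketched as follows. This is a hypothetical Python model, not the authors' implementation: `BlockStack`, `compton_step` and the even energy split are invented for illustration, and particles are reduced to a single energy value.

```python
class BlockStack:
    """Stack that pushes and pops whole blocks of particles at once."""

    def __init__(self):
        self.items = []

    def push_block(self, particles):
        self.items.extend(particles)

    def pop_block(self, n):
        n = min(n, len(self.items))
        start = len(self.items) - n
        block = self.items[start:]
        del self.items[start:]
        return block


def compton_step(stacks, block_size):
    """Pop a block of photons, write scattered photons and recoil electrons."""
    photons = stacks["gamma"].pop_block(block_size)  # input buffer: one kind only
    out_gamma = [0.5 * e for e in photons]           # toy split, not real kinematics
    out_electron = [0.5 * e for e in photons]
    stacks["gamma"].push_block(out_gamma)            # output buffers go back
    stacks["e-"].push_block(out_electron)            # onto the stack of their kind
    return len(photons)


stacks = {"gamma": BlockStack(), "e-": BlockStack()}
stacks["gamma"].push_block([1.0, 2.0, 3.0, 4.0])
processed = compton_step(stacks, block_size=3)
print(processed, stacks["gamma"].items, stacks["e-"].items)
```

Because every particle in a popped block is the same kind, the loop body is identical for the whole block, and no per-thread stack is needed.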
Properties
• No thread divergence due to process selection. Thread divergence may still occur in the application of a physics process, because many of them rely on sample-reject algorithms to sample from various distributions.
• Reads of particle data in the after-step physics process are non-coalesced. However, all writes of particle data are coalesced. We have to pay for the randomness somewhere.
• Thread-local stacks are not required.
Experiments
• The new architecture is substantially different from MPEXS. We have not yet ported the physics processes over. We've done performance experiments with fake/model physics processes (which mimic computation and memory access patterns of the real ones).
• We can vary the number of physics processes and the amount of data moved. The numbers shown are the speed up of the new-architecture against the old for a variety of configurations.
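Speedup via new architecture
• Speedup due to sorting by process id for fake/model processes
• Vary number of processes and amount of data required by each process
• Results collected from K40

Speedup, by data transfer (rows, number of floats) and number of processes (columns):

| Data transfer (float #) | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.5 | 0.6 | 0.8 | 1.0 | 1.3 | 1.8 | 2.7 | 4.1 |
| 2 | 0.5 | 0.7 | 0.8 | 1.0 | 1.5 | 2.1 | 2.9 | 4.2 |
| 4 | 0.6 | 0.7 | 0.9 | 1.2 | 1.8 | 2.6 | 3.4 | 4.6 |
| 8 | 0.6 | 0.8 | 1.1 | 1.6 | 2.4 | 3.3 | 4.2 | 5.2 |
| 16 | 0.6 | 1.0 | 1.5 | 2.1 | 3.0 | 4.1 | 4.9 | 5.9 |
| 32 | 0.7 | 1.2 | 1.8 | 2.6 | 3.6 | 4.6 | 5.4 | 6.3 |
| 64 | 0.8 | 1.4 | 2.0 | 2.8 | 3.9 | 4.9 | 5.7 | 6.6 |
| 128 | 0.9 | 1.7 | 2.4 | 3.1 | 4.0 | 5.1 | 5.9 | 6.7 |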
Summary
• MPEXS is a GPU-based Monte Carlo simulator for X-ray radiotherapy
• MPEXS attains around 200x speedup when compared to Geant4 running on single CPU core
• Algorithm experimentation indicates a further 2x speed up with a sort operation after process selection
• New architecture also opens opportunities for other applications
• better performance with more physics processes
• no thread-local secondary stacks
MPEXS-DNA
The Geant4-DNA Project
"Geant4-DNA", an extension of Geant4 to DNA physics
• Estimates biological effects of radiation (e.g. DNA strand breaks) down to a very low energy scale (down to meV)
• The main objective of the project:
  • Evaluate effects on human health of chronic radiation exposure
  • e.g. medical diagnostics, astronauts on space missions, airline crews, …
• Its computing performance should be improved using GPU power.
  • Energy spread in cells is an important factor for DNA damage.
  • Geant4-DNA calculates complex track geometry within cells.
  • Needs to handle a large number of secondary particles.
    • e.g. more than 20k secondaries are generated per primary
  • Days to weeks of simulation on a CPU cluster
MPEXS-DNA, microdosimetry simulation on GPU
• Based on Geant4-DNA 10.02 p03
• EM physics for lower energy range (down to meV)
  • Calculates energy loss and generates primary molecules such as excited and ionized H2O.
• Radiolysis of water
  • Diffusion and production of chemical species
• Estimates DNA damage (future work).

1. Physical phase: primary radiation interacting with matter (DNA) and producing radicals
  • Calculates dose distributions
  • Generates primary chemical species such as H2O*, H2O-/+, e-aq
2. Chemical phase: Brownian motion of radicals (further cell-level damage) and interactions between radicals
  • Diffusion and reactions of chemical species
3. Biological phase (future work)

[Figure: EM shower in DNA, showing a chromatin fiber (constituent of chromosomes), ∅ 10 nm. A single 100 keV He+ produces direct DNA damage: 5 single strand breaks and 2 double strand breaks in a total of 1.2×10^8 basis elementary volumes. © CENBG, in collaboration with G. Cosmo, CERN. Courtesy of Sebastien Incerti (IN2P3-CNRS / CENBG)]

http://www.windows2universe.org/earth/Life/cell_radiation_damage.html
MPEXS-DNA Physics Processes

Physics processes for X-rays:
• Compton scattering: 100 eV - 1 GeV, Livermore
• Photoelectric effect: 100 eV - 1 GeV, Livermore
• Gamma conversion: 100 eV - 1 GeV, Livermore
• Rayleigh scattering: 100 eV - 1 GeV, Livermore

Physics processes by particle (electrons, protons, hydrogen atoms, helium atoms (He++, He+, He0)):
• Elastic scattering
  • Electrons: 9 eV - 10 keV (Uehara); 10 keV - 1 MeV (Champion)
  • Protons, hydrogen atoms, helium atoms: 100 eV - 1 MeV (Hoang); 100 eV - 10 MeV (Hoang)
• Excitation
  • Electrons: 10 eV - 10 keV (Emfietzoglou); 10 keV - 1 MeV (Born)
  • Protons: 10 eV - 500 keV (Miller Green); 500 keV - 100 MeV (Born)
  • Hydrogen atoms: 10 eV - 500 keV (Miller Green)
  • Helium atoms: 1 keV - 400 MeV (Miller Green)
• Charge change (e.g. H atom -> p)
  • Electrons: none
  • Protons: 100 eV - 10 MeV (Dingfelder)
  • Hydrogen atoms: 100 eV - 10 MeV (Dingfelder)
  • Helium atoms: 1 keV - 400 MeV (Dingfelder)
• Ionization
  • Electrons: 10 eV - 10 keV (Emfietzoglou); 10 keV - 1 MeV (Born)
  • Protons: 100 eV - 500 keV (Rudd); 500 keV - 100 MeV (Born)
  • Hydrogen atoms: 100 eV - 100 MeV (Rudd)
  • Helium atoms: 1 keV - 400 MeV (Rudd)
• Vibrational excitation
  • Electrons only: 2 - 100 eV (Michaud et al.)
• Dissociative attachment (AB + e- -> AB- -> A + B-)
  • Electrons only: 4 - 13 eV (Melton)

Atomic de-excitation occurs during the ionization process and emits Auger electrons and X-rays.
The difference in energy loss processes (EM physics vs DNA physics)

Standard EM physics
• Continuous process
  • Energy loss is below a given threshold.
  • Calculates the average energy loss at each step with the Bethe-Bloch formula.
  • No secondaries are generated.
• Discrete process
  • Generates a secondary if the energy loss is above the threshold.

DNA physics
• Handled as a discrete process without energy thresholds, to calculate the complex energy spread within cells for DNA damage
• A large number of secondaries are generated (~20k / primary).

[Figure: a "continuous process" track with steps Δx1 to Δx4 and energy losses ΔE1 to ΔE4, and a "discrete process" track with individual ionization and excitation losses ΔE1 to ΔE6. The Bethe-Bloch formula appears on the slide as an image.]
An issue of low thread occupancy in physics simulation
• DNA physics simulation had an issue of low thread occupancy.
• The number of active threads was limited by the large memory consumption of the stacks that store the generated secondaries.

GPU: NVIDIA Tesla K40c, global memory: 11,439 MB (GDDR5)

The difference in number of secondaries and number of active threads (DNA vs EM):

| | Incident particle | Initial energy | Typical # of secondaries generated | Stack size per CUDA thread | Total active CUDA threads (Nblk x Nthr/blk) | Total memory usage for stacks |
|---|---|---|---|---|---|---|
| DNA physics | He++ | 1 MeV | > 20,000 | 25,000 (1,074 kB) | 10,240 (80 x 128) | 10,740 MB |
| EM physics | e- | 20 MeV | < 40 | 100 (4.3 kB) | 1,048,576 (4,096 x 256) | 4,405 MB |
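The memory figures in this table follow from multiplying the per-thread stack size by the number of active threads. The Python check below reproduces them, assuming 1 MB = 1024 kB (an assumption; that convention matches the quoted 10,740 MB exactly). The EM figure comes out slightly lower than quoted because 4.3 kB is a rounded value.

```python
def stack_memory_mb(kb_per_thread, n_blocks, threads_per_block):
    """Total stack memory in MB, assuming 1 MB = 1024 kB."""
    return kb_per_thread * n_blocks * threads_per_block / 1024


def bytes_per_particle(kb_per_thread, capacity):
    """Implied size of one stack entry in bytes."""
    return kb_per_thread * 1024 / capacity


dna_mb = stack_memory_mb(1074, 80, 128)   # DNA physics row
em_mb = stack_memory_mb(4.3, 4096, 256)   # EM physics row
print(dna_mb, em_mb)
print(bytes_per_particle(1074, 25000), bytes_per_particle(4.3, 100))
print(dna_mb / 11439)  # fraction of the K40c global memory used by DNA stacks
```

Both rows imply roughly 44 bytes per stored particle, so the difference in thread count comes entirely from the stack capacity (25,000 entries against 100).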
CUDA Thread Assignment for MPEXS-DNA Physics Simulation

• A group of 32 CUDA threads is assigned per event, and the threads in a group share a secondary stack.
  • cf. In the MPEXS case (standard EM physics), each thread has its own stack.
• Host memory is also available as a stack (using virtual memory addressing)
• Reduces memory consumption for the stacks and increases the number of active threads (~10k threads -> more than 1M threads) -> keeps thread occupancy high during the simulation

[Figure: Standard EM physics: CUDA threads 0, 1, 2, … each track one particle (e-, e-, γ, γ, e-, e+, γ, …) and each has its own secondary stack (capacity: 100). DNA physics: warp #0 (threads 0 to 31) handles event #0 and warp #1 (threads 32 to 63) handles event #1, with threads tracking p, H and e-; each group of 32 threads shares one secondary stack (total capacity: 25k), held partly on device memory and partly on host memory.]
MPEXS-DNA Physics Performance
Depth dose curves (CPU vs GPU)
Legend: Geant4-DNA (CPU), MPEXS-DNA (GPU)
[Figure: depth dose distribution (dose in Gy) and energy deposit (eV) against z-direction (um) for p 1 MeV, He++ 1 MeV and e- 100 keV]
Good agreement with Geant4-DNA
Physico-Chemical Phase for MPEXS-DNA
• Physical interactions (ionization / excitation / attachment) produce ionised and excited H2O molecules (H2O+/H2O-, H2O*)
• These molecules then dissociate or release their energy into the water
• Electrons (Ekin < 8.22 eV) become hydrated electrons (e-aq)
• These processes occur within 1 ps after irradiation

| Electronic state | Process | Dissociation channel | Fraction (%) |
|---|---|---|---|
| Ionization state | Dissociative decay | H3O+ + •OH | 100 |
| Excitation state: A1B1 | Dissociative decay | •OH + H• | 65 |
| | Relaxation | H2O + ΔE | 35 |
| Excitation state: B1A1 | Auto-ionization | H3O+ + •OH + e-aq | 55 |
| | Dissociative decay | •OH + •OH + H2 | 15 |
| | Relaxation | H2O + ΔE | 30 |
| Excitation state: Rydberg, diffusion bands | Auto-ionization | H3O+ + •OH + e-aq | 50 |
| | Relaxation | H2O + ΔE | 50 |
| Dissociative attachment: H2O- | Dissociative decay | •OH + OH- + H2 | 100 |

Ref.) Radiat Environ Biophys (2009) 48: 11-20
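The branching fractions in this table can be used directly as sampling weights. The Python sketch below shows one way to pick a dissociation channel for a given electronic state; the dictionary keys and the function name are hypothetical, and only the channels and percentages come from the slide.

```python
import random

# Electronic state -> list of (products, fraction in %), from the table.
CHANNELS = {
    "ionization": [("H3O+ + OH", 100)],
    "A1B1": [("OH + H", 65), ("H2O + dE", 35)],
    "B1A1": [("H3O+ + OH + e-aq", 55), ("OH + OH + H2", 15), ("H2O + dE", 30)],
    "rydberg_diffusion": [("H3O+ + OH + e-aq", 50), ("H2O + dE", 50)],
    "dissociative_attachment": [("OH + OH- + H2", 100)],
}


def sample_channel(state, rng):
    """Draw one dissociation channel according to the tabulated fractions."""
    products, weights = zip(*CHANNELS[state])
    return rng.choices(products, weights=weights, k=1)[0]


rng = random.Random(42)
draws = [sample_channel("A1B1", rng) for _ in range(10000)]
print(draws.count("OH + H") / len(draws))  # close to 0.65
```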
Chemical Phase for MPEXS-DNA

(1) Calculates the intermolecular distance (d) for all pairs.
  • Computation time increases as O(N²/2).
  • kd-tree algorithm (Geant4-DNA)
  • Spreading the pairs over CUDA threads (MPEXS-DNA)
  Then, makes reactions for pairs with d < R.
(2) Finds the minimum distance among the remaining molecules and calculates the time step (Δt).
(3) Diffuses molecules using Δt.
  • A CUDA thread transports a molecule.
(4) Loops over (1) to (3).

[Flowchart: d < R? Yes: make reaction. No: diffusion.]

Reaction radius (R), by the Smoluchowski model:

$$R = \frac{k}{4\pi N_A D}$$

| Species | Diffusion coefficient [m²/s] |
|---|---|
| H3O+ | 9.0E-09 |
| H• | 7.0E-09 |
| OH- | 5.0E-09 |
| e-aq | 4.9E-09 |
| H2 | 4.8E-09 |
| •OH | 2.8E-09 |
| H2O2 | 2.3E-09 |

| Reaction | Reaction rate [M⁻¹ s⁻¹] |
|---|---|
| 2e-aq + 2H2O -> H2 + 2OH- | 5.00E+09 |
| e-aq + •OH -> OH- | 2.95E+10 |
| e-aq + H• + H2O -> OH- + H2 | 2.65E+10 |
| e-aq + H3O+ -> H• + H2O | 2.11E+10 |
| e-aq + H2O2 -> OH- + •OH | 1.44E+10 |
| •OH + •OH -> H2O2 | 4.40E+09 |
| •OH + H• -> H2O | 1.44E+10 |
| H• + H• -> H2 | 1.20E+10 |
| H3O+ + OH- -> 2H2O | 1.43E+10 |

Ref.) Radiat Environ Biophys (2009) 48: 11-20
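Steps (1) and (3) and the reaction radius can be sketched in Python as below. Several points are assumptions rather than statements from the slides: D is taken as the sum of the two reactants' diffusion coefficients, the rate constant is converted from M⁻¹ s⁻¹ to m³ mol⁻¹ s⁻¹, the diffusion step uses an isotropic Gaussian displacement with standard deviation sqrt(2 D Δt) per axis, and Δt is passed in because the slide does not give its formula.

```python
import math
import random

N_A = 6.02214076e23  # Avogadro constant [1/mol]


def reaction_radius(k_per_molar_s, d_sum_m2_s):
    """Smoluchowski reaction radius R = k / (4 pi N_A D), in metres."""
    k_si = k_per_molar_s * 1e-3  # M^-1 s^-1 -> m^3 mol^-1 s^-1
    return k_si / (4 * math.pi * N_A * d_sum_m2_s)


def reacting_pairs(positions, radius):
    """Step (1): brute-force search over all N(N-1)/2 pairs with d < R."""
    pairs = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) < radius:
                pairs.append((i, j))
    return pairs


def diffuse(position, d_coeff, dt, rng):
    """Step (3): Brownian displacement of one molecule over time dt."""
    sigma = math.sqrt(2 * d_coeff * dt)
    return tuple(x + rng.gauss(0.0, sigma) for x in position)


# e-aq + OH: k = 2.95e10 M^-1 s^-1, D = 4.9e-9 + 2.8e-9 m^2/s (table values)
r = reaction_radius(2.95e10, 4.9e-9 + 2.8e-9)
print(r)  # about 5.1e-10 m, i.e. roughly 0.5 nm

positions = [(0.0, 0.0, 0.0), (3e-10, 0.0, 0.0), (1e-8, 0.0, 0.0)]
print(reacting_pairs(positions, r))  # [(0, 1)]
print(diffuse(positions[2], 2.8e-9, 1e-12, random.Random(0)))
```

In the GPU version described on the slide, the double loop in `reacting_pairs` is what gets spread across CUDA threads, and `diffuse` is what each thread does for one molecule.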
Time(ps)1 10 210 310 410 510 610
G-v
alue
(# o
f mol
ecul
es /
100
eV)
0
1
2
3
4
5
6 Comparison of G-value profile (CPU vs GPU) ✓ Line: Geant4-DNA ✓ Filled circle: MPEXS-DNA
p 20 MeV
OH・ OH- H3O+ eaq- H2 ・H H2O2
Agrees with Geant4-DNA within ~ 3 %G-value = # of Molecules
Energy loss
Time(ps)1 10 210 310 410 510 610
G-v
alue
(# o
f mol
ecul
es /
100
eV)
0
1
2
3
4
5
6
7
Time(ps)1 10 210 310 410 510 610
G-v
alue
(# o
f mol
ecul
es /
100
eV)
0
1
2
3
4
5
6
7
Time(ps)1 10 210 310 410 510 610
G-v
alue
(# o
f mol
ecul
es /
100
eV)
0
1
2
3
4
5
6
7
Time(ps)1 10 210 310 410 510 610
G-v
alue
(# o
f mol
ecul
es /
100
eV)
0
1
2
3
4
5
6
7
G-value (e- 750 keV)
先週 new
eaq (! MPEXS-DNA)
H2 (! MPEXS-DNA)
eaq (Partrac)
H2 (Partrac)
eaq (! MPEXS-DNA)
H2 (! MPEXS-DNA)
eaq (Partrac)
H2 (Partrac)
・OH (! MPEXS-DNA)
H2O2 (! MPEXS-DNA)
・OH (! MPEXS-DNA)
H2O2 (! MPEXS-DNA)
・OH (Partrac)
H2O2 (Partrac)
・OH (Partrac)
H2O2 (Partrac)
e- 750 keV
Verifying with other simulation dataRef.) J. Radiat. Res., 46, 333–341 (2005)
MPEXS-DNA Physics and Chemical Performance
Diffusion and chemical reactions after irradiating a water phantom with a 10 keV electron
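The G-value defined on this slide (number of molecules per 100 eV of energy loss) is a simple ratio. The Python sketch below shows the arithmetic; the function name and the sample numbers are hypothetical and are not results from the talk.

```python
def g_value(n_molecules, energy_loss_ev):
    """Number of molecules of a species per 100 eV of energy loss."""
    if energy_loss_ev <= 0:
        raise ValueError("energy loss must be positive")
    return n_molecules / (energy_loss_ev / 100.0)


# Illustrative only: 480 molecules counted after 10 keV of energy loss.
example = g_value(480, 10_000.0)
```

Here `example` is 4.8 molecules per 100 eV, which falls within the 0 to 7 range shown on the plot axes.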
![Page 44: Fast GPU Monte Carlo Simulation for Radiotherapy, DNA ...€¦ · Geant4 • Toolkit for simulation of particles traveling through and interacting with matter • Supports wide variety](https://reader036.vdocument.in/reader036/viewer/2022071515/6137d8ab0ad5d2067648e36c/html5/thumbnails/44.jpg)
• Fast math option (nvcc --use_fast_math)
  • ~1.2x speedup
• L1 cache (nvcc -Xptxas -dlcm=ca)
  • ~1.8x speedup
• CUDA streams
  • For kernels without dependencies in the Physics Phase
    • Calculating the cross-section value for each physical interaction
  • To fully use GPU resources in the Chemical Phase
Code optimization for Tesla K40c GPU
![Page 45: Fast GPU Monte Carlo Simulation for Radiotherapy, DNA ...€¦ · Geant4 • Toolkit for simulation of particles traveling through and interacting with matter • Supports wide variety](https://reader036.vdocument.in/reader036/viewer/2022071515/6137d8ab0ad5d2067648e36c/html5/thumbnails/45.jpg)
| Beam | Geant4-DNA (CPU) [events/min] | MPEXS-DNA (GPU) [events/min] | Speedup |
|---|---|---|---|
| e- 750 keV | 13.48 | 3279.47 | 243x |
| p 20 MeV | 2.57 | 932.82 | 363x |
• Up to 360 times speedup against a single-core Xeon CPU
• Processing time for p 20 MeV (~15k events in total)
  • ~4 days (single-core Xeon CPU) -> ~16 min. (Tesla K40c GPU)
GPU Performance for MPEXS-DNA Simulation Including Physics and Chemical Phases
• GPU: NVIDIA Tesla K40c, 2,880 cores, 745 MHz
• CPU: Intel Xeon E5-2643 v2, 3.50 GHz
Comparison of the number of events processed per minute.
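The speedup factors and the "4 days to 16 minutes" figure follow directly from the event rates on this slide. The Python sketch below reproduces that arithmetic; the dictionary layout and function names are hypothetical, and the 15,000-event total is the approximate count quoted on the slide.

```python
# Events processed per minute, as quoted on the slide.
EVENTS_PER_MIN = {
    "e- 750 keV": {"cpu": 13.48, "k40c": 3279.47},
    "p 20 MeV": {"cpu": 2.57, "k40c": 932.82},
}


def speedup(beam):
    """GPU event rate divided by single-core CPU event rate."""
    rates = EVENTS_PER_MIN[beam]
    return rates["k40c"] / rates["cpu"]


def wall_time_minutes(n_events, events_per_min):
    """Time needed to process n_events at a constant event rate."""
    return n_events / events_per_min


cpu_days = wall_time_minutes(15_000, 2.57) / (60 * 24)
gpu_minutes = wall_time_minutes(15_000, 932.82)
```

For p 20 MeV this gives a speedup of about 363, roughly 4.1 days on the single CPU core and roughly 16 minutes on the K40c, in line with the numbers quoted.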
![Page 46: Fast GPU Monte Carlo Simulation for Radiotherapy, DNA ...€¦ · Geant4 • Toolkit for simulation of particles traveling through and interacting with matter • Supports wide variety](https://reader036.vdocument.in/reader036/viewer/2022071515/6137d8ab0ad5d2067648e36c/html5/thumbnails/46.jpg)
Performance Gain for Tesla P100 against Tesla K40c
| Beam | MPEXS-DNA (K40c) [events/min] | MPEXS-DNA (P100) [events/min] | Gain |
|---|---|---|---|
| e- 750 keV | 3279.47 | 10053.09 | 3.06x |
| p 20 MeV | 932.82 | 3028.60 | 3.24x |
• Adopted the same thread configuration as for the K40c in the simulation with the P100
• More than 3 times performance gain against the K40c
Comparison of the number of events processed per minute.
Preliminary result
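Combining these P100 rates with the single-core CPU rates from the previous slide gives a rough sense of how many Xeon cores one P100 replaces. The Python sketch below does that division. It assumes ideal linear scaling across CPU cores, which is a simplification, and the variable names are hypothetical.

```python
# Events per minute, taken from the two performance slides.
CPU_SINGLE_CORE = {"e- 750 keV": 13.48, "p 20 MeV": 2.57}
K40C = {"e- 750 keV": 3279.47, "p 20 MeV": 932.82}
P100 = {"e- 750 keV": 10053.09, "p 20 MeV": 3028.60}


def ratio(numerator, denominator, beam):
    """Event-rate ratio between two devices for one beam type."""
    return numerator[beam] / denominator[beam]


# Gain of the P100 over the K40c for each beam.
gain = {beam: ratio(P100, K40C, beam) for beam in P100}

# Equivalent number of single Xeon cores, assuming perfect scaling.
equivalent_cores = {beam: ratio(P100, CPU_SINGLE_CORE, beam) for beam in P100}
```

Under that assumption, `equivalent_cores` is about 750 for e- 750 keV and about 1180 for p 20 MeV.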
![Page 47: Fast GPU Monte Carlo Simulation for Radiotherapy, DNA ...€¦ · Geant4 • Toolkit for simulation of particles traveling through and interacting with matter • Supports wide variety](https://reader036.vdocument.in/reader036/viewer/2022071515/6137d8ab0ad5d2067648e36c/html5/thumbnails/47.jpg)
Summary
• MPEXS-DNA is an extension of MPEXS to DNA physics.
• Geant4-DNA has an issue with long simulation times that needs to be improved.
• We have succeeded in drastically boosting computing performance for microdosimetry simulation using GPU power.
  • Up to 360 times speedup against a single-core Xeon CPU for the K40c
• A Tesla P100 is equivalent to ~1000 cores of Xeon CPU.
  • ~3 times performance gain against the K40c without any optimization
  • Further performance improvement could be achieved with appropriate optimization.
![Page 48: Fast GPU Monte Carlo Simulation for Radiotherapy, DNA ...€¦ · Geant4 • Toolkit for simulation of particles traveling through and interacting with matter • Supports wide variety](https://reader036.vdocument.in/reader036/viewer/2022071515/6137d8ab0ad5d2067648e36c/html5/thumbnails/48.jpg)
In the near future
• Developing “killer applications” based on MPEXS-DNA to quantitatively estimate the biological effects of radiation
  • DNA single- and double-strand breaks
  • Cellular survival rate
  • Radiosensitization of tumors in radiation therapy (e.g. gold nanoparticles; GNP)
  • …
• Extending MPEXS to “nuclear physics” and “thermal neutron physics”
  • Proton and carbon therapy
  • Boron Neutron Capture Therapy
  • Radiation shielding calculations
  • …
![Page 49: Fast GPU Monte Carlo Simulation for Radiotherapy, DNA ...€¦ · Geant4 • Toolkit for simulation of particles traveling through and interacting with matter • Supports wide variety](https://reader036.vdocument.in/reader036/viewer/2022071515/6137d8ab0ad5d2067648e36c/html5/thumbnails/49.jpg)
Acknowledgements
• Makoto Asai, SLAC
• Joseph Perl, SLAC
• Andrea Dotti, SLAC
• Takashi Sasaki, KEK
• Akinori Kimura, Ashikaga Institute of Technology
• Margot Gerritsen, ICME, Stanford
![Page 50: Fast GPU Monte Carlo Simulation for Radiotherapy, DNA ...€¦ · Geant4 • Toolkit for simulation of particles traveling through and interacting with matter • Supports wide variety](https://reader036.vdocument.in/reader036/viewer/2022071515/6137d8ab0ad5d2067648e36c/html5/thumbnails/50.jpg)
References
• N. Henderson, et al. A CUDA Monte Carlo simulator for radiation therapy dosimetry based on Geant4. <https://dx.doi.org/10.1051/snamc/201404204>
• K. Murakami, et al. Geant4 based simulation of radiation dosimetry in CUDA. <https://dx.doi.org/10.1109/NSSMIC.2013.6829452>
• S. Okada, et al. GPU acceleration of Monte Carlo simulation at the cellular and DNA levels. <https://dx.doi.org/10.1007/978-3-319-23024-5_29>
• S. Agostinelli, et al. Geant4 - a simulation toolkit. <https://dx.doi.org/10.1016/S0168-9002(03)01368-8>
• M.A. Bernal, et al. Track structure modeling in liquid water: A review of the Geant4-DNA very low energy extension of the Geant4 Monte Carlo simulation toolkit. <https://dx.doi.org/10.1016/j.ejmp.2015.10.087>