Generation of planar radiographs from 3D anatomical models using the GPU
André dos Santos Cardoso
Supervisor: Jorge M. G. Barbosa
University of Porto
Faculty of Engineering of University of Porto
11th February, 2011
André Cardoso [email protected] DRR Synthesis Algorithms 1/271/27
Contents
Introduction and Context
CUDA Platform
Input Data
Pre-Processing Steps
Developed Algorithms
Conclusion
DRRs
• Digitally Reconstructed Radiographs – DRRs
• Artificial radiographs taken from vertebrae models
Figure: L3 vertebra, frontal DRR
Figure: L3 vertebra, lateral DRR
DRRs – Why?
• Shape recovery of the human spine
  ◦ Hundreds of DRRs per second
• Scoliosis evaluation
  ◦ Alternative to MRIs and CTs
Project’s Objective
Build Fast DRR Algorithms
• Common bottleneck!
  ◦ Applications in the medical area demand high throughput
• Take advantage of new GPUs and APIs
  ◦ Common workstations could do the job!
Existing Solution – GLSL
• GLSL implementation – multi-pass working solution
• Depth peeling based – Cass Everitt, Interactive Order-Independent Transparency
• Let’s try to enhance its performance!
Algorithm Concepts
Figure: X-ray source, object with ray intersection points P1 to P4, and image plane. Problem: potential artifact generation!
• Each ray traverses the object
  ◦ Energy is attenuated
  ◦ $\mathrm{PixelColor} = \exp\big((\lVert P_2 - P_1 \rVert + \lVert P_4 - P_3 \rVert) \times \mathrm{AttenuationFactor}\big)$
• Common edges may lead to artifact generation!
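The attenuation formula can be tried out on a single ray. The sketch below is only an illustration, not the presented implementation: `pixel_color` and its `hit_distances` argument are hypothetical names, intersections are given as distances from the source along the ray (so that ‖P2 − P1‖ becomes a simple difference), and the attenuation factor is assumed to be negative so that the value decays with path length.

```python
import math

def pixel_color(hit_distances, attenuation_factor):
    """Apply the slide's formula to one ray.

    hit_distances: distances from the source to each ray/surface
    intersection (hypothetical input). After sorting, consecutive
    pairs are treated as entry/exit points: (P1, P2), (P3, P4), ...
    """
    hits = sorted(hit_distances)
    if len(hits) % 2 != 0:
        # An odd count may mean the ray grazed a shared edge, which is
        # the kind of situation that can produce artifacts.
        raise ValueError("odd number of intersections")
    # Total path length travelled inside the object.
    inside = sum(exit_ - entry for entry, exit_ in zip(hits[0::2], hits[1::2]))
    return math.exp(inside * attenuation_factor)
```

For example, hits at distances 1, 3, 4 and 5 give a path length of 2 + 1 = 3 inside the object, and a ray with no hits keeps the value exp(0) = 1.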
CUDA Platform
• Compute Unified Device Architecture
  ◦ Parallel computing architecture
  ◦ Exposes GPU functions and memory
  ◦ SIMT execution model
  ◦ Allows hierarchical configuration of threads
• Cheap threads, dozens/hundreds of cores
  ◦ Thousands of concurrent threads!
• GeForce GT 240
  ◦ 96 cores
  ◦ 12288 active threads
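The hierarchical thread configuration can be illustrated with plain arithmetic. The sketch below is an assumption-laden example, not taken from the slides: the 16 × 16 block size and the helper names `grid_size` and `global_pixel` are hypothetical. It mirrors the usual CUDA pattern of covering an image with a 2D grid of blocks and computing each thread's pixel as blockIdx * blockDim + threadIdx.

```python
def grid_size(width, height, block_w=16, block_h=16):
    """Blocks needed to cover a width x height image (ceiling division)."""
    return ((width + block_w - 1) // block_w,
            (height + block_h - 1) // block_h)

def global_pixel(block_idx, thread_idx, block_dim):
    """Mirror of blockIdx * blockDim + threadIdx for a 2D launch."""
    bx, by = block_idx
    tx, ty = thread_idx
    bw, bh = block_dim
    return (bx * bw + tx, by * bh + ty)
```

With these assumptions a 266 × 138 image needs a 17 × 9 grid of blocks; threads whose computed pixel falls outside the image would simply return early.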
CUDA Platform – Threading and Memory
Inputs for Our Algorithms
• Geometry file – the vertebrae models
• Camera Calibration Matrix
Figure: Pinhole Model
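\[
C = \begin{bmatrix} \alpha_u & \lambda & u_0 \\ 0 & \alpha_v & v_0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
P = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}, \quad
K = \begin{bmatrix} R & t \\ 0_3^T & 1 \end{bmatrix}
\]
\[
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = C \cdot P \cdot K \cdot \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
\]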
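As a worked example of the pinhole projection, the sketch below multiplies the three matrices and divides by the scale factor s. It is a minimal illustration in plain Python: the helper names `matmul` and `project` and all numeric values are assumptions chosen for the example (identity rotation, zero translation), not calibration data from the project.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def project(point, C, P, K):
    """Return pixel (u, v) from s*[u, v, 1]^T = C.P.K.[X, Y, Z, 1]^T."""
    X, Y, Z = point
    column = [[X], [Y], [Z], [1.0]]
    su, sv, s = (row[0] for row in matmul(C, matmul(P, matmul(K, column))))
    return su / s, sv / s

# Illustrative values only (not from the slides).
alpha_u, alpha_v, skew, u0, v0, f = 100.0, 100.0, 0.0, 50.0, 60.0, 2.0
C = [[alpha_u, skew, u0], [0.0, alpha_v, v0], [0.0, 0.0, 1.0]]
P = [[f, 0.0, 0.0, 0.0], [0.0, f, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
K = [[1.0, 0.0, 0.0, 0.0],   # R = identity, t = 0
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
```

With these values the point (1, 2, 4) maps to u = 100 · 2 · 1/4 + 50 = 100 and v = 100 · 2 · 2/4 + 60 = 160.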
Pre-Processing Steps
1. 2D Bounding Box
2. (Projection Source)
3. Ray Direction (for each pixel)
  ◦ R(t) = O + tD
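The pre-processing steps can be sketched as three small functions. This is a hedged illustration rather than the presented kernels: the function names are hypothetical, and it assumes that the projected vertex positions and each pixel's 3D position on the image plane are already available.

```python
import math

def bounding_box_2d(points_2d):
    """Axis-aligned box (u_min, v_min, u_max, v_max) around projected vertices."""
    us = [u for u, _ in points_2d]
    vs = [v for _, v in points_2d]
    return (math.floor(min(us)), math.floor(min(vs)),
            math.ceil(max(us)), math.ceil(max(vs)))

def ray_direction(origin, pixel_position):
    """Unit vector D from the projection source O towards a pixel's 3D position."""
    d = [p - o for p, o in zip(pixel_position, origin)]
    norm = math.sqrt(sum(c * c for c in d))
    return [c / norm for c in d]

def ray_point(origin, direction, t):
    """Evaluate R(t) = O + tD."""
    return [o + t * d for o, d in zip(origin, direction)]
```

Because D is normalized here, the parameter t is directly the distance from the source, which is convenient when distances along the ray are accumulated later.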
Image Order Approach
• Ray Casting!
1 Thread for Each Pixel
• Thread ⇐⇒ Ray
• Thread loops over ALL triangles
  ◦ Tests intersections between ray and triangle
  ◦ Accumulates distances to source along ray path
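What one per-pixel thread does can be sketched on the CPU. The slides do not name the intersection test, so the Möller–Trumbore algorithm used below is an assumption, as are the helper names; the sketch only shows the shape of the loop over all triangles.

```python
def sub(a, b):
    return [a[0] - b[0], a[1] - b[1], a[2] - b[2]]

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def ray_triangle(origin, direction, tri, eps=1e-9):
    """Moller-Trumbore test: return the ray parameter t of the hit, or None."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:            # ray parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv           # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv   # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None

def cast_ray(origin, direction, triangles):
    """Work of one thread: test every triangle and collect the hit distances."""
    hits = []
    for tri in triangles:
        t = ray_triangle(origin, direction, tri)
        if t is not None:
            hits.append(t)
    return sorted(hits)
```

With a unit-length direction, the sorted values returned by `cast_ray` are the distances to the source that the attenuation formula needs.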
Image Order Approach – Problems
1. Many threads looping over many triangles
2. Useless intersection tests – heavy operations!
3. Artifacts – hard to take care of!
L3 Vertebra Model
• 776 vertices, 1552 triangles
• PA perspective: 266 × 138 pixels = 36708 threads
  ◦ 36708 threads × 1552 triangles ≈ 57 million intersection tests per DRR
Image Order Approach – Results
• L3 vertebra model
• PA camera – 265 × 137 pixels
• GPU time only!
• Incomplete implementation
SLOW!
Object Order Approach
• Ray Casting!
• Threads spawned for each triangle
  ◦ Reverse the approach of the former algorithm!
1 Thread for Each Triangle
• Thread loops over each pixel covered by the triangle bounding box
  ◦ Tests intersections between ray and triangle
  ◦ Accumulates distances to source along ray path
• Concurrency problems!
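The reversed loop structure can be sketched as follows. This is an illustrative CPU version under stated assumptions: triangles are given by their already projected 2D vertices, `hit_test` is a hypothetical callback standing in for the ray/triangle test, and the sequential outer loop stands in for one GPU thread per triangle.

```python
import math

def covered_pixels(tri_2d, width, height):
    """Pixels inside the projected triangle's bounding box, clipped to the image."""
    us = [u for u, _ in tri_2d]
    vs = [v for _, v in tri_2d]
    u_min = max(0, math.floor(min(us)))
    u_max = min(width - 1, math.ceil(max(us)))
    v_min = max(0, math.floor(min(vs)))
    v_max = min(height - 1, math.ceil(max(vs)))
    return [(u, v) for v in range(v_min, v_max + 1)
            for u in range(u_min, u_max + 1)]

def object_order(triangles_2d, width, height, hit_test):
    """Each outer iteration stands in for one GPU thread (one per triangle)."""
    per_pixel = {}
    for index, tri in enumerate(triangles_2d):
        for pixel in covered_pixels(tri, width, height):
            t = hit_test(index, pixel)   # hypothetical ray/triangle test
            if t is not None:
                # On the GPU several threads may reach the same pixel at once;
                # this shared write is where the concurrency problem arises.
                per_pixel.setdefault(pixel, []).append(t)
    return per_pixel
```

Each triangle now tests only the pixels near it, but two triangles covering the same pixel write to the same per-pixel storage, which is why the writes need to be synchronized on the GPU.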
Object Order Approach – Problems
1. Concurrency problems on pixel data.
  ◦ Fang Liu et al., FreePipe: a programmable parallel rendering architecture for efficient multi-fragment effects
2. Still many intersection tests
3. Artifacts still hard to avoid or correct
Figure: Concurrent threads writing to the pixel buffer, each obtaining its slot with int index = atomicInc(sharedCounter);
Object Order Approach – Results
• L3 vertebra model
• PA camera – 265 × 137 pixels
• GPU time only!
• Incomplete implementation
SLOW!
Multi-depth Approach – Principle
Assume a Simplification
• Discard the Euclidean distance between intersections!
• Consider only the distance between fragments, along the depth axis!
Figure: Source; points P1, P2 and P’1, P’2; distances d1 and d2
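The effect of the simplification can be seen by comparing the two measures on one pair of points. The sketch below assumes that the depth axis is z and uses hypothetical function names; it is only meant to show what is being traded away.

```python
import math

def euclidean_thickness(p1, p2):
    """Exact path length between two intersections, ||P2 - P1||."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(p1, p2)))

def depth_thickness(p1, p2):
    """Simplified thickness: difference along the depth (z) axis only."""
    return abs(p2[2] - p1[2])
```

For P1 = (0, 0, 1) and P2 = (3, 0, 5) the Euclidean length is 5 while the depth difference is 4; the two values coincide when the ray runs parallel to the depth axis.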
Multi-depth Approach – Pipeline
• Rasterization done using Scanline + Bresenham algorithm
  ◦ Filling convention avoids artifacts :) !
• Interpolation in integer interval
  ◦ $\mathrm{Depth} = \frac{Z - Z_{min}}{Z_{max} - Z_{min}} \times \mathrm{INT\_MAX}$
• Saving depth in the pixel array raises concurrency problems (again)!
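The integer depth mapping can be written as a one-line function. The sketch below is an illustration with a hypothetical name, `quantize_depth`, and assumes a 32-bit signed INT_MAX; storing depths as integers fits the integer atomic operations used in the following slides.

```python
INT_MAX = 2**31 - 1  # INT_MAX for a 32-bit signed int (assumed width)

def quantize_depth(z, z_min, z_max):
    """Map z in [z_min, z_max] to an integer in [0, INT_MAX]."""
    if z_max <= z_min:
        raise ValueError("z_max must be greater than z_min")
    ratio = (z - z_min) / (z_max - z_min)
    return int(ratio * INT_MAX)
```

The mapping preserves ordering, so comparing quantized depths gives the same result as comparing the original Z values, up to the quantization step.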
Multi-depth Approach – Depth Array Ordering
atomicMin inserts in the right place
  initializeDepthArrays(MAX_INTEGER)
  Znew ← interpolateDepth()
  for i = 0 to DEPTH_ARRAY_SIZE − 1 do
    Zold ← atomicMin(&(getPixelDepthArray(u, v, i)), Znew)
    if Zold == MAX_INTEGER then
      break
    end if
    Znew ← max(Znew, Zold)
  end for
• Fang Liu et al., FreePipe: a programmable parallel rendering architecture for efficient multi-fragment effects
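The insertion loop can be traced on the CPU. The sketch below replaces CUDA's atomicMin with a sequential stand-in, `atomic_min`, which stores the minimum and returns the previous value; it therefore shows the ordering logic for one pixel only, not the behaviour under real concurrency. All names are hypothetical.

```python
MAX_INTEGER = 2**31 - 1  # marks an empty slot

def atomic_min(array, index, value):
    """Sequential stand-in for atomicMin: store the min, return the old value."""
    old = array[index]
    array[index] = min(old, value)
    return old

def insert_depth(depth_array, z_new):
    """Insert z_new so that depth_array stays sorted in ascending order."""
    for i in range(len(depth_array)):
        z_old = atomic_min(depth_array, i, z_new)
        if z_old == MAX_INTEGER:      # slot was empty: insertion finished
            break
        z_new = max(z_new, z_old)     # carry the larger value to the next slot

depths = [MAX_INTEGER] * 4
for z in (5, 3, 9, 1):
    insert_depth(depths, z)
```

After the four insertions `depths` holds 1, 3, 5, 9. Each insertion may touch every slot, which is the cost the later optimization tries to avoid; if the array is already full, the largest value is dropped.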
Multi-depth Approach – Results
• Best time:
  ◦ 202 × 132 pixels
  ◦ GPU + CPU time!
  ◦ Performance with and without DRR transfer to host!
BETTER!
Multi-depth Optimization
• Multi-depth allows for an ordered set of depths
  ◦ More depths =⇒ more atomicMin() calls
We can postpone depth ordering...
  index ← atomicInc(&counter, INT_MAX)
  depthArray[index] ← Znew // RAW-hazard free!
• depthArray has all the depth values;
  ◦ Ordering can be done in a post-processing kernel!
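The postponed ordering can be sketched with CPU threads. In the example below a lock-based `AtomicCounter` stands in for a counter updated with CUDA's atomicInc (return the old value, wrap to 0 once the limit is reached), and a thread pool stands in for concurrent GPU threads. The class and function names, and the use of a single counter for one depth array, are assumptions made for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class AtomicCounter:
    """Lock-based stand-in for a counter updated with atomicInc."""
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def atomic_inc(self, limit):
        # Like atomicInc: return the old value; wrap to 0 once old >= limit.
        with self._lock:
            old = self._value
            self._value = 0 if old >= limit else old + 1
            return old

def collect_depths(fragment_depths, limit=2**31 - 1):
    """Append every depth to an unsorted array using unique slot indices."""
    counter = AtomicCounter()
    depth_array = [None] * len(fragment_depths)

    def write(z_new):
        index = counter.atomic_inc(limit)   # each thread gets a unique slot
        depth_array[index] = z_new          # so no two threads share a cell

    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(write, fragment_depths))
    return depth_array

def sort_depths(depth_array):
    """Post-processing step: order the depths once all writes are done."""
    return sorted(depth_array)
```

Each fragment now costs a single atomic operation instead of one per occupied slot, and the sort runs once per pixel afterwards.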
Figure: Concurrent threads writing to the pixel buffer, each obtaining its slot with int index = atomicInc(sharedCounter);
Multi-depth Optimization – Results
• A-buffer scheme versus GLSL solution
• 202 × 132 pixels
Better than Current Solution
Conclusion
• CUDA implementations for DRR extraction
  ◦ Both pre-processing and main computation tasks
  ◦ Artifact-free
• Single geometry pass
• Shared memory model
  ◦ May be adapted to other technologies
• Final implementation shows better performance than GLSL
Future Work
There’s a Big Chart to Fill Up...
• Still some artifacts
• Memory operation optimizations
• Comparisons with other implementations and other geometry models
• Build a DRR generation library
  ◦ possibly an open-source project
• Participation in IJUP’11
• Paper preparation for VIPIMAGE 2011. Abstract deadline: 15th March.
Thank You for Listening!
Ask Away!