Richard Dorrance, November 4, 2011



Page 1: Richard Dorrance November 4, 2011


High Speed 3D Tomography on CPU, GPU, and FPGA

Nicolas GAC, Stéphane Mancini, Michel Desvignes, Dominique Houzet

Reconfigurable MPSoC versus GPU: Performance, Power and Energy Evaluation

Diana Göhringer, Matthias Birk, Yves Dasse-Tiyo, Nicole Ruiter, Michael Hübner, Jürgen Becker

Richard Dorrance, November 4, 2011

Literature Review

Page 2: Richard Dorrance November 4, 2011


Review

Computed Tomography

Page 3: Richard Dorrance November 4, 2011

Tomography

Basis for CAT scan, MRI, PET, SPECT, etc.

Cross-sectional imaging technique using transmission or reflection data from multiple angles

Computed Tomography (CT): A form of tomographic reconstruction on computers


Page 4: Richard Dorrance November 4, 2011

Cross-Sections by X-Ray Projections

Project X-ray through biological tissue; measure total absorption of ray by tissue

Projection Pθ(t) is the Radon transform of object function f(x,y):

$$P_\theta(t) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,\delta(x\cos\theta + y\sin\theta - t)\,dx\,dy$$

Total set of projections called sinogram
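As a minimal sketch of the projection integral on this slide, the following pure-Python code approximates Pθ(t) on a pixel grid by adding each pixel value to the detector bin nearest to t = x cos θ + y sin θ. The function names, the nearest-bin rule, and the choice of a coordinate origin at the image center are illustrative assumptions, not something the slides specify.

```python
import math

def radon_projection(image, theta, n_bins):
    """Approximate P_theta(t) for a square image (list of rows).

    Each pixel is added to the detector bin nearest to
    t = x*cos(theta) + y*sin(theta), with (x, y) measured from the image center.
    """
    n = len(image)
    center = (n - 1) / 2.0          # pixel coordinate of the image center
    bin_center = (n_bins - 1) / 2.0  # detector bin that corresponds to t = 0
    proj = [0.0] * n_bins
    for row in range(n):
        for col in range(n):
            x = col - center
            y = row - center
            t = x * math.cos(theta) + y * math.sin(theta)
            k = int(round(t + bin_center))
            if 0 <= k < n_bins:      # pixels falling outside the detector are dropped
                proj[k] += image[row][col]
    return proj

def sinogram(image, n_angles, n_bins):
    """Stack projections for n_angles equally spaced angles in [0, pi)."""
    return [radon_projection(image, math.pi * i / n_angles, n_bins)
            for i in range(n_angles)]
```

If the detector has enough bins to cover the image diagonal, every projection sums to the same total as the image, which is a quick sanity check on the sketch.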

Page 5: Richard Dorrance November 4, 2011

Phantom and Sinogram


Shepp-Logan Phantom

Page 6: Richard Dorrance November 4, 2011

CT Reconstruction

Restore image from projection data

Inverse Radon transform

Most common algorithm is filtered backprojection
– “Smear” each projection over image plane

Accuracy of reconstruction depends on the number of detectors and projection angles


[Figure: original image and reconstructions from 4, 16, 64, and 256 projection angles]

Page 7: Richard Dorrance November 4, 2011

Note on Filtering


[Figure: reconstruction with no filtering vs. with filtering]

Page 8: Richard Dorrance November 4, 2011

FBP Algorithm

Input: sinogram sino(θ, N)
Output: image img(x,y)

for each θ
    filter sino(θ,*)
    for each x
        for each y
            n = x cos θ + y sin θ
            img(x,y) = sino(θ, n) + img(x,y)

O(N^3) algorithm
– But highly parallelizable, given sufficient memory bandwidth; not computationally intensive
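The pseudocode on this slide can be sketched in runnable form as follows. The slides do not say which filter is used, so this sketch assumes the spatial-domain Ram-Lak (ramp) kernel for unit detector spacing, nearest-neighbor lookup of the detector bin, equally spaced angles in [0, π), and a coordinate origin at the image center. All function names are hypothetical.

```python
import math

def ramp_kernel(half_width):
    """Ram-Lak kernel samples h(n) for n = -half_width..half_width (unit spacing)."""
    kernel = []
    for n in range(-half_width, half_width + 1):
        if n == 0:
            kernel.append(0.25)
        elif n % 2 == 0:
            kernel.append(0.0)
        else:
            kernel.append(-1.0 / (math.pi ** 2 * n ** 2))
    return kernel

def filter_projection(proj):
    """Convolve one projection with the ramp kernel; output has the same length."""
    n_bins = len(proj)
    kernel = ramp_kernel(n_bins - 1)
    out = [0.0] * n_bins
    for i in range(n_bins):
        acc = 0.0
        for j in range(n_bins):
            acc += proj[j] * kernel[(i - j) + (n_bins - 1)]
        out[i] = acc
    return out

def fbp(sino, size):
    """Filtered backprojection of sino[angle][bin] onto a size x size image."""
    n_angles = len(sino)
    n_bins = len(sino[0])
    center = (size - 1) / 2.0
    bin_center = (n_bins - 1) / 2.0
    img = [[0.0] * size for _ in range(size)]
    for a in range(n_angles):                      # for each theta
        theta = math.pi * a / n_angles
        filtered = filter_projection(sino[a])      # filter sino(theta, *)
        c, s = math.cos(theta), math.sin(theta)
        for row in range(size):                    # for each y
            for col in range(size):                # for each x
                x = col - center
                y = row - center
                n = int(round(x * c + y * s + bin_center))
                if 0 <= n < n_bins:
                    img[row][col] += filtered[n]   # smear the projection
    scale = math.pi / n_angles                     # angular step of the sum
    return [[v * scale for v in r] for r in img]
```

The three nested loops make the O(N^3) cost visible. Every pixel update inside a given angle is independent of the others, which is the parallelism the slide refers to.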


Page 9: Richard Dorrance November 4, 2011


High Speed 3D Tomography on CPU, GPU, and FPGA

Nicolas GAC, Stéphane Mancini, Michel Desvignes, Dominique Houzet

Page 10: Richard Dorrance November 4, 2011

3PA-PET (Pipelined, Prefetch, Parallelized)


Page 11: Richard Dorrance November 4, 2011

Algorithms


Page 12: Richard Dorrance November 4, 2011

Hardware

CPU
– Desktop PC: Pentium 4 (3.2 GHz)
– Workstation: bi-Xeon Dual Core (3.0 GHz)

GPU
– Nvidia GeForce 8800 GTS (1.2 GHz, 96 Cores)

FPGA
– Virtex 4 (200 MHz)

ASIC
– Projected/Extrapolated (1.2 GHz)


Page 13: Richard Dorrance November 4, 2011

CPU vs. GPU vs. FPGA vs. ASIC


Page 14: Richard Dorrance November 4, 2011

w/ Proper Normalization

Hardware          Algorithm     # of PE   [cycles/px]   [cycles/px*PE]
Pentium 4         STIR                1     34,505.21        34,505.21
Pentium 4         VBI-flt(v1)         1    169,580.85       169,580.85
Pentium 4         VBI-flt(v2)         1     53,943.45        53,943.45
Pentium 4         VBI-flt(v3)         1      7,750.50         7,750.50
Xeon (Dual Core)  STIR                1     16,682.94        16,682.94
Xeon (Dual Core)  VBI-flt(v3)         1      3,400.53         3,400.53
Xeon (Dual Core)  VBI-flt(v3)         2      1,694.45         3,388.90
Xeon (Dual Core)  VBI-flt(v3)         4        854.49         3,417.97
GPU               VBI-flt(v4)        96        115.09        11,049.11
GPU               VBI-flt(v5)        96         58.13         5,580.36
FPGA              VBI-fix             1        484.41           484.41
FPGA              VBI-fix             4        149.97           599.89
FPGA              VBI-fix             8        101.92           815.35
ASIC              VBI-fix             1        580.12           580.12
ASIC              VBI-fix             4        248.79           995.16
ASIC              VBI-fix             8        156.95         1,255.58
ASIC              VBI-fix            40         31.39         1,255.58
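The last column of this table is the cycles-per-pixel figure multiplied by the number of processing elements (PE), which puts every platform on a per-PE footing. A minimal sketch of that arithmetic, using a subset of rows copied from the table, is shown below. The function and variable names are hypothetical, and small differences from the slide values come from rounding in the published cycles/px column.

```python
# (hardware, algorithm, number of PEs, cycles per pixel), copied from the table
ROWS = [
    ("Pentium 4", "VBI-flt(v3)", 1, 7750.50),
    ("Xeon (Dual Core)", "VBI-flt(v3)", 2, 1694.45),
    ("Xeon (Dual Core)", "VBI-flt(v3)", 4, 854.49),
    ("GPU", "VBI-flt(v5)", 96, 58.13),
    ("FPGA", "VBI-fix", 1, 484.41),
    ("FPGA", "VBI-fix", 8, 101.92),
    ("ASIC", "VBI-fix", 40, 31.39),
]

def cycles_per_px_per_pe(cycles_per_px, n_pe):
    """Normalize throughput by PE count: (cycles/px) * (number of PEs)."""
    return cycles_per_px * n_pe

def normalized_rows(rows):
    """Return (hardware, algorithm, n_pe, normalized cost) for each input row."""
    return [(hw, alg, n_pe, cycles_per_px_per_pe(c, n_pe))
            for hw, alg, n_pe, c in rows]

if __name__ == "__main__":
    # List platforms from lowest to highest per-PE cost.
    for hw, alg, n_pe, cost in sorted(normalized_rows(ROWS), key=lambda r: r[3]):
        print(f"{hw:18s} {alg:12s} PE={n_pe:3d}  {cost:10.2f}")
```

Read this way, the GPU's low raw cycles/px comes from its 96 cores, while a single FPGA PE has the lowest per-PE cost among the rows listed.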


Page 15: Richard Dorrance November 4, 2011


Reconfigurable MPSoC versus GPU: Performance, Power and Energy Evaluation

Diana Göhringer, Matthias Birk, Yves Dasse-Tiyo, Nicole Ruiter, Michael Hübner, Jürgen Becker

Page 16: Richard Dorrance November 4, 2011

RAMPSoC

Runtime adaptive multi-processor system-on-chip
– ROACH/iBOB-like system from a group out of Germany


Page 17: Richard Dorrance November 4, 2011

3D Ultrasound Computed Tomography

Mammography for early breast cancer detection

3D USCT works on the same principles as regular CT scans


Page 18: Richard Dorrance November 4, 2011

Hardware

CPU
– AMD Athlon 64 3200+ (2.2 GHz, 1 GB RAM)

GPU
– Nvidia Tesla C2050 (1.15 GHz, 448 Cores)

FPGA
– Xilinx Virtex-4FX100 (125 MHz)


Page 19: Richard Dorrance November 4, 2011

CPU vs. GPU vs. FPGA


Hardware    # of PE   [cycles/img]   [cycles/img*PE]    [W]    [1/J]
Athlon 64         1     330,000.00        330,000.00    177       37
GPU             448       3,714.50      1,664,096.00    270     1147
FPGA              8      18,000.00        144,000.00   3.61     1924
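The slides do not define the [1/J] column. It appears consistent with images per joule, computed as clock rate / (cycles per image × power) with the clock rates from the Hardware slide. The sketch below checks that reading; it is an inference rather than something stated in the slides, and the names are hypothetical.

```python
# platform: (clock in Hz from the Hardware slide, cycles/img, power in W)
PLATFORMS = {
    "Athlon 64": (2.2e9, 330_000.0, 177.0),
    "GPU": (1.15e9, 3_714.5, 270.0),
    "FPGA": (125e6, 18_000.0, 3.61),
}

def images_per_joule(clock_hz, cycles_per_img, power_w):
    """Assumed meaning of the [1/J] column: images reconstructed per joule."""
    seconds_per_img = cycles_per_img / clock_hz   # time to reconstruct one image
    joules_per_img = power_w * seconds_per_img    # energy = power * time
    return 1.0 / joules_per_img

if __name__ == "__main__":
    for name, (clock_hz, cycles, watts) in PLATFORMS.items():
        print(f"{name:10s} {images_per_joule(clock_hz, cycles, watts):8.1f} images/J")
```

Under this reading the FPGA needs about as long per image as the CPU, yet its far lower power draw makes it the most energy-efficient of the three platforms.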

Page 20: Richard Dorrance November 4, 2011

References

1. N. Gac, et al., “High Speed 3D Tomography on CPU, GPU, and FPGA,” EURASIP Journal on Embedded Systems, vol. 2008, Article ID 930250, 12 pages, 2008.

2. D. Göhringer, et al., “Reconfigurable MPSoC versus GPU: Performance, Power and Energy Evaluation,” INDIN'11, pp. 848-853, 26-29 July 2011.

3. A. C. Kak and M. Slaney, Principles of Computerized Tomographic Imaging, IEEE Press, 1988.

4. J. Hsieh, Computed Tomography: Principles, Design, Artifacts, and Recent Advances, SPIE & Wiley, 2009.
