radon gpu manuscript

Upload: mustafa-berkan-bicer

Post on 13-Apr-2018

216 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/27/2019 Radon Gpu Manuscript

    1/5

    Real-Time Radon Transform

    via the GPU Graphics Pipeline

    Christian B. Mendl

    Abstract

    Graphics processing units (GPUs) found in consumer hardware have emerged as first choice forcomputational-intensive image processing tasks, e.g., real-time computer tomography (CT) reconstruction.However, the GPU programming paradigm is rather intricate. To leverage the potential of GPUs for a broad

    audience, this paper is concerned with a self-contained, pedagogical implementation of the prototypical for-ward and inverse Radon transform. The full source code is available for download. Instead of utilizing theGPU merely as multiprocessor, the algorithm explicitly employs the graphics pipeline to take advantage ofthe dedicated GPU hardware components.

    1. Introduction

    Interactive computer tomographic (CT) imagingtechniques in medical and physical sciences haverecently become the focus of active research, forexample, to enable assistance during surgery [1],for monitoring, or in (closed-loop) feedback ex-

    periments. The associated real-time data process-ing tasks can profit considerably from the broadavailability and exponential performance growth ofconsumer graphics processing units (GPUs) [2, 3].However, the GPU programming paradigm is stillrather intricate as compared to traditional CPUimplementations. To render this topic accessibleto a broad audience, this paper provides a self-contained, pedagogical implementation of the pro-totypical forward and inverse Radon transform [4,5] on GPUs. The full source code is available fordownload [6] and includes a real-time demonstra-

    tion program.The paper is organized as follows. Section2ex-plains the theoretical background and contains adetailed description of our algorithmic implementa-tion for both the forward and inverse Radon trans-form. We evaluate the performance of our programin section3and estimate the GPU speedup as com-pared to a fast CPU. Finally, section4draws con-cluding remarks.

    Email address: [email protected]

    (Christian B. Mendl)

    2. Implementation

    This section contains the technical description ofour implementation. It is subdivided into the for-ward and inverse Radon transform, which can behandled independently. In each subsection, we firststate the well-known classical theory[4,7,5]and es-

    tablish our notation, and then explain the mappingto the graphics pipeline of modern GPUs. There,the only essential hardware requirement is supportfor floating point arithmetic.

    2.1. Forward Radon transform

    The forward Radon transform R of a continuousfunctionf : R2 Rwith bounded support may bedefined by the line integral

    (Rf)(, t) :=

    n

    x=t

    f(x) dm(x), [0, 2), t R

    (1)with n := (cos , sin ) and the 1-dimensional hy-perplane measure dm. In other words, the linecontains the point t n and is perpendicular to n.Figuratively, R describes a set of parallel projec-tions of f at a certain angle , as illustrated inFig.1.

    The implementation we present here is a directmapping of (1) to the graphics pipeline of modernGPUs, as illustrated in Fig.2. The discretized ver-sion offis an image or texture, which we assume

    Preprint submitted to Computer Methods and Programs in Biomedicine October 27, 2010

  • 7/27/2019 Radon Gpu Manuscript

    2/5

    Figure 1: Parallel projections of an image f : R2 R, foran angle .

    to be square for simplicity of presentation. Denoteits width and height by h. Furthermore, we haveto discretize the rotation angle , i.e.,

    i := i

    m, i= 0, 1, . . . , m 1 (2)

    with some fixed integer m > 0. To accommodateall parallel projections of the input texture for anyrotation angle requires line segments of length

    2 h.

    Thus, the output texture of the algorithm has di-mensions

    2 h

    m, with rowi corresponding torotation anglei. To register this texture as outputof the graphics processing stages, we set it as rendertargetof the GPU.

    Our algorithm consists of the following steps: Al-locate a vertex buffer containing

    2 h

    screen

    quads, i.e., pairs of triangles which form a rect-angle. The vertex position coordinates are chosensuch that each quad precisely covers the render tar-get texture. Additionally, each vertex contains tex-ture coordinates (tex.x, tex.y). These are used by

    the pixel shader to calculate the sampling coordi-nates (u, v) for the input texture, as follows:

    u

    v

    :=

    cos i sin i sin i cos i

    tex.x 0.5tex.y 0.5

    +

    0.50.5

    .

    The addition/subtraction of 0.5 shifts the rotationfixpoint to the center of the texture. The index i isequal to the y-coordinate of the current render tar-get pixel, and is thus available to the pixel shader.

    To obtain the projection line integrals, we assignthe same tex.y coordinate to the four vertices form-ing a single screen quad, such that the collection of

    all screen quads samples the projection lines. Moreprecisely, for all four vertices of screen quad k ,

    tex.y := k2 h

    , k = 0, 1, . . . ,2 h 1.Then, the projection line integrals are effectivelycalculated by blending all screen quads together.In terms of graphics programming, this is effectedby the ADD blendstate.

    2.2. Inverse Radon transform

    First, we briefly recall the the theoretical back-ground [4, 7, 5]. In what follows,Fn denotes theFourier transform in n dimensions. We treat asfixed parameter for now and set P(t) := (Rf)(, t).

    Then, the so-called Fourier slice theorem may bederived as follows. Let

    S() :=

    2F1P =

    nx=t

    f(x) dm(x) e t dt

    =

    R2

    f(x) e nx d2x= 2(F2f)(n).

    (3)

    That is, the 1-dimensional Fourier transform ofPyields the 2-dimensional Fourier transform off.

    Thus, applying the inverse Fourier transform re-covers f. Using polar coordinates, we obtain

    f(y) = 1

    (2)2

    20

    0

    S() e

    ny d d

    = 1

    (2)2

    0

    S() e

    ny || d d,

    where we have used thatS+() =S(). Defin-ing

    Q(t) :=

    1

    2

    S() || e

    t

    d, (4)

    i.e., the 1-dimensional inverse Fourier transform ofS() || up to a constant prefactor, we thus obtain

    f(y) = 1

    2

    0

    Q(n y) d. (5)

    This is precisely the backprojection algorithm.Note that for fixed, the contribution to the recon-structed image is constant along each line n y =const.

    2

  • 7/27/2019 Radon Gpu Manuscript

    3/5

    Figure 2: Illustration of the rotated projections renderingpass, which yields the forward Radon transform of the inputtexture using a single draw call. The highlighted rotatedbar depicts a projection line along the input texture. Thesampling of the texels within the bar (blue) is performed byscreen quads. Summing these texel values up is effectivelyimplemented by the ADD blend state. Note that all fourvertices forming a screen quad have the same tex.y coordi-nate. The rotation angle depends on the y -position of thecurrent render target pixel.

    Directly from the definitions in (3) and (4), wemay write Q concisely as

    Q = F11 ((F1P) () ||) . (6)

    In other words, Q is a frequency domain filteringoperation applied to P, with filter||.

    The graphics pipeline has to carry out this filter-ing operation for each row of the output texture inFig.2, which serves as input to the inverse Radontransform algorithm. Zero-padding the rows to apower of 2 is required to prevent spatial domainaliasing, and also to speed up the FFT. We will notgo into details here since there already exist FFTimplementations specifically designed for graphicscards, i.e., the CuFFT [8] library. However, fordemonstration purposes we have implemented theCooley-Tukey FFT algorithm [9] in the providedsource code.

    In what follows, we show how the backprojec-tion operation in (5) can be mapped to the graphicspipeline, as illustrated in Fig.3. The input texture

    has the same structure as the output of the forwardtransform since the filtering operation(6) does notchange dimensions. That is, row i corresponds torotation anglei defined in (2). Backprojecting canbe visualized as smearing the highlighted bluebar in Fig.3over the reconstruction plane, which isschematically indicated by the grayscale pixel lines.We employ a collection of m rotated squares orscreen quads such that all four vertices formingquad i have the same tex.y texture coordinate,

    tex.y := i

    m=

    i

    , i= 0, 1, . . . , m 1.

    This quad is rotated by i in accordance with (5).

    3. Results

    The following table summarizes the runningtimes for a 256 256 input image and m = 180rotation angles on a NVIDIA GeForce GTS 250graphics card with 512 MB internal video memory:

    forward (256

    256 pixels) 3.7 msinverse (1024180 pixels FFT filtering,363 180 pixels backprojection)

    9.7 ms

    The overall system configuration consists of an In-tel Core i7 CPU 920 running at 2.67 GHz (8 cores),6 GB RAM, Windows 7 64-bit version with Di-rectX 11. For comparison, we have run the sameprogram on the CPU only (DirectX 11 WRAP de-vice) and obtained 112 ms for the forward and71 ms for the inverse transform. Note that DirectXactually employs all 8 CPU cores simultaneously.Rather surprisingly, the CPU takes longer for the

    3

  • 7/27/2019 Radon Gpu Manuscript

    4/5

    Figure 3: Illustration of the backprojection implementationusing rotated screen quads. Each quad corresponds to afixed angle . Sampling the corresponding row in the inputtexture is effected via the tex.y texture coordinate, whichhas the same value for all four vertices within a quad.

    forward than for the inverse transform. From theresults, the GPU speedup factor equals 30 and 7.2,respectively. Note that we have not even employedthe latest GeForce graphics card series (GeForceGTX 480 at the time of writing), or one of theNVIDIA Tesla computing processors. These wouldpresumably lead to an even higher speedup factor.

    Fig. 4 shows a screenshot of the demonstrationprogram, which computes and visualizes the for-ward and inverse Radon transform in real-time.The output texture of the forward transform has di-mensions

    2 hm = 363 180 and is shown inthe middle (rotated by 90). As mentioned above,

    the filtering operation for the inverse transform re-quires zero-padding. Our implementation extendsthe width from 363 to 1024; thus, a forward andinverse 1-dimensional FFT of length 1024 has to becarried out 180 times.

    4. Conclusion

    As expected, the GPU implementation leads toa significant performance speedup as compared to

    modern CPUs, since graphics cards are specifi-cally designed for texture operations like the Radontransform. Our contribution here is a detailed, ped-agogical description of how to map the theoreti-cal framework to the graphics pipeline. It is worthmentioning that except for the Fourier transform, itsuffices to employ classical rendering techniquesusing vertex and pixel shaders. This provides ad-ditional benefits, like, for example, linear texturefiltering, which is readily available in hardware.

    [1] P. M. Novotny, J. A. Stoll, N. V. Vasilyev, P. J. del Nido,P. E. Dupont, T. E. Zickler, R. D. Howe, GPU basedreal-time instrument tracking with three-dimensional ul-trasound, Medical Image Analysis 11 (5) (2007) 458 464. doi:10.1016/j.media.2007.06.009.

    [2] F. Xu, K. Muller, Real-Time 3D Computed Tomographic

    Reconstruction Using Commodity Graphics Hardware,Physics in Medicine and Biology 52 (2007) 3405 3419.

    [3] N. Neophytou, F. Xu, K. Muller, Hardware accelerationvs. algorithmic acceleration: Can GPU-based process-ing beat complexity optimization for CT?, SPIE MedicalImaging 6510 (2007) 65105F.

    [4] J. Radon, Uber die Bestimmung von Funktionen durchihre Integralwerte langs gewisser Mannigfaltigkeiten,Sachsische Akademie der Wissenschaften, Leipzig 69(1917) 262 277.

    [5] A. C. Kak, M. Slaney, Principles of Computerized To-mographic Imaging, IEEE Press, 1988.

    [6] C. B. Mendl, Forward and inverse Radon transformdemonstration program (October 2010).URL http://christian.mendl.net/software/iradon.zip

    [7] J. Radon, P. Parks, On the Determination of Func-tions from Their Integral Values along Certain Mani-folds, IEEE Transactions on Medical Imaging 5 (1986)170 176.

    [8] NVIDIA,CuFFT Library(August 2010).URL http://developer.nvidia.com/object/gpucomputing.html

    [9] J. W. Cooley, J. W. Tukey, An Algorithm for the Ma-chine Calculation of Complex Fourier Series, Mathemat-ics of Computation 19 (1965) 297 301.

    4

    http://dx.doi.org/10.1016/j.media.2007.06.009http://dx.doi.org/10.1016/j.media.2007.06.009http://christian.mendl.net/software/iradon.ziphttp://christian.mendl.net/software/iradon.ziphttp://christian.mendl.net/software/iradon.ziphttp://christian.mendl.net/software/iradon.ziphttp://developer.nvidia.com/object/gpucomputing.htmlhttp://developer.nvidia.com/object/gpucomputing.htmlhttp://developer.nvidia.com/object/gpucomputing.htmlhttp://developer.nvidia.com/object/gpucomputing.htmlhttp://developer.nvidia.com/object/gpucomputing.htmlhttp://developer.nvidia.com/object/gpucomputing.htmlhttp://christian.mendl.net/software/iradon.ziphttp://christian.mendl.net/software/iradon.ziphttp://christian.mendl.net/software/iradon.ziphttp://christian.mendl.net/software/iradon.ziphttp://dx.doi.org/10.1016/j.media.2007.06.009
  • 7/27/2019 Radon Gpu Manuscript

    5/5

    Figure 4: Screenshot of the animated forward and inverse Radon transform demonstration program, showing the Shepp-Loganphantom. Note that all computations are performed in real-time. The image in the middle shows the calculated forwardtransform and serves as input to the inverse transform. On the right is the reconstructed image after filtering (not shown) andbackprojection.

    5