accelarating optical quadrature microscopy using gpus

23
Phase Unwrapping and Affine Transformations using CUDA Perhaad Mistry, Sherman Braganza, David Kaeli, Miriam Leeser ECE Department Northeastern University Boston, MA

Upload: perhaad-mistry

Post on 05-Dec-2014

1.036 views

Category:

Documents


0 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Accelarating Optical Quadrature Microscopy Using GPUs

Phase Unwrapping and Affine Transformations using CUDA

Perhaad Mistry, Sherman Braganza, David Kaeli, Miriam Leeser

ECE DepartmentNortheastern University

Boston, MA

Page 2: Accelarating Optical Quadrature Microscopy Using GPUs

Keck 3D Fusion Microscope

Page 3: Accelarating Optical Quadrature Microscopy Using GPUs

Motivation

3

Wrapped Phase Image Unwrapped Phase Image

Page 4: Accelarating Optical Quadrature Microscopy Using GPUs

MotivationPhase Unwrapping used for research in In-Vitro Fertilization (IVF) in Optical Quadrature MicroscopyCell counts obtained after unwrapping determine viability of embryosResults required at close to real time to see changes in sampleImprove cost-effectiveness of OQM and quality of research

4

Page 5: Accelarating Optical Quadrature Microscopy Using GPUs

OutlineOptical Quadrature Microscopy Affine TransformationsMinimum LP Norm Phase UnwrappingMEX Interface UsagePerformanceFuture WorkConclusion

5

OQM Frame Grabber

Affine Transforms

Phase Unwrapping

OQM Imaging System Layout

Page 6: Accelarating Optical Quadrature Microscopy Using GPUs

OQM ImagingPattern determined by phase difference between laser beamsPhase calculated using magnitudes captured from four camerasMixed (M), signal only(S), reference(R) and dark (D) imagesPhase difference measured by angle(E) ranges from –π to π

angle(E) yields phase that is required for unwrapping

6

3

0

1 *4

nn n n n

n n

M S RE iR

=

=

− −= ∑

Optical Quadrature Microscopy Layout

REF

FromLaser

Sample

NPBS

PBS

Mirror

Mirror

SIG

Camera 0

Camera 1

Camera 2

Camera 3

Page 7: Accelarating Optical Quadrature Microscopy Using GPUs

Affine Transformation

Algorithm for Affine Transform [1]

Align images to fix sample’s position in different images2x2 and 2x1 matrices obtained from microscopeImage pixels divided among blocks of threadsEach thread calculates one pixelImplemented using CUDA since result needed for Phase Unwrapping

(x,y) = f(BlockId, ThreadID)Load Data = Image[x][y]Includes - Rotation (θ), Scaling (s), Shearing (sh)

x' s(x)*cos(θ) sh(y)*sin(θ) x a= +

y' -sh(x)*sin(θ) s(y)*cos(θ) y bStore Image[x'][y

⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠ ⎝ ⎠ ⎝ ⎠

'] = Data(Uncoalesced Writes)

7

[1] A. K. Jain, Fundamentals of digital image processing, Prentice Hall

Microscope’s Frame

Grabber

Affine Transforms

Phase Unwrapping

Page 8: Accelarating Optical Quadrature Microscopy Using GPUs

Noise Removal & Phase Information

Separate “noise” only image available (Dark Image)Dark voltage image subtracted from the mixed, signal and reference imagesSubtraction of signal image (Sn) and reference image (Rn) due to fixed pattern levels of individual cameras

Square root of Rn to balance intensities for detection

8

n n3

0

and so on for S and R1 *4

( )

n n nn

n n n n

n n

M M DM S RE i

Rphase angle E

=

=

= −

− −=

=

∑ n: camera number

Page 9: Accelarating Optical Quadrature Microscopy Using GPUs

Example Images for Affine Transform

9

After Affine Transform and Noise Removal

Signal only ImageMixed Image

Dark ImageReference Image

Page 10: Accelarating Optical Quadrature Microscopy Using GPUs

Optical phase difference from interferometric image is shownParameter of interest is Optical path difference which can be more than πUnwrapping along a line:

Sum till a gradient of π then, add 2π

In 2D if noise is presentPath dependencies while summingCauses accumulation errors

Phase Unwrapping

10

Wrapped Image(note range of values)

Frame Grabber

Affine Transforms

Phase Unwrapping

Page 11: Accelarating Optical Quadrature Microscopy Using GPUs

MATLAB’s Unwrap v/s Minimum LP Norm

11

Unwrap Using MATLAB’s unwrap() function

Unwrap Using Minimum Lp

Norm Algorithm [1]

[1] Dennis C. Ghiglia, Mark D. Pritt: Two-Dimensional Phase Unwrapping: Theory, Algorithms, and Software

Page 12: Accelarating Optical Quadrature Microscopy Using GPUs

Minimum LP Norm AlgorithmMinimize difference between gradients of wrapped and unwrapped data [1]Nonlinear PDE since U and V are data dependant weightsPDE solved iteratively using the Preconditioned Conjugate Gradient Method

, 1, , , 1, , ,

, 1, 1, , , 1 ,

, , 1,

, 1,

,

, ,

( ) ( )

( ) ( )

( , ) ( 1, )

( , ) ( 1, )

is the unwrapped phase at (x,y)

and denot

i j i j i j i j i j i j i j

i j i j i j i j i j i j

x xi j i j i j

y yi j i j

x y

x yx y x y

c U V

U V

c U i j U i j

V i j V i j

where

φ φ φ φ

φ φ φ φ

φ

+ +

− − −

= − + −

− − − −

= ∆ − ∆ − +

∆ − ∆ −

∆ ∆ e the wrapped phase in the

x and y direction respectively

12

Can be expressed as Q cφ =

[1] Dennis C. Ghiglia, Mark D. Pritt. Two-Dimensional Phase Unwrapping: Theory, Algorithms, and Software, Wiley New York

Page 13: Accelarating Optical Quadrature Microscopy Using GPUs

Preconditioned Conjugate Gradient

Solve un-weighted least square phase unwrapping problemMinimize ε2

Preconditioning using a DCT neededConjugate gradient method called after DCT preconditioning

13

2 22 2

1, , ,0 0

2 22

1, , ,0 0

,

,

( )

( )

where is the wrapped phase at (i,j) and

the unwrapped phase at (i,j)

M Nx

i j i j i ji j

M Ny

i j i j i ji j

i j

xi j is

ε φ φ

φ φ

φ

− −

+= =

− −

+= =

= − −∆

+ − −∆

∑ ∑

∑ ∑

,,

After discretizing the equationand taking the 2D Fourier Transform

2cos( ) 2cos( ) 4

where and are the Fourier Transformedversions of and respectively

m nm n m n

M Nπ π

φ

ΡΦ =

+ −

Φ Ρ∆

Page 14: Accelarating Optical Quadrature Microscopy Using GPUs

Better DCT ImplementationImplementing N point DCT using DFT uses 2N point DFTAnother method seen in [1]

Rearrange x(n) into v(n) such that

DCT(x(n)) = 2*DFT( * Real(v(n)))

For 2D DCT: 4x less work since N*N instead of 2N*2NPCG is ~ 90% of LP Norm Execution timeShuffle kernel before CUFFT call

14

1( ) (2 ) 02

1(2 2 1) 12

Nv n x n n

Nx N n n N

−⎡ ⎤= ≤ ≤ ⎢ ⎥⎣ ⎦+⎡ ⎤= − − ≤ ≤ −⎢ ⎥⎣ ⎦

4kNW

[1] Makhoul J. A fast cosine transform in one and two dimensions.

Page 15: Accelarating Optical Quadrature Microscopy Using GPUs

Minimum LP Norm Algorithmfor k ← 0 to LP Norm Count

Calculate Data Derived Weightsfor j ← 0 to PCG Iteration Count

DCT To DFT Steps (Shuffle Kernel)Execute CUFFT Call Point-Wise Complex Multiplication to Get DCT ResultScaling StepExecute CUFFT Call to do iDCTConjugate Gradient (Point Wise Matrix Operations)

end forif Convergence exit

end for

15

PCG Algorithm

Preconditioning before CG

Page 16: Accelarating Optical Quadrature Microscopy Using GPUs

Example Images – Phase Unwrapping

16

Wrapped Phase Data Unwrapped Phase Data

Images of Glass Bead and water

Page 17: Accelarating Optical Quadrature Microscopy Using GPUs

Example Images – Mouse Embryo

17

Wrapped Phase Data Unwrapped Phase Data

Page 18: Accelarating Optical Quadrature Microscopy Using GPUs

MATLAB External Interface (MEX)

MEX produces linked objects from C codeMATLAB present in our system due to the frame grabbersGateway function in C makes MATLAB data (mxArray) visible to our linked C code

18OQM Processing Control Flow

MATLAB Legacy Interface

Frame Grabber Image

Acquisition Tool Box

Affine Transform Wrapper(C) Using

MEX

Affine Transform & Noise Removal

GPU Code(CUDA)

Phase Unwrapping Wrapper (C) Using MEX

Save Affine ?

MATLAB Legacy Interfaces

Phase Unwrapping GPU code (CUDA)

Yes

No

Page 19: Accelarating Optical Quadrature Microscopy Using GPUs

Performance (Phase Unwrapping)Baseline

Time GPU Time GPU Speedup

MEX and GPU Time(sec)

Mex & GPU Speedup

Preconditioning 11.17 1.2 9.3X 1.2 9.3X

Conjugate Grad. 2.89 0.55 5.25X 0.55 5.25X

IO Activity 0.7 0.7 1X 0.02 NA

Miscellaneous 0.79 0.51 1.53X 0.41 1.92X

Total for Unwrap 15.55 2.97 5.24X 2.16 7.20X

19

Baseline : Serial implementation using FFTW for the Fourier Transform

Page 20: Accelarating Optical Quadrature Microscopy Using GPUs

Performance Results

Page 21: Accelarating Optical Quadrature Microscopy Using GPUs

Future WorkStudy implementation of Conjugate GradientStudy scatter operations in CUDA

Present literature shows improvements only for larger data sizes (may be better on G200)

Multi-Gpu version for imaging scenarios where images may be grabbed at faster rates

Presently only one image at a time acquired.

Phase Unwrapping also required for applications like Synthetic Aperture Radar (SAR)

21

Page 22: Accelarating Optical Quadrature Microscopy Using GPUs

ConclusionsImplemented the computationally intensive Minimum Lp

Norm Phase Unwrapping and Affine Transforms on the GPUNot a batch-oriented process

Latency was critical, single image used at a time

Reducing latency throughout the system improves quality of research and time to discovery in Optical Science Laboratory

22

Page 23: Accelarating Optical Quadrature Microscopy Using GPUs

Thank You

Questions?

Perhaad Mistry([email protected])