canny edge pic - umiacsramani/cmsc828e_gpusci/canny_pic.pdfcase studies canny edge detection...

Case Studies

Canny Edge Detection

Particle-In-Cell Methods

Edges for inference

• Edges in an image have many causes

• An edge presents an opportunity to infer/compress the information required

– Look at a few pixels in a binary image as opposed to all pixels in a grayscale image

• Biological plausibility

– Initial stages of mammalian vision systems involve detection of edges and local features

[Figure: causes of edges: depth discontinuity, surface color discontinuity, illumination discontinuity, surface normal discontinuity]

Edge detection

• Important 1st step of many vision/graphics/compression algorithms

• If you know the edges, the stuff in between has low dimensional representations

Issues in edge detection

• Noise and gradients

• Edge localization (obtained edge must be at true edge)

• Edges of interest are in between other gradients

– Need non-maximum suppression

• Finally, edges returned by the detector need to be completed or pruned using some labeling technique

Effects of noise

• Consider a single row or column of the image

– Plotting intensity as a function of position gives a signal

• Why does this happen and how can we find the edge?

• Recall derivative definition
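The derivative definition recalled here is presumably the difference quotient, f'(x) = lim (f(x+h) - f(x)) / h as h goes to 0. On a sampled row with h = 1 it becomes f[i+1] - f[i], which amplifies pixel-to-pixel noise. The sketch below illustrates this on a made-up 1D signal; the ramp edge, the noise level, and the box filter (a stand-in for Gaussian smoothing) are all assumptions chosen for illustration.

```python
import random

def finite_difference(signal):
    """Forward difference f[i+1] - f[i]: the difference quotient with h = 1."""
    return [signal[i + 1] - signal[i] for i in range(len(signal) - 1)]

def box_smooth(signal, radius):
    """Moving average with clamped borders (a simple stand-in for Gaussian smoothing)."""
    n = len(signal)
    smoothed = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)] for j in range(i - radius, i + radius + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical test signal: a ramp edge rising from 0 to 1 between samples 40 and 60
clean = [min(max((i - 40) / 20.0, 0.0), 1.0) for i in range(100)]
noisy = [v + rng.uniform(-0.1, 0.1) for v in clean]

raw_derivative = finite_difference(noisy)
smooth_derivative = finite_difference(box_smooth(noisy, 5))

# Noise can change by up to 0.2 per sample while the ramp rises only 0.05 per sample,
# so the raw derivative is dominated by noise; after smoothing the ramp stands out.
print("largest raw derivative at sample", argmax(raw_derivative))
print("largest smoothed derivative at sample", argmax(smooth_derivative))
```

With these numbers the largest smoothed derivative is guaranteed to fall on or next to the ramp, whereas the largest raw derivative can land anywhere in the signal.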

Laplacian of Gaussian

• Consider the Laplacian of Gaussian operator

• Where is the edge?

• Zero-crossings of bottom graph
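A minimal 1D sketch of the zero-crossing idea, assuming an ideal step edge and a sampled second derivative of a Gaussian as the operator. The helper names, sigma = 3 and the kernel radius of 12 are illustrative choices, not values from the slides.

```python
import math

def log_kernel(sigma, radius):
    """Second derivative of a Gaussian sampled at integer offsets (constant factor dropped)."""
    return [((k * k - sigma * sigma) / sigma ** 4) * math.exp(-k * k / (2.0 * sigma * sigma))
            for k in range(-radius, radius + 1)]

def filter_1d(signal, kernel):
    """Apply a symmetric kernel to a signal, clamping indices at the borders."""
    n, radius = len(signal), len(kernel) // 2
    return [sum(kernel[k + radius] * signal[min(max(i - k, 0), n - 1)]
                for k in range(-radius, radius + 1))
            for i in range(n)]

def zero_crossings(response):
    """Indices i where the response changes sign between samples i and i + 1."""
    return [i for i in range(len(response) - 1) if response[i] * response[i + 1] < 0]

step = [0.0] * 50 + [1.0] * 50          # ideal edge between samples 49 and 50
response = filter_1d(step, log_kernel(sigma=3.0, radius=12))
print(zero_crossings(response))          # [49]: the sign change sits on the edge
```

The response is positive just before the edge and negative just after it, so the single sign change marks the edge location.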

Canny Algorithm

• Gaussian convolution

• Compute gradients via first derivative operators to find magnitude and direction

• Find pixels at the maxima of the first-derivative (gradient) magnitude, i.e. find zero-crossings of the second derivative, and suppress all other pixels

• Run hysteresis

Cost of CPU version of Canny

Previous work

• Reference CPU implementation: OpenCV computer vision library.

– Extremely fast.

– Takes advantage of multicore and special processor instructions

• Implementation on parallel hardware:

– Neoh & Hazanchuk, 2005: Edge Detection on FPGAs

– Reference GPGPU implementation: Fung & Mann 2004, Fung 2005

• J.-P. Farrugia et al. GPUCV

• Filter part of the algorithm implemented in the NVIDIA SDK (Podlozhnyuk, Image Convolution with CUDA)

First steps of Canny on GPU

• Since these steps are already implemented in the CUDA SDK, we will not discuss them in detail

• Smoothing via the CUDA SDK implementation

• Coalesced read/write possible in one direction due to natural ordering of image pixels

• Introduce a data structure with column apron pixels placed in contiguous memory to allow coalesced reads and writes for these

Smoothing Convolution filter

[Figure: 16x16 thread block with a 7x7 filter window. Non-separable convolution apron: 228 apron pixels. Separable convolution apron: 192 apron pixels.]
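The two apron counts in the figure can be reproduced with a short calculation. This sketch assumes a square thread block, an odd filter width, and, for the separable case, that the row pass and the column pass each load only the apron pixels on their own two sides; the function names are made up for illustration.

```python
def nonseparable_apron(block, filt):
    """Apron pixels around a block x block tile for a filt x filt 2D filter."""
    radius = filt // 2
    return (block + 2 * radius) ** 2 - block ** 2

def separable_apron(block, filt):
    """Apron pixels summed over a row pass and a column pass of a separable filter."""
    radius = filt // 2
    per_pass = 2 * radius * block      # `radius` pixels on each of two sides, `block` long
    return 2 * per_pass                # one row pass plus one column pass

print(nonseparable_apron(16, 7))       # 228
print(separable_apron(16, 7))          # 192
```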

Gradient estimation

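Source Image

2 0 7 8 6 4 2
8 9 6 4 3 2 8
7 5 4 6 2 1 7
9 8 7 4 6 5 2
0 1 8 7 6 5 5
6 8 7 6 5 4 1
5 3 8 4 6 5 1

Sample window (rows 5-7, columns 4-6 of the source image)

7 6 5
6 5 4
4 6 5

Filter 1 Window: [-1 0 1], applied to each row of the sample window

Intermediate Results: -2, -2, 1

Filter 2 Window: [1 2 1], applied to the column of intermediate results

Result: 1 · (-2) + 2 · (-2) + 1 · (1) = -5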
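The worked example can be checked with a small sketch of the two-pass (separable) filter. It assumes Filter 1 is the row filter [-1 0 1] and Filter 2 is the column filter [1 2 1], applied as plain weighted sums without kernel flipping; `gradient_x_at` is a hypothetical helper name.

```python
ROW_FILTER = [-1, 0, 1]   # Filter 1: horizontal difference
COL_FILTER = [1, 2, 1]    # Filter 2: vertical smoothing

SOURCE = [
    [2, 0, 7, 8, 6, 4, 2],
    [8, 9, 6, 4, 3, 2, 8],
    [7, 5, 4, 6, 2, 1, 7],
    [9, 8, 7, 4, 6, 5, 2],
    [0, 1, 8, 7, 6, 5, 5],
    [6, 8, 7, 6, 5, 4, 1],
    [5, 3, 8, 4, 6, 5, 1],
]

def gradient_x_at(image, row, col):
    """Horizontal gradient at (row, col): row pass first, then column pass."""
    intermediates = []
    for d_row in (-1, 0, 1):
        pixels = image[row + d_row][col - 1: col + 2]
        intermediates.append(sum(w * p for w, p in zip(ROW_FILTER, pixels)))
    result = sum(w * v for w, v in zip(COL_FILTER, intermediates))
    return intermediates, result

# Centre of the sample window, using 0-based indices (row 5, column 4)
print(gradient_x_at(SOURCE, 5, 4))     # ([-2, -2, 1], -5)
```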

Direction finding

$\theta = \arctan\left(\frac{G_y}{G_x}\right)$

[Figure: direction sectors labeled 0°, 45°, 90°, 135°]
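A minimal sketch of mapping a gradient vector to one of the four direction classes. It uses `atan2` rather than a literal arctan of the ratio so that G_x = 0 needs no special case; the 22.5° sector boundaries are an assumption, since the slides only show the four sector labels.

```python
import math

def quantize_direction(gx, gy):
    """Map a gradient (gx, gy) to the nearest of 0, 45, 90, 135 degrees (modulo 180)."""
    angle = math.degrees(math.atan2(gy, gx)) % 180.0   # opposite directions are equivalent
    sector = int((angle + 22.5) // 45) % 4             # each class spans 45 degrees
    return sector * 45

print(quantize_direction(1.0, 0.0))    # 0
print(quantize_direction(1.0, 1.0))    # 45
print(quantize_direction(0.0, 1.0))    # 90
print(quantize_direction(-1.0, 1.0))   # 135
```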

Pixel Configurations

At q, we have a maximum if the value is larger than those at both p and r. Interpolate to get these values.

Non maxima suppression

[Figure: sample gradient magnitudes and their maxima. Labels: Gradient Direction, Ridge Pixels, Non-ridge Pixels]
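A simplified sketch of non-maxima suppression. Instead of interpolating the values at p and r, it compares each pixel q with its two nearest neighbours along a gradient direction already quantized to 0, 45, 90 or 135 degrees. This is a simplification, not the interpolated scheme described above. It also assumes row indices grow downward and leaves border pixels as non-ridge.

```python
# (d_row, d_col) step along each quantized gradient direction; rows grow downward (assumption)
OFFSETS = {0: (0, 1), 45: (1, 1), 90: (1, 0), 135: (1, -1)}

def non_max_suppress(mag, direction):
    """Return a boolean grid that is True at ridge pixels (strict local maxima along the gradient)."""
    rows, cols = len(mag), len(mag[0])
    ridge = [[False] * cols for _ in range(rows)]
    for row in range(1, rows - 1):
        for col in range(1, cols - 1):
            d_row, d_col = OFFSETS[direction[row][col]]
            p = mag[row - d_row][col - d_col]
            q = mag[row][col]
            r = mag[row + d_row][col + d_col]
            ridge[row][col] = q > p and q > r
    return ridge

# Hypothetical vertical edge: magnitudes peak in the middle column, gradient along x
mag = [[1, 3, 9, 3, 1] for _ in range(5)]
direction = [[0] * 5 for _ in range(5)]
print(non_max_suppress(mag, direction)[2])   # [False, False, True, False, False]
```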

Hysteresis

• Hysteresis Principles

– 2 thresholds technique for filtering edges

• Low threshold t1

• High threshold t2

– Ridge pixels with magnitude m >= t2: Mark as an edge pixel

– t1 < m < t2 and adjacent to an edge pixel: Mark as an edge pixel

– Identifies strong edges and accounts for comparatively weaker ones

Hysteresis Cont’d

• Generalized Algorithm

– Identify and mark all ridge pixels having m >= t2 as visited edges

– Add each of these pixels to queue Q

– Run a breadth first search (BFS) on Q

• For each pixel i in Q

– For each unvisited adjacent pixel j of i

» Mark j as visited

» If m(j) > t1, mark j as an edge and add j to Q

– Remove i from Q

• Terminate when Q is empty
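The generalized algorithm can be sketched serially as follows. The sketch assumes the input grid holds ridge magnitudes only (non-ridge pixels already set to 0 by non-maxima suppression) and that "adjacent" means the 8 surrounding pixels; both are assumptions, and the sample grid is made up.

```python
from collections import deque

def hysteresis(mag, t1, t2):
    """Return a boolean edge grid using the two-threshold BFS described above."""
    rows, cols = len(mag), len(mag[0])
    edge = [[False] * cols for _ in range(rows)]
    visited = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed the queue with every strong pixel (m >= t2)
    for r in range(rows):
        for c in range(cols):
            if mag[r][c] >= t2:
                edge[r][c] = visited[r][c] = True
                queue.append((r, c))
    # Grow edges through neighbours whose magnitude exceeds t1
    while queue:
        r, c = queue.popleft()
        for nr in range(max(r - 1, 0), min(r + 2, rows)):
            for nc in range(max(c - 1, 0), min(c + 2, cols)):
                if not visited[nr][nc]:
                    visited[nr][nc] = True
                    if mag[nr][nc] > t1:
                        edge[nr][nc] = True
                        queue.append((nr, nc))
    return edge

sample = [
    [0, 250, 0,   0],
    [0, 200, 0,   0],
    [0, 190, 0, 200],   # the 200 on the right is weak and isolated, so it is dropped
    [0, 100, 0,   0],
]
for row in hysteresis(sample, t1=187, t2=224):
    print(row)
```

Only the chain in the second column survives: the 250 pixel is a strong seed, and the 200 and 190 below it are weak pixels connected to it.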

Hysteresis Cont’d

• CUDA Algorithm

– Similar to the generalized algorithm

– Multi-pass approach:

• Preprocessing

– Load central and apron gradient magnitude data from global to shared memory

– Set initial edge states

• BFS

– For each thread, run a BFS that marks potential edges as definite edges

• Write-back

– Write all edge states to the gradient magnitude space and reiterate

Hysteresis Cont’d

– Preprocessing

• For each thread-block

– Load the central gradient magnitudes from global to shared memory

– Load a 1 unit wide apron from global to shared memory

– For each thread

• Map the thread to unique central pixel i (non-apron)

• Set pixel state as

$$S(m_i) = \begin{cases} -2 \ \text{(definite edge)}, & t_2 \le m_i \\ -1 \ \text{(potential edge)}, & t_1 \le m_i < t_2 \\ 0 \ \text{(non-edge)}, & 0 < m_i < t_1 \\ m_i, & m_i \le 0 \end{cases}$$
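A minimal sketch of the state function. The last case leaves values that are already zero or negative untouched, which appears to be what lets edge states written back by an earlier pass survive the next preprocessing step. The thresholds 187 and 224 are the ones used in the deck's worked example; the function and constant names are illustrative.

```python
DEFINITE_EDGE, POTENTIAL_EDGE, NON_EDGE = -2, -1, 0

def initial_state(m, t1, t2):
    """Classify one gradient magnitude m into an edge state."""
    if m <= 0:
        return m                 # already a state from a previous pass: keep it
    if m >= t2:
        return DEFINITE_EDGE
    if m >= t1:
        return POTENTIAL_EDGE
    return NON_EDGE

row = [154, 189, 122, 111, 160, 150]
print([initial_state(m, 187, 224) for m in row])   # [0, -1, 0, 0, 0, 0]
```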

Hysteresis Cont’d

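Gradient magnitudes (central pixels surrounded by a 1 unit wide apron)

144 143 200 125 122 145 166 150
120 154 189 122 111 160 150 156
130 170 188 140 185 183 179 170
156 155 180 212 199 189 199 198
150 160 255 199 225 170 166 150
200 189 188 192 190 170 96 80
52 132 166 187 178 156 78 60
132 188 147 188 166 155 66 54

Preprocessing with t1 = 187, t2 = 224

144 143 200 125 122 145 166 150
120 0 -1 0 0 0 0 156
130 0 -1 0 0 0 0 170
156 0 -1 -1 -1 -1 -1 198
0 0 -2 -1 -2 0 0 150
200 -1 -1 -1 -1 0 0 80
52 0 0 -1 0 0 0 60
132 188 147 188 166 155 66 54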

Breadth First Search

[Figure: four panels (1 to 4) showing two breadth first searches, A and B, expanding step by step from A1 and B1 (labels A1 to A3 and B1 to B4). Legend: Definite Edge, Potential Edge, Non-edge, Apron Pixel]

• Breadth First Search

• For each thread with pixel i with edge state = potential edge (-1)

– If i has an adjacent pixel j with edge state = definite edge (-2), add i to queue Qi

– For each pixel k in Qi

• Mark k as visited in shared memory

• Set edge state of k to definite edge

• For each unvisited adjacent pixel l of k

– Mark l as visited

– If m(l) > t1, add l to Qi

• Remove k from Qi

Hysteresis Cont’d

• Write-back

– When the BFS of every thread has terminated, write edge states of central pixels from shared memory to the gradient magnitude space in global memory

• Multi-pass

– Multiple calls to the entire procedure allow data migration

– 1 unit wide aprons allow previous edge states to be passed between thread-blocks at the beginning of the preprocessing stage

– Bordering pixels along thread-blocks that became definite edges will be visible to adjacent thread-blocks in the following pass
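The multi-pass behaviour can be simulated serially. In this sketch each tile stands in for a thread-block: it sees the definite edges in its own pixels and in a 1 unit wide apron as they were at the start of the pass, and it may only promote its own central pixels. The grid here holds edge states directly (-2, -1, 0) rather than magnitudes, and the tiling, names and sample row are assumptions for illustration, not the CUDA implementation.

```python
from collections import deque

DEFINITE, POTENTIAL = -2, -1

def run_pass(state, tile):
    """One pass: each tile promotes potential edges connected to a definite edge it can see."""
    rows, cols = len(state), len(state[0])
    out = [row[:] for row in state]                  # plays the role of the write-back target
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            r1, c1 = min(r0 + tile, rows), min(c0 + tile, cols)
            # Seeds: definite edges in the tile or its apron, as left by the previous pass
            queue = deque((r, c)
                          for r in range(max(r0 - 1, 0), min(r1 + 1, rows))
                          for c in range(max(c0 - 1, 0), min(c1 + 1, cols))
                          if state[r][c] == DEFINITE)
            while queue:
                r, c = queue.popleft()
                # Only central pixels of this tile may change
                for nr in range(max(r - 1, r0), min(r + 2, r1)):
                    for nc in range(max(c - 1, c0), min(c + 2, c1)):
                        if out[nr][nc] == POTENTIAL:
                            out[nr][nc] = DEFINITE
                            queue.append((nr, nc))
    return out

start = [[-2, -1, -1, -1, -1, -1, 0, -1]]            # two tiles of width 4 side by side
pass1 = run_pass(start, tile=4)
pass2 = run_pass(pass1, tile=4)
print(pass1)   # [[-2, -2, -2, -2, -1, -1, 0, -1]]
print(pass2)   # [[-2, -2, -2, -2, -2, -2, 0, -1]]
```

The chain stops at the tile boundary in the first pass and continues in the second, once the right-hand tile sees the new definite edge in its apron. The isolated potential edge at the end is never promoted.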

[Figure: gradient magnitude and edge state grids for two adjacent thread-blocks at four stages: Iteration 1 Preprocessing, Iteration 1 BFS, Iteration 2 Preprocessing, Iteration 2 BFS. Legend: Non-edge, Potential-edge, Definite-edge, Apron Pixel]

Profiling

Comparison with OpenCV

[Figure: test images LENA and MANDRILL]

Matlab

• CUDA interoperability with Matlab via compiled mex files (SDK example)

• Adapted existing code to use

• glCudaEdgeDetector(A,nChannels,thresholdLow,thresholdHigh,hysteresisIts,sigma);

• More substantial speedups

Particle in Cell Methods

• Arise in many areas

– Fluid Mechanics

– Plasma simulations

• Ingredients

– Field Equation

– Particle evolution equation

• Particles may be

– Real particles (electrons, ions, bubbles, solid particles)

– Point samples of a continuous function

• General scheme

– Separation of length and time scales

– Field evolves at a macroscopic scale and particles contribute to this evolution in an integral sense

– Particles evolve and are influenced by local field, and possibly local particle collisions

• Particle Equations

• Maxwell’s equations for E and B

How Does PIC work?

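[Figure: a particle with charge $q_p$ inside a dual grid cell. Grid points carry the grid-point charge $q_i$ and fields $(\mathbf{E}_i, \mathbf{B}_i)$; the particle sees fields $(\mathbf{E}_p, \mathbf{B}_p)$]

• Charge Assignment: $q_i = \sum_p \alpha_{ip} q_p$

• Force Interpolation: $\mathbf{E}_p = \sum_i \alpha_{pi} \mathbf{E}_i$

• Lorentz-Force: $\mathbf{F}_p = q\,\mathbf{E}_p + \frac{q}{m}\,\mathbf{p}_p \times \mathbf{B}_p$

• Solve Maxwell Equations on grid

• $\alpha_{pi} = \alpha_{ip}$ (zero self-force)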
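A minimal 1D sketch of charge assignment and force interpolation using the same weights in both directions. The linear (cloud-in-cell style) weighting and the periodic grid are assumptions; the slides do not say which weights α are used, and the sample values are made up.

```python
import math

def weights(x, dx, n):
    """Linear weights alpha for the two grid points around position x (periodic grid of n points)."""
    s = x / dx
    left = math.floor(s)
    frac = s - left
    return [(left % n, 1.0 - frac), ((left + 1) % n, frac)]

def assign_charge(positions, charges, dx, n):
    """q_i = sum over particles p of alpha_ip * q_p."""
    grid = [0.0] * n
    for x, q in zip(positions, charges):
        for i, alpha in weights(x, dx, n):
            grid[i] += alpha * q
    return grid

def interpolate_field(x, e_grid, dx):
    """E_p = sum over grid points i of alpha_pi * E_i, with the same weights as above."""
    return sum(alpha * e_grid[i] for i, alpha in weights(x, dx, len(e_grid)))

print(assign_charge([1.25], [2.0], dx=1.0, n=4))            # [0.0, 1.5, 0.5, 0.0]
print(interpolate_field(1.25, [0.0, 4.0, 8.0, 0.0], 1.0))   # 5.0
```

Using one `weights` function for both steps is what makes α_pi equal to α_ip.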

Basis of PIC plasma simulation

[Flowchart: steps of the simulation cycle]

• Initial step with φ=0

• Calculation of forces from fields and velocity

• Acceleration and increment of velocity

• Displacements and new positions. Boundary conditions.

• Calculation of density. Current profile fixed.

• Resolution of Poisson equation for the electrostatic potential.

• Computation of electric field. Magnetic field is given.

Major computational tasks

• Interpolation of field forces to particles

• Evolving particles

• Interpolating particle properties to grid for next step

• Major issues – coalesced reads and writes as particles move

• Fast solution of field equations

– Spectral methods

– Finite-difference
