Case Studies
Canny Edge Detection
Particle-In-Cell Methods
Edges for inference
• Edges in an image have many causes
• An edge presents an opportunity to infer/compress the information required
– Look at a few pixels in a binary image as opposed to all pixels in a grayscale image
• Biological plausibility
– Initial stages of mammalian vision systems involve detection of edges and local features
depth discontinuity
surface color discontinuity
illumination discontinuity
surface normal discontinuity
Edge detection
• Important 1st step of many vision/graphics/compression algorithms
• If you know the edges, the stuff in between has low dimensional representations
Issues in edge detection
• Noise and gradients
• Edge localization (obtained edge must be at true edge)
• Edges of interest are in between other gradients
– Need non-maximum suppression
• Finally, edges returned by the detector need to be completed or pruned using some labeling technique
Effects of noise
• Consider a single row or column of the image
– Plotting intensity as a function of position gives a signal
• Why does this happen and how can we find the edge?
• Recall derivative definition
Laplacian of Gaussian
• Consider the Laplacian of Gaussian operator
• Where is the edge?
• Zero-crossings of bottom graph
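The zero-crossing idea can be sketched in one dimension. The example below is a minimal illustration, not the deck's implementation: the step signal, `sigma`, the kernel radius, and all function names are assumptions chosen for the demonstration. It filters a clean step with the second derivative of a Gaussian (the 1D analogue of the Laplacian of Gaussian) and reports where the response changes sign.

```python
import math

def log_kernel_1d(sigma, radius):
    """Second derivative of a 1D Gaussian, shifted to sum to zero (assumed normalization)."""
    raw = []
    for x in range(-radius, radius + 1):
        gauss = math.exp(-(x * x) / (2.0 * sigma * sigma))
        raw.append((x * x / sigma ** 4 - 1.0 / sigma ** 2) * gauss)
    mean = sum(raw) / len(raw)
    return [v - mean for v in raw]  # zero sum: flat regions give (near) zero response

def filter_valid(signal, kernel):
    """Apply the symmetric kernel wherever it fits entirely inside the signal."""
    radius = len(kernel) // 2
    return [sum(kernel[j] * signal[i + j - radius] for j in range(len(kernel)))
            for i in range(radius, len(signal) - radius)]

def zero_crossings(response, offset, eps=1e-9):
    """Return positions i where the response changes sign between i and i + 1."""
    hits = []
    for i in range(len(response) - 1):
        a, b = response[i], response[i + 1]
        if (a > eps and b < -eps) or (a < -eps and b > eps):
            hits.append(i + offset)
    return hits

# A step edge between positions 19 and 20 (illustrative data).
signal = [0.0] * 20 + [100.0] * 20
kernel = log_kernel_1d(sigma=2.0, radius=8)
response = filter_valid(signal, kernel)
edges = zero_crossings(response, offset=8)  # offset = kernel radius
print(edges)
```

The single reported position is the sample just before the step, so the sign change brackets the true edge.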
Canny Algorithm
• Gaussian convolution
• Compute gradients via first derivative operators to find magnitude and direction
• Find pixels exhibiting the maxima of the first-derivative gradient, i.e. find zero-crossings of the second derivative, and suppress other pixels
• Run hysteresis
Cost of CPU version of Canny
Previous work
• Reference CPU implementation: OpenCV computer vision library.
– Extremely fast.
– Takes advantage of multicore and special processor instructions
• Implementation on parallel hardware:
– Neoh & Hazanchuk, 2005: Edge Detection on FPGAs
– Reference GPGPU implementation: Fung & Mann 2004, Fung 2005
• J.-P. Farrugia et al., GpuCV
• Filter part of the algorithm implemented in the NVIDIA SDK (Podlozhnyuk, Image Convolution with CUDA)
First steps of Canny on GPU
• Since these are done in the CUDA SDK, we will not discuss them much
• Smoothing via the CUDA SDK implementation
• Coalesced read/write possible in one direction due to natural ordering of image pixels
• Introduce a data structure with column apron pixels placed in contiguous memory to allow coalesced reads and writes for these
Smoothing Convolution filter
[Figure: apron pixels for a 7x7 filter window on a 16x16 thread block. Non-separable convolution apron: 228 apron pixels. Separable convolution apron: 192 apron pixels.]
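The two apron counts can be reproduced with a little arithmetic. This is a sketch under the assumption that the separable version runs one row pass and one column pass, each needing an apron only along its own direction; the function name is hypothetical.

```python
def apron_pixels(block, filter_width):
    """Count apron pixels loaded per thread block for a square block and square filter."""
    radius = filter_width // 2
    # Non-separable: the whole (block + 2*radius)^2 tile minus the central block.
    non_separable = (block + 2 * radius) ** 2 - block ** 2
    # Separable: each of the two 1D passes needs `radius` extra pixels on two sides only.
    per_pass = 2 * radius * block
    separable = 2 * per_pass
    return non_separable, separable

counts = apron_pixels(block=16, filter_width=7)
print(counts)  # (228, 192)
```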
Gradient estimation
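[Figure: separable gradient filter applied to a 7x7 source image]
Source Image
2 0 7 8 6 4 2
8 9 6 4 3 2 8
7 5 4 6 2 1 7
9 8 7 4 6 5 2
0 1 8 7 6 5 5
6 8 7 6 5 4 1
5 3 8 4 6 5 1
Filter 1 Window: [-1 0 1], applied to each row of the 3x3 window
7 6 5 gives -2
6 5 4 gives -2
4 6 5 gives 1
Sample Intermediate Results: -2, -2, 1
Filter 2 Window: [1 2 1], applied to the intermediate results
1·(-2) + 2·(-2) + 1·1 = -5
Result: -5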
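The worked example can be checked with a short sketch of the two-pass (separable) gradient filter: a [-1 0 1] difference along one axis followed by [1 2 1] smoothing along the other. The function names are illustrative, and only the interior ("valid") pixels are computed; border handling is left out.

```python
IMAGE = [
    [2, 0, 7, 8, 6, 4, 2],
    [8, 9, 6, 4, 3, 2, 8],
    [7, 5, 4, 6, 2, 1, 7],
    [9, 8, 7, 4, 6, 5, 2],
    [0, 1, 8, 7, 6, 5, 5],
    [6, 8, 7, 6, 5, 4, 1],
    [5, 3, 8, 4, 6, 5, 1],
]

def difference_along_rows(img):
    """Filter 1: [-1 0 1] applied horizontally (right neighbor minus left neighbor)."""
    return [[row[c + 1] - row[c - 1] for c in range(1, len(row) - 1)] for row in img]

def smooth_along_columns(img):
    """Filter 2: [1 2 1] applied vertically to the intermediate results."""
    return [[img[r - 1][c] + 2 * img[r][c] + img[r + 1][c] for c in range(len(img[0]))]
            for r in range(1, len(img) - 1)]

def transpose(img):
    return [list(col) for col in zip(*img)]

def gradient_x(img):
    return smooth_along_columns(difference_along_rows(img))

def gradient_y(img):
    # Same two passes with the roles of rows and columns swapped.
    return transpose(gradient_x(transpose(img)))

gx = gradient_x(IMAGE)
gy = gradient_y(IMAGE)
# Output index [4][3] corresponds to image pixel (row 5, column 4),
# the center of the 3x3 window 7 6 5 / 6 5 4 / 4 6 5.
print(gx[4][3], gy[4][3])
```

Each output array is 5x5 because a one-pixel border is dropped on every side.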
Direction finding
$$\theta = \arctan\left(\frac{G_y}{G_x}\right)$$
[Figure: gradient directions grouped into four bins: 0°, 45°, 90°, 135°]
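One way to map a gradient to the four direction bins is sketched below. It uses `math.atan2` instead of a plain arctangent of the ratio so that a zero x-gradient needs no special case; that substitution, the 22.5° bin boundaries, and the function name are assumptions, not taken from the slides.

```python
import math

def quantize_direction(gx, gy):
    """Return the nearest of 0, 45, 90, 135 degrees for the gradient (gx, gy)."""
    angle = math.degrees(math.atan2(gy, gx)) % 180.0  # opposite directions share a bin
    return int(((angle + 22.5) // 45.0) % 4) * 45

samples = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (1, -1), (-5, -3)]
bins = [quantize_direction(gx, gy) for gx, gy in samples]
print(bins)
```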
Pixel Configurations
At q, we have a maximum if the value is
larger than those at both p and at r.
Interpolate to get these values.
Non maxima suppression
[Figure: sample gradient magnitude maxima with gradient direction. Legend: Ridge Pixels, Non-ridge Pixels]
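A simplified sketch of non-maximum suppression follows. Unlike the interpolated comparison described on the Pixel Configurations slide, it compares each pixel q only with its two nearest neighbors p and r along the quantized direction. The offset table assumes angles are measured with the y axis pointing up (toward decreasing row index); that convention and all names are illustrative.

```python
# (row offset, column offset) of neighbor p for each quantized direction;
# neighbor r is the opposite offset.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def non_max_suppress(mag, direction):
    """Keep a magnitude only where it exceeds both neighbors along its gradient direction."""
    rows, cols = len(mag), len(mag[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dr, dc = OFFSETS[direction[r][c]]
            p = mag[r + dr][c + dc]
            q = mag[r][c]
            r_val = mag[r - dr][c - dc]
            if q > p and q > r_val:
                out[r][c] = q  # ridge pixel
    return out

# A vertical ridge: the gradient is horizontal (direction 0) everywhere.
mag = [[1, 5, 9, 5, 1] for _ in range(5)]
direction = [[0] * 5 for _ in range(5)]
ridge = non_max_suppress(mag, direction)
print(ridge[2])  # [0, 0, 9, 0, 0]
```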
Hysteresis
• Hysteresis Principles
– 2 thresholds technique for filtering edges
• Low threshold t1
• High threshold t2
– Ridge pixels with magnitude m >= t2: Mark as an edge pixel
– t1 < m < t2 and adjacent to an edge pixel: Mark as an edge pixel
– Identifies strong edges and accounts for comparatively weaker ones
Hysteresis Cont’d
• Generalized Algorithm
– Identify and mark all ridge pixels having m >= t2 as visited edges
– Add these pixels to queue Q
– Run a breadth first search (BFS) on Q
• For each pixel i in Q
– For each unvisited adjacent pixel j of i
» Mark j as visited
» If m(j) > t1, mark j as an edge and add j to Q
– Remove i from Q
• Terminate when Q is empty
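The generalized algorithm above can be sketched sequentially as follows. The input is assumed to hold ridge magnitudes only (non-ridge pixels already set to 0 by non-maximum suppression), and adjacency is assumed to mean the 8 surrounding pixels; the function name and sample data are illustrative.

```python
from collections import deque

def hysteresis(mag, t1, t2):
    """Return a boolean edge map using a low threshold t1 and a high threshold t2."""
    rows, cols = len(mag), len(mag[0])
    edge = [[False] * cols for _ in range(rows)]
    visited = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed the queue with all strong pixels (m >= t2).
    for r in range(rows):
        for c in range(cols):
            if mag[r][c] >= t2:
                edge[r][c] = visited[r][c] = True
                queue.append((r, c))
    # Breadth first search: grow edges through weaker pixels with m > t1.
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols and not visited[nr][nc]:
                    visited[nr][nc] = True
                    if mag[nr][nc] > t1:
                        edge[nr][nc] = True
                        queue.append((nr, nc))
    return edge

mag = [
    [0,   0,   0,   0,   0],
    [0, 230, 200, 190,   0],
    [0,   0,   0,   0,   0],
    [0,   0,   0,   0, 200],
]
edges = hysteresis(mag, t1=187, t2=224)
print(edges[1])     # [False, True, True, True, False]
print(edges[3][4])  # False: above t1 but not connected to a strong edge
```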
Hysteresis Cont’d
• CUDA Algorithm
– Similar to the generalized algorithm
– Multi-pass approach:
• Preprocessing
– Load central and apron gradient magnitude data from global to shared memory
– Set initial edge states
• BFS
– For each thread, run a BFS that marks potential edges as definite edges
• Write-back
– Write all edge states to the gradient magnitude space and reiterate
Hysteresis Cont’d
– Preprocessing
• For each thread-block
– Load the central gradient magnitudes from global to shared memory
– Load a 1 unit wide apron from global to shared memory
– For each thread
• Map the thread to unique central pixel i (non-apron)
• Set pixel state as
$$S(m_i) = \begin{cases}
\text{definite edge } (-2), & t_2 \le m_i \\
\text{potential edge } (-1), & t_1 \le m_i < t_2 \\
\text{non-edge } (0), & 0 < m_i < t_1 \\
m_i, & m_i \le 0
\end{cases}$$
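The state assignment can be written as a small function. This is a sketch: the constant and function names are hypothetical, and the last case of the definition is read as "values that are already states (zero or negative, written back by an earlier pass) are kept unchanged".

```python
DEFINITE_EDGE, POTENTIAL_EDGE, NON_EDGE = -2, -1, 0

def initial_state(m, t1, t2):
    """Map a gradient magnitude (or a state from a previous pass) to an edge state."""
    if m <= 0:
        return m  # already a state written back by a previous pass
    if m >= t2:
        return DEFINITE_EDGE
    if m >= t1:
        return POTENTIAL_EDGE
    return NON_EDGE

states = [initial_state(m, t1=187, t2=224) for m in (255, 199, 187, 154, -2, -1, 0)]
print(states)  # [-2, -1, -1, 0, -2, -1, 0]
```

Storing states as non-positive numbers lets them share the gradient magnitude space, since real magnitudes that matter are positive.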
Hysteresis Cont’d
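Preprocessing example (t1 = 187, t2 = 224)
Gradient magnitudes (the outer ring is the apron):
144 143 200 125 122 145 166 150
120 154 189 122 111 160 150 156
130 170 188 140 185 183 179 170
156 155 180 212 199 189 199 198
150 160 255 199 225 170 166 150
200 189 188 192 190 170  96  80
 52 132 166 187 178 156  78  60
132 188 147 188 166 155  66  54
Edge states after preprocessing:
144 143 200 125 122 145 166 150
120   0  -1   0   0   0   0 156
130   0  -1   0   0   0   0 170
156   0  -1  -1  -1  -1  -1 198
  0   0  -2  -1  -2   0   0 150
200  -1  -1  -1  -1   0   0  80
 52   0   0  -1   0   0   0  60
132 188 147 188 166 155  66  54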
[Figure: Breadth First Search in four steps (1 to 4). Two searches, A and B, start at pixels A1 and B1 and expand through pixels labeled A2/B2, then A3/B3, then B4. Legend: Definite Edge, Potential Edge, Non-edge, Apron Pixel]
• Breadth First Search
• For each thread with pixel i with edge state = potential edge (-1)
– If i has an adjacent pixel j with edge state = definite edge (-2), add i to queue Qi
– For each pixel k in Qi
• Mark k as visited in shared memory
• Set edge state of k to definite edge
• For each unvisited adjacent pixel l of k
– Mark l as visited
– If m(l) > t1, add l to Qi
• Remove k from Qi
Hysteresis Cont’d
• Write-back
– When all BFS from threads terminate, write edge states of central pixels from shared memory to the gradient magnitude space in global memory
• Multi-pass
– Multiple calls to the entire procedure allow data migration
– 1 unit wide aprons allow previous edge states to be passed between thread-blocks at the beginning of the preprocessing stage
– Bordering pixels along thread-blocks that became definite edges will be visible to adjacent thread-blocks in the following pass
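The multi-pass behavior can be illustrated with a sequential simulation, which is not the CUDA kernel itself. Each tile plays the role of a thread-block: it copies its central pixels plus a 1-pixel apron from a snapshot of global state, runs one local BFS (standing in for the per-thread searches), and writes back only its central pixels. The tile size, function names, and the stopping rule (repeat until a pass changes nothing) are assumptions made for this sketch.

```python
from collections import deque

DEFINITE, POTENTIAL = -2, -1
NEIGHBORS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if dr or dc]

def run_pass(state, tile):
    """One pass: every tile works on a snapshot of `state` and writes back its center."""
    rows, cols = len(state), len(state[0])
    result = [row[:] for row in state]
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            # Preprocessing: load center + apron into a local buffer ("shared memory").
            local = [[state[r][c] if 0 <= r < rows and 0 <= c < cols else 0
                      for c in range(c0 - 1, c0 + tile + 1)]
                     for r in range(r0 - 1, r0 + tile + 1)]
            # Seed with potential edges that touch a definite edge (apron included).
            queue, visited = deque(), set()
            for r in range(1, tile + 1):
                for c in range(1, tile + 1):
                    if local[r][c] == POTENTIAL and any(
                            local[r + dr][c + dc] == DEFINITE for dr, dc in NEIGHBORS):
                        queue.append((r, c))
                        visited.add((r, c))
            # BFS restricted to central pixels of this tile.
            while queue:
                r, c = queue.popleft()
                local[r][c] = DEFINITE
                for dr, dc in NEIGHBORS:
                    nr, nc = r + dr, c + dc
                    inside = 1 <= nr <= tile and 1 <= nc <= tile
                    if inside and (nr, nc) not in visited and local[nr][nc] == POTENTIAL:
                        visited.add((nr, nc))
                        queue.append((nr, nc))
            # Write-back of central pixels only.
            for r in range(1, tile + 1):
                for c in range(1, tile + 1):
                    gr, gc = r0 + r - 1, c0 + c - 1
                    if gr < rows and gc < cols:
                        result[gr][gc] = local[r][c]
    return result

def hysteresis_multipass(state, tile, max_passes=10):
    passes = 0
    for _ in range(max_passes):
        new_state = run_pass(state, tile)
        passes += 1
        if new_state == state:
            break
        state = new_state
    return state, passes

# A chain of potential edges crossing the boundary between two 4x4 tiles.
grid = [[0] * 8 for _ in range(4)]
grid[1] = [-2, -1, -1, -1, -1, -1, -1, 0]
final, passes = hysteresis_multipass(grid, tile=4)
print(final[1], passes)
```

In this example the first pass promotes the chain inside the left tile, the second pass carries it into the right tile through the apron, and the third pass confirms that nothing changes.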
[Figure: worked multi-pass example on two example grids. Panels: Iteration 1: Preprocessing; Iteration 1: BFS; Iteration 2: Preprocessing; Iteration 2: BFS. Legend: Non-edge, Potential-edge, Definite-edge, Apron Pixel]
profiling
Comparison with OpenCV
LENA MANDRILL
Matlab
• CUDA interoperability with Matlab via compiled mex files (SDK example)
• Adapted existing code to use
• glCudaEdgeDetector(A,nChannels,thresholdLow,thresholdHigh,hysteresisIts,sigma);
• More substantial speedups
Particle in Cell Methods
• Arise in many areas
– Fluid Mechanics
– Plasma simulations
• Ingredients
– Field Equation
– Particle evolution equation
• Particles may be
– Real particles (electrons, ions, bubbles, solid particles)
– Point samples of a continuous function
• General scheme
– Separation of length and time scales
– Field evolves at a macroscopic scale and particles contribute to this evolution in an integral sense
– Particles evolve and are influenced by local field, and possibly local particle collisions
• Particle Equations
• Maxwell’s equations for E and B
How Does PIC work?
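[Figure: a particle p with charge $q_p$ and fields $(\mathbf{E}_p, \mathbf{B}_p)$ inside a dual grid cell; grid point i carries the grid-point charge $q_i$ and fields $(\mathbf{E}_i, \mathbf{B}_i)$]
• Charge Assignment: $q_i = \sum_p \alpha_{ip}\, q_p$
• Force Interpolation: $\mathbf{E}_p = \sum_i \alpha_{pi}\, \mathbf{E}_i$
• $\alpha_{pi} = \alpha_{ip}$ (zero self-force)
• Lorentz force: $\mathbf{F}_p = q_p\left(\mathbf{E}_p + \frac{\mathbf{p}_p}{m} \times \mathbf{B}_p\right)$
• Solve Maxwell equations on grid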
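Charge assignment and force interpolation can be sketched in one dimension. The slides do not say which weights α are used; the example assumes linear (two-point) weights on a periodic grid, and uses the same weights in both directions so that α_pi = α_ip. All function names and values are illustrative.

```python
import math

def linear_weights(x, dx, n_grid):
    """Return [(grid index i, weight alpha)] for a particle at x on a periodic 1D grid."""
    s = x / dx
    i = math.floor(s)
    frac = s - i
    return [(i % n_grid, 1.0 - frac), ((i + 1) % n_grid, frac)]

def assign_charge(positions, charges, dx, n_grid):
    """Charge assignment: q_i = sum over particles of alpha_ip * q_p."""
    q_grid = [0.0] * n_grid
    for x, q in zip(positions, charges):
        for i, alpha in linear_weights(x, dx, n_grid):
            q_grid[i] += alpha * q
    return q_grid

def interpolate_field(x, e_grid, dx):
    """Force interpolation: E_p = sum over grid points of alpha_pi * E_i."""
    return sum(alpha * e_grid[i] for i, alpha in linear_weights(x, dx, len(e_grid)))

# One particle of charge 2.0 at x = 1.25 on a 4-point grid with spacing 1.0.
q_grid = assign_charge([1.25], [2.0], dx=1.0, n_grid=4)
e_p = interpolate_field(1.25, [0.0, 4.0, 8.0, 0.0], dx=1.0)
print(q_grid, e_p)  # [0.0, 1.5, 0.5, 0.0] 5.0
```

The weights sum to one for every particle, so the total charge on the grid equals the total particle charge.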
Basis of PIC plasma simulation
[Flowchart: simulation cycle]
• Calculation of forces from fields and velocity
• Acceleration and increment of velocity
• Displacements and new positions. Boundary conditions.
• Calculation of density. Current profile fixed.
• Resolution of Poisson equation for the electrostatic potential.
• Computation of electric field. Magnetic field is given.
• Initial step with φ=0 (entry into the cycle)
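The "acceleration" and "displacement" boxes of the cycle can be sketched as a simple particle push. This is a simplified 1D electrostatic illustration: the magnetic term is left out, the boundary condition is assumed periodic, and the function name and parameters are hypothetical.

```python
def push_particles(x, v, e_at_particle, q_over_m, dt, length):
    """Advance velocities from the field at each particle, then positions (periodic domain)."""
    # Acceleration and increment of velocity: dv = (q/m) * E * dt
    new_v = [vi + q_over_m * e * dt for vi, e in zip(v, e_at_particle)]
    # Displacements and new positions, wrapped around the periodic boundary
    new_x = [(xi + vi * dt) % length for xi, vi in zip(x, new_v)]
    return new_x, new_v

new_x, new_v = push_particles(
    x=[0.5, 3.75], v=[1.0, 1.0], e_at_particle=[2.0, 0.0],
    q_over_m=1.0, dt=0.5, length=4.0)
print(new_x, new_v)  # [1.5, 0.25] [2.0, 1.0]
```

The second particle leaves the domain on the right and re-enters on the left, which is the periodic boundary condition at work.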
Major computational tasks
• Interpolation of field forces to particles
• Evolving particles
• Interpolating particle properties to grid for next step
• Major issues – coalesced reads and writes as particles move
• Fast solution of field equations
– Spectral methods
– Finite-difference
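As a small instance of the finite-difference option, the sketch below solves the 1D Poisson equation -φ'' = ρ with φ = 0 at both ends, using the standard three-point stencil and the Thomas (tridiagonal) algorithm. The unit permittivity, the Dirichlet boundary condition, and the function name are assumptions for the example.

```python
def solve_poisson_1d(rho, dx):
    """Solve (-phi[i-1] + 2*phi[i] - phi[i+1]) / dx^2 = rho[i] with phi = 0 at both ends."""
    n = len(rho)
    a, b, c = -1.0, 2.0, -1.0          # sub-, main, and super-diagonal entries
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    # Forward elimination
    c_prime[0] = c / b
    d_prime[0] = dx * dx * rho[0] / b
    for i in range(1, n):
        denom = b - a * c_prime[i - 1]
        c_prime[i] = c / denom
        d_prime[i] = (dx * dx * rho[i] - a * d_prime[i - 1]) / denom
    # Back substitution
    phi = [0.0] * n
    phi[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        phi[i] = d_prime[i] - c_prime[i] * phi[i + 1]
    return phi

# Constant density rho = 2 on [0, 1]; the exact solution is phi(x) = x * (1 - x).
dx = 0.25
phi = solve_poisson_1d([2.0, 2.0, 2.0], dx)   # interior points x = 0.25, 0.5, 0.75
exact = [x * (1.0 - x) for x in (0.25, 0.5, 0.75)]
print(phi, exact)
```

The three-point stencil is exact for quadratic solutions, so the computed values match the analytic ones up to rounding.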