Parallel implementation of geodesic distance transform with application in superpixel segmentation
DESCRIPTION
This poster presents a parallel implementation of geodesic distance transform using OpenMP. This work forms part of a C implementation for geodesic superpixel segmentation of natural images. Presented at DICTA 2013 conference.
TRANSCRIPT
Tuan Q. Pham
Canon Information Systems Research Australia (CISRA)
Parallel implementation of geodesic distance transform with application in superpixel segmentation
References:
1. Achanta et al., SLIC superpixels compared to state-of-the-art superpixel methods, PAMI 34(11), 2012.
2. Levinshtein et al., TurboPixels: Fast superpixels using geometric flows, PAMI 31(12), 2009.
Contact details: Tuan Q. Pham ([email protected]), 1 Thomas Holt drive, North Ryde, NSW 2113, Australia
Presented at Int’l Conf. on Digital Image Computing: Techniques and Applications (DICTA) Paper 5, Poster session 2 on Thursday 8th November, 2013. Hobart, Australia
Segment image into superpixels using GDT where
Cost image = gradient energy + a small offset
Seed points = well-separated local gradient
minima by adaptive non-maximum suppression
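A minimal sketch of the cost image described above. The poster does not define "gradient energy" or the size of the offset, so this assumes a squared central-difference gradient magnitude on a grayscale image; the function name `cost_image` and the parameter `offset` are hypothetical.

```c
/* Hypothetical helper (not from the poster):
   cost = squared gradient magnitude + offset.
   img and cost are row-major arrays of w*h floats.
   Border pixels reuse their own value as the missing neighbour. */
static void cost_image(const float *img, float *cost, int w, int h, float offset)
{
    for (int y = 0; y < h; ++y) {
        int ym = y > 0 ? y - 1 : y;
        int yp = y < h - 1 ? y + 1 : y;
        for (int x = 0; x < w; ++x) {
            int xm = x > 0 ? x - 1 : x;
            int xp = x < w - 1 ? x + 1 : x;
            /* Central differences (one-sided and halved at the border). */
            float gx = 0.5f * (img[y * w + xp] - img[y * w + xm]);
            float gy = 0.5f * (img[yp * w + x] - img[ym * w + x]);
            cost[y * w + x] = gx * gx + gy * gy + offset;
        }
    }
}
```

The strictly positive offset keeps every step cost above zero, so path length still contributes to the geodesic distance in flat image regions.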
Superpixel segmentation
Summary
We proposed a parallel implementation of the geodesic distance transform using OpenMP
Our geodesic segmentation method produces more regular, edge-following superpixels, up to two orders of magnitude faster than state-of-the-art segmentation methods.
Within a pass, the chamfer algorithm is sequential
The algorithm can be parallelised if multiple passes are allowed (OK since GDT is iterative)
The image is divided into bands for parallel processing
The distance transform is propagated across bands in the next iteration (may require more iterations)
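The band scheme above might be sketched as follows. The poster does not say how distances cross band boundaries, so this sketch makes its own assumptions: horizontal bands, a step cost of step length times the cost at the destination pixel, and a short sequential seam step after each parallel region. All function names are hypothetical.

```c
#define SQRT2 1.41421356f

/* Relax pixel p from neighbour q (assumed step cost: length * cost[p]). */
static int relax(float *dist, const float *cost, int p, int q, float len)
{
    float d = dist[q] + len * cost[p];
    if (d < dist[p]) { dist[p] = d; return 1; }
    return 0;
}

/* Forward then backward chamfer pass restricted to rows [y0, y1).
   Rows outside the band are never touched, so bands are independent. */
static int band_pass(float *dist, const float *cost, int w, int y0, int y1)
{
    int changed = 0;
    for (int y = y0; y < y1; ++y)
        for (int x = 0; x < w; ++x) {
            int p = y * w + x;
            if (x > 0) changed += relax(dist, cost, p, p - 1, 1.0f);
            if (y > y0) {
                changed += relax(dist, cost, p, p - w, 1.0f);
                if (x > 0)     changed += relax(dist, cost, p, p - w - 1, SQRT2);
                if (x < w - 1) changed += relax(dist, cost, p, p - w + 1, SQRT2);
            }
        }
    for (int y = y1 - 1; y >= y0; --y)
        for (int x = w - 1; x >= 0; --x) {
            int p = y * w + x;
            if (x < w - 1) changed += relax(dist, cost, p, p + 1, 1.0f);
            if (y < y1 - 1) {
                changed += relax(dist, cost, p, p + w, 1.0f);
                if (x < w - 1) changed += relax(dist, cost, p, p + w + 1, SQRT2);
                if (x > 0)     changed += relax(dist, cost, p, p + w - 1, SQRT2);
            }
        }
    return changed;
}

/* Sequential seam step: exchange distances between rows y-1 and y. */
static int seam_pass(float *dist, const float *cost, int w, int y)
{
    int changed = 0;
    for (int x = 0; x < w; ++x) {
        int lo = y * w + x, up = lo - w;
        for (int dx = -1; dx <= 1; ++dx) {
            if (x + dx < 0 || x + dx >= w) continue;
            float len = dx ? SQRT2 : 1.0f;
            changed += relax(dist, cost, lo, up + dx, len);
            changed += relax(dist, cost, up, lo + dx, len);
        }
    }
    return changed;
}

/* dist must hold 0 at seeds and a large value elsewhere.
   Returns the number of iterations run (at most max_iter). */
static int parallel_gdt(float *dist, const float *cost, int w, int h,
                        int band_h, int max_iter)
{
    int nbands = (h + band_h - 1) / band_h;
    int it = 0;
    while (it < max_iter) {
        int changed = 0;
        ++it;
        /* Bands are dealt to threads one at a time, in round-robin order. */
        #pragma omp parallel for schedule(static, 1) reduction(+:changed)
        for (int b = 0; b < nbands; ++b) {
            int y0 = b * band_h;
            int y1 = y0 + band_h < h ? y0 + band_h : h;
            changed += band_pass(dist, cost, w, y0, y1);
        }
        for (int b = 1; b < nbands; ++b)
            changed += seam_pass(dist, cost, w, b * band_h);
        if (!changed) break;
    }
    return it;
}
```

With GCC or Clang the pragma takes effect when compiling with `-fopenmp`; without it the loop runs serially and gives the same result. On a 1×6 image with unit cost and a seed in the top pixel, this sketch needs 4 iterations with `band_h = 2` and 2 iterations with a single band, which illustrates why banding may require more iterations.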
Parallel distance transform
Geodesic distance between two points = sum of
pixel costs along a minimum-cost path
Geodesic distance transform d(cost image f,
seed points) = geodesic distance from every
pixel to its nearest seed
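One way to write the two definitions above. The symbols are introduced here and are not the poster's: $g_f$ is the geodesic distance, $\mathcal{P}(a,b)$ the set of pixel paths from $a$ to $b$, and $S$ the set of seed points.

```latex
g_f(a, b) = \min_{P \in \mathcal{P}(a, b)} \sum_{p \in P} f(p),
\qquad
d(x) = \min_{s \in S} g_f(x, s)
```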
Geodesic distance
Geodesic distance transform
[Figure labels: Frame 8; source, destination; minimum path, cost = 1.7; straight path, cost = 11.1; seed]
Fig. 1. Cost image (left) and its geodesic distance transform (right)
Chamfer distance algorithm = multiple iterations of a forward propagation and a backward propagation
Fig. 2. One iteration of a forward pass (left) and a backward pass (right)
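A minimal sequential sketch of the two passes, under stated assumptions: an 8-connected neighbourhood, a step cost of step length times the cost at the destination pixel, and nearest-seed labels carried along with the distances. The poster does not give the chamfer mask or weights, and all names here are hypothetical.

```c
#define SQRT2 1.41421356f

/* Relax pixel p from neighbour q; on improvement, p inherits q's seed label. */
static int relax(float *dist, int *label, const float *cost,
                 int p, int q, float len)
{
    float d = dist[q] + len * cost[p];   /* assumed step cost */
    if (d < dist[p]) { dist[p] = d; label[p] = label[q]; return 1; }
    return 0;
}

/* Forward pass: raster scan from top-left, using already-visited neighbours. */
static int forward_pass(float *dist, int *label, const float *cost, int w, int h)
{
    int changed = 0;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            int p = y * w + x;
            if (x > 0) changed += relax(dist, label, cost, p, p - 1, 1.0f);
            if (y > 0) {
                changed += relax(dist, label, cost, p, p - w, 1.0f);
                if (x > 0)     changed += relax(dist, label, cost, p, p - w - 1, SQRT2);
                if (x < w - 1) changed += relax(dist, label, cost, p, p - w + 1, SQRT2);
            }
        }
    return changed;
}

/* Backward pass: the mirrored scan from bottom-right. */
static int backward_pass(float *dist, int *label, const float *cost, int w, int h)
{
    int changed = 0;
    for (int y = h - 1; y >= 0; --y)
        for (int x = w - 1; x >= 0; --x) {
            int p = y * w + x;
            if (x < w - 1) changed += relax(dist, label, cost, p, p + 1, 1.0f);
            if (y < h - 1) {
                changed += relax(dist, label, cost, p, p + w, 1.0f);
                if (x < w - 1) changed += relax(dist, label, cost, p, p + w + 1, SQRT2);
                if (x > 0)     changed += relax(dist, label, cost, p, p + w - 1, SQRT2);
            }
        }
    return changed;
}

/* dist: 0 at seeds, a large value elsewhere; label: seed id at seeds, -1 elsewhere.
   Repeats forward+backward passes until nothing changes or max_iter is reached.
   Returns the number of iterations run. */
static int chamfer_gdt(float *dist, int *label, const float *cost,
                       int w, int h, int max_iter)
{
    int it = 0;
    while (it < max_iter) {
        ++it;
        int changed = forward_pass(dist, label, cost, w, h)
                    + backward_pass(dist, label, cost, w, h);
        if (!changed) break;
    }
    return it;
}
```

After the final iteration, `label` holds the nearest seed for every reachable pixel, which is the tessellation used for superpixels.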
Geodesic distance transform (GDT) produces an edge-following Voronoi tessellation if edge strength is used as the cost
Fig. 3. Geodesic distance transform (2nd row) and tessellation (3rd row)
Fig. 4. Band-based image partitioning for parallel GDT implementation
[Figure panel labels: Input image; Cost image & 4 seed points; Intermediate GDT after a first forward pass; Intermediate GDT after a first backward pass; GDT after 10 fwd+bwd propagation iterations; Nearest seed label after a first forward pass; Nearest seed label after a first backward pass; Nearest seed label after 10 iterations. Annotations: fragmentation; region without a nearest seed]
OpenMP = an easy-to-use Open Multi-Processing platform that is designed for multicore processors and is supported by most compilers
OpenMP parallelises loops using compiler directives
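A minimal illustration of a loop parallelised by a directive. The example is generic (a sum of squares) and is not taken from the poster's code; `sum_squares` is a hypothetical name.

```c
/* Sum of squares of n floats.
   The pragma asks OpenMP to split the loop iterations across threads and
   to combine the per-thread partial sums at the end of the loop. */
static double sum_squares(const float *v, int n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i)
        sum += (double)v[i] * v[i];
    return sum;
}
```

With GCC or Clang the directive is enabled by the `-fopenmp` flag; without it the pragma is ignored and the same loop runs serially.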
OpenMP implementation
Segmentation comparison
Geodesic superpixel is faster & follows edges better
[Panel labels: frame 1, frame 4, frame 8, frame 12]
Fig. 8. Segmentation of 1MP image (# denotes number of superpixels returned)
Method      #      Time    Platform
Watershed   1008   3.2s    C/Matlab
FH          1024   2.3s    C
Quickshift  992    13.3s   C
Entropy     1000   6.5s    C
Geodesic    1000   0.3s    C
CVT         1000   2.7s    Matlab
Lattice     1024   1.4s    C
SLIC        990    1.2s    C
Turbo       1067   58.1s   Matlab
Fig. 7. Three state-of-the-art superpixel methods on 2MP image in Fig.6
SLIC [1] (4.6 s), Geodesic (0.64 s), TurboPixels [2] (207 s)
Fig. 6. 1000 geodesic superpixels from a 2MegaPixel image (1936×1288)
Best with static scheduling (where bands are assigned to threads in a round-robin fashion)
The number of fwd+bwd propagation iterations increases slightly under the parallel implementation (10 iterations are often enough for segmentation)
Sub-second runtime on images of 5 MP or smaller
Speedup of 1.3× on a 2-core CPU and 2.6× on a 4-core CPU
Evaluation of parallel GDT
Fig. 5. Runtime & speedup factor on 2.8GHz quad-core CPU with 12GB RAM
[Plot 1, "Runtime": run time (seconds) against image width (pixels, 0 to 4000); curves: without OpenMP, static schedule, dynamic schedule]
[Plot 2, "Speedup factor": speedup factor against image width (pixels, 0 to 4000); curves: static schedule, dynamic schedule]
Geodesic Voronoi tessellation