Fast CCL (Connected Component Labeling) with GPU
ICESS 2016, Takamatsu, Japan
14 ~ 16 Nov. 2016
A Parallel Approach to Object Identification in Large-scale Images
Young-Min Kang, Tongmyong University
Sung-Soo Kim, ETRI
Gyung-Tae Nam, GCSC Inc.
Bigger images
• Era of big data
  – Increased size of image data
• Image processing
  – Heavy computation
• One of the most fundamental operations
  – Object identification/recognition
    • Image segmentation
    • Connected components labeling
Parallel image processing
• Most image processing algorithms
  – Pixel-wise operations
    • Can be implemented with pixel-wise threads
    • Can be efficiently performed in a data-parallel fashion
• GPU
  – Data-parallel device
  – Can be easily applied to various image processing methods
GPU: Many-core architecture
CCL and parallelism
• CCL with graph traversal
  – Cannot be easily parallelized
    • Traversal = sequential
• GPU-based approaches
  – Have not been very successful
Our method
• GPU-based efficient algorithm for CCL
  – Data initialization
  – Computing column-wise label runs
  – Efficient label merge
Data initialization
• Each pixel is assigned a unique label if it is turned on
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
Data initialization (result)
• Pixels that are turned on keep their unique labels; pixels that are off are marked -1
1 2 -1 -1 -1
6 7 -1 9 10
11 12 -1 14 -1
-1 -1 -1 19 20
-1 -1 -1 -1 -1
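
The initialization step can be sketched on the CPU as follows. This is a minimal Python illustration, not the authors' GPU kernel: the function name `init_labels`, the 1-based row-major label, and the binary input image (inferred from the label grid on the slide) are assumptions. On a GPU, each pixel would be handled by its own thread; the nested loops stand in for that thread grid.

```python
def init_labels(image):
    """Give every 'on' pixel a unique label; mark 'off' pixels with -1."""
    h, w = len(image), len(image[0])
    labels = [[-1] * w for _ in range(h)]
    # Each (row, col) pair is independent, so on a GPU it could be one thread.
    for row in range(h):
        for col in range(w):
            if image[row][col]:
                # Assumed labelling: 1-based row-major pixel index.
                labels[row][col] = row * w + col + 1
    return labels


# Binary image inferred from the label grid shown on the slide.
image = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
labels = init_labels(image)
for row in labels:
    print(row)
```

Run on the inferred image, the sketch reproduces the 5x5 label grid from the slide.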
Column-wise label runs
• Run
  – Block of contiguous object pixels in a column
• Computing column-wise label runs
  – Can be done with w threads
[Figure: image of width w and height h]
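
A possible reading of the column-wise run step is sketched below in Python. The slides do not say which label represents a run, so this sketch assumes every pixel in a run takes the label of the run's topmost pixel; the name `column_runs` is hypothetical. The outer loop over columns corresponds to the w threads mentioned above, since columns do not depend on each other.

```python
def column_runs(labels):
    """Relabel each vertical run with the label of its topmost pixel."""
    h, w = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    # One iteration per column stands in for one GPU thread per column.
    for col in range(w):
        run_label = -1  # -1 means "currently not inside a run"
        for row in range(h):
            if out[row][col] == -1:
                run_label = -1  # background pixel ends the current run
            else:
                if run_label == -1:
                    run_label = out[row][col]  # a new run starts here
                out[row][col] = run_label
    return out


# Label grid after data initialization (taken from the slide).
labels = [
    [1, 2, -1, -1, -1],
    [6, 7, -1, 9, 10],
    [11, 12, -1, 14, -1],
    [-1, -1, -1, 19, 20],
    [-1, -1, -1, -1, -1],
]
runs = column_runs(labels)
for row in runs:
    print(row)
```

Under this assumption, the first column collapses to label 1, the second to 2, the fourth to 9, and the fifth column keeps two separate runs labelled 10 and 20.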
Label merge
• After computing “column-wise label runs”
  – We have separate trees to be merged in accordance with their connectivity
• What is needed
  – Checking vertical adjacency
Previous methods
1. Check the connectivity
2. Update the hierarchy
3. Iterate this process until no update is made
A kind of graph traversal
Heavy computation when the pixels make a long connected chain
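
To illustrate why a long chain is costly for this kind of scheme, here is a small Python sketch of a generic iterate-until-stable labelling. It is not a specific published method: minimum-label propagation over 4-neighbours and the name `propagate_until_stable` are assumptions chosen for illustration. Each pass reads the previous pass's labels, as a data-parallel kernel would.

```python
def propagate_until_stable(labels):
    """Repeat min-label propagation over 4-neighbours until nothing changes.

    Returns the final labels and the number of passes that were needed.
    """
    h, w = len(labels), len(labels[0])
    cur = [row[:] for row in labels]
    passes = 0
    while True:
        passes += 1
        nxt = [row[:] for row in cur]
        changed = False
        for r in range(h):
            for c in range(w):
                if cur[r][c] == -1:
                    continue  # background pixel
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w and cur[rr][cc] != -1:
                        if cur[rr][cc] < nxt[r][c]:
                            nxt[r][c] = cur[rr][cc]
                            changed = True
        cur = nxt
        if not changed:
            return cur, passes


# A single horizontal chain of 8 object pixels with labels 1..8.
chain = [list(range(1, 9))]
result, passes = propagate_until_stable(chain)
print(result, passes)
```

In this sketch the smallest label moves only one pixel per pass, so a chain of n pixels needs n passes (n - 1 that change something plus one that detects stability). The pass count therefore grows with the chain length rather than staying fixed.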
Our method
• Label merge is performed with a fixed number of iterations
  – The number of iterations
    • log2(w)
  – Computation cost at every iteration
    • Reduced to half that of the previous one
• Efficient label merge
• Moreover
  – Can be easily parallelized
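
One way to picture a merge with a fixed number of iterations is the hierarchical sketch below, written in Python. Several parts are assumptions rather than the authors' implementation: the union-find structure (`find`, `union`), 4-connectivity, and the reading of "cost halves" as "the number of block boundaries halves at each level". At level k, neighbouring blocks of 2^(k-1) columns are joined by checking only the two columns that touch at their shared boundary.

```python
def find(parent, x):
    """Follow parent links to the root label, shortening the path on the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    """Join the trees of labels a and b; the smaller root label wins."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def merge_columns(runs):
    """Merge column-wise runs across block boundaries, doubling block width."""
    h, w = len(runs), len(runs[0])
    parent = {lab: lab for row in runs for lab in row if lab != -1}
    boundaries_per_level = []
    block = 1  # current block width in columns
    while block < w:  # ceil(log2(w)) levels in total
        # Column indices where a left block meets its right neighbour.
        cols = range(block, w, 2 * block)
        boundaries_per_level.append(len(cols))
        for c in cols:  # boundaries touch disjoint blocks, so they are independent
            for r in range(h):
                left, right = runs[r][c - 1], runs[r][c]
                if left != -1 and right != -1:
                    union(parent, left, right)
        block *= 2
    merged = [[-1 if lab == -1 else find(parent, lab) for lab in row]
              for row in runs]
    return merged, boundaries_per_level


# Column-wise runs for the 5x5 slide example (topmost-pixel labels assumed).
runs = [
    [1, 2, -1, -1, -1],
    [1, 2, -1, 9, 10],
    [1, 2, -1, 9, -1],
    [-1, -1, -1, 9, 20],
    [-1, -1, -1, -1, -1],
]
merged, levels = merge_columns(runs)
for row in merged:
    print(row)

# With w = 8 the boundary count per level halves: 4, then 2, then 1.
_, levels8 = merge_columns([[1, 2, 3, 4, 5, 6, 7, 8]])
print(levels8)
```

On the slide example this leaves two components, labelled 1 and 9. Every adjacent column pair is examined exactly once across all levels, so the number of levels depends only on the image width and not on the shape of the objects.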
Label merge boundary
• Final merge
  – log2(w)-th merge
Computation cost at the 1st merge: C(1)
Total Cost
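
The formula on this slide did not survive in the transcript. Assuming, as stated earlier, that there are log2(w) merges and that each merge costs half as much as the previous one, the total can be sketched as a geometric series. This is a reconstruction under those assumptions, with C_total as an assumed symbol, and not necessarily the expression shown on the original slide.

```latex
C_{\text{total}}
  = \sum_{k=1}^{\log_2 w} \frac{C(1)}{2^{\,k-1}}
  = 2\,C(1)\left(1 - \frac{1}{w}\right)
  < 2\,C(1)
```

Under this reading, the total merge cost stays below twice the cost of the first merge, regardless of the image width.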
Performance
• Computational cost for each task
  – Cost for initialization = 1
  – 4096x4096 images with different numbers of connected components

task              50 labels   1869 labels
initialization    1.0         1.0
column-wise run   1.6         1.6
label merge       3.4         3.6
[Figure: bar chart of the relative cost (y-axis 0.0 to 4.0) of initialization, column-wise run, and label merge; series: 50 labels and 1869 labels]
Experimental results
• Reference
  – Grana’s method implemented with OpenCV
• Two tests
  – Random noise with varying densities
  – Object identification with shapes
Conclusion
• An efficient GPGPU implementation for CCL
• Data-parallelism of GPU exploited
• Experimental results show its efficiency
• Can be successfully applied to various applications with large-scale images
  – e.g., object identification from radar signals