Fast CCL (Connected Component Labeling) with GPU

ICESS 2016, Takamatsu, Japan

14 ~ 16 Nov. 2016

Young-Min Kang, Tongmyong University

A Parallel Approach to Object Identification in Large-scale Images

Sung-Soo Kim, ETRI; Gyung-Tae Nam, GCSC Inc.

Bigger images

• Era of Big data
  – Increased sizes of image data
• Image processing
  – Heavy computation
• One of the most fundamental operations
  – Object identification/recognition
    • Image segmentation
    • Connected components labeling

Connected component labeling

• Objective
  – All pixels in a connected component have an identical label

Parallel image processing

• Most image processing algorithms
  – Pixel-wise operations
    • can be implemented with pixel-wise threads
    • can be efficiently performed in a data-parallel fashion
• GPU
  – Data-parallel device
  – can be easily applied to various image processing methods

GPU: Many-core architecture

Pixel connectivity

• Graph representation

[Figure: an image and its pixel connectivity graph]

CCL and parallelism

• CCL with graph traversal
  – cannot be easily parallelized
    • Traversal = sequential
• GPU-based approaches
  – have not been very successful
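To make the "traversal = sequential" point concrete, here is a minimal Python sketch of CCL by breadth-first traversal. It is only an illustrative baseline, not code from the talk: the function name `label_by_traversal` is hypothetical, and 4-connectivity is an assumption, since the slides do not state which connectivity is used.

```python
from collections import deque

def label_by_traversal(image):
    """Label 4-connected components of a binary image by BFS (sequential baseline)."""
    h, w = len(image), len(image[0])
    labels = [[-1] * w for _ in range(h)]   # -1 marks background / not yet labeled
    next_label = 0
    for y in range(h):
        for x in range(w):
            if not image[y][x] or labels[y][x] != -1:
                continue
            # Start a new component and visit its pixels one after another.
            labels[y][x] = next_label
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and image[ny][nx] and labels[ny][nx] == -1:
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels, next_label
```

Each pixel is reached only through a previously visited neighbor, which is why this style of traversal does not map naturally onto independent per-pixel threads.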

Our method

• GPU-based efficient algorithm for CCL
  – Data initialization
  – Computing column-wise label runs
  – Efficient label merge

Data initialization

• Each pixel that is turned on is assigned a unique label; pixels that are off are marked -1

Data initialization


1 2 3 4 5

6 7 8 9 10

11 12 13 14 15

16 17 18 19 20

21 22 23 24 25

Data initialization


1 2 -1 -1 -1

6 7 -1 9 10

11 12 -1 14 -1

-1 -1 -1 19 20

-1 -1 -1 -1 -1
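The two grids above can be reproduced with a short Python sketch. The function name `init_labels` is hypothetical, and the 1-based row-major numbering is inferred from the example grid rather than stated on the slides. On a GPU each pixel would be handled by its own thread; here a list comprehension stands in for that.

```python
def init_labels(image):
    """Give every 'on' pixel its 1-based row-major index as a label; 'off' pixels get -1."""
    h, w = len(image), len(image[0])
    return [[y * w + x + 1 if image[y][x] else -1 for x in range(w)]
            for y in range(h)]

# Binary image matching the example grid (1 = on, 0 = off).
image = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
for row in init_labels(image):
    print(" ".join(str(v) for v in row))
```

Every pixel is computed independently of the others, so this step needs no synchronization.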

Column-wise label runs

• Run
  – Block of contiguous object pixels in a column
• Computing column-wise label runs
  – Can be done with w threads

[Figure: image of width w and height h]


• Label change within a column (1 thread)


• Graph-based interpretation


• Implementation
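A minimal Python sketch of the column-wise pass follows. It assumes that every pixel in a run takes the label of the topmost pixel of that run; the slides show this step only as a figure, so that rule and the name `column_runs` are assumptions. The outer loop over columns plays the role of the w threads.

```python
def column_runs(labels):
    """Propagate labels down each column so that a run shares its top pixel's label."""
    h, w = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for x in range(w):              # one GPU thread per column in the slides
        for y in range(1, h):       # walk down the column sequentially
            if out[y][x] != -1 and out[y - 1][x] != -1:
                out[y][x] = out[y - 1][x]   # same run: inherit the label from above
    return out

initial = [
    [1, 2, -1, -1, -1],
    [6, 7, -1, 9, 10],
    [11, 12, -1, 14, -1],
    [-1, -1, -1, 19, 20],
    [-1, -1, -1, -1, -1],
]
runs = column_runs(initial)
```

Columns never read each other's data in this pass, so the w column threads can run without any coordination.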

Label merge

• After computing "column-wise label runs"
  – We have separate trees to be merged in accordance with their connectivity
• What is needed
  – Checking vertical adjacency

Label merge

• Connectivity check

Label merge

• Updated hierarchy

Why only roots are changed

Let’s merge

OK! I will follow you

Why only roots are changed

Merged tree
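The "only roots are changed" idea can be sketched with a parent array in the style of a union-find structure. This is an illustration under assumptions, not the talk's implementation: the names `find_root` and `merge` are hypothetical, and attaching the larger root to the smaller one is a convention chosen here.

```python
def find_root(parent, i):
    """Follow parent links until reaching a node that is its own parent."""
    while parent[i] != i:
        i = parent[i]
    return i

def merge(parent, a, b):
    """Merge the trees containing a and b by re-pointing a single root."""
    ra, rb = find_root(parent, a), find_root(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)   # only one entry of the array changes

# Two trees: {0, 1, 2} rooted at 0 and {3, 4} rooted at 3.
parent = [0, 0, 0, 3, 3]
merge(parent, 2, 4)
```

After the merge, node 4 still points at 3, but 3 now points at 0, so every member of the second tree reaches the new root without being rewritten itself.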

Previous methods

1. Check the connectivity
2. Update the hierarchy
3. Iterate this process until no update is made

A kind of graph traversal
Heavy computation when the pixels make a long connected chain
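The cost of iterating until nothing changes can be seen in a small Python sketch. It uses plain minimum-label propagation as a stand-in for the iterate-until-stable scheme; it is not a reproduction of any specific prior method, and the name `propagate_until_stable` and the 4-connectivity are assumptions.

```python
def propagate_until_stable(labels):
    """Repeat: each pixel takes the smallest label among itself and its neighbors."""
    h, w = len(labels), len(labels[0])
    cur = [row[:] for row in labels]
    passes = 0                          # number of passes that changed something
    while True:
        nxt = [row[:] for row in cur]   # double buffer, like a parallel update
        changed = False
        for y in range(h):
            for x in range(w):
                if cur[y][x] == -1:
                    continue
                best = cur[y][x]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and cur[ny][nx] != -1:
                        best = min(best, cur[ny][nx])
                if best != cur[y][x]:
                    nxt[y][x] = best
                    changed = True
        if not changed:
            return cur, passes
        cur = nxt
        passes += 1

chain = [[1, 2, 3, 4, 5, 6, 7, 8]]      # one long connected chain
final, passes = propagate_until_stable(chain)
```

In this sketch the smallest label moves only one pixel per pass, so the number of passes grows with the length of the chain rather than staying fixed.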

Our method

• Label merge is performed with a fixed number of iterations
  – The number of iterations
    • $\log_2 w$
  – Computation cost at every iteration
    • reduced to half that of the previous one
• Efficient label merge
• Moreover
  – Can be easily parallelized

Label merge boundary

• 1st merge
  – $w/2$ boundaries, $h$ comparisons in each boundary
  – $wh/2$ threads

• 2nd merge
  – $wh/2^2$ threads

• 3rd merge
  – $wh/2^3$ threads

• Final merge
  – the $\log_2 w$-th merge
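The merge schedule above can be sketched end to end in Python. This is a serial illustration under several assumptions that the slides do not confirm: w is a power of two, pixels are compared across each boundary within the same row (4-connectivity), the larger root is attached to the smaller one, and the name `ccl_hierarchical` is hypothetical. A real GPU version would also need atomic operations or similar care when several threads update roots concurrently, which this sketch omits.

```python
def ccl_hierarchical(image):
    """Column runs followed by log2(w) boundary-merge stages; returns (labels, stages)."""
    h, w = len(image), len(image[0])
    assert w & (w - 1) == 0, "this sketch assumes w is a power of two"

    # Initialization + column-wise runs: each run points at its topmost pixel index.
    parent = [-1] * (h * w)
    for x in range(w):
        for y in range(h):
            p = y * w + x
            if image[y][x]:
                parent[p] = parent[p - w] if y > 0 and image[y - 1][x] else p

    def find(p):
        while parent[p] != p:
            p = parent[p]
        return p

    stages, width = 0, 2
    while width <= w:
        # Stage k joins blocks of width/2 columns into blocks of `width` columns:
        # w/width boundaries, h comparisons per boundary.
        for left in range(width // 2 - 1, w - 1, width):
            for y in range(h):
                a, b = y * w + left, y * w + left + 1
                if parent[a] != -1 and parent[b] != -1:
                    ra, rb = find(a), find(b)
                    if ra != rb:
                        parent[max(ra, rb)] = min(ra, rb)   # only a root changes
        width *= 2
        stages += 1

    labels = [[find(y * w + x) if parent[y * w + x] != -1 else -1
               for x in range(w)] for y in range(h)]
    return labels, stages

image = [
    [1, 1, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 1],
    [1, 1, 0, 1, 0, 1, 0, 1],
    [1, 0, 0, 0, 0, 1, 1, 1],
]
labels, stages = ccl_hierarchical(image)
```

For this 8-column image the loop runs exactly three merge stages regardless of how the object pixels are arranged, in contrast to schemes that iterate until no update is made.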

Computation cost at the 1st merge: C(1)

Total Cost
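The formula for this slide did not survive extraction. As a hedged reconstruction from the statements above (cost halves at every merge, $\log_2 w$ merges in total), and writing $C(k)$ for the cost of the $k$-th merge as an assumed notation, the total can be worked out as a geometric sum:

```latex
C(k) = \frac{C(1)}{2^{k-1}}, \qquad k = 1, \dots, \log_2 w

C_{\text{total}} = \sum_{k=1}^{\log_2 w} C(k)
                 = C(1) \sum_{k=0}^{\log_2 w - 1} \frac{1}{2^{k}}
                 = 2\,C(1)\left(1 - \frac{1}{w}\right)
                 < 2\,C(1)
```

Under these assumptions the whole merge phase costs less than twice the first merge. With $C(1)$ proportional to $wh/2$ comparisons, the total is proportional to $h(w-1)$, which is one comparison per row for each of the $w-1$ column boundaries.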

Performance

• Computational cost for each task
  – Cost for initialization = 1
  – 4096x4096 images with different numbers of connected components

Task               50 labels   1869 labels
initialization     1.0         1.0
column-wise run    1.6         1.6
label merge        3.4         3.6

Performance

[Figure: bar chart of relative computational cost (0.0 to 4.0) for initialization, column-wise run, and label merge; series: 50 labels and 1869 labels]

Experimental results

• Reference
  – Grana’s method implemented with OpenCV
• Two tests
  – Random noise with varying densities
  – Object identification with shapes

Varying densities

• Image size: 2048x2048

Varying densities

• Image size: 4096x4096

Object identification with shapes

• Two spiral curves


• Stars

Applications

• Object tracking with radar signal

Conclusion

• An efficient GPGPU implementation for CCL
• Data-parallelism of GPU exploited
• Experimental results show its efficiency
• Can be successfully applied to various applications with large-scale images
  – e.g., object identification from radar signals

Thank you

Q & A
