Accelerated Connected Component Labeling Using CUDA Framework
Fanny Nina-Paravecino, David Kaeli
ICCVG 2014
Outline
• Introduction
• Connected Component Labeling
• NVIDIA’s Compute Unified Device Architecture
• Accelerated Connected Component Labeling
• Performance Results
• Conclusions
Fanny Nina-Paravecino and David Kaeli, ICCVG, 15-17 Sep. 2014, Warsaw, Poland
Introduction
• Image analysis plays an important role in many applications
• In the field of physical security, there are challenging tasks, such as luggage scanning at airports, that require:
  • Near real-time response
  • Very high accuracy
• The connected component algorithm identifies neighboring segments with similar intensities
  • Potential for efficient segmentation
  • Provides high-quality results
Introduction
[Figure: one frame is an image matrix of 512 x 512; multiple frames give ~700 images, i.e. ~700 matrices]
Introduction
• Flow chart of Object Detection
[Figure: flow chart. Input: DICOM Image. Object Detection stages: Preprocessing -> Image Segmentation -> Features Extraction -> Object Detection. A callout in the chart reads "Our current focus".]
Connected Component Labeling (CCL)
• There have been a number of attempts to improve the performance of CCL:
  • Bailey and Johnston, “Single Pass Connected Components Analysis”, Image and Vision Computing (2007)
  • Zhao et al., “Stripe-based Connected Components Labeling” (2010)
  • Klaiber et al., “A memory-efficient parallel single pass architecture for connected component labeling of streamed images” (2012)
• GPU implementations
  • Stava and Benes, “Connected component labeling in CUDA”, GPU Computing Gems (2010)
NVIDIA’s Compute Unified Device Architecture (CUDA)
• Compute capability architecture:
  • Tesla: Compute capability 1.0, 1.1, 1.2, 1.3
  • Fermi: Compute capability 2.0, 2.1
  • Kepler: Compute capability 3.0, 3.5
NVIDIA’s Compute Unified Device Architecture (CUDA)
• Dynamic Parallelism
NVIDIA’s Compute Unified Device Architecture (CUDA)
• Concurrent Kernel Execution: Hyper-Q
[Figure: kernel execution over time for kernels issued to Stream 0 and Stream 1, shown in issue order and compared between Fermi and Kepler GK110]
Accelerated Connected Component Labeling
• Two phases:
  • Phase 0: Find Spans
  • Phase 1: Merge Spans
[Figure, Phase 0: threads read the input binary image (image matrix, N x M) and produce a Spans matrix (N x K, each pair = span) with rows [0 0 2 2; 1 2 - -; 0 0 - -] and a Label Index matrix (N x K/2) with rows [1 2; 3 -; 5 -]]
[Figure, Phase 1: the Spans matrix and Label Index are processed by UpdateLabel kernels launched as child threads; the Label Index shown afterwards has rows [1 2; 2 -; 5 -]]
Accelerated Connected Component Labeling
• Phase 0: Find Spans
  • Each span has two elements: (ystart, yend)

    $span_x = \{(y_{start}, y_{end}) \mid I(x, y_{start}) = I(x, y_{start}+1) = \dots = I(x, y_{end})\}$

  • A unique label is assigned immediately
[Figure: binary image (N x M) mapped to a Spans matrix with rows [0 0 2 2; 1 2 - -; 0 0 - -] and a Label Matrix with rows [1 2; 3 -; 5 -]]
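The span definition above can be illustrated with a small sequential sketch in Python. This is not the authors' CUDA kernel, which the slides do not show. The name `find_spans`, the list-of-rows image layout, and the labeling rule `x * (K/2) + k + 1` are assumptions; the rule was inferred from the example labels 1, 2, 3, 5 in the figure. The diagram suggests rows are handled by parallel threads, which the outer loop stands in for here.

```python
def find_spans(image):
    """Sequential stand-in for Phase 0 (Find Spans) on a binary image.

    image: list of rows, each a list of 0/1 values (assumed layout).
    Returns (spans, labels): for each row x, the (ystart, yend) pairs of
    its runs of 1s, and one label per span.
    """
    n_cols = len(image[0]) if image else 0
    # A row of M pixels holds at most ceil(M / 2) spans; this plays the
    # role of K/2 in the slides (assumption).
    max_spans = (n_cols + 1) // 2
    spans, labels = [], []
    for x, row in enumerate(image):
        row_spans = []
        y = 0
        while y < n_cols:
            if row[y]:
                start = y
                # Extend the span while the next pixel is also foreground.
                while y + 1 < n_cols and row[y + 1]:
                    y += 1
                row_spans.append((start, y))
            y += 1
        spans.append(row_spans)
        # Assumed rule: the label depends only on the span's slot, so every
        # row can assign labels without coordinating with other rows.
        labels.append([x * max_spans + k + 1 for k in range(len(row_spans))])
    return spans, labels


# A 3 x 3 image consistent with the spans shown in the figure.
example = [[1, 0, 1],
           [0, 1, 1],
           [1, 0, 0]]
spans, labels = find_spans(example)
# spans  == [[(0, 0), (2, 2)], [(1, 2)], [(0, 0)]]
# labels == [[1, 2], [3], [5]]
```

Because each label is derived from the span's position alone, no label counter has to be shared between rows, which is one plausible reading of "a unique label is assigned immediately".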
Accelerated Connected Component Labeling
• Phase 1: Merge Spans
[Figure: two panels, each showing a Merge Span parent kernel that tests every span ("Merge?"); on Yes it launches an Update Label child kernel, on No it moves to the next span.]
[Panel "One single update": Spans matrix [0 0 2 2; 1 2 - -; 0 0 - -], Label Matrix [1 2; 3 -; 5 -], resulting Label Matrix [1 2; 2 -; 5 -]]
[Panel "Multiples updates at the same time": Spans matrix [0 0 2 2; 0 0 2 2; 0 1 - -], Label Matrix [1 2; 1 2; 1 -], resulting Label Matrix [1 1; 1 1; 1 -]]
[Callout: Concurrent Kernels, multiple images at a time]
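A sequential sketch of the merge phase in Python, under stated assumptions. The slides describe a parent kernel that launches Update Label child kernels; that mechanism is replaced here by a plain union-find over labels, a swapped-in technique chosen only to show which spans end up sharing a label. The name `merge_spans`, the `diagonal` flag, and the adjacency test are hypothetical; the slides do not state the connectivity rule.

```python
def merge_spans(spans, labels, diagonal=False):
    """Sequential stand-in for Phase 1 (Merge Spans).

    spans[x]  : list of (ystart, yend) pairs for row x
    labels[x] : one initial label per span in row x
    diagonal  : if True, spans that touch only diagonally are also merged
                (assumed option; the slides do not state the rule)
    Returns the label lists with every span mapped to the smallest label
    of its connected component.
    """
    parent = {lab: lab for row in labels for lab in row}

    def find(lab):
        # Follow parent links to the representative, halving the path.
        while parent[lab] != lab:
            parent[lab] = parent[parent[lab]]
            lab = parent[lab]
        return lab

    reach = 1 if diagonal else 0
    for x in range(1, len(spans)):
        for (s1, e1), lab1 in zip(spans[x], labels[x]):
            for (s0, e0), lab0 in zip(spans[x - 1], labels[x - 1]):
                # Two spans in adjacent rows are connected when their
                # column ranges overlap (or touch, if diagonal is set).
                if s1 <= e0 + reach and s0 <= e1 + reach:
                    a, b = find(lab0), find(lab1)
                    if a != b:
                        parent[max(a, b)] = min(a, b)
    return [[find(lab) for lab in row] for row in labels]


# Spans and initial labels taken from the first example in the slides.
spans = [[(0, 0), (2, 2)], [(1, 2)], [(0, 0)]]
labels = [[1, 2], [3], [5]]
merged = merge_spans(spans, labels)
# merged == [[1, 2], [2], [5]]
```

With `diagonal=False` the result for this input agrees with the updated label matrix [1 2; 2 -; 5 -] in the slides; with `diagonal=True` the same input collapses to a single label. Which rule the authors' kernels apply is not stated in the deck.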
Performance Results
• Input Image:
  • DICOM format
  • Integer values [0 – 255]
  • More than 700 images (512 x 512 pixels)
Performance Results
• Pre-processing steps:
  • Background noise removal
  • Binary conversion
[Figure: original image next to the resulting binary image]
Performance Results
• Experimental Environment:
  • CPU
    • Intel Core i7-3779K processor
    • RAM: 8GB
  • GPU
    • GK 110 (NVIDIA GTX Titan)
    • Compute Capability 3.5
    • CUDA 5.5
  • gcc compiler 3.7
  • OpenMP 3.0
Performance Results
• One Image

| Method     | Running Time (s) | Speedup |
|------------|------------------|---------|
| CCL Serial | 0.25             | 1.00x   |
| CCL OpenMP | 0.18             | 1.39x   |
| ACCL       | 0.05             | 5.00x   |
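The speedup column is the serial running time divided by each method's running time. A short Python check, using only the times from the table above; the helper name `speedups` is hypothetical:

```python
def speedups(times, baseline="CCL Serial"):
    """Speedup of each method relative to the baseline running time."""
    base = times[baseline]
    return {method: round(base / t, 2) for method, t in times.items()}


# Running times in seconds for one image, as reported in the table.
one_image = {"CCL Serial": 0.25, "CCL OpenMP": 0.18, "ACCL": 0.05}
result = speedups(one_image)
# result == {"CCL Serial": 1.0, "CCL OpenMP": 1.39, "ACCL": 5.0}
```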
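Performance Results
• Multiple Images: Hyper-Q

| # Streams | CCL Serial (s) | ACCL (s) | Speedup |
|-----------|----------------|----------|---------|
| 1         | 0.25           | 0.05     | 5.00x   |
| 2         | 1.08           | 0.10     | 10.80x  |
| 3         | 2.16           | 0.14     | 15.36x  |
| 4         | 4.18           | 0.19     | 21.44x  |
| 5         | 6.09           | 0.23     | 25.91x  |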
Performance Results
• Comparison with Stava, O., Benes, B., “CCL in CUDA”

| Method                            | Mpixels/s | Speedup |
|-----------------------------------|-----------|---------|
| O. Stava, B. Benes, “CCL in CUDA” | 1542      | 1.0x    |
| ACCL                              | 5242      | 3.3x    |
Conclusions
• Described Accelerated Connected Component Labeling (ACCL) using the CUDA framework
• Presented an evaluation of new features of the NVIDIA Kepler GPU: dynamic parallelism and Hyper-Q
• Compared serial CCL and OpenMP CCL with ACCL
  • Our algorithm scales well as the number of streams increases
• Dynamic parallelism turns out to be a disadvantage when a larger number of child kernels is launched
Thanks
Questions?