compression on...

25
S5455, GTC 2015 High Capability Multidimensional Data High Capability Multidimensional Data Compression on GPUs Sergio E. Zarantonello [email protected] Ed Karrels [email protected] School of Engineering, Santa Clara University

Upload: others

Post on 23-May-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Compression on GPUson-demand.gputechconf.com/gtc/2015/presentation/S5455-Sergio-Zarantonello.pdfS5455, GTC 2015 High Capability Multidimensional Data Compression on GPUs Sergio E

S5455, GTC 2015

High Capability Multidimensional DataHigh Capability Multidimensional Data Compression on GPUs

Sergio E. [email protected]

Ed [email protected]

School of Engineering, Santa Clara University

Page 2: Compression on GPUson-demand.gputechconf.com/gtc/2015/presentation/S5455-Sergio-Zarantonello.pdfS5455, GTC 2015 High Capability Multidimensional Data Compression on GPUs Sergio E

Part 1: Theory and Applications

Challenge• Massive amounts of multidimensional data being generated by

scientific simulations, monitoring devices, and high-end imaging li ti

g

applications.• Growing inability of current networks, conventional computer

hardware and software, to transmit, store, and analyze this data., , , y

Solution• Fast and effective lossy data compression• Fast and effective lossy data compression.• Optimized compression ratios subject to a priori set error bounds,

requiring several iterations of compress/decompress cycle.• GPUs to make the above feasible.

Page 3: Compression on GPUson-demand.gputechconf.com/gtc/2015/presentation/S5455-Sergio-Zarantonello.pdfS5455, GTC 2015 High Capability Multidimensional Data Compression on GPUs Sergio E

Part 1: Theory and Applications

Our goalg• A multidimensional wavelet-based CODEC for large data.• A discrete optimization procedure to provide best compression ratios p p p p

subject to error tolerances and error metrics specified by user • A high performance CUDA implementation for large data, exploiting

parallelism at various levelsparallelism at various levels.• A flexible design allowing for future enhancements (redundant

bases, adaptive dictionaries, compressive sensing, sparse representations, etc.)

• An initial focus on Medical Computed Tomography, Seismic Imaging, and Non-Destructive Testing of Materialsand Non-Destructive Testing of Materials.

Page 4: Compression on GPUson-demand.gputechconf.com/gtc/2015/presentation/S5455-Sergio-Zarantonello.pdfS5455, GTC 2015 High Capability Multidimensional Data Compression on GPUs Sergio E

Part 1: Theory and Applications

Why wavelets ?y• Wavelets are “short” waves “localized” in

both, spatial and frequency domains. p q y• Can be used as basis functions for sparse

representations of data.• Give compact representations of well-behaved data and point

singularities. • Multidimensional wavelets take advantage of data correlation alongMultidimensional wavelets take advantage of data correlation along

all coordinate axes.• Wavelet encoding/decoding can be implemented with fast

algorithmsalgorithms.

Page 5: Compression on GPUson-demand.gputechconf.com/gtc/2015/presentation/S5455-Sergio-Zarantonello.pdfS5455, GTC 2015 High Capability Multidimensional Data Compression on GPUs Sergio E

Part 1: Theory and Applications

Conventional 2d procedurep

Page 6: Compression on GPUson-demand.gputechconf.com/gtc/2015/presentation/S5455-Sergio-Zarantonello.pdfS5455, GTC 2015 High Capability Multidimensional Data Compression on GPUs Sergio E

Part 1: Theory and Applications

Designg• Data decomposed into overlapping cubelets.• Cubelets encoded via biorthogonal wavelet filters along each g g

coordinate axis.• Wavelet coefficients are thresholded, then quantized.• Quantized cubelets are Huffman encoded. • Process is “reversed” to reconstruct the data.• “Hill Climbing” algorithm is implemented to deliver highest

compression possible subject to error constraint(s).

Page 7: Compression on GPUson-demand.gputechconf.com/gtc/2015/presentation/S5455-Sergio-Zarantonello.pdfS5455, GTC 2015 High Capability Multidimensional Data Compression on GPUs Sergio E

2d (frame‐by‐frame) versus 3d procedure   

Part 1: Theory and Applications

6( y ) p

5

6

X-Ray CT scan10 steps of the cardiac cycle

Error

3

4

y512 x 512 x 96 cubehttp://www.osirix-viewer.com

2

0

1

2d procedure 3d procedureSame error rate: PSNR=46

Compression Ratio=6.6Cutoff=88% Bins=1106Max Error= 9.45

Compression Ratio=10.2Cutoff=92% Bins=850Max Error = 5.68

Page 8: Compression on GPUson-demand.gputechconf.com/gtc/2015/presentation/S5455-Sergio-Zarantonello.pdfS5455, GTC 2015 High Capability Multidimensional Data Compression on GPUs Sergio E

Part 1: Theory and Applications

Outline of 3d procedure

p

zy

x zy

xxyz

Page 9: Compression on GPUson-demand.gputechconf.com/gtc/2015/presentation/S5455-Sergio-Zarantonello.pdfS5455, GTC 2015 High Capability Multidimensional Data Compression on GPUs Sergio E

Part 1: Theory and Applications

Optimized compression for given error tolerance

p p g

1 C l l t l t ffi i tError Check

1. Calculate wavelet coefficients 2. Find starting compression parameters 3. Calculate reconstruction error4. Calculate compression Ratio4. Calculate compression Ratio 5. “Hill Climbing” iterations to find maximum compression ratio subject to error tolerance

Page 10: Compression on GPUson-demand.gputechconf.com/gtc/2015/presentation/S5455-Sergio-Zarantonello.pdfS5455, GTC 2015 High Capability Multidimensional Data Compression on GPUs Sergio E

Optimized compression for given error tolerance

Part 1: Theory and Applications

p p g

54

56

12

14

1600

1800

2000

46

48

50

52

4

6

8

10

1000

1200

1400

95

42

44

46

70 75 80 85 90 95

5001000

15002000

25002

476 78 80 82 84 86 88 90 92800

% Cutoff

Cutoff         Bins  

76 %         1850

92 %         850

Begin End 60 65 700

500

PSNRRatio 

1850 52.43.6 X

850 46.2

10.2 X

Page 11: Compression on GPUson-demand.gputechconf.com/gtc/2015/presentation/S5455-Sergio-Zarantonello.pdfS5455, GTC 2015 High Capability Multidimensional Data Compression on GPUs Sergio E

Applications: Optical Coherence Tomography

Part 1: Theory and Applications

pp p g p yObjective: efficient transfer over the internet of high-resolution 3d images of retina for diagnosis.

Dataset courtesy of Quinglin Wang Carl Zeiss Meditec Inc.

100

0200

250

200

300

0

0

100

150

400

500

0

050

00

100 200 300 400 500

600 100 200 300 400 500

0 0

Page 12: Compression on GPUson-demand.gputechconf.com/gtc/2015/presentation/S5455-Sergio-Zarantonello.pdfS5455, GTC 2015 High Capability Multidimensional Data Compression on GPUs Sergio E

Applications: Reverse Time Migration

Part 1: Theory and Applications

pp g

502

3x 10-3

100

1501

2

200

250-1

0

300

350-2

0

0

0

0

100 200 300 400 500 600 700 800

400 -3 0

0

0

0

Page 13: Compression on GPUson-demand.gputechconf.com/gtc/2015/presentation/S5455-Sergio-Zarantonello.pdfS5455, GTC 2015 High Capability Multidimensional Data Compression on GPUs Sergio E

Part 2: Implementationp• Stages of compression 3D CT X‐Ray Inspection

of a carburetor Dataset courtesy of

– Wavelet transform– Threshold

of a carburetor. Dataset courtesy of North Star Imaging

– Threshold– Quantization– Huffman coding

• Overall speedupp p

Page 14: Compression on GPUson-demand.gputechconf.com/gtc/2015/presentation/S5455-Sergio-Zarantonello.pdfS5455, GTC 2015 High Capability Multidimensional Data Compression on GPUs Sergio E

Wavelet Transform on GPUPart 2: Implementation

• Apply convolution• Each row is independent• Within each row, multiple read / write passes

…Before

• 1 row == 1 thread blockodds evens

After

• Synchronize between read & write

Page 15: Compression on GPUson-demand.gputechconf.com/gtc/2015/presentation/S5455-Sergio-Zarantonello.pdfS5455, GTC 2015 High Capability Multidimensional Data Compression on GPUs Sergio E

3d Wavelet TransformPart 2: Implementation

Thread block 0,2

Thread block 0,1

,

Thread block 0,0

Thread block 1,0

Thread block 2,0

Thread block 3,0

Height × depth rows, each one is an independent thread block.

Page 16: Compression on GPUson-demand.gputechconf.com/gtc/2015/presentation/S5455-Sergio-Zarantonello.pdfS5455, GTC 2015 High Capability Multidimensional Data Compression on GPUs Sergio E

Transform Along Each AxisPart 2: Implementation

gX

Transpose XYZ→ YZX

Transpose YZX→ ZXYYYY

ZZZ

Page 17: Compression on GPUson-demand.gputechconf.com/gtc/2015/presentation/S5455-Sergio-Zarantonello.pdfS5455, GTC 2015 High Capability Multidimensional Data Compression on GPUs Sergio E

GPU TransposePart 2: Implementation

p• Access global memory in contiguous order

read write0 1 2 3

4 5 6 7

8 9 10 11

read0 1 2 3

4 5 6 7

8 9 10 11

write(thread indices) Contiguous

write to globalmemory

Globalmemory

Sh d

8 9 10 11

12 13 14 15

write

8 9 10 11

12 13 14 15

memory

0 1 2 3

4 5 6 7

8 9 10 11

Noncontiguousread from sharedmemory

Sharedmemory

write0 4 8 12

1 5 9 13

2 6 10 14

read

8 9 10 11

12 13 14 15

memory2 6 10 14

3 7 11 15

Page 18: Compression on GPUson-demand.gputechconf.com/gtc/2015/presentation/S5455-Sergio-Zarantonello.pdfS5455, GTC 2015 High Capability Multidimensional Data Compression on GPUs Sergio E

OptimizationsPart 2: Implementation

pVersion 1: Global memoryVersion 2: Shared memory

• 2.5× speedupp p

Version 3: Constant factors double → float• 1 6× speedup #define FILTER_0  .8521.6× speedup

Speedup over CPU version: 105x

_#define FILTER_1  .377#define FILTER_2 ‐.110

(860ms → 8.2ms for 256x256x256 cubelet, 8 levels of transforms along each axis)

#define FILTER_0  .852f#define FILTER_1  .377f#define FILTER_2 ‐.110f

Page 19: Compression on GPUson-demand.gputechconf.com/gtc/2015/presentation/S5455-Sergio-Zarantonello.pdfS5455, GTC 2015 High Capability Multidimensional Data Compression on GPUs Sergio E

ThresholdPart 2: Implementation

• Trim smallest x% of values – round to 0

[‐5 .. +5][ 5 .. 5]

• Just sort absolute values using Thrust libraryJust sort absolute values using Thrust library• Speedup over CPU sort: 112× (7.0 toolkit is 35% faster than 6.5)

Page 20: Compression on GPUson-demand.gputechconf.com/gtc/2015/presentation/S5455-Sergio-Zarantonello.pdfS5455, GTC 2015 High Capability Multidimensional Data Compression on GPUs Sergio E

QuantizationPart 2: Implementation

QMap floating point values to small set of integers

• Log : bin size near x proportional to x– Matches data distribution well– Simple function; fast

• Lloyd's algorithm• Lloyd s algorithm– Given starting bins, fine‐tune to minimize overall error– Start with log quantization binsg q– Multiple passes over full data set, time‐consuming

Page 21: Compression on GPUson-demand.gputechconf.com/gtc/2015/presentation/S5455-Sergio-Zarantonello.pdfS5455, GTC 2015 High Capability Multidimensional Data Compression on GPUs Sergio E

Log / Lloyd QuantizationPart 2: Implementation

g / y Q

Log quantization, pSNR 38.269 GPU speedup: 97× (thrust::transform())

Lloyd quantization, pSNR 45.974 GPU speedup: create 13×, apply 48×

Page 22: Compression on GPUson-demand.gputechconf.com/gtc/2015/presentation/S5455-Sergio-Zarantonello.pdfS5455, GTC 2015 High Capability Multidimensional Data Compression on GPUs Sergio E

Huffman EncodingPart 2: Implementation

g• Optimal bit encoding based on value frequencies• Compute histogram on CPU 

– Copy data GPU → CPU: 17msValue Count Encoding

9 16609445 1

8 46198 011

– Compute on CPU: 27ms• Compute histogram on GPU

8 46198 011

10 42896 001

11 32594 000Compute histogram on GPU– No copy needed– Compute: .61ms

7 30831 0101

12 6942 01000

6 5388 010011Compute:  .61ms– Optimization: per‐thread counter for common value

Page 23: Compression on GPUson-demand.gputechconf.com/gtc/2015/presentation/S5455-Sergio-Zarantonello.pdfS5455, GTC 2015 High Capability Multidimensional Data Compression on GPUs Sergio E

Overall CPU→ GPU speedupPart 2: Implementation

CPUWavelet transform

Sort

p p

0 500 1000 1500 2000 2500

GPU

Milliseconds

Quantize

Histogram

GPU: GTX 980CPU I l C i7 3 5GH0 5 10 15 20 25

GPU

CPU: Intel Core i7 3.5GHz0 5 10 15 20 25

MATLAB CPU GPU

Compress 43000 2300 21

Error control 39000 1400 18

time to process 2563 cubelet, in milliseconds

Page 24: Compression on GPUson-demand.gputechconf.com/gtc/2015/presentation/S5455-Sergio-Zarantonello.pdfS5455, GTC 2015 High Capability Multidimensional Data Compression on GPUs Sergio E

Future DirectionsPart 2: Implementation

• Improve performance– Use subsample for training Lloyd's– Use Quickselect to find threshold value– Multiple GPUs

• Improve accuracyp y– Weighted values in Lloyd's algorithm– Normalize values in each quadrantq

Page 25: Compression on GPUson-demand.gputechconf.com/gtc/2015/presentation/S5455-Sergio-Zarantonello.pdfS5455, GTC 2015 High Capability Multidimensional Data Compression on GPUs Sergio E

Our Team

Sergio ZarantonelloSanta Clara Universityszarantonello@scu edu

Ed KarrelsSanta Clara University,[email protected]

Drazen FabrisSanta Clara [email protected]@scu.edu

David ConchaUniversidad Rey Juan Carlos, Spain

[email protected] Anupam GoyalAlgorithmica LLC

[email protected]

Bonnie SmithsonSanta Clara University

[email protected]