Columnstore indexes - best practices for the ETL process - Damian Widera

@ITCAMPRO #ITCAMP17 Community Conference for IT Professionals Columnstore indexes – best practices for the ETL process Damian Widera Microsoft Data Platform MVP EUVIC @damianwidera http://sqlblog.com/blogs/damian_widera/default.aspx


Post on 18-Mar-2018


Page 1: Columnstore indexes - best practices for the ETL process - Damian Widera

Columnstore indexes – best practices for the ETL process

Damian Widera

Microsoft Data Platform MVP

EUVIC

@damianwidera

http://sqlblog.com/blogs/damian_widera/default.aspx

Page 2: Columnstore indexes - best practices for the ETL process - Damian Widera

Many thanks to our sponsors & partners!

Page 3: Columnstore indexes - best practices for the ETL process - Damian Widera


Visit Poland this autumn – 16th September

Page 4: Columnstore indexes - best practices for the ETL process - Damian Widera


Damian Widera

Project Manager & Technical Lead | EUVIC (www.euvic.pl)

MVP | MCT | MCSE | MCITP


+48 665-229-227

@damian.widera

facebook.com/damian.widera.10

http://sqlblog.com/blogs/damian_widera/default.aspx

Channel9

MVA courses

Page 5: Columnstore indexes - best practices for the ETL process - Damian Widera

EUVIC

PALO ALTO, NEW YORK, WARSAW, KATOWICE, GLIWICE, BIELSKO-BIAŁA, WROCŁAW, CZĘSTOCHOWA, GDYNIA, KRAKÓW, BYDGOSZCZ, VIENNA, BIAŁYSTOK

Page 6: Columnstore indexes - best practices for the ETL process - Damian Widera


Customers…

Page 7: Columnstore indexes - best practices for the ETL process - Damian Widera

What and how?

• Introduction to columnstore indexes (CI)

• Three important perspectives on the Clustered Columnstore Index (CCI):

– How to load data efficiently

– How to use the index efficiently

– How to maintain it efficiently

• Internals

Page 8: Columnstore indexes - best practices for the ETL process - Damian Widera

Anatomy of a columnstore index

• Traditional (rowstore) clustered index

Saledate Product Amt GrossPrice SalesTax NetPrice ...

2012-03-08 Candy bar 50 75.00 14.25 89.25 ...

2012-03-10 Smart phone 1 349.50 66.41 419.91 ...

2012-03-11 Apple (bag) 7 31.57 1.89 33.46 ...

2012-03-12 Smart phone 1 349.50 66.41 419.91 ...

2012-03-19 Chair 1 599.50 113.91 713.41 ...

2012-03-20 Chair 3 1,798.50 341.72 2,140.22 ...

2012-03-20 Laptop 2 2,860.00 543.40 3,403.40 ...

2012-03-20 Toy car 3 29.97 5.69 35.66 ...

2012-03-21 Apple (bag) 14 63.14 3.79 66.93 ...

2012-03-24 Pocket knife 1 12.95 2.46 15.41 ...

2012-03-27 Apple (bag) 2 9.02 0.54 9.56 ...

2012-03-28 Candy bar 5 7.50 1.43 8.93 ...

Page 9: Columnstore indexes - best practices for the ETL process - Damian Widera

Anatomy of a columnstore index

• Traditional (rowstore) nonclustered index

Saledate Product Amt GrossPrice SalesTax NetPrice ...

2012-03-08 Candy bar 50 75.00 14.25 89.25 ...

2012-03-10 Smart phone 1 349.50 66.41 419.91 ...

2012-03-11 Apple (bag) 7 31.57 1.89 33.46 ...

2012-03-12 Smart phone 1 349.50 66.41 419.91 ...

2012-03-19 Chair 1 599.50 113.91 713.41 ...

2012-03-20 Chair 3 1,798.50 341.72 2,140.22 ...

2012-03-20 Laptop 2 2,860.00 543.40 3,403.40 ...

2012-03-20 Toy car 3 29.97 5.69 35.66 ...

2012-03-21 Apple (bag) 14 63.14 3.79 66.93 ...

2012-03-24 Pocket knife 1 12.95 2.46 15.41 ...

2012-03-27 Apple (bag) 2 9.02 0.54 9.56 ...

2012-03-28 Candy bar 5 7.50 1.43 8.93 ...

Saledate Amt NetPrice

2012-03-08 50 89.25

2012-03-10 1 419.91

2012-03-11 7 33.46

2012-03-12 1 419.91

2012-03-19 1 713.41

2012-03-20 3 2,140.22

2012-03-20 2 3,403.40

2012-03-20 3 35.66

2012-03-21 14 66.93

2012-03-24 1 15.41

2012-03-27 2 9.56

2012-03-28 5 8.93

Saledate Amt NetPrice

2012-04-08 50 89.25

2012-04-10 1 419.91

2012-04-11 7 33.46

2012-04-12 1 419.91

2012-04-19 1 713.41

2012-04-20 3 2,140.22

2012-04-20 2 3,403.40

2012-04-20 3 35.66

2012-04-21 14 66.93

2012-04-24 1 15.41

2012-04-27 2 9.56

2012-04-28 5 8.93

Page 10: Columnstore indexes - best practices for the ETL process - Damian Widera

How Row Mode Works

• Each operator calls its child once per row to "pull" the next row

• Works fine for smaller queries

• Each operator transition often causes L2 cache misses while instructions and data are loaded

• When databases were new, the cost of I/O was MUCH larger than the cost of CPU work, so this never mattered

• Now the equation has changed

Diagram: Project, Filter and Table Scan operators stacked on top of each other; each operator calls GetRow() on its child and one row is returned per call.
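The pull model described on this slide can be sketched with Python generators. This is only an illustration of the row-at-a-time calling pattern, not SQL Server code: the function names `table_scan`, `filter_rows` and `project` are made up for the example, and the three rows are taken from the sample table on the earlier slides.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def table_scan(rows: List[Row]) -> Iterator[Row]:
    """Leaf operator: hands out one stored row per request."""
    for row in rows:
        yield row


def filter_rows(child: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Pulls rows from its child one at a time and passes on the matches."""
    for row in child:
        if predicate(row):
            yield row


def project(child: Iterable[Row], columns: List[str]) -> Iterator[Row]:
    """Pulls rows from its child and keeps only the requested columns."""
    for row in child:
        yield {name: row[name] for name in columns}


# Three rows from the sample sales table (hypothetical subset).
sales = [
    {"Product": "Candy bar", "Amt": 50, "NetPrice": 89.25},
    {"Product": "Chair", "Amt": 1, "NetPrice": 713.41},
    {"Product": "Laptop", "Amt": 2, "NetPrice": 3403.40},
]

# Project <- Filter <- Table Scan, as in the slide's diagram.
plan = project(filter_rows(table_scan(sales), lambda r: r["Amt"] < 10),
               ["Product", "NetPrice"])

first = next(plan)   # a single "GetRow()" request travels down the whole chain
rest = list(plan)    # every further row repeats the same chain of calls
print(first)
print(rest)
```

Every row that reaches the top costs one call per operator, which is the per-row overhead the slide refers to.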

Page 11: Columnstore indexes - best practices for the ETL process - Damian Widera


Saledate Product Amt GrossPrice SalesTax NetPrice ...

2012-03-08 Candy bar 50 75.00 14.25 89.25 ...

2012-03-10 Smart phone 1 349.50 66.41 419.91 ...

2012-03-11 Apple (bag) 7 31.57 1.89 33.46 ...

2012-03-12 Smart phone 1 349.50 66.41 419.91 ...

2012-03-19 Chair 1 599.50 113.91 713.41 ...

2012-03-20 Chair 3 1,798.50 341.72 2,140.22 ...

2012-03-20 Laptop 2 2,860.00 543.40 3,403.40 ...

2012-03-20 Toy car 3 29.97 5.69 35.66 ...

2012-03-21 Apple (bag) 14 63.14 3.79 66.93 ...

2012-03-24 Pocket knife 1 12.95 2.46 15.41 ...

2012-03-27 Apple (bag) 2 9.02 0.54 9.56 ...

2012-03-28 Candy bar 5 7.50 1.43 8.93 ...

Anatomy of a columnstore index

• Columnstore index

Saledate

2012-03-08

2012-03-10

2012-03-11

2012-03-12

2012-03-19

2012-03-20

2012-03-20

2012-03-20

2012-03-21

2012-03-24

2012-03-27

2012-03-28

1 million row chunks

Storage in LOB pages

Page 12: Columnstore indexes - best practices for the ETL process - Damian Widera


Saledate Product

2012-03-08 Candy bar

2012-03-10 Smart phone

2012-03-11 Apple (bag)

2012-03-12 Smart phone

2012-03-19 Chair

2012-03-20 Chair

2012-03-20 Laptop

2012-03-20 Toy car

2012-03-21 Apple (bag)

2012-03-24 Pocket knife

2012-03-27 Apple (bag)

2012-03-28 Candy bar

Anatomy of a columnstore index

• Nonclustered columnstore index


Page 13: Columnstore indexes - best practices for the ETL process - Damian Widera

Anatomy of a columnstore index

• Nonclustered columnstore index

Saledate Product Amt

2012-03-08 Candy bar 50

2012-03-10 Smart phone 1

2012-03-11 Apple (bag) 7

2012-03-12 Smart phone 1

2012-03-19 Chair 1

2012-03-20 Chair 3

2012-03-20 Laptop 2

2012-03-20 Toy car 3

2012-03-21 Apple (bag) 14

2012-03-24 Pocket knife 1

2012-03-27 Apple (bag) 2

2012-03-28 Candy bar 5

Page 14: Columnstore indexes - best practices for the ETL process - Damian Widera

Anatomy of a columnstore index

• Nonclustered columnstore index

Saledate Product Amt GrossPrice

2012-03-08 Candy bar 50 75.00

2012-03-10 Smart phone 1 349.50

2012-03-11 Apple (bag) 7 31.57

2012-03-12 Smart phone 1 349.50

2012-03-19 Chair 1 599.50

2012-03-20 Chair 3 1,798.50

2012-03-20 Laptop 2 2,860.00

2012-03-20 Toy car 3 29.97

2012-03-21 Apple (bag) 14 63.14

2012-03-24 Pocket knife 1 12.95

2012-03-27 Apple (bag) 2 9.02

2012-03-28 Candy bar 5 7.50

Page 15: Columnstore indexes - best practices for the ETL process - Damian Widera

Anatomy of a columnstore index

• Nonclustered columnstore index

Saledate Product Amt GrossPrice SalesTax

2012-03-08 Candy bar 50 75.00 14.25

2012-03-10 Smart phone 1 349.50 66.41

2012-03-11 Apple (bag) 7 31.57 1.89

2012-03-12 Smart phone 1 349.50 66.41

2012-03-19 Chair 1 599.50 113.91

2012-03-20 Chair 3 1,798.50 341.72

2012-03-20 Laptop 2 2,860.00 543.40

2012-03-20 Toy car 3 29.97 5.69

2012-03-21 Apple (bag) 14 63.14 3.79

2012-03-24 Pocket knife 1 12.95 2.46

2012-03-27 Apple (bag) 2 9.02 0.54

2012-03-28 Candy bar 5 7.50 1.43

Page 16: Columnstore indexes - best practices for the ETL process - Damian Widera

Anatomy of a columnstore index

• Nonclustered columnstore index

Saledate Product Amt GrossPrice SalesTax NetPrice

2012-03-08 Candy bar 50 75.00 14.25 89.25

2012-03-10 Smart phone 1 349.50 66.41 419.91

2012-03-11 Apple (bag) 7 31.57 1.89 33.46

2012-03-12 Smart phone 1 349.50 66.41 419.91

2012-03-19 Chair 1 599.50 113.91 713.41

2012-03-20 Chair 3 1,798.50 341.72 2,140.22

2012-03-20 Laptop 2 2,860.00 543.40 3,403.40

2012-03-20 Toy car 3 29.97 5.69 35.66

2012-03-21 Apple (bag) 14 63.14 3.79 66.93

2012-03-24 Pocket knife 1 12.95 2.46 15.41

2012-03-27 Apple (bag) 2 9.02 0.54 9.56

2012-03-28 Candy bar 5 7.50 1.43 8.93

Page 17: Columnstore indexes - best practices for the ETL process - Damian Widera

An Aside… How CPUs Work

• Modern CPUs have multiple cores

• Cache hierarchies: L1, L2, L3

– Small L1 and L2 per core; L3 shared by all cores on die

– L1 is faster than L2, L2 faster than L3

– CPUs can stall waiting for caches to load

Diagram: each CPU core has its own L1 instruction cache (32 KB), L1 data cache (32 KB) and Level 2 cache (100s of kilobytes); the cores share the Level 3 cache (megabytes).

Time to access increases with each level you need to touch!

Page 18: Columnstore indexes - best practices for the ETL process - Damian Widera

Batch Model

• Move from "pull" model to "push"

• Group rows into batches

– Re-use instructions while in cache

– Touch all "close" data in each operator

• This model reduces L2 cache misses

• It works best for queries with lots of rows being processed

Diagram: Project, Filter and Table Scan operators stacked on top of each other, connected by ProcessBatch() calls.
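The batch idea can be sketched in Python as follows. This is a toy model under stated assumptions, not the engine's implementation: `scan_batches`, `filter_batch` and `run` are hypothetical names, a batch size of 4 is chosen only to keep the example small, and the values are the Amt and NetPrice columns of the sample table from the earlier slides.

```python
from typing import Callable, Dict, Iterator, List

Batch = Dict[str, List[float]]  # column name -> values for the rows in one batch


def scan_batches(columns: Batch, batch_size: int) -> Iterator[Batch]:
    """Hands out the table in batches of rows instead of single rows."""
    total_rows = len(next(iter(columns.values())))
    for start in range(0, total_rows, batch_size):
        yield {name: values[start:start + batch_size]
               for name, values in columns.items()}


def filter_batch(batch: Batch, column: str,
                 predicate: Callable[[float], bool]) -> Batch:
    """Applies the predicate to a whole batch in one operator call."""
    keep = [i for i, value in enumerate(batch[column]) if predicate(value)]
    return {name: [values[i] for i in keep] for name, values in batch.items()}


def run(columns: Batch, batch_size: int):
    """Sums NetPrice for rows with Amt < 10 and counts operator calls."""
    calls = 0
    total = 0.0
    for batch in scan_batches(columns, batch_size):
        calls += 1  # one call handles batch_size rows
        kept = filter_batch(batch, "Amt", lambda amt: amt < 10)
        total += sum(kept["NetPrice"])
    return calls, total


data = {
    "Amt": [50, 1, 7, 1, 1, 3, 2, 3, 14, 1, 2, 5],
    "NetPrice": [89.25, 419.91, 33.46, 419.91, 713.41, 2140.22,
                 3403.40, 35.66, 66.93, 15.41, 9.56, 8.93],
}
calls, total = run(data, batch_size=4)
print(calls)  # 3 batches for 12 rows, instead of 12 per-row calls
```

The same tight loop runs over many values per call, which is why the instructions and the data it touches stay in cache.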

Page 19: Columnstore indexes - best practices for the ETL process - Damian Widera

Columnstore refresher: how is it different?

Diagram: data stored as rows versus data stored as columns (C1 C2 C3 C4 C5)

Benefits:

• Improved compression: data from the same domain compresses better

• Reduced I/O: fetch only the columns needed

• Improved performance: more data fits in memory

Page 20: Columnstore indexes - best practices for the ETL process - Damian Widera

ColumnStore Terminology

Diagram: columns C1 to C6, with one row group and one column segment marked

• Column segment: contains values from one column for a set of rows

• Row group: the segments for the same set of rows comprise a row group

• Segments are compressed

• Each segment is stored in a separate LOB

• A segment is the unit of transfer between disk and memory
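As a rough illustration of these terms, the sketch below splits a handful of rows into row groups and stores each column of a group as its own segment. It is a simplified model only: `build_row_groups` is a hypothetical helper, the row group size of 3 is a toy value, and compression and LOB storage are left out.

```python
from typing import Dict, List, Sequence, Tuple

ColumnSegments = Dict[str, List[object]]  # column name -> segment values


def build_row_groups(rows: Sequence[Tuple[object, ...]],
                     column_names: Sequence[str],
                     rows_per_group: int) -> List[ColumnSegments]:
    """Splits rows into row groups; each group holds one segment per column."""
    row_groups: List[ColumnSegments] = []
    for start in range(0, len(rows), rows_per_group):
        chunk = rows[start:start + rows_per_group]
        # Pivot the chunk: one list of values (a segment) per column.
        segments = {name: [row[i] for row in chunk]
                    for i, name in enumerate(column_names)}
        row_groups.append(segments)
    return row_groups


# First six rows of the sample table from the earlier slides.
rows = [
    ("2012-03-08", "Candy bar", 50),
    ("2012-03-10", "Smart phone", 1),
    ("2012-03-11", "Apple (bag)", 7),
    ("2012-03-12", "Smart phone", 1),
    ("2012-03-19", "Chair", 1),
    ("2012-03-20", "Chair", 3),
]
row_groups = build_row_groups(rows, ["Saledate", "Product", "Amt"],
                              rows_per_group=3)
print(len(row_groups))       # 2 row groups
print(row_groups[0]["Amt"])  # the Amt segment of the first row group
```

A query that needs only Amt would read one segment per row group and leave the Saledate and Product segments untouched.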

Page 21: Columnstore indexes - best practices for the ETL process - Damian Widera


First – quick recap of the CCI


Page 23: Columnstore indexes - best practices for the ETL process - Damian Widera

Columnstore Index – segment elimination

SELECT ProductKey, SUM(SalesAmount)
FROM dbo.FactInternetSales
WHERE OrderDateKey < 20101108
GROUP BY ProductKey

Diagram labels: Column elimination, Segment elimination

First set of segments:

OrderDateKey ProductKey StoreKey RegionKey Quantity SalesAmount

20101107 106 01 1 6 30.00

20101107 103 04 2 1 17.00

20101107 109 04 2 2 20.00

20101107 103 03 2 1 17.00

20101107 106 05 3 4 20.00

20101108 106 02 1 5 25.00

Second set of segments:

OrderDateKey ProductKey StoreKey RegionKey Quantity SalesAmount

20101108 102 02 1 1 14.00

20101108 106 03 2 5 25.00

20101108 109 01 1 1 10.00

20101109 106 04 2 4 20.00

20101109 106 04 2 5 25.00

20101109 103 01 1 1 17.00
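The two eliminations on this slide can be modelled in a few lines of Python using the slide's own values. This is a sketch under assumptions, not the engine's algorithm: `sum_sales_before` is a made-up name, and the per-group minimum is computed here only to stand in for the minimum and maximum values that are kept as metadata for each segment.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Only the three columns the query touches are loaded (column elimination);
# StoreKey, RegionKey and Quantity are never read.
row_groups = [
    {"OrderDateKey": [20101107] * 5 + [20101108],
     "ProductKey": [106, 103, 109, 103, 106, 106],
     "SalesAmount": [30.0, 17.0, 20.0, 17.0, 20.0, 25.0]},
    {"OrderDateKey": [20101108] * 3 + [20101109] * 3,
     "ProductKey": [102, 106, 109, 106, 106, 103],
     "SalesAmount": [14.0, 25.0, 10.0, 20.0, 25.0, 17.0]},
]

# Stand-in for stored segment metadata: smallest OrderDateKey per row group.
min_date_keys = [min(group["OrderDateKey"]) for group in row_groups]


def sum_sales_before(groups: List[Dict[str, list]], min_keys: List[int],
                     date_key: int) -> Tuple[Dict[int, float], int]:
    """SUM(SalesAmount) per ProductKey for OrderDateKey < date_key."""
    totals: Dict[int, float] = defaultdict(float)
    scanned = 0
    for group, min_key in zip(groups, min_keys):
        if min_key >= date_key:
            continue  # segment elimination: no row in this group can qualify
        scanned += 1
        for date, product, amount in zip(group["OrderDateKey"],
                                         group["ProductKey"],
                                         group["SalesAmount"]):
            if date < date_key:
                totals[product] += amount
    return dict(totals), scanned


totals, scanned = sum_sales_before(row_groups, min_date_keys, 20101108)
print(totals)   # {106: 50.0, 103: 34.0, 109: 20.0}
print(scanned)  # 1: the second set of segments is skipped entirely
```

The skip only helps when the data is loaded in an order that keeps date ranges from overlapping between row groups, which is one reason load order matters for the ETL process.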


Page 25: Columnstore indexes - best practices for the ETL process - Damian Widera

How to load data into the CCI without getting into trouble

Initial situation: the table is a heap

– (1) Use INSERT ... SELECT and then create the CCI

– (2) Use BULK LOAD and then create the CCI

– (3) Use SELECT * INTO and then create the CCI

Initial situation: the table already has a CCI

– (1) Use INSERT ... SELECT

– (2) Use BULK LOAD


Page 27: Columnstore indexes - best practices for the ETL process - Damian Widera

How to load data to the CCI – BONUS!!!

• The "magic number" described by Niko Neugebauer: 102,400 (the minimum bulk-load batch size that is compressed directly into a row group)

• There is also another magic number: 1,048,576 (the maximum number of rows in a row group)
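The slide only names the two numbers. As far as Microsoft's data loading guidance describes it, a bulk-load batch of at least 102,400 rows is compressed straight into row groups of at most 1,048,576 rows, while smaller batches (and small leftovers) land in the delta store first. The sketch below models that rule; `plan_bulk_load` and the constant names are hypothetical, and real row groups can come out smaller, for example under memory pressure.

```python
from typing import List, Tuple

MAX_ROWGROUP_ROWS = 1_048_576       # upper limit of rows in one row group
MIN_DIRECT_COMPRESS_ROWS = 102_400  # smallest batch compressed directly


def plan_bulk_load(batch_rows: int) -> Tuple[List[int], int]:
    """Returns (compressed row group sizes, rows sent to the delta store)."""
    if batch_rows < 0:
        raise ValueError("batch_rows must not be negative")
    full_groups, remainder = divmod(batch_rows, MAX_ROWGROUP_ROWS)
    compressed = [MAX_ROWGROUP_ROWS] * full_groups
    delta_rows = 0
    if remainder >= MIN_DIRECT_COMPRESS_ROWS:
        compressed.append(remainder)  # large enough to be compressed directly
    else:
        delta_rows = remainder        # too small: waits in the delta store
    return compressed, delta_rows


print(plan_bulk_load(102_399))    # ([], 102399)
print(plan_bulk_load(102_400))    # ([102400], 0)
print(plan_bulk_load(1_048_577))  # ([1048576], 1)
```

For an ETL job this suggests choosing batch sizes at or above 102,400 rows, and ideally close to a multiple of 1,048,576, so that rows skip the delta store.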


Page 29: Columnstore indexes - best practices for the ETL process - Damian Widera

How to use the index

• Don't use it in an OLTP scenario – but WHY NOT?

• Update or Insert + Delete?

• What about transaction support?

• Partitioning


Page 31: Columnstore indexes - best practices for the ETL process - Damian Widera

How to maintain the index

• Tuple mover revealed

• Reorganize or rebuild the index?

• Extended Events: a great monitoring "tool"

Page 32: Columnstore indexes - best practices for the ETL process - Damian Widera

Internals

• How to make use of the DBCC commands for the CCI?

• Where is my memory?

• What about memory grants?

• What about memory pressure?

• What about the transaction log usage?

Page 33: Columnstore indexes - best practices for the ETL process - Damian Widera


Resources

• Niko Neugebauer: http://www.nikoport.com/columnstore/

• Benjamin Nevarez: http://www.benjaminnevarez.com/

• Paul White: http://sqlblog.com/blogs/paul_white/

• Remus Rusanu: http://rusanu.com/

• Hugo Kornelis: http://sqlblog.com/blogs/hugo_kornelis/

• Joe Sack: http://www.sqlskills.com/blogs/joe

• Sunil Agarwal: http://blogs.msdn.microsoft.com/sqlserverstorageengine

Page 34: Columnstore indexes - best practices for the ETL process - Damian Widera


Q & A