On-Disk Bitmap Index In Bizgres

Ayush Parashar and Jie Zhang

Agenda

Introduction to On-Disk Bitmap Index
Bitmap index creation
Bitmap index creation performance - index size and creation time
Performance with varying cardinality
Query performance
Summary

Introduction

Access method that is especially efficient for low-cardinality columns of high-dimensional fact tables

Takes a fraction of the space of a B-tree index

Much shorter index creation time

Very effective for queries with multiple conditions in the WHERE clause
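The last point can be illustrated with a minimal sketch. It assumes one bitmap per distinct column value, held here as a Python integer whose bit i stands for row i. The toy rows and helper names are hypothetical and are not taken from Bizgres.

```python
# Hypothetical toy data: one row per list position (not from the slides).
shipmode = ["AIR", "RAIL", "AIR", "TRUCK", "AIR", "RAIL"]
linenumber = [1, 5, 5, 1, 5, 5]

def build_bitmaps(column):
    """Map each distinct value to an int whose bit i is set when row i holds that value."""
    bitmaps = {}
    for row, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps

def rows_of(bitmap):
    """Return the row numbers whose bits are set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

shipmode_bm = build_bitmaps(shipmode)
linenumber_bm = build_bitmaps(linenumber)

# WHERE l_shipmode = 'AIR' AND l_linenumber = 5 becomes a single bitwise AND.
matches = shipmode_bm["AIR"] & linenumber_bm[5]
print(rows_of(matches))  # [2, 4]
```

Each extra equality condition adds one more AND over the same row positions, which is why a multi-condition WHERE clause suits this representation.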

Introduction (continued)

Consider the following data:

[Figure: sample data; image not available in this transcript]

Introduction: B-Tree representation

[Figure: B-tree representation of the sample data; image not available in this transcript]

Introduction: Bitmap representation

[Figure: bitmap representation of the sample data; image not available in this transcript]

Bitmap index creation

Steps in bitmap-index creation:
  Build bin indexes
  Apply encoding schemes: the equality encoding scheme is used in the present implementation
  Apply compression schemes
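The three steps above can be sketched as follows. This is a toy illustration under stated assumptions, not the Bizgres code: every distinct value is treated as its own bin, equality encoding keeps one bit list per bin, and plain run-length encoding stands in for the compression step, since the slides do not say which compression scheme is used. All function names are hypothetical.

```python
from itertools import groupby

def equality_encode(column):
    """Steps 1 and 2: one bin per distinct value, one bit list per bin."""
    bins = sorted(set(column))
    return {b: [1 if v == b else 0 for v in column] for b in bins}

def run_length_encode(bits):
    """Step 3 (stand-in scheme): collapse runs of equal bits into (bit, length) pairs."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]

def run_length_decode(runs):
    """Expand (bit, length) pairs back into the original bit list."""
    return [bit for bit, length in runs for _ in range(length)]

# Hypothetical toy column with three distinct values.
column = ["F", "F", "F", "O", "O", "P", "F", "F"]
bitmaps = equality_encode(column)
compressed = {value: run_length_encode(bits) for value, bits in bitmaps.items()}
print(compressed["F"])  # [(1, 3), (0, 3), (1, 2)]
```

With equality encoding exactly one bitmap has a 1 for any given row, so low-cardinality columns produce long runs that compress well.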

Contributors: Jie Zhang, Mark Kirkwood, Gavin Sherry

Bitmap index creation (continued)

[Figure: bitmap index creation; image not available in this transcript]

On-disk Bitmap Index - Performance

Table definitions are taken from the OSDL DBT-3 benchmark. Table sizes correspond to DBT-3 scale factor 10.

Table Name   Indexed Column              Data Type                      Cardinality
LINEITEM     L_SHIPMODE                  character(10)                  7
LINEITEM     L_QUANTITY                  numeric(15,2)                  50
LINEITEM     L_LINENUMBER                integer                        7
LINEITEM     L_SHIPMODE, L_QUANTITY *    character(10), numeric(15,2)   350
ORDERS       O_ORDERSTATUS               character(1)                   3
ORDERS       O_ORDERPRIORITY             character(15)                  5
CUSTOMER     C_MKTSEGMENT                character(10)                  5
CUSTOMER     C_NATIONKEY                 integer                        25

* multi-column index

On-disk Bitmap Index - Index Size Performance

Index size: the bitmap index is a fraction of the size of the B-tree index

Size of the index (in MBytes), as labeled on the chart bars

Indexed column      Bitmap    B-tree
l_shipmode              58      1804
l_quantity             117      1804
l_linenumber            59      1285
l_sm & l_q             176      2845
o_orderstatus            5       321
o_orderpriority         11       580
c_mktsegment             1        45
c_nationkey              2        32

On-disk Bitmap Index - Creation Time Performance

Index creation time: up to 7 times faster index creation

Time taken to create the index (seconds), as labeled on the chart bars

Indexed column      Bitmap    B-tree
l_shipmode           454.9    2217.1
l_quantity           547.2     937.8
l_linenumber         374.6     412.4
l_sm & l_q           948.7    2933.4
o_orderstatus         83.5     241.4
o_orderpriority      108.5     679.2
c_mktsegment          10.9      51.4
c_nationkey            8.3       9.3

On-disk Bitmap Index - Performance with varying cardinality

Index size with varying cardinality (total rows: 250 million)

[Figure: Variation in index size with change in column cardinality. X axis: column cardinality (50, 100, 500, 1000, 2000, 4000, 10000). Y axis: index size in MB (0 to 7000). Series: Bitmap index, B-tree index.]
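As a rough back-of-the-envelope check on why the compression step matters here, the sketch below computes how large an equality-encoded bitmap index would be with no compression at all, at one bit per row per distinct value. The function name is hypothetical, and the calculation says nothing about the actual Bizgres on-disk format.

```python
def uncompressed_bitmap_mb(rows, cardinality):
    """Size in MB (2**20 bytes) of one bit per row for each distinct value."""
    return rows * cardinality / 8 / 2**20

ROWS = 250_000_000  # total rows stated on the slide

for cardinality in (50, 100, 500, 1000, 2000, 4000, 10000):
    print(cardinality, round(uncompressed_bitmap_mb(ROWS, cardinality)))
```

At cardinality 50 this comes to about 1490 MB, and at cardinality 500 to about 14901 MB, which is already more than twice the 7000 MB at which the chart's size axis tops out. The measured bitmap indexes at the higher cardinalities must therefore be far smaller than their raw bit count.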

On-disk Bitmap Index - Performance with varying cardinality

Index creation time with varying cardinality (total rows: 250 million)

[Figure: Variation in index creation time with varying cardinality. X axis: column cardinality (50, 100, 500, 1000, 2000, 4000, 10000). Y axis: index creation time in seconds (0 to 1600). Series: Bitmap index, B-tree index.]

On-disk Bitmap Index - Query Performance

Query 1

SELECT sum(lineitem.l_discount)
FROM lineitem, orders, customer, nation
WHERE nation.n_name='UNITED STATES'
  AND customer.c_mktsegment='AUTOMOBILE'
  AND orders.o_orderstatus='P'
  AND orders.o_orderpriority='2-HIGH'
  AND lineitem.l_quantity=5
  AND lineitem.l_shipmode='AIR'
  AND lineitem.l_linenumber=5
  AND customer.c_custkey=orders.o_custkey
  AND orders.o_orderkey=lineitem.l_orderkey
  AND nation.n_nationkey=customer.c_nationkey;

Query 2

SELECT avg(lineitem.l_tax)
FROM lineitem, orders
WHERE orders.o_orderstatus='F'
  AND orders.o_orderpriority='4-NOT SPECIFIED'
  AND lineitem.l_linenumber=5
  AND lineitem.l_shipmode='TRUCK'
  AND lineitem.l_quantity=2
  AND orders.o_orderkey=lineitem.l_orderkey;

Query 3

SELECT count(*) FROM lineitem WHERE l_linenumber=1;

Query 4

SELECT count(*) FROM lineitem WHERE l_linenumber IN (1,2) AND l_shipmode IN ('RAIL','TRUCK');

Query 5

SELECT count(*) FROM lineitem WHERE l_linenumber=5 AND l_shipmode='RAIL' AND l_quantity=18;
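To make the shape of Queries 3 and 4 concrete, here is a small sketch of how such predicates map onto bitmap operations: OR across the values of an IN list, AND across columns, and a count of the set bits. The toy rows and helper names are hypothetical, and the sketch does not claim to mirror how the Bizgres executor answers these queries.

```python
from functools import reduce

def build_bitmaps(column):
    """Map each distinct value to an int whose bit i is set when row i holds that value."""
    bitmaps = {}
    for row, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps

def any_of(bitmaps, values):
    """OR together the bitmaps of every listed value (an IN list)."""
    return reduce(lambda acc, v: acc | bitmaps.get(v, 0), values, 0)

def count_rows(bitmap):
    """Number of set bits, i.e. number of matching rows."""
    return bin(bitmap).count("1")

# Hypothetical toy rows (not benchmark data).
l_linenumber = [1, 2, 5, 1, 3, 2, 5, 1]
l_shipmode = ["RAIL", "TRUCK", "RAIL", "AIR", "RAIL", "RAIL", "AIR", "TRUCK"]
linenumber_bm = build_bitmaps(l_linenumber)
shipmode_bm = build_bitmaps(l_shipmode)

# Query 3 shape: a single equality predicate.
query3 = count_rows(linenumber_bm[1])
# Query 4 shape: two IN lists combined with AND.
query4 = count_rows(any_of(linenumber_bm, (1, 2)) & any_of(shipmode_bm, ("RAIL", "TRUCK")))
print(query3, query4)  # 3 4
```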

On-disk Bitmap Index - Query Performance

Query performance: Run1, Run2 and Run3 are three consecutive runs of the same query.

Time taken for query completion (seconds), as labeled on the chart bars

                 Query 1   Query 2   Query 3   Query 4   Query 5
Bitmap - Run1      136.3      60.4     101.4     108.6      53.9
Bitmap - Run2        1.7      60.6      99.4     109.1       0.8
Bitmap - Run3        1.7      68.3      98.7     109.3       0.6
B-tree - Run1      386.5     386.6     181.8     283.4     350.8
B-tree - Run2      392.8     386.7     181       283.3     346.7
B-tree - Run3      395.1     388.7     180.5     283.3     352.2

On-disk Bitmap Index - Query Performance

QUERY PLAN
-----------------------------------------------------------------------------
Aggregate (cost=713682.71..713682.72 rows=1 width=9)
  -> Merge Join (cost=706473.11..713678.58 rows=1649 width=9)
       Merge Cond: ("outer".l_orderkey = "inner".o_orderkey)
       -> Sort (cost=122798.17..122841.70 rows=17412 width=13)
            Sort Key: lineitem.l_orderkey
            -> Bitmap Heap Scan on lineitem (cost=57440.64..121571.69 rows=17412 width=13)
                 Recheck Cond: ((l_quantity = 2::numeric) AND (l_linenumber = 5) AND (l_shipmode = 'TRUCK'::bpchar))
                 -> BitmapAnd (cost=57440.64..57440.64 rows=17412 width=0)
                      -> Bitmap Index Scan on l_quantity_bm_idx (on-disk bitmap index) (cost=0.00..4500.02 rows=1199721 width=0)
                           Index Cond: (l_quantity = 2::numeric)
                      -> Bitmap Index Scan on l_linenumber_bm_idx (on-disk bitmap index) (cost=0.00..22779.89 rows=6278540 width=0)
                           Index Cond: (l_linenumber = 5)
                      -> Bitmap Index Scan on l_shipmode_bm_idx (on-disk bitmap index) (cost=0.00..30160.23 rows=8318065 width=0)
                           Index Cond: (l_shipmode = 'TRUCK'::bpchar)
       -> Sort (cost=583674.94..587225.94 rows=1420397 width=4)
            Sort Key: orders.o_orderkey
            -> Bitmap Heap Scan on orders (cost=10318.21..372165.55 rows=1420397 width=4)
                 Recheck Cond: (o_orderpriority = '4-NOT SPECIFIED'::bpchar)
                 Filter: (o_orderstatus = 'F'::bpchar)
                 -> Bitmap Index Scan on o_orderpriority_bm_idx (on-disk bitmap index) (cost=0.00..10318.21 rows=2869489 width=0)
                      Index Cond: (o_orderpriority = '4-NOT SPECIFIED'::bpchar)
(21 rows)

Example: Query 2 EXPLAIN plan output
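One way to read the row estimates in this plan is sketched below. It assumes the planner treats the three lineitem predicates as independent and multiplies their selectivities to get the BitmapAnd estimate; that is an assumption about the planner, not something the slide states. Under that assumption, the table size N the planner had in mind can be solved for from the four estimates printed in the plan.

```python
import math

# Row estimates copied from the plan above.
est_quantity = 1_199_721    # Index Cond: l_quantity = 2
est_linenumber = 6_278_540  # Index Cond: l_linenumber = 5
est_shipmode = 8_318_065    # Index Cond: l_shipmode = 'TRUCK'
est_bitmap_and = 17_412     # BitmapAnd output

# Assumption (independent predicates):
#   est_bitmap_and = N * (est_quantity / N) * (est_linenumber / N) * (est_shipmode / N)
# Solving for N gives the implied row count of lineitem.
implied_rows = math.sqrt(est_quantity * est_linenumber * est_shipmode / est_bitmap_and)
print(f"{implied_rows:,.0f}")
```

The result is roughly 60 million rows. That is in line with the usual size of LINEITEM at scale factor 10 in TPC-H, from which DBT-3 derives, so the estimates are at least consistent with the independence assumption.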

Summary

On-disk Bitmap Index: new feature in Bizgres 0.9
Provides dramatic improvements in index creation time and space used by the index
Dramatically improves response time for large classes of ad hoc data-warehousing queries

Thanks

Questions?

References

Bitmap Index Design and Evaluation: Chee-Yong Chan and Yannis E. Ioannidis
Compressed bitmap indices for efficient query processing: Kesheng Wu, Ekow J. Otoo, Arie Shoshani
On-Disk Bitmap Index Performance in Bizgres 0.9 - A Greenplum Whitepaper: http://bgn.greenplum.com
http://www.bizgres.org/home.php