On-Disk Bitmap Index in Bizgres
Ayush Parashar and Jie Zhang
2
Agenda
Introduction to On-Disk Bitmap Index
Bitmap index creation
Bitmap index creation performance: index size and creation time
Performance with varying cardinality
Query performance
Summary
3
Introduction
Access method that is especially efficient for low-cardinality columns of high-dimensional fact tables
Takes a fraction of the space of a B-tree index
Much shorter index creation time
Very effective on queries with multiple conditions in the WHERE clause
4
Introduction (continued)
Consider the following data:
[Figure: sample data; image not available]
5
Introduction: B-Tree representation
[Figure: B-tree representation of the sample data; image not available]
6
Introduction: Bitmap representation
[Figure: bitmap representation of the sample data; image not available]
7
Bitmap index creation
Steps in bitmap-index creation:
Build bin indexes
Apply encoding schemes (the equality encoding scheme is used in the present implementation)
Apply compression schemes
Contributors: Jie Zhang, Mark Kirkwood, Gavin Sherry
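The encoding and compression steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every distinct value is treated as its own bin, the sample column is made up, and the run-length encoding is a stand-in for whatever compression scheme Bizgres actually applies (the slides do not name it). The function names are hypothetical.

```python
from itertools import groupby


def build_equality_bitmaps(column):
    """Equality encoding: one bit vector per distinct value.

    Bit i of the vector for value v is 1 exactly when row i holds v.
    """
    bitmaps = {}
    for row, value in enumerate(column):
        bits = bitmaps.setdefault(value, [0] * len(column))
        bits[row] = 1
    return bitmaps


def run_length_encode(bits):
    """Toy compression: collapse each run of equal bits into (bit, run_length)."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]


# Hypothetical sample column, not taken from the benchmark data.
shipmode = ["AIR", "AIR", "RAIL", "TRUCK", "AIR", "RAIL", "RAIL", "RAIL"]

bitmaps = build_equality_bitmaps(shipmode)
compressed = {value: run_length_encode(bits) for value, bits in bitmaps.items()}

for value in sorted(bitmaps):
    print(value, bitmaps[value], compressed[value])
```

Each row sets a bit in exactly one vector, so the vectors become sparser as the number of distinct values grows, which is what makes the compression step worthwhile.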
8
Bitmap index creation (continued)
[Figure: bitmap index creation; image not available]
9
On-disk Bitmap Index - Performance
Table definitions are taken from the OSDL DBT-3 benchmark
Table sizes correspond to DBT-3 scale factor 10
Table Name   Indexed Column             Data Type                      Cardinality
LINEITEM     L_SHIPMODE                 character(10)                  7
LINEITEM     L_QUANTITY                 numeric(15,2)                  50
LINEITEM     L_LINENUMBER               integer                        7
LINEITEM     L_SHIPMODE, L_QUANTITY *   character(10), numeric(15,2)   350
ORDERS       O_ORDERSTATUS              character(1)                   3
ORDERS       O_ORDERPRIORITY            character(15)                  5
CUSTOMER     C_MKTSEGMENT               character(10)                  5
CUSTOMER     C_NATIONKEY                integer                        25
* multi-column index
10
On-disk Bitmap Index - Index Size Performance
Index size - Size of bitmap index is a fraction of B-tree index
Size of the index (in MBytes)
Indexed column(s)   Bitmap   B-tree
l_shipmode              58     1804
l_quantity             117     1804
l_linenumber            59     1285
l_sm & l_q             176     2845
o_orderstatus            5      321
o_orderpriority         11      580
c_mktsegment             1       45
c_nationkey              2       32
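To put a number on "a fraction", the following Python sketch computes the bitmap index size as a percentage of the B-tree size for each indexed column. It assumes the chart's data labels pair up with the columns in axis order (first eight labels for the bitmap series, next eight for the B-tree series); that pairing is an inference from the transcript, not something the slide states.

```python
# Data labels from the index-size chart, in x-axis order.
# ASSUMPTION: the labels pair with the columns in the order listed here.
columns = [
    "l_shipmode", "l_quantity", "l_linenumber", "l_sm & l_q",
    "o_orderstatus", "o_orderpriority", "c_mktsegment", "c_nationkey",
]
bitmap_mb = [58, 117, 59, 176, 5, 11, 1, 2]
btree_mb = [1804, 1804, 1285, 2845, 321, 580, 45, 32]

# Bitmap size as a percentage of the B-tree size, rounded to one decimal.
percent_of_btree = {
    name: round(100 * bitmap / btree, 1)
    for name, bitmap, btree in zip(columns, bitmap_mb, btree_mb)
}

for name, pct in percent_of_btree.items():
    print(f"{name:<16} {pct:>5}%")
```

Under that pairing, every bitmap index comes out below 7% of the size of the corresponding B-tree index.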
11
On-disk Bitmap Index - Creation Time Performance
Index creation time: Up to 7 times faster index creation
Time taken to create the index (seconds)
Indexed column(s)   Bitmap   B-tree
l_shipmode           454.9   2217.1
l_quantity           547.2    937.8
l_linenumber         374.6    412.4
l_sm & l_q           948.7   2933.4
o_orderstatus         83.5    241.4
o_orderpriority      108.5    679.2
c_mktsegment          10.9     51.4
c_nationkey            8.3      9.3
12
On-disk Bitmap Index - Performance with varying cardinality
Index size with varying cardinality: Total rows 250 million
Variation in index size with change in column cardinality
[Chart: index size in MB (y-axis, 0 to 7000) against column cardinality (x-axis, 0 to 12000), measured at cardinalities 50, 100, 500, 1000, 2000, 4000 and 10000 with 250 million total rows; series: Bitmap Index and B-tree Index]
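A back-of-the-envelope calculation shows why compression matters as cardinality grows. The sketch below assumes a plain equality-encoded index with one uncompressed bit per row for every distinct value; it estimates raw bitmap volume only and says nothing about the sizes Bizgres actually produced. The function names are hypothetical.

```python
ROWS = 250_000_000
CARDINALITIES = [50, 100, 500, 1000, 2000, 4000, 10000]


def uncompressed_mb(rows, cardinality):
    """Raw size in MB (2**20 bytes): one bit per row for each distinct value."""
    return rows * cardinality / 8 / 2**20


def bit_density(cardinality):
    """Average fraction of 1 bits: each row sets exactly one bit overall."""
    return 1 / cardinality


for c in CARDINALITIES:
    print(f"{c:>6}  {uncompressed_mb(ROWS, c):>10,.0f} MB  density {bit_density(c):.4%}")
```

At cardinality 50 the raw bitmaps already come to roughly 1490 MB, and at 10000 to roughly 298000 MB, far above the 7000 MB top of the chart's axis. The bitmaps get sparser at the same rate, which is the property a compression scheme can exploit.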
13
On-disk Bitmap Index - Performance with varying cardinality
Index creation time with varying cardinality: Total rows 250 million
Variation in index creation time with varying cardinality
[Chart: index creation time in seconds (y-axis, 0 to 1600) against column cardinality (x-axis, 0 to 12000), measured at cardinalities 50, 100, 500, 1000, 2000, 4000 and 10000 with 250 million total rows; series: Bitmap Index and B-tree Index]
14
On-disk Bitmap Index - Query Performance
Query 1
SELECT sum(lineitem.l_discount)
FROM
lineitem, orders, customer, nation
WHERE
nation.n_name='UNITED STATES' AND
customer.c_mktsegment='AUTOMOBILE' AND
orders.o_orderstatus='P' AND
orders.o_orderpriority='2-HIGH' AND
lineitem.l_quantity=5 AND
lineitem.l_shipmode='AIR' AND
lineitem.l_linenumber=5 AND
customer.c_custkey=orders.o_custkey AND
orders.o_orderkey=lineitem.l_orderkey AND
nation.n_nationkey=customer.c_nationkey;
Query 2
SELECT avg(lineitem.l_tax)
FROM
lineitem, orders
WHERE
orders.o_orderstatus='F' AND
orders.o_orderpriority='4-NOT SPECIFIED' AND
lineitem.l_linenumber=5 AND
lineitem.l_shipmode='TRUCK' AND
lineitem.l_quantity=2 AND
orders.o_orderkey=lineitem.l_orderkey;
Query 3
SELECT count(*) FROM lineitem WHERE l_linenumber=1;
Query 4
SELECT count(*) FROM lineitem WHERE l_linenumber IN (1,2) AND l_shipmode IN ('RAIL','TRUCK');
Query 5
SELECT count(*) FROM lineitem WHERE l_linenumber=5 AND l_shipmode='RAIL' AND l_quantity=18;
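Queries such as Query 4 combine several low-cardinality predicates, which is where bitmaps fit naturally: the values inside one IN list are ORed together and the per-column results are ANDed. The Python sketch below illustrates that evaluation with integers used as bit sets. The eight sample rows are made up, and the helper name is hypothetical; this is not how Bizgres stores or scans its bitmaps.

```python
def value_bitmaps(column):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    bitmaps = {}
    for row, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps


# Hypothetical sample rows, not benchmark data.
l_linenumber = [1, 2, 3, 1, 2, 5, 1, 4]
l_shipmode = ["RAIL", "AIR", "TRUCK", "TRUCK", "RAIL", "RAIL", "AIR", "TRUCK"]

ln = value_bitmaps(l_linenumber)
sm = value_bitmaps(l_shipmode)

# l_linenumber IN (1,2): OR the bitmaps of the listed values.
linenumber_match = ln.get(1, 0) | ln.get(2, 0)
# l_shipmode IN ('RAIL','TRUCK'): same idea on the second column.
shipmode_match = sm.get("RAIL", 0) | sm.get("TRUCK", 0)
# The two conditions are joined by AND in the WHERE clause.
matches = linenumber_match & shipmode_match

count = bin(matches).count("1")  # plays the role of count(*)
matching_rows = [row for row in range(len(l_linenumber)) if (matches >> row) & 1]
print(count, matching_rows)
```

Only the rows whose bit survives the final AND need to be fetched from the table.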
15
On-disk Bitmap Index - Query Performance
Query performance: Run1, Run2 and Run3 are three consecutive runs of the same query
Time taken for query completion (seconds)
Query     Bitmap Run1   Bitmap Run2   Bitmap Run3   B-tree Run1   B-tree Run2   B-tree Run3
Query 1         136.3           1.7           1.7         386.5         392.8         395.1
Query 2          60.4          60.6          68.3         386.6         386.7         388.7
Query 3         101.4          99.4          98.7         181.8         181           180.5
Query 4         108.6         109.1         109.3         283.4         283.3         283.3
Query 5          53.9           0.8           0.6         350.8         346.7         352.2
16
On-disk Bitmap Index - Query Performance
                                  QUERY PLAN
-----------------------------------------------------------------------------
 Aggregate  (cost=713682.71..713682.72 rows=1 width=9)
   ->  Merge Join  (cost=706473.11..713678.58 rows=1649 width=9)
         Merge Cond: ("outer".l_orderkey = "inner".o_orderkey)
         ->  Sort  (cost=122798.17..122841.70 rows=17412 width=13)
               Sort Key: lineitem.l_orderkey
               ->  Bitmap Heap Scan on lineitem  (cost=57440.64..121571.69 rows=17412 width=13)
                     Recheck Cond: ((l_quantity = 2::numeric) AND (l_linenumber = 5) AND (l_shipmode = 'TRUCK'::bpchar))
                     ->  BitmapAnd  (cost=57440.64..57440.64 rows=17412 width=0)
                           ->  Bitmap Index Scan on l_quantity_bm_idx(on-disk bitmap index)  (cost=0.00..4500.02 rows=1199721 width=0)
                                 Index Cond: (l_quantity = 2::numeric)
                           ->  Bitmap Index Scan on l_linenumber_bm_idx(on-disk bitmap index)  (cost=0.00..22779.89 rows=6278540 width=0)
                                 Index Cond: (l_linenumber = 5)
                           ->  Bitmap Index Scan on l_shipmode_bm_idx(on-disk bitmap index)  (cost=0.00..30160.23 rows=8318065 width=0)
                                 Index Cond: (l_shipmode = 'TRUCK'::bpchar)
         ->  Sort  (cost=583674.94..587225.94 rows=1420397 width=4)
               Sort Key: orders.o_orderkey
               ->  Bitmap Heap Scan on orders  (cost=10318.21..372165.55 rows=1420397 width=4)
                     Recheck Cond: (o_orderpriority = '4-NOT SPECIFIED'::bpchar)
                     Filter: (o_orderstatus = 'F'::bpchar)
                     ->  Bitmap Index Scan on o_orderpriority_bm_idx(on-disk bitmap index)  (cost=0.00..10318.21 rows=2869489 width=0)
                           Index Cond: (o_orderpriority = '4-NOT SPECIFIED'::bpchar)
(21 rows)
Example: EXPLAIN plan output for Query 2
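The BitmapAnd line in the plan can be reproduced from its three Bitmap Index Scan inputs, which helps when reading such plans. The Python sketch below rests on two assumptions that the slides do not state: that the planner charges 100 times the default cpu_operator_cost (0.0025) for each input after the first and treats the three conditions as independent, and that lineitem holds 59,986,052 rows at scale factor 10. Function and variable names are hypothetical.

```python
# Estimates copied from the three Bitmap Index Scan nodes in the plan.
index_scans = {
    "l_quantity_bm_idx": {"cost": 4500.02, "rows": 1199721},
    "l_linenumber_bm_idx": {"cost": 22779.89, "rows": 6278540},
    "l_shipmode_bm_idx": {"cost": 30160.23, "rows": 8318065},
}

# ASSUMPTION: lineitem row count at scale factor 10 (not given in the slides).
LINEITEM_ROWS = 59_986_052
# ASSUMPTION: per-extra-input charge, 100 * default cpu_operator_cost (0.0025).
AND_STEP_COST = 100 * 0.0025


def bitmap_and_cost(scans):
    """Sum of the input costs plus a small charge for each AND step."""
    return sum(s["cost"] for s in scans.values()) + AND_STEP_COST * (len(scans) - 1)


def bitmap_and_rows(scans, table_rows):
    """Row estimate if the conditions are independent: multiply selectivities."""
    selectivity = 1.0
    for s in scans.values():
        selectivity *= s["rows"] / table_rows
    return table_rows * selectivity


and_cost = round(bitmap_and_cost(index_scans), 2)
and_rows = round(bitmap_and_rows(index_scans, LINEITEM_ROWS))
print(and_cost, and_rows)
```

Under these assumptions the sketch lands on the plan's own figures for the BitmapAnd node, cost 57440.64 and rows 17412, so three conditions that each match millions of rows narrow the heap scan to a few thousand.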
17
Summary
On-disk Bitmap Index: new feature in Bizgres 0.9
Provides dramatic improvements in index creation time and space used by the index
Dramatically improves response time for large classes of ad hoc data-warehousing queries
18
Thanks
Questions?