Page 1: Generalized Hash Teams for Join and Group-By

Generalized Hash Teams for Join and Group-By

Alfons Kemper, Donald Kossmann, Christian Wiesner

Universität Passau, Germany

Page 2: Generalized Hash Teams for Join and Group-By

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams

VLDB '99

Outline

• Motivating Example
• Standard Hash Teams
• Generalized Hash Teams for Joins
• Generalized Hash Teams for Joins/Grouping
• False Drops Analysis
• Application Examples (TPC-D)
• Performance Evaluation

Page 3: Generalized Hash Teams for Join and Group-By


Traditional Join Plan

[Figure: plan for R ⋈_A S ⋈_A T with two binary joins on A over the inputs R, S and T, producing Result.]

Page 4: Generalized Hash Teams for Join and Group-By


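Traditional Hash Team Join Plan [Graefe, Bunker, Cooper: VLDB 98]

[Figure: hash team plan for R ⋈_A S ⋈_A T. R, S and T are each partitioned on A (R.A, S.A, T.A), and both joins on A are evaluated as one team that produces Result.]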

Page 5: Generalized Hash Teams for Join and Group-By


Generalized Hash Teams

Query: R ⋈_A S ⋈_B T

R
...  A
...  4
...  3
...  6
...  7

S
A  B
4  3
6  2
3  5
7  0

T
B  ...
3  ...
0  ...
5  ...
2  ...

S ⋈_B T
A  B  ...
4  3  ...
3  5  ...
6  2  ...
7  0  ...

Page 6: Generalized Hash Teams for Join and Group-By


Generalized Hash Teams

Query: R ⋈_A S ⋈_B T

Partition on B (odd: yellow, even: green). Each S tuple sets bit A mod 5 in the bitmap of its partition, e.g. 6 mod 5 = 1.

bit maps
bit  odd B  even B
0    0      0
1    0      1
2    0      1
3    1      0
4    1      0

S
A  B
4  3
6  2
3  5
7  0

T
B  ...
3  ...
0  ...
5  ...
2  ...

R
...  A
...  4
...  3
...  6
...  7
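The indirect partitioning on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: two partitions chosen by B mod 2 (partition 0 for even B, partition 1 for odd B), bitmaps of 5 bits addressed by A mod 5, and function names that are hypothetical rather than taken from the paper.

```python
# Sketch of indirect partitioning for R join_A S join_B T (data from the slide).
# Assumptions: n = 2 partitions (B mod 2), bitmaps of b = 5 bits (A mod 5).

N_PARTITIONS = 2   # partition 0: even B, partition 1: odd B
BITMAP_SIZE = 5

S = [(4, 3), (6, 2), (3, 5), (7, 0)]   # (A, B)
T = [3, 0, 5, 2]                        # B values
R = [4, 3, 6, 7]                        # A values


def partition_s_and_build_bitmaps(s_tuples):
    """Partition S on B and record, per partition, which A hash values occur."""
    partitions = [[] for _ in range(N_PARTITIONS)]
    bitmaps = [[0] * BITMAP_SIZE for _ in range(N_PARTITIONS)]
    for a, b in s_tuples:
        p = b % N_PARTITIONS
        partitions[p].append((a, b))
        bitmaps[p][a % BITMAP_SIZE] = 1
    return partitions, bitmaps


def partition_r_with_bitmaps(r_values, bitmaps):
    """Send each R tuple to every partition whose bitmap has its bit set."""
    partitions = [[] for _ in range(N_PARTITIONS)]
    for a in r_values:
        for p, bitmap in enumerate(bitmaps):
            if bitmap[a % BITMAP_SIZE]:
                partitions[p].append(a)
    return partitions


s_parts, bitmaps = partition_s_and_build_bitmaps(S)
t_parts = [[b for b in T if b % N_PARTITIONS == p] for p in range(N_PARTITIONS)]
r_parts = partition_r_with_bitmaps(R, bitmaps)

print(bitmaps)   # [[0, 1, 1, 0, 0], [0, 0, 0, 1, 1]]
print(r_parts)   # [[6, 7], [4, 3]]
```

R is never partitioned on B, which it does not contain. The bitmaps built while partitioning S carry enough information to place each R tuple next to its S partner.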

Page 7: Generalized Hash Teams for Join and Group-By


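Generalized Hash Team for Grouping/Aggregation

select c.City, sum(o.Value)
from Customer c, Order o
where c.C# = o.C#
group by c.City

[Figure: two plans for this query. Traditional plan: Customer and Order are each partitioned on C# (Ptn on C#) and joined, the join result is partitioned on City (Ptn on City) and then aggregated (Agg). Generalized hash team: Customer is partitioned on City (Ptn on City) and produces bit-maps (BM), Order is partitioned on the bit-maps (Ptn on BM), and join and grouping (Agg) form one team.]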

Page 8: Generalized Hash Teams for Join and Group-By


Group_City(Customer ⋈_C# Order)

[Figure: Customer (C#, City) and Order (C#). Customer: partition on City and generate bitmaps for C#. Order: partition with bitmaps for C#.]
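A minimal Python sketch of this join and grouping team follows. The sample rows, the bitmap size, and the `city_partition` helper (a deterministic stand-in for a hash function on City) are assumptions made for illustration and do not come from the slides.

```python
from collections import defaultdict

N_PARTITIONS = 2
BITMAP_SIZE = 8

# Hypothetical sample data, not taken from the slides.
customers = [(1, "Passau"), (2, "Munich"), (3, "Passau"), (4, "Berlin")]  # (C#, City)
orders = [(1, 10), (2, 20), (3, 5), (1, 7), (4, 50)]                      # (C#, Value)


def city_partition(city):
    """Deterministic stand-in for a hash function on City."""
    return sum(city.encode()) % N_PARTITIONS


def team_group_by(customers, orders):
    # Partition Customer on City and generate bitmaps for C#.
    cust_parts = [dict() for _ in range(N_PARTITIONS)]
    bitmaps = [[0] * BITMAP_SIZE for _ in range(N_PARTITIONS)]
    for c_no, city in customers:
        p = city_partition(city)
        cust_parts[p][c_no] = city
        bitmaps[p][c_no % BITMAP_SIZE] = 1

    # Partition Order with the bitmaps for C#.
    order_parts = [[] for _ in range(N_PARTITIONS)]
    for c_no, value in orders:
        for p in range(N_PARTITIONS):
            if bitmaps[p][c_no % BITMAP_SIZE]:
                order_parts[p].append((c_no, value))

    # Join and aggregate partition by partition.
    totals = {}
    for p in range(N_PARTITIONS):
        sums = defaultdict(int)
        for c_no, value in order_parts[p]:
            city = cust_parts[p].get(c_no)
            if city is not None:      # a false drop finds no join partner
                sums[city] += value
        totals.update(sums)           # a city occurs in exactly one partition
    return totals


totals = team_group_by(customers, orders)
print(totals)
```

Because Customer is partitioned on the grouping attribute, every group is complete within one partition, so the per-partition aggregates can simply be concatenated.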

Page 9: Generalized Hash Teams for Join and Group-By


Group_City(Customer ⋈_C# Order ⋈_O# Lineitem)

[Figure: Customer (C#, City), Order (C#, O#) and Lineitem (O#). Customer: partition on City and generate bitmaps for C#. Order: partition with bitmaps for C# and generate bitmaps for O#. Lineitem: partition with bitmaps for O#.]
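The three partitioning steps can be chained as in the following Python sketch. The sample rows, the bitmap size of 16 and the `CITY_PARTITION` lookup (standing in for a hash function on City) are hypothetical and serve only to make the cascade concrete.

```python
N_PARTITIONS = 2
BITMAP_SIZE = 16

# Hypothetical sample data; CITY_PARTITION stands in for a hash function on City.
CITY_PARTITION = {"PA": 0, "M": 1}
customers = [(1, "PA"), (2, "M")]                 # (C#, City)
orders = [(10, 1), (11, 2), (12, 1)]              # (O#, C#)
lineitems = [(10, 5), (11, 7), (12, 1), (11, 3)]  # (O#, Price)


def empty_bitmaps():
    return [[0] * BITMAP_SIZE for _ in range(N_PARTITIONS)]


def targets(key, bitmaps):
    """Partitions whose bitmap has the bit for this key set."""
    return [p for p in range(N_PARTITIONS) if bitmaps[p][key % BITMAP_SIZE]]


# Step 1: partition Customer on City and generate bitmaps for C#.
cust_parts = [[] for _ in range(N_PARTITIONS)]
c_bitmaps = empty_bitmaps()
for c_no, city in customers:
    p = CITY_PARTITION[city]
    cust_parts[p].append((c_no, city))
    c_bitmaps[p][c_no % BITMAP_SIZE] = 1

# Step 2: partition Order with the C# bitmaps and generate bitmaps for O#.
order_parts = [[] for _ in range(N_PARTITIONS)]
o_bitmaps = empty_bitmaps()
for o_no, c_no in orders:
    for p in targets(c_no, c_bitmaps):
        order_parts[p].append((o_no, c_no))
        o_bitmaps[p][o_no % BITMAP_SIZE] = 1

# Step 3: partition Lineitem with the O# bitmaps.
item_parts = [[] for _ in range(N_PARTITIONS)]
for o_no, price in lineitems:
    for p in targets(o_no, o_bitmaps):
        item_parts[p].append((o_no, price))

print(order_parts)  # [[(10, 1), (12, 1)], [(11, 2)]]
print(item_parts)   # [[(10, 5), (12, 1)], [(11, 7), (11, 3)]]
```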

Page 10: Generalized Hash Teams for Join and Group-By


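False Drops

Query: R ⋈_A S ⋈_B T

The added S tuple (8, 4) sets bit 3 (8 mod 5 = 3) in the bitmap of the even partition. Bit 3 is now set in both bitmaps, so the R tuple with A = 3 is also sent to the even partition, where it finds no join partner.

bit maps
bit  odd B  even B
0    0      0
1    0      1
2    0      1
3    1      1
4    1      0

R
...  A
...  4
...  3
...  6
...  7

S
A  B
4  3
6  2
3  5
7  0
8  4

T
B  ...
3  ...
0  ...
5  ...
2  ...
4  ...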
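The false drop in this example can be reproduced with a short Python sketch. It assumes the same conventions as the slide data suggests (two partitions by B mod 2, bit position A mod 5). The bookkeeping of false drops as (A, partition) pairs is an illustrative choice, not something the slides prescribe.

```python
N_PARTITIONS = 2   # partition 0: even B, partition 1: odd B
BITMAP_SIZE = 5

S = [(4, 3), (6, 2), (3, 5), (7, 0), (8, 4)]   # (A, B), including the new (8, 4)
R = [4, 3, 6, 7]                                # A values

# Partition S on B and build one bitmap over A per partition.
bitmaps = [[0] * BITMAP_SIZE for _ in range(N_PARTITIONS)]
s_parts = [[] for _ in range(N_PARTITIONS)]
for a, b in S:
    p = b % N_PARTITIONS
    s_parts[p].append((a, b))
    bitmaps[p][a % BITMAP_SIZE] = 1

# An R tuple is a false drop in partition p if its bit is set there
# but no S tuple in that partition actually has the same A value.
false_drops = []
for a in R:
    for p in range(N_PARTITIONS):
        if bitmaps[p][a % BITMAP_SIZE]:
            has_partner = any(s_a == a for s_a, _ in s_parts[p])
            if not has_partner:
                false_drops.append((a, p))

print(bitmaps)      # [[0, 1, 1, 1, 0], [0, 0, 0, 1, 1]]
print(false_drops)  # [(3, 0)]
```

The false drop costs extra work in the even partition but does not change the join result, because the probe there finds no matching S tuple.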

Page 11: Generalized Hash Teams for Join and Group-By


Overlapping Partitions

[Figure: overlapping partitions for two examples.
R ⋈_A S ⋈_B T: partition on B and generate bitmaps for A; partition based on the bitmaps for A.
Customer ⋈_C# Order ⋈_O# Lineitem: partition on C# and generate bitmaps for O#; partition with bitmaps.]

Page 12: Generalized Hash Teams for Join and Group-By


Applicability of Generalized Hash Teams

• for partitioning hierarchical structures A → B

[Figure: partition on B; partition on bitmaps for A]

• but it is also correct for non-strict hierarchies A ↛ B (but performance deteriorates)

Page 13: Generalized Hash Teams for Join and Group-By


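Non-strict hierarchy A ↛ B

Query: R ⋈_A S ⋈_B T

S contains A = 3 with two B values (5 and 2), so bit 3 is set in both bitmaps and the R tuple with A = 3 belongs to both partitions.

bit maps
bit  odd B  even B
0    0      0
1    0      1
2    0      1
3    1      1
4    1      0

R
...  A
...  4
...  3
...  6
...  7

S
A  B
4  3
6  2
3  5
7  0
3  2

T
B  ...
3  ...
0  ...
5  ...
2  ...

[Figure: overlapping partitions of T, S and R]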
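That the technique stays correct for a non-strict hierarchy can be checked on the slide's data with the Python sketch below. It compares a plain evaluation of the join with a partition-wise evaluation. The helper names and the (A, B) result representation are assumptions for illustration, as are the conventions B mod 2 for partitions and A mod 5 for bit positions.

```python
N_PARTITIONS = 2
BITMAP_SIZE = 5

R = [4, 3, 6, 7]                               # A values
S = [(4, 3), (6, 2), (3, 5), (7, 0), (3, 2)]   # (A, B); A = 3 has two B values
T = [3, 0, 5, 2]                               # B values


def join(r, s, t):
    """Plain evaluation of R join_A S join_B T, returned as sorted (A, B) pairs."""
    return sorted(
        (a, b)
        for a in r
        for (s_a, b) in s if s_a == a
        for t_b in t if t_b == b
    )


def team_join(r, s, t):
    """Partition S and T on B, route R through the bitmaps, join per partition."""
    bitmaps = [[0] * BITMAP_SIZE for _ in range(N_PARTITIONS)]
    s_parts = [[] for _ in range(N_PARTITIONS)]
    for a, b in s:
        p = b % N_PARTITIONS
        s_parts[p].append((a, b))
        bitmaps[p][a % BITMAP_SIZE] = 1
    result, copies_of_r = [], 0
    for p in range(N_PARTITIONS):
        r_p = [a for a in r if bitmaps[p][a % BITMAP_SIZE]]
        t_p = [b for b in t if b % N_PARTITIONS == p]
        copies_of_r += len(r_p)
        result.extend(join(r_p, s_parts[p], t_p))
    return sorted(result), copies_of_r


team_result, copies_of_r = team_join(R, S, T)
print(team_result == join(R, S, T))   # True
print(len(R), copies_of_r)            # 4 5
```

The result is unchanged, but the R tuple with A = 3 is written to both partitions, which is the kind of extra work that makes performance deteriorate as the hierarchy becomes less strict.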

Page 14: Generalized Hash Teams for Join and Group-By


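False Drops Estimation

b: cardinality of the bitmaps
n: number of partitions

probability that some s sets a bit leading to a false drop of an r into a particular partition:

$$1 - \left(1 - \frac{1}{b}\right)^{\frac{|S| - 1}{n}}$$

total number of false drops:

$$|R| \cdot (n - 1) \cdot \left(1 - \left(1 - \frac{1}{b}\right)^{\frac{|S| - 1}{n}}\right)$$

conservative approximation:

$$|R| \cdot (n - 1) \cdot \frac{|S| - 1}{b \cdot n}$$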
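The estimate can be evaluated numerically with the small Python sketch below. It assumes the slide's formulas read as follows: the per-partition false drop probability is 1 - (1 - 1/b)^((|S| - 1)/n), the total is |R| (n - 1) times that probability, and the conservative approximation is |R| (n - 1) (|S| - 1) / (b n). The function names and the sample cardinalities are hypothetical. The approximation is an upper bound by Bernoulli's inequality as long as (|S| - 1)/n is at least 1.

```python
def false_drop_probability(s_card, b, n):
    """Probability that some s sets the bit that drops a given r into one wrong partition."""
    return 1.0 - (1.0 - 1.0 / b) ** ((s_card - 1) / n)


def expected_false_drops(r_card, s_card, b, n):
    """Each r can falsely drop into any of the n - 1 partitions it does not belong to."""
    return r_card * (n - 1) * false_drop_probability(s_card, b, n)


def conservative_false_drops(r_card, s_card, b, n):
    """Upper bound obtained from (1 - x)^k >= 1 - k*x for k >= 1."""
    return r_card * (n - 1) * (s_card - 1) / (b * n)


# Hypothetical sizes: |R| = 1,000,000, |S| = 100,000, b = 8,000,000 bits, n = 10.
estimate = expected_false_drops(1_000_000, 100_000, 8_000_000, 10)
bound = conservative_false_drops(1_000_000, 100_000, 8_000_000, 10)
print(estimate, bound)
```

Larger bitmaps (larger b) reduce both values roughly in proportion, which is the knob the estimate exposes for tuning.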

Page 15: Generalized Hash Teams for Join and Group-By


Implementation Details: Fine Tuning the Partitioning

Bitmaps (one column per partition) with the columns used and coll:

bit  bitmaps  used  coll
0    0 1 0    1     0
1    0 0 0    0     0
2    0 1 0    1     0
3    0 1 1    1     1
4    0 0 0    0     0
5    1 0 0    1     0

R
...  A
...  4
...  5
...  6
...  3

Bloom-Filter [Bratbergsengen] [Valduriez]: 10010000001001..
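The two extra columns on this slide can be derived from the partition bitmaps as in the following Python sketch. It assumes that "used" marks bit positions set in at least one partition bitmap and that "coll" marks positions set in more than one. That reading matches the values on the slide but is an interpretation, not a definition given there.

```python
# Bitmaps as read from the slide: one row per bit position, one column per partition.
bitmap_rows = [
    [0, 1, 0],
    [0, 0, 0],
    [0, 1, 0],
    [0, 1, 1],
    [0, 0, 0],
    [1, 0, 0],
]

# used: the bit is set in at least one partition bitmap.
used = [int(any(row)) for row in bitmap_rows]

# coll: the bit is set in more than one partition bitmap (a collision).
coll = [int(sum(row) > 1) for row in bitmap_rows]

print(used)  # [1, 0, 1, 1, 0, 1]
print(coll)  # [0, 0, 0, 1, 0, 0]
```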

Page 16: Generalized Hash Teams for Join and Group-By


Implementation Details: Teaming up Join and Grouping

Group_City(Customer ⋈_C# Order)

[Figure: Customer (C#, City) and Order (C#). Customer: partition on City and generate bitmaps for C#. Order: partition with bitmaps for C#.]

Page 17: Generalized Hash Teams for Join and Group-By


Teaming Up Join and Grouping: Build Phase

Customer1
C#  City
5   PA
13  M
25  M
23  PA

HT Join (C#, Ptr): entries for C# 5, 13, 25, 23
HT Aggr (City, Ptr): entries for PA, M

Hash-Area (City, Value, Hit): entries PA 0 and M 0

Page 18: Generalized Hash Teams for Join and Group-By


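Teaming Up Join and Grouping: Probe Phase

Customer1
C#  City
5   PA
13  M
25  M
23  PA

Order1
C#  Value
25  10
3   66
5   33
5   34

HT Join (C#, Ptr): entries for C# 5, 13, 25, 23
HT Aggr (City, Ptr): entries for PA, M

Hash-Area (City, Value, Hit): entries PA 0 and M 0; values overlaid in the slide: 13 0 and 10 1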
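The build and probe phases of one partition can be sketched in Python as follows, using the Customer1 and Order1 rows from the slides. Several points are assumptions made for illustration: Python dictionaries stand in for the two hash tables, list indexes stand in for the pointers, Value and Hit both start at 0, and Hit is treated as a flag that is set once an Order tuple has matched the group.

```python
customer1 = [(5, "PA"), (13, "M"), (25, "M"), (23, "PA")]   # (C#, City)
order1 = [(25, 10), (3, 66), (5, 33), (5, 34)]              # (C#, Value)


def build(customers):
    """Build phase: both hash tables point into one shared hash area."""
    hash_area = []   # one [City, Value, Hit] entry per group
    ht_aggr = {}     # City -> index into hash_area
    ht_join = {}     # C#   -> index into hash_area
    for c_no, city in customers:
        if city not in ht_aggr:
            ht_aggr[city] = len(hash_area)
            hash_area.append([city, 0, 0])
        ht_join[c_no] = ht_aggr[city]
    return ht_join, hash_area


def probe(orders, ht_join, hash_area):
    """Probe phase: each matching Order tuple updates its group's aggregate in place."""
    for c_no, value in orders:
        ptr = ht_join.get(c_no)
        if ptr is None:
            continue                 # no join partner in this partition
        hash_area[ptr][1] += value   # running sum(Value)
        hash_area[ptr][2] = 1        # mark the group as hit
    return [(city, value) for city, value, hit in hash_area if hit]


ht_join, hash_area = build(customer1)
result = probe(order1, ht_join, hash_area)
print(result)   # [('PA', 67), ('M', 10)]
```

The Order tuple with C# 3 has no entry in the join hash table and is skipped, so it never reaches the aggregation. Join results are never materialized; the probe updates the aggregate directly.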

Page 19: Generalized Hash Teams for Join and Group-By


Performance Comparison: Group_City(Customer ⋈_C# Order)

[Figure: performance comparison plot; x-axis: Memory [MB]]

Page 20: Generalized Hash Teams for Join and Group-By


False Drops Estimation and Measurement

Page 21: Generalized Hash Teams for Join and Group-By


Performance Comparison: Group_City(Customer ⋈_C# Order ⋈_O# Lineitem)

[Figure: performance comparison plot; x-axis: Memory [MB]]

Page 22: Generalized Hash Teams for Join and Group-By


False Drops Estimation and Measurement

Page 23: Generalized Hash Teams for Join and Group-By


Conclusion and Future Work

• Look-Ahead Partitioning for Joins and Grouping
• Applicable for hierarchical data structures; correctness does not depend on strict hierarchies
• Applicable for several TPC-D (TPC-H and TPC-R) queries: e.g., Q5, Q10, Q18
• Combining Generalized Hash Teams and Order Preserving Hash Joins (OHJ)

Page 24: Generalized Hash Teams for Join and Group-By


TPC-D Q5

SELECT N_NAME, SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS REVENUE
FROM CUSTOMER, ORDER, LINEITEM, SUPPLIER, NATION, REGION
WHERE C_CUSTKEY = O_CUSTKEY
  AND O_ORDERKEY = L_ORDERKEY
  AND L_SUPPKEY = S_SUPPKEY
  AND C_NATIONKEY = S_NATIONKEY
  AND S_NATIONKEY = N_NATIONKEY
  AND N_REGIONKEY = R_REGIONKEY
  AND R_NAME = '[region]'
  AND O_ORDERDATE >= DATE '[date]'
  AND O_ORDERDATE < DATE '[date]' + INTERVAL 1 YEAR
GROUP BY N_NAME
ORDER BY REVENUE DESC;

Page 25: Generalized Hash Teams for Join and Group-By


TPC-D Q10

SELECT C_CUSTKEY, C_NAME,
       SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS REVENUE,
       C_ACCTBAL, N_NAME, C_ADDRESS, C_PHONE, C_COMMENT
FROM CUSTOMER, ORDER, LINEITEM, NATION
WHERE C_CUSTKEY = O_CUSTKEY
  AND L_ORDERKEY = O_ORDERKEY
  AND O_ORDERDATE >= DATE '[date]'
  AND O_ORDERDATE < DATE '[date]' + INTERVAL 3 MONTH
  AND L_RETURNFLAG = 'R'
  AND C_NATIONKEY = N_NATIONKEY
GROUP BY C_CUSTKEY, C_NAME, C_ACCTBAL, C_PHONE, N_NAME, C_ADDRESS, C_COMMENT
ORDER BY REVENUE DESC;

Page 26: Generalized Hash Teams for Join and Group-By


Indirectly Partitioning a Hierarchical Structure

[Figure: the hierarchy Lineitem (O#), Order (O#, C#), Customer (C#, City), split into Partition 1, Partition 2 and Partition 3.]