ICDE 2002, San Jose, CA Efficient Temporal Join Processing using Indices Donghui Zhang University of California, Riverside Vassilis J. Tsotras University of California, Riverside Bernhard Seeger University of Marburg, Germany


ICDE 2002, San Jose, CA

Efficient Temporal Join Processing using Indices

Donghui Zhang

University of California, Riverside

Vassilis J. Tsotras

University of California, Riverside

Bernhard Seeger

University of Marburg, Germany

Contents

• Problem definition: GTE-Join
• Straightforward approaches
• Temporal indexing
• Proposed join algorithms
• Performance study
• Conclusions

Problem Definition

Temporal record: (key, start, end, attributes)

TE-Join: two records qualify for join if
• their time intervals intersect; and
• their keys are equal.

DeptLocation

Dept  Start  End  Location
D1    1      7    Boston
D1    8      20   Riverside
D2    1      20   Los Angeles
D3    1      15   New York
D3    16     20   San Jose

DeptManager

Dept  Start  End  Manager
D1    1      10   John
D1    11     20   Mart
D2    1      20   Jane
D3    1      20   Alice

DeptLocationManager

Dept  Start  End  Location     Manager
D1    1      7    Boston       John
D1    8      10   Riverside    John
D1    11     20   Riverside    Mart
D2    1      20   Los Angeles  Jane
D3    1      15   New York     Alice
D3    16     20   San Jose     Alice

TE-Join: “find the locations and managers of all departments over time”.
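
As a rough illustration of the definition, the example can be reproduced with a naive nested-loop join. This is only a sketch, not one of the algorithms from the talk: the `Rec` type and `te_join` function are hypothetical names, and intervals are assumed to be closed, [start, end], as the integer timestamps in the tables suggest.

```python
from typing import NamedTuple

class Rec(NamedTuple):
    key: str
    start: int
    end: int      # closed interval [start, end] (assumption based on the tables)
    attr: str

def te_join(left, right):
    """Nested-loop TE-Join: equal keys and intersecting time intervals."""
    out = []
    for a in left:
        for b in right:
            lo, hi = max(a.start, b.start), min(a.end, b.end)
            if a.key == b.key and lo <= hi:
                # The joined record is valid during the common interval.
                out.append((a.key, lo, hi, a.attr, b.attr))
    return out

dept_location = [
    Rec("D1", 1, 7, "Boston"), Rec("D1", 8, 20, "Riverside"),
    Rec("D2", 1, 20, "Los Angeles"),
    Rec("D3", 1, 15, "New York"), Rec("D3", 16, 20, "San Jose"),
]
dept_manager = [
    Rec("D1", 1, 10, "John"), Rec("D1", 11, 20, "Mart"),
    Rec("D2", 1, 20, "Jane"), Rec("D3", 1, 20, "Alice"),
]
result = te_join(dept_location, dept_manager)
```

Here `result` contains the six rows of DeptLocationManager. The nested loop compares every pair of records, which is exactly the cost that index-based approaches try to avoid.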

Problem Definition

GTE-Join: general TE-Join. Record keys should be in a certain range r, and time intervals should intersect a given interval i.

TE-Join is a special case, when r and i are (-∞, +∞).

Interesting because:
• temporal relations are large.

[Figure: query rectangle in the key-time plane, spanning key range r and time interval i.]

DeptLocation

Dept  Start  End  Location
D1    1      7    Boston
D1    8      20   Riverside
D2    1      20   Los Angeles
D3    1      15   New York
D3    16     20   San Jose

DeptManager

Dept  Start  End  Manager
D1    1      10   John
D1    11     20   Mart
D2    1      20   Jane
D3    1      20   Alice

DeptLocationManager

Dept  Start  End  Location     Manager
D1    5      7    Boston       John
D1    8      10   Riverside    John
D2    5      10   Los Angeles  Jane

GTE-Join: “find the locations and managers of departments in range [D1, D2] during time [5, 10]”.
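
The GTE-Join result above can be checked with a small brute-force sketch. The function name `gte_join` and the tuple layout (key, start, end, attribute) are assumptions made for illustration; intervals are taken as closed, and the output interval is clipped to i, which is what the result table shows.

```python
def gte_join(left, right, r, i):
    """Brute-force GTE-Join over (key, start, end, attr) tuples.

    r = (key_lo, key_hi) and i = (t_lo, t_hi) are closed ranges.
    """
    (k_lo, k_hi), (t_lo, t_hi) = r, i
    out = []
    for a in left:
        for b in right:
            if a[0] != b[0] or not (k_lo <= a[0] <= k_hi):
                continue
            # Common part of both record intervals and the query interval.
            lo = max(a[1], b[1], t_lo)
            hi = min(a[2], b[2], t_hi)
            if lo <= hi:
                out.append((a[0], lo, hi, a[3], b[3]))
    return out

dept_location = [("D1", 1, 7, "Boston"), ("D1", 8, 20, "Riverside"),
                 ("D2", 1, 20, "Los Angeles"),
                 ("D3", 1, 15, "New York"), ("D3", 16, 20, "San Jose")]
dept_manager = [("D1", 1, 10, "John"), ("D1", 11, 20, "Mart"),
                ("D2", 1, 20, "Jane"), ("D3", 1, 20, "Alice")]

rows = gte_join(dept_location, dept_manager, r=("D1", "D2"), i=(5, 10))
```

Note that the pair (Riverside, Mart) is dropped even though the two records intersect each other, because their common interval [11, 20] lies outside [5, 10].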

Straightforward Solutions

1. Non-indexed join;
2. Unsynchronized join;
3. Synchronized join using B+-trees;
4. Synchronized join using R-trees.

Straightforward Solutions

1. Non-indexed join: existing TE-Join research [Zur97] focuses on non-indexed joins; not efficient for GTE-Join because it requires a full scan.

2. Unsynchronized join: separate the selection and join phases; not efficient because:
• the intermediate results must be stored;
• selection in one relation ignores the data distribution of the other relation.
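
The unsynchronized approach can be sketched as two separate phases. This is a minimal in-memory illustration with hypothetical function names; in a real system the selection phase would be an index query on each relation, and the two intermediate lists would have to be materialized before the join starts.

```python
def select(relation, r, i):
    """Selection phase: keys in r, intervals intersecting i (closed ranges)."""
    (k_lo, k_hi), (t_lo, t_hi) = r, i
    return [rec for rec in relation
            if k_lo <= rec[0] <= k_hi and rec[1] <= t_hi and rec[2] >= t_lo]

def sort_merge_join(left, right, i):
    """Join phase: sort both inputs on key, then merge equal-key groups."""
    t_lo, t_hi = i
    left, right = sorted(left), sorted(right)
    out, j = [], 0
    for a in left:
        # Advance the right cursor past keys smaller than the current key.
        while j < len(right) and right[j][0] < a[0]:
            j += 1
        k = j
        while k < len(right) and right[k][0] == a[0]:
            b = right[k]
            lo, hi = max(a[1], b[1], t_lo), min(a[2], b[2], t_hi)
            if lo <= hi:
                out.append((a[0], lo, hi, a[3], b[3]))
            k += 1
    return out

def unsynchronized_join(left, right, r, i):
    sel_left = select(left, r, i)      # intermediate result 1
    sel_right = select(right, r, i)    # intermediate result 2
    return sort_merge_join(sel_left, sel_right, i), len(sel_left), len(sel_right)

dept_location = [("D1", 1, 7, "Boston"), ("D1", 8, 20, "Riverside"),
                 ("D2", 1, 20, "Los Angeles"),
                 ("D3", 1, 15, "New York"), ("D3", 16, 20, "San Jose")]
dept_manager = [("D1", 1, 10, "John"), ("D1", 11, 20, "Mart"),
                ("D2", 1, 20, "Jane"), ("D3", 1, 20, "Alice")]

joined, n_left, n_right = unsynchronized_join(
    dept_location, dept_manager, r=("D1", "D2"), i=(5, 10))
```

The two selections run independently: each one keeps every record that qualifies on its own side, whether or not a partner exists in the other relation.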

Straightforward Solutions

3. Synchronized join using B+-trees.

If clustered on start: not efficient.

[Figure: time axis from tmin to tmax showing the query interval i and two records, x1 and x2.]

Clustering on end is similar.

Straightforward Solutions

3. Synchronized join using B+-trees.

If clustered on key:
• records with keys in r are stored together and are sorted;
• focus on these records in each relation and sort-merge join them, skipping those whose intervals do not intersect i.

However, this is not efficient, since the records in the query rectangle are scattered.

Straightforward Solutions

4. Synchronized join using R-trees.

• Store each record as a two-dimensional interval in the R-tree;
• Use existing R-tree join algorithms [BKS93, HJR97];
• Modification: integrate the selection regarding the query rectangle.

However, this is not efficient, since R-trees do not handle long intervals well.

Our Solutions

• Synchronized join using temporal indices.
• Multi-version B+-tree (MVBT) [BGO+96]: asymptotically optimal space, update, and query.
• We propose two categories of synchronized, MVBT-based join algorithms (they apply to other temporal indices as well).

Review of MVBT

Suppose a page holds up to 3 records.

[Figure sequence: records plotted in the key-time plane as time advances from t0 through t1 and t2 to now. The resulting forest has three roots: Root 1 covers [t0, t1), Root 2 covers [t1, t2), and Root 3 covers [t2, now).]

Review of MVBT

• A “forest”: different trees may overlap;
• Root nodes correspond to contiguous, non-intersecting time intervals;
• A record may be stored in multiple pages; the end time of all but the last copy is +∞;
• Range-interval selection algorithms [BS96] avoid duplicates by reporting only the first copy.

The Incorrect End Time Problem

[Figure: records x and y in the key-time plane, with copy point t1 and time t2.]

[BS96] reports the first copy of x (whose end is +∞); this would lead GTE-Join algorithms to join x with y.

Solution: report the rightmost copy!
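
The effect can be seen in a toy model of record copies. Everything here is an illustrative assumption rather than the actual MVBT page layout: the `Copy` class, the `page_from` field (creation time of the page holding the copy), the half-open lifespans, and the concrete timestamps. The record y is taken to be a record of the other relation with the same key that starts after x has ended.

```python
import math
from dataclasses import dataclass

INF = math.inf

@dataclass(frozen=True)
class Copy:
    key: str
    start: float
    end: float        # half-open lifespan [start, end); INF on stale copies
    page_from: float  # hypothetical: creation time of the page holding the copy

def first_copy(copies):
    return min(copies, key=lambda c: c.page_from)

def rightmost_copy(copies):
    return max(copies, key=lambda c: c.page_from)

def intersects(a, b):
    return a.start < b.end and b.start < a.end

t1, t2 = 10, 20
x_copies = [
    Copy("k", 2, INF, page_from=0),   # copy left behind at the copy point t1
    Copy("k", 2, t2, page_from=t1),   # last copy, carries the real end time t2
]
y = Copy("k", 25, 30, page_from=t1)   # starts after x has ended

wrong = intersects(first_copy(x_copies), y)      # True: x looks alive forever
right = intersects(rightmost_copy(x_copies), y)  # False: x ended at t2
```

Using the first copy makes x appear to overlap y, so a join would emit a false result; the rightmost copy carries the correct end time.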

Top-down Approaches

Idea: for each pair of trees, one from each MVBT forest, perform a synchronized tree traversal (STT).

STT for two trees:
• initially, join the root nodes;
• to join two nodes, join their children;
• eventually, join the elements in leaf pages.

Open question: what is the join condition?
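
A depth-first STT can be sketched on a generic tree of bounding rectangles. This is a simplified illustration, not the MVBT algorithm from the talk: the `Node` class is hypothetical, both trees are assumed to have the same height, and the join condition used here (the two node rectangles intersect each other and the query rectangle) ignores the MVBT-specific issue of record copies.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    rect: tuple                                   # (key_lo, key_hi, t_lo, t_hi), closed
    children: list = field(default_factory=list)  # empty for leaf pages
    records: list = field(default_factory=list)   # (key, start, end) in leaf pages

def rects_intersect(a, b):
    return a[0] <= b[1] and b[0] <= a[1] and a[2] <= b[3] and b[2] <= a[3]

def stt(na, nb, query, out):
    """Depth-first synchronized traversal of two trees of equal height."""
    if not (rects_intersect(na.rect, nb.rect)
            and rects_intersect(na.rect, query)
            and rects_intersect(nb.rect, query)):
        return                                    # prune this pair of subtrees
    if not na.children and not nb.children:       # two leaf pages: join records
        for a in na.records:
            for b in nb.records:
                lo = max(a[1], b[1], query[2])
                hi = min(a[2], b[2], query[3])
                if a[0] == b[0] and query[0] <= a[0] <= query[1] and lo <= hi:
                    out.append((a[0], lo, hi))
        return
    for ca in na.children:                        # to join two nodes,
        for cb in nb.children:                    # join their children
            stt(ca, cb, query, out)

tree_a = Node((1, 9, 0, 10), children=[
    Node((1, 5, 0, 10), records=[(1, 0, 4), (5, 6, 10)]),
    Node((6, 9, 0, 10), records=[(7, 2, 8)]),
])
tree_b = Node((1, 9, 0, 12), children=[
    Node((1, 5, 3, 12), records=[(1, 3, 5), (5, 11, 12)]),
    Node((6, 9, 0, 1), records=[(7, 0, 1)]),
])
pairs = []
stt(tree_a, tree_b, query=(1, 5, 0, 20), out=pairs)
```

Only the two leaves covering keys 1 to 5 are joined at the record level; the other leaf pairs are pruned because their rectangles do not intersect each other or the query rectangle.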

Balancing Condition Optimization (BCO)

[Figure: pages 0 to 6 in the key-time plane, with records x and y.]

To find <x, y>, page 3 and page 0 have to join. In general, this means joining two pages even though they do not intersect. Inefficient!

BCO: balance two conditions:
(1) only intersecting pages join;
(2) examine records even if they are not the last copy.

E.g., join <x, y> when joining page 2 with page 0.

Virtual Height Optimization (VHO)

[Figure: two trees, one with nodes A1 (and A1') and A2 to A4, the other with nodes B1 to B7.]

At the middle level, STT joins: <A2, B2>, <A3, B2>, <A4, B2>, <A2, B3>, <A3, B3>, <A4, B3>.

With VHO: <A1, B2>, <A1, B3>.

Sideways Approach 1: Link-based

In each leaf page, store a pointer to its predecessor.

[Figure: leaf pages A, B, C, and D.]

For GTE-Join:
• find pairs of data pages that intersect with the right border of the query rectangle and with each other;
• keep such pairs in a priority queue;
• sweep left synchronously;
• special techniques to avoid duplicates.

Sideways Approach 2: Plane Sweep

• Similar to link-based;
• Maintain two priority queues, one for each MVBT;
• At each step, access the leaf page with the largest end time and add its records to the buffer;
• When adding records to the buffer, join them with the existing records from the other MVBT;
• Throw away useless records.
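
The sweep can be sketched at the record level. This is a simplification made for illustration: it ignores leaf pages, MVBT copies, and the query rectangle, and simply sweeps individual records in order of decreasing end time. The function name and the tuple layout (key, start, end, attribute) with closed intervals are assumptions.

```python
import heapq

def plane_sweep_join(left, right):
    """Sweep from right to left over two relations, largest end time first."""
    # One priority queue per input; the index n breaks ties between equal ends.
    queues = [[(-rec[2], n, rec) for n, rec in enumerate(rel)]
              for rel in (left, right)]
    for q in queues:
        heapq.heapify(q)
    buffers = [[], []]
    out = []
    while queues[0] or queues[1]:
        # Pick the side whose next record has the largest end time.
        if not queues[1] or (queues[0] and queues[0][0][0] <= queues[1][0][0]):
            side = 0
        else:
            side = 1
        _, _, rec = heapq.heappop(queues[side])
        sweep = rec[2]
        # Throw away useless records: anything starting after the sweep
        # position cannot intersect this record or any later one.
        for s in (0, 1):
            buffers[s] = [b for b in buffers[s] if b[1] <= sweep]
        # Join the new record with buffered records from the other input.
        for other in buffers[1 - side]:
            if other[0] == rec[0] and max(other[1], rec[1]) <= min(other[2], rec[2]):
                a, b = (rec, other) if side == 0 else (other, rec)
                out.append((a[0], max(a[1], b[1]), min(a[2], b[2]), a[3], b[3]))
        buffers[side].append(rec)
    return out

dept_location = [("D1", 1, 7, "Boston"), ("D1", 8, 20, "Riverside"),
                 ("D2", 1, 20, "Los Angeles"),
                 ("D3", 1, 15, "New York"), ("D3", 16, 20, "San Jose")]
dept_manager = [("D1", 1, 10, "John"), ("D1", 11, 20, "Mart"),
                ("D2", 1, 20, "Jane"), ("D3", 1, 20, "Alice")]
swept = sorted(plane_sweep_join(dept_location, dept_manager))
```

Because records arrive in order of decreasing end time, a buffered record whose start lies to the right of the sweep position can never match again, which is what keeps the buffers small.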

Performance Study

Notation    Meaning
mvbt_df     Synchronized MVBT, depth-first
mvbt_bf     Synchronized MVBT, breadth-first
mvbt_link   Synchronized MVBT, link-based
mvbt_ps     Synchronized MVBT, plane-sweep
mvbt_sm     Unsynchronized, sort-merge after selection
b+          Synchronized B+-tree, index on key
r*_df       Synchronized R*-tree, depth-first
r*_bf       Synchronized R*-tree, breadth-first

Experimental Setup

• Implemented in GNU C++;
• Sun Enterprise 250 Server machine with two UltraSPARC-II processors using Solaris 2.8;
• Page size = 8KB;
• Buffer size = 10MB; LRU buffer;
• Each data set: 10 million records;
• QRS: size ratio between the query rectangle and the whole space;
• Long intervals: 1/100 of time space;
• Short intervals: 1/10,000 of time space.
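
As a small worked example of the QRS definition, the ratio can be computed as the area of the query rectangle over the area of the key-time space. The function name and the concrete ranges below are made up for illustration; they are not the ranges used in the experiments.

```python
from fractions import Fraction

def qrs(key_range, time_range, key_space, time_space):
    """Area of the query rectangle divided by the area of the whole space."""
    (k_lo, k_hi), (t_lo, t_hi) = key_range, time_range
    (ks_lo, ks_hi), (ts_lo, ts_hi) = key_space, time_space
    return Fraction((k_hi - k_lo) * (t_hi - t_lo),
                    (ks_hi - ks_lo) * (ts_hi - ts_lo))

# Hypothetical space of 1000 key units by 1000 time units; the query
# covers one tenth of each dimension.
ratio = qrs((0, 100), (0, 100), (0, 1000), (0, 1000))   # Fraction(1, 100), i.e. 1%
```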

GTE-Join Performance

[Figure: total time (# sec), split into IO and CPU, for mvbt_df, mvbt_bf, mvbt_link, mvbt_ps, mvbt_sm, b+, r*_df, and r*_bf; vertical axis from 0 to 4000.]

Joining mainly long intervals.

GTE-Join Performance

[Figure: total time (# sec), split into IO and CPU, for mvbt_df, mvbt_bf, mvbt_link, mvbt_ps, mvbt_sm, b+, r*_df, and r*_bf; vertical axis from 0 to 2500.]

Joining mainly short intervals.

GTE-Join Performance

[Figure: total time (# sec, log scale from 10 to 100000) for QRS = 0.1%, 1%, and 10%, comparing mvbt_df, mvbt_link, mvbt_ps, mvbt_sm, b+, and r*_df.]

Varying QRS.

Conclusions

• We addressed the GTE-Join;
• The unsynchronized approach is not efficient;
• Synchronized approaches based on traditional indices (B+-tree, R-tree) are also not efficient;
• We proposed synchronized approaches based on temporal indices (MVBT);
• We also proposed the BCO and VHO optimizations;
• Experiments: link-based is the best.
