1 comparison of access methods for time-evolving data betty salzberg and vassilis tsotras cs599,...

45
1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa Golpayegani 11/16/2000

Post on 20-Dec-2015

214 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

1

Comparison of Access Methods for Time-Evolving Data

Betty Salzberg and Vassilis Tsotras

CS599, Temporal and Spatial Databases Course.

Presented by:

Atousa Golpayegani

11/16/2000

Page 2: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

2

Temporal database design problem

Access Methods

DBMS Software

Access Method?

Temporal Queries

Time Dimension

Database Designer

B+-tree

R-treesnapshot-index

time-index

Page 3: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

3

Temporal database design problem

Access Methods

DBMS Software

Access Method

Temporal Queries

Selected Time Dimension

Access MethodSelection Criteria

Database Designer

B+-tree R-tree

snapshot-index

time-index

Page 4: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

4

Outline

Brief introduction to temporal databases

Criteria for comparison of access methods

Efficient method design for transaction/Bitemporal data

Method classification and comparison

Page 5: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

5

Conventional Vs. Temporal Databases

Conventional database captures only a single snapshot of the modeled reality, usually the most current. Can not support past and future data.

Ex: “ the current salary of an employee”.

Temporal database supports some time domain and is thus able to manage time varying data. User-defined time is excluded.

Ex: “ the current and the previous salaries of an employee since hiring”.

Page 6: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

6

What is Stored in a Temporal Database?

Tuple-versioning Temporal model is used. In this model Database is a set of records that store the versions of the real life objects. Each record has: A time invariant key A number of time variant attributes (for simplicity just one) One or two intervals

Start time End time

Ex:

Employee (ss#, name, salary, [t1, now) )

Page 7: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

7

Taxonomy of Time in a Temporal Database

Transaction Time Is defined as the time when a fact is stored in the database.

Valid Time Is defined as the time when a fact becomes effective (valid) in

reality. Bitemporal Time

Is the combination of the above two types. Time is assumed to be discrete, and consecutive nonnegative

integers Any change is assumed to occur only at an indicated time

Addition of an object Deletion of an object Value change of an object’s attribute

Page 8: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

8

Transaction Time Database

Time Interval shows when an object was added and deleted. An object is alive from the time it is added to the database until

it is deleted. Ex. Object ‘a’ is the only one that is currently alive.

No record is physically deleted (logical deletion). Therefore database has logical state Ex. Object ‘b’ is deleted.

Logical state at time t consists of those records whose transaction time interval contains t. Ex. S(T4) = (‘a’ , ‘b’)

T1 T2 T3 T4 T5 Time

ab b

Logical state T4

Page 9: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

9

Assumptions in a Transaction Time Database

Past states can not be changed.

Linear transaction time evolution A New database state is created by updating only the current

database state. Another option is called branching transaction time evolution, where

new states can be created from any past database state.

Implicit updating assumption If an object is updated at time t, the database system will be

updated at the same time. There is no delay.

Page 10: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

10

Transaction Time Database Example

T1 T2 T3 T4 T5 Time

(SS1,A,X1)(SS2,B,X2) (SS12B,X2)

key Salary Transaction Time

SS1

SS2

SS2

X1

X2

X2

[ T1, now)

[ T2, T3)

[T4, T5)

Name

A

B

B

Page 11: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

11

T1 T2 T3 T4 T5 Time

ab b

Logical state T4

Transaction Time Database Access Method Characteristic

Store the past logical states.

Support addition/deletion/modification changes on the objects of the current logical state.

Efficiently access and query objects in any of logical states.

Page 12: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

12

Valid Time Database

Time Interval shows the validity period of an object in reality. Ex. The period that a contract is valid.

An object that is not valid anymore will be physically deleted from the database.

Only the latest “snapshot” of the collection of interval-objects are kept.

Valid TimeValid Time

I1I2

State i-1 State i

Page 13: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

13

Valid Time Database Access Method Characteristic

Store the latest collection of interval-objects.

Support addition/deletion/modification changes to this collection.

Efficiently query the interval-objects contained in the collection.

Valid Time

I2

Page 14: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

14

Bitemporal Database

Supports both transaction time and valid time. Instead of a single collection of interval-objects, there is a

sequence of collections, C(ti), indexed by transaction time.

Transaction time and valid time are two orthogonal time dimensions.

Ex. (Contract#, amount, duration, [t1, now))

t1 t4t3t2C(t1) C(t4)C(t3)C(t2)

Page 15: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

15

Bitemporal Database Access Method Characteristic

Store its past logical states.

Support addition/deletion/modification changes on the interval-objects of its current logical state.

Efficiently access and query the interval-objects on any of its states.

Page 16: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

16

Criteria for Comparison of Access Methods

Query

Access Method Costs

Index Pagination and Data Clustering

Migration of Past Data to Another Location

Lower Bounds on I/O Complexity

Page 17: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

17

Query

One of the criteria for the comparison is how efficient an access method answers a query.

Three general classes of queries are chosen.

They can be used for both valid time and transaction time since

from a query perspective, valid time and transaction time are

simply collections of intervals.

Page 18: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

18

Query Type I

Given a contiguous interval T, find all objects alive during this interval.

Representative query for this type:

Pure-timeslice query : A special case of type I when interval

T is reduced to a single time instant t

Ex.” Find all employees working at the company at time T’ “

T1 T2 T3 T8 T9 Time

ab b

TT’

Page 19: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

19

Query Type II

Given a key range and contiguous time interval T, find the objects with keys in the given range that are alive during interval T.

Representative query for this type:

Range-timeslice query : A special case of type II when interval T is reduced to a single time instant t

Ex. “ Find the employees working at the company at time T’ and whose ssn belongs in range K “

T1 T2 T3 T8 T9 Time

ab b

TT’Range = {a,b}

c

Page 20: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

20

Query Type III

Given a key range, find a history of objects in this range.

Representative query for this type: Pure-Key query : A special case of type III where the key

range is reduced to a single key

Ex. “Find the salary history of employee with ssn b“

T1 T2 T3 T8 T9 Time

ab b

Range = {b}c

Page 21: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

21

Bitemporal Queries

Bitemporal queries will be a combination of the previous types.

Ex.“ Find all the company contracts that were valid on v = Jan,1,94 as recorded in the database during transaction time interval T = May,1 - May,20, 1993.

Page 22: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

22

Three-Entry Notation Query-Type Representation

Key / Valid / Transaction

Values for Each entry: Point Range ‘ * ’ ‘ - ‘

Example Transaction time Pure-Timeslice: “ * / - / point “ Valid Pure-Key: “ point / * / - “

Page 23: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

23

Access Method Costs

The performance of an access method is characterized by three costs:

1) Storage Space, to physically store the data records and the structures of the access method

2) Update Processing Time, is the time to update the method’s data structures as a result of a change.

3) Query Time, for each of the basic queries.

Page 24: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

24

Storage Space

For Transaction time and Bitemporal methods, storage space is a function of ‘n’, O(n). ‘n’ is the number of changes in the database.

Ex. For 1000 insertions and 1000 deletions, n is 2000.

For Valid time methods, storage space is a function of ‘l ’, O(l).

‘l ’ is the number of interval-objects currently stored in the method..

Ex. If seven Objects are stored in the database, l = 7.

Page 25: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

25

Update Processing Time

Update processing time is dependent on the access method index structure.

For tree structured access method, it is a logarithmic time.

Ex. The best in-core algorithm for valid pure-timeslice query, requires O (log l ) update processing time per change.

Page 26: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

26

Query Time

A method’s query time is a function of the query answer size, denoted by ‘ a ‘.

It is usually consists of two parts:

Logarithmic time for searching the access method structure to find the answer records.

Time that is required to retrieve the result.

Ex. The best in-core algorithm for valid pure-timeslice query, requires O (log l + a) query time.

Page 27: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

27

Index Pagination and Data clustering

In a database environment the major cost of computation is based on how many pages are transferred between main memory and secondary memory.

Index pagination. How well index nodes of a method are paginated on the secondary memory.

Since index is used to search for and update data, its pagination greatly improves the performance of the access method.

Ex. B+-tree is a well paginated index, since it needs O(logB r) page accesses for searching or updating r objects, using pages of size B.

Page 28: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

28

Index Pagination and Data clustering

Data Clustering. Is to store ‘logically’ related data, physically close.

It can substantially improve the performance of an access method since in order to answer query, fewer pages are accessed.

Ex. A transaction pure-timeslice query takes O(logB n + a/B) page accesses if the result are clustered in pages of size B. ‘n’ is the number of changes in the database and ‘a’ is the answer size. If result is not clustered O(logB n + a) page accesses are required.

Page 29: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

29

Migration of Past Data to Another Location

Methods that support transaction time maintain all their past states. This property can easily result in excessive amounts of data.

Methods that support the following two criteria will have a better performance:

Whether or not past data can be separated from the current data, so that the smaller collection of the current data can be accessed more efficiently.

Whether data is appended sequentially to the method and never changed, so that write-once read-many (WORM) devices can be used.

Page 30: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

30

Migration of Past Data to Another Location

Since current data is more frequently accessed than past data, If a method supports these criteria, past data can be moved to a higher capacity, but slower medium such as optical disk.

There are two ways for this separation:

With Manual approach, a process vacuums all records that are ‘dead’ when the process is called and moves them to the optical disk.

With Automated approach, any objects that become ‘dead’ (logically deleted) will be moved to the optical disk

Page 31: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

31

Lower Bound on I/O Complexity

The minimum number of I/Os to solve a transaction time query is at least (logB n + a/B)

Since in a paginated environment O(logB n) is the minimum number of I/Os to search for the answer in the index structure and O(a/B) is the minimum number of pages to store the answer.

This lower bound is under the assumption that all time instants have the same probability of being in the query predicate.

If queries have special probabilities, the logarithmic search part will be improved. Ex. If most queries ask for the recent times.

Page 32: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

32

Lower Bound on I/O Complexity

The minimum number of I/Os to solve a valid time query is also at least (logB l + a/B).

A method is called “ I/O Optimal “ if it achieves:

O(logB n + a/B) query time and O(n/B) space for transaction time

O(logB l + a/B) query time and O(l/B) space for valid time.

O(logB m) update processing per change. ‘m’ is the number of alive objects when the update takes place.

Page 33: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

33

Efficient Method Design for Transaction/Bitemporal Data

The Question is how to efficiently store large amount of data produced by a transaction time database.

There are two approaches for storing the data to answer pure-timeslice queries efficiently (copy, log):

Copy approach stores a copy of the transaction database state (timeslice) for each transaction time that at least one change occurred.

T1 T2 T3 T4 T5 Time

ab b

CopyCopy Copy CopyCopy

Page 34: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

34

Efficient Method Design for Transaction/Bitemporal Data

These copies are indexed by time t. Since changes arrive in order, therefore indexes are paginated (which provides a minimal query time).

It means O(logB n) page accesses are needed to find the timeslice and O(a/B) pages to retrieve the result. Therefore the page accesses and also query time is O(logB n + a/B)

T1 T2 T3 T4 T5 Time

ab b

Copy 2 Rec.Copy 1 Rec. Copy 2 Rec. Copy 2 Rec.Copy 2 Rec.

Page 35: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

35

Disadvantages:

The space used can in the worst case be O(n2/ B) pages.

1 + 2 + 3 + 4 + 5 + . . . + n = n *(n+1)/2 = (n2 + n)/2 O(n2/ B ).

Update processing per change in the worst: O(n/ B ).

Efficient Method Design for Transaction/Bitemporal Data

T1 T2 T3 T4 T5 Time

ab

Copy 3 Rec.Copy 1 Rec. Copy 2 Rec. Copy 5 Rec. . . .Copy 4 Rec.

dc

e

Page 36: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

36

T1 T2 T3 T4 T5 Time

ab b

Copy 1 Rec.Copy 1 Rec. Copy 1 Rec. Copy 1 Rec.Copy 1 Rec.

Efficient Method Design for Transaction/Bitemporal Data

Log approach stores only the changes that occur in the database time-stamped by the time instant on which they occurred.

The space used is reduced to O(n/ B) pages. The Update processing per change is also reduces to O(1). The Query time increased to O(n/ B) Since to find a timeslice all

the past records may have to be searched

Page 37: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

37

Efficient Method Design for Transaction/Bitemporal Data

For the Pure-Key queries a better approach than copy and log is to store history of each key separately, This creates a key-Only method.

For the Range-Timeslice queries it is best to cluster by transaction time and key within pages.

For the Bitemporal queries the copy approach can be used to store the collections at given transaction time and a log approach to store changes between copies.

Another approach is to index the bitemporal objects on a single time axis. This approach is not efficient because very few of the accessed objects may satisfy the valid-time predicate.

Page 38: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

38

Method Classification

Transaction access method is classified into three categories:

Time-only : clustering data by time only.

Key-only : clustering data by key only.

Time-key : clustering data by both time and key.

Page 39: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

39

Examples of Method Classes

Transaction Time

time-only key-only time-key append-only tree reverse chaining composite indexes

time-index accession lists segment-R tree

differential file approach time sequence array write-once B-tree

achievable time index C-lists time-split B-tree

snap shot index multiversion access window method structure

overlapping B-tree

Page 40: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

40

Time Index Access Method (Time Only)

Page 41: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

41

Time Index Access Method (Time Only)

Based B+-Tree access method on the time axis. New nodes are always added on the rightmost leaf of the index. It is equivalent to Copy approach

Query Time Update processing

timeSpace

Pure-TimeSlice

O(logB n + a/B) O(n/B) O(n2/B)

Range-TImeSclice

O(M logB n + a)O(n/B) O(n2/B)

Pure-Key N/A N/A N/A

Page 42: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

42

Reverse chaining (Key-Only)

B+-treeIndex

Current Past

B+-treeIndex

Key

(K1,A1,I1) ø

(K2,A2,I2) P2

(K3,A3,I3) P3

(K2,A2”,I2”)(K2,A2’,I2’) ø

(K3,A3’,I3’) ø

(K4,A4,I4) ø

(K4,A4”,I4”)

(K4,A4’,I4’)

Key

Page 43: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

43

Reverse Chaining

The current and past data are stored separately. Previous version of a given key are linked together in reverse

chronological order. Current data is assume to be queried more often.(smaller data size

means faster query) Each tuple includes

Key Attribute value Life span interval Pointer (which points to the previous version, if any, of this key).

Update processing is O(logB n) , the query time is O(logB n + a), and the used space is O(n / B)

Page 44: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

44

Comparison of Access Methods for time evolving data

Page 45: 1 Comparison of Access Methods for Time-Evolving Data Betty Salzberg and Vassilis Tsotras CS599, Temporal and Spatial Databases Course. Presented by: Atousa

45

The End