
ICS 214B: Transaction Processing and Distributed Data Management

Lecture 9: Fragmentation and Distributed Query Processing

Professor Chen Li

Which simple predicates should we use in Pr?

Desired properties of Pr:
- minimality
- uniformity

Return to example: E(#, NM, LOC, SAL, …)

Common queries:

Qa: select *
    from E
    where LOC=Sa and ...

Qb: select *
    from E
    where LOC=Sb and …

Three choices:

(1) Pr = { }
    F1 = { E }

(2) Pr = {LOC=Sa, LOC=Sb}
    F2 = { σ_{LOC=Sa}(E), σ_{LOC=Sb}(E) }

(3) Pr = {LOC=Sa, LOC=Sb, SAL<10}
    F3 = { σ_{LOC=Sa ∧ SAL<10}(E), σ_{LOC=Sa ∧ SAL≥10}(E),
           σ_{LOC=Sb ∧ SAL<10}(E), σ_{LOC=Sb ∧ SAL≥10}(E) }
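The three choices can be reproduced with a minimal Python sketch. It groups tuples by the truth vector they produce over Pr, which is one way to realize minterm fragments while skipping empty ones. The sample rows and the names `fragment`, `pr1`, `pr2`, `pr3` are assumptions made for illustration, not part of the lecture.

```python
# Hypothetical sample of E(#, NM, LOC, SAL); the rows are made up for illustration.
E = [
    {"#": 5, "NM": "Joe", "LOC": "Sa", "SAL": 10},
    {"#": 7, "NM": "Sally", "LOC": "Sb", "SAL": 25},
    {"#": 8, "NM": "Tom", "LOC": "Sa", "SAL": 5},
    {"#": 12, "NM": "Fred", "LOC": "Sb", "SAL": 8},
]


def fragment(relation, predicates):
    """Group tuples by the minterm (truth vector over `predicates`) they satisfy.

    Minterms that no tuple satisfies never appear, so only non-empty
    fragments are returned.
    """
    fragments = {}
    for t in relation:
        minterm = tuple(p(t) for p in predicates)
        fragments.setdefault(minterm, []).append(t)
    return list(fragments.values())


pr1 = []                                                        # choice (1)
pr2 = [lambda t: t["LOC"] == "Sa", lambda t: t["LOC"] == "Sb"]  # choice (2)
pr3 = pr2 + [lambda t: t["SAL"] < 10]                           # choice (3)

f1, f2, f3 = fragment(E, pr1), fragment(E, pr2), fragment(E, pr3)
print(len(f1), len(f2), len(f3))
```

With this sample data the sketch yields one fragment for choice (1), two for choice (2), and four for choice (3), matching F1, F2 and F3.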

In other words, E splits into four regions:

  LOC=Sa ∧ SAL<10
  LOC=Sa ∧ SAL≥10
  LOC=Sb ∧ SAL<10
  LOC=Sb ∧ SAL≥10

[Figure: F1 keeps all four regions in one fragment, F2 groups them by LOC into two fragments, F3 makes each region its own fragment.]

Qa: select … LOC=Sa ...
Qb: select … LOC=Sb ...

F2 is good… (not F1, F3)

Informal definition

A set of predicates Pr is uniform if, for every Fi ∈ F[Pr], every t ∈ Fi has equal probability of access by every major application.

Note: F[Pr] is the fragmentation defined by the minterm predicates generated by Pr.

Back to example: F1

  LOC=Sa ∧ SAL<10
  LOC=Sa ∧ SAL≥10
  LOC=Sb ∧ SAL<10
  LOC=Sb ∧ SAL≥10

Qa: select … LOC=Sa ...
Qb: select … LOC=Sb ...

[Figure: within the single fragment F1, some tuples have higher probability of access and others have lower probability of access.]

So F1 is not "good"...

Back to example: F2

  LOC=Sa ∧ SAL<10
  LOC=Sa ∧ SAL≥10
  LOC=Sb ∧ SAL<10
  LOC=Sb ∧ SAL≥10

Qa: select … LOC=Sa ...
Qb: select … LOC=Sb ...

[Figure: within each fragment of F2, tuples have the same probability of access.]

So F2 is "good"...

So is F3...

Informal definition

A set of predicates Pr is minimal if no Pr' ⊂ Pr is uniform.

Back to example: uniform?

(1) Pr = { }                        N
(2) Pr = {LOC=Sa, LOC=Sb}           Y
(3) Pr = {LOC=Sa, LOC=Sb, SAL<10}   N

Pr(2) is a uniform proper subset of Pr(3), so Pr(3) is not minimal...

Is a uniform and minimal Pr a good thing?

Not necessarily! But it does simplify the allocation problem...

Derived horizontal fragmentation

E(ENO, NAME, SAL, LOC)    (Owner)
J(ENO, DESCRIPTION, …)    (Member)

Common query: given an employee name, list the projects (s)he works in.

E is fragmented by LOC: F = { E1, E2 }

E1 (at Sa)
#   NM     Loc  Sal
5   Joe    Sa   10
8   Tom    Sa   15
…

E2 (at Sb)
#   NM     Loc  Sal
7   Sally  Sb   25
12  Fred   Sb   15
…

J
#   Description
5   work on 347 hw
7   go to moon
5   build table
12  rest
…

E1 (at Sa)
#   NM     Loc  Sal
5   Joe    Sa   10
8   Tom    Sa   15
…

E2 (at Sb)
#   NM     Loc  Sal
7   Sally  Sb   25
12  Fred   Sb   15
…

J1 = J ⋉ E1 (at Sa)
#   Des
5   work on 347 hw
5   build table
…

J2 = J ⋉ E2 (at Sb)
#   Des
7   go to moon
12  rest
…
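The derivation of J1 and J2 can be sketched in Python as a semijoin on the # attribute. The helper name `semijoin` and the dictionary encoding of rows are assumptions for illustration; the data is the example above.

```python
def semijoin(member, owner, attr="#"):
    """Return the member tuples whose join value appears in owner (member ⋉ owner)."""
    owner_keys = {t[attr] for t in owner}
    return [t for t in member if t[attr] in owner_keys]


E1 = [
    {"#": 5, "NM": "Joe", "Loc": "Sa", "Sal": 10},
    {"#": 8, "NM": "Tom", "Loc": "Sa", "Sal": 15},
]
E2 = [
    {"#": 7, "NM": "Sally", "Loc": "Sb", "Sal": 25},
    {"#": 12, "NM": "Fred", "Loc": "Sb", "Sal": 15},
]
J = [
    {"#": 5, "Des": "work on 347 hw"},
    {"#": 7, "Des": "go to moon"},
    {"#": 5, "Des": "build table"},
    {"#": 12, "Des": "rest"},
]

J1 = semijoin(J, E1)  # project tuples of employees stored at Sa
J2 = semijoin(J, E2)  # project tuples of employees stored at Sb
```

Each J fragment can then be stored at the same site as the E fragment it was derived from, so the common query needs no cross-site join.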

Derived horizontal fragmentation

R, F = { F1, F2, …, Fn }
S, D = { D1, D2, …, Dn } where Di = S ⋉ Fi

Convention: R is owner, S is member.

F could be primary or derived.

Checking completeness and disjointness of derived fragmentation

Example: Say J is:

#   Des
…
33  build chair
…

But no # = 33 in E1 nor in E2!

This J tuple will be in neither J1 nor J2: the fragmentation is not complete.

To get completeness, we need to enforce a referential integrity constraint:

every join attr(#) value of the member relation must appear as a join attr(#) value of the owner relation.

Example:

E1
#   NM    Loc  Sal
5   Joe   Sa   10
…

E2
#   NM    Loc  Sal
5   Fred  Sb   20
…

J
#   Description
5   day off
…

J1
#   Description
5   day off
…

J2
#   Description
5   day off
…

Fragmentation is not disjoint!

To get disjointness: the join attribute (#) should be a key of the owner relation.
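Both failure cases can be checked mechanically. The following is a minimal sketch, assuming rows are encoded as Python tuples whose first field is the join attribute #; the function names are hypothetical.

```python
def semijoin(member, owner):
    """member ⋉ owner on the first field (#) of each row."""
    owner_keys = {row[0] for row in owner}
    return {row for row in member if row[0] in owner_keys}


def is_complete(member, fragments):
    """True if every member tuple lands in at least one derived fragment."""
    return set(member) == set().union(*fragments)


def is_disjoint(fragments):
    """True if no tuple lands in more than one derived fragment."""
    return sum(len(f) for f in fragments) == len(set().union(*fragments))


# Dangling reference: # = 33 has no owner tuple, so completeness fails.
E1, E2 = {(5, "Joe", "Sa", 10)}, {(7, "Sally", "Sb", 25)}
J = {(5, "work on 347 hw"), (33, "build chair")}
dangling = [semijoin(J, E1), semijoin(J, E2)]

# Non-key join attribute: # = 5 appears in both owner fragments, so disjointness fails.
E1b, E2b = {(5, "Joe", "Sa", 10)}, {(5, "Fred", "Sb", 20)}
Jb = {(5, "day off")}
overlapping = [semijoin(Jb, E1b), semijoin(Jb, E2b)]

print(is_complete(J, dangling), is_disjoint(overlapping))
```

The first check fails because of the missing referential integrity constraint; the second fails because # is not a key of the owner relation.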

Summary: horizontal fragmentation

• Type: primary, derived
• Properties: completeness, disjointness
• Predicates: minimal, uniform

Vertical fragmentation

Example:

E
#   NM     Loc  Sal
5   Joe    Sa   10
7   Sally  Sb   25
8   Fred   Sa   15
…

E1
#   NM     Loc
5   Joe    Sa
7   Sally  Sb
8   Fred   Sa
…

E2
#   Sal
5   10
7   25
8   15
…

R[T] ⇒ R1[T1], ..., Rn[Tn]    where Ti ⊆ T

Just like normalization of relations.

Properties: R[T] ⇒ Ri[Ti]

(1) Completeness

    ∪_{all i} Ti = T

(2) Disjointness

    Ti ∩ Tj = Ø for all i, j with i ≠ j

    Example: E(#, LOC, SAL) ⇒ E1(#, LOC), E2(SAL)

    Not a desirable property!! (could not reconstruct R!)

(3) Reconstruction: Lossless join

    ⋈_{all i} Ri = R

    One way to achieve lossless join: repeat the key in all fragments, i.e.,

    Key ⊆ Ti for all i
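A short Python sketch of the reconstruction property, using the E example with rows encoded as tuples. The helpers `project` and `join_on_key` are hypothetical names, and the join assumes the key is the first column of every fragment.

```python
ATTRS = ("#", "NM", "Loc", "Sal")
E = [(5, "Joe", "Sa", 10), (7, "Sally", "Sb", 25), (8, "Fred", "Sa", 15)]


def project(rows, attrs, keep):
    """Projection with duplicate elimination."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(r[i] for i in idx) for r in rows}


def join_on_key(left, right):
    """Natural join of two fragments whose first column is the shared key."""
    by_key = {}
    for r in right:
        by_key.setdefault(r[0], []).append(r[1:])
    return {l + rest for l in left for rest in by_key.get(l[0], [])}


# Key repeated in both fragments: the join gives back E.
E1 = project(E, ATTRS, ("#", "NM", "Loc"))
E2 = project(E, ATTRS, ("#", "Sal"))
reconstructed = join_on_key(E1, E2)

# Disjoint attribute sets (no key in E2): only a cross product is possible.
E2_no_key = project(E, ATTRS, ("Sal",))
cross = {l + s for l in E1 for s in E2_no_key}

print(reconstructed == set(E), len(cross))
```

With the key in both fragments the join returns exactly the three original tuples; without it, the cross product pairs every employee with every salary and the original relation cannot be recovered.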

Hybrid Fragmentation

R
  R1             (horizontal fragment of R)
    R11, R12     (vertical fragments of R1)
  R2             (horizontal fragment of R)
    R21, R22     (vertical fragments of R2)

Hybrid Fragmentation -- Reconstruction

R = (R11 ⋈ R12) ∪ (R21 ⋈ R22)

Vertical fragments are joined; horizontal fragments are combined with ∪.

Allocation

Example: E(#, NM, LOC, SAL)

F1 = σ_{LOC=Sa}(E) ;  F2 = σ_{LOC=Sb}(E)

Qa: select … where LOC=Sa ...
Qb: select … where LOC=Sb …

Site a        Site b

Where do F1, F2 go?

Issues
• Where do queries originate?
• What is communication cost? and size of answers, relations, …
• What is storage capacity, cost at sites? and size of fragments?
• What is processing power at sites?
• What is query processing strategy?
  – How are joins done?
  – Where are answers collected?

Do we replicate fragments?
• Cost of updating copies?
• Writes and concurrency control?
• ...

Optimization problem:

• What is best placement of fragments and/or best number of copies to:
  – minimize query response time
  – maximize throughput
  – minimize "some cost"
  – ...
• Subject to constraints?
  – Available storage
  – Available bandwidth, power, …
  – Keep 90% of response time below X
  – ...

This is an incredibly hard problem.

• Often, can use common sense
  – Place fragments where they are most heavily accessed
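The common-sense heuristic can be written down in a few lines of Python. This is only a sketch of that one heuristic, not a solution to the general optimization problem; the access counts and the function name are invented for illustration.

```python
# Hypothetical access frequencies: how often queries issued at each site
# touch each fragment. The numbers are assumptions, not from the lecture.
access = {
    "F1": {"site_a": 90, "site_b": 10},  # mostly touched by Qa, issued at site a
    "F2": {"site_a": 5, "site_b": 95},   # mostly touched by Qb, issued at site b
}


def place_by_heaviest_access(access):
    """Assign each fragment to the single site that accesses it most often."""
    return {frag: max(sites, key=sites.get) for frag, sites in access.items()}


placement = place_by_heaviest_access(access)
print(placement)
```

The sketch ignores storage limits, replication, update cost and bandwidth, which are exactly the constraints that make the full problem hard.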

Summary

• Horizontal and vertical fragmentation
• Designing good fragmentations and allocation

Next:
• Query processing in distributed databases

Query
  ↓ (1) Decomposition
Algebraic query tree on relations
  ↓ (2) Localization
Algebraic query tree on relation fragments
  ↓ (3) Optimization
Query Plan

Decomposition
• Same as in centralized system
  – Normalization
  – Eliminating redundancy
  – Algebraic rewriting


Normalization

• Convert from query language to relational algebra

Example

SELECT R.A, S.D
FROM R, S
WHERE (R.B=1 and S.C=2) and (R.A = S.A)

π_{R.A, S.D}( σ_{R.B=1 ∧ S.C=2}( R ⋈_{R.A=S.A} S ) )

Eliminate redundancy

E.g.: in conditions:

(S.A=1) ∧ (S.A>5) ⇒ False
(S.A<10) ∧ (S.A<5) ⇒ S.A<5

E.g.: Common sub-expressions

[Figure: a query tree rooted at ∪ in which σ_{cond}(R) appears twice, once combined with S and once combined with T, is rewritten so that a single σ_{cond}(R) is shared by both branches.]

Algebraic rewriting

E.g.: Push conditions down

σ_{cond}( R ⋈ S )  ⇒  σ_{cond3}( σ_{cond1}(R) ⋈ σ_{cond2}(S) )

Query
  ↓ (1) Decomposition
Algebraic query tree on relations
  ↓ (2) Localization
Algebraic query tree on relation fragments

Localization steps
(1) Start with query tree
(2) Replace relations by fragments
(3) Push ∪ up; push π, σ down
(4) Simplify – eliminate unnecessary operations

Notation for fragment

[R: cond]

R is the fragment; cond is the conditions its tuples satisfy.

Example A

(1) σ_{E=3}(R)

(2) σ_{E=3}( [R1: E<10] ∪ [R2: E≥10] )

(3) σ_{E=3}[R1: E<10] ∪ σ_{E=3}[R2: E≥10]

    The second branch is Ø.

(4) σ_{E=3}[R1: E<10]

Rule 1

A: σ_{c1}[R: c2] ⇒ σ_{c1}[R: c1 ∧ c2]

B: [R: False] ⇒ Ø

In example A:

σ_{E=3}[R2: E≥10] ⇒ σ_{E=3}[R2: E=3 ∧ E≥10]
                  ⇒ σ_{E=3}[R2: False]
                  ⇒ Ø
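Rule 1 can be sketched in Python by representing each condition as a closed interval and treating an empty intersection as False. The interval encoding assumes an integer-valued attribute E (so E<10 becomes E≤9), and the function names are hypothetical.

```python
import math


def intersect(c1, c2):
    """Conjunction of two closed-interval conditions; None stands for False."""
    lo, hi = max(c1[0], c2[0]), min(c1[1], c2[1])
    return (lo, hi) if lo <= hi else None


def localize_selection(select_cond, fragments):
    """Rule 1: keep only the fragments whose condition is consistent with the selection."""
    return [
        name
        for name, cond in fragments.items()
        if intersect(select_cond, cond) is not None
    ]


# [R1: E < 10] and [R2: E >= 10], with E assumed to be an integer attribute.
fragments = {"R1": (-math.inf, 9), "R2": (10, math.inf)}

kept = localize_selection((3, 3), fragments)  # selection E = 3
print(kept)
```

For the selection E=3 only R1 survives, which is step (4) of Example A.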

Example B

(1) R ⋈_A S    (A = common attribute)

(2) ( [R1: A<5] ∪ [R2: 5≤A≤10] ∪ [R3: A>10] ) ⋈_A ( [S1: A<5] ∪ [S2: A≥5] )

(3)   ( [R1: A<5] ⋈_A [S1: A<5] )
    ∪ ( [R1: A<5] ⋈_A [S2: A≥5] )
    ∪ ( [R2: 5≤A≤10] ⋈_A [S1: A<5] )
    ∪ ( [R2: 5≤A≤10] ⋈_A [S2: A≥5] )
    ∪ ( [R3: A>10] ⋈_A [S1: A<5] )
    ∪ ( [R3: A>10] ⋈_A [S2: A≥5] )

(4)   ( [R1: A<5] ⋈_A [S1: A<5] )
    ∪ ( [R2: 5≤A≤10] ⋈_A [S2: A≥5] )
    ∪ ( [R3: A>10] ⋈_A [S2: A≥5] )

Rule 2

[R: C1] ⋈_A [S: C2] ⇒ [R ⋈_A S: C1 ∧ C2 ∧ R.A = S.A]

In step 4 of Example B:

[R1: A<5] ⋈_A [S2: A≥5]
  ⇒ [R1 ⋈_A S2: R1.A<5 ∧ S2.A≥5 ∧ R1.A = S2.A]
  ⇒ [R1 ⋈_A S2: False]
  ⇒ Ø
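Applied to every pair of fragments, Rule 2 prunes the six joins of Example B down to three. A minimal Python sketch, again assuming an integer-valued join attribute A so that each fragment condition becomes a closed interval; the names are hypothetical.

```python
import math
from itertools import product


def overlaps(c1, c2):
    """True if two closed intervals share a value, i.e. C1 ∧ C2 ∧ R.A = S.A is satisfiable."""
    return max(c1[0], c2[0]) <= min(c1[1], c2[1])


# Fragment conditions on A, with A assumed to be an integer attribute.
R = {"R1": (-math.inf, 4), "R2": (5, 10), "R3": (11, math.inf)}
S = {"S1": (-math.inf, 4), "S2": (5, math.inf)}

# Keep only the fragment pairs whose join can produce tuples.
pairs = [
    (r, s)
    for (r, rc), (s, sc) in product(R.items(), S.items())
    if overlaps(rc, sc)
]
print(pairs)
```

The surviving pairs are R1 with S1, R2 with S2, and R3 with S2; the other three joins are empty and need not be shipped or executed.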

Localization with derived fragmentation

Example C

(2) ( [R1: A<10] ∪ [R2: A≥10] ) ⋈_K ( [S1: K=R.K ∧ R.A<10] ∪ [S2: K=R.K ∧ R.A≥10] )

(3) ( [R1] ⋈_K [S1] ) ∪ ( [R1] ⋈_K [S2] ) ∪ ( [R2] ⋈_K [S1] ) ∪ ( [R2] ⋈_K [S2] )

(4) ( [R1] ⋈_K [S1] ) ∪ ( [R2] ⋈_K [S2] )

In step 3 of Example C:

[R1: A<10] ⋈_K [S2: K=R.K ∧ R.A≥10]
  ⇒ [R1 ⋈_K S2: R1.A<10 ∧ S2.K=R.K ∧ R.A≥10 ∧ R1.K=S2.K]
  ⇒ [R1 ⋈_K S2: False]    (K is key of R, R1)
  ⇒ Ø

Localization with vertical fragmentation

Example D

Fragments of R: R1(K, A, B), R2(K, C, D)

(1) π_A(R)

(2) π_A( R1(K, A, B) ⋈_K R2(K, C, D) )

(3) π_A( π_{K,A}(R1(K, A, B)) ⋈_K π_{K,A}(R2(K, C, D)) )

    The R2 branch is not really needed.

(4) π_A( R1(K, A, B) )

Rule 3
• Given vertical fragmentation of R:

  Ri = π_{Ai}(R), Ai ⊆ A

• Then for any B ⊆ A:

  π_B(R) = π_B[ ⋈_i Ri | B ∩ Ai ≠ Ø ]
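Rule 3 amounts to a filter over the fragments' attribute sets. A minimal Python sketch using the fragments of Example D; the function name and the dictionary layout are assumptions for illustration.

```python
def fragments_needed(fragment_attrs, wanted):
    """Rule 3: keep only the vertical fragments Ri whose attribute set Ai intersects B."""
    return [
        name
        for name, attrs in fragment_attrs.items()
        if set(attrs) & set(wanted)
    ]


# Vertical fragments of R from Example D.
fragment_attrs = {"R1": ("K", "A", "B"), "R2": ("K", "C", "D")}

only_a = fragments_needed(fragment_attrs, {"A"})        # projection on A
a_and_c = fragments_needed(fragment_attrs, {"A", "C"})  # projection on A and C
print(only_a, a_and_c)
```

A projection on A touches only R1, while a projection on A and C needs both fragments and therefore the join on K.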

Localization with hybrid fragmentation

Example E

R1 = σ_{k<5}[ π_{k,A}(R) ]
R2 = σ_{k≥5}[ π_{k,A}(R) ]
R3 = π_{k,B}(R)

Query: π_A( σ_{k=3}(R) )

Reduced query: π_A( σ_{k=3}(R1) )

Distributed Query Processing

• Decomposition
• Localization
• Optimization (next)