cs 347notes 021 cs 347: parallel and distributed data management notes02: distributed db design...
TRANSCRIPT
CS 347 Notes 02 1
CS 347: Parallel and Distributed
Data ManagementNotes02: Distributed DB
Design
Hector Garcia-Molina
CS 347 Notes 02 2
Distributed DB Design
Top-down approach: - have DB… - how to split and allocate the sites
Multi-DBs (or bottom-up): no design issues!
Chapter 5Ozsu & Valduriez
CS 347 Notes 02 3
Two issues in DDB design:
• Fragmentation• Allocation
Note: issues not independent, but will cover separately
CS 347 Notes 02 4
Example
Employee relation E (#,name,loc,sal,…) 40% of queries: 40% of queries: Qa: select * Qb: select *
from E from Ewhere loc=Sa where
loc=Sband… and ...
CS 347 Notes 02 5
Example
Employee relation E (#,name,loc,sal,…) 40% of queries: 40% of queries: Qa: select * Qb: select *
from E from Ewhere loc=Sa where
loc=Sband… and ...
Motivation: Two sites: Sa, Sb Qa QbSa Sb
CS 347 Notes 02 7
# NM Loc Sal
E 578
Sa 10Sally Sb 25Tom Sa 15
Joe
# NM Loc Sal # NM Loc Sal58
Sa 10Tom Sa 15Joe 7 Sb 25Sally
..
..
....
F
At SaAt Sb
CS 347 Notes 02 9
F = { F1, F2 }
F1 = loc=Sa E F2 = loc=Sb E
called primary horizontal fragmentation
CS 347 Notes 02 10
Fragmentation
• Horizontal Primary depends on local attributes
R Derived depends on foreign
relation
• Vertical
R
CS 347 Notes 02 11
Fragmentation
• Horizontal Primary depends on local attributes
R Derived depends on foreign
relation
• Vertical
R Fragmentation also calledSharding
CS 347 Notes 02 12
Three common horizontal partitioning techniques
• Round robin• Hash partitioning• Range partitioning
CS 347 Notes 02 13
• Round robin
R D0 D1 D2
t1 t1t2 t2t3 t3t4 t4... t5
• Evenly distributes data• Good for scanning full relation• Not good for point or range queries
CS 347 Notes 02 14
• Hash partitioning
R D0 D1 D2
t1h(k1)=2 t1t2h(k2)=0 t2t3h(k3)=0 t3t4h(k4)=1 t4...
• Good for point queries on key; also for joins• Not good for range queries; point queries not on
key• If hash function good, even distribution
CS 347 Notes 02 15
• Range partitioning
R D0 D1 D2
t1: A=5 t1t2: A=8 t2t3: A=2 t3t4: A=3 t4...
• Good for some range queries on A• Need to select good vector: else unbalance
data skew execution skew
4 7
partitioningvector
V0 V1
CS 347 Notes 02 16
Which are good fragmentations?
Example:
F = { F1, F2 }
F1 = sal<10 E F2 = sal>20 E
CS 347 Notes 02 17
Which are good fragmentations?
Example:
F = { F1, F2 }
F1 = sal<10 E F2 = sal>20 E
Problem: Some tuples lost!
CS 347 Notes 02 18
Which are good fragmentations?
Second example:
F = { F3, F4 }
F3 = sal<10 E F4 = sal>5 E
CS 347 Notes 02 19
Which are good fragmentations?
Second example:
F = { F3, F4 }
F3 = sal<10 E F4 = sal>5 E
Tuples with 5 < sal < 10 are duplicated...
CS 347 Notes 02 20
Prefer to deal with replication explicitly
Example: F = { F5, F6, F7 }
F5 = sal 5 E F6 = 5< sal <10
E
F7 = sal 10 E
Then replicate F6 if convenient (part of allocation problem)
CS 347 Notes 02 21
Desired properties for horizontal fragmentation
R F ={ F1, F2, … }
(1) Completeness
t R, Fi F such that t Fi
CS 347 Notes 02 22
(2) Disjointness t Fi, Fj such that tFj, i j, Fi, Fj F
(3) Reconstruction - ignore
CS 347 Notes 02 23
How do we get completeness and disjointness?
(1) Check it “manually”!
e.g., F1 = sal<10 E ; F2 = sal10 E
CS 347 Notes 02 24
How do we get completeness and disjointness?
(2) “Automatically” generate fragments with these properties
Desired simple predicates Fragments
CS 347 Notes 02 25
Example of generation
• Say queries use predicates:A<10, A>5, Loc = SA, Loc = SB
• Next: - generate “minterm” predicates
- eliminate useless ones
CS 347 Notes 02 26
Minterm predicates (part I)(1) A<10 A>5 Loc=SA Loc=SB
(2) A<10 A>5 Loc=SA ¬(Loc=SB)(3) A<10 A>5 ¬(Loc=SA) Loc=SB
(4) A<10 A>5 ¬(Loc=SA) ¬(Loc=SB)(5) A<10 ¬(A>5) Loc=SA Loc=SB
(6) A<10 ¬(A>5) Loc=SA ¬(Loc=SB)(7) A<10 ¬(A>5) ¬(Loc=SA) Loc=SB
(8) A<10 ¬(A>5) ¬(Loc=SA) ¬(Loc=SB)
CS 347 Notes 02 27
Minterm predicates (part I)(1) A<10 A>5 Loc=SA Loc=SB
(2) A<10 A>5 Loc=SA ¬(Loc=SB)(3) A<10 A>5 ¬(Loc=SA) Loc=SB
(4) A<10 A>5 ¬(Loc=SA) ¬(Loc=SB)(5) A<10 ¬(A>5) Loc=SA Loc=SB
(6) A<10 ¬(A>5) Loc=SA ¬(Loc=SB)(7) A<10 ¬(A>5) ¬(Loc=SA) Loc=SB
(8) A<10 ¬(A>5) ¬(Loc=SA) ¬(Loc=SB)
CS 347 Notes 02 28
Minterm predicates (part I)(1) A<10 A>5 Loc=SA Loc=SB
(2) A<10 A>5 Loc=SA ¬(Loc=SB)(3) A<10 A>5 ¬(Loc=SA) Loc=SB
(4) A<10 A>5 ¬(Loc=SA) ¬(Loc=SB)(5) A<10 ¬(A>5) Loc=SA Loc=SB
(6) A<10 ¬(A>5) Loc=SA ¬(Loc=SB)(7) A<10 ¬(A>5) ¬(Loc=SA) Loc=SB
(8) A<10 ¬(A>5) ¬(Loc=SA) ¬(Loc=SB)
A 5
5 < A < 10
CS 347 Notes 02 29
Minterm predicates (part II)
(9) ¬(A<10) A>5 Loc=SA Loc=SB
(10) ¬(A<10) A>5 Loc=SA ¬(Loc=SB)(11) ¬(A<10) A>5 ¬(Loc=SA) Loc=SB
(12) ¬(A<10) A>5 ¬(Loc=SA) ¬(Loc=SB)(13) ¬(A<10) ¬(A>5) Loc=SA Loc=SB
(14) ¬(A<10) ¬(A>5) Loc=SA ¬(Loc=SB)(15) ¬(A<10) ¬(A>5) ¬(Loc=SA) Loc=SB
(16) ¬(A<10) ¬(A>5) ¬(Loc=SA) ¬(Loc=SB)
CS 347 Notes 02 30
Minterm predicates (part II)
(9) ¬(A<10) A>5 Loc=SA Loc=SB
(10) ¬(A<10) A>5 Loc=SA ¬(Loc=SB)(11) ¬(A<10) A>5 ¬(Loc=SA) Loc=SB
(12) ¬(A<10) A>5 ¬(Loc=SA) ¬(Loc=SB)(13) ¬(A<10) ¬(A>5) Loc=SA Loc=SB
(14) ¬(A<10) ¬(A>5) Loc=SA ¬(Loc=SB)(15) ¬(A<10) ¬(A>5) ¬(Loc=SA) Loc=SB
(16) ¬(A<10) ¬(A>5) ¬(Loc=SA) ¬(Loc=SB)
A 10
CS 347 Notes 02 31
Final fragments:
F2: 5 < A < 10 Loc=SA F3: 5 < A < 10 Loc=SB F6: A 5 Loc=SA F7: A 5 Loc=SB F10: A 10 Loc=SA F11: A 10 Loc=SB
CS 347 Notes 02 32
Note: elimination of useless fragments depends on application semantics:
e.g.: if LOC could be SA, SB, we need to add fragments
F4: 5 <A <10 Loc SA Loc SB
F8: A 5 Loc SA Loc SB
F12: A 10 Loc SA Loc SB
CS 347 Notes 02 33
Why does this work?
Predicates: p1 p2 p3 p4 p1 p2 p3 ¬ p4
¬ p1 ¬ p2 ¬ p3 ¬ p4
...
CS 347 Notes 02 34
(1) Completeness: Take t Rpi(t) must be T or F!
Say p1(t) =T p2(t) = T p3(t) =F p4(t) =F
Then t is in fragment with predicate p1 p2 ¬ p3 ¬ p4
CS 347 Notes 02 35
(2) DisjointnessSay t Fragment p1 p2 ¬ p3 ¬ p4
Then: p1(t) = T, p2(t) = T, p3(t) = F, p4(t)= F
t cannot be in any other fragment!
CS 347 Notes 02 36
Summary• Given simple predicates Pr= { p1, p2,..
pm }minterm predicates are
M={m | m = pk*, 1 k m }
where pk* is pk or is ¬ pk
pkPr
• Fragments m R for all m M are
complete and disjoint
Another Desired Fragmentation Property:
Match Access Patterns
CS 347 Notes 02 37
data A
data B
data C
frequentlyaccessedtogether
try toplace insame
fragment
CS 347 Notes 02 38
Return to example:
E(#, NM, LOC, SAL,…)Common queries:
Qa: select * Qb: select * from E from E where LOC=Sa where
LOC=Sb and … and ...
CS 347 Notes 02 39
Three choices:
(1) Pr = { } F1 ={ E }
(2) Pr = {LOC=Sa, LOC=Sb}
F2={ loc=Sa E, loc=Sb E }
(3) Pr = {LOC=Sa, LOC=Sb, Sal<10}
F3={ loc=Sa sal<10 E, loc=Sa sal10 E, loc=Sb sal<10E, loc=Sb sal10 E }
CS 347 Notes 02 40
In other words:
Loc=Sa sal < 10
Loc=Sa sal 10
Loc=Sb sal < 10
Loc=Sb sal 10
F1 F3F2
Qa: Select … loc = Sa ...
Qb: Select … loc = Sb ...
CS 347 Notes 02 41
In other words:
Loc=Sa sal < 10
Loc=Sa sal 10
Loc=Sb sal < 10
Loc=Sb sal 10
F1 F3F2
Qa: Select … loc = Sa ...
Qb: Select … loc = Sb ...
F2 is good…
(not F1 , F3 )
CS 347 Notes 02 42
Derived horizontal fragmentation
Example: E(#, NM, SAL, LOC) F={ E1, E2} by LOC
J(#, DES,…)Common query for project: [Given employee name,
list projects (s)he works in]
CS 347 Notes 02 43
E1
(at Sa) (at Sb)
E2# NM Loc Sal5 Joe Sa 108 Tom Sa 15…
# NM Loc Sal7 Sally Sb 2512 Fred Sb 15…
# Description5 work on 347 hw7 go to moon5 build table12 rest…
J
CS 347 Notes 02 44
E1
(at Sa) (at Sb)
E2# NM Loc Sal5 Joe Sa 108 Tom Sa 15…
# NM Loc Sal7 Sally Sb 2512 Fred Sb 15…
J1 J2
J1 = J E1 J2 = J E2
# Des5 work on 347 hw5 build table…
# Des7 go to moon12 rest…
CS 347 Notes 02 45
Derived horizontal fragmentation
R, F = { F1, F2, ... Fn}
S, D = {D1, D2, …Dn} where Di =S Fi
Convention: R is owner S is member
F could beprimary or derived
CS 347 Notes 02 46
• Checking completeness and disjointness of derived fragmentation
But no #= 33 in E1 nor in E2!
# Des…33 build chair…
Example: Say J is:
This J tuple will not be in J1 nor J2Fragmentation not complete
CS 347 Notes 02 47
Need to enforcereferential integrity constraint:
join attr(#) of member relation
joint attr(#) of owner
relation
To get completeness
CS 347 Notes 02 48
# NM Loc Sal5 Joe Sa 10…
# NM Loc Sal5 Fred Sb 20…
Example:
E1 E2
# Description5 day off…
# Description5 day off…
# Description5 day off…
J1
J
J2
Fragmentationis not
disjoint!
CS 347 Notes 02 50
Summary: horizontal fragmentation
• Type: primary, derived• Properties: completeness,
disjointness
CS 347 Notes 02 51
Vertical fragmentation
E1
# NM Loc Sal5 Joe Sa 107 Sally Sb 258 Fred Sa 15…
# NM Loc5 Joe Sa7 Sally Sb8 Fred Sa…
# Sal5 107 258 15…
E
E2
Example:
CS 347 Notes 02 55
(2) DisjointnessTi Tj = for all i,j ij
E(#,LOC,SAL)E1(#,LOC)
E2(SAL)
Not a desirable property!!(could not reconstruct R!)
CS 347 Notes 02 56
(3) Lossless join
Ri = Rall i
One way to achieve lossless join: Repeat key in all fragments, i.e.,
Key Ti for all i
CS 347 Notes 02 57
How do we decide what attributes are grouped with which?
E1(#,NM,LOC)E2(#,SAL)
Example:E(#,NM,LOC,SAL) E1(#,NM)
E2(#,LOC)E3(#,SAL)
?
CS 347 Notes 02 58
A1 A2 A3 A4 A5
A1 - - - - -
A2 50 - - - -
A3 45 48 - - -
A4 1 2 0 - -
A5 0 0 4 75 -
Attribute affinity matrix
CS 347 Notes 02 59
A1 A2 A3 A4 A5
A1 - - - - -
A2 50 - - - -
A3 45 48 - - -
A4 1 2 0 - -
A5 0 0 4 75 -
Attribute affinity matrix
R1[K,A1,A2,A3] R2[K,A4,A5]
CS 347 Notes 02 60
• Textbook (Ozsu & Valduriez) discusses– How to build affinity matrix– How to identify attribute clusters– How to partition relation
• You are not responsible for– Clustering and partitioning algorithms (i.e., Skip pages 135-145)
CS 347 Notes 02 61
Allocation
Example: E(#,NM,LOC,SAL) F1 = loc=Sa E ; F2 = loc=Sb E
Qa: select … where loc=Sa...Qb: select … where loc=Sb…
Site a Site b
Where do F1,F2 go?
?
CS 347 Notes 02 62
Issues
• Where do queries originate• What is communication cost?
and size of answers, relations,…
• What is storage capacity, cost at sites? and size of fragments?
• What is processing power at sites?
CS 347 Notes 02 63
• What is query processing strategy?
– How are joins done?– Where are answers collected?
More Issues
CS 347 Notes 02 64
• Cost of updating copies?• Writes and concurrency control?• ...
Do we replicate fragments?
CS 347 Notes 02 65
Optimization problem:• What is best placement of
fragments and/or best number of copies to:– minimize query response time– maximize throughput– minimize “some cost”– ...
• Subject to constraints?– Available storage– Available bandwidth, power,…– Keep 90% of response time below X– ...
CS 347 Notes 02 66
Optimization problem:• What is best placement of
fragments and/or best number of copies to:– minimize query response time– maximize throughput– minimize “some cost”– ...
• Subject to constraints?– Available storage– Available bandwidth, power,…– Keep 90% of response time below X– ...
This is an incredibly hard problem
CS 347 Notes 02 67
Example: Single fragment F
Read cost: [ti MIN Cij]
i: Originating site of requestti: Read traffic at Si
Cij: Retrieval costAccessing fragment F at Sj from Si
ji=1
m
CS 347 Notes 02 68
Scenario - Read cost
1 2
.
. .
.
.
.
3i
C=inf
ci,3
ci,1 ci,2
Stream of readrequests for F ti REQ/SEC
C=inf
C=infF
F F
CS 347 Notes 02 69
Write cost
Xj ui C’ij
i: Originating site of requestj: Site being updatedXj: 0 if F not stored at Sj
1 if F stored at Sj
ui: Write traffic at Si C’ij: Write cost
Updating F at Sj from Si
i=1 j=1
m m
CS 347 Notes 02 71
Storage cost:
Xi di
Xi: 0 if F not stored at Si
1 if F stored at Si
di: storage cost at Si
i=1
m
CS 347 Notes 02 73
Can add more complications:
Examples: - Multiple fragments- Fragment sizes- Concurrency control cost
Case Study: PNUTS• Where in the World is My Data?
Sudarshan Kadambi, Jianjun Chen, Brian F. Cooper, David Lomax, Raghu Ramakrishnan, Adam Silberstein, Erwin Tam, Hector Garcia-Molina; VLDB 2011
• Distributed object/tuple store for Yahoo!
CS 347 Notes 02 74
Case Study: PNUTS• Issue: Where to locate data• Issue: What and where to replicate
CS 347 Notes 02 75
Policy Constraints• MIN_COPIES: The minimum number of
full replicas of the record that must exist.
• INCL_LIST: An inclusion list -- the locations where a full replica of the record must exist.
• EXCL_LIST: An exclusion list -- the locations where a full replica of the record cannot exist.
CS 347 Notes 02 77
Example Rule
• Rule 1: • IF TABLE_NAME = "Users“
THENSET 'MIN_COPIES' = 2
CONSTRAINT_PRI = 0
CS 347 Notes 02 78
Another Example Rule
• Rule 2:• IF TABLE_NAME = "Users" AND
FIELD STR('home location') = 'France‘THEN
SET 'MIN_COPIES' = 3 ANDSET 'EXCL LIST' = 'USWest,
USEast‘CONSTRAINT PRI = 1
CS 347 Notes 02 79