l3: ddbs design (tailored from the slides by prof. hector garcia-molina )

71
L3-1 DDBS Design -- 1 L3: DDBS Design (Tailored from L3: DDBS Design (Tailored from the Slides by Prof. Hector the Slides by Prof. Hector Garcia-Molina) Garcia-Molina) 1. Introduction 2. Fragmentation 1) Horizontal fragmentation 2) Vertical fragmentation 3. Allocation

Upload: reia

Post on 12-Jan-2016

35 views

Category:

Documents


0 download

DESCRIPTION

L3: DDBS Design (Tailored from the Slides by Prof. Hector Garcia-Molina ). Introduction Fragmentation Horizontal fragmentation Vertical fragmentation Allocation. Distributed Database Design. Top-down approach: - have DB… - how to split and allocate the sites - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 1

L3: DDBS Design (Tailored from L3: DDBS Design (Tailored from the Slides by Prof. Hector the Slides by Prof. Hector Garcia-Molina)Garcia-Molina)

1. Introduction2. Fragmentation

1) Horizontal fragmentation

2) Vertical fragmentation

3. Allocation

Page 2: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 2

Distributed Database Design Distributed Database Design

Top-down approach: - have DB…

- how to split and

allocate the sites

Multi-DBs (or bottom-up): no design

issues!

Page 3: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 3

Two issues in DDB design:Two issues in DDB design:

Fragmentation Allocation

Note: issues not independent, but will cover separately

Page 4: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 4

ExampleExample

Employee relation E (eno,name,add,sal,…)

40% of queries: 40% of queries:

Qa: select * Qb: select *from E from Ewhere add=Sa where

add=Sband… and ...

Page 5: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 5

ExampleExample

Employee relation E (eno,name,add,sal,…)

40% of queries: 40% of queries:

Qa: select * Qb: select *from E from Ewhere add=Sa where

add=Sband… and ...

Motivation: Two sites: Sa, Sb

Qa QbSa Sb

Page 6: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 6

Eno NM Add Sal E

578

Sa 10Sally Sb 25Tom Sa 15

Joe

Eno NM Add Sal Eno NM Add Sal58

Sa 10Tom Sa 15Joe 7 Sb 25Sally

..

..

....

F

At SaAt Sb

Page 7: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 7

F = { F1, F2 }

F1 = add=Sa E F2 = add=Sb E

Page 8: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 8

F = { F1, F2 }

F1 = loc=Sa E F2 = loc=Sb

E called primary horizontal fragmentation

Page 9: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 9

FragmentationFragmentation

Horizontal Primary depends on local attributesR Derived

depends on foreign relation

Vertical

RFragmentation also called

Sharding

Page 10: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 10

Three common horizontal Three common horizontal partitioning techniquespartitioning techniques

Round robin Hash partitioning Range partitioning

Page 11: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 11

• Round robinRound robin

R D0 D1 D2

t1 t1t2 t2t3

t3t4 t4... t5

Evenly distributes data Good for scanning full relation Not good for point or range queries

Page 12: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 12

• Hash partitioningHash partitioning

R D0 D1 D2

t1h(k1)=2 t1

t2h(k2)=0 t2t3h(k3)=0 t3t4h(k4)=1 t4...

Good for point queries on key; also for joins

Not good for range queries; point queries not on key

If hash function good, even distribution

Page 13: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 13

• Range partitioningRange partitioning

R D0 D1 D2

t1: A=5 t1t2: A=8 t2t3: A=2 t3t4: A=3 t4...

Good for some range queries on A Need to select good vector: else

unbalance data skew execution

skew

4 7

partitioningvector

V0 V1

Page 14: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 14

Which are good Which are good fragmentations?fragmentations?Example:

F = { F1, F2 }

F1 = sal<10 E F2 = sal>20 E

Problem: Some tuples lost!

Page 15: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 15

Which are good Which are good fragmentations?fragmentations?Second example:

F = { F3, F4 }F3 = sal<10 E F4 = sal>5 E

Tuples with 5 < sal < 10 are duplicated...

Page 16: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 16

Prefer to deal with replication explicitly

Example: F = { F5, F6, F7 }

F5 = sal 5 E F6 = 5< sal <10

E

F7 = sal 10 E

Then replicate F6 if convenient (part of allocation problem)

Page 17: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 17

Desired properties for Desired properties for horizontal fragmentationhorizontal fragmentation

R F ={ F1, F2, … }

(1) Completeness

t R, Fi F such that t Fi

Page 18: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 18

(2) Disjointness t Fi, Fj such that

tFj, i j, Fi, Fj F

(3) Reconstruction - ignore

Page 19: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 19

How do we get completeness How do we get completeness and disjointness?and disjointness?

(1) Check it “manually”!

e.g., F1 = sal<10 E ; F2 = sal10 E

(2) “Automatically” generate fragments with these properties

Desired simple predicates Fragments

Page 20: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 20

Example of generationExample of generation

Say queries use predicates:A<10, A>5, Add = SA, Add = SB

Next: - generate “minterm” predicates

- eliminate useless ones

Page 21: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 21

Minterm predicates (part I)Minterm predicates (part I)(1) A<10 A>5 Add=SA Add=SB

(2) A<10 A>5 Add=SA ¬(Add=SB)(3) A<10 A>5 ¬(Add=SA) Add=SB

(4) A<10 A>5 ¬(Add=SA) ¬(Add=SB)(5) A<10 ¬(A>5) Add=SA Add=SB

(6) A<10 ¬(A>5) Add=SA ¬(Add=SB)(7) A<10 ¬(A>5) ¬(Add=SA) Add=SB

(8) A<10 ¬(A>5) ¬(Add=SA) ¬(Add=SB)

Page 22: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 22

Minterm predicates (part I)Minterm predicates (part I)(1) A<10 A>5 Add=SA Add=SB

(2) A<10 A>5 Add=SA ¬(Add=SB)(3) A<10 A>5 ¬(Add=SA) Add=SB

(4) A<10 A>5 ¬(Add=SA) ¬(Add=SB)(5) A<10 ¬(A>5) Add=SA Add=SB

(6) A<10 ¬(A>5) Add=SA ¬(Add=SB)(7) A<10 ¬(A>5) ¬(Add=SA) Add=SB

(8) A<10 ¬(A>5) ¬(Add=SA) ¬(Add=SB)

Page 23: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 23

Minterm predicates (part I)Minterm predicates (part I)(1) A<10 A>5 Add=SA Add=SB

(2) A<10 A>5 Add=SA ¬(Add=SB)(3) A<10 A>5 ¬(Add=SA) Add=SB

(4) A<10 A>5 ¬(Add=SA) ¬(Add=SB)(5) A<10 ¬(A>5) Add=SA Add=SB

(6) A<10 ¬(A>5) Add=SA ¬(Add=SB)(7) A<10 ¬(A>5) ¬(Add=SA) Add=SB

(8) A<10 ¬(A>5) ¬(Add=SA) ¬(Add=SB)

A 5

5 < A < 10

Page 24: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 24

Minterm predicates (part II)Minterm predicates (part II)

(9) ¬(A<10) A>5 Add=SA Add=SB

(10) ¬(A<10) A>5 Add=SA ¬(Add=SB)(11) ¬(A<10) A>5 ¬(Add=SA) Add=SB

(12) ¬(A<10) A>5 ¬(Add=SA) ¬(Add=SB)(13) ¬(A<10) ¬(A>5) Add=SA Add=SB

(14) ¬(A<10) ¬(A>5) Add=SA ¬(Add=SB)(15) ¬(A<10) ¬(A>5) ¬(Add=SA) Add=SB

(16) ¬(A<10) ¬(A>5) ¬(Add=SA) ¬(Add=SB)

Page 25: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 25

Minterm predicates (part II)Minterm predicates (part II)

(9) ¬(A<10) A>5 Add=SA Add=SB

(10) ¬(A<10) A>5 Add=SA ¬(Add=SB)(11) ¬(A<10) A>5 ¬(Add=SA) Add=SB

(12) ¬(A<10) A>5 ¬(Add=SA) ¬(Add=SB)(13) ¬(A<10) ¬(A>5) Add=SA Add=SB

(14) ¬(A<10) ¬(A>5) Add=SA ¬(Add=SB)(15) ¬(A<10) ¬(A>5) ¬(Add=SA) Add=SB

(16) ¬(A<10) ¬(A>5) ¬(Add=SA) ¬(Add=SB)

A 10

Page 26: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 26

Final fragments:Final fragments:

F2: 5 < A < 10 Add=SA F3: 5 < A < 10 Add=SB F6: A 5 Add=SA F7: A 5 Add=SB F10: A 10 Add=SA F11: A 10 Add=SB

Page 27: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 27

Note:Note: elimination of useless elimination of useless fragments depends on fragments depends on application application semantics:semantics:

e.g.: if LOC could be SA, SB, we need to add fragments

F4: 5 <A <10 Add SA Add SB

F8: A 5 Add SA Add SB

F12: A 10 Add SA Add SB

Page 28: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 28

Why does this work?Why does this work?

Predicates: p1 p2 p3 p4 p1 p2 p3 ¬ p4

¬ p1 ¬ p2 ¬ p3 ¬ p4

...

Page 29: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 29

(1) Completeness: Take t Rpi(t) must be T or F!

Say p1(t) =T p2(t) = T p3(t) =F p4(t) =F

Then t is in fragment with predicate p1 p2 ¬ p3 ¬ p4

Page 30: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 30

(2) DisjointnessSay t Fragment p1 p2 ¬ p3 ¬ p4

Then: p1(t) = T, p2(t) = T, p3(t) = F, p4(t)= F

t cannot be in any other fragment!

Page 31: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 31

SummarySummary Given simple predicates Pr= { p1, p2,..

pm }minterm predicates are

M={m | m = pk*, 1 k m }

where pk* is pk or is ¬ pk

pkPr

• Fragments m R for all m M are

complete and disjoint

Page 32: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 32

Another Desired Fragmentation Property:Another Desired Fragmentation Property:

Match Access PatternsMatch Access Patterns

data A

data B

data C

frequentlyaccessedtogether

try toplace in

samefragment

Page 33: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 33

Return to example:Return to example:

E(Eno, NM, Add, SAL,…)Common queries:

Qa: select * Qb: select * from E from E where Add=Sa where

Add=Sb and … and ...

Page 34: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 34

Three choices:Three choices:

(1) Pr = { } F1 ={ E }

(2) Pr = {Add=Sa, Add=Sb}

F2={ add=Sa E, add=Sb E }

(3) Pr = {Add=Sa, Add=Sb, Sal<10}

F3={ add=Sa sal<10 E, add=Sa sal10 E,

add=Sb sal<10E, add=Sb sal10 E }

Page 35: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 35

In other words:In other words:

Add=Sa sal < 10

Add=Sa

sal 10

Add=Sb sal < 10

Add=Sb

sal 10

F1 F3F2

Qa: Select … add = Sa ...

Qb: Select … add = Sb ...

F2 is good…(not F1 , F3 )

Page 36: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 36

HF Example Database HF Example Database InformationInformation

Title, Sal

PAY

Eno, Ename, Title

EMP

Jno, Jname, Budget. Loc

PROJ

Eno, Jno, Resp, Dur

ASG

The global schema

Owner and member relations

Cardinality of each relation

L1

L2 L3

Page 37: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 37

PHF Desired Properties: PHF Desired Properties: Completeness Completeness

A set of simple predicates Pr is said to be complete if and only if there is an equal probability of access by every application to any tuple belonging to any minterm fragment that is defined according to Pr.

Example:

PNO PNAME BUDGET LOCP1 Instrumentation 150000 MontrealP2 Database 135000 New YorkP3 CAD/CAM 250000 New YorkP4 Maintenance 310000 ParisApplications:

Q1: Find the projects at each locationQ2: Find projects with budget less than $200,000 Predicates:Pr={ LOC=“Montreal”, LOC=“New York”, LOC=“Paris”}Pr={ LOC=“Montreal”, LOC=“New York”, LOC=“Paris”,

BUDGET 200000, BUDGET >200000}

Page 38: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 38

PHF Example: PAYPHF Example: PAY

TITLE SALMech. Eng. 27000Programmer 24000Elec. Eng. 40000Syst. Anal. 34000

PAYApplication: Employee records kept at two sites , one site handling records with salary>30,000, another site handling records with salary <=30,000.

Simple predicates: Pr = {SAL 30000, SAL > 30000}

Minterm predicates: m1: (SAL 30000); m2: (SAL > 30000)

TITLE SALElec. Eng. 40000Syst. Anal. 34000

PAY2

TITLE SALMech. Eng. 27000Programmer 24000

PAY1

Page 39: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 39

PHF Example: PROJPHF Example: PROJ

Applications: Find the name and budget of

projects given their location.– Issued at three sites

Access project information according to budget– One site access 200000 other

accesses >200000

p1: LOC = “Montreal”

p2 : LOC = “New York”

p3 : LOC = “Paris”

p4: BUDGET 200000

p5 : BUDGET > 200000

m1: LOC = “Montreal” BUDGET 200000

m2 : LOC = “Montreal” BUDGET > 200000

m3 : LOC = “New York” BUDGET 200000

m4 : LOC = “New York” BUDGET > 200000

m5 : LOC = “Paris” BUDGET 200000

M6 : LOC = “Paris” BUDGET > 200000

Page 40: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 40

PHF Example: PROJ - ResultPHF Example: PROJ - Result

PNO PNAME BUDGET LOCP1 Instrumentation 150000 MntrealP2 Database 135000 New YorkP3 CAD/CAM 250000 New YorkP4 Maintenance 310000 Paris

PROJ

m1: LOC = “Montreal” BUDGET 200000

m2 : LOC = “Montreal” BUDGET > 200000

m3 : LOC = “New York” BUDGET 200000

m4 : LOC = “New York” BUDGET > 200000

m5 : LOC = “Paris” BUDGET 200000

M6 : LOC = “Paris” BUDGET > 200000

PNO PNAME BUDGET LOCP1 Instrumentation 150000 Mntreal

PROJ1

PNO PNAME BUDGET LOCP2 Database 135000 New York

PNO PNAME BUDGET LOCP3 CAD/CAM 250000 New York

PNO PNAME BUDGET LOCP4 Maintenance 310000 Paris

PROJ2

PROJ4

PROJ6

Page 41: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 41

PHF - CorrectnessPHF - Correctness

Completeness Since Pr is complete and minimal, the

selection predicates are complete Reconstruction

If relation R is fragmented into FR = {R1, R2, …, Rr}

R = Ri FR Ri

Disjointness Minterm predicates that form the basis of

fragmentation should be mutually exclusive

Page 42: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 42

DHF: Derived Horizontal DHF: Derived Horizontal FragmentationFragmentation

Title, Sal

PAY

Eno, Ename, Title

EMP

Jno, Jname, Budget. Loc

PROJ

Eno, Jno, Resp, Dur

ASG

L1

L2 L3

Each link is an equijoin

Owner relation

Member relation

DHF: Defined on a member relation according to a selection operation on its owner

Page 43: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 43

DHF – ExampleDHF – Example

TITLE SALMech. Eng. 27000Programmer 24000

PAY1= SAL 30000 PAY

ENO ENAME TITLEE1 J. Doe Elec.Eng.E2 M. Smith Syst. Anal.E3 A. Lee Mech. Eng.E4 J. Miller ProgrammerE5 B. Casey Syst. Anal.E6 L. Chu Elec.Eng.E7 R. Davis Mech. Eng.E8 J. Jones Syst. Anal.

EMP ENO ENAME TITLEE3 A. Lee Mech. Eng.E4 J. Miller ProgrammerE7 R. Davis Mech. Eng.

EMP1 = EMP PAY1

ENO ENAME TITLEE1 J. Doe Elec.Eng.E2 M. Smith Syst. Anal.E5 B. Casey Syst. Anal.E6 L. Chu Elec.Eng.E8 J. Jones Syst. Anal.

EMP2 = EMP PAY2

TITLE SALElec. Eng. 40000Syst. Anal. 34000

PAY2= SAL> 30000 PAY

DHF

Title, Sal

PAY

Eno, Ename, Title

EMP L1

Page 44: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 44

DHF: Derived Horizontal DHF: Derived Horizontal Fragmentation Fragmentation Let S be horizontally fragmented and let

there be a link L with owner(L) = S, and member(L) = R, the derived horizontal fragments of R are defined as

Ri = R Si , 1 i w

where Si is the horizontal fragment of S, is the semi join operator, and w is the maximum number of fragments

Inputs to derived horizontal fragmentation: partitions of owner relation member relation the semi join condition

The algorithm is straight forward.

Page 45: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 45

DHF: Correctness DHF: Correctness

Completeness: primary horizontal fragmentation based on

completeness of selection predicates. For derived horizontal fragmentation based on referential integrity

Reconstruction: union

Disjointedness: primary horizontal fragmentation based on

mutually exclusive simple predicates. For derived horizontal fragmentation, based on whether the link between owner and member is 1:1 or 1:m relationship

Page 46: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 46

DHF: IssuesDHF: Issues

Multiple owners for a member relation; how should we derived horizontally fragment a member relation.

There can be a chain of derived horizontal fragmentation.

Page 47: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 47

L3.2.2: DDBS DesignL3.2.2: DDBS Design

Introduction Fragmentation

Horizontal fragmentation

Vertical fragmentation Allocation

Page 48: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 48

Vertical fragmentationVertical fragmentation

E1

Eno NM Add Sal5 Joe Sa 107 Sally Sb 258 Fred Sa 15…

Eno NM Add5 Joe Sa7 Sally Sb8 Fred Sa…

Eno Sal5 107 258 15…

E

E2

Example:

Page 49: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 49

R[T] R1[T1] Ti T

Rn[Tn]

Just like normalization of relations

...

Page 50: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 50

Properties:Properties: R[T] R[T] RRii[T[Tii] ]

(1) Completeness

U Ti = T

all i

Page 51: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 51

(2) DisjointnessTi Tj = for all i,j ij

E(#,LOC,SAL) E1(#,LOC)

E2(SAL)

Page 52: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 52

(2) DisjointnessTi Tj = for all i,j ij

E(#,LOC,SAL) E1(#,LOC)

E2(SAL)

Not a desirable property!!(could not reconstruct R!)

Page 53: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 53

(3) Lossless join

Ri = R

all i

One way to achieve lossless join: Repeat key in all fragments, i.e.,

Key Ti for all i

Page 54: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 54

How do we decide what How do we decide what attributesattributes are grouped with which? are grouped with which?

E1(Eno,NM,Add)E2(Eno,SAL)

Example:E(Eno,NM,LOC,SAL)

E1(Eno,NM)E2(Eno,Add)E3(Eno,SAL)

?

Page 55: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 55

A1 A2 A3 A4 A5A1 - - - - -A2 50 - - - -A3 45 48 - - -A4 1 2 0 - -A5 0 0 4 75 -

Attribute affinity matrixAttribute affinity matrix

Page 56: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 56

A1 A2 A3 A4 A5A1 - - - - -A2 50 - - - -A3 45 48 - - -A4 1 2 0 - -A5 0 0 4 75 -

Attribute affinity matrixAttribute affinity matrix

R1[K,A1,A2,A3] R2[K,A4,A5]

Page 57: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 57

Textbook (Ozsu & Valduriez) discusses How to build affinity matrix How to identify attribute clusters How to partition relation

You are not responsible for Clustering and partitioning algorithms (i.e., Skip pages 135-145)

Page 58: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 58

AllocationAllocation

Example: E(Eno,NM,Add,SAL) F1 = add=Sa E ; F2 = add=Sb E

Qa: select … where add=Sa...Qb: select … where add=Sb…

Site a Site b

Where do F1,F2 go?

?

Page 59: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 59

IssuesIssues Where do queries originate What is communication cost?

and size of answers, relations,…

What is storage capacity, cost at sites? and size of fragments?

What is processing power at sites?

Page 60: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 60

What is query processing strategy?

How are joins done? Where are answers collected?

More IssuesMore Issues

Page 61: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 61

Cost of updating copies? Writes and concurrency control? ...

Do we replicate fragments?Do we replicate fragments?

Page 62: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 62

Optimization problem:Optimization problem: What is best placement of

fragments and/or best number of copies to: minimize query response time maximize throughput minimize “some cost” ...

Subject to constraints? Available storage Available bandwidth, power,… Keep 90% of response time below X ...

Page 63: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 63

Optimization problem:Optimization problem: What is best placement of

fragments and/or best number of copies to: minimize query response time maximize throughput minimize “some cost” ...

Subject to constraints? Available storage Available bandwidth, power,… Keep 90% of response time below X ...

This is an incredibly hard problem

Page 64: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 64

Example:Example: Single fragment F Single fragment F

Read cost: [ti MIN Cij]

i: Originating site of requestti: Read traffic at Si

Cij: Retrieval costAccessing fragment F at Sj

from Si

ji=1

m

Page 65: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 65

Scenario Scenario - Read cost- Read cost

1 2

.

. .

.

.

.

3

i

C=inf

ci,3

ci,1 ci,2

Stream of readrequests for F ti REQ/SEC

C=inf

C=infF

F F

Page 66: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 66

Write costWrite cost

Xj ui C’ij

i: Originating site of requestj: Site being updatedXj: 0 if F not stored at Sj

1 if F stored at Sjui: Write traffic at Si C’ij: Write cost

Updating F at Sj from Si

i=1 j=1

m m

Page 67: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 67

ScenarioScenario - write cost - write cost

Updates

ui updates/sec

. .

. .

. .i

F

F F

Page 68: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 68

Storage cost:Storage cost:

Xi di

Xi: 0 if F not stored at Si

1 if F stored at Si

di: storage cost at Si

i=1

m

Page 69: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 69

Target functionTarget function::

min [ti MIN Cij + Xj ui

C’ij ]

+ Xi di

ji=1 j=1

i=1

m m

m

Page 70: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 70

Can add more complications:Can add more complications:

Examples: - Multiple fragments- Fragment sizes- Concurrency control cost

Page 71: L3: DDBS Design (Tailored from the Slides by Prof.  Hector Garcia-Molina )

L3-1 DDBS Design -- 71

Solution MethodsSolution Methods

NP-complete Heuristics approaches Reduce search space