today normal forms - duke university...bcnf 3nf 2nf 1nf only bcnf and 4nf are covered in the class...

10
9/12/17 1 CompSci 516 Data Intensive Computing Systems Lecture 6a Design Theory and Normalization – 2/2 Instructor: Sudeepa Roy 1 Duke CS, Fall 2017 CompSci 516: Database Systems Announcements HW1 deadline: Due on 09/21 (Thurs), 11:55 pm, no late days Project proposal deadline: Preliminary idea and team members due by 09/18 (Mon) by email to the instructor Proposal due on sakai by 09/25 (Mon), 11:55 pm Duke CS, Fall 2017 CompSci 516: Database Systems 2 Today Finish Normalization from Lecture 5 Start Database Internals Recap Why redundancy is bad Functional dependencies Closure of attributes and functional dependencies Duke CS, Fall 2017 CompSci 516: Database Systems 3 Acknowledgement: The following slides have been created adapting the instructor material of the [RG] book provided by the authors Dr. Ramakrishnan and Dr. Gehrke. Some slides have been adapted from slides by Prof. Jun Yang Normal Forms R is in 4NF R is in BCNF R is in 3NF R is in 2NF (a historical one) R is in 1NF (every field has atomic values) Duke CS, Fall 2017 CompSci 516: Database Systems 4 BCNF 3NF 2NF 1NF Only BCNF and 4NF are covered in the class 4NF Boyce-Codd Normal Form (BCNF) Relation R with FDs F is in BCNF if, for all X → A in F A ϵ X (called a trivial FD), or X contains a key for R i.e. X is a superkey Duke CS, Fall 2017 CompSci 516: Database Systems 5 Next lecture: BCNF decomposition algorithm Decomposition Eliminates redundancy To get back to the original relation: 6 uid uname twitterid gid fromDate 142 Bart @BartJSimpson dps 1987-04-19 123 Milhouse @MilhouseVan_ gov 1989-12-17 857 Lisa @lisasimpson abc 1987-04-19 857 Lisa @lisasimpson gov 1988-09-01 456 Ralph @ralphwiggum abc 1991-04-25 456 Ralph @ralphwiggum gov 1992-09-01 uid uname twitterid 142 Bart @BartJSimpson 123 Milhouse @MilhouseVan_ 857 Lisa @lisasimpson 456 Ralph @ralphwiggum uid gid fromDate 142 dps 1987-04-19 123 gov 1989-12-17 857 abc 1987-04-19 857 gov 1988-09-01 456 abc 1991-04-25 456 gov 1992-09-01 (on twitter) User id user name Twitter id Group id Joining Date (to a group) Duke CS, Fall 2017 CompSci 516: Database Systems

Upload: others

Post on 05-May-2020

14 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Today Normal Forms - Duke University...BCNF 3NF 2NF 1NF Only BCNF and 4NF are covered in the class 4NF Boyce-Codd Normal Form (BCNF) •Relation R with FDs Fis in BCNFif, for all X

9/12/17

1

CompSci 516DataIntensiveComputingSystems

Lecture6aDesignTheoryandNormalization– 2/2

Instructor:Sudeepa Roy

1DukeCS,Fall2017 CompSci516:DatabaseSystems

Announcements• HW1deadline:

– Dueon09/21(Thurs),11:55pm,nolatedays

• Projectproposaldeadline:– Preliminaryideaandteammembersdueby09/18(Mon)byemailtotheinstructor

– Proposaldueonsakai by09/25(Mon),11:55pm

DukeCS,Fall2017 CompSci516:DatabaseSystems 2

Today

• FinishNormalizationfromLecture5• StartDatabaseInternals

• Recap– Whyredundancyisbad– Functionaldependencies– Closureofattributesandfunctionaldependencies

DukeCS,Fall2017 CompSci516:DatabaseSystems 3

Acknowledgement:• Thefollowingslideshavebeencreatedadaptingtheinstructormaterialofthe[RG]bookprovidedbytheauthorsDr.Ramakrishnan andDr.Gehrke.• SomeslideshavebeenadaptedfromslidesbyProf.JunYang

NormalForms

Risin4NF⇒ RisinBCNF⇒ Risin3NF⇒ Risin2NF(ahistoricalone)⇒ Risin1NF(everyfieldhasatomicvalues)

DukeCS,Fall2017 CompSci516:DatabaseSystems 4

BCNF

3NF

2NF

1NF

OnlyBCNFand4NFarecoveredintheclass

4NF

Boyce-CoddNormalForm(BCNF)

• RelationRwithFDsF isinBCNF if,forallX→AinF– AϵX(calledatrivial FD),or– XcontainsakeyforR

• i.e.Xisasuperkey

DukeCS,Fall2017 CompSci516:DatabaseSystems 5

Nextlecture:BCNFdecompositionalgorithm

Decomposition

• Eliminatesredundancy• Togetbacktotheoriginalrelation:

6

uid uname twitterid gid fromDate

142 Bart @BartJSimpson dps 1987-04-19

123 Milhouse @MilhouseVan_ gov 1989-12-17

857 Lisa @lisasimpson abc 1987-04-19

857 Lisa @lisasimpson gov 1988-09-01

456 Ralph @ralphwiggum abc 1991-04-25

456 Ralph @ralphwiggum gov 1992-09-01

… … … … …

uid uname twitterid

142 Bart @BartJSimpson

123 Milhouse @MilhouseVan_

857 Lisa @lisasimpson

456 Ralph @ralphwiggum

… … …

uid gid fromDate

142 dps 1987-04-19

123 gov 1989-12-17

857 abc 1987-04-19

857 gov 1988-09-01

456 abc 1991-04-25

456 gov 1992-09-01

… … …

(ontwitter)

• Userid• username• Twitterid• Groupid• JoiningDate

(toagroup)

DukeCS,Fall2017 CompSci516:DatabaseSystems

Page 2: Today Normal Forms - Duke University...BCNF 3NF 2NF 1NF Only BCNF and 4NF are covered in the class 4NF Boyce-Codd Normal Form (BCNF) •Relation R with FDs Fis in BCNFif, for all X

9/12/17

2

uid twitterid

142 @BartJSimpson

123 @MilhouseVan_

857 @lisasimpson

456 @ralphwiggum

… …

uid uname

142 Bart

123 Milhouse

857 Lisa

456 Ralph

… …

Unnecessarydecomposition

• Fine:joinreturnstheoriginalrelation• Unnecessary:noredundancyisremoved;schemaismore

complicated(anduid isstoredtwice!)7

uid uname twitterid

142 Bart @BartJSimpson

123 Milhouse @MilhouseVan_

857 Lisa @lisasimpson

456 Ralph @ralphwiggum

… … …

DukeCS,Fall2017 CompSci516:DatabaseSystems

uid fromDate

142 1987-04-19

123 1989-12-17

857 1987-04-19

857 1988-09-01

456 1991-04-25

456 1992-09-01

… …

Baddecomposition

• Associationbetweengid andfromDate islost• Joinreturnsmorerowsthantheoriginalrelation

8

uid gid fromDate

142 dps 1987-04-19

123 gov 1989-12-17

857 abc 1987-04-19

857 gov 1988-09-01

456 abc 1991-04-25

456 gov 1992-09-01

… … …uid gid

142 dps

123 gov

857 abc

857 gov

456 abc

456 gov

… …

DukeCS,Fall2017 CompSci516:DatabaseSystems

Losslessjoindecomposition

• Decomposerelation𝑅 intorelations𝑆 and𝑇– 𝑎𝑡𝑡𝑟𝑠 𝑅 = 𝑎𝑡𝑡𝑟𝑠 𝑆 ∪ 𝑎𝑡𝑡𝑟𝑠 𝑇– 𝑆 = 𝜋,--./ 0 𝑅– 𝑇 = 𝜋,--./ 1 𝑅

• Thedecompositionisalosslessjoindecompositionif,givenknownconstraintssuchasFD’s,wecanguaranteethat𝑅 =𝑆 ⋈ 𝑇

• 𝑅 ⊆ 𝑆 ⋈ 𝑇 or𝑅 ⊇ 𝑆 ⋈ 𝑇 ?

• Anydecompositiongives𝑅 ⊆ 𝑆 ⋈ 𝑇 (why?)– Alossy decompositionisonewith𝑅 ⊂ 𝑆 ⋈ 𝑇

9DukeCS,Fall2017 CompSci516:DatabaseSystems

uid gid fromDate

142 dps 1987-04-19

123 gov 1989-12-17

857 abc 1987-04-19

857 gov 1988-09-01

456 abc 1991-04-25

456 gov 1992-09-01

… … …

uid gid fromDate

142 dps 1987-04-19

123 gov 1989-12-17

857 abc 1988-09-01

857 gov 1987-04-19

456 abc 1991-04-25

456 gov 1992-09-01

… … …

Loss?ButIgotmorerows!

• “Loss”refersnottothelossoftuples,buttothelossofinformation– Or,theabilitytodistinguishdifferentoriginalrelations

10

Nowaytotellwhichistheoriginalrelation

uid fromDate

142 1987-04-19

123 1989-12-17

857 1987-04-19

857 1988-09-01

456 1991-04-25

456 1992-09-01

… …

uid gid

142 dps

123 gov

857 abc

857 gov

456 abc

456 gov

… …DukeCS,Fall2017 CompSci516:DatabaseSystems

BCNFdecompositionalgorithm

• FindaBCNFviolation– Thatis,anon-trivialFD𝑋 → 𝑌 in𝑅 where𝑋 isnotasuperkeyof𝑅

• Decompose𝑅 into𝑅8 and𝑅9,where– 𝑅8 hasattributes𝑋 ∪ 𝑌– 𝑅9 hasattributes𝑋 ∪ 𝑍,where𝑍 containsallattributesof𝑅 thatareinneither𝑋 nor𝑌

• RepeatuntilallrelationsareinBCNF

• Alsogivesalosslessdecomposition!

11DukeCS,Fall2017 CompSci516:DatabaseSystems

BCNFdecompositionexample- 1

• CSJDPQV,keyC,F={JP→ C,SD→ P,J→ S}– TodealwithSD→P,decomposeintoSDP,CSJDQV.– TodealwithJ→ S,decomposeCSJDQVintoJSandCJDQV

• IsJP→ CaviolationofBCNF?

• Note:– severaldependenciesmaycauseviolationofBCNF– Theorderinwhichwepickthemmayleadtoverydifferentsetsofrelations

– theremaybemultiplecorrectdecompositions(canpickJ→ Sfirst)DukeCS,Fall2017 CompSci516:DatabaseSystems 12

Page 3: Today Normal Forms - Duke University...BCNF 3NF 2NF 1NF Only BCNF and 4NF are covered in the class 4NF Boyce-Codd Normal Form (BCNF) •Relation R with FDs Fis in BCNFif, for all X

9/12/17

3

BCNFdecompositionexample- 2

13

UserJoinsGroup (uid,uname,twitterid,gid,fromDate)

uid→ uname,twitteridtwitterid→ uiduid,gid→ fromDate

BCNFviolation:uid→ uname,twitterid

User (uid,uname,twitterid) Member(uid,gid,fromDate)

BCNFBCNF

uid→ uname,twitteridtwitterid→ uid

uid,gid→ fromDate

DukeCS,Fall2017 CompSci516:DatabaseSystems 14

UserJoinsGroup (uid,uname,twitterid,gid,fromDate)

uid→ uname,twitteridtwitterid→ uiduid,gid→ fromDate

BCNFviolation:twitterid→ uid

UserId (twitterid,uid)

Member(twitterid,gid,fromDate)

BCNF

BCNF

twitterid→ unametwitterid,gid→ fromDate

UserJoinsGroup’ (twitterid,uname,gid,fromDate)

BCNFviolation:twitterid→ uname

UserName (twitterid,uname)BCNF

applyArmstrong’saxiomsandrules!

DukeCS,Fall2017 CompSci516:DatabaseSystems

BCNFdecompositionexample- 3

Recap

• Functionaldependencies:ageneralizationofthekeyconcept

• Non-keyfunctionaldependencies:asourceofredundancy

• BCNFdecomposition:amethodforremovingredundancies– BNCFdecompositionisalosslessjoindecomposition

• BCNF:schemainthisnormalformhasnoredundancyduetoFD’s

15DukeCS,Fall2017 CompSci516:DatabaseSystems

BCNF=noredundancy?

• User (uid,gid,place)– Ausercanbelongtomultiplegroups– Ausercanregisterplacesshe’svisited– Groupsandplaceshavenothingtodowithother– FD’s?

• None– BCNF?

• Yes– Redundancies?

• Tons!

16

uid gid place

142 dps Springfield

142 dps Australia

456 abc Springfield

456 abc Morocco

456 gov Springfield

456 gov Morocco

… … …

DukeCS,Fall2017 CompSci516:DatabaseSystems

Multivalueddependencies

• Amultivalueddependency(MVD)hastheform𝑋 ↠ 𝑌,where𝑋 and𝑌 aresetsofattributesinarelation𝑅

• 𝑋 ↠ 𝑌 meansthatwhenevertworowsin𝑅 agreeonalltheattributesof𝑋,thenwecanswaptheir𝑌 componentsandgettworowsthatarealsoin𝑅

17

𝑿 𝒀 𝒁𝑎 𝑏8 𝑐8𝑎 𝑏9 𝑐9… … …

𝑿 𝒀 𝒁𝑎 𝑏8 𝑐8𝑎 𝑏9 𝑐9𝑎 𝑏9 𝑐8𝑎 𝑏8 𝑐9… … …

DukeCS,Fall2017 CompSci516:DatabaseSystems

MVDexamples

User(uid,gid,place)• uid↠ gid• uid↠ place

– Intuition:givenuid,attributesgid andplaceare“independent”

• uid,gid↠ place– Trivial:LHS∪ RHS=allattributesof𝑅

• uid,gid↠ uid– Trivial:LHS⊇ RHS

18DukeCS,Fall2017 CompSci516:DatabaseSystems

Page 4: Today Normal Forms - Duke University...BCNF 3NF 2NF 1NF Only BCNF and 4NF are covered in the class 4NF Boyce-Codd Normal Form (BCNF) •Relation R with FDs Fis in BCNFif, for all X

9/12/17

4

CompleteMVD+FDrules

• FDreflexivity,augmentation,andtransitivity• MVDcomplementation:

If𝑋 ↠ 𝑌,then𝑋 ↠ 𝑎𝑡𝑡𝑟𝑠 𝑅 − 𝑋 − 𝑌• MVDaugmentation:

If𝑋 ↠ 𝑌 and𝑉 ⊆ 𝑊,then𝑋𝑊 ↠ 𝑌𝑉• MVDtransitivity:

If𝑋 ↠ 𝑌 and𝑌 ↠ 𝑍,then𝑋 ↠ 𝑍 − 𝑌• Replication(FDisMVD):

If𝑋 → 𝑌,then𝑋 ↠ 𝑌• Coalescence:

If𝑋 ↠ 𝑌 and𝑍 ⊆ 𝑌 andthereissome𝑊 disjointfrom𝑌 suchthat𝑊 → 𝑍,then𝑋 → 𝑍

19

Tryprovingthingsusingthese!?

Verifytheseyourself!

DukeCS,Fall2017 CompSci516:DatabaseSystems

Anelegantsolution:“chase”

• GivenasetofFD’sandMVD’s𝒟,doesanotherdependency𝑑 (FDorMVD)followfrom𝒟?

• Procedure– Startwiththepremiseof𝑑,andtreatthemas“seed”tuplesinarelation

– Applythegivendependenciesin𝒟 repeatedly• IfweapplyanFD,weinferequalityoftwosymbols• IfweapplyanMVD,weinfermoretuples

– Ifweinfertheconclusionof𝑑,wehaveaproof– Otherwise,ifnothingmorecanbeinferred,wehaveacounterexample

20

Readthisslideafterlookingattheexamples

DukeCS,Fall2017 CompSci516:DatabaseSystems

Proofbychase• In𝑅 𝐴, 𝐵, 𝐶, 𝐷 ,does𝐴 ↠ 𝐵 and𝐵 ↠ 𝐶implythat𝐴 ↠ 𝐶?

21

𝑨 𝑩 𝑪 𝑫𝑎 𝑏8 𝑐8 𝑑8𝑎 𝑏9 𝑐9 𝑑9

𝑨 𝑩 𝑪 𝑫𝑎 𝑏8 𝑐9 𝑑8𝑎 𝑏9 𝑐8 𝑑9

Have: Need:

DukeCS,Fall2017 CompSci516:DatabaseSystems

Anotherproofbychase• In𝑅 𝐴, 𝐵, 𝐶, 𝐷 ,does𝐴 → 𝐵 and𝐵 → 𝐶 implythat𝐴 → 𝐶?

22

𝑨 𝑩 𝑪 𝑫𝑎 𝑏8 𝑐8 𝑑8𝑎 𝑏9 𝑐9 𝑑9

Have: Need:𝑐8 = 𝑐9

DukeCS,Fall2017 CompSci516:DatabaseSystems

Counterexamplebychase• In𝑅 𝐴, 𝐵, 𝐶, 𝐷 ,does𝐴 ↠ 𝐵𝐶 and𝐶𝐷 → 𝐵implythat𝐴 → 𝐵?

23

𝑨 𝑩 𝑪 𝑫𝑎 𝑏8 𝑐8 𝑑8𝑎 𝑏9 𝑐9 𝑑9

Have: Need:𝑏8 = 𝑏9

DukeCS,Fall2017 CompSci516:DatabaseSystems

4NF

• Arelation𝑅 isinFourthNormalForm(4NF)if– Foreverynon-trivialMVD𝑋 ↠ 𝑌 in𝑅,𝑋 isasuperkey

– Thatis,allFD’sandMVD’sfollowfrom“key→otherattributes”(i.e.,noMVD’sandnoFD’sbesideskeyfunctionaldependencies)

• 4NFisstrongerthanBCNF– BecauseeveryFDisalsoaMVD

24DukeCS,Fall2017 CompSci516:DatabaseSystems

Page 5: Today Normal Forms - Duke University...BCNF 3NF 2NF 1NF Only BCNF and 4NF are covered in the class 4NF Boyce-Codd Normal Form (BCNF) •Relation R with FDs Fis in BCNFif, for all X

9/12/17

5

4NFdecompositionalgorithm

• Finda4NFviolation– Anon-trivialMVD𝑋 ↠ 𝑌 in𝑅 where𝑋 isnot asuperkey

• Decompose𝑅 into𝑅8 and𝑅9,where– 𝑅8 hasattributes𝑋 ∪ 𝑌– 𝑅9 hasattributes𝑋 ∪ 𝑍 (where𝑍 contains𝑅 attributesnotin𝑋 or𝑌)

• Repeatuntilallrelationsarein4NF

• AlmostidenticaltoBCNFdecompositionalgorithm• Anydecompositionona4NFviolationislossless

25DukeCS,Fall2017 CompSci516:DatabaseSystems

4NFdecompositionexample

26

uid gid place

142 dps Springfield

142 dps Australia

456 abc Springfield

456 abc Morocco

456 gov Springfield

456 gov Morocco

… … …

User (uid,gid,place)4NFviolation:uid↠gid

Member(uid,gid) Visited(uid,place)4NF 4NFuid gid

142 dps

456 abc

456 gov

… …

uid place

142 Springfield

142 Australia

456 Springfield

456 Morocco

… …

DukeCS,Fall2017 CompSci516:DatabaseSystems

Otherkindsofdependenciesandnormalforms

• Dependencypreservingdecompositions• Joindependencies• Inclusiondependencies• 5NF,3NF,2NF• Seebookifinterested(notcoveredinclass)

DukeCS,Fall2017 CompSci516:DatabaseSystems 27

Summary

• PhilosophybehindBCNF,4NF:Datashoulddependonthekey,thewholekey,andnothingbutthekey!– Youcouldhavemultiplekeysthough

• Redundancyisnotdesiredtypically– notalways,mainlyduetoperformancereasons

• Functional/multivalueddependencies– captureredundancy• Decompositions– eliminatedependencies• Normalforms

– Guaranteescertainnon-redundancy– BCNF,and4NF

• Losslessjoin• HowtodecomposeintoBCNF,4NF• Chase

28DukeCS,Fall2017 CompSci516:DatabaseSystems

CompSci 516DataIntensiveComputingSystems

Lecture6bStorage andIndexing

Instructor:Sudeepa Roy

29DukeCS,Fall2017 CompSci516:DatabaseSystems

Wherearewenow?

Welearntü RelationalModelandQueryLanguages

ü SQL,RA,RCü Postgres(DBMS)ü XML(overview)§ HW1

ü DatabaseNormalization

Next• DBMSInternals

– Storage– Indexing– QueryEvaluation– OperatorAlgorithms– Externalsort– QueryOptimization

DukeCS,Fall2017 CompSci516:DatabaseSystems 30

Page 6: Today Normal Forms - Duke University...BCNF 3NF 2NF 1NF Only BCNF and 4NF are covered in the class 4NF Boyce-Codd Normal Form (BCNF) •Relation R with FDs Fis in BCNFif, for all X

9/12/17

6

ReadingMaterial

• [RG]– Storage:Chapters8.1,8.2,8.4,9.4-9.7– Index:8.3,8.5– Tree-basedindex:Chapter10.1-10.7– Hash-basedindex:Chapter11

Additionalreading• [GUW]

– Chapters8.3,14.1-14.4

DukeCS,Fall2017 CompSci516:DatabaseSystems 31

Acknowledgement:Thefollowingslideshavebeencreatedadaptingtheinstructormaterialofthe[RG]bookprovidedbytheauthorsDr.Ramakrishnan andDr.Gehrke.

Whatwillwelearn?

• HowdoesaDBMSorganizefiles?– Recordformat,Pageformat

• Whatisanindex?• Whataredifferenttypesofindexes?

– Tree-basedindexing:• B+tree• insert,delete

– Hash-basedindexing• Staticanddynamic(extendiblehashing,linearhashing)

• Howdoweuseindextooptimizeperformance?

DukeCS,Fall2017 CompSci516:DatabaseSystems 32

Storage

DukeCS,Fall2017 CompSci516:DatabaseSystems 33

DBMSArchitecture

DukeCS,Fall2017 CompSci516:DatabaseSystems 34

• AtypicalDBMShasalayeredarchitecture

• Thefiguredoesnotshowtheconcurrencycontrolandrecoverycomponents

– tobedonein“transactions”

• Thisisoneofseveralpossiblearchitectures

– eachsystemhasitsownvariations

Query Parsing, Optimization,and Execution

Relational Operators

Files and Access Methods

Buffer Management

Disk Space Management

DB

Theselayersmustconsiderconcurrencycontrolandrecovery

DataonExternalStorage• Datamustpersistondisk acrossprogramexecutionsina

DBMS– Dataishuge– Mustpersistacrossexecutions– ButhastobefetchedintomainmemorywhenDBMSprocessesthe

data

• Theunitofinformationforreadingdatafromdisk,orwritingdatatodisk,isapage

• Disks: Canretrieverandompageatfixedcost– Butreadingseveralconsecutivepagesismuchcheaperthanreading

theminrandomorder

DukeCS,Fall2017 CompSci516:DatabaseSystems 35

DiskSpaceManagement• LowestlayerofDBMSsoftwaremanagesspaceondisk

• Higherlevelscalluponthislayerto:– allocate/de-allocateapage– read/writeapage

• Sizeofapage= sizeofadiskblock=dataunit

• Requestforasequenceofpagesoftensatisfiedbyallocatingcontiguousblocksondisk

• SpaceondiskmanagedbyDisk-spaceManager– Higherlevelsdon’tneedtoknowhowthisisdone,orhowfreespace

ismanaged

DukeCS,Fall2017 CompSci516:DatabaseSystems 36

Page 7: Today Normal Forms - Duke University...BCNF 3NF 2NF 1NF Only BCNF and 4NF are covered in the class 4NF Boyce-Codd Normal Form (BCNF) •Relation R with FDs Fis in BCNFif, for all X

9/12/17

7

BufferManagement

Suppose• 1millionpagesindb,butonlyspacefor1000inmemory• Aqueryneedstoscantheentirefile• DBMShasto

– bringpagesintomainmemory– decidewhichexistingpagestoreplacetomakeroomforanewpage

– calledReplacementPolicy• ManagedbytheBuffermanager

– Filesandaccessmethodsaskthebuffermanagertoaccessapagementioningthe“recordid”(soon)

– Buffermanagerloadsthepageifnotalreadythere

DukeCS,Fall2017 CompSci516:DatabaseSystems 37

BufferManagement

• DatamustbeinRAMforDBMStooperateonit• Tableof<frame#,pageid>pairsismaintained

DB

MAIN MEMORY

DISK

disk page

free frame

Page Requests from Higher Levels

BUFFER POOL

choice of frame dictatedby replacement policy

DukeCS,Fall2017 CompSci516:DatabaseSystems 38

Bufferpool=mainmemoryispartitionedintoframeseithercontainsapagefromdiskorisafreeframe

WhenaPageisRequested...Foreveryframe,store• adirty bit:

– whetherthepagehasbeenmodifiedsinceithasbeenbroughttomemory

– initially0oroff

• apin-count:– thenumberoftimesapagehasbeenrequestedbutnotreleased

(andno.ofcurrentusers)– initially0– whenapageisrequested,thecountinincremented– whentherequestorreleasesthepage,countisdecremented– buffermanageronlyreadsapageintoaframewhenitspin-countis0– ifnopagewithpin-count0,buffermanagerhastowait(ora

transactionisaborted-- later)

DukeCS,Fall2017 CompSci516:DatabaseSystems 39

WhenaPageisRequested...

• Checkifthepageisalreadyinthebufferpool• ifyes,incrementthepin-countofthatframe• Ifno,

– Chooseaframeforreplacementusingthereplacementpolicy– Ifthechosenframeisdirty (hasbeenmodified),writeittodisk– Readrequestedpageintochosenframe

• Pin (increasepin-countof)thepageandreturnitsaddress totherequestor

• If requests can be predicted (e.g., sequential scans), pages can be pre-fetched several pages at a time

• ConcurrencyControl&recoverymayentailadditionalI/Owhenaframeischosenforreplacement• e.g.Write-AheadLogprotocol:whenwedoTransactions

DukeCS,Fall2017 CompSci516:DatabaseSystems 40

BufferReplacementPolicy

• Frameischosenforreplacementbyareplacementpolicy

• Least-recently-used(LRU)– addframeswithpin-count0totheendofaqueue– choosefromhead

• Clock(anefficientimplementationofLRU)• FirstInFirstOut(FIFO)• Most-Recently-Used(MRU)etc.

DukeCS,Fall2017 CompSci516:DatabaseSystems 41

BufferReplacementPolicy

• Policycanhavebigimpacton#ofI/O’s• Dependsontheaccesspattern• Sequentialflooding:NastysituationcausedbyLRU+

repeatedsequentialscans– Whathappenswith10framesand9pages?– Whathappenswith10framesand11pages?– #bufferframes<#pagesinfilemeanseachpagerequestineachscan

causesanI/O– MRUmuchbetterinthissituation(butnotinallsituations,ofcourse)

DukeCS,Fall2017 CompSci516:DatabaseSystems 42

Page 8: Today Normal Forms - Duke University...BCNF 3NF 2NF 1NF Only BCNF and 4NF are covered in the class 4NF Boyce-Codd Normal Form (BCNF) •Relation R with FDs Fis in BCNFif, for all X

9/12/17

8

DBMSvs.OSFileSystem

• OperatingSystemsdodiskspaceandbuffermanagementtoo:• WhynotletOSmanagethesetasks?

• DBMScanpredictthepagereferencepatternsmuchmoreaccurately– canoptimize– adjustreplacementpolicy– pre-fetch pages– alreadyinbuffer+contiguousallocation– pinapageinbufferpool,forceapagetodisk(importantfor

implementingTransactionsconcurrencycontrol&recovery)

• DifferencesinOSsupport:portabilityissues

• Somelimitations,e.g.,filescan’tspandisks

DukeCS,Fall2017 CompSci516:DatabaseSystems 43

FilesofRecords

• PageorblockisOKwhendoingI/O,buthigherlevelsofDBMSoperateonrecords,andfilesofrecords

• FILE:Acollectionofpages,eachcontainingacollectionofrecords

• Mustsupport:– insert/delete/modifyrecord– readaparticularrecord(specifiedusingrecordid)– scanallrecords(possiblywithsomeconditionsontherecordstoberetrieved)

DukeCS,Fall2017 CompSci516:DatabaseSystems 44

FileOrganization

• Fileorganization:Methodofarrangingafileofrecordsonexternalstorage– Onefilecanhavemultiplepages– Recordid(rid)issufficienttophysicallylocatethepagecontainingtherecordondisk

– Indexes aredatastructuresthatallowustofindtherecordidsofrecordswithgivenvaluesinindexsearchkeyfields

• NOTE:Severalusesof“keys”inadatabase– Primary/foreign/candidate/superkeys– Indexsearchkeys

DukeCS,Fall2017 CompSci516:DatabaseSystems 45

AlternativeFileOrganizationsManyalternativesexist,eachidealforsomesituations,and

notsogoodinothers:• Heap(randomorder)files: Suitablewhentypicalaccessisa

filescanretrievingallrecords• SortedFiles:Bestifrecordsmustberetrievedinsome

order,oronlya“range”ofrecordsisneeded.• Indexes:Datastructurestoorganizerecordsviatreesor

hashing– Likesortedfiles,theyspeedupsearchesforasubsetofrecords,

basedonvaluesincertain(“searchkey”)fields– Updatesaremuchfasterthaninsortedfiles

DukeCS,Fall2017 CompSci516:DatabaseSystems 46

Unordered(Heap)Files

• Simplestfilestructurecontainsrecordsinnoparticularorder

• Asfilegrowsandshrinks,diskpagesareallocatedandde-allocated

• Tosupportrecordleveloperations,wemust:– keeptrackofthepages inafile– keeptrackoffreespaceonpages– keeptrackoftherecords onapage

• Therearemanyalternativesforkeepingtrackofthis

DukeCS,Fall2017 CompSci516:DatabaseSystems 47

HeapFileImplementedasaList

• TheheaderpageidandHeapfilenamemustbestoredsomeplace

• Eachpagecontains2`pointers’plusdata• Problem:

– toinsertanewrecord,wemayneedtoscanseveralpagesonthefreelisttofindonewithsufficientspace

HeaderPage

DataPage

DataPage

DataPage

DataPage

DataPage

DataPage Pages with

Free Space

Full Pages

DukeCS,Fall2017 CompSci516:DatabaseSystems 48

Page 9: Today Normal Forms - Duke University...BCNF 3NF 2NF 1NF Only BCNF and 4NF are covered in the class 4NF Boyce-Codd Normal Form (BCNF) •Relation R with FDs Fis in BCNFif, for all X

9/12/17

9

HeapFileUsingaPageDirectory

• Theentryforapagecanincludethenumberoffreebytesonthepage.

• Thedirectoryisacollectionofpages– linkedlistimplementationofdirectoryisjustonealternative– Muchsmallerthanlinkedlistofallheapfilepages!

DataPage 1

DataPage 2

DataPage N

HeaderPage

DIRECTORY

DukeCS,Fall2017 CompSci516:DatabaseSystems 49

Howdowearrangeacollectionofrecordsonapage?

• Eachpagecontainsseveralslots– oneforeachrecord

• Recordisidentifiedby<page-id,slot-number>

• Fixed-LengthRecords• Variable-LengthRecords

• Forboth,thereareoptionsfor– Recordformats(howtoorganizethefieldswithinarecord)– Pageformats(howtoorganizetherecordswithinapage)

DukeCS,Fall2017 CompSci516:DatabaseSystems 50

PageFormats:FixedLengthRecords

• Recordid=<pageid,slot#>• Packed:movingrecordsforfreespacemanagementchangesrid;maynotbe

acceptable• Unpacked:useabitmap– scanthebitarraytofindanemptyslot• Eachpagealsomaycontainadditionalinfoliketheidofthenextpage(notshown)

Slot 1Slot 2

Slot N

. . . . . .

N M10. . .M ... 3 2 1

PACKED UNPACKED, BITMAP

Slot 1Slot 2

Slot N

FreeSpace

Slot M11

number of records

numberof slots

DukeCS,Fall2017 CompSci516:DatabaseSystems 51

PageFormats:VariableLengthRecords

• Needtofindapagewiththerightamountofspace– Toosmall– cannotinsert– Toolarge– wasteofspace

• ifarecordisdeleted,needtomovetherecordssothatallfreespaceiscontiguous– needabilitytomoverecordswithinapage

• Canmaintainadirectoryofslots(nextslide)– <record-offset,record-length>– deletion=setrecord-offsetto-1

• Record-idrid=<page,slot-in-directory>remainsunchanged

DukeCS,Fall2017 CompSci516:DatabaseSystems 52

PageFormats:VariableLengthRecords

• Canmoverecordsonpagewithoutchangingrid– so,attractiveforfixed-lengthrecordstoo

• Store(record-offset,record-length)ineachslot• rid-sunaffectedbyrearrangingrecordsinapage

Page iRid = (i,N)

Rid = (i,2)

Rid = (i,1)

Pointerto startof freespace

SLOT DIRECTORY

N . . . 2 120 16 24 N

# slots

DukeCS,Fall2017 CompSci516:DatabaseSystems 53

RecordFormats:FixedLength

• Eachfieldhasafixedlength– forallrecords– thenumberoffieldsisalsofixed– fieldscanbestoredconsecutively

• Informationaboutfieldtypessameforallrecordsinafile– storedinsystemcatalogs

• Findingi-th fielddoesnotrequirescanofrecord– giventheaddressoftherecord,addressofafieldcanbeobtained

easily

Base address (B)

L1 L2 L3 L4

F1 F2 F3 F4

Address = B+L1+L2

DukeCS,Fall2017 CompSci516:DatabaseSystems 54

Page 10: Today Normal Forms - Duke University...BCNF 3NF 2NF 1NF Only BCNF and 4NF are covered in the class 4NF Boyce-Codd Normal Form (BCNF) •Relation R with FDs Fis in BCNFif, for all X

9/12/17

10

RecordFormats:VariableLength• Cannotusefixed-lengthslotsforrecords• Twoalternativeformats(#fieldsisfixed):

• Second offers direct access to i-th field, efficient storage of nulls (special don’t know value); small directory overhead

• Modification may be costly (may grow the field and not fit in the page)

4 $ $ $ $

FieldCount

Fields Delimited by Special Symbols

F1 F2 F3 F4

F1 F2 F3 F4

Array of Field Offsets

1.usedelimiters

2.useoffsetsatthestartofeachrecord

DukeCS,Fall2017 CompSci516:DatabaseSystems 55