cs 4604: introduction to database management systemscs4604/fall18/lectures/... · 2018. 12. 7. ·...
TRANSCRIPT
-
CS 4604: Introduction to Database Management Systems
B. Aditya Prakash
Lecture #2: The Relational Model and Relational Algebra
-
Course Outline
§ Weeks 1–4: Query/Manipulation Languages and Data Modeling
– Relational Algebra
– Data definition
– Programming with SQL
– Entity-Relationship (E/R) approach
– Specifying Constraints
– Good E/R design
§ Weeks 5–8: Indexes, Processing and Optimization
– Storing
– Hashing/Sorting
– Query Optimization
– NoSQL and Hadoop
§ Weeks 9–10: Relational Design
– Functional Dependencies
– Normalization to avoid redundancy
§ Weeks 11–12: Concurrency Control
– Transactions
– Logging and Recovery
§ Weeks 13–14: Students’ choice
– Practice Problems
– XML
– Data mining and warehousing
Prakash2018 VTCS4604 2
-
Data Model
§ A Data Model is a notation for describing data or information.
– Structure of data (e.g. arrays, structs)
• Conceptual model: In databases, structures are at a higher level.
– Operations on data (Modifications and Queries)
• Limited Operations: Ease of programmers and efficiency of database.
– Constraints on data (what the data can be)
§ Examples of data models
– The Relational Model
– The Semistructured-Data Model
• XML and related standards
– Object-Relational Model
-
The Relational Model
§ Structure: Table (like an array of structs)
§ Operations: Relational algebra (selection, projection, conditions, etc)
§ Constraints: E.g., grades can be only {A, B, C, F}
Student            Course   Grade
Hermione Grainger  Potions  A
Draco Malfoy       Potions  B
Harry Potter       Potions  A
Ron Weasley        Potions  C
-
The Semi-structured model
§ Structure: Trees or graphs, tags define role played by different pieces of data.
§ Operations: Follow paths in the implied tree from one element to another.
§ Constraints: E.g., can express limitations on data types
Hermione Grainger
Potions
A
Draco Malfoy
Potions
B ...
-
Comparing the two models
§ Flexibility: XML can represent graphs
§ Ease of use: SQL enables programmer to express wishes at high level.
-
The Relational Model
§ Simple: Built around a single concept for modeling data: the relation or table.
– A relational database is a collection of relations.
– Each relation is a table with rows and columns.
§ Supports high-level programming language (SQL).
– Limited but very useful set of operations
§ Has an elegant mathematical design theory.
§ Most current DBMS are relational (Oracle, IBM DB2, MS SQL)
-
Relations
§ A relation is a two-dimensional table:
– Relation == table.
– Attribute == column name.
– Tuple == row (not the header row).
§ Database == collection of relations.
§ A relation has two parts:
– Schema defines column heads of the table (attributes).
– Instance contains the data rows (tuples, rows, or records) of the table.
Student            Course   Grade
Hermione Grainger  Potions  A
Draco Malfoy       Potions  B
Harry Potter       Potions  A
Ron Weasley        Potions  C
-
Schema
§ The schema of a relation is the name of the relation followed by a parenthesized list of attributes.
CoursesTaken(Student, Course, Grade)
§ A design in a relational model consists of a set of schemas.
§ Such a set of schemas is called a relational database schema.
CoursesTaken:
Student            Course   Grade
Hermione Grainger  Potions  A
Draco Malfoy       Potions  B
Harry Potter       Potions  A
Ron Weasley        Potions  C
-
Relations: Equivalent Representations
CoursesTaken(Student, Course, Grade)
§ Relation is a set of tuples and not a list of tuples.
– Order in which we present the tuples does not matter.
– Very important!
§ The attributes in a schema are also a set (not a list).
– Schema is the same irrespective of order of attributes.
CoursesTaken(Student, Grade, Course)
– We specify a “standard” order when we introduce a schema.
§ How many equivalent representations are there for a relation with m attributes and n tuples?
CoursesTaken:
Student            Course   Grade
Hermione Grainger  Potions  A
Draco Malfoy       Potions  B
Harry Potter       Potions  A
Ron Weasley        Potions  C
m! n!
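The m! n! count can be checked on the CoursesTaken instance with a small Python sketch. The encoding below (a schema tuple plus a list of rows, and the helper name `as_relation`) is an assumption made for illustration, not something the slides prescribe.

```python
from itertools import permutations
from math import factorial

# Hypothetical encoding: a schema is a tuple of attribute names and an
# instance is a list of rows whose positions follow the schema.
schema = ("Student", "Course", "Grade")
rows = [
    ("Hermione Grainger", "Potions", "A"),
    ("Draco Malfoy", "Potions", "B"),
    ("Harry Potter", "Potions", "A"),
    ("Ron Weasley", "Potions", "C"),
]

def as_relation(schema, rows):
    """Order-free view: a set of tuples, each a set of (attribute, value) pairs."""
    return frozenset(frozenset(zip(schema, row)) for row in rows)

# Reordering rows and columns changes the printed table, not the relation.
reordered_schema = ("Student", "Grade", "Course")
reordered_rows = [(s, g, c) for (s, c, g) in reversed(rows)]
same = as_relation(schema, rows) == as_relation(reordered_schema, reordered_rows)

# Count printed layouts: every column order combined with every row order.
m, n = len(schema), len(rows)
layouts = {
    (cols, row_order)
    for cols in permutations(range(m))
    for row_order in permutations(range(n))
}
print(same, len(layouts), factorial(m) * factorial(n))
```

With m = 3 and n = 4 the sketch counts 3! · 4! = 144 layouts, all denoting the same relation.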
-
Degree and Cardinality
CoursesTaken:
Student            Course   Grade
Hermione Grainger  Potions  A
Draco Malfoy       Potions  B
Harry Potter       Potions  A
Ron Weasley        Potions  C
§ Degree/Arity is the number of fields/attributes in schema (=3 in the table above)
§ Cardinality is the number of tuples in relation (=4 in the table above)
-
Keys of Relations
§ Keys are one form of integrity constraints (IC)
– No pair of tuples should have identical keys
§ What is the key for CoursesTaken?
– Student if only one course in the relation
– Pair (Student, Course) if multiple courses
– What if student takes same course many times?
Student            Course   Grade
Hermione Grainger  Potions  A
Draco Malfoy       Potions  B
Harry Potter       Potions  A
Ron Weasley        Potions  C
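The "no pair of tuples should have identical keys" rule can be tested against a concrete instance. The following is a minimal sketch; the function name `violates_key` and the extra "Charms" tuple are hypothetical. Note that a key is a constraint on every legal instance, so checking one instance can only refute a candidate key, never prove it.

```python
def violates_key(rows, schema, key):
    """Return True if two distinct tuples agree on every attribute in `key`."""
    idx = [schema.index(a) for a in key]
    seen = set()
    for row in set(rows):  # a relation is a set, so exact duplicates collapse
        k = tuple(row[i] for i in idx)
        if k in seen:
            return True
        seen.add(k)
    return False

schema = ("Student", "Course", "Grade")
one_course = [
    ("Hermione Grainger", "Potions", "A"),
    ("Draco Malfoy", "Potions", "B"),
    ("Harry Potter", "Potions", "A"),
    ("Ron Weasley", "Potions", "C"),
]
# Hypothetical extra tuple: one student now appears in a second course.
two_courses = one_course + [("Hermione Grainger", "Charms", "A")]

print(violates_key(one_course, schema, ["Student"]))             # False
print(violates_key(two_courses, schema, ["Student"]))            # True
print(violates_key(two_courses, schema, ["Student", "Course"]))  # False
```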
-
Keys of Relations
§ Keys help associate tuples in different relations
SID  Student            GPA
123  Hermione Grainger  3.9
111  Draco Malfoy       3.0
234  Harry Potter       3.7
456  Ron Weasley        3.1
SID  CID     Grade
123  15-401  A
111  15-401  B
123  14-501  B
….   ….      ….
-
Example
§ Create a database for managing class enrollments in a single semester. You should keep track of all students (their names, IDs, and addresses) and professors (name, ID, department). Do not record the address of professors but keep track of their ages. Also maintain records of courses, such as what classroom is assigned to a course, what the current enrollment is, and which department offers it. At most one professor teaches each course. Each student evaluates the professor teaching the course. Note that all course offerings in the semester are unique, i.e. course names and numbers do not overlap. A course can have ≥ 0 pre-requisites, excluding itself. A student enrolled in a course must have enrolled in all its pre-requisites. Each student receives a grade in each course. The departments are also unique, and can have at most one chairperson (or dept. head). A chairperson is not allowed to head two or more departments.
-
Relational Design for the Example
§ Students(PID: string, Name: string, Address: string)
§ Professors(PID: string, Name: string, Office: string, Age: integer, DepartmentName: string)
§ Courses(Number: integer, DeptName: string, CourseName: string, Classroom: string, Enrollment: integer)
§ Teach(ProfessorPID: string, Number: integer, DeptName: string)
§ Take(StudentPID: string, Number: integer, DeptName: string, Grade: string, ProfessorEvaluation: integer)
§ Departments(Name: string, ChairmanPID: string)
§ PreReq(Number: integer, DeptName: string, PreReqNumber: integer, PreReqDeptName: string)
-
Relational Design Example: Keys?
§ Students(PID: string, Name: string, Address: string)
§ Professors(PID: string, Name: string, Office: string, Age: integer, DepartmentName: string)
§ Courses(Number: integer, DeptName: string, CourseName: string, Classroom: string, Enrollment: integer)
§ Teach(ProfessorPID: string, Number: integer, DeptName: string)
§ Take(StudentPID: string, Number: integer, DeptName: string, Grade: string, ProfessorEvaluation: integer)
§ Departments(Name: string, ChairmanPID: string)
§ PreReq(Number: integer, DeptName: string, PreReqNumber: integer, PreReqDeptName: string)
-
Issues to Consider in the Design
§ Can we merge Courses and Teach since at most one professor teaches each course?
§ Do we need a separate relation to store evaluations?
§ How can we handle pre-requisites that are “or”s, e.g., you can take CS 4604 if you have taken either CS 3114 or CS 2606?
§ How do we generalize this schema to handle data over more than one semester?
§ What modifications does the schema need if more than one professor can teach a course?
-
Formal query languages
§ How do we collect information?
§ Eg., find ssn’s of people in 415
§ (recall: everything is a set!)
§ One solution: Rel. algebra, ie., set operators
§ Q1: Which ones??
§ Q2: what is a minimal set of operators?
-
Relational operators
§ .
§ .
§ .
§ set union ∪
§ set difference ‘-’
-
Example:
FT-STUDENT
Ssn  Name    Address
129  peters  main str
239  lee     5th ave
PT-STUDENT
Ssn  Name   Address
123  smith  main str
234  jones  forbes ave
§ Q: find all students (part or full time)
§ A: PT-STUDENT union FT-STUDENT
-
Observations:
§ two tables are ‘union compatible’ if they have the same attributes (‘domains’)
§ Q: how about intersection ∩
-
Observations:
§ A: redundant:
§ STUDENT intersection STAFF = STUDENT - (STUDENT - STAFF)
[Venn diagram: overlapping sets STUDENT and STAFF]
Double negation: We’ll see it again, later…
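The redundancy of intersection is easy to confirm with Python's built-in sets. The two instances below are made up for illustration; only the identity itself comes from the slide.

```python
# Hypothetical instances: each relation is a set of (ssn, name) tuples.
STUDENT = {(123, "smith"), (234, "jones"), (345, "lee")}
STAFF = {(234, "jones"), (456, "peters")}

# STUDENT - STAFF removes the shared tuples; subtracting that from STUDENT
# removes everything except the shared tuples (the double negation).
via_difference = STUDENT - (STUDENT - STAFF)

print(via_difference)                     # {(234, 'jones')}
print(via_difference == STUDENT & STAFF)  # True
```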
-
Other operators?
§ eg, find all students on ‘Main street’
§ A: ‘selection’
$\sigma_{\text{address}='\text{main str}'}(\text{STUDENT})$
STUDENT
Ssn  Name   Address
123  smith  main str
234  jones  forbes ave
-
Other operators?
§ Notice: selection (and rest of operators) expect tables, and produce tables (-> can be cascaded!!)
§ For selection, in general:
$\sigma_{\text{condition}}(\text{RELATION})$
-
Selection - examples
§ Find all ‘Smiths’ on ‘Main St.’
$\sigma_{\text{name}='\text{Smith}' \wedge \text{address}='\text{Main st.}'}(\text{STUDENT})$
‘condition’ can be any boolean combination of ‘=’, ‘>’, ‘>=’, ...
-
Relational operators
§ selection $\sigma_{\text{condition}}(R)$
§ .
§ .
§ set union $R \cup S$
§ set difference $R - S$
-
Relational operators
§ selection picks rows - how about columns?
§ A: ‘projection’ - eg.: $\pi_{\text{ssn}}(\text{STUDENT})$ finds all the ‘ssn’ - removing duplicates
STUDENT
Ssn  Name   Address
123  smith  main str
234  jones  forbes ave
-
Relational operators
Cascading: ‘find ssn of students on ‘main st.’’
$\pi_{\text{ssn}}(\sigma_{\text{address}='\text{main st}'}(\text{STUDENT}))$
STUDENT
Ssn  Name   Address
123  smith  main str
234  jones  forbes ave
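Because selection and projection both take a table and return a table, they compose directly. A minimal Python sketch of the cascade follows; the helper names `select` and `project` and the list-of-dicts encoding are assumptions for illustration, and the address literal uses the spelling stored in the table.

```python
def select(rows, condition):
    """sigma: keep the tuples (dicts) that satisfy `condition`."""
    return [r for r in rows if condition(r)]

def project(rows, attrs):
    """pi: keep only `attrs`; building a set removes duplicate tuples."""
    return {tuple(r[a] for a in attrs) for r in rows}

STUDENT = [
    {"ssn": 123, "name": "smith", "address": "main str"},
    {"ssn": 234, "name": "jones", "address": "forbes ave"},
]

# pi_ssn(sigma_address='main str'(STUDENT)): the output of select feeds project.
on_main = select(STUDENT, lambda r: r["address"] == "main str")
print(project(on_main, ["ssn"]))  # {(123,)}
```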
-
Relational operators
§ selection $\sigma_{\text{condition}}(R)$
§ projection $\pi_{\text{att-list}}(R)$
§ .
§ set union $R \cup S$
§ set difference $R - S$
-
Relational operators
Are we done yet?
Q: Give a query we cannot answer yet!
-
Relational operators
A: any query across two or more tables, eg., ‘find names of students in 4604’
Q: what extra operator do we need??
A: surprisingly, cartesian product is enough!
STUDENT
Ssn  Name   Address
123  smith  main str
234  jones  forbes ave
TAKES
SSN  c-id  grade
123  4604  A
234  5614  B
-
Cartesian product
§ eg., dog-breeding: MALE x FEMALE
§ gives all possible couples
MALE
name
spike
spot
FEMALE
name
lassie
shiba
MALE x FEMALE =
M.name  F.name
spike   lassie
spike   shiba
spot    lassie
spot    shiba
-
so what?
§ Eg., how do we find names of students taking 4604?
-
Cartesian product
§ A: $\sigma_{\text{STUDENT.ssn}=\text{TAKES.ssn}}(\text{STUDENT} \times \text{TAKES})$
Ssn  Name   Address     ssn  cid   grade
123  smith  main str    123  4604  A
234  jones  forbes ave  123  4604  A
123  smith  main str    234  5614  B
234  jones  forbes ave  234  5614  B
-
Cartesian product
$\sigma_{\text{cid}=4604}(\sigma_{\text{STUDENT.ssn}=\text{TAKES.ssn}}(\text{STUDENT} \times \text{TAKES}))$
Ssn  Name   Address     ssn  cid   grade
123  smith  main str    123  4604  A
234  jones  forbes ave  123  4604  A
123  smith  main str    234  5614  B
234  jones  forbes ave  234  5614  B
-
$\pi_{\text{name}}(\sigma_{\text{cid}=4604}(\sigma_{\text{STUDENT.ssn}=\text{TAKES.ssn}}(\text{STUDENT} \times \text{TAKES})))$
Ssn  Name   Address     ssn  cid   grade
123  smith  main str    123  4604  A
234  jones  forbes ave  123  4604  A
123  smith  main str    234  5614  B
234  jones  forbes ave  234  5614  B
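The three-step query (product, then two selections, then a projection) can be traced in Python on the two tables from the slides. This is a sketch under an assumed encoding: rows are plain tuples and columns are addressed by position.

```python
from itertools import product

STUDENT = [(123, "smith", "main str"), (234, "jones", "forbes ave")]  # (ssn, name, address)
TAKES = [(123, 4604, "A"), (234, 5614, "B")]                           # (ssn, cid, grade)

# STUDENT x TAKES: concatenate every STUDENT tuple with every TAKES tuple.
pairs = [s + t for s, t in product(STUDENT, TAKES)]

# sigma STUDENT.ssn = TAKES.ssn: positions 0 and 3 hold the two ssn columns.
matched = [p for p in pairs if p[0] == p[3]]

# sigma cid = 4604: position 4 holds cid.
in_4604 = [p for p in matched if p[4] == 4604]

# pi name: position 1 holds name; a set removes duplicates.
names = {p[1] for p in in_4604}
print(len(pairs), len(matched), names)  # 4 2 {'smith'}
```

The product has 4 rows, of which only 2 pair a student with that student's own enrollment; the remaining steps narrow those down to the answer.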
-
FUNDAMENTAL Relational operators
§ selection $\sigma_{\text{condition}}(R)$
§ projection $\pi_{\text{att-list}}(R)$
§ cartesian product MALE x FEMALE
§ set union $R \cup S$
§ set difference $R - S$
-
Relational ops
§ Surprisingly, they are enough, to help us answer almost any query we want!!
§ derived/convenience operators:
– set intersection
– join (theta join, equi-join, natural join)
– ‘rename’ operator $\rho_{R'}(R)$
– division $R \div S$
-
Joins
§ Equijoin: $R \bowtie_{R.a = S.b} S = \sigma_{R.a = S.b}(R \times S)$
-
Joins
§ Equijoin: $R \bowtie_{R.a = S.b} S = \sigma_{R.a = S.b}(R \times S)$
§ theta-joins: $R \bowtie_{\theta} S$, generalization of equi-join - any condition $\theta$
-
Joins
§ very popular: natural join: $R \bowtie S$
§ like equi-join, but it drops duplicate columns:
STUDENT(ssn, name, address)
TAKES(ssn, cid, grade)
-
Joins
§ nat. join has 5 attributes: $\text{STUDENT} \bowtie \text{TAKES}$
§ equi-join: 6: $\text{STUDENT} \bowtie_{\text{STUDENT.ssn}=\text{TAKES.ssn}} \text{TAKES}$
Ssn  Name   Address     ssn  cid   grade
123  smith  main str    123  4604  A
234  jones  forbes ave  123  4604  A
123  smith  main str    234  5614  B
234  jones  forbes ave  234  5614  B
-
Natural Joins - nit-picking
§ if no attributes in common between R, S: nat. join -> cartesian product
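Both properties of the natural join (one copy of each shared column, and the fall-back to a cartesian product) show up in a short Python sketch. The function name `natural_join` and the attribute names `m_name` / `f_name` are hypothetical; the nested loop is written for clarity, not efficiency.

```python
def natural_join(r_schema, r_rows, s_schema, s_rows):
    """Join on all attributes the two schemas share, keeping one copy of each."""
    common = [a for a in r_schema if a in s_schema]
    s_only = [a for a in s_schema if a not in common]
    out_rows = set()
    for r in r_rows:
        for s in s_rows:
            # With no common attributes, all() of an empty sequence is True,
            # so every pair is kept: the cartesian product.
            if all(r[r_schema.index(a)] == s[s_schema.index(a)] for a in common):
                out_rows.add(tuple(r) + tuple(s[s_schema.index(a)] for a in s_only))
    return list(r_schema) + s_only, out_rows

student_schema = ["ssn", "name", "address"]
STUDENT = [(123, "smith", "main str"), (234, "jones", "forbes ave")]
takes_schema = ["ssn", "cid", "grade"]
TAKES = [(123, 4604, "A"), (234, 5614, "B")]

joined_schema, joined_rows = natural_join(student_schema, STUDENT, takes_schema, TAKES)
print(joined_schema)        # 5 attributes: ssn appears once
print(sorted(joined_rows))

# Hypothetical attribute names with nothing in common: every pair survives.
_, couples = natural_join(["m_name"], [("spike",), ("spot",)],
                          ["f_name"], [("lassie",), ("shiba",)])
print(len(couples))         # 4
```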
-
Overview - rel. algebra
§ fundamental operators
§ derived operators
– joins etc
– rename
– division
§ examples
-
Rename op.
$\rho_{\text{AFTER}}(\text{BEFORE})$
§ Q: why?
§ A: shorthand; self-joins; …
§ for example, find the grand-parents of ‘Tom’, given PC(parent-id, child-id)
-
Rename op.
§ PC(parent-id, child-id)
$\text{PC} \bowtie \text{PC}$
PC
p-id   c-id
Mary   Tom
Peter  Mary
John   Tom
PC
p-id   c-id
Mary   Tom
Peter  Mary
John   Tom
-
Rename op.
§ first, WRONG attempt: $\text{PC} \bowtie \text{PC}$
§ (why? how many columns?)
§ Second WRONG attempt: $\text{PC} \bowtie_{\text{PC.c-id}=\text{PC.p-id}} \text{PC}$
-
Rename op.
§ we clearly need two different names for the same table - hence, the ‘rename’ op.
$\rho_{\text{PC1}}(\text{PC}) \bowtie_{\text{PC1.c-id}=\text{PC.p-id}} \text{PC}$
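The self-join with a renamed copy can be traced on the PC table in Python. In this sketch the rename is modelled simply as a second variable, `PC1`, bound to the same tuples; the last two steps (selecting child 'Tom' and projecting the grandparent column) are added here to finish the grand-parent query and are not spelled out on the slide.

```python
PC = {("Mary", "Tom"), ("Peter", "Mary"), ("John", "Tom")}  # (p-id, c-id)

# rho_{PC1}(PC): the same tuples under a second name, so that the two copies'
# columns can be told apart in the join condition.
PC1 = set(PC)

# Theta-join on PC1.c-id = PC.p-id: the child in PC1 is the parent in PC.
joined = {
    (grandparent, middle, parent, child)
    for (grandparent, middle) in PC1
    for (parent, child) in PC
    if middle == parent
}

# Keep rows whose final child is Tom, then project the grandparent column.
grandparents_of_tom = {gp for (gp, _, _, child) in joined if child == "Tom"}
print(grandparents_of_tom)  # {'Peter'}
```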
-
Division
§ Rarely used, but powerful.
§ Example: find suspicious suppliers, ie., suppliers that supplied all the parts in A_BOMB
-
Division
Example: find suspicious suppliers, ie., suppliers that supplied all the parts in A_BOMB
SHIPMENT
s#  p#
s1  p1
s2  p1
s1  p2
s3  p1
s5  p3
ABOMB
p#
p1
p2
BAD_S
s#
s1
SHIPMENT ÷ ABOMB = BAD_S
-
Division
§ Observations: ~reverse of cartesian product
§ It can be derived from the 5 fundamental operators (!!)
§ How?
-
Division
§ Answer: $r \div s = \pi_{(R-S)}(r) - \pi_{(R-S)}[(\pi_{(R-S)}(r) \times s) - r]$
– Here R and S are the attribute sets of tables r and s, so R-S is the set of attributes of r that are not in s.
§ Observation: find ‘good’ suppliers, and subtract! (double negation)
SHIPMENT (Table ‘r’)
s#  p#
s1  p1
s2  p1
s1  p2
s3  p1
s5  p3
ABOMB (Table ‘s’)
p#
p1
p2
BAD_S
s#
s1
SHIPMENT ÷ ABOMB = BAD_S
§ Reading the answer term by term:
– $\pi_{(R-S)}(r)$: All suppliers
– $s$: All bad parts
– $\pi_{(R-S)}(r) \times s$: all possible suspicious shipments
– $(\pi_{(R-S)}(r) \times s) - r$: all possible suspicious shipments that didn’t happen
– $\pi_{(R-S)}[(\pi_{(R-S)}(r) \times s) - r]$: all suppliers who missed at least one suspicious shipment, i.e.: ‘good’ suppliers
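The division formula translates almost line for line into Python set operations. This is a sketch for the two-column case on the slide (r holds pairs, s holds single values); the function name `divide` and the intermediate variable names are assumptions.

```python
from itertools import product

def divide(r, s):
    """r: set of (x, y) pairs; s: set of y values.
    Returns the x values that are paired in r with every y in s."""
    candidates = {x for (x, _) in r}           # pi_{R-S}(r): all suppliers
    all_pairs = set(product(candidates, s))    # pi_{R-S}(r) x s
    missing = all_pairs - r                    # pairs that did not happen
    disqualified = {x for (x, _) in missing}   # 'good' suppliers
    return candidates - disqualified           # subtract them

SHIPMENT = {("s1", "p1"), ("s2", "p1"), ("s1", "p2"), ("s3", "p1"), ("s5", "p3")}
ABOMB = {"p1", "p2"}

print(divide(SHIPMENT, ABOMB))  # {'s1'}
```

Only s1 shipped both p1 and p2, so every other supplier shows up in `missing` at least once and is subtracted.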
-
Sample schema
STUDENT
Ssn  Name   Address
123  smith  main str
234  jones  forbes ave
CLASS
c-id  c-name  units
4513  s.e.    2
4512  o.s.    2
TAKES
SSN  c-id  grade
123  4513  A
234  4513  B
find names of students that take 4604
-
Examples
§ find names of students that take 4604
$\pi_{\text{name}}[\sigma_{\text{c-id}=4604}(\text{STUDENT} \bowtie \text{TAKES})]$
-
Sample schema
STUDENT
Ssn  Name   Address
123  smith  main str
234  jones  forbes ave
CLASS
c-id  c-name  units
4613  s.e.    2
4612  o.s.    2
TAKES
SSN  c-id  grade
123  4613  A
234  4613  B
find course names of ‘smith’
-
Examples
§ find course names of ‘smith’
$\pi_{\text{c-name}}[\sigma_{\text{name}='\text{smith}'}(\text{STUDENT} \bowtie \text{TAKES} \bowtie \text{CLASS})]$
-
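Examples
§ find ssn of ‘overworked’ students, ie., that take 4612, 4613, 4604: almost correct answer:
$\sigma_{\text{c-id}=4612}(\text{TAKES}) \cap \sigma_{\text{c-id}=4613}(\text{TAKES}) \cap \sigma_{\text{c-id}=4604}(\text{TAKES})$
§ (only almost: each selected tuple still carries its own c-id, so the three selections have no tuple in common)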
-
Examples
§ find ssn of ‘overworked’ students, ie., that take 4612, 4613, 4604 - Correct answer:
$\pi_{\text{ssn}}[\sigma_{\text{c-id}=4612}(\text{TAKES})] \cap \pi_{\text{ssn}}[\sigma_{\text{c-id}=4613}(\text{TAKES})] \cap \pi_{\text{ssn}}[\sigma_{\text{c-id}=4604}(\text{TAKES})]$
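The gap between the almost-correct and the correct answer becomes visible on a small instance. The TAKES rows below are hypothetical (the sample schema on the slides has only two rows), and the helper names `takes_course` and `ssn_of` are assumptions.

```python
# Hypothetical TAKES instance: (ssn, c-id, grade).
TAKES = {
    (123, 4612, "A"), (123, 4613, "B"), (123, 4604, "A"),
    (234, 4612, "B"), (234, 4613, "A"),
}

def takes_course(cid):
    """sigma_{c-id = cid}(TAKES)"""
    return {t for t in TAKES if t[1] == cid}

def ssn_of(rows):
    """pi_{ssn}(rows)"""
    return {t[0] for t in rows}

# Intersecting whole tuples: no tuple can have three different c-id values.
almost = takes_course(4612) & takes_course(4613) & takes_course(4604)

# Projecting onto ssn first lets the same student match in all three sets.
correct = ssn_of(takes_course(4612)) & ssn_of(takes_course(4613)) & ssn_of(takes_course(4604))

print(almost, correct)  # set() {123}
```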
-
Examples
§ find ssn of students that work at least as hard as ssn=123 (ie., they take all the courses of ssn=123, and maybe more)
$\pi_{\text{ssn},\,\text{c-id}}(\text{TAKES}) \div \pi_{\text{c-id}}[\sigma_{\text{ssn}=123}(\text{TAKES})]$
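The division query can be sketched in Python as follows. The TAKES rows are hypothetical, and the division is evaluated with a direct "has every target course" check, which gives the same result as the division operator for this two-column case.

```python
# Hypothetical TAKES instance: (ssn, c-id, grade).
TAKES = {
    (123, 4613, "A"), (123, 4612, "B"),
    (234, 4613, "B"), (234, 4612, "A"), (234, 4604, "A"),
    (345, 4613, "C"),
}

# pi_{ssn, c-id}(TAKES): drop the grade column.
enrolled = {(ssn, cid) for (ssn, cid, _) in TAKES}

# pi_{c-id}[sigma_{ssn=123}(TAKES)]: the courses taken by ssn=123.
target = {cid for (ssn, cid) in enrolled if ssn == 123}

# Division: keep each ssn that is paired with every course in `target`.
result = {ssn for (ssn, _) in enrolled
          if all((ssn, cid) in enrolled for cid in target)}

print(sorted(result))  # [123, 234]
```

Student 123 is trivially in the answer, 234 takes both of those courses plus one more, and 345 misses 4612.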
-
Conclusions
§ Relational model: only tables (‘relations’)
§ relational algebra: powerful, minimal: 5 operators can handle almost any query!