CS 4604: Introduction to Database Management...
CS 4604: Introduction to Database Management Systems
B. Aditya Prakash
Lecture #10: Query Processing
Prakash2016 VTCS4604 2
Outline
§ introduction
§ selection
§ projection
§ join
§ set & aggregate operations
Introduction
§ Today’s topic: QUERY PROCESSING
§ Some database operations are EXPENSIVE
§ Can greatly improve performance by being “smart”
– e.g., can speed up 1,000,000x over naïve approach
Introduction (cnt’d)
§ Main weapons are:
– clever implementation techniques for operators
– exploiting “equivalencies” of relational operators
– using statistics and cost models to choose among these.
A Really Bad Query Optimizer
§ For each Select-From-Where query block
– do cartesian products first
– then do selections
– etc., i.e.:
• GROUP BY; HAVING
• projections
• ORDER BY
§ Incredibly inefficient
– Huge intermediate results!
[Figure: σ predicates applied on top of × over tables …]
Cost-based Query Sub-System
[Figure components: Queries (e.g., Select * From Blah B Where B.blah = blah); Query Parser; Query Optimizer (Plan Generator, Plan Cost Estimator); Query Plan Evaluator; Catalog Manager (Schema, Statistics)]
Usually there is a heuristics-based rewriting step before the cost-based steps.
The Query Optimization Game
§ “Optimizer” is a bit of a misnomer…
§ Goal is to pick a “good” (i.e., low expected cost) plan.
– Involves choosing access methods, physical operators, operator orders, …
– Notion of cost is based on an abstract “cost model”
Relational Operations
§ We will consider how to implement:
– Selection (σ): Selects a subset of rows from relation.
– Projection (π): Deletes unwanted columns from relation.
– Join (⋈): Allows us to combine two relations.
– Set-difference (-): Tuples in reln. 1, but not in reln. 2.
– Union (∪): Tuples in reln. 1 or in reln. 2.
– Aggregation (SUM, MIN, etc.) and GROUP BY
§ Recall: ops can be composed!
§ Later (after spring break), we’ll see how to optimize queries with many ops
Schema for Examples
Sailors(sid: integer, sname: string, rating: integer, age: real)
Reserves(sid: integer, bid: integer, day: dates, rname: string)
§ Similar to old schema; rname added for variations.
§ Sailors:
– Each tuple is 50 bytes long, 80 tuples per page, 500 pages.
– N=500, pS=80.
§ Reserves:
– Each tuple is 40 bytes long, 100 tuples per page, 1000 pages.
– M=1000, pR=100.
Simple Selections
§ Of the form σ_{R.attr op value}(R)
§ Question: how best to perform?
SELECT * FROM Reserves R WHERE R.rname < 'C%'
Simple Selections
§ A: Depends on:
– what indexes/access paths are available
– what is the expected size of the result (in terms of number of tuples and/or number of pages)
Simple Selections
§ Size of result approximated as: size of R * reduction factor
– “reduction factor” is also called selectivity.
– estimate of reduction factors is based on statistics; we will discuss shortly.
Alternatives for Simple Selections
§ With no index, unsorted:
– Must essentially scan the whole relation
– cost is M (#pages in R). For “reserves” = 1000 I/Os.
Simple Selections (cnt’d)
§ With no index, sorted:
– cost of binary search + number of pages containing results.
– For reserves = 10 I/Os + ⌈selectivity * #pages⌉
Simple Selections (cnt’d)
§ With an index on selection attribute:
– Use index to find qualifying data entries,
– then retrieve corresponding data records.
– (Hash index useful only for equality selections.)
Using an Index for Selections
§ Cost depends on #qualifying tuples, and clustering.
– Cost:
• finding qualifying data entries (typically small)
• plus cost of retrieving records (could be large w/o clustering).
Selections using Index (cnt’d)
[Figure: CLUSTERED vs. UNCLUSTERED index. In each panel, index entries (index file) direct the search for data entries, which point to data records (data file).]
Selections using Index (cnt’d)
– In example “reserves” relation, if 10% of tuples qualify (100 pages, 10,000 tuples).
• With a clustered index, cost is little more than 100 I/Os;
• if unclustered, could be up to 10,000 I/Os! unless…
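The selection alternatives so far can be compared with a small calculation. This is only a rough sketch: the function name `selection_costs` and the dictionary keys are made up for illustration, the clustered figure counts data pages only (index traversal is left out), and the unclustered figure is the worst case of one I/O per qualifying tuple.

```python
import math
from fractions import Fraction

def selection_costs(pages, tuples_per_page, selectivity):
    """Rough I/O estimates for one simple selection, per access path."""
    matching_pages = math.ceil(selectivity * pages)
    matching_tuples = math.ceil(selectivity * pages * tuples_per_page)
    return {
        # no index, unsorted file: scan every page
        "scan": pages,
        # no index, sorted file: binary search, then read the matching pages
        "sorted": math.ceil(math.log2(pages)) + matching_pages,
        # clustered index: qualifying records sit on adjacent pages
        "clustered": matching_pages,
        # unclustered index, worst case: one page read per qualifying tuple
        "unclustered": matching_tuples,
    }

# Reserves: 1000 pages, 100 tuples per page, 10% of tuples qualify.
costs = selection_costs(1000, 100, Fraction(1, 10))
for path, ios in costs.items():
    print(f"{path}: {ios} I/Os")
```

The selectivity is passed as an exact fraction so that the ceilings are not affected by floating-point rounding.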
Selections using Index (cnt’d)
§ Important refinement for unclustered indexes:
1. Find qualifying data entries.
2. Sort the rid’s of the data records to be retrieved.
3. Fetch rids in order.
This ensures that each data page is looked at just once (though # of such pages likely to be higher than with clustering).
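To see why sorting the rids helps, here is a toy sketch. It assumes a rid is a `(page_no, slot_no)` pair and that only one data page is buffered at a time; the rid values and the helper `pages_fetched` are invented for illustration.

```python
def pages_fetched(rids, sort_first):
    """Count data-page reads when only one page is buffered at a time."""
    order = sorted(rids) if sort_first else list(rids)
    reads = 0
    current_page = None
    for page_no, _slot_no in order:
        if page_no != current_page:  # a different page must be read from disk
            reads += 1
            current_page = page_no
    return reads

# Hypothetical rids in the order an unclustered index returns them.
rids = [(7, 2), (3, 1), (7, 5), (3, 4), (9, 0), (7, 1)]
unsorted_reads = pages_fetched(rids, sort_first=False)
sorted_reads = pages_fetched(rids, sort_first=True)
print(unsorted_reads, sorted_reads)
```

In the unsorted order every rid lands on a different page than the previous one, so each fetch costs an I/O; after sorting, each of the three distinct pages is read once.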
General Selection Conditions
(day<8/9/94 AND rname=‘Paul’) OR bid=5 OR sid=3
§ Q: What would you do?
§ A: try to find a selective (clustering) index. Specifically:
SKIP
General Selection Conditions
(day<8/9/94 AND rname=‘Paul’) OR bid=5 OR sid=3
§ Convert to conjunctive normal form (CNF):
– (day<8/9/94 OR bid=5 OR sid=3) AND (rname=‘Paul’ OR bid=5 OR sid=3)
§ We only discuss the case with no ORs (a conjunction of terms of the form attr op value).
SKIP
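One way to convince yourself that the CNF form says the same thing as the original condition is a brute-force truth table. In this sketch each of the four atomic terms is replaced by a boolean variable (the parameter names are just labels for illustration).

```python
from itertools import product

def original(day_lt, rname_paul, bid5, sid3):
    # (day<8/9/94 AND rname='Paul') OR bid=5 OR sid=3
    return (day_lt and rname_paul) or bid5 or sid3

def cnf(day_lt, rname_paul, bid5, sid3):
    # (day<8/9/94 OR bid=5 OR sid=3) AND (rname='Paul' OR bid=5 OR sid=3)
    return (day_lt or bid5 or sid3) and (rname_paul or bid5 or sid3)

# Try all 16 truth assignments of the four terms.
equivalent = all(
    original(*values) == cnf(*values)
    for values in product([False, True], repeat=4)
)
print(equivalent)
```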
General Selection Conditions
(day<8/9/94 AND rname=‘Paul’) OR bid=5 OR sid=3
§ A B-tree index matches (a conjunction of) terms that involve only attributes in a prefix of the search key.
– Index on <a, b, c> matches a=5 AND b=3, but not b=3.
§ For Hash index, must have all attributes in search key
SKIP
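The matching rules can be written down as two small predicates. This is a simplified reading, not a definitive rule set: a term is modelled as an `(attribute, operator)` pair, a B-tree is taken to match when the attributes used are exactly a prefix of the search key, and a hash index is taken to match only when every search-key attribute appears in an equality term. The function names are hypothetical.

```python
def btree_matches(search_key, terms):
    """True if the attributes in `terms` form a non-empty prefix of the key."""
    attrs = {attr for attr, _op in terms}
    return bool(attrs) and attrs == set(search_key[:len(attrs)])

def hash_matches(search_key, terms):
    """True if all terms are equalities covering every search-key attribute."""
    all_equalities = all(op == "=" for _attr, op in terms)
    attrs = {attr for attr, _op in terms}
    return all_equalities and attrs == set(search_key)

print(btree_matches(("a", "b", "c"), [("a", "="), ("b", "=")]))  # a=5 AND b=3
print(btree_matches(("a", "b", "c"), [("b", "=")]))              # b=3 alone
print(hash_matches(("bid", "sid"), [("bid", "="), ("sid", "=")]))
print(hash_matches(("bid", "sid"), [("bid", "=")]))
```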
Two Approaches to General Selections
§ First approach: Find the cheapest access path, retrieve tuples using it, and apply any remaining terms that don’t match the index
§ Second approach: get rids from first index; rids from second index; intersect and fetch.
SKIP
Two Approaches to General Selections
§ First approach: Find the cheapest access path, retrieve tuples using it, and apply any remaining terms that don’t match the index:
– Cheapest access path: An index or file scan with fewest I/Os.
– Terms that match this index reduce the number of tuples retrieved; other terms help discard some retrieved tuples, but do not affect number of tuples/pages fetched.
SKIP
Cheapest Access Path - Example
§ Consider day<8/9/94 AND bid=5 AND sid=3.
§ A B+ tree index on day can be used;
– then, bid=5 and sid=3 must be checked for each retrieved tuple.
§ Similarly, a hash index on <bid, sid> could be used;
– Then, day<8/9/94 must be checked.
SKIP
Cheapest Access Path - cnt’d
§ Consider day<8/9/94 AND bid=5 AND sid=3.
§ How about a B+ tree on <rname, day>?
§ How about a B+ tree on <day, rname>?
§ How about a Hash index on <day, rname>?
SKIP
Intersection of RIDs
§ Second approach: if we have 2 or more matching indexes (w/ Alternatives (2) or (3) for data entries):
– Get sets of rids of data records using each matching index.
– Then intersect these sets of rids.
– Retrieve the records and apply any remaining terms.
SKIP
Intersection of RIDs (cnt’d)
§ EXAMPLE: Consider day<8/9/94 AND bid=5 AND sid=3.
§ With a B+ tree index on day and an index on sid,
§ we can retrieve rids of records satisfying day<8/9/94 using the first,
§ rids of recs satisfying sid=3 using the second,
§ intersect,
§ retrieve records and check bid=5.
SKIP
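The rid-intersection example can be mimicked on a handful of made-up records. Everything concrete here is an assumption for illustration: rids are `(page, slot)` pairs, dates are ISO strings with 8/9/94 read as 1994-08-09, and the two "indexes" are simulated by filtering a dictionary rather than by real index structures.

```python
# Hypothetical Reserves records keyed by rid = (page_no, slot_no).
records = {
    (1, 0): {"sid": 3, "bid": 5, "day": "1994-07-01"},
    (1, 1): {"sid": 3, "bid": 7, "day": "1994-06-15"},
    (2, 0): {"sid": 4, "bid": 5, "day": "1994-05-20"},
    (2, 1): {"sid": 3, "bid": 5, "day": "1994-09-30"},
}

# Stand-ins for the two index lookups: each yields a set of rids.
rids_day = {rid for rid, rec in records.items() if rec["day"] < "1994-08-09"}
rids_sid = {rid for rid, rec in records.items() if rec["sid"] == 3}

# Intersect the rid sets, then fetch in rid order and apply the leftover term.
candidates = sorted(rids_day & rids_sid)
result = [rid for rid in candidates if records[rid]["bid"] == 5]
print(candidates, result)
```

Only the two candidate records are fetched, and the remaining term bid=5 is checked on those alone.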
The Projection Operation
§ Issue is removing duplicates.
§ Basic approach: sorting
– 1. Scan R, extract only the needed attrs (why?)
– 2. Sort the resulting set
– 3. Remove adjacent duplicates
Cost: Reserves with size ratio 0.25 = 250 pages. With 20 buffer pages can sort in 2 passes, so 1000 + 250 + 2*2*250 + 250 = 2500 I/Os
SELECT DISTINCT R.sid, R.bid FROM Reserves R
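The three steps of the sort-based approach, sketched in memory (a real system would use an external sort; the helper name `project_distinct` is made up, and the rows are a few Reserves tuples from the sort-merge example later in the lecture):

```python
def project_distinct(rows, attrs):
    # Step 1: scan and keep only the needed attributes.
    projected = [tuple(row[a] for a in attrs) for row in rows]
    # Step 2: sort the projected tuples so duplicates become adjacent.
    projected.sort()
    # Step 3: drop each tuple that equals its predecessor.
    result = []
    for t in projected:
        if not result or result[-1] != t:
            result.append(t)
    return result

reserves = [
    {"sid": 28, "bid": 103, "day": "12/4/96", "rname": "guppy"},
    {"sid": 28, "bid": 103, "day": "11/3/96", "rname": "yuppy"},
    {"sid": 31, "bid": 101, "day": "10/10/96", "rname": "dustin"},
]
distinct = project_distinct(reserves, ["sid", "bid"])
print(distinct)
```

Projecting before sorting is what keeps the sorted data small, which is one answer to the "(why?)" in step 1.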
Projection
§ Can improve by modifying external sort algorithm (see chapter 13):
– Modify Pass 0 of external sort to eliminate unwanted fields.
– Modify merging passes to eliminate duplicates.
Cost: for above case: read 1000 pages, write out 250 in runs of 40 pages, merge runs = 1000 + 250 + 250 = 1500.
SKIP
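The two projection costs (2500 vs. 1500 I/Os) can be reproduced term by term. This is only a sketch of the arithmetic; the function names are invented, and the improved variant assumes, as in the case above, that a single merge pass suffices.

```python
def basic_projection_cost(pages, size_ratio, sort_passes):
    projected = int(pages * size_ratio)   # pages left after dropping fields
    scan = pages + projected              # read R, write projected tuples
    sort = 2 * sort_passes * projected    # each pass reads and writes them
    dedup = projected                     # final scan removing duplicates
    return scan + sort + dedup

def improved_projection_cost(pages, size_ratio):
    projected = int(pages * size_ratio)
    # read R, write sorted runs of projected tuples, read the runs to merge
    return pages + projected + projected

basic = basic_projection_cost(1000, 0.25, sort_passes=2)
improved = improved_projection_cost(1000, 0.25)
print(basic, improved)
```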
Discussion of Projection
§ If an index on the relation contains all wanted attributes in its search key, can do index-only scan.
– Apply projection techniques to data entries (much smaller!)
Discussion of Projection
§ If an ordered (i.e., tree) index contains all wanted attributes as prefix of search key, can do even better:
– Retrieve data entries in order (index-only scan), discard unwanted fields, compare adjacent tuples to check for duplicates.
(Reminder) A B-tree index matches (a conjunction of) terms that involve only attributes in a prefix of the search key.
– Index on <a, b, c> matches a=5 AND b=3, but not b=3.
For Hash index, must have all attributes in search key
Joins
§ Joins are very common.
§ Joins can be very expensive (cross product in worst case).
§ Many approaches to reduce join cost.
Joins
§ Join techniques we will cover:
– Nested-loops join
– Index-nested loops join
– Sort-merge join
– Hash join
Equality Joins With One Join Column
SELECT * FROM Reserves R1, Sailors S1 WHERE R1.sid=S1.sid
§ In algebra: R ⋈ S. Common! Must be carefully optimized. R × S is large; so, R × S followed by a selection is inefficient.
§ Remember, join is associative and commutative.
Equality Joins
§ Assume:
– M pages in R, pR tuples per page, m tuples total
– N pages in S, pS tuples per page, n tuples total
– In our examples, R is Reserves and S is Sailors.
§ We will consider more complex join conditions later.
§ Cost metric: # of I/Os. We will ignore output costs.
Nested loops
§ Algorithm #0: (naive) nested loop (SLOW!)
for each tuple r of R
    for each tuple s of S
        print, if they match
[Figure: R(A, ..) with m tuples is the outer relation; S(A, ......) with n tuples is the inner relation]
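Algorithm #0 translates almost line for line into code. A minimal in-memory sketch, assuming tuples are dictionaries and the join is an equality on one attribute; the sample rows are taken from the Sailors/Reserves example used later in the lecture, and the pairs are collected in a list instead of being printed.

```python
def nested_loop_join(outer, inner, attr):
    """Naive tuple-at-a-time join: compare every outer tuple with every inner tuple."""
    matches = []
    for r in outer:            # for each tuple r of R
        for s in inner:        # for each tuple s of S
            if r[attr] == s[attr]:
                matches.append((r, s))
    return matches

reserves = [
    {"sid": 28, "bid": 103},
    {"sid": 31, "bid": 101},
    {"sid": 31, "bid": 102},
]
sailors = [
    {"sid": 22, "sname": "dustin"},
    {"sid": 28, "sname": "yuppy"},
    {"sid": 31, "sname": "lubber"},
]
joined = nested_loop_join(reserves, sailors, "sid")
for r, s in joined:
    print(r["sid"], r["bid"], s["sname"])
```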
Nested loops
§ Algorithm #0: why is it bad?
§ how many disk accesses (‘M’ and ‘N’ are the number of blocks for ‘R’ and ‘S’)?
– M + m*N
[Figure: R(A, ..): M pages, m tuples; S(A, ......): N pages, n tuples]
Simple Nested Loops Join
§ Actual number: (pR*M)*N + M = 100*1000*500 + 1000 I/Os.
– At 10ms/IO, Total: ~6 days (!)
§ What if smaller relation (S) was outer?
– slightly better
§ What assumptions are being made here?
– 1 buffer for each table (and 1 for output)
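The "~6 days" figure and the "slightly better" answer can be checked with a few lines of arithmetic. A sketch using the schema numbers from earlier (M=1000, pR=100, N=500, pS=80); the helper name is made up.

```python
def simple_nlj_ios(outer_pages, outer_tuples_per_page, inner_pages):
    # Read the outer once, and scan the whole inner once per outer tuple.
    return outer_pages + outer_tuples_per_page * outer_pages * inner_pages

MS_PER_IO = 10
SECONDS_PER_DAY = 86400

r_outer = simple_nlj_ios(1000, 100, 500)   # Reserves as outer
s_outer = simple_nlj_ios(500, 80, 1000)    # Sailors as outer

r_days = r_outer * MS_PER_IO / 1000 / SECONDS_PER_DAY
s_days = s_outer * MS_PER_IO / 1000 / SECONDS_PER_DAY
print(r_outer, round(r_days, 2))
print(s_outer, round(s_days, 2))
```

With Sailors as the outer the count drops from about 50 million to about 40 million I/Os, which is better but still measured in days.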
Nested loops
• Algorithm #1: Blocked nested-loop join
  – read in a block of R
    • read in a block of S
      – print matching tuples
COST = M + M*N
[Figure: R(A, ..): M pages, m tuples; S(A, ......): N pages, n tuples]
Nested loops
• Which one should be the outer relation?
• A: the smallest (page-wise)
COST = M + M*N
[Figure: R(A, ..): M pages, m tuples; S(A, ......): N pages, n tuples]
Nested loops
• M=1000, N=500
• Cost = 1000 + 1000*500 = 501,000
• = 5010 sec ~ 1.4h
COST = M + M*N
[Figure: R(A, ..): M pages, m tuples; S(A, ......): N pages, n tuples]
Nested loops
• M=1000, N=500 - if smaller is outer:
• Cost = 500 + 1000*500 = 500,500
• = 5005 sec ~ 1.4h
COST = N + M*N
[Figure: R(A, ..): M pages, m tuples; S(A, ......): N pages, n tuples]
Nested loops
• What if we have B buffers available?
• A: give B-2 buffers to outer, 1 to inner, 1 for output
[Figure: R(A, ..): M pages, m tuples; S(A, ......): N pages, n tuples]
Nested loops
• Algorithm #1: Blocked nested-loop join
  – read in B-2 blocks of R
    • read in a block of S
      – print matching tuples
COST = M + M/(B-2)*N
[Figure: R(A, ..): M pages, m tuples; S(A, ......): N pages, n tuples]
Nested loops
• and, actually:
• Cost = M + ceiling(M/(B-2)) * N
COST = M + M/(B-2)*N
[Figure: R(A, ..): M pages, m tuples; S(A, ......): N pages, n tuples]
Nested loops
• If smallest (outer) fits in memory
• (i.e., B = N+2),
• Cost = N + M (minimum!)
COST = N + N/(B-2)*M
[Figure: R(A, ..): M pages, m tuples; S(A, ......): N pages, n tuples]
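The blocked nested-loop cost, including the ceiling, fits in a small helper. This is a sketch of the formula only (the function name `bnl_cost` is made up); B=3 gives the one-page-at-a-time case, and B=N+2 gives the case where the outer fits in memory.

```python
def bnl_cost(outer_pages, inner_pages, buffers):
    """I/Os for block nested loops: outer + ceil(outer / (B-2)) * inner."""
    chunk = buffers - 2                 # B-2 pages hold the outer; 1 inner, 1 output
    if chunk < 1:
        raise ValueError("need at least 3 buffer pages")
    chunks = -(-outer_pages // chunk)   # integer ceiling division
    return outer_pages + chunks * inner_pages

M, N = 1000, 500                        # Reserves and Sailors pages
reserves_outer = bnl_cost(M, N, 3)
sailors_outer = bnl_cost(N, M, 3)
outer_fits = bnl_cost(N, M, N + 2)
print(reserves_outer, sailors_outer, outer_fits)
```

Once the smaller relation fits in B-2 buffers, the inner is scanned exactly once, which is why N + M is the minimum.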
Nested loops - guidelines
§ pick as outer the smallest table (= fewest pages)
§ fit as much of it in memory as possible
§ loop over the inner
Index NL join
§ use an existing index, or even build one on the fly
§ cost: M + m*c (c: look-up cost)
[Figure: R(A, ..): M pages, m tuples; S(A, ......): N pages, n tuples]
Index NL join
§ cost: M + m*c (c: look-up cost)
§ ‘c’ depends on whether the index is clustered or not.
[Figure: R(A, ..): M pages, m tuples; S(A, ......): N pages, n tuples]
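A toy version of index nested loops, building the index on the fly. The "index" here is just an in-memory dictionary from join value to matching inner tuples, which stands in for a real hash or tree index and ignores I/O entirely; the names and sample rows are for illustration.

```python
from collections import defaultdict

def index_nested_loop_join(outer, inner, attr):
    # Build an index on the inner relation's join attribute on the fly.
    index = defaultdict(list)
    for s in inner:
        index[s[attr]].append(s)
    # One index look-up per outer tuple, instead of a full scan of the inner.
    matches = []
    for r in outer:
        for s in index.get(r[attr], []):
            matches.append((r, s))
    return matches

reserves = [{"sid": 28, "bid": 103}, {"sid": 31, "bid": 101}, {"sid": 58, "bid": 103}]
sailors = [{"sid": 28, "sname": "yuppy"}, {"sid": 31, "sname": "lubber"},
           {"sid": 44, "sname": "guppy"}, {"sid": 58, "sname": "rusty"}]
pairs = index_nested_loop_join(reserves, sailors, "sid")
print([(r["bid"], s["sname"]) for r, s in pairs])
```

Each outer tuple costs one look-up, which corresponds to the m*c term in the cost above.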
Sort-merge join
• sort both on joining attribute
• scan each and merge
• Cost, given B buffers?
[Figure: R(A, ..): M pages, m tuples; S(A, ......): N pages, n tuples]
Sort-mergejoin
R(A,..)
S(A, ......) M pages,
m tuples N pages,
n tuples
• Cost, given B buffers? • ~ 2*M*logM/logB + 2*N* logN/logB + M + N
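The estimate above can be read term by term: log M / log B approximates the number of sort passes over R, each pass reads and writes every page (the factor 2), and the final M + N is the merging scan. A small sketch of that arithmetic follows; the function name is an assumption, and the result is a smooth approximation rather than an exact I/O count.

```python
from math import log

def sort_merge_cost_estimate(M, N, B):
    """Approximate I/O cost from the slide, for M, N >= 1 pages and B >= 2 buffers."""
    sort_r = 2 * M * log(M) / log(B)  # ~log_B(M) passes, read + write each pass
    sort_s = 2 * N * log(N) / log(B)  # same for S
    merge = M + N                     # one scan of each sorted relation
    return sort_r + sort_s + merge

# Hypothetical sizes, picked so the logarithm ratios are whole numbers.
estimate = sort_merge_cost_estimate(10000, 100, 100)
```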
![Page 65: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest](https://reader034.vdocument.in/reader034/viewer/2022052105/6040e5abaf636a472d76b83e/html5/thumbnails/65.jpg)
Sort-Merge Join
§ Useful if
– one or both inputs are already sorted on join attribute(s)
– output is required to be sorted on join attribute(s)
§ “Merge” phase can require some backtracking if duplicate values appear in join column
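The backtracking mentioned in the last bullet can be seen in a minimal in-memory sketch. The function name and tuple layout are assumptions for illustration; a real system merges sorted runs page by page rather than Python lists.

```python
def sort_merge_join(R, S, key_r=0, key_s=0):
    """Equi-join R and S by sorting both and merging (illustrative sketch)."""
    R = sorted(R, key=lambda t: t[key_r])
    S = sorted(S, key=lambda t: t[key_s])
    result = []
    i = j = 0
    while i < len(R) and j < len(S):
        a, b = R[i][key_r], S[j][key_s]
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Equal join values: remember where the matching S group starts.
            group_start = j
            while i < len(R) and R[i][key_r] == a:
                j = group_start  # backtrack: rescan the S group for each R duplicate
                while j < len(S) and S[j][key_s] == a:
                    result.append(R[i] + S[j])
                    j += 1
                i += 1
            # Here i and j both point just past their groups.
    return result

# Hypothetical data with duplicates on both sides of the join column.
R = [(1, "a"), (1, "b"), (2, "c")]
S = [(1, "x"), (1, "y"), (3, "z")]
joined = sort_merge_join(R, S)
```

If only one input has duplicates in the join column, the reset of `j` never rescans more than it has to, and the merge is a single forward pass over both inputs.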
![Page 66: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest](https://reader034.vdocument.in/reader034/viewer/2022052105/6040e5abaf636a472d76b83e/html5/thumbnails/66.jpg)
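Example of Sort-Merge Join

Sailors:

| sid | sname | rating | age |
|-----|--------|--------|------|
| 22 | dustin | 7 | 45.0 |
| 28 | yuppy | 9 | 35.0 |
| 31 | lubber | 8 | 55.5 |
| 44 | guppy | 5 | 35.0 |
| 58 | rusty | 10 | 35.0 |

Reserves:

| sid | bid | day | rname |
|-----|-----|----------|--------|
| 28 | 103 | 12/4/96 | guppy |
| 28 | 103 | 11/3/96 | yuppy |
| 31 | 101 | 10/10/96 | dustin |
| 31 | 102 | 10/12/96 | lubber |
| 31 | 101 | 10/11/96 | lubber |
| 58 | 103 | 11/12/96 | dustin |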
![Page 67: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest](https://reader034.vdocument.in/reader034/viewer/2022052105/6040e5abaf636a472d76b83e/html5/thumbnails/67.jpg)
Example of Sort-Merge Join
§ With 35, 100 or 300 buffer pages, both Reserves and Sailors can be sorted in 2 passes; total join cost: 7500.
§ (while Block Nested Loop (BNL) cost: 2,500 to 15,000 I/Os)
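The 7500 figure can be checked with a short calculation. The page counts are not stated on this slide; the sketch assumes the usual textbook running example (Reserves 1000 pages, Sailors 500 pages), which is consistent with the 4500 I/Os quoted later for 3(M+N). The function names are assumptions as well.

```python
from math import ceil

def external_sort_passes(pages, B):
    """Passes of external merge sort: pass 0 makes runs of B pages, later passes merge B-1 runs."""
    if B < 3:
        raise ValueError("need at least 3 buffer pages")
    runs = ceil(pages / B)
    passes = 1
    while runs > 1:
        runs = ceil(runs / (B - 1))
        passes += 1
    return passes

def sort_merge_join_cost(M, N, B):
    """Sort both inputs (read + write per pass), then one merging scan of each."""
    sort_r = 2 * M * external_sort_passes(M, B)
    sort_s = 2 * N * external_sort_passes(N, B)
    return sort_r + sort_s + (M + N)

# Assumed sizes of the running example (not given on this slide).
M_RESERVES, N_SAILORS = 1000, 500
costs = {B: sort_merge_join_cost(M_RESERVES, N_SAILORS, B) for B in (35, 100, 300)}
```

With two passes for each relation the cost is 2*2*1000 + 2*2*500 + 1500 = 7500 for all three buffer sizes, which is why extra buffers do not help sort-merge here while they do help block nested loops.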
![Page 68: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest](https://reader034.vdocument.in/reader034/viewer/2022052105/6040e5abaf636a472d76b83e/html5/thumbnails/68.jpg)
Sort-merge join
§ Worst case for merging phase?
§ Cost?
![Page 69: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest](https://reader034.vdocument.in/reader034/viewer/2022052105/6040e5abaf636a472d76b83e/html5/thumbnails/69.jpg)
Refinements
§ All the refinements of external sorting
§ plus overlapping of the merging of sorting with the merging of joining.
![Page 70: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest](https://reader034.vdocument.in/reader034/viewer/2022052105/6040e5abaf636a472d76b83e/html5/thumbnails/70.jpg)
Joins
§ Join techniques we will cover:
– Nested-loops join
– Index-nested loops join
– Sort-merge join
– Hash join
![Page 71: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest](https://reader034.vdocument.in/reader034/viewer/2022052105/6040e5abaf636a472d76b83e/html5/thumbnails/71.jpg)
Hash joins
§ hash join: use hashing function h()
– hash ‘R’ into (0, 1, ..., ‘max’) buckets
– hash ‘S’ into buckets (same hash function)
– join each pair of matching buckets
[Figure: R(A, ...) and S(A, ...) each hashed into buckets 0, 1, ..., max]
![Page 72: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest](https://reader034.vdocument.in/reader034/viewer/2022052105/6040e5abaf636a472d76b83e/html5/thumbnails/72.jpg)
Hash join - details
– how to join each pair of partitions Hr-i, Hs-i?
– A: build another hash table for Hs-i, and probe it with each tuple of Hr-i
[Figure: R(A, ...) and S(A, ...) hashed into buckets 0, 1, ..., max, with partitions Hr-0 and Hs-0 labeled]
![Page 73: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest](https://reader034.vdocument.in/reader034/viewer/2022052105/6040e5abaf636a472d76b83e/html5/thumbnails/73.jpg)
Hash join - details
§ In more detail:
§ Choose the (page-wise) smallest - if it fits in memory, do ~NL
– and, actually, build a hash table (with h2() != h())
– and probe it, with each tuple of the other
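The two phases described on the hash join slides can be sketched as follows. This is an in-memory illustration under stated assumptions: Python's built-in `hash` modulo a bucket count stands in for h(), a `dict` stands in for the second hash function h2(), and the function names and bucket count are made up for the example. The data are the Sailors and Reserves rows from the earlier example slide.

```python
from collections import defaultdict

def partition(rel, key, num_buckets):
    """Partitioning phase: split a relation into buckets with h(x) = hash(x) % num_buckets."""
    buckets = [[] for _ in range(num_buckets)]
    for t in rel:
        buckets[hash(t[key]) % num_buckets].append(t)
    return buckets

def hash_join(R, S, key_r=0, key_s=0, num_buckets=4):
    """Equi-join R and S by partitioning both, then joining matching partitions."""
    Hr = partition(R, key_r, num_buckets)
    Hs = partition(S, key_s, num_buckets)
    result = []
    for hr_i, hs_i in zip(Hr, Hs):
        # Build an in-memory table on Hs-i; the dict plays the role of h2().
        table = defaultdict(list)
        for s in hs_i:
            table[s[key_s]].append(s)
        # Probe it with each tuple of Hr-i.
        for r in hr_i:
            for s in table.get(r[key_r], []):
                result.append(r + s)
    return result

sailors = [(22, "dustin", 7, 45.0), (28, "yuppy", 9, 35.0), (31, "lubber", 8, 55.5),
           (44, "guppy", 5, 35.0), (58, "rusty", 10, 35.0)]
reserves = [(28, 103, "12/4/96", "guppy"), (28, 103, "11/3/96", "yuppy"),
            (31, 101, "10/10/96", "dustin"), (31, 102, "10/12/96", "lubber"),
            (31, 101, "10/11/96", "lubber"), (58, 103, "11/12/96", "dustin")]
joined = hash_join(sailors, reserves)
```

Tuples with equal join values always land in the same bucket index, so only matching partitions need to be compared; the output order follows bucket order rather than key order.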
![Page 74: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest](https://reader034.vdocument.in/reader034/viewer/2022052105/6040e5abaf636a472d76b83e/html5/thumbnails/74.jpg)
Hash join details
§ what if Hs-i is too large to fit in main-memory?
§ A: recursive partitioning
§ more details (overflows, hybrid hash joins): in book
§ cost of hash join? (if we have enough buffers:) 3(M+N) (why? See next slide)
![Page 75: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest](https://reader034.vdocument.in/reader034/viewer/2022052105/6040e5abaf636a472d76b83e/html5/thumbnails/75.jpg)
Cost of Hash-Join
§ In partitioning phase, read+write both relns; 2(M+N). In matching phase, read both relns; M+N I/Os.
§ In our running example, this is a total of 4500 I/Os.
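The arithmetic behind the 4500 I/Os can be written out as a small sketch. The page counts used here (1000 and 500) are an assumption taken from the standard textbook running example; they are not stated on this slide, but they are the sizes for which 3(M+N) gives 4500.

```python
def hash_join_cost(M, N):
    """I/O cost of hash join when there are enough buffers (no recursive partitioning)."""
    partitioning = 2 * (M + N)  # read and write both relations once
    matching = M + N            # read every partition of both relations once
    return partitioning + matching

# Assumed running-example sizes: 1000 pages and 500 pages.
total = hash_join_cost(1000, 500)
```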
![Page 77: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest](https://reader034.vdocument.in/reader034/viewer/2022052105/6040e5abaf636a472d76b83e/html5/thumbnails/77.jpg)
Hash join details
§ [cost of hash join? (if we have enough buffers:) 3(M+N)]
§ What is ‘enough’? sqrt(N), or sqrt(M)?
§ A: sqrt(smallest) (why?)
– Because you only need enough memory to hold the hash table partitions of the smaller table in memory, so B > size-of-smaller / (B-1) ⇒ B ~ sqrt(size-of-smaller)
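The inequality on this slide can be turned into a small calculation. With B buffers the partitioning phase produces B-1 partitions, each holding about size-of-smaller / (B-1) pages, and each must fit in B pages. The sketch below finds the smallest B satisfying that simplified condition; it ignores the extra space a real in-memory hash table needs, and the function name is an assumption.

```python
from math import isqrt

def min_buffers_for_hash_join(smaller_pages):
    """Smallest B with B > smaller_pages / (B - 1), i.e. B * (B - 1) > smaller_pages."""
    B = max(2, isqrt(smaller_pages))  # start at or below the answer, near the square root
    while B * (B - 1) <= smaller_pages:
        B += 1
    return B

# Hypothetical sizes for illustration: a 500-page and a 1000-page smaller relation.
b_500 = min_buffers_for_hash_join(500)
b_1000 = min_buffers_for_hash_join(1000)
```

Both results sit just above the square root of the relation size, which matches the slide's B ~ sqrt(size-of-smaller).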
![Page 78: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest](https://reader034.vdocument.in/reader034/viewer/2022052105/6040e5abaf636a472d76b83e/html5/thumbnails/78.jpg)
Sort-Merge Join vs. Hash Join
§ Given a minimum amount of memory both have a cost of 3(M+N) I/Os.
(min. memory for sort-merge = sqrt(larger table), using aggressive refinements; see textbook)
(min. memory for hash = sqrt(smaller table); see previous slides)
![Page 81: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest](https://reader034.vdocument.in/reader034/viewer/2022052105/6040e5abaf636a472d76b83e/html5/thumbnails/81.jpg)
Sort-Merge vs Hash join
§ Hash Join Pros:
– Superior if relation sizes differ greatly
– Shown to be highly parallelizable (beyond scope of class)
§ Sort-Merge Join Pros:
– Less sensitive to data skew
– Result is sorted (may help “upstream” operators)
– goes faster if one or both inputs already sorted
![Page 82: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest](https://reader034.vdocument.in/reader034/viewer/2022052105/6040e5abaf636a472d76b83e/html5/thumbnails/82.jpg)
General Join Conditions
§ Equalities over several attributes (e.g., R.sid=S.sid AND R.rname=S.sname):
– all previous methods apply, using the composite key
![Page 84: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest](https://reader034.vdocument.in/reader034/viewer/2022052105/6040e5abaf636a472d76b83e/html5/thumbnails/84.jpg)
General Join Conditions
§ Inequality conditions (e.g., R.rname < S.sname):
§ which methods still apply?
– NL (probably, the best!)
– index NL (only if clustered index)
– Sort merge (does not apply!) (why?)
– Hash join (does not apply!) (why?)
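A nested-loops join works with any predicate because it compares every pair of tuples, as the sketch below shows. Hashing, by contrast, only brings equal values together, so two tuples landing in different buckets says nothing about which value is smaller. The function name and the toy relations are assumptions for illustration.

```python
def nested_loop_join(R, S, predicate):
    """Join R and S on an arbitrary condition by testing every pair of tuples."""
    return [r + s for r in R for s in S if predicate(r, s)]

# Hypothetical single-column relations holding names.
R = [("dustin",), ("lubber",)]
S = [("guppy",), ("rusty",)]
# Inequality condition in the spirit of R.rname < S.sname.
pairs = nested_loop_join(R, S, lambda r, s: r[0] < s[0])
```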
![Page 85: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest](https://reader034.vdocument.in/reader034/viewer/2022052105/6040e5abaf636a472d76b83e/html5/thumbnails/85.jpg)
Set Operations
§ Intersection and cross-product: special cases of join
§ Union (Distinct) and Except: similar; we’ll do union:
§ Effectively: concatenate; use sorting or hashing
§ Sorting based approach to union:
– Sort both relations (on combination of all attributes).
– Scan sorted relations and merge them.
– Alternative: Merge runs from Pass 0 for both relations.
SKIP
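The sorting based approach to union can be sketched in memory as follows. Tuples are compared on all attributes, as the slide says; the function name and the sample data are assumptions, and a real system would merge sorted runs on disk rather than Python lists.

```python
def sort_union(R, S):
    """UNION (distinct): sort both inputs on all attributes, then merge and drop duplicates."""
    R, S = sorted(R), sorted(S)
    out = []
    i = j = 0
    while i < len(R) or j < len(S):
        # Take the smaller head tuple; when one input is exhausted, take from the other.
        if j >= len(S) or (i < len(R) and R[i] <= S[j]):
            t = R[i]
            i += 1
        else:
            t = S[j]
            j += 1
        # Duplicates are adjacent after sorting, so comparing with the last output suffices.
        if not out or out[-1] != t:
            out.append(t)
    return out

# Hypothetical relations with a duplicate inside R and one shared tuple.
merged = sort_union([(3, "c"), (1, "a"), (1, "a")], [(2, "b"), (1, "a")])
```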
![Page 86: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest](https://reader034.vdocument.in/reader034/viewer/2022052105/6040e5abaf636a472d76b83e/html5/thumbnails/86.jpg)
Set Operations, cont’d
§ Hash based approach to union:
– Partition R and S using hash function h.
– For each S-partition, build in-memory hash table (using h2), scan corresponding R-partition and add tuples to table while discarding duplicates.
SKIP
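The hash based approach to union follows the same two-phase pattern as hash join. In this sketch Python's built-in `hash` modulo a partition count stands in for h, and a `set` stands in for the in-memory table built with h2; both choices, along with the function name, are assumptions made for illustration.

```python
def hash_union(R, S, num_partitions=4):
    """UNION (distinct): partition both inputs with h, then deduplicate partition by partition."""
    def partition(rel):
        parts = [[] for _ in range(num_partitions)]
        for t in rel:
            parts[hash(t) % num_partitions].append(t)
        return parts

    out = []
    for r_part, s_part in zip(partition(R), partition(S)):
        table = set(s_part)   # in-memory table for the S-partition; S duplicates collapse here
        for t in r_part:
            table.add(t)      # adding an existing tuple is a no-op, which discards duplicates
        out.extend(table)
    return out

# Hypothetical relations with a duplicate inside R and one shared tuple.
unioned = hash_union([(3, "c"), (1, "a"), (1, "a")], [(2, "b"), (1, "a")])
```

Equal tuples always hash to the same partition, so no duplicate can survive across partitions; the output order is unspecified.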
![Page 87: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest](https://reader034.vdocument.in/reader034/viewer/2022052105/6040e5abaf636a472d76b83e/html5/thumbnails/87.jpg)
Aggregate Operations (AVG, MIN, etc.)
§ Without grouping:
– In general, requires scanning the relation.
– Given index whose search key includes all attributes in the SELECT or WHERE clauses, can do index-only scan.
SKIP
![Page 88: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest](https://reader034.vdocument.in/reader034/viewer/2022052105/6040e5abaf636a472d76b83e/html5/thumbnails/88.jpg)
Summary
§ A virtue of relational DBMSs:
– queries are composed of a few basic operators
– The implementation of these operators can be carefully tuned
– Important to do this!
§ Many alternative implementation techniques for each operator
– No universally superior technique for most operators.
“it depends” [Guy Lohman (IBM)]
![Page 89: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest](https://reader034.vdocument.in/reader034/viewer/2022052105/6040e5abaf636a472d76b83e/html5/thumbnails/89.jpg)
Summary cont’d
§ Must consider available alternatives for each operation in a query and choose best one based on system statistics, etc.
– Part of the broader task of optimizing a query composed of several ops.