WORKING SET ANALYTICS
6/15/20 -- DRAFT v2
Peter J. Denning
Naval Postgraduate School, Monterey, CA
Abstract
The working set model for program behavior was invented in 1965. It has stood the test of time in virtual memory management for over fifty years. It is considered the ideal for managing memory in operating systems and caches. Its superior performance was based on the principle of locality, which was discovered at the same time; locality is the observed tendency of programs to use distinct subsets of their pages over extended periods of time. This tutorial traces the development of working set theory from its origins to the present day. We will discuss the principle of locality and its experimental verification. We will show why working set memory management resists thrashing and generates near-optimal system throughput. We will present the powerful, linear-time algorithms for computing working set statistics and applying them to the design of memory systems. We will debunk several myths about locality and the performance of memory systems. We will conclude with a discussion of the application of the working set model in parallel systems, modern shared CPU caches, network edge caches, and inventory and logistics management.
Keywords: Working set, working set model, program behavior, virtual memory, cache, paging policy, locality, locality principle, thrashing, multiprogramming, memory management, optimal paging
Denning WorkingSetAnalytics 2
Virtual memory made its public debut at the University of Manchester in 1962. It was hailed as a breakthrough in automatic memory management and enjoyed an enthusiastic reception. By the mid-1960s, however, operating system engineers had become skeptical of virtual memory. Its performance was untrustworthy and it was prone to unpredictable thrashing. In 1965, I joined Project MAC at MIT, which was developing Multics. The designers of Multics did not want their virtual memory to succumb to these problems. My PhD research project was to find a new approach to managing virtual memory that would make its performance trustworthy, reliable, and near-optimal. The new approach had two parts. The first was the Working Set Model, which was based on the idea of measuring the intrinsic memory demands of individual programs; working sets then determined how many main memory slots were needed and what pages should be loaded into them. The second part was the Principle of Locality, which was the strong tendency of executing programs to confine their references to limited locality sets over extended phases. After extensive experimental verification, the locality principle was accepted as a universal law of computing. The locality principle made it possible to prove that working-set memory management is near-optimal and immune to thrashing. These discoveries enabled the success of virtual memory on Multics and on commercial operating systems.
Since the 1970s, every operating system textbook has discussed the working set model. Further, every modern operating system uses the working set model as an ideal to underpin its approach to memory management [den16]. This can be seen, for example, by looking at the process activity control panels in Windows, where you will see “working set” mentioned in the memory use column. A search of the US patent database shows over 12,000 patents that base their claims on the working set or locality theories.
In this tutorial, I will trace how all this came to be and will show you the powerful algorithms we developed to compute working set statistics and use them to design memory systems.
The Growing Pains of Virtual Memory
In 1959 Tom Kilburn and his team, who were building the Atlas computer and operating system at the University of Manchester, UK, invented the first virtual memory [fot61]. They put it into operation in 1962 [kil62]. Virtual memory managed the content of the small main memory by automatically transferring pages between it and the secondary memory. Kilburn argued it would improve programmer productivity by automating the labor-intensive work of planning memory transfers. It was immediately seen by many as an ingenious solution to the complex problems of managing information flows in the memory hierarchy.
The Manchester designers introduced four innovations that soon were adopted as standards in computer architecture and operating systems from then to the present day. One was the page, a fixed-size unit of storage and transfer. Programs and data were divided into pages and stored in main memory slots called page frames, with copies on the secondary memory. A second innovation was the distinction between addresses (names of values) and locations (memory slots holding values). The address space was a large linear sequence of addresses and the main memory (RAM) a pool of page frames, each of which could hold any page from the address space. As the CPU generated addresses into its address space, a hardware memory mapping unit translated addresses to locations, using a page table that associated pages with page frames. A third innovation was the page fault, an interrupt triggered in the operating system when an executing program presented the mapping unit with an address whose page was not in main memory; the operating system located the missing page in secondary memory, chose a loaded page to evict, and transferred the missing page into the vacated page frame. A fourth innovation was the page replacement policy, the algorithm that chooses which page must be evicted from main memory and returned to secondary memory to make way for an incoming page. The miss rate function – the fraction of references that produce a page fault – was the key performance measure for virtual memories.
Performance of operating systems has always been a big deal. To be accepted into the operating system, a virtual memory system had to be efficient. The two potential sources of inefficiency were in address mapping and page transfers. They can be called the Addressing Problem and the Replacement Problem.
The Addressing Problem yielded quickly to efficient solutions. A translation lookaside buffer, which was a small cache in the memory mapping unit, limited the slowdown from address mapping to 3% of the normal RAM access time. This was an acceptable cost for all the benefits.¹ Virtual memory did indeed double or triple programmer productivity. It also eliminated the annoying problem that hand-crafted transfer schedules had to be redone for each different size of main memory. Even more, virtual memory enabled the partitioning of main memory into disjoint sets, one for each address space, supporting a primal objective to prevent executing jobs from interfering with one another. This was all good news to the designers and users of early operating systems.
The Replacement Problem was much more difficult and controversial. Page transfers were very costly because the speed gap between main and secondary memory was a factor of 10,000 or more. It was critical to find replacement policies that minimized page transfers. Early tests brought good news: the page transfer schedules generated automatically by the virtual memory usually generated fewer page moves than hand-crafted schedules [say69]. Because the main memory was very expensive and small,² even the cleverest programs were likely to generate a lot of page faults. Despite the great pressure on them to find good replacement policies, designers of virtual memories found no clear winners; by 1965 there was considerable heated debate and conflicting data about which replacement policy would be the best. Many engineers began to harbor doubts about whether virtual memory could be counted on to give good performance to everyone using a computer system [ran68].
¹ The term “job” is used throughout this tutorial to mean any of process, thread, or executing program. It denotes a computational task doing work for a user or the system.
² Main memory was very expensive, at least $0.25 a byte; today a gigabyte (GB) costs $5.00, about two millionths of that. Even though memory access times have become much smaller, the speed gap between the main memory (RAM) and secondary memory (usually DISK) has risen from 10^4 in 1960 to 10^6 today. Page faults have never been cheap.
The Atlas designers, keenly aware of the high cost of paging, invented an ingenious replacement policy they called the “learning algorithm”. It assumed that typical pages spent the majority of their time being visited in well-defined loops. It measured the period of each page’s loop and chose for replacement the page not expected to be accessed again for the longest time into the future. The learning algorithm employed the optimality principle that IBM’s Les Belady made precise in his MIN paging algorithm in 1966 [bel66]. Unfortunately, the learning algorithm did not work well for programs that did not have tight, well-defined loops; it did not find favor among the engineers searching for robust paging algorithms.
The innovation of multiprogramming in the mid 1960s added to the consternation about virtual memory performance. With multiprogramming, several jobs could be loaded simultaneously into separate partitions of the main memory, yielding significant improvements of CPU efficiency and system throughput. But multiprogrammed virtual memories brought a host of new problems. How many programs should be loaded? How much memory space should each program get? Should space allocations be allowed to vary? In the process of trying to answer these questions, engineers discovered that virtual memory systems were prone to a new, unexpected, and very serious problem: thrashing.
Thrashing is the sudden collapse of CPU efficiency and system throughput when too many programs are loaded into main memory at once. It is a catastrophic instability triggered when the paging policy steals pages from other programs to satisfy page faults. This condition, originally called “paging to death”, could be triggered by adding just one more program to the main memory. The trigger threshold was unpredictable. Who would purchase a million-dollar computer system whose performance could suddenly collapse at random and unpredictable times?
Thus, by 1965, operating systems designers were facing an enormous challenge. Could they design policies for managing virtual memory that minimized page faults and did not thrash?
The answer is affirmative: the working set policy became a classic ideal for managing memory [den16]. All the major modern operating systems – today including Windows, MacOS, and Linux – use memory management policies inspired by the working set model.
Into the Storm
I joined MIT Project MAC at the start of the Multics project in 1965. Multics planned to have a multiprogrammed virtual memory. The designers were deeply worried that they could wind up like some of the commercial operating systems that were adopting multiprogrammed virtual memory – hobbled by excessive paging and susceptible to thrashing. I became fascinated with these questions and took them on for my PhD work. Jerry Saltzer posed the research question in a nice way: considering Multics as a black box, can you design an automatic control mechanism for the virtual memory with a single, tunable, optimizing parameter? [den80]
I set out to find answers for the skeptics who seriously doubted that virtual memory could be stabilized. Their primary concern was that none of the wide variety of page replacement policies worked consistently well. They also knew that no replacement policy works well for some common problems. For example, matrix multiplication, a mainstay of graphics and neural networks, is very fast if the matrices are all in memory. But if they are stored with rows or columns on separate pages and memory cannot hold them all, matrix multiplication generates enormous amounts of paging and grinds to a near halt.
In 1966, Les Belady of IBM Watson Research Lab published a famous experimental study comparing a large number of replacement algorithms [bel66]. One of his findings was that replacement policies relying on use bits to decide which pages to keep in main memory performed better than others.³ He took this as evidence of a “locality property” – processes tend to reuse pages most recently used in the past. In late 1965 I had independently reached a similar conclusion, which I proposed to harness with a “working set”, defined as the pages used during a backward-looking sampling window of virtual time.⁴ The working set would see the most recently used pages and the operating system would protect them from being paged out [den68a, den72]. The operating system could prevent thrashing by never allowing a page fault to steal a page from another working set [den68b]. Belady and I began a collaboration in 1967, in which we postulated that all programs obey a “locality principle” that could be usefully exploited by paging algorithms.
Page Reference Maps
Page reference maps were a useful tool in early virtual memory research. A map shows time on the x-axis and pages on the y-axis. Each point of the time axis represents a sample interval, and above it is a column of darkened pixels marking the pages used in that sample interval. (An example appears in Figure 1.) The sampling interval is T time units, where one time unit is a single page reference. The pages used in a sampling interval are the locality set of that interval. A phase is a sequence of sampling intervals over which the locality set is unchanged. These maps clearly showed that jobs accessed small subsets of their total address spaces for extended periods. No program with a random reference map was ever observed. These maps were strong evidence of a universal Locality Principle exhibited by all executing programs.
³ A use bit is a hardware bit associated with each page frame. When the page is accessed (read or written) the use bit is set to 1 by the addressing hardware. The operating system can scan for used pages and reset the bits to 0. Unused pages are considered inactive and are first to be removed from main memory when space must be freed up.
⁴ Virtual time is discrete time measured as number of memory accesses; it is not interrupted by external events such as page faults or other input-output.
When he undertook his research on paging in Linux around 2009, a half century after the invention of virtual memory, Adrian McMenamin encountered a lot of skepticism among Linux system programmers about locality [mcm11]. They regarded it as an obsolete idea from the early days of virtual memory, no longer relevant because modern computers have so much more memory. To the contrary, McMenamin found out that Linux programs display locality – and it is even more pronounced than in early virtual memories. Why would the removal of memory constraints lead to more pronounced locality? The reason appears to be that unconstrained programmers built more modular programs: the phases are intervals of a particular module’s use.
McMenamin demonstrated locality by recording the page reference maps from a significant sample of Linux programs. Every program had clearly identifiable locality sets and phases – a unique locality-phase “signature”.⁵ He concluded that the skepticism was misplaced and that considerable performance benefits will come to systems that exploit the locality behavior. Page reference maps remain a powerful tool for visualizing locality and the operation of memory management policies.
⁵ The only known exception is a data scanner, a program that examines each item in a data sequence just once and then discards it. The best replacement policy in this case evicts a page just after it is used.
Figure 1. This is a page reference map of the Firefox Web browser in a Linux system. The horizontal axis represents virtual time, divided into equal sample windows of about 380K references, and the vertical axis represents virtual addresses of pages. A colored pixel indicates that the page was referenced during the associated sample window; a white pixel indicates no reference. The vertical grid lines are spaced 200 sample intervals apart and the horizontal grid lines are spaced 50 pages apart. The map reveals the locality sets of the program and shows dramatically that locality sets are stable over extended periods (phases), punctuated by sharp shifts to other locality sets. For over 99 percent of the time in this map, the pages seen in a sample interval are a near perfect predictor for the pages used in the next sample interval. Most executing programs have striking locality maps like this one. Each has its own unique locality behavior, like a digital signature. There is no randomness in the way programs use their code and data. (Source: Adrian McMenamin [mcm11], Creative Commons license.)
Origins of Locality and Working Sets
The ideas of locality and working sets (WS) are separate and distinct but are intimately connected. Locality is about the patterns of programs referencing their pages. Working set is about detecting those patterns in real time and using them to make memory management decisions.
Les Belady’s famous study of paging algorithms in 1966 demonstrated that LRU (least recently used) replacement consistently produced fewer page faults than FIFO (first in first out) replacement [bel66]. Belady reasoned that, if programs localize their references into subsets of their pages, the most recently used pages were most likely to be reused in the immediate future – and thus the least recently used pages were the best choices for replacement. Belady also noted that a weaker form of locality – nonuniform use of pages – explained why the fault rate functions of LRU and FIFO were not linear.
In 1966 there was a consensus among operating system engineers that locality was an observed tendency for programs to reuse their pages in the immediate future. I saw a connection between this informal idea of locality and an old programming concept. Programmers used the term “working set” to mean the pages that needed to be loaded in main memory so that a program would execute efficiently. It was up to the programmer to declare the working sets and design a schedule of page transfers to ensure that working sets were loaded in main memory. It seemed to me that the operating system could detect working sets by monitoring use bits, enabling it to automatically load working sets in memory without having to ask programmers to declare them. Pages whose use bits were set during a window of fixed size would estimate the program’s working set [den68a].⁶ Thus the two ideas, locality and working set, became a powerful partnership for allocating memory.
Let’s examine locality more closely. From page reference maps, early virtual memory researchers saw that programs used only a small subset of their pages at any given time. The pages that were used together were seen as “spatially clustered” because the use of one implied that another would be used soon. For example, the pages of a loop or the pages of a code module are spatially clustered. Spatial clustering implied temporal clustering. These two terms became favorite ways of explaining why locality was a characteristic of program execution.
In the decade after 1968, my students and I, along with other researchers, studied how locality is manifested in actual programs. We studied and measured many programs, leading us to introduce the terms “locality sets”, “phases”, and “transitions”. These terms captured the recurring structure of locality – periods of stability punctuated by abrupt transitions.
⁶ Although WS and LRU favor most-recently-used pages, they are not the same. LRU operates within a fixed memory space but does not advise the operating system what size it should be. WS measures the locality set and advises the operating system to allocate just the right amount of memory required to hold it. The WS contents are the most recently used pages in the window, but there the similarity with LRU ends.
We concluded that temporal clustering is a rather poor description of the punctuated stability observed in executing programs. Temporal clustering exemplifies “slow drift” but not “sudden change”. To confirm this, we built and studied mathematical models of locality. My student Jeff Spirn tested the “slow drift” hypothesis by formulating a series of mathematical models of slow drift behavior and then examining how well each model predicted the observed LRU miss-rate function of a program [spi72]. Spirn found that simple models such as the independent reference model (each page is referenced with a fixed probability) and the independent stack distance model (each LRU stack distance occurs with a fixed probability) led to poor predictions of LRU miss rate.
My student Kevin Kahn modeled the punctuated stability behavior directly [kah76]. In his models, states were locality sets and phases were holding times in the states. Phase-transition models parameterized from measurements of page reference maps yielded excellent agreement between predicted and actual LRU miss rates. Moreover, because it adjusted to locality sets, a WS policy generated less paging than LRU without using more memory.
Wayne Madison and Alan Batson confirmed that these key aspects of locality – locality sets, phases, and transitions – exist at the source code level [mad76]. They concluded that locality is not an artifact of the way that compilers lay out data and code blocks on pages of address space. Design techniques such as loop iteration, divide-and-conquer, and modularity lead to subsets of pages being used for extended periods. The locality seen in page references is the image of the higher-level locality created as programmers design their algorithms.
Programmers who understood that virtual memory performs better with programs of good locality easily designed programs that ran well in virtual memory [say69]. Every program we ever saw exhibited locality. No program used its pages randomly. By the mid 1970s, we had settled on the definition of locality in the accompanying box [den80].
The Locality Principle
Executing processes reference their memory objects with punctuated stability described by:
(L_1, H_1), (L_2, H_2), ..., (L_i, H_i), ...
where L_i is a locality set and H_i is the holding time of its phase. The shortest sampling interval required to see the full locality set is likely to be a small fraction of holding time. Successive locality sets are likely to be mostly different, with few overlaps.
Recent research by Chen Ding and his students with large shared caches has demonstrated that locality is observed in cache reference patterns. Consequently, working set memory management also applies to shared caches, just as in multiprogrammed operating systems [xia11, xia13, xia18].
I have found that students today often have trouble understanding the principle of locality. Many are like McMenamin before his study: it seems counterintuitive that programs have such pronounced locality behavior, or that tracking locality optimizes performance. This misunderstanding may be rooted in their own experience of programming: they were not concerned about memory constraints and did not consciously design their programs to have long phases of stable references to objects. Yet the experimental evidence repeatedly shows that when their program is embedded as a module into a larger software system, the whole system displays the phase-transition behavior of locality.
This misunderstanding is supported by the limited definitions of locality in operating systems textbooks. The books usually define locality as a combination of temporal and spatial closeness of references. These definitions ignore the empirical fact of abrupt changes at phase transitions. Notice that random assignment of data to pages might remove spatial locality but it will not remove temporal locality. The same phases and transitions will be observed on the page reference map.
Let us illustrate with Figure 1 how we can measure some properties of locality behavior from a page reference map. The figure shows five locality sets and their associated phases for a sampling window T of approximately 380K time units. A measurement of the graph yields Table 1, where size is the number of pages in the locality set, length is the number of sample intervals in a phase, and fraction is the percentage of the full address space covered by the locality set. Cache policies that detect locality sets in Figure 1 would need only memory sufficient to hold 15-33% of the address space to achieve 100% of the performance.
Table 1
set   size (pages)   length (sample intervals)   fraction (of address space)
1     50             180                         25%
2     65             220                         33%
3     30             220                         15%
4     55             50                          28%
5     35             180                         18%
Notice that the punctuated-stability idea of locality easily translates into measurements of locality set sizes, phase lengths, and transition probabilities on page reference maps, whereas the vague terms temporal and spatial locality do not suggest measurement protocols.
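To make the measurement protocol concrete, here is a minimal Python sketch of how locality sets and phases could be extracted from an address trace. It follows the definitions used for page reference maps (the locality set is the set of pages used in one sampling interval of T references; a phase is a run of intervals with an unchanged locality set). The function names and the short trace are hypothetical, chosen only for illustration; a measurement on real traces would probably need a tolerance for nearly equal sets rather than exact equality.

```python
def locality_sets(trace, T):
    """Split a page-reference trace into sample intervals of T references
    and return the set of pages used in each interval."""
    return [set(trace[i:i + T]) for i in range(0, len(trace), T)]


def phases(sets):
    """Group consecutive intervals that have an identical locality set.

    Returns a list of (locality_set, length_in_intervals) pairs."""
    result = []
    for s in sets:
        if result and result[-1][0] == s:
            result[-1] = (s, result[-1][1] + 1)   # same set: extend the phase
        else:
            result.append((s, 1))                 # set changed: a transition
    return result


# Hypothetical trace: a loop over pages 1-3, then an abrupt shift to pages 7-8.
trace = [1, 2, 3] * 4 + [7, 8] * 6
for locality, length in phases(locality_sets(trace, T=6)):
    print(sorted(locality), "size", len(locality), "length", length)
```

Each printed row corresponds to one row of a table like Table 1: the size of a locality set and the length of its phase in sample intervals.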
Page reference maps, which were a useful tool in the early studies of virtual memory performance, continue to be useful today in studies of cache performance. They are striking evidence of locality and visually convey considerable insight into the dynamics of executing programs.
Like other scientific discoveries, the Locality Principle began as a hypothesis and was accepted as scientific fact only after many validations. The original notion of “slow drift” temporal locality gave way to a more sophisticated notion of “punctuated stability” characterized by phases and transitions. Working set is a real-time detector of this behavior. It allows a virtual memory manager to dynamically adjust memory allocations and maximize system throughput.
Memory Space-Time Law
Operating system designers have always been concerned with performance. They would like to state performance guarantees for throughput and response time that will hold over a wide range of workloads and numbers of simultaneous users. One of the most challenging questions was how to establish a connection between memory management and the key performance measure of throughput. When can we say that optimizing memory management optimizes system throughput?
When jobs use less space, we can get more of them into memory and increase throughput because of the parallelism. And when they complete in less time, we can process more of them over time. This is why many of us have an intuition that the smaller the space-time footprint of programs, the larger is the system throughput.
The space-time footprint is the number of page-seconds of memory usage of an executing job. A page-second is a unit of rent for using memory. It is analogous to the idea of charging for office space by the number of square-feet rented, or charging labor on a project by person-hours. When a process loads 1 page into main memory for S seconds, the process adds S page-seconds to its memory bill. Loading S pages for 1 second also adds S page-seconds to the bill.
Jeff Buzen in 1976 discovered the “memory space-time law” (MST Law), an exact formula that links space-time and system throughput [buz76]. It says “average total memory used by all jobs = system throughput × average space-time footprint per job”. In symbols,
𝑀 = 𝑋𝑌
The proof is simple. Measure the system in an interval of length T. Let Z denote the total space-time used by all the jobs in that interval. (For fixed memory, Z = MT.) The throughput is the number of jobs completed in that interval (C) divided by the length of the interval: X = C/T. The mean space-time footprint of a job is Y = Z/C. The product of X and Y is XY = (C/T)(Z/C) = Z/T = M.⁷
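As a quick numerical check of the law, the following Python sketch plugs made-up measurements into the definitions from the proof. The numbers (an interval of 100 seconds, 25 completed jobs, 50,000 page-seconds of total space-time) are illustrative assumptions, not data from any real system, and the function name is hypothetical.

```python
def mst_quantities(T, C, Z):
    """Return (X, Y, M) from measurements over one observation interval.

    T: length of the interval (seconds)
    C: number of jobs completed in the interval
    Z: total space-time used by all jobs in the interval (page-seconds)
    """
    X = C / T   # throughput, jobs per second
    Y = Z / C   # mean space-time footprint per job, page-seconds
    M = Z / T   # mean memory in use, pages
    return X, Y, M


# Illustrative (assumed) measurements.
X, Y, M = mst_quantities(T=100.0, C=25, Z=50_000.0)
print("X =", X, "Y =", Y, "M =", M, "X*Y =", X * Y)
```

Whatever values are chosen for T, C, and Z, the product X*Y reproduces M, because C cancels in (C/T)(Z/C).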
The MST Law is general and applies to any system or network that has memory usage and throughput. Some people find it hard to believe that the relation between memory usage and throughput is this simple. It really is.
The obvious conclusion follows from X = M/Y: for a given amount of memory M, a policy that minimizes space-time will maximize throughput. To make it easy to tell when this is happening, we will, in the analytics to follow, use space-time to measure the memory usage of working sets and other memory policies. If needed, we can easily convert space-time measures to time averages by dividing space-time by the length of time of the measurement.
There is a complication. The space-time measures of memory policies are defined in virtual time. The space-time needed for the MST Law is defined in real time. We need a way to convert virtual space-time to real space-time.
The most accurate way to do this is with the help of a queueing network model [den78b]. The model would account for all delays beyond virtual time, such as input-output and page transfers. Setting up such a model is beyond the scope of this tutorial.
However, a good approximation can be made simply by augmenting virtual space-time with the space-time accumulated while servicing page faults. To do this, we require these quantities:
D = page fault delay, typically 10^6 memory accesses
N = length of virtual time a process is executed
S = virtual space-time accumulated by the process during execution
C = number of page faults accumulated during execution
m = miss rate, C/N
The space-time cost of one page fault is the mean space S/N times the delay D. For all page faults it is (S/N)(D)(C) = (S)(D)(C/N) = SDm. Therefore the total real space-time is estimated as S + SDm, or in the notation of the MST Law,
𝑌 = 𝑆(1 + 𝐷𝑚)
Thus the real space-time is approximately the virtual space-time dilated by the factor 1 + Dm. For example, if the miss rate is 10^-4 (1 fault in 10^4 memory accesses) and D = 10^6, the dilation factor is 101.
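The approximation can be sketched in a few lines of Python. The function below uses the quantities S, N, C, and D defined above and computes the estimate both ways, as S plus the space-time accumulated during page faults and as S(1 + Dm). The function name and the sample values (a mean of 50 resident pages over 10^7 references with 1,000 faults) are assumptions made only for this illustration.

```python
def real_space_time(S, N, C, D):
    """Estimate real space-time from virtual space-time.

    S: virtual space-time (page-references)
    N: virtual time executed (references)
    C: number of page faults
    D: page fault delay (in memory-access times)
    Returns the estimate computed two equivalent ways.
    """
    m = C / N                          # miss rate
    fault_space_time = (S / N) * D * C # mean space held during every fault delay
    return S + fault_space_time, S * (1 + D * m)


# Assumed example: mean of 50 resident pages, miss rate 10^-4, D = 10^6.
N = 10**7
S = 50 * N
additive, dilated = real_space_time(S, N, C=1000, D=10**6)
print("dilation factor is about", dilated / S)
```

With these values the dilation factor comes out at about 101, matching the example in the text.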
Policies such as LRU or FIFO work with a fixed number k of pages; their total virtual space-time is kN. Thus, for a given k, Y is smaller only when m is smaller, and the policy with lowest miss rate will have highest system throughput. But there is more to the story. Once the miss rate as a function of k is known, there is a value of k that minimizes the expression for Y [den80]. Even for the optimal policy, there is a best memory size that minimizes the space-time.
⁷ The memory space-time law can be seen as an instance of Little’s law. Little’s law says that the mean number in system is the product of the throughput and the system response time. We can interpret M as the mean number of pages in the system; X as the throughput; and Y as the aggregate holding time accumulated by a job for all its pages.
When we discuss the working set policy, which is a variable-space policy, we will see that minimizing virtual space-time maximizes system throughput.
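The existence of a best memory size for a fixed-space policy can be illustrated with a small Python sketch. It evaluates Y = kN(1 + D m(k)) over a table of miss rates and picks the k with the smallest Y. The miss-rate table is entirely hypothetical (it is not measured from any program), as are the function names; the point is only to show how growing k trades a larger space factor against a smaller dilation factor.

```python
def fixed_space_real_space_time(k, N, D, m):
    """Real space-time Y = k*N*(1 + D*m) for a fixed allocation of k pages."""
    return k * N * (1 + D * m)


def best_memory_size(miss_rate, N, D):
    """Return the k in the miss_rate table that minimizes Y."""
    return min(miss_rate,
               key=lambda k: fixed_space_real_space_time(k, N, D, miss_rate[k]))


# Hypothetical miss-rate function m(k): steep drop, then diminishing returns.
miss_rate = {10: 1e-2, 20: 1e-3, 30: 1e-5, 40: 8e-6, 50: 7e-6}
best_k = best_memory_size(miss_rate, N=10**7, D=10**6)
print("space-time is minimized at k =", best_k)
```

In this made-up table the minimum falls at k = 30: below it the dilation factor 1 + Dm dominates, and above it the extra pages cost more space than the few faults they save.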
Two Contrasting Views of Memory Management
The familiar way of looking at the memory sees a single CPU (representing a job in execution) accessing pages in a fixed-size memory. No other CPU or memory region is visible. This is called the Fixed Memory View (FMV). (See Figure 2.) In this view, the job gets a fixed space in main memory and is unaffected by the presence of other jobs in the memory. The performance of a replacement policy is then completely determined by its total fault count function.
Figure 2. The Fixed Memory View (FMV) sees the system as a single CPU accessing a single RAM (main memory) of fixed size k pages. Address mapping hits are translated directly to RAM addresses. Address mapping misses trigger page faults that cause up and down page moves between RAM and DISK (secondary memory). The objective is to efficiently compute the fault function F(k), which counts the number of page faults when memory size is k pages, and then select replacement policies that minimize F(k). In a system with multiprogramming, the RAM visible in this view is a fixed region of the full RAM.
Extensive experimental studies and experience with operating systems led designers to favor two basic replacement policies. FIFO (first in first out) treats the memory space of k pages as a FIFO queue and ignores page usage. Its primary attraction is simplicity and almost negligible overhead. LRU (least recently used) treats the memory as a container of the k most recently used pages; at a page fault, it replaces the page in memory that has not been used for the longest time. Although LRU generates fewer page faults than FIFO, LRU has a high implementation overhead. Many analysts believed that the savings of LRU are cancelled by its overhead.
An elegant compromise between FIFO and LRU is called CLOCK. It treats the FIFO queue as a circular list of size k with a scanning pointer; the page names are analogous to the numerals on the rim of a clock and the pointer to the clock’s hand. At a page fault, the operating system moves the hand along the list, skipping over those pages with use bits on (and resetting them). When it finds a page with use bit off, it selects that page for replacement. CLOCK has overhead comparable to FIFO and performance comparable to LRU. CLOCK is commonly used in operating systems. (In early virtual memory systems, CLOCK was called FINUFO, for first-in-not-used-first-out [den68a].)
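The CLOCK scan described above can be sketched as a short Python simulation that counts page faults for a trace in a k-frame memory. This is an illustrative sketch, not code from any operating system. Two details are assumptions on my part, since variants differ: a newly loaded page has its use bit set to 1, and the hand advances one position past the frame it just filled. The function name and the sample trace are hypothetical.

```python
def clock_faults(trace, k):
    """Count page faults of the CLOCK policy on a trace with k page frames."""
    frames = [None] * k   # page held by each frame (None = empty)
    use = [0] * k         # use bit of each frame
    where = {}            # page -> index of the frame holding it
    hand = 0              # position of the clock hand
    faults = 0
    for page in trace:
        if page in where:             # hit: the hardware sets the use bit
            use[where[page]] = 1
            continue
        faults += 1
        # Move the hand past frames whose use bit is on, resetting the bits.
        while frames[hand] is not None and use[hand]:
            use[hand] = 0
            hand = (hand + 1) % k
        victim = frames[hand]         # empty frame or page with use bit off
        if victim is not None:
            del where[victim]
        frames[hand] = page
        use[hand] = 1                 # assumption: new page counts as used
        where[page] = hand
        hand = (hand + 1) % k
    return faults


print(clock_faults([1, 2, 3, 4, 3, 5, 3], k=3))
```

The scan always terminates because each frame it passes has its use bit reset, so after at most one full revolution the hand finds a bit that is off.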
In 1970, Richard Mattson and his IBM colleagues discovered a highly efficient way to represent a large class of paging algorithms and compute their fault functions. They called their theory “stack algorithms” [mat70]. A stack algorithm is a paging policy whose memory contents can be represented with a single list of all the job’s pages called the stack, such that the contents of k-page memory are the first k elements of the stack. LRU’s stack lists all the job’s pages from most to least recently used; the contents of the k-page LRU memory are the first k pages in the stack. At each page reference, LRU’s stack is updated by moving the referenced page to the top and pushing the intervening pages down one position. For a general stack algorithm, much the same happens: the referenced page moves to the top and the intervening pages are rearranged downward according to their relative priorities assigned by the paging policy. The position in the stack of the page about to be referenced is called its stack distance. The fault function F(k) can be computed simply as the number of stack distances larger than k. This elegant theory is frequently discussed in operating systems textbooks.
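For LRU, the stack computation can be sketched in Python as follows. One pass over the trace records the stack distance of every reference, and F(k) for every k is then read off from those distances. This is a deliberately simple list-based sketch (each update scans the stack, so it is not the most efficient formulation); treating a first reference as an infinite stack distance is a convention I am assuming so that it counts as a fault for every k. Function names and the trace are hypothetical.

```python
def lru_stack_distances(trace):
    """Return the LRU stack distance of each reference in the trace."""
    stack = []        # all pages seen so far, most recently used first
    distances = []
    for page in trace:
        if page in stack:
            d = stack.index(page) + 1   # 1-based position in the stack
            stack.remove(page)
        else:
            d = float("inf")            # first reference: a fault for every k
        distances.append(d)
        stack.insert(0, page)           # referenced page moves to the top
    return distances


def fault_function(distances, max_k):
    """F(k) = number of references whose stack distance is larger than k."""
    return {k: sum(1 for d in distances if d > k) for k in range(1, max_k + 1)}


trace = [1, 2, 1, 3, 2, 1]
print(fault_function(lru_stack_distances(trace), max_k=3))
```

A single pass therefore yields the fault counts for all memory sizes at once, instead of one simulation per value of k.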
The stack theory did not answer two important questions: What is the optimal amount of memory to allocate? How is system throughput related to the fault function F(k)? These questions were answered by other modeling techniques; details are in [den80].
Unfortunately, the FMV and its analytic theory are not very helpful for real operating systems, which allow multiple jobs to reside in RAM at the same time. The FMV gives no insight into important questions including:
• How to partition the RAM among the jobs?
• How much space to give each job?
• How to manage variations in the space allocations?
• How to handle page replacement to maximize system throughput?
• How to manage interactions among jobs such as a page fault in one stealing a page from another?
• How to prevent thrashing, a collapse of system throughput when too many jobs are loaded into RAM?
We obviously need a different way of thinking about the common situation of multiprogramming. It is called the shared memory view (SMV). (See Figure 3.)
Figure 3. The shared memory view (SMV) sees the system as a set of CPUs (one for each job) sharing the M-page RAM. The memory is partitioned among N jobs, each getting its own set of page frames disjoint from all the others. A page fault may trigger the page replacement algorithm to steal a page from another job, thereby increasing the space occupied by the faulter and decreasing the space occupied by the victim.
Because of all the factors involved, the optimal management of shared memory cannot be inferred from fixed memory. Obvious extensions of the fixed memory view lead to poor performance and instability. For example, many operating systems simply extended the fixed memory view to include all of RAM. The resulting global LRU would replace the oldest unused page in RAM regardless of which job it belongs to. Unfortunately, the global list of pages does not reflect actual recency of use within jobs. It is ordered mainly by the round-robin scheduler of the ready list: the pages atop the global LRU stack belong to the job that most recently received a time slice. If paging activity is high, by the time a job cycles back to the front of the ready list some of its pages have been removed. The cascading effect causes high paging in every job, pushing the whole system into a state of “paging to death”, in which every job spends most of its time queued at the DISK and CPU throughput collapses. This condition is called thrashing. (See Figure 4.)
Figure 4. Thrashing is the collapse of system throughput when too many jobs are loaded into RAM at once. It is a chaotic condition whose onset cannot be predicted accurately – the trigger threshold N* is unpredictable and very sensitive. Loading one additional job can trigger thrashing. This can happen in any shared memory, such as cache, not just RAM.
We need a different way of thinking about managing shared memory. The working set model provides the basis. We need to abandon the fixed memory view and instead use the working set interpretation of the shared memory view to see what is going on. The remainder of this paper will approach this in two stages. First, we will define analytic methods for computing working set statistics, such as miss rate, from a given address trace. These methods do not depend on locality or any other assumptions about program behavior. Second, we will show that the working set policy is close to optimal for programs whose address traces conform to the principle of locality. In that case, the working set space-time is very close to the optimal space-time.
Working Sets
Originally the term “working set” was an informal term meaning the smallest set of a job’s pages that needed to be loaded in main memory for efficient execution. The working set model gave a precise definition in terms of the page-reference behavior of an executing job. The executing job itself informs us of what memory it needs, without regard for external factors such as interrupts. Thus, working sets are a measure of the intrinsic, dynamically-varying demand of a computation for memory [den68a, den72].
The formal definition is that the working set at a particular time t is the set of pages referenced in a backward-looking window of size T including time t. It is denoted W(t,T). The size, w(t,T), is the number of distinct pages in W(t,T); the size is always at least 1 and never larger than T. The size may be considerably smaller than T due to repeated references to the same pages within the window. Figure 5 illustrates.
Figure 5. This example shows the working set of a simple job at two distinct times. We assume time is discrete and each clock tick represents a single page reference. The series of numbers above the timeline, called an address trace, is the sequence of page numbers accessed by a job. In this case, there are accesses at times t=1,2,…,15. The window size is T=4. The backward window at t=8 contains 4 references to three distinct pages; its size is 3. The backward window at t=15 contains 4 references to four distinct pages; its size is 4. We imagine the window sliding along the timeline, giving us a dynamically varying series of working sets.
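The window definition can be written down in a few lines. The sketch below is illustrative only: the function name `working_set` and the 15-reference trace are assumptions. The trace was chosen to agree with what the caption of Figure 5 states (sizes 3 and 4 at t=8 and t=15 with T=4); the figure itself may show a different sequence.

```python
def working_set(trace, t, T):
    """Return W(t, T): the distinct pages referenced at times t-T+1 .. t (times are 1-indexed)."""
    start = max(1, t - T + 1)        # a window reaching before t=1 is truncated
    return set(trace[start - 1:t])   # convert 1-indexed times to a 0-indexed slice


# Assumed example trace: N = 15 references to M = 5 pages.
TRACE = [1, 2, 3, 4, 1, 2, 4, 4, 1, 2, 1, 5, 4, 3, 1]

# w(t, 4) for every t as the window slides along the timeline.
sizes = [len(working_set(TRACE, t, 4)) for t in range(1, len(TRACE) + 1)]
print(sorted(working_set(TRACE, 8, 4)), sizes)
```

Sliding the window one tick at a time, as in the last lines, gives the dynamically varying series of working set sizes.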
The working set window can be thought of as a lease. A lease is a guarantee to hold a page in RAM for a minimum period of time T. A page’s lease can be stored in a timer register associated with a page frame. When a page is loaded into RAM or reused while in RAM, its lease is reset to T. It ticks down to 0 after T time units of non-use. When the lease runs out, the page is evicted from the working set. The lease definition is equivalent to the window definition [li19].
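The equivalence of the two definitions can be checked on a small example. The following is a minimal sketch, not the construction of [li19]: the function names and the trace are assumptions, and one lease tick is taken to be one reference of virtual time.

```python
def window_sets(trace, T):
    """Working sets from the backward-window definition, one per reference."""
    return [set(trace[max(0, t - T):t]) for t in range(1, len(trace) + 1)]


def lease_sets(trace, T):
    """Working sets from per-page lease timers, one per reference."""
    lease = {}                       # page -> remaining lease, in time units
    history = []
    for page in trace:
        for p in list(lease):        # one tick of virtual time passes
            lease[p] -= 1
            if lease[p] == 0:        # lease ran out: evict the page
                del lease[p]
        lease[page] = T              # loading or reusing a page resets its lease to T
        history.append(set(lease))
    return history


# Assumed example trace (hypothetical): 15 references to 5 pages.
TRACE = [1, 2, 3, 4, 1, 2, 4, 4, 1, 2, 1, 5, 4, 3, 1]
same = all(lease_sets(TRACE, T) == window_sets(TRACE, T) for T in range(1, 16))
```

On this trace the two simulations agree for every window size, which is what the equivalence predicts.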
The Working Set Policy
Working set memory management partitions the main memory into dynamically changing regions, one for the working set of each job using the memory. The basic idea is to maintain each job’s working set and not allow any other process to steal pages from it [den68a]. (See Figure 6.) This is implemented by maintaining a free space in memory, usually smaller than any of the working sets. When a job references a page not in its working set (a page miss or fault), the operating system transfers a page from free space to the job, increasing its working set size by one. In parallel, when a page times out from its job’s window T, the operating system transfers it back to the free space, decreasing the working set by one. In this regime page faults and evictions need not coincide as they do when memory allocation is fixed.
Figure 6. The box represents memory, a RAM or cache, in use by N executing jobs. Each job’s working set is loaded in memory. The memory not occupied by working sets is called FREE. When a job encounters a page miss, the memory slot to hold the new page is transferred from FREE to the job’s working set, thus enlarging that working set by one page. (If FREE is empty, the incoming page replaces the LRU page of the working set.) When a working set page lease T times out, the page is evicted from the working set and the empty memory slot returned to FREE, thus decreasing that working set by one page. When a job quits, all its pages are erased and their memory slots returned to FREE. A scheduler maintains a queue of jobs wanting to use memory, admitting the next one only if FREE is large enough to hold its working set. This policy prevents any job from stealing a page from another during a page fault, thereby protecting the system from thrashing.
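The bookkeeping described in Figure 6 can be sketched as a toy model. Everything named here (`WorkingSetMemory`, `admit`, `estimated_ws_size`, the trace) is hypothetical and chosen for illustration; a real operating system would track frames rather than counts and would estimate a job's working set size from its history.

```python
class WorkingSetMemory:
    """Toy model of a memory shared by jobs under the working set policy."""

    def __init__(self, frames, T):
        self.free = frames           # size of the FREE pool, in page frames
        self.T = T                   # window (lease) length in virtual time
        self.last_use = {}           # job -> {page: job's virtual time of last use}
        self.clock = {}              # job -> job's virtual time

    def admit(self, job, estimated_ws_size):
        """Load a job only if FREE can hold its (estimated) working set."""
        if estimated_ws_size > self.free:
            return False
        self.last_use[job], self.clock[job] = {}, 0
        return True

    def reference(self, job, page):
        """Process one reference by a job; return True if it was a miss."""
        self.clock[job] += 1
        now, pages = self.clock[job], self.last_use[job]
        miss = page not in pages
        if miss:
            if self.free > 0:
                self.free -= 1                        # take a frame from FREE
            elif pages:
                del pages[min(pages, key=pages.get)]  # FREE empty: replace the job's own LRU page
            else:
                raise MemoryError("no frame available for this job")
        pages[page] = now
        for p in [p for p, used in pages.items() if now - used >= self.T]:
            del pages[p]                              # lease expired: frame returns to FREE
            self.free += 1
        return miss

    def quit(self, job):
        self.free += len(self.last_use.pop(job))      # all frames go back to FREE
        del self.clock[job]


# Assumed example trace (hypothetical): 15 references to 5 pages.
TRACE = [1, 2, 3, 4, 1, 2, 4, 4, 1, 2, 1, 5, 4, 3, 1]
mem = WorkingSetMemory(frames=8, T=4)
admitted = mem.admit("A", estimated_ws_size=5)
misses = sum(mem.reference("A", p) for p in TRACE)
```

Note that `reference` never touches another job's pages: a fault is served from FREE or, failing that, from the faulting job's own working set.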
An important question is how to choose the window size T. When we return to this question later, we will see that we can select a single global value of the window T for which working set management delivers near-optimal system throughput.
Moreover, because the working set policy prevents processes from stealing pages from each other, it is immune to thrashing.
These aspects – simplicity of working set memory management, optimality of throughput, and resistance to thrashing – make working set memory management the ideal of operating systems and caches [den80].
Working set analytics, discussed next, shows us how to efficiently measure working set statistics to determine the memory capacity of a system and the throughput likely to be observed under a working set policy. All the data needed for the working set statistics can be measured in a single pass of an address trace and captured as a histogram of reuse intervals. Each statistic is a simple linear computation from those data. The analytics do not depend on any locality or stochastic assumptions about how jobs refer to their pages.
Traces, Reuses, Cold and Warm Starts
In this section we will define the basic terminology and notation used throughout working set analytics.
All the statistics are measured in discrete virtual time, which is the time of program execution, one tick per memory access. Delays for input, output, or time-slicing are ignored. Measuring in virtual time allows us to see the inherent memory demand of a program without distortion by random interrupts. When needed, we can convert these measures back into real time by inserting delays when interrupts occur.
An address trace is a recording of the sequence of page numbers referenced (accessed) by a job at times t=1,2,…,N; r(t)=i means that the process accessed (used) page i at time t. OS performance analysts use address traces as inputs to simulators of memory management policies in the OS or the cache. Traces can also be used to build page reference maps such as Figure 1. The intervals between successive references to a page are called reuse intervals. Figure 7 illustrates.
Figure 7. The address trace of Figure 5 is shown again in the top row. It spans 15 time units (N=15) and 5 pages (M=5). The reuse intervals appear just below each reference; a reuse interval is the time since the prior reference to the same page. The mark “x” indicates first references, which have no prior use. Reuse intervals are important because they indicate whether a page is in the working set. For example, at time t=6, page 2 is reused with interval 4; a working set with window size 3 or less will generate a page miss at that time.
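Reuse intervals are easy to extract by remembering the time of the last reference to each page. The sketch below is an illustration under assumptions: the function name is made up, and the trace was chosen to agree with the facts quoted in the captions of Figures 5 and 7 (N=15, M=5, page 2 reused at t=6 with interval 4), which does not guarantee that it is the trace drawn in the figures.

```python
def reuse_intervals(trace):
    """Return one entry per reference: the reuse interval, or None for a first reference."""
    last = {}                                   # page -> time of its most recent reference
    intervals = []
    for t, page in enumerate(trace, start=1):   # times are 1-indexed
        intervals.append(t - last[page] if page in last else None)
        last[page] = t
    return intervals


# Assumed example trace (hypothetical): 15 references to 5 pages.
TRACE = [1, 2, 3, 4, 1, 2, 4, 4, 1, 2, 1, 5, 4, 3, 1]
intervals = reuse_intervals(TRACE)

# A reference at time t misses under window T when its interval is None (cold start) or > T.
misses_T3 = [t for t, k in enumerate(intervals, start=1) if k is None or k > 3]
```

With T=3 the reference at t=6 appears in `misses_T3`, matching the example in the caption.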
The miss count, mc(T), is the number of misses – references not in the working set. Misses cause page faults. We occasionally speak of the “miss rate”, which is simply the miss count divided by N. It is sometimes suggested that, to prevent the large speed gap of $10^{6}$ between main and secondary memory from ruining performance, we should choose T large enough, if possible, to keep miss rates no higher than $10^{-6}$. As we will see, however, this is not the best way of choosing T.
The first references to pages are different from the others because they have no reuse intervals. Whether they cause initial page faults depends on how the memory is initialized. There are two possible initializations. The cold start (or normal) initialization is simply an empty memory; the first references cause page faults. The warm start initialization contains all M pages; the first references cause no page faults. The working set contents throughout the address trace are the same for cold and warm start. The difference is that with cold start every first reference is a miss; with warm start every first reference is a hit. The miss count mc(T) is used for cold start and a new count mw(T) is used for warm start; the difference is
$$ \mathit{mc}(T) - \mathit{mw}(T) = M $$
The distinction between cold and warm starts is used in operating systems for the general notion of whether resumption of a suspended job occurs with empty memory or with the previous contents of memory. Cold starts are slow because all the data must be reloaded from the disks or other sources; warm starts are fast but require a lot of initial memory. The address trace model implicitly assumes warm re-starts of processes suspended by interrupts – the memory contents at time t+1 are just what they were after time t, whether or not an interruption occurred between t and t+1.
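The relation between the two miss counts can be checked by counting directly. This is a minimal sketch with an assumed function name and an assumed 15-reference, 5-page trace; it simply classifies each reference as a first reference, a reuse miss, or a hit.

```python
def miss_counts(trace, T):
    """Return (mc, mw): cold-start and warm-start miss counts for window T."""
    last = {}                                   # page -> time of its most recent reference
    first_refs = reuse_misses = 0
    for t, page in enumerate(trace, start=1):
        if page not in last:
            first_refs += 1                     # a miss under cold start, a hit under warm start
        elif t - last[page] > T:
            reuse_misses += 1                   # reuse interval longer than the window
        last[page] = t
    return first_refs + reuse_misses, reuse_misses


# Assumed example trace (hypothetical): N = 15, M = 5.
TRACE = [1, 2, 3, 4, 1, 2, 4, 4, 1, 2, 1, 5, 4, 3, 1]
mc4, mw4 = miss_counts(TRACE, 4)
gaps = {miss_counts(TRACE, T)[0] - miss_counts(TRACE, T)[1] for T in range(0, 16)}
```

For every window size the gap between the two counts is the same constant M, the number of distinct pages.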
Some analysts speculated that virtual memory performance would improve if the OS achieved warm start by preloading pages. Our formulas, however, show that in the long run there is very little difference between initial cold and warm starts of address traces.
Mean Working Set Size
The definition of mean working set size over an address trace is
$$ s(T) = \frac{1}{N}\sum_{t=1}^{N} w(t,T) = \frac{\mathit{st}(T)}{N} $$
where st(T) is the space-time occupied by working sets in the trace; one unit of space-time means that one page (space) was in memory for one time unit (time). We see that [den68a, den72, den78a]
• When t<T, the T-window extends backward before the start of the trace; only the portion of the window contained in the trace counts toward st(T). Thus, for 1≤t<T, the working set size can be written w(t,t).
• For t≥T, there are N-T+1 working sets with window T for the remainder of the address trace.
In some of the analytics discussed below, this distinction is important, and we will separate the original sum into two parts for t<T and t≥T. We will work with four space-time and counting measures:
st(T) = space-time accumulated by working sets of window T
mc(T) = cold-start miss count, the number of misses with window T
mw(T) = warm-start miss count, same as mc(T) excluding M first references
mwh(T) = warm-start-hot-finish miss count (defined below)
Each measure can be converted to a time average by dividing by N, the trace length. For example, the classical mean working set size and miss rate are:
$$ s(T) = \frac{\mathit{st}(T)}{N} \qquad m(T) = \frac{\mathit{mc}(T)}{N} $$
Column Sums, Row Sums, and Runs
We can depict dynamic memory use with a bit-matrix with one column for each reference for t=1,…,N and one row for each page i=1,…,M. Position (i,t) is 1 if page i is in working set W(t,T) and 0 if i is not in W(t,T). Let us call this matrix the working set residency map. In the map, the space-time occupied by working sets, st(T), is the total number of positions containing 1. Figure 8 illustrates.
Figure 8. The address trace given earlier is shown across the top, labeling the columns, and the pages along the side, labeling the rows. Each map position (i,t) is marked with 1 if the page i is in the working set (T=4) at that time t, or 0 otherwise. The columns represent the different working sets and the number of 1s in a column is the working set size. The column at t=0 indicates that the working set is initially empty (cold start). (Warm start would be represented by a column of 1s.) Note that a miss occurs whenever there is a 1 in a row immediately preceded by a 0. Note also that this is a miniature page reference map with sampling interval 1.
Let col(t) be the number of 1’s in column t of the matrix; notice col(t) is the working set size w(t,T).
Let row(i) be the number of 1’s in row i.
A run is a series of consecutive 1s starting with a miss and ending with either a 0 or end of trace. Figure 9 illustrates. The final run may terminate at time N with no more 0’s.
Figure 9. A run is a time interval during which a particular page is continuously in the working set. In this illustration, each vertical mark is a reference to the particular page and the distances between vertical marks are reuse intervals. A run begins with a miss, contains zero or more reuse intervals ≤T, and ends in a reuse interval >T. In the reuse interval >T, the page remains in the working set for T-1 time units beyond the previous reference – the overhang. A run can end with a smaller overhang after the last reference to the page; a final interval of length k≤T houses an overhang of k-1, not T-1. Each new miss starts a new run.
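The residency map, its column and row sums, and the run count can all be produced by brute force for a small trace. The sketch below is illustrative: the function names are assumptions, and the trace is an assumed 15-reference, 5-page example that need not be the one drawn in Figure 8.

```python
def residency_map(trace, T):
    """Return {page: [bit for t = 1..N]}, where bit is 1 iff the page is in W(t, T)."""
    N = len(trace)
    return {
        i: [1 if i in trace[max(0, t - T):t] else 0 for t in range(1, N + 1)]
        for i in sorted(set(trace))
    }


def count_runs(bits):
    """A run starts at each 1 preceded by a 0 (or by the empty cold-start column)."""
    return sum(1 for t, bit in enumerate(bits) if bit and (t == 0 or bits[t - 1] == 0))


# Assumed example trace (hypothetical): N = 15, M = 5.
TRACE = [1, 2, 3, 4, 1, 2, 4, 4, 1, 2, 1, 5, 4, 3, 1]
rmap = residency_map(TRACE, 4)
col = [sum(bits[t] for bits in rmap.values()) for t in range(len(TRACE))]   # col(t) = w(t, T)
row = {i: sum(bits) for i, bits in rmap.items()}                            # row(i)
runs = sum(count_runs(bits) for bits in rmap.values())
```

Summing the map by columns or by rows gives the same total space-time, and the number of runs equals the cold-start miss count.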
Little’s Law for Working Sets
Little’s law is very useful in queueing analyses because it gives a relation between three mean values: the mean number in a system is the product of the mean holding time in the system and the throughput of the system. For a working set considered as an evolving system containing pages, the number is the mean working set size, the holding time is the mean length of a run, and the throughput is the miss rate:
$$ s(T) = R(T)\,m(T) $$
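This law is easy to verify from our definitions. Because every run begins with a miss, the total number of runs is mc(T). The mean run length is
$$ R(T) = \frac{1}{\mathit{mc}(T)}\sum_{i=1}^{M} \mathrm{row}(i) $$
Then:
$$ s(T) = \frac{1}{N}\sum_{t=1}^{N} \mathrm{col}(t) = \frac{1}{N}\sum_{i=1}^{M} \mathrm{row}(i) = \frac{\sum_{i=1}^{M} \mathrm{row}(i)}{\mathit{mc}(T)} \cdot \frac{\mathit{mc}(T)}{N} = R(T)\,m(T) $$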
In the example of Figure 8, the total space-time is 48 and the number of runs is 7; thus s(4)=48/15, m(4)=7/15, and R(4)=48/7.
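Little's law can also be checked from the runs themselves rather than from the map. The sketch below computes each run's length from reference times, using the overhang rule of Figure 9 (T-1 time units after the last reference of the run, truncated at the end of the trace). The function name and the trace are assumptions made for illustration.

```python
from fractions import Fraction


def run_lengths(trace, T):
    """Lengths of all runs: each starts at a miss and extends T-1 units past its last reference."""
    N = len(trace)
    times = {}
    for t, page in enumerate(trace, start=1):
        times.setdefault(page, []).append(t)
    lengths = []
    for refs in times.values():
        start = prev = refs[0]
        for t in refs[1:]:
            if t - prev > T:                                   # reuse interval > T: run ended
                lengths.append(min(prev + T - 1, N) - start + 1)
                start = t                                      # this reference is a new miss
            prev = t
        lengths.append(min(prev + T - 1, N) - start + 1)       # final run of this page
    return lengths


# Assumed example trace (hypothetical): N = 15, M = 5.
TRACE = [1, 2, 3, 4, 1, 2, 4, 4, 1, 2, 1, 5, 4, 3, 1]
N, T = len(TRACE), 4
lengths = run_lengths(TRACE, T)
s = Fraction(sum(lengths), N)               # mean working set size
m = Fraction(len(lengths), N)               # miss rate: one miss per run
R = Fraction(sum(lengths), len(lengths))    # mean run length
```

Exact fractions are used so that the identity s = R·m can be compared without rounding error.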
Little’s law also quantifies the amount of overhang (see Figure 9). For a very long address trace, every run terminates with a reuse interval >T with overhang T-1. Take the “system” to be a delay line of T-1 time units representing the overhang. The throughput is m(T). The response (holding) time of a page in overhang is T-1. Therefore the mean number of pages in overhang is the product (T-1)m(T). (This argument is not exact because some overhangs at the end of the trace are shorter than T-1; we will show shortly how to correct for this.) The mean overhang is useful to assess how far WS is from optimal behavior: if we could eliminate all overhangs, the working set would be optimal. We will prove this shortly.
Working Set Recursions
In working set analytics, recursions are simple formulas that relate a measure at window size T-1 to the measure at window size T. We will derive recursions for miss count and working set space-time. These recursions enable us to calculate miss rate and working set size iteratively in linear time – that is, O(N) for N the address trace length.
The algorithm for miss rate is much faster than its counterpart in the Fixed Memory View. The best algorithm in the Fixed Memory View simulates the stack in order to get the stack distances, requiring time O(MN). This difference can be significant. For example, a 32-bit address space ($2^{32}$) composed of 4096-byte pages ($2^{12}$) has M=$2^{20}$ (about $10^{6}$) distinct pages; computing fixed-memory fault count would therefore take a million times longer than the working set miss count.
In a single pass of the trace, we can accumulate a histogram of reuse counts,
$$ c(k) = \text{number of reuse intervals of length } k $$
for k=1,…,N. We will also include one additional counter c(x) that counts the number of first references. These counters can all be updated in linear time.⁸
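By definition, there is no reuse interval preceding the first reference to a page. Any later reference is a miss if its reuse interval length is greater than T. In other words, for warm start,
$$ \mathit{mw}(T) = \sum_{k>T} c(k) $$
and thus for cold start
$$ \mathit{mc}(T) = M + \mathit{mw}(T) $$
Raising the window from T to T+1 turns exactly the reuse intervals of length T+1 from misses into hits, so the miss count satisfies the simple recursion
$$ \mathit{mc}(T+1) = \mathit{mc}(T) - c(T+1) $$
with the initial condition mc(0)=N.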
⁸ This can be done by maintaining timestamps, last(i), for the last reference to page i. At a reference to page i at time t, the reuse interval is k=t-last(i). After the last reference, the end interval is k=N+1-last(i). Adding 1 to c(k) for each of these events leaves corrected values in the counters. This is the same procedure as in the 1978 paper with Slutz [den78a].
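The timestamp procedure of the footnote can be sketched as follows. The function name is an assumption, the reuse counts and end-interval counts are kept in two separate histograms for clarity, and the trace is an assumed example chosen so that its counts agree with the c(k) and ec(k) columns of Table 2; it is not necessarily the trace used in the figures.

```python
from collections import Counter


def reuse_histogram(trace):
    """One pass over the trace; returns (c, ec, M)."""
    N = len(trace)
    last = {}                                   # last(i): time of the latest reference to page i
    c = Counter()                               # c[k]: number of reuse intervals of length k
    for t, page in enumerate(trace, start=1):
        if page in last:
            c[t - last[page]] += 1              # reuse interval k = t - last(i)
        last[page] = t
    # End interval of page i is k = N + 1 - last(i), as if every page were reused at time N+1.
    ec = Counter(N + 1 - t_last for t_last in last.values())
    return c, ec, len(last)                     # len(last) = M, the number of first references


# Assumed example trace (hypothetical): N = 15, M = 5.
TRACE = [1, 2, 3, 4, 1, 2, 4, 4, 1, 2, 1, 5, 4, 3, 1]
c, ec, M = reuse_histogram(TRACE)
```

The pass does a constant amount of work per reference, which is where the linear running time comes from.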
Now we turn to a recursion for the working set size. To do this, we exploit an inclusion property: the runs for window T+1 include those for window T.
What happens when we increase T to T+1? At first approximation, every run is extended by just 1 unit: all the “reuse ≤T” intervals are also “reuse ≤T+1” intervals and only the final “reuse >T” has room for expansion. Its expansion is one unit of space-time. Because the total number of runs is mc(T), the total number of 1’s added to the residency map is mc(T). Thus
$$ \mathit{st}(T+1) = \mathit{st}(T) + \mathit{mc}(T) $$
This recursion is exact for infinite address traces [den68a, den72], but contains an error for finite traces [den78a]. The errors occur in the end intervals of pages as follows. There are M end intervals, one for each page. An end interval is the time from the last reference to a page until the end of the trace. Notice that this is the same as if we pretend there is another, phantom reuse of page i at time N+1. Notice also that an end interval of length ≤T will not expand when T increases to T+1. To correct for this, we need to deduct from mc(T) the number of end-intervals of length ≤T. We can define an end factor
$$ e(T) = \text{number of end intervals} \le T $$
and then the accurate recursion is
$$ \mathit{st}(T+1) = \mathit{st}(T) + \mathit{mc}(T) - e(T) $$
We can express this with the means
$$ s(T+1) = s(T) + m(T) - \frac{e(T)}{N} $$
Because e(T)≤M, the end factor vanishes as N becomes large.⁹
We can simplify this by looking closely at the way the end intervals contribute to e(T). There is only one end interval for each page i. Define the end-counter ec(k)=1 if a page has end-interval of length k and 0 otherwise. Then the sum of all the ec(k) is M. The last two terms in the st-equation above can be reduced:
$$ \mathit{mc}(T) - e(T) = M + \sum_{k>T} c(k) - \sum_{1 \le k \le T} \mathit{ec}(k) = M + \sum_{k>T} c(k) - \Big( M - \sum_{k>T} \mathit{ec}(k) \Big) = \sum_{k>T} \big( c(k) + \mathit{ec}(k) \big) \triangleq \mathit{mwh}(T) $$
⁹ It is interesting to note that e(T)=w(N,T), the number of pages in the working set measured at the end of the trace. The reason is that a page is in that working set if and only if its final reference is within the last T window of the trace.
In other words, the term mc(T)-e(T)=mwh(T) is computed by adding the end-correction counts to the reuse counts c(k). We call the miss count with warm start and corrected end intervals the “warm start hot finish” miss count. It is a warm start because initial page faults are not counted. It is “hot finish” because the end corrections are applied at the very end by pretending that all M pages are simultaneously referenced at time N+1. This can be summarized as the working set size recursion:
Working Set Size Recursion
$$ \mathit{st}(T+1) = \mathit{st}(T) + \mathit{mwh}(T) $$
where st(T) is the space-time accumulated by working sets of window T and mwh(T) is the warm start hot finish miss count. The initial conditions are st(0)=0 and mwh(0)=N.
Notice that for very long address traces (large N), the M end corrections are insignificant compared to the total of all the counters. For large N, mwh(T) converges to mc(T).
The recursion we reported in 1972 was mathematically the same, but it depended on the assumption that the reference string is a random (stochastic) process that enters a long-term steady state [den72]. In 1978 we removed the stochastic assumption and found the recursion works for finite address traces if we make end corrections to the counters. The working set recursion stated here has a much simpler derivation.
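The working set recursion can also be run “backwards” to deduce the miss rate given the mean working set size. If we had a direct measurement of the mean working set size, we could find the miss counts by taking the differences st(T+1)-st(T) [xia13]. For a finite trace these differences are exactly mwh(T); for long traces they approximate mc(T).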
For completeness, we summarize the miss rate recursions:
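Summary of Miss Rate Recursions
Miss rate recursions enable the linear-time calculation of miss counts at window size T+1 in terms of those for window size T. For miss counts:
$$ \mathit{mc}(T+1) = \mathit{mc}(T) - c(T+1) $$
where c(k) is the count of reuse intervals of length k, with initial condition mc(0)=N (no reuse interval has length 0, so c(0)=0). For warm-start-hot-finish
$$ \mathit{mwh}(T+1) = \mathit{mwh}(T) - \mathit{cc}(T+1) $$
where cc(k)=c(k)+ec(k) is the count corrected for end intervals. The initial condition is mwh(0)=N, with cc(0)=0. For example, in Table 2, mc(1)=mc(0)-c(1)=15-1=14 and mwh(1)=mwh(0)-cc(1)=15-2=13.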
The miss count measures mwh and mc both contain N counts; mc includes M first references but no end corrections, and mwh includes no first references but has M end corrections. As N gets large, both rates mc(T)/N and mwh(T)/N converge to m(T).
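The recursions can be exercised on the counters of Table 2. The sketch below takes the c(k) and ec(k) columns of the table as given and regenerates the mc, mwh, and st columns; the function name is an assumption, and the counters are indexed so that mc(T) = M + Σ c(k) over k>T, which is the convention the table follows.

```python
C = {1: 1, 2: 1, 3: 1, 4: 5, 5: 1, 11: 1}    # c(k) column of Table 2 (zero entries omitted)
EC = {1: 1, 2: 1, 3: 1, 4: 1, 6: 1}          # ec(k) column of Table 2 (zero entries omitted)
N = 15


def ws_recursions(c, ec, N, T_max):
    """Return lists mc, mwh, st indexed by window size 0..T_max."""
    mc, mwh, st = [N], [N], [0]              # initial conditions at T = 0
    for T in range(T_max):
        st.append(st[T] + mwh[T])                                     # st(T+1) = st(T) + mwh(T)
        mc.append(mc[T] - c.get(T + 1, 0))                            # intervals of length T+1 become hits
        mwh.append(mwh[T] - c.get(T + 1, 0) - ec.get(T + 1, 0))       # same, with end corrections
    return mc, mwh, st


mc, mwh, st = ws_recursions(C, EC, N, T_max=14)
```

Each window size costs a constant amount of work once the histogram is in hand, so the whole table is linear in the number of window sizes examined.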
Example
Figure 10 shows the sequence of reuse intervals and end intervals for the example address trace. We tally these data along with the miss counts and working set space-time in the table below. The row for k=0 gives the initial conditions.
Figure 10. The top row is the address trace from Figure 7. The reuse intervals are shown in the middle row, where each “x” marks a first reference. The end intervals are shown in the bottom row, where each “x” marks a non-final reference.
Table 2
  k   c(k)  ec(k)  mc(k)  mw(k)  mwh(k)  st(k)
  0     0      0     15     10      15      0
  1     1      1     14      9      13     15
  2     1      1     13      8      11     28
  3     1      1     12      7       9     39
  4     5      1      7      2       3     48
  5     1      0      6      1       2     51
  6     0      1      6      1       1     53
  7     0      0      6      1       1     54
  8     0      0      6      1       1     55
  9     0      0      6      1       1     56
 10     0      0      6      1       1     57
 11     1      0      5      0       0     58
 12     0      0      5      0       0     58
 13     0      0      5      0       0     58
 14     0      0      5      0       0     58
  x     5
These data are plotted in Figure 11 and compared with LRU and MIN. The graph shows a region at the smaller memory allocations where LRU is better than WS. A similar pattern is sometimes observed in actual programs [wan15, wir14].
Figure 11. This graph compares three fault curves. A fault curve plots the number of faults (vertical axis) versus memory demand (horizontal axis). The solid curve is the WS example of Figure 8. LRU and MIN are also shown. For a fixed-space policy, the memory demand in space-time is the product of the memory size and address trace length; here the allowable memory sizes 1, 2, 3, 4, 5 for LRU and MIN correspond to space-times 15, 30, 45, 60, 75. In contrast, WS is variable space and is defined at points corresponding to non-integer average memory sizes. For this address trace, WS is mostly better than LRU and mostly worse than MIN, although at the largest window size, WS is slightly better than MIN. This is not an anomaly, but rather the consequence of working sets being variable in size.
Optimal Memory Policy
Now we will examine the optimal policies for managing memory. The optimal policy for Fixed Memory is MIN. It was defined by Les Belady in 1966 [bel66]. MIN’s principle, invoked at a page fault, is “replace the page that will not be reused for the longest time in the future.” This policy is unrealizable because the operating system cannot know the future. However, its fault count can be computed in a single pass of an address trace with about the same overhead as for LRU: order O(NM) [matt70].
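For a small trace, the fixed-memory fault counts of LRU and MIN can be obtained by direct simulation. The sketch below is a straightforward per-memory-size simulation, not the single-pass stack algorithm of [matt70]; the function names and the trace are assumptions made for illustration.

```python
def lru_faults(trace, frames):
    """Fault count of LRU with a fixed number of page frames (cold start)."""
    stack, faults = [], 0                    # most recently used page is last
    for page in trace:
        if page in stack:
            stack.remove(page)
        else:
            faults += 1
            if len(stack) == frames:
                stack.pop(0)                 # evict the least recently used page
        stack.append(page)
    return faults


def min_faults(trace, frames):
    """Fault count of MIN: on a fault, evict the page reused farthest in the future."""
    resident, faults = set(), 0
    for pos, page in enumerate(trace):
        if page in resident:
            continue
        faults += 1
        if len(resident) == frames:
            def next_use(p):
                try:
                    return trace.index(p, pos + 1)
                except ValueError:
                    return float("inf")      # never used again
            resident.remove(max(resident, key=next_use))
        resident.add(page)
    return faults


# Assumed example trace (hypothetical): N = 15, M = 5.
TRACE = [1, 2, 3, 4, 1, 2, 4, 4, 1, 2, 1, 5, 4, 3, 1]
curve = {x: (lru_faults(TRACE, x), min_faults(TRACE, x)) for x in range(1, 6)}
```

Because MIN looks ahead in the recorded trace, it can only be run offline, which is exactly the sense in which it is unrealizable in a live system.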
The optimal policy for Shared Memory, in which the space allocated to a job can vary, is VMIN. It was defined by Barton Prieve and Robert Fabry in 1976 [pre76]. VMIN’s principle, invoked after each reference, is “retain the current reference in memory if its forward reuse interval is ≤T, otherwise delete it immediately.” This policy is also unrealizable. However, its fault function and mean size can be calculated rapidly as for WS: order O(N).
It is interesting that database designers discovered the same optimizing principle in the 1970s [gra85]. Here is their argument. Suppose we want to decide when to keep a page of data in main memory versus the much slower hard disk. Just after the page is used, we look into the future to see exactly when it will be used again. Then we do a simple calculation to compare the rental cost of keeping it in memory until next use with the cost of removing it immediately from memory and paying the swapping cost for its retrieval later. The database designers found that with typical parameters for memory cost and disk delay, a data page should be deleted if it has not been reused after about 5 minutes.
VMIN uses the window of size T as the threshold point at which retaining until next reuse costs the same as a page fault. To state this precisely, suppose time t+x is the next reuse of the page referenced at time t; VMIN decides whether to retain that page or not as follows:
If x>T then immediately evict the page from memory;
If x≤T then keep the page continuously in memory until next reuse.
VMIN exercises this choice at every time t. Because each and every reference is evaluated for minimum cost, the total cost must be minimum too.
VMIN is just like WS, but with a forward looking rather than backward looking window [den80]. If the page is used again in the forward window, VMIN retains it in memory until next use. If the page is not used again in the forward window, VMIN immediately removes it from memory. VMIN’s page fault sequence is identical to WS.
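The claim that VMIN and WS fault at the same moments can be checked by simulating both on a recorded trace. This is a minimal sketch with an assumed function name and an assumed example trace; the forward lookup scans the rest of the trace and is written for clarity rather than speed.

```python
def ws_and_vmin(trace, T):
    """Return (WS fault times, VMIN fault times, VMIN space-time) for window T."""
    N = len(trace)
    last = {}                 # WS: time of the most recent reference to each page
    retained = set()          # VMIN: pages currently held in memory
    ws_faults, vmin_faults, vmin_space_time = [], [], 0
    for t, page in enumerate(trace, start=1):
        if page not in last or t - last[page] > T:
            ws_faults.append(t)                 # WS miss: page fell out of the backward window
        last[page] = t
        if page not in retained:
            vmin_faults.append(t)               # VMIN miss: page was dropped or never loaded
        retained.add(page)
        vmin_space_time += len(retained)        # pages resident during time t
        # Forward-looking decision: distance x to the next reuse of this page, if any.
        x = next((u - t for u in range(t + 1, N + 1) if trace[u - 1] == page), None)
        if x is None or x > T:
            retained.discard(page)              # not reused within T: evict immediately
    return ws_faults, vmin_faults, vmin_space_time


# Assumed example trace (hypothetical): N = 15, M = 5.
TRACE = [1, 2, 3, 4, 1, 2, 4, 4, 1, 2, 1, 5, 4, 3, 1]
ws_f, vmin_f, vmin_st = ws_and_vmin(TRACE, 4)
```

The two fault lists coincide, while the VMIN space-time is smaller than the WS space-time because VMIN carries no overhang.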
When the reuse interval x is ≤T, WS and VMIN retain the page continuously between the two successive references. If x>T, WS retains the page for additional time until evicting it, whereas VMIN evicts it immediately. This means that the space-time difference between WS and VMIN is due entirely to the overhangs in the reuse intervals longer than T. This observation allows us to calculate the VMIN space-time vt(T) directly from reuse counters. Start with vt(1)=N because with window size 1 only the current references are in the VMIN memory. A reuse interval of length k≤T contributes an additional k-1 to the space-time. Thus,
$$ \mathit{vt}(T) = N + \sum_{k=1}^{T} (k-1)\,c(k) $$
We get a recursion straightway,
$$ \mathit{vt}(T+1) = N + \sum_{k=1}^{T+1} (k-1)\,c(k) = N + \sum_{k=1}^{T} (k-1)\,c(k) + T\,c(T+1) $$
or simply,
$$ \mathit{vt}(T+1) = \mathit{vt}(T) + T\,c(T+1) $$
Therefore, as with WS, the VMIN space-time and miss rate can be calculated directly from the reuse interval counters c(k), without a simulation of VMIN.¹⁰
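As a small worked check, the VMIN space-time can be computed from the c(k) column of Table 2 both by the closed form and by the recursion. The names `vt_closed` and `vt` are assumptions made for this sketch.

```python
C = {1: 1, 2: 1, 3: 1, 4: 5, 5: 1, 11: 1}    # c(k) column of Table 2 (zero entries omitted)
N = 15


def vt_closed(T):
    """Closed form: vt(T) = N + sum over k <= T of (k - 1) * c(k)."""
    return N + sum((k - 1) * n for k, n in C.items() if k <= T)


# Recursion: vt(1) = N and vt(T+1) = vt(T) + T * c(T+1).
vt = {1: N}
for T in range(1, 14):
    vt[T + 1] = vt[T] + T * C.get(T + 1, 0)
```

For the example, vt(4) = 15 + 1·1 + 2·1 + 3·5 = 33, compared with the WS space-time st(4) = 48 in Table 2.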
We can calculate the difference of WS and VMIN space-time simply as the total overhang in reuse intervals >T. Specifically, the overhang is T-1 in any reuse or end interval >T; by our previous calculations there are mwh(T) of these. For all reuse intervals of length k≤T, there is no overhang, but end intervals of length k≤T have an overhang of k-1. This leads to the relation
$$ \mathit{st}(T) = \mathit{vt}(T) + (T-1)\,\mathit{mwh}(T) + \sum_{k=1}^{T} (k-1)\,\mathit{ec}(k) $$
Figure 12 illustrates this for the data of the previous table.
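The relation between the WS and VMIN space-times can be verified numerically on the counters of Table 2. The helper names below are assumptions; st(T) is accumulated from the working set size recursion, so the check is independent of the overhang argument.

```python
C = {1: 1, 2: 1, 3: 1, 4: 5, 5: 1, 11: 1}    # c(k) column of Table 2 (zero entries omitted)
EC = {1: 1, 2: 1, 3: 1, 4: 1, 6: 1}          # ec(k) column of Table 2 (zero entries omitted)
N = 15


def mwh(T):
    """Warm-start-hot-finish miss count: reuse and end intervals longer than T."""
    return sum(n for k, n in C.items() if k > T) + sum(n for k, n in EC.items() if k > T)


def st(T):
    """WS space-time from st(0) = 0 and st(T+1) = st(T) + mwh(T)."""
    return sum(mwh(j) for j in range(T))


def vt(T):
    """VMIN space-time from the reuse counters."""
    return N + sum((k - 1) * n for k, n in C.items() if k <= T)


def overhang(T):
    """Total WS overhang: T-1 per interval > T, plus k-1 per end interval of length k <= T."""
    return (T - 1) * mwh(T) + sum((k - 1) * n for k, n in EC.items() if k <= T)


holds = all(st(T) == vt(T) + overhang(T) for T in range(1, 15))
```

At T=4 the terms are 48 = 33 + 9 + 6: nine units of overhang in the three intervals longer than 4, and six units in the short end intervals.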
The lesson from all this math is that we can quickly compute the space-time of a VMIN policy from the same reuse interval statistics as for WS.
¹⁰ By comparison, the WS space-time recursion has a miss-rate term mwh(T), the sum of counters, instead of a single counter c(T+1). That is because when we increase T to T+1, the WS overhang grows in all the reuse intervals >T, the total number of which is the sum of all the counters c(k) for k>T. VMIN is parsimonious: increasing T to T+1 only adds the space-time of reuse intervals of exactly length T+1.
Figure 12. This graph plots the VMIN space-time and the WS space-time. If we pick any value of miss count, we can compare the VMIN and WS memory demands. For example, at miss count 8, VMIN space-time is 30 page-units and WS space-time is about 47; the VMIN memory demand is about 1/3 smaller than WS for the same paging rate. Notice that the VMIN curve is always convex, whereas WS is not. Because the example address trace is too short to exhibit locality, WS and VMIN are not close.
Good Locality Makes Working Set Near Optimal
We would like to clarify a point about the assumptions behind the mathematical formulas given above. The only assumption is that an address trace can be recorded from a computation. The recursion formulas are simply mathematical relationships for measures computed on address traces. The working set and fault measures simply count events in the address trace. Optimality defines the best that can be done on a given address trace. There are no assumptions about locality built into the working set and optimality measures. Working sets are unbiased means to measure whether locality is present or not.
The page reference map is constructed from an address trace. Locality appears in the distinctive phase-transition patterns of the map. The working set was devised to measure the locality sets seen in the map. It is simply a measurement tool. It makes no assumptions about locality.
Locality comes in when we want to claim that WS is close to the optimal VMIN. Let’s see how that works with the reference map in Figure 1.
Consider a locality set of P pages and its associated phase consisting of L sample windows of length T. Each sample window within the phase contributes PT space-time. VMIN and WS have the identical memory contents everywhere in the phase except for the first and last sampling windows. In the first window of the phase, they both acquire the P pages of the new locality at the same moments of page faults; but WS has some additional space-time because it retains pages from the previous phase. Let f and g be the fraction of PT used respectively by WS and VMIN in the first sample window and h be the fraction of PT used by VMIN in the last window. Then the space-time for WS across the whole phase is fPT+(L-1)PT. For VMIN it is gPT+(L-2)PT+hPT. The ratio of WS to VMIN is
$$ \frac{\mathit{st}(T)}{\mathit{vt}(T)} = \frac{fPT + (L-1)PT}{gPT + (L-2)PT + hPT} < \frac{L}{L-2} $$
The inequality results from replacing f with its upper bound (1) in the numerator and g and h with their lower bounds (0) in the denominator.
Figure 1 shows 5 locality sets and their associated phases for a sampling window T of approximately 380K units. A measurement of the graph to find the locality set sizes and durations yields Table 3, where size is the number of pages in the locality set and length is the number of sample intervals in a phase.
Table 3
 set   size   length   L/(L-2)
  1     50     180      1.01
  2     65     220      1.01
  3     30     220      1.01
  4     55      50      1.04
  5     35     180      1.01
The final column in the table above shows the bounding ratio for each of the phases of Figure 1. For the longer phases, VMIN and WS are about 1% apart. For the short phase, they are up to 4% apart. A more precise analysis would have narrower separations than this approximate analysis.
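The last column of Table 3 follows from the sizes and lengths by simple arithmetic. In the sketch below the function name `ws_vmin_ratio` is an assumption; its default arguments give the bound L/(L-2), and other values of f, g, h (which the text does not measure) can be tried to see how much slack the bound has.

```python
# Table 3: locality set -> (size P in pages, phase length L in sample intervals)
PHASES = {1: (50, 180), 2: (65, 220), 3: (30, 220), 4: (55, 50), 5: (35, 180)}


def ws_vmin_ratio(L, f=1.0, g=0.0, h=0.0):
    """WS/VMIN space-time ratio over one phase; the common factor P*T cancels.

    With the defaults f=1, g=h=0 this is the upper bound L / (L - 2).
    """
    return (f + L - 1) / (g + L - 2 + h)


bounds = {k: round(ws_vmin_ratio(L), 2) for k, (P, L) in PHASES.items()}
```

Note that the locality set size P drops out of the ratio; only the phase length L, measured in windows, matters.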
VMIN’s virtual space-time is less than WS’s. How does this translate to real space-time as needed by the memory space-time law? In our notation the law becomes st(T)(1+D mc(T)). Since WS and VMIN have identical paging, the expression for VMIN real space-time is vt(T)(1+D mc(T)). Thus VMIN’s minimum virtual space-time translates directly to minimum real space-time and maximum throughput. This is why when WS is close to VMIN, their respective system throughputs are close.
In summary, working set analytics provides tools for measuring locality. When locality is present, a working set memory management policy will set system throughput to within a few percent of optimal.
Comparisons
Memory policies can be classified in two dimensions: fixed or variable space, optimal or not. We have chosen representatives of each combination:
Table 4
               Fixed space   Variable space
 Not optimal   LRU           WS
 Optimal       MIN           VMIN
The optimal policies are generally not realizable in real time because they require knowledge of the future reference patterns. Although the optimal policies are not realizable, their performance is easy to compute from a recorded address trace. Therefore, it is easy to measure how far a realizable policy (LRU or WS) is from optimal.
Fixed space policies are used for memories of fixed size such as caches or hard partitions of memory in an operating system. Fixed space policies are subject to thrashing when extended to shared memory or cache, because the space allocated to every job decreases as the number of jobs increases. The WS policy (Figure 6) is resistant to thrashing because it cannot steal pages from other processes and because the scheduler will not load new processes if their working sets do not fit into the available free space of memory.
What happens when these policies are applied for jobs displaying a strong degree of locality? When the memory allocated by the policy is too small to hold the job’s locality sets, LRU and WS would be comparable – but with poor performance. When memory is sufficient to hold locality sets, LRU and MIN would be comparable, and WS better than both because the fixed-space policies retain pages not being used in a locality set.
Moreover, LRU is also susceptible to thrashing when extended to shared memory. WS will not thrash and it is close to optimal.
Choosing the Window Size
What is a good window size? Is there a best window size for each process? What would happen if all processes were measured by the same window size?
This question has been investigated by researchers dating back to the 1970s. One basic finding is that graphs of WS space-time versus T show that the minimum of st(T) is in a wide, near-flat plateau in most jobs. There is usually a single value of T that intersects all the plateaus of the jobs. In other words, it does not make much difference what value of T you choose; T can be a single, global value chosen over a wide range without significantly changing the system throughput.
From the page reference maps, we see that an ideal T is just large enough to see all the locality pages in the window throughout the phase. In Figure 1, for example, T=380K is sufficient to see all the pages of the locality set. That value of T is a small fraction of the phase length – about 0.5% for the four long phases (one sample interval out of 180 or 220) and 2% for the one short phase (one out of 50).
This gives a practical answer to the performance tuning problem mentioned at the beginning. A system administrator can adjust the parameter T in a WS-managed system to find a value that maximizes system throughput. After that, the parameter T does not need to be changed. That maximum will be close to the theoretical optimum of VMIN.
Implementations
There are several ways to implement the working set cheaply.
The original ideas for implementation (1968) were of two kinds [den68a]. The first was to have the operating system scan and reset the use bits in page tables every T time units of virtual time, removing any unused pages from working sets. This implementation was not attractive because scanning all the page tables every T time units in a large system would be very expensive.
The second idea was to associate a hardware timer of duration T with each page frame of memory. The timer is set to T at each access to that frame. It ticks down to 0 after T time units of non-use. The operating system can detect these unused frames from their expired timers and remove them from their working sets. This implementation was not attractive at the time because the required hardware would be too expensive and, moreover, because of CPU multiplexing, the timers would need to be shut off when the job owning the page was not running.
Chen Ding and his students at the University of Rochester note that in modern caches, multiple CPUs execute in parallel and share on-chip L1 and L2 cache. The hardware is easily designed to include timers on cache pages. Because there are multiple cores, there is no need to shut any timers off to accommodate round-robin CPU scheduling. Their proposal of a “lease cache”, in which each cache page has its own timer, is now feasible [li19]. A lease cache with lease T is exactly the working set policy with window T.
A simple modification of the popular CLOCK algorithm enables the operating system to detect the working set pages of a job. As before, the process’s pages are arranged in a circular list, but now each page has a timestamp instead of a use bit. The timestamp records the time of the last access to the page. The scanning clock hand skips over all pages whose timestamp is within T of the current time, and stops at the first page with an old timestamp. This method, called WSCLOCK, was invented and validated by Richard Carr [car81].
Note that WSCLOCK has no built-in load control. By itself, it cannot implement the full WS multiprogramming policy (Figure 6), which does not load another job until there is sufficient free space for its working set. However, the WSCLOCK search time for a page with an old timestamp can be a proxy for the measure of free space: the longer the search time, the less free space is available. The memory allocator can load a new job when the search time is below a set threshold. (Note that the memory allocator is not the same as the CPU core scheduler. The memory allocator decides when to load a job’s working set into memory. The scheduler assigns free CPU cores to jobs whose working sets are loaded.)
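The scan described above can be sketched in a few lines. This is a simplified illustration of the timestamp test only, not Carr's full algorithm [car81]; the function name, the return convention, and the choice to treat a page as old when `now - timestamp >= T` are all assumptions. The returned step count is the search length that the text proposes as a proxy for free space.

```python
def wsclock_scan(timestamps, hand, now, T):
    """Advance the clock hand until a page outside the working set is found.

    timestamps[i] is the time of the last access to the page in slot i of the
    circular list. Returns (victim_slot, new_hand, steps); victim_slot is None
    when every page was used within the last T time units.
    """
    n = len(timestamps)
    for steps in range(1, n + 1):
        slot = hand
        hand = (hand + 1) % n                   # the hand always moves on
        if now - timestamps[slot] >= T:         # last use is older than the window
            return slot, hand, steps
    return None, hand, n                        # full circle: no replaceable page


victim, hand, steps = wsclock_scan([10, 3, 9, 1], hand=0, now=10, T=4)
```

A load controller along the lines suggested in the text could compare `steps` (or a running average of it) with a threshold before admitting another job; a full circle with no victim signals that memory is entirely occupied by working sets.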
Application in Modern Caches
Modern computer chips rely on complex hierarchies of caches to achieve and maintain high performance. Typical cache levels L1, L2, and L3 buffer the gap between the RAM and the CPU. Level L1 is closest to the CPU and is the smallest and fastest of the caches. There are two kinds of cache configurations. Inclusive cache means that cache pages (also called slots or blocks) of a level are subsets of larger pages at the next level down. Exclusive cache does not require this subsetting [ye17]. These caches usually use some form of LRU replacement to determine when a cache page is pushed down to the next lower level cache. The highest-level caches (L1 and L2) are dedicated to individual CPU cores, while L3 cache is shared among all the cores similar to multiprogramming [ye17]. The shared L3 cache may be partitioned unequally, with some cores getting more than others depending on their locality [bro15]. Researchers are investigating whether a variable partition based on CPU core working sets would be more efficient.
Chen Ding’s research group looks deeply into policy and performance questions for shared caches [lia19, xia11, xia13, xia18]. Many of their analytic methods for caches were inspired by the working set theory. They defined two measures of load on the cache, called footprint and footmark. I’ll discuss both of them and compare with WS.
The footprint measure was motivated by a desire for accuracy. For the first T-1 references of a trace, the working sets are the contents of a truncated window because there are no page references prior to time t=1. In fact, the initial working sets for the first T-1 references are of sizes w(t,t). Those truncated windows present a potential underestimate in the calculations of mean working set size. Chen Ding’s footprint avoids these complications by averaging together all the working sets whose T windows fit completely inside the address trace:
$$ \mathit{fp}(T) = \frac{1}{N-T+1}\sum_{t=T}^{N} w(t,T) $$
Dingarguedthatthederivativeofthefootprintfunctionisthemissrate,justasindicatedbytheworkingsetrecursion.Thus,themissratecanbecomputedoncethefootprintiscomputed.Histeamfoundarecursionforcomputingfp(T)fromfp(T-1)andshowedthatthemeanfootprintiswithinonepageofthemeanworkingsetifthewindowisnottoolarge:
$$s(T) - \mathit{fp}(T) < 1, \quad \text{if } T \le \sqrt{2N}$$
This will be easily true for programs with good locality. Details are in the Appendix.
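The two definitions can be compared directly on a small trace. The sketch below evaluates the mean working set size s(T) and the footprint fp(T) by brute force (quadratic time, for illustration only; it is not the linear-time method). It assumes a trace is a Python list of page names indexed from time 1, and the function names are chosen for this example.

```python
from fractions import Fraction
from math import isqrt


def ws_size(trace, t, T):
    """w(t,T): distinct pages among the last T references ending at time t.

    Times are 1-indexed; the window is truncated at the start of the trace,
    so w(t,T) equals w(t,t) whenever t < T.
    """
    start = max(1, t - T + 1)
    return len(set(trace[start - 1:t]))


def mean_ws(trace, T):
    """s(T): average of w(t,T) over all N references."""
    N = len(trace)
    return Fraction(sum(ws_size(trace, t, T) for t in range(1, N + 1)), N)


def footprint(trace, T):
    """fp(T): average of w(t,T) over the N-T+1 full windows only."""
    N = len(trace)
    return Fraction(sum(ws_size(trace, t, T) for t in range(T, N + 1)), N - T + 1)


def bound_holds(trace):
    """Check s(T) - fp(T) < 1 for every T up to floor(sqrt(2N))."""
    N = len(trace)
    return all(mean_ws(trace, T) - footprint(trace, T) < 1
               for T in range(1, isqrt(2 * N) + 1))
```

Exact fractions are used so that the comparison with 1 is not affected by rounding.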
Chen Ding was not happy with the messiness of algorithms for computing footprint. He searched for a recursion based on footprint that was identical to the one reported by Denning and Schwartz for long address traces in stochastic steady state. He and his students found a new, closely related measure they called footmark [yua18]. Footmark satisfies the Denning-Schwartz recursion
𝑓𝑚(𝑇 + 1) = 𝑓𝑚(𝑇) + 𝑚(𝑇)
withinitialconditionsfm(0)=0andm(0)=1.ThedetailsofitsderivationareintheAppendix.
Wenowhavetworecursions–workingsetsizeandfootmark–thatsatisfythesamemathematicalform,
𝐹(𝑇 + 1) = 𝐹(𝑇) + 𝑀(𝑇)
where F is a space measure and M is a miss measure. The working set uses for M the warm-start-hot-finish miss rate mwh(T), and footmark uses the actual working set miss rate mc(T). For finite address traces, the two miss rates differ by a few end corrections. But for very long traces (large N), they become identical and the two recursions are the same. That replicates the 1972 finding for the steady-state values of working set size s(T) [den72]. Like footprint, the footmark measure is within one page of the mean working set size when T ≤ √(2N). Figure 13 illustrates these measures for the example address trace used previously. When there is a risk that T > √(2N), the best strategy is to use the working set recursion, which is fully accurate and is easily computed from mwh(T), which in turn differs from mc(T) by a few end corrections.
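Both recursions can be run from a single pass over the reuse intervals of a trace. The sketch below is an illustration under stated assumptions: mc(T) is taken to be the number of references whose backward reuse interval exceeds T (first references always miss), and the working set end correction is taken in the form given in the Appendix, st(T+1) = st(T) + mc(T) − e(T), with e(T) the number of pages whose last reference falls in the final T references. Function names are chosen for this example.

```python
def miss_counts(trace):
    """mc[T] for T = 0..N, from a histogram of backward reuse intervals."""
    N = len(trace)
    last_seen = {}
    hist = [0] * (N + 1)           # hist[d] = references with reuse interval d
    for t, page in enumerate(trace, start=1):
        if page in last_seen:
            hist[t - last_seen[page]] += 1
        last_seen[page] = t
    mc, hits = [], 0
    for T in range(N + 1):
        hits += hist[T]            # intervals <= T are hits for window T
        mc.append(N - hits)
    return mc


def footmark_space_time(trace):
    """FM[T] for T = 0..N via FM(T+1) = FM(T) + mc(T), FM(0) = 0."""
    mc = miss_counts(trace)
    FM = [0]
    for T in range(len(trace)):
        FM.append(FM[-1] + mc[T])
    return FM


def ws_space_time(trace):
    """st[T] for T = 0..N via st(T+1) = st(T) + mc(T) - e(T), st(0) = 0."""
    mc = miss_counts(trace)
    e, seen = [0], set()           # e[T] = distinct pages in the last T references
    for page in reversed(trace):
        seen.add(page)
        e.append(len(seen))
    st = [0]
    for T in range(len(trace)):
        st.append(st[-1] + mc[T] - e[T])
    return st


def footmark_direct(trace, T):
    """FM(T) from its definition, for cross-checking the recursion."""
    N = len(trace)
    w = lambda t, k: len(set(trace[max(0, t - k):t]))
    return sum(w(t, T) for t in range(1, N + 1)) + sum(w(N, k) for k in range(1, T))
```

Dividing `FM[T]` or `st[T]` by N gives fm(T) or s(T). Each recursion costs one addition per window size once the histogram is built.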
Figure 13. This graph compares the footprint (FP), mean working set (WS), and footmark (FM) measures for the 5-page example address trace. The working set rises asymptotically toward 4.0 pages. Both the footprint and footmark continue growing. When T ≤ √(2N) = 5 (vertical line), footprint and footmark are within 1 page of WS, as predicted by the bound. For example, in the page reference map of Figure 1, there are over 1000 sample intervals; T could be as large as 45 (≈ √2000) sample intervals without causing significant error between footmark and mean working set. When N is very large (not shown in this diagram), the two measures FM and WS converge to the same value.
Working Sets in Parallel Environments
The computing environments in which virtual memory was formulated featured a single CPU accessing a memory shared by many jobs. It was basically a serial environment. The CPU of old has evolved into modern multicore chips, graphics processors (GPUs), and several kinds of cache. Does all this parallelism invalidate any assumptions of the analytics? Do graphics and neural networks, the primary applications of GPUs, exhibit locality?
Asforthefirstquestion,theassumptionsofworkingsetanalyticsdoindeedapplyinaparallelenvironment.Theanalyticsdependonlyontheassumptionthatanaddresstracecanbemeasured.Aparticularrunofaparallelsystemyieldsanaddresstraceforthatrun.Thenextrunofthesamesystemmayyieldadifferentaddresstracebecausetheorderofeventsisdifferentortheinputdataaredifferent.Butthisdoesnotcreateanyproblemsfortheanalytics,becausetheanalyticsaredefinedforagivenaddresstrace.WSwillrevealthelocalitysetsofthattraceandadapttothem.VMINwillrevealthebestpossibleperformanceforthattrace.Whathappensinothertracesdoesnotaffecttheanalytics.11
Asforthesecondquestion,considerthatasystemisasetofCPUsaccessingasetofpagesinmemory.WheneveranyCPU,runninganyjob,accessesapage,thepage’sleaseissettoTandcountsdowntowardzerowitheachclocktick.Theprotocolfordecidingwhethertokeepapageinmemoryisthesamewhetherornotjobssharepagesorruninparallel.
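The lease rule just described can be sketched as follows. This is a minimal illustration that assumes one clock tick per reference in the merged reference stream of all CPUs and a lease length T ≥ 1; the class name `LeaseMemory` and its fields are hypothetical. The rule never looks at which CPU or job made the reference.

```python
class LeaseMemory:
    """Resident set managed by leases of length T (assumes T >= 1)."""

    def __init__(self, T):
        self.T = T
        self.lease = {}    # page -> remaining lease, in clock ticks
        self.misses = 0

    def access(self, page):
        """Process one reference; returns True on a hit, False on a miss."""
        hit = page in self.lease
        if not hit:
            self.misses += 1
        # One clock tick: every resident page's lease counts down,
        # and pages whose lease reaches zero leave memory.
        for p in list(self.lease):
            self.lease[p] -= 1
            if self.lease[p] == 0:
                del self.lease[p]
        # The referenced page's lease is (re)set to T.
        self.lease[page] = self.T
        return hit
```

With this ordering, a page is a hit exactly when its reuse interval is at most T, so after each reference the resident set in the sketch equals the working set for window T.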
Whethertheleaseruleoptimizessystemthroughputdependsonthemixbetweenpartsofjobswithgoodlocalityandpartswithout.Thereislotsofdebatearoundthisissue.AGPUworkingwithstreamingdata(forexample,runningagraphicsdisplay)orwithsimulationofaneuralnetwork(doingthelinearalgebracalculationsforeachlayer)maynotexhibitphase-transitionbehavior.Buttherestofthecomputation,suchasfeedbackforreinforcementlearninginaneuralnetworkandtheuserinterface,islikelytoexhibitgoodlocalitybehavior.Thismeansthatlocalityisimportantforoptimizingtheperformanceofthenon-GPUcomponentsandmanagingcommunicationbetweenjob-componentsrunningonconventionalCPUsandthoserunningonGPUs.Otheroptimizations,suchasstreamingcaches,canbeappliedtotheGPUparts.
11 In his book Rethinking Randomness, Jeff Buzen discusses how many aspects of computer systems can be represented as event traces [buz15]. The traditional algorithms of queueing theory for utilization, throughput, and response time work for these traces based purely on what can be observed in the trace. The address trace and working set analytics are of this kind.
Wider Application of Working Set Principles
The principles of memory management in the working set model have been used outside the traditional operating systems that hatched them. The two most prominent are content delivery in the Internet and management of inventories and logistics networks.
Start with content delivery. The Internet hosts major services that distribute data throughout the world. Examples are distribution of music, video, books, and cloud services. Even though these services look like a single entity, they are built as distributed networks because congestion at a single server would be too great for acceptable service. The networks include caches located in busy zones, such as major cities, so that users in those areas can access the content via physically short, high-bandwidth paths. Content is automatically downloaded to a local cache when a user requests new data. Data that have been resident for a long time are automatically refreshed at intervals.
Thesecachesareoftencalled“edgecaches”becausetheyrepresentmovingdataawayfromcentralizedlocationsthatareeasilycongested,totheedgeofthenetworkwheretheusersconnect.Numerouscompaniesemployedgecachesaspartoftheircontentdeliveryservices.
Anedgecachefunctionsinthesamewayasacacheinsidetheoperatingsystem.Whenablockofdataisaccessedforthefirsttime,itisloadedintotheedgecachefromacentralserver.AnLRUreplacementpolicyoraleasepolicyisusedtoremoveolddatawhenthecacheisfull.Thesecachesaccumulateworkingsetsandachievehighperformancebykeepingtheworkingsetsphysicallyclosetotheirusers.Allcachesperformwellbyexploitingtheprincipleoflocalityinthepatternsofhowusersaccessdata.
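The behavior described above can be sketched with an LRU-managed dictionary. This is an illustrative toy, not the design of any particular content delivery network: the class name `EdgeCache`, the `fetch_from_origin` callback standing in for the central server, and the capacity measured in blocks are all assumptions of this example.

```python
from collections import OrderedDict


class EdgeCache:
    """Toy edge cache with LRU replacement (capacity counted in blocks)."""

    def __init__(self, capacity, fetch_from_origin):
        self.capacity = capacity
        self.fetch = fetch_from_origin    # hypothetical call to the central server
        self.blocks = OrderedDict()       # least recently used block first
        self.origin_fetches = 0

    def get(self, key):
        if key in self.blocks:
            self.blocks.move_to_end(key)  # mark as most recently used
            return self.blocks[key]
        value = self.fetch(key)           # first access: load from the origin
        self.origin_fetches += 1
        self.blocks[key] = value
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)   # evict the least recently used
        return value
```

A lease policy could be substituted by storing an expiry time with each block instead of relying on recency order.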
Turnnowtoinventorymanagement.Aninventoryisasetofitemsorparts.Alogisticsnetworkisanetworkofdepotsthatholdinventoriesofpartsusedbycompaniesormilitaries.Whenappliedtoinventories,aworkingsetisasetofpartsthathavebeenusedrecently.Depotmanagersaimtokeeplocaldepotsfilledwiththemostneededworkingpartssothattheycansupplytheirlocalusersrapidly.Whensomeoneasksforapartnotinthelocaldepot,themanagermustsendarequesttoanotherdepottoobtainthepart.
Giventheprevalenceoflocalitybehavior,itisreasonabletoexpectuseraccesspatternsforinventoriestoexhibitlocalitysets(subsetsofpossibleparts)andphases(intervalsofheavyuseofaparticularlocalityset).Workingsetanalyticscouldbeusedtodeterminethecapacityneededofaninventorydepot.
Thereisadifferencebetweena“logisticsworkingset”anda“memoryworkingset”.Thelogisticsworkingsetincludesmultipleinstancesofapart,whereasthememoryworkingsethasjustoneinstanceofapage.Thenumberofrequestsforaparticularitemintheworkingsetwindowforecaststhequantityofthatitemtokeeponhand.Itwouldbeworthwhiletostudythisideaoflogisticsworkingsetsandtheirphasetransitionmapstoseewhatinventorymanagementstrategiesgiveoptimalperformanceofthelogisticsnetwork.
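The distinction can be made concrete with a multiset. In the sketch below, a logistics working set is a count of requests per part within the last T requests, while the memory-style working set is just the set of distinct parts. The function name and the use of a request count as the window are assumptions of this example.

```python
from collections import Counter


def logistics_working_set(requests, T):
    """Count of requests per part among the last T requests."""
    window = requests[-T:] if T > 0 else []
    return Counter(window)


requests = ["bolt", "nut", "bolt", "gear", "bolt"]
stock_forecast = logistics_working_set(requests, 3)   # multiplicities kept
memory_style_ws = set(stock_forecast)                 # one instance per part
```

Here the counts suggest how many of each part to keep on hand, whereas the set only says which parts are in recent use.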
Memory Misconceptions
Before closing this tutorial, I would like to comment on four common misconceptions about computer memory. They result from unfounded assumptions that lead to pessimistic answers to four big questions:
1. Is memory flattening?
2. Is virtual memory obsolete?
3. Is computer memory irrelevant in the Cloud?
4. Is the Principle of Locality obsolete?
Is Memory Flattening?
Memoryisanessentialcomponentofcomputers.MostpeoplethinkofmemoryasacompaniontechnologythattheCPUinteractswithtofetchinstructionsandaccessdata.PerformanceofthecomputerthenseemstobegovernedbytheCPUspeed.
Butthisisnotso.MemoryisnotasinglecomponentasisaCPUchip.MemoryisasystemthatincludescachesintheCPU,RAM,disks,networkservices,andCloudstorage.WhereasaCPUcorerunsonlyonejobatatime,memoryissharedamongmanyjobs.Memorycontentionbirthsqueueingathigh-demandservers.Diskandnetworkbottlenecksatthesequeueswillkillperformanceofmemorysystems.Asdiscussedinthistutorial,wehavelearnedhowtoorganizetheallocationofmemoryanddatatransferssothatweavoidthesebottlenecks.
Athoughtexperimenthighlightsthisbasicrealityaboutmemory.Alotofpeoplebelievetheclaimthatwithinafewyearswewillknowhowtomakememoriesthatareessentiallyinfiniteandflat.Flatmeansthattheaccesstimeofanyiteminthememoryisapproximatelythesame.Flatmemorywouldnotneedlocalityoptimizationbecauseeverypagewouldhavethesameaccesstime.Butflatmemoryisnotathingofthefuture.Itisalreadyhere–theInternet.Ifyouissuea“ping”commandfromanywhereintheInternetforanyIPaddressintheInternet,youwillseethatthepacketroundtriptimesaremostly30-90milliseconds.Thatisnotperfectlyflatbutisclose.Doesthatmeanyourcomputerwillexperienceamaximumdelayof90millisecondswhenitreachesouttoawebserver?Ofcoursenot.Iftheserverisverypopular,manycomputersallovertheInternetwillbetryingtoaccessit.Theserverwillqueueuptherequestsbecauseitsdisksareabottleneck.Thegreaterthedemandforthepopularwebserver,thelongerthewaittimeinthequeue–andthelongertheresponsetimetogetawebpage.Inotherwords,flatmemorysystemsdonotguaranteegoodperformance.ThisiswhytheInternetcontainssomanyedgecaches–theyspreadtheloadandavoidthequeueing.
ThesameproblemappearsatsmallerscalewhenmanyjobsshareRAMtogetherandcontendforuseofthepagingdisks.Workingsetmemorymanagementpreventsdiskoverloadandprotectsthesystemfromthrashing.
Thecontrolsystemsthatpreventexcessdelayinaccessingsharedmemoryresourcesareanimportantpartofthememorystory.Mostusersandprogrammersarenotawareofthosecontrolsystems.
Is Virtual Memory Obsolete?
Virtualmemorywasinventedtoovercomethehighprogrammingcostofsolvingthe“overlayproblem”--manuallyplanningpagetransfersthatoverwritepreviouslyloadedpages.Today’smemoriesareusuallylargeenoughtocontainanentireprogramanditsdata.Thus,itwouldseemthatvirtualmemoryisnotusefulanymoreforsolvingtheoverlayproblem.
Theoverlayproblemismuchlessimportantthaninearlyvirtualmemories.Virtualmemory’srealvaluecomesfromitsabilitytopartitionmemory.Itallocatesnonoverlappingsubsetsofpagestoeachjob’smemoryandpreventsanyjobfromaccessingdatainanotherjob’smemory.Virtualmemoryprovidesthebasistoencapsulateuntrustedsoftwaresothatitcannotdamageanythingoutsideitsallocatedmemoryspace.Virtualmemoryprovidesbasicaccesscontrolbydistinguishingread,write,andexecutepermissionsforindividualpages.Inotherwords,virtualmemoryimplementsthebasicguaranteesofanoperatingsystemfordataprotection.
Virtualmemorydoesmorethanpartitionmemoryandprovidebasicaccesscontrol.ItalsomanagesmultiprogrammedRAMtoavoidthrashingandtomaximizesystemthroughput.ThesameconcernsforstabilityandthroughputnowoccurinthecacheontheCPUchip,whichissharedamongthemultipleCPUcoresonthechip.Theprinciplesofvirtualmemoryareimportantincachedesign.
Is Computer Memory Irrelevant in the Cloud?
TheCloudprovidesverylargestorageinthenetworkoutsideyourcomputer.Itdoesnotincludethestorageinsideyourcomputer.YourcomputerstillneedsL1,L2,L3cache,localRAM,andlocalstorage.Theoperatingsystemonyourcomputerneedstomanagetheselocalmemoryresources.TheCloudenlargesthestorageaccessibletoyourcomputerbutdoesnotreplacestoragemanagementwithinyourcomputer.
TheCloudisacomplex,distributedstoragesystem.Itconsistsofmanydatacentersaroundtheworld.Eachdatacenterincludestensofthousandsofcomputersanddisks.Sophisticatedredundancycontrolsmanagemultiplecopiesoffilesdistributedacrossmultipledatacenters,providinghighreliabilityandeaseofrecoveryfromfailuresofcomputersanddisks.TheoperatingsystemsrunningtheCloudmustalsomanagememory,avoidthrashing,andmaximizeCloudthroughput.
The Cloud is not the final answer to storage management. One of the primary limitations on the performance of computers is the interface between CPUs and the memory system. Moving data across those interfaces significantly slows CPU speeds. The Cloud is another interface. Those interface delays are often called von Neumann bottlenecks because the separation of CPU and memory was a central principle of the stored-program computer architecture that became ubiquitous after 1945. Many research projects today are examining new architectures that have no von Neumann bottleneck. The idea is to build the computer so that computations are performed by huge numbers of processing elements that require only local access to limited storage. These are often called “processing-in-memory architectures”. Neural networks are a prominent example.
Is the Principle of Locality Obsolete?
The heart of all the methods to control memory and optimize its performance is the Principle of Locality. Each program generates a unique footprint of locality sets and phases. High performance caches, RAM-DISK interfaces, and networks all owe their success to this principle.
Thelocalityprinciplerunsdeepincomputing.Algorithmtheoristshaveshownthataprocedurecannotbeanalgorithmunlesseachofitsoperationscanapplyonlytoabounded,localsetofdata.Computingmachinesthemselvesarebuiltfrommanycomponentsandmodulesthatuseonlylocalinputs.
Yet the locality principle for memory is often misunderstood. Some people believe it simply means unequal frequencies of use of each page. Others believe it means a slow drift among a set of favored pages. Few see it as long phases of near constant locality sets punctuated by sharp transitions to new locality sets. Yet the phase-transition behavior is common and is the reason why working set is able to give near optimal system throughput.
Anothermisunderstandingisthebeliefthatincentiveshavechanged.Intheearlydays,programmersofearlyvirtualmemorysystemspurposelyinducedlocalitybehaviorinordertogetthebestperformancefromthepagingalgorithms.Weareinadifferentagenow:memoryisnowherenearasscarceandprogrammersfeelnopressuretoinduceworkingsetsintotheirprograms.Thus,itwouldseemthatthemotivationforlocalityisgone.Thismisunderstandingisrefutedbytwofacts.Oneistheexperimentalstudiesshowingthatlocalitybehaviorispresentinthesourcecodeofprograms–thephase-transitionbehaviorappearinginpagereferencemapsisanimageofsourcelocality.Localityistheconsequenceofourproblem-solvingstrategiessuchasiterativeloopsanddivide-and-conquer.Theotherrefutationisrecentinstrumentationsshowingpagereferencemapswithevenmorepronouncedphase-transitionbehaviorthanwesawintheearlydays.Theincreaseduseofmodularprogramsisthemostlikelyexplanation.(SeeMcMenamin’sstudyofLinuxprograms[mcm11].)
Conclusion
The Working Set Model for program behavior, first articulated in 1968, has stood the test of time. It stimulated the research that revealed that almost all executing processes exhibit locality, establishing the Principle of Locality as one of the fundamental principles of computing. It led to a simple, precise way to measure all the working set statistics in real time by recording the reuse interval statistics of an address trace. It led to the conclusion that WS is near-optimal for processes exhibiting locality. It provided an explanation for the mystery of thrashing and showed how to build a control system to avoid thrashing.
Since1968,computerCPUshavebecomeprogressivelyfaster,wideningthecostgapofretrievingapagefromalowerlevelofmemory.Atthesametimememoryhasbecomefarcheapersothatmostroutineapplicationsarefullyloadedinmemoryanddonotnormallycausepagefaults.Somepeoplehaveaskedwhetherthetheoryappliestorealsystemstoday.
Yes, it does. Real systems today have memories consisting of multiple levels of cache, RAM, and disk. The last-level cache (L3) is shared among multiple cores (CPUs). Most of the current cache management strategies do not vary the partition of the cache among the cores, but researchers have been demonstrating that variable partition caches are much more efficient. Unfortunately, variable partition LRU caches are susceptible to thrashing. The WS theory can help in designing cache management strategies, such as lease cache, that do not thrash and that keep cache misses close to the optimal levels.
Despitethetremendousadvancesinmemorytechnologyoverthepasthalfcentury,thebasicassumptionsbehindmemorymanagementhavenotchanged.Virtualmemoryremainsusefulbecauseofitsabilitytoconfinejobstotheirlimitedregionsofmemory,toencapsulateuntrustedsoftware,andtomanageloadtoavoidthrashing.Flatmemorydoesnoteliminatetheneedforvirtualmemory;itintroducesitsownproblemsduetoqueueingandcongestionasmanyjobsaccesssharedmemoryresources.TheCloudaugmentsbutdoesnotreplacememorymanagementonlocalcomputers.Moreover,theCloudexacerbatesthevonNeumannbottleneckbetweenCPUandmemory.Thelocalityprincipleisfarfromobsolete–itcontinuestounderpinhighperformancememorysystems.Theworkingsettheorycanbeextendedandcombinedwithnetworkcachingtheoryforpossibleapplicationinlogisticsnetworksandinventorymanagement.
Workingsetsandvirtualmemorywillbepartsofcomputingforalongtimetocome.
Bibliography
[aho71] Aho, A., P. J. Denning, J. Ullman. 1971. Principles of optimal page replacement. J. ACM 18, 1 (January), 80-93.
[bel66] Belady,L.A.1966.Astudyofreplacementalgorithmsforvirtualstoragecomputers.IBMSystemsJ.5,2,78-101.
[bro15] Brock,J.,C.Ye,C.Ding,Y.Li,X.Wang,andY.Luo.2015.Optimalcachepartitionsharing.In44thIEEEInt’lConf.onParallelProcessing(September),749-758.
[buz76] Buzen,J.P.1976.Fundamentaloperationallawsofcomputersystemperformance.ActaInformatica7,2,167-182.
[buz15] Buzen,J.P.2015.RethinkingRandomness.Amazon.comPlatform.SeealsoACMUbiquityinterviewsoftheauthor,https://ubiquity.acm.org/article.cfm?id=2986329andhttps://ubiquity.acm.org/article.cfm?id=2986331
[car81] Carr,R.W.andJ.Hennessy.1981.WSCLOCK—asimpleandeffectivealgorithmforvirtualmemorymanagement.InProceedingsoftheeighthACMsymposiumonOperatingsystemsprinciples(SOSP‘81).
[den68a] Denning,P.J.1968.Theworkingsetmodelforprogrambehavior.CommunicationsofACM11,5(May),323-333.
[den68b] Denning,P.J.1968.Thrashing:itscausesandprevention.InProceedingsofthe1968,FallJointComputerConference(FJCC),partI(AFIPS‘68(Fall,partI)).
[den72] Denning,P.J.andS.C.Schwartz.1972.Propertiesoftheworkingsetmodel.CommunicationsofACM15,3(March),191-198.
[den78a] Denning,P.J.andD.L.Slutz.1978.Generalizedworkingsetsforsegmentreferencestrings.CommunicationsofACM21,9(September),750-759.
[den78b] Denning,P.J.,andJ.P.Buzen.1978.Theoperationalanalysisofqueueingnetworkmodels.ACMComputingSurveys10,3(September),225-261.
[den80] Denning,P.J.1980.Workingsetspastandpresent.IEEETrans.SoftwareEngineeringSE-6,1(January),64-84.
[den16] Denning,P.J.2016.Fiftyyearsofoperatingsystems.ACMCommunications59,3(March),30-32.
[fot61] Fotheringham,J.1961.DynamicstorageallocationintheAtlascomputer,includinganautomaticuseofabackingstore.CommunicationsofACM4,10(October1961),435-436.
[gra76] Graham,G.S.1976.Astudyofprogramandmemorypolicybehavior.PhDdissertation,PurdueUniversityDeptofComputerScience.
[gra85] Gray,Jim,andFrancoPutzolu.1985.The5minuterulefortradingmemoryfordiskaccesses.TandemCorporationTechnicalReport86.1.Availablehttps://www.hpl.hp.com/techreports/tandem/TR-86.1.pdf
[kah76] Kahn,K.C.1976.Programbehaviorandloaddependentsystemperformance.PhDdissertation,PurdueUniversityDeptofComputerScience.
[kil62] T.Kilburn,D.B.G.Edwards,M.J.Lanigan,F.H.Sumner.1962.One-levelstoragesystem.IRETransEC-11(April),223-235.
[li19] Li,P.,C.Pronovost,W.Wilson,B.Tait,J.Zhou,C.Ding,C.,andJ.Criswell.2019.BeatingOPTwithstatisticalclairvoyanceandvariablesizecaching.InProc.24thInt’lConf.onArchitecturalSupportforProgrammingLanguagesandOperatingSystems(April),243-256.
[lia19] LiangYuan,ChenDing,WesleySmith,PeterDenning,andYunquanZhang.2019.Arelationaltheoryoflocality.ACMTransactionsonArchitectureandCodeOptimization(TACO)16,3(August),1-26.
[mad76] Madison,A.W.andA.P.Batson.1976.Characteristicsofprogramlocalities.CommunicationsofACM19,5(May),285-294.
[mat70] Mattson,R.,J.Gecsei,D.R.Slutz,andI.L.Traiger.1970.Evaluationtechniquesforstoragehierarchies.IBMSystemsJournal9,78-117.
[mcm11] McMenamin,A.2011.ApplyingWorkingSetHeuristicstotheLinuxKernel.MastersThesis,BirkbeckCollege,UniversityofLondon.Availableathttp://cartesianproduct.files.wordpress.com/2011/12/main.pdf
[pri76] Prieve,B.G.andR.S.Fabry.1976.VMIN--anoptimalvariable-spacepagereplacementalgorithm.CommunicationsofACM19,5(May1976),295-297.
[ran68] Randell,B.andC.J.Kuehner.1968.Dynamicstorageallocationsystems.CommunicationsofACM11,5(May),297-306.
[say69] Sayre,D.1969.Isautomaticfoldingofprogramsefficientenoughtodisplacemanual?ACMCommunications13,12(December),656-660.
[spi72] Spirn,J.R.andP.J.Denning.1972.Experimentswithprogramlocality.ProcAFIPSConf41,SJCC.AFIPSPress.
[wan15] Wang,X.,etal.,"OptimalFootprintSymbiosisinSharedCache,"201515thIEEE/ACMInternationalSymposiumonCluster,CloudandGridComputing,Shenzhen,2015,pp.412-422
[wir14] Wires,J.etal.2014.Characterizingstorageworkloadswithcounterstacks.USENIXProc.11thConf.onOperatingSystemsDesignandImplementation,335-349.
[xia11] XiaoyaXiang,BinBao,ChenDing,YaoqingGao.2011.Linear-timeModelingofProgramWorkingSetinSharedCache.PACT2011:350-360
[xia13] XiaoyaXiang,ChenDing,HaoLuo,BinBao:2013.HOTL:ahigherordertheoryoflocality.ASPLOS2013,343-356.
[xia18] XiamengHu,XiaolinWang,LanZhou,YingweiLuo,ZhenlinWang,ChenDing,andChenchengYe.2018.Fastmissratiocurvemodelingforstoragecache.ACMTransactionsonStorage14,2(April),Article12,34pp.
[ye17] Ye,C.,C.Ding,H.Luo,J.Brock,D.Chen,andH.Jin.2017.Cacheexclusivityandsharing:Theoryandoptimization.ACMTrans.onArchitectureandCodeOptimization(TACO)14,4,1-26.
[yua18] Yuan,L.,W.Smith,S.Fan,Z.Chen,C.Ding,andY.Zhang.2018.Footmark:Anewformulationforworkingsetstatistics.InInt’lWorkshoponLanguagesandCompilersforParallelComputing(October),61-69.Springer.
Acknowledgements
I am deeply grateful to Jeff Buzen, Chen Ding, Erol Gelenbe, Roland Ibbett, and Adrian McMenamin for many conversations that sharpened the ideas written here. And I fondly remember my early teachers and collaborators in this work, Les Belady, Fernando Corbato, Jack Dennis, Roger Needham, Brian Randell, Jerry Saltzer, Stuart Schwartz, Donald Slutz, and Maurice Wilkes.
APPENDIX: FOOTPRINT AND FOOTMARK
Chen Ding at the University of Rochester defined two measures of load on the cache, called footprint and footmark [lia19, xia11, xia13, xia18].
The footprint measure was motivated by a desire for accuracy: for the first T-1 references of a trace, the working set contents are truncated windows, potentially underestimating mean working set size. The footprint avoids this by averaging together only the working sets whose T windows fit completely inside the address trace:
$$\mathit{fp}(T) = \frac{1}{N-T+1} \sum_{t=T}^{N} w(t,T)$$
Thefootprintandmeanworkingsetmeasuresdonotdiffersignificantlyunderpracticalconditions.Thedefinitionofmeanworkingsetsizeincludesthedefinitionoffootprint:
$$s(T) = \frac{1}{N} \sum_{t=1}^{N} w(t,T) = \frac{1}{N} \sum_{t=1}^{T-1} w(t,t) + \frac{N-T+1}{N}\, \mathit{fp}(T)$$
Since window size is an upper bound for any working set size, w(t,t) ≤ t, and the first sum has an upper bound of (1+2+3+…+(T-1))/N = T(T-1)/(2N) < T^2/(2N). The second term has an upper bound of fp(T). Therefore,
$$s(T) < \frac{T^2}{2N} + \mathit{fp}(T)$$
This reduces to
$$s(T) - \mathit{fp}(T) < 1, \quad \text{if } T \le \sqrt{2N}$$
Inotherwords,themeanworkingsetsizeandfootprintarewithin1pageofeachotheraslongas𝑇 ≤ √2𝑁.Thiswillbeeasilytrueforprogramswithgoodlocality,asinFigure1.
Dingandhiscolleaguesalsodevelopedarecursionforfp(T)thatwouldenablecalculatingfootprintforallTinlineartime.Hereisaderivationofarecursion.StartbywritingthefootprintforwindowT+1:
$$\mathit{fp}(T+1) = \frac{1}{N-T} \sum_{t=T+1}^{N} w(t,T+1) = \frac{1}{N-T} \left[ \sum_{t=T}^{N} w(t,T+1) - w(T,T+1) \right]$$
Thetermsinvolvingw(t,T+1)canbereducedtotermsinvolvingw(t,T)usingtheworkingsetsizerecursiondevelopedearlier.Recallthat,whenTisincreasedtoT+1,alltherunsendinginareuseinterval>Tarelengthenedby1.Becauseeveryrunbeginswithapagemiss,andallthefirstreferencesintheinterval[1,…,T-1]startrunsthatextendinto[T,…,N],thenumberof1saddedtothepagereferencemapintheinterval[T,…,N]ismc(T).Inotherwords,thesamerecursionappliestotheextendedsum,withthesameendcorrectionsasbefore:
$$\sum_{t=T}^{N} w(t,T+1) = \sum_{t=T}^{N} w(t,T) + \mathit{mc}(T) - e(T)$$
As earlier, e(T) is the number of last references occurring in the last T time units. Define f(T) as the number of first references in the first T time units, and notice that f(T) = w(T,T+1). The footprint formula becomes
$$\mathit{fp}(T+1) = \frac{1}{N-T} \left[ \sum_{t=T}^{N} w(t,T) + \mathit{mc}(T) - e(T) - f(T) \right]$$
Applyingthedefinitionoffp(T)andrecallingthatm(T)=mc(T)/N,thisbecomesthedesiredrecursion,
$$\mathit{fp}(T+1) = \frac{N-T+1}{N-T}\, \mathit{fp}(T) + \frac{N}{N-T}\, m(T) - \frac{e(T) + f(T)}{N-T}$$
Thismessyexpressioncanbecomputedinlineartimebecausemc(T),e(T),andf(T)areallcomputableinlineartime.
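The recursion can be checked numerically against the definition. The sketch below is an illustration under the conventions of this Appendix: mc(T) counts references whose reuse interval exceeds T, e(T) counts distinct pages in the last T references, and f(T) counts distinct pages in the first T references. It uses the unnormalized form with mc(T) = N·m(T) and starts from fp(1) = 1, which holds because every window of length 1 contains exactly one page. Function names are chosen for this example.

```python
from fractions import Fraction


def footprint_direct(trace, T):
    """fp(T) from the definition: mean of w(t,T) over full windows."""
    N = len(trace)
    total = sum(len(set(trace[t - T:t])) for t in range(T, N + 1))
    return Fraction(total, N - T + 1)


def footprint_by_recursion(trace):
    """fp(T) for T = 1..N using the recursion; assumes a nonempty trace."""
    N = len(trace)
    # mc[T]: references whose backward reuse interval exceeds T
    last_seen, hist = {}, [0] * (N + 1)
    for t, page in enumerate(trace, start=1):
        if page in last_seen:
            hist[t - last_seen[page]] += 1
        last_seen[page] = t
    mc, hits = [], 0
    for T in range(N + 1):
        hits += hist[T]
        mc.append(N - hits)
    # f[T]: distinct pages in the first T references
    f, seen = [0], set()
    for page in trace:
        seen.add(page)
        f.append(len(seen))
    # e[T]: distinct pages in the last T references
    e, seen = [0], set()
    for page in reversed(trace):
        seen.add(page)
        e.append(len(seen))
    fp = {1: Fraction(1)}
    for T in range(1, N):
        fp[T + 1] = ((N - T + 1) * fp[T] + mc[T] - e[T] - f[T]) / (N - T)
    return fp
```

All three auxiliary arrays are built in a single pass each, so the whole footprint curve costs time linear in N once the trace has been scanned.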
ChenDingwasnothappywiththemessinessofalgorithmsforcomputingfootprint.Heandhisstudentsdefinedanewmeasuretheycalledfootmark.FootmarkwouldbelikeWSbutwouldcontainadditionaltermsthatcompensatedfortheshortworkingsetwindowsduringtheinitialsegmentofthetrace.Theydividedanaddresstraceintothreeregions:
• TheinitialsegmentoflengthT-1.Inthisintervaltheworkingsetsizeshavetheformw(t,t),effectivelyawindowsmallerthanT.
• The second segment has length N-T+1. In this segment the working set sizes are of the form w(t,T). This segment includes all the windows of length T, as in footprint.
• ThethirdsegmentisthelastT-1referencesofthetrace.Inthissegmentaseriesofphantomworkingsetsofsizesw(N,k)aredefinedwithprogressivelyshorterwindowsklookingbackfromtheendofthetrace.Theyare“phantoms”becausetheyarenotactuallyobservedinarealcache.Theideaisw(N,k)canbepairedwithw(T-k,T-k)intheinitialsegment;everypairspansafullwindowlengthT.Afterthepairing,theshort-windowworkingsetsoftheinitialsegmentarereplacedbyfull-windowworkingsets.TheneteffectisthatNworkingsetswithwindowTdefinefootmark.
Theseideasproducedthefollowingdefinitionoffootmarkspace-timeFM(T):
$$\mathit{FM}(T) = \sum_{t=1}^{T-1} w(t,t) + \sum_{t=T}^{N} w(t,T) + \sum_{k=1}^{T-1} w(N,k)$$
ThepairingargumentsaysthatthefirstandthirdsumscanbereplacedwithasinglesumofT-1termsw(t,t)+w(N,t),sothateveryworkingsetinFM(T)haswindowT.Therefore,thefootmarkisfm(T)=FM(T)/N.
Becausethefirsttwosumsarethedefinitionofworkingsetspace-time,
$$\mathit{FM}(T) = \mathit{st}(T) + \sum_{k=1}^{T-1} w(N,k)$$
Wecannowapplytheworkingsetrecursiontodefineafootmarkrecursion:
$$\mathit{FM}(T+1) = \mathit{st}(T+1) + \sum_{k=1}^{T} w(N,k) = \mathit{st}(T) + \mathit{mc}(T) - e(T) + \sum_{k=1}^{T-1} w(N,k) + w(N,T)$$
where e(T) is the number of end intervals ≤ T. Now consider the working set w(N,T). A page is visible in that working set if and only if its final reference occurs at or after time N-T+1. Thus the contents of that working set are precisely the pages whose end intervals are ≤ T, which is the definition of e(T). This cancels the e(T) term. The st(T) and sum terms combine into the definition of FM(T). Thus
$$\mathit{FM}(T+1) = \mathit{FM}(T) + \mathit{mc}(T)$$
When all terms are divided by N, we get the footmark recursion:
$$\mathit{fm}(T+1) = \mathit{fm}(T) + m(T)$$
wherefm(T)isthefootmarkforwindowsizeTandm(T)istheworkingsetmissrate.Theinitialconditionsarefm(0)=0andm(0)=1.