the art and science of memory allocation - sbuporter/courses/cse506/s16/slides/malloc.pdf · the art...

Post on 24-Mar-2018


CSE 506: Operating Systems

The Art and Science of Memory Allocation
Don Porter


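Logical Diagram

[Figure: course map with User, Kernel, and Hardware layers. Components shown: Binary Formats, Memory Allocators, Threads, System Calls, RCU, File System, Networking, Sync, Memory Management, Device Drivers, CPU Scheduler, Consistency, Interrupts, Disk, Net. "Today’s Lecture" marks Memory Allocators.]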


Lecture goal
•  This lecture is about allocating small objects
   –  Future lectures will talk about allocating physical pages
•  Understand how memory allocators work
   –  In both kernel and applications
•  Understand trade-offs and current best practices


Big Picture

    int main () {
        struct foo *x = malloc(sizeof(struct foo));
        ...
    }

    void *malloc(size_t n) {
        if (heap empty)
            mmap();   // add pages to heap
        find a free block of size n;
    }

[Figure: virtual address space from 0 to 0xffffffff, with regions labeled Code (.text), heap, stack, and libc.so; the heap starts out (empty), and malloc looks for a free block of size n in it.]


Today’s Lecture
•  How to implement malloc() or new
   –  Note that new is essentially malloc + constructor
   –  malloc() is part of libc, and executes in the application
•  malloc() gets pages of memory from the OS via mmap() and then sub-divides them for the application
•  The next lecture will talk about how the kernel manages physical pages
   –  For internal use, or to allocate to applications


Bump allocator
•  malloc(6)
•  malloc(12)
•  malloc(20)
•  malloc(5)


Bump allocator
•  Simply “bumps” up the free pointer
•  How does free() work? It doesn’t
   –  Well, you could try to recycle cells if you wanted, but complicated bookkeeping
•  Controversial observation: This is ideal for simple programs
   –  You only care about free() if you need the memory for something else
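A minimal sketch of a bump allocator, to make the "bump the free pointer" idea concrete. The fixed 4 KB arena, the 8-byte rounding, and the names bump_alloc and bump_free are assumptions made for illustration; a real allocator would get more pages with mmap() when the arena runs out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size arena; a real allocator would grow it with mmap(). */
#define ARENA_SIZE 4096
#define ALIGNMENT  8   /* assumed alignment; must be a power of two */

static _Alignas(16) unsigned char arena[ARENA_SIZE];
static size_t bump_offset = 0;   /* index of the first free byte */

/* Return n bytes from the arena, or NULL when the arena is exhausted. */
void *bump_alloc(size_t n)
{
    assert((ALIGNMENT & (ALIGNMENT - 1)) == 0);
    if (n > ARENA_SIZE)
        return NULL;
    /* Round the request up so the next block stays aligned. */
    size_t rounded = (n + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1);
    if (rounded > ARENA_SIZE - bump_offset)
        return NULL;
    void *p = &arena[bump_offset];
    bump_offset += rounded;   /* "bump" the free pointer */
    return p;
}

/* free() is a no-op: a bump allocator never recycles memory. */
void bump_free(void *p)
{
    (void)p;
}
```

With these assumptions, the requests from the previous slide (6, 12, 20, and 5 bytes) consume 8, 16, 24, and 8 bytes of the arena, and nothing is ever handed back.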


Assume memory is limited
•  Hoard: best-of-breed concurrent allocator
   –  User applications
   –  Seminal paper
•  We’ll also talk about how Linux allocates its own memory


Overarching issues
•  Fragmentation
•  Allocation and free latency
   –  Synchronization/Concurrency
•  Implementation complexity
•  Cache behavior
   –  Alignment (cache and word)
   –  Coloring


Fragmentation
•  Undergrad review: What is it? Why does it happen?
•  What is
   –  Internal fragmentation?
      •  Wasted space when you round an allocation up
   –  External fragmentation?
      •  When you end up with small chunks of free memory that are too small to be useful
•  Which kind does our bump allocator have?


Hoard: Superblocks
•  At a high level, allocator operates on superblocks
   –  Chunk of (virtually) contiguous pages
   –  All objects in a superblock are the same size
•  A given superblock is treated as an array of same-sized objects
   –  They generalize to “powers of b > 1”;
   –  In usual practice, b == 2


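Superblock intuition

[Figure: a 256-byte object heap built from 4 KB pages. A “Free” pointer heads a chain of “next” pointers through the free objects; the rest of the last page is free space.]

Free list in LIFO order
Each page an array of objects
Store list pointers in free objects!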
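The following sketch shows one way to store the list pointers inside the free objects themselves. It assumes a single 4 KB superblock of 256-byte objects, and it keeps the list head in a separate field rather than inside the page; the names union object, sb_init, sb_alloc, and sb_free are invented for this illustration and are not Hoard's.

```c
#include <assert.h>
#include <stddef.h>

#define SB_SIZE  4096   /* assumed superblock size: one 4 KB page */
#define OBJ_SIZE 256    /* every object in this superblock has this size */
#define SB_NOBJ  (SB_SIZE / OBJ_SIZE)

/* While an object is free, its first bytes hold the link to the next free one. */
union object {
    union object *next;              /* valid only while the object is free */
    unsigned char bytes[OBJ_SIZE];   /* payload while the object is allocated */
};

struct superblock {
    union object *free_list;         /* head of the LIFO free list */
    union object objects[SB_NOBJ];   /* the page, viewed as an array of objects */
};

/* Thread every object onto the free list, lowest address first. */
void sb_init(struct superblock *sb)
{
    sb->free_list = NULL;
    for (size_t i = SB_NOBJ; i-- > 0; ) {
        sb->objects[i].next = sb->free_list;
        sb->free_list = &sb->objects[i];
    }
}

/* Pop the object at the head of the list, or NULL if none is free. */
void *sb_alloc(struct superblock *sb)
{
    union object *o = sb->free_list;
    if (o != NULL)
        sb->free_list = o->next;
    return o;
}

/* Push the object on the front of the list, so it is reused first (LIFO). */
void sb_free(struct superblock *sb, void *p)
{
    union object *o = p;
    assert(p != NULL);
    o->next = sb->free_list;
    sb->free_list = o;
}
```

Because the link lives in memory that is otherwise unused, the free list costs no extra space beyond the head pointer.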


Superblock Intuition

malloc (8);

1)  Find the nearest power of 2 heap (8)
2)  Find free object in superblock
3)  Add a superblock if needed. Go to 2.


malloc(200)

[Figure: a 256-byte object heap of 4 KB pages, with a “Free” pointer and “next” links through the free objects, followed by free space.]

Pick first free object


Superblock example
•  Suppose my program allocates objects of sizes:
   –  4, 5, 7, 34, and 40 bytes.
•  How many superblocks do I need (if b == 2)?
   –  3 (4, 8, and 64 byte chunks)
•  If I allocate a 5 byte object from an 8 byte superblock, doesn’t that yield internal fragmentation?
   –  Yes, but it is bounded to < 50%
   –  Give up some space to bound worst case and complexity
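The rounding in this example can be sketched as follows, for b == 2. The 4-byte minimum class and the function names size_class and internal_waste are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_CLASS 4   /* assumed smallest size class, in bytes */

/* Round a request up to the nearest power of two (b == 2). */
size_t size_class(size_t n)
{
    size_t c = MIN_CLASS;
    assert(n > 0 && n <= ((size_t)1 << 30));   /* keeps the loop below bounded */
    while (c < n)
        c <<= 1;
    return c;
}

/* Bytes wasted inside the object: the internal fragmentation. */
size_t internal_waste(size_t n)
{
    return size_class(n) - n;
}
```

Sizes 4, 5, 7, 34, and 40 map to classes 4, 8, 8, 64, and 64, which gives the three superblocks from the slide. For any request above the minimum class, the chosen class is less than twice the request, so the waste stays under half of the object.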


High-level strategy
•  Allocate a heap for each processor, and one shared heap
   –  Note: not threads, but CPUs
   –  Can only use as many heaps as CPUs at once
   –  Requires some way to figure out current processor
•  Try per-CPU heap first
•  If no free blocks of right size, then try global heap
   –  Why try this first?
•  If that fails, get another superblock for per-CPU heap
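The order of the three steps above can be sketched with a toy model in which each heap only counts the free objects of one size class. This is an illustration of the decision order, not Hoard's data structures; struct heap, pick_source, and the enum values are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

enum source { FROM_CPU_HEAP, FROM_GLOBAL_HEAP, FROM_NEW_SUPERBLOCK };

/* Toy heap for one size class: it only counts its free objects. */
struct heap {
    size_t free_objects;
};

/* Decide where one allocation comes from, and update the counts. */
enum source pick_source(struct heap *cpu_heap, struct heap *global_heap,
                        size_t objs_per_superblock)
{
    assert(objs_per_superblock > 0);
    if (cpu_heap->free_objects > 0) {        /* 1) per-CPU heap first */
        cpu_heap->free_objects--;
        return FROM_CPU_HEAP;
    }
    if (global_heap->free_objects > 0) {     /* 2) then the global heap */
        global_heap->free_objects--;
        return FROM_GLOBAL_HEAP;
    }
    /* 3) grow the per-CPU heap by one superblock and use one object from it */
    cpu_heap->free_objects += objs_per_superblock - 1;
    return FROM_NEW_SUPERBLOCK;
}
```

Trying the global heap before asking the OS for memory lets free space that already exists be reused, which keeps the total footprint down.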


Example: malloc() on CPU 0

[Figure: CPU 0 Heap, CPU 1 Heap, and the Global Heap.]

First, try per-CPU heap
Second, try global heap
If global heap full, grow per-CPU heap


Big objects
•  If an object size is bigger than half the size of a superblock, just mmap() it
   –  Recall, a superblock is on the order of pages already
•  What about fragmentation?
   –  Example: 4097 byte object (1 page + 1 byte)
   –  Argument: More trouble than it is worth
      •  Extra bookkeeping, potential contention, and potential bad cache behavior
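The arithmetic behind the 4097-byte example, as a small sketch. The 4 KB page and 8 KB superblock sizes are assumptions (the deck uses both figures elsewhere), and the function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_BYTES       4096u   /* assumed page size */
#define SUPERBLOCK_BYTES 8192u   /* assumed superblock size: 2 pages */

/* Objects larger than half a superblock bypass the superblocks. */
bool is_big_object(size_t n)
{
    return n > SUPERBLOCK_BYTES / 2;
}

/* Whole pages needed to hold n bytes when the object is mmap()ed directly. */
size_t pages_needed(size_t n)
{
    assert(n > 0);
    return (n + PAGE_BYTES - 1) / PAGE_BYTES;
}

/* Bytes left unused in the last page. */
size_t mmap_waste(size_t n)
{
    return pages_needed(n) * PAGE_BYTES - n;
}
```

Under these assumptions a 4097-byte object counts as big, takes 2 pages, and leaves 4095 bytes unused, which is the fragmentation the slide argues is not worth fighting.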


Memory free
•  Simply put back on free list within its superblock
•  How do you tell which superblock an object is from?
   –  Suppose superblock is 8k (2 pages)
      •  And always mapped at an address evenly divisible by 8k
   –  Object at address 0x431a01c
   –  Just mask out the low 13 bits!
   –  Came from a superblock that starts at 0x431a000
•  Simple math can tell you where an object came from!
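The masking step can be written out directly. This sketch assumes the 8 KB superblock size from the slide; superblock_base is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

#define SUPERBLOCK_SIZE 8192u   /* 8 KB = 2^13 bytes; must be a power of two */

/* Clear the low 13 bits to get the start of the enclosing superblock. */
uintptr_t superblock_base(uintptr_t addr)
{
    assert((SUPERBLOCK_SIZE & (SUPERBLOCK_SIZE - 1)) == 0);
    return addr & ~(uintptr_t)(SUPERBLOCK_SIZE - 1);
}
```

For the address on the slide, superblock_base(0x431a01c) yields 0x431a000. This only works because every superblock is mapped at an address that is a multiple of its size.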


LIFO
•  Why are objects re-allocated most-recently used first?
   –  Aren’t all good OS heuristics FIFO?
   –  More likely to be already in cache (hot)
   –  Recall from undergrad architecture that it takes quite a few cycles to load data into cache from memory
   –  If it is all the same, let’s try to recycle the object already in our cache


Hoard Simplicity
•  The bookkeeping for alloc and free is straightforward
   –  Many allocators are quite complex (looking at you, slab)
•  Overall: (# CPUs + 1) heaps
   –  Per heap: 1 list of superblocks per object size (2^2 to 2^11)
   –  Per superblock:
      •  Need to know which/how many objects are free
         –  LIFO list of free blocks


CPU 0 Heap, Illustrated

[Figure: an array of free lists indexed by order: 2, 3, 4, 5, ..., 11.]

One of these per CPU (and one shared)
Free list: LIFO order
Some sizes can be empty


Locking
•  On alloc and free, lock superblock and per-CPU heap
•  Why?
   –  An object can be freed from a different CPU than it was allocated on
•  Alternative:
   –  We could add more bookkeeping for objects to move to local superblock
   –  Reintroduce fragmentation issues and lose simplicity


How to find the locks?
•  Again, page alignment can identify the start of a superblock
•  And each superblock keeps a small amount of metadata, including the heap it belongs to
   –  Per-CPU or shared Heap
   –  And heap includes a lock
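A sketch of how a free() might reach the right lock, under several assumptions: superblocks are 8 KB and aligned to 8 KB, the metadata sits at the start of each superblock, and the heap lock is a simple C11 spin lock. All struct and function names here are hypothetical, and for brevity the sketch takes only the heap lock, whereas the slides describe locking the superblock as well.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SB_SIZE 8192u   /* assumed superblock size; must be a power of two */

struct heap {
    atomic_flag lock;    /* spin lock protecting this heap */
    size_t free_count;   /* free objects across the heap's superblocks */
};

/* Metadata kept at the start of every superblock. */
struct sb_header {
    struct heap *owner;  /* per-CPU or shared heap this superblock belongs to */
    void *free_list;     /* LIFO list threaded through the free objects */
};

/* Allocate an SB_SIZE-aligned superblock owned by the given heap. */
struct sb_header *sb_create(struct heap *owner)
{
    struct sb_header *sb = aligned_alloc(SB_SIZE, SB_SIZE);
    if (sb != NULL) {
        sb->owner = owner;
        sb->free_list = NULL;
    }
    return sb;
}

/* Alignment alone identifies the superblock that contains obj. */
struct sb_header *sb_of(void *obj)
{
    return (struct sb_header *)((uintptr_t)obj & ~(uintptr_t)(SB_SIZE - 1));
}

/* Free obj from any CPU: find its heap through the superblock, then lock it. */
void locked_free(void *obj)
{
    struct sb_header *sb = sb_of(obj);
    struct heap *h = sb->owner;

    assert(obj != (void *)sb);   /* the header itself is never handed out */
    while (atomic_flag_test_and_set_explicit(&h->lock, memory_order_acquire))
        ;                        /* spin until the lock is ours */
    *(void **)obj = sb->free_list;
    sb->free_list = obj;
    h->free_count++;
    atomic_flag_clear_explicit(&h->lock, memory_order_release);
}
```

The caller never has to say which heap an object came from; the address leads to the superblock, and the superblock leads to the heap and its lock.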


Locking performance
•  Acquiring and releasing a lock generally requires an atomic instruction
   –  Tens to a few hundred cycles vs. a few cycles
•  Waiting for a lock can take thousands
   –  Depends on how good the lock implementation is at managing contention (spinning)
   –  Blocking locks require many hundreds of cycles to context switch


Performance argument
•  Common case: allocations and frees are from per-CPU heap
•  Yes, grabbing a lock adds overheads
   –  But better than the fragmented or complex alternatives
   –  And locking hurts scalability only under contention
•  Uncommon case: all CPUs contend to access one heap
   –  Had to all come from that heap (only frees cross heaps)
   –  Bizarre workload, probably won’t scale anyway


Cache line alignment
•  Lines are the basic unit at which memory is cached
•  Cache lines are bigger than words
   –  Word: 32-bits or 64-bits
   –  Cache line: 64 to 128 bytes on most CPUs


Undergrad Architecture Review

[Figure: CPU 0 issues ldw 0x1008 and takes a cache miss; the line at 0x1000 is fetched from RAM over the memory bus.]

CPU loads one word (4 bytes)
Cache operates at line granularity (64 bytes)


Cache Coherence (1)

[Figure: CPU 0 and CPU 1, each with its own cache, share RAM over the memory bus. The line at 0x1000 is cached; CPU 1 issues ldw 0x1010.]

Lines shared for reading have a shared lock


Cache Coherence (2)

[Figure: the two-CPU setup with caches, memory bus, and RAM; ldw 0x1010 and stw 0x1000 both target the line at 0x1000.]

Lines to be written have an exclusive lock
Copies of line evicted


Simple coherence model
•  When a memory region is cached, CPU automatically acquires a reader-writer lock on that region
   –  Multiple CPUs can share a read lock
   –  Write lock is exclusive
•  Programmer can’t control how long these locks are held
   –  Ex: a store from a register holds the write lock long enough to perform the write; held from there until the next CPU wants it


False sharing

[Figure: one cache line holding Object foo (CPU 0 writes) and Object bar (CPU 1 writes).]

•  These objects have nothing to do with each other
   –  At program level, private to separate threads
•  At cache level, CPUs are fighting for a write lock
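The layout problem can be seen from field offsets alone. This sketch assumes a 64-byte cache line and a struct that itself starts on a line boundary; the struct names and line_of are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumed line size; common, but not universal */

/* foo and bar are adjacent, so they normally land in the same cache line. */
struct unpadded {
    long foo;   /* written only by CPU 0 */
    long bar;   /* written only by CPU 1 */
};

/* Aligning each field to a line boundary gives each its own cache line. */
struct padded {
    _Alignas(CACHE_LINE) long foo;
    _Alignas(CACHE_LINE) long bar;
};

static_assert(sizeof(struct padded) == 2 * CACHE_LINE,
              "each field should occupy a full line");

/* Index of the cache line holding a byte offset within an aligned struct. */
size_t line_of(size_t offset)
{
    return offset / CACHE_LINE;
}
```

In struct unpadded both fields fall in line 0, so writes from the two CPUs keep pulling the line back and forth. In struct padded they fall in lines 0 and 1, at the cost of 128 bytes for two longs.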


False sharing is BAD
•  Leads to pathological performance problems
   –  Super-linear slowdown in some cases
•  Rule of thumb: any performance trend that is more than linear in the number of CPUs is probably caused by cache behavior


Strawman
•  Round everything up to the size of a cache line
•  Thoughts?
   –  Wastes too much memory; a bit extreme


Hoard strategy (pragmatic)
•  Rounding up to powers of 2 helps
   –  Once your objects are bigger than a cache line
•  Locality observation: things tend to be used on the CPU where they were allocated
•  For small objects, always return free to the original heap
   –  Remember idea about extra bookkeeping to avoid synchronization: some allocators do this
      •  Save locking, but introduce false sharing!


Hoard summary
•  Really nice piece of work
•  Establishes nice balance among concerns
•  Good performance results


Part 2: Linux kernel allocators
•  malloc() and friends, but in the kernel
•  Focus today on dynamic allocation of small objects
   –  Later class on management of physical pages
   –  And allocation of page ranges to allocators


kmem_caches
•  Linux has a kmalloc and kfree, but caches preferred for common object types
•  Like Hoard, a given cache allocates a specific type of object
   –  Ex: a cache for file descriptors, a cache for inodes, etc.
•  Unlike Hoard, objects of the same size not mixed
   –  Allocator can do initialization automatically
   –  May also need to constrain where memory comes from


Caches (2)
•  Caches can also keep a certain “reserve” capacity
   –  No guarantees, but allows performance tuning
   –  Example: I know I’ll have ~100 list nodes frequently allocated and freed; target the cache capacity at 120 elements to avoid expensive page allocation
   –  Often called a memory pool
•  Universal interface: can change allocator underneath
•  Kernel has kmalloc and kfree too
   –  Implemented on caches of various powers of 2 (familiar?)
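The “reserve” idea can be illustrated with a user-space memory pool. This is not the kernel's kmem_cache interface; it is a sketch on top of malloc() and free(), and every name in it (struct pool, pool_init, pool_alloc, pool_free) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct pool_node {
    struct pool_node *next;
};

struct pool {
    size_t obj_size;              /* size of each object in bytes */
    size_t reserve;               /* most free objects the pool will hold on to */
    size_t cached;                /* free objects currently held */
    struct pool_node *free_list;  /* LIFO list of cached objects */
};

void pool_init(struct pool *p, size_t obj_size, size_t reserve)
{
    assert(obj_size >= sizeof(struct pool_node));
    p->obj_size = obj_size;
    p->reserve = reserve;
    p->cached = 0;
    p->free_list = NULL;
}

/* Serve from the reserve when possible; otherwise fall back to malloc(). */
void *pool_alloc(struct pool *p)
{
    struct pool_node *n = p->free_list;
    if (n != NULL) {
        p->free_list = n->next;
        p->cached--;
        return n;
    }
    return malloc(p->obj_size);
}

/* Keep the object if the reserve has room; otherwise give it back. */
void pool_free(struct pool *p, void *obj)
{
    if (p->cached < p->reserve) {
        struct pool_node *n = obj;
        n->next = p->free_list;
        p->free_list = n;
        p->cached++;
    } else {
        free(obj);
    }
}
```

With the reserve set to 120, a workload that keeps about 100 list nodes in flight would mostly be served from the pool's own list instead of going back to the underlying allocator.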


Superblocks to slabs
•  The default cache allocator (at least as of early 2.6) was the slab allocator
•  Slab is a chunk of contiguous pages, similar to a superblock in Hoard
•  Similar basic ideas, but substantially more complex bookkeeping
   –  The slab allocator came first, historically


Complexity backlash
•  I’ll spare you the details, but slab bookkeeping is complicated
•  2 groups upset: (guess who?)
   –  Users of very small systems
   –  Users of large multi-processor systems


Small systems
•  Think 4MB of RAM on a small device (thermostat)
•  As system memory gets tiny, the bookkeeping overheads become a large percent of total system memory
•  How bad is fragmentation really going to be?
   –  Note: not sure this has been carefully studied; may just be intuition


SLOB allocator
•  Simple List Of Blocks
•  Just keep a free list of each available chunk and its size
•  Grab the first one big enough to work
   –  Split block if leftover bytes
•  No internal fragmentation, obviously
•  External fragmentation? Yes. Traded for low overheads
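A first-fit free list with splitting can be sketched in a few lines. This is a user-space illustration, not the kernel's SLOB code. It assumes a fixed arena, sizes rounded up to header-sized units (so a few bytes per request can still be wasted), a free call that is told the size, and no coalescing of neighbors. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Header stored at the start of every free chunk. */
struct chunk {
    size_t units;         /* chunk size, in units of sizeof(struct chunk) */
    struct chunk *next;   /* next free chunk */
};

#define ARENA_UNITS 256
static struct chunk arena[ARENA_UNITS];
static struct chunk *free_list;

static size_t units_for(size_t bytes)
{
    size_t u = (bytes + sizeof(struct chunk) - 1) / sizeof(struct chunk);
    return u == 0 ? 1 : u;
}

/* Start with one free chunk covering the whole arena. */
void slob_init(void)
{
    arena[0].units = ARENA_UNITS;
    arena[0].next = NULL;
    free_list = &arena[0];
}

/* First fit: take the first chunk that is big enough, splitting off the rest. */
void *slob_alloc(size_t bytes)
{
    if (bytes > ARENA_UNITS * sizeof(struct chunk))
        return NULL;
    size_t need = units_for(bytes);
    for (struct chunk **link = &free_list; *link != NULL; link = &(*link)->next) {
        struct chunk *c = *link;
        if (c->units < need)
            continue;
        if (c->units == need) {
            *link = c->next;               /* exact fit: unlink the chunk */
        } else {
            struct chunk *rest = c + need; /* split: the tail stays on the list */
            rest->units = c->units - need;
            rest->next = c->next;
            *link = rest;
        }
        return c;
    }
    return NULL;
}

/* Put the chunk back at the head of the list; neighbors are not merged. */
void slob_free(void *p, size_t bytes)
{
    struct chunk *c = p;
    assert(p != NULL);
    c->units = units_for(bytes);
    c->next = free_list;
    free_list = c;
}
```

Freeing a 100-byte chunk and then allocating 40 bytes reuses the front of that chunk and leaves a small remainder on the list, which is the external fragmentation the slide accepts in exchange for very little bookkeeping.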


Large systems
•  For very large (thousands of CPU) systems, complex allocator bookkeeping gets out of hand
•  Example: slabs try to migrate objects from one CPU to another to avoid synchronization
   –  Per-CPU * Per-CPU bookkeeping


SLUB Allocator
•  The Unqueued Slab Allocator
•  A much more Hoard-like design
   –  All objects of same size from same slab
   –  Simple free list per slab
   –  No cross-CPU nonsense
•  Now the default Linux cache allocator


Conclusion
•  Different allocation strategies have different trade-offs
   –  No one, perfect solution
•  Allocators try to optimize for multiple variables:
   –  Fragmentation, low false conflicts, speed, multi-processor scalability, etc.
•  Understand tradeoffs: Hoard vs. Slab vs. SLOB


Misc notes
•  When is a superblock considered free and eligible to be moved to the global bucket?
   –  See figure 2, free(), line 9
   –  Essentially a configurable “empty fraction”
•  Is a "used block" count stored somewhere?
   –  Not clear, but probably
