what is some existing documentation on linux memory management

7/24/2019 What is Some Existing Documentation on Linux Memory Management


What is some existing documentation on Linux memory management?

Ulrich Drepper (the ex-glibc maintainer) wrote an article series called "What every programmer should know about memory":

Part 1: http://lwn.net/Articles/250967/
Part 2: http://lwn.net/Articles/252125/
Part 3: http://lwn.net/Articles/253361/
Part 4: http://lwn.net/Articles/254445/
Part 5: http://lwn.net/Articles/255364/
Part 6: http://lwn.net/Articles/256433/
Part 7: http://lwn.net/Articles/257209/
Part 8: http://lwn.net/Articles/258154/
Part 9: http://lwn.net/Articles/258188/

Mel Gorman's book "Understanding the Linux Virtual Memory Manager" is available online:

http://kernel.org/doc/gorman/

What is virtual memory?

Virtual memory provides a software-controlled set of memory addresses, allowing each process to have its own unique view of a computer's memory.

Virtual addresses only make sense within a given context, such as a specific process. The same virtual address can simultaneously mean different things in different contexts.

Virtual addresses are the size of a CPU register. On 32-bit systems each process has 4 gigabytes of virtual address space all to itself, which is often more memory than the system actually has.

Virtual addresses are interpreted by a processor's Memory Management Unit (MMU), using data structures called page tables which map virtual address ranges to associated content.

Virtual memory is used to implement lazy allocation, swapping, file mapping, copy-on-write shared memory, defragmentation, and more.
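A toy model makes the "same address, different meaning" point concrete. In this minimal sketch each process has its own dictionary that maps virtual page numbers to physical frame numbers. The `Process` class, the 4096-byte page size, and the frame numbers are all assumptions made for illustration. Real page tables are hardware-defined structures, not dictionaries.

```python
PAGE_SIZE = 4096  # assumed page size (typical on x86)

class Process:
    """Toy process: its page table maps virtual page numbers to physical frames."""

    def __init__(self, name, page_table):
        self.name = name
        self.page_table = page_table

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        return self.page_table[vpn] * PAGE_SIZE + offset  # KeyError means "unmapped"

# Hypothetical physical memory: frame number -> what is stored there.
physical = {7: "A's data", 42: "B's data"}
a = Process("A", {0x10: 7})
b = Process("B", {0x10: 42})

vaddr = 0x10123  # the same virtual address in both processes
for proc in (a, b):
    paddr = proc.translate(vaddr)
    print(f"{proc.name}: {vaddr:#x} -> {paddr:#x} ({physical[paddr // PAGE_SIZE]})")
```

Both processes use virtual address 0x10123. A's page table sends it to physical 0x7123, and B's sends it to 0x2a123.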

For details, see Ulrich Drepper's "What every programmer should know about memory, Part 3: Virtual Memory":

http://lwn.net/Articles/253361/

What is physical memory?

Physical memory is storage hardware that records data with low latency and small granularity. Physical memory addresses are numbers sent across a memory bus to identify the specific memory cell within a piece of storage hardware associated with a given read or write operation.

AM), SD memory cards (flash), video cards (frame buffers and texture memory), network cards (I/O buffers), and so on.

Only the kernel uses physical memory addresses directly. Userspace programs exclusively use virtual addresses.



For details, see the Ars Technica DRAM Guide:

http://arstechnica.com/paedia/r/ram_guide/ram_guide.part1-1.html
http://arstechnica.com/paedia/r/ram_guide/ram_guide.part2-1.html
http://arstechnica.com/paedia/r/ram_guide/ram_guide.part3-1.html

And Ulrich Drepper's "What every programmer should know about memory, Part 1":

http://lwn.net/Articles/250967/

What is a Memory Management Unit (MMU)?

The memory management unit is the part of the CPU that interprets virtual addresses. Attempts to read, write, or execute memory at virtual addresses are either translated to corresponding physical addresses, or else generate an interrupt (page fault) to allow software to respond to the attempted access.

This gives each process its own virtual memory address range, which is limited only by address space (4 gigabytes on most 32-bit systems), while physical memory is limited by the amount of available storage hardware.

Physical memory addresses are unique in the system; virtual memory addresses are unique per-process.
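The MMU's translate-or-fault behavior can be sketched as a single function. This is a toy model that assumes 4 KiB pages and a hypothetical `PTE` record holding a frame number and an "rwx"-style permission string. `PageFault` is a Python exception standing in for the hardware interrupt.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size

class PageFault(Exception):
    """Stands in for the interrupt the MMU raises when it cannot complete an access."""

@dataclass
class PTE:
    frame: int   # physical frame number
    perms: str   # allowed access kinds, a subset of "rwx"

def mmu_access(page_table, vaddr, kind):
    """Translate vaddr for an access of kind "r", "w" or "x", or raise PageFault."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    pte = page_table.get(vpn)
    if pte is None:
        raise PageFault(f"no mapping for virtual page {vpn:#x}")
    if kind not in pte.perms:
        raise PageFault(f"{kind!r} access denied on virtual page {vpn:#x}")
    return pte.frame * PAGE_SIZE + offset

# Hypothetical page table: a read/execute code page and a read/write data page.
table = {0x400: PTE(frame=3, perms="rx"), 0x600: PTE(frame=9, perms="rw")}

print(hex(mmu_access(table, 0x400010, "x")))   # translated: 0x3010
for vaddr, kind in ((0x400010, "w"), (0x500000, "r")):
    try:
        mmu_access(table, vaddr, kind)
    except PageFault as fault:
        print("page fault:", fault)
```

In this model, writing to the code page and reading an unmapped address both end in the same place: a fault that software has to handle.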

What are page tables?

Page tables are data structures which contain a process's list of memory mappings and track associated resources.



A page table can be thought of as a description of a set of memory mappings.
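On classic 32-bit x86 (without PAE), the page table is a two-level tree:

- the top 10 bits of a virtual address select a page directory entry;
- the next 10 bits select an entry in that page table;
- the low 12 bits are the offset within a 4 KiB page.

This sketch decodes an address that way and walks a toy directory built from nested dictionaries. The `walk` helper and the dictionary layout are illustrative, not kernel code. Second-level tables only need to exist for regions that are actually mapped. x86-64 and most other architectures use more levels.

```python
def split_x86_32(vaddr):
    """Split a 32-bit virtual address as classic (non-PAE) x86 paging does."""
    if not 0 <= vaddr < 2**32:
        raise ValueError("not a 32-bit address")
    directory_index = vaddr >> 22          # top 10 bits: entry in the page directory
    table_index = (vaddr >> 12) & 0x3FF    # next 10 bits: entry in that page table
    offset = vaddr & 0xFFF                 # low 12 bits: byte within the 4 KiB page
    return directory_index, table_index, offset

def walk(page_directory, vaddr):
    """Toy two-level walk over nested dicts: directory -> page table -> frame number."""
    d, t, offset = split_x86_32(vaddr)
    page_table = page_directory.get(d)
    if page_table is None or t not in page_table:
        return None                        # hardware would raise a page fault here
    return (page_table[t] << 12) | offset

# Hypothetical mapping of one page: directory entry 0x300, table entry 0x101 -> frame 0x1F.
directory = {0x300: {0x101: 0x1F}}
print([hex(part) for part in split_x86_32(0xC0101234)])  # ['0x300', '0x101', '0x234']
print(hex(walk(directory, 0xC0101234)))                  # 0x1f234
```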



physical memory has been allocated to it yet.

Attempts to write to the page trigger the normal copy-on-write mechanism in the page fault handler, allocating fresh memory only when needed to allow the write to proceed. (Note: prezeroing optimizations change the implementation details here, but the theory's the same.) Thus "dirtying" anonymous pages allocates physical memory; the actual allocation call only allocates virtual memory.

Dirty anonymous pages can be written to swap space, but in the absence of swap they remain "pinned" in physical memory.

Anonymous mappings may be created by passing the MAP_ANONYMOUS flag to mmap().
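Lazy allocation of anonymous memory can be observed from Python's `mmap` module on Linux. This minimal sketch does three things:

- it maps private anonymous memory, passing `fileno=-1` so that CPython adds MAP_ANONYMOUS itself;
- it checks that the memory reads back as zeros;
- it writes one byte per page and reports how much the process's minor (soft) fault counter from `resource.getrusage()` grew.

The helper name `touch_anonymous` is hypothetical. The exact fault count depends on the kernel and environment, and some sandboxes do not report it, so treat the number as indicative only.

```python
import mmap
import resource

def touch_anonymous(pages):
    """Map private anonymous memory, confirm it reads as zeros, then dirty each page.

    Returns (zero_filled, minor_faults), where minor_faults is how much this
    process's minor (soft) page fault counter grew while the pages were written.
    """
    size = pages * mmap.PAGESIZE
    # fileno=-1 asks for anonymous memory: CPython adds MAP_ANONYMOUS to the flags.
    mem = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE,
                    prot=mmap.PROT_READ | mmap.PROT_WRITE)
    try:
        # Fresh anonymous memory reads as zeros. On Linux, reading typically maps the
        # shared zero page, so the writes below then take copy-on-write faults.
        zero_filled = mem[:] == bytes(size)
        before = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
        for offset in range(0, size, mmap.PAGESIZE):
            mem[offset] = 0xFF   # first write to a page needs a private physical page
        after = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
        return zero_filled, after - before
    finally:
        mem.close()

zero_filled, faults = touch_anonymous(64)
print(f"read as zeros: {zero_filled}; minor faults while writing 64 pages: {faults}")
```

On a typical Linux system the fault count comes out at roughly one per page written. This shows that physical pages were supplied at write time, not when mmap() returned.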

What is a file backed mapping?

File backed mappings mirror the contents of an existing file. The mapping has some administrative data noting which file to map from, and at which offset, as well as permission bits indicating whether the pages may be read, written, or executed.

When page faults attach new physical pages to such a mapping, the contents of those pages are initialized by reading the contents of the file being mapped, at the appropriate offset for that page.

These physical pages are usually shared with the page cache, the kernel's disk cache of file contents. The kernel caches the contents of files when the page is read, so sharing those cache pages with the process reduces the total number of physical pages required by the system.

Writes to file mappings created with the MAP_PRIVATE flag perform a copy on write, allocating a new local copy of the page to store the changes. These changes are not made visible to other processes, and do not update the on-disk copy of the file.

Note that this means writes to MAP_PRIVATE pages do (the copy in the page cache and the local copy the program needs diverge, so two pages are needed to store them, and flushing the page cache copy back to disk won't free up the local copy of the changed contents).
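The difference between MAP_PRIVATE and MAP_SHARED file mappings is easy to check. This minimal sketch assumes Linux and Python's `mmap` module, and `write_through_mapping` is a hypothetical helper name. It does the following:

- writes a temporary file;
- maps the file with the given flag;
- overwrites the first five bytes through the mapping;
- returns what the file itself contains afterwards.

```python
import mmap
import os
import tempfile

def write_through_mapping(flags):
    """Map a temp file with `flags`, write through the mapping, return the file's bytes."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello, page cache")
        with mmap.mmap(fd, 0, flags=flags, prot=mmap.PROT_READ | mmap.PROT_WRITE) as mem:
            mem[:5] = b"HELLO"          # write through the mapping
            if flags == mmap.MAP_SHARED:
                mem.flush()             # push the dirty shared page back to the file
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.close(fd)
        os.unlink(path)

print(write_through_mapping(mmap.MAP_PRIVATE))  # b'hello, page cache' (private copy, file untouched)
print(write_through_mapping(mmap.MAP_SHARED))   # b'HELLO, page cache'
```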

What is the page cache?

The page cache is the kernel's cache of file contents. It's the main user of virtual memory that doesn't belong to a specific process.



See "What is a file backed mapping" and "What is free memory" for more info.
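On Linux, the current size of the page cache is reported as the `Cached` field of `/proc/meminfo`. That field also counts some shared memory, such as tmpfs. This minimal parser assumes every line has the form `Name: number [kB]`. The helper name `meminfo` is hypothetical.

```python
from pathlib import Path

def meminfo():
    """Parse /proc/meminfo into {field name: integer value}."""
    fields = {}
    for line in Path("/proc/meminfo").read_text().splitlines():
        name, value = line.split(":", 1)
        # Most values are in kB; a few, such as HugePages_Total, are plain counts.
        fields[name] = int(value.split()[0])
    return fields

info = meminfo()
print(f"page cache: {info['Cached']} kB of {info['MemTotal']} kB RAM")
```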

What is CPU cache?

The CPU cache is a very small amount of very fast memory built into a processor, containing temporary copies of data to reduce processing latency.

The L1 cache is a tiny amount of memory (generally between 1 and 64 kilobytes) wired directly into the processor that can be accessed in a single clock cycle. The L2 cache is a larger amount of memory (up to several megabytes) adjacent to the processor, which can be accessed in a small number of clock cycles. Access to uncached memory (across the memory bus) can take dozens, hundreds, or even thousands of clock cycles.

(Note that latency is the issue CPU cache addresses, not throughput. The memory bus can provide a constant stream of memory, but takes a while to start doing so.)
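The latency point can be put in numbers with the usual average-access-time formula for one cache level:

average access time = hit time + miss rate × miss penalty

Hits cost the cache's own latency, and each miss adds the cost of a trip across the memory bus. The cycle counts used here (1 cycle for a hit, 200 for a miss) are hypothetical round numbers, not measurements of any particular CPU.

```python
def effective_latency(hit_cycles, miss_rate, miss_cycles):
    """Average cycles per access for one cache level in front of main memory."""
    return hit_cycles + miss_rate * miss_cycles

# Hypothetical round numbers: 1-cycle cache hit, 200-cycle trip across the memory bus.
for miss_rate in (0.0, 0.01, 0.05, 0.20):
    print(f"{miss_rate:4.0%} misses -> {effective_latency(1, miss_rate, 200):5.1f} cycles per access")
```

With a 1% miss rate the average is 3 cycles, and at 20% it is 41. Even a small fraction of uncached accesses dominates the average.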

For details, see Ulrich Drepper's "What every programmer should know about memory, Part 2":

http://lwn.net/Articles/252125/

What is a Translation Lookaside Buffer (TLB)?

The TLB is a cache for the MMU. All memory in the CPU's L1 cache must have an associated TLB entry, and invalidating a TLB entry flushes the associated cache line(s).

The TLB is a small fixed-size array of recently used pages, which the CPU checks on each memory access. It lists a few of the virtual address ranges to which physical pages are currently assigned.

Accesses to virtual addresses listed in the TLB go directly through to the associated physical memory (or cache pages) without generating page faults (assuming the page permissions allow that category of access). Accesses to virtual addresses not listed in the TLB (a "TLB miss") trigger a page table lookup, which is performed either by hardware, or by the page fault handler, depending on processor type.
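A tiny simulation shows why a small TLB works well while a program's hot pages fit in it, and badly when they do not. This sketch makes several assumptions:

- the TLB is fully associative;
- it uses least-recently-used replacement;
- pages are 4 KiB.

Real TLBs differ in size, associativity, and replacement policy. `TLB` is a hypothetical class, not a kernel or hardware interface.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size

class TLB:
    """Tiny fully associative TLB with least-recently-used replacement (a toy model)."""

    def __init__(self, entries, page_table):
        self.entries = entries
        self.page_table = page_table   # {virtual page number: physical frame number}
        self.cache = OrderedDict()     # vpn -> frame, least recently used first
        self.hits = self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.cache:
            self.hits += 1
            self.cache.move_to_end(vpn)           # mark as most recently used
        else:
            self.misses += 1
            self.cache[vpn] = self.page_table[vpn]  # the page table lookup on a miss
            if len(self.cache) > self.entries:
                self.cache.popitem(last=False)    # evict the least recently used entry
        return self.cache[vpn] * PAGE_SIZE + offset

table = {vpn: vpn + 100 for vpn in range(16)}   # hypothetical page table

hot = TLB(4, table)
for _ in range(10):
    for vpn in range(3):          # 3 hot pages fit in 4 entries
        hot.translate(vpn * PAGE_SIZE)

cold = TLB(4, table)
for _ in range(10):
    for vpn in range(8):          # cycling through 8 pages defeats a 4-entry LRU TLB
        cold.translate(vpn * PAGE_SIZE)

print("hot:", hot.hits, "hits,", hot.misses, "misses")
print("cold:", cold.hits, "hits,", cold.misses, "misses")
```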

For details, see:

http://en.wikipedia.org/wiki/Translation_lookaside_buffer

A 1990s interview with Linus Torvalds describing the i386, PPC, and Alpha TLBs:

http!$$www%linuxournal%com$article$+

What is a page fault handler?

A page fault handler is an interrupt routine, called by the Memory Management Unit in response to an attempt to access virtual memory which did not immediately succeed.



When a program attempts to read, write, or execute memory in a page that hasn't got the appropriate permission bits set in its page table entry to allow that type of access, the instruction generates an interrupt. This calls the page fault handler, which examines the registers and page tables of the interrupted process and determines what action to take to handle the fault.

The page fault handler may respond to a page fault in three ways:

1) The page fault handler can resolve the fault by immediately attaching a page of physical memory to the appropriate page table entry, adjusting the entry, and resuming the interrupted instruction. This is called a "soft fault".

2) When the fault handler can't immediately resolve the fault, it may suspend the interrupted process and switch to another while the system works to resolve the issue. This is called a "hard fault", and results when an I/O operation must be performed to prepare the physical page needed to resolve the fault.

3) If the page fault handler can't resolve the fault, it sends a signal (SIG
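The three outcomes can be summarized as a decision function. This is a toy model, and `handle_fault`, `Outcome`, and the `perms`/`needs_io` keys are hypothetical names:

- `perms` stands for what the mapping allows, so a write to a writable copy-on-write page counts as permitted;
- `needs_io` says whether the page's contents must first be read from a file or from swap.

```python
from enum import Enum

class Outcome(Enum):
    SOFT = "soft fault: attach a page, restart the instruction"
    HARD = "hard fault: suspend the process until I/O fills the page"
    SIGNAL = "unresolvable: send the process a signal"

def handle_fault(mapping, access):
    """Toy decision logic for a page fault.

    mapping is None for an address outside every mapping, otherwise a dict with
    'perms' (accesses the mapping allows, a subset of "rwx") and 'needs_io'
    (whether the page's contents must first be read from a file or swap).
    """
    if mapping is None or access not in mapping["perms"]:
        return Outcome.SIGNAL
    if mapping["needs_io"]:
        return Outcome.HARD
    return Outcome.SOFT   # e.g. fresh anonymous memory: hand out a zeroed page

examples = {
    "first write to a new heap page": ({"perms": "rw", "needs_io": False}, "w"),
    "first read of an uncached file page": ({"perms": "r", "needs_io": True}, "r"),
    "write to read-only program text": ({"perms": "rx", "needs_io": False}, "w"),
    "NULL pointer dereference": (None, "r"),
}
for label, (mapping, access) in examples.items():
    print(f"{label}: {handle_fault(mapping, access).value}")
```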



When the page fault handler needs to allocate physical memory to handle a page fault, it zeroes a free physical page (or grabs a page from a pool of prezeroed pages), attaches that page of memory to the Page Table



out to the shared library code to connect. (It calls mprotect() to set the pages read only before handing control over to the linked executable.) The dynamic linker traces through various lists of calls in the program's



What are clean pages?

Clean pages have copies of their data stored elsewhere, such as in swap space or in a file. Thus the physical memory storing that information may be reclaimed and reused elsewhere by detaching the physical page from the associated Page Table



statistical method of effectively obtaining extra physical pages by identifying existing allocations unlikely to be used again in the near future and recycling them.

Page stealing removes existing physical pages from their mappings, disposes of their current contents (often by writing them to disk), and reuses the memory elsewhere. If the original user needs their page back, a new physical page is allocated and the old contents loaded into the new page.

Page stealing looks for inactive pages, since active ones would probably just be faulted back in again immediately. Clean inactive pages are almost as good as free pages, because their current contents are already copied somewhere else and can be discarded without even performing any I/O. Dirty pages are cleaned by scheduling I/O to write them to backing store (swap for anonymous pages, the mapped file for shared file backed mappings).

Page stealing attempts to determine which existing physical pages are least likely to be needed again soon, meaning it's trying to predict the future actions of the processes using those pages. It does so through various heuristics, which can never be perfect.
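The preferences for page stealing (inactive before active, clean before dirty) can be sketched as a victim-selection function. This is a deliberately simplified model built on a hypothetical `Page` record. Linux keeps pages on active and inactive LRU lists and applies many more heuristics than this.

```python
from dataclasses import dataclass

@dataclass
class Page:
    name: str
    active: bool   # referenced recently
    dirty: bool    # modified since it was last written to its backing store

def steal_page(pages):
    """Choose a page to reclaim: (victim, needs_writeback), or (None, False).

    Clean inactive pages come first because they can be dropped without I/O;
    dirty inactive pages must be written to swap or their file before reuse;
    active pages are left alone because they would just fault straight back in.
    """
    inactive = [p for p in pages if not p.active]
    for page in inactive:
        if not page.dirty:
            return page, False
    if inactive:
        return inactive[0], True
    return None, False

pages = [Page("libc text", active=True, dirty=False),
         Page("old heap buffer", active=False, dirty=True),
         Page("cached log file", active=False, dirty=False)]
victim, writeback = steal_page(pages)
print(victim.name, "needs writeback" if writeback else "can be dropped immediately")
```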

What is a "working set" of pages?

A working set is the set of memory chunks required to complete an operation. For example, the CPU attempts to keep the set of cache lines required for tight inner loops in L1 cache until the loop completes. It attempts to keep the set of frequently used functions from various parts of a program (including shared libraries) in the L2 cache. It does so both by prefetching cache lines it predicts it may need soon, and by making decisions about which cache lines to discard and which to keep when making space to load new cache lines.

The page fault handler attempts to keep each currently running process's working set of pages in physical memory until the process blocks awaiting input or exits. Unused portions of program code may never even be loaded on a given program run (such as an "options" menu for a program that isn't currently being configured, or portions of generic shared libraries which this program doesn't actually use).

The working set is determined dynamically at runtime, and can change over time as a program does different things.
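One common way to pin down "determined dynamically at runtime" is Peter Denning's sliding-window definition: the working set at a given moment is the set of distinct pages referenced during the last few references. This is a swapped-in formalization, not necessarily what any kernel implements. The sketch applies it to a made-up reference trace in which the program switches from one pair of pages to another. The helper `working_sets` and the page names are hypothetical.

```python
from collections import deque

def working_sets(references, window):
    """Yield, after each reference, the distinct pages among the last `window` references."""
    recent = deque(maxlen=window)
    for page in references:
        recent.append(page)
        yield set(recent)

# Hypothetical trace: a tight loop over two pages, then a phase using two other pages.
trace = ["main", "loop"] * 3 + ["parse", "io"] * 3
sizes = [len(ws) for ws in working_sets(trace, window=4)]
print(sizes)  # [1, 2, 2, 2, 2, 2, 3, 4, 3, 2, 2, 2]
```

The set briefly grows to four pages during the phase change, then settles on the new pair.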

The objective of page stealing is to keep the "working set" of pages in fast physical memory, allowing processes to "race to quiescence" where the system completes its current tasks quickly and settles down into an idle state waiting for the next thing to do. From this point of view, physical memory can be seen as a cache both for swap pages and for executables in the filesystem. The task of keeping the working set in physical memory (and avoiding page faults that trigger I/O) is analogous to the CPU's task of keeping the appropriate contents in L1 and L2 caches.

What is thrashing?



In low memory situations, each new allocation involves stealing an in-use page from elsewhere, saving its current contents, and loading new contents. When that page is again referenced, another page must be stolen to replace it, saving the new contents and reloading the old contents.

It essentially means that the working set required to service the main loops of the programs the system is running is larger than available physical memory, either because physical memory is tied up doing something else or because the working set is just that big.

This can lead to a state where the CPU generates a constant stream of page faults, and spends most of its time sitting idle, waiting for I/O to service those page faults.
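The cliff between "the working set fits" and "thrashing" shows up even in a trivial model. This sketch counts page faults for a program that cycles through a 10-page working set. It assumes exact LRU replacement, which Linux only approximates, and `count_faults` is a hypothetical helper.

```python
from collections import OrderedDict

def count_faults(references, frames):
    """Count page faults for a reference string under LRU replacement with `frames` pages of RAM."""
    resident = OrderedDict()   # page -> None, least recently used first
    faults = 0
    for page in references:
        if page in resident:
            resident.move_to_end(page)
        else:
            faults += 1                       # page must be (re)loaded, e.g. from swap
            resident[page] = None
            if len(resident) > frames:
                resident.popitem(last=False)  # steal the least recently used page
    return faults

loop = list(range(10)) * 100   # a program cycling through a 10-page working set
for frames in (12, 10, 9, 5):
    print(f"{frames:2d} frames: {count_faults(loop, frames):4d} faults out of {len(loop)} references")
```

With 10 or more frames, only the first touch of each page faults (10 faults). With 9 frames, just one short, every one of the 1000 references faults, because LRU always evicts the page that is needed next.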

This is often called "swap thrashing", and in some ways is the result of a failure of the system's swap file.

If the swap file is too small (or entirely absent), the system can only steal pages from file backed mappings. Since every executable program and shared library is a file backed mapping, this means the system yanks executable pages, which generally fault back in fairly rapidly since they tend to get used a lot. This can quickly lead to thrashing.

The other way to encourage swap thrashing is by having too large a swap file, so that programs that query available memory see huge amounts of swap space and try to use it. The system's available physical memory and I/O bandwidth don't change with the size of the swap file, so attempts to use any significant portion of that swap space result in memory accesses occurring at disk I/O speed (four orders of magnitude slower than main memory, stretching each 1/100th of a second out to about two minutes).
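The slowdown figure is simple arithmetic. This snippet takes "four orders of magnitude" literally, as a factor of 10^4; that is an approximation, not a measured ratio. It computes how long a burst of memory-bound work takes once its accesses run at disk speed. A hundredth of a second becomes 100 seconds, a bit under two minutes. The helper `at_disk_speed` is hypothetical.

```python
SLOWDOWN = 10 ** 4  # "four orders of magnitude", taken literally

def at_disk_speed(seconds_in_ram, slowdown=SLOWDOWN):
    """How long memory-bound work takes once every access runs at disk speed."""
    return seconds_in_ram * slowdown

for seconds in (1 / 100, 1 / 10, 1.0):
    t = at_disk_speed(seconds)
    print(f"{seconds:g} s in RAM -> {t:g} s (~{t / 60:.1f} min) from swap")
```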

What is the Out Of Memory (OOM) killer?

If the system ever truly ran out of physical memory, it could reach a state where every process is waiting for some other process to release a page before it could continue. This deadlock situation would freeze the system.

Before this happened, the system would start thrashing, where it would slow itself to a crawl by spending all its time constantly stealing pages only to steal them back again immediately. This situation is almost as bad as true deadlock, slowing response time to useless levels (five or ten minute latency on normally instantaneous responses is not unusual during swap thrashing; this is assuming your operation does not time out instead).

A system that enters swap thrashing may take hours to recover (assuming a backlog of demands does not emerge as it fails to service them, preventing it from ever recovering). Or it can take just as long to proceed to a true deadlock (where the flood of swap I/O stops because the CPU is pegged at 100% searching for the next page to steal, never finding one, and thus stops scheduling new I/O).

To avoid either situation, Linux introduced the OOM killer. When it detects the system has entered swap thrashing, it heuristically determines a process to kill to free up pages. It can also be configured to reboot the entire system instead of selecting a specific process to kill.
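The per-process side of the OOM killer's heuristic is visible in procfs:

- `/proc/<pid>/oom_score` is the badness score the kernel would use when choosing a victim;
- `/proc/<pid>/oom_score_adj` (range -1000 to 1000) lets an administrator bias that choice, and -1000 exempts the process.

This is a minimal reader, assuming Linux; `oom_status` is a hypothetical helper name.

```python
from pathlib import Path

def oom_status(pid="self"):
    """Return (oom_score, oom_score_adj) for a process; a missing file gives None."""
    values = []
    for name in ("oom_score", "oom_score_adj"):
        try:
            values.append(int(Path("/proc", str(pid), name).read_text()))
        except FileNotFoundError:
            values.append(None)
    return tuple(values)

score, adj = oom_status()
print(f"oom_score={score} oom_score_adj={adj}")
```

Raising a process's own `oom_score_adj` needs no privilege, but lowering it requires CAP_SYS_RESOURCE. The system-wide alternative of panicking, and rebooting if `kernel.panic` is set, is controlled by the `vm.panic_on_oom` sysctl.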


Note that the OOM killer doesn't wait for true memory exhaustion to deadlock the system, both because the system is effectively down while thrashing, and because a paralyzed system might not be able to run even the OOM killer.

The OOM killer's process killing heuristics are a reasonable way to deal with runaway processes and "fork bombs", but in the absence of a clearly malfunctioning process that is truly "at fault", killing any process is often unacceptable. Developers often argue about the choice of processes to kill, and exactly when the thrashing is bad enough to trigger the OOM killer and when to allow the system to attempt to work its way through to recovery. Both of these heuristics are by their nature imperfect, because they attempt to predict the future.

In general, developers try to avoid triggering the OOM killer, and treat its occurrence as the userspace equivalent of a kernel panic(). The system got into an untenable state; it might be good to find out why and prevent its recurrence.

Why is "strict overcommit" a dumb idea?

People who don't understand how virtual memory works often insist on tracking the relationship between virtual and physical memory, and attempting to enforce some correspondence between them (when there isn't one), instead of controlling their programs' behavior.

Many common Unix programming idioms create large virtual memory ranges with the potential to consume a lot of physical memory, but never realize that potential. Linux allows the system to "overcommit" memory, creating memory mappings that promise more physical memory than the system could actually deliver.
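The overcommit policy itself is the `vm.overcommit_memory` sysctl:

- 0 is the default heuristic mode;
- 1 always overcommits;
- 2 is strict accounting, where new allocations fail once `Committed_AS` in `/proc/meminfo` would exceed `CommitLimit`.

This is a minimal reader, assuming Linux procfs; `overcommit_report` and `MODES` are hypothetical names.

```python
from pathlib import Path

MODES = {0: "heuristic (default)", 1: "always overcommit", 2: "strict accounting"}

def overcommit_report():
    """Return (mode, CommitLimit kB, Committed_AS kB); missing values come back as None."""
    try:
        mode = int(Path("/proc/sys/vm/overcommit_memory").read_text())
    except FileNotFoundError:
        mode = None
    fields = {}
    for line in Path("/proc/meminfo").read_text().splitlines():
        name, value = line.split(":", 1)
        fields[name] = int(value.split()[0])  # CommitLimit and Committed_AS are in kB
    return mode, fields.get("CommitLimit"), fields.get("Committed_AS")

mode, limit, committed = overcommit_report()
print(f"mode {mode} ({MODES.get(mode, 'unknown')}): committed {committed} kB, limit {limit} kB")
```

Outside strict mode, `Committed_AS` can exceed `CommitLimit`. That gap is the memory that has been promised but not necessarily used.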

For example, the fork/exec combo creates transient virtual memory usage spikes, which go away again almost immediately without ever breaking the copy on write status of most of the pages in the forked page tables. Thus if a large process forks off a smaller process, enormous physical memory demands threaten to happen (as far as overcommit is concerned), but never materialize.
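The copy-on-write behavior after fork() can be observed directly. This minimal sketch assumes Linux and Python's `mmap` and `os.fork()`, and `parent_sees_child_write` is a hypothetical helper. The child writes to one page of anonymous memory, and the parent checks whether it sees the write. With MAP_PRIVATE, the child's write lands in its own copy of the page. With MAP_SHARED, both processes keep mapping the same page.

```python
import mmap
import os

def parent_sees_child_write(flags):
    """Fork, let the child overwrite one byte of anonymous memory, report what the parent sees."""
    # fileno=-1 gives anonymous memory (CPython adds MAP_ANONYMOUS to `flags`).
    mem = mmap.mmap(-1, mmap.PAGESIZE, flags=flags,
                    prot=mmap.PROT_READ | mmap.PROT_WRITE)
    mem[0] = ord("p")
    pid = os.fork()
    if pid == 0:              # child: after fork() a private page is shared copy-on-write
        mem[0] = ord("c")     # with MAP_PRIVATE this write gives the child its own copy
        os._exit(0)           # leave immediately, without running parent-side code
    os.waitpid(pid, 0)
    seen = mem[0] == ord("c")
    mem.close()
    return seen

print("MAP_PRIVATE:", parent_sees_child_write(mmap.MAP_PRIVATE))  # False: child got a private copy
print("MAP_SHARED: ", parent_sees_child_write(mmap.MAP_SHARED))   # True: both map the same page
```

Only the one page the child touched had to be copied. Every page nobody writes stays shared, which is why the promised memory in a fork/exec rarely materializes.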

Dynamic linking raises similar issues: the dynamic linker maps executable files and shared libraries MAP_PRIVAT



pages to perform the dynamic linking fixups allowing the shared library calls to connect to the shared library. In theory, there could be a call to functions or data in a shared library within every page of the executable (and thus the entire mapping could be converted to anonymous memory by the copy on write actions of the dynamic linker). And since shared libraries can call other shared libraries, those could require private physical memory for their entire mapping too.

In reality, that doesn't happen. It would be incredibly inefficient and defeat the purpose of using shared libraries in the first place. Most shared libraries are compiled as Position Independent Code (PIC), and some executables are Position Independent
