cse p501 – compiler construction

27
Spring 2014 Jim Hogg - UW - CSE P501 W-1 CSE P501 – Compiler Construction Conventional Heap Storage Garbage Collection

Upload: tejano

Post on 23-Feb-2016

52 views

Category:

Documents


0 download

DESCRIPTION

CSE P501 – Compiler Construction. Conventional Heap Storage Garbage Collection. Conventional Heap Storage. C. ... char* s = (char*) malloc (50 ); ... free(s );. In Use. C Runtime Heap Memory. Developer must remember to free memory when no longer required - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: CSE P501 –  Compiler Construction

Spring 2014 Jim Hogg - UW - CSE P501 W-1

CSE P501 – Compiler Construction

Conventional Heap Storage

Garbage Collection

Page 2: CSE P501 –  Compiler Construction

Spring 2014 Jim Hogg - UW - CSE P501 W-2

...char* s = (char*) malloc(50);...free(s);

C

C Runtime Heap Memory

• Developer must remember to free memory when no longer required

• Eventual fragmentation => slow to malloc, slow to free

In Use

Conventional Heap Storage

Page 3: CSE P501 –  Compiler Construction

Spring 2014 Jim Hogg - UW - CSE P501 W-3

Heap Storage Fragmentation

C Runtime Heap Memory

In Use

• malloc: walk the freelist to find a slot big enough for current request

• free: adjust freelist; collapse contiguous freespace• fragmentation: plenty free chunks but none big enough for request• cannot compact the used space - may contain pointers; may be

pointed-at

Page 4: CSE P501 –  Compiler Construction

Spring 2014 Jim Hogg - UW - CSE P501 W-4

Bugs• Forget to free => eventually run out of memory

• called a "memory leak"

• Call free, but continue to use!• called "use-after-free", or "dangling pointer"• memory corruption - wrong answers; crash if lucky!• major source of security issues• detect via "pool poisoning"

2 pointers

free via 1

free malloc; corruption!

Page 5: CSE P501 –  Compiler Construction

Spring 2014 Jim Hogg - UW - CSE P501 W-5

Solving Memory Leaks

int f() { ... Car c = Car("Chevy"); ...}

C++

• object c is destructed [calls ~Car() ]automatically when c it falls out of scope

• called RAII ("Resource Acquisition Is Initialization")

• more general case still requires Developer to delete c when no longer needed

• "smart pointers" can help

Page 6: CSE P501 –  Compiler Construction

W-6

Solving Leaks and Use-After-Freepublic static int Main(String[] args) { ... if (...) { ArrayList cars = new ArrayList(); for (int i = 0; i < 1000; i++) { cars.Add(new Car()); } ... } ... Boat b = new Boat(); ...}

A

• At point A I have 1000 Car objects allocated on the heap• At some later time I ask for more heap memory. May

trigger a "garbage collection" - automatic, and silent• Compiler realizes that cars is not referenced later by the

program; so it's "garbage"; GC recycles it for use by other objects

• Magically solves leaks and use-after-free!

B

Page 7: CSE P501 –  Compiler Construction

Spring 2014 Jim Hogg - UW - CSE P501 W-7

next

next

Allocate an object; fast!

next

Allocate more objects; and one more, please?

Garbage Collection

Page 8: CSE P501 –  Compiler Construction

Spring 2014 Jim Hogg - UW - CSE P501 W-8

Garbage Collection, 1

Allocate another object

next

next"roots"

Trace reachable objects

next

"roots"

Compact unreachables;update all pointers

GC does not find garbage: it finds live objects and ignores all other memory

Page 9: CSE P501 –  Compiler Construction

Spring 2014 Jim Hogg - UW - CSE P501 W-9

Roots?

"Roots" are locations that hold a pointer to any object in the heap:

• Method-local variables• Method Arguments• Global variables• Class-static fields• Registers

Page 10: CSE P501 –  Compiler Construction

Spring 2014 Jim Hogg - UW - CSE P501 W-10

GC Start

root

root

Page 11: CSE P501 –  Compiler Construction

Spring 2014 Jim Hogg - UW - CSE P501 W-11

GC Mark Phase

root

root

UnreachableReachable

Page 12: CSE P501 –  Compiler Construction

Spring 2014 Jim Hogg - UW - CSE P501 W-12

GC Sweep Phase

root

root

Reachable

With memory free, now allocate space for object that provoked the GC

Page 13: CSE P501 –  Compiler Construction

Spring 2014 Jim Hogg - UW - CSE P501 W-13

No Bugs

• Forget to free => eventually run out of memory• called a "memory leak"

• Call free, but continue to use!• called "use-after-free"

• GC removes control over free'ing memory from the hands of the Developer

• GC will find all live objects and reclaim all other memory each time it is invoked. So no "memory leak"s

• An object is garbage-collected only if it was unreachable by the program; being unreachable, GC removes "use-after-free" bugs

• So what's a "memory leak" in C# or Java? - holding on to objects that really could be freed - eg, by setting the object-ref to null. Particularly troublesome in a Generational Garbage Collector

Page 14: CSE P501 –  Compiler Construction

Spring 2014 Jim Hogg - UW - CSE P501 W-14

C# Finalizers

• Class C above defines a "Finalize" method. Syntax is ~C()• Syntactic sugar for: protected override void Finalize()

• As each object of class C is created, the CLR links it onto a Finalization Queue

• When the object is about to be reclaimed, it is moved onto the Freachable Queue; a Finalizer background thread will run that Finalize method later

class C { private StreamWriter sw; ... ~C() { sw.Close(); } // flush output buffer ...}

Page 15: CSE P501 –  Compiler Construction

Spring 2014 Jim Hogg - UW - CSE P501 W-15

Finalization

root

Finalization Queue

root

Finalization Queue

FreachableQueue

Not live; not dead; zombies

Page 16: CSE P501 –  Compiler Construction

Spring 2014 Jim Hogg - UW - CSE P501 W-16

Threads• Compiler needs to create GC info - at each point in program,

where are current roots? - variables, registers

• GC info is bulky - cannot store for every instruction!

• So store GC info for selected points in the program - "GC-safepoints"

• Typical GC-safepoint is a function return

• Need to stop all threads to carry out a GC• For each thread stack-frame, stomp its return address; when it

reaches there, the thread will return, not to its caller, but into GC code

Page 17: CSE P501 –  Compiler Construction

Spring 2014 Jim Hogg - UW - CSE P501 W-17

Hijacking a Thread

retaddr

gc

caller

GC

gc

GC

gc

GC

Top of Stack

Save & Stomp retaddr

Thread continues to

runEventually

returns and hits the hijack

Page 18: CSE P501 –  Compiler Construction

Spring 2014 Jim Hogg - UW - CSE P501 W-18

Threads Run to Their Safepoints

Start GC

After each thread has hit its safepoint, GC can start its sweep phase

Page 19: CSE P501 –  Compiler Construction

Spring 2014 Jim Hogg - UW - CSE P501 W-19

Object Age Distribution• Performing GC over large heap consumes lots of cpu time (and

trashes the caches!)

• Experiment reveals:• Many objects live a very short time (high "infant mortality")• Some object live a very long time• Bi-modal

number

age

Object Ages

Page 20: CSE P501 –  Compiler Construction

Spring 2014 Jim Hogg - UW - CSE P501 W-20

Generational Garbage Collection

Gen0 Gen2Gen1

'Nursery'

• Divide heap into 3 areas• Allocate new objects from Gen0• At next GC

• move all Gen0 survivors into Gen1• move all Gen1 survivors into Gen2

Page 21: CSE P501 –  Compiler Construction

Spring 2014 Jim Hogg - UW - CSE P501 W-21

Migration thru Generations

Gen0 Gen2Gen1

Gen0 Gen2Gen1

Gen0 0 Gen2Gen1

Survivors

Gen0 Gen2Gen1

Page 22: CSE P501 –  Compiler Construction

Spring 2014 Jim Hogg - UW - CSE P501 W-22

Generational GC• Garbage-collecting entire heap is expensive

• A process of diminishing returns

• Instead, Garbage-Collect only the nursery (Gen0)• Smaller, so faster• Yields lots of free space (objects die young)• Maximum 'bang for the buck'

• Sometime collect Gen0+Gen1

• When the going gets tough, collect Gen0+Gen1+Gen2 ("full" collection)

• Some old objects may become unreachable - just ignore them in a Gen0 collect

• Above picture assumes no writes to old objects kept Gen0 objects reachable

Page 23: CSE P501 –  Compiler Construction

Spring 2014 Jim Hogg - UW - CSE P501 W-23

Card Table• In practice, older objects are written-to between GCs• Might update a pointer that results in keeping young object alive;

missing this is a disaster

Gen0 Gen2Gen1

• Card Tables: bitmap, with 1 bit per 128 Bytes of Gen2 heap• Create a "write barrier" - any write into Gen1 or Gen2 heap sets

corresponding bit in Card Table• During GC sweep, check Card Tables for any "keep alive" pointers• In practice, Card Tables contain few 1 bits

Gen2 Card Table

Gen1 Card Table

Page 24: CSE P501 –  Compiler Construction

Spring 2014 Jim Hogg - UW - CSE P501 W-24

Debugging the GC• What if a GC goes wrong?

• Fails to free some memory that is unreachable => memory leak• Frees some memory that is still reachable - impending disaster• Frees some non-objects - disaster• Colloquially called a GC 'hole'

• Causes• Bug in GC - mark, sweep, compaction• Bug in GC info

• Symptoms• Wrong answers• Process Crash• If we're 'lucky' failure comes soon after error• Otherwise, failure may occur seconds, minutes or hours later

• Testing• GC-Stress• eg: maximally randomize the heap; force frequent GCs; check

results

Page 25: CSE P501 –  Compiler Construction

Spring 2014 Jim Hogg - UW - CSE P501 W-25

Further Design Complexities• Objects with Finalizers:

• take 2 GCs to die• extend the lifetime of other connected objects• run at some later time => "non-deterministic" finalization• no guarantee on order that finalizers runs• uncaught exceptions escape and disappear!

• Strong refs• Weak refs - "short" and "long"• "Dispose" pattern• Resurrection• Large Object Heap (objects > 85 kB in size)• Concurrent Garbage Collection• Per-processor heaps; processor affinity• Fully-interruptible GC• Safe-point on back-edge of loop

Page 26: CSE P501 –  Compiler Construction

Spring 2014 Jim Hogg - UW - CSE P501 W-26

References for CLR GC

Garbage Collection: Automatic Memory Management in the Microsoft .NET Framework – by Jeffrey Richter

Garbage Collection—Part 2: Automatic Memory Management in the Microsoft .NET Framework - by Jeffrey Richter

Garbage Collector Basics and Performance Hints – by Rico Mariani

Using GC Efficiently – Part 1 by MaoniUsing GC Efficiently – Part 2 by MaoniUsing GC Efficiently – Part 3 by MaoniUsing GC Efficiently – Part 4 by Maoni

GC Performance Counters - by Maoni

Tools that help diagnose managed memory related issues – by Maoni

Clearing up some confusion over finalization and other areas in GC – by Maoni

Page 27: CSE P501 –  Compiler Construction

And a bit of perspective…

Automatic GC has been around since LISP I in 1958

Ubiquitous in functional and object-oriented programming communities for decades

Mainstream since Java(?) (mid-90s)

Now conventional? - nope!

Specialized patterns of allocate/free are still better coded by-hand - eg "Arena" storage inside compiler optimization phases

Spring 2014 Jim Hogg - UW - CSE P501 W-27