automatic heap memory management · incorrect heap memory management causes issues - leads to...

75
Garbage Collection Automatic Heap Memory Management 1 Diogenes Nunez April 3, 2019

Upload: others

Post on 19-Aug-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Garbage CollectionAutomatic Heap Memory Management

1

Diogenes NunezApril 3, 2019

Page 2: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Garbage collection is in most languages- Java (desktop and Android)- Javascript- Swift (or Objective-C)- Python- Haskell- Lisp/Scheme- ML/OCaML- PHP- Go- ...

2

Page 3: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Memory Allocation- Memory can be allocated in different locations

- Stack- Heap- Global- Filesystem

3

Page 4: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Objects- An object is an allocated chunk of memory for use by the application.- Examples

- C stack objects: Static arrays, local variables- C heap objects: Anything allocated with malloc- Java stack objects: Local variables- Java heap objects: Anything allocated with new, Class definitions

- A reference is a pointer to an object.

4

Page 5: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Incorrect heap memory management causes issues- Leads to memory corruption errors when wrong

- Use after free- Dangling pointer- Double free

- Can create space leaks by not deallocating- Can fragment the heap

5

Page 6: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Incorrect heap memory management causes issues- Leads to memory corruption errors when wrong

- Use after free- Dangling pointer- Double free

- Can create space leaks by not deallocating- Can fragment the heap

6

Page 7: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Liveness- A heap object is live if the object will be accessed by the program in the

future.- A heap object is dead if the object will never be accessed by the program.

7

Page 8: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Garbage Collector- Garbage collector (GC) is a program that reclaims dead objects

automatically for an application to reuse.- GC runs when the application fails to allocate a new heap object.

8

Page 9: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Perfect GC can’t exist.- Detecting liveness of a heap object in an arbitrary program is undecidable.- Therefore, GC must estimate liveness.

- Must be conservative in liveness.

9

Page 10: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

1. Garbage Collection (GC)2. GC Algorithms

a. Reference Countingb. Mark Sweepc. Copyingd. Generational and other Modifiers

3. GC in Practice

10

Page 11: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Roots The roots of a program are memory addresses directly accessible by the application without following a pointer.

- Stack objects- Global objects- Register values

The collection of roots is called the root set.

11

Page 12: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Reference Counting Algorithm: Insight- If a heap object has no incoming pointers, that object can never be accessed.- Count the number of incoming pointers on each heap object.- Pointer updates change the count.

- +1 when adding an incoming pointer- -1 when removing an incoming pointer- If 0 incoming pointers, remove object’s outgoing references and then reclaim object

12

Page 13: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

RC: Adding a new pointer

Root Set

2

1

Heap

1

13

Page 14: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

RC: Adding a new pointer

Root Set

2

1

Heap

1

14

Page 15: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

RC: Adding a new pointer

Root Set

2

1

Heap

15

2 1

Page 16: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

RC: Removing a pointer

Root Set

2

1

Heap

16

2 1

Page 17: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

RC: Removing a pointer

Root Set

2

1

Heap

17

2 11

Page 18: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

RC: Dealing with dead objects

Root Set

2

1

Heap

18

2 11

Page 19: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

RC: Dealing with dead objects

Root Set

2

Heap

19

2 11

10

Page 20: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

RC: Dealing with dead objects

Root Set

2

Heap

20

12 11

10

Page 21: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

RC: Dealing with dead objects

Root Set

2

Heap

21

2 11

10

1

Page 22: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

RC: Dealing with dead objects

Root SetHeap

22

2 11

10

21

Page 23: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

RC: Dealing with dead objects

Root SetHeap

23

0 2 11

10

21

Page 24: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

RC: Dealing with dead objects

Root SetHeap

24

10

21

Page 25: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

RC: Dealing with dead objects

Root SetHeap

25

21

Page 26: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

RC: Pros and Cons- Pros

- Immediately reclaim objects upon death- Counter updates are incremental

- Cons- Overhead on each pointer update- Decrementing a count can cause a lengthy pause- Can cause fragmentation- Cannot reclaim dead object cycles

Root Set

1

1

Heap26

Page 27: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

1. Garbage Collection (GC)2. GC Algorithms

a. Reference Countingb. Mark Sweepc. Copyingd. Generational and other Modifiers

3. GC in Practice

27

Page 28: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Mark Sweep Algorithm: Reachability- Object o is reachable from object p if there exists a path of pointers from p to

o.

p o

28

Page 29: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Mark Sweep estimates liveness with reachability.A heap object is live iff it is reachable from any root.

- By transitivity, if an object is live, so are any objects it points to.

Root Set Heap

29

a

c

b

Page 30: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

- Mark phase: Use BFS or DFS to mark object reachable from roots as live- Sweep phase: Traverse heap and reclaim all unmarked objects

Mark Sweep Algorithm: Two Phases

30

Root Set Heap

Root Set

Root SetHeap Heap

Initial Heap After Mark Phase After Sweep Phase

Page 31: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

MS: Mark Phase

Root SetHeap

31

Page 32: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

MS: Mark Phase

Root SetHeap

32

Page 33: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

MS: Mark Phase

Root SetHeap

33

Page 34: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

MS: Mark Phase

Root SetHeap

34

Page 35: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

MS: Mark Phase

Root SetHeap

35

Page 36: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

MS: Mark Phase

Root SetHeap

36

Page 37: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

MS: Mark Phase

Root SetHeap

37

Page 38: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

MS: Mark Phase

Root SetHeap

38

Page 39: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

MS: Mark Phase

Root SetHeap

39

Page 40: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

MS: Mark Phase

Root SetHeap

40

Page 41: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

MS: Sweep Phase

Root SetHeap

41

Page 42: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

MS: Sweep Phase

Root SetHeap

42

Page 43: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Mark Sweep: Pros and Cons- Pros

- Can collect dead object cycles- Marking an object is inexpensive- Little to no overhead while application runs

- Cons- Must pause the application to mark and sweep- Marking phase is slow if there are a lot of live objects- Must traverse the whole heap to sweep dead objects- Still creates fragmentation

43

Page 44: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

1. Garbage Collection (GC)2. GC Algorithms

a. Reference Countingb. Mark Sweepc. Copyingd. Generational and other Modifiers

3. GC in Practice

44

Page 45: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Aside: Copying Collector has many names- Semispace Collector- Scavenging Collector- Stop and Copy collector- Copying Collector

45

Page 46: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Copying Algorithm: Heap Layout- Divide the heap into two equal-sized parts, From-Space and To-Space- Application allocates into the From-Space

46

Page 47: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Copying Algorithm: Bump Pointer Allocation

47

Page 48: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Copying Algorithm: Bump Pointer Allocation

48

Page 49: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Copying Algorithm: Bump Pointer Allocation

49

Page 50: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Copying Algorithm: Bump Pointer Allocation

50

Page 51: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Copying Algorithm: Collection- Visit objects using BFS starting at roots- When visiting an object

- Copy the object over to the To-Space- Update incoming pointer to the visited object

- At the end of GC, flip the two halves

51

Page 52: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Copying Algorithm: Example

52

Root Set

From-Space

To-Space

Page 53: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Copying Algorithm: Example

53

Root Set

From-Space

To-Space

Page 54: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Copying Algorithm: Example

54

Root Set

From-Space

To-Space

Page 55: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Copying Algorithm: Example

55

Root Set

From-Space

To-Space

Page 56: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Copying Algorithm: Example

56

Root Set

From-Space

To-Space

Page 57: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Copying Algorithm: Example

57

Root Set

From-Space

To-Space

Page 58: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Copying Algorithm: Example

58

Root Set

From-Space

To-Space

Page 59: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Copying Algorithm: Example

59

Root Set

From-Space

To-Space

Page 60: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Copying Algorithm: Example

60

Root Set

From-Space

To-Space

Page 61: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Copying Algorithm: Example

61

Root Set

To-Space

From-Space

Page 62: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Copying: Pros and Cons- Pros

- Removes fragmentation- Doesn’t traverse whole heap to reclaim memory- Fast allocation

- Cons- Need 2x the heap- Must pause entire program to move objects

62

Page 63: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

1. Garbage Collection (GC)2. GC Algorithms

a. Reference Countingb. Mark Sweepc. Copyingd. Generational and other Modifiers

3. GC in Practice

63

Page 64: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Exploit generation hypothesis for faster GC- Generational Hypothesis: “Most objects die young.” [Ungar 1984]

64Picture from https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/generations.html

Page 65: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Exploit generation hypothesis for faster GC- Split the heap into young and old partitions called generations.

- Young generation usually called nursery or eden.- Old generation usually called mature space or tenured generation.

- Allocate into the nursery exclusively- Collection Algorithm

- Collect nursery when allocation fails.- Move young survivors to the mature space.- Collect mature space when whole heap is full.

- In practice, nursery collected using a copying collector.- Collector for the mature space varies.

65

Page 66: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Algorithms can be improved further.- Concurrent Marking

- GC does some marking while application is running

- Compacting Pass- Modern MS implementations compact the heap after a GC.

- Add a MS or copying collector as a backup- RC collectors usually have a MS collector to clean up object cycles every so often

- Different heap layouts- e.g. Copying collector’s heap layout - e.g. Generations- e.g. Split heap into buckets of similar sized objects- ...

66

Page 67: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Algorithms can be improved further.- Lazy Sweeping

- Sweep the heap as needed while application runs

- Compiler Optimizations- In RC, remove increments immediately followed by decrements to same object

67

Page 68: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Actual GCs in Java Right Now- Generational Mark Sweep (-XX:+UseSerialGC)

- Old generation collected using compacting Mark Sweep.

- Concurrent Generational Mark Sweep (-XX:+UseParallelGC)- Same as above, but marks concurrently with user application.

- G1 (-XX:+UseG1GC)- Concurrent generational Mark Sweep GC that guarantees a pause time.- Divides the heap into equally sized regions.- Each region is either a nursery or mature region.- Uses statistics to determine which mature regions have the most dead objects.

- These are real java flags- java -XX:+UseG1GC -verbose:gc -XX:+PrintGCDetails HelloWorld

68

Page 69: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

1. Garbage Collection (GC)2. GC Algorithms

a. Reference Countingb. Mark Sweepc. Copyingd. Generational and other Modifiers

3. GC in Practice

69

Page 70: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Memory correctness is not an issue- Dangling pointers and use-after-free can’t happen

- Would imply GC collected a live object.

- Double-free can’t happen- GC will only reclaim dead objects.- GC will reclaim a dead object exactly once.

70

Page 71: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Example: Caching Web Requests

Browser Cache Server

71

Page 72: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Performance is an issue

72

Page 73: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

GC dominates runtime

73

Page 74: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Dealing with Performance Issues Due to GC- Allocate only the memory you need

- Reduces the amount of live data and speeds up MS and Copying

- Null pointers to objects you know are dead- Allows GC to recover more memory from dead objects

public void foo() { Object o = new Object(); // … use o only here o = null; // o is now dead}

74

Page 75: Automatic Heap Memory Management · Incorrect heap memory management causes issues - Leads to memory corruption errors when wrong - Use after free - Dangling pointer - Double free

Questions?

75