Knowing your garbage collector - PyCon Italy 2015
Knowing your garbage collector
Francisco Fernandez Castano
upclose.me
@fcofdezc
April 17, 2015
Francisco Fernandez Castano (@fcofdezc) Python GC April 17, 2015 1 / 61
Overview
1 Introduction
  Motivation
  Concepts
2 Algorithms
  CPython RC
  PyPy
Motivation
Managing memory manually is hard.
Who owns the memory?
Should I free these resources?
What happens with double frees?
Dangling pointers
int *func(void)
{
    int num = 1234;
    /* ... */
    return &num; /* dangling: num stops existing when func returns */
}
Ownership
#include <stdlib.h>

int *func(void)
{
    int *num = malloc(10 * sizeof(int));
    /* ... */
    return num; /* the caller now owns the buffer and must free() it */
}
John McCarthy
Basic concepts
Heap
A data structure in which objects may be allocated or deallocated in any order.
Mutator
The part of a running program which executes application code.
Collector
The part of a running program responsible for garbage collection.
Garbage collection
Definition
Garbage collection is automatic memory management. While the mutator runs, it routinely allocates memory from the heap. If more memory than available is needed, the collector reclaims unused memory and returns it to the heap.
CPython GC
The CPython implementation has garbage collection.
CPython's GC algorithm is reference counting with a cycle detector.
The cycle detector is generational.
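The cycle detector described above can be inspected from Python through the standard gc module. This is a minimal sketch, assuming a CPython interpreter; the variable names are only illustrative, and the actual threshold values differ between versions.

```python
import gc

# Whether the cycle detector is currently switched on.
enabled = gc.isenabled()

# One collection threshold per generation (three values on CPython).
thresholds = gc.get_threshold()

# Current allocation/collection counters, one per generation.
counts = gc.get_count()

print("cycle detector enabled:", enabled)
print("thresholds per generation:", thresholds)
print("current counts:", counts)
```

Reference counting itself cannot be switched off; gc.disable() only stops the cycle detector.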
Reference Counting Algorithm
typedef struct _object {
    _PyObject_HEAD_EXTRA
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
} PyObject;
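The ob_refcnt field can be observed from Python with sys.getrefcount. A small sketch, assuming CPython: the reported number is one higher than expected, because the argument passed to getrefcount is itself a temporary reference. The names a and alias are only illustrative.

```python
import sys

class A(object):
    pass

a = A()
before = sys.getrefcount(a)

alias = a                      # a second name bound to the same object
after = sys.getrefcount(a)

print(before, after)           # the second value is one higher
```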
Reference Counting Algorithm
my_list = []
class A(object): pass
a = A()
my_list.append(a)
Reference Counting Algorithm
static int
ins1(PyListObject *self, Py_ssize_t where, PyObject *v)
{
    /* ... */
    items = self->ob_item;
    for (i = n; --i >= where; )
        items[i+1] = items[i];
    Py_INCREF(v);
    items[where] = v;
    return 0;
}
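The effect of the Py_INCREF call in ins1 is visible from Python. The sketch below assumes CPython, where list.insert is implemented on top of ins1; storing the object in the list should raise its reference count by exactly one.

```python
import sys

class A(object):
    pass

my_list = []
a = A()

before = sys.getrefcount(a)
my_list.insert(0, a)           # the list now holds its own reference to a
after = sys.getrefcount(a)

print(before, after)
```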
Reference Counting Algorithm
#define Py_INCREF(op) (                         \
    _Py_INC_REFTOTAL  _Py_REF_DEBUG_COMMA       \
    ((PyObject *)(op))->ob_refcnt++)
Reference Counting Algorithm
my_list[0] = 1
Reference Counting Algorithm
#define Py_DECREF(op)                                   \
    do {                                                \
        if (_Py_DEC_REFTOTAL  _Py_REF_DEBUG_COMMA       \
            --((PyObject *)(op))->ob_refcnt != 0)       \
            _Py_CHECK_REFCNT(op)                        \
        else                                            \
            _Py_Dealloc((PyObject *)(op));              \
    } while (0)
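Both branches of Py_DECREF can be watched from Python. In this sketch (CPython assumed) overwriting the list slot only lowers the count, while deleting the last name takes it to zero and the object is deallocated immediately. The weak reference named probe is an illustrative helper: it lets us check whether the object still exists without keeping it alive.

```python
import sys
import weakref

class A(object):
    pass

my_list = []
a = A()
my_list.append(a)
probe = weakref.ref(a)         # weak references do not add to ob_refcnt

count_in_list = sys.getrefcount(a)
my_list[0] = 1                 # the list drops its reference: count != 0 branch
count_after = sys.getrefcount(a)

del a                          # last strong reference: count reaches 0, dealloc
freed = probe() is None

print(count_in_list, count_after, freed)
```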
Cycles
l = []
l.append(l)
del l
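Why this is a problem for pure reference counting can be shown with a small experiment. It is a sketch assuming CPython; a hypothetical Node class replaces the list from the slide because lists cannot be weakly referenced, and the cycle detector is disabled during the experiment so that only reference counting is at work.

```python
import gc
import weakref

class Node(object):
    pass

gc.disable()                   # leave only reference counting active
gc.collect()                   # start from a clean state

node = Node()
node.me = node                 # the object references itself: a cycle
probe = weakref.ref(node)

del node                       # the name is gone, but the count is still 1
alive_after_del = probe() is not None

gc.collect()                   # the cycle detector finds and frees it
alive_after_collect = probe() is not None

gc.enable()
print(alive_after_del, alive_after_collect)
```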
PyGC Head
typedef union _gc_head {
    struct {
        union _gc_head *gc_next;
        union _gc_head *gc_prev;
        Py_ssize_t gc_refs;
    } gc;
    double dummy; /* force worst-case alignment */
} PyGC_Head;
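The gc_refs field is the working copy the cycle detector uses. The toy model below is an assumption-laden sketch of that idea, not CPython's code: the function find_unreachable and the dictionaries are hypothetical, objects are plain names, and edges lists the references between tracked objects. Each reference count is copied into gc_refs, references coming from inside the tracked set are subtracted, and whatever cannot be reached from an object with a positive remainder is garbage.

```python
def find_unreachable(refcounts, edges):
    """Return the names that only other tracked objects keep alive."""
    gc_refs = dict(refcounts)              # copy ob_refcnt into gc_refs

    # Subtract every reference that originates inside the tracked set.
    for source, targets in edges.items():
        for target in targets:
            gc_refs[target] -= 1

    # A positive remainder means something outside still points here.
    stack = [name for name, refs in gc_refs.items() if refs > 0]
    reachable = set()
    while stack:
        name = stack.pop()
        if name in reachable:
            continue
        reachable.add(name)
        stack.extend(edges.get(name, []))  # everything it references survives

    return set(refcounts) - reachable


# a and b reference each other only; c is held by a variable and points to d.
refcounts = {"a": 1, "b": 1, "c": 1, "d": 1}
edges = {"a": ["b"], "b": ["a"], "c": ["d"], "d": []}

garbage = find_unreachable(refcounts, edges)
print(sorted(garbage))
```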
Cycles
Wait!!
Cycles
Finalizers
Weak references
Cycles
Objects reachable from finalizers can’t safely be deleted.
Cycles
So they are moved to the uncollectable set (gc.garbage).
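The behaviour on this slide matches Python 2 and Python 3 before 3.4. As a hedged sketch for readers on a newer interpreter: assuming CPython 3.4 or later, where PEP 442 changed finalization, a cycle whose objects define __del__ is collected and gc.garbage normally stays empty. The Resource class and the finalized list are illustrative names.

```python
import gc

finalized = []

class Resource(object):
    def __del__(self):
        # Record that the finalizer ran.
        finalized.append("Resource")

r = Resource()
r.me = r                       # a cycle containing an object with a finalizer
del r

gc.collect()
print("finalizers run:", finalized)
print("uncollectable objects:", len(gc.garbage))
```

On the interpreters the talk refers to, the same object would instead be appended to gc.garbage and its finalizer would not run.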
Cycles
Only reachable weakref callbacks are invoked.
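A short sketch of the reachable case, assuming CPython: the weak reference is held by a module-level name, so it is still reachable when the cycle is collected and its callback runs. The Node class and the calls list are illustrative; the unreachable case is not shown here.

```python
import gc
import weakref

class Node(object):
    pass

calls = []

node = Node()
node.me = node                             # make the object part of a cycle
ref = weakref.ref(node, lambda r: calls.append("collected"))

del node
gc.collect()                               # the cycle is freed here

print(calls, ref() is None)
```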
CPython Memory Allocator
Demo
Reference counting
Pros: It is incremental: memory is freed as the program runs.
Cons: Detecting cycles can be hard.
Cons: Per-object size overhead.
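The size overhead can be made concrete. Every object starts with the ob_refcnt and ob_type fields of PyObject, so even an empty object needs at least two machine words. This sketch assumes CPython, since sys.getsizeof is not supported on every implementation.

```python
import struct
import sys

pointer_size = struct.calcsize("P")        # size of a pointer on this platform
bare_object_size = sys.getsizeof(object())

print("pointer size:", pointer_size)
print("size of object():", bare_object_size)
```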
PyPy
PyPy GC
Agnostic GC
Different implementations over time
Nowadays it uses incminimark
Young objects
[elem * 2 for elem in elements]
balance = (a / b / c) * 4
'asdadsasd-xxx'.replace('x', 'y').replace('a', 'b')
foo.bar()
PyPy memory model
PyPy GC
Minor and Major collection
Objects are moved only once
Major collection is done incrementally (to avoid long stops)
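The split between minor and major collections can be pictured with a toy model. This is a hypothetical sketch, not PyPy's implementation: ToyGenerationalHeap and its methods are invented for illustration, objects are plain names, liveness is passed in as a set, and the incremental nature of the major collection is not modelled.

```python
class ToyGenerationalHeap(object):
    """Toy model of a nursery plus an old generation."""

    def __init__(self):
        self.nursery = []      # young objects live here
        self.old = []          # objects that survived a minor collection

    def allocate(self, name):
        self.nursery.append(name)

    def minor_collect(self, live):
        # Survivors leave the nursery; this is the only time they move.
        self.old.extend(obj for obj in self.nursery if obj in live)
        self.nursery = []

    def major_collect(self, live):
        # Dead old objects are dropped; survivors are not moved again.
        self.old = [obj for obj in self.old if obj in live]


heap = ToyGenerationalHeap()
for name in ("temp1", "temp2", "config"):
    heap.allocate(name)

heap.minor_collect(live={"config"})        # the temporaries die young
after_minor = (list(heap.nursery), list(heap.old))

heap.allocate("temp3")
heap.major_collect(live=set())             # "config" is no longer used
after_major = (list(heap.nursery), list(heap.old))

print(after_minor, after_major)
```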
Mark and Sweep Algorithm
Mark and sweep
Pros: Can collect cycles.
Cons: A basic implementation stops the world.
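A minimal sketch of the basic, stop-the-world variant, under stated assumptions: the heap is a list of names, edges maps each object to the objects it references, and the functions mark and sweep are illustrative names. It shows why cycles are not a problem: anything that cannot be reached from the roots is swept, whether or not it is part of a cycle.

```python
def mark(roots, edges):
    """Return every object reachable from the roots."""
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj not in marked:
            marked.add(obj)
            stack.extend(edges.get(obj, []))
    return marked


def sweep(heap, marked):
    """Keep the marked objects; everything else is reclaimed."""
    return [obj for obj in heap if obj in marked]


heap = ["a", "b", "c", "d"]
edges = {"a": ["b"], "b": [], "c": ["d"], "d": ["c"]}   # c and d form a cycle
roots = ["a"]

live = sweep(heap, mark(roots, edges))
print(live)
```

The mutator cannot run while mark and sweep are in progress, which is the pause the incremental collector described earlier tries to avoid.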
EuroPython
Questions?
The End