joblib: lightweight pipelining for parallel jobs (v2)

Marcel Caraciolo @marcelcaraciolo
CTO Genomika Diagnósticos, DataGeek, Machine learning, Python, Data and brainssss!
joblib: Running Python functions as pipeline jobs


Post on 25-Jul-2015


TRANSCRIPT

Page 1: Joblib:  Lightweight pipelining for parallel jobs (v2)

Marcel Caraciolo @marcelcaraciolo

CTO Genomika Diagnósticos, DataGeek, Machine learning, Python, Data and brainssss!

joblib, Running Python functions as pipeline jobs

Page 2

Disclaimer

Page 3

The problem

Take advantage of more than one core or more than one CPU by default.

Especially for problems that can be solved or sped up by parallel computing:

• Matrix multiplication
• A web server answering many small requests (e.g. for static files): have a worker process each request
• A web crawler following all the links on a website: spin off a thread for each link
• Batch processing of images

Page 4

For starters

Embarrassingly parallel problem: sum all of the primes in a range of integers starting from 100,000 and going to 5,000,000.

http://pt.wikipedia.org/wiki/Crivo_de_Erat%C3%B3stenes
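As a baseline, a single-process version of this problem might look like the sketch below, using the Sieve of Eratosthenes from the link above. The bounds are much smaller than the slide's 100,000–5,000,000 purely for illustration.

```python
def sum_primes(lo, hi):
    """Sum all primes in [lo, hi) using a Sieve of Eratosthenes up to hi."""
    is_prime = bytearray([1]) * hi
    is_prime[0:2] = b"\x00\x00"          # 0 and 1 are not prime
    for p in range(2, int(hi ** 0.5) + 1):
        if is_prime[p]:
            # Clear every multiple of p starting at p*p
            is_prime[p * p:hi:p] = bytearray(len(range(p * p, hi, p)))
    return sum(i for i in range(lo, hi) if is_prime[i])

print(sum_primes(0, 10))    # 2 + 3 + 5 + 7 = 17
print(sum_primes(10, 30))   # 11 + 13 + 17 + 19 + 23 + 29 = 112
```

The same function, applied to disjoint sub-ranges, is what the threading and multiprocessing versions on the following slides would parallelize.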

Page 5

For beginners

Module threading in Python

Easy to start, native to Python, generally the most commonly chosen option.

• Threads share the memory and state of the parent
• Lightweight: each gets its own stack
• No need for inter-process communication
• Good for: adding throughput and reducing latency

Page 6

For beginners

EXAMPLE 2
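The slide's code is not in the transcript; a minimal threading sketch of the prime-sum problem might look like this (hypothetical helper names, simple trial division instead of a sieve):

```python
import threading

def sum_primes_in_range(lo, hi, results, idx):
    """Sum the primes in [lo, hi) by trial division; store into a shared list."""
    total = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            total += n
    results[idx] = total

bounds = [(0, 25), (25, 50), (50, 75), (75, 100)]
results = [0] * len(bounds)          # shared memory: all threads write here
threads = [threading.Thread(target=sum_primes_in_range, args=(lo, hi, results, i))
           for i, (lo, hi) in enumerate(bounds)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(results)
print(total)   # 1060, the sum of all primes below 100
```

Note that because of the GIL (next slide), this gives no CPU speedup; it only demonstrates the API and the shared `results` list.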

Page 7

For beginners

But: Python only allows a single thread to be executing within the interpreter at once. This restriction is enforced by the GIL.

To put it into a real world analogy: imagine 100 developers working at a company with only a single coffee mug. Most of the developers would spend their time waiting for coffee instead of coding.

GIL: “Global Interpreter Lock” - this is a lock which must be acquired for a thread to enter the interpreter’s space. Only one thread may be executing within the Python interpreter at once.

Page 8

For advanced

Module multiprocessing in Python. Not well known among developers, especially because process creation can be sluggish: create the workers up front.

• Follows the threading API closely, but uses processes and inter-process communication under the hood
• Also offers distributed-computing facilities
• Allows side-stepping the GIL for CPU-bound applications
• Allows for data/memory sharing

Page 9

For advanced

EXAMPLE 3

Page 10

For advanced

This gets around the GIL limitation, but obviously has more overhead. In addition, communicating between processes is not as easy as reading and writing shared memory.

Python multiprocessing, on the other hand, uses multiple system-level processes; that is, it starts up multiple instances of the Python interpreter.

Page 11

Benchmarks

All results are in wall-clock time:

• Single-threaded: 41 minutes, 57 seconds
• Multi-threaded (8 threads): 106 minutes, 29 seconds
• Multiprocessing (8 processes): 6 minutes, 22 seconds

http://nathangrigg.net/2015/04/python-threading-vs-processes/

Page 12

For the lazier

joblib: a simple package outside the standard library for writing parallel for loops using multiprocessing.

Easy syntax, optimized to be fast and robust, in particular on large data, with specific optimizations for numpy arrays.

• Transparent and fast disk-caching of output values (memoized functions)
• Embarrassingly parallel helper
• Logging/tracing
• Fast compressed persistence (pickle-style dump and load of data)
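joblib's Memory class provides the disk-caching memoization transparently. As a rough stdlib-only sketch of the idea (not joblib's actual implementation): hash the function's arguments, and use the hash as a filename for the pickled return value.

```python
import hashlib
import os
import pickle
import tempfile

CACHE_DIR = tempfile.mkdtemp(prefix="memo_cache_")

def disk_memoize(func):
    """Cache func's return value on disk, keyed by a hash of its arguments."""
    def wrapper(*args, **kwargs):
        key = hashlib.sha1(pickle.dumps((func.__name__, args, kwargs))).hexdigest()
        path = os.path.join(CACHE_DIR, key + ".pkl")
        if os.path.exists(path):
            with open(path, "rb") as f:
                return pickle.load(f)          # cache hit: skip the computation
        result = func(*args, **kwargs)
        with open(path, "wb") as f:
            pickle.dump(result, f)             # cache miss: compute and store
        return result
    return wrapper

calls = []

@disk_memoize
def slow_square(x):
    calls.append(x)       # record how many times the body actually runs
    return x * x

slow_square(3)    # computed and written to disk
slow_square(3)    # read back from disk; the body is skipped
```

joblib's real Memory also tracks the function's source code, so editing the function invalidates its cache.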

Page 13

For the lazier

Page 14

For the lazier

>>> from math import sqrt
>>> [sqrt(i ** 2) for i in range(10)]
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

>>> from math import sqrt
>>> from joblib import Parallel, delayed
>>> Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10))
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

Parallel creates a multiprocessing pool that forks the Python interpreter into multiple processes to execute each of the items of the list.

Page 15

For the lazier


The delayed function is a simple trick to be able to create a tuple (function, args, kwargs) with a function-call syntax.
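Conceptually, delayed(f)(*args, **kwargs) just captures the call as a tuple instead of running it, so Parallel can execute it later in a worker process. A minimal reimplementation of the idea (not joblib's actual code):

```python
from math import sqrt

def delayed(function):
    """Capture a call as (function, args, kwargs) instead of executing it."""
    def delayed_function(*args, **kwargs):
        return function, args, kwargs
    return delayed_function

task = delayed(sqrt)(16)
print(task)    # (<built-in function sqrt>, (16,), {})

# A scheduler such as Parallel can then execute the captured call later:
func, args, kwargs = task
print(func(*args, **kwargs))    # 4.0
```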

Page 16

Examples

>>> from math import sqrt
>>> from joblib import Parallel, delayed
>>> Parallel(n_jobs=1)(delayed(sqrt)(i**2) for i in range(10))
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

>>> from math import modf
>>> from joblib import Parallel, delayed
>>> r = Parallel(n_jobs=1)(delayed(modf)(i/2.) for i in range(10))
>>> res, i = zip(*r)
>>> res
(0.0, 0.5, 0.0, 0.5, 0.0, 0.5, 0.0, 0.5, 0.0, 0.5)
>>> i
(0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0)

Page 17

Examples

>>> from time import sleep
>>> from joblib import Parallel, delayed
>>> r = Parallel(n_jobs=2, verbose=5)(delayed(sleep)(.1) for _ in range(10))
[Parallel(n_jobs=2)]: Done  1 out of 10 | elapsed: 0.1s remaining: 0.9s
[Parallel(n_jobs=2)]: Done  3 out of 10 | elapsed: 0.2s remaining: 0.5s
[Parallel(n_jobs=2)]: Done  6 out of 10 | elapsed: 0.3s remaining: 0.2s
[Parallel(n_jobs=2)]: Done  9 out of 10 | elapsed: 0.5s remaining: 0.1s
[Parallel(n_jobs=2)]: Done 10 out of 10 | elapsed: 0.5s finished

>>> from heapq import nlargest
>>> from joblib import Parallel, delayed
>>> Parallel(n_jobs=2)(delayed(nlargest)(2, n) for n in (range(4), 'abcde', 3))
#...
---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
TypeError                                          Mon Nov 12 11:37:46 2012
PID: 12934                                Python 2.7.3: /usr/bin/python
...........................................................................
/usr/lib/python2.7/heapq.pyc in nlargest(n=2, iterable=3, key=None)
    419     if n >= size:
    420         return sorted(iterable, key=key, reverse=True)[:n]
    421
    422     # When key is none, use simpler decoration
    423     if key is None:
--> 424         it = izip(iterable, count(0,-1))          # decorate
    425         result = _nlargest(n, it)
    426         return map(itemgetter(0), result)         # undecorate
    427
    428     # General case, slowest method
TypeError: izip argument #1 must support iteration

Page 18

At our lab

Page 19

Genome in one file

Page 20

From optical to digital

Page 21

Alignments and reads

Page 22

Parallel Variant Calling

Page 23

joblib benefits

It helped us put into production, within hours, the capability of running our pipeline in parallel, making the most of our cores.

Easy to read and to debug.

But with multiple tasks and steps it requires expertise in allocating CPUs in order to avoid memory over-scheduling.

Page 24

The most important!

Interruption of multiprocessing jobs with Ctrl-C works!

Goodbye to killing all the PIDs by hand!

Page 25

Further information

https://pythonhosted.org/joblib/parallel.html

pip install joblib

Page 26

Marcel Caraciolo @marcelcaraciolo

CTO Genomika Diagnósticos, DataGeek, Machine learning, Python, Data and brainssss!

joblib, Running Python functions as pipeline jobs