© 2015 Continuum Analytics- Confidential & Proprietary
High Performance Python with Numba
Stan Seibert
May 3, 2016
The Platform to Accelerate, Connect & Empower
Continuum Analytics is the company behind Anaconda and offers:
• Consulting
• Training
• Open-Source Software
• Enterprise Software
Anaconda is the leading open data science platform, powered by Python, the fastest growing open data science language.
Why Compile Python?
• Pure, interpreted Python is slow.
• Python can easily interface with fast code written in C, C++, or FORTRAN.
• Do we always need to switch languages for speed?
• Answer: No, Numba provides an alternative.
How Does Numba Work?
@jit
def do_math(a, b):
    …
>>> do_math(x, y)
[Diagram: the @jit compilation pipeline. Boxes shown: Python Function (bytecode), Functions Arguments, Bytecode Analysis, Type Inference, Numba IR, Rewrite IR, Lowering, LLVM IR, LLVM/NVVM JIT, Machine Code, Cache, Execute!]
Supported Platforms
OS
• Windows (7 and later)
• OS X (10.9 and later)
• Linux (~RHEL 5 and later)
HW
• 32 and 64-bit x86 CPUs
• CUDA-capable NVIDIA GPUs
• HSA-capable AMD GPUs
SW
• Python 2 and 3
• NumPy 1.7 through 1.10
Example: Filter an array
Callouts on the code listing:
• Numba decorator (nopython=True not required)
• Array allocation
• Looping over ndarray x as an iterator
• Using numpy math functions
• Returning a slice of the array
Example: Filter an array
2.7x speedup
Type Specialization
Numba compiles a new version of the function for each set of argument types.
It automatically dispatches to the correct version in subsequent calls.
Function Inlining
LLVM can inline functions, optimizing away call overhead.
No call to clamp is found in the mean_with_clamp assembly.
Multi-threading
Numba will release the GIL if requested.
Allows multithreaded execution to use multiple cores.
[Bar chart: speedup (axis 0 to 4) with 1, 2, and 4 cores]
NumPy Ufuncs
Turn a scalar function into an array function (“ufunc”):
Compiling with Multi-threaded Target
Only change required to generate multithreaded ufunc!
3.2x speedup on quad-core system
Using Numba in Distributed Computing
• Numba-compiled functions can be serialized and sent to remote systems.
• Functions are recompiled on the remote system (and specialized to that CPU!)
• Works with systems that rely on pickle to send functions. Tested with Spark and Dask.
Summary
• Numba compiles numerical Python with large speedups.
• Supports a wide range of Python and NumPy use cases.
• Create ufuncs that can be parallelized automatically!
• Numba works with distributed compute frameworks like Spark and Dask.
Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Optimization Notice
Additional Materials
Intel® Distribution for Python*:
Visit webpage for download, documentation, and support
Intel® VTune™ Amplifier:
Now supports profiling Python apps!
Visit webpage for more information and download
More Intel products for developers on Intel Developer Zone
Visit Intel® Software Development Tools webpage
Numba JIT Compiler for Python
Project page: http://numba.pydata.org
Code examples: https://anaconda.org/seibert/intel_continuum_webinar_may3/notebook