numpy-aware dynamic python compiler - meetupfiles.meetup.com/263790/numba.pdfnumba --- a deeper look...
TRANSCRIPT
NumbaNumPy-aware dynamic Python compiler
NYC Python MeetupOctober 24, 2012
Travis E. Oliphant
Friday, October 26, 12
Motivation
•Python is great for rapid development and high-level thinking-in-code
•It is slow for interior loops because lack of type information leads to a lot of indirection and “extra” code.
Friday, October 26, 12
Motivation
•NumPy users have a lot of type information --- but only currently have one-size fits all pre-compiled, vectorized loops.
•Many new features envisioned will need the ability for high-level expressions to be compiled to machine code.
Friday, October 26, 12
Goals
• Most developers should not have to write anything but Python -- maybe other even higher-level Domain Specific Language (DSL).
• Create faster code using array-expressions from NumPy users -- Fortran is the initial target
• Take advantage of multi-core and GPUs for a subset of Python.
Friday, October 26, 12
Why Not PyPy?
• PyPy does not work with CPython• PyPy is a (meta) “tracing” JIT. Machine code is
generated on the fly so there is no “build step” -- but we want to support a “build step” when justified
• PyPy tries to speed up everything -- we want to optimize more specifically on numeric codes (including complex numbers)
More to the story...
Friday, October 26, 12
Why not Cython?
• Cython is great for what it does, but...• Cython creates extension modules which cannot be
“unloaded” dynamically• Cython requires a full C-compiler• Cython doesn’t do type inference -- you have to
declare types on everything• Cython is another syntax to learn
Friday, October 26, 12
More Ranting
• The world needs more array-oriented compilers --- Python has needed one for a decade at least (Numeric provided typed multi-dimensional arrays in 1995)
• Array-oriented computing needs more light in CS curricula (secrets of APL and N’IAL)
• Most domain experts can write what they want at a high-level. Commonly this is then “translated” to a lower-level and then the compiler gets a hold of it. This is sub-optimal.
Friday, October 26, 12
More Ranting• Today’s vector machines (and vector co-processors,
or GPUS) were made for array-oriented computing. • The software stack has just not caught up ---
unfortunate because APL came out in 1963. • There is a reason Fortran remains popular.
Friday, October 26, 12
Array-Oriented Computing• Loosely defined as “Organize data-together” and
operate on it together (or in cache-size chunks) with array-level operations (e.g. NumPy)
Object
Attr1
Attr2
Attr3
Object
Attr1
Attr2
Attr3
Object
Attr1
Attr2
Attr3
Object
Attr1
Attr2
Attr3
Object
Attr1
Attr2
Attr3
Object
Attr1
Attr2
Attr3
Attr1 Attr2 Attr3
Object1
Object2
Object3
Object4
Object5
Object6
Friday, October 26, 12
Goal:
Numba should be the world’s best array-oriented compiler.
Friday, October 26, 12
NumPy + Mamba = Numba
LLVM Library
Intel Nvidia AppleAMD
OpenCLISPC CUDA CLANGOpenMP
LLVM-PY
Python Function Machine Code
ARM
Friday, October 26, 12
Uses of NumbaPython
Function
NumPy Runtime
Ufu
ncs
Gen
eral
ized
U
Func
s
Func
tion-
base
d In
dexi
ng
Mem
ory
Filte
rs
Win
dow
K
erne
l Fu
ncs
I/O F
ilter
s
Red
uctio
n Fi
lters
Com
pute
d C
olum
ns
Numba
function pointer
Friday, October 26, 12
Uses of Numba in SciPy
integrateoptimize
odespecial
writing more of SciPy at high-level
Friday, October 26, 12
Numba --- a deeper look
Numba is a Python to LLVM translator. It translates Python to LLVM IR (the LLVM
machinery is then used to create machine code from there). Numba is NumPy aware
--- it understands NumPy’s type system, methods, C-API, and data-structures
Friday, October 26, 12
Numba -- written in Python
• Numba itself is pure Python -- it uses (an updated) LLVM-py to interact with the LLVM C++ library to build a representation of the code in LLVM assembler.
• LLVM then creates machine code (or a “bitcode” module which can be persisted or sent to another machine)
• Machine-code is equivalent to a C-level function-pointer (e.g. a ctypes function)
Friday, October 26, 12
Example
Friday, October 26, 12
Examples
Friday, October 26, 12
Friday, October 26, 12
Friday, October 26, 12
Demo
Friday, October 26, 12
Status and Future
• My early bytecode branch further developed by Jon Riehl (Resilient Science) sponsored by Continuum Analytics, Inc. --- interprets bytecode directly
• Current trunk works with AST directly and making rapid progress- Mark Florrison (minivect)- Siu Kwan Lam (pymothoa)
Friday, October 26, 12
RoadMap
• Numba 0.2 available now• Github trunk has many changes -- 0.3 will support
- structures- Python code- objects with inheritance
Check out NumbaPro for cutting-edge features
Friday, October 26, 12
Software Stack Future?
LLVM
Python
COBJCFORTRAN
R
C++
Plateaus of Code re-use + DSLs
MatlabSQL
TDPL
Friday, October 26, 12
Join Us!
http://numba.pydata.org
Friday, October 26, 12