gpu computing with python and anaconda: the next frontier
Post on 21-Jan-2018
© 2017 Anaconda, Inc. - Confidential & Proprietary
GPU Computing with Python and Anaconda: The Next Frontier
Accelerate. Connect. Empower.
Stan Seibert
Director of Community Innovation
GPUs & Python: A Great Combination
• Python is becoming the glue that binds data science
• Rapid integration empowers data scientists to combine new technologies
• This is our goal for Anaconda:
  • Free distribution of Python and R for Win/Mac/Linux
  • Includes GPU-accelerated packages: Caffe, TensorFlow, PyTorch, Theano, Numba, Pyculib...
Deep Learning: An Early Success
[Figure: neural network diagram with ReLU layers]
• Powerful machine learning technique
• Many great open source options
• Every major package has a Python interface
• Very compute intensive
➡ Perfect for GPU acceleration
Numba: JIT Python Compilation
• Compile numerical Python functions for CPU or GPU
• Based on the LLVM compiler library
• Great for rapid, custom algorithm development
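As a minimal sketch of what compiling a numerical Python function looks like, the example below decorates a plain loop with Numba's `njit`. The function `estimate_pi` is a hypothetical example and not from the talk. Only the CPU target is shown; the GPU target needs CUDA hardware and is left out. The fallback decorator is an assumption added so the sketch still runs where Numba is not installed.

```python
import math

try:
    from numba import njit  # compiles the decorated function via LLVM
except ImportError:
    def njit(func):
        # Fallback: run as ordinary (slower) Python when Numba is absent.
        return func


@njit
def estimate_pi(n_steps):
    # Midpoint-rule integration of 4 / (1 + x^2) over [0, 1], which equals pi.
    width = 1.0 / n_steps
    total = 0.0
    for i in range(n_steps):
        x = (i + 0.5) * width
        total += 4.0 / (1.0 + x * x)
    return total * width


if __name__ == "__main__":
    # The first call triggers compilation; later calls reuse the machine code.
    print(estimate_pi(1_000_000), math.pi)
```

The loop is written the way one would write it in C, which is the style that benefits most from JIT compilation.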
Problem: An Ecosystem of Silos?
[Diagram: GPU applications (ETL/Data Prep, Database, Machine Learning, Visualization), each with its own Data, linked by CPU transfers]
Why do GPU applications share data through slow CPU memory?
GPU Open Analytics Initiative
Goal: Standardize data exchange between GPU analytics applications
Current Members: MapD, Anaconda, H2O.ai, BlazingDB, Graphistry, Gunrock
http://gpuopenanalytics.com/
Streamlining the Data Science Pipeline
[Diagram: GPU Database, Python Data Transformation, and Generalized Linear Model stages; labels: GDF, Packed Array, Apache Arrow]
All data stays on the GPU
GPU Dataframe (GDF)
• A format for tabular data in GPU memory
• Exchange GDF between different libraries
• Move between processes using CUDA IPC
• Based on Apache Arrow
• Code in separate library
• Work in progress to move functionality into Arrow project
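The slides do not spell out GDF's memory layout, so the following is only a CPU-side sketch of the general Apache Arrow column layout that GDF is said to build on: a contiguous data buffer plus a validity bitmap with one bit per value (least-significant bit first, 1 meaning "present"). The helper names `to_arrow_style_column` and `is_valid` are hypothetical and exist only for illustration.

```python
from array import array


def to_arrow_style_column(values):
    """Pack floats (None = missing) into a data buffer and a validity bitmap."""
    # Missing slots still occupy space in the data buffer; their content is arbitrary.
    data = array("d", (0.0 if v is None else v for v in values))
    bitmap = bytearray((len(values) + 7) // 8)  # one bit per value, rounded up
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)  # least-significant bit first
    return data, bitmap


def is_valid(bitmap, i):
    """Return True when slot i holds a real value."""
    return bool((bitmap[i // 8] >> (i % 8)) & 1)


if __name__ == "__main__":
    data, bitmap = to_arrow_style_column([1.5, None, 3.0])
    print(list(data), [is_valid(bitmap, i) for i in range(3)])
```

Because each column is a flat buffer, handing it to another library or process can be done by sharing a pointer and a length rather than copying row by row.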
PyGDF: Python GPU Dataframes
• A Python library for manipulating GPU Dataframes:
  • Create from NumPy arrays and Pandas Dataframes
  • Exchange between processes
  • Math operations
  • Sort, Filter, Join, Group By
• Ideal for data manipulation and feature engineering stages between data source and machine learning
• Not intended to replace dedicated database applications
• Interoperates with our Python compiler for GPU: Numba
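To make the Filter and Group By operations concrete, here is a plain-Python sketch over column-oriented data (a dict of equal-length lists). It runs on the CPU and does not use PyGDF or its API; the function names `filter_rows` and `group_by_sum` and the sample table are hypothetical.

```python
from collections import defaultdict


def filter_rows(columns, name, predicate):
    """Keep only the rows where predicate(columns[name][i]) is true."""
    keep = [i for i, v in enumerate(columns[name]) if predicate(v)]
    return {col: [vals[i] for i in keep] for col, vals in columns.items()}


def group_by_sum(columns, key, value):
    """Sum the `value` column for each distinct entry in the `key` column."""
    totals = defaultdict(float)
    for k, v in zip(columns[key], columns[value]):
        totals[k] += v
    return dict(totals)


if __name__ == "__main__":
    table = {"sensor": ["a", "b", "a", "b"], "reading": [1.0, 2.0, 3.0, 4.0]}
    print(filter_rows(table, "reading", lambda r: r > 1.5))
    print(group_by_sum(table, "sensor", "reading"))
```

A GPU dataframe library performs the same logical operations, but over whole columns in parallel instead of one row at a time.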
PyGDF: Group By Performance
GPU speedups become very large above 10 million elements
Aggregation functions are extremely efficient on the GPU
Dask: Distributed Computing
• Scalable execution of task graphs, from single computers to 1000+ node clusters
• Scheduler is "resource aware" and can direct GPU tasks to nodes with appropriate hardware. Great for heterogeneous clusters!
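A task graph can be pictured as a dictionary that maps each key to either a value or a tuple of a function and its arguments, which is the shape Dask uses for its graphs. The sketch below evaluates such a dictionary recursively in a single process. It is not Dask's scheduler and has no resource awareness; the `get` function here and the sample graph are hypothetical.

```python
from operator import add, mul


def get(graph, key, cache=None):
    """Evaluate `key` in a dict-based task graph, computing dependencies first."""
    if cache is None:
        cache = {}
    if key in cache:
        return cache[key]
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *args = task
        # String arguments that name another key are dependencies; resolve them.
        resolved = [
            get(graph, a, cache) if isinstance(a, str) and a in graph else a
            for a in args
        ]
        result = func(*resolved)
    else:
        result = task  # a literal value
    cache[key] = result  # each task runs at most once
    return result


graph = {
    "x": 1,
    "y": 2,
    "sum": (add, "x", "y"),
    "scaled": (mul, "sum", 10),
}

if __name__ == "__main__":
    print(get(graph, "scaled"))
```

A distributed scheduler walks the same kind of graph but assigns independent tasks to different workers, which is where directing GPU tasks to GPU-equipped nodes comes in.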
The Future
• In flight:
  • Merger of common code into Apache Arrow GPU support
  • Node.js interface to GDF (Graphistry)
  • Dask GDF: Distributed GPU dataframe
• Other potential future projects:
  • Tensor exchange between Python GPU libraries
  • GPU shared memory service (Plasma for GPU)
  • Can we improve the interaction of unified memory and IPC?
• What do you want to see?
Learn More
GPU Open Analytics Website: http://gpuopenanalytics.com
GOAI Github Organization: https://github.com/gpuopenanalytics/
GOAI Google Group: https://groups.google.com/forum/#!forum/gpuopenanalytics