the platform inside and out · 2020/4/14 · • easy training: with the same apis • trusted:...
TRANSCRIPT
![Page 1: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/1.jpg)
Nick BeckerRAPIDS Engineering
The Platform Inside and Out
![Page 2: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/2.jpg)
2
RAPIDSEnd-to-End Accelerated GPU Data Science
cuDF cuIOAnalytics
GPU Memory
Data Preparation VisualizationModel Training
cuMLMachine Learning
cuGraphGraph Analytics
PyTorch Chainer MxNet
Deep Learning
cuxfilter <> pyVizVisualization
Dask
![Page 3: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/3.jpg)
3
Data Processing EvolutionFaster data access, less data movement
25-100x Improvement
Less codeLanguage flexible
Primarily In-Memory
HDFS Read
HDFS Write
HDFS Read
HDFS Write
HDFS ReadQuery ETL ML Train
HDFS Read Query ETL ML Train
HDFS Read
GPU ReadQuery
CPUWrit
e
GPU Read ETL
CPUWrit
e
GPU Read
MLTrain
5-10x ImprovementMore code
Language rigidSubstantially on GPU
Traditional GPU Processing
Hadoop Processing, Reading from disk
Spark In-Memory Processing
![Page 4: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/4.jpg)
4
APP A
Data Movement and TransformationData Movement and TransformationThe bane of productivity and performance
CPU
APP B
Copy & Convert
Copy & Convert
Copy & Convert
APP A GPU Data
APP BGPU Data
Read Data
Load Data
APP B
APP A
GPU
![Page 5: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/5.jpg)
5
Data Movement and TransformationData Movement and TransformationWhat if we could keep data on the GPU?
APP A
APP B
Copy & Convert
Copy & Convert
Copy & Convert
Read Data
Load Data
CPU
APP A GPU Data
APP BGPU Data
APP B
APP A
GPUCopy & Convert
![Page 6: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/6.jpg)
6
Learning from Apache Arrow
From Apache Arrow Home Page - https://arrow.apache.org/
● Each system has its own internal memory format● 70-80% computation wasted on serialization and deserialization● Similar functionality implemented in multiple projects
● All systems utilize the same memory format● No overhead for cross-system communication● Projects can share functionality (eg, Parquet-to-Arrow
reader)
![Page 7: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/7.jpg)
7
Data Processing EvolutionFaster data access, less data movement
25-100x Improvement
Less codeLanguage flexible
Primarily In-Memory
HDFS Read
HDFS Write
HDFS Read
HDFS Write
HDFS ReadQuery ETL ML Train
HDFS Read Query ETL ML Train
HDFS Read
GPU ReadQuery
CPUWrit
e
GPU Read ETL
CPUWrit
e
GPU Read
MLTrain
ArrowRead ETL ML
Train
5-10x ImprovementMore code
Language rigidSubstantially on GPU
50-100x ImprovementSame code
Language flexiblePrimarily on GPU
RAPIDS
Traditional GPU Processing
Hadoop Processing, Reading from disk
Spark In-Memory Processing
Query
![Page 8: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/8.jpg)
8
Faster Speeds, Real-World BenefitscuIO/cuDF – Load and Data Preparation XGBoost Machine Learning
Time in seconds (shorter is better)cuIO/cuDF (Load and Data Prep) Data Conversion XGBoost
Benchmark
200GB CSV dataset; Data prep includes joins, variable transformations
CPU Cluster ConfigurationCPU nodes (61 GiB memory, 8 vCPUs, 64-bit platform), Apache Spark
DGX Cluster Configuration5x DGX-1 on InfiniBand network
8762
6148
3925
3221
322
213
End-to-End
![Page 9: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/9.jpg)
9
Faster Speeds, Real-World BenefitscuIO/cuDF – Load and Data Preparation XGBoost Machine Learning
Time in seconds (shorter is better)cuIO/cuDF (Load and Data Prep) Data Conversion XGBoost
Benchmark
200GB CSV dataset; Data prep includes joins, variable transformations
CPU Cluster ConfigurationCPU nodes (61 GiB memory, 8 vCPUs, 64-bit platform), Apache Spark
DGX Cluster Configuration5x DGX-1 on InfiniBand network
End-to-End
Improving Over Time
![Page 10: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/10.jpg)
10
Speed, Ease of Use, and IterationThe Way to Win at Data Science
![Page 11: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/11.jpg)
11
RAPIDS Core
![Page 12: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/12.jpg)
12
PandasAnalytics
CPU Memory
Data Preparation VisualizationModel Training
Scikit-LearnMachine Learning
NetworkXGraph Analytics
PyTorch Chainer MxNet
Deep Learning
Matplotlib/PlotlyVisualization
Open Source Data Science EcosystemFamiliar Python APIs
Dask
![Page 13: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/13.jpg)
13
cuDF cuIOAnalytics
GPU Memory
Data Preparation VisualizationModel Training
cuMLMachine Learning
cuGraphGraph Analytics
PyTorch Chainer MxNet
Deep Learning
cuxfilter <> pyVizVisualization
RAPIDSEnd-to-End Accelerated GPU Data Science
Dask
![Page 14: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/14.jpg)
14
Dask
![Page 15: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/15.jpg)
15
cuDF cuIOAnalytics
GPU Memory
Data Preparation VisualizationModel Training
cuMLMachine Learning
cuGraphGraph Analytics
PyTorch Chainer MxNet
Deep Learning
cuxfilter <> pyVizVisualization
RAPIDSScaling RAPIDS with Dask
Dask
![Page 16: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/16.jpg)
16
Why Dask?
• Easy Migration: Built on top of NumPy, Pandas Scikit-Learn, etc.
• Easy Training: With the same APIs• Trusted: With the same developer community
PyData Native
• Easy to install and use on a laptop• Scales out to thousand-node clusters
Easy Scalability
• Most common parallelism framework today in the PyData and SciPy community
Popular
• HPC: SLURM, PBS, LSF, SGE• Cloud: Kubernetes• Hadoop/Spark: Yarn
Deployable
![Page 17: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/17.jpg)
17
Why OpenUCX?
• TCP sockets are slow!
• UCX provides uniform access to transports (TCP, InfiniBand, shared memory, NVLink)
• Alpha Python bindings for UCX (ucx-py)
• Will provide best communication performance, to Dask based on available hardware on nodes/cluster
Bringing hardware accelerated communications to Dask
conda install -c conda-forge -c rapidsai \ cudatoolkit=<CUDA version> ucx-proc=*=gpu ucx ucx-py
![Page 18: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/18.jpg)
18
cuDF v0.13, UCX-PY 0.13
Running on NVIDIA DGX-1 (8GPUs):
GPU: NVIDIA Tesla V100 32GBCPU: Intel(R) Xeon(R) CPU 8168 @ 2.70GHz
Benchmark Setup:
DataFrames: Left/Right 1x int64 column key column, 1x int64 value columns
Merge: inner
30% of matching data balanced across each partition
Benchmarks: Distributed cuDF Random Merge
![Page 19: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/19.jpg)
19
cuDF
![Page 20: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/20.jpg)
20
cuDF cuIOAnalytics
GPU Memory
Data Preparation VisualizationModel Training
cuMLMachine Learning
cuGraphGraph Analytics
PyTorch Chainer MxNet
Deep Learning
cuxfilter <> pyVizVisualization
RAPIDSGPU Accelerated data wrangling and feature engineering
Dask
![Page 21: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/21.jpg)
21
GPU-Accelerated ETLThe average data scientist spends 90+% of their time in ETL as opposed to training
models
![Page 22: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/22.jpg)
22
ETL Technology Stack
Dask cuDFcuDF
Pandas
ThrustCub
Jitify
Python
Cython
cuDF C++
CUDA Libraries
CUDA
![Page 23: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/23.jpg)
23
ETL: the Backbone of Data SciencelibcuDF is…
CUDA C++ Library
● Table (dataframe) and column types and algorithms
● CUDA kernels for sorting, join, groupby, reductions, partitioning, elementwise operations, etc.
● Optimized GPU implementations for strings, timestamps, numeric types (more coming)
● Primitives for scalable distributed ETL
std::unique_ptr<table>gather(table_view const& input, column_view const& gather_map, …){
// return a new table containing // rows from input indexed by // gather_map}
![Page 24: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/24.jpg)
24
ETL: the Backbone of Data SciencecuDF is…
Python Library
● A Python library for manipulating GPU DataFrames following the Pandas API
● Python interface to CUDA C++ library with additional functionality
● Creating GPU DataFrames from Numpy arrays, Pandas DataFrames, and PyArrow Tables
● JIT compilation of User-Defined Functions (UDFs) using Numba
![Page 25: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/25.jpg)
25
cuDF v0.13, Pandas 0.25.3
Running on NVIDIA DGX-1:
GPU: NVIDIA Tesla V100 32GBCPU: Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
Benchmark Setup:
RMM Pool Allocator Enabled
DataFrames: 2x int32 columns key columns, 3x int32 value columns
Merge: inner
GroupBy: count, sum, min, max calculated for each value column
Benchmarks: single-GPU Speedup vs. Pandas
![Page 26: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/26.jpg)
26
cuDF cuIOAnalytics
GPU Memory
Data Preparation VisualizationModel Training
cuMLMachine Learning
cuGraphGraph Analytics
PyTorch Chainer MxNet
Deep Learning
cuxfilter <> pyVizVisualization
Dask
ETL: the Backbone of Data SciencecuDF is not the end of the story
![Page 27: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/27.jpg)
27
ETL: the Backbone of Data ScienceString Support
•Regular Expressions•Element-wise operations
• Split, Find, Extract, Cat, Typecasting, etc…•String GroupBys, Joins, Sorting, etc.•Categorical columns fully on GPU•Native String type in libcudf C++
Current v0.13 String Support
• Further performance optimization• JIT-compiled String UDFs
Future v0.14+ String Support
![Page 28: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/28.jpg)
28
● Follow Pandas APIs and provide >10x speedup
● CSV Reader - v0.2, CSV Writer v0.8● Parquet Reader – v0.7, Parquet Writer v0.12● ORC Reader – v0.7, ORC Writer v0.10● JSON Reader - v0.8● Avro Reader - v0.9
● GPU Direct Storage integration in progress for bypassing PCIe bottlenecks!
● Key is GPU-accelerating both parsing and decompression wherever possible
Source: Apache Crail blog: SQL Performance: Part 1 - Input File Formats
Extraction is the CornerstonecuDF I/O for Faster Data Loading
![Page 29: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/29.jpg)
29
ETL is not just DataFrames!
![Page 30: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/30.jpg)
30
GPU Memory
Data Preparation VisualizationModel Training
RAPIDSBuilding bridges into the array ecosystem
Dask
cuDF cuIOAnalytics
cuMLMachine Learning
cuGraphGraph Analytics
PyTorch Chainer MxNet
Deep Learning
cuxfilter <> pyVizVisualization
![Page 31: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/31.jpg)
31
Interoperability for the WinDLPack and __cuda_array_interface__
mpi4py
![Page 32: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/32.jpg)
32
ETL: Arrays and DataFramesDask and CUDA Python arrays
• Scales NumPy to distributed clusters• Used in climate science, imaging, HPC analysis
up to 100TB size• Now seamlessly accelerated with GPUs
![Page 33: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/33.jpg)
33
Benchmark: single-GPU CuPy vs NumPy
More details: https://blog.dask.org/2019/06/27/single-gpu-cupy-benchmarks
![Page 34: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/34.jpg)
34
SVD BenchmarkDask and CuPy Doing Complex Workflows
![Page 35: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/35.jpg)
35
Architecture Time
Single CPU Core 2hr 39min
Forty CPU Cores 11min 30s
One GPU 1min 37s
Eight GPUs 19s
Petabyte Scale Analytics with Dask and CuPy
Cluster configuration: 20x GCP instances, each instance has:CPU: 1 VM socket (Intel Xeon CPU @ 2.30GHz), 2-core, 2 threads/core, 132GB mem, GbE ethernet, 950 GB diskGPU: 4x NVIDIA Tesla P100-16GB-PCIe (total GPU DRAM across nodes 1.22 TB)Software: Ubuntu 18.04, RAPIDS 0.5.1, Dask=1.1.1, Dask-Distributed=1.1.1, CuPY=5.2.0, CUDA 10.0.130
https://blog.dask.org/2019/01/03/dask-array-gpus-first-steps
![Page 36: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/36.jpg)
36
cuML
![Page 37: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/37.jpg)
37
GPU Memory
Data Preparation VisualizationModel Training
Dask
Machine LearningMore models more problems
cuDF cuIOAnalytics
cuMLMachine Learning
cuGraphGraph Analytics
PyTorch Chainer MxNet
Deep Learning
cuxfilter <> pyVizVisualization
![Page 38: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/38.jpg)
38
ProblemData sizes continue to grow
Histograms / Distributions
Dimension ReductionFeature Selection
Remove Outliers
Sampling
Massive Dataset
Better to start with as much data aspossible and explore / preprocess to scaleto performance needs.
Iterate. Cross Validate & Grid Search.
Iterate some more.
Meet reasonable speed vs accuracy tradeoff
Hours? Days?
TimeIncrease
s
![Page 39: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/39.jpg)
39
ML Technology Stack
Python
Cython
cuML Algorithms
cuML Prims
CUDA Libraries
CUDA
Dask cuMLDask cuDF
cuDFNumpy
ThrustCub
cuSolvernvGraphCUTLASScuSparsecuRandcuBlas
![Page 40: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/40.jpg)
40
AlgorithmsGPU-accelerated Scikit-Learn
Classification / Regression
Inference
Clustering
Decomposition & Dimensionality Reduction
Time Series
Decision Trees / Random ForestsLinear RegressionLogistic RegressionK-Nearest NeighborsSupport Vector Machines
Random forest / GBDT inference
K-MeansDBSCANSpectral Clustering
Principal ComponentsSingular Value DecompositionUMAPSpectral EmbeddingT-SNE
Holt-WintersSeasonal ARIMA
Cross Validation
More to come!
Hyper-parameter Tuning Key:● Preexisting● NEW or enhanced for
0.13
![Page 41: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/41.jpg)
41
RAPIDS matches common Python APIs
from sklearn.cluster import DBSCANdbscan = DBSCAN(eps = 0.3, min_samples = 5)
dbscan.fit(X)
y_hat = dbscan.predict(X)
Find Clusters
from sklearn.datasets import make_moonsimport pandas
X, y = make_moons(n_samples=int(1e2), noise=0.05, random_state=0)
X = pandas.DataFrame({'fea%d'%i: X[:, i] for i in range(X.shape[1])})
CPU-Based Clustering
![Page 42: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/42.jpg)
42
RAPIDS matches common Python APIs
from cuml import DBSCANdbscan = DBSCAN(eps = 0.3, min_samples = 5)
dbscan.fit(X)
y_hat = dbscan.predict(X)
Find Clusters
from sklearn.datasets import make_moonsimport cudf
X, y = make_moons(n_samples=int(1e2), noise=0.05, random_state=0)
X = cudf.DataFrame({'fea%d'%i: X[:, i] for i in range(X.shape[1])})
GPU-Accelerated Clustering
![Page 43: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/43.jpg)
43
Benchmarks: single-GPU cuML vs scikit-learn
1x V100 vs.
2x 20 core CPU
![Page 44: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/44.jpg)
44
cuML’s Forest Inference Library accelerates prediction (inference) for random forests and boosted decision trees:
● Works with existing saved models (XGBoost, LightGBM, scikit-learn RF cuML RF soon)
● Lightweight Python API● Single V100 GPU can infer up to 34x
faster than XGBoost dual-CPU node● Over 100 million forest inferences
per sec (with 1000 trees) on a DGX-1 for large (sparse) or dense models
Forest InferenceTaking models from training to production
23x 36x 34x 23x
![Page 45: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/45.jpg)
45
● RAPIDS works closely with the XGBoost community to accelerate GBDTs on GPU
● The default rapids conda metapackage includes XGBoost
● XGBoost can seamlessly load data from cuDF dataframes and cuPy arrays
● Dask allows XGBoost to scale to arbitrary numbers of GPUs
● With the gpu_hist tree method, a single GPU can outpace 10s to 100s of CPUs
● Version 1.0 of XGBoost launched with the RAPIDS 0.13 stack.
+
![Page 46: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/46.jpg)
46
Road to 1.0 March 2020 - RAPIDS 0.13
![Page 47: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/47.jpg)
47
Road to 1.0 2020 - RAPIDS 1.0
![Page 48: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/48.jpg)
48
cuGraph
![Page 49: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/49.jpg)
49
GPU Memory
Data Preparation VisualizationModel Training
Dask
Graph AnalyticsMore connections more insights
cuDF cuIOAnalytics
cuMLMachine Learning
cuGraphGraph Analytics
PyTorch Chainer MxNet
Deep Learning
cuxfilter <> pyVizVisualization
![Page 50: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/50.jpg)
50
Goals and Benefits of cuGraphFocus on Features and User Experience
• Property Graph support via DataFrames
Seamless Integration with cuDF and cuML
• Up to 500 million edges on a single 32GB GPU• Multi-GPU support for scaling into the billions of
edges
Breakthrough Performance
• Python: Familiar NetworkX-like API• C/C++: lower-level granular control for
application developers
Multiple APIs
• Extensive collection of algorithm, primitive, and utility functions
Growing Functionality
![Page 51: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/51.jpg)
51
AlgorithmsGPU-accelerated NetworkX
Community
Components
Link Analysis
Link Prediction
Traversal
Centrality
Spectral ClusteringBalanced-CutModularity Maximization
LouvainEnsemble Clustering for GraphsSubgraph ExtractionKCore and KCore NumberTriangle CountingK-Truss
JaccardWeighted JaccardOverlap Coefficient
Single Source Shortest Path (SSSP)Breadth First Search (BFS)
Graph Classes
Multi-GPU
More to come!
Utilities
Weakly Connected ComponentsStrongly Connected Components
KatzBetweenness Centrality
Structure
RenumberingAuto-renumbering
Page Rank (Multi-GPU)Personal Page Rank
![Page 52: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/52.jpg)
52
Multi-GPU PageRank Performance
PageRank portion of the HiBench benchmark suite
HiBench Scale Vertices Edges CSV File (GB)
# of V100 GPUs
# of CPU Threads
PageRank for3 Iterations (secs)
Huge 5,000,000 198,000,000 3 1 1.1
BigData 50,000,000 1,980,000,000 34 3 5.1
BigData x2 100,000,000 4,000,000,000 69 6 9.0
BigData x4 200,000,000 8,000,000,000 146 12 18.2
BigData x8 400,000,000 16,000,000,000 300 16 31.8
BigData x8 400,000,000 16,000,000,000 300 800* 5760*
*BigData x8, 100x 8-vCPU nodes, Apache Spark GraphX ⇒ 96 mins!
![Page 53: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/53.jpg)
53
Community
![Page 54: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/54.jpg)
54
Ecosystem Partners
![Page 55: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/55.jpg)
55
Building on top of RAPIDSA bigger, better, stronger ecosystem for all
Streamz
High-Performance Serverless event and data processing that utilizes RAPIDS for GPU Acceleration
Distributed stream processing using RAPIDS and Dask
GPU accelerated SQL engine built on top of RAPIDS
![Page 56: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/56.jpg)
56
Easy InstallationInteractive Installation Guide
![Page 57: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/57.jpg)
57
Contribute BackIssues, feature requests, PRs, Blogs, Tutorials, Videos, QA...bring your best!
![Page 58: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/58.jpg)
58
Getting Started
![Page 59: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/59.jpg)
59
RAPIDS DocsImproved and easier to use!
https://docs.rapids.ai
![Page 60: The Platform Inside and Out · 2020/4/14 · • Easy Training: With the same APIs • Trusted: With the same developer community PyData Native • Easy to install and use on a laptop](https://reader036.vdocument.in/reader036/viewer/2022081615/5fd2f73d5d65031a7d1b30d0/html5/thumbnails/60.jpg)
60
• https://ngc.nvidia.com/registry/nvidia-rapidsai-rapidsai
• https://hub.docker.com/r/rapidsai/rapidsai/
• https://github.com/rapidsai
• https://anaconda.org/rapidsai/
RAPIDSHow do I get the software?