
Post on 04-Jun-2020


The Convergence of Big Data and AI

Jammy Zhou - Linaro

Big Data and AI

Data Science

Data Analytics

Artificial Intelligence

Machine Learning

Deep Learning

Big Data

Datasets

Algorithms

A unified cluster for both!

ML/DL Integration with Big Data

TensorFlowOnSpark

CaffeOnSpark

Deep Learning Pipelines

Apache Initiatives

3rd Party Solutions

From Industry Vendors

From Chip Vendors

Apache Spark Ecosystem

A unified analytics engine for large-scale data processing

● Resource Manager: Standalone, Hadoop YARN, Mesos, Kubernetes
● Computing Engine: Spark Core
● Service Modules: Spark SQL, Spark Streaming, MLlib, GraphX
● Languages: Java, Scala, Python, R
● Storage & Data Sources: HDFS, HBase, Cassandra, Hive, …

AWS EC2

SQL

Alluxio

ML Pipelines in MLlib

● ML Pipelines provide a set of APIs built on top of DataFrames from Spark SQL
● A Transformer is an algorithm that transforms one DataFrame into another
● An Estimator is an algorithm that is fit on a DataFrame to produce a Transformer
● A Pipeline chains multiple Transformers and Estimators to specify a workflow

[Diagram: Pipeline (fit): DataFrame → Transformer → Transformer → Estimator → PipelineModel]

[Diagram: PipelineModel (transform): DataFrame → Transformer → Transformer → Model → Result]
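The Transformer / Estimator / Pipeline contract described above can be sketched in a few lines of plain Python. This is a toy model of the idea, not Spark's API: the classes `Tokenizer`, `KeywordEstimator` and `KeywordModel`, and the use of a list of dicts as a stand-in for a DataFrame, are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

Row = Dict[str, object]
Frame = List[Row]  # hypothetical stand-in for a Spark DataFrame


class Tokenizer:
    """Transformer: maps one frame to another by adding a 'words' column."""

    def transform(self, frame: Frame) -> Frame:
        return [{**row, "words": str(row["text"]).lower().split()} for row in frame]


@dataclass
class KeywordModel:
    """Transformer produced by fitting KeywordEstimator."""
    keywords: set

    def transform(self, frame: Frame) -> Frame:
        return [
            {**row, "prediction": 1 if self.keywords & set(row["words"]) else 0}
            for row in frame
        ]


class KeywordEstimator:
    """Estimator: fit() learns words that occur only in positive rows."""

    def fit(self, frame: Frame) -> KeywordModel:
        pos = {w for r in frame if r["label"] == 1 for w in r["words"]}
        neg = {w for r in frame if r["label"] == 0 for w in r["words"]}
        return KeywordModel(keywords=pos - neg)


class PipelineModel:
    """Fitted pipeline: every stage is now a Transformer."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, frame: Frame) -> Frame:
        for stage in self.stages:
            frame = stage.transform(frame)
        return frame


class Pipeline:
    """Chains stages; fit() replaces each Estimator with the model it produces."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, frame: Frame) -> PipelineModel:
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):      # Estimator -> fitted Transformer
                stage = stage.fit(frame)
            fitted.append(stage)
            frame = stage.transform(frame)  # feed the next stage
        return PipelineModel(fitted)


train = [
    {"text": "Spark is fast", "label": 1},
    {"text": "slow batch job", "label": 0},
    {"text": "Spark job", "label": 1},
]
model = Pipeline([Tokenizer(), KeywordEstimator()]).fit(train)
result = model.transform([{"text": "I like Spark"}, {"text": "batch job"}])
predictions = [row["prediction"] for row in result]
```

Fitting returns a `PipelineModel` whose stages are all Transformers, which mirrors the two diagrams: the Estimator in the Pipeline becomes a Model in the PipelineModel.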

ML Algorithms in MLlib

● Classification & Regression
  ○ Logistic Regression
  ○ Decision Tree
  ○ Random Forest
  ○ Gradient-Boosted Tree
  ○ Linear Regression
  ○ Multilayer Perceptron
  ○ Linear Support Vector Machine
  ○ Naive Bayes
● Clustering
  ○ K-means
  ○ Latent Dirichlet Allocation
  ○ Bisecting k-means
  ○ Gaussian Mixture Model
● Collaborative Filtering
● Frequent Pattern Mining
  ○ FP-Growth
  ○ PrefixSpan

How about Deep Learning?

Deep Learning Pipelines

● Deep Learning Pipelines, developed by Databricks, is built on Spark ML Pipelines
● Images are loaded into a DataFrame and decoded automatically
● Enables fast transfer learning with a Featurizer that reuses pre-trained models
● Applies pre-trained deep learning models as Transformers
  ○ TF-backed Keras models and TF Graphs are supported
● Deploys models with Spark DataFrames and SQL UDFs
● Distributed hyperparameter tuning with an Estimator and MLlib built-in tools like CrossValidator and TrainValidationSplit
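What a CrossValidator-style tool does for hyperparameter tuning can be sketched as a plain grid search with k-fold validation. The sketch below is serial and uses only the Python standard library; the toy estimator `fit_shrunk_mean`, the parameter name `reg`, and the index-based fold assignment are all assumptions for illustration, and nothing here is MLlib's actual implementation (which evaluates the candidates on a Spark cluster).

```python
from statistics import mean
from typing import Dict, Sequence, Tuple


def fit_shrunk_mean(train: Sequence[float], reg: float) -> float:
    """Toy 'estimator': a constant predictor shrunk toward zero by `reg`."""
    return sum(train) / (len(train) + reg)


def mse(prediction: float, validation: Sequence[float]) -> float:
    """Mean squared error of a constant prediction on held-out values."""
    return mean((y - prediction) ** 2 for y in validation)


def cross_validate(
    data: Sequence[float], param_grid: Sequence[float], num_folds: int
) -> Tuple[float, Dict[float, float]]:
    """Return the best `reg` and the mean validation error per candidate."""
    scores = {}
    for reg in param_grid:
        fold_errors = []
        for fold in range(num_folds):
            # Row i is held out in fold (i % num_folds), trained on otherwise.
            train = [y for i, y in enumerate(data) if i % num_folds != fold]
            valid = [y for i, y in enumerate(data) if i % num_folds == fold]
            fold_errors.append(mse(fit_shrunk_mean(train, reg), valid))
        scores[reg] = mean(fold_errors)
    best = min(scores, key=scores.get)  # lowest average error wins
    return best, scores


data = [9.0, 10.0, 11.0, 10.0, 9.0, 11.0]
best_reg, scores = cross_validate(data, param_grid=[0.0, 1.0, 10.0], num_folds=3)
```

Each (parameter, fold) pair is an independent fit, which is why this kind of search parallelizes naturally across a cluster.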


Project Hydrogen

● A Spark initiative to unify Big Data and AI workloads
● Barrier execution mode was introduced in Spark to run a distributed DL job as a Spark job with gang scheduling
  ○ Horovod integration via HorovodRunner (in Databricks Runtime ML) or horovod.spark (in Horovod) to run Horovod as a Spark job
● Optimized data exchange between Spark and DL frameworks
  ○ Pandas UDF implementation via Apache Arrow
● Accelerator-aware scheduling
  ○ Heterogeneous accelerator support by resource managers like YARN, Mesos & Kubernetes

Distributed Deep Learning

● Distributed support is critical to integrating DL frameworks with Spark
● Parallelism for Deep Learning
  ○ Data parallelism (each worker holds a full model replica; in TensorFlow, in-graph or between-graph replication)
    ■ Synchronous vs. asynchronous
    ■ Centralized vs. decentralized for synchronous training
      ● Parameter server for centralized mode
      ● Ring-allreduce for decentralized mode
    ■ Parameter server can also be used for asynchronous training
  ○ Model parallelism (the model itself is partitioned across devices)
● Multi-device & multi-node communication
  ○ Interconnect: PCIe, NVLink, xGMI, InfiniBand, Omni-Path, High-Speed Ethernet, RoCE
  ○ Libraries: OpenMPI, NCCL (Nvidia), RCCL (AMD), libfabric (OpenFabrics), UCX
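The difference between the centralized and decentralized modes for synchronous training can be made concrete with a small single-process simulation. In the sketch below, `parameter_server_sum` gathers every worker's gradient in one place and broadcasts the total, while `ring_allreduce` reaches the same result by passing chunks around a ring (a scatter-reduce phase followed by an allgather phase). Both function names, the chunking scheme and the list-based "workers" are assumptions for illustration; real libraries such as NCCL or MPI do this over the interconnect.

```python
from typing import List

Vector = List[float]


def parameter_server_sum(worker_grads: List[Vector]) -> List[Vector]:
    """Centralized: one server sums all gradients and sends the total back."""
    total = [sum(values) for values in zip(*worker_grads)]
    return [list(total) for _ in worker_grads]


def ring_allreduce(worker_grads: List[Vector]) -> List[Vector]:
    """Decentralized: each worker only talks to its right-hand neighbour."""
    n = len(worker_grads)
    length = len(worker_grads[0])
    # Split every vector into n contiguous chunks (some may be empty).
    bounds = [(k * length // n, (k + 1) * length // n) for k in range(n)]
    bufs = [list(g) for g in worker_grads]

    # Phase 1, scatter-reduce: after n-1 steps worker i owns the fully
    # summed chunk (i + 1) % n.
    for step in range(n - 1):
        # Snapshot all outgoing messages first: sends happen simultaneously.
        msgs = []
        for i in range(n):
            chunk = (i - step) % n
            lo, hi = bounds[chunk]
            msgs.append((chunk, bufs[i][lo:hi]))
        for i, (chunk, payload) in enumerate(msgs):
            dst = (i + 1) % n
            lo, _ = bounds[chunk]
            for k, value in enumerate(payload):
                bufs[dst][lo + k] += value

    # Phase 2, allgather: circulate the finished chunks so every worker
    # ends up with the complete sum.
    for step in range(n - 1):
        msgs = []
        for i in range(n):
            chunk = (i + 1 - step) % n
            lo, hi = bounds[chunk]
            msgs.append((chunk, bufs[i][lo:hi]))
        for i, (chunk, payload) in enumerate(msgs):
            dst = (i + 1) % n
            lo, hi = bounds[chunk]
            bufs[dst][lo:hi] = payload
    return bufs


grads = [
    [1, 2, 3, 4, 5],
    [10, 20, 30, 40, 50],
    [100, 200, 300, 400, 500],
]
ring_result = ring_allreduce(grads)
server_result = parameter_server_sum(grads)
```

In the ring, each worker sends and receives roughly 2 × (n − 1) / n of the vector in total regardless of the number of workers, whereas the parameter server must receive a full vector from every worker, which is why the ring scales better on bandwidth-limited links.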


Distributed Framework Support

[Table: parameter server and ring-allreduce support across TensorFlow, TensorFlowOnSpark, Horovod, HorovodEstimator, HopsML, Keras Backend, PyTorch, MXNet and Angel ML]

[1] TensorFlow has MPI collectives for Baidu allreduce; Horovod replaces Baidu allreduce with NCCL
[2] CollectiveAllReduceStrategy is used by HopsML
[3] HorovodEstimator is the Horovod integration with Spark MLlib for distributed training

The Arm Story

● Linaro Data Center and Cloud Group
  ○ Big Data Lead Project
  ○ HPC SIG (SVE, MPI, math libraries, etc.)
● Linaro Machine Intelligence Initiative
  ○ Initial focus on inference support with Cortex-A SoCs
  ○ ArmNN, TVM, etc.
● Nvidia AI and HPC stack for Arm (planned for end of 2019)
  ○ Announced at ISC 19 in Frankfurt on June 17th
  ○ Lifts the major barrier to integrating AI solutions with Big Data on Arm platforms
● What’s next?

Thank you

Join Linaro to accelerate deployment of your Arm-based solutions through collaboration

contactus@linaro.org
