Petuum - A New Platform
TRANSCRIPT
PETUUM: A New Platform for Distributed Machine Learning on Big Data
Seminar on Big Data Architectures for Machine Learning and Data Mining
Outline

Introduction: the need, current problems
Motivation: why build a new framework?! scalability
Components & Approach: key concepts (data-parallelism, model-parallelism)
System Design: internals - schematic view (scheduler, workers)
YARN compatibility: problems & solutions proposed
Performance: comparison with different platforms
Introduction

Machine Learning is becoming a primary mechanism for extracting information from data.
ML methods need to scale beyond a single machine: Flickr, Instagram and Facebook host 10s of billions of images, and it is highly inefficient to feed such big data sequentially, in a batch fashion, to an ML algorithm.
Despite the rapid development of many new ML models and algorithms aimed at scalable application, adoption of these technologies remains generally unseen in the data mining, NLP, vision, and other application communities.
The difficult migration from an academic implementation (small desktop PCs or lab clusters) to a big, less predictable platform (a cloud or a corporate cluster) prevents ML models and algorithms from being widely applied.
Motivation

Find a systematic way to efficiently apply a wide spectrum of advanced ML programs to industrial-scale problems, using Big Models (up to 100s of billions of parameters) on Big Data (up to terabytes or petabytes).
Modern parallelization strategies employ fine-grained operations, making it difficult to find a universal platform applicable to a wide range of ML programs at scale.
Problems with other platforms
Hadoop:
o the simplicity of MapReduce makes it difficult to exploit ML properties (e.g., error tolerance)
o performance on many ML programs has been surpassed by alternatives
Spark:
o does not offer the fine-grained scheduling of computation and communication needed for fast and correct execution of advanced ML algorithms
Fig. 1: The scale of Big ML efforts in recent literature. A key goal of Petuum is to enable large ML models to be run on fewer resources, even relative to highly-specialized implementations. Examples shown:
o LDA with 20m unique words and 10k topics (220b sparse parameters) on a cluster with a total of 256 cores and 1TB memory.
o MF on a 20m-by-20k matrix with rank 400 (8b parameters) on a cluster with a total of 128 cores and 1TB memory.
o CNN with 1b parameters on a cluster with a total of 1024 CPU cores and 2TB memory; GPUs not required.
Components & Approach

Formalized ML algorithms as iterative-convergent programs:
o stochastic gradient descent
o MCMC for determining point estimates in latent variable models
o coordinate descent, variational methods for graphical models
o proximal optimization for structured sparsity problems, and others
Identified the properties shared across all these algorithms.
The key lies in recognizing a clear dichotomy (division) between DATA and MODEL.
This inspired a bimodal approach to parallelism: data-parallel and model-parallel distribution and execution of a big ML program over a cluster of machines.
This approach exploits the unique statistical nature of ML algorithms, mainly three properties:
Error tolerance: iterative-convergent algorithms are robust against limited errors in intermediate calculations.
Dynamic structural dependency: changing correlation strengths between model parameters can be exploited for efficient parallelization.
Non-uniform convergence: the number of steps required for a parameter to converge can be highly skewed across parameters.

Fig. 2: Key properties of ML algorithms: (a) Non-uniform convergence; (b) Error-tolerant convergence; (c) Dependency structures amongst variables.
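As an illustration of how non-uniform convergence can be exploited, below is a minimal sketch (assumed names, not Petuum code) of priority-based selection: parameters whose values changed most in recent iterations are scheduled more often, while near-converged parameters are only occasionally revisited.

import numpy as np

def prioritized_indices(recent_change, k, floor=1e-6):
    # Sample k parameter indices with probability proportional to each
    # parameter's recent change; the small floor keeps converged
    # parameters from being starved entirely.
    scores = np.abs(recent_change) + floor
    return np.random.choice(len(scores), size=k, replace=False,
                            p=scores / scores.sum())

# Parameters 0 and 2 changed the most recently, so they are picked most often.
print(prioritized_indices(np.array([0.5, 0.01, 0.3, 0.0, 0.02]), k=2))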
Iterative-Convergent ML Algorithm: Given data D and a loss L (i.e., a fitness function such as the likelihood), a typical ML problem can be grounded as executing the following update equation iteratively, until the model state (i.e., parameters and/or latent variables) A reaches some stopping criterion:

A^(t) = F(A^(t-1), Δ_L(A^(t-1), D))

The update function Δ_L() (which improves the loss L) performs computation on the data D and the model state A, and outputs intermediate results to be aggregated by F().
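A minimal sketch of this template (illustrative, not Petuum's API), instantiated with gradient descent on least squares: delta() plays the role of Δ_L(), and F() folds its intermediate result into the new model state A.

import numpy as np

def delta(A, D, lr=0.1):
    # Delta_L: compute an update from the data and the current state
    X, y = D
    return -lr * X.T @ (X @ A - y) / len(y)

def F(A, update):
    # F: aggregate the intermediate result into the new state
    return A + update

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])
A = np.zeros(3)
for t in range(500):           # iterate until a stopping criterion
    A = F(A, delta(A, (X, y)))
print(A)                       # converges to [1.0, -2.0, 0.5]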
Data-parallelism:
o the data D is partitioned and assigned to computational workers;
o we assume that the update function Δ() can be applied to each of these data subsets independently;
o the updates Δ() are aggregated via summation;
o each parallel worker contributes equally.

Model-parallelism:
o the model A is partitioned and assigned to workers;
o unlike data-parallelism, each update Δ() takes a scheduling function S_p^(t-1)() which restricts it to operate on a subset of the model parameters;
o the model parameters are not independent of each other;
o each parallel worker contributes equally.

Fig. 3: Petuum data-parallelism. Fig. 4: Petuum model-parallelism.
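A minimal sketch (assumed names, not Petuum code) contrasting the two modes on a toy additive model: the data-parallel step sums independent updates from data shards, while the model-parallel step lets each worker touch only the parameter subset chosen by a scheduling function.

import numpy as np

def data_parallel_step(A, data_shards, delta):
    # A^(t) = F(A^(t-1), sum_p Delta(A^(t-1), D_p)): updates are additive
    return A + sum(delta(A, D_p) for D_p in data_shards)

def model_parallel_step(A, num_workers, schedule, delta_on):
    # each worker p updates only the subset S_p(A) of parameters
    A = A.copy()
    for p in range(num_workers):
        idx = schedule(A, p)
        A[idx] += delta_on(A, idx)
    return A

# Toy usage: move A toward a target vector.
target = np.arange(4.0)
delta = lambda A, D_p: 0.1 * (D_p - A) / 2           # per-shard update
A = data_parallel_step(np.zeros(4), [target, target], delta)
schedule = lambda A, p: np.arange(4)[p::2]           # worker p gets every 2nd param
A = model_parallel_step(A, 2, schedule, lambda A, idx: 0.1 * (target[idx] - A[idx]))
print(A)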
System Design

Fig. 5: Petuum scheduler, workers, parameter servers.

Petuum goal: allow users to easily implement data-parallel and model-parallel ML algorithms!

Parameter Server (PS):
o enables data-parallelism
o provides global read/write access to model parameters
o exposes a small API of three functions, including PS.get()

Scheduler:
o enables model-parallelism: lets users control which model parameters are updated by which worker machines
o a scheduling function schedule() outputs a set of parameters for each worker

Workers:
o each worker receives the parameters to be updated from schedule(), and then runs parallel update functions push()
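To make the division of labor concrete, here is a toy sketch of the scheduler/worker/parameter-server loop (assumed class and function names; only schedule(), push(), PS.get() and PS.inc() mirror names from the slides):

import numpy as np

class ToyPS:
    # stand-in for the parameter server: global read/write parameter access
    def __init__(self, n):
        self.A = np.zeros(n)
    def get(self, idx):            # like PS.get()
        return self.A[idx]
    def inc(self, idx, delta):     # like PS.inc(): additive write
        self.A[idx] += delta

def schedule(ps, num_workers):
    # scheduler: decide which parameters each worker may update
    return np.array_split(np.arange(len(ps.A)), num_workers)

def push(ps, idx, target):
    # worker: compute an update for its assigned parameters and send it
    ps.inc(idx, 0.5 * (target[idx] - ps.get(idx)))

ps, target = ToyPS(8), np.arange(8.0)
for t in range(20):                # iterate-until-convergence loop
    for idx in schedule(ps, num_workers=2):
        push(ps, idx, target)
print(ps.A)                        # approaches [0, 1, ..., 7]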
The three major components:

Bösen:
o parameter server for data-parallel Big Data AI & ML
o uses a consistency model which allows asynchronous-like performance that outperforms bulk synchronous execution, yet does not sacrifice ML algorithm correctness

STRADS:
o scheduler for model-parallel, high-speed AI & ML
o performs fine-grained scheduling of ML update operations, prioritizing computation on the parts of the ML program that need it most, and avoiding unsafe parallel operations that could hurt performance

Poseidon:
o distributed GPU deep learning framework built on Caffe and Bösen
o enjoys speedups from the communication and bandwidth management features of Bösen
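The consistency idea behind Bösen can be sketched as a bounded-staleness condition (a conceptual illustration, not Bösen's implementation): a fast worker may compute on slightly stale parameters, but must block once it gets too far ahead of the slowest worker.

def must_wait(worker_clock, slowest_clock, staleness=3):
    # SSP-style rule: block if this worker is more than `staleness`
    # clock ticks ahead of the slowest worker in the cluster.
    return worker_clock - slowest_clock > staleness

# With staleness 3, a worker at clock 10 blocks while the slowest
# worker is still at clock 6, and proceeds once it reaches clock 7.
print(must_wait(10, 6))   # True
print(must_wait(10, 7))   # False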
YARN Compatibility

Problem:
o many industrial and academic clusters run Hadoop MapReduce
o however, programs written for stand-alone clusters are not compatible with YARN/HDFS, and vice versa: applications written for YARN/HDFS are not compatible with stand-alone clusters

Solution: provide common libraries that work on either Hadoop or non-Hadoop clusters:
o a YARN launcher that will deploy any Petuum application (including user-written ones) onto a Hadoop cluster
o a data access library for HDFS access, which allows users to write stream code that works on both HDFS files and the local filesystem (see the sketch after this list)
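A hedged sketch of the idea behind such a data access library (open_stream() is a hypothetical helper, not Petuum's actual API): the consumer streams lines identically whether the path is local or an hdfs:// URI.

import os, tempfile

def open_stream(path):
    # dispatch on the URI scheme so consumer code is storage-agnostic
    if path.startswith("hdfs://"):
        # here one would delegate to an HDFS client library (assumption)
        raise NotImplementedError("plug in an HDFS client here")
    return open(path, "r")         # plain local filesystem

# The consuming loop never needs to know where the data lives.
tmp = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False)
tmp.write("a 1\nb 2\n"); tmp.close()
with open_stream(tmp.name) as f:
    for line in f:
        print(line.split())
os.remove(tmp.name)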
Performance

Fig. 6: Petuum relative speedup vs. popular platforms (larger is better). Across ML programs, Petuum is at least 2-10 times faster than popular implementations.

Fig. 7: Matrix Factorization convergence time, Petuum vs. Spark. Petuum is the fastest and most memory-efficient platform, and could handle Big MF models at high rank under the given hardware budget.
Example

Fig. 8: Petuum program structure.
Fig. 9: Petuum DML (Distance Metric Learning) data-parallel pseudocode.
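Since Fig. 9 itself is not reproduced in this transcript, here is a hedged sketch of what data-parallel DML pseudocode looks like in spirit (illustrative names, not the paper's code): each worker computes a gradient of the Mahalanobis metric on its local shard of similar/dissimilar pairs, and the resulting updates are additive, so the parameter server can aggregate them by summation.

import numpy as np

def dml_push(L, pairs, labels, lr=0.01):
    # one worker's update on its local data shard: pull similar pairs
    # together and push dissimilar pairs apart under the metric L
    grad = np.zeros_like(L)
    for (x, y), similar in zip(pairs, labels):
        d = (x - y).reshape(-1, 1)
        grad += d @ d.T if similar else -(d @ d.T)
    return -lr * grad              # additive update, suitable for PS.inc()

rng = np.random.default_rng(0)
L = np.eye(2)                      # the distance metric being learned
pairs = [(rng.normal(size=2), rng.normal(size=2)) for _ in range(4)]
labels = [True, False, True, False]
L += dml_push(L, pairs, labels)    # server aggregates workers' updates by summation
print(L)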
References

Eric P. Xing, Qirong Ho, Wei Dai, Jin Kyu Kim, Jinliang Wei, Seunghak Lee, Xun Zheng, Pengtao Xie, Abhimanu Kumar, Yaoliang Yu. "Petuum: A New Platform for Distributed Machine Learning on Big Data." IEEE Transactions on Big Data, June 2015.
https://github.com/petuum
http://www.simplilearn.com/how-facebook-is-using-big-data-article
Thank You!