microservices & teraflops: effortlessly scaling data science with pywren | anacondacon 2017


Upload: continuum-analytics

Posted on 19-Mar-2017





TRANSCRIPT

Page 1: Microservices & Teraflops: Effortlessly Scaling Data Science with PyWren | AnacondaCON 2017

MICROSERVICES & TERAFLOPS

Effortlessly scaling data science #thecloudistoodamnhard

Eric Jonas, Postdoctoral Researcher | @stochastician


A BIG FAN OF ANACONDA


“BIG” DATA    (nearby) stars   neurons      nuclei
size          10^9 m           10^-5 m      10^-14 m
number        1                10^11        10^26
data size     2 PB             12 TB/sec    ??/sec


images courtesy NASA SOHO

Sun in UV (304 Å)

you are here


Solar Flare Prediction Using Photospheric and Coronal Image Data. Jonas, Bobra, Shankar, Recht. American Geophysical Union, 2016


NEUROSCIENCE AT ALL SCALES


Could a Neuroscientist understand a microprocessor? Jonas, Kording. PLOS Computational Biology, 2017


AND I WANT MORE!


Superresolution

Phase contrast

Tomography

Adaptive Optics


How do you get busy physicists and electrical engineers to give up MATLAB?

How do we get busy astronomers to give up IDL?


Why is there no “cloud button”?

PREVIOUSLY, ON


The cloud is too damn hard!

Jimmy McMillan, Founder and Chairman, The Rent Is Too Damn High Party

Less than half of the graduate students in our group have ever written a Spark or Hadoop job.


“I hate computers” –Eric Jonas, 2017


#THECLOUDISTOODAMNHARD

• What type? What instance? What base image?

• How many to spin up? What price? Spot?

• Wait, wait, WAIT... oh god

• Now what? DEVOPS


WHAT DO WE WANT?

1. Very little overhead for setup once someone has an AWS account. In particular, no persistent overhead: you don't have to keep a large (expensive) cluster up, and you don't have to wait 10+ min for a cluster to come up.


WHAT DO WE WANT?

2. As close to zero overhead for users as possible. In particular, anyone who can write Python should be able to invoke it through a reasonable interface. It should support all legacy code.


WHAT DO WE WANT?

3. Target jobs that run in the minutes-or-more regime.


WHAT DO WE WANT?

4. I don't want to run a service. That is, I personally don't want to offer the front-end for other people to use; rather, I want to pay AWS directly.


WHAT DO WE WANT?

5. It has to be from a cloud player that's likely to give out an academic grant: AWS, Google, MS Azure. There are startups in this space that might build cool technology, but they often don't want to be paid in AWS research credits.


WHAT WE WANT

1. Very little overhead for setup once someone has an AWS account. In particular, no persistent overhead: you don't have to keep a large (expensive) cluster up, and you don't have to wait 10+ min for a cluster to come up.

2. As close to zero overhead for users as possible. In particular, anyone who can write Python should be able to invoke it through a reasonable interface.

3. Target jobs that run in the minutes-or-more regime.

4. I don't want to run a service. That is, I personally don't want to offer the front-end for other people to use; rather, I want to pay AWS directly.

5. It has to be from a cloud player that's likely to give out an academic grant: AWS, Google, Azure. There are startups in this space that might build cool technology, but they often don't want to be paid in AWS research credits.


Powered by Continuum Analytics



“I hate computers” –Eric Jonas, 2017

servers


AWS LAMBDA

• 300 seconds, single core (AVX2)

• 512 MB in /tmp

• 1.5 GB RAM

• Python, Java, Node
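One practical consequence of the 300-second limit is that work has to be sized per invocation. The sketch below is a hypothetical helper, not part of PyWren: it turns a per-item time estimate into items per invocation and a number of invocations. The 0.8 safety factor and the 0.5 s-per-image figure in the usage line are illustrative assumptions, not measurements.

```python
import math

# Per-invocation time limit listed on the AWS LAMBDA slide (2017).
LAMBDA_TIME_LIMIT_S = 300


def plan_invocations(n_items, seconds_per_item, time_limit_s=LAMBDA_TIME_LIMIT_S, safety=0.8):
    """Return (items_per_invocation, n_invocations) so each call stays under the limit.

    `safety` (an assumed value) leaves headroom for runtime download, S3 I/O and timing noise.
    """
    budget_s = time_limit_s * safety
    items_per_call = math.floor(budget_s / seconds_per_item)
    if items_per_call < 1:
        raise ValueError("a single item does not fit in one invocation")
    return items_per_call, math.ceil(n_items / items_per_call)


# 1.4M images at a hypothetical 0.5 s each.
print(plan_invocations(1_400_000, 0.5))  # (480, 2917)
```

Keeping the estimate conservative matters because an invocation that hits the limit is killed and its partial work is lost.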


THE API


LAMBDA SCALABILITY

(Figure panels: Compute, Data)


YOU CAN DO A LOT OF WORK WITH MAP!

ETL, parameter tuning
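Parameter tuning is the canonical map workload: every setting is evaluated independently, so the whole sweep is a single map. This sketch uses the standard library's `concurrent.futures` as a local stand-in for the `runner.map(fn, data)` call shown under HOW IT WORKS; with PyWren, each grid point would become its own Lambda invocation. The objective `validation_error` is a toy function assumed for illustration.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def validation_error(params):
    """Hypothetical objective standing in for "train with these settings, return the error"."""
    learning_rate, reg = params
    return (learning_rate - 0.1) ** 2 + (reg - 0.01) ** 2


# Every combination of the two hyperparameters becomes one independent task.
grid = list(itertools.product([0.01, 0.1, 1.0], [0.0, 0.01, 0.1]))

with ThreadPoolExecutor(max_workers=8) as executor:
    # executor.map yields results in the same order as `grid`.
    errors = list(executor.map(validation_error, grid))

best_error, best_params = min(zip(errors, grid))
print(best_params)  # (0.1, 0.01)
```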


IMAGENET EXAMPLE

Preprocess 1.4M images from ImageNet

Compute GIST image descriptor (some random Python code off the internet)
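A job of this size is usually batched so that each invocation handles many images and per-invocation startup is amortized. The hypothetical sketch below again uses `concurrent.futures` as a local stand-in for the remote map; the S3 key layout is invented, and `featurize` is a placeholder, not a GIST implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batched(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def featurize(key):
    """Placeholder for a real descriptor such as GIST; returns a toy number."""
    return len(key)


def process_batch(keys):
    # In a real job: download each image from S3, compute its descriptor, return all of them.
    return [featurize(k) for k in keys]


keys = [f"imagenet/train/{i:07d}.JPEG" for i in range(10)]  # hypothetical key layout
with ThreadPoolExecutor() as executor:
    per_batch = list(executor.map(process_batch, batched(keys, 4)))

# Flatten back to one descriptor per image, in the original key order.
features = [f for batch in per_batch for f in batch]
```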


HOW IT WORKS

Your laptop: future = runner.map(fn, data)

• Serialize func and data

• Put on S3

• Invoke Lambda

The cloud (each Lambda invocation):

• Pull job from S3

• Download Anaconda runtime

• Run code with Python

• Pickle result

• Stick in S3

Your laptop: future.result()

• Poll S3

• Unpickle and return result


A BRIEF HISTORY OF SHARING

(Figure: overhead vs. isolation for Processes (1960s, MULTICS), Virtual Machines (1990s, VMware, Xen), Renting/VPS (1990s, SGE), HW VMs (2000s, Intel VT-X) and Containers (2008, chroot/LXC); annotated “(mostly wrong)”)

• Process isolation

• Network isolation

• Filesystem isolation

• Memory / CPU constraints


(Leptotyphlops carlae)

(Figure: runtime size across the shrinking steps Start, Delete non-AVX2 MKL, Strip shared libs, conda clean, Eliminate pkg, Delete pyc; values shown: 977 MB, 1205 MB, 441 MB, 946 MB, 670 MB, 510 MB)

Want our runtime to include
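Two of the shrinking steps, measuring the runtime's size and deleting compiled bytecode, are easy to script. This is a hypothetical sketch assuming the unpacked runtime sits in an ordinary directory; it is not the script used to build the PyWren runtime, and the other steps (stripping shared libraries, dropping non-AVX2 MKL libraries, `conda clean`) are not shown.

```python
import os
import shutil


def dir_size_bytes(root):
    """Total size of regular files under `root` (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def delete_pyc(root):
    """Delete compiled bytecode: loose *.pyc files and whole __pycache__ directories.

    Returns how many files/directories were removed.
    """
    removed = 0
    for dirpath, dirnames, filenames in os.walk(root):
        if "__pycache__" in dirnames:
            shutil.rmtree(os.path.join(dirpath, "__pycache__"))
            dirnames.remove("__pycache__")  # don't descend into the deleted directory
            removed += 1
        for name in filenames:
            if name.endswith(".pyc"):
                os.remove(os.path.join(dirpath, name))
                removed += 1
    return removed
```

Calling `dir_size_bytes` before and after `delete_pyc` on the runtime directory shows what the step saves. The trade-off is a little CPU at import time, since Python recompiles any module whose cached `.pyc` is missing.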


MAP IS NOT ENOUGH? A lot of data analytics looks like:

Data → ETL / preprocessing → featurization → machine learning

(Figure annotations: “Distributed! Scale!”, “TensorFlow”, “Deep”, “MLBase”, “Great PyWren Fit”)


“You can have a second computer when you’ve shown you know how to use the first one.”

–Paul Barnum, quoted in McSherry, 2015


Scalability! But at what COST? Frank McSherry, Michael Isard, Derek G. Murray. USENIX Hot Topics In Operating Systems, 2015


SINGLE-MACHINE REDUCE

But I don’t have a big server!

futures = exec.map(function, data)
answer = exec.reduce(reduce_func, futures)
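The map-then-reduce shape can be tried locally before renting anything. In this sketch `concurrent.futures` stands in for the remote map, and `local_reduce` is a hypothetical helper that pulls every mapped result onto one machine and folds them pairwise. The slide does not say whether PyWren's own `reduce` folds pairwise or hands the function the whole list, so treat that as an assumption.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_words(text):
    """Map step: word counts for one chunk of text."""
    return Counter(text.split())


def merge_counts(a, b):
    """Reduce step: combine two partial counts."""
    return a + b


def local_reduce(reduce_func, futures):
    """Gather every mapped result onto this machine, then fold them pairwise."""
    return reduce(reduce_func, (f.result() for f in futures))


docs = ["a b a", "b c", "a c c"]
with ThreadPoolExecutor() as executor:
    futures = [executor.submit(count_words, doc) for doc in docs]
    answer = local_reduce(merge_counts, futures)

print(dict(answer))  # {'a': 3, 'b': 2, 'c': 3}
```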

Instance       Cores          RAM      Cost
x1.32xlarge    64             2 TB     $14/hr
x1.16xlarge    32             1 TB     $7/hr
p2.16xlarge    32 + 16 GPUs   750 GB   $14/hr
r4.16xlarge    32             500 GB   $4/hr
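The table makes the single-machine reduce a small arithmetic problem: pick the cheapest instance whose RAM holds the reduce, and pay only for the hours it runs. A hypothetical helper using the slide's rounded 2017 prices (treating 1 TB as 1000 GB) could look like this; current AWS prices will differ.

```python
# (name, cores, RAM in GB, USD per hour) from the instance table; 2017 prices, rounded.
INSTANCES = [
    ("x1.32xlarge", 64, 2000, 14.0),
    ("x1.16xlarge", 32, 1000, 7.0),
    ("p2.16xlarge", 32, 750, 14.0),
    ("r4.16xlarge", 32, 500, 4.0),
]


def cheapest_for(ram_gb_needed, hours):
    """Return (instance name, total cost in USD) for the cheapest instance with enough RAM."""
    candidates = [inst for inst in INSTANCES if inst[2] >= ram_gb_needed]
    if not candidates:
        raise ValueError(f"no listed instance has {ram_gb_needed} GB of RAM")
    name, _cores, _ram_gb, usd_per_hour = min(candidates, key=lambda inst: inst[3])
    return name, usd_per_hour * hours


print(cheapest_for(800, 0.5))  # ('x1.16xlarge', 3.5)
```

Half an hour on a 1 TB machine for a few dollars is the point of the slide: you don't need to own a big server to do a big reduce.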


STUPID LAMBDA TRICKS

Shivaram told me today he has this up to 6M transactions/sec (!)


BUT I CAN’T USE THE CLOUD!


PYWREN MAKES SCALE A BIT EASIER

• Do you have a Python function?

• Do you want to scale it?

• Try it out!

• Map: today

• BigReduce: 1.0 in a week

• Parameter server: experimental


THANKS! https://github.com/ericmjonas/pywren

Shivaram Venkataraman

Ben Recht

Ion Stoica


EXTRA SLIDES


BEHIND THE HOOD


UNDERSTANDING HOST ALLOCATION


SO WHEN IS THIS USEFUL?

• Parameter searching

• Last-minute NIPS experiments

• Expensive forward models

(Figure: a workflow alternating between “massively parallel compute” and “serial / local” phases)


GETTING AROUND THE LIMITATIONS

• Runtime [anaconda]

• Job lifetime [generators]

• Synchronization (memcache/redis?)

• inter-lambda IPC
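The slide only says “[generators]” for the job-lifetime limit. One plausible reading, sketched here as an assumption, is to write a long job as a Python generator that periodically yields its complete state, so a job cut off by the 300 s limit can be resumed from its last checkpoint by a fresh invocation. All names are hypothetical; in a real deployment the checkpoint would be pickled to S3 between invocations rather than passed in memory, and `budget_s` would sit safely below the Lambda limit.

```python
import time


def summing_job(n, state=None):
    """A long job written as a generator: sums i*i for i < n, yielding checkpoints.

    Each yielded value is the job's complete state, (next_i, running_total),
    so a fresh invocation can resume from it.
    """
    i, total = state if state is not None else (0, 0)
    while i < n:
        total += i * i
        i += 1
        if i % 1000 == 0:
            yield (i, total)  # periodic checkpoint
    yield (i, total)  # final state


def run_one_invocation(n, state, budget_s):
    """Stand-in for one time-limited invocation: advance until the budget runs out."""
    deadline = time.monotonic() + budget_s
    checkpoint = state
    for checkpoint in summing_job(n, state):
        if time.monotonic() >= deadline:
            break  # out of time: hand the latest checkpoint back to the caller
    return checkpoint


def run_to_completion(n, budget_s):
    """Keep re-invoking from the last checkpoint until the job reports it is done."""
    state, invocations = None, 0
    while state is None or state[0] < n:
        state = run_one_invocation(n, state, budget_s)
        invocations += 1
    return state[1], invocations


total, calls = run_to_completion(10_500, budget_s=0.0)
print(calls)  # 11: one per 1000-step checkpoint, plus the final partial stretch
```

Each invocation is guaranteed to reach at least one checkpoint, so even a zero budget makes progress; the checkpoint interval trades restart overhead against work lost when the limit hits.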


WORKER REUSE


COORDINATION?