3 ways to move your data science projects to production: secure and scalable data science deployment...


Uploaded by anaconda on 22-Jan-2018


Page 1: 3 Ways to Move Your Data Science Projects to Production: Secure and Scalable Data Science Deployment with Anaconda

© 2016 Continuum Analytics - Confidential & Proprietary© 2017 Continuum Analytics - Confidential & Proprietary

Three Ways to Move Your Data Science Projects to Production
Secure and Scalable Data Science Deployment with Anaconda
Christine Doig and Kris Overholt

May 24, 2017

Page 2

Christine Doig, Senior Data Scientist

• Worked on MEMEX, a DARPA-funded project helping stop human trafficking
• Co-author of the recently published O’Reilly book Breaking Data Science Open
• 5+ years of experience in analytics, operations research, and machine learning
• MS in Industrial Engineering, Polytechnic University of Catalonia, Barcelona

Page 3

Kris Overholt, Product Manager

• Developing the cluster management features of Anaconda
• 10+ years of experience in scientific computing, systems administration, computational modeling, and more
• Ph.D. in Civil Engineering, University of Texas
• Master’s degree, Worcester Polytechnic Institute, focus on computational fluid dynamics

Page 4

Agenda

• Overview of Anaconda
• End-to-End Collaborative Data Science Workflows
• Data Science Development and Deployment
  • Anaconda + Docker
  • Anaconda Project
  • Anaconda Enterprise
• Examples of Data Science Deployment
• Getting Started with Anaconda Enterprise Deployment

Page 5

Overview of Anaconda

Page 6

Anaconda, the leading Data Science ecosystem with over 4M users

Page 7

ANACONDA DISTRIBUTION

A Python & R distribution with 1000+ curated packages that make it easy to get started with data science

[Diagram: packages such as Numba, dask, xlwings, Airflow, and Blaze, spanning Distributed Systems, Business Intelligence, Web, Scientific Computing / HPC, and Machine Learning / Statistics]

Page 8

https://www.continuum.io/downloads

Page 9

What’s in ANACONDA DISTRIBUTION?

Page 10

• Install data science libraries

$ conda install pandas

• Manage package versions

$ conda install pandas=0.14

• Create isolated environments

$ conda create -n myenv python=3.5 pandas=0.18

• Update a package

$ conda update pandas

Page 11

Page 12

anaconda-project.yml

• Define and manage:
  • project package dependencies
  • deployment commands
  • data
  • …
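As a rough illustration of the bullets above, here is what a minimal anaconda-project.yml might look like. It declares packages, named commands, and a data download. The project name, pins, file names, command names, and URL are hypothetical. The exact set of supported keys depends on your Anaconda Project version, so treat this as a sketch and check the Anaconda Project documentation.

```yaml
# anaconda-project.yml (illustrative sketch; names and URL are hypothetical)
name: churn-analysis

# Conda packages the project needs; pins keep deployments reproducible
packages:
  - python=3.5
  - pandas=0.18
  - bokeh
  - notebook
channels: []

# Named commands that start the project locally or in deployment
commands:
  default:
    unix: python score.py
  dashboard:
    bokeh_app: dashboard
  explore:
    notebook: analysis.ipynb

# Data fetched before the first run; its local path is exposed as CUSTOMERS_CSV
downloads:
  CUSTOMERS_CSV:
    url: https://example.com/customers.csv
    filename: customers.csv
```

With a file like this in the project directory, `anaconda-project run dashboard` should prepare the environment and downloads, then start the Bokeh app. Running `anaconda-project run` with no argument falls back to the `default` command.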

Page 13

• Launch applications
• Manage package versions and environments
• Create and upload projects

Page 14

End-to-end Collaborative Data Science Workflows

Page 15

End-to-end Collaborative Data Science Workflows

• Explore, Analyze & Collaborate
• Scale, Deploy & Operate

Page 16

Explore, Analyze & Collaborate

[Diagram: roles shown are Business Analysts and Data Scientists]

Page 17

Scale, Deploy & Operate

[Diagram: roles shown are DevOps, Developers, and Data Engineers]

Page 18

Data Science Development and Deployment

Page 19

Challenges in the data science ecosystem

How do you…
• Download and install data science libraries?
• Manage versions and dependencies?
• Upgrade libraries?
• Isolate dependencies between projects?

Page 20

What do data scientists develop?

[Diagram: workflows (Data; Query; Visualize; Clean & Tidy; Predict, Simulate, & Optimize) that produce reports, presentations, interactive notebooks, interactive apps, interactive data visualizations and dashboards, Jupyter notebooks, scripts, predictive models, and processed data]

Page 21

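Laptop: Data Science Development

[Diagram: a laptop running Python and R with libraries such as scikit-learn, Bokeh, TensorFlow, Jupyter, pandas, matplotlib, seaborn, dask, and numba, alongside project files: script 1, script 2, script 3, notebook A, and dataset Z]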

Page 22

Challenges in data science development and deployment

How do you…
• Share your data science project with others?
• Ensure that you can reproduce your analysis?
• Deploy your project?

Page 23

The Path to Simple Data Science Deployment!

[Diagram labels: Anaconda Enterprise; DIY; Anaconda Project; Anaconda; Docker containers; conda env 1, conda env 2, conda env 3]

Page 24

Anaconda and Docker - Better Together

Page 25

[Diagram: Data Science Development on a laptop (conda env 1, conda env 2, conda env 3 running Analysis 1, Analysis 2, Analysis 3) and Data Science Deployment on a server (the same environments and analyses), with a Docker container]

Page 26

https://hub.docker.com/r/continuumio/anaconda/

Page 27

Anaconda and Docker

• Dependencies
• Data
• Deployment commands
• Security
• Scalability
• Availability

Page 28

Portable Data Science with Anaconda Project - More than just Dockerfiles

Page 29

[Diagram: Data Science Development on a laptop (Project 1, Project 2, Project 3) and Data Science Deployment on a server (Project 1, Project 2, Project 3)]

Page 30

[Diagram: Data Science Development on a laptop (Project 1, Project 2, Project 3) and Data Science Deployment on a server (Project 1, Project 2, Project 3), with a Docker container]

Page 31

Anaconda Project

• Dependencies
• Data
• Deployment commands
• Security
• Scalability
• Availability

Page 32

One-click Data Science Deployments with Anaconda Enterprise

Page 33

[Diagram: Data Science Development on a laptop (Project 1, Project 2, Project 3) and Data Science Development and Deployment on Anaconda Enterprise (Project 1, Project 2, Project 3, running in Container 1 through Container 4)]

Page 34

Anaconda Enterprise

• Dependencies
• Data
• Deployment commands
• Security
• Scalability
• Availability

Page 35

Page 36

Anaconda Enterprise Features - Data Science Deployment

• One-click deployment of:
  • Self-service data science notebooks (Python and R)
  • Interactive visualizations and dashboards (Bokeh, Shiny, etc.)
  • Machine learning models with REST APIs
• Secure deployments to a cluster with end-to-end SSL
• API wrapper for easily exposing model inputs and outputs
• Ability to securely share apps with other users, groups, and roles (LDAP, AD, SAML, Kerberos)

Page 37

Anaconda Enterprise Features - Data Science Deployment

• Ability to deploy apps and APIs that can be used or consumed via a token
• Ability to configure CPU/memory limits for deployed apps in the system-wide configuration
• Ability to fetch logs for each app, with error handling, health checks, and automatic app restarts
• Deployments can be backed by remote storage, databases, or Hadoop/Spark
• The cluster can be configured for high availability
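The minimal sketch below shows the client side of the token-based access in the first bullet. It is written in Python and uses only the standard library. The endpoint URL, the `Bearer` authorization scheme, and the `{"features": [...]}` payload shape are assumptions for illustration, not Anaconda Enterprise's documented API. Check how your deployment expects tokens and inputs to be sent.

```python
import json
import urllib.request


def build_scoring_request(url, token, features):
    """Build a POST request carrying features as JSON and a token for auth.

    The Bearer scheme and payload shape are illustrative assumptions.
    """
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def call_deployment(url, token, features, timeout=10.0):
    """Send the request and decode the JSON reply (urlopen raises HTTPError on 4xx/5xx)."""
    request = build_scoring_request(url, token, features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical endpoint and token; only builds the request, no network call.
    req = build_scoring_request("https://apps.example.com/churn/predict", "TOKEN", [0.2, 1.5, 3.0])
    print(req.get_method(), req.full_url)
    print(req.get_header("Authorization"))
```

Separating request construction from sending makes it easy to inspect exactly what is sent before pointing `call_deployment` at a real endpoint.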

Page 38

Example Data Science Deployments

Page 39

Examples Overview

1) Self-service notebooks
2) Interactive visualizations and dashboards
3) Machine learning models with REST APIs
4) Composable data science projects
5) Machine learning models with visualization

Page 40

Example 1 - Notebooks (Python/R)

• Self-service data science notebooks, including:
  • Python
  • R
• Notebooks with live, attached kernels
• Can be used to share runnable versions of analyses
• Share running notebooks with users, groups, and roles
• Handle portability and manage dependencies with Anaconda Project

Page 41

Example 2 - Interactive Visualizations

• Deploy apps using any visualization package in Anaconda, including:
  • Bokeh
  • Shiny apps
  • Datashader
  • deck.gl
• Develop and share visualizations and dashboards
• Include data in the project or reference remote data and databases
• Deploy visualization apps powered by Hadoop and Spark

Page 42

Example 3 - Machine Learning w/ APIs

• Machine learning models and applications with REST APIs:
  • scikit-learn, Theano, Lasagne, Neon
  • TensorFlow (w/ GPU), Caffe, H2O
  • and many more!
• Support for model scoring and prediction APIs from trained models
• Compatible with web frameworks in Anaconda, including:
  • Flask, Django, Tornado, and more
• Models can be shared or consumed via API tokens
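The slide lists Flask, Django, and Tornado as serving options. To keep the sketch below dependency-free, it swaps in Python's built-in `http.server` instead. It shows the general shape of a scoring API: check an API token, validate the JSON input, call a trained model, and return a JSON prediction. The fixed linear weights, the token value, and the `/predict` path are hypothetical stand-ins. A real deployment would load a model trained with scikit-learn, TensorFlow, or similar. The platform's own API wrapper and token handling are not shown.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for a trained model: fixed linear weights.
# A real deployment would load a model trained with scikit-learn, TensorFlow, etc.
WEIGHTS = [0.4, -0.2, 0.1]
BIAS = 0.05
API_TOKEN = "change-me"  # hypothetical token; never hard-code real secrets


def predict(features):
    """Score one feature vector with the linear stand-in model."""
    if len(features) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} features, got {len(features)}")
    return BIAS + sum(w * x for w, x in zip(WEIGHTS, features))


def handle_predict(auth_header, body):
    """Check the token, parse the JSON body, and score it; returns (status, result)."""
    if auth_header != f"Bearer {API_TOKEN}":
        return 401, {"error": "invalid or missing API token"}
    try:
        payload = json.loads(body)
        features = [float(x) for x in payload["features"]]
        return 200, {"score": predict(features)}
    except (ValueError, KeyError, TypeError) as exc:
        return 400, {"error": f"bad request: {exc}"}


class PredictHandler(BaseHTTPRequestHandler):
    """Route POST /predict to handle_predict and reply with JSON."""

    def do_POST(self):
        if self.path != "/predict":
            status, result = 404, {"error": "not found"}
        else:
            length = int(self.headers.get("Content-Length", 0))
            status, result = handle_predict(
                self.headers.get("Authorization"), self.rfile.read(length)
            )
        data = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


if __name__ == "__main__":
    # Serve on localhost:8080 until interrupted (Ctrl+C).
    HTTPServer(("127.0.0.1", 8080), PredictHandler).serve_forever()
```

Keeping `handle_predict` separate from the HTTP handler makes the scoring logic testable without a running server. The same function could also sit behind a Flask or Tornado route unchanged.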

Page 43

Example 4 - Composable Deployments

• Deploy composable applications across your data science team
• Example end-to-end workflow with custom endpoints and API tokens:
  • Stage 1 - Data cleansing
  • Stage 2 - Anomaly detection
  • Stage 3 - Model scoring
  • Stage 4 - Interactive applications and dashboards
  • Stage 5 - Reports and file exports
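The staged workflow above can be sketched in-process as a chain of Python functions, where each stage consumes the previous stage's output. In the deck's version, each stage would be its own deployment reached through a custom endpoint and API token. Here the stages are plain functions, and Stage 4 (interactive apps) is omitted. The z-score anomaly rule and the rule-based risk score are hypothetical stand-ins for real models.

```python
import csv
import io
import statistics
from functools import reduce


def cleanse(records):
    """Stage 1: keep records whose 'amount' parses as a number."""
    clean = []
    for rec in records:
        try:
            clean.append({**rec, "amount": float(rec["amount"])})
        except (KeyError, TypeError, ValueError):
            continue  # drop missing or malformed values
    return clean


def detect_anomalies(records, z_threshold=2.0):
    """Stage 2: flag amounts more than z_threshold population std devs from the mean."""
    if not records:
        return []
    amounts = [r["amount"] for r in records]
    mean, std = statistics.mean(amounts), statistics.pstdev(amounts)
    return [
        {**r, "anomaly": std > 0 and abs(r["amount"] - mean) / std > z_threshold}
        for r in records
    ]


def score(records):
    """Stage 3: hypothetical rule-based risk score standing in for a trained model."""
    return [{**r, "risk": 0.9 if r["anomaly"] else 0.1} for r in records]


def export_csv(records):
    """Stage 5: render the scored records as CSV text for a report or file export."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["id", "amount", "anomaly", "risk"], extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def run_pipeline(data, stages):
    """Feed the output of each stage into the next."""
    return reduce(lambda acc, stage: stage(acc), stages, data)


if __name__ == "__main__":
    amounts = ["10", "12", "11", "9", "bad", "10", "13", "11", "12", "95"]
    raw = [{"id": i, "amount": a} for i, a in enumerate(amounts)]
    print(run_pipeline(raw, [cleanse, detect_anomalies, score, export_csv]))
```

Each stage takes and returns plain records. As a result, any one stage could be replaced by a call to a deployed endpoint without changing the others.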

Page 44

Example 5 - ML Models with Visualization

• Can be built on top of machine learning libraries in Anaconda, including:
  • TensorFlow, H2O, and many more
• Easily develop interactive applications and dashboards with existing frameworks
• Handle inputs and outputs to machine learning models
  • Including complex visualization toolkits such as TensorBoard

Page 45

Getting Started with Anaconda Enterprise for Data Science Deployments and More

Page 46

Anaconda Platform

• Anaconda Distribution: the most trusted Python distribution for data science
• Anaconda Support: deploy Anaconda with confidence, with world-class support for open-source production environments
• Anaconda Enterprise: an enterprise-ready data science platform for end-to-end workflows, including governance, collaboration, and deployment

Page 47

Anaconda Enterprise Features

• Empower data scientists to easily deploy secure and scalable data science projects to production
• World-class support for open-source production environments
• Securely govern and version control data science artifacts (projects, packages, installers) from development to production
• Secure and scalable data science project collaboration
• Manage Anaconda across a cluster and run data science projects backed by scalable enterprise compute and data sources
• Bring the power of data science to Business Analysts

Page 48

Next Steps

• Get started with the Anaconda Enterprise Innovator program:
  • https://go.continuum.io/anaconda-enterprise-innovator/
• Contact us at:
  • [email protected]
  • https://www.continuum.io/contact-us

Page 49

Questions?

Christine Doig @ch_doig

Kristopher Overholt @koverholt

@ContinuumIO