be fast, be agile, at scale - openshift anwender...appagile data science workstation 06.12.2017 7...

17
be fast, be agile, At Scale Appagile Data Science Workstation - Analytics from the cloud Stefan Zosel May, 2018 – Openshift Anwenderforum http:// AppAgile.de @appagile

Upload: others

Post on 22-May-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: be fast, be agile, At Scale - OPENSHIFT ANWENDER...AppAgile Data Science Workstation 06.12.2017 7 Presenting Appagile‘s Data science Workstation be fast, be agile, at scale Quick

be fast, be agile, At Scale Appagile Data Science Workstation - Analytics from the cloud

Stefan Zosel May, 2018 – Openshift Anwenderforum

http://AppAgile.de @appagile

Page 2: be fast, be agile, At Scale - OPENSHIFT ANWENDER...AppAgile Data Science Workstation 06.12.2017 7 Presenting Appagile‘s Data science Workstation be fast, be agile, at scale Quick

AppAgile Data Science Workstation 2 06.12.2017

Digitization with T-Systems

Managed Cloud Services Security approved operation Openshift for Enterprises

Page 3: be fast, be agile, At Scale - OPENSHIFT ANWENDER...AppAgile Data Science Workstation 06.12.2017 7 Presenting Appagile‘s Data science Workstation be fast, be agile, at scale Quick

AppAgile Data Science Workstation 3 06.12.2017

Warum DataScience auf Openshift?

Neue UseCases

Neue Methoden

BIG Data trifft PaaS

Innovationen aus der Cloud

AI/ML

Viele neue SW & Services Self Service Bedarf

BIG Data trifft Cloud

Page 4: be fast, be agile, At Scale - OPENSHIFT ANWENDER...AppAgile Data Science Workstation 06.12.2017 7 Presenting Appagile‘s Data science Workstation be fast, be agile, at scale Quick

AppAgile Data Science Workstation 4 06.12.2017

IT

Data Science - DevOps From data to product

Acquire

Data Engineering

Data Munging

Data Modeling

Data Visualizin

g

Data Science

Data Cleansing

Dashboards

Production Pipeline

Online Service

Production

Online Scoring

§  More Processes §  Higher Complexity §  IT departments stumble to serve

DevOps Idea Production

Process

ETL

Ideation MVP

CI/CD

Page 5: be fast, be agile, At Scale - OPENSHIFT ANWENDER...AppAgile Data Science Workstation 06.12.2017 7 Presenting Appagile‘s Data science Workstation be fast, be agile, at scale Quick

AppAgile Data Science Workstation 5 06.12.2017

Data Science DevOps – not Admin! The pain points

§  Language and version conflicts §  Sharing results takes extra effort

§  Provision of environments takes forever §  Compatibility problems

§  Hardware resources on desktops limited §  Models can only be trained and tested on

samples §  Popular frameworks are not parallelized.

Setup

Scalability

Sharing

Connection §  Provision of data access takes time §  Connection to data sources is cumbersome

Sound Similar to

Developer

Page 6: be fast, be agile, At Scale - OPENSHIFT ANWENDER...AppAgile Data Science Workstation 06.12.2017 7 Presenting Appagile‘s Data science Workstation be fast, be agile, at scale Quick

AppAgile Data Science Workstation 6 06.12.2017

Data Science DEVOPS What else Would be nice?

Sandboxing Try new ideas without destroying anything

PoC Develop your PoC agile, at fast pace in the cloud

Training Train your team in the cloud on new ML frameworks

Locality Run analyses in the cloud where your data is located

Ad-hoc Analysis Set up quickly an environment with all tools

CI/CD Work in parallel and contiuously on project modules

Dockerization in the cloud

Page 7: be fast, be agile, At Scale - OPENSHIFT ANWENDER...AppAgile Data Science Workstation 06.12.2017 7 Presenting Appagile‘s Data science Workstation be fast, be agile, at scale Quick

AppAgile Data Science Workstation 7 06.12.2017

Presenting Appagile‘s Data science Workstation be fast, be agile, at scale

§  Quick and easy self-service §  Dedicated Data Science Engineer §  Working with AppAgile‘s Small

Volume Hadoop cluster §  R, Python and Scala in multiple

versions §  Common machine learning

frameworks §  Visualization via notebooks,

markdown and R Shiny Server §  Available on OTC, vCloud, and

Microsoft Azure §  From sample to big data

Page 8: be fast, be agile, At Scale - OPENSHIFT ANWENDER...AppAgile Data Science Workstation 06.12.2017 7 Presenting Appagile‘s Data science Workstation be fast, be agile, at scale Quick

AppAgile Data Science Workstation 8 06.12.2017

HOW Fits BIG Data in Cloud?

Data Lake – One Namespace

Cloud Most flexible For GB of Data Hadoop@Docker

Bare Metal for Big TB and Performace Cloud Small TB / Flexible

Custom Combined BareMetal / Cloud PB++ Global/Distributed Namespace

SMALL Volume BIG Volume HUGE Volume

Too slow,

But easy to use, Flexible

Fixed cost / CAPEX

But optimized Price/

performance

Complex

But highly distributed,

perormant solution

Data Science Workstation

Analytic Tools From „Anywhere“

Page 9: be fast, be agile, At Scale - OPENSHIFT ANWENDER...AppAgile Data Science Workstation 06.12.2017 7 Presenting Appagile‘s Data science Workstation be fast, be agile, at scale Quick

AppAgile Data Science Workstation 9 06.12.2017

Appagile Small volume hadoop data Science Setup Example

Small Volume Hadoop PODs Data Science Workstation

§  2 Hadoop Name Node PODs §  3 Hadoop Data Nodes PODs §  MySQL POD for metadata §  Hue POD for HDFS management

Persistent Storage

§  1 Rstudio POD for R programming

§  1 Jupyter POD for Python Programming

§  1 R Shiny POD for publishing and sharing results

Infrastructure: Private vCloud / Public vCloud/ OTC / Microsoft Azure

§  Connect from DSWS via Livy Server to Spark

§  Execute Spark Jobs from RStudio, Jupyter or R Shiny Server

Page 10: be fast, be agile, At Scale - OPENSHIFT ANWENDER...AppAgile Data Science Workstation 06.12.2017 7 Presenting Appagile‘s Data science Workstation be fast, be agile, at scale Quick

Some Screens

Page 11: be fast, be agile, At Scale - OPENSHIFT ANWENDER...AppAgile Data Science Workstation 06.12.2017 7 Presenting Appagile‘s Data science Workstation be fast, be agile, at scale Quick

AppAgile Data Science Workstation 11 06.12.2017

Page 12: be fast, be agile, At Scale - OPENSHIFT ANWENDER...AppAgile Data Science Workstation 06.12.2017 7 Presenting Appagile‘s Data science Workstation be fast, be agile, at scale Quick

AppAgile Data Science Workstation 12 06.12.2017

Page 13: be fast, be agile, At Scale - OPENSHIFT ANWENDER...AppAgile Data Science Workstation 06.12.2017 7 Presenting Appagile‘s Data science Workstation be fast, be agile, at scale Quick

AppAgile Data Science Workstation 13 06.12.2017

Page 14: be fast, be agile, At Scale - OPENSHIFT ANWENDER...AppAgile Data Science Workstation 06.12.2017 7 Presenting Appagile‘s Data science Workstation be fast, be agile, at scale Quick

AppAgile Data Science Workstation 14 06.12.2017

Page 15: be fast, be agile, At Scale - OPENSHIFT ANWENDER...AppAgile Data Science Workstation 06.12.2017 7 Presenting Appagile‘s Data science Workstation be fast, be agile, at scale Quick

AppAgile Data Science Workstation 15 06.12.2017

Page 16: be fast, be agile, At Scale - OPENSHIFT ANWENDER...AppAgile Data Science Workstation 06.12.2017 7 Presenting Appagile‘s Data science Workstation be fast, be agile, at scale Quick

AppAgile Data Science Workstation 16 06.12.2017

Community - AppAgile.IO

Page 17: be fast, be agile, At Scale - OPENSHIFT ANWENDER...AppAgile Data Science Workstation 06.12.2017 7 Presenting Appagile‘s Data science Workstation be fast, be agile, at scale Quick

AppAgile Data Science Workstation 17 06.12.2017

§  Ready-to-use workstation for DS

§  DS self-service §  Predefined data

and analytic modules

Appagile data science workstation What‘s in the Package:

§  Hadoop in dockers on OpenShift §  Horizontally scalable on demand §  Free assignment of capacities per node §  On public and private cloud §  Compatible to big data solution

Middleware - Small Volume Hadoop

Analytics Services – Data Science Workstation Data Science

Engineer

§  For named users

§  Dedicated to DS support

§  Building of DS libraries

§  In charge of DS community

§  DS Q&A (Best Practise)

Container Repository

§  For named user

Data Science Workstation:

Small Volume: §  Unmanaged -

Free

Dedicated DS

Engineer Data Science Community

Container Images