RosettaHUB The next generation data science platform LondonR meetup 5th April 2016

Page 1: RosettaHUB The next generation data science platform...The next generation data science platform LondonR meetup 5th April 2016 ... Traceable and reproducible data science RosettaHUB

RosettaHUB

The next generation data science platform

LondonR meetup

5th April 2016

Page 2

A universal open platform for data science

Computational Components: R packages, wrapped C, C++, Fortran code, Python modules, Matlab toolkits…

Open source or commercial

Computational Resources: clusters, grids, private or public clouds

Free or pay-per-use

Computational GUIs: HTML5 and Desktop Workbench

Built-in views / Plugins / Collaborative views

Open source or commercial

Computational Scripts: R / Python / Matlab / Groovy

Computational APIs: Java / SOAP / REST, stateless and stateful

Computational Storage: local, NFS, FTP, Amazon S3, EBS, HDFS

Generated Computational Web Services: stateful or stateless, mapping of R objects/functions

RosettaHUB
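
The distinction between stateless and stateful services listed above can be illustrated with a minimal Python sketch. It is not RosettaHUB code: `accumulate`, `call_stateless` and `StatefulSession` are hypothetical names, and a plain dictionary stands in for an engine's workspace. A stateless call starts from an empty workspace each time, while a stateful session keeps its workspace between calls.

```python
from typing import Any, Callable, Dict

EngineFunction = Callable[[Dict[str, Any], float], float]


def accumulate(env: Dict[str, Any], value: float) -> float:
    """Hypothetical engine function: add value to a running total in env."""
    env["total"] = env.get("total", 0.0) + value
    return env["total"]


def call_stateless(func: EngineFunction, value: float) -> float:
    """Stateless call: every request gets a fresh, empty workspace."""
    return func({}, value)


class StatefulSession:
    """Stateful session: one workspace is reused across requests."""

    def __init__(self) -> None:
        self.env: Dict[str, Any] = {}

    def call(self, func: EngineFunction, value: float) -> float:
        return func(self.env, value)


inputs = (1.0, 2.0, 3.0)
stateless_results = [call_stateless(accumulate, v) for v in inputs]

session = StatefulSession()
stateful_results = [session.call(accumulate, v) for v in inputs]
```

With the same three inputs, the stateless calls return each input unchanged, whereas the stateful session returns the running total.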

Page 3

Infrastructure federation: RosettaHUB cloud

Public Clouds

Private Cloud

rosettahub.com

Page 4

AWS: programmable infrastructure

Command Line

Web Console

SDK

API

Page 5

RosettaHUB Command Line

RosettaHUB Web Console

RosettaHUB SDKs

RosettaHUB API

RosettaHUB: programming with data and infrastructure

Page 6

Google Docs-like real-time collaboration

Page 7

Traceable and reproducible data science

RosettaHUB Amazon Machine Images

Machine Image A: R 3.1, Bioconductor 3.0

Machine Image B: R 3.2, Bioconductor 3.1

Machine Image C: R 3.2, Bioconductor 3.2

Machine Instance 1: Based on Image A

Machine Instance 2: Based on Image A

Researcher

Reviewer

Amazon Elastic Block Stores

EBS 1: Data Set D1

EBS 2: Data Set D2 (appears three times in the diagram)

EBS 3: Data Set D3

EBS 4: Data Set D4
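
One way to read the diagram above is that an analysis is pinned to a machine image (fixed R and Bioconductor versions) and to the data volumes attached to the instance. The following Python sketch models that reading; it is an illustration only, not the RosettaHUB API. The class names, `same_environment`, and "Machine Instance 3" are hypothetical, and attaching EBS 2 to both instances is an assumption about the diagram.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MachineImage:
    name: str
    r_version: str
    bioconductor_version: str


@dataclass(frozen=True)
class DataVolume:
    name: str
    dataset: str


@dataclass(frozen=True)
class MachineInstance:
    name: str
    image: MachineImage
    volumes: Tuple[DataVolume, ...]


def same_environment(a: MachineInstance, b: MachineInstance) -> bool:
    """True when two instances share the image and the attached data sets."""
    return a.image == b.image and set(a.volumes) == set(b.volumes)


# Versions taken from the slide; the volume attachment is assumed.
image_a = MachineImage("Machine Image A", "3.1", "3.0")
image_c = MachineImage("Machine Image C", "3.2", "3.2")
ebs_2 = DataVolume("EBS 2", "D2")

researcher = MachineInstance("Machine Instance 1", image_a, (ebs_2,))
reviewer = MachineInstance("Machine Instance 2", image_a, (ebs_2,))
# Hypothetical third instance on a newer image, for contrast.
upgraded = MachineInstance("Machine Instance 3", image_c, (ebs_2,))

reviewer_matches = same_environment(researcher, reviewer)
upgraded_matches = same_environment(researcher, upgraded)
```

Under these assumptions the reviewer's instance matches the researcher's, while an instance started from Image C does not, because its R and Bioconductor versions differ.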

Page 8

A multi-language framework

Page 9

A universal data science engine

• Reactive data science microservices platform

• Based on Java/R/Python processes

• Event-driven remote objects/engines

• Fully Dockerized

• Collaborative spreadsheets

• Collaborative scientific graphics canvas

• Collaborative dashboards

• Collaborative widgets

Page 10

The platform architecture

Docker Swarm

Linux cluster

RosettaHUB Cloud Broker

Data Science Portal

Platform Message Broker

Liferay MySQL Database

RosettaHUB MySQL Database

RosettaHUB Platform

System Administrator

Clouds Management Console

Azure API

Job Scheduler API

OpenNebula API

Docker/Swarm API

Data Science Workbench

Views

Science Gateways Factory

eLearning Apps

Social Apps Marketplace

Real-time Collaboration Apps

Liferay API

RosettaHUB Public API

VM

VM

VM

GCE API

OpenStack API

AWS API

Researcher Teacher Student

Page 11

Inside the containers and engines

Rosetta Engine

Python

Wolfram Language

Julia

SQL: Embedded Derby DB

Graph DB: Embedded OrientDB

JDBC SQL: MySQL, PostgreSQL, Redshift, ...

Scala/Spark

Java Platform Languages: Groovy, Jython, ...

JNI JNI JNI

Jupyter Server

RStudio

Shiny

R

JNI

VNC Server

JSON / NoSQL cloud database server (equiv. Firebase)

ParaviewWeb Server

Ssh

Unified Data Bus

Java Virtual Machine

Spreadsheet Engine

Cross-language Interactive/Collaborative Widgets

Cross-language Macros

HTTP File Server

Rosetta Engine SOAP API

Rosetta Engine JSON HTTP API

Rosetta Engine Real-time Events Bus

Rosetta Gateway

FTP File Server

Security Policy Manager

Docker Container

Virtual Machine

Page 12

Learn more and register at: www.rosettahub.com

Get in touch: [email protected]

https://uk.linkedin.com/in/karimchine