rosettahub for aws data science software, real-time collaboration frameworks, social portals and...

AWS mass adoption in Higher Education and research. Pervasive cloud, data science, machine learning, big data and HPC education. RosettaHUB for AWS


Post on 20-May-2020


TRANSCRIPT

Page 1: RosettaHUB for AWS data science software, real-time collaboration frameworks, social portals and people. The RosettaHUB data science platform makes it easy for educators to compose

AWS mass adoption in Higher Education and research.

Pervasive cloud, data science, machine learning, big data and HPC education.

RosettaHUB for AWS

Page 2:

80 higher education institutions including 4 among the top 10 universities in the World

20,000 students, educators and researchers

16 Countries including the UK, Ireland, France and Germany

100% automation of onboarding, resource and consumption monitoring, and user management

Fully automated digital university model

RosettaHUB for Amazon Web Services

Page 3:

RosettaHUB Overview

Page 4:

Governance, Federation and Management for AWS

E-learning and E-research platform

EduOps and ResOps platform

Account Management
• Automated users enrollment and processing
• AWS accounts full life cycle management
• Full integration with Liferay's accounts, organizations and roles
• Full mapping of organizational hierarchy and responsibilities
• Seamless account limits management and traceability
• Integrated ticketing system

Budget Control and optimization
• Costs and resources real-time monitoring and control
• Management of users' budgets and AWS permissions
• Safeguards and cost optimization
• Seamless Spot market management
• AWS grants full life cycle management

Compliance enforcement and cloud access sandboxing
• Automated AWS account limits management
• Detailed auditing and reporting
• RosettaHUB management Web Services

Universal Workbench and notebooks at scale
• RosettaHUB collaborative Workbench
• Jupyter servers
• Unlimited RStudio servers, Shiny Apps servers
• Zeppelin servers, DaaS

Ubiquitous sharing and real-time collaboration
• Easy sharing of all RosettaHUB artifacts with users, groups or organizations
• Access to the RosettaHUB publishing and sharing platform for e-learning and e-research

Big Data, HPC and Deep Learning made easy
• RosettaHUB-managed Elastic MapReduce clusters
• RosettaHUB nvidia-docker-based virtual environments for deep learning
• RosettaHUB-managed CfnCluster and Alces Flight HPC clusters
• RosettaHUB spreadsheets for technical computing

End-to-end reproducible e-learning and e-research
• Management platform for Docker containers

Convergent IaC, containers and technical computing APIs
• Software Development Kits: Python, Java, R, C# and JS SDKs
• Office integration: Word and Excel Add-ins
• RosettaHUB meta-cloud and technical computing APIs
• Programmable hybrid-kernel (R/Python/Java/Scala)
• Reactive programming framework

Full auditability
• Detailed auditing and reporting
• End-to-end traceability

Integration capabilities
• Dedicated RosettaHUB portal
• Dedicated publishing and sharing platform
• SAML/OAuth 2.0/LDAP/Active Directory integration
• SSL certificates management platform
• Programmable emailing platform
• Advanced scheduling

Page 5:

RosettaHUB, state of the art governance and management platform for AWS

Page 6:

RosettaHUB provides every student and every educator with an account on a social collaboration portal. Each portal account is linked to a private AWS account created, managed and monitored by RosettaHUB.

The portal makes advanced AWS capabilities easy to understand and operate by students and educators. It also makes all cloud artifacts easy to share.

RosettaHUB fully automates the onboarding processes and gives institutions flexibility on budget allocation.

The building blocks of AWS democratization

Page 7:

The institution’s Central Point Of Contact (CPOC) and educators can monitor in real time (*) the students’ interaction with AWS and the portal.

The CPOC can manage students: adjust their budgets, their rights on AWS, their resources allowances, etc.

The CPOC can create sub-organizations and assign roles to colleagues for a multi-tenant management of students.

System administrators can generate reports on user activity and cloud usage. They can measure and assess how effectively cloud resources are used.

Repositories of pedagogic cloud artifacts can be prepared and shared with students.

(*) ..

End-to-end monitoring, management and audit

Page 8:

The RosettaHUB student and educator dashboards display an access button to the AWS console as well as access keys for programmatic access to AWS. They provide detailed, aggregated real-time information about the resources being used on AWS, the budget amount left and the estimated overall hourly cost.

Students and educators can request:

1. Limit increase to access higher-capacity instance types (e.g. p2.*, p3.*, g3.* GPU instances).

2. Access to optional AWS services

3. Budget increase and budget transfer to other users

4. Support

44 AWS services are accessible by default. Access to IAM is available in a proxied manner to preserve account sandboxing. IAM users and IAM roles can be easily and safely created and managed from the dashboard. (*)

Limits and budget requests are automatically processed by the RosettaHUB pipelines within a predefined scope. RosettaHUB creates and tracks tickets with AWS support.

Students and educators dashboards

Page 9:

Cost optimization and safeguards

Accounts are automatically disabled and all on-demand EC2 instances are stopped if the user goes above 100% of his/her budget or if the estimated hourly price exceeds the maximum hourly price. Spot instances are snapshotted then terminated. No data is deleted when a user is disabled.

Auto-stop on idle EC2 instances: the user can set the maximum idle time or disable this feature. By default it is set to 6 hours.

Notification emails at 50%, 70%, 90% and 100% of budget consumption.

Use of Spot instances is promoted in the RosettaHUB launch panels: Spot instances are the first choice when launching instances or clusters.
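The budget safeguards listed above (notification thresholds and automatic disabling) can be sketched in a few lines of Python. This is only an illustration of the rules as stated on the slide, not RosettaHUB's implementation: the `AccountState` class, its field names and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the slide: emails at 50%, 70%, 90% and 100% of budget.
NOTIFY_THRESHOLDS = (50, 70, 90, 100)


@dataclass
class AccountState:
    """Hypothetical snapshot of one user's consumption."""
    budget: float           # total budget allocated to the user
    spent: float            # amount consumed so far
    hourly_cost: float      # estimated current hourly cost
    max_hourly_cost: float  # maximum hourly price allowed for the user


def thresholds_crossed(previous_spent: float, state: AccountState) -> List[int]:
    """Return the notification thresholds crossed since the previous check."""
    before = 100 * previous_spent / state.budget
    now = 100 * state.spent / state.budget
    return [t for t in NOTIFY_THRESHOLDS if before < t <= now]


def must_disable(state: AccountState) -> bool:
    """True when spending is above 100% of budget or the hourly cost exceeds its cap."""
    return state.spent > state.budget or state.hourly_cost > state.max_hourly_cost
```

Comparing against the previous check's spending means each notification email is sent once, when its threshold is first crossed, rather than on every check.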

Users monitoring panel in the CPOC’s management console

Page 10:

Institutions, educators and students take no financial risks as all AWS accounts are guaranteed by RosettaHUB.

RosettaHUB acts as a procurement adapter: it allows Higher Education institutions and research laboratories to top up their RosettaHUB institutional account with cloud credits in compliance with their regulatory frameworks and administrative constraints.

A dedicated RosettaHUB infrastructure can be fully integrated with the institution’s Information system.

Full technical and compliance integration

Page 11:

RosettaHUB, Next generation e-research and e-learning platform

Page 12:

The RosettaHUB platform closes the technology gap between clouds, containers, data science software, real-time collaboration frameworks, social portals and people.

The RosettaHUB data science platform makes it easy for educators to compose containers-based virtual e-learning environments and for researchers to compose virtual e-science environments.

Jupyter, RStudio, Spark, Zeppelin, Shiny Apps, virtual desktops, HPC clusters, etc. can be added to the virtual environments and made accessible in a secure and highly scalable manner to thousands of students or collaborating researchers.

Democratic and pervasive data science

Page 13:

Defining the meta-cloud: RosettaHUB Web Services & managed images

Public Cloud

Private Cloud

RosettaHUB delivers:
• A docker-based meta-cloud
• A universal data science workbench
• A meta-kernel for data science
• A man-cloud and man-data interaction design
• A sharing model for cloud artifacts
• A SOAP/RESTful API with ~1000 functions
• SDKs and add-ins
• A cloud and data products marketplace

RosettaHUB fosters:
• Usability
• Reproducibility
• Shareability
• Auditability
at all layers of interaction between students, educators and researchers and their software tools, infrastructures and peers.

Data scientist

Page 14:

The RosettaHUB dashboard displays the cloud and data science related artifacts as customizable icons structured in categories.

RosettaHUB meta-formations: they enable one-click provisioning and access to fully-managed complex infrastructures for e-learning and e-Research.

RosettaHUB meta-keys: they map AWS access keys and a default VPC, they allow rapid access to AWS services and they can be shared.

RosettaHUB meta-images:

• Managed: they come with agents to orchestrate all service components and expose a composable virtual workbench to the end user

• Semi-managed: they map any EC2 AMI

RosettaHUB meta-storages: they map S3 buckets, EFS or EBS volumes. They can be used as the working or reference volumes for managed instances and clusters.

One-click access to AWS-powered data science

Page 15:

Seamless creation of Hadoop and Spark clusters based on AWS EMR, the RosettaHUB smart proxies and the RosettaHUB workbench.

Support for both on-demand and spot.

Seamless access to clusters with shells and notebooks including RosettaHUB notebooks, Zeppelin, Jupyter, Spark-Notebook, etc.

Real-time collaborative access, cluster sharing, security and access control for Hadoop and Spark.

Seamless data management, seamless mounting of S3 and EFS volumes on master and slave nodes.

Very rapid prototyping of big data applications using the RosettaHUB reactive programming frameworks, web application designers and spreadsheet engines.

User-friendly Spark and Hadoop clusters for research and education

Launching an EMR cluster can be done in one click by choosing an available formation or by creating a custom formation with custom settings

Access the cluster’s master in the browser from the RosettaHUB collaborative workbench

Page 16:

Seamless creation of NVIDIA-docker based virtual environments for deep learning on GPU.

Seamless creation and access to HPC clusters based on Alces Flight or cfnCluster, the RosettaHUB smart proxies and the RosettaHUB workbench.

Real-time eagle-view on resources, billing and hourly cost for HPC clusters.

Seamless data management, seamless mounting of S3 and EFS volumes on master and slave nodes.

Extended support for spot and autoscaling.

Out-of-the-box cluster security and access control.

Notebooks, cluster sharing and real-time collaboration for Alces Flight and cfnCluster.

Seamless scheduling using cron and rate tasks.

Interactive Scientific Web UIs and reactive programming frameworks for HPC clusters.

User-friendly managed HPC for research and education

Launching an HPC cluster can be done in one click by choosing an available formation or by creating a formation with custom settings

Page 17:

RosettaHUB ResOps/EduOps

Virtual-labs-as-code

Page 18:

RosettaHUB meta-Formations

[Diagram: a RosettaHUB meta Formation can be a Machine, Spot Machine, Machine Pool, Spot Machine Pool, EMR Cluster, Spot EMR Cluster, HPC Cluster or Spot HPC Cluster. Two example formations are detailed.]

Spot Machine (e.g. Deep learning assignments)
• Instance type: p2.xlarge
• Machine Image: Tensorflow GPU
• Maximum Bid Price
• SSL certificate
• Cloud Keys: AWS Keys
• Reference and Working Volumes

EMR Cluster (e.g. Big data workshop)
• Master Instance type: m4.large
• Slave Instance type: m4.large
• Proxy Instance Type
• Proxy Image: Standard CPU Image
• SSL certificate
• Cloud Keys: AWS Keys
• Reference and Working Volumes
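To make the "virtual-labs-as-code" idea concrete, here is a minimal sketch of how a Spot Machine formation like the deep learning example could be written down as data. The class and field names are hypothetical and chosen for illustration only; they are not the RosettaHUB SDK. Only the instance type, image name and key label come from the slide.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class SpotMachineFormation:
    """Hypothetical description of a 'Spot Machine' meta-formation."""
    name: str
    instance_type: str
    machine_image: str
    max_bid_price: Optional[float] = None  # the slide names the field but gives no value
    cloud_keys: str = "AWS Keys"
    ssl_certificate: Optional[str] = None
    working_volume: Optional[str] = None
    reference_volumes: List[str] = field(default_factory=list)


deep_learning = SpotMachineFormation(
    name="Deep learning assignments",
    instance_type="p2.xlarge",
    machine_image="Tensorflow GPU",
)

# A formation expressed as plain data can be versioned, shared and re-launched.
print(json.dumps(asdict(deep_learning), indent=2))
```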

Page 19:

RosettaHUB creates for each student and educator a default S3 storage and a default EFS storage which map an S3 bucket and an EFS volume

Formations are configured with working volumes and reference volumes which can be mappings of EFS, EBS, S3 or FTP. These are automatically mounted on the EC2 instances including nodes of HPC and EMR clusters

Any public formation that the user launches automatically uses the user’s default EFS as its working volume: data generated by students and educators is persistent and survives the termination of machine instances

The reference volume can be synced at start-up to the working volume
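As an illustration of syncing a reference volume to a working volume at start-up, the sketch below copies one mounted directory into another using only the Python standard library. The function name and the overwrite policy are assumptions; the slides do not say how RosettaHUB resolves conflicts between the two volumes.

```python
import shutil
from pathlib import Path


def sync_reference_to_working(reference: Path, working: Path) -> None:
    """Copy the reference volume's content into the working volume.

    Assumed policy: files with the same relative path are overwritten by the
    reference copy; files that exist only in the working volume are kept.
    """
    # dirs_exist_ok (Python 3.8+) lets the copy merge into an existing directory.
    shutil.copytree(reference, working, dirs_exist_ok=True)
```

With this policy, course material placed on the reference volume reaches every student workspace without deleting work the student has already saved.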

Students and educators persistent workspaces

EFS, EBS and S3 Volumes can be automatically mounted on the docker container of the RosettaHUB managed instances

Page 20:

The RosettaHUB meta-formations and Images can be used to create RosettaHUB Sessions. Sessions provide access to the universal workbench and they can be shared with a user or a group of users. Users have the same view on the workbench and can collaboratively create and adjust widgets, interact with tools and data.

Composable widgets include:

• Real-time collaborative consoles, notebooks and code editors on the most commonly used tools for data analysis: R, Python, Scala, RStudio etc.

• Applications access (Jupyter, Zeppelin, etc.)

• Real-time collaborative RStudio

• Real-time collaborative remote desktop access in the browser.

• Data visualization and interaction components such as charts, sliders, buttons.

Universal collaborative workbench

Page 21:

The universal workbench allows the remote interactive control of RosettaHUB meta-kernels created and managed by the RosettaHUB docker agents.

The RosettaHUB meta-kernels are processes merging the virtual machines of Java, R and Python. Meta-kernels allow intercommunication and in-memory transfer of variables from one language to the other

Meta-kernels data access is fully managed by RosettaHUB.

Meta-kernels can be shared as well as their working volumes and reference volumes.

Meta compute kernels & seamless data management

Page 22:

Semi-managed images allow users to easily launch a machine from the RosettaHUB web console using their RosettaHUB keys

Launching semi-managed images can be done in one click from the RosettaHUB dashboard

Access to the instances is managed by RosettaHUB, i.e. RosettaHUB generates and saves the private keys associated with the instance as well as the password for Windows instances.

Users can retrieve their private keys and passwords at any time.

Instructions on how to connect to Linux and Windows instances are provided to the user

Semi-managed images

Page 23:

RosettaHUB mass onboarding process

Page 24:

The RosettaHUB automated mass onboarding process for AWS: Oxford University

Students/Educators register individually at https://ox.rosettahub.com

Students/Educators verify their email addresses by clicking on a link in the verification email sent by RosettaHUB

Users with emails ending with the institution’s domain get approved automatically and receive an email with credentials after a few minutes

Users who register with emails not linked to the institution get approved manually by the CPOC
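The approval rule above (automatic for institutional addresses, manual otherwise) can be sketched as a small function. The function name and the `example.edu` domain are hypothetical, and accepting subdomains of the institution's domain is an assumption rather than documented behaviour.

```python
def approval_route(email: str, institution_domain: str) -> str:
    """Return 'automatic' for institutional addresses, 'manual' otherwise."""
    _, _, domain = email.strip().lower().rpartition("@")
    institution_domain = institution_domain.lower()
    # Assumption: the domain itself and any of its subdomains qualify.
    # Matching on ".<domain>" avoids accepting look-alikes such as notexample.edu.
    if domain == institution_domain or domain.endswith("." + institution_domain):
        return "automatic"
    return "manual"
```

Registrations routed to "manual" would then wait for the CPOC's decision, as described on the slide.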

Page 25:

Initial setup for a new institution
• Institution’s CPOC registers at: https://www.rosettahub.com/institutions
• Set default limits for institution: budgets, budget limits, EC2 instance perimeters, regions, services etc.
• Create CPOC’s RosettaHUB account
• Allocate domain name to institution, create dedicated registration website
• Create CPOC email linked to institution’s domain ending with @subdomain.rosettahub.com
• Create AWS master account and assign it to the CPOC
• Enable detailed billing, cost explorer, create Organization
• Create support ticket to increase AWS Organizations limit
• Configure CPOC’s AWS account for resources/billing monitoring
• Configure CPOC’s RosettaHUB account with default keys, S3, EFS

Affiliation of students & educators using Excel files
• CPOC uploads in Excel format lists of students and educators (first name, last name, email, graduation, bio link etc.)
• CPOC selects the valid student and educator registrations and clicks process from the RosettaHUB users panel
• After a few minutes users receive their credentials for RosettaHUB

Affiliation via individual registrations
• Students/Educators register individually at https://subdomain.rosettahub.com
• Students verify their email addresses by clicking on a link in the verification email sent by RosettaHUB
• Users with emails ending with the institution’s domain get approved automatically and receive an email with credentials after a few minutes
• Users who register with emails not linked to the institution get approved manually by the CPOC

The RosettaHUB automated mass onboarding process for AWS

Page 26:

Fully automated process for registering students and educators to AWS and RosettaHUB
• Add user to the RosettaHUB portal
• Create email account on RH email server ending with @subdomain.rosettahub.com
• Create AWS Sub-Account linked to RH email using AWS Organizations
• Create IAM user with rights based on the institution’s settings (instance types, regions, etc.)
• Create Roles for EMR and Elastic Beanstalk and service roles for all allowed services
• Add monitoring to each user’s account: Lambda function, CloudTrail
• Create RH VPC where all managed RH EC2 instances will be running
• Create secondary IAM user for RH keys enabling spot instances access
• Create user’s default S3 bucket as well as the RH S3 storage artifact that maps the bucket
• Create EFS storage to be used as a default working volume for RH managed instances
• Send welcome email with user’s credentials for RosettaHUB

The RosettaHUB automated mass onboarding process for AWS
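The sub-account creation step can be illustrated with the AWS Organizations API, where account creation is asynchronous: `create_account` returns a request whose state is then polled with `describe_create_account_status`. The sketch below assumes a client that behaves like `boto3.client("organizations")`; the function name, polling interval and error handling are illustrative choices, not RosettaHUB's pipeline.

```python
import time


def create_member_account(org_client, email: str, account_name: str,
                          poll_seconds: float = 5.0, max_polls: int = 60) -> str:
    """Request a new member account and wait for AWS to report a final state.

    `org_client` is expected to behave like boto3.client("organizations").
    Returns the new account ID; raises RuntimeError on failure or timeout.
    """
    status = org_client.create_account(
        Email=email, AccountName=account_name
    )["CreateAccountStatus"]

    polls = 0
    while True:
        if status["State"] == "SUCCEEDED":
            return status["AccountId"]
        if status["State"] == "FAILED":
            raise RuntimeError(f"account creation failed: {status.get('FailureReason')}")
        if polls >= max_polls:
            raise RuntimeError("account creation still in progress after polling limit")
        time.sleep(poll_seconds)
        status = org_client.describe_create_account_status(
            CreateAccountRequestId=status["Id"]
        )["CreateAccountStatus"]
        polls += 1
```

Passing the client in as a parameter keeps the function testable without AWS credentials; in real use it would be called with `boto3.client("organizations")` from the institution's master account.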

Page 27:

RosettaHUB governance and management platform, modus operandi

Page 28:

RosettaHUB uses AWS building blocks to harness the AWS platform and make it work seamlessly for research and education.

It leverages:

Organizations to streamline the affiliation of students and faculty members.

IAM to restrict the students and educators’ perimeters of action.

CloudWatch, SNS and Lambda to monitor and control resources and budget consumption in real-time.

STS to federate users access to the AWS console.
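As an example of using IAM to restrict a user's perimeter of action, the sketch below builds a policy document that denies launching EC2 instances of non-allowed types and denies EC2 actions outside the allowed regions. It relies on the standard `ec2:InstanceType` and `aws:RequestedRegion` condition keys. The function name and statement IDs are hypothetical, and this is not the policy RosettaHUB actually attaches.

```python
import json
from typing import Dict, List


def perimeter_policy(instance_types: List[str], regions: List[str]) -> Dict:
    """Build an IAM policy document limiting EC2 instance types and regions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Deny launching any instance whose type is not in the allowed list.
                "Sid": "DenyOtherInstanceTypes",
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {"StringNotEquals": {"ec2:InstanceType": instance_types}},
            },
            {
                # Deny EC2 calls addressed to any region outside the allowed list.
                "Sid": "DenyOtherRegions",
                "Effect": "Deny",
                "Action": "ec2:*",
                "Resource": "*",
                "Condition": {"StringNotEquals": {"aws:RequestedRegion": regions}},
            },
        ],
    }


print(json.dumps(perimeter_policy(["t2.micro", "m4.large"], ["eu-west-1"]), indent=2))
```

Explicit Deny statements are used because a Deny overrides any Allow granted elsewhere, so the perimeter holds even if broader permissions are attached to the same user.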

The AWS building blocks

[Diagram labels: Amazon CloudWatch, Amazon SNS, AWS Lambda, AWS CloudTrail, Amazon S3, AWS Organizations, IAM, AWS STS]

Page 29:

A Lambda function is inserted in each AWS account for real-time monitoring.

The Lambda function on the master account is triggered a few times per day when a new billing report is made available by AWS. This triggers the computation, on RosettaHUB, of the usage of all sub-accounts. Sub-accounts that have over-consumed are then disabled.

The Lambda functions on sub-accounts are triggered whenever EC2, RDS or EBS resources are created or updated.

They send information about compute and storage resources to the platform, which estimates consumption in real time and disables sub-accounts that exceed their hourly cost limits.
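A sub-account monitoring function of this kind could look like the sketch below. It assumes the Lambda is invoked with "AWS API Call via CloudTrail" events, whose `detail.eventSource` and `detail.eventName` fields identify the service and the API call; the slides do not state the actual trigger mechanism. The function names and the summary format are hypothetical, and the forwarding step is left as a placeholder.

```python
import json
from typing import Optional

# CloudTrail event sources for the services named on the slide.
# EBS volume calls (e.g. CreateVolume) are reported under the EC2 source.
WATCHED_SOURCES = {"ec2.amazonaws.com", "rds.amazonaws.com"}


def summarize_event(event: dict) -> Optional[dict]:
    """Extract the fields worth reporting, or None for unwatched services."""
    detail = event.get("detail", {})
    source = detail.get("eventSource")
    if source not in WATCHED_SOURCES:
        return None
    return {
        "account": event.get("account"),
        "region": event.get("region"),
        "time": event.get("time"),
        "service": source.split(".")[0],
        "action": detail.get("eventName"),
    }


def handler(event, context):
    """Lambda entry point: report watched resource changes."""
    summary = summarize_event(event)
    if summary is not None:
        # Placeholder: a real function would send this to the monitoring platform.
        print(json.dumps(summary))
    return summary
```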

Monitoring and audit at scale

Institution master AWS Account: monitors costs on each sub-account

Students and Educators AWS Accounts: monitor resources in real time (EC2, EMR, ECS, RDS, EBS, S3, EFS ...)

Page 30:

Users can authenticate through institutional SAML or Active Directory infrastructures.

Registration lifecycle management actions can be triggered programmatically by the institution’s student management system.

Notification emails can be customized for the institution and custom email servers can be used.

Cloud resource lifecycle management and sharing actions can be scheduled with cron and rate tasks.
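For readers unfamiliar with rate tasks, the sketch below parses a rate expression into a time interval. It assumes the syntax follows AWS schedule expressions such as `rate(6 hours)`; the slides do not show RosettaHUB's own scheduling syntax, so both the format and the function name are assumptions.

```python
import re
from datetime import timedelta

# Assumed format, modelled on AWS schedule expressions: rate(<number> <unit>)
_RATE = re.compile(r"^rate\((\d+) (minute|minutes|hour|hours|day|days)\)$")


def parse_rate(expression: str) -> timedelta:
    """Convert e.g. 'rate(6 hours)' into the interval between two runs."""
    match = _RATE.match(expression.strip())
    if match is None:
        raise ValueError(f"not a rate expression: {expression!r}")
    value, unit = int(match.group(1)), match.group(2)
    if value < 1:
        raise ValueError("rate value must be a positive integer")
    # timedelta expects plural keyword names (minutes, hours, days).
    key = unit if unit.endswith("s") else unit + "s"
    return timedelta(**{key: value})
```

A rate task repeats at a fixed interval, whereas a cron task fires at calendar positions (for example every weekday at 18:00), which is why schedulers typically offer both.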

A dedicated marketplace can be used as an institutional sharing platform for pedagogic and research artifacts (files and data, virtual labs, machines and containers images, etc.)

Dedicated RosettaHUB

Page 31:

Contacts: [email protected]

RosettaHUB Website: https://www.rosettahub.com

To register a new institution: https://www.rosettahub.com/institutions