TRANSCRIPT
Scientific Computing with Application Containers: Onedata and HyperFlow Use Cases
Michał Orzechowski(1,2), Bartosz Baliś(1), Michał Cwiertnia(2), Łukasz Dutka(2), Renata G. Słota(1), Jacek Kitowski(1,2)
(1) AGH University of Science and Technology, Department of Computer Science, Krakow, Poland
(2) AGH University of Science and Technology, ACK Cyfronet AGH, Krakow, Poland
Motivation
• Scientific computing management remains challenging
– Resource management, data management, software deployment, …
• Virtualization, application containers, and infrastructure automation can mitigate deployment and execution complexity
• However, in scientific computing app containers are not used to their full potential
– Mainly as a means to provide user-defined software
Objectives
• Discuss benefits of application containers in scientific computing
• Demonstrate on two case studies:
– Onedata: scientific data management
– HyperFlow: scientific workflow management
• How to improve support for application containers in computing centers?
Containers and container orchestration
[Figure: Kubernetes architecture; source: https://devopscube.com/wp-content/uploads/2016/09/kubernetes-architecure.png]
Container management software: Kubernetes, Mesos, Docker Swarm
Hosted container management platforms: Amazon ECS/EKS, Google Kubernetes Engine
Serverless container management: Amazon Fargate, Google Knative
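All three classes of tooling above share one core idea: the operator declares a desired state and a control loop converges the running containers toward it. A minimal sketch of that reconciliation loop, with purely illustrative names (this is not a real Kubernetes or ECS API):

```python
# Minimal sketch of the desired-state reconciliation loop that container
# orchestrators (Kubernetes, Mesos frameworks, Swarm) run internally.
# All names here are illustrative; this is not a real orchestrator API.

def reconcile(desired_replicas: int, running: list[str]) -> list[str]:
    """Return the container list after one reconciliation step."""
    running = list(running)
    while len(running) < desired_replicas:      # scale out: start containers
        running.append(f"worker-{len(running)}")
    while len(running) > desired_replicas:      # scale in: stop containers
        running.pop()
    return running

state: list[str] = []
for _ in range(3):                 # converges in one step here, but real
    state = reconcile(3, state)    # controllers re-run this loop forever
print(state)                       # ['worker-0', 'worker-1', 'worker-2']
```

The same loop also repairs failures: if a worker disappears from `running`, the next iteration restarts it, which is exactly why declarative orchestration suits long-running scientific services.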
Benefits of application containers for (scientific) computing
• Development
– Consistent runtime environment (easier testing and debugging)
– Component re-use and built-in versioning
• Deployment
– Sandboxing and portable deployment across machines and computing infrastructures
– Runtime environment in the container rather than host machine
• Execution
– Better resource utilization
– Isolation
– Ease of achieving high scalability of scientific application components
– Lightweight virtualization and lower runtime overhead compared to VMs
SCIENTIFIC WORKFLOW MANAGEMENT SYSTEM
Scientific workflow management
[Figure: a WMS (Workflow Management System) takes input data and workflow tasks, runs the corresponding scientific applications on a computing infrastructure, and turns intermediate data into final results.]
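The flow just described (input data → workflow tasks → intermediate data → final results) is in essence the execution of a task DAG in dependency order. A toy, hypothetical executor sketching that idea (the tasks and their wiring are invented for illustration):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Toy workflow: each task maps its inputs to an output; the dependency
# lists give the edges a WMS must respect. Purely illustrative.
tasks = {
    "split":  (lambda data: [data[:2], data[2:]], []),
    "left":   (lambda parts: sum(parts[0]), ["split"]),
    "right":  (lambda parts: sum(parts[1]), ["split"]),
    "merge":  (lambda l, r: l + r, ["left", "right"]),
}

def run_workflow(input_data):
    """Execute tasks in topological order, passing intermediate data along."""
    results = {}
    order = TopologicalSorter({name: set(deps)
                               for name, (_, deps) in tasks.items()})
    for name in order.static_order():
        fn, deps = tasks[name]
        args = [results[d] for d in deps] or [input_data]
        results[name] = fn(*args)
    return results["merge"]

print(run_workflow([1, 2, 3, 4]))  # 10
```

A real WMS such as HyperFlow adds what this sketch omits: scheduling tasks onto distributed resources, staging the intermediate data, and recovering from task failures.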
Scientific workflow management in HyperFlow
Programming
• Infrastructure-independent workflow description
Planning
• Map workflows to computing infrastructures
• Possible hybrid deployment
Provisioning
• Create required infrastructure resources
• Deploy the WMS and application software
Execution
• Run workflow(s) on the provisioned infrastructure
• Scale the infrastructure if needed
Tear down
• Destroy created resources
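The five phases above can be sketched as a driver that always tears down what provisioning created, even when execution fails. All function and resource names below are hypothetical stand-ins, not real HyperFlow APIs:

```python
# Hypothetical sketch of the lifecycle phases above (planning,
# provisioning, execution, tear down); not a real HyperFlow API.

def run_lifecycle(workflow: dict, target: str) -> list[str]:
    log = []
    plan = {"workflow": workflow["name"], "target": target}      # planning
    log.append(f"planned {plan['workflow']} on {plan['target']}")
    resources = [f"{target}-vm-{i}" for i in range(2)]           # provisioning
    log.append(f"provisioned {len(resources)} resources")
    try:
        log.append(f"executed {len(workflow['tasks'])} tasks")   # execution
    finally:
        resources.clear()                                        # tear down
        log.append("tore down all resources")
    return log

for line in run_lifecycle({"name": "demo-wf", "tasks": ["t1", "t2"]}, "ec2"):
    print(line)
```

The `try`/`finally` mirrors why the tear-down phase matters in practice: cloud resources cost money whether or not the workflow succeeded, so destruction must run unconditionally.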
[Figure: the HyperFlow WMS, built on a programming-language ecosystem, orchestrates workflow activities (f1, f2, f3) via the HyperFlow engine and runs them on computing infrastructure such as virtual machines and cloud functions.]
[Figure: workflow development and execution. A workflow is developed in a text editor or via a generating script into an executable workflow description (JSON), comprising the workflow graph and the workflow activities (f1, f2, f3); during execution, an HF Executor runs the worker application code as local commands on the computing infrastructure, e.g. a PBS cluster.]
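An executable workflow description of the kind shown in the figure names the processes and the signals (data items) flowing between them. A hypothetical, simplified example in that spirit; the field names approximate the style and are not guaranteed to match the exact HyperFlow JSON schema:

```python
import json

# Hypothetical, simplified workflow description in the spirit of
# HyperFlow's JSON format. Field names are illustrative of the style,
# not guaranteed to match the exact HyperFlow schema.
workflow = {
    "name": "example-wf",
    "processes": [
        {"name": "preprocess", "function": "run_preprocess",
         "ins": ["raw_data"], "outs": ["clean_data"]},
        {"name": "analyze", "function": "run_analysis",
         "ins": ["clean_data"], "outs": ["result"]},
    ],
    "signals": [{"name": s} for s in ("raw_data", "clean_data", "result")],
    "ins": ["raw_data"],
    "outs": ["result"],
}
print(json.dumps(workflow, indent=2))
```

Because the description is plain JSON, it stays infrastructure-independent: the same file can be mapped by the planner to a PBS cluster, ECS containers, or cloud functions.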
[Figure: workflow deployment and monitoring. Infrastructure automation tools consume Amazon EC2, Amazon Lambda, and PLGrid cloud configuration files to deploy workflows onto AWS Lambda, Amazon ECS with Docker containers, and the PLGrid Cloud, with data on S3 or NFS; workflow monitoring feeds InfluxDB and a HyperFlow dashboard (Grafana).]
ONEDATA
EVENTUALLY CONSISTENT VIRTUAL FILESYSTEM FOR MULTI-CLOUD INFRASTRUCTURES
CYFRONET AGH
WHO ARE WE?
• A group of developers bringing a hybrid-cloud open source platform to life
• 5+ years of dedicated development
• Our main goals are:
– to deliver a data management platform for large-scale and distributed problems
– to make the solution decentralized and eventually consistent in order to build a mesh of data sources
– to deliver a virtual file system for the hybrid cloud
• The work is supported by: [funder logos]
ONEDATA FOR ONEWORLD
CAPABILITIES OF ONEDATA
1. Multi-protocol transparent access to data ("[…] but we want POSIX")
2. Heterogeneity of storage technologies
3. Block data access
4. Easy data sharing and publication (DIO)
5. Metadata management integrated with the data management platform
6. Flexible authentication and authorization
7. Easy integration with external services via APIs
8. High-throughput data processing
MULTI-PROTOCOL TRANSPARENT ACCESS […] BUT WE WANT POSIX
• Transparently access and create data in multi-cloud environments
• Care less about data locality: all your data is accessible wherever you go
• Support for most POSIX operations on a globally distributed virtual file system
• All data accessible via a unified file system mountable on virtual machines, Grid worker nodes, and containers
BLOCK DATA ACCESS
• File distribution between storage locations is handled underneath the file structure
• File management on a chunk basis
• Missing chunks delivered on the fly
• File management API for pre-staging and for implementing external data policy management
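The chunk-level behaviour above can be sketched as a read path that serves locally present chunks from a cache and fetches missing ones on the fly. This is a toy illustration with invented names, not the actual Onedata/Oneclient implementation:

```python
# Toy sketch of chunk-based file access: local chunks are served from a
# cache and missing chunks are fetched from a remote provider on demand.
# Illustrative only; not the actual Onedata/Oneclient implementation.
CHUNK = 4  # bytes per chunk, tiny for the example

class ChunkedFile:
    def __init__(self, remote: bytes, cached_chunks: set[int]):
        self.remote = remote                     # authoritative remote copy
        self.cache = {i: remote[i*CHUNK:(i+1)*CHUNK]
                      for i in cached_chunks}    # locally present chunks
        self.fetched = []                        # chunks pulled on the fly

    def read(self, offset: int, length: int) -> bytes:
        out = b""
        for i in range(offset // CHUNK, (offset + length - 1) // CHUNK + 1):
            if i not in self.cache:              # missing chunk: fetch it
                self.cache[i] = self.remote[i*CHUNK:(i+1)*CHUNK]
                self.fetched.append(i)
            out += self.cache[i]
        return out[offset % CHUNK:][:length]

f = ChunkedFile(b"ABCDEFGHIJKL", cached_chunks={0})
print(f.read(2, 6))    # b'CDEFGH'
print(f.fetched)       # [1] -- only the missing chunk was transferred
```

Reading only the missing chunks is what makes the virtual file system practical over wide-area links: a process touching a small range of a large replicated file transfers a few chunks, not the whole file.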
HIGH-THROUGHPUT PROCESSING
[Figure: parallel processing nodes access storage via oneclient (POSIX), CDMI, or REST, with direct access where possible to backends such as Ceph, S3, Swift, and Lustre.]
CASE STUDIES
Case study: HyperFlow @ Amazon ECS cloud with auto-scaling
[Architecture: the user machine holds configuration files, infrastructure description files, the workflow graph, and input data. Terraform provisions the AWS resources: an Amazon EC2 infrastructure running Amazon ECS with a runner instance (HyperFlow engine container), a master instance (HyperFlow job-queue executor: HF Executor master plus a jobs/results queue), and worker instances (HyperFlow executor containers with HF Executor, worker app code, app containers, local disk volumes, and EC2 monitoring agents). Workflow cloud functions run via the HyperFlow serverless executor. Amazon S3 holds input data and intermediate data. A monitoring server instance runs InfluxDB and a Grafana dashboard, fed by the monitoring agents, an ECS monitoring plugin, and a CloudWatch notifier; CloudWatch scaling rules and security policies drive the auto-scaling.]
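At their core, the CloudWatch scaling rules in such a deployment reduce to a queue-depth policy: grow the worker pool when the job queue backs up, shrink it cautiously when it drains. A hypothetical sketch of such a rule; the thresholds and the jobs-per-worker ratio are invented for illustration:

```python
# Hypothetical queue-depth auto-scaling rule, in the spirit of the
# CloudWatch scaling rules in the ECS case study. The jobs-per-worker
# ratio and the bounds are invented for illustration.

def desired_workers(queued_jobs: int, current: int,
                    jobs_per_worker: int = 4,
                    min_workers: int = 1, max_workers: int = 10) -> int:
    target = -(-queued_jobs // jobs_per_worker)   # ceil division
    target = max(min_workers, min(max_workers, target))
    # Dampen scale-in: drop at most one worker per evaluation period,
    # so a briefly empty queue does not destroy the whole pool.
    if target < current:
        target = current - 1
    return target

print(desired_workers(25, 2))   # 7
```

Asymmetric scaling (aggressive out, gradual in) is a common guard against thrashing when workflow stages alternate between bursts of many short tasks and quiet periods.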
Case study: Onedata @ multi-cloud
[Deployment: three sites connected over GEANT with a 10G link.
• Cloud – OTC: Oneprovider on 1 VM, local Ceph cache on 4 VMs (4 TB capacity), Kubernetes VMs; high throughput ~10 GB/s, low-latency access ~500 µs.
• Private cloud – INFN Bari: Oneprovider on 1 VM, NFS test collection of 2.5 TB.
• Cloud – Exoscale: Oneprovider on 1 VM, local Ceph cache on 4 VMs (2 TB capacity), Kubernetes VMs; high throughput ~10 GB/s, low-latency access ~500 µs.]
How to improve support for containers in computing centers?
• Automated deployment of k8s clusters on the Cyfronet OpenStack
– Currently done manually – it is HARD!
• Better integration of k8s with OpenStack:
– automated storage provisioning with k8s
– automated assignment of OpenStack public IPs to k8s services
– k8s cluster auto-scaling
• Unified integration of HPC resources (Prometheus) with k8s: submitting jobs to k8s that actually run on HPC.
Conclusion
Application containers offer a LOT for scientific computing
All the benefits of virtualization, but much more lightweight
Onedata and HyperFlow are developed using Cyfronet cloud resources – OpenStack and PLCloud:
• a Bamboo cluster for continuous integration
• a k8s cluster for large deployments and performance testing
To achieve their full potential, we need container orchestrators
Better support in computing centers is desirable
QUESTIONS?
Please visit: www.onedata.org
https://github.com/hyperflow-wms