TRANSCRIPT
Scientific Computing with Application Containers: Onedata and HyperFlow Use Cases
Michał Orzechowski(1,2), Bartosz Baliś(1), Michał Cwiertnia(2), Łukasz Dutka(2), Renata G. Słota(1), Jacek Kitowski(1,2)
(1) AGH University of Science and Technology, Department of Computer Science, Krakow, Poland
(2) AGH University of Science and Technology, ACK Cyfronet AGH, Krakow, Poland
Motivation
• Scientific computing management remains challenging
– Resource management, data management, software deployment, …
• Virtualization, application containers, and infrastructure automation can mitigate deployment and execution complexity
• However, in scientific computing app containers are not used to their full potential
– Mainly as a means to provide user-defined software
Objectives
• Discuss benefits of application containers in scientific computing
• Demonstrate on two case studies:
– Onedata: scientific data management
– HyperFlow: scientific workflow management
• How to improve support for application containers in computing centers?
Containers and container orchestration
[Figure: Kubernetes architecture; source: https://devopscube.com/wp-content/uploads/2016/09/kubernetes-architecure.png]
Container management software: Kubernetes, Mesos, Docker Swarm
Hosted container management platforms: Amazon ECS/EKS, Google Kubernetes Engine
Serverless container management: Amazon Fargate, Google Knative
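All three classes of tooling above share one core idea: the operator declares a desired state and a control loop converges the running containers toward it. A minimal sketch of that reconciliation loop, with purely illustrative names (this is not a real Kubernetes or ECS API):

```python
# Minimal sketch of the desired-state reconciliation loop that container
# orchestrators (Kubernetes, Mesos frameworks, Swarm) run internally.
# All names here are illustrative; this is not a real orchestrator API.

def reconcile(desired_replicas: int, running: list[str]) -> list[str]:
    """Return the container list after one reconciliation step."""
    running = list(running)
    while len(running) < desired_replicas:      # scale out: start containers
        running.append(f"worker-{len(running)}")
    while len(running) > desired_replicas:      # scale in: stop containers
        running.pop()
    return running

state: list[str] = []
for _ in range(3):                 # converges in one step here, but real
    state = reconcile(3, state)    # controllers re-run this loop forever
print(state)                       # ['worker-0', 'worker-1', 'worker-2']
```

The same loop also repairs failures: if a worker disappears from `running`, the next iteration restarts it, which is exactly why declarative orchestration suits long-running scientific services.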
Benefits of application containers for (scientific) computing
• Development
– Consistent runtime environment (easier testing and debugging)
– Component re-use and built-in versioning
• Deployment
– Sandboxing and portable deployment across machines and computing infrastructures
– Runtime environment in the container rather than host machine
• Execution
– Better resource utilization
– Isolation
– Ease of achieving high scalability of scientific application components
– Lightweight virtualization and lower runtime overhead compared to VMs
SCIENTIFIC WORKFLOW MANAGEMENT SYSTEM
Scientific workflow management
[Figure: a WMS (Workflow Management System) takes input data and workflow tasks, runs the corresponding scientific applications on a computing infrastructure, and turns intermediate data into final results.]
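The flow just described (input data → workflow tasks → intermediate data → final results) is in essence the execution of a task DAG in dependency order. A toy, hypothetical executor sketching that idea (the tasks and their wiring are invented for illustration):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Toy workflow: each task maps its inputs to an output; the dependency
# lists give the edges a WMS must respect. Purely illustrative.
tasks = {
    "split":  (lambda data: [data[:2], data[2:]], []),
    "left":   (lambda parts: sum(parts[0]), ["split"]),
    "right":  (lambda parts: sum(parts[1]), ["split"]),
    "merge":  (lambda l, r: l + r, ["left", "right"]),
}

def run_workflow(input_data):
    """Execute tasks in topological order, passing intermediate data along."""
    results = {}
    order = TopologicalSorter({name: set(deps)
                               for name, (_, deps) in tasks.items()})
    for name in order.static_order():
        fn, deps = tasks[name]
        args = [results[d] for d in deps] or [input_data]
        results[name] = fn(*args)
    return results["merge"]

print(run_workflow([1, 2, 3, 4]))  # 10
```

A real WMS such as HyperFlow adds what this sketch omits: scheduling tasks onto distributed resources, staging the intermediate data, and recovering from task failures.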
Scientific workflow management in HyperFlow
Programming
• Infrastructure-independent workflow description
Planning
• Map workflows to computing infrastructures
• Possible hybrid deployment
Provisioning
• Create required infrastructure resources
• Deploy the WMS and application software
Execution
• Run workflow(s) on the provisioned infrastructure
• Scale the infrastructure if needed
Tear down
• Destroy created resources
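The five phases above can be sketched as a driver that always tears down what provisioning created, even when execution fails. All function and resource names below are hypothetical stand-ins, not real HyperFlow APIs:

```python
# Hypothetical sketch of the lifecycle phases above (planning,
# provisioning, execution, tear down); not a real HyperFlow API.

def run_lifecycle(workflow: dict, target: str) -> list[str]:
    log = []
    plan = {"workflow": workflow["name"], "target": target}      # planning
    log.append(f"planned {plan['workflow']} on {plan['target']}")
    resources = [f"{target}-vm-{i}" for i in range(2)]           # provisioning
    log.append(f"provisioned {len(resources)} resources")
    try:
        log.append(f"executed {len(workflow['tasks'])} tasks")   # execution
    finally:
        resources.clear()                                        # tear down
        log.append("tore down all resources")
    return log

for line in run_lifecycle({"name": "demo-wf", "tasks": ["t1", "t2"]}, "ec2"):
    print(line)
```

The `try`/`finally` mirrors why the tear-down phase matters in practice: cloud resources cost money whether or not the workflow succeeded, so destruction must run unconditionally.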
[Figure: the HyperFlow WMS, built on a programming-language ecosystem, orchestrates workflow activities (f1, f2, f3) via the HyperFlow engine and runs them on computing infrastructure such as virtual machines and cloud functions.]
[Figure: workflow development and execution. A workflow is developed in a text editor or via a generating script into an executable workflow description (JSON), comprising the workflow graph and the workflow activities (f1, f2, f3); during execution, an HF Executor runs the worker application code as local commands on the computing infrastructure, e.g. a PBS cluster.]
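An executable workflow description of the kind shown in the figure names the processes and the signals (data items) flowing between them. A hypothetical, simplified example in that spirit; the field names approximate the style and are not guaranteed to match the exact HyperFlow JSON schema:

```python
import json

# Hypothetical, simplified workflow description in the spirit of
# HyperFlow's JSON format. Field names are illustrative of the style,
# not guaranteed to match the exact HyperFlow schema.
workflow = {
    "name": "example-wf",
    "processes": [
        {"name": "preprocess", "function": "run_preprocess",
         "ins": ["raw_data"], "outs": ["clean_data"]},
        {"name": "analyze", "function": "run_analysis",
         "ins": ["clean_data"], "outs": ["result"]},
    ],
    "signals": [{"name": s} for s in ("raw_data", "clean_data", "result")],
    "ins": ["raw_data"],
    "outs": ["result"],
}
print(json.dumps(workflow, indent=2))
```

Because the description is plain JSON, it stays infrastructure-independent: the same file can be mapped by the planner to a PBS cluster, ECS containers, or cloud functions.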
[Figure: workflow deployment and monitoring. Infrastructure automation tools consume Amazon EC2, Amazon Lambda, and PLGrid cloud configuration files to deploy workflows onto AWS Lambda, Amazon ECS with Docker containers, and the PLGrid Cloud, with data on S3 or NFS; workflow monitoring feeds InfluxDB and a HyperFlow dashboard (Grafana).]
ONEDATA
EVENTUALLY CONSISTENT VIRTUAL FILESYSTEM FOR MULTI-CLOUD INFRASTRUCTURES
CYFRONET AGH
WHO ARE WE?
• A group of developers bringing a hybrid-cloud open source platform to life
• 5+ years of dedicated development
• Our main goals are:
– to deliver a data management platform for large-scale and distributed problems
– to make the solution decentralized and eventually consistent in order to build a mesh of data sources
– to deliver a virtual file system for the hybrid cloud
• The work is supported by: [funder logos]
ONEDATA FOR ONEWORLD
CAPABILITIES OF ONEDATA
1. Multi-protocol transparent access to data ("[…] but we want POSIX")
2. Heterogeneity of storage technologies
3. Block data access
4. Easy data sharing and publication (DIO)
5. Metadata management integrated with the data management platform
6. Flexible authentication and authorization
7. Easy integration with external services via APIs
8. High-throughput data processing
MULTI-PROTOCOL TRANSPARENT ACCESS […] BUT WE WANT POSIX
• Transparently access and create data in multi-cloud environments
• Care less about data locality: all your data is accessible wherever you go
• Support for most POSIX operations on a globally distributed virtual file system
• All data accessible via a unified file system mountable on virtual machines, Grid worker nodes, and containers
BLOCK DATA ACCESS
• File distribution between storage locations is handled underneath the file structure
• File management on a chunk basis
• Missing chunks delivered on the fly
• File management API for pre-staging and for implementing external data policy management
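The chunk-level behaviour above can be sketched as a read path that serves locally present chunks from a cache and fetches missing ones on the fly. This is a toy illustration with invented names, not the actual Onedata/Oneclient implementation:

```python
# Toy sketch of chunk-based file access: local chunks are served from a
# cache and missing chunks are fetched from a remote provider on demand.
# Illustrative only; not the actual Onedata/Oneclient implementation.
CHUNK = 4  # bytes per chunk, tiny for the example

class ChunkedFile:
    def __init__(self, remote: bytes, cached_chunks: set[int]):
        self.remote = remote                     # authoritative remote copy
        self.cache = {i: remote[i*CHUNK:(i+1)*CHUNK]
                      for i in cached_chunks}    # locally present chunks
        self.fetched = []                        # chunks pulled on the fly

    def read(self, offset: int, length: int) -> bytes:
        out = b""
        for i in range(offset // CHUNK, (offset + length - 1) // CHUNK + 1):
            if i not in self.cache:              # missing chunk: fetch it
                self.cache[i] = self.remote[i*CHUNK:(i+1)*CHUNK]
                self.fetched.append(i)
            out += self.cache[i]
        return out[offset % CHUNK:][:length]

f = ChunkedFile(b"ABCDEFGHIJKL", cached_chunks={0})
print(f.read(2, 6))    # b'CDEFGH'
print(f.fetched)       # [1] -- only the missing chunk was transferred
```

Reading only the missing chunks is what makes the virtual file system practical over wide-area links: a process touching a small range of a large replicated file transfers a few chunks, not the whole file.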
HIGH-THROUGHPUT PROCESSING
[Figure: parallel processing nodes access storage via oneclient (POSIX), CDMI, or REST, with direct access where possible to backends such as Ceph, S3, Swift, and Lustre.]
CASE STUDIES
Case study: HyperFlow @ Amazon ECS cloud with auto-scaling
[Architecture: the user machine holds configuration files, infrastructure description files, the workflow graph, and input data. Terraform provisions the AWS resources: an Amazon EC2 infrastructure running Amazon ECS with a runner instance (HyperFlow engine container), a master instance (HyperFlow job-queue executor: HF Executor master plus a jobs/results queue), and worker instances (HyperFlow executor containers with HF Executor, worker app code, app containers, local disk volumes, and EC2 monitoring agents). Workflow cloud functions run via the HyperFlow serverless executor. Amazon S3 holds input data and intermediate data. A monitoring server instance runs InfluxDB and a Grafana dashboard, fed by the monitoring agents, an ECS monitoring plugin, and a CloudWatch notifier; CloudWatch scaling rules and security policies drive the auto-scaling.]
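At their core, the CloudWatch scaling rules in such a deployment reduce to a queue-depth policy: grow the worker pool when the job queue backs up, shrink it cautiously when it drains. A hypothetical sketch of such a rule; the thresholds and the jobs-per-worker ratio are invented for illustration:

```python
# Hypothetical queue-depth auto-scaling rule, in the spirit of the
# CloudWatch scaling rules in the ECS case study. The jobs-per-worker
# ratio and the bounds are invented for illustration.

def desired_workers(queued_jobs: int, current: int,
                    jobs_per_worker: int = 4,
                    min_workers: int = 1, max_workers: int = 10) -> int:
    target = -(-queued_jobs // jobs_per_worker)   # ceil division
    target = max(min_workers, min(max_workers, target))
    # Dampen scale-in: drop at most one worker per evaluation period,
    # so a briefly empty queue does not destroy the whole pool.
    if target < current:
        target = current - 1
    return target

print(desired_workers(25, 2))   # 7
```

Asymmetric scaling (aggressive out, gradual in) is a common guard against thrashing when workflow stages alternate between bursts of many short tasks and quiet periods.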
Case study: Onedata @ multi-cloud
[Deployment: three sites connected over GEANT with a 10G link.
• Cloud – OTC: Oneprovider on 1 VM, local Ceph cache on 4 VMs (4 TB capacity), Kubernetes VMs; high throughput ~10 GB/s, low-latency access ~500 µs.
• Private cloud – INFN Bari: Oneprovider on 1 VM, NFS test collection of 2.5 TB.
• Cloud – Exoscale: Oneprovider on 1 VM, local Ceph cache on 4 VMs (2 TB capacity), Kubernetes VMs; high throughput ~10 GB/s, low-latency access ~500 µs.]
How to improve support for containers in computing centers?
• Automated deployment of k8s clusters on the Cyfronet OpenStack
– Currently done manually – it is HARD!
• Better integration of k8s with OpenStack:
– automated storage provisioning with k8s
– automated assignment of OpenStack public IPs to k8s services
– k8s cluster auto-scaling
• Unified integration of HPC resources (Prometheus) with k8s: submitting jobs to k8s that actually run on HPC.
Conclusion
Application containers offer a LOT for scientific computing
All the benefits of virtualization, but much more lightweight
Onedata and HyperFlow are developed using Cyfronet cloud resources – OpenStack and PLCloud:
• a Bamboo cluster for continuous integration
• a k8s cluster for large deployments and performance testing
To achieve their full potential, we need container orchestrators
Better support in computing centers is desirable
QUESTIONS?
Please visit: www.onedata.org
https://github.com/hyperflow-wms