OpenNebulaConf 2016 - Provisioning Flexible and High Available Climate Data Services by Marco...
TRANSCRIPT
Marco Mancini, Ph.D.
Senior Scientist - Advanced Scientific Computing Division
CTO – Supercomputing [email protected]
http://github.com/km4rcus
@marcomancini72
https://www.linkedin.com/in/marco-mancini-0787551
Provisioning Flexible and High Available Climate Data Services
About CMCC
• CMCC is a non-profit research institution (since 10 Dec. 2015 it has been a Foundation)
• Established in 2005, with the financial support of the Ministry of Education, University and Research (MIUR), the Ministry of the Environment and Protection of Land and Sea (MATT), the Ministry of Agricultural and Forestry Policies (MIPAF) and the Ministry of Finance (MEF)
• CMCC’s mission is to investigate and model our climate system and its interactions with society and the environment to guarantee reliable, rigorous, and timely scientific results to stimulate sustainable growth, protect the environment, and to develop science-driven adaptation and mitigation policies in a changing climate.
• 6 Consortium Members: National Institute of Geophysics and Volcanology (INGV); University of Salento; Italian Aerospace Research Center (CIRA S.c.p.a); Ca’ Foscari University of Venice; University of Tuscia; University of Sassari.
• 8 Research Divisions: ASC, CSP, ECIP, IAFES, ODA, OPA, RAAS, REMHI
• 1 Supercomputing Center with HPC and Storage facilities
The big challenge is to model this complex system
• Several complex processes to be simulated
• Several interacting processes
• Great range of time scales to be analyzed
• Great range of spatial scales to be considered
• Need interdisciplinary sciences (physics, chemistry, biology, geology, …)
• Inherently non-linear governing equations
• Need sophisticated numerics
• Need huge computational resources
• …and large volumes of data can be produced
Warren M. Washington – NCAR
Scientific Grand Challenges Workshop Series: Challenges in Climate Change Science and the Role of Computing at the Extreme Scale
DOE Workshop (ASCR-BER), November 6-7, 2008
CLIMA: CMCC information LIfecycle Management plAtform
High Performance Computing
Analysis and Visualization
Sharing and Publication
Archiving and Retrieval
Objectives
• Enforcing Data Policies
• Optimizing Storage Cost
• Improving Data High Availability
• Robust Implementation of Operational Chains
• Easing Search & Discovery, Data Sharing and Collaboration
Federation of Data Services
CLIMA Data Service
Ingestion
Operational Chains
Data Access
Portal Gateway
Search & Discovery
Data Management
iRODS is open-source data management software:
• Virtualization
• Data Discovery
• Workflow Automation
• Data Sharing
Solr is an open-source enterprise search server that provides faceted navigation, clustering, grouping, and other search features
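A faceted search against Solr might be assembled as in the sketch below. The parameters `q`, `rows`, `wt`, `facet` and `facet.field` are standard Solr query parameters; the host, the core name `clima` and the field names `variable` and `model` are hypothetical and not taken from the slides.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, text, facet_fields, rows=10):
    """Return a Solr select URL asking for matches plus facet counts."""
    params = [
        ("q", text),          # free-text query
        ("rows", rows),       # number of documents to return
        ("wt", "json"),       # response format
        ("facet", "true"),    # switch faceting on
    ]
    # One facet.field parameter per field to be counted.
    params += [("facet.field", name) for name in facet_fields]
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core and field names, for illustration only.
url = build_facet_query(
    "http://solr.example.org:8983/solr/clima",
    "precipitation",
    ["variable", "model"],
)
print(url)
```

The response to such a request carries per-value counts for each facet field, which is what a portal would use to render its navigation filters.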
THREDDS is a data access server that provides bulk file transfer, remote access, subsetting, and web map services
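Subsetting on THREDDS is typically requested through its NetCDF Subset Service (NCSS). The sketch below builds such a request, assuming NCSS is enabled on the server; the host, dataset path and variable name `tas` are hypothetical, and the exact URL prefix can differ between THREDDS versions.

```python
from urllib.parse import urlencode


def build_ncss_url(server, dataset_path, variable, bbox, time_range):
    """Return an NCSS URL for one variable, a lat/lon box and a time window."""
    north, south, east, west = bbox
    time_start, time_end = time_range
    params = {
        "var": variable,
        "north": north,
        "south": south,
        "east": east,
        "west": west,
        "time_start": time_start,
        "time_end": time_end,
        "accept": "netcdf",   # ask for a NetCDF file back
    }
    return f"{server}/thredds/ncss/{dataset_path}?{urlencode(params)}"


# Hypothetical server, dataset and variable, for illustration only.
ncss_url = build_ncss_url(
    "http://thredds.example.org",
    "clima/demo/tas_monthly.nc",
    "tas",
    bbox=(47.5, 35.0, 19.0, 6.0),
    time_range=("2016-01-01T00:00:00Z", "2016-12-31T00:00:00Z"),
)
print(ncss_url)
```

Only the requested region and period are transferred, rather than the full file.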
Servers
Disks
Networking
VLAN
ONEFLOW
Storage Service Compute & Networking Service
Physical Resources Storage Networking Virtualization Authentication
Multi-tier Infrastructure Orchestration (VMs)
Multi-tier Service Orchestration (Containers)
Multi-tier Application Provisioning - Scaling - Self Healing
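A multi-tier service of this kind could be described to OneFlow with a service template along the following lines. This is a minimal sketch, not the template used by CLIMA: the role names, cardinalities, VM template IDs and scaling bounds are assumptions chosen for illustration, while the keys (`deployment`, `roles`, `cardinality`, `vm_template`, `parents`, `min_vms`, `max_vms`) follow the OneFlow service template format.

```python
import json


def make_role(name, vm_template, cardinality=1, parents=(),
              min_vms=None, max_vms=None):
    """Build one role (tier) entry of a OneFlow service template."""
    role = {
        "name": name,
        "cardinality": cardinality,   # number of VMs started for this role
        "vm_template": vm_template,   # ID of an existing VM template
    }
    if parents:
        role["parents"] = list(parents)   # roles that must be running first
    if min_vms is not None:
        role["min_vms"] = min_vms         # lower bound when scaling
    if max_vms is not None:
        role["max_vms"] = max_vms         # upper bound when scaling
    return role


# Hypothetical tiers and template IDs, for illustration only.
service_template = {
    "name": "clima-data-service",
    "deployment": "straight",   # deploy roles in parent-to-child order
    "roles": [
        make_role("data_management", vm_template=0),
        make_role("data_access", vm_template=1, cardinality=2,
                  parents=["data_management"], min_vms=1, max_vms=4),
        make_role("portal_gateway", vm_template=2,
                  parents=["data_access"]),
    ],
}

print(json.dumps(service_template, indent=2))
```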
Portal Gateway
Workflow Automation / Operational Chains
Data Access
Search & Discovery
Data Management
Ingestion
CLIMA Rest Engine
CLIMA Backend
Create Data Service
ONEFLOW
CLIMA Backend
Create Environment
Create API Key
Create Registration Token
Create OneFlow Service Template
Instantiate OneFlow Service Template
Create S3 Bucket
Instantiate Rancher Stack
Create Container Volumes
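The "Create Data Service" sequence above can be read as an ordered pipeline. The sketch below only illustrates that ordering: the step names come from the slide, but the runner, the handler signature and the stop-at-first-failure behaviour are assumptions, and the handlers are stubs rather than real OneFlow, Rancher or S3 calls.

```python
# Step names as listed on the slide, in slide order.
STEPS = [
    "Create Environment",
    "Create API Key",
    "Create Registration Token",
    "Create OneFlow Service Template",
    "Instantiate OneFlow Service Template",
    "Create S3 Bucket",
    "Instantiate Rancher Stack",
    "Create Container Volumes",
]


def create_data_service(name, handlers):
    """Run every step in order; stop and report at the first failure."""
    done = []
    for step in STEPS:
        try:
            handlers[step](name)   # hypothetical handler: takes the service name
        except Exception as exc:
            return done, f"{step}: {exc}"
        done.append(step)
    return done, None


# Stub handlers that only record the call, standing in for real API clients.
calls = []
handlers = {step: (lambda n, s=step: calls.append((s, n))) for step in STEPS}

done, error = create_data_service("demo", handlers)
print(done, error)
```

Returning the list of completed steps makes it possible to see how far provisioning got before an error and to decide what needs to be cleaned up.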
High Available Data Services
Amazon EC2 Amazon S3
ONEFLOW
VPN
VMs
ONEFLOW
Federation + File Replication
Cross Data Center Replication
Federation
Slave Zone
Master Zone