DKRZ: German Climate Computing Center
Stephan Kindermann <[email protected]>
Distributed Data Handling Infrastructures in Climatology
and “the Grid”
ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009
Talk Context: From climatology to grid infrastructures
Climatology: "Climatology is the study of climate, scientifically defined as weather conditions averaged over a period of time, and is a branch of the atmospheric sciences." (Wikipedia)
We concentrate on the part of climatology dealing with complex global climate models, and especially on the aspect of data handling:
Climatology → global climate models → HPC computers (intro part of talk)
Huge amounts of model data → data handling infrastructure → grid (main focus of talk)
Grid infrastructures: from prototypes towards a sustainable infrastructure
Access to distributed, heterogeneous data repositories:
• A national grid project: C3Grid
• A prototype C3Grid/EGEE integration
An emerging worldwide infrastructure to support intercomparison and management of climate model data
Climate Models and HPC
Motivation: Unprecedented environmental change is indisputable
– The red areas on these two images show the expansion of seasonal melting of the Greenland ice sheet from 1992 to 2002.
– The yellow line shows the temperature increase of 1 °C from 1900 to 2000.
(One) Question: Is the environmental change due to anthropogenic forcings?!
Models to understand the earth system are needed!!
"Science may be described as the art of oversimplification: the art of discerning what we may with advantage omit."[Karl Popper, “The Open Universe”, Hutchinson, London (1982)]
But:
The earth system is complex and with many highly coupled subsystems (and often poorly understood coupling effects)
The need for (complex) coupled General Circulation Models (GCMs) requiring tightly coupled HPC ressources
Complex Earth System Models: Components
Example: The COSMOS Earth System Model
COSMOS: Community Earth System Model Initiative (http://cosmos.enes.org)
• Atmosphere GCM: dynamics + physics: ECHAM5; aerosols: HAM (M7)
• Ocean + ice GCM: dynamics + physics: MPI-OM; biogeochemistry: HAMOCC/DMS
• Land model: hydrology: HD; vegetation: JSBACH
The complexity of models is increasing
Increasing Complexity, increasing computing demands
Complexity is just one dimension ..!
Disagreement about what terms mean:
What is a model?
What is a component?
What is a coupler?
What is a code base?
Thus the need for dedicated HPC resources ...
The DKRZ: A national facility for the climate community (providing compute + data services)
The German Climate Computing Centre: DKRZ
DKRZ is unique in Europe as a national service in its combination of
• HPC
• data services
• applications consulting
Non-profit organization (GmbH) with 4 shareholders: MPG (6/11), HH/UniHH (3/11), GKSS (1/11), AWI (1/11); investment costs covered by the BMBF (until now).
Hamburg: "centre of excellence" for climate-related studies
A brand new building ..
.. for a brand new supercomputer
• 252 × 32-core IBM System p575 (Power6) nodes
• 8 × 288-port QLogic 4x DDR InfiniBand switches
The Power6 cluster and the HPSS mover nodes are connected to the same InfiniBand switches.
• Storage capacity: 10 PB/year
• Archive capacity: 60 PB
Transfer rates (proposed):
• 5 GB/s (peak)
• 3 GB/s (sustained)
Data migration from GPFS to HPSS
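A quick sanity check (my own back-of-the-envelope arithmetic, not from the foil): writing the ~10 PB/year archive limit as an average rate shows how much headroom the proposed sustained transfer rate leaves for bursty GPFS-to-HPSS migration:

$$\frac{10\,\mathrm{PB/yr}}{3.15\times10^{7}\,\mathrm{s/yr}} \approx 0.32\,\mathrm{GB/s} \;\ll\; 3\,\mathrm{GB/s\ (sustained)}$$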
Compute power for the next generation of climate model runs ..
Linpack: 115.9 TFLOPS* (252 nodes = 8064 cores; 76.4% of the 152 TFLOPS peak)
Aggregate transfer rate*: write 29 GB/s, read 32 GB/s
Single-stream transfer rate: write 1.3 GB/s, read 1.2 GB/s
Metadata operations: 10 k/s – 55 k/s
* with 12 p575 I/O servers
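For orientation, the quoted peak and Linpack efficiency are mutually consistent, assuming (my assumption, not stated on the foil) 4.7 GHz Power6 cores issuing 4 floating-point operations per cycle:

$$8064\ \mathrm{cores}\times 4.7\,\mathrm{GHz}\times 4\,\mathrm{flops/cycle}\approx 151.6\,\mathrm{TFLOPS},\qquad \frac{115.9}{151.6}\approx 76.4\%$$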
Fine, but …
.. centralized HPC centers ..
.. centralized data centers ..
.. and where is the "grid" perspective?? [Ma:07]
The Climate Model Data Handling Problem
Modeling centers produce an exponentially growing amount of data, stored in distributed data centers.
Integration of model data and observation data
Expected growth rate for data archive @ DKRZ
We are forced to limit data archiving to ~10 PB/year
Data management for the IPCC Assessment Report
AR4:
• Data volume: 10s of terabytes (10^12 bytes); downloads ~500 GB/day
• Models: 25 models
• Metadata: CF-1 + IPCC-specific
• User community: thousands of users (WG1, domain knowledge)
AR5:
• Data volume: 1-10 petabytes (10^15 bytes); downloads 10s of TB/day
• Models: ~35 models; increased resolution, more experiments, increased complexity (e.g. biogeochemistry)
• Metadata: CF-1 + IPCC-specific; richer set of search criteria; model configuration; grid specification from CF (support for native grids)
• User community: 10s of thousands of users; a wider range of user groups will require better descriptions of data and attention to ease of use
Network Traffic, Climate and Physics Data, and Network Capacity (foil from ESG-CET)
[Figure: all three data series are normalized to "1" at Jan. 1990.]
Ignore the units of the quantities being graphed (they are normalized to 1 in 1990) and just look at the long-term trends: all of the "ground truth" measures are growing significantly faster than the ESnet projected capacity.
Problem: accessing data stored at distributed data centers all over the world.
→ Move computation to the data.
→ Infrastructural (grid) support components are needed.
A typical scientific workflow
(1) Find & Select → (2) Collect & Prepare → (3) Analyse → (4) Visualize
From distributed climate data (model data, observation data, scenario data, data descriptions) to an analysis dataset and a result dataset.
E-infrastructure components are needed to support steps 1, 2, 3, and 4.
Data volume along a "humidity flux" workflow example:
several PB → ~3.1 TB (300-500 files) → ~10.3 GB (28 files) → ~76 MB → ~6 MB → ~66 KB
E-Science Infrastructures for Climate Data Handling
(1) A National Climate Community Grid:
The German Collaborative Climate Community
Data and Processing Grid (C3Grid) Project
C3Grid Data and Job Management Middleware (on top of D-Grid: SRM, dCache, ..)
C3Grid: Overview
C3Grid data providers: World Data Centers (WDC Climate, WDC Mare, WDC RSAT), research institutes (PIK, GKSS, AWI, MPI-M, IFM-Geomar), universities (FU Berlin, Uni Köln), DWD, DKRZ.
(A) Providers publish ISO discovery metadata into an ISO 19139 discovery catalog.
(B) A uniform data access interface delivers data + metadata into a collaborative grid workspace; workflows run there via the grid data / job interface, and result data products + metadata flow back.
Users enter through the portal (C3RC).
(A) Finding data
C3Grid metadata description is based on ISO 19139:
• description at aggregate level (e.g. experiment)
• aggregate extent description with multiple verticalExtent sections
• sub-selection in the data request
(A parsing sketch follows below.)
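As a minimal sketch of what reading such a record involves: the snippet below pulls the vertical extent sections out of an ISO 19139 document. The file name and the exact layout of a C3Grid record are my assumptions; gmd/gco are the standard ISO 19139 namespaces.

```python
# Minimal sketch: extract vertical extent(s) from an ISO 19139 record.
# "discovery_record.xml" is a hypothetical file name.
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

tree = ET.parse("discovery_record.xml")

# An aggregate-level record may carry multiple EX_VerticalExtent sections.
for vext in tree.iter("{http://www.isotc211.org/2005/gmd}EX_VerticalExtent"):
    vmin = vext.find("gmd:minimumValue/gco:Real", NS)
    vmax = vext.find("gmd:maximumValue/gco:Real", NS)
    if vmin is not None and vmax is not None:
        print("vertical extent:", vmin.text, "to", vmax.text)
```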
(A) Finding Data: The C3Grid Portal
(B) Accessing Data: Portal
(B) Accessing Data: Server Side
Flow: primary data / base data → provider-specific data access interface → pre-processing on a compute resource → workspace.
A generic data request is made via a web service interface, hiding the provider-specific data access interface.
The server side performs: analysis, selection of preprocessing tools, metadata generation.
Selection dimensions: geographical + vertical + temporal + content + file format.
Data formats: netCDF, GRIB, HDF, XML, ..
Grid-based management of data and metadata.
Implementation examples:
• DB + archive wrapper (DKRZ, M&D)
• data warehouse (Pangaea)
• OGSA-DAI + DB (DWD)
• ....
Initial implementation: WSDL web service; next: WSRF web service. (A request sketch follows below.)
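To make the five selection dimensions concrete, here is a hypothetical sketch of the selection part of such a generic data request. All element names and values are invented for illustration; the actual C3Grid WSDL schema is not reproduced on the foil.

```python
# Hypothetical data-request payload covering the five selection dimensions
# named above; element names are invented, not the actual C3Grid schema.
import xml.etree.ElementTree as ET

req = ET.Element("DataRequest")
ET.SubElement(req, "Geographical", west="-30", east="60", south="30", north="75")
ET.SubElement(req, "Vertical", minLevel="850", maxLevel="300", units="hPa")
ET.SubElement(req, "Temporal", start="2001-01-01", end="2001-12-31")
ET.SubElement(req, "Content").text = "specific_humidity"
ET.SubElement(req, "FileFormat").text = "netCDF"

# The serialized request would be sent to the provider's web service.
print(ET.tostring(req, encoding="unicode"))
```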
Workflow Processing
Components: portal → workflow scheduler → RIS (resource information service) → compute resources with local workspaces.
Local resources and interfaces:
• GT4 WS-GRAM interfaces
• preinstalled SW packages (using the "modules" system)
• "modules" info published to the Grid Resource Information Service (MDS-based)
• the scheduler controls execution (decisions based e.g. on modules info + data availability)
• an initial set of fixed workflows is integrated in the portal
Open issues:
• workflow composition support (interdependency between processing and data)
• user-defined processing: debugging, substantial user support needed; security!
The portal hands workflows to the scheduler as JSDL-based workflow descriptions (a minimal sketch follows below).
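A minimal JSDL job description might look like the sketch below, generated here with Python. The JSDL and jsdl-posix namespaces are the standard ones from the JSDL 1.0 specification; the executable name and argument are invented.

```python
# Emit a minimal JSDL job description of the kind a workflow scheduler
# could consume. Executable/argument values are hypothetical.
import xml.etree.ElementTree as ET

JSDL = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"
ET.register_namespace("jsdl", JSDL)
ET.register_namespace("jsdl-posix", POSIX)

job = ET.Element(f"{{{JSDL}}}JobDefinition")
desc = ET.SubElement(job, f"{{{JSDL}}}JobDescription")
app = ET.SubElement(desc, f"{{{JSDL}}}Application")
posix = ET.SubElement(app, f"{{{POSIX}}}POSIXApplication")
ET.SubElement(posix, f"{{{POSIX}}}Executable").text = "qflux_analysis"
ET.SubElement(posix, f"{{{POSIX}}}Argument").text = "--workspace=/c3grid/ws1"

print(ET.tostring(job, encoding="unicode"))
```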
C3Grid architecture:
Portal → distributed grid infrastructure: DIS, DMS, workflow scheduler, RIS → compute resources.
At each provider: local resources and interfaces (primary data + metadata, base data, pre-processing, workspace).
C3Grid data / compute providers: World Data Centers (WDC Climate, WDC Mare, WDC RSAT), research institutes (PIK, GKSS, AWI, MPI-M, IFM-Geomar), universities (FU Berlin, Uni Köln), DWD, DKRZ.
C3Grid security infrastructure:
Shibboleth + GSI + VOMS / SAML attributes embedded in grid certificates …
I omit the details in this talk ..
E-Science Infrastructures for Climate Data Handling
(2) Climate data handling in an international Grid infrastructure: The C3Grid / EGEE Prototype
[Diagram: the workflow steps (1) Find & Select, (2) Collect & Prepare, (3) Analyse, (4) Visualize, spanning the World Data Centers, AWI, GKSS, … and DKRZ, with analysis and result datasets in between.]
C3Grid: community-specific tools and agreements
• standardized data description
• uniform data access with preprocessing functionality
• grid-based data delivery
EGEE: approved international grid infrastructure
• mature middleware
• secure and consistent data management
• established 24/7 support infrastructure
→ bridged by the C3Grid middleware
Bridging EGEE and C3Grid
German climate data providers (WDC Climate, WDC RSAT, WDC Mare, DWD, AWI, PIK, IFM-Geomar, MPI-Met, GKSS) expose data resources and metadata through the C3Grid data interface (web service) into a climate data workspace.
C3Grid side: (a) providers publish ISO 19115/19139 metadata; (b) the C3 web portal harvests it (OAI-PMH) into a Lucene index.
EGEE side: UI, CE with worker nodes (WN), SE, and an LFC catalog; (f) metadata is published (ISO 19115/19139) to an OAI-PMH server and (g) harvested into the AMGA metadata catalog. (A harvesting sketch follows below.)
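A minimal harvest loop might look like the sketch below. `ListRecords` and `resumptionToken` are part of the OAI-PMH protocol itself; the endpoint URL and the `iso19139` metadataPrefix value are placeholders for the C3Grid/EGEE setup.

```python
# Minimal OAI-PMH harvest loop; endpoint URL and metadataPrefix value
# are hypothetical, the protocol elements are standard OAI-PMH 2.0.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE = "https://example.org/oai"  # hypothetical OAI-PMH server
OAI = "{http://www.openarchives.org/OAI/2.0/}"

params = {"verb": "ListRecords", "metadataPrefix": "iso19139"}
while True:
    url = BASE + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        root = ET.fromstring(resp.read())
    for rec in root.iter(OAI + "record"):
        ident = rec.find(OAI + "header/" + OAI + "identifier")
        print("harvested:", ident.text)
    # Per the protocol, continue with only the resumption token, if any.
    token = root.find(".//" + OAI + "resumptionToken")
    if token is None or not (token.text or "").strip():
        break
    params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```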
Finding Data
Accessing Data
(1) Find & Select via the C3 web portal (Lucene index, AMGA metadata catalog).
(2) Collect & Prepare:
(a) request to the C3Grid data interface (web service)
(b) retrieve from the data resource (JDBC or archive)
(c) stage & provide into the climate data workspace
(d) notify (web service interface)
(e) request from the EGEE UI (web service)
(f) transfer & register to the SE / LFC catalog (lcg-tools)
(g) register in the AMGA metadata catalog (Java API)
Metadata is published (ISO 19115/19139) alongside. (A sketch of step (f) follows below.)
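Step (f) could look roughly like this. `lcg-cr` from the lcg-tools CLI copies a local file to a storage element and registers the replica in the LFC in one step; the VO name, SE host, and all paths here are invented.

```python
# Sketch of "transfer & register" with the lcg-tools CLI.
# VO name, SE host, and paths are hypothetical.
import subprocess

subprocess.run(
    [
        "lcg-cr",
        "--vo", "climate",                          # hypothetical VO
        "-d", "se.example.org",                     # destination storage element
        "-l", "lfn:/grid/climate/ws1/analysis.nc",  # logical file name in the LFC
        "file:/workspace/ws1/analysis.nc",          # local source file
    ],
    check=True,
)
```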
Trigger qflux workflow
(3) Analyse and (4) Visualize run on EGEE:
(a) request (web service)
(b) submit the qflux job to the CE / worker nodes (gLite)
(c) retrieve inputs from the SE (lcg-tools)
(d) update the AMGA metadata catalog (Java API)
(e) return the graphic to the C3 web portal
(f) publish (ISO 19115/19139); (g) harvest (OAI-PMH)
(A sketch of step (b) follows below.)
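A submission sketch for step (b): the JDL attributes used here (Executable, Arguments, sandboxes) are standard gLite JDL, while the script name, arguments, and sandbox files are hypothetical stand-ins for the actual qflux job.

```python
# Sketch: submit the qflux analysis as a gLite job. Script name,
# arguments, and sandbox contents are hypothetical.
import subprocess

jdl = """\
Executable    = "qflux.sh";
Arguments     = "--levels 850 300";
StdOutput     = "qflux.out";
StdError      = "qflux.err";
InputSandbox  = {"qflux.sh"};
OutputSandbox = {"qflux.out", "qflux.err", "qflux.png"};
"""

with open("qflux.jdl", "w") as f:
    f.write(jdl)

# -a: delegate the user proxy automatically for this submission
subprocess.run(["glite-wms-job-submit", "-a", "qflux.jdl"], check=True)
```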
Talk Overview
The context: climate models and HPC
• A national climate research facility: the DKRZ
Climate data handling e-/grid-infrastructures: bridging heterogeneity, access to distributed data repositories
• A national grid project: C3Grid
• Prototype C3Grid/EGEE integration
An emerging infrastructure to support intercomparison and management of climate model data (in the context of CMIP5 and IPCC AR5)
Motivation (1): Different models, different results
[Figure: change in mean annual temperature (°C) under SRES A2, as simulated by CCMa, ECHAM, GFDL, and HADCM.]
Motivation (2): Complexity adds uncertainty and new data intercomparison requirements!
(Friedlingstein et al., 2006)
"Carbon cycle feedbacks are likely to play a critical role in determining the atmospheric concentration of CO2 over the coming centuries (Friedlingstein et al. 2006; Denman et al. 2007; Meehl et al. 2007)." – taken from Climate-Carbon Cycle Feedbacks: The Implications for Australian Climate Policy, Andrew Macintosh and Oliver Woldring, CCLP Working Paper Series
→ the Coupled Carbon Cycle Climate Model Intercomparison Project
The Climate Model Intercomparison Project (CMIP)
• There are different, highly complex global coupled atmosphere-ocean general circulation models ('climate models').
• They provide different results over the next decades and longer timescales.
Intercomparisons are necessary to discover why and where different models give different output, and to detect 'consensus' aspects.
The World Climate Research Programme's Working Group on Coupled Modelling (WGCM) proposed and developed CMIP (now in phase 5).
CMIP5 will provide the basis for the next Intergovernmental Panel on Climate Change Assessment (AR5), which is scheduled for publication in 2013.
Data management for the IPCC Assessment Report
AR4:
• Data volume: 10s of terabytes (10^12 bytes); downloads ~500 GB/day
• Models: 25 models
• Metadata: CF-1 + IPCC-specific
• User community: thousands of users (WG1, domain knowledge)
AR5:
• Data volume: 1-10 petabytes (10^15 bytes); downloads 10s of TB/day
• Models: ~35 models; increased resolution, more experiments, increased complexity (e.g. biogeochemistry)
• Metadata: CF-1 + IPCC-specific; richer set of search criteria; model configuration; grid specification from CF (support for native grids)
• User community: 10s of thousands of users; a wider range of user groups will require better descriptions of data and attention to ease of use
An emerging worldwide infrastructure for climate model data intercomparison
The scene:
• CMIP5 / IPCC AR5
• ESG-CET (Earth System Grid – Center for Enabling Technologies)
• the IS-ENES and Metafor FP7 programs
The CMIP5 federated architecture:
Tiers: IPCC core, gateways (Tier 1), data nodes; e.g. GB (BADC), US (PCMDI), DE (WDCC) / DKRZ.
Data nodes:
• hold data from individual modeling groups
Gateways:
• provide search and access services for the data
• are often co-located with (big) data nodes
• roadmap: Curator+ESG in the US, Metafor+IS-ENES in Europe
Core nodes:
• provide the CMIP5-defined core data (on rotating disks)
• roadmap: several in the US, two in Europe (BADC, WDCC), and one in Japan
Federation is a virtual trust relationship among independent management domains that have their own sets of services. Users authenticate once to gain access to data across multiple systems and organizations.
CMIP5:
> 20 modelling centres
> 50 numerical experiments
> 86 simulations (total ensemble members) within experiments
> 6500 years of simulation
> Data will be available from "core nodes" and "modelling nodes" in a global federation.
> Users need to find & download datasets, and discriminate between models and between simulation characteristics.
CMIP5 / IPCC AR5 timeline:
• simulations starting in mid-2009
• model and simulation documentation needed in 2009 (while models are running)
• data available: end of 2010
• scientific analysis, paper submission, and review: early to mid 2012 (current absolute deadline: July)
• reports: early 2013!
An emerging worldwide infrastructure for climate model data intercomparison
The scene:
• CMIP5 / IPCC AR5
• ESG-CET (Earth System Grid – Center for Enabling Technologies)
Architecture of the AR5 federation based on ESG
[Architecture diagram, simplified:]
• AR5 ESG gateway (PCMDI): user registration, security services, monitoring services, metadata services, notification services, service startup/shutdown, OPeNDAP/OLFS (aggregation), product server, publishing (harvester), storage management, backend analysis and visualization engine, workflow, metrics services, replica location services, replica management.
• ESG node (GFDL): access control, HTTP/FTP/GFTP servers, metrics services, publishing (extraction), OPeNDAP/OLFS, OPeNDAP/BS, backend analysis and visualization engine, monitoring info provider, storage management, disk cache, deep archive, online data.
• Further ESG gateways (CCES, CCSM), each with centralized metrics services and centralized security services, serve their own sets of ESG nodes and JOIN the FEDERATION.
• Clients: browser, analysis tool, publication GUI; data providers implement the service API; global services and AR5-mandatory components are marked in the original figure.
(ESG-CET architecture)
Security infrastructure:
Web-based single sign-on (SSO):
• authentication based on OpenID
• authorization based on an attribute service
• details omitted in this talk ..
That's the basic technology, but …..
… to compare model data we need a common understanding / a common language ….
• Metadata definition in the Metafor FP7 project (EU)
(Metafor is cooperating with the US metadata initiative, the Earth System Curator project)
Metafor: Metadata Definition
An activity uses software to produce data to be archived in a repository.
Metafor is also defining a common vocabulary ..
An emerging worldwide infrastructure for climate model data intercomparison
The scene:
• CMIP5 / IPCC AR5
• ESG-CET (Earth System Grid – Center for Enabling Technologies)
• the Metafor FP7 project
• European deployment: the IS-ENES FP7 project
Infrastructure for the European Network for Earth System Modelling (IS-ENES)
• IS-ENES will provide a service for models and model results, both to modelling groups and to the users of model results, especially the impact community.
• Joint research activities will improve:
– efficient use of high-performance computers
– model evaluation tool sets
– access to model results
– climate services for the impact community
• Networking activities will:
– increase the cohesion of the European ESM community
– advance a coherent European Network for Earth System Modelling
• A 4-year FP7 project, starting March 2009
• Led by IPSL, 20 partners
IS-ENES data services
[Diagram: core data nodes, a large data node, ancillary data nodes, supercomputers, and server clusters forming the v.E.R.C.: virtual Earth System Resource Centre.]
Enhancing the European data services infrastructure:
– OGC service infrastructure
– access to distributed data and processing resources
– integration into the CMIP5 federation
Summary: Infrastructure building for climate model data intercomparison
• CMIP5 / AR5: a big problem: climate model data management
• ESG-CET: a technology provider: .. + "grid"
• IS-ENES: a common portal + resource sharing
• Metafor: a community vocabulary and a common conceptual model
E-infrastructure: from a community nebula to a community e-infrastructure!?
Is this only for the climate model community?
What about related communities?
Climate Impact Community
International climate model data federation:
• IPCC core (Tier 0)
• gateways (Tier 1): GB (BADC), US (PCMDI), DE (WDCC) / DKRZ
• data nodes (Tier 2)
On top: the IS-ENES portal and an impact community portal.
IS-ENES plan:
• OGC interfaces
• analysis services
• ..
A long way to go towards standardized interfaces / services ..
Climate model data analysis in the (proposed) C3-INAD project
International climate model data federation (IPCC AR5):
• IPCC core (Tier 0)
• gateways (Tier 1): GB (BADC), US (PCMDI), DE (WDCC) / DKRZ
• data nodes (Tier 2), including the C3Grid data nodes: WDC RSAT, WDC Mare, DWD, ….
C3Grid infrastructure on top: portal, data life-cycle management, workflow management, virtual workspace; providers: Uni Köln, DKRZ, …
Summary (1): Infrastructure building/using experience
• (Prototype) grid infrastructures:
C3Grid (in the context of D-Grid), C3Grid/EGEE
heterogeneous data integration; few users so far
• A new infrastructure-building effort for a highly demanding community problem:
the CMIP5/IPCC data federation and associated e-infrastructure initiatives
community-specific e-infrastructure components, lots of users, a "must not fail" project ..!
Summary (2): A social perspective
The term scientific cyberinfrastructure refers to a new research environment.
BUT: a cyberinfrastructure is NOT only technical.
A cyberinfrastructure is also an infrastructure with heterogeneous participants (informaticians, domain scientists, technologists, etc.), organizational and political practices, and social norms. Developing cyberinfrastructure is therefore a technical and a social endeavor!
[Thanks to Sonja Palfner (TU Darmstadt) for the following foils]
„Speaking of cyberinfrastructure as a machine to be built or technical system to be designed tends to downplay the importance of social, institutional, organizational, legal, cultural, and other non-technical problems developers always face.“ (Edwards et al. 2007: 7)
Example:
Monitoring, Modeling and Memory: Dynamics of Data and Knowledge in Scientific Cyberinfrastructures (2008-2011)
The project investigates different cases of cyberinfrastructure development: the Long Term Ecological Research Network, the Center for Embedded Networked Sensing, the WATer and Environmental Research Systems Network, and the Earth System Modeling Framework.
Objective: to understand how scientists actually create and share data in practice, and how they use it to create new knowledge.
(www.si.umich.edu/~pne/mmm.htm)
The National Science Foundation (NSF) pays attention to this complexity of cyberinfrastructure developments.
What can the social sciences bring to cyberinfrastructure developments?
• Reflection on the social challenges and problems within cyberinfrastructures.
• Making the social, political, and cultural dimensions visible.
• Understanding the larger national and transnational context of cyberinfrastructures in different scientific cultures.
• Analyzing the conditions for successful cyberinfrastructure projects and "best practices".
Social scientists can "act as honest brokers between designers and users, explaining the contingencies of each to the other and suggesting ways forward". (Edwards et al. 2007: 34)
Thank You !
Appendix – Additional foils …
[Figure: a gridded dataset as a data cube with axes Time, Level, Variable. David Viner, CRU]
• high-volume "gridded" datasets
• self-describing ("container") data formats (netCDF, GRIB, HDF); see the reading sketch below
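Because these formats are self-describing, a tool can discover dimensions, variables, and attributes from the file itself. A minimal sketch with the netCDF4 Python library; the file name and the variable name "hus" (specific humidity) are my assumptions.

```python
# Self-describing data: dimensions, variables, and attributes are read
# from the file itself. File and variable names are hypothetical.
from netCDF4 import Dataset

with Dataset("echam5_output.nc") as ds:
    print("dimensions:", {d: len(ds.dimensions[d]) for d in ds.dimensions})
    print("variables: ", list(ds.variables))
    hus = ds.variables["hus"]                 # e.g. specific humidity
    print("shape (time, level, lat, lon):", hus.shape)
    print("units:", getattr(hus, "units", "n/a"))
```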
Data access and security
[Diagram: the C3Grid architecture annotated with AA (authentication/authorization) points at the portal, the distributed grid infrastructure (DIS, DMS, workflow scheduler, RIS), and the local resources and interfaces (primary data + metadata, base data, pre-processing, compute resource, workspace) of the C3Grid data / compute providers (World Data Centers, research institutes, universities, DWD, DKRZ).]
Portal level:
• single sign-on
• support for users without grid certificates
• federated identity management
Grid level:
• X.509 grid certificates (EU-GridPMA CA)
• Grid Security Infrastructure (GSI)
Provider level:
• legacy AA infrastructure (LDAP, DB-based, ..)
• legacy data access infrastructure
(A proxy-creation sketch follows below.)
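On the grid level of this picture, a user session typically starts from a short-lived proxy credential carrying VO attributes. A sketch using the standard VOMS client commands; the VO name "c3grid" is a placeholder.

```python
# Sketch: create a VOMS proxy carrying VO attributes, then inspect it.
# The VO name "c3grid" is hypothetical.
import subprocess

subprocess.run(["voms-proxy-init", "--voms", "c3grid"], check=True)
subprocess.run(["voms-proxy-info", "--all"], check=True)
```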
[Diagram: C3Grid AA flow, simplified. The portal uses a WAYF to send the user to the identity provider of the home organisation; an identity provider of the virtual organisation adds VO attributes ("home attributes + VO attributes"), exchanged as SAML assertions. A short-lived credential service (SLCS CA) and MyProxy issue an X.509 grid proxy; GridShib SAML tools embed the SAML assertions into the proxy. Via a delegation service, the C3Grid middleware and the workflow client call grid services / grid resources through GRAM / DataRAM, where a GridShib-for-GT policy maps the user to a personal / group account.]
A typical scientific workflow, and user concerns:
(1) Find & Select relevant & available datasets (wind speed, temperature, specific humidity) from the distributed climate data
(2) Collect & Prepare a temporal and spatial subset of the data (analysis dataset)
(3) Analyse the integrated transport of humidity between selected levels (result dataset)
(4) Visualize the selected result
User concerns:
• "I want to control where my job is running!!"
• "Uniform discovery for these data centers is nice, but I also need data from …."
• "I need version xx of yy and …"
• "I want to know exactly what's happening, e.g. I need reproducible results."
• "I don't want to learn a new job description language, or get a certificate, to do a simple analysis …"
• "Debugging???!!! What went wrong???"
• "Data collection is fine, but I don't need a 'grid' to get my results!!"
ESG-CET
Earth System Grid – Center for Enabling Technologies (ESG-CET)
• Will deliver a federation architecture capable of allowing data held at "nodes" to be visible via "gateways".
• Support for CMIP5 via "modelling nodes" and "core nodes", with the former holding all the data from one modelling group and the latter holding the CMIP5-defined "core" data.
• Expect multiple "core nodes", with two in Europe (BADC, WDCC), several in the US, and one in Japan.
• Expect multiple gateways (Metafor+IS-ENES in Europe, Curator+ESG in the US).
ESG is led by the U.S. Program for Climate Model Diagnosis and Intercomparison (PCMDI) at Lawrence Livermore National Laboratory.
Compare!! What???
Disagreement about what terms mean:
What is a model?
What is a component?
What is a coupler?
What is a code base?
What is a (canonical) dataset (data aggregate)?
What is a model configuration?
Little or no documentation of the "simulation context" (the whys and wherefores and issues associated with any particular simulation).
→ Need to collect information from the modelling groups!!!!
Metafor
Common Metadata for Climate Modelling Digital Repositories, http://metaforclimate.eu
Seventh Framework Programme, Research Infrastructures, INFRA-2007-1.2.1 Scientific Digital Repositories
METAFOR describes the activities, software, and data involved in the simulation of climate, so that "models" can be discovered and compared across distributed digital repositories.
• "scientific" words end up in controlled vocabularies
• definitions of "other" end up in descriptions
• choices end up in values
.. and METAFOR is responsible for the CMIP5 metadata questionnaire (a toy vocabulary check follows below)
[from Bryan Lawrence]
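As a toy illustration of the controlled-vocabulary idea only: terms must come from the vocabulary, free text goes into a description field. The terms and structure here are invented, not Metafor's actual CV.

```python
# Toy controlled-vocabulary check; all terms are invented for illustration.
CV_MODEL_COMPONENT = {"atmosphere", "ocean", "sea_ice", "land_surface"}

def validate(entry: dict) -> None:
    """Reject entries whose 'component' term is outside the vocabulary."""
    term = entry["component"]
    if term not in CV_MODEL_COMPONENT:
        raise ValueError(f"'{term}' is not in the controlled vocabulary")

validate({"component": "atmosphere", "description": "ECHAM5 dynamics"})
```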
[Diagram: the C3Grid/EGEE prototype once more. German climate data providers publish ISO 19115/19139 metadata; the C3 web portal (Lucene index) and the EGEE side (UI, CE with worker nodes, SE, LFC catalog, AMGA/… metadata catalog) both harvest it via OAI-PMH servers. Web service interfaces support download, preprocessing & analysis against the climate data workspace, and download, upload & analysis including republishing on the EGEE side.]
A nice early prototype, but ..
Community?? Users?? ..
further info:
www.c3grid.de
The Climate Model Intercomparison Project CMIP5
[Figure: CMIP5 experiment design; pink = core, yellow = tier 1, green = tier 2.]
"A Summary of the CMIP5 Experiment Design", lead authors: Karl E. Taylor, Ronald J. Stouffer and Gerald A. Meehl, 31 December 2008
A concrete example: "qflux"
Location: various data centers & portals → institutional storage & computing facilities → local facilities → personal computer.
(1) Find & Select relevant & available datasets (wind speed, temperature, specific humidity) from the distributed climate data
(2) Collect & Prepare a temporal and spatial subset of the data (analysis dataset)
(3) Analyse the integrated transport of humidity between selected levels (result dataset)
(4) Visualize the selected result
Data volume: several PB → ~3.1 TB (300-500 files) → ~10.3 GB (28 files) → ~76 MB → ~6 MB → ~66 KB
A common metadata description
(Simplified) system overview:
[Diagram: a graphical user interface issues queries against metadata covering models, simulations, and datasets; model run output in the archive is linked to model metadata.]