Long-term Archiving of Climate Model Data at WDC Climate and DKRZ Michael Lautenschlager WDC Climate / Max-Planck-Institute for Meteorology, Hamburg Data Management Workshop (Köln, 29.-30.09.09)


Page 1

Long-term Archiving of Climate Model Data

at WDC Climate and DKRZ

Michael Lautenschlager, WDC Climate / Max-Planck-Institute for Meteorology, Hamburg

Data Management Workshop (Köln, 29.-30.09.09)

Page 2

DKRZ:
• Earth system model development
• Simulations of past, present and future climate

WDC Climate:
• Long-term data archiving
• Inter-disciplinary data dissemination

Structure 2009

Page 3

Diagram of Climate System

Page 4

Diagram of the Hamburg IPCC-Climate Model ECHAM5/MPI-OM

Page 5

Forcing of Climate Projections for IPCC AR4

Page 6

Near-surface temperature change for the scenarios A1B and B1. Shown is the difference of the 30-year means 2071-2100 minus 1961-1990.

Page 7

Comparison of the present-day sea ice cover in March and September (top) with the climate projection for scenario A1B (bottom) in 2100. The snow cover over land can also be seen.

Page 8

HLRE-II Architecture (http://www.dkrz.de/dkrz/about/hardware)

• IBM Power6: 2 x Login, 250 x Compute, 150 TFlops peak
• GPFS (3 PByte)
• HPSS (10 PByte/a)
• StorageTek silos (LTO and Titan): total capacity 60000 tapes, approx. 60 PB
• blizzard: /work/pf, /scratch
• tape: /hpss/arch, /hpss/doku, /dxul/ut, /dxul/utf, /dxul/utd
• xtape: ssh blizzard; pftp; (sftp xtape.dkrz.de) "get /hpss/arch/<prjid>/<myfile>"
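The retrieval pattern on this slide can be captured in a small helper. This is a minimal sketch: the function name `hpss_get_command`, the project ID and the file name are hypothetical; only the `/hpss/arch/<prjid>/<myfile>` layout and the `get` command are taken from the slide.

```python
import posixpath


def hpss_get_command(project_id: str, filename: str) -> str:
    """Build the sftp 'get' command for a file in the HPSS project archive.

    Follows the /hpss/arch/<prjid>/<myfile> layout shown on the slide.
    """
    if not project_id or "/" in project_id:
        raise ValueError("project_id must be a single, non-empty path component")
    if not filename or filename.startswith("/"):
        raise ValueError("filename must be a non-empty relative path")
    return "get " + posixpath.join("/hpss/arch", project_id, filename)


# Hypothetical project ID and file name, for illustration only.
command = hpss_get_command("ab0123", "run01/output.tar")
print(command)
```

The resulting string is what would be typed at the sftp prompt after connecting to xtape.dkrz.de.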

Page 9

Data production on IBM-P6: 50 PB/year

Limit for mass storage archive (HPSS): 10 PB/year
• Scientific project data archive with expiration date

Limit for long-term data archive (WDCC): 1 PB/year
• A complete data catalogue entry in WDCC (metadata) is required
• The decision procedure for the transition to the long-term archive is not yet finalised (data storage policy)
• Accessible via WDCC infrastructure
• Searchable data catalogue (GUI)
• Field-based and file-based data access (Internet)
• Storage time period: at least 10 years (no expiration date)

Development of data archive at DKRZ (German Climate Computing Centre)
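The three annual volumes on this slide imply how selective each archive level has to be. The short calculation below is only a worked illustration of that arithmetic; the variable and function names are chosen here and do not come from the slides.

```python
# Annual data volumes from the slide, in PB/year.
PRODUCTION_PB = 50   # data production on the IBM-P6
HPSS_LIMIT_PB = 10   # limit for the mass storage archive (HPSS)
WDCC_LIMIT_PB = 1    # limit for the long-term archive (WDCC)


def share(part: float, whole: float) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


hpss_share = share(HPSS_LIMIT_PB, PRODUCTION_PB)    # production that fits into HPSS
wdcc_share = share(WDCC_LIMIT_PB, PRODUCTION_PB)    # production that fits into WDCC
wdcc_of_hpss = share(WDCC_LIMIT_PB, HPSS_LIMIT_PB)  # HPSS intake that fits into WDCC

print(f"HPSS can take at most {hpss_share:.0f}% of annual production")
print(f"WDCC can take at most {wdcc_share:.0f}% of annual production")
print(f"WDCC can take at most {wdcc_of_hpss:.0f}% of the annual HPSS intake")
```

In other words, at most one fifth of the produced data can enter the mass storage archive, and at most one tenth of that can move on to the long-term archive.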

Page 10

Development of mass storage archive

Chart annotations: Oct. 2008; mid-2009: 10 PB

Page 11

Data documentation requirements are met by using the WDCC infrastructure:
• CERA-2 metadata model, developed in 1999
• Catalogue interface: cera.wdc-climate.de
• Input interface: input.wdc-climate.de

CERA-2 metadata content is complete with respect to browsing, discovering and using climate data, whether stored in the database system or outside it in flat files.

The WDCC conforms to international description standards such as ISO 19115, Dublin Core and GCMD, and is integrated into international data federations.

The data storage structure is field-based: climate time series are stored per variable in database tables. This allows web-based data catalogue search and data access in small data granules.
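Field-based storage can be pictured with a small, self-contained sketch. It assumes one table per variable and one BLOB per time step, and it uses SQLite in place of the production database; the table name, column names and sample values are hypothetical and are not the CERA-2 schema.

```python
import sqlite3
import struct


def pack_field(values):
    """Serialise one flattened 2-D field as little-endian 32-bit floats."""
    return struct.pack(f"<{len(values)}f", *values)


def unpack_field(blob):
    """Inverse of pack_field."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


conn = sqlite3.connect(":memory:")
# Hypothetical layout: one table per variable, one row (BLOB) per time step.
conn.execute("CREATE TABLE tas (time_step TEXT PRIMARY KEY, field BLOB NOT NULL)")

fields = {
    "2000-01": [271.5, 272.0, 280.25, 281.0],
    "2000-02": [270.0, 271.25, 279.5, 280.75],
}
conn.executemany(
    "INSERT INTO tas (time_step, field) VALUES (?, ?)",
    [(step, pack_field(values)) for step, values in fields.items()],
)

# Field-based access: fetch one granule (one time step of one variable).
(blob,) = conn.execute(
    "SELECT field FROM tas WHERE time_step = ?", ("2000-02",)
).fetchone()
granule = unpack_field(blob)
print(granule)
```

Because each row holds a single field, a client can request one time step of one variable without reading the rest of the model output.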

Page 12

CERA Data Model

Metadata blocks shown in the diagram:
• Entry
• Reference
• Status
• Distribution
• Contact
• Coverage
• Parameter
• Spatial Reference
• Local Adm.
• Data Access
• Data Org

Page 13

Coloured columns correspond to BLOB data tables in WDCC. Collections of matrix rows represent storage in model raw data files (complete model output, storage time step by storage time step).

Page 14

WDCC Development

Future annual growth rate: 1 PB / year

Page 15

WDCC Users (authorised for data download), 2008

Page 16

WDCC Data Downloads in 2008

Page 17

WDCC / CERA: General Statistics at 30-09-2009 00:00:10

Database size (TByte): 404

Number of BLOBs: 8194476663 (8.2 billion)

Number of experiments: 1378

Number of datasets: 165376

Total size divided by number of BLOBs gives the average size of data access granules: 50 kB/BLOB (field-based data access)
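The average granule size can be checked directly from the two figures on the slide. The calculation below is a worked example only; the slide does not say whether decimal or binary units are meant, so both readings are computed as explicit assumptions.

```python
DB_SIZE_TBYTE = 404          # database size from the slide
NUM_BLOBS = 8_194_476_663    # number of BLOBs from the slide

# Assumption 1: decimal units (1 TByte = 10**12 bytes, 1 kB = 10**3 bytes).
avg_bytes_decimal = DB_SIZE_TBYTE * 10**12 / NUM_BLOBS
# Assumption 2: binary units (1 TByte = 2**40 bytes, 1 KiB = 2**10 bytes).
avg_bytes_binary = DB_SIZE_TBYTE * 2**40 / NUM_BLOBS

print(f"decimal: {avg_bytes_decimal / 10**3:.1f} kB per BLOB")
print(f"binary:  {avg_bytes_binary / 2**10:.1f} KiB per BLOB")
```

Under either reading the result is close to 50, which is consistent with the rounded value quoted on the slide.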

Page 18

WDCC Content

Data from Earth System Modelling and Related Observations:
• ERA40
• IPCC
• CEOP
• BALTEX
• HOAPS
• CARIBIC
• WOCE
• ERA15/40
• NCEP
• GEBCO
• COSMOS
• Simulations @ MPI, GKSS, …
• EH5/MPI-OM IPCC-AR4
• Regional Climate Scenarios IPCC-AR4 (CCLM + REMO)

Page 19

Oracle BLOB-DB: data access via HTTP and Java API

Page 20

WDCC Catalogue search and data access interface (URL: cera.wdc-climate.de)

Access to 97 model experiments

Page 21

WDCC Project-based Data Access

(IPCC AR4 Hamburg, Results from Introduction)

Page 22

WDCC major accomplishments

• Offering many TB of data through a standard web-browser interface and a Java API for direct data download.
• Entering the interdisciplinary e-science environment through the primary data publication service.
  - Independent data entities of more general interest are placed in library catalogues so that they are searchable with, and citable in, classical scientific literature.
  - WDCC has more than 50 data entities registered in TIB-Order, which are connected to approx. 1.5 TB of data volume.
• Networking with other topic-related WDCs and long-term data archives.
  - German WDC Cluster Earth System Research (WDC MARE, WDC RSAT and WDCC)
  - Data sharing with the British Atmospheric Data Centre (BADC)
• Offering data management services to scientific research projects for long-term archiving and dissemination of research results.

Page 23

Primary data publication service

Following the STD-DOI concept (Scientific and Technical Data - Digital Object Identifier, URL: www.std-doi.de)

Important aspects of the publication process are:
• the identification of independent data entities which are suitable for publication at the level of scientific literature,
• the execution of an elaborate review process for metadata and climate data (quality control),
• the assignment of additional metadata for electronic publication (ISO 690-2) and of persistent identifiers (DOI / URN), and
• the integration of publication metadata and persistent identifiers into the TIB-Order library catalogue (German National Library of Science and Technology, Hannover), so that primary data entities are searchable and citable together with scientific literature.

The quality characteristic is presently “approved by author”; it could become “peer reviewed” with ESSD (Earth System Science Data journal).

Published data entities can no longer be modified. They are freely available via the Internet.

Page 24

TIB

WDCC

Page 25

Data infrastructure integrates data stewardship in the long-term archive:
• Bit-stream preservation
• Quality assurance
• Usability enabling

Page 26

Long-term archive data stewardship

Bit-stream preservation
• Secondary tape copies on different tapes and technology at a separate location
• Copy to new tapes after the maximum number of tape accesses is reached (refreshment)

Quality assurance
• Semantic examinations: behaviour of a numerical model compared to observations and to other models; part of the scientific evaluation process
• Syntactic examinations: formal aspects of data archiving, ensuring that data archiving is free of errors as far as possible
  - Consistency between metadata and climate data
  - Completeness of climate data
  - Standard range of values
  - Spatial and temporal data arrangement
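The syntactic examinations listed on this slide lend themselves to automation. The sketch below is one possible reading, not the WDCC procedure: it checks the number of time steps, the grid shape and the value range against a metadata record. The `DatasetMetadata` class, its field names and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record; field names are illustrative, not CERA-2."""
    variable: str
    n_time_steps: int
    n_lat: int
    n_lon: int
    valid_min: float
    valid_max: float


def syntactic_checks(meta, time_steps):
    """Return a list of problems found; an empty list means all checks passed.

    time_steps: list of fields, each a list of n_lat rows with n_lon values.
    """
    problems = []
    # Completeness and consistency with metadata: expected number of time steps.
    if len(time_steps) != meta.n_time_steps:
        problems.append(
            f"expected {meta.n_time_steps} time steps, found {len(time_steps)}"
        )
    for i, field in enumerate(time_steps):
        # Spatial arrangement: the grid shape must match the metadata.
        if len(field) != meta.n_lat or any(len(row) != meta.n_lon for row in field):
            problems.append(f"time step {i}: grid shape differs from metadata")
            continue
        # Standard range of values (NaN also fails this comparison).
        for row in field:
            for value in row:
                if not (meta.valid_min <= value <= meta.valid_max):
                    problems.append(f"time step {i}: value {value} out of range")
    return problems


meta = DatasetMetadata("tas", n_time_steps=2, n_lat=2, n_lon=3,
                       valid_min=150.0, valid_max=350.0)
good = [[[280.0, 281.0, 282.0], [270.0, 271.0, 272.0]]] * 2
bad = [[[280.0, 999.0, 282.0], [270.0, 271.0, 272.0]]]

print(syntactic_checks(meta, good))
print(syntactic_checks(meta, bad))
```

Returning a list of problems rather than raising on the first one lets a whole dataset be reviewed in a single pass.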

Page 27

Long-term archive data stewardship (continued)

Usability enabling
• Complete and searchable documentation of climate data entities (database tables and flat files) in the catalogue system of the WDCC
• WDCC offers web-based data access to small data granules (individual entries in BLOB DB tables)
• Archive technology transfer must be backward compatible to keep old data technically readable
• Data processing tools and data format access libraries must be migrated to new architectures

Page 28

Summary: long-term archiving services at WDCC/DKRZ

• Long-term data storage at WDCC/DKRZ is thematically focused on Earth system research (modelling and related observations).
• WDCC provides a fully documented data archive, including a web-based searchable data catalogue and web-based data access.
• WDCC supports field-based data access, including server-side data processing (extraction of geographical regions and single time steps, format conversion).
• WDCC is integrated into national (WDC-Cluster Germany, C3-Grid) and international data federations (IPCC AR5).
• WDCC/DKRZ offer long-term data storage within the existing infrastructure for topic-related external data entities on a net-cost basis.
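The server-side processing named in the summary (selecting a single time step and a geographical region) can be illustrated in a few lines. This is a sketch under stated assumptions, not the WDCC implementation: the function `extract_region`, the grid and the values are hypothetical, and longitude wrap-around is not handled.

```python
def extract_region(field, lats, lons, lat_range, lon_range):
    """Return the sub-grid of `field` inside the given latitude/longitude bounds.

    field: list of rows, one per entry in `lats`, with one value per entry in `lons`.
    Bounds are inclusive. Longitude wrap-around is not handled in this sketch.
    """
    lat_idx = [i for i, lat in enumerate(lats) if lat_range[0] <= lat <= lat_range[1]]
    lon_idx = [j for j, lon in enumerate(lons) if lon_range[0] <= lon <= lon_range[1]]
    return [[field[i][j] for j in lon_idx] for i in lat_idx]


# Hypothetical 3 x 4 grid with two time steps.
lats = [40.0, 50.0, 60.0]
lons = [0.0, 10.0, 20.0, 30.0]
series = {
    "2000-01": [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    "2000-02": [[13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24]],
}

# Select a single time step first, then the region 45-65 N, 5-25 E.
step = series["2000-02"]
subset = extract_region(step, lats, lons, (45.0, 65.0), (5.0, 25.0))
print(subset)
```

Doing this selection on the server means that only the requested sub-grid has to travel over the network.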