WS Spatiotemporal Databases for Geosciences, Biomedical Sciences and Physical Sciences

M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 1 WS Spatiotemporal Databases for Geosciences, Biomedical sciences and Physical sciences Edinburgh, November 1st + 2nd, 2005 World Data Center Climate: Terabyte Data Storage in a Relational Database System WDCC Home: www.wdcc-climate.de / WDCC Contact: [email protected] Michael Lautenschlager, Hannes Thiemann and Frank Toussaint ICSU World Data Center Climate Model and Data / Max-Planck-Institute for Meteorology Hamburg, Germany

Page 1: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 1

WS Spatiotemporal Databases for Geosciences, Biomedical Sciences and Physical Sciences
Edinburgh, November 1st + 2nd, 2005

World Data Center Climate: Terabyte Data Storage in a Relational Database System

WDCC Home: www.wdcc-climate.de / WDCC Contact: [email protected]

Michael Lautenschlager, Hannes Thiemann and Frank Toussaint

ICSU World Data Center Climate
Model and Data / Max-Planck-Institute for Meteorology
Hamburg, Germany

Page 2: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

Content:

Introduction of WDCC

CERA2 Data Model

Data Access

Connection to Mass Storage Archive

Summary

Page 3: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

Page 4: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

WDCC Content

Data from Earth System Modelling and Related Observations

Projects and data collections shown on the slide: ERA40, IPCC, CEOP, BALTEX, HOAPS, CARIBIC, WOCE, ERA15/40, NCEP, GEBCO, COSMOS, EH5/MPI-OM, IPCC-AR4, Simulations @ MPI, GKSS, …

Start: Approved in January 2003
Maintenance: Model and Data (M&D/MPI-M) and German Climate Computing Centre (DKRZ)

October 2005: 580 Experiments / 68,000 Data Sets

Page 5: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

WDCC Access

Page 6: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

WDCC Size

4.6 Billion BLOBs

Page 7: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

WDCC DB Storage

Storage of global coverages per file or BLOB:

• all levels, all parameters, arbitrary time intervals

• all levels, all parameters, 1 moment (6 by 6 hours)

• 1 level, 1 parameter, 1 moment (= 1 BLOB = 1 global field)

[Figure: data blocks with axes labelled parameters, levels, days/4 and time]

How we get the grid data:

• Files from climate model

• Postprocessing step 1: homogenizing time and calculation of diagnostics

• Postprocessing step 2: isolation of levels & parameters and creation of BLOB table input
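The second postprocessing step can be pictured with a small sketch. It regroups per-time-step model output into one time series of BLOBs per (parameter, level), so that each BLOB holds exactly one global field. The function name, the input layout and the raw float packing are assumptions made for illustration; the slides state that the real BLOBs use standard formats such as GRIB or NetCDF.

```python
import struct
from collections import defaultdict

def split_into_blobs(time_steps):
    """Regroup model output into one time series of BLOBs per (parameter, level).

    `time_steps` is a list of (timestamp, fields) pairs, where `fields` maps
    (parameter, level) to a flat list of grid-point values for one global field.
    Returns {(parameter, level): [(timestamp, blob_bytes), ...]}.
    """
    tables = defaultdict(list)
    for timestamp, fields in time_steps:
        for (parameter, level), values in fields.items():
            # Assumption: pack the field as big-endian 32-bit floats.
            # One packed field corresponds to one BLOB.
            blob = struct.pack(f">{len(values)}f", *values)
            tables[(parameter, level)].append((timestamp, blob))
    return dict(tables)

# Toy input: two 6-hourly time steps, two parameters on one level, 4 grid points.
model_output = [
    ("2005-01-01T00", {("temp", 850): [1.0, 2.0, 3.0, 4.0],
                       ("wind", 850): [5.0, 6.0, 7.0, 8.0]}),
    ("2005-01-01T06", {("temp", 850): [1.5, 2.5, 3.5, 4.5],
                       ("wind", 850): [5.5, 6.5, 7.5, 8.5]}),
]
blob_tables = split_into_blobs(model_output)
```

Each key of `blob_tables` plays the role of one BLOB table, and each list entry the role of one row addressed by its time step.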

Page 8: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

Data Model

Page 9: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

CERA 1) Concept: Semantic Data Management

(I) Data catalogue and Unix files (pointer or BLOB-table entry)

• Enable search and identification of data

• Allow for data access as they are (coarse granularity)

(II) Application-oriented data storage

• Time series of individual variables are stored as BLOB entries in DB tables (fine granularity)

• Allow for fast and selective data access

• Storage in standard data format (GRIB, NetCDF)

• Allow for application of standard data processing routines (PINGOs, CDOs)

1) Climate and Environmental data Retrieval and Archiving

Page 10: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

WDCC Data Topology

Level 1 - Interface: Metadata entries (XML, ASCII) + Data Files

Level 2 - Interface: Separate files containing BLOB table data in application-adapted structure (time series of single variables)

[Diagram boxes: Experiment Description; Unix-Files Table / Pointer; Dataset 1 Description; Dataset n Description; BLOB Data Table (twice)]

A BLOB DB table corresponds to a scalable, virtual file at the operating system level.

Page 11: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

Page 12: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

CERA Data Model

[Diagram blocks: Entry, Reference, Status, Distribution, Contact, Coverage, Parameter, Spatial Reference, Local Adm., Data Access, Data Org]

Page 13: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

Page 14: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

CERA Modules

3 Modules:

• DATA_ACCESS: for automated data access (remote data access)

• DATA_ORG: organization of grid data (geo-references of grid points in BLOBs)

• CODE: matching of (internal) model code numbers

Page 15: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

Data Model Functions

The CERA2 data model …

• allows for data search according to discipline, keyword, variable, project, author, geographical region and time interval, and for data retrieval.

• allows for specification of data processing (aggregation and selection) without attaching the primary data.

• is flexible with respect to local adaptations, to storage of different types of geo-referenced data, and to definition of data topologies (hierarchical, network, …).

• is open for cooperation and interchange with other database systems (e.g. FGDC metadata standard and ISO 19115 included).

But: it is not the simplest data model for each single application.
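As a rough illustration of a catalogue search by variable and time interval, the following sketch uses an in-memory SQLite database. The table names echo the Entry, Parameter and Coverage blocks of the CERA data model, but every column name, the function `search` and all rows are hypothetical toy data, not the actual CERA2 schema or WDCC content.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, heavily simplified metadata tables.
conn.executescript("""
    CREATE TABLE entry (entry_id INTEGER PRIMARY KEY, name TEXT, project TEXT);
    CREATE TABLE parameter (entry_id INTEGER, variable TEXT);
    CREATE TABLE coverage (entry_id INTEGER, start_date TEXT, stop_date TEXT);
""")
conn.executemany("INSERT INTO entry VALUES (?, ?, ?)", [
    (1, "EXP_A_TEMP", "project_x"),
    (2, "EXP_A_PRECIP", "project_x"),
    (3, "EXP_B_TEMP", "project_y"),
])
conn.executemany("INSERT INTO parameter VALUES (?, ?)", [
    (1, "temperature"), (2, "precipitation"), (3, "temperature"),
])
conn.executemany("INSERT INTO coverage VALUES (?, ?, ?)", [
    (1, "1900-01-01", "2000-12-31"),
    (2, "1900-01-01", "2000-12-31"),
    (3, "1960-01-01", "2002-12-31"),
])

def search(conn, variable, start, stop):
    """Return names of data sets for `variable` whose coverage overlaps [start, stop]."""
    rows = conn.execute(
        """
        SELECT e.name
        FROM entry e
        JOIN parameter p ON p.entry_id = e.entry_id
        JOIN coverage c ON c.entry_id = e.entry_id
        WHERE p.variable = ? AND c.start_date <= ? AND c.stop_date >= ?
        ORDER BY e.name
        """,
        (variable, stop, start),
    ).fetchall()
    return [name for (name,) in rows]

hits = search(conn, "temperature", "1950-01-01", "1955-12-31")
```

The search touches only metadata tables; the primary data are not read until a data set has been identified.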

Page 16: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

Data Access

Page 17: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

Web Access to WDCC

Access paths for METADATA and DATA:

• GUI: display in applet (JDBC); download (http)

• jblob-script: search for DS names (JDBC); jblob -f …

• http: html-display and xml-download (ISO, DC, …); URL: http://…

Page 18: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

Interactive Catalogue Access

Catalogue access via WWW

• URL parsed by JSP

• integrated DB retrieval by JSP

• response in standard html

• efficient administration of detailed meta information

[Diagram: web browser; request: URL; Internet; Application Server (Servlet / JSP); http: html; dynamic html pages]

Page 19: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

HTTP and JDBC Data Download

Data download via WWW

• request handled by JSP

• return of binary file

Data download via script/batch

• standard client side jdbc retrieval

• return of binary file

[Diagram: web browser; request: html form; Internet; Application Server (Servlet / JSP); http: file download; write to client disk. Program "jblob"; request: jdbc; jdbc file download; write to client disk]
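The script/batch path (client-side database retrieval that returns a binary file) can be sketched as follows. This is not the jblob program: Python's `sqlite3` stands in for JDBC against the Oracle database, and the table `blob_data`, its columns and the function `download` are assumptions made for the example.

```python
import os
import sqlite3
import tempfile

conn = sqlite3.connect(":memory:")
# Hypothetical BLOB table: one row per time step, one global field per BLOB.
conn.execute("CREATE TABLE blob_data (time_step TEXT PRIMARY KEY, field BLOB)")
conn.executemany("INSERT INTO blob_data VALUES (?, ?)", [
    ("2005-01-01T00", b"\x00\x01"),
    ("2005-01-01T06", b"\x02\x03"),
    ("2005-01-01T12", b"\x04\x05"),
    ("2005-01-01T18", b"\x06\x07"),
])

def download(conn, first, last, out_path):
    """Write the BLOBs for time steps in [first, last] to out_path, in time order.

    Returns the number of fields written.
    """
    cursor = conn.execute(
        "SELECT field FROM blob_data "
        "WHERE time_step BETWEEN ? AND ? ORDER BY time_step",
        (first, last),
    )
    n_fields = 0
    with open(out_path, "wb") as out:
        for (field,) in cursor:
            out.write(field)  # write to client disk
            n_fields += 1
    return n_fields

out_path = os.path.join(tempfile.mkdtemp(), "selection.bin")
n_written = download(conn, "2005-01-01T06", "2005-01-01T12", out_path)
```

Because every time step is its own row, the client fetches only the selected interval instead of a whole model output file.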

Page 20: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

XML Interface for http Metadata Output

Metadata access via WWW:

• xsql query to DB

• xml output from DB

• xsl mapping to any metadata format

see wini.wdc-climate.de

[Diagram: user applications; request: URL; Internet; Application Server; xsql query; xsl mapping; http: XML; various metadata formats (raw xml, xhtml, ISO xml, DC xml, ...)]
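The mapping step from raw XML to a target metadata format can be illustrated with a minimal sketch. It uses Python's `xml.etree.ElementTree` in place of an XSL stylesheet, so it shows the idea of the mapping rather than the XSQL/XSL mechanism named on the slide. The raw record layout, `FIELD_MAP` and `to_dublin_core` are hypothetical; only the Dublin Core element namespace is the real one.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# Hypothetical mapping: raw tag name -> Dublin Core element name.
FIELD_MAP = {"name": "title", "author": "creator", "summary": "description"}

def to_dublin_core(raw_xml):
    """Map a raw metadata record to a flat Dublin Core record (as an XML string)."""
    raw = ET.fromstring(raw_xml)
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")
    for raw_tag, dc_tag in FIELD_MAP.items():
        for node in raw.findall(raw_tag):
            # Copy the text of each raw element into its Dublin Core counterpart.
            ET.SubElement(record, f"{{{DC_NS}}}{dc_tag}").text = node.text
    return ET.tostring(record, encoding="unicode")

# Toy raw record standing in for the xml output from the DB.
raw_record = (
    "<entry><name>EXAMPLE_DATASET</name>"
    "<author>A. Author</author>"
    "<summary>Toy record.</summary></entry>"
)
dc_record = to_dublin_core(raw_record)
```

Supporting a further output format would then mean adding another mapping, leaving the database query unchanged.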

Page 21: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

http Data Output

Data access via WWW

• URL parsed by servlet

• query: DB access by jdbc

• response in any format

[Diagram: user applications; request: URL; Internet; Application Server (Java Servlet); http: plain, bin, html; various data formats (plain ASCII, html tables, binary objects, ...)]

Page 22: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

Connection to Mass Storage Archive

Page 23: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

Page 24: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

Oracle DBMS + HSM

DXDB: Unitree client on DB machines for communication between Oracle DB and tape archive

[Diagram labels: Disks, Tapes]

Page 25: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

Use of DXDB

DXDB is used for

• Ordinary Oracle datafiles

• Redo logs

• Backup

Page 26: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

[Diagram labels: TBS - RW / Tbl Partition 1; TBS - RW / Tbl Partition 2; dxdb; TBS - RO / Tbl Partition 1; Migout; Migin]

All tablespaces are moved "at once" to dxdb.

Page 27: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

Migout / Migin

• Migout takes place after files haven't been modified for x minutes.

• Only one migout process per dxdb-filesystem.

• Migin takes place immediately after a file is requested.

• Only parts accessed are retrieved from the backend storage.

• One migin process per requested file.
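The two rules above can be mimicked with a toy model. This is not the DXDB or Unitree implementation: the function names, the file names, the use of seconds for timestamps and the block-level view of "parts accessed" are all assumptions made for illustration.

```python
def select_for_migout(files, now, idle_minutes):
    """Return names of files not modified for at least idle_minutes, oldest first.

    `files` maps file name -> last modification time in seconds.
    A single migout process would then work through this list in order.
    """
    threshold = idle_minutes * 60
    idle = [(mtime, name) for name, mtime in files.items()
            if now - mtime >= threshold]
    return [name for mtime, name in sorted(idle)]

def blocks_to_migin(requested, resident):
    """Return the requested block numbers that are not yet on disk.

    Only these blocks need to be fetched from backend storage,
    not the whole file.
    """
    return sorted(set(requested) - set(resident))

# Toy state: last modification times in seconds, current time 3600 s.
files = {"tbs_part1.dbf": 0, "tbs_part2.dbf": 3000, "redo01.log": 3500}
queue = select_for_migout(files, now=3600, idle_minutes=10)
to_fetch = blocks_to_migin(requested=[4, 5, 6], resident=[5])
```

Here the two tablespace files have been idle for at least ten minutes and are queued for migout, while the recently written log file stays on disk.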

Page 28: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

Purging

[Diagram: dxdb fill level with low water mark (LWM) and high water mark (HWM)]

Page 29: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

Pro

• It works

• It's fast

• Applications don't have to wait until files are completely restored from tapes.

Page 30: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

Contra

• It works - if the backend works

• Dxdb not supported by Oracle

• Oracle's officially supported backend requirements do not necessarily match requirements from other applications like HSM systems (i.e. connection to Unitree is not standardised).

Page 31: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

Summary

Efficient handling of detailed metadata

• easy and structured administration of > 60 metadata tables

• access support: Java Server Pages (JSP), Servlets, jdbc, xsql, including standard DB features (sql, views, triggers, ...)

Efficient handling of fine granularity data

• random access to arbitrary time steps of single parameters

• access support: Java Server Pages (JSP), Servlets, jdbc, including standard DB features (authorisation, ...)

• transparent migration of bulk data to tape

Page 32: WS Spatiotemporal Databases for  Geosciences, Biomedical sciences and Physical sciences

The Winter TopTen Program identifies the world’s largest and most heavily used databases.

Email received on September 13th: … Congratulations on achieving Grand Prize award winner status (1) in Database Size, Other, All and TopTen Winner status Database Size, Other, Linux; Workload, Other, Linux in Winter Corp.'s 2005 TopTen Program! …

(1) Grand prizes are awarded for first place winners in the All Environments categories only.

WDCC's CERA DB has been identified as the largest Linux DB.

http://www.wintercorp.com/VLDB/2005_TopTen_Survey/2005TopTenWinners.pdf