9 September 2008
© University of Reading 2008 www.reading.ac.uk
Reading e-Science Centre
Harmonization of environmental data using the Climate Science Modelling Language
Jon Blower, Alastair Gemmell (Reading e-Science Centre)
Andrew Woolf, Dominic Lowe, Arif Shaon (STFC e-Science Centre)
Stephen Pascoe (British Atmospheric Data Centre)
Keiran Millard, Quillon Harphem (HR Wallingford)
We need to integrate and compare lots of different types of data…
[Figure: four data sources compared: SSM/I (satellite), ERA-40 (re-analysis product), HadCM3 (low-res. climate GCM) and HiGEM (hi-res climate GCM, new physics). Credit: Putt, Gurney and Haines]
…for validating numerical models…
… calibrating instruments …
…data assimilation…
[Figure, plotted against time: black line: control run; red line: assimilation run; green stars: observations]
… and making predictions
[Figure panels: flood prediction; search and rescue; climate prediction]
Where we are now (mostly)
Separate websites for each data provider
The need for harmonization
• Each community has evolved its own means for presenting data:
– File formats
– Metadata conventions
– Coordinate systems
• These are not usually mutually compatible
• … and vital metadata can be missing
• No widely-accepted standards exist for certain types of data
• Hence scientists spend lots of time dealing with low-level technical issues
• Need a common view onto all these datasets
Open Geospatial standards
• Aim to describe all geographic data
• XML encoding
– Geography Markup Language (GML)
• Web Services for data exchange
• Rooted in international standards
• Mandated by European INSPIRE directive
• But fiendishly complex
• Evolved from map-oriented systems
– Vertical and temporal information not handled cleanly
Bridging the gap: CSML
• Climate Science Modelling Language
– Abstract data model defined using ISO/OGC approach
– XML encoding based upon GML
• Adapts open geospatial standards to environmental science data
– “Best of both worlds”
• Wraps existing data
– Doesn’t expect providers to convert data
• Data are seen as geographical “features”, not as a file system
Selected CSML Feature Types
PointSeriesFeature
(timeseries at a point)
ProfileFeature
(vertical profile at a point)
GridSeriesFeature
(series of multidimensional grids)
SwathFeature
(single satellite sweep)
SectionFeature
(vertical section)
Feature Types are classified by their geometry
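As a rough illustration of classification by geometry, the five feature types listed above could be sketched in Java as an enum that carries the slide's one-line descriptions and suggests a default plot for each geometry. The enum name, its methods and the plot choices are assumptions made for this sketch; none of them is taken from the real CSML schema or the Java-CSML API.

```java
// Hypothetical sketch: names and plot styles are illustrative, not the Java-CSML API.
enum CsmlFeatureType {
    POINT_SERIES("timeseries at a point"),
    PROFILE("vertical profile at a point"),
    GRID_SERIES("series of multidimensional grids"),
    SWATH("single satellite sweep"),
    SECTION("vertical section");

    private final String description;

    CsmlFeatureType(String description) {
        this.description = description;
    }

    // One-line description, copied from the slide.
    String description() {
        return description;
    }

    // Assumed mapping from geometry to a sensible default plot.
    String defaultPlot() {
        switch (this) {
            case POINT_SERIES: return "line plot of value against time";
            case PROFILE:      return "line plot of value against height or depth";
            case GRID_SERIES:  return "sequence of maps";
            case SWATH:        return "map of the sweep footprint";
            case SECTION:      return "contour plot of distance against height or depth";
            default:           throw new IllegalStateException("unknown type: " + this);
        }
    }
}

class FeatureTypeDemo {
    public static void main(String[] args) {
        // Print every feature type with its description and assumed plot style.
        for (CsmlFeatureType t : CsmlFeatureType.values()) {
            System.out.println(t + " (" + t.description() + "): " + t.defaultPlot());
        }
    }
}
```

Because the plot style hangs off the geometry rather than the data source, a plotting tool written this way would not need to know where a feature came from.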
Harmonizing 2 databases using CSML
• Different data providers, different internal representation
– Met Office “MIDAS” dataset
– “Environmental Change Network” dataset
• Modelled both databases as collections of CSML PointSeriesFeatures
• Allowed sharing of plotting and analysis tools
– CSML-XML documents converted to maps, plots and KML
• Intermediate step via XML not necessary in ideal world
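The idea of modelling two differently structured databases as collections of PointSeriesFeatures can be sketched as follows. This is a minimal sketch under stated assumptions: the interface, the two storage layouts and the epoch-seconds time unit are invented for illustration and do not reflect the actual MIDAS or Environmental Change Network schemas, nor the real Java-CSML classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical common view of a timeseries at a point.
interface PointSeriesFeature {
    int size();
    long timeAt(int i);     // time of sample i, in epoch seconds (assumed unit)
    double valueAt(int i);  // measured value of sample i
}

// Provider A (assumed layout): parallel arrays of times and values.
class ArrayBackedSeries implements PointSeriesFeature {
    private final long[] times;
    private final double[] values;

    ArrayBackedSeries(long[] times, double[] values) {
        if (times.length != values.length) {
            throw new IllegalArgumentException("times and values differ in length");
        }
        this.times = times.clone();
        this.values = values.clone();
    }

    public int size() { return values.length; }
    public long timeAt(int i) { return times[i]; }
    public double valueAt(int i) { return values[i]; }
}

// Provider B (assumed layout): an insertion-ordered map from time to value.
class MapBackedSeries implements PointSeriesFeature {
    private final long[] times;
    private final double[] values;

    MapBackedSeries(LinkedHashMap<Long, Double> samples) {
        times = new long[samples.size()];
        values = new double[samples.size()];
        int i = 0;
        for (Map.Entry<Long, Double> e : samples.entrySet()) {
            times[i] = e.getKey();
            values[i] = e.getValue();
            i++;
        }
    }

    public int size() { return values.length; }
    public long timeAt(int i) { return times[i]; }
    public double valueAt(int i) { return values[i]; }
}

// A shared analysis tool, written once against the common view.
class SeriesTools {
    static double mean(PointSeriesFeature f) {
        if (f.size() == 0) {
            throw new IllegalArgumentException("empty series");
        }
        double sum = 0.0;
        for (int i = 0; i < f.size(); i++) {
            sum += f.valueAt(i);
        }
        return sum / f.size();
    }
}

class HarmonizationDemo {
    public static void main(String[] args) {
        PointSeriesFeature a =
            new ArrayBackedSeries(new long[] {0L, 3600L}, new double[] {10.0, 12.0});
        LinkedHashMap<Long, Double> samples = new LinkedHashMap<>();
        samples.put(0L, 9.0);
        samples.put(3600L, 11.0);
        PointSeriesFeature b = new MapBackedSeries(samples);
        // The same routine serves both providers.
        System.out.println(SeriesTools.mean(a));
        System.out.println(SeriesTools.mean(b));
    }
}
```

Here the features are built directly in memory, which corresponds to the slide's remark that the intermediate XML step would not be needed in an ideal world.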
Java-CSML
• Need reusable libraries to apply CSML more widely
• Aim is to reduce cost of developing data-driven applications
• Interoperates with other means of modelling data in Java:
– GeoAPI, Common Data Model
• High-level analysis/visualization routines completely decoupled from low-level data access
Java-CSML: Design attempts
1. Transform CSML’s XML schema to Java code using automated tool
• Led to very deeply-nested code
2. Based upon OGC-sponsored GeoAPI
• Incomprehensible unless very familiar with ISO standards
• GeoAPI is a moving target
3. Based on well-known Java concepts
• Accessible to “typical” Java programmer
• Compatibility with other data models assured through wrappers
• Insulated against inevitable changes to standards
• More code needs to be written by Java-CSML designers
• Less code needs to be written by users
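The third design can be illustrated with a small wrapper that exposes a foreign data structure through a standard java.util.List, so that a typical Java programmer can use the familiar collections API on it. This is only a sketch of the general approach: ForeignArray is a made-up stand-in for a third-party data model, and neither class name comes from Java-CSML, GeoAPI or the Common Data Model.

```java
import java.util.AbstractList;
import java.util.Collections;
import java.util.List;

// Stand-in for a third-party data model whose API may change (hypothetical).
class ForeignArray {
    private final float[] data;

    ForeignArray(float[] data) {
        this.data = data.clone();
    }

    int getLength() { return data.length; }
    float getFloat(int index) { return data[index]; }
}

// Wrapper exposing the foreign object as a read-only List<Double>.
// Only this class would need updating if ForeignArray's API changed.
class ForeignArrayValues extends AbstractList<Double> {
    private final ForeignArray source;

    ForeignArrayValues(ForeignArray source) {
        this.source = source;
    }

    @Override
    public Double get(int index) {
        if (index < 0 || index >= source.getLength()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return (double) source.getFloat(index);
    }

    @Override
    public int size() {
        return source.getLength();
    }
}

class WrapperDemo {
    public static void main(String[] args) {
        List<Double> values =
            new ForeignArrayValues(new ForeignArray(new float[] {1.5f, 4.0f, 2.5f}));
        // Standard collection utilities work without knowledge of ForeignArray.
        System.out.println(Collections.max(values));
        System.out.println(values.contains(2.5));
    }
}
```

The trade-off named on the slide shows up directly: the wrapper is extra code for the library designer, while the user writes nothing beyond ordinary collection calls.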
Java-CSML Application 1: Coastal oceanography decision support system
[Figure: red line: Smartbuoy data; blue dots: model output]
Behind the scenes
Smartbuoys (via Web Feature Service)
Physical model (via NetCDF files)
Biological model (via OPeNDAP server)
Java-CSML wrappers
Java-CSML plotting routines
Java-CSML Application 2: Atmospheric ozone
[Figure panels: control run; assimilation run]
Specializing CSML Features
• A generic data model can’t encode all possible metadata without becoming extremely complex
• In CSML generic feature types can be specialized
– cf. object-oriented inheritance
• Hence core data model retains simplicity
[Diagram: ArgoProfileFeature (adds int qualityFlag) specializes ProfileFeature]
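The specialization shown on this slide maps naturally onto Java inheritance. In the sketch below only the two class names and the int qualityFlag field come from the slide; the constructor arguments, accessor names and helper routines are assumptions, and the meaning of particular flag values is deliberately left to the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Generic feature: a vertical profile at a point (fields are assumed).
class ProfileFeature {
    private final double[] depths;  // vertical coordinate of each sample
    private final double[] values;  // measured value at each depth

    ProfileFeature(double[] depths, double[] values) {
        if (depths.length != values.length) {
            throw new IllegalArgumentException("depths and values differ in length");
        }
        this.depths = depths.clone();
        this.values = values.clone();
    }

    int size() { return values.length; }
    double depthAt(int i) { return depths[i]; }
    double valueAt(int i) { return values[i]; }
}

// Specialized feature: adds Argo-specific metadata without touching the core model.
class ArgoProfileFeature extends ProfileFeature {
    private final int qualityFlag;

    ArgoProfileFeature(double[] depths, double[] values, int qualityFlag) {
        super(depths, values);
        this.qualityFlag = qualityFlag;
    }

    int getQualityFlag() { return qualityFlag; }
}

class ProfileTools {
    // Generic routine: accepts any ProfileFeature, including specializations.
    static double maxValue(ProfileFeature p) {
        if (p.size() == 0) {
            throw new IllegalArgumentException("empty profile");
        }
        double max = p.valueAt(0);
        for (int i = 1; i < p.size(); i++) {
            max = Math.max(max, p.valueAt(i));
        }
        return max;
    }

    // Argo-specific routine: keeps profiles whose flag equals the accepted value.
    static List<ArgoProfileFeature> withFlag(List<ArgoProfileFeature> profiles, int acceptedFlag) {
        List<ArgoProfileFeature> kept = new ArrayList<>();
        for (ArgoProfileFeature p : profiles) {
            if (p.getQualityFlag() == acceptedFlag) {
                kept.add(p);
            }
        }
        return kept;
    }
}
```

Generic tools such as maxValue keep working on Argo profiles unchanged, while code that cares about quality control can use the extra field.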
Java-CSML Application 3: Ocean data assimilation
ArgoProfileFeature, ProfileFeature
[Figure: red lines: Argo data; blue lines: model output]
Summary
• CSML bridges gap between bottom-up (science) and top-down (GIS) approaches to modelling data
– Wraps existing data holdings
• Data modelled as Feature Types distinguished by geometry and “sensible plotting”
– Complexity managed through feature inheritance
• Doesn’t attempt to model everything!
– Other technologies deal with discovery, provenance, security…
• Java-CSML framework allows data intercomparison applications to be built quickly
– Automates tedious and error-prone tasks
Wider lessons
• “Interoperable” data formats not necessarily suitable for storage
– Because no single data model can satisfy every application
– Abstraction usually leads to data loss!
• Trade-offs between scope and complexity
– Don’t attempt to put everything in one specification
• Symbiotic relationship between standards, tools and applications
– Must be developed in parallel