The MashMyData project Combining and comparing environmental science data on the web Alastair Gemmell 1 , Jon Blower 1 , Keith Haines 1 , Stephen Pascoe 2 , Phil Kershaw 2 , Bryan Lawrence 2 , Simon Woodman 3 , Hugo Hiden 3 1. Reading e-Science Centre (ReSC) @ University of Reading 2. Centre for Environmental Data Archival (CEDA) @ British Atmospheric Data Centre 3. School of Computing Science @ University of Newcastle

Post on 20-Jan-2016

The MashMyData project

Combining and comparing environmental science data on the web

Alastair Gemmell 1, Jon Blower 1, Keith Haines 1, Stephen Pascoe 2, Phil Kershaw 2, Bryan Lawrence 2, Simon Woodman 3, Hugo Hiden 3

1. Reading e-Science Centre (ReSC) @ University of Reading

2. Centre for Environmental Data Archival (CEDA) @ British Atmospheric Data Centre

3. School of Computing Science @ University of Newcastle

Outline

• Background to MashMyData

• Motivation

• Challenges

• Interoperability and project architecture

• Current state of the project

• The future of the project

MashMyData Background

• NERC-funded project under the ‘Technology proof of concept’ programme

• Commenced 1st February 2010. Runs until 30th June 2011

• Aiming to present some of our later outputs at EGU 2011

• Here we introduce the project and show its current status and plans for the future.

• Funded partners are Reading e-Science Centre (ReSC) and the Centre for Environmental Data Archival (CEDA)

Motivation

• Environmental scientists use many diverse data sources including:

    • in-situ measurements (e.g. ocean buoys, radiosondes)

    • remotely-sensed data (e.g. satellite, radar)

    • numerical simulations

• However, this results in considerable heterogeneity in data formats and data access methods, and hence in the software needed to work with them

• We want to allow scientists from different disciplines to work across a variety of datasets, regardless of the underlying data formats or access methods

Technical Challenges

• The MashMyData project must overcome a number of challenges to succeed

• These overlap substantially with important challenges in the wider e-Science community

• The solutions will potentially be widely applicable in the future

• Challenges:

    • Dealing with data diversity

    • Performing calculations remotely in a way that scales

    • Accessing secure data, and the delegation problem

    • Enabling traceability and reproducibility

Integrating web services and technologies

• Recent discussion on the gains of re-using existing e-Science

• We have identified a number of existing web services and technologies and integrated them in the MashMyData project:

    • Reading e-Science Centre’s ncWMS/Godiva2 Web Map Service (displaying gridded environmental data)

    • Centre for Environmental Data Archival’s Web Processing Service (number crunching for compute-intensive workflows)

    • Newcastle University’s e-Science Central software (upload, workflows, versioning)

    • University of Liege’s DIVA-on-web service (interpolating geospatial point data)
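
As an illustration of how a client might ask a Web Map Service such as ncWMS for a map image, the sketch below builds a standard WMS 1.3.0 GetMap URL. The parameter names come from the OGC WMS specification; the server address and layer name are hypothetical placeholders, not MashMyData endpoints.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layer, bbox, width=512, height=512,
                     crs="CRS:84", image_format="image/png"):
    """Build a WMS 1.3.0 GetMap URL for a single layer.

    bbox is (min_lon, min_lat, max_lon, max_lat), the axis order used by CRS:84.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty string asks the server for its default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
        "TRANSPARENT": "true",  # lets layers be stacked on top of one another
    }
    return base_url + "?" + urlencode(params)


# Hypothetical server and layer name, for illustration only.
url = build_getmap_url(
    "http://example.org/ncWMS/wms",
    "sst_dataset/sea_surface_temperature",
    (-180, -90, 180, 90),
)
print(url)
```

Requesting transparent images is what allows a viewer to overlay several layers from different datasets on one map.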

Architecture

Current project status

• The first important step was to add multi-dataset capability to the Godiva2 viewing portal.

• As part of this we have added the ability to view in-situ point data as well as gridded data

• This paves the way for mashing up datasets (e.g. producing the average or difference of existing datasets)

• Security is currently being engineered, as is the Web Processing Service (required for mash-up workflows)
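
To make the idea of a mash-up concrete, here is a minimal sketch of the two operations mentioned above, the difference and the average of two datasets. It assumes both datasets have already been placed on the same grid and that missing values are marked with None; the function names and sample values are illustrative only, and in the project such calculations are intended to run remotely rather than in the client.

```python
def combine_grids(grid_a, grid_b, op):
    """Combine two equally-shaped 2-D grids cell by cell.

    A cell is None (missing) in the result if it is missing in either input.
    """
    same_shape = len(grid_a) == len(grid_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(grid_a, grid_b)
    )
    if not same_shape:
        raise ValueError("grids must share the same shape")
    return [
        [None if a is None or b is None else op(a, b)
         for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(grid_a, grid_b)
    ]


def difference(grid_a, grid_b):
    """Cell-wise grid_a minus grid_b."""
    return combine_grids(grid_a, grid_b, lambda a, b: a - b)


def average(grid_a, grid_b):
    """Cell-wise mean of the two grids."""
    return combine_grids(grid_a, grid_b, lambda a, b: (a + b) / 2)


# Made-up sample values, e.g. a model field and an observed field.
model = [[10.0, 12.0], [14.0, None]]
observed = [[9.0, 12.0], [15.0, 16.0]]

print(difference(model, observed))
print(average(model, observed))
```

Real datasets rarely share a grid, so a regridding or interpolation step would normally come first.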

Web Interface

Web Interface

• Click and drag a layer metadata box into a new position

• This alters the layer stacking on the map to enable layers to be moved towards the front or back

• The opacity of the layers can also be modified to reveal those underneath
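
The effect of stacking order and opacity can be sketched with a simple back-to-front blend of one pixel per layer. This illustrates the general compositing idea only and is not the Godiva2 implementation; the layer names, pixel values and helper functions are hypothetical.

```python
def move_layer(layers, name, new_index):
    """Return a new back-to-front layer list with the named layer moved."""
    names = [layer["name"] for layer in layers]
    reordered = list(layers)
    layer = reordered.pop(names.index(name))
    reordered.insert(new_index, layer)
    return reordered


def composite_pixel(layers, background=0.0):
    """Blend one grey-scale pixel per layer, back to front.

    Each layer covers what lies beneath it in proportion to its opacity.
    """
    value = background
    for layer in layers:
        alpha = layer["opacity"]
        value = alpha * layer["pixel"] + (1.0 - alpha) * value
    return value


# Back-to-front order: the last layer in the list is drawn on top.
layers = [
    {"name": "sst", "pixel": 0.2, "opacity": 1.0},
    {"name": "sea_ice", "pixel": 0.8, "opacity": 0.5},
]

print(composite_pixel(layers))  # semi-transparent sea ice over SST
print(composite_pixel(move_layer(layers, "sea_ice", 0)))  # opaque SST on top
```

With the fully opaque layer moved to the front, the layer beneath it no longer contributes, which is why lowering opacity is needed to reveal it.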

Web Interface

Interface with e-Science Central

• User can upload data via the e-Science Central API.

• Thereafter they can view available data sources and workflows

• User can run a chosen workflow on the data of their choice; the workflow then executes in e-Science Central

• This interface with e-Science Central will be invisible to the user: they just know they can upload data, view it and run workflows

• File and workflow versions and metadata are recorded by e-Science Central

Examples of work in progress

Further Work

• Finish integration with CEDA’s WPS (currently works with a simple test process)

• This in turn will pave the way for adding mash-up functionality to the web interface.

• Finish engineering the security solution. This will allow access to secure datasets (e.g. Met Office) for certain authorised users.

• Continue meetings with test case users to ensure that the system meets their needs (so far so good but relatively early days!)

Thanks!

[email protected]

www.mashmydata.org (Not live yet, but currently links to our project page including svn on Google Code)