how to expand the galaxy from genes to earth in six simple steps (and live smarter and longer)

20
How to expand the Galaxy from genes to Earth in six simple steps (and live smarter and longer) Raffaele Montella 1,2 , Alison Brizius 2 , Joshua Elliott 2 , David Kelly 2 , Ravi Madduri 2,3 , Ketan Maheshwari 3 , Cheryl Porter 4 , Peter Vilter 2 , Michael Wilde 2 , Wei Xiong 4 , Meng Zhang 4 and Ian Foster 2,3,5 1 Department of Science and Technologies, University of Naples Parthenope, Naples, ITALY; 2 Computation Institute, Argonne National Laboratory and University of Chicago, Chicago, Illinois, USA; 3 Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, USA; 4 University of Florida, Department of Agricultural and Biological Engineering, Gainsville, Florida, USA; 5 Departmet of Computer Science, University of Chicago, Chicago, Illinois, USA; Department of Science and Technologies University of Naples Parthenope Mathematics and Computer Science Division Department of Agricultural and Biological Engineering ceit -portal.org use faceit.or learn faceit.org

Upload: raffaele-montella

Post on 29-Jul-2015

336 views

Category:

Science


0 download

TRANSCRIPT

Page 1: How to expand the Galaxy from genes to Earth in six simple steps (and live smarter and longer)

How to expand the Galaxy fromgenes to Earth in six simple steps

(and live smarter and longer)

Raffaele Montella1,2, Alison Brizius2, Joshua Elliott2, David Kelly2, Ravi Madduri2,3, Ketan Maheshwari3, Cheryl Porter4, Peter Vilter2, Michael Wilde2, Wei Xiong4, Meng Zhang4 and Ian Foster2,3,5

1Department of Science and Technologies, University of Naples Parthenope, Naples, ITALY;2Computation Institute, Argonne National Laboratory and University of Chicago, Chicago, Illinois, USA;

3Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, USA;4University of Florida, Department of Agricultural and Biological Engineering, Gainsville, Florida, USA;

5Departmet of Computer Science, University of Chicago, Chicago, Illinois, USA;

Department of Science and TechnologiesUniversity of Naples Parthenope

Mathematics andComputer Science

DivisionDepartment of

Agricultural andBiological Engineering

faceit-portal.org usefaceit.orglearnfaceit.org

Page 2: How to expand the Galaxy from genes to Earth in six simple steps (and live smarter and longer)

FACE-IT: A Framework to Advance Climate, Economic, and Impact Investigations with Information Technology

Facing real problems withInformation Technology

No buzzwordReal things!An open playground forthe next generation ofearth system scientists

What’s in a name…

Scie

nce

Gat

eway

s

Dat

a +

Wor

kflow

s =

Resu

lts

The user profile…

ScientistsExperts of their fields

Limited programming skillsComplex experiments

Effective and efficientsolutions to real problemsExperts in design andabstraction

Info

rmati

onTe

chno

logy

Development experts in wizardry…

Built onwidely used Galaxy,

Globus, and Swift systems

faceit-portal.org

Page 3: How to expand the Galaxy from genes to Earth in six simple steps (and live smarter and longer)

Janu

ary

June

Nov

embe

r

Mar

ch

Janu

ary

Febr

uary

FACE-IT: A Framework to Advance Climate, Economic, and Impact Investigations with Information Technology

The Face-IT Galaxytimeline

• Research Prototype

2011 2012 2013 2014 2015

Oct

ober

April

Febr

uary

Oct

ober

GalaxyGalaxy-ESFace-IT

• FACE-ITFramework for Advanced Climate, Economy and Impact Investigation with Information Technology

Nov

embe

r

Genomics and Earth Science communities share the same problems: why don’t create a Galaxy for Earth Science?The first working prototype

with data analysis and workflow support. It could work!

Galaxy-ES exists as a Globus Galaxy Genomics fork.• Earth Science datatypes• Converters• Processors• Visualizers• Remote data browsing• Display Application demo• Aggregated datasets (RAFT)

Galaxy-ES is a pluggable toolshed of Globus Galaxy Genomics. More datatypes, more tools, more working demos.

Galaxy-ES changes its status from research prototype to project in progress. The Face-IT team is finally joined.

First Face-IT developer conference.• NetCDF Schema• Swift based toolsThings are working.

The prototype Globus Galaxy Face-IT instance is launched in the Amazon cloud.

Second Face-IT developer conference.• Globus Online integration• 3rd parts remote data browsing• Advanced visualization

Face-IT at GCE14/SC14Many data sourcesMany applicationsGlobus integrationPeople around the world use it!

Third Face-IT developer conference.• NTCC: No need to touch core code• Datatypes as “proprietary”• New visualizers

Face-IT as AgMIP meeting.• New netcdf datatype with schema• New xml and json datatypes• WMS map visualizer

Page 4: How to expand the Galaxy from genes to Earth in six simple steps (and live smarter and longer)

FACE-IT: A Framework to Advance Climate, Economic, and Impact Investigations with Information Technology

The Hitchhiker’s[Data Analysis] Guide to the Galaxy

Canvas

Tool

Pal

ette

His

tory

Dataset

Peek Area

Page 5: How to expand the Galaxy from genes to Earth in six simple steps (and live smarter and longer)

FACE-IT: A Framework to Advance Climate, Economic, and Impact Investigations with Information Technology

The Hitchhiker’s[Workflow] Guide to the Galaxy

Workflow

Tool

Pal

ette

Tool

Par

amet

ers

Page 6: How to expand the Galaxy from genes to Earth in six simple steps (and live smarter and longer)

FACE-IT: A Framework to Advance Climate, Economic, and Impact Investigations with Information Technology

From genesto Earth

• Datatypes• Tools• Tool parameters• Aggregated datatypes• Data providers• Visualizers

Page 7: How to expand the Galaxy from genes to Earth in six simple steps (and live smarter and longer)

Step ONE:earth system datatypes

CLIMATEAGRICULTURE

ECONOMY

converter

Datatypes

Datasets

Data

file_extmime-type…Metadata…

set_meta()sniff()display_peek()display_data()…

NetCDF

file_ext=“nc”…Metadata: NCMLMetadata: WMS

set_meta()Sniff()display_peek()display_data()…

GCM

file_ext=“gcm.nc”…schema= “GCM.ncxsd”…

FACE-IT: A Framework to Advance Climate, Economic, and Impact Investigations with Information Technology

• Datatype:the kind of data we want to deal with

• Dataset:the actual data we manage as belonging to a datatype

• If you are thinking about classes and instances in the OOP model you are right!

• Implemented as Python classes

Page 8: How to expand the Galaxy from genes to Earth in six simple steps (and live smarter and longer)

FACE-IT: A Framework to Advance Climate, Economic, and Impact Investigations with Information Technology

Intermezzo[primer] NetCDF

• NetCDF:wide-spread file format for multidimensional environmental data

• Supports unstructured, regular and curvilinear grids

• Dimensions, variables and attributes

• Self descriptive• Conventions

• Huge amount of data sources, libraries and tools

Page 9: How to expand the Galaxy from genes to Earth in six simple steps (and live smarter and longer)

FACE-IT: A Framework to Advance Climate, Economic, and Impact Investigations with Information Technology

Intermezzo[Schema] NetCDF

• NetCDF Schema:a brand new way to compare and match different NetCDF files.

• Based on wide spread and stable technologies– XML Schema– NetCDF Markup Language– Regular expressions

• Originally built for NetCDF sniffing in Face-IT Galaxy could be something promising…

NetCDFfile

.ncxsd(NetCDF Schema)

NetCDF Schema Library (Python)

Page 10: How to expand the Galaxy from genes to Earth in six simple steps (and live smarter and longer)

FACE-IT: A Framework to Advance Climate, Economic, and Impact Investigations with Information Technology

Step TWO:new tools

• Tool:Is a computing process fed by one or more datasets producing one or more datasets

• It is wrapped over any kind of executable

• Running by naïve local scheduler, super-computers, virtual machines somewhere in the cloud.

• Each input and output is data typed• It is defined using XML The tools palette

The same toolin a workflow

A tool in data analysis

Page 11: How to expand the Galaxy from genes to Earth in six simple steps (and live smarter and longer)

FACE-IT: A Framework to Advance Climate, Economic, and Impact Investigations with Information Technology

Step TWO:[changing the order of running dimensions] new tools

• The tool executable is run in a scratch directory

• By default input and output datasets are managed “in place”

• Data-typing is strictly enforced

ncpdq -a lat,lon ACCESS1-0.nc4 ACCESS1-0_latlon.nc4

<variable name="fwetpr1_rcp45”shape="decade_rcp month

lon lat”type="float">

…</variable>

<variable name="fwetpr1_rcp45”shape="decade_rcp month

lat lon”type="float">

…</variable>

GCMlatlon

GCM

Executable

Must be in the path

Parameters

Could be defined at runtime

Input dataset

The input filename

Output dataset

The output filename

<tool id="gcm2gcmlatlon" name="GCM to GCM with latlon" version="0.1"> <description>Convert a GCM dataset to a GCMlatlon ready for WMS …</description> <command>ncpdq -a lat,lon $Input $Output</command> <inputs> <param name="Input" type="data" format="gcm.nc" label=”…" /> </inputs> <outputs> <data format="gcm.latlon.nc" name="Output" label=”…" /> </outputs></tool>

Page 12: How to expand the Galaxy from genes to Earth in six simple steps (and live smarter and longer)

FACE-IT: A Framework to Advance Climate, Economic, and Impact Investigations with Information Technology

Step THREE:tool parameters

• Tool parameters:Define the user interface elements for a tool

• Regular tool parameters wrap text fields, radio buttons and drop drown lists.

• Custom tool parameters for Globus Online, OpenDap, date peaking and feature selection of maps.

Page 13: How to expand the Galaxy from genes to Earth in six simple steps (and live smarter and longer)

FACE-IT: A Framework to Advance Climate, Economic, and Impact Investigations with Information Technology

Step FOUR:aggregated datatypes (RAFT*)

• Dataset References:XML based datatype grouping references to different datasets in the same history.

• The regular Galaxy works on single file datasets or composite file datasets.

• Acts as a ‘struct’ or an ‘array’ or a mix of both.

• Supports schemas and translators.

DsRef(EnhancedXML)

Used when:• A tool consumes and/or produces a variable

number of datasets• The tool is implemented using a Swift script

working in parallel

Page 14: How to expand the Galaxy from genes to Earth in six simple steps (and live smarter and longer)

FACE-IT: A Framework to Advance Climate, Economic, and Impact Investigations with Information Technology

Step FIVE:data providers

• Data providers:software components interfacing the datasets with the web browser.

• They provide data as array of JSON objects

• Key/Values, Columnar, custom• Implemented in Datatype classes

Web Browser Galaxy Instance

History Database

Association

Data Providers

Datatype

Data fileDataset

DataProvider

Web page…

…dynamically generated---

…form Mako template

(mix of server side python code with client side web technologies)

request

response

template

Page 15: How to expand the Galaxy from genes to Earth in six simple steps (and live smarter and longer)

FACE-IT: A Framework to Advance Climate, Economic, and Impact Investigations with Information Technology

Step SIX:[GeoJson vector maps] map visualizers

• Visualizers:client-side software components for interactive data visualization

• Quasi-GIS!

• Map:Visualizes vector data produced as GeoJson objects by a data provider

• Wms (World Map Server):Visualizes raster data from NetCDF datatypes. ACMO file visualization

Page 16: How to expand the Galaxy from genes to Earth in six simple steps (and live smarter and longer)

FACE-IT: A Framework to Advance Climate, Economic, and Impact Investigations with Information Technology

Step SIX:[NetCDF & World Map Server] map visualizers

• Wms:World Map Server visualizes raster data from NetCDF datatypes.

• It leverages on an external software.

• Still experimental!

• Steps:– Dataset registration– Data provider interaction– GUI setup– Map consuming NetCDF Datatype

Data Provider

WMS Data Provider

History Database

AssociationWMS Server

modifiedsci-wms

History

Dat

atyp

e

HTMLJavascript

JQueryJQuery-UI

Leaflet JS

WMS Visualizer

set_meta()

wms_urlG

ener

ate

visu

alize

rfr

om M

ako

tem

plat

e

request:provider-wms

response:wms_urlcapabilities…

Page 17: How to expand the Galaxy from genes to Earth in six simple steps (and live smarter and longer)

FACE-IT: A Framework to Advance Climate, Economic, and Impact Investigations with Information Technology

Step SIX:[NetCDF & World Map Server] map visualizers

• Examples:

Sea Surface Temperature

Conturing

Page 18: How to expand the Galaxy from genes to Earth in six simple steps (and live smarter and longer)

FACE-IT: A Framework to Advance Climate, Economic, and Impact Investigations with Information Technology

Conclusions and[now] future works

• Face-IT Galaxy is a creative playground for the next generation of earth scientists

• Propose your application, write your code and share it!

• Spin-off projects: extreme weathersimulations in the Bay of Napoli, IT(UniParthenope)

http://www.learnfaceit.org

Page 19: How to expand the Galaxy from genes to Earth in six simple steps (and live smarter and longer)

FACE-IT: A Framework to Advance Climate, Economic, and Impact Investigations with Information Technology

Conclusions and[live smarter and longer] future works

• Instrumented Smart Cities are a huge source of big data

• Array of Things as a Face-IT Galaxy data source?

• Why not use NetCDF Schema as a search criteria after a crawler has explored the internet hunting

for earth system data?

A po

ssib

le C

I col

labo

ratio

n?

http://arrayofthings.github.io

Page 20: How to expand the Galaxy from genes to Earth in six simple steps (and live smarter and longer)

FACE-IT: A Framework to Advance Climate, Economic, and Impact Investigations with Information Technology

GCE: The 9th Gateway Computing Environments Workshop@SC14

Cite as:

Montella, R., Brizius, A., Elliott, J., Kelly, D., Madduri, R., Maheshwari, K., ... & Foster, I. (2014, November). FACE-IT: a science gateway for food security research. In Proceedings of the 9th Gateway Computing Environments Workshop (pp. 42-46). IEEE Press.