TERN is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy.
Australian Ecosystems Science Cloud
overview
Presentation by Siddeswara Guru, Director, Data Science
Ecosystem science
• The inter-relationships among living organisms, physical features, biochemical processes, natural phenomena, and human activities in ecological communities
• Focusing on terrestrial ecosystems
  – Terrestrial Ecosystem Research Network
  – Atlas of Living Australia
• Data is heterogeneous: a wide variety from different domains
  – Observations (human, in-situ sensors and satellite remote sensing)
  – Variety of scales: spatial and temporal
  – Different data formats used in the community
Data Use
• Conventional data access
  – Need to find data
  – Access via services
  – Copy from source to destination for further use, including for large datasets
Storage and Compute
• Advent of NeCTAR and RDS
  – Researchers are moving data and computation to the cloud
  – Building tools (virtual labs, research tools and platforms)
  – However, easy accessibility of data is still an issue
• Multiple interfaces to search for data
• No clear access mechanism from different nodes
Goal
• Offer an open data platform: harmonised, cloud-enabled data infrastructure for data interoperability with a simplified service model
• Offer compute next to data to minimise data movement
• Make data accessible to different research platforms and virtual labs from a common platform
• Offer a scalable, managed computing environment with access to distributed and data-intensive computation technologies
• Develop a support system for cross-discipline use of data
User Stories
• As an ecosystem science continental-scale gridded data user, I want to query a dataset, perform spatial and temporal sub-setting of the data, and access and use that data from a cloud platform as a local file so that I can work on further analyses.
• As an application developer, I need enough compute and storage for a short period of time to run a distributed, large-scale, data-intensive application so that the output of the analyses is available in a reasonable amount of time.
• As a regular ecology data user, I need an easily accessible cloud compute platform with common tools (RStudio, Jupyter Python, NetCDF viewer, spatial data viewer, CSV file viewer) attached to the TERN ecology and biophysical data collection so that I can build applications for analysis and synthesis.
• As a data-intensive application developer, I need a flexible way to create and access a Hadoop cluster so that I can distribute my computation.
• As a data user, I want easy access to reference datasets with compute resources so that I can use them in my analysis and research work.
• As an ecosystem data user, I want a one-stop shop to search, query and access ecosystem data and use it in my analysis so that I don't have to go through multiple portals to access and use data.
• As an application developer, I want a cloud platform to run my simulation with local access to data so that I don't have to move data around or download it to my desktop.
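The spatial and temporal sub-setting described in the first user story can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `subset_grid`, the toy coordinate values, and the nested-list layout `values[time][lat][lon]` are hypothetical and do not describe the platform's actual interface. A real gridded dataset would normally be read from a format such as NetCDF rather than built in memory.

```python
from datetime import date


def subset_grid(values, times, lats, lons, time_range, lat_range, lon_range):
    """Return the part of values[t][y][x] that falls inside the inclusive ranges.

    Hypothetical helper for illustration only; not part of any TERN or ALA API.
    """
    # Find the indices along each axis whose coordinate lies inside the range.
    t_idx = [i for i, t in enumerate(times) if time_range[0] <= t <= time_range[1]]
    y_idx = [j for j, la in enumerate(lats) if lat_range[0] <= la <= lat_range[1]]
    x_idx = [k for k, lo in enumerate(lons) if lon_range[0] <= lo <= lon_range[1]]
    return {
        "times": [times[i] for i in t_idx],
        "lats": [lats[j] for j in y_idx],
        "lons": [lons[k] for k in x_idx],
        "values": [[[values[i][j][k] for k in x_idx] for j in y_idx] for i in t_idx],
    }


# Toy dataset (made-up numbers): 3 monthly time steps on a 4 x 5 grid.
times = [date(2016, 1, 1), date(2016, 2, 1), date(2016, 3, 1)]
lats = [-10.0, -20.0, -30.0, -40.0]
lons = [115.0, 125.0, 135.0, 145.0, 155.0]
values = [
    [[t * 100 + y * 10 + x for x in range(len(lons))] for y in range(len(lats))]
    for t in range(len(times))
]

# Keep February to March, latitudes -30 to -20, longitudes 125 to 145.
subset = subset_grid(
    values, times, lats, lons,
    time_range=(date(2016, 2, 1), date(2016, 3, 1)),
    lat_range=(-30.0, -20.0),
    lon_range=(125.0, 145.0),
)
print(subset["lats"], subset["lons"], len(subset["times"]))
```

Selecting by coordinate value rather than by array position keeps the request independent of how a particular dataset happens to be ordered, which matters when the same query is run against data from different nodes.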
High-level conceptual Architecture
Current status
• Set up a Technical Advisory Group to advise on the scoping and implementation of the project
• In the first iteration, reference datasets will be made available
  – Remote sensing reference data (fractional cover)
  – Long-term ecological monitoring data
  – Climate variables
• Scoping the mediation layer and overall architecture
• Building a coalition of the willing for partnership and collaboration
Contributions
• NeCTAR – major project sponsor
• TERN, ALA – NCRIS domain projects, partners
• QCIF – implementation partner
• NCI – collaborator, partner