iPlant cyberinfrastructure to support ecological modeling
Presented at the Species Distribution Modeling Group at the American Museum of Natural History
Ramona Walls
May 30, 2014
iPlant Collaborative Vision
Enable life science researchers and educators to use and extend cyberinfrastructure to understand and ultimately predict the complexity of biological systems.
What is cyberinfrastructure?
Data storage, software, HPC, and people
iPlant CI
Hardware: storage and compute
Software: platforms, tools, datasets
People: training, support, expertise
iPlant CI supports synthetic biology
Genotypic: Comparative Genomics, Sequencing & Assembly, Annotation
Phenotypic: Image-based Phenotyping, Molecular Phenotyping, Trait Data
Environmental: Environmental datasets, Climate model products
Tools for inference: Phylogenetics, Ecological Models, Crop Models, Association Studies, Molecular Networks
iPlant is a collaborative virtual organization
iPlant collaborates to enable access to the solutions that work best for you.
OVERVIEW OF IPLANT TOOLS AND SERVICES
iPlant Data Store
Initial 100 GB allocation – TB allocations available
Automatic data backup
Easy upload/download and sharing
The resources you need to share and manage data with your lab, colleagues and community
Atmosphere: Cloud computing for the life sciences
Simple: One-click access to more than 100 virtual machine images
Flexible: Fully customize your software setup
Powerful: Integrated with iPlant computing and data resources
Discovery Environment: Hundreds of bioinformatics apps in an easy-to-use interface
A platform that can run almost any bioinformatics application
Seamlessly integrated with data and high performance computing
User extensible – add your own applications
Bisque: Image analysis, management, and metadata
Secure image storage, analysis, and data management
Integrate existing algorithms or create new ones
Custom visualization and image handling routines and APIs
Agave API: Fully customize iPlant resources
Science-as-a-service platform
Define your own compute and storage resources (local and iPlant)
Build your own app store of scientific codes and workflows
DNA Subway: Educational workflows for genomes, DNA barcoding, and RNA-Seq
Commonly used bioinformatics tools in streamlined workflows
Teach important concepts in biology and bioinformatics
Inquiry-based experiments for novel discovery and publication of data
SUPPORT FOR ECOLOGICAL MODELING
Project Goals
• Provide computational support for scalable:
  – modeling of species’ geographic distributions (SDM)
  – mechanistic eco-physiological modeling
Major limitations in the field of ecological modeling
• Access to data: environmental and organismal
• Access to high performance computing (HPC) tools that can support compute-intensive models
• Model development
iPlant can provide infrastructure to help overcome the first two challenges and partner with the community on the third challenge.
iPlant’s long-term vision for an ecological modeling infrastructure
• Modular access to climate layers
• A query interface for finding and extracting relevant occurrence and trait data for the taxa of interest from iPlant’s Data Commons
• Powerful, flexible modeling tools
• Sophisticated visualization of geospatial data
The initial plan is to provide access to:
• Environmental data
• Organismal locality (occurrence) data
• A high performance computing environment for running models
Environmental Data
• Data layers are often large and difficult to work with, even though the researcher only needs a subset of the layer.
• Web services (e.g., GeoNode.org and GeoServer.org) can be harnessed to allow researchers to work with data layers stored remotely.
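The subsetting idea above can be sketched briefly. GeoServer implements the OGC Web Coverage Service (WCS) standard, and a WCS 2.0.1 GetCoverage request can ask for only a latitude/longitude window of a raster layer instead of the whole file. The sketch below only builds such a request URL. The endpoint, the coverage ID, and the axis labels ("Lat"/"Long") are hypothetical placeholders, not iPlant services; the correct axis labels for a given layer should be read from the server's DescribeCoverage response.

```python
from urllib.parse import urlencode


def build_wcs_subset_url(endpoint, coverage_id, lat_range, lon_range,
                         fmt="image/tiff", lat_axis="Lat", lon_axis="Long"):
    """Build an OGC WCS 2.0.1 GetCoverage request (KVP encoding) that asks
    the server for only the bounding box of interest, not the whole layer.

    endpoint, coverage_id, and the default axis labels are hypothetical
    placeholders; real values come from the server's capabilities.
    """
    (lat_min, lat_max), (lon_min, lon_max) = lat_range, lon_range
    if lat_min >= lat_max or lon_min >= lon_max:
        raise ValueError("each range must be (min, max) with min < max")
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
        # WCS 2.0 takes one 'subset' parameter per axis, so a list of
        # tuples (not a dict) is used to repeat the key.
        ("subset", f"{lat_axis}({lat_min},{lat_max})"),
        ("subset", f"{lon_axis}({lon_min},{lon_max})"),
        ("format", fmt),
    ]
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical server and layer, for illustration only.
    print(build_wcs_subset_url("https://geoserver.example.org/geoserver/wcs",
                               "climate__bio1", (40.0, 45.0), (-80.0, -70.0)))
```

Fetching the returned URL with any HTTP client should retrieve only the requested window, which keeps transfers small when the modeling extent covers a small fraction of a global layer.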
iPlant will make environmental data layers from the following sources available through the Data Commons and GeoServer:
• University Corporation for Atmospheric Research (UCAR)
• Oak Ridge National Laboratory’s Distributed Active Archive Center for Biogeochemical Dynamics (ORNL DAAC)
• NASA Earth Observing System Data and Information System (EOSDIS)
• High-res layers from iPlant collaborators?
Organismal locality (occurrence) data
• For many modeling efforts, users will supply their own list of species’ localities.
• Through the BIEN3 database, iPlant users will also have access to data for North American plants.
  – BIEN3 includes cleaned-up Global Biodiversity Information Facility (GBIF) data.
  – iPlant will provide a query interface for extracting subsets of the BIEN data for use in ecological modeling.
• Some trait data will also be available.
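A query interface of this kind essentially filters records by taxon and extent before modeling. A minimal sketch, assuming occurrence records arrive as Python dictionaries with hypothetical "species", "lon", and "lat" fields (not the actual BIEN schema or the planned iPlant query interface): it keeps one species inside a bounding box, drops records missing coordinates, removes duplicate localities, and writes a three-column species, longitude, latitude CSV similar to the samples file Maxent reads.

```python
import csv
import io


def subset_occurrences(records, species, bbox):
    """Keep records of one species inside bbox = (min_lon, min_lat, max_lon, max_lat).

    Field names ("species", "lon", "lat") are hypothetical, not the BIEN schema.
    Records with missing coordinates and duplicate localities are skipped.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    seen = set()
    kept = []
    for rec in records:
        lon, lat = rec.get("lon"), rec.get("lat")
        if rec.get("species") != species or lon is None or lat is None:
            continue
        if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
            continue
        # Coordinates identical to 6 decimal places count as one locality.
        key = (round(lon, 6), round(lat, 6))
        if key in seen:
            continue
        seen.add(key)
        kept.append((species, lon, lat))
    return kept


def to_samples_csv(rows):
    """Write (species, longitude, latitude) rows as CSV text with a header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["species", "longitude", "latitude"])
    writer.writerows(rows)
    return buf.getvalue()
```

Filtering by extent before deduplicating means duplicates are only counted among localities that will actually enter the model.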
Modeling tools 1
• Initially, iPlant will make an HPC version of Maxent available to users.
• iPlant is investigating the utility of making popular R packages for modeling (biomod2, maxlike, and IPMpack) available through rPlant and wrapR, so that they can run on HPC resources.
Modeling tools 2
• More generally, ecological modeling will be supported through an HPC version of Matlab.
• Because of licensing restrictions, users will initially be restricted to running Matlab models that they build on their own licensed systems.
• Stan and OpenBUGS are being considered to support Bayesian modeling.
Links
• http://www.iplantcollaborative.org/
• contact: rwalls_at_iplantcollaborative.org
Timeline
• Q3 2014:
  – HPC version of Maxent available through iPlant (DE or Atmosphere)
  – Availability of BIEN occurrence data
  – Scope work on query and subsetting services for data layers*
  – Metadata template for environmental layers
• Q4 2014:
  – HPC version of Matlab for running models
  – Query interface for BIEN occurrence data
  – Continue development of query and subsetting services for data layers*
• Q1 2015:
  – Ability to query environmental layers through the Data Commons*
  – Ability to subset environmental layers through iPlant CI (DE, Atmosphere, or API)
  – Species distribution modeling tutorial
*May happen sooner through GeoNode