iplant-highlights-pag2015

The iPlant Collaborative Matt Vaughn Director, Life Sciences Computing Texas Advanced Computing Center [email protected] www.iplantc.org Biology Cyberinfrastructure to Meet the Challenges of Large Datasets

Upload: matthew-vaughn

Post on 14-Jul-2015


Category:

Science



TRANSCRIPT

The iPlant Collaborative

Matt Vaughn, Director, Life Sciences Computing

Texas Advanced Computing Center

[email protected]

www.iplantc.org

Biology Cyberinfrastructure to Meet the Challenges of Large Datasets

What is iPlant?

The iPlant Collaborative is an NSF-sponsored, community-driven organization that builds, operates, and supports extensible and powerful cyberinfrastructure for life sciences.

iPlant Cyberinfrastructure

iPlant Software Products

Name | Description | Target Audience
DNA Subway | Educational interface to genomics topics | Beginning users and educators
Discovery Environment | User-friendly, petascale graphical workbench | Command-line naïve users who need to do scalable bioinformatics
Atmosphere | User-friendly, on-demand cloud computing and persistent services | Users with desktop use cases or complex software environments
Bisque | Platform to facilitate cloud-based exchange and exploration of biological images | Command-line naïve users who need to do image analysis
Spatial Data Infrastructure* | Platform for developing geospatial information systems & deploying spatial data infrastructures | Command-line naïve users who need to work with GIS data
iPlant Science APIs | RESTful interface to all iPlant capabilities | Advanced users, developers, 3rd party infrastructure or service providers
iPlant Data Store | Capacious, scalable, shareable storage | Shared and used by all iPlant users

iPlant Services

• Education, Outreach and Training

• Real-time user support

• Hackathons & workshops

• Extended Collaborative Support

• Powered by iPlant Program

2014-2015 Highlights: Thousand Plant Transcriptomes

• Marker paper [1] out along with several coordinated manuscripts

• 100x increase in green plant gene coverage by GenBank

• Key insights into relationships between land plants and green algae

• Original sequence reads, assemblies & downstream analyses, plus data access APIs and workflows available via iPlant [2]

1. Wickett & Mirarab et al. Proc Natl Acad Sci U S A. 2014 Nov 11;111(45):E4859-68. doi:10.1073/pnas.1323926111

2. Matasci et al. GigaScience 2014, 3:17. doi:10.1186/2047-217X-3-17

2014-2015 Highlights: iMicrobe

iMicrobe Data Commons

• Hurwitz Lab, University of Arizona

• Funded by Gordon & Betty Moore Foundation

• Aim: Make the high-value CAMERA microbial datasets available through an interactive data commons

• Required just two months of development thanks to iPlant cyberinfrastructure

– http://data.imicrobe.us/

• Being replicated to power a viral genomics platform

iPlant offers a powerful toolbox for rapidly developing next-generation community resources

2014-2015 Highlights: iPlant’s Broadening Impact

• Powered by iPlant program

• Foundation for other life sciences projects

• Adoption outside the life sciences


2014-2015 Highlights: Jetstream

• iPlant Atmosphere demonstrated value of user-provisioned cloud

• Partnership: Indiana University, TACC, UArizona, U Chicago, UTSA, Johns Hopkins & Penn State

• NSF ACI #1445604

• Planned availability: January 2016 via XSEDE

• ~50x capacity of iPlant Atmosphere. Same great UI. Innovative new capabilities.

A national science and engineering cloud

What’s Coming Next?

• New high performance tools and workflows

– MAKER-P and a host of assembly and expression workflows

• iPlant Data Commons

– Discoverability, persistence, provenance

• Expanded support for pro users and developers

– APIs, workshops, tutorials, and more

• New capabilities to support Science Communities

– Expanding participation and fostering cooperation

New and Continuing Peer Collaborations

• CoGe – Comparative genomics

• EPIC – CoGe extension to support epigenetics

• iAnimal – 2x USDA AFRI grants for CI

• Galaxy – Hosting usegalaxy.org

• BioExtract Server

• IBP – GCP led

• IRRI/CAS – Resequenced rice varieties

• KBase – DOE’s CI for bioenergy

• transPLANT – Elixir’s CI for plants

• TAIR – Hosting for sustainability

Scientific Achievements through iPlant’s Open Infrastructure

1KP – 1000 Plant Transcriptome Project

• Stored tens of millions of sequence reads with iPlant, all assemblies, plus data access APIs exposing 3+ million compute hours of downstream analysis

• Demonstrates TNRS, tree creation, ortholog clustering, etc.

• Claimed to create a 100-fold increase in plant genes in GenBank

• Dozens of papers out or on the way

Success Stories from our Users

Presenter | Title
David Horvath | Progress in Sequencing the Genome of an Invasive Polyploid Weed (Leafy Spurge)
Joshua Der | A Global Gene Family Classification Resource for Plants and Its Utility for Comparative Genomics, Genome Annotation, and Gene Family Studies
Kranthi Mandadi | Transcriptomic Analyses and Alternative Splicing Landscapes of Brachypodium Infected with Panicum Mosaic Virus
John Duvick | Genome Annotation in the Cloud through XGDBvm Virtual Server Instances Deployed at iPlant
Dong Xu | Soybean Knowledge Base (SoyKB): A Web Resource for Integration of Soybean Translational Genomics and Molecular Breeding
Bonnie Hurwitz | iMicrobe: Advancing Clinical and Environmental Microbial Research using the iPlant Cyberinfrastructure