The iPlant Collaborative Community Cyberinfrastructure for Life Science Roger Barthelson/Uwe Hilgert iPlant / University of Arizona

Upload: cecil-higgins

Post on 26-Dec-2015


TRANSCRIPT

Page 1: The iPlant Collaborative Community Cyberinfrastructure for Life Science Roger Barthelson/Uwe Hilgert iPlant / University of Arizona

The iPlant Collaborative Community Cyberinfrastructure for Life Science

Roger Barthelson/Uwe Hilgert

iPlant / University of Arizona

Page 2

The iPlant Collaborative

Vision

www.iPlantCollaborative.org

Enable life science researchers and educators to use and extend cyberinfrastructure

Page 3

Community-driven organization builds cyberinfrastructure for biological sciences

The iPlant Collaborative

Vision

Page 4

UA

TACC

CSHL

iPlant Collaborative

A virtual organization

Page 5

Biological Cyberinfrastructure

Big Data in Biology

Page 6

Human Genome: $2.7 Billion, 13 Years

Human Genome: $900, 6 Hours

2014: Oxford Nanopore MinION

2003: ABI 3730 Sequencer

The Egalitarian Genome

Next Generation Sequencing 2014
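The two price tags on this slide can be turned into ratios with a few lines of arithmetic. The sketch below uses only the four figures quoted on the slide; the length of a year in hours is an assumption made for the conversion.

```python
# Figures quoted on the slide.
HGP_COST_USD = 2.7e9   # 2003 genome: $2.7 billion
HGP_YEARS = 13         # 2003 genome: 13 years
NGS_COST_USD = 900     # 2014 genome: $900
NGS_HOURS = 6          # 2014 genome: 6 hours

# Assumption: an average year of 365.25 days.
HOURS_PER_YEAR = 365.25 * 24

cost_ratio = HGP_COST_USD / NGS_COST_USD
time_ratio = (HGP_YEARS * HOURS_PER_YEAR) / NGS_HOURS

print(f"Cost fell by a factor of about {cost_ratio:,.0f}")
print(f"Time fell by a factor of about {time_ratio:,.0f}")
```

Under these figures the cost drops by roughly six orders of magnitude and the time by roughly four.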

Page 7

“BGI, based in China, is the world’s largest genomics research institute, with 167 DNA sequencers producing the equivalent of 2,000 human genomes a day.

BGI churns out so much data that it often cannot transmit its results to clients or collaborators over the Internet or other communications lines because that would take weeks. Instead, it sends computer disks containing the data, via FedEx.”

The Big Data Problem

Storage and Analysis
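A rough back-of-the-envelope estimate shows why network transfer could plausibly take "weeks" at this scale. Only the 2,000 genomes per day comes from the quote; the data volume per genome and the link speed below are illustrative assumptions, not figures from the talk.

```python
GENOMES_PER_DAY = 2_000   # from the quote
GB_PER_GENOME = 100       # assumption: raw data per genome, in gigabytes
LINK_GBIT_PER_S = 1.0     # assumption: sustained network throughput

def days_to_transmit(genomes: int, gb_per_genome: float, link_gbit_per_s: float) -> float:
    """Days needed to send `genomes` worth of data over a link of the given speed."""
    total_bits = genomes * gb_per_genome * 1e9 * 8
    seconds = total_bits / (link_gbit_per_s * 1e9)
    return seconds / 86_400

days = days_to_transmit(GENOMES_PER_DAY, GB_PER_GENOME, LINK_GBIT_PER_S)
print(f"One day of output would take about {days:.1f} days to transmit")
```

With these assumed values a single day of sequencing output needs more than two weeks on the wire, so the backlog grows faster than it can be sent. Different assumptions change the number but not the shape of the problem.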

Page 8

Biological Cyberinfrastructure

The Problem of Big Data in Biology

Page 9

Biology’s Other Big Data

Phenomics

Visualization

Page 10

How iPlant CI Enables Discovery

Challenge: Create an easy-to-use platform powerful enough to handle data-intensive biology

Many bioinformatics tools are “off limits” to those without specialized computational backgrounds (“command line”).

• Data Store
• Discovery Environment – 100s of tools/apps
• Atmosphere – Cloud Computing
• Bisque – Image Analysis Environment
• APIs

Page 11

iPlant APIs

Resources

The Biology App Store

Page 12

The iPlant Collaborative

What is cyberinfrastructure?

Manage Data

Share Data

Analyze Data

Scalable, accessible computation: data storage, cloud services, and software tools

Utilize Big Data Tech

Facilitate Collaborations

Connect Resources

Manage Access

Enable science (verifiable, reproducible, tractable)

Page 13

The iPlant Collaborative

What iPlant offers

• Data Management & Storage Resources
• Access to High Performance Computing Resources
• Tool Integration System
• Application Programming Interfaces (APIs)
• Cloud Computing
• Genotype To Phenotype Science Enablement
• Tree of Life Science Enablement
• Image Analysis Platform
• Support for Molecular Breeding Platform (IBP)
• Support for AgMIP
• Others to come...

Page 14

The iPlant Collaborative

What iPlant offers

Page 15

The iPlant Collaborative

What iPlant offers

Page 16

The iPlant Collaborative

What iPlant offers

Page 17

How iPlant CI Enables Discovery

Challenge: Navigate biology’s “data deluge”

HT image data – GBs per day

HT sequence data – TBs per run

Page 18

iPlant Data Store

Texas

Replication

Arizona

Grid Computing

Cloud Computing

HPC

Community

Super Computing

iDrop

WebDAV

FoundationAPI

DE

i-commands

iPlant Data Store

Scalable

Reliable

Redundant

High-Performance

Connected
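Of the access routes named on this slide, i-commands is the command-line one: the iRODS client tools such as `iput` and `iget`. The sketch below only builds the argument list for an upload and prints it; nothing is executed. The local path, the remote collection path, and the thread count are hypothetical placeholders, and the flag choice is one reasonable combination rather than an iPlant recommendation.

```python
import shlex

def build_iput(local_path: str, remote_collection: str, threads: int = 4) -> list[str]:
    """Build (but do not run) an iRODS `iput` command line.

    -r  upload a directory recursively
    -P  report transfer progress
    -K  verify the checksum after transfer
    -N  number of transfer threads
    """
    return ["iput", "-r", "-P", "-K", "-N", str(threads),
            local_path, remote_collection]

# Hypothetical paths, for illustration only.
cmd = build_iput("./reads", "/iplant/home/your_username/reads")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` on a machine where the iRODS client is installed and `iinit` has already been run.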

Page 19

How iPlant CI Enables Discovery

Solution: iPlant Data Store

All data within the same platform: speed and accessibility

• Access your data from multiple iPlant services

• Automatic, redundant data backup between the University of Arizona and the University of Texas

• Multiple ways to share data with collaborators

• Multi-threaded high speed transfers

• Default 100 GB allocation; allocations larger than 1 TB available with justification

Getting 1 GB onto my computer takes...

Source               Time (s)
CD                   320
Berkeley Server      150
External Drive       36*
USB2.0 Flash         30
iPlant Data Store    18*
My Computer          15
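The timings for moving 1 GB are easier to compare as throughput. This sketch converts each time from the slide into megabytes per second, assuming 1 GB means 1000 MB (the slide does not say which convention it uses) and ignoring the unexplained asterisks on two of the entries.

```python
SIZE_MB = 1000  # assumption: 1 GB taken as 1000 MB

# Seconds to get 1 GB onto the presenter's computer, as listed on the slide.
times_s = {
    "CD": 320,
    "Berkeley Server": 150,
    "External Drive": 36,
    "USB2.0 Flash": 30,
    "iPlant Data Store": 18,
    "My Computer": 15,
}

def throughput_mb_per_s(seconds: float, size_mb: float = SIZE_MB) -> float:
    """Average transfer rate in MB/s for a transfer of `size_mb` megabytes."""
    return size_mb / seconds

for source, seconds in times_s.items():
    print(f"{source:<18} {seconds:>4} s  {throughput_mb_per_s(seconds):6.1f} MB/s")
```

On these numbers the Data Store moves data at roughly 56 MB/s, faster than the USB flash drive and the external drive and close to a local copy.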

Page 20

How iPlant CI Enables Discovery

What iPlant data solutions mean for a bovine breeder

“It's kind of like being in that COPD commercial where the weight is lifted off your chest, only with iPlant, we have access to more computational power, so we can get to projects much faster and we can do big projects that our machines may not have allowed us to do previously!

The ability to transport 2TB of data overnight using the iRODS system was particularly helpful because previously, we had been mailing hard drives which is not an optimal solution to sharing big data.”

James Koltes, Iowa State

Page 21

How iPlant CI Enables Discovery

Solution: Discovery Environment

An extensible platform for science

• High-powered computing
• Data sharing/collaboration
• Easy to use interface
• Virtually limitless apps
• Analysis history (provenance)

Page 22

iPlant’s Discovery Environment

Web Interface for Hundreds of Applications

Page 23

(Some) Apps in Discovery Environment (for sequencing studies)

• Sequence Quality Control
– FastQC
– Fastx Toolkit
– Sabre, Scythe, Sickle (paired-end trimming)
– SGA cleanup (paired-end quality trimming)
– Coming soon…

Sequence induction, assessment, and trimming pipeline

Mira contaminant detection and removal

Page 24

(Some) Apps in Discovery Environment (for sequencing studies)

• Genome Assembly
– ABySS
– SOAPdenovo2
– Velvet
– Newbler
– Contig analysis tools (with or without reference sequence for comparison)
– Coming soon…

Minimus2

Mira

PacBioToCA or PBJelly?

Page 25

(Some) Apps in Discovery Environment (for sequencing studies)

• Transcript assembly/RNASeq
– Tophat, Cufflinks, Cuffmerge, CuffDiff
– Oases
– Trinity
– Newbler
– Scarf
– Coming soon…

Open pipeline for transcript expression analysis (quantitative RNASeq)

Mira transcriptome assembly

Page 26

The iPlant Collaborative

What iPlant offers

Page 27

The iPlant Collaborative

What iPlant offers

Page 28

How iPlant CI Enables Discovery

What the Discovery Environment means to bench biologists

“In one week I was able to align my RNA-Seq samples using a method that previously took me a month on my bioinformatics computers…

Being able to access my data any time and from anywhere – priceless.

The DE interface is intuitive and easy to use...[and] will allow greater continuity and comparability between different experiments from different laboratories.”

Richard Barker – Univ. Wisconsin, Madison

Page 29

How iPlant CI Enables Discovery

Challenge: Collaborate and access software on demand

Frustrated bioinformaticians serving the needs of several users

+ works well / powerful

- expensive / complex

Cartoon: http://phdhumor.blogspot.com/2008/12/on-lazy-day-for-bioinformatician.html

Page 30

How iPlant CI Enables Discovery

iPlant Solution: Atmosphere

On-demand computing resource built on a cloud infrastructure

• Virtual Machine pre-configured with: software, memory requirements, processing power

• iPlant authentication, storage, and HPC capabilities

• Build custom images/appliances and share with community

• Cross-platform desktop access to GUI applications in the cloud (using VNC)

Page 31

Atmosphere: Your Cloud, Your Way

Google Cloud

Atmosphere

Page 32

Atmosphere

Select a Machine Image, Launch

Page 33

How iPlant CI Enables Discovery

What Atmosphere means to bioinformaticians

“What my users used to call me for, they now do on their own through Atmosphere. Now I can scale up my user community”

Nathan Miller, Univ. Wisconsin, Madison

• BLAST 400k transcripts against NCBI nr in 36 h vs. 2 months

• Use iPlant Data Store to move 1500 high-res images per day for analysis

“iPlant is a great equalizer.” Mike Covington, UC Davis
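The BLAST figure on this slide can be read as a speedup factor. The sketch below converts "2 months" to hours under the assumption of 30-day months, which the slide does not specify, and divides by the 36 hours quoted.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption: the slide only says "2 months"

def speedup(before_hours: float, after_hours: float) -> float:
    """How many times faster the new run is than the old one."""
    return before_hours / after_hours

blast_before_h = 2 * DAYS_PER_MONTH * HOURS_PER_DAY  # 2 months, in hours
blast_after_h = 36                                   # from the slide

blast_speedup = speedup(blast_before_h, blast_after_h)
print(f"BLAST of 400k transcripts ran about {blast_speedup:.0f}x faster")
```

Under that assumption the run is about 40 times faster; the exact factor shifts slightly with the month length chosen.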

Page 34

The iPlant Collaborative

Your colleagues

Leadership Team: Steve Goff – UA, Dan Stanzione – TACC, Matt Vaughn – TACC, Nirav Merchant – UA, Eric Lyons – UA, Doreen Ware – CSHL

Michael Schatz – CSHL, David Micklos – CSHL, Ann Stapleton – UNCW, Ron Vetter – UNCW

Staff: Greg Abram, Sonali Aditya, Ritu Arora, Roger Barthelson, Rob Bovill, Brad Boyle, Gordon Burleigh, John Cazes, Mike Conway, Victor Cordero, Rion Dooley, Aaron Dubrow, Andy Edmonds, Dmitry Fedorov, John Fonner, Melyssa Fratkin, Michael Gatto, Utkarsh Gaur, Cornel Ghiban, Steve Gregory, Mathew Helmke, Natalie Henriques, Uwe Hilgert, Nicole Hopkins, Logan Johnson, Chris Jordan, Kathleen Kennedy, Mohammed Khalfan, David Knapp, Lars Koersterk, Sangeeta Kuchimanchi, Kristian Kvilekval, Sue Lauter, Tina Lee, Zhenyuan Lu, Eric Lyons, Aaron Marcuse-Kubitz, Naim Matasci, Robert McLay, Nathan Miller, Steve Mock, Martha Narro, Shannon Oliver, Benoit Parmentier, Jmatt Peterson, Dennis Roberts, Paul Sarando, Jerry Schneider, Edwin Skidmore, Brandon Smith, Mary Margaret Sprinkle, Sriram Srinivasan, Josh Stein, Lisa Stillwell, Jonathan Strootman, Peter Van Buren, Hans Vasquez-Gross, Rebeka Villarreal, Ramona Walls, Liya Wang, Anton Westveld, Jason Williams, John Wregglesworth, Weijia Xu

Faculty Advisors & Collaborators: Ali Akoglu, Kobus Barnard, Volker Brendel, Timothy Clausner, Sally Elgin, Brian Enquist, Damian Gessler, Ruth Grene, John Hartman, Matthew Hudson, David Lowenthal, B.S. Manjunath, David Neale, Brian O’Meara, Sudha Ram, David Salt, Mark Schildhauer, Neelima Sinha, Doug Soltis, Pam Soltis, Edgar Spalding, Alexis Stamatakis, Rick Stevens, James Taylor, Brett Tyler, Steve Welch

Students: Peter Bailey, Jeremy Beaulieu, Devi Bhattacharya, Storme Briscoe, YaDi Chen, David Choi, Barbara Dobrin, John Donoghue, Yekatarina Khartianova, Chris La Rose, Amgad Madkour, Aniruddha Marathe, Andre Mercer, Kurt Michaels, Zack Pierce, Andrew Predoehl, Sathee Ravindranath, Kyle Simek, Gregory Striemer, Jason Vandeventer, Nicholas Woodward, Kuan Yang

Postdocs: Barbara Banbury, Christos Noutsos, Solon Pissis, Brad Ruhfel

Page 35

Connect with iPlant!

Twitter: @iPlantCollab #iPlant

Facebook: facebook.com/iPlantCollab

LinkedIn: iplant.co/iPlantCollabLinkedIn

Google+: iplant.com/iPlantGooglePlus