GMOD in the Cloud. Genome Informatics, November 3, 2011. Scott Cain, GMOD Project Coordinator, Ontario Institute for Cancer Research, scott@scottcain.net

Page 1: GMOD in the Cloud

Genome Informatics, November 3, 2011

Scott Cain, GMOD Project Coordinator, Ontario Institute for Cancer Research, scott@scottcain.net

Page 2: Introduction: GMOD is …

• A set of interoperable open-source software components for visualizing, annotating, and managing biological data.

• An active community of developers and users asking diverse questions, and facing common challenges, with their biological data.

Page 3: Who uses GMOD?

Plus hundreds of others

Page 4: GMOD in the Cloud

What GMOD in the cloud isn't:

Clouds

Guy getting blown up

Garry's MOD (aka gmod.com)

Page 5: Several GMOD Cloud Projects

Galaxy - Web-based platform for data intensive biomedical research

CloVR - Automated and portable sequence analysis

GBrowse2 - Web-based, scalable genome browser

cloud.gmod.org - Several integrated GMOD tools

http://gmod.org/wiki/Cloud

Page 6: Galaxy Cloudman

Get Galaxy without the data or usage limitations.

Combine with Cloud BioLinux to have access to MANY tools.

Create an analysis cluster in minutes.

Use autoscaling to get good performance at low cost.

http://wiki.g2.bx.psu.edu/Admin/Cloud

Page 7: Deploying Galaxy cluster on AWS

[Four numbered screenshots, steps 1 to 4; the images are not preserved in this transcript.]

Page 8: Exercising elasticity with autoscaling

Fixed cluster size, 5 nodes: computation time 9 hrs, computation cost $20

Fixed cluster size, 20 nodes: computation time 6 hrs, computation cost $50

Dynamic cluster size, 1 to 16 nodes: computation time 6 hrs, computation cost $20
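
The trade-off on this slide can be illustrated with a small sketch. The Python below is a toy model, not CloudMan's actual scaling logic: the hourly job counts, the per-node throughput, and the function names are all hypothetical, chosen only to show why a cluster that grows and shrinks with the queue can use fewer node-hours than a fixed cluster sized for the peak.

```python
import math


def nodes_needed(pending_jobs, jobs_per_node, min_nodes, max_nodes):
    """Pick a cluster size for the current queue, clamped to [min_nodes, max_nodes]."""
    wanted = math.ceil(pending_jobs / jobs_per_node)
    return max(min_nodes, min(max_nodes, wanted))


def node_hours_fixed(hourly_jobs, nodes):
    """A fixed cluster pays for every node in every hour."""
    return nodes * len(hourly_jobs)


def node_hours_dynamic(hourly_jobs, jobs_per_node, min_nodes, max_nodes):
    """A dynamic cluster pays only for the nodes the queue calls for each hour."""
    return sum(
        nodes_needed(jobs, jobs_per_node, min_nodes, max_nodes)
        for jobs in hourly_jobs
    )


# Hypothetical workload: jobs waiting at the start of each of six hours.
hourly_jobs = [4, 40, 64, 64, 20, 4]
JOBS_PER_NODE = 4  # assumed number of jobs one node finishes per hour

fixed = node_hours_fixed(hourly_jobs, nodes=16)
dynamic = node_hours_dynamic(hourly_jobs, JOBS_PER_NODE, min_nodes=1, max_nodes=16)
print(f"fixed 16 nodes: {fixed} node-hours")
print(f"dynamic 1 to 16 nodes: {dynamic} node-hours")
```

In this made-up workload both clusters finish in the same six hours, but the dynamic one is billed for roughly half the node-hours because it runs at full size only during the two peak hours.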

Page 9: CloVR

Cloud Virtual Resource.

Automated pipeline for sequence analysis.

Uses 2 GMOD tools: Workflow and Ergatis.

Use a virtual machine locally to interact with resources in the cloud.

http://clovr.org/

Page 10: CloVR Architecture

Page 11: Why the virtual machine?

The pipeline is run from the local machine, while the heavy lifting is done on the cloud/cluster.

Page 12: GBrowse2

Installed and configured recent release of GBrowse2.

Tools to allow automatically adding rendering servers.

Ability to add standard data sets.

http://gmod.org/wiki/GBrowse

Page 13: GBrowse2

[Diagram: GBrowse2 in the Cloud. Labels: Yeast, Fly, Worm, Human; Amazon Snapshots; Render Slaves; Master.]

Page 15: cloud.gmod.org

GMOD tools preinstalled:

Tripal: Drupal-based web frontend

Chado: Generic organism DB schema

GBrowse: Venerable genome browser

JBrowse: Fast, AJAX genome browser

Sample data: Saccharomyces cerevisiae

Can be run as a micro machine (albeit slowly)
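
Chado, listed above as the generic organism schema, keeps features, organisms, and controlled-vocabulary terms in separate tables that are joined at query time. The sketch below illustrates that layout with Python's built-in sqlite3 module. It is a heavily reduced stand-in: a real Chado database runs on PostgreSQL, and its `organism`, `cvterm`, and `feature` tables carry many more columns and constraints. The rows inserted here are illustrative sample values, not data taken from the slides.

```python
import sqlite3

# Reduced stand-in for three Chado tables (real Chado runs on PostgreSQL
# and defines many more columns; only the ones used here are kept).
SCHEMA = """
CREATE TABLE organism (organism_id INTEGER PRIMARY KEY, genus TEXT, species TEXT);
CREATE TABLE cvterm   (cvterm_id   INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature  (feature_id  INTEGER PRIMARY KEY,
                       organism_id INTEGER REFERENCES organism(organism_id),
                       type_id     INTEGER REFERENCES cvterm(cvterm_id),
                       uniquename  TEXT,
                       name        TEXT);
"""


def build_demo_db():
    """Create an in-memory database with a few illustrative yeast rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO organism VALUES (1, 'Saccharomyces', 'cerevisiae')")
    conn.executemany("INSERT INTO cvterm VALUES (?, ?)",
                     [(1, "gene"), (2, "chromosome")])
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?, ?, ?)", [
        (1, 1, 2, "chrI", "chrI"),
        (2, 1, 1, "YAL001C", "TFC3"),
        (3, 1, 1, "YAL002W", "VPS8"),
    ])
    return conn


def features_of_type(conn, genus, species, type_name):
    """Return (uniquename, name) pairs for one organism and one feature type."""
    sql = """
        SELECT f.uniquename, f.name
        FROM feature f
        JOIN organism o ON o.organism_id = f.organism_id
        JOIN cvterm   t ON t.cvterm_id   = f.type_id
        WHERE o.genus = ? AND o.species = ? AND t.name = ?
        ORDER BY f.uniquename
    """
    return conn.execute(sql, (genus, species, type_name)).fetchall()


conn = build_demo_db()
genes = features_of_type(conn, "Saccharomyces", "cerevisiae", "gene")
print(genes)
```

The feature type is looked up through `cvterm` rather than stored as a string on the feature row, which is the pattern the Tripal modules and the genome browsers rely on when they read from Chado.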

Page 16: A little more on Tripal

Based on the popular CMS Drupal.

Several modules written to serve as an interface for Chado:

Controlled Vocabularies

Features

Analyses

Libraries

Stocks

Integrated job management

Page 20: Potential use case for Cloud GMOD

Community annotation:

Just add a web-start Apollo and set the security group to allow it to connect to the database.

When WebApollo is ready, it's even easier: WA is an add-on to JBrowse that allows collaborative editing.

Tripal and Drupal allow editing of most data types in Chado, and commenting on pages similar to a blog.
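
Opening the database to an Apollo client comes down to one inbound rule on the instance's security group. The sketch below builds such a rule in Python. It assumes the Chado database is PostgreSQL listening on its default port 5432, and it uses the boto3 `authorize_security_group_ingress` call, a library that postdates this talk. The group ID and the address range are placeholders, not values from the slides.

```python
import ipaddress

POSTGRES_PORT = 5432  # PostgreSQL default; assumed to be the Chado database port


def build_db_ingress_rule(group_id, client_cidr, port=POSTGRES_PORT):
    """Build keyword arguments for an inbound TCP rule limited to one address range."""
    # Validate the CIDR early so a typo fails here rather than at the AWS API.
    network = ipaddress.ip_network(client_cidr, strict=True)
    if network.prefixlen == 0:
        raise ValueError("refusing to open the database to every address")
    return {
        "GroupId": group_id,
        "IpPermissions": [{
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": str(network),
                          "Description": "Apollo annotators"}],
        }],
    }


def apply_rule(rule):
    """Send the rule to AWS. Needs boto3 plus configured credentials and region."""
    import boto3  # imported lazily so the builder works without boto3 installed
    boto3.client("ec2").authorize_security_group_ingress(**rule)


# Placeholder values for illustration only; apply_rule(rule) is not called here.
rule = build_db_ingress_rule("sg-0123456789abcdef0", "192.0.2.0/24")
print(rule["IpPermissions"][0]["IpRanges"][0]["CidrIp"])
```

Restricting the rule to the annotators' own address range, rather than opening the port to everyone, keeps the database reachable by Apollo without exposing it to the whole internet.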

Page 21: Why use the cloud?

Avoid installation-related issues (saves you time and frustration!)

Save money (how much, of course, depends)

Availability of common genomic data sets (several projects already make these available at AWS)

Page 22: Future work

Make the GBrowse2 AMI public (very soon)

Add Apollo to cloud.gmod.org (relatively soon)

Add WebApollo to cloud.gmod.org (as soon as it's released)

Page 23: Conclusion

http://gmod.org/wiki/Cloud for more information on GMOD work in the cloud.

http://cloud.gmod.org/ for a running example of cloud.gmod.org.

http://clovr.org/ for more info on CloVR and to download the client VM.

http://getgalaxy.org/ for more information on getting Cloudman.

Page 24: Acknowledgements

• Funding agencies: NIH, USDA ARS, NSF, Ontario Ministry of Economic Development and Innovation

• Lincoln Stein, Chris Vandevelde

• Enis Afgan and the Galaxy Team

• Sam Angiuoli et al. at UofM SOM

• Stephen Ficklin and the Tripal group

• Mitch Skinner and JBrowse developers

• The rest of the GMOD community