Sequence Services Phase 2--Eagle Genomics and Cycle Computing

Sequence Services Phase 2, Pistoia Alliance AGM, Boston MA, April 24th 2012

Upload: pistoia-alliance

Post on 11-May-2015

DESCRIPTION

William Spooner (Eagle) and Carl Chesal (Cycle) introduce the proof of concept provided by this consortium for Phase 2 of the Pistoia Alliance Sequence Services project. The presentation was delivered at the Pistoia Alliance Conference in Boston, MA, on April 24, 2012.

TRANSCRIPT

Page 1: Sequence Services Phase 2--Eagle Genomics and Cycle Computing

Sequence Services Phase 2

Pistoia Alliance AGM, Boston MA, April 24th 2012

Page 2: Open Innovation

• Nurture: Build trust, shared language
• Collaborate: Enterprise, Academia, Government, Foundations
• Explore: Work together to find a common purpose
• Exploit: Turn ideas into tangible benefits

2/ ElasticAP, Pistoia Alliance Conference, Boston MA, 24th April 2012

Page 3: The Requirements

Share

FUNCTIONAL

Login and workspace

Manage users

Manage data

Upload private data

Access public data

Export

Delete/archive

Manage applications

Upload scripts/pipelines

Analyse data

Monitor use/performance

NON-FUNCTIONAL

Charging Model

Service Support

Operational Requirements

Security Requirements

Page 4: The Partnership

Cycle Computing

Established: 2005
Domain: High performance computing
Employees: 18, 16 engineers
Location: Across USA/Canada
Sectors: Pharmaceutical, biotechnology, financial, computer gaming, engineering, academia.
Customers: North America, Europe
Partnerships: Schrödinger, VMware, Canonical

Eagle Genomics

Established: 2008
Domain: Operational bioinformatics
Employees: 12, 9 engineers, pool of external consultants
Location: Cambridge, UK
Sectors: Pharmaceutical, biotechnology, agri-biotechnology, consumer goods, food, other life sciences.
Customers: North America, Europe, Asia
Partnerships: Amazon Web Services, Cognizant, European Bioinformatics Institute, University of Manchester, John Innes Centre

Page 5: The Platform

The platform for storage, analysis and sharing of life sciences data in the cloud

Page 6: The Proposal

[Workflow diagram. Steps shown: Start, Upload, Stored data, Analyses (pipeline process, manual process), Stored data, Share, Stop. Roles shown: Depositor, Collaborator, Bioinformatician, CIO, Biologist.]

Page 7: The Architecture

[Architecture diagram. Labels shown: Amazon EC2 Cloud; Gateway Shibboleth; Web Server CycleCloud; Web Server SEEK; MySQL Assets DB; OpenAM IdP; Customer Single Sign On; SAML Token Exchange; SAML Authenticate; HTTPS Web; Encrypt/Decrypt; S3 Storage (Data Files, Customer Sandbox); Customer Sandbox EC2/AMIs; Condor; Ensembl; BioLinux. Roles shown: Bioinformatician, Collaborator, Depositor.]

Page 8: The Present

[Figure. Roles shown: Bioinformatician, Depositor, Collaborator.]

Page 9: The 1000 Genomes

• A Deep Catalogue of Human Variation
  – Freely available on AWS
  – 1,700 individuals
  – 200 TB data
  – 10,000s of data files
  – Almost no metadata!

• ElasticAP evaluating 1000 Genomes Project Pilot 2
  – 20X resequencing
  – 2 trios (6 individuals)
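To give a feel for the scale behind "20X resequencing" of 6 individuals, the arithmetic can be sketched as below. This is a rough illustration only: the genome size of about 3.1 billion base pairs and the 100 bp read length are assumptions made for the example, not figures from the slides, and the function names are hypothetical.

```python
import math

# Assumption for this sketch: approximate haploid human genome size in base pairs.
HUMAN_GENOME_BP = 3_100_000_000


def bases_required(coverage: int, individuals: int,
                   genome_bp: int = HUMAN_GENOME_BP) -> int:
    """Total sequenced bases needed for the given mean coverage per individual."""
    return coverage * genome_bp * individuals


def reads_required(total_bases: int, read_length_bp: int) -> int:
    """Number of reads of a fixed (hypothetical) length that supply total_bases."""
    return math.ceil(total_bases / read_length_bp)


if __name__ == "__main__":
    total = bases_required(coverage=20, individuals=6)
    print(f"Bases for 20X across 6 individuals: {total:,}")
    # The read length of 100 bp is illustrative only.
    print(f"Reads at 100 bp: {reads_required(total, 100):,}")
```

Under these assumptions the pilot corresponds to a few hundred billion sequenced bases, which is small next to the 200 TB of the full catalogue.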

Page 10: TRUP: Tumor RNA-seq Unified Pipeline

• Collaboration between
  – Max Planck Institute for Molecular Genetics
  – Bayer Pharma AG

• Identifies gene fusion events in tumor samples

• Involves both alignment and de novo sequencing steps

• Pipeline is being implemented on ElasticAP
  – Using public GEO datasets for validation

Page 11: The PoC

FUNCTIONAL

• Login and workspace
• Load data
• Manage public data
• Load scripts and pipelines
• Analyse data
• Export data
• Archive data
• Manage applications
• Manage users
• Monitor use/performance

NON-FUNCTIONAL

• Charging Model
• Service Support
• Operational Requirements
• Security Requirements

KEY: Fully implemented, Partially implemented, To-do list

Page 12: The Prior Art

• Eagle have been building analysis pipelines and hosting secure cloud apps for years.

• Cycle have been developing HPC solutions and deploying them on the cloud for years.

• We built this as a platform we could use ourselves in order to carry on delivering what we already do.

• But now the results are interactive, and everyone can share and participate.

• The most common tasks won’t need to involve us at all.

Page 13: The Price

• AWS-style pay-as-you-go business model
  – Free sign-up and account creation.
  – Tiered applications by the hour.
  – Discounts for up-front reservation fee.
  – Offline data import/export also available.
  – Flat-rate data by the gigabyte-month.
  – Backup data by the gigabyte-month.
  – Monthly billing.
  – Support contracts available.

• Customisation and new pipelines at Eagle/Cycle standard consulting rates.
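As a rough illustration of how a pay-as-you-go bill of this shape adds up, here is a minimal sketch. The slides give no prices, so every rate below is a hypothetical placeholder, a single application tier is assumed, and the reservation discount is modelled simply as a fraction taken off the hourly charge (the up-front fee itself is not included).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical prices for illustration; the presentation states none."""
    app_hour: float = 1.00          # per application-hour (one tier assumed)
    storage_gb_month: float = 0.10  # flat-rate data, per gigabyte-month
    backup_gb_month: float = 0.05   # backup data, per gigabyte-month


def monthly_bill(app_hours: float, stored_gb: float, backup_gb: float,
                 rates: Rates = Rates(),
                 reservation_discount: float = 0.0) -> float:
    """Estimate one month's charge: compute hours plus storage plus backup."""
    if not 0.0 <= reservation_discount < 1.0:
        raise ValueError("reservation_discount must be in [0, 1)")
    compute = app_hours * rates.app_hour * (1.0 - reservation_discount)
    storage = stored_gb * rates.storage_gb_month
    backup = backup_gb * rates.backup_gb_month
    return round(compute + storage + backup, 2)


if __name__ == "__main__":
    # 100 application-hours, 500 GB stored, 200 GB backed up.
    print(monthly_bill(100, 500, 200))
    # The same usage with an assumed 25% reservation discount on compute.
    print(monthly_bill(100, 500, 200, reservation_discount=0.25))
```

Offline import/export, support contracts and consulting work would be separate line items and are left out of this sketch.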


Page 14: The Plan

• Early access to preferred partner customers in July
  – Talk to us now if you’d like to be part of that.

• Full production in September with all partial/to-do items implemented.

• Increased number of public datasets.

• Increased range of applications and pipelines.

• User interface improvements based on feedback from early access period.

Page 15: The Potential

• Available as customisation projects:
  – Conversions to other clouds.
  – Conversions to run on in-house infrastructure.

• Truly secure and scalable R&D collaboration environment.
  – Applicable to all sciences, not just genomics.

Change the way you do science