Sequence Services Phase 2: Eagle Genomics and Cycle Computing
DESCRIPTION
William Spooner (Eagle) and Carl Chesal (Cycle) introduce the proof of concept provided by this consortium for Phase 2 of the Pistoia Alliance Sequence Services project. The presentation was delivered at the Pistoia Alliance Conference in Boston, MA, on April 24, 2012.
TRANSCRIPT
Sequence Services Phase 2
Pistoia Alliance AGM, Boston MA, April 24th 2012
Open Innovation
• Nurture: Build trust, shared language
• Collaborate: Enterprise, Academia, Government, Foundations
• Explore: Work together to find a common purpose
• Exploit: Turn ideas into tangible benefits
2/ ElasticAP, Pistoia Alliance Conference, Boston MA, 24th April 2012
The Requirements
FUNCTIONAL
Login and workspace
Manage users
Manage data
Upload private data
Access public data
Export
Delete/archive
Manage applications
Upload scripts/pipelines
Analyse data
Monitor use/performance
NON-FUNCTIONAL
Charging Model
Service Support
Operational Requirements
Security Requirements
The Partnership
Cycle Computing
Established: 2005
Domain: High performance computing
Employees: 18 (16 engineers)
Location: Across USA/Canada
Sectors: Pharmaceutical, biotechnology, financial, computer gaming, engineering, academia
Customers: North America, Europe
Partnerships: Schrödinger, VMware, Canonical

Eagle Genomics
Established: 2008
Domain: Operational bioinformatics
Employees: 12 (9 engineers), plus a pool of external consultants
Location: Cambridge, UK
Sectors: Pharmaceutical, biotechnology, agri-biotechnology, consumer goods, food, other life sciences
Customers: North America, Europe, Asia
Partnerships: Amazon Web Services, Cognizant, European Bioinformatics Institute, University of Manchester, John Innes Centre
The Platform
The platform for storage, analysis and sharing of life sciences data in the cloud
The Proposal
[Workflow diagram: Analyses]
Steps shown: Start, Upload, Pipeline process, Manual process, Stored data, Share, Stop
Roles shown: Depositor, Collaborator, Bioinformatician, CIO, Biologist
The Architecture
[Architecture diagram. Labels recovered from the figure:]
• Users: Depositor, Collaborator, Bioinformatician
• Customer Single Sign On: OpenAM IdP
• Amazon EC2 Cloud
• Gateway: Shibboleth
• Web Server: CycleCloud
• Web Server: SEEK
• MySQL Assets DB
• S3 Storage: Customer Sandbox, Data Files, Encrypt/Decrypt
• EC2/AMIs: Customer Sandbox, Condor, Ensembl, BioLinux
• Connections: HTTPS/Web, SAML Token Exchange, SAML Authenticate
The Present
Roles shown: Bioinformatician, Depositor, Collaborator
The 1000 Genomes
• A Deep Catalogue of Human Variation
– Freely available on AWS
– 1,700 individuals
– 200 TB data
– 10,000s of data files
– Almost no metadata!
• ElasticAP evaluating 1000 Genomes Project Pilot 2
– 20X resequencing
– 2 trios (6 individuals)
TRUP: Tumor RNA-seq Unified Pipeline
• Collaboration between
– Max Planck Institute for Molecular Genetics
– Bayer Pharma AG
• Identifies gene fusion events in tumor samples
• Involves both alignment and de-novo sequencing steps
• Pipeline is being implemented on ElasticAP
– Using public GEO datasets for validation
The PoC
FUNCTIONAL
Login and workspace
Load data
Manage public data
Load scripts and pipelines
Analyse data
Export data
Archive data
Manage applications
Manage users
Monitor use/performance
NON-FUNCTIONAL
Charging Model
Service Support
Operational Requirements
Security Requirements
KEY: Fully implemented, Partially implemented, To-do list
The Prior Art
• Eagle have been building analysis pipelines and hosting secure cloud apps for years.
• Cycle have been developing HPC solutions and deploying them on the cloud for years.
• We built this as a platform we could use ourselves in order to carry on delivering what we already do.
• But now the results are interactive, and everyone can share and participate.
• The most common tasks won’t need to involve us at all.
The Price
• AWS-style pay as you go business model
– Free sign-up and account creation.
– Tiered applications by the hour.
– Discounts for up-front reservation fee.
– Offline data import/export also available.
– Flat-rate data by the gigabyte-month.
– Backup data by the gigabyte-month.
– Monthly billing.
– Support contracts available.
• Customisation and new pipelines at Eagle/Cycle standard consulting rates.
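As a rough illustration of how the billing components above could combine into one monthly charge, here is a minimal Python sketch. Every rate, tier name and the 25% discount is a hypothetical placeholder, since the slides quote no prices. The sketch also assumes that the reservation discount applies to application hours only and that the up-front fee itself is billed separately.

```python
from dataclasses import dataclass


# All rates are hypothetical placeholders, not ElasticAP prices.
@dataclass(frozen=True)
class Rates:
    app_hourly: dict          # tier name -> price per application-hour
    storage_gb_month: float   # flat rate per gigabyte-month of stored data
    backup_gb_month: float    # rate per gigabyte-month of backup data
    reserved_discount: float  # assumed fractional discount on application hours


def monthly_bill(rates, app_hours, stored_gb, backup_gb, reserved=False):
    """Return one month's charge: application hours plus storage plus backup."""
    compute = sum(rates.app_hourly[tier] * hours for tier, hours in app_hours.items())
    if reserved:
        # Assumption: the discount applies to application hours only.
        compute *= 1.0 - rates.reserved_discount
    storage = stored_gb * rates.storage_gb_month
    backup = backup_gb * rates.backup_gb_month
    return round(compute + storage + backup, 2)


rates = Rates(
    app_hourly={"basic": 0.50, "premium": 2.00},
    storage_gb_month=0.10,
    backup_gb_month=0.05,
    reserved_discount=0.25,
)
usage = {"basic": 100, "premium": 10}  # application-hours used per tier

print(monthly_bill(rates, usage, stored_gb=500, backup_gb=500))
print(monthly_bill(rates, usage, stored_gb=500, backup_gb=500, reserved=True))
```

Consulting work, support contracts and offline import/export are left out because the slides list them as separate items.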
The Plan
• Early access to preferred partner customers in July
– Talk to us now if you’d like to be part of that.
• Full production in September with all partial/to-do items implemented.
• Increased number of public datasets.
• Increased range of applications and pipelines.
• User interface improvements based on feedback from early access period.
The Potential
• Available as customisation projects:
– Conversions to other clouds.
– Conversions to run on in-house infrastructure.
• Truly secure and scalable R&D collaboration environment.
– Applicable to all sciences, not just genomics.
Change the way you do science