(CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Jason Tetrault, Karl Gutwin October 2015 CMP303 ResearchCloud CfnCluster and Internet2 for Enterprise HPC


Page 1: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Jason Tetrault, Karl Gutwin

October 2015

CMP303

ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

Page 2: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

Driver: Large Scale NGS

Page 3: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

2-Minute Oversimplification of NGS

Page 4: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

2-Minute Oversimplification of NGS

Page 5: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

NGS: Sequencing

Page 6: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

NGS: Sequencing

Page 7: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

NGS: Alignment

Page 8: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

NGS: Alignment

HG19

Page 9: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

NGS: Alignment

HG19

Page 10: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

NGS: Alignment

HG19

Fastq

BAM

Page 11: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

NGS: Downstream

Curly Hair

?

Green Eyes

Page 12: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

Evolution in NGS

Whole Exome

• File Sizes
  • Fastq Size: 4 – 20 GB
  • BAM Size: 4 – 20 GB
  • Pile up: 10 – 20x size
  • Total: 40 – 400 GB
• Processing Time
  • 5 – 24 hours

Whole Genome

• File Sizes
  • Fastq Size: 80 – 300 GB
  • BAM Size: 80 – 300 GB
  • Pile up: 10 – 20x size
  • Total: 800 – 4000 GB
• Processing Time
  • 24 – 72 hours

Page 13: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

NGS Data Shape

[Figure: two "Monthly View" charts of network traffic in gigabits per second, x-axis 1 to 29.

Current demand (y-axis 0 to 1.4): Current Large Collaboration Transfers, Internal Sequencer Other, Internal Sequencer HiSeq, Standard Traffic.

Demand in 6 months (y-axis 0 to 4): Future Large Collaboration Transfers, X10 Processed Results, Reprocessing, Current Large Collaboration Transfers, Internal Sequencer Other.]

Page 14: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

Security Requirements

• Strong encryption

• At rest

• In flight

• Strong access controls

• Private networks and VPCs

Page 15: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

ResearchCloud

Page 16: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

ResearchCloud

Page 17: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

ResearchCloud

Network

Page 18: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

ResearchCloud

Network

[Diagram labels: Biogen; 10 Gbps Direct Connect; Other Cloud; Consortia; Sequencing Centers; Universities]

• Data transfers for collaborations

• Private AWS Direct Connect

• 10G-class end-to-end
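To put "10G-class end-to-end" next to the file sizes quoted earlier in the deck, here is a rough back-of-the-envelope sketch of wire time for a transfer. It is not from the talk: the function name and the `efficiency` parameter are illustrative assumptions, sizes are treated as decimal gigabytes, and protocol overhead, disk speed, and congestion are ignored unless folded into `efficiency`.

```python
def transfer_seconds(size_gb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Idealized time to move size_gb gigabytes over a link_gbps link.

    efficiency is a hypothetical fudge factor (0 < efficiency <= 1) for the
    fraction of line rate actually achieved; it is an assumption, not a
    measured value from the talk.
    """
    if size_gb < 0 or link_gbps <= 0 or not (0 < efficiency <= 1):
        raise ValueError("size must be >= 0, link > 0, efficiency in (0, 1]")
    bits_in_gigabits = size_gb * 8  # 1 byte = 8 bits
    return bits_in_gigabits / (link_gbps * efficiency)


if __name__ == "__main__":
    # Upper end of the whole-genome Fastq range from the "Evolution in NGS" slide.
    for gbps in (1, 10):
        secs = transfer_seconds(300, gbps)
        print(f"300 GB at {gbps} Gbps: {secs / 60:.1f} minutes (ideal)")
```

Real transfers rarely hold line rate, so these figures are best read as lower bounds.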

Page 19: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

Internet2: Creating a new circuit

[Map labels: Cambridge, MA; Ashburn, VA]

• Research & Education Community

• 100 Gbps backbone

Page 20: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

ResearchCloud

Network

Page 21: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

AWS Accounts and VPCs

[Diagram labels: Secure Services; Consolidated Billing; CompBio Private; CompChem Private; DataLake Prod; Collaboration Public; Public Sandbox; DataLake Dev/Test; Private Sandbox; Admin Private Public; Direct Connect; ten boxes labeled VPC]

Page 22: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

Scaling to 10 Gigabits

Find and eliminate bottlenecks

• Network

• Disk

• Firewall

• Amazon EC2

Packet loss and latency do not mix

Parallel streams

Continuous monitoring and testing
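One way to see why "packet loss and latency do not mix", and why parallel streams help, is the widely cited Mathis approximation for steady-state TCP throughput: rate ≤ (MSS / RTT) × (C / √p), with C ≈ √(3/2). The sketch below is illustrative only. The round-trip time and loss rate are made-up example inputs rather than measurements from the talk, the function names are hypothetical, and treating N streams as N times one stream is an idealization.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate upper bound on one TCP stream's throughput (bits/s).

    Uses the Mathis et al. model: (MSS / RTT) * (C / sqrt(p)), C = sqrt(3/2).
    """
    if mss_bytes <= 0 or rtt_s <= 0 or not (0 < loss_rate < 1):
        raise ValueError("need mss > 0, rtt > 0, 0 < loss_rate < 1")
    c = math.sqrt(3.0 / 2.0)
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


def streams_needed(target_bps: float, mss_bytes: int, rtt_s: float, loss_rate: float) -> int:
    """Idealized count of parallel streams needed to reach target_bps.

    Assumes streams do not interfere with each other, which real links
    only approximate.
    """
    per_stream = mathis_throughput_bps(mss_bytes, rtt_s, loss_rate)
    return max(1, math.ceil(target_bps / per_stream))


if __name__ == "__main__":
    # Hypothetical inputs: 1460-byte MSS, 20 ms RTT, 0.001% packet loss.
    one = mathis_throughput_bps(1460, 0.020, 1e-5)
    print(f"single stream bound: {one / 1e6:.0f} Mbps")
    print("streams for 10 Gbps:", streams_needed(10e9, 1460, 0.020, 1e-5))
```

Under this model, doubling the round-trip time halves the per-stream bound, and so does quadrupling the loss rate. That is the reason a long, slightly lossy path needs many streams to fill a 10 Gbps pipe.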

Page 23: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

Instant HPC

Page 24: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

Instant HPC: CfnCluster and Gluster

CfnCluster Info:

• AWS CloudFormation template based framework

• Easily deploys in VPC

• Encrypted Amazon EBS and ephemeral storage

• Elasticity using Auto Scaling and Amazon CloudWatch

• http://aws.amazon.com/hpc/cfncluster

Storage Info:

• Deployed GlusterFS on top of CfnCluster

• Has encryption in flight (TLS) and at rest

Page 25: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

Instant HPC: CfnCluster and Gluster

Secure CfnCluster Config

• VPC, Key, and Size Configurations

• Ephemeral Encryption

• Install Hooks

Chef Based Secure Recipes

• Base Software, Security and Config

• Finalization of Secure GlusterFS
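The configuration items on this slide (VPC, key, size, ephemeral encryption, install hooks) map onto CfnCluster's INI-style config file. The sketch below generates such a file with Python's `configparser`. It is not the presenters' actual configuration. The key names follow the CfnCluster configuration format as best understood here and should be checked against the documentation for your version, and every value (VPC ID, subnet ID, key name, script URL, section labels) is a hypothetical placeholder.

```python
import configparser
import io


def build_cluster_config(vpc_id: str, subnet_id: str, key_name: str,
                         post_install_url: str,
                         min_nodes: int = 8, max_nodes: int = 20) -> configparser.ConfigParser:
    """Assemble a CfnCluster-style config; all inputs are caller-supplied placeholders."""
    cfg = configparser.ConfigParser()
    cfg["global"] = {"cluster_template": "default"}
    cfg["cluster default"] = {
        "key_name": key_name,                     # EC2 key pair ("Key")
        "vpc_settings": "private",                # points at the [vpc private] section
        "master_instance_type": "c3.8xlarge",     # instance type from the NGS BASE slide
        "compute_instance_type": "c3.8xlarge",
        "initial_queue_size": str(min_nodes),     # "Size": starting compute nodes
        "max_queue_size": str(max_nodes),         # "Size": elastic maximum
        "encrypted_ephemeral": "true",            # "Ephemeral Encryption"
        "post_install": post_install_url,         # "Install Hooks" (e.g. a Chef bootstrap)
    }
    cfg["vpc private"] = {"vpc_id": vpc_id, "master_subnet_id": subnet_id}
    return cfg


def render(cfg: configparser.ConfigParser) -> str:
    """Serialize the config to INI text."""
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    example = build_cluster_config(
        vpc_id="vpc-00000000",            # hypothetical
        subnet_id="subnet-00000000",      # hypothetical
        key_name="example-key",           # hypothetical
        post_install_url="https://example.com/bootstrap.sh",  # hypothetical
    )
    print(render(example))
```

Generating the file from code rather than editing it by hand keeps the security-relevant settings in one reviewed place, which fits the "secure recipes" approach the slide describes.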

Page 26: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

Instant HPC: CfnCluster and Gluster

NGS BASE 1:

• 1 + 8 c3.8xlarge Instances

• 16 Modern Intel Cores

• 2 X 320 GB Onboard SSD

• 10 Gbps Network

• $1.68 per hour * per node

Total Min:

• Total Cores: 128

• 5 TB Ephemeral SSD

• ~5 TB Encrypted Gluster

• Cost: ~$16 per hour

Total Max (20 Nodes):

• Total Cores: 320

• 12.5 TB Ephemeral SSD

• ~5 TB Encrypted Gluster

• Cost: ~$36 per hour
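The totals on this slide can be roughly reproduced from the per-node figures. The sketch below assumes, as an inference rather than something the slide states, that core and SSD totals count only compute nodes while the hourly cost also includes the one master node. The class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Per-node figures for c3.8xlarge as quoted on the slide."""
    cores: int = 16                 # "16 Modern Intel Cores"
    ssd_gb: int = 2 * 320           # "2 X 320 GB Onboard SSD"
    price_per_hour: float = 1.68    # "$1.68 per hour per node"


def cluster_totals(compute_nodes: int, spec: NodeSpec = NodeSpec(),
                   master_nodes: int = 1) -> dict:
    """Aggregate cores, ephemeral SSD (decimal TB), and hourly cost.

    Assumption: the master contributes to cost but not to cores or SSD.
    """
    return {
        "cores": compute_nodes * spec.cores,
        "ephemeral_tb": compute_nodes * spec.ssd_gb / 1000,
        "cost_per_hour": round((compute_nodes + master_nodes) * spec.price_per_hour, 2),
    }


if __name__ == "__main__":
    for label, nodes in (("min", 8), ("max", 20)):
        print(label, cluster_totals(nodes))
```

Under these assumptions the core counts match the slide exactly (128 and 320), while the SSD and cost figures come out close to the slide's rounded values.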

Page 27: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

Instant HPC

Page 28: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

Performance Testing

Load Test:

• BWA alignment: Same job repeated

• Cluster would grow to its elastic max

• Different node / Gluster ratios

Things to think about:

• Cores vs Hyperthreads

• I/O intensity

• Gluster

• Elastic ratio

• Versions

• Total disk needed

Page 29: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

Finally: Invert the Question

Can I run a hundred whole genomes quickly?

I can run a handful a month!

Page 30: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

Finally: Invert the Question

Can I run a hundred whole genomes quickly?

I can run a handful a month!

Oh, and it will fill our storage

Page 31: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

Finally: Invert the Question

Can I run a hundred whole genomes quickly?

I can run a handful a month!

Oh, and it will fill our storage

I hope you don’t crash the network

Page 32: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

Finally: Invert the Question

Can I run a hundred whole genomes quickly?

Sure, do you have $10,000?

Page 33: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

Finally: Invert the Question

Can I run a thousand whole genomes quickly?

Sure, do you have $100,000?

Page 34: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

Team:

Scientific Computing

• Hank Wu

• Jon Cody Haines

• Karl Gutwin

• Rodney Marable

GDO

• Jason Tetrault

PMO

• Kristen Cleveland

• Thomas Bolton

BioTeam

• Anushka Brownley

• Chris Dagdigian

• Adam Kraut

Other

• Adriana Karaboutis

• Charles O’Donnell

• Comp Bio

• Maxim Dozhdev

• Victor Khabarov

• Dougal Ballantyne

• Stacy O’Keefe

• Dan Taylor

• Brian Cashman

Infrastructure and Security

• Brandon Patton

• Leon Rice

• Chaminda Attale

• Bob Dyer

• Ricardo Mills

• Harry Gadatia

• George McDowell

• Gregor Irish

• Geovyn Fernandez

Page 35: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

How Are We Using This?

Large Scale NGS

Reprocessing

Computational Chemistry

Epigenetics

High Speed S3 Transfer

Peered Network Tests

Globus Transfers

GDO

Data Science

Data Lake

Page 36: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

Lessons Learned…

CfnCluster and Internet2

Scaling and Tuning

10G does not always mean 10G

KMS and CloudFormation

EBS warmup and launch times

Be careful which nodes you drop (Duh)

Page 37: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

Remember to complete your evaluations!

Page 38: (CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC

Thank you!

biogen.com/careers
