(CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Jason Tetrault, Karl Gutwin
October 2015
CMP303
ResearchCloud: CfnCluster and Internet2 for Enterprise HPC
Driver: Large Scale NGS
2-Minute Oversimplification of NGS
NGS: Sequencing
NGS: Alignment
HG19
Fastq
BAM
NGS: Downstream
Curly Hair
?
Green Eyes
Evolution in NGS
Whole Exome
• File Sizes
• Fastq Size: 4 – 20 GB
• Bam Size: 4 – 20 GB
• Pile up: 10 – 20x size
• Total: 40 – 400 GB
• Processing Time
• 5 – 24 hours
Whole Genome
• File Sizes
• Fastq Size: 80 – 300 GB
• Bam Size: 80 – 300 GB
• Pile up: 10 – 20x size
• Total: 800 – 4000 GB
• Processing Time
• 24 – 72 hours
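To relate these footprints to network planning, here is a rough sketch of the ideal time needed to move one sample's total data over a single link. It is not from the talk: the 10 Gbps rate, the decimal units (1 GB = 8 Gb), the assumption of a fully used link with no protocol overhead, and the function name `transfer_seconds` are all illustrative assumptions.

```python
def transfer_seconds(size_gb: float, link_gbps: float) -> float:
    """Ideal time to move size_gb gigabytes over a link_gbps link (no overhead)."""
    return size_gb * 8 / link_gbps


# Total footprint ranges quoted on the slide, in GB.
footprints = {
    "whole exome": (40, 400),
    "whole genome": (800, 4000),
}

ASSUMED_LINK_GBPS = 10  # assumption for illustration only

for name, (low_gb, high_gb) in footprints.items():
    low_min = transfer_seconds(low_gb, ASSUMED_LINK_GBPS) / 60
    high_min = transfer_seconds(high_gb, ASSUMED_LINK_GBPS) / 60
    print(f"{name}: {low_min:.1f} to {high_min:.1f} minutes at {ASSUMED_LINK_GBPS} Gbps")
```

Real transfers will be slower than this lower bound, since disks, firewalls, and TCP behavior all take their share.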
NGS Data Shape
[Chart: Current demand. Monthly view, gigabits per second. Series: Current Large Collaboration Transfers, Internal Sequencer Other, Internal Sequencer HiSeq, Standard Traffic]
[Chart: Demand in 6 months. Monthly view, gigabits per second. Series: Future Large Collaboration Transfers, X10 Processed Results, Reprocessing, Current Large Collaboration Transfers, Internal Sequencer Other]
Security Requirements
• Strong encryption
• At rest
• In flight
• Strong access controls
• Private networks and VPCs
ResearchCloud
ResearchCloud Network
Biogen
10 Gbps Direct Connect
Other Cloud
• Data transfers for collaborations
• Private AWS Direct Connect
• 10G-class end-to-end
Consortia
Sequencing
Centers
Universities
Internet2: Creating a new circuit
Cambridge, MA
Ashburn, VA
• Research & Education Community
• 100 Gbps backbone
ResearchCloud Network
AWS Accounts and VPCs
Secure
Services
Consolidated
Billing
CompBio Private
CompChem Private
DataLake Prod
…
Collaboration Public
Public Sandbox
DataLake Dev/Test
Private Sandbox
…
Admin Private Public
Direct Connect
VPC
Scaling to 10 Gigabits
Find and eliminate bottlenecks
• Network
• Disk
• Firewall
• Amazon EC2
Packet loss and latency do not mix
Parallel streams
Continuous monitoring and testing
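One way to see why packet loss and latency do not mix, and why parallel streams help, is the well-known Mathis approximation for a single loss-limited TCP stream: throughput is at most (MSS / RTT) × (C / √p), with C ≈ 1.22. The sketch below is illustrative only and was not shown in the talk. The MSS, round-trip time, and loss rate are made-up inputs, and the function names are hypothetical. Modern congestion-control algorithms can do better than this classic model.

```python
import math


def mathis_bound_gbps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate ceiling for one TCP stream: (MSS / RTT) * (C / sqrt(p))."""
    c = math.sqrt(3 / 2)  # constant from the periodic-loss model, about 1.22
    bytes_per_second = (mss_bytes / rtt_s) * (c / math.sqrt(loss_rate))
    return bytes_per_second * 8 / 1e9


def streams_needed(target_gbps: float, per_stream_gbps: float) -> int:
    """Number of parallel streams needed to reach target_gbps in aggregate."""
    return math.ceil(target_gbps / per_stream_gbps)


# Assumed, illustrative path characteristics (not measurements from the talk).
per_stream = mathis_bound_gbps(mss_bytes=1460, rtt_s=0.015, loss_rate=1e-4)
print(f"per-stream ceiling: {per_stream:.3f} Gbps")
print(f"streams for 10 Gbps: {streams_needed(10, per_stream)}")
```

In this model, doubling the round-trip time halves the per-stream ceiling, and a 100x increase in loss cuts it by 10x, which is why even small loss rates matter on a long path.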
Instant HPC
Instant HPC: CfnCluster and Gluster
CfnCluster Info:
• AWS CloudFormation template based framework
• Easily deploys in VPC
• Encrypted Amazon EBS and Ephemeral
• Elasticity using Auto Scaling and Amazon CloudWatch
• http://aws.amazon.com/hpc/cfncluster
Storage Info:
• Deployed GlusterFS on top of CfnCluster
• Has encryption in flight (TLS) and at rest
Instant HPC: CfnCluster and Gluster
Secure CfnCluster Config
• VPC, Key, and Size Configurations
• Ephemeral Encryption
• Install Hooks
Chef Based Secure Recipes
• Base Software, Security and Config
• Finalization of Secure GlusterFS
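As a rough illustration of what such a config might contain, the sketch below uses Python's `configparser` to write a CfnCluster-style INI file covering the VPC, key, size, ephemeral encryption, and install hook settings. This is not the presenters' actual configuration. The option names are given as recalled from the CfnCluster configuration documentation and should be verified against the version in use. Every value (key pair, VPC and subnet IDs, script URL, the section label `research`) is a hypothetical placeholder.

```python
import configparser
import io

config = configparser.ConfigParser()

config["global"] = {"cluster_template": "default"}

config["cluster default"] = {
    "key_name": "my-keypair",                 # hypothetical EC2 key pair name
    "vpc_settings": "research",               # points at the [vpc research] section
    "master_instance_type": "c3.8xlarge",
    "compute_instance_type": "c3.8xlarge",
    "initial_queue_size": "8",                # starting compute nodes
    "max_queue_size": "20",                   # elastic maximum
    "encrypted_ephemeral": "true",            # encrypt instance-store volumes
    "post_install": "https://example.com/bootstrap-chef.sh",  # hypothetical install hook
}

config["vpc research"] = {
    "vpc_id": "vpc-XXXXXXXX",                 # placeholder
    "master_subnet_id": "subnet-XXXXXXXX",    # placeholder
}

# Render the INI text that would be saved as the cluster config file.
buffer = io.StringIO()
config.write(buffer)
print(buffer.getvalue())
```

In a setup like the one described, the install hook would be the natural place to run the Chef recipes that apply base software and security settings and finalize GlusterFS.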
Instant HPC: CfnCluster and Gluster
NGS BASE 1:
• 1 + 8 c3.8xlarge Instances
• 16 Modern Intel Cores
• 2 X 320 GB Onboard SSD
• 10 Gbps Network
• $1.68 per hour * per node
Total Min:
• Total Cores: 128
• 5 TB Ephemeral SSD
• ~5 TB Encrypted Gluster
• Cost: ~$16 per hour
Total Max (20 Nodes):
• Total Cores: 320
• 12.5 TB Ephemeral SSD
• ~5 TB Encrypted Gluster
• Cost: ~$36 per hour
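The totals above can be approximately reproduced from the per-node figures. The sketch below assumes that cores and ephemeral SSD are counted on compute nodes only, while the single head node is included in the hourly cost. That split is an inference, not something the slide states, and the function name is hypothetical. The results land close to the slide's rounded numbers rather than matching them exactly.

```python
CORES_PER_NODE = 16            # from the slide
SSD_GB_PER_NODE = 2 * 320      # two onboard 320 GB SSDs per node
PRICE_PER_NODE_HOUR = 1.68     # USD, from the slide


def cluster_totals(compute_nodes: int, head_nodes: int = 1) -> dict:
    """Aggregate cores, ephemeral SSD, and hourly cost for a given cluster size."""
    return {
        "cores": compute_nodes * CORES_PER_NODE,
        "ephemeral_ssd_tb": compute_nodes * SSD_GB_PER_NODE / 1000,
        "cost_per_hour": round((compute_nodes + head_nodes) * PRICE_PER_NODE_HOUR, 2),
    }


for label, nodes in (("min", 8), ("max", 20)):
    print(label, cluster_totals(nodes))
```

The Gluster capacity is not modeled here because it stays at about 5 TB in both configurations on the slide.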
Instant HPC
Performance Testing
Load Test:
• BWA alignment: same job repeated
• Cluster would grow to its elastic max
• Different node / Gluster ratios
Things to think about:
• Cores vs Hyperthreads
• I/O intensity
• Gluster
• Elastic ratio
• Versions
• Total disk needed
Finally: Invert the Question
Can I run a hundred whole genomes quickly?
I can run a handful a month!
Oh, and it will fill our storage
I hope you don’t crash the network
Finally: Invert the Question
Can I run a hundred whole genomes quickly?
Sure, do you have $10,000?
Can I run a thousand whole genomes quickly?
Sure, do you have $100,000?
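The two price tags imply the same budget per genome. As a back-of-the-envelope sketch, the code below divides each quoted budget by the genome count and converts the result into c3.8xlarge node-hours at the $1.68 rate from the NGS BASE 1 slide. It assumes the whole budget goes to compute, which the talk does not claim; storage and transfer costs are ignored, and the function names are hypothetical.

```python
PRICE_PER_NODE_HOUR = 1.68  # USD per c3.8xlarge node-hour, from the earlier slide


def per_genome_budget(total_usd: float, genomes: int) -> float:
    """Budget available per genome if cost scales linearly with genome count."""
    return total_usd / genomes


def node_hours(budget_usd: float, price: float = PRICE_PER_NODE_HOUR) -> float:
    """Node-hours that a budget buys at the given hourly price."""
    return budget_usd / price


for total_usd, genomes in ((10_000, 100), (100_000, 1_000)):
    budget = per_genome_budget(total_usd, genomes)
    print(f"{genomes} genomes: ${budget:.0f} each, "
          f"about {node_hours(budget):.1f} node-hours per genome")
```

The point of the inversion is that the question shifts from capacity to budget: the per-genome figure stays flat while the number of genomes grows tenfold.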
Team:
Scientific Computing
• Hank Wu
• Jon Cody Haines
• Karl Gutwin
• Rodney Marable
GDO
• Jason Tetrault
PMO
• Kristen Cleveland
• Thomas Bolton
BioTeam
• Anushka Brownley
• Chris Dagdigian
• Adam Kraut
Other
• Adriana Karaboutis
• Charles O’Donnell
• Comp Bio
• Maxim Dozhdev
• Victor Khabarov
• Dougal Ballantyne
• Stacy O’Keefe
• Dan Taylor
• Brian Cashman
Infrastructure and Security
• Brandon Patton
• Leon Rice
• Chaminda Attale
• Bob Dyer
• Ricardo Mills
• Harry Gadatia
• George McDowell
• Gregor Irish
• Geovyn Fernandez
How Are We Using This?
Large Scale NGS
Reprocessing
Computational Chemistry
Epigenetics
High Speed S3 Transfer
Peered Network Tests
Globus Transfers
GDO
Data Science
Data Lake
Lessons Learned…
CfnCluster and Internet2
Scaling and Tuning
10G does not always mean 10G
KMS and CloudFormation
EBS warmup and launch times
Be careful which nodes you drop (Duh)
Remember to complete your evaluations!
Thank you!
biogen.com/careers