4 C’s for Using Cloud to Support Scientific Research
TRANSCRIPT
Jeff Tabor, Sr. Director of Product Management & Marketing, Avere Systems
Jeff Tabor has worked at Avere Systems for six years, leading product definition and marketing efforts. Prior to Avere, Jeff managed clustered NAS solutions at NetApp and Spinnaker Networks. Jeff holds Bachelor’s and Master’s degrees in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology.
Scott Jeschonek, Director of Product Management, Avere Systems
Scott has more than twenty years of experience in enterprise, telecommunications, and vendor roles, which he synthesizes into a unique perspective on the implications of the cloud phenomenon. After working with several technology companies, Scott joined Avere in early 2014, where he is responsible for the software roadmap.
Agenda
• Cloud overview
  – Opportunities & challenges
• Cloud benefits for scientific research: “The 4 C’s”
  – Compute scaling
  – Capacity scaling
  – Collaboration across the global enterprise
  – Cost savings
• Cloud demo
  – Running scientific apps
  – Storing scientific data
Poll Question #1
Which one of the 4 “Cs” do you think would be the best use for cloud in your organization?
A. Compute
B. Capacity
C. Collaboration
D. Cost
Hybrid Cloud
On-Prem Storage (NAS, Object)
• NAS and object
• Multiple tiers of storage
On-Prem Compute
• App servers
• Compute farm
• Desktops
Compute Cloud (App servers)
• Near-infinite compute
• Cloud bursting
• Permanent infrastructure
Storage Cloud (Bucket 1 to Bucket n)
• Near-infinite capacity
• Mostly backup and archive today
Cloud is Attractive BUT has Challenges
Diagram labels: On-Prem Storage (NAS, Object); On-Prem Compute; Storage Cloud (Bucket 1 to Bucket n); Compute Cloud (App servers)
Cloud challenges
1. Unfamiliar object-based interface (S3)
2. High latency to remote storage (10 to 100 ms or more)
3. No easy on-ramp to cloud storage
4. Cloud gateways do NOT scale (single-node gateways)
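To see why the 10 to 100 ms latency figure matters, consider an application that issues file operations one at a time, as many metadata-heavy tools do when walking a directory tree. Each operation then costs at least one full round trip. The sketch below is plain arithmetic under that assumption (one outstanding request, no concurrency, no caching); the 0.5 ms LAN round trip is a hypothetical comparison point, not a figure from the slide.

```python
def max_serial_ops_per_second(round_trip_ms: float) -> float:
    """Upper bound on operations per second when the client waits for each
    request to complete before sending the next (one outstanding request)."""
    if round_trip_ms <= 0:
        raise ValueError("round_trip_ms must be positive")
    return 1000.0 / round_trip_ms


def serial_wait_seconds(num_ops: int, round_trip_ms: float) -> float:
    """Time spent purely waiting on round trips for num_ops serial requests."""
    return num_ops * round_trip_ms / 1000.0


# 0.5 ms is a hypothetical LAN round trip; 10 and 100 ms bracket the slide's range.
for rtt_ms in (0.5, 10.0, 100.0):
    print(f"{rtt_ms:>5} ms: at most {max_serial_ops_per_second(rtt_ms):7.1f} ops/s; "
          f"100,000 serial ops wait {serial_wait_seconds(100_000, rtt_ms):8.1f} s")
```

At 100 ms, 100,000 serial operations spend 10,000 seconds (close to three hours) just waiting on the network, which is the penalty that caching near compute is meant to hide.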
Compute
• Cloud benefits
  – Unlimited compute capacity
    • Time to market
  – Cost effective
    • Easy to turn on and turn off
    • Zero footprint
• Cloud challenges
  – High latency to data
    • Performance impact
  – Compute cloud NOT intended for storage
  – Familiar file system interface
    • No change to apps
Cloud Computing with Avere
Customer Need
• Performance for tier-1 applications
• Avoid latency to storage
• Don’t replicate all data to the compute cloud
Recommended Solution
• Automatic caching of active data only
• Handle reads, writes, and metadata operations
• Hide latency (50:1 or more offload)
Physical Edge Filer and Virtual Edge Filer: cache data near compute, hide latency to storage
Diagram labels: On-Prem Storage (NAS, Object); On-Prem Compute; Storage Cloud (Bucket 1 to Bucket n); Compute Cloud (App servers)
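The “automatic caching of active data only” idea can be illustrated with a least-recently-used (LRU) read cache. This is a generic sketch, not Avere’s implementation: the LRUReadCache class and the workload are hypothetical, and the offload ratio is interpreted here as client reads served per read sent to the backend, which is one plausible reading of the slide’s “50:1 or more offload.”

```python
from collections import OrderedDict
from typing import Callable, Hashable


class LRUReadCache:
    """Keeps recently used blocks in memory; only misses reach the slow backend."""

    def __init__(self, capacity: int, fetch: Callable[[Hashable], bytes]):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.fetch = fetch                       # called only on a cache miss
        self.blocks: "OrderedDict[Hashable, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, key: Hashable) -> bytes:
        if key in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(key)         # now the most recently used
            return self.blocks[key]
        self.misses += 1
        data = self.fetch(key)
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)      # evict the least recently used
        return data

    def offload_ratio(self) -> float:
        """Client reads per backend read (assumed meaning of "offload")."""
        total = self.hits + self.misses
        return total / self.misses if self.misses else float("inf")


backend_reads = []


def remote_read(key: Hashable) -> bytes:
    backend_reads.append(key)                    # stands in for a slow S3/NAS read
    return f"block-{key}".encode()


# Hypothetical workload: an active set of 100 blocks, read 50 times over.
cache = LRUReadCache(capacity=100, fetch=remote_read)
for _ in range(50):
    for block in range(100):
        cache.read(block)
print(len(backend_reads), cache.offload_ratio())  # 100 50.0
```

Here every block is fetched from the backend once and then served 49 more times from the cache, giving exactly 50:1; an active set larger than the cache would push the ratio down.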
Cloud Computing with Avere
Customer Need
• Keep pace with growing demand
• Avoid disruptive upgrades
• Start small, grow huge at the “click of a button”
Recommended Solution
• Scale-out NAS provided via clustering
• Performance scaling (more CPUs, DRAM)
• Capacity scaling (more SSD for a better hit rate)
Cluster from 3 to 50 nodes
Diagram labels: Physical Edge Filer; Virtual Edge Filer; On-Prem Storage (NAS, Object); On-Prem Compute; Storage Cloud (Bucket 1 to Bucket n); Compute Cloud (App servers)
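A rough sizing sketch for a scale-out cluster, assuming performance grows linearly with node count (real clusters scale less than perfectly). The 3-node minimum and 50-node maximum come from the slide; the per-node figure is an input. The example uses 50,000 IOPS per node, which is only the average implied by the TCO example later in this deck (four Avere 3800 nodes for 200k IOPS), not a published specification.

```python
import math

MIN_NODES = 3   # smallest cluster on the slide
MAX_NODES = 50  # largest cluster on the slide


def nodes_needed(target_iops: int, iops_per_node: int) -> int:
    """Smallest node count meeting target_iops, assuming linear scaling."""
    if target_iops <= 0 or iops_per_node <= 0:
        raise ValueError("target_iops and iops_per_node must be positive")
    nodes = max(MIN_NODES, math.ceil(target_iops / iops_per_node))
    if nodes > MAX_NODES:
        raise ValueError(f"{nodes} nodes exceeds the {MAX_NODES}-node limit")
    return nodes


# 50,000 IOPS/node is the average implied by 4 nodes delivering 200k IOPS.
print(nodes_needed(200_000, 50_000))   # 4
print(nodes_needed(10_000, 50_000))    # 3 (cluster minimum)
```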
Capacity
• Cloud benefits
  – Unlimited storage capacity
  – Simple to manage (no maintenance, easy scaling)
  – Pay only for what you use
• Cloud challenges
  – Protect investment in on-prem storage
  – Unfamiliar access protocol (e.g. S3 API)
  – Missing features
    • Security
    • Compression
    • Snapshots
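The “unfamiliar access protocol” challenge comes from object storage having a flat key space rather than directories. S3 list requests can emulate a directory listing with a prefix and a delimiter, and standard S3 buckets offer no rename, so renaming a “directory” means rewriting every object under it. The sketch below mimics both behaviors on an in-memory dictionary; the function names and keys are hypothetical and no real S3 client is involved.

```python
from typing import Dict, List, Tuple


def list_objects(bucket: Dict[str, bytes], prefix: str = "",
                 delimiter: str = "/") -> Tuple[List[str], List[str]]:
    """Directory-style listing over flat keys, in the spirit of an S3 list
    request with a prefix and delimiter. Returns (keys, common_prefixes)."""
    if not delimiter:
        raise ValueError("delimiter must be non-empty")
    keys, common = [], set()
    for key in sorted(bucket):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        cut = rest.find(delimiter)
        if cut >= 0:
            # Keys below this level collapse into one "subdirectory" prefix.
            common.add(prefix + rest[:cut + len(delimiter)])
        else:
            keys.append(key)
    return keys, sorted(common)


def rename_prefix(bucket: Dict[str, bytes], old: str, new: str) -> int:
    """'Rename a directory' on flat keys: every object under old is rewritten
    under new and the original removed. Returns the number of objects moved."""
    to_move = [k for k in bucket if k.startswith(old)]
    for key in to_move:
        bucket[new + key[len(old):]] = bucket.pop(key)
    return len(to_move)


# Hypothetical keys for a sequencing project.
bucket = {
    "genomes/ref/hg19.fa": b"",
    "genomes/ref/hg38.fa": b"",
    "genomes/sample1.bam": b"",
    "results/run1/summary.txt": b"",
}
print(list_objects(bucket, "genomes/"))   # (['genomes/sample1.bam'], ['genomes/ref/'])
print(rename_prefix(bucket, "genomes/ref/", "genomes/reference/"))  # 2
```

The rename touches every object under the prefix, which is why file-oriented applications that rename directories can behave very differently on object storage.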
Data Source: http://genome.gov/sequencingcosts
Slide Source: Chris Dagdigian, BioTeam, as presented in a Bio-IT World webinar, April 2014
Use Case – Inova Health System: Enabling Personalized Medicine with a Hybrid Cloud
Avere + AWS Benefits
– 6,300 whole genomes, 1.3PB, and 7M files in AWS
– Genomic analysis results in hours, not days
– Saved more than $10 million
– Avere encryption & AWS HIPAA compliance
Customer Challenges
– Scale to petabytes of capacity
– Scale performance for genomic analysis
– Contain cost (opex, not capex)
– Security & compliance
Diagram labels: GNS; AWS S3; AWS EC2
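Dividing the Inova figures gives a sense of scale per genome. This is simple averaging, assuming decimal units (1 PB = 1,000,000 GB); individual genomes will vary.

```python
# Figures from the Inova slide; decimal units assumed (1 PB = 1,000,000 GB).
GENOMES = 6_300
CAPACITY_GB = 1_300_000   # 1.3 PB
FILES = 7_000_000         # 7M files

gb_per_genome = CAPACITY_GB / GENOMES
files_per_genome = FILES / GENOMES
print(f"about {gb_per_genome:.0f} GB and {files_per_genome:.0f} files per genome")
```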
Use Case – Large Pharmaceutical Company
Public Cloud Services
• New apps, built for cloud
• FlashMove to public cloud
  – Lowest cost
  – Simplest management
• Centralized resources
  – Collaboration
  – Scalable, most efficient
On-Prem Resources
• Existing apps, no changes
• Data that cannot move to public cloud (security)
• FlashMove to object storage
  – Lower cost
  – Better scaling
• Geo-dispersal, no replication
  – Simpler to manage
  – More efficient
Diagram labels: Existing Apps; On-Prem Compute; Physical FXT; On-Prem Storage (NAS, Object); New Apps; Virtual Compute Farm; Virtual FXT; cloud buckets (Bucket 1 to Bucket n); FlashMove
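The slides do not describe how FlashMove works internally. A common pattern for moving data between tiers without changing the path applications use is copy, verify, then cut over. The sketch below shows that pattern on in-memory dictionaries; migrate, the namespace table, and the file names are hypothetical, and it ignores writes arriving during the copy, which a real migration tool must handle.

```python
import hashlib
from typing import Dict

Store = Dict[str, bytes]   # hypothetical stand-in for a NAS export or bucket


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def migrate(namespace: Dict[str, Store], path: str, target: Store) -> int:
    """Move the files behind one client-visible path to a new back end.

    Applications keep using `path`; only the namespace entry changes.
    Returns the number of files copied. The source is left intact.
    """
    source = namespace[path]
    for name, data in source.items():                     # 1. copy
        target[name] = data
    for name, data in source.items():                     # 2. verify
        if sha256(target[name]) != sha256(data):
            raise OSError(f"checksum mismatch for {name}")
    namespace[path] = target                              # 3. cut over
    return len(source)


on_prem_nas: Store = {"trial42/results.csv": b"id,value\n1,0.5\n"}
object_tier: Store = {}
namespace = {"/projects": on_prem_nas}
print(migrate(namespace, "/projects", object_tier))       # 1
print(namespace["/projects"] is object_tier)              # True
```

Leaving the source intact until after the cut-over means the old copy can be reclaimed only once the move is confirmed.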
Use Case – Next Gen Sequencing at CDC
Avere Benefits
– Scalable performance for SMB and NFS
– Reduced Isilon spend by 33%
– GNS provides a central mount point for all storage
– Hide network latency to remote labs
– Add public or private cloud in the future
Customer Challenges
– Isilon performance at scale
– Isilon cost
– Legacy Isilon
– Distributed users and storage
– Network saturation and latency
Workflow: Generate (SMB), Process (NFS), Interpret (SMB)
Datacenter A (centralized, scalable storage) and Datacenter B (legacy)
Diagram labels: EMC Isilon (old); EMC Isilon (510PB); FXT Cluster (3 nodes); FXT Cluster (11 nodes); GNS
Remote Labs: use central storage, simplify mgmt, cache at edge
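A global namespace (GNS) gives clients one mount point while the data lives on several back ends. One way to implement the lookup is a junction table resolved by the longest matching path prefix, sketched below. The junction paths and back-end names are hypothetical, loosely modeled on the CDC layout, and this is not a description of Avere’s internal GNS.

```python
from typing import Dict, Tuple

# Hypothetical junction table: client path -> (back end, path on that back end).
JUNCTIONS: Dict[str, Tuple[str, str]] = {
    "/": ("isilon-a", "/ifs"),
    "/legacy": ("isilon-b", "/ifs/old"),
    "/cloud": ("object-bucket", "/"),
}


def resolve(path: str, junctions: Dict[str, Tuple[str, str]] = JUNCTIONS) -> Tuple[str, str]:
    """Map a path in the global namespace to (back end, back-end path) using
    the longest junction that matches on whole path components."""
    matches = [j for j in junctions
               if j == "/" or path == j or path.startswith(j + "/")]
    if not matches:
        raise FileNotFoundError(path)
    junction = max(matches, key=len)                 # most specific junction wins
    backend, export = junctions[junction]
    rest = path[len(junction):].lstrip("/")
    if not rest:
        return backend, export
    return backend, export.rstrip("/") + "/" + rest


print(resolve("/legacy/run7/reads.fastq"))  # ('isilon-b', '/ifs/old/run7/reads.fastq')
print(resolve("/legacyfoo/notes.txt"))      # ('isilon-a', '/ifs/legacyfoo/notes.txt')
```

Matching on whole path components keeps /legacyfoo from being routed to the /legacy back end.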
Collaboration
• Cloud benefits
  – Centralized data
  – Accessible from many geographic regions
• Cloud challenges
  – Hide latency to centralized data
  – Read and write access coordination
    • One writer, many readers
    • Many writers, minimal sharing
    • Many writers to shared files
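For the “one writer, many readers” case, a remote cache can stay correct by revalidating a cheap version number on each open and refetching the data only when it has changed, similar in spirit to NFS close-to-open consistency. The sketch below is a single-process simulation with hypothetical Origin and EdgeCache classes; it does not cover the many-writer cases, which need locking or leases.

```python
from typing import Dict, Tuple


class Origin:
    """Central copy of the data (e.g. the primary datacenter or a cloud bucket)."""

    def __init__(self) -> None:
        self.files: Dict[str, Tuple[int, bytes]] = {}   # path -> (version, data)
        self.round_trips = 0

    def version(self, path: str) -> int:
        self.round_trips += 1                 # small validation request
        return self.files[path][0]

    def read(self, path: str) -> Tuple[int, bytes]:
        self.round_trips += 1                 # full data transfer
        return self.files[path]

    def write(self, path: str, data: bytes) -> None:
        old_version = self.files.get(path, (0, b""))[0]
        self.files[path] = (old_version + 1, data)


class EdgeCache:
    """Reader-side cache that revalidates on every open."""

    def __init__(self, origin: Origin) -> None:
        self.origin = origin
        self.cached: Dict[str, Tuple[int, bytes]] = {}

    def open_and_read(self, path: str) -> bytes:
        current = self.origin.version(path)
        entry = self.cached.get(path)
        if entry is not None and entry[0] == current:
            return entry[1]                   # still valid: no data transfer
        self.cached[path] = self.origin.read(path)
        return self.cached[path][1]


origin = Origin()
origin.write("/shared/protocol.txt", b"v1")
lab = EdgeCache(origin)
first = lab.open_and_read("/shared/protocol.txt")    # miss: version check + data
again = lab.open_and_read("/shared/protocol.txt")    # hit: version check only
origin.write("/shared/protocol.txt", b"v2")          # the single writer updates it
after = lab.open_and_read("/shared/protocol.txt")    # stale copy detected, refetched
print(first, again, after, origin.round_trips)
```

Each open still costs one small round trip, but unchanged data is never transferred twice.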
Hybrid Cloud NAS for Global Enterprises
Diagram labels: Primary Datacenter; Secondary/Partner/Colo Datacenter; Remote Office; Cloud Computing (Public/Private); Cloud Storage (Public/Private); Virtual or Physical
Cost
• Cloud benefits
  – Economy of scale
  – Zero footprint, power, and cooling
  – Pay only for what you use (capacity & compute)
  – Simplified management
• Cloud challenge/change
  – Opex, not capex
Avere + Amazon TCO Example (TCO Savings = $2,701,000)
Customer Reqs
• 200k IOPS
• 200TB primary
• 1PB Amazon S3
Old Way (NetApp or EMC)
• T1: 200k IOPS + 200TB primary on NetApp 6000 + SAS, or EMC Isilon S200
• T2: 1PB archive on NetApp 3000 + SATA, or EMC Isilon NL400
• Manual copy between tiers
New Way (Avere + Amazon)
• Avere 200k IOPS: Avere 3800 (4x)
• T2: 200TB primary on private object or legacy NAS
• T2 AWS: 1PB storage on AWS S3 (1,000TB)
• FlashMove between tiers
TCO Saving with Avere + Amazon
Cost                          NetApp or Isilon   Avere + Amazon S3
Storage Acquisition           $2,067,000         $298,000
Service                       $930,000           $134,000
Amazon S3 (capacity)          $0                 $1,032,000
Amazon S3 (data out)          $0                 $298,000
Storage Admin                 $1,080,000         $196,000
Facilities & Power            $264,000           $43,000
Data Migration                $360,000           $0
3-year TCO                    $4,701,000         $2,000,000
Avere + AWS TCO savings ($)   $2,701,000
Avere + AWS TCO savings (%)   57%
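Recomputing the table from its line items is a useful check. The old-way column sums exactly to $4,701,000, while the new-way items sum to $2,001,000, $1,000 more than the stated $2,000,000 total (presumably rounding), which puts the savings at $2,700,000 rather than $2,701,000; the percentage rounds to 57% either way.

```python
# Line items from the 3-year TCO table (US dollars).
OLD_WAY = {  # NetApp or Isilon
    "Storage acquisition": 2_067_000,
    "Service": 930_000,
    "Amazon S3 (capacity)": 0,
    "Amazon S3 (data out)": 0,
    "Storage admin": 1_080_000,
    "Facilities & power": 264_000,
    "Data migration": 360_000,
}
NEW_WAY = {  # Avere + Amazon S3
    "Storage acquisition": 298_000,
    "Service": 134_000,
    "Amazon S3 (capacity)": 1_032_000,
    "Amazon S3 (data out)": 298_000,
    "Storage admin": 196_000,
    "Facilities & power": 43_000,
    "Data migration": 0,
}

old_total = sum(OLD_WAY.values())
new_total = sum(NEW_WAY.values())
savings = old_total - new_total
print(f"old ${old_total:,}  new ${new_total:,}  "
      f"savings ${savings:,} ({savings / old_total:.0%})")
```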
Comparing 1,000,000 IOPS Solutions*
• EMC Isilon: $10.7 / IOPS
• NetApp: $5.1 / IOPS
• Avere: $2.3 / IOPS (Avere 32-node FXT cluster)
Chart label: 150ms
Core filer: NAS, public object, or private object
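Multiplying the per-IOPS figures by the 1,000,000 IOPS target gives each system’s implied total cost, and dividing the target by the 32 nodes gives the average load per FXT node. This uses only the numbers on the slide.

```python
TARGET_IOPS = 1_000_000
FXT_NODES = 32  # Avere cluster size on the slide

DOLLARS_PER_IOPS = {"EMC Isilon": 10.7, "NetApp": 5.1, "Avere": 2.3}

for name, rate in DOLLARS_PER_IOPS.items():
    total = rate * TARGET_IOPS
    relative = rate / DOLLARS_PER_IOPS["Avere"]
    print(f"{name:<10} ${total:>12,.0f}  ({relative:.1f}x the Avere cost)")

iops_per_node = TARGET_IOPS / FXT_NODES
print(f"{iops_per_node:,.0f} IOPS per FXT node on average")   # 31,250
```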
Poll Question #2
What applications are you looking to deploy in the cloud?
A. Business apps (e.g. email, DB, knowledge base)
B. File services (e.g. file serving, home dirs, document management, patient management)
C. High-performance apps (e.g. genomic sequencing, imaging, bioinformatics)
D. Backup and archival only
E. Other (please specify in chat)
Cloud Bursting Use Case – Life Sciences
Proprietary & Confidential
Customer Challenges
• Add compute resources at peak times
• Need for 3-6 months, no long-term commitment
• Do NOT want to rewrite applications
• May move data to the cloud for capacity scaling (later)
Avere Benefits
• Virtual FXT provides scalable NAS in Cloud Compute
• Hide latency to on-prem NAS and object storage
• Easy setup, easy teardown
• Pay only for what is used
• Future: move data to the cloud for better economics
Diagram labels: Cloud Storage; Cloud Compute; Physical FXT; On-Prem Storage (NAS); On-Prem Compute; Virtual FXT; Virtual Compute Farm
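Cloud bursting needs a rule for how much compute to add at peak times. The sketch below is a minimal, hypothetical policy, not an Avere feature: queued jobs that exceed on-prem capacity are sent to cloud nodes, rounded up to whole nodes and capped at a budget limit. All parameter values are illustrative; in this architecture, the Virtual FXT’s role is to give those burst nodes NAS access to the on-prem data.

```python
import math


def burst_nodes(queued_jobs: int, onprem_slots: int, slots_per_cloud_node: int,
                max_cloud_nodes: int) -> int:
    """Cloud nodes to add so work beyond on-prem capacity can run now.
    Returns 0 when on-prem capacity is sufficient."""
    if onprem_slots < 0 or slots_per_cloud_node <= 0 or max_cloud_nodes < 0:
        raise ValueError("invalid capacity parameters")
    overflow = max(0, queued_jobs - onprem_slots)
    return min(max_cloud_nodes, math.ceil(overflow / slots_per_cloud_node))


# Hypothetical peak: 1,000 queued jobs, 400 on-prem slots, 16 slots per cloud node.
print(burst_nodes(1000, 400, 16, max_cloud_nodes=100))   # 38
print(burst_nodes(300, 400, 16, max_cloud_nodes=100))    # 0 (no burst needed)
```

Because the nodes are added only while the queue overflows, tearing them down afterward is what delivers the “pay only for what is used” benefit.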
Cloud Bursting + Archive Use Case – Life Sciences
Customer Challenges
• Add compute resources at peak times
• Need for 3-6 months, no long-term commitment
• Do NOT want to rewrite applications
• Wishes to post results for general access in lower-cost Cloud Compute
Avere Benefits
• Provide file system access to virtually unlimited cloud storage
• Provide a Global NameSpace (GNS) directory structure between on-prem and cloud for application continuity
Diagram labels: Cloud Storage; Cloud Compute; Physical FXT; On-Prem Storage (NAS); On-Prem Compute; Virtual FXT; Virtual Compute Farm; Bucket 1 to Bucket n
Hybrid Galaxy-in-Cloud – Life Sciences
Customer Challenges
• Never enough compute nodes on-prem
• Linear OPEX costs for new on-prem resources; seeking to reduce them
• Requires a file system and performance for continuity
• Storage flexibility must remain for compliance, etc.
Avere Benefits
• Run applications entirely in the cloud and eliminate on-prem compute, no matter the size of the organization
Diagram labels: Cloud Storage; Cloud Compute; Sequencer; Physical FXT; On-Prem Storage (NAS); On-Prem Compute; Virtual FXT; Virtual Compute Farm; Bucket 1 to Bucket n
Galaxy-in-Cloud – Life Sciences
Customer Challenges
• Company or project too small for on-prem costs
• But needs a file system with sufficient performance characteristics
• Project duration may be short
Avere Benefits
• No on-prem requirement
• Running nodes only when you need them helps reduce overall costs
• The ultimate flexibility
Diagram labels: Cloud Storage; Cloud Compute; Virtual FXT; Virtual Compute Farm; Bucket 1 to Bucket n; Small Company, Research Lab, etc.
Galaxy in Cloud
Compute instance (8 CPUs, 8GB memory, 500GB disk or less)
• Debian Linux
• Galaxy Server installed with all tools
3-node Avere vFXT cluster
• NFS mounts to the Avere cluster
• Maps to the cloud bucket via the S3 protocol
Cloud Storage, single bucket
Directory labels: /, /gcs, /galaxy-data; file system maintained by Avere for object storage
/etc/fstab maps the NFS export from the vFXT environment to /mnt/galaxy-data.
Galaxy Server is installed locally, though it could also be located in Cloud Storage and mounted via the vFXT.
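An /etc/fstab line has six whitespace-separated fields: device, mount point, file system type, mount options, dump flag, and fsck pass number. The helper below composes such a line for an NFS mount of the vFXT. The server name, the /galaxy-data export path, and the hard,vers=3 options are assumptions for illustration; use the values your vFXT deployment actually exposes.

```python
def fstab_entry(server: str, export: str, mount_point: str,
                options: str = "defaults", fs_type: str = "nfs") -> str:
    """Build one /etc/fstab line. Dump and fsck pass are 0, as is usual
    for network file systems."""
    for field in (server, export, mount_point, options, fs_type):
        if not field or any(ch.isspace() for ch in field):
            raise ValueError(f"fstab fields must be non-empty without spaces: {field!r}")
    return f"{server}:{export}\t{mount_point}\t{fs_type}\t{options}\t0\t0"


# Hypothetical vFXT address and export path.
print(fstab_entry("vfxt.example.internal", "/galaxy-data", "/mnt/galaxy-data",
                  options="hard,vers=3"))
```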
The Power of Caching
Multiple Galaxy clients calling:
• Reference genomes
• Same output files
Read / read-ahead caching ensures faster response and minimizes cloud bucket “hits”
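Read-ahead caching can be sketched as a shared block cache that, on a miss, fetches the requested block plus the next few in a single ranged request to the bucket. Several clients reading the same reference genome then cost the bucket only the first client’s requests. The ReadAheadCache class, the 100-block object, and the read-ahead depth of 7 are hypothetical, and this cache never evicts.

```python
from typing import Dict, List


class ReadAheadCache:
    """Shared block cache in front of an object bucket. On a miss it fetches
    the requested block plus the next `read_ahead` blocks in one request."""

    def __init__(self, object_blocks: List[bytes], read_ahead: int):
        self.object_blocks = object_blocks    # stands in for one object in the bucket
        self.read_ahead = read_ahead
        self.cache: Dict[int, bytes] = {}
        self.bucket_requests = 0

    def _fetch_range(self, start: int) -> None:
        self.bucket_requests += 1             # one ranged GET to the bucket
        end = min(start + 1 + self.read_ahead, len(self.object_blocks))
        for i in range(start, end):
            self.cache[i] = self.object_blocks[i]

    def read(self, index: int) -> bytes:
        if index not in self.cache:
            self._fetch_range(index)
        return self.cache[index]


reference_genome = [f"chunk{i}".encode() for i in range(100)]  # hypothetical object
cache = ReadAheadCache(reference_genome, read_ahead=7)
for client in range(4):                       # several Galaxy jobs read the same reference
    for i in range(len(reference_genome)):
        cache.read(i)
print(cache.bucket_requests)                  # 13
```

Four clients issue 400 block reads, but the bucket sees 13 requests: one per eight-block range during the first pass and none afterward.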
Thank you!
Questions?
For more information, visit: averesystems.com
Contact us at: 888.88.AVERE