4 C’s for Using Cloud to Support Scientific Research


Uploaded by avere-systems

Posted on 02-Aug-2015


Jeff Tabor, Sr. Director of Product Management & Marketing, Avere Systems

Jeff Tabor has worked at Avere Systems for six years, leading product definition and marketing efforts. Prior to Avere, Jeff managed clustered NAS solutions at NetApp and Spinnaker Networks. Jeff holds Bachelor’s and Master’s degrees in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology.


Scott Jeschonek, Director of Product Management, Avere Systems

Scott draws on more than twenty years of enterprise, telecommunications, and vendor experience to offer a unique perspective on the implications of the cloud phenomenon. After working with several technology companies, Scott joined Avere in early 2014, where he is responsible for the software roadmap.


Agenda
• Cloud overview
  – Opportunities & challenges
• Cloud benefits for scientific research: “The 4 C’s”
  – Compute scaling
  – Capacity scaling
  – Collaboration across the global enterprise
  – Cost savings
• Cloud demo
  – Running scientific apps
  – Storing scientific data

Poll Question #1

Which of the 4 C’s do you think would be the best use of cloud in your organization?

A. Compute
B. Capacity
C. Collaboration
D. Cost

Hybrid Cloud

Diagram: on-prem storage and on-prem compute alongside a compute cloud (app servers) and a storage cloud (Bucket 1 through Bucket n).

On-Prem Storage
• NAS and object
• Multiple tiers of storage

On-Prem Compute
• App servers
• Compute farm
• Desktops

Compute Cloud
• Near-infinite compute
• Cloud bursting
• Permanent infrastructure

Storage Cloud
• Near-infinite capacity
• Mostly backup and archive today

Cloud is Attractive BUT has Challenges

Diagram: on-prem storage (NAS, object) and on-prem compute connected to a storage cloud (Bucket 1 through Bucket n) and a compute cloud (app servers).

Cloud challenges
1. Unfamiliar object-based interface (S3)
2. High latency to remote storage (10-100 ms or more)
3. No easy on-ramp to cloud storage
4. Cloud gateways do NOT scale (single-node gateways)
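Challenge 2 is easiest to see with a little arithmetic. The sketch below is a latency-only model (it ignores bandwidth and server time) that estimates how long a single client thread takes to issue a batch of small requests one after another. The 100,000-operation workload and the 1 ms on-prem comparison point are illustrative assumptions, not measurements from this deck.

```python
def serial_io_time_s(num_ops: int, latency_ms: float) -> float:
    """Wall-clock seconds for num_ops requests issued one after another,
    counting only round-trip latency (no bandwidth or server time)."""
    return num_ops * latency_ms / 1000.0


def max_serial_ops_per_s(latency_ms: float) -> float:
    """Upper bound on request rate for one client thread that waits for
    each reply before sending the next request."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    # Hypothetical workload: 100,000 small-file metadata operations
    # (open/stat/close), e.g. walking a directory tree of sample files.
    ops = 100_000
    for latency in (1, 10, 100):  # 1 ms assumed on-prem; 10-100 ms per the slide
        print(f"{latency:>4} ms: {serial_io_time_s(ops, latency) / 60:8.1f} min, "
              f"{max_serial_ops_per_s(latency):7.1f} ops/s per thread")
```

Under these assumptions, 100,000 serial operations take about 100 seconds at 1 ms but roughly 2.8 hours at 100 ms, which is why metadata-heavy workflows suffer most when storage is remote.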

Compute
• Cloud benefits
  – Unlimited compute capacity
      • Time to market
  – Cost effective
      • Easy to turn on and turn off
      • Zero footprint
• Cloud challenges
  – High latency to data
      • Performance impact
      • Compute cloud NOT intended for storage
  – Familiar file system interface
      • No change to apps

Cloud Computing with Avere

Diagram: a physical Edge filer on-prem (in front of NAS and object storage) and a virtual Edge filer in the compute cloud (app servers), caching data near compute and hiding latency to storage, including the storage cloud (Bucket 1 through Bucket n).

Customer need
• Performance for tier-1 applications
• Avoid latency to storage
• Don’t replicate all data to the compute cloud

Recommended solution
• Automatic caching of active data only
• Handle reads, writes, and metadata operations
• Hide latency (50:1 or more offload)
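The 50:1 offload figure means the cache answers about fifty client requests for every request it forwards to back-end storage. The following is a minimal sketch of a generic LRU read-through cache that tracks that ratio. It illustrates the caching idea only and is not Avere's algorithm: it ignores writes, metadata, coherence, and tiering across DRAM and SSD, and the workload is hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class ReadThroughCache:
    """Minimal LRU read-through cache: serve hits locally, fetch misses from a
    slow backend, and keep only the most recently used ("active") items."""

    def __init__(self, capacity: int, backend_read: Callable[[Hashable], bytes]):
        self.capacity = capacity
        self.backend_read = backend_read
        self.entries: "OrderedDict[Hashable, bytes]" = OrderedDict()
        self.requests = 0
        self.backend_reads = 0

    def read(self, key: Hashable) -> bytes:
        self.requests += 1
        if key in self.entries:
            self.entries.move_to_end(key)      # mark as most recently used
            return self.entries[key]
        self.backend_reads += 1                # miss: pay the remote latency
        value = self.backend_read(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used
        return value

    def offload_ratio(self) -> float:
        """Client requests served per backend request (50.0 means 50:1)."""
        return self.requests / self.backend_reads if self.backend_reads else float("inf")


if __name__ == "__main__":
    cache = ReadThroughCache(capacity=100, backend_read=lambda k: f"data:{k}".encode())
    # Hypothetical workload: 20 compute nodes each read the same 50 hot files.
    for _node in range(20):
        for i in range(50):
            cache.read(f"/ref/chunk-{i}")
    print(f"offload {cache.offload_ratio():.0f}:1")  # 1000 requests / 50 misses -> offload 20:1
```

The offload ratio grows with how often the active working set is re-read and shrinks once that set no longer fits in the cache, which is the motivation for adding SSD capacity to improve the hit rate.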

Cloud Computing with Avere

Diagram: a physical Edge filer on-prem and a virtual Edge filer in the compute cloud, in front of on-prem storage (NAS, object) and the storage cloud (Bucket 1 through Bucket n).

Customer need
• Keep pace with growing demand
• Avoid disruptive upgrades
• Start small, grow huge at the “click of a button”

Recommended solution
• Scale-out NAS provided via clustering
• Performance scaling (more CPUs, DRAM)
• Capacity scaling (more SSD for a better hit rate)
• Cluster from 3 to 50 nodes
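A rough way to size a scale-out cluster is to divide the target load by what one node delivers, within the 3-to-50-node range above. This sketch assumes perfectly linear scaling, which real clusters only approximate. The per-node figures in the usage note are inferred from later slides in this deck (four Avere 3800 nodes for 200k IOPS suggests about 50,000 IOPS per node; 32 nodes for 1,000,000 IOPS suggests 31,250), not vendor specifications.

```python
import math

MIN_NODES, MAX_NODES = 3, 50  # cluster size range quoted on the slide


def nodes_for_target(target_iops: int, per_node_iops: int) -> int:
    """Smallest cluster that meets target_iops, assuming IOPS scale linearly
    with node count (an idealization) and respecting the 3-node minimum."""
    if target_iops <= 0 or per_node_iops <= 0:
        raise ValueError("IOPS figures must be positive")
    needed = max(MIN_NODES, math.ceil(target_iops / per_node_iops))
    if needed > MAX_NODES:
        raise ValueError(f"{needed} nodes exceeds the {MAX_NODES}-node maximum")
    return needed
```

For example, nodes_for_target(200_000, 50_000) returns 4, nodes_for_target(1_000_000, 31_250) returns 32, and a small target such as 10,000 IOPS still returns the 3-node minimum.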

Capacity
• Cloud benefits
  – Unlimited storage capacity
  – Simple to manage (no maintenance, easy scaling)
  – Pay only for what you use
• Cloud challenges
  – Protect investment in on-prem storage
  – Unfamiliar access protocol (e.g., S3 API)
  – Missing features
      • Security
      • Compression
      • Snapshots
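Part of what makes the object interface unfamiliar is that a bucket has a flat key space. “Directories” exist only as shared key prefixes, listed by grouping keys on a delimiter (as S3's ListObjectsV2 does with its Prefix and Delimiter parameters), and renaming a “directory” means rewriting every object under it. The in-memory sketch below, with hypothetical key names, emulates both behaviors to show what a file-system layer on top of object storage has to hide.

```python
from typing import Dict, List, Tuple


def list_objects(bucket: Dict[str, bytes], prefix: str = "",
                 delimiter: str = "/") -> Tuple[List[str], List[str]]:
    """Emulate S3-style listing over a flat key space.

    Returns (keys, common_prefixes): keys that sit directly "inside" prefix,
    plus rolled-up "subdirectory" prefixes, similar to the Contents and
    CommonPrefixes of a ListObjectsV2 response when a delimiter is given."""
    keys, prefixes = [], set()
    for key in sorted(bucket):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            keys.append(key)
        else:
            prefixes.add(prefix + rest[: cut + len(delimiter)])
    return keys, sorted(prefixes)


def rename_prefix(bucket: Dict[str, bytes], old: str, new: str) -> int:
    """'Rename a directory' on an object store: every object under the old
    prefix is copied to a new key and the old key deleted."""
    moved = [k for k in bucket if k.startswith(old)]
    for k in moved:
        bucket[new + k[len(old):]] = bucket.pop(k)
    return len(moved)
```

With keys galaxy-data/ref/hg19.fa, galaxy-data/ref/hg19.gtf, and galaxy-data/sample1.fastq, listing the prefix galaxy-data/ returns one key (sample1.fastq) and one common prefix (galaxy-data/ref/), and renaming galaxy-data/ref/ touches two objects. A NAS client would expect both to be single, cheap directory operations.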

Data Source: http://genome.gov/sequencingcosts

Slide Source: Chris Dagdigian, BioTeam, as presented in BioIT World webinar, April 2014

Use Case – Inova Health System: Enabling Personalized Medicine with a Hybrid Cloud

Customer challenges
– Scale to petabytes of capacity
– Scale performance for genomic analysis
– Contain cost (opex, not capex)
– Security & compliance

Avere + AWS benefits
– 6,300 whole genomes, 1.3 PB, and 7M files in AWS
– Genomic analysis results in hours, not days
– Saved more than $10 million
– Avere encryption & AWS HIPAA compliance

Diagram: GNS spanning AWS S3 and AWS EC2.

Use Case – Large Pharmaceutical Company

Public Cloud Services
• New apps, built for cloud
• FlashMove to public cloud
  – Lowest cost
  – Simplest management
• Centralized resources
  – Collaboration
  – Scalable, most efficient

On-Prem Resources
• Existing apps, no changes
• Data that cannot move to public cloud (security)
• FlashMove to object storage
  – Lower cost
  – Better scaling
• Geo-dispersal, no replication
  – Simpler to manage
  – More efficient

Diagram: a physical FXT in front of on-prem storage (NAS, object) and on-prem compute running existing apps; a virtual FXT serving a virtual compute farm running new apps; FlashMove moving data to object storage and to cloud buckets (Bucket 1 through Bucket n).

Use Case – Next Gen Sequencing at CDC

Customer challenges
– Isilon performance at scale
– Isilon cost
– Legacy Isilon
– Distributed users and storage
– Network saturation and latency

Avere benefits
– Scalable performance for SMB and NFS
– Reduced Isilon spend by 33%
– GNS provides a central mount point for all storage
– Hide network latency to remote labs
– Add public or private cloud in future

Diagram: Datacenter A (centralized, scalable storage) and Datacenter B (legacy), with EMC Isilon systems (old; 510PB) and FXT clusters (3 nodes; 11 nodes) under a GNS. Workflow: Generate (SMB) → Process (NFS) → Interpret (SMB). Remote labs use central storage, simplify management, and cache at the edge.
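A global namespace gives clients one mount point while the data lives on several back ends. One common way to build this, shown here as a hypothetical sketch rather than Avere's GNS implementation, is a junction table that maps directory prefixes to back-end filers and resolves each path by longest-prefix match. The junction paths and back-end names are invented for illustration.

```python
import posixpath
from typing import Dict, Tuple


def resolve(junctions: Dict[str, str], path: str) -> Tuple[str, str]:
    """Return (backend, path relative to its junction) for the longest
    junction that contains `path`. Junctions are absolute directory paths."""
    table = {posixpath.normpath(j): backend for j, backend in junctions.items()}
    path = posixpath.normpath(path)
    best = None
    for j in table:
        # A junction covers the path if it is the root, the path itself,
        # or a parent directory (the "/" check avoids /legacy matching /legacyX).
        inside = j == "/" or path == j or path.startswith(j + "/")
        if inside and (best is None or len(j) > len(best)):
            best = j
    if best is None:
        raise KeyError(f"no junction covers {path}")
    return table[best], posixpath.relpath(path, best)
```

With junctions {"/": "central-isilon", "/legacy": "legacy-isilon"}, the client path /legacy/run7/reads.fastq resolves to legacy-isilon at run7/reads.fastq, while /projects/a.txt falls through to central-isilon. Moving data between back ends then only changes the table, not the paths that applications use.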

Collaboration
• Cloud benefits
  – Centralized data
  – Accessible from many geographic regions
• Cloud challenges
  – Hide latency to centralized data
  – Read and write access coordination
      • One writer, many readers
      • Many writers, minimal sharing
      • Many writers to shared files
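The three sharing patterns differ in how much coordination they need. The toy sketch below enforces a simple per-file readers-writer rule across sites (any number of readers, or a single writer). It is a hypothetical illustration of the coordination problem, not a description of how Avere or any particular distributed file system implements it; real systems add timeouts, cache invalidation, and failure handling, and the site names are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class FileAccess:
    readers: Set[str] = field(default_factory=set)  # sites currently reading
    writer: Optional[str] = None                     # site holding write access


class AccessTable:
    """Toy per-file coordination: many readers OR one writer at a time."""

    def __init__(self) -> None:
        self.files: Dict[str, FileAccess] = {}

    def _entry(self, path: str) -> FileAccess:
        return self.files.setdefault(path, FileAccess())

    def acquire_read(self, path: str, site: str) -> bool:
        f = self._entry(path)
        if f.writer is not None and f.writer != site:
            return False  # another site is writing; caller must retry later
        f.readers.add(site)
        return True

    def acquire_write(self, path: str, site: str) -> bool:
        f = self._entry(path)
        other_readers = f.readers - {site}
        if f.writer not in (None, site) or other_readers:
            return False  # would conflict with another site's access
        f.writer = site
        return True

    def release(self, path: str, site: str) -> None:
        f = self._entry(path)
        f.readers.discard(site)
        if f.writer == site:
            f.writer = None


if __name__ == "__main__":
    table = AccessTable()
    print(table.acquire_read("/results/run1.bam", "boston"))   # True
    print(table.acquire_read("/results/run1.bam", "tokyo"))    # True
    print(table.acquire_write("/results/run1.bam", "boston"))  # False: tokyo is reading
```

“One writer, many readers” and “many writers, minimal sharing” rarely hit the False branches, while “many writers to shared files” hits them constantly, which is why that pattern is the hardest to serve across high-latency links.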

Hybrid Cloud NAS for Global Enterprises

Diagram: a primary datacenter connected to a remote office, a secondary/partner/colo datacenter, cloud computing (public/private), and cloud storage (public/private); edge deployment: virtual or physical.

Cost
• Cloud benefits
  – Economy of scale
  – Zero footprint, power, and cooling
  – Pay only for what you use (capacity & compute)
  – Simplified management
• Cloud challenge/change
  – Opex, not capex

Avere + Amazon TCO Example (TCO savings = $2,701,000)

Customer requirements
• 200k IOPS
• 200TB primary
• 1PB Amazon S3

Old way (NetApp or EMC)
• T1, 200k IOPS + 200TB primary: NetApp 6000 + SAS, or EMC Isilon S200
• T2, 1PB archive: NetApp 3000 + SATA, or EMC Isilon NL400
• Data movement: manual copy

New way (Avere + Amazon)
• 200k IOPS: Avere 3800 (4x)
• 200TB primary: private object or legacy NAS
• T2 AWS, 1PB storage: AWS S3 (1,000TB)
• Data movement: FlashMove

TCO Savings with Avere + Amazon

Cost                     NetApp or Isilon    Avere + Amazon S3
Storage acquisition      $2,067,000          $298,000
Service                  $930,000            $134,000
Amazon S3 (capacity)     $0                  $1,032,000
Amazon S3 (data out)     $0                  $298,000
Storage admin            $1,080,000          $196,000
Facilities & power       $264,000            $43,000
Data migration           $360,000            $0
3-year TCO               $4,701,000          $2,000,000

Avere + AWS TCO savings: $2,701,000 (57%)
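Summing the line items is a quick sanity check on the table. The NetApp or Isilon column adds up exactly to $4,701,000, while the Avere + Amazon S3 items add up to $2,001,000, $1,000 more than the stated 3-year total, presumably from rounding of individual line items. The quoted $2,701,000 (57%) savings follows from the stated $2,000,000 total.

```python
# Line items from the 3-year TCO table (US dollars):
# (NetApp or Isilon, Avere + Amazon S3)
TCO = {
    "Storage acquisition":  (2_067_000, 298_000),
    "Service":              (930_000, 134_000),
    "Amazon S3 (capacity)": (0, 1_032_000),
    "Amazon S3 (data out)": (0, 298_000),
    "Storage admin":        (1_080_000, 196_000),
    "Facilities & power":   (264_000, 43_000),
    "Data migration":       (360_000, 0),
}


def column_totals(table):
    """Sum each column of the (legacy, hybrid) line items."""
    legacy = sum(old for old, _ in table.values())
    hybrid = sum(new for _, new in table.values())
    return legacy, hybrid


def savings(legacy_total, hybrid_total):
    """Absolute savings and savings as a percentage of the legacy total."""
    saved = legacy_total - hybrid_total
    return saved, 100.0 * saved / legacy_total


if __name__ == "__main__":
    legacy, hybrid = column_totals(TCO)
    print(legacy, hybrid)                     # 4701000 2001000
    saved, pct = savings(legacy, 2_000_000)   # using the slide's stated total
    print(saved, round(pct))                  # 2701000 57
```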

Comparing 1,000,000 IOPS Solutions*

• EMC Isilon: $10.7 / IOPS
• NetApp: $5.1 / IOPS
• Avere: $2.3 / IOPS (Avere 32-node FXT cluster; core filer: NAS, public object, or private object)

Chart annotation: 150ms
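Multiplying each $/IOPS figure by the 1,000,000 IOPS target gives the implied total price of each configuration. The sketch also derives an average per-node rate for the 32-node Avere cluster, assuming the load is shared evenly across nodes (an assumption, not a figure from the slide).

```python
# Dollars per IOPS read off the comparison chart.
COST_PER_IOPS = {"EMC Isilon": 10.7, "NetApp": 5.1, "Avere": 2.3}
TARGET_IOPS = 1_000_000
AVERE_NODES = 32  # cluster size quoted for the Avere configuration


def total_cost(cost_per_iops: float, iops: int = TARGET_IOPS) -> float:
    """Total price of a configuration delivering `iops`, given $/IOPS."""
    return cost_per_iops * iops


def relative_to(baseline: str, other: str) -> float:
    """How many times more `other` costs than `baseline` at equal IOPS."""
    return COST_PER_IOPS[other] / COST_PER_IOPS[baseline]


if __name__ == "__main__":
    for vendor, rate in COST_PER_IOPS.items():
        print(f"{vendor:<10} ${total_cost(rate):>12,.0f}  "
              f"({relative_to('Avere', vendor):.1f}x Avere)")
    # Implied average per-node throughput if the 32 nodes share the load evenly.
    print(f"{TARGET_IOPS / AVERE_NODES:,.0f} IOPS per FXT node")
```

With the slide's figures this gives about $10.7M for Isilon (4.7x Avere), $5.1M for NetApp (2.2x), $2.3M for Avere, and an average of 31,250 IOPS per FXT node.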

Poll Question #2

What applications are you looking to deploy in the cloud?

A. Business apps (e.g. email, DB, knowledge base)

B. File services (e.g. file serving, home dirs, document management, patient management)

C. High-performance apps (e.g., genomic sequencing, imaging, bioinformatics)

D. Backup and archival only

E. Other (please specify in chat)

Real-world Demo: Life Sciences in Google Cloud

Cloud Bursting Use Case – Life Sciences

Proprietary & Confidential

Diagram: a physical FXT in front of on-prem storage (NAS) and on-prem compute; a virtual FXT serving a virtual compute farm in Cloud Compute; Cloud Storage.

Customer challenges
• Add compute resources at peak times
• Need for 3-6 months, no long-term commitment
• Do NOT want to rewrite applications
• May move data to the cloud for capacity scaling (later)

Avere benefits
• Virtual FXT provides scalable NAS in Cloud Compute
• Hide latency to on-prem NAS and object storage
• Easy setup, easy teardown
• Pay only for what is used
• Future: move data to the cloud for better economics

Cloud Bursting + Archive Use Case – Life Sciences

Diagram: a physical FXT in front of on-prem storage (NAS) and on-prem compute; a virtual FXT serving a virtual compute farm in Cloud Compute; Cloud Storage buckets (Bucket 1 through Bucket n).

Customer challenges
• Add compute resources at peak times
• Need for 3-6 months, no long-term commitment
• Do NOT want to rewrite applications
• Wishes to post results for general access in lower-cost Cloud Compute

Avere benefits
• Provide file system access to virtually unlimited cloud storage
• Provide a Global NameSpace (GNS) directory structure spanning on-prem and cloud for application continuity

Hybrid Galaxy-in-Cloud – Life Sciences

Diagram: a sequencer and on-prem compute with a physical FXT in front of on-prem storage (NAS); a virtual FXT serving a virtual compute farm in Cloud Compute; Cloud Storage buckets (Bucket 1 through Bucket n).

Customer challenges
• Never enough compute nodes on-prem
• Linear opex costs for new on-prem resources; seeking to reduce them
• Requires a file system and performance for continuity
• Storage flexibility must remain (for compliance, etc.)

Avere benefits
• Run applications entirely in the cloud and eliminate on-prem compute, regardless of organization size

Galaxy-in-Cloud – Life Sciences

Diagram: a virtual FXT serving a virtual compute farm in Cloud Compute, backed by Cloud Storage buckets (Bucket 1 through Bucket n), for a small company, research lab, etc.

Customer challenges
• Company or project too small to justify on-prem costs
• But needs a file system with sufficient performance
• Project duration may be short

Avere benefits
• No on-prem requirement
• Running nodes only when needed helps reduce overall costs
• The ultimate flexibility

Galaxy in Cloud
• Compute instance (8 CPUs, 8GB memory, 500GB disk or less) running Debian Linux, with Galaxy Server installed with all tools
• 3-node Avere vFXT cluster; the compute instance NFS-mounts the Avere cluster
• The vFXT maps to a single Cloud Storage bucket over the S3 protocol
• Namespace paths: /, /gcs, /galaxy-data (file system maintained by Avere for object storage)

Galaxy in Cloud: /etc/fstab
• /etc/fstab maps NFS to the vFXT environment as /mnt/galaxy-data
• Galaxy Server is installed locally, though it could also reside in Cloud Storage and be mounted via the vFXT
• Avere vFXT maps to Cloud Storage
• Avere vFXT exports the namespace
• Galaxy client is mounted on the FXT cluster
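The /etc/fstab step amounts to one NFS entry per mount. The helper below renders such an entry in the standard six-field fstab format (device, mount point, type, options, dump, fsck pass). The vFXT host name is hypothetical, the export path /galaxy-data is taken from the namespace above, and the mount options (hard, proto=tcp, vers=3) are assumed common NFSv3 choices; check the vFXT documentation for the values it actually recommends.

```python
def nfs_fstab_line(server: str, export: str, mount_point: str,
                   options: str = "defaults") -> str:
    """Return one /etc/fstab entry for an NFS mount.

    Fields: device (server:export), mount point, type, options,
    dump flag, fsck pass (0 = never fsck a network file system)."""
    for value in (server, export, mount_point, options):
        if not value or any(c.isspace() for c in value):
            raise ValueError(f"fstab fields may not be empty or contain whitespace: {value!r}")
    return f"{server}:{export}\t{mount_point}\tnfs\t{options}\t0\t0"


if __name__ == "__main__":
    # Hypothetical vFXT address; export and mount point follow the slide.
    print(nfs_fstab_line("vfxt.example.internal", "/galaxy-data",
                         "/mnt/galaxy-data", "hard,proto=tcp,vers=3"))
```

Appending the printed line to /etc/fstab and running mount /mnt/galaxy-data would attach the Galaxy data directory through the vFXT on each compute instance.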

Galaxy in Compute
• Reference data located in the bucket: reference genome data stored in a directory on the Cloud Bucket
• Step 1: upload FASTQ files (client uploads through the vFXT)
• Run TopHat2

The Power of Caching


• Multiple Galaxy clients calling the same reference genomes and the same output files
• Read/read-ahead caching ensures faster response and minimizes Cloud Bucket “hits”
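Read-ahead pays off when several clients scan the same large reference files. The toy model below (an unbounded block cache with hypothetical names, not the vFXT's actual caching logic) treats each miss as one ranged GET against the bucket that also prefetches the following blocks, and counts how many bucket requests four sequential readers generate.

```python
from typing import Callable, Dict, List


class ReadAheadCache:
    """Shared block cache: on a miss for block n, one backend request
    fetches blocks n .. n+window-1, anticipating a sequential scan."""

    def __init__(self, fetch_range: Callable[[int, int], List[bytes]], window: int = 8):
        if window < 1:
            raise ValueError("window must be >= 1")
        self.fetch_range = fetch_range
        self.window = window
        self.blocks: Dict[int, bytes] = {}
        self.backend_requests = 0

    def read(self, n: int) -> bytes:
        if n not in self.blocks:
            self.backend_requests += 1           # one round trip to the bucket
            for i, data in enumerate(self.fetch_range(n, self.window)):
                self.blocks[n + i] = data        # keep the prefetched blocks too
        if n not in self.blocks:
            raise IndexError(f"block {n} is past the end of the object")
        return self.blocks[n]


def make_object(num_blocks: int) -> Callable[[int, int], List[bytes]]:
    """Hypothetical stand-in for ranged GETs against one object in a bucket."""
    def fetch_range(start: int, count: int) -> List[bytes]:
        end = min(start + count, num_blocks)
        return [f"block-{b}".encode() for b in range(start, end)]
    return fetch_range


if __name__ == "__main__":
    ref = make_object(64)                # e.g. a 64-block reference genome file
    for window in (1, 8):
        cache = ReadAheadCache(ref, window)
        for _client in range(4):         # four Galaxy jobs scan the same file
            for b in range(64):
                cache.read(b)
        print(f"window={window}: {cache.backend_requests} bucket requests "
              f"(vs {4 * 64} with no cache)")
```

For a 64-block object read by four clients, the shared cache without read-ahead (window=1) issues 64 bucket requests instead of 256, and an 8-block window cuts that to 8.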

Thank you!

Questions?

For more information, visit: averesystems.com

Contact us at: 888.88.AVERE
