build hybrid storage architectures
TRANSCRIPT
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Build hybrid storage architectures with AWS Storage Gateway
STG305
Asa Kalavade
AWS Storage Gateway General Manager
Paul Reed
AWS Storage Gateway Principal Product Manager
Mohammad Shaikh
Director of Research Computing, Bristol-Myers Squibb
Oleg Moiseyenko
Sr. Cloud Architect, Bristol-Myers Squibb
… then you’ve come to the right session
Are you faced with these on-premises storage challenges
Growing backup infrastructure costs
Storage capacity limits
Limited access to in-cloud data
Use cases
Customer case study - BMS
New features deep dive
Storage Gateway overview
Summary
AWS Storage Gateway
Provides on-premises access to virtually unlimited cloud storage …
… regardless of cloud adoption stage
Move on-premises backups
to the cloud
Provide low latency access for
on-premises applications to
cloud data
Shift on-premises storage to
cloud-backed file shares
Tens of thousands of customers
PBs ingested
every day
Average 96% reduction of on-premises storage
100s of PBs managed in-cloud
AWS Storage Gateway
Managing rapidly growing customer datasets …
… and serving more customers every day
Some AWS Storage Gateway customers
Integrated with AWS Identity and Access Management
(IAM), AWS Key Management Service (AWS KMS),
AWS CloudTrail, Amazon CloudWatch services
AWS Storage Gateway
Configuration: VMware ESXi, Microsoft Hyper-V,
Amazon Elastic Compute Cloud (Amazon EC2),
Hardware Appliance
AWS Cloud
Customer premises
Files
(NFS/SMB)
Volumes
(iSCSI)
Tapes
(iSCSI VTL)
AWS Storage Gateway
Amazon S3 Glacier
Amazon S3
Amazon Elastic Block Store (Amazon EBS)
AWS Backup
Amazon S3 Glacier Deep Archive
Storage Gateway service
Storage Gateway
HTTPS
• Low latency cached access to data in Amazon S3
• Support for NFS (POSIX) and SMB file shares (Windows ACLs)
• One-to-one mapping between files and objects in S3
Features
File Gateway
Store and access objects in Amazon S3 from file-based applications with local caching
On-Premises
NFS & SMB
File Gateway
HTTPS
Amazon S3 bucket
Application
Storage Gateway service
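The one-to-one mapping between files and objects noted on this slide can be pictured with a minimal sketch. It assumes the object key is the file's path relative to the share root, optionally under a bucket prefix; the function name `object_key_for` and the mount path are hypothetical and not part of any AWS API.

```python
from pathlib import PurePosixPath


def object_key_for(share_root: str, file_path: str, prefix: str = "") -> str:
    """Return the S3 object key assumed to back a file on a gateway share.

    Assumption: the key is the file path relative to the share's mount point,
    with an optional key prefix configured on the share.
    """
    relative = PurePosixPath(file_path).relative_to(share_root)
    return f"{prefix}{relative.as_posix()}"


if __name__ == "__main__":
    # A file written under the (hypothetical) mount point /mnt/share
    print(object_key_for("/mnt/share", "/mnt/share/backups/db1.bak"))
```

Because each file corresponds to one object, data written through the share can be read by in-cloud tools directly from the bucket, without a proprietary container format.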
• Presents block storage over iSCSI in cached mode (recently accessed data) or stored mode (full volume)
• Cost-efficient incremental Amazon EBS snapshots of volumes managed through AWS Backup
• Compresses data between gateway and cloud to minimize storage charges
Features
Volume Gateway
Block storage on-premises backed by cloud storage
Storage
Gateway
service
On-Premises
iSCSI HTTPS
Volume
Gateway
Amazon EBS
snapshots
Application
• Emulates physical tape library through iSCSI-VTL protocol
• Compatible with most major backup applications
• Archive virtual tapes in S3 Glacier Deep Archive, lowest cost cloud storage, or S3 Glacier
Features
Tape Gateway
Learn more … STG217 – Shift your tape backups to AWS to save time and money
Tuesday, Dec 3, 5:30 PM - 6:30 PM
On-Premises
iSCSI VTL
Tape Gateway
HTTPS
Application
Storage Gateway service
Tape library (Amazon S3)
Tape shelf (S3 Glacier Deep Archive) OR (S3 Glacier)
File Gateway
Volume Gateway
Tape Gateway
What’s new since re:Invent 2018
Hardware appliance Enterprise features
Regions
• Currently available in 20
regions, including China
(Beijing), and GovCloud
(US-West)
Limited time incentive for Hardware Appliance
Cyber Monday
AWS Storage Gateway
Provides on-premises access to virtually unlimited cloud storage …
… regardless of cloud adoption stage
Move on-premises backups
to the cloud
Provide low latency access
for on-premises applications
to cloud data
Shift on-premises storage to
cloud-backed file shares
Move on-premises backups to the cloud
iSCSI VTL
AWS Cloud
File
Gateway
Volume
Gateway
Tape
Gateway
Storage
Gateway
Managed
Service
Database
Server
Application
Server
Backup
Server
iSCSI
NFS/SMB
Tape Library (Amazon S3)
Tape Archive (S3 Glacier / GDA)
Amazon S3
Amazon EBS
AWS Backup
HTTPS
HTTPS
HTTPS
On-premises
Any S3 storage class
lifecycle
Amazon S3
eject
Maintain your backup workflows while reducing your backup infrastructure on-premises
File Gateway for on-premises backup
Move database and file backups into the cloud and free up on-premises storage capacity
Features
NFS/SMB protocol support, mount shares directly
on database and application servers
Files stored durably in Amazon S3, lifecycle to any
S3 storage class
Local cache for accessing recent backups
Windows ACL support to control access to
backup files
Support for S3 Object Lock
Bandwidth-optimized, only changes are transferred
Reduce on-premises storage for backups
Easily integrates with SAP, SQL Server,
Oracle, HDFS, and other applications
Restore backups on-premises or in the
cloud on EC2 or RDS
Benefits
AWS Cloud
HTTPS
File Gateway
NFS/SMB
App/DB Server
Any S3 storage class
On-premises
Amazon S3
lifecycle
Volume Gateway for on-premises backup
Enable faster application recovery in-cloud or on-premises
AWS Cloud
HTTPS
On-premises
Volume
Gateway
Application
Server
iSCSI
Amazon S3
Amazon EBS
AWS Backup
Features Benefits
Present cloud-based iSCSI block storage volumes
to on-premises applications
On-premises cache of recently accessed data
Backup volumes as EBS snapshots
Integrates with AWS Backup to coordinate
volume backup and retention
Store volume backups securely
and reliably
Restore backups on-premises or
in the cloud as EBS volumes
Tape Gateway for on-premises backup
Replace physical tape infrastructure with virtual tape workflows
Features Benefits
iSCSI VTL interface compatible with leading
backup applications
Active tapes stored in Amazon S3
Ejected tapes stored in S3 Glacier or S3 Glacier
Deep Archive
Automatic fixity checking
Data compressed and encrypted, in-transit
and at-rest
Drop-in replacement for tape libraries,
tape media, and archiving services
Maintain existing backup workflows
Eliminate the hassles of physical tape
Store archived tapes durably and reliably
in Amazon S3 Glacier Deep Archive for
$1/TB/month
iSCSI VTL
AWS Cloud
Tape
Gateway
Backup
Server
Tape Library
S3 Glacier / S3
GDA
HTTPS
Amazon S3
Tape Archive
On-premises
eject
Backing up to physical tapes, sent off-site
Lengthy, unreliable recovery of data from tapes
No new backup budget approved
Couldn’t disrupt their existing operations
Problem
Solution
Outcome
EMC Networker connected to Tape Gateway
Backups stored in Virtual Tape Library (VTL)
on Amazon S3
Archive to Amazon S3 Glacier
No change in backup workflow
50% cost reduction
Parallel backups for one year, then turned off physical tape
Phased out off-site archive in 3 months
Analog Devices is a world leader in the design, manufacture, and marketing of a
broad portfolio of high performance analog, mixed-signal, and digital signal
processing (DSP) integrated circuits (ICs) used in virtually all types of electronic
equipment
Migrating datacenters & applications to AWS
Many on-premises databases and assets to migrate, backup & archive
High backup costs with commercial software
Install File Gateways for backup of SAP on Oracle
environments, hybrid backups, and archives of SQL
databases, Hadoop clusters, and other applications
Keep on-premises access to in-cloud data
~90% reduction in backup costs, eliminating
backup software
With a few TB of storage on premises, get access
to 100s of TB of storage and backups in cloud
Problem
Solution
Outcome
The world's leading cereal company, 2nd largest producer of cookies, crackers, and
savory snacks, and leading North American frozen foods company
Shift on-premises storage to cloud-backed file shares
Access virtually unlimited, highly durable cloud storage using common file protocols
Features Benefits
Supports NFS and SMB protocols—no application
changes required
Files stored durably in Amazon S3
SMB shares integrate with Active Directory
Amazon CloudWatch events for
automated workflows
Reduce costs by moving storage to Amazon
S3 and accessing on-premises
Virtually unlimited cloud storage—no more
running out of capacity
Eliminate expensive hardware refresh cycles
AWS Cloud
HTTPS
File Gateway
NFS/SMB
Application
On-premises
Amazon S3
NAS storage
Stacks of disk arrays on-premises were expensive and required a lot of space
Complex architecture and cache hierarchy
Many readers via NFS
Problem
Solution
Outcome
AWS DataSync to transfer bulk data and active
datasets to cloud
File Gateway for local access to cloud data
Active/active multi-region and versioning with
lifecycles
$1M bandwidth cost savings
Saved ~85% on storage, per location
Storage engineers focused on high-value activities
With more than 40,000 auto dealer clients across five continents, we strive to
understand your needs by pairing our insights and research with your business
goals – delivering inspired results to bridge the gap between consumers,
manufacturers, dealers and lenders at every stage of the automotive experience
Learn more … STG354 – Large-scale file migrations with AWS DataSync
Thursday, Dec 5, 3:15 PM - 4:15 PM
Low-latency access for on-premises applications to cloud data
Access files quickly from distributed locations and scale capacity as needed
Features Benefits
Generate data in-cloud or ingest from on-
premises using AWS DataSync or AWS Snowball
Up to 16 TB local cache per gateway
Fully-managed gateway cache provides low-
latency access to data
Refresh cache at the bucket or prefix level
Access cloud storage from any
on-premises location
Process data in the cloud and refresh
gateway cache for up-to-date results
Data stored cost effectively and centrally
in the cloud
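The bucket-level or prefix-level cache refresh listed above can be sketched as a small helper that builds the request parameters. This is a sketch under assumptions: the parameter names follow the Storage Gateway RefreshCache operation (`FileShareARN`, `FolderList`, `Recursive`), the leading-slash folder format is assumed, and the helper name and ARN are hypothetical placeholders.

```python
from typing import Iterable


def refresh_cache_params(file_share_arn: str,
                         prefixes: Iterable[str] = (),
                         recursive: bool = True) -> dict:
    """Build parameters for a cache refresh of a file share.

    With no prefixes the whole share ("/") is refreshed (bucket level);
    otherwise only the listed folders are refreshed (prefix level).
    """
    # Normalize each prefix to an assumed "/folder/subfolder" form.
    folders = ["/" + p.strip("/") if p.strip("/") else "/" for p in prefixes]
    return {
        "FileShareARN": file_share_arn,
        "FolderList": folders or ["/"],
        "Recursive": recursive,
    }


if __name__ == "__main__":
    # Hypothetical ARN, for illustration only.
    arn = "arn:aws:storagegateway:us-east-1:111122223333:share/share-EXAMPLE"
    print(refresh_cache_params(arn, ["results/2019/"]))
```

The resulting dictionary is shaped so it could be passed as keyword arguments to an SDK client's refresh call after in-cloud processing writes new objects to the bucket.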
AWS Cloud
Application
NFS/SMB
Cache refresh
HTTPS
Cache refresh
HTTPS
Application
NFS/SMB
On-premises
File Gateway
On-premises
File Gateway
In-cloud processing
AWS
DataSync
AWS
Snowball
Data stored on premises for regulatory and performance reasons
Moved application data to Amazon S3 but developers still need file-based access
Required high level of security, encryption, and scalability
Problem
Solution
Outcome
Deployed multiple file gateways to manage ready
access to cloud data
Use gateways for granular control over data stored
in Amazon S3
Preserve developer access to frequently used data
Use native tools with no proprietary formats
No coding required—works with existing protocols
and OS-level commands
The world's leading and most diverse derivatives marketplace
New features deep dive
Customers asked to Feature we delivered
• High availability for all gateway types running
on VMware
• Gateway health checks integrated with VMware
provide application level monitoring including:
• NFS/SMB file share availability
• iSCSI availability
• Configuration errors; e.g., read-only root disks
• Gateway restarts on service interruption
High availability on VMware: Feature overview and benefits
For VMware-based gateways running on premises or in VMware Cloud on AWS
• Enterprise workloads operate
uninterrupted
• VMware HA protects workloads against
hardware, hypervisor, and network
errors
• Gateway automatically recovers from
most service interruptions in under 60
seconds and maintains its local cache
What is it What are its benefits
How does it work
Gateway recovery for software, hardware, and datacenter failure scenarios
VMware Host
Software failure Hardware failure
VMware Host VMware Host
Datacenter failure
DR Datacenter
Corporate Datacenter
VMware Host VMware Host
• Real-time visibility into cache utilization, gateway
access patterns, and throughput and I/O metrics
through CloudWatch integration
• Administrators can monitor performance and
cache metrics to tune resources based on
application needs
• High “Cache Percent Dirty” can prompt an increase
in network allocation
• High “Cloud Traffic” can prompt an increase in
cache size
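The two tuning rules above can be expressed as a minimal sketch. The threshold values and the function name `tuning_hints` are illustrative assumptions, not AWS recommendations; only the direction of each rule comes from the slide.

```python
from typing import List


def tuning_hints(cache_percent_dirty: float,
                 cloud_traffic_mbps: float,
                 dirty_limit: float = 80.0,
                 traffic_limit_mbps: float = 500.0) -> List[str]:
    """Translate two gateway metrics into tuning suggestions.

    Thresholds are hypothetical defaults; choose values that match
    your own workload and network.
    """
    hints = []
    # Much dirty (not yet uploaded) data suggests the uplink is the bottleneck.
    if cache_percent_dirty > dirty_limit:
        hints.append("increase network allocation")
    # Heavy traffic to the cloud suggests the working set does not fit in cache.
    if cloud_traffic_mbps > traffic_limit_mbps:
        hints.append("increase cache size")
    return hints


if __name__ == "__main__":
    print(tuning_hints(cache_percent_dirty=92.0, cloud_traffic_mbps=120.0))
```

In practice the inputs would come from the gateway's CloudWatch metrics, and the same comparison could back a CloudWatch alarm rather than a script.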
For all environments
Monitor all of your gateways from the console
CloudWatch integration For all environments
Trigger actions and notifications based on events and metrics
Corporate datacenter AWS Cloud
NEW!
NEW!
Gateway software updates are managed
automatically for customers
Granular control over maintenance windows
to meet the uptime requirements of enterprise-
wide applications that need to operate
without interruption:
• Day of the week—available now
• Day of the month—available now
• Day of the week of the month—coming soon
• Day of every # weeks—coming soon
Additional maintenance window options
For all environments
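The two options marked available now (day of the week, day of the month) can be sketched as a parameter builder. The field names follow the Storage Gateway UpdateMaintenanceStartTime operation; the value ranges used here (0 to 6 with 0 as Sunday, and 1 to 28) are stated as assumptions to verify against the API reference, and the helper name and ARN are hypothetical.

```python
from typing import Optional


def maintenance_window_params(gateway_arn: str, hour: int, minute: int,
                              day_of_week: Optional[int] = None,
                              day_of_month: Optional[int] = None) -> dict:
    """Build parameters for a weekly or monthly maintenance window."""
    # Exactly one schedule style may be chosen.
    if (day_of_week is None) == (day_of_month is None):
        raise ValueError("set exactly one of day_of_week or day_of_month")
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    params = {"GatewayARN": gateway_arn, "HourOfDay": hour, "MinuteOfHour": minute}
    if day_of_week is not None:
        if not 0 <= day_of_week <= 6:  # assumed range, 0 = Sunday
            raise ValueError("day_of_week must be 0-6")
        params["DayOfWeek"] = day_of_week
    else:
        if not 1 <= day_of_month <= 28:  # assumed range
            raise ValueError("day_of_month must be 1-28")
        params["DayOfMonth"] = day_of_month
    return params


if __name__ == "__main__":
    # Hypothetical ARN, for illustration only.
    arn = "arn:aws:storagegateway:us-east-1:111122223333:gateway/sgw-EXAMPLE"
    print(maintenance_window_params(arn, hour=2, minute=30, day_of_week=0))
```

Validating the schedule locally keeps an invalid window from reaching the service, which matters when the window is chosen to protect application uptime.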
Customer case study: Bristol-Myers Squibb
Storage Gateway applications in life sciences
Mohammad Shaikh
Director of Research Computing, Bristol-Myers Squibb
Oleg Moiseyenko
Sr. Cloud Architect, Bristol-Myers Squibb
To discover, develop, and deliver
innovative medicines that help
patients prevail over serious diseases
Our mission
Scientific Computing Services
Major data sources
• Raw data from labs
• Scratch space
• Results data
• External collaborations
• Public & government agencies
• R&D
It’s all about data, Big Data
From GBs to PBs scale
Exponential growth
(Tens of PBs)
Scientific data sets
• NGS data
• Proteomics
• Flow Cytometry
• Imaging data
• High-throughput screening
• Mass spectrometry
• Databases
2016 2017 2018 2019 2020
Our data sources
High-velocity and continuous sources
• Illumina sequencers (Genomics data)
• Nuclear Magnetic Resonance (NMR)
• Many others
High-volume sources
• High-resolution mass spectrometer (Proteomics)
• AT2 tissue microscope (Histology)
• High content screening
Intermediate storage
• NAS drive, NFS-based metadata
• POSIX metadata captured only
• Business metadata: Relationships need to be enriched on S3
Hybrid file use cases: Data transfer, analytics, ML
Lab to Cloud (NMR, Histology, NGS)
• Instrument data
• Metadata catalog in the cloud
• Downstream analysis
Machine Learning (ML) analysis in cloud/visualization in Labs (Flow Cytometry)
• Instrument data to cloud
• ML-based analytics, unsupervised learning models
• Visualization of scientific data
Image management analysis in cloud
• Specialized scientific data formats
• Data enrichment
• Downstream analytics
1. Instruments write raw data into File Gateway file share
2. File Gateway transfers files to S3 buckets
3. Data Management system scans S3 buckets regularly
4. Applications request data via Data Management system meta catalog
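Steps 3 and 4 can be sketched as a periodic scan that registers newly arrived objects in a metadata catalog. This is an in-memory illustration only: the `S3Object` type, the dictionary catalog, and the convention that the first key segment names the instrument are all hypothetical and do not describe the actual BMS data management system.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class S3Object:
    """A minimal stand-in for one entry of a bucket listing."""
    bucket: str
    key: str
    size: int


def scan_into_catalog(catalog: Dict[str, dict],
                      listing: Iterable[S3Object]) -> List[str]:
    """Add objects not yet in the catalog and return their URIs."""
    added = []
    for obj in listing:
        uri = f"s3://{obj.bucket}/{obj.key}"
        if uri in catalog:
            continue  # already registered by an earlier scan
        # Hypothetical convention: the first key segment is the instrument.
        catalog[uri] = {"instrument": obj.key.split("/", 1)[0], "size": obj.size}
        added.append(uri)
    return added


if __name__ == "__main__":
    catalog: Dict[str, dict] = {}
    listing = [S3Object("lab-raw", "nmr/run-001.dat", 2048)]
    print(scan_into_catalog(catalog, listing))  # first scan registers the file
    print(scan_into_catalog(catalog, listing))  # repeat scan adds nothing
```

Making the scan idempotent lets it run on a fixed schedule without duplicating catalog entries, and applications then resolve data through the catalog rather than listing buckets themselves.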
Typical data flow diagram
AWS Direct
Connect
10 Gb/s
S3 buckets Data
Management
System
Applications
File Gateway
BMS
Scientific Instruments
1 2
3 4
AWS Storage Gateway in Image Discovery
AWS Direct
Connect
10 Gb/s
BMS AWS Cloud
S3 bucket A
S3 bucket B
…
S3 bucket N
S3 bucket N+1
S3 object store
S3 bucket 3
S3 bucket 2
S3 bucket 1
…
Data Management
System
(Metadata Catalog)
Image analysis
tools
S3 bucket
for transformed
images
Collaborator’s AWS Cloud
Image transformation
On premises
Scientific
instruments
Scientists
Images on local
server (NFS)
Images on local
server (NFS)
…
Images on local
server (NFS)
Local storage
layer
Image Metadata
database
…
Storage Gateway
Hardware appliance
AWS Snowball
Outcomes for BMS
Tech
Integration across standard protocols
Low-latency
Efficient data transfer
Easy to deploy: Virtual and hardware storage gateways
Data replication
Encryption in transit
Business
Cost and elasticity
Support many old and new applications
Overall simplicity
Effective workflows automation
Secure data sharing
Plan Storage Gateway deployment
Preparing for Storage Gateway
• S3 buckets
• Access policies
• File shares
• Mounting instructions
• Data transfers
Preparing for metadata catalog
• Collection names
• Directory names
• Data sources, daily volumes,
formats
• Business data tags and rules
• Access requirements
• Shared directory needs
• Data scan frequency
• Access to metadata catalog
AWS Storage Gateway hardware appliance
Appliance details
The hardware appliance comes with AWS Storage
Gateway software pre-installed on a validated
configuration of a Dell EMC PowerEdge R640XL server:
• 2 x Intel Xeon Silver 4114 2.20 GHz
processors with 10 cores each
• 128 GB DDR4 RAM
• 5 TB of usable enterprise SSD storage, with the
option to add 7 TB of usable enterprise SSD
storage for a total of 12 TB
• 4-port 10 Gigabit copper network card, with
the option to purchase and use a 4-port 10
Gigabit fiber-optic network card
• 3 years of hardware support from Dell—
accessed and coordinated through your
normal AWS support channels
1 2 3
4 5
Hardware appliance
Facts:
• You own it!
• Secure local installation
• Low latency
• Data compression
• Suitable for legacy applications
• Provide local applications access to S3 storage
• Price range: $12K–$16K USD
Current limitations:
• One gateway type per appliance
• 5 TB usable storage (extendable up to 12 TB)
• Software RAID
• Intel X710 4-port 10 Gigabit fiber optic network card
• AWS Direct Connect is recommended
• Local proxy servers
1 2 3
4 5
Lessons learned
• Optimizing AWS Gateway: compute, storage, cache size
• Do not oversubscribe the CPUs of the host server (4-16-24 vCPUs)
• Don’t mix upload buffer disks and cache storage
• Use high-performing RAID configuration for data store disks
• Cache disk configuration: Proxy server vs. Direct connect
• IP addresses, ports, and firewall rules
• Live test from actual scientific instruments
• Caution while sharing same S3 bucket through different AWS Storage Gateways
• Software-based RAID (no hardware RAID option?)
• Direct Connect links
• Storage Gateway, data governance and reliability
• Support channels and security
Preventing multiple file shares writing to S3 Bucket
When you create a file share, we
recommend that you configure your
Amazon S3 bucket so that only one
file share can write to it
If you configure your S3 bucket
to be written to by multiple file
shares, unpredictable results
can occur
To prevent this, create an S3 bucket
policy that denies all roles except the
role used for the file share to put or
delete objects in the bucket
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyMultiWrite",
      "Effect": "Deny",
      "Principal": "*",
      "Action": ["s3:DeleteObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::TestBucket/*",
      "Condition": {
        "StringNotLike": {"aws:userid": "TestUser:*"}
      }
    }
  ]
}
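When several buckets need the same guard, the policy can be generated rather than hand-edited. The sketch below reproduces the policy from this slide with the bucket name and the allowed identity as parameters; the function name `deny_multi_write_policy` is hypothetical, and `TestBucket` and `TestUser` are the placeholders from the slide.

```python
import json


def deny_multi_write_policy(bucket: str, allowed_id: str) -> dict:
    """Build a bucket policy that denies put and delete to every identity
    except the one used by the file share."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyMultiWrite",
                "Effect": "Deny",
                "Principal": "*",
                "Action": ["s3:DeleteObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # Every caller whose user ID does not match is denied.
                "Condition": {
                    "StringNotLike": {"aws:userid": f"{allowed_id}:*"}
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(deny_multi_write_policy("TestBucket", "TestUser"), indent=2))
```

Generating the document from one function keeps the deny rule identical across buckets, so only the file share's own role can write or delete in each one.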
Asa Kalavade
AWS Storage Gateway General Manager
Paul Reed
AWS Storage Gateway Principal Product Manager
Question Time
Mohammad Shaikh
Director of Research Computing, Bristol-Myers Squibb
Oleg Moiseyenko
Sr. Cloud Architect, Bristol-Myers Squibb
Take action
Deploy a Storage
Gateway VM
Learn more … aws.amazon.com/storagegateway
Start using cloud
storage on-premises
Try it out
File
(NFS/SMB)
Volume
(iSCSI)
Tape
(iSCSI VTL)
Choose your
Gateway Type
With Amazon S3, Amazon S3
Glacier, Amazon S3 Glacier
Deep Archive, and Amazon EBS
Learn more about hybrid cloud storage in these sessions
• STG231 Lift and shift your tape-based backup workflows to AWS
• STG226 Hands-on with hybrid block storage using a Volume Gateway
• STG217 Shift your tape backups to AWS to save time and money
• STG213 Storage for hybrid cloud and edge computing: Bring AWS to you
• STG313 Hybrid architectures for database backups & file migrations
• STG336 Using hybrid cloud storage to close a data center and migrate
Thank you!
Paul Reed
Asa Kalavade