(STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Henry Zhang, Senior Product Manager, Amazon Glacier
October 2015
Amazon Glacier Deep Dive
STG312
Audio archives – SoundCloud
• World’s leading social sound platform
• Audio files transcoded and stored in multiple formats
• Stores PBs of data
• Transcoded files served from Amazon S3
• Originals moved to Amazon Glacier for long-term retention
Video archives – Sony Media Cloud (Ci)
Amazon Glacier
Tape replacement – King County
• Most populous county in Washington State
• Replace tape solution for backup from 17 agencies
• Meet compliance requirement
• Saved $1MM in first year, no more tape refresh or
management churn
Archive:
Data retained for the long term,
for compliance or potential
future reference
Data archiving needs are growing everywhere
• Media assets, 4K, 8K
• Health care / Life sciences
• Financial services
• Regulated industries
• Oil and gas / Geospatial
• Digital preservation
• Long-term backups
• Logs
Traditional archiving approaches
• Tape silos / Tape libraries
• Tape drives (LTO-X / DLT / etc.)
• Virtual tape libraries (VTLs)
• Tape out / Vaulting
• Specialized software & personnel
How can Amazon Glacier help with your archival?
Metered usage:
Pay as you go
No capital investment
No commitment
No risky capacity planning
Avoid risks of physical media handling
Control your geographic locality for performance and compliance
Amazon Glacier is a low-cost storage service for
archival data with long-term retention requirements.
$0.007/GB per month
3-5 hour data retrieval
Financial records
Medical PACS images
High-res media assets
How can Amazon Glacier help with your archival?
Extremely low-cost archive storage service, starting at $0.007/GB per month
Allows you to retrieve data within 3-5 hours
99.999999999% durability (7 orders of magnitude higher than 2 copies of tape)
No data migration, no hardware/infrastructure investments
Infinite scale and pay for what you use
Access to on-demand compute resources on AWS
Getting started – key concepts
• Account – Access AWS services, view billing/usage, manage security
• Vaults – Container for archives, up to 1000 vaults per account
• Archives – Files and records, write-once, 40TB max, unlimited archives
• Inventory – Cold index of archive properties refreshed every 24 hours
Amazon Glacier – 3 ways to Access
• Direct Glacier API/SDK
• S3 lifecycle integration
• Third-party tools and gateways
Amazon Glacier concepts: Uploading data
1. Create vault (Films)
2. Configure access policies
   ArchiveApp user policy
   Effect: Allow
   Resource: arn:aws:glacier:<region>:<accountId>:vaults/Films
   Action: glacier:UploadArchive
3. Upload archives: UploadArchive(data) -> Archive ID
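The ArchiveApp user policy on this slide can be written out as an IAM policy document. The following is a minimal sketch, not a definitive policy: Glacier vault ARNs also carry a region, so the region and account ID below are placeholders that do not come from the slide, and the helper name `upload_only_policy` is hypothetical.

```python
import json


def upload_only_policy(region, account_id, vault_name):
    """Build an IAM policy document allowing only glacier:UploadArchive on one vault."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "glacier:UploadArchive",
                "Resource": f"arn:aws:glacier:{region}:{account_id}:vaults/{vault_name}",
            }
        ],
    }


# Placeholder region and account ID, for illustration only.
policy = upload_only_policy("us-west-2", "123456789012", "Films")
print(json.dumps(policy, indent=2))
```

Scoping the statement to a single action and a single vault keeps the upload application from reading or deleting archives.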
Amazon Glacier concepts: Retrieving data
1. Initiate job: ArchiveId: AE99F…, Vault: Films -> Job ID
2. 3-5 hours for job completion
3. Job completion notification
4. Download output
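Step 1 of the retrieval flow amounts to submitting a small job description. The sketch below only builds that description; the field names follow the Glacier InitiateJob request as I understand it and should be checked against the current API reference. The archive ID, SNS topic ARN, and helper name are placeholders, not values from the talk.

```python
def archive_retrieval_job(archive_id, sns_topic_arn=None, description=None):
    """Build job parameters for an archive-retrieval job (sketch, not executed against AWS)."""
    params = {"Type": "archive-retrieval", "ArchiveId": archive_id}
    if sns_topic_arn is not None:
        # Topic that receives the job-completion notification (step 3).
        params["SNSTopic"] = sns_topic_arn
    if description is not None:
        params["Description"] = description
    return params


# Placeholder identifiers, for illustration only.
job = archive_retrieval_job(
    "EXAMPLE-ARCHIVE-ID",
    sns_topic_arn="arn:aws:sns:us-west-2:123456789012:glacier-jobs",
    description="Restore one film master",
)
print(job)
```

Supplying a notification topic avoids polling during the 3-5 hour wait; the job output is downloaded only after the notification arrives.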
Amazon Glacier – Amazon S3 lifecycle archival
• Seamlessly move data from Amazon S3 to Amazon Glacier
• Automated lifecycle rules
• Transition based on object age or predefined date
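A lifecycle rule of this kind can be sketched as a plain configuration document. This is an assumption-laden example: the rule ID, prefix, and 30-day age are made up, the helper name is hypothetical, and the dictionary layout follows the S3 lifecycle configuration format as I understand it.

```python
def glacier_transition_rule(rule_id, prefix, days=None, date=None):
    """Build one lifecycle rule that transitions objects to Glacier by age or on a fixed date."""
    if (days is None) == (date is None):
        raise ValueError("specify exactly one of days or date")
    transition = {"StorageClass": "GLACIER"}
    if days is not None:
        transition["Days"] = days   # transition based on object age
    else:
        transition["Date"] = date   # transition on a predefined date (ISO 8601 string)
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [transition],
    }


# Hypothetical rule: move objects under originals/ to Glacier 30 days after creation.
lifecycle = {"Rules": [glacier_transition_rule("archive-originals", "originals/", days=30)]}
print(lifecycle)
```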
Amazon Glacier – Backup software integration
• CommVault – Native integration with Amazon S3 & Amazon Glacier
• Deduplication & encryption
• Single console management
Amazon S3 Amazon Glacier
Amazon Glacier – Third-party tools and gateways
• Consumer grade: less than $50
  – Examples: CloudBerry, FastGlacier, Arq (Haystack Software)
• Small / medium business: $500 - $1,000
  – Examples: Synology, Veeam, QNAP
• Enterprise-grade gateway (price varies)
  – Example: NetApp AltaVault
Best practices – Prepare your data
Use Archive descriptions
• Use the archive description field for metadata.
• If the local index is corrupted or destroyed, use archive descriptions to reconstruct critical mappings.
• For example, create an index entry, then add its primary key to the archive description on upload.
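The description-as-metadata practice above can be sketched as follows. This is one possible encoding, not a prescribed one: the JSON layout, the `pk` and `name` field names, and both helper names are assumptions. Glacier limits archive descriptions to 1,024 printable ASCII characters, which the sketch checks.

```python
import json

MAX_DESCRIPTION_LEN = 1024  # Glacier's limit on archive description length


def make_description(primary_key, filename):
    """Encode the local index's primary key into an archive description."""
    # json.dumps escapes non-ASCII characters by default, keeping the result ASCII-only.
    desc = json.dumps({"pk": primary_key, "name": filename}, separators=(",", ":"))
    if len(desc) > MAX_DESCRIPTION_LEN:
        raise ValueError("description exceeds 1,024 characters")
    return desc


def rebuild_index(inventory_entries):
    """Reconstruct a primary-key -> archive mapping from (archive_id, description) pairs."""
    index = {}
    for archive_id, description in inventory_entries:
        meta = json.loads(description)
        index[meta["pk"]] = {"archive_id": archive_id, "name": meta["name"]}
    return index


# Hypothetical inventory contents, for illustration only.
entries = [("archive-001", make_description(42, "master_0042.mov"))]
print(rebuild_index(entries))
```

If the local index is lost, the vault inventory returns each archive's ID and description, which is enough to rebuild the mapping.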
Small objects and object size overhead
• Every archive has 32 KB of associated overhead, and some operations are charged per request
• For a 3.2 MB archive, overhead is about 1% of cost
• For a 1 KB archive, 97% of cost goes to overhead
• Solution is aggregation: recommended minimum archive size is on the order of megabytes
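The percentages on this slide follow from dividing the 32 KB overhead by the total billed size. A small sketch of that arithmetic, assuming storage cost is proportional to billed bytes and ignoring per-request charges (the function name is illustrative):

```python
OVERHEAD_KB = 32  # per-archive overhead stated on the slide


def overhead_share(archive_kb):
    """Fraction of billed storage that is overhead for an archive of the given size."""
    return OVERHEAD_KB / (archive_kb + OVERHEAD_KB)


for size_kb in (1, 3200, 100_000):
    print(f"{size_kb:>7} KB archive: {overhead_share(size_kb):.2%} overhead")
```

For a 1 KB archive the share is 32/33, about 97%; for a 3.2 MB (3,200 KB) archive it is 32/3,232, about 1%.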
Archive aggregation
[Diagram: an archive built from an index/directory followed by File 1, File 2, …, each stored with its checksum and metadata. A local index records each file's offset (File 1 offset, File 2 offset, File 3 offset, …) and checksum (Checksum 1, Checksum 2, Checksum 3, …).]
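Archive aggregation can be sketched as concatenating files into one blob while recording each file's offset, length, and checksum in a local index. This is a simplified illustration: it keeps the index only locally (the slide's diagram also places an index inside the archive), uses SHA-256 as an assumed checksum, and the function names are hypothetical.

```python
import hashlib


def aggregate(files):
    """Concatenate files (name -> bytes) into one archive blob plus a local index."""
    blob = bytearray()
    index = {}
    for name, data in files.items():
        index[name] = {
            "offset": len(blob),
            "length": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        }
        blob += data
    return bytes(blob), index


def extract(archive, index, name):
    """Slice one file back out of the archive and verify its checksum."""
    entry = index[name]
    data = archive[entry["offset"]:entry["offset"] + entry["length"]]
    if hashlib.sha256(data).hexdigest() != entry["sha256"]:
        raise ValueError(f"checksum mismatch for {name}")
    return data


archive, local_index = aggregate({"a.log": b"first file", "b.log": b"second file"})
print(local_index["b.log"]["offset"], extract(archive, local_index, "b.log"))
```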
Best practices – Optimize upload
Best practices: Multipart uploads
Improve throughput, reliability, and get idempotency with multipart uploads
1. InitiateMultipartUpload(partSize) → uploadId
2. UploadPart(uploadId, data)
3. CompleteMultipartUpload(uploadId) → archiveId
[Diagram: an archive split into parts, with the parts uploaded in parallel]
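The three-step multipart flow can be sketched with an in-memory stand-in for the service. `InMemoryVault` is hypothetical and is not the Glacier API or SDK; it only mirrors the initiate, upload-part, complete sequence to show why parts can be sent in parallel and safely retried.

```python
import hashlib
import uuid
from concurrent.futures import ThreadPoolExecutor

MIB = 1024 * 1024


class InMemoryVault:
    """Hypothetical stand-in for the service, used only to illustrate the flow."""

    def __init__(self):
        self.uploads = {}
        self.archives = {}

    def initiate_multipart_upload(self, part_size):
        upload_id = uuid.uuid4().hex
        self.uploads[upload_id] = {"part_size": part_size, "parts": {}}
        return upload_id

    def upload_part(self, upload_id, offset, data):
        # Parts are keyed by byte offset, so re-sending a part overwrites it (idempotent).
        self.uploads[upload_id]["parts"][offset] = data

    def complete_multipart_upload(self, upload_id):
        parts = self.uploads.pop(upload_id)["parts"]
        blob = b"".join(parts[offset] for offset in sorted(parts))
        archive_id = hashlib.sha256(blob).hexdigest()  # made-up ID scheme for the sketch
        self.archives[archive_id] = blob
        return archive_id


def multipart_upload(vault, data, part_size=MIB, workers=4):
    """Upload data in fixed-size parts, sending the parts concurrently."""
    upload_id = vault.initiate_multipart_upload(part_size)

    def send(offset):
        vault.upload_part(upload_id, offset, data[offset:offset + part_size])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(send, range(0, len(data), part_size)))
    return vault.complete_multipart_upload(upload_id)
```

Because each part is addressed by its position, a failed part can be retried on its own without restarting the whole upload.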
Best practices: Data ingestion options
• AWS Direct Connect – Dedicated bandwidth between your site and AWS
• Internet – Transfer data in a secure SSL tunnel over the public Internet
• AWS Import/Export Snowball – Physical transfer of media into and out of AWS
Best practices – Cost management
Amazon Glacier – Data retrieval policies
• Provides transparency and cost control for data retrievals
• Governs all retrieval activities for an account in a region
• Synchronously accept/reject each retrieval request
• Accounts for inflight retrieval operations
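The accept/reject behavior described above can be pictured with a toy admission check. This is purely an illustrative model under simplifying assumptions (a single hourly byte cap and no time window handling); it is not the service's actual algorithm, and the class name is hypothetical.

```python
class RetrievalBudget:
    """Toy model of a bytes-per-hour retrieval cap."""

    def __init__(self, max_bytes_per_hour):
        self.max_bytes_per_hour = max_bytes_per_hour
        self.inflight_bytes = 0

    def request(self, size_bytes):
        """Accept or reject synchronously, counting retrievals already in flight."""
        if self.inflight_bytes + size_bytes > self.max_bytes_per_hour:
            return False
        self.inflight_bytes += size_bytes
        return True

    def complete(self, size_bytes):
        """Release the budget held by a finished retrieval."""
        self.inflight_bytes -= size_bytes


budget = RetrievalBudget(max_bytes_per_hour=10 * 1024**3)  # hypothetical 10 GiB/hour cap
print(budget.request(6 * 1024**3), budget.request(6 * 1024**3))
```

The second request is rejected up front because the first is still in flight, which is what keeps retrieval charges within the configured limit.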
Cost allocation with vault tags
Best practices – Security and compliance
Amazon Glacier – Audit logging with AWS CloudTrail
• Enable AWS CloudTrail in the console
• Control plane events – vault activities
• Data plane events – archive activities
Vault access policies
• Manage access to a vault in a single location: one policy attached to the vault
– Grant/revoke access to internal business units/teams
– “Marketing_Vault” has a different access policy from “DevOps_Vault”
• Easily manage cross-account access for your business partner
– Simply add a section for your business partner in the same policy
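A vault access policy with one section per grantee might look like the sketch below. The account IDs, statement IDs, and chosen actions are placeholders, and the helper name is hypothetical; the structure follows the general IAM resource-policy format.

```python
def vault_access_policy(vault_arn, grants):
    """Build a vault access policy from (sid, principal_arn, actions) tuples."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": sid,
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": actions,
                "Resource": vault_arn,
            }
            for sid, principal_arn, actions in grants
        ],
    }


# Placeholder ARNs: an internal team plus a business partner in another account.
access_policy = vault_access_policy(
    "arn:aws:glacier:us-west-2:123456789012:vaults/Marketing_Vault",
    [
        ("marketing-team", "arn:aws:iam::123456789012:root", ["glacier:UploadArchive"]),
        ("partner-read", "arn:aws:iam::111122223333:root",
         ["glacier:InitiateJob", "glacier:GetJobOutput"]),
    ],
)
print(len(access_policy["Statement"]))
```

Revoking the partner's access then means removing one statement from this single document.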
Amazon Glacier Vault Lock allows you to easily
set compliance controls on individual vaults and
enforce them via a lockable policy.
Time-based retention
MFA Authentication
Controls govern all
records in a Vault
Immutable policy
Two-step locking
Compliance Storage with Vault Lock
Vault Lock for compliance storage
• Non-overwrite, non-erasable records
• Time-based retention with “ArchiveAgeInDays” control
• Policy lockdown (strong governance)
• Legal hold with vault-level tags
• Configure optional designated third-party access and grant
temporary access
Example control: 1 year record retention
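A one-year retention control can be expressed as a deny rule on archive deletion, conditioned on archive age. The sketch below builds such a policy document; the vault ARN and statement ID are placeholders, the helper name is hypothetical, and the exact policy shown on the original slide is not reproduced here.

```python
def retention_lock_policy(vault_arn, retention_days):
    """Deny archive deletion until each archive is at least retention_days old."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-delete-before-retention",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": vault_arn,
                "Condition": {
                    "NumericLessThan": {"glacier:ArchiveAgeInDays": str(retention_days)}
                },
            }
        ],
    }


# Placeholder vault ARN; 365 days corresponds to the one-year example.
lock_policy = retention_lock_policy(
    "arn:aws:glacier:us-west-2:123456789012:vaults/examplevault", 365
)
print(lock_policy["Statement"][0]["Condition"])
```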
Vault Lock: Two-step locking
Legal hold with vault-level tags
Example control: Legal hold
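A legal hold driven by a vault-level tag can be sketched as a deny rule that applies while the tag is set. The tag key `LegalHold`, the vault ARN, and the helper name are assumptions for illustration; the condition key format should be checked against the Glacier developer guide.

```python
def legal_hold_statement(vault_arn, tag_key="LegalHold"):
    """Deny archive deletion while the vault carries the legal-hold tag."""
    return {
        "Sid": "deny-delete-under-legal-hold",
        "Principal": "*",
        "Effect": "Deny",
        "Action": "glacier:DeleteArchive",
        "Resource": vault_arn,
        "Condition": {
            # Matches the tag set to "true" or present with an empty value.
            "StringLike": {f"glacier:ResourceTag/{tag_key}": ["true", ""]}
        },
    }


hold = legal_hold_statement("arn:aws:glacier:us-west-2:123456789012:vaults/examplevault")
print(hold["Condition"])
```

Removing the tag from the vault lifts the hold without modifying the policy itself.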
Vault lock best practices
Vault access policy
• Can be updated/deleted
Vault lock policy
• Lockable/immutable policy
• Cannot be updated/deleted after lockdown
Use vault access policy to:
• Designate third-party access
• Grant temporary read permissions when necessary
Use vault lock policy to:
• Deploy regulatory controls such as records retention
• Enforce data access through multi-factor authentication only
Compliance/Governance Flexibility
Using vault lock policy with vault access policy
Vault Lock in the Glacier Console
Amazon Glacier received a third-party assessment
from Cohasset Associates on how Amazon Glacier
with Vault Lock can be used to meet the
requirements of SEC 17a-4(f) and CFTC 1.31(b)-(c).
Thank you!
Remember to complete
your evaluations!