AWS Webcast - Archiving in the Cloud - Best Practices for Amazon Glacier
DESCRIPTION
Join our webinar to learn more about how to build a cost-effective archive application using Amazon Glacier, an extremely low-cost, secure, highly durable, and easy-to-use storage service in the AWS cloud. We will explain how Amazon Glacier works and walk through some best practices to get the most out of the service. We will also highlight how to choose between Amazon Glacier and Amazon S3’s Glacier storage option. Learn more: http://aws.amazon.com/glacier/
TRANSCRIPT
Archiving in the Cloud
Best Practices for Amazon Glacier
Colin Lazier & Henry Zhang
What We’ll Cover Today
Overview of Amazon Glacier
Amazon Glacier Key Concepts
Key Use Cases and Benefits
Best Practices with Amazon Glacier
Q & A
Overview of Amazon Glacier
With Amazon Glacier, You Can:
Achieve extremely low storage costs for archive data
Pay only for what you use
No longer maintain your own physical storage infrastructure
Increase durability and geographic redundancy
Secure your data
Access on-demand computing with Amazon EC2
What is Archival Data?
Most data stored is infrequently accessed (Cold Data)
Often older data that is still important for future reference
Typically long-lived (months or years)
Business and regulatory reasons to retain data
What is Amazon Glacier?
Extremely low cost archive storage service
Allows you to retrieve any amount of data within 3-5 hours
Provides high-durability storage
Makes it easy to retain data safely and securely for months, years, or decades
Benefits with Amazon Glacier
Low cost: As little as $0.01/GB/month with no up-front capital commitments.
Secure: Leverage AWS’ robust security platform. Control access to your data.
Durable: Designed to provide an average annual durability of 99.999999999% per archive.
Simple: Eliminate your operational overhead. Focus your resources on your core business.
Use multiple services: Easily leverage other AWS services once your data is in the AWS cloud.
Flexible: Store any amount of data on-demand. Eliminate the need for capacity planning.
Customer Data Archiving Examples
Media Archives: Media companies’ core assets (books, movies, music, TV, etc.) can grow to hundreds of petabytes. Amazon Glacier reduces the cost of storing these assets while simultaneously increasing the durability, ease of use, and accessibility of the content.
Enterprise Archives: Enterprise Information Archiving includes archiving email, business documents, and other unstructured content. It is driven by business needs, compliance requirements, and the need to reduce primary storage costs.
Scientific Archives: Research and scientific organizations, such as pharmaceutical and bio-tech companies, as well as universities, store many large but rarely accessed data sets.
Amazon Glacier Key Concepts
High-level Amazon Glacier Architecture
[Diagram: components and labels]
Archive Application (Search, Policy-based data management, eDiscovery)
Index (Index of your archived data)
RDS
Amazon IAM: Control access to your data
Amazon Glacier: Send + receive data via HTTP / REST APIs / AWS Import/Export
Amazon Glacier Concepts
Archives
An archive is a durably stored block of information. You store your data in
Amazon Glacier as archives. You may upload a single file as an archive,
but your request costs will be lower if you aggregate your data. TAR and
ZIP are common formats that customers use to aggregate multiple files into
a single file before uploading to Amazon Glacier.
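As a rough illustration of the aggregation step described above, the following sketch packs several small files into one TAR bundle in memory before it would be uploaded as a single archive. The function names `bundle_files` and `list_members` are hypothetical helpers introduced here, and the upload itself is deliberately left out.

```python
import io
import tarfile


def bundle_files(files: dict[str, bytes]) -> bytes:
    """Pack named byte payloads into one in-memory TAR bundle (hypothetical helper)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        # Sort names so the bundle layout is deterministic.
        for name, payload in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def list_members(bundle: bytes) -> list[str]:
    """Return the file names stored in a TAR bundle, e.g. to record them in your index."""
    with tarfile.open(fileobj=io.BytesIO(bundle), mode="r") as tar:
        return tar.getnames()


if __name__ == "__main__":
    bundle = bundle_files({"b.txt": b"two", "a.txt": b"one"})
    print(list_members(bundle))
```

Recording the member names alongside the resulting archive ID in your own index is one way to find a small file again later without retrieving every bundle.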
Vaults
You use vaults to organize the data you store in Amazon Glacier. Each
archive is stored in a vault of your choice. You may control access to your
data by setting vault-level access policies.
Uploading Data to Amazon Glacier
1 Create Vault
Configure Access Policies (Optional) via Amazon Identity and Access Management
2 Upload Archives
3 Retrieve Archives
Initiate Job, Track Job, Download Job Output
Archives are retrieved 3 - 5 hours after being requested
Retrieving Data from Amazon Glacier
1 Create Vault
Configure Access Policies (Optional) via Amazon Identity and Access Management
2 Upload Archives
3 Retrieve Archives
Initiate Job, Track Job, Download Job Output
Archives are retrieved 3 - 5 hours after being requested
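The Initiate Job, Track Job, Download Job Output sequence can be sketched as a simple polling loop. This is a minimal sketch under stated assumptions: `FakeJobClient` and its methods `initiate_job`, `is_complete`, and `get_job_output` are hypothetical stand-ins named after the steps on the slide, not an actual SDK interface, and the 15-minute polling interval is an arbitrary choice.

```python
import time
from typing import Callable


class FakeJobClient:
    """Hypothetical stand-in: the job completes after a fixed number of status checks."""

    def __init__(self, payload: bytes, checks_until_done: int = 2) -> None:
        self._payload = payload
        self._checks_until_done = checks_until_done
        self._checks = 0

    def initiate_job(self, archive_id: str) -> str:
        return "job-1"  # placeholder job ID

    def is_complete(self, job_id: str) -> bool:
        self._checks += 1
        return self._checks >= self._checks_until_done

    def get_job_output(self, job_id: str) -> bytes:
        return self._payload


def retrieve_archive(
    client: FakeJobClient,
    archive_id: str,
    poll_seconds: float = 900.0,  # assumed interval; retrieval takes hours
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Initiate a retrieval job, wait until it completes, then download the output."""
    job_id = client.initiate_job(archive_id)   # 1. Initiate Job
    while not client.is_complete(job_id):      # 2. Track Job
        sleep(poll_seconds)
    return client.get_job_output(job_id)       # 3. Download Job Output


if __name__ == "__main__":
    demo = FakeJobClient(b"archived bytes")
    print(retrieve_archive(demo, "archive-1", poll_seconds=0.0))
```

Polling is used here only for brevity; a notification-driven design would avoid the loop entirely.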
Sending / Retrieving Data
• Glacier REST-based APIs to send and retrieve data
• Direct Connect
• Amazon S3 lifecycle archival to Amazon Glacier
Additional Amazon Glacier / AWS Concepts
Vault Inventory
For a real-time view of the contents of your vaults, you would refer to your own index. For disaster recovery purposes, in case you lose or corrupt your index, Amazon Glacier maintains an inventory of all your archives in a vault. The vault inventory is updated approximately once a day.
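To make the relationship between your index and the vault inventory concrete, here is a small sketch that compares the archive IDs in a local index against an inventory document. It assumes the inventory is JSON with an `ArchiveList` array whose entries carry an `ArchiveId` field; the function name `reconcile` and the sample IDs are hypothetical.

```python
import json


def reconcile(index_ids: set[str], inventory_json: str) -> tuple[set[str], set[str]]:
    """Compare a local index with a vault inventory document.

    Returns (IDs only in the inventory, IDs only in the index).
    Assumes the inventory JSON has an "ArchiveList" of objects with "ArchiveId".
    """
    inventory = json.loads(inventory_json)
    inventory_ids = {entry["ArchiveId"] for entry in inventory["ArchiveList"]}
    only_in_inventory = inventory_ids - index_ids
    only_in_index = index_ids - inventory_ids
    return only_in_inventory, only_in_index


if __name__ == "__main__":
    sample = json.dumps({"ArchiveList": [{"ArchiveId": "a1"}, {"ArchiveId": "a2"}]})
    print(reconcile({"a2", "a3"}, sample))
```

Because the inventory is refreshed only about once a day, an ID that appears only in the index may simply belong to a recently uploaded archive rather than a lost one.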
Amazon Simple Notification Service (Amazon SNS)
Amazon Simple Notification Service (Amazon SNS) is a web service that makes it easy to set up, operate, and send notifications from the cloud.
Amazon Glacier Key Concepts
AWS Management Console Operations (also accessible via Amazon Glacier APIs or SDKs)
1 Create Vault
Configure Access Policies (Optional) via Amazon Identity and Access Management
Configure Notification Policies (Optional) via Amazon Simple Notification Service
Amazon Glacier API Operations (also accessible via Amazon Glacier SDKs)
2 Upload Archives
3 Retrieve Archives
Initiate Job, Track Job, Download Job Output
Archives retrieved 3 - 5 hours after being requested
Your Application: Notifications sent via Amazon SNS, Download Archives
Best Practices with Amazon Glacier
Aggregate Large Number of Smaller Files
Reduce overhead costs
Reduce request costs
Find ideal archive size for your use case
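A quick way to reason about archive size is to estimate how much of what you store is per-archive overhead and how many upload requests a data set needs. The sketch below assumes a fixed overhead of 32 KiB per archive; treat that figure as an assumption to be checked against current pricing documentation. The function names are hypothetical.

```python
ASSUMED_OVERHEAD_BYTES = 32 * 1024  # assumption: fixed per-archive overhead


def overhead_fraction(archive_bytes: int, overhead_bytes: int = ASSUMED_OVERHEAD_BYTES) -> float:
    """Share of billed storage that is overhead for one archive of the given size."""
    return overhead_bytes / (archive_bytes + overhead_bytes)


def upload_requests(total_bytes: int, archive_bytes: int) -> int:
    """Number of single-request uploads needed to store total_bytes (ceiling division)."""
    return -(-total_bytes // archive_bytes)


if __name__ == "__main__":
    for size in (100 * 1024, 1024 ** 2, 1024 ** 3):
        print(size, f"{overhead_fraction(size):.4%}", upload_requests(1024 ** 4, size))
```

Under this assumption, small archives pay proportionally more overhead and need far more requests, which is the motivation for aggregating files before upload.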
Uploading Large Files – Multipart Upload
Internet weather
Distance between your application and Amazon Glacier
Cost of retrying failed transmissions
Improve upload throughput
Multipart Upload
Improve speed and reliability with multipart upload
1. InitiateMultipartUpload(partSize) -> uploadId
2. UploadPart(uploadId, data)
3. CompleteMultipartUpload(uploadId) -> archiveId
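The three steps above can be sketched as follows. This is not the SDK interface: `InMemoryClient` is a hypothetical stand-in whose methods are named after the operations on the slide so the example runs without AWS credentials, and real calls take additional parameters (such as a vault name and checksums) that are omitted here. The part-size rule (a power of two between 1 MiB and 4 GiB) and the retry count are assumptions.

```python
MIB = 1024 * 1024


class InMemoryClient:
    """Hypothetical stand-in for the three operations; stores parts in a dict."""

    def __init__(self) -> None:
        self.parts: dict[int, bytes] = {}

    def initiate_multipart_upload(self, part_size: int) -> str:
        return "upload-1"  # placeholder uploadId

    def upload_part(self, upload_id: str, start: int, data: bytes) -> None:
        self.parts[start] = data

    def complete_multipart_upload(self, upload_id: str, total_size: int) -> str:
        assembled = b"".join(self.parts[k] for k in sorted(self.parts))
        if len(assembled) != total_size:
            raise ValueError("uploaded parts do not add up to the declared size")
        return "archive-1"  # placeholder archiveId


def is_valid_part_size(part_size: int) -> bool:
    """Assumed rule: a power of two between 1 MiB and 4 GiB."""
    in_bounds = MIB <= part_size <= 4096 * MIB
    return in_bounds and (part_size & (part_size - 1)) == 0


def multipart_upload(client: InMemoryClient, data: bytes,
                     part_size: int = MIB, max_attempts: int = 3) -> str:
    """Upload data in fixed-size parts, retrying only the part that failed."""
    if not is_valid_part_size(part_size):
        raise ValueError("part_size must be a power of two between 1 MiB and 4 GiB")
    upload_id = client.initiate_multipart_upload(part_size)        # step 1
    for start in range(0, len(data), part_size):
        chunk = data[start:start + part_size]
        for attempt in range(1, max_attempts + 1):
            try:
                client.upload_part(upload_id, start, chunk)        # step 2
                break
            except OSError:
                if attempt == max_attempts:
                    raise
    return client.complete_multipart_upload(upload_id, len(data))  # step 3


if __name__ == "__main__":
    print(multipart_upload(InMemoryClient(), bytes(2 * MIB + 5)))
```

Because a failure only costs one part rather than the whole file, smaller parts lower the cost of a retry, while larger parts reduce the number of requests.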
Optimize Data Retrieval and Download
Retrieval vs. Download
Ranged Retrieval
• Reduce cost, control retrieval rate
• Retrieve only what you need
Ranged Download (Get)
• Improve download speed
• Be aware of your download speed as data is only staged for 24 hours
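One way to work with ranges is to split an archive into byte ranges that start on 1 MiB boundaries, which this sketch assumes is the alignment ranged retrieval expects. The helper names are hypothetical, and `range_header` simply formats a standard HTTP `Range` value for a ranged download.

```python
MIB = 1024 * 1024


def aligned_ranges(archive_size: int, chunk_mib: int) -> list[tuple[int, int]]:
    """Split [0, archive_size) into inclusive byte ranges starting on MiB boundaries."""
    if archive_size <= 0 or chunk_mib <= 0:
        raise ValueError("archive_size and chunk_mib must be positive")
    step = chunk_mib * MIB
    return [
        (start, min(start + step, archive_size) - 1)  # last range may be shorter
        for start in range(0, archive_size, step)
    ]


def range_header(start: int, end: int) -> str:
    """Format an inclusive byte range as an HTTP Range header value."""
    return f"bytes={start}-{end}"


if __name__ == "__main__":
    for start, end in aligned_ranges(5 * MIB + 10, 2):
        print(range_header(start, end))
```

Each range can then be requested or downloaded independently, so a slow connection can fetch the staged data in pieces and retry only the piece that failed.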
Ranged Retrieval Example
Example: 12 GB archive
Retrieved using a single 4-hour job = 3 GB/hour peak retrieval
Retrieved over 24 hours using 6 consecutive jobs = 0.5 GB/hour peak retrieval
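The arithmetic behind this example can be checked with a few lines of code. The sketch assumes the archive is split into equal ranges and that each job takes 4 hours, as in the example; the function names are hypothetical.

```python
def peak_rate_gb_per_hour(archive_gb: float, jobs: int, hours_per_job: float = 4.0) -> float:
    """Peak hourly retrieval rate when an archive is split into equal, consecutive jobs."""
    return (archive_gb / jobs) / hours_per_job


def total_hours(jobs: int, hours_per_job: float = 4.0) -> float:
    """Elapsed time when the jobs run one after another."""
    return jobs * hours_per_job


if __name__ == "__main__":
    print(peak_rate_gb_per_hour(12, 1), total_hours(1))  # single job
    print(peak_rate_gb_per_hour(12, 6), total_hours(6))  # six consecutive jobs
```

Splitting the 12 GB archive into six 2 GB ranges trades a longer total retrieval time (24 hours instead of 4) for a peak rate that is six times lower.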
Amazon Glacier Benefits
Low cost: As little as $0.01/GB/month with no up-front capital commitments.
Secure: Leverage AWS’ robust security platform. Control access to your data.
Durable: Designed to provide an average annual durability of 99.999999999% per archive.
Simple: Eliminate your operational overhead. Focus your resources on your core business.
Use multiple services: Easily leverage other AWS services once your data is in the AWS cloud.
Flexible: Store any amount of data on-demand. Eliminate the need for capacity planning.
Thank You
Q&A with
Colin Lazier & Henry Zhang
http://aws.amazon.com/glacier