(STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Upload: amazon-web-services

Post on 08-Jan-2017

Page 1: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Henry Zhang, Senior Product Manager, Amazon Glacier

October 2015

Amazon Glacier Deep Dive

STG312

Page 2: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Audio archives – SoundCloud

• World’s leading social sound platform

• Audio files transcoded and stored in multiple formats

• Stores PBs of data

• Transcoded files served from Amazon S3

• Originals moved to Amazon Glacier for long-term retention

Page 3: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Video archives – Sony Media Cloud (Ci)

(Diagram: Amazon Glacier)

Page 4: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Tape replacement – King County

• Most populous county in Washington State

• Replace tape solution for backup from 17 agencies

• Meet compliance requirement

• Saved $1MM in first year; no more tape refresh or management churn

Page 5: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Archive: Data retained for the long term, for compliance or potential future reference

Data archiving needs are growing everywhere

• Media assets, 4K, 8K

• Health care / Life sciences

• Financial services

• Regulated industries

• Oil and gas / Geospatial

• Digital preservation

• Long-term backups

• Logs

Page 6: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Traditional archiving approaches

• Tape silos / Tape libraries

• Tape drives (LTO-X / DLT / etc.)

• Virtual tape libraries (VTLs)

• Tape out / Vaulting

• Specialized software & personnel

Page 7: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

How can Amazon Glacier help with your archival?

• Metered usage: pay as you go

• No capital investment

• No commitment

• No risky capacity planning

• Avoid risks of physical media handling

• Control your geographic locality for performance and compliance

Page 8: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Amazon Glacier is a low-cost storage service for archival data with long-term retention requirements.

$0.007/GB per month

3-5 hour data retrieval

Financial records

Medical PACS images

High-res media assets

Page 9: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

How can Amazon Glacier help with your archival?

Extremely low-cost archive storage service, starting at $0.007/GB per month

Allows you to retrieve data within 3-5 hours

99.999999999% durability (7 orders of magnitude higher than 2 copies of tape)

No data migration, no hardware/infrastructure investments

Infinite scale and pay for what you use

Access to on-demand compute resources on AWS

Page 10: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Getting started – key concepts

• Account – Access AWS services, view billing/usage, manage security

• Vaults – Container for archives, up to 1000 vaults per account

• Archives – Files and records, write-once, 40TB max, unlimited archives

• Inventory – Cold index of archive properties refreshed every 24 hours

Page 11: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Amazon Glacier – 3 ways to access

• Direct Glacier API/SDK

• S3 lifecycle integration

• Third-party tools and gateways

Page 12: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Amazon Glacier concepts: Uploading data

1. Create vault (Films)

2. Configure access policies

ArchiveApp user policy

Effect: Allow

Resource: arn:aws:glacier:<region>:<accountId>:vaults/Films

Action: glacier:UploadArchive

3. Upload archives: UploadArchive(data) -> Archive ID
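
The three upload steps above could be sketched roughly as follows. This is a minimal illustration, not the presenter's code: `upload_only_policy` and `upload_archive` are hypothetical helper names, the region and account values are placeholders, and `client` is assumed to be a boto3 Glacier client (`boto3.client("glacier")`) whose vault was already created with `create_vault`.

```python
def upload_only_policy(region, account_id, vault_name):
    """Build an IAM policy document that allows only UploadArchive on one vault."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "glacier:UploadArchive",
                # Vault ARNs carry both a region and an account ID.
                "Resource": f"arn:aws:glacier:{region}:{account_id}:vaults/{vault_name}",
            }
        ],
    }


def upload_archive(client, vault_name, data, description=""):
    """Upload bytes as a single archive and return the archive ID.

    `client` is assumed to expose boto3's Glacier `upload_archive` call.
    """
    response = client.upload_archive(
        vaultName=vault_name,
        archiveDescription=description,
        body=data,
    )
    return response["archiveId"]
```

The archive ID is the only handle for later retrieval, so the caller would normally record it in its own index.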

Page 13: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Amazon Glacier concepts: Retrieving data

1. Initiate job (ArchiveId: AE99F…, Vault: Films) -> Job ID

2. 3-5 hours for job completion

3. Job completion notification

4. Download output
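
A rough sketch of the retrieval flow above, assuming a boto3 Glacier client is passed in as `client`. Note one deliberate simplification: instead of waiting for the completion notification in step 3, this version polls `describe_job`. The helper name `retrieve_archive` and the polling interval are illustrative choices.

```python
import time


def retrieve_archive(client, vault_name, archive_id, poll_seconds=900, sleep=time.sleep):
    """Start an archive-retrieval job, wait for it, and return the archive bytes."""
    # Step 1: initiate the job and keep the job ID.
    job = client.initiate_job(
        vaultName=vault_name,
        jobParameters={"Type": "archive-retrieval", "ArchiveId": archive_id},
    )
    job_id = job["jobId"]

    # Steps 2-3: the job takes hours; poll until it reports completion.
    # (An "SNSTopic" entry in jobParameters would request a notification instead.)
    while not client.describe_job(vaultName=vault_name, jobId=job_id)["Completed"]:
        sleep(poll_seconds)

    # Step 4: download the job output.
    output = client.get_job_output(vaultName=vault_name, jobId=job_id)
    return output["body"].read()
```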

Page 14: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Amazon Glacier – Amazon S3 lifecycle archival

• Seamlessly move data from Amazon S3 to Amazon Glacier

• Automated lifecycle rules

• Transition based on object age or predefined date
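
As an illustration of an age-based lifecycle rule, the sketch below builds the configuration dictionary that boto3's S3 `put_bucket_lifecycle_configuration` accepts. The function name, rule ID, prefix, and 30-day threshold are assumptions chosen for the example.

```python
def glacier_transition_rule(rule_id, prefix, days):
    """Build an S3 lifecycle configuration that moves objects to Glacier by age."""
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},  # only objects under this key prefix
                "Status": "Enabled",
                # Transition objects once they are `days` old.
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


config = glacier_transition_rule("archive-originals", "originals/", 30)
```

It would be applied with something like `s3.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=config)`. A date-based rule uses a `"Date"` entry in place of `"Days"`.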

Page 15: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Amazon Glacier – Backup software integration

• CommVault – Native integration with Amazon S3 & Amazon Glacier

• Deduplication & encryption

• Single console management

(Diagram: Amazon S3, Amazon Glacier)

Page 16: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Amazon Glacier – Third-party tools and gateways

• Consumer grade: less than $50

– Example: CloudBerry, FastGlacier, Arq (Haystack Software)

• Small / medium business: $500 - $1,000

– Example: Synology, Veeam, QNAP

• Enterprise grade gateway (price varies)

– Example: NetApp AltaVault

Page 17: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Best practices – Prepare your data

Page 18: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Use Archive descriptions

• Use Archive description field for metadata.

• If local index is corrupted or destroyed, use archive description to reconstruct critical mappings.

• For example, create index entry, add primary key to archive description on upload.
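
One way to apply this, sketched under assumptions: the description is a small JSON document carrying the primary key, and the index is rebuilt from vault inventory entries, which include `ArchiveId` and `ArchiveDescription` fields. The field names `pk` and `name` and both helper names are made up for the example.

```python
import json

MAX_DESCRIPTION_LENGTH = 1024  # descriptions are limited to 1,024 printable ASCII characters


def make_description(primary_key, original_name):
    """Encode the local index key into an archive description string."""
    # json.dumps escapes non-ASCII and control characters by default.
    description = json.dumps({"pk": primary_key, "name": original_name}, sort_keys=True)
    if len(description) > MAX_DESCRIPTION_LENGTH:
        raise ValueError("archive description too long")
    return description


def rebuild_index(inventory_archives):
    """Rebuild the primary key -> archive ID mapping from vault inventory entries."""
    index = {}
    for entry in inventory_archives:
        metadata = json.loads(entry["ArchiveDescription"])
        index[metadata["pk"]] = entry["ArchiveId"]
    return index
```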

Page 19: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Small objects and object size overhead

• Every archive has 32KB of associated overhead, and some operations are charged per request

• For a 3.2MB archive, overhead is ~1% of cost

• For a 1KB archive, 97% of cost would go to overhead

• Solution is aggregation: recommended minimum size is on the order of MBs
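
The percentages above follow from simple arithmetic on storage bytes. The sketch below reproduces them, assuming the overhead is billed like stored data and ignoring per-request charges; `overhead_share` is an illustrative name.

```python
OVERHEAD_BYTES = 32 * 1024  # per-archive overhead stated above


def overhead_share(archive_bytes):
    """Fraction of billed storage that is overhead rather than archive data."""
    return OVERHEAD_BYTES / (archive_bytes + OVERHEAD_BYTES)


small = overhead_share(1024)                      # 1KB archive
typical = overhead_share(int(3.2 * 1024 * 1024))  # 3.2MB archive
print(f"1KB archive:   {small:.0%} overhead")
print(f"3.2MB archive: {typical:.1%} overhead")
```

For the 1KB case this is 32 / 33, about 97%; for the 3.2MB case it is roughly 1%.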

Page 20: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Archive aggregation

(Diagram: an archive made up of File 1, File 2, ... with Checksum 1, Checksum 2, Checksum 3, ...; an index/directory holding checksum & metadata for each file; and a local index holding File 1 offset, File 2 offset, File 3 offset.)
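
A minimal sketch of one possible aggregation layout, assuming a format invented for this example: an 8-byte directory length, a JSON directory of offsets, lengths, and SHA-256 checksums, then the file bytes back to back. The slide does not specify a format, so every name here is hypothetical.

```python
import hashlib
import json
import struct


def pack(files):
    """Pack a {name: bytes} mapping into one blob; return (blob, local_index)."""
    index = {}
    body = bytearray()
    for name, data in files.items():
        index[name] = {
            "offset": len(body),  # position within the file section
            "length": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        }
        body += data
    directory = json.dumps(index, sort_keys=True).encode("utf-8")
    # Layout: directory length (8 bytes, big-endian), directory, file bytes.
    return struct.pack(">Q", len(directory)) + directory + bytes(body), index


def extract(blob, name):
    """Read one file back out of a packed blob, verifying its checksum."""
    (directory_length,) = struct.unpack(">Q", blob[:8])
    base = 8 + directory_length
    entry = json.loads(blob[8:base])[name]
    start = base + entry["offset"]
    data = blob[start:start + entry["length"]]
    if hashlib.sha256(data).hexdigest() != entry["sha256"]:
        raise ValueError(f"checksum mismatch for {name}")
    return data
```

Because the directory travels inside the archive, the local index can be rebuilt from the archive itself if it is lost.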

Page 21: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Best practices – Optimize upload

Page 22: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Best practices: Multipart uploads

Improve throughput, reliability, and get idempotency with multipart uploads

1. InitiateMultipartUpload(partSize) → uploadId

2. UploadPart(uploadId, data)

3. CompleteMultipartUpload(uploadId) → archiveId

(Diagram: an archive split into parts, with the parts uploaded in parallel)
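
The three calls above might be wired together as in this sketch, which assumes `client` is a boto3 Glacier client and that the whole archive fits in memory. It uploads parts sequentially for clarity; since each part is independent, the loop is where parallel uploads would go. The SHA-256 tree hash is the checksum Glacier expects; the helper names are illustrative.

```python
import hashlib

MIB = 1024 * 1024


def tree_hash(data):
    """SHA-256 tree hash: hash 1 MiB chunks, then combine pairs up to a root."""
    level = [hashlib.sha256(data[i:i + MIB]).digest() for i in range(0, len(data), MIB)]
    if not level:
        level = [hashlib.sha256(b"").digest()]
    while len(level) > 1:
        paired = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                paired.append(hashlib.sha256(level[i] + level[i + 1]).digest())
            else:
                paired.append(level[i])  # odd node is promoted unchanged
        level = paired
    return level[0].hex()


def part_ranges(total_size, part_size):
    """Yield inclusive (start, end) byte ranges for each part."""
    for start in range(0, total_size, part_size):
        yield start, min(start + part_size, total_size) - 1


def multipart_upload(client, vault_name, data, part_size=MIB):
    """Upload `data` in parts; part_size must be 1 MiB times a power of two."""
    upload_id = client.initiate_multipart_upload(
        vaultName=vault_name, partSize=str(part_size)
    )["uploadId"]
    for start, end in part_ranges(len(data), part_size):
        part = data[start:end + 1]
        client.upload_multipart_part(
            vaultName=vault_name,
            uploadId=upload_id,
            range=f"bytes {start}-{end}/*",
            checksum=tree_hash(part),
            body=part,
        )
    done = client.complete_multipart_upload(
        vaultName=vault_name,
        uploadId=upload_id,
        archiveSize=str(len(data)),
        checksum=tree_hash(data),
    )
    return done["archiveId"]
```

Re-sending a part with the same range simply replaces it, which is what makes retries safe.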

Page 23: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Best practices: Data ingestion options

• AWS Direct Connect: Dedicated bandwidth between your site and AWS

• Internet: Transfer data in a secure SSL tunnel over the public Internet

• AWS Import/Export Snowball: Physical transfer of media into and out of AWS

Page 24: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Best practices – Cost management

Page 25: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Amazon Glacier – Data retrieval policies

• Provides transparency and cost control for data retrievals

• Governs all retrieval activities for an account in a region

• Synchronously accept/reject each retrieval request

• Accounts for inflight retrieval operations
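
To make the accept/reject behavior concrete, here is a sketch in two parts. `bytes_per_hour_policy` builds the policy document in the shape accepted by the Glacier `SetDataRetrievalPolicy` API (boto3: `set_data_retrieval_policy(Policy=...)`). `RetrievalBudget` is a toy model of synchronous admission that counts in-flight bytes; it only illustrates the idea and is not Glacier's actual algorithm.

```python
def bytes_per_hour_policy(max_bytes_per_hour):
    """Policy document capping retrievals at a fixed byte rate per hour."""
    return {"Rules": [{"Strategy": "BytesPerHour", "BytesPerHour": max_bytes_per_hour}]}


class RetrievalBudget:
    """Toy model: accept a request only if in-flight bytes stay under the cap."""

    def __init__(self, max_bytes_per_hour):
        self.limit = max_bytes_per_hour
        self.in_flight = 0

    def request(self, size):
        """Return True and reserve the bytes, or False to reject immediately."""
        if self.in_flight + size > self.limit:
            return False
        self.in_flight += size
        return True

    def finish(self, size):
        """Release the bytes of a completed retrieval."""
        self.in_flight -= size


budget = RetrievalBudget(10 * 1024**3)  # hypothetical 10 GiB/hour cap
```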

Page 30: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Cost allocation with vault tags

Page 31: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Best practices – Security and compliance

Page 32: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Amazon Glacier – Audit logging with AWS CloudTrail

• Enable AWS CloudTrail in console

• Control plane events – Vault activities

• Data plane events – Archive activities

Page 33: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Vault access policies

• Manage access to a vault in a single location, with a single policy attached to the vault

– Grant/revoke access to internal business units/teams

– “Marketing_Vault” has a different access policy from “DevOps_Vault”

• Easily manage cross-account access for your business partner

– Simply add a section for your business partner in the same policy
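
A hedged sketch of what such a policy might look like, with one statement for an internal team and a second one added for a partner account. The account IDs, user name, and the chosen read-only actions are placeholders for illustration; the helper name is hypothetical.

```python
def vault_access_policy(vault_arn, team_principal_arn, partner_account_id):
    """Vault access policy with an internal statement plus a partner statement."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "internal-team-upload",
                "Effect": "Allow",
                "Principal": {"AWS": team_principal_arn},
                "Action": ["glacier:UploadArchive"],
                "Resource": [vault_arn],
            },
            {
                # Cross-account access is one more statement in the same policy.
                "Sid": "partner-read",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{partner_account_id}:root"},
                "Action": ["glacier:InitiateJob", "glacier:GetJobOutput"],
                "Resource": [vault_arn],
            },
        ],
    }


policy = vault_access_policy(
    "arn:aws:glacier:us-west-2:111111111111:vaults/Marketing_Vault",
    "arn:aws:iam::111111111111:user/marketing-app",
    "222222222222",
)
```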

Page 34: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Compliance Storage with Vault Lock

Amazon Glacier Vault Lock allows you to easily set compliance controls on individual vaults and enforce them via a lockable policy.

• Time-based retention

• MFA authentication

• Controls govern all records in a vault

• Immutable policy

• Two-step locking

Page 35: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Vault Lock for compliance storage

• Non-overwrite, non-erasable records

• Time-based retention with “ArchiveAgeInDays” control

• Policy lockdown (strong governance)

• Legal hold with vault-level tags

• Configure optional designated third-party access and grant temporary access

Page 36: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Example control: 1 year record retention
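
The slide's policy text did not survive in this transcript. A one-year retention control is typically written as a deny on `glacier:DeleteArchive` while the archive is younger than 365 days, using the `glacier:ArchiveAgeInDays` condition key. The sketch below builds such a document; the function name, the `Sid`, and the vault ARN are placeholders.

```python
def retention_lock_policy(vault_arn, retention_days=365):
    """Vault Lock policy denying archive deletion until the retention age is reached."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-delete-before-retention",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                "Condition": {
                    # Applies while the archive is younger than retention_days.
                    "NumericLessThan": {"glacier:ArchiveAgeInDays": str(retention_days)}
                },
            }
        ],
    }


policy = retention_lock_policy("arn:aws:glacier:us-west-2:123456789012:vaults/Films")
```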

Page 38: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Vault Lock: Two-step locking
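
Two-step locking maps onto two API calls: `InitiateVaultLock` attaches the policy in an in-progress state and returns a lock ID that is valid for 24 hours, and `CompleteVaultLock` makes the policy immutable. The sketch below assumes a boto3 Glacier client and a caller-supplied `approve` callback standing in for whatever testing happens in between; both helper names are hypothetical.

```python
import json


def lock_vault(client, vault_name, policy, approve):
    """Initiate a vault lock, then complete it if approved, otherwise abort it."""
    # Step 1: attach the policy in the in-progress state.
    lock_id = client.initiate_vault_lock(
        vaultName=vault_name,
        policy={"Policy": json.dumps(policy)},
    )["lockId"]

    # Between the steps, the policy can be tested against the vault.
    if approve(lock_id):
        # Step 2: lock the policy; it can no longer be changed or removed.
        client.complete_vault_lock(vaultName=vault_name, lockId=lock_id)
        return True

    # Not approved: abort so a corrected policy can be initiated.
    client.abort_vault_lock(vaultName=vault_name)
    return False
```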

Page 39: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Legal hold with vault-level tags

Page 40: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Example control: Legal hold
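
The policy shown on this slide is not in the transcript. One common shape for a tag-based legal hold is a deny on `glacier:DeleteArchive` that applies while the vault carries a particular tag. The sketch below assumes a tag named `LegalHold`; the tag name, helper names, and vault ARN are illustrative, and `client` is assumed to be a boto3 Glacier client.

```python
def legal_hold_policy(vault_arn, tag_key="LegalHold"):
    """Vault Lock policy denying deletes while the vault's hold tag is 'true' or empty."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "legal-hold",
                "Principal": "*",
                "Effect": "Deny",
                "Action": ["glacier:DeleteArchive"],
                "Resource": [vault_arn],
                "Condition": {
                    "StringLike": {f"glacier:ResourceTag/{tag_key}": ["true", ""]}
                },
            }
        ],
    }


def place_hold(client, vault_name, tag_key="LegalHold"):
    """Put the vault under hold by setting the tag the policy checks."""
    client.add_tags_to_vault(vaultName=vault_name, Tags={tag_key: "true"})
```

Because the hold is driven by a tag rather than by the locked policy text, it can be applied and lifted without touching the immutable policy.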

Page 41: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Vault lock best practices

Page 42: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Using vault lock policy with vault access policy

Vault access policy

• Can be updated/deleted

Vault lock policy

• Lockable/immutable policy

• Cannot be updated/deleted after lockdown

Use vault access policy to:

• Designate third-party access

• Grant temporary read permissions when necessary

Use vault lock policy to:

• Deploy regulatory controls such as records retention

• Enforce data access through multi-factor authentication only

(Diagram labels: Compliance/Governance, Flexibility)

Page 43: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Vault Lock in the Glacier Console

Page 60: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Amazon Glacier received a third-party assessment from Cohasset Associates on how Amazon Glacier with Vault Lock can be used to meet the requirements of SEC 17a-4(f) and CFTC 1.31(b)-(c).

Page 61: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Thank you!

Page 62: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Remember to complete your evaluations!