S3 and Glacier
DESCRIPTION
This is a presentation that I gave at the AWS Meetup in Ann Arbor, Michigan back in January. It recounts some experiences that I had while working on a project with RightBrain Networks that involved moving millions of small files around between S3, Glacier and an NFS NAS volume. A good time was had by all.
TRANSCRIPT
![Page 1: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/1.jpg)
Glacier and S3
Dave Thompson
AWS Meetup Michigan, Jan 2014
![Page 2: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/2.jpg)
Who the @#%^ is Dave Thompson?
• DevOps/SRE/Systems guy from MI by way of San Francisco
• Current Employer: MuleSoft Inc
• Past Employers: Netflix, Domino’s Pizza, U of M
• Also contributing to the madness at RBN
![Page 3: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/3.jpg)
… and what is he talking about?
• Today, we’ll talk about a case study using Glacier with S3, and the various surprises that I encountered on the way.
![Page 4: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/4.jpg)
Act 1: A New Project
![Page 5: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/5.jpg)
Our Story So Far
• Client’s datacenter is going dark in a few months.
• Their app is data heavy… a little less than 1 BN small files.
![Page 6: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/6.jpg)
Our Story So Far (cont.)
• Client has migrated app servers to EC2
• Data has been uploaded to S3
![Page 7: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/7.jpg)
Everything Goes According to Plan!
• Files are uploaded to S3
• App updated to use S3 data
![Page 8: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/8.jpg)
Act 2: The Public Cloud Strikes Back
![Page 9: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/9.jpg)
Things take a dark turn…
S3 latency is too high for the app.
![Page 10: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/10.jpg)
Enter RBN!
The proposal: migrate the data from S3 to a cloud storage solution (Zadara), and archive the files to Glacier.
![Page 11: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/11.jpg)
Everything Goes According to Plan (Again)!
• Files are copied to Zadara share
• S3 lifecycle configured to archive objects to Glacier
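For readers who have not set one up, here is a minimal sketch of what such a lifecycle rule can look like. It builds the rule document in the shape that boto3's `put_bucket_lifecycle_configuration` accepts. The talk does not show the actual configuration (and the project used Boto rather than boto3), so the rule ID, the 30-day delay and the bucket argument are all assumptions made for illustration.

```python
def glacier_transition_rule(days, prefix="", rule_id="archive-to-glacier"):
    """Build an S3 lifecycle configuration that moves objects to Glacier.

    `days`, `prefix` and `rule_id` are illustrative values, not the
    settings used in the project described in the talk.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},  # empty prefix = every object
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


def apply_rule(bucket, config):
    """Send the configuration to S3 (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above has no dependencies

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


# Hypothetical example: archive everything after 30 days.
config = glacier_transition_rule(days=30)
```

A rule with an empty prefix applies to every object in the bucket, each one archived individually, which is how a very large number of small files can end up in Glacier without anyone uploading them there by hand.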
![Page 12: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/12.jpg)
Except…
The Zadara share becomes corrupted after the data is migrated.
![Page 13: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/13.jpg)
Amazon Glacier: a Primer
• Glacier is an archival solution provided by AWS.
• It’s closely integrated with S3.
• Use cases for Glacier and S3 are different, though…
![Page 14: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/14.jpg)
S3 vs Glacier
• Unlike an S3 GET, a Glacier RETRIEVAL takes ~4 hours
• UPLOAD and RETRIEVAL API requests are 10x more expensive on Glacier than comparable S3 requests
• Bandwidth charges for RETRIEVAL requests apply, even inside us-east-1
![Page 15: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/15.jpg)
S3 vs Glacier (cont.)
• This means that Glacier is optimized for large compressed archives (e.g. tarballs)
• S3 is about equally suited for smaller or larger files
• Automatically archiving S3 objects to Glacier can thus lead to great sadness.
![Page 16: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/16.jpg)
What a Twist!
~100MM files had already been automatically archived to Glacier.
![Page 17: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/17.jpg)
Act 3: Return of the Data
![Page 18: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/18.jpg)
The New Plan
• Restore files from Glacier back to S3
• Migrate data from S3 to Zadara share
• Archive files back to Glacier in tar.gz chunks
• Create DynamoDB index from file name to Glacier archive for future restore
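The last two steps of the plan can be sketched roughly as follows. This is not the project's code: plain dictionaries stand in for Glacier and DynamoDB, and the archive ID scheme, chunk size and file names are made up for the example. It only illustrates the idea of packing many small files into tar.gz chunks while recording which chunk each file went into.

```python
import io
import tarfile


def chunked(names, size):
    """Yield consecutive slices of `names` with at most `size` items."""
    for start in range(0, len(names), size):
        yield names[start:start + size]


def build_archives(files, chunk_size):
    """Pack `files` (name -> bytes) into tar.gz blobs and index them.

    Returns (archives, index): `archives` maps a made-up archive ID to
    the tar.gz bytes, `index` maps each file name to its archive ID.
    In a real pipeline the blobs would be uploaded to Glacier and the
    index rows written to DynamoDB; plain dicts stand in for both here.
    """
    archives, index = {}, {}
    for number, names in enumerate(chunked(sorted(files), chunk_size)):
        archive_id = f"archive-{number:06d}"  # hypothetical ID scheme
        buffer = io.BytesIO()
        with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
            for name in names:
                info = tarfile.TarInfo(name=name)
                info.size = len(files[name])
                tar.addfile(info, io.BytesIO(files[name]))
        archives[archive_id] = buffer.getvalue()
        for name in names:
            index[name] = archive_id
    return archives, index


def fetch_file(name, archives, index):
    """Look up which archive holds `name` and extract just that member."""
    blob = archives[index[name]]
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        return tar.extractfile(name).read()


# Five tiny made-up files packed two per archive.
files = {f"docs/{i}.txt": f"file {i}".encode() for i in range(5)}
archives, index = build_archives(files, chunk_size=2)
```

With the index in place, restoring one file means retrieving a single archive rather than scanning all of them, which is the point of keeping the file-name-to-archive mapping outside Glacier.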
![Page 19: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/19.jpg)
but wait…
How much was this restore going to cost?
![Page 20: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/20.jpg)
Task 0: Calculating Cost
• Glacier pricing model is… interesting
• Costs are fixed per UPLOAD and RETRIEVAL request
• Cost for bandwidth based on the peak outbound bandwidth consumed in a monthly billing period
• Monthly bandwidth equal to 5% of your total Glacier usage is permitted free of charge
![Page 21: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/21.jpg)
The Equation
(Oh, boy. Okay, let’s do this.)
• Let X equal the number of RETRIEVAL API calls made.
• Let Y equal the amount to restore in GB.
• Let Z equal the total amount of data archived in GB.
• Let T equal the time to restore the data in hours.
• Then the cost can be expressed as:
(0.05 * (X / 1000)) + (((Y / T) - (Z * 0.05 / 30)) * 0.01 * 720)
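To make the expression easier to play with, here is a small sketch that evaluates it. It assumes the reading in which the free allowance `Z * 0.05 / 30` is subtracted from the restore rate `Y / T` before multiplying by 0.01 and 720. The clamp at zero is an addition of this sketch, and the input numbers in the example are invented, not figures from the project.

```python
def restore_cost(x_requests, y_restore_gb, z_archived_gb, t_hours):
    """Estimate the restore cost in dollars using the slide's expression.

    Assumed reading of the slide:
        0.05 * (X / 1000) + ((Y / T) - (Z * 0.05 / 30)) * 0.01 * 720
    The bandwidth term is clamped at zero, which is an addition here so
    that staying under the free allowance never yields a negative fee.
    """
    if t_hours <= 0:
        raise ValueError("t_hours must be positive")
    request_fee = 0.05 * (x_requests / 1000)
    billable_rate = max(0.0, y_restore_gb / t_hours - z_archived_gb * 0.05 / 30)
    bandwidth_fee = billable_rate * 0.01 * 720
    return request_fee + bandwidth_fee


# Invented inputs: 1M requests, 1,000 GB restored out of 3,000 GB archived.
slow = restore_cost(1_000_000, 1000, 3000, t_hours=100)  # about $86
fast = restore_cost(1_000_000, 1000, 3000, t_hours=50)   # about $158
```

The two calls differ only in T: halving the restore time doubles the peak rate, so the bandwidth term grows while the request fee stays the same. That is why the restore duration had to be chosen before starting.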
![Page 22: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/22.jpg)
Task 1: Restore from Glacier
• Two m2.large instances running a Python daemon
• Multiple iterations, from single threaded to multi-threaded to multiprocessing with threading
After iterating several times to get the speed we needed, I started the process for the ‘last time’ on a Sunday evening. ETA: ~5 days
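The talk does not include the daemon's code, so the following is only a minimal sketch of the last shape it describes: several worker processes, each running its own pool of threads. `restore_one` is a hypothetical stub standing in for the real restore request, and the process and thread counts are arbitrary defaults.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def restore_one(key):
    """Stand-in for a single restore request.

    A real worker would issue the restore call for `key` here; this
    stub just returns the key so the fan-out logic can run anywhere.
    """
    return key


def restore_batch(keys, threads=8):
    """Restore one batch of keys using a pool of threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(restore_one, keys))


def split(keys, parts):
    """Deal `keys` round-robin into `parts` batches."""
    return [keys[i::parts] for i in range(parts)]


def restore_all(keys, processes=4, threads=8):
    """Run one thread pool per worker process.

    Results come back grouped by batch, so their order differs from
    the order of `keys`.
    """
    batches = split(keys, processes)
    with ProcessPoolExecutor(max_workers=processes) as pool:
        results = pool.map(restore_batch, batches, [threads] * len(batches))
        return [key for batch in results for key in batch]
```

When run as a script, `restore_all` should be called from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Threads help because each request spends most of its time waiting on the network, and the extra processes spread the remaining Python-side work across cores.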
![Page 23: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/23.jpg)
This Page Intentionally Left Blank
![Page 24: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/24.jpg)
Protip: Glacier is not optimized for RPS (requests per second)
![Page 25: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/25.jpg)
Task 1: Restore from Glacier (cont.)
Glacier team was not amused.
![Page 26: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/26.jpg)
Task 1: Restore from Glacier (cont.)
Restore continued at the ‘suggested’ rate and completed successfully a couple of weeks later.
Task 1 complete!
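Holding a client to a fixed request rate can be done in many ways. One simple approach is sketched below, assuming a single-threaded caller. The talk gives neither the suggested rate nor the mechanism actually used, so `rate` and the `restore` callback are placeholders.

```python
import time


class Throttle:
    """Space calls so that at most `rate` of them start per second."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        if rate <= 0:
            raise ValueError("rate must be positive")
        self.interval = 1.0 / rate
        self.clock = clock  # injectable so the class can be tested
        self.sleep = sleep
        self.next_slot = None  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed, then reserve a slot."""
        now = self.clock()
        if self.next_slot is not None and now < self.next_slot:
            self.sleep(self.next_slot - now)
            now = self.next_slot
        self.next_slot = now + self.interval


def restore_throttled(keys, restore, rate):
    """Call `restore(key)` for each key, no faster than `rate` per second."""
    throttle = Throttle(rate)
    for key in keys:
        throttle.wait()
        restore(key)
```

This version is not thread-safe. Sharing one `Throttle` between worker threads would need a lock around `wait`, and separate processes would each need their own share of the overall rate.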
![Page 27: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/27.jpg)
Task 2: Migrate and Archive Data
Now we just needed to migrate the data from S3 to Zadara (again), create tarballs of the files, archive them to Glacier, and create a DynamoDB index so that individual files could be looked up.
Easy!
![Page 28: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/28.jpg)
Task 2: Migrate and Archive Data (cont.)
Back to IPython and Boto. Recent experience with Python threading and multiprocessing was to prove helpful.
![Page 29: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/29.jpg)
This Page Intentionally Left Blank
![Page 30: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/30.jpg)
Great Success!
And the whole thing only took about 10x as long as the client initially estimated!
![Page 31: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/31.jpg)
Lessons Learned
• Glacier is optimized for large, compressed files and lower request rates.
• Be very careful about the S3 -> Glacier lifecycle option.
• If you DoS an Amazon service, you get special attention!
![Page 32: S3 and Glacier](https://reader030.vdocument.in/reader030/viewer/2022020101/55495d59b4c905f74e8b5656/html5/thumbnails/32.jpg)
Questions have you?