
Amazon S3 Uploading Tricks 2016

Bogdan Naydenov
Certified Amazon Solutions Architect

09 Jun 2016
#AWSBulgaria User Group

Who am I

Bogdan Naydenov
Senior IT Enterprise Architect - Platform Services team at Progress Software

Mostly operational background, with more than 18 years of IT experience

MongoDB DBA | MongoDB Developer | MongoDB Advanced Deployment and Operations

https://www.linkedin.com/in/bnaydenov
@BobbyNaydenov

S3 History and Quick Facts

- 1st and oldest AWS service, launched March 14, 2006

- Amazon S3 is designed for 11 nines of durability (99.999999999%) of objects over a given year

- Amazon S3 is designed for 99.99% availability over a given year

- Object size range in S3: min 0 bytes, max 5 TB

- Number of buckets per account: 100 by default (can be increased upon request)

- Number of objects you can store: UNLIMITED

The AWS Command Line Interface (AWS CLI)

There are two pieces of functionality built into the AWS CLI's Amazon S3 commands that help make large transfers (many files and large files) into Amazon S3 go as quickly as possible: multipart uploads for large objects and parallel transfers across multiple objects.
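Both knobs can be adjusted through the "s3" settings of the AWS CLI configuration. A minimal sketch, assuming the standard CLI config mechanism; the values below are only illustrative starting points, not recommendations from this talk:

# Parallelism: how many requests aws s3 cp/sync issues at once (the default is 10)
aws configure set default.s3.max_concurrent_requests 20

# Multipart uploads: objects above the threshold are split into chunks and sent in parallel
aws configure set default.s3.multipart_threshold 64MB
aws configure set default.s3.multipart_chunksize 16MB

# The settings are written to ~/.aws/config under the [default] profile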

AWS CLI Upload Examples

Example 1: Uploading a large number of very small files to Amazon S3

Example 2: Uploading a small number of very large files to Amazon S3

Example 3: Periodically synchronizing a directory that contains a large number of small and large files that change over time

Example 4: Improving data transfer performance with the AWS CLI

Code for examples: https://github.com/bnaydenov/awsbulgaria_s3-uploading-tricks

AWS S3 Data Transfer Scenarios

● Amazon EC2 m3.xlarge instance located in the US West (Oregon)

● 4 vCPUs and 15 GB RAM

● 1 Gb/sec over the network interface to Amazon S3

● Amazon EBS 100 GB General Purpose (SSD) volume

Examples Environment Setup

Create 26 directories, one named for each letter of the alphabet, then create 2048 files containing 32 KB of pseudo-random content in each:

1. for i in {a..z}; do
     mkdir $i
     seq -w 1 2048 | xargs -n1 -P 256 -I % dd if=/dev/urandom of=$i/% bs=32k count=1
   done

2. find . -type f | wc -l
   53248

3. time aws s3 cp --recursive --quiet . s3://test_bucket/test_smallfiles/

real    19m59.551s
user    7m6.772s
sys     1m31.336s

Example 1 – Uploading a large number of small files

There are 10 open connections to Amazon S3 even though we are running only a single instance of the copy command:

lsof -i tcp:443 | tail -n +2 | wc -l
10

mpstat -P ALL 10
09:43:18 PM  CPU   %usr  %nice  %sys  %iowait  %irq  %soft  %steal  %guest  %idle
09:43:19 PM  all   6.33   0.00  1.27     0.00  0.00   0.00    0.51    0.00  91.90
09:43:19 PM    0  14.14   0.00  3.03     0.00  0.00   0.00    0.00    0.00  82.83
09:43:19 PM    1   6.06   0.00  2.02     0.00  0.00   0.00    0.00    0.00  91.92
09:43:19 PM    2   2.04   0.00  0.00     0.00  0.00   0.00    1.02    0.00  96.94
09:43:19 PM    3   2.02   0.00  0.00     0.00  0.00   0.00    1.01    0.00  96.97

Example 1 took 20 minutes to move 53,248 files at a rate of 44 files/sec (53,248 files / 1,200 seconds to upload) using 10 parallel streams.

Example 1 – Uploading a large number of small files

Summary

Create five 2-GB files filled with pseudo-random content and upload them to Amazon S3:

1. seq -w 1 5 | xargs -n1 -P 5 -I % dd if=/dev/urandom of=bigfile.% bs=1024k count=2048

2. du -sk .
   10485804

   find . -type f | wc -l
   5

3. time aws s3 cp --recursive --quiet . s3://test_bucket/test_bigfiles/
   real    1m48.286s
   user    1m7.692s
   sys     0m26.860s

Example 2 – Uploading a small number of large files

1. lsof -i tcp:443 | tail -n +2 | wc -l
   10

2. mpstat -P ALL 10
   10:35:47 PM  CPU  %usr  %nice  %sys  %iowait  %irq  %soft  %steal  %guest  %idle
   10:35:57 PM  all  6.30   0.00  3.57    76.51  0.00   0.17    0.75    0.00  12.69
   10:35:57 PM    0  8.15   0.00  4.37    75.21  0.00   0.71    1.65    0.00   9.92
   10:35:57 PM    1  5.14   0.00  3.20    75.89  0.00   0.00    0.46    0.00  15.31
   10:35:57 PM    2  4.56   0.00  2.85    75.17  0.00   0.00    0.46    0.00  16.97
   10:35:57 PM    3  7.53   0.00  3.99    79.36  0.00   0.00    0.57    0.00   8.55

3. aws s3api head-object --bucket test_bucket --key test_bigfiles/bigfile.1
   bytes 2147483648    binary/octet-stream    "9d071264694b3a028a22f20ecb1ec851-256"
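The "-256" suffix on the ETag indicates the 2-GB object arrived as a multipart upload of 256 parts (2,048 MB at the CLI's default 8 MB part size). If only a couple of fields are of interest, head-object output can be narrowed with a JMESPath filter; a small sketch reusing the bucket and key from above:

# Sketch: print just the size and ETag of the uploaded object
aws s3api head-object \
  --bucket test_bucket --key test_bigfiles/bigfile.1 \
  --query '[ContentLength, ETag]' --output text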

Example 2 – Uploading a small number of large files

Summary

In example 2, we moved five 2-GB files to Amazon S3 in 10 parallel streams.

The operation took 1 minute and 48 seconds.

This represents an aggregate data rate of ~758 Mb/s (85,899,706,368 bits in 108 seconds), about 80% of the maximum bandwidth available on our host.
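As a quick sanity check, the ~758 figure can be reproduced from the du -sk output above with shell arithmetic (a back-of-the-envelope sketch, working in mebibits per second):

# 10485804 KB on disk, uploaded in 108 seconds: KB -> bytes -> bits, divided by time and 2^20
echo $(( 10485804 * 1024 * 8 / 108 / 1048576 ))
# prints 758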

Example 2 – Uploading a small number of large files

Summary

Periodically synchronizing a directory that contains a large number of small and large files that change over time

UGLY CODE BELOW, DO NOT COPY AND PASTE:

i=1
while [[ $i -le 132000 ]]; do
  num=$((8192*4/$i))
  [[ $num -ge 1 ]] || num=1
  mkdir -p randfiles/$i
  seq -w 1 $num | xargs -n1 -P 256 -I % dd if=/dev/urandom of=randfiles/$i/file_$i.% bs=16k count=$i
  i=$(($i*2))
done

du -sh randfiles/
12G     randfiles/

find ./randfiles/ -type f | wc -l
65537

Example 3 – Periodically synchronizing a directory that contains a large number of small and large files that change over time

1. time aws s3 sync --quiet . s3://test_bucket/test_randfiles/
   real    26m41.194s
   user    10m7.688s
   sys     2m17.592s

2. lsof -i tcp:443 | tail -n +2 | wc -l
   10

3. mpstat -P ALL 10
   03:08:50 AM  CPU   %usr  %nice  %sys  %iowait  %irq  %soft  %steal  %guest  %idle
   03:09:00 AM  all   6.23   0.00  1.70     1.93  0.00   0.08    0.31    0.00  89.75
   03:09:00 AM    0  14.62   0.00  3.12     2.62  0.00   0.30    0.30    0.00  79.03
   03:09:00 AM    1   3.15   0.00  1.22     0.41  0.00   0.00    0.31    0.00  94.91
   03:09:00 AM    2   3.06   0.00  1.02     0.31  0.00   0.00    0.20    0.00  95.41
   03:09:00 AM    3   4.00   0.00  1.54     4.41  0.00   0.00    0.31    0.00  89.74

Example 3 – Periodically synchronizing a directory that contains a large number of small and large files that change over time

Summary

Touch eight existing files to update their modification time (mtime) and create a directory containing five new files:

touch 4096/*
mkdir 5_more
seq -w 1 5 | xargs -n1 -P 5 -I % dd if=/dev/urandom of=5_more/5_more% bs=1024k count=5

find . -type f -mmin -10
./4096/file_4096.8
./4096/file_4096.5
./4096/file_4096.3
./4096/file_4096.6
./4096/file_4096.4
./4096/file_4096.1
./4096/file_4096.7
./4096/file_4096.2
./5_more/5_more1
./5_more/5_more4
./5_more/5_more2
./5_more/5_more3
./5_more/5_more5

Example 3 – Periodically synchronizing a directory that contains a large number of small and large files that change over time

Summary

time aws s3 sync . s3://test_bucket/test_randfiles/
upload: 4096/file_4096.1 to s3://test_bucket/test_randfiles/4096/file_4096.1
……….
upload: 4096/file_4096.6 to s3://test_bucket/test_randfiles/4096/file_4096.6
upload: 4096/file_4096.7 to s3://test_bucket/test_randfiles/4096/file_4096.7
upload: 5_more/5_more3 to s3://test_bucket/test_randfiles/5_more/5_more3
upload: 5_more/5_more5 to s3://test_bucket/test_randfiles/5_more/5_more5
……….
upload: 5_more/5_more1 to s3://test_bucket/test_randfiles/5_more/5_more1
upload: 4096/file_4096.8 to s3://test_bucket/test_randfiles/4096/file_4096.8

real    1m3.449s
user    0m31.156s
sys     0m3.620s

This example shows the result of running the sync command to keep local and remote Amazon S3 locations synchronized over time. Synchronizing can be much faster than creating a new copy of the data in many cases.
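As a side note, sync can also report what it would transfer without actually doing it; a small sketch using the same source and destination as above (the --dryrun flag is part of the standard aws s3 sync options):

# Sketch: preview which files sync considers changed, without uploading anything
aws s3 sync --dryrun . s3://test_bucket/test_randfiles/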

Example 3 – Periodically synchronizing a directory that contains a large number of small and large files that change over time

Summary

Launch 26 copies of the aws s3 cp command, one per directory:

time (find smallfiles -mindepth 1 -maxdepth 1 -type d -print0 | xargs -n1 -0 -P30 -I {} aws s3 cp --recursive --quiet {}/ s3://test_bucket/{}/)

real    2m27.878s
user    8m58.352s
sys     0m44.572s

lsof -i tcp:443 | tail -n +2 | wc -l
260

Example 4 – Maximizing throughput

mpstat -P ALL 10

07:02:49 PM  CPU   %usr  %nice  %sys  %iowait  %irq  %soft  %steal  %guest  %idle
07:02:59 PM  all  91.18   0.00  5.67     0.00  0.00   1.85    0.00    0.00   1.30
07:02:59 PM    0  85.30   0.00  6.50     0.00  0.00   7.30    0.00    0.00   0.90
07:02:59 PM    1  92.61   0.00  5.79     0.00  0.00   0.00    0.00    0.00   1.60
07:02:59 PM    2  93.60   0.00  5.10     0.00  0.00   0.00    0.00    0.00   1.30
07:02:59 PM    3  93.49   0.00  5.21     0.00  0.00   0.00    0.00    0.00   1.30

Using 26 invocations of the command improved the execution time by a factor of 8: 2 minutes 27 seconds for 53,248 files vs. the original run time of 20 minutes. The file upload rate improved from 44 files/sec to 362 files/sec.
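With the CPUs already above 90% busy, the fan-out degree is worth tuning rather than guessing. A rough probing sketch under the same assumptions as Example 4 (same smallfiles directory and test_bucket; it re-uploads the same data each round, so use a scratch prefix if overwrites matter):

# Sketch: time a few fan-out degrees to see where CPU or network saturates
for jobs in 4 8 16 26; do
  echo "== $jobs parallel aws s3 cp processes =="
  time (find smallfiles -mindepth 1 -maxdepth 1 -type d -print0 \
        | xargs -n1 -0 -P"$jobs" -I {} aws s3 cp --recursive --quiet {}/ s3://test_bucket/{}/)
done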

Example 4 – Maximizing throughput

Summary

Video: https://www.youtube.com/embed/G-RmHpa_yno

Amazon S3 Transfer Acceleration

54: the number of global points of presence (AWS edge locations) where Amazon CloudFront, Amazon Route 53, and AWS WAF are offered. S3 Transfer Acceleration routes uploads through these edge locations.

Amazon S3 Transfer Acceleration
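A minimal sketch of trying Transfer Acceleration from the CLI, assuming a DNS-compliant bucket name (the examples' test_bucket naming would need adjusting) and that the extra per-GB acceleration cost is acceptable:

# Sketch: enable Transfer Acceleration on a bucket (one-time, per bucket)
aws s3api put-bucket-accelerate-configuration \
  --bucket my-accelerated-bucket --accelerate-configuration Status=Enabled

# Route subsequent aws s3 transfers through the s3-accelerate endpoint
aws configure set default.s3.use_accelerate_endpoint true
aws s3 cp bigfile.1 s3://my-accelerated-bucket/test_bigfiles/bigfile.1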

Amazon Import/Export Snowball

Bonus content


Thank you for attending #AWSBulgaria

Q&A