(PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

November 12, 2014 | Las Vegas, NV | PFC306 | Brendan Gregg, Performance Engineering, Netflix

Upload: amazon-web-services

Post on 29-Jun-2015


DESCRIPTION

Netflix tunes Amazon EC2 instances for maximum performance. In this session, you learn how Netflix configures the fastest possible EC2 instances while reducing latency outliers. The session explores the various Xen modes (e.g., HVM and PV) and how they are optimized for different workloads. Hear how Netflix chooses Linux kernel versions based on desired performance characteristics, and get a firsthand look at how they set kernel tunables, including hugepages. You also hear about Netflix's use of SR-IOV to enable enhanced networking and their approach to observability, which can exonerate EC2 issues and direct attention back to application performance.

TRANSCRIPT

Page 1: (PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

November 12, 2014 | Las Vegas, NV

PFC306

Brendan Gregg, Performance Engineering, Netflix

Page 9: (PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

[Diagram labels: S3; EC2; Cassandra; EVCache; Applications (Services); ELB; Elasticsearch; SQS; SES]

Page 13: (PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

[Flowchart labels: Start; i2; Select memory to cache working set; Find best balance]

Page 14: (PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

[Diagram labels: ELB; ASG Cluster prod1; Canary; ASG-v010 (three instances); ASG-v011 (three instances)]

Page 17: (PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

[Screenshot annotations: Select instance families; Select resources; From any desired resource, see types & cost]

Page 18: (PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

eg, 8 vCPU:

Page 21: (PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

[Chart labels: Headroom; Unacceptable; Acceptable]

Page 26: (PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

Services

Cost per hour

Page 36: (PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

# schedtool -B PID

Page 37: (PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

vm.swappiness = 0 # from 60
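Before changing a tunable like this one, it helps to record what the instance currently uses. The following is a minimal sketch, not taken from the slides: `sysctl_path` and `sysctl_current` are hypothetical helper names, and the dot-to-slash mapping is a simplification that assumes the key itself contains no literal dots in a component (such as a VLAN interface name).

```shell
#!/bin/sh
# Map a dotted sysctl key to its /proc/sys path (dots become slashes).
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Print the current value of a key, or "unavailable" if the file cannot be
# read (for example on a non-Linux host or in a restricted container).
sysctl_current() {
    path=$(sysctl_path "$1")
    if [ -r "$path" ]; then
        cat "$path"
    else
        echo "unavailable"
    fi
}

echo "vm.swappiness is currently: $(sysctl_current vm.swappiness)"
```

The value can then be changed at runtime with `sysctl -w vm.swappiness=0` as root; persisting it across reboots depends on how the AMI is built.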

Page 38: (PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

# echo never > /sys/kernel/mm/transparent_hugepage/enabled # from madvise
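The `enabled` file lists every mode and marks the active one with square brackets, so checking the result of the command above means picking out the bracketed word. A small sketch under that assumption (`thp_active_mode` is a hypothetical helper name, not part of any tool mentioned in the talk):

```shell
#!/bin/sh
# Extract the active mode, i.e. the token in square brackets, from a line
# such as "always [madvise] never".
thp_active_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# Report the live setting only where the file exists.
thp_file=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$thp_file" ]; then
    echo "THP mode: $(thp_active_mode "$(cat "$thp_file")")"
fi

thp_active_mode "always [madvise] never"   # prints: madvise
```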

Page 39: (PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

vm.dirty_ratio = 80 # from 40
vm.dirty_background_ratio = 5 # from 10
vm.dirty_expire_centisecs = 12000 # from 3000
mount -o defaults,noatime,discard,nobarrier …
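To get a feel for what these percentages mean in absolute terms, the arithmetic can be sketched as below. This is an illustration only: the 64 GiB memory size is a made-up example, `dirty_limit_kb` is a hypothetical helper, and the kernel applies the ratios to dirtyable memory (roughly free plus reclaimable pages) rather than total RAM, so the figures are upper bounds.

```shell
#!/bin/sh
# Approximate writeback threshold in kB for a memory size (kB) and a ratio (%).
# Assumption: total RAM is used as a stand-in for dirtyable memory, which
# overstates the real limit.
dirty_limit_kb() {
    mem_kb=$1
    ratio=$2
    echo $(( mem_kb * ratio / 100 ))
}

mem_kb=$(( 64 * 1024 * 1024 ))   # hypothetical 64 GiB instance
echo "background flush starts near: $(dirty_limit_kb "$mem_kb" 5) kB"
echo "writers are throttled near:   $(dirty_limit_kb "$mem_kb" 80) kB"
echo "dirty pages expire after:     $(( 12000 / 100 )) s"
```

The last line converts `vm.dirty_expire_centisecs` from hundredths of a second to seconds: 12000 centiseconds is 120 seconds, up from 30.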

Page 40: (PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

/sys/block/*/queue/rq_affinity 2
/sys/block/*/queue/scheduler noop
/sys/block/*/queue/nr_requests 256
/sys/block/*/queue/read_ahead_kb 256
mdadm --chunk=64 ...
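One way to apply the four queue settings is to write each value into the matching file under the device's `queue` directory. The sketch below is an assumption about how that could be scripted, not the method used in the talk: `tune_queue` is a hypothetical helper, `xvda` is an example device name, and the optional second argument lets the function be tried against a scratch directory instead of the real `/sys`.

```shell
#!/bin/sh
# Apply the queue settings listed on the slide to one block device.
# $1 = device name, $2 = sysfs root (defaults to /sys; needs root there).
tune_queue() {
    dev=$1
    root=${2:-/sys}
    q="$root/block/$dev/queue"
    [ -d "$q" ] || { echo "no queue directory: $q" >&2; return 1; }
    echo 2    > "$q/rq_affinity"
    echo noop > "$q/scheduler"
    echo 256  > "$q/nr_requests"
    echo 256  > "$q/read_ahead_kb"
}

# Dry run against a scratch tree so nothing on the host is modified.
scratch=$(mktemp -d)
mkdir -p "$scratch/block/xvda/queue"
tune_queue xvda "$scratch"
cat "$scratch/block/xvda/queue/scheduler"
rm -rf "$scratch"
```

On newer multi-queue kernels the `noop` scheduler is not offered and `none` takes its place, so the accepted names should be read from the `scheduler` file first.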

Page 41: (PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

net.core.somaxconn = 1000
net.core.netdev_max_backlog = 5000
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_wmem = 4096 12582912 16777216
net.ipv4.tcp_rmem = 4096 12582912 16777216
net.ipv4.tcp_max_syn_backlog = 8096
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 10240 65535
net.ipv4.tcp_abort_on_overflow = 1 # maybe
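A list in this `key = value # comment` form can be turned into runtime commands mechanically. The sketch below only prints the `sysctl -w` commands so they can be reviewed first; `to_sysctl_cmds` is a hypothetical helper and the three input lines are a subset of the settings above.

```shell
#!/bin/sh
# Turn "key = value  # comment" lines into sysctl -w commands (printed, not run).
to_sysctl_cmds() {
    sed -e 's/#.*//' -e '/^[[:space:]]*$/d' |
    while IFS='=' read -r key value; do
        key=$(echo $key)       # unquoted echo trims surrounding whitespace
        value=$(echo $value)   # and collapses runs of spaces inside the value
        printf 'sysctl -w %s="%s"\n' "$key" "$value"
    done
}

to_sysctl_cmds <<'EOF'
net.core.somaxconn = 1000
net.ipv4.tcp_wmem = 4096 12582912 16777216
net.ipv4.tcp_abort_on_overflow = 1 # maybe
EOF
```

Printing rather than executing keeps settings marked "maybe" a deliberate choice: the output can be edited before it is piped to a root shell.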

Page 42: (PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
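Whether `tsc` can be selected depends on what the instance offers, which the kernel lists in `available_clocksource` next to the file written above. A cautious sketch that checks before switching (`has_clocksource` is a hypothetical helper; the script only reads, it does not change the clocksource):

```shell
#!/bin/sh
# Succeed if clocksource $1 appears in the space-separated list $2.
has_clocksource() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

cs_dir=/sys/devices/system/clocksource/clocksource0
if [ -r "$cs_dir/available_clocksource" ]; then
    avail=$(cat "$cs_dir/available_clocksource")
    echo "current: $(cat "$cs_dir/current_clocksource")"
    if has_clocksource tsc "$avail"; then
        echo "tsc is available"
    else
        echo "tsc is not offered here: $avail"
    fi
fi
```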

Page 47: (PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

[Chart labels: Resource Utilization (%); X]

Page 51: (PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

Application

System Libraries

System Calls

Kernel

Devices

Page 54: (PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

$ sar -n TCP,ETCP,DEV 1
Linux 3.2.55 (test-e4f1a80b) 08/18/2014 _x86_64_ (8 CPU)

09:10:43 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
09:10:44 PM        lo     14.00     14.00      1.34      1.34      0.00      0.00      0.00
09:10:44 PM      eth0   4114.00   4186.00   4537.46  28513.24      0.00      0.00      0.00

09:10:43 PM  active/s passive/s    iseg/s    oseg/s
09:10:44 PM     21.00      4.00   4107.00  22511.00

09:10:43 PM  atmptf/s  estres/s retrans/s isegerr/s   orsts/s
09:10:44 PM      0.00      0.00     36.00      0.00      1.00
[…]
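The retransmit count is easier to judge as a share of outbound segments. A short sketch of that arithmetic, using the `retrans/s` and `oseg/s` values from the sample above (`retrans_pct` is a hypothetical helper name):

```shell
#!/bin/sh
# Retransmits as a percentage of outbound segments, from one sar sample.
# $1 = retrans/s, $2 = oseg/s
retrans_pct() {
    awk -v r="$1" -v o="$2" \
        'BEGIN { if (o > 0) printf "%.2f\n", 100 * r / o; else print "n/a" }'
}

retrans_pct 36.00 22511.00   # prints: 0.16
```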

Page 59: (PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

[Flame graph annotations: Stack frame; Mouse-over frames to quantify; Ancestry]

Page 60: (PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

# git clone https://github.com/brendangregg/FlameGraph
# cd FlameGraph
# perf record -F 99 -ag -- sleep 60
# perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > perf.svg
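The middle stage of that pipeline, `stackcollapse-perf.pl`, emits one line per unique stack: frames joined by semicolons, then a space and the sample count. That folded text can be summarized before it is drawn. The sketch below totals samples by the first frame, which in this output is the process name; `samples_by_root` is a hypothetical helper and the three input lines are invented for illustration, not taken from the talk.

```shell
#!/bin/sh
# Sum sample counts per first frame of folded stacks read from stdin.
samples_by_root() {
    awk '{
        n = $NF                      # last field is the sample count
        split($0, frames, ";")       # frames[1] is the root of the stack
        root = frames[1]
        sub(/ [0-9]+$/, "", root)    # single-frame stacks carry the count too
        total[root] += n
    }
    END { for (r in total) print total[r], r }' | sort -rn
}

# Invented sample data in folded format.
samples_by_root <<'EOF'
java;start_thread;JavaMain 40
java;start_thread;GCTaskThread::run 15
swapper;cpu_idle 45
EOF
```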

Page 62: (PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

[Flame graph annotations: Broken Java stacks (missing frame pointer); Kernel; TCP/IP; GC; Idle thread; Time; Locks; epoll]

Page 65: (PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

# ./iosnoop -ts
Tracing block I/O. Ctrl-C to end.
STARTs          ENDs            COMM       PID   TYPE DEV    BLOCK     BYTES  LATms
5982800.302061  5982800.302679  supervise  1809  W    202,1  17039600  4096   0.62
5982800.302423  5982800.302842  supervise  1809  W    202,1  17039608  4096   0.42
5982800.304962  5982800.305446  supervise  1801  W    202,1  17039616  4096   0.48
5982800.305250  5982800.305676  supervise  1801  W    202,1  17039624  4096   0.43
[…]

# ./iosnoop -h
USAGE: iosnoop [-hQst] [-d device] [-i iotype] [-p PID] [-n name] [duration]
                 -d device       # device string (eg, "202,1")
                 -i iotype       # match type (eg, '*R*' for all reads)
                 -n name         # process name to match on I/O issue
                 -p PID          # PID to match on I/O issue
                 -Q              # include queueing time in LATms
                 -s              # include start time of I/O (s)
                 -t              # include completion time of I/O (s)
[…]
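Because iosnoop prints one line per I/O, its output can be post-processed to look for latency outliers. The following sketch counts the I/Os and reports the largest LATms value; it assumes the `-ts` layout shown above, where data rows begin with a numeric timestamp and LATms is the last column. `latency_summary` is a hypothetical helper, not part of perf-tools.

```shell
#!/bin/sh
# Print the I/O count and the largest LATms value from iosnoop -ts output.
# Only lines whose first field is numeric are counted, which skips headers.
latency_summary() {
    awk '$1 ~ /^[0-9.]+$/ { n++; if ($NF > max) max = $NF }
         END { printf "ios=%d max_ms=%.2f\n", n, max }'
}

latency_summary <<'EOF'
Tracing block I/O. Ctrl-C to end.
STARTs          ENDs            COMM       PID   TYPE DEV    BLOCK     BYTES  LATms
5982800.302061  5982800.302679  supervise  1809  W    202,1  17039600  4096   0.62
5982800.302423  5982800.302842  supervise  1809  W    202,1  17039608  4096   0.42
5982800.304962  5982800.305446  supervise  1801  W    202,1  17039616  4096   0.48
5982800.305250  5982800.305676  supervise  1801  W    202,1  17039624  4096   0.43
EOF
```

For the four rows from the slide this prints `ios=4 max_ms=0.62`.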

Page 67: (PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

# perf record -e skb:consume_skb -ag -- sleep 10
# perf report
[...]
    74.42%  swapper  [kernel.kallsyms]  [k] consume_skb
            |
            --- consume_skb
                arp_process
                arp_rcv
                __netif_receive_skb_core
                __netif_receive_skb
                netif_receive_skb
                virtnet_poll
                net_rx_action
                __do_softirq
                irq_exit
                do_IRQ
                ret_from_intr
[…]

Summarizing stack traces for a tracepoint

perf_events can do many things; it is hard to pick just one example

Page 69: (PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

ec2-guest# ./showboost
CPU MHz     : 2500
Turbo MHz   : 2900 (10 active)
Turbo Ratio : 116% (10 active)
CPU 0 summary every 5 seconds...

TIME      C0_MCYC     C0_ACYC     UTIL  RATIO  MHz
06:11:35  6428553166  7457384521   51%   116%  2900
06:11:40  6349881107  7365764152   50%   115%  2899
06:11:45  6240610655  7239046277   49%   115%  2899
[...]
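The MHz column appears to follow from the two cycle counters: the base clock scaled by C0_ACYC / C0_MCYC and truncated to a whole number. That relationship is inferred from the figures in the output, not stated on the slide, and `real_mhz` is a hypothetical helper used only to check it against the sample rows.

```shell
#!/bin/sh
# Estimate the delivered clock rate: base MHz scaled by ACYC/MCYC, truncated.
# $1 = base MHz, $2 = C0_MCYC, $3 = C0_ACYC
real_mhz() {
    awk -v base="$1" -v m="$2" -v a="$3" \
        'BEGIN { printf "%d\n", int(base * a / m) }'
}

real_mhz 2500 6428553166 7457384521   # first sample row, prints: 2900
real_mhz 2500 6349881107 7365764152   # second sample row, prints: 2899
```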

Real CPU MHz

Page 72: (PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

[Dashboard annotations: Region; App; Breakdowns; Metrics; Options; Interactive Graph; Summary Statistics]

Page 75: (PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

[Dashboard annotations: Utilization; Saturation; Errors; Per device; Breakdowns]

Page 78: (PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

http://aws.amazon.com/ec2/instance-types/

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html

http://www.slideshare.net/cpwatson/cpn302-yourlinuxamioptimizationandperformance

http://www.brendangregg.com/blog/2014-09-27/from-clouds-to-roots.html

http://www.brendangregg.com/blog/2014-05-07/what-color-is-your-xen.html

http://www.brendangregg.com/linuxperf.html

http://www.slideshare.net/brendangregg/linux-performance-tools-2014

http://www.brendangregg.com/USEmethod/use-linux.html

http://www.brendangregg.com/blog/2014-06-12/java-flame-graphs.html

https://github.com/brendangregg/FlameGraph

https://github.com/brendangregg/perf-tools

Page 80: (PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

Talk     Time               Title
PFC-305  Wednesday, 1:15pm  Embracing Failure: Fault Injection and Service Reliability
BDT-403  Wednesday, 2:15pm  Next Generation Big Data Platform at Netflix
PFC-306  Wednesday, 3:30pm  Performance Tuning EC2
DEV-309  Wednesday, 3:30pm  From Asgard to Zuul, How Netflix’s proven Open Source Tools can accelerate and scale your services
ARC-317  Wednesday, 4:30pm  Maintaining a Resilient Front-Door at Massive Scale
PFC-304  Wednesday, 4:30pm  Effective Inter-process Communications in the Cloud: The Pros and Cons of Micro Services Architectures
ENT-209  Wednesday, 4:30pm  Cloud Migration, Dev-Ops and Distributed Systems
APP-310  Friday, 9:00am     Scheduling using Apache Mesos in the Cloud

Page 81: (PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

http://bit.ly/awsevals