Deep Dive on Delivering Amazon EC2 Instance Performance

Adam Boeglin, HPC Solutions Architect, July 13, 2016

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Upload: amazon-web-services

Post on 15-Apr-2017


Page 1: Deep Dive on Delivering Amazon EC2 Instance Performance

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Adam Boeglin, HPC Solutions Architect

July 13, 2016

Deep Dive on Delivering Amazon EC2 Instance Performance

Page 2: Deep Dive on Delivering Amazon EC2 Instance Performance

What to Expect from the Session

Understanding the factors that go into choosing an EC2 instance

Defining system performance and how it is characterized for different workloads

How Amazon EC2 instances deliver performance while providing flexibility and agility

How to make the most of your EC2 instance experience through the lens of several instance types

Page 3: Deep Dive on Delivering Amazon EC2 Instance Performance

Amazon Elastic Compute Cloud is Big

Instances

API

Networking

EC2

Purchase options

Page 4: Deep Dive on Delivering Amazon EC2 Instance Performance

Host Server

Hypervisor

Guest 1 Guest 2 Guest n

Amazon EC2 Instances

Page 5: Deep Dive on Delivering Amazon EC2 Instance Performance

In the Past

First launched in August 2006

M1 instance

“One size fits all” M1

Page 6: Deep Dive on Delivering Amazon EC2 Instance Performance

Amazon EC2 Instances History

Timeline figure, 2006 to 2016, listing instance types in the order they appear on the slide:

m1.small, m1.large, m1.xlarge, c1.medium, c1.xlarge, m2.xlarge, m2.4xlarge, m2.2xlarge, cc1.4xlarge, t1.micro, cg1.4xlarge, cc2.8xlarge, m1.medium, hi1.4xlarge, m3.xlarge, m3.2xlarge, hs1.8xlarge, cr1.8xlarge, c3.large, c3.xlarge, c3.2xlarge, c3.4xlarge, c3.8xlarge, g2.2xlarge, i2.xlarge, i2.2xlarge, i2.4xlarge, i2.8xlarge, m3.medium, m3.large, r3.large, r3.xlarge, r3.2xlarge, r3.4xlarge, r3.8xlarge, t2.micro, t2.small, t2.medium, c4.large, c4.xlarge, c4.2xlarge, c4.4xlarge, c4.8xlarge, d2.xlarge, d2.2xlarge, d2.4xlarge, d2.8xlarge, g2.8xlarge, t2.large, m4.large, m4.xlarge, m4.2xlarge, m4.4xlarge, m4.10xlarge, x1.32xlarge, t2.nano

Page 7: Deep Dive on Delivering Amazon EC2 Instance Performance

c4.large: instance family (c), instance generation (4), instance size (large)
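
The naming scheme on this slide can be illustrated with a small parser. This is only a sketch: the regular expression is an assumption inferred from the instance names that appear in this deck (letters, then digits, then a dot and a size), and the `InstanceType` and `parse_instance_type` names are hypothetical, not part of any AWS tool.

```python
import re
from typing import NamedTuple


class InstanceType(NamedTuple):
    family: str      # e.g. "c"
    generation: int  # e.g. 4
    size: str        # e.g. "large"


# Assumed pattern: family letters, generation digits, ".", size.
_PATTERN = re.compile(r"^([a-z]+)(\d+)\.([0-9a-z]+)$")


def parse_instance_type(name: str) -> InstanceType:
    """Split a name such as 'c4.large' into family, generation, and size."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized instance type: {name!r}")
    family, generation, size = match.groups()
    return InstanceType(family, int(generation), size)
```

For example, `parse_instance_type("c4.large")` yields family `c`, generation `4`, and size `large`.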

Page 8: Deep Dive on Delivering Amazon EC2 Instance Performance

Choices and Flexibility

Choice of Processor

Memory

Storage Options

Accelerated Graphics

Burstable Performance

Page 9: Deep Dive on Delivering Amazon EC2 Instance Performance

Hiring a Server

Servers are hired to do jobs

Performance is measured differently depending on the job

Page 10: Deep Dive on Delivering Amazon EC2 Instance Performance

Performance Factors

Resource | Performance factors | Key indicators

CPU | Sockets, number of cores, clock frequency, bursting capability | CPU utilization, run queue length

Memory | Memory capacity | Free memory, anonymous paging, thread swapping

Network interface | Max. bandwidth, packet rate | Receive throughput, transmit throughput over max. bandwidth

Disks | Input/output operations per second, throughput | Wait queue length, device utilization, device errors

Page 11: Deep Dive on Delivering Amazon EC2 Instance Performance

Resource Utilization

For a given level of performance, how efficiently are resources being used?

A resource at 100% utilization can’t accept any more work

Low utilization can indicate that more of a resource is being purchased than needed

Page 12: Deep Dive on Delivering Amazon EC2 Instance Performance

Example: Web Application

MediaWiki installed on Apache with 140 pages of content

Load increased in intervals over time

Page 13: Deep Dive on Delivering Amazon EC2 Instance Performance

Example: Web Application

Memory stats

Page 14: Deep Dive on Delivering Amazon EC2 Instance Performance

Example: Web Application

Disk stats

Page 15: Deep Dive on Delivering Amazon EC2 Instance Performance

Example: Web Application

Network stats

Page 16: Deep Dive on Delivering Amazon EC2 Instance Performance

Example: Web Application

CPU stats

Page 17: Deep Dive on Delivering Amazon EC2 Instance Performance

“Launching new instances and running tests in parallel is easy…[when choosing an instance] there is no substitute for measuring the performance of your full application.”

- EC2 documentation

Page 18: Deep Dive on Delivering Amazon EC2 Instance Performance

How Not to Choose an EC2 Instance

Brute force testing

Ignoring metrics

Favoring old generation instances

Guessing based on what you already have

Page 19: Deep Dive on Delivering Amazon EC2 Instance Performance

EC2 Instance Families

General purpose: M4

Compute-optimized: C4, C3

Storage and IO-optimized: I2, D2

GPU-enabled: G2

Memory-optimized: R3, X1

Page 20: Deep Dive on Delivering Amazon EC2 Instance Performance

Instance Selection = Performance Tuning

Give back instances as easily as you can acquire new ones

Find an ideal instance type and workload combination

EC2 Instance Pages provide “Use Case” Guidance

With Amazon Elastic Block Store, storage and instance size don’t need to be coupled

Page 21: Deep Dive on Delivering Amazon EC2 Instance Performance

Instance Sizing

1 x c4.8xlarge

2 x c4.4xlarge

4 x c4.2xlarge

8 x c4.xlarge

Page 22: Deep Dive on Delivering Amazon EC2 Instance Performance

Choosing the Right Size

Understand your unit of work

Web request

Database/table

Batch process

What are that unit’s requirements?

CPU threads

Memory constraints

Disk and network

What are its availability requirements?

Page 23: Deep Dive on Delivering Amazon EC2 Instance Performance

CPU Instructions and Protection Levels

A CPU has at least two protection levels.

Privileged instructions can’t be executed in user mode, which protects the system. Applications reach the kernel through system calls.

Kernel

Application

Page 24: Deep Dive on Delivering Amazon EC2 Instance Performance

VMM

Application

Kernel

PV

X86 CPU Virtualization: Prior to Intel VT-x

Binary translation for privileged instructions

Paravirtualization (PV)

PV requires going through the VMM, adding latency

Applications that are system call bound are most affected

Page 25: Deep Dive on Delivering Amazon EC2 Instance Performance

Kernel

Application

VMM

PV-HVM

X86 CPU Virtualization: After Intel VT-x

Hardware-assisted virtualization (HVM)

PV-HVM uses PV drivers opportunistically for operations that are slow when emulated, e.g., network and block I/O

Page 26: Deep Dive on Delivering Amazon EC2 Instance Performance

Tip: Use HVM AMIs with Amazon EBS

Page 27: Deep Dive on Delivering Amazon EC2 Instance Performance

Time Keeping Explained

Time keeping in an instance is deceptively hard

gettimeofday(), clock_gettime(), QueryPerformanceCounter()

The TSC

CPU counter, accessible from userspace

Requires calibration, vDSO

Invariant on Sandy Bridge+ processors

Xen pvclock; does not support vDSO

On current generation instances, use TSC as clocksource

Page 28: Deep Dive on Delivering Amazon EC2 Instance Performance

Tip: Use TSC as Clocksource
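
One way to see which clocksource a Linux instance is using is to read the kernel's sysfs entries. The sketch below assumes the standard Linux location `/sys/devices/system/clocksource/clocksource0`; the function names are hypothetical, and the `base` parameter exists only so the logic can be tried against a scratch directory.

```python
from __future__ import annotations

from pathlib import Path

# Standard Linux sysfs directory for clocksource settings.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base: Path = CLOCKSOURCE_DIR) -> tuple[str, list[str]]:
    """Return (current clocksource, available clocksources)."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


def tsc_advice(current: str, available: list[str]) -> str:
    """Summarize whether TSC is in use or could be selected."""
    if current == "tsc":
        return "already using tsc"
    if "tsc" in available:
        # Switching is typically done as root by writing "tsc" to
        # current_clocksource; verify on your own distribution first.
        return f"using {current}; tsc is available"
    return f"using {current}; tsc is not offered by this kernel"
```

Calling `tsc_advice(*read_clocksource())` on an instance prints a one-line summary of the current setting.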

Page 29: Deep Dive on Delivering Amazon EC2 Instance Performance

Review: C4 Instances

Custom Intel E5-2666 v3 at 2.9 GHz

P-state and C-state controls

Model vCPU Memory (GiB) EBS (Mbps)

c4.large 2 3.75 500

c4.xlarge 4 7.5 750

c4.2xlarge 8 15 1,000

c4.4xlarge 16 30 2,000

c4.8xlarge 36 60 4,000

Batch and HPC Workloads, Game Servers, Ad Serving, and High Traffic Web Servers

Page 30: Deep Dive on Delivering Amazon EC2 Instance Performance

What’s New in C4: P-State and C-State Control

Intel Turbo Boost up to 3.5 GHz

When idle cores enter deeper idle states, non-idle cores can achieve up to 300 MHz higher clock frequencies

But deeper idle states require more time to exit and may not be appropriate for latency-sensitive workloads
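
To judge the exit-latency trade-off described here, it can help to list the idle states the kernel exposes and their exit latencies. The following is a sketch that assumes the Linux cpuidle sysfs layout (`/sys/devices/system/cpu/cpu0/cpuidle/stateN/name` and `latency`, the latter in microseconds); the directory may be absent on instance types without C-state control, and the function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Linux cpuidle sysfs directory for the first CPU.
CPUIDLE_DIR = Path("/sys/devices/system/cpu/cpu0/cpuidle")


def list_idle_states(base: Path = CPUIDLE_DIR) -> list[tuple[str, int]]:
    """Return (state name, exit latency in microseconds), shallowest first."""
    if not base.is_dir():
        return []  # cpuidle not exposed on this system
    state_dirs = sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[5:]))
    states = []
    for state_dir in state_dirs:
        name = (state_dir / "name").read_text().strip()
        latency_us = int((state_dir / "latency").read_text())
        states.append((name, latency_us))
    return states


def states_within(states: list[tuple[str, int]], max_latency_us: int) -> list[str]:
    """Names of idle states whose exit latency fits a latency budget."""
    return [name for name, latency in states if latency <= max_latency_us]
```

A latency-sensitive workload might compare `states_within(list_idle_states(), budget)` against the full list to decide whether deeper states should be limited.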

Page 31: Deep Dive on Delivering Amazon EC2 Instance Performance

Tip: P-State Control for AVX2

If an application makes heavy use of AVX2 on all cores, the processor may attempt to draw more power than it should

Processor will transparently reduce frequency

Frequent changes of CPU frequency can slow an application

Page 32: Deep Dive on Delivering Amazon EC2 Instance Performance

Review: T2 Instances

Lowest-cost EC2 instance at $0.0065 per hour

Burstable performance

Fixed allocation enforced with CPU credits

Model vCPU Baseline CPU Credits/Hour Memory (GiB) Storage

t2.nano 1 5% 3 .5 EBS Only

t2.micro 1 10% 6 1 EBS Only

t2.small 1 20% 12 2 EBS Only

t2.medium 2 40%** 24 4 EBS Only

t2.large 2 60%** 36 8 EBS Only

General Purpose, Web Serving, Developer Environments, Small Databases

Page 33: Deep Dive on Delivering Amazon EC2 Instance Performance

How Credits Work

A CPU credit provides the performance of a full CPU core for one minute

An instance earns CPU credits at a steady rate

An instance consumes credits when active

Credits expire (leak) after 24 hours

Diagram labels: baseline rate, credit balance, burst rate
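
The credit rules above can be worked through with a small minute-by-minute simulation. This is a simplified sketch, not how EC2 accounts for credits internally: it assumes credits accrue evenly each minute, that the 24-hour expiry can be approximated by capping the balance at 24 hours of earnings, and that an instance with no credits can only spend what it earns (its baseline). The function name and parameters are hypothetical.

```python
def simulate_credits(credits_per_hour, demand, start_balance=0.0):
    """Simulate a CPU credit balance one minute at a time.

    demand: per-minute CPU demand in core-minutes (1.0 = one full core busy).
    Returns a list of (credits spent, balance afterwards) per minute.
    """
    earn_per_minute = credits_per_hour / 60.0
    max_balance = credits_per_hour * 24  # approximation of the 24-hour expiry
    balance = start_balance
    history = []
    for wanted in demand:
        balance = min(balance + earn_per_minute, max_balance)
        spent = min(wanted, balance)  # cannot spend credits that do not exist
        balance -= spent
        history.append((spent, balance))
    return history


if __name__ == "__main__":
    # t2.micro earns 6 credits/hour per the table: idle for an hour, then burst.
    idle = simulate_credits(6, [0.0] * 60)
    burst = simulate_credits(6, [1.0] * 10, start_balance=idle[-1][1])
    for minute, (spent, balance) in enumerate(burst, start=1):
        print(f"minute {minute}: spent {spent:.2f}, balance {balance:.2f}")
```

In this model, once the balance is exhausted the instance can spend only what it earns each minute, which for a t2.micro corresponds to the 10% baseline in the table.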

Page 34: Deep Dive on Delivering Amazon EC2 Instance Performance

Tip: Monitor CPU Credit Balance

Page 35: Deep Dive on Delivering Amazon EC2 Instance Performance

Tip: How to Interpret Steal Time

Fixed allocations of CPU can be offered through CPU caps

Steal time happens when the CPU cap is enforced

Leverage Amazon CloudWatch metrics
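
Inside a Linux guest, steal time is also visible in the aggregate `cpu` line of `/proc/stat`, where it is the eighth counter. The sketch below computes the steal percentage between two samples of that line; the function names are hypothetical, and it assumes a kernel that reports at least the first eight counters (user through steal).

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into tick counters."""
    fields = line.split()
    if len(fields) < 9 or fields[0] != "cpu":
        raise ValueError("expected an aggregate 'cpu' line with a steal field")
    return [int(value) for value in fields[1:]]


def steal_percent(before, after):
    """Percentage of CPU time reported as steal between two samples."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    # Sum user..steal only; guest time is already counted inside user.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total
```

In practice the two samples would be the first line of `/proc/stat` read a few seconds apart.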

Page 36: Deep Dive on Delivering Amazon EC2 Instance Performance

Announced: X1 Instances

Largest memory instance with 2 TB of DRAM

Quad socket, Intel E7 processors with 128 vCPUs

Model vCPU Memory (GiB) Local Storage

x1.32xlarge 128 1952 2x 1920 GB

In-Memory Databases, Big Data Processing, HPC Workloads

Page 37: Deep Dive on Delivering Amazon EC2 Instance Performance

NUMA

Non-uniform memory access

Each processor in a multi-CPU system has local memory that is accessible through a fast interconnect

Each processor can also access memory from other CPUs, but local memory access is a lot faster than remote memory access

Performance is related to the number of CPU sockets and how they are connected: Intel QuickPath Interconnect (QPI)

Page 38: Deep Dive on Delivering Amazon EC2 Instance Performance

r3.8xlarge

Diagram: two NUMA nodes connected by QPI, each with 16 vCPUs and 122 GB of memory

Page 39: Deep Dive on Delivering Amazon EC2 Instance Performance

x1.32xlarge

Diagram: four NUMA nodes connected by QPI links, each with 32 vCPUs and 488 GB of memory

Page 40: Deep Dive on Delivering Amazon EC2 Instance Performance

Tip: Kernel Support for NUMA Balancing

An application will perform best when the threads of its processes are accessing memory on the same NUMA node.

NUMA balancing moves tasks closer to the memory they are accessing.

This is all done automatically by the Linux kernel when automatic NUMA balancing is active: version 3.8+ of the Linux kernel.

Windows support for NUMA first appeared in the Enterprise and Data Center SKUs of Windows Server 2003.
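
On Linux, the NUMA layout and the automatic balancing switch can be inspected from procfs and sysfs. The sketch below assumes the usual locations (`/proc/sys/kernel/numa_balancing` and `/sys/devices/system/node/nodeN/cpulist`), which exist only on kernels built with NUMA support; the function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional


def numa_balancing_enabled(
    path: Path = Path("/proc/sys/kernel/numa_balancing"),
) -> Optional[bool]:
    """True/False if the kernel exposes the setting, None if it does not."""
    try:
        return path.read_text().strip() != "0"
    except OSError:
        return None


def numa_nodes(base: Path = Path("/sys/devices/system/node")) -> dict[int, str]:
    """Map NUMA node number to its CPU list string, e.g. {0: '0-15'}."""
    if not base.is_dir():
        return {}
    nodes = {}
    for node_dir in base.glob("node[0-9]*"):
        nodes[int(node_dir.name[4:])] = (node_dir / "cpulist").read_text().strip()
    return dict(sorted(nodes.items()))
```

On a multi-socket instance, `numa_nodes()` should report one entry per NUMA node, each with its own CPU range.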

Page 41: Deep Dive on Delivering Amazon EC2 Instance Performance

Review: I2 Instances

16 vCPU: 3.2 TB SSD; 32 vCPU: 6.4 TB SSD

365 K random read IOPS for 32 vCPU instance

Model vCPU Memory (GiB) Storage Read IOPS Write IOPS

i2.xlarge 4 30.5 1 x 800 SSD 35,000 35,000

i2.2xlarge 8 61 2 x 800 SSD 75,000 75,000

i2.4xlarge 16 122 4 x 800 SSD 175,000 155,000

i2.8xlarge 32 244 8 x 800 SSD 365,000 315,000

NoSQL Databases, Clustered Databases, Online Transaction Processing (OLTP)

Page 42: Deep Dive on Delivering Amazon EC2 Instance Performance

Split-Driver Model

Diagram labels: hardware (physical CPU, physical memory, network device); VMM (virtual CPU, virtual memory, CPU scheduling); driver domain (back-end driver, device driver); guest domains (application, sockets, front-end driver). The I/O path is numbered in five steps.

Page 43: Deep Dive on Delivering Amazon EC2 Instance Performance

Granting in Pre-3.8.0 Kernels

Requires “grant mapping” prior to 3.8.0

Grant mappings are expensive operations due to TLB flushes

read(fd, buffer,…)

I/O domain Instance

Page 44: Deep Dive on Delivering Amazon EC2 Instance Performance

Granting in 3.8.0+ Kernels, Persistent and Indirect

Grant mappings are set up in a pool once

Data is copied in and out of the grant pool

read(fd, buffer…)

Copy to and from grant pool

Page 45: Deep Dive on Delivering Amazon EC2 Instance Performance

2009–Longer Ago Than You Think

Avatar was the top movie in the theaters

Facebook overtook MySpace in active users

President Obama was sworn into office

The 2.6.32 Linux kernel was released

Page 46: Deep Dive on Delivering Amazon EC2 Instance Performance

Tip: Use 3.8+ Kernel

Amazon Linux 2013.09 or later

Ubuntu 14.04 or later

RHEL/CentOS 7 or later

Etc.
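
A quick way to check this tip on a running instance is to compare the kernel release string against 3.8. The sketch below parses only the leading `major.minor` of the release; the function name is hypothetical, and the check is a rough guide because distribution kernels may carry patches that the version number does not reflect.

```python
import platform
import re


def kernel_at_least(release, minimum=(3, 8)):
    """True if a release string such as '3.10.0-327.el7' is >= minimum."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"cannot parse kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


if __name__ == "__main__":
    if platform.system() == "Linux":
        release = platform.release()
        print(release, "meets 3.8+:", kernel_at_least(release))
```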

Page 47: Deep Dive on Delivering Amazon EC2 Instance Performance

Device Passthrough: Enhanced Networking

SR-IOV eliminates need for driver domain

Physical network device exposes virtual function to instance

Requires a specialized driver, which means:

Your instance OS needs to know about it

EC2 needs to be told your instance can use it
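
Whether the instance OS "knows about it" can be checked by looking at which kernel driver is bound to the network interface. The sketch below reads the `device/driver` symlink under `/sys/class/net`; it assumes `ixgbevf` (the Intel virtual function driver) and `ena` (the Elastic Network Adapter driver) are the driver names of interest, and the function names are hypothetical.

```python
import os
from pathlib import Path

# Driver names assumed to indicate enhanced networking.
ENHANCED_DRIVERS = {"ixgbevf", "ena"}


def interface_driver(interface, sysfs_net=Path("/sys/class/net")):
    """Return the kernel driver bound to an interface, or None if unknown."""
    link = sysfs_net / interface / "device" / "driver"
    try:
        return os.path.basename(os.readlink(link))
    except OSError:
        return None  # no such interface, or a virtual device without a driver


def uses_enhanced_networking(interface, sysfs_net=Path("/sys/class/net")):
    """True if the interface is bound to one of the assumed enhanced drivers."""
    return interface_driver(interface, sysfs_net) in ENHANCED_DRIVERS
```

For example, `uses_enhanced_networking("eth0")` reports whether the first interface is bound to one of those drivers; the interface name will vary by distribution.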

Page 48: Deep Dive on Delivering Amazon EC2 Instance Performance

After Enhanced Networking

Diagram labels: hardware (physical CPU, physical memory, SR-IOV network device); VMM (virtual CPU, virtual memory, CPU scheduling); driver domain and guest domains with NIC drivers; application and sockets. The I/O path is numbered in three steps.

Page 49: Deep Dive on Delivering Amazon EC2 Instance Performance

Elastic Network Adapter

Next Generation of Enhanced Networking

Hardware checksums

Multi-Queue support

Receive-side steering

20 Gbps in a placement group

New Open Source Amazon Network Driver

Page 50: Deep Dive on Delivering Amazon EC2 Instance Performance

EBS Performance

Instance size matters

Match your volume size and type to your instance

Use EBS optimization if EBS performance is important
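
A rough calculation shows why instance size matters for EBS. The sketch below converts an instance's dedicated EBS bandwidth in Mbps (such as the figures in the C4 table earlier in this deck) into MB/s and into an upper bound on IOPS for a given I/O size. The function names and the 16 KiB I/O size are illustrative assumptions, and real throughput also depends on the volume type and size.

```python
def mbps_to_mb_per_s(mbps):
    """Convert megabits per second to megabytes per second."""
    return mbps / 8.0


def max_iops_for_bandwidth(mbps, io_size_kib):
    """Upper bound on IOPS that fits within a bandwidth limit.

    Assumes every I/O is exactly io_size_kib KiB and ignores protocol overhead.
    """
    bytes_per_second = mbps * 1_000_000 / 8.0
    return bytes_per_second / (io_size_kib * 1024)


if __name__ == "__main__":
    for model, mbps in [("c4.large", 500), ("c4.8xlarge", 4000)]:
        print(model, mbps_to_mb_per_s(mbps), "MB/s,",
              int(max_iops_for_bandwidth(mbps, 16)), "IOPS at 16 KiB")
```

Under these assumptions, a volume provisioned for more throughput than the instance's EBS bandwidth allows cannot be driven at its full rate from that instance.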

Page 51: Deep Dive on Delivering Amazon EC2 Instance Performance

Summary: Getting the Most Out of EC2 Instances

Choose HVM AMIs

Time keeping: use TSC

C state and P state controls

Monitor T2 CPU credits

Use a modern Linux kernel

NUMA balancing

Persistent grants for I/O performance

Enhanced networking

Page 52: Deep Dive on Delivering Amazon EC2 Instance Performance
Page 53: Deep Dive on Delivering Amazon EC2 Instance Performance

Virtualization Themes

Bare-metal performance is the goal, and in many scenarios it is already there

History of eliminating hypervisor intermediation and driver domains

Hardware-assisted virtualization

Scheduling and granting efficiencies

Device passthrough

Page 54: Deep Dive on Delivering Amazon EC2 Instance Performance

Next Steps

Visit the Amazon EC2 documentation

Launch an instance and try your app!

Page 55: Deep Dive on Delivering Amazon EC2 Instance Performance

Remember to complete your evaluations!

Page 56: Deep Dive on Delivering Amazon EC2 Instance Performance

Thank you!