Application Optimized Performance: Choosing the Right Instance (CPN212) | AWS re:Invent 2013

Delivering Compelling Experiences: Choosing the Right Instance for Application Optimized Performance Jason Waxman, GM & VP Cloud Platforms Group, Intel Corporation November 14, 2013

Upload: amazon-web-services

Post on 02-Dec-2014


DESCRIPTION

(Presented by Intel) Each application places a different set of requirements on the underlying infrastructure. Whether it is web, big data analytics, technical computing, or general enterprise applications, applications are run more efficiently when performance, IO bandwidth, and memory capacity have been custom-tailored for that specific application. Jason Waxman, GM and VP of Intel’s Cloud Platform Group, looks under the hood at the different types of processors that comprise Amazon Web Services instances and shares insights from Intel IT and industry best practices for right-sizing infrastructure for different application characteristics and capabilities. By leveraging the underlying performance, security capabilities, and flexibility of various instance types, developers can more easily migrate applications into the cloud and drive down TCO for cloud-based services.

TRANSCRIPT

Page 1: Application Optimized Performance: Choosing the Right Instance (CPN212) | AWS re:Invent 2013

Delivering Compelling Experiences: Choosing the Right Instance for Application Optimized Performance

Jason Waxman, GM & VP Cloud Platforms Group, Intel Corporation

November 14, 2013

Page 2:

Compelling Experiences Drive Growth

Perceptual: Natural Interaction (voice & gestures, personal assistant)

• 20X growth in speech-driven mobile network traffic [1]

• >22X in smartphones with gesture recognition features [1]

Pervasive: Video/Media Content Delivery, Video Search

• 16X in mobile video traffic [2]

• 4X in servers for media/graphics [3]

Personal: Predictive Analytics (improve healthcare)

• 43% CAGR Big Data & Analytics Infrastructure [4]

Page 3:

The High Costs of a Bad Experience

On average, consumers tell 9 people about good experiences and 16 people about bad experiences [1]

1. http://www.digitalservicecloud.com/resources/blog/good-customer-service.html

Page 4:

Delivering Compelling Experiences

What’s Required?

Page 5:

Diverse Needs, Common Themes

AGILITY RELIABILITY EFFICIENCY

CONSUMER EXPECTATIONS

• Personal & Customized

• Service Availability

• Privacy/Security

• Cost Effective

DEVELOPER REQUIREMENTS

• Fast Service Delivery

• Elasticity/Scalability

• Stable, Consistent

• Privacy/Security

• Cost to Serve - Services & APIs reduce headcount required

Page 6:

Delivering the best experience

SCALE: TAKE ADVANTAGE OF AWS ELASTICITY

OPTIMIZE: CHOOSE THE RIGHT INSTANCES

SECURE: MANAGE RISKS FOR INCREASED DURABILITY

AGILITY EFFICIENCY RELIABILITY

Page 7:

Delivering the best experience

SCALE: TAKE ADVANTAGE OF AWS ELASTICITY

OPTIMIZE: CHOOSE THE RIGHT INSTANCES

SECURE: MANAGE RISKS FOR INCREASED DURABILITY

AGILITY EFFICIENCY RELIABILITY

Page 8:

An Intel Company

Intel® Cloud Services Platform

Intel is using Amazon Web Services

Page 9:

Mashery API Management Service

• 175 Customers

• 60,000 Applications

• 215,000 Developers

• 500,000,000 API calls/day

An Intel Company

Page 10:

Scaling enables responsiveness

Mashery relies on AWS elasticity

• Capacity Planning - Robust Development and QA environment to perform load tests and deploy proof-of-concept silos

• Modular Infrastructure - Loosely coupled single-purpose servers that scale horizontally

• Globally Distributed - Extend the infrastructure to where your customers are…every AWS Region, reaching every corner of the globe.

From 100 queries/sec to 100,000 queries/sec in a matter of minutes

100X NEWS VOLUMES FOR SOME CUSTOMERS AT PASSING OF CELEBRITY IN EARLY 2012

Page 11:

Kevin Baillie, CEO, Atomic Fiction

Page 12:

Company Overview

• Atomic Fiction crafts high-end visual effects (VFX) for film and television.

• Specialties include digital environments and character work

• Staff scales with projects, varying between 15 and 50 artists

• Known for high end work, medium volume, low cost

• Big company infrastructure, small company vibe

• Developing innovative approaches to reducing technological costs in order to free up resources for experienced artistic talent.

Page 13:

Company Overview

Page 14:
Page 15:

Our AWS Story

• Pixar-sized render farm in minutes. iMac-sized the next.

• Only pay for what we use

• No physical limits on computing = no physical limits on creativity

• Seamless experience through tools like ZYNC

• Decreased load, thus increased performance, on local filesystem

• Same price for faster artist turnaround

• 100 computers for 10 hours = $2,200

• 1000 computers for 1 hour = $2,200
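
The two price lines above follow from billing that is linear in instance-hours. A minimal Python sketch of that arithmetic, assuming a flat per-instance-hour rate back-computed from the slide ($2,200 for 1,000 instance-hours); `render_cost` and `IMPLIED_RATE` are illustrative names, not anything Atomic Fiction or AWS publishes:

```python
def render_cost(instances: int, hours: float, rate_per_instance_hour: float) -> float:
    """Total cost when the bill is linear in instance-hours."""
    return instances * hours * rate_per_instance_hour


# Assumption: flat rate implied by the slide, $2,200 / (100 instances * 10 hours).
IMPLIED_RATE = 2200 / (100 * 10)

narrow = render_cost(100, 10, IMPLIED_RATE)   # small farm, long wait
wide = render_cost(1000, 1, IMPLIED_RATE)     # large farm, short wait

print(f"100 x 10h: ${narrow:,.0f}   1000 x 1h: ${wide:,.0f}")
```

Because the instance-hour total is the same in both cases, the only thing that changes is how long the artist waits for the result.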

Page 16:

Problems we’re trying to solve

• The problem: how do we “unlimit” creativity while staying profitable?

• Freeing the creative so that directors can achieve their vision

• Artists need iterations in order to hit “the look” and stay on schedule

• Need security, task-appropriate stats, and unlimited availability on demand

• Choosing the right instance: speed vs memory vs cost

    • c1.xlarge – 20 compute units, 7GB RAM: low cost per hour for lightweight compositing tasks

    • cc2.8xlarge – 88 compute units, 60.5GB RAM: beefy RAM, good $ per compute unit cost proposition

• Working with our partners at ZYNC, implementation was plug & play!
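
The speed vs memory vs cost trade-off above can be sketched as a small selection routine. This is only an illustration: the compute-unit and RAM figures come from the slide, but the hourly prices are placeholders chosen for the example, and `InstanceType`, `cheapest_per_hour` and `cheapest_per_compute_unit` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class InstanceType:
    name: str
    compute_units: float
    ram_gb: float
    price_per_hour: float  # placeholder value, NOT taken from the slides


def cheapest_per_hour(candidates: Iterable[InstanceType]) -> InstanceType:
    """Lightweight tasks: the lowest hourly price wins."""
    return min(candidates, key=lambda c: c.price_per_hour)


def cheapest_per_compute_unit(candidates: Iterable[InstanceType],
                              min_ram_gb: float) -> InstanceType:
    """Heavy tasks: among types with enough RAM, lowest $ per compute unit wins."""
    eligible = [c for c in candidates if c.ram_gb >= min_ram_gb]
    if not eligible:
        raise ValueError("no instance type satisfies the memory requirement")
    return min(eligible, key=lambda c: c.price_per_hour / c.compute_units)


CATALOG = [
    InstanceType("c1.xlarge", 20, 7.0, 0.60),     # price is illustrative only
    InstanceType("cc2.8xlarge", 88, 60.5, 2.40),  # price is illustrative only
]

light = cheapest_per_hour(CATALOG)
heavy = cheapest_per_compute_unit(CATALOG, min_ram_gb=32)
print(light.name, heavy.name)
```

Substitute current prices before drawing any conclusion; the ranking depends entirely on them.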

Page 17:
Page 18:

Star Trek Into Darkness

Page 19:

Star Trek Into Darkness

Page 20:

Star Trek Into Darkness

Page 21:

Star Trek Into Darkness

Page 22:

Key Learnings/What’s Next?

• For Star Trek Into Darkness, we achieved exactly what JJ Abrams wanted

• Key findings:

• For over 80% of tasks, cc2.8xlarge was fastest & most cost effective

• Given higher per-hour costs, efficient utilization of high end instances is critical. Partial hours = wasted money!

• Burstability was critical for hitting deadlines. Ran between 0 and 400 instances simultaneously depending on the needs of the moment.

• Grew over 200% month-over-month two months in a row

• Our ideal instance would be inexpensive, high compute power (Intel Xeon E5-2600 v2), medium memory (32-48GB) with nVidia GPU.

• Next for us: moving even more of our workflow into the cloud!
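
The "partial hours = wasted money" finding can be made concrete with a short sketch. It assumes every started hour is billed in full, as the slide implies; the hourly rate and job length in the example are hypothetical, and the function names are illustrative.

```python
import math


def billed_hours(runtime_hours: float) -> int:
    """Hours charged when each started hour is billed in full (assumed model)."""
    return math.ceil(runtime_hours)


def utilization(runtime_hours: float) -> float:
    """Fraction of the billed time that did useful work."""
    billed = billed_hours(runtime_hours)
    return runtime_hours / billed if billed else 0.0


def wasted_cost(runtime_hours: float, rate_per_hour: float, instances: int = 1) -> float:
    """Money paid for the unused remainder of the last billed hour."""
    idle = billed_hours(runtime_hours) - runtime_hours
    return idle * rate_per_hour * instances


# Hypothetical burst: 400 instances each finishing after 1.25 hours at $2.00/hour.
print(billed_hours(1.25), utilization(1.25), wasted_cost(1.25, 2.00, instances=400))
```

Packing work so that each instance finishes close to an hour boundary is what keeps the high-end instances cost effective.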

Page 23:
Page 24:

Delivering the best experience

SCALE: TAKE ADVANTAGE OF AWS ELASTICITY

OPTIMIZE: CHOOSE THE RIGHT INSTANCES

SECURE: MANAGE RISKS FOR INCREASED DURABILITY

AGILITY EFFICIENCY RELIABILITY

Page 25:

Optimize: Choose the right Instance

[Figure: EC2 instance types mapped to datacenter workloads along a spectrum from "Higher latency, lower throughput" to "Lower latency, higher throughput"]

Workloads: E-Commerce; Dedicated Hosting; Enterprise Applications; High Performance Computing; Big Data; Content Delivery and Gaming; Graphics Rendering

Workload characteristics: I/O Intensive; CPU & Memory Intensive

Other segments shown: Cold Storage; Low End Networking; Edge Routing; Storage De-dupe; Cloud RAN; Small Cell

Instances (processor where shown): Micro Instance; M1 Standard Instance; M3 Standard Instance (E5-2670); C1 Compute Instance; Cluster Compute Instance (E5-2670); Cluster Graphics Instance (X5570); M2 Memory Optimized; CR1 Memory Optimized (E5-2670); Storage-Optimized (E5-2650); High Memory; G2 GPU Instance (E5-2670)

Page 26:

Intel® Cloud Services Platform

• 175,000 Users

• 5M users by 2014

• Entire path of production is on AWS

• 1448 instances…

Our Wake Up Moment…

One month we spent $300K…60% of which we found later was wasted…

• We were spinning up instances and forgetting they were on

• We had larger instances than we actually needed

• Most instances never went over 10% utilization…

Page 27:

Optimize for Efficiency

Select the Right Instance

• 108hrs PER WEEK OF UNUTILIZED INSTANCES

• $100K OF SAVINGS BY TURNING OFF NON PRODUCTION INSTANCES AFTER HOURS

• >60% TOTAL SAVINGS

Keys to Success

• Analyze: “Trusted Advisor”

• Select the right type of instance for your workload

• Size & Features

• # of Instances

• Reserve instances where possible for cost efficiency
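
A rough sketch of the two checks behind these numbers: flagging instances that never exceed 10% utilization, and estimating what is saved by switching non-production instances off after hours. The slide does not say how the 108 hours was derived; one schedule that happens to reproduce it is a 12-hour, 5-day working week (168 - 60 = 108), which is used here purely as an assumption. Instance names, utilization figures and the hourly rate are hypothetical.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def off_hours_per_week(hours_per_day: float, days_per_week: int) -> float:
    """Hours per week an instance is not needed under a fixed working schedule."""
    return HOURS_PER_WEEK - hours_per_day * days_per_week


def underutilized(peak_cpu_percent: dict, threshold: float = 10.0) -> list:
    """Names of instances whose peak CPU stayed below the threshold."""
    return sorted(name for name, peak in peak_cpu_percent.items() if peak < threshold)


def weekly_savings(instance_count: int, hourly_rate: float,
                   hours_per_day: float, days_per_week: int) -> float:
    """Cost avoided per week by stopping instances outside working hours."""
    return instance_count * hourly_rate * off_hours_per_week(hours_per_day, days_per_week)


# Hypothetical monitoring data: peak CPU % per instance over the last month.
peaks = {"dev-a": 4.0, "prod-b": 55.0, "qa-c": 9.5}

print(off_hours_per_week(12, 5))           # assumed 12h x 5d schedule
print(underutilized(peaks))                # candidates for downsizing or shutdown
print(weekly_savings(10, 0.50, 12, 5))     # 10 instances at a made-up $0.50/hour
```

In practice the utilization data would come from a monitoring service rather than a hand-written dictionary.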

Page 28:

Steve Litster, PhD. Global Head of Scientific Computing

Novartis Institutes for Biomedical Research

Accelerating Science

Page 29:

Novartis Institutes for BioMedical Research (NIBR)

Unique research strategy driven by patient needs

World-class research organization with about 6000 scientists globally

Intensifying focus on molecular pathways shared by various diseases

Integration of clinical insights with mechanistic understanding of disease

Research-to-Development transition redefined through fast and rigorous “proof-of-concept” trials

Strategic alliances with academia and biotech strengthen preclinical pipeline

Page 30:

Requirements

• Large Scale Computational Chemistry Simulation

• Results in under a week

• Flexible target

• Ability to run multiple experiments “on-demand”

Challenges

• Sustained access to 50000+ compute cores

• Ability to monitor and re-launch jobs

• No additional Capital Expenditure

• Internal HPCC already running at capacity

Job Profile

• Embarrassingly Parallel

• CPU Bound

• Low I/O, Memory and Network requirements

Accelerating the Science

Virtual Screening

[Figure: the binding site of a target molecule (the "Lock") matched against compound molecules (the "Keys")]

Page 31:

The Cloud: Flexible Science on Flexible Infrastructure

Engineering the right infrastructure for a workload:

Software runs the same job many times across instance types

Measures the throughput and determines the $ per job

Use the instances that provide the best scientific ROI

CC2 instance (Intel Xeon® ‘Sandy Bridge’) ran best for this
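
The benchmarking loop described above (run the same job on several instance types, measure throughput, derive $ per job) might look like the sketch below. The instance names, prices and job counts are invented for illustration and do not reproduce the Novartis measurements; `cost_per_job` and `best_instance` are hypothetical helpers.

```python
def cost_per_job(price_per_hour: float, jobs_completed: int, elapsed_hours: float) -> float:
    """Dollars per job, from measured throughput (jobs per hour)."""
    throughput = jobs_completed / elapsed_hours
    return price_per_hour / throughput


def best_instance(benchmarks: dict) -> str:
    """Instance type with the lowest measured cost per job."""
    return min(benchmarks, key=lambda name: cost_per_job(*benchmarks[name]))


# Hypothetical measurements: (price per hour, jobs completed, elapsed hours).
BENCHMARKS = {
    "type-a": (0.50, 40, 1.0),
    "type-b": (2.00, 200, 1.0),
}

for name, sample in BENCHMARKS.items():
    print(name, round(cost_per_job(*sample), 4))
print("best:", best_instance(BENCHMARKS))
```

Note that the more expensive type wins here only because its measured throughput is proportionally higher; the point of the benchmark is that this cannot be read off the price list.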

Page 32:

Super Computing in the Cloud

Metric                      Count
Compute Hours of Science    341,700 hours
Compute Days of Science     14,238 days
Compute Years of Science    39 years
AWS Instance Count (CC2)    10,600 instances

$44 Million infrastructure

10 million compounds screened

39 Drug Design years in 11 hours for a cost of …$4,232

3 promising compounds identified
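
The figures on this slide are related by simple unit conversions, which the following sketch checks. Only numbers stated on the slide are used; the derived per-hour and per-compound costs are back-of-the-envelope values, and the slide does not say whether a "compute hour" counts a core or an instance.

```python
COMPUTE_HOURS = 341_700
WALL_CLOCK_HOURS = 11
TOTAL_COST_USD = 4_232
COMPOUNDS_SCREENED = 10_000_000

compute_days = COMPUTE_HOURS / 24            # the slide rounds this up to 14,238
compute_years = compute_days / 365           # roughly the 39 years quoted
cost_per_compute_hour = TOTAL_COST_USD / COMPUTE_HOURS
cost_per_compound = TOTAL_COST_USD / COMPOUNDS_SCREENED
avg_parallelism = COMPUTE_HOURS / WALL_CLOCK_HOURS  # compute-hours consumed per wall-clock hour

print(f"{compute_days:,.1f} days, {compute_years:.1f} years")
print(f"${cost_per_compute_hour:.4f} per compute hour, ${cost_per_compound:.6f} per compound")
print(f"{avg_parallelism:,.0f} compute-hours per wall-clock hour")
```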

Page 33:

Key Learnings/What’s Next?

Diversity of Life Sciences brings unique challenges

Spend the time analyzing and tuning

Flexibility, Scalability and Performance

Time to rethink and retool

Challenge the Science and the Scientist

Collaboration

Future plans

Chemical Universe : 166 Billion cpds ≤ 17 atoms (Extreme scale CPU)

Next Generation Sequencing in the Cloud (Extreme CPU, Mem, I/O)

“Disruptive” Technologies-Imaging (x10 NGS requirements!)

Page 34:

Delivering the best experience

SCALE: TAKE ADVANTAGE OF AWS ELASTICITY

OPTIMIZE: CHOOSE THE RIGHT INSTANCES

SECURE: MANAGE RISKS FOR INCREASED DURABILITY

AGILITY EFFICIENCY RELIABILITY

Page 35:

Optimizing for Security Performance: higher performance saves cost

Intel Internal Benchmark

OPEN SSL PERFORMANCE: Instance Requirements for 400 Mbps

• m1.xlarge: 4 instances (note 2), $10K/year

• m3.xlarge (w/ AES-NI): 1 instance required, 2 with redundancy (note 1), $5.6K/year

1 - Not required but added for redundancy

2 - Requirement is 3.2, but you can’t buy .2, so round up to 4

SAVINGS: 50-75% by upgrading from m1.xlarge to the more expensive m3.xlarge because of AES-NI
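
The instance counts in this benchmark come from rounding a fractional requirement up and then optionally adding a spare. A minimal sketch of that sizing rule follows; the per-instance throughput values are back-computed assumptions (125 Mbps reproduces the 3.2 quoted for m1.xlarge, and 400 Mbps is simply the smallest value for which one m3.xlarge suffices), not published benchmark results, and `instances_required` is an illustrative name.

```python
import math


def instances_required(target_mbps: float, per_instance_mbps: float,
                       redundancy: int = 0) -> int:
    """Whole instances needed for a throughput target, plus optional spares."""
    return math.ceil(target_mbps / per_instance_mbps) + redundancy


TARGET_MBPS = 400

# Assumed per-instance SSL throughput, chosen to match the slide's footnotes.
m1_count = instances_required(TARGET_MBPS, per_instance_mbps=125)                 # 3.2 rounds up
m3_count = instances_required(TARGET_MBPS, per_instance_mbps=400, redundancy=1)   # 1 + 1 spare

print(m1_count, m3_count)
```

The rounding step is why a faster instance can cost less overall even at a higher hourly price: it removes whole instances from the bill, not fractions.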

Page 36:

NASDAQ OMX

WE LIST ~3300 GLOBAL COMPANIES WORTH $6 TRILLION IN MARKET CAP REPRESENTING DIVERSE INDUSTRIES AND MANY OF THE WORLD’S MOST WELL-KNOWN AND INNOVATIVE BRANDS

OUR TECHNOLOGY IS USED TO POWER MORE THAN 70 MARKETPLACES IN 50 COUNTRIES

OUR GLOBAL PLATFORM CAN HANDLE MORE THAN 1 MILLION MESSAGES/SECOND AT A MEDIAN SPEED OF SUB-55 MICROSECONDS

FinQloud R3 (Regulatory Record Retention)

Cloud Computing Platform Exclusively for Financial Services

POWERED BY AMAZON WEB SERVICES

• Security

• Elastic

• Durable/Available

• Cost Effective

• Transparent

Page 37:

NASDAQ OMX Security Protocols

• Data Classification: Define and enforce what is and is not approved for the Cloud

• Encryption: Security at all times (in flight and at rest); SSL for all data

• Audit: Any action someone does in R3 is audited and fully transparent to system admins and regulators

• AWS Built In Security: IAM, MFA, VPC, Direct Connect private circuits, routing/firewalls, etc.

* Highly confidential data must be encrypted and the keys must be stored in HSMs

Page 38:

Technology: What’s Next

Intel® Xeon® Processor E5-2600 v2 Family

Software Defined Infrastructure

Rack Scale Architecture

Page 39:

Diversity of Datacenter Workloads

[Figure: EC2 instance types mapped to datacenter workloads along a spectrum from "Higher latency, lower throughput" to "Lower latency, higher throughput"]

Workloads: E-Commerce; Dedicated Hosting; Enterprise Applications; High Performance Computing; Big Data; Content Delivery and Gaming; Graphics Rendering

Workload characteristics: I/O Intensive; CPU & Memory Intensive

Other segments shown: Cold Storage; Low End Networking; Edge Routing; Storage De-dupe; Cloud RAN; Small Cell

Instances (processor where shown): Micro Instance; M1 Standard Instance; M3 Standard Instance (E5-2670); C1 Compute Instance; Cluster Compute Instance (E5-2670); Cluster Graphics Instance (X5570); M2 Memory Optimized; CR1 Memory Optimized (E5-2670); Storage-Optimized (E5-2650); High Memory; G2 GPU Instance (E5-2670)

“New” EC2 C3 Compute Optimized w/ Latest Intel® Xeon® E5-2600v2 Processors (E5-2680v2)

“New” EC2 Storage Optimized w/ Latest Intel® Xeon® E5-2600v2 Processors (E5-2670v2)

Page 40:

Technology Matters

Amazon Web Services discloses instances based on Intel Xeon

Page 41:

New 2013 AWS C3 Compute-Optimized Instance

Powered by “NEW” Intel® Xeon® E5-2600v2 processor family

26K cores based on Intel® Xeon® processor E5-2680v2

484 TFLOPs*

*SC’13 Submission

Page 42:

The Future of the platform: Intel Rack Scale Architecture Innovation

Platform Flexibility - Increase useful life, and capacity

Orchestration

Increases Agility, Efficiency & Reliability

CPU / Mem Modules

Silicon – Intel® Atom™ & Xeon

Photonics & switch fabric

Storage – PCIE – SSD & Caching

Open Network Platform

Network platform – Flexible & Cost effective

Increase utilization thru storage aggregation

Extreme Compute and Network bandwidth

Page 43:

The Future of Infrastructure: Intel’s Approach to Software Defined Infrastructure

A world where the application defines the system

EXPOSED & INTEGRATED TELEMETRY

Hardware and infrastructure attributes are exposed and integrated with orchestration software for deeper insight & optimal provisioning management

BROADEST ENABLED ECOSYSTEM

Integrated and optimized for all leading commercial and open source operating environments for more seamless Data Center operations

PLATFORM AND ARCHITECTURAL LEADERSHIP

Standards-based compute, network and storage building blocks in Intel’s Rack Scale Architecture drive maximum infrastructure efficiency and flexibility

[Diagram labels: AMAZON WEB SERVICES (AWS); Service Assurance Manager; Rack Scale Architecture]

Storage

• Scalable Intel® Atom® & Xeon® storage solutions

• SSDs with Cache Acceleration

• Lustre

• NVM/Crystal Ridge

Network

• Open Network Platforms

• Wind River OS

• DPDK

• Cave Creek

• Silicon Photonics

Compute

• Intel Xeon

• Atom® C2000

• Intel Xeon Phi

• Integrated graphics

• TXT

Page 44:

Delivering the best experience

SCALE: TAKE ADVANTAGE OF AWS ELASTICITY

OPTIMIZE: CHOOSE THE RIGHT INSTANCES

SECURE: MANAGE RISKS FOR INCREASED DURABILITY

AGILITY EFFICIENCY RELIABILITY

Page 45: