PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

Upload: proidea

Post on 29-May-2015

DESCRIPTION

James Kretchmar – CTO EMEA, Akamai Technologies Inc.

James Kretchmar is CTO for Akamai’s Europe, Middle East & Africa region and is responsible for technical strategy across the region. Previously he served as Chair of Akamai’s Architecture Board, responsible for review and oversight of technical designs for Akamai’s globally distributed intelligent platform, as well as providing company-wide technical guidance. Mr. Kretchmar came to Akamai from MIT in 2004 and during his tenure has also served as Architect for Akamai’s Mapping and Network Management systems. He is a published author on network administration and speaks several European languages.

Topic of Presentation: How Akamai scales to serve the largest events on the Internet

Language: English

Abstract: Scaling to meet the ever-increasing demands of users in a world of more and heavier online content is always a challenge, but at Akamai, serving at scale means delivering enormous events like the Olympic Games and the FIFA World Cup to a massive worldwide audience. Akamai’s highly distributed platform of 150,000 servers located in 92 countries around the world is critical to effectively delivering events of this magnitude, but the physical deployment is only half of the story. The true key to serving with good performance at massive scale is in the intelligent algorithms that drive the platform. In this talk, hear James Kretchmar, CTO EMEA, discuss how the design of the Akamai Intelligent Platform enables a fast, reliable and secure online experience for the largest events on the Internet.

TRANSCRIPT

Page 1: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

Scaling to serve the Internet’s largest events

James Kretchmar

CTO EMEA

Page 2: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

©2014 AKAMAI | FASTER FORWARD™

Sochi Olympic Games 2014

Akamai Helps NBC Olympics Reach Record Digital Audience

10.8 Million Total Hours of Online Video Delivered

Page 3: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

2014 Winter Olympics Overview

Customers
• NBC, TV2 Norway, France Televisions, …
• 25 customers total

Coverage
• 24/7 over 17 days
• Live streamed all 98 events
• Global audience

Media Technologies
• Delivery in all major streaming formats
• SecureHD for stream protection
• Media Analytics for QoS monitoring and post-event analysis

Page 4: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

Peak Sochi Traffic Events

Page 6: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

2014 FIFA World Cup

Akamai delivered the World Cup globally to 80+ countries for 50+ rights holders

Page 7: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

FIFA World Cup Overview

Event
• 1 Month, 32 Teams, 64 Matches, 12 Cities

Customers
• 55 Customers (25 in EMEA)

Streaming technologies:
• HLS, HDS, Smooth
• Both with and without stream security (SecureHD)

Akamai Traffic
• 6.99 Tb/s event peak = ~11,200 full DVDs per minute
• 5.3 million unique online viewers for Belgium/USA in the USA ALONE
• 47.8 average online minutes viewed for Germany vs Portugal on WatchESPN
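As a rough check of the DVD comparison on this slide, the peak rate can be converted into discs per minute. The sketch below assumes a single-layer DVD capacity of 4.7 GB (decimal gigabytes); that figure and the function name are assumptions, not something stated in the talk.

```python
# Rough sanity check: convert a bit rate into "full DVDs per minute".
# Assumption (not from the slide): one DVD holds 4.7 GB, i.e. 4.7e9 bytes.
DVD_BYTES = 4.7e9


def dvds_per_minute(terabits_per_second: float) -> float:
    """Return how many DVDs' worth of data flows per minute at the given rate."""
    bytes_per_second = terabits_per_second * 1e12 / 8  # bits -> bytes
    return bytes_per_second * 60 / DVD_BYTES


if __name__ == "__main__":
    print(f"{dvds_per_minute(6.99):,.0f} DVDs per minute")
```

Under that assumption the result lands a little above 11,000 discs per minute, in line with the rounded figure on the slide.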

Page 8: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

World Cup Traffic Peaks

Dramatic, Rapid Demand Spikes!

Global Demand with Regional Concentrations

http://www.akamai.com/html/ms/akamai-delivers-online-streaming-performance.html

Wednesday, June 25, 2014

Page 9: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

Peak Perspective

[Chart: peak traffic in terabits per second (axis from 0 to 7) for 2010, 2012, and 2014]

Page 10: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

Relative Event Size

[Chart: relative event sizes of 1.0, 1.8, 3.1, and 8.4]

Page 11: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

The Akamai Platform

• 149,000 servers
• Located in 92 countries around the world
• Delivers over 2 trillion Internet interactions daily
• Delivers approximately 30% of all Web traffic

Page 12: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

Akamai Customers

Customers on the Akamai platform include:
• All 20 top global eCommerce sites
• 96 of the top 100 online U.S. retailers (Source: Internet Retailer Magazine)
• The top 30 media & entertainment companies
• 7 of the top 10 banks (Source: The Banker)
• 9 of the top 10 largest newspapers
• 9 out of 10 top social media sites
• All of the top anti-virus companies
• One out of every three Global 500® companies (List compiled by Fortune Magazine)

Page 13: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

SCALING 101

Page 14: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

HIGHLY DISTRIBUTED

Page 15: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

Avoid data theft and downtime by extending the security perimeter outside the data-center and protect from increasing frequency, scale and sophistication of web attacks.

Centralized Deployment

[Figure: Total Traffic and Origin Traffic indicators, each on a logarithmic scale from 1 to 10,000]

Ave Distance: HIGH

Page 16: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

Moderately Distributed Deployment

[Figure: Total Traffic and Origin Traffic indicators, each on a logarithmic scale from 1 to 10,000]

Ave Distance: MEDIUM

Page 17: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

Highly Distributed Deployment

[Figure: Total Traffic and Origin Traffic indicators, each on a logarithmic scale from 1 to 10,000]

Ave Distance: LOW

Page 18: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

Ave Distance vs. Deployments

[Chart: Ave Distance (y-axis) versus Number of Deployments (x-axis, 0 to 2,000), with points labeled Centralized, Somewhat Distributed, and Highly Distributed]

Page 19: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

Constrained peering points

12,000 networks connected by peering points

Page 20: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

Avoiding Congestion

Page 21: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

SCALING 201

Page 22: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

INTELLIGENT ALGORITHMS & FINE-GRAINED LOAD BALANCING

Page 23: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

Highly Distributed Deployment

[Figure: Total Traffic and Origin Traffic indicators, each on a logarithmic scale from 1 to 10,000]

Ave Distance: LOW

Page 24: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

Managing Load

Challenges:
• Want to send users to the “best” or “closest” server
• Need to fully utilize servers, else have to over-capacitate
• Need to predict and prevent overload

Typical CDN solution is IP Anycast

Page 25: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

IP Anycast

[Diagram: the same IP address, 10.12.4.80, shown at eight locations]

Page 26: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

Mechanics of fine-grained load balancing

Problems with IP Anycast:
• Doesn’t find best performance
• Doesn’t account for congestion
• Very coarse
• Won’t work for highly distributed
• Little control

Need a system with fine-grained control
• Send these users to these specific machines

Page 27: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

DNS based load-balancing

www.example.com CNAME www.example.com.edgesuite.net

www.example.com.edgesuite.net CNAME a1782.g.akamai.net

a1782.g.akamai.net A 10.7.20.66
a1782.g.akamai.net A 10.7.20.70

static

dynamic

CNAME to a special hostname that can return dynamic answers

Now we can choose exactly which servers for which end-users
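The lookup chain on this slide can be sketched as a toy resolver: the CNAME hops are fixed, while the final hostname is answered differently depending on who is asking. This is only an illustration under stated assumptions, not Akamai's mapping system. The names `STATIC_CNAMES`, `EDGE_SERVERS_BY_REGION`, and `resolve`, the idea of keying answers by a resolver "region", and the "us" addresses are all hypothetical; only the hostnames and the two 10.7.20.x addresses come from the slide.

```python
from typing import Dict, List

# Static part of the chain, as shown on the slide.
STATIC_CNAMES: Dict[str, str] = {
    "www.example.com": "www.example.com.edgesuite.net",
    "www.example.com.edgesuite.net": "a1782.g.akamai.net",
}
DYNAMIC_NAME = "a1782.g.akamai.net"

# Hypothetical mapping data: which servers to hand out per resolver region.
# Only the "eu" addresses appear on the slide; the "us" ones are invented.
EDGE_SERVERS_BY_REGION: Dict[str, List[str]] = {
    "eu": ["10.7.20.66", "10.7.20.70"],
    "us": ["10.9.1.12", "10.9.1.13"],
}
DEFAULT_REGION = "eu"


def resolve(hostname: str, resolver_region: str) -> List[str]:
    """Follow the static CNAMEs, then return region-specific A records."""
    seen = set()
    while hostname in STATIC_CNAMES:
        if hostname in seen:
            raise ValueError("CNAME loop detected")
        seen.add(hostname)
        hostname = STATIC_CNAMES[hostname]
    if hostname != DYNAMIC_NAME:
        raise KeyError(f"no dynamic answers configured for {hostname}")
    # The dynamic step: the answer depends on who is asking.
    return EDGE_SERVERS_BY_REGION.get(
        resolver_region, EDGE_SERVERS_BY_REGION[DEFAULT_REGION]
    )


if __name__ == "__main__":
    print(resolve("www.example.com", "eu"))
    print(resolve("www.example.com", "us"))
```

Because the customer hostname only ever points at the special name, the set of servers handed out can change from one query to the next without any change to the customer's own DNS records.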

Page 28: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

Fine-grained load-balancing

Being able to choose specific servers means:
• Possible to drive close to 100% utilization
• No excess capacity necessary
• Can deal with huge traffic spikes
• Best end user performance
• Can find and route around problems

Page 29: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

Intelligent Algorithms

Full control is good, but only if you know what to do with it …
• Understand the structure of the Internet
• Measure performance between servers and end users in real time
• Assign to the best performing server
• But with fairness if the best would be overloaded
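One simple way to picture "assign to the best performing server, unless it would be overloaded" is a greedy spillover: each group of users goes to its lowest-latency server until that server is full, and the remainder moves to the next best. The talk does not describe the actual algorithm or its fairness rule, so this is a stand-in technique; the function name, group names, latencies, and capacities are all hypothetical.

```python
from typing import Dict, List, Tuple


def assign(
    demand: Dict[str, int],
    latency_ms: Dict[Tuple[str, str], float],
    capacity: Dict[str, int],
) -> Dict[str, List[Tuple[str, int]]]:
    """Greedily place each group's demand on servers, best latency first.

    When the best server is full, the remainder spills to the next best.
    """
    remaining = dict(capacity)  # do not mutate the caller's dict
    plan: Dict[str, List[Tuple[str, int]]] = {}
    for group, units in demand.items():
        ranked = sorted(remaining, key=lambda server: latency_ms[(group, server)])
        shares: List[Tuple[str, int]] = []
        for server in ranked:
            if units == 0:
                break
            take = min(units, remaining[server])
            if take > 0:
                shares.append((server, take))
                remaining[server] -= take
                units -= take
        if units > 0:
            raise RuntimeError(f"not enough capacity for {group}")
        plan[group] = shares
    return plan


if __name__ == "__main__":
    plan = assign(
        demand={"warsaw": 80, "krakow": 50},
        latency_ms={
            ("warsaw", "waw1"): 5, ("warsaw", "krk1"): 12,
            ("krakow", "waw1"): 12, ("krakow", "krk1"): 4,
        },
        capacity={"waw1": 60, "krk1": 100},
    )
    print(plan)
```

A greedy pass like this favors whichever group is processed first, which is one reason a production system would need an explicit fairness rule on top.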

Load vs. Demand
• When a request comes in we don’t know how “heavy” it will be
• Or on what resource
• Must adaptively measure, adjust and predict with a load feedback controller
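The idea of a load feedback controller can be illustrated with a very small proportional controller: the true cost of the traffic is unknown, so the share of traffic sent to a server is nudged up or down based on the load actually measured. This is a generic control-loop sketch, not the controller used on the platform; the target load of 0.8, the gain of 0.5, and the function names are assumptions chosen for the example.

```python
def adjust_share(
    share: float,
    measured_load: float,
    target_load: float = 0.8,
    gain: float = 0.5,
) -> float:
    """Return a new traffic share in [0, 1], nudged toward the target load."""
    error = target_load - measured_load  # positive means there is headroom
    new_share = share * (1.0 + gain * error)
    return max(0.0, min(1.0, new_share))


def simulate(cost_per_share: float, steps: int = 30, share: float = 0.6) -> float:
    """Run the loop against a server whose true cost the controller cannot see."""
    for _ in range(steps):
        measured = share * cost_per_share  # observed only after the fact
        share = adjust_share(share, measured)
    return share


if __name__ == "__main__":
    final_share = simulate(cost_per_share=2.0)
    print(f"share settles near {final_share:.3f}, load near {final_share * 2.0:.3f}")
```

In this toy setup the share starts too high (measured load 1.2) and settles where the measured load matches the target, without the controller ever being told how heavy the requests are.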

Page 30: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

SCALING 301

Page 31: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

RELIABILITY = SCALABILITY

Page 32: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

Reliability = Scalability

The cost of redundancy
• Anything that can break needs a backup
• Requires some extra resources in the system
• If small failures cause large failures, less capacity is available
• For effective use of large capacity, must minimize impact of small failures
• Like RAID for disks

Akamai reliability
• Reliability built in layer upon layer
• Each deployment acts like one big cache, but take a closer look …

Page 33: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

Highly Distributed Deployment

[Figure: Total Traffic and Origin Traffic indicators, each on a logarithmic scale from 1 to 10,000]

Ave Distance: LOW

Page 34: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

Akamai reliability

Within a deployment of servers:
• Disk space effectively shared
• Customers are striped across servers
• Dynamically use more servers if customer load is high
• If a server fails:
  • Reassign in a minimally impacting way (consistent hashing)
  • DNS reflects the new assignments
  • A “buddy” machine grabs the IP

Deployment
• Acts like one big cache
• Multiple machines can fail with no bad effect
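The "minimally impacting" reassignment mentioned on this slide is the defining property of consistent hashing: when a server leaves, only the keys it owned move. Below is a minimal textbook hash ring with virtual nodes, offered as a sketch of the general technique rather than Akamai's implementation. The class name, the use of SHA-256, the 100 virtual nodes per server, and the server and customer names are assumptions for illustration.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    """Map a string to a point on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Textbook consistent-hash ring with virtual nodes."""

    def __init__(self, servers: List[str], replicas: int = 100) -> None:
        self.replicas = replicas
        self._points: List[int] = []      # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> server
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self.replicas):
            point = _hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server: str) -> None:
        for i in range(self.replicas):
            point = _hash(f"{server}#{i}")
            del self._owner[point]
            self._points.remove(point)

    def lookup(self, key: str) -> str:
        """Return the server owning the first ring point after the key's hash."""
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    ring = HashRing(["s1", "s2", "s3", "s4"])
    keys = [f"customer-{i}" for i in range(1000)]
    before = {k: ring.lookup(k) for k in keys}
    ring.remove("s3")  # simulate a server failure
    moved = sum(1 for k in keys if ring.lookup(k) != before[k])
    print(f"{moved} of {len(keys)} keys moved after removing s3")
```

Only keys that previously mapped to the failed server are reassigned; every other key keeps its server, which is why cached content on the surviving machines stays useful.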

Page 35: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

SCALING THE FUTURE

Page 36: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

Quality Requires Bits

[Chart: compressed bit rate (Kbps, axis from 0 to 50,000) by format and year]
• 2001: QVGA, MPEG1, 550 Kbps
• 2004: VGA/SDTV, MPEG2-4, 1,800 Kbps
• 2007: HD-720p, AVC/H.264, 3,500 Kbps
• 2011: HD-1080p, AVC/H.264, 7,500 Kbps
• 2013: 4K-UHD, HEVC/H.265, 16,000-30,000 Kbps
• Future: 8K-UHD, XEVC+?, 50,000+ Kbps
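To see what these bit rates mean for capacity, one can count how many simultaneous streams fit into a fixed amount of bandwidth. The sketch below uses 1 Tb/s as an arbitrary reference capacity and ignores protocol overhead; both are assumptions, as are the function and variable names. For 4K-UHD it uses the lower bound of the range shown on the chart.

```python
# Bit rates in Kbps as listed on the chart (lower bound used for 4K-UHD).
BITRATES_KBPS = {
    "QVGA (2001)": 550,
    "VGA/SDTV (2004)": 1_800,
    "HD-720p (2007)": 3_500,
    "HD-1080p (2011)": 7_500,
    "4K-UHD (2013)": 16_000,
    "8K-UHD (future)": 50_000,
}


def streams_per_terabit(kbps: int) -> int:
    """Whole streams that fit in 1 Tb/s, ignoring protocol overhead."""
    return 10**12 // (kbps * 1000)


if __name__ == "__main__":
    for name, kbps in BITRATES_KBPS.items():
        print(f"{name:>18}: {streams_per_terabit(kbps):>9,} streams per Tb/s")
```

The same capacity that carries well over a million QVGA streams carries only tens of thousands at the projected 8K rate.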

Page 37: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

Delivery Innovation: Akamai Media Client Technology

A toolkit for the next-generation network

Akamai Media Client

Core Services

API

Cache

Hybrid HTTP/UDP (HHU) Acceleration

Multicast Delivery

Peer Assisted Delivery

Intelligent Pre-Positioning

Page 38: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

Connected Device Stack

Media Client Technology

Patent Pending

Specification and reference implementation SDK

OS

Services / Libraries

Application Framework

App

Media Client SDK

Media Client SDK

App layer integration

Service layer integration

Standard HTTP

App

Can be implemented at the App Layer or the Service Layer

Page 39: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

Akamai’s Media Client Technology Initiative

Hybrid HTTP/UDP

Residential Gateway

HTTP/UDP Unicast Acceleration

✔ Ready to Play Later

Page 40: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

Akamai Network

ISP Network

HTTP

Origin

Akamai Edge Server Non-Multicast Router

100,000 Viewers

One 2 Mbps stream per viewer = 200 Gbps

Akamai’s Media Client Technology Initiative

Multicast
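The 200 Gbps figure on this slide follows directly from one unicast stream per viewer. The short sketch below reproduces that arithmetic and contrasts it with multicast, where each network link carries at most one copy of a stream regardless of audience size. The function names and the single-link example are illustrative assumptions.

```python
def unicast_gbps(viewers: int, stream_mbps: float) -> float:
    """Total bandwidth when every viewer receives a separate copy."""
    return viewers * stream_mbps / 1000


def multicast_gbps(links: int, stream_mbps: float) -> float:
    """Total bandwidth when each link carries a single shared copy."""
    return links * stream_mbps / 1000


if __name__ == "__main__":
    print(f"unicast:   {unicast_gbps(100_000, 2):.0f} Gbps for 100,000 viewers")
    print(f"multicast: {multicast_gbps(1, 2):.3f} Gbps on one shared link")
```

With unicast the load grows linearly with the audience; with multicast it depends on the number of links in the delivery tree rather than on how many viewers are attached to them.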

Page 41: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

Akamai Network

ISP Network

HTTP Multicast AMT

Origin

Akamai Edge Server Multicast Router AMT Router/Relay

Akamai’s Media Client Technology Initiative

Multicast

Page 42: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

Akamai’s Media Client Technology Initiative

Intelligent Prepositioning

Residential Gateway

HTTP/UDP Unicast Acceleration

Multicast

Intelligent Prepositioning

Intelligent Prepositioning

✔ Ready to Play Later

Content demand is predicted, and content is delivered in anticipation.

Page 43: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet

Akamai’s Media Client Technology Initiative

Peer Assisted Delivery

Residential Gateway

HTTP/UDP Unicast Acceleration

Multicast

Intelligent Prepositioning

Peer-Assisted Delivery

Peer Assist

Network intelligence can exclude peering traffic from data caps and billed usage.

Page 44: PLNOG 13: James Kretchmar: How Akamai scales to serve the largest events on the Internet