Scaling the Netflix API - From Atlassian Dev Den

Post on 15-Jan-2015

DESCRIPTION

In engineering, the term "scale" is often used to discuss systems and their ability to grow with the needs of their users. This is clearly an important aspect of scaling, but there are many other areas in which an engineering organization needs to scale to be successful in the long term. This presentation discusses some of those other areas and details how Netflix (and specifically the API team) addresses them.

TRANSCRIPT

Scaling the Netflix API

Daniel Jacobson

@daniel_jacobson

http://www.linkedin.com/in/danieljacobson

http://www.slideshare.net/danieljacobson

Please read the notes associated with each slide for the full context of the presentation

What do I mean by “scale”?

But There Are Many Ways to Scale!

Organization

Systems

Devices

Development

Testing

But first, some background…

Global Streaming Video for TV Shows and Movies

More than 36 Million Subscribers

More than 40 Countries

Netflix Accounts for 33% of Peak Internet Traffic in North America

Netflix subscribers are watching more than 1 billion hours a month

2007

Netflix REST API: One-Size-Fits-All (OSFA) Solution

Image courtesy of Jay Mac 3 on Flickr

Netflix API Requests by Audience at Launch in 2008

External Developers

Image courtesy of Jay Mac 3 on Flickr

Netflix API Requests by Audience from 2011

External Developers

Global Streaming Product

Three aspects of the Streaming Product:

• Discovery

• Sign-Up

• Streaming

Member Sign-Up

Discovery

Discovery

Today, Netflix API Supports Discovery and Sign-Up

But Soon, Will Support Streaming

Scaling…

Organization

Systems

Devices

Development

Testing

Distributed Architecture

1000+ Device Types

[Diagram: Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, Reviews, A/B Test Engine]

Dozens of Dependencies

[Diagram: API with Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, Reviews, A/B Test Engine]

http://www.slideshare.net/reed2001/culture-1798664

Scaling…

Organization

Systems

Devices

Development

Testing

System Resiliency

Distributed Architecture

Dependency Relationships

2,000,000,000 Requests Per Day to the Netflix API

30 Distinct Dependent Services for the Netflix API

14,000,000,000 Netflix API Calls Per Day to those Dependent Services

0 Dependent Services with 100% SLA

(99.99%)^30 ≈ 99.7%

0.3% of 2B = 6M failures per day

2+ Hours of Downtime Per Month

(99.9%)^30 ≈ 97%

3% of 2B = 60M failures per day

20+ Hours of Downtime Per Month
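The arithmetic on these slides can be reproduced with a short Python sketch. It assumes that a request succeeds only if all 30 dependencies succeed, that failures are independent, and that a month has 30 days; the function names are illustrative and do not come from the presentation.

```python
# Compound availability of a request that touches every dependency.
# Figures (30 services, 2B requests/day) are taken from the slides;
# the 30-day month and the independence of failures are assumptions.

HOURS_PER_MONTH = 30 * 24


def compound_availability(per_service: float, services: int) -> float:
    """Availability when all `services` must succeed independently."""
    return per_service ** services


def daily_failures(availability: float, requests_per_day: int) -> float:
    """Expected number of failed requests per day."""
    return (1.0 - availability) * requests_per_day


def monthly_downtime_hours(availability: float) -> float:
    """Equivalent hours of downtime in one month."""
    return (1.0 - availability) * HOURS_PER_MONTH


if __name__ == "__main__":
    for sla in (0.9999, 0.999):
        overall = compound_availability(sla, 30)
        failures_m = daily_failures(overall, 2_000_000_000) / 1e6
        downtime = monthly_downtime_hours(overall)
        print(f"{sla:.2%} per service: {overall:.2%} overall, "
              f"{failures_m:.0f}M failures/day, {downtime:.1f} h/month")
```

Losing one "nine" per service multiplies the overall failure rate roughly tenfold, which is why the slides treat dependency failure as something to be tolerated rather than prevented.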

[Diagram: API with Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, Reviews, A/B Test Engine]

Circuit Breaker Dashboard

Call Volume and Health / Last 10 Seconds

Call Volume / Last 2 Minutes

Successful Requests

Successful, But Slower Than Expected

Short-Circuited Requests, Delivering Fallbacks

Timeouts, Delivering Fallbacks

Thread Pool & Task Queue Full, Delivering Fallbacks

Exceptions, Delivering Fallbacks

Error Rate: (# + # + # + #) / (# + # + # + # + #)

Status of Fallback Circuit

Requests per Second, Over Last 10 Seconds

SLA Information
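The dashboard legend describes calls that either succeed or are answered by a fallback. A minimal Python sketch of that pattern follows. It is not the library behind the dashboard: the class, its thresholds, and its counters are hypothetical, it opens on consecutive failures rather than on a rolling error rate, and it leaves out timeouts and thread-pool rejection.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Toy circuit breaker (hypothetical, for illustration only).

    Opens after `failure_threshold` consecutive failures, then short-circuits
    calls to the fallback until `reset_after_s` seconds have passed.
    """

    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.consecutive_failures = 0
        self.opened_at: Optional[float] = None
        self.counts = {"success": 0, "short_circuited": 0, "exception": 0}

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        # After the cool-down, let one trial call through (half-open state).
        return self.clock() - self.opened_at < self.reset_after_s

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open():
            self.counts["short_circuited"] += 1
            return fallback()
        try:
            result = primary()
        except Exception:
            self.counts["exception"] += 1
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            return fallback()
        self.counts["success"] += 1
        self.consecutive_failures = 0
        self.opened_at = None  # a success closes the circuit
        return result

    def error_rate(self) -> float:
        """Share of calls that were answered by the fallback."""
        total = sum(self.counts.values())
        return (total - self.counts["success"]) / total if total else 0.0
```

A caller would wrap each dependency in its own breaker, for example `ratings_breaker.call(fetch_ratings, lambda: [])`, where `fetch_ratings` stands for whatever remote call the service makes, so that a failing dependency degrades one part of the response instead of the whole request.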

[Diagram: API with Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, Reviews, A/B Test Engine; Fallback]

System Infrastructure

AWS Cloud

Autoscaling

Autoscaling

Forced Failure

Global System

More than 36 Million Subscribers

More than 40 Countries

Zuul

Gatekeeper for the Netflix Streaming Application

Zuul

• Multi-Region Resiliency

• Insights

• Stress Testing

• Canary Testing

• Dynamic Routing

• Load Shedding

• Security

• Static Response Handling

• Authentication

Isthmus

Scaling…

Organization

Systems

Devices

Development

Testing

Screen Real Estate

Controller

Technical Capabilities

One-Size-Fits-All API

[Diagram: many separate Requests]

Scaling…

Organization

Systems

Devices

Development

Testing

Courtesy of South Florida Classical Review

Resource-Based API

vs.

Experience-Based API

Resource-Based Requests

• /users/<id>/ratings/title

• /users/<id>/queues

• /users/<id>/queues/instant

• /users/<id>/recommendations

• /catalog/titles/movie

• /catalog/titles/series

• /catalog/people

[Diagram: REST API (OSFA API) between two Network Borders, with RECOMMENDATIONS, MOVIE DATA, SIMILAR MOVIES, AUTH, MEMBER DATA, A/B TESTS, START-UP, RATINGS]

SERVER CODE

CLIENT CODE

DATA GATHERING, FORMATTING, AND DELIVERY

USER INTERFACE RENDERING

Experience-Based Requests

• /ps3/homescreen

[Diagram: JAVA API and Groovy Layer between two Network Borders, with RECOMMENDATIONS, MOVIE DATA, SIMILAR MOVIES, AUTH, MEMBER DATA, A/B TESTS, START-UP, RATINGS]

SERVER CODE

CLIENT CODE

CLIENT ADAPTER CODE (WRITTEN BY CLIENT TEAMS, DYNAMICALLY UPLOADED TO SERVER)

DATA GATHERING

DATA FORMATTING AND DELIVERY

USER INTERFACE RENDERING
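To make the experience-based idea concrete, here is a minimal sketch of what an adapter behind /ps3/homescreen might look like. The slides describe these adapters as Groovy code uploaded by client teams; Python is used here only for illustration, and every function name, field, and return value below is a hypothetical stand-in rather than part of the Netflix API.

```python
from typing import Any, Dict, List


# Stand-ins for backend services; real ones would be remote calls.
def recommendations(user_id: int) -> List[int]:
    return [101, 102]


def movie_data(movie_id: int) -> Dict[str, Any]:
    return {"id": movie_id, "title": f"Title {movie_id}",
            "synopsis": "...", "cast": []}


def ratings(user_id: int, movie_id: int) -> int:
    return 4


def ps3_homescreen(user_id: int) -> Dict[str, Any]:
    """Hypothetical adapter for one device experience.

    Data gathering happens on the server side of the network border;
    the device receives a single payload with only the fields it renders.
    """
    rows = []
    for movie_id in recommendations(user_id):
        movie = movie_data(movie_id)
        rows.append({
            "id": movie["id"],
            "title": movie["title"],
            "rating": ratings(user_id, movie_id),
        })
    return {"device": "ps3", "rows": rows}
```

With a resource-based API, the device itself would issue the recommendations, title, and ratings requests and discard the fields it does not need. In this sketch the same fan-out happens next to the services, and only one request crosses the network.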

Scaling…

Organization

Systems

Devices

Development

Testing

Dependency Relationships

Testing Philosophy:

Act Fast, React Fast

That Doesn’t Mean We Don’t Test

• Unit tests

• Functional tests

• Regression scripts

• Continuous integration

• Capacity planning

• Load / Performance tests

Cloud-Based Deployment Techniques

Current Code In Production

API Requests from the Internet

Single Canary Instance To Test New Code with Production Traffic (around 1% or less of traffic)

Error!

Perfect!
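The canary step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the deployment tooling used at Netflix: the function names, the way the error rates are compared, and the tolerance value are all hypothetical.

```python
import random
from typing import Callable


def pick_target(canary_fraction: float = 0.01,
                rng: Callable[[], float] = random.random) -> str:
    """Send roughly `canary_fraction` of requests to the canary instance."""
    return "canary" if rng() < canary_fraction else "current"


def canary_verdict(canary_errors: int, canary_requests: int,
                   baseline_errors: int, baseline_requests: int,
                   tolerance: float = 0.001) -> str:
    """Compare error rates: "abort" on a regression, "promote" otherwise.

    Returns "wait" when either side has not served any traffic yet.
    The tolerance is an assumed value, not one taken from the slides.
    """
    if canary_requests == 0 or baseline_requests == 0:
        return "wait"
    canary_rate = canary_errors / canary_requests
    baseline_rate = baseline_errors / baseline_requests
    return "promote" if canary_rate <= baseline_rate + tolerance else "abort"
```

Because the canary sees only a small slice of production traffic, an "abort" verdict affects few requests, while a "promote" verdict is based on real traffic rather than synthetic tests.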

Current Code In Production

API Requests from the Internet

New Code Getting Prepared for Production

Error!

Perfect!

[Final slide: API Requests from the Internet, New Code]
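The sequence on these slides, where new code is prepared next to the current code and traffic moves only if the new code looks healthy, can be sketched as follows. The class and its health-check callback are hypothetical; the slides do not show how the switch is implemented.

```python
from typing import Callable, Optional


class Deployment:
    """Toy model of preparing new code alongside the code in production."""

    def __init__(self, current: str) -> None:
        self.serving = current                # receives API requests
        self.staged: Optional[str] = None     # new code being prepared
        self.previous: Optional[str] = None   # last code taken out of service

    def stage(self, new_code: str) -> None:
        """Bring up new code without sending it production traffic."""
        self.staged = new_code

    def finish(self, is_healthy: Callable[[str], bool]) -> str:
        """Switch traffic if the staged code is healthy, else discard it."""
        if self.staged is None:
            raise RuntimeError("nothing staged")
        if is_healthy(self.staged):
            self.previous, self.serving = self.serving, self.staged
        self.staged = None
        return self.serving
```

For example, `d = Deployment("v1"); d.stage("v2"); d.finish(check)` leaves "v1" serving if `check` reports a problem and switches to "v2" otherwise.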

https://www.github.com/Netflix


Help Wanted!
