API World 2013 - Transforming the Netflix API

Post on 27-Jan-2015

DESCRIPTION

The Netflix API has undergone a transformation since its inception in 2008. It has transitioned from being a public API with a generic RESTful interface to a platform for creating highly optimized, device-centric APIs that are critical to delivering the Netflix streaming experience on over 1000 different device types. This talk covers the design principles that shaped the transformation of the API as well as the technology that powers it, enabling rapid user experience iteration and bringing Netflix streaming to almost 38 million subscribers around the world.

TRANSCRIPT

Transforming the Netflix API

Ben Schmaus, Netflix

October 2013, API World

bschmaus@netflix.com || @schmaus

Streaming TV Shows & Movies Globally

> 1000 Devices

1/3 of Internet at peak

Exclusive Content

Almost 38 million subscribers in over 40 countries

Personalization Engine, User Info, Movie Metadata, Ratings, Similar Movies, Instant Queue, A/B Test Engine

API

API: Enable UX Innovation

Insulate from Failure

It wasn’t always this way

Foster Innovation from Outside

REST for Easy Interop, Integration

RESTful API

Model resources like users, movies, series, ratings, etc

/catalog/titles

/catalog/titles/movies

/catalog/titles/series

/catalog/titles/series/episodes

/users/lists

/users/title_states

/users/ratings

Enter Streaming Devices

Lots of devices, lots of variety across platforms

External apps

Requests per endpoint:

1 /apps/{app_id}/config
1 /users/{user_id}
1 /users/{user_id}/queues/instant/available
1 /users/{user_id}/lists
26 /catalog/titles?...

Doesn’t include JS, CSS, images, etc.

Device variance: interaction models & form factors

A/B Tests Add to the Challenge

“Bolt on” functionality

expand=@queue_item@title,@box_art_exists,@short_synopsis,@directors,@cast,@episodes,@formats,@format_availability,@subtitle_languages,@languages_and_audio,@maturity_ratings

Resource-based API wasn’t scaling

Model Resources: /catalog, /users

Model Experiences: /tv/home

Reduce network chattiness

Support device optimizations

Enable fast dev iterations

GET /users/{user_id}/lists

apiGateway.getLists(userId)

Discrete HTTP requests pay WAN tax repeatedly

Single, optimized request; pay WAN tax once

Client data assembly logic pushed to server
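
A rough way to see the WAN tax is a back-of-the-envelope latency model. The sketch below takes the 30 requests from the startup trace earlier in the deck and compares issuing them one by one over the WAN with issuing a single WAN request whose fan-out happens inside the data center. The 100 ms and 5 ms round-trip times and both function names are assumptions chosen for illustration, not figures from the talk, and the model is deliberately pessimistic in treating every call as sequential.

```python
# Back-of-the-envelope model of the "WAN tax".
# Only the request count (30) comes from the slides; the round-trip
# times below are assumed values for illustration.


def chatty_latency_ms(num_requests: int, wan_rtt_ms: float) -> float:
    """Client issues every request over the WAN, one after another."""
    return num_requests * wan_rtt_ms


def optimized_latency_ms(num_requests: int, wan_rtt_ms: float,
                         lan_rtt_ms: float) -> float:
    """Client pays one WAN round trip; the server makes the same number
    of calls inside the data center (sequentially, as a worst case)."""
    return wan_rtt_ms + num_requests * lan_rtt_ms


if __name__ == "__main__":
    print(chatty_latency_ms(30, 100.0))          # 3000.0
    print(optimized_latency_ms(30, 100.0, 5.0))  # 250.0
```

Real clients can issue some requests in parallel, so the gap in practice would likely be smaller, but the shape of the argument is the same: the expensive hop is paid once instead of per resource.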

Add server-side scripting capability

Enable independent dev and device optimization
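
To make "client data assembly logic pushed to server" concrete, here is a minimal sketch of what a device-specific endpoint such as /tv/home might look like when it calls an in-process API (the slides' apiGateway.getLists(userId)) instead of making HTTP requests. The talk's endpoints are Groovy scripts on a Java API; this Python version is only an analogy, and FakeApiGateway, get_title, tv_home, and the response shape are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class FakeApiGateway:
    """Hypothetical in-process stand-in for the Java API the slides
    refer to as apiGateway. Not the real Netflix interface."""
    lists_by_user: dict = field(default_factory=dict)
    titles: dict = field(default_factory=dict)

    def get_lists(self, user_id):
        return self.lists_by_user.get(user_id, [])

    def get_title(self, title_id):
        return self.titles[title_id]


def tv_home(gateway, user_id, max_titles_per_row=2):
    """Assemble only what a TV home screen needs, in one response.

    The loop that a client would otherwise run over the WAN
    (fetch lists, then fetch each title) runs next to the data.
    """
    rows = []
    for lst in gateway.get_lists(user_id):
        title_ids = lst["title_ids"][:max_titles_per_row]
        rows.append({
            "name": lst["name"],
            "titles": [gateway.get_title(t)["name"] for t in title_ids],
        })
    return {"rows": rows}


if __name__ == "__main__":
    gateway = FakeApiGateway(
        lists_by_user={"u1": [{"name": "Instant Queue",
                               "title_ids": [1, 2, 3]}]},
        titles={1: {"name": "A"}, 2: {"name": "B"}, 3: {"name": "C"}},
    )
    print(tv_home(gateway, "u1"))
```

Because the row length and the fields returned are decided in the endpoint, a different device could ship its own variant without changing the underlying API.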

[Architecture diagram, built up over several slides: a client Application calls /tv/home across the Internet to the server-side API]

Sides shown: Client, Internet, Server

Teams shown: UI Teams, API Team

Server components shown: ELB, Zuul, Scriptable Backend + API Layer, Java Service Layer (RxJava, Hystrix), Mid-tier Services

https://github.com/Netflix/Hystrix - resilience patterns for distributed systems

https://github.com/Netflix/RxJava - async, event-based programming

https://github.com/Netflix/zuul - dynamic request routing, realtime monitoring, resiliency
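
One of the resilience patterns in this space is the circuit breaker: after repeated failures, stop calling the dependency for a while and return a fallback instead. The sketch below is a deliberately simplified Python illustration of that idea only. It is not Hystrix's API or implementation, and the class name, thresholds, and injectable clock are assumptions made for the example.

```python
import time


class CircuitBreaker:
    """Simplified circuit breaker (illustrative; not the Hystrix API)."""

    def __init__(self, failure_threshold=3, reset_after_s=5.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock          # injectable so tests need not sleep
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                return fallback()   # open: skip the dependency entirely
            self.opened_at = None   # half-open: allow one trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0           # a success closes the circuit again
        return result
```

The fallback is what insulates the device experience from a failing mid-tier service: the caller still gets a response, just a degraded one.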

Need well-appointed toolkit

[Script deployment diagram, built up over several slides]

upload script.groovy for /script -> Cassandra

activate /script -> Cassandra

Cassandra -> Scriptable Backend: load /script

Device Client -> Scriptable Backend: GET /script
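
The upload, activate, load, GET sequence can be sketched as a small in-memory model. This is a hedged illustration only: ScriptStore stands in for the Cassandra-backed store, Python source and exec stand in for Groovy scripts and their compilation, and the version numbering and the handle(request) convention are assumptions that do not come from the talk.

```python
class ScriptStore:
    """In-memory stand-in for the script store (Cassandra in the slides)."""

    def __init__(self):
        self._versions = {}   # endpoint path -> list of script sources
        self._active = {}     # endpoint path -> index of active version

    def upload(self, path, source):
        """Store a new version; uploading alone does not change traffic."""
        self._versions.setdefault(path, []).append(source)
        return len(self._versions[path]) - 1

    def activate(self, path, version):
        if not 0 <= version < len(self._versions.get(path, [])):
            raise KeyError(f"no version {version} for {path}")
        self._active[path] = version

    def load_active(self, path):
        return self._versions[path][self._active[path]]


class ScriptableBackend:
    """Loads the active script for a path and serves requests with it."""

    def __init__(self, store):
        self.store = store
        self._handlers = {}

    def load(self, path):
        namespace = {}
        # exec is a stand-in for compiling a Groovy script; each script
        # is assumed to define a function named handle(request).
        exec(self.store.load_active(path), namespace)
        self._handlers[path] = namespace["handle"]

    def get(self, path, request):
        return self._handlers[path](request)
```

Separating upload from activation means a new script version can sit in the store without serving traffic, and switching versions (or switching back) is a metadata change rather than a server deployment.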

Deployment & Ops

Historical & realtime metrics, sort realtime by error/request rate

Distributed grep + tail

2013-05-09.20:38:54 MX 200 us-east-1c i-1824cb73 i-1c61b77f prod NFPS3-001-8G50FJCX... 288404769389848058 90ms api-global.netflix.com GET /tvui/release/470/plus/pathEvaluator -
amazon.ami-id: ami-502eb039
amazon.availability-zone: us-east-1c
amazon.instance-id: i-1824cb73
amazon.instance-type: m2.2xlarge
amazon.local-ipv4: 10.6.213.112
amazon.public-hostname: ec2-54-243-4-69.compute-1.amazonaws.com
amazon.public-ipv4: 54.243.4.69
cookie_esn: NFPS3-001-8G50FJCX...
country: MX
currentTime: 1368131934468
duration-millis: 90
esn: NFPS3-001-8G50FJCX...
geo.city: CIUDADOBREGON...

$ ./simple_stream.py -f -q 'e["country"]=="MX" && e["esn"]==~/NFPS3.*/' -r us
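
The query above reads like a predicate evaluated against each log event. As a sketch of what such a filter does, the Python below parses "key: value" lines into a dict and applies the same two conditions. It assumes the query uses Groovy-style syntax, where ==~ requires the whole string to match the regex, and it is not the implementation of simple_stream.py. The function names and the sample ESN in the usage note are made up for the example.

```python
import re


def parse_event(text):
    """Parse 'key: value' lines into a dict; other lines are skipped."""
    event = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            event[key.strip()] = value.strip()
    return event


def matches(event):
    """Equivalent of: e["country"]=="MX" && e["esn"]==~/NFPS3.*/"""
    return (event.get("country") == "MX"
            and re.fullmatch(r"NFPS3.*", event.get("esn", "")) is not None)


def grep(events):
    """Keep only the events the predicate accepts (the smaller haystack)."""
    return [e for e in events if matches(e)]
```

For instance, parse_event("country: MX\nesn: NFPS3-001-EXAMPLE") yields an event that matches, while the same ESN with country US does not.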

Go for haystack handing you the needle

Or at least be able to make smaller haystacks

Run 1% of your traffic on the new code and see how it does

API ami-123 API ami-456

2xx, 4xx, 5xx, latency, busy threads, load, ...

Confidence score for each AMI based on comparison of 1000+ metrics
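
The talk does not say how the confidence score is computed. One simple possibility, shown purely as an assumption, is the percentage of metrics on which the canary AMI stays within a tolerance of the baseline AMI. The function name, the 10% tolerance, and the sample numbers are all illustrative.

```python
def canary_score(baseline, canary, tolerance=0.10):
    """Percentage of shared metrics where the canary's value is within
    `tolerance` (relative) of the baseline's value.

    An assumed scoring rule for illustration; the real system compares
    1000+ metrics and may weight them differently.
    """
    shared = sorted(set(baseline) & set(canary))
    if not shared:
        raise ValueError("no metrics in common")
    passed = 0
    for name in shared:
        base, cand = baseline[name], canary[name]
        if base == 0:
            ok = cand == 0
        else:
            ok = abs(cand - base) / abs(base) <= tolerance
        passed += ok
    return 100.0 * passed / len(shared)


if __name__ == "__main__":
    baseline = {"2xx": 1000, "5xx": 10, "latency_ms": 90, "busy_threads": 40}
    canary = {"2xx": 990, "5xx": 30, "latency_ms": 95, "busy_threads": 41}
    print(canary_score(baseline, canary))  # 75.0 (5xx tripled, so it fails)
```

A single number like this is what makes the 1% canary decision scannable: a low score points at the metrics to inspect before pushing further.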

Scannable visualization of metric space

More important

Less important

Your basic red/black push

Doing red/black by hand for multiple clusters across multiple regions is not fun

Automate multi-cluster/region pushes

Don't forget to automate rollbacks, too!
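
As a sketch of what automating a multi-cluster push with rollback could involve, the function below walks a set of clusters, switches each to the new AMI, and restores every cluster already touched if a health check fails. It models only the bookkeeping: the cluster names, the is_healthy callback, and the dict representation are assumptions, and a real red/black push would manage server groups and traffic through the deployment tooling.

```python
def red_black_push(clusters, new_ami, is_healthy):
    """Push new_ami to every cluster, rolling all of them back on failure.

    clusters:   dict of cluster name -> AMI currently taking traffic
                (mutated in place).
    is_healthy: callback (cluster_name, ami) -> bool, assumed to wrap
                whatever health signal the deployment uses.
    Returns True if every cluster ends up on new_ami, else False.
    """
    previous = {}
    for name in list(clusters):
        previous[name] = clusters[name]
        clusters[name] = new_ami            # shift traffic to the new group
        if not is_healthy(name, new_ami):
            # The old group is still around in red/black, so rollback
            # is just shifting traffic back, for every cluster touched.
            for done, old_ami in previous.items():
                clusters[done] = old_ami
            return False
    return True
```

Keeping the previous AMI per cluster is what makes the rollback as mechanical as the push itself.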

$Who, $What, $Where, $When

e.g., "bschmaus, ami-123, Sandbox Canary, 2013-05-06 19:05"
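
A change record in this who/what/where/when shape is easy to generate automatically at push time. The helper below is a minimal sketch that reproduces the format of the example above; the function name and the use of a datetime for the timestamp are assumptions.

```python
from datetime import datetime


def change_event(who: str, what: str, where: str, when: datetime) -> str:
    """Format a change as "$Who, $What, $Where, $When"."""
    return f"{who}, {what}, {where}, {when:%Y-%m-%d %H:%M}"


if __name__ == "__main__":
    print(change_event("bschmaus", "ami-123", "Sandbox Canary",
                       datetime(2013, 5, 6, 19, 5)))
    # bschmaus, ami-123, Sandbox Canary, 2013-05-06 19:05
```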

Latest prod change in chat topic

Quickly see status of all clusters in a region

API for APIs

Control network interactions

Device optimization

Development agility

Automation & insights

Continuously experiment to make hard things easier

bschmaus@netflix.com || @schmaus

Great Engineers Wanted!

jobs.netflix.com
