
Page 1: Netflix Container Scheduling and Execution - QCon New York 2016

Scheduling a Fuller House: Container Management

Sharma Podila, Andrew Spyker - Senior Software Engineers

Page 2: Netflix Container Scheduling and Execution - QCon New York 2016

About Netflix

● 81.5M members
● 2,000+ employees (1,400 tech)
● 190+ countries
● > 100M hours watched per day
● > ⅓ of North American internet download traffic
● 500+ microservices
● Many tens of thousands of VMs
● 3 regions across the world

Page 3: Netflix Container Scheduling and Execution - QCon New York 2016

Agenda

● Why containers at Netflix?

● What did we build and what did we learn?

● What are our current and future workloads?


Page 4: Netflix Container Scheduling and Execution - QCon New York 2016

Why a 2nd edition of virtualization?

● Given our resilient, cloud-native, CI/CD DevOps-enabled, elastically scalable, virtual machine based architecture, did we really need containers?

Page 5: Netflix Container Scheduling and Execution - QCon New York 2016

Motivating factors for containers

● Simpler management of compute resources

● Simpler deployment packaging artifacts for compute jobs

● Need for a consistent local developer environment


Page 6: Netflix Container Scheduling and Execution - QCon New York 2016

Simpler compute management & packaging

Batch/stream processing jobs

● Here are the files to run my process
● I need m cores, n disk, and o memory
● Please just run it for me!

Service style jobs (VMs)

● Use tested/secure base AMI
● Bake an AMI
● Define launch config
● Choose t-shirt sized instance
● Canary & red/black ASGs
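To make the batch contract above concrete, here is a minimal sketch of what such a declarative job submission could look like. The field names are illustrative assumptions, not the actual Titus job schema.

```python
import json

# A hypothetical batch job declaration: the files/process to run, plus
# resource asks. Field names are illustrative, not the Titus job schema.
job = {
    "name": "nightly-report",
    "image": "registry.example.com/reports:1.4",  # "here are the files"
    "resources": {
        "cpus": 4,          # "I need m cores..."
        "diskMB": 10_240,   # "...n disk..."
        "memoryMB": 8_192,  # "...and o memory"
    },
    "entryPoint": ["python", "run_report.py"],
}

# "Please just run it for me!" - submit this and let the scheduler place it.
print(json.dumps(job, indent=2))
```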

Page 7: Netflix Container Scheduling and Execution - QCon New York 2016

Consistent developer experience

● Many years focused on
  ○ Build, bake / cloud deploy / operational experience
  ○ Not as much time focused on developer experience

● New Netflix local developer experience based on Docker

● Has had a benefit in both directions
  ○ Cloud-like local development environment
  ○ Easier operational debugging of cloud workloads


Page 8: Netflix Container Scheduling and Execution - QCon New York 2016

What about resource optimization?

● Not absolutely required - it is easier to get optimization wins at larger scale across the larger virtual machine fleet

● However, there are potential benefits to
  ○ An elastic resource pool for scaling batch & ad hoc jobs
  ○ Reliable smaller instance sizes for NodeJS
  ○ Cross-Netflix resource optimizations
    ■ Trough usage, instance type migration

Page 9: Netflix Container Scheduling and Execution - QCon New York 2016

Agenda

● Why containers at Netflix?

● What did we build and what did we learn?

● What are our current and future workloads?


Page 10: Netflix Container Scheduling and Execution - QCon New York 2016

Lesson: Support containers by leveraging the existing Netflix IaaS-focused cloud platform

[Diagram: two stacks side by side. "Existing - VMs": the app plus cloud platform (metrics, IPC, health) runs in VMs managed by the AWS AutoScaler on EC2/VPC, integrated with Atlas, Eureka, and Edda. "Titus - Containers": the same app plus cloud platform runs in service and batch containers managed by Titus Job Control on EC2/VPC, with the same Atlas, Eureka, and Edda integrations.]

Page 11: Netflix Container Scheduling and Execution - QCon New York 2016

Why - Single consistent cloud platform

[Diagram: Netflix cloud infrastructure (VMs + containers). The VM stack (AWS AutoScaler) and the Titus stack (Titus Job Control, service and batch containers) run the same app and cloud platform (metrics, IPC, health) on EC2/VPC and share Atlas, Eureka, and Edda.]

Page 12: Netflix Container Scheduling and Execution - QCon New York 2016

Lesson: Buy vs. build - why build our own?

● Looking across other container management solutions
  ○ Mesos, Kubernetes, and Swarm
● Proven solutions are focused on the datacenter
● Newer solutions are
  ○ Working to abstract datacenter and cloud
  ○ Delivering more than a cluster manager
    ■ PaaS, service discovery, IPC
    ■ Continuous deployment
    ■ Metrics
  ○ Not yet at our level of scale
● Not appropriate for Netflix

Page 13: Netflix Container Scheduling and Execution - QCon New York 2016

“Project Titus” (Firehose peek)

[Architecture diagram: the Titus UI calls the Titus API (Rhea), which fronts the Titus Master - the job management & scheduler built on Fenzo - backed by Cassandra and Zookeeper. The master drives the Mesos Master and the EC2 Autoscaling API. Each Titus Agent runs a Mesos agent, the Titus executor, Docker, a metrics agent, a logging agent, ZFS, pod & VPC networking drivers, and an AWS container metadata proxy; images come from the Docker Registry and logs rotate to S3. CI/CD integrates on the build side, and everything runs on Amazon VMs.]

Page 14: Netflix Container Scheduling and Execution - QCon New York 2016

Is that all?


Page 15: Netflix Container Scheduling and Execution - QCon New York 2016

Container Execution

[The same Titus architecture diagram, here highlighting the container execution path: the Titus Agent with its Mesos agent, Titus executor, Docker, metrics and logging agents, ZFS, pod & VPC networking drivers, and AWS container metadata proxy.]

Page 16: Netflix Container Scheduling and Execution - QCon New York 2016

Lesson: What you lose with Docker on EC2

● Networking: VPC
● Security: security groups, IAM roles
● Context: instance metadata, user data / environment context
● Operational visibility: metrics, health checking
● Resource isolation: networking, local storage

All of this is needed in a multi-tenant environment.

Page 17: Netflix Container Scheduling and Execution - QCon New York 2016

Lesson: Making Containers Act Like VM’s

● Built: EC2 metadata proxy
  ○ Provides the overridden, scheduled IAM role and instance id
  ○ Proxies other values through
● Provided: environmental context
  ○ Titus-specific job and task info
  ○ ASG app, stack, sequence, and other EC2 standards
● Why? Now:
  ○ Service discovery registration works
  ○ Amazon service SDK based applications work
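A minimal sketch of the metadata proxy idea: answer the identity paths with per-container values and pass everything else through to the real EC2 endpoint. The override values and port here are illustrative assumptions, not the actual Titus proxy.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Hypothetical per-container overrides injected by the scheduler.
OVERRIDES = {
    "/latest/meta-data/instance-id": "titus-task-0123",
    "/latest/meta-data/iam/security-credentials/": "my-container-role",
}

REAL_METADATA = "http://169.254.169.254"  # the actual EC2 endpoint

class MetadataProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path in OVERRIDES:
            body = OVERRIDES[self.path].encode()              # overridden answer
        else:
            body = urlopen(REAL_METADATA + self.path).read()  # pass through
        self.send_response(200)
        self.end_headers()
        self.wfile.write(body)

# Container traffic to 169.254.169.254 would be NATed here (e.g. via iptables).
HTTPServer(("127.0.0.1", 9090), MetadataProxy).serve_forever()
```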

Page 18: Netflix Container Scheduling and Execution - QCon New York 2016

Lesson: Networking will continue to evolve

● Started with batch
  ○ Started with "bridge" with port mapping
  ○ Added "host" with port resource mapping (for performance)
  ○ Continue to use "bridge" without port mapping
● Service style apps added
  ○ Added "nfvpc" - a VPC IP per container via a libnetwork plugin
  ○ Removed "host" (no value over a VPC IP per container)
  ○ Changed "nfvpc" to be pod based with a custom executor (no plugin)
  ○ Added security groups to "nfvpc"

Page 19: Netflix Container Scheduling and Execution - QCon New York 2016

Plumbing VPC Networking into Docker

[Diagram: an EC2 VM (the Titus Agent) with multiple ENIs. eth0/eni0 carries the agent's own security group; eth1/eni1 (SecGrp=X) and eth2/eni2 (SecGrp=Y) carry task IPs 1-3. Task 0 needs no IP and sits on the default docker0 bridge. Tasks 1-3 each get a pod root network namespace with a veth pair and a per-task VPC IP, steered to the right ENI by Linux policy based routing. An IPTables NAT rule redirects 169.254.169.254 to the local EC2 metadata proxy.]
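A sketch of the plumbing the diagram implies, expressed as the iproute2/iptables calls a network driver might issue from Python. The interface names, table number, addresses, and proxy port are made-up examples, not the actual Titus driver.

```python
import subprocess

def sh(cmd: str) -> None:
    """Run a privileged network-setup command (sketch; no error handling)."""
    subprocess.run(cmd.split(), check=True)

# Example values only - a real driver would get these from the scheduler.
task_ip, eni_dev, table = "10.0.2.17", "eth1", "101"
veth_host = "veth-task1"

# Policy based routing: traffic sourced from the task's VPC IP leaves via
# the ENI that carries its security group.
sh(f"ip rule add from {task_ip} lookup {table}")          # pick table by source IP
sh(f"ip route add default dev {eni_dev} table {table}")   # default route via eni1

# Redirect the container's metadata calls to the local metadata proxy (9090).
sh("iptables -t nat -A PREROUTING -d 169.254.169.254 "
   f"-i {veth_host} -p tcp --dport 80 -j REDIRECT --to-port 9090")
```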

Page 20: Netflix Container Scheduling and Execution - QCon New York 2016

Lesson: Secure Multi-tenancy is Hard

Common to VMs, tiered security is needed
● Protect the reduced host IAM role; allow containers to have specific IAM roles
● Need to support the same security groups in container networking as in VMs

User namespacing
● Docker 1.10 introduced user namespaces
  ○ But they didn't work with shared networking namespaces
● Docker 1.11 fixed shared networking namespaces
  ○ But namespacing is per daemon, not per container as hoped
● Waiting on Linux; considering mass chmod / ZFS clones

Page 21: Netflix Container Scheduling and Execution - QCon New York 2016

Operational Visibility Evolution

● What is a "node"? - containers on VMs

● Are soft limits / bursting a good thing?
  ○ Yes - until percent utilization and outliers are considered

● System level metrics
  ○ Currently - hand coded cgroup scraping
  ○ Considering Intel Snap as a replacement

● Pollers - metrics, health, discovery
  ○ Created a common Edda "server group" view
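For flavor, "hand coded cgroup scraping" boils down to reading counter files under /sys/fs/cgroup and turning them into gauges. A minimal cgroup v1 sketch; the per-container cgroup name is a hypothetical example.

```python
from pathlib import Path

CG = Path("/sys/fs/cgroup")
container = "titus-task-0123"  # hypothetical per-container cgroup name

def read_int(path: Path) -> int:
    return int(path.read_text().strip())

# Standard cgroup v1 counters commonly scraped per container:
mem_bytes = read_int(CG / "memory" / container / "memory.usage_in_bytes")
cpu_ns    = read_int(CG / "cpuacct" / container / "cpuacct.usage")

print(f"{container}: mem={mem_bytes}B cpu={cpu_ns}ns")
```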

Page 22: Netflix Container Scheduling and Execution - QCon New York 2016

Future Execution Focus


● Better Isolation (agents, networking, block I/O, etc.)

● Exposing our implementation of "pods" to users

● Better resiliency (DNS dependencies reduced)

Page 23: Netflix Container Scheduling and Execution - QCon New York 2016

Job Management and Resource Scheduling

[The same Titus architecture diagram, here highlighting job management and resource scheduling: the Titus Master (job management & scheduler built on Fenzo), backed by Cassandra and Zookeeper, driving the Mesos Master and the EC2 Autoscaling API.]

Page 24: Netflix Container Scheduling and Execution - QCon New York 2016

Lesson: Complexity in scheduling

● Resilience
  ○ Balance across EC2 zones, and across instances within a zone

● Security
  ○ A two-level resource for ENIs

● Placement optimization
  ○ Resource affinity
  ○ Task locality
  ○ Bin packing (Auto Scaling)

Page 25: Netflix Container Scheduling and Execution - QCon New York 2016

Lesson: Keep resource scheduling extensible


Fenzo - Extensible Scheduling Library

Features:
● Heterogeneous resources & tasks
● Autoscaling of the Mesos cluster
  ○ Multiple instance types
● Plugin-based scheduling objectives
  ○ Bin packing, etc.
● Plugin-based constraint evaluators
  ○ Resource affinity, task locality, etc.
● Visibility into scheduling actions

https://github.com/Netflix/Fenzo
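Fenzo itself is a Java library; as a conceptual illustration of the plugin points above (not Fenzo's actual API), a scheduler can score eligible hosts with a pluggable fitness function and filter them with pluggable constraints:

```python
from typing import Callable, List, Optional

# Conceptual sketch of Fenzo-style plugins (not Fenzo's actual Java API).
Fitness = Callable[[dict, dict], float]    # (task, host) -> score, higher is better
Constraint = Callable[[dict, dict], bool]  # (task, host) -> is this host eligible?

def cpu_bin_packer(task: dict, host: dict) -> float:
    """Fitness plugin: prefer fuller hosts, so idle ones can be scaled down."""
    used = host["cpus_used"] + task["cpus"]
    return used / host["cpus_total"] if used <= host["cpus_total"] else -1.0

def zone_allowed(task: dict, host: dict) -> bool:
    """Constraint plugin: a toy stand-in for zone balancing / task locality."""
    return host["zone"] not in task.get("excluded_zones", ())

def schedule(task: dict, hosts: List[dict], fitness: Fitness,
             constraints: List[Constraint]) -> Optional[dict]:
    eligible = [h for h in hosts if all(c(task, h) for c in constraints)]
    scored = [(fitness(task, h), h) for h in eligible]
    scored = [(s, h) for s, h in scored if s >= 0]  # drop hosts that can't fit
    return max(scored, key=lambda sh: sh[0], default=(None, None))[1]

hosts = [{"zone": "1a", "cpus_used": 6, "cpus_total": 8},
         {"zone": "1b", "cpus_used": 0, "cpus_total": 8}]
print(schedule({"cpus": 2}, hosts, cpu_bin_packer, [zone_allowed]))  # fuller 1a host
```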

Page 26: Netflix Container Scheduling and Execution - QCon New York 2016

Cluster Autoscaling Challenge

[Diagram: for long running stateful services, two placements of the same tasks are compared - spread thinly across Hosts 1-4 vs. bin packed onto Hosts 1-2, leaving Hosts 3-4 empty and safe to terminate when scaling the cluster down.]

Page 27: Netflix Container Scheduling and Execution - QCon New York 2016

Resources assigned in Titus


● CPU, memory, disk capacity

● Per-container AWS EC2 security groups, IPs, and network bandwidth via a custom driver

● Abstracting out EC2 instance types

Page 28: Netflix Container Scheduling and Execution - QCon New York 2016

Security groups and their resources

A two-level resource per EC2 instance: N ENIs, each with M IPs

ENI 0 | assigned security group: SG1 | used IPs: 2 of 7
ENI 1 | assigned security groups: SG1, SG2 | used IPs: 1 of 7
ENI 2 | assigned security group: SG3 | used IPs: 7 of 7
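A sketch of how a scheduler might consume this two-level resource: first find an ENI already carrying the task's security groups (or claim an unassigned one), then take a free IP from it. The types and policy here are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class Eni:
    sec_groups: Set[str] = field(default_factory=set)  # assigned once, then fixed
    ips_used: int = 0
    ips_max: int = 7  # M IPs per ENI on this instance type

def assign_ip(enis: List[Eni], wanted: Set[str]) -> Optional[Eni]:
    """Two-level allocation: match security groups first, then grab an IP."""
    # Prefer an ENI already carrying exactly the wanted security groups.
    for eni in enis:
        if eni.sec_groups == wanted and eni.ips_used < eni.ips_max:
            eni.ips_used += 1
            return eni
    # Otherwise claim an unassigned ENI for this security-group set.
    for eni in enis:
        if not eni.sec_groups and eni.ips_used == 0:
            eni.sec_groups = set(wanted)
            eni.ips_used = 1
            return eni
    return None  # no capacity: the task must wait or land elsewhere

enis = [Eni({"SG1"}, 2), Eni({"SG1", "SG2"}, 1), Eni({"SG3"}, 7)]
print(assign_ip(enis, {"SG3"}))  # ENI 2 is full and no ENI is free -> None
```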

Page 29: Netflix Container Scheduling and Execution - QCon New York 2016

Lesson: Scheduling Vs. Job Management


Scheduling resources to tasks is common.

Lifecycle management is not.

Page 30: Netflix Container Scheduling and Execution - QCon New York 2016

Lesson: Scheduling Vs. Job Management


Task scheduling concerns

● Assign resources to tasks
● Cluster wide optimizations
  ○ Bin packing
  ○ Global constraints, like SLAs
● Task preferences and constraints
  ○ Locality with other tasks
  ○ Resource affinity

Job manager concerns

● Managing task/instance counts
● Creating metadata, defining constraints
● Lifecycle management
  ○ Replace failed task executions
● Handling failures
  ○ Rate limit requeuing & relaunching
  ○ Time out tasks in transitionary states
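Purely as an illustration of that split (not Titus's classes): the job manager owns desired counts and failure handling, and delegates placement to a scheduler that only matches tasks to resources.

```python
import time

class Scheduler:
    """Knows only how to match a task to a host (see the earlier sketch)."""
    def place(self, task, hosts):
        return next((h for h in hosts if h["free_cpus"] >= task["cpus"]), None)

class JobManager:
    """Owns lifecycle: desired counts, replacing failures, rate limiting."""
    def __init__(self, scheduler, desired, task):
        self.scheduler, self.desired, self.task = scheduler, desired, task
        self.running = []

    def reconcile(self, hosts):
        while len(self.running) < self.desired:  # replace failed/missing tasks
            host = self.scheduler.place(self.task, hosts)
            if host is None:
                time.sleep(1)  # rate limit relaunch attempts
                break
            host["free_cpus"] -= self.task["cpus"]
            self.running.append(host)

mgr = JobManager(Scheduler(), desired=2, task={"cpus": 2})
mgr.reconcile([{"free_cpus": 4}, {"free_cpus": 1}])
print(len(mgr.running))  # 2: both replicas landed on the 4-CPU host
```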

Page 31: Netflix Container Scheduling and Execution - QCon New York 2016

Future Job Management & Scheduling Focus

● More resources to track: GPUs

● Automatic resource affinity with heterogeneous instances

● SLAs
  ○ Latencies for services
  ○ Throughput for batch
  ○ Task preemptions

Page 32: Netflix Container Scheduling and Execution - QCon New York 2016

Things we didn’t cover in this talk

● Overall integration
  ○ Chaos, continuous delivery, performance insight

● Container execution
  ○ Logging (live log access & S3 log rotation)
  ○ Liveness and health checking
  ○ Isolation (disk usage, networking, block I/O)
  ○ Image registry (metrics, security scanning)

● Scheduling
  ○ Autoscaling heterogeneous pools
  ○ Host-task fitness criteria

● API
  ○ Extensibility, polymorphism, SLAs, and job/container ownership

Page 33: Netflix Container Scheduling and Execution - QCon New York 2016

Agenda

● Why containers at Netflix?

● What did we build and what did we learn?

● What are our current and future workloads?


Page 34: Netflix Container Scheduling and Execution - QCon New York 2016

Current Titus Production Usage

● Autoscaling
  ○ 100's of r3.8xls
  ○ Each with 32 vCPUs and 244 GB of memory

● Peak
  ○ Thousands of cores
  ○ Tens of TBs of memory

● Thousands of containers/day
  ○ ~100 different images
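As a rough sanity check on those peak figures (the exact fleet size is an assumption - the slide says only "100's"): 300 r3.8xls × 32 vCPUs = 9,600 cores, and 300 × 244 GB ≈ 73 TB of memory, consistent with "thousands of cores" and "tens of TBs of memory".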

Page 35: Netflix Container Scheduling and Execution - QCon New York 2016

Workloads, Past

● Most current usage is batch
  ○ Algorithm training, ad hoc reporting jobs

● A sampling:
  ○ Training of "sims" and A/B test models
  ○ Open Connect device/IX reporting
  ○ Web security scanning and analysis
  ○ Social media analytics updates

Page 36: Netflix Container Scheduling and Execution - QCon New York 2016

Workloads, Now

● Spent the last five months adding service style support

● First "line of fire" customer requests already received

● Larger scale shadow and trickle traffic throughout 2Q

● First service style apps
  ○ Finer grained instances - NodeJS
  ○ Docker provided local developer experience

Page 37: Netflix Container Scheduling and Execution - QCon New York 2016

Workloads, Coming

● Media encoding
  ○ Thousands of VMs, with VM based resource scheduling
  ○ Considering containers for faster start-up
  ○ Internal spot market - trough borrowing

● SPaaS - Stream Processing as a Service
  ○ 10's of thousands of containers
  ○ Converting its scheduling systems to Titus

Page 38: Netflix Container Scheduling and Execution - QCon New York 2016

Questions?


Page 39: Netflix Container Scheduling and Execution - QCon New York 2016

Other Netflix QCon Talks

Title | Time | Speaker(s)

The Netflix API Platform for Server-Side Scripting | Monday 10:35 | Katharina Probst
Scheduling A Fuller House: Container Mgmt @ Netflix | Tuesday 10:35 | Andrew Spyker & Sharma Podila
Chaos Kong - Endowing Netflix with Antifragility | Tuesday 11:50 | Luke Kosewski
The Evolution of the JavaScript | Wednesday 4:10 | Jafar Husain
Async Programming in JS: The End of the Loop | Friday 9:00 | Jafar Husain