ILM - Pipeline in the Cloud

Posted on 14-Jan-2017

TRANSCRIPT

Who are we?

Jim Vanns
Aaron Carey

Production Engineers at ILM London

VFX Pipeline in the Cloud

Experiments with Mesos and Docker

Nomenclature, glossary and other big words
★ VFX: Visual Effects
★ Pipeline: Data -> Process -> Data, repeat!
★ Show: Film
★ Sequence: A thematically linked series of (continuous) scenes!
★ Shot: An uninterrupted portion of the sequence
★ Frame: A single image in time
★ Asset: A character or building etc.

What is a VFX pipeline?


Film Scan

Roto

3D

FX

Comp

Lighting


What VFX isn't
★ Rendering and sims are our ‘Big Data’
★ We’re not crunching analytics in real-time
★ Rendering != MapReduce
★ Apps run on hardware, not in a browser
★ We’re not here to re-write a renderer (not yet...)

Where does the cloud meet VFX?

What’s in it for us?
★ Reducing capital expenditure
★ Potentially reducing overheads
★ Flexibility
★ Giving power back to developers

VFX Studio Infrastructure
★ Render Farm
★ Database
★ Storage
★ Workstations

Render Farm

First, what is rendering!?
★ Take a virtual 3D representation of a scene
  ○ 3D models
  ○ Textures
  ○ Light sources
  ○ Static backgrounds (plates)
★ Place a virtual camera in the scene
★ Compute the 2D image that the camera will see
★ Repeat the process for each frame
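Rendering is embarrassingly parallel: every frame is an independent job, which is why a render farm (and the cloud) fits it so naturally. A toy sketch only of the per-frame fan-out, where render_frame is a hypothetical stand-in for the real renderer or farm-submission tool:

# Toy illustration: one independent job per frame of the shot.
for frame in $(seq 1001 1240); do
    render_frame --scene shot010 --frame "$frame" --output "renders/shot010.${frame}.exr"
done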

Rendering in the cloud
★ Low hanging fruit
★ Already happening
★ Typical farm: 30-50k procs
★ Managed by specialist software (Tractor/Deadline/in-house etc.)
★ VFX has been doing clustered computing for decades

What’s next?

Mesos
★ Open source framework for scheduling
★ Already used at massive scale
★ NOT a job scheduler
★ We can concentrate on the scheduling logic
★ Support for task isolation/containment (e.g. Docker)

Automating our Mesos cluster with Docker and Ansible
★ Goals: Quick - Easy - Repeatable
★ Didn’t want to spend time fighting our config manager (or each other)
★ Be able to deploy a virtual studio from scratch in under an hour (including provisioning, building software, deploying, configuration)
★ Run multiple versions of the infrastructure at the same time (in the same availability zone/network)
★ If something is typed in the terminal, we want to automate and version it

Docker + Ansible was the answer
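As a rough illustration of what that looks like day to day (the playbook name and tags below are invented; only the consul_domain variable comes from the examples that follow), a whole environment can be stood up, or a slice of it re-run, with a single ansible-playbook invocation:

$ ansible-playbook -i inventory/gce.py site.yml \
      --tags "zookeeper,mesos,marathon" \
      -e "consul_domain=dev1"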

Automating our Mesos cluster with Ansible
★ Heavily using tags and variables in Ansible
★ Cloud agnostic: some modification of GCE inventory and launch modules
★ Example: creating a multi-host dynamic Zookeeper configuration

- name: Append the zookeeper server entries
  lineinfile: dest=/etc/zookeeper/conf/zoo.cfg insertafter=EOF line="server.{{hostvars[item]['zkid']}}={{hostvars[item]['ansible_eth0']['ipv4']['address']}}:2888:3888"
  with_items: "{{ groups['tag_zookeeper_server_' + consul_domain] }}"
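For a three-node ensemble, that task leaves entries like the following in zoo.cfg (the IPs and zkid values here are made up for illustration):

$ tail -3 /etc/zookeeper/conf/zoo.cfg
server.1=10.240.0.11:2888:3888
server.2=10.240.0.12:2888:3888
server.3=10.240.0.13:2888:3888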

Service Discovery in Mesos
★ No control over where a service or render runs
★ Services may move hosts
★ Can’t guarantee hosts will have same IP
★ Options:
  ○ Mesos-DNS
  ○ Homegrown (etcd etc.)
  ○ Consul

Mesos and Consul
★ What is Consul?
★ Every host runs an agent
★ All DNS lookups on a host go to its agent
★ Consul servers outside the Mesos cluster
★ Mesos-Consul automates service registry
★ Can be used for services outside the cluster
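Because each host’s resolver points at its local agent, any registered service can be found with an ordinary DNS query. A quick sketch against the agent’s DNS interface (port 8600 by default; the service name here is hypothetical):

$ dig @127.0.0.1 -p 8600 some-service.service.consul SRV +short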

Example - Static service outside the cluster

$ ssh -i mykey.pem username@172.100.121.100

$ docker run -d -p 5000:5000 --restart=always \
    -e REGISTRY_STORAGE_S3_ACCESSKEY \
    -e REGISTRY_STORAGE_S3_SECRETKEY \
    -e REGISTRY_STORAGE_S3_REGION \
    -e REGISTRY_STORAGE=s3 \
    registry:2.1

$ curl -H "Content-Type: application/json" -X POST \
    -d '{ "Name": "docker-registry", "Tags": ["docker-registry", "v2"], "Port": 5000 }' \
    http://127.0.0.1:8500/v1/agent/service/register
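One way to confirm the registration took, using Consul’s standard catalog endpoint:

$ curl http://127.0.0.1:8500/v1/catalog/service/docker-registry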

Example - Static service outside the cluster

- name: Run docker registry container
  docker:
    name: docker-registry
    image: registry:2.1
    state: started
    ports:
      - "5000:5000"
    restart_policy: always
    env:
      REGISTRY_STORAGE_S3_ACCESSKEY:
      REGISTRY_STORAGE_S3_SECRETKEY:
      REGISTRY_STORAGE_S3_REGION:
      REGISTRY_STORAGE_S3_BUCKET:
      REGISTRY_STORAGE: s3

- name: Register registry with consul
  uri:
    url: http://127.0.0.1:8500/v1/agent/service/register
    method: PUT
    body: '{ "Name": "docker-registry", "Tags": [ "docker-registry", "v2" ], "Port": 5000 }'
    body_format: json

Example - Launching a service on marathon

- name: Submit maya container to marathon
  hosts: "tag_build_docker_{{ consul_domain }}"
  gather_facts: False
  tasks:
    - name: Submit maya job to marathon
      uri:
        url: http://marathon:8080/v2/apps
        method: POST
        status_code: 201,409
        body: '{ "args": [],
                 "container": {
                   "type": "DOCKER",
                   "docker": {
                     "network": "BRIDGE",
                     "portMappings": [ { "containerPort": 5901, "hostPort": 0, "protocol": "tcp" } ],
                     "image": "docker-registry:5000/studio-local-base/maya",
                     "forcePullImage": true,
                     "parameters": [
                       { "key": "env", "value": "DISPLAY" },
                       { "key": "device", "value": "/dev/dri/card0" },
                       { "key": "device", "value": "/dev/nvidia0" },
                       { "key": "device", "value": "/dev/nvidiactl" }
                     ]
                   },
                   "volumes": [ { "containerPath": "/tmp/.X11-unix/X0", "hostPath": "/tmp/.X11-unix/X0", "mode": "RW" } ]
                 },
                 "id": "maya",
                 "instances": 1,
                 "cpus": 4,
                 "mem": 8024,
                 "constraints": [ ["gfx", "CLUSTER", "gpu"] ] }'
        body_format: json

Studio Services

Studio Service Structure

Studio Service Deployment

Database

★ Sites (e.g. London, San Francisco, Singapore etc.)
★ Departments
★ Shows (film)
★ Sequences
★ Shots
★ Tasks
★ Assets
★ Data

Modelling studio relationships

Challenges
★ New technologies
  ○ Graph database
  ○ Query language/APIs
  ○ Distributed storage engine
★ Complexity (both in the data modelling and the system)
★ Adoption/Approval

Storage

Cloud Storage Pros and Cons
★ Managed
★ No more tape archives/backups

But...
★ Getting data into the cloud is expensive
★ Getting data into the cloud is slooow

Is there another way?

Work in Progress...
★ Applications need a POSIX filesystem interface
★ Can we cache cloud storage?
  ○ EFS
  ○ Avere
  ○ Homegrown
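As a very rough sketch of the EFS option (the filesystem ID, region and mount point below are placeholders), an EFS share is just an NFSv4 mount from the application’s point of view, which keeps the POSIX interface the apps expect:

$ sudo mount -t nfs4 -o nfsvers=4.1 \
      fs-12345678.efs.eu-west-1.amazonaws.com:/ /mnt/studio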

Can we create content entirely in the cloud?

Workstations

Can we create content entirely in the Cloud?
★ Applications require OpenGL
★ OpenGL requires hardware
★ Hardware needs drivers

Can we do this in Docker?

Dockerising OpenGL Applications
★ NVIDIA drivers must match the host version exactly
★ Driver inside the container must not install kernel module
★ Container requires access to GPU device and X Server
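A minimal sketch of satisfying the first two constraints at image-build time (the driver version 352.63 is only an example; it must match whatever the host is actually running):

# e.g. inside a Dockerfile RUN step: install the user-space driver libraries
# only, skipping the kernel module, which comes from the host.
sh NVIDIA-Linux-x86_64-352.63.run -s --no-kernel-module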

Running an OpenGL Docker application

docker run \
    -it \
    -v /tmp/.X11-unix:/tmp/.X11-unix:rw \
    --device=/dev/dri/card0 \
    --device=/dev/nvidia0 \
    --device=/dev/nvidiactl \
    -e DISPLAY
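The image argument was cut off on the slide; with the maya image from the Marathon example above standing in (any GL-capable image works), the complete command would be along the lines of:

$ docker run -it \
      -v /tmp/.X11-unix:/tmp/.X11-unix:rw \
      --device=/dev/dri/card0 \
      --device=/dev/nvidia0 \
      --device=/dev/nvidiactl \
      -e DISPLAY \
      docker-registry:5000/studio-local-base/maya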

Scheduling a VFX app on Mesos in the cloud
★ Must use custom Mesos resources/attributes to only schedule on GPU machines
★ Cloud machines have no monitor
★ Remote desktop apps will forward GL calls to the client machine
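A minimal sketch of the agent side of that (the ZooKeeper address is illustrative, but the "gfx:gpu" attribute is what the Marathon constraint ["gfx", "CLUSTER", "gpu"] above matches against):

$ mesos-slave --master=zk://zookeeper:2181/mesos \
      --containerizers=docker,mesos \
      --attributes="gfx:gpu"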

Using VirtualGL
★ Intercepts GLX calls on the host
★ Calls forwarded to 2nd (local) X Server
★ GPU computation is done on the GPU and output forwarded to the 2D (VNC) X Server
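In practice that means launching the application through VirtualGL’s wrapper from inside the VNC session; a minimal sketch, assuming the GPU-backed 3D X server runs on display :0 as configured on the next slide:

$ vglrun -d :0 maya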


3D X server setup

/etc/X11/xorg.conf

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GRID K520"
    BusID          "PCI:0:3:0"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth   24
    Option         "UseDisplayDevice" "None"
    SubSection "Display"
        Depth 24
    EndSubSection
EndSection

The BusID above (PCI:0:3:0) corresponds to the GRID K520 that lspci reports at 00:03.0:

$ lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 VGA compatible controller: NVIDIA Corporation GK104GL [GRID K520] (rev a1)
00:1f.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev 01)

Demo

We’re Hiring
