ILM - Pipeline in the Cloud
TRANSCRIPT
Who are we?
★ Jim Vanns
★ Aaron Carey
Production Engineers at ILM London
VFX Pipeline in the Cloud
Experiments with Mesos and Docker
Nomenclature, glossary and other big words
★ VFX: Visual Effects
★ Pipeline: Data -> Process -> Data, repeat!
★ Show: Film
★ Sequence: A thematically linked series of (continuous) scenes!
★ Shot: An uninterrupted portion of the sequence
★ Frame: A single image in time
★ Asset: A character or building etc.
What is a VFX pipeline?
★ Film Scan
★ Roto
★ 3D
★ FX
★ Comp
★ Lighting
What VFX isn’t
★ Rendering and Sims are our ‘Big Data’
★ We’re not crunching analytics in real-time
★ Rendering != MapReduce
★ Apps run on hardware, not in a browser
★ We’re not here to re-write a renderer (not yet...)
Where does the cloud meet VFX?
What’s in it for us?
★ Reducing Capital Expenditure
★ Potentially reducing overheads
★ Flexibility
★ Giving power back to developers
VFX Studio Infrastructure
★ Render Farm
★ Database
★ Storage
★ Workstations
Render Farm
First, what is rendering!?
★ Take a virtual 3D representation of a scene
  ○ 3D Models
  ○ Textures
  ○ Light sources
  ○ Static backgrounds (plates)
★ Place a virtual camera in the scene
★ Compute the 2D image that the camera will see
★ Repeat the process for each frame
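The core of that "compute the 2D image" step can be sketched as a pinhole-camera projection. This is a minimal illustration, not any ILM renderer's code; the function name and camera convention (at the origin, looking down -z) are assumptions for the example.

```python
# Minimal sketch of the heart of rendering: projecting a 3D scene point
# onto the 2D image plane of a pinhole camera at the origin looking
# down the -z axis. Illustrative only.

def project(point, focal_length=1.0):
    """Perspective-project a 3D point (x, y, z) to 2D image coordinates."""
    x, y, z = point
    if z >= 0:
        raise ValueError("point is behind the camera")
    # Similar triangles: image coordinates scale with focal_length / depth.
    scale = focal_length / -z
    return (x * scale, y * scale)

# A point 2 units in front of the camera, offset 1 unit right and up,
# lands halfway out on the image plane.
print(project((1.0, 1.0, -2.0)))  # (0.5, 0.5)
```

A real renderer repeats this (plus shading, lighting and sampling) for every pixel of every frame, which is why rendering dominates a studio's compute budget.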
Rendering in the cloud
★ Low hanging fruit
★ Already happening
★ Typical Farm 30-50k procs
★ Managed by specialist software (Tractor/Deadline/in-house etc)
★ VFX has been doing clustered computing for decades
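What makes rendering such easy clustered work is that frames are independent, so a farm manager can chunk a shot's frame range into tasks. A hedged sketch of that chunking (frame numbers and chunk size are invented, and real farm software like Tractor or Deadline does far more):

```python
# Sketch of frame-based farm parallelism: a shot's frame range is split
# into independent chunks, each rendered by a separate farm process.

def chunk_frames(start, end, chunk_size):
    """Split the inclusive frame range [start, end] into task chunks."""
    frames = list(range(start, end + 1))
    return [frames[i:i + chunk_size] for i in range(0, len(frames), chunk_size)]

tasks = chunk_frames(1001, 1010, 4)
print(tasks)
# [[1001, 1002, 1003, 1004], [1005, 1006, 1007, 1008], [1009, 1010]]
```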
What’s next?
Mesos
★ Open Source framework for scheduling
★ Already used at massive scale
★ NOT a job scheduler
★ We can concentrate on the scheduling logic
★ Support for task isolation/containment (eg Docker)
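"Concentrate on the scheduling logic" means Mesos hands your framework resource offers and your code decides which tasks fit where. The dicts below are a toy stand-in for that decision, not the real Mesos framework API (the 8024 MB figure echoes the Marathon example later in the deck):

```python
# Toy model of a Mesos framework's scheduling logic: Mesos sends
# resource offers; the framework accepts only offers that satisfy a
# task's requirements. Real offers/tasks are protobufs, not dicts.

def fits(offer, task):
    """Accept an offer only if it covers the task's cpu/mem needs."""
    return offer["cpus"] >= task["cpus"] and offer["mem"] >= task["mem"]

offers = [{"host": "node1", "cpus": 2, "mem": 4096},
          {"host": "node2", "cpus": 8, "mem": 16384}]
render_task = {"cpus": 4, "mem": 8024}

accepted = [o["host"] for o in offers if fits(o, render_task)]
print(accepted)  # ['node2']
```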
Automating our Mesos cluster with Docker and Ansible
★ Goals: Quick - Easy - Repeatable
★ Didn’t want to spend time fighting our config manager (or each other)
★ Be able to deploy a virtual studio from scratch in under an hour (including provisioning, building software, deploying, configuration)
★ Run multiple versions of the infrastructure at the same time (in the same availability zone/network)
★ If something is typed in the terminal, we want to automate and version it
Docker + Ansible was the answer
Automating our Mesos cluster with Ansible
★ Heavily using tags and variables in Ansible
★ Cloud agnostic: Some modification of GCE inventory and launch modules
★ Example: Creating a multi-host dynamic Zookeeper configuration

- name: Append the zookeeper server entries
  lineinfile: dest=/etc/zookeeper/conf/zoo.cfg insertafter=EOF line="server.{{hostvars[item]['zkid']}}={{hostvars[item]['ansible_eth0']['ipv4']['address']}}:2888:3888"
  with_items: "{{ groups['tag_zookeeper_server_' + consul_domain] }}"
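To make the templating concrete, here is what that task produces in zoo.cfg. The host IDs and addresses below are invented for illustration; the string template mirrors the `line=` expression in the playbook:

```python
# Illustration of the zoo.cfg lines the Ansible task above appends,
# one per host in the zookeeper server group. Hostvars are invented.

hostvars = {
    "zk-1": {"zkid": 1, "ip": "10.0.0.11"},
    "zk-2": {"zkid": 2, "ip": "10.0.0.12"},
    "zk-3": {"zkid": 3, "ip": "10.0.0.13"},
}

lines = ["server.{zkid}={ip}:2888:3888".format(**h) for h in hostvars.values()]
print("\n".join(lines))
# server.1=10.0.0.11:2888:3888
# server.2=10.0.0.12:2888:3888
# server.3=10.0.0.13:2888:3888
```

Ports 2888 and 3888 are ZooKeeper's peer and leader-election ports; each ensemble member needs an entry for every other member, which is why the playbook loops over the whole group.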
Service Discovery in Mesos
★ No control over where a service or render runs
★ Services may move hosts
★ Can’t guarantee hosts will have same IP
★ Options:
  ○ Mesos-DNS
  ○ Homegrown (etcd etc)
  ○ Consul
Mesos and Consul
★ What is Consul?
★ Every host runs an agent
★ All DNS lookups on a host go to its agent
★ Consul servers outside the Mesos cluster
★ Mesos-Consul automates service registry
★ Can be used for services outside the cluster
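The net effect is a registry keyed by service name, resolvable through DNS names of the form `<service>.service.consul`. A toy model of that behaviour (the service name matches the registry example on the next slide; addresses are invented, and real Consul of course does health checking, replication and DNS rather than a dict lookup):

```python
# Toy model of Consul-style service discovery: services register with
# the local agent, and "<name>.service.consul" lookups resolve to the
# registered instances wherever they currently run.

registry = {}

def register(name, address, port):
    registry.setdefault(name, []).append((address, port))

def resolve(dns_name):
    """Resolve a <service>.service.consul name to its instances."""
    suffix = ".service.consul"
    service = dns_name[:-len(suffix)] if dns_name.endswith(suffix) else dns_name
    return registry.get(service, [])

register("docker-registry", "10.0.0.5", 5000)
print(resolve("docker-registry.service.consul"))  # [('10.0.0.5', 5000)]
```

This is what Mesos-Consul automates for tasks inside the cluster: when Marathon moves a service to another host, the registration (and therefore the DNS answer) follows it.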
Example - Static service outside the cluster

$ ssh -i mykey.pem [email protected]
$ docker run -d -p 5000:5000 --restart=always \
    -e REGISTRY_STORAGE_S3_ACCESSKEY \
    -e REGISTRY_STORAGE_S3_SECRETKEY \
    -e REGISTRY_STORAGE_S3_REGION \
    -e REGISTRY_STORAGE=s3 \
    registry:2.1
$ curl -H "Content-Type: application/json" -X PUT \
    -d '{ "Name": "docker-registry", "Tags": ["docker-registry", "v2"], "Port": 5000 }' \
    http://127.0.0.1:8500/v1/agent/service/register
Example - Static service outside the cluster

- name: Run docker registry container
  docker:
    name: docker-registry
    image: registry:2.1
    state: started
    ports:
      - "5000:5000"
    restart_policy: always
    env:
      REGISTRY_STORAGE_S3_ACCESSKEY:
      REGISTRY_STORAGE_S3_SECRETKEY:
      REGISTRY_STORAGE_S3_REGION:
      REGISTRY_STORAGE_S3_BUCKET:
      REGISTRY_STORAGE: s3

- name: Register registry with consul
  uri:
    url: http://127.0.0.1:8500/v1/agent/service/register
    method: PUT
    body: '{ "Name": "docker-registry", "Tags": [ "docker-registry", "v2" ], "Port": 5000 }'
    body_format: json
Example - Launching a service on marathon

- name: Submit maya container to marathon
  hosts: "tag_build_docker_{{ consul_domain }}"
  gather_facts: False
  tasks:
    - name: Submit maya job to marathon
      uri:
        url: http://marathon:8080/v2/apps
        method: POST
        status_code: 201,409
        body: '{
          "args": [],
          "container": {
            "type": "DOCKER",
            "docker": {
              "network": "BRIDGE",
              "portMappings": [
                { "containerPort": 5901, "hostPort": 0, "protocol": "tcp" }
              ],
              "image": "docker-registry:5000/studio-local-base/maya",
              "forcePullImage": true,
              "parameters": [
                { "key": "env", "value": "DISPLAY" },
                { "key": "device", "value": "/dev/dri/card0" },
                { "key": "device", "value": "/dev/nvidia0" },
                { "key": "device", "value": "/dev/nvidiactl" }
              ]
            },
            "volumes": [
              { "containerPath": "/tmp/.X11-unix/X0", "hostPath": "/tmp/.X11-unix/X0", "mode": "RW" }
            ]
          },
          "id": "maya",
          "instances": 1,
          "cpus": 4,
          "mem": 8024,
          "constraints": [ ["gfx", "CLUSTER", "gpu"] ]
        }'
        body_format: json
Studio Services
Studio Service Structure
Studio Service Deployment
Database
★ Sites (eg. London, San Francisco, Singapore etc.)
★ Departments
★ Shows (film)
★ Sequences
★ Shots
★ Tasks
★ Assets
★ Data
Modelling studio relationships
Challenges
★ New technologies
  ○ Graph database
  ○ Query language/APIs
  ○ Distributed storage engine
★ Complexity (both in the data modelling and system)
★ Adoption/Approval
Storage
Cloud Storage Pros and Cons
★ Managed
★ No more tape archives/backups

But...
★ Getting data into the cloud is expensive
★ Getting data into the cloud is slooow
Is there another way?
Work in Progress...
★ Applications need a POSIX filesystem interface
★ Can we cache cloud storage?
  ○ EFS
  ○ Avere
  ○ Homegrown
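The caching idea, in its simplest form, is a read-through cache between the POSIX-speaking application and the slow object store, so repeated reads of the same asset only cross the wire once. A minimal sketch, with a dict standing in for the cloud bucket and an invented plate path:

```python
# Hedged sketch of read-through caching in front of cloud storage:
# a cache miss fetches from the (slow, expensive) object store; a hit
# is served locally. The "cloud" dict and path are illustrative only.

cloud = {"/show/seq/shot/plate.0001.exr": b"pixels..."}
cache = {}
cloud_reads = 0

def read(path):
    global cloud_reads
    if path not in cache:          # miss: one round-trip to the cloud
        cloud_reads += 1
        cache[path] = cloud[path]
    return cache[path]             # hit: served from the local cache

read("/show/seq/shot/plate.0001.exr")
read("/show/seq/shot/plate.0001.exr")
print(cloud_reads)  # 1 -- the second read never touched the cloud
```

Products like Avere (and a homegrown equivalent) are essentially this idea plus write-back, eviction and a POSIX/NFS front end.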
Can we create content entirely in the cloud?
Workstations
Can we create content entirely in the Cloud?
★ Applications require OpenGL
★ OpenGL requires hardware
★ Hardware needs drivers
Can we do this in Docker?
Dockerising OpenGL Applications
★ NVIDIA drivers must match the host version exactly
★ Driver inside the container must not install kernel module
★ Container requires access to GPU device and X Server
Running an OpenGL Docker application

docker run \
  -it \
  -v /tmp/.X11-unix:/tmp/.X11-unix:rw \
  --device=/dev/dri/card0 \
  --device=/dev/nvidia0 \
  --device=/dev/nvidiactl \
  -e DISPLAY \
  docker-registry:5000/studio-local-base/maya
Scheduling a VFX app on Mesos in the cloud
★ Must use custom Mesos resources/attributes to only schedule on GPU machines
★ Cloud machines have no monitor
★ Remote desktop apps will forward GL calls to the client machine
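Attribute-based placement works by starting GPU agents with an attribute (e.g. `gfx:gpu`) and letting a constraint such as the `["gfx", "CLUSTER", "gpu"]` in the Marathon example earlier filter offers. A toy sketch of that filter; the offer dicts and hostnames are invented:

```python
# Sketch of constraining placement with Mesos agent attributes: only
# offers from agents advertising gfx=gpu are eligible for GPU tasks.
# (Real Mesos offers carry attributes as protobufs, not dicts.)

offers = [
    {"host": "cpu-node-1", "attributes": {}},
    {"host": "gpu-node-1", "attributes": {"gfx": "gpu"}},
]

def gpu_eligible(offers):
    """Hosts whose offers satisfy the gfx=gpu attribute constraint."""
    return [o["host"] for o in offers if o["attributes"].get("gfx") == "gpu"]

print(gpu_eligible(offers))  # ['gpu-node-1']
```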
Using VirtualGL
★ Intercepts GLX calls on the host
★ Calls forwarded to 2nd (local) X Server
★ GPU computation is done on the GPU and output forwarded to the 2D (VNC) X Server
3D X server setup

/etc/X11/xorg.conf

Section "Device"
    Identifier "Device0"
    Driver     "nvidia"
    VendorName "NVIDIA Corporation"
    BoardName  "GRID K520"
    BusID      "PCI:0:3:0"
EndSection

Section "Screen"
    Identifier   "Screen0"
    Device       "Device0"
    Monitor      "Monitor0"
    DefaultDepth 24
    Option       "UseDisplayDevice" "None"
    SubSection "Display"
        Depth 24
    EndSubSection
EndSection
$ lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 VGA compatible controller: NVIDIA Corporation GK104GL [GRID K520] (rev a1)
00:1f.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev 01)
Demo
We’re Hiring