ILM - Pipeline in the Cloud
Who are we?
Jim Vanns
Aaron Carey
Production Engineers at ILM London
VFX Pipeline in the Cloud
Experiments with Mesos and Docker
Nomenclature, glossary and other big words
★ VFX: Visual Effects
★ Pipeline: Data -> Process -> Data, repeat!
★ Show: Film
★ Sequence: A thematically linked series of (continuous) scenes
★ Shot: An uninterrupted portion of the sequence
★ Frame: A single image in time
★ Asset: A character or building etc.
What is a VFX pipeline?
Film Scan
Roto
3D
FX
Comp
Lighting
What is a VFX pipeline?
What VFX isn’t.
What VFX isn’t
★ Rendering and sims are our ‘Big Data’
★ We’re not crunching analytics in real-time
★ Rendering != MapReduce
★ Apps run on hardware, not in a browser
★ We’re not here to re-write a renderer (not yet...)
Where does the cloud meet VFX?
What’s in it for us?
★ Reducing capital expenditure
★ Potentially reducing overheads
★ Flexibility
★ Giving power back to developers
VFX Studio Infrastructure
★ Render Farm
★ Database
★ Storage
★ Workstations
Render Farm
First, what is rendering!?
★ Take a virtual 3D representation of a scene
  ○ 3D Models
  ○ Textures
  ○ Light sources
  ○ Static backgrounds (plates)
★ Place a virtual camera in the scene
★ Compute the 2D image that the camera will see
★ Repeat the process for each frame
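The "compute the 2D image" step comes down to projecting scene geometry through the virtual camera. A minimal sketch of that idea, using the standard pinhole camera model (nothing ILM-specific, and a far cry from a production renderer, which also handles shading, light transport and sampling):

```python
# Minimal sketch of the core of rendering: projecting a 3D point in
# camera space onto the virtual camera's 2D image plane (pinhole model).
# Illustration only -- a real renderer also shades, samples and filters.

def project(point, focal_length=1.0):
    """Project a 3D camera-space point (x, y, z) onto the image plane."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (focal_length * x / z, focal_length * y / z)

# A point twice as far from the camera lands half as far from the centre.
assert project((2.0, 1.0, 4.0)) == (0.5, 0.25)
```

Repeat this (plus shading) for every pixel and every frame, and the embarrassingly parallel shape of a render farm falls out naturally.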
Rendering in the cloud
★ Low-hanging fruit
★ Already happening
★ Typical farm: 30-50k procs
★ Managed by specialist software (Tractor/Deadline/in-house etc.)
★ VFX has been doing clustered computing for decades
What’s next?
Mesos
★ Open-source framework for scheduling
★ Already used at massive scale
★ NOT a job scheduler
★ We can concentrate on the scheduling logic
★ Support for task isolation/containment (e.g. Docker)
Automating our Mesos cluster with Docker and Ansible
★ Goals: Quick - Easy - Repeatable
★ Didn’t want to spend time fighting our config manager (or each other)
★ Be able to deploy a virtual studio from scratch in under an hour (including provisioning, building software, deploying, configuration)
★ Run multiple versions of the infrastructure at the same time (in the same availability zone/network)
★ If something is typed in the terminal, we want to automate and version it
Docker + Ansible was the answer
Automating our Mesos cluster with Ansible
★ Heavily using tags and variables in Ansible
★ Cloud agnostic: some modification of the GCE inventory and launch modules
★ Example: creating a multi-host dynamic Zookeeper configuration

- name: Append the zookeeper server entries
  lineinfile:
    dest: /etc/zookeeper/conf/zoo.cfg
    insertafter: EOF
    line: "server.{{ hostvars[item]['zkid'] }}={{ hostvars[item]['ansible_eth0']['ipv4']['address'] }}:2888:3888"
  with_items: "{{ groups['tag_zookeeper_server_' + consul_domain] }}"
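On a three-node ensemble, the task above ends up appending one line per Zookeeper host to zoo.cfg, along these lines (the IDs and IPs here are illustrative, not from a real deployment):

```
server.1=10.0.0.11:2888:3888
server.2=10.0.0.12:2888:3888
server.3=10.0.0.13:2888:3888
```

Because the line is built from `hostvars`, every node renders an identical ensemble list regardless of which host Ansible is configuring.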
Service Discovery in Mesos
★ No control over where a service or render runs
★ Services may move hosts
★ Can’t guarantee hosts will have the same IP
★ Options:
  ○ Mesos-DNS
  ○ Homegrown (etcd etc.)
  ○ Consul
Mesos and Consul
★ What is Consul?
★ Every host runs an agent
★ All DNS lookups on a host go to its agent
★ Consul servers outside the Mesos cluster
★ Mesos-Consul automates service registry
★ Can be used for services outside the cluster
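Besides registering services through the agent's HTTP API (as in the next example), Consul also accepts declarative service definition files dropped into the agent's config directory. A sketch of the equivalent definition for the registry service used below:

```json
{
  "service": {
    "name": "docker-registry",
    "tags": ["docker-registry", "v2"],
    "port": 5000
  }
}
```

Placed in e.g. /etc/consul.d/ and picked up with `consul reload`, this survives agent restarts without any re-registration step.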
Example - Static service outside the cluster

$ ssh -i mykey.pem [email protected]
$ docker run -d -p 5000:5000 --restart=always \
    -e REGISTRY_STORAGE_S3_ACCESSKEY \
    -e REGISTRY_STORAGE_S3_SECRETKEY \
    -e REGISTRY_STORAGE_S3_REGION \
    -e REGISTRY_STORAGE=s3 \
    registry:2.1
$ curl -H "Content-Type: application/json" -X PUT \
    -d '{ "Name": "docker-registry", "Tags": ["docker-registry", "v2"], "Port": 5000 }' \
    http://127.0.0.1:8500/v1/agent/service/register
Example - Static service outside the cluster

- name: Run docker registry container
  docker:
    name: docker-registry
    image: registry:2.1
    state: started
    ports:
      - "5000:5000"
    restart_policy: always
    env:
      REGISTRY_STORAGE_S3_ACCESSKEY:
      REGISTRY_STORAGE_S3_SECRETKEY:
      REGISTRY_STORAGE_S3_REGION:
      REGISTRY_STORAGE_S3_BUCKET:
      REGISTRY_STORAGE: s3
- name: Register registry with consul
  uri:
    url: http://127.0.0.1:8500/v1/agent/service/register
    method: PUT
    body: '{ "Name": "docker-registry", "Tags": [ "docker-registry", "v2" ], "Port": 5000 }'
    body_format: json
Example - Launching a service on marathon

- name: Submit maya container to marathon
  hosts: "tag_build_docker_{{ consul_domain }}"
  gather_facts: False
  tasks:
    - name: Submit maya job to marathon
      uri:
        url: http://marathon:8080/v2/apps
        method: POST
        status_code: 201,409
        body: '{ "args": [],
                 "container": {
                   "type": "DOCKER",
                   "docker": {
                     "network": "BRIDGE",
                     "portMappings": [
                       { "containerPort": 5901, "hostPort": 0, "protocol": "tcp" }
                     ],
                     "image": "docker-registry:5000/studio-local-base/maya",
                     "forcePullImage": true,
                     "parameters": [
                       { "key": "env", "value": "DISPLAY" },
                       { "key": "device", "value": "/dev/dri/card0" },
                       { "key": "device", "value": "/dev/nvidia0" },
                       { "key": "device", "value": "/dev/nvidiactl" }
                     ]
                   },
                   "volumes": [
                     { "containerPath": "/tmp/.X11-unix/X0", "hostPath": "/tmp/.X11-unix/X0", "mode": "RW" }
                   ]
                 },
                 "id": "maya",
                 "instances": 1,
                 "cpus": 4,
                 "mem": 8024,
                 "constraints": [ ["gfx", "CLUSTER", "gpu"] ] }'
        body_format: json
Studio Services
Studio Service Structure
Studio Service Deployment
Database
★ Sites (e.g. London, San Francisco, Singapore etc.)
★ Departments
★ Shows (film)
★ Sequences
★ Shots
★ Tasks
★ Assets
★ Data
Modelling studio relationships
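These entities form a hierarchy (show contains sequences, sequences contain shots, shots carry tasks), which is why a graph model fits. A toy sketch of that shape in plain Python, with entirely hypothetical names, just to show the kind of traversal a graph database answers natively:

```python
# Toy sketch: the studio hierarchy (show -> sequence -> shot -> task)
# as a directed graph -- the shape of data a graph database would hold.
# All entity names here are made up for illustration.
from collections import defaultdict

edges = defaultdict(list)

def link(parent, child):
    edges[parent].append(child)

link("show:film-a", "seq:010")
link("seq:010", "shot:010-0100")
link("seq:010", "shot:010-0200")
link("shot:010-0100", "task:lighting")

def descendants(node):
    """All nodes reachable from 'node', e.g. every shot and task under a show."""
    out = []
    for child in edges[node]:
        out.append(child)
        out.extend(descendants(child))
    return out

assert "shot:010-0200" in descendants("show:film-a")
```

A real deployment would of course hand this traversal to the graph database's query language rather than walking an in-memory dict.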
Challenges
★ New technologies
  ○ Graph database
  ○ Query language/APIs
  ○ Distributed storage engine
★ Complexity (both in the data modelling and system)
★ Adoption/Approval
Storage
Cloud Storage Pros and Cons
★ Managed
★ No more tape archives/backups

But...
★ Getting data into the cloud is expensive
★ Getting data into the cloud is slooow
Is there another way?
Work in Progress...
★ Applications need a POSIX filesystem interface
★ Can we cache cloud storage?
  ○ EFS
  ○ Avere
  ○ Homegrown
Can we create content entirely in the cloud?
Workstations
Can we create content entirely in the Cloud?
★ Applications require OpenGL
★ OpenGL requires hardware
★ Hardware needs drivers
Can we do this in Docker?
Dockerising OpenGL Applications
★ NVIDIA drivers must match the host version exactly
★ Driver inside the container must not install the kernel module
★ Container requires access to the GPU devices and the X server
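A sketch of what those first two constraints mean in practice: install only the user-space half of the NVIDIA driver in the image, pinning its version to the host's. The base image and driver version below are assumptions for illustration; the `--no-kernel-module` flag is the NVIDIA .run installer's switch for skipping the kernel module build.

```dockerfile
# Sketch: user-space-only NVIDIA driver install inside an image.
# NVIDIA_VERSION must match the host's driver version exactly.
FROM centos:7
ARG NVIDIA_VERSION=352.63
ADD NVIDIA-Linux-x86_64-${NVIDIA_VERSION}.run /tmp/
RUN sh /tmp/NVIDIA-Linux-x86_64-${NVIDIA_VERSION}.run \
      --silent --no-kernel-module \
    && rm /tmp/NVIDIA-Linux-x86_64-${NVIDIA_VERSION}.run
```

The kernel module itself stays on the host; the container only needs the matching libraries plus access to the device nodes, as the docker run example below shows.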
Running an OpenGL Docker application

$ docker run \
    -it \
    -v /tmp/.X11-unix:/tmp/.X11-unix:rw \
    --device=/dev/dri/card0 \
    --device=/dev/nvidia0 \
    --device=/dev/nvidiactl \
    -e DISPLAY \
    docker-registry:5000/studio-local-base/maya
Scheduling a VFX app on Mesos in the cloud
★ Must use custom Mesos resources/attributes to only schedule on GPU machines
★ Cloud machines have no monitor
★ Remote desktop apps will forward GL calls to the client machine
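The GPU-only placement relies on each GPU agent advertising a custom attribute at start-up, which the `["gfx", "CLUSTER", "gpu"]` constraint in the Marathon example then matches. A sketch of the agent launch line (master address illustrative):

```
mesos-slave --master=zk://zookeeper:2181/mesos --attributes="gfx:gpu"
```

Non-GPU agents simply omit the attribute, so Marathon never offers them the Maya task.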
Using VirtualGL
★ Intercepts GLX calls on the host
★ Calls forwarded to a 2nd (local) X server
★ GPU computation is done on the GPU and output forwarded to the 2D (VNC) X server
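Inside the VNC session, the application is launched through VirtualGL's wrapper, which points the GLX calls at the GPU-backed 3D X server. A sketch with glxgears as a stand-in app (display number illustrative):

```
$ vglrun -d :0 glxgears
```

`vglrun` sets up the interception; `-d` names the 3D X display doing the actual rendering, while the rendered frames appear in the 2D VNC display the command runs in.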
Using VirtualGL
3D X server setup
/etc/X11/xorg.conf

Section "Device"
    Identifier "Device0"
    Driver     "nvidia"
    VendorName "NVIDIA Corporation"
    BoardName  "GRID K520"
    BusID      "PCI:0:3:0"
EndSection

Section "Screen"
    Identifier   "Screen0"
    Device       "Device0"
    Monitor      "Monitor0"
    DefaultDepth 24
    Option       "UseDisplayDevice" "None"
    SubSection "Display"
        Depth 24
    EndSubSection
EndSection
$ lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 VGA compatible controller: NVIDIA Corporation GK104GL [GRID K520] (rev a1)
00:1f.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev 01)
Demo
We’re Hiring