(CMP404) Cloud Rendering at Walt Disney Animation Studios

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Usman Shakeel, Amazon Web Services Kevin Constantine, Walt Disney Animation Studios October 2015 CMP404 Cloud Rendering at Walt Disney Animation Studios


TRANSCRIPT

Page 1: (CMP404) Cloud Rendering at Walt Disney Animation Studios

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Usman Shakeel, Amazon Web Services Kevin Constantine, Walt Disney Animation Studios

October 2015

CMP404 Cloud Rendering at

Walt Disney Animation Studios

Page 2: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Who is using AWS for rendering?

1. Visual Effects and Animation
2. Marketing
3. Theme Parks
4. Manufacturing
5. Gaming
6. Life Sciences
7. Engineering and Architecture

Page 3: (CMP404) Cloud Rendering at Walt Disney Animation Studios

1. Visual Effects and Animation

Let’s make a film in the cloud…

Page 4: (CMP404) Cloud Rendering at Walt Disney Animation Studios

VFX/Animation Rendering - workflow components

Compositing Modeling Rendering

Asset management

Collaboration and task management

Page 5: (CMP404) Cloud Rendering at Walt Disney Animation Studios

The challenge of making a film

Page 6: (CMP404) Cloud Rendering at Walt Disney Animation Studios

The challenge of making a film

On-premises capacity

Page 7: (CMP404) Cloud Rendering at Walt Disney Animation Studios

The challenge of making a film

On-premises capacity

Rendering in the cloud

Page 8: (CMP404) Cloud Rendering at Walt Disney Animation Studios

The challenge of making a film

On-premises capacity

Rendering in the cloud Cloud provides you the capability to scale fast and get the outputs faster

Initial project on-boarding artwork

Page 9: (CMP404) Cloud Rendering at Walt Disney Animation Studios

A tale of two customers

Attribute | A boutique studio | Walt Disney Animation Studios
On-Premises Hardware | No or very little investment | A significant investment
Licenses | Limited | Unlimited
Project Structure | Project based from other studios | Internal customers/projects
Budget Constraints | Time and resources | Time and resources
Compute Needs | Large scale | Very large scale
Infrastructure Efficiencies | No or very little | On-premises infrastructure optimized for rendering workload
Cloud Model | All-in mostly | Hybrid mostly
Security | Mandated by customers | Required due to high valued assets

Page 10: (CMP404) Cloud Rendering at Walt Disney Animation Studios

They both ask us the same thing…

•  The ability to spin up thousands of cores on-demand
•  …without any upfront investment
•  …and leveraging the most up-to-date configurations
•  A project-based “disposable” infrastructure
•  …with a flexible licensing / utility / by the hour

Page 11: (CMP404) Cloud Rendering at Walt Disney Animation Studios

They both tell us the same thing…

≤ $0.01 per core/hour

Access to thousands of cores whenever needed

No upfront investments in infrastructure

Easier collaboration

Ecosystem of software providers

Access to large memory configs to do 6K/10K renders

Project based “disposable” infrastructure

Page 12: (CMP404) Cloud Rendering at Walt Disney Animation Studios

…when the rubber meets the road!

•  Shared FS everywhere
•  Latency
•  Large datasets
•  Lots of instances

{Data/Content}

Page 13: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Rendering in the Cloud

Page 14: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Rendering in the Cloud - State of the Union
Scale at a very cheap price

EC2 Spot

Page 15: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Leveraging Spot successfully today requires some effort

•  Build stateless, distributed, scalable applications
•  Choose which instance types fit your workload the best
•  Ingest price feed data for AZs and regions
•  Make run time decisions on which Spot pools to launch in based on price and volatility
•  Manage interruptions
•  Monitor and manage market prices across AZs and instance types
•  Manage the capacity footprint in the fleet
•  And all of this while you don’t know where the capacity is
•  Serve your customers
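One way to picture the "price and volatility" decision is the sketch below. It is only an illustration: the price history is made-up sample data, the pool names are examples, and the scoring rule (mean per-vCPU price plus a penalty for price swings) is an assumption, not something taken from the talk.

```python
from statistics import mean, pstdev

# Hypothetical recent Spot prices (USD per instance-hour), keyed by
# (instance type, Availability Zone). Real data would come from a price feed.
PRICE_HISTORY = {
    ("r3.4xlarge", "us-west-2a"): [0.30, 0.31, 0.29, 0.30],
    ("r3.4xlarge", "us-west-2b"): [0.20, 0.60, 0.22, 0.58],
    ("r3.8xlarge", "us-west-2a"): [0.64, 0.66, 0.65, 0.65],
}

# vCPUs per instance type, used to compare pools on a per-core basis.
VCPUS = {"r3.4xlarge": 16, "r3.8xlarge": 32}


def score_pool(instance_type, prices, volatility_penalty=1.0):
    """Return a per-vCPU score: mean price plus a penalty for price swings."""
    per_core = [p / VCPUS[instance_type] for p in prices]
    return mean(per_core) + volatility_penalty * pstdev(per_core)


def rank_pools(history, volatility_penalty=1.0):
    """Sort pools from most to least attractive (lowest score first)."""
    return sorted(
        history,
        key=lambda pool: score_pool(pool[0], history[pool], volatility_penalty),
    )


if __name__ == "__main__":
    for pool in rank_pools(PRICE_HISTORY):
        print(pool, round(score_pool(pool[0], PRICE_HISTORY[pool]), 4))
```

With this sample data the stable r3.4xlarge pool ranks first and the pool with the large price swings ranks last, even though its lowest observed price is the cheapest.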

Page 16: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Spot Fleet

Instead of writing all that code to manage Spot instances, simply specify:

•  Target Capacity – The number of EC2 instances that you want in your fleet.

•  Maximum Bid Price – The maximum bid price that you are willing to pay.

•  Launch Specifications – # of and types of instances, AMI ID, VPC, subnets or AZs, etc.

•  IAM Fleet Role – The name of an IAM role. It must allow Amazon EC2 to terminate instances on your behalf.
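The four inputs above map onto a small request document. The sketch below assembles one as a plain Python dict using the field names of the EC2 Spot Fleet request configuration (`TargetCapacity`, `SpotPrice`, `IamFleetRole`, `LaunchSpecifications`). The role ARN, AMI ID, subnet ID, and price are placeholder values invented for illustration, and nothing here calls AWS.

```python
import json


def build_spot_fleet_config(target_capacity, max_price, iam_fleet_role_arn,
                            image_id, subnet_id, instance_types):
    """Assemble a dict shaped like an EC2 Spot Fleet request configuration."""
    return {
        "TargetCapacity": target_capacity,
        # The API expects the maximum bid as a string.
        "SpotPrice": str(max_price),
        "IamFleetRole": iam_fleet_role_arn,
        # One launch specification per instance type we are willing to run.
        "LaunchSpecifications": [
            {"ImageId": image_id, "InstanceType": itype, "SubnetId": subnet_id}
            for itype in instance_types
        ],
    }


if __name__ == "__main__":
    # All identifiers below are placeholders, not real resources.
    config = build_spot_fleet_config(
        target_capacity=20,
        max_price=0.50,
        iam_fleet_role_arn="arn:aws:iam::123456789012:role/example-fleet-role",
        image_id="ami-00000000",
        subnet_id="subnet-00000000",
        instance_types=["r3.2xlarge", "r3.4xlarge", "r3.8xlarge"],
    )
    print(json.dumps(config, indent=2))
```

A dict like this could be passed as the `SpotFleetRequestConfig` argument of the boto3 EC2 client's `request_spot_fleet` call once real identifiers are filled in.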

Page 17: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Spot Fleet Example – Instance Weighting

•  Say your workload needs at least 60 GB of memory
•  Want capacity to complete 20 units of work

Choices:

•  r3.2xlarge (61.0 GB, 8 vCPUs) = 1 unit of 20
•  r3.4xlarge (122.0 GB, 16 vCPUs) = 2 units of 20
•  r3.8xlarge (244.0 GB, 32 vCPUs) = 4 units of 20

There is an option to bid for all of these instance types.
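The weighting arithmetic in this example can be sketched as follows. The weights come from the slide; the Spot prices are hypothetical numbers chosen only to show how a per-unit comparison works.

```python
import math

# Units of work each instance type contributes (from the slide).
WEIGHTS = {"r3.2xlarge": 1, "r3.4xlarge": 2, "r3.8xlarge": 4}

# Hypothetical Spot prices in USD per instance-hour (illustration only).
HYPOTHETICAL_PRICES = {"r3.2xlarge": 0.10, "r3.4xlarge": 0.18, "r3.8xlarge": 0.44}


def instances_needed(target_units, weight):
    """Instances of one type required to cover the target capacity."""
    return math.ceil(target_units / weight)


def price_per_unit(instance_price, weight):
    """Normalize an instance price to the cost of one unit of work."""
    return instance_price / weight


def cheapest_type(prices, weights):
    """Pick the instance type with the lowest price per unit of work."""
    return min(prices, key=lambda t: price_per_unit(prices[t], weights[t]))


if __name__ == "__main__":
    for itype, weight in WEIGHTS.items():
        print(itype, instances_needed(20, weight),
              price_per_unit(HYPOTHETICAL_PRICES[itype], weight))
    print("cheapest per unit:", cheapest_type(HYPOTHETICAL_PRICES, WEIGHTS))
```

Covering 20 units takes 20, 10, or 5 instances depending on the type, which is why comparing bids per unit rather than per instance matters.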

Page 18: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Rendering in the Cloud - State of the Union
Scale at a very cheap price

AWS cloud scale is “large”

•  10s/100s/1000s/10000s cores on-demand in the cloud
•  A “large” (Disney Animation Studio) renderfarm: 55,000 cores
•  In this demo: ~40,000 vCPUs on EC2 Spot Market

Page 19: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Rendering in the Cloud - State of the Union
Licensing at Cloud Scale

•  BYOL
•  SaaS
•  AWS Marketplace
•  Elastic Licensing models

Thinkbox Deadline Usage Based Licensing

•  Render nodes pull metered licenses from cloud-based license server
•  Usage is tracked per minute
•  Bulk minutes will be available via Thinkbox’s online store
•  Store will eventually host 3rd party licensing (Nuke, VRay, etc.)

Autodesk Maya

Page 20: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Rendering in the Cloud - State of the Union
Hydrating the Cloud Renderfarm

Amazon S3 as the source of truth for your content/data

•  On AWS Marketplace/SaaS (Aspera, Signiant, File Catalyst, Expedat)
•  Amazon S3 Multi-part Upload

Direct to Shared File Systems

•  Amazon EFS throughput scales linearly to the storage
•  Lustre can hydrate from an S3 bucket
•  Avere can be fronted to Amazon S3 or an on-premises NAS

+ AWS Direct Connect
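For the S3 multipart upload option, a small planning sketch may help. It relies on two S3 limits: every part except the last must be at least 5 MiB, and an upload may have at most 10,000 parts. The 64 MiB preferred part size and the function names are assumptions for illustration; the code only computes byte ranges and does not talk to S3.

```python
import math

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # S3 minimum size for every part except the last
MAX_PARTS = 10_000        # S3 maximum number of parts per upload


def plan_parts(object_size, preferred_part_size=64 * MIB):
    """Return (part_size, part_count) for uploading object_size bytes."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    part_size = max(preferred_part_size, MIN_PART_SIZE)
    # Grow the part size if the preferred size would need too many parts.
    part_size = max(part_size, math.ceil(object_size / MAX_PARTS))
    return part_size, math.ceil(object_size / part_size)


def part_ranges(object_size, part_size):
    """Yield (part_number, start_byte, end_byte_exclusive) for each part."""
    for number, start in enumerate(range(0, object_size, part_size), start=1):
        yield number, start, min(start + part_size, object_size)


if __name__ == "__main__":
    size, count = plan_parts(1024 * MIB)
    print(size // MIB, "MiB parts,", count, "parts")
```

Because the ranges are independent, each part can be sent by a different thread or client, which is what makes multipart upload useful for hydrating large datasets.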

Page 21: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Rendering in the Cloud - State of the Union
Shared FileSystem Everywhere (some ideas)

[Diagram labels: Shared Storage; On-prem Storage; FXT on-prem; AWS Direct Connect; Storage Cache; Amazon S3; Lustre on EC2; Avere on EC2; EFS; Hydrate workers; EC2 Spot]

Page 22: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Rendering in the Cloud - State of the Union
NFS/CIFS (Content/Data Share) Everywhere (some ideas)

Elastic File System (EFS)

•  Designed to support petabyte scale file systems
•  Throughput scales linearly to storage
•  Same latency spec across each AZ
•  Thousands of concurrent NFS connections
•  Works great for large I/O sizes
•  Pay for only what you use not what you provision
•  Managed with multi-copy durability

Page 23: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Rendering in the Cloud - State of the Union
Move the Graphic Artist to the Cloud …

•  NVIDIA GPU based EC2 instances
•  Teradici PCoIP
•  Frame, Otoy
•  Windows and Linux (VNC+VirtualGL)

Page 24: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Rendering in the Cloud - State of the Union
Managing your “disposable” infrastructure

•  Launch a CloudFormation stack with all the infrastructure resources for a specific project
•  Automatically scale the stack as appropriate

[Diagram labels: AMI; CloudFormation Template; CloudFormation Terminate Template]
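A per-project stack starts from a template. The sketch below generates a deliberately tiny CloudFormation template as JSON from Python. The single S3 bucket resource, the logical name `ProjectBucket`, and the `Project` tag are illustrative assumptions; a real render stack would declare far more (network, storage, fleet), and none of that is described in the slides.

```python
import json


def project_template(project_name):
    """Build a minimal CloudFormation template tagged for one project."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"Disposable render infrastructure for {project_name}",
        "Resources": {
            # Hypothetical resource: one bucket to hold the project's content.
            "ProjectBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    "Tags": [{"Key": "Project", "Value": project_name}],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(project_template("example-film"), indent=2))
```

Keeping everything for a project in one stack is what makes the infrastructure disposable: deleting the stack removes the resources it created.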

Page 25: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Rendering in the Cloud - State of the Union
The Crown Jewels

•  AWS alignment with the latest MPAA cloud based application guidelines for content security – August 2015
•  VPC private endpoint for Amazon S3 – enables a true private workflow capability
•  Encryption & key management capabilities
•  Amazon Glacier Vault for high-value media/originals

Page 26: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Rendering in the Cloud - A Sample Architecture (All in Cloud Pipeline)

[Diagram labels: Shared Storage; Renderfarm; On-Prem Storage; Pipeline and License Manager; 3D Modeler; Remote App Visualization; AWS Direct Connect; Modeling Dumb Client; Storage Cache; Amazon S3; Avere on EC2; EFS; Scalable Renderfarm on EC2; AppStream or Teradici running on a G2 instance; Pipeline Manager running on EC2; Hydrate workers; EC2 Spot]

Page 27: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Rendering in the Cloud - A Sample Architecture (A Hybrid Pipeline)

Cloud renderfarm as an extension of on-prem renderfarm

[Diagram labels: Shared Storage; Renderfarm; On-Prem Storage; FXT on-prem; On-premises Renderfarm; Pipeline and License Manager (also manage cloud renderfarm); AWS Direct Connect; Storage Cache; Amazon S3; Avere on EC2; EFS; Scalable Renderfarm on EC2; Hydrate workers; EC2 Spot]

Page 28: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Let’s make a real film in the cloud…

Page 29: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Disney Animation Renderfarm

[Diagram labels: WDAS Data Center; Remote Data Center (two); Renderfarm (three); Avere FXT cluster (three); Storage; San Francisco; Los Angeles; Burbank; Artists; Redundant 10Gb]

Page 30: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Disney Animation’s Environment

•  90% Red Hat Enterprise Linux 6, 8% Mac OS X
•  1Gb/s Ethernet to clients, 10Gb/s to most servers
•  Clients are bursty, not generally bandwidth constrained
•  Major Applications:
    •  Hyperion (GI Renderer)
    •  Maya
    •  Houdini
    •  Nuke
    •  Coda (Scheduler)

Page 31: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Disney Animation’s Environment

•  NFS v3 Everywhere
    •  5-7 petabytes
    •  500 TB working-set
    •  100 TB/week of data churn
    •  Global namespace
    •  Lots of metadata operations
    •  Serve everything out of RAM/SSD
•  Renderfarm Footprint
    •  55,000 core renderfarm
    •  1.1 million render hours per day
    •  200,000-400,000 tasks per day
•  Typical render
    •  8-16 threads, 64 GB
    •  3-5 hours per task
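The footprint numbers can be cross-checked with a little arithmetic. The sketch below assumes that "render hours" means core-hours, which the slide does not state, so the resulting figures are a rough reading rather than studio-confirmed values.

```python
CORES = 55_000
RENDER_HOURS_PER_DAY = 1_100_000
TASKS_PER_DAY = (200_000, 400_000)  # low and high ends of the quoted range


def utilization(cores, render_hours_per_day):
    """Fraction of available core-hours used, if render hours are core-hours."""
    return render_hours_per_day / (cores * 24)


def hours_per_task(render_hours_per_day, tasks_per_day):
    """Average render hours attributed to a single task."""
    return render_hours_per_day / tasks_per_day


if __name__ == "__main__":
    print(f"utilization: {utilization(CORES, RENDER_HOURS_PER_DAY):.1%}")
    for tasks in TASKS_PER_DAY:
        print(tasks, "tasks/day:",
              hours_per_task(RENDER_HOURS_PER_DAY, tasks), "hours per task")
```

Under that assumption the farm would be running at roughly 83% of its theoretical 1.32 million core-hours per day, with an average between 2.75 and 5.5 render hours per task.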

Page 32: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Disney Animation Renderfarm

[Diagram labels: WDAS Data Center; Remote Data Center (two); Renderfarm (three); Avere FXT cluster (three); Storage; San Francisco; Los Angeles; Burbank; Artists; Redundant 10Gb; virtual private cloud; Avere vFXT; Oregon; Spot Instances; EFS; 10Gb Primary, 1Gb backup]

Page 33: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Mostly Automated Deployment

•  Pre-built EBS-backed AMI
    •  Heavily customized RHEL
•  Python/Boto3
    •  Pass in how many resources and the minimum instance size
    •  Calculates resource weights
    •  Needs to calculate pricing
•  User-Data
    •  RAIDs ephemeral disks if available for scratch space
    •  Integrates with on-premises environment (DNS, asset inventory, Puppet)
    •  Creates EC2 tags
    •  Runs Puppet to pick up changes since AMI-build-time
    •  Joins the render queue and asks for work
•  Scale-up/down still a manual process
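The "calculates resource weights" step might look something like the sketch below. This is not the studio's script: the rule (a weight is how many minimum-size render slots fit into an instance, limited by whichever of CPU or memory runs out first) is an assumption, and the function names are hypothetical. The vCPU and memory figures are the published sizes of those instance types.

```python
# (vCPUs, memory in GiB) for a few instance types.
INSTANCE_SPECS = {
    "r3.2xlarge": (8, 61),
    "r3.4xlarge": (16, 122),
    "r3.8xlarge": (32, 244),
    "m4.4xlarge": (16, 64),
    "m4.10xlarge": (40, 160),
}


def resource_weight(vcpus, ram_gib, min_cpu, min_ram_gib):
    """Number of (min_cpu, min_ram_gib) render slots that fit on one instance."""
    return min(vcpus // min_cpu, int(ram_gib // min_ram_gib))


def weights_for(min_cpu, min_ram_gib, specs=INSTANCE_SPECS):
    """Map each usable instance type to its weight, dropping types too small."""
    weights = {
        name: resource_weight(vcpus, ram, min_cpu, min_ram_gib)
        for name, (vcpus, ram) in specs.items()
    }
    return {name: weight for name, weight in weights.items() if weight > 0}


if __name__ == "__main__":
    print(weights_for(min_cpu=8, min_ram_gib=64))
```

With a minimum of 8 CPUs and 64 GiB, the r3.2xlarge drops out because its 61 GiB cannot hold one slot, and the memory limit rather than the CPU count sets the weight of the larger types.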

Page 34: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Spot Fleet Deployment

./aws_spot_fleet_request -p reinvent --cpu 8 --ram 64 -m 4.7 -c 1500

[Chart: Core Count]

Page 35: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Spot Fleet Deployment

Page 36: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Spot Fleet Pricing

•  Target Price 1
    •  $0.47/resource for the 40,000 cores
•  Target Price 2
    •  $0.16/resource for 16,000 cores
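To get a feel for what these target prices imply, the sketch below converts them into fleet-level numbers. Two assumptions are baked in and are not confirmed by the slides: that a "resource" is an 8-core slot (matching the `--cpu 8` argument on the deployment slide) and that the price is per resource per hour.

```python
CORES_PER_RESOURCE = 8  # assumption: one resource is an 8-core render slot


def hourly_cost(total_cores, price_per_resource_hour,
                cores_per_resource=CORES_PER_RESOURCE):
    """Fleet cost per hour if every resource is paid at the target price."""
    resources = total_cores / cores_per_resource
    return resources * price_per_resource_hour


def price_per_core_hour(price_per_resource_hour,
                        cores_per_resource=CORES_PER_RESOURCE):
    """Target price expressed per core-hour."""
    return price_per_resource_hour / cores_per_resource


if __name__ == "__main__":
    for cores, price in ((40_000, 0.47), (16_000, 0.16)):
        print(cores, "cores:",
              round(hourly_cost(cores, price), 2), "USD/hour,",
              round(price_per_core_hour(price), 5), "USD per core-hour")
```

Under those assumptions the 40,000-core run tops out at about $2,350 per hour and the 16,000-core run at about $320 per hour.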

Page 37: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Cloud Rendering Benchmarks

Page 38: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Benchmarks: On Premises vs. the Cloud

[Bar chart: stream triad, disk read, and disk write for On-Prem, r3.4xlarge, r3.8xlarge, m4.4xlarge, m4.10xlarge, and cr1.8xlarge; y-axis from 0 to 120. Higher is better.]

Page 39: (CMP404) Cloud Rendering at Walt Disney Animation Studios

EFS Hydration

Single Node

50 Clients – multi-threaded file copy
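A multi-threaded file copy spread over several clients could be sketched as below. The sharding scheme (each client takes every Nth file of the sorted listing) and the function names are assumptions for illustration; the benchmark's actual copy tool is not described in the slides.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def shard(files, client_index, client_count):
    """Return the slice of files that one client is responsible for."""
    ordered = sorted(files)
    return [f for i, f in enumerate(ordered) if i % client_count == client_index]


def copy_tree_parallel(src_root, dst_root, client_index=0, client_count=1,
                       threads=8):
    """Copy this client's share of src_root into dst_root with a thread pool."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    files = [p for p in src_root.rglob("*") if p.is_file()]

    def copy_one(src):
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        return dst

    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(copy_one, shard(files, client_index, client_count)))
```

Running the same function on 50 clients with `client_count=50` and a distinct `client_index` each would split the file list without any coordination between clients.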

Page 40: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Average Open Latency

Page 41: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Average Read Latency

[Line chart: average read latency, Time (µs) from 0 to 700, against Render Processes (100, 500, 800, 1200, 2400, 4000), for Mid-TierA, Mid-TierB, Mid-TierC, Archive, and EFS.]

Page 42: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Rendering in the Cloud vs. On-Premises

[Chart: Render Time (s), from 0 to 30,000, against Frame # (1 to 90), for EC2/EFS and On-Prem. Lower is better.]

Page 43: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Lessons Learned

•  Use as many different instance types as you can. Especially older generations.

•  Think about ways to modify your workload

•  Use every Availability Zone

•  Check your limits, especially your Amazon EBS limit and VPC setup (address space)

•  Resource-oriented bidding

•  Diversified allocation

•  Benchmark your workload and set pricing accordingly

•  Set ONLY realistic pricing that you are willing to pay

•  Don’t be afraid to ask AWS for help or for pre-planning your run
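The advice to use many instance types, every Availability Zone, and diversified allocation comes down to spreading capacity over as many Spot pools as possible. The sketch below shows an even split; it is a simplified stand-in for what a diversified allocation does, and the instance types and zone names are just examples.

```python
from itertools import product


def diversify(target_capacity, instance_types, zones):
    """Spread target_capacity as evenly as possible over all (type, zone) pools."""
    pools = list(product(instance_types, zones))
    base, extra = divmod(target_capacity, len(pools))
    # The first `extra` pools take one additional unit so the total matches.
    return {pool: base + (1 if i < extra else 0) for i, pool in enumerate(pools)}


if __name__ == "__main__":
    plan = diversify(
        20,
        ["r3.2xlarge", "r3.4xlarge", "r3.8xlarge"],
        ["us-west-2a", "us-west-2b"],
    )
    for pool, units in plan.items():
        print(pool, units)
```

With capacity spread this way, a price spike in one pool interrupts only that pool's share of the fleet.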

Page 44: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Conclusion

•  Cloud rendering on AWS - State of the Union
    •  Is getting stronger …
•  Rendering forecast
    •  Partly cloudy with a chance of all in the cloud…
•  Future research
    •  Storage hydration: distribute across many clients to saturate the EFS throughput
    •  Storage for processing: read freely and lump the writes (for shared FS performance)
    •  Latency is killer: atomic workflows within a single AZ/region; caching appliances
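One possible reading of "lump the writes" is to collect many small writes in memory and send them to the shared file system as a few large ones. The sketch below does that; the class name, the 8 MiB default threshold, and the append-on-flush behavior are assumptions for illustration, not a description of the studio's renderer.

```python
import io


class LumpedWriter:
    """Buffer many small writes in memory and flush them as large writes."""

    def __init__(self, path, flush_bytes=8 * 1024 * 1024):
        self.path = path
        self.flush_bytes = flush_bytes
        self.flush_count = 0
        self._buffer = io.BytesIO()

    def write(self, data):
        """Queue data; touch the file system only once the buffer is large."""
        self._buffer.write(data)
        if self._buffer.tell() >= self.flush_bytes:
            self.flush()

    def flush(self):
        """Append everything buffered so far to the file in one write call."""
        payload = self._buffer.getvalue()
        if payload:
            with open(self.path, "ab") as handle:
                handle.write(payload)
            self.flush_count += 1
        self._buffer = io.BytesIO()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.flush()
```

Used as `with LumpedWriter(path) as out: out.write(chunk)`, a render process issuing thousands of small writes would reach the shared file system only a handful of times.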

Page 45: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Relevant talks

Page 46: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Remember to complete your evaluations!

Page 47: (CMP404) Cloud Rendering at Walt Disney Animation Studios

Thank you!