
  • Scaling Servers and Storage for Film Assets

    Mike Sundy, Digital Asset System Administrator
    David Baraff, Senior Animation Research Scientist
    Pixar Animation Studios

  • Environment Overview · Scaling Storage · Scaling Servers · Challenges with Scaling

  • Environment Overview

  • Environment

    As of March 2011:
    •  ~1,000 Perforce users (80% of the company)
    •  70 GB db.have
    •  12 million p4 operations per day (on the busiest server)
    •  30+ VMware server instances
    •  40 million submitted changelists (across all servers)
    •  On 2009.1, planning to upgrade to 2010.1 soon

  • Growth & Types of Data

    Pixar grew from one code server in 2007 to 90+ Perforce servers storing all types of assets:
    •  art – reference and concept art; inspirational art for the film.
    •  tech – show-specific data, e.g. models, textures, pipeline.
    •  studio – company-wide reference libraries, e.g. animation reference, config files, a Flickr-like company photo site.
    •  tools – code for our central tools team; software projects.
    •  dept – department-specific files, e.g. Creative Resources has “blessed” marketing images.
    •  exotics – patent data, casting audio, data for live-action shorts, story gags, theme park concepts, the intern art show.

  • Scaling Storage

  • Storage Stats

    •  115 million files in Perforce.
    •  20+ TB of versioned files.

  • Techniques to Manage Storage

    •  Use the +S filetype for the majority of generated data. Saved 40% of storage for Toy Story 3 (1.2 TB). See the typemap example below.
    •  Work with teams to migrate versionless data out of Perforce. Saved 2 TB by moving binary scene data out.
    •  De-dupe files: saved 1 million files and 1 TB.
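    The +S modifier limits how many archived revisions the server keeps (+S alone keeps only the head revision's archive; +S<n> keeps the most recent n). One way to apply it broadly is through the server typemap, so generated data picks it up automatically at add time. A minimal sketch; the depot paths and revision counts here are hypothetical, not Pixar's actual typemap:

        $ p4 typemap -o
        TypeMap:
            binary+S //tech/renders/....exr
            binary+S16 //tech/caches/....bgeo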

  • De-dupe Trigger Cases

    Cases where identical file contents end up stored as separate archive revisions:

    •  Re-submit with mostly unchanged files:
       p4 submit file1 file2 … fileN
       p4 submit file1 file2 … fileN   # only file2 actually modified

    •  Revert to the previous contents:
       p4 submit file                  # contents: revision n
       # five seconds later: “crap!”
       p4 submit file                  # contents: revision n–1

    •  Delete, then re-add:
       p4 delete file
       p4 submit file                  # user deletes file (revision n)
       # five seconds later: “crap!”
       p4 add file
       p4 submit file                  # contents: revision n

  • De-dupe Trigger Mechanics

    [Diagram: archive files repfile.14, repfile.15, repfile.24, repfile.25, repfile.26, repfile.34, repfile.38, labeled file#n, file#n+1, file#n+2. Several revisions hold identical contents (AABBCC…), e.g. repfile.24 and repfile.26, while repfile.25 differs (XXYYZZ…).]

  • De-dupe Trigger Mechanics

    •  +F filetype (full file per revision, no compression or deltas) for all files; detect duplicates via checksums.
    •  Safely discard a duplicate by hard-linking it to the matching earlier archive file under a temporary name, then renaming it into place:

       $ ln repfile.24 repfile.26.tmp
       $ rename repfile.26.tmp repfile.26

    [Diagram: repfile.26 (AABBCC…) duplicates repfile.24 (AABBCC…), while repfile.25 (XXYYZZ…) differs. The hard link is created as repfile.26.tmp, then renamed over repfile.26.]
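    Below is a minimal sketch of the de-dupe operation itself, not Pixar's actual trigger code: checksum two archive files and, when they match, replace the later file with a hard link to the earlier one using the temporary-name-then-rename pattern above. The file names and the choice of MD5 are assumptions.

        #!/usr/bin/env python
        # De-dupe sketch (not the actual Pixar trigger): if two +F archive files
        # have identical contents, replace the later one with a hard link to the
        # earlier one, using a temp name so the swap into place is atomic.
        import hashlib
        import os

        def checksum(path, bufsize=1 << 20):
            """MD5 of a file's contents, read in chunks."""
            md5 = hashlib.md5()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(bufsize), b""):
                    md5.update(chunk)
            return md5.hexdigest()

        def dedupe(earlier, later):
            """Hard-link 'later' to 'earlier' if their contents match."""
            if checksum(earlier) != checksum(later):
                return False
            tmp = later + ".tmp"
            os.link(earlier, tmp)    # ln repfile.24 repfile.26.tmp
            os.rename(tmp, later)    # rename repfile.26.tmp repfile.26
            return True

        if __name__ == "__main__":
            if dedupe("repfile.24", "repfile.26"):
                print("repfile.26 is now a hard link to repfile.24")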

  • Scaling Servers

  • Scale Up vs. Scale Out

    Why did we choose to scale out?
    •  Shows are self-contained.
    •  Performance of one depot won’t affect another.*
    •  Easy to browse other depots.
    •  Easier administration and downtime scheduling.
    •  Fits our workflow (e.g. no merging of art).
    •  Central code server: share where it matters.

  • Pixar Perforce Server Spec

    •  VMware ESX version 4.
    •  RHEL 5 (Linux 2.6).
    •  4 GB RAM.
    •  50 GB “local” data volume (on an EMC SAN).
    •  Versioned files on NetApp GFX.
    •  90 Perforce depots on a 6-node VMware cluster, with a special 2-node cluster for the “hot” tech show.
    •  For more details, see our 2009 conference paper.

  • Virtualization Benefits

    •  Quick to spin up new servers.
    •  Stable and fault-tolerant.
    •  Easy to administer remotely.
    •  Cost-effective.
    •  Reduces datacenter footprint, cooling, power, etc.

  • Reduce Dependencies

    •  Clone all servers from a VM template.
    •  RHEL vs. Fedora.
    •  Reduce triggers to a minimum.
    •  Default tables and p4d startup options.
    •  Versioned files stored on NFS.
    •  VM runs on a cluster.
    •  Can build a new VM quickly if one ever dies.
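    For context, keeping p4d startup options plain means the server can be launched with a single command line of this shape; the root, port, journal, and log paths below are hypothetical, not Pixar's:

        $ p4d -r /p4/root -p 1666 -J /p4/journal/journal -L /p4/logs/log -d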

  • Virtualization Gotchas

    •  Hit a severe performance problem when one datastore grew to over 90% full.
    •  Requires some jockeying to ensure load stays balanced across multiple nodes (manual vs. automatic).
    •  Physical-host performance issues can cause cross-depot issues.

  • Speed of Virtual Perforce Servers

    •  Used the Perforce Benchmark Results Database tools.
    •  Virtualized servers reached 95% of native (non-virtualized) performance on the branchsubmit benchmark.
    •  85% of native performance on the browse benchmark (not as critical to us).
    •  VMware flexibility outweighed the minor performance hit.

  • Quick Server Setup

    •  Critical to be able to spin up new servers quickly.
    •  Went from 2-3 days for setup to 1 hour.

    The 1-hour setup:
    •  Clone a p4 template VM. (30 minutes)
    •  Prep the VM. (15 minutes)
    •  Run the “squire” script to build out the p4 instance. (8 seconds)
    •  Validate and test. (15 minutes)

  • Squire

    A script which automates p4 server setup. Sets up:
    •  p4 binaries
    •  metadata tables (protect/triggers/typemap/counters)
    •  cron jobs (checkpoint/journal/verify)
    •  monitoring
    •  permissions (filesystem and p4)
    •  init.d startup script
    •  linkatron namespace
    •  pipeline integration (for tech depots)
    •  config files
    (A rough sketch of the table-loading step follows below.)
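    As an illustration of the metadata-table step only, and not the real squire script, the sketch below feeds canned spec files into a new server and stamps a counter. The server address, template directory, and counter name are invented for the example.

        #!/usr/bin/env python
        # Sketch of squire-style instance setup (not the real script): load canned
        # metadata tables into a freshly built Perforce server and set a counter.
        import subprocess

        P4PORT = "newdepot:1666"          # hypothetical server address
        TEMPLATE_DIR = "/p4/templates"    # hypothetical location of canned spec files

        def load_table(table):
            """Feed a saved spec file to 'p4 <table> -i' (protect, triggers, typemap)."""
            with open("%s/%s.spec" % (TEMPLATE_DIR, table)) as spec:
                subprocess.check_call(["p4", "-p", P4PORT, table, "-i"], stdin=spec)

        if __name__ == "__main__":
            for table in ("protect", "triggers", "typemap"):
                load_table(table)
            # Record which template version built this instance (hypothetical counter name).
            subprocess.check_call(["p4", "-p", P4PORT, "counter", "squire-template", "1"])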

  • Superp4

    A script for managing p4 metadata tables across multiple servers.
    •  Preferable to hand-editing 90 tables.
    •  Database-driven (i.e. works from a list of depots).
    •  Scopable by depot domain (art, tech, etc.).
    •  Rollback functionality.

  • Superp4 example

    $ cd /usr/anim/ts3
    $ p4 triggers -o
    Triggers:
        noHost form-out client "removeHost.py %formfile%"

    $ cat fix-noHost.py
    def modify(data, depot):
        return [line.replace("noHost form-out",
                             "noHost form-in")
                for line in data]

    $ superp4 -table triggers -script fix-noHost.py -diff

    •  Copies triggers to a restore dir.
    •  Runs fix-noHost.py to produce new triggers for each depot.
    •  Shows me a diff of the above.
    •  Asks for confirmation; finally, modifies triggers on each depot.
    •  Tells me where the restore dir is.

  • Superp4 options

    $ superp4 -help
      -n                        Don’t actually modify data.
      -diff                     Show diffs for each depot using xdiff.
      -category category        Pick depots by category (art, tech, etc.).
      -units unit1 unit2 ...    Specify an explicit depot list (regexp allowed).
      -script script            Python file to be execfile()'d; must define a function named modify().
      -table tableType          Table to operate on (triggers, typemap, …).
      -configFile configFile    Config file to modify (e.g. admin/values-config).
      -outDir outDir            Directory to store working files, and for restoral.
      -restoreDir restoreDir    Directory previously produced by running superp4, for when you screw up.
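    A minimal sketch of the apply loop a tool like superp4 might run, not the actual implementation: pull the table from each depot's server, hand it to the user's modify() function, and write it back if it changed. The depot-to-port map and helper names are assumptions, and the real script also handles diffing, confirmation, and the restore directory.

        #!/usr/bin/env python
        # Sketch of a superp4-style apply loop (not the real tool): fetch a table
        # from each depot's server, run the user-supplied modify() over its lines,
        # and write the result back with 'p4 <table> -i'.
        import subprocess

        DEPOTS = {"ts3": "ts3:1666", "art": "art:1666"}   # hypothetical depot -> P4PORT map

        def run(port, args, input_text=None):
            """Run a p4 command against one server and return its stdout."""
            proc = subprocess.Popen(["p4", "-p", port] + args,
                                    stdin=subprocess.PIPE, stdout=subprocess.PIPE)
            out, _ = proc.communicate(input_text)
            return out

        def apply_script(script_path, table="triggers"):
            # execfile() the user script to pick up its modify(data, depot) function,
            # mirroring the -script option described above.
            namespace = {}
            execfile(script_path, namespace)
            modify = namespace["modify"]

            for depot, port in DEPOTS.items():
                old = run(port, [table, "-o"]).splitlines(True)
                new = modify(old, depot)
                if new != old:
                    run(port, [table, "-i"], "".join(new))   # write the edited table back

        if __name__ == "__main__":
            apply_script("fix-noHost.py")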

  • Challenges With Scaling

  • Gotchas

    •  //spec/client filled up.
    •  User-written triggers were sub-optimal.
    •  “Shadow files” consumed server space.
    •  Monitoring is difficult – cue templaRX and mayday.
    •  Cap renderfarm ops (one possible approach sketched below).
    •  Beware of automated tests and clueless GUIs.
    •  verify can be dangerous to your health (cross-depot).
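    The slides don't say how the renderfarm cap was enforced; one common Perforce approach is a group whose Max* fields bound query size and lock time for farm accounts. The group name, user, and limits below are made up for illustration (MaxLockTime is in milliseconds, Timeout in seconds):

        $ p4 group -o renderfarm
        Group:        renderfarm
        MaxResults:   100000
        MaxScanRows:  1000000
        MaxLockTime:  30000
        Timeout:      43200
        Users:
            renderuser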

  • Summary

    •  Perforce scales well for large amounts of binary data.
    •  Virtualization = fast and cost-effective server setup.
    •  Use the +S filetype and de-dupe to reduce storage usage.

  • Q & A

    Questions?