Scaling Servers and Storage for Film Assets
Mike Sundy, Digital Asset System Administrator
David Baraff, Senior Animation Research Scientist
Pixar Animation Studios
-
• Environment Overview
• Scaling Storage
• Scaling Servers
• Challenges with Scaling
-
Environment Overview
-
Environment
As of March 2011:
• ~1000 Perforce users (80% of company)
• 70 GB db.have
• 12 million p4 ops per day (on busiest server)
• 30+ VMware server instances
• 40 million submitted changelists (across all servers)
• On 2009.1, but planning to upgrade to 2010.1 soon
-
Growth & Types of Data
Pixar grew from one code server in 2007 to 90+ Perforce servers storing all types of assets:
• art – reference and concept art; inspirational art for a film.
• tech – show-specific data, e.g. models, textures, pipeline.
• studio – company-wide reference libraries, e.g. animation reference, config files, a Flickr-like company photo site.
• tools – code for our central tools team; software projects.
• dept – department-specific files, e.g. Creative Resources has "blessed" marketing images.
• exotics – patent data, casting audio, data for live-action shorts, story gags, theme park concepts, the intern art show.
-
Scaling Storage
-
Storage Stats
• 115 million files in Perforce.
• 20+ TB of versioned files.
-
Techniques to Manage Storage
• Use the +S filetype for the majority of generated data. Saved 40% of storage for Toy Story 3 (1.2 TB). (See the typemap sketch below.)
• Work with teams to migrate versionless data out of Perforce. Saved 2 TB by moving binary scene data out.
• De-dupe files: saved 1 million files and 1 TB.
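The +S modifier tells the server to keep only a limited number of archived revisions (+S keeps just the head revision; +S<n> keeps the n most recent), which is why it suits regenerable data. A hypothetical typemap assigning it might look like this; the depot paths are illustrative, not Pixar's:

$ p4 typemap -o
TypeMap:
        binary+S //depot/shows/.../renders/....exr
        binary+S16 //depot/shows/.../caches/....geo

When a submit pushes a file past its limit, the oldest stored revision's contents are purged automatically, while the revision history itself remains visible.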
-
De-dupe Trigger Cases
• p4 submit file1 file2 ... fileN
  p4 submit file1 file2 ... fileN   # only file2 actually modified

• p4 submit file                    # contents: revision n
  # five seconds later: "crap!"
  p4 submit file                    # contents: revision n-1

• p4 delete file
  p4 submit file                    # user deletes file (revision n)
  # five seconds later: "crap!"
  p4 add file
  p4 submit file                    # contents: revision n
-
De-dupe Trigger Mechanics
[Diagram: revision archive files for the three cases above, with contents shown as checksummable byte strings. repfile.14, repfile.15, repfile.24, repfile.26, repfile.34, and repfile.38 all hold identical contents ("AABBCC..."), while repfile.25 holds different contents ("XXYYZZ..."); the revisions file#n, file#n+1, file#n+2 from the three cases map onto these archives, showing where duplicates arise.]
-
De-dupe Trigger Mechanics
• +F for all files; detect duplicates via checksums. (+F stores each revision as its own full archive file rather than as RCS deltas, so duplicate revisions can be collapsed into hardlinks; see the sketch after the diagram.)
• Safely discard the duplicate:
  $ ln repfile.24 repfile.26.tmp
  $ rename repfile.26.tmp repfile.26
[Diagram: repfile.26 ("AABBCC...") becomes a hardlink to repfile.24 via the temporary name repfile.26.tmp, swapped in with an atomic rename; repfile.25 ("XXYYZZ...") is untouched.]
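A minimal Python sketch of this hardlink-and-rename step, assuming a flat directory of +F archive files; the dedupe() helper and the SHA-1 checksum choice are illustrative, not Pixar's actual trigger:

import hashlib
import os

def dedupe(archive_dir):
    """Collapse byte-identical archive files into hardlinks."""
    seen = {}  # checksum -> path of the first file seen with that content
    for name in sorted(os.listdir(archive_dir)):
        path = os.path.join(archive_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            digest = hashlib.sha1(f.read()).hexdigest()
        original = seen.setdefault(digest, path)
        if original != path and not os.path.samefile(original, path):
            tmp = path + ".tmp"
            os.link(original, tmp)  # like: ln repfile.24 repfile.26.tmp
            os.rename(tmp, path)    # like: rename repfile.26.tmp repfile.26

The rename is atomic on POSIX filesystems, so a reader never sees a missing or partial archive file.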
-
Scaling Servers
-
Scale Up vs. Scale Out
Why did we choose to scale out?
• Shows are self-contained.
• Performance of one depot won't affect another.*
• Easy to browse other depots.
• Easier administration/downtime scheduling.
• Fits with our workflow (e.g. no merging of art).
• Central code server – share where it matters.
-
Pixar Perforce Server Spec
• VMware ESX version 4.
• RHEL 5 (Linux 2.6).
• 4 GB RAM.
• 50 GB "local" data volume (on an EMC SAN).
• Versioned files on NetApp GFX.
• 90 Perforce depots on a 6-node VMware cluster, with a special 2-node cluster for a "hot" tech show.
• For more details, see our 2009 conference paper.
-
Virtualization Benefits
• Quick to spin up new servers.
• Stable and fault tolerant.
• Easy to remotely administer.
• Cost-effective.
• Reduces datacenter footprint, cooling, power, etc.
-
Reduce Dependencies
• Clone all servers from a VM template.
• RHEL vs. Fedora.
• Reduce triggers to a minimum.
• Default tables, p4d startup options.
• Versioned files stored on NFS.
• VM on a cluster.
• Can build a new VM quickly if one ever dies.
-
Virtualization Gotchas
• Had a severe performance problem when one datastore grew to over 90% full.
• Requires some jockeying to ensure load stays balanced across multiple nodes (manual vs. auto).
• Physical host performance issues can cause cross-depot issues.
-
Speed of Virtual Perforce Servers
• Used the Perforce Benchmark Results Database tools.
• Virtualized servers reached 95% of physical-server performance on the branchsubmit benchmark.
• 85% on the browse benchmark (not as critical to us).
• VMware flexibility outweighed the minor performance hit.
-
Quick Server Setup
• Critical to be able to quickly spin up new servers.
• Went from 2-3 days for setup to 1 hour.

1-hour Setup:
• Clone a p4 template VM. (30 minutes)
• Prep the VM. (15 minutes)
• Run the "squire" script to build out the p4 instance. (8 seconds)
• Validate and test. (15 minutes)
-
Squire
Script which automates p4 server setup (a sketch of one step follows the list). Sets up:
• p4 binaries
• metadata tables (protect/triggers/typemap/counters)
• cron jobs (checkpoint/journal/verify)
• monitoring
• permissions (filesystem and p4)
• init.d startup script
• linkatron namespace
• pipeline integration (for tech depots)
• config files
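Squire itself is internal to Pixar, but one of its steps can be sketched: loading a canned triggers table into a fresh instance via "p4 triggers -i". The install_triggers() helper, port, and spec contents below are assumptions for illustration:

import subprocess

# Illustrative triggers spec for a new depot (not Pixar's real table).
TRIGGERS_SPEC = """Triggers:
\tnoHost form-in client "removeHost.py %formfile%"
"""

def install_triggers(port):
    """Feed a triggers table to a p4 instance via 'p4 triggers -i'."""
    subprocess.run(["p4", "-p", port, "triggers", "-i"],
                   input=TRIGGERS_SPEC, text=True, check=True)

install_triggers("newshow-perforce:1666")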
-
Superp4
Script for managing p4 metadata tables across multiple servers.
• Preferable to hand-editing 90 tables.
• Database-driven (i.e. a list of depots).
• Scopable by depot domain (art, tech, etc.)
• Rollback functionality.
-
Superp4 example
$ cd /usr/anim/ts3
$ p4 triggers -o
Triggers:
        noHost form-out client "removeHost.py %formfile%"

$ cat fix-noHost.py
def modify(data, depot):
    return [line.replace("noHost form-out",
                         "noHost form-in")
            for line in data]

$ superp4 -table triggers -script fix-noHost.py -diff

• Copies triggers to a restore dir.
• Runs fix-noHost.py to produce new triggers, for each depot.
• Shows me a diff of the above.
• Asks confirmation; finally, modifies triggers on each depot.
• Tells me where the restore dir is!
-
Superp4 options
$ superp4 -help
  -n                          Don't actually modify data.
  -diff                       Show diffs for each depot using xdiff.
  -category category          Pick depots by category (art, tech, etc.)
  -units unit1 unit2 ...      Specify an explicit depot list (regexp allowed).
  -script script              Python file to be execfile()'d; must define a function named modify().
  -table tableType            Table to operate on (triggers, typemap, ...)
  -configFile configFile      Config file to modify (e.g. admin/values-config).
  -outDir outDir              Directory to store working files, and for restoral.
  -restoreDir restoreDir      Directory previously produced by running superp4, for when you screw up.
-
Challenges With Scaling
-
Gotchas
• //spec/client filled up.
• user-written triggers sub-optimal.
• "shadow files" consumed server space.
• monitoring difficult – cue templaRX and mayday.
• cap renderfarm ops.
• beware of automated tests and clueless GUIs.
• verify can be dangerous to your health (cross-depot).
-
Summary
• Perforce scales well for large amounts of binary data.
• Virtualization = fast and cost-effective server setup.
• Use the +S filetype and de-dupe to reduce storage usage.
-
Q & A
Questions?