
AFS at Intel

Travis Broughton

Agenda

Intel’s Engineering Environment
Things AFS Does Well
How Intel Uses AFS
How Not to Use AFS
Management Tools

Intel’s Engineering Environment

Learned about AFS in 1991
First deployed AFS in Intel’s Israel design center in 1992
Grew to a peak of 30 cells in 2001
Briefly considered DCE/DFS migration in 1998 (the first time AFS was scheduled to go away…)

Intel’s Engineering Environment

~95% NFS, ~5% AFS
~20 AFS cells managed by ~10 regional organizations
AFS used for CAD and /usr/local applications, global data sharing for projects, and secure access to data
NFS used for everything else; gives higher performance in most cases
Wide range of client platforms, OSs, etc.

Cell Topology Considerations

Number of sites/campuses/buildings to support
Distance (latency) between sites
Max # of replicas needed for a volume
Trust
… As a result, Intel has many cells

Things AFS Does Well

Security

• Uses Kerberos, doesn’t have to trust the client
• Uses ACLs, better granularity

Performance for frequently-used files
• e.g. /usr/local/bin/perl

High availability for RO data
Storage virtualization
Global, delegated namespace
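The ACL granularity point above can be illustrated with a short sketch. fs setacl and fs listacl are the real AFS commands, and "read"/"none" are standard shorthands for the "rl" rights and for removing an entry; the helper names, directory path, and group name below are this sketch's own.

```shell
# AFS ACLs are per-directory and name individual users or groups, with
# seven distinct rights (rlidwka) -- far finer-grained than Unix mode bits.
# These helpers are a hypothetical wrapper convention, not an AFS feature:
grant_read() { fs setacl -dir "$1" -acl "$2" read; }   # "read" = rl
revoke()     { fs setacl -dir "$1" -acl "$2" none; }   # drop the entry
show_acl()   { fs listacl -path "$1"; }

# e.g. grant_read /afs/example.com/proj/docs eng:readers
```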

AFS Usage at Intel: Global Data Sharing

Optimal use of compute resources
• Batch jobs launched from site x may land at site y, depending on demand
Optimal use of headcount resources
• A project based at site x may “borrow” idle headcount from site y without relocation
Optimal license sharing
• A project based at site x may borrow idle software licenses (assuming the contract allows “WAN” licensing)
Efficient IP reuse
• A project based at site x may require access to the most recent version of another project being developed at site y
Storage virtualization and load balancing
• Many servers – can migrate data to balance load and do maintenance during working hours

AFS Usage at Intel: Other Applications

x-site tool consistency
• Before rsync was widely deployed and SSH-tunneled, used the AFS namespace to keep tools in sync
@sys simplifies multiplatform support
• Environment variables, automounter macros are reasonable workarounds
“@cell” link at top level of AFS simplifies the namespace
• In each cell, @cell points to the local cell
• Mirrored data in multiple cells can be accessed through the same path (fs wscell expansion would also work)
/usr/local, CAD tool storage
• Cache manager outperforms NFS
• Replication provides many levels of fault tolerance
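The @sys mechanism above can be sketched as follows. The cell name, sysname value, and tree layout are hypothetical; the function below only simulates the rewrite the AFS cache manager performs on "@sys" during path lookup.

```shell
# Hypothetical layout, one subtree per platform:
#   /afs/example.com/tools/perl/amd64_linux26/bin/perl
#   /afs/example.com/tools/perl/sun4x_58/bin/perl
# One symlink through @sys then serves every client platform:
#   ln -s /afs/example.com/tools/perl/@sys/bin/perl /usr/local/bin/perl
# Simulation of the cache manager's @sys rewrite at lookup time:
expand_at_sys() {   # $1 = path, $2 = client sysname (cf. "fs sysname")
    printf '%s\n' "$1" | sed "s/@sys/$2/g"
}
expand_at_sys '/afs/example.com/tools/perl/@sys/bin/perl' amd64_linux26
# -> /afs/example.com/tools/perl/amd64_linux26/bin/perl
```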

Things AFS Doesn’t Do Well

Performance on seldom-used files
High availability for RW data
Scalability with SMP systems
Integration with OS
File/volume size limitations

When NOT to Use AFS

CVS repositories
• Remote $CVSROOT using SSH seems to work better
rsync
Any other tool that would potentially thrash the cache…

Other Usage Notes

Client cache is better than nothing, but a shared “edge” cache may be better
• Mirroring w/ rsync accomplishes this for RO data
• Client disk is very cheap, shared (fileserver) disk is fairly cheap, WAN bandwidth is still costly (and latency can rarely be reduced)

OpenAFS at Intel

Initially used contrib’d AFS 3.3 port for Linux
Adopted the IBM/Transarc port when it became available
Migrated to OpenAFS when kernel churn became too frequent
Openafs-devel very responsive to bug submissions
• Number of bug submissions (from Intel) tapering off – the client has become much more stable

Management Tools

Data age indicators
• Per-volume view only
• 11pm (local) nightly cron job to collect volume access statistics
• idle++ if accesses==0, else idle=0
Mountpoint database
• /usr/afs/bin/salvager -showmounts on all fileservers
• Find root.afs volume, traverse mountpoints to build tree
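The data-age rule above (idle++ if accesses==0, else idle=0) might be scripted roughly like this. The two file formats — a "volume idle-days" state file and a "volume accesses-today" stats file — are this sketch's assumption, with the daily access counts presumed scraped from vos examine output.

```shell
# Nightly job: carry an idle-day counter per volume. A volume with zero
# accesses today gets idle+1; any access resets the counter to zero.
# State file: "<volume> <idle-days>"   Stats file: "<volume> <accesses>"
update_idle() {   # $1 = state file, $2 = today's stats file
    awk 'NR==FNR { idle[$1] = $2; next }     # first file: load counters
         {
             if ($2 == 0) idle[$1] += 1      # untouched today: age it
             else         idle[$1] = 0       # accessed: reset
             print $1, idle[$1]
         }' "$1" "$2"
}
# Usage: update_idle idle.state today.stats > idle.state.new
```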

Mountpoint DB audit
• Find any volume names not listed in the MpDB
• Find unused read-only replicas (mounted under RW)
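Building the mountpoint tree might look roughly like the sketch below. fs lsmount is the real client-side command, whose success message has the form "'<dir>' is a mount point for volume '#<vol>'"; the helper and its output handling are this sketch's own, and a production MpDB builder would start at root.afs, recurse, and cross-check against server-side salvager -showmounts output.

```shell
# Scan one directory's children and report which entries are AFS
# mountpoints, using fs lsmount's success message as the signal.
walk_mounts() {   # $1 = directory to scan
    for d in "$1"/*/; do
        fs lsmount -dir "${d%/}" 2>/dev/null |
            grep " is a mount point for "
    done
}
```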

Samba integration
• smbklog

“Storage on Demand”
• Delegates volume creation (primarily for scratch space) to users, with automated reclaim
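The automated reclaim could tie into the idle-day counters from the data-age bullets above. The "volume idle-days" file format, the scratch.* naming convention, and the grace period are all this sketch's assumptions; vos remove is the real command that would do the actual reclaim.

```shell
# Print scratch volumes whose idle-day counter exceeds the grace period;
# each printed name would then be fed to "vos remove" (ideally after
# warning the owner first).
reclaim_scratch() {   # $1 = idle file ("<volume> <idle-days>"), $2 = max days
    awk -v max="$2" '$1 ~ /^scratch\./ && $2+0 > max+0 { print $1 }' "$1"
}
# Usage: reclaim_scratch idle.state 30 | while read -r v; do vos remove -id "$v"; done
```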

Management Tools

Recovery of PTS groups
• Cause – someone confuses “pts del” and “pts rem”
• Initial fix – create a new cell, restore the PTS DB, use “pts exa” to get a list of users
• Easier fix – wrap pts to log “pts del”, capture the state of the group before deleting
• Even better fix – do a nightly text dump of your PTS DB
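The “wrap pts” fix could look like the sketch below: log the destructive command and snapshot the group’s membership before forwarding it, so an accidentally deleted group can be reconstructed. pts delete and pts membership are real commands; the binary path, log location, and wrapper name are assumptions.

```shell
# Wrapper for destructive pts operations: before a "pts delete" reaches
# the real binary, append the command and the group's current membership
# to a log.
PTS=${PTS:-/usr/afsws/bin/pts}        # path to the real pts (assumed)
LOG=${LOG:-/var/log/pts-wrapper.log}

pts_wrapped() {
    case "$1" in
        del*)   # matches "delete"; record the state before it is lost
            {
                echo "$(date -u) pts $*"
                "$PTS" membership "$2" 2>&1
            } >> "$LOG"
            ;;
    esac
    "$PTS" "$@"
}
```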

Mass deletion of volumes
• Cause – someone does an “rm -rf” equivalent in the wrong place (most recent case was a botched rsync)
• Initial fix – lots of vos dump .backup/.readonly | vos restore
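The bulk restore could be scripted roughly as below. vos dump (with -time 0 for a full dump) and vos restore (with -overwrite full) are real commands; the server/partition arguments and the convention of feeding volume names on stdin are this sketch's own, and .readonly clones could be substituted where no .backup exists.

```shell
# Re-create each destroyed RW volume from its .backup snapshot by piping
# a full dump of the snapshot straight into a restore of the RW name.
restore_from_backup() {   # $1 = server, $2 = partition; volume names on stdin
    while read -r vol; do
        vos dump -id "$vol.backup" -time 0 |
            vos restore -server "$1" -partition "$2" \
                        -name "$vol" -overwrite full
    done
}
# Usage: restore_from_backup fs1 /vicepa < lost-volumes.txt
```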

Disks fill up, etc.
• Other fixes – watch the size of volumes, and alert if some threshold change is exceeded
• Throw the fileserver into debug mode, capture the IP address doing the damage, and lock it down
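The threshold alert might be implemented along these lines. The "volume size-KB" snapshot format is assumed (e.g. scraped from vos examine or vos listvol output), and the semantics — alert when a volume's size changes by more than N KB day-over-day — follow the bullet above.

```shell
# Compare two daily size snapshots and print volumes whose size changed
# by more than the threshold, as "<volume> <old-kb> <new-kb>".
volume_delta_alert() {   # $1 = yesterday, $2 = today, $3 = threshold (KB)
    awk -v t="$3" '
        NR==FNR { old[$1] = $2; next }        # first file: load yesterday
        ($1 in old) {
            d = $2 - old[$1]
            if (d > t+0 || -d > t+0) print $1, old[$1], $2
        }' "$1" "$2"
}
```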

Management Tools

Watch for ‘calls waiting for a thread’
Routing loops can trigger problems
True load-based meltdowns can be diagnosed:
• Send a signal to the fileserver to toggle debug mode
• Collect logs for some period of time (minutes)
• Analyze logs to locate the most frequently used vnodes
• Convert vnum to inum
• Use find to locate the busiest volume and the files/directories being accessed
• Sometimes requires moving the busy volume elsewhere to complete diagnosis
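The diagnosis steps might be driven by something like the sketch below. My understanding is that the OpenAFS fileserver raises its log level on SIGTSTP and resets it on SIGHUP, but verify that against your version; the FileLog line format assumed here ("fid <vol>.<vnode>.<uniq>") is also an assumption, so adjust the grep to whatever your logs actually contain.

```shell
# 1) kill -TSTP <fileserver-pid>   # raise the debug/log level (assumed)
# 2) sleep 300                     # let a few minutes of FileLog accumulate
# 3) kill -HUP  <fileserver-pid>   # drop back to normal logging (assumed)
# 4) rank the most frequently referenced FIDs in the captured log:
busiest_vnodes() {   # $1 = captured FileLog excerpt
    grep -o 'fid [0-9.]*' "$1" | sort | uniq -c | sort -rn | head
}
# The top entries point at the busiest volume and vnodes; mapping vnode
# numbers to inode numbers (then "find -inum") locates the files themselves.
```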

Management Tools

Keep fileserver machines identical if possible
• Easier maintenance
Keep a hot spare fileserver around and online
• Configure as a fileserver in the local cell to host busy volumes
• Configure as a DB server in its own cell for DB recovery
“Splitting” a volume is somewhat tedious
• Best to plan the directory/volume layout ahead of time, but it can be changed if necessary

Questions?