lec16 dist para file systems - ranger.uta.edu
CSE 3320 Operating Systems
Distributed and Parallel File Systems
Jia Rao
Department of Computer Science and Engineering
http://ranger.uta.edu/~jrao
Recap of Previous Classes
• File systems provide an abstraction of permanently stored data
o Namespace: files and directories
} Translate paths to corresponding locations on disks
o Space management and optimizations
} Free blocks
} Caching and prefetching
o Reliability and consistency
Distributed and Parallel File Systems
• Provide similar abstractions of data on multiple machines
o Namespace: path name → machine ID : disk block address
o Management: placement of files on machines
} Replication
} Striping
• Designed for performance and availability
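The namespace mapping above can be sketched as a small lookup table. This is a minimal, hypothetical model: the `Namespace` and `Location` names, the round-robin placement and the toy per-machine block counter are all assumptions and do not come from any real system. Each file is split into chunks, the chunks are striped across machines, and each chunk may be replicated onto the machines that follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    machine_id: str   # which machine holds this copy
    block_addr: int   # toy block address on that machine's disk


class Namespace:
    """Hypothetical name table: path -> chunks -> replica locations."""

    def __init__(self, machines):
        self.machines = list(machines)
        # Naive allocator: next free block number on each machine.
        self.next_block = {m: 0 for m in self.machines}
        self.table = {}

    def create(self, path, num_chunks, replicas=1):
        if not 1 <= replicas <= len(self.machines):
            raise ValueError("need 1 <= replicas <= number of machines")
        chunks = []
        for c in range(num_chunks):
            copies = []
            for r in range(replicas):
                # Striping: chunk c starts on machine (c mod N).
                # Replication: extra copies go to the following machines.
                m = self.machines[(c + r) % len(self.machines)]
                copies.append(Location(m, self.next_block[m]))
                self.next_block[m] += 1
            chunks.append(copies)
        self.table[path] = chunks

    def lookup(self, path, chunk):
        # path name -> [machine ID : disk block address, ...]
        return self.table[path][chunk]
```

For example, after `ns = Namespace(["m0", "m1", "m2"])` and `ns.create("/data/log", 3, replicas=2)`, chunk 1 of `/data/log` lives at block 1 on `m1` and block 0 on `m2`.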
Distributed vs. Parallel File Systems
• Design objectives
o Fault tolerance vs. concurrent performance
• Data distribution
o Entire file on a single node vs. striping over multiple nodes
• Symmetry
o Storage co-located with apps vs. storage separated from apps
• Fault tolerance
o Designed for fault tolerance vs. relying on enterprise storage
• Workload
o Loosely coupled, distributed apps vs. coordinated HPC apps
The boundary is blurring.
Examples
• Distributed File Systems
o NFS, GFS (Google File System), HDFS (Hadoop Distributed File System), GlusterFS
• Parallel File Systems
o PVFS (Parallel Virtual File System), Lustre, OCFS2, GPFS
Design Issues (1)
• Name server
o Maps file names to objects (files, directories, blocks)
o Implementation options
} Single name server
¨ Simple implementation, but raises reliability and performance issues
} Several name servers (on different hosts)
¨ Each server is responsible for a domain
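With several name servers, a client must first work out which server owns a given path. One simple way to model "each server is responsible for a domain" is longest-prefix matching on the path name. The `resolve` function and the example domain table are hypothetical. A single name server is just the special case of one entry for `/`.

```python
def resolve(path, domains):
    """Return the name server responsible for `path`.

    `domains` maps a path prefix (a server's domain) to a server ID.
    The longest matching prefix wins, so "/home/alice" overrides "/home".
    """
    best = None
    for prefix, server in domains.items():
        base = prefix.rstrip("/")  # "/" becomes "", matching every absolute path
        if path == prefix or path.startswith(base + "/"):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, server)
    if best is None:
        raise KeyError(f"no name server for {path}")
    return best[1]


# Hypothetical partition of the namespace over four name servers.
domains = {"/": "ns0", "/home": "ns1", "/home/alice": "ns2", "/proj": "ns3"}
```

With this table, `/home/alice/notes.txt` resolves to `ns2`. `/homework/x` falls back to `ns0` because only whole path components are matched.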
Design Issues (2)
• Caching
o Caching at the client: main memory vs. disk
o Cache consistency
} Server initiated
¨ Server informs cache managers when data in client caches is stale
¨ Client cache managers invalidate stale data or retrieve new data
¨ Disadvantage: extensive communication
} Client initiated
¨ Cache managers at the clients validate data with the server before returning it to clients
¨ Disadvantage: extensive communication
} Prohibit file caching under concurrent writing
¨ Several clients open a file, at least one of them for writing
¨ Server informs all clients to purge that cached file
} Lock files under concurrent-write sharing (at least one client opens the file for writing)
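Here is a minimal sketch of server-initiated consistency. It assumes a server that records which clients cache each file and, on every write, sends an invalidation to each other caching client. The `Server` and `Client` classes and the write-through forwarding in `Client.write` are simplifying assumptions. The `invalidations` counter makes the communication cost visible.

```python
class Server:
    """Server-initiated consistency: track cachers, invalidate on write."""

    def __init__(self):
        self.files = {}          # file name -> contents
        self.cachers = {}        # file name -> set of clients caching it
        self.invalidations = 0   # server -> client invalidation messages

    def read(self, client, name):
        self.cachers.setdefault(name, set()).add(client)
        return self.files[name]

    def write(self, client, name, data):
        self.files[name] = data
        # Tell every other client holding a copy that it is now stale.
        for other in self.cachers.get(name, set()) - {client}:
            other.invalidate(name)
            self.invalidations += 1
        self.cachers[name] = {client}


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, name):
        if name not in self.cache:   # miss: fetch from the server
            self.cache[name] = self.server.read(self, name)
        return self.cache[name]      # hit: no server contact

    def write(self, name, data):
        self.cache[name] = data
        self.server.write(self, name, data)  # assumed write-through

    def invalidate(self, name):
        self.cache.pop(name, None)   # drop the stale copy
```

Every write to a widely shared file costs one message per caching client, which is the communication overhead noted above. A client-initiated scheme would instead pay a validation round trip on each read.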
Design Issues (3)
• Update (write) policy
o Once a client writes into a file (and the local cache), when should the modified cache be sent to the server?
} Write-through: all writes at the clients are immediately transferred to the servers
¨ Advantage: reliability
¨ Disadvantage: performance; it does not take advantage of the cache
} Delayed writing: delay the transfer to the servers
¨ Advantage: many writes (including intermediate results) take place before a transfer
¨ Disadvantage: reliability; some data may be lost if the client fails before the transfer
} Delayed writing until the file is closed at the client
¨ For short open intervals, same as delayed writing
¨ For long intervals, reliability problems
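The three update policies can be compared on two counts: how often the client cache must be transferred to the server, and how many writes would be lost if the client crashed at the end of a run. In this hypothetical `simulate` function, delayed writing is modeled as a flush every `flush_every` writes, standing in for a periodic timer. Closing the file does not force a flush under that policy. Both choices are assumptions.

```python
def simulate(events, policy, flush_every=4):
    """Return (transfers_to_server, writes_still_only_in_client_cache).

    events: sequence of "write" / "close".
    policy: "write-through", "delayed" (hypothetical flush every
            `flush_every` writes), or "on-close".
    """
    if policy not in ("write-through", "delayed", "on-close"):
        raise ValueError(f"unknown policy: {policy}")
    transfers = 0
    dirty = 0  # writes not yet sent to the server (lost on a crash)
    for ev in events:
        if ev == "write":
            dirty += 1
            if policy == "write-through" or (
                policy == "delayed" and dirty >= flush_every
            ):
                transfers += 1   # one transfer carries all dirty data
                dirty = 0
        elif ev == "close":
            if policy == "on-close" and dirty:
                transfers += 1
                dirty = 0
        else:
            raise ValueError(f"unknown event: {ev}")
    return transfers, dirty
```

For ten writes followed by a close, write-through makes 10 transfers and leaves nothing at risk. Write-on-close makes a single transfer. If the file is still open after those ten writes, write-on-close has made no transfers and all ten writes are at risk. That is the reliability problem listed for long open intervals.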
Design Issues (4)
• Availability
o What is the level of availability of files in a distributed file system?
o Use replication to increase availability, i.e., many copies (replicas) of files are maintained at different sites/servers
o Replication issues:
} How to keep replicas consistent
} How to detect inconsistency among replicas
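One common way to detect inconsistent replicas is to keep a version number with each copy. A replica with an older version is stale. Two replicas with the same version but different contents indicate conflicting updates. The sketch below assumes that scheme; `check_replicas` and `repair` are hypothetical helpers.

```python
def check_replicas(replicas):
    """replicas: dict server -> (version, data).

    Returns (latest_version, stale_servers, conflict).
    """
    latest = max(version for version, _ in replicas.values())
    stale = sorted(s for s, (version, _) in replicas.items() if version < latest)
    # Same (latest) version but different contents => divergent updates.
    newest_contents = {data for version, data in replicas.values() if version == latest}
    return latest, stale, len(newest_contents) > 1


def repair(replicas):
    """Copy the newest version onto stale replicas; refuse on conflict."""
    latest, stale, conflict = check_replicas(replicas)
    if conflict:
        raise RuntimeError("conflicting replicas need manual resolution")
    good = next(data for version, data in replicas.values() if version == latest)
    for s in stale:
        replicas[s] = (latest, good)
    return stale
```

Version numbers only detect inconsistency. Keeping replicas consistent in the first place still requires a protocol that orders the updates.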
Design Issues (5)
• Scalability
o How to deal with a growing system?
o Issues
} Nodes joining and leaving (failing)
} Cache consistency
} Name server
o Solutions
} Replication
} Design cache consistency protocols for scalability
} Multiple name (metadata) servers
} Take advantage of multi-threading and multi-core processors
Example - GlusterFS (DFS)
Figure: Client-1, Client-2, ..., Client-N connect over an IP network to the Gluster Global Namespace (Gluster Native), which is backed by a Gluster Virtual Storage Pool built on donated partitions on each machine.
Example - GlusterFS (2)
• Three ways to place files
o Distribute: place entire files on different servers
} Pros: good scalability, efficient disk space usage
} Cons: poor reliability
o Replicate: place identical copies of files on different servers
} Pros: reliability
} Cons: wasted disk space, moderate scalability
o Stripe: place only part of a file on one server
} Pros: good performance for concurrent and random access
} Cons: poor scalability and reliability
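The three placement modes can be contrasted with one toy placement function per mode. The hashing here (CRC32 of the file name modulo the number of servers) is a deterministic stand-in chosen for illustration; it is not GlusterFS's actual algorithm. `distribute`, `replicate` and `stripe` are hypothetical helpers.

```python
import zlib


def _hash(name, n):
    # Stand-in for GlusterFS's own hashing; CRC32 is deterministic across runs.
    return zlib.crc32(name.encode()) % n


def distribute(name, servers):
    """Whole file on one server: disk used once, no redundancy."""
    return [servers[_hash(name, len(servers))]]


def replicate(name, servers, copies=2):
    """Identical copies on `copies` different servers."""
    if not 1 <= copies <= len(servers):
        raise ValueError("need 1 <= copies <= number of servers")
    start = _hash(name, len(servers))
    return [servers[(start + i) % len(servers)] for i in range(copies)]


def stripe(size, servers, stripe_size):
    """Split a file of `size` bytes into fixed-size pieces, round-robin.

    Returns (server, file_offset, length) for each piece.
    """
    pieces = []
    for i, off in enumerate(range(0, size, stripe_size)):
        pieces.append((servers[i % len(servers)], off, min(stripe_size, size - off)))
    return pieces
```

With `copies=2`, `replicate` stores every byte twice, which is the wasted disk space noted above. A striped file is spread over every server, so losing any one of them makes the file unreadable.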
Example - PVFS (PFS)
Significant improvement in throughput. What could be the issues?
1. Server coordination affects efficiency
2. Client QoS?
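A parallel file system such as PVFS stripes each file across several I/O servers, so a single client request may have to be split up and sent to multiple servers. The sketch below assumes simple round-robin striping with a fixed strip size. `split_request` and `servers_touched` are hypothetical helpers, and the layout is illustrative rather than PVFS's exact format.

```python
def split_request(offset, length, num_servers, strip_size):
    """Split a byte range of a striped file into per-server pieces.

    Returns a list of (server_index, local_offset, length), where
    local_offset is the position within that server's share of the file.
    """
    pieces = []
    end = offset + length
    while offset < end:
        strip = offset // strip_size    # global strip number
        within = offset % strip_size    # position inside that strip
        server = strip % num_servers    # round-robin owner of the strip
        # Strips this server already holds before this one, times strip size.
        local = (strip // num_servers) * strip_size + within
        n = min(strip_size - within, end - offset)
        pieces.append((server, local, n))
        offset += n
    return pieces


def servers_touched(offset, length, num_servers, strip_size):
    """How many servers must take part in serving one request."""
    return len({s for s, _, _ in split_request(offset, length, num_servers, strip_size)})
```

A large request fans out to every server, which is where the throughput gain comes from. Each such request also needs all of those servers to respond, which is one source of the coordination cost listed above.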
DFS and PFS in the Cloud (1)
• Both approaches provide cheap, reliable, and high-performance cloud storage solutions
Use case 1
DFS and PFS in the Cloud (2)
Use case 2
Some Real Results…
• Hosted an 8-VM Hadoop cluster on 8 DELL machines
• Performed micro and real I/O-intensive workloads
• Two storage solutions: PVFS and local ext3
Gridmix websort, 20 GB data (runtime):
              PVFS      Local ext3
  websort     2391 s    4693 s

PVFS:
              16k      32k      64k      256k     1M
  Sequential  58.89    60.15    60.47    104.80   130.47
  Random      12.34    20.84    33.51    50.43    108.71

Local ext3:
              16k      32k      64k      256k     1M
  Sequential  120.11   120.56   120.39   120.39   120.57
  Random      4.01     7.80     14.71    43.20    92.19

Network bandwidth bottleneck
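The measurements are easier to compare as ratios. The snippet below recomputes them from the tables above. It treats the tabulated values as throughput figures in whatever unit the slide used, since no unit is given. A ratio above 1 means PVFS outperformed local ext3.

```python
# Values copied from the tables above (unit as on the slide).
sizes = ["16k", "32k", "64k", "256k", "1M"]
pvfs = {
    "sequential": [58.89, 60.15, 60.47, 104.80, 130.47],
    "random": [12.34, 20.84, 33.51, 50.43, 108.71],
}
ext3 = {
    "sequential": [120.11, 120.56, 120.39, 120.39, 120.57],
    "random": [4.01, 7.80, 14.71, 43.20, 92.19],
}

# Per-size ratio PVFS / local ext3, rounded to two decimals.
ratio = {
    kind: [round(p / e, 2) for p, e in zip(pvfs[kind], ext3[kind])]
    for kind in pvfs
}

# Gridmix websort on 20 GB: local ext3 runtime divided by PVFS runtime.
websort_speedup = round(4693 / 2391, 2)

for kind, values in ratio.items():
    print(kind, dict(zip(sizes, values)))
print("websort speedup:", websort_speedup)
```

On these numbers, PVFS reaches only about half of local ext3's sequential throughput from 16k to 64k and overtakes it only at 1M. This is consistent with the network bandwidth bottleneck note. For random access, PVFS is about 3 times faster at 16k. The websort job finishes about 1.96 times as fast on PVFS.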