Pachyderm: Building a Big Data Beast on Kubernetes
TRANSCRIPT
Pachyderm’s Architecture
[Diagram labels: Kubernetes; User Analysis; Pachyderm Pipeline System; Services; Jobs; Pachyderm File System; User Data]
Pachyderm File System
A copy-on-write distributed file system
Copy-on-write is the paradigm that “powers” technologies like Docker and Spark
Core storage for Pachyderm
Why is this cool?
• View diffs
• Instant revert
• Reduce storage needs
• Reliability
[Diagram: Commit 0, Commit 1, Commit 2, Commit 3, Commit 4]
Git for huge data sets
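The copy-on-write idea behind the bullets above (diffs, instant revert, reduced storage) can be sketched in a few lines of Python. This is a toy, in-memory illustration only, not Pachyderm's implementation or API; `CowRepo` and its methods are hypothetical names chosen for this example. Each commit is a map from path to content hash, so unchanged files are shared between commits rather than copied.

```python
import hashlib


class CowRepo:
    """Toy copy-on-write repository (hypothetical): commits share unchanged blobs."""

    def __init__(self):
        self.blobs = {}        # content hash -> bytes, each distinct content stored once
        self.commits = [{}]    # commit 0 is empty; each commit maps path -> content hash

    def commit(self, changes):
        """Create a new commit on top of the latest one; a value of None deletes a path."""
        snapshot = dict(self.commits[-1])   # copies references, not file data
        for path, data in changes.items():
            if data is None:
                snapshot.pop(path, None)
            else:
                digest = hashlib.sha256(data).hexdigest()
                self.blobs.setdefault(digest, data)
                snapshot[path] = digest
        self.commits.append(snapshot)
        return len(self.commits) - 1

    def diff(self, old, new):
        """Paths added, removed, or changed between two commits."""
        a, b = self.commits[old], self.commits[new]
        return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))

    def read(self, commit_id, path):
        return self.blobs[self.commits[commit_id][path]]

    def revert(self, commit_id):
        """Revert by adding a commit that points at an older snapshot's blobs."""
        self.commits.append(dict(self.commits[commit_id]))
        return len(self.commits) - 1


repo = CowRepo()
c1 = repo.commit({"a.csv": b"1,2,3", "b.csv": b"4,5,6"})
c2 = repo.commit({"b.csv": b"4,5,7"})   # only b.csv changes; a.csv is shared
c3 = repo.revert(c1)                    # no file data is copied
changed = repo.diff(c1, c2)
```

In this sketch, three commits referencing two files store only three blobs in total, because `a.csv` is never duplicated and the revert reuses the original `b.csv` content.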
Pachyderm Pipeline System
• Runs k8s jobs over PFS
• Jobs triggered by commits
• Understands job dependencies
• Leverages copy-on-write storage
[Diagram: Task 1, Task 2, Task 3, Task 4, Task 5, Task 6, Dashboard]
Data-aware container scheduler
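As a rough sketch of "jobs triggered by commits" combined with "understands job dependencies", the snippet below runs a task graph in dependency order whenever a new commit arrives. The edges in `DEPS` are assumptions made for illustration (the slides show the tasks but not their exact wiring), `on_commit` is a hypothetical helper, and the standard-library `graphlib` module (Python 3.9+) stands in for a real scheduler; none of this is Pachyderm's API.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency edges: task -> set of tasks it depends on.
DEPS = {
    "task1": set(),
    "task2": {"task1"},
    "task3": {"task1"},
    "task4": {"task2"},
    "task5": {"task3"},
    "task6": {"task4", "task5"},
    "dashboard": {"task6"},
}


def on_commit(commit_id, deps, run):
    """Run every task once for this commit, each after all of its dependencies."""
    order = []
    for task in TopologicalSorter(deps).static_order():
        run(task, commit_id)
        order.append(task)
    return order


log = []
order = on_commit(1, DEPS, lambda task, commit: log.append((task, commit)))
```

A real scheduler would launch each task as a container job rather than call a Python function, but the ordering constraint is the same: no task starts before the tasks it depends on have finished.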
Pachyderm is…
$ Task 2 failed
$ Task 4 and 6 waiting…
… Fixing code …
$ Task 2 resuming...
$ Task 2 complete
$ Task 4 starting…
Monitoring
Resilient: K8s jobs can be restarted
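The failure-and-resume sequence in the monitoring output can be approximated with a small Python sketch. It assumes a hypothetical wiring in which Task 4 depends on Task 2 and Task 6 depends on Task 4 and Task 5 (the dashboard is left out); `run_pipeline` is an illustrative helper, not a Pachyderm or Kubernetes call. Completed tasks are remembered, so a restart only runs what failed or was waiting.

```python
# Hypothetical dependency edges: task -> set of tasks it depends on.
DEPS = {
    "task1": set(),
    "task2": {"task1"},
    "task3": {"task1"},
    "task4": {"task2"},
    "task5": {"task3"},
    "task6": {"task4", "task5"},
}


def run_pipeline(deps, run, done=None):
    """Run tasks whose dependencies are complete; return (done, failed, waiting).

    `done` carries completed tasks across attempts so a restart skips them.
    """
    done = set(done or ())
    failed = set()
    progressed = True
    while progressed:
        progressed = False
        for task, needs in deps.items():
            if task in done or task in failed or not needs <= done:
                continue
            try:
                run(task)
                done.add(task)
            except Exception:
                failed.add(task)
            progressed = True
    waiting = set(deps) - done - failed
    return done, failed, waiting


state = {"buggy": True}
calls = []


def run(task):
    calls.append(task)
    if task == "task2" and state["buggy"]:
        raise RuntimeError("task2 failed")


# First attempt: task2 fails, so task4 and task6 wait.
done, failed, waiting = run_pipeline(DEPS, run)

# "Fixing code", then resume: only task2, task4 and task6 run.
state["buggy"] = False
calls.clear()
done2, failed2, waiting2 = run_pipeline(DEPS, run, done=done)
```

After the first attempt, `failed` holds `task2` and `waiting` holds `task4` and `task6`; the second attempt leaves Tasks 1, 3 and 5 untouched.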
Efficient: incremental processing
[Diagram: Data (commits 0, 1, 2, 3) and Analysis (Task 1, Task 2, Task 3, Task 4, Task 5, Task 6, Dashboard). With 1% more data: Task 4, Task 6, Dashboard]
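One way to picture incremental processing is a cache keyed by content hash: when a new commit adds about 1% more data, only the new inputs are processed and earlier results are reused. The sketch below is a simplified stand-in under that assumption; `process_incremental` and the `part-NNN` file names are hypothetical, and the slides do not describe Pachyderm's actual mechanism at this level of detail.

```python
import hashlib


def process_incremental(files, cache, work):
    """Apply `work` only to inputs whose content is not already in `cache`."""
    results, recomputed = {}, []
    for path, data in files.items():
        key = hashlib.sha256(data).hexdigest()
        if key not in cache:
            cache[key] = work(data)     # the expensive step runs for new content only
            recomputed.append(path)
        results[path] = cache[key]
    return results, recomputed


cache = {}

# First commit: 100 input files, all processed.
commit_0 = {f"part-{i:03d}": f"row {i}".encode() for i in range(100)}
_, first = process_incremental(commit_0, cache, len)

# Next commit: one additional file, roughly 1% more data.
commit_1 = {**commit_0, "part-100": b"row 100"}
results, second = process_incremental(commit_1, cache, len)
```

Here `len` stands in for real analysis work; the second call touches a single file even though the commit contains 101.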
Pachyderm is…
[Diagram labels: PFS storage nodes; PPS; Copy-on-write storage nodes; Elastically scaling computation nodes; d2.8xlarge; PPS; Spot]
Cost-effective: resource management
Pachyderm is…
Summary
Kubernetes is a game-changer for distributed systems
Copy-on-write data is really powerful
Pachyderm unlocks the power of Kubernetes for big data