oozie hug may 2011
Presented at HUG, 2011.
Oozie 3: Improved Scheduling and Control of Workflows
Mohammad K Islam
kamrul@yahoo-inc.com
Introductions
• Oozie Team
• Architecture, Development, Management
– Mayank Bansal
– Angelo Huang
– Mohammad Islam
– Amol Kekre
– Andreas Newman
– Lei Zhang
• External contributors.
• QE
– Marcy Chang
– Michelle Chiang
• Who I am
• Technical Lead at Yahoo!
Agenda
• Oozie Overview
• Oozie 3.0 features:
– Bundle
– Scalability
– Usability
• Future Plan
• Q&A
Overview: Workflow
• Oozie executes a workflow defined as a DAG of jobs.
• The job types include: Map-Reduce / Pipes / Streaming / Pig / Custom Java Code etc.
• Introduced in Oozie 1.x.
[Diagram: example workflow DAG with the nodes start, M/R job, M/R streaming job, decision (branches MORE and ENOUGH), fork, Pig job, M/R job, join, Java, FS job, end]
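To make the "DAG of jobs" idea concrete, here is a minimal sketch in Python. It is not Oozie's workflow language or engine. The node names and edges are hypothetical (loosely modelled on the diagram), and the standard-library `graphlib` module stands in for the scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key may run only after all nodes in its set.
# The edges are an assumption for illustration, not read from the slide.
workflow = {
    "mr_job": {"start"},
    "fork": {"mr_job"},
    "pig_job": {"fork"},
    "streaming_job": {"fork"},
    "join": {"pig_job", "streaming_job"},
    "end": {"join"},
}


def execution_order(dag):
    """Return one valid order in which the jobs of the DAG can run."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(workflow)
print(order)
```

The two jobs between fork and join have no ordering between them, so a real engine could run them in parallel.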
Overview: Coordinator
• Oozie executes workflows based on:
– Time Dependency (Frequency)
– Data Dependency
• Introduced in Oozie 2.x.
[Diagram: Oozie Client, WS API, Oozie Server, Oozie Coordinator, Oozie Workflow, Check Data Availability, Hadoop]
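The two coordinator triggers can be sketched as follows. This is an illustration only, not Oozie code. `nominal_times`, `ready_actions`, and the `available` callback are hypothetical names, and the data check is a stand-in for whatever "Check Data Availability" does in the server.

```python
from datetime import datetime, timedelta


def nominal_times(start, end, frequency):
    """Time dependency: yield one run time per frequency step in [start, end)."""
    t = start
    while t < end:
        yield t
        t += frequency


def ready_actions(start, end, frequency, available):
    """Data dependency: keep only the run times whose input data is available.

    `available` is a hypothetical callable standing in for the data check.
    """
    return [t for t in nominal_times(start, end, frequency) if available(t)]


start = datetime(2011, 5, 1)
end = datetime(2011, 5, 4)
# Assumed for the example: data has arrived for May 1 and May 3 only.
arrived = {datetime(2011, 5, 1), datetime(2011, 5, 3)}
ready = ready_actions(start, end, timedelta(days=1), lambda t: t in arrived)
print(ready)
```

In this sketch the May 2 run stays waiting until its data shows up, while the other two runs can start their workflows.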
Bundle
• What is Bundle?
– A new abstraction layer on top of Coordinator.
– Users can define and execute a bunch of coordinator applications.
– Introduced in Oozie 3.x.
• Why is it required?
– Data pipeline: A set of inter-related coordinator applications required for large data processing.
– Operational nightmare: Hard to maintain and control these pipelines for the Service Engineering team.
Bundle Cont.
• User defines the bundle through a new XML.
• User can start/stop/suspend/resume/rerun at the bundle level.
• Bundle is optional.
[Diagram: Oozie Client, WS API, Oozie Server, Bundle, Coordinator, Workflow, Check Data Availability, Hadoop]
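The point of bundle-level control is that one operation reaches every coordinator in the pipeline. A minimal sketch, assuming a bundle is just a named collection of coordinators. The `Bundle` and `Coordinator` classes and their methods are hypothetical and are not the Oozie API.

```python
class Coordinator:
    """Hypothetical stand-in for a coordinator job with a status field."""

    def __init__(self, name):
        self.name = name
        self.status = "PREP"


class Bundle:
    """Hypothetical bundle: applies one operation to all its coordinators."""

    def __init__(self, coordinators):
        self.coordinators = list(coordinators)

    def start(self):
        for c in self.coordinators:
            c.status = "RUNNING"

    def suspend(self):
        # Only running coordinators can be suspended in this sketch.
        for c in self.coordinators:
            if c.status == "RUNNING":
                c.status = "SUSPENDED"

    def resume(self):
        for c in self.coordinators:
            if c.status == "SUSPENDED":
                c.status = "RUNNING"

    def stop(self):
        for c in self.coordinators:
            c.status = "KILLED"


pipeline = Bundle([Coordinator("ingest"), Coordinator("aggregate")])
pipeline.start()
pipeline.suspend()
print([c.status for c in pipeline.coordinators])
```

Without the bundle layer, an operator would have to issue each of these operations once per coordinator.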
Oozie Abstraction Layers
[Diagram: Layer 1: Bundle. Layer 2: Coord Job 1 and Coord Job 2, each with Coord Action 1 and Coord Action 2. Layer 3: WF Job 1 and WF Job 2, which run M/R Job, PIG Job, and FS Job.]
Enhanced Stability and Scalability
• Issue: At very high load, Oozie becomes slow.
• Impact: 90% of the total Oozie support incidents.
• Reason:
– Lots of active but non-progressing jobs.
– Non-progressing jobs are consuming a lot of resources.
– Oozie internal queue is full.
• Resolution:
– Throttle the number of active jobs/coordinator.
– Put the job into timeout state.
– Enforce uniqueness for Oozie queue elements.
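Two of the resolutions above, a capacity limit and uniqueness of queue elements, can be sketched with a small bounded queue. This is an illustration under assumptions, not Oozie's internal queue. `UniqueQueue`, `offer`, and `poll` are hypothetical names.

```python
from collections import deque


class UniqueQueue:
    """Bounded FIFO queue that rejects a key that is already queued."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._items = deque()
        self._keys = set()

    def offer(self, key, command):
        """Queue the command; return False if it is a duplicate or the queue is full."""
        if key in self._keys or len(self._items) >= self.max_size:
            return False
        self._keys.add(key)
        self._items.append((key, command))
        return True

    def poll(self):
        """Remove and return the oldest command, or None if the queue is empty."""
        if not self._items:
            return None
        key, command = self._items.popleft()
        self._keys.discard(key)
        return command


queue = UniqueQueue(max_size=2)
print(queue.offer("job-1:check", "check job-1"))  # True
print(queue.offer("job-1:check", "check job-1"))  # False, already queued
```

With uniqueness enforced, a non-progressing job can hold at most one slot per command type instead of filling the queue with repeats.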
Improved Usability
• Issue: Coordinator job's status is not intuitive and causes confusion for the Oozie user.
• Impact: User confusion and related Oozie support.
• Reason:
– Status SUCCEEDED doesn't mean the job is successful!!
– Status PREMATER is for Oozie internal use only, but it was exposed to the user.
• Resolution:
– Redesign Coordinator status
Coordinator Status Redesign
• Current: PREP, PREMATER, RUNNING, SUSPENDED, SUCCEEDED, KILLED, FAILED
• New: PREP, RUNNING, SUSPENDED, PAUSED, SUCCEEDED, DONE_WITH_ERROR, KILLED, FAILED
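One way to read the new DONE_WITH_ERROR status is as the end state of a coordinator whose actions have all finished but did not all succeed. The sketch below encodes that reading. It is an assumption for illustration, not the rule stated on the slide, and `final_status` is a hypothetical helper.

```python
def final_status(action_statuses):
    """Derive a coordinator's end status from its finished actions.

    Assumption: every action is already in a terminal state, and
    SUCCEEDED is reported only when all actions succeeded.
    """
    if not action_statuses:
        raise ValueError("coordinator has no actions")
    if all(s == "SUCCEEDED" for s in action_statuses):
        return "SUCCEEDED"
    return "DONE_WITH_ERROR"


print(final_status(["SUCCEEDED", "SUCCEEDED"]))
print(final_status(["SUCCEEDED", "KILLED"]))
```

Under this reading, SUCCEEDED regains the meaning a user expects, and a finished job with failed actions is reported separately.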
Future Plan
• Higher Scalability: Change the polling-based data-dependency check to a push model through HCatalog and a Notification system.
• Adaptability: Graceful handling of Hadoop downtime:
– If Hadoop is down, block submission.
– When Hadoop becomes available:
• Submit the blocked job.
• Auto-resubmit the untraced job.
• Monitoring: Rich WS API for application Monitoring/Alerting.
Future Plan Cont.
• Automatic Failover: Using ZooKeeper.
• Load Balancing: Through server replication.
• Improved Usability:
– Distcp action
– Hive Action
• Asynchronous data processing.
• Incremental data processing.
• Apache Migration: Work is initiated.
Q&A
Mohammad K Islam
kamrul@yahoo-inc.com
• Github link: http://yahoo.github.com/oozie
• Mailing list: [email protected]