Roadmap: Operating Pentaho at Scale
Jens Bleuel, Senior Product Manager, Pentaho
Agenda – Worker Nodes
Hear about new upcoming capabilities for scaling out the Pentaho platform in large enterprise operations. This will cover 8.0 and roadmap topics.
• Worker Nodes: Overview and Business Benefits
• How is this different from AEL/Hadoop MapReduce
• Typical Customer Scenarios
• Architecture & Capabilities including Monitoring & Logging
• Improvements in Related Areas
• Demonstration
• Availability & Roadmap
Worker Nodes – Overview
• Worker Nodes can scale work items across multiple nodes (containers), such as:
– PDI jobs and transformations (in 8.0)
– Report executions (not in 8.0)
– […]
• They operate easily and securely across an elastic architecture, which adds machine resources as they are required for processing
• Worker Nodes can operate on premises or in the cloud
• Worker Nodes use popular technologies under the hood, such as Docker (container platform), Chronos (scheduler), and Mesos/Marathon (container orchestration)
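To make the container orchestration idea concrete, here is a minimal sketch of what a Marathon-style app definition for a containerized worker could look like. The talk does not show the actual Worker Nodes configuration, so the app id, the image name, and the resource figures below are purely hypothetical; only the general shape of a Marathon app definition (`id`, `instances`, `cpus`, `mem`, `container`) is assumed.

```python
import json


def build_marathon_app(app_id, image, instances, cpus=1.0, mem_mb=2048):
    """Build a Marathon-style app definition for a containerized worker.

    All argument values used below are illustrative assumptions,
    not the configuration shipped with Pentaho Worker Nodes.
    """
    if instances < 0:
        raise ValueError("instances must be non-negative")
    return {
        "id": app_id,              # hypothetical app id
        "instances": instances,    # how many containers to keep running
        "cpus": cpus,              # CPU share per container
        "mem": mem_mb,             # memory per container, in MB
        "container": {
            "type": "DOCKER",
            "docker": {"image": image},  # hypothetical image name
        },
    }


app = build_marathon_app("/pentaho/worker-node", "example/pdi-worker:8.0", instances=3)
print(json.dumps(app, indent=2))
```

Scaling out in this picture means raising `instances`; the orchestrator then starts more containers on whatever machines have free resources.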
[Diagram: Distribute and Scale across Worker Node (a), Worker Node (b), Worker Node (c…)]
Worker Nodes – Business Benefits
Large enterprises need the ability to seamlessly and efficiently spin up resources to handle hundreds of work items or more at different times, with different dependencies and processing requirements. Worker Nodes addresses these needs and delivers:
• Faster time to value and reduced TCO, because it enables customers to deploy their own scale-out processes without required services
• More efficient management of changing workloads by spinning resources up and down as needed
• Increased business agility thanks to containerization, which enables portability of processes across on-prem and cloud environments without the need to re-engineer them.
– Even in pure on-prem, WN provides elasticity and resource optimization.
How Is This Different from AEL/Hadoop MapReduce?
SCALE OUT ON DATA
AEL/Hadoop MapReduce (simplified):
• Data is distributed across nodes
• The processing takes place at the node level
• Helps to scale out data volume
SCALE OUT ON PROCESSES (WORK ITEMS)
Worker Nodes (simplified):
• Work items such as PDI jobs and PDI transformations get distributed across nodes; this is about the processing and orchestration (in contrast to distributing data)
• Helps to scale out Pentaho processes
These two architectures can also be combined: within a Worker Node, a PDI transformation can also scale out with AEL or MapReduce
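The difference between the two approaches can be sketched in a few lines. This is only an illustration, not how AEL, MapReduce, or Worker Nodes are implemented: threads stand in for nodes, and the function names and work item names are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_out_on_data(rows, node_count, process_partition):
    """One job, data split across nodes: each node processes one partition."""
    partitions = [rows[i::node_count] for i in range(node_count)]
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        return list(pool.map(process_partition, partitions))


def scale_out_on_processes(work_items, node_count):
    """Many independent work items: each runs whole on whichever node is free."""
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        return list(pool.map(lambda item: item(), work_items))


# Scale out on data: one aggregation, partitioned over three "nodes".
rows = list(range(10))
partial_sums = scale_out_on_data(rows, 3, sum)
total = sum(partial_sums)

# Scale out on processes: two separate (hypothetical) work items, two "nodes".
work_items = [
    lambda: "load_customers.kjb done",
    lambda: "clean_orders.ktr done",
]
results = scale_out_on_processes(work_items, 2)

print(total, results)
```

In the first function the unit of distribution is a slice of the data; in the second it is a whole job or transformation, which is the Worker Nodes model described above.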
Typical Customer Scenarios
Customer Type                          | Typical Number of Work Items | Scale-Out Need
Small                                  | Up to 10                     | No
Medium                                 | 10 through 100               | Sometimes
Enterprise with one department         | Around 100                   | Yes
Enterprise with multiple departments   | Hundreds or thousands        | Yes
Typical Customer Examples – SLAs and Time Windows
• Need to meet customer SLAs
– Data from hundreds of sources needs to be collected and aggregated
– This is done by hundreds of PDI jobs and transformations
– All these jobs and transformations need to finish within a defined time window (for example, between 5 am and 7 am) so that the data is available and accurate for the target audience
• Worker Nodes provides the technology to run processes in parallel and scale out when needed, for example at peak times (end of month)
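A rough way to reason about such a time window is to estimate how many parallel workers are needed so that all jobs finish in time. The sketch below uses a simple longest-job-first greedy assignment; it is an estimate for illustration only, not the scheduling logic of Worker Nodes, and the job durations are invented.

```python
import heapq


def estimated_makespan(durations_min, worker_count):
    """Estimate the finish time in minutes using longest-job-first greedy assignment."""
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    loads = [0.0] * worker_count  # total minutes assigned to each worker
    heapq.heapify(loads)
    for duration in sorted(durations_min, reverse=True):
        lightest = heapq.heappop(loads)          # least-loaded worker so far
        heapq.heappush(loads, lightest + duration)
    return max(loads)


def workers_needed(durations_min, window_min, max_workers=1000):
    """Smallest worker count whose estimated makespan fits the window, else None."""
    if durations_min and max(durations_min) > window_min:
        return None  # a single job already exceeds the window
    for count in range(1, max_workers + 1):
        if estimated_makespan(durations_min, count) <= window_min:
            return count
    return None


jobs = [6.0] * 200  # 200 hypothetical jobs of 6 minutes each
window = 120        # 5 am to 7 am, in minutes
print(workers_needed(jobs, window))
```

With these made-up numbers, a single worker would need 1,200 minutes, so the window can only be met by running the jobs in parallel.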
Typical Customer Examples – Shared Services
Example of one project:
• 800 daily batches from different departments in an enterprise
• One server with 120 GB memory and many CPUs
• This machine hosts many VMs in parallel
Issue: When there is too much workload, one machine is not enough
• Worker Nodes solves this by scaling out on a cluster
Typical Customer Examples – Scalable on Demand
• Need to support growing data volumes and customer requirements
• Worker Nodes provides a flexible and scalable architecture, on premises or in the cloud, for growing demand
• This is seamless and does not require changes to the underlying architecture
[Diagram: at BASE TIMES, Distribute and Scale across Worker Nodes (1) to (3); at PEAK TIMES, across Worker Nodes (1) to (5)]
[Diagram labels: Worker Nodes, Orchestration Framework, Container Framework]
Worker Nodes – New in 8.0
• Containerized scale-out
• Pentaho PDI “work items”
[Architecture diagram labels: Pentaho Clients; Pentaho Server; Pentaho Repository; Controller; Master (Working); Master (Standby); Master (Standby); Orchestration (scheduler, monitoring, security, etc.); WN 1 (e.g. KJB); WN 2 (e.g. KTR); WN …n “Executor”]
Worker Nodes Capabilities
• Deploy consistently in physical, virtual, and cloud environments
– Adapts to customer needs (bare metal vs. virtualization vs. cloud), with no need to modify the product when the strategy changes
• Scale and load balance services
– This helps to deal with peaks and limited time windows by allocating the resources that are needed.
• Hybrid deployments can be used to distribute load
– Even when the on-premises resources are not sufficient, scaling out into the cloud is possible to provide more resources.
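The scaling and hybrid deployment ideas can be illustrated with a small sizing calculation. This is a sketch under assumed numbers, not the product's scaling policy: the per-node capacity, the node limits, and the on-premises capacity are all hypothetical parameters.

```python
import math


def desired_worker_nodes(pending_items, items_per_node, min_nodes=1, max_nodes=10):
    """Number of worker nodes for the current load, clamped to [min_nodes, max_nodes]."""
    if items_per_node < 1:
        raise ValueError("items_per_node must be at least 1")
    needed = math.ceil(pending_items / items_per_node)
    return max(min_nodes, min(max_nodes, needed))


def split_hybrid(desired_nodes, on_prem_capacity):
    """Fill on-premises capacity first, then overflow into the cloud."""
    on_prem = min(desired_nodes, on_prem_capacity)
    return {"on_prem": on_prem, "cloud": desired_nodes - on_prem}


base = desired_worker_nodes(pending_items=12, items_per_node=4)   # quiet period
peak = desired_worker_nodes(pending_items=20, items_per_node=4)   # e.g. end of month
print(base, peak, split_hybrid(peak, on_prem_capacity=3))
```

The clamp keeps at least one node running when there is no work and caps growth at a budgeted maximum; the split shows how the nodes beyond the on-premises capacity would land in the cloud.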
Monitoring and Logging
Monitoring – Overview
Monitoring – Worker Node Example
Improvements in Related Areas: Open and Save Dialogs
Pain Point: Save a New Job/Transformation
• Whenever you save a new transformation/job into the repository, the default folder is set to the user’s home folder.
In previous versions: The user needs to change the folder every time they save a new transformation or job.
New Save Dialog in 8.0 – Overview
• Remembers the last opened folder!
• Just enter the file name! (and/or change the folder)
• Similar to the Open dialog with additional functionality (see next slide).
New Open Dialog in 8.0 – Overview
Recents
Open shows the last opened folder. This is a big time saver!
Search
Improvements in Related Areas: Run Configurations
Pain Point: Remote Pentaho Server Execution before 8.0
To execute on the Pentaho Server before 8.0, you needed to define a slave server and provide the credentials, then execute on the selected server.
Execute on the Pentaho Server
• By selecting the Pentaho Server option, you no longer need to define a slave server when you want to execute remotely.
• Behind the scenes, this option executes the transformation or job via the Scheduler. This is the same as using “Schedule Now.”
This new functionality improves ease of use, also for Worker Nodes
Run Configurations within Job Entries
• Run configurations can be used in the Run dialog and also in the job entries that can execute jobs or transformations remotely and on Worker Nodes
[Screenshots: 7.1 example and 8.0]
Demonstration
Availability and Roadmap
Availability
• Worker Nodes is EE only
• Initially, 8.0 Worker Nodes will be Limited Availability
– Fully supported, production deployment
– Distribution to a limited number of customers
• Requires additional download and implementation services
Roadmap
• Pentaho Server & Repository as a Service, including High Availability
• Improved Monitoring and Logging
• Extend to other Pentaho work items such as Reports
• Integrated with other Hitachi Vantara services and products
[Diagram labels: Container Framework; Pentaho Server; WN 1 (e.g. KJB); WN 2 (e.g. KTR); WN …n “Executor”; Pentaho Repository]
Summary
What we covered today:
• The upcoming capabilities for scaling out the Pentaho platform and when to use them
• How to use the new way of scaling out work items (Pentaho processes such as PDI jobs and transformations) across multiple nodes
Next Steps
Want to learn more?
• Meet-the-Expert:
– Pedro Teixera
• Other recommended breakout sessions:
– Matt Howard: Pentaho 8.0 and Roadmap
– Rakesh Saha and Jens Bleuel: Roadmap: Processing Big Data
– Matt Casters: PDI Best Architecture Practices
– Steve Szabo: PDI Sizing Overview and Case Study
– Jonathan Jarvis: Understanding Parallelism with PDI and Adaptive Execution with Spark
– Mark Burnett: Understanding the Big Data Technology Ecosystem