Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry
TRANSCRIPT
![Page 1: Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry](https://reader031.vdocument.in/reader031/viewer/2022030306/586f75821a28ab10258b60a3/html5/thumbnails/1.jpg)
Elastic Streaming: Spark Streaming + Dynamic Provisioning + Dynamic Allocation
Neelesh Shastry, Architect
Shaun Klopfenstein, CTO
![Page 2: Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry](https://reader031.vdocument.in/reader031/viewer/2022030306/586f75821a28ab10258b60a3/html5/thumbnails/2.jpg)
The Vision
![Page 3: Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry](https://reader031.vdocument.in/reader031/viewer/2022030306/586f75821a28ab10258b60a3/html5/thumbnails/3.jpg)
Requirements
![Page 4: Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry](https://reader031.vdocument.in/reader031/viewer/2022030306/586f75821a28ab10258b60a3/html5/thumbnails/4.jpg)
Page 4 | Marketo Proprietary and Confidential | © Marketo, Inc. | 10/30/16
Business Requirements
• Near real-time activity processing
• Billions of activities per customer per day
• Improve cost efficiency of operations while scaling up
• Global enterprise-grade security and governance
![Page 5: Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry](https://reader031.vdocument.in/reader031/viewer/2022030306/586f75821a28ab10258b60a3/html5/thumbnails/5.jpg)
SAAS Requirements
• Customers are added and removed
• Fairness and throttling per customer
• Strict sequential event processing for some applications
• Temporarily suspend a customer when errors occur
![Page 6: Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry](https://reader031.vdocument.in/reader031/viewer/2022030306/586f75821a28ab10258b60a3/html5/thumbnails/6.jpg)
Technology Selection
![Page 7: Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry](https://reader031.vdocument.in/reader031/viewer/2022030306/586f75821a28ab10258b60a3/html5/thumbnails/7.jpg)
Use Cases
• React to activities
  • Send an email when someone visits a web page
  • Change the score when someone fills a form
• Replicate data
  • Build Solr indexes, near real-time
  • Update DataXChange, an internal lead cache
  • Sync to/from CRM systems
• Analytics
  • Incrementally update email reports
  • Enrich activities and feed to Druid for advanced email/web reports
![Page 8: Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry](https://reader031.vdocument.in/reader031/viewer/2022030306/586f75821a28ab10258b60a3/html5/thumbnails/8.jpg)
Why Spark Streaming?
• Micro-batching provides sink-side efficiencies
• Great integration with Kafka
• No strict real-time processing requirements
• Great community, industry adoption
![Page 9: Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry](https://reader031.vdocument.in/reader031/viewer/2022030306/586f75821a28ab10258b60a3/html5/thumbnails/9.jpg)
Challenges with Spark + Kafka
• No way to add/remove topics on the fly
• No out-of-the-box support for sequencing RDDs
• No support for turning off topics under errors
• Does not play well with scaling Kafka partitions up/down when ordering is required
![Page 10: Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry](https://reader031.vdocument.in/reader031/viewer/2022030306/586f75821a28ab10258b60a3/html5/thumbnails/10.jpg)
Challenges - Stragglers
• A batch can’t complete until the slowest operation finishes
• Many of our batches include slow operations
  • Sometimes they don’t complete within the batch time
• Batches are multitenant
  • One customer’s operation can delay processing for other customers in the same batch
• Severe impact on utilization & batch delay
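The utilization cost of a straggler can be made concrete with a little arithmetic. The sketch below is illustrative only: the task durations, the core count, and the `batch_stats` helper are hypothetical, and it assumes one task per core so that the batch ends when the slowest task ends.

```python
def batch_stats(task_seconds, cores):
    """Return (batch_time, utilization) for one micro-batch.

    Assumes each task runs on its own core (len(task_seconds) <= cores),
    so the batch finishes only when the slowest task finishes.
    """
    if len(task_seconds) > cores:
        raise ValueError("this sketch assumes at most one task per core")
    batch_time = max(task_seconds)           # the straggler sets the batch time
    busy = sum(task_seconds)                 # core-seconds of useful work
    utilization = busy / (batch_time * cores)
    return batch_time, utilization


# Hypothetical batch: seven 1-second tasks and one 9-second straggler on 8 cores.
batch_time, utilization = batch_stats([1, 1, 1, 1, 1, 1, 1, 9], cores=8)
print(f"batch time: {batch_time}s, utilization: {utilization:.0%}")
```

With these made-up numbers, seven of the eight cores sit idle for most of the batch, which is the utilization and delay problem the slide describes.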
![Page 11: Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry](https://reader031.vdocument.in/reader031/viewer/2022030306/586f75821a28ab10258b60a3/html5/thumbnails/11.jpg)
Architecture & Design
![Page 12: Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry](https://reader031.vdocument.in/reader031/viewer/2022030306/586f75821a28ab10258b60a3/html5/thumbnails/12.jpg)
Marketo Activity Architecture
![Page 13: Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry](https://reader031.vdocument.in/reader031/viewer/2022030306/586f75821a28ab10258b60a3/html5/thumbnails/13.jpg)
Kafka Topics Organization
• One topic per use case, data from all customers
  • Easy to manage
  • A single customer can create backlogs for others during activity storms
  • Fairness/throttling is hard to implement
• One topic per use case, per customer
  • Storms are isolated to the customer
  • Fairness/throttling is easy to control, by tweaking the topic
  • Pressure on Kafka ZK (so far not a problem)
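One way to picture why per-customer topics make throttling easier is a per-topic cap on how many messages enter each micro-batch. This is a minimal sketch under assumptions: the topic naming scheme, the cap values, and the `plan_batch` helper are all hypothetical, and the talk does not say how the real throttling is implemented.

```python
DEFAULT_CAP = 1000  # hypothetical per-batch message cap for any one topic


def topic_name(use_case, customer_id):
    """Hypothetical naming scheme: one topic per use case, per customer."""
    return f"{use_case}.{customer_id}"


def plan_batch(backlog, caps=None, default_cap=DEFAULT_CAP):
    """Decide how many pending messages each topic contributes to a batch.

    backlog: {topic: pending message count}
    caps:    optional {topic: cap} overrides for individual customers
    """
    caps = caps or {}
    return {
        topic: min(pending, caps.get(topic, default_cap))
        for topic, pending in backlog.items()
    }


# An activity storm for one customer does not crowd out the other.
backlog = {
    topic_name("solr-index", "acme"): 50_000,
    topic_name("solr-index", "globex"): 120,
}
plan = plan_batch(backlog)
print(plan)
```

Because each customer has its own topic, the storming customer's backlog is simply drained over more batches, while the small customer's messages are all taken in the current one.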
![Page 14: Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry](https://reader031.vdocument.in/reader031/viewer/2022030306/586f75821a28ab10258b60a3/html5/thumbnails/14.jpg)
Solutions
![Page 15: Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry](https://reader031.vdocument.in/reader031/viewer/2022030306/586f75821a28ab10258b60a3/html5/thumbnails/15.jpg)
Dynamic provisioning capacity
[Diagram. Components: Customer Registry, Provisioning Framework, Offset Manager, Multitenant Kafka DStream, JobGenerator, DAGScheduler, Executor 1, Executor 2. Labeled steps: Add/Remove; Check & Pull Changes; compute# Get new offsets; Generate RDD; Submit Job; Schedule Tasks.]
![Page 16: Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry](https://reader031.vdocument.in/reader031/viewer/2022030306/586f75821a28ab10258b60a3/html5/thumbnails/16.jpg)
Marketo Offset Manager
• Tracks multitenancy
  • Streaming jobs process data for many customers
  • Accessing multiple Kafka topics and partitions
• Adds new topics
• Remove/Deactivate/Suspend topics
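The responsibilities listed above can be sketched as a small in-memory class. This is not Marketo's implementation: the class and method names, the offset bookkeeping, and the topic names are all hypothetical, and a real offset manager would persist its state and talk to Kafka.

```python
from dataclasses import dataclass


@dataclass
class TopicState:
    offsets: dict            # partition -> next offset to read
    suspended: bool = False  # suspended topics are skipped, not forgotten


class OffsetManager:
    """Hypothetical tracker for many customers' topics in one streaming job."""

    def __init__(self):
        self._topics = {}

    def add_topic(self, topic, partitions):
        # New topics start at offset 0 in this sketch.
        self._topics.setdefault(
            topic, TopicState({p: 0 for p in range(partitions)})
        )

    def remove_topic(self, topic):
        self._topics.pop(topic, None)

    def suspend(self, topic):
        self._topics[topic].suspended = True

    def resume(self, topic):
        self._topics[topic].suspended = False

    def next_ranges(self, latest):
        """Return (topic, partition, from_offset, until_offset) for the next batch.

        latest: {(topic, partition): latest available offset}
        """
        ranges = []
        for topic, state in sorted(self._topics.items()):
            if state.suspended:
                continue
            for partition, start in sorted(state.offsets.items()):
                until = latest.get((topic, partition), start)
                if until > start:
                    ranges.append((topic, partition, start, until))
        return ranges

    def commit(self, ranges):
        # Advance offsets only for ranges that were processed.
        for topic, partition, _, until in ranges:
            if topic in self._topics:
                self._topics[topic].offsets[partition] = until


mgr = OffsetManager()
mgr.add_topic("email.acme", partitions=2)
mgr.add_topic("email.globex", partitions=1)
mgr.suspend("email.globex")  # e.g. after repeated errors for this customer

latest = {("email.acme", 0): 10, ("email.acme", 1): 4, ("email.globex", 0): 7}
first = mgr.next_ranges(latest)
mgr.commit(first)
mgr.resume("email.globex")
second = mgr.next_ranges(latest)
print(first, second)
```

In this sketch a suspended topic keeps its offsets, so processing picks up where it left off once the customer is resumed.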
![Page 17: Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry](https://reader031.vdocument.in/reader031/viewer/2022030306/586f75821a28ab10258b60a3/html5/thumbnails/17.jpg)
Multitenant DStream
• Enables efficient multitenant RDDs
• Controlled sequencing of RDDs
• Coalesce Kafka partitions
  • Bin-packing for efficiency
• Maintains partition lineage for offset management
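Coalescing many small Kafka partitions into fewer Spark partitions can be illustrated with a first-fit-decreasing bin-packing pass. The talk names bin-packing but not the algorithm, so the heuristic, the `pack_partitions` helper, and the capacity figure below are assumptions. Each bin keeps the list of Kafka partitions it came from, standing in for the "partition lineage" used for offset management.

```python
def pack_partitions(pending, capacity):
    """Pack Kafka partitions into bins using first-fit decreasing.

    pending:  {(topic, partition): pending message count}
    capacity: target number of messages per Spark partition
    Returns a list of bins: {"size": int, "sources": [(topic, partition), ...]}.
    A Kafka partition larger than `capacity` gets a bin of its own.
    """
    bins = []
    # Largest first; ties broken by key so the result is deterministic.
    ordered = sorted(pending.items(), key=lambda kv: (-kv[1], kv[0]))
    for key, count in ordered:
        for b in bins:
            if b["size"] + count <= capacity:
                b["size"] += count
                b["sources"].append(key)  # lineage: which Kafka partitions fed this bin
                break
        else:
            bins.append({"size": count, "sources": [key]})
    return bins


# Hypothetical backlog across five single-partition customer topics.
pending = {
    ("email.a", 0): 70,
    ("email.b", 0): 50,
    ("email.c", 0): 30,
    ("email.d", 0): 20,
    ("email.e", 0): 10,
}
bins = pack_partitions(pending, capacity=100)
for b in bins:
    print(b["size"], b["sources"])
```

Here five Kafka partitions collapse into two Spark partitions. Because whole Kafka partitions are never split across bins, each partition's messages stay together in a single task.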
![Page 18: Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry](https://reader031.vdocument.in/reader031/viewer/2022030306/586f75821a28ab10258b60a3/html5/thumbnails/18.jpg)
Provisioning
• Manages allocating customers to a Spark Streaming application
  • Round robin + resource affinity
• Enables rebalancing of customers across Spark Streaming jobs
• Oozie-based framework
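A possible reading of "round robin + resource affinity" is sketched below. The talk does not define resource affinity, so this assumes it means customers that share a backend resource (for example a database) are placed in the same streaming application, with resources handed out to applications in round-robin order. The `assign` helper and all names are hypothetical.

```python
from itertools import cycle


def assign(customers, apps):
    """Assign customers to streaming applications.

    customers: {customer_id: resource the customer's data lives on}
    apps:      list of streaming application names
    Returns {customer_id: app}.
    """
    resource_to_app = {}
    next_app = cycle(apps)  # round robin over the applications
    placement = {}
    for customer, resource in sorted(customers.items()):
        if resource not in resource_to_app:
            # First customer on this resource picks the next app in turn.
            resource_to_app[resource] = next(next_app)
        # Affinity: later customers on the same resource follow it.
        placement[customer] = resource_to_app[resource]
    return placement


customers = {"acme": "db1", "globex": "db2", "initech": "db1", "umbrella": "db3"}
placement = assign(customers, ["app-a", "app-b"])
print(placement)
```

Rebalancing in this model would amount to recomputing the placement and moving the affected customers' topics between applications.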
![Page 19: Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry](https://reader031.vdocument.in/reader031/viewer/2022030306/586f75821a28ab10258b60a3/html5/thumbnails/19.jpg)
Dynamic Resource Allocation
• SPARK-12133
  • Goal: “make processing time infinitely close to duration”
  • Assumes tasks are roughly similar
  • Stragglers throw this goal off
• What we really want:
  • DRA + safe concurrent job execution
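The stated goal, keeping processing time close to the batch duration, suggests a ratio-based scaling policy. The sketch below is a simplified illustration of that idea rather than Spark's actual code: the `scaling_decision` function and its threshold values are assumptions chosen for the example.

```python
def scaling_decision(proc_times_ms, batch_duration_ms,
                     scale_up_ratio=0.9, scale_down_ratio=0.3):
    """Decide whether to add or release executors.

    Compares the average batch processing time with the batch duration.
    The thresholds are illustrative values, not taken from the talk.
    """
    average = sum(proc_times_ms) / len(proc_times_ms)
    ratio = average / batch_duration_ms
    if ratio >= scale_up_ratio:
        return "scale_up"      # batches barely fit in the interval
    if ratio <= scale_down_ratio:
        return "scale_down"    # executors are mostly idle
    return "hold"


# Hypothetical recent batch processing times for a 5-second batch interval.
print(scaling_decision([4500, 4800, 5200], 5000))
print(scaling_decision([1000, 1200, 900], 5000))
print(scaling_decision([2000, 2500, 3000], 5000))
```

A policy like this treats long processing time as a lack of capacity. When the time is dominated by one slow task, adding executors does not shorten the batch, which is why the slide pairs DRA with safe concurrent job execution.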
![Page 20: Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry](https://reader031.vdocument.in/reader031/viewer/2022030306/586f75821a28ab10258b60a3/html5/thumbnails/20.jpg)
Results so far
• ~10 different use cases
• >100 Spark executors
• >1000 Kafka partitions
• Processing latencies <5s (99th percentile)
• Rolled out to ~20% of customers
![Page 21: Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry](https://reader031.vdocument.in/reader031/viewer/2022030306/586f75821a28ab10258b60a3/html5/thumbnails/21.jpg)
Future Work
![Page 22: Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry](https://reader031.vdocument.in/reader031/viewer/2022030306/586f75821a28ab10258b60a3/html5/thumbnails/22.jpg)
Application Scheduling
• Scheduling within an application to handle stragglers
  • spark.streaming.concurrentJobs
  • Exploring scheduler pools
  • Changes to Streaming JobScheduler, to execute multiple RDDs safely
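What "safely" might mean here can be sketched outside Spark: jobs for different customers run concurrently, while jobs for the same customer run strictly in arrival order. The example uses plain Python threads and a hypothetical `run_safely` helper. It is not the Spark JobScheduler and only illustrates the ordering constraint.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_safely(jobs, handler, max_workers=4):
    """Run jobs concurrently across customers, sequentially per customer.

    jobs:    list of (customer, batch_id) in arrival order
    handler: function(customer, batch_id) -> result
    Returns {customer: [results in arrival order]}.
    """
    per_customer = defaultdict(list)
    for customer, batch_id in jobs:
        per_customer[customer].append(batch_id)

    def drain(customer, batch_ids):
        # One worker processes a customer's batches one after another.
        return [handler(customer, b) for b in batch_ids]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            customer: pool.submit(drain, customer, ids)
            for customer, ids in per_customer.items()
        }
        return {customer: f.result() for customer, f in futures.items()}


jobs = [("acme", 1), ("globex", 1), ("acme", 2), ("globex", 2), ("acme", 3)]
results = run_safely(jobs, lambda customer, batch: f"{customer}:{batch}")
print(results)
```

With this structure a slow batch for one customer delays only that customer's later batches, not the other customers' work.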
![Page 23: Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry](https://reader031.vdocument.in/reader031/viewer/2022030306/586f75821a28ab10258b60a3/html5/thumbnails/23.jpg)
Scaling Up Kafka Partitions
• Our customers grow in size over a period of time
• Ordering requirements mean we cannot alter a topic on the fly
  • Coordination required on both producer & consumer fronts
• Enhance provisioner to manage partition up/down scaling
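The ordering problem can be seen with key-based partitioning: when the partition count changes, some keys map to a different partition than before. The sketch below uses CRC32 as a stand-in hash (an assumption, not the hash a Kafka producer actually uses) and made-up keys.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a key to a partition; CRC32 is a stand-in for the real hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical message keys, e.g. one per lead.
keys = [f"lead-{i}" for i in range(1000)]

# Keys whose partition changes when the topic grows from 4 to 8 partitions.
moved = [k for k in keys if partition_for(k, 4) != partition_for(k, 8)]
print(f"{len(moved)} of {len(keys)} keys change partition")
```

For a moved key, events written before the change sit in the old partition and events written after it go to the new one, so a consumer reading the partitions independently can see them out of order. This is why the change needs coordination on both the producer and consumer sides.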
![Page 24: Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry](https://reader031.vdocument.in/reader031/viewer/2022030306/586f75821a28ab10258b60a3/html5/thumbnails/24.jpg)
Move to 2.x and Open Source!
![Page 25: Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry](https://reader031.vdocument.in/reader031/viewer/2022030306/586f75821a28ab10258b60a3/html5/thumbnails/25.jpg)
We’re Hiring! Http://Marketo.Jobs
Q & A
![Page 26: Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry](https://reader031.vdocument.in/reader031/viewer/2022030306/586f75821a28ab10258b60a3/html5/thumbnails/26.jpg)
Q & A
![Page 27: Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry](https://reader031.vdocument.in/reader031/viewer/2022030306/586f75821a28ab10258b60a3/html5/thumbnails/27.jpg)
Architecture Requirements
• Maximize utilization of hardware
• Multitenancy support with fairness
• Encryption, Authorization & Authentication
• Applications must scale horizontally
![Page 28: Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry](https://reader031.vdocument.in/reader031/viewer/2022030306/586f75821a28ab10258b60a3/html5/thumbnails/28.jpg)
Deploying It
![Page 29: Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry](https://reader031.vdocument.in/reader031/viewer/2022030306/586f75821a28ab10258b60a3/html5/thumbnails/29.jpg)
Running It