![Page 1: Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)](https://reader034.vdocument.in/reader034/viewer/2022042619/58f9a973760da3da068b6ec7/html5/thumbnails/1.jpg)
Spark and Spark Streaming @ Netflix Kedar Sadekar & Monal Daxini
![Page 2: Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)](https://reader034.vdocument.in/reader034/viewer/2022042619/58f9a973760da3da068b6ec7/html5/thumbnails/2.jpg)
Mission
• Enable rapid pace of innovation for Algorithm Engineers
• Business Value – More A/B tests
![Page 3: Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)](https://reader034.vdocument.in/reader034/viewer/2022042619/58f9a973760da3da068b6ec7/html5/thumbnails/3.jpg)
Experiments
Users with plays
Feature selection
Large sample size
Turn back time
Multiple ideas
![Page 4: Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)](https://reader034.vdocument.in/reader034/viewer/2022042619/58f9a973760da3da068b6ec7/html5/thumbnails/4.jpg)
Use Cases
Feature Selection
Feature Generation
Model Training
Metric Evaluation
![Page 5: Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)](https://reader034.vdocument.in/reader034/viewer/2022042619/58f9a973760da3da068b6ec7/html5/thumbnails/5.jpg)
Use Cases
More Users
Faster Iteration
Interactive
Turn back time
![Page 6: Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)](https://reader034.vdocument.in/reader034/viewer/2022042619/58f9a973760da3da068b6ec7/html5/thumbnails/6.jpg)
Solution – Netflix BDAS
Netflix BDAS
Notebooks
• Zeppelin / iPython
• Inline Graphs / REPL
Prana Sidecar
• Netflix Ecosystem
Berkeley BDAS
• Faster Compute
• Scale Users
• Access to S3 / Hive
![Page 7: Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)](https://reader034.vdocument.in/reader034/viewer/2022042619/58f9a973760da3da068b6ec7/html5/thumbnails/7.jpg)
Netflix BDAS - Features
• Simplicity
  – Individual cluster
• Prana - Netflix ecosystem
  – Automatic configuration
  – Classloader isolation
  – Discovery & healthcheck
![Page 8: Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)](https://reader034.vdocument.in/reader034/viewer/2022042619/58f9a973760da3da068b6ec7/html5/thumbnails/8.jpg)
Netflix BDAS - Features
• Ad-hoc experimentation
• Time machine functionality
• Access to Hive data and micro services from single place
  – Access to multiple AWS buckets (S3)
![Page 9: Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)](https://reader034.vdocument.in/reader034/viewer/2022042619/58f9a973760da3da068b6ec7/html5/thumbnails/9.jpg)
Netflix BDAS – Sample Deployment
![Page 10: Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)](https://reader034.vdocument.in/reader034/viewer/2022042619/58f9a973760da3da068b6ec7/html5/thumbnails/10.jpg)
Wins
• 8x the number of users
• 5x - 9x faster
• Interactive
• Turn back time
![Page 11: Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)](https://reader034.vdocument.in/reader034/viewer/2022042619/58f9a973760da3da068b6ec7/html5/thumbnails/11.jpg)
Learnings
• Easy to bring down an online system
• Almost killed 1000s of ETL jobs
  - Hive metastore update
• Too many systems and configurations
• Playing catch up with libraries and tools
  - Hadoop, iPython, Zeppelin
• Scala / Spark learning curve
• Debugging
  - files open, no resources etc.
![Page 12: Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)](https://reader034.vdocument.in/reader034/viewer/2022042619/58f9a973760da3da068b6ec7/html5/thumbnails/12.jpg)
Increased Adoption
Adoption increasing amongst teams
• Multiple Algorithmic Eng. teams
• Personalization Infrastructure
• Marketing
• Security
• A/B Test Engineering
![Page 13: Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)](https://reader034.vdocument.in/reader034/viewer/2022042619/58f9a973760da3da068b6ec7/html5/thumbnails/13.jpg)
Looking Ahead
• Spark-R / Dataframes support
• Multi-tenancy
  – Job specific configurations
• Debuggability
• Newer notebooks
• Spark Streaming
  – Lambda Architecture
  – Real-time algorithms (trending now)
![Page 14: Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)](https://reader034.vdocument.in/reader034/viewer/2022042619/58f9a973760da3da068b6ec7/html5/thumbnails/14.jpg)
Netflix Streaming Event Data Pipeline
Monal Daxini
![Page 15: Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)](https://reader034.vdocument.in/reader034/viewer/2022042619/58f9a973760da3da068b6ec7/html5/thumbnails/15.jpg)
Netflix Event Data Pipeline
Event Streams
Stream processing
![Page 16: Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)](https://reader034.vdocument.in/reader034/viewer/2022042619/58f9a973760da3da068b6ec7/html5/thumbnails/16.jpg)
Publish
Collect
Move
Process
Events @ Cloud Scale
![Page 17: Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)](https://reader034.vdocument.in/reader034/viewer/2022042619/58f9a973760da3da068b6ec7/html5/thumbnails/17.jpg)
450 Billion Events per Day
8 Million events (17 GB) per second at peak
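For a sense of scale, the daily volume and the peak rate can be related with some quick arithmetic. The sketch below assumes a uniform 86,400-second day and decimal gigabytes (10^9 bytes); neither assumption is stated on the slide, so the derived numbers are rough.

```python
# Figures taken from the slide.
EVENTS_PER_DAY = 450e9        # 450 billion events per day
PEAK_EVENTS_PER_SEC = 8e6     # 8 million events per second at peak
PEAK_BYTES_PER_SEC = 17e9     # 17 GB per second (assumption: 1 GB = 10^9 bytes)

SECONDS_PER_DAY = 24 * 60 * 60

# Average event rate if traffic were spread evenly over the day.
avg_events_per_sec = EVENTS_PER_DAY / SECONDS_PER_DAY

# How much higher the peak is than that flat average.
peak_to_avg = PEAK_EVENTS_PER_SEC / avg_events_per_sec

# Implied mean event size at peak.
bytes_per_event_at_peak = PEAK_BYTES_PER_SEC / PEAK_EVENTS_PER_SEC

print(f"average rate: {avg_events_per_sec:,.0f} events/s")
print(f"peak / average: {peak_to_avg:.2f}")
print(f"mean event size at peak: {bytes_per_event_at_peak:,.0f} bytes")
```

Under these assumptions the average is a little over 5 million events per second, so the stated peak is roughly 1.5 times the flat average, and a mean event at peak is about 2 KB.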
![Page 18: Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)](https://reader034.vdocument.in/reader034/viewer/2022042619/58f9a973760da3da068b6ec7/html5/thumbnails/18.jpg)
S3 EMR
Event Producer
Fronting Kafka
Consumer Kafka
At least Once Processing
Stream Consumers
Mantis
450 Billion events / day
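The "At least Once Processing" label can be illustrated with a small simulation. This is a generic sketch of the at-least-once pattern (process first, commit the offset afterwards), not the pipeline's actual code; the function name, the in-memory `log`, and the `fail_at` crash switch are all hypothetical.

```python
def consume_at_least_once(log, process, committed=0, fail_at=None):
    """Process records starting at `committed`; commit only after processing.

    `fail_at` simulates a crash right after processing that offset,
    before its commit is recorded. Returns the last committed offset.
    """
    offset = committed
    while offset < len(log):
        process(log[offset])
        if fail_at is not None and offset == fail_at:
            return committed          # crash: this record's commit is lost
        committed = offset + 1        # commit happens after processing
        offset = committed
    return committed


log = ["e0", "e1", "e2", "e3"]
seen = []

# First run crashes after processing e2 but before committing it.
committed = consume_at_least_once(log, seen.append, fail_at=2)

# Restart from the last committed offset: e2 is processed again.
committed = consume_at_least_once(log, seen.append, committed=committed)

print(seen)
print(committed)
```

No event is lost across the restart, but `e2` is delivered twice, which is why consumers in an at-least-once pipeline generally need to tolerate duplicates.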
![Page 19: Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)](https://reader034.vdocument.in/reader034/viewer/2022042619/58f9a973760da3da068b6ec7/html5/thumbnails/19.jpg)
![Page 20: Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)](https://reader034.vdocument.in/reader034/viewer/2022042619/58f9a973760da3da068b6ec7/html5/thumbnails/20.jpg)
![Page 21: Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)](https://reader034.vdocument.in/reader034/viewer/2022042619/58f9a973760da3da068b6ec7/html5/thumbnails/21.jpg)
![Page 22: Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)](https://reader034.vdocument.in/reader034/viewer/2022042619/58f9a973760da3da068b6ec7/html5/thumbnails/22.jpg)
What’s Missing?
• Backpressure
• Direct API for Kafka
• Improve Cloud Multi-tenancy
![Page 23: Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)](https://reader034.vdocument.in/reader034/viewer/2022042619/58f9a973760da3da068b6ec7/html5/thumbnails/23.jpg)
Backpressure
[Diagram: Driver (JobScheduler, JobGenerator, DStreamGraph); timeline 00:00 to 00:10; Unbounded … Ops Queue]
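One reading of this slide is that a batch is generated at every interval whether or not earlier batches have finished, so pending work piles up when processing is slower than the batch interval. The following is a toy single-slot queue model of that effect, not Spark's scheduler; the function name and parameters are hypothetical.

```python
def pending_batches(batch_interval, processing_time, num_batches):
    """Return the number of unfinished batches at each batch boundary.

    Toy model: one batch arrives every `batch_interval` seconds and
    batches are processed one at a time, each taking `processing_time`.
    """
    depths = []
    finish_times = []
    free_at = 0  # time at which the single processing slot becomes free
    for k in range(num_batches):
        arrival = k * batch_interval
        start = max(arrival, free_at)
        free_at = start + processing_time
        finish_times.append(free_at)
        # Batches generated so far minus those already finished.
        done = sum(1 for f in finish_times if f <= arrival)
        depths.append(k + 1 - done)
    return depths


# Processing keeps up: the queue stays flat.
stable = pending_batches(batch_interval=2, processing_time=1, num_batches=6)

# Processing takes twice the batch interval: the queue keeps growing.
overloaded = pending_batches(batch_interval=1, processing_time=2, num_batches=6)

print(stable)
print(overloaded)
```

In the overloaded case the depth grows for as long as input keeps arriving, which is the situation backpressure is meant to prevent by slowing ingestion.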
![Page 24: Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)](https://reader034.vdocument.in/reader034/viewer/2022042619/58f9a973760da3da068b6ec7/html5/thumbnails/24.jpg)
Backpressure
SPARK-7398 – Add backpressure to Spark streaming
SPARK-6691 – Add dynamic rate limiter to Spark streaming
![Page 25: Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)](https://reader034.vdocument.in/reader034/viewer/2022042619/58f9a973760da3da068b6ec7/html5/thumbnails/25.jpg)
Backpressure
Backpressure implementation slated for Spark 1.5 release
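To make the idea of a dynamic rate limiter concrete, here is a deliberately simplified estimator that sets the next ingest limit to the throughput observed on the last batch. It is an illustration only and is not the algorithm tracked in the tickets above; the function name, the `min_limit` floor, and the sample processing times are hypothetical.

```python
def estimate_rate_limit(records_in_batch, processing_time_s,
                        previous_limit, min_limit=100.0):
    """Return a records-per-second ingest limit for the next batch.

    Simplification: the new limit is the throughput actually achieved
    on the last batch, never dropping below `min_limit`.
    """
    if records_in_batch <= 0 or processing_time_s <= 0:
        return previous_limit  # nothing to learn from an empty batch
    observed_throughput = records_in_batch / processing_time_s
    return max(min_limit, observed_throughput)


batch_interval_s = 1.0
limit = 10_000.0
history = []

# Hypothetical processing times for three consecutive batches.
for processing_time_s in [2.0, 1.0, 0.5]:
    records = int(limit * batch_interval_s)  # batch filled up to the limit
    limit = estimate_rate_limit(records, processing_time_s, limit)
    history.append(limit)

print(history)
```

The limit halves after the slow first batch, holds once processing time matches the batch interval, and rises again when processing speeds up. In Spark 1.5 and later, the built-in mechanism is switched on with the `spark.streaming.backpressure.enabled` setting.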
![Page 26: Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)](https://reader034.vdocument.in/reader034/viewer/2022042619/58f9a973760da3da068b6ec7/html5/thumbnails/26.jpg)
Direct API for Kafka
Spark 1.3 Kafka Integration
Spark 1.2 Receiver based Kafka Integration
2x Faster
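The slide does not explain how the direct integration differs from the receiver-based one. As commonly described, the direct approach has the driver pick an explicit offset range per Kafka partition for each batch, and tasks then read exactly those ranges. The sketch below models only that planning step in plain Python; `OffsetRange`, `plan_batch`, and `max_per_partition` are hypothetical names, not the Spark API.

```python
from typing import Dict, List, NamedTuple, Optional


class OffsetRange(NamedTuple):
    topic: str
    partition: int
    from_offset: int   # inclusive
    until_offset: int  # exclusive


def plan_batch(topic: str,
               consumed: Dict[int, int],
               latest: Dict[int, int],
               max_per_partition: Optional[int] = None) -> List[OffsetRange]:
    """Choose the offsets each partition should read in the next batch.

    `consumed` maps partition -> next offset to read; `latest` maps
    partition -> current end of the log. An optional cap bounds how
    many messages a single batch may take from one partition.
    """
    ranges = []
    for partition in sorted(latest):
        start = consumed.get(partition, 0)
        end = latest[partition]
        if max_per_partition is not None:
            end = min(end, start + max_per_partition)
        ranges.append(OffsetRange(topic, partition, start, end))
    return ranges


batch = plan_batch(
    "events",
    consumed={0: 100, 1: 250},
    latest={0: 180, 1: 1000},
    max_per_partition=500,
)
for r in batch:
    print(r)
```

Because each batch is defined by explicit offsets, a failed batch can be re-read from the same ranges, and the per-partition cap gives a natural place to apply a rate limit.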
![Page 27: Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)](https://reader034.vdocument.in/reader034/viewer/2022042619/58f9a973760da3da068b6ec7/html5/thumbnails/27.jpg)
Direct API for Kafka
Enhance
• Prefetch messages
• Connection reuse (pooling)
![Page 28: Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)](https://reader034.vdocument.in/reader034/viewer/2022042619/58f9a973760da3da068b6ec7/html5/thumbnails/28.jpg)
Cloud Multi-tenancy
Improve
[Diagram: Cloud Scheduler, Mesos Framework; Mesos Slave with two Docker containers, each holding an Executor with two Tasks; Mesos Slave with a Docker container holding the Spark Driver]
![Page 29: Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)](https://reader034.vdocument.in/reader034/viewer/2022042619/58f9a973760da3da068b6ec7/html5/thumbnails/29.jpg)
Next?
Measuring Spark Streaming Latencies
Spark Streaming Cloud Multi-tenancy
![Page 30: Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)](https://reader034.vdocument.in/reader034/viewer/2022042619/58f9a973760da3da068b6ec7/html5/thumbnails/30.jpg)