using spark @ conviva 29 aug 2013
DESCRIPTION
Using Spark @ Conviva 29 Aug 2013. Summary. Who are we? What is the problem we needed to solve? How was Spark essential to the solution? What can Spark enable us to do in the future?. Some context…. Quality has a huge impact on user engagement in online video. What we do. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/1.jpg)
Using Spark @ Conviva29 Aug 2013
![Page 2: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/2.jpg)
Summary
Who are we?What is the problem we needed to solve?How was Spark essential to the solution? What can Spark enable us to do in the future?
![Page 3: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/3.jpg)
Some context…
Quality has a huge impact on user engagement in online video
![Page 4: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/4.jpg)
What we do
Optimize end user online video experienceProvide historical and real-time metrics for online video providers and distributersMaximize quality through Adaptive Bit Rate (ABR) Content Distribution Network (CDN) switching
Enable Multi-CDN infrastructure with fine grained traffic shaping control
![Page 5: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/5.jpg)
About four billion streams per monthMostly premium content providers (e.g., HBO, ESPN, BSkyB, CNTV) but also User Generated Video sites (e.g., Ustream)Live events (e.g., NCAA March Madness, FIFA World Cup, MLB), short VoDs (e.g., MSNBC), and long VoDs (e.g., HBO, Hulu)Various streaming protocols (e.g., Flash, SmoothStreaming, HLS, HDS), and devices (e.g., PC, iOS devices, Roku, XBOX, …) Traffic from all major CDNs, including ISP CDNs (e.g., Verizon, AT&T)
What traffic do we see?
![Page 6: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/6.jpg)
The TruthVideo delivery over the internet is hard There is a big disconnect between viewers and publishers
![Page 7: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/7.jpg)
The Viewer’s Perspective
ISP & Home Net
Video Player
Start of the viewer’s perspective
?
![Page 8: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/8.jpg)
Video Source
Encoders & Video Servers
CMS and Hosting Content Delivery
Networks (CDN)
The Provider’s Perspective
?
![Page 9: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/9.jpg)
Video Source
Encoders & Video Servers
CMS and Hosting Content Delivery
Networks (CDN)
ISP & Home Net
Video Player
Closing the loop
![Page 10: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/10.jpg)
The TruthVideo delivery over the internet is hard CDN variability makes it nearly impossible to deliver high quality
everywhere with just one CDN
SIGCOMM ‘12
![Page 11: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/11.jpg)
The TruthVideo delivery over the internet is hard CDN variability makes it nearly impossible to deliver high quality all
the time with just one CDN
SIGCOMM ‘12
![Page 12: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/12.jpg)
The IdeaWhere there is heterogeneity, there is room for optimizationFor each viewer we want to decide what CDN to stream fromBut it’s difficult to model the internet, and things can rapidly change over timeSo we will make this decision based on the real-time data that we collect
![Page 13: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/13.jpg)
CDN1 (buffering ratio)
CDN2 (buffering ratio)
The Idea
Avg. buff ratio of users in Region[1] streaming
from CDN1
Avg. buff ratio of users in Region[1] streaming
from CDN2
For each CDN, partition clients by CityFor each partition compute Buffering Ratio
Seatt
le
San
Fran
cisc
o
New
York
Lond
on
Hong
Kon
g
![Page 14: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/14.jpg)
The IdeaFor each partition select best CDN and send clients to this CDN
CDN1 (buffering ratio)
CDN2 (buffering ratio)
Seatt
le
San
Fran
cisc
o
New
York
Lond
on
Hong
Kon
g
![Page 15: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/15.jpg)
The IdeaFor each partition select best CDN and send clients to this CDN
CDN1 (buffering ratio)
CDN2 (buffering ratio)
CDN2 >> CDN1
Seatt
le
San
Fran
cisc
o
New
York
Lond
on
Hong
Kon
g
![Page 16: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/16.jpg)
The IdeaFor each partition select best CDN and send clients to this CDN
CDN1 (buffering ratio)
CDN2 (buffering ratio)
Seatt
le
San
Fran
cisc
o
New
York
Lond
on
Hong
Kon
g
![Page 17: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/17.jpg)
Best CDN (buffering ratio)
The Idea
CDN1 (buffering ratio)
CDN2 (buffering ratio)
For each partition select best CDN and send clients to this CDN
Seatt
le
San
Fran
cisc
o
New
York
Lond
on
Hong
Kon
g
![Page 18: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/18.jpg)
The Idea
Best CDN (buffering ratio)
CDN1 (buffering ratio)
CDN2 (buffering ratio)
What if there are changes in performance? Se
attle
San
Fran
cisc
o
New
York
Lond
on
Hong
Kon
g
![Page 19: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/19.jpg)
The Idea
Best CDN (buffering ratio)
CDN1 (buffering ratio)
CDN2 (buffering ratio)
Use online algorithm respond to changes in the network.
Seatt
le
San
Fran
cisc
o
New
York
Lond
on
Hong
Kon
g
![Page 20: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/20.jpg)
Content owners (CMS & Origin)
CDN 1
CDN 2
CDN 3
Clients
Coordinator
Conti
nuou
s mea
sure
men
ts
Busin
ess P
olic
ies
control
How?Coordinator implementing an optimization algorithm that dynamically selects a CDN for each client based on Individual client Aggregate statistics Content owner policies
All based on real-time data
![Page 21: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/21.jpg)
What processing framework do we use?Twitter Storm Fault tolerance model affects data accuracy Non-deterministic streaming model
Roll our own Too complex No need to reinvent the wheel
Spark Easily integrates with existing Hadoop architecture Flexible, simple data model Writing map() is generally easier than writing update()
![Page 22: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/22.jpg)
Spark Usage for Production
Decision Maker
Decision Maker
Decision Maker
Compute performance metrics in spark clusterRelay performance information to decision makers
Spark ClusterSparkNode
Spark Node
Spark Node
Spark Node Spark
Node
Storage Layer
Clients
![Page 23: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/23.jpg)
Results
500 700 900 1100 1300 1500 1700 1900 2100 2300 250075%
80%
85%
90%
95%
100%Non-buffering views
Average Bitrate (Kbps)
![Page 24: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/24.jpg)
Spark’s RoleSpark development was incredibly rapid, aided both by its excellent programming interface and highly active communityExpressive: Develop complex on-line ML decision based algorithm in ~1000
lines of code Easy to prototype various algorithms
It has made scalability a far more manageable problemAfter initial teething problems, we have been running Spark in a production environment reliably for several months.
![Page 25: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/25.jpg)
Problems we facedSilent crashes…Often difficult to debug, requiring some tribal knowledgeDifficult configuration parameters, with sometimes inexplicable resultsFundamental understanding of underlying data model was essential to writing effective, stable spark programs
![Page 26: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/26.jpg)
Enforcing constraints on optimizationImagine swapping clients until an optimal solution is reached
Constrained Best CDN (buffering ratio)
CDN1 (buffering ratio)
CDN2 (buffering ratio)
50%
50%
![Page 27: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/27.jpg)
Enforcing constraints on top of optimizationSolution is found after clients have already joined.
Therefore we need to parameterize solution to clients already seen for online use.Need to compute an LP on real time dataSpark Supported it 20 LPs Each with 4000 decisions variables and 350 constraints 5 seconds.
![Page 28: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/28.jpg)
Tuning
Can’t select a CDN based solely on one metric. Select utility functions that best predict engagement
Confidence in a decision, or evaluation will depend on how much data we have collected Need to tune time window Select different attributes for separation
![Page 29: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/29.jpg)
Tuning
Need to validate algorithm changes quicklySimulation of algorithm offline, is essential
![Page 30: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/30.jpg)
Spark Usage for Simulation
SparkNode
Spark Node
Spark Node
Spark Node
Spark Node
HDFS with Production traces
Load production traces with randomized initial decisionsGenerate decision table (with artificial delay)Produce simulated decision setEvaluate decisions against actual traces to estimate expected quality improvement
![Page 31: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/31.jpg)
Future of Spark and ConvivaLeverage spark streamingUnify live and historical processingDevelop platform to build various processing ‘apps’ (e.g. Anomaly Detection, Customer Tailored Reporting) Can share the same data API Will all have consistent input
![Page 32: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/32.jpg)
In SummarySpark was able to support our initial requirement of fast fault tolerant performance computation for an on-line decision makerNew complexities like LP calculation ‘just worked’ in the existing architectureSpark has become an essential tool in our software stack
![Page 33: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/33.jpg)
Thank you!
![Page 34: Using Spark @ Conviva 29 Aug 2013](https://reader035.vdocument.in/reader035/viewer/2022062814/5681686a550346895ddeda65/html5/thumbnails/34.jpg)
Questions?