SVE: Distributed Video Processing at Facebook Scale Qi Huang, Petchean Ang, Peter Knowles, Tomasz Nykiel, Iaroslav Tverdokhlib, Amit Yajurvedi, Paul Dapolito IV, Xifan Yan, Maxim Bykov, Chuen Liang, Mohit Talwar, Abhishek Mathur, Sachin Kulkarni, Matthew Burke and Wyatt Lloyd Facebook, University of Southern California, Cornell, Princeton Presentation by Jonas Umland

Post on 07-Aug-2020

Page 1

SVE: Distributed Video Processing at Facebook Scale

Qi Huang, Petchean Ang, Peter Knowles, Tomasz Nykiel, Iaroslav Tverdokhlib, Amit Yajurvedi, Paul Dapolito IV, Xifan Yan, Maxim Bykov, Chuen Liang, Mohit Talwar, Abhishek Mathur, Sachin Kulkarni, Matthew Burke and Wyatt Lloyd

Facebook, University of Southern California, Cornell, Princeton

Presentation by Jonas Umland

Page 2

Introduction

● Every day:
  ○ 8B video views
  ○ 500M users watch 100M hours of video
  ○ Many tens of millions of uploads

Page 3

Overview

● Legacy design (MES) vs new design (SVE)
● Performance comparison
● DAG execution system
● Overload control
● Production lessons

Page 4

Full Video Pipeline

Page 5

Production Video Applications

[Table: tasks per application: 153, 22, 18, >1000]

Page 6

Monolithic Encoding Script

Page 7

Design Goals for a New Engine

Fast Robust Flexible

Page 8

SVE Architecture Overview

Page 9

SVE Architecture - Preprocessor

● Validation
● Splitting video into chunks for old clients
● DAG generation
● Storing input video
● Caching

Page 10

SVE Architecture - Scheduler & Workers

● Scheduler
  ○ Receiving the DAG from the preprocessor
  ○ Scheduling tasks
  ○ Queuing tasks when no worker is available (high- and low-priority queues)

● Worker
  ○ Executing tasks
  ○ Fetching data from the preprocessor or intermediate storage
  ○ Writing to intermediate storage
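A minimal sketch of the queueing behaviour listed above, assuming one FIFO queue per priority level and that a freed worker always drains the high-priority queue first. The class and method names (`Scheduler`, `submit`, `task_done`) are hypothetical and are not taken from SVE.

```python
from collections import deque


class Scheduler:
    """Toy scheduler: run a task at once if a worker is idle, else queue it."""

    def __init__(self, workers):
        self.idle = deque(workers)   # workers with nothing to do
        self.high = deque()          # waiting high-priority tasks
        self.low = deque()           # waiting low-priority tasks
        self.running = {}            # worker -> task currently assigned

    def submit(self, task, high_priority=False):
        if self.idle:
            # A worker is available: no queueing needed.
            self.running[self.idle.popleft()] = task
        elif high_priority:
            self.high.append(task)
        else:
            self.low.append(task)

    def task_done(self, worker):
        """Mark the worker's task finished and hand it the next queued task."""
        finished = self.running.pop(worker)
        if self.high:                # assumption: high priority always goes first
            self.running[worker] = self.high.popleft()
        elif self.low:
            self.running[worker] = self.low.popleft()
        else:
            self.idle.append(worker)
        return finished
```

With a single worker, a low-priority task submitted before a high-priority one still runs after it, because both had to wait and the high-priority queue is drained first.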

Page 11

SVE Architecture Overview - Intermediate Storage

● Caching of application metadata
● Caching of video/audio data
● Storing DAG state
● Automatic freeing of data
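The slide does not say how intermediate data is freed. One common approach is time-based expiry, sketched below purely as an assumption; `IntermediateStore`, `ttl_seconds` and the injectable `clock` are hypothetical names chosen for illustration.

```python
import time


class IntermediateStore:
    """Toy key-value store whose entries expire after a fixed time-to-live."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._items = {}             # key -> (expires_at, value)

    def put(self, key, value):
        self._items[key] = (self._clock() + self._ttl, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Assumption: expired data is freed lazily, on the next read.
            del self._items[key]
            return None
        return value

    def __len__(self):
        return len(self._items)
```

Lazy expiry keeps the sketch short; a real store would more likely also purge in the background so that unread data does not linger.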

Page 12

Overlap Upload and Encoding

Page 13

Overlap Upload and Encoding

Page 14

Parallel Processing

Page 15

Parallel Processing
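The effect of overlapping upload with encoding and of encoding chunks in parallel can be illustrated with a toy latency model. It is a sketch under strong assumptions that are not from the slides: equal-sized chunks, upload and encoding time proportional to chunk size, and no scheduling or storage overhead. The function names and the numbers used with them are hypothetical.

```python
import heapq


def sequential_latency(upload_s, encode_s):
    """Upload the whole video first, then encode it in one piece."""
    return upload_s + encode_s


def chunked_latency(upload_s, encode_s, num_chunks, num_workers):
    """Chunks upload back to back; each is encoded as soon as it has
    arrived and a worker is free."""
    chunk_upload = upload_s / num_chunks
    chunk_encode = encode_s / num_chunks
    free_at = [0.0] * num_workers    # min-heap of times at which workers free up
    finish = 0.0
    for i in range(num_chunks):
        ready = (i + 1) * chunk_upload           # chunk i fully uploaded
        worker_free = heapq.heappop(free_at)     # earliest available worker
        done = max(ready, worker_free) + chunk_encode
        heapq.heappush(free_at, done)
        finish = max(finish, done)
    return finish
```

For a hypothetical video with 60 s of upload and 120 s of encoding split into 6 chunks, the model gives 180 s sequentially, 130 s with one worker (overlap only) and 80 s with six workers (overlap plus parallel encoding).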

Page 16

Video Sync (Durably Storing)

Page 17

Video Sync (Durably Storing)

Page 18

Overall latency improvement

Page 19

DAG Execution System

Page 20

Dynamic DAG Generation

● Processing tasks depend on video properties
● Enables performance testing
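A minimal sketch of generating a DAG from video properties, assuming a made-up set of tasks and a made-up resolution ladder. The task names, the `height` and `has_audio` keys, and the functions `generate_dag` and `execution_order` are all hypothetical; only the idea that the task set depends on the video comes from the slide. The ordering uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter


def generate_dag(video):
    """Return {task: set of prerequisite tasks} for one uploaded video."""
    dag = {"split": set()}
    encodes = []
    for res in (240, 480, 720, 1080):     # hypothetical resolution ladder
        if res <= video["height"]:        # never encode above the source height
            name = f"encode_{res}p"
            dag[name] = {"split"}
            encodes.append(name)
    if video.get("has_audio"):            # audio task only if there is audio
        dag["encode_audio"] = {"split"}
        encodes.append("encode_audio")
    dag["combine"] = set(encodes)         # runs after every encoding task
    return dag


def execution_order(dag):
    """One valid order in which the tasks could be run."""
    return list(TopologicalSorter(dag).static_order())
```

A 480p video with audio yields five tasks (`split`, two video encodings, `encode_audio`, `combine`), while a 1080p video without audio yields six with a different shape, so the DAG is built per upload rather than fixed in advance.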

Page 21

Fault Tolerance Strategies

Component       Strategy
Client device   Anticipate intermittent uploads
Front-end       Replicate state externally
Preprocessor    Replicate state externally
Scheduler       Synchronously replicate state externally
Worker          Replicate in time
Task            Many retries
Storage         Replicate on multiple disks

Page 22

Retry Tasks After Recoverable Error

                                 Success rate
First try                        99.788%
2 local retries                  99.795%
1 retry on different worker      99.901%
6 retries on different workers   99.995%
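Read as cumulative stages, the table implies that the failure rate falls from 0.212% after the first try to 0.005% after all retries. The retry order can be sketched as follows, assuming the stages run in sequence: one first try plus two local retries on the same worker, then up to six retries, each on a different worker. `RecoverableError`, `run_with_retries` and the worker names are hypothetical.

```python
class RecoverableError(Exception):
    """Hypothetical marker for errors that are worth retrying."""


def run_with_retries(task, workers, local_retries=2, remote_retries=6):
    """Call task(worker) until it succeeds or all attempts are used up.

    workers[0] gets the first try and the local retries; each later
    worker gets at most one attempt.
    """
    first, *others = workers
    attempts = [first] * (1 + local_retries) + others[:remote_retries]
    last_error = None
    for worker in attempts:
        try:
            return task(worker)
        except RecoverableError as err:
            last_error = err          # remember the error and move on
    raise last_error
```

Only `RecoverableError` is caught, so any other exception propagates immediately instead of being retried.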

Page 23

Failure of 20% of Preprocessors in a Region

Page 24

Gradual Failure of 5% of Workers in a Region

Page 25

Mitigate overload

1) Delay latency-insensitive tasks
2) Delay latency-sensitive tasks and notify an engineer
3) Redirect a portion of video uploads to a different region
4) Delay video processing
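The four steps read as an escalation ladder. The sketch below assumes that each step is applied in addition to the earlier ones and that overload is summarised as an integer severity from 0 to 4; both are assumptions, since the slide does not say how overload is measured. `MITIGATIONS` and `mitigations_for` are hypothetical names.

```python
# The four steps from the slide, in escalation order.
MITIGATIONS = [
    "delay latency-insensitive tasks",
    "delay latency-sensitive tasks and notify an engineer",
    "redirect a portion of video uploads to a different region",
    "delay video processing",
]


def mitigations_for(severity):
    """Return the steps in effect at a given severity (0 = no overload)."""
    level = max(0, min(severity, len(MITIGATIONS)))  # clamp to 0..4
    return MITIGATIONS[:level]
```

For example, `mitigations_for(2)` returns the first two steps, and any severity above 4 returns all four.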

Page 26

Overload Control in Practice

Page 27

Production Lessons

● Mismatch for livestreaming
● Failures from global inconsistency
● Failures from regional inconsistency
● Continuous sandboxing

Page 28

Summary

● 3 additional forms of parallelism to reduce latency
● DAG execution system
● Robust to overload and faults
● Large-scale production insights

Page 29

Sources

Most images are extracted from the paper:

https://www.cs.princeton.edu/~wlloyd/papers/sve-sosp17.pdf

and from Qi Huang's talk:

www.qhuangcs.com/slides/sosp_sve.pptx