High-performance video streaming - ACM SIGCOMM · 2017. 10. 27.

Disk|Crypt|Net: High-performance video streaming
Ilias Marinos, Robert Watson (Cambridge), Mark Handley (UCL), Randall Stewart (Netflix)

Modern Video Streaming

• Just lots of HTTP requests for video chunks.
• Client picks chunks to adapt rate.
• Server is pretty dumb: it just has to go fast.
• HTTP/1.1 persistent connections.
• TLS becoming important (95% of YouTube traffic).

• Video is more than 50% of US Internet traffic.
• Important to make good use of expensive hardware. How fast can you go?

New iPlayer setup, Dec 2015:

• nginx on Linux, 24 cores on two Intel Xeon E5-2680 v3 processors, 512 GB DDR4 RAM, 8.6 TB RAID array of SSDs.
• 20 Gb/s per server. ← Can we improve performance?

Case study: Netflix

• FreeBSD, but tweaked:
  – Asynchronous sendfile()
    • Non-blocking zero copy from disk buffer cache to the network.
  – VM scaling
    • Fake NUMA domains to avoid lock contention.
    • Proactive cleanup of disk buffer cache.
  – RSS-assisted LRO
    • Sort incoming packets into buckets based on 5-tuple hash to optimize LRO engine efficacy.
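
A minimal Python sketch of why sorting by 5-tuple hash helps LRO, assuming an LRO engine that can only coalesce back-to-back packets of the same flow. `crc32` stands in for the NIC's RSS hash (real NICs typically use a Toeplitz hash), and `rss_sort`, `lro_merges`, and the packet dictionaries are hypothetical names for illustration, not the FreeBSD implementation.

```python
import zlib

def flow_hash(five_tuple):
    # Stand-in for the NIC's RSS hash; crc32 is an assumption, not what real NICs compute.
    return zlib.crc32(repr(five_tuple).encode())

def rss_sort(packets, n_buckets=8):
    """Bucket packets by 5-tuple hash; within a bucket, keep each flow's packets together.

    Python's sort is stable, so packets of the same flow keep their arrival order.
    """
    return sorted(packets, key=lambda p: (flow_hash(p["flow"]) % n_buckets, p["flow"]))

def lro_merges(packets):
    """Count coalescing opportunities: adjacent packets that belong to the same flow."""
    return sum(1 for prev, cur in zip(packets, packets[1:]) if prev["flow"] == cur["flow"])

if __name__ == "__main__":
    a = ("10.0.0.1", "10.0.0.2", 443, 50000, "tcp")
    b = ("10.0.0.3", "10.0.0.2", 443, 50001, "tcp")
    # Two flows arriving interleaved, as on a busy server with many clients.
    pkts = [{"flow": f, "seq": i} for i, f in enumerate([a, b, a, b, a, b])]
    print(lro_merges(pkts), lro_merges(rss_sort(pkts)))  # 0 4
```

Interleaved arrivals give the LRO engine nothing to merge; after sorting, each flow's packets are contiguous and in order, so they can be coalesced into larger segments.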

Let's Do Some Experiments

• 8-core Haswell server, 2x 40GbE NICs, 128 GB RAM, 4x Intel P3700 NVMe disks.
• Linux clients.
• Synthetic workload, middlebox for realistic RTT.

[Figure: testbed diagram: Streamer, middlebox, 40GbE switch, clients (C); delay labels in ms and μs.]

Unencrypted video streaming workload

[Figure: results with data NOT in the disk buffer cache vs. data coming from the disk buffer cache.]

• CPU utilization doubles (~2x) when fetching from disk (~350% → ~700%).

Conclusions:
• Netflix improvements are good.
• CPU utilization is a problem.
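
As a rough reading of those numbers, assuming utilization is summed over the 8 cores of the test server (so 800% would mean every core is busy), a worked conversion:

```latex
\frac{350\%}{800\%} \approx 0.44 \;(\approx 3.5 \text{ of } 8 \text{ cores busy}),
\qquad
\frac{700\%}{800\%} \approx 0.88 \;(\approx 7 \text{ of } 8 \text{ cores busy})
```

Under that assumption, the disk-bound case leaves only about one core of headroom.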

Encryption Problem:

Sendfile:
• Zero copy from disk buffer cache.

TLS:
• Different encrypted stream per user.
• Kernel is unaware of TLS.

Sendfile and TLS are fundamentally incompatible!

• Conventional TLS stack dropped Netflix from 20 Gb/s to 8.5 Gb/s.
• Netflix implemented in-kernel TLS support for sendfile!

sendfile() is NOT zero copy anymore!
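
A toy Python illustration of the incompatibility, assuming one cached page shared by every client and one session key per TLS connection. The SHA-256 counter-mode keystream is a stand-in chosen only to keep the sketch dependency-free; it is not AES-GCM or any real TLS cipher, and `keystream` and `encrypt_for` are hypothetical helpers.

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream: SHA-256 over (key || counter). NOT a real TLS cipher."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])

def encrypt_for(session_key: bytes, page: bytes) -> bytes:
    """Encrypt a shared page for one connection into a NEW buffer (the page stays plaintext)."""
    ks = keystream(session_key, len(page))
    return bytes(p ^ k for p, k in zip(page, ks))

# One 4 KiB page in the disk buffer cache, shared by every client streaming this chunk.
cached_page = bytes(range(256)) * 16

# Plain sendfile(): all connections could DMA this same page to the NIC (zero copy).
# With TLS, each connection has its own session key, so each needs a private ciphertext copy.
ct_alice = encrypt_for(b"session-key-alice", cached_page)
ct_bob = encrypt_for(b"session-key-bob", cached_page)

if __name__ == "__main__":
    print(ct_alice == ct_bob)  # False: the same page yields different bytes per connection
```

Because the ciphertext differs per connection, the shared cached page can no longer be handed to the NIC as-is; every connection costs an extra buffer and a pass over the data.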

Encrypted video streaming workload

• Performance loss (~30%) when content is fetched from SSDs.
• CPU is saturated. Memory read throughput is ~3x the network throughput!

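What's happening?

[Figure: data path through NVMe, DRAM (buffer cache, copied data, encrypted data), LLC, CPU (copy, AES, TCP) and NIC, with memory accesses numbered 1, 2, 3.]

The stack is too asynchronous. Data keeps getting flushed from the LLC and re-loaded. The system is bottlenecked on memory.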
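
A toy Python model of the effect, assuming an LRU-managed LLC that holds a fixed number of buffers and a three-stage path (copy, encrypt, transmit) that touches the same buffer at each stage. `batched` approximates an asynchronous stack in which many buffers are in flight between stages; `run_to_completion` takes each buffer through every stage before starting the next. Capacities and stage names are hypothetical; only the relative count of DRAM loads matters.

```python
from collections import OrderedDict

STAGES = ("copy", "encrypt", "transmit")  # hypothetical per-buffer processing stages

class LRUCache:
    """Tiny LRU model of the LLC, measured in whole buffers."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.dram_loads = 0

    def touch(self, buf: int) -> None:
        if buf in self.lines:
            self.lines.move_to_end(buf)         # hit: mark as most recently used
        else:
            self.dram_loads += 1                # miss: (re)load the buffer from DRAM
            self.lines[buf] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict the least recently used buffer

def batched(n_buffers: int, llc_capacity: int) -> int:
    """Asynchronous-style schedule: each stage sweeps all in-flight buffers before the next."""
    llc = LRUCache(llc_capacity)
    for _stage in STAGES:
        for buf in range(n_buffers):
            llc.touch(buf)
    return llc.dram_loads

def run_to_completion(n_buffers: int, llc_capacity: int) -> int:
    """Tight pipeline: each buffer goes through every stage before the next buffer starts."""
    llc = LRUCache(llc_capacity)
    for buf in range(n_buffers):
        for _stage in STAGES:
            llc.touch(buf)
    return llc.dram_loads

if __name__ == "__main__":
    print(batched(100, 10), run_to_completion(100, 10))  # 300 100
```

With 100 buffers in flight and room for 10, the batched schedule reloads every buffer at every stage; once the in-flight set fits in the LLC, the two schedules cost the same.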

Production Netflix Workload

• 192 GB for buffer cache, but only 10% hit ratio.
• Streamers bottlenecked on memory bandwidth.

✓ Modern NVMe SSDs have low latency & high throughput.
✓ Modern Intel CPUs DMA directly to L3 cache.

Can we eliminate the disk buffer cache completely, and fetch everything from the SSDs on demand?

Ideal Stack

[Figure: ideal stack: NVMe, DRAM, LLC, CPU (AES, TCP) and NIC, with the buffer re-used.]

To achieve this, we must:
• Fetch on demand from the SSD when TCP needs data.
• As soon as the SSD returns data, process it to completion and DMA it to the NIC.

Solution Outline

1. A TCP ACK arrives, freeing up congestion window space.
2. Trigger the stack to request more data from the SSDs to fill that congestion window.
3. The SSDs return data, placing it in the LLC.
4. The read completion event causes the application to encrypt the data in place, add TCP headers, and trigger transmission of the packets.
5. The network completion event frees the buffer, allowing it to be reused for a later disk read.

Conventional OS stack NOT suitable:
➢ Highly asynchronous; storage and network stacks are loosely coupled, relying on the VFS & buffer cache.
➢ Introduces overheads from abstraction layers (VFS, POSIX, etc.), redundant memory copies, and domain transitions (user <-> kernel).
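
The five steps above can be sketched as three event handlers in Python. This is a single-threaded model under stated assumptions, not Atlas code: `Pipeline`, `Connection`, the deques standing in for outstanding NVMe reads and the NIC TX ring, and the 1448-byte segment size are all hypothetical, and encryption and header construction are elided.

```python
from collections import deque
from dataclasses import dataclass, field

SEG = 1448  # hypothetical TCP payload bytes per packet

@dataclass
class Connection:
    cwnd: int                 # congestion window, bytes
    in_flight: int = 0        # bytes fetched or sent but not yet ACKed
    offset: int = 0           # next file offset to fetch from the SSD
    sent: list = field(default_factory=list)  # offsets handed to the NIC, in order

class Pipeline:
    def __init__(self, n_buffers: int):
        self.free = list(range(n_buffers))  # free buffer ids, used as a LIFO stack
        self.ssd_queue = deque()            # stand-in for outstanding NVMe reads
        self.nic_queue = deque()            # stand-in for packets queued on the NIC TX ring

    def on_ack(self, conn: Connection, acked: int) -> None:
        # Step 1: an ACK frees congestion window space.
        conn.in_flight -= acked
        # Step 2: request exactly enough SSD data to refill the window.
        while conn.in_flight + SEG <= conn.cwnd and self.free:
            buf = self.free.pop()
            self.ssd_queue.append((conn, buf, conn.offset))
            conn.offset += SEG
            conn.in_flight += SEG

    def on_read_complete(self) -> None:
        # Steps 3-4: the read has landed (ideally in the LLC); process it to completion now.
        conn, buf, off = self.ssd_queue.popleft()
        # (encrypt buffer `buf` in place and prepend TCP/IP headers here; elided)
        self.nic_queue.append((conn, buf, off))

    def on_tx_complete(self) -> None:
        # Step 5: the NIC has finished with the buffer; recycle it for a later disk read.
        conn, buf, off = self.nic_queue.popleft()
        conn.sent.append(off)
        self.free.append(buf)
```

In this model `in_flight` only shrinks on ACKs, not on transmit completion: a transmitted segment still occupies the congestion window until it is acknowledged, so the SSD is asked for data only when TCP can actually send it.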

The Atlas Streaming Stack

Atlas: a complete user-space stack
➢ TCP/IP stack based on a modified version of Sandstorm (SIGCOMM'14) and netmap (ATC'12).
➢ Storage handled using diskmap (no buffer cache, no sophisticated FS).
➢ Lockless, fully zero-copy stack between disk and NIC.
➢ Tight pipeline to reduce asynchrony and, ideally, save memory bandwidth (w/ DDIO).
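
To make "no buffer cache, no sophisticated FS" concrete, here is a hypothetical Python sketch of about the simplest possible layout, assuming each video file is stored as one contiguous extent on the raw NVMe namespace with a 4 KiB logical block size. This is an illustration, not diskmap's or Atlas's actual on-disk format; `Extent`, `lba_range`, and the catalog entry are made-up names.

```python
from dataclasses import dataclass

LBA_SIZE = 4096  # assumed logical block size in bytes

@dataclass(frozen=True)
class Extent:
    first_lba: int  # first logical block of the file on the raw namespace
    length: int     # file length in bytes

def lba_range(ext: Extent, offset: int, nbytes: int) -> tuple:
    """Map a byte range of a file to (starting LBA, block count) with plain arithmetic.

    No inodes, no indirect blocks, no page cache lookup: the result could be turned
    straight into a read command for the device.
    """
    if offset < 0 or nbytes <= 0 or offset + nbytes > ext.length:
        raise ValueError("range outside file")
    first = ext.first_lba + offset // LBA_SIZE
    last = ext.first_lba + (offset + nbytes - 1) // LBA_SIZE
    return first, last - first + 1

# Hypothetical catalog built once at startup: file name -> contiguous extent.
catalog = {"chunk-0001.mp4": Extent(first_lba=1000, length=10 * LBA_SIZE)}

if __name__ == "__main__":
    print(lba_range(catalog["chunk-0001.mp4"], 4000, 200))  # (1000, 2)
```

A 200-byte read starting at offset 4000 straddles a block boundary, so it needs two blocks; a real design would also have to decide how to round reads to whole blocks for DMA.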

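Diskmap Architecture

Diskmap: a kernel-bypass I/O framework for NVMe disks

[Figure: Diskmap architecture: PCIe NVMe disk; admin queue pairs; per-application submission/completion queue pairs (SQ/CQ) nvme0-1 and nvme0-2, each used by a libnvme app (cores C0, C1) across the kernel/user boundary; DMA through the I/OMMU into memory-mapped buffers.]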
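
A toy Python model of one NVMe I/O queue pair as an application might drive it from user space, assuming the standard NVMe ring conventions: the host advances the submission queue tail (the doorbell write), the controller posts completion entries carrying a phase tag that it inverts on every wrap, and the host polls the completion queue for entries whose phase matches what it expects. `QueuePair` and its methods are hypothetical, not diskmap's or libnvme's API, and the model omits doorbell registers, CQ head updates to the controller, and flow control between the two rings.

```python
class QueuePair:
    """Toy NVMe I/O queue pair (hypothetical; not diskmap's or libnvme's API)."""

    def __init__(self, depth: int):
        self.depth = depth
        self.sq = [None] * depth
        self.sq_head = 0               # next entry the controller will consume
        self.sq_tail = 0               # next free slot; advancing it "rings the doorbell"
        self.cq = [(None, 0)] * depth  # (command id, phase tag); zeroed memory has phase 0
        self.cq_head = 0               # next entry the host will inspect
        self.cq_tail = 0               # next slot the controller will write
        self.dev_phase = 1             # phase tag the controller writes on its current pass
        self.host_phase = 1            # phase tag the host expects for a fresh completion

    def submit(self, cid: int) -> None:
        """Host: enqueue a command (one slot stays empty to tell full from empty)."""
        nxt = (self.sq_tail + 1) % self.depth
        if nxt == self.sq_head:
            raise RuntimeError("submission queue full")
        self.sq[self.sq_tail] = cid
        self.sq_tail = nxt

    def device_process(self) -> None:
        """Simulated controller: consume queued commands and post completions."""
        while self.sq_head != self.sq_tail:
            cid = self.sq[self.sq_head]
            self.sq_head = (self.sq_head + 1) % self.depth
            self.cq[self.cq_tail] = (cid, self.dev_phase)
            self.cq_tail = (self.cq_tail + 1) % self.depth
            if self.cq_tail == 0:
                self.dev_phase ^= 1  # controller inverts the phase tag on each wrap

    def poll(self) -> list:
        """Host: reap new completions by checking phase tags; no system call involved."""
        done = []
        while True:
            cid, phase = self.cq[self.cq_head]
            if phase != self.host_phase:
                break
            done.append(cid)
            self.cq_head = (self.cq_head + 1) % self.depth
            if self.cq_head == 0:
                self.host_phase ^= 1  # expect the inverted phase on the next pass
        return done
```

The phase tag is what lets a user-space poller tell a fresh completion from a stale one without reading any device register, which is why the queue memory can simply be mapped into the application.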

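The Atlas Execution Pipeline

[Figure: Atlas execution pipeline: in user space, the web server and TCP/IP stack sit on libnmio (NIC RX/TX rings) and libnvme (NVMe SQ/CQ), sharing buffers; the kernel/user boundary and seven numbered steps are marked.]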

Atlas vs. Netflix, Unencrypted Content

[Figure: throughput (Gb/s) and LLC misses/s (×10^7).]

• Netflix needs 8 cores, Atlas only needs 4.
• 15% better throughput than Netflix when cache hit ratio is low.
• Almost no CPU stalls: data is in the LLC when we want it.

Atlas vs. Netflix, Encrypted Content

[Figure: throughput (Gb/s) and memory read throughput.]

• When cache hit ratio is low, 50% more throughput using half the cores.
• Almost half the memory reads for each packet sent.
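
The two headline numbers combine into a per-core figure. Writing T and C for Netflix's throughput and core count at low cache hit ratio (symbols introduced here only for this illustration):

```latex
\frac{1.5\,T / (0.5\,C)}{T / C} = \frac{1.5}{0.5} = 3
```

That is, roughly 3x the throughput per core for encrypted content.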

Atlas memory usage

[Figure: two panels, "When LLC/CPU is NOT saturated" and "When LLC/CPU is saturated"; each shows NVMe, DRAM, LLC, CPU (AES, TCP), TCP packets to the NIC, and buffer re-use.]

Netmap doesn't provide a low-delay, fine-grained way to communicate DMA completions. We can't reuse buffers fast enough (no LIFO stack), and this contributes some extra cache pressure.
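
A small Python model of why a LIFO free list matters, assuming a pool of buffers of which a fixed number are busy at any time (in flight to the NIC) and that the most recently freed buffer is the one most likely to still be in the LLC. `working_set` is a hypothetical helper; it counts how many distinct buffers the stack ends up touching, a proxy for cache footprint.

```python
from collections import deque

def working_set(n_buffers: int, in_flight: int, n_ops: int, lifo: bool) -> int:
    """Count distinct buffers touched while `in_flight` buffers are busy at any time.

    Each op takes a free buffer for a disk read; once more than `in_flight` are busy,
    the oldest transmission completes and its buffer returns to the free list.
    """
    free = deque(range(n_buffers))
    busy = deque()
    touched = set()
    for _ in range(n_ops):
        buf = free.pop() if lifo else free.popleft()  # LIFO reuses the most recently freed buffer
        touched.add(buf)
        busy.append(buf)
        if len(busy) > in_flight:
            free.append(busy.popleft())
    return len(touched)

if __name__ == "__main__":
    print(working_set(64, 4, 1000, lifo=True), working_set(64, 4, 1000, lifo=False))  # 5 64
```

With 4 buffers in flight out of 64, the LIFO policy keeps cycling through just 5 buffers, while FIFO reuse walks the entire pool, so every buffer eventually competes for LLC space.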

Summary

• Netflix addressed all the low-hanging fruit.
  – Very fast, but now bottlenecked on memory.
• Atlas is a specialized stack.
  – Puts SSD directly in TCP control loop.
  – Immediately processes disk reads to completion and transmits.
  – 50% throughput improvement with encrypted content, close to 50% reduction in memory reads.
• Netflix inspired by Atlas.
  – Now experimenting with how to directly trigger encryption off of disk DMA completions in their FreeBSD stack.