Assessment of Data Path Implementations for Download and Streaming
Pål Halvorsen
Visit at Technische Universität Braunschweig, March 2007
Overview
RELAY overview???
Existing mechanisms in Linux
Tested enhancements
Ongoing
Summary and Conclusions
RELAY: Resource Utilization in Large-Scale Time-Dependent Systems
Picture Today
[Figure: VoD, WWW, P2P and live-event services delivered over networks; performance??]
RELAY
[Figure: system layers: user space, kernel, drivers, hardware; protocol layers: application, transport, network, link, physical]
• System support for improved resource utilization & QoS
• Multimedia (game and video) servers
• …
Some current areas
• protocols for interactive applications
• multicast group maintenance
• latency hiding
• resource availability adaptation
• hybrid P2P streaming / streaming to mobile devices
• asymmetric multiprocessor scheduling
• …
Linux Data Path Implementations
Delivery Systems
[Figure: delivery system attached to the network through its bus(es)]
Delivery Systems
[Figure: application in user space; file system and communication system in kernel space; bus(es) underneath]
Intel Hub Architecture
[Figure: Pentium 4 processor (registers, cache(s)); memory controller hub with RDRAM; I/O controller hub with PCI slots, network card and disk; data path from disk through file system, application and communication system to the network card]
Several in-memory data movements and context switches
Cost of Data Transfers
• Data copy operations are expensive
− consume CPU, memory, hub, bus and interface resources (proportional to size)
− profiling shows that ~40% of CPU time is consumed by copying data in a disk-network scenario
− the speed gap between memory and CPU is increasing
− different access times to different banks
• System calls cause many switches between user and kernel space
− ~450 ns on a 933 MHz Pentium III
− ~920 ns on a 1.7 GHz Pentium 4
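To put the system-call figures above into perspective, the following Python sketch estimates the time spent on user/kernel switches alone when a file is sent packet by packet. It is a back-of-the-envelope model, not a measurement: the 1 GiB file size, the 1 KiB payload per packet, and the two- and four-calls-per-packet cases are assumptions chosen for illustration, and the ~920 ns figure is treated as a fixed cost per call.

```python
def syscall_overhead_seconds(file_bytes, payload_bytes, calls_per_packet, ns_per_call):
    """Estimate the time spent only on system-call switches for one transfer."""
    packets = -(-file_bytes // payload_bytes)  # ceiling division
    return packets * calls_per_packet * ns_per_call * 1e-9


GIB = 1 << 30
# Assumption: 1 KiB of payload per packet; the slides do not state a packet size.
two_calls = syscall_overhead_seconds(GIB, 1024, 2, 920)
four_calls = syscall_overhead_seconds(GIB, 1024, 4, 920)
print(f"2 calls/packet: {two_calls:.2f} s, 4 calls/packet: {four_calls:.2f} s")
```

Under these assumptions the switches alone cost roughly 1.9 s and 3.9 s per gigabyte, before any copy is counted.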
Observation and Question
A lot of research has been performed in this area!
(IO-Lite, splice, MMBUF, stream, sendfile, …)
BUT, what is the status today of commodity OSes?
Content Download
[Figure: application in user space; file system and communication system in kernel space; bus(es) underneath]
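Content Download: read / send
[Figure: read copies data from the page cache into the application buffer; send copies it into the socket buffer; DMA transfers move data from disk into the page cache and from the socket buffer to the network card]
2n copy operations, 2n system calls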
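A minimal Python sketch of the read / send path above, where `os.read` and `socket.sendall` stand in for the C `read` and `send` calls. The helper names `read_send` and `demo`, the 4 KiB chunk size, and the local socket pair used instead of a network connection are assumptions for illustration.

```python
import os
import socket
import tempfile
import threading


def read_send(path, sock, chunk=4096):
    """Send a file with one read and one send per chunk (two copies each)."""
    fd = os.open(path, os.O_RDONLY)
    sent = 0
    try:
        while True:
            buf = os.read(fd, chunk)   # copy: page cache -> application buffer
            if not buf:                # an empty read means end of file
                break
            sock.sendall(buf)          # copy: application buffer -> socket buffer
            sent += len(buf)
    finally:
        os.close(fd)
    return sent


def demo(size=10000):
    """Push a temporary file through a local socket pair and verify it."""
    data = os.urandom(size)
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(data)
    a, b = socket.socketpair()
    received = bytearray()

    def drain():
        while True:
            part = b.recv(65536)
            if not part:
                break
            received.extend(part)

    reader = threading.Thread(target=drain)
    reader.start()
    try:
        sent = read_send(f.name, a)
    finally:
        a.close()                      # signals end of stream to the reader
        reader.join()
        b.close()
        os.unlink(f.name)
    return sent, bytes(received) == data
```

Every chunk crosses the user/kernel boundary twice, which is where the 2n copies and 2n system calls on the slide come from.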
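Content Download: mmap / send
[Figure: mmap maps the page cache into the application; send copies the data into the socket buffer; DMA transfers move data from disk into the page cache and from the socket buffer to the network card]
n copy operations, 1 + n system calls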
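The mmap / send path above can be sketched in Python with the standard `mmap` module. This is an illustration under assumptions, not the measured implementation: `mmap_send` and `demo` are hypothetical helper names, the 4 KiB chunk size is arbitrary, and a local socket pair replaces the network connection.

```python
import mmap
import os
import socket
import tempfile
import threading


def mmap_send(path, sock, chunk=4096):
    """Map the file once, then send slices of the mapping chunk by chunk."""
    sent = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:  # one mmap call
            with memoryview(m) as view:
                for off in range(0, len(view), chunk):
                    with view[off:off + chunk] as part:
                        sock.sendall(part)  # only copy: mapped pages -> socket buffer
                        sent += len(part)
    return sent


def demo(size=10000):
    """Push a temporary file through a local socket pair and verify it."""
    data = os.urandom(size)
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(data)
    a, b = socket.socketpair()
    received = bytearray()

    def drain():
        while True:
            part = b.recv(65536)
            if not part:
                break
            received.extend(part)

    reader = threading.Thread(target=drain)
    reader.start()
    try:
        sent = mmap_send(f.name, a)
    finally:
        a.close()
        reader.join()
        b.close()
        os.unlink(f.name)
    return sent, bytes(received) == data
```

The memory views are released before the mapping is closed, since Python refuses to close a mapping that still has exported buffers.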
Content Download: sendfile
[Figure: sendfile appends a descriptor for the page-cache data to the socket buffer; DMA transfer from disk into the page cache; gather DMA transfer to the network card]
0 copy operations, 1 system call
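For comparison, a hedged Python sketch of the sendfile path above. `socket.sendfile` uses `os.sendfile` where it is available and falls back to plain `send` otherwise, so whether the transfer is really copy-free depends on the platform. The helper names and the local socket pair are assumptions for illustration.

```python
import os
import socket
import tempfile
import threading


def sendfile_download(path, sock):
    """Hand the whole file to the kernel in one high-level call."""
    with open(path, "rb") as f:
        return sock.sendfile(f)  # returns the number of bytes sent


def demo(size=10000):
    """Push a temporary file through a local socket pair and verify it."""
    data = os.urandom(size)
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(data)
    a, b = socket.socketpair()
    received = bytearray()

    def drain():
        while True:
            part = b.recv(65536)
            if not part:
                break
            received.extend(part)

    reader = threading.Thread(target=drain)
    reader.start()
    try:
        sent = sendfile_download(f.name, a)
    finally:
        a.close()
        reader.join()
        b.close()
        os.unlink(f.name)
    return sent, bytes(received) == data
```

The application never touches the file contents here, which matches the slide's picture of a descriptor being appended instead of data being copied.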
Content Download: Results
• Tested transfer of a 1 GB file on Linux 2.6
• Both UDP (with enhancements) and TCP
[Figure: result plots for UDP and TCP]
Streaming
[Figure: application in user space; file system and communication system in kernel space; bus(es) underneath]
Streaming: read / send
[Figure: read copies data from the page cache into the application buffer; send copies it into the socket buffer; DMA transfers on the disk side and the network side]
2n (3n) copy operations, 2n system calls
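Streaming: read / writev
[Figure: read copies data from the page cache into the application buffer; writev copies from the application into the socket buffer; three copy arrows in total; DMA transfers on the disk side and the network side]
3n copy operations, 2n system calls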
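The read / writev variant above can be sketched in Python, with `socket.sendmsg` acting as the gather write that places an RTP header and the payload into one datagram. Several details are assumptions for illustration only: the helper names, payload type 96, the SSRC value, timestamps derived from the byte offset, the 1000-byte payload, and a connected Unix datagram socket pair standing in for the UDP socket.

```python
import os
import socket
import struct
import tempfile

RTP_HEADER = struct.Struct("!BBHII")  # fixed 12-byte RTP header layout (RFC 3550)


def rtp_header(seq, timestamp, ssrc, payload_type=96):
    """Build a minimal RTP header: version 2, no padding, extension or CSRCs."""
    return RTP_HEADER.pack(0x80, payload_type & 0x7F, seq & 0xFFFF,
                           timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def stream_read_writev(path, sock, payload_size=1000, ssrc=0x1234):
    """One read plus one gather write per packet."""
    fd = os.open(path, os.O_RDONLY)
    seq = 0
    try:
        while True:
            payload = os.read(fd, payload_size)  # copy: page cache -> application
            if not payload:
                break
            header = rtp_header(seq, seq * payload_size, ssrc)
            sock.sendmsg([header, payload])      # header and payload copied into one datagram
            seq += 1
    finally:
        os.close(fd)
    return seq


def demo():
    """Stream a 2500-byte file and reassemble it from the received packets."""
    data = os.urandom(2500)
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(data)
    tx, rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        n = stream_read_writev(f.name, tx)
        packets = [rx.recv(2048) for _ in range(n)]
    finally:
        tx.close()
        rx.close()
        os.unlink(f.name)
    seqs = [RTP_HEADER.unpack(p[:RTP_HEADER.size])[2] for p in packets]
    payload = b"".join(p[RTP_HEADER.size:] for p in packets)
    return seqs, payload == data
```

Each packet costs two system calls (read and the gather write), in line with the 2n figure on the slide.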
Streaming: mmap / send
[Figure: the file is mapped once with mmap; each packet uses cork, send, send, uncork; two copies into the socket buffer; DMA transfers on the disk side and the network side]
2n copy operations, 1 + 4n system calls
Streaming: mmap / writev
[Figure: the file is mapped once with mmap; writev sends each packet with two copies into the socket buffer; DMA transfers on the disk side and the network side]
2n copy operations, 1 + n system calls
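A hedged Python sketch of the mmap / writev variant above: the file is mapped once, and each packet is sent with a single gather write (`socket.sendmsg`) that combines an RTP header with a slice of the mapping. The helper names, payload type 96, the SSRC value, the use of the byte offset as timestamp, and the Unix datagram socket pair standing in for a UDP socket are all assumptions for illustration.

```python
import mmap
import os
import socket
import struct
import tempfile

RTP_HEADER = struct.Struct("!BBHII")  # fixed 12-byte RTP header layout (RFC 3550)


def rtp_header(seq, timestamp, ssrc, payload_type=96):
    """Build a minimal RTP header: version 2, no padding, extension or CSRCs."""
    return RTP_HEADER.pack(0x80, payload_type & 0x7F, seq & 0xFFFF,
                           timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def stream_mmap_writev(path, sock, payload_size=1000, ssrc=0x1234):
    """One mmap for the file, then one gather write per packet."""
    seq = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            with memoryview(m) as view:
                for off in range(0, len(view), payload_size):
                    with view[off:off + payload_size] as payload:
                        # header and mapped payload are both copied into the datagram
                        sock.sendmsg([rtp_header(seq, off, ssrc), payload])
                    seq += 1
    return seq


def demo():
    """Stream a 2500-byte file and reassemble it from the received packets."""
    data = os.urandom(2500)
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(data)
    tx, rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        n = stream_mmap_writev(f.name, tx)
        packets = [rx.recv(2048) for _ in range(n)]
    finally:
        tx.close()
        rx.close()
        os.unlink(f.name)
    payload = b"".join(p[RTP_HEADER.size:] for p in packets)
    return n, payload == data
```

Compared with read / writev, the per-packet read disappears, which is the step from 2n to 1 + n system calls on the slides.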
Streaming: sendfile
[Figure: each packet uses cork, send, sendfile, uncork; send copies into the socket buffer, sendfile appends a descriptor; DMA transfer from disk into the page cache; gather DMA transfer to the network card]
n copy operations, 4n system calls
Streaming: Results
• Tested streaming of a 1 GB file on Linux 2.6
• RTP over UDP
[Figure: results for the streaming mechanisms, with TCP sendfile (content download) for comparison]
• Compared to not sending an RTP header over UDP, we get an increase of 29% (additional send call)
• More copy operations and system calls are required, so there is potential for improvement
Enhanced Streaming Data Paths
Enhanced Streaming: mmap / msend
[Figure: the file is mapped once with mmap; each packet uses cork, send, msend, uncork; msend replaces the second send and appends a descriptor instead of copying; DMA transfer from disk; gather DMA transfer to the network card]
msend allows sending data from an mmap'ed file without a copy
n copy operations, 1 + 4n system calls
Enhanced Streaming: mmap / rtpmsend
[Figure: the file is mapped once with mmap; rtpmsend replaces the cork, send, msend, uncork sequence; one copy and one appended descriptor per packet; DMA transfer from disk; gather DMA transfer to the network card]
RTP header copy integrated into the msend system call
n copy operations, 1 + n system calls
Enhanced Streaming: mmap / krtpmsend
[Figure: krtpmsend replaces the per-packet rtpmsend calls; an RTP engine inside the kernel supplies the headers; descriptors appended to the socket buffer; DMA transfer from disk; gather DMA transfer to the network card]
An RTP engine in the kernel adds RTP headers
0 copy operations, 1 system call
Enhanced Streaming: rtpsendfile
[Figure: rtpsendfile replaces the cork, send, sendfile, uncork sequence; one copy and one appended descriptor per packet; DMA transfer from disk; gather DMA transfer to the network card]
RTP header copy integrated into the sendfile system call
n copy operations, n system calls
Enhanced Streaming: krtpsendfile
[Figure: krtpsendfile replaces the per-packet rtpsendfile calls; an RTP engine inside the kernel supplies the headers; descriptors appended to the socket buffer; DMA transfer from disk; gather DMA transfer to the network card]
An RTP engine in the kernel adds RTP headers
0 copy operations, 1 system call
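The per-mechanism counts quoted on the streaming slides can be collected into a small Python model to compare them for a given number of packets n. The coefficients are taken from the slides; the table layout and the function name `counts` are assumptions, and the plain read / send case is left out because the slide gives its copy count ambiguously as "2n (3n)".

```python
# name: (copies per packet, fixed system calls, system calls per packet)
MECHANISMS = {
    "read/writev":    (3, 0, 2),
    "mmap/send":      (2, 1, 4),
    "mmap/writev":    (2, 1, 1),
    "sendfile":       (1, 0, 4),
    "mmap/msend":     (1, 1, 4),
    "mmap/rtpmsend":  (1, 1, 1),
    "mmap/krtpmsend": (0, 1, 0),
    "rtpsendfile":    (1, 0, 1),
    "krtpsendfile":   (0, 1, 0),
}


def counts(n):
    """Return {mechanism: (copy operations, system calls)} for n packets."""
    return {
        name: (copies * n, fixed + per_packet * n)
        for name, (copies, fixed, per_packet) in MECHANISMS.items()
    }


def print_table(n):
    """Print the mechanisms ordered by system calls, then by copies."""
    rows = sorted(counts(n).items(), key=lambda kv: (kv[1][1], kv[1][0]))
    for name, (copies, calls) in rows:
        print(f"{name:<15} copies={copies:>10} syscalls={calls:>10}")
```

Calling `print_table(1_000_000)`, for instance, lists the two in-kernel RTP variants first with a single system call each, and the cork-based mechanisms last.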
Enhanced Streaming: Results
• Tested streaming of a 1 GB file on Linux 2.6
• RTP over UDP
[Figure: results for TCP sendfile (content download), the existing streaming mechanism, the mmap based mechanisms and the sendfile based mechanisms; annotations: ~27% improvement and ~25% improvement]
Ongoing Work
Enhanced Streaming: rtpsendfile
[Figure: rtpsendfile with one copy and one appended descriptor per packet; DMA transfer from disk; gather DMA transfer to the network card]
n copy operations, n system calls
Calls like writev, sendfilev, … exist
Enhanced Streaming: sendfilew
[Figure: sendfilew call taking entries with len, off, src_fd and flags; copies and appended descriptors in the socket buffer; DMA transfer from disk; gather DMA transfer to the network card]
Batched system call enabling an arbitrary interleaving of blocks from files and user-space buffers to be sent as one or more packets
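The slide names only the fields len, off, src_fd and flags, so the following Python sketch is a purely hypothetical user-space emulation of what a batched interface of this kind could look like. It is not the proposed system call: `FileBlock`, `sendfilew_emulated` and the nested-list packet description are invented for illustration, flags are omitted, and the emulation copies data and issues several system calls per packet, which is exactly what the real call is meant to avoid.

```python
import os
import socket
import tempfile
from dataclasses import dataclass


@dataclass
class FileBlock:
    """A block taken from a file (hypothetical descriptor type)."""
    src_fd: int
    off: int
    length: int  # corresponds to "len" on the slide


def sendfilew_emulated(sock, packets):
    """Send each inner list of segments as one datagram.

    A segment is either a FileBlock or a bytes object from user space.
    """
    count = 0
    for segments in packets:
        parts = []
        for seg in segments:
            if isinstance(seg, FileBlock):
                parts.append(os.pread(seg.src_fd, seg.length, seg.off))
            else:
                parts.append(seg)
        sock.sendmsg(parts)  # gather the interleaved parts into one packet
        count += 1
    return count


def demo():
    """Interleave user-space bytes with file blocks in two packets."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"0123456789")
    fd = os.open(f.name, os.O_RDONLY)
    tx, rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        sendfilew_emulated(tx, [
            [b"HDR", FileBlock(fd, 2, 4)],
            [FileBlock(fd, 6, 4), b"END"],
        ])
        return rx.recv(64), rx.recv(64)
    finally:
        tx.close()
        rx.close()
        os.close(fd)
        os.unlink(f.name)
```

The point of the sketch is the shape of the request: one call describes several packets, each an ordered mix of file blocks and user-space buffers.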
Conclusions
• sendfile works well for download scenarios
• Current commodity operating systems still pay a high price for streaming services
• However, small changes in the system call layer might be sufficient to remove most of the overhead
• In conclusion, commodity operating systems still have potential for improvement with respect to streaming support
• What can we hope to be supported?
Questions??