![Page 1: 1 Combining Events and Threads for Scalable Network Services Peng Li and Steve Zdancewic University of Pennsylvania PLDI 2007, San Diego](https://reader035.vdocument.in/reader035/viewer/2022062713/56649f575503460f94c7b568/html5/thumbnails/1.jpg)
1
Combining Events and Threads for Scalable Network Services
Peng Li and Steve Zdancewic
University of Pennsylvania
PLDI 2007, San Diego
2
Overview
• A Haskell framework for massively concurrent network applications
  - Servers, P2P systems, load generators
• Massive concurrency ::= 1,000 threads? (easy)
  | 10,000 threads? (common)
  | 100,000 threads? (challenging)
  | 1,000,000 threads? (20 years later?)
  | 10,000,000 threads? (in 15 minutes)
• How to write such programs?
  - The very first decision to make: the programming model
  - Shall we use threads or events?
Haskell: a lazy, purely functional programming language (http://www.haskell.org)
3
Threads vs. Events

The multithreaded model:
• One thread ↔ one client
• Synchronous I/O
• Scheduling: OS/runtime libs

    int send_data(int fd1, int fd2) {
      while (!EOF(fd1)) {
        size = read_chunk(fd1, buf, count);   /* read from the source descriptor */
        write_chunk(fd2, buf, size);          /* write to the destination descriptor */
      }
      …

The event-driven model:
• One thread ↔ 10000 clients
• Asynchronous I/O
• Scheduling: programmer

    while (1) {
      nfds = epoll_wait(kdpfd, events, MAXEVT, -1);
      for (n = 0; n < nfds; ++n)
        handle_event(events[n]);
      …

| | Threads | Events |
|---|---|---|
| Expressiveness and abstraction (for programming each client) | Synchronous I/O + intuitive control-flow primitives | Finite state machines / continuation-passing style (CPS) programming |
| Flexibility and control (for resource scheduling) | Baked into OS/runtime, difficult to customize | Programmer has complete control, tailored to each application’s needs |

“Why threads are a bad idea (for most purposes)” [USENIX ATC 1999]
“Why events are a bad idea (for high-concurrency servers)” [HotOS 2003]
4
Can we get the best of both worlds?
One application program:
• Programming with each client: threads
  - Synchronous I/O
  - Intuitive control-flow primitives
• Resource scheduling: events
  - Written as part of the application
  - Tailored to application’s needs
• The bridge between threads and events? (some kind of “continuation” support)
5
Roads to lightweight, application-level concurrency
• Direct language support for continuations
  - Good if you have them
• Source-to-source CPS translations
  - Requires hacking on compiler/runtime
  - Often not very elegant
• Other solutions? (no language support) (no compiler/runtime hacks)
6
The poor man’s concurrency monad
• “A poor man’s concurrency monad” by Koen Claessen, JFP 1999 (Functional Pearl)
• The thread interface: the CPS monad
• The event interface: a lazy, tree-like data structure called “trace”

    -- Multithreaded code (written against the thread interface)
    server_loop s = do {
      sock <- sock_accept s;
      sys_fork (client_loop sock);
      server_loop s;
    }
    client_loop sock = do {
      sock_send sock buf;
      sock_close sock;
    }
    -- "buf" replaces the slide's "data", which is a reserved word in Haskell
    sock_send sock buf = do {
      ...
      n <- sys_nbio (write_nb ...);
      ...;
      sys_epoll_wait sock EPOLL_READ;
      ... foo; ...
      n <- sys_nbio (write_nb ...);
      ...
    }

    -- Scheduler code (written against the event interface)
    scheduler = do {
      ...
      trace <- fetch_thread;
      execute trace;
      ...
    }
    execute trace = case trace of
      SYS_NBIO c -> do { cont <- c; execute cont; }
      SYS_FORK t1 t2 -> ...

[Figure: the multithreaded code (thread abstraction) runs in the CPS monad and yields a trace (internal representation), which the scheduler code (event abstraction) consumes. Trace nodes shown: SYS_NBIO(accept), SYS_FORK, SYS_EPOLL_WAIT(s), SYS_NBIO(write_nb), SYS_EPOLL_WAIT(sock).]
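
The two interfaces on this slide can be sketched in a few dozen lines. The block below is a minimal illustration in the spirit of Claessen's pearl, not the framework's actual code: the `SYS_YIELD` and `SYS_RET` nodes, the list-based run queue, and the helper names `M`, `buildTrace`, `schedule`, `worker`, and `demo` are all assumptions made for the example.

```haskell
module Main where

import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Event interface: a lazy tree of system calls for the scheduler to interpret.
-- SYS_YIELD and SYS_RET are illustrative additions for this sketch.
data Trace
  = SYS_NBIO (IO Trace)   -- run a non-blocking action, then continue
  | SYS_FORK Trace Trace  -- child trace, parent trace
  | SYS_YIELD Trace       -- let the other threads run first
  | SYS_RET               -- thread finished

-- Thread interface: a continuation-passing monad whose result is a Trace.
newtype M a = M { runM :: (a -> Trace) -> Trace }

instance Functor M where
  fmap f (M g) = M (\k -> g (k . f))

instance Applicative M where
  pure x = M (\k -> k x)
  M f <*> M g = M (\k -> f (\h -> g (k . h)))

instance Monad M where
  M g >>= f = M (\k -> g (\x -> runM (f x) k))

-- Convert a monadic thread into its trace by supplying the final continuation.
buildTrace :: M a -> Trace
buildTrace m = runM m (const SYS_RET)

sys_nbio :: IO a -> M a
sys_nbio io = M (\k -> SYS_NBIO (fmap k io))

sys_fork :: M () -> M ()
sys_fork child = M (\k -> SYS_FORK (buildTrace child) (k ()))

sys_yield :: M ()
sys_yield = M (\k -> SYS_YIELD (k ()))

-- Round-robin scheduler; a plain list stands in for the ready queue.
schedule :: [Trace] -> IO ()
schedule [] = return ()
schedule (t : rest) = case t of
  SYS_NBIO io           -> do { next <- io; schedule (next : rest) }
  SYS_FORK child parent -> schedule (parent : rest ++ [child])
  SYS_YIELD next        -> schedule (rest ++ [next])
  SYS_RET               -> schedule rest

-- A thread that logs its name and a countdown, yielding after each step.
worker :: IORef [String] -> String -> Int -> M ()
worker logRef name n
  | n == 0 = return ()
  | otherwise = do
      sys_nbio (modifyIORef' logRef ((name ++ show n) :))
      sys_yield
      worker logRef name (n - 1)

-- Fork thread "a", continue as thread "b", and return the interleaved log.
demo :: IO [String]
demo = do
  logRef <- newIORef []
  schedule [buildTrace (sys_fork (worker logRef "a" 2) >> worker logRef "b" 2)]
  reverse <$> readIORef logRef

main :: IO ()
main = demo >>= print  -- prints ["b2","a2","b1","a1"]
```

The thread code reads like ordinary sequential `do` blocks, while `schedule` sees only trace nodes. Swapping in a different scheduling policy means changing `schedule` alone.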
7
Questions on the poor man’s approach
• Does it work for high-performance network services? (using a pure, lazy, functional language?)
• How does the design scale up to real systems?
  - Symmetrical multiprocessing? Synchronization? I/O?
• How cheap is it?
  - How much does a poor man’s thread cost?
• How poor is it?
  - Does it offer acceptable performance?
8
Our experiment
• A high-performance Haskell framework for massively concurrent network services
• Supported features:
  - Linux Asynchronous I/O (AIO)
  - epoll() and nonblocking I/O
  - OS thread pools
  - SMP support
  - Thread synchronization primitives
• Applications developed:
  - I/O benchmarks on FIFO pipes / disk head scheduling
  - A simple web server for static files
  - HTTP load generator
  - Prototype of an application-level TCP stack
• We used the Glasgow Haskell Compiler (GHC)
9
Multithreaded code example
[Code listing shown as an image; its callouts highlight:]
• Exception handling
• Nested function calls
• Conditional branches
• Synchronous call to I/O lib
• Recursion
10
Event-driven code example
[Code listing shown as an image; its callouts highlight:]
• A wrapper function to the C library call using the Haskell Foreign Function Interface (FFI)
• Put events in queues for processing in other OS threads
• An event loop running in a separate OS thread
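
The slide's own listing is not recoverable, so here is a small, hypothetical illustration of what an FFI wrapper around a C library call can look like. It wraps POSIX `write(2)`, chosen only as an example; the names `c_write` and `writeString` are made up for this sketch, and the framework's real wrappers (for epoll, AIO, and so on) may look quite different.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

import Foreign.C.String (withCStringLen)
import Foreign.C.Types (CChar, CInt (..), CSize (..))
import Foreign.Ptr (Ptr)
import System.Posix.Types (CSsize (..), Fd (..))

-- Raw binding to the C function: ssize_t write(int fd, const void *buf, size_t n).
foreign import ccall unsafe "unistd.h write"
  c_write :: CInt -> Ptr CChar -> CSize -> IO CSsize

-- Haskell-friendly wrapper: marshal the String into a C buffer, call write,
-- and return the number of bytes written (or -1 on error).
writeString :: Fd -> String -> IO Int
writeString (Fd fd) s =
  withCStringLen s $ \(ptr, len) ->
    fromIntegral <$> c_write fd ptr (fromIntegral len)

main :: IO ()
main = do
  n <- writeString (Fd 1) "hello from write(2)\n"  -- fd 1 is standard output
  print n
```

In a CPS-thread setting, a wrapper like this would typically be passed to a nonblocking system call such as `sys_nbio`, so the scheduler decides when it runs.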
11
A complete event-driven I/O subsystem
[Architecture diagram; recoverable labels:]
• ready_queue (system call completion, thread ready to run; fetch threads)
• worker_main (several instances): OS thread pool for CPS thread execution (fetch thread, context switch, SYS_BLIO / fork)
• blio_queue and worker_blio: OS thread pool for blocking I/O
• worker_epoll: epoll interface (register event handler, event notification)
• worker_aio: AIO interface (submit AIO request, AIO completion)
• Haskell Foreign Function Interface (FFI)
• Each event loop runs in a separate OS thread
• One “virtual processor” event loop for each CPU
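
To make the queue-and-worker structure concrete, here is a reduced sketch with one `worker_main` loop and one `worker_blio` loop that exchange work through two queues. It is an illustration under stated assumptions, not the framework's implementation: `Chan` stands in for the real queues, `forkIO` (lightweight GHC threads) stands in for dedicated OS threads, `threadDelay` stands in for a blocking system call, and the `SYS_RET` node and the `demo` function are invented for the example.

```haskell
module Main where

import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forever)

-- A cut-down trace type: just enough nodes to show the two queues.
data Trace
  = SYS_NBIO (IO Trace)  -- cheap call, safe to run on a worker_main thread
  | SYS_BLIO (IO Trace)  -- blocking call, must run on the blocking-I/O pool
  | SYS_RET              -- thread finished

-- Event loop that executes CPS threads taken from the ready queue.
worker_main :: Chan Trace -> Chan (IO Trace) -> IO ()
worker_main readyQ blioQ = forever $ do
  t <- readChan readyQ
  case t of
    SYS_NBIO io -> io >>= writeChan readyQ  -- run it, requeue the continuation
    SYS_BLIO io -> writeChan blioQ io       -- hand off; never block this loop
    SYS_RET     -> return ()

-- Event loop that performs blocking calls and returns threads to the ready queue.
worker_blio :: Chan Trace -> Chan (IO Trace) -> IO ()
worker_blio readyQ blioQ = forever $ do
  io <- readChan blioQ
  next <- io                                -- may block for a long time
  writeChan readyQ next                     -- system call completed, ready to run

-- One thread: a blocking call followed by a non-blocking step that reports back.
demo :: IO String
demo = do
  readyQ <- newChan
  blioQ <- newChan
  done <- newEmptyMVar
  _ <- forkIO (worker_main readyQ blioQ)
  _ <- forkIO (worker_blio readyQ blioQ)
  let thread = SYS_BLIO $ do
        threadDelay 1000                    -- stand-in for a blocking system call
        return (SYS_NBIO (putMVar done "finished" >> return SYS_RET))
  writeChan readyQ thread
  takeMVar done

main :: IO ()
main = demo >>= putStrLn
```

Adding another event source (epoll, AIO, or a TCP timer) follows the same shape: one more trace node, one more queue, and one more loop that puts completed threads back on the ready queue.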
12
Modular and customizable I/O system (add a TCP stack if you like)
[Architecture diagram: the event-driven I/O subsystem (ready_queue, blio_queue, worker_main, worker_blio, worker_epoll, worker_aio) extended with an application-level TCP stack. Added labels: TCP stack states, TCP user requests, worker_tcp_input, worker_tcp_timer, request completion, blocking.]
• Define / interpret TCP syscalls (22 lines)
• Event loop for incoming packets (7 lines)
• Event loop for timers (9 lines)
13
How cheap is a poor man’s thread?
• Minimal memory consumption: 48 bytes
  - Each thread just loops and does nothing
  - Actual size determined by thread-local states
  - Even an Ethernet packet can be >1,000 bytes…
• Pay as you go: only pay for things needed
• In contrast:
  - A Linux POSIX thread’s stack is 2 MB by default
  - The state-of-the-art user-level thread system (Capriccio) uses at least a few KB for each thread
• Observation: the poor man’s thread is extremely memory-efficient (challenging most event-driven systems)
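
A quick back-of-the-envelope calculation shows what these per-thread figures mean at the 10,000,000-thread scale mentioned in the overview. The sketch below only multiplies the numbers quoted in the slides; it assumes "2 MB" and "32 KB" mean binary units, and it counts reserved stack space, which is not the same as committed physical memory. The function names are illustrative.

```haskell
module Main where

-- Per-thread memory figures quoted in the slides, in bytes
-- (binary units are an assumption of this sketch).
cpsThreadBytes, nptlSmallStackBytes, nptlDefaultStackBytes :: Integer
cpsThreadBytes        = 48                -- minimal poor man's thread
nptlSmallStackBytes   = 32 * 1024         -- NPTL stack as limited in the I/O test
nptlDefaultStackBytes = 2 * 1024 * 1024   -- default Linux POSIX thread stack

-- Total bytes needed for a given number of threads.
totalBytes :: Integer -> Integer -> Integer
totalBytes threads perThread = threads * perThread

-- Convert bytes to mebibytes for display.
toMiB :: Integer -> Double
toMiB b = fromIntegral b / (1024 * 1024)

main :: IO ()
main = mapM_ report
  [ ("poor man's thread (48 B)", cpsThreadBytes)
  , ("NPTL, 32 KB stack", nptlSmallStackBytes)
  , ("NPTL, 2 MB default stack", nptlDefaultStackBytes)
  ]
  where
    threads = 10000000
    report (label, perThread) =
      putStrLn (label ++ ": " ++ show (toMiB (totalBytes threads perThread)) ++ " MiB")
```

At ten million threads the minimal CPS threads need 480,000,000 bytes in total, while default 2 MB stacks would reserve roughly 20 TB of address space.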
14
I/O scalability test
• Comparison against the Linux Native POSIX Thread Library (NPTL)
  - Highly optimized OS thread implementation
  - Each NPTL thread’s stack limited to 32 KB
• Mini-benchmarks used:
  - Disk head scheduling (all threads running)
  - FIFO pipe scalability with idle threads (128 threads running)
15
A simple web server
16
How poor is the poor man’s monad?
• Not too shabby
  - Benchmarks show comparable (if not higher) performance to existing, optimized systems
• An elegant design is more important than a 10% performance improvement
• Added benefit: type safety for many dangerous things
  - Continuations, thread queues, schedulers, asynchronous I/O
17
Related Work
• We are motivated by two projects:
  - Twisted: the Python event-driven framework for scalable internet applications
    - The programmer must write code in CPS
  - Capriccio: a high-performance user-level thread system for network servers
    - Requires C compiler hacks
    - Difficult to customize (e.g. adding SMP support)
• Continuation-based concurrency: [Wand 80], [Shivers 97], …
• Other languages and programming models: CML, Erlang, …
18
Conclusion
• Haskell and the Poor Man’s Concurrency Monad are a promising solution for high-performance, massively concurrent networking applications:
  - Get the best of both threads and events!
• This poor man’s approach is actually very cheap, and not so poor!
http://www.cis.upenn.edu/~lipeng/homepage/unify.html