the performance of microkernel-based systems l4linux

Post on 02-Jan-2016


The Performance of Microkernel-Based Systems

L4Linux

What is a microkernel?

• kernels that only provide address spaces, threads, and IPC

• the kernel does not handle, e.g., the file system or interrupts

Microkernel abstraction level

• some researchers feel the abstraction level is too high

– kernel should map more directly to hardware

• some researchers feel the abstraction level is too low

– focus on extensibility (Mach)

L4

• a so-called second-generation microkernel

• built from scratch, as opposed to being developed from earlier monolithic kernel approaches (e.g., Mach)

L4 essentials

• threads, address spaces and cross-address-space communication (IPC)

• other kernel operations, e.g., RPC and address-space-crossing thread migration, are built up from IPC primitives
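The following is a minimal Python sketch of how an RPC can be layered on two IPC primitives, send and receive. The `KThread` class and the `send`, `receive`, `call`, and `serve` functions are hypothetical names for illustration, not L4's API; L4 itself can combine the send and the receive in a single operation, which this sketch keeps as two steps for clarity.

```python
import queue
import threading


class KThread:
    """Hypothetical model of a kernel thread with a single message queue."""

    def __init__(self, name):
        self.name = name
        self.mailbox = queue.Queue()


def send(dest, sender, msg):
    """IPC primitive (assumed): deliver msg to dest, tagged with the sender."""
    dest.mailbox.put((sender, msg))


def receive(me):
    """IPC primitive (assumed): block until a message arrives."""
    return me.mailbox.get()


def call(client, server, msg):
    """RPC built from IPC: send the request, then block for the reply."""
    send(server, client, msg)
    _sender, reply = receive(client)
    return reply


def serve(server, handler, requests):
    """Server loop: receive a request, send the result back to its sender."""
    for _ in range(requests):
        client, msg = receive(server)
        send(client, server, handler(msg))


server = KThread("server")
client = KThread("client")
worker = threading.Thread(target=serve, args=(server, lambda x: x * 2, 1))
worker.start()
result = call(client, server, 21)
worker.join()
print(result)  # 42
```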

Address spaces

• recursive construction

• granting, mapping, unmapping

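• a page owner can grant or map any of its pages to another address space if the receiver agrees. A granted page is removed from the granter's address space; a mapped page is then accessible in both address spaces. Unmapping removes a page from every address space that received it, directly or indirectly, from the unmapper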
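To make the difference between the three operations concrete, here is a small Python model of a mapping tree, assuming a page is identified by a virtual page number and backed by a frame number. All names (`Node`, `AddressSpace`, `install`, `map_page`, `grant_page`, `unmap_page`) are hypothetical, and `sigma0` only echoes the name of L4's initial address space; this is a sketch of the semantics, not L4's data structures.

```python
class Node:
    """One mapping-tree entry: a page frame as seen in one address space."""

    def __init__(self, space, vpage, frame):
        self.space, self.vpage, self.frame = space, vpage, frame
        self.children = []  # entries mapped from this one


class AddressSpace:
    def __init__(self, name):
        self.name = name
        self.pages = {}  # virtual page number -> Node


def install(space, vpage, frame):
    """Recursion base: an initial address space owns physical frames."""
    space.pages[vpage] = Node(space, vpage, frame)


def map_page(src, spage, dst, dpage):
    """Map: the page stays in src and also becomes accessible in dst."""
    parent = src.pages[spage]
    child = Node(dst, dpage, parent.frame)
    parent.children.append(child)
    dst.pages[dpage] = child


def grant_page(src, spage, dst, dpage):
    """Grant: the page moves to dst and src loses access. The tree entry is
    handed over, so whoever mapped it to src can still revoke it."""
    node = src.pages.pop(spage)
    node.space, node.vpage = dst, dpage
    dst.pages[dpage] = node


def _remove_subtree(node):
    for child in node.children:
        _remove_subtree(child)
    del node.space.pages[node.vpage]


def unmap_page(space, vpage):
    """Unmap: revoke the page from every space that received it, directly
    or indirectly, from this space; the unmapper keeps its own copy."""
    node = space.pages[vpage]
    for child in node.children:
        _remove_subtree(child)
    node.children = []


sigma0, pager, app = AddressSpace("sigma0"), AddressSpace("pager"), AddressSpace("app")
install(sigma0, 0, frame=42)
install(sigma0, 1, frame=43)
map_page(sigma0, 0, pager, 10)  # frame 42 now visible in sigma0 and pager
map_page(pager, 10, app, 7)     # ...and, indirectly, in app
grant_page(sigma0, 1, app, 8)   # frame 43 moves from sigma0 to app
unmap_page(sigma0, 0)           # revokes frame 42 from pager and app
print(sorted(sigma0.pages), sorted(pager.pages), sorted(app.pages))  # [0] [] [8]
```

Recursion falls out of the tree: the pager can hand pages on to `app` without kernel policy, yet `sigma0` keeps the power to revoke everything derived from its pages.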

Address spaces (cont)

• only the grant, map, and unmap operations are implemented in the kernel

• user-level “pagers” handle page faults

Interrupts, exceptions, and traps

• all handled at user level

• interrupts are transformed by the kernel into IPC messages and sent to the appropriate user-level thread

• exceptions and traps are synchronous to the associated thread, and the kernel mirrors them to that thread
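A toy Python model of the two delivery paths, assuming each user thread has a simple inbox. `Kernel`, `attach_irq`, `hardware_interrupt`, and `exception` are hypothetical names; in particular, real exception mirroring does not go through a queue like this one, which only shows who receives which event.

```python
import queue


class UserThread:
    def __init__(self, name):
        self.name = name
        self.inbox = queue.Queue()  # messages delivered by the kernel


class Kernel:
    """Hypothetical model: the kernel only converts events into messages."""

    def __init__(self):
        self.irq_handlers = {}  # IRQ number -> user-level handler thread

    def attach_irq(self, irq, thread):
        self.irq_handlers[irq] = thread

    def hardware_interrupt(self, irq):
        # Asynchronous event: sent as an IPC message to the registered thread;
        # the kernel does no device-specific work itself.
        self.irq_handlers[irq].inbox.put(("interrupt", irq))

    def exception(self, thread, kind):
        # Synchronous event: mirrored back to the thread that caused it.
        thread.inbox.put(("exception", kind))


kernel = Kernel()
disk_driver = UserThread("disk driver")
app = UserThread("app")
kernel.attach_irq(14, disk_driver)  # IRQ 14 chosen arbitrarily
kernel.hardware_interrupt(14)
kernel.exception(app, "divide error")
irq_msg = disk_driver.inbox.get()
exc_msg = app.inbox.get()
print(irq_msg, exc_msg)  # ('interrupt', 14) ('exception', 'divide error')
```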

Implementation of L4Linux

• the authors developed L4Linux, a Linux “personality” running on top of the L4 microkernel

• due to time restrictions, the Linux kernel was not fine-tuned for L4Linux, so the results are only an upper bound on the performance penalty

L4Linux

• Linux 2.0.21 running on top of L4

• the Linux kernel runs as a user-level server

• 100% binary compatible

– modified versions of the shared C library libc.so and of libc.a

– a user-level “trampoline” exception handler

• 14 engineer-months of work; 6,500 lines rewritten out of a total of ~340,000

Trampoline

• 100% binary compatibility means that a program statically linked against the native Linux library must run, unmodified, on L4Linux

• the trampoline “bounces” the system-call trap, which on native Linux went into the kernel, back into the modified shared library on L4Linux

• the microkernel upcalls into the user-level handler, which then makes an RPC (i.e., invokes the kernel again) to the OS personality to perform the system call
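The two system-call paths can be compared in a small Python sketch that counts kernel entries. The function names, the fixed pid, and the one-entry-per-RPC accounting are simplifying assumptions; the point is only that an unmodified binary pays one extra kernel crossing for the trap before the RPC, which matches the trampoline path being slower in the benchmarks.

```python
kernel_entries = {"count": 0}  # counts entries into the microkernel


class LinuxServer:
    """OS personality: a user-level server reachable only via RPC (IPC)."""

    def rpc(self, name, *args):
        kernel_entries["count"] += 1  # the RPC itself is an IPC through L4
        if name == "getpid":
            return 1234  # hypothetical pid
        raise NotImplementedError(name)


server = LinuxServer()


def libc_syscall(name, *args):
    """Modified shared library: turns a system call directly into an RPC."""
    return server.rpc(name, *args)


def trampoline(name, *args):
    """User-level handler the microkernel upcalls on a native trap;
    it bounces the call into the library's RPC path."""
    return libc_syscall(name, *args)


def native_trap(name, *args):
    """An unmodified binary executes the native system-call trap. L4 does
    not interpret Linux system calls, so it reflects the trap upward."""
    kernel_entries["count"] += 1  # the trap enters the microkernel
    return trampoline(name, *args)


pid_dynamic = libc_syscall("getpid")  # modified library: 1 kernel entry
after_dynamic = kernel_entries["count"]
pid_static = native_trap("getpid")    # unmodified binary: 2 kernel entries
after_static = kernel_entries["count"]
print(pid_dynamic, pid_static, after_dynamic, after_static - after_dynamic)  # 1234 1234 1 2
```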

L4Linux (cont)

• L4 maps the entire initial address space into the Linux kernel server

• a single L4 thread acts as a single virtual processor for the Linux server

• the Linux server occupies a small memory region and uses the Pentium’s segmentation feature to protect its TLB entries, so the TLB always holds the Linux server’s translations (small-address-space optimization)

L4Linux (cont)

• L4 allows user-level processes to disable interrupts, so the critical sections of uniprocessor Linux did not need to be modified

L4Linux (cont)

• interrupt threads have a higher priority than the server itself, so they never execute concurrently with it

• signals are forwarded to a co-located signal-handler thread inside each user process, since only a thread in the same address space can manipulate another thread’s state
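A minimal sketch of the forwarding idea, modelling a thread's instruction pointer as a string. `UserProcess`, `LinuxServer.send_signal`, and the saved-IP handling are hypothetical; the sketch only shows why the code that redirects the main thread must live inside the process's own address space.

```python
class UserProcess:
    """Hypothetical model: a main thread plus a signal-handler thread that
    share one address space."""

    def __init__(self, handlers):
        self.handlers = handlers    # signal number -> handler entry point
        self.main_ip = "main_loop"  # stand-in for the main thread's instruction pointer
        self.saved_ip = None

    def signal_thread_receive(self, signo):
        # Runs in the same address space, so it may change the main thread's
        # state: remember where it was and redirect it to the handler.
        self.saved_ip = self.main_ip
        self.main_ip = self.handlers[signo]


class LinuxServer:
    def send_signal(self, proc, signo):
        # The server lives in a different address space and cannot touch the
        # main thread directly; it forwards the signal to the signal thread
        # (a plain method call standing in for an IPC).
        proc.signal_thread_receive(signo)


SIGINT = 2
proc = UserProcess({SIGINT: "sigint_handler"})
LinuxServer().send_signal(proc, SIGINT)
print(proc.main_ip, proc.saved_ip)  # sigint_handler main_loop
```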

L4Linux (cont)

• scheduling is mostly done by the L4 scheduler

• four priority levels, highest first: top-half interrupt handlers, bottom-half interrupt handlers, the Linux server, user processes; no priority decay

• so L4 interrupts the Linux server in the same way the hardware would interrupt a native Linux kernel
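A small Python simulation of fixed-priority scheduling with the four levels above, assuming unit-length time steps and hypothetical numeric priorities (only their order matters). Because priorities never decay, a newly ready interrupt thread always preempts the Linux server, much as a hardware interrupt preempts a native kernel.

```python
# Hypothetical numeric priorities; only their order (highest first) matters.
TOP_HALF, BOTTOM_HALF, LINUX_SERVER, USER = 3, 2, 1, 0


def simulate(arrivals, steps):
    """arrivals maps a time step to threads (name, priority, work units) that
    become ready then. Each step runs the highest-priority ready thread for
    one unit; a newly ready higher-priority thread preempts the current one."""
    ready = []  # entries: [name, priority, remaining_work]
    trace = []
    for step in range(steps):
        for name, prio, work in arrivals.get(step, []):
            ready.append([name, prio, work])
        if not ready:
            trace.append("idle")
            continue
        current = max(ready, key=lambda t: t[1])  # first of equals wins
        trace.append(current[0])
        current[2] -= 1
        if current[2] == 0:
            ready.remove(current)
    return trace


arrivals = {
    0: [("user", USER, 2), ("server", LINUX_SERVER, 2)],
    1: [("irq", TOP_HALF, 1), ("bh", BOTTOM_HALF, 1)],
}
trace = simulate(arrivals, 6)
print(trace)  # ['server', 'irq', 'bh', 'server', 'user', 'user']
```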

L4Linux (cont)

• “user level schedulers can dynamically change priority and time slice of any thread”?

Experiments

• micro- and macrobenchmarks compare native Linux and MkLinux (a Mach-derived variant) with L4Linux

– Linux vs. L4Linux shows the performance penalty for using a microkernel

– L4Linux vs. MkLinux shows the influence of the microkernel on the overall system, including the influence of colocation

• extensibility experiments

– functionality specialized for L4Linux

Performance

L4Linux, MkLinux and Linux

• microbenchmarks

– getpid: L4Linux 2.4 times (modified library) or 3.4 times (trampoline) slower than Linux; MkLinux 3.9 times (in-kernel) or 28 times (user-level server) slower than L4Linux

– lmbench and hbench: L4Linux 1 to 3 times slower than Linux; MkLinux 1 to 32 times slower than L4Linux

Performance (cont)

L4Linux, MkLinux and Linux

• macrobenchmarks

– recompiling the Linux server: L4Linux 6%-7% slower than Linux; MkLinux 10%-20% slower than L4Linux

– AIM multiuser benchmark suite: job throughput in L4Linux is 7%-8% lower than in Linux; MkLinux is 30%-52% lower than L4Linux

Conclusions

• At the application level, L4Linux has a 5%-10% performance penalty compared with native Linux

• The particular microkernel used matters

• Colocation is secondary to the microkernel implementation

Extensibility

• “Can we add services outside L4Linux to improve performance by specializing Unix functionality?”

• “Can we improve certain applications by using native microkernel mechanisms in addition to the classical API?”

• “Can we achieve high performance for non-classical, Unix-incompatible systems coexisting with L4Linux?”

Pipes and RPC

Mechanism                   Latency (µs)   Bandwidth (MB/s)
Linux pipe                  29             41
L4Linux pipe                46             40
L4Linux trampoline pipe     56             38
MkLinux (user) pipe         722            10
MkLinux (in-kernel) pipe    316            13
L4 pipe                     22             47-10
Synchronous L4 RPC          5              65-105
Synchronous mapping RPC     12             2480-2900
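The relative costs in the pipe table can be computed directly; this Python snippet divides each single-valued row by the native Linux pipe figures. The numbers are copied from the table, and the rows given as ranges are left out.

```python
# Rows copied from the table: name -> (latency in µs, bandwidth in MB/s).
pipes = {
    "Linux pipe": (29, 41),
    "L4Linux pipe": (46, 40),
    "L4Linux trampoline pipe": (56, 38),
    "MkLinux (user) pipe": (722, 10),
    "MkLinux (in-kernel) pipe": (316, 13),
}

base_latency, base_bandwidth = pipes["Linux pipe"]

# Latency ratio > 1 means slower; bandwidth ratio < 1 means less throughput.
ratios = {
    name: (latency / base_latency, bandwidth / base_bandwidth)
    for name, (latency, bandwidth) in pipes.items()
}

for name, (lat_ratio, bw_ratio) in ratios.items():
    print(f"{name:26} latency x{lat_ratio:5.2f}  bandwidth x{bw_ratio:4.2f}")
```

By this arithmetic, L4Linux pipe latency is about 1.59 times that of native Linux (46/29), against roughly 24.9 times for the user-level MkLinux server (722/29), while L4Linux bandwidth stays at about 98% of Linux's (40/41).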

Virtual Memory

• measured a user-level page fault whose handler maps a page from one address space to another (not available on Unix)

• measured traps and two different trap/protect/unprotect patterns, which on average ran ~4 times faster than on native Linux

Cache Partitioning

• a user-level main-memory manager can coordinate with L4 to allocate specific L2 cache pages to certain processes

• matrix-multiplication example: worst-case performance improved by a factor of four
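One common way for a memory manager to give a process exclusive L2 cache regions is page coloring: in a physically indexed cache, all lines of a physical page fall into one "color" of cache sets, so processes given disjoint colors cannot evict each other's lines. The slides do not name the scheme, so the technique, the cache parameters (256 KB, direct-mapped, 4 KB pages), and the `ColoringAllocator` name below are assumptions.

```python
PAGE_SIZE = 4 * 1024     # hypothetical 4 KB pages
CACHE_SIZE = 256 * 1024  # hypothetical 256 KB L2 cache
ASSOCIATIVITY = 1        # hypothetical: direct-mapped

# Number of distinct page colors the cache distinguishes (64 here).
NUM_COLORS = CACHE_SIZE // (PAGE_SIZE * ASSOCIATIVITY)


def color_of(frame):
    """Cache color of a physical page frame number."""
    return frame % NUM_COLORS


class ColoringAllocator:
    """Hands out frames so each process only receives frames of its colors."""

    def __init__(self, num_frames):
        self.free = {c: [] for c in range(NUM_COLORS)}
        for frame in range(num_frames):
            self.free[color_of(frame)].append(frame)

    def alloc(self, colors):
        for c in colors:
            if self.free[c]:
                return self.free[c].pop(0)
        raise MemoryError("no free frame in the requested colors")


allocator = ColoringAllocator(num_frames=1024)
matrix_colors = range(0, 32)   # cache half reserved for the matrix code
other_colors = range(32, 64)   # everything else
matrix_frames = [allocator.alloc(matrix_colors) for _ in range(40)]
other_frames = [allocator.alloc(other_colors) for _ in range(40)]
print(sorted({color_of(f) for f in matrix_frames}))  # [0, 1, 2]
print(sorted({color_of(f) for f in other_frames}))   # [32, 33, 34]
```

A real allocator would spread a process's pages across all of its colors; this one fills colors in order for brevity.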

Possible Alternatives

• Protected Control Transfers

• Grafting
