The Performance of Microkernel-Based Systems L4Linux

Upload: magdalene-cain

Post on 02-Jan-2016

The Performance of Microkernel-Based Systems

L4Linux

What is a microkernel?

• a kernel that provides only address spaces, threads, and IPC

• the kernel itself does not handle, e.g., the file system or interrupts

Microkernel abstraction level

• some researchers feel the abstraction level is too high

– the kernel should map more directly to hardware

• some researchers feel the abstraction level is too low

– focus on extensibility (Mach)

L4

• so-called second-generation microkernel

• built from scratch, as opposed to being developed from earlier monolithic kernel approaches (e.g. Mach)

L4 essentials

• threads, address spaces, and cross-address-space communication (IPC)

• other forms of communication, e.g. RPC and address-space-crossing thread migration, are built up from the IPC primitives

Address spaces

• recursive construction

• granting, mapping, unmapping

• a page owner can grant or map any of its pages to another address space, with the receiver's permission. A mapped page is then accessible in both address spaces; a granted page moves to the receiver and is removed from the granter

Address spaces (cont)

• only the grant, map, and unmap operations are implemented in the kernel

• user-level “pagers” handle page faults
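
The grant, map, and unmap operations can be pictured with a small toy model. This is only a sketch under stated assumptions: the class and method names (`AddressSpace`, `MappingDB`) are hypothetical and are not the L4 API, pages are plain dictionary entries, and the receiver's permission check is left out.

```python
# Toy model of recursive address-space construction (hypothetical names, not the L4 API).

class AddressSpace:
    def __init__(self, name):
        self.name = name
        self.pages = {}          # virtual page number -> physical frame number


class MappingDB:
    """Remembers which mappings were derived from which, so unmap can revoke them."""

    def __init__(self):
        self.derived = {}        # (space, vpage) -> list of (space, vpage) derived from it

    def map(self, src, src_vp, dst, dst_vp):
        # map: the page becomes visible in dst and stays visible in src
        dst.pages[dst_vp] = src.pages[src_vp]
        self.derived.setdefault((src, src_vp), []).append((dst, dst_vp))

    def grant(self, src, src_vp, dst, dst_vp):
        # grant: the page moves to dst and disappears from src
        dst.pages[dst_vp] = src.pages.pop(src_vp)
        # whoever mapped the page to src now sees dst as the holder
        for kids in self.derived.values():
            for i, kid in enumerate(kids):
                if kid == (src, src_vp):
                    kids[i] = (dst, dst_vp)
        # mappings that src had handed out now hang off dst
        self.derived[(dst, dst_vp)] = self.derived.pop((src, src_vp), [])

    def unmap(self, src, src_vp):
        # unmap: revoke every mapping derived from this page, recursively;
        # src itself keeps the page
        for dst, dst_vp in self.derived.pop((src, src_vp), []):
            self.unmap(dst, dst_vp)
            dst.pages.pop(dst_vp, None)
```

In this model a root space that mapped a frame to a server, which in turn mapped it to an application, can revoke both derived mappings with a single `unmap`.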

Interrupts, exceptions, and traps

• all handled at user level

• interrupts are transformed by the kernel into IPC messages and sent to the appropriate user-level thread

• exceptions and traps are synchronous to the thread that raised them, and the kernel mirrors them to that thread
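
A minimal sketch of the idea that the kernel only converts interrupts into messages, assuming a toy kernel with hypothetical names (`ToyKernel`, `Thread`, `attach`); none of this is the real L4 interface, and message delivery is reduced to appending to a queue.

```python
from collections import deque


class Thread:
    def __init__(self, name):
        self.name = name
        self.inbox = deque()     # pending IPC messages for this thread

    def receive(self):
        # return the oldest pending message, or None if there is none
        return self.inbox.popleft() if self.inbox else None


class ToyKernel:
    def __init__(self):
        self.irq_handlers = {}   # IRQ number -> user-level handler thread

    def attach(self, irq, thread):
        self.irq_handlers[irq] = thread

    def hardware_interrupt(self, irq):
        # the kernel does no device handling: it only turns the
        # interrupt into a message for the attached user-level thread
        handler = self.irq_handlers.get(irq)
        if handler is None:
            return False
        handler.inbox.append(("irq", irq))
        return True

    def exception(self, thread, kind):
        # exceptions are synchronous: mirror them to the raising thread
        thread.inbox.append(("exception", kind))
```

A driver thread would call `attach` once and then loop on `receive`, which is where all device-specific work happens in this model.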

Implementation of L4Linux

• develop L4Linux, a Linux “personality” on top of the L4 microkernel

• due to time restrictions, the Linux kernel was not fine-tuned in L4Linux, so the results are only an upper bound on the performance penalty

L4Linux

• Linux 2.0.21 on top of L4

• the Linux kernel is a user-level server

• 100% binary compatible

– modified versions of the shared C library, libc.so and libc.a

– user-level “trampoline” exception handler

• 14 engineer-months and 6500 rewritten lines out of a total of ~340,000

Trampoline

• 100% binary compatible means that a program statically linked against the native Linux library must run, unmodified, on L4Linux

• the trampoline “bounces” the system-call trap, which on native Linux went into the kernel, back into the modified shared library on L4Linux

• the microkernel upcalls into a user-level handler; the handler then makes an RPC (read: invokes the kernel again) to the OS personality to invoke the system call
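
The two system-call paths can be traced with a toy model. This is a sketch, not L4Linux code: all function names are hypothetical, the Linux server is a stub, and the pid value 1234 is made up for illustration.

```python
# Toy trace of the two L4Linux system-call paths (hypothetical names).

def linux_server(syscall):
    # stand-in for the Linux server; 1234 is an arbitrary illustrative pid
    table = {"getpid": lambda: 1234}
    return table[syscall]()


def l4_ipc_call(trace, syscall):
    # RPC to the OS personality, carried by microkernel IPC
    trace.append("L4 IPC to Linux server")
    return linux_server(syscall)


def modified_libc_syscall(trace, syscall):
    # dynamically linked programs: the modified libc calls the server directly
    trace.append("modified libc stub")
    return l4_ipc_call(trace, syscall)


def trampoline(trace, syscall):
    # user-level handler that receives the bounced trap
    trace.append("user-level trampoline")
    return l4_ipc_call(trace, syscall)


def trap_instruction(trace, syscall):
    # statically linked programs still execute the native trap;
    # the microkernel bounces it back to user level
    trace.append("trap into L4")
    return trampoline(trace, syscall)
```

Both paths return the same result, but the trap path enters the microkernel one extra time, which is consistent with the trampoline variant being the slower one in the measurements.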

L4Linux (cont)

• L4 maps the entire initial address space to the Linux kernel server

• a single L4 thread acts as a single virtual processor for the Linux server

• the Linux server occupies a small memory region and uses the Pentium’s segment feature to protect its TLB entries, so the TLB always holds the Linux server’s translations (small-address-space optimization)

L4Linux (cont)

• L4 allows user-level processes to disable interrupts, so the uniprocessor version of Linux did not need its critical sections modified

L4Linux (cont)

• interrupt threads have a priority above that of the server itself, so the server never executes concurrently with them

• signals are forwarded to a co-located signal handler inside each user process, since only a thread in the same address space can manipulate another thread’s state

L4Linux (cont)

• scheduling is mostly done by the L4 scheduler

• four priority levels: top-half interrupts, bottom-half interrupts, the Linux server, user processes. No priority decay.

• so L4 interrupts the Linux server in the same way the hardware would interrupt a native Linux kernel
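
The four static levels can be sketched as a fixed-priority pick. This is an illustration only: the numeric values and the `pick_next` helper are assumptions, and the real L4 scheduler also handles time slices, which are omitted here.

```python
# Static priorities, highest first; the numbers themselves are arbitrary.
PRIORITY = {
    "top_half": 3,
    "bottom_half": 2,
    "linux_server": 1,
    "user": 0,
}


def pick_next(ready):
    """Pick the ready thread with the highest static priority.

    `ready` is a list of (name, kind) pairs in arrival order.
    Priorities never decay, so the choice depends only on `kind`;
    ties go to the thread that became ready first.
    """
    return max(ready, key=lambda thread: PRIORITY[thread[1]])
```

Because an interrupt thread always outranks the Linux server, making one ready preempts the server, much as a hardware interrupt preempts a native kernel.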

L4Linux (cont)

• “user level schedulers can dynamically change priority and time slice of any thread”?

Experiments

• micro- and macrobenchmarks used to compare native Linux and MkLinux (a Mach-derived variant) to L4Linux

– Linux vs L4Linux demonstrates the performance penalty for using a microkernel

– L4Linux vs MkLinux demonstrates the influence of the microkernel on the overall system, including the influence of colocation

• extensibility experiments

– functionality specialized for L4Linux

Performance: L4Linux, MkLinux and Linux

• microbenchmarks

– getpid: L4Linux 2.4 (modified libc) or 3.4 (trampoline) times slower than Linux; MkLinux 3.9 (in-kernel) or 28 (user) times slower than L4Linux

– lmbench and hbench: L4Linux 1 to 3 times slower than Linux; MkLinux 1 to 32 times slower than L4Linux

Performance (cont): L4Linux, MkLinux and Linux

• macrobenchmarks

– recompiling the Linux server: L4Linux 6%-7% slower than Linux; MkLinux 10%-20% slower than L4Linux

– AIM multiuser benchmark suite: job throughput in L4Linux is 7%-8% lower than Linux; MkLinux is 30%-52% lower than L4Linux

Conclusions

• At the application level there is a 5%-10% performance penalty for using L4Linux vs bare Linux

• The particular microkernel used matters

• Colocation is secondary to microkernel implementation

Extensibility

• “Can we add services outside L4Linux to improve performance by specializing Unix functionality?”

• “Can we improve certain applications by using native microkernel mechanisms in addition to the classical API?”

• “Can we achieve high performance for non-classical, Unix-incompatible systems coexisting with L4Linux?”

Pipes and RPC

                            Latency (µs)   Bandwidth (MB/s)
Linux pipe                  29             41
L4Linux pipe                46             40
L4Linux trampoline pipe     56             38
MkLinux (user) pipe         722            10
MkLinux (in-kernel) pipe    316            13
L4 pipe                     22             47-10
Synchronous L4 RPC          5              65-105
Synchronous mapping RPC     12             2480-2900
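
As a worked reading of the latency column, the slowdown of each pipe implementation relative to the native Linux pipe can be computed directly from the table. The dictionary and the `slowdown` helper are just illustrative names; the numbers are the latencies listed above.

```python
# Pipe latencies in microseconds, copied from the table above.
pipe_latency_us = {
    "Linux pipe": 29,
    "L4Linux pipe": 46,
    "L4Linux trampoline pipe": 56,
    "MkLinux (user) pipe": 722,
    "MkLinux (in-kernel) pipe": 316,
    "L4 pipe": 22,
}


def slowdown(name, baseline="Linux pipe"):
    """Latency of `name` as a multiple of the baseline's latency."""
    return pipe_latency_us[name] / pipe_latency_us[baseline]


for name in pipe_latency_us:
    print(f"{name}: {slowdown(name):.2f}x")
```

By this measure the L4Linux pipe is about 1.6 times slower than the Linux pipe, the in-kernel MkLinux pipe about 11 times slower, and the native L4 pipe is faster than the Linux pipe (a ratio below 1).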

Virtual Memory

• measured a user-level page fault that maps a page from one address space to another (not available on Unix)

• also measured traps and two different trap, protect, unprotect patterns, which on average ran ~4 times faster than on native Linux

Cache Partitioning

• a user-level main-memory manager can coordinate with L4 to allocate specific L2 cache pages to certain processes

• matrix multiplication example with a four-times speed-up of worst-case performance
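
One common way to realise this kind of partitioning is page coloring, sketched below. This is an assumption about the mechanism rather than something the slides spell out, and the cache geometry in the usage note (256 KiB, direct-mapped, 4 KiB pages, physically indexed) is hypothetical.

```python
def num_colors(cache_bytes, associativity, page_bytes):
    # number of distinct page-sized regions ("colors") in a physically indexed cache
    return cache_bytes // (associativity * page_bytes)


def page_color(frame_number, colors):
    # physical frames with the same color compete for the same cache region
    return frame_number % colors


def frames_for_partition(frames, colors, allowed_colors):
    # a memory manager that hands a process only frames of its own colors
    # keeps other processes from evicting that process's cache lines
    return [f for f in frames if page_color(f, colors) in allowed_colors]
```

With the assumed geometry, `num_colors(256 * 1024, 1, 4096)` gives 64 colors, and a process restricted to colors 0 and 1 would receive frames 0, 1, 64, 65, 128, 129 and so on.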

Possible Alternatives

• Protected Control Transfers

• Grafting