Engineering a formally verified operating system microkernel
Wolfgang Paul, Saarland University
joint work with M. Gargano, M. Hillebrand, D. Leinenbach
Formal verification of a system means here
1. Specify desired behaviour (semantics)
2. Specify construction (in model with semantics)
3. Theorem: construction satisfies specification
4. Check proof with CAV system
Semantics: you know precisely what you are talking about
German government project Verisoft…
• Formally verify entire complex computer systems consisting of
– hardware
– system software
– communication system
– application
• Industry partners: BMW, T-Systems, Infineon, …
• Academic partners: Saarbrücken, Munich, Darmstadt, …
• 4 million €/year
• Now 17 months old
Project in case of success…
• turns computer science into unified mathematical theory
• increased speed of teaching
• splits market into
– non-verified
– verified mass products
• Reality in computer architecture
• next high end controller from Infineon (verification cheaper than testing)
Overview (this talk and system layers)
• Hardware:
– physical
– virtual
– simulation
• C0 language:
– syntax
– semantics
– compilation
– inline assembler (C0A)
• CVM (communicating virtual machines):
– model of computation with abstract kernel k ∈ C0
– implementation by concrete kernel K ∈ C0A
virtual machine configurations d
[Figure: cpu with registers d.R and virtual memory d.vm(i)]
virtual machine next state d'
[Figure: cpu with registers d.R and virtual memory d.vm(i)]
• no page fault interrupts
Physical machines
• physical memory d.pm(i)
• swap memory d.sm(j)
• registers d.R
– d.mode
– d.pto (page table origin)
– d.ptl (page table length)
• address translation if d.mode = 1
• page fault interrupts
address translation sequentially
• Address translation of virtual addresses va
– va = (va.px, va.bx)
– px: page index
– bx: byte index
– d: DLX configuration
• ppa(d,va) = (pt(d, va.px).ppx, va.bx)
• ppa: physical page address
• pt: page table
• ppx: physical page index
[Figure: va.px selects an entry of the page table pt(d); its ppx field and va.bx together form ppa(d,va)]
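The translation rule ppa(d,va) = (pt(d, va.px).ppx, va.bx) can be sketched in a few lines of Python. This is only an illustration: the page size (PAGE_BITS), the PageTableEntry class and the example page table are assumptions, not part of the slides, and the page table is passed in directly instead of being read from a DLX configuration d.

```python
from dataclasses import dataclass

PAGE_BITS = 12  # assumption: 4 KiB pages; the slides do not fix a page size


@dataclass
class PageTableEntry:
    ppx: int  # physical page index


def split(va):
    """Split a virtual address into (va.px, va.bx)."""
    return va >> PAGE_BITS, va & ((1 << PAGE_BITS) - 1)


def ppa(pt, va):
    """ppa(d, va) = (pt(d, va.px).ppx, va.bx), with pt passed in directly."""
    px, bx = split(va)
    return pt[px].ppx, bx


# Hypothetical page table: virtual page 0 -> physical page 7, page 1 -> 3.
pt = [PageTableEntry(ppx=7), PageTableEntry(ppx=3)]
page, offset = ppa(pt, 0x1ABC)
print(page, hex(offset))
```

The byte index passes through unchanged; only the page index is looked up.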
(sequential) simulation of virtual machines (u) by physical machines (d)
• u.vm(va) =
– d.pm(pma(va)) if pt(d, va.px).v = 1 (page is in cache)
– d.sm(sma(va)) otherwise
• d.pm is a cache for u.vm
• theorem: physical DLX + page fault handler simulates virtual DLX
• liveness: do not evict the most recently swapped-in page
[Figure: va in u.vm maps to ppa(va) in d.pm or to sma(va) in d.sm]
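The case distinction for u.vm(va) can be written out as a small Python sketch. The dictionary-based page table entries, the tiny page size and the field name spx (a swap page index) are hypothetical choices made for this example; the slides only name the valid bit v and the functions pma and sma.

```python
PAGE_SIZE = 4  # assumption: tiny pages keep the example readable


def vm_read(pt, pm, sm, va):
    """u.vm(va): read physical memory if the page is valid, else swap memory."""
    px, bx = divmod(va, PAGE_SIZE)
    entry = pt[px]
    if entry["v"]:
        return pm[entry["ppx"] * PAGE_SIZE + bx]  # d.pm(pma(va))
    return sm[entry["spx"] * PAGE_SIZE + bx]      # d.sm(sma(va))


# Virtual page 0 is cached in physical page 1; virtual page 1 lives in swap page 0.
pt = [
    {"v": True, "ppx": 1, "spx": 1},
    {"v": False, "ppx": 0, "spx": 0},
]
pm = [0, 0, 0, 0, 10, 11, 12, 13]
sm = [20, 21, 22, 23, 10, 11, 12, 13]

u_vm = [vm_read(pt, pm, sm, va) for va in range(2 * PAGE_SIZE)]
print(u_vm)
```

The list u_vm is the virtual memory that the user sees, assembled from both physical and swap memory.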
hardware correctness: address translation in a pipeline (or Tomasulo scheduler)
• address translation by 2 MMUs (data + instruction)
• without TLB: two memory accesses (page table, then memory)
• correctness: pt and memory must stay constant during a translated access
• software requirement for processor correctness theorem
– a write to a code address and a read from that address must be separated by a sync (pipe drain) instruction (!)
• proof obligation for page fault handlers!
[Figure: pipeline with stages IF (fetch), ID, EX, M, WB; one MMU with I-cache for instruction fetch, one MMU with D-cache for load/store; both access Mem]
C0 syntax
• accepted by engineers as sufficiently C-like
C0 semantics
1. Hoare logic
• Equivalent to big-step operational semantics
• Shallow embedding into Isabelle/HOL is highly productive (1 page of code per person-week)
2. Small-step operational semantics
• Equivalent to ASMs [Glesner 2003]
• Needed for interleaving runs of kernel and users (later also C0 programs)
• Imports results from Hoare logic
big step vs. small step semantics
1. Hoare logic
• Equivalent to big-step operational semantics
2. Small-step operational semantics
• Needed for interleaving runs of kernel and users
semantics and syntax trees
1. Hoare logic
• equivalent to big-step operational semantics
• perfectly matches syntax tree / (non-optimized) code generation
2. Small-step operational semantics
• needed for interleaving runs of kernel and users
• match with syntax tree not perfect
[Figure: syntax tree node ifte with children e, S_1, S_2]
C machine configurations
Borrowing from [Loeckx, Mehlhorn, Wilhelm 86]:
• c = (c.S, c.pr)
• c.pr: program rest
• c.S: state
• c.S = (TT, FT, gm, hm, lms)
– TT: {type names} → {type descriptors}
– FT: {function names} → {types} × {bodies}
– gm: global memory
– hm: heap memory
– lms: [1 : recursion depth] → {local memories}
Borrowing from [Norrish 99]:
• memory m: array of simple values (+ name and type info)
• simple: bool, char, int, float, double, pointer
• variable: (m,i), the i'th variable in m
• va(c,(m,i)) = c.S.m_{size(m,i)}(ba(m,i))
• ba(m,i): base address
• Pointer values: subvariables, e.g. (m,i)[7].next
[Figure: memory m; va(c,(m,i)) is the block of size(m,i) cells starting at ba(m,i)]
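A minimal sketch of the value function va(c,(m,i)) for a single memory m follows. It assumes that variables are laid out contiguously in declaration order, so that ba(m,i) is the sum of the sizes of the earlier variables; the slides do not state this layout, and the names content and sizes are invented for the example.

```python
def ba(sizes, i):
    """Base address of the i'th variable, assuming a contiguous layout."""
    return sum(sizes[:i])


def value(content, sizes, i):
    """va(c,(m,i)): the size(m,i) cells of m starting at ba(m,i)."""
    start = ba(sizes, i)
    return content[start:start + sizes[i]]


# Hypothetical memory with three variables of sizes 1, 3 and 2.
sizes = [1, 3, 2]
content = [5, 1, 2, 3, 8, 9]
print(ba(sizes, 2), value(content, sizes, 1))
```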
Function call semantics
[Figure: stack of local memories from lms(0) up to top(c) and, after the call, top(c'); labels &id and e_i]
Aligned allocation of (sub)variables
[Figure: variable x with consecutive components (labels x.naj, x.naj1) placed at displacement displ(j,t)]
Lemma: no misalignment interrupts
Simulation relation consis(c, alloc, d)
[Figure: C0 variables p and y are mapped by alloc(c,p) and alloc(c,y) to addresses in d.vm]
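One plausible reading of consis(c, alloc, d) is sketched below: every C0 variable must be found in d.vm at its allocated address, and a pointer is represented by the allocated address of its target. This is an assumption made for illustration. The slides give only the name of the relation; the Ptr class, the one-cell-per-variable layout and the example allocation are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    target: str  # name of the variable pointed to


def consis(c_vars, alloc, vm):
    """Check that vm represents every C0 variable at its allocated address."""
    for name, value in c_vars.items():
        stored = vm[alloc[name]]
        if isinstance(value, Ptr):
            # a pointer is represented by the address of its target
            if stored != alloc[value.target]:
                return False
        elif stored != value:
            return False
    return True


c_vars = {"y": 42, "p": Ptr("y")}  # C0 side: p points to y
alloc = {"y": 8, "p": 4}           # hypothetical allocation function
vm = {4: 8, 8: 42}                 # DLX side: cell 4 holds the address of y
print(consis(c_vars, alloc, vm))
```

Treating pointers separately is the key design point: their C0 values are variables, not numbers, so equality with memory contents only makes sense relative to alloc.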
code generation and correctness
• by induction on syntax tree
[Figure: syntax tree ifte(e, S_1, S_2) and its code layout: code(e), branch, code(S_1), code(S_2)]
• easy induction on T and syntax tree for big-step semantics
• wrong theorem
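Code generation by induction on the syntax tree can be illustrated with a toy compiler. The sketch below is not the C0 compiler: it targets a hypothetical stack machine instead of the DLX, the tuple encoding of statements is invented, and the explicit jump after code(S_1) is an assumption about the layout. It does show the structure of the argument: the code for a node is assembled from the code of its subtrees.

```python
def code_expr(e):
    """Code for an expression; running it leaves the value on the stack."""
    if isinstance(e, int):
        return [("push", e)]
    if isinstance(e, str):
        return [("load", e)]
    op, a, b = e
    return code_expr(a) + code_expr(b) + [(op,)]


def code(s):
    """Code for a statement, built recursively from the code of its subtrees."""
    kind = s[0]
    if kind == "assign":
        _, x, e = s
        return code_expr(e) + [("store", x)]
    if kind == "seq":
        return [ins for sub in s[1] for ins in code(sub)]
    if kind == "ifte":
        _, e, s1, s2 = s
        c1, c2 = code(s1), code(s2)
        # layout: code(e), branch, code(S_1), jump, code(S_2)
        return code_expr(e) + [("jz", len(c1) + 1)] + c1 + [("jmp", len(c2))] + c2
    raise ValueError(kind)


def run(prog, mem):
    """Execute generated code step by step; jumps are relative to the next pc."""
    pc, stack = 0, []
    while pc < len(prog):
        ins = prog[pc]
        pc += 1
        op = ins[0]
        if op == "push":
            stack.append(ins[1])
        elif op == "load":
            stack.append(mem[ins[1]])
        elif op == "store":
            mem[ins[1]] = stack.pop()
        elif op in ("+", "<"):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "+" else int(a < b))
        elif op == "jz":
            if stack.pop() == 0:
                pc += ins[1]
        elif op == "jmp":
            pc += ins[1]
    return mem


# if x < 3 then y := x + 1 else y := 0
stmt = ("ifte", ("<", "x", 3), ("assign", "y", ("+", "x", 1)), ("assign", "y", 0))
prog = code(stmt)
print(run(prog, {"x": 2})["y"], run(prog, {"x": 5})["y"])
```

A big-step correctness proof follows the recursion of code directly, whereas the run loop is a small-step view in which the end of code(S_1) is also the end of the whole conditional.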
Step by step simulation theorem
Proof: induction on T: (cc(T',q)) for all statements and statement sequences q terminating at T
Problem: when the last statement q of S_1 in "if e then S_1 else S_2" ends, three statements end at once: q, S_1, and if e then S_1 else S_2
[Figure: code layout code(e), branch, code(S_1), code(S_2), with q at the end of code(S_1)]
Simultaneously ending statements
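[Figure: statements that end simultaneously across syntax trees: return in body(g), call in body(f), and the enclosing ifte and while nodes]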
Aftermath of single statement execution
• If q terminates in T, then some q_0 ends after 1 step in T
– leaf
– while with false condition
• in at most 2 trees
– 2 trees only for call/return
– follow path q_0, q_1, ... up (from call) of simultaneously ending statements
• c_T.pr = r; r' with r a statement
[Figure: code layout code(e), branch, code(S_1), code(S_2) with q_0 marked; path from q_0 up through ifte and while to the call in body(f)]
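The path q_0, q_1, ... of simultaneously ending statements can be computed by walking up the syntax tree. The sketch below covers a single tree only: it handles statement sequences and if-then-else, stops at a while node (after the body the loop test runs again), and leaves out the call/return case that links two trees. The Node class and its field names are hypothetical.

```python
class Node:
    def __init__(self, kind, children=()):
        self.kind = kind
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def ending_path(q0):
    """Return q_0, q_1, ..., q_s: the statements that end together with q0."""
    path = [q0]
    q = q0
    while q.parent is not None:
        p = q.parent
        last_in_seq = p.kind == "seq" and p.children[-1] is q
        branch_of_if = p.kind == "ifte"  # children are the branches S_1, S_2
        if not (last_in_seq or branch_of_if):
            break                        # e.g. while: the loop test runs next
        path.append(p)
        q = p
    return path


# q is the last statement of S_1 in "if e then S_1 else S_2"; another
# statement follows the conditional, so the path stops at the ifte node.
q = Node("assign")
s1 = Node("seq", [Node("assign"), q])
s2 = Node("seq", [Node("assign")])
ite = Node("ifte", [s1, s2])
body = Node("seq", [ite, Node("assign")])
print([n.kind for n in ending_path(q)])
```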
Lemmas:
• After code(q_x) data are consistent
• code(q_{x+1}) terminates 0 or 1 steps after code(q_x)
• 0 or 1 steps after code(q_s) control is consistent, i.e. the PCs point to code(r)
• 0 or 2 steps for delayed branch
[Figure: path of simultaneously ending statements from q_0 to q_s]
C0A: C0 with inline assembler code
• Assembler code is necessary: user processes and CPU registers are not visible in C variables
• Syntax: asm(u), {updated C variables x are global,…}
• Compilation: u
• Semantics:
CVM: communicating virtual machines
• abstract parallel user model of kernel
• cvm = (ca, ..., u(p), ..., vmsize(p), ..., cp, ...)
– ca: C0 machine configuration of abstract kernel k
– u(p): p'th user machine configuration
– cp: current process
– cp = 0: kernel running
– cp = i: user u(i) running
• parameter: kernel call definition
– kcd: ℕ → {function names of k}
– trap i calls function kcd(i) of k
• No inline code in CVM: user processes are visible in the parallel model!
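The shape of a CVM configuration and the effect of a trap can be sketched as follows. Everything beyond the component names ca, u, vmsize, cp and kcd is an assumption: the kernel configuration is a plain dictionary, the kernel function names are invented, and the pending_call entry is hypothetical bookkeeping standing in for the actual function call in k.

```python
from dataclasses import dataclass, field


@dataclass
class CVM:
    ca: dict                                    # stand-in for the C0 configuration of k
    u: dict = field(default_factory=dict)       # u[p]: configuration of user p
    vmsize: dict = field(default_factory=dict)  # vmsize[p]: virtual memory size of user p
    cp: int = 0                                 # 0: kernel running, i > 0: user i running


def trap(cvm, i, kcd):
    """User cvm.cp executes trap i: the kernel takes over and runs kcd(i)."""
    assert cvm.cp != 0, "traps are issued by user processes"
    caller = cvm.cp
    cvm.cp = 0                                  # the abstract kernel runs now
    cvm.ca["pending_call"] = (kcd[i], caller)   # hypothetical bookkeeping
    return kcd[i]


kcd = {0: "k_send", 1: "k_receive"}             # hypothetical kernel functions of k
cvm = CVM(ca={}, u={1: {}, 2: {}}, vmsize={1: 4, 2: 4}, cp=2)
called = trap(cvm, 1, kcd)
print(called, cvm.cp)
```

Because the user configurations are explicit components of cvm, the kernel function can inspect them directly, which is why the abstract model needs no inline assembler.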
CVM implementation: by concrete kernel K ∈ C0A
• Additional data structures of K
• PCB[p]: process control blocks; save/restore registers
• pt: page tables
• spt: swap memory page tables
• cp: current process
• ...
CVM semantics and implementation (1) to (5)
Simulation relations
• 3 computations:
– cvm_0, cvm_1, ...: CVM machine
– cc_0, cc_1, ...: concrete kernel K
– d_0, d_1, ...: physical DLX machine
• 3 simulation relations
– consis(cc, alloc, d): compiler
– kconsis(cvm.ca, kalloc, cc): k translated into K; subgraph isomorphism between heaps
– B(i, cvm, d): from virtual memory simulation
simulation theorem
• 3 computations
– for all cvm_0, cvm_1, … (CVM) there exist
– d_0, d_1, … (physical DLX)
– a subsequence D of d_0, d_1, … (inputs for the C0A computation)
– (cc_0, D), (cc_1, tail(D)), …: concrete kernel K
• 2 sequences of
– numbers of steps s(i) and t(i)
– allocation functions alloc_i and kalloc_i
• such that
– kconsis(cvm_i.ca, kalloc_i, cc_{s(i)})
– consis(cc_{s(i)}, alloc_i, d_{t(i)})
– B(j, cvm_i, d_{t(i)}) for all user virtual machines j
correctness proof overview
• user mode, page fault: uses correctness of memory management
• kernel from C0A: uses compiler correctness
• hardware: assumptions on synchronization are proven for page fault handlers of the concrete kernel K
[Figure: system layers and proof components: version of L4 kernel, CVM, kernel, compiler, MM, DLX, hardware, VHDL; annotation: 30 slides]
further work (CVM)
• complete formal verification
• specialise CVM
– VAMOS: L4 version
– OSEKtime (automotive)
• treat concrete I/O devices
• simple OS on top of VAMOS
– build
– verify
further work (Verisoft)
• automotive (BMW)
– electronic control unit hardware (processor + bus interface)
– FlexRay bus hardware + protocol
– OSEKtime
– emergency call (using 2 ECUs)
– recall grand challenge of J. Moore
• public project + T-Systems
– cryptographic protocols on top of OS
Summary (correctness proofs)
• memory management
– (formal) verification of processor + MMU
– virtual memory simulation arguing about hardware and software
• compiler
– alignment
– dynamic heap
– small-step semantics
– (delayed branch/delay slot filling)
• kernel
– semantics: abstract parallel CVM model
– code level ∈ C0A
– uniform treatment of functions and handlers
• natural combination of established formalisms