Tools and Techniques for Higher Reliability Software
FOSDEM 2013 – Ada Developer Room
Philippe Waroquiers, Eurocontrol/DNM
3 February 2013
Eurocontrol
- European Organisation for the Safety of Air Navigation
- International organisation, 39 member states
- Multiple activities/directorates/…
  - Participates in and supports big European projects
  - Central Route Charges Office
  - Directorate Network Management
  - …
- More info: www.eurocontrol.int
Directorate Network Management
- Air Traffic Management “Network Management” = services of general interest for European aviation
  - European air route design
  - Flight plan processing
  - Flow management
  - Scarce resources management
    - Radio frequencies
    - SSR codes
  - Crisis management (remember the 2010 volcanic ash crisis)
  - …
Flight plan processing, Flow management, …
- Flight plan processing over the whole of Europe (IFPS)
  - Aircraft Operators send flight plans to IFPS
  - Flight plans are verified, corrected if needed, and redistributed to airspace control centres, airports and Aircraft Operators
- Flow Management (ETFMS): balancing demand and capacity
  - Safety: avoid Air Traffic Control overload
  - Efficiency: best use of ATC capacity, minimise delays
- ENV: “airspace data management system”
- Data warehouse
- User interfaces for external users
- Web portal
- …
2D Trajectory, alternate routes
Vertical Trajectory
Differences between radar plots and plan
Recomputed with radar plots
Macroscopic view of Europe
ETFMS & IFPS
- Sophisticated systems: around 2 million SLOC of Ada
- Reliability requirements
  - If IFPS is down: no flight plan processing in Europe!
  - If ETFMS is down: passengers will sleep in aerodromes!
  - Duplicated hardware, duplicated sites, contingency systems, …
- Performance requirements
  - ETFMS handles 3 million messages per day
  - Sometimes implies complex processing (e.g. recomputing a flight route)
- Safety requirements
  - Various obligations about people, procedures and systems
    - E.g. Software Assurance Level (SWAL)
  - Safety audits
Better to have no critical bugs in critical systems …
- Use of uninitialised data
- Memory leaks
- Dangling pointers
- Buffer overflows
- Race conditions
- Performance issues
- Memory use problems
- …
But how to avoid/find/eliminate such bugs ?
- People (qualifications, training, …)
- Procedures (code review, coding standards, …)
- Testing (unit testing, integration testing, user acceptance tests, shadow operations, security audits, …)
- But also TOOLS
  - The Ada language is a major asset for avoiding many such bugs
    - Thanks to early detection at compilation time
    - Thanks to run-time checks exposing bugs during early testing
  - Valgrind is a major asset for finding and eliminating the remaining bugs
What is Valgrind ?
- Valgrind = framework to build runtime analysis tools + a set of tools
  - Framework: about 400 KSLOC
  - Tools: between 3 KSLOC and 22 KSLOC each
- Tools:
  - Memcheck
  - Callgrind
  - Helgrind
  - Drd
  - Massif
  - Exp-sgcheck
  - …
Use of uninitialised data (1)
- Memcheck --undef-value-errors=yes (default value)
  - Reports an error if the use of an undefined value would change the behaviour of the program
- Ada language: pragma Normalize_Scalars
  - All scalars that are not explicitly assigned are automatically given an initial value (an invalid one if possible)
  - Run-time checks will detect the use of an invalid value
- GNAT pragma Initialize_Scalars
  - More flexible version of Normalize_Scalars
  - Initial scalar value can be controlled
  - Flexibility about which run-time checks are done, and when
Use of uninitialised data (2)
- Memcheck detects a bug even if there is no invalid value
- Initialize_Scalars
  - Detects a bug only if the type has an invalid value (a bit pattern outside the range of the type)
  - Otherwise, runs with different initial values can expose the use of uninitialised data
- Initialize_Scalars is faster than Memcheck
  - -O0 + all checks on + Initialize_Scalars is only 2x slower than -O2 + standard Ada Reference Manual checks (these checks detect the most horrible/random behaviour)
- At Eurocontrol:
  - Day-to-day development done with Initialize_Scalars
  - Some “shadow operational” testing periods with Initialize_Scalars
  - Weekend builds validated with Memcheck
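The two ways Initialize_Scalars can expose a bug can be sketched with a toy model. The Python below is only a stand-in: it is not Ada, it does not use GNAT, and the names `checked_read`, `compute_total` and `depends_on_fill` are hypothetical. Every slot starts with a fill pattern, as a scalar would with a controlled initial value. A validity check catches the pattern when it lies outside the range of the type; when every bit pattern is valid, comparing runs made with two different patterns exposes the uninitialised read.

```python
def checked_read(value, low, high):
    """Mimic a run-time validity check on a scalar of range low..high."""
    if not low <= value <= high:
        raise ValueError(f"invalid value {value} for range {low}..{high}")
    return value


def compute_total(fill, low, high, assign_second):
    """Sum two scalars of range low..high.

    Both slots start with the fill pattern (the "initial scalar value").
    The second slot is assigned only when assign_second is True, so with
    assign_second=False the sum reads an uninitialised slot.
    """
    slots = [fill, fill]
    slots[0] = 7
    if assign_second:
        slots[1] = 5
    return sum(checked_read(s, low, high) for s in slots)


def depends_on_fill(low, high, assign_second, fills=(0x00, 0xFF)):
    """Run once per fill pattern; differing results reveal an uninitialised read."""
    results = {compute_total(f, low, high, assign_second) for f in fills}
    return len(results) > 1
```

With a 1 .. 10 range and fill pattern 0, `compute_total(0, 1, 10, False)` raises when the unassigned slot is read. With a 0 .. 255 range every pattern is valid, so nothing is raised and only `depends_on_fill(0, 255, False)` reveals the problem.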
Memory leaks
- Avoid by using Ada constructs
  - Often, Ada constructs make it possible to avoid using the heap
    - E.g. record discriminants, OO types without heap, arrays, …
  - Otherwise, manage the heap in a “safe” way: controlled types, storage pools
  - Not always possible (CPU, memory)
- Detect with gcc/GNAT debug pools (GNAT.Debug_Pools)
  - “pre-processing” + recompile
- Detect with Memcheck --leak-check=full
Dangling pointers
- Avoid by using Ada constructs: same as for avoiding memory leaks
- Detect with gcc/GNAT debug pools
- Detect with Memcheck
- Detect with the gcc “address sanitizer” option
  - New functionality, will be in gcc 4.8
  - Need to recompile
  - Not (yet) tried at Eurocontrol
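What a debug pool or Memcheck has to track in order to report leaks and dangling pointers can be sketched as a toy model. This is an illustration only, written in Python: the class `DebugPool` and its methods are hypothetical and are not the GNAT.Debug_Pools API or Memcheck's implementation. The idea is that each allocation and each deallocation is recorded together with the place where it happened.

```python
class DebugPool:
    """Toy allocator that remembers where blocks were allocated and freed."""

    def __init__(self):
        self._next_handle = 1
        self._live = {}    # handle -> (value, allocation site)
        self._freed = {}   # handle -> site where the block was freed

    def allocate(self, value, site):
        handle = self._next_handle
        self._next_handle += 1
        self._live[handle] = (value, site)
        return handle

    def deallocate(self, handle, site):
        if handle in self._freed:
            raise RuntimeError(
                f"double free of block {handle}, first freed at {self._freed[handle]}")
        del self._live[handle]
        self._freed[handle] = site

    def dereference(self, handle):
        if handle in self._freed:
            # Dangling pointer: report where the block was freed.
            raise RuntimeError(
                f"access to freed block {handle}, freed at {self._freed[handle]}")
        return self._live[handle][0]

    def leaks(self):
        """Allocation sites of the blocks that were never freed."""
        return sorted(site for _value, site in self._live.values())
```

After `a = pool.allocate("flight plan", "parse_message")` and `pool.deallocate(a, "cleanup")`, calling `pool.dereference(a)` raises with the "freed at" site, and `pool.leaks()` lists the sites of whatever is still allocated at exit.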
Buffer overflows (1)
- Ada arrays are first-class citizens
  - 'Range, 'First, 'Last, … avoid buffer overflows
  - Arrays always carry their bounds
- Detect with Ada: the standard mandates array index verification
  - All array overflows are detected before damage is done
  - A buffer overflow results in a run-time exception => no “random behaviour”
- Very small overhead. Measured on a representative program (compiled with optimisation): less than 2% for all standard Ada Reference Manual checks (the buffer overflow checks are part of these Ada RM checks).
Buffer overflows (2)
- Detect (not needed with Ada) with Memcheck
  - Detects (most) buffer overflows in heap-allocated blocks
  - No detection in global or stack memory, or “inside” a struct
- Detect (not needed with Ada) with Exp-sgcheck
  - Experimental tool detecting stack and global overruns
  - No detection “inside” a struct
- Detect (not needed with Ada) with the gcc “address sanitizer” option
  - Will be in gcc 4.8, not (yet) tried at Eurocontrol
  - No detection “inside” a struct
- Only the Ada run-time checks detect all buffer overflows
  - E.g. “inside” record (struct) components
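Why an overflow “inside” a struct escapes a block-level checker can be sketched with a small model. The Python below is an illustration under assumptions: the 12-byte record layout and the names `RECORD_LAYOUT`, `block_level_ok` and `component_level_ok` are hypothetical, and neither function reflects how Memcheck or the GNAT run-time is implemented. A block-level check only asks whether the byte lies inside the allocated block; a component-level check asks whether the index lies inside the field being indexed.

```python
# Hypothetical record: an 8-byte name followed by a 4-byte flight id.
RECORD_LAYOUT = {"name": (0, 8), "flight_id": (8, 4)}  # field -> (offset, size)
BLOCK_SIZE = 12


def block_level_ok(field, index):
    """Check at block granularity: is the accessed byte inside the block?"""
    offset, _size = RECORD_LAYOUT[field]
    return 0 <= offset + index < BLOCK_SIZE


def component_level_ok(field, index):
    """Check at component granularity: is the index inside the field itself?"""
    _offset, size = RECORD_LAYOUT[field]
    return 0 <= index < size
```

Writing to `name` at index 9 passes `block_level_ok("name", 9)` because the byte still belongs to the block (it silently lands in `flight_id`), while `component_level_ok("name", 9)` rejects it. Only at index 12, past the end of the block, does the block-level check fail as well.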
Race conditions
- Avoid by using Ada constructs
  - Ada tasks (threads) are first-class citizens
  - Many constructs help to avoid race conditions
    - Rendezvous, protected objects, …
  - Ada multi-tasking constructs are easy: higher abstraction level (or at least easier to use than pthreads)
    - E.g. protected objects
- Detect by using Helgrind (or Drd)
  - Helgrind used very successfully at Eurocontrol
- Detect by using the gcc “thread sanitizer” option
  - New functionality, will be in gcc 4.8
  - Need to recompile
  - Not tried (yet) at Eurocontrol
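The idea behind a protected object, shared data that can only be reached through operations executed under mutual exclusion, can be approximated as follows. This is a Python stand-in, not Ada: `ProtectedCounter` and `run_workers` are hypothetical names, and the lock here is taken by hand, whereas an Ada protected object provides the exclusion as part of the language construct. Helgrind and Drd analyse native threaded programs, not this script.

```python
import threading


class ProtectedCounter:
    """Shared data bundled with the lock that guards it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        with self._lock:          # one caller at a time may update the data
            self._value += 1

    def value(self):
        with self._lock:
            return self._value


def run_workers(counter, workers=4, increments=1000):
    """Start several threads that all update the same counter."""
    def work():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value()
```

Because every access goes through the lock, `run_workers(ProtectedCounter())` always returns `workers * increments`. The point of the higher-level construct is that callers cannot forget the lock, since the data is not reachable any other way.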
Performance issues
- Callgrind: where is my CPU time spent?
  - It can measure a lot more
    - E.g. memory cache misses, using a cache simulator
- Callgrind is the main tool used at Eurocontrol to tune performance
- Kcachegrind: amazing visualisation tool for Callgrind output
Kcachegrind
Memory use analysis
- Memcheck
  - Reports “delta memory” usage between two memory scans
  - Reports can be triggered from the program or from the shell
- Massif
  - Shows the evolution of memory use over time
  - Produces reports at regular intervals or on request
- Exp-dhat
  - Shows whether heap-allocated memory is “accessed” a lot
  - E.g. can report memory that was allocated and then not used anymore
- Memcheck and Massif are used at Eurocontrol
Feedback from Valgrind use at Eurocontrol
- Very easy to use
  - No re-compilation, no re-linking, works with closed-source libs, …
- Many powerful/advanced functionalities
- But, depending on the tool
  - 3 .. 100+ times slower
  - 2 .. xxx+ times more memory
- Eurocontrol applications are big/heavy
  - Encountered very high memory and CPU use by Valgrind => several optimisations/additional functionalities added to Valgrind
Valgrind NEWS
- One or two new major releases per year
  - New platforms, support for new instructions, …
  - New functionalities, new tools, …
  - Optimisation in CPU or memory, …
  - Bug fixes
- Easy to get and compile new versions
  - Get the latest released version from http://www.valgrind.org
  - Next (unreleased) version:
      svn co svn://svn.valgrind.org/valgrind/trunk valgrind
      cd valgrind
      ./autogen.sh
      ./configure --prefix=...
      make
      make install
Valgrind NEWS
- Current release: 3.8.1
- Next release under development: 3.9.0
- This talk discusses NEWS from recent releases and from the next release
- Functionality not yet released is shown in orange (will be in 3.9.0)
Valgrind NEWS: platforms
- Started on Linux/x86
- Now available on
  - Linux/x86, amd64, ppc32, ppc64, arm, s390, mips32
  - Android/arm, x86
  - MacOS/x86, amd64
- Support for new instructions
  - E.g. SSE, AVX, AES
  - E.g. ppc Decimal Floating Point instructions
- Support for new distributions and glibc versions
Valgrind NEWS: improved leak functionality
- 3.8.1
  - A Memcheck leak suppression suppresses all leak kinds
    - E.g. an entry aimed at suppressing “possible leak” also suppresses “definite leak”
  - Dangling pointer errors only report the “freed at” stack trace
- 3.9.0:
  - A suppression optionally indicates the kinds of leaks to suppress
  - Command line arguments to control output and/or exit code
    - --show-leak-kinds=kind1,kind2,…
    - --errors-for-leak-kinds=kind1,kind2,…
  - --keep-stacktraces=alloc|free|alloc-and-free|alloc-then-free|none
    - Can report more stack traces in a dangling pointer error
    - Or can optimise memory by recording fewer or no stack traces
      - E.g. if not interested in some error kinds
  - --merge-recursive-frames=<number>
    - Useful to limit the number of recorded stack traces by merging recursive calls
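How merging recursive calls reduces the number of distinct stack traces can be sketched as a toy model. The function below is hypothetical and written in Python for illustration; it works on function names rather than program counters and is not Valgrind's code. A frame that already appears among the last `max_cycle` kept frames closes a recursion cycle, and the frames of that cycle are dropped.

```python
def merge_recursive_frames(frames, max_cycle):
    """Collapse recursion cycles of at most max_cycle frames.

    frames is a stack trace, innermost call first.
    """
    merged = []
    for frame in frames:
        # Only the last max_cycle kept frames are candidates for a match.
        window = merged[-max_cycle:] if max_cycle > 0 else []
        if frame in window:
            # Rewind to the most recent earlier occurrence: the cycle collapses.
            last = len(merged) - 1 - window[::-1].index(frame)
            del merged[last + 1:]
        else:
            merged.append(frame)
    return merged
```

With `max_cycle=1`, the traces `["leaf", "walk", "main"]` and `["leaf", "walk", "walk", "walk", "main"]` both become `["leaf", "walk", "main"]`, so a recursive algorithm no longer produces one distinct trace per recursion depth. Mutual recursion between two functions needs `max_cycle=2` to be merged.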
Valgrind NEWS : gdb server (1)
- The GDB server makes a program fully debuggable while it runs under Valgrind
- Connect with GDB to the Valgrind gdb server
- GDB can then
  - Insert breakpoints, (unlimited) watchpoints, …
  - Examine the list of threads/tasks
  - Examine the value of variables
  - Continue/interrupt execution
  - …
- The Valgrind gdb server provides “monitor commands”
  - They allow Valgrind functionalities to be triggered from GDB (or from the shell command line)
    - E.g. for Memcheck: leak search, checking definedness, …
Valgrind NEWS : gdb server (2): memcheck monitor commands
get_vbits <addr> [<len>]
  returns validity bits for <len> (or 1) bytes at <addr>
  bit values: 0 = valid, 1 = invalid, __ = unaddressable byte
  Example: get_vbits 0x8049c78 10
make_memory [noaccess|undefined|defined|Definedifaddressable] <addr> [<len>]
  marks <len> (or 1) bytes at <addr> with the given accessibility
check_memory [addressable|defined] <addr> [<len>]
  checks that <len> (or 1) bytes at <addr> have the given accessibility
  and outputs a description of <addr>
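The state these three commands operate on can be pictured with a toy shadow memory. The Python class below is a hypothetical sketch, not Memcheck's data structure: it keeps one entry per byte, with `None` for an unaddressable byte and otherwise the 8 validity bits of that byte (0 = valid, 1 = invalid). The `Definedifaddressable` mode is left out, and the exact output layout of the real `get_vbits` may differ.

```python
class ShadowMemory:
    """Toy shadow state: one entry per byte, None = unaddressable."""

    def __init__(self, size):
        self._vbits = [None] * size   # nothing is addressable at first

    def make_memory(self, state, addr, length=1):
        """state is 'noaccess', 'undefined' or 'defined'."""
        value = {"noaccess": None, "undefined": 0xFF, "defined": 0x00}[state]
        for a in range(addr, addr + length):
            self._vbits[a] = value

    def get_vbits(self, addr, length=1):
        """Two hex digits per byte, '__' for an unaddressable byte."""
        return " ".join(
            "__" if v is None else f"{v:02x}"
            for v in self._vbits[addr:addr + length]
        )

    def check_memory(self, state, addr, length=1):
        """state is 'addressable' or 'defined'; True if every byte qualifies."""
        chunk = self._vbits[addr:addr + length]
        if state == "addressable":
            return all(v is not None for v in chunk)
        if state == "defined":
            return all(v == 0x00 for v in chunk)
        raise ValueError(f"unknown state {state!r}")
```

For instance, marking 4 bytes as undefined and then the first 2 as defined leaves those 4 bytes addressable but only the first 2 defined, and `get_vbits` over 6 bytes shows zero bits, then all-one bits, then `__` for the bytes that were never made accessible.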
Valgrind NEWS : gdb server (3): memcheck monitor commands
leak_check [full*|summary]
           [kinds kind1,kind2,...|reachable|possibleleak*|definiteleak]
           [increased*|changed|any]
           [unlimited*|limited <max_loss_records_output>]
      * = defaults
  where kind is one of definite indirect possible reachable all none
  Examples: leak_check
            leak_check summary any
            leak_check full kinds indirect,possible
            leak_check full reachable any limited 100
block_list <loss_record_nr>
  after a leak search, shows the list of blocks of <loss_record_nr>
who_points_at <addr> [<len>]
  shows places pointing inside <len> (default 1) bytes at <addr>
  (with len 1, only shows "start pointers" pointing exactly to <addr>,
  with len > 1, will also show "interior pointers")
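The four leak kinds and the role of start pointers versus interior pointers can be illustrated with a toy classifier. This Python sketch is a simplified model and not Memcheck's algorithm; the function names and the graph encoding are hypothetical. Each block lists the pointers it contains as `(target, is_start_pointer)` pairs, and `roots` lists the pointers found outside the heap (for example in registers, stacks and globals).

```python
def reach(blocks, start_targets, only_start_pointers):
    """Blocks reachable from start_targets, following the chosen pointers."""
    seen, stack = set(), list(start_targets)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(target for target, is_start in blocks[name]
                     if is_start or not only_start_pointers)
    return seen


def classify_leaks(blocks, roots):
    """Return {block: 'reachable' | 'possible' | 'definite' | 'indirect'}."""
    # Reachable: a chain of start pointers leads from a root to the block.
    reachable = reach(blocks, [t for t, is_start in roots if is_start], True)
    # Possible: reached from a root, but only via some interior pointer.
    possible = reach(blocks, [t for t, _ in roots], False) - reachable
    kinds = {b: "reachable" for b in reachable}
    kinds.update({b: "possible" for b in possible})

    lost = set(blocks) - reachable - possible
    pointed_to = {t for b in lost for t, _ in blocks[b] if t != b}
    # Lost blocks that nothing points to come first; cycles are handled after.
    for leader in sorted(lost - pointed_to) + sorted(lost & pointed_to):
        if leader in kinds:
            continue                  # already indirectly lost via a leader
        kinds[leader] = "definite"
        for b in reach(blocks, [leader], False) - {leader}:
            kinds.setdefault(b, "indirect")
    return kinds
```

As an example, take blocks A to E where A points to B, C points to D, and the roots hold a start pointer to A and only an interior pointer to E. A and B come out as reachable, E as possible, C as definite and D as indirect. A filter such as `kinds indirect,possible` would then select D and E.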
Valgrind NEWS: tune red zones size
- Red zone = protection zone before/after a malloc-ed block
  - Allows detection of buffer over/under-flows
  - If too small: less chance to detect a bug
  - If too big: uses too much memory
- Command line options to increase/decrease the size
  - --redzone-size=<number>
    - Size for client (application) malloc’ed blocks
  - --core-redzone-size=<number>
    - Size for Valgrind internal malloc’ed blocks
- No buffer overflows with Ada => use a minimal red zone
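The trade-off between detection and memory can be made concrete with a toy heap. The Python class below is a hypothetical model, not Valgrind's allocator: blocks are laid out back to back, each surrounded by red zones, and an out-of-bounds access is flagged only if it lands in a red zone rather than in the payload of a neighbouring block.

```python
class ToyHeap:
    """Blocks laid out back to back, each surrounded by red zones."""

    def __init__(self, redzone_size):
        self.redzone_size = redzone_size
        self._state = []              # per byte: 'redzone' or 'payload'

    def malloc(self, size):
        """Append red zone + payload + red zone; return the payload address."""
        self._state += ["redzone"] * self.redzone_size
        addr = len(self._state)
        self._state += ["payload"] * size
        self._state += ["redzone"] * self.redzone_size
        return addr

    def access_is_flagged(self, addr):
        """True if a checker would report the access (it hits a red zone)."""
        return self._state[addr] == "redzone"

    def overhead_bytes(self):
        """Memory spent on red zones only."""
        return self._state.count("redzone")
```

With two 8-byte blocks and 4-byte red zones, an access 8 bytes past the start of the first block is flagged, but an access 16 bytes past it lands in the second block and goes unnoticed. With 16-byte red zones the same access is flagged, at the cost of 64 bytes of red zones instead of 16.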
Valgrind NEWS: support for other malloc libs
- Command line option --soname-synonyms=… allows support for non-libc malloc libraries or statically linked libs
  - --soname-synonyms=somalloc=*tcmalloc*
    - Support for all variants of tcmalloc shared libraries
  - --soname-synonyms=somalloc=NONE
    - Support for a statically linked malloc library
Valgrind bad NEWS: failure to develop, help needed
- Valgrind serialises thread execution
  - In other words, on a multi-core machine, Valgrind can only use one core
- Trial done to make a “really” multi-threaded Valgrind
  - Many race conditions found (with Valgrind on Valgrind)
    - Some have been fixed
  - The “none” tool makes reasonable use of multiple cores
- Biggest (unsolved) blocking problem:
  - The Memcheck “VA bits” data structure is used for each memory access
  - Using locks to protect it is way too slow
  - Even using one atomic instruction is too slow => ????
- Ideas/help welcome …
Reliable Software : other tools/approaches/…
- AdaControl: Ada coding rule checker
  - Developed initially for Eurocontrol. Open source
  - Routinely used at Eurocontrol
- Static code analysers
  - CodePeer (AdaCore)
- Program provers
  - SPARK: annotated subset of Ada
  - Ada 2012 contracts
- …
Conclusion : Reliable Software
- Reliable software is obtained using a combination of various techniques and tools
- Use a safe language, i.e. Ada
- Complement this with tools
  - Valgrind is a main tool used at Eurocontrol
  - Use it, you will like it
Questions ?