![Page 1: Copyright © 2005 EEM202A/CSM213A - Fall 2005 Ram Kumar & Roy Shea UCLA - NESL {ram@ee,roy@cs}.ucla.edu Lecture #12: Reliable Embedded](https://reader036.vdocument.in/reader036/viewer/2022062407/56649d4c5503460f94a2a3f3/html5/thumbnails/1.jpg)
Copyright © 2005
EEM202A/CSM213A - Fall 2005
Ram Kumar & Roy Shea
UCLA - NESL
{ram@ee,roy@cs}.ucla.edu
http://nesl.ee.ucla.edu
Lecture #12: Reliable Embedded Software
Reading List for this Lecture
• “The Model Checker Spin”, IEEE Trans. on Software Engineering, Vol. May 1997.
• D. Gay, P. Levis, R. von Behren, M. Welsh, E. Brewer, and D. Culler. “The nesC Language: A Holistic Approach to Networked Embedded Systems”. Proceedings of Programming Language Design and Implementation (PLDI) 2003, June 2003.
• G. Necula, S. McPeak, S.P. Rahul, and W. Weimer. “CIL: Intermediate Language and Tools for Analysis and Transformation of C Programs”. Proceedings of Conference on Compiler Construction, 2002.
• Mark Weiser. “Program Slicing”. 5th International Conference on Software Engineering. 1981.
• Robert Wahbe, Steven Lucco, Thomas E. Anderson, Susan L. Graham, “Efficient software-based fault isolation,” Proceedings of the fourteenth ACM symposium on Operating systems principles (SOSP-93).
– http://citeseer.ist.psu.edu/wahbe93efficient.html
• Niall Murphy, “Watchdog Timers,” Embedded Systems Programming
– http://www.embedded.com/2000/0011/0011feat4.htm
Outline
• Overview of design process
• Static analysis
– Concurrency
– Memory usage
• Runtime monitoring
– Detection
– Isolation
• Hardware support
• Conclusions
[Figure: design process diagram with stages Specification, Implementation (static analysis), Testing, and Deployment (runtime monitoring)]
Overview of Software Design Process
• Specification
– Understand task and constraints
– Develop formal models for protocols
– “The Model Checker Spin”, IEEE Trans. on Software Engineering, Vol. May 1997.
• Testing
– Feed inputs
– Stress test
– Long test
• Implementation*
– Coding standards
– Code reviews and pair programming
– Static analysis
• Deployment*
– Fault detection
– Isolation
– Feedback
What and Why of Static Analysis
• “Testing and verification of a system without running the code”
• Specification may not be implemented correctly
• Not all errors appear during test runs
– Concurrency problems with timing dependence
– Faults under specific system loads
• Complements other techniques
• Early detection such as type checking
Techniques
• Create abstract model of the program
– Direct reasoning about code is hard
– Basic blocks or AST
– G. Necula, S. McPeak, S.P. Rahul, and W. Weimer. “CIL: Intermediate Language and Tools for Analysis and Transformation of C Programs”. Proceedings of Conference on Compiler Construction, 2002.
• Examine the model
– Mark Weiser. “Program Slicing”. 5th International Conference on Software Engineering. 1981.
– Dataflow to track state through a program
#include <stdlib.h>
#include <stdio.h>

int main() {
    int x;
    int y;

    x = rand() % 10;
    y = rand() % 9;

    if (x > y) {
        x = x * x / 2;
    } else {
        x = y / 2;
        y = y * x;
    }

    printf("X+Y = %d", x + y);
    return 0;
}
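To make "dataflow to track state through a program" concrete, here is a minimal sketch of one possible dataflow analysis over the program above: tracking a value range (interval) for x and y. The interval domain and every name in it (`interval`, `iv_join`, `analyze_sum`) are assumptions introduced for illustration and do not come from the slides or from CIL. The sketch ignores the branch condition and joins the two branches at the merge point, so the bound it reports is sound but not tight.

```c
#include <assert.h>

/* Hypothetical abstract value: every concrete value lies in [lo, hi].
   All ranges in this example are non-negative. */
typedef struct { int lo, hi; } interval;

interval iv(int lo, int hi) { interval r = { lo, hi }; return r; }

interval iv_add(interval a, interval b) {
    return iv(a.lo + b.lo, a.hi + b.hi);
}

/* Multiplication, valid only for non-negative ranges. */
interval iv_mul(interval a, interval b) {
    assert(a.lo >= 0 && b.lo >= 0);
    return iv(a.lo * b.lo, a.hi * b.hi);
}

/* Integer division by a positive constant, non-negative ranges only. */
interval iv_div_const(interval a, int c) {
    assert(a.lo >= 0 && c > 0);
    return iv(a.lo / c, a.hi / c);
}

/* Join at a control-flow merge: smallest range covering both inputs. */
interval iv_join(interval a, interval b) {
    return iv(a.lo < b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi);
}

/* Abstractly executes the slide's program and returns the range of x + y. */
interval analyze_sum(void) {
    interval x = iv(0, 9);   /* x = rand() % 10 */
    interval y = iv(0, 8);   /* y = rand() % 9  */

    /* then-branch: x = x * x / 2; y is unchanged */
    interval x_then = iv_div_const(iv_mul(x, x), 2);
    interval y_then = y;

    /* else-branch: x = y / 2; y = y * x */
    interval x_else = iv_div_const(y, 2);
    interval y_else = iv_mul(y, x_else);

    /* merge point before printf */
    interval x_out = iv_join(x_then, x_else);
    interval y_out = iv_join(y_then, y_else);
    return iv_add(x_out, y_out);
}
```

Calling `analyze_sum()` yields the range [0, 72] for the printed value. A more precise analysis could use the condition x > y to narrow each branch.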
Example: Concurrency
• Problem
– Shared data can be corrupted by concurrent accesses
– Concurrency is a problem even without threading (why?)
• Solution
– Annotate atomic code blocks
– Infer what must be protected
– Verify protection by looking at code base
• D. Gay, P. Levis, R. von Behren, M. Welsh, E. Brewer, and D. Culler. “The nesC Language: A Holistic Approach to Networked Embedded Systems”. Proceedings of Programming Language Design and Implementation (PLDI) 2003, June 2003.
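One common answer to the "why?" above is interrupts: a handler can preempt the main code between two instructions even when there are no threads. The sketch below simulates that on a host machine, assuming an 8-bit target where a 16-bit counter must be read one byte at a time. The interrupt is modeled by an explicit call, and all names (`timer_isr`, `maybe_interrupt`, `read_counter_atomic`) are hypothetical; real code would use the platform's interrupt disable/enable primitives or a nesC `atomic` block.

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit counter stored as two bytes, shared between main code and an ISR. */
volatile uint8_t counter_lo = 0xFF;
volatile uint8_t counter_hi = 0x00;

int interrupts_enabled = 1;   /* simulated global interrupt enable flag */
int interrupt_pending = 0;    /* set when an interrupt arrives while disabled */

/* Simulated timer interrupt handler: 16-bit increment in two steps. */
void timer_isr(void) {
    counter_lo++;
    if (counter_lo == 0) {
        counter_hi++;
    }
}

/* Marks a point where the hardware interrupt happens to fire. */
void maybe_interrupt(void) {
    if (interrupts_enabled) {
        timer_isr();
    } else {
        interrupt_pending = 1;    /* deferred until interrupts are re-enabled */
    }
}

/* Unprotected read: the interrupt lands between the two byte reads. */
uint16_t read_counter_unsafe(void) {
    uint8_t lo = counter_lo;
    maybe_interrupt();
    uint8_t hi = counter_hi;
    return (uint16_t)((hi << 8) | lo);
}

/* Protected read: the two byte reads form an atomic section. */
uint16_t read_counter_atomic(void) {
    interrupts_enabled = 0;
    uint8_t lo = counter_lo;
    maybe_interrupt();
    uint8_t hi = counter_hi;
    interrupts_enabled = 1;
    if (interrupt_pending) {      /* run the deferred handler now */
        interrupt_pending = 0;
        timer_isr();
    }
    return (uint16_t)((hi << 8) | lo);
}
```

Starting from 0x00FF, the unsafe read returns 0x01FF, a value the counter never held. The atomic read returns 0x00FF and the counter then advances to 0x0100.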
Example: Memory Management
• Problem
– Dynamic memory in embedded applications can result in difficult-to-understand bugs and strange errors
– Dangling pointers, memory leaks, data corruption
• Important benefits of dynamic memory
– Significantly simplify code base
– Dynamic Memory Allocation in Embedded Apps?
– http://ask.slashdot.org/article.pl?sid=05/11/16/2236235&tid=156&tid=201&tid=4
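int *p = malloc(sizeof(int) * num);
int *q = malloc(sizeof(int) * num * 2);
int *r = p;                        /* r aliases the block p points to */
...
free(r);                           /* frees that block; p is now dangling */
...
if (p[0] == 0) launchMissile();    /* use after free: undefined behavior */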
Model for Memory
Formalized by Shane Markstrum
Implementation
• Convert module into an AST
• Use data flow to track annotated data
__attribute__((sos_claim)) __attribute__((sos_release))
• Must either:
– persistently store data once
– free data
– release data to ownership of another module
• Must not create any persistent references to data before call
• Must treat data as dead after the call
[Legend: caller, callee]
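The rules above can be pictured with a small run-time model of ownership. This is only a sketch: the checker described on the slide works statically on the AST, whereas the code below tracks an owner field at run time so the rules can be exercised. It assumes one reading of the slide, namely that claimed data must be stored, freed, or released, and that released data is dead to the releasing module. The type and function names (`tracked_buf`, `buf_claim`, `buf_release`, `buf_may_use`, `buf_free`) are hypothetical and are not part of SOS.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical buffer that records which module currently owns it. */
typedef struct {
    int owner;     /* id of the owning module */
    int payload;   /* stand-in for the actual data */
} tracked_buf;

/* Claim: the allocating module becomes the owner of the new data. */
tracked_buf *buf_claim(int module) {
    tracked_buf *b = malloc(sizeof *b);
    if (b != NULL) {
        b->owner = module;
        b->payload = 0;
    }
    return b;
}

/* Release: ownership moves from one module to another.
   Returns 0 on success, -1 if `from` is not the owner. */
int buf_release(tracked_buf *b, int from, int to) {
    assert(b != NULL);
    if (b->owner != from) {
        return -1;
    }
    b->owner = to;
    return 0;
}

/* After a release the data is dead to the old owner. */
int buf_may_use(const tracked_buf *b, int module) {
    assert(b != NULL);
    return b->owner == module;
}

/* Only the current owner may free the data.
   Returns 0 on success, -1 if `module` is not the owner. */
int buf_free(tracked_buf *b, int module) {
    assert(b != NULL);
    if (b->owner != module) {
        return -1;
    }
    free(b);
    return 0;
}
```

In this model, a module that releases a buffer and then tries to use or free it gets a rejection at run time. The static analysis aims to report the same mistake before the code is deployed.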
Outline
• Run Time Techniques
– Operate during the execution of system
– Access to more information than the static analysis tools
– Introduce performance overheads
• Fault Isolation
– Localize the impact of the fault
– Specifically looking at memory corruption faults
• Fault Tolerance
– Detect and recover from a fault
  • Restore to a known good state
  • Re-initialize the state
– Specifically looking at hardware/architecture based techniques
Memory Corruption Fault Within Single Address Space
• A program is free to access the entire address space
• Memory Corruption Fault
– Very easy for a program to corrupt the state of other programs
• Desktop/Server CPUs have MMU
– No MMU in Embedded Processors (esp. micro-controllers)
– Reasons: power, performance, cost
[Figure: program memory (applications, middleware, operating system) and data memory (run-time stack, global data and heap), each a single continuous address space]
Software Fault Isolation (SFI)
• Re-write the program to perform fault isolation in software
– Simple but a very powerful concept
– Useful even in servers/desktops for high performance application extensions, kernel extensions etc.
• Trade slower instrumented code for more protection
– No need for a hardware protection boundary
• Slogan - You can still shoot yourself in the foot, but you can’t shoot the other guy in the foot
Ack. Prof. Aiken UCB
Overview
• Maintain two invariants for isolated code
– Any jumps stay within the isolated code
– Any writes are to data belonging to the isolated code
• Idea: Divide the address-space into segments
– Segment addresses have unique high-order bits
• Protection subdomains are defined by segments
– Every write must be within the segment
– Every jump must be within the segment
Fault Domain
[Figure: program memory (PROG) and data memory (DATA) partitioned into fault domains for the sampling application, middleware, and operating system, alongside the run-time stack]
• No jumps outside fault domain
• No writes outside fault domain
Implementation - Segment Matching
• Replace each store by the sequence:
    dedicated-reg ← target address
        Move target address into dedicated register
    scratch-reg ← (dedicated-reg >> shift-reg)
        Right shift address to get segment identifier
        Shift-reg is dedicated
    compare scratch-reg and segment-reg
        Compare segment identifier with current segment
        Segment-reg is dedicated
    trap if not equal
        Trap if store address is outside of the segment
    store through dedicated-reg
        Guaranteed to store at the correct address
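The sequence above can be sketched in C against a simulated memory. This is an illustration only: real SFI inserts these steps as machine instructions around every store, and the memory size, the 8-bit segment offset, and the names `sim_mem`, `trap_count`, and `checked_store` are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define SEG_SHIFT 8u       /* assumption: low 8 bits are the offset in a segment */
#define MEM_SIZE  1024u    /* assumption: 4 segments of 256 bytes each */

uint8_t sim_mem[MEM_SIZE]; /* simulated data memory */
int trap_count = 0;        /* number of stores rejected so far */

/* Segment-matched store. Returns 1 if the store happened, 0 if it trapped. */
int checked_store(uint16_t target, uint8_t value, uint16_t segment_id) {
    /* the fault domain's own segment must exist in the simulated memory */
    assert(segment_id < (MEM_SIZE >> SEG_SHIFT));

    uint16_t dedicated = target;                            /* dedicated-reg <- target address */
    uint16_t scratch = (uint16_t)(dedicated >> SEG_SHIFT);  /* scratch-reg <- segment identifier */

    if (scratch != segment_id) {   /* compare with segment-reg, trap if not equal */
        trap_count++;
        return 0;
    }
    sim_mem[dedicated] = value;    /* store through dedicated-reg */
    return 1;
}
```

A store to 0x0123 from a module confined to segment 1 succeeds, while a store to 0x0223 from the same module traps and leaves memory untouched.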
Comments
• Segment matching overhead
– 4 instructions for EVERY store instruction in the program
• Requires three dedicated registers
– Dedicated-reg holds the address being computed
– Segment-reg holds current valid segment
– Shift-reg holds the size of the shift to perform
– These three registers will not be used elsewhere in the program
• Why dedicated registers ?
– What will happen if a jump instruction by-passes all checks ?
– What will happen if a jump lands in the middle of the checks ?
Sandboxing - Faster Approach
• Idea
– Don’t test the segment bits
– Just overwrite segment bits with correct segment
    dedicated-reg ← (target-reg & and-mask-reg)
        Use dedicated register and-mask-reg to clear segment identifier bits
    dedicated-reg ← dedicated-reg | segment-reg
        Use dedicated register segment-reg to set segment identifier bits
• This is much faster
– Only two instructions per instrumentation point
• Loses information about errors
– Program may keep running with incorrect instructions and data
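The two-instruction sandboxing sequence can be sketched as a pure address transformation. The 8-bit offset width and the names `OFFSET_BITS`, `AND_MASK`, and `sandbox_address` are assumptions chosen for the example; a real implementation keeps the mask and segment bits in dedicated registers.

```c
#include <assert.h>
#include <stdint.h>

#define OFFSET_BITS 8u                                   /* assumption: 256-byte segments */
#define AND_MASK ((uint16_t)((1u << OFFSET_BITS) - 1u))  /* keeps offset, clears segment bits */

/* Forces `target` into the segment `segment_id` without testing it. */
uint16_t sandbox_address(uint16_t target, uint16_t segment_id) {
    assert(segment_id < 256u);   /* segment bits must fit in the upper byte */

    uint16_t segment_bits = (uint16_t)(segment_id << OFFSET_BITS);
    uint16_t dedicated = (uint16_t)(target & AND_MASK);   /* clear segment identifier bits */
    dedicated = (uint16_t)(dedicated | segment_bits);     /* set segment identifier bits */
    return dedicated;
}
```

An address already inside segment 1, such as 0x0123, passes through unchanged. A stray address such as 0x0723 is silently redirected to 0x0123, which shows the lost error information: the store lands inside the module's own segment, but not where the program intended.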
Implementation Details
• Optimizations
– Traditional compiler optimizations
  • Move sandboxing out of the loop
– Don’t instrument statically verifiable writes and jumps
• Binary instrumentation
– Most portable & easily deployed
– Also the hairiest option
– Need to verify the binary
  • No use of dedicated registers
• Modified compiler
– Less easy to adopt
– But easier to implement
Things to ponder about …
• How will the applications residing in their respective fault domains communicate with one another ?
• How will the data be shared amongst the fault domains?
• How will SFI be implemented on micro-controllers with less than 1 KB of memory ?
Embedded Systems In Real World
• Used in inaccessible places
– Controllers for space vehicles - Mars Pathfinder
– Closer home … sensor networks in dense forests
• Used for critical applications
– Brake-by-wire systems
– Medical Instruments
• Unexpected faults
– Cosmic rays may flip on-chip bits
• Hard or even impossible to produce perfect firmware
– Strive to design our systems to cleanly handle failures
Watchdog Timer Hardware
• Hardware counter that is set to an initial value
• Continually counts down to zero
• Responsibility of the software to periodically reset the count to its original value
• When the counter reaches zero, the software is assumed to have failed
• Perform any suitable recovery
– Typically reset the CPU
• Visual Metaphor
– “If the man stops kicking the dog, the dog will take advantage of the hesitation and bite the man.”
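The counter behavior described above can be simulated in a few lines of C. This is a host-side model, not a driver for any particular microcontroller: the `watchdog` struct and the `wdt_init`, `wdt_kick`, and `wdt_tick` functions are hypothetical names, and the CPU reset is represented by incrementing a counter and reloading the timer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a watchdog timer. */
typedef struct {
    uint32_t reload;   /* initial value of the counter */
    uint32_t count;    /* current countdown value */
    uint32_t resets;   /* how many times the watchdog has bitten */
} watchdog;

void wdt_init(watchdog *w, uint32_t reload) {
    assert(reload > 0);
    w->reload = reload;
    w->count = reload;
    w->resets = 0;
}

/* Software "kicks the dog": set the count back to the original value. */
void wdt_kick(watchdog *w) {
    w->count = w->reload;
}

/* One hardware clock tick. Returns 1 if the watchdog bites on this tick. */
int wdt_tick(watchdog *w) {
    if (w->count > 0) {
        w->count--;
    }
    if (w->count == 0) {
        w->resets++;            /* stands in for resetting the CPU */
        w->count = w->reload;   /* timer restarts after the reset */
        return 1;
    }
    return 0;
}
```

With a reload value of 3, software that kicks at least once every two ticks never sees a reset. Three ticks without a kick produce a bite.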
Failures detected by watchdog
• Catch events that hang the system
• Transient Failures
– Power glitches may corrupt program counter, stack pointer or even the data in RAM
• Software Bugs
– Infinite loops
– Accidental jump out of code memory
– Deadlock conditions (Incorrect design)
• Watchdog guarantees that none of the bugs will hang the system indefinitely
Watchdog Design Considerations
• First Aid - Recovery from watchdog bite
• Maintain a count of number of resets
– Shutdown a persistently errant application
• Use watchdog for sanity checks
– Verify the control flow through a piece of code
– Record failure reports in non-volatile storage
– Diagnostic information is very useful
• Choosing watchdog timeout interval
– Need to understand the timing characteristics of the program
– Large interval - Slow response
– Small interval - Frequent resets, difficult to diagnose
• Space Shuttle’s main engine controller
– WDT timeout 18 ms
– Switchover to a backup computer
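Two of the considerations above, control-flow sanity checks and a reset count, can be sketched together. The idea shown is one common pattern rather than the only one: each stage of the main loop sets a flag, and the watchdog is kicked only if every flag was set. All names (`wdt_guard`, `guard_checkpoint`, `guard_try_kick`, `guard_on_watchdog_reset`) and the limit of three resets are assumptions. On real hardware the reset count would live in non-volatile storage and the kick would be a write to the timer's reload register.

```c
#include <assert.h>
#include <stdint.h>

#define CHECKPOINT_A    0x01u
#define CHECKPOINT_B    0x02u
#define CHECKPOINT_C    0x04u
#define ALL_CHECKPOINTS (CHECKPOINT_A | CHECKPOINT_B | CHECKPOINT_C)
#define MAX_RESETS      3u   /* assumption: give up after three watchdog resets */

typedef struct {
    uint8_t  seen;          /* checkpoints passed in the current loop iteration */
    uint32_t kicks;         /* stands in for writes to the watchdog reload register */
    uint32_t reset_count;   /* would be kept in non-volatile storage */
} wdt_guard;

/* Called at each stage of the main loop. */
void guard_checkpoint(wdt_guard *g, uint8_t flag) {
    g->seen = (uint8_t)(g->seen | flag);
}

/* Called at the end of the loop: kick only if every checkpoint was passed.
   Returns 1 if the watchdog was kicked, 0 otherwise. */
int guard_try_kick(wdt_guard *g) {
    int ok = (g->seen == ALL_CHECKPOINTS);
    if (ok) {
        g->kicks++;
    }
    g->seen = 0;   /* start the next iteration with no checkpoints passed */
    return ok;
}

/* Called at startup after a watchdog reset.
   Returns 1 if the application should be shut down as persistently errant. */
int guard_on_watchdog_reset(wdt_guard *g) {
    g->reset_count++;
    return g->reset_count >= MAX_RESETS;
}
```

If the loop skips a stage, `guard_try_kick` withholds the kick and the hardware watchdog eventually bites. After the third such reset the sketch reports that the application should be shut down.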
Watchdog Self Test
• What if WDT fails in a way that it never bites ?
• Would be discovered only if a failure hangs the system
• WDT failure is VERY EASILY possible
– WDT can be disabled in software
– HW Misconfiguration - Jumper of reset line pulled out
• Startup self-test
– Allow WDT to timeout and reset the processor
– Flag to distinguish power on reset from WDT reset
Grenade Timer
• Idea - Build a counter that cannot be reloaded once it is running
– Grenade whose pin has been pulled will have to explode
• Guaranteed reboot is a “useful feature” in some applications
– Purges all bad state and re-initializes the system
[Figure: Grenade Timer HW Interface]
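The difference from a watchdog can be shown with a simulated counter that refuses to be reloaded while it is running. The actual hardware interface is in the figure and is not reproduced here, so the `grenade_timer` struct and the `grenade_init`, `grenade_arm`, and `grenade_tick` functions are hypothetical, as is the assumption that the timer can be armed again after the reboot.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a grenade timer. */
typedef struct {
    uint32_t count;    /* ticks left until the forced reboot */
    int      running;  /* 1 once the "pin has been pulled" */
} grenade_timer;

void grenade_init(grenade_timer *g) {
    g->count = 0;
    g->running = 0;
}

/* Starts the countdown. Returns 1 if armed, 0 if refused
   because the timer is already running or `ticks` is zero. */
int grenade_arm(grenade_timer *g, uint32_t ticks) {
    if (g->running || ticks == 0) {
        return 0;   /* no reload once running */
    }
    g->count = ticks;
    g->running = 1;
    return 1;
}

/* One clock tick. Returns 1 on the tick that forces the reboot. */
int grenade_tick(grenade_timer *g) {
    if (!g->running) {
        return 0;
    }
    g->count--;
    if (g->count == 0) {
        g->running = 0;   /* the reboot clears the timer */
        return 1;
    }
    return 0;
}
```

Unlike the watchdog, a second call to `grenade_arm` while the countdown is in progress has no effect, so the reboot cannot be postponed by errant code.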
Taxonomy
• FAILURE
– Event that occurs when the delivered service deviates from the correct service
– Failure is the effect that is observed
– E.g. - “Your iPod Nano stops responding.”
• FAULT
– Fault is the cause of an error
– An error may lead to failure
– E.g. - “Memory corruption fault led to the failure of the iPod”