![Page 1: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/1.jpg)
Self-Stabilizing Systems as a Base for Autonomic Computing
Shlomi Dolev
Yinnon Haviv, Reuven Yagel,
Olga Brukman
![Page 2: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/2.jpg)
Trustworthy Systems: Why Is It So Hard?
Corbató’91: "It almost goes without saying that ambitious systems never quite work as expected" http://larch-www.lcs.mit.edu:8001/~corbato/turing91/
"You must pay extreme attention to detail here. One wrong bit will make things fail…" http://my.execpc.com/~geezer/os/pm.htm
From Pentium’s manual: "… if the ESP or SP register is 1 when the PUSH instruction is executed, the processor shuts down due to a lack of stack space. No exception is generated to indicate this condition"
![Page 3: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/3.jpg)
Mars Rover - Spirit
…The Spirit rover has a radiation-hardened R6000 CPU from Lockheed-Martin Federal Systems…
The operating system is Wind River Systems' Vx-Works…
…attempted to allocate more files than the RAM-based directory structure could accommodate. That caused an exception, which caused the task that had attempted the allocation to be suspended…
…Spirit fell silent, alone on the emptiness of Mars, trying and trying to reboot
http://www.eetimes.com/sys/news/OEG20040220S0046
![Page 4: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/4.jpg)
Linux and Windows do not Stabilize
![Page 5: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/5.jpg)
Self-Stabilization
Self-healing, Self-managing, Self-*
Recovery Oriented Computing [Berkeley, Stanford]
Autonomic Computing [IBM]
Self-Stabilization: self-stabilizing algorithm for mutual exclusion in a ring topology [Dijkstra’74]
![Page 6: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/6.jpg)
Well Established Theory !
![Page 7: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/7.jpg)
Self-Stabilizing Systems
Elegant fault-tolerant approach. Started at any state, the system converges to a desired behavior.
Generally used in distributed systems: routing, clock synchronization, leader election, etc.
Overcomes transient faults in the system. Transient faults: soft errors (“98% of RAM errors are soft errors”), wrong CRC during communication, etc.
![Page 8: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/8.jpg)
Self-Stabilization
The combination and type of faults cannot be totally anticipated in on-going systems.
Any on-going system must be self-stabilizing (or manually monitored).
![Page 9: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/9.jpg)
First Self-Stabilizing Algorithm: Token Passing
![Page 10: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/10.jpg)
Token Passing
P1: do forever
    if x1 = xn then
        x1 := (x1 + 1) mod (n + 1)
Pi (i ≠ 1): do forever
    if xi ≠ x(i-1) then
        xi := x(i-1)
![Page 11: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/11.jpg)
Token Passing Cont.
Surely works when we start in
x1 = x2 = … = xn = 0.
One processor may change a state at a time.
{0; 0; 0; 0; 0};
{1; 0; 0; 0; 0};
{1; 1; 0; 0; 0};
{1; 1; 1; 0; 0};
{1; 1; 1; 1; 0};
{1; 1; 1; 1; 1};
{2; 1; 1; 1; 1};
{2; 2; 1; 1; 1};
{2; 2; 2; 1; 1};
{2; 2; 2; 2; 1};
{2; 2; 2; 2; 2}
…
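The trace above can be reproduced with a few lines of Python. This is a minimal sketch, assuming 0-based indices (so p1 is index 0) and a scheduler that always lets the first token holder move; the function names `token_holders`, `move` and `trace` are illustrative and do not come from the slides.

```python
def token_holders(x):
    """Return the 0-based indices of the processors that currently hold a token."""
    n = len(x)
    holders = [0] if x[0] == x[n - 1] else []          # p1 holds a token when x1 = xn
    holders += [i for i in range(1, n) if x[i] != x[i - 1]]  # pi holds one when xi != x(i-1)
    return holders


def move(x, i):
    """Let processor i execute its rule once and return the new configuration."""
    n = len(x)
    y = list(x)
    y[i] = (x[0] + 1) % (n + 1) if i == 0 else x[i - 1]
    return y


def trace(x, moves):
    """Run `moves` steps, always scheduling the first token holder."""
    configs = [list(x)]
    for _ in range(moves):
        x = move(x, token_holders(x)[0])
        configs.append(x)
    return configs


if __name__ == "__main__":
    for config in trace([0, 0, 0, 0, 0], 10):
        print(config)
```

Started from the all-zero configuration, every configuration in the trace has exactly one token holder, which is the mutual exclusion property the algorithm is meant to provide.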
![Page 12: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/12.jpg)
Token Passing: Faults
A transient fault (soft errors, wrong CRC, unexpected temporary severe conditions, etc.) assigns each processor an arbitrary state (in the range of its state space).
For example {3; 4; 4; 1; 0}.
p2; p4; and p5 have tokens!
Will the system ever recover?
![Page 13: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/13.jpg)
Token Passing: Automatic Recovery
p1 changes state infinitely often.
Otherwise, let s1 be the fixed state of p1:
p2 eventually copies s1 from p1, then
p3 eventually copies s1 from p2, then ...
pn eventually copies s1 from pn-1, then
x1 = xn, so p1 changes state, a contradiction.
p1 changes state in the order 4; 5; 0; 1; 2; 3; 4; 5; 0; ...
![Page 14: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/14.jpg)
Token Passing: Automatic Recovery Cont.
In any initial state at least one state is missing, because n processors cannot cover all n + 1 values. For example, in {4; 4; 1; 0; 2} the values 3 and 5 are missing.
Once p1 reaches a missing state, e.g., 5, all the processors must copy 5 before p1 reads 5 from pn and changes state to 0.
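For small rings the recovery argument can also be checked exhaustively. The sketch below is an illustration under stated assumptions, not part of the original talk: it assumes one processor moves at a time, uses 0-based indices, and treats any configuration with more than one token as "bad". It then searches for a cycle among bad configurations; if none exists, no execution can stay multi-token forever. The names `privileged`, `move` and `stabilizes` are hypothetical.

```python
from itertools import product


def privileged(x):
    """0-based indices of the processors holding a token in configuration x."""
    n = len(x)
    holders = [0] if x[0] == x[n - 1] else []
    holders += [i for i in range(1, n) if x[i] != x[i - 1]]
    return holders


def move(x, i, k):
    """Processor i moves once; p1 (index 0) counts modulo k, the others copy."""
    y = list(x)
    y[i] = (x[0] + 1) % k if i == 0 else x[i - 1]
    return tuple(y)


def stabilizes(n, k):
    """True if no cycle exists among configurations with more than one token."""
    bad = {x for x in product(range(k), repeat=n) if len(privileged(x)) > 1}
    WHITE, GREY, BLACK = 0, 1, 2
    color = dict.fromkeys(bad, WHITE)

    def successors(x):
        return iter([move(x, i, k) for i in privileged(x)])

    for root in bad:
        if color[root] != WHITE:
            continue
        color[root] = GREY
        stack = [(root, successors(root))]
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if nxt not in bad:
                    continue              # left the bad region, not part of a bad cycle
                if color[nxt] == GREY:
                    return False          # back edge: a cycle of multi-token configurations
                if color[nxt] == WHITE:
                    color[nxt] = GREY
                    stack.append((nxt, successors(nxt)))
                    break
            else:
                color[node] = BLACK       # all successors explored
                stack.pop()
    return True


if __name__ == "__main__":
    print(privileged((3, 4, 4, 1, 0)))  # token holders of the faulty example
    print(stabilizes(5, 6))             # five processors, counter mod (n + 1)
```

With five processors and the counter taken mod 6, the search explores 6^5 = 7776 configurations, which is small enough to run instantly.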
![Page 15: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/15.jpg)
Will It Stabilize With mod (n - 2)?
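For n = 5 this means counting mod 3. One processor moves at each step:
{0,0,2,1,0}, p1 moves
{1,0,2,1,0}, p5 moves
{1,0,2,1,1}, p4 moves
{1,0,2,2,1}, p3 moves
{1,0,0,2,1}, p2 moves
{1,1,0,2,1}
The last configuration is the first one with +1 mod 3 added to every entry, so the same schedule can repeat forever and the system never stabilizes.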
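The mod 3 execution can be replayed mechanically. The following sketch assumes 1-based processor numbers in the schedule (as on the slide) and raises an error if a scheduled processor holds no token; `enabled`, `move` and `replay` are illustrative names.

```python
def enabled(x, i):
    """True if processor i (0-based) holds a token in configuration x."""
    return x[0] == x[-1] if i == 0 else x[i] != x[i - 1]


def move(x, i, k):
    """Processor i (0-based) executes its rule; p1 counts modulo k."""
    if not enabled(x, i):
        raise ValueError(f"p{i + 1} holds no token in {x}")
    y = list(x)
    y[i] = (x[0] + 1) % k if i == 0 else x[i - 1]
    return y


def replay(x, schedule, k):
    """Apply a schedule of 1-based processor numbers to configuration x."""
    for p in schedule:
        x = move(x, p - 1, k)
    return x


if __name__ == "__main__":
    start = [0, 0, 2, 1, 0]
    schedule = [1, 5, 4, 3, 2]
    print(replay(start, schedule, 3))       # start shifted by +1 mod 3
    print(replay(start, schedule * 3, 3))   # three rounds return to start
```

Because the rules only compare neighbours and add 1 modulo k, shifting every entry by the same amount does not change which processors hold tokens, so the schedule stays applicable round after round.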
![Page 16: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/16.jpg)
Is Self-Stabilization a Toy?
![Page 17: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/17.jpg)
Talk Outline
Self-Stabilizing Microprocessor [DH04]
Self-Stabilizing Operating System [DY04]
Self-Stabilization Preserving Compiler [DH05]
Self-Stabilizing Automatic Recoverer For Eventual Byzantine Software [BDK03]
Recovery Oriented Programming [BD05]
![Page 18: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/18.jpg)
Self-Stabilizing Microprocessor
Overcoming Soft-Errors
Shlomi Dolev and Yinnon A. Haviv
17th International Conference on Architecture of Computing Systems (ARCS)
![Page 19: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/19.jpg)
Motivation
Soft-Errors: Single Event Upsets (SEU)
Caused by cosmic ray / other disruptions.
Cause a logical gate to flip its content.
Currently handled only in memories.
Significant impact on the microprocessors.
![Page 20: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/20.jpg)
Soft-Errors - Current Solutions
Obtaining masking using probabilistic approaches: information redundancy (ECC / parity), space redundancy, time redundancy, failure detection / recovery.
Known solutions: IBM S-390, Compaq NonStop Himalaya, IROC.
![Page 21: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/21.jpg)
Self-Stabilizing Microprocessor
Self-stabilizing algorithms assume that the microprocessor executes them. Soft-errors may cause the microprocessor to be stuck in a faulty state.
A microprocessor is self-stabilizing if, started in any internal state, it converges in a finite number of steps into the set of safe states.
Microprocessor’s safe state: a state in which it performs the “fetch-decode-execute” cycle.
![Page 22: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/22.jpg)
Proving Convergence
Proving that there exists no “bad” cycle in the transition graph of the microprocessor.
Too large! (We must explore the entire graph.)
Using an abstraction: group together states in which the micro-code program counter is the same.
[Figure: transition graph with concrete states a to l and abstract states A to F.]
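The abstraction idea can be illustrated on a toy machine. The sketch below is not the microprocessor analyzed in [DH04]; it assumes a hypothetical state made of a 2-bit micro-program counter and a 1-bit flag, groups states by the counter, and uses Kahn's topological sort to look for a cycle among the abstract states outside the safe set. All names (`abstract_edges`, `has_bad_cycle`, `good_step`, `bad_step`) are illustrative.

```python
from itertools import product


def abstract_edges(states, step, group):
    """Project every concrete transition s -> step(s) onto its group labels."""
    return {(group(s), group(step(s))) for s in states}


def has_bad_cycle(edges, safe):
    """True if the abstract graph restricted to non-safe groups contains a cycle."""
    nodes = {v for edge in edges for v in edge} - safe
    succ = {v: set() for v in nodes}
    indegree = dict.fromkeys(nodes, 0)
    for a, b in edges:
        if a in nodes and b in nodes and b not in succ[a]:
            succ[a].add(b)
            indegree[b] += 1
    ready = [v for v in nodes if indegree[v] == 0]
    removed = 0
    while ready:                      # Kahn's algorithm: peel off nodes with no incoming edge
        v = ready.pop()
        removed += 1
        for w in succ[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    return removed != len(nodes)      # leftovers can only be nodes on or behind a cycle


def good_step(state):
    """Toy design: the micro-PC always advances 0 -> 1 -> 2 -> 3 -> 0."""
    upc, flag = state
    return ((upc + 1) % 4, flag)


def bad_step(state):
    """Toy design with a fault path: micro-PC 2 jumps back to 1 when the flag is set."""
    upc, flag = state
    if upc == 2 and flag == 1:
        return (1, flag)
    return ((upc + 1) % 4, flag)


if __name__ == "__main__":
    states = list(product(range(4), range(2)))
    by_upc = lambda s: s[0]           # group states that share the micro-PC value
    safe = {0}                        # assume micro-PC 0 is the fetch state
    print(has_bad_cycle(abstract_edges(states, good_step, by_upc), safe))
    print(has_bad_cycle(abstract_edges(states, bad_step, by_upc), safe))
```

The abstract graph over-approximates the concrete one, so a clean abstract graph rules out bad concrete cycles, while a reported abstract cycle still has to be confirmed on the concrete states.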
![Page 23: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/23.jpg)
Self-Stabilizing Microprocessor: Summary
Soft-errors are here to stay; we should design our systems to mask them, and to self-stabilize following a non-masked error.
We provide a methodology for validating the self-stabilization property of microprocessors.
![Page 24: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/24.jpg)
Talk Outline
Self-Stabilizing Microprocessor [DH04]
Self-Stabilizing Operating System [DY04]
Self-Stabilization Preserving Compiler [DH05]
Self-Stabilizing Automatic Recoverer For Eventual Byzantine Software [BDK03]
Recovery Oriented Programming [BD05]
![Page 25: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/25.jpg)
Toward Self-Stabilizing Operating System (SOS)
Shlomi Dolev and Reuven Yagel
SAACS’04 Workshop, Zaragoza
![Page 26: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/26.jpg)
Talk Outline
The first self-stabilizing algorithm (of Dijkstra)
Self-Stabilizing Microprocessor [DH04]
Self-Stabilizing Operating System [DY04]
Self-Stabilization Preserving Compiler [DH05]
Self-Stabilizing Automatic Recoverer For Eventual Byzantine Software [BDK03]
Recovery Oriented Programming [BD05]
![Page 27: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/27.jpg)
Basic Directions
Black-box: take an existing OS (Unix, Windows, RTOS) and add a stabilization layer.
Carefully tailoring a tiny kernel: processor scheduling, memory management, device allocation.
![Page 28: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/28.jpg)
Assumptions
Every configuration (processor/memory) is possible.
At least some program code is hardwired (in ROM) and is correct (Harvard model).
Processor: the instruction manual (e.g., x86/IA-32) defines a transition function. Self-stabilizing [DH04].
![Page 29: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/29.jpg)
Black Box
Requirements: defining a legal execution is usually impractical. At least restore the original state (variables + code) infinitely often.
Periodic Reset, Re-install and Execute (RRE): watchdog timer (self-stabilizing); periodic processor reset; during bootstrap, OS reinstall from ROM.
Weak self-stabilization: E = (c_i, a_i, c_{i+1}, …, RRE, c_1, a_1, c_2, a_2, …, c_i, a_i, c_{i+1}, …, RRE, c_1, a_1, c_2, a_2, …). Is it always acceptable?
Alternative: periodically re-install the code only, and add consistency check and enforcement.
![Page 30: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/30.jpg)
Tailored Kernel
Tiny Scheduler, Tiny Memory Manager.
Requirements: self-stabilizing; fair; process stabilization preserving (e.g., validity of the P.C. value).
![Page 31: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/31.jpg)
Tiny SOS Scheduler
    ; increase task
    mov word ax, [currentProc]
    and ax, PROC_MASK
    ...
    ; load task state
    ...
    ; restore ip
    mov ax, [bx+4]
    ; validate ip
    and ax, IP_MASK
    mov word [ss:STACK_TOP], ax
    ; restore general registers
    mov cx, word [bx+12]
    mov dx, word [bx+14]
    mov si, word [bx+16]
    mov di, word [bx+18]
~70 lines of real machine assembly code.
16-bit real mode & 32-bit protected mode.
Standard build and emulation tools (NASM, ld, Bochs).
Detailed proof of requirement preservation.
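The masking steps in the listing above are what make the scheduler tolerate arbitrary memory contents. The Python sketch below models only that idea; it assumes a power-of-two number of tasks, and the constants `PROC_COUNT`, `PROC_MASK` and `IP_MASK` carry hypothetical values that are not taken from the SOS source.

```python
PROC_COUNT = 4                 # hypothetical task count; must be a power of two
PROC_MASK = PROC_COUNT - 1     # masking then always yields a valid task index
IP_MASK = 0x0FFF               # hypothetical: confines the saved ip to a 4 KiB code area


def next_task(current_proc):
    """Advance to the next task; any corrupted value is forced into range."""
    return (current_proc + 1) & PROC_MASK


def validated_ip(saved_ip):
    """Force a possibly corrupted saved instruction pointer into the code area."""
    return saved_ip & IP_MASK


if __name__ == "__main__":
    proc = 0xBEEF              # arbitrary (corrupted) scheduler variable
    for _ in range(PROC_COUNT):
        proc = next_task(proc)
        print(proc, hex(validated_ip(0xFFFF)))
```

Whatever 16-bit value the scheduler variable holds, the first masked increment lands on a valid index, and from then on the tasks are visited round-robin, which matches the fairness requirement.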
![Page 32: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/32.jpg)
![Page 33: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/33.jpg)
Tiny SOS Memory Manager
Requirements: consistency of memory hierarchy; self-stabilization preservation.
![Page 34: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/34.jpg)
Tiny SOS Scheduler
[Figure: state diagram with states "Any State", "Establish Scheduler Consistency", "Next Process Validated & Ready" and "Process(ing)"; transitions labeled "Clock tick / execute next" and "Some Error".]
![Page 35: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/35.jpg)
Tiny SOS Scheduler
[Figure: state diagram with states "Any State", "Establish Scheduler Consistency", "Next Process Validated & Ready" and "Process(ing)"; transitions labeled "Clock tick / execute next" and "NMI / load PC with scheduler handler".]
![Page 36: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/36.jpg)
Sketch of Proof
In every execution E, the scheduler code starts executing, and runs from its first instruction to its last, infinitely often.
In every execution E of the scheduler, each process is executed infinitely often.
The self-stabilizing scheduler preserves the stabilization of processes.
![Page 37: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/37.jpg)
Talk Outline
Self-Stabilizing Microprocessor [DH04]
Self-Stabilizing Operating System [DY04]
Self-Stabilization Preserving Compiler [DH05]
Self-Stabilizing Automatic Recoverer For Eventual Byzantine Software [BDK03]
Recovery Oriented Programming [BD05]
![Page 38: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/38.jpg)
Self-Stabilization Preserving Compiler
Shlomi Dolev, Yinnon A. Haviv, Department of Computer Science, Ben-Gurion University, Israel
Mooly Sagiv, Department of Computer Science, Tel Aviv University, Israel
![Page 39: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/39.jpg)
Motivation
Transient malfunctions.
Single processor: hardware glitches, soft-errors.
Distributed environment: processor crashes / recoveries, link errors.
Resulting in an unpredictable system state.
![Page 40: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/40.jpg)
Coping with Transient Errors
Masking (safety factor) achieved by: information redundancy (e.g., ECC); time/space redundancy (e.g., TMR).
Self-Stabilization [Dijkstra74]: assuming any system state (caused by errors); recovering by converging into legal behavior.
Existing algorithms for distributed tasks: routing, leader election, mutual exclusion, etc.
![Page 41: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/41.jpg)
Self-Stabilizing Algorithms – a Solution to Soft-Errors?
A self-stabilizing algorithm assumes that the microprocessor executes it. Soft-errors may cause the microprocessor to be stuck in a faulty state.
Composition of self-stabilizing algorithms creates a self-stabilizing system. Make the microprocessor eventually fetch-decode-execute machine code.
![Page 42: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/42.jpg)
The Gap.
Need a transformation between: an input program P written in a high-abstraction language, e.g., (D)ASM; and an output program Q in a machine language, say, JVM.
Existing compilers? P and Q behave the same when started in the initial state.
What if Q reaches an unexpected state due to a soft-error experienced by the microprocessor?
![Page 43: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/43.jpg)
Trivial Example
A statement of the form:
    for each i in {0..9} do f(i)
may be compiled to:
    mov ax, 10
    mov cx, 0
loop1:
    push cx
    call f
    inc cx
    cmp cx, ax
    jne loop1
Start with cx=12 inside the loop…
Moreover: any runtime mechanism can get stuck / inconsistent.
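To see what starting with cx=12 does to the compiled loop, the sketch below models it in Python. It assumes a 16-bit counter that wraps around and models only the counter and the calls to f, not the stack. The second function shows one possible range-checked exit test; it is an illustration only, not the technique used by the compiler described in the talk. `run_compiled_loop` and `run_range_checked_loop` are hypothetical names.

```python
def run_compiled_loop(cx, limit=10, bits=16):
    """Model of the loop body entered with an arbitrary cx and a `jne` exit test."""
    mask = (1 << bits) - 1
    calls = []
    while True:
        calls.append(cx)          # push cx / call f
        cx = (cx + 1) & mask      # inc cx, with register wraparound
        if cx == limit:           # cmp cx, ax / jne loop1
            return calls


def run_range_checked_loop(cx, limit=10):
    """Variant whose test also rejects counters outside {0..limit-1}."""
    calls = []
    while 0 <= cx < limit:
        calls.append(cx)
        cx += 1
    return calls


if __name__ == "__main__":
    print(run_compiled_loop(0))               # the intended ten calls
    print(len(run_compiled_loop(12)))         # calls made after the fault
    print(run_range_checked_loop(12))         # no out-of-range call
```

From the initial state both versions call f on 0..9. From cx=12 the equality test is skipped, so f is called with every value from 12 up to 65535 before the counter wraps and finally reaches 10.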
![Page 44: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/44.jpg)
Stabilization Preserving Compiler – a closer look
Ensuring that Q eventually behaves as P:
[Figure: the state space of P and the state space of Q.]
![Page 45: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/45.jpg)
Self-Stabilization Preserving Compiler: Summary
Front end of compiler established: typed version of ASM; JavaCC as a parser generator.
Interpreter (used as a model).
Fast stabilization vs. optimizations.
Self-stabilization preserving compiler: language with clear semantics from any state; innovative demands from the compiler.
![Page 46: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/46.jpg)
Talk Outline
Self-Stabilizing Microprocessor [DH04]
Self-Stabilizing Operating System [DY04]
Self-Stabilization Preserving Compiler [DH05]
Self-Stabilizing Automatic Recoverer For Eventual Byzantine Software [BDK03]
Recovery Oriented Programming [BD05]
![Page 47: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/47.jpg)
Self-Stabilization and Evolving Systems
Real world systems cannot be verified exhaustively…
We enforce safety and liveness specifications
Contract between the client, project manager and programmers, that is checked on line!
Make sure that the additional (thin) monitoring and recovering layer is self-stabilizing
A change can be made to the implementation/specification to support evolving environments
![Page 48: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/48.jpg)
Self-Stabilizing Recoverer for Eventual Byzantine Software
Olga Brukman, Shlomi Dolev, Department of Computer Science, Ben-Gurion University, Israel
Hillel Kolodner, Haifa Research Labs, IBM, Israel
![Page 49: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/49.jpg)
Software Contains Bugs
Heisenbugs, corrupt states, leaked resources are common…
Correct and faultless SW is hard.
Long-lived running programs, e.g., OS.
Usually software is tested when starting from initial state and considering limited time scenarios.
![Page 50: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/50.jpg)
Fault Model Reflecting Reality
Software packages can be trusted to work as required after restart.
Eventual Byzantine software.
System administrators and users use reboot to deal with faults.
![Page 51: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/51.jpg)
Middleware Architecture
[Figure: middleware architecture. Labels: OS Kernel, OMR, and monitor entries <Preds,RActs>1, <Preds,RActs>2, …, <Preds,RActs>n.]
![Page 52: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/52.jpg)
Monitor-Restarter for Process and Subsystem
<Pred,RActs>1
<Pred,RActs>2
…
![Page 53: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/53.jpg)
Restart Actions – Mature Approach
Subsystem waits for completion of a restart of its components.
Restart action may vary, depending on component internal state: reschedule, roll-back, kill & restart.
Few restart attempts with more drastic restart actions.
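A monitor built from a predicate and a list of restart actions might look like the sketch below. It is a simplified illustration, assuming a component is a plain dictionary and that actions are tried from mildest to most drastic until the predicate holds; the `recover` function and the example component are hypothetical and not taken from [BDK03].

```python
def recover(component, predicate, actions):
    """Apply increasingly drastic actions until the predicate holds.

    Returns the names of the actions that were applied and whether the
    predicate holds afterwards.
    """
    applied = []
    for name, action in actions:
        if predicate(component):
            break
        action(component)
        applied.append(name)
    return applied, predicate(component)


def reschedule(component):
    """Mildest action: leaves the component state untouched."""


def roll_back(component):
    """Undo the most recent retry."""
    component["retries"] -= 1


def kill_and_restart(component):
    """Most drastic action: return to the initial state."""
    component["retries"] = 0


if __name__ == "__main__":
    server = {"retries": 7}
    within_limit = lambda c: c["retries"] <= 3
    actions = [
        ("reschedule", reschedule),
        ("roll-back", roll_back),
        ("kill & restart", kill_and_restart),
    ]
    print(recover(server, within_limit, actions))
```

Ordering the actions by severity keeps the cost of recovery low when a mild action is enough, while the final restart relies on the assumption that a restarted package works as required.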
![Page 54: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/54.jpg)
Computational Model: rsf-execution
An execution E is an rsf (restart supporting fair) execution iff E is a fair execution in which every subsystem sub_i that is initialised during E respects its specification function ss_i.
Requirement: Every rsf-execution E has a suffix in which the system respects its specification function ss.
![Page 55: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/55.jpg)
Tools for Implementation – Black Box Approach
Software package is a black box.
Package is monitored by recording its IO (e.g., strace in Linux).
Monitors are independent of specific implementation
![Page 56: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/56.jpg)
Tools for Implementation – Transparent Box Approach
Software package implementation tool is known.
Run-Time Reflection tools are used to monitor and restart the package.
Possible in Java, C++, CORBA, COM.
![Page 57: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/57.jpg)
Practical Experience: Printers Problem
Corrupted pdf, doc or ps file sent to printing server.
Printer can’t print the file.
This causes retries by the printing server, and the printer is “stuck” on one job.
Predicate for printing server: restrict number of retries, try format conversions, send error message to user.
![Page 58: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/58.jpg)
Self-Stabilizing Automatic Recoverer: Summary
Theoretical foundations of self-stabilization and restart techniques could serve as a basis for the new paradigms.
General framework for design and correctness proof for autonomic recoverer.
Printers experience coordinated with IBM.
![Page 59: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/59.jpg)
Recovery Oriented Programming
Olga Brukman and Shlomi Dolev, Department of Computer Science, Ben-Gurion University, Israel
![Page 60: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/60.jpg)
Towards Robust Software
Programming: structured programming, OOD, design patterns…
Testing and debugging: unit testing [JUnit, CppUnit]…, Design By Contract (Eiffel)…
Formal specification languages: ASM, IO Automata, Nuprl
Model checking
Online recovery: ROC [PBB02]; Self-Stabilizing Autonomic Recoverer for Eventual Byzantine Software [BDK03] (black box software packages).
![Page 61: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/61.jpg)
Our Contribution
Program invariants derived from design specifications. Checked every time invariant variables are updated.
Automatic code generation for invariant verification and recovery upon invariant violation.
Invariants are verified during runtime. A change of an invariant variable is pre-checked in a sandbox. A violation is prevented and replaced with a recovery action.
![Page 62: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/62.jpg)
Our Contribution Cont.
Recovery action is chosen depending on the current state and history.
Roll back & resume. Wait. Reschedule. Kill & restart.
![Page 63: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/63.jpg)
External Monitoring
Monitoring the whole task to handle transient faults after which invariant variables are not changed (and so no invariant checks are done).
Liveness problem: monitor over time.
![Page 64: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/64.jpg)
Producer Consumer Example
Producer, Consumer: threads.
Queue: a circular bounded-length queue.
int queue[size]
int start: position of the current first element in the queue
int end: position of the first empty place in the queue
boolean empty
[Figure: Producer, Queue, Consumer.]
![Page 65: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/65.jpg)
Invariants for Queue
start and end are in the range {0,..,size-1}.
If start != end, then the queue is not empty (start == end holds both when the queue is empty and when it is full).
A possible invariant is:
(start in {0,..,size-1}, end in {0,..,size-1}) &&
(start != end => empty == false)
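Written as an executable predicate, the invariant above might look as follows. This is a direct transcription into Python, with the implication `a => b` expressed as `(not a) or b`; the function name `queue_invariant` is an illustrative choice.

```python
def queue_invariant(start, end, size, empty):
    """Check the range condition and the implication start != end => not empty."""
    in_range = 0 <= start < size and 0 <= end < size
    implication = (start == end) or (not empty)
    return in_range and implication


if __name__ == "__main__":
    print(queue_invariant(0, 0, 4, True))    # empty queue
    print(queue_invariant(0, 0, 4, False))   # full queue
    print(queue_invariant(1, 3, 4, True))    # start != end but flagged empty
    print(queue_invariant(5, 0, 4, False))   # start out of range
```

Note that the invariant accepts both the empty and the full queue when start == end, which is why the separate `empty` flag is needed.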
![Page 66: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/66.jpg)
Given Implementation Code
class Queue {
    private int[] queue;
    private int start, end, size;
    private boolean empty;

    public Queue(int size) {
        this.size = size; // needed by the modulo arithmetic below
        queue = new int[size];
        start = 0; end = 0; empty = true;
    }

    public synchronized int dequeue() throws InterruptedException {
        int result;
        while (empty) wait();
        result = queue[start];
        start = (start + 1) % size;
        empty = (start == end);
        notifyAll();
        return result;
    }

    public synchronized void enqueue(int value) throws InterruptedException {
        while (start == end && !empty) // while full
            wait();
        queue[end] = value;
        end = (end + 1) % size;
        empty = false;
        notifyAll();
    }
}
![Page 67: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/67.jpg)
Code Transformation
class Queue {
    invariant: (start in {0,..,size-1}, end in {0,..,size-1}) &&
               (start != end => empty == false)
    recovery actions: {this = new Queue(); empty = false;}

    private int[] queue;
    private int start, end, size;
    private boolean empty;

    public Queue(int size) {
        this.size = size;
        queue = new int[size];
        atomic { start = 0; end = 0; empty = true; }
    }

    public synchronized int get() {
        int result;
        while (empty) wait();
        result = queue[start];
        atomic { start = (start + 1) % size; empty = (start == end); }
        notifyAll();
        return result;
    }
![Page 68: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/68.jpg)
Code Transformation Cont.
    public synchronized void put(int value) {
        while (start == end && !empty) // while full
            wait();
        queue[end] = value;
        atomic { end = (end + 1) % size; empty = false; }
        notifyAll();
    }
}
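The effect of the `atomic` blocks can be approximated in plain Python. The sketch below is single-threaded and omits wait/notify; it pre-checks each candidate update against the invariant and, on violation, discards the update and runs a recovery action instead. Resetting to an empty queue is an assumption made here for simplicity and differs from the recovery action shown on the slide; the class name `GuardedQueue` and its helper methods are hypothetical.

```python
class GuardedQueue:
    """Circular queue whose index updates are pre-checked against an invariant."""

    def __init__(self, size):
        self.size = size
        self.buf = [0] * size
        self.start, self.end, self.empty = 0, 0, True
        self.recoveries = 0

    def _invariant(self, start, end, empty):
        in_range = 0 <= start < self.size and 0 <= end < self.size
        return in_range and (start == end or not empty)

    def _recover(self):
        # Assumed recovery action: reinitialize to an empty queue.
        self.start, self.end, self.empty = 0, 0, True
        self.recoveries += 1

    def _atomic(self, start, end, empty):
        """Commit the candidate values only if they satisfy the invariant."""
        if self._invariant(start, end, empty):
            self.start, self.end, self.empty = start, end, empty
        else:
            self._recover()

    def put(self, value):
        if self.start == self.end and not self.empty:
            raise OverflowError("queue is full")
        self.buf[self.end] = value
        self._atomic(self.start, (self.end + 1) % self.size, False)

    def get(self):
        if self.empty:
            raise LookupError("queue is empty")
        value = self.buf[self.start]
        new_start = (self.start + 1) % self.size
        self._atomic(new_start, self.end, new_start == self.end)
        return value


if __name__ == "__main__":
    q = GuardedQueue(2)
    q.put(1)
    q.put(2)
    print(q.get(), q.get(), q.empty)
    q._atomic(9, 0, False)        # simulate a faulty update with start out of range
    print(q.recoveries, q.start, q.end, q.empty)
```

The faulty update never reaches the fields: the candidate values are rejected before being committed, so the queue is always left in a state that satisfies the invariant.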
![Page 69: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/69.jpg)
External Monitoring
The possible invariant is:
(start in {0,..,size-1}, end in {0,..,size-1}) &&
(start != end => empty == false) &&
(start == end && empty == false => producer.wait()) &&
(empty == true => consumer.wait()) &&
[(producer.wait() && !consumer.wait()) || (!producer.wait() && consumer.wait())]
New recovery actions: interrupt producer/consumer and initialize it.
![Page 70: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/70.jpg)
Recovery Oriented Programming: Summary
Programming with self-stabilization enforcement.
Eventually safe execution.
![Page 71: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/71.jpg)
Talk Conclusions
Self-Stabilization as an effective paradigm for creating robust systems.
Rigorous approach for designing basic system components: Microprocessor; Operating system; Compiler; Evolving and Recovery Oriented.
![Page 72: Self-Stabilizing Systems as a Base for Autonomic Computing Shlomi Dolev Yinnon Haviv, Reuven Yagel, Olga Brukman](https://reader035.vdocument.in/reader035/viewer/2022081519/56649ca65503460f94968dee/html5/thumbnails/72.jpg)