Recovery Oriented Programming Olga Brukman and Shlomi Dolev Ben-Gurion University Beer-Sheva Israel

Post on 21-Dec-2015

Page 1: Recovery Oriented Programming Olga Brukman and Shlomi Dolev Ben-Gurion University Beer-Sheva Israel

Recovery Oriented Programming

Olga Brukman and Shlomi Dolev

Ben-Gurion University

Beer-Sheva, Israel

Page 2

Towards Correct Software

• Software should respect its specifications
  – Safety, Liveness
• Atomic power station
  – Safety: the atomic station shouldn't explode
  – Liveness: the atomic station should produce some electricity

Page 3

Recovery Oriented Design

• Software performs substantially in accordance with specifications for a period of 90 days... (IEEE Computer, October 2006)
• How to cope with such software?!
  – Recovery Oriented Computing [PBB'02]!
• Recovery actions
  – Reboot, wait, reschedule
  – Non-intrusive: avoid rewriting the program (rewriting may introduce new bugs)

Page 4

Recovery Oriented Programming

• Specifications Composer (Project Manager)
  – Invariants and predicates
    • Important properties on program IO
  – Recovery actions
• Programmer
  – Best-effort implementation
  – Using the same IO variables as the specifier
  – Still: bugs and unexpected states

Page 5

Recovery Oriented Programming: Assumptions

• Self-stabilizing processor
• Self-stabilizing OS
• Infrastructure for robust monitoring and recovery
• Processes exist and execute their code

Page 6

Recovery Oriented Programming: Assumptions

• Not immediately Byzantine
  – Eventual Byzantine program
  – Behaves correctly long enough to do a sufficient job

Page 7

Our Framework

[Diagram: the Pre-compiler, with Code, Recovery tuples, and Subsystems hierarchy; each process is shown with event-driven monitoring and an External Monitor, and the subsystem with a Subsystem External Monitor.]

System is able to recover from any state

Page 8

Generated Code: One Process

[Diagram: Code and Recovery tuples; one process with event-driven monitoring and an External Monitor.]

Page 9

Generated Code: Subsystem

[Diagram: Code for three processes, Recovery tuples, and Subsystems hierarchy; each process with event-driven monitoring and an External Monitor, plus a Subsystem External Monitor.]

Page 10

Our Framework: Transforming Recovery Tuples into Code

[Diagram: the Pre-compiler, with Code, Recovery tuples, and Subsystems hierarchy; each process is shown with event-driven monitoring and an External Monitor, and the subsystem with a Subsystem External Monitor.]

Page 11

Safety Recovery Tuple

Original code (1 process):
    ...
    x=a;
    ...

Recovery tuple:
    PRED: x!=7
    RA: this.restart()

Code generated by the pre-compiler:
    temp_x=a;
    if temp_x!=7 x=temp_x;
    else this.restart();
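The guarded assignment on this slide can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Process` class, the `assign_x` and `pred` helpers, and a `restart()` that resets `x` to 0 are hypothetical names chosen here, not output of the authors' pre-compiler.

```python
class Process:
    """Toy process whose writes to x are guarded by a safety predicate."""

    def __init__(self):
        self.x = 0
        self.restarts = 0

    def restart(self):
        # Hypothetical recovery action: reset the state and count the restart.
        self.x = 0
        self.restarts += 1

    def pred(self, value):
        # PRED from the slide: x != 7
        return value != 7

    def assign_x(self, a):
        # Mirrors the generated code: write to a temporary first,
        # commit only if the predicate holds, otherwise recover.
        temp_x = a
        if self.pred(temp_x):
            self.x = temp_x
        else:
            self.restart()


p = Process()
p.assign_x(3)  # predicate holds, so x is updated
p.assign_x(7)  # predicate would be violated, so restart() runs instead
```

Checking the temporary before committing means the forbidden value never becomes visible in `x`, which is what makes the check event-driven rather than after the fact.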

Page 12

Safety Recovery Tuple in the Scope of Stabilization: External Monitoring

Original code (1 process):
    ...
    x=a;
    ...

Recovery tuple:
    PRED: x!=7
    RA: this.restart();

Code generated by the pre-compiler:
    temp_x=a;
    if temp_x!=7 x=temp_x;
    else this.restart();
    ...
    (No more x=...)

External monitor:
    if !(ps.x!=7) ps.restart();
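A minimal Python sketch of the external check, assuming the monitor can read the process state directly. The `Process` class, the `external_monitor_check` function, and the simulated fault are hypothetical; the slide gives only the one-line check.

```python
class Process:
    """Toy process; x can be corrupted without going through guarded code."""

    def __init__(self):
        self.x = 0
        self.restarts = 0

    def restart(self):
        # Hypothetical recovery action: reset the state and count the restart.
        self.x = 0
        self.restarts += 1


def external_monitor_check(ps):
    # Mirrors the slide: if !(ps.x != 7) ps.restart();
    if not (ps.x != 7):
        ps.restart()


ps = Process()
ps.x = 7  # simulate a transient fault that bypasses the guarded assignment
external_monitor_check(ps)  # the scheduled check detects the violation
```

The in-process guard only fires on an assignment to `x`. If the state is corrupted and the code never assigns `x` again, a scheduled check from outside the process is what catches the violation.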

Page 13

Liveness Recovery Tuple

Original code (1 process):
    x=x+2;
    ...
    y=y+5;
    ...

Recovery tuple:
    INV: eventually x+y=15
    RA: this.restart()
    HTR: history={}

Code generated by the pre-compiler:
    x=x+2;
    if (x+y==15) this.history={};
    ...
    y=y+5;
    if (x+y==15) this.history={};

External monitor:
    history=history▪this.state();
    if loop in history and CPU(this) ps.restart();

History= [ ... {.., x=1,y=2,..}, {.., x=3,y=7,..},...]
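The history-based liveness check can be sketched in Python as below. Several points are assumptions made for illustration: "loop in history" is read as "some state appears twice in the log", `CPU(this)` is modelled by a plain `got_cpu` flag, and all class and function names are hypothetical.

```python
class Process:
    """Toy process with a history log that is reset when the invariant is met."""

    def __init__(self, x=0, y=0):
        self.x, self.y = x, y
        self.history = []
        self.restarts = 0

    def state(self):
        return (self.x, self.y)

    def restart(self):
        # Hypothetical recovery action: reset state and history.
        self.x, self.y = 0, 0
        self.history = []
        self.restarts += 1

    def check_invariant(self):
        # INV: eventually x + y == 15; HTR: history = {}
        if self.x + self.y == 15:
            self.history = []

    def step_x(self):
        self.x += 2
        self.check_invariant()

    def step_y(self):
        self.y += 5
        self.check_invariant()


def external_monitor_check(ps, got_cpu=True):
    # Append the current state, then look for a repeated state in the log.
    ps.history.append(ps.state())
    loop = len(set(ps.history)) < len(ps.history)
    if loop and got_cpu:
        ps.restart()


# A process that makes progress: reaching x + y == 15 clears the history.
live = Process(x=8, y=0)
external_monitor_check(live)
live.step_x()
external_monitor_check(live)
live.step_y()

# A process that is scheduled but never changes state: the log repeats.
stuck = Process()
external_monitor_check(stuck)
external_monitor_check(stuck)
```

Under this reading, a repeated state since the last time the invariant held, observed while the process was actually given CPU time, is taken as evidence that the process will not reach the required condition on its own.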

Page 14

Generated Monitoring Code for Subsystem

Recovery tuples:
    sub: p1, p2

External monitor for sub:
    History= [ ... distributed snapshot(sub),...]

[Diagram: the Pre-compiler, Code for p1, and Code for p2; each process with event-driven monitoring and an External Monitor.]
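A rough Python sketch of a subsystem-level monitor follows. It replaces the distributed snapshot with a direct local read of each process's state, and it assumes the subsystem recovery action is to restart every member process; both are simplifications chosen for illustration, and the `Process` and `SubsystemMonitor` names are hypothetical.

```python
class Process:
    """Toy member process of a subsystem."""

    def __init__(self, name, value=0):
        self.name = name
        self.value = value
        self.restarts = 0

    def state(self):
        return (self.name, self.value)

    def restart(self):
        # Hypothetical recovery action for a single process.
        self.value = 0
        self.restarts += 1


class SubsystemMonitor:
    """External monitor that logs joint states of all member processes."""

    def __init__(self, processes):
        self.processes = processes
        self.history = []

    def snapshot(self):
        # Stand-in for a distributed snapshot: a direct read of every state.
        return tuple(p.state() for p in self.processes)

    def check(self):
        self.history.append(self.snapshot())
        # A repeated joint state is treated as a loop in the history.
        if len(set(self.history)) < len(self.history):
            for p in self.processes:
                p.restart()  # assumed subsystem recovery action
            self.history = []


p1, p2 = Process("p1", 1), Process("p2", 2)
mon = SubsystemMonitor([p1, p2])
mon.check()    # first joint state is logged
p1.value += 1
mon.check()    # a new joint state, no loop yet
mon.check()    # the same joint state again, so the subsystem is restarted
```

A real deployment would need a consistent global state rather than independent reads, which is the role the slide assigns to the distributed snapshot.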

Page 15

Generic Correctness Theorem

• In the program produced by the pre-compiler, every rsf (restart supporting fair) execution E has a suffix in which the program respects its specification function
  – An rsf-execution is an execution in which the system is trusted to behave according to its specifications after a restart.

Page 16

Generic Correctness Proof

• Assumption: processes and external monitors are scheduled fairly due to the presence of a self-stabilizing software platform
• Safety: a process either reaches the monitoring section in its code or its external monitor makes a scheduled check
  – Subsystem: the external monitor makes a scheduled check

Page 17

Generic Correctness Proof Cont.

• Liveness: the external monitor of the process (subsystem) makes a scheduled check of the history log
• Corrupted history:
  – If it causes an (unnecessary) recovery, it is trimmed
  – New correct records are eventually accumulated and reflect the real state of the system

Page 18

Related Work: Perfect Software

• Formal specification languages
  – ASM [GRS'04], IO Automata [L'96], Nuprl [CKB'84]
  – Gradually and manually translated into a fully verified program
• Model checking
  – Doesn't scale
• Specification-embedding programming languages
  – SCR (Software Cost Reduction) language [RLHL'06]
  – Programmer bugs

Page 19

Related Work: Programming Tools

• Design By Contract
  – Eiffel, iContract for Java
  – Checking invariants on an object state, pre-/post-conditions on object methods, recovery by predefined recovery action
  – Partial monitoring of liveness, based on timeout
  – Monitoring of safety outside of stabilization scope
• Exceptions
  – Suitable for a single process only
  – Impractical for changing the program flow

Page 20

Related Work: Online Recovery

• Recovery blocks (N-version programming) [RX94]
• ROC [PBB'02], Java MOP [CR'05], Kinesthetics eXtreme [KPGV'03], "On Modeling and Tolerating Incorrect Software" [AT'03]
• Monitoring/correcting layer that alters the failed component's behaviour

Page 21

Related Work: Online Recovery

• Assumption of monitoring/correcting layer stability
  – ROC [PBB'02], Java MOP [CR'05], Kinesthetics eXtreme [KPGV'03]
• Intrusive correcting actions
  – Empty program: correcting actions define the program
  – "On Modeling and Tolerating Incorrect Software" [AT'03]

Page 22

Conclusions

• Recovery Oriented Programming paradigm for a programming language

• Full monitoring of safety and liveness properties in the scope of stabilization

• Formal correctness proof scheme for the resulting code