Page 1
Runtime Verification and
Software Fault Protection with Eagle
Allen Goldberg, Klaus Havelund
Kestrel Technology, Palo Alto, USA
Page 2
Overview
• Run-time Monitoring
• About EAGLE
• Software Fault Protection
• Summary
Page 3
Motivation for Runtime Verification
• Model checking and Theorem Proving are rigorous, but:
  • Not fully automated:
    – Model creation is often manual
    – Abstraction is often manual in the case of model checking
    – Lemma discovery is often manual in the case of theorem proving
  • Therefore: not very scalable
• Applied Testing is scalable and widely used, but is ad hoc:
  • Lack of formal coverage
  • Lack of formal conformance checking
• Combine Formal Methods and Testing
Page 4
Run-time Verification
• Combine temporal logic specification and testing:
  • Specify properties in some temporal logic.
  • Instrument program to generate events.
  • Monitor properties against a trace of events emitted by the running program.
Page 5
A Model-Based Verification Architecture
[Figure: architecture diagram with the labels Model, Rqmts, test & property generation, test inputs, behavioral properties, implemented system under test, instrumentation, event stream, dispatch, Observer, Deadlock, Dataraces, reports]
Page 6
Static and Dynamic Analysis
[Figure: diagram with the labels Program, Specification, Input, Output, Test case generation, Runtime verification, Program instrumentation]
Page 7
So many logics …
• What is the most basic, yet general, specification language suitable for monitoring?
EAGLE is our answer.
Page 8
Logics
• Assertions (Java 1.4)
• Pre-post conditions, invariants (Eiffel, JML)
• Temporal logic (Mac, Temporal Rover)
• Time lines (Smith, Dillon)
• Live Sequence Charts (Harel)
• Automata (Büchi, Alternating, Timed)
• Regular expressions (Rosu, Lee)
• Process algebra (Jass)
• Mathematical modeling/programming languages (Alloy, Prolog, Maude, Haskell, Specware)
Page 9
Eagle’s Core Concepts
• Finite trace monitoring logic
• Propositional logic + three temporal connectives:
  • Next: @F
  • Previous: #F
  • Concatenation: F1;F2
• Recursive parameterized rules: Always(Term t) = t /\ @Always(t) .
[Figure: timeline showing #F, now, and @F, and the concatenation F1 F2]
Each state is an environment mapping variables to values.
Page 10
Introducing EAGLE
Can encode:
• future time temporal logic
• past-time logic
• extended regular expressions
• µ-calculus
• real-time, data-binding, statistics, …
Evaluation:
• Monitoring formulas are evaluated online over a given input trace, on a state by state basis
  • identify failure as early as reasonably possible
• Evaluation proceeds by checking facts about the past and generating obligations about the future.
Page 11
Basic Propositional Example
// LTL rules:
max Alws(Term t) = t /\ @ Alws(t) .
min Even(Term t) = t \/ @ Even(t) .
min Prev(Term t) = t \/ # Prev(t) .
// Monitor:
mon M = Alws(y>0 -> (Prev(x>0) /\ Even(z>0))) .
[Figure: the LTL rules form a library; trace timeline with x>0 in the past, y>0 now, and z>0 in the future]
Page 12
Data Bindings
min R(int k) = Prev(x==k) /\ Even(z == k) .
// Monitor:
mon M = Alws(y>0 -> R(y)) .
mon Wrong = Alws(y>0 -> (Prev(x==y) /\ Even(z==y))) .
[Figure: trace timeline with x==k in the past, y>0 now (binding k := y), and z==k in the future]
Page 13
Time is Just Data
min Before(long t, Term F) = clock <= t /\ (F \/ @ Before(t,F)) .
min Within(long t, Term F) = Before(t+clock, F) .
mon M = Alws(x>0 -> Within(4,y>0)) .
[Figure: trace timeline in which each x>0 is followed by y>0, with the gaps labeled < 4]
clock is a variable defined in the state.
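The Before/Within rules above can be read as a deadline check over the trace. The following is a minimal Java sketch of that reading, not EAGLE's implementation: the classes TimedState and WithinMonitor are hypothetical, and the sketch assumes the whole finite trace is available as a list.

```java
import java.util.List;

// Hypothetical state snapshot; EAGLE only requires that clock is a variable of the state.
class TimedState {
    final long clock;
    final int x, y;
    TimedState(long clock, int x, int y) { this.clock = clock; this.x = x; this.y = y; }
}

class WithinMonitor {
    // Checks Alws(x>0 -> Within(bound, y>0)) over a complete finite trace.
    static boolean check(List<TimedState> trace, long bound) {
        for (int i = 0; i < trace.size(); i++) {
            TimedState s = trace.get(i);
            // Within(bound, F) = Before(bound + clock, F), with clock read in the current state.
            if (s.x > 0 && !before(trace, i, s.clock + bound)) return false;
        }
        return true; // Alws is a max rule: nothing is left to check beyond the last state
    }

    // Before(t, y>0) = clock <= t /\ (y>0 \/ @Before(t, y>0)), evaluated from index 'from'.
    static boolean before(List<TimedState> trace, int from, long deadline) {
        for (int j = from; j < trace.size(); j++) {
            TimedState s = trace.get(j);
            if (s.clock > deadline) return false; // deadline passed
            if (s.y > 0) return true;             // obligation discharged in time
        }
        return false; // Before is a min rule: an open obligation at the end of the trace fails
    }
}
```

Because the deadline is just a long computed from clock, no dedicated real-time operator is needed, which is the point of the slide title.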
Page 14
Automata
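[Figure: automaton with states S1 and S2; init loops on S1, open leads from S1 to S2, access loops on S2, close leads from S2 back to S1]
max S1() = (init -> @ S1()) /\ (open -> @ S2()) /\ ~(access \/ close) .
min S2() = (access -> @ S2()) /\ (close -> @ S1()) /\ ~(init \/ open) .
mon M = S1() .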
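One way to read the S1/S2 rules is as an ordinary two-state machine. The Java sketch below is an illustration under assumptions: the class FileProtocolMonitor is hypothetical, events are plain strings, and any event other than init, open, access and close is treated as a violation, which the rules themselves do not state.

```java
// Hypothetical monitor for the open/access/close protocol; not part of EAGLE.
class FileProtocolMonitor {
    enum St { S1, S2, FAIL }

    private St st = St.S1;

    // Consumes one event and moves to the next automaton state.
    void onEvent(String e) {
        switch (st) {
            case S1: // only init and open are allowed here
                st = e.equals("init") ? St.S1 : e.equals("open") ? St.S2 : St.FAIL;
                break;
            case S2: // only access and close are allowed here
                st = e.equals("access") ? St.S2 : e.equals("close") ? St.S1 : St.FAIL;
                break;
            default: // FAIL is absorbing
                break;
        }
    }

    // S1 is a max rule (accepting at the end of the trace); S2 is a min rule (not accepting).
    boolean verdictAtEnd() { return st == St.S1; }
}
```

The max/min distinction of the rules shows up only in verdictAtEnd: a trace that ends while a file is still open is rejected.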
Page 15
Timed Automata
[Figure: the same automaton with states S1 and S2 and transitions init, open, access, close, annotated with a clock: x := 0 and x < 10]
max S1() = (init -> @ S1()) /\ (open -> @ S2(clock)) /\ ~(access \/ close) .
min S2(long t) = clock <= t+10 /\ (access -> @ S2(t)) /\ (close -> @ S1()) /\ ~(init \/ open) .
Page 16
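Combining Automata and Temporal Logic
[Figure: automaton with states S1 and S2; init loops on S1, the transition from S1 to S2 is guarded by open /\ Prev(init), access loops on S2, close leads from S2 back to S1]
max S1() = (init -> @ S1()) /\ (open -> (Prev(init) /\ @ S2())) /\ ~(access \/ close) .
min S2() = (access -> @ S2()) /\ (close -> @ S1()) /\ ~(init \/ open) .
mon M = S1() .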
Page 17
EAGLE by example: statistical logics
Monitor that state property F holds with at least probability p:
max Empty() = false .
min Last() = @ Empty() .
min A(Term F, float p, int f, int t) =
  ( Last() /\ ( (F /\ (1 - f/t) >= p) \/ (~F /\ (1 - (f+1)/t) >= p) ) ) \/
  ( ~Last() /\ ( (F -> @ A(F, p, f, t+1)) /\ (~F -> @ A(F, p, f+1, t+1)) ) ) .
mon AtLeast (Term F, float p) = A(F, p, 0, 1) .
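In the rule A, the parameter f counts the states where F failed and t counts the states seen so far; the verdict is taken in the last state. A minimal Java sketch of the same bookkeeping follows. The class AtLeastMonitor and its method names are hypothetical, and the sketch assumes an empty trace is rejected because A is a min rule.

```java
// Hypothetical online counterpart of AtLeast(F, p); not EAGLE's implementation.
class AtLeastMonitor {
    private final double p;  // required minimum fraction of states satisfying F
    private int failures;    // corresponds to parameter f
    private int total;       // corresponds to parameter t (number of states seen)

    AtLeastMonitor(double p) { this.p = p; }

    // Called once per state with the truth value of F in that state.
    void step(boolean fHolds) {
        total++;
        if (!fHolds) failures++;
    }

    // Verdict at the last state: 1 - f/t >= p.
    boolean verdictAtEnd() {
        return total > 0 && 1.0 - (double) failures / total >= p;
    }
}
```

Only two integers are carried from state to state, mirroring the two data parameters of the recursive rule.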
Page 18
Regular Expressions
M = (idle*;open;access*;close)*
Property: File accesses are always enclosed by open and close operations.
mon M = S(S(idle());open();S(access());close()) .
max S(Term t) = t /\ @ S(t) . // Star
min P(Term t) = t /\ @ P(t) . // Plus
Page 19
Grammars
// Match rule:
max Match(Term l, Term r) = Empty() \/ (l;Match(l,r);r;Match(l,r)) .
// Monitor:
mon M = Match(lock(),release()) .
Property: Locks are acquired and released in a nested fashion.
[Figure: example traces of nested lock and release events]
Page 20
EAGLE by example: beyond regular
Monitor a sequence of login and logout events – at no point should there be more logouts than logins and they should match at the end of the trace.
min Match (Term F1, Term F2) =Empty() \/
F1 ; Match(F1, F2) ; F2 ; Match(F1, F2)
mon M1 = Match(login, logout)
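The language accepted by Match(login, logout) is that of balanced brackets, so the property can also be checked with a single counter. The Java sketch below illustrates this equivalent check rather than how EAGLE evaluates the rule; the class LoginLogoutMonitor is hypothetical and it assumes the trace contains only the strings "login" and "logout".

```java
import java.util.List;

// Hypothetical counter-based check of the login/logout matching property.
class LoginLogoutMonitor {
    static boolean matches(List<String> events) {
        int open = 0; // logins not yet matched by a logout
        for (String e : events) {
            if (e.equals("login")) {
                open++;
            } else if (e.equals("logout")) {
                if (--open < 0) return false; // more logouts than logins at this point
            } else {
                return false; // assumption: any other event is outside the monitored language
            }
        }
        return open == 0; // logins and logouts must match at the end of the trace
    }
}
```

A finite automaton cannot keep this unbounded count, which is why the slide files the example under "beyond regular".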
Page 21
Syntax
Page 22
Semantics
Page 23
Some EAGLE facts
• EAGLE-LTL (past and future). Monitoring a formula of size m has space complexity bounded by m^2 2^m log m
• EAGLE with data binding has worst case dependent on length of input trace
• EAGLE without data is at least Context Free
Page 24
EAGLE interface
class Observer {
  Monitors mons;
  State state;
  void eventHandler(Event e) {
    state.update(e);
    mons.apply(state);
  }
}
class MyState extends EagleState {
  int x, y;
  void update(Event e) { x = e.x; y = e.y; }
}
class Monitors {
  Formula M1, M2;
  void apply(State s) {
    M1.apply(s);
    M2.apply(s);
  }
}
class Event {
  int x, y;
}
[Figure: event stream e1 e2 e3 … feeding the observer]
The user defines these classes.
Page 25
Eagle Implementation
• Built an initial prototype implementation and used it for a number of applications
  • test case monitoring scenario
  • monitoring the behavior of a planning system
• High performance algorithm
  • pay for what you use
    – e.g. state machine
  • symbolic manipulation -> automata-like solutions
Page 26
Online Algorithm
• Start with the initial formula M to monitor
• “See” a new state
• Transform M to M’ so that M is valid on the whole trace iff M’ is valid on the remaining trace
Page 27
Basic Algorithm: Future Time LTL
max Alws(Term t) = t /\ @ Alws(t) .
min Even(Term t) = t \/ @ Even(t) .
mon M = Alws(y>0 -> Even(z>0)) .
M0 = Alws(y>0 -> Even(z>0))
M1 = Alws(y>0 -> Even(z>0))
M2 = Alws(y>0 -> Even(z>0)) /\ Even(z>0)
M3 = Alws(y>0 -> Even(z>0)) /\ Even(z>0)
M4 = Alws(y>0 -> Even(z>0))
M5 = Alws(y>0 -> Even(z>0)) /\ Even(z>0)
Trace: (y=2, z=0), (y=-1, z=0), (y=-1, z=0), (y=-1, z=2), (y=2, z=0)
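The transformation of M into M’ can be sketched as formula rewriting for this small future-time fragment. The Java code below is an illustrative sketch, not EAGLE's algorithm: all class names are hypothetical, states are maps from variable names to integers, implication is restricted to a state predicate on the left, and the simplifier only removes true, false and duplicate conjuncts.

```java
import java.util.Map;
import java.util.function.Predicate;

abstract class Formula {
    static final Formula TRUE = new Const(true), FALSE = new Const(false);
    abstract Formula step(Map<String, Integer> s); // residual formula after one state
    abstract boolean atEnd();                      // verdict beyond the last state
    static boolean same(Formula a, Formula b) { return a.toString().equals(b.toString()); }
    static Formula and(Formula a, Formula b) {
        if (a == FALSE || b == FALSE) return FALSE;
        if (a == TRUE) return b;
        if (b == TRUE) return a;
        if (a instanceof And && ((And) a).has(b)) return a; // drop a duplicate conjunct
        return same(a, b) ? a : new And(a, b);
    }
    static Formula or(Formula a, Formula b) {
        if (a == TRUE || b == TRUE) return TRUE;
        if (a == FALSE) return b;
        if (b == FALSE) return a;
        return same(a, b) ? a : new Or(a, b);
    }
}
class Const extends Formula {
    final boolean v;
    Const(boolean v) { this.v = v; }
    Formula step(Map<String, Integer> s) { return this; }
    boolean atEnd() { return v; }
    public String toString() { return String.valueOf(v); }
}
class Atom extends Formula { // state predicate such as y>0
    final String text; final Predicate<Map<String, Integer>> test;
    Atom(String text, Predicate<Map<String, Integer>> test) { this.text = text; this.test = test; }
    Formula step(Map<String, Integer> s) { return test.test(s) ? TRUE : FALSE; }
    boolean atEnd() { return false; } // assumption: predicates are false beyond the trace
    public String toString() { return text; }
}
class Implies extends Formula { // antecedent restricted to an Atom to avoid negation
    final Atom a; final Formula b;
    Implies(Atom a, Formula b) { this.a = a; this.b = b; }
    Formula step(Map<String, Integer> s) { return a.test.test(s) ? b.step(s) : TRUE; }
    boolean atEnd() { return !a.atEnd() || b.atEnd(); }
    public String toString() { return a + " -> " + b; }
}
class And extends Formula {
    final Formula l, r;
    And(Formula l, Formula r) { this.l = l; this.r = r; }
    boolean has(Formula f) {
        return same(l, f) || same(r, f) || (l instanceof And && ((And) l).has(f))
            || (r instanceof And && ((And) r).has(f));
    }
    Formula step(Map<String, Integer> s) { return and(l.step(s), r.step(s)); }
    boolean atEnd() { return l.atEnd() && r.atEnd(); }
    public String toString() { return l + " /\\ " + r; }
}
class Or extends Formula {
    final Formula l, r;
    Or(Formula l, Formula r) { this.l = l; this.r = r; }
    Formula step(Map<String, Integer> s) { return or(l.step(s), r.step(s)); }
    boolean atEnd() { return l.atEnd() || r.atEnd(); }
    public String toString() { return "(" + l + " \\/ " + r + ")"; }
}
class Alws extends Formula { // max rule: Alws(t) = t /\ @Alws(t)
    final Formula body;
    Alws(Formula body) { this.body = body; }
    Formula step(Map<String, Integer> s) { return and(this, body.step(s)); }
    boolean atEnd() { return true; }
    public String toString() { return "Alws(" + body + ")"; }
}
class Even extends Formula { // min rule: Even(t) = t \/ @Even(t)
    final Formula body;
    Even(Formula body) { this.body = body; }
    Formula step(Map<String, Integer> s) { return or(body.step(s), this); }
    boolean atEnd() { return false; }
    public String toString() { return "Even(" + body + ")"; }
}
```

Calling step once per state yields the residual monitor, and atEnd gives the verdict when the trace stops: a pending Even obligation fails while a pending Alws succeeds. Since only two distinct residuals arise for this monitor, the rewriting behaves like a two-state machine.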
Page 28
Toward an Algorithm
The previous example suggests how monitoring may be reduced to state machine execution.
• For propositional future time LTL, by employing normal forms and simplification, the set of derivable monitors is finite. This is the idea behind generating a Büchi automaton for model checking propositional future time LTL.
• However, Eagle has:
  • Past time operators
  • Data parameters
Page 29
Past Time
max Alws(Term t) = t /\ @ Alws(t) .
min Prev(Term t) = t \/ # Prev(t) .
mon M = Alws(y>0 -> (Prev(x>0))) .
Cache: C = Prev(x>0)
C0 = false   M0 = Alws(y>0 -> (Prev(x>0)))
C1 = false   M1 = Alws(y>0 -> (Prev(x>0)))
C2 = true    M2 = false
Trace: (y=2, x=2), (y=-1, x=0), (y=-1, x=0), (y=-1, x=2), (y=2, x=0)
Page 30
Algorithm with Past Time
• Static Analysis
  • determine what formulas are needed in cache
• Evaluate formulas in cache in state 0
• For each state:
  • Evaluate monitors, referring to cache to look up values of formulas headed by #
  • Update the cache
• End of trace
  • evaluate monitor for “beyond the last state”
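For the monitor Alws(y>0 -> Prev(x>0)), the steps above can be sketched with a single cached boolean. This Java sketch is illustrative only: the class PastTimeMonitor is hypothetical, the cache is hand-written here rather than derived by static analysis, and states are passed as two integers.

```java
// Hypothetical cache-based monitor for Alws(y>0 -> Prev(x>0)); not EAGLE's implementation.
class PastTimeMonitor {
    // Cached value of Prev(x>0) in the previous state.
    // Prev is a min rule, so it is false before the first state.
    private boolean prevCache = false;
    private boolean violated = false;

    // Processes one state.
    void step(int x, int y) {
        // Prev(x>0) = x>0 \/ #Prev(x>0): the #-headed subformula is looked up in the cache.
        boolean prevNow = x > 0 || prevCache;
        // Evaluate the body of Alws in the current state.
        if (y > 0 && !prevNow) violated = true;
        // Update the cache for the next state.
        prevCache = prevNow;
    }

    // Alws is a max rule: beyond the last state it holds unless already violated.
    boolean verdictAtEnd() { return !violated; }
}
```

The monitor needs constant space here because the only past-time fact it consults is the cached value of one subformula.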
Page 31
Improvements
• Conjunctive normal form
• Subsumption: (a \/ b) /\ a => a
• Term caching
  • assign a unique id to each normalized term
  • define various caches of the form cache: TermId -> TermId
    – subsumption cache
    – rule application cache
• Normalization folded into evaluation
• Ordering evaluation of conjuncts and disjuncts
  • Terms which do not (hereditarily) contain instances of the @ operator will fully evaluate to true/false.
  • Evaluate such terms first
Page 32
Improvements : Automata Construction
• On-the-fly automata construction
  • for terms that are conjunctions, disjunctions or rule applications, dynamically create a state machine (decision tree) for evaluation.
  • map termId -> Automata
[Figure: decision tree testing x>0, z==y and w==y, with true/false branches leading to the results termID1 and termID2]
Page 33
Improvements: Reflection Removal
• Removal of use of reflection for Java expression evaluation
  • Example: Alws(y>0 -> Prev(x>0))
  • static boolean greater(int a, int b)
  • Defined in user-defined State class
  • Generate during static analysis an interface class that dispatches on term id of methodCall terms to directly call the user-supplied method
    – all arguments must be available, otherwise it must be symbolically evaluated
    – must treat substitution instances
Page 34
On the way to…
• An Eagle compiler
  • static analysis of monitor specification to generate an alternating automaton with no term manipulation and no use of reflection
Page 35
Instrument Specification
• An instrument specification is a collection of rules lhs → rhs
  • LHS: conjunction of syntactic conditions on a program point (local conditions only)
  • RHS: set of actions that log reporting events
• Each bytecode “statement” is examined to see if it satisfies an LHS. The RHS actions of all matching rules are collected and inserted into the bytecode at that “point”.
Page 36
Predicates
Predicate name (parameters): returns true iff the current statement …
• InMethod (String methodNameSpec): is within a method whose name matches methodNameSpec.
• AtMethodStart (String methodNameSpec): is the first statement within a method whose name matches methodNameSpec.
• AtFieldAccess (String fieldNameSpec, Boolean onlyUpdates): accesses a field whose name matches fieldNameSpec. If onlyUpdates is true, only write accesses are matched.
• AtSyncStart (no parameters): enters a synchronized block, taking a lock.
• AtStatementType (String stmtType): is a statement of type stmtType, which ranges over 32 different statement types, such as ”assign”, ”break”, ”return”, ”try”, ”throw”, ”while”, etc.
• InStatementRange (int lowerBound, int upperBound): is within the range of line numbers lowerBound and upperBound.
Page 37
Reportable Events
Action name (parameters): inserts code that reports …
• ReportMethodStart (String methodNameSpec): name of the method entered, whose name matches methodNameSpec.
• ReportField (String fieldNameSpec, boolean onlyUpdates): name of the field that is accessed, whose name matches fieldNameSpec. If onlyUpdates is true, only write accesses are matched.
• ReportLocal (String varNameSpec, boolean onlyUpdates): name of the local variable that is accessed, whose name matches varNameSpec. If onlyUpdates is true, only write accesses are matched.
• ReportSyncStart (String classNameSpec): identity and class (name) of the object that is locked, where the class name matches classNameSpec.
Page 38
Reportable Events (continued)
Action name (parameters): inserts code that reports …
• ReportTimeStamp (no parameters): the current time.
• ReportProgramPoint (no parameters): the current line number.
• ReportExpression (String expressionName, String imports, String expressionBody, byte expressionType, String parameters): the value of expressionBody, and an identifier expressionName. Parameters and expressionType indicate the expression’s free parameters and the result type. The expression is evaluated locally.
Page 39
Using AOP for instrumentation
F ::= true | false | ~ F | F /\ F | F \/ F | F → F | @ F | # F | F;F
    | E | E => F | E & F
E ::= [Nm:]Nm.Id(Nm,*)[returns [Nm]]
Id ::= Java identifier
Nm ::= Id[?| ]*
Page 40
Example Using AOP
observer BufferObs {
  max Always(Term t) = t /\ @ Always(t) .
  min Eventually(Term t) = t \/ @ Eventually(t) .
  min Previously(Term t) = t \/ # Previously(t) .
  var Buffer b;
  var Object o;
  monitor InOut = Always(put(b?,o?) => Eventually(get(b) returns o)) .
  monitor OutIn = Always(get(b?) returns o? => Previously(put(b,o))) .
}
Page 41
Software Fault Protection: Motivation
To obtain higher levels of assurance and quality for safety- and mission-critical software.
[Figure: curve of marginal cost to remove the next error against residual error rate per line of code, with marks at 3 errors/1KLOC and 10 errors/1KLOC]
Page 42
Approach
• Instead of climbing the error curve, build mechanisms into the code to protect against inevitable residual software errors.
• Software Fault Protection.
Page 43
Three Levels of Fault Protection
• Caution and Warning Systems
  • Warning generated for off-nominal measurements of system state and resources
• Autonomic response
  • Circuit breaker
  • MER low battery warning stopped repeated reboot cycle and transition into safe mode.
• Model Based Diagnosis: fault detection, isolation and repair
  • HAL reports a module will go critical in 7 days requiring EVA
Page 44
Differences between System Engineering and Software
• Software does not wear out, but suffers from design and coding errors.
• Software has not traditionally been designed using fault containment concepts
  • E.g. on MER, hardware exceptions generated by out of memory faults were not explicitly dealt with.
• A natural concept of component based on physicality exists for hardware. These components have known failure modes
  • E.g. a valve may be stuck on or a pipe may leak.
Page 45
Component
• Component is
  • Organizing notion for system composition/decomposition
  • Goal is to identify a failed component and possibly reconfigure components, maximizing functional capability
• Spectrum of component choices
  • Low granularity: a statement is a component.
  • Medium granularity: a procedure/method is a component.
  • High granularity: a component in a modeling framework.
Page 46
Model
• It must be cost effective to formulate the model
• Model must be useful in identifying and isolating faults.
• Model has a delicate relationship to code:
  • structural + limited behavioral code synthesis,
  • monitor behavioral properties that are beyond synthesis.
Software Fault Protection is like V&V in one important respect:
Consistency check between code and model
Page 47
Approach
• High granularity components
  • difficult to model, diagnose and repair at lower levels of granularity
  • addresses higher-payoff, system of systems and system integration issues.
• Model
  • structural: component/connector models
  • properties
    – interface behavior:
      • patterns of event interactions over connectors
      • real time response
      • data validity (pre- and post-conditions)
    – resource usage
      • memory, object creation, disk, communication devices, quality of service, deadlock and other concurrency problems
Page 48
Software Fault Protection Fault Detection, Isolation and Repair (FDIR)
• Detection: instrument and monitor system and identify an error state
• Isolation: identify the faulty component
• Repair: take corrective action
Page 49
Detection
• Detection = property violation
• Use Eagle to define and monitor properties
Page 50
Isolation (Diagnosis)
• Process of moving from symptom to identification or isolation of faulty component.
• Our simplistic approach:
  – Associate a failed component with each property.
Page 51
Repair
• “Do no harm”
• Generate a problem report
• Micro-reboot: reset or re-initialize a component to repair an inconsistent data state.
• Reconfiguration with reduced functionality
• Replace module with a less capable backup
  – 1.5 version programming
Page 52
Example System
[Figure: Producer 1 and Producer 2 feed a Queue, which feeds the Consumer]
• Every request must be responded to by the consumer within 50 ms.
• Consumer must stay alive: it does not stop processing requests.
• Queue must stay alive: it does not stop receiving and forwarding transactions.
• Any transaction processed by the consumer did indeed originate from one of the producers.
• Queue size remains bounded, at most 50.
• Requests have priorities that the Queue must respect.
Page 53
The State
class State extends EagleState {
  static String con;
  static int ident;
  static int clock;

  static void update(String event) {...}
  static boolean B1() { return con.equals("B1"); }
  static boolean B2() { return con.equals("B2"); }
  static boolean C() { return con.equals("C"); }
  static boolean D() { return con.equals("D"); }
}
Page 54
Linear Temporal Logic
max A(Term t) = t /\ @ A(t)
min E(Term t) = t \/ @ E(t)
min P(Term t) = t \/ # P(t)
min U(Term t1, Term t2) = t2 \/ (t1 /\ @U(t1,t2))
max W(Term t1, Term t2) = A(t1) \/ U(t1,t2)
Page 55
Some Basic Definitions
min B12() = B1() \/ B2()
min B2(int id) = B2() /\ ident = id
min C(int id) = C() /\ ident = id
Page 56
Extensions of E and P with Data Constraints
min Et(Term t, int time) = E(t /\ clock <= time)
min Eti(Term t, int time, int id) = E(t /\ clock <= time /\ ident = id)
min Pi(Term t, int id) = P(t /\ ident = id)
Page 57
Requirement #1
requiredResponse
Every message from a producer (communicated on either B1 or B2) must be responded to and acknowledged (on D) by the consumer within 50 seconds.
mon requiredResponse = A(B12() → Eti(D(), clock + 50, ident))
Page 58
Requirement #2
stayAliveQueue
Whenever a message is sent to the queue from a producer (on either B1 or B2), a message (not necessarily the same) must be consumed by the consumer (on C) within 32 seconds.
mon stayAliveQueue = A(B12() → Et(C(), clock + 32))
Page 59
Requirement #3
limitedSize
There are never more than 40 messages in the queue.
max CountSize(int size) = (B12() → (size < 40 /\ @ CountSize(size + 1))) /\ (C() → @ CountSize(size - 1)) /\ (D() → @ CountSize(size))
mon limitedSize = CountSize(0)
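The data parameter of CountSize plays the role of a counter. A minimal Java sketch of the same bookkeeping is given below; the class QueueSizeMonitor is hypothetical, events are identified by the connector names B1, B2, C and D used in the State class, and the limit is a constructor argument rather than the fixed value 40.

```java
// Hypothetical counter-based version of limitedSize; not EAGLE's implementation.
class QueueSizeMonitor {
    private final int limit;   // 40 in the requirement
    private int size = 0;      // corresponds to the parameter of CountSize
    private boolean violated = false;

    QueueSizeMonitor(int limit) { this.limit = limit; }

    // Processes one event, named by the connector it was observed on.
    void onEvent(String con) {
        if (violated) return;
        switch (con) {
            case "B1":
            case "B2": // B12() -> (size < limit /\ @CountSize(size + 1))
                if (size < limit) size++; else violated = true;
                break;
            case "C":  // C() -> @CountSize(size - 1)
                size--;
                break;
            default:   // D() -> @CountSize(size): the count is unchanged
                break;
        }
    }

    // CountSize is a max rule, so the end of the trace adds no further obligation.
    boolean verdictAtEnd() { return !violated; }
}
```

With the limit set to 40, a producer event observed while 40 messages are pending is reported as a violation.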
Page 60
Requirement #4
C_D_Alternation
The consumer should alternate between consuming messages (on C) and acknowledging messages (on D).
max S1() = (C() -> @S2()) /\ (~C() -> @S1()) /\ ~D()
min S2() = (D() -> @S1()) /\ (~D() -> @S2()) /\ ~C()
mon C_D_Alternation = S1()
Page 61
Requirement #5
stayAliveConsumer
Every message consumed (on C) by the consumer is processed and acknowledged (on D) within 30 seconds.
mon stayAliveConsumer = A(C() → Et(D(), clock + 30))
Page 62
Requirement #6
noJunkConsumed
Every message consumed by the consumer (on C) has previously been produced (on B1 or B2).
mon noJunkConsumed = A(C() → Pi(B12(), ident))
Page 63
Requirement #7
noJunkAcknowledged
Every message processed and acknowledged by the consumer (on D) has previously been consumed by the consumer (on C).
mon noJunkAcknowledged = A(D() -> Pi(C(), ident))
Page 64
Requirement #8
orderPreserved
The queue behaves as a FIFO queue wrt. messages produced by producer 1. That is, the consumer consumes (on C) messages produced on B1 in the order in which they were produced.
max R0() = (B1() -> R1’(ident,0)) /\ @R0()
max R1’(int id1,int id2) = C(id1) \/ ((B1() -> R2’(id1,ident)) /\ (~B1() -> @R1(id1,id2)))
max R2’(int id1,int id2) = @R2(id1,id2)
max R2(int id1,int id2) = W(~C(id2),C(id1))
mon orderPreserved = R0()
Page 65
Requirement #9
B1HasPriority
Messages produced by producer 1 (on B1) have priority over messages produced by producer 2 (on B2). That is, as long as there are pending B1 messages in the queue, no B2 message will be consumed (on C) by the consumer.
min CForB1BeforeCForB2(int id) = U(~(C() /\ identFromB2(ident)), C(id))
min identFromB2(int id) = P(B2(id))
mon B1HasPriority = A(B1() -> CForB1BeforeCForB2(ident))
Page 66
Example Implementation
• Each component is a process
• Components communicate using Java Message Service (JMS)
• Instrumentation by listening to message traffic
• FDIR component can start, reset, terminate, or replace components
Page 67
Summary
• EAGLE is a succinct but highly expressive finite trace monitoring logic. Can elegantly encode any monitoring logic we have investigated.
• EAGLE can be efficiently implemented, but users must remain aware of expensive features.
• EAGLE demonstrated by integration within a formal test environment, showing the benefit of novel combinations of formal methods and test.
• EAGLE has a natural application in fault protection, but this is a very preliminary idea yet to be validated as useful.