
1

Abstractions and Decision Procedures for Effective

Software Model Checking

Microsoft Summer School, Moscow, July 2011

Lecture 1

Prof. Natasha Sharygina

The University of Lugano, Carnegie Mellon University

2

Outline

Day 1 (Lectures 1 and 2)

• Model Checking in a Nutshell

• Abstractions in Model Checking

– Predicate Abstraction

– SAT-based approach

Bug Catching: Automated Program Analysis

Informatics Department, The University of Lugano

Professor Natasha Sharygina

Guess what this is!


Two trains, one bridge – model transformed with a simulation tool, Hugo

12

What is Formal Verification?

• Build a mathematical model of the system:

– what are possible behaviors?

• Write correctness requirement in a specification language:

– what are desirable behaviors?

• Analysis: (Automatically) check that model satisfies specification

13

What is Formal Verification (2)?

• Formal - Correctness claim is a precise mathematical statement

• Verification - Analysis either proves or disproves the correctness claim

14

Algorithmic Analysis by Model Checking

• Analysis is performed by an algorithm (tool)

• Analysis gives counterexamples for debugging

• Typically requires exhaustive search of state-space

• Limited by high computational complexity

15

Temporal Logic Model Checking

[Clarke, Emerson 81] [Queille, Sifakis 82]

M |= P

M: “implementation” (system model)

P: “specification” (system property)

|=: “satisfies”, “implements”, “refines” (satisfaction relation)

16

Temporal Logic Model Checking

M |= P

M: “implementation” (system model), more detailed

P: “specification” (system property), more abstract

|=: “satisfies”, “implements”, “refines”, “confirms” (satisfaction relation)

17

M |= P

system model system specification

satisfaction relation


18

Decisions when choosing a system model:

variable-based vs. event-based

interleaving vs. true concurrency

synchronous vs. asynchronous interaction

clocked vs. speed-independent progress

etc.

19

Characteristics of system models

which favor model checking over other verification techniques:

ongoing input/output behavior (not: single input, single result)

concurrency (not: single control flow)

control intensive (not: lots of data manipulation)

20

Decisions when choosing a system model:

While the choice of system model is important for ease of modeling in a given situation, the only thing that is important for model checking is that the system model can be translated into some form of state-transition graph.

21

Finite State Machine (FSM)

• Specify state-transition behavior

• Transitions depict observable behavior

[Figure: finite state machine with transitions labeled lock and unlock and an ERROR state]

Acceptable sequences of acquiring and releasing a lock
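A lock specification of this kind can be sketched as a small transition table. The state names and the exact shape of the machine are assumptions (the figure itself is not reproduced here): a second lock without an unlock, or an unlock without a lock, is taken to lead to ERROR.

```python
# Hypothetical transition table for the lock specification (assumed shape).
TRANSITIONS = {
    ("unlocked", "lock"): "locked",
    ("locked", "unlock"): "unlocked",
    ("locked", "lock"): "ERROR",      # acquiring a lock that is already held
    ("unlocked", "unlock"): "ERROR",  # releasing a lock that is not held
}


def accepts(events, start="unlocked"):
    """Return True if the event sequence never drives the FSM into ERROR."""
    state = start
    for event in events:
        state = TRANSITIONS[(state, event)]
        if state == "ERROR":
            return False
    return True


ok = accepts(["lock", "unlock", "lock", "unlock"])
bad = accepts(["lock", "lock"])
```

Conformance checking then amounts to asking whether every lock/unlock sequence the program can produce is accepted by this machine.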

22

High-level View

[Figure: Linux Kernel (C) and Spec (FSM), related by a Conformance Check]

23

High-level View

[Figure: Linux Kernel (C) and Finite State Model (FSM), related By Construction; Finite State Model (FSM) and Spec (FSM), related by Model Checking]

24

Low-level View

State-transition graph

S: set of states

I: set of initial states

AP: set of atomic observations

R ⊆ S × S: transition relation

L: S → 2^AP: observation (labeling) function

25

[Figure: state-transition graph with states s1, s2, s3, labeled with observations over {a, b}]

Run: s1 → s3 → s1 → s3 → s1 (state sequence)

Trace: a → b → a → b → a (observation sequence)
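The tuple (S, I, AP, R, L) and the difference between a run and a trace can be sketched in Python. The edges and labels below are assumptions chosen so that the run and trace shown above come out as stated (s1 observed as a, s3 observed as b); the original figure may contain other edges.

```python
from dataclasses import dataclass


@dataclass
class StateGraph:
    states: set
    initial: set
    transitions: set  # R as a set of (source, target) pairs
    labels: dict      # L as a map from state to its set of observations

    def is_run(self, seq):
        """A run starts in an initial state and follows transitions."""
        return seq[0] in self.initial and all(
            (s, t) in self.transitions for s, t in zip(seq, seq[1:])
        )

    def trace(self, seq):
        """The trace is the sequence of observations along a run."""
        return [self.labels[s] for s in seq]


g = StateGraph(
    states={"s1", "s2", "s3"},
    initial={"s1"},
    # Assumed edges; only s1 -> s3 and s3 -> s1 are implied by the run above.
    transitions={("s1", "s3"), ("s3", "s1"), ("s1", "s2"), ("s2", "s3")},
    labels={"s1": {"a"}, "s2": {"a", "b"}, "s3": {"b"}},
)

run = ["s1", "s3", "s1", "s3", "s1"]
observations = g.trace(run)
```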

26

Model of Computation

[Figure: a State Transition Graph with states labeled over {a, b, c}, next to its Infinite Computation Tree]

Unwind State Graph to obtain Infinite Tree.

A trace is an infinite sequence of state observations

27

Semantics

[Figure: a State Transition Graph with states labeled over {a, b, c}, next to its Infinite Computation Tree]

The semantics of a FSM is a set of traces

28

Where is the model?

• Need to extract automatically

• Easier to construct from hardware

• Fundamental challenge for software

[Figure: Linux Kernel (~1,000,000 LOC) with recursion and data structures, pointers and dynamic memory, processes and threads, to be turned into a Finite State Model]

29

Mutual-exclusion protocol

P1:

loop

out: x1 := 1; last := 1

req: await x2 = 0 or last = 2

in: x1 := 0

end loop.

||

P2:

loop

out: x2 := 1; last := 2

req: await x1 = 0 or last = 1

in: x2 := 0

end loop.

30

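[Figure: fragment of the state-transition graph with states oo001, ro101, or012, rr112, ir112, io101]

Each state lists the values of pc1, pc2, x1, x2, last:

pc1: {o,r,i}, pc2: {o,r,i}, x1: {0,1}, x2: {0,1}, last: {1,2}

3 · 3 · 2 · 2 · 2 = 72 states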
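The state count can be reproduced with a short Python sketch that also explores which states are actually reachable. It assumes that each labeled line of the protocol (out, req, in) executes as one atomic step and that the initial state is the one written oo001; both are readings of the slides, not something they state explicitly.

```python
from itertools import product


def successors(state):
    """Interleaving semantics: one step of P1 or one step of P2."""
    pc1, pc2, x1, x2, last = state
    result = []
    # Process P1
    if pc1 == "o":
        result.append(("r", pc2, 1, x2, 1))       # out: x1 := 1; last := 1
    elif pc1 == "r" and (x2 == 0 or last == 2):
        result.append(("i", pc2, x1, x2, last))   # req: await passes
    elif pc1 == "i":
        result.append(("o", pc2, 0, x2, last))    # in: x1 := 0
    # Process P2
    if pc2 == "o":
        result.append((pc1, "r", x1, 1, 2))       # out: x2 := 1; last := 2
    elif pc2 == "r" and (x1 == 0 or last == 1):
        result.append((pc1, "i", x1, x2, last))   # req: await passes
    elif pc2 == "i":
        result.append((pc1, "o", x1, 0, last))    # in: x2 := 0
    return result


# All combinations of variable values: 3 * 3 * 2 * 2 * 2.
all_states = list(product("ori", "ori", (0, 1), (0, 1), (1, 2)))

initial = ("o", "o", 0, 0, 1)  # the state written oo001
reachable = {initial}
worklist = [initial]
while worklist:
    for nxt in successors(worklist.pop()):
        if nxt not in reachable:
            reachable.add(nxt)
            worklist.append(nxt)

# Mutual exclusion: both processes are never at "in" at the same time.
mutual_exclusion = all(not (s[0] == "i" and s[1] == "i") for s in reachable)
```

Under these assumptions only a small part of the 72 combinations is reachable, which is why explicit exploration starts from the initial states instead of enumerating the full product.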

31

State space blow up

The translation from a system description to a state-transition graph usually involves an exponential blow-up!

e.g., n boolean variables ⇒ 2^n states

This is called the “state-explosion problem.”

32

M |= P

system model system specification

satisfaction relation


33

Decisions when choosing system properties:

operational vs. declarative: automata vs. logic

may vs. must: branching vs. linear time

prohibiting bad vs. desiring good behavior: safety vs. liveness

34

System Properties/Specifications

- Atomic propositions: properties of states

- (Linear) Temporal Logic Specifications: properties of traces.

37

Examples of the Robot Control Properties

• Configuration Validity Check: If an instance of EndEffector is in the “FollowingDesiredTrajectory” state, then the instance of the corresponding Arm class is in the “Valid” state

Always((ee_reference=1) -> (arm_status=1))

• Control Termination: Eventually the robot control terminates

Eventually(abort_var=1)
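To make the two operators concrete, here is a minimal Python sketch that evaluates them over a finite recorded trace. This is only an approximation: temporal logic is defined over infinite traces, and the sample trace values below are invented for illustration.

```python
def always(pred, trace):
    """Always(pred): pred holds in every state of the finite trace."""
    return all(pred(state) for state in trace)


def eventually(pred, trace):
    """Eventually(pred): pred holds in at least one state of the finite trace."""
    return any(pred(state) for state in trace)


# Hypothetical trace of the robot controller (values are made up).
trace = [
    {"ee_reference": 0, "arm_status": 0, "abort_var": 0},
    {"ee_reference": 1, "arm_status": 1, "abort_var": 0},
    {"ee_reference": 0, "arm_status": 1, "abort_var": 1},
]

# Always((ee_reference=1) -> (arm_status=1)), with "->" written as "not a or b".
validity = always(
    lambda s: s["ee_reference"] != 1 or s["arm_status"] == 1, trace
)

# Eventually(abort_var=1)
termination = eventually(lambda s: s["abort_var"] == 1, trace)
```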

38

What is “satisfy”?

M satisfies P if all the reachable states satisfy P

Different Algorithms to check if M |= P.

- Explicit State Space Exploration

For example: Invariant checking Algorithm.

1. Start at the initial states and explore the states of M using DFS or BFS.

2. In any state, if P is violated then print an “error trace”.

3. If all reachable states have been visited then say “yes”.
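The three steps can be sketched as a breadth-first search that remembers how each state was reached, so that a violation can be reported as an error trace. The function name and its parameters are illustrative, not taken from any particular tool.

```python
from collections import deque


def check_invariant(initial, successors, holds):
    """Return None if `holds` is true in all reachable states,
    otherwise an error trace from an initial state to a violating state."""
    parent = {s: None for s in initial}  # also serves as the visited set
    queue = deque(initial)
    while queue:
        state = queue.popleft()
        if not holds(state):
            # Walk the parent links back to an initial state.
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return list(reversed(trace))
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None  # all reachable states visited: "yes"


# Toy model: a counter modulo 8, with the invariant "counter != 5".
error_trace = check_invariant([0], lambda s: [(s + 1) % 8], lambda s: s != 5)
```

Because the search is breadth-first, the error trace returned is a shortest one; a DFS variant would use a stack and may return a longer trace.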

40

Abstractions

• They are one of the most useful ways to fight the state explosion problem

• They should preserve properties of interest: properties that hold for the abstract model should hold for the concrete model

• Abstractions should be constructed directly from the program

• Why do we need to abstract?

– To reduce the number of states

– To represent (in a sound manner) infinite state systems as finite state systems

• How do we abstract?

– By removing details irrelevant to verification

43

Data Abstraction

Given a program P with variables x1,...,xn, each over domain D, the concrete model of P is defined over states (d1,...,dn) ∈ D × ... × D

Choosing

• Abstract domain A

• Abstraction mapping (surjection) h: D → A

we get an abstract model over abstract states (a1,...,an) ∈ A × ... × A

44

Example

Given a program P with variable x over the integers

Abstraction 1:

A1 = { a–, a0, a+ }

h1(d) = a+ if d > 0, a0 if d = 0, a– if d < 0

Abstraction 2:

A2 = { aeven, aodd }

h2(d) = if even( |d| ) then aeven else aodd
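Both abstraction mappings translate directly into Python. The string names for the abstract values are just a convenient encoding chosen for this sketch.

```python
def h1(d: int) -> str:
    """Sign abstraction: map an integer to a-, a0 or a+."""
    if d > 0:
        return "a+"
    if d == 0:
        return "a0"
    return "a-"


def h2(d: int) -> str:
    """Parity abstraction: map an integer to aeven or aodd."""
    return "aeven" if abs(d) % 2 == 0 else "aodd"


# Infinitely many concrete values collapse onto 3 (or 2) abstract values.
signs = {h1(d) for d in range(-100, 101)}
parities = {h2(d) for d in range(-100, 101)}
```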

45

Existential Abstraction

[Figure: states of the concrete model M mapped by h onto states of the abstract model A]

M < A

46

[Figure: concrete model M with states 1–7 and labels a–f, and abstract model A with abstract states [1], [2,3], [4,5], [6,7]]

47


• Every trace of M is a trace of A

– A over-approximates what M can do (preserves safety properties!): A satisfies ϕ ⇒ M satisfies ϕ

• Some traces of A may not be traces of M

– May yield spurious counterexamples, e.g. < a, e >

• Eliminated via abstraction refinement

– Splitting some clusters into smaller ones

– Refinement can be automated
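The points above can be replayed on a small example. The sketch below assumes one reading of the figure: M is a tree with edges 1 -a-> 2, 1 -b-> 3, 2 -c-> 4, 2 -d-> 5, 3 -e-> 6, 3 -f-> 7, and the clusters are [1], [2,3], [4,5], [6,7]. The edge layout is an assumption; the helper names are illustrative.

```python
# Assumed concrete model M: (source, label, target) edges of a small tree.
M = {
    (1, "a", 2), (1, "b", 3),
    (2, "c", 4), (2, "d", 5),
    (3, "e", 6), (3, "f", 7),
}


def abstract(transitions, h):
    """Existential abstraction: an abstract edge exists if some
    concrete edge maps onto it under h."""
    return {(h[s], label, h[t]) for (s, label, t) in transitions}


def has_trace(transitions, start, labels):
    """Is there a path from `start` whose edge labels spell out `labels`?"""
    frontier = {start}
    for label in labels:
        frontier = {t for (s, l, t) in transitions if s in frontier and l == label}
        if not frontier:
            return False
    return True


h = {1: "[1]", 2: "[2,3]", 3: "[2,3]", 4: "[4,5]", 5: "[4,5]", 6: "[6,7]", 7: "[6,7]"}
A = abstract(M, h)

# Refinement: split the cluster [2,3] into [2] and [3].
h_refined = dict(h)
h_refined[2] = "[2]"
h_refined[3] = "[3]"
A_refined = abstract(M, h_refined)

spurious_in_A = has_trace(A, "[1]", ["a", "e"])              # allowed by A
real_in_M = has_trace(M, 1, ["a", "e"])                      # not possible in M
gone_after_split = not has_trace(A_refined, "[1]", ["a", "e"])
```

In this reading, < a, e > is a trace of A but not of M, and splitting the cluster removes it while all traces of M remain traces of the refined abstraction.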

48

Original Abstraction

[Figure: model M with states 1–7 and labels a–f, and abstraction A with abstract states [1], [2,3], [4,5], [6,7]]

49

Refined Abstraction

[Figure: model M with states 1–7 and labels a–f, and refined abstraction A with abstract states [1], [2], [3], [4,5], [6,7]]

How to define an abstract model

Given M (model) and ϕ (spec), choose

• Sh - a set of abstract states

• AP – a set of atomic propositions that label concrete and abstract states

• h : S → Sh - a mapping from S onto Sh that satisfies: h(s) = h(t) only if L(s) = L(t)

Abstraction

Depending on h and the size of M, Mh (i.e., Ih, Rh) can be built using:

• BDDs or

• SAT solver or

• Theorem prover

52

Predicate Abstraction

[Graf/Saïdi 97]

• Idea: Only keep track of predicates on data

• Abstraction function:

Labeling of concrete states:

L(s) = { Pi | s |= Pi }


Abstract Model

• Abstract states are defined over Boolean variables { B1,...,Bk }:

Example

Program over natural variables x, y

AP = { P1, P2, P3 }, where P1 = x≤1, P2 = x>y, P3 = y=2

AP = { x≤1, x>y, y=2 }

For state s, where s(x)=s(y)=0: L(s) = { P1 }

For state t, where t(x)=1, t(y)=2: L(t) = { P1, P3 }
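The labeling in this example can be checked mechanically. The sketch below encodes the three predicates as Python functions; `label` and `abstract_state` are illustrative helper names, and the Boolean tuple plays the role of the abstract state over (B1, B2, B3).

```python
# The three predicates from the example, over natural variables x and y.
PREDICATES = {
    "P1": lambda x, y: x <= 1,
    "P2": lambda x, y: x > y,
    "P3": lambda x, y: y == 2,
}


def label(x, y):
    """L(s): the set of predicates that hold in the concrete state (x, y)."""
    return {name for name, pred in PREDICATES.items() if pred(x, y)}


def abstract_state(x, y):
    """The abstract state: one Boolean per predicate, in the order P1, P2, P3."""
    return tuple(pred(x, y) for pred in PREDICATES.values())


label_s = label(0, 0)           # state s with s(x) = s(y) = 0
label_t = label(1, 2)           # state t with t(x) = 1, t(y) = 2
abstract_t = abstract_state(1, 2)
```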

Example

Computing abstract transition relation

(the same example)

Abstract transition relation

Example

Concrete States:

Predicates:

Abstract transitions?


Abstract Transitions:

Property:

Property holds. Ok.

Abstract Transitions:

Property:

This trace is

spurious!


New Predicates:


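CEGAR

Counter Example Guided Abstraction Refinement

[Figure: CEGAR approach. The Original Model M is turned into an Initial Abstraction; the outcome is either Validation or a Counterexample; the counterexample is either Correct! or a Spurious counterexample, which leads to Refinement of the abstraction]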

68

Abstraction Refinement Loop

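[Figure: abstraction refinement loop. The Actual Program is turned by the Initial Abstraction into a Concurrent Boolean Program, which is passed to the Model Checker for Verification. The outcome is either no error (Property holds) or a Counterexample. The Counterexample is passed to the Simulator: if the Simulation is successful, a Bug is found; a Spurious counterexample goes to Abstraction refinement (Refinement), which produces a new Boolean program]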
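The control flow of the loop can be written down as a short skeleton. All four callbacks (`abstract`, `model_check`, `simulate`, `refine`) are hypothetical placeholders for the abstraction, model checker, simulator and refinement components; this is a sketch of the loop structure only, not of any tool's interface.

```python
def cegar(abstract, model_check, simulate, refine, predicates, max_iterations=20):
    """Skeleton of a counterexample-guided abstraction refinement loop.

    abstract(predicates)       -> abstract model
    model_check(abstraction)   -> None if the property holds, else a counterexample
    simulate(counterexample)   -> True if the counterexample is real
    refine(predicates, cex)    -> new predicates that rule out the spurious cex
    """
    for _ in range(max_iterations):
        abstraction = abstract(predicates)
        counterexample = model_check(abstraction)
        if counterexample is None:
            return ("property holds", predicates)
        if simulate(counterexample):
            return ("bug found", counterexample)
        predicates = refine(predicates, counterexample)
    return ("inconclusive", predicates)


# Toy stubs: the property becomes provable once predicate "p2" is tracked.
result = cegar(
    abstract=lambda preds: preds,
    model_check=lambda a: None if "p2" in a else ["spurious step"],
    simulate=lambda cex: False,
    refine=lambda preds, cex: preds + ["p2"],
    predicates=["p1"],
)
```

The iteration bound is a design choice of this sketch: refinement is not guaranteed to terminate in general, so a real loop needs some resource limit.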

69

Predicate Abstraction for Software

• Let’s take existential abstraction seriously

• Basic idea: with n predicates, there are 2^n × 2^n possible abstract transitions

• Let’s just check them!

70

Existential Abstraction

[Figure: the Predicates and the basic block "i++;" are combined into a Basic Block Formula. Each pair of a Current Abstract State (p1 p2 p3, from 0 0 0 to 1 1 1) and a Next Abstract State (p’1 p’2 p’3, from 0 0 0 to 1 1 1) gives one Query]

71

… and so on …

2^n × 2^n possible abstract transitions for n predicates

What is the problem?

Problem of existing tools: Large number of expensive theorem prover calls – slow

Over-approximation yields additional, unnecessary spurious counterexamples

Theorem prover works on natural numbers, but ANSI-C uses bit-vectors ⇒ false positives

Most theorem provers support only few operators (+, -, <, ≤, …), no bitwise operators

SAT-based approach

• Successfully used for abstraction of various designs (Clarke, Kroening, Sharygina, Yorav – SAT-based predicate abstraction)

• There is now a version of MSR tool (SLAM) that has it

– Found previously unknown Windows bug

• Create a SAT instance which relates initial value of predicates, basic block, and the values of predicates after the execution of basic block

• SAT also used for simulation and refinement

Use SAT solver!

SAT Approach: Given a propositional formula in CNF, find an assignment to Boolean variables that makes the formula true:

φ = ω1 ∧ ω2 ∧ ω3, where

ω1 = (x2 ∨ x3)

ω2 = (x1 ∨ x4)

ω3 = (x2 ∨ x4)

A = {x1=0, x2=1, x3=0, x4=1}

SATisfying assignment to φ!
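The meaning of "satisfying assignment" can be checked by brute force on this example. The sketch takes the three clauses as plain disjunctions of positive literals, which is an assumption: any negation marks in the original slide are not visible in this text. A real SAT solver searches far more cleverly than the exhaustive loop used here.

```python
from itertools import product

# Each clause is a list of (variable, polarity) literals; True means positive.
# Assumed reading of the slide: (x2 or x3), (x1 or x4), (x2 or x4).
CLAUSES = [
    [("x2", True), ("x3", True)],
    [("x1", True), ("x4", True)],
    [("x2", True), ("x4", True)],
]
VARIABLES = ["x1", "x2", "x3", "x4"]


def satisfies(assignment, clauses):
    """A CNF formula is true if every clause has at least one true literal."""
    return all(
        any(assignment[var] == polarity for var, polarity in clause)
        for clause in clauses
    )


def all_models(variables, clauses):
    """Enumerate every satisfying assignment by exhaustive search."""
    models = []
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if satisfies(assignment, clauses):
            models.append(assignment)
    return models


A = {"x1": False, "x2": True, "x3": False, "x4": True}
a_is_model = satisfies(A, CLAUSES)
models = all_models(VARIABLES, CLAUSES)
```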

Our Solution

[Figure: SAT-based Solution: Single query for Theorem Prover vs. Query for SAT]

Use SAT solver!

1. Generate query equation with predicates as free variables

2. Transform equation into CNF using Bit Vector Logic

One satisfying assignment matches one abstract transition

3. Obtain all satisfying assignments = most precise abstract transition relation
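The idea that each satisfying assignment corresponds to one abstract transition can be illustrated without a SAT solver. The sketch below substitutes brute-force enumeration of all values of a 4-bit unsigned variable for the solver's all-solutions search, and uses three hypothetical predicates for the basic block `i++`; the width and the predicates are assumptions made for this example only.

```python
WIDTH = 4                  # assumed bit-width, small enough to enumerate
MASK = (1 << WIDTH) - 1

# Hypothetical predicates p1, p2, p3 over the variable i.
PREDICATES = [
    lambda i: i == 0,
    lambda i: i < 8,
    lambda i: i % 2 == 0,
]


def abstract(i):
    """Valuation of (p1, p2, p3) in the concrete state i."""
    return tuple(pred(i) for pred in PREDICATES)


def basic_block(i):
    """i++ with bit-vector semantics: the result wraps around on overflow."""
    return (i + 1) & MASK


def abstract_transitions():
    """Most precise abstract transition relation: exactly those pairs of
    predicate valuations that have a concrete witness i -> i + 1."""
    return {(abstract(i), abstract(basic_block(i))) for i in range(MASK + 1)}


relation = abstract_transitions()
candidates = (2 ** len(PREDICATES)) ** 2  # 2^n x 2^n pairs to consider
```

Only a few of the 64 candidate pairs have a concrete witness, and the wrap-around from the largest value back to 0 shows up as an abstract transition, which is the kind of overflow behavior an integer-based prover would miss.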

This solves two problems:

1. Now can do all ANSI-C integer operators, including *, /, %, <<, etc.

2. Sound with respect to overflow

No more unnecessary spurious counterexamples!


Performance

How does the performance compare with existing approaches?

1. Runtime potentially exponential

2. Exponential part is inside of SAT solver, instead of exponential number of Theorem Prover calls

3. SAT solver is not re-started; all the learning and pruning done by modern SAT solvers is retained between iterations.

• Worst case: all possible assignments are satisfying

• Runtime uncritical up to 2^14 assignments

Performance

• A realistic experiment: two 32-bit variables, plus n predicates

• Various operators: +, <, shifting, xor, or, and, combinations thereof, …

• All predicates are affected by basic block

No. of Predicates    Runtime (with 32-bit *)
4                    0.35 s
8                    7.20 s
16                   71.16 s
32                   512.72 s

Compare to 2^n × 2^n potential theorem prover calls!

Performance

Experimental Results

Comparison of SLAM with Integer-based theorem prover against SAT-based SLAM

308 device drivers

Timeout: 1200s

SATABS

SATABS toolset – SAT-based predicate abstraction

Download and use with Cadence SMV model checker

Take a tcas program and verify it using SATABS

Make a SHORT report following the steps of the assignment
