
Learning Universally Quantified Invariants of Linear Data Structures

Pranav Garg (1), Christof Loding (2), P. Madhusudan (1) and Daniel Neider (2)

(1) University of Illinois at Urbana-Champaign
(2) RWTH Aachen, Germany

Renewed interest in applying learning to invariant synthesis [Sharma et al. CAV-12], [Sharma et al. SAS-13], [Kong et al. APLAS-10].

Black-box learning of invariants: advantages with respect to white-box techniques:
- verification of complex programs with simple invariants
- generalization
- application of extremely scalable machine learning algorithms to verification

Black-box learning of invariants

[Figure: black-box learning loop with the components Program, Teacher and Learner; the Learner sends a hypothesis H to the Teacher (checkHypothesis?).]

Active Learning and Passive Learning

Active learning:
- the learner queries the teacher with equivalence and membership queries

Passive learning:
- given a sample = (examples, counter-examples), learn the simplest concept

[Figure: an Active Learner sends membership/equivalence queries to a Teacher, which answers yes/no; a passive Learner receives only a Sample S.]

Overview

Build active learning algorithms for learning quantified formulas over linear data structures (arrays/lists).
- introduce Quantified Data Automata (QDAs) as a normal form for such invariants.
- build an active learning algorithm for QDAs.

Build a passive learning algorithm on top of the active learning algorithm.
- based on an imprecise teacher that answers questions with respect to the samples.

Introduce elastic QDAs (EQDAs) that translate to decidable logics.
- develop learning algorithms for EQDAs.

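Example: the list pointed to by head is sorted.

[Figure: a list pointed to by head holding the values 5, 7, 8, 9.]

$$\forall y_1 y_2.\ \bigl((head \rightarrow^{*}_{next} y_1 \wedge y_1 \rightarrow^{*}_{next} y_2) \Rightarrow data(y_1) \le data(y_2)\bigr)$$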
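The sortedness property can be checked directly on a concrete list by trying every pair of positions for y1 and y2. The following is only a minimal sketch: it assumes the list is given as a Python sequence of data values in next-order starting at head, so that "y2 is reachable from y1" becomes an index comparison. The function name is hypothetical.

```python
from itertools import product

def is_sorted_by_formula(data):
    """Check: for all y1, y2 with y1 reachable from head and y2 reachable
    from y1, data(y1) <= data(y2)."""
    positions = range(len(data))  # assumption: every cell is reachable from head
    return all(
        data[y1] <= data[y2]
        for y1, y2 in product(positions, repeat=2)
        if y1 <= y2  # y2 is reachable from y1 iff y1 is at or before y2
    )

print(is_sorted_by_formula([5, 7, 8, 9]))
print(is_sorted_by_formula([5, 8, 7, 9]))
```

The quantifier is evaluated by brute force over all pairs, which is fine for the small lists used for illustration.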

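Program Configuration/Data words

[Figure: a program configuration, a list of six cells holding the values 2, 3, 4, 7, 8, 9 with pointer variables head and i, and the corresponding data word. Each letter of the data word pairs the set of pointer variables at a cell ({head}, {i}, or {} for none) with the data value stored in that cell.]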
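As a rough illustration of the encoding, a configuration can be turned into a data word by pairing each cell's data value with the set of pointer variables that point to it. This is a sketch under assumptions: the representation (a list of values plus a map from pointer names to cell indices), the function name, and the example values are all hypothetical and not taken from the figure.

```python
def to_data_word(values, pointers):
    """Encode a linear data structure as a data word.

    values:   data stored in each cell, in list order.
    pointers: maps a pointer variable name to the index of the cell it points to.
    Returns a list of letters (set of pointer variables, data value).
    """
    word = []
    for index, value in enumerate(values):
        names = frozenset(name for name, at in pointers.items() if at == index)
        word.append((names, value))
    return word

# Illustrative configuration: head points to the first cell, i to the third.
for letter in to_data_word([2, 4, 8, 3], {"head": 0, "i": 2}):
    print(letter)
```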

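Quantified Data Automata

QDAs represent universally quantified properties of linear data structures:

$$\forall y.\ \bigwedge_i \bigl(Guard_i(p, y) \Rightarrow Data_i(p, y)\bigr)$$

Example:

$$\forall y_1 y_2.\ \bigl((head \rightarrow^{*}_{next} y_1 \wedge y_1 \rightarrow^{*}_{next} y_2) \Rightarrow data(y_1) \le data(y_2)\bigr)$$

[Figure: a QDA for this property, with transitions on head, y1, y2 and on the blank symbol b, and a state labeled data(y1) <= data(y2).]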

Quantified Data Automata

Fix P – program pointer variables
Fix Y – set of quantified variables
Fix F – numerical abstract domain over data formulas

QDA over linear data structures:
- reads a data word annotated with pointers P and Y
- checks whether data stored at these positions satisfy a data property

QDA accepts a data word w with pointers P if it accepts all possible extensions of w with valuations for Y.

[Figure: the example QDA for $\forall y_1 y_2.\ \bigl((head \rightarrow^{*}_{next} y_1 \wedge y_1 \rightarrow^{*}_{next} y_2) \Rightarrow data(y_1) \le data(y_2)\bigr)$.]

Valuation words

Valuation word = data word over P + valuation for Y.

Universal quantification: a QDA accepts a data word iff it accepts ALL corresponding valuation words.

[Figure: a data word with pointers head and i, and two of its valuation words: one with y1 at head's position and y2 at i's position, and one with y1 at a cell of its own and y2 at i's position.]
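The universal-quantification rule can be sketched by enumerating every valuation word of a data word and requiring that all of them are accepted. This is a minimal sketch, assuming that each quantified variable is placed on some cell of the word and that letters are (set of variables, data value) pairs; the function names and the example acceptor `sorted_check` are hypothetical.

```python
from itertools import product

def valuation_words(data_word, quantified_vars):
    """Yield every extension of data_word that places each quantified
    variable on one of its cells."""
    positions = range(len(data_word))
    for placement in product(positions, repeat=len(quantified_vars)):
        word = []
        for index, (pointers, value) in enumerate(data_word):
            here = {var for var, at in zip(quantified_vars, placement) if at == index}
            word.append((frozenset(pointers) | here, value))
        yield word

def accepts_data_word(accepts_valuation_word, data_word, quantified_vars):
    """A data word is accepted iff ALL of its valuation words are accepted."""
    return all(accepts_valuation_word(word)
               for word in valuation_words(data_word, quantified_vars))

def sorted_check(valuation_word):
    """Hypothetical acceptor: y1 at or before y2 implies data(y1) <= data(y2)."""
    at = {var: (index, value)
          for index, (variables, value) in enumerate(valuation_word)
          for var in variables}
    (pos1, data1), (pos2, data2) = at["y1"], at["y2"]
    return pos1 > pos2 or data1 <= data2

data_word = [({"head"}, 2), (set(), 4), ({"i"}, 8)]
print(accepts_data_word(sorted_check, data_word, ["y1", "y2"]))
```

A word with n cells and two quantified variables has n * n valuation words under this assumption, so the enumeration is only practical for small words.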

Quantified Data Automata

Deterministic, finite, register automata over words.
- each state q is labeled with a data formula f(q)

For a valuation word, the QDA reads the pointer and universally quantified variables and stores the data values at their positions in the register reg.

At the final state, the QDA checks whether these data values satisfy the formula labeling the state.
- reg satisfies f(q): accepts the valuation word
- reg does not satisfy f(q): rejects the valuation word

Example: reg = (head: 2, y1: 4, i: 8, y2: 8) with f(q) = data(y1) <= data(y2).

[Figure: the valuation word of the running example, with pointers head and i and with y2 at i's position.]
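The run of such an automaton on a valuation word can be sketched as follows. This is an illustrative simulator, not the authors' implementation: the dictionary-based transition table, the choice to reject on a missing transition, and all names are assumptions. The example automaton is deliberately partial and only covers words of the shape {head} b* {y1} b* {y2} b*; a complete QDA would also need transitions for the other orderings of the variables.

```python
BLANK = frozenset()  # the blank symbol b: no pointer or quantified variable here

def run_qda(transitions, formulas, initial, valuation_word):
    """Return True iff the automaton accepts the valuation word.

    transitions: dict mapping (state, set of variables) to the next state.
    formulas:    dict mapping a state to a predicate over the register.
    """
    state, reg = initial, {}
    for variables, value in valuation_word:
        for name in variables:
            reg[name] = value  # store the data value found at this variable
        state = transitions.get((state, frozenset(variables)))
        if state is None:
            return False  # no transition defined: reject (a choice of this sketch)
    formula = formulas.get(state)
    return formula is not None and formula(reg)

# Partial automaton for words of the shape {head} b* {y1} b* {y2} b*.
transitions = {
    ("q0", frozenset({"head"})): "q1",
    ("q1", BLANK): "q1",
    ("q1", frozenset({"y1"})): "q2",
    ("q2", BLANK): "q2",
    ("q2", frozenset({"y2"})): "q3",
    ("q3", BLANK): "q3",
}
formulas = {"q3": lambda reg: reg["y1"] <= reg["y2"]}

word = [({"head"}, 2), ({"y1"}, 4), (set(), 7), ({"y2"}, 8), (set(), 9)]
print(run_qda(transitions, formulas, "q0", word))
```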

Learning QDAs

QDAs are finite automata which output data formulas.

Lift Angluin's L* algorithm for learning DFAs to learn QDAs.

Given a teacher, the unique minimal QDA can be learned in time polynomial in the size of this minimal QDA.

Example: on words matching the regular expression $\{head\}\, b^{*}\, \{y_1\}\, b^{*}\, \{y_2\}\, b^{*}$, the QDA outputs data(y1) <= data(y2).

Elastic Quantified Data Automata (EQDA)

Subclass of QDAs which translate to decidable logics:
- Array Property Fragment (APF) [Bradley et al. VMCAI-06]
- decidable fragment of Strand over lists [Madhusudan et al. POPL-11]

Restriction for EQDAs: all transitions on blank symbols (no pointer or universally quantified variable) must be self-loops:

$$\delta(q_1, b) = q_2 \Rightarrow q_1 = q_2$$

As a result, an EQDA cannot test whether two universal variables are a bounded distance away.

[Figure: two example guards over y1 and y2, one involving an offset of 1 between the variables (outside APF) and one without such an offset (inside APF), shown next to a QDA and an EQDA.]
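The self-loop restriction is easy to check mechanically. The sketch below assumes a transition table given as a dictionary from (state, symbol) to the next state, with the empty set standing for the blank symbol b; this representation and the function name are hypothetical.

```python
def is_elastic(transitions, blank=frozenset()):
    """True iff every transition on the blank symbol is a self-loop."""
    return all(
        source == target
        for (source, symbol), target in transitions.items()
        if symbol == blank
    )

# A blank transition that moves from q1 to q2 lets the automaton count cells.
counting = {
    ("q0", frozenset({"y1"})): "q1",
    ("q1", frozenset()): "q2",
    ("q2", frozenset({"y2"})): "q3",
}
# Here the blank symbol only loops, so distances cannot be measured.
looping = {
    ("q0", frozenset({"y1"})): "q1",
    ("q1", frozenset()): "q1",
    ("q1", frozenset({"y2"})): "q2",
}
print(is_elastic(counting), is_elastic(looping))
```

The first table accepts only words in which exactly one blank cell separates y1 and y2, which is the kind of bounded-distance test the restriction rules out.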

Elastic Quantified Data Automata (EQDA)

Unique minimal over-approximation theorem: the language of valuation words accepted by a QDA A has a unique minimal over-approximation that is accepted by an EQDA A_el.

The construction of A_el from a given QDA A is called elastification.

Learning EQDAs <= learning QDAs + elastification.

[Figure: A and its elastification A_el, shown together with two other EQDAs B_el and C_el.]

Passively learning QDAs

Given the samples S+ and S-, the teacher uses them to answer the active learner. The teacher wants the active learner to construct a QDA that includes S+ and excludes S-.

Membership query:
- if s belongs to S+, return yes
- if s belongs to S-, return no
- otherwise, return no (errs on the side of keeping the learned concept semantically small)

Equivalence query:
- checks whether the conjectured invariant is consistent with S+ and S-

The learned QDA might be non-optimal (but is usually small). Running time is polynomial in the size of the learned QDA.

[Figure: the Passive Learner consists of a Teacher, which holds the sample S+, S-, and an Active Learner.]
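The sample-based teacher described above can be sketched as a small class. This is a minimal sketch under assumptions: words are treated as opaque hashable values, a conjecture is passed in as a Python predicate, and the class and method names are hypothetical.

```python
class SampleTeacher:
    """Answers an active learner's queries using only the samples S+ and S-."""

    def __init__(self, positives, negatives):
        self.positives = list(positives)
        self.negatives = list(negatives)
        self._positive_set = set(self.positives)

    def membership(self, word):
        # Words in S- and words in neither sample are both answered "no",
        # which keeps the learned concept semantically small.
        return word in self._positive_set

    def equivalence(self, accepts):
        """Return None if the conjecture is consistent with the samples,
        otherwise a word on which it disagrees with them."""
        for word in self.positives:
            if not accepts(word):
                return word
        for word in self.negatives:
            if accepts(word):
                return word
        return None

teacher = SampleTeacher(positives=[(1, 2), (1, 1)], negatives=[(2, 1)])
print(teacher.membership((1, 2)), teacher.membership((3, 0)))
print(teacher.equivalence(lambda w: w[0] <= w[1]))
print(teacher.equivalence(lambda w: True))
```

The teacher is imprecise because it has no access to the true invariant: any conjecture that agrees with the samples passes the equivalence check.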

Experiments

Run the program on arrays/lists of small bounded sizes, with data values from a bounded data domain, e.g. {0, 1, 2}.

Extract the concrete data structures that manifest at loop headers.

Obtain the set S+ on which passive learning is performed.
- fix F to the Cartesian lattice of atomic formulas over the relations {=, <, ≤}

Learn QDAs using Angluin's algorithm.
- the learner never asks long membership queries
- the teacher, thus, often has correct answers

The learned QDA is over-approximated to an elastic QDA to get a quantified invariant over decidable Strand or APF.
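The first steps of this setup can be illustrated with a small harness that runs a program on all inputs up to a size bound and records the configuration at each loop header. The sketch below is an assumption-laden example, not the tooling used in the experiments: it instruments a plain insertion sort by hand, records (array snapshot, value of i) pairs, and uses hypothetical function names.

```python
from itertools import product

def insertion_sort_configs(values):
    """Run insertion sort and record (array snapshot, i) at every
    outer-loop header."""
    a, configs = list(values), []
    i = 1
    while True:
        configs.append((tuple(a), i))  # loop header reached
        if i >= len(a):
            break
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
        i += 1
    return configs

def positive_sample(max_size=3, domain=(0, 1, 2)):
    """Collect loop-header configurations over all small inputs."""
    sample = set()
    for size in range(1, max_size + 1):
        for values in product(domain, repeat=size):
            sample.update(insertion_sort_configs(values))
    return sample

sample = positive_sample()
print(len(sample) > 0)
```

Every recorded configuration is reachable, so it belongs in S+; in this example each one satisfies "the prefix before i is sorted", which is the kind of invariant a learner would be expected to generalize to.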

Experiments

Programs         #Equiv.   #Mem   #States   Time (teacher)   Time (learner)
BUBBLE-SORT            3    447        12             0.19             0.01
QUICK-SORT             1     37         5             0.03             0.00
SELECTION-SORT         3    306        11             0.18             0.01
INSERTION-SORT         3    305        11             0.19             0.00
HEAP-SORT              1     57         6             0.05             0.01
SORTED-FIND            6   1683        15             0.04             0.01
SORTED-INSERT          3   1096        20             0.04             0.01
SORTED-MERGE           1   5775        42            10.50             0.06
SORTED-REVERSE         2    439        18             0.02             0.00
COPY                   2    146        10             1.75             0.00
COMPARE                2    146        10             0.51             0.00
MAX                    7   1608        14             0.08             0.00
INIT                   5    879        10             0.07             0.01
FIND                   2    121         8             0.05             0.00
PARTITION             10  11807        38            11.40             0.11
SPLIT                  2    287        14             0.21             0.00
COREUTILS-SORT        17     37         5             0.03             0.07


Related Work

Daikon [Ernst et al. ICSE-00]
- conjunctive Boolean learning
- learns quantified invariants over arrays, to some extent

Applications of learning in verification
- rely-guarantee contracts [Cobleigh et al. TACAS-03, Alur et al. CAV-05]
- stateful interfaces [Alur et al. POPL-05]
- learning quantified invariants over predicates [Kong et al. APLAS-10]

Machine learning algorithms for invariant synthesis [Sharma et al. CAV-12, SAS-13, ESOP-13]

Conclusion

Learning universally quantified invariants over linear data structures
- Quantified Data Automata (QDA) / elastic QDAs
- Active learning for QDAs
- Unique elastification
- Algorithm for passive learning of QDAs/EQDAs
- Experimental validation

Future Work:
- Extensions to trees to capture universally quantified properties like binary-search-tree, max-heap, …
- Combining automata-based structural learning with machine learning algorithms for learning data formulas

Thank you!
