seoul national university academia sinica 1 2 seoul national...

1
? “bug-finders” “verifiers” Soundness Scalability Precision “optimizers” 대형 프로그램을 안전하고 정확하게 분석하기 오학주, 이원찬, 허기홍, 양홍석, 이광근 (서울대학교, 옥스퍼드 대학교) Programming Research Laboratory 정확도 향상 기술 문제 / 목표 1 3 안전하고(sound), 정확하게(precise), 대형 프로그램을(scalable) 분 석하는 일반적인 방법 대형 프로그램 분석 기술 2 / 25 x = x+1 y = y-1 z = x v = y ret *a+*b x y z v a b x y z v a b x y z v a b x y z v a b x y z v a b x x y y z z v v a b 핵심 아이디어: 스파스 분석 (sparse analysis) Thm. (preservation of soundness and precision) 스파스 분석 프레임워크 (sparse analysis framework) 일반적으로 적용가능: - 프로그래밍 언어 (imperative, functional, oop, etc) - 분석 수준 (flow-sensitivity, context-sensitivity, etc) 선별적 정확도 향상 (selective X-sensitivity) program states error states program states error states vs. k-CFA: +25% / -1300% +25% / -25% 핵심 아이디어: 임팩트 전분석 (Impact pre-analysis) Main Analysis Context-sensitivity Other precision aspects Impact Pre-analysis w q 예: 선별적 문맥 구분 (selective context-sensivitiy) int h(n) {ret n;} void f(a) { x = h(a); assert(x > 1); // Q1 y = h(input()); assert(y > 1); // Q2 } void g() {f(8);} void m() { f(4); g(); g(); } c1: c2: c4: c5: c6: c3: g h h h h h h f f m c4 c6 c5 c3 c3 c1 c2 c1 c2 c1 c2 f g [4,4] [-∞,+∞] [8,8] [8,8] [-∞,+∞] [-∞,+∞] h h m c4 {c5,c6} c3 c1 c1 f g h f c2 c2

Upload: others

Post on 25-Mar-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Seoul National University Academia Sinica 1 2 Seoul National …rosaec.snu.ac.kr/meet/file/20140730x.pdf · 2018. 4. 12. · Seoul National University Academia Sinica 1 1 2 1 1 2?

/ 23

임팩트 전분석

20

g

h

h

h

h

h

h

f

fm

c4

c6

c5 c3

c3

c1

c2

c1

c2

c1

c2

fg

int h(n) {ret n;} !void f(a) { x = h(a); assert(x > 1); // Q1 y = h(input()); assert(y > 1); // Q2 } !void g() {f(8);} !void m() { f(4); g(); g(); }

c1:

c2:

c4:c5:c6:

c3:

/ 45

Common Sense: Infeasible

17

?

“bug-finders”

“verifiers”

Soundness

Scalability Precision

Before !

my work

Catching Software Bugs Early at Build Time

An Overview ofSparrow’s Static Program Analysis Technology

July 2007

Copyright c⃝ 2007 Fasoo.com, Inc. All rights reserved.

“optimizers”

대형 프로그램을 안전하고 정확하게 분석하기 오학주, 이원찬, 허기홍, 양홍석, 이광근 (서울대학교, 옥스퍼드 대학교)

Membership Query

5. Conclusion* Novel approach to invariants generation.* Plentiful of invariants are the reason why this approach is working with random answers.* We are currently working on its extension supporting quantified invariants.

Equivalence Query 1. Check that the guess meets the three conditions to be an invariant. 2. If not, try to find a counter example.

3. Solutions3.1. Predicate AbstractionTo find propositional invariant by using the CDNF algorithm which generates only Boolean formula.

Example:

3.2. Query ResolutionUse under/over approximations of the invariant, derived from precondition , loop guard , and postcondition .

Deriving Invariants in Propositional Logic by Algorithmic Learning, Decision Procedure, and Predicate Abstraction

Yungbum Jung, Soonho Kong, Bow-Yaw Wang, and Kwangkeun Yi

1. Problem : Find an invariant for the Loop

Invariant must satisfy the following conditions:

Example:

Invariant :

{�} while ⇤ do S end {⇥}

✏ = (i = 10 ^ ret)

� = (i = 0)

2. Idea : Using the CDNF AlgorithmExact Learning Algorithm for Boolean formula Asks two types of queries:

1.

2.

Membership Query asksif the truth assignment satisfies . µ

MEM (µ)�

MEM (µ) = Yes if µ |= �

MEM (µ) = No if µ �|= �

Equivalence Query asks if the Boolean formula is equivalent to , If not, the teacher returns a truth assignment as a counterexample.

EQ(�)� �

EQ(�) = YesEQ(�) = No

if � ⌘ �

ProgrammingResearch Laboratory

A.B.C.

� ^ ⇢ ) I

I ^ � ) Pre(I , S)

I ⇥ ¬⇢ � ✏

I

while i < 10 do

r := random();

if r != 0 then i := i + 1

end

S

i < 10 _ (i = 10 ^ ret)

bi<10 _ (bi=10 ^ bret)i < 10 _ (i = 10 ^ ret)

be the resulting parse stacks for the code fragments x and y. Theparse stack p for the concatenation x.y is computed as

p = parse(pinit , x.y)

= parse(parse(pinit , x), y)

= parse(px, y).

However, what we have is py = parse(pinit , y) not parse(px, y).The parse(pinit , y) is the parse stack after parsing y from the initialstack not from the px stack. We cannot directly compute p from px

and py .Abstracting the code c into the parse stack transition function

⇤p.parse(p, c) solves the above concatenation problem elegantly.Let fx and fy be the parse stack transition functions for codefragments x and y respectively. Then we have

fx = ⇤p.parse(p, x)

fy = ⇤p.parse(p, y).

With the two parse stack transition functions fx and fy , we con-struct the parse stack transition function fx.y as follows.

fx.y = ⇤p.parse(p, x.y)

= ⇤p.parse(parse(p, x), y)

= (⇤p.parse(p, y)) ⇥ (⇤p.parse(p, x))

= fy ⇥ fx

3.4 Concrete Parsing SemanticsUsing the abstraction from Code to P ⇧ P , the Galois connection2Code �⇧⌅��

⇥VP = 2P�P is established as follows.

� = ⇤C.{⇤p.parse(p, c) | c ⌃ C}

⇥ = ⇤F.[

f⇥F

{c | �p ⌃ P.parse(p, c) = f(p)}

Concrete parsing semantics is derived from the collecting se-mantics as follows.

P = the set of parse stacks

VP = 2P�P

⌅ ⌃ EnvP = Var ⇧ VP

[[e]]0P ⌃ EnvP ⇧ VP

[[f ]]1P ⌃ EnvP ⇧ VP

[[x]]0P ⌅ = ⌅(x)

[[let x e1 e2]]0P ⌅ = [[e2]]

0P (⌅[x ⌥⇧ [[e1]]

0P ⌅])

[[or e1 e2]]0P ⌅ = [[e1]]

0P ⌅ ⌦ [[e2]]

0P ⌅

[[re x e1 e2 e3]]0P ⌅ = [[e3]]

0P (⌅[x ⌥⇧

fix ⇤k.[[e1]]0P ⌅ ⌦ [[e2]]

0P (⌅[x ⌥⇧ k])])

[[‘f ]]0P ⌅ = [[f ]]1P ⌅

[[t]]1P ⌅ = {⇤p.parse action(p, t)}[[f1.f2]]

1P ⌅ = {p2 ⇥ p1 | p1 ⌃ [[f1]]

1P ⌅ ↵ p2 ⌃ [[f2]]

1P ⌅}

[[, e]]1P ⌅ = [[e]]0P ⌅

4. Abstract Parsing4.1 First Step Abstraction : Value VP̂ = 2P ⇧ 2P

First, we abstract the concrete parsing domain VP to VP̂ = 2P ⇧2P by establishing the Galois connection VP �⇧⌅��

⇥VP̂ where

� =⇤F.⇤P.[

p⇥P

{f(p) | f ⌃ F}

⇥ = ⇤F.[

{S | �(S) ⇤ f}.

Then we derive the abstract semantics for this domain as follows.

⌅ ⌃ Env P̂ = Var ⇧ VP̂

[[e]]0P̂ ⌃ Env P̂ ⇧ VP̂

[[f ]]1P̂ ⌃ Env P̂ ⇧ VP̂

[[x]]0P̂ = ⌅(x)

[[let x e1 e2]]0P̂ ⌅ = [[e2]]

0P̂ (⌅[x ⌥⇧ [[e1]]

0P̂ ⌅])

[[or e1 e2]]0P̂ ⌅ = [[e1]]

0P̂ ⌅ ⌦ [[e2]]

0P̂ ⌅

[[re x e1 e2 e3]]0P̂ ⌅ = [[e3]]

0P̂ (⌅[x ⌥⇧

fix ⇤k.[[e1]]0P̂ ⌅ ⌦ [[e2]]

0P̂ (⌅[x ⌥⇧ k])])

[[‘f ]]0P̂ ⌅ = [[f ]]1P̂ ⌅

[[t]]1P̂ ⌅ = ⇤P.Parse action(P, t)

[[f1.f2]]1P̂ ⌅ = [[f2]]

1P̂ ⌅ ⇥ [[f1]]

1P̂ ⌅

[[, e]]1P̂ ⌅ = [[e]]0P̂ ⌅

where Parse action : 2P ⇧ Token ⇧ 2P is the natural setextension of parse action:

Parse action = ⇤P.⇤t.{parse action(p, t) | p ⌃ P}.

Abstract semantic function [[·]]0P̂

is used to check whether gen-erated codes conform to the grammar. For the given program e, wecompute

S = [[e]]0P̂ ⌅0{pinit} : 2P

where ⌅0 ⌃ Env P̂ is an empty environment. Then we compareS with {pacc}. If they are equal, we conclude that the generatedcodes in the given program conform to the grammar. Otherwise,the analysis concludes that the program may generate syntacticallyincorrect code.

4.2 Parameterized FrameworkWe generalize the analysis by parameterizing abstract domain. In-stead of abstracting the powerset domain of parse stack 2P intoa particular domain, we provide conditions which the abstract do-main should satisfy. Then we define semantic function on the ab-stract parsing domain.

Definition 1 (Abstract Parsing Domain). V ⇤ = D⇤ ⇧ D⇤ isan abstract parsing domain if an abstract domain D⇤ satisfies thefollowing conditions.

1. �D⇤,✏,�, D�� is a CPO.2. Powerset domain of concrete parse stack 2P and its abstract

domain D⇤are Galois connected via �2P�D� and ⇥D��2P .3. Parse action⇤ : D⇤ ⇧ Token ⇧ D⇤ is a sound approxima-

tion of Parse action . That is, we have

�P ⌃ 2P .�t ⌃ Token.

�2P�D�(Parse action(P, t)) ✏ Parse action⇤(�2P�D�(P ), t)

� ⇢ ✏

Over Approximation ⇢ _ ✏

4. Experiment ResultsFor some Linux device drivers and SPEC2000 benchmarks.

Program LOC AP MEM EQ Random Restart Time (sec)

] ide-ide-tape 16 6 18.2 5.2 4.1 1.2 0.055

ide-wait-ireason 9 6 216.1 111.8 47.2 9.9 0.602

parser 37 20 6694.5 819.4 990.3 12.5 32.120

usb-message 18 10 20.1 6.8 1.0 1.0 0.128

vpr 8 7 14.5 8.9 11.8 2.9 0.055

Under Approximation �

{

Restart!Cannot find a counter example.

No, with found a counter example.

No

No

Random answer does not break soundness, because we always check the three conditions when we resolve equivalence query.

Yes / No

No

Yes

Unknown

Random Answer

if � ⇤⇥ ⇥ ⌅ (µ |= � � ⇥)

GuessGuess

Seoul National University Academia Sinica1 21 1

1 2

?

Membership Query

5. Conclusion* Novel approach to invariants generation.* Plentiful of invariants are the reason why this approach is working with random answers.* We are currently working on its extension supporting quantified invariants.

Equivalence Query 1. Check that the guess meets the three conditions to be an invariant. 2. If not, try to find a counter example.

3. Solutions3.1. Predicate AbstractionTo find propositional invariant by using the CDNF algorithm which generates only Boolean formula.

Example:

3.2. Query ResolutionUse under/over approximations of the invariant, derived from precondition , loop guard , and postcondition .

Deriving Invariants in Propositional Logic by Algorithmic Learning, Decision Procedure, and Predicate Abstraction

Yungbum Jung, Soonho Kong, Bow-Yaw Wang, and Kwangkeun Yi

1. Problem : Find an invariant for the Loop

Invariant must satisfy the following conditions:

Example:

Invariant :

{�} while ⇤ do S end {⇥}

✏ = (i = 10 ^ ret)

� = (i = 0)

2. Idea : Using the CDNF AlgorithmExact Learning Algorithm for Boolean formula Asks two types of queries:

1.

2.

Membership Query asksif the truth assignment satisfies . µ

MEM (µ)�

MEM (µ) = Yes if µ |= �

MEM (µ) = No if µ �|= �

Equivalence Query asks if the Boolean formula is equivalent to , If not, the teacher returns a truth assignment as a counterexample.

EQ(�)� �

EQ(�) = YesEQ(�) = No

if � ⌘ �

ProgrammingResearch Laboratory

A.B.C.

� ^ ⇢ ) I

I ^ � ) Pre(I , S)

I ⇥ ¬⇢ � ✏

I

while i < 10 do

r := random();

if r != 0 then i := i + 1

end

S

i < 10 _ (i = 10 ^ ret)

bi<10 _ (bi=10 ^ bret)i < 10 _ (i = 10 ^ ret)

be the resulting parse stacks for the code fragments x and y. Theparse stack p for the concatenation x.y is computed as

p = parse(pinit , x.y)

= parse(parse(pinit , x), y)

= parse(px, y).

However, what we have is py = parse(pinit , y) not parse(px, y).The parse(pinit , y) is the parse stack after parsing y from the initialstack not from the px stack. We cannot directly compute p from px

and py .Abstracting the code c into the parse stack transition function

⇤p.parse(p, c) solves the above concatenation problem elegantly.Let fx and fy be the parse stack transition functions for codefragments x and y respectively. Then we have

fx = ⇤p.parse(p, x)

fy = ⇤p.parse(p, y).

With the two parse stack transition functions fx and fy , we con-struct the parse stack transition function fx.y as follows.

fx.y = ⇤p.parse(p, x.y)

= ⇤p.parse(parse(p, x), y)

= (⇤p.parse(p, y)) ⇥ (⇤p.parse(p, x))

= fy ⇥ fx

3.4 Concrete Parsing SemanticsUsing the abstraction from Code to P ⇧ P , the Galois connection2Code �⇧⌅��

⇥VP = 2P�P is established as follows.

� = ⇤C.{⇤p.parse(p, c) | c ⌃ C}

⇥ = ⇤F.[

f⇥F

{c | �p ⌃ P.parse(p, c) = f(p)}

Concrete parsing semantics is derived from the collecting se-mantics as follows.

P = the set of parse stacks

VP = 2P�P

⌅ ⌃ EnvP = Var ⇧ VP

[[e]]0P ⌃ EnvP ⇧ VP

[[f ]]1P ⌃ EnvP ⇧ VP

[[x]]0P ⌅ = ⌅(x)

[[let x e1 e2]]0P ⌅ = [[e2]]

0P (⌅[x ⌥⇧ [[e1]]

0P ⌅])

[[or e1 e2]]0P ⌅ = [[e1]]

0P ⌅ ⌦ [[e2]]

0P ⌅

[[re x e1 e2 e3]]0P ⌅ = [[e3]]

0P (⌅[x ⌥⇧

fix ⇤k.[[e1]]0P ⌅ ⌦ [[e2]]

0P (⌅[x ⌥⇧ k])])

[[‘f ]]0P ⌅ = [[f ]]1P ⌅

[[t]]1P ⌅ = {⇤p.parse action(p, t)}[[f1.f2]]

1P ⌅ = {p2 ⇥ p1 | p1 ⌃ [[f1]]

1P ⌅ ↵ p2 ⌃ [[f2]]

1P ⌅}

[[, e]]1P ⌅ = [[e]]0P ⌅

4. Abstract Parsing4.1 First Step Abstraction : Value VP̂ = 2P ⇧ 2P

First, we abstract the concrete parsing domain VP to VP̂ = 2P ⇧2P by establishing the Galois connection VP �⇧⌅��

⇥VP̂ where

� =⇤F.⇤P.[

p⇥P

{f(p) | f ⌃ F}

⇥ = ⇤F.[

{S | �(S) ⇤ f}.

Then we derive the abstract semantics for this domain as follows.

⌅ ⌃ Env P̂ = Var ⇧ VP̂

[[e]]0P̂ ⌃ Env P̂ ⇧ VP̂

[[f ]]1P̂ ⌃ Env P̂ ⇧ VP̂

[[x]]0P̂ = ⌅(x)

[[let x e1 e2]]0P̂ ⌅ = [[e2]]

0P̂ (⌅[x ⌥⇧ [[e1]]

0P̂ ⌅])

[[or e1 e2]]0P̂ ⌅ = [[e1]]

0P̂ ⌅ ⌦ [[e2]]

0P̂ ⌅

[[re x e1 e2 e3]]0P̂ ⌅ = [[e3]]

0P̂ (⌅[x ⌥⇧

fix ⇤k.[[e1]]0P̂ ⌅ ⌦ [[e2]]

0P̂ (⌅[x ⌥⇧ k])])

[[‘f ]]0P̂ ⌅ = [[f ]]1P̂ ⌅

[[t]]1P̂ ⌅ = ⇤P.Parse action(P, t)

[[f1.f2]]1P̂ ⌅ = [[f2]]

1P̂ ⌅ ⇥ [[f1]]

1P̂ ⌅

[[, e]]1P̂ ⌅ = [[e]]0P̂ ⌅

where Parse action : 2P ⇧ Token ⇧ 2P is the natural setextension of parse action:

Parse action = ⇤P.⇤t.{parse action(p, t) | p ⌃ P}.

Abstract semantic function [[·]]0P̂

is used to check whether gen-erated codes conform to the grammar. For the given program e, wecompute

S = [[e]]0P̂ ⌅0{pinit} : 2P

where ⌅0 ⌃ Env P̂ is an empty environment. Then we compareS with {pacc}. If they are equal, we conclude that the generatedcodes in the given program conform to the grammar. Otherwise,the analysis concludes that the program may generate syntacticallyincorrect code.

4.2 Parameterized FrameworkWe generalize the analysis by parameterizing abstract domain. In-stead of abstracting the powerset domain of parse stack 2P intoa particular domain, we provide conditions which the abstract do-main should satisfy. Then we define semantic function on the ab-stract parsing domain.

Definition 1 (Abstract Parsing Domain). V ⇤ = D⇤ ⇧ D⇤ isan abstract parsing domain if an abstract domain D⇤ satisfies thefollowing conditions.

1. �D⇤,✏,�, D�� is a CPO.2. Powerset domain of concrete parse stack 2P and its abstract

domain D⇤are Galois connected via �2P�D� and ⇥D��2P .3. Parse action⇤ : D⇤ ⇧ Token ⇧ D⇤ is a sound approxima-

tion of Parse action . That is, we have

�P ⌃ 2P .�t ⌃ Token.

�2P�D�(Parse action(P, t)) ✏ Parse action⇤(�2P�D�(P ), t)

� ⇢ ✏

Over Approximation ⇢ _ ✏

4. Experiment ResultsFor some Linux device drivers and SPEC2000 benchmarks.

Program LOC AP MEM EQ Random Restart Time (sec)

] ide-ide-tape 16 6 18.2 5.2 4.1 1.2 0.055

ide-wait-ireason 9 6 216.1 111.8 47.2 9.9 0.602

parser 37 20 6694.5 819.4 990.3 12.5 32.120

usb-message 18 10 20.1 6.8 1.0 1.0 0.128

vpr 8 7 14.5 8.9 11.8 2.9 0.055

Under Approximation �

{

Restart!Cannot find a counter example.

No, with found a counter example.

No

No

Random answer does not break soundness, because we always check the three conditions when we resolve equivalence query.

Yes / No

No

Yes

Unknown

Random Answer

if � ⇤⇥ ⇥ ⌅ (µ |= � � ⇥)

GuessGuess

Seoul National University Academia Sinica1 21 1

1 2

?

정확도 향상 기술문제 / 목표1 3

안전하고(sound), 정확하게(precise), 대형 프로그램을(scalable) 분석하는 일반적인 방법

대형 프로그램 분석 기술2

/ 4525

x = x+1

y = y-1

z = x

v = y

ret *a+*b

xyzvabxyzvabxyzvabxyzvabxyzvab

x

x

y

y

z

z

v

v

ab

Key: General Sparse Analysis

“Right Part at Right Moment”

PLDI’1

2

핵심 아이디어: 스파스 분석 (sparse analysis)

/ 45

General Sparse Analysis Framework

26

Thm. (preservation of soundness and precision)

“An important strength is that the theoretical result is very general ... The result should be highly influential on future work in sparse analysis.” (from PLDI reviews)

PLDI’1

2

스파스 분석 프레임워크 (sparse analysis framework)

일반적으로����������� ������������������  적용가능:����������� ������������������  

-����������� ������������������  프로그래밍����������� ������������������  언어����������� ������������������  (imperative,����������� ������������������  functional,����������� ������������������  oop,����������� ������������������  etc)����������� ������������������  

-����������� ������������������  분석����������� ������������������  수준����������� ������������������  (flow-sensitivity,����������� ������������������  context-sensitivity,����������� ������������������  etc)

선별적 정확도 향상 (selective X-sensitivity)

/ 23

program states

error states

program states

error states

vs.

Selective X-Sensitivity Framework

12

PLDI’1

4

k-CFA: +25% / -1300%+25% / -25%

Our method Existing methods

핵심 아이디어: 임팩트 전분석 (Impact pre-analysis)

/ 2313

핵심 아이디어: 임팩트 전분석

�Main Analysis

Con

text

-sen

sitiv

ity

Other precision aspects

PLDI’1

4

Impact !Pre-analysis w

q

예: 선별적 문맥 구분 (selective context-sensivitiy)

/ 2315

예: 선별적 문맥 구분int h(n) {ret n;} !void f(a) { x = h(a); assert(x > 1); // Q1 y = h(input()); assert(y > 1); // Q2 } !void g() {f(8);} !void m() { f(4); g(); g(); }

c1:

c2:

c4:c5:c6:

c3:

always holds

does not always hold

/ 2317

g

h

h

h

h

h

h

int h(n) {ret n;} !void f(a) { x = h(a); assert(x > 1); // Q1 y = h(input()); assert(y > 1); // Q2 } !void g() {f(8);} !void m() { f(4); g(); g(); }

c1:

c2:

c4:c5:c6:

c3:

f

fm

c4

c6

c5 c3

c3

c1

c2

c1

c2

c1

c2

fg

[4,4]

[-∞,+∞]

[8,8]

[8,8]

[-∞,+∞]

[-∞,+∞]

문맥 구분 분석 (Context-Sensitivity)

/ 2321

h

h

m

c4

{c5,c6}c3

c1

c1fg

h

f

c2

c2

int h(n) {ret n;} !void f(a) { x = h(a); assert(x > 1); // Q1 y = h(input()); assert(y > 1); // Q2 } !void g() {f(8);} !void m() { f(4); g(); g(); }

c1:

c2:

c4:c5:c6:

c3:

임팩트 전분석