Segmented Symbolic Analysis Wei Le Rochester Institute of Technology

Post on 05-Jan-2016

Page 1: Segmented Symbolic Analysis

Segmented Symbolic Analysis

Wei Le, Rochester Institute of Technology

Page 2: Segmented Symbolic Analysis

Motivation

• Symbolic analysis has many important applications in software tools [Sen, Marinov, Agha ’05] [Godefroid, Klarlund, Sen ’05] [Le, Soffa ’08] [Chipounov, Kuznetsov, Candea ’12]

• Compared to testing with concrete input: better coverage

• Compared to other static techniques: more precise

• Will continue to be a powerful tool due to improved scalability [Chipounov, Kuznetsov, Candea ’12]

Page 3: Segmented Symbolic Analysis

Challenges of Symbolic Analysis

• Loops: can have a statically unknown bound

• Library calls: the source code of a library is typically not available at compile time

Page 4: Segmented Symbolic Analysis

Previous Solutions

• Loops: only a very small part of the state space is covered

– Iterate once [Cadar, Dunbar, Engler ’08] [Chipounov, Kuznetsov, Candea ’12]

– Report unknown [Xie, Chou, Engler ’03]

– Pattern matching [Saxena, Poosankam, McCamant, Song ’00]

• Library calls: imprecise and require manual effort

– A concrete value [Sen, Marinov, Agha ’05] [Godefroid, Klarlund, Sen ’05]

– Manually constructed models (e.g., simplified C implementation) [Bush, Pincus, Sielaff ’00] [Chipounov, Kuznetsov, Candea ’12]

Page 5: Segmented Symbolic Analysis

Segmented Symbolic Analysis - Insights

• Code is not uniformly easy to analyze

• We should leverage the structural and semantic relations between statements to partition a program and apply different analyses accordingly

• The capabilities of static analysis are limited; we should introduce dynamic analysis to supply information that a pure static symbolic analyzer is slow or unable to produce

Page 6: Segmented Symbolic Analysis

Overall Approach

• Perform symbolic analysis

• When an unknown occurs, identify the code segments that cause it

• Construct unit tests and automatically generate inputs

• Run tests, perform dynamic inference to generate symbolic rules and symbolic values (transfer functions)

• Resume symbolic analysis using inferred rules

Page 7: Segmented Symbolic Analysis

Novelty of the Work

• Weave static and dynamic analyses on demand on a concurrent framework

• Dynamic analysis is fully automatic (it runs on code segments, not on the entire program)

• Aggregate information from multiple runs using regression analysis:

1. Programs mostly consist of linear operations [Knuth ’71] [Halbwachs, Proy, Roumanoff ’97]

2. Determining program properties often requires only linear constraints [Halbwachs, Proy, Roumanoff ’97] [Xie, Chou, Engler ’03]

3. We assume that linear relations can characterize the relevant behavior of small code segments

Page 8: Segmented Symbolic Analysis

Overview using an Example

Page 9: Segmented Symbolic Analysis

[Figure: control-flow graph of the example program (nodes 1-10) with a loop region and a library-call region, annotated with the query 32 > Len(filename)+1 as it is propagated under three analyses]

Statements in the graph:

struct stat s

char filename[32]

char* temp = argv[1]

int i = 0

*temp != '\0' (loop condition, yes/no branches)

filename[i] = *temp++

i++

strcat(filename, ", ")

t == 0 (yes/no branches)

t = _stat64i32(filename, &s) (library call)

Query: 32 > Len(filename)+1

• Traditional SA: Library Unknown

• Traditional SA with Library Models: Loop Unknown

• Segmented SA: Len(filename') = Len(filename) across the library call and Len(filename') = Len(temp) across the loop, so the query becomes 32 > Len(temp)+1 and then 32 > Len(argv[1])+1: Buffer Overflow

Page 10: Segmented Symbolic Analysis

Unit Test to Infer the Loop

//initialize with test inputs
char* temp = _GenChars(test_buf);
char* filename = _GenChars(test_buf);

//code segment for the loop
int i = 0;
while (*temp != '\0') {
    filename[i] = *temp++;
    i++;
}

//output Len(filename)
char* _result = _GenChars(g_buf);
int _rint = strlen(filename);
itoa(_rint, _result, 10);
fputs(_result, fp);

// cleanup…

Page 11: Segmented Symbolic Analysis

Reduce to Regression Analysis

        Test Input               Test Input Transformed for RA    Output
Test    temp       filename      Len(temp)    Len(filename)       Len(filename')
1       acde       piidaf        4            6                   4
2       tazipad    qdd           7            3                   7
3       ad         dafdalfll     2            9                   2
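The reduction to regression can be tried by hand on the three test runs in the table. The following is a minimal sketch, not the authors' inference engine: it fits a simple linear model by ordinary least squares against each candidate input in turn. The names `LinearFit`, `fit_simple_linear`, and the three data arrays are hypothetical and introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Fit of y = intercept + slope * x by ordinary least squares. */
typedef struct {
    double intercept;
    double slope;
    double sse; /* sum of squared residuals; near 0 means the rule fits every run */
} LinearFit;

/* Observations copied from the three test runs in the table (hypothetical names). */
const double LEN_TEMP[3]     = {4, 7, 2}; /* Len(temp) before the loop */
const double LEN_FILENAME[3] = {6, 3, 9}; /* Len(filename) before the loop */
const double LEN_RESULT[3]   = {4, 7, 2}; /* Len(filename') after the loop */

LinearFit fit_simple_linear(const double *x, const double *y, size_t n) {
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    double denom;
    LinearFit fit;
    size_t i;

    assert(n >= 2);
    for (i = 0; i < n; i++) {
        sx  += x[i];
        sy  += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }

    /* The x values must not all be equal, otherwise the slope is undefined. */
    denom = (double)n * sxx - sx * sx;
    assert(denom != 0.0);

    fit.slope = ((double)n * sxy - sx * sy) / denom;
    fit.intercept = (sy - fit.slope * sx) / (double)n;

    /* Residuals tell us whether the linear rule explains all observations. */
    fit.sse = 0;
    for (i = 0; i < n; i++) {
        double r = y[i] - (fit.intercept + fit.slope * x[i]);
        fit.sse += r * r;
    }
    return fit;
}
```

Fitting `LEN_RESULT` against `LEN_TEMP` gives slope 1, intercept 0, and a zero residual, which corresponds to the rule Len(filename') = Len(temp). Fitting against `LEN_FILENAME` leaves a nonzero residual, so that candidate would be rejected.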

Page 12: Segmented Symbolic Analysis

Internal Design and Components

Page 13: Segmented Symbolic Analysis

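[Figure: The Helium framework. Symbolic analysis works on a set of queries (q), each in state Solving, Solved, or Unknown. The component "Symbolic Analysis & Partition Program for Unknown" sends a Request to "Dynamic Inference On Demand", which consists of a Test Synthesizer, an Inference Engine, and an Inference Repository (lookup results: Not Found, New Rules), and receives a Respond message in return.]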

Page 14: Segmented Symbolic Analysis

Components on the Helium Framework

• Static component:

- Perform demand-driven, path-sensitive symbolic analysis

- Isolate the code segment that causes the unknown

- Determine the environment for the code segment

Page 15: Segmented Symbolic Analysis

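Interaction Protocol

[Figure: Symbolic Analysis sends a Request to Dynamic Inference carrying C: Code, E: Env, and V: Inquiry. Dynamic Inference turns the code into a unit test, runs it on test input, collects the test output, performs inference, and sends a Respond message carrying the transfer function.]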

Page 16: Segmented Symbolic Analysis

Test Synthesizer

Construct a Unit Test from Program Segment

[Figure: pipeline from code segment to runnable unit test: Select Code Segment, Determine Test Input Variables, Determine Test Output Variables, Construct Runnable Code]

Page 17: Segmented Symbolic Analysis

Inference via Regression

Dynamic Inference as Regression Analysis

[Figure: data for explanatory variables and data for response variables pass through Input Transformation and Model Selection (Simple, Multiple, Polynomial Linear; Piecewise Linear) to produce Linear Symbolic Rules]

Y = X0 + a1 X1 + a2 X2 + … + an Xn

Page 18: Segmented Symbolic Analysis

Explanatory Models for Representing Code Semantics

(Suppose a is the output variable and b, c, d are input variables)

Models               Examples
Constant             a = 0
Simple Linear        a = b
Multiple Linear      a = 2*b + c
Polynomial Linear    a = b^2 + c*d
Piecewise Linear     if b > 0 a = b, else a = 3
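To make the model families listed above concrete, the sketch below writes each example as a C function and checks whether a candidate model reproduces a set of observed input/output samples. This is an illustrative assumption about how a candidate rule could be validated, not the deck's actual model-selection procedure; `Sample`, `Model`, `model_explains`, and the five model functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* One observation from a unit-test run: inputs b, c, d and observed output a. */
typedef struct {
    int b, c, d;
    int a;
} Sample;

/* A candidate model maps the inputs to a predicted output. */
typedef int (*Model)(int b, int c, int d);

/* The example models from the table, one function per family. */
int constant_model(int b, int c, int d)        { (void)b; (void)c; (void)d; return 0; }
int simple_linear_model(int b, int c, int d)   { (void)c; (void)d; return b; }
int multiple_linear_model(int b, int c, int d) { (void)d; return 2 * b + c; }
int polynomial_model(int b, int c, int d)      { return b * b + c * d; }
int piecewise_model(int b, int c, int d)       { (void)c; (void)d; return b > 0 ? b : 3; }

/* Returns 1 if the model predicts the observed output for every sample, else 0. */
int model_explains(Model m, const Sample *samples, size_t n) {
    size_t i;
    assert(m != NULL);
    for (i = 0; i < n; i++) {
        if (m(samples[i].b, samples[i].c, samples[i].d) != samples[i].a) {
            return 0;
        }
    }
    return 1;
}
```

Given samples such as (b = 5, a = 5) and (b = -2, a = 3), `simple_linear_model` fails on the second sample while `piecewise_model` explains both, which is the kind of case where a piecewise model is needed.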

Page 19: Segmented Symbolic Analysis

Experimental Setup

1. Implementation: Phoenix and Disolver, analyzing C/C++/C#

• A traditional symbolic analysis that gives up on loops and library calls

• Segmented symbolic analysis

• Applications of both symbolic analyses to detect infeasible paths and buffer overflows

2. Research Questions:

• Can we find useful symbolic rules and values?

• Are we improving the detection capabilities for infeasible paths and buffer overflows?

• What are the capabilities of segmented symbolic analysis?

• Is the technique still scalable and practical?

Page 20: Segmented Symbolic Analysis

Experimental Results: Comparing the Two Analyses

(SA: traditional symbolic analysis; S-SA: segmented symbolic analysis)

             Overflow      Unknown       Infeasible    Unknown
Program      SA    S-SA    SA    S-SA    SA    S-SA    SA     S-SA
wu-ftpd      0     3       7     5       1     2       4      3
sendmail     0     3       18    16      1     1       6      6
polymorph    2     7       6     2       3     4       5      4
gzip         1     5       25    21      9     11      24     22
grep         1     1       6     6       14    15      19     17
tightvnc     0     0       12    11      5     5       34     32
putty        0     1       60    54      30    31      72     70
snort        0     13      53    45      59    67      147    124

Page 21: Segmented Symbolic Analysis

Dynamic Inference for Buffer Overflow

             Segments      Runnable      Analyzable    Inferred
Program      Loop   Lib    Loop   Lib    Loop   Lib    Rules
wu-ftpd      6      35     0      35     0      33     112
sendmail     7      26     7      25     7      16     79
polymorph    0      19     0      18     0      17     62
gzip         5      63     3      61     3      57     197
grep         2      7      0      5      0      5      11
tightvnc     8      11     0      5      0      2      4
putty        18     42     10     29     10     22     82
snort        37     47     11     40     9      25     148

Page 22: Segmented Symbolic Analysis

Performance

             Size     Symbolic Analysis      Segmented Symbolic Analysis
Program      (kloc)   T-inf      T-buf       T-inf       Thread-inf   T-buf       Thread-buf
wu-ftpd      0.4      0.7 s      1.5 s       2.7 s       6            523.8 s     206
sendmail     0.9      1.0 s      1.8 s       11.2 s      26           228.9 s     166
polymorph    0.9      2.2 s      1.0 s       3.9 s       6            143.3 s     96
gzip         5.1      358.7 s    3.0 s       1679.3 s    271          508.6 s     341
grep         16.9     21.1 s     3.4 s       470.1 s     71           79.9 s      46
tightvnc     45.4     490.5 s    18.4 s      1149.9 s    126          596.6 s     96
putty        60.1     331.4 s    81.4 s      508.4 s     101          1213.6 s    281
snort        98.8     124.6 s    465.6 s     2009.4 s    651          1472.3 s    421

Page 23: Segmented Symbolic Analysis

Experimental Summary

• Improved the detection capabilities: 5 times more buffer overflows

• Inferred 1135 models

• 2/3 of the loops are eligible based on size, 29.3% yield runnable unit tests, and models are inferred for 23.8% of the loops

• Unit tests are runnable for 81.4% of the library calls, and models are inferred for 70.4% of the library calls

• Scalability is still practical

• We can handle loops that traditional symbolic analysis cannot

Page 24: Segmented Symbolic Analysis

Capabilities of Segmented Symbolic Analysis

Lib Yes              Example
String               strcpy, strcat, strlen, strncpy, strdup
File Systems         chdir, getcwd, rename, unlink, stat
I/O                  printf, fgets, fgetc, read
Misc                 perror, utime, inet_addr, atoi

Lib No               Example
String Content       strrchr, getenv
Compiler Unknown     malloc
Network              recv, gethostbyname
Interactive Input    getchar

Loop No              Example
Complex loop         nested loop
Network              recv
Interactive input    getchar
Invalid context      Invalid loop index

Page 25: Segmented Symbolic Analysis

Loops We can Handle

//loop handled by segmented symbolic analysis
for (p = name; *p != '\0'; p++) {
    if (isascii((int)*p) && isupper((int)*p)) {
        *p = tolower(*p);
        tryagain = TRUE;
    }
}
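Following the pattern of the earlier unit test for the filename loop, this loop can be wrapped in a small harness that feeds it a test string and reports the resulting length. The sketch below is an assumption about what such a harness could look like, not the unit test the tool generates. The function name `run_lowercase_segment` and its parameters are hypothetical, and `isascii` is replaced by a portable range check because it is not part of standard C.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Runs the loop segment on a writable copy of `input` stored in `name`
 * (capacity `cap`) and returns Len(name') after the loop.
 * `*tryagain` is set to 1 if any character was converted, else 0.
 */
size_t run_lowercase_segment(const char *input, char *name, size_t cap,
                             int *tryagain) {
    char *p;

    assert(strlen(input) < cap); /* the copy must fit, including the '\0' */
    strcpy(name, input);
    *tryagain = 0;

    /* Code segment under test: lowercase every ASCII uppercase letter. */
    for (p = name; *p != '\0'; p++) {
        unsigned char ch = (unsigned char)*p;
        /* ch < 0x80 is a portable stand-in for isascii() */
        if (ch < 0x80 && isupper(ch)) {
            *p = (char)tolower(ch);
            *tryagain = 1;
        }
    }

    /* Test output: the length of the string after the loop. */
    return strlen(name);
}
```

Because the loop only rewrites characters in place, every run returns the same length it started with, so a regression over such runs would plausibly support a rule of the form Len(name') = Len(name).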

Page 26: Segmented Symbolic Analysis

Loops We cannot Handle Yet

for (n = 7; n >= 8 - pfburh->r.w % 8; n--) {
    rcSource[i++] = rcolors[m_netbuf[y * bytesPerRow + x] >> n & 1];
}

Page 27: Segmented Symbolic Analysis

Related Work

• Various symbolic analyses for bug finding, debugging [Sen, Marinov, Agha ’05] [Godefroid, Klarlund, Sen ’05] [Le, Soffa ’08] [Chipounov, Kuznetsov, Candea ’12]

• Hybrid symbolic analysis [Sen, Marinov, Agha ’05] [Godefroid, Klarlund, Sen ’05] [Chipounov, Kuznetsov, Candea ’12]

• Dynamic invariants discovery [Ernst, Czeisler, Griswold, Notkin ’00]

Page 28: Segmented Symbolic Analysis

Conclusions

A novel hybrid technique that flexibly weaves static and dynamic analyses on demand to maximize their ability to discover program semantic information.

Addressed the two key challenges: 1) partitioning a program to construct valid unit tests, and 2) mapping the problem of discovering symbolic relations between program variables to regression analysis.

Fully automatic, and generally applicable to determining different program properties and to different programs.

Page 29: Segmented Symbolic Analysis

Thank you and Questions?