Deriving Input Syntactic Structure From Execution

Zhiqiang Lin, Xiangyu Zhang

Purdue University

November 11th, 2008

The 16th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE’08)

Motivation -- Most software takes structural input

Applications -- Software Testing/Debugging

- Using input grammar to generate test cases
  - K. Hanford. Automatic Generation of Test Cases. In IBM Systems Journal, 9(4), 1970.
  - P. Purdom. A sentence generator for testing parsers. In BIT Numerical Mathematics, 12(3), 1972.
  - Grammar-based whitebox fuzzing [PLDI’08]
- Delta debugging
  - Reducing large failing inputs [TSE’02]
  - Hierarchical Delta Debugging (HDD) [ICSE’06]
- Execution fast forwarding
  - Reducing the event log for failure replay [FSE’06]

Applications -- Computer Security

- Signature generation
  - From a malware or attack instance to an exploit (input) signature
  - Payload length, keywords, field structure…
- Penetration testing
  - Software vulnerability: play with the input (fuzz)
  - Packet Vaccine [CCS’06], ShieldGen [IEEE S&P’07]
- Malware protocol replayer
  - Malware feature: replay the protocol
  - Requires the input format

Challenges

- Input structure exists in a machine-unfriendly form
  - Plain text (ASCII stream, e.g., a C file)
  - Binary code (protocol message stream)
- Known specification (RFC)
  - Implementations deviate from it
- Unknown specification
  - Malware: bots, botnet protocols
  - Legal software: SAMBA protocol (12 years for the open source community)
- Source code may not be accessible
  - Penetration testing
  - Malware analysis
  - Legal software
- Hence: working on binaries

Our Contributions

- Two different approaches to handle two types of parsers
  - Dynamic control dependency to handle top-down parsers
  - A new dynamic analysis to handle bottom-up parsers by identifying and analyzing the parsing stack
- Experimental results show that the proposed analyses are highly effective in producing very precise input syntax trees

Outline

- Motivation
- Technical Description
  - Handling Inputs with a Top-down Parser
  - Handling Inputs with a Bottom-up Parser
- Evaluation
- Discussion
- Related Work
- Conclusion

I. Top down Parser

Parse input in a top-down manner.

Grammar:

    S → H B
    H → h N
    N → 1 | 2
    B → b B | ε

Input: h1bbε

Parse tree:

    S
      H
        h
        N
          1
      B
        b
        B
          b
          B
            ε

Implementation

        void Parser() {
     1    char c = getchar();
     2    if (c == 'h') {
     3      c = getchar();
     4      if (c == '1' || c == '2') {
     5        c = getchar();
     6      } else
     7        error();
     8    } else error();
     9    while (c == 'b') {
    10      c = getchar();
    11      if (c == 'ε') {
    12        break;
    13      }
    14    }
    15    error();
    16  }

Here 'ε' stands for the marker that ends the input. The nested if statements implement H; the while loop implements B.
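The slide's routine reads from standard input and calls error(). A minimal sketch of the same recognizer that can be called on a string might look as follows. The name `parse_hb`, the string-based input, the 1/0 return value in place of error(), and the use of the terminating NUL in place of ε are all assumptions made for illustration; none of them come from the slides.

```c
#include <stddef.h>

/* Recognizer for the grammar
 *   S -> H B,  H -> h N,  N -> 1 | 2,  B -> b B | epsilon
 * Assumption: the input is a NUL-terminated string and the terminating
 * '\0' stands in for the slide's epsilon marker.
 * Returns 1 if the input is accepted, 0 otherwise. */
int parse_hb(const char *input)
{
    size_t pos = 0;
    char c = input[pos++];          /* plays the role of getchar() */

    /* H -> h N */
    if (c != 'h')
        return 0;
    c = input[pos++];
    if (c != '1' && c != '2')
        return 0;
    c = input[pos++];

    /* B -> b B | epsilon */
    while (c == 'b')
        c = input[pos++];

    return c == '\0';               /* accept only at end of input */
}
```

For example, `parse_hb("h1bb")` accepts, while `parse_hb("h3b")` and `parse_hb("h1bx")` reject.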

Execution Trace

Input: h1bbε (b1 and b2 denote the first and second b).

    Executed statement            Input character
    c = getchar()                 reads h
    if (c == 'h')                 tests h
    c = getchar()                 reads 1
    if (c == '1' || c == '2')     tests 1
    c = getchar()                 reads b1
    while (c == 'b')              tests b1
    c = getchar()                 reads b2
    if (c == 'ε')                 tests b2
    while (c == 'b')              tests b2
    c = getchar()                 reads ε
    if (c == 'ε')                 tests ε
    break

Control dependency: a statement Y is control-dependent on X iff X directly determines whether Y executes.
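The trace above pairs each executed predicate with the input character it tests. As a rough sketch of how such a record could be collected, the recognizer below is instrumented by hand. The work itself targets binaries, so this source-level instrumentation is only a stand-in, and `struct event`, `record`, and `traced_parse` are hypothetical names. The line numbers 2, 4, 9 and 11 are the node labels used on the slides, and the terminating NUL again plays the role of ε.

```c
#include <stddef.h>

#define MAX_EVENTS 64

/* One executed predicate: its source line, which execution of that line
 * this is (1 = first), and the offset of the input byte it tested. */
struct event { int line; int instance; size_t offset; };

struct trace { struct event ev[MAX_EVENTS]; size_t len; };

/* Append an event; events beyond MAX_EVENTS are silently dropped. */
static void record(struct trace *t, int line, size_t offset)
{
    int instance = 1;
    for (size_t i = 0; i < t->len; i++)
        if (t->ev[i].line == line)
            instance++;
    if (t->len < MAX_EVENTS)
        t->ev[t->len++] = (struct event){ line, instance, offset };
}

/* Hand-instrumented recognizer for S -> H B, H -> h N, N -> 1|2,
 * B -> b B | epsilon. Returns 1 on accept, 0 on reject. */
int traced_parse(const char *input, struct trace *t)
{
    size_t pos = 0;
    t->len = 0;

    char c = input[pos++];
    record(t, 2, pos - 1);              /* if (c == 'h') */
    if (c != 'h')
        return 0;

    c = input[pos++];
    record(t, 4, pos - 1);              /* if (c == '1' || c == '2') */
    if (c != '1' && c != '2')
        return 0;

    c = input[pos++];
    for (;;) {
        record(t, 9, pos - 1);          /* while (c == 'b') */
        if (c != 'b')
            break;
        c = input[pos++];
        record(t, 11, pos - 1);         /* if (c == end marker) */
        if (c == '\0')
            return 1;
    }
    return c == '\0';
}
```

On the input "h1bb" this records six predicate events for lines 2, 4, 9, 11, 9, 11 at offsets 0, 1, 2, 3, 3, 4, which corresponds to h, 1, b1, b2, b2, ε on the slide.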

Control dependency graph for the execution trace

[Figure: graph rooted at START whose nodes are the executed statement instances of the trace (c = getchar(), if, while, break); the predicates are annotated with the input characters they test: h, 1, b1, b2, b2, ε.]

A control dependency graph is a graph in which each node directly controls the execution of its child nodes.

Eliminate Non-Data-Use Nodes

[Figure: the same graph rooted at START; the c = getchar() and break nodes are removed, leaving the if and while predicates with their input characters h, 1, b1, b2, b2, ε.]

Add Data Use Leaf Nodes

[Figure: under START, each remaining predicate node gets the input character it uses as a leaf.]

    Predicate                     Leaf
    if (c == 'h')                 h
    if (c == '1' || c == '2')     1
    while (c == 'b')              b1
    if (c == 'ε')                 b2
    while (c == 'b')              b2
    if (c == 'ε')                 ε

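Eliminate Redundant Nodes

Nodes are labeled line^instance; for example, 9^2 is the second execution of line 9.

    Node    Predicate                     Leaf
    2       if (c == 'h')                 h
    4       if (c == '1' || c == '2')     1
    9^1     while (c == 'b')              b1
    11^1    if (c == 'ε')                 b2
    9^2     while (c == 'b')              b2
    11^2    if (c == 'ε')                 ε

Slide callout: Identical Node. Nodes 11^1 and 9^2 use the same input character, b2.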
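One possible reading of the "Identical Node" callout is that predicate instances using exactly the same input are redundant and can be collapsed into one. The sketch below illustrates only that reading, on a flattened sequence of input offsets rather than on the actual graph. The function name `merge_identical` and the flattening are assumptions, and the slides do not spell out the precise elimination rule.

```c
#include <stddef.h>

/* Collapse consecutive entries that refer to the same input offset.
 * offsets[i] is the input offset used by the i-th predicate instance,
 * in execution order. The array is compacted in place and the new
 * length is returned. */
size_t merge_identical(size_t *offsets, size_t n)
{
    if (n == 0)
        return 0;

    size_t out = 1;                       /* offsets[0] is always kept */
    for (size_t i = 1; i < n; i++)
        if (offsets[i] != offsets[out - 1])
            offsets[out++] = offsets[i];
    return out;
}
```

For the trace of h1bbε the predicate offsets are 0, 1, 2, 3, 3, 4; after merging, five entries remain, one per input character.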

II. Bottom-up Parser

- Parse input in a bottom-up manner
- Typical of programming languages (lex/yacc)

Grammar:

    S → A B
    A → a a
    B → b

Input: aab

Parse tree:

    S
      A
        a
        a
      B
        b

A General Bottom-up Parsing Algorithm

    while (…) {
        if (stack should not be reduced) {
            stack.push(c);
            …
        } else {                /* A → β */
            stack.pop(|β|);
            stack.push(A);
        }
    }

Trace for input aab with the grammar S → A B, A → a a, B → b:

    while (…); if (stack should not be reduced); stack.push(a)
    while (…); if (stack should not be reduced); stack.push(a)
    while (…); if (stack should not be reduced); stack.pop(aa); stack.push(A)
    …
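The loop above is schematic. A small runnable sketch for the example grammar S → A B, A → a a, B → b might look like this. It shifts one character and then reduces greedily while a rule's right-hand side matches the top of the stack, which is enough for this grammar but is a simplifying assumption: generated parsers such as those from yacc decide between shift and reduce with parse tables. The names `struct parser`, `log_op`, and `shift_reduce` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define STACK_MAX 64
#define LOG_MAX 256

struct parser {
    char stack[STACK_MAX];  /* parsing stack of grammar symbols */
    size_t depth;
    char log[LOG_MAX];      /* textual stack-operation trace (truncated if full) */
};

/* Append "Op(symbols)" to the trace, comma-separated. */
static void log_op(struct parser *p, const char *op, const char *syms, size_t n)
{
    size_t used = strlen(p->log);
    snprintf(p->log + used, LOG_MAX - used, "%s%s(%.*s)",
             used ? "," : "", op, (int)n, syms);
}

static void push(struct parser *p, char sym)
{
    p->stack[p->depth++] = sym;
    log_op(p, "Push", &sym, 1);
}

/* Try one reduction A -> beta at the top of the stack; 1 if reduced. */
static int reduce(struct parser *p)
{
    static const struct { const char *rhs; char lhs; } rules[] = {
        { "aa", 'A' }, { "b", 'B' }, { "AB", 'S' },
    };
    for (size_t r = 0; r < sizeof rules / sizeof rules[0]; r++) {
        size_t n = strlen(rules[r].rhs);
        if (p->depth >= n &&
            memcmp(p->stack + p->depth - n, rules[r].rhs, n) == 0) {
            log_op(p, "Pop", rules[r].rhs, n);   /* stack.pop(|beta|) */
            p->depth -= n;
            push(p, rules[r].lhs);               /* stack.push(A) */
            return 1;
        }
    }
    return 0;
}

/* Returns 1 if the input reduces to the start symbol S. */
int shift_reduce(const char *input, struct parser *p)
{
    p->depth = 0;
    p->log[0] = '\0';
    for (size_t i = 0; input[i] != '\0'; i++) {
        if (p->depth + 1 >= STACK_MAX)
            return 0;                            /* stack would overflow */
        push(p, input[i]);                       /* shift */
        while (reduce(p))
            ;                                    /* reduce while possible */
    }
    return p->depth == 1 && p->stack[0] == 'S';
}
```

Running `shift_reduce("aab", &p)` accepts and leaves `Push(a),Push(a),Pop(aa),Push(A),Push(b),Pop(b),Push(B),Pop(AB),Push(S)` in `p.log`.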

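Tree Construction

Input: aab, with the grammar S → A B, A → a a, B → b.

Stack operation trace:

    Push(a), Push(a), Pop(aa), Push(A), Push(b), Pop(b), Push(B), Pop(AB), Push(S)

A symbol pushed right after a pop becomes the parent of the popped symbols, so the trace yields S with children A (over a, a) and B (over b).

Prerequisite: identify the parsing stack.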
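To make the tree construction concrete, the sketch below replays a stack-operation trace and builds the tree, under the assumption that a push immediately following a pop adopts the popped entries as its children. The types `struct op`, `struct node`, `struct forest` and the functions `build_tree` and `render` are hypothetical names for illustration, and the trace is supplied by hand rather than recovered from a binary.

```c
#include <stdio.h>
#include <string.h>

#define MAX_NODES 64
#define MAX_KIDS 8

enum op_kind { OP_PUSH, OP_POP };
struct op { enum op_kind kind; char sym; int count; }; /* count: OP_POP only */

struct node { char sym; int kids[MAX_KIDS]; int nkids; };

struct forest {
    struct node nodes[MAX_NODES];
    int nnodes;
    int stack[MAX_NODES];   /* node indices mirroring the parsing stack */
    int depth;
};

/* Replay the trace; returns the root node index, or -1 on a malformed trace. */
int build_tree(const struct op *ops, int nops, struct forest *f)
{
    int pending[MAX_KIDS], npending = 0;  /* entries removed by the last pop */
    f->nnodes = 0;
    f->depth = 0;
    for (int i = 0; i < nops; i++) {
        if (ops[i].kind == OP_POP) {
            int n = ops[i].count;
            if (n < 0 || n > f->depth || n > MAX_KIDS)
                return -1;
            f->depth -= n;
            memcpy(pending, f->stack + f->depth, (size_t)n * sizeof(int));
            npending = n;
        } else {
            if (f->nnodes >= MAX_NODES)
                return -1;
            struct node *nd = &f->nodes[f->nnodes];
            nd->sym = ops[i].sym;
            nd->nkids = npending;         /* adopt the popped entries, if any */
            memcpy(nd->kids, pending, (size_t)npending * sizeof(int));
            npending = 0;
            f->stack[f->depth++] = f->nnodes++;
        }
    }
    return f->depth == 1 ? f->stack[0] : -1;
}

/* Append the subtree at idx to out in bracketed form, e.g. A(a,a). */
void render(const struct forest *f, int idx, char *out, size_t cap)
{
    const struct node *nd = &f->nodes[idx];
    size_t used = strlen(out);
    snprintf(out + used, cap - used, "%c", nd->sym);
    if (nd->nkids == 0)
        return;
    strncat(out, "(", cap - strlen(out) - 1);
    for (int k = 0; k < nd->nkids; k++) {
        if (k)
            strncat(out, ",", cap - strlen(out) - 1);
        render(f, nd->kids[k], out, cap);
    }
    strncat(out, ")", cap - strlen(out) - 1);
}
```

Replaying Push(a), Push(a), Pop(aa), Push(A), Push(b), Pop(b), Push(B), Pop(AB), Push(S) and rendering the root into an empty buffer gives `S(A(a,a),B(b))`.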

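Evaluation – Top down grammar

[Figure: evaluation results for top-down grammars; one result carries the slide callout "Bad?".]

Evaluation – Bottom up grammar

[Figure: evaluation results for bottom-up grammars.]

Performance Overhead

[Figure: performance overhead; slowdown ranges shown on the slide: 5X-45X and 6X-8X.]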

Discussion

- Grammar categories: top-down, bottom-up, any others?
- It is possible to evade the control dependency structure in a top-down parser implementation.
- From an individual input to multiple inputs to the final grammar
- From syntactic structure to semantics

Related Work

- Network protocol format reverse engineering
  - Instruction semantics (comparison, loop keyword, delimiter): Polyglot [CCS’07], Automatic Network Protocol Analysis [NDSS’08], Tupni [CCS’08]
  - Execution context (call stack, PC): AutoFormat [NDSS’08]
- Limitations
  - They cover part of the problem space: only top-down parsers.
  - They capture part of the problem’s essence: comparison (predicate), call stack → control dependency

Conclusion

- Two dynamic analyses that construct input structure from program execution.
- No source code access or symbolic information is required.
- The analyses are highly effective and produce high-quality input syntax trees.

Thank you

Contact:

{zlin,xyzhang}@cs.purdue.edu

Q & A
