Deriving Input Syntactic Structure From Execution. Zhiqiang Lin, Xiangyu Zhang. Purdue University. November 11th, 2008. The 16th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE’08)


Page 1: Deriving Input Syntactic Structure From Execution Zhiqiang Lin        Xiangyu Zhang

Deriving Input Syntactic Structure From Execution

Zhiqiang Lin Xiangyu Zhang

Purdue University

November 11th, 2008

The 16th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE’08)

Page 2: Deriving Input Syntactic Structure From Execution Zhiqiang Lin        Xiangyu Zhang

Motivation -- Most software takes structural input

Page 3: Deriving Input Syntactic Structure From Execution Zhiqiang Lin        Xiangyu Zhang

Applications -- Software Testing/Debugging

Using Input Grammar to Generate Test Cases
    K. Hanford. Automatic Generation of Test Cases. In IBM Systems Journal, 9(4), 1970.
    P. Purdom. A sentence generator for testing parsers. In BIT Numerical Mathematics, 12(3), 1972.
    Grammar-based whitebox fuzzing [PLDI’08]

Delta Debugging
    Reducing large failure inputs [TSE’02]
    Hierarchical Delta Debugging (HDD) [ICSE’06]

Execution Fast Forwarding
    Reducing event log for failure replay [FSE’06]

Page 4: Deriving Input Syntactic Structure From Execution Zhiqiang Lin        Xiangyu Zhang

Applications -- Computer Security

Malware, attack instance: signature generation
    Exploit (input) → Signature
    Payload length, keywords, field structure…

Penetration testing
    Software vulnerability
    Play with input (fuzz)
    Packet Vaccine [CCS’06], ShieldGen [IEEE S&P’07]

Malware protocol replayer
    Malware feature
    Replay the protocol
    Input format

Page 5: Deriving Input Syntactic Structure From Execution Zhiqiang Lin        Xiangyu Zhang

Challenges

Input structure exists in a machine-unfriendly way
    Plain text (ASCII stream, e.g., C file)
    Binary code (protocol message stream)

Known specification (RFC)
    Implementation deviation

Unknown specification
    Malware: bot, botnet protocol
    Legal software: SAMBA protocol (12 years for open source community)

Page 6: Deriving Input Syntactic Structure From Execution Zhiqiang Lin        Xiangyu Zhang

Challenges

May not have source code access
    Penetration testing
    Malware analysis
    Legal software

Working on binary

Page 7: Deriving Input Syntactic Structure From Execution Zhiqiang Lin        Xiangyu Zhang

Our Contributions

Two different approaches to handle two types of parsers
    Using dynamic control dependency to handle top-down parsers
    A new dynamic analysis to handle bottom-up parsers by identifying and analyzing the parsing stack

Experimental results show that the proposed analyses are highly effective in producing very precise input syntax trees

Page 8: Deriving Input Syntactic Structure From Execution Zhiqiang Lin        Xiangyu Zhang

Outline

Motivation

Technical Description
    Handling Inputs with a Top-down Parser
    Handling Inputs with a Bottom-up Parser

Evaluation

Discussion

Related Work

Conclusion

Page 9: Deriving Input Syntactic Structure From Execution Zhiqiang Lin        Xiangyu Zhang

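I. Top down Parser

Parse input in a top-down manner.

Grammar:
    S → H B
    H → h N
    N → 1 | 2
    B → b B | ε

Input: h1bbε

[Figure: parse tree for the input. S has children H and B; H derives h and N, with N deriving 1; B derives b B, the inner B derives b B, and the innermost B derives ε.]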

Page 10: Deriving Input Syntactic Structure From Execution Zhiqiang Lin        Xiangyu Zhang

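Implementation

    void Parser() {
        char c = getchar();
        if (c == 'h') {                    /* line 2 */
            c = getchar();
            if (c == '1' || c == '2') {    /* line 4 */
                c = getchar();
            } else error();
        } else error();
        while (c == 'b') {                 /* line 9 */
            c = getchar();
            if (c == 'ε') {                /* line 11 */
                break;
            }
        }
        error();
    }

The slide numbers the code lines 1 to 16; the predicates referenced on later slides are lines 2, 4, 9, and 11.

Grammar: S → H B, H → h N, N → 1 | 2, B → b B | ε

The slide marks the code regions that correspond to H and B.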
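The parser on this slide is close to compilable C. The sketch below is one way to make it runnable, under assumptions that are not on the slides: the input arrives as a NUL-terminated string instead of through getchar(), the string terminator stands in for ε, and the function name parse_s and its boolean return value are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Recognizer for the slide grammar
 *   S -> H B,  H -> h N,  N -> 1 | 2,  B -> b B | epsilon
 * Assumption: the input is a NUL-terminated string and the terminator
 * plays the role of epsilon (end of input). */
bool parse_s(const char *in)
{
    size_t i = 0;

    /* H -> h N */
    if (in[i] != 'h')
        return false;
    i++;

    /* N -> 1 | 2 */
    if (in[i] != '1' && in[i] != '2')
        return false;
    i++;

    /* B -> b B | epsilon, written as a loop like the slide's while */
    while (in[i] == 'b')
        i++;

    /* Accept only if the whole input was consumed. */
    return in[i] == '\0';
}
```

Calling parse_s("h1bb") corresponds to the slide input h1bbε. Returning a value instead of calling error() is a design choice made here so the function can be checked from a caller.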

Page 11: Deriving Input Syntactic Structure From Execution Zhiqiang Lin        Xiangyu Zhang

Execution Trace

Input: h1bbε

Executed statements, in order, with the input character each predicate tests (b1 and b2 are the first and second b):

    c = getchar()
    if (c == 'h')                 h
    c = getchar()
    if (c == '1' || c == '2')     1
    c = getchar()
    while (c == 'b')              b1
    c = getchar()
    if (c == 'ε')                 b2
    while (c == 'b')              b2
    c = getchar()
    if (c == 'ε')                 ε
    break

Control Dependency: a statement Y is control-dependent on X iff X directly determines whether Y executes.

Page 12: Deriving Input Syntactic Structure From Execution Zhiqiang Lin        Xiangyu Zhang

Execution Trace

[Figure: the same trace for input h1bbε, with the statements rearranged to show control dependence.]

Control Dependency: a statement Y is control-dependent on X iff X directly determines whether Y executes.

Page 13: Deriving Input Syntactic Structure From Execution Zhiqiang Lin        Xiangyu Zhang

Control dependency graph for the execution trace

[Figure: control dependency graph rooted at START. Its nodes are the executed statements: c = getchar(), if (c == 'h'), c = getchar(), if (c == '1' || c == '2'), c = getchar(), while (c == 'b'), c = getchar(), if (c == 'ε'), break, while (c == 'b'), c = getchar(), if (c == 'ε'). The input characters h, 1, b1, b2, b2, ε are attached to the graph.]

Control Dependency Graph: a graph in which each node directly controls the execution of its child nodes.
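As a rough illustration of how such a graph could be recorded at run time, the sketch below hand-instruments the slide parser. It is not the authors' binary-level analysis: the names struct node, struct trace, enter, leave, and traced_parse are hypothetical, the input is a string whose terminator stands in for ε, and the placement of the enter/leave calls encodes an assumed reading of control dependence, in which each loop iteration nests under the test that allowed it to run. Only predicates that test input are recorded, with the source line numbers used on the slides (2, 4, 9, 11).

```c
#include <stddef.h>
#include <string.h>

#define MAX_NODES 64
#define MAX_INPUT 30   /* keeps the node count below MAX_NODES */

/* One executed predicate instance that tested an input character. */
struct node {
    int line;       /* source line of the predicate (2, 4, 9, or 11) */
    int parent;     /* index of the controlling instance, -1 = START */
    size_t offset;  /* input offset of the character it tested */
};

struct trace {
    struct node nodes[MAX_NODES];
    int count;
    int stack[MAX_NODES];  /* predicate instances whose region is open */
    int depth;
};

/* Record a predicate instance under the innermost open region. */
static void enter(struct trace *t, int line, size_t offset)
{
    int id = t->count++;
    t->nodes[id].line = line;
    t->nodes[id].parent = t->depth ? t->stack[t->depth - 1] : -1;
    t->nodes[id].offset = offset;
    t->stack[t->depth++] = id;
}

/* Close the innermost open region. */
static void leave(struct trace *t)
{
    t->depth--;
}

/* Hand-instrumented version of the slide parser. Returns 0 on accept. */
int traced_parse(const char *in, struct trace *t)
{
    size_t i = 0;

    t->count = 0;
    t->depth = 0;
    if (strlen(in) > MAX_INPUT)
        return -1;

    enter(t, 2, i);                        /* if (c == 'h') */
    if (in[i] != 'h')
        return -1;
    i++;
    enter(t, 4, i);                        /* if (c == '1' || c == '2') */
    if (in[i] != '1' && in[i] != '2')
        return -1;
    i++;
    leave(t);                              /* region of line 4 ends */
    leave(t);                              /* region of line 2 ends */

    for (;;) {
        enter(t, 9, i);                    /* while (c == 'b') */
        if (in[i] != 'b')
            break;
        i++;
        enter(t, 11, i);                   /* if (c == epsilon) */
        if (in[i] == '\0')
            break;
        /* No leave(): the next iteration runs only because this
         * test did not break, so it nests under it (assumption). */
    }
    return in[i] == '\0' ? 0 : -1;
}
```

For the input "h1bb" this records six nodes: line 2 testing h and line 4 testing 1 (nested under line 2), then a chain of line 9, line 11, line 9, line 11 testing b1, b2, b2, and the end of input.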

Page 14: Deriving Input Syntactic Structure From Execution Zhiqiang Lin        Xiangyu Zhang

Eliminate non data use node

[Figure: the same graph rooted at START. The nodes that do not use input data, c = getchar() and break, are eliminated; the predicates if (c == 'h'), if (c == '1' || c == '2'), while (c == 'b'), and if (c == 'ε') remain, with input characters h, 1, b1, b2, b2, ε.]

Page 15: Deriving Input Syntactic Structure From Execution Zhiqiang Lin        Xiangyu Zhang

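Add Data Use Leaf Node

[Figure: tree rooted at START whose internal nodes are the remaining predicates; each input character becomes a leaf under the predicate that uses it.]

    if (c == 'h')                 h
    if (c == '1' || c == '2')     1
    while (c == 'b')              b1
    if (c == 'ε')                 b2
    while (c == 'b')              b2
    if (c == 'ε')                 ε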

Page 16: Deriving Input Syntactic Structure From Execution Zhiqiang Lin        Xiangyu Zhang

Add Data Use Leaf Node

[Figure: the same tree rooted at START with all leaves attached: h, 1, b1, b2, b2, ε.]

Page 17: Deriving Input Syntactic Structure From Execution Zhiqiang Lin        Xiangyu Zhang

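Eliminate Redundant Node

START

    2    if (c == 'h')                 h
    4    if (c == '1' || c == '2')     1
    9₁   while (c == 'b')              b1
    11₁  if (c == 'ε')                 b2
    9₂   while (c == 'b')              b2
    11₂  if (c == 'ε')                 ε

Each predicate is labeled with its source line number; the subscript is the execution instance.

Identical Node: 11₁ and 9₂ both use b2.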
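One plausible reading of this step is that a node is redundant when it uses exactly the same input character as its parent, as 9₂ does with 11₁. The sketch below implements that reading only; the slides do not spell out the criterion, and the names struct node and eliminate_redundant are hypothetical. Nodes are assumed to be stored in execution order, so every parent precedes its children.

```c
#include <stddef.h>

struct node {
    int parent;     /* index of the parent node, -1 for a child of START */
    size_t offset;  /* input offset of the character the node uses */
    int removed;    /* set to 1 once the node is merged into its parent */
};

/* Merge every node whose (surviving) parent uses the same input offset
 * into that parent, and reattach its children to the parent.
 * Returns the number of nodes removed.
 * Assumption: parents precede children in the array. */
int eliminate_redundant(struct node *n, int count)
{
    int removed = 0;

    for (int i = 0; i < count; i++) {
        int p = n[i].parent;

        /* Skip over parents that were themselves merged away. */
        while (p >= 0 && n[p].removed)
            p = n[p].parent;
        n[i].parent = p;

        if (p >= 0 && n[p].offset == n[i].offset) {
            n[i].removed = 1;
            removed++;
        }
    }
    return removed;
}
```

Applied to a chain in which 9₁ uses offset 2, 11₁ and 9₂ both use offset 3, and 11₂ uses offset 4, the second b2 node is dropped and 11₂ is reattached to the surviving b2 node, which gives one node per input character. Keeping the parent rather than the child is an arbitrary choice in this sketch.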

Page 18: Deriving Input Syntactic Structure From Execution Zhiqiang Lin        Xiangyu Zhang

II. Bottom up parser

Parse input in a bottom-up manner
    Programming languages
    lex/yacc

Grammar:
    S → A B
    A → a a
    B → b

Input: aab

[Figure: parse tree for aab. S has children A and B; A derives a a; B derives b.]

Page 19: Deriving Input Syntactic Structure From Execution Zhiqiang Lin        Xiangyu Zhang

A General Bottom Up Parsing Algorithm

    while (…) {
        if (stack should not be reduced) {
            stack.push(c);
            …
        } else {    // A → β
            stack.pop(|β|);
            stack.push(A);
        }
    }

Input: aab

Grammar: S → A B, A → a a, B → b

Trace:
    while (…); if (stack should not be reduced); stack.push(a)
    while (…); if (stack should not be reduced); stack.push(a)
    while (…); if (stack should not be reduced); stack.pop(aa); stack.push(A)
    …
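To make the shift/reduce loop concrete, here is a small sketch for the slide grammar S → A B, A → a a, B → b. It is a toy: it reduces greedily whenever a rule's right-hand side sits on top of the stack, which works for this grammar but is not how a table-driven LR parser decides. The names struct rule, record, and shift_reduce are hypothetical, as is the text format of the recorded operations.

```c
#include <stdio.h>
#include <string.h>

#define MAX_STACK 64

struct rule {
    char lhs;
    const char *rhs;
};

/* Grammar from the slide: S -> A B, A -> a a, B -> b. */
static const struct rule rules[] = {
    { 'A', "aa" },
    { 'B', "b"  },
    { 'S', "AB" },
};

/* Append "Op(symbols)" to the comma-separated record in out. */
static void record(char *out, size_t cap, const char *op,
                   const char *sym, size_t n)
{
    size_t used = strlen(out);
    snprintf(out + used, cap - used, "%s%s(%.*s)",
             used ? ", " : "", op, (int)n, sym);
}

/* Returns 1 if in is accepted; out receives the stack operations. */
int shift_reduce(const char *in, char *out, size_t cap)
{
    char stack[MAX_STACK];
    size_t top = 0;
    int reduced;

    if (cap == 0 || strlen(in) >= MAX_STACK)
        return 0;
    out[0] = '\0';

    for (const char *p = in; *p; p++) {
        stack[top++] = *p;                        /* shift: stack.push(c) */
        record(out, cap, "Push", p, 1);
        do {              /* reduce while a right-hand side is on top */
            reduced = 0;
            for (size_t r = 0; r < sizeof rules / sizeof rules[0]; r++) {
                size_t len = strlen(rules[r].rhs);
                if (top >= len &&
                    memcmp(stack + top - len, rules[r].rhs, len) == 0) {
                    record(out, cap, "Pop", rules[r].rhs, len);
                    top -= len;                   /* stack.pop(|beta|) */
                    stack[top++] = rules[r].lhs;  /* stack.push(A) */
                    record(out, cap, "Push", &rules[r].lhs, 1);
                    reduced = 1;
                    break;
                }
            }
        } while (reduced);
    }
    return top == 1 && stack[0] == 'S';
}
```

For the input "aab" the recorded operations are Push(a), Push(a), Pop(aa), Push(A), Push(b), Pop(b), Push(B), Pop(AB), Push(S), and the input is accepted.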


Page 21: Deriving Input Syntactic Structure From Execution Zhiqiang Lin        Xiangyu Zhang

Tree Construction

Input: aab

Grammar: S → A B, A → a a, B → b

Stack Operation Trace: Push(a), Push(a), Pop(aa), Push(A), Push(b), Pop(b), Push(B), Pop(AB), Push(S)

Identify the parsing stack

[Figure: tree over the stack operations Push(S), Push(A), Push(B), Push(a), Push(a), Push(b), and Pop(b).]

Identical Node
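A tree can be rebuilt from such a stack operation trace with a simple rule: whatever is popped becomes the children of the next symbol pushed. The sketch below applies that rule; it assumes the parsing stack has already been identified and its operations extracted, which is the hard part the slide refers to. The types struct op and struct tree and the function build_tree are hypothetical names for illustration.

```c
#include <stddef.h>

#define MAX_NODES 64

enum op_kind { OP_PUSH, OP_POP };

struct op {
    enum op_kind kind;
    char sym;    /* symbol pushed (OP_PUSH only) */
    int count;   /* number of entries popped (OP_POP only) */
};

struct tree {
    char sym[MAX_NODES];
    int parent[MAX_NODES];  /* -1 while the node has no parent */
    int count;
};

/* Rebuild the tree from a push/pop trace.
 * Returns the index of the root, or -1 if the trace is malformed. */
int build_tree(const struct op *ops, size_t n, struct tree *t)
{
    int stack[MAX_NODES];    /* node ids currently on the parsing stack */
    int pending[MAX_NODES];  /* popped nodes waiting for a parent */
    int top = 0, npending = 0;

    t->count = 0;
    for (size_t i = 0; i < n; i++) {
        if (ops[i].kind == OP_POP) {
            if (ops[i].count < 0 || ops[i].count > top)
                return -1;
            for (int k = 0; k < ops[i].count; k++)
                pending[npending++] = stack[--top];
        } else {
            if (t->count == MAX_NODES)
                return -1;
            int id = t->count++;
            t->sym[id] = ops[i].sym;
            t->parent[id] = -1;
            /* The pushed symbol adopts everything popped just before it. */
            for (int k = 0; k < npending; k++)
                t->parent[pending[k]] = id;
            npending = 0;
            stack[top++] = id;
        }
    }
    return (top == 1 && npending == 0) ? stack[0] : -1;
}
```

Fed the trace from this slide, the root is the node for Push(S), with the A and B nodes as children, the two a nodes under A, and the b node under B.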

Page 22: Deriving Input Syntactic Structure From Execution Zhiqiang Lin        Xiangyu Zhang

Evaluation – Top down grammar

Bad?

Page 23: Deriving Input Syntactic Structure From Execution Zhiqiang Lin        Xiangyu Zhang

Evaluation – Top down grammar

Page 24: Deriving Input Syntactic Structure From Execution Zhiqiang Lin        Xiangyu Zhang

Evaluation – Bottom up grammar

Identical Node

Page 25: Deriving Input Syntactic Structure From Execution Zhiqiang Lin        Xiangyu Zhang

Performance Overhead

5X-45X 6X-8X

Page 26: Deriving Input Syntactic Structure From Execution Zhiqiang Lin        Xiangyu Zhang

Discussion

Grammar categories
    Top down, bottom up, any others?
    Possible to evade the control dependency structure in top-down parser implementation.

Individual input
    Multiple inputs → final grammar

Syntactic structure → Semantics

Page 27: Deriving Input Syntactic Structure From Execution Zhiqiang Lin        Xiangyu Zhang

Related Work

Network Protocol Format Reverse Engineering
    Instruction semantics (comparison, loop keyword, delimiter)
        Polyglot [CCS’07]
        Automatic Network Protocol Analysis [NDSS’08]
        Tupni [CCS’08]
    Execution context (call stack, PC)
        AutoFormat [NDSS’08]

Limitations
    Part of the problem space
        Only top-down parsers.
    Part of the problem’s essence
        Comparison (predicate), call stack control dependency

Page 28: Deriving Input Syntactic Structure From Execution Zhiqiang Lin        Xiangyu Zhang

Conclusion

Two dynamic analyses to construct input structure from program execution.

No source code access or symbolic information required.

Highly effective; produces high-quality input syntax trees.

Page 29: Deriving Input Syntactic Structure From Execution Zhiqiang Lin        Xiangyu Zhang

Thank you

To further contact us:

{zlin,xyzhang}@cs.purdue.edu

Q & A