
Automated Exploit Detection in Binaries

Matt Hargett

http://www.clock.org/~matt

matt {hizzat} use {dizznot} net

Luis Miras

http://dwerd.blogspot.com

lmiras {hizzat} gmail {dizznot} com

Finding exploitable vulnerabilities in binaries

Agenda

• Definition

• Architecture

• Challenges

bugreport

• Set of tests for analysis tools

• Proof Of Concept tool

• Not a product or real-world tool

• Released under GPLv3 draft 2
  – http://sf.net/projects/bugreport

• Enhancements as issues come up

Why C#?

• Very similar to Java and C++

• Open ECMA standard

• 3 open source implementations

• It has specific features we like
  – high-speed generics
  – nullable value types
  – strong typing
  – high-quality, simple open source tools

Target of Detection

• Many vendors have their own definitions of exploitable bugs.

"depends on what you mean by exploit and by bug"

• Our definition is an out-of-bounds (OOB) memory write using tainted data, i.e. data derived from untrusted input.
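A minimal sketch of the definition in C (the language of the slide examples; bugreport itself is C#). It assumes a hypothetical abstract value that tracks whether an index is tainted and the range of values it may hold; `AbstractValue` and `is_reportable_oob_write` are illustrative names, not bugreport's API. A write is reported only when both parts of the definition hold.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical abstract value: a taint flag plus the range of values
 * the analysis believes an index can take. */
typedef struct {
    bool tainted;   /* derived from attacker-controlled input? */
    long lo, hi;    /* inclusive bounds on the possible values */
} AbstractValue;

/* Report only writes matching the definition: the index is tainted
 * AND some possible value lands outside [0, buf_len). */
bool is_reportable_oob_write(AbstractValue idx, size_t buf_len) {
    if (!idx.tainted)
        return false;                  /* untainted writes are out of scope */
    return idx.lo < 0 || idx.hi >= (long)buf_len;
}
```

For example, a tainted index in [0, 255] written into a 16-byte buffer is reported; the same range without taint, or a tainted index already constrained to [0, 15], is not.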

Out-Of-Bound Write Tests

• C code

• x86 code

• Test

• C# code
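The C, x86, and test listings for this slide did not survive extraction. As a hypothetical illustration only, a C test input might pair a vulnerable function with a fixed one, so that a detector should flag the first and stay quiet on the second. `BUF_LEN`, `vulnerable_write`, and `checked_write` are invented for this sketch.

```c
#include <stddef.h>

#define BUF_LEN 16   /* hypothetical buffer size */

/* Vulnerable: the index comes straight from input (tainted) and is never
 * checked, so any first byte >= BUF_LEN writes outside buf. */
int vulnerable_write(const char *input) {
    char buf[BUF_LEN] = {0};
    int idx = (unsigned char)input[0];   /* tainted: 0..255 */
    buf[idx] = 'A';                      /* OOB write with tainted data */
    return buf[0];
}

/* Fixed: the tainted index is constrained before the write. */
int checked_write(const char *input) {
    char buf[BUF_LEN] = {0};
    int idx = (unsigned char)input[0];
    if (idx >= BUF_LEN)
        return -1;                       /* reject out-of-bounds index */
    buf[idx] = 'A';
    return idx;
}
```

Only `checked_write` is safe to execute; `vulnerable_write` exists to be analyzed, not run.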

bugreport architecture

• Set of tests

• x86 emulator
  – Other processors will be added later.

• Analysis engine

Challenges

• Branches

• Inter-function Analysis

• Non-Contiguous functions

• Self Modifying code

• Loops

Dealing with Branches

• Known values
  – Result in one machine state

• Unknown values
  – Result in two machine states
  – Constraints are used: x <= value <= y

• cmp, test, and math instructions set flags based on their input

• jxx, sbb, etc. instructions act on flags


1: cmp eax, 0
2: jne 4
3: ret
4: cmp eax, 255
5: jle 7
6: ret
7: ...
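The branch listing above can be modeled by splitting a value range at each conditional jump. This is a sketch assuming one interval per state and signed comparisons; `Interval`, `split_jle`, and `split_jne` are hypothetical helpers, not bugreport code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constraint on one 32-bit register: a closed interval
 * [lo, hi]. int64_t avoids overflow at k + 1 and k - 1.
 * lo > hi means no value satisfies the path (the state is infeasible). */
typedef struct { int64_t lo, hi; } Interval;

static int64_t max64(int64_t a, int64_t b) { return a > b ? a : b; }
static int64_t min64(int64_t a, int64_t b) { return a < b ? a : b; }

bool feasible(Interval r) { return r.lo <= r.hi; }

/* cmp reg, k / jle target (signed): taken means reg <= k. */
void split_jle(Interval in, int64_t k, Interval *taken, Interval *fall) {
    taken->lo = in.lo;                taken->hi = min64(in.hi, k);
    fall->lo  = max64(in.lo, k + 1);  fall->hi  = in.hi;
}

/* cmp reg, k / jne target: taken means reg != k, which needs two intervals. */
void split_jne(Interval in, int64_t k,
               Interval *taken_below, Interval *taken_above, Interval *fall) {
    taken_below->lo = in.lo;               taken_below->hi = min64(in.hi, k - 1);
    taken_above->lo = max64(in.lo, k + 1); taken_above->hi = in.hi;
    fall->lo = max64(in.lo, k);            fall->hi = min64(in.hi, k);
}
```

Starting from an unknown eax, the jne at line 2 leaves one state with eax == 0 that falls through to ret, plus two states for eax != 0. After the jle at line 5, line 7 is reached with eax in [INT32_MIN, -1] or [1, 255]; the fall-through from the negative half is empty, so that state is dropped.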

Choosing Branches

1. Cheat: always take the branch (follow jxx, sbb).

2. Pick branches randomly.

3. Take all branches (drop through and follow jxx, sbb).

4. Take some branches (drop through and follow jxx, sbb).

Choosing Branches

• Many functions have guards at the entry.

• Guards generally drop through on failure.

• Taking all branches increases code coverage


#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        exit(-1);
    }
    printf("23c3\n");
}


    cmp  [ebp+argc], 1
    jg   short postGuard
    push 0FFFFFFFFh        ; status
    call _exit

postGuard:
    push offset a23c3      ; "23c3\n"
    call _printf
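To see why taking all branches increases coverage on the guard above, here is a sketch that walks a three-block control flow graph under three of the strategies listed earlier (random choice is left out). The `Block` layout and `Strategy` names are assumptions for illustration, and `_exit` is treated as never returning.

```c
#include <stdbool.h>

/* Hypothetical CFG block: up to two successors, -1 meaning "none". */
typedef struct {
    int fall;          /* successor when the conditional jump is not taken */
    int jump;          /* successor when the jump is taken */
    bool conditional;  /* does the block end in a jxx? */
} Block;

typedef enum { JUMPS_ONLY, FALL_THROUGH_ONLY, TAKE_ALL } Strategy;

#define MAX_BLOCKS 32

/* Depth-first walk from block 0; returns how many blocks are reached,
 * or -1 if the graph is too large for this sketch. */
int count_reached(const Block *cfg, int n, Strategy s) {
    if (n > MAX_BLOCKS) return -1;
    bool seen[MAX_BLOCKS] = {false};
    int stack[MAX_BLOCKS * 2 + 1], top = 0, reached = 0;
    stack[top++] = 0;
    while (top > 0) {
        int b = stack[--top];
        if (b < 0 || b >= n || seen[b]) continue;
        seen[b] = true;
        reached++;
        /* Conditional blocks obey the strategy; others follow every edge. */
        bool take_fall = !cfg[b].conditional || s != JUMPS_ONLY;
        bool take_jump = !cfg[b].conditional || s != FALL_THROUGH_ONLY;
        if (take_fall) stack[top++] = cfg[b].fall;
        if (take_jump) stack[top++] = cfg[b].jump;
    }
    return reached;
}

/* The guard listing: block 0 = cmp/jg, block 1 = exit path, block 2 = postGuard. */
const Block guard_cfg[3] = {
    { 1, 2, true },     /* cmp [ebp+argc], 1 / jg postGuard */
    { -1, -1, false },  /* push 0FFFFFFFFh / call _exit: does not return */
    { -1, -1, false },  /* postGuard: push offset a23c3 / call _printf */
};
```

On `guard_cfg`, following only jumps or only fall-throughs reaches 2 of the 3 blocks; `TAKE_ALL` reaches all 3.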

Choosing Branches

• PREfix is a tool that took branches randomly.
  – Found many bugs for customers.
  – Produced different results on each run.

• Bought by Microsoft and shelved.
• Many customers keep old versions around.
• PREfast comes with the DDK.
  – Does not do inter-function value tracking.

Choosing Branches

• Taking all branches results in multiple machine states.

• Taking a branch sets constraints on input.

• These constraints must not be broken.

Dealing with Branches

int getSize(char *ch)
{
    int size = 1;
    char x = *ch;

    if (x != 0) {
        if (x != '\n') {
            size++;
        } else {
            size += 2;
        }
    } else {
        size--;
    }
    return size;
}


What are the potential states?

1. (x <= -1 || x >= 1) && (x != '\n') && (size == 2)
2. (x == '\n') && (size == 3)
3. (x == 0) && (size == 0)

Real-world code will have many potential states.
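For a one-byte input the states can be checked by brute force. This sketch, assuming an 8-bit `char`, runs `getSize` on every byte value and tallies the results; `tally_states` is a hypothetical helper. Enumeration stops being practical once inputs grow, which is why constraints are used instead.

```c
/* getSize from the slide, same behaviour. */
int getSize(char *ch) {
    int size = 1;
    char x = *ch;
    if (x != 0) {
        if (x != '\n') size++;
        else size += 2;
    } else {
        size--;
    }
    return size;
}

/* Try every possible input byte; counts[s] receives how many byte
 * values make getSize return s (s is always 0..3). */
void tally_states(int counts[4]) {
    for (int s = 0; s < 4; s++) counts[s] = 0;
    for (int v = 0; v < 256; v++) {
        char c = (char)v;   /* wraps to negative values if char is signed */
        counts[getSize(&c)]++;
    }
}
```

The tally comes out as one input giving size 0, 254 giving size 2, and one giving size 3, matching the three states listed.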

Inter-function: Top-down

• Start at an export or entry point.

• Traverse code through functions


main() { foo(); x(); bar(); }
foo()  { x(); }
bar()  { y(); }
x()    { y(); z(); }
y()    { z(); }
z()    { return 0; }

// Code omitted for brevity


Function Count

Step  Entered  main  foo  bar  x  y  z
0     (start)  0     0    0    0  0  0
1     main()   1     0    0    0  0  0
2     foo()    1     1    0    0  0  0
3     x()      1     1    0    1  0  0
4     y()      1     1    0    1  1  0
5     z()      1     1    0    1  1  1
6     z()      1     1    0    1  1  2
7     x()      1     1    0    2  1  2
8     y()      1     1    0    2  2  2
9     z()      1     1    0    2  2  3
10    z()      1     1    0    2  2  4
11    bar()    1     1    1    2  2  4
12    y()      1     1    1    2  3  4
13    z()      1     1    1    2  3  5


• Complexity can explode.

• Very time consuming.

• Hitting the same functions multiple times.

• z() visited 5 times.

• Larger programs can have very large call chains.

• “Like playing with a yo-yo in the Grand Canyon”
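The top-down walk above can be reproduced with a recursive traversal that re-enters every callee on every call. This is a sketch; the enum indices and the fixed-width callee table are illustrative assumptions.

```c
/* Call graph from the slide: main -> foo, x, bar; foo -> x; bar -> y;
 * x -> y, z; y -> z; z is a leaf. -1 marks an unused slot. */
enum { MAIN, FOO, BAR, X, Y, Z, NFUNCS };

static const int callees[NFUNCS][3] = {
    [MAIN] = { FOO, X, BAR },
    [FOO]  = { X, -1, -1 },
    [BAR]  = { Y, -1, -1 },
    [X]    = { Y, Z, -1 },
    [Y]    = { Z, -1, -1 },
    [Z]    = { -1, -1, -1 },
};

/* Top-down: count a visit, then descend into every callee, every time. */
void visit_top_down(int f, int counts[NFUNCS]) {
    counts[f]++;
    for (int i = 0; i < 3 && callees[f][i] >= 0; i++)
        visit_top_down(callees[f][i], counts);
}
```

After `visit_top_down(MAIN, counts)` with zeroed counts, the result matches the last row of the table: main, foo, and bar once, x twice, y three times, z five times, for 13 visits in total.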

Inter-function: Bottom-up

• Describe each function in isolation

• Taint return value

• Store the return values for a function based on constraints

• Use the stored result when a call to the function is evaluated

• This creates a machine state diff.
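A minimal sketch of the bottom-up idea, assuming an acyclic call graph: each function is analyzed once, callees before callers, and later calls reuse the cached result. A real summary would hold the machine state diff described above; here `summarized` is only a flag, and all names are hypothetical.

```c
#include <stdbool.h>

/* Same call graph as the top-down example; -1 marks an unused slot. */
enum { MAIN, FOO, BAR, X, Y, Z, NFUNCS };

static const int callees[NFUNCS][3] = {
    [MAIN] = { FOO, X, BAR },
    [FOO]  = { X, -1, -1 },
    [BAR]  = { Y, -1, -1 },
    [X]    = { Y, Z, -1 },
    [Y]    = { Z, -1, -1 },
    [Z]    = { -1, -1, -1 },
};

/* Stand-in for cached summaries (a real one would be a machine state diff). */
static bool summarized[NFUNCS];
static int analyses;   /* function bodies actually analyzed */

/* Summarize callees first, analyze each body once, reuse the cache after. */
void summarize(int f) {
    if (summarized[f])
        return;                              /* cache hit: no re-analysis */
    for (int i = 0; i < 3 && callees[f][i] >= 0; i++)
        summarize(callees[f][i]);            /* callees before callers */
    analyses++;                              /* analyze f's own body once */
    summarized[f] = true;
}
```

Calling `summarize(MAIN)` analyzes 6 function bodies, compared with 13 visits for the top-down walk on the same graph.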


• With deeply nested calls

• Taint return value

• Requires multiple sweeps


main() { foo(); x(); bar(); }
foo()  { x(); }
bar()  { y(); }
x()    { y(); z(); }
y()    { z(); }
z()    { return 0; }

// Code omitted for brevity

Inter-function: Bottom-up, Pass #1

main() { foo(); x(); bar();}

Done: <None>


foo() { x(); }

Done: <None>


bar() { y(); }

Done: <None>


y() { z(); }

Done: <None>


x() { y(); z(); }

Done: <None>


z() { return 0; }

Done: z()


• One pass through call graph seems similar to top-down.

• What is the difference?

• The difference is that z() is evaluated once, as a machine state diff.

• z()’s analysis is cached


Inter-function: Bottom-up, Pass #2

main() { foo(); x(); bar();}

Done: z()


foo() { x(); }

Done: z()


bar() { y(); }

Done: z()


x() { y(); z(); }

Done: z()


y() { z();}

Done: z(), y()


• At pass #2 z() and y() are cached.

• Each pass caches more functions.


Pass #3

Done: z(), y(), x(), bar()


Pass #4

Done: z(), y(), x(), bar(), foo()


Pass #5

Done: z(), y(), x(), bar(), foo(), main()


• This method took 5 passes.

• Fewer passes can be achieved by starting at the bottom.

• Optimizations can include:
  – Leaf nodes first
  – Functions that make few calls but have many xrefs

• Ideally Top-down and Bottom-up are combined.
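The pass count depends on sweep order. This sketch assumes each sweep visits functions in a fixed order and that a function finished earlier in a sweep counts as done for the rest of that sweep; with the order main, foo, bar, x, y, z (the order on the pass #2 slide) it reproduces the 5 passes. `count_passes` is a hypothetical helper.

```c
#include <stdbool.h>

/* Same call graph as the slides; -1 marks an unused slot. */
enum { MAIN, FOO, BAR, X, Y, Z, NFUNCS };

static const int callees[NFUNCS][3] = {
    [MAIN] = { FOO, X, BAR },
    [FOO]  = { X, -1, -1 },
    [BAR]  = { Y, -1, -1 },
    [X]    = { Y, Z, -1 },
    [Y]    = { Z, -1, -1 },
    [Z]    = { -1, -1, -1 },
};

/* A function becomes done when every callee is already done at the moment
 * the sweep reaches it. Returns the number of sweeps needed, or -1 if a
 * sweep makes no progress (e.g. the call graph has a cycle). */
int count_passes(const int order[NFUNCS]) {
    bool done[NFUNCS] = {false};
    int remaining = NFUNCS, passes = 0;
    while (remaining > 0) {
        int before = remaining;
        passes++;
        for (int i = 0; i < NFUNCS; i++) {
            int f = order[i];
            if (done[f]) continue;
            bool ready = true;
            for (int j = 0; j < 3 && callees[f][j] >= 0; j++)
                if (!done[callees[f][j]]) ready = false;
            if (ready) { done[f] = true; remaining--; }
        }
        if (remaining == before) return -1;
    }
    return passes;
}
```

With `{MAIN, FOO, BAR, X, Y, Z}` it returns 5, matching the slides; with leaf nodes first, `{Z, Y, X, BAR, FOO, MAIN}`, it returns 1.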

Non-Contiguous functions

• Modern compilers commonly emit non-contiguous functions.

• Does this matter for analysis?

• NO.

Non-Contiguous functions

• Functions are a source-language concept.

• In reality, a function is a collection of basic blocks.

• Each basic block is a transformation from an input machine state to an output machine state.

• A function is a collection of these transformations, or expressions.
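One way to see why block layout does not matter: if each basic block is a function from an input state to an output state plus a link to its successor, the result depends only on the links, not on where the blocks sit. The single-register `State` and the two toy blocks are assumptions made to keep the sketch short.

```c
/* Hypothetical machine state with a single register. */
typedef struct { int eax; } State;

/* A basic block: a transformation from input state to output state,
 * plus the index of the next block (-1 ends the function). */
typedef struct Block {
    State (*apply)(State);
    int next;
} Block;

static State add_one(State s)   { s.eax += 1; return s; }
static State times_two(State s) { s.eax *= 2; return s; }

/* Run a function by following block links from `entry`. */
State run(const Block *blocks, int entry, State s) {
    for (int b = entry; b >= 0; b = blocks[b].next)
        s = blocks[b].apply(s);
    return s;
}

/* The same function laid out two ways: contiguous, and with the blocks
 * reordered as a compiler might place a non-contiguous function. */
const Block contiguous[]     = { { add_one, 1 }, { times_two, -1 } };
const Block non_contiguous[] = { { times_two, -1 }, { add_one, 0 } };
```

Starting from eax = 3, `run(contiguous, 0, ...)` and `run(non_contiguous, 1, ...)` both produce eax = 8.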

Self-modifying code

• We don’t care about self-modifying code.

• Does Microsoft or any large vendor use packers or self-modifying code?

Unbounded loops

• Trillian exploit example

• Find a control flow block
• That forms a closed loop on itself
• Where a pointer is
  – written to
  – incremented

• Exit from loop (if found) is
  – a tainted byte comparison
  – where that same byte was written through the pointer
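The unbounded-loop pattern can be phrased as a predicate over facts an analysis has already extracted about a loop. This is a hypothetical sketch: `LoopFacts` and `is_unbounded_copy` are invented names, and treating a loop with no exit found as unbounded is an assumption about the "(if found)" case. The inline strcpy on the next slide would set every field to true.

```c
#include <stdbool.h>

/* Hypothetical facts about one closed loop. */
typedef struct {
    bool writes_through_ptr;   /* loop body stores via a pointer */
    bool increments_ptr;       /* that pointer advances each iteration */
    bool has_exit;             /* an exit edge was found */
    bool exit_on_tainted_byte; /* the exit compares a tainted byte */
    bool exit_byte_is_stored;  /* that same byte is written through the pointer */
} LoopFacts;

/* A self-loop that writes and advances a pointer, whose exit (if any)
 * depends only on the data being copied: nothing bounds the write by the
 * destination's size. */
bool is_unbounded_copy(LoopFacts f) {
    if (!f.writes_through_ptr || !f.increments_ptr)
        return false;
    if (!f.has_exit)
        return true;   /* assumption: no exit found means nothing limits writes */
    return f.exit_on_tainted_byte && f.exit_byte_is_stored;
}
```

A loop whose exit is a counter rather than a copied byte (the bounded case) is not flagged by this predicate.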

Unbounded loops

Inline strcpy

while (*dst++ = *src++);

Unbounded loops

/* \\MachineName\ */
GetMachineName(WCHAR *src, WCHAR *dst, int arg_8)
{
    for (src++; *src != (WCHAR)'\\'; )
        *dst++ = *src++;
    ...
}

// MS-RPC Blaster overflow
// Code snippet from The Art of Software Security Assessment:
// Identifying and Preventing Software Vulnerabilities

Bounded loops

• Find control flow blocks
• That form a closed loop on themselves
• Where pointers are
  – written to
  – incremented

• Exit from loop is
  – a comparison with an input
  – a variable being incremented or decremented (++/--)

• Inline wsncpy with wrong n.
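The slide does not say what makes n wrong. One common mistake, assumed here for illustration, is passing a byte count such as `sizeof dst` where a wide-character copy expects a count of characters. The sketch checks whether a loop that runs n times, writing `elem_size` bytes each time, can exceed the destination; `bounded_copy_overflows` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>

/* A bounded copy loop that runs n times and writes elem_size bytes per
 * iteration overflows when n * elem_size exceeds the destination size.
 * elem_size must be nonzero; dividing avoids overflow in n * elem_size. */
bool bounded_copy_overflows(size_t n, size_t elem_size, size_t dst_bytes) {
    return n > dst_bytes / elem_size;
}
```

For `wchar_t dst[32]`, passing `sizeof dst` as n is flagged, while `sizeof dst / sizeof dst[0]` is not.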

Summary

• basic value tracking

• out-of-bounds memory detection

• dealing with branches

• dealing with loops

• with real, working C# code
  – http://sf.net/projects/bugreport

Questions ?

Shameless Self-Promotion

Automating Exploit Detection with Binary Code Analysis

RSA Conference 2007 Tutorials, Feb 4th and 5th, San Francisco

Automating Exploit Detection: Cutting-edge Tools and Techniques

Black Hat Europe 2007 Training, March 27th and 28th, Amsterdam
