TAINTSCOPE: A Checksum-Aware Directed Fuzzing Tool for Automatic Software Vulnerability Detection
Tielei Wang (1), Tao Wei (1), Guofei Gu (2), Wei Zou (1)
(1) Peking University, China; (2) Texas A&M University, US
TERMS
Checksum – a way to check the integrity of data. Used in network protocols and files.
Fuzzing – generating malformed inputs and feeding them to the application.
Dynamic Taint Analysis – runs a program and observes which computations are affected by predefined taint sources (e.g. input)
[Figure: a checksum function computed over the data yields the checksum field that is stored alongside the data.]
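As a rough illustration of the checksum term defined above, the following C sketch recomputes a checksum over the data and compares it with the stored field. The additive checksum and the names toy_checksum and integrity_ok are assumptions made for illustration only; they are not taken from the slides, and real formats usually rely on stronger functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy checksum: sum of all data bytes, modulo 2^32. */
uint32_t toy_checksum(const uint8_t *data, size_t len) {
    assert(data != NULL || len == 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += data[i];
    }
    return sum;
}

/* Returns 1 when the stored checksum field matches the recomputed value. */
int integrity_ok(const uint8_t *data, size_t len, uint32_t stored_field) {
    return toy_checksum(data, len) == stored_field;
}
```

A fuzzer that changes any data byte without updating stored_field makes integrity_ok return 0, which is why such inputs are rejected early.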
THE PROBLEM
The input mutation space is enormous.
Most malformed inputs are dropped at an early stage if the program employs a checksum mechanism.
    1  void decode_image(FILE* fd){
    2      ...
    3      int length = get_length(fd);
    4      int recomputed_chksum = checksum(fd, length);
    5      int chksum_in_file = get_checksum(fd);
           // line 6 is used to check the integrity of inputs
    6      if(chksum_in_file != recomputed_chksum)
    7          error();
    8      int Width = get_width(fd);
    9      int Height = get_height(fd);
    10     int size = Width*Height*sizeof(int);
    11     int* p = malloc(size);
    12     ...
    13     for(int i=0; i<Height; i++){ // read ith row to p
    14         read_row(p+Width*i, i, fd);
    15     }
    16 }
THE IDEA
Infer whether and where a program checks the integrity of its input.
Identify which input bytes can flow into sensitive points: taint analysis at byte level monitors how the application uses the input data.
Create malformed inputs that focus on the "hot bytes".
Repair the checksum fields in the input to expose vulnerabilities.
Fully automatic.
Found 27 new vulnerabilities in Acrobat Reader, Google Picasa, and more.
HOW DOES IT WORK?
1. Dynamic taint tracing
2. Detecting checksum
3. Directed fuzzing
4. Repairing crashed samples
[Figure: TaintScope architecture. Components: Execution Monitor, Checksum Locator, Directed Fuzzer, Checksum Repairer. Artifacts passed between them: Instruction Profile, Hot Bytes Info, Modified Program, Crashed Samples, Reports.]
1. DYNAMIC TAINT TRACING
Runs the program with well-formed input.
The execution monitor records:
    Which input bytes are related to arguments of API functions (e.g. malloc, strcpy): the "hot bytes" report.
    Which input bytes each conditional jump instruction (e.g. JZ, JE, JB) depends on: the checksum report.
Only data flow is considered (no control flow).
Instruments data movement (e.g. MOV, PUSH), arithmetic (e.g. SUB, ADD), and logic (e.g. AND, XOR) instructions.
Taints every value written by an instruction with the union of all taint labels associated with the values used by that instruction.
The eflags register is also tracked.
Example:
    Before:       eax {0x6, 0x7}, ebx {0x8, 0x9}
    Instruction:  add eax, ebx
    After:        eax {0x6, 0x7, 0x8, 0x9}, eflags {0x6, 0x7, 0x8, 0x9}
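The propagation rule for the add eax, ebx example can be sketched in C as follows. This is a toy model under stated assumptions, not TaintScope's implementation: taint sets are assumed to be 64-bit bitmasks over input offsets 0..63, and the names taint_set, shadow_regs, taint_of, and taint_add_eax_ebx are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy taint set: bit i is set when the value depends on input offset i. */
typedef uint64_t taint_set;

/* Builds the taint set containing a single input offset (0..63). */
taint_set taint_of(unsigned offset) {
    assert(offset < 64);
    return (taint_set)1 << offset;
}

/* Hypothetical shadow state holding the taint of two registers and eflags. */
struct shadow_regs {
    taint_set eax;
    taint_set ebx;
    taint_set eflags;
};

/* Models `add eax, ebx`: every value the instruction writes (eax and eflags)
   receives the union of the taint of the values it reads (eax and ebx). */
void taint_add_eax_ebx(struct shadow_regs *s) {
    taint_set written = s->eax | s->ebx;
    s->eax = written;
    s->eflags = written;
}
```

Starting from eax {0x6, 0x7} and ebx {0x8, 0x9}, a call to taint_add_eax_ebx leaves eax and eflags tainted by offsets 0x6 through 0x9, while ebx is unchanged because the instruction does not write it.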
1. DYNAMIC TAINT TRACING - EXAMPLE
Input size is 1024 bytes.
    8  int Width = get_width(fd);
    9  int Height = get_height(fd);
    10 int size = Width*Height*sizeof(int);
    11 int* p = malloc(size);
"Hot bytes" report (the malloc argument depends on input bytes 0x8 to 0xf):
    …
    0x8048d5b: invoking malloc: [0x8,0xf]
    …
Input size is 1024 bytes.
    6  if(chksum_in_file != recomputed_chksum)
    7      error();
Checksum report (the JZ at 0x8048d4f depends on all 1024 input bytes, 0x0 to 0x3ff):
    …
    0x8048d4f: JZ: 1024: [0x0,0x3ff]
    …
2. DETECTING CHECKSUM
Checksum detector: identifies potential checksum check points.
Observation: the recomputed checksum value depends on many input bytes.
Instruments conditional jump instructions. Before each one executes, checks whether the number of taint marks associated with the eflags register exceeds a threshold.
Problem: decompressed bytes can also exceed the threshold, so this test alone is not sufficient.
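The threshold test described above might look like the following C sketch. It assumes a simple taint map with one byte per input offset; the names count_marks and is_checksum_candidate are hypothetical, and the threshold is left to the caller because the slides do not give a value.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts the input offsets marked in a taint map (one byte per input offset,
   nonzero meaning "eflags currently depends on this input byte"). */
size_t count_marks(const uint8_t *marks, size_t input_len) {
    size_t n = 0;
    for (size_t i = 0; i < input_len; i++) {
        if (marks[i] != 0) {
            n++;
        }
    }
    return n;
}

/* Returns 1 when a conditional jump should be reported as a potential
   checksum check point: eflags depends on more than `threshold` input bytes. */
int is_checksum_candidate(const uint8_t *eflags_marks, size_t input_len,
                          size_t threshold) {
    return count_marks(eflags_marks, input_len) > threshold;
}
```

With the 1024-byte example input, a jump whose eflags taint covers [0x0,0x3ff] is reported for any threshold below 1024, whereas a jump that depends only on the eight bytes [0x8,0xf] is not reported for a threshold of, say, 16.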
Refinement:
Well-formed inputs can pass the checksum test, but most malformed inputs cannot.
Run well-formed inputs and identify the always-taken and always-not-taken conditional jump instructions.
Run malformed inputs and identify the always-taken and always-not-taken instructions in the same way.
Identify the conditional jump instructions that behave completely differently when processing well-formed and malformed inputs.
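The refinement step can be sketched in C as a comparison of per-branch profiles. This is an illustrative model only: it assumes that taken and not-taken counts have already been collected for one conditional jump over each input set, and the names branch_profile, classify, and is_checksum_check are hypothetical.

```c
/* How one conditional jump behaved over a whole set of runs. */
enum behavior { NEVER_SEEN, ALWAYS_TAKEN, ALWAYS_NOT_TAKEN, MIXED };

/* Taken / not-taken counts of one conditional jump over a set of runs. */
struct branch_profile {
    unsigned taken;
    unsigned not_taken;
};

/* Summarizes a profile as always-taken, always-not-taken, mixed, or unseen. */
enum behavior classify(struct branch_profile p) {
    if (p.taken == 0 && p.not_taken == 0) return NEVER_SEEN;
    if (p.not_taken == 0) return ALWAYS_TAKEN;
    if (p.taken == 0) return ALWAYS_NOT_TAKEN;
    return MIXED;
}

/* Returns 1 when the jump behaves in completely opposite ways on
   well-formed and malformed inputs, as a checksum check is expected to. */
int is_checksum_check(struct branch_profile well_formed,
                      struct branch_profile malformed) {
    enum behavior w = classify(well_formed);
    enum behavior m = classify(malformed);
    return (w == ALWAYS_TAKEN && m == ALWAYS_NOT_TAKEN) ||
           (w == ALWAYS_NOT_TAKEN && m == ALWAYS_TAKEN);
}
```

A jump taken in all 5 well-formed runs and in none of 20 malformed runs is flagged; a jump with mixed behavior on either set is not.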
Checksum detector: creates bypass rules (always-taken or always-not-taken) for the identified check points.
    6  if(chksum_in_file != recomputed_chksum)
    7      error();
Checksum report:
    …
    0x8048d4f: JZ: 1024: [0x0,0x3ff]
    …
Bypass rule:
    0x8048d4f: JZ: always-taken
Checksum detector: checksum field identification.
The input bytes that affect chksum_in_file are the checksum field.
    6  if(chksum_in_file != recomputed_chksum)
    7      error();
3. DIRECTED FUZZING
Generates malformed test cases and feeds them to the original or the instrumented program.
Following the bypass rules, alters the execution trace at the checksum check points by setting the eflags register.
All malformed test cases are constructed from the "hot bytes" information, using attack heuristics:
    Bytes that influence memory allocation are set to small, large, or negative values.
    Bytes that flow into string functions are replaced by format specifiers such as %n and %p.
Output: test cases that cause the program to crash or to consume 100% CPU.
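The memory-allocation heuristic could be sketched in C as below. Several details are assumptions rather than facts from the slides: the hot field is treated as a 4-byte little-endian integer, the concrete boundary values (0, 0x7fffffff, 0xffffffff) are chosen for illustration, and the names attack_value and mutate_hot_field are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum attack_value { ATTACK_SMALL, ATTACK_LARGE, ATTACK_NEGATIVE };

/* Illustrative boundary values; the slides do not list exact values. */
static uint32_t attack_u32(enum attack_value kind) {
    switch (kind) {
    case ATTACK_SMALL: return 0u;
    case ATTACK_LARGE: return 0x7fffffffu; /* largest positive 32-bit int */
    default:           return 0xffffffffu; /* -1 as a 32-bit two's-complement int */
    }
}

/* Overwrites the assumed 4-byte little-endian field at `offset` with the
   chosen attack value; all other input bytes are left untouched. */
void mutate_hot_field(uint8_t *input, size_t len, size_t offset,
                      enum attack_value kind) {
    assert(offset + 4 <= len);
    uint32_t v = attack_u32(kind);
    for (int i = 0; i < 4; i++) {
        input[offset + i] = (uint8_t)(v >> (8 * i));
    }
}
```

For the running example, where the hot bytes report lists [0x8,0xf] for malloc, a fuzzer would call mutate_hot_field at offsets inside that range and leave the remaining bytes unchanged, which is what keeps the mutation space small.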
Example:
    6  if(chksum_in_file != recomputed_chksum)
    7      error();
    8  int Width = get_width(fd);
    9  int Height = get_height(fd);
    10 int size = Width*Height*sizeof(int);
    11 int* p = malloc(size);
"Hot bytes" report:
    …
    0x8048d5b: invoking malloc: [0x8,0xf]
    …
Checksum report:
    …
    0x8048d4f: JZ: 1024: [0x0,0x3ff]
    …
Bypass info:
    0x8048d4f: JZ: always-taken
Before executing 0x8048d4f, the fuzzer sets the ZF flag in eflags to the opposite value.
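The flag manipulation can be sketched in C as follows. On x86, ZF is bit 6 of EFLAGS and JZ jumps when ZF is set; everything else here is a simplified model, and the names flip_zf, jz_taken, bypass_rule, and apply_jz_bypass are hypothetical rather than part of TaintScope.

```c
#include <stdint.h>

#define EFLAGS_ZF (1u << 6) /* zero flag: bit 6 of the x86 EFLAGS register */

enum bypass_rule { ALWAYS_TAKEN, ALWAYS_NOT_TAKEN };

/* Sets ZF to the opposite value, leaving all other flags unchanged. */
uint32_t flip_zf(uint32_t eflags) {
    return eflags ^ EFLAGS_ZF;
}

/* JZ is taken exactly when ZF is set. */
int jz_taken(uint32_t eflags) {
    return (eflags & EFLAGS_ZF) != 0;
}

/* Forces the outcome of a JZ according to a bypass rule by rewriting ZF. */
uint32_t apply_jz_bypass(uint32_t eflags, enum bypass_rule rule) {
    if (rule == ALWAYS_TAKEN) {
        return eflags | EFLAGS_ZF;
    }
    return eflags & ~EFLAGS_ZF;
}
```

When a malformed input fails the comparison, ZF is clear and the JZ would fall through to error(); applying the always-taken rule sets ZF so execution continues past the check. In that situation forcing the flag and flipping it give the same result.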
4. REPAIRING CRASHED SAMPLES
Fixing checksums is expensive, so checksum fields are repaired only in the test cases that caused crashes.
How?
    Cr – raw data in the checksum field
    D – input data protected by the checksum field
    Checksum() – the complete checksum algorithm
    T – transformation
We want to satisfy the constraint:
    Checksum(D) == T(Cr)
Symbolic execution is used to solve the constraint Checksum(D) == T(Cr):
    Checksum(D) is a constant c that can be determined at runtime, so the constraint becomes c == T(Cr).
    Only Cr is a symbolic value.
    Common transformations (e.g. converting from hex/oct to decimal) can be solved by existing solvers (STP).
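To make the constraint c == T(Cr) concrete, the C sketch below assumes one specific transformation: the checksum field holds 8 ASCII hex digits and T parses them into an integer. The slides describe solving for Cr with symbolic execution and a solver; this sketch swaps in a direct inversion of the assumed T, which is only possible because T is known here. The names transform_T and repair_field are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed transformation T: parse the 8 ASCII hex digits stored in the
   checksum field into an integer. */
uint32_t transform_T(const char field[8]) {
    char tmp[9];
    for (int i = 0; i < 8; i++) {
        tmp[i] = field[i];
    }
    tmp[8] = '\0';
    return (uint32_t)strtoul(tmp, NULL, 16);
}

/* Writes raw field bytes Cr such that transform_T(Cr) == c, where c is the
   checksum value the program recomputed over the mutated data D. */
void repair_field(uint32_t c, char field[8]) {
    char tmp[9];
    snprintf(tmp, sizeof tmp, "%08lx", (unsigned long)c);
    for (int i = 0; i < 8; i++) {
        field[i] = tmp[i];
    }
}
```

After repair_field(c, cr), the bytes in cr can be written back into the crashed sample at the checksum field's offset so that the unmodified program accepts the input.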
If the repaired test case causes the original program to crash, a potential vulnerability is detected!
EVALUATION
Applications: MS Paint, Google Picasa, Adobe Acrobat, ImageMagick, IrfanView, GStreamer, Winamp, XEmacs, Amaya, Dillo, wxWidgets, PDFlib.
27 previously unknown vulnerabilities were found.
DISCUSSION
TaintScope cannot deal with secure integrity check schemes (e.g. cryptographic hash algorithms, digital signatures): it is impossible to generate valid test cases.
Limited effectiveness when all input data is encrypted (tracking decrypted data).
Checksum check point identification can be affected by the quality of the inputs.
Control flow propagation is not tracked.
Not all x86 instructions are instrumented by the execution monitor.
CONCLUSION
TaintScope can perform:
Directed fuzzing
    Identifies which bytes flow into system/library calls.
    Dramatically reduces the mutation space.
Checksum-aware fuzzing
    Disables checksum checks by control flow alteration.
    Generates correct checksum fields in invalid inputs.