Gogul Balakrishnan, Radu Gruian, and Thomas Reps
Computer Science Dept., Univ. of Wisconsin
GrammaTech, Inc.
April, 2005
CodeSurfer / x86: A Platform for Analyzing x86 Executables
Contents
- Introduction
- CodeSurfer / x86 Architecture
- CodeSurfer / x86 Facilities
- CodeSurfer / x86 Limitations
- Recent Work
Introduction
Motivation
- Ensuring that 3rd-party applications do not perform malicious operations
Issues
- Symbol-table and debugging information may be absent
- No abstract-location information (variables) is available
- Existing binary-analysis tools are not capable of dealing with these issues
CodeSurfer
- Program analysis and inspection tool
- Programming API is bundled with the CodeSurfer programmable package
IDAPro
- Powerful commercial disassembly toolkit
- Provides APIs for its internal plug-ins
CodeSurfer / x86
- Prototype system for analyzing x86 executables
- Combines Value-Set Analysis (VSA) with facilities provided by the IDAPro and CodeSurfer toolkits
- Recovers Intermediate Representations (IRs) of programs using VSA
- Provides a platform for investigating the properties and behaviors of potentially malicious code
CodeSurfer / x86 Architecture
Overall Architecture
Value-Set Analysis (VSA)
Purpose
- Over-approximate, at each program point, the set of values that each memory location (registers, stack, ...) might hold
Description
- Separate the address space into a set of disjoint areas
- Memory locations are represented as a-locs
- Ex) EAX -> (⊥, 4[0, 1]-20, ⊤) means that EAX holds no meaningful value in the global environment, may hold the value 4 * [0, 1] - 20 + ESP (that is, ESP - 20 or ESP - 16) in one local environment, and may hold any value in another local environment
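The EAX example above can be sketched in Python. This is only an illustration under stated assumptions: the `StridedInterval` class, the `TOP`/`BOTTOM` markers, and the region names `global`, `local_1`, and `local_2` are hypothetical and are not part of the CodeSurfer or IDAPro APIs. A value-set is modeled as one entry per disjoint memory area.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class StridedInterval:
    """s[lo, hi]: the set {lo, lo + s, ..., hi} (hypothetical helper)."""
    stride: int
    lo: int
    hi: int

    def scale(self, k: int) -> "StridedInterval":
        # Multiply every element by a positive constant k.
        return StridedInterval(self.stride * k, self.lo * k, self.hi * k)

    def shift(self, c: int) -> "StridedInterval":
        # Add a constant offset c to every element.
        return StridedInterval(self.stride, self.lo + c, self.hi + c)

    def concretize(self) -> Set[int]:
        # Enumerate the concrete values the interval stands for.
        if self.stride == 0:
            return {self.lo}
        return set(range(self.lo, self.hi + 1, self.stride))


TOP = "TOP"    # assumed marker: any value in this area
BOTTOM = None  # assumed marker: no value in this area

# 4 * [0, 1] - 20, built from the unit interval 1[0, 1].
local_offsets = StridedInterval(1, 0, 1).scale(4).shift(-20)

# One entry per disjoint memory area, mirroring (bottom, 4[0, 1]-20, top).
eax = {"global": BOTTOM, "local_1": local_offsets, "local_2": TOP}

print(sorted(eax["local_1"].concretize()))
```

Keeping one abstract value per area is what lets the same register be described as an address in one area and as unconstrained in another.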
IDAPro
Input
- x86 executable
Process
- Disassemble the x86 binary executable
- Analyze static information
Output
- Assembly code
- Control Flow Graphs (CFGs)
- Procedure boundaries
- Statically known memory addresses and offsets
Connector – Parsing
Process
- Parse the input data into the connector's data structures for VSA
Output
- Parsed data that keeps all of the information intact
Connector – Abstraction
Process
- Run Value-Set Analysis over the a-locs
Output
- Parsed data with abstract information, including a-locs with value-sets
Connector – Augmentation
Process
- Augment the incomplete call graph and CFGs (missing targets of indirect jumps and indirect calls) using the a-locs and value-sets at each program point
Output
- Data in a CodeSurfer-compatible format (IRs)
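The augmentation step can be illustrated with a minimal Python sketch. It assumes the CFG is a plain dictionary from instruction address to successor addresses and that the value-set of each indirect jump's target has already been concretized to a set of addresses. The function `augment_cfg` and all addresses are hypothetical, and a real implementation would likely re-run the analysis after adding edges rather than make a single pass.

```python
from typing import Dict, Set


def augment_cfg(cfg: Dict[int, Set[int]],
                indirect_targets: Dict[int, Set[int]],
                code_addresses: Set[int]) -> Dict[int, Set[int]]:
    """Return a copy of cfg with edges added for resolved indirect jumps.

    cfg maps an instruction address to its known successors.
    indirect_targets maps the address of an indirect jump to the concrete
    addresses its target operand may hold according to the value-set.
    """
    augmented = {addr: set(succs) for addr, succs in cfg.items()}
    for jump_addr, targets in indirect_targets.items():
        # Keep only values that are actual instruction addresses.
        valid = targets & code_addresses
        augmented.setdefault(jump_addr, set()).update(valid)
    return augmented


# Toy CFG: the instruction at 0x1004 is an indirect jump with no known successors.
cfg = {0x1000: {0x1004}, 0x1004: set(), 0x2000: set(), 0x2010: set()}

# Assumed analysis result: the jump target may be 0x2000, 0x2010, or 0x3000.
targets = {0x1004: {0x2000, 0x2010, 0x3000}}

result = augment_cfg(cfg, targets, code_addresses=set(cfg))
print(sorted(hex(a) for a in result[0x1004]))
```

The value 0x3000 is dropped because it is not a known instruction address in this toy example; how such values are handled in practice is a design decision this sketch does not settle.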
CodeSurfer
Input
- Data in a CodeSurfer-compatible format
Output
- Collection of IRs, consisting of abstract syntax trees, CFGs, a call graph, and a System Dependence Graph (SDG)
Overall Architecture (revisited)
CodeSurfer / x86 Facilities
Standard Compilation Model Check
Checkpoints
- Runtime stack
- Self-modification
- Separation of the program's data
If it cannot be confirmed that the executable conforms to the model, then the IR is possibly incorrect
CodeSurfer's GUI
- SDG browser
CodeSurfer's API
- Access to lower-level information
  - Individual nodes and edges of the program's SDG
  - Call graph
  - CFGs
In conjunction with GrammaTech's Path Inspector
- Detect possibly problematic paths
CodeSurfer / x86 Limitations
Limitations
Dynamically determined information
- IDAPro and VSA cannot fully recover dynamically determined information such as heap-allocated data, indirect calls, and indirect jumps
Complex data structures
- Only very coarse information about arrays is recovered
- Value-sets are suited only to data structures with a regular stride (congruence) and a contiguous layout
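The last limitation can be made concrete with a small Python sketch. It assumes a value-set entry is a strided interval (stride, lower bound, upper bound). The helper names are hypothetical and the abstraction shown is a simplified stand-in, not the system's actual implementation.

```python
from functools import reduce
from math import gcd


def tightest_strided_interval(values):
    """Return (stride, lo, hi) of the smallest strided interval covering values."""
    vals = sorted(set(values))
    lo, hi = vals[0], vals[-1]
    # The stride is the gcd of all distances from the lower bound (0 for a single value).
    stride = reduce(gcd, (v - lo for v in vals[1:]), 0)
    return stride, lo, hi


def size(si):
    """Number of concrete values a strided interval stands for."""
    stride, lo, hi = si
    return 1 if stride == 0 else (hi - lo) // stride + 1


# Regularly spaced offsets (e.g. array elements) are captured exactly.
regular = tightest_strided_interval([0, 4, 8, 12])

# Irregularly spaced offsets are over-approximated with many spurious values.
irregular = tightest_strided_interval([0, 4, 100])

print(regular, size(regular))
print(irregular, size(irregular))
```

Three irregular offsets are abstracted to an interval that stands for 26 values, which is why only coarse information is recovered for data that is not laid out with a regular stride.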
Recent Work