compare - port - navigate by halvar flake & rolf rolles
TRANSCRIPT
-
8/7/2019 Compare - Port - Navigate by Halvar Flake & Rolf Rolles
1/40
Compare, Port, Navigate
Three applications of graphs and graphing for security
Halvar Flake
Blackhat Briefings Europe 2005
-
Overview
The talk consists of three parts:
i. Comparing executable objects
   i. What is the difference between these two pieces of code?
   ii. How similar are these two pieces of code?
ii. Porting information between executable objects
   i. If this new malware is similar to xyz, can't I use my old disassembly?
   ii. They use OpenSSL in this embedded device; can I get those symbols into IDA?
iii. Navigating executable objects
   i. This binary is huge; where the heck should I start reading?
   ii. Now, how does all this fit together?
-
Structural Comparison
Asymmetry between source recompilation and RE
i. Making a minor change and recompiling a program is easy
ii. The general assumption is that reverse engineering these changes is hard
   i. The reverse engineer has to redo all the work, because the results from the previous disassembly are lost
   ii. The reverse engineer then has to compare the functions' logic, as many instructions will have changed due to optimisation
iii. Software vendors and HLL-virus authors try to exploit this asymmetry
   i. The first want to buy time for customers
   ii. The second can create variants quickly and easily
-
Structural Comparison
Diffing executables is difficult
i. Small changes in the source code can trigger significant changes in the executable:
   i. Adding a structure member will change immediate offsets for all accesses to structure members after the change
   ii. Adding a few lines of code can produce radically different register assignments and lead to differing instructions
   iii. Changed sizes of basic blocks in one function can lead to code in unrelated functions changing (because of branch inversion)
ii. The overwhelming majority of changes in the binary are irrelevant
   i. Classical trade-off: more false positives, or running the risk of a false negative?
-
Structural Comparison
Why byte-by-byte comparison is no good
i. All offsets and branch displacements have to be masked out of the comparison
ii. Compilers like to re-arrange basic blocks
iii. Objects might be linked in a different order
iv. Individual instructions might have been replaced by functional equivalents
All these things will make a byte-by-byte comparison report big changes when none really occurred
-
Structural Comparison
Viewing a program as graph of graphs
i. Primarily one is interested in changes to program logic
ii. A program can be viewed by looking at two graphs:
   i. The callgraph, which contains all functions and their relationships (A calls B etc.)
   ii. The individual function flowgraphs, which represent the basic blocks of every function and how they are linked by conditional or unconditional branches
iii. The program logic is more or less encoded in these two graphs
   i. Adding a single if( ) in any function will trigger a change in its flowgraph
   ii. Changing a call to strcpy to a call to strncpy will change the callgraph
-
Structural Comparison
Detecting changes by comparing graphs
i. Program logic can be viewed as a callgraph with nodes representing the individual flowgraphs
ii. Comparing two executables based on these graphs will detect all logic changes
iii. The comparison should be false-positive-free:
   i. Only real changes to program logic should be detected
   ii. Compilers don't usually change the program logic
   iii. Modern compilers can inline entire functions and do many more crazy things (thus the "should be" instead of "is")
iv. The comparison will not be false-negative-free:
   i. Switching signedness of a type or changing constants and buffer sizes will go undetected
v. So how can two graphs of graphs be compared?
-
Structural Comparison
Comparing the graphs
i. Checking whether two graphs are isomorphic is in NP, and no polynomial-time algorithm is known. Math-speak for: finding out if two graphs are the same is damn expensive
ii. The problem is a lot less problematic if one can find fixed points: two nodes in the two graphs that are definitely the same
iii. When analyzing programs, entry points and names for imported functions are available, yielding a first set of fixed points
iv. More fixed points would be desirable, so what would be a decent heuristic to generate more of them?
-
Structural Comparison
Heuristic signatures for the functions
Simplistic signature for every function:
   Number of basic blocks
   Number of links
   Number of functions called from this function
-
Structural Comparison
Heuristic signatures for the functions: Basic Blocks
[Figure: example flowgraph] Nodes: 5 nodes
-
Structural Comparison
Heuristic signatures for the functions: Links
[Figure: example flowgraph] Links: 6 links
-
Structural Comparison
Heuristic signatures for the functions: Subcalls
[Figure: example flowgraph] Subcalls: 6 subcalls
Signature: (5, 6, 6)
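The three-number signature from the preceding slides can be sketched in a few lines. This is only an illustration: the flowgraph representation (a dict from basic block to its successors), the block names and the callee names are all made up, and counting call sites rather than distinct callees is an assumption the slides do not settle.

```python
from typing import Dict, List, Tuple


def function_signature(flowgraph: Dict[str, List[str]],
                       calls: List[str]) -> Tuple[int, int, int]:
    """Return (basic blocks, links, subcalls) for one function.

    flowgraph maps each basic block to the blocks it branches to;
    calls lists the call sites inside the function (assumption: every
    call site counts, even if the callee repeats).
    """
    nodes = len(flowgraph)
    links = sum(len(successors) for successors in flowgraph.values())
    return (nodes, links, len(calls))


# Hypothetical function with 5 basic blocks, 6 links and 6 call sites.
example_flowgraph = {
    "entry": ["check", "error"],
    "check": ["loop", "error"],
    "loop": ["loop", "exit"],
    "error": [],
    "exit": [],
}
example_calls = ["malloc", "memcpy", "strlen", "log", "free", "log"]

signature = function_signature(example_flowgraph, example_calls)
print("::".join(map(str, signature)))  # same notation as the slides
```

The signature ignores instruction bytes entirely, which is why register reallocation or changed offsets do not disturb it.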
-
Structural Comparison
Finding more fixedpoints (2)
Fps    Signature Set of Binary A    Signature Set of Binary B
       12::19::7                    12::19::7
       2::2::5                      2::2::5
       5::7::0                      5::7::0
       2::2::1                      2::2::1
       18::32::23                   18::32::23
       11::17::2                    11::17::2
       3::3::2                      3::3::2
       14::25::19                   16::28::19
       1::0::1                      1::0::1
       39::66::33                   39::66::33
       1::0::2                      1::0::2
       22::36::4                    22::36::4
       2::2::1                      2::2::1
-
Structural Comparison
Finding more fixedpoints (3)
Fps    Signature Set of Binary A    Signature Set of Binary B
       12::19::7                    12::19::7
       2::2::5                      2::2::5
       5::7::0                      5::7::0
       2::2::1                      2::2::1
       18::32::23                   18::32::23
       11::17::2                    11::17::2
       3::3::2                      3::3::2
       14::25::19                   16::28::19
       1::0::1                      1::0::1
       1::0::2                      1::0::2
       22::36::4                    22::36::4
       2::2::1                      2::2::1
-
Structural Comparison
Finding more fixedpoints (4)
Fps    Signature Set of Binary A    Signature Set of Binary B
       12::19::7                    12::19::7
       2::2::5                      2::2::5
       5::7::0                      5::7::0
       2::2::1                      2::2::1
       11::17::2                    11::17::2
       3::3::2                      3::3::2
       14::25::19                   16::28::19
       1::0::1                      1::0::1
       1::0::2                      1::0::2
       22::36::4                    22::36::4
       2::2::1                      2::2::1
-
Structural Comparison
Finding more fixedpoints (6)
Fps    Signature Set of Binary A    Signature Set of Binary B
       2::2::1                      2::2::1
       14::25::19                   16::28::19
       2::2::1                      2::2::1
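The elimination shown on these slides can be approximated as follows: a signature that occurs exactly once in each binary yields a fixed point, and everything else stays behind for later steps. This is a sketch under assumptions, not the tool's actual code; the function names (a1, b1, ...) are hypothetical, and the signatures are a subset of those on the slides.

```python
from collections import Counter


def match_unique_signatures(sigs_a, sigs_b):
    """Pair functions whose signature is unique in both binaries.

    sigs_a / sigs_b map a function name to its signature tuple.
    Returns (fixed_points, unmatched_a, unmatched_b).
    """
    count_a = Counter(sigs_a.values())
    count_b = Counter(sigs_b.values())
    # Signatures that occur exactly once in B, indexed by signature.
    unique_b = {sig: name for name, sig in sigs_b.items() if count_b[sig] == 1}

    fixed_points = {}
    for name_a, sig in sigs_a.items():
        if count_a[sig] == 1 and sig in unique_b:
            fixed_points[name_a] = unique_b[sig]

    matched_b = set(fixed_points.values())
    unmatched_a = {n: s for n, s in sigs_a.items() if n not in fixed_points}
    unmatched_b = {n: s for n, s in sigs_b.items() if n not in matched_b}
    return fixed_points, unmatched_a, unmatched_b


# Hypothetical names; signatures taken from the slides.
binary_a = {"a1": (39, 66, 33), "a2": (18, 32, 23), "a3": (14, 25, 19),
            "a4": (2, 2, 1), "a5": (2, 2, 1)}
binary_b = {"b1": (39, 66, 33), "b2": (18, 32, 23), "b3": (16, 28, 19),
            "b4": (2, 2, 1), "b5": (2, 2, 1)}

fixed, left_a, left_b = match_unique_signatures(binary_a, binary_b)
print(fixed)   # the two unique signatures become fixed points
print(left_a)  # the changed function and the 2::2::1 collisions remain
```

What remains matches the last slide of the sequence: the colliding 2::2::1 entries, and one function whose signature differs between the binaries (14::25::19 versus 16::28::19), which is the candidate for a real change.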
-
Structural Comparison
Heuristic signatures for the functions: More fixedpoints
In the next step, signatures that occur more than once in both binaries are eliminated by using the callgraph:
[Figure: callgraph fragments of both binaries, with nodes labelled 12::19::7, 5::7::0 and 2::2::1]
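One way to read "eliminated by using the callgraph" is: if two functions are already fixed points, and each calls exactly one still-unmatched function with a given signature, those two callees are probably the same function. The sketch below implements only that reading, looks only at callees (not callers), and uses invented function names; the slides do not spell out the exact rule.

```python
def propagate_fixed_points(fixed, calls_a, calls_b, sigs_a, sigs_b):
    """Extend fixed points (name in A -> name in B) along call edges.

    calls_a / calls_b map a function to the set of functions it calls;
    sigs_a / sigs_b map a function to its signature tuple.
    """
    fixed = dict(fixed)
    changed = True
    while changed:
        changed = False
        for parent_a, parent_b in list(fixed.items()):
            matched_b = set(fixed.values())
            kids_a = sorted(f for f in set(calls_a.get(parent_a, ()))
                            if f not in fixed)
            kids_b = sorted(f for f in set(calls_b.get(parent_b, ()))
                            if f not in matched_b)
            for kid_a in kids_a:
                sig = sigs_a[kid_a]
                same_a = [f for f in kids_a if sigs_a[f] == sig]
                same_b = [f for f in kids_b if sigs_b[f] == sig]
                # Unambiguous only if each parent has one such callee.
                if len(same_a) == 1 and len(same_b) == 1:
                    fixed[kid_a] = same_b[0]
                    changed = True
                    break  # candidate lists are stale; recompute them
    return fixed


# Hypothetical callgraphs: two 2::2::1 leaves collide on signature alone.
sigs_a = {"a_parent": (12, 19, 7), "a_other": (5, 7, 0),
          "a_leaf1": (2, 2, 1), "a_leaf2": (2, 2, 1)}
sigs_b = {"b_parent": (12, 19, 7), "b_other": (5, 7, 0),
          "b_leaf1": (2, 2, 1), "b_leaf2": (2, 2, 1)}
calls_a = {"a_parent": {"a_leaf1"}, "a_other": {"a_leaf2"}}
calls_b = {"b_parent": {"b_leaf1"}, "b_other": {"b_leaf2"}}

resolved = propagate_fixed_points(
    {"a_parent": "b_parent", "a_other": "b_other"},
    calls_a, calls_b, sigs_a, sigs_b)
print(resolved)
```

Each new fixed point can unlock further matches below it, which is why the loop repeats until nothing changes.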
-
Structural Comparison
More Fixedpoints
It's clear that smaller subsets of functions will have fewer collisions, and thus generate more fixedpoints. One can restrict the sizes of the examined sets using many different characteristics:
   Same indegree in the callgraph
   Same outdegree in the callgraph
   Reference the same string in both binaries ()
-
Structural Comparison
Small Prime Products
Compilers tend to re-order instructions for better pipelining and optimization a lot
A quick algorithm to see if two sequences of instructions contain the same instructions, just re-ordered, would be nice
A signature that contains the instructions but not their order and is reasonably short/easy to represent
Small Prime Products (SPP):
   Each instruction type (mov, lea etc.) is assigned a small prime number
   For a given sequence, all the primes corresponding to the instructions in the sequence are multiplied together
   Because of uniqueness of prime number decompositions and because a*b = b*a we have all possible permutations covered
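A small sketch of the SPP idea, to make the permutation argument concrete. The assignment of primes to mnemonics (first seen gets 2, next gets 3, and so on) and the class name are assumptions made for illustration; the slides only say that each instruction type gets a small prime.

```python
from itertools import count


def prime_stream():
    """Yield 2, 3, 5, ... by trial division against earlier primes."""
    found = []
    for candidate in count(2):
        if all(candidate % p for p in found):
            found.append(candidate)
            yield candidate


class SmallPrimeProduct:
    """Order-independent signature for a sequence of mnemonics."""

    def __init__(self):
        self._primes = prime_stream()
        self._prime_for = {}

    def prime(self, mnemonic):
        # Assumption: primes are handed out in order of first appearance.
        if mnemonic not in self._prime_for:
            self._prime_for[mnemonic] = next(self._primes)
        return self._prime_for[mnemonic]

    def product(self, mnemonics):
        result = 1
        for mnemonic in mnemonics:
            result *= self.prime(mnemonic)
        return result


spp = SmallPrimeProduct()
block_a = ["mov", "lea", "push", "call"]
block_b = ["push", "mov", "call", "lea"]   # same instructions, re-ordered
block_c = ["mov", "mov", "push", "call"]   # lea replaced by a second mov

print(spp.product(block_a) == spp.product(block_b))  # same multiset
print(spp.product(block_a) == spp.product(block_c))  # different multiset
```

Python integers do not overflow, so the uniqueness argument holds exactly here. A fixed-width implementation would have to decide what to do on overflow, and letting the product wrap around weakens that guarantee.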
-
Structural Comparison
Another fun example: SCHANNEL.DLL
i. No information except "bug in PCT parsing"
ii. Running BinDiff yielded a bunch of results, but only one that had PCT in the function name
iii. Funny bug: overwrite EIP with anything, including NULL bytes
iv. Total time invested to locate and understand the bug after receiving the patch: less than one hour
v. More about this later in this talk
vi. For those into reading more in-depth papers:
http://www.sabre-security.com/files/dimva_paper2.pdf
-
Porting Symbols
Malware analysis
More and more malware hits the networks every day
Malware authors have adapted to a multi-version development cycle
The source code of Agobot etc. was widely distributed, and many special variants exist
The presented algorithms can be used to:
   Re-use information from a disassembly of an earlier variant of a piece of malware
   Measure similarity between code bases, and thus uncover cooperation between malware authors
   In the future: automatically classify new malware into the correct malware family, by measuring code similarity & clustering
-
Porting Symbols
Malware analysis (Example)
-
Porting Symbols
Porting library symbols into ROMs
When analyzing embedded systems (or in fact any sort of binary), a lot of code one encounters is clearly from open sources
Things like OpenSSL are VERY common in many applications
It would be nice if we could make use of that information
We can compile OpenSSL with debug symbols, and back-port the debug symbols into our disassembly, even though the compiler was different
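Once the comparison has produced fixed points between a symbol-rich reference build and the stripped target, porting names is little more than a dictionary walk. The sketch below assumes plain dicts rather than any disassembler API; the addresses, the function name port_symbols and the rule of only overwriting auto-generated "sub_" names are illustrative choices, not something the slides prescribe.

```python
def port_symbols(fixed_points, reference_names, target_names):
    """Copy names from a reference build onto matched target functions.

    fixed_points maps a reference function address to the matching
    target function address. Names a human already assigned in the
    target are left alone; only empty or auto-generated ones
    (assumed here to start with "sub_") are replaced.
    """
    ported = dict(target_names)
    for ref_addr, target_addr in fixed_points.items():
        name = reference_names.get(ref_addr)
        current = ported.get(target_addr, "")
        if name and (not current or current.startswith("sub_")):
            ported[target_addr] = name
    return ported


# Hypothetical addresses: reference = library built with debug symbols,
# target = stripped ROM image.
reference_names = {0x1000: "SSL_read", 0x1100: "SSL_write"}
target_names = {0x8000: "sub_8000", 0x8200: "sub_8200",
                0x9000: "my_named_func"}
fixed_points = {0x1000: 0x8200, 0x1100: 0x9000}

ported_names = port_symbols(fixed_points, reference_names, target_names)
for addr, name in sorted(ported_names.items()):
    print(hex(addr), name)
```

Keeping manually assigned names is a conservative default: a wrong match then costs nothing, while a correct one still fills in the unnamed functions.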
-
Porting Symbols
Porting library symbols into ROMs
Example
-
Navigating Binaries
Road Maps as analogy for programs
-
Navigating Binaries
Road Maps as analogy for programs
i. When driving a car, a frequent problem is getting from the current location of the car to another location
ii. Similar problems have to be solved in security analysis:
   i. A patch for a problem changes a function somewhere deep in program logic and fixes a security problem
   ii. A (static) analysis tool detects a problem somewhere deep in program logic
iii. In both cases, the problem of finding a way from point A to point B has to be solved
   i. Why use textual representations such as "road A leads to road B" etc. for program analysis? There is a good reason we invented maps
-
Navigating Binaries
The importance of visualisation
i. Programs are huge and not easily understood
ii. Most of the human brain is built to react to visual stimulus
   i. Humans are more suited to recognize food than to keep large graphs in their head
   ii. Recognition tasks are a lot faster than memory tasks, meaning that reading code dependencies is significantly slower than seeing them
iii. Many problems are data visualisation problems
iv. Good ways to visualise function dependencies can yield better understanding
-
Navigating Binaries
The differences between road maps and programs
i. Road crossings usually have an outdegree of less than 4
   i. Many functions call many more than just 4 subfunctions
ii. Road crossings usually have an indegree of less than 4
   i. Many functions get called from a lot more than 4 functions
iii. When driving a car, one usually does not drive down many dead ends
   i. A subfunction call that does not lead to the desired location immediately will nonetheless be executed
Useful analogy, but use with care
-
Navigating Binaries
Restructuring callgraphs
Example Graph
-
Navigating Binaries
Restructuring callgraphs
Example Graph (Restructured)
-
Navigating Binaries
Getting your bearings in an unknown binary
Auditing binaries is like harpooning in cold, dark water:
   You have a hard time telling what your surroundings look like
   You have a hard time telling where exactly you are
A map is only helpful if you can somehow find out where on the map you yourself are
A callgraph is a map of the executable
We can find out where we are by setting echo breakpoints: breakpoints in a small debugger that, when hit, send the address that was hit over the wire
-
Navigating Binaries
Navigating using echo
i. Same concept as presented at BH Vegas 2002
ii. Set BPX on all nodes (remove upon hit)
iii. Visualize the results in a graph
   i. Highlight all functions that have been hit
   ii. Remove nodes that cannot be reached from the nodes that have been hit
   iii. Play back the path the program takes as a movie
   iv. Attempt to navigate to a certain location using your map and your own location
Navigation can be significantly improved
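The pruning step (dropping nodes that cannot be reached from the functions that were hit) is a plain graph reachability problem. A minimal sketch, assuming the callgraph is a dict from function name to callees and the debugger has already reported the hit functions; all names below are made up.

```python
from collections import deque


def reachable_from(callgraph, start_nodes):
    """Breadth-first search over call edges starting at the hit nodes."""
    seen = set(start_nodes)
    queue = deque(start_nodes)
    while queue:
        node = queue.popleft()
        for callee in callgraph.get(node, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def prune_to_hits(callgraph, hit):
    """Keep only the hit functions and whatever they can reach."""
    keep = reachable_from(callgraph, hit)
    return {f: [c for c in callgraph.get(f, []) if c in keep] for f in keep}


# Hypothetical callgraph and breakpoint hits.
callgraph = {
    "main": ["parse", "usage"],
    "parse": ["read_pkt"],
    "read_pkt": ["memcpy"],
    "usage": ["printf"],
    "admin": ["reset"],
}
hit = ["parse", "read_pkt"]

pruned = prune_to_hits(callgraph, hit)
print(sorted(pruned))
```

Functions above the hit nodes (main in this example) disappear as well, which follows the slide's wording literally; a viewer that wants to keep the callers would run the same search over reversed edges.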
-
Navigating Binaries
Other uses (Library Functions II)
i. We can identify library functions just based on their properties in the graph (indegree/outdegree)
ii. Library functions are of special interest when auditing code in order to find vulnerabilities:
   i. Library functions are called from many locations: if they contain a bug, it is likely that a function with high indegree is reachable from a location that we can hit
   ii. Library functions are called from many locations: if they react badly to certain inputs, the sheer number of calls increases the odds that one can get malicious input in
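Ranking functions by callgraph indegree is easy to try out. The sketch below uses indegree only (the slides also mention outdegree), and both the threshold of 3 and the function names are arbitrary assumptions for the example.

```python
from collections import Counter


def indegrees(callgraph):
    """Count, for every callee, how many distinct functions call it."""
    counts = Counter()
    for caller, callees in callgraph.items():
        for callee in set(callees):
            if callee != caller:  # ignore direct recursion
                counts[callee] += 1
    return counts


def likely_library_functions(callgraph, min_indegree=3):
    """Return functions with high indegree, most-called first."""
    ranked = sorted(indegrees(callgraph).items(),
                    key=lambda item: (-item[1], item[0]))
    return [name for name, degree in ranked if degree >= min_indegree]


# Hypothetical callgraph: copy_str and log are called from everywhere.
callgraph = {
    "main": ["parse", "log"],
    "parse": ["copy_str", "log"],
    "handle": ["copy_str", "log"],
    "reply": ["copy_str"],
}

candidates = likely_library_functions(callgraph)
print(candidates)
```

The resulting list is a starting point for an audit rather than a classification: a heavily used application helper scores just as high as a true library routine.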
-
End of the talk
Any questions?