BinHunt: Automatically Finding Semantic Differences in Binary Programs
Debin Gao
Michael K. Reiter
Dawn Song
ICICS 2008: 10th International Conference on Information and Communications Security
Conference
ICICS: A bi-annual International Conference on Information,
Communications and Signal Processing. The conference covers areas in Information Engineering, Communication Systems, Signal Processing, Multimedia Processing and Applications.
Papers
Session V: Software security
BinHunt: Automatically Finding Semantic Differences in Binary Programs
Debin Gao (a), Mike Reiter (b) and Dawn Song (c)
Enhancing Java ME Security Support with Resource Usage Monitoring
Paolo Mori, Fabio Martinelli, Alessandro Castrucci and Francesco Roperti
IIT-CNR, Italy
Pseudo-randomness Inside Web Browsers
Guan Zhi, Zhang Long, Zhong Chen and Nan Xianghao
Peking University, China
Author
Debin Gao
Michael K. Reiter
Dawn Song
Debin Gao
Automatically Adapting a Trained Anomaly Detector to Software Patches
Peng Li, Debin Gao and Michael K. Reiter
In Proceedings of the 12th International Symposium on Recent Advances in Intrusion Detection (RAID 2009)
Bridging the Gap between Data-flow and Control-flow Analysis for Anomaly Detection
Peng Li, Hyundo Park, Debin Gao and Jianming Fu
In Proceedings of the 24th Annual Computer Security Applications Conference (ACSAC 2008)
Gray-Box Extraction of Execution Graphs for Anomaly Detection
Debin Gao, Michael K. Reiter and Dawn Song
In Proceedings of the 11th ACM Conference on Computer and Communications Security (CCS 2004)
On Gray-Box Program Tracking for Anomaly Detection
Debin Gao, Michael K. Reiter and Dawn Song
In Proceedings of the 13th USENIX Security Symposium (USENIX Security 2004)
Assistant Professor
School of Information Systems
Singapore Management University
Michael K. Reiter
Automatically adapting a trained anomaly detector to software patches
P. Li, D. Gao and M. K. Reiter
In Recent Advances in Intrusion Detection, 12th International Symposium, RAID 2009
Fast and black-box exploit detection and signature generation for commodity software
X. Wang, Z. Li, J. Y. Choi, J. Xu, M. K. Reiter and C. Kil
ACM Transactions on Information and System Security 12(2)
On gray-box program tracking for anomaly detection
D. Gao, M. K. Reiter and D. Song
In Proceedings of the 13th USENIX Security Symposium
Lawrence M. Slifkin Distinguished Professor
Department of Computer Science
University of North Carolina at Chapel Hill
Dawn Song
Research Projects
BitBlaze: Binary analysis for COTS protection and malicious code defense
Binary Code Extraction and Interface Identification for Security Applications. Juan Caballero, Noah M. Johnson, Stephen McCamant, and Dawn Song. In Proceedings of the 17th Annual Network and Distributed System Security Symposium, February 2010.
Loop-Extended Symbolic Execution on Binary Programs. Prateek Saxena, Pongsin Poosankam, Stephen McCamant, and Dawn Song. In Proceedings of the ACM/SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), July 2009.
BitBlaze: A New Approach to Computer Security via Binary Analysis. Dawn Song, David Brumley, Heng Yin, Juan Caballero, Ivan Jager, Min Gyung Kang, Zhenkai Liang, James Newsome, Pongsin Poosankam, and Prateek Saxena. In Proceedings of the 4th International Conference on Information Systems Security
Associate Professor
Computer Science Division
University of California, Berkeley
Introduction
BinHunt:
It bases its analysis on the control flow of the programs using a new graph isomorphism technique, symbolic execution, and theorem proving for finding semantic differences in binary programs.
Semantic differences:
changes in the program functionality
Syntactic differences:
e.g., different register allocation and basic block re-ordering
Challenge
A small change in the source code may cause the compiler to use a different register allocation in other parts of the program in which the corresponding source code remains the same
A small change in the source code may change the size of a small number of basic blocks, which further triggers the compiler to re-order many other basic blocks in the binary file
Idea
The control flow of a program is much more resistant to “superficial” changes like different register allocations and basic block re-ordering, and therefore is a more attractive feature for finding semantic differences
Assumption
the source code of the binary files is not available
function names extracted from these binary files are unreliable for the purpose of binary difference analysis, since they can be changed easily
System Overview (1)
Input: two binary files
Output: a matching between functions in the two binary files
a matching between basic blocks in two matched functions
a matching strength for each match of functions or basic blocks
System Overview (2)
Decision:
The matchings together with the matching strengths tell us where the semantic differences are. Unmatched functions and unmatched basic blocks, as well as matched functions and matched basic blocks with low matching strengths, constitute the semantic differences found between the two binary files.
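The decision rule above can be sketched in a few lines. This is only an illustration: the function name `semantic_differences`, the dictionary layout, and the 0.9 cut-off for a "low" matching strength are assumptions made here, not values stated in the slides.

```python
def semantic_differences(items1, items2, matches, threshold=0.9):
    """Collect candidate semantic differences from a matching.

    items1, items2: functions (or basic blocks) of the two binaries.
    matches: dict mapping (item1, item2) -> matching strength.
    threshold: assumed cut-off below which a match counts as "low".
    """
    matched1 = {a for a, _ in matches}
    matched2 = {b for _, b in matches}
    return {
        "unmatched_in_first": sorted(set(items1) - matched1),
        "unmatched_in_second": sorted(set(items2) - matched2),
        "weak_matches": sorted(p for p, s in matches.items() if s < threshold),
    }


# Hypothetical matching between the functions of two binaries.
report = semantic_differences(
    ["f", "g", "h"],
    ["f2", "g2", "k"],
    {("f", "f2"): 1.0, ("g", "g2"): 0.6},
)
```

Here `h` and `k` are reported as unmatched, and the pair (`g`, `g2`) is reported as a weak match.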
Disassembler
parse each binary file
locate the code segment
Realization:
Implement a plug-in to IDA Pro
IR Converter
IR: a dozen different statements, which are type-checked and free of side effects
Easy: our symbolic execution and theorem proving are applied on a much simpler set of instructions
Reliable: reduce the language variation in performing the same functionality
CFG Constructor
CFG: a set of nodes each representing a basic block and a set of directed edges representing the control flow among the basic blocks
CG: the set of nodes corresponding to the functions in the file and the set of directed edges representing calls among the functions
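The two graph definitions above can be written down as plain data structures. The sketch below is illustrative only; the class and method names (`ControlFlowGraph`, `CallGraph`, `add_edge`, and so on) are hypothetical and are not taken from BinHunt.

```python
from dataclasses import dataclass, field


@dataclass
class ControlFlowGraph:
    """CFG of one function: nodes are basic blocks, edges are control flow."""
    name: str
    blocks: dict = field(default_factory=dict)  # block id -> list of IR statements
    edges: set = field(default_factory=set)     # (source block id, target block id)

    def add_block(self, block_id, statements):
        self.blocks[block_id] = list(statements)

    def add_edge(self, src, dst):
        # Both endpoints must already be basic blocks of this function.
        if src not in self.blocks or dst not in self.blocks:
            raise KeyError("unknown basic block")
        self.edges.add((src, dst))

    def successors(self, block_id):
        return sorted(dst for src, dst in self.edges if src == block_id)


@dataclass
class CallGraph:
    """CG of one binary: nodes are functions, edges are calls between them."""
    functions: dict = field(default_factory=dict)  # function name -> ControlFlowGraph
    calls: set = field(default_factory=set)        # (caller name, callee name)

    def add_function(self, cfg):
        self.functions[cfg.name] = cfg

    def add_call(self, caller, callee):
        if caller not in self.functions or callee not in self.functions:
            raise KeyError("unknown function")
        self.calls.add((caller, callee))


# Tiny example: an if/else diamond inside a function that main calls.
check = ControlFlowGraph("check")
for block_id in ("entry", "then", "else", "exit"):
    check.add_block(block_id, [])
check.add_edge("entry", "then")
check.add_edge("entry", "else")
check.add_edge("then", "exit")
check.add_edge("else", "exit")

cg = CallGraph()
cg.add_function(ControlFlowGraph("main"))
cg.add_function(check)
cg.add_call("main", "check")
```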
Graph Isomorphism Engine
Basic Block Comparison
Symbolic Execution and Theorem Proving
Maximum common subgraph isomorphism problem
Backtracking Algorithm
Symbolic Execution
Definition
represent the values of program variables with symbolic values instead of concrete (initialized) data, and manipulate expressions involving symbolic values
Procedure
Step1:
find all the input and output registers and variables
Step2:
use symbolic execution to represent the final values of the output registers and variables
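The two steps can be illustrated on a toy intermediate representation. The statement format `(dest, op, a, b)` and the function name `symbolic_execute` are assumptions made for this sketch; they are not BinHunt's IR. In this simplified version, temporaries written inside the block are also listed as outputs.

```python
def symbolic_execute(block):
    """Symbolically execute a straight-line basic block.

    Each statement is (dest, op, a, b), where a and b are register or
    variable names, or integer constants. Returns (inputs, outputs):
    inputs lists the names read before being written, and outputs maps
    every written name to an expression tree over the symbolic inputs.
    """
    env = {}      # name -> current symbolic expression
    inputs = []   # Step 1: names read before they are written

    def value(operand):
        if isinstance(operand, int):
            return operand
        if operand not in env:
            # First read of a name never written so far: a block input.
            inputs.append(operand)
            env[operand] = ("in", operand)
        return env[operand]

    written = []
    for dest, op, a, b in block:
        expr = (op, value(a), value(b))
        env[dest] = expr
        if dest not in written:
            written.append(dest)

    # Step 2: final symbolic value of every written register/variable.
    outputs = {name: env[name] for name in written}
    return inputs, outputs


# t = eax + ebx; eax = t * 2
inputs, outputs = symbolic_execute([
    ("t", "add", "eax", "ebx"),
    ("eax", "mul", "t", 2),
])
```

After this run, the final value of `eax` is the expression `(eax_in + ebx_in) * 2`, expressed purely in terms of the block's inputs.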
Theorem Proving
Realization
STP: a decision procedure for the satisfiability of quantifier-free formulas in the theory of bit-vectors and arrays
Procedure
pick the symbolic representation of one register/variable from each basic block and use STP to test if they are equivalent, assuming that the inputs to the basic blocks share the same values
Assurance
if two basic blocks are found to be different by our technique of symbolic execution and theorem proving, then they must not be functionally equivalent
This property holds even if the two binary files are compiled using different compilers or compiler options.
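The equivalence test can be illustrated without a solver. The sketch below does not call STP; as a stand-in it enumerates every input value of a toy 8-bit machine, which decides the same question for very small widths only. The expression-tree format, the operator table, and the names `evaluate` and `equivalent` are all assumptions of this sketch.

```python
from itertools import product

WIDTH = 8                # assumed toy register width, chosen to keep enumeration cheap
MASK = (1 << WIDTH) - 1

OPS = {
    "add": lambda x, y: (x + y) & MASK,
    "sub": lambda x, y: (x - y) & MASK,
    "mul": lambda x, y: (x * y) & MASK,
    "shl": lambda x, y: (x << (y % WIDTH)) & MASK,
    "xor": lambda x, y: x ^ y,
}


def evaluate(expr, inputs):
    """Evaluate an expression tree under a concrete assignment of the inputs."""
    if isinstance(expr, int):
        return expr & MASK
    if expr[0] == "in":
        return inputs[expr[1]]
    op, a, b = expr
    return OPS[op](evaluate(a, inputs), evaluate(b, inputs))


def equivalent(expr1, names1, expr2, names2):
    """Check that two output expressions agree on all input values.

    names1[i] and names2[i] are assumed to hold the same value, so the
    check can succeed even when the two blocks use different registers.
    """
    for values in product(range(MASK + 1), repeat=len(names1)):
        env1 = dict(zip(names1, values))
        env2 = dict(zip(names2, values))
        if evaluate(expr1, env1) != evaluate(expr2, env2):
            return False
    return True


# eax + eax in one block, ecx << 1 in the other: same function, different registers.
doubled_by_add = ("add", ("in", "eax"), ("in", "eax"))
doubled_by_shift = ("shl", ("in", "ecx"), 1)
incremented = ("add", ("in", "ecx"), 1)

same = equivalent(doubled_by_add, ["eax"], doubled_by_shift, ["ecx"])
different = equivalent(doubled_by_add, ["eax"], incremented, ["ecx"])
```

Exhaustive enumeration grows exponentially with the register width and the number of inputs, which is why a decision procedure for bit-vectors is used instead on real code.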
Matching Strength
Basic Block
1.0: functionally equivalent and registers used are the same
0.9: functionally equivalent while registers used are different
lower: scored on how functionally equivalent they are
Function
1.0: instructions (x86 or IR) of the two functions are the same
others: subgraph measurement divided by the number of nodes in the CFG that has fewer nodes, where subgraph measurement is defined as the summation of matching strengths of matched nodes (basic blocks)
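The scoring rules above translate directly into arithmetic. In the sketch below the function names and parameters are hypothetical, and since the slides do not say how the "lower" basic block scores are computed, that value is simply passed in by the caller as `partial_score`.

```python
def block_matching_strength(equivalent, same_registers, partial_score=0.0):
    """Matching strength of a pair of basic blocks."""
    if equivalent:
        return 1.0 if same_registers else 0.9
    # Assumption: a non-equivalent pair gets a caller-supplied score below 0.9.
    if not 0.0 <= partial_score < 0.9:
        raise ValueError("partial_score must be in [0.0, 0.9)")
    return partial_score


def function_matching_strength(identical_instructions, matched_block_strengths,
                               node_count1, node_count2):
    """Matching strength of a pair of functions."""
    if identical_instructions:
        return 1.0
    smaller = min(node_count1, node_count2)
    if smaller == 0:
        raise ValueError("a CFG needs at least one basic block")
    # Subgraph measurement: sum of the strengths of the matched basic blocks.
    return sum(matched_block_strengths) / smaller


# CFGs with 4 and 5 basic blocks, three of which were matched.
blocks = [
    block_matching_strength(True, True),
    block_matching_strength(True, False),
    block_matching_strength(True, False),
]
strength = function_matching_strength(False, blocks, 4, 5)
```

In this example the subgraph measurement is 1.0 + 0.9 + 0.9 and the smaller CFG has 4 nodes, so the function matching strength is about 0.7.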
Backtracking Algorithm
D:
contains all possible pairs of nodes that might still be matched (initially V X M)
M:
contains matched node pairs (initially empty)
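The roles of D and M can be seen in a small backtracking search for a maximum common subgraph. This is a simplified sketch, not BinHunt's algorithm: it maximises the number of matched nodes rather than the sum of matching strengths, it assumes D starts as every node pair from the two graphs, and the names `max_common_subgraph` and `compatible` are hypothetical.

```python
def max_common_subgraph(nodes1, edges1, nodes2, edges2,
                        compatible=lambda a, b: True):
    """Backtracking search for a largest edge-consistent node matching.

    edges1 and edges2 are sets of directed (source, target) pairs.
    compatible(a, b) says whether node a may be matched with node b at all.
    """
    best = []

    def consistent(a, b, matched):
        # (a, b) may join M only if its edges to every pair already in M
        # agree in both graphs, in both directions.
        for x, y in matched:
            if ((a, x) in edges1) != ((b, y) in edges2):
                return False
            if ((x, a) in edges1) != ((y, b) in edges2):
                return False
        return True

    def search(candidates, matched):
        nonlocal best
        if len(matched) > len(best):
            best = list(matched)
        # Bound: each remaining node can add at most one pair to M.
        left1 = {a for a, _ in candidates}
        left2 = {b for _, b in candidates}
        if len(matched) + min(len(left1), len(left2)) <= len(best):
            return
        for i, (a, b) in enumerate(candidates):
            if not consistent(a, b, matched):
                continue
            # Shrink D: drop pairs that reuse a or b, and keep only later
            # pairs so the same set is not explored in another order.
            rest = [(c, d) for c, d in candidates[i + 1:] if c != a and d != b]
            search(rest, matched + [(a, b)])

    d = [(a, b) for a in nodes1 for b in nodes2 if compatible(a, b)]
    search(d, [])
    return best


# A three-node chain against a graph that contains the chain plus one extra node.
m = max_common_subgraph(
    [1, 2, 3], {(1, 2), (2, 3)},
    ["x", "y", "z", "w"], {("x", "y"), ("y", "z"), ("x", "w")},
)
```

Here the chain 1, 2, 3 is matched to x, y, z, and w stays unmatched. A `compatible` predicate based on basic block comparison would prune D before the search starts.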
Case Study: gzip
Case Study: tar (1)
Case Study: tar (2)
Case Study: tar (3)
Related Work & Conclusion
BinDiff/BindView:
construct a maximal subgraph isomorphism between the sets of functions in two versions of the same executable file
BinHunt:
contribute a more thorough technique (backtracking technique) for identifying the maximum common subgraph isomorphism
use a novel technique for basic block comparison using symbolic execution and theorem proving
Reference
Thank you!