BinHunt: Automatically Finding Semantic Differences in Binary Programs. Debin Gao, Michael K. Reiter, Dawn Song. ICICS 2008: 10th International Conference on Information and Communications Security

Post on 18-Dec-2015

BinHunt: Automatically Finding Semantic Differences in Binary Programs

Debin Gao

Michael K. Reiter

Dawn Song

ICICS 2008: 10th International Conference on Information and Communications Security

Conference

ICICS: the International Conference on Information and Communications Security. ICICS 2008 was the 10th conference in the series.

Papers

Session V: Software security

BinHunt: Automatically Finding Semantic Differences in Binary Programs. Debin Gao (a), Mike Reiter (b) and Dawn Song (c).

Enhancing Java ME Security Support with Resource Usage Monitoring. Paolo Mori, Fabio Martinelli, Alessandro Castrucci and Francesco Roperti. IIT-CNR, Italy.

Pseudo-randomness Inside Web Browsers. Guan Zhi, Zhang Long, Zhong Chen and Nan Xianghao. Peking University, China.

Authors

Debin Gao

Michael K. Reiter

Dawn Song

Debin Gao

Automatically Adapting a Trained Anomaly Detector to Software Patches. Peng Li, Debin Gao and Michael K. Reiter. In Proceedings of the 12th International Symposium on Recent Advances in Intrusion Detection (RAID 2009).

Bridging the Gap between Data-flow and Control-flow Analysis for Anomaly Detection. Peng Li, Hyundo Park, Debin Gao and Jianming Fu. In Proceedings of the 24th Annual Computer Security Applications Conference (ACSAC 2008).

Gray-Box Extraction of Execution Graphs for Anomaly Detection. Debin Gao, Michael K. Reiter and Dawn Song. In Proceedings of the 11th ACM Conference on Computer and Communications Security (CCS 2004).

On Gray-Box Program Tracking for Anomaly Detection. Debin Gao, Michael K. Reiter and Dawn Song. In Proceedings of the 13th USENIX Security Symposium (USENIX Security 2004).

Assistant Professor, School of Information Systems

Singapore Management University

Michael K. Reiter

Automatically adapting a trained anomaly detector to software patches. P. Li, D. Gao and M. K. Reiter. In Recent Advances in Intrusion Detection, 12th International Symposium, RAID 2009.

Fast and black-box exploit detection and signature generation for commodity software. X. Wang, Z. Li, J. Y. Choi, J. Xu, M. K. Reiter and C. Kil. ACM Transactions on Information and System Security 12(2).

On gray-box program tracking for anomaly detection. D. Gao, M. K. Reiter and D. Song. In Proceedings of the 13th USENIX Security Symposium.

Lawrence M. Slifkin Distinguished Professor, Department of Computer Science

University of North Carolina at Chapel Hill

Dawn Song

Research Projects

BitBlaze: Binary analysis for COTS protection and malicious code defense

Binary Code Extraction and Interface Identification for Security Applications. Juan Caballero, Noah M. Johnson, Stephen McCamant, and Dawn Song. In Proceedings of the 17th Annual Network and Distributed System Security Symposium, February 2010.

Loop-Extended Symbolic Execution on Binary Programs. Prateek Saxena, Pongsin Poosankam, Stephen McCamant, and Dawn Song. In Proceedings of the ACM/SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), July 2009.

BitBlaze: A New Approach to Computer Security via Binary Analysis. Dawn Song, David Brumley, Heng Yin, Juan Caballero, Ivan Jager, Min Gyung Kang, Zhenkai Liang, James Newsome, Pongsin Poosankam, and Prateek Saxena. In Proceedings of the 4th International Conference on Information Systems Security

Associate Professor, Computer Science Division

University of California, Berkeley

Introduction

BinHunt:

BinHunt bases its analysis on the control flow of the programs, using a new graph isomorphism technique, symbolic execution, and theorem proving to find semantic differences in binary programs.

Semantic differences:

changes in program functionality

Syntactic differences:

changes that leave functionality intact, e.g. different register allocation and basic block re-ordering

Challenge

A small change in the source code may cause the compiler to use a different register allocation in other parts of the program in which the corresponding source code remains the same

A small change in the source code may change the size of a small number of basic blocks, which further triggers the compiler to re-order many other basic blocks in the binary file

Idea

The control flow of a program is much more resistant to “superficial” changes like different register allocations and basic block re-ordering, and therefore is a more attractive feature for finding semantic differences

Assumptions

The source code of the binary files is not available.

Function names extracted from the binary files are unreliable for binary difference analysis, since they can be changed easily.

System Overview (1)

Input: two binary files

Output:

a matching between functions in the two binary files

a matching between basic blocks in each pair of matched functions

a matching strength for each matched pair of functions or basic blocks

System Overview (2)

Decision:

The matchings, together with the matching strengths, tell us where the semantic differences are. Unmatched functions and unmatched basic blocks, as well as matched functions and matched basic blocks with low matching strengths, constitute the semantic differences found between the two binary files.
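
The decision rule above can be sketched in a few lines. This is a minimal illustration, not BinHunt's code: the `Match` record, the `semantic_differences` helper, and the 0.9 cut-off for a "low" matching strength are all assumptions made for the example, since the slides do not state a threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    left: str        # function or basic block in the first binary
    right: str       # its counterpart in the second binary
    strength: float  # matching strength in [0, 1]


def semantic_differences(left_items, right_items, matches, threshold=0.9):
    """Return (unmatched_left, unmatched_right, weak_matches).

    threshold is a hypothetical cut-off: matches strictly below it count as weak.
    """
    matched_left = {m.left for m in matches}
    matched_right = {m.right for m in matches}
    unmatched_left = sorted(set(left_items) - matched_left)
    unmatched_right = sorted(set(right_items) - matched_right)
    weak = [m for m in matches if m.strength < threshold]
    return unmatched_left, unmatched_right, weak


# Hypothetical function names and strengths, for illustration only.
matches = [Match("f1", "g1", 1.0), Match("f2", "g2", 0.9), Match("f3", "g3", 0.4)]
report = semantic_differences(["f1", "f2", "f3", "f4"], ["g1", "g2", "g3"], matches)
print(report)
```

Here `f4` is reported as unmatched and the pair `f3`/`g3` as a weak match; the same helper applies to basic blocks inside a pair of matched functions.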

Disassembler

parse each binary file

locate the code segment

Realization:

implemented as a plug-in to IDA Pro

IR Converter

IR (intermediate representation): about a dozen different statements, which are type-checked and free of side effects

Easy: symbolic execution and theorem proving are applied to a much simpler set of instructions

Reliable: reduces the variation in how the same functionality can be expressed

CFG Constructor

CFG (control flow graph): a set of nodes, each representing a basic block, and a set of directed edges representing the control flow among the basic blocks

CG (call graph): a set of nodes corresponding to the functions in the file and a set of directed edges representing calls among the functions
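
A small sketch of why a graph view is resistant to block re-ordering. It assumes a CFG is stored as a list of block addresses plus a list of directed edges; the addresses and the `degree_signature` helper are hypothetical. The signature is only a cheap invariant: equal signatures are necessary for two graphs to be isomorphic, not sufficient, so it does not replace the isomorphism engine.

```python
from collections import Counter


def degree_signature(nodes, edges):
    """Sorted (in-degree, out-degree) pairs; unaffected by node names or order."""
    indeg = Counter(dst for _, dst in edges)
    outdeg = Counter(src for src, _ in edges)
    return sorted((indeg[n], outdeg[n]) for n in nodes)


# The same diamond-shaped function, laid out at different addresses
# and with its two middle blocks emitted in a different order.
nodes_v1 = [0x100, 0x110, 0x120, 0x130]
edges_v1 = [(0x100, 0x110), (0x100, 0x120), (0x110, 0x130), (0x120, 0x130)]
nodes_v2 = [0x200, 0x240, 0x210, 0x260]
edges_v2 = [(0x200, 0x240), (0x200, 0x210), (0x240, 0x260), (0x210, 0x260)]

sig_v1 = degree_signature(nodes_v1, edges_v1)
sig_v2 = degree_signature(nodes_v2, edges_v2)
print(sig_v1 == sig_v2)  # True
```

A call graph can be held in the same form, with functions as nodes and calls as edges.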

Graph Isomorphism Engine

Basic Block Comparison: symbolic execution and theorem proving

Maximum common subgraph isomorphism problem: backtracking algorithm

Symbolic Execution

Definition

represent the values of program variables with symbolic values instead of concrete (initialized) data, and manipulate expressions involving those symbolic values

Procedure

Step 1:

find all the input and output registers and variables of the basic block

Step 2:

use symbolic execution to express the final values of the output registers and variables in terms of the inputs
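
The two steps can be illustrated on a toy three-address IR. Everything here is an assumption for the sake of the example: the `(dest, op, a, b)` statement shape, the `symbolic_execute` helper, and the use of nested tuples as symbolic expressions are not BinHunt's actual IR or implementation.

```python
def symbolic_execute(block):
    """Symbolically run a basic block.

    block: list of (dest, op, a, b); operands are register names or ints.
    Returns (inputs, state): registers read before being written, and the
    final symbolic expression of every register the block writes.
    """
    state = {}   # register -> symbolic expression (nested tuples)
    inputs = []  # registers whose incoming value the block depends on

    def value(operand):
        if isinstance(operand, int):
            return operand               # constants stay concrete
        if operand not in state:
            if operand not in inputs:
                inputs.append(operand)   # Step 1: discovered an input
            return operand               # symbolic value named after the register
        return state[operand]

    for dest, op, a, b in block:
        # Step 2: the new value is an expression over earlier values
        state[dest] = (op, value(a), value(b))
    return inputs, state


# t = eax + 1; ebx = t * t
block = [("t", "add", "eax", 1),
         ("ebx", "mul", "t", "t")]
inputs, outputs = symbolic_execute(block)
print(inputs)          # ['eax']
print(outputs["ebx"])  # ('mul', ('add', 'eax', 1), ('add', 'eax', 1))
```

The final value of `ebx` is expressed purely in terms of the input `eax`, which is the form the equivalence check needs. Temporaries such as `t` also appear in the returned state; a real tool would separate them from the block's outputs.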

Theorem Proving

Realization

STP: a decision procedure for the satisfiability of quantifier-free formulas in the theory of bit-vectors and arrays

Procedure

pick the symbolic representation of one output register/variable from each basic block and use STP to test whether the two are equivalent, assuming that the inputs to the two basic blocks hold the same values

Assurance

if two basic blocks are found to be different by symbolic execution and theorem proving, then they are not functionally equivalent

This property holds even if the two binary files are compiled with different compilers or compiler options.
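
To make the equivalence test concrete without depending on a solver, the sketch below swaps STP for an exhaustive check over 8-bit values. This is a stand-in technique, not STP and not its API; the tuple expression format, the `OPS` table, and the `equivalent` helper are assumptions for the example. A real decision procedure proves the same property for full-width bit-vectors without enumeration.

```python
from itertools import product

MASK = 0xFF  # assumption: 8-bit registers, so every input can be enumerated

OPS = {
    "add": lambda x, y: (x + y) & MASK,
    "mul": lambda x, y: (x * y) & MASK,
    "shl": lambda x, y: (x << y) & MASK,
}


def evaluate(expr, env):
    """Evaluate a symbolic expression (int, input name, or (op, a, b))."""
    if isinstance(expr, int):
        return expr & MASK
    if isinstance(expr, str):
        return env[expr]
    op, a, b = expr
    return OPS[op](evaluate(a, env), evaluate(b, env))


def equivalent(expr1, expr2, inputs):
    """True if both expressions agree for every assignment of the shared inputs."""
    for values in product(range(MASK + 1), repeat=len(inputs)):
        env = dict(zip(inputs, values))
        if evaluate(expr1, env) != evaluate(expr2, env):
            return False
    return True


doubled_by_add = ("add", "x", "x")
doubled_by_shift = ("shl", "x", 1)
tripled = ("mul", "x", 3)

print(equivalent(doubled_by_add, doubled_by_shift, ["x"]))  # True
print(equivalent(doubled_by_add, tripled, ["x"]))           # False
```

The first pair shows two syntactically different instruction choices that compute the same value; the second pair differs in functionality.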

Matching Strength

Basic Block

1.0: functionally equivalent and the same registers are used

0.9: functionally equivalent but different registers are used

lower: scored by how close the blocks are to functional equivalence

Function

1.0: the instructions (x86 or IR) of the two functions are the same

others: the subgraph measurement divided by the number of nodes in the smaller CFG, where the subgraph measurement is the sum of the matching strengths of the matched nodes (basic blocks)
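
The function-level score follows directly from the definition above. The helper name, its parameters, and the sample strengths below are hypothetical; only the formula comes from the slide.

```python
def function_matching_strength(block_strengths, n_nodes_a, n_nodes_b,
                               identical_instructions=False):
    """Matching strength of two functions.

    block_strengths: matching strengths of the matched basic-block pairs.
    n_nodes_a, n_nodes_b: number of nodes in each function's CFG.
    """
    if identical_instructions:
        return 1.0
    smaller = min(n_nodes_a, n_nodes_b)
    if smaller == 0:
        raise ValueError("both CFGs must contain at least one node")
    subgraph_measurement = sum(block_strengths)
    return subgraph_measurement / smaller


# Four matched block pairs between CFGs with 5 and 6 nodes:
# subgraph measurement 3.5, smaller CFG has 5 nodes, strength about 0.7.
strength = function_matching_strength([1.0, 0.9, 0.9, 0.7], 5, 6)
print(round(strength, 3))
```

Dividing by the smaller CFG means one unmatched block in the smaller function already pulls the score below 1.0.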

Backtracking Algorithm

D:

contains all pairs of nodes that might still be matched (initially V1 × V2, i.e. every node of the first graph paired with every node of the second)

M:

contains the matched node pairs (initially empty)
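
A generic backtracking search over D and M might look like the sketch below. It is not BinHunt's algorithm as published: it maximizes only the number of matched nodes, ignores matching strengths, and treats two pairs as compatible when the edges between them agree in both graphs. The function and variable names other than D and M are assumptions.

```python
def max_common_subgraph(nodes1, edges1, nodes2, edges2):
    """Return a largest list of node pairs whose edges agree in both graphs."""
    e1, e2 = set(edges1), set(edges2)
    best = []

    def compatible(p, q):
        (u, v), (a, b) = p, q
        return (u != a and v != b
                and ((u, a) in e1) == ((v, b) in e2)
                and ((a, u) in e1) == ((b, v) in e2))

    def search(D, M):
        nonlocal best
        if len(M) > len(best):
            best = list(M)
        # Bound: each remaining node can be matched at most once.
        bound = min(len({u for u, _ in D}), len({v for _, v in D}))
        if len(M) + bound <= len(best):
            return
        pair, rest = D[0], D[1:]
        # Accept the pair: keep only candidates still consistent with it.
        search([q for q in rest if compatible(pair, q)], M + [pair])
        # Backtrack: reject the pair and try the remaining candidates.
        search(rest, M)

    # D starts as every pairing of a node from one graph with a node from the other.
    search([(u, v) for u in nodes1 for v in nodes2], [])
    return best


# Hypothetical graphs: a three-block chain versus a chain with an extra branch.
g1_nodes, g1_edges = [1, 2, 3], [(1, 2), (2, 3)]
g2_nodes, g2_edges = ["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("b", "d")]
result = max_common_subgraph(g1_nodes, g1_edges, g2_nodes, g2_edges)
print(result)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

The search is exponential in the worst case; the bound only prunes branches that cannot beat the best matching found so far.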

Case Study: gzip

Case Study: tar (1)

Case Study: tar (2)

Case Study: tar (3)

Related Work & Conclusion

BinDiff/BindView

construct a maximal subgraph isomorphism between the sets of functions in two versions of the same executable file

BinHunt:

contributes a more thorough technique (backtracking) for identifying the maximum common subgraph isomorphism

uses a novel technique for basic block comparison based on symbolic execution and theorem proving

Thank you!