intraprocedural static slicing of binary executables cristina cifuentes antoine fraboulet university...

29
Intraprocedural Static Slicing of Binary Executables Cristina Cifuentes Antoine Fraboulet University of Queensland International Conference on Software Maintenance(ICSM ’97)

Upload: sandra-poole

Post on 16-Dec-2015

219 views

Category:

Documents


0 download

TRANSCRIPT

Intraprocedural Static Slicing of Binary

ExecutablesCristina CifuentesAntoine Fraboulet

University of QueenslandInternational Conference on Software

Maintenance(ICSM ’97)

INDEX

ICSM

Author

Motivation

Background

Basic knowledge

Algorithm

1st ICSM

Since its start in 1983, ICSM (International Conference on Software Maintenance) has grown and developed into an international forum for software maintenance researchers and practitioners to examine key issues facing the software maintenance community.

LINK: http://conferences.computer.org/icsm/

2nd AUTHOR

Cristina Cifuentes

Antoine Fraboulet

Cristina Cifuentes Senior Staff Engineer for Sun Microsystems

Laboratories working in the Static Program Analysis group

ACM SIGPLAN Executive Committee, Treasurer and member-at-large, 2007-2012

Chair of the IEEE Reverse Engineering and Reengineering Committee, Technical Council on Software Engineering, 2002-2003

Technical Report User-Input Dependence Analysis via Graph Reachability By: Bernard

Scholz, Chenyi Zhang and Cristina Cifuentes Report Number:TR-2008-171 Mar 31, 2008

Partitioning of Code for a Massively Parallel Machine By: Michael Ball, Cristina Cifuentes and Deepankar Bairagi Report Number:TR-2004-134 Nov 1, 2004

A Transformational Approach to Binary Translation of Delayed Branches with Applications to SPARC® and PA-RISC Instructions Sets By: Cristina Cifuentes and Norman Ramsey Report Number:TR-2002-104 Jan 1, 2002

Experience in the Design, Implementation and Use of a Retargetable Static Binary Translation Framework By: Cristina Cifuentes, Mike Van Emmerik, Brian T.Lewis and Norman Ramsey Report Number:TR-2002-105 Jan 1, 2002

Walkabout-A Retargetable Dynamic Binary Translation Framework By: Brian T. Lewis, David Ung and Cristina Cifuentes Report Number:TR-2002-106 Jan 1, 2002

Antoine Fraboulet Associate Professor at INSA Lyon and a member of the

CITI Lab and of the INRIA Amazones group

Member of the program committee of t2pWSN 07 workshop

Member of the program committee of MSN 07 conference

Member of the program committee of MSN 06 conference

Member of the commitee of the TSI Journal

Organization committee of the RECAP 2006 workshop

Paper Assembly to High-Level Language Translation Cifuentes Cristina; Simon Doug;

Fraboulet Antoine In International Conference on Software Maintenance (11/1998) 228-237

Loop Alignment for Memory Accesses Optimization Fraboulet Antoine ; Huard Guillaume; Mignotte AnneIn International Symposium on System synthesis (ISSS) (10/1999) 71-77

Memory Optimization of Data Flow Applications at the Codesign Level Fraboulet Antoine ; Just-Meunier Laurence; Mignotte AnneIn Sophia Antipolis Forum on MicroElectronics (SAME) (10/2000) 16-21

Loop fusion for memory space optimization Fraboulet Antoine ; Godary Karen ; Mignotte AnneIn International Symposium on System Synthesis (10/2001) 95 – 100

Source Code Loop Transformations for Memory Hierarchy Optimizations Fraboulet Antoine ; Mignotte AnneIn International Conference on Parallel Architectures and Compilation Techniques. Workshop on MEmory access DEcoupled Architectures (MEDEA) (2001) 6

Recommend

Ákos Kiss, Judit Jász, Gábor Lehotai, Tibor Gyimóthy.  Interprocedural Static Slicing of Binary Executables. 3rd IEEE International Workshop on Source Code Analysis and Manipulation (SCAM 2003), 26-27 September 2003, Amsterdam, The Netherlands 2003

Ákos Kiss, Judit Jász, Tibor Gyimóthy. Using Dynamic Information in the Interprocedural Static Slicing of Binary Executables. Software Quality Journal 2005, Volume 13

3rd MOTIVATION

Primary initial goal of slicing was to assist with debugging

Programmers naturally form program slices,mentally, when they debug and understand programs

A program slice consists of the parts of a program that potentially affect the values computed at some point of interest (slicing criteria)

4th BackgroundTools for executable programs

Decode the information stored in the binary file

Decode the machine instructions and translate them to an assembly representation or an equivalent intermediate representation

Problem?

Separation of data and code

Huge amount of codes when debugging

cannot determine what paths to traverse next when faced with an indexed jump instruction, or an indirect call or jump instruction on the value of a register

Q: Then solution?

Representation of Binaries

The general format of a binary executable varies widely based on the binary-file format used by the OS

When running a program, the binary-file format is decoded by the operating system’s loader, which loads the program into memory and passes control to the program via its entry point

WorkflowStep1(binaryassembly or IL):

dasm not only decodes the binary file and its machine instructions, but also stores the instructions in terms of low-level instructions(icode) and control flow graphs for each procedure

Step2(assembly or IL high-level language)

Dateflow analysis is performed in the CFGs to recover high-level information

Dasm:the disassembler of the dcc decompiler, which implements a decoder of the EXE format and creates an intermediate representation of the program.

Icode:resemble assembly instructions with property of only performing one operation at time

5th BASIC KNOWLEDGE

Classification:

Backward Slicing

Forward Slicing

Or

Static Slicing

Dynamic Slicing

Program Slice

Backward slice consists of all statements that the computation at the slicing criteria may depend on

public class SimpleExample { static int add(int a, int b){

return(a+b); }public static void main(final String[] arg){

int i = 1; int sum = 0; while (i < 11) { sum = add(sum, i); i = add(i, 1); } System.out.println("sum = " + sum); System.out.println("i = " + i);

}}

SlicingCriterion

SlicingCriterion

Forward slice includes all statements depending on the slicing criterion

public class SimpleExample { static int add(int a, int b){

return(a+b); }public static void main(final String[] arg){

int i = 1; int sum = 0; while (i < 11) { sum = add(sum, i); i = add(i, 1); } System.out.println("sum = " + sum); System.out.println("i = " + i);

}}

SlicingCriterion

SlicingCriterion

Static slice: all possible executions of the program are taken into account.

Dynamic slice is constructed with respect to only one execution of the program (iteration number is taken into account)

Slicing Method

The most popular approaches are based on dependency graphs (non-executable slices)

Q: How does a specific statement affect the others?

Construct a Program Dependence Graph

A Combination of Data Dependency Graph and Control Dependency Graph

Identify Data Dependency

b depends on a

Identify Control Dependency

Both assignments depend on if statement

How to Determine a Slice

1. a:=32. b:=a

1. if a=true then2. b:=13. else4. c:=0

movl $0x0,0xfffffff8(%ebp)

cmpl $0x0,0xfffffff8(%ebp)

jne 0x8048475 <main+49>

movl $0x1,0xfffffffc(%ebp)

jmp 0x8048485 <main+65>

movl $0x7,0xfffffffc(%ebp)

mov 0xfffffffc(%ebp),%eax

mov %eax,0xfffffff8(%ebp)

Data Dependence GraphControl Dependence Graph

mov $0x5,%eax

sub 0xfffffffc(%ebp),%eax

mov %eax,0xfffffff4(%ebp)

mov 0xfffffffc(%ebp),%eax

sub $0x5,%eax

mov %eax,0xfffffff4(%ebp)

movl $0x0,0xfffffff8(%ebp)

cmpl $0x0,0xfffffff8(%ebp)

jne 0x8048475 <main+49>

movl $0x1,0xfffffffc(%ebp)

jmp 0x8048485 <main+65>

movl $0x7,0xfffffffc(%ebp)

mov 0xfffffffc(%ebp),%eax

mov %eax,0xfffffff8(%ebp)

mov $0x5,%eax

sub 0xfffffffc(%ebp),%eax

mov %eax,0xfffffff4(%ebp)

mov 0xfffffffc(%ebp),%eax

sub $0x5,%eax

mov %eax,0xfffffff4(%ebp)

Control Dependency Graph A node V is post-dominated by a node W if every directed

path from V to Stop contains W

An instruction Y is control dependent on another instruction X iff

There exists a directed path P from X to Y with another instruction Z in P, post-dominated by Y

X is not post-dominated by Y

CFGCFG

Post Dominator

Tree

Post Dominator

Tree

AA

DD

BB CC

STOPSTOP

DD

AA BB CC

6th ALGORITHM

Determine the slice using the conventional algorithm

Add unconditional jumps and returns to the slice

Fix jump labels

BASIC COMPONENTS

Lvalue

An lvalue is expressed as a pair of a base plus an offset. The base address can be either the starting address for the storage for a variable (local or global) or any pointer expression.

Step1 Start point

Using basic blocks as the node granularity

Control dependency

Control dependencies are based on the PDT of a procedure

Control flow analysis of the CFG for the purposes of recovering the underlying control structures of a graph and their nesting level

Data dependency

Data dependencies are presented as ud-chains at the procedure level

Ud-chains are generated for each register and condition code used in an instructon

Idiom analysis & dead-condition code elimination

Step2 & Step 3 Lexical successor tree is used to represent the next high-level statement at

the same nesting level of a given statement

An unconditional jump and a return instruction introduce a break in the flow of control of the program so that they are added to the final slice if they are in the path to the instructions in the slice

Fix target labels by checking all the jumps that belong to the slice

Thank you!