binary translation

Binary Translation Jae Wook Kim

Upload: xia

Post on 22-Jan-2016

Page 1: Binary Translation

Binary Translation

Jae Wook Kim

Page 2: Binary Translation

Binary Translation

• Process of converting the source binary program into a target binary program
• Enhance performance

Binary Translation

Code Discovery & Dynamic Translation

Control Transfer Optimizations

Instruction Set Issues

Case Study

Page 3: Binary Translation

Simulators

• First generation
  • 100 host instructions per guest instruction simulated
• Second generation
  • Reduce the expansion factor to about 10
• Third generation
  • Translate groups of instructions as a unit
  • Expansion factor of about 4
  • Mimic

Page 4: Binary Translation

Binary Translation from IA-32 to Power PC

IA-32 source instructions (destination operand written first):

addl %edx,4(%eax)
movl 4(%eax),%edx
add %eax,4

PowerPC translation (one group per source instruction):

r1 points to IA-32 register context block
r2 points to IA-32 memory image
r3 contains IA-32 ISA PC value

lwz r4,0(r1) ;load %eax from register block
addi r5,r4,4 ;add 4 to %eax
lwzx r5,r2,r5 ;load operand from memory
lwz r4,12(r1) ;load %edx from register block
add r5,r4,r5 ;perform add
stw r5,12(r1) ;put result into %edx
addi r3,r3,3 ;update PC (3 bytes)

lwz r4,0(r1) ;load %eax from register block
addi r5,r4,4 ;add 4 to %eax
lwz r4,12(r1) ;load %edx from register block
stwx r4,r2,r5 ;store %edx value into memory
addi r3,r3,3 ;update PC (3 bytes)

lwz r4,0(r1) ;load %eax from register block
addi r4,r4,4 ;add immediate
stw r4,0(r1) ;place result back into %eax
addi r3,r3,3 ;update PC (3 bytes)

Page 5: Binary Translation

Threaded Interpretation vs Binary Translation

[Figure: two paths. Predecoding: source code -> predecoder -> intermediate code, which is executed by interpreter routines. Binary translation: source code -> binary translator -> binary translated target code.]

• In both cases, the original source code is converted to another form
• In predecoding, interpreter routines are needed
• In binary translation, converted code is directly executed

Page 6: Binary Translation

State Mapping from Target to Source Registers

[Figure: mapping of source state onto target registers]

• r1, r2: point to the register block and memory image of the source ISA
• r3: program counter, mapped directly
• r4, r5, ..., r(n+3): source registers Reg 1, Reg 2, ..., Reg n, mapped directly
• Other portions of the source state may be held in target registers

Page 7: Binary Translation

Binary Translation with State Mapping

IA-32 source instructions:

addl %edx,4(%eax)
movl 4(%eax),%edx
add %eax,4

PowerPC translation with state mapping:

r1 points to IA-32 register context block
r2 points to IA-32 memory image
r3 contains IA-32 ISA PC value
r4 holds IA-32 register %eax
r7 holds IA-32 register %edx

addi r16,r4,4 ;add 4 to %eax
lwzx r17,r2,r16 ;load operand from memory
add r7,r17,r7 ;perform add of %edx
addi r16,r4,4 ;add 4 to %eax
stwx r7,r2,r16 ;store %edx value into memory
addi r4,r4,4 ;increment %eax
addi r3,r3,9 ;update PC (9 bytes)

Page 8: Binary Translation

Code Discovery

• Static Predecoding/Translation
  • Predecode or binary translate a program in its entirety before beginning emulation
• Difficult or impossible in many situations
  • target of a jump instruction is held in a register
  • data is interspersed with code

Page 9: Binary Translation

Discovery Problem

[Figure: a source instruction stream in which ordinary instructions (inst. 1 to inst. 8) are interleaved with an unconditional branch, a pad for instruction alignment, data in the instruction stream, and a register-indirect jump whose target is unknown ("jump indirect to ???")]

With variable-length instructions, the same bytes decode differently depending on where decoding starts:

31 c0 8b b5 00 00 03 08 8b bd 00 00 03 00

• Starting at byte 8b: movl %esi, 0x08030000(%ebp) ??
• Starting at byte b5: mov %ch,0 ??
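The ambiguity in the byte sequence above can be reproduced with a toy decoder. This is only a sketch under stated assumptions: `decode_at` is a hypothetical helper, not part of any real disassembler, and it recognizes just the two encodings needed here (opcode B5 with an 8-bit immediate, and opcode 8B with ModRM byte B5 followed by a 32-bit displacement).

```python
# Byte sequence from the slide.
CODE = bytes.fromhex("31c08bb5000003088bbd00000300")


def decode_at(code, offset):
    """Decode one instruction starting at `offset`; return (text, length).

    Hypothetical helper: only two encodings are handled, which is enough
    to show the discovery problem. It is not a complete IA-32 decoder.
    """
    opcode = code[offset]
    if opcode == 0xB5:
        # B5 ib: move an 8-bit immediate into %ch (2 bytes in total).
        return f"mov %ch,{code[offset + 1]}", 2
    if opcode == 0x8B and code[offset + 1] == 0xB5:
        # 8B with ModRM B5: %esi and disp32(%ebp) operands (6 bytes in total).
        disp = int.from_bytes(code[offset + 2:offset + 6], "little")
        return f"movl %esi, 0x{disp:08x}(%ebp)", 6
    raise ValueError(f"opcode 0x{opcode:02x} is not handled by this sketch")


# Same bytes, two starting offsets, two different instructions.
print(decode_at(CODE, 2))
print(decode_at(CODE, 3))
```

A static translator that cannot tell which offset is a real instruction boundary has no way to choose between the two decodings.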

Page 10: Binary Translation

Code Location

• Translated code is accessed with target program counter (TPC), which is different from source program counter (SPC)
• Problem when there is an indirect control transfer (branch or jump)
  • Destination address of the control transfer is held in a register and is a source code address, even though it occurs in the translated code

IA-32 source:

movl %eax,4(%esp) ;load jump address from memory
jmp %eax ;jump indirect through %eax

PowerPC translation:

addi r16,r11,4 ;compute IA-32 address
lwzx r4,r2,r16 ;get IA-32 jump address from IA-32 memory image
mtctr r4 ;move to count register (ctr)
bctr ;jump indirect through ctr

Page 11: Binary Translation

Solutions for Code-discovery and Code-location

• Simple
  • Instruction sets with fixed-length instructions that are always aligned on fixed boundaries
  • Source instruction sets are explicitly designed to be emulated
    • Java bytecodes
      • do not allow data to be interspersed with code
      • restrict control flow instructions (branches and jumps) to enable easy code discovery
• Sophisticated
  • Discussed next

Page 12: Binary Translation

Incremental Predecoding and Translation

• General solution to code discovery
  • Dynamically: translate the binary while the program is operating on actual input data
  • Incrementally: predecode or translate new sections of code as the program reaches them
• High-level control is provided by an emulation manager (EM)
• Translated blocks of code are organized as a code cache
• A map table associates the SPC for a block of source code with the TPC for the corresponding block of translated code

Page 13: Binary Translation

Dynamic Translation System

[Figure: dynamic translation system. The emulation manager consults the SPC-to-TPC map table; a hit leads to already translated code, a miss invokes the interpreter and translator.]

Page 14: Binary Translation

Dynamic Translation Flowchart

[Flowchart, written out as steps]

1. Start with SPC
2. Look up SPC<->TPC in map table
3. Hit in table?
   • No: use SPC to read instructions from source memory image; interpret, translate, and place into code cache; write new SPC<->TPC mapping into map table
   • Yes: branch to TPC and execute translated block
4. Get SPC for next block and return to step 2
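The flowchart can be sketched as a small emulation-manager loop. Everything here is an illustrative assumption rather than the design of any real system: the three-opcode toy source ISA, the `SOURCE` program, and the use of Python closures as "translated blocks", with the index into `code_cache` standing in for the TPC. On a miss the sketch translates and then runs the new block, instead of interpreting while translating.

```python
# Toy source program: SPC -> (opcode, operand). Hypothetical ISA.
SOURCE = {
    0: ("addi", 5),     # acc += 5
    1: ("addi", -1),    # acc -= 1
    2: ("bnez", 1),     # if acc != 0, branch to SPC 1
    3: ("halt", None),
}


def translate_block(spc):
    """Translate the dynamic basic block starting at `spc` into a closure."""
    ops = []
    while True:
        op, arg = SOURCE[spc]
        ops.append((op, arg, spc))
        spc += 1
        if op in ("bnez", "halt"):   # a block ends at a control transfer
            break

    def block(state):
        for op, arg, pc in ops:
            if op == "addi":
                state["acc"] += arg
            elif op == "bnez":
                return arg if state["acc"] != 0 else pc + 1
            elif op == "halt":
                return None          # no next block
    return block


def run(start_spc=0):
    code_cache = []    # translated blocks; the list index plays the TPC role
    map_table = {}     # SPC -> TPC
    state = {"acc": 0}
    misses = 0
    spc = start_spc
    while spc is not None:
        tpc = map_table.get(spc)               # look up SPC in map table
        if tpc is None:                        # miss: translate, record mapping
            misses += 1
            code_cache.append(translate_block(spc))
            tpc = len(code_cache) - 1
            map_table[spc] = tpc
        spc = code_cache[tpc](state)           # execute block, get next SPC
    return state["acc"], misses, map_table


print(run())
```

In this run the block starting at SPC 1 repeats code already translated as part of the block starting at SPC 0, the kind of replication that comes with translating one dynamic block at a time.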

Page 15: Binary Translation

Basic flow of Mimic

Page 16: Binary Translation

Basic Block

• System translates one block of source code at a time
• Natural unit is dynamic basic block
• Static basic block
  • Single entry point and single exit point
• Dynamic basic block
  • Begins at the instruction executed immediately after a branch or jump
  • Follows sequential instruction stream
  • Ends with next branch or jump
• Complication when a branch goes into the middle of a block that is already translated
  • Additional data structure used to keep track of ranges of translated code blocks
• When dynamic basic blocks are used
  • A new translation is always started when there is a miss in the map table, even if it leads to replicated sections of translated code

Page 17: Binary Translation

Static Versus Dynamic Basic Blocks

Static Basic Blocks

block 1:
      add…
      load…
      store…
block 2:
loop: load…
      add…
      store
      brcond skip
block 3:
      load…
      sub…
block 4:
skip: add…
      store
      brcond loop
block 5:
      add…
      load…
      store…
      jump indirect
      …

Dynamic Basic Blocks

block 1:
      add…
      load…
      store…
loop: load…
      add…
      store
      brcond skip
block 2:
      load…
      sub…
skip: add…
      store
      brcond loop
block 3:
loop: load…
      add…
      store
      brcond skip
block 4:
skip: add…
      store
      brcond loop
      …
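The difference between the two block definitions can be checked mechanically on the program from this slide. The sketch below is an illustration under assumptions: instructions are identified by their index in `PROGRAM`, and `EXECUTED` is one assumed execution path (first `brcond skip` not taken, then `brcond loop` taken, then `brcond skip` taken), chosen to match the four dynamic blocks shown.

```python
# (label, mnemonic, branch target label) for the slide's example program.
PROGRAM = [
    (None, "add", None), (None, "load", None), (None, "store", None),
    ("loop", "load", None), (None, "add", None), (None, "store", None),
    (None, "brcond", "skip"),
    (None, "load", None), (None, "sub", None),
    ("skip", "add", None), (None, "store", None), (None, "brcond", "loop"),
    (None, "add", None), (None, "load", None), (None, "store", None),
    (None, "jump indirect", None),
]
BRANCHES = {"brcond", "jump indirect"}


def static_blocks(program):
    """Split at every branch target and after every branch (single entry, single exit)."""
    labels = {lab: i for i, (lab, _, _) in enumerate(program) if lab}
    leaders = {0}
    for i, (_, op, target) in enumerate(program):
        if op in BRANCHES:
            if i + 1 < len(program):
                leaders.add(i + 1)
            if target is not None:
                leaders.add(labels[target])
    starts = sorted(leaders)
    ends = starts[1:] + [len(program)]
    return [list(range(s, e)) for s, e in zip(starts, ends)]


def dynamic_blocks(program, executed):
    """Split an executed instruction sequence after each branch or jump."""
    blocks, current = [], []
    for pc in executed:
        current.append(pc)
        if program[pc][1] in BRANCHES:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


# Assumed execution path, as instruction indices.
EXECUTED = list(range(0, 12)) + list(range(3, 7)) + list(range(9, 12))

print(static_blocks(PROGRAM))
print(dynamic_blocks(PROGRAM, EXECUTED))
```

Instructions 3 to 6 and 9 to 11 appear in two dynamic blocks each, which is the replication of translated code mentioned on the previous slide.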

Page 18: Binary Translation

Flow of Control Involving Translated Blocks

[Figure: control passes from each translated block back to the emulation manager, which dispatches the next translated block]

• Incrementally, more of the program is discovered and translated, until eventually only translated code is being executed

Page 19: Binary Translation

Tracking the Source Program Code

[Figure: each code block ends with a jump and link to the EM followed by the next source PC; the emulation manager uses the map table to find the next code block]

• The SPC must be available to the EM
  • Can be placed in a “stub” at the end of the translated block
  • When the translated block finishes, control is transferred back to the EM using a jump-and-link instruction

Page 20: Binary Translation

Same-ISA Emulation

• Emulation manager is always in control of the software being emulated
  • Can identify details concerning the operations to be performed by each specific instruction
• Monitor the execution of the source program at any desired level of detail

Page 21: Binary Translation

Control Transfer Optimizations

• Every time a translation block finishes execution, EM must be reentered and an SPC-to-TPC lookup must occur

• There are a number of optimizations that can reduce this overhead by eliminating the need to go through the EM between every pair of translation blocks

Page 22: Binary Translation

Translation Chaining

• Instead of branching to the EM at the end of every translated block, the blocks can be linked directly to each other
  • Blocks are translated one at a time, but they are linked together into chains as they are constructed

• Linking is accomplished by replacing what was initially a jump and link back to the EM with a direct branch to the successor translation block

Page 23: Binary Translation

Chaining of Translation Blocks

[Figure: translated blocks linked directly to one another, so that control no longer returns to the emulation manager between chained blocks]

Page 24: Binary Translation

Creating a Link

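[Figure: creating a link, in steps (1) to (5). The predecessor block ends with a stub (JAL EM, Next SPC). The EM gets the next SPC, looks up the successor, and sets up the link by replacing the stub with "Jump TPC", so that the predecessor branches directly to the successor.]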
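Link creation can be sketched as follows. This is a model under assumptions, not real code patching: `Block`, `EmulationManager`, and the `successors` table are hypothetical names, each block has exactly one successor, and "overwriting the stub" is modelled by setting a `link` field.

```python
class Block:
    """A translated block whose exit is either a stub or a direct link."""

    def __init__(self, spc, next_spc):
        self.spc = spc
        self.next_spc = next_spc  # "Next SPC" kept in the stub
        self.link = None          # None models the initial "JAL EM" stub


class EmulationManager:
    def __init__(self, successors):
        self.successors = successors  # toy program: SPC -> SPC of next block
        self.map_table = {}           # SPC -> translated Block
        self.em_entries = 0           # how often control came back to the EM

    def lookup_or_translate(self, spc):
        if spc not in self.map_table:
            self.map_table[spc] = Block(spc, self.successors[spc])
        return self.map_table[spc]

    def run(self, start_spc, blocks_to_execute):
        self.em_entries += 1
        block = self.lookup_or_translate(start_spc)
        for _ in range(blocks_to_execute - 1):
            if block.link is None:
                # Stub path: jump and link to the EM, which finds the
                # successor and replaces the stub with a direct branch.
                self.em_entries += 1
                block.link = self.lookup_or_translate(block.next_spc)
            block = block.link  # chained path: direct branch to successor
        return self.em_entries


em = EmulationManager({0: 4, 4: 0})   # two blocks that branch to each other
print(em.run(0, 10))
```

Ten block executions enter the EM only three times here: once at start-up and once per link that had to be created.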

Page 25: Binary Translation

Software Indirect Jump Prediction

• Handling indirect jumps by map table lookup is expensive in execution time
  • Hash SPC to form index into map table
  • Load and compare to find matching table entry
  • Load and indirect jump to TPC address
• In many cases, jump target never or very seldom changes
  • The most frequent SPC addresses and their matching TPC addresses are encoded into the translated binary code
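A minimal sketch of the idea, assuming the frequent targets are already known (for example from profiling): the predicted (SPC, TPC) pairs become an inline compare chain, and the map table is consulted only when none matches. `make_indirect_jump` and the `stats` counters are hypothetical names used for illustration.

```python
def make_indirect_jump(predictions, map_table, stats):
    """Build the 'translated' form of one indirect jump.

    predictions: list of (spc, tpc) pairs, most frequent first (assumed given).
    map_table:   full SPC -> TPC mapping, used as the slow path.
    """
    def jump(target_spc):
        for spc, tpc in predictions:      # inlined compare-and-branch chain
            if target_spc == spc:
                stats["predicted"] += 1
                return tpc
        stats["lookups"] += 1             # no prediction matched: map lookup
        return map_table[target_spc]
    return jump


map_table = {0x100: 10, 0x200: 20, 0x300: 30}
stats = {"predicted": 0, "lookups": 0}
jump = make_indirect_jump([(0x100, 10), (0x200, 20)], map_table, stats)
print(jump(0x100), jump(0x300), stats)
```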

Page 26: Binary Translation

Shadow Stack

• When a translated code block contains a procedure call to a target binary routine, the SPC value must be saved

• When the procedure is finished, it restores the SPC value, accesses the map table, and jumps to translated block of code at the return address

• To avoid this overhead, make the target return PC value directly available
  • Return address of the target code is pushed onto a shadow stack maintained by the EM
  • In case the source code has changed the return address before the return, check the return address of the source binary against the return address saved in the shadow stack

Page 27: Binary Translation

Shadow Stack Implementation

[Figure: the IA-32 stack holds the IA-32 return address. Each shadow stack frame holds the IA-32 return address, the IA-32 stack pointer, and the PPC return address. On return, the IA-32 values are compared with the shadow stack frame; the PPC return address is used if the compare succeeds.]

If the compare fails, the map table must be used.
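The compare-on-return logic can be sketched as below. This is a simplified model under assumptions: `ShadowStack` is a hypothetical class, a frame holds only the source and target return addresses (the saved IA-32 stack pointer shown in the figure is omitted), and addresses are plain integers.

```python
class ShadowStack:
    def __init__(self, map_table):
        self.map_table = map_table   # SPC -> TPC, used only when the compare fails
        self.frames = []             # (source return address, target return address)
        self.map_lookups = 0

    def on_call(self, source_ret, target_ret):
        """Translated call: remember where the translated code should resume."""
        self.frames.append((source_ret, target_ret))

    def on_return(self, source_ret):
        """`source_ret` is the return address popped from the source stack."""
        if self.frames:
            saved_source, target_ret = self.frames.pop()
            if saved_source == source_ret:
                return target_ret            # compare succeeds: no map lookup
        self.map_lookups += 1                # compare fails: use the map table
        return self.map_table[source_ret]


shadow = ShadowStack({0x1000: 50, 0x2000: 60})
shadow.on_call(0x1000, 50)
print(shadow.on_return(0x1000), shadow.map_lookups)
```

The fallback path covers the case where the source program changed its return address between the call and the return.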

Page 28: Binary Translation

Instruction Set Issues

• Details to be taken care of in translating a complete instruction set

Page 29: Binary Translation

Register Architectures

• General-purpose registers of the target ISA are used to
  • Hold general-purpose registers of the source ISA
  • Hold special-purpose registers of the source ISA
  • Point to the source register context block and memory image
  • Hold intermediate values used by the emulator
• Assign target registers to the most common or performance-critical source resources

Page 30: Binary Translation

Condition Codes

• Special architected bits that characterize instruction results

• Little uniformity in how they are used across ISAs

• Easiest – neither source nor target ISAs use condition codes

• Almost as easy – source ISA does not use condition codes but target ISA does

• If target ISA has no condition codes, source condition codes must be emulated
  • Time consuming

Page 31: Binary Translation

Condition Codes

• Although condition codes are set frequently, they are seldom used

• Lazy evaluation can be performed to make condition code emulation more efficient
  • Operands and operation that set the condition code are saved rather than the condition code itself

Page 32: Binary Translation

Condition Codes

IA-32 source:

        addl %ebx,0(%eax)
        add %ecx,%ebx
        jmp label1
        .
        .
label1: jz target

IA-32 to PowerPC register mappings:

r4 <-> %eax
r5 <-> %ebx
r6 <-> %ecx
…
r16 <-> scratch register used by emulation code
r25 <-> condition code operand 1 ;registers
r26 <-> condition code operand 2 ; used for
r27 <-> condition code operation ; lazy condition code emulation
r28 <-> jump table base address

PowerPC translation:

        lwz r16,0(r4) ;perform memory load for addl
        mr r25,r16 ;save operands
        mr r26,r5 ; and opcode for
        li r27,"addl" ; lazy condition code emulation
        add r5,r5,r16 ;finish addl
        mr r25,r6 ;save operands
        mr r26,r5 ; and opcode for
        li r27,"add" ; lazy condition code emulation
        add r6,r6,r5 ;translation of add
        b label1
        …
label1: bl genZF ;branch and link to evaluate genZF code
        beq cr0,target ;branch on condition flag
        …
genZF:  add r29,r28,r27 ;add "opcode" to jump table base address
        mtctr r29 ;copy to counter register
        bctr ;branch via jump table
        …
"sub":  …
"add":  add. r24,r25,r26 ;perform PowerPC add, set cr0
        blr ;return

Note: the condition codes set by the first add are not used.
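The same lazy scheme can be sketched in a few lines. This is an illustration under assumptions: `LazyFlags` is a hypothetical class, only the zero flag is modelled, and only 32-bit `add` and `sub` are supported. Each flag-setting instruction records its operands and operation, and the flag is computed only when a conditional branch asks for it.

```python
MASK32 = 0xFFFFFFFF


class LazyFlags:
    def __init__(self):
        self.op, self.a, self.b = None, 0, 0
        self.evaluations = 0          # how often a flag was actually computed

    def record(self, op, a, b):
        """Cheap path: save operands and operation, compute no flags."""
        self.op, self.a, self.b = op, a, b

    def zero_flag(self):
        """Expensive path: redo the saved operation to derive ZF."""
        self.evaluations += 1
        if self.op == "add":
            result = (self.a + self.b) & MASK32
        elif self.op == "sub":
            result = (self.a - self.b) & MASK32
        else:
            raise ValueError(f"no flag-setting operation recorded: {self.op!r}")
        return result == 0


flags = LazyFlags()
ebx, ecx, mem = 1, 0xFFFFFFFF, 2   # assumed register and memory values

flags.record("add", ebx, mem)      # first add: its flags are never read
ebx = (ebx + mem) & MASK32
flags.record("add", ecx, ebx)      # second add overwrites the saved state
ecx = (ecx + ebx) & MASK32
taken = flags.zero_flag()          # jz: evaluate ZF only now
print(ecx, taken, flags.evaluations)
```

Two flag-setting instructions ran, but the flag computation happened once, at the conditional branch.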

Page 33: Binary Translation

Data Formats and Arithmetic

• Data formats and arithmetic have become standardized over the years

• There may be some differences in the way floating-point arithmetic is performed on different implementations
  • IA-32 uses 80-bit intermediate results, unlike most other ISAs
  • Leads to different precisions
• Source ISA may require functional capability not available in the target ISA
  • Simple ISA has sufficient primitives to implement the more complex instructions in the other ISA

Page 34: Binary Translation

Memory Address Resolution

• Most ISAs today address memory to the granularity of individual bytes
• Emulating a byte-resolution ISA on a machine with word resolution
  • shift out the low-order byte address bits when performing memory access
  • use the saved byte offset bits to select the specific bytes being accessed
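A short sketch of the two steps, assuming 32-bit words and assuming that byte 0 of a word is its least significant byte (the slide does not fix a byte numbering). `load_byte` is a hypothetical helper and `words` models a memory that can only be indexed by word.

```python
def load_byte(words, byte_addr):
    """Load one byte from a memory that is addressable only in 32-bit words."""
    word = words[byte_addr >> 2]      # shift out the two low-order address bits
    offset = byte_addr & 0b11         # saved byte offset within the word
    shift = 8 * offset                # assumption: byte 0 is least significant
    return (word >> shift) & 0xFF


words = [0x44332211, 0x88776655]
print(hex(load_byte(words, 0)), hex(load_byte(words, 5)))
```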

Page 35: Binary Translation

Memory Data Alignment

• Some ISAs align memory data on “natural” boundaries
  • word access must be performed with the 2 low-order address bits being 00
  • halfword access must have a 0 for the lowest-order bit
• If ISA does not support unaligned data, there are supplementary instructions to simplify the process

• Otherwise, ISA specifies a trap if an instruction attempts access with an unaligned address

Page 36: Binary Translation

Byte Order

• Some ISAs order bytes within a word so that the most significant byte is byte 0 (big-endian)

• Other ISAs order bytes in little-endian order
• To emulate one from the other, complement the two low-order address bits
  • 00 -> 11, 01 -> 10, …
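Complementing the two low-order address bits is an XOR with binary 11. The sketch below assumes 32-bit words and byte-sized accesses only; `swap_byte_address` is a hypothetical helper name.

```python
def swap_byte_address(byte_addr):
    """Map a byte address in one byte order to the other within a 32-bit word."""
    return byte_addr ^ 0b11           # 00 -> 11, 01 -> 10, 10 -> 01, 11 -> 00


# Host memory holds the word 0x11223344 in little-endian order.
host_mem = (0x11223344).to_bytes(4, "little")

# A big-endian guest reading bytes 0..3 should see 11 22 33 44.
guest_view = [host_mem[swap_byte_address(a)] for a in range(4)]
print([hex(b) for b in guest_view])
```

The word-base part of the address is untouched, so the same mapping works for any word-aligned location.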

Page 37: Binary Translation

Addressing Architecture

• Address space sizes of source and target ISAs may be different

• Page sizes may be different
• Privilege levels may be different
• Solution depends on virtual machine architecture

Page 38: Binary Translation

Shade

• Tool developed for high-performance simulation
• Translates ISA being simulated into sequences of target instructions
• Dynamically cross-compiles executable code for target machine into executable code that runs directly on the host machine
• Caches host code for reuse
• Integrates simulation and tracing code
• Gives analyzer detailed control over what is traced

Page 39: Binary Translation

Shade Data Structures

Page 40: Binary Translation

Shade Data Structures

• Source program counter (VPC), base of source memory image (VMEM), and base of the TLB are permanently mapped to target registers
• During execution, source register values (in the virtual state, VS) are temporarily copied to target registers and results are copied back into the VS
• Translations fill the translation cache (TC) linearly as they are produced
• TLB is an array of lists of <target, host> address pairs
• Lookup algorithm linearly scans the TLB until it has a match or hits the end of the array
  • Fast, partial TLB lookup
  • Slower, full TLB lookup
  • Generate a new translation in the TC and update the TLB
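The lookup sequence can be sketched as follows. This is a loose model under assumptions, not Shade's actual code: `ToyTLB`, the hash used to pick a row, and the reading of the "fast, partial" lookup as checking only the first pair in a row are all illustrative guesses; host addresses are modelled as indices into a list that stands in for the TC.

```python
class ToyTLB:
    def __init__(self, size=8):
        # Array of lists of (target address, host address) pairs.
        self.rows = [[] for _ in range(size)]

    def _row(self, target_addr):
        return self.rows[(target_addr >> 2) % len(self.rows)]  # assumed hash

    def fast_lookup(self, target_addr):
        """Partial lookup: check only the first pair in the row."""
        row = self._row(target_addr)
        if row and row[0][0] == target_addr:
            return row[0][1]
        return None

    def full_lookup(self, target_addr):
        """Full lookup: scan the whole row linearly."""
        for target, host in self._row(target_addr):
            if target == target_addr:
                return host
        return None

    def insert(self, target_addr, host_addr):
        self._row(target_addr).insert(0, (target_addr, host_addr))


def find_translation(tlb, target_addr, translate):
    host = tlb.fast_lookup(target_addr)
    if host is None:
        host = tlb.full_lookup(target_addr)
    if host is None:
        host = translate(target_addr)   # generate a new translation in the TC
        tlb.insert(target_addr, host)   # update the TLB
    return host


tc = []                                  # translations are appended linearly


def translate(target_addr):
    tc.append(target_addr)
    return len(tc) - 1


tlb = ToyTLB(size=2)
print([find_translation(tlb, a, translate) for a in (0x10, 0x18, 0x10)])
```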

Page 41: Binary Translation

Translation

Sample application code translation:

• Load contents of application registers r1 and r2 from the application’s virtual state structure into host scratch registers s1 and s2
• Perform add operation
• Write result in host scratch register s3 back to virtual state structure location for application register r3
• Update application’s virtual PC

Page 42: Binary Translation

Conclusion

• Memory requirements: high
  • Size of predecoded memory image is proportional to original source memory image
  • Can be reduced if blocks of translated code are cached
• Start-up performance: very slow
  • Source memory image must be interpreted to discover control flow before being translated
• Steady-state performance: fast
  • Translated binaries execute directly on hardware
  • Translated blocks can be linked directly together
• Code portability: poor
  • Code is translated for a specific target ISA
