emulation unit v

48
EMULATION

Upload: sreemahasaraswath

Post on 29-Nov-2014

51 views

Category:

Documents


5 download

TRANSCRIPT

Page 1: Emulation Unit V

EMULATION

Page 2: Emulation Unit V

• Emulation: binary in one ISA is executed on processor supporting a different ISA.

ways of implementing emulationInterpretation: relatively inefficient instruction -at-a-time

Binary translation: block-at-a-time optimized or repeated

e.g., the execution of programs compiled for instruction set A on a machine that executes instruction set B.

Page 3: Emulation Unit V

Windows 8 Tablets Won't Run PC Apps, After All

Apps written for x86 Windows PCs won't be able to run on ARM-based Win 8 tablets,

By Paul McDougall InformationWeek

Windows 8 for tablets runs on devices powered by chips designed by U.K.-based ARM.

Page 4: Emulation Unit V

VIRTUALIZATION vs EMULATIONIn emulation the virtual

machine simulates the complete hardware in software.

operating system for one computer architecture to be run on the architecture that the emulator is written for.

The diagram below illustrates how a Java based PC emulator fits into the hierarchy of programs that uses the real computer's hardware.

Page 5: Emulation Unit V

Contd…Virtualization therefore is normally faster

than emulation but the real system has to have an architecture identical to the guest system.

For example, VMWare can provide a virtual environment for running a virtual WindowsXP machine "inside" a real one. However VMWare cannot work on any real hardware other than a real x86 PC.

Basically, virtualization will use your computer's actual resources (at least the CPU) for the guest machine. With emulation, it simulates all the hardware using software.This is why virtualization is much faster, since it's using your actual CPU.

Page 6: Emulation Unit V

Emulation Suppose you develop for a system G

(guest, e.g. an ARM-base phone) on your workstation H (host, e.g., an x86 PC). An emulator for V running on H precisely emulates G’s

• CPU,• memory subsystem, and• I/O devices.

Ideally, programs running on the emulated G exhibit the same behaviour as when running on a real G (except for timing).

Page 7: Emulation Unit V

GUEST vs HOST

Guest– environment being supported by underlying platform

Host– underlying platform that provides guest environment

Page 8: Emulation Unit V

Source VS Target

Source ISA or binary– original instruction set or binary the ISA to be emulated

Target ISA or binary– ISA of the host processor underlying ISA

• Source/Target refer to ISAs• Guest/Host refer to platforms

Page 9: Emulation Unit V

Definition• Process of implementing the interface and

functionality of one (sub)system on a (sub)system having a different interface and functionality

– terminal emulators, such as for VT100

• Instruction set emulation– binaries in source instruction set can be executed on machine implementing target instruction set

– e.g., IA-32 execution layer

Page 10: Emulation Unit V

Terminal Emulators• Terminal emulators are software programs that

provide modern computers and devices interactive access to applications running on Mainframe Computer.

• Terminals such as the IBM 370 or VT100 and many others, are no longer produced as physical devices.

• Software running on modern operating systems simulates a "dumb" terminal and is able to render the graphical and text elements of the host application.

Page 11: Emulation Unit V

WHY EMULATIONBinary translation grew out of emulation

techniques in the late 1980s in order to provide for a migration path from legacy CISC machines to the newer RISC machines.

Such techniques were developed by hardware manufacturers interested in marketing their new RISC platforms.

Page 12: Emulation Unit V

Contd…• It would be very useful if one could take the

programs written on a CISC and use them on a RISC processor.

• In this way, one could take advantage of the capabilities of the newer processor without having to spend time and money redeveloping programs already existing on the older platform.

• But CISC binaries are incompatible with the RISC. In order to use CISC programs on a RISC processor, those programs now must be rewritten.

• A binary-to-binary translator that can automatically convert binary code from the CISC to the RISC.

Page 13: Emulation Unit V

Emulation MethodsWays of implementing emulation• Interpretation:

Repeats a cycle of fetch a source instruction, analyze, perform

simple and easy to implement, portablelow performance Basic, Threaded interpretation, Direct Threaded

• Binary Translation: Translates a block of source instruction to a

block of target instruction Save the translated code for repeated useBigger initial translation cost with smaller

execution costMore advantageous if translated code is

executed frequentlyCode discovery, Code location

Page 14: Emulation Unit V

InterpretationAn interpreter is the easier way to emulate a CPU. It

just reproduces how a basic CPU works. A basic CPU reads bytes from an address of the

memory pointed by a special register (PC or Program Counter).

These bytes contain the information about the instruction that the CPU must execute (that is what the CPU has to do).

The CPU must decode these bytes and decide what it has to do. Then it performs the action commanded, updates PC counter and reads another byte or bytes.

That is how it works an emulator interpreter. At first it reads a byte (or the number of bytes which form an instruction in the emulated CPU).

Page 15: Emulation Unit V

switch(memory[PC++]){case OPCODE1:

opcode1();break;

case OPCODE2:opcode2();break;....

case OPCODEn:opcodeN();break;

}

Interpreter emulator.

Page 16: Emulation Unit V

Basic InterpreterEmulates the whole

source machine state

An interpreter needs to maintain the complete architected state of the machine implementing the source ISA

Guest memory and context block is kept in interpreter’s memory (heap) code and data

general-purpose registers, PC, CC, control registers

Code

Data

Stack

Program Counter

Condition Codes

Reg 0

Reg 1

……

Reg n-1

Interpreter Code

Source Memory StateSource Context

Block

Interpreter Overview

Page 17: Emulation Unit V

Decode-and-dispatch interpreter

Decode and dispatch interpreter step through the source program one

instruction at a time decode the current instruction dispatch to corresponding interpreter

routine very high interpretation cost

Page 18: Emulation Unit V

Contd…while (!halt && !

interrupt) {inst = code[PC];opcode =

extract(inst,31,6);switch(opcode) { case

LoadWordAndZero: LoadWordAndZero(inst);

case ALU: ALU(inst);

case Branch: Branch(inst);

. . .}}

Instruction function: LoadLoadWordAndZero(inst){

RT = extract(inst,25,5);RA = extract(inst,20,5);displacement =

extract(inst,15,16);if (RA == 0) source = 0; else source = regs[RA]; address = source + displacement; regs[RT] = (data[address]<< 32)>> 32; PC = PC + 4;}

Page 19: Emulation Unit V

Contd…Instruction function: ALUALU(inst){

RT = extract(inst,25,5);RA = extract(inst,20,5);RB = extract(inst, 15,5);source1 = regs[RA];source2 = regs[RB];extended_opcode = extract(inst,10,10);switch(extended_opcode) {

case Add: Add(inst);case AddCarrying: AddCarrying(inst);case AddExtended: AddExtended(inst);

. . .}PC = PC + 4;}

Page 20: Emulation Unit V

Contd…While (!halt&&interrupt){ switch(opcode){ case ALU:ALU(inst); · · · · ·}

Page 21: Emulation Unit V

Indirect Threaded Interpretation

Page 22: Emulation Unit V

Indirect Threaded InterpretationThreaded interpretation improves

efficiency by reducing branch overhead

append dispatch code with each interpretation routine removes 3 branches threads together function routines

Page 23: Emulation Unit V

Indirect Threaded Interpretation

Put the dispatch code to the end of each interpretation routine.

Add: RT=extract(inst,25,5); RA=extract(inst,20,5); RB=extract(inst,15,5); source1=regs[RA]; source2=regs[RB]; sum=source1+source2; regs[RT]=sum; PC=PC+4; If (halt || interrupt) goto exit; inst=code[PC]; opcode=extract(inst,31,6); extended_opcode=extract(inst,10,10); routine=dispatch[ opcode, extended_opcode]; goto *routine;}

Page 24: Emulation Unit V

Contd…Dispatch occurs indirectly through a dispatch table routine =

dispatch[opcode,extended_opcode]; goto *routine;

Directed threaded interpretation: remove the overhead of accessing the

tableSolution: predecoding and direct

threading

Page 25: Emulation Unit V

Predecoding

Extracting various fields of an instruction is complicatedFields are not aligned, requiring complex bit

extractionSome related fields needed for decoding is not

adjacentIf it is in a loop, this extraction job should be repeated

PredecodingPre-parsing instructions in a form that is easier to

interpreterDone before interpretation startsPredecoding allows direct threaded interpretation

Page 26: Emulation Unit V

Predecoding

In PPC, opcode & extended opcode field are separated and register specifiers are not byte-aligned

Define instruction format and define an predecode instruction array based on the format

Struct instruction { unsigned long op; // 32 bit unsigned char dest; // 8 bit unsigned char src1; // 8 bit unsigned int src2; // 16 bit } code [CODE_SIZE];

Pre-decode each instruction based on this format

Page 27: Emulation Unit V
Page 28: Emulation Unit V

Predecoding

lwz r1, 8(r2)add r3, r3,r1stw r3, 0(r4)

07

1 2 08

08

3 1 03

37

3 4 00

(load word and zero)

(add)

(store word)

Page 29: Emulation Unit V

Contd…It is efficient to perform the repeated decoding

operations only once per source machine address.

Previous interpreter code:LoadWordAndZero(inst){

RT = extract(inst,25,5);RA = extract(inst,20,5);displacement = extract(inst,15,16);if (RA == 0) source = 0;else source = regs[RA];address = source + displacement;regs[RT] = (data[address]<< 32)>> 32;PC = PC + 4;

Page 30: Emulation Unit V

Contd…Predecoded instruction(contained in an array) can now be executed by the following code.

struct instruction { unsigned long op; unsigned char dest; unsigned char src1; unsigned char src1;} code [CODE_SIZE];

Load Word and Zero:RT = code[TPC].dest;RA = code[TPC].src1;displacement = code[TPC].src2; if (RA == 0) source = 0;else source = regs[RA];address = source + displacement;regs[RT] = (data[address]<< 32) >> 32SPC = SPC + 4; TPC = TPC + 1;If (halt || interrupt) goto exit; opcode = code[TPC].oproutine = dispatch[opcode];goto *routine;

Page 31: Emulation Unit V

Contd…Even with predecoding, indirect

threading includes a centralized dispatch table, which requiresMemory access and indirect jump

To remove this overhead, replace the instruction opcode in predecoded format by address of interpreter routine

001048d0

1 2 08

07

1 2 08

If (halt || interrupt) goto exit;opcode= code[TPC].op;routine=dispatch [opcode];goto *routine;

If (halt || interrupt) goto exit;routine= code[TPC].op;goto *routine;

Page 32: Emulation Unit V

SUMMARY

Page 33: Emulation Unit V

Binary TranslationFaster than just to get each opcode from the

emulated CPU and execute a function or code, which implements the function of this instruction, would be to translate that opcode to an opcode in the target CPU.

Binary translation means to get the native code for the emulated CPU and translate this code, using techniques related with compilation for example, into code for the target CPU.

The translated code is stored and executed every time that is called the emulation of the CPU.

Page 34: Emulation Unit V

Translate source binary program to target binary before execution– is the logical conclusion of predecoding– get rid of parsing and jumps altogether– allows optimizations on the native code– achieves higher performance than

interpretation– needs mapping of source state onto

the host state(statemapping)

Page 35: Emulation Unit V

x86 Source Binaryaddl %edx,4(%eax)movl 4(%eax),%edxadd %eax,4

Translate to PowerPC Targetr1 points to x86 register context blockr2 points to x86 memory imager3 contains x86 ISA PC value

Page 36: Emulation Unit V
Page 37: Emulation Unit V

State MappingMaintaining the state of the source

machine on the host(target) machine.

state includes source registers and memory contents

source registers can be held in host registers or in host memory

reduces loads/stores significantly easier if target registers > source

registers

Page 38: Emulation Unit V

Register MappingMap source registers

to target registers if target registers <

source registers map some to memory map on per-block

basis Reduces load/storesignificantly

improves performance

Page 39: Emulation Unit V

r1 points to x86 register context blockr2 points to x86 memory imager3 contains x86 ISA PC valuer4 holds x86 register %eaxr7 holds x86 register %edx etc.addi r16,r4,4 ;add 4 to %eaxlwzx r17,r2,r16 ;load operand from

memoryadd r7,r17,r7 ; perform add of %edxaddi r16,r4,4 ; add 4 to %eaxstwx r7,r2,r16 ; store %edx value into

memoryaddi r4,r4,4 ; increment %eaxaddi r3,r3,9 ; update PC (9 bytes)

Page 40: Emulation Unit V

Predecoding Vs.Binary TranslationRequirement of interpretation

routines during predecoding.After binary translation, code can

be directly executed.

Page 41: Emulation Unit V

Dynamic Binary Translation• Dynamic binary translation is the process of

translating executable code of one architecture to another at runtime.

• The translation is usually done by extracting a sequence of instructions from the source program and then translating them to the target architecture;

• The sequence of instructions is called a translation block. The translated block is typically cached to avoid the need of retranslation if the untranslated block were encountered again.

Page 42: Emulation Unit V

Architecture• The main components of an emulator using

dynamic binary translator are the dispatcher, the translator, and the translation cache.

• Dispatcher:– The purpose of the dispatcher is to control the

execution of a program by selecting the next block to be executed based on the given source address.

– It first checks whether the block at the current address is already translated; that is, whether it resides in the cache.

– If the block is found, it is executed; otherwise, the dispatcher calls the translator.

Page 43: Emulation Unit V

• Translator:– The translator is responsible for translating a

block of source code to an equivalent block of target code and storing the translated block into the translation cache.

– The block can, for instance, be a sequence of instruction from the current point to the next branch. Moreover, the last instruction of the block, the branch, is replaced by a branch to the dispatcher.

Page 44: Emulation Unit V

• Translator Cache:– The translation cache stores the translated

blocks. When the dispatcher is called, it first checks whether the next block is already translated; that is, whether the block resides in cache. One requirement for the translation cache is naturally a fast lookup of blocks.

– An important issue concerns the size of the cache: • if the size is small, little memory is consumed and

better locality is achieved;• a large cache fills up more slowly but has worse

locality and requires more memory.

Page 45: Emulation Unit V

• Instruction Selection:– The phase of instruction selection maps a set

of source instructions to a set of target instructions. The sizes of those sets do not need to match: • a particular PowerPC instruction requires six Alpha

instructions to be implemented. • Another issue in instruction selection are operands.

If, for instance, an immediate operand of a source instruction does not fit to the immediate field of the target instruction, that operand must first be stored somewhere (to a register, for instance) which requires an extra instruction.

Page 46: Emulation Unit V

Condition Codes:Many architectures have instructions which

set condition codes to reflect state. As an example, the arithmetic instructions in

Intel 386 modify the flags register as a side effect (the add instruction, for instance, modifies six bits in that register).

Generating target instructions to handle those modifications can cause a severe overhead.

Page 47: Emulation Unit V

Register allocation• The purpose of register allocation is to map

the registers of the source architecture to the target architecture.

• The ideal case for register allocation occurs when the target architecture has at least the same amount of available registers as the source architecture.

• If, on the other hand, the target architecture has less registers, memory will likely to be needed to hold some of the source registers.

Page 48: Emulation Unit V

Virtual machine is an entity that emulates a guest interface on top of a host machine

Language view:• Virtual machine = Entity that emulates an API (e.g., JAVA) on top of another

Virtualizing software = compiler/interpreter Process view:

• Machine = Entity that emulates an ABI on top of another

• Virtualizing software = runtimeOperating system view:

Machine = Entity that emulates an ISAVirtualizing software = virtual machine monitor

(VMM)