emulation: interpretation

29
Emulation: Interpretation

Upload: zelia

Post on 05-Jan-2016

65 views

Category:

Documents


2 download

DESCRIPTION

Emulation: Interpretation. Presentation Outline. 1. Emulation, Basic Interpretation 2. Threaded Interpretation 3. Emulation of a Complex Instruction Set. Emulation. “Implementing the interface/functionality of one system on a system with different interface/functionality“ - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Emulation: Interpretation

Emulation: Interpretation

Page 2: Emulation: Interpretation

Presentation Outline

1. Emulation, Basic Interpretation

2. Threaded Interpretation

3. Emulation of a Complex Instruction Set

Page 3: Emulation: Interpretation

Emulation

“Implementing the interface/functionality of one system on a system with different interface/functionality“

In VM, it means instruction set emulation Implementing one ISA (the target) reproduces the

behavior of software compiled to another ISA (the source)

Page 4: Emulation: Interpretation

Emulation Methods

Two methods of emulation: interpretation & binary translation

Interpretation Repeats a cycle of fetch a source instruction, analyze,

perform Binary translation

Translates a block of source instr. to a block of target instr. Save the translated code for repeated use Bigger initial translation cost with smaller execution cost More advantageous if translated code is executed frequently

Some in-between techniques Threaded interpretation Predecoding

Page 5: Emulation: Interpretation

Basic Interpreter

Code

Data

Stack

Program Counter

Condition Codes

Reg 0

Reg 1

……

Reg n-1

Interpreter Code

Source Memory StateSource Context

Block

Interpreter Overview

• Emulates the whole source machine state

• Guest memory and context block is kept in interpreter’s memory (heap)

• code and data

• general-purpose registers, PC, CC, control registers

Page 6: Emulation: Interpretation

Decode-and-dispatch interpreter

Interpretation repeats Decodes an instruction Dispatches it to an interpretation routine based on the

type of instruction Code for interpreting PPC ISA

While (!halt && interrupt){ inst=code[PC]; opcode=extract(inst,31,6); switch(opcode){ case LoadWordAndZero: LoadWordAndZero(inst); case ALU: ALU(inst); case Branch: Branch(inst); · · · · ·}

Page 7: Emulation: Interpretation

Instruction Functions

Page 8: Emulation: Interpretation

Instruction Functions

Page 9: Emulation: Interpretation

Decode-and-dispatch interpreter

Advantage Low memory requirements Zero star-up time

Disadvantage: Steady-state performance is slow- A source instruction must be parsed each time it is emulated- Lots of branches would degrade performance

How many branches are there in our interpreter code?

Page 10: Emulation: Interpretation

Branches in Decode-&-Dispatch

Switch(opcode)

case

return

1.Switch statement->case Indirect

2.ALU(inst) direct

3.Return from the routine Indirect

4.Loop back-edge direct

While (!halt&&interrupt){ switch(opcode){ case ALU:ALU(inst); · · · · ·}

We can remove all of these branches with threading

Page 11: Emulation: Interpretation

Threaded Interpretation: Idea

Instruction function list Add: RT=extract(inst,25,5); RA=extract(inst,20,5); RB=extract(inst,15,5); source1=regs[RA]; source2=regs[RB]; sum=source1+source2; regs[RT]=sum; PC=PC+4; If (halt || interrupt) goto exit; inst=code[PC]; opcode=extract(inst,31,6); extended_opcode=extract(inst,10,10); routine=dispatch[opcode,extended_opcode]; goto *routine;}

Put the dispatch code to the end of each interpretation routine.

Page 12: Emulation: Interpretation

Threaded Interpretation (2)

Page 13: Emulation: Interpretation

Threaded Interpretation (3)

Page 14: Emulation: Interpretation

Threaded Interpretation

One key point is that dispatch occurs indirectly thru a dispatch table

routine = dispatch[opcode,extended_opcode]; goto *routine;

Also called indirect threaded interpretation Then, what would be directed threaded

interpretation? Can we remove the overhead of accessing the

table? Solution: predecoding and direct threading

Page 15: Emulation: Interpretation

Predecoding

Extracting various fields of an instruction is complicated Fields are not aligned, requiring complex bit extraction Some related fields needed for decoding is not adjacent

If it is in a loop, this extraction job should be repeated

How can we reduce this overhead? Predecoding Pre-parsing instructions in a form that is easier to

interpreter Done before interpretation starts Predecoding allows direct threaded interpretation

Page 16: Emulation: Interpretation

Predecoding for PPC

In PPC, opcode & extended opcode field are separated and register specifiers are not byte-aligned

Define instruction format and define an predecode instruction array based on the format

Struct instruction { unsigned long op; // 32 bit unsigned char dest; // 8 bit unsigned char src1; // 8 bit unsigned int src2; // 16 bit } code [CODE_SIZE];

Pre-decode each instruction based on this format

Page 17: Emulation: Interpretation

Predecoding Example

Page 18: Emulation: Interpretation

Previous Interpreter Code

Page 19: Emulation: Interpretation

New Interpreter Code

Page 20: Emulation: Interpretation

Directed Threaded Interpretation

Even with predecoding, indirect threading includes a centralized dispatch table, which requires Memory access and indirect jump

To remove this overhead, replace the instruction opcode in predecoded format by address of interpreter routine

001048d0

1 2 08

07

1 2 08

If (halt || interrupt) goto exit;opcode= code[TPC].op;routine=dispatch [opcode];goto *routine;

If (halt || interrupt) goto exit;routine= code[TPC].op;goto *routine;

Page 21: Emulation: Interpretation

Comparison Dispatch-&-Decode Indirect

Threaded

source code source code interpreter routines source code interpreter routines

dispatch loop

(a) (b ) ( c)

Indire

ction T

able

Page 22: Emulation: Interpretation

Comparison

Predecoded Indirect Threaded Direct Threaded

(d) (e)

Predecoder

Indire

ction T

able

Page 23: Emulation: Interpretation

ComparisonDecode-and-Dispatch

Indirect Threaded Interpreter

Direct Threaded Interpreter

Memory requirements

Low Low High

Start-up performance

Fast Fast Slow

Steady-state

performance Slow Slow

(better than the first one)

Medium

Code portability

Good Good Medium

Page 24: Emulation: Interpretation

DSVM

Dynamic Samsung Virtual Machine Splitted interpreter

Inner, Outer loop Instruction cache

Indirect threaded interpretation

Page 25: Emulation: Interpretation

Interpreting CISC ISA

RISC ISA (Power PC) 32 bit register. 32bit length.

Op Rd Rs1

Rs2 OpxRegister-register

Op Rd Rs1

Const

31 25 20 15 10 0

31 25 20 15 0

Register-immediate

Op Const opx

2

Jump/call

Page 26: Emulation: Interpretation

Interpreting a Complex Instruction Set

Prefixes Opcode ModR/M SIB Displacement Immediate

CISC instruction set has a wide variety of formats, variable instruction lengths, and variable field lengths (x86 instruction lengths: 1 ~ 16 bytes)

Up to fourPrefixes of 1 byte each(optional)

1-,2-,or 3-byteopcode

1byte(if required)

1byte(if required)

AddressDisplacement

Of 1,2,or 4Bytes or none

Immediate data

Of 1,2,or 4Bytes or none

Mod Reg/

Opcode

R/M Scale Index Base

7 6 5 3 2 0 7 6 5 3 2 0

IA-32 Instruction Format

Jaemok Lee
X86 명령어는 피연산자로서, 메모리에 있는 값을 사용할 수 있는 경우가 상당히 많습니다.즉... 다음과 같은 명령어가 가능합니다.ADD AX, [SI]이건 SI 레지스터가 가르키고 있는 주소에 있는 값을, AX 라는 레지스터에 더하는 명령입니다.RISC 명령으로 표현하자면,LD R1, [SI]ADD AX, AX, R1이렇게 되겠죠...이렇게 피연산자가 Register 인 경우도 있고, Memory 인 경우도 있습니다. 그것을 구별하기 위해서 ModR/M 이라는 플래그가 있습니다.
Page 27: Emulation: Interpretation

Interpreting a Complex Instruction Set Decode and dispatch

Decode fields and fill in a general template

Jump to routines Slow due to generality Solution

Make common case faster

General Decode

(fill-in instruction Structure)

Dispatch

Inst.1Specialized

routine

Inst.1Specialized

routine

Inst.1Specialized

routine

Page 28: Emulation: Interpretation

Some optimizations

SimpleInst.1

Specializedroutine

SimpleInst.m

Specializedroutine

ComplexInst.m+1

Specializedroutine

ComplexInst.m+1

Specializedroutine

PrefixSet flags

Dispatch On

first byte

Shared routines

Page 29: Emulation: Interpretation

Threaded Interpretation

SimpleInstructionSpecialized

routine

SimpleDecode/Dispatch

SimpleInstructionSpecialized

routine

SimpleDecode/Dispatch

SimpleInstructionSpecialized

routine

SimpleDecode/Dispatch

SimpleDecode/Dispatch

SimpleInstructionSpecialized

routine

ComplexDecode/Dispatch