
IA-64 ISA: A Summary

JinLin Yang

Phil Varner

Shuoqi Li


Overview

• Summary of IA-64

• Register model

• Instruction format and support for explicit parallelism

• Instruction set basics

• Predication and speculation support

• Conclusion


Summary of IA-64

• RISC-style

• Register-Register ISA

• Compiler-based ILP support (VLIW)

• Predication

• Memory-reference speculation

• Basis for Intel Itanium processor


The IA-64 Register Model


Components

• 128 64-bit general-purpose registers (actually 65 bits wide, counting the NaT bit);

• 128 82-bit floating-point registers;

• 64 1-bit predicate registers;

• 8 64-bit branch registers;

• several registers used for system control, memory mapping, performance counters, and communication with the OS.


Register Stack Mechanism(1)

• This technique is used by the integer registers to accelerate procedure calls (similar to register windows in SPARC).

• Registers 0-31 are always accessible.

• Registers 32-127 are used as a register stack, and each procedure is allocated a set of registers for its use.


Register Stack Mechanism(2)

CFM pointer

• The CFM (current frame marker) pointer points to the set of registers to be used by a given procedure


Register Stack Mechanism(3)

How does it work?

1) The new register stack frame is created by register renaming, so the registers used by a given procedure always start at R32.

2) The callee executes an alloc instruction to allocate its own local and output registers.


Register Stack Mechanism(4)

3) The CFM pointer is updated, so R32 of the called procedure points to the output registers of the calling procedure.

(I think there is a typo in the text!)
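The renaming described in steps 1) to 3) can be sketched as a small Python model. This is only an illustration under simplifying assumptions: the class name `RegisterStack`, the `base` field standing in for the CFM pointer, and the 96-entry stacked file are all hypothetical, and spilling to memory when the stack overflows is not modeled.

```python
class RegisterStack:
    """Toy model of the IA-64 register stack (hypothetical, simplified)."""

    def __init__(self, stacked_size=96):
        self.static = [0] * 32               # r0-r31: always accessible
        self.stacked = [0] * stacked_size    # physical storage behind r32-r127
        self.base = 0        # physical slot that the current r32 maps to
        self.locals = 0      # input + local registers of the current frame
        self.outputs = 0     # output registers of the current frame
        self.saved = []      # frames of suspended callers

    def alloc(self, n_locals, n_outputs):
        """The running procedure declares the size of its own frame."""
        self.locals, self.outputs = n_locals, n_outputs

    def _phys(self, r):
        offset = r - 32
        if not 0 <= offset < self.locals + self.outputs:
            raise IndexError(f"r{r} is outside the current frame")
        return (self.base + offset) % len(self.stacked)

    def read(self, r):
        return self.static[r] if r < 32 else self.stacked[self._phys(r)]

    def write(self, r, value):
        if r < 32:
            self.static[r] = value
        else:
            self.stacked[self._phys(r)] = value

    def call(self):
        """Rename so the callee's r32 is the caller's first output register."""
        self.saved.append((self.base, self.locals, self.outputs))
        self.base = (self.base + self.locals) % len(self.stacked)
        # The caller's outputs are the only registers the callee sees at first.
        self.locals, self.outputs = self.outputs, 0

    def ret(self):
        """Restore the caller's frame."""
        self.base, self.locals, self.outputs = self.saved.pop()


rs = RegisterStack()
rs.alloc(n_locals=3, n_outputs=2)   # caller: r32-r34 local, r35-r36 output
rs.write(35, 42)                    # argument placed in the first output register
rs.call()
print(rs.read(32))                  # the callee sees the argument as r32
```

No data is copied on the call: only the mapping from register names to physical slots changes, which is why the mechanism is cheap.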


Register Rotation

• Both the integer and floating-point registers support register rotation for registers 32-127


Benefits of register rotation

• Makes it easy to allocate registers in software pipelined loops

• When combined with predication, it can reduce the code expansion incurred by using software pipelining.

• Makes software pipelining usable for loops with a small number of iterations.
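A rough Python sketch of why rotation helps software pipelining: a value written to r32 in one iteration is visible as r33 in the next, so each pipeline stage can use a fixed register name. The class `RotatingRegs`, its 8-entry size, and the two-stage loop are illustrative assumptions, and plain `if` tests stand in for the stage predicates that hardware would use to handle loop start-up and drain.

```python
class RotatingRegs:
    """Toy rotating register file; register names start at r32 (hypothetical)."""

    def __init__(self, size=8):
        self.phys = [None] * size
        self.rrb = 0                          # rotating register base

    def _index(self, r):
        return (r - 32 + self.rrb) % len(self.phys)

    def read(self, r):
        return self.phys[self._index(r)]

    def write(self, r, value):
        self.phys[self._index(r)] = value

    def rotate(self):
        # Decrementing the base makes last iteration's r32 appear as r33.
        self.rrb = (self.rrb - 1) % len(self.phys)


def pipelined_double(values):
    """Two-stage pipeline: stage 1 loads into r32, stage 2 doubles r33."""
    regs = RotatingRegs()
    out = []
    n = len(values)
    for i in range(n + 1):                    # one extra pass drains the pipeline
        if i >= 1:                            # stage 2 works on the previous load
            out.append(regs.read(33) * 2)
        if i < n:                             # stage 1 starts the next element
            regs.write(32, values[i])
        regs.rotate()
    return out


print(pipelined_double([1, 2, 3]))            # [2, 4, 6]
```

The loop body never changes register names between iterations, so no unrolled copies of the body are needed to keep values from overlapping iterations apart.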


Instruction Format and Support for Explicit Parallelism

IA-64 combines:

• Major benefits of VLIW-approach

• Greater Flexibility


Inherit major benefits of VLIW

• Implicit parallelism among operations in an instruction

• Fixed formatting of the operation fields

• Relying on the compiler to detect ILP and schedule insts into slots


Greater Flexibility

• Flexibility in formatting of instructions

• Allowing the compiler to indicate when an inst cannot be executed in parallel with its successors


Implicit Parallelism

• Placing instructions into instruction groups


Instruction group

• a sequence of consecutive instructions with no register data dependency

• Instructions in the group can be executed in parallel

• Instruction group can be arbitrarily long

• Compiler must explicitly indicate group boundary by a “stop”
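As an illustration of how a compiler might place stops, the sketch below greedily closes a group whenever an instruction reads or writes a register already written in the current group. The function name `split_into_groups` and the tuple format are hypothetical, and the rule is a simplification: the real architecture has additional dependence rules and exceptions that are not modeled here.

```python
def split_into_groups(instructions):
    """Split (text, dest_regs, src_regs) tuples into instruction groups.

    A new group starts when an instruction touches a register that was
    written earlier in the current group (simplified dependence rule).
    """
    groups, current, written = [], [], set()
    for text, dests, srcs in instructions:
        if written & (set(srcs) | set(dests)):
            groups.append(current)            # a stop goes here
            current, written = [], set()
        current.append(text)
        written |= set(dests)
    if current:
        groups.append(current)
    return groups


code = [
    ("add r1 = r2, r3", ["r1"], ["r2", "r3"]),
    ("sub r4 = r5, r6", ["r4"], ["r5", "r6"]),
    ("add r7 = r1, r4", ["r7"], ["r1", "r4"]),   # needs r1 and r4: new group
    ("ld8 r8 = [r9]",   ["r8"], ["r9"]),
]

groups = split_into_groups(code)
# ";;" is the assembler notation for a stop.
print(" ;;\n".join("\n".join(group) for group in groups))
```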


Fixed Formatting

• Instructions are encoded in bundles


Bundles

• 128 bits wide

• 5-bit template field: specifies the exec unit type needed by each inst in the bundle and the possible presence of a stop

– I-unit, M-unit, F-unit, B-unit, L + X (used to encode 64-bit immediates and a few special instructions)

– One execution unit slot can hold more than one type of instruction

• Three 41-bit instructions
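The arithmetic behind the bundle is 5 + 3 × 41 = 128 bits. A minimal Python sketch of packing and unpacking follows; the function names are hypothetical, and placing the template in the low-order bits with slot 0 directly above it is an assumption that the slides themselves do not state.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template, slot0, slot1, slot2):
    """Pack a 5-bit template and three 41-bit instructions into 128 bits."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    bundle = template
    for i, slot in enumerate((slot0, slot1, slot2)):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError(f"slot {i} must fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Return (template, (slot0, slot1, slot2)) from a 128-bit bundle."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = tuple(
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
        for i in range(3)
    )
    return template, slots


bundle = pack_bundle(0x10, 1, 2, 3)
print(unpack_bundle(bundle))
```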


Different Code Scheduling Algorithms

• Code scheduled to minimize the number of bundles

– more stalls between bundles due to data dependencies

• Code scheduled to minimize the number of cycles

– more empty slots

• The number of empty slots and the use of bundles may lead to much larger code size


Instruction Set Basics

• Inst encoding

– Major opcode (high-order 4 bits of the inst + exec unit slot designation bits from the bundle template)

– Specification bits of the predicate register that guards the instruction (low-order 6 bits)

• The encoding strategy leads to various inst formats for each inst type
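The two fixed fields of a 41-bit instruction can be pulled out with shifts and masks. In this sketch the 31 bits between the opcode and the predicate are lumped into a single hypothetical `body` field, since their meaning depends on the instruction format; the function names are also illustrative.

```python
BODY_BITS = 31   # 41 bits total = 4 (opcode) + 31 (format-specific) + 6 (predicate)


def encode_slot(major_opcode, body, qp):
    """Build a 41-bit instruction from its opcode, body, and guarding predicate."""
    if not (0 <= major_opcode < 16 and 0 <= body < (1 << BODY_BITS) and 0 <= qp < 64):
        raise ValueError("field out of range")
    return (major_opcode << 37) | (body << 6) | qp


def decode_slot(slot):
    """Split a 41-bit instruction into its fields."""
    return {
        "major_opcode": (slot >> 37) & 0xF,             # high-order 4 bits
        "body": (slot >> 6) & ((1 << BODY_BITS) - 1),   # format-specific bits
        "qp": slot & 0x3F,                              # low-order 6 bits: predicate register
    }


print(decode_slot(encode_slot(0x8, 0x1234, 5)))
```

Six predicate bits are exactly enough to name any of the 64 predicate registers.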


Predication and Speculation

• Nearly every instruction can be predicated

• Specified by a predicate register field (the low-order six bits of each inst)

• if-conversion and code motion have lower overhead

• Conditional branch is just branch with guarding predicate
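If-conversion can be illustrated with a tiny interpreter in which every instruction carries a guarding predicate and simply does nothing when that predicate is false. The sketch below converts `if a == b: x = 1 else: x = 2` into straight-line code with no branch. The helper names (`run`, `cmp_eq`, `mov_imm`, `select`) are hypothetical; the compare writes a predicate and its complement, and p0 is treated as always true.

```python
def run(program, regs):
    """Execute (predicate_number, operation) pairs over a register dict."""
    preds = [False] * 64
    preds[0] = True                          # p0 always reads as true
    for qp, op in program:
        if preds[qp]:                        # a false predicate turns the inst into a no-op
            op(regs, preds)
    return regs


def cmp_eq(p_true, p_false, a, b):
    """Compare two registers; write the result and its complement."""
    def op(regs, preds):
        result = regs[a] == regs[b]
        preds[p_true], preds[p_false] = result, not result
    return op


def mov_imm(dest, value):
    """Write a constant into a register."""
    def op(regs, preds):
        regs[dest] = value
    return op


def select(a, b):
    program = [
        (0, cmp_eq(1, 2, "a", "b")),         # (p0) cmp.eq p1, p2 = a, b
        (1, mov_imm("x", 1)),                # (p1) x = 1
        (2, mov_imm("x", 2)),                # (p2) x = 2
    ]
    return run(program, {"a": a, "b": b, "x": 0})["x"]


print(select(3, 3), select(3, 4))            # 1 2
```

Both assignments are fetched every time, but only one takes effect, which trades a possibly mispredicted branch for one wasted issue slot.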


Setting predicates

• Predicates set using compare or test instructions

• compare

– 10 different tests

– two predicate register destinations

– written: result + complement, or logical function + complement

• multiple comparisons


Speculation

• control speculation - an inst speculatively moved above a branch

• exception handling

• memory reference speculation


Deferred exception handling

• NaT - Not A Thing

– equivalent of poison bits

– makes GPRs 65 bits wide

• NaTVal - FP registers - Not A Thing Value

– invalid IEEE FP value

– FP exceptions handled separately


Deferred exception handling II

• NaT is only generated by a speculative load

– all insts will propagate it

– nonspeculative insts cannot defer a NaT


Deferred exception handling III

• Non-speculated instruction gets NaT - immediate and unrecoverable exception

• chk.s

– detects NaT or NaTVal

– branches to a recovery routine

• provides special instructions for storing NaT and NaTVal registers when saving processor state
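The deferred-exception flow across these three slides can be sketched in Python: a speculative load that would fault sets a NaT bit instead, arithmetic propagates the bit, `chk_s` runs recovery code if the bit is set, and a nonspeculative use of a NaT value raises. All names here (`Reg`, `ld_s`, `chk_s`, `NaTConsumptionFault`) are illustrative, memory is a plain dict, and a missing key stands in for a faulting address.

```python
from dataclasses import dataclass


class NaTConsumptionFault(Exception):
    """Raised when a nonspeculative instruction consumes a NaT value."""


@dataclass(frozen=True)
class Reg:
    value: int = 0
    nat: bool = False          # the 65th bit of a general-purpose register


def ld_s(memory, addr):
    """Speculative load: defer a fault by returning a NaT register."""
    if addr not in memory:
        return Reg(nat=True)
    return Reg(memory[addr])


def add(a, b):
    """Any instruction propagates NaT from its sources to its result."""
    return Reg(a.value + b.value, a.nat or b.nat)


def chk_s(reg, recovery):
    """Check for a deferred exception; run recovery code if one is pending."""
    return recovery() if reg.nat else reg


def store(memory, addr, reg):
    """Nonspeculative use: a NaT here cannot be deferred any further."""
    if reg.nat:
        raise NaTConsumptionFault(f"NaT stored to address {addr:#x}")
    memory[addr] = reg.value


memory = {0x10: 5}
r1 = ld_s(memory, 0x99)                      # would fault, so NaT is set instead
r2 = add(r1, Reg(1))                         # NaT propagates to the result
r2 = chk_s(r2, recovery=lambda: Reg(0))      # recovery supplies a usable value
store(memory, 0x20, r2)
print(memory)
```

If the branch that originally guarded the load is never taken, the `chk_s` is never reached and the deferred fault is silently dropped, which is the point of deferring it.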


Memory reference

• advanced loads - a load speculatively moved above a store on which it may be dependent

• ld.a

• special entry in ALAT

– register destination of the load

– address of the accessed memory location


Memory reference II

• When a store is executed, the active ALAT entries are looked up; if an ALAT entry has the same address, it is marked as invalid

• Any nonspeculative instruction must check the ALAT before using a value from ld.a

– if the ALAT entry is valid, clear the ALAT entry

– if not, recover with one of the two check instructions:

• ld.c - reload from memory (used when the ld.a is the only speculated instruction)

• chk.a - reload and execute "clean up" code
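A toy model of the ALAT protocol described on these two slides, written as a sketch rather than a description of the hardware: the class and method names are hypothetical, the table is an unbounded dict keyed by destination register, and only exact address matches are detected (real implementations have a fixed number of entries and may compare partial addresses).

```python
class ALAT:
    """Toy advanced load address table: destination register -> address."""

    def __init__(self):
        self.entries = {}

    def ld_a(self, memory, regs, dest, addr):
        """Advanced load: load early and record (register, address)."""
        regs[dest] = memory[addr]
        self.entries[dest] = addr

    def store(self, memory, addr, value):
        """A store invalidates every entry that recorded the same address."""
        memory[addr] = value
        for dest in [d for d, a in self.entries.items() if a == addr]:
            del self.entries[dest]

    def ld_c(self, memory, regs, dest, addr):
        """Check load: keep the value if the entry survived, else reload."""
        if dest in self.entries:
            del self.entries[dest]           # valid: clear the entry, keep the value
            return True
        regs[dest] = memory[addr]            # invalidated: reload from memory
        return False


memory = {100: 1}
regs = {}
alat = ALAT()
alat.ld_a(memory, regs, "r4", 100)           # load hoisted above the store
alat.store(memory, 100, 9)                   # the store turns out to alias
hit = alat.ld_c(memory, regs, "r4", 100)
print(hit, regs["r4"])                       # False 9
```

When the store does not alias, `ld_c` finds the entry still valid and the early-loaded value is used without touching memory again.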


Conclusion

• Hits a lot of hot technologies

– RISC

– VLIW

– Predication

– Speculation

• Itanium will/may show viability of approach