Laboratorio de Tecnologías de Información
Arquitectura de Computadoras (Computer Architecture)
Instruction Set Architecture
Arturo Díaz Pérez
Centro de Investigación y de Estudios Avanzados del IPN
Laboratorio de Tecnologías de Información
[email protected]@cinvestav.mx
Instruction Set
♦ "... the attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls of logic design, and the physical implementation."
  ■ Amdahl, Blaauw, Brooks, 1964.
Instruction Set Design
[Figure: the instruction set as the interface between software (above) and hardware (below).]
Which is easier to change/design?
Instruction Set Architecture
♦ Organization of programmable storage
♦ Data types & data structures: encodings and representations
♦ Instruction formats
♦ Instruction (or operation code) set
♦ Modes of addressing and accessing data items and instructions
♦ Exceptional conditions
ISA: What Must Be Specified?
Instruction Fetch → Instruction Decode → Operand Fetch → Execute → Result Store → Next Instruction
♦ Instruction format or encoding
  ■ How is it decoded?
♦ Location of operands and result
  ■ Where other than memory?
  ■ How many explicit operands?
  ■ How are memory operands located?
  ■ Which can or cannot be in memory?
♦ Data type and size
♦ Operations
  ■ What are supported?
♦ Successor instruction
  ■ Jumps, conditions, branches
  ■ Fetch-decode-execute is implicit!
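The implicit fetch-decode-execute cycle can be sketched as an interpreter loop. This is a minimal Python illustration, not a real machine. It assumes a hypothetical one-address accumulator ISA whose instructions are `(opcode, address)` tuples, with memory modeled as a dictionary keyed by symbolic names. The opcode names `LOAD`, `ADD`, `STORE`, `JUMP`, and `HALT` are made up for the example.

```python
def run(program, memory):
    """Fetch-decode-execute loop for a toy one-address accumulator machine."""
    acc = 0
    pc = 0
    while True:
        opcode, addr = program[pc]   # instruction fetch
        pc += 1                      # implicit successor: the next instruction
        if opcode == "LOAD":         # decode, operand fetch, execute
            acc = memory[addr]
        elif opcode == "ADD":
            acc = acc + memory[addr]
        elif opcode == "STORE":      # result store
            memory[addr] = acc
        elif opcode == "JUMP":       # explicit successor
            pc = addr
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
```

The default `pc += 1` is the implicit successor rule. Only `JUMP` names its successor explicitly.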
Evolution of Instruction Sets
♦ Single Accumulator (EDSAC 1950)
♦ Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)
♦ Separation of Programming Model from Implementation
  ■ High-level Language Based (B5000 1963)
  ■ Concept of a Family (IBM 360 1964)
♦ General Purpose Register Machines
  ■ Complex Instruction Sets (VAX, Intel 432 1977-80)
  ■ Load/Store Architecture (CDC 6600, Cray 1 1963-76)
♦ RISC: MIPS, SPARC, 88000, IBM RS6000, ... 1987
Basic ISA Classes
♦ Accumulator
  ■ 1 address: add A    (acc ← acc + mem[A])
  ■ 1+x address: addx A    (acc ← acc + mem[A + x])
♦ Stack
  ■ 0 address: add    (tos ← tos + next)
♦ General purpose register
  ■ 2 address: add A B    (EA(A) ← EA(A) + EA(B))
  ■ 3 address: add A B C    (EA(A) ← EA(B) + EA(C))
♦ Load/Store
  ■ 3 address: add Ra Rb Rc    (Ra ← Rb + Rc)
  ■ load Ra Rb    (Ra ← mem[Rb])
  ■ store Ra Rb    (mem[Rb] ← Ra)
Most real machines are hybrids of these.
Comparing Number of Instructions
Code sequence for C = A + B for four classes of instruction sets:
Accumulator    Register (register-memory)    Stack     Register (load-store)
Load A         Load R1,A                     Push A    Load R1,A
Add B          Add R1,B                      Push B    Load R2,B
Store C        Store C,R1                    Add       Add R3,R1,R2
                                             Pop C     Store C,R3
Comparison: Bytes per instruction? Number of instructions? Cycles per instruction?
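To make the comparison concrete, the stack and load-store sequences can be executed by two tiny interpreters. This is a sketch under stated assumptions. The instruction tuples and register names are hypothetical, and memory is a dictionary keyed by variable name.

```python
def run_stack(program, memory):
    """Execute a 0-address stack program; ALU operands are implicit on the stack."""
    stack = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(memory[args[0]])
        elif op == "POP":
            memory[args[0]] = stack.pop()
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return memory


def run_load_store(program, memory):
    """Execute a 3-address load/store program; only LOAD and STORE touch memory."""
    regs = {}
    for op, *args in program:
        if op == "LOAD":
            rd, addr = args
            regs[rd] = memory[addr]
        elif op == "STORE":
            addr, rs = args
            memory[addr] = regs[rs]
        elif op == "ADD":
            rd, rs, rt = args
            regs[rd] = regs[rs] + regs[rt]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return memory


stack_program = [("PUSH", "A"), ("PUSH", "B"), ("ADD",), ("POP", "C")]
load_store_program = [("LOAD", "R1", "A"), ("LOAD", "R2", "B"),
                      ("ADD", "R3", "R1", "R2"), ("STORE", "C", "R3")]
```

Both sequences take four instructions. The questions about bytes and cycles per instruction depend on the encoding and on memory traffic, and this sketch does not model either.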
General Purpose Registers Dominate
♦ 1975-200x: all machines use general purpose registers
♦ Advantages of registers
  ■ Registers are faster than memory
  ■ Registers are easier for a compiler to use
    » e.g., (A*B) - (C*D) - (E*F) can do the multiplies in any order vs. a stack
  ■ Registers can hold variables
    » memory traffic is reduced, so the program is sped up (since registers are faster than memory)
    » code density improves (since a register is named with fewer bits than a memory location)
Caches vs. Registers
♦ Register advantages
  ■ Faster (no addressing mode, no tags)
  ■ Deterministic (no misses)
  ■ Can duplicate for two ports
  ■ Short identifier (3-8 bits)
♦ Register disadvantages
  ■ Must save/restore on procedure calls
  ■ Can't take the address of a register
  ■ Fixed size (FP, strings, structures)
  ■ Compiler must control (?)
Caches vs. Registers (cont'd)
♦ How many registers? More means:
  + Hold operands longer (reducing memory traffic & potentially execution time)
  - Longer register specifiers (except with register windows)
  - Slower registers
  - More state slows context switches
MIPS I Registers
♦ Programmable storage
  ■ 2^32 bytes of memory
  ■ 31 x 32-bit GPRs (R0 = 0)
  ■ 32 x 32-bit FP regs (paired DP)
  ■ HI, LO, PC
[Figure: register file r0 (= 0), r1, ..., r31, plus PC, lo, hi.]
Typical Operations
♦ Data movement: load (from memory), store (to memory), memory-to-memory move, register-to-register move, input (from I/O device), output (to I/O device), push, pop (to/from stack)
♦ Arithmetic: integer (binary + decimal) or FP; add, subtract, multiply, divide
♦ Logical: not, and, or, set, clear
♦ Shift: shift left/right, rotate left/right
Typical Operations (cont'd)
♦ Control (jump/branch): unconditional, conditional
♦ Subroutine linkage: call, return
♦ Interrupt: trap, return
♦ Synchronization: test & set (atomic r-m-w)
♦ String: search, translate
♦ Graphics (MMX): parallel subword ops (4 16-bit adds)
Little change since 1960.
Top 10 80x86 Instructions
Rank   Instruction              Integer average (percent of total executed)
1      load                     22%
2      conditional branch       20%
3      compare                  16%
4      store                    12%
5      add                       8%
6      and                       6%
7      sub                       5%
8      move register-register    4%
9      call                      1%
10     return                    1%
       Total                    96%
♦ Simple instructions dominate instruction frequency.
Operation Summary
♦ Support these simple instructions, since they will dominate the number of instructions executed:
  ■ load
  ■ store
  ■ add
  ■ subtract
  ■ move register-register
  ■ and
  ■ shift
  ■ compare equal, compare not equal
  ■ branch, jump
  ■ call
  ■ return
Operands for ALU Instructions
♦ ALU instructions combine operands (e.g., ADD)
♦ Number of explicit operands
  ■ Two: destination equals one source
  ■ Three: orthogonal
♦ Operands in registers or memory
  ■ Any combination: VAX
    » (orthogonal, but variable instruction formats)
  ■ At least one register: much of the 360
    » (not orthogonal)
  ■ All registers: CRAY, DLX, RISCs
    » (orthogonal, but needs loads/stores)
Memory Addressing
♦ Since 1980, almost every machine uses addresses to the level of 8 bits (byte)
♦ Two questions for the design of an ISA:
  ■ Since one could read a 32-bit word as four loads of bytes from sequential byte addresses or as one load word from a single byte address:
    » How do byte addresses map onto words?
    » Can a word be placed on any byte boundary?
Addressing Objects: Endian Wars
♦ Big Endian: address of most significant byte = word address (xx00 = Big End of word)
  ■ IBM 360/370, Motorola 68k, MIPS, SPARC, HP PA
♦ Little Endian: address of least significant byte = word address (xx00 = Little End of word)
  ■ Intel 80x86, DEC VAX, DEC Alpha (Windows NT)
♦ Mode selectable
  ■ Becoming more common: PowerPC, MIPS R10000
[Figure: byte numbering within a word, msb on the left and lsb on the right. Little endian numbers the bytes 3 2 1 0 (byte 0 at the lsb); big endian numbers them 0 1 2 3 (byte 0 at the msb).]
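Python's built-in `int.to_bytes` and `int.from_bytes` make the two byte orders easy to see. In this sketch the word value `0x0A0B0C0D` is arbitrary, and the list index stands for the low two address bits (xx00 through xx11).

```python
def bytes_in_memory(word, byteorder):
    """Bytes of a 32-bit word listed in address order (xx00, xx01, xx10, xx11)."""
    return list(word.to_bytes(4, byteorder=byteorder))


WORD = 0x0A0B0C0D
big = bytes_in_memory(WORD, "big")        # byte at xx00 is the most significant byte
little = bytes_in_memory(WORD, "little")  # byte at xx00 is the least significant byte

# Reading little-endian memory as if it were big-endian reverses the byte order.
misread = int.from_bytes(bytes(little), byteorder="big")
```

The `misread` value shows why data exchanged between big- and little-endian machines must be byte-swapped.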
Addressing Objects: Alignment
♦ Alignment: require that objects fall on an address that is a multiple of their size.
[Figure: byte offsets 0 1 2 3 within a word, contrasting aligned and not-aligned objects.]
Alignment
♦ No restrictions
  ■ Simpler software
  ■ Hardware must detect misalignment and make 2 memory accesses
  ■ Expensive logic, slows down all references
  ■ Sometimes required for backward compatibility
♦ Restricted alignment
  ■ Software must guarantee alignment
  ■ Hardware only detects misalignment and traps
  ■ Trap handler does it
♦ Middle ground
  ■ Misaligned data OK but requires multiple instructions
  ■ Compiler must still know
  ■ Still trap on misaligned access
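The alignment rule and the two-access cost of misaligned data can be sketched as follows. The sketch assumes byte addresses, a 4-byte memory word, and power-of-two object sizes no larger than a word. The function names are illustrative only.

```python
def is_aligned(address, size):
    """True if an object of `size` bytes at `address` is naturally aligned."""
    return address % size == 0


def aligned_accesses(address, size, word=4):
    """Number of aligned `word`-byte memory accesses covering address..address+size-1."""
    first_word = address // word
    last_word = (address + size - 1) // word
    return last_word - first_word + 1
```

An aligned object always fits in one access. A misaligned one can straddle a word boundary, and then the hardware, a trap handler, or the compiler has to issue two accesses.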
A "Typical" RISC
♦ 32-bit fixed-format instruction (3 formats)
♦ 32 32-bit GPRs (R0 contains zero; DP takes a pair)
♦ 3-address, reg-reg arithmetic instruction
♦ Single address mode for load/store: base + displacement
  ■ No indirection
♦ Simple branch conditions
♦ Delayed branch
See: SPARC, MIPS, MC88100, AMD2900, i960, i860, PA-RISC, DEC Alpha, Clipper, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3, ...
VAX-11
♦ Variable format, 2- and 3-address instructions
♦ 32-bit word size, 16 GPRs (four reserved)
♦ Rich set of addressing modes (apply to any operand)
♦ Rich set of operations
  ■ bit-field, stack, call, case, loop, string, poly, system
♦ Rich set of data types (B, W, L, Q, O, F, D, G, H)
♦ Condition codes
Instruction format: OpCode (byte 0), then one address/mode specifier (A/M) per operand (bytes 1 ... n ... m)
VAX-11: Addressing Modes
1. Register: Ri
2. Base + Displacement: M[Ri + v]
3. Immediate: v
4. Register Indirect: M[Ri]
5. Direct (absolute): M[v]
6. Base + Index: M[Ri + Rj]
7. Scaled Index: M[Ri + Rj*d + v]
8. Autoincrement: M[Ri++]
9. Autodecrement: M[Ri--]
10. Memory Indirect: M[M[Ri]]
11. [Indirection chains]
Modes 1-4 account for 93% of all operands on the VAX.
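A minimal sketch of how modes 1-7 and 10 locate an operand. Autoincrement, autodecrement, and indirection chains are omitted. The mode names are descriptive labels chosen for this example, not VAX assembler syntax. Registers and memory are modeled as Python dictionaries, `v` is the displacement or immediate, and `d` is the scale factor.

```python
def read_operand(mode, regs, mem, ri=None, rj=None, v=0, d=1):
    """Fetch an operand using one of the addressing modes listed above."""
    if mode == "register":            # 1. Ri
        return regs[ri]
    if mode == "displacement":        # 2. M[Ri + v]
        return mem[regs[ri] + v]
    if mode == "immediate":           # 3. v
        return v
    if mode == "register_indirect":   # 4. M[Ri]
        return mem[regs[ri]]
    if mode == "direct":              # 5. M[v]
        return mem[v]
    if mode == "base_index":          # 6. M[Ri + Rj]
        return mem[regs[ri] + regs[rj]]
    if mode == "scaled_index":        # 7. M[Ri + Rj*d + v]
        return mem[regs[ri] + regs[rj] * d + v]
    if mode == "memory_indirect":     # 10. M[M[Ri]]
        return mem[mem[regs[ri]]]
    raise ValueError(f"unsupported mode {mode!r}")
```

The number of memory reads grows with the mode. Register and immediate modes need none, and memory indirect needs two, which is one reason a few simple modes dominate in practice.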
Addressing Mode Usage
♦ 3 programs measured on a machine with all address modes (VAX)
  ■ Displacement: 42% avg, 32% to 55%
  ■ Immediate: 33% avg, 17% to 43%
  ■ Register deferred (indirect): 13% avg, 3% to 24%
  ■ Scaled: 7% avg, 0% to 16%
  ■ Memory indirect: 3% avg, 1% to 6%
  ■ Misc: 2% avg, 0% to 3%
♦ 75% displacement & immediate
♦ 88% displacement, immediate & register indirect
Displacement Address Size?
♦ Avg. of 5 SPECint92 programs vs. avg. of 5 SPECfp92 programs
♦ 1% of addresses > 16 bits
♦ 12-16 bits of displacement needed
[Figure: percentage of displacements (0% to 30%) by number of address bits (0 to 15), Int. Avg. vs. FP Avg.]
Immediate Size?
♦ 50% to 60% fit within 8 bits
♦ 75% to 80% fit within 16 bits
Addressing Summary
♦ Data addressing modes that are important:
  ■ Displacement, immediate, register indirect
♦ Displacement size should be 12 to 16 bits
♦ Immediate size should be 8 to 16 bits
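Whether a given displacement or immediate fits a field of a given width is a one-line range check. This sketch assumes the field holds a two's-complement (signed) value. An unsigned field would instead test `0 <= value < 2**bits`.

```python
def fits_signed(value, bits):
    """True if `value` is representable in a `bits`-wide two's-complement field."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))
```

A value that fails the check must be built with extra instructions. That is why the field widths above are chosen to cover the common case.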
Generic Example of Instruction Format Widths
[Figure: instruction layouts for variable-width, fixed-width, and hybrid formats.]
Instruction Formats
♦ If code size is most important, use variable-length instructions
♦ If performance is most important, use fixed-length instructions
♦ Recent embedded machines (ARM, MIPS) added an optional mode to execute a subset of 16-bit-wide instructions (Thumb, MIPS16); per procedure, decide performance or density
♦ Some architectures are actually exploring on-the-fly decompression for more density
Instruction Format
♦ If there are many memory operands per instruction and/or many addressing modes:
  => need one address specifier per operand
♦ If it is a load-store machine with 1 address per instruction and one or two addressing modes:
  => can encode the addressing mode in the opcode
MIPS Addressing Modes/Instruction Formats
• All instructions 32 bits wide
♦ Register (direct): op rs rt rd; operands in registers
♦ Immediate: op rs rt immed; operand is the immediate
♦ Base + index: op rs rt immed; operand at Memory[register + immed]
♦ PC-relative: op rs rt immed; address is PC + immed
• Register indirect?
Most Popular ISA of All Time: Intel 80x86
♦ 1971: Intel invents microprocessor 4004/8008, 8080 in 1974
♦ 1975: Gordon Moore realized one more chance for new ISA before ISA locked in for decades
  ■ Hired CS people in Oregon
  ■ Weren't ready in 1977 (CS people did 432 in 1980)
  ■ Started crash effort for 16-bit microcomputer
♦ 1978: 8086 dedicated registers, segmented address, 16 bit
  ■ 8088: 8-bit external bus version of 8086
♦ 1980: IBM selects 8088 as basis for IBM PC
♦ 1980: 8087 floating point coprocessor: adds 60 instructions using hybrid stack/register scheme
♦ 1982: 80286 24-bit address, protection, memory mapping
♦ 1985: 80386 32-bit address, 32-bit GP registers, paging
Intel x86 (IA-32)
♦ 1989: 80486 & Pentium in 1993: faster + a few MP instructions
♦ 1997: MMX multimedia extensions
♦ 200X: Expected to be superseded by IA-64 (Merced, McKinley, Itanium, etc.)
♦ "Difficult to explain and impossible to love"
  ■ See H&P Appendix D.8
♦ Eight 32-bit registers (EAX, EBX, ..., but also ESP, EBP)
  ■ Also 16- and 8-bit versions (AX, AH, AL)
♦ Most instructions have two operands, one possibly from memory
♦ One super-duper addressing mode with effective address = base_reg + (index_reg * scaling_factor) + displacement
♦ Many formats: see H&P fig. D.8
Intel MMX
♦ MultiMedia eXtension to IA-32 [Peleg & Weiser, IEEE Micro, 8/96]
  ■ Multimedia data values often need much less than 32 bits
  ■ But are organized in groups (e.g., red/green/blue)
  ■ So in 64-bit FP registers: 2x32, 4x16, 8x8
♦ E.g., ADDB (for byte), saturating (100 + 200 clamps to 255):
       17    87   100   ... 6 more
    +  17    13   200   ... 6 more
    ---------------------------------
       34   100   255   ... 6 more
♦ MMX takes a 16-element dot product (a0*b0 + a1*b1 + ... + a15*b15)
  ■ from 200 to 16 instructions & from 76 to 12 cycles (6x)
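The byte example above can be modeled by treating a 64-bit integer as eight packed byte lanes. The third lane (100 + 200) produces 255 rather than 44 (300 mod 256), so the slide's ADDB is a saturating add. On real MMX hardware this behavior corresponds to `PADDUSB`, while `PADDB` wraps around. The helper names below are hypothetical.

```python
def pack(lanes):
    """Pack up to eight byte values into a 64-bit integer, lane 0 in the lowest byte."""
    return sum((v & 0xFF) << (8 * i) for i, v in enumerate(lanes))


def unpack(word):
    """Split a 64-bit integer into its eight byte lanes, lane 0 first."""
    return [(word >> (8 * i)) & 0xFF for i in range(8)]


def add_bytes_saturating(a, b):
    """Lane-wise unsigned byte add that clamps at 255 instead of wrapping."""
    result = 0
    for i in range(8):
        shift = 8 * i
        total = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        result |= min(total, 0xFF) << shift
    return result
```

A single hardware instruction performs all eight lane additions at once. The loop here only spells out what each lane computes.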
Control Instructions
                       Taken or     Where is      Link return   Save or
                       not taken?   the target?   address?      restore state?
Conditional branches   X            X
Jumps                               X
Procedure calls                     X             X             X
Procedure returns                   X                           X
O.S. calls                          X             X             X
O.S. returns                        X                           X
(1) Taken or not taken?
♦ Compare and branch instruction
  + No extra compare instruction
  + No state passed between instructions
  - Requires ALU operation
  - Restricts code scheduling opportunities
♦ Implicitly set condition codes (Z, N, V, C)
  + Can be set "for free"
  - Constrains code reordering
  - Extra state to save and restore
♦ Explicitly set condition codes (Z, N, V, C)
  + Can be set "for free"
  + Decouples branch/fetch from pipeline
  - Extra state to save and restore
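What "set for free" means for implicitly set condition codes can be sketched as an add that also produces the N, Z, V, and C bits as a side effect. The sketch assumes 32-bit two's-complement operands given as unsigned bit patterns. A following `bz` on such a machine would simply test Z. The function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def add_with_flags(a, b):
    """Add two 32-bit values and return (result, N, Z, V, C) with 0/1 flags."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    n = result >> 31                          # sign bit of the result
    z = int(result == 0)                      # result is zero
    c = int(full > MASK32)                    # carry out of bit 31 (unsigned overflow)
    v = ((a ^ result) & (b ^ result)) >> 31   # operands agree in sign, result differs
    return result, n, z, v, c
```

The flags are extra machine state. This is the "extra state to save and restore" cost listed above.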
(1) Taken or not taken?, cont.
♦ Condition in general-purpose register
  + No special state to save and implement, but uses up a register
  - Branch condition separated from branch logic in pipeline
♦ Some data for MIPS
  ■ > 80% of compares for branches use immediates
  ■ > 80% of these immediates are zero
  ■ 50% of compares for branches are = 0 or != 0
♦ Compromise used in MIPS
  ■ Have branch-if = 0 and branch-if != 0
  ■ Have compare instructions (r1 = r2, r1 != r2, r1 < r2, r1 <= r2, etc.)
♦ With pipelining, can we predict whether taken?
  ■ Statically?
  ■ Dynamically?
(2) Where is the target?
♦ Could use arbitrary specifier?
  + Orthogonal and powerful
  - More bits to specify, more time to decode
  - Branch execution and target separated in pipeline
♦ PC-relative with immediate
  + Position independence (helps linking), target computable in branch unit
  + Short immediate sufficient. MIPS word immediate: <= 4 bits: 47%; <= 8 bits: 94%; <= 12 bits: 100%
  - Target must be known statically (to link)
  - Can't jump arbitrarily far
  - Other techniques are required for returns and distant jumps
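In MIPS conditional branches, the 16-bit offset field counts instructions (words). It is sign extended, shifted left by 2, and added to the address of the instruction after the branch: target = PC + 4 + (offset << 2). A 16-bit field therefore reaches about ±2^15 instructions (±128 KiB) from the branch. The sketch below assumes 32-bit addresses, and the helper names are illustrative.

```python
def sign_extend16(field):
    """Interpret a 16-bit field as a two's-complement integer."""
    field &= 0xFFFF
    return field - 0x10000 if field & 0x8000 else field


def branch_target(pc, offset_field):
    """MIPS conditional branch target: PC + 4 + (sign-extended word offset << 2)."""
    return (pc + 4 + (sign_extend16(offset_field) << 2)) & 0xFFFFFFFF
```

The target depends only on the PC and the instruction bits, so the branch unit can compute it early. That is the advantage of PC-relative targets claimed above.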
(2) Where is the target?, cont.
♦ Register
  + Short specification
  + Can jump anywhere
  + Dynamic target okay (returns)
  - Extra instruction to load register
♦ (Vectored) trap: critical for O.S. calls
  + Protection
  - Implementation headache
♦ Common compromise
  ■ (Conditional) branches (PC-rel)
  ■ (Unconditional) jumps (PC-rel, reg)
  ■ Procedure calls (PC-rel, reg)
  ■ Procedure returns (reg)
  ■ O.S. calls (trap)
  ■ O.S. returns (reg)
(3) Link return address?
♦ Implicit register
  + Fast, simple
  - SW must save register before next call
  - Surprise traps or interrupts?
♦ Explicit register
  - No important advantages over above
  - Register must be specified
♦ Processor stack
  + Recursion supported directly
  - Complex instruction
Required for procedure calls and O.S. calls.
Many recent architectures use an implicit register.
(4) Save or restore state?
♦ What state?
  ■ Procedure calls: registers
  ■ O.S. calls: registers and PSW (incl. CCs)
♦ Hardware need not save registers
  ■ Caller can save registers in use
  ■ Callee can save registers it will use
♦ Hardware register save
  ■ Which (IBM STM, VAX CALLS)?
  ■ Is the above faster?
  ■ Register windows
Many recent architectures do no register saving or do implicit saving with register windows.
MIPS: Register State
♦ 32 integer registers
  ■ $0 is hardwired to 0
  ■ $31 is the return address register
  ■ Software convention for other registers
♦ 32 single-precision FP registers or 16 double-precision FP registers
♦ PC and other special registers
MIPS I Operation Overview
♦ Arithmetic/logical:
  ■ Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU
  ■ AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, LUI
  ■ SLL, SRL, SRA, SLLV, SRLV, SRAV
♦ Memory access:
  ■ LB, LBU, LH, LHU, LW, LWL, LWR
  ■ SB, SH, SW, SWL, SWR
Multiply / Divide
♦ Start multiply, divide
  ■ MULT rs, rt
  ■ MULTU rs, rt
  ■ DIV rs, rt
  ■ DIVU rs, rt
♦ Move result from multiply, divide
  ■ MFHI rd
  ■ MFLO rd
♦ Move to HI or LO
  ■ MTHI rs
  ■ MTLO rs
♦ Why not a third field for the destination?
  ■ (Hint: how many clock cycles for multiply or divide vs. add?)
Registers: HI, LO
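The HI/LO split can be sketched by modeling MULT and DIV as functions that return the pair (HI, LO). The sketch assumes 32-bit register values given as unsigned bit patterns, and the helper names are hypothetical. One deliberate difference: dividing by zero here raises Python's `ZeroDivisionError`, whereas MIPS raises no exception and leaves the result undefined.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mult(rs, rt):
    """Model of MULT: the signed 64-bit product, returned as (HI, LO)."""
    product = (to_signed(rs) * to_signed(rt)) & ((1 << 64) - 1)
    return product >> 32, product & MASK32


def div(rs, rt):
    """Model of DIV: returns (HI, LO) = (remainder, quotient), truncating toward zero."""
    a, b = to_signed(rs), to_signed(rt)
    q = abs(a) // abs(b)          # Python's // floors, so divide the magnitudes first
    if (a < 0) != (b < 0):
        q = -q
    r = a - q * b                 # remainder takes the sign of the dividend
    return r & MASK32, q & MASK32
```

One reading of the hint above: multiply and divide take many more cycles than an add. Writing to dedicated HI/LO registers lets later instructions proceed, and MFHI/MFLO collect the result when it is needed.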
Data Types
Bit: 0, 1
Bit string: sequence of bits of a particular length
  4 bits is a nibble
  8 bits is a byte
  16 bits is a half-word
  32 bits is a word
  64 bits is a double-word
Character: ASCII 7-bit code; UNICODE 16-bit code
Decimal: digits 0-9 encoded as 0000b through 1001b; two decimal digits packed per 8-bit byte
Integers: 2's complement
Floating point: single precision, double precision, extended precision
  $M \times R^{E}$ (M = mantissa, R = base, E = exponent)
  How many +/- #'s? Where is the decimal point? How are +/- exponents represented?
Operand Size Usage
Frequency of reference by size:
              Int Avg.   FP Avg.
Byte             7%         0%
Halfword        19%         0%
Word            74%        31%
Doubleword       0%        69%
• Support for these data sizes and types: 8-bit, 16-bit, 32-bit integers and 32-bit and 64-bit IEEE 754 floating point numbers
MIPS arithmetic instructions
Instruction        Example           Meaning                          Comments
add                add $1,$2,$3      $1 = $2 + $3                     3 operands; exception possible
subtract           sub $1,$2,$3      $1 = $2 - $3                     3 operands; exception possible
add immediate      addi $1,$2,100    $1 = $2 + 100                    + constant; exception possible
add unsigned       addu $1,$2,$3     $1 = $2 + $3                     3 operands; no exceptions
subtract unsigned  subu $1,$2,$3     $1 = $2 - $3                     3 operands; no exceptions
add imm. unsign.   addiu $1,$2,100   $1 = $2 + 100                    + constant; no exceptions
multiply           mult $2,$3        Hi, Lo = $2 x $3                 64-bit signed product
multiply unsigned  multu $2,$3       Hi, Lo = $2 x $3                 64-bit unsigned product
divide             div $2,$3         Lo = $2 ÷ $3, Hi = $2 mod $3     Lo = quotient, Hi = remainder
divide unsigned    divu $2,$3        Lo = $2 ÷ $3, Hi = $2 mod $3     Unsigned quotient & remainder
move from Hi       mfhi $1           $1 = Hi                          Used to get copy of Hi
move from Lo       mflo $1           $1 = Lo                          Used to get copy of Lo
MIPS logical instructions
Instruction               Example           Meaning            Comment
and                       and $1,$2,$3      $1 = $2 & $3       3 reg. operands; logical AND
or                        or $1,$2,$3       $1 = $2 | $3       3 reg. operands; logical OR
xor                       xor $1,$2,$3      $1 = $2 ⊕ $3       3 reg. operands; logical XOR
nor                       nor $1,$2,$3      $1 = ~($2 | $3)    3 reg. operands; logical NOR
and immediate             andi $1,$2,10     $1 = $2 & 10       logical AND reg, constant
or immediate              ori $1,$2,10      $1 = $2 | 10       logical OR reg, constant
xor immediate             xori $1,$2,10     $1 = $2 ⊕ 10       logical XOR reg, constant
shift left logical        sll $1,$2,10      $1 = $2 << 10      shift left by constant
shift right logical       srl $1,$2,10      $1 = $2 >> 10      shift right by constant
shift right arithm.       sra $1,$2,10      $1 = $2 >> 10      shift right (sign extend)
shift left logical var.   sllv $1,$2,$3     $1 = $2 << $3      shift left by variable
shift right logical var.  srlv $1,$2,$3     $1 = $2 >> $3      shift right by variable
shift right arithm. var.  srav $1,$2,$3     $1 = $2 >> $3      shift right arith. by variable
MIPS data transfer instructions
Instruction        Comment
SW 500(R4), R3     Store word
SH 502(R2), R3     Store half
SB 41(R3), R2      Store byte
LW R1, 30(R2)      Load word
LH R1, 40(R3)      Load halfword
LHU R1, 40(R3)     Load halfword unsigned
LB R1, 40(R3)      Load byte
LBU R1, 40(R3)     Load byte unsigned
LUI R1, 40         Load Upper Immediate (16 bits shifted left by 16)
[Figure: LUI R5 places the immediate in the upper 16 bits of R5 and fills the lower 16 bits with 0000 … 0000.]
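LUI is typically paired with ORI to build an arbitrary 32-bit constant in two instructions. This sketch assumes 32-bit registers. The helper names mirror the mnemonics but are hypothetical functions, not an assembler API.

```python
MASK32 = 0xFFFFFFFF


def lui(imm16):
    """LUI: place the 16-bit immediate in the upper half and zeros in the lower half."""
    return (imm16 & 0xFFFF) << 16


def ori(rs_value, imm16):
    """ORI: OR with a zero-extended 16-bit immediate."""
    return (rs_value | (imm16 & 0xFFFF)) & MASK32


def load_constant(value):
    """Build a 32-bit constant as `lui rt, upper` followed by `ori rt, rt, lower` would."""
    upper = (value >> 16) & 0xFFFF
    lower = value & 0xFFFF
    return ori(lui(upper), lower)
```

ORI zero-extends its immediate, so the lower half can be any 16-bit pattern (for example `0xBEEF`). Using ADDIU instead would sign extend the lower half, and the upper half would need adjusting whenever bit 15 is set.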
When does MIPS sign extend?
♦ When a value is sign extended, its upper bit is copied into all the higher bits of the full value. Examples of sign extending 8 bits to 16 bits:
  00001010 ⇒ 00000000 00001010
  10001100 ⇒ 11111111 10001100
♦ When is an immediate value sign extended?
  ■ Arithmetic instructions (add, sub, etc.) sign extend immediates, even for the unsigned versions of the instructions!
  ■ Logical instructions do not sign extend
♦ Loads of halfwords or bytes sign extend, but the unsigned versions (lhu, lbu) do not.
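Sign extension and zero extension are both simple masking operations. This sketch assumes values are handled as unsigned bit patterns and that results are 32 bits wide unless stated otherwise. The function names are illustrative.

```python
def sign_extend(value, bits, to_bits=32):
    """Copy bit (bits - 1) of a `bits`-wide value into every higher bit up to `to_bits`."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value |= ((1 << to_bits) - 1) ^ ((1 << bits) - 1)
    return value


def zero_extend(value, bits):
    """Keep the low `bits` bits and fill everything above them with zeros."""
    return value & ((1 << bits) - 1)
```

With these, `lb` of the byte `0x8C` corresponds to `sign_extend(0x8C, 8)` and `lbu` to `zero_extend(0x8C, 8)`.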
Methods of Testing Condition
♦ Condition codes
  ■ Processor status bits are set as a side effect of arithmetic instructions (possibly on moves) or explicitly by compare or test instructions.
  ■ Ex:
      add r1, r2, r3
      bz  label
♦ Condition register
  ■ Ex:
      cmp r1, r2, r3
      bgt r1, label
♦ Compare and branch
  ■ Ex:
      bgt r1, r2, label
Conditional Branch Distance
[Figure: percentage of branches (0% to 40%) by bits of branch displacement (0 to 15), Int. Avg. vs. FP Avg.]
• 25% of integer branches are 2 to 4 instructions
Conditional Branch Addressing
♦ PC-relative, since most branches are relatively close to the current PC
♦ At least 8 bits suggested (±128 instructions)
♦ Compare Equal/Not Equal most important for integer programs (86%)
Frequency of comparison types in branches:
          Int Avg.   FP Avg.
EQ/NE       86%       37%
GT/LE        7%       23%
LT/GE        7%       40%
MIPS Compare and Branch
♦ Compare and branch
  ■ BEQ rs, rt, offset    if R[rs] == R[rt] then PC-relative branch
  ■ BNE rs, rt, offset    <>
♦ Compare to zero and branch
  ■ BLEZ rs, offset    if R[rs] <= 0 then PC-relative branch
  ■ BGTZ rs, offset    >
  ■ BLTZ rs, offset    <
  ■ BGEZ rs, offset    >=
  ■ BLTZAL rs, offset    if R[rs] < 0 then branch and link (into R31)
  ■ BGEZAL rs, offset    >=
♦ Remaining set of compare and branch ops take two instructions
♦ Almost all comparisons are against zero!
MIPS jump, branch, compare instructions
Instruction          Example           Meaning                          Comment
branch on equal      beq $1,$2,100     if ($1 == $2) go to PC+4+100     Equal test; PC-relative branch
branch on not eq.    bne $1,$2,100     if ($1 != $2) go to PC+4+100     Not equal test; PC-relative
set on less than     slt $1,$2,$3      if ($2 < $3) $1=1; else $1=0     Compare less than; 2's comp.
set less than imm.   slti $1,$2,100    if ($2 < 100) $1=1; else $1=0    Compare < constant; 2's comp.
set less than uns.   sltu $1,$2,$3     if ($2 < $3) $1=1; else $1=0     Compare less than; unsigned numbers
set l. t. imm. uns.  sltiu $1,$2,100   if ($2 < 100) $1=1; else $1=0    Compare < constant; unsigned numbers
jump                 j 10000           go to 10000                      Jump to target address
jump register        jr $31            go to $31                        For switch, procedure return
jump and link        jal 10000         $31 = PC + 4; go to 10000        For procedure call
Signed vs. Unsigned Comparison
R1 = 0…00 0000 0000 0000 0001 (binary)
R2 = 0…00 0000 0000 0000 0010 (binary)
R3 = 1…11 1111 1111 1111 1111 (binary)
♦ After executing these instructions:
    slt  r4, r2, r1   ; if (r2 < r1) r4 = 1; else r4 = 0
    slt  r5, r3, r1   ; if (r3 < r1) r5 = 1; else r5 = 0
    sltu r6, r2, r1   ; if (r2 < r1) r6 = 1; else r6 = 0
    sltu r7, r3, r1   ; if (r3 < r1) r7 = 1; else r7 = 0
♦ What are the values of registers r4 - r7? Why?
    r4 = ; r5 = ; r6 = ; r7 = ;
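One way to check your answer is to model `slt` and `sltu` directly, assuming 32-bit registers held as unsigned bit patterns. The helper names are hypothetical. The two instructions differ only in whether the bit patterns are read as two's-complement or as unsigned integers before comparing.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def slt(rs, rt):
    """set on less than: compare as signed 32-bit integers."""
    return int(to_signed(rs) < to_signed(rt))


def sltu(rs, rt):
    """set on less than unsigned: compare the raw 32-bit patterns."""
    return int((rs & MASK32) < (rt & MASK32))


r1, r2, r3 = 0x00000001, 0x00000002, 0xFFFFFFFF
r4 = slt(r2, r1)    # 2 < 1 is false
r5 = slt(r3, r1)    # -1 < 1 is true
r6 = sltu(r2, r1)   # 2 < 1 is false
r7 = sltu(r3, r1)   # 4294967295 < 1 is false
```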
Calls: Why Are Stacks So Great?
Stacking of subroutine calls & returns and environments:
    A:  CALL B
        B:  CALL C
            C:  RET
        RET
Stack contents over time: A → A B → A B C → A B → A
Some machines provide a memory stack as part of the architecture (e.g., VAX).
Sometimes stacks are implemented via software convention (e.g., MIPS).
Memory Stacks
Useful for stacked environments/subroutine call & return even if an operand stack is not part of the architecture.
Stacks that grow up vs. stacks that grow down:
[Figure: a stack holding a, b, c in memory addresses ranging from 0 (Little) to inf. (Big), drawn once growing up and once growing down; SP points either to the next empty slot or to the last full slot.]
How is an empty stack represented?
♦ Little --> Big / Last Full
  ■ POP: Read from Mem(SP), Decrement SP
  ■ PUSH: Increment SP, Write to Mem(SP)
♦ Little --> Big / Next Empty
  ■ POP: Decrement SP, Read from Mem(SP)
  ■ PUSH: Write to Mem(SP), Increment SP
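The two "grows up" conventions differ only in the order of the SP update and the memory access, and in how an empty stack is represented. A minimal sketch, assuming one memory slot per stack entry and no overflow or underflow checks. The class name is hypothetical.

```python
class GrowUpStack:
    """Memory stack that grows from low ("Little") to high ("Big") addresses."""

    def __init__(self, size, next_empty):
        self.mem = [None] * size
        self.next_empty = next_empty
        # Empty-stack representation: SP is the first free slot (next empty)
        # or one slot below the first entry (last full).
        self.sp = 0 if next_empty else -1

    def push(self, value):
        if self.next_empty:          # write to Mem(SP), then increment SP
            self.mem[self.sp] = value
            self.sp += 1
        else:                        # increment SP, then write to Mem(SP)
            self.sp += 1
            self.mem[self.sp] = value

    def pop(self):
        if self.next_empty:          # decrement SP, then read from Mem(SP)
            self.sp -= 1
            return self.mem[self.sp]
        value = self.mem[self.sp]    # read from Mem(SP), then decrement SP
        self.sp -= 1
        return value
```

Both variants return values in last-in, first-out order. They differ only in where SP points between operations.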
Call-Return Linkage: Stack Frames
[Figure: a stack frame from High Mem to Low Mem: ARGS; callee save registers (old FP, RA); local variables. FP marks the frame and SP its top.]
♦ Reference args and local variables at a fixed (positive) offset from FP
♦ The top of the frame (at SP) grows and shrinks during expression evaluation
♦ Many variations on stacks possible (up/down, last pushed / next)
♦ Compilers normally keep scalar variables in registers, not memory!
MIPS: Software Conventions for Registers
0      zero   constant 0
1      at     reserved for assembler
2      v0     expression evaluation &
3      v1     function results
4      a0     arguments
5      a1
6      a2
7      a3
8      t0     temporary: caller saves
. . .         (callee can clobber)
15     t7
16     s0     callee saves
. . .         (callee must save)
23     s7
24     t8     temporary (cont'd)
25     t9
26     k0     reserved for OS kernel
27     k1
28     gp     pointer to global area
29     sp     stack pointer
30     fp     frame pointer
31     ra     return address (HW)
MIPS / GCC Calling Conventions
fact:
    addiu $sp, $sp, -32
    sw    $ra, 20($sp)
    sw    $fp, 16($sp)
    addiu $fp, $sp, 32
    . . .
    sw    $a0, 0($fp)
    ...
    lw    $31, 20($sp)
    lw    $fp, 16($sp)
    addiu $sp, $sp, 32
    jr    $31
First four arguments passed in registers.
[Figure: stack snapshots (growing toward low addresses) showing where FP and SP point and where ra and the old FP are saved.]
Details of the MIPS instruction set
♦ Register zero always has the value zero (even if you try to write it)
♦ Branch/jump and link put the return address PC+4 or PC+8 into the link register (R31) (depends on logical vs. physical architecture)
♦ All instructions change all 32 bits of the destination register (including lui, lb, lh) and all read all 32 bits of sources (add, sub, and, or, …)
♦ Immediate arithmetic and logical instructions are extended as follows:
  ■ Logical immediate ops are zero extended to 32 bits
  ■ Arithmetic immediate ops are sign extended to 32 bits (including addiu)
♦ The data loaded by the byte and halfword load instructions are extended as follows:
  ■ lbu, lhu are zero extended
  ■ lb, lh are sign extended
♦ Overflow can occur in these arithmetic and logical instructions:
  ■ add, sub, addi
  ■ It cannot occur in addu, subu, addiu, and, or, xor, nor, shifts, mult, multu, div, divu
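The only difference between `add` and `addu` is whether signed overflow traps. A minimal sketch, assuming 32-bit operands given as unsigned bit patterns. `IntegerOverflow` is a hypothetical stand-in for the MIPS arithmetic overflow exception.

```python
MASK32 = 0xFFFFFFFF


class IntegerOverflow(Exception):
    """Stand-in for the MIPS arithmetic overflow exception (name is hypothetical)."""


def addu(a, b):
    """ADDU: 32-bit add that wraps around silently."""
    return (a + b) & MASK32


def add(a, b):
    """ADD: same sum as ADDU, but trap on signed overflow."""
    a &= MASK32
    b &= MASK32
    result = addu(a, b)
    # Signed overflow: both operands have the same sign and the result's sign differs.
    if (a ^ result) & (b ^ result) & 0x80000000:
        raise IntegerOverflow(f"{a:#010x} + {b:#010x}")
    return result
```

The exception is raised before any result is returned. This mirrors the MIPS rule that `add` leaves its destination register unmodified when it traps.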
Delayed Branches
♦ In the "raw" MIPS, the instruction after the branch is executed even when the branch is taken
  ■ This is hidden by the assembler for the MIPS "virtual machine"
  ■ Allows the compiler to better utilize the instruction pipeline (???)
        li   r3, #7
        sub  r4, r4, 1
        bz   r4, LL
        addi r5, r3, 1
        subi r6, r6, 2
    LL: slt  r1, r3, r5
Branch & Pipelines
By the end of the branch instruction, the CPU knows whether or not the branch will take place.
However, it will have fetched the next instruction by then, regardless of whether or not the branch will be taken.
Why not execute it?
[Figure: pipeline timeline in which each instruction's ifetch overlaps the previous instruction's execute: li r3, #7; sub r4, r4, 1; bz r4, LL (branch); addi r5, r3, 1 (delay slot); LL: slt r1, r3, r5 (branch target).]
Filling Delayed Branches
   Branch:           IF | DEC & OP fetch | Execute
   Successor:             IF | DEC & OP fetch | Execute    (executed even if the branch is taken!)
   Target or next:             IF | ...
Then the branch target is fetched, or execution continues. A single delay slot impacts the critical path.
• The compiler can fill a single delay slot with a useful instruction 50% of the time:
  • try to move an instruction down from above the jump;
  • move an instruction up from the target, if safe.

   add r3, r1, r2
   sub r4, r4, 1
   bz r4, LL
   NOP
   ...
   LL: add rd, ...

Is this violating the ISA abstraction?
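The "move down from above" strategy amounts to a register-dependence check. The sketch below assumes a hypothetical `Instr` record listing the register each instruction writes and the registers it reads; memory dependences and the "move up from target" case are not modeled. It scans backwards for the latest instruction that can be reordered past everything after it, including the branch, and falls back to a NOP.

```python
from typing import List, NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    text: str
    dest: Optional[str]          # register written, or None
    srcs: Tuple[str, ...]        # registers read

def independent(a: Instr, b: Instr) -> bool:
    """True if a and b may swap order: no RAW, WAR or WAW register dependence."""
    if a.dest is not None and (a.dest in b.srcs or a.dest == b.dest):
        return False
    if b.dest is not None and b.dest in a.srcs:
        return False
    return True

def fill_from_above(body: List[Instr], branch: Instr) -> Tuple[List[Instr], Instr]:
    """Move one instruction from before the branch into its delay slot, else use a NOP."""
    for i in range(len(body) - 1, -1, -1):
        candidate = body[i]
        later = body[i + 1:] + [branch]
        if all(independent(candidate, other) for other in later):
            return body[:i] + body[i + 1:], candidate
    return body, Instr("nop", None, ())

body = [Instr("add r3, r1, r2", "r3", ("r1", "r2")),
        Instr("sub r4, r4, 1", "r4", ("r4",))]
branch = Instr("bz r4, LL", None, ("r4",))
new_body, slot = fill_from_above(body, branch)
print([i.text for i in new_body], "| delay slot:", slot.text)
```

On the slide's example, `sub r4, r4, 1` cannot fill the slot because the branch reads r4, but `add r3, r1, r2` touches neither r4 nor anything `sub` writes, so it replaces the NOP. When no candidate passes the check, the slot keeps its NOP, which is the other half of the 50% figure.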
Miscellaneous MIPS I instructions
♦ break: a breakpoint trap occurs, transferring control to the exception handler
♦ syscall: a system trap occurs, transferring control to the exception handler
♦ coprocessor instructions: support for floating point
♦ TLB instructions: support for virtual memory (discussed later)
♦ restore from exception (rfe): restores the previous interrupt mask and kernel/user mode bits into the status register
♦ load word left/right (lwl/lwr): support misaligned word loads
♦ store word left/right (swl/swr): support misaligned word stores
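A misaligned word load needs two instructions because a single access may only touch one aligned word. The sketch below models the lwl/lwr pair under two assumptions: big-endian byte order (on little-endian configurations the offsets of the two instructions are swapped) and memory held in a Python `bytes` object. The function names mirror the mnemonics, but this is only a model, not the hardware definition.

```python
def lwl(rt: int, mem: bytes, ea: int) -> int:
    """Big-endian lwl: bytes from ea to the end of its aligned word fill the high end of rt."""
    k = ea & 3                                    # byte offset inside the aligned word
    val = int.from_bytes(mem[ea:(ea & ~3) + 4], "big")
    shift = 8 * k                                 # the low k bytes of rt are preserved
    return (val << shift) | (rt & ((1 << shift) - 1))

def lwr(rt: int, mem: bytes, ea: int) -> int:
    """Big-endian lwr: bytes from the start of ea's aligned word up to ea fill the low end of rt."""
    n = (ea & 3) + 1                              # number of bytes loaded
    val = int.from_bytes(mem[(ea & ~3):ea + 1], "big")
    keep = (0xFFFFFFFF << (8 * n)) & 0xFFFFFFFF   # high bytes of rt that are preserved
    return (rt & keep) | val

def load_misaligned_word(mem: bytes, addr: int) -> int:
    """The big-endian idiom: lwl rt, 0(addr) followed by lwr rt, 3(addr)."""
    rt = lwl(0, mem, addr)
    return lwr(rt, mem, addr + 3)

mem = bytes(range(0x10, 0x18))                    # addresses 0..7 hold 0x10..0x17
print(hex(load_misaligned_word(mem, 1)))          # 0x11121314
```

For an aligned address both halves load the full word, so the same two-instruction sequence is safe whether or not the pointer happens to be aligned.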
MIPS: Instruction Set Format
■ load/store architecture with 3 explicit operands (ALU ops)
■ fixed 32-bit instructions
■ 3 instruction formats
  » R-type
  » I-type
  » J-type
■ 6 instruction set groups:
  » load/store: data movement operations
  » computational: arithmetic, logical, and shift operations
  » jump/branch: including calls and returns
  » coprocessor: FP instructions
  » coprocessor 0: memory management and exception handling
  » special: accessing special registers, system calls, breakpoint instructions, etc.
R2000/3000 Instruction Formats
♦ R-type (register), e.g. add $8, $17, $18    # $8 = $17 + $18

   Field:  OpCode   rs       rt       rd       shamt   funct
   Bits:   31..26   25..21   20..16   15..11   10..6   5..0
   Value:  0        17       18       8        0       32
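The R-type layout can be checked by packing the fields by hand. In this sketch the helper names `encode_r` and `decode_r` are hypothetical; the field widths and positions are the ones in the table above, and `add $8, $17, $18` packs to 0x02324020.

```python
# (name, width in bits, position of the least significant bit)
R_FIELDS = (("opcode", 6, 26), ("rs", 5, 21), ("rt", 5, 16),
            ("rd", 5, 11), ("shamt", 5, 6), ("funct", 6, 0))

def encode_r(**fields: int) -> int:
    """Pack named R-type fields into a 32-bit word; missing fields default to 0."""
    word = 0
    for name, width, shift in R_FIELDS:
        value = fields.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
    return word

def decode_r(word: int) -> dict:
    """Split a 32-bit word back into its R-type fields."""
    return {name: (word >> shift) & ((1 << width) - 1) for name, width, shift in R_FIELDS}

word = encode_r(opcode=0, rs=17, rt=18, rd=8, shamt=0, funct=32)
print(f"{word:#010x}")     # 0x02324020
print(decode_r(word))
```

Because every R-type instruction shares opcode 0, the decoder must look at `funct` (32 for add) to tell the operations apart.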
• I-type (immediate), e.g.
   addi $8, $17, -44       # $8 = $17 - 44
   lw   $8, -44($17)       # $8 = M[$17 - 44]
   beq  $17, $8, label     # if ($8 == $17) go to label

   Field:  OpCode   rs       rt       immediate
   Bits:   31..26   25..21   20..16   15..0
   Value:  "op"     17       8        -44
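The same packing works for the I-type examples. This sketch uses the standard MIPS I primary opcodes 8 (addi), 35 (lw) and 4 (beq); the helper names and the branch and label addresses are hypothetical. For beq, the 16-bit field holds the signed distance in words from the instruction after the branch, not the label's address.

```python
ADDI, LW, BEQ = 8, 35, 4      # MIPS I primary opcodes

def encode_i(opcode: int, rs: int, rt: int, imm: int) -> int:
    """Pack opcode[31:26] rs[25:21] rt[20:16] immediate[15:0]."""
    if not (0 <= rs < 32 and 0 <= rt < 32):
        raise ValueError("register number out of range")
    if not -(1 << 15) <= imm < (1 << 16):    # signed (addi, lw) or unsigned (ori) 16-bit value
        raise ValueError("immediate does not fit in 16 bits")
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def branch_offset(branch_pc: int, target: int) -> int:
    """beq/bne immediate: signed distance in words from the instruction after the branch."""
    delta = target - (branch_pc + 4)
    if delta % 4:
        raise ValueError("branch target must be word aligned")
    return delta // 4

print(f"{encode_i(ADDI, 17, 8, -44):#010x}")   # 0x2228ffd4
print(f"{encode_i(LW, 17, 8, -44):#010x}")     # 0x8e28ffd4
# beq $17, $8, label with the branch at 0x00400000 and label at 0x00400010 (hypothetical)
print(f"{encode_i(BEQ, 17, 8, branch_offset(0x00400000, 0x00400010)):#010x}")   # 0x12280003
```

Note that addi and lw share the exact same field layout; only the opcode tells the hardware whether the 16-bit value is an operand or a displacement.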
• J-type (jump), e.g. jal label    # call label; $31 = $pc + 8

   Field:  OpCode    target
   Bits:   31..26    25..0
   Value:  3 (jal)   word address of label (address bits 27..2)
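A 26-bit target cannot hold a full 32-bit address, so the hardware keeps the upper four bits of PC+4 and appends two zero bits. The sketch below models jal that way, with the link value PC+8 that reflects the delay slot on the R2000/3000; the helper names and the addresses in the usage lines are hypothetical.

```python
JAL = 3   # MIPS I primary opcode for jal

def encode_jal(pc: int, target: int) -> int:
    """J-type: opcode[31:26] target[25:0]; the field holds address bits 27..2."""
    if target & 3:
        raise ValueError("jump target must be word aligned")
    if (target & 0xF0000000) != ((pc + 4) & 0xF0000000):
        raise ValueError("target lies outside the current 256 MB region")
    return (JAL << 26) | ((target >> 2) & 0x03FFFFFF)

def execute_jal(word: int, pc: int, regs: dict) -> int:
    """Write the link register and return the target PC (reached after the delay slot runs)."""
    regs[31] = pc + 8                                  # return past the jal and its delay slot
    return ((pc + 4) & 0xF0000000) | ((word & 0x03FFFFFF) << 2)

regs = {}
word = encode_jal(0x00400000, 0x00400100)   # hypothetical addresses
print(f"{word:#010x}")                        # 0x0c100040
new_pc = execute_jal(word, 0x00400000, regs)
print(hex(new_pc), hex(regs[31]))             # 0x400100 0x400008
```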
Bhandarkar and Clark: RISC vs. CISC
♦ Compares the VAX 8700 with the MIPS M/2000 (R3000 chip)
♦ Combines three factors:
  ■ Architecture
  ■ Implementation
  ■ Compilers and OS
♦ Argues that:
  ■ Implementation effects are second order
  ■ Compilers are "similar"
  ■ RISCs are better than CISCs
♦ Is it a fair comparison of RISCs vs. CISCs?
Bhandarkar and Clark, cont.
RISC factor: $r = \frac{\mathrm{CPI}_{VAX} \times \mathrm{Instr}_{VAX}}{\mathrm{CPI}_{MIPS} \times \mathrm{Instr}_{MIPS}}$

   Benchmark    Inst. ratio   CPI MIPS   CPI VAX   CPI ratio    RISC factor
                (MIPS/VAX)                         (VAX/MIPS)
   spice2g6     2.5           1.8        8.0       4.4          1.8
   matrix300    2.4           3.0        13.8      4.5          1.9
   nasa7        2.1           3.0        15.0      5.0          2.4
   fpppp        2.9           1.5        15.2      10.5         2.7
   tomcatv      2.9           2.1        17.5      8.2          2.9
   doduc        2.7           1.7        13.2      7.9          3.0
   espresso     1.7           1.1        5.4       5.1          3.0
   eqntott      1.1           1.3        4.4       3.5          3.3
   li           1.6           1.1        6.5       6.0          3.7
   geo. mean    2.2           1.7        9.9       5.8          2.7
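Since the instruction ratio is Instr_MIPS/Instr_VAX and the CPI ratio is CPI_VAX/CPI_MIPS, the RISC factor is simply the CPI ratio divided by the instruction ratio: the number of times fewer clock cycles MIPS needs for the same program (clock rates are not included). A minimal sketch, with a few rows copied from the table; the helper names are hypothetical:

```python
from math import prod

# (benchmark, instruction ratio MIPS/VAX, CPI ratio VAX/MIPS), copied from the table
ROWS = [("spice2g6", 2.5, 4.4), ("espresso", 1.7, 5.1), ("li", 1.6, 6.0)]

def risc_factor(instr_ratio: float, cpi_ratio: float) -> float:
    """(CPI_VAX * Instr_VAX) / (CPI_MIPS * Instr_MIPS), expressed with the table's two ratios."""
    return cpi_ratio / instr_ratio

def geometric_mean(values) -> float:
    """The kind of mean used in the table's last row."""
    values = list(values)
    return prod(values) ** (1 / len(values))

for name, instr_ratio, cpi_ratio in ROWS:
    print(f"{name}: {risc_factor(instr_ratio, cpi_ratio):.2f}")
# spice2g6: 1.76
# espresso: 3.00
# li: 3.75
print(f"geometric mean of these rows: {geometric_mean(risc_factor(i, c) for _, i, c in ROWS):.2f}")
```

Because the table entries are already rounded, recomputed factors can differ slightly from the printed column (spice2g6 gives 1.76, shown as 1.8).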
Bhandarkar and Clark, cont.
♦ Compensation factors
  ■ Increase VAX CPI but decrease VAX instruction count
  ■ Increase MIPS instruction count
  ■ Example 1: loads and stores vs. operand specifiers
  ■ Example 2: necessary complex operations, e.g. loop branches
♦ Factors favoring VAX
  ■ Big immediate values
  ■ Not-taken branches incur no delay
Bhandarkar and Clark, cont.
♦ Factors favoring MIPS
  ■ Operand specifier decoding
  ■ Number of registers
  ■ Separate floating-point unit
  ■ Simple jumps and branches (lower latency)
  ■ Fancy VAX instructions: unnecessary functionality
  ■ Instruction scheduling
  ■ Translation buffer
  ■ Branch displacement size
Homework Assignment 3
♦ Review the ISA of a specific processor.
♦ Write a three-page report explaining the following:
1. Kind of ISA (RISC, CISC, or other; 16-bit, 32-bit, or 64-bit)
2. Classes of instructions (ALU, memory movement, branches, etc.)
3. Addressing modes (immediate, base+displacement, indirect, etc.)
4. Displacements in branches and control-flow instructions (call, ret)
5. Special instructions: system calls, traps, access to special-purpose registers
6. Instruction formats
Hw3: List of Processors
1. Claudia Méndez Garza: XScale or ARM
2. José Alberto Ramírez Uresti: AMD Opteron (64-bit)
3. Víctor Echeverría Ríos: Texas Instruments TMS320DM64x or a DSP
Due date: September 26th, 2008.
Supplementary slides, not presented in class, that may serve as support material
VAX-11
♦ Introduced by DEC in 1977: VAX-11/780
♦ Upward compatible from the PDP-11
♦ 32-bit "word" and addresses
♦ Virtual memory is first-class
♦ 16 GPRs (r15 is the PC, r14 is the SP), condition codes (CCs)
♦ Extremely orthogonal, memory-memory
♦ Decoded as a byte stream
  ■ Opcode: operation, number of operands, and operand type
  ■ Variable-length address specifiers
VAX-11, cont.
♦ Data types
  ■ 8-, 16-, 32-, 64-, 128-bit integers
  ■ F (32 bits), D (64), G (64), H (128) FP
  ■ Character string (8 bits/char)
  ■ Decimal (4 bits/digit)
  ■ Numeric string (8 bits/digit)
♦ Addressing modes include
  – Literal (6 bits)
  – 8-, 16-, 32-bit immediates
  – Register, register deferred
  – 8-, 16-, 32-bit displacements
  – 8-, 16-, 32-bit displacement deferred
  – Indexed
  – Autoincrement
  – Autodecrement
  – Autoincrement deferred
VAX-11, cont.
♦ Operations
  ■ Data transfers (including string move)
  ■ Arithmetic and logical (2 and 3 operands)
  ■ Control (branch, jump, etc.)
    » AOBLEQ (Add One and Branch if Less than or EQual)
  ■ Procedure (CALLs save state)
  ■ Bit manipulation
  ■ Floating point (add/sub/mult/divide)
  ■ POLYF: polynomial evaluation
  ■ System (exception, VM)
  ■ Other
    » CRC: cyclic redundancy check
    » INSQUE: insert entry in queue
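POLYF shows how much work one CISC instruction can hide. Assuming it evaluates the polynomial with Horner's rule over a table of coefficients ordered from the highest degree down (our reading of the VAX POLY family; the real instruction's operand order, precision and rounding are not modeled), a RISC would execute the equivalent of this loop, one multiply and one add per coefficient. The function name `polyf` is only illustrative.

```python
def polyf(arg: float, coefficients: list) -> float:
    """Horner's rule; coefficients run from the highest-degree term down to the constant."""
    result = 0.0
    for c in coefficients:          # on a RISC: a multiply and an add per iteration
        result = result * arg + c
    return result

print(polyf(2.0, [1.0, -3.0, 2.0]))   # x^2 - 3x + 2 at x = 2 -> 0.0
```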
VAX-11, cont.
VAX has too many modes & formats.
Serial semantics limit parallel execution.

   Byte offset   Contents (8 bits per byte)
   PC+0          opcode: the operation calls for three operands
   PC+1          specifier 1: 4-bit mode + register, followed by a four-byte displacement
   PC+6          specifier 2: mode + register, followed by a two-byte displacement
   PC+9          specifier 3: indexed mode, followed by a four-byte displacement
   PC+15         next instruction

The big deal with RISC is not a REDUCED number of instructions; it's having few modes & formats, which facilitates pipelining.
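The serial-decode problem is easy to see in code: the decoder cannot even locate specifier 2 until it has decoded specifier 1. The sketch below assumes the usual VAX encoding (high nibble of each specifier byte = mode, low nibble = register) and is deliberately simplified: the PC-based immediate and absolute modes, whose length depends on the operand type, are left out. Function names are hypothetical.

```python
# Extra bytes that follow the specifier byte in the displacement modes (high nibble A..F)
DISPLACEMENT_BYTES = {0xA: 1, 0xB: 1, 0xC: 2, 0xD: 2, 0xE: 4, 0xF: 4}

def specifier_length(code: bytes, pos: int) -> int:
    """Length in bytes of the operand specifier starting at code[pos] (simplified)."""
    mode, reg = code[pos] >> 4, code[pos] & 0xF
    if mode <= 3:                        # short literal: the 6-bit constant lives in this byte
        return 1
    if mode == 4:                        # indexed: this byte names the index register,
        return 1 + specifier_length(code, pos + 1)   # then a complete base specifier follows
    if mode in DISPLACEMENT_BYTES:       # byte/word/longword displacement, plain or deferred
        return 1 + DISPLACEMENT_BYTES[mode]
    if mode in (8, 9) and reg == 15:     # PC-based immediate/absolute: not modeled here
        raise NotImplementedError("immediate and absolute modes are not modeled")
    return 1                             # register, deferred, autoincrement/autodecrement

def instruction_length(code: bytes, num_operands: int) -> int:
    """Decode specifiers one after another: each start depends on the previous length."""
    pos = 1                              # skip the one-byte opcode
    for _ in range(num_operands):
        pos += specifier_length(code, pos)
    return pos

# The instruction drawn above; the opcode byte 0xC1 is a placeholder and is not interpreted.
example = bytes([0xC1,
                 0xE1, 0, 0, 0, 0,           # specifier 1: longword displacement off r1
                 0xC2, 0, 0,                 # specifier 2: word displacement off r2
                 0x43, 0xE4, 0, 0, 0, 0])    # specifier 3: indexed by r3, base off r4
print(instruction_length(example, 3))        # 15
```

A fixed-format RISC decoder, by contrast, knows where every field is before looking at any of them.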
DEC Alpha
♦ Introduced by DEC in 1992
  ■ The ability to emulate VAX instructions was important
  ■ Strongly influenced by the Cray-1
♦ 64-bit architecture
♦ Load/store, with only displacement addressing
♦ Standard data types
  ■ No byte loads/stores
♦ Registers
  ■ 32 64-bit GPRs (r31 = 0)
  ■ 32 64-bit FPRs
♦ VAX and IEEE floating point
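Without byte loads and stores, byte access has to be synthesized from quadword operations plus shifts and masks. The sketch below assumes little-endian byte order and memory modeled as a Python `bytearray`; on Alpha the load side corresponds roughly to an LDQ_U followed by an EXTBL, but the helper names here are hypothetical and sign extension is not shown.

```python
def load_byte_unsigned(mem: bytes, addr: int) -> int:
    """Fetch the aligned quadword containing addr, then shift and mask out one byte."""
    base = addr & ~7                                   # aligned quadword address
    quad = int.from_bytes(mem[base:base + 8], "little")
    return (quad >> (8 * (addr & 7))) & 0xFF

def store_byte(mem: bytearray, addr: int, value: int) -> None:
    """Read-modify-write of the enclosing quadword (not atomic with respect to other CPUs)."""
    base = addr & ~7
    shift = 8 * (addr & 7)
    quad = int.from_bytes(mem[base:base + 8], "little")
    quad = (quad & ~(0xFF << shift)) | ((value & 0xFF) << shift)
    mem[base:base + 8] = quad.to_bytes(8, "little")

mem = bytearray(range(16))
store_byte(mem, 9, 0xAB)
print(load_byte_unsigned(mem, 9), load_byte_unsigned(mem, 10))   # 171 10
```

The store side shows why the choice was contentious: a single byte write becomes a load, two logical operations and a store, and two processors updating neighboring bytes can overwrite each other unless the sequence is protected.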
DEC Alpha, cont.
♦ Four fixed-length instruction formats
  ■ Sub-formats for computation instructions
  ■ 32-bit instructions
  ■ Designed with multiple issue in mind
  ■ No delayed branches
♦ Precise exceptions are not automatic
♦ PAL code (Privileged Architecture Library)
DEC Alpha Instruction Formats
   Memory format:        OpCode [31..26] | src/dest [25..21] | base [20..16] | displacement [15..0]
   PC-relative format:   OpCode [31..26] | src [25..21] | displacement [20..0]
   PAL-call format:      OpCode [31..26] | PAL argument [25..0]
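Decoding the memory and PC-relative formats is a matter of slicing bit ranges and sign-extending the displacement. In this sketch the helper names and the sample opcode values are arbitrary; it assumes the effective address is base register + sign-extended displacement, and that a branch target is the address of the next instruction plus four times the displacement (there is no delay slot on Alpha).

```python
def field(word: int, hi: int, lo: int) -> int:
    """Extract bits hi..lo (inclusive) of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def sext(value: int, width: int) -> int:
    """Interpret a width-bit field as a two's-complement number."""
    return value - (1 << width) if value & (1 << (width - 1)) else value

def decode_memory(word: int) -> dict:
    """Memory format; the effective address is R[base] + displacement."""
    return {"opcode": field(word, 31, 26), "ra": field(word, 25, 21),
            "rb": field(word, 20, 16), "disp": sext(field(word, 15, 0), 16)}

def decode_branch(word: int) -> dict:
    """PC-relative format; the displacement counts 32-bit instructions."""
    return {"opcode": field(word, 31, 26), "ra": field(word, 25, 21),
            "disp": sext(field(word, 20, 0), 21)}

def branch_target(pc: int, disp: int) -> int:
    """Address of the next instruction plus 4 * displacement."""
    return pc + 4 + 4 * disp

mem_word = (0x29 << 26) | (1 << 21) | (30 << 16) | 0xFFF8   # opcode value chosen arbitrarily
print(decode_memory(mem_word))   # {'opcode': 41, 'ra': 1, 'rb': 30, 'disp': -8}
```

The 21-bit branch displacement, counted in instructions, is one reason "branch displacement size" shows up among the RISC design points listed earlier.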
DEC Alpha Instruction Formats, cont.
   Three-register integer format:        OpCode [31..26] | src1 [25..21] | src2 [20..16] | 000 [15..13] | 0 [12] | function [11..5] | dest [4..0]
   Eight-bit immediate integer format:   OpCode [31..26] | src1 [25..21] | const [20..13] | 1 [12] | function [11..5] | dest [4..0]
   Floating-point operate format:        OpCode [31..26] | src1 [25..21] | src2 [20..16] | function [15..5] | dest [4..0]
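In the two integer operate formats, bit 12 decides whether the second operand is a register or an 8-bit constant. The sketch below encodes and decodes both variants; the helper names and the opcode/function values in the usage lines are arbitrary, and the literal is treated as zero-extended.

```python
from typing import Optional

def encode_operate(opcode: int, ra: int, function: int, rc: int,
                   rb: Optional[int] = None, literal: Optional[int] = None) -> int:
    """Build an integer operate word; pass exactly one of rb or literal."""
    if (rb is None) == (literal is None):
        raise ValueError("pass exactly one of rb or literal")
    word = (opcode << 26) | (ra << 21) | (function << 5) | rc
    if literal is not None:
        if not 0 <= literal <= 0xFF:
            raise ValueError("literal must fit in 8 bits")
        return word | (literal << 13) | (1 << 12)     # bit 12 = 1 selects the literal
    return word | (rb << 16)                          # bits 15..12 stay zero

def decode_operate(word: int) -> dict:
    """Split an integer operate word, choosing the variant by bit 12."""
    fields = {"opcode": (word >> 26) & 0x3F, "ra": (word >> 21) & 0x1F,
              "function": (word >> 5) & 0x7F, "rc": word & 0x1F}
    if (word >> 12) & 1:
        fields["literal"] = (word >> 13) & 0xFF       # 8-bit constant, zero-extended
    else:
        fields["rb"] = (word >> 16) & 0x1F
    return fields

print(decode_operate(encode_operate(0x10, 1, 0x20, 3, literal=200)))
print(decode_operate(encode_operate(0x10, 1, 0x20, 3, rb=2)))
```

Because the flag sits at a fixed position, the decoder can pick the operand source in parallel with reading the register file, which fits the multiple-issue goal noted above.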
DEC Alpha Instruction Set
♦ Operate instructions
  ■ Integer arithmetic
  ■ Logical (AND, OR, conditional MOV)
  ■ Byte manipulation
  ■ Floating-point arithmetic
  ■ Miscellaneous (memory prefetching, trap and memory barriers)
♦ Load/store instructions
  ■ Load/store quadwords (64 bits)
  ■ Load-linked/store-conditional (for MP synchronization)
DEC Alpha Instruction Set, cont.
♦ Control/branching instructions
  ■ Branch on condition (8 conditions) in an integer register
  ■ Branch on condition (6 conditions) in an FP register
  ■ Unconditional branches
  ■ Calculated jumps
♦ Branch hints
  ■ A different hint for each type of branch
♦ Supervision instructions
  ■ PAL code for the needed task