cs 136 1 cs136, advanced architecture instruction set architecture
TRANSCRIPT
![Page 1: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture](https://reader033.vdocument.in/reader033/viewer/2022050805/56649c785503460f9492d5ed/html5/thumbnails/1.jpg)
CS 136 1
CS136, Advanced Architecture
Instruction Set Architecture
![Page 2: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture](https://reader033.vdocument.in/reader033/viewer/2022050805/56649c785503460f9492d5ed/html5/thumbnails/2.jpg)
CS 136 2
Types of ISAs
• Stack– Implicit operands (top of stack)
– Heavy memory traffic
– Limited ability to access operands at will
– Obsolete
• Accumulator– Implicit register operand (“accumulator”)
– One memory operand
– Insufficient temporaries
– Obsolete
• General-purpose register– Multiple registers
– Several variations
![Page 3: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture](https://reader033.vdocument.in/reader033/viewer/2022050805/56649c785503460f9492d5ed/html5/thumbnails/3.jpg)
CS 136 3
GPR Architectures
• Memory-memory– CISC idea
– Usually allows any operand to be in register as well
• Register-memory– Example: x86
– Can do one operand in register, one in memory, or 2 in regs
• Register-register– Only design used in modern machines
– Lots of registers ⇒ fast flexible operand access– Simplicity of hardware– Compiler has full flexibility in register usage
![Page 4: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture](https://reader033.vdocument.in/reader033/viewer/2022050805/56649c785503460f9492d5ed/html5/thumbnails/4.jpg)
CS 136 4
Five Ways to Do C = A + B
STACK
PUSH APUSH BADDPOP C
ACCUM
LOAD AADD BSTORE C
MEM-MEM
ADD C,A,B
REG-MEM
LOAD R1,AADD R1,BSTORE R1,C
REG-REG
LOAD R1,ALOAD R2,BADD R3,R1,R2STORE R3,C
![Page 5: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture](https://reader033.vdocument.in/reader033/viewer/2022050805/56649c785503460f9492d5ed/html5/thumbnails/5.jpg)
CS 136 5
Memory Addressing
• Originally just word addressing
• 8-bit bytes and byte addressing introduced on IBM 360 series
• Brief experiments with bit addressing (bad idea)
• Unaligned accesses not worth supporting
• Some machines byte-address but only load/store a word at a time– Turned out to be bad design decision
– Too many programs do string processing 1 character at a time
– May need to revisit in future (32-bit characters?)
• Modern RISC designs allow short load/store, but not short arithmetic
![Page 6: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture](https://reader033.vdocument.in/reader033/viewer/2022050805/56649c785503460f9492d5ed/html5/thumbnails/6.jpg)
CS 136 6
Endian-ness
• The word is “Endian”, not “Indian”
• Reference to Gulliver’s Travels
• Little-Endian invented by Digital Equipment on the PDP-11– Mathematically more elegant
– Horrible for humans
– “It seemed like a good idea at the time”
– Should be banished from the face of the Earth
• Some machines can switch endianness with a control bit– This idea is even stupider than the original
![Page 7: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture](https://reader033.vdocument.in/reader033/viewer/2022050805/56649c785503460f9492d5ed/html5/thumbnails/7.jpg)
CS 136 7
Addressing Modes
• How can an instruction reference memory?
• Early days: absolute address in instruction– Led to instruction modification
– Improvement: “Indirection” picked up absolute location, used it as final address
• Minimum necessary today: follow pointer in register– Clumsy if only option
• Fanciest conceivable: *(R1+S*R2+constant), with either or both of R1 and R2 autoincremented or autodecremented as side effect, either before or after instruction– No machine went quite this far
– But VAX came close
![Page 8: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture](https://reader033.vdocument.in/reader033/viewer/2022050805/56649c785503460f9492d5ed/html5/thumbnails/8.jpg)
CS 136 8
Addressing Modes (cont’d)
• What’s actually useful?
• Need to follow pointers: can restrict to registers– ADD R1,(R2)
– Better: LOAD R1,R2 (like MIPS)
• Frequent stack access ⇒ register + constant useful• Immediates needed for built-in constants• Access to globals ⇒ absolute memory
addresses– (We’ll see that that’s painful)
• PC-relative modes– Used to be needed for data; not in modern systems– Still needed for calls and branches
• Absolute addresses no longer needed for branches– Can always emulate with PC-relative, since PC known– Still available on some architectures
![Page 9: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture](https://reader033.vdocument.in/reader033/viewer/2022050805/56649c785503460f9492d5ed/html5/thumbnails/9.jpg)
CS 136 9
Operand Types and Sizes
• Type usually implies size
• Integers can safely be widened to word size– Shrink again when stored
– Takes advantage of two’s-complement representation
• Single-precision FP gives different results than double-precisions⇒Necessary to support both widths
– Some FPUs can do two SP operations in parallel
• Older machines allowed “packed” decimal (2 digits per byte)– x86 supports with DAA (Decimal Add Adjust) instruction
– Still useful in business world, though dying
• 32 bits standard these days, 64 bits coming– 128 some day?
![Page 10: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture](https://reader033.vdocument.in/reader033/viewer/2022050805/56649c785503460f9492d5ed/html5/thumbnails/10.jpg)
CS 136 10
Operations Provided
• Only one instruction truly needed: SJ– Subtract A from B, giving C; if result is < 0, jump to D
– It’s Turing-complete!
• Practical machines need a bit more at minimum:– Arithmetic and logical (add, multiply, divide?, and, or, …)
– Data movement (load/store, move between registers)
– Control (conditional/unconditional branch, procedure call and return, trap to OS)
– System control (return from interrupt, manage VM, set unprivileged mode, access I/O devices)
• Other builtins can be useful:– Basic floating point
» Bad x86 design idea: sin, sqrt, etc.!
– Decimal
– String
– Vector, graphics
![Page 11: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture](https://reader033.vdocument.in/reader033/viewer/2022050805/56649c785503460f9492d5ed/html5/thumbnails/11.jpg)
CS 136 11
Control Flow
• Addressing modes are important– PC-relative means code can run at any virtual address
– Useful for dynamically linked (shared) libraries
• Pointer-following jump needed for returns– Also useful for switch statements, function pointers, virtual
functions, and shared libraries
• How to specify condition for conditional branches?– Condition code as side effect of every instruction
» Boils down to extra register
» Spurious dependencies in pipeline
– Condition register explicitly set by comparison
– Compare as part of branch
» Adds delay slots in pipeline
![Page 12: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture](https://reader033.vdocument.in/reader033/viewer/2022050805/56649c785503460f9492d5ed/html5/thumbnails/12.jpg)
CS 136 12
Encodings
• Variable-length instructions– Highly efficient (few wasted bits)
– Allows complex specifications (e.g., x86 addressing modes)
– Usually means misaligned instruction fetch
– Greatly complicates fetch/decode units
• Fixed-length instructions– May limit number of registers
– Usually very few instruction formats
– Wastes space but gains speed (e.g., only aligned fetches)
– Limits width of immediate operands
![Page 13: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture](https://reader033.vdocument.in/reader033/viewer/2022050805/56649c785503460f9492d5ed/html5/thumbnails/13.jpg)
CS 136 13
The Fight for Bits
• How wide should instruction be?– Wider ⇒ can encode more registers, more options– Wider ⇒ bigger programs, more memory bandwidth– Bigger programs ⇒ fewer cache hits
• Things you need to encode:– Operation code (16 to 1000 instructions)– Operands (at least one, normally two or three)– Immediate operands– Memory offsets– Branch targets– Branch conditions– Conditional operations (e.g., conditional load, add)
![Page 14: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture](https://reader033.vdocument.in/reader033/viewer/2022050805/56649c785503460f9492d5ed/html5/thumbnails/14.jpg)
CS 136 14
Two or Three Operands?
• In favor of three:– Smaller code size
– No clobbered operands ⇒ fewer copies or reloads– Setting R0 to zero allows fewer operations
supported in ALU
• In favor of two:– Can address more registers
![Page 15: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture](https://reader033.vdocument.in/reader033/viewer/2022050805/56649c785503460f9492d5ed/html5/thumbnails/15.jpg)
CS 136 15
How to Decide All These Questions?
• Slide rules at 50 paces?
• Analysis wars– Look at existing designs, existing programs
– “Recompile” programs for hypothetical architecture
» Analyze size of resulting program
» Run through simulator to see how it performs
– Impractical approach
» Writing compiler back ends is expensive
» Simulators are slow
– instead, make projections based on existing object code
![Page 16: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture](https://reader033.vdocument.in/reader033/viewer/2022050805/56649c785503460f9492d5ed/html5/thumbnails/16.jpg)
CS 136 16
Example of Bad Analysis: @-(R2)
• DEC VAX had three “auto” addressing modes: autopostincrement, autopredecrement, and indirect autopostincrement
• What happened to indirect autopredecrement?– Analyzed output of BLISS compiler on many programs
– Language didn’t provide way to express autopredecrement
– Concluded it wasn’t necessary
– Very different result if had analyzed C!
*--p1 = a[--i];
![Page 17: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture](https://reader033.vdocument.in/reader033/viewer/2022050805/56649c785503460f9492d5ed/html5/thumbnails/17.jpg)
CS 136 17
Example of Difficult Analysis: imm16
• How big should an immediate be?
• Easy analysis: examine existing code– Calculate frequency of various widths
– Analyze tradeoff of using those bits for other purposes
• Problem: analyzed architecture affects frequency of different widths– E.g., Alpha has only 16 bits, so you’ll never see over 16!
– Alternative: look for multi-instruction sequences that effectively use more than 16 bits
» Hard to find (compiler pipeline scheduling)
» Compiler will stand on head, use sneaky tricks to avoid generating extra instructions
– Need for wider constants depends on architecture
» E.g., MIPS needs them when jumping to shared libraries
![Page 18: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture](https://reader033.vdocument.in/reader033/viewer/2022050805/56649c785503460f9492d5ed/html5/thumbnails/18.jpg)
CS 136 18
![Page 19: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture](https://reader033.vdocument.in/reader033/viewer/2022050805/56649c785503460f9492d5ed/html5/thumbnails/19.jpg)
CS 136 19
Interaction with Compilers
• Nearly all modern code generated by compilers
• Architect must make compiler’s job easier– Lots of registers
– Orthogonal instruction set
– Few side effects
– Instructions and addressing modes matched to language constructs
» But NOT attempt to implement them in detail!
» Primitives are better than “solutions” even when solutions are correct
– Good support for stack, globals, and pointers
– Support for both compile-time and run-time binding
– Don’t ask compiler to predict dynamic information (e.g., branch targets)
– Don’t provide features language can’t express
» Example pro and con: vector architectures
![Page 20: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture](https://reader033.vdocument.in/reader033/viewer/2022050805/56649c785503460f9492d5ed/html5/thumbnails/20.jpg)
CS 136 20
The MIPS64 Architecture
• Extension of MIPS32
• Data path widened to 64 bits– Still 32-bit instructions
– Still only 32 registers
• Most instructions have “D” as prefix to indicate 64-bit version
![Page 21: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture](https://reader033.vdocument.in/reader033/viewer/2022050805/56649c785503460f9492d5ed/html5/thumbnails/21.jpg)
CS 136 21
MIPS Instruction Formats
Opcode
6
rs
5
rt
5
rd
5
shamt
5
funct
6
R-Type Instruction
I-Type Instruction
Opcode
6
rs
5
rt
5 16
Immediate
J-Type Instruction
Opcode
6 26
Offset inserted into PC
![Page 22: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture](https://reader033.vdocument.in/reader033/viewer/2022050805/56649c785503460f9492d5ed/html5/thumbnails/22.jpg)
CS 136 22
I-Type Instructions
• Encodes loads, stores (all widths), immediate ALU ops
• Also conditional branches (rt unused)
Opcode
6
rs
5
rt
5 16
Immediate
![Page 23: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture](https://reader033.vdocument.in/reader033/viewer/2022050805/56649c785503460f9492d5ed/html5/thumbnails/23.jpg)
CS 136 23
R-Type Instructions
Opcode
6
rs
5
rt
5
rd
5
shamt
5
funct
6
• Register-register ALU operations– “funct” encodes the ALU operation: add, sub, etc.
– Opcode chooses operands, special registers, sizes, etc.
– Conditional moves
• Handles special registers, floating point, …
![Page 24: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture](https://reader033.vdocument.in/reader033/viewer/2022050805/56649c785503460f9492d5ed/html5/thumbnails/24.jpg)
CS 136 24
J-Type Instructions
Opcode
6 26
Offset inserted into PC
• Jump, jump and link
• Trap, return from exception
![Page 25: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture](https://reader033.vdocument.in/reader033/viewer/2022050805/56649c785503460f9492d5ed/html5/thumbnails/25.jpg)
CS 136 25
MIPS Control Flow
• Unconditional jump substitutes low bits of PC– NOT addition!
– Exceptionally bad on 64-bit architecture, where 36 bits unchanged
• No built-in stack– Subroutine call stores return in register
– Callee must save on stack if necessary
– Reduces overall cycle time
– Ultra-efficient for leaf functions
• Conditional branches only test against zero– Complex tests (e.g., <) store Z/NZ result in a register
– We’ve seen how this improves the pipeline
• Conditional moves can eliminate many branches– Feature of many modern architectures
![Page 26: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture](https://reader033.vdocument.in/reader033/viewer/2022050805/56649c785503460f9492d5ed/html5/thumbnails/26.jpg)
CS 136 26
MIPS Floating Point
• Floating point was originally coprocessor
⇒Separate FP registers– Special instructions to move to/from integer registers
• MIPS64 (but not 32) has paired single operations– Two SP numbers pass through DP ALU simultaneously
• MIPS64 also has multiply-add in one instruction– Useful in signal processing (multimedia)
![Page 27: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture](https://reader033.vdocument.in/reader033/viewer/2022050805/56649c785503460f9492d5ed/html5/thumbnails/27.jpg)
CS 136 27
Fallacies and Pitfalls
• PITFALL: Instruction designed to support feature in some language– Examples: PDP-11/45 MARK, VAX CALLS, IBM 360 ED/EDMK
– Why is this bad?
» Easy to get wrong (PDP-11 MARK instruction)
» Easy to make inefficient (VAX CALLS)
» Languages evolve, hardware doesn‘t
![Page 28: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture](https://reader033.vdocument.in/reader033/viewer/2022050805/56649c785503460f9492d5ed/html5/thumbnails/28.jpg)
CS 136 28
Fallacies and Pitfalls (2)
• FALLACY: Typical programs exist– We wish!
• PITFALL: Ignoring the compiler– Design better code size, based on bad compiler
– Good compiler can blow your idea out of the water
• FALLACY: Flawed architectures can’t succeed– Ummm, x86?
– Every architecture has drawbacks
• FALLACY: You (YOU!) can design a flawless architecture– Always tradeoffs
– Always something new to learn
![Page 29: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture](https://reader033.vdocument.in/reader033/viewer/2022050805/56649c785503460f9492d5ed/html5/thumbnails/29.jpg)
CS 136 29
Summary
• Instruction encoding is important
• Don’t forget to provide what the compiler needs– This is NOT what you think the compiler needs!
• Addresses will only get wider
• Data will only get wider– Including characters
• Cleverness to improve bandwidth (e.g., MADD)
• RISC is here to stay