Lecture 5: Instruction Set Architecture
Computer Engineering 585, Fall 2001
Summary, #1
• Designing to Last through Trends

| | Capacity | Speed |
|---|---|---|
| Logic | 2x in 3 years | 2x in 3/2 years |
| DRAM | 4x in 3 years | 2x in 10 years |
| Disk | 4x in 3 years | 2x in 10 years |

• 6 yrs to graduate => 16X CPU speed, DRAM/Disk size
• Time to run the task
  – Execution time, response time, latency
• Tasks per day, hour, week, sec, ns, …
  – Throughput, bandwidth
• “X is n times faster than Y” means

$$n = \frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)}$$
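The "16X in 6 years" figure follows from compounding the doubling rates in the trends table, and the "n times faster" ratio is a plain quotient of execution times. A minimal Python sketch of both; the function names `growth` and `times_faster` and the sample execution times are illustrative assumptions, not part of the lecture:

```python
def growth(factor, period_years, elapsed_years):
    """Total improvement after elapsed_years when a metric improves by
    `factor` every `period_years` (simple compounding)."""
    return factor ** (elapsed_years / period_years)


def times_faster(extime_y, extime_x):
    """n in 'X is n times faster than Y': ExTime(Y) / ExTime(X)."""
    return extime_y / extime_x


# Logic speed: 2x every 3/2 years, over 6 years.
print(growth(2, 1.5, 6))
# DRAM or disk capacity: 4x every 3 years, over 6 years.
print(growth(4, 3, 6))
# Hypothetical execution times: Y takes 15 s, X takes 10 s.
print(times_faster(15.0, 10.0))
```

Both growth calls give a factor of 16, matching the slide's estimate for CPU speed and for DRAM/disk size.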
Summary, #2
• Amdahl’s Law:

$$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$

• CPI Law:

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

• Execution time is the REAL measure of computer performance!
• Good products are created when we have good benchmarks and good ways to summarize performance
• Die cost goes roughly with (die area)^4
• Can the PC industry support engineering/research investment?
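Both laws are short enough to evaluate directly. The following Python sketch is only an illustration; the function names and all numeric inputs (40% of the time enhanced by 10x, 10^9 instructions at CPI 2 on a 1 ns clock) are made-up assumptions, not figures from the lecture:

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when `fraction_enhanced` of the original execution
    time is sped up by a factor of `speedup_enhanced`."""
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)


def cpu_time(instructions, cycles_per_instruction, seconds_per_cycle):
    """CPU time = instructions/program x cycles/instruction x seconds/cycle."""
    return instructions * cycles_per_instruction * seconds_per_cycle


# Hypothetical enhancement: 40% of the run time becomes 10x faster.
print(amdahl_speedup(0.4, 10.0))

# Hypothetical program: 1e9 instructions, CPI of 2, 1 ns cycle time.
print(cpu_time(1_000_000_000, 2.0, 1e-9))
```

Even a 10x enhancement yields well under 2x overall here, because the untouched 60% of the time dominates.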
Computer Architecture Is …
"… the attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation."
Amdahl, Blaauw, and Brooks, 1964
Computer Architecture’s Changing Definition
1950s to 1960s: Computer Architecture Course: Computer Arithmetic
1970s to mid 1980s: Computer Architecture Course: Instruction Set Design, especially ISA appropriate for compilers
1990s-2000s: Computer Architecture Course: Design of CPU, memory system, I/O system, Multiprocessors
Instruction Set Architecture (ISA)
[Figure: the instruction set as the interface between software and hardware]
Interface Design
A good interface:
• Lasts through many implementations (portability, compatibility)
• Is used in many different ways (generality)
• Provides convenient functionality to higher levels
• Permits an efficient implementation at lower levels
[Figure: a single interface with several uses (use, use, use) and a sequence of implementations (imp 1, imp 2, imp 3) over time]
Evolution of Instruction Sets
• Single Accumulator (EDSAC 1950)
• Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)
• Separation of Programming Model from Implementation
  – High-level Language Based (B5000 1963)
  – Concept of a Family (IBM 360 1964)
• General Purpose Register Machines
  – Complex Instruction Sets (Vax, Intel 432 1977-80)
  – Load/Store Architecture (CDC 6600, Cray 1 1963-76)
    – RISC (Mips, Sparc, HP-PA, IBM RS6000, . . . 1987)
A "Typical" RISC
• 32-bit fixed format instruction (3 formats)
• 32 32-bit GPRs (R0 contains zero, DP takes a pair)
• 3-address, reg-reg arithmetic instruction
• Single address mode for load/store: base + displacement
  – no indirection
• Simple branch conditions
• Delayed branch
see: SPARC, MIPS, HP PA-Risc, DEC Alpha, IBM PowerPC, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3
Evolution of Instruction Sets
• Major advances in computer architecture are typically associated with landmark instruction set designs
  – Ex: Stack vs GPR (System 360)
• Design decisions must take into account:
  – technology
  – machine organization
  – programming languages
  – compiler technology
  – operating systems
• And they in turn influence these
Example: MIPS
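
| Format | Fields (bit positions) |
|---|---|
| Register-Register | Op (31-26), Rs1 (25-21), Rs2 (20-16), Rd (15-11), Opx (10-0; the slide also marks bits 6 and 5 within this range) |
| Register-Immediate | Op (31-26), Rs1 (25-21), Rd (20-16), immediate (15-0) |
| Branch | Op (31-26), Rs1 (25-21), Rs2/Opx (20-16), immediate (15-0) |
| Jump / Call | Op (31-26), target (25-0) |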
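With a fixed 32-bit format, decoding is just shifting and masking at known bit positions. The Python sketch below assumes the field boundaries marked on this slide (Op in bits 31-26, Rs1 in 25-21, Rd in 20-16 and a 16-bit immediate for Register-Immediate; Op plus a 26-bit target for Jump / Call). The helper names and the sample opcode values are illustrative, not taken from the lecture:

```python
def bits(word, hi, lo):
    """Return bits hi..lo (inclusive) of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_reg_imm(word):
    """Split a Register-Immediate word into its four fields."""
    return {
        "op": bits(word, 31, 26),
        "rs1": bits(word, 25, 21),
        "rd": bits(word, 20, 16),
        "imm": bits(word, 15, 0),
    }


def decode_jump(word):
    """Split a Jump / Call word into opcode and 26-bit target."""
    return {"op": bits(word, 31, 26), "target": bits(word, 25, 0)}


# Build a sample word from hypothetical field values, then decode it.
word = (8 << 26) | (17 << 21) | (8 << 16) | 5
print(hex(word), decode_reg_imm(word))
print(decode_jump((2 << 26) | 0x100))
```

Because every format keeps Op in the same six bits, a decoder can examine the opcode first and only then choose how to interpret the remaining 26 bits.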
Architecture, Implementation
• Architecture deals with the functions provided to the programmer: addressing, addition, interrupt, and I/O
• Implementation deals with the method used to achieve this function, such as a parallel datapath and a microprogrammed control
• Realization is the means used to materialize this method: electrical, magnetic or mechanical devices; power and packaging.
Clock Architecture
[Figure: a clock face with numerals 1 to 12, labeled "Architecture", alongside "Variant Realizations" of the same clock]
Architecture: Two hands, a short one for the hour and a longer one for the minutes; maybe an alarm.
Realization: Shape of the clock hands and dial, numbers. Mechanical or digital mechanism. Energy source: a wound spring or a battery.
Instruction Set Design: (1) Ease of Use
consistency: with a partial knowledge of the system, one can predict the remainder. E.g., including square-root as an instruction should almost fully define everything else. Counter-example: the FP halve op was added to the IBM 360 as an afterthought and lacked post-normalization.
orthogonality: two independent concerns should be handled as such. E.g., in the clock architecture: (1) luminous dial, (2) alarm. Counter-example: on the IBM 650, the low-order address bits determine the amount of shift, yet a violation occurs if the address exceeds the address space.
transparency: an architectural function is transparent if its implementation does not produce any architecturally visible side-effects. e.g. pipelining should not affect the compiler-visible machine.
generality: Designer should not limit a function by his/her own notions about its use. Intel 8080 has a restart op intended to restart after an interrupt. Its larger use is a return from a subroutine, since it was designed in all its generality.
open-endedness: provision for future expansion.
completeness: all functions of a given class are provided. special case: symmetry: inverse is also provided.
Instruction Set Design:
(2) Program size: memory size; CPU-MM bandwidth; frequently used (written-down) instructions should be short.
(3) Execution speed: time required to execute an instruction. Can they be pipelined? Are they uniform in execution length? Control and cache are often in the critical path of a processor design. Uniform length requirements are at loggerheads with (2) above.
(4) Complexity of control unit: some instructions should not even be in the instruction set. (RISC)
Instruction Set Classification
• internal CPU operand storage mechanism: registers, stack, accumulator
• number of explicit operands per instruction: 0, 1, 2, 3
• presumed operand locations: memory, stack
• operations
• type and size of operands
Instruction = opcode + operands, e.g. ADD R1, 20
Instruction Formats

| #Ops | Instruction | Semantics | Machine |
|---|---|---|---|
| 4 | NI op A B C | C = A op B | IBM 650 µ-code |
| 3 | op A B C | C = A op B | RISC, Cray |
| 2 | op A B | A = A op B | IBM370, VAX |
| 1 | op A | Acc = Acc op A | PDP8, M6809 |
| 0 | op | X = X op Y | stack machines: transputer, B5500 |
Stack/Reg/Acc Architectures
Code sequences for C = A+B:

| Stack | Accumulator | Reg-Mem | Reg-Reg |
|---|---|---|---|
| PUSH A | LOAD A | LOAD R1, A | LOAD R1, A |
| PUSH B | ADD B | ADD R1, B | LOAD R2, B |
| ADD | STORE C | STORE C, R1 | ADD R3, R1, R2 |
| POP C | | | STORE R3, C |
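To make the difference in operand naming concrete, here is a minimal Python sketch of two toy interpreters, one for the stack sequence and one for the accumulator sequence for C = A+B. The interpreters, their function names, and the sample values A=3 and B=4 are illustrative assumptions; they model only the handful of opcodes used on this slide:

```python
def run_stack(program, mem):
    """Zero-address machine: operands are implicit on an evaluation stack."""
    stack = []
    for line in program:
        op, *operand = line.split()
        if op == "PUSH":
            stack.append(mem[operand[0]])
        elif op == "ADD":
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs + rhs)
        elif op == "POP":
            mem[operand[0]] = stack.pop()
        else:
            raise ValueError(f"unknown op: {op}")
    return mem


def run_accumulator(program, mem):
    """One-address machine: the accumulator is the implicit second operand."""
    acc = 0
    for line in program:
        op, operand = line.split()
        if op == "LOAD":
            acc = mem[operand]
        elif op == "ADD":
            acc += mem[operand]
        elif op == "STORE":
            mem[operand] = acc
        else:
            raise ValueError(f"unknown op: {op}")
    return mem


print(run_stack(["PUSH A", "PUSH B", "ADD", "POP C"], {"A": 3, "B": 4}))
print(run_accumulator(["LOAD A", "ADD B", "STORE C"], {"A": 3, "B": 4}))
```

Both runs leave the sum in C; the stack version needs one more instruction, while every accumulator instruction names a memory operand.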
• Stack: short instructions, post-fix model of expression evaluation; sequential operand access (hard for compilers). Implementation issues: how deep? Exception handling, e.g. when empty?
• Accumulator: short instructions and relatively small machine state (easier context switch); high memory traffic.
• Reg-Reg: easiest for compiler optimization, the most general model; long instructions and large state.
Endian-ness of Memory Addressing
Cohen's article: "On Holy Wars and a Plea for Peace," IEEE Computer, Oct 81.
CPU: words, pages
Memory: bits, bytes
In what order are they composed to form the next object in the hierarchy?
• LSB (less-significant unit) travels first: little endians (Lilliputians)
• MSB (more-significant unit) travels first: big endians (Blefuscians)
Endian-ness
• Big-endian: IBM 360, MIPS, Motorola 680xx, SPARC, DLX
• Little-endian: DEC VAX, Compaq/HP Alpha, Intel 80x86
• Selectable: PowerPC, MIPS: mode bit: 0-Big, 1-Little

| Address | Contents |
|---|---|
| 4 | 0x10 |
| 5 | 0x20 |
| 6 | 0x30 |
| 7 | 0x40 |

Word at Addr 4: 0x10203040 (Big), 0x40302010 (Little)
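The same four bytes can be assembled both ways with Python's built-in `int.from_bytes`. This is a small sketch using the byte values from the slide; the `memory` dictionary and the `read_word` helper are illustrative names:

```python
# Byte-addressable memory holding the four bytes from the slide.
memory = {4: 0x10, 5: 0x20, 6: 0x30, 7: 0x40}


def read_word(memory, addr, byteorder):
    """Assemble the 4 bytes at addr..addr+3 into one 32-bit word.

    byteorder is "big" (byte at the lowest address is most significant)
    or "little" (byte at the lowest address is least significant)."""
    raw = bytes(memory[addr + i] for i in range(4))
    return int.from_bytes(raw, byteorder)


print(hex(read_word(memory, 4, "big")))     # 0x10203040
print(hex(read_word(memory, 4, "little")))  # 0x40302010
```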
Memory addressing contd: data alignment
Most machines are byte addressable.

| Object | Aligned at byte addr | Misaligned at byte addr |
|---|---|---|
| Byte | 0,1,2,3,4,5,6,7 | Never |
| Half word | 0,2,4,6 | 1,3,5,7 |
| Word | 0,4 | 1,2,3,5,6,7 |
| Double word | 0 | 1,2,3,4,5,6,7 |
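The table reduces to one rule: an object is aligned when its byte address is a multiple of its size. A short Python sketch that regenerates the rows for addresses 0 to 7, assuming object sizes of 1, 2, 4 and 8 bytes (sizes the slide implies but does not state); the names are illustrative:

```python
# Assumed object sizes in bytes.
SIZES = {"Byte": 1, "Half word": 2, "Word": 4, "Double word": 8}


def is_aligned(addr, size):
    """An object of `size` bytes is aligned if addr is a multiple of size."""
    return addr % size == 0


def aligned_addresses(size, limit=8):
    """All aligned byte addresses below `limit` for an object of `size` bytes."""
    return [a for a in range(limit) if is_aligned(a, size)]


for name, size in SIZES.items():
    aligned = aligned_addresses(size)
    misaligned = [a for a in range(8) if a not in aligned]
    print(name, aligned, misaligned or "Never")
```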
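Physical Rationale for Alignment
[Figure: two ways to organize a memory with 10 address bits. (a) A 1K x 1B memory with a 1K-to-1 decoder, delivering 1 B. (b) A 32 x 32B memory: a 32-to-1 decoder on the 5 MSB address bits selects a 32 B row, and a 32-to-1 multiplexor on the 5 LSB address bits selects the byte.]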
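In the 32 x 32B organization, a 10-bit address splits into two 5-bit halves. The Python sketch below assumes the usual arrangement in which the 5 MSBs drive the row decoder and the 5 LSBs drive the multiplexor; the function name is illustrative:

```python
def split_address(addr):
    """Split a 10-bit byte address into (row, column) for a 32 x 32B array.

    Assumes the 5 MSBs select one of 32 rows and the 5 LSBs select one of
    the 32 bytes within that row."""
    if not 0 <= addr < 1024:
        raise ValueError("address must fit in 10 bits")
    row = addr >> 5        # upper 5 bits
    column = addr & 0x1F   # lower 5 bits
    return row, column


print(split_address(70))   # row 2, column 6
print(split_address(31))   # last byte of row 0
print(split_address(32))   # first byte of row 1
```

Addresses 31 and 32 are adjacent yet land in different rows, so an object that straddles them cannot be fetched with a single row access.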
Costs of misalignment
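[Figure: an 8-byte memory with 3 address bits (a3, a2, a1), drawn as two groups of bytes, 0 1 2 3 and 4 5 6 7, feeding a multiplexor; the select labels are a3=0 / a3=1 and a2=0 / a2=1]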
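One way to quantify the cost is to count how many memory rows an access touches. The Python sketch below assumes 4-byte rows, matching the two groups of bytes (0 to 3 and 4 to 7) in the figure; the function name and the row width parameter are illustrative assumptions:

```python
def rows_touched(addr, size, row_bytes=4):
    """Number of memory rows covered by an access of `size` bytes at `addr`,
    assuming rows of `row_bytes` bytes starting at address 0."""
    first_row = addr // row_bytes
    last_row = (addr + size - 1) // row_bytes
    return last_row - first_row + 1


print(rows_touched(0, 4))  # aligned word: one row
print(rows_touched(2, 4))  # misaligned word: spans two rows
print(rows_touched(3, 2))  # half word crossing the row boundary
print(rows_touched(1, 2))  # misaligned half word inside one row
```

An access that crosses a row boundary needs two row reads plus merging of the pieces. A misaligned object that stays within one row avoids the second read, but its bytes still have to be shifted into position.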