1 ece 371 microprocessors chapter 2 microprocessors: a high level view herbert g. mayer, psu status...
TRANSCRIPT
![Page 1: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/1.jpg)
1
ECE 371Microprocessors
Chapter 2
Microprocessors: A High Level View
Herbert G. Mayer, PSUStatus 10/13/2015
For use at CCUT Fall 2015Some Material inherited from Eric Krause @ PSU MS ECE
![Page 2: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/2.jpg)
2
Syllabus
Introduction Block Diagrams Simple µP Operation Arithmetic Logic Unit ALU BCD Arithmetic Instruction Set Architecture (ISA) Iron Law of µP Performance Amdahl’s Law Registers and Score Board Bibliographys
![Page 3: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/3.jpg)
3
Introduction
A microprocessor µP is functionally similar to an old-fashioned mainframe (MF) computer, more even to a typical desktop (DT) computer
Yet there are notable differences! Like a Main Frame (MF) and desktop (DT), a
µP:Has a defined instruction set, named ISA: for Instruction Set Architecture
Has a defined data width, e.g. 8-bit bytes, 60-bit words, etc
Has a defined address range, typically 232 or 264
Includes an arithmetic unit for integer, floating-point, BCD, and bit operations
Includes a memory subsystem, plus several levels of caches
Includes an IO subsystem and has other peripheral devices
![Page 4: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/4.jpg)
4
IntroductionDifferent from MF computer or DT, a typical
µP:Has a smaller power envelope, in order to be usable in laptops
Uses a lower main frequency, in order to generate less heat, also enabling laptop use with long battery life
Manages power consumption and heat generation through frequency variation, and module shut-down
Integrates some peripheral devices onto the same silicon, again saving electrical power, saving board space, plus gaining some speed
Provides a smaller number of duplicated arithmetic logical units (ALU) that compute data; as a result some computations, giving identical clock frequencies, run more slowly on a µP due to the lower degree of parallel execution of instructions
Must fit into a small volume of space, e.g. into a laptop; as a result, there exist fewer options for cooling the actual CPUs
![Page 5: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/5.jpg)
5
Introduction
A µP CPU performs arithmetic-logical operations
Memory holds information, has a path to/from CPU via main bus
How does the µP interact with the real world?
IO ports connect external peripheral devices with HW registers ri, and allow ri to access memory
Buses connect components E.g. data bus, address bus,
specific enabling bits
![Page 6: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/6.jpg)
6
Introduction IO ports connect a variety of IO devices Could be a simple latch to read/write Or a separate embedded system, e.g. an
ethernet controller Common: General Purpose IO, or GPIO, which
exposes raw bits IO controller and memory controller
together can become as complex as the CPU E.g. on a motherboard the CPU chip,
generally the largest component, is surrounded by other, good-sized chips
Trend is to relocate some of their functions onto the same die as the CPU
![Page 7: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/7.jpg)
7
Block Diagram of Itanium µP
![Page 8: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/8.jpg)
8
Actual Photo of Itanium2 µP
![Page 9: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/9.jpg)
9
Block Diagram of AMD Athlon
![Page 10: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/10.jpg)
10
Block Diagram of Compaq Alpha
![Page 11: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/11.jpg)
11
Die Image of Compaq Alpha
![Page 12: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/12.jpg)
12
Simple µP Operation We’ll analyze a
hypothetical, abstract µP operation
In effect the simplest generic operation is:
1. Read numeric input from a keyboard; here we read an integer from the keyboard
2. To perform an operation on the read operand: addition of constant integer 7
3. Followed by displaying the newly computed result
![Page 13: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/13.jpg)
13
Simple µP Operation, Step 1
1. Read numeric (e.g. integer) input data from keyboard, port 5:
µP sends memory address for Port Input Instruction on the address bus
µP sends memory read control signal
Memory sends machine code for Port Read to µP
µP decodes instruction, sends out port 5 on address bus
µP sends control signal for Port Read
Input port sends µP one byte from port 5, by placing it on the data bus
![Page 14: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/14.jpg)
14
Simple µP Operation, Step 2
2. Add integer constant 7 to the data read from port 5:
µP sends memory address for Add instruction on address bus
µP sends control signal for Read on control bus
Memory returns machine code for Add
µP adds integer constant 7 to the value previously read from input port 5
![Page 15: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/15.jpg)
15
Simple µP Operation, Step 3
3. Display newly computed integer value:
µP sends memory address for Port Output instruction on address bus
µP sends memory read control signal
Memory returns opcode (machine code) for Port Output instruction
µP decodes instruction, sends output port address (0x7) on address bus
µP sends sum from register on data bus
µP sends port write signal on control bus
![Page 16: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/16.jpg)
16
Arithmetic Logic Unit ALU Full adder produces COUT and SUM
using 3 1-bit inputs, A B and CIN
Just via logic gates
What does this do? See result below right:
![Page 17: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/17.jpg)
17
Arithmetic Logic Unit ALU
![Page 18: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/18.jpg)
18
Arithmetic Logic Unit ALU
This simple ALU performs any one of 16 arithmetic functions, on two 4-bit words and a carry in
And what determines the function it performs?
Selection plus MODE! We are going to program this device to
perform a function on the two 4-bit inputs and produce a result on the output pins
![Page 19: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/19.jpg)
19
Arithmetic Logic Unit ALU
S3 S2 S1 S0
A= A3 A2 A1 A0 L L L L B= B3 B2 B1 B0 L L L H
Basic Format F= F3 F2 F1 F0 L L H L L L H H A= 0 0 1 1 L H L L B= 0 1 0 1 L H L H
Logic OR F = A + B 0 1 1 1 L H H L L H H H A= 0 0 1 1 H L L L B= 0 1 0 1 H L L H
Logic AND F= A * B 0 0 0 1 H L H L H L H H A= 0 0 1 1 H H L L B= 0 1 0 1 H H L H
Logic XOR F= A × B 0 1 1 0 H H H L H H H H
![Page 20: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/20.jpg)
20
Arithmetic Logic Unit ALU Bitwise OR, meaning fi = ai + bi, e.g. f0 =
a0 + b0, etc, as if there were a dual-input OR gate for each bit of 2 inputs
ALU can perform any of these functions based on how it is programmed!
Thus we have a device usable in different ways based on program
That is a fundamental part of a microprocessor
How do we program it? With an operational code (SELECT lines). We call this the OpCode
We add 7; but how can this be done? Use the OpCode for Add, i.e. a binary
value applied to the select code, to select an operation
![Page 21: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/21.jpg)
21
Binary Coded Decimal Arithmetic
Binary Coded Decimal is a machine-internal data type
BCD uses 4 bits per decimal digit But computes in binary, to re-use integer
ALU part of the processor Since only 10 of 16 numeric choices are
used, it will be necessary to add 6 to each decimal digit computed
I.e. it is necessary add 6 to each group of 4 bits
The raw binary addition would be correct only in binary, and incorrect in BCD
Adding 6 per BCD digit is that needed correction
![Page 22: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/22.jpg)
22
Binary Coded Decimal Arithmetic
With correction, BCD ALU part works like integer add part, with the additional step of adding 6, but the adder is already implemented in HW
Only a constant storage for decimal 6 is needed
Correction algorithm: Given 2 BCD digits A and B:
if sum( A, B ) >= 10102 or carry( A, B ) = 1 then
sum = add( sum( A, B ), 01102 );
end if;
![Page 23: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/23.jpg)
23
Binary Coded Decimal Arithmetic
![Page 24: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/24.jpg)
24
Binary Coded Decimal Arithmetic
![Page 25: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/25.jpg)
25
Binary Coded Decimal Arithmetic
![Page 26: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/26.jpg)
26
BCD Correction Circuit
![Page 27: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/27.jpg)
27
BCD Correction Circuit
ALU is in the center of above diagram Register File is an ordered sequence of
flip-flops that can be accessed collectively
Core of algorithm is to do 2 adds The /4 means: this is a 4-bit line MuxB will either take the second operand
or add 0, or add 0x110 It has 3 4-input options on it Registers in front of mux hold values so
they are stable for the mux Output register holds value of first
addition, usable for second addition
![Page 28: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/28.jpg)
28
BCD Correction Circuit
To read the 2 operands from the reg file, send them into the A/B registers and do the first addition
Then use the second addition. It will take the result of the first addition, then MUX B will switch to add 6 or 0
how do we which? Inspecting the flag register
In this simple case, one bit stores carry information
Output register result goes into the correction detection logig:
It decides, to add 0 or add 6 Combinational logic: If Carry, or if one
of these 6 signals, assert the FIX signal that goes to the controller
![Page 29: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/29.jpg)
29
BCD Correction Circuit
1. Select desired operand registers in register file
2. Transfer operand A from register file to A operand register
3. Transfer operand B from register file to B operand register
4. Set MUX A to select A for ALU
5. Set MUX B to select B for ALU
6. Apply ADD code to ALU select inputs
7. Transfer first sum to output of ALU Output register
8. Transfer COUT and other flags to output of Status register
9. Set MUX A to select temporary sum for the ALU A input
10.Set MUX B to select correction 0110 or 0000, based on 1st result
11.Transfer final sum to ALU output register
12.Select register in register file to write final result
13.Write result to selected register in register file
![Page 30: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/30.jpg)
30
Instruction Set Architecture (ISA) ISA is boundary between Software (SW) and
Hardware (HW) Specifies logical machine that is visible
to the programmer & compiler Is functional specification for processor
designers That boundary is sometimes a very low-
level piece of system software that handles exceptions, interrupts, and HW-specific services
Could fall into the domain of the OS
![Page 31: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/31.jpg)
31
Instruction Set Architecture (ISA) Specified by an ISA: Operations: what to perform and in which
order Temporary Operand Storage in the CPU:
accumulator, stack, registers Note that stack can be word-sized, even
bit-sized (design of successor for NCR’s Century architecture of the 1970s)
Number of operands per instruction Operand location: where and how to
specify/locate the operands Type and size of operands Instruction Encoding in binary
![Page 32: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/32.jpg)
32
Instruction Set Architecture
ISA: Dynamic Static Interface (DSI)
![Page 33: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/33.jpg)
33
Iron Law of µP Performance
Clock-rate doesn’t count, bus width doesn’t count, the number of registers and operations executed in parallel doesn’t count!
What counts is how long it takes for my computational task to complete. That time is of the essence of computing!
If a MIPS-based solution runs at 1 GHz, completing a program X in 2 minutes, while an Intel Pentium® 4–based program runs at 3 GHz and completes that same program x in 2.5 minutes, programmers are more interested in the MIPS solution
![Page 34: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/34.jpg)
34
Iron Law of µP Performance
If a solution on an Intel CPU can be expressed in an object program of size Y bytes, but on an IBM architecture of size 1.1 Y bytes, the Intel solution is generally more attractive
Assuming same execution, performance Meaning of this:
Wall-clock time (Time) is time the user has to wait for program completion
Program Size is indicator of overall physical complexity of computational task
![Page 35: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/35.jpg)
35
Iron Law of µP Performance
![Page 36: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/36.jpg)
36
Amdahl’s Law Articulated by Gene Amdahl During 1967 AFIPS conference; yes,
computers existed then already Stating that the maximum speedup of
a program P is dominated by its sequential portion S
I.e. if some part of P can be perfectly or infinitely accelerated due to numerous parallel processors, but some part S of P is inherently sequential, then the resulting performance is dominated by S
See Wikipedia sample:
![Page 37: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/37.jpg)
37
Amdahl’s Law (From Wikipedia)The speedup of a program using multiple processors in parallel computing is limited by the sequential fraction of the program. For example, if 95% of the program can be parallelized, the theoretical maximum speedup using parallel computing would be 20, no matter how many processors are used
n = element of N, N number of processors
B = element of { 0, 1 }
T(n) = time to execute with n processors
T(n) = T(1) ( B + (1-B) / n )
S(n) = Speedup T(1) / T(n)
S(n) = 1 / ( B + (1 – B ) / n )
![Page 38: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/38.jpg)
38
Amdahl’s Law (From Wikipedia)
![Page 39: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/39.jpg)
39
Registers,Dependences,Score Board
in a Microprocessor
![Page 40: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/40.jpg)
40
Register Dependencies Inter-instruction dependencies, in
engineering parlance also known as dependences, arise between registers or memory locations being defined and used
One instruction computes a result into a register (or memory); another instruction needs that result from that same register, or from that same memory location
Or, one instruction uses a resource; and after its use the same resource is newly recomputed
Dependences require sequential execution, lest the result is unpredictable
![Page 41: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/41.jpg)
41
Register DependenciesTrue-Dependence, AKA Data Dependence: <- synonymous!r3 ← r1 op r2 1: r5 ← r3 op r4 2: Read after Write, RAW
Anti-Dependence, not a true dependenceparallelize under right conditionr3 ← r1 op r2 1:r1 ← r5 op r4 2: Write after Read, WAR
Output Dependence, similar to Anti-Dependence: can do something
r3 ← r1 op r2 1: Writer5 ← r3 op r4 2: User3 ← r6 op r7 3: Write after Write, WAW, use
between
![Page 42: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/42.jpg)
42
Register Dependencies
Control Dependence:
// ri, i = 1..4 come in “live”
if ( condition1 ) {
r3 = r1 op r2;
}else{ see the jump here?
r5 = r3 op r4;
} // end if
write( r3 );
![Page 43: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/43.jpg)
43
Register Renaming Only data dependence is a real
dependence, hence called true dependence
Other dependences are artifacts of insufficient resources, generally insufficient registers
This means: if additional registers were available, then replacing some of these conflicting registers with new, other registers, could make the conflict disappear!
Anti- and Output-Dependences are indeed such false dependences
![Page 44: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/44.jpg)
44
Register Renaming
Original Code:-- r2, r3, r5, r6, r7 come in “live” ... code before
-- r1, r4 are not “live”, don’t have values
L1: r1 ← r2 op r3
L2: r4 ← r1 op r5
L3: r1 ← r3 op r6
L4: r3 ← r1 op r7
Dependences before:
Lx: Ly: x, y = 1..4, which dependence?
![Page 45: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/45.jpg)
45
Register RenamingOriginal Code:L1: r1 ← r2 op r3
L2: r4 ← r1 op r5
L3: r1 ← r3 op r6
L4: r3 ← r1 op r7
Initial Dependences, lots of dependences:
L1, L2 true-Dep with r1
L1, L3 output-Dep with r1
L1, L4 anti-Dep with r3
L3, L4 true-Dep with r1
L2, L3 anti-Dep with r1
L3, L4 anti-Dep with r3s
![Page 46: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/46.jpg)
46
Register RenamingWhat could we change, if we had some additional registers?
Compute and use other temporaries via other registers to reduce dependences!
That sometimes allows a higher degree of parallelism, due to a lower degree of dependences
More parallelism means faster execution!
![Page 47: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/47.jpg)
47
Register RenamingOriginal Code: New Code, after adding
regs:
L1: r1 ← r2 op r3 r10 ← r2 op r30 –- r30 instead
L2: r4 ← r1 op r5 r4 ← r10 op r5 –- r10 instead
L3: r1 ← r3 op r6 r1 ← r30 op r6
L4: r3 ← r1 op r7 r3 ← r1 op r7
Dependences before: Dependences after:
L1, L2 true-Dep with r1 L1, L2 true-Dep with r10
L1, L3 output-Dep with r1 L3, L4 true-Dep with r1
L1, L4 anti-Dep with r3 // ri, i = 1..7 are “live”
L3, L4 true-Dep with r1
L2, L3 anti-Dep with r1
L3, L4 anti-Dep with r3
![Page 48: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/48.jpg)
48
Register Renaming• With these additional, renamed regs, the new
code could possibly run in half the time!
• First: Compute into r10 instead of r1, but you need to have such an additional register r10; no time penalty!
• Also: Compute in preceding code into r30 instead of r3, if r30 available; also no time penalty!
• Then the following regs are live afterwards: r1, r3, r4, plus the non-modified ones, i.e. r2! r2 came in live, must go out live!
• While r10 and r30 are don’t cares afterwards
![Page 49: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/49.jpg)
49
Score BoardScore-board: array of single-bit HW programmable resources sb[]
Manages other HW resources, specifically registers
In this single-bit HW array sb[], every bit i in sb[i] is associated with a specific machine register, the one identified by i , i.e. ri
Association is by index, i.e. by name: sb[i] belongs to reg ri
Only if sb[i] = 0, does that register i have valid data, and can therefore be used
If sb[i] = 0 then register ri is NOT in process of being written
If bit i is set, i.e. if sb[i] = 1, then that register ri is reserved; i.e it cannot be used
Initially all sb[*] set to 0; hence all registers ri can be used
![Page 50: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/50.jpg)
50
Score BoardExecution constraints:
rd ← rs op rt
if sb[s] or if sb[t] is set → RAW dependence, hence stall the computation; wait until both rs and rt are available
if sb[d] is set→ WAW dependence, hence stall the write; wait until rd has been used; SW can sometimes determine to use another register instead of rd
Else, if none of the 3 registers are in use, dispatch the instruction immediately
![Page 51: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/51.jpg)
51
Score BoardOut of order (ooo) execution has long been common on microprocessors, where speed is critical
To allow ooo execution, upon computing the value of rd : update rd, and clear sb[d]
For uses (reads), HW may use any register i, whose sb[i] is 0
For definitions (writes), HW may set any register j, whose sb[j] is 0
Independent of original order, in which source program was written, i.e. possibly ooo
Ooo instructions retire in original order
![Page 52: 1 ECE 371 Microprocessors Chapter 2 Microprocessors: A High Level View Herbert G. Mayer, PSU Status 10/13/2015 For use at CCUT Fall 2015 Some Material](https://reader036.vdocument.in/reader036/viewer/2022062500/5697c0151a28abf838ccddfd/html5/thumbnails/52.jpg)
52
Bibliography1. http://en.wikipedia.org/wiki/Flynn's_taxonomy
2. http://www.ajwm.net/amayer/papers/B5000.html
3. http://www.robelle.com/smugbook/classic.html
4. http://www.intel.com/design/itanium/manuals.htm
5. http://www.csupomona.edu/~hnriley/www/VonN.html
6. http://cva.stanford.edu/classes/ee482s/scribed/lect11.pdf