CMPT 250 Computer Architecture
Instructor: Yuzhuang [email protected]
Assembly Lines
An assembly line is a manufacturing process in which parts are added to a product in sequence, using optimally planned logistics, to create a finished product much faster than handcrafting methods.
The Ford Motor Company built the world’s first assembly line between 1908 and 1915.
This pipeline made the Ford Model T affordable and brought high wages to Ford workers.
Some Pictures of the Ford 1913 Assembly Line
A Calculation
Consider assembling a car. Assume it has three steps: install the engine, install the hood, and install the wheel.
One car takes 35 minutes. Three cars take 105 minutes if only one car can be worked on at a time.
Install the hood: 5 minutes
Install the engine: 20 minutes
Install the wheel: 10 minutes
A Calculation contd.
What if we have three workers, one for each step?
Ideally, a car is completed every 20 minutes.
[Figure: pipelined assembly timeline in minutes for the 1st, 2nd, and 3rd cars, each passing through install the hood, install the engine, and install the wheel. Time marks: 5, 25, 35, 45, 55, 65, 75.]
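The timing of the assembly-line example can be checked with a short sketch. It assumes one worker per step, step times of 5, 20, and 10 minutes for the hood, engine, and wheel, and that a car moves on as soon as the next worker is free. The function name `finish_times` is illustrative and not part of the original slides.

```python
def finish_times(stage_minutes, n_items):
    """Completion time of each item on a line with one worker per stage."""
    # stage_free[s] is the time at which the worker at stage s becomes free.
    stage_free = [0] * len(stage_minutes)
    done = []
    for _ in range(n_items):
        t = 0  # time at which this item is ready for its next stage
        for s, duration in enumerate(stage_minutes):
            start = max(t, stage_free[s])  # wait for the worker if busy
            t = start + duration
            stage_free[s] = t
        done.append(t)
    return done


stages = [5, 20, 10]  # hood, engine, wheel (minutes)
print(sum(stages) * 3)          # one car at a time: 105
print(finish_times(stages, 3))  # pipelined: [35, 55, 75]
```

Under these assumptions, after the first car a new car is finished every 20 minutes, which is the time of the slowest step.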
Pipeline Design
Separate the process into different stages of almost the same length.
These stages are separated by registers.
These registers provide temporary storage for data passing through the pipeline and are called pipeline platforms.
A Pipelined Datapath
Conventional: delays of 0.6, 0.6, 0.2, 0.8, and 0.2 ns; total: 2.4 ns; rate: 416.7 MHz
Pipelined: delays of 0.6, 0.6, 0.2, 0.2, 0.8, 0.2, and 0.2 ns; clock period: 1 ns; rate: 1 GHz
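The quoted clock rates follow directly from the clock periods, since the rate is the reciprocal of the period. A minimal sketch of that arithmetic (the helper name `rate_mhz` is illustrative):

```python
def rate_mhz(period_ns):
    """Clock rate in MHz for a clock period given in nanoseconds."""
    # 1 / (period_ns * 1e-9 s) Hz  ==  1000 / period_ns MHz
    return 1000.0 / period_ns


print(round(rate_mhz(2.4), 1))  # conventional datapath: 416.7
print(rate_mhz(1.0))            # pipelined datapath: 1000.0 (1 GHz)
```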
D Latch
Eliminate the undesirable undefined state in the SR latch: ensure S and R are never 1 at the same time.
[Figure: D latch; signal labels D, C, Q.]
Negative-Edge-Triggered D Flip-Flop
1s-catching behaviour is eliminated, as S and R cannot both be 0 in a D flip-flop.
[Figure: negative-edge-triggered D flip-flop; signal labels D, C, S, R.]
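The behaviour of the two storage elements can be sketched in software. This is a behavioural model, not a gate-level one, and it assumes the flip-flop is a master-slave arrangement of two latches in which the master is transparent while C = 1 and the slave while C = 0. The class names are illustrative.

```python
class DLatch:
    """Level-sensitive D latch: follows D while C is 1, holds while C is 0."""

    def __init__(self):
        self.q = 0

    def update(self, d, c):
        if c == 1:
            # Internally S = D and R = NOT D, so S and R are never equal:
            # the SR latch's undefined input combination cannot occur.
            s, r = d, 1 - d
            if s:
                self.q = 1
            if r:
                self.q = 0
        return self.q


class NegEdgeDFlipFlop:
    """Master-slave sketch: Q changes only when C falls from 1 to 0."""

    def __init__(self):
        self.master = DLatch()
        self.slave = DLatch()

    def update(self, d, c):
        m = self.master.update(d, c)        # master open while C = 1
        return self.slave.update(m, 1 - c)  # slave open while C = 0


ff = NegEdgeDFlipFlop()
print(ff.update(1, 1))  # C high: output still holds 0
print(ff.update(1, 0))  # falling edge: output takes the value 1
print(ff.update(0, 0))  # C low: a change on D is ignored, output stays 1
```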
Assume no data hazards.
How much can we gain?
Conventional: 2.4 * 7 ns = 16.8 ns
Pipeline: 9 * 1 ns = 9 ns
Assume no data and control hazards.
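The comparison can be reproduced with a small calculation. The sketch assumes 7 operations, a 3-stage pipeline (so that 7 + 3 - 1 = 9 cycles are needed), and no hazards; the stage count is inferred from the 9-cycle figure rather than stated on the slide.

```python
def conventional_time_ns(n_ops, op_delay_ns):
    """Total time when each operation takes one full datapath delay."""
    return n_ops * op_delay_ns


def pipelined_time_ns(n_ops, n_stages, clock_ns):
    """Total time for an ideal pipeline with no hazards."""
    # n_stages - 1 cycles to fill the pipeline, then one result per cycle.
    return (n_ops + n_stages - 1) * clock_ns


conv = conventional_time_ns(7, 2.4)
pipe = pipelined_time_ns(7, 3, 1.0)
print(round(conv, 1))         # 16.8
print(pipe)                   # 9.0
print(round(conv / pipe, 2))  # speedup: 1.87
```

The speedup is below the ratio of the clock rates (2.4) because the pipeline needs extra cycles to fill and empty.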
Pipeline contd.
In the first four clock cycles, the pipeline is filling.
In the next four clock cycles, all stages of the pipeline are active. The pipeline is fully utilized.
In the last three clock cycles, not all stages of the pipeline are active, since the pipeline is emptying.
The Reduced Instruction Set Computer (RISC)
The goal of a RISC architecture is high throughput and fast execution. To achieve these goals, accesses to memory are avoided wherever possible.
A RISC architecture has the following properties:
Memory accesses are restricted to load and store instructions, and data-manipulation instructions are register-to-register.
Addressing modes are limited in number.
Instruction formats are all of the same length.
Instructions perform elementary operations.
A RISC Instruction Set Architecture
32 registers, R0 through R31. R0 is a special register storing the value zero.
Datapath Organization
The new datapath has 32 32-bit registers. The address inputs are therefore five bits.
The single-bit position shifter is replaced with a barrel shifter to permit multiple-position shifting by the amount SH.
In the function unit, the ALU is expanded to 32 bits.
The constant unit performs zero fill for CS=0 and sign extension for CS=1.
MUX A is added to provide a path from the updated PC, PC-1, for implementation of the JML instruction.
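The constant unit's two modes can be illustrated as follows. The slide does not give the width of the immediate field, so the 15-bit default for `im_bits` is an assumption, as is the function name `constant_unit`.

```python
def constant_unit(im, cs, im_bits=15, word_bits=32):
    """Extend an immediate to word_bits: zero fill (cs=0) or sign extend (cs=1).

    im_bits=15 is an assumed field width, not taken from the slides.
    """
    low_mask = (1 << im_bits) - 1
    im &= low_mask
    sign_bit = (im >> (im_bits - 1)) & 1
    if cs == 1 and sign_bit:
        # Sign extension: copy the sign bit into all upper bit positions.
        word_mask = (1 << word_bits) - 1
        im |= word_mask & ~low_mask
    return im


print(hex(constant_unit(0x7FFF, cs=0)))  # zero fill: 0x7fff
print(hex(constant_unit(0x7FFF, cs=1)))  # sign extension: 0xffffffff
print(hex(constant_unit(0x0005, cs=1)))  # positive value unchanged: 0x5
```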
Datapath Organization contd.
An additional input is added to MUX D to implement the Set if Less Than (SLT) instruction. This input is 1 when N is 1 and V is 0, or N is 0 and V is 1; that is, it equals N XOR V.
A final difference is that the register file is no longer edge triggered and is no longer a part of a pipeline platform at the end of the write-back (WB) stage.
In the second half of the cycle, it is possible to read data written into the register file during the first half of the same clock cycle. It is called a read-after-write register file.
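The SLT condition above (N is 1 and V is 0, or N is 0 and V is 1) can be checked with a sketch that models a 32-bit two's-complement subtraction. It assumes N and V are the negative and overflow status bits of A - B; the function name `slt` is illustrative.

```python
MASK32 = 0xFFFFFFFF


def slt(a, b):
    """Return 1 if a < b as signed 32-bit values, computed as N XOR V of a - b."""
    a &= MASK32
    b &= MASK32
    diff = (a - b) & MASK32
    n = diff >> 31  # N: sign bit of the result
    sign_a, sign_b = a >> 31, b >> 31
    # V: subtraction overflows when the operands have different signs
    # and the sign of the result differs from the sign of a.
    v = 1 if (sign_a != sign_b and n != sign_a) else 0
    return n ^ v


print(slt(3, 5))           # 1
print(slt(5, 3))           # 0
print(slt(-2**31, 1))      # 1: overflow case, N = 0 and V = 1
print(slt(2**31 - 1, -1))  # 0: overflow case, N = 1 and V = 1
```

Using N alone would give the wrong answer in the two overflow cases, which is why V is included.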
Control Organization
SH is added to IR, CS is added to the instruction decoder, and MD is expanded to two bits.
MUX C selects from three different sources for the next value of PC.
BrA is formed from the sum of the updated PC value for the branch instruction and the target offset.
RAA is used for the register jump.
BS, PS and Z are used to select the next PC value.
Control Organization contd.
To determine the control codes, the CPU is viewed much like the single-cycle CPU.
However, it is important to examine the timing carefully to be sure that various parts of the register transfer statement take place in the right stage of the pipeline.
Note that BrA and RAA are obtained in the EX stage.
More on Instruction Set Architecture
The format of an instruction is depicted in a rectangular box symbolizing the bits of the binary instruction.
The bits are divided into groups called fields:
An opcode field.
An address field.
A mode field, which specifies the way the address field is to be interpreted.
Operand Addressing
To illustrate the influence of the number of operands on computer programs, we will evaluate the arithmetic statement X = (A+B)(C+D).
Three-address instructions:
ADD T1, A, B     M[T1] <- M[A] + M[B]
ADD T2, C, D     M[T2] <- M[C] + M[D]
MUL X, T1, T2    M[X] <- M[T1] * M[T2]
Or, using registers:
ADD R1, A, B     R1 <- M[A] + M[B]
ADD R2, C, D     R2 <- M[C] + M[D]
MUL X, R1, R2    M[X] <- R1 * R2
Operand Addressing contd.
Two-Address Instructions
MOVE T1, A    M[T1] <- M[A]
ADD T1, B     M[T1] <- M[T1] + M[B]
MOVE X, C     M[X] <- M[C]
ADD X, D      M[X] <- M[X] + M[D]
MUL X, T1     M[X] <- M[X] * M[T1]
One-Address Instructions
LD A     ACC <- M[A]
ADD B    ACC <- ACC + M[B]
ST X     M[X] <- ACC
LD C     ACC <- M[C]
ADD D    ACC <- ACC + M[D]
MUL X    ACC <- ACC * M[X]
ST X     M[X] <- ACC
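The one-address program can be traced with a tiny accumulator-machine interpreter. This is only a sketch: the values stored at A, B, C, and D are made up for illustration, and `run_one_address` is a hypothetical helper.

```python
def run_one_address(program, mem):
    """Execute LD/ADD/MUL/ST instructions on a single accumulator."""
    acc = 0
    for op, addr in program:
        if op == "LD":
            acc = mem[addr]        # ACC <- M[addr]
        elif op == "ADD":
            acc = acc + mem[addr]  # ACC <- ACC + M[addr]
        elif op == "MUL":
            acc = acc * mem[addr]  # ACC <- ACC * M[addr]
        elif op == "ST":
            mem[addr] = acc        # M[addr] <- ACC
        else:
            raise ValueError(f"unknown opcode: {op}")
    return mem


# Illustrative operand values (not from the slides).
mem = {"A": 2, "B": 3, "C": 4, "D": 5, "X": 0}
program = [("LD", "A"), ("ADD", "B"), ("ST", "X"),
           ("LD", "C"), ("ADD", "D"), ("MUL", "X"), ("ST", "X")]
run_one_address(program, mem)
print(mem["X"])  # (2 + 3) * (4 + 5) = 45
```

Note that X doubles as temporary storage for A + B, because the accumulator is the only implicit operand.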
Zero-Address Instructions
We use a stack. The top of the stack is referred to as TOS. The word below is TOS-1.
PUSH A    TOS <- M[A]
PUSH B    TOS <- M[B]
ADD       TOS <- TOS + TOS-1
PUSH C    TOS <- M[C]
PUSH D    TOS <- M[D]
ADD       TOS <- TOS + TOS-1
MUL       TOS <- TOS * TOS-1
POP X     M[X] <- TOS
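The zero-address program can be traced with a minimal stack-machine sketch. The operand values are made up for illustration, and `run_zero_address` is a hypothetical helper; ADD and MUL pop the top two words and push the result.

```python
def run_zero_address(program, mem):
    """Execute PUSH/ADD/MUL/POP instructions on an operand stack."""
    stack = []
    for instr in program:
        op = instr[0]
        if op == "PUSH":
            stack.append(mem[instr[1]])  # TOS <- M[addr]
        elif op == "ADD":
            top, below = stack.pop(), stack.pop()
            stack.append(top + below)    # TOS <- TOS + TOS-1
        elif op == "MUL":
            top, below = stack.pop(), stack.pop()
            stack.append(top * below)    # TOS <- TOS * TOS-1
        elif op == "POP":
            mem[instr[1]] = stack.pop()  # M[addr] <- TOS
        else:
            raise ValueError(f"unknown opcode: {op}")
    return mem


# Illustrative operand values (not from the slides).
mem = {"A": 2, "B": 3, "C": 4, "D": 5, "X": 0}
program = [("PUSH", "A"), ("PUSH", "B"), ("ADD",),
           ("PUSH", "C"), ("PUSH", "D"), ("ADD",),
           ("MUL",), ("POP", "X")]
run_zero_address(program, mem)
print(mem["X"])  # (2 + 3) * (4 + 5) = 45
```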
Addressing Modes
The addressing mode of an instruction specifies a rule for interpreting or modifying the address field of the instruction.
The address of the operand produced by such a rule is called the effective address.
Addressing modes are used to give programming flexibility to the user and to reduce the number of bits in the address fields of the instruction.
Addressing Modes contd.
Implied Mode: the operand is specified implicitly in the opcode, e.g. ADD in a stack computer.
Immediate Mode: the operand is given in the instruction itself, e.g. LDI R0, 3
Register and Register-Indirect Modes
Register Mode: the address field specifies a register.
Register-Indirect Mode: the address field specifies a register whose content gives the address of the operand in memory.
Auto Increment/Decrement Mode: e.g. ADD (R1)+, 3    M[R1] <- M[R1] + 3, R1 <- R1 + 1
Addressing Modes contd.
Direct Addressing Mode: the address field of the instruction gives the address of the operand in memory.
Indirect Addressing Mode: the address field of the instruction gives the address at which the effective address is stored in memory.
Relative Addressing Mode: Effective address = Address part of the instruction + PC
Addressing Modes contd.
Index Addressing Mode: the content of an index register is added to the address part of the instruction to obtain the effective address.
The index register may be a special CPU register or simply a register in a register file, e.g. for arrays.
The Base-Register Mode: the contents of a base register are added to the address part of the instruction to obtain the effective address.
Addressing Modes Examples
Opcode: Load to ACC
PC = 250, R1 = 400
[Figure: CPU registers PC, R1, and ACC, and a memory with locations 250, 251, 252, 400, 500, 752, 800, and 900 marked.]
Addressing Modes Examples contd.
Addressing mode     Mnemonic        Register transfer      Effective address   Contents of ACC
Direct              LDA ADRS        ACC <- M[ADRS]         500                 800
Immediate           LDA #NBR        ACC <- NBR             251                 500
Indirect            LDA [ADRS]      ACC <- M[M[ADRS]]      800                 300
Relative            LDA $ADRS       ACC <- M[ADRS+PC]      752                 600
Index               LDA ADRS(R1)    ACC <- M[ADRS+R1]      900                 200
Register            LDA R1          ACC <- R1              -----               400
Register-Indirect   LDA (R1)        ACC <- M[R1]           400                 700
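The effective addresses and accumulator contents in this example can be recomputed with a short sketch. The memory contents below are inferred from the table itself (for instance, direct addressing with effective address 500 loads 800, so M[500] = 800). The sketch also assumes that the address or operand word sits at location 251 and that PC has advanced to 252 by the time the relative address is formed, which is what the effective address 752 implies. The function name `load_acc` is illustrative.

```python
def load_acc(mode, memory, pc, r1, operand_addr=251):
    """Return (effective_address, acc) for a load-to-ACC in the given mode.

    pc is the PC value used in the address computation (assumed to be 252).
    """
    adrs = memory[operand_addr]  # address field ADRS (or the operand NBR)
    if mode == "direct":
        ea = adrs
    elif mode == "immediate":
        ea = operand_addr        # the operand is the instruction word itself
    elif mode == "indirect":
        ea = memory[adrs]
    elif mode == "relative":
        ea = adrs + pc
    elif mode == "index":
        ea = adrs + r1
    elif mode == "register":
        return None, r1          # operand is in R1; no memory address
    elif mode == "register-indirect":
        ea = r1
    else:
        raise ValueError(f"unknown mode: {mode}")
    return ea, memory[ea]


# Memory contents inferred from the table in this example.
memory = {251: 500, 400: 700, 500: 800, 752: 600, 800: 300, 900: 200}

for mode in ["direct", "immediate", "indirect", "relative",
             "index", "register", "register-indirect"]:
    print(mode, load_acc(mode, memory, pc=252, r1=400))
```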
CISC Architecture
The goal of the CISC architecture is to match more closely the operations used in programming languages and to provide instructions that facilitate compact programs and conserve memory.
A purely CISC architecture has the following properties:
Memory access is directly available to most types of instructions.
Addressing modes are substantial in number.
Instruction formats are of different lengths.
Instructions perform both elementary and complex operations.
THANKS!