RISC: Reduced Instruction Set Computing

Overview
What is RISC architecture?
How did RISC evolve?
How does RISC use instruction pipelining?
How does RISC use register windowing?
What is the future of RISC?

Early Microprocessors
Early microprocessors were very simple
They had a small instruction set
Gradually, more and more instructions were added

CISC: Complex Instruction Set Computing
May include over 300 instructions
Approximately a 1:1 relationship with higher-level languages
Only some of these instructions are used all the time

Why are more instructions slower?
A 16-instruction set uses a 4-to-16 decoder
A 32-instruction set would need a 5-to-32 decoder
The larger the decoder, the longer the propagation delay
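
The decoder sizes quoted above follow from needing one output line per instruction. A minimal sketch of that arithmetic, assuming a single flat decoder (the function name `decoder_size` is illustrative, not from the slides):

```python
def decoder_size(n_instructions):
    """Return (input_bits, output_lines) of the smallest binary decoder
    that has one output line per instruction."""
    if n_instructions < 1:
        raise ValueError("need at least one instruction")
    # Smallest number of bits b such that 2**b >= n_instructions.
    input_bits = (n_instructions - 1).bit_length()
    return input_bits, 2 ** input_bits


for n in (16, 32, 300):
    bits, lines = decoder_size(n)
    print(f"{n} instructions: {bits}-to-{lines} decoder")
```

Under this assumption, a 300-instruction set would need a 9-to-512 decoder.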

Problem with CISC
The more instructions in the instruction set, the larger the propagation delay
CISC is too slow

Get rid of some of those instructions
Suppose it takes 20 ns to complete each instruction
If we reduce the instruction set, we can get it down to 18 ns per instruction
Every instruction we delete can be replaced by 3 of the simpler remaining instructions (3 × 18 ns = 54 ns)
We choose to eliminate the instructions that are used less than 2% of the time

Consider This
100% × 20 ns vs. 98% × 18 ns + 2% × 54 ns
= 20 ns vs. 17.64 ns + 1.08 ns
20 ns > 18.72 ns
In this case, reducing the instruction set is faster

Don’t reduce too much
Say we eliminate instructions used 10% of the time
100% × 20 ns vs. 90% × 18 ns + 10% × 54 ns
= 20 ns vs. 16.2 ns + 5.4 ns
20 ns < 21.6 ns
If we reduce our instruction set too much, the end result could be slower
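
The two comparisons above are the same weighted average with a different removed fraction. A small sketch using the slide's figures (20 per instruction before, 18 after, three simple instructions per removed one); the function names and the break-even calculation are added here for illustration:

```python
def average_time(removed_fraction, reduced_time=18.0, expansion=3):
    """Average time per original instruction after reducing the set.

    removed_fraction: share of executed instructions that were removed
    and must now be emulated by `expansion` simpler instructions.
    """
    kept = (1 - removed_fraction) * reduced_time
    replaced = removed_fraction * expansion * reduced_time
    return kept + replaced


def break_even_fraction(original_time=20.0, reduced_time=18.0, expansion=3):
    """Removed fraction at which the reduced set is exactly as fast as
    the original one: reduced_time * (1 + f * (expansion - 1)) = original_time."""
    return (original_time / reduced_time - 1) / (expansion - 1)


print(average_time(0.02))      # the 2% case
print(average_time(0.10))      # the 10% case
print(break_even_fraction())   # fraction where both designs tie
```

With these numbers the break-even point is 1/18, or roughly 5.6%: removing instructions that account for less than that share of execution is a win, and removing more is a loss.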

RISC: Reduced Instruction Set Architecture
Fewer than 100 instructions in instruction set
Fixed-length instructions
Limited load and store instructions
Fewer addressing modes
Instruction pipeline
Large number of registers

RISC: Reduced Instruction Set Architecture cont.
Hardwired control unit
Delayed loads and branches
Speculative execution of instructions
Optimizing compiler
Separated instruction and data streams

RISC vs. CISC
RISC                               CISC
Faster                             Slower
Less complicated instruction set   More complicated instruction set
More difficult to program          Easier to program

Ex: Fixed-Length Instructions
Instruction Formats for SPARC CPU
SPARC CPU add: r1 ← r2 + r3
Format of instruction:
10          op2
00001       Destination register: register 1
000000      Add
00010       Source register: register 2
0 00000000  Unused in this instruction
00011       Source register: register 3
Complete instruction word: 10 00001 000000 00010 0 00000000 00011
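
The field layout above can be packed into a 32-bit word with shifts. A minimal sketch, assuming the field widths shown on the slide (2, 5, 6, 5, 1, 8, 5 bits); the function names are illustrative and only the register-register add is covered:

```python
FIELD_WIDTHS = (2, 5, 6, 5, 1, 8, 5)  # op, dest, opcode, src1, flag, unused, src2


def encode_add(rd, rs1, rs2):
    """Pack 'rd <- rs1 + rs2' into a 32-bit word using the slide's layout."""
    for r in (rd, rs1, rs2):
        if not 0 <= r < 32:
            raise ValueError("register numbers must fit in 5 bits")
    op = 0b10        # leading two bits from the slide
    add = 0b000000   # six-bit code for add
    flag = 0         # single 0 bit: second operand is a register
    unused = 0       # eight bits not used by this instruction
    return ((op << 30) | (rd << 25) | (add << 19) |
            (rs1 << 14) | (flag << 13) | (unused << 5) | rs2)


def format_fields(word):
    """Render a 32-bit word as space-separated fields."""
    bits = f"{word:032b}"
    fields, pos = [], 0
    for width in FIELD_WIDTHS:
        fields.append(bits[pos:pos + width])
        pos += width
    return " ".join(fields)


word = encode_add(1, 2, 3)
print(format_fields(word))
print(hex(word))
```

Because every instruction is exactly 32 bits with fields in fixed positions, decoding is just the reverse set of shifts and masks, which is part of what keeps the control unit simple.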

Pipelines

Assembly Lines and Pipelines
Why are assembly lines cool?
Work on more than one item at a time
Finish more items faster

Instruction Pipelines
Very similar to assembly lines in manufacturing
A pipeline divides the execution of an instruction into several stages
It can then work on more than one instruction at a time
Overall, faster and more efficient

Pipeline example: 3 stages
Stage 1: Fetch instruction
Stage 2: Decode instruction, select registers
Stage 3: Execute instruction, store result
Each stage must be completed in 1 clock cycle for this to work

Example 1:
r1 ← r2 + r3
r4 ← r5 + r6
r7 ← r8 + r9

Instruction 1: 10 00001 000000 00010 0 00000000 00011
Add r2 + r3: r2=2, r3=3, 2+3=5, r1 ← 5
Instruction 2: 10 00100 000000 00101 0 00000000 00110
Add r5 + r6: r5=5, r6=6, 5+6=11, r4 ← 11
Instruction 3: 10 00111 000000 01000 0 00000000 01001
Add r8 + r9: r8=8, r9=9, 8+9=17, r7 ← 17

           t1      t2      t3      t4      t5
Instr. 1   Fetch   Decode  Execute
Instr. 2           Fetch   Decode  Execute
Instr. 3                   Fetch   Decode  Execute
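
The timing of the example above can be generated mechanically: instruction i enters the pipeline one cycle after instruction i-1. A small sketch, assuming one cycle per stage and no conflicts (names are illustrative):

```python
STAGES = ["Fetch", "Decode", "Execute"]


def pipeline_schedule(n_instructions, stages=STAGES):
    """Return one row per instruction; column t holds the stage that
    instruction occupies in cycle t+1 (empty string if idle)."""
    total_cycles = n_instructions + len(stages) - 1
    table = []
    for i in range(n_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name  # instruction i starts in cycle i+1
        table.append(row)
    return table


def sequential_cycles(n_instructions, stages=STAGES):
    """Cycles needed if each instruction finishes before the next starts."""
    return n_instructions * len(stages)


for i, row in enumerate(pipeline_schedule(3), start=1):
    print(f"Instr. {i}: " + " ".join(f"{cell:8}" for cell in row))
print("pipelined cycles:", len(pipeline_schedule(3)[0]))
print("sequential cycles:", sequential_cycles(3))
```

Three instructions finish in 5 cycles instead of 9; in general n instructions through k stages take n + k - 1 cycles.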

Consider a more problematic example
r1 ← r2 + r3
r4 ← r1 + r3
r5 ← r6 + r3

Instruction 1: 10 00001 000000 00010 0 00000000 00011
Add r2 + r3: r2=2, r3=3, 2+3=5, r1 ← 5
Instruction 2: 10 00100 000000 00001 0 00000000 00011
Add r1 + r3: r1=1 (old value), r3=3, 1+3=4, r4 ← 4
Instruction 3: 10 00101 000000 00110 0 00000000 00011
Add r6 + r3: r6=6, r3=3, 6+3=9, r5 ← 9

           t1      t2      t3      t4      t5
Instr. 1   Fetch   Decode  Execute
Instr. 2           Fetch   Decode  Execute
Instr. 3                   Fetch   Decode  Execute

Problem: data conflict
Instruction 2 selects its registers during t3, but instruction 1 does not store r1 until t3 is complete, so instruction 2 reads the old, wrong value of r1
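
The conflict can be reproduced with a tiny simulation. This is a sketch under the assumptions of the three-stage example: registers are read in the decode stage and the result is written only at the end of the execute stage. The tuple format `(dest, src1, src2)` and the function names are illustrative.

```python
PROGRAM = [("r1", "r2", "r3"),   # r1 <- r2 + r3
           ("r4", "r1", "r3"),   # r4 <- r1 + r3 (needs the new r1)
           ("r5", "r6", "r3")]   # r5 <- r6 + r3
REGS = {"r1": 1, "r2": 2, "r3": 3, "r4": 0, "r5": 0, "r6": 6}


def run_pipelined(program, regs):
    """3-stage pipeline, no hazard handling: instruction i is fetched in
    cycle i+1, decoded in cycle i+2 and executed in cycle i+3."""
    regs = dict(regs)
    operands = {}
    for t in range(1, len(program) + 3):
        pending_writes = []
        for i, (dest, a, b) in enumerate(program):
            if t == i + 2:                    # decode: select registers
                operands[i] = (regs[a], regs[b])
            elif t == i + 3:                  # execute: compute result
                x, y = operands[i]
                pending_writes.append((dest, x + y))
        for dest, value in pending_writes:    # stored at end of the cycle
            regs[dest] = value
    return regs


def run_sequential(program, regs):
    """Reference result: one instruction at a time."""
    regs = dict(regs)
    for dest, a, b in program:
        regs[dest] = regs[a] + regs[b]
    return regs


print("pipelined  r4 =", run_pipelined(PROGRAM, REGS)["r4"])
print("sequential r4 =", run_sequential(PROGRAM, REGS)["r4"])
```

The pipelined run leaves 4 in r4 because it used the stale r1 = 1, while the sequential reference gives 8.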

Solutions to Data Conflict
No-op insertions
Instruction reordering
Stall insertions
Data forwarding

Solution 1: add no-op
r1 ← r2 + r3
no-op
r4 ← r1 + r3
r5 ← r6 + r3

A no-op (an instruction that does nothing) is inserted after instruction 1, so the dependent add selects its registers only after r1 has been stored

Instruction 1: 10 00001 000000 00010 0 00000000 00011
Add r2 + r3: r2=2, r3=3, 2+3=5, r1 ← 5
Instruction 2: 10 00100 000000 00001 0 00000000 00011
Add r1 + r3: r1=5, r3=3, 5+3=8, r4 ← 8
Instruction 3: 10 00101 000000 00110 0 00000000 00011
Add r6 + r3: r6=6, r3=3, 6+3=9, r5 ← 9

Possible problems with no-op
Slower
Wastes time
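
No-op insertion is typically done in software before the program runs. A minimal sketch of such a pass, assuming the three-stage pipeline from the examples (where only back-to-back dependences conflict) and an illustrative `(dest, src1, src2)` tuple format:

```python
NOP = ("nop",)


def insert_noops(program):
    """Insert a no-op wherever an instruction reads the register that
    the instruction immediately before it writes."""
    out = []
    for instr in program:
        prev = out[-1] if out else None
        if prev is not None and prev != NOP and prev[0] in instr[1:]:
            out.append(NOP)
        out.append(instr)
    return out


def pipelined_cycles(n_slots, n_stages=3):
    """Cycles to push n_slots instructions (no-ops included) through."""
    return n_slots + n_stages - 1


program = [("r1", "r2", "r3"), ("r4", "r1", "r3"), ("r5", "r6", "r3")]
padded = insert_noops(program)
print(padded)
print(pipelined_cycles(len(program)), "cycles without no-op (wrong result)")
print(pipelined_cycles(len(padded)), "cycles with no-op")
```

The cost noted above is visible in the cycle count: the padded program is correct but takes one extra cycle in which no useful work is done.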

Solution 2: instruction reordering
r1 ← r2 + r3
r5 ← r6 + r3
r4 ← r1 + r3

The independent instruction is moved between the two dependent ones, so r1 is stored before it is read

Instruction 1: 10 00001 000000 00010 0 00000000 00011
Add r2 + r3: r2=2, r3=3, 2+3=5, r1 ← 5
Instruction 2: 10 00101 000000 00110 0 00000000 00011
Add r6 + r3: r6=6, r3=3, 6+3=9, r5 ← 9
Instruction 3: 10 00100 000000 00001 0 00000000 00011
Add r1 + r3: r1=5, r3=3, 5+3=8, r4 ← 8

           t1      t2      t3      t4      t5
Instr. 1   Fetch   Decode  Execute
Instr. 2           Fetch   Decode  Execute
Instr. 3                   Fetch   Decode  Execute

Possible problems with reordering
It is not possible to reorder every set of operations successfully
Consider:
r1 ← r1 + r2
r1 ← r1 + r3
r1 ← r1 + r4
Each instruction needs the result of the one before it, so no other order is valid
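
A reordering pass has to prove that the instruction it moves is independent of everything it jumps over. The sketch below is one simple way to do that, assuming the three-stage pipeline from the examples and an illustrative `(dest, src1, src2)` tuple format; it is not a description of any particular compiler.

```python
def reads_result_of(prev, cur):
    """True if cur reads the register that prev writes."""
    return prev[0] in cur[1:]


def conflicts(a, b):
    """True if swapping a and b could change the program's result."""
    return a[0] in b[1:] or b[0] in a[1:] or a[0] == b[0]


def reorder(program):
    """Try to separate back-to-back dependent instructions by hoisting a
    later independent instruction between them. Returns the new order,
    or None if some conflict cannot be removed this way."""
    prog = list(program)
    i = 1
    while i < len(prog):
        if reads_result_of(prog[i - 1], prog[i]):
            for j in range(i + 1, len(prog)):
                cand = prog[j]
                jumps_ok = all(not conflicts(cand, prog[k]) for k in range(i, j))
                if jumps_ok and not reads_result_of(prog[i - 1], cand):
                    prog.insert(i, prog.pop(j))   # hoist cand to slot i
                    break
            else:
                return None                        # nothing can be moved
        i += 1
    return prog


ok = [("r1", "r2", "r3"), ("r4", "r1", "r3"), ("r5", "r6", "r3")]
chain = [("r1", "r1", "r2"), ("r1", "r1", "r3"), ("r1", "r1", "r4")]
print(reorder(ok))
print(reorder(chain))
```

The first program is reordered into the sequence shown for Solution 2; the r1 chain returns `None`, which is the case the slide warns about.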

Solution 3: stall insertion
r1 ← r2 + r3
r4 ← r1 + r3
r5 ← r6 + r3

The pipeline holds instruction 2 (a stall) until instruction 1 has stored r1

Instruction 1: 10 00001 000000 00010 0 00000000 00011
Add r2 + r3: r2=2, r3=3, 2+3=5, r1 ← 5
Instruction 2: 10 00100 000000 00001 0 00000000 00011
Add r1 + r3: r1=5, r3=3, 5+3=8, r4 ← 8
Instruction 3: 10 00101 000000 00110 0 00000000 00011
Add r6 + r3: r6=6, r3=3, 6+3=9, r5 ← 9

Solution 4: data forwarding
r1 ← r2 + r3
r4 ← r1 + r3
r5 ← r6 + r3

Instruction 1: 10 00001 000000 00010 0 00000000 00011
Add r2 + r3: r2=2, r3=3, 2+3=5, r1 ← 5
Instruction 2: 10 00100 000000 00001 0 00000000 00011
Add r1 + r3: r1=5 (forwarded), r3=3, 5+3=8, r4 ← 8
Instruction 3: 10 00101 000000 00110 0 00000000 00011
Add r6 + r3: r6=6, r3=3, 6+3=9, r5 ← 9

           t1      t2      t3      t4      t5
Instr. 1   Fetch   Decode  Execute
Instr. 2           Fetch   Decode  Execute
Instr. 3                   Fetch   Decode  Execute

The result of instruction 1 is passed to instruction 2 within the same clock cycle, without waiting for it to be stored in r1
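
Forwarding can be sketched as a bypass: when the decode stage selects a register that the execute stage is producing in the same cycle, it takes the fresh result instead of the stored value. The simulation below assumes the three-stage pipeline from the examples; the tuple format `(dest, src1, src2)` and the function name are illustrative.

```python
PROGRAM = [("r1", "r2", "r3"),   # r1 <- r2 + r3
           ("r4", "r1", "r3"),   # r4 <- r1 + r3
           ("r5", "r6", "r3")]   # r5 <- r6 + r3
REGS = {"r1": 1, "r2": 2, "r3": 3, "r4": 0, "r5": 0, "r6": 6}


def run_with_forwarding(program, regs):
    """3-stage pipeline: instruction i decodes in cycle i+2 and executes
    in cycle i+3. Results are forwarded to the decode stage in the same
    cycle in which they are computed."""
    regs = dict(regs)
    operands = {}
    n = len(program)
    for t in range(1, n + 3):
        forwarded = {}
        e = t - 3                       # instruction in the execute stage
        if 0 <= e < n:
            dest = program[e][0]
            x, y = operands[e]
            forwarded[dest] = x + y     # result available on the bypass
        d = t - 2                       # instruction in the decode stage
        if 0 <= d < n:
            _, a, b = program[d]
            operands[d] = (forwarded.get(a, regs[a]),
                           forwarded.get(b, regs[b]))
        regs.update(forwarded)          # stored at the end of the cycle
    return regs


result = run_with_forwarding(PROGRAM, REGS)
print(result["r1"], result["r4"], result["r5"])
```

The program still completes in five cycles, and r4 receives the value computed from the new r1, so no time is lost to no-ops or stalls.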

Solutions to Data Conflict
No-op insertions
– Slow and wasteful
Stall insertions
Instruction reordering
– Not always possible
Data forwarding

Register Windowing
Each window overlaps with the next
Main method would be window 1
Subroutine is window 2
Since they overlap, window 2 can return values to window 1 easily
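
The overlap can be modelled as one large register file viewed through a sliding window. This is a simplified sketch: the sizes (8 shared, 8 local registers), the class name, and the "in"/"local"/"out" labels are assumptions for illustration, and it does not handle running out of windows the way real hardware must.

```python
class RegisterWindows:
    """One physical register file; each window sees 'in', 'local' and
    'out' registers, and a window's 'out' registers are the next
    window's 'in' registers."""

    def __init__(self, n_windows=4, n_shared=8, n_local=8):
        self.n_windows = n_windows
        self.n_shared = n_shared
        self.n_local = n_local
        self.stride = n_shared + n_local
        self.regs = [0] * (n_windows * self.stride + n_shared)
        self.current = 0  # window 1 in the slide's numbering

    def _index(self, kind, i):
        limit = self.n_local if kind == "local" else self.n_shared
        if not 0 <= i < limit:
            raise IndexError("register number out of range")
        offset = {"in": 0, "local": self.n_shared, "out": self.stride}[kind]
        return self.current * self.stride + offset + i

    def read(self, kind, i):
        return self.regs[self._index(kind, i)]

    def write(self, kind, i, value):
        self.regs[self._index(kind, i)] = value

    def call(self):
        """Enter a subroutine: slide to the next window."""
        if self.current + 1 >= self.n_windows:
            raise OverflowError("no free register window")
        self.current += 1

    def ret(self):
        """Return from a subroutine: slide back to the caller's window."""
        if self.current == 0:
            raise RuntimeError("no caller window to return to")
        self.current -= 1


rw = RegisterWindows()
rw.write("out", 0, 7)        # main passes an argument
rw.call()
arg = rw.read("in", 0)       # subroutine sees it without any copying
rw.write("in", 0, arg * 6)   # subroutine leaves its return value
rw.ret()
print(rw.read("out", 0))     # main reads the returned value
```

Arguments and return values never have to be copied to memory: caller and subroutine simply address the same physical registers under different names.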

Summary
RISC architecture defined
Benefits and drawbacks of RISC architecture
Pipelines
– Problems with pipelines
Register windowing

Future of RISC
Hotly debated
CISC is still easier to support
– Provides backward compatibility
RISC is faster
More than likely, we will see a convergence of the two systems
– Ex: Pentium processor