25 pipeline telkom

IKI20210 Introduction to Computer Organization (Pengantar Organisasi Komputer)
Lecture no. 25: Pipeline
10 January 2003
Bobby Nazief ([email protected]), Johny Moningka ([email protected])
Course materials: http://www.cs.ui.ac.id/~iki20210/
Sources:
1. Hamacher, Computer Organization, 4th ed.
2. CS152 lecture materials, 1997, UCB.

Laundry example: four loads of clothes to wash, dry, and fold
Washer takes 30 minutes
Dryer takes 40 minutes
“Folder” takes 20 minutes
If they learned pipelining, how long would laundry take?
[Figure: laundry timeline starting at 6 PM; each of the four loads spends 30, 40, and 20 minutes in the washer, dryer, and folder]
Pipelined laundry takes 3.5 hours for 4 loads
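The 3.5-hour figure can be checked with a small simulation. This is only a sketch: the function names are made up for illustration, and it assumes a load moves to the next machine as soon as both the load and the machine are free.

```python
# Stage durations in minutes, taken from the slide: washer, dryer, folder.
STAGE_MINUTES = [30, 40, 20]

def sequential_minutes(loads, stages=STAGE_MINUTES):
    """Each load finishes every stage before the next load starts."""
    return loads * sum(stages)

def pipelined_minutes(loads, stages=STAGE_MINUTES):
    """Overlap the loads: a load enters a stage once it has left the
    previous stage and the machine for this stage is free."""
    stage_free_at = [0] * len(stages)  # time at which each machine is free
    finish = 0
    for _ in range(loads):
        ready = 0  # time at which this load is ready for its next stage
        for i, duration in enumerate(stages):
            start = max(ready, stage_free_at[i])
            ready = start + duration
            stage_free_at[i] = ready
        finish = ready
    return finish

print(sequential_minutes(4) / 60)  # 6.0 hours without overlap
print(pipelined_minutes(4) / 60)   # 3.5 hours with overlap
```

The 40-minute dryer sets the pace: after the first wash, one load leaves the dryer every 40 minutes, which gives 30 + 4 x 40 + 20 = 210 minutes.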
Pipelining Lessons
Pipelining doesn’t help latency of single task, it helps throughput of entire workload
Pipeline rate limited by slowest pipeline stage
Multiple tasks operating simultaneously using different resources
Potential speedup = number of pipe stages
Unbalanced lengths of pipe stages reduce speedup
Time to “fill” pipeline and time to “drain” it reduce speedup
Stall for Dependences
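The lessons about fill time and unbalanced stages can be illustrated with a rough speedup estimate. The sketch below assumes a clocked pipeline in which every stage takes as long as the slowest one and no stalls occur; `pipeline_speedup` is a hypothetical helper, not something from the lecture.

```python
def pipeline_speedup(num_tasks, stage_times):
    """Speedup of a clocked pipeline over running the tasks one by one.

    Assumes the clock period equals the slowest stage and that a new
    task enters the pipeline every cycle (no stalls).
    """
    unpipelined = num_tasks * sum(stage_times)
    cycle = max(stage_times)
    # (stages - 1) cycles to fill, then one task completes per cycle.
    pipelined = (len(stage_times) + num_tasks - 1) * cycle
    return unpipelined / pipelined

# A single task gains nothing: latency is not improved.
print(pipeline_speedup(1, [10] * 5))
# Many tasks on balanced stages approach the number of stages (5).
print(pipeline_speedup(10_000, [10] * 5))
# Unbalanced stages stay below 90 / 40 = 2.25 even though there are 3 stages.
print(pipeline_speedup(10_000, [30, 40, 20]))
```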
Instruction:
Steps:
Fetch the instruction
PCout, MARin, Read, Clear Y, Set carry-in to ALU, Add, Zin
Zout, PCin, WMFC
Fetch operand #1 (the contents of the memory location pointed to by R3)
R3out, MARin, Read
R1out, Yin, WMFC
Perform the addition
MDRout, Add, Zin
Zout, R1in, End
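The control steps above can be traced as register transfers. The following is a minimal sketch under stated assumptions: registers and memory are plain Python dictionaries, memory is word-addressed so that the carry-in increments the PC by one, and instruction decoding is not modeled. The function name `run_control_steps` is hypothetical.

```python
def run_control_steps(regs, memory):
    """Apply the six listed control steps to a copy of the register state."""
    s = dict(regs)

    # PCout, MARin, Read, Clear Y, Set carry-in to ALU, Add, Zin
    s["MAR"] = s["PC"]
    s["Y"] = 0
    s["Z"] = s["Y"] + s["PC"] + 1      # carry-in of 1 yields PC + 1

    # Zout, PCin, WMFC (wait for the memory read to complete)
    s["PC"] = s["Z"]
    s["MDR"] = memory[s["MAR"]]        # the instruction word arrives

    # R3out, MARin, Read
    s["MAR"] = s["R3"]

    # R1out, Yin, WMFC
    s["Y"] = s["R1"]
    s["MDR"] = memory[s["MAR"]]        # the operand pointed to by R3 arrives

    # MDRout, Add, Zin
    s["Z"] = s["Y"] + s["MDR"]

    # Zout, R1in, End
    s["R1"] = s["Z"]
    return s

# Example values are made up: R3 points at address 200, which holds 35.
memory = {100: "<instruction word>", 200: 35}
state = run_control_steps({"PC": 100, "R1": 7, "R3": 200}, memory)
print(state["R1"], state["PC"])  # 42 101
```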
Ifetch: Fetch the instruction
Reg/Dec: Fetch the registers and decode the instruction
Exec: Calculate the memory address
Mem: Read the data from the Data Memory
Wr: Write the data back to the register file
Load/Store Architecture:
[Figure: the five stages of a Load instruction: Ifetch, Reg/Dec, Exec, Mem, Wr]
As shown here, each of these five steps will take one clock cycle to complete.
And in pipeline terminology, each step is referred to as one stage of the pipeline.
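Because each stage takes one clock cycle, the stage an instruction occupies in a given cycle follows directly from its position in the program. The sketch below assumes one instruction enters the pipeline per cycle with no stalls; the helper names are illustrative only.

```python
STAGES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]

def stage_in_cycle(instr_index, cycle, stages=STAGES):
    """Stage occupied by instruction `instr_index` during `cycle`
    (both 0-based), or None if it is not in the pipeline then."""
    position = cycle - instr_index
    if 0 <= position < len(stages):
        return stages[position]
    return None

def pipeline_diagram(num_instructions, stages=STAGES):
    """One row per instruction, one column per clock cycle."""
    total_cycles = num_instructions + len(stages) - 1
    return [
        [stage_in_cycle(i, c, stages) or "" for c in range(total_cycles)]
        for i in range(num_instructions)
    ]

for row in pipeline_diagram(3):
    print(" ".join(f"{cell:8}" for cell in row))
```

Each row is shifted one cycle to the right of the row above it, so three instructions finish in 3 + 5 - 1 = 7 cycles.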
Why Pipeline?
Non-pipeline machine
10 ns/cycle x 4.6 CPI (due to instr mix) x 100 inst = 4600 ns
Ideal pipelined machine
10 ns/cycle x (4 cycle fill + 1 CPI x 100 inst) = 1040 ns
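The two totals above follow from simple arithmetic, sketched here with hypothetical function names. The inputs (10 ns cycle, 4.6 CPI, 4 fill cycles, 100 instructions) are the slide's own numbers.

```python
def nonpipelined_ns(cycle_ns, cpi, instructions):
    """Total time when each instruction takes `cpi` cycles on its own."""
    return cycle_ns * cpi * instructions

def ideal_pipelined_ns(cycle_ns, fill_cycles, instructions, cpi=1):
    """Total time for an ideal pipeline: fill once, then `cpi` cycles
    per instruction with no stalls."""
    return cycle_ns * (fill_cycles + cpi * instructions)

t_non = nonpipelined_ns(10, 4.6, 100)     # about 4600 ns
t_pipe = ideal_pipelined_ns(10, 4, 100)   # 1040 ns
print(round(t_non), t_pipe, round(t_non / t_pipe, 2))
```

For this workload the ideal pipeline is roughly 4.4 times faster, slightly less than the 4.6 CPI ratio because of the four fill cycles.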
[Figure: pipelined execution of successive instructions, each passing through Ifetch, Reg, Exec, Mem, Wr; horizontal axis: clock (Clk); vertical axis: instruction order]
Can pipelining cause problems? Yes: pipeline hazards
structural hazards: attempt to use the same resource two different ways at the same time
E.g., combined washer/dryer would be a structural hazard or folder busy doing something else (watching TV)
data hazards: attempt to use item before it is ready
E.g., one sock of pair in dryer and one in washer; can’t fold until get sock from washer through dryer
instruction depends on result of prior instruction still in the pipeline
control hazards: attempt to make a decision before condition is evaluated
E.g., washing football uniforms and needing to get the proper detergent level; need to see the result after the dryer before putting the next load in
branch instructions
take action (or delay action) to resolve hazards
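A data hazard of the kind described above (an instruction that depends on the result of a prior instruction still in the pipeline) can be detected by comparing register names. This is a sketch only: the `Instr` record, the register names, and the default `window` of 2 are assumptions, the last one corresponding to a five-stage pipeline whose register file is written in the first half of a cycle and read in the second half.

```python
from typing import List, NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    op: str
    dest: Optional[str]      # register written, or None
    srcs: Tuple[str, ...]    # registers read

def find_raw_hazards(program: List[Instr], window: int = 2):
    """Return (producer_index, consumer_index, register) triples where a
    consumer reads a register written by one of the `window` instructions
    just before it (read-after-write dependence, no forwarding)."""
    hazards = []
    for i, instr in enumerate(program):
        for j in range(max(0, i - window), i):
            dest = program[j].dest
            if dest is not None and dest in instr.srcs:
                hazards.append((j, i, dest))
    return hazards

program = [
    Instr("add", "r1", ("r2", "r3")),
    Instr("sub", "r4", ("r1", "r3")),   # needs r1 one instruction later
    Instr("and", "r6", ("r1", "r7")),   # needs r1 two instructions later
    Instr("or",  "r8", ("r1", "r9")),   # far enough away under the assumption
]
print(find_raw_hazards(program))  # [(0, 1, 'r1'), (0, 2, 'r1')]
```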
[Figure: pipeline diagram showing the Mem and Reg accesses of successive instructions; vertical axis: instruction order]
Detection is easy in this case! (right half highlight means read, left half write)
[Figure: pipeline diagram showing the Mem, Reg, and ALU stages of successive instructions]
Stall: wait until decision is clear
It's possible to move the decision up to the 2nd stage by adding hardware to check the registers as they are being read
Impact: 2 clock cycles per branch instruction => slow
[Figure: pipeline diagram; vertical axis: instruction order]
Predict not taken
Impact: 1 clock cycle per branch instruction if right, 2 if wrong (right ≈ 50% of time)
More dynamic scheme: history of 1 branch (≈ 90%)
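The stall and prediction figures above can be compared as expected cycles per branch. The sketch below uses the slide's costs (1 cycle if the prediction is right, 2 if wrong, 2 for always stalling); the `branch_fraction` of 0.2 in the usage lines is an assumed value for illustration, not a number from the lecture.

```python
STALL_CYCLES_PER_BRANCH = 2  # always wait until the decision is clear

def cycles_per_branch(accuracy, if_right=1, if_wrong=2):
    """Expected clock cycles per branch for a given prediction accuracy."""
    return accuracy * if_right + (1.0 - accuracy) * if_wrong

def average_cpi(branch_fraction, branch_cycles, other_cpi=1.0):
    """Average CPI when `branch_fraction` of instructions are branches
    and every other instruction takes `other_cpi` cycles."""
    return (1.0 - branch_fraction) * other_cpi + branch_fraction * branch_cycles

for label, cost in [
    ("stall", STALL_CYCLES_PER_BRANCH),
    ("predict not taken (50% right)", cycles_per_branch(0.5)),
    ("1-branch history (90% right)", cycles_per_branch(0.9)),
]:
    # branch_fraction=0.2 is a hypothetical instruction mix.
    print(f"{label}: {cost:.2f} cycles/branch, CPI {average_cpi(0.2, cost):.2f}")
```

Under these assumptions the expected cost drops from 2 cycles per branch when stalling to 1.5 with a 50% hit rate and to about 1.1 with a 90% hit rate.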
[Figure: pipeline diagram; vertical axis: instruction order]
Redefine branch behavior (the branch takes effect after the next instruction): “delayed branch”
Impact: 0 clock cycles per branch instruction if an instruction can be found to put in the “slot” (≈ 50% of time)
Less useful as more instructions are launched per clock cycle
[Figures: pipeline diagrams; vertical axis: instruction order]
Forwarding Structure
Detect the nearest valid write to an operand register and forward the value into the operand latches, bypassing the remainder of the pipe
Increase muxes to add paths from pipeline registers
Data Forwarding = Data Bypassing
Can’t solve with forwarding:
Must delay/stall instruction dependent on loads
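The difference between hazards that forwarding resolves and the load case that still needs a stall can be written as a small rule. This is a sketch assuming the classic five-stage pipeline with full forwarding, where an ALU result is available at the end of Exec and loaded data only at the end of Mem; `stalls_with_forwarding` is a hypothetical helper.

```python
def stalls_with_forwarding(producer_op, distance):
    """Stall cycles a dependent instruction needs when it follows its
    producer by `distance` instructions (1 = immediately after).

    Assumes a five-stage pipeline with full forwarding:
    - ALU results can be forwarded straight into the next Exec stage.
    - Loaded data is ready one cycle later, so an instruction right
      after the load must wait one cycle.
    """
    if producer_op == "load" and distance == 1:
        return 1
    return 0

print(stalls_with_forwarding("add", 1))   # forwarding is enough
print(stalls_with_forwarding("load", 1))  # must stall one cycle
print(stalls_with_forwarding("load", 2))  # forwarding is enough again
```

Placing an independent instruction between the load and its consumer raises the distance to 2 and removes the stall under these assumptions.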