introduction to pipelining
DESCRIPTION
Introduction to Pipelining. Adapted from the lecture notes of Dr. John Kubiatowicz (UC Berkeley). A. B. C. D. Pipelining is Natural!. Laundry Example Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer takes 30 minutes Dryer takes 40 minutes - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Introduction to Pipelining](https://reader035.vdocument.in/reader035/viewer/2022062316/56816846550346895dde2129/html5/thumbnails/1.jpg)
Introduction to Pipelining
Adapted from the lecture notes of Dr. John Kubiatowicz (UC Berkeley)
![Page 2: Introduction to Pipelining](https://reader035.vdocument.in/reader035/viewer/2022062316/56816846550346895dde2129/html5/thumbnails/2.jpg)
Pipelining is Natural!• Laundry Example
• Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold
• Washer takes 30 minutes
• Dryer takes 40 minutes
• “Folder” takes 20 minutes
A B C D
![Page 3: Introduction to Pipelining](https://reader035.vdocument.in/reader035/viewer/2022062316/56816846550346895dde2129/html5/thumbnails/3.jpg)
Sequential Laundry
• Sequential laundry takes 6 hours for 4 loads
A
B
C
D
30 40 20 30 40 20 30 40 20 30 40 20
6 PM 7 8 9 10 11 Midnight
Task
Order
Time
![Page 4: Introduction to Pipelining](https://reader035.vdocument.in/reader035/viewer/2022062316/56816846550346895dde2129/html5/thumbnails/4.jpg)
Pipelined Laundry: Start work ASAP
• Pipelined laundry takes 3.5 hours for 4 loads
A
B
C
D
6 PM 7 8 9 10 11 Midnight
Task
Order
Time
30 40 40 40 40 20
![Page 5: Introduction to Pipelining](https://reader035.vdocument.in/reader035/viewer/2022062316/56816846550346895dde2129/html5/thumbnails/5.jpg)
Pipelining Lessons• Latency vs. Throughput• Question
– What is the latency in both cases ?– What is the throughput in both cases ?
Pipelining doesn’t help latency of single task, it helps throughput of entire workload
A
B
C
D
30 40 40 40 40 20
![Page 6: Introduction to Pipelining](https://reader035.vdocument.in/reader035/viewer/2022062316/56816846550346895dde2129/html5/thumbnails/6.jpg)
Pipelining Lessons [contd…]• Question
– What is the fastest operation in the example ?– What is the slowest operation in the example
Pipeline rate limited by slowest pipeline stage
A
B
C
D
30 40 40 40 40 20
![Page 7: Introduction to Pipelining](https://reader035.vdocument.in/reader035/viewer/2022062316/56816846550346895dde2129/html5/thumbnails/7.jpg)
Pipelining Lessons [contd…]
A
B
C
D
30 40 40 40 40 20
Multiple tasks operating simultaneously using different resources
![Page 8: Introduction to Pipelining](https://reader035.vdocument.in/reader035/viewer/2022062316/56816846550346895dde2129/html5/thumbnails/8.jpg)
Pipelining Lessons [contd…]• Question
– Would the speedup increase if we had more steps ?
A
B
C
D
30 40 40 40 40 20
Potential Speedup = Number of pipe stages
![Page 9: Introduction to Pipelining](https://reader035.vdocument.in/reader035/viewer/2022062316/56816846550346895dde2129/html5/thumbnails/9.jpg)
Pipelining Lessons [contd…]• Washer takes 30 minutes
• Dryer takes 40 minutes
• “Folder” takes 20 minutes
• Question– Will it affect if “Folder” also took 40 minutes
Unbalanced lengths of pipe stages reduces speedup
![Page 10: Introduction to Pipelining](https://reader035.vdocument.in/reader035/viewer/2022062316/56816846550346895dde2129/html5/thumbnails/10.jpg)
Pipelining Lessons [contd…]
A
B
C
D
30 40 40 40 40 20
Time to “fill” pipeline and time to “drain” it reduces speedup
![Page 11: Introduction to Pipelining](https://reader035.vdocument.in/reader035/viewer/2022062316/56816846550346895dde2129/html5/thumbnails/11.jpg)
Five Stages of an Instruction
• Ifetch: Instruction Fetch– Fetch the instruction from the Instruction Memory
• Reg/Dec: Registers Fetch and Instruction Decode• Exec: Calculate the memory address• Mem: Read the data from the Data Memory• Wr: Write the data back to the register file
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5
Ifetch Reg/Dec Exec Mem WrLoad
![Page 12: Introduction to Pipelining](https://reader035.vdocument.in/reader035/viewer/2022062316/56816846550346895dde2129/html5/thumbnails/12.jpg)
Conventional Pipelined Execution Representation
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WBProgram Flow
Time
![Page 13: Introduction to Pipelining](https://reader035.vdocument.in/reader035/viewer/2022062316/56816846550346895dde2129/html5/thumbnails/13.jpg)
ExampleProgram
Execution Order (in
instructions)
lw $1,100($0)
lw $2, 200($0)
lw $3, 300($0)
Time
Inst. Fetch
Reg
MemALUReg
Inst. Fetch
Reg
MemALUReg
Inst. Fetch
Reg
MemALUReg
Program Execution Order (in
instructions)
lw $1,100($0)
lw $2, 200($0)
lw $3, 300($0)
Time
Inst. Fetch
Reg
MemALUReg
Inst. Fetch
Reg
MemALUReg
Inst. Fetch
Reg
MemALUReg
![Page 14: Introduction to Pipelining](https://reader035.vdocument.in/reader035/viewer/2022062316/56816846550346895dde2129/html5/thumbnails/14.jpg)
Example [contd…]
• Timepipeline = Timenon-pipeline / Pipe stages– Assumptions
• Stages are perfectly balanced• Ideal conditions
![Page 15: Introduction to Pipelining](https://reader035.vdocument.in/reader035/viewer/2022062316/56816846550346895dde2129/html5/thumbnails/15.jpg)
cpsc614Lec 1.15
Performance(X) Execution_time(Y) n = =
Performance(Y) Execution_time(X)
Definitions•Performance is in units of things per sec
– bigger is better•If we are primarily concerned with response time–performance(x) = 1
execution_time(x)
" X is n times faster than Y" means
![Page 16: Introduction to Pipelining](https://reader035.vdocument.in/reader035/viewer/2022062316/56816846550346895dde2129/html5/thumbnails/16.jpg)
Example [contd…]
• Speedup in this case = 24/14 = 1.7• Lets add 1000 more instructions
– Time (non-pipelined) = 1000 x 8 + 24 ns = 8000 ns– Time (pipelined) = 1000 x 2 + 14 ns = 2014 ns– Speedup = 8000 / 2014 = 3.98 = 4 (approx) = 8/2
Instruction throughput is important metric (as opposed to individual instruction)as real programs execute billions of instructions in practical case !!!
![Page 17: Introduction to Pipelining](https://reader035.vdocument.in/reader035/viewer/2022062316/56816846550346895dde2129/html5/thumbnails/17.jpg)
Pipeline Hazards
• Structural HazardIFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WBProgram Flow
![Page 18: Introduction to Pipelining](https://reader035.vdocument.in/reader035/viewer/2022062316/56816846550346895dde2129/html5/thumbnails/18.jpg)
Pipeline Hazard [contd…]
• Control Hazard
• Example– add $4, $5, $6– beq $1, $2, 40– lw $3, 300($0)
![Page 19: Introduction to Pipelining](https://reader035.vdocument.in/reader035/viewer/2022062316/56816846550346895dde2129/html5/thumbnails/19.jpg)
Pipleline Hazard [contd…]
• Data Hazards
• Example– add $s0, $t0, $t1– sub $t2, $s0, $t3
![Page 20: Introduction to Pipelining](https://reader035.vdocument.in/reader035/viewer/2022062316/56816846550346895dde2129/html5/thumbnails/20.jpg)
Summary Pipelining Lessons• Pipelining doesn’t help
latency of single task, it helps throughput of entire workload
• Pipeline rate limited by slowest pipeline stage
• Multiple tasks operating simultaneously using different resources
• Potential speedup = Number pipe stages
• Unbalanced lengths of pipe stages reduces speedup
• Time to “fill” pipeline and time to “drain” it reduces speedup
• Stall for Dependences
A
B
C
D
6 PM 7 8 9
Task
Order
Time
30 40 40 40 40 20
![Page 21: Introduction to Pipelining](https://reader035.vdocument.in/reader035/viewer/2022062316/56816846550346895dde2129/html5/thumbnails/21.jpg)
Summary of Pipeline Hazards
• Structural Hazards– Hardware design
• Control Hazard– Decision based on results
• Data Hazard– Data Dependency