CS222: Pipeline Processor · 2017-04-12


CS222: Pipeline Processor Design

Dr.  A. Sahu

Dept of Comp. Sc. & Engg.

Indian Institute of Technology Guwahati

Outline

• Mid Exam Answer Script: Friday Class
• Previous:
  – Control unit for Multi‐Cycle design
• Pipeline processor
• Basic Structure of Pipeline
• Hazards
  – Data Hazards (Data dependency)
  – Resource Hazards (Same resource used in two stages)
  – Control hazards (Branch instruction)

Course Structure

• Before Mid‐Semester
  – Instruction set and Assembly programming
  – Arithmetic unit design (Adder/Sub/Mult, Float)
  – Processor Design: data path for both single cycle and multi cycle processors (RTL, control)
• After Mid‐Semester
  – Pipeline processor, hazards, superscalar
  – Memory hierarchy, Cache, Disk
  – IO organization and controller
  – Advanced topics related to Computer Architecture

Problems with single cycle design

• Slowest instruction pulls down the clock frequency
• Resource utilization is poor
• There are some instructions which are impossible to implement in this manner
  – Think: which instructions are these?

1. Clock period in single cycle design

[Figure: timing bars for the R‐class, lw, sw, beq and j instructions, built from the component delays tI, tR, tA, tM and t+, with the clock period marked.]

1. Clock period in multi‐cycle design

[Figure: timing bars for the R‐class, lw, sw, beq and j instructions, built from the component delays tI, tR, tA, tM and t+, with the multi‐cycle clock period marked.]
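The comparison behind the two clock-period figures can be sketched as a small calculation. This is only an illustration, not part of the slides: the delay values are hypothetical, the per-instruction paths follow the usual MIPS-style breakdown (with the t+ adders assumed to work in parallel with the main path), and the multi-cycle period is assumed to be set by the slowest single component.

```python
# Hypothetical component delays in nanoseconds (illustrative, not from the slides).
DELAYS = {"tI": 2.0, "tR": 1.0, "tA": 2.0, "tM": 2.0}

# Assumed critical path of each instruction class through the datapath.
# The t+ adders are assumed to run in parallel and are left out.
PATHS = {
    "R-class": ["tI", "tR", "tA", "tR"],
    "lw": ["tI", "tR", "tA", "tM", "tR"],
    "sw": ["tI", "tR", "tA", "tM"],
    "beq": ["tI", "tR", "tA"],
    "j": ["tI"],
}


def single_cycle_period(delays, paths):
    """Clock period must cover the slowest complete instruction."""
    return max(sum(delays[c] for c in path) for path in paths.values())


def multi_cycle_period(delays, paths):
    """Clock period must cover only the slowest single step (component)."""
    used = {c for path in paths.values() for c in path}
    return max(delays[c] for c in used)


if __name__ == "__main__":
    print("single cycle period:", single_cycle_period(DELAYS, PATHS))
    print("multi cycle period: ", multi_cycle_period(DELAYS, PATHS))
```

With these assumed numbers the single cycle period is set by lw, while the multi-cycle period is set by the slowest component, which is why balancing the delays of the individual cycles matters.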

Single Cycle Datapath

[Figure: single cycle datapath with PC, IM (ad, ins), RF (rad1, rad2, wad, wd, rd1, rd2), ALU, DM (ad, rd, wd), two adders (one adding 4 to the PC), sign extension sx of ins[15‐0], shift units s2, the jump address ja[31‐0] formed from PC+4[31‐28] and ins[25‐0], the register fields ins[25‐21], ins[20‐16] and ins[15‐11], and the selecting multiplexers.]

Improve Resource Utilization

• Merge IM/DM
  – Lw: IM/PC++ ‐ R ‐ ALU ‐ DM ‐ R
  – Sw: IM/PC++ ‐ R ‐ ALU ‐ DM
• Eliminate 1st Adder and Use ALU
  – As 1st adder is used in 1st Cycle and ALU is free in 1st Cycle
• Eliminate 2nd Adder and Use ALU
  – As 2nd adder is used in 2nd Cycle and ALU is free in 2nd Cycle

Multi‐cycle data path

• Single cycle approach to multi‐cycle approach: improve performance and resource sharing
• Delays in different cycles should be balanced
• Single ALU and single memory used
• Additional registers and multiplexers required

Multi‐cycle Control Design

• Break instructions into cycles
• Put cycle sequences together
• Control signal groups and micro operations
• Control states and signal values
• Control state transitions

Put cycle sequences together

Cycle  R‐class        sw             lw             beq                  j
1      IR = Mem[ ]    IR = Mem[ ]    IR = Mem[ ]    IR = Mem[ ]          IR = Mem[ ]
       PC = .. + ..   PC = .. + ..   PC = .. + ..   PC = .. + ..         PC = .. + ..
2      A = RF[..]     A = RF[..]     A = RF[..]     A = RF[..]           PC = ..
       B = RF[..]     B = RF[..]                    B = RF[..]
                                                    Res = ..+..
3      Res = ..op..   Res = ..+..    Res = ..+..    if(..==..) PC =..
4      RF[..] = ..    Mem[ ] = ..    DR = Mem[ ]
5                                    RF[..] = DR

Annotation on the slide: these can be merged

With a common decoding cycle

All instructions:
  IR = Mem[ ]; PC = .. + ..
  A = RF[..]; B = RF[..]; Res = ..+..

R‐class        sw             lw             beq                  j
Res = ..op..   Res = ..+..    Res = ..+..    if(..==..) PC =..    PC = ..
RF[..] = ..    Mem[ ] = ..    DR = Mem[ ]
                              RF[..] = DR

lw, sw can split after third cycle

All instructions:
  IR = Mem[ ]; PC = .. + ..
  A = RF[..]; B = RF[..]; Res = ..+..

R‐class        sw / lw              beq                  j
Res = ..op..   Res = ..+..          if(..==..) PC =..    PC = ..
RF[..] = ..    sw: Mem[ ] = ..
               lw: DR = Mem[ ]
                   RF[..] = DR

Control signals in multi‐cycle DP

[Figure: multi‐cycle datapath with PC, a single Mem (ad, rd, wd), IR, DR, RF (rad1, rad2, wad, wd, rd1, rd2), registers A and B, ALU, Res, sign extension sx, shift units s2 and the jump address ja[31‐0], annotated with the control signals PW, IorD, MR, MW, IW, DW, Rdst, M2R, RW, AW, BW, Asrc1, Asrc2, op, ReW and Psrc, and the ALU output Z.]

Micro operations and control signals ‐ PC group

Micro operation                   Name     PWu  PWc  Psrc
PC = PC + 4                       PCinc    1    X    1
if (A == B) PC = Res              branch   0    1    0
PC = PC[31‐28] || s2(IR[25‐0])    jump     1    X    2
default                           nop      0    0    X

PW = PWu + Z . PWc
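The PC write equation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `+` is read as logical OR and `.` as logical AND, don't-care entries (X) are filled with 0, and the meaning of each Psrc value (0 selects Res, 1 selects the ALU output, 2 selects the jump address) is inferred from the micro operations in the table rather than stated on the slide. The function and dictionary names are hypothetical.

```python
# (PWu, PWc, Psrc) per micro operation; X entries from the table are set to 0 here.
PC_GROUP = {
    "PCinc": (1, 0, 1),
    "branch": (0, 1, 0),
    "jump": (1, 0, 2),
    "nop": (0, 0, 0),
}


def pc_write(pwu, pwc, z):
    """PW = PWu + Z . PWc on 1-bit signals (+ is OR, . is AND)."""
    return pwu | (z & pwc)


def next_pc(pc, micro_op, z, res, alu_out, jump_addr):
    """Return the PC value after one cycle of the given PC-group micro operation."""
    pwu, pwc, psrc = PC_GROUP[micro_op]
    if not pc_write(pwu, pwc, z):
        return pc  # PC keeps its old value when PW is 0
    # Assumed Psrc encoding: 0 -> Res, 1 -> ALU output, 2 -> jump address.
    return (res, alu_out, jump_addr)[psrc]


if __name__ == "__main__":
    # A branch only changes the PC when Z (A == B) is 1.
    print(next_pc(100, "branch", z=1, res=200, alu_out=0, jump_addr=0))
    print(next_pc(100, "branch", z=0, res=200, alu_out=0, jump_addr=0))
```

The unconditional signal PWu covers PCinc and jump, while PWc only takes effect together with Z, which is how a single PW line serves both kinds of update.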

Micro operations and control signals ‐ Mem group

Micro operation   Name    MW  MR  IorD  IW  DW
IR = Mem[PC]      fetch   0   1   0     1   0
DR = Mem[Res]     m_rd    0   1   1     0   1
Mem[Res] = B      m_wr    1   0   1     0   0
default           nop     0   0   X     0   0

Micro operations and control signals ‐ RF group

Micro operation        Name     RW  Rdst  M2R  AW  BW
A = RF[IR[25‐21]]      rs2A     0   X     X    1   0
B = RF[IR[20‐16]]      rt2B     0   X     X    0   1
RF[IR[15‐11]] = Res    res2rd   1   1     0    0   0
RF[IR[20‐16]] = DR     mem2rt   1   0     1    0   0
default                nop      0   X     X    0   0

Micro operations and control signals ‐ ALU group

Micro operation                 Name     opc  Asrc1  Asrc2  ReW
PC = PC + 4                     PCinc    0    0      1      0
Res = A op B                    arith    2    1      0      1
Res = A + sx(IR[15‐0])          Maddr    0    1      2      1
Res = PC + s2(sx(IR[15‐0]))     Paddr    0    0      3      1
if (A == B) PC = Res            branch   1    1      0      0
default                         nop      X    X      X      0

Control states and micro operations

cs0: fetch, PCinc
cs1: rs2A, rt2B, Paddr

After cs1 the path depends on the instruction class:
• R‐class: cs2 (arith), then cs3 (res2rd)
• sw / lw: cs4 (Maddr), then
  – sw: cs5 (m_wr)
  – lw: cs6 (m_rd), then cs7 (mem2rt)
• beq: cs8 (branch)
• j: cs9 (jump)

Control states and signal values

State  PC grp   Mem grp  RF grp      ALU grp
cs0    PCinc    fetch    nop         PCinc
cs1    nop      nop      rs2A,rt2B   Paddr
cs2    nop      nop      nop         arith
cs3    nop      nop      res2rd      nop
cs4    nop      nop      nop         Maddr
cs5    nop      m_wr     nop         nop
cs6    nop      m_rd     nop         nop
cs7    nop      nop      mem2rt      nop
cs8    branch   nop      nop         branch
cs9    jump     nop      nop         nop

Control state transitions

State  R‐class  sw    lw    beq   j
cs0    cs1      cs1   cs1   cs1   cs1
cs1    cs2      cs4   cs4   cs8   cs9
cs2    cs3      X     X     X     X
cs3    cs0      X     X     X     X
cs4    X        cs5   cs6   X     X
cs5    X        cs0   X     X     X
cs6    X        X     cs7   X     X
cs7    X        X     cs0   X     X
cs8    X        X     X     cs0   X
cs9    X        X     X     X     cs0
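The transition table above can be walked as a small finite state machine. The sketch below is illustrative only: the state names and transitions are taken from the table, X entries are simply omitted, and the dictionary layout and function name are hypothetical.

```python
# Next-state table: NEXT_STATE[state][instruction class] -> next state.
# Entries marked X in the table are omitted.
NEXT_STATE = {
    "cs0": {"R-class": "cs1", "sw": "cs1", "lw": "cs1", "beq": "cs1", "j": "cs1"},
    "cs1": {"R-class": "cs2", "sw": "cs4", "lw": "cs4", "beq": "cs8", "j": "cs9"},
    "cs2": {"R-class": "cs3"},
    "cs3": {"R-class": "cs0"},
    "cs4": {"sw": "cs5", "lw": "cs6"},
    "cs5": {"sw": "cs0"},
    "cs6": {"lw": "cs7"},
    "cs7": {"lw": "cs0"},
    "cs8": {"beq": "cs0"},
    "cs9": {"j": "cs0"},
}


def state_sequence(instr_class):
    """List the control states visited by one instruction, starting at cs0."""
    state = "cs0"
    visited = []
    while True:
        visited.append(state)
        state = NEXT_STATE[state][instr_class]
        if state == "cs0":  # back at fetch: the instruction is finished
            return visited


if __name__ == "__main__":
    for cls in ["R-class", "sw", "lw", "beq", "j"]:
        seq = state_sequence(cls)
        print(f"{cls:8s} {len(seq)} cycles: {' -> '.join(seq)}")
```

The length of each sequence is the number of clock cycles that instruction class takes in the multi-cycle design.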

Pipeline Design

• Single Cycle
  – Poor resource utilization, TC >= longest instruction latency
• Multi Cycle
  – TC >= longest stage latency, better utilization, but performance still needs to improve using a pipeline
  – When decoding INSi you can fetch INSi+1
• Pipeline

Instruction Pipeline

IF  D   EX  Mem WB
    IF  D   EX  Mem WB
        IF  D   EX  Mem WB
            IF  D   EX  Mem WB
                IF  D   EX  Mem WB
                    IF  D   EX  Mem WB

Performance: 1 instruction per cycle

All the stages work in parallel; no resource can be shared by stages
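The overlapped execution can be sketched as a schedule in which instruction i enters the pipeline in cycle i. This is a minimal illustration that assumes an ideal pipeline with no hazards or stalls; the function names are hypothetical.

```python
STAGES = ["IF", "D", "EX", "Mem", "WB"]


def pipeline_schedule(n_instructions, stages=STAGES):
    """Row i lists what instruction i does in each cycle ('' means not in the pipeline)."""
    total = n_instructions + len(stages) - 1
    rows = []
    for i in range(n_instructions):
        row = [""] * total
        for s, name in enumerate(stages):
            row[i + s] = name  # instruction i is in stage s during cycle i + s
        rows.append(row)
    return rows


def total_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles needed for n instructions in an ideal k-stage pipeline: n + k - 1."""
    return n_instructions + n_stages - 1


def render(rows):
    """Format the schedule as a staggered text diagram."""
    return "\n".join(" ".join(f"{cell:<3}" for cell in row).rstrip() for row in rows)


if __name__ == "__main__":
    print(render(pipeline_schedule(6)))
    print("total cycles:", total_cycles(6))
```

In every cycle each stage holds a different instruction, so once the pipeline is full one instruction completes per cycle, and no stage can borrow a resource that another stage is using in the same cycle.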

Single cycle datapath (abstract)

[Figure: abstract single cycle datapath showing PC, adder (+4), IM (ad, ins), RF (rad, wad, wd, rd1, rd2), ALU and DM (ad, rd, wd).]

Pipelined datapath

[Figure: the abstract datapath (PC, adder (+4), IM, RF, ALU, DM) divided into the stages IF, ID, EX, Mem and WB by the pipeline registers IF/ID, ID/EX, EX/Mem and Mem/WB.]

Don’t share resources in Stages

• In Multi Cycle Design
  – ALU used for PC++ and Offset Adding
  – Used for 1st Adder and 2nd Adder
  – Register file is used in 2nd and 4th Cycle
• In Pipeline
  – Use separate resources: 1st Adder, 2nd Adder & ALU
  – Register file is accessed in the 1st half of 2nd Cycle and 2nd half of 4th Cycle
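Why sharing fails in a pipeline can be sketched by listing the resources each stage needs and looking for any resource claimed by more than one stage, since all stages are active in the same cycle. The two resource maps below are hypothetical illustrations, not taken from the slides: the first reuses one ALU, one memory and one register file port in the multi-cycle style, and the second gives each stage its own unit, with the register file modelled as separate read and write halves and the placement of the 2nd adder assumed.

```python
from collections import defaultdict


def shared_resources(stage_resources):
    """Return {resource: [stages]} for every resource needed by more than one stage."""
    users = defaultdict(list)
    for stage, resources in stage_resources.items():
        for resource in resources:
            users[resource].append(stage)
    return {r: stages for r, stages in users.items() if len(stages) > 1}


# Hypothetical map: multi-cycle style sharing carried over into a pipeline.
SHARED_STYLE = {
    "IF": ["Mem", "ALU"],   # fetch from the single memory, PC++ on the ALU
    "ID": ["RF", "ALU"],    # register read, offset add on the ALU
    "EX": ["ALU"],
    "Mem": ["Mem"],
    "WB": ["RF"],
}

# Hypothetical map: separate units per stage (2nd adder placement is assumed).
SEPARATE_STYLE = {
    "IF": ["IM", "Adder1"],
    "ID": ["RF-read"],
    "EX": ["ALU", "Adder2"],
    "Mem": ["DM"],
    "WB": ["RF-write"],
}

if __name__ == "__main__":
    print("conflicts when sharing:", shared_resources(SHARED_STYLE))
    print("conflicts when separate:", shared_resources(SEPARATE_STYLE))
```

Every resource reported for the shared map would be requested by two different instructions in the same cycle once the pipeline is full, which is the resource hazard the slide avoids by duplicating the adders and splitting register file access across half cycles.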

Put back multiplexers

[Figure: pipelined datapath (stages IF, ID, EX, Mem, WB; pipeline registers IF/ID, ID/EX, EX/Mem, Mem/WB) with the multiplexers, sign extension sx and shift unit s2 restored around PC, IM, RF (rad1, rad2, wad, wd, rd1, rd2), ALU and DM.]

Correction for WB stage

[Figure: the pipelined datapath with multiplexers (stages IF, ID, EX, Mem, WB; pipeline registers IF/ID, ID/EX, EX/Mem, Mem/WB), redrawn with a correction to the WB stage.]

Abstract: Adding control

[Figure: the pipelined datapath with control blocks and ALU control (Actrl) added.]

Control signals with delays

[Figure: the pipelined datapath with control and Actrl blocks, showing the control signals with delays.]

Correction for RF write signal

[Figure: the pipelined datapath with control and Actrl blocks, redrawn with a correction to the RF write signal.]
