sequential design and pipelining - doc.ic.ac.ukwl/teachlocal/cuscomp/notes/cc07.pdf · •...


Upload: lamthuy

Post on 31-Jul-2019


wl 2019 7.1

Sequential design and pipelining

• sequential design

– example: systolic array

– stream representation

– delays and anti-delays

• pipelining

– pros and cons

– graphical method

– Horner’s Rule


wl 2019 7.2

Systolic array: data-oriented parallelism

• introduced by Kung and Leiserson in 1978

• pump data through processors

– like blood pumped through the body

• efficient, scalable, suitable for regular control

– particularly suited for FPGA technology

• challenges

– describe them systematically

– verify that they work

[Figure: memory M connected to a chain of processors P P P P. M: Memory; P: Pipelined Processor]


wl 2019 7.3

[Figure: C' = A * B + C computed on a systolic array (Kung and Leiserson, 1978)]


wl 2019 7.4

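[Figure: 3 x 3 array of processors; the a values enter from the left and the b values enter from the top]

                  b31 b32 b33
                  b21 b22 b23
                  b11 b12 b13
a13 a12 a11  ->   P11 P12 P13
a23 a22 a21  ->   P21 P22 P23
a33 a32 a31  ->   P31 P32 P33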

each processor Pij, at each time step:

- computes c_{t+1} = a_t * b_t + c_t

- passes a rightwards

- passes b downwards

- c_t remains stationary

Simple systolic matrix multiplier

(source: J Break)
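The cell behaviour listed above can be sketched as a small software simulation. This is only an illustrative model, not the original design: the function name `systolic_matmul`, the register arrays `a_reg` and `b_reg`, and the input skew (row i of a and column j of b are assumed to start i and j steps late) are assumptions made here, chosen so that the accumulated values agree with the time-step trace in the example.

```python
def systolic_matmul(A, B):
    """Simulate an n x n array of cells, each computing c += a * b per step."""
    n = len(A)
    c = [[0] * n for _ in range(n)]      # stationary accumulators
    a_reg = [[0] * n for _ in range(n)]  # a value currently held in each cell
    b_reg = [[0] * n for _ in range(n)]  # b value currently held in each cell
    history = []                         # accumulators after every time step
    for t in range(3 * n - 2):           # the last cell finishes at step 3n - 2
        new_a = [[0] * n for _ in range(n)]
        new_b = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if j == 0:               # left edge: feed row i, skewed by i
                    k = t - i
                    new_a[i][j] = A[i][k] if 0 <= k < n else 0
                else:                    # otherwise take a from the left neighbour
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:               # top edge: feed column j, skewed by j
                    k = t - j
                    new_b[i][j] = B[k][j] if 0 <= k < n else 0
                else:                    # otherwise take b from the cell above
                    new_b[i][j] = b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b
        for i in range(n):
            for j in range(n):
                c[i][j] += a_reg[i][j] * b_reg[i][j]
        history.append([c[i][j] for i in range(n) for j in range(n)])
    return c, history


A = [[3, 4, 2], [2, 5, 3], [3, 2, 5]]
result, history = systolic_matmul(A, A)
print(result)       # final accumulators, row by row
print(history[0])   # accumulators after time step 1
```

For an n x n array the model runs for 3n - 2 steps, which gives the seven time steps of the 3 x 3 example.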


wl 2019 7.5

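[3 4 2]   [3 4 2]   [23 36 28]
[2 5 3] x [2 5 3] = [25 39 34]
[3 2 5]   [3 2 5]   [28 32 37]

[Figure: the 3 x 3 array P11 ... P33, with the rows of the first matrix queued on the left and the columns of the second matrix queued above; each successive row and column starts one step later]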

Systolic matrix multiplier: example

(source: J Break)


wl 2019 7.6

Time step: 1

Products computed: P11: 3*3

P11 P12 P13 P21 P22 P23 P31 P32 P33
  9   0   0   0   0   0   0   0   0

(source: J Break)


wl 2019 7.7

Time step: 2

Products computed: P11: 4*2, P12: 3*4, P21: 2*3

P11 P12 P13 P21 P22 P23 P31 P32 P33
 17  12   0   6   0   0   0   0   0

(source: J Break)


wl 2019 7.8

Time step: 3

Products computed: P11: 2*3, P12: 4*5, P13: 3*2, P21: 5*2, P22: 2*4, P31: 3*3

P11 P12 P13 P21 P22 P23 P31 P32 P33
 23  32   6  16   8   0   9   0   0

(source: J Break)


wl 2019 7.9

Time step: 4

Products computed: P12: 2*2, P13: 4*3, P21: 3*3, P22: 5*5, P23: 2*2, P31: 2*2, P32: 3*4

P11 P12 P13 P21 P22 P23 P31 P32 P33
 23  36  18  25  33   4  13  12   0

(source: J Break)


wl 2019 7.10

Time step: 5

Products computed: P13: 2*5, P22: 3*2, P23: 5*3, P31: 5*3, P32: 5*2, P33: 3*2

P11 P12 P13 P21 P22 P23 P31 P32 P33
 23  36  28  25  39  19  28  22   6

(source: J Break)


wl 2019 7.11

Time step: 6

Products computed: P23: 3*5, P32: 5*2, P33: 2*3

P11 P12 P13 P21 P22 P23 P31 P32 P33
 23  36  28  25  39  34  28  32  12

(source: J Break)


wl 2019 7.12

Time step: 7

Products computed: P33: 5*5

P11 P12 P13 P21 P22 P23 P31 P32 P33
 23  36  28  25  39  34  28  32  37

(source: J Break)

Done – now look into each cell…


wl 2019 7.13

Parallel matrix multiplier

(Woods, McCanny and McWhirter, 2008)

Bit-level systolic correlator

y = a * x + c

(a is an M-bit vector)


wl 2019 7.14

Sequential designs

• components relate streams; the same laws apply if delay is not involved

– <..., x_{t-1}, x_t, x_{t+1}, ...> inc <..., x_{t-1}+1, x_t+1, x_{t+1}+1, ...>

   or x inc y ⇔ ∀t . x_t + 1 = y_t

– <..., <x_{t-1}, y_{t-1}>, <x_t, y_t>, ...> add <..., x_{t-1}+y_{t-1}, x_t+y_t, ...>

   (the domain of add is a single time sequence of pairs)

• delay D: provides state; the range is one cycle behind the domain

– <..., x_{t-1}, x_t, x_{t+1}, ...> D <..., x_{t-2}, x_{t-1}, x_t, ...> models a register

   or x D y ⇔ ∀t . x_{t-1} = y_t

• initialised delay DI c: at t = 0, y_t = c (DI c in Rebecca)

• D^{-1} is the anti-delay: if the input is in the domain then it predicts the input, hence it is not implementable (AD can be simulated in Rebecca)

– can implement D (D in Rebecca) when the input is in the domain, or D^{-1} (D^~1 in Rebecca) when the input is in the range

– cannot implement D (AD^~1) with the input in the range, or D^{-1} (AD) with the input in the domain

[Figure: block inc with input x_t and output y_t]
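The stream view above can be tried out on finite windows of a stream. The following is a minimal sketch under stated assumptions: streams are modelled as Python lists indexed from t = 0, and the names `inc`, `add` and `delay` as Python functions, together with the `init` parameter, are illustrative choices made here, not the Rebecca notation. Because a list has a first element, `delay` is really the initialised delay DI c with c = `init`.

```python
def inc(xs):
    """Pointwise increment: y_t = x_t + 1."""
    return [x + 1 for x in xs]


def add(pairs):
    """Pointwise addition on a single sequence of pairs: z_t = x_t + y_t."""
    return [x + y for x, y in pairs]


def delay(xs, init=0):
    """Initialised delay: output is init at t = 0, then y_t = x_{t-1}."""
    return [init] + xs[:-1]


xs = [3, 1, 4, 1, 5]
ys = [2, 7, 1, 8, 2]

incremented = inc(xs)
summed = add(list(zip(xs, ys)))
delayed = delay(xs, init=0)

# inc has no state, so it commutes with the delay once the initial value is ignored
commutes = inc(delay(xs))[1:] == delay(inc(xs))[1:]

print(incremented, summed, delayed, commutes)
```

The anti-delay has no counterpart in this list model for a live input, since it would need the next element before it arrives.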


wl 2019 7.15

Pipelining

• insert latches between circuits to increase throughput

• also reduce power consumption, especially for FPGAs

• but may also increase

– area

– clock power consumption

– latency

[Figure: pipelined circuit from data to result, with latches driven by clock]
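The throughput and latency trade-off can be illustrated with a simple timing model. This is a sketch under assumptions that are not in the slides: one latch after every stage, a fixed per-latch overhead, no clock skew, and hypothetical stage delays in nanoseconds. The function names `unpipelined` and `pipelined` are made up for this example.

```python
def unpipelined(stage_delays):
    """One long combinational path: clock period and latency are both the sum."""
    period = sum(stage_delays)
    return {"period": period, "latency": period}


def pipelined(stage_delays, latch_overhead):
    """Latch after every stage: the slowest stage sets the clock period."""
    period = max(stage_delays) + latch_overhead
    return {"period": period, "latency": period * len(stage_delays)}


stages = [4, 3, 5]  # hypothetical stage delays in ns
before = unpipelined(stages)
after = pipelined(stages, latch_overhead=1)

print(before)  # one result every 12 ns, available after 12 ns
print(after)   # one result every 6 ns, available after 18 ns
```

In this model the clock period halves, so throughput doubles, while the latency grows from 12 ns to 18 ns.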


wl 2019 7.16

Graphical method

• idea: introduce an anti-delay which cancels the effect of a delay; it is OK to have non-implementable components at inputs or outputs

• graphical contours link the introduction of delays and anti-delays: draw contours around blocks; when a contour cuts

– a domain connection, put a D (or D^{-1})

– a range connection, put a D^{-1} (or D)

• make sure

– R is timeless: D ; R = R ; D or R = R \ D (D is timeless but not stateless!)

– the internal Ds are implementable

[Figure: a row of three R blocks redrawn with contours, with D on the connections between blocks and D^{-1} on the boundary connections; annotation: design is combinational]
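The two checks above (an anti-delay cancels a delay, and a timeless R commutes with D) can be tested on a simple model. This is a sketch only, under the assumption that a stream is a Python function from an integer time t to a value; the names `D`, `AD` and `lift` as Python functions are illustrative. The anti-delay is easy to write here because the whole stream is known in advance, which is exactly what a real circuit lacks.

```python
def D(x):
    """Delay: the output at time t is the input at time t - 1."""
    return lambda t: x(t - 1)


def AD(x):
    """Anti-delay: the output at time t is the input at time t + 1."""
    return lambda t: x(t + 1)


def lift(f):
    """Turn a combinational function into a timeless stream component."""
    return lambda x: (lambda t: f(x(t)))


x = lambda t: t * t            # an arbitrary test stream
R = lift(lambda v: 2 * v + 1)  # a combinational, hence timeless, block
times = range(-3, 4)

# D ; D^{-1} is the identity on streams
cancel_ok = all(AD(D(x))(t) == x(t) for t in times)

# D ; R = R ; D for the timeless block R
timeless_ok = all(R(D(x))(t) == D(R(x))(t) for t in times)

print(cancel_ok, timeless_ok)
```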


wl 2019 7.17

Pipelining a chain

• timeless pre-condition (given): R = R \ D^{-1}

• then, by Horner’s Rule: R^n = (R ; D)^n ; D^{-n}

– D^{-n} is the boundary condition

[Figure: the chain R ; R ; R redrawn as R ; D ; R ; D ; R ; D followed by D^{-1} ; D^{-1} ; D^{-1}]
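The chain law R^n = (R ; D)^n ; D^{-n} can be checked numerically for a timeless R. This is a minimal sketch that assumes streams are Python functions of an integer time; `compose` and `power` are hypothetical helpers written for this example, with `compose` applying its stages left to right in the same way as the ";" operator.

```python
def D(x):
    """Delay: the output at time t is the input at time t - 1."""
    return lambda t: x(t - 1)


def AD(x):
    """Anti-delay: the output at time t is the input at time t + 1."""
    return lambda t: x(t + 1)


def lift(f):
    """Turn a combinational function into a timeless stream component."""
    return lambda x: (lambda t: f(x(t)))


def compose(*stages):
    """Left-to-right composition, written ';' in the notes."""
    def run(x):
        for stage in stages:
            x = stage(x)
        return x
    return run


def power(stage, n):
    """n copies of a stage in series."""
    return compose(*([stage] * n))


R = lift(lambda v: 3 * v + 1)
n = 3

plain = power(R, n)                                      # R^n
piped = compose(power(compose(R, D), n), power(AD, n))   # (R ; D)^n ; D^{-n}

x = lambda t: t + 10
same = all(plain(x)(t) == piped(x)(t) for t in range(-5, 6))
print(same)
```

Dropping the final D^{-n} leaves the implementable pipeline (R ; D)^n, whose output is the original output n cycles late.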


wl 2019 7.18

Pipelining a row

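• timeless pre-condition (given): R = R \ D^{-1} (or R \ [D^{-1}, D^{-1}])

• then, by Horner’s Rule:

– R^n = (R ; D)^n ; D^{-n}

– row_n R = snd nD ; row_n (R ; snd D) ; [nD^{-1}, D^{-n}]

– the terms outside row_n (R ; snd D) are the boundary conditions

[Figure: a row of three R blocks redrawn with D on the connections between and below the blocks, and D^{-1} on the boundary connections]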


wl 2019 7.19

Example: polynomial evaluation and optim.

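• given: a x + b x = (a + b) x

• then: a_0 + a_1 x + a_2 x^2 + a_3 x^3 = a_0 + x (a_1 + x (a_2 + a_3 x))

[Figure: two circuits of multipliers (x) and adders (+) with coefficient inputs a3, a2, a1, a0, one for each side of the equation; a smaller pair of circuits illustrates a x + b x = (a + b) x]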
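The saving from the factored form can be made concrete by counting multiplications. The sketch below is illustrative only: the function names `eval_naive` and `eval_horner` and the sample coefficients are made up for this example, and coefficients are listed from a0 upwards.

```python
def eval_naive(coeffs, x):
    """Evaluate a0 + a1*x + a2*x^2 + ... term by term, counting multiplications."""
    total, mults = 0, 0
    for i, a in enumerate(coeffs):
        term = a
        for _ in range(i):       # a_i * x * x * ... (i multiplications)
            term *= x
            mults += 1
        total += term
    return total, mults


def eval_horner(coeffs, x):
    """Evaluate a0 + x*(a1 + x*(a2 + ...)), counting multiplications."""
    acc, mults = coeffs[-1], 0
    for a in reversed(coeffs[:-1]):
        acc = acc * x + a        # one multiplier and one adder per stage
        mults += 1
    return acc, mults


coeffs = [5, 3, 2, 7]            # a0, a1, a2, a3 (hypothetical values)
naive_value, naive_mults = eval_naive(coeffs, 2)
horner_value, horner_mults = eval_horner(coeffs, 2)
print(naive_value, naive_mults)    # same value, 6 multiplications
print(horner_value, horner_mults)  # same value, 3 multiplications
```

Each loop iteration in `eval_horner` corresponds to one multiplier and adder pair in a chain, which is the regular structure that the pipelining laws apply to.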


wl 2019 7.20

Horner’s Rule

• given: [P, Q] ; R = R ; Q

• then: [nP, Q^n] ; rdr_n R = rdr_n (2Q ; R)

[Figure: block diagrams of the given law for a single R with P and Q, and of the resulting law for a chain of R blocks]


wl 2019 7.21

A grid

R R R R

R R R R

R R R R

grid_{m,n} R = ?


wl 2019 7.22

Pipelining a grid: (a) put D between columns

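R D R D R D R D   D^{-1} D^{-1} D^{-1} D^{-1}

R D R D R D R D   D^{-1} D^{-1} D^{-1} D^{-1}

R D R D R D R D   D^{-1} D^{-1} D^{-1} D^{-1}

[Figure: further D and D^{-1} elements on the boundary connections of the grid; their placement is not recoverable from the transcript]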


wl 2019 7.23

Pipelining a grid: (b) put D between rows

[Figure: 3 x 4 grid of R cells, each with a D on its horizontal output and a D on its vertical output, and with D^{-1} and D elements on the boundary connections of the grid]


wl 2019 7.24

Pipelining a grid: (c) place D diagonally

[Figure: 3 x 4 grid of R cells with D elements on the cell outputs and D^{-1} and D elements on the boundary connections; the diagonal placement is not recoverable from the transcript]