-
1Digital DesignCopyright © 2006Frank Vahid
Digital Design
Chapter 5: Register-Transfer Level
(RTL) Design
-
5.1
2Digital DesignCopyright © 2006Frank Vahid
IntroductionCombinational
logic
n0s1 s0
n1
bobi
clk State register
FSM
inpu
ts
FSM
outp
uts
• Chapter 3: Controllers– Control input/output: single bit (or just a
few) representing event or state– Finite-state machine describes
behavior; implemented as state register and combinational logic
• Chapter 4: Datapath components– Data input/output: Multiple bits
collectively representing single entity– Datapath components included
registers, adders, ALU, comparators, register files, etc.
• This chapter: custom processors– Processor: Controller and datapath
components working together to implement an algorithm
Register Comparatorsi
ansis
Register fileALU
z
e
Combinationallogic
n0s1 s0
n1
bobi
State register
Register file
ALU
DatapathController
-
3Digital DesignCopyright © 2006Frank Vahid
RTL Design: Capture Behavior, Convert to Circuit• Recall
– Chapter 2: Combinational Logic Design• First step: Capture behavior (using equation
or truth table)• Remaining steps: Convert to circuit
– Chapter 3: Sequential Logic Design• First step: Capture behavior (using FSM)• Remaining steps: Convert to circuit
• RTL Design (the method for creating custom processors)– First step: Capture behavior (using high-
level state machine, to be introduced) – Remaining steps: Convert to circuit
Capture behavior
Convert to circuit
-
5.2
4Digital DesignCopyright © 2006Frank Vahid
RTL Design Method
-
5Digital DesignCopyright © 2006Frank Vahid
RTL Design Method: “Preview” Example• Soda dispenser
– c: bit input, 1 when coin deposited
– a: 8-bit input having value of deposited coin
– s: 8-bit input having cost of a soda
– d: bit output, processor sets to 1 when total value of deposited coins equals or exceeds cost of a soda
as
cd
Sodadispenserprocessor
as
cd
Sodadispenserprocessor
25
1 025
1
1
500
0
0
0tot: 25
tot: 50
a
How can we precisely describe this processor’s behavior?
-
6Digital DesignCopyright © 2006Frank Vahid
Preview Example: Step 1 --Capture High-Level State Machine
• Declare local register tot• Init state: Set d=0, tot=0• Wait state: wait for coin
– If see coin, go to Add state• Add state: Update total value:
tot = tot + a– Remember, a is present coin’s
value– Go back to Wait state
• In Wait state, if tot >= s, go to Disp(ense) state
• Disp state: Set d=1 (dispense soda)– Return to Init state
Inputs: c (bit), a (8 bits), s (8 bits)Outputs: d (bit)Local registers: tot (8 bits)
Wait
Add
Disp
Init
d=0tot=0
c’*(tot
-
7Digital DesignCopyright © 2006Frank Vahid
Preview Example: Step 2 -- Create Datapath
ldclr
tot
8-bit<
8-bitadder
8
8
88
s a
Datapath
tot_ldtot_clr
tot_lt_s
Inputs : c (bit), a(8 bits), s (8 bits)Outputs : d (bit)Local reg isters: tot (8 bits)
Wait
Add
Disp
Init
d=0tot=0
c‘ (tot
-
8Digital DesignCopyright © 2006Frank Vahid
Preview Example: Step 3 –Connect Datapath to a Controller
• Controller’s inputs– External input c
(coin detected)– Input from datapath
comparator’s output, which we named tot_lt_s
• Controller’s outputs– External output d
(dispense soda)– Outputs to datapath
to load and clear the tot register
tot_lt_s
tot_clr
tot_ld
Controller Datapath
s
c
d
a8 8
ldclr
tot
8-bit<
8-bitadder
8
8
88
s a
Datapath
tot_ldtot_clr
tot_lt_s
-
9Digital DesignCopyright © 2006Frank Vahid
Preview Example: Step 4 – Derive the Controller’s FSM
• Same states and arcs as high-level state machine
• But set/read datapathcontrol signals for all datapathoperations and conditions
tot_lt_s
tot_clr
tot_ld
Con
trolle
r
Dat
apat
h
s
c
d
a8 8
ldclr tpt
8-bit<
8-bitadder
8
8
88
s a
Datapath
tot_ldtot_clr
tot_lt_s
Inputs::c, tot_lt_s(bit)Outputs:d, tot_ld, tot_clr (bit)
Wait
Disp
Init
d=0tot_clr=1
c’* tot_lt_s’
c’*tot_lt_s
d=1
c
tot_ld=1
c
d
tot_ld
tot_clr
tot_lt_s
Controller
Add
-
10Digital DesignCopyright © 2006Frank Vahid
Preview Example: Completing the Design• Implement the FSM as
a state register and logic– As in Ch3– Table shown on right
d000000000
1
000000001
0
111100000
0
n0111111001
0
n1000010110
0
010101010
0
c001100110
0
s1000000001
1
s0000011110
1
tot_lt_s
tot_ld
tot_clr
Init
Wai
tAd
dDi
sp
Inputs::c, tot_lt_s(bit)Outputs:d, tot_ld, tot_clr (bit)
Wait
Disp
Init
d=0tot_clr=1
c’* tot_lt_s’
c’*tot_lt_s
d=1
c
tot_ld=1
c
d
tot_ld
tot_clr
tot_lt_s
Controller
Add
-
11Digital DesignCopyright © 2006Frank Vahid
Step 1: Create a High-Level State Machine• Let’s consider each step of the
RTL design process in more detail
• Step 1– Soda dispenser example– Not an FSM because:
• Multi-bit (data) inputs a and s• Local register tot• Data operations tot=0, tot
-
12Digital DesignCopyright © 2006Frank Vahid
Step 1 Example: Laser-Based Distance Measurer
Object ofinterest
D
2D = T sec * 3*108 m/secsensor
laser
T (in seconds)
• Example of how to create a high-level state machine to describe desired processor behavior
• Laser-based distance measurement – pulse laser, measure time T to sense reflection– Laser light travels at speed of light, 3*108 m/sec – Distance is thus D = T sec * 3*108 m/sec / 2
-
13Digital DesignCopyright © 2006Frank Vahid
Step 1 Example: Laser-Based Distance Measurer
sensor
laser
T (in seconds)
Laser-baseddistancemeasurer16
from button
to displayS
L
D
Bto laser
from sensor
• Inputs/outputs– B: bit input, from button to begin measurement– L: bit output, activates laser– S: bit input, senses laser reflection– D: 16-bit output, displays computed distance
-
14Digital DesignCopyright © 2006Frank Vahid
Step 1 Example: Laser-Based Distance MeasurerLaser-based
distancemeasurer16
from button
to displayS
L
D
Bto laser
from sensor
Inputs: BOutputs:
, S(1 bit each)L (bit), D (16 bits)
S0 ?
a L = 0 (laser off)D = 0 (distance = 0)
• Step 1: Create high-level state machine• Begin by declaring inputs and outputs• Create initial state, name it S0
– Initialize laser to off (L=0)– Initialize displayed distance to 0 (D=0)
-
15Digital DesignCopyright © 2006Frank Vahid
Step 1 Example: Laser-Based Distance MeasurerLaser-based
distancemeasurer16
from button
to displayS
L
D
Bto laser
from sensor
Inputs: B, S (1 bit each)Outputs: L (bit), D (16 bits)
S0
L = 0D = 0
S1 ?
B’ (button not pressed)
B(buttonpressed)
S0
a
• Add another state, call S1, that waits for a button press– B’ – stay in S1, keep waiting– B – go to a new state S2
Q: What should S2 do? A: Turn on the lasera
-
16Digital DesignCopyright © 2006Frank Vahid
Step 1 Example: Laser-Based Distance Measurer
S0 S1 S2
L = 0D = 0
L = 1(laser on)
S3
L = 0(laser off)
B’
B
Laser-based
distancemeasurer16
from button
to displayS
L
D
Bto laser
from sensor
Inputs: B, S (1 bit each)Outputs: L (bit), D (16 bits)
a
• Add a state S2 that turns on the laser (L=1)• Then turn off laser (L=0) in a state S3
Q: What do next? A: Start timer, wait to sense reflectiona
-
17Digital DesignCopyright © 2006Frank Vahid
Step 1 Example: Laser-Based Distance MeasurerLaser-based
distancemeasurer16
from button
to displayS
L
D
Bto laser
from sensorLocal Registers: Dctr (16 bits)
S0 S1 S2 S3
L = 0D = 0
L = 1 L = 0Dctr = Dctr + 1(count cycles)
Dctr = 0(reset cycle
count)
B’
B
S’ (no reflection)
S (reflection)?
Inputs: B, S (1 bit each) Outputs: L (bit), D (16 bits)
a
• Stay in S3 until sense reflection (S)• To measure time, count cycles for which we are in S3
– To count, declare local register Dctr– Increment Dctr each cycle in S3– Initialize Dctr to 0 in S1. S2 would have been O.K. too
-
18Digital DesignCopyright © 2006Frank Vahid
Step 1 Example: Laser-Based Distance MeasurerLaser-based
distancemeasurer16
from button
to displayS
L
D
Bto laser
from sensor
S0 S1 S2 S3
L = 0D = 0
L = 1 L=0Dctr = Dctr + 1
Dctr = 0
B’ S’
B SD = Dctr / 2
(calculate D)
S4
Local Registers: Dctr (16 bits)Inputs: B, S (1 bit each) Outputs: L (bit), D (16 bits)
a
• Once reflection detected (S), go to new state S4– Calculate distance – Assuming clock frequency is 3x108, Dctr holds number of meters, so
D=Dctr/2
• After S4, go back to S1 to wait for button again
-
19Digital DesignCopyright © 2006Frank Vahid
Step 2: Create a Datapath• Datapath must
– Implement data storage– Implement data computations
• Look at high-level state machine, do three substeps– (a) Make data inputs/outputs be datapath
inputs/outputs– (b) Instantiate declared registers into the
datapath (also instantiate a register for each data output)
– (c) Examine every state and transition, and instantiate datapath components and connections to implement any data computations
Instantiate: to introduce a new component into a design.
-
20Digital DesignCopyright © 2006Frank Vahid
Step 2 Example: Laser-Based Distance Measurer(a) Make data
inputs/outputs be datapathinputs/outputs
(b) Instantiate declared registers into the datapath (also instantiate a register for each data output)
(c) Examine every state and transition, and instantiate datapathcomponents and connections to implement any data computations
DatapathDreg_clr
Dctr_clrDctr_cnt
Dreg_ld
Local Registers: Dctr (16 bits)
S0 S1 S2 S3
L = 0D = 0
L = 1 L=0Dctr = Dctr + 1
Dctr = 0
B‘ S‘
B SD = Dctr / 2
(calculate D)
S4
loadQ
IDreg: 16-bit
registerQ
Dctr: 16-bitup-counter
16
D
clearclearcount
a
Inputs: B, S (1 bit each) Outputs: L (bit), D (16 bits)
-
21Digital DesignCopyright © 2006Frank Vahid
Step 2 Example: Laser-Based Distance Measurer(c) (continued)
Examine every state and transition, and instantiate datapathcomponents and connections to implement any data computations
clearcount
clearload
Q Q
IDctr: 16-bitup-counter
Dreg: 16-bitregister
16
D
Datapath
Dreg_clr
Dctr_clrDctr_cnt
Dreg_ld 16
16
>>1
a
Local Registers: Dctr (16 bits)
S0 S1 S2 S3
L = 0D = 0
L = 1 L=0Dctr = Dctr + 1
Dctr = 0
B‘ S‘
B SD = Dctr / 2
(calculate D)
S4
Inputs: B, S (1 bit each) Outputs: L (bit), D (16 bits)
-
22Digital DesignCopyright © 2006Frank Vahid
Step 2 Example Showing Mux Use
• Introduce mux when one component input can come from more than one source
T0
T1
R = E + F
R = R + G
E,F, G, R (16 bits)Localregisters:
(a)
E F G
A B+
R
add_A_s0add_B_s0
2⋅ 1 2⋅ 1
(d)
××
a
E F G
A B+
R
(b)
E F G
A B+
R
(c)
-
23Digital DesignCopyright © 2006Frank Vahid
Step 3: Connecting the Datapath to a Controller
• Laser-based distance measurer example
• Easy – just connect all control signals between controller and datapath300 MHz Clock
D
B L
S
16to display
from buttonController
to laserfrom sensor
Datapath
Dreg_clr
Dreg_ld
Dctr_clr
Dctr_cnt
clearcount
clearload
Q Q
IDctr: 16-bitup-counter
Dreg: 16-bitregister
16D
Datapath
Dreg_clr
Dctr_clrDctr_cnt
Dreg_ld
-
24Digital DesignCopyright © 2006Frank Vahid
Step 4: Deriving the Controller’s FSM
• FSM has same structure as high-level state machine– Inputs/outputs all
bits now– Replace data
operations by bit operations using datapath
300 MHz Clock
D
B L
S
16to display
from button Controllerto laserfrom sensor
Datapath
Dreg_clr
Dreg_ld
Dctr_clr
Dctr_cnt
Inputs: B, SOutputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt
S0 S1 S2 S3
L = 0 L = 1 L = 0L = 0
B’ S’
B SS4
L = 0
Inputs: B, S (1 bit each) Outputs: L (bit), D (16 bits)Local Registers: Dctr (16 bits)
S0 S1 S2 S3
L = 0D = 0
L = 1 L=0Dctr = Dctr + 1
Dctr = 0
B’ S’
B SD = Dctr / 2
(calculate D)
S4
a
Dreg_clr = 1Dreg_ld = 0Dctr_clr = 0Dctr_cnt = 0(laser off)(clear D reg)
Dreg_clr = 0Dreg_ld = 0Dctr_clr = 1Dctr_cnt = 0(clear count)
Dreg_clr = 0Dreg_ld = 0Dctr_clr = 0Dctr_cnt = 0(laser on)
Dreg_clr = 0Dreg_ld = 0Dctr_clr = 0Dctr_cnt = 1(laser off)(count up)
Dreg_clr = 0Dreg_ld = 1Dctr_clr = 0Dctr_cnt = 0(load D reg with Dctr/2)(stop counting)
-
Step 4: Deriving the Controller’s FSM
25Digital DesignCopyright © 2006Frank Vahid
• Using shorthand of outputs not assigned implicitly assigned 0
S0 S1 S2 S3
L = 0 L = 1 L = 0L = 0
B’ S’
B SS4
L = 0Dreg_clr = 1Dreg_ld = 0Dctr_clr = 0Dctr_cnt = 0(laser off)(clear D reg)
Dreg_clr = 0Dreg_ld = 0Dctr_clr = 1Dctr_cnt = 0(clear count)
Dreg_clr = 0Dreg_ld = 0Dctr_clr = 0Dctr_cnt = 0(laser on)
Dreg_clr = 0Dreg_ld = 0Dctr_clr = 0Dctr_cnt = 1(laser off)(count up)
Dreg_clr = 0Dreg_ld = 1Dctr_clr = 0Dctr_cnt = 0(load D reg with Dctr/2)(stop counting)
S0 S1 S2 S3
L = 0 L = 1 L = 0
B’ S’
B S
(laser on)
S4
Inputs: B, S Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt
Dreg_clr = 1(laser off)(clear D reg)
Dctr_clr = 1(clear count) Dctr_cnt = 1
(laser off)(count up)
Dreg_ld = 1Dctr_cnt = 0(load D reg with Dctr/2)(stop counting)
a
-
26Digital DesignCopyright © 2006Frank Vahid
Step 4
• Implement FSM as state register and logic (Ch3) to complete the design
clearcount
clearload
Q Q
IDctr: 16-bitup-counter Dreg: 16-bitregister
16D
Datapath
Dreg_clr
Dctr_clrDctr_cnt
Dreg_ld
300 MHz Clock
D
B L
S
16to display
from button
Con
trolle
r to laserfrom sensor
Dat
apat
h
Dreg_clr
S0 S1 S2 S3
L = 0 L = 1 L = 0
B’ S’
B S
(laser on)
S4
Inputs: B, S Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt
Dreg_clr = 1(laser off)(clear D reg)
Dctr_clr = 1(clear count) Dctr_cnt = 1
(laser off)(count up)
Dreg_ld = 1Dctr_cnt = 0(load D reg with Dctr/2)(stop counting)
Dreg_ld
Dctr_clr
Dctr_cnt
-
5.3
27Digital DesignCopyright © 2006Frank Vahid
RTL Design Examples and Issues
32
4 A
rdD
Per0 Per1 Per15
Masterprocessor
Faddr
4
ADrd
Bus interface
Main part
Peripheral
Q32
to/from processor bus
32 4
• We’ll use several more examples to illustrate RTL design
• Example: Bus interface– Master processor can read
register from any peripheral• Each register has unique 4-bit
address • Assume 1 register/periph.
– Sets rd=1, A=address– Appropriate peripheral places
register data on 32-bit D lines• Periph’s address provided on
Faddr inputs (maybe from DIP switches, or another register)
-
28Digital DesignCopyright © 2006Frank Vahid
RTL Example: Bus Interface
WaitMyAddress
Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits)Outputs: D (32 bits)Local register: Q1 (32 bits)
rd’ rd
SendData
D = “Z”Q1 = Q
(A = Faddr)and rd
((A = Faddr)and rd’)
D = Q1
• Step 1: Create high-level state machine– State WaitMyAddress
• Output “nothing” (“Z”) on D, store peripheral’s register value Q into local register Q1
• Wait until this peripheral’s address is seen (A=Faddr) and rd=1– State SendData
• Output Q1 onto D, wait for rd=0 (meaning main processor is done reading the D lines)
-
29Digital DesignCopyright © 2006Frank Vahid
RTL Example: Bus Interface
WaitMyAddress
Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits)Outputs: D (32 bits)Local register: Q1 (32 bits)
rd’ rd
SendData
D = “Z”Q1 = Q
(A = Faddr)and rd
((A = Faddr)and rd’)
D = Q1
W W
ZD Z ZQ1 Q1
W W WSD SD SD
clkInputs
StateOutputs
rd
-
30Digital DesignCopyright © 2006Frank Vahid
RTL Example: Bus Interface
WaitMyAddress
Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits)Outputs: D (32 bits)Local register: Q1 (32 bits)
rd’ rd
SendData
D = “Z”Q1 = Q
(A = Faddr)and rd
((A = Faddr)and rd)’
D = Q1
• Step 2: Create a datapath(a) Datapath inputs/outputs(b) Instantiate declared registers(c) Instantiate datapath components and
connections
DatapathBus interface
Q1_ldld Q1
F Qaddr
4 4 32
A
D_en
A_eq_Faddr= (4-bit) 32
32
D
a
-
31Digital DesignCopyright © 2006Frank Vahid
RTL Example: Bus Interface
• Step 3: Connect datapath to controller• Step 4: Derive controller’s FSM
a
WaitMyAddress
Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits)Outputs: D (32 bits)Local register: Q1 (32 bits)
rd’ rd
SendData
D = “Z”Q1 = Q
(A = Faddr)and rd
((A = Faddr)and rd)’
D = Q1rd
Inputs: rd, A_eq_Faddr (bit)Outputs: Q1_ld, D_en (bit)
WaitMyAddress
rd‘ rd
SendData
D_en = 0Q1_ld = 1
D_en = 1Q1_ld = 0
A_eq_Faddrand rd
(A_eq_Faddrand rd)‘
DatapathBus interface
Q1_ldld Q1
Faddr Q
4 4 32
A
D_en
A_eq_Faddr= (4-bit) 32
32
D
-
32Digital DesignCopyright © 2006Frank Vahid
RTL Example: Video Compression – Sum of Absolute Differences
• Video is a series of frames (e.g., 30 per second)• Most frames similar to previous frame
– Compression idea: just send difference from previous frame
Digitizedframe 2
1 Mbyte
Frame 2
Digitizedframe 1
Frame 1
1 Mbyte(a)
Digitizedframe 1
Frame 1
1 Mbyte(b)
Only difference: ball moving
aDifference of
2 from 1
0.01 Mbyte
Frame 2
Just send difference
-
33Digital DesignCopyright © 2006Frank Vahid
RTL Example: Video Compression – Sum of Absolute Differences
Frame 2Frame 1compare Each is a pixel, assume
represented as 1 byte(actually, a color picture might have 3 bytes per pixel, for intensity of red, green, and blue components of pixel)
• Need to quickly determine whether two frames are similar enough to just send difference for second frame– Compare corresponding 16x16 “blocks”
• Treat 16x16 block as 256-byte array– Compute the absolute value of the difference of each array item– Sum those differences – if above a threshold, send complete frame
for second frame; if below, can use difference method (using another technique, not described)
-
34Digital DesignCopyright © 2006Frank Vahid
RTL Example: Video Compression – Sum of Absolute Differences
!(i
-
35Digital DesignCopyright © 2006Frank Vahid
RTL Example: Video Compression – Sum of Absolute Differences
!(i=256)• S3: add difference to sum,
increment index• S4: done, write to output
sad_reg
B
A
go
SAD
sad
Inputs: A, B (256 byte memory); go (bit)Outputs: sad (32 bits)Local registers: sum, sad_reg (32 bits); i (9 bits)
!goS0go
S1 sum = 0i = 0
S3 sum=sum+abs(A[i]-B[i])i=i+1
S4 sad_reg = sum
S2
i
-
36Digital DesignCopyright © 2006Frank Vahid
RTL Example: Video Compression – Sum of Absolute Differences
• Step 2: Create datapath
!(i
-
37Digital DesignCopyright © 2006Frank Vahid
RTL Example: Video Compression – Sum of Absolute Differences
!(i
-
38Digital DesignCopyright © 2006Frank Vahid
RTL Example: Video Compression – Sum of Absolute Differences
• Comparing software and custom circuit SAD – Circuit: Two states (S2 & S3) for
each i, 256 i’s 512 clock cycles– Software: Loop (for i = 1 to 256), but
for each i, must move memory to local registers, subtract, compute absolute value, add to sum, increment i – say about 6 cycles per array item 256*6 = 1536 cycles
– Circuit is about 3 times (300%) faster
– Later, we’ll see how to build SAD circuit that is even faster
!(i
-
39Digital DesignCopyright © 2006Frank Vahid
RTL Design Pitfalls and Good Practice• Common pitfall: Assuming
register is update in the state it’s written– Final value of Q?– Final state?– Answers may surprise you
• Value of Q unknown• Final state is C, not D
– Why?• State A: R=99 and Q=R
happen simultaneously• State B: R not updated with
R+1 until next clock cycle, simultaneously with state register being updated
A B
C
D
R>=100
R
-
40Digital DesignCopyright © 2006Frank Vahid
RTL Design Pitfalls and Good Practice
BA B2
C
D
R>=100
R
-
41Digital DesignCopyright © 2006Frank Vahid
RTL Design Pitfalls and Good Practice• Common pitfall:
Reading outputs– Outputs can only be
written– Solution: Introduce
additional register, which can be written and read
Inputs: A, B (8 bits)Outputs: P (8 bits)
Inputs: A, B (8 bits)Outputs: P (8 bits)Local register: R (8 bits)
TS
P=P+BP=A
S T
R=AP=A
P=R+B
(a) (b)
-
42Digital DesignCopyright © 2006Frank Vahid
RTL Design Pitfalls and Good Practice• Good practice: Register
all data outputs– In fig (a), output P would
show spurious values as addition computes
• Furthermore, longest register-to-register path, which determines clock period, is not known until that output is connected to another component
– In fig (b), spurious outputs reduced, and longest register-to-register path is clear
+
RB
P
+
R
Preg
B
P(b)
(a)
-
43Digital DesignCopyright © 2006Frank Vahid
Control vs. Data Dominated RTL Design• Designs often categorized as control-dominated or data-
dominated– Control-dominated design – Controller contains most of the
complexity– Data-dominated design – Datapath contains most of the complexity– General, descriptive terms – no hard rule that separates the two
types of designs– Laser-based distance measurer – control dominated– Bus interface, SAD circuit – mix of control and data– Now let’s do a data dominated design
-
44Digital DesignCopyright © 2006Frank Vahid
Data Dominated RTL Design Example: FIR Filter• Filter concept
– Suppose X is data from a temperature sensor, and particular input sequence is 180, 180, 181, 240, 180, 181 (one per clock cycle)
– That 240 is probably wrong!• Could be electrical noise
– Filter should remove such noise in its output Y
– Simple filter: Output average of last N values
• Small N: less filtering• Large N: more filtering, but
less sharp output
1212
Y
clk
X
digital filter
-
45Digital DesignCopyright © 2006Frank Vahid
Data Dominated RTL Design Example: FIR Filter• FIR filter
– “Finite Impulse Response”– Simply a configurable weighted
sum of past input values– y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
• Above known as “3 tap”• Tens of taps more common• Very general filter – User sets the
constants (c0, c1, c2) to define specific filter
– RTL design• Step 1: Create high-level state
machine– But there really is none! Data
dominated indeed.• Go straight to step 2
1212
Y
clk
X
digital filter
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
-
46Digital DesignCopyright © 2006Frank Vahid
Data Dominated RTL Design Example: FIR Filter• Step 2: Create datapath
– Begin by creating chain of xt registers to hold past values of X
1212Y
clk
Xdigital filter
xt0 xt1 xt2
12 12 12 12
x(t-2)x(t-1)x(t)3-tap FIR filter
X Y
clk
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
180 180181 180181240
Suppose sequence is: 180, 181, 240
a
-
47Digital DesignCopyright © 2006Frank Vahid
Data Dominated RTL Design Example: FIR Filter• Step 2: Create datapath
(cont.)– Instantiate registers for
c0, c1, c2– Instantiate multipliers to
compute c*x values
1212Y
clk
Xdigital filter
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
xt0 xt1 xt2
x(t-2)x(t-1)x(t)3-tap FIR filter
X
Y
clk
c1c0 c2
∗∗ ∗
a
-
48Digital DesignCopyright © 2006Frank Vahid
Data Dominated RTL Design Example: FIR Filter• Step 2: Create datapath
(cont.)– Instantiate adders
1212Y
clk
Xdigital filter
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
xt0 xt1 xt2
x(t-2)x(t-1)x(t)
3-tap FIR filter
X
Y
clk
c0 c1 c2
∗ ∗ ∗
+ +
a
-
49Digital DesignCopyright © 2006Frank Vahid
Data Dominated RTL Design Example: FIR Filter• Step 2: Create datapath (cont.)
– Add circuitry to allow loading of particular c register
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
a
1212Y
clk
Xdigital filter
xt0 xt1 xt2
x(t-2)x(t-1)x(t)
3-tap FIR filter
X
Y
clk
c0 c1 c2
* *
+
*
+
3210
2x4
yreg
eCa1
CL
C
Ca0
-
50Digital DesignCopyright © 2006Frank Vahid
Data Dominated RTL Design Example: FIR Filter• Step 3 & 4: Connect to controller, Create FSM
– No controller needed– Extreme data-dominated example– (Example of an extreme control-dominated design – an FSM, with no
datapath)• Comparing the FIR circuit to a software implementation
– Circuit• Assume adder has 2-gate delay, multiplier has 20-gate delay• Longest past goes through one multiplier and two adders
– 20 + 2 + 2 = 24-gate delay• 100-tap filter, following design on previous slide, would have about a 34-gate
delay: 1 multiplier and 7 adders on longest path– Software
• 100-tap filter: 100 multiplications, 100 additions. Say 2 instructions per multiplication, 2 per addition. Say 10-gate delay per instruction.
• (100*2 + 100*2)*10 = 4000 gate delays– Circuit is more than 100 times faster (10,000% faster). Wow.
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
-
5.4
51Digital DesignCopyright © 2006Frank Vahid
Determining Clock Frequency• Designers of digital circuits
often want fastest performance– Means want high clock
frequency• Frequency limited by longest
register-to-register delay– Known as critical path– If clock is any faster, incorrect
data may be stored into register– Longest path on right is 2 ns
• Ignoring wire delays, and register setup and hold times, for simplicity
a
+
b
c
2 nsdelay
clk
-
52Digital DesignCopyright © 2006Frank Vahid
Critical Path• Example shows four paths
– a to c through +: 2 ns– a to d through + and *: 7 ns– b to d through + and *: 7 ns– b to d through *: 5 ns
• Longest path is thus 7 ns• Fastest frequency
– 1 / 7 ns = 142 MHz
+ *
c d
7 ns7 ns
5 nsdelay
2 nsdelay
Max(2,7,7,5)= 7 ns
a b
5 ns
7 ns
7 ns2 ns
-
53Digital DesignCopyright © 2006Frank Vahid
Critical Path Considering Wire Delays• Real wires have delay too
– Must include in critical path• Example shows two paths
– Each is 0.5 + 2 + 0.5 = 3 ns• Trend
– 1980s/1990s: Wire delays were tiny compared to logic delays
– But wire delays not shrinking as fast as logic delays
• Wire delays may even be greater than logic delays!
• Must also consider register setup and hold times, also add to path
• Then add some time to the computed path, just to be safe– e.g., if path is 3 ns, say 4 ns instead
a
+
b
c
2 ns
3 ns
3 ns
0.5 ns0.5 ns
0.5 ns
clk
3 ns
-
54Digital DesignCopyright © 2006Frank Vahid
A Circuit May Have Numerous Paths• Paths can exist
– In the datapath– In the controller– Between the
controller and datapath
– May be hundreds or thousands of paths
• Timing analysis tools that evaluate all possible paths automatically very helpful
Combinational logic
c
tot_lt_s
clk
n1
d
tot_ld
tot_lt_s
tot_clr
s0s1
n0
State register
s
8 8
8
8
a
ld
clrtot
Datapath
8-bit<
8-bitadder
(c)
(b) (a)
-
5.5
55Digital DesignCopyright © 2006Frank Vahid
Behavioral Level Design: C to Gates
!goS0go
S1 sum = 0i = 0
S3 sum=sum+abs(A[i]-B[i])i=i+1
S4 sad_reg = sum
S2
i
-
56Digital DesignCopyright © 2006Frank Vahid
Behavioral-Level Design: Start with C (or Similar Language)
• Replace first step of RTL design method by two steps– Capture in C, then convert C to high-level state machine– How convert from C to high-level state machine?
Step 1A: Capture in C
Step 1B: Convert to high-level state machinea
-
57Digital DesignCopyright © 2006Frank Vahid
Converting from C to High-Level State Machine• Convert each C construct to
equivalent states and transitions
• Assignment statement– Becomes one state with
assignment
• If-then statement– Becomes state with condition
check, transitioning to “then”statements if condition true, otherwise to ending state
• “then” statements would also be converted to states
target=expression
atarget = expression;
(then stmts)
!cond
cond
(end)
if (cond) {// then stmts
}
a
-
58Digital DesignCopyright © 2006Frank Vahid
Converting from C to High-Level State Machine• If-then-else
– Becomes state with condition check, transitioning to “then”statements if condition true, or to “else” statements if condition false
• While loop statement– Becomes state with condition
check, transitioning to while loop’s statements if true, then transitioning back to condition check
!cond
cond
(end)
(then stmts) (else stmts)
if (cond) {// then stmts
}else {
// else stmts}
a
!cond
cond(while stmts)
(end)
while (cond) {// while stmts
}a
-
59Digital DesignCopyright © 2006Frank Vahid
Simple Example of Converting from C to High-Level State Machine
• Simple example: Computing the maximum of two numbers– Convert if-then-else statement to states (b)– Then convert assignment statements to states (c)
(end)
(c)
X>Y
!(X>Y)
(end)
(then stmts) (else stmts)
(b)
X>Y
!(X>Y)
Max=X Max=Y
(a)
Inputs: uint X, YOutputs: uint Max
if (X > Y) {
}else {
}
Max = X;
Max = Y;
a a
-
60Digital DesignCopyright © 2006Frank Vahid
Example: Converting Sum-of-Absolute-Differences C code to High-Level State Machine
• Convert each construct to states– Simplify when possible,
e.g., merge states• From high-level state
machine, follow RTL design method to create circuit
• Thus, can convert C to gates using straightforward automatable process– Not all C constructs can be
efficiently converted– Use C subset if intended
for circuit– Can use languages other
than C, of course
sum = sum + abs(A[i] - B[i]);
(a)
Inputs: byte A[256, B[256]bit go;
Output: int sadmain(){
uint sum; short uint I;while (1) {
sum = 0;i = 0;
while (!go);
while (i < 256) {
i = i + 1;}sad = sum;}
}
(d)
!go go
sum=0i=0
(g)
!go go
sum=0i=0
!(i
-
5.6
61Digital DesignCopyright © 2006Frank Vahid
Memory Components• Register-transfer level
design instantiates datapathcomponents to create datapath, controlled by a controller– A few more components are
often used outside the controller and datapath
• MxN memory– M words, N bits wide each
• Several varieties of memory, which we now introduce
N-bitswide each
M×N memory
M w
ords
-
62Digital DesignCopyright © 2006Frank Vahid
Random Access Memory (RAM)• RAM – Readable and writable memory
– “Random access memory”• Strange name – Created several decades ago to
contrast with sequentially-accessed storage like tape drives
– Logically same as register file – Memory with address inputs, data inputs/outputs, and control
• RAM usually just one port; register file usually two or more
– RAM vs. register file• RAM typically larger than roughly 512 or 1024
words• RAM typically stores bits using a bit storage
approach that is more efficient than a flip flop• RAM typically implemented on a chip in a square
rather than rectangular shape – keeps longest wires (hence delay) short
32
4
32
4W_data
W_addr
W_en
R_data
R_addr
R_en16×32
register file
Register file from Chpt. 4
32
10data
addr
rw
en
1024× 32RAM
RAM block symbol
-
63Digital DesignCopyright © 2006Frank Vahid
RAM Internal Structure32
10data
addr
rw
en
1024x32RAM
addr0addr1
addr(A-1)
clkenrw
addr
Let A = log2M
to all cells
wdata(N-1)
rdata(N-1)
wdata(N-2)
rdata(N-2)
wdata0
rdata0
bit storageblock(aka “cell”)
word
word
RAM cell
wordenable
wordenable
rw
data cell
data
a0a1
d0
d1
d(M-1)
a(A-1)
e
AxMdecoder
enable
• Similar internal structure as register file– Decoder enables appropriate word based on address
inputs– rw controls whether cell is written or read– Let’s see what’s inside each RAM cell
-
64Digital DesignCopyright © 2006Frank Vahid
Static RAM (SRAM)
• “Static” RAM cell– 6 transistors (recall inverter is 2 transistors)– Writing this cell
• word enable input comes from decoder• When 0, value d loops around inverters
– That loop is where a bit stays stored• When 1, the data bit value enters the loop
– data is the bit to be stored in this cell– data’ enters on other side– Example shows a “1” being written into cell
addr0addr1
addr(A-1)
clkenrw
add
r
Let A = log2 M
a0a1
d0
d1
d(M-1)
a(A-1)
e
A ⋅ Mdecoder
wordenable
to all cells
wdata(N-1)
rdata(N-1)
wdata(N-2)
rdata(N-2)
wdata0
rdata0
bit storageblock(aka cell )
word
,,,,
cell
wordenable
wordenable
rw
data
data
SRAM celldata data’
d’dcell
0wordenable
1
1
1
0
0
32
10data
addr
rw
en
1024x32RAM
SRAM celldata data’
d
wordenable
data data’
d’dcell
0wordenable
1 0 a
a
a
-
65Digital DesignCopyright © 2006Frank Vahid
Static RAM (SRAM)
• “Static” RAM cell– Reading this cell
• Somewhat trickier• When rw set to read, the RAM logic sets
both data and data’ to 1• The stored bit d will pull either the left line or
the right bit down slightly below 1• “Sense amplifiers” detect which side is
slightly pulled down– The electrical description of SRAM is really
beyond our scope – just general idea here, mainly to contrast with DRAM...
addr0addr1
addr(A-1)
clkenrw
add
r
Let A = log2 M
a0a1
d0
d1
d(M-1)
a(A-1)
e
A ⋅ Mdecoder
wordenable
to all cells
wdata(N-1)
rdata(N-1)
wdata(N-2)
rdata(N-2)
wdata0
rdata0
bit storageblock(aka cell )
word
,,,,
cell
wordenable
wordenable
rw
data
data
SRAM cell
32
10data
addr
rw
en
1024x32RAM
data data’
d
1
1 1
wordenable
To sense amplifiers
1 0
1
-
66Digital DesignCopyright © 2006Frank Vahid
Dynamic RAM (DRAM)
• “Dynamic” RAM cell– 1 transistor (rather than 6)– Relies on large capacitor to store bit
• Write: Transistor conducts, data voltage level gets stored on top plate of capacitor
• Read: Just look at value of d• Problem: Capacitor discharges over time
– Must “refresh” regularly, by reading d and then writing it right back
addr0addr1
addr(A-1)
clkenrw
add
r
Let A = log2 M
a0a1
d0
d1
d(M-1)
a(A-1)
e
A ⋅ Mdecoder
wordenable
to all cells
wdata(N-1)
rdata(N-1)
wdata(N-2)
rdata(N-2)
wdata0
rdata0
bit storageblock(aka cell )
word
,,,,
cell
wordenable
wordenable
rw
data
data DRAM cell
32
10data
addr
rw
en
1024x32RAM
wordenable
data
cell
(a)
(b)
data
enable
d discharges
dcapacitorslowlydischarging
-
67Digital DesignCopyright © 2006Frank Vahid
Comparing Memory Types• Register file
– Fastest– But biggest size
• SRAM– Fast– More compact than register file
• DRAM– Slowest
• And refreshing takes time– But very compact
• Use register file for small items, SRAM for large items, and DRAM for huge items– Note: DRAM’s big capacitor requires
a special chip design process, so DRAM is often a separate chip
MxN Memoryimplemented as a:
registerfile
SRAM
DRAM
Size comparison for samenumber of bits (not to scale)
-
68Digital DesignCopyright © 2006Frank Vahid
Reading and Writing a RAMclk
addr
data
rw
en
1 2
9 913
999 Z 500500
• Writing– Put address on addr lines, data on data lines, set rw=1, en=1
• Reading– Set addr and en lines, but put nothing (Z) on data lines, set rw=0– Data will appear on data lines
• Don’t forget to obey setup and hold times– In short – keep inputs stable before and after a clock edge
valid
valid
Z 500
accesstime
setuptime
holdtime
setuptime
clk
addr
data
rw
3
1 means write
RAM[9]now equals 500
RAM[13]now equals 999
(b)
-
69Digital DesignCopyright © 2006Frank Vahid
RAM Example: Digital Sound Recorder
• Behavior– Record: Digitize sound, store as series of 4096 12-bit digital values in RAM
• We’ll use a 4096x16 RAM (12-bit wide RAM not common)– Play back later– Common behavior in telephone answering machine, toys, voice recorders
• To record, processor should read a-to-d, store read values into successive RAM words– To play, processor should read successive RAM words and enable d-to-a
wire
speaker
microphone
wireanalog-to-
digitalconverter
digital-to-analog
converterad_ld da_ld
Rrw RenRa12
16
processor
ad_buf
data
addr
rw en
4096⋅ 16RAM
-
70Digital DesignCopyright © 2006Frank Vahid
RAM Example: Digital Sound Recorder
ad_ld=1ad_buf=1Ra=aRrw=1Ren=1
S
a=0
a=a+1
a=4095
a
-
71Digital DesignCopyright © 2006Frank Vahid
RAM Example: Digital Sound Recorder– Now create play behavior– Use local register a again,
create state machine that counts from 0 to 4095 again
• For each a– Read RAM– Write to digital-to-analog conv.
• Note: Must write d-to-a one cycle after reading RAM, when the read data is available on the data bus
– The record and play state machines would be parts of a larger state machine controlled by signals that determine when to record or play
a
da_ld=1
ad_buf=0Ra=aRrw=0Ren=1
V
a=0
a=a+1
a=4095
a
-
72Digital DesignCopyright © 2006Frank Vahid
Read-Only Memory – ROM• Memory that can only be read from, not
written to– Data lines are output only– No need for rw input
• Advantages over RAM– Compact: May be smaller– Nonvolatile: Saves bits even if power supply
is turned off– Speed: May be faster (especially than
DRAM)– Low power: Doesn’t need power supply to
save bits, so can extend battery life• Choose ROM over RAM if stored data won’t
change (or won’t change often)– For example, a table of Celsius to Fahrenheit
conversions in a digital thermometer
32
10data
addr
rw
en
1024× 32RAM
RAM block symbol
32
10data
addr
en
1024x32ROM
ROM block symbol
-
73Digital DesignCopyright © 2006Frank Vahid
Read-Only Memory – ROM32
10data
addr
en
1024x32ROM
ROM block symbol
ROM cell
addr0addr1
addr(A-1)
clken
addr
Let A = log2M
a0a1
d0
d1
d(M-1)
a(A-1)
e
AxMdecoder
wordenable
rdata(N-1) rdata(N-2) rdata0
bit storageblock(aka “cell”)
word
wordenable
wordenable
data
data
• Internal logical structure similar to RAM, without the data input lines
-
74Digital DesignCopyright © 2006Frank Vahid
ROM Types• If a ROM can only be read, how
are the stored bits stored in the first place?– Storing bits in a ROM known as
programming– Several methods
• Mask-programmed ROM– Bits are hardwired as 0s or 1s
during chip manufacturing• 2-bit word on right stores “10”• word enable (from decoder) simply
passes the hardwired value through transistor
– Notice how compact, and fast, this memory would be
cell cell
wordenable
data line data line01
addr0addr1
addr(A-1)
en
addr
Let A = log2 M
a0a1
d0
d1
d(M-1)
a(A-1)
e
A ⋅ Mdecoder
wordenable
data(N-1) data(N-2) data0
bit storageblock(a cell )
word
,,,,
cellword
enableword
enable
data
data
-
75Digital DesignCopyright © 2006Frank Vahid
ROM Types• Fuse-Based Programmable
ROM– Each cell has a fuse– A special device, known as a
programmer, blows certain fuses (using higher-than-normal voltage)
• Those cells will be read as 0s (involving some special electronics)
• Cells with unblown fuses will be read as 1s
• 2-bit word on right stores “10”– Also known as One-Time
Programmable (OTP) ROM
cell cell
wordenable
data line data line11
blown fusefuse
addr0addr1
addr(A-1)
en
addr
Let A = log2 M
a0a1
d0
d1
d(M-1)
a(A-1)
e
A ⋅ Mdecoder
wordenable
data(N-1) data(N-2) data0
bit storageblock(a cell )
word
,,,,
cellword
enableword
enable
data
data
a
-
76Digital DesignCopyright © 2006Frank Vahid
ROM Types• Erasable Programmable ROM
(EPROM)– Uses “floating-gate transistor” in each cell– Special programmer device uses higher-
than-normal voltage to cause electrons to tunnel into the gate
• Electrons become trapped in the gate• Only done for cells that should store 0• Other cells (without electrons trapped in
gate) will be 1– 2-bit word on right stores “10”
• Details beyond our scope – just general idea is necessary here
– To erase, shine ultraviolet light onto chip• Gives trapped electrons energy to escape• Requires chip package to have window
addr0addr1
addr(A-1)
en
addr
Let A = log2 M
a0a1
d0
d1
d(M-1)
a(A-1)
e
A ⋅ Mdecoder
wordenable
data(N-1) data(N-2) data0
bit storageblock(a cell )
word
,,,,
cellword
enableword
enable
data
data
cell cell
wordenable
data line data line
eÐeÐa
ting
g
a
t
e t
r
t
or
trapped electrons
01
float
ing-
gate
tra
nsis
tor
-
77Digital DesignCopyright © 2006Frank Vahid
ROM Types• Electronically-Erasable Programmable ROM
(EEPROM)– Similar to EPROM
• Uses floating-gate transistor, electronic programming to trap electrons in certain cells
– But erasing done electronically, not using UV light– Erasing done one word at a time
• Flash memory– Like EEPROM, but all words (or large blocks of
words) can be erased simultaneously– Become common relatively recently (late 1990s)
• Both types are in-system programmable– Can be programmed with new stored bits while in the
system in which the ROM operates• Requires bi-directional data lines, and write control input• Also need busy output to indicate that erasing is in
progress – erasing takes some time
a
ting
g
a
t
e t
r
t
or
32
10data
addr
en
write
busy
1024x32EEPROM
-
78Digital DesignCopyright © 2006Frank Vahid
ROM Example: Talking Doll4096x16 ROM
processor
d
a
Ra
16
Ren
da_ld
digital-to-analog
converter
v
speaker
vibrationsensor
“Hello there!”
“Hello there!” audio divided into 4096 samples, storedin ROM
“Hello there!”
a
• Doll plays prerecorded message, trigger by vibration– Message must be stored without power supply Use a ROM, not a RAM,
because ROM is nonvolatile• And because message will never change, use a mask-programmed ROM or
OTP ROM– Processor should wait for vibration (v=1), then read words 0 to 4095 from
the ROM, writing each to the d-to-a
-
79Digital DesignCopyright © 2006Frank Vahid
ROM Example: Talking Doll
d
a
4096x16 ROM
processor
Ra
16
Ren
da_ld
digital-to-analog
converter
v
Sa=0
da_ld=1a=a+1a=4095
a
-
80Digital DesignCopyright © 2006Frank Vahid
ROM Example: Digital Telephone Answering Machine Using a Flash Memory
• Want to record the outgoing announcement
– When rec=1, record digitized sound in locations 0 to 4095
– When play=1, play those stored sounds to digital-to-analog converter
• What type of memory?– Should store without power
supply – ROM, not RAM– Should be in-system
programmable – EEPROM or Flash, not EPROM, OTP ROM, or mask-programmed ROM
– Will always erase entire memory when reprogramming – Flash better than EEPROM
analog-to-digital
converterdigital-to-
analogconverterad_ld
da_ld
Rrw Rener buRa12
16
processor
ad_buf
busy
4096x16 Flash
recplayrecord
microphone speaker
“We’re not home.”
-
81Digital DesignCopyright © 2006Frank Vahid
ROM Example: Digital Telephone Answering Machine Using a Flash Memory
• High-level state machine– Once rec=1, begin
erasing flash by setting er=1
– Wait for flash to finish erasing by waiting for bu=0
– Execute loop that sets local register a from 0 to 4095, reading analog-to-digital converter and writing to flash for each a
en
analog-to-digital
converterdigital-to-
analogconverterad_ld
da_ld
Rrw Ren er buRa12
16
processor
ad_buf
4096x16 Flash
recplayrecord
microphone speaker
a
w
d r
Ter=0
bu
bu’
er=1rec
S
Local register: a (13 bits)
a=4096
a
-
82Digital DesignCopyright © 2006Frank Vahid
Blurring of Distinction Between ROM and RAM• We said that
– RAM is readable and writable– ROM is read-only
• But some ROMs act almost like RAMs– EEPROM and Flash are in-system programmable
• Essentially means that writes are slow– Also, number of writes may be limited (perhaps a few million times)
• And, some RAMs act almost like ROMs– Non-volatile RAMs: Can save their data without the power supply
• One type: Built-in battery, may work for up to 10 years• Another type: Includes ROM backup for RAM – controller writes RAM contents to
ROM before turning off• New memory technologies evolving that merge RAM and ROM benefits
– e.g., MRAM• Bottom line
– Lot of choices available to designer, must find best fit with design goals
EEPROMROM Flash
NVRAM
RAMa
-
5.7
83Digital DesignCopyright © 2006Frank Vahid
Queues• A queue is another component
sometimes used during RTL design
• Queue: A list written to at the back, from read from the front– Like a list of waiting restaurant
customers• Writing called a push, reading
called a pop• Because first item written into a
queue will be the first item read out, also called a FIFO (first-in-first-out)
frontback
read (andremove) itemsfrom front ofthe queue
write itemsto the backof the queue
-
84Digital DesignCopyright © 2006Frank Vahid
Queues
r f
01234567
fr
0
A
1234567
A
fr
0
AB
1234567
B
fr
0
B
1234567
A
• Queue has addresses, and two pointers: rear and front– Initially both point to 0
• Push (write)– Item written to address pointed to
by rear– rear incremented
• Pop (read)– Item read from address pointed
to by front– front incremented
• If front or rear reaches 7, next (incremented) value should be 0 (for a queue with addresses 0 to 7)
a
a
a
-
85Digital DesignCopyright © 2006Frank Vahid
Queues• Treat memory as a circle
– If front or rear reaches 7, next (incremented) value should be 0 rather than 8 (for a queue with addresses 0 to 7)
• Two conditions of interest– Full queue – no room for more items
• In 8-entry queue, means 8 items present• No further pushes allowed until a pop occurs• Causes front=rear
– Empty queue – no items• No pops allowed until a push occurs• Causes front=rear
– Both conditions have front=rear• To detect whether front=rear means full or
empty, need state machine that detects if previous operation was push or pop, sets full or empty output signal (respectively)
fr
0
B
1234567
A
B
1 7
2 6
3 5
4
0
frr
a
-
86Digital DesignCopyright © 2006Frank Vahid
Queue Implementation8⋅ 16 register file
clr
3-bitup counter
3-bitup counter
incclr
inc
rear front
=
wr
rd
reset
wdata rdata16 16
33
wdata
waddrwr
rdata
raddr
rd
eq
Cont
rolle
r
full
empty8-word 16-bit queue
• Can use register file for item storage
• Implement rear and frontusing up counters– rear used as register file’s
write address, front as read address
• Simple controller would set control lines for pushes and pops, and also detect full and empty situations– FSM for controller not
shown
-
87Digital DesignCopyright © 2006Frank Vahid
Common Uses of a Queue• Computer keyboard
– Pushes pressed keys onto queue, meanwhile pops and sends to computer
• Digital video recorder– Pushes captured frames, meanwhile pops frames, compresses
them, and stores them
• Computer network routers– Pushes incoming packets onto queue, meanwhile pops packets,
processes destination information, and forwards each packet out over appropriate port
-
88Digital DesignCopyright © 2006Frank Vahid
Queue Usage Example
r f
01234567
fr
0123456
9585723
7
fr
01234567
f r
01234567
9585723
95857236
r f
01234567
data:9
full35857236
ERROR! Pushing a full queueresults in unknown state
Initially emptyqueue
1. After pushing9, 5, 8, 5, 7, 2, 3
2. After popping
3. After pushing 6
4. After pushing 3
5. After pushing 4
• Example series of pushes and pops– Note how rear and front
pointers move– Note that popping doesn’t
really remove the data from the queue, but that data is no longer accessible
– Note how rear (and front) wraps around from address 7 to 0
• Note: pushing a full queue is an error– As is popping an empty queue
-
5.8
89Digital DesignCopyright © 2006Frank Vahid
Hierarchy – A Key Design Concept
• Hierarchy– An organization with a few items at the
top, with each item decomposed into other items
– Common example: A country• 1 item at the top (the country)• Country item decomposed into
state/province items• Each state/province item decomposed into
city items• Hierarchy helps us manage complexity
– To go from transistors to gates, muxes, decoders, registers, ALUs, controllers, datapaths, memories, queues, etc.
– Imagine trying to comprehend a controller and datapath at the level of gates
P
r
o
vin
c
e 3
P
r
o
vin
c
e 2
P
r
o
vin
c
e 1
CityF
Country A
vinvin
vin
P
r
o
c
P
r
o
c
e 2
P
r
o
c
e 1
Province 1
Province 2
Province 3
Province 1
Province 2
Province 3
Map showing all levels of hierarchy
Map showing just top two levels of hierarchy
CityGCityE
CityDCityA
CityB
CityC
Country A
-
90Digital DesignCopyright © 2006Frank Vahid
Hierarchy and Abstraction
• Abstraction– Hierarchy often involves not just grouping
items into a new item, but also associating higher-level behavior with the new item, known as abstraction
• e.g., an 8-bit adder has an understandable high-level behavior – it adds two 8-bit binary numbers
– Frees designer from having to remember, or even from having to understand, the lower-level details P
r
o
vin
c
e 3
P
r
o
vin
c
e 2
P
r
o
vin
c
e 1
vin
P
r
o
c
e 1
vin
P
r
o
P
r
o
a7.. a0 b7.. b0
s7.. s0co
ci8-bit adder
vin
c c
e 2 e 3
-
91Digital DesignCopyright © 2006Frank Vahid
Hierarchy and Composing Larger Components from Smaller Versions
• A common task is to compose smaller components into a larger one
– Gates: Suppose you have plenty of 3-input AND gates, but need a 9-input AND gate
• Can simple compose the 9-input gate from several 3-input gates
– Muxes: Suppose you have 4x1 and 2x1 muxes, but need an 8x1 mux
• s2 selects either top or bottom 4x1• s1s0 select particular 4x1 input• Implements 8x1 mux – 8 data inputs, 3 selects, one output
vin
P
r
o
c
e 3
P
r
o
c
e 2
4⋅ 1
2⋅ 1
d
d
i0i1
i1
i0
i2i3
i0i1i2i3
i4i5i6i7
s1 s0
s04⋅ 1
d
i0i1i2i3
s1 s0
s1 s0 s2
a
P
P
r
o
vin
c
e 3
P
r
o
vin
c
e 2
P
r
o
vin
c
e 1
r
o
vin
vin
c
e 1
-
92Digital DesignCopyright © 2006Frank Vahid
Hierarchy and Composing Larger Components from Smaller Versions
• Composing memory very common• Making memory words wider
– Easy – just place memories side-by-side until desired width obtained– Share address/control lines, concatenate data lines– Example: Compose 1024x8 ROMs into 1024x32 ROM
P
r
o
vin
c
e 3
P
r
o
vin
c
e 2
P
r
o
vin
c
e 1
vin
P
r
o
c
e 3
P
r
o
P
r
o
1024x32ROM
1024x8ROM
data
addr
endata
8 8
32
8 8
10
10
en
en
addr
addr
data(31..0)
1024x8ROM
addr
endata
1024x8ROM
addr
endata
1024x8ROM
addr
endata
-
93Digital DesignCopyright © 2006Frank Vahid
Hierarchy and Composing Larger Components from Smaller Versions
• Creating memory with more words – Put memories on top of one another until the
number of desired words is achieved– Use decoder to select among the memories
• Can use highest order address input(s) as decoder input
• Although actually, any address line could be used
– Example: Compose 1024x8 memories into 2048x8 memory
P
r
o
vin
c
e 3
P
r
o
vin
c
e 2
P
r
o
vin
c
e 1
vinvin
P
r
o
c
e 3
P
r
o
c
e 2
P
1024x8ROM
addr
en data
1024x8ROM
addr
en data
0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 10 0 0 0 0 0 0 0 0 1 0
0 1 1 1 1 1 1 1 1 1 00 1 1 1 1 1 1 1 1 1 1
1 0 0 0 0 0 0 0 0 0 01 0 0 0 0 0 0 0 0 0 11 0 0 0 0 0 0 0 0 1 0
1 1 1 1 1 1 1 1 1 1 01 1 1 1 1 1 1 1 1 1 1
a0a10a9a8
a10 just chooses which memory to access
a
2048x8ROM
data
8
11
11
en
addr
1024x8ROM
addr
en data
8
1024x8ROM
addr
en data
8
a9..a0
a10 d0
d1
en
addr
1x2dcdi0e
a
To create memory with more words and wider words, can first compose to enough words, then widen.
-
94Digital DesignCopyright © 2006Frank Vahid
Chapter Summary– Modern digital design involves creating processor-level components– Four-step RTL method can be used
• 1. High-level state machine 2. Create datapath 3. Connect datapathto controller 4. Derive controller FSM
– Several example• Control dominated, data dominated, and mix
– Determining fastest clock frequency• By finding critical path
– Behavioral-level design – C to gates• By using method to convert C (subset) to high-level state machine
– Additional RTL components• Memory: RAM, ROM• Queues
– Hierarchy: A key concept used throughout Chapters 2-5
Digital DesignIntroductionRTL Design: Capture Behavior, Convert to CircuitRTL Design MethodRTL Design Method: “Preview” ExamplePreview Example: Step 1 --Capture High-Level State MachinePreview Example: Step 2 -- Create DatapathPreview Example: Step 3 – Connect Datapath to a ControllerPreview Example: Step 4 – Derive the Controller’s FSMPreview Example: Completing the DesignStep 1: Create a High-Level State MachineStep 1 Example: Laser-Based Distance MeasurerStep 1 Example: Laser-Based Distance MeasurerStep 1 Example: Laser-Based Distance MeasurerStep 1 Example: Laser-Based Distance MeasurerStep 1 Example: Laser-Based Distance MeasurerStep 1 Example: Laser-Based Distance MeasurerStep 1 Example: Laser-Based Distance MeasurerStep 2: Create a DatapathStep 2 Example: Laser-Based Distance MeasurerStep 2 Example: Laser-Based Distance MeasurerStep 2 Example Showing Mux UseStep 3: Connecting the Datapath to a ControllerStep 4: Deriving the Controller’s FSMStep 4: Deriving the Controller’s FSMStep 4RTL Design Examples and IssuesRTL Example: Bus InterfaceRTL Example: Bus InterfaceRTL Example: Bus InterfaceRTL Example: Bus InterfaceRTL Example: Video Compression – Sum of Absolute DifferencesRTL Example: Video Compression – Sum of Absolute DifferencesRTL Example: Video Compression – Sum of Absolute DifferencesRTL Example: Video Compression – Sum of Absolute DifferencesRTL Example: Video Compression – Sum of Absolute DifferencesRTL Example: Video Compression – Sum of Absolute DifferencesRTL Example: Video Compression – Sum of Absolute DifferencesRTL Design Pitfalls and Good PracticeRTL Design Pitfalls and Good PracticeRTL Design Pitfalls and Good PracticeRTL Design Pitfalls and Good PracticeControl vs. Data Dominated RTL DesignData Dominated RTL Design Example: FIR FilterData Dominated RTL Design Example: FIR FilterData Dominated RTL Design Example: FIR FilterData Dominated RTL Design Example: FIR FilterData Dominated RTL Design Example: FIR FilterData Dominated RTL Design Example: FIR FilterData Dominated RTL Design Example: FIR FilterDetermining Clock FrequencyCritical PathCritical Path Considering Wire DelaysA Circuit May Have Numerous PathsBehavioral Level Design: C to GatesBehavioral-Level Design: Start with C (or Similar Language)Converting from C to High-Level State MachineConverting from C to High-Level State MachineSimple Example of Converting from C to High-Level State MachineExample: Converting Sum-of-Absolute-Differences C code to High-Level State MachineMemory ComponentsRandom Access Memory (RAM)RAM Internal StructureStatic RAM (SRAM)Static RAM (SRAM)Dynamic RAM (DRAM)Comparing Memory TypesReading and Writing a RAMRAM Example: Digital Sound RecorderRAM Example: Digital Sound RecorderRAM Example: Digital Sound RecorderRead-Only Memory – ROMRead-Only Memory – ROMROM TypesROM TypesROM TypesROM TypesROM Example: Talking DollROM Example: Talking DollROM Example: Digital Telephone Answering Machine Using a Flash MemoryROM Example: Digital Telephone Answering Machine Using a Flash MemoryBlurring of Distinction Between ROM and RAMQueuesQueuesQueuesQueue ImplementationCommon Uses of a QueueQueue Usage ExampleHierarchy – A Key Design ConceptHierarchy and AbstractionHierarchy and Composing Larger Components from Smaller VersionsHierarchy and Composing Larger Components from Smaller VersionsHierarchy and Composing Larger Components from Smaller VersionsChapter Summary