VHDL Coding Exercise 4: FIR Filter
Where to start?
[Figure: design flow from Algorithm to Architecture to RTL block diagram to VHDL code, annotated with design space exploration, feedback, and optimization]
Algorithm
• High-level system diagram
  - Context of the design
  - Inputs and outputs
  - Throughput/rates
  - Algorithmic requirements
• Algorithm description
  - Mathematical description
  - Performance criteria: accuracy
  - Optimization constraints
  - Implementation constraints: area, speed

$$y(k) = \sum_{i=0}^{N} b_i \, x(k-i)$$

[Figure: FIR block with input x(k) and output y(k)]
Architecture (1)
• Isomorphic architecture: straightforward implementation of the algorithm
[Figure: FIR block diagram with input x(k), coefficients b_0, b_1, b_2, ..., b_{N-2}, b_{N-1}, b_N, and output y(k)]
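The isomorphic architecture maps each term of the sum to its own multiplier and adder. A minimal VHDL sketch of that idea follows; the entity name, port names, generic values, and the all-ones coefficient set are assumptions made for illustration and are not taken from the slides.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fir_direct is
  generic (
    N     : positive := 3;  -- filter order, N+1 taps (assumed value)
    WIDTH : positive := 8;  -- sample and coefficient width (assumed value)
    GUARD : natural  := 4   -- extra sum bits, enough for up to 16 taps
  );
  port (
    CLKxCI  : in  std_logic;
    RSTxRBI : in  std_logic;                         -- async reset, active low
    XxDI    : in  signed(WIDTH-1 downto 0);          -- x(k)
    YxDO    : out signed(2*WIDTH+GUARD-1 downto 0)   -- y(k)
  );
end entity fir_direct;

architecture rtl of fir_direct is
  type delay_array is array (1 to N) of signed(WIDTH-1 downto 0);
  type coeff_array is array (0 to N) of integer;
  -- Placeholder coefficients b_0 .. b_N (all ones gives a moving sum).
  constant B : coeff_array := (others => 1);
  signal DelayxDN, DelayxDP : delay_array;  -- x(k-1) .. x(k-N)
begin

  -- Next state of the delay line: shift by one sample.
  p_shift : process (XxDI, DelayxDP)
  begin
    DelayxDN(1) <= XxDI;
    for i in 2 to N loop
      DelayxDN(i) <= DelayxDP(i-1);
    end loop;
  end process p_shift;

  -- One multiplier per tap and a chain of adders, all combinational.
  p_sum : process (XxDI, DelayxDP)
    variable acc : signed(2*WIDTH+GUARD-1 downto 0);
  begin
    acc := resize(XxDI * to_signed(B(0), WIDTH), acc'length);
    for i in 1 to N loop
      acc := acc + resize(DelayxDP(i) * to_signed(B(i), WIDTH), acc'length);
    end loop;
    YxDO <= acc;
  end process p_sum;

  -- Delay-line flip-flops.
  p_regs : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      DelayxDP <= (others => (others => '0'));
    elsif rising_edge(CLKxCI) then
      DelayxDP <= DelayxDN;
    end if;
  end process p_regs;

end architecture rtl;
```

In this sketch the critical path runs through one multiplier and the whole adder chain, which is the timing problem the following architectures address.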
Architecture (2)
• Pipelining/retiming: improve timing
  - Insert register(s) at the inputs or outputs; this increases latency
  - Perform retiming: move registers through the logic without changing functionality, in the forward or backward direction
[Figure: FIR block diagram with input x(k), coefficients b_0 ... b_N, and output y(k), with added registers]
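Architecture (3)
• Retiming and simple transformation: optimization
  - Reverse the adder chain
  - Perform retiming
[Figure: FIR block diagram with input x(k), coefficients b_0 ... b_N, and output y(k) after reversing the adder chain and retiming]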
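Reversing the adder chain and retiming the delay registers into it leads to a structure commonly called the transposed form, in which every register holds a partial sum. The sketch below assumes that this is the structure the slide arrives at; the names, widths, and placeholder coefficients are illustrative assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fir_transposed is
  generic (
    N     : positive := 3;  -- filter order, N+1 taps (assumed value)
    WIDTH : positive := 8;  -- sample and coefficient width (assumed value)
    GUARD : natural  := 4   -- extra sum bits, enough for up to 16 taps
  );
  port (
    CLKxCI  : in  std_logic;
    RSTxRBI : in  std_logic;                         -- async reset, active low
    XxDI    : in  signed(WIDTH-1 downto 0);          -- x(k)
    YxDO    : out signed(2*WIDTH+GUARD-1 downto 0)   -- y(k)
  );
end entity fir_transposed;

architecture rtl of fir_transposed is
  subtype sum_t is signed(2*WIDTH+GUARD-1 downto 0);
  type sum_array is array (1 to N) of sum_t;
  type coeff_array is array (0 to N) of integer;
  -- Placeholder coefficients b_0 .. b_N (all ones gives a moving sum).
  constant B : coeff_array := (others => 1);
  signal SumxDN, SumxDP : sum_array;  -- partial sums between the adders
begin

  -- Every tap multiplies the same x(k); the registers sit in the adder chain.
  p_comb : process (XxDI, SumxDP)
  begin
    -- The last register only receives b_N * x(k).
    SumxDN(N) <= resize(XxDI * to_signed(B(N), WIDTH), sum_t'length);
    for i in 1 to N-1 loop
      SumxDN(i) <= resize(XxDI * to_signed(B(i), WIDTH), sum_t'length)
                   + SumxDP(i+1);
    end loop;
    -- Output: b_0 * x(k) plus the partial sum stored in the first register.
    YxDO <= resize(XxDI * to_signed(B(0), WIDTH), sum_t'length) + SumxDP(1);
  end process p_comb;

  p_regs : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      SumxDP <= (others => (others => '0'));
    elsif rising_edge(CLKxCI) then
      SumxDP <= SumxDN;
    end if;
  end process p_regs;

end architecture rtl;
```

Compared with the straightforward version, the longest register-to-register path here contains one multiplier and one adder instead of the full adder chain.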
Architecture (4)
• More pipelining: add one pipelining stage to the retimed circuit
  - The longest path is given by the multiplier
  - Unbalanced: the delay from the input to the first pipeline stage is much longer than the delay from the first to the second stage
[Figure: retimed FIR block diagram with input x(k), coefficients b_0 ... b_N, output y(k), and one added pipeline stage]
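Architecture (5)
• More pipelining: add one pipelining stage to the retimed circuit
  - Move the pipeline registers into the multiplier
  - Paths between pipeline stages are balanced
  - Improved timing: $T_{clock} = (T_{add} + T_{mult})/2 + T_{reg}$
[Figure: retimed FIR block diagram with input x(k), coefficients b_0 ... b_N, output y(k), and pipeline registers inside the multipliers]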
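A quick numeric illustration of the balanced-pipeline formula, using made-up delays $T_{mult} = 6$ ns, $T_{add} = 2$ ns, and $T_{reg} = 0.5$ ns (these values are assumptions, not from the exercise). The unbalanced figure additionally assumes that the pipeline register sits directly behind the multiplier, so the multiplier stage limits the clock.

```latex
\begin{align*}
T_{clock}^{\text{unbalanced}} &\approx T_{mult} + T_{reg} = 6 + 0.5 = 6.5\ \text{ns} \\
T_{clock}^{\text{balanced}}   &= \frac{T_{add} + T_{mult}}{2} + T_{reg}
                               = \frac{2 + 6}{2} + 0.5 = 4.5\ \text{ns}
\end{align*}
```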
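Architecture (6)
• Iterative decomposition: reuse hardware
  - Identify regularity and reusable hardware components
  - Add control: multiplexers, storage elements, control
  - Increases cycles/sample
[Figure: parallel FIR block diagram with input x(k), coefficients b_0 ... b_N, and output y(k), next to the iterative version with x(k), coefficients b_0 ... b_N, an initial value 0, and output y(k)]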
RTL-Design
• Choose an architecture under the following constraints:
  - It meets ALL timing specifications/constraints: throughput, latency
  - It consumes the smallest possible area
  - It requires the least possible amount of power
• Decide which additional functions are needed and how they can be implemented efficiently:
  - Storage of samples x(k) => MEMORY
  - Storage of coefficients b_i => LUT
  - Address generators for MEMORY and LUT => COUNTERS
  - Control => FSM
[Figure: iterative decomposition with x(k), coefficients b_0 ... b_N, an initial value 0, and output y(k)]
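RTL-Design
• RTL block diagram:
  - Datapath: computes $y(k) = \sum_{i=0}^{N} b_i \, x(k-i)$
• FSM:
  - Interface protocols
  - Datapath control
[Figure: iterative datapath with x(k), coefficients b_0 ... b_N, an initial value 0, and output y(k)]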
RTL-Design
• How it works:
  - IDLE: wait for new sample; store to input register
  - NEW DATA: store new sample to memory
  - RUN: compute $y(k) = \sum_{i=0}^{N} b_i \, x(k-i)$; store result to output register
  - DATA OUT: output result / wait for ACK
  - IDLE: …
Translation into VHDL
• Some basic VHDL building blocks: signal assignments
  - Outside a process [figure: AxD assigned to YxD]
  - Within a process (sequential execution) [figure: AxD and BxD both assigned to YxD]
    - Sequential execution
    - The last assignment is kept when the process terminates
  - [Figure: AxD and BxD both assigned to YxD] This is NOT allowed!
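The slide's code figures did not survive, so the following is only a sketch of the three cases under the assumption that the forbidden case is a second assignment to the same signal outside a process. The entity and port names are hypothetical and follow the naming style visible in the slides.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity assign_demo is
  port (
    AxDI  : in  std_logic;
    BxDI  : in  std_logic;
    Y1xDO : out std_logic;
    Y2xDO : out std_logic
  );
end entity assign_demo;

architecture rtl of assign_demo is
begin

  -- Outside a process: a concurrent assignment, it behaves like a wire.
  Y1xDO <= AxDI;

  -- Within a process: statements execute in order and the last
  -- assignment is the one kept when the process suspends.
  p_seq : process (AxDI, BxDI)
  begin
    Y2xDO <= AxDI;  -- overridden by the next line
    Y2xDO <= BxDI;  -- this value is kept
  end process p_seq;

  -- NOT allowed: a second concurrent driver for the same signal, e.g.
  --   Y1xDO <= BxDI;
  -- With std_logic this resolves to 'X' in simulation whenever the drivers
  -- disagree, and synthesis tools report a multiple-driver error.

end architecture rtl;
```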
Translation into VHDL
• Some basic VHDL building blocks:
  - Multiplexer [figure: inputs AxD and BxD, select SELxS, output YxD]
  - Conditional statements with a default assignment [figure: inputs AxD, BxD, CxD, DxD, selects SelAxS and SelBxS, state STATExDP, output OUTxD]
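A sketch of both building blocks, assuming CxD serves as the default value (the figure does not survive, so this is a guess). The entity name and the `xDI`/`xSI`/`xDO` port suffixes are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity mux_demo is
  port (
    AxDI    : in  std_logic_vector(7 downto 0);
    BxDI    : in  std_logic_vector(7 downto 0);
    CxDI    : in  std_logic_vector(7 downto 0);
    SELxSI  : in  std_logic;
    SelAxSI : in  std_logic;
    SelBxSI : in  std_logic;
    YxDO    : out std_logic_vector(7 downto 0);
    OUTxDO  : out std_logic_vector(7 downto 0)
  );
end entity mux_demo;

architecture rtl of mux_demo is
begin

  -- Two-input multiplexer as a concurrent conditional assignment.
  YxDO <= AxDI when SELxSI = '1' else BxDI;

  -- Conditional statements inside a combinatorial process.
  p_sel : process (AxDI, BxDI, CxDI, SelAxSI, SelBxSI)
  begin
    OUTxDO <= CxDI;            -- default assignment, covers all other cases
    if SelAxSI = '1' then
      OUTxDO <= AxDI;
    elsif SelBxSI = '1' then
      OUTxDO <= BxDI;
    end if;
  end process p_sel;

end architecture rtl;
```

Because of the default assignment, OUTxDO receives a value on every path through the process even though the IF statement has no ELSE branch.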
Translation into VHDL
• Common mistakes with conditional statements:
  [Figure: example with inputs AxD and BxD, selects SelAxS and SelBxS, state STATExDP, and output OUTxD, in which some inputs are undefined ("??")]
  - NO default assignment
  - NO else statement
• ASSIGNING NOTHING TO A SIGNAL IS NOT A WAY TO KEEP ITS VALUE! => Use flip-flops!
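One way to follow this advice, sketched under assumed names: the incomplete IF is shown only as a comment, and the value is held explicitly in a flip-flop with separate next-state and present-state signals.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity hold_demo is
  port (
    CLKxCI  : in  std_logic;
    RSTxRBI : in  std_logic;  -- async reset, active low
    AxDI    : in  std_logic_vector(7 downto 0);
    BxDI    : in  std_logic_vector(7 downto 0);
    SelAxSI : in  std_logic;
    SelBxSI : in  std_logic;
    OUTxDO  : out std_logic_vector(7 downto 0)
  );
end entity hold_demo;

architecture rtl of hold_demo is
  signal OutxDN, OutxDP : std_logic_vector(7 downto 0);
begin

  -- Mistake (comment only): no default and no ELSE, so a latch is inferred.
  --   process (SelAxSI, AxDI)
  --   begin
  --     if SelAxSI = '1' then
  --       OUTxDO <= AxDI;
  --     end if;
  --   end process;

  -- Fix: the "keep the value" case is an explicit feedback from the flip-flop.
  p_comb : process (AxDI, BxDI, SelAxSI, SelBxSI, OutxDP)
  begin
    OutxDN <= OutxDP;          -- default: hold the stored value
    if SelAxSI = '1' then
      OutxDN <= AxDI;
    elsif SelBxSI = '1' then
      OutxDN <= BxDI;
    end if;
  end process p_comb;

  p_reg : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      OutxDP <= (others => '0');
    elsif rising_edge(CLKxCI) then
      OutxDP <= OutxDN;
    end if;
  end process p_reg;

  OUTxDO <= OutxDP;

end architecture rtl;
```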
Translation into VHDL
• Some basic VHDL building blocks:
  - Register [figure: DataREGxDN is the input, DataREGxDP the output]
  - Register with ENABLE [figures: DataREGxDN is the input, DataREGxDP the output]
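A register with ENABLE can be sketched as a plain register whose next-state signal is selected by a multiplexer. The signal names DataREGxDN and DataREGxDP come from the slide; the entity name, port names, reset polarity, and width are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity reg_enable is
  generic (
    WIDTH : positive := 8  -- register width (assumed value)
  );
  port (
    CLKxCI       : in  std_logic;
    RSTxRBI      : in  std_logic;  -- async reset, active low
    DataRegENxSI : in  std_logic;
    DataxDI      : in  std_logic_vector(WIDTH-1 downto 0);
    DataxDO      : out std_logic_vector(WIDTH-1 downto 0)
  );
end entity reg_enable;

architecture rtl of reg_enable is
  signal DataREGxDN, DataREGxDP : std_logic_vector(WIDTH-1 downto 0);
begin

  -- ENABLE acts on the data path: load new data or feed back the old value.
  DataREGxDN <= DataxDI when DataRegENxSI = '1' else DataREGxDP;

  -- Plain register: only clock and reset appear in the sensitivity list.
  p_reg : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      DataREGxDP <= (others => '0');
    elsif rising_edge(CLKxCI) then
      DataREGxDP <= DataREGxDN;
    end if;
  end process p_reg;

  DataxDO <= DataREGxDP;

end architecture rtl;
```

Dropping the multiplexer and assigning `DataREGxDN <= DataxDI;` unconditionally gives the register without ENABLE.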
Translation into VHDL
• Common mistakes with sequential processes:
  [Figures: three variants of a register (DataREGxDN, DataREGxDP) with clock CLKxCI and enable DataRegENxS; one of them uses a 0/1 multiplexer]
  - Cannot be translated into hardware and is NOT allowed
  - Clocks are NEVER generated within any logic
  - Gated clocks are more complicated than this. Avoid them!
Translation into VHDL
• Some basic rules:
  - Sequential processes (flip-flops):
    - Only CLOCK and RESET in the sensitivity list
    - Logic signals are NEVER used as clock signals
  - Combinatorial processes:
    - Multiple assignments to the same signal are ONLY possible within the same process => ONLY the last assignment is valid
    - Something must be assigned to each signal in any case, OR there MUST be an ELSE for every IF statement
• More rules that help to avoid problems and surprises:
  - Use separate signals for the PRESENT state and the NEXT state of every flip-flop in your design.
  - Use variables ONLY to store intermediate results, or avoid them altogether whenever possible in an RTL design.
Translation into VHDL
• Write the ENTITY definition of your design to specify inputs, outputs, and generics
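As an illustration, an entity for the iterative FIR filter might look like the sketch below. Every name, width, and the request/acknowledge handshake are assumptions chosen to match the IDLE / NEW DATA / RUN / DATA OUT description; the exercise's actual interface is not given in the slides.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fir_iterative is
  generic (
    N_TAPS     : positive := 16;  -- number of coefficients (assumed value)
    DATA_WIDTH : positive := 8;   -- width of x(k) (assumed value)
    ACC_WIDTH  : positive := 20   -- width of y(k) (assumed value)
  );
  port (
    -- Global signals
    CLKxCI     : in  std_logic;
    RSTxRBI    : in  std_logic;  -- async reset, active low
    -- Input interface: a new sample is offered with InReqxSI
    DataInxDI  : in  signed(DATA_WIDTH-1 downto 0);
    InReqxSI   : in  std_logic;
    InAckxSO   : out std_logic;
    -- Output interface: the result is held until OutAckxSI arrives
    DataOutxDO : out signed(ACC_WIDTH-1 downto 0);
    OutReqxSO  : out std_logic;
    OutAckxSI  : in  std_logic
  );
end entity fir_iterative;
```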
Translation into VHDL
• Describe the functional units in your block diagram one after another in the architecture section:
  - Register with ENABLE
  - Register with CLEAR
  - Counter
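Two of these units, a counter and a register with CLEAR used as an accumulator, can be sketched as follows. The control inputs are assumed to come from the FSM; all names and widths are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mac_units is
  generic (
    ADDR_WIDTH : positive := 4;   -- address width (assumed value)
    ACC_WIDTH  : positive := 20   -- accumulator width (assumed value)
  );
  port (
    CLKxCI    : in  std_logic;
    RSTxRBI   : in  std_logic;  -- async reset, active low
    CntCLRxSI : in  std_logic;  -- restart address generation
    CntENxSI  : in  std_logic;  -- advance to the next address
    AccCLRxSI : in  std_logic;  -- start a new sum
    AccENxSI  : in  std_logic;  -- add the current product
    ProdxDI   : in  signed(ACC_WIDTH-1 downto 0);  -- b_i * x(k-i), extended
    AddrxDO   : out unsigned(ADDR_WIDTH-1 downto 0);
    AccxDO    : out signed(ACC_WIDTH-1 downto 0)
  );
end entity mac_units;

architecture rtl of mac_units is
  signal CntxDN, CntxDP : unsigned(ADDR_WIDTH-1 downto 0);
  signal AccxDN, AccxDP : signed(ACC_WIDTH-1 downto 0);
begin

  -- Counter: address generator for sample memory and coefficient LUT.
  p_cnt : process (CntxDP, CntCLRxSI, CntENxSI)
  begin
    CntxDN <= CntxDP;                 -- default: hold
    if CntCLRxSI = '1' then
      CntxDN <= (others => '0');
    elsif CntENxSI = '1' then
      CntxDN <= CntxDP + 1;           -- wraps around at 2**ADDR_WIDTH
    end if;
  end process p_cnt;

  -- Register with CLEAR, used here as the accumulator.
  p_acc : process (AccxDP, AccCLRxSI, AccENxSI, ProdxDI)
  begin
    AccxDN <= AccxDP;                 -- default: hold
    if AccCLRxSI = '1' then
      AccxDN <= (others => '0');
    elsif AccENxSI = '1' then
      AccxDN <= AccxDP + ProdxDI;
    end if;
  end process p_acc;

  -- All flip-flops of both units.
  p_regs : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      CntxDP <= (others => '0');
      AccxDP <= (others => '0');
    elsif rising_edge(CLKxCI) then
      CntxDP <= CntxDN;
      AccxDP <= AccxDN;
    end if;
  end process p_regs;

  AddrxDO <= CntxDP;
  AccxDO  <= AccxDP;

end architecture rtl;
```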
Translation into VHDL
• The FSM is described with one sequential process and one combinatorial process
  - Parts of the combinatorial process are marked MEALY (outputs that depend on the inputs as well as on the present state)
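A sketch of such a two-process FSM for the IDLE / NEW DATA / RUN / DATA OUT sequence. The state names follow the slides; the port names, the `LastTapxSI` flag, and the exact cycle in which each control signal is asserted are assumptions that would have to be matched to the real datapath.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity fir_fsm is
  port (
    CLKxCI      : in  std_logic;
    RSTxRBI     : in  std_logic;  -- async reset, active low
    InReqxSI    : in  std_logic;  -- new sample available
    LastTapxSI  : in  std_logic;  -- counter reached the last coefficient
    OutAckxSI   : in  std_logic;  -- receiver has taken the result
    InRegENxSO  : out std_logic;
    MemWExSO    : out std_logic;
    AccCLRxSO   : out std_logic;
    MacENxSO    : out std_logic;
    OutRegENxSO : out std_logic;
    OutReqxSO   : out std_logic
  );
end entity fir_fsm;

architecture rtl of fir_fsm is
  type state_t is (IDLE, NEW_DATA, RUN, DATA_OUT);
  signal StatexDN, StatexDP : state_t;
begin

  -- Combinatorial process: next state and outputs.
  p_fsm_comb : process (StatexDP, InReqxSI, LastTapxSI, OutAckxSI)
  begin
    -- Default assignments: every signal gets a value in every case.
    StatexDN    <= StatexDP;
    InRegENxSO  <= '0';
    MemWExSO    <= '0';
    AccCLRxSO   <= '0';
    MacENxSO    <= '0';
    OutRegENxSO <= '0';
    OutReqxSO   <= '0';

    case StatexDP is
      when IDLE =>                      -- wait for a new sample
        if InReqxSI = '1' then
          InRegENxSO <= '1';            -- MEALY: depends on state and input
          StatexDN   <= NEW_DATA;
        end if;
      when NEW_DATA =>                  -- store the new sample to memory
        MemWExSO  <= '1';
        AccCLRxSO <= '1';
        StatexDN  <= RUN;
      when RUN =>                       -- accumulate b_i * x(k-i)
        MacENxSO <= '1';
        if LastTapxSI = '1' then
          OutRegENxSO <= '1';           -- MEALY: store result to output register
          StatexDN    <= DATA_OUT;
        end if;
      when DATA_OUT =>                  -- output result, wait for ACK
        OutReqxSO <= '1';
        if OutAckxSI = '1' then
          StatexDN <= IDLE;
        end if;
    end case;
  end process p_fsm_comb;

  -- Sequential process: the state register only.
  p_fsm_seq : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      StatexDP <= IDLE;
    elsif rising_edge(CLKxCI) then
      StatexDP <= StatexDN;
    end if;
  end process p_fsm_seq;

end architecture rtl;
```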
Translation into VHDL
• Complete and check the code:
  - Declare the signals and components
  - Check and complete the sensitivity lists of ALL combinatorial processes with ALL signals that are:
    - used as a condition in any IF or CASE statement
    - being assigned to any other signal
    - used in any operation with any other signal
  - Check that the sensitivity lists of ALL sequential processes contain ONLY one global clock and one global asynchronous reset signal, and no other signals
Other Good Ideas
• Keep things simple
• Partition the design (divide et impera):
  - Example: to start processing the next sample while the previous result is waiting in the output register, just add a FIFO at the output of your filter
• Do NOT try to optimize each gate or flip-flop
• Do not try to save cycles if not necessary
• VHDL code:
  - Is usually long, and that is good!
  - Is just a representation of your block diagram
  - Does not mind hierarchy