Lecture 2 (I): Pipelining & Retiming
Hsie-Chia Chang 張錫嘉
E-mail : [email protected]
Fall 2006
Optimized Application-Specific Integrated Systems
Outline
Pipelining of FIR Digital Filters
– Data-Broadcast Structures
– Fine-Grain Pipelining
Parallel Processing
Pipelining and Parallel Processing for Low Power
Retiming
– Definitions and Properties
– Solving Systems of Inequalities
– Retiming Techniques
  • Cutset Retiming & Pipelining
  • Retiming for Clock Period Minimization
  • Retiming for Register Minimization
Introduction
– If a real-time application requires a faster input (sampling) rate, it can be achieved by either pipelining, which reduces the effective critical path, or parallel processing, which processes several samples per clock cycle
Pipelining & Parallel Processing (1/2)
Pipelining
– Reduce the effective critical path by introducing pipelining latches along the critical data path
– Without any pipelining latches, the critical path can be reduced by
Parallel processing
– Increase the sampling rate by replicating hardware so that several inputs can be processed in parallel and several outputs are produced at the same time
These techniques are applied here to non-recursive computations
(Figure: pipelined system with $T_{\text{sample}} = T_{\text{CLK}}$; parallel system with $T_{\text{sample}} \neq T_{\text{CLK}}$.)
Pipelining & Parallel Processing (2/2)
Example 2:
Pipelining of FIR Digital Filters
Schedule of Events in the Pipelined FIR Filter
$T_{\text{critical}} = T_M + T_A$
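To make the schedule concrete, here is a minimal Python sketch (function names are hypothetical) of the 3-tap filter $y(n) = a\,x(n) + b\,x(n-1) + c\,x(n-2)$ in direct form and in a 2-level pipelined form. The latch placement is an assumption: stage 1 computes $a\,x(n) + b\,x(n-1)$ and $c\,x(n-2)$ (one multiplier plus one adder), and stage 2 adds the two latched values (one adder), which is consistent with $T_M + T_A$. The pipelined filter produces the same sequence one clock cycle later.

```python
def fir_direct(x, a, b, c):
    """Direct-form 3-tap FIR: y(n) = a*x(n) + b*x(n-1) + c*x(n-2), zero initial state."""
    x1 = x2 = 0                      # delay line: x(n-1), x(n-2)
    y = []
    for xn in x:
        y.append(a * xn + b * x1 + c * x2)
        x1, x2 = xn, x1              # clock edge: shift the delay line
    return y


def fir_pipelined(x, a, b, c):
    """2-level pipelined version (the latch placement is an assumption).

    Stage 1: s = a*x(n) + b*x(n-1) and p = c*x(n-2)   -> T_M + T_A
    Stage 2: output s + p latched in the previous cycle -> T_A
    """
    x1 = x2 = 0
    s_reg = p_reg = 0                # pipelining latches on the cutset edges
    y = []
    for xn in x:
        y.append(s_reg + p_reg)      # stage 2 works on last cycle's stage-1 results
        s_new = a * xn + b * x1      # stage 1 on the current sample
        p_new = c * x2
        s_reg, p_reg = s_new, p_new  # clock edge: load the latches
        x1, x2 = xn, x1
    return y


x = [1, 2, 3, 4, 5]
print(fir_direct(x, 1, 2, 3))     # [1, 4, 10, 16, 22]
print(fir_pipelined(x, 1, 2, 3))  # [0, 1, 4, 10, 16]
```

The leading zero is the one cycle of latency added by the pipelining latches.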
Cutset Pipelining (1/2)
The speed is limited by the longest path between
– any two latches
– an input and a latch
– a latch and an output
– the input and the output
2-level pipelined structure
– The longest path can be reduced by suitably placing the pipelining latches in the architecture
– In this system, at any time, 2 consecutive outputs are computed in an interleaved manner
– Drawbacks
  • Increase in the number of latches
  • Increase in system latency
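The longest-path rule above can be checked mechanically. The sketch below is a hypothetical helper that assumes the combinational logic forms a directed acyclic graph and uses made-up unit times $T_M = 10$, $T_A = 2$. It finds the longest latch-free path; removing the two edges that receive pipelining latches shortens it from $T_M + 2T_A$ to $T_M + T_A$.

```python
from functools import lru_cache


def critical_path(delays, comb_edges):
    """Longest purely combinational path (the critical path) of a dataflow graph.

    delays:     {node: computation time}, e.g. T_M for multipliers, T_A for adders
    comb_edges: (u, v) edges with NO latch on them; an edge that carries a latch
                ends one path and starts another, so it is simply left out.
    Assumes the combinational part is acyclic.
    """
    succ = {n: [] for n in delays}
    for u, v in comb_edges:
        succ[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(n):
        # delay of n plus the longest combinational chain that follows it
        return delays[n] + max((longest_from(m) for m in succ[n]), default=0)

    return max(longest_from(n) for n in delays)


# Hypothetical unit times: T_M = 10, T_A = 2.
T_M, T_A = 10, 2
delays = {"ma": T_M, "mb": T_M, "mc": T_M, "add1": T_A, "add2": T_A}
# Direct-form 3-tap FIR: y = (a*x(n) + b*x(n-1)) + c*x(n-2)
direct = [("ma", "add1"), ("mb", "add1"), ("add1", "add2"), ("mc", "add2")]
# Latches on the cutset {add1 -> add2, mc -> add2} remove those combinational edges
pipelined = [("ma", "add1"), ("mb", "add1")]

print(critical_path(delays, direct))     # 14 = T_M + 2*T_A
print(critical_path(delays, pipelined))  # 12 = T_M + T_A
```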
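Cutset Pipelining (2/2)
Cutset
– A set of edges of a graph such that removing these edges makes the graph disjoint
Feed-forward cutset
– A cutset is feed-forward if data move in the forward direction on all the edges of the cutset
– We can arbitrarily place latches on a feed-forward cutset of any FIR filter structure without affecting the functionality of the algorithm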
(Figure: a feed-forward cutset separating subgraphs G1 and G2, with k delays (+kD) added to each cutset edge.)
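A small sketch of this rule on an edge-list dataflow graph (the graph representation, the node names, and `pipeline_cutset` are hypothetical): every edge crossing from G1 to G2 receives k extra delays, and the function rejects a cutset with an edge running backward, since such a cutset is not feed-forward.

```python
def pipeline_cutset(edges, g1, k=1):
    """Cutset pipelining: add k delays to every edge crossing from G1 to G2.

    edges: list of (u, v, w) with w = number of delays (D) on edge u -> v
    g1:    nodes on one side of the cutset; every other node is in G2
    Raises ValueError if the cutset is not feed-forward (some edge runs G2 -> G1).
    """
    g1 = set(g1)
    result = []
    for u, v, w in edges:
        if u in g1 and v not in g1:       # forward cutset edge: add k delays
            result.append((u, v, w + k))
        elif u not in g1 and v in g1:     # backward edge: the cutset is not feed-forward
            raise ValueError(f"edge {u}->{v} crosses the cutset backwards")
        else:                             # edge inside G1 or inside G2: unchanged
            result.append((u, v, w))
    return result


# Hypothetical dataflow graph of the 3-tap FIR y(n) = a x(n) + b x(n-1) + c x(n-2)
fir = [("in", "ma", 0), ("in", "mb", 1), ("in", "mc", 2),
       ("ma", "add1", 0), ("mb", "add1", 0),
       ("add1", "add2", 0), ("mc", "add2", 0), ("add2", "out", 0)]
g1 = {"in", "ma", "mb", "mc", "add1"}
print(pipeline_cutset(fir, g1))   # add1->add2 and mc->add2 now carry 1 delay each
```

Because every cutset edge is delayed by the same k cycles, the output sequence is only shifted by k samples, which is why the function of the filter is preserved.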
Example 3.2.1
Data-Broadcast Structures
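The content of this slide is a figure. As a rough sketch, assuming the data-broadcast structure is the transposed direct form (the input sample is broadcast to all multipliers and the delays sit between the adders), the hypothetical function below computes the same $y(n) = a\,x(n) + b\,x(n-1) + c\,x(n-2)$ while every register-to-register path contains only one multiplier and one adder.

```python
def fir_data_broadcast(x, a, b, c):
    """Transposed (data-broadcast) 3-tap FIR computing y(n) = a*x(n) + b*x(n-1) + c*x(n-2).

    x(n) is broadcast to all three multipliers; the delays sit between the adders.
    """
    r1 = r2 = 0                       # before step n: r1 = b*x(n-1) + c*x(n-2), r2 = c*x(n-1)
    y = []
    for xn in x:
        y.append(a * xn + r1)         # one multiplier + one adder: T_M + T_A
        r1, r2 = b * xn + r2, c * xn  # clock edge: both registers update together
    return y


print(fir_data_broadcast([1, 2, 3, 4, 5], 1, 2, 3))  # [1, 4, 10, 16, 22]
```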
Fine-Grain Pipelining
Parallel Processing
Parallel processing is also referred to as block processing
– Block size = number of inputs processed in a clock cycle
– For a 3-tap FIR filter, the hardware is duplicated as follows:
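Single-input single-output form:
$y(n) = a\,x(n) + b\,x(n-1) + c\,x(n-2)$
In MIMO (block size 3):
$y(3k) = a\,x(3k) + b\,x(3k-1) + c\,x(3k-2)$
$y(3k+1) = a\,x(3k+1) + b\,x(3k) + c\,x(3k-1)$
$y(3k+2) = a\,x(3k+2) + b\,x(3k+1) + c\,x(3k)$
(Figure: 3-parallel FIR filter; each delay element is a block delay.)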
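The three block equations can be checked directly. This is a minimal sketch (the function name is hypothetical) that consumes one block of three samples per iteration, i.e. per clock cycle, and keeps $x(3k-1)$ and $x(3k-2)$ from the previous block in block-delay registers; its output matches the serial filter sample for sample.

```python
def fir_3parallel(x, a, b, c):
    """3-parallel (block size 3) version of y(n) = a*x(n) + b*x(n-1) + c*x(n-2).

    Each iteration models one clock cycle: it reads x(3k), x(3k+1), x(3k+2)
    and produces y(3k), y(3k+1), y(3k+2).
    """
    if len(x) % 3:
        raise ValueError("input length must be a multiple of the block size 3")
    xm1 = xm2 = 0                    # block-delay registers: x(3k-1), x(3k-2); zero initial state
    y = []
    for k in range(len(x) // 3):
        x0, x1, x2 = x[3 * k], x[3 * k + 1], x[3 * k + 2]
        y += [a * x0 + b * xm1 + c * xm2,   # y(3k)
              a * x1 + b * x0 + c * xm1,    # y(3k+1)
              a * x2 + b * x1 + c * x0]     # y(3k+2)
        xm1, xm2 = x2, x1            # block delay: these become x(3(k+1)-1), x(3(k+1)-2)
    return y


print(fir_3parallel([1, 2, 3, 4, 5, 6], 1, 2, 3))  # [1, 4, 10, 16, 22, 28]
```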
Complete Parallel Processing Systems
In addition to the parallel hardware, a complete system requires
– A serial-to-parallel converter
– A parallel-to-serial converter
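A sketch of the complete system for an arbitrary number of taps and block size L, assuming plain Python lists stand in for the hardware streams (all function names are hypothetical). The serial-to-parallel converter groups L samples per slow clock cycle, the parallel core computes L outputs at once, and the parallel-to-serial converter flattens them back into one stream.

```python
def serial_to_parallel(x, L):
    """S/P converter: group a serial stream into L-sample blocks, one block per slow clock cycle."""
    if len(x) % L:
        raise ValueError("stream length must be a multiple of the block size L")
    return [tuple(x[i:i + L]) for i in range(0, len(x), L)]


def parallel_to_serial(blocks):
    """P/S converter: flatten the L-sample output blocks back into one serial stream."""
    return [s for block in blocks for s in block]


def lparallel_fir(x, h, L):
    """Hypothetical complete L-parallel FIR system: S/P -> L-parallel core -> P/S.

    Computes y(n) = sum_i h[i] * x(n - i) with zero initial state.
    """
    past = [0] * (len(h) - 1)             # last len(h)-1 inputs from earlier blocks, oldest first
    out_blocks = []
    for blk in serial_to_parallel(x, L):  # one iteration = one slow clock cycle
        buf = past + list(blk)            # time order: ..., x(kL-1), x(kL), ..., x(kL+L-1)
        base = len(past)                  # index of x(kL) in buf
        out_blocks.append(tuple(
            sum(h[i] * buf[base + j - i] for i in range(len(h)))
            for j in range(L)))           # the L outputs y(kL), ..., y(kL+L-1) of this cycle
        past = buf[L:]                    # block delay: keep the newest len(h)-1 samples
    return parallel_to_serial(out_blocks)


print(lparallel_fir([1, 2, 3, 4, 5, 6], [1, 2, 3], L=3))  # [1, 4, 10, 16, 22, 28]
```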
Why Use Parallel Processing?
Communication bounded
– When the critical path is shorter than $T_{\text{communication}}$ (the I/O bound), the I/O bound dominates and the system is communication bounded
– Pipelining can be used only until the critical path reaches the communication bound
– Once this bound is reached, pipelining can no longer increase the speed; parallel processing can still increase the sample rate beyond this point
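Under a first-order model (ideal pipelining that divides the critical path evenly by M, latch overhead ignored), the effect of the communication bound can be sketched as below; `sample_period` and all numbers are hypothetical. Pipelining stops helping once $T_{\text{critical}}/M$ reaches $T_{\text{communication}}$, whereas L-parallel processing still divides the sample period by L.

```python
def sample_period(t_crit, t_comm, M=1, L=1):
    """First-order sample period of an M-level pipelined, L-parallel system (hypothetical model).

    Ideal pipelining divides the critical path by M, but the clock period cannot
    drop below the communication (I/O) bound t_comm. L-parallel processing then
    delivers L samples per clock cycle.
    """
    t_clk = max(t_crit / M, t_comm)
    return t_clk / L


# Hypothetical times: critical path 12 units, I/O bound 4 units.
for M in (1, 2, 3, 6):
    print("M =", M, "T_sample =", sample_period(12, 4, M=M))   # 12.0, 6.0, 4.0, 4.0
print("M = 3, L = 3:", sample_period(12, 4, M=3, L=3))         # about 1.33
```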
Combined Pipelining & Parallel Processing
– After combining M-level pipelining and L-level parallel processing,
CMOS Power Consumption (1/2)
$P_{\text{total}} = P_{\text{dynamic}} + P_{\text{short-circuit}} + P_{\text{static}}$
Short-circuit power
– Current spikes
Static power
– Leakage current
CMOS Power Consumption (2/2)
Based on a simple approximation and first-order analysis:
– Propagation delay
$T_{pd} = \dfrac{C_{\text{charge}}\,V_0}{k\,(V_0 - V_t)^2}$
  $C_{\text{charge}}$: the capacitance to be charged or discharged in a single clock cycle (along the critical path)
  $V_0$, $V_t$: the supply voltage and the threshold voltage
  $k$: a function of technology parameters
– Power consumption
$P = C_{\text{total}}\,V_0^2\,f$
  $C_{\text{total}}$: the total capacitance of the CMOS circuit
  $f$: the clock frequency of the circuit
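The two first-order formulas translate directly into code. In this sketch the function names, the technology constant k, and every numeric value are hypothetical; only the formulas come from the slide.

```python
def propagation_delay(c_charge, v0, vt, k):
    """First-order CMOS delay: T_pd = C_charge * V0 / (k * (V0 - Vt)**2)."""
    if v0 <= vt:
        raise ValueError("the supply voltage must exceed the threshold voltage")
    return c_charge * v0 / (k * (v0 - vt) ** 2)


def dynamic_power(c_total, v0, f):
    """Dynamic power: P = C_total * V0**2 * f."""
    return c_total * v0 ** 2 * f


# Hypothetical values: 1 nF switched capacitance, 1 MHz clock, Vt = 0.6 V,
# normalized C_charge = 1 and k = 1 for the delay comparison.
p_5v = dynamic_power(1e-9, 5.0, 1e6)       # about 0.025 W
p_2v5 = dynamic_power(1e-9, 2.5, 1e6)      # a quarter of p_5v: power scales with V0**2
slower = propagation_delay(1.0, 2.5, 0.6, 1.0) > propagation_delay(1.0, 5.0, 0.6, 1.0)
print(p_5v, p_2v5 / p_5v, slower)          # the lower supply is also slower
```

The quadratic dependence on $V_0$ is what makes supply-voltage reduction attractive, and the delay formula is what limits how far it can go.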
Low Power Design
To reduce
– Capacitances
  • Transistor/gate capacitance
  • Load capacitance
  • Interconnects
  • External
– Activity
– Frequency
– Power supply
Other issues
– Off-chip connections have high capacitive load
– System integration
Pipelining for Low Power (1/2)
For an M-level pipelined architecture,
– the critical path is reduced to 1/M, and the capacitance to be charged or discharged in a single cycle ($C_{\text{charge}}$) is also reduced to 1/M
If the same clock speed is maintained ($f = 1/T_{pd}$),
– only 1/M of the non-pipelined capacitance needs to be charged or discharged in each cycle, which leaves room to reduce the supply voltage
– Suppose the supply voltage can be reduced to $\beta V_0$ ($0 < \beta < 1$); the power consumption becomes
$P_{\text{pipelined}} = C_{\text{total}}\,(\beta V_0)^2 f = \beta^2 P_{\text{non-pipelined}}$
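Pipelining for Low Power (2/2)
– propagation delay of the original architecture: $T_{\text{orig}} = \dfrac{C_{\text{charge}}\,V_0}{k\,(V_0 - V_t)^2}$
– propagation delay of the pipelined architecture: $T_{\text{pip}} = \dfrac{(C_{\text{charge}}/M)\,\beta V_0}{k\,(\beta V_0 - V_t)^2}$
– since the clock period is unchanged, setting these two delays equal gives the following quadratic equation in β:
$M\,(\beta V_0 - V_t)^2 = \beta\,(V_0 - V_t)^2$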
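A sketch for solving the quadratic numerically; the helper name and the sample values $V_0 = 5$ V, $V_t = 0.6$ V, M = 2 are assumptions for illustration. Expanding gives $M V_0^2 \beta^2 - \big(2 M V_0 V_t + (V_0 - V_t)^2\big)\beta + M V_t^2 = 0$. The two roots multiply to $(V_t/V_0)^2$, so exactly one of them satisfies $\beta V_0 > V_t$: the larger root, which is the only usable supply voltage.

```python
import math


def solve_beta(M, v0, vt):
    """Supply-scaling factor beta from M*(beta*v0 - vt)**2 = beta*(v0 - vt)**2.

    Expanded: M*v0^2*beta^2 - (2*M*v0*vt + (v0 - vt)^2)*beta + M*vt^2 = 0.
    The roots multiply to (vt/v0)^2, so only the larger root gives beta*v0 > vt.
    """
    a = M * v0 ** 2
    b = -(2 * M * v0 * vt + (v0 - vt) ** 2)
    c = M * vt ** 2
    disc = b * b - 4 * a * c          # always positive for v0 > vt
    return (-b + math.sqrt(disc)) / (2 * a)


# Hypothetical operating point: V0 = 5 V, Vt = 0.6 V, 2-level pipelining.
beta = solve_beta(2, 5.0, 0.6)
print(round(beta, 3), round(beta ** 2, 3))   # 0.603 0.364: supply about 3.02 V, power about 36%
```

This first-order estimate ignores the capacitance added by the pipelining latches themselves.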
Example 3.4.1: Reduce Power by Pipelining
Consider the following two FIR filters.
– What is the supply voltage of the pipelined architecture if the clock periods are identical?
– What is the relative power consumption?
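(Figure: the original FIR filter and its pipelined version, with input x(n), output y(n), delays D, and each multiplier split into two parts, m1 and m2.)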
Solution
Parallel Processing for Low Power (1/2)
For an L-parallel architecture,
– the charging capacitance ($C_{\text{charge}}$) remains the same, but the total capacitance ($C_{\text{total}}$) increases L times
To maintain the same sample rate,
– the clock frequency is reduced to 1/L of the original ($f/L$, where $f = 1/T_{pd}$), so $C_{\text{charge}}$ can be charged or discharged over a period L times longer
– the supply voltage can therefore be reduced to $\beta V_0$, and the power consumption becomes
$P_{\text{parallel}} = (L\,C_{\text{total}})\,(\beta V_0)^2\,\dfrac{f}{L} = \beta^2 P_{\text{non-parallel}}$
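Parallel Processing for Low Power (2/2)
– propagation delay of the original architecture: $T_{\text{orig}} = \dfrac{C_{\text{charge}}\,V_0}{k\,(V_0 - V_t)^2}$
– propagation delay of the parallel architecture: $T_{\text{par}} = \dfrac{C_{\text{charge}}\,\beta V_0}{k\,(\beta V_0 - V_t)^2}$
– since the parallel clock period may be L times longer, setting $T_{\text{par}} = L\,T_{\text{orig}}$ gives the following quadratic equation in β:
$L\,(\beta V_0 - V_t)^2 = \beta\,(V_0 - V_t)^2$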
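As a cross-check that does not rely on the closed-form quadratic, the sketch below searches for β directly by bisection on the first-order delay model ($C_{\text{charge}}$ and k cancel out). The function name and the sample voltages are hypothetical. Because the delay $V/(V - V_t)^2$ decreases monotonically for $V > V_t$, exactly one β meets the relaxed parallel clock period.

```python
def parallel_beta(L, v0, vt, tol=1e-12):
    """Find beta such that an L-parallel circuit at supply beta*v0 meets a clock period
    L times the original delay, using bisection on the first-order delay model.

    T_pd is proportional to V / (V - Vt)**2 (C_charge and k cancel), which
    decreases monotonically for V > Vt.
    """
    def delay(v):
        return v / (v - vt) ** 2

    target = L * delay(v0)             # the parallel clock period may be L times longer
    lo, hi = vt / v0, 1.0              # delay -> infinity near lo; delay(v0) <= target at hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if delay(mid * v0) > target:   # still too slow: the voltage must be higher
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical operating point: V0 = 5 V, Vt = 0.6 V, 2-parallel processing.
beta = parallel_beta(2, 5.0, 0.6)
print(round(beta, 3), round(beta ** 2, 3))   # 0.603 0.364
```

With L = 2 this lands on the same β as the pipelining equation with M = 2, since both conditions have the same form; in both cases the power ratio is β².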
Example 3.4.2: Reduce Power by Parallel Processing
Consider the following two FIR filters, with their critical paths shown as dashed lines.
– What is the supply voltage of the parallel architecture?
– What is the relative power consumption?
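(Figure: the original FIR filter with input x(n) and output y(n), and its 2-parallel version with inputs x(2k), x(2k+1), outputs y(2k), y(2k+1), and delays D.)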
Solution
Example 3.4.3
Area-efficient architecture
Summary
In pipelining & parallel processing,
– M-level pipelining,
– L-level parallel processing,
– Combining M-level pipelining & L-level parallel processing,
For low power design,
– Pipelining
– Parallel Processing
– Combining Pipelining and Parallel Processing
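For the combined case, the same first-order model can be extended under the assumption that the two effects simply compose (latch and converter overhead ignored): M-level pipelining divides $C_{\text{charge}}$ by M, and L-level parallel processing lets the clock period grow to L times the original delay $T_{\text{orig}} = C_{\text{charge}} V_0 / \big(k (V_0 - V_t)^2\big)$. A derivation sketch:

```latex
\frac{(C_{\text{charge}}/M)\,\beta V_0}{k\,(\beta V_0 - V_t)^2}
  = L\,T_{\text{orig}}
  = \frac{L\,C_{\text{charge}}\,V_0}{k\,(V_0 - V_t)^2}
\quad\Longrightarrow\quad
L M\,(\beta V_0 - V_t)^2 = \beta\,(V_0 - V_t)^2

P_{\text{combined}} = (L\,C_{\text{total}})\,(\beta V_0)^2\,\frac{f}{L} = \beta^2 P_{\text{original}}
```

Under these assumptions the supply-scaling equation depends only on the product LM, and the power ratio is again β².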