v4 tech module dsp
TRANSCRIPT
-
8/10/2019 V4 Tech Module DSP
1/24
Xilinx Advanced Products Division
Digital SignalProcessingVersion 2.0
January 2005
-
8/10/2019 V4 Tech Module DSP
2/24
Virtex-4 DSP 2
Agenda
Introduction Background
Virtex-4 Solutions
Summary
-
8/10/2019 V4 Tech Module DSP
3/24
Introduction
-
8/10/2019 V4 Tech Module DSP
4/24
Virtex-4 DSP 4
High-Speed DSP Challenges
High performance digital communicationand video imaging designs challenge
existing DSP solutions Need higher performance
Need lower costs
Need lower power
Compromises are often made Performance is sacrificed
Time is spent designing substituteimplementations
-
8/10/2019 V4 Tech Module DSP
5/24
Virtex-4 DSP 5
Achieve DSP Performance and
Efficiency in Virtex-4
Virtex-4 XtremeDSP
Performance 512 XtremeDSP slices at 500MHz
256 GMACs/s DSP bandwidth
Low Power
2.3mW/100MHz scalable power efficiency
Value
Operate the XtremeDSP slice in over 40 different modes
Highest DSP bandwidth per dollar solution
-
8/10/2019 V4 Tech Module DSP
6/24
Background
-
8/10/2019 V4 Tech Module DSP
7/24
Virtex-4 DSP 7
FPGAs Enable Massively Parallel
DSP
Data OutData Out
MAC UnitMAC Unit
CoefficientsCoefficients
Programmable DSPProgrammable DSP -- SequentialSequential
1 GHz1 GHz1 GHz
256 clock cycles256 clock cycles256 clock cycles
= 4 MSPS= 4 MSPS= 4 MSPS
256 clock256 clock
cyclescyclesneededneeded
Data InData In
XX
++
RegReg
500 MHz500 MHz500 MHz
1 clock cycle1 clock cycle1 clock cycle
= 500 MSPS= 500 MSPS= 500 MSPS
Data OutData Out
FPGAFPGA -- Fully Parallel ImplementationFully Parallel Implementation
256 operations256 operations
in 1 clock cyclein 1 clock cycle
Data InData In
XX
++
C0C0 C0C0XXC1C1 XXC2C2 XXC3C3 XXC255C255
the unprecedented signal processing requirements of nextthe unprecedented signal processing requirements of next--generation wirelessgeneration wireless
devices threaten to outpace the capabil ities of DSP processors,devices threaten to outpace the capabili ties of DSP processors, creating opportunitiescreating opportunities
for massively parallel and highly customized devices.for massively parallel and highly customized devices. BDTI, 2004BDTI, 2004
Example 256 TAP Filter ImplementationExample 256 TAP Filter Implementation
Reg
Reg
Reg
Reg
Reg
Reg
Reg
Reg
-
8/10/2019 V4 Tech Module DSP
8/24
Virtex-4 DSP 8
Parallel Adder Tree Implementation
Consumes FPGA resources
Fabric and Routing MayFabric and Routing May
Reduce PerformanceReduce Performance
32 TAP filter implementation willconsume 1,461 logic cells toimplement adders in fabric
Parallel Adder Tree ImplementationParallel Adder Tree Implementation
Data InData In
XX
++
C0C0 C0C0XXC1C1 XXC2C2 XXC3C3
++++
XXC4C4 C0C0XXC5C5 XXC6C6 XXC7C7 XXC30C30 XXC31C31
++++ ++++
Data OutData Out
++
++Consumes Logic toConsumes Logic to
Implement AddersImplement Adders VariableVariable
LatencyLatency
Reg
Reg
Reg
Reg
Reg
Reg
Reg
Reg
Reg
Reg
Reg
Reg
Reg
Reg
Reg
Reg
Reg
Reg
-
8/10/2019 V4 Tech Module DSP
9/24
Virtex-4 DSP 9
Parallel Adder Cascade ImplementationParallel Adder Cascade Implementation
Data InData In
XX
++
XX
++
XX
++
Data OutData Out
XX
++
XX
++
XX
++
XX
++
XX
++
XX
++
Filters ImplementedFilters Implemented
Entirely Within theEntirely Within the
XtremeDSP SliceXtremeDSP SliceGuaranteed 500MHz PerformanceGuaranteed 500MHz Performance
Regardless of Filter SizeRegardless of Filter Size
Virtex-4 Parallel Implementation
Consumes Zero Logic Resources
32 TAP filter implementation using32 XtremeDSP Slices
C0C0 C1C1 C2C2 C3C3 C5C5 C6C6 C7C7 C30C30 C31C31C4C4 XX
++
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Re
g
Reg
Reg
Reg
Reg
Reg
Reg
Reg
Reg
Reg
Reg
Reg
Reg
Reg
Reg
Reg
Reg
Reg
Reg
-
8/10/2019 V4 Tech Module DSP
10/24
Virtex-4 DSP 10
Xilinx 4th Generation XtremeDSP
00
5050
100100
150150
200200
250250
VirtexVirtex--EE VirtexVirtex--IIII VirtexVirtex--II ProII Pro VirtexVirtex--44
DSP BandwidthDSP Bandwidth
GMACs/sGMACs/s
1st Generation
11GMACs/s
11stst GenerationGeneration
11GMACs/s11GMACs/s
2nd Generation
32GMACs/s
22ndnd GenerationGeneration
32GMACs/s32GMACs/s
3rd Generation
111GMACs/s
33rdrd GenerationGeneration
111GMACs/s111GMACs/s
4th Generation
256GMACs/s
44thth GenerationGeneration
256GMACs/s256GMACs/s
Virtex-4 XtremeDSP Highest DSP
Bandwidth Available
VirtexVirtex--4 XtremeDSP Highest DSP4 XtremeDSP Highest DSP
Bandwidth AvailableBandwidth Available
-
8/10/2019 V4 Tech Module DSP
11/24
Virtex-4Solutions
-
8/10/2019 V4 Tech Module DSP
12/24
Virtex-4 DSP 12
Full Custom Design Results in Higher
PerformanceScalable 500MHz performance is impossible with
Standard Cell libraries and Standard Cell design flow
Scalable 500MHz performance is impossible withScalable 500MHz performance is impossible with
Standard Cell libraries and Standard Cell design flowStandard Cell libraries and Standard Cell design flow
Ar ithmatica Parallel Counter
20% faster performance and
uses less area
Ar ithmaticaAr ithmatica Parallel CounterParallel Counter
20% faster performance and20% faster performance and
uses less areauses less area
Integrated Cascade
Routing enables
scalable performance
Integrated CascadeIntegrated Cascade
Routing enablesRouting enables
scalable performancescalable performance
Ar ithmatica A+Adder
20% faster than
other implementations
Ar ithmaticaAr ithmaticaA+AdderA+Adder
20% faster than20% faster than
other implementationsother implementations
Pipeline Registers
enable 500Mhzperformance
Pipeline RegistersPipeline Registers
enable 500Mhzenable 500Mhzperformanceperformance
2X the performance
of Virtex-II Pro
2X the performance2X the performance
of Virtexof Virtex--II ProII Pro
-
8/10/2019 V4 Tech Module DSP
13/24
Virtex-4 DSP 13
Wide Filters At Full SpeedWithin the Virtex-4 DSP Slice Column
Systolic N-tap FIR
Scalable N-level deep implementation
500MHz performance at N-level deep
Uses Integrated Pipeline Registers tosynchronize filter inputs
Utilizes Input and Output Cascade Routing
Build massively parallel 512-TAP FIR filter
in a single device achieving
256 GMACs/s performance
Build massively parallel 512Build massively parallel 512--TAP FIR filterTAP FIR fi lter
in a single device achievingin a single device achieving
256 GMACs/s performance256 GMACs/s performance
Equivalent implementation would consume
444 Embedded Mult ipliers and 77,008 LCs
and would only achieve half the performance
Equivalent implementation would consumeEquivalent implementation would consume
444 Embedded Multipliers and 77,008444 Embedded Mult ipliers and 77,008 LCsLCs
and would only achieve half the performanceand would only achieve half the performance
-
8/10/2019 V4 Tech Module DSP
14/24
Virtex-4 DSP 14
Lowest Power DSP
FrequencyFrequency
(MHz)(MHz)
Number ofNumber ofXtremeDSP SlicesXtremeDSP Slices
Power (Power (mWmW))
300300200200 400400 500500100100
10001000
400400
200200
512512
300300
100100200200
800800
00
400400
600600
18x18 Multiply-Accumulate
scalable power efficiency
at 2.3mW/100MHz
18x18 Multiply18x18 Multiply--AccumulateAccumulate
scalable power efficiencyscalable power efficiency
at 2.3at 2.3mmW/100MHzW/100MHz
20 GMACs/s for .46 W20 GMACs/s for .46 W20 GMACs/s for .46 W
60 GMACs/s for 1.38 W60 GMACs/s for 1.38 W60 GMACs/s for 1.38 W
Note: Power efficiency achieved using the DSP48 component wit h a togg le rate of 38%.
It is not an entire MAC with BRAM, control path sequencer/address generator in fabric, and including external routing.
-
8/10/2019 V4 Tech Module DSP
15/24
Virtex-4 DSP 15
35x18 Complex Multiply
Imaginary Portion
35x18 Complex Multiply35x18 Complex Multiply
Imaginary PortionImaginary Portion
35x18 Complex Multiply
Real Portion
35x18 Complex Multiply35x18 Complex Multiply
Real PortionReal Portion
High-Speed, Low Power
Complex Multiply
35x18 Complex Multiplyat 500MHz
35x18 Complex Multiply35x18 Complex Multiplyat 500MHzat 500MHz
Real and Imaginary
35x18 Complex Multiply
consumes only 92mW at 500MHz
Real and ImaginaryReal and Imaginary
35x18 Complex Multiply35x18 Complex Multiply
consumes only 92mW at 500MHzconsumes only 92mW at 500MHz
Complex filter implementation Register the inputs using minimal
external resources Synchronize data using pipeline
delay elements
-
8/10/2019 V4 Tech Module DSP
16/24
-
8/10/2019 V4 Tech Module DSP
17/24
Virtex-4 DSP 17
Virtex-4 DSP SolutionsChoose the Right Combination
DSP SlicesDSP Slices
MemoryMemory
LogicLogic
DCMsDCMs
VirtexVirtex--4 LX4 LX
Logic PlatformLogic Platform
VirtexVirtex--4 SX4 SX
DSP PlatformDSP Platform
VirtexVirtex--4 FX4 FX
Full Featured PlatformFull Featured Platform
FeaturesFeatures
256GMACs/s256GMACs/s
DSP BandwidthDSP Bandwidth
96GMACs/s96GMACs/s
DSP BandwidthDSP Bandwidth
48GMACs/s48GMACs/s
DSP BandwidthDSP Bandwidth
-
8/10/2019 V4 Tech Module DSP
18/24
Virtex-4 DSP 18
Dynamically Programmable
DSP Op Modes
Enables time-division
multiplexing for DSP Over 40 different modes
Each XtremeDSP Slice
individually controllable
Change operation in a singleclock cycle
Control functionality fromlogic, memory or processor
6 5 4 3 2 1 0
Zero 0 0 0 0 0 0 0 +/- Cin
Hold P 0 0 0 0 0 1 0 +/- (P + Cin)
A:B Sel ect 0 0 0 0 0 1 1 +/- (A:B + Cin )
Multiply 0 0 0 0 1 0 1 +/- (A * B + Cin)
C Select 0 0 0 1 1 0 0 +/- (C + Cin)
Feedback Add 0 0 0 1 1 1 0 +/- (C + P + Cin)
36-Bit Adder 0 0 0 1 1 1 1 +/- (A:B + C + Cin)
P Cascade Select 0 0 1 0 0 0 0 PCIN +/- Cin
P Cascade Feedback Add 0 0 1 0 0 1 0 PCIN +/- (P + Cin)
P Cascade Add 0 0 1 0 0 1 1 PCIN +/- (A:B + Cin)
P Cascade Multiply Add 0 0 1 0 1 0 1 PCIN +/- (A * B + Cin)
P Cascade Add 0 0 1 1 1 0 0 PCIN +/- (C + Cin)
P Cascade Feedback Add Ad 0 0 1 1 1 1 0 PCIN +/- (C + P + Cin)P Cascade Add Add 0 0 1 1 1 1 1 PCIN +/- (A:B + C + Cin)
Hold P 0 1 0 0 0 0 0 P +/- Cin
Double Feedback Add 0 1 0 0 0 1 0 P +/- (P + Cin)
Feedback Add 0 1 0 0 0 1 1 P +/- (A:B + Cin)
Multiply-Accumulate 0 1 0 0 1 0 1 P +/- (A * B + Cin)
Feedback Add 0 1 0 1 1 0 0 P +/- (C + Cin)
Double Feedback Add 0 1 0 1 1 1 0 P +/- (C + P + Cin)
Feedback Add Add 0 1 0 1 1 1 1 P +/- (A:B + C + Cin)
C Select 0 1 1 0 0 0 0 C +/- Cin
Feedback Add 0 1 1 0 0 1 0 C +/- (P + Cin)
36-Bit Adder 0 1 1 0 0 1 1 C +/- (A:B + Cin)
Multiply-Add 0 1 1 0 1 0 1 C +/- (A * B + Cin)
Double 0 1 1 1 1 0 0 C +/- (C + Cin)
Double Add Feedback Add 0 1 1 1 1 1 0 C +/- (C + P + Cin)
Double Add 0 1 1 1 1 1 1 C +/- (A:B + C + Cin)
OpMode OutputXYZ
-
8/10/2019 V4 Tech Module DSP
19/24
Virtex-4 DSP 19
Virtex-4 XtremeDSP Slices Useful
For More Than DSP
6:1 high-speed, 36-bit Multiplexer
Use four XtremeDSP Slice and op-modes 500 MHz performance using no programmable logic
Save 1584 LCs to build equivalent function in logic
Dynamic 18-bit Barrel Shifter Use two XtremeDSP slices
Use dedicated cascade routing and integrated 17-bit shift Save 1449 LCs to build equivalent function in logic
36-bit Loadable Counter Use a single XtremeDSP slice, achieve 500 MHz performance
Save 540 LCs to build equivalent function in logic
-
8/10/2019 V4 Tech Module DSP
20/24
Virtex-4 DSP 20
DSP Design Services, Training & HotlineDSP Design Services, Training & Hotl ine Systems ExpertiseSystems Expertise
FPGAs with DSP FunctionsFPGAs with DSP Functions Shortest Design TimeShortest Design Time Major DSP AlliancesMajor DSP Alliances
Dedicated Field SpecialistsDedicated Field Specialists60+ Advanced DSP Cores60+ Advanced DSP Cores 60+ DSP Development Boards60+ DSP Development Boards
256256 GMACsGMACs
PerformancePerformance
50+ Field DSP
Experts
Comprehensive Library
Fast Turnaround
Exceptional Performance
Xilinx Design Services, Education and Support
Lowest CostLowest Cost
(90nm)(90nm)
Distributor
Services &
Training
DSP Division Experts
Tools, IP Solutions
-
8/10/2019 V4 Tech Module DSP
21/24
Virtex-4 DSP 21
Xilinx FPGA DSP Design
FlowFPGA Designer andFPGA Designer and
System ArchitectSystem Architect
Synthesize DesignSynthesize DesignSynthesize Design
VHDL and Coregen OutputVHDL andVHDL and CoregenCoregenOutputOutput
Specify DesignSpecify DesignSpecify DesignImplement In HardwareImplement In HardwareImplement In Hardware
DSP Architectural WizardDSP Architectural Wizard
-
8/10/2019 V4 Tech Module DSP
22/24
Summary
-
8/10/2019 V4 Tech Module DSP
23/24
Virtex-4 DSP 23
Virtex-4 XtremeDSP Enabling next generation high-performance DSP
Highest Performance
512 XtremeDSP slices at 500MHz
256 GMACs/s DSP bandwidth
Lowest Power
2.3mW/100MHz Most Value
Operate the XtremeDSP slice in over 40 different modes
Highest DSP bandwidth per dollar solution available
-
8/10/2019 V4 Tech Module DSP
24/24