Academy - Xilinx DSP Page 1
Existing DSP Solutions
• Fixed-function DSP devices
• ASICs
• Standard DSP processors (only programmable solution)
But what do you do when the fastest DSP processor is not fast enough?
Add more DSP processors?
Design a custom gate array?
Performance Through Parallel Processing
[Figure: DSP processors compared with a Xilinx FPGA built from many parallel MAC units.]
• DSP processor: time-share 1, 2, or 4 MACs (CPU & MACs, RAM, ROM, peripherals)
• Xilinx FPGA: as many MACs in parallel as you need
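The trade-off in the figure comes down to simple arithmetic. A direct-form FIR filter needs one multiply-accumulate per tap for every output sample, so the required MAC rate is taps × sample rate. The Python sketch below assumes each MAC unit completes one operation per clock. It estimates how many parallel MACs a filter needs, and how fast a fixed number of time-shared MACs can run. The function names and the 100 MHz clock in the usage note are hypothetical.

```python
import math

def macs_per_second(taps: int, sample_rate_hz: float) -> float:
    """MAC operations per second for a direct-form FIR filter:
    each output sample needs one multiply-accumulate per tap."""
    return taps * sample_rate_hz

def parallel_macs_needed(taps: int, sample_rate_hz: float, clock_hz: float) -> int:
    """MAC units that must run in parallel, assuming (hypothetically)
    that each MAC finishes one multiply-accumulate per clock cycle."""
    return math.ceil(macs_per_second(taps, sample_rate_hz) / clock_hz)

def max_sample_rate(taps: int, num_macs: int, clock_hz: float) -> float:
    """Highest sample rate reachable when num_macs MAC units are
    time-shared across all taps, one operation per clock each."""
    return num_macs * clock_hz / taps
```

For example, `parallel_macs_needed(64, 20e6, 100e6)` returns 13. By contrast, `max_sample_rate(64, 4, 100e6)` shows that four time-shared MACs top out at 6.25 MSPS for the same 64-tap filter.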
Xilinx DSP Advantages
[Figure: 16-bit FIR filter benchmark, 320C6x vs. XC4000XL. Devices shown: S30, S40, 4036, 4062, 4085, 40125, and 320C6x. Axes: performance in GigaMACs (1 to 5) and cost ($0.05 to $0.25).]
• 10X the performance, 1/5th the cost
• Plus faster time-to-market
Xilinx DSP: Complete High-Performance Programmable Solution
• Xilinx FPGAs: Spartan, 4K, Virtex
• Design tools and DSP IP
  – LogiCORE & AllianceCORE
  – CORE Generator software
  – Reference designs on PreLINX internal web
  – Elanix SystemView integration
• DSP prototyping boards
• DSP Starter Kit
• DSP support
  – Ph.D. engineers
  – DSP FAEs
  – Design services
[Figure: modeling tools and DSP functions]
2D Discrete Cosine Transform Reference Design
Smart-IP Technology

Device family: XC4000
Size: 1200 CLBs, including buffers
Performance: 75 MHz, -08 speed grade

Features
• Under 2 µs continuous transform time
• 8 x 8-point 2D DCT
• 12-bit two's-complement input data
• 16-bit internal coefficient resolution
• Efficient bit-serial (one bit per clock) distributed arithmetic algorithm
• Distributed RAM corner-turning buffer
• Non-parameterizable core
• Sign-extend inputs narrower than 12 bits

Applications
• Video image compression
• JPEG, MPEG building block
• Video conferencing

Deliverables
Design source: Netlist (with RLOCs)
Behavioral model: No
Instantiation code: VHDL and Verilog
Test bench: Test vectors available
Reference design: Data sheet
Core source: PreLINX (Turney)

[Figure: 8 x 8 2D DCT block diagram. Input data (64 12-bit pixels) passes through a 1D-DCT engine, a 64x12x2 distributed-RAM corner-turn buffer, and a second 1D-DCT engine, producing 64 16-bit output values. Interface signals: Clock, Load Data, Valid (data on output), Busy (don't write data into core).]

Example PerLINX Reference Design
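The reference design computes the 2D DCT as two 1D passes separated by a corner-turn buffer. The Python sketch below models that dataflow in floating point, assuming an orthonormal DCT-II. It does not reproduce the core's 16-bit fixed-point distributed-arithmetic datapath. `sign_extend` illustrates the input preparation the data sheet calls for when samples are narrower than 12 bits. All names are illustrative.

```python
import math

N = 8  # 8 x 8 block, as in the reference design

def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def dct_1d(x):
    """Orthonormal 8-point DCT-II in floating point (an assumption; the
    core uses fixed-point distributed arithmetic instead)."""
    out = []
    for k in range(N):
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        s = sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N))
        out.append(scale * s)
    return out

def corner_turn(block):
    """Transpose: write by rows, read by columns (the corner-turn buffer's job)."""
    return [list(col) for col in zip(*block)]

def dct_2d(block):
    rows = [dct_1d(r) for r in block]   # first 1D-DCT engine, row by row
    turned = corner_turn(rows)          # corner-turn buffer
    cols = [dct_1d(c) for c in turned]  # second 1D-DCT engine, column by column
    return corner_turn(cols)            # restore row-major order for comparison
```

In hardware the second pass naturally emits results in column order. The final transpose here only restores row-major order so the output matches the textbook F[u][v] layout.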
GVA-200 FPGA DSP Prototyping Board
DSP Market Opportunity
$10B total DSP ICs, 1998
• DSP processors: $3.8B
• FASICs: $5.8B
• uP (custom, embedded uP building blocks, multimedia uP): $0.4B

Xilinx DSP TAM
• Multiple processors
• High-cost processors
• Medium-volume gate arrays, embedded processors
• Prototyping
• Building blocks
DSP Processor Market Growth
[Figure: DSP processor market, 1997 to 2002, in $ millions (axis $2,000 to $12,000). The chart highlights the high-performance DSP processor portion, growing at >33% CAGR.]
Xilinx DSP TAM grows faster than overall market
Source: Forward Concepts & Xilinx estimate
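As a worked check on what ">33% CAGR" implies, compound annual growth rate relates start and end values over n years as shown below. For illustration only, assume the rate holds across the five years charted (1997 to 2002; this span is an assumption). The high-performance segment would then grow by a factor of more than about 4.16.

```latex
\mathrm{CAGR} = \left(\frac{V_{\mathrm{end}}}{V_{\mathrm{start}}}\right)^{1/n} - 1,
\qquad
\frac{V_{2002}}{V_{1997}} = (1 + \mathrm{CAGR})^{5} > (1.33)^{5} \approx 4.16
```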
Xilinx Portion of DSP Processor Market
• More performance required for more new designs
• Sample rates above 1 MHz = Xilinx DSP opportunity
[Figure: DSP design starts. New design starts split into > 1 MHz and < 1 MHz data sample rates, with a 26% segment shown.]
Survey source: Forward Concepts
FPGAs Now A Factor In DSP
[Figure: Chips employed for DSP functions (# responses): fixed-point DSP processors, floating-point DSP processors, ASICs, RISC, FPGAs.]
Source: Forward Concepts ’98 survey
“FPGAs represent a fast-growing market segment and are being increasingly employed for high-performance computation, often with the added benefit of reconfigurability”
DSP Processor Market Segments
DSP processors: $3.8B
Xilinx DSP = high performance, more than one processor, MHz sample rates
• Communications 74% (wireless, modems)
• Computer 13% (hard disk drives)
• Consumer 2%
• Industrial 3%
• Instrumentation 2.3%
• Military 3.5%
• Office automation 2%
Communications: Largest FPGA Segment
• Satellite modems
• Cable modems
• Copper (twisted pair)
• xDSL
• Modem banks
• Telecom test equipment
• Wireless
• Cellular / PCS
• Base stations
• Test equipment
• Spread spectrum
• Wireless local loop
• Microwave internet
• Smart antennas
Communication Applications
Communications Applications
• Common functions required:
  – Filters: interpolation, decimation, standard
    • Sample rates from 2 MSPS to 80 MSPS
    • From 7 to 128 taps
  – NCO and mixer
    • 32-bit phase accumulator, 1024-point table, 8- to 12-bit input data
    • 10 to 60 MSPS
  – Other
    • Rectangular-to-polar conversion, power measurement, delay elements
    • 8- to 16-bit input, 1 to 20 MSPS
  – Multipliers
  – Reed-Solomon encoders/decoders, turbo codes
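The NCO and mixer listed above can be sketched as a behavioral model in a few lines of Python. A 32-bit phase accumulator advances by a tuning word every clock, and its top 10 bits address a 1024-point sine table. The output frequency is therefore tuning word × f_clk / 2^32. The mixer multiplies input samples by the NCO output. The 12-bit table amplitude, the function names, and the 40 MHz clock in the test are assumptions; this is not a Xilinx core interface.

```python
import math

PHASE_BITS = 32      # phase accumulator width (from the slide)
TABLE_BITS = 10      # 2**10 = 1024-point table (from the slide)
AMPLITUDE_BITS = 12  # output word width: hypothetical

# One full sine period quantized to signed integers.
_AMP = (1 << (AMPLITUDE_BITS - 1)) - 1
SINE_TABLE = [round(_AMP * math.sin(2 * math.pi * i / (1 << TABLE_BITS)))
              for i in range(1 << TABLE_BITS)]

def tuning_word(f_out_hz: float, f_clk_hz: float) -> int:
    """Phase increment that makes the accumulator wrap f_out times per second."""
    return round(f_out_hz / f_clk_hz * (1 << PHASE_BITS))

def nco(phase_inc: int, n_samples: int, phase: int = 0):
    """Yield n_samples table values; the top TABLE_BITS of the phase index the table."""
    mask = (1 << PHASE_BITS) - 1
    for _ in range(n_samples):
        yield SINE_TABLE[phase >> (PHASE_BITS - TABLE_BITS)]
        phase = (phase + phase_inc) & mask  # accumulator wraps modulo 2**32

def mix(samples, lo):
    """Digital mixer: multiply each input sample by the local-oscillator output."""
    return [s * c for s, c in zip(samples, lo)]
```

With a 40 MHz clock, `tuning_word(10e6, 40e6)` is 2**30, so the table is read every 256 entries and `nco` yields 0, 2047, 0, -2047 (one quarter-period per sample).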
Satellite Modem Board
Other High Performance Applications
• Video and image processing
  – Medical: ultrasound, MRI, CT
  – Industrial: security, manufacturing
  – Set-top boxes
  – Digital TV broadcast equipment
  – HDTV
• Military: radar, sonar, encryption, guidance, navigation, software radios
• Instrumentation
• Office automation
Altera DSP Cores
Altera IP:
• LPM Multiplier, CFFT (’97DS, FS7)

3rd Party IP:
• HammerCore: Reed-Solomon, adaptive equalizer
• ISS: square root; floating-point add, divide, multiply; rank order; median filter; IIR; FIR; decimating FIR; integer divide; DCT, FFT; image processing library; JPEG; ADPCM; adaptive FIR; Reed-Solomon; Viterbi; trellis; convolutional encoder; wavelet filter; adaptive equalizer; block & convolutional interleaver
• CAST: Viterbi
• FASTMAN: decimating filter, wavelet filter
• Ktech Telecom: convolutional interleaver (cable & PCS)
• Ncomm: telephone tone generator
• Nova Engineering: NCO, digital modulator, linear feedback shift register, binary correlator

Reference Designs:
• SDA and parallel FIR filter (FS1): 8, 16, 24, 32, and 64 taps; any data width; pipelining, symmetry
• 3x3 video filter
• Floating-point add/sub (FS2)
• Floating-point multiply (FS4)
• Integer divide (FS3)
• Rounder (FS5)
• Saturator (FS6)
• RGB2YCrCb, YCrCb2RGB (’97DS)

New Cores:
• CORDIC
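CORDIC is a natural fit for the rectangular-to-polar conversion listed among the communications functions. The sketch below is a floating-point behavioral model of vectoring-mode CORDIC, assuming 16 iterations. Each step rotates the vector toward the x axis by ±atan(2^-i), which hardware implements with shifts and adds, and the accumulated angle gives the phase. It is not the interface of any vendor's core; the names and iteration count are hypothetical.

```python
import math

ITERATIONS = 16  # assumed; each iteration adds roughly one bit of angle precision

# Elementary angles atan(2**-i) and the CORDIC gain K = prod(sqrt(1 + 2**-2i)).
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]
GAIN = 1.0
for _i in range(ITERATIONS):
    GAIN *= math.sqrt(1 + 2.0 ** (-2 * _i))

def cordic_to_polar(x: float, y: float):
    """Return (magnitude, angle in radians) of (x, y) via vectoring-mode CORDIC."""
    # Iterations only converge for |angle| < ~1.74 rad, so fold the left
    # half-plane over by 180 degrees first and correct the angle at the end.
    flipped = x < 0
    if flipped:
        x, y = -x, -y
    z = 0.0
    for i in range(ITERATIONS):
        t = 2.0 ** -i  # stands in for a right shift by i bits in hardware
        if y > 0:      # rotate clockwise toward the x axis
            x, y = x + y * t, y - x * t
            z += ANGLES[i]
        else:          # rotate counter-clockwise toward the x axis
            x, y = x - y * t, y + x * t
            z -= ANGLES[i]
    if flipped:
        z = z + math.pi if z <= 0 else z - math.pi
    return x / GAIN, z  # x has grown by the CORDIC gain K
```

For example, `cordic_to_polar(3.0, 4.0)` returns approximately (5.0, 0.9273), matching `math.hypot` and `math.atan2` to within the iteration-limited precision.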
Altera Competitive Analysis
• Cores
  – Good multipliers and parallel FIR
  – Inefficient SDA FIR filters
  – Most Altera reference designs are parameterizable
    • Easy to do with AHDL code
    • Not Smart-IP
  – More third-party DSP IP, but not effective
  – Little internal DSP IP
• Customer third-party IP evaluation capability (security)
• No announced DSP system-level tools
Xilinx AllianceCOREs
Xilinx DSP LogiCOREs
20 Million FIR Filters
SDA FIR Filter
• Most commonly used core
• Parallel in, parallel out
• Bit-serial processing
• All taps processed in parallel
• Full precision through the entire core
• One clock cycle required for each data bit
• One additional clock cycle for symmetric filters
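Serial distributed arithmetic replaces the multipliers with a look-up table addressed by one bit from every tap at once. That is why the core needs one clock per data bit rather than one per tap. The Python sketch below is a behavioral model under stated assumptions: integer coefficients, two's-complement inputs of width `bits`, and a single table holding all 2^N coefficient sums. The function names are hypothetical. For long filters the table is usually partitioned into smaller tables whose outputs are summed. The extra clock cycle for symmetric filters is consistent with pre-adding mirrored taps, which widens each word by one bit.

```python
def build_lut(coeffs):
    """Distributed-arithmetic table: entry `addr` is the sum of every
    coefficient h[k] whose bit k is set in `addr` (2**len(coeffs) entries)."""
    n = len(coeffs)
    return [sum(coeffs[k] for k in range(n) if (addr >> k) & 1)
            for addr in range(1 << n)]

def sda_output(lut, taps, bits):
    """One filter output from the delay-line contents `taps`
    (signed ints that fit in `bits`-bit two's complement)."""
    acc = 0
    for b in range(bits):             # one clock cycle per data bit, LSB first
        addr = 0
        for k, x in enumerate(taps):  # bit b of every tap, gathered in parallel
            addr |= ((x >> b) & 1) << k
        partial = lut[addr] << b      # weight the table output by 2**b
        # The MSB of a two's-complement word has weight -2**(bits - 1).
        acc += -partial if b == bits - 1 else partial
    return acc

def sda_fir(coeffs, samples, bits):
    """Filter a sequence: shift each sample into the delay line, then run SDA."""
    lut = build_lut(coeffs)
    taps = [0] * len(coeffs)
    out = []
    for s in samples:
        taps = [s] + taps[:-1]                   # parallel in
        out.append(sda_output(lut, taps, bits))  # parallel out, full precision
    return out
```

Because the accumulator never truncates, the result equals the exact convolution. For example, `sda_fir([1, 2], [1, 1], 4)` returns `[1, 3]`.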
The Solution to High Performance DSP
• Higher performance
  – 10X faster, parallel processing
• Lower power
  – 50% to 80% less than DSP processors
• Lower price
  – 1/5th the cost, Spartan FPGAs at ASIC prices
• Faster time to market
  – Simpler design flow, no real-time software
Add a Xilinx FPGA, not more processors