challenges and opportunities for fpga platforms
DESCRIPTION
Challenges and opportunities for FPGA platforms. Ivo Bolsens Xilinx Research Labs. Thanks to. Bill Carter David Eden Erich Goetting Alireza Kaviani Bernie New Cameron Patterson Steve Trimberger Tim Tuan. Overview. FPGA’s ride the tide Opportunities Challenges. - PowerPoint PPT PresentationTRANSCRIPT
-
Challenges and opportunities for FPGA platformsIvo Bolsens Xilinx Research Labs
-
Thanks toBill CarterDavid EdenErich Goetting Alireza KavianiBernie NewCameron PattersonSteve TrimbergerTim Tuan
-
OverviewFPGAs ride the tideOpportunitiesChallenges
-
ASICs buck the tide, FPGAs ride the tideProcess TechnologyPerformanceArchitectureCostFlexibilityMarket trends
-
Moores LawA tale of two numbers : What process people dont tell you CD80nm240nm320nm160nm2.71.34.56.5(nm)CDTox
-
Trend: Line Widths Smaller Than the Wavelength of Light -0.1000.2000.3000.4000.5000.6000.70019881990199219941996199820002002Process Geometry (micron)Optical Processing WavelengthProcess Geometry
-
Painting a one cm line with a three cm brushCourtesy : IBM
-
Gate OxideAbout 10 molecular layers of SiO2 for this 150nm example90nm technology is about half the thicknessPolysilicon GateGate OxideSilicon crystal
-
FPGAs are ahead of the curveProcess Technology Feature Size (nm)Year35025018015013010070979899000102030405
-
FPGA Process Technology: Leading the transistor count in MOS Logic2V6000:~250 Million transistors per chip !2V8000: ~360 Million transistors per chip !
-
Where are we todayLogic Cells125K105KBlock RAM10Mb3MbMultipliers5561683.125Gb/sMGTs424PowerPC CPUs840Mb/s LVDS340442XC2VP125XC2V8000= 350M tranistors
-
Intels RoadmapSource : IntelFPGAs are leading
-
Gate count requirement for ASICsFPGAs can address very large part of the ASIC market todaySource: IMS
-
Performance requirement for ASICsFPGAs can address very large part of the ASIC market todaySource: IMS 2000
-
A Decade of Progress11010010001/911/921/931/941/951/961/971/981/991/001/01YearCapacitySpeedPriceVirtex &Virtex-E(excl. Block RAM)XC4000100x10x1xSpartan1000xVirtex-II(excl. Block RAM)
-
The Cost/Volume Crossover
Unit VolumeRelative Cost0.11101001000101001,00010,000100,0001,000KASIC CostFPGA Cost
-
Are Transistors Free?
-
Performance Scaling
-
Localisation of storage and computingll/2++++storestore2Courtesy :IMEC
-
Market Requirements607080900010/ human
-
Electronics Industry DynamicsTimeCable DecodersCustom Features(Pay-Per-View)DigitalVCRSatellite/Cable+ DigitalVCRResidential Gateway(Broadband access) Market Size ($)NTSC NTSCSmart cards(DES)NTSCDESATAPINTSCDESATAPIDBSDOCSISNTSCDESATAPIDBSDOCSISHomePNAHomeRFHomePLUGBluetoothHiperlan2DSL...Dramatic increasein new standards New Products Take less time to reach high volumes Shorter Product Life Cycles Many standards / More Interoperability
-
Complex ASIC DesignThe Shrinking Window of InnovationSynthesis16% Average iterations between design and layout = 20(Source Electronic Systems Jan 99)
-
Simpler/Faster Design Flows2:1 proven Time-to-Market AdvantageNo silicon design or verification stepsMore design flexibility through later design freezeSpecASICFlowDesignFreezeDesign andVerificationSiliconPrototypeSystemIntegrationSiliconProductionSpecFPGAFlowDesignFreezeDesign andVerificationSystemIntegration
-
Todays Product Lifecycle37% of new digital products were late to marketEntering the market first can result in up to a 40% greater total profit contribution over the products life vs. the #2 entrantReduced profit for latecomersProfit for first to MarketProfitTime
-
Todays Product Lifecycle37% of new digital products were late to marketEntering the market first can result in up to a 40% greater total profit contribution over the products life vs. the #2 entrantIRL extendsproduct life in marketProfitTime
-
Virtex-II Pro PowerPC Technology32-bit RISC CPU, Harvard Architecture130nm CMOS with 1.5V Operation456 Dhrystone MIPS at 300MHz32 x 32-bit General Purpose RegistersHardware Multiply / Divide5-Stage Execution Pipeline16KB D-Cache, 16KB I-CacheMemory Management Unit (MMU)High-Bandwidth Interface to LogicBuilt-In Hardware TimersBuilt-In JTAG Debug and Trace supportIBM PowerPC 405 RISC CPU3.8 sq mm = 1% of 2VP100
-
High Performance4008001600200100Virtex-II Pro PowerPC 4054569121824Altera Excalibur Arm 9220Dhrystone MIPS1 CPU
-
Low PowerPC: 0.59mW/MIPSPower (mW)Performance (Dhrystone MIPS)501002003004001502503501002003004000Full-Custom IBM CPU Design1.5V 130nm CMOS TechnologyLow-K DielectricIP-Immersion100mW = 1 LED Indicatoror 169 MIPS!
-
IP-ImmersionEmbed multiple IP blocks of arbitrary shape withhigh-bandwidth connectivity to FPGA core logic, memory & I/O
Technologies Enabling IP-ImmersionActive Interconnect Segmented RoutingMetal 1Metal 2Metal 3Metal 4Metal 5Metal 6Metal 7Metal 8Silicon SubstratePolyAdvanced hard-IP block(e.g. PowerPC CPU)Metal HeadroomMetal 9
-
System Architecture OptionsLogic-Centric ArchitecturePowerPC Executes Entirely out of CacheNo FPGA Logic, Memory, or I/O Used10-20 Pages of C-Code or MoreUse as Complex Algorithmic EngineWeb ServerEncryption/DecryptionPacket Processor CPU-Centric ArchitecturePowerPC forms Heart of Embedded SystemOn & Off-Chip PeripheralsExternal Interfacese.g. PCI, 3GIO, Gb Ethernet, ZBT SRAMCoreConnect On-Chip BusTies System TogetherPeripherals implemented in FPGA LogicTypically Runs Embedded OSExternal DevicesExternal InterfacesExternal DevicesExternal Interfaces
-
HW accelerationControl TasksControl TasksCode Stack (C++)Control TasksInterleaverReed-SolomonViterbiPowerPCProcessor RAMPowerPC with Application-Specific Hardware AccelerationVirtex-II ProConcatenated FEC EngineViterbiInter- leaverReed- SolomonProcessing timeTraditionalXTREME Processing
-
HW/SW InterfacingProvides Specialized Connectivity Between PowerPC & FPGA LogicDual-Port BlockRAM MemoryCPU & Logic Each Own 1 PortHigh-Bandwidth6.4Gb/secLow-LatencyNon-CachingDesigned for Communications Data ProcessingEnables PowerPC & FPGA Logic to Work together on Complex Problems
BlockRAMs
-
APU ControllerMicro-controller style interface to fabric for control plane applicationsBenefits:Up to 10x faster than memory mapped interfaceSaves PLB bandwidth for code execution Minimizes pipeline stalls
-
Creating Complete Communications SolutionsGb Ethernet (1000BaseLX/SX/CX)RocketIO is PHY (1000Base-SX/LX)PHYftptelnetrloginmailetcUpper Layers on PowerPC
-
Infiniband ExampleCPU Makes Communications Practical, Easier, & CheaperInfiniBand TCAbuilt withCPU + fabricSources: Intel, Xilinxor builtwith fabric onlyCPU Based Solution 8 Times Less Area
-
Configurable Platform
-
The MicroBlaze High Performance Soft CPUTMLocalOPBBusCoreConnect Technologytm
-
Incremental Designlessens the impact of design changesNext Generation technologyEasy set-up through floorplanning along HDL hierarchy boundariesChanges only affect the module that was changedThe remainder of the design stays locked and intactTiming repeatabilitypreserves routingFaster turnaround for localized design changes
-
Partial ReconfigurabilityFPGA Flexibility for the FieldRe-program part of an FPGA while its still runningVirtex-II and Virtex-E011011User Definable BoundariesFixedLogicPRLogicPRLogicFixedLogicFixedLogic
-
System Exploration
-
Traditional ArchitectureCPMU-BusOther PeripheralsProcessormP BusSystemPCI BusSystemPCI BridgeDeviceRAMFLASHEEPROMG704FramerAAL5ProcessorMemoryInterfaceG703LIUMPC860Generic DesignCPM = Communications Processor ModuleTxRxLineCodingDataFormatLineDecodingDataAlignmentPayloadQualifyPayloadQualityPayloadAssemblyPayloadBufferPayloadProcessingSystemInterfacesMotorola PowerQUICC
-
Traditional Architecture
-
Optimized ArchitectureOther PeripheralsPowerPCProcessormP BusSystemPCI BusSystemPCI BridgeDeviceRAMFLASHEEPROMG704FramerMicroBProcessorMemoryInterfaceG703LIUGeneric DesignFPGA BoundaryTxRxLineCodingDataFormatLineDecodingDataAlignmentPayloadQualifyPayloadQualityPayloadAssemblyPayloadBufferPayloadProcessingSystemInterfacesDual PortBlockRAMFast I/FFIFO
-
Optimized ArchitectureOther PeripheralsPowerPCProcessormP BusSystemPCI BusSystemPCI BridgeDeviceRAMFLASHEEPROMG704FramerMicroBProcessorMemoryInterfaceG703LIUGeneric DesignFPGA BoundaryTxRxLineCodingDataFormatLineDecodingDataAlignmentPayloadQualifyPayloadQualityPayloadAssemblyPayloadBufferPayloadProcessingSystemInterfacesDual PortBlockRAMFast I/FFIFO
-
Interconnect and powerSource : Bill Daly
-
Interconnect and performanceSource : Bill Daly
-
Power AnalysisTypical design5.9uW/CLB/MHz [FPGA00]Fabric power is ~69% of total power2V6000 = 5.9uW/CLB/MHz 8448CLBs 100MHz 69% = 7.5W
Chart2
4485.888
422.4960360626
842.2601030137
712.5239934247
Power Dist
V2 (2v6000) Power Distribution based on our measurements, simulations + XPower measurements
XC2V6000
Typ. UtilizationWrt Utiliz
#BRAM1/73CLBs104.15144.00
#Mult1/2BRAM52.08144.00
#CLBs0.97603.28448
#IBUF1/33CLBs230.40552.00
#OBUF1/23CLBs330.57552.00
DCMNA112
AATUAAWU
mW/MHzMHzmW/MHzMHz
Fabric (avg)0.00591004485.890.00591004984.32
IBUF (avg)0.000510011.090.000510026.58
OBUF (avg)0.0124100411.400.0124100686.97
Multiplier0.1368100712.520.13681001970.24
BRAM0.0809100842.260.08091001164.49
DCM0.178510017.850.1785100214.20
WATUWAWU
mW/MHzMHzmW/MHzMHz
Fabric0.022820034670.590.022820038522.88
IBUF (extrm)0.0024200110.940.0024200265.79
OBUF (extrm)0.06222004114.020.06222006869.69
Multiplier0.13682001425.050.13682003940.47
BRAM0.08092001684.520.08092002328.98
DCM0.178520035.700.1785200428.40
AATUAAWUWATUWAWU
Fabric4485.894984.3234670.5938522.88
IOB422.50713.554224.967135.48
BRAM842.261164.49842.261164.49
Mult712.521970.24712.521970.24
DCM17.85214.2017.85214.20
Total6481.029046.7940468.1949007.28
tuan:- This value is based on XPower (output = speedfile + 10pF).- Assumed 12.5% toggle rate (6.25% 0->1 rate).
tuan:- Assume 50% (random) toggle rate.
Power Dist
00000
00000
00000
00000
Fabric
IOB
BRAM
Mult
DCM
mW
2v6000 Power Distribution
PEST P Dist
V2 (2v6000) Power Distribution based on Power Estimation Worksheet
XC2V6000
Typ. UtilizationTyp UteWrt Utiliz
#CLBs184488448
#IBUF1/33CLBs256.00552.00
#OBUF1/23CLBs367.30552.00
#Mult1/2BRAM57.86144.00
#BRAM1/73CLBs115.73144.00
DCMNA112
AATUAAWU
mW/MHzMHzmW/MHzMHz
Fabric0.00681005744.64SSTL3_II_DCI0.00871007349.76SSTL3_II_DCI
I0.001510039.4214673.0240.001510085.0131324.208
O0.014100495.860869565223938.69356521740.014100745.235976.048
Multiplier0.18751001084.930.18751002700.00
BRAM0.0331100383.050.0331100476.64
DCM0.0431004.300.04310051.60
WATUWAWU
mW/MHzMHzmW/MHzMHz
Fabric0.022920038691.84SSTL3_II_DCI0.022920038691.84SSTL3_II_DCI
I0.0038200191.7414712.4480.0038200413.4531409.216
O0.0642004672.111304347824167.89147826090.0642007021.4436320.496
Multiplier0.23752002748.490.23752006840.00
BRAM0.116852002704.520.116852003365.28
DCM0.05720011.400.057200136.80
AATUAAWUWATUWAWU
Fabric5.747.3538.6938.69
LVTTLIOB0.540.834.867.43
Mult1.082.702.756.84
BRAM0.380.482.703.37
DCM0.000.050.010.14
Total7.7511.4149.0256.47
AATUAAWUWATUWAWU
Fabric5.747.3538.6938.69
SSTL3_II_DCIIOB38.6167.3038.8867.73
Mult1.082.702.756.84
BRAM0.380.482.703.37
DCM0.000.050.010.14
Total45.8377.8883.04116.76
tuan:- This value is based on XPower (output = speedfile + 10pF).- Assumed 12.5% toggle rate (6.25% 0->1 rate).
tuan:- Assume 50% (random) toggle rate.
0
0
0
0
PEST P Dist
00000
00000
00000
00000
Fabric
IOB
Mult
BRAM
DCM
Config P
00000
00000
00000
00000
Fabric
IOB
Mult
BRAM
DCM
Static P
00000
00000
00000
00000
Fabric
IOB
Mult
BRAM
DCM
Watts
Virtex-II 2V6000 Power Distribution
00000
00000
00000
00000
Fabric
IOB
Mult
BRAM
DCM
V2 Configuration Power
Configuration Process
Config Bits
DeviceCLBs# FramesFrame SizeConfig Bits (K)Cbits/CLBSlopeOffset
XC2V406440483236056252565651
XC2V80128404147263549612565651
XC2V25038475221121,69744192565651
XC2V50076892827522,76235962565651
XC2V10001280110433924,08331902565651
XC2V15001920128040325,65929472565651
XC2V20002688145646727,49227872565651
XC2V300035841804531210,49429282565651
XC2V400057602156659215,66027192565651
XC2V600084482508787221,85025862565651
Programming Time (Sec)
Serial Mode
F (MHz)66
XC2V405.45E-03
XC2V60003.31E-01
Parallel (SelectMap) Mode
F (MHz)66
XC2V406.82E-04
XC2V60004.14E-02
Programming Power
Reg Writes
Memory Cell Power
MC-Vgg
MC-Vdd
CRC Power
QUESTIONSHow many of the MCs are VDD and VGG respectively?
tuan:- 10% toggle rate- MEDIUM routing
tuan:- 10% toggle- HIGH routing
tuan:- 50% toggle- MEDIUM routing
tuan:- 50% toggle- HIGH routing
tuan:16-bit CRC is performed on Config bitstream.
tuan:The bitstream consists of a series of packets, each consists of a header (address, packet length) and data. All cfg bits first go to a hold register, where an FSM writes the packet data to the appropriate register.
Actual config bits (all one packet) are broken into frames. Each frame = 80bits per row. Each column of IOB=4frames, IOI=22f, CLB=22f, BRAM=84f, GCLK=4f. Frame address is automatically incremented by FDRI.
Config data comes in as bits or bytes, and are converted to 32-bit words in the LR corner logic. This 32-bit wide data bus is then routed to the center bottom block, and fed into a shift register that shifts the data upward through each row. At each row, 80-bits are distributed to the left half and right half of the FPGA.
tuan:Stages of regs before the MCs: Corner logic:SR, FDBI, HR-Data_out-Data_busCenter logic:shiftreg in the center, data output reg from the shift reg.
V2 Quiescent Power
XPower Values (Are these just data from early versions of the Spreadsheet?)
DeviceSlicesP-staticslopeoffset
2v40256112.50.0070995392114.3655929299
2v80512112.50.0070995392114.3655929299
2v2501536112.50.0070995392114.3655929299
2v50030721500.0070995392114.3655929299
2v100051201500.0070995392114.3655929299
2v150076801500.0070995392114.3655929299
2v2000107522250.0070995392114.3655929299
2v3000143362250.0070995392114.3655929299
2v4000230403000.0070995392114.3655929299
2v6000337923000.0070995392114.3655929299
2v8000465924500.0070995392114.3655929299
2v1000061440562.50.0070995392114.3655929299
V2 Power Estimation Spreadsheet
DeviceSlicesPs (fabric)Ps/slicePs Offset
2v402561130.0089124.12
2v80512113
2v2501536113
2v5003072150
2v10005120150
2v15007680150
2v200010752300
2v300014336300
2v400023040375
2v600033792375
2v800046592525
0.0283981566uW/CLB
0
0
0
0
0
0
0
0
0
0
0
0
P-static
Slices
mW
Static Power in V2 Devices
0
0
0
0
0
0
0
0
0
0
0
0
Ps (fabric)
-
Methodology
Dynamic power VDD2 * (1/LMIN) * Capacity * fCLKVDD: Supply voltageLMIN: Minimum process feature sizeCapacity: Number of gates per devicefCLK: Clock frequencyStatic power VDD * Capacity * exp(-Vth/S)VTH: Theshold voltageS: Subthreshold slopeIdeally 60mV/decade; typically 90-120mV/decade; set at 100mV/decadeNote: Does not include gate leakage (?)
-
Dynamic PowerNormalized to 2001Best fit is a quadratic trend linePredicts 5X by 20071996: 4000EX 1997: 4000XL 1998: 4000XV 1999: Virtex 2000: Virtex-E 2001: Virtex-II
Chart17
0.0677083333
0.07623
0.1488095238
0.3434444444
0.754272
1
Dynamic Power
Sheet1
Xilinx FPGAs Historical Trend
Dynamic Power Ratio = C*V^2*f
199619971998199920002001200220032004200520062007
XC4036EXXC4085XL40250XVVirtexVirtex-eVirtex-IIVirtex-II ProWhitneyRanier??Everest??
Vdd53.32.52.51.81.51.36992619931.22207853281.10303480481.00512473410.92317939610.8535884382
1/Vdd0.20.3030303030.40.40.55555555560.66666666670.729966330.81827801830.90658970660.99490139491.08321308321.1715247715
Feature (nm)0.50.350.250.220.180.150.13390096870.11930683490.1075813310.09795435390.08990881230.0830846067
1/Feature sz22.857142857144.54545454555.55555555566.66666666677.46820586818.38174946749.295293066610.208836665911.122380265112.0359238643
Tilo1.61.20.70.60.420.36
1/Tilo0.6250.83333333331.42857142861.66666666672.3809523812.7777777778
Speed334475.428571428688125.7142857143146.6666666667168.0730158714191.6743764151215.2757369589238.8770975026262.4784580464286.0798185901
125001
Capacity30787448201022764884240140000
65000180000500000112400040740008000000120000001600000024000000310000004100000050000000
Power11.12585846152.19780219785.072410256411.140017230814.769230769218.902532406520.380286608625.222281143527.33144308430.754925576732.2951668462
0.06770833330.076230.14880952380.34344444440.75427211.2798589651.37991523911.70775861911.85056645882.08236475262.1866519219
Sheet1
0
0
0
0
0
0
MHz
Speed
Sheet2
0
0
0
0
0
0
0
0
&A
Page &P
1/Volts
1/Vdd
Sheet3
00
00
00
00
00
00
00
00
1/um
1/Process Feature Size
Sheet4
0
0
0
0
0
0
Watts
00
00
00
00
00
00
&A
Page &P
#Logic Cells
#System Gates
Device Capacity
0
0
0
0
0
0
0
0
0
0
0
0
&A
Page &P
Volts
1/Vdd
0
0
0
0
0
0
0
0
0
0
0
0
1/um
1/Process Feature Size
Xilinx FPGAs Historical Trend
Static Power = K*Vdd*exp(-Vth/S)
199619971998199920002001200220032004200520062007
Vdd53.32.52.51.81.51.36992619931.22207853281.10303480481.00512473410.92317939610.8535884382
Feature s0.50.350.250.220.180.150.13390096870.11930683490.1075813310.09795435390.08990881230.0830846067
VTH0 (n)0.740.610.550.550.3620.3550.2240.27960.23078054490.21125507920.19477583530.1806815238
1/VTHN01.35135135141.63934426231.81818181821.81818181822.76243093922.81690140854.46428571433.57653791134.33312088884.73361399685.13410710475.5346002127
VTH0 (p)0.940.780.60.60.52560.4730.21860.2850.24356099440.22103030140.20231507610.1865217794
1/VTHP01.06382978721.28205128211.66666666671.66666666671.9025875192.11416490494.57456541633.50877192984.10574773064.52426655484.9427853795.3613042032
VTH0 (n-lv)0.2760.2650.116
VTH0 (n)0.740.610.550.44333333320.3620.3550.31048621490.28257507430.25926817640.23951305030.22255528910.2078400106
1/VTHN01.35135135141.63934426231.81818181822.25563909852.76243093922.81690140853.2207549063.53888255163.85701019724.17513784284.49326548844.811393134
VTH0 (p)0.940.780.60.57726315790.52560.4730.42647044540.39216598990.36296943280.33781898250.3159280670.2967015886
1/VTHP01.06382978721.28205128211.66666666671.73231218091.9025875192.11416490492.34482837142.54994065182.75505293222.96016521263.16527749293.3703897733
Capacity65000180000500000112400040740008000000120000001600000024000000310000004100000050000000
I-leak0.00000003980.00000079430.00000316230.00000316230.00023988330.00028183830.00575439940.0015995580.00492260.00771701260.01127824820.0156021613
P-static0.0129384830.47183097143.95284707528.88600022511759.11215617543382.059517517494597.229551942531276.5684601045130315.180093971240453.366373206426885.699713102665891.222728189
P-static0.00000382560.000139510.00116876920.00262739320.5201304551127.97030302459.247787715838.531308931471.0967282296126.2206349421196.8892679976
I-leak0.00000003980.00000079430.00000316230.00003686950.00023988330.00028183830.00078548490.00149365140.00255457250.00402596040.0059490430.0083483355
P-static0.0129384830.47183097143.9528470752103.60315665811759.11215617543382.059517517312912.676419360129205.749426117267626.7776815155125444.36326418225173.391129576356302.132984923
P-static0.00000382560.000139510.00116876920.0306331560.520130455113.81799207048.635492449219.995738493437.091116408366.5787783933105.3506394963
tuan:/tools/xicad/solomon/spice/models/umc/u05-2hw3.lib
tuan:/tools/xicad/solomon/spice/models/umc/u035-3s.lib
tuan:/tools/xicad/solomon/spice/models/umc/u025-22e.lib
tuan:/tools/xicad/solomon/spice/models/umc/u018-1.lib
tuan:/tools/xicad/solomon/spice/models/umc/u015-10.lib
tuan:/tools/xicad/solomon/spice/models/ibm/xi010tt-01.lib
tuan:/tools/xicad/solomon/spice/models/ibm/v212/ESD_85/xi013tt-212.mod
00
00
00
00
00
00
00
00
1/VTHN0
1/VTHP0
ITRS Roadmap 2002 Predictions
Dynamic Power
Process13011510090807065
ITRS Year2001200220032004200520062007
Xilinx Year200220032004200520062007
Speed168.4231.7308.8399517.3563.1673.9
Energy/xtor0.3470.2120.1370.0990.0650.0520.032
Capacity2763484395536978781106
Power/chip16128.004817093.899218572.158421844.05323436.276525708.893623850.6688
Pwr/chip (norm)11.05988926791.15154717711.35441756561.45314171161.59405294821.478835671
tuan:/tools/xicad/solomon/spice/models/umc/u025-22e.lib
Watts
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
tuan:/tools/xicad/solomon/spice/models/umc/u05-2hw3.lib
tuan:/tools/xicad/solomon/spice/models/umc/u035-3s.lib
tuan:/tools/xicad/solomon/spice/models/umc/u025-22e.lib
tuan:/tools/xicad/solomon/spice/models/umc/u025-22e.lib
tuan:/tools/xicad/solomon/spice/models/umc/u018-1.lib
tuan:/tools/xicad/solomon/spice/models/umc/u015-10.lib
\
Watts
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
tuan:/tools/xicad/solomon/spice/models/umc/u025-22e.lib
tuan:ITRS Table1i: Functions per chip (million transistors) for Logic (low-volume, high performance)* Assuming constant die size
tuan:Table 4c: Chip Frequency / 10.
tuan:Table 35a: Energy per device switching transition.
0
0
0
0
0
0
ITRS Roadmap 2002 Predictions
Static Power
Process13011510090807065
ITRS Year2001200220032004200520062007
Xilinx Year200220032004200520062007
Static Power per device5.60E-096.70E-091.00E-081.10E-082.60E-085.30E-085.30E-08
Capacity2763484395536978781106
Static Power/chip1.55E-062.33E-064.39E-066.08E-061.81E-054.65E-05
S Pwr/chip (norm)11.51E+002.84E+003.94E+001.17E+013.01E+01
0
0
0
0
0
0
0
Watts
tuan:ITRS Table1i: Functions per chip (million transistors) for Logic (low-volume, high performance)* Assuming constant die size
tuan:Table 35a: Static power per device (Watts/Device)* ITRS Requirement (projection?)* Gate leakage not included* Assuming non-CMOS technology in 2007
-
Static PowerNormalized to 2001Best fit is a power trend Predicts 100X by 2007Future data points projected using linear trend for 1/VTH
1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007
Chart21
0.0000038256
0.00013951
0.0011687692
0.0306331559
0.5201304551
1
\
Static Power
Sheet1
Xilinx FPGAs Historical Trend
Dynamic Power Ratio = C*V^2*f
199619971998199920002001200220032004200520062007
XC4036EXXC4085XL40250XVVirtexVirtex-eVirtex-IIVirtex-II ProWhitneyRanier??Everest??
Vdd53.32.52.51.81.51.36992619931.22207853281.10303480481.00512473410.92317939610.8535884382
1/Vdd0.20.3030303030.40.40.55555555560.66666666670.729966330.81827801830.90658970660.99490139491.08321308321.1715247715
Feature (nm)0.50.350.250.220.180.150.13390096870.11930683490.1075813310.09795435390.08990881230.0830846067
1/Feature sz22.857142857144.54545454555.55555555566.66666666677.46820586818.38174946749.295293066610.208836665911.122380265112.0359238643
Tilo1.61.20.70.60.420.36
1/Tilo0.6250.83333333331.42857142861.66666666672.3809523812.7777777778
Speed334475.428571428688125.7142857143146.6666666667168.0730158714191.6743764151215.2757369589238.8770975026262.4784580464286.0798185901
125001
Capacity30787448201022764884240140000
65000180000500000112400040740008000000120000001600000024000000310000004100000050000000
Power11.12585846152.19780219785.072410256411.140017230814.769230769218.902532406520.380286608625.222281143527.33144308430.754925576732.2951668462
0.06770833330.076230.14880952380.34344444440.75427211.2798589651.37991523911.70775861911.85056645882.08236475262.1866519219
Sheet1
0
0
0
0
0
0
MHz
Speed
Sheet2
0
0
0
0
0
0
0
0
&A
Page &P
1/Volts
1/Vdd
Sheet3
00
00
00
00
00
00
00
00
1/um
1/Process Feature Size
Sheet4
0
0
0
0
0
0
Watts
00
00
00
00
00
00
&A
Page &P
#Logic Cells
#System Gates
Device Capacity
0
0
0
0
0
0
0
0
0
0
0
0
&A
Page &P
Volts
1/Vdd
0
0
0
0
0
0
0
0
0
0
0
0
1/um
1/Process Feature Size
Xilinx FPGAs Historical Trend
Static Power = K*Vdd*exp(-Vth/S)
123456789101112
Vdd53.32.52.51.81.51.36992619931.22207853281.10303480481.00512473410.92317939610.8535884382
Feature s0.50.350.250.220.180.150.13390096870.11930683490.1075813310.09795435390.08990881230.0830846067
VTH0 (n)0.740.610.550.550.3620.3550.2240.27960.23078054490.21125507920.19477583530.1806815238
1/VTHN01.35135135141.63934426231.81818181821.81818181822.76243093922.81690140854.46428571433.57653791134.33312088894.73361399695.13410710495.5346002129
VTH0 (p)0.940.780.60.60.52560.4730.21860.2850.24356099440.22103030130.20231507610.1865217794
1/VTHP01.06382978721.28205128211.66666666671.66666666671.9025875192.11416490494.57456541633.50877192984.10574773074.52426655494.94278537915.3613042034
VTH0 (n-lv)0.2760.2650.116
VTH0 (n)0.740.610.550.44333333330.3620.3550.3104862150.28257507430.25926817640.23951305030.22255528920.2078400106
1/VTHN01.35135135141.63934426231.81818181822.25563909772.76243093922.81690140853.22075490583.53888255143.8570101974.17513784264.49326548824.8113931338
VTH0 (p)0.940.780.60.57726315790.52560.4730.42647044540.39216598990.36296943270.33781898240.3159280670.2967015886
1/VTHP01.06382978721.28205128211.66666666671.73231218091.9025875192.11416490492.34482837152.54994065182.75505293222.96016521263.1652774933.3703897733
Capacity65000180000500000112400040740008000000120000001600000024000000310000004100000050000000
I-leak0.00000003980.00000079430.00000316230.00000316230.00023988330.00028183830.00575439940.0015995580.00492260.00771701260.01127824820.0156021613
P-static0.0129384830.47183097143.95284707528.88600022511759.11215617543382.059517517494597.229551942531276.5684601045130315.180108833240453.366401281426885.699763206665891.22280577
P-static0.00000382560.000139510.00116876920.00262739320.5201304551127.97030302459.247787715838.531308935871.0967282379126.2206349569196.8892680205
I-leak0.00000003980.00000079430.00000316230.00003686950.00023988330.00028183830.00078548490.00149365140.00255457250.00402596040.0059490430.0083483355
P-static0.0129384830.47183097143.9528470752103.6031563131759.11215617543382.059517517312912.676414328529205.749415877867626.7776599712125444.363227566225173.391068971356302.132895923
P-static0.00000382560.000139510.00116876920.03063315590.520130455113.81799206898.635492446119.995738487137.091116397566.5787783753105.35063947
tuan:/tools/xicad/solomon/spice/models/umc/u05-2hw3.lib
tuan:/tools/xicad/solomon/spice/models/umc/u035-3s.lib
tuan:/tools/xicad/solomon/spice/models/umc/u025-22e.lib
tuan:/tools/xicad/solomon/spice/models/umc/u018-1.lib
tuan:/tools/xicad/solomon/spice/models/umc/u015-10.lib
tuan:/tools/xicad/solomon/spice/models/ibm/xi010tt-01.lib
tuan:/tools/xicad/solomon/spice/models/ibm/v212/ESD_85/xi013tt-212.mod
00
00
00
00
00
00
00
00
1/VTHN0
1/VTHP0
ITRS Roadmap 2002 Predictions
Dynamic Power
Process13011510090807065
ITRS Year2001200220032004200520062007
Xilinx Year200220032004200520062007
Speed168.4231.7308.8399517.3563.1673.9
Energy/xtor0.3470.2120.1370.0990.0650.0520.032
Capacity2763484395536978781106
Power/chip16128.004817093.899218572.158421844.05323436.276525708.893623850.6688
Pwr/chip (norm)11.05988926791.15154717711.35441756561.45314171161.59405294821.478835671
tuan:/tools/xicad/solomon/spice/models/umc/u025-22e.lib
Watts
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
tuan:/tools/xicad/solomon/spice/models/umc/u05-2hw3.lib
tuan:/tools/xicad/solomon/spice/models/umc/u035-3s.lib
tuan:/tools/xicad/solomon/spice/models/umc/u025-22e.lib
tuan:/tools/xicad/solomon/spice/models/umc/u025-22e.lib
tuan:/tools/xicad/solomon/spice/models/umc/u018-1.lib
tuan:/tools/xicad/solomon/spice/models/umc/u015-10.lib
\
Watts
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
tuan:/tools/xicad/solomon/spice/models/umc/u025-22e.lib
tuan:ITRS Table1i: Functions per chip (million transistors) for Logic (low-volume, high performance)* Assuming constant die size
tuan:Table 4c: Chip Frequency / 10.
tuan:Table 35a: Energy per device switching transition.
0
0
0
0
0
0
Watts
Dynamic Power
0
0
0
0
0
0
Speed
0
0
0
0
0
0
Energy/xtor
0
0
0
0
0
0
#Xtors/chip
ITRS Roadmap 2002 Predictions
Static Power
Process13011510090807065
ITRS Year2001200220032004200520062007
Xilinx Year200220032004200520062007
Static Power per device5.60E-096.70E-091.00E-081.10E-082.60E-085.30E-085.30E-08
Capacity2763484395536978781106
Static Power/chip1.55E-062.33E-064.39E-066.08E-061.81E-054.65E-05
S Pwr/chip (norm)11.51E+002.84E+003.94E+001.17E+013.01E+01
0
0
0
0
0
0
0
Watts
Static Power
tuan:ITRS Table1i: Functions per chip (million transistors) for Logic (low-volume, high performance)* Assuming constant die size
tuan:Table 35a: Static power per device (Watts/Device)* ITRS Requirement (projection?)* Gate leakage not included* Assuming non-CMOS technology in 2007
-
Static versus Dynamic
-
The Age of Accumulation
-
Incremental Designlessens the impact of design changesNext Generation technologyEasy set-up through floorplanning along HDL hierarchy boundariesChanges only affect the module that was changedThe remainder of the design stays locked and intactTiming repeatabilitypreserves routingFaster turnaround for localized design changesNew in ISE5.1i
-
Partial ReconfigurabilityFPGA Flexibility for the FieldRe-program part of an FPGA while its still runningVirtex-II and Virtex-E011011User Definable BoundariesFixedLogicPRLogicPRLogicFixedLogicFixedLogic
-
System Level Design FlowEnabling Technologies
-
Reconfigurability : 4th dimensionApplications open, close, and change priority over timeUse model : Object-Oriented Multiprocessing
-
The mother of all complex flows
No established formal models / semanticsDesign capture and simulation issuesHardware/software co-designModular design with uniform interfacesDynamic management of diverse hardware resources creates new place & route issuesBitstream production, storage, loading and linking
-
Combining the Best of FPGA and ASIC: XBlue 100% Programmable100% Fixed LogicTraditional FPGA MarketTraditional ASIC Market Flexible, but expensive Inflexible, but highestperformance/integrationBlockRAMSpecial FunctionsPowerPCCoreBlockRAMEmbeddedFPGA Core
-
ConclusionsFPGAs ride the tideToday : Programmable System PlatformChallengesDesign TechnologyInterconnectLow PowerExploit 3rd and 4th dimension
-
A trend in the 1990s is that process geometries are smaller than the wavelength of light used in wafer fabrication. This creates an obstacle in manufacturing deep submicron geometries.
How do companies such as Xilinx and UMC overcome this obstacle? Through optical proximity correction (OPC) technology. Wafer processing masks have to be corrected as shown on the next slide.
Future system cannot be upscaled 10GHz monoprocessor also for physical reasons. There is 4* more data transport but the average wire length also increases. Memory delay does not scale unless we decrease their size This leads to the need for parallel architectures with distributed embedded storage. Localized computation and storage are the key features to exploit DS technology. Fortunately this is consistent with domain specific multimedia image and sound stream processing as opposed to GP C program execution. The digitization of consumer products has also had another very profound effect on the dynamics of the consumer market place. The digitization of Consumer products has accelerated the rate at which products and standards become obsolete For the manufactures of these products, being able to quickly design and deliver a product to market will have a significant impact on his ability to gain market share.The window of design authoring is the most crucial in the design flow. This is where product innovation and design excellence takes place. It is becoming a smaller portion of the overall design task. Because deep-submicron design calls for additional emphasis on design rules and silicon effects, tools to analyze and correct these problems proliferate. The logic designer must comprehend interconnect timing issues, power and electromigration effects at the time of authoring, to reduce the iterations between design and layout. The average number of iterations is 20 to 23, impacting the efficiency of design.The number of point tools to ensure complete analysis increases as process geometries decrease. The design task becomes longer and more complex. No silicon design or verification requiredProgrammability means design freeze moves much farther to the rightlower cost, easier to use toolsets
from a study published in California Management Review, 1998by Stefan Thomke and Donald Reinertsen from Harvard391 ASIC and FPGA projects surveyedaverage 18 months per ASIC design, 8 months per FPGA design55% time-to-market advantage for FPGA design flow demonstrated
37% of new digital products were late to marketentering late significantly reduces the amount of profit available to a company
data is from a study by Price Waterhouse Coopers
The inherent programmability of FPGAs leads to greater time IN marketFPGAs can now be reprogrammed securely via the internetgreater profitslower cost of supportfix and debug designs while in the fieldlower susceptibility to changing standards or specslonger time in marketWith the PowerPC 405 tightly integrated into the FPGA fabric, the CoreConnect bus architecture can be used to connect the processor to the soft peripherals and other IP built in progrmamable gates. The following example illustrates how the three busses of the CoreConnect standard are used in a typical system.
The main bus is the Processor Local Bus, or P-L-B, which is primarily used for connecting high speed peripherals directly to the processor, such as memory and external bus controllers. It has separate 32-bit addresses and 64-bit data busses for the data and instruction caches of the 405. Furthermore, it has separate data read and write busses for overlapped transfers, and at a typical speed of 133MHz, it has a peak data bandwidth to the 405 of 2.1 GB/s. It supports pipelining, split transactions, and burst and cache line transfers. ***MOUSE CLICK***
CoreConnect also has a secondary bus, the On-chip Peripheral Bus, or O-P-B., used primarily to offload slower peripherals such as UARTs and General Purpose I/Os from the PLB. It has 32-bit address and data busses and supports single cycle data transfers. ***MOUSE CLICK***
The third CoreConnect bus is the Device Control Register, or D-C-R. It provides direct data transfers between the General Purpose Registers and DCR registers included in any peripherals. It is used primarily for system initialization and device configuration.Please not that these bus structures built in soft gates and can be customized by users for optimal performance . ***MOUSE CLICK***And now, lets move on to the System Generator for PowerPC. With this tool designers can specify, configure and assemble a PowerPC system. The tool will automatically integrate the PowerPC with the selected and customized soft peripherals using the CoreConnect bus.The flow is simple;First, the designer has a concept, a block diagram, of a processor system.Second, the designer specifies the architecture by selecting the different peripherals and connecting them together using the GUI shown on the foil.Third, each peripheral is parameterized using a CORE generator like GUI, as shown.Fourth, the designer assembles the entire system by specifying memory maps etc, using the GUI shown on the foil. Here Xilinx has an advantage over Altera. In the Altera tool, the designer is forced to define the memory map while defining each individual peripheral. Its difficult to get the system overview doing it that way. The Xilinx approach is to allow the designer to define global parameters in one global view.
As part of the Xilinx Empower! strategy, the MicroBlaze soft processor complements the Xilinx embedded PowerPC solution. The MicroBlaze processor will utilize the IBM CoreConnect bus for peripherals, the same bus used for the embedded PowerPC, providing customers with a plug-and-play infrastructure for building complex system-on-a-chip. In addition, peripherals built for the MicroBlaze processor are compatible with PowerPC products. Finally, because of it's compact size and the use of the CoreConnect interface, customers will be able to build intelligent processor systems with many multiple MicroBlaze processors as on-chip slave peripherals to the PowerPC.Incremental Design - is focused on minimizing the impact of late cycle design changes.
Based on the pioneering BLIS technology Xilinx introduced, Incremental Design is "next generation" technology that works the way customers work.
The engineer can quickly and easily floorplan along hierarchy boundaries, and then finish the design as normal. Then if a late design change comes along, the engineer only has to re-implement the portion of the design that changed, the rest of the design stays completed and intact.
Every one of our global customers said that they had to implement at least one late-cycle design change in their projects released in the last year.
Incremental Design leads to faster overall design closure with better, repeatable timing results.Partial Reconfigurability is an exciting new FPGA option available for Virtex-II and Virtex-E devices and enabled with the Modular Design ISE software option.
Partial Reconfigurability lets you re-program part of an FPGA while the FPGA continues running. An industry exclusive, Partial Reconfigurability offers customers lower support costs, with less field support required with no product down time.Accumulating oher stuffAdded features dont get dropped -- they get improved. Gates, mem, arithmetic, clocksThe age of accumulation has been gaining speed.As with any evolution, it starts earler than notices, but gains speed quickly: punctuated equilibrium.Top 4 boxes in 3 yearsFor many years, programmable logic consisted of gates and routing. However, Xilinx has been expanding the capabilities offered by programmable logic over the years. First special arithmetic functions were added to increase the performance of commonly used building blocks like adders and counters. In 1992, Xilinx pioneered the XC4000 family, which included RAM into programmable logic and greatly expanded the application space which could be addressed. More recently, Xilinx has brought sophisticated clock management to the masses in the Virtex architecture, along with extremely flexible I/Os. In all cases, these sub-systems have been done in a programmable way that maximizes the flexibility for the FPGA user.Looking into the future, we can see that serial gigabit I/O, microprocessors and mixed signal capability will all be added to FPGAs to create a truly programmable system on a chip.Lest anyone doubt this vision, consider for a moment the steady forward progress of Moores Law. Extrapolated, this says that Xilinx will have FPGAs in 2005 that contain 2 billion transistors. At this point, the cost of adding some of these additional features is negligible. The trick is to implement the features in the correct way, something that Xilinx has demonstrated over the years.
Incremental Design - is focused on minimizing the impact of late cycle design changes.
Based on the pioneering BLIS technology Xilinx introduced, Incremental Design is "next generation" technology that works the way customers work.
The engineer can quickly and easily floorplan along hierarchy boundaries, and then finish the design as normal. Then if a late design change comes along, the engineer only has to re-implement the portion of the design that changed, the rest of the design stays completed and intact.
Every one of our global customers said that they had to implement at least one late-cycle design change in their projects released in the last year.
Incremental Design leads to faster overall design closure with better, repeatable timing results.Partial Reconfigurability is an exciting new FPGA option available for Virtex-II and Virtex-E devices and enabled with the Modular Design ISE software option.
Partial Reconfigurability lets you re-program part of an FPGA while the FPGA continues running. An industry exclusive, Partial Reconfigurability offers customers lower support costs, with less field support required with no product down time.