challenges and opportunities for fpga platforms

Challenges and opportunities for FPGA platformsIvo Bolsens Xilinx Research Labs

Thanks toBill CarterDavid EdenErich Goetting Alireza KavianiBernie NewCameron PattersonSteve TrimbergerTim Tuan

OverviewFPGAs ride the tideOpportunitiesChallenges

ASICs buck the tide, FPGAs ride the tideProcess TechnologyPerformanceArchitectureCostFlexibilityMarket trends

Moores LawA tale of two numbers : What process people dont tell you CD80nm240nm320nm160nm2.71.34.56.5(nm)CDTox

Trend: Line Widths Smaller Than the Wavelength of Light -0.1000.2000.3000.4000.5000.6000.70019881990199219941996199820002002Process Geometry (micron)Optical Processing WavelengthProcess Geometry

Painting a one cm line with a three cm brushCourtesy : IBM

Gate OxideAbout 10 molecular layers of SiO2 for this 150nm example90nm technology is about half the thicknessPolysilicon GateGate OxideSilicon crystal

FPGAs are ahead of the curveProcess Technology Feature Size (nm)Year35025018015013010070979899000102030405

FPGA Process Technology: Leading the transistor count in MOS Logic2V6000:~250 Million transistors per chip !2V8000: ~360 Million transistors per chip !

Where are we todayLogic Cells125K105KBlock RAM10Mb3MbMultipliers5561683.125Gb/sMGTs424PowerPC CPUs840Mb/s LVDS340442XC2VP125XC2V8000= 350M tranistors

Intels RoadmapSource : IntelFPGAs are leading

Gate count requirement for ASICsFPGAs can address very large part of the ASIC market todaySource: IMS

Performance requirement for ASICsFPGAs can address very large part of the ASIC market todaySource: IMS 2000

A Decade of Progress11010010001/911/921/931/941/951/961/971/981/991/001/01YearCapacitySpeedPriceVirtex &Virtex-E(excl. Block RAM)XC4000100x10x1xSpartan1000xVirtex-II(excl. Block RAM)

The Cost/Volume Crossover

Unit VolumeRelative Cost0.11101001000101001,00010,000100,0001,000KASIC CostFPGA Cost

Are Transistors Free?

Performance Scaling

Localisation of storage and computingll/2++++storestore2Courtesy :IMEC

Market Requirements607080900010/ human

Electronics Industry DynamicsTimeCable DecodersCustom Features(Pay-Per-View)DigitalVCRSatellite/Cable+ DigitalVCRResidential Gateway(Broadband access) Market Size ($)NTSC NTSCSmart cards(DES)NTSCDESATAPINTSCDESATAPIDBSDOCSISNTSCDESATAPIDBSDOCSISHomePNAHomeRFHomePLUGBluetoothHiperlan2DSL...Dramatic increasein new standards New Products Take less time to reach high volumes Shorter Product Life Cycles Many standards / More Interoperability

Complex ASIC DesignThe Shrinking Window of InnovationSynthesis16% Average iterations between design and layout = 20(Source Electronic Systems Jan 99)

Simpler/Faster Design Flows2:1 proven Time-to-Market AdvantageNo silicon design or verification stepsMore design flexibility through later design freezeSpecASICFlowDesignFreezeDesign andVerificationSiliconPrototypeSystemIntegrationSiliconProductionSpecFPGAFlowDesignFreezeDesign andVerificationSystemIntegration

Todays Product Lifecycle37% of new digital products were late to marketEntering the market first can result in up to a 40% greater total profit contribution over the products life vs. the #2 entrantReduced profit for latecomersProfit for first to MarketProfitTime

Todays Product Lifecycle37% of new digital products were late to marketEntering the market first can result in up to a 40% greater total profit contribution over the products life vs. the #2 entrantIRL extendsproduct life in marketProfitTime

Virtex-II Pro PowerPC Technology32-bit RISC CPU, Harvard Architecture130nm CMOS with 1.5V Operation456 Dhrystone MIPS at 300MHz32 x 32-bit General Purpose RegistersHardware Multiply / Divide5-Stage Execution Pipeline16KB D-Cache, 16KB I-CacheMemory Management Unit (MMU)High-Bandwidth Interface to LogicBuilt-In Hardware TimersBuilt-In JTAG Debug and Trace supportIBM PowerPC 405 RISC CPU3.8 sq mm = 1% of 2VP100

High Performance4008001600200100Virtex-II Pro PowerPC 4054569121824Altera Excalibur Arm 9220Dhrystone MIPS1 CPU

Low PowerPC: 0.59mW/MIPSPower (mW)Performance (Dhrystone MIPS)501002003004001502503501002003004000Full-Custom IBM CPU Design1.5V 130nm CMOS TechnologyLow-K DielectricIP-Immersion100mW = 1 LED Indicatoror 169 MIPS!

IP-ImmersionEmbed multiple IP blocks of arbitrary shape withhigh-bandwidth connectivity to FPGA core logic, memory & I/O

Technologies Enabling IP-ImmersionActive Interconnect Segmented RoutingMetal 1Metal 2Metal 3Metal 4Metal 5Metal 6Metal 7Metal 8Silicon SubstratePolyAdvanced hard-IP block(e.g. PowerPC CPU)Metal HeadroomMetal 9

System Architecture OptionsLogic-Centric ArchitecturePowerPC Executes Entirely out of CacheNo FPGA Logic, Memory, or I/O Used10-20 Pages of C-Code or MoreUse as Complex Algorithmic EngineWeb ServerEncryption/DecryptionPacket Processor CPU-Centric ArchitecturePowerPC forms Heart of Embedded SystemOn & Off-Chip PeripheralsExternal Interfacese.g. PCI, 3GIO, Gb Ethernet, ZBT SRAMCoreConnect On-Chip BusTies System TogetherPeripherals implemented in FPGA LogicTypically Runs Embedded OSExternal DevicesExternal InterfacesExternal DevicesExternal Interfaces

HW accelerationControl TasksControl TasksCode Stack (C++)Control TasksInterleaverReed-SolomonViterbiPowerPCProcessor RAMPowerPC with Application-Specific Hardware AccelerationVirtex-II ProConcatenated FEC EngineViterbiInter- leaverReed- SolomonProcessing timeTraditionalXTREME Processing

HW/SW InterfacingProvides Specialized Connectivity Between PowerPC & FPGA LogicDual-Port BlockRAM MemoryCPU & Logic Each Own 1 PortHigh-Bandwidth6.4Gb/secLow-LatencyNon-CachingDesigned for Communications Data ProcessingEnables PowerPC & FPGA Logic to Work together on Complex Problems

BlockRAMs

APU ControllerMicro-controller style interface to fabric for control plane applicationsBenefits:Up to 10x faster than memory mapped interfaceSaves PLB bandwidth for code execution Minimizes pipeline stalls

Creating Complete Communications SolutionsGb Ethernet (1000BaseLX/SX/CX)RocketIO is PHY (1000Base-SX/LX)PHYftptelnetrloginmailetcUpper Layers on PowerPC

Infiniband ExampleCPU Makes Communications Practical, Easier, & CheaperInfiniBand TCAbuilt withCPU + fabricSources: Intel, Xilinxor builtwith fabric onlyCPU Based Solution 8 Times Less Area

Configurable Platform

The MicroBlaze High Performance Soft CPUTMLocalOPBBusCoreConnect Technologytm

Incremental Designlessens the impact of design changesNext Generation technologyEasy set-up through floorplanning along HDL hierarchy boundariesChanges only affect the module that was changedThe remainder of the design stays locked and intactTiming repeatabilitypreserves routingFaster turnaround for localized design changes

Partial ReconfigurabilityFPGA Flexibility for the FieldRe-program part of an FPGA while its still runningVirtex-II and Virtex-E011011User Definable BoundariesFixedLogicPRLogicPRLogicFixedLogicFixedLogic

System Exploration

Traditional ArchitectureCPMU-BusOther PeripheralsProcessormP BusSystemPCI BusSystemPCI BridgeDeviceRAMFLASHEEPROMG704FramerAAL5ProcessorMemoryInterfaceG703LIUMPC860Generic DesignCPM = Communications Processor ModuleTxRxLineCodingDataFormatLineDecodingDataAlignmentPayloadQualifyPayloadQualityPayloadAssemblyPayloadBufferPayloadProcessingSystemInterfacesMotorola PowerQUICC

Traditional Architecture

Optimized ArchitectureOther PeripheralsPowerPCProcessormP BusSystemPCI BusSystemPCI BridgeDeviceRAMFLASHEEPROMG704FramerMicroBProcessorMemoryInterfaceG703LIUGeneric DesignFPGA BoundaryTxRxLineCodingDataFormatLineDecodingDataAlignmentPayloadQualifyPayloadQualityPayloadAssemblyPayloadBufferPayloadProcessingSystemInterfacesDual PortBlockRAMFast I/FFIFO

Interconnect and powerSource : Bill Daly

Interconnect and performanceSource : Bill Daly

Power AnalysisTypical design5.9uW/CLB/MHz [FPGA00]Fabric power is ~69% of total power2V6000 = 5.9uW/CLB/MHz 8448CLBs 100MHz 69% = 7.5W

Chart2

4485.888

422.4960360626

842.2601030137

712.5239934247

Power Dist

V2 (2v6000) Power Distribution based on our measurements, simulations + XPower measurements

XC2V6000

Typ. UtilizationWrt Utiliz

#BRAM1/73CLBs104.15144.00

#Mult1/2BRAM52.08144.00

#CLBs0.97603.28448

#IBUF1/33CLBs230.40552.00

#OBUF1/23CLBs330.57552.00

DCMNA112

AATUAAWU

mW/MHzMHzmW/MHzMHz

Fabric (avg)0.00591004485.890.00591004984.32

IBUF (avg)0.000510011.090.000510026.58

OBUF (avg)0.0124100411.400.0124100686.97

Multiplier0.1368100712.520.13681001970.24

BRAM0.0809100842.260.08091001164.49

DCM0.178510017.850.1785100214.20

WATUWAWU

mW/MHzMHzmW/MHzMHz

Fabric0.022820034670.590.022820038522.88

IBUF (extrm)0.0024200110.940.0024200265.79

OBUF (extrm)0.06222004114.020.06222006869.69

Multiplier0.13682001425.050.13682003940.47

BRAM0.08092001684.520.08092002328.98

DCM0.178520035.700.1785200428.40

AATUAAWUWATUWAWU

Fabric4485.894984.3234670.5938522.88

IOB422.50713.554224.967135.48

BRAM842.261164.49842.261164.49

Mult712.521970.24712.521970.24

DCM17.85214.2017.85214.20

Total6481.029046.7940468.1949007.28

tuan:- This value is based on XPower (output = speedfile + 10pF).- Assumed 12.5% toggle rate (6.25% 0->1 rate).

tuan:- Assume 50% (random) toggle rate.

Power Dist

00000

00000

00000

00000

Fabric

IOB

BRAM

Mult

DCM

mW

2v6000 Power Distribution

PEST P Dist

V2 (2v6000) Power Distribution based on Power Estimation Worksheet

XC2V6000

Typ. UtilizationTyp UteWrt Utiliz

#CLBs184488448

#IBUF1/33CLBs256.00552.00

#OBUF1/23CLBs367.30552.00

#Mult1/2BRAM57.86144.00

#BRAM1/73CLBs115.73144.00

DCMNA112

AATUAAWU

mW/MHzMHzmW/MHzMHz

Fabric0.00681005744.64SSTL3_II_DCI0.00871007349.76SSTL3_II_DCI

I0.001510039.4214673.0240.001510085.0131324.208

O0.014100495.860869565223938.69356521740.014100745.235976.048

Multiplier0.18751001084.930.18751002700.00

BRAM0.0331100383.050.0331100476.64

DCM0.0431004.300.04310051.60

WATUWAWU

mW/MHzMHzmW/MHzMHz

Fabric0.022920038691.84SSTL3_II_DCI0.022920038691.84SSTL3_II_DCI

I0.0038200191.7414712.4480.0038200413.4531409.216

O0.0642004672.111304347824167.89147826090.0642007021.4436320.496

Multiplier0.23752002748.490.23752006840.00

BRAM0.116852002704.520.116852003365.28

DCM0.05720011.400.057200136.80

AATUAAWUWATUWAWU

Fabric5.747.3538.6938.69

LVTTLIOB0.540.834.867.43

Mult1.082.702.756.84

BRAM0.380.482.703.37

DCM0.000.050.010.14

Total7.7511.4149.0256.47

AATUAAWUWATUWAWU

Fabric5.747.3538.6938.69

SSTL3_II_DCIIOB38.6167.3038.8867.73

Mult1.082.702.756.84

BRAM0.380.482.703.37

DCM0.000.050.010.14

Total45.8377.8883.04116.76

tuan:- This value is based on XPower (output = speedfile + 10pF).- Assumed 12.5% toggle rate (6.25% 0->1 rate).

tuan:- Assume 50% (random) toggle rate.

0

0

0

0

PEST P Dist

00000

00000

00000

00000

Fabric

IOB

Mult

BRAM

DCM

Config P

00000

00000

00000

00000

Fabric

IOB

Mult

BRAM

DCM

Static P

00000

00000

00000

00000

Fabric

IOB

Mult

BRAM

DCM

Watts

Virtex-II 2V6000 Power Distribution

00000

00000

00000

00000

Fabric

IOB

Mult

BRAM

DCM

V2 Configuration Power

Configuration Process

Config Bits

DeviceCLBs# FramesFrame SizeConfig Bits (K)Cbits/CLBSlopeOffset

XC2V406440483236056252565651

XC2V80128404147263549612565651

XC2V25038475221121,69744192565651

XC2V50076892827522,76235962565651

XC2V10001280110433924,08331902565651

XC2V15001920128040325,65929472565651

XC2V20002688145646727,49227872565651

XC2V300035841804531210,49429282565651

XC2V400057602156659215,66027192565651

XC2V600084482508787221,85025862565651

Programming Time (Sec)

Serial Mode

F (MHz)66

XC2V405.45E-03

XC2V60003.31E-01

Parallel (SelectMap) Mode

F (MHz)66

XC2V406.82E-04

XC2V60004.14E-02

Programming Power

Reg Writes

Memory Cell Power

MC-Vgg

MC-Vdd

CRC Power

QUESTIONSHow many of the MCs are VDD and VGG respectively?

tuan:- 10% toggle rate- MEDIUM routing

tuan:- 10% toggle- HIGH routing

tuan:- 50% toggle- MEDIUM routing

tuan:- 50% toggle- HIGH routing

tuan:16-bit CRC is performed on Config bitstream.

tuan:The bitstream consists of a series of packets, each consists of a header (address, packet length) and data. All cfg bits first go to a hold register, where an FSM writes the packet data to the appropriate register.

Actual config bits (all one packet) are broken into frames. Each frame = 80bits per row. Each column of IOB=4frames, IOI=22f, CLB=22f, BRAM=84f, GCLK=4f. Frame address is automatically incremented by FDRI.

Config data comes in as bits or bytes, and are converted to 32-bit words in the LR corner logic. This 32-bit wide data bus is then routed to the center bottom block, and fed into a shift register that shifts the data upward through each row. At each row, 80-bits are distributed to the left half and right half of the FPGA.

tuan:Stages of regs before the MCs: Corner logic:SR, FDBI, HR-Data_out-Data_busCenter logic:shiftreg in the center, data output reg from the shift reg.

V2 Quiescent Power

XPower Values (Are these just data from early versions of the Spreadsheet?)

DeviceSlicesP-staticslopeoffset

2v40256112.50.0070995392114.3655929299

2v80512112.50.0070995392114.3655929299

2v2501536112.50.0070995392114.3655929299

2v50030721500.0070995392114.3655929299

2v100051201500.0070995392114.3655929299

2v150076801500.0070995392114.3655929299

2v2000107522250.0070995392114.3655929299

2v3000143362250.0070995392114.3655929299

2v4000230403000.0070995392114.3655929299

2v6000337923000.0070995392114.3655929299

2v8000465924500.0070995392114.3655929299

2v1000061440562.50.0070995392114.3655929299

V2 Power Estimation Spreadsheet

DeviceSlicesPs (fabric)Ps/slicePs Offset

2v402561130.0089124.12

2v80512113

2v2501536113

2v5003072150

2v10005120150

2v15007680150

2v200010752300

2v300014336300

2v400023040375

2v600033792375

2v800046592525

0.0283981566uW/CLB

0

0

0

0

0

0

0

0

0

0

0

0

P-static

Slices

mW

Static Power in V2 Devices

0

0

0

0

0

0

0

0

0

0

0

0

Ps (fabric)

Methodology

Dynamic power VDD2 * (1/LMIN) * Capacity * fCLKVDD: Supply voltageLMIN: Minimum process feature sizeCapacity: Number of gates per devicefCLK: Clock frequencyStatic power VDD * Capacity * exp(-Vth/S)VTH: Theshold voltageS: Subthreshold slopeIdeally 60mV/decade; typically 90-120mV/decade; set at 100mV/decadeNote: Does not include gate leakage (?)

Dynamic PowerNormalized to 2001Best fit is a quadratic trend linePredicts 5X by 20071996: 4000EX 1997: 4000XL 1998: 4000XV 1999: Virtex 2000: Virtex-E 2001: Virtex-II

Chart17

0.0677083333

0.07623

0.1488095238

0.3434444444

0.754272

1

Dynamic Power

Sheet1

Xilinx FPGAs Historical Trend

Dynamic Power Ratio = C*V^2*f

199619971998199920002001200220032004200520062007

XC4036EXXC4085XL40250XVVirtexVirtex-eVirtex-IIVirtex-II ProWhitneyRanier??Everest??

Vdd53.32.52.51.81.51.36992619931.22207853281.10303480481.00512473410.92317939610.8535884382

1/Vdd0.20.3030303030.40.40.55555555560.66666666670.729966330.81827801830.90658970660.99490139491.08321308321.1715247715

Feature (nm)0.50.350.250.220.180.150.13390096870.11930683490.1075813310.09795435390.08990881230.0830846067

1/Feature sz22.857142857144.54545454555.55555555566.66666666677.46820586818.38174946749.295293066610.208836665911.122380265112.0359238643

Tilo1.61.20.70.60.420.36

1/Tilo0.6250.83333333331.42857142861.66666666672.3809523812.7777777778

Speed334475.428571428688125.7142857143146.6666666667168.0730158714191.6743764151215.2757369589238.8770975026262.4784580464286.0798185901

125001

Capacity30787448201022764884240140000

65000180000500000112400040740008000000120000001600000024000000310000004100000050000000

Power11.12585846152.19780219785.072410256411.140017230814.769230769218.902532406520.380286608625.222281143527.33144308430.754925576732.2951668462

0.06770833330.076230.14880952380.34344444440.75427211.2798589651.37991523911.70775861911.85056645882.08236475262.1866519219

Sheet1

0

0

0

0

0

0

MHz

Speed

Sheet2

0

0

0

0

0

0

0

0

&A

Page &P

1/Volts

1/Vdd

Sheet3

00

00

00

00

00

00

00

00

1/um

1/Process Feature Size

Sheet4

0

0

0

0

0

0

Watts

00

00

00

00

00

00

&A

Page &P

#Logic Cells

#System Gates

Device Capacity

0

0

0

0

0

0

0

0

0

0

0

0

&A

Page &P

Volts

1/Vdd

0

0

0

0

0

0

0

0

0

0

0

0

1/um



Static Power = K*Vdd*exp(-Vth/S)

199619971998199920002001200220032004200520062007

Vdd53.32.52.51.81.51.36992619931.22207853281.10303480481.00512473410.92317939610.8535884382

Feature s0.50.350.250.220.180.150.13390096870.11930683490.1075813310.09795435390.08990881230.0830846067

VTH0 (n)0.740.610.550.550.3620.3550.2240.27960.23078054490.21125507920.19477583530.1806815238

1/VTHN01.35135135141.63934426231.81818181821.81818181822.76243093922.81690140854.46428571433.57653791134.33312088884.73361399685.13410710475.5346002127

VTH0 (p)0.940.780.60.60.52560.4730.21860.2850.24356099440.22103030140.20231507610.1865217794

1/VTHP01.06382978721.28205128211.66666666671.66666666671.9025875192.11416490494.57456541633.50877192984.10574773064.52426655484.9427853795.3613042032

VTH0 (n-lv)0.2760.2650.116

VTH0 (n)0.740.610.550.44333333320.3620.3550.31048621490.28257507430.25926817640.23951305030.22255528910.2078400106

1/VTHN01.35135135141.63934426231.81818181822.25563909852.76243093922.81690140853.2207549063.53888255163.85701019724.17513784284.49326548844.811393134

VTH0 (p)0.940.780.60.57726315790.52560.4730.42647044540.39216598990.36296943280.33781898250.3159280670.2967015886

1/VTHP01.06382978721.28205128211.66666666671.73231218091.9025875192.11416490492.34482837142.54994065182.75505293222.96016521263.16527749293.3703897733

Capacity65000180000500000112400040740008000000120000001600000024000000310000004100000050000000

I-leak0.00000003980.00000079430.00000316230.00000316230.00023988330.00028183830.00575439940.0015995580.00492260.00771701260.01127824820.0156021613

P-static0.0129384830.47183097143.95284707528.88600022511759.11215617543382.059517517494597.229551942531276.5684601045130315.180093971240453.366373206426885.699713102665891.222728189

P-static0.00000382560.000139510.00116876920.00262739320.5201304551127.97030302459.247787715838.531308931471.0967282296126.2206349421196.8892679976

I-leak0.00000003980.00000079430.00000316230.00003686950.00023988330.00028183830.00078548490.00149365140.00255457250.00402596040.0059490430.0083483355

P-static0.0129384830.47183097143.9528470752103.60315665811759.11215617543382.059517517312912.676419360129205.749426117267626.7776815155125444.36326418225173.391129576356302.132984923

P-static0.00000382560.000139510.00116876920.0306331560.520130455113.81799207048.635492449219.995738493437.091116408366.5787783933105.3506394963

tuan:/tools/xicad/solomon/spice/models/umc/u05-2hw3.lib

tuan:/tools/xicad/solomon/spice/models/umc/u035-3s.lib

tuan:/tools/xicad/solomon/spice/models/umc/u025-22e.lib

tuan:/tools/xicad/solomon/spice/models/umc/u018-1.lib


tuan:/tools/xicad/solomon/spice/models/ibm/xi010tt-01.lib

tuan:/tools/xicad/solomon/spice/models/ibm/v212/ESD_85/xi013tt-212.mod

00

00

00

00

00

00

00

00

1/VTHN0

1/VTHP0

ITRS Roadmap 2002 Predictions

Dynamic Power

Process13011510090807065

ITRS Year2001200220032004200520062007

Xilinx Year200220032004200520062007

Speed168.4231.7308.8399517.3563.1673.9

Energy/xtor0.3470.2120.1370.0990.0650.0520.032

Capacity2763484395536978781106

Power/chip16128.004817093.899218572.158421844.05323436.276525708.893623850.6688

Pwr/chip (norm)11.05988926791.15154717711.35441756561.45314171161.59405294821.478835671


Watts

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0







\

Watts

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0


tuan:ITRS Table1i: Functions per chip (million transistors) for Logic (low-volume, high performance)* Assuming constant die size

tuan:Table 4c: Chip Frequency / 10.

tuan:Table 35a: Energy per device switching transition.

0

0

0

0

0

0


Static Power

Process13011510090807065

ITRS Year2001200220032004200520062007

Xilinx Year200220032004200520062007

Static Power per device5.60E-096.70E-091.00E-081.10E-082.60E-085.30E-085.30E-08

Capacity2763484395536978781106

Static Power/chip1.55E-062.33E-064.39E-066.08E-061.81E-054.65E-05

S Pwr/chip (norm)11.51E+002.84E+003.94E+001.17E+013.01E+01

0

0

0

0

0

0

0

Watts


tuan:Table 35a: Static power per device (Watts/Device)* ITRS Requirement (projection?)* Gate leakage not included* Assuming non-CMOS technology in 2007

Static PowerNormalized to 2001Best fit is a power trend Predicts 100X by 2007Future data points projected using linear trend for 1/VTH

1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007

Chart21

0.0000038256

0.00013951

0.0011687692

0.0306331559

0.5201304551

1

\

Static Power

Sheet1


Dynamic Power Ratio = C*V^2*f

199619971998199920002001200220032004200520062007

XC4036EXXC4085XL40250XVVirtexVirtex-eVirtex-IIVirtex-II ProWhitneyRanier??Everest??

Vdd53.32.52.51.81.51.36992619931.22207853281.10303480481.00512473410.92317939610.8535884382

1/Vdd0.20.3030303030.40.40.55555555560.66666666670.729966330.81827801830.90658970660.99490139491.08321308321.1715247715

Feature (nm)0.50.350.250.220.180.150.13390096870.11930683490.1075813310.09795435390.08990881230.0830846067

1/Feature sz22.857142857144.54545454555.55555555566.66666666677.46820586818.38174946749.295293066610.208836665911.122380265112.0359238643

Tilo1.61.20.70.60.420.36

1/Tilo0.6250.83333333331.42857142861.66666666672.3809523812.7777777778

Speed334475.428571428688125.7142857143146.6666666667168.0730158714191.6743764151215.2757369589238.8770975026262.4784580464286.0798185901

125001

Capacity30787448201022764884240140000

65000180000500000112400040740008000000120000001600000024000000310000004100000050000000

Power11.12585846152.19780219785.072410256411.140017230814.769230769218.902532406520.380286608625.222281143527.33144308430.754925576732.2951668462

0.06770833330.076230.14880952380.34344444440.75427211.2798589651.37991523911.70775861911.85056645882.08236475262.1866519219

Sheet1

0

0

0

0

0

0

MHz

Speed

Sheet2

0

0

0

0

0

0

0

0

&A

Page &P

1/Volts

1/Vdd

Sheet3

00

00

00

00

00

00

00

00

1/um


Sheet4

0

0

0

0

0

0

Watts

00

00

00

00

00

00

&A

Page &P

#Logic Cells

#System Gates

Device Capacity

0

0

0

0

0

0

0

0

0

0

0

0

&A

Page &P

Volts

1/Vdd

0

0

0

0

0

0

0

0

0

0

0

0

1/um



Static Power = K*Vdd*exp(-Vth/S)

123456789101112

Vdd53.32.52.51.81.51.36992619931.22207853281.10303480481.00512473410.92317939610.8535884382

Feature s0.50.350.250.220.180.150.13390096870.11930683490.1075813310.09795435390.08990881230.0830846067

VTH0 (n)0.740.610.550.550.3620.3550.2240.27960.23078054490.21125507920.19477583530.1806815238

1/VTHN01.35135135141.63934426231.81818181821.81818181822.76243093922.81690140854.46428571433.57653791134.33312088894.73361399695.13410710495.5346002129

VTH0 (p)0.940.780.60.60.52560.4730.21860.2850.24356099440.22103030130.20231507610.1865217794

1/VTHP01.06382978721.28205128211.66666666671.66666666671.9025875192.11416490494.57456541633.50877192984.10574773074.52426655494.94278537915.3613042034

VTH0 (n-lv)0.2760.2650.116

VTH0 (n)0.740.610.550.44333333330.3620.3550.3104862150.28257507430.25926817640.23951305030.22255528920.2078400106

1/VTHN01.35135135141.63934426231.81818181822.25563909772.76243093922.81690140853.22075490583.53888255143.8570101974.17513784264.49326548824.8113931338

VTH0 (p)0.940.780.60.57726315790.52560.4730.42647044540.39216598990.36296943270.33781898240.3159280670.2967015886

1/VTHP01.06382978721.28205128211.66666666671.73231218091.9025875192.11416490492.34482837152.54994065182.75505293222.96016521263.1652774933.3703897733

Capacity65000180000500000112400040740008000000120000001600000024000000310000004100000050000000

I-leak0.00000003980.00000079430.00000316230.00000316230.00023988330.00028183830.00575439940.0015995580.00492260.00771701260.01127824820.0156021613

P-static0.0129384830.47183097143.95284707528.88600022511759.11215617543382.059517517494597.229551942531276.5684601045130315.180108833240453.366401281426885.699763206665891.22280577

P-static0.00000382560.000139510.00116876920.00262739320.5201304551127.97030302459.247787715838.531308935871.0967282379126.2206349569196.8892680205

I-leak0.00000003980.00000079430.00000316230.00003686950.00023988330.00028183830.00078548490.00149365140.00255457250.00402596040.0059490430.0083483355

P-static0.0129384830.47183097143.9528470752103.6031563131759.11215617543382.059517517312912.676414328529205.749415877867626.7776599712125444.363227566225173.391068971356302.132895923

P-static0.00000382560.000139510.00116876920.03063315590.520130455113.81799206898.635492446119.995738487137.091116397566.5787783753105.35063947






tuan:/tools/xicad/solomon/spice/models/ibm/xi010tt-01.lib

tuan:/tools/xicad/solomon/spice/models/ibm/v212/ESD_85/xi013tt-212.mod

00

00

00

00

00

00

00

00

1/VTHN0

1/VTHP0


Dynamic Power

Process13011510090807065

ITRS Year2001200220032004200520062007

Xilinx Year200220032004200520062007

Speed168.4231.7308.8399517.3563.1673.9

Energy/xtor0.3470.2120.1370.0990.0650.0520.032

Capacity2763484395536978781106

Power/chip16128.004817093.899218572.158421844.05323436.276525708.893623850.6688

Pwr/chip (norm)11.05988926791.15154717711.35441756561.45314171161.59405294821.478835671


Watts

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0







\

Watts

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0



tuan:Table 4c: Chip Frequency / 10.

tuan:Table 35a: Energy per device switching transition.

0

0

0

0

0

0

Watts

Dynamic Power

0

0

0

0

0

0

Speed

0

0

0

0

0

0

Energy/xtor

0

0

0

0

0

0

#Xtors/chip


Static Power

Process13011510090807065

ITRS Year2001200220032004200520062007

Xilinx Year200220032004200520062007

Static Power per device5.60E-096.70E-091.00E-081.10E-082.60E-085.30E-085.30E-08

Capacity2763484395536978781106

Static Power/chip1.55E-062.33E-064.39E-066.08E-061.81E-054.65E-05

S Pwr/chip (norm)11.51E+002.84E+003.94E+001.17E+013.01E+01

0

0

0

0

0

0

0

Watts

Static Power


tuan:Table 35a: Static power per device (Watts/Device)* ITRS Requirement (projection?)* Gate leakage not included* Assuming non-CMOS technology in 2007

Static versus Dynamic

The Age of Accumulation

Incremental Designlessens the impact of design changesNext Generation technologyEasy set-up through floorplanning along HDL hierarchy boundariesChanges only affect the module that was changedThe remainder of the design stays locked and intactTiming repeatabilitypreserves routingFaster turnaround for localized design changesNew in ISE5.1i

Partial ReconfigurabilityFPGA Flexibility for the FieldRe-program part of an FPGA while its still runningVirtex-II and Virtex-E011011User Definable BoundariesFixedLogicPRLogicPRLogicFixedLogicFixedLogic

System Level Design FlowEnabling Technologies

Reconfigurability : 4th dimensionApplications open, close, and change priority over timeUse model : Object-Oriented Multiprocessing

The mother of all complex flows

No established formal models / semanticsDesign capture and simulation issuesHardware/software co-designModular design with uniform interfacesDynamic management of diverse hardware resources creates new place & route issuesBitstream production, storage, loading and linking

Combining the Best of FPGA and ASIC: XBlue 100% Programmable100% Fixed LogicTraditional FPGA MarketTraditional ASIC Market Flexible, but expensive Inflexible, but highestperformance/integrationBlockRAMSpecial FunctionsPowerPCCoreBlockRAMEmbeddedFPGA Core

ConclusionsFPGAs ride the tideToday : Programmable System PlatformChallengesDesign TechnologyInterconnectLow PowerExploit 3rd and 4th dimension

A trend in the 1990s is that process geometries are smaller than the wavelength of light used in wafer fabrication. This creates an obstacle in manufacturing deep submicron geometries.

How do companies such as Xilinx and UMC overcome this obstacle? Through optical proximity correction (OPC) technology. Wafer processing masks have to be corrected as shown on the next slide.

Future system cannot be upscaled 10GHz monoprocessor also for physical reasons. There is 4* more data transport but the average wire length also increases. Memory delay does not scale unless we decrease their size This leads to the need for parallel architectures with distributed embedded storage. Localized computation and storage are the key features to exploit DS technology. Fortunately this is consistent with domain specific multimedia image and sound stream processing as opposed to GP C program execution. The digitization of consumer products has also had another very profound effect on the dynamics of the consumer market place. The digitization of Consumer products has accelerated the rate at which products and standards become obsolete For the manufactures of these products, being able to quickly design and deliver a product to market will have a significant impact on his ability to gain market share.The window of design authoring is the most crucial in the design flow. This is where product innovation and design excellence takes place. It is becoming a smaller portion of the overall design task. Because deep-submicron design calls for additional emphasis on design rules and silicon effects, tools to analyze and correct these problems proliferate. The logic designer must comprehend interconnect timing issues, power and electromigration effects at the time of authoring, to reduce the iterations between design and layout. The average number of iterations is 20 to 23, impacting the efficiency of design.The number of point tools to ensure complete analysis increases as process geometries decrease. The design task becomes longer and more complex. No silicon design or verification requiredProgrammability means design freeze moves much farther to the rightlower cost, easier to use toolsets

from a study published in California Management Review, 1998by Stefan Thomke and Donald Reinertsen from Harvard391 ASIC and FPGA projects surveyedaverage 18 months per ASIC design, 8 months per FPGA design55% time-to-market advantage for FPGA design flow demonstrated

37% of new digital products were late to marketentering late significantly reduces the amount of profit available to a company

data is from a study by Price Waterhouse Coopers

The inherent programmability of FPGAs leads to greater time IN marketFPGAs can now be reprogrammed securely via the internetgreater profitslower cost of supportfix and debug designs while in the fieldlower susceptibility to changing standards or specslonger time in marketWith the PowerPC 405 tightly integrated into the FPGA fabric, the CoreConnect bus architecture can be used to connect the processor to the soft peripherals and other IP built in progrmamable gates. The following example illustrates how the three busses of the CoreConnect standard are used in a typical system.

The main bus is the Processor Local Bus, or P-L-B, which is primarily used for connecting high speed peripherals directly to the processor, such as memory and external bus controllers. It has separate 32-bit addresses and 64-bit data busses for the data and instruction caches of the 405. Furthermore, it has separate data read and write busses for overlapped transfers, and at a typical speed of 133MHz, it has a peak data bandwidth to the 405 of 2.1 GB/s. It supports pipelining, split transactions, and burst and cache line transfers. ***MOUSE CLICK***

CoreConnect also has a secondary bus, the On-chip Peripheral Bus, or O-P-B., used primarily to offload slower peripherals such as UARTs and General Purpose I/Os from the PLB. It has 32-bit address and data busses and supports single cycle data transfers. ***MOUSE CLICK***

The third CoreConnect bus is the Device Control Register, or D-C-R. It provides direct data transfers between the General Purpose Registers and DCR registers included in any peripherals. It is used primarily for system initialization and device configuration.Please not that these bus structures built in soft gates and can be customized by users for optimal performance . ***MOUSE CLICK***And now, lets move on to the System Generator for PowerPC. With this tool designers can specify, configure and assemble a PowerPC system. The tool will automatically integrate the PowerPC with the selected and customized soft peripherals using the CoreConnect bus.The flow is simple;First, the designer has a concept, a block diagram, of a processor system.Second, the designer specifies the architecture by selecting the different peripherals and connecting them together using the GUI shown on the foil.Third, each peripheral is parameterized using a CORE generator like GUI, as shown.Fourth, the designer assembles the entire system by specifying memory maps etc, using the GUI shown on the foil. Here Xilinx has an advantage over Altera. In the Altera tool, the designer is forced to define the memory map while defining each individual peripheral. Its difficult to get the system overview doing it that way. The Xilinx approach is to allow the designer to define global parameters in one global view.

As part of the Xilinx Empower! strategy, the MicroBlaze soft processor complements the Xilinx embedded PowerPC solution. The MicroBlaze processor will utilize the IBM CoreConnect bus for peripherals, the same bus used for the embedded PowerPC, providing customers with a plug-and-play infrastructure for building complex system-on-a-chip. In addition, peripherals built for the MicroBlaze processor are compatible with PowerPC products. Finally, because of it's compact size and the use of the CoreConnect interface, customers will be able to build intelligent processor systems with many multiple MicroBlaze processors as on-chip slave peripherals to the PowerPC.Incremental Design - is focused on minimizing the impact of late cycle design changes.

Based on the pioneering BLIS technology Xilinx introduced, Incremental Design is "next generation" technology that works the way customers work.

The engineer can quickly and easily floorplan along hierarchy boundaries, and then finish the design as normal. Then if a late design change comes along, the engineer only has to re-implement the portion of the design that changed, the rest of the design stays completed and intact.

Every one of our global customers said that they had to implement at least one late-cycle design change in their projects released in the last year.

Incremental Design leads to faster overall design closure with better, repeatable timing results.Partial Reconfigurability is an exciting new FPGA option available for Virtex-II and Virtex-E devices and enabled with the Modular Design ISE software option.

Partial Reconfigurability lets you re-program part of an FPGA while the FPGA continues running. An industry exclusive, Partial Reconfigurability offers customers lower support costs, with less field support required with no product down time.Accumulating oher stuffAdded features dont get dropped -- they get improved. Gates, mem, arithmetic, clocksThe age of accumulation has been gaining speed.As with any evolution, it starts earler than notices, but gains speed quickly: punctuated equilibrium.Top 4 boxes in 3 yearsFor many years, programmable logic consisted of gates and routing. However, Xilinx has been expanding the capabilities offered by programmable logic over the years. First special arithmetic functions were added to increase the performance of commonly used building blocks like adders and counters. In 1992, Xilinx pioneered the XC4000 family, which included RAM into programmable logic and greatly expanded the application space which could be addressed. More recently, Xilinx has brought sophisticated clock management to the masses in the Virtex architecture, along with extremely flexible I/Os. In all cases, these sub-systems have been done in a programmable way that maximizes the flexibility for the FPGA user.Looking into the future, we can see that serial gigabit I/O, microprocessors and mixed signal capability will all be added to FPGAs to create a truly programmable system on a chip.Lest anyone doubt this vision, consider for a moment the steady forward progress of Moores Law. Extrapolated, this says that Xilinx will have FPGAs in 2005 that contain 2 billion transistors. At this point, the cost of adding some of these additional features is negligible. The trick is to implement the features in the correct way, something that Xilinx has demonstrated over the years.

Incremental Design - is focused on minimizing the impact of late cycle design changes.

Based on the pioneering BLIS technology Xilinx introduced, Incremental Design is "next generation" technology that works the way customers work.

The engineer can quickly and easily floorplan along hierarchy boundaries, and then finish the design as normal. Then if a late design change comes along, the engineer only has to re-implement the portion of the design that changed, the rest of the design stays completed and intact.

Every one of our global customers said that they had to implement at least one late-cycle design change in their projects released in the last year.

Incremental Design leads to faster overall design closure with better, repeatable timing results.Partial Reconfigurability is an exciting new FPGA option available for Virtex-II and Virtex-E devices and enabled with the Modular Design ISE software option.

Partial Reconfigurability lets you re-program part of an FPGA while the FPGA continues running. An industry exclusive, Partial Reconfigurability offers customers lower support costs, with less field support required with no product down time.

challenges and opportunities for fpga platforms

Documents

asic market todaysource

new digital products

products life

simplerfaster design

leadinggate count requirement

nm example90nm technology

proven time

transistor count