challenges and opportunities for fpga platforms

58
Challenges and opportunities for FPGA platforms Ivo Bolsens Xilinx Research Labs

Upload: erik

Post on 13-Jan-2016

41 views

Category:

Documents


2 download

DESCRIPTION

Challenges and opportunities for FPGA platforms. Ivo Bolsens Xilinx Research Labs. Thanks to. Bill Carter David Eden Erich Goetting Alireza Kaviani Bernie New Cameron Patterson Steve Trimberger Tim Tuan. Overview. FPGA’s ride the tide Opportunities Challenges. - PowerPoint PPT Presentation

TRANSCRIPT

  • Challenges and opportunities for FPGA platformsIvo Bolsens Xilinx Research Labs

  • Thanks toBill CarterDavid EdenErich Goetting Alireza KavianiBernie NewCameron PattersonSteve TrimbergerTim Tuan

  • OverviewFPGAs ride the tideOpportunitiesChallenges

  • ASICs buck the tide, FPGAs ride the tideProcess TechnologyPerformanceArchitectureCostFlexibilityMarket trends

  • Moores LawA tale of two numbers : What process people dont tell you CD80nm240nm320nm160nm2.71.34.56.5(nm)CDTox

  • Trend: Line Widths Smaller Than the Wavelength of Light -0.1000.2000.3000.4000.5000.6000.70019881990199219941996199820002002Process Geometry (micron)Optical Processing WavelengthProcess Geometry

  • Painting a one cm line with a three cm brushCourtesy : IBM

  • Gate OxideAbout 10 molecular layers of SiO2 for this 150nm example90nm technology is about half the thicknessPolysilicon GateGate OxideSilicon crystal

  • FPGAs are ahead of the curveProcess Technology Feature Size (nm)Year35025018015013010070979899000102030405

  • FPGA Process Technology: Leading the transistor count in MOS Logic2V6000:~250 Million transistors per chip !2V8000: ~360 Million transistors per chip !

  • Where are we todayLogic Cells125K105KBlock RAM10Mb3MbMultipliers5561683.125Gb/sMGTs424PowerPC CPUs840Mb/s LVDS340442XC2VP125XC2V8000= 350M tranistors

  • Intels RoadmapSource : IntelFPGAs are leading

  • Gate count requirement for ASICsFPGAs can address very large part of the ASIC market todaySource: IMS

  • Performance requirement for ASICsFPGAs can address very large part of the ASIC market todaySource: IMS 2000

  • A Decade of Progress11010010001/911/921/931/941/951/961/971/981/991/001/01YearCapacitySpeedPriceVirtex &Virtex-E(excl. Block RAM)XC4000100x10x1xSpartan1000xVirtex-II(excl. Block RAM)

  • The Cost/Volume Crossover

    Unit VolumeRelative Cost0.11101001000101001,00010,000100,0001,000KASIC CostFPGA Cost

  • Are Transistors Free?

  • Performance Scaling

  • Localisation of storage and computingll/2++++storestore2Courtesy :IMEC

  • Market Requirements607080900010/ human

  • Electronics Industry DynamicsTimeCable DecodersCustom Features(Pay-Per-View)DigitalVCRSatellite/Cable+ DigitalVCRResidential Gateway(Broadband access) Market Size ($)NTSC NTSCSmart cards(DES)NTSCDESATAPINTSCDESATAPIDBSDOCSISNTSCDESATAPIDBSDOCSISHomePNAHomeRFHomePLUGBluetoothHiperlan2DSL...Dramatic increasein new standards New Products Take less time to reach high volumes Shorter Product Life Cycles Many standards / More Interoperability

  • Complex ASIC DesignThe Shrinking Window of InnovationSynthesis16% Average iterations between design and layout = 20(Source Electronic Systems Jan 99)

  • Simpler/Faster Design Flows2:1 proven Time-to-Market AdvantageNo silicon design or verification stepsMore design flexibility through later design freezeSpecASICFlowDesignFreezeDesign andVerificationSiliconPrototypeSystemIntegrationSiliconProductionSpecFPGAFlowDesignFreezeDesign andVerificationSystemIntegration

  • Todays Product Lifecycle37% of new digital products were late to marketEntering the market first can result in up to a 40% greater total profit contribution over the products life vs. the #2 entrantReduced profit for latecomersProfit for first to MarketProfitTime

  • Todays Product Lifecycle37% of new digital products were late to marketEntering the market first can result in up to a 40% greater total profit contribution over the products life vs. the #2 entrantIRL extendsproduct life in marketProfitTime

  • Virtex-II Pro PowerPC Technology32-bit RISC CPU, Harvard Architecture130nm CMOS with 1.5V Operation456 Dhrystone MIPS at 300MHz32 x 32-bit General Purpose RegistersHardware Multiply / Divide5-Stage Execution Pipeline16KB D-Cache, 16KB I-CacheMemory Management Unit (MMU)High-Bandwidth Interface to LogicBuilt-In Hardware TimersBuilt-In JTAG Debug and Trace supportIBM PowerPC 405 RISC CPU3.8 sq mm = 1% of 2VP100

  • High Performance4008001600200100Virtex-II Pro PowerPC 4054569121824Altera Excalibur Arm 9220Dhrystone MIPS1 CPU

  • Low PowerPC: 0.59mW/MIPSPower (mW)Performance (Dhrystone MIPS)501002003004001502503501002003004000Full-Custom IBM CPU Design1.5V 130nm CMOS TechnologyLow-K DielectricIP-Immersion100mW = 1 LED Indicatoror 169 MIPS!

  • IP-ImmersionEmbed multiple IP blocks of arbitrary shape withhigh-bandwidth connectivity to FPGA core logic, memory & I/O

    Technologies Enabling IP-ImmersionActive Interconnect Segmented RoutingMetal 1Metal 2Metal 3Metal 4Metal 5Metal 6Metal 7Metal 8Silicon SubstratePolyAdvanced hard-IP block(e.g. PowerPC CPU)Metal HeadroomMetal 9

  • System Architecture OptionsLogic-Centric ArchitecturePowerPC Executes Entirely out of CacheNo FPGA Logic, Memory, or I/O Used10-20 Pages of C-Code or MoreUse as Complex Algorithmic EngineWeb ServerEncryption/DecryptionPacket Processor CPU-Centric ArchitecturePowerPC forms Heart of Embedded SystemOn & Off-Chip PeripheralsExternal Interfacese.g. PCI, 3GIO, Gb Ethernet, ZBT SRAMCoreConnect On-Chip BusTies System TogetherPeripherals implemented in FPGA LogicTypically Runs Embedded OSExternal DevicesExternal InterfacesExternal DevicesExternal Interfaces

  • HW accelerationControl TasksControl TasksCode Stack (C++)Control TasksInterleaverReed-SolomonViterbiPowerPCProcessor RAMPowerPC with Application-Specific Hardware AccelerationVirtex-II ProConcatenated FEC EngineViterbiInter- leaverReed- SolomonProcessing timeTraditionalXTREME Processing

  • HW/SW InterfacingProvides Specialized Connectivity Between PowerPC & FPGA LogicDual-Port BlockRAM MemoryCPU & Logic Each Own 1 PortHigh-Bandwidth6.4Gb/secLow-LatencyNon-CachingDesigned for Communications Data ProcessingEnables PowerPC & FPGA Logic to Work together on Complex Problems

    BlockRAMs

  • APU ControllerMicro-controller style interface to fabric for control plane applicationsBenefits:Up to 10x faster than memory mapped interfaceSaves PLB bandwidth for code execution Minimizes pipeline stalls

  • Creating Complete Communications SolutionsGb Ethernet (1000BaseLX/SX/CX)RocketIO is PHY (1000Base-SX/LX)PHYftptelnetrloginmailetcUpper Layers on PowerPC

  • Infiniband ExampleCPU Makes Communications Practical, Easier, & CheaperInfiniBand TCAbuilt withCPU + fabricSources: Intel, Xilinxor builtwith fabric onlyCPU Based Solution 8 Times Less Area

  • Configurable Platform

  • The MicroBlaze High Performance Soft CPUTMLocalOPBBusCoreConnect Technologytm

  • Incremental Designlessens the impact of design changesNext Generation technologyEasy set-up through floorplanning along HDL hierarchy boundariesChanges only affect the module that was changedThe remainder of the design stays locked and intactTiming repeatabilitypreserves routingFaster turnaround for localized design changes

  • Partial ReconfigurabilityFPGA Flexibility for the FieldRe-program part of an FPGA while its still runningVirtex-II and Virtex-E011011User Definable BoundariesFixedLogicPRLogicPRLogicFixedLogicFixedLogic

  • System Exploration

  • Traditional ArchitectureCPMU-BusOther PeripheralsProcessormP BusSystemPCI BusSystemPCI BridgeDeviceRAMFLASHEEPROMG704FramerAAL5ProcessorMemoryInterfaceG703LIUMPC860Generic DesignCPM = Communications Processor ModuleTxRxLineCodingDataFormatLineDecodingDataAlignmentPayloadQualifyPayloadQualityPayloadAssemblyPayloadBufferPayloadProcessingSystemInterfacesMotorola PowerQUICC

  • Traditional Architecture

  • Optimized ArchitectureOther PeripheralsPowerPCProcessormP BusSystemPCI BusSystemPCI BridgeDeviceRAMFLASHEEPROMG704FramerMicroBProcessorMemoryInterfaceG703LIUGeneric DesignFPGA BoundaryTxRxLineCodingDataFormatLineDecodingDataAlignmentPayloadQualifyPayloadQualityPayloadAssemblyPayloadBufferPayloadProcessingSystemInterfacesDual PortBlockRAMFast I/FFIFO

  • Optimized ArchitectureOther PeripheralsPowerPCProcessormP BusSystemPCI BusSystemPCI BridgeDeviceRAMFLASHEEPROMG704FramerMicroBProcessorMemoryInterfaceG703LIUGeneric DesignFPGA BoundaryTxRxLineCodingDataFormatLineDecodingDataAlignmentPayloadQualifyPayloadQualityPayloadAssemblyPayloadBufferPayloadProcessingSystemInterfacesDual PortBlockRAMFast I/FFIFO

  • Interconnect and powerSource : Bill Daly

  • Interconnect and performanceSource : Bill Daly

  • Power AnalysisTypical design5.9uW/CLB/MHz [FPGA00]Fabric power is ~69% of total power2V6000 = 5.9uW/CLB/MHz 8448CLBs 100MHz 69% = 7.5W

    Chart2

    4485.888

    422.4960360626

    842.2601030137

    712.5239934247

    Power Dist

    V2 (2v6000) Power Distribution based on our measurements, simulations + XPower measurements

    XC2V6000

    Typ. UtilizationWrt Utiliz

    #BRAM1/73CLBs104.15144.00

    #Mult1/2BRAM52.08144.00

    #CLBs0.97603.28448

    #IBUF1/33CLBs230.40552.00

    #OBUF1/23CLBs330.57552.00

    DCMNA112

    AATUAAWU

    mW/MHzMHzmW/MHzMHz

    Fabric (avg)0.00591004485.890.00591004984.32

    IBUF (avg)0.000510011.090.000510026.58

    OBUF (avg)0.0124100411.400.0124100686.97

    Multiplier0.1368100712.520.13681001970.24

    BRAM0.0809100842.260.08091001164.49

    DCM0.178510017.850.1785100214.20

    WATUWAWU

    mW/MHzMHzmW/MHzMHz

    Fabric0.022820034670.590.022820038522.88

    IBUF (extrm)0.0024200110.940.0024200265.79

    OBUF (extrm)0.06222004114.020.06222006869.69

    Multiplier0.13682001425.050.13682003940.47

    BRAM0.08092001684.520.08092002328.98

    DCM0.178520035.700.1785200428.40

    AATUAAWUWATUWAWU

    Fabric4485.894984.3234670.5938522.88

    IOB422.50713.554224.967135.48

    BRAM842.261164.49842.261164.49

    Mult712.521970.24712.521970.24

    DCM17.85214.2017.85214.20

    Total6481.029046.7940468.1949007.28

    tuan:- This value is based on XPower (output = speedfile + 10pF).- Assumed 12.5% toggle rate (6.25% 0->1 rate).

    tuan:- Assume 50% (random) toggle rate.

    Power Dist

    00000

    00000

    00000

    00000

    Fabric

    IOB

    BRAM

    Mult

    DCM

    mW

    2v6000 Power Distribution

    PEST P Dist

    V2 (2v6000) Power Distribution based on Power Estimation Worksheet

    XC2V6000

    Typ. UtilizationTyp UteWrt Utiliz

    #CLBs184488448

    #IBUF1/33CLBs256.00552.00

    #OBUF1/23CLBs367.30552.00

    #Mult1/2BRAM57.86144.00

    #BRAM1/73CLBs115.73144.00

    DCMNA112

    AATUAAWU

    mW/MHzMHzmW/MHzMHz

    Fabric0.00681005744.64SSTL3_II_DCI0.00871007349.76SSTL3_II_DCI

    I0.001510039.4214673.0240.001510085.0131324.208

    O0.014100495.860869565223938.69356521740.014100745.235976.048

    Multiplier0.18751001084.930.18751002700.00

    BRAM0.0331100383.050.0331100476.64

    DCM0.0431004.300.04310051.60

    WATUWAWU

    mW/MHzMHzmW/MHzMHz

    Fabric0.022920038691.84SSTL3_II_DCI0.022920038691.84SSTL3_II_DCI

    I0.0038200191.7414712.4480.0038200413.4531409.216

    O0.0642004672.111304347824167.89147826090.0642007021.4436320.496

    Multiplier0.23752002748.490.23752006840.00

    BRAM0.116852002704.520.116852003365.28

    DCM0.05720011.400.057200136.80

    AATUAAWUWATUWAWU

    Fabric5.747.3538.6938.69

    LVTTLIOB0.540.834.867.43

    Mult1.082.702.756.84

    BRAM0.380.482.703.37

    DCM0.000.050.010.14

    Total7.7511.4149.0256.47

    AATUAAWUWATUWAWU

    Fabric5.747.3538.6938.69

    SSTL3_II_DCIIOB38.6167.3038.8867.73

    Mult1.082.702.756.84

    BRAM0.380.482.703.37

    DCM0.000.050.010.14

    Total45.8377.8883.04116.76

    tuan:- This value is based on XPower (output = speedfile + 10pF).- Assumed 12.5% toggle rate (6.25% 0->1 rate).

    tuan:- Assume 50% (random) toggle rate.

    0

    0

    0

    0

    PEST P Dist

    00000

    00000

    00000

    00000

    Fabric

    IOB

    Mult

    BRAM

    DCM

    Config P

    00000

    00000

    00000

    00000

    Fabric

    IOB

    Mult

    BRAM

    DCM

    Static P

    00000

    00000

    00000

    00000

    Fabric

    IOB

    Mult

    BRAM

    DCM

    Watts

    Virtex-II 2V6000 Power Distribution

    00000

    00000

    00000

    00000

    Fabric

    IOB

    Mult

    BRAM

    DCM

    V2 Configuration Power

    Configuration Process

    Config Bits

    DeviceCLBs# FramesFrame SizeConfig Bits (K)Cbits/CLBSlopeOffset

    XC2V406440483236056252565651

    XC2V80128404147263549612565651

    XC2V25038475221121,69744192565651

    XC2V50076892827522,76235962565651

    XC2V10001280110433924,08331902565651

    XC2V15001920128040325,65929472565651

    XC2V20002688145646727,49227872565651

    XC2V300035841804531210,49429282565651

    XC2V400057602156659215,66027192565651

    XC2V600084482508787221,85025862565651

    Programming Time (Sec)

    Serial Mode

    F (MHz)66

    XC2V405.45E-03

    XC2V60003.31E-01

    Parallel (SelectMap) Mode

    F (MHz)66

    XC2V406.82E-04

    XC2V60004.14E-02

    Programming Power

    Reg Writes

    Memory Cell Power

    MC-Vgg

    MC-Vdd

    CRC Power

    QUESTIONSHow many of the MCs are VDD and VGG respectively?

    tuan:- 10% toggle rate- MEDIUM routing

    tuan:- 10% toggle- HIGH routing

    tuan:- 50% toggle- MEDIUM routing

    tuan:- 50% toggle- HIGH routing

    tuan:16-bit CRC is performed on Config bitstream.

    tuan:The bitstream consists of a series of packets, each consists of a header (address, packet length) and data. All cfg bits first go to a hold register, where an FSM writes the packet data to the appropriate register.

    Actual config bits (all one packet) are broken into frames. Each frame = 80bits per row. Each column of IOB=4frames, IOI=22f, CLB=22f, BRAM=84f, GCLK=4f. Frame address is automatically incremented by FDRI.

    Config data comes in as bits or bytes, and are converted to 32-bit words in the LR corner logic. This 32-bit wide data bus is then routed to the center bottom block, and fed into a shift register that shifts the data upward through each row. At each row, 80-bits are distributed to the left half and right half of the FPGA.

    tuan:Stages of regs before the MCs: Corner logic:SR, FDBI, HR-Data_out-Data_busCenter logic:shiftreg in the center, data output reg from the shift reg.

    V2 Quiescent Power

    XPower Values (Are these just data from early versions of the Spreadsheet?)

    DeviceSlicesP-staticslopeoffset

    2v40256112.50.0070995392114.3655929299

    2v80512112.50.0070995392114.3655929299

    2v2501536112.50.0070995392114.3655929299

    2v50030721500.0070995392114.3655929299

    2v100051201500.0070995392114.3655929299

    2v150076801500.0070995392114.3655929299

    2v2000107522250.0070995392114.3655929299

    2v3000143362250.0070995392114.3655929299

    2v4000230403000.0070995392114.3655929299

    2v6000337923000.0070995392114.3655929299

    2v8000465924500.0070995392114.3655929299

    2v1000061440562.50.0070995392114.3655929299

    V2 Power Estimation Spreadsheet

    DeviceSlicesPs (fabric)Ps/slicePs Offset

    2v402561130.0089124.12

    2v80512113

    2v2501536113

    2v5003072150

    2v10005120150

    2v15007680150

    2v200010752300

    2v300014336300

    2v400023040375

    2v600033792375

    2v800046592525

    0.0283981566uW/CLB

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    P-static

    Slices

    mW

    Static Power in V2 Devices

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    Ps (fabric)

  • Methodology

    Dynamic power VDD2 * (1/LMIN) * Capacity * fCLKVDD: Supply voltageLMIN: Minimum process feature sizeCapacity: Number of gates per devicefCLK: Clock frequencyStatic power VDD * Capacity * exp(-Vth/S)VTH: Theshold voltageS: Subthreshold slopeIdeally 60mV/decade; typically 90-120mV/decade; set at 100mV/decadeNote: Does not include gate leakage (?)

  • Dynamic PowerNormalized to 2001Best fit is a quadratic trend linePredicts 5X by 20071996: 4000EX 1997: 4000XL 1998: 4000XV 1999: Virtex 2000: Virtex-E 2001: Virtex-II

    Chart17

    0.0677083333

    0.07623

    0.1488095238

    0.3434444444

    0.754272

    1

    Dynamic Power

    Sheet1

    Xilinx FPGAs Historical Trend

    Dynamic Power Ratio = C*V^2*f

    199619971998199920002001200220032004200520062007

    XC4036EXXC4085XL40250XVVirtexVirtex-eVirtex-IIVirtex-II ProWhitneyRanier??Everest??

    Vdd53.32.52.51.81.51.36992619931.22207853281.10303480481.00512473410.92317939610.8535884382

    1/Vdd0.20.3030303030.40.40.55555555560.66666666670.729966330.81827801830.90658970660.99490139491.08321308321.1715247715

    Feature (nm)0.50.350.250.220.180.150.13390096870.11930683490.1075813310.09795435390.08990881230.0830846067

    1/Feature sz22.857142857144.54545454555.55555555566.66666666677.46820586818.38174946749.295293066610.208836665911.122380265112.0359238643

    Tilo1.61.20.70.60.420.36

    1/Tilo0.6250.83333333331.42857142861.66666666672.3809523812.7777777778

    Speed334475.428571428688125.7142857143146.6666666667168.0730158714191.6743764151215.2757369589238.8770975026262.4784580464286.0798185901

    125001

    Capacity30787448201022764884240140000

    65000180000500000112400040740008000000120000001600000024000000310000004100000050000000

    Power11.12585846152.19780219785.072410256411.140017230814.769230769218.902532406520.380286608625.222281143527.33144308430.754925576732.2951668462

    0.06770833330.076230.14880952380.34344444440.75427211.2798589651.37991523911.70775861911.85056645882.08236475262.1866519219

    Sheet1

    0

    0

    0

    0

    0

    0

    MHz

    Speed

    Sheet2

    0

    0

    0

    0

    0

    0

    0

    0

    &A

    Page &P

    1/Volts

    1/Vdd

    Sheet3

    00

    00

    00

    00

    00

    00

    00

    00

    1/um

    1/Process Feature Size

    Sheet4

    0

    0

    0

    0

    0

    0

    Watts

    00

    00

    00

    00

    00

    00

    &A

    Page &P

    #Logic Cells

    #System Gates

    Device Capacity

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    &A

    Page &P

    Volts

    1/Vdd

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    1/um

    1/Process Feature Size

    Xilinx FPGAs Historical Trend

    Static Power = K*Vdd*exp(-Vth/S)

    199619971998199920002001200220032004200520062007

    Vdd53.32.52.51.81.51.36992619931.22207853281.10303480481.00512473410.92317939610.8535884382

    Feature s0.50.350.250.220.180.150.13390096870.11930683490.1075813310.09795435390.08990881230.0830846067

    VTH0 (n)0.740.610.550.550.3620.3550.2240.27960.23078054490.21125507920.19477583530.1806815238

    1/VTHN01.35135135141.63934426231.81818181821.81818181822.76243093922.81690140854.46428571433.57653791134.33312088884.73361399685.13410710475.5346002127

    VTH0 (p)0.940.780.60.60.52560.4730.21860.2850.24356099440.22103030140.20231507610.1865217794

    1/VTHP01.06382978721.28205128211.66666666671.66666666671.9025875192.11416490494.57456541633.50877192984.10574773064.52426655484.9427853795.3613042032

    VTH0 (n-lv)0.2760.2650.116

    VTH0 (n)0.740.610.550.44333333320.3620.3550.31048621490.28257507430.25926817640.23951305030.22255528910.2078400106

    1/VTHN01.35135135141.63934426231.81818181822.25563909852.76243093922.81690140853.2207549063.53888255163.85701019724.17513784284.49326548844.811393134

    VTH0 (p)0.940.780.60.57726315790.52560.4730.42647044540.39216598990.36296943280.33781898250.3159280670.2967015886

    1/VTHP01.06382978721.28205128211.66666666671.73231218091.9025875192.11416490492.34482837142.54994065182.75505293222.96016521263.16527749293.3703897733

    Capacity65000180000500000112400040740008000000120000001600000024000000310000004100000050000000

    I-leak0.00000003980.00000079430.00000316230.00000316230.00023988330.00028183830.00575439940.0015995580.00492260.00771701260.01127824820.0156021613

    P-static0.0129384830.47183097143.95284707528.88600022511759.11215617543382.059517517494597.229551942531276.5684601045130315.180093971240453.366373206426885.699713102665891.222728189

    P-static0.00000382560.000139510.00116876920.00262739320.5201304551127.97030302459.247787715838.531308931471.0967282296126.2206349421196.8892679976

    I-leak0.00000003980.00000079430.00000316230.00003686950.00023988330.00028183830.00078548490.00149365140.00255457250.00402596040.0059490430.0083483355

    P-static0.0129384830.47183097143.9528470752103.60315665811759.11215617543382.059517517312912.676419360129205.749426117267626.7776815155125444.36326418225173.391129576356302.132984923

    P-static0.00000382560.000139510.00116876920.0306331560.520130455113.81799207048.635492449219.995738493437.091116408366.5787783933105.3506394963

    tuan:/tools/xicad/solomon/spice/models/umc/u05-2hw3.lib

    tuan:/tools/xicad/solomon/spice/models/umc/u035-3s.lib

    tuan:/tools/xicad/solomon/spice/models/umc/u025-22e.lib

    tuan:/tools/xicad/solomon/spice/models/umc/u018-1.lib

    tuan:/tools/xicad/solomon/spice/models/umc/u015-10.lib

    tuan:/tools/xicad/solomon/spice/models/ibm/xi010tt-01.lib

    tuan:/tools/xicad/solomon/spice/models/ibm/v212/ESD_85/xi013tt-212.mod

    00

    00

    00

    00

    00

    00

    00

    00

    1/VTHN0

    1/VTHP0

    ITRS Roadmap 2002 Predictions

    Dynamic Power

    Process13011510090807065

    ITRS Year2001200220032004200520062007

    Xilinx Year200220032004200520062007

    Speed168.4231.7308.8399517.3563.1673.9

    Energy/xtor0.3470.2120.1370.0990.0650.0520.032

    Capacity2763484395536978781106

    Power/chip16128.004817093.899218572.158421844.05323436.276525708.893623850.6688

    Pwr/chip (norm)11.05988926791.15154717711.35441756561.45314171161.59405294821.478835671

    tuan:/tools/xicad/solomon/spice/models/umc/u025-22e.lib

    Watts

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    tuan:/tools/xicad/solomon/spice/models/umc/u05-2hw3.lib

    tuan:/tools/xicad/solomon/spice/models/umc/u035-3s.lib

    tuan:/tools/xicad/solomon/spice/models/umc/u025-22e.lib

    tuan:/tools/xicad/solomon/spice/models/umc/u025-22e.lib

    tuan:/tools/xicad/solomon/spice/models/umc/u018-1.lib

    tuan:/tools/xicad/solomon/spice/models/umc/u015-10.lib

    \

    Watts

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    tuan:/tools/xicad/solomon/spice/models/umc/u025-22e.lib

    tuan:ITRS Table1i: Functions per chip (million transistors) for Logic (low-volume, high performance)* Assuming constant die size

    tuan:Table 4c: Chip Frequency / 10.

    tuan:Table 35a: Energy per device switching transition.

    0

    0

    0

    0

    0

    0

    ITRS Roadmap 2002 Predictions

    Static Power

    Process13011510090807065

    ITRS Year2001200220032004200520062007

    Xilinx Year200220032004200520062007

    Static Power per device5.60E-096.70E-091.00E-081.10E-082.60E-085.30E-085.30E-08

    Capacity2763484395536978781106

    Static Power/chip1.55E-062.33E-064.39E-066.08E-061.81E-054.65E-05

    S Pwr/chip (norm)11.51E+002.84E+003.94E+001.17E+013.01E+01

    0

    0

    0

    0

    0

    0

    0

    Watts

    tuan:ITRS Table1i: Functions per chip (million transistors) for Logic (low-volume, high performance)* Assuming constant die size

    tuan:Table 35a: Static power per device (Watts/Device)* ITRS Requirement (projection?)* Gate leakage not included* Assuming non-CMOS technology in 2007

  • Static PowerNormalized to 2001Best fit is a power trend Predicts 100X by 2007Future data points projected using linear trend for 1/VTH

    1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007

    Chart21

    0.0000038256

    0.00013951

    0.0011687692

    0.0306331559

    0.5201304551

    1

    \

    Static Power

    Sheet1

    Xilinx FPGAs Historical Trend

    Dynamic Power Ratio = C*V^2*f

    199619971998199920002001200220032004200520062007

    XC4036EXXC4085XL40250XVVirtexVirtex-eVirtex-IIVirtex-II ProWhitneyRanier??Everest??

    Vdd53.32.52.51.81.51.36992619931.22207853281.10303480481.00512473410.92317939610.8535884382

    1/Vdd0.20.3030303030.40.40.55555555560.66666666670.729966330.81827801830.90658970660.99490139491.08321308321.1715247715

    Feature (nm)0.50.350.250.220.180.150.13390096870.11930683490.1075813310.09795435390.08990881230.0830846067

    1/Feature sz22.857142857144.54545454555.55555555566.66666666677.46820586818.38174946749.295293066610.208836665911.122380265112.0359238643

    Tilo1.61.20.70.60.420.36

    1/Tilo0.6250.83333333331.42857142861.66666666672.3809523812.7777777778

    Speed334475.428571428688125.7142857143146.6666666667168.0730158714191.6743764151215.2757369589238.8770975026262.4784580464286.0798185901

    125001

    Capacity30787448201022764884240140000

    65000180000500000112400040740008000000120000001600000024000000310000004100000050000000

    Power11.12585846152.19780219785.072410256411.140017230814.769230769218.902532406520.380286608625.222281143527.33144308430.754925576732.2951668462

    0.06770833330.076230.14880952380.34344444440.75427211.2798589651.37991523911.70775861911.85056645882.08236475262.1866519219

    Sheet1

    0

    0

    0

    0

    0

    0

    MHz

    Speed

    Sheet2

    0

    0

    0

    0

    0

    0

    0

    0

    &A

    Page &P

    1/Volts

    1/Vdd

    Sheet3

    00

    00

    00

    00

    00

    00

    00

    00

    1/um

    1/Process Feature Size

    Sheet4

    0

    0

    0

    0

    0

    0

    Watts

    00

    00

    00

    00

    00

    00

    &A

    Page &P

    #Logic Cells

    #System Gates

    Device Capacity

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    &A

    Page &P

    Volts

    1/Vdd

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    1/um

    1/Process Feature Size

    Xilinx FPGAs Historical Trend

    Static Power = K*Vdd*exp(-Vth/S)

    123456789101112

    Vdd53.32.52.51.81.51.36992619931.22207853281.10303480481.00512473410.92317939610.8535884382

    Feature s0.50.350.250.220.180.150.13390096870.11930683490.1075813310.09795435390.08990881230.0830846067

    VTH0 (n)0.740.610.550.550.3620.3550.2240.27960.23078054490.21125507920.19477583530.1806815238

    1/VTHN01.35135135141.63934426231.81818181821.81818181822.76243093922.81690140854.46428571433.57653791134.33312088894.73361399695.13410710495.5346002129

    VTH0 (p)0.940.780.60.60.52560.4730.21860.2850.24356099440.22103030130.20231507610.1865217794

    1/VTHP01.06382978721.28205128211.66666666671.66666666671.9025875192.11416490494.57456541633.50877192984.10574773074.52426655494.94278537915.3613042034

    VTH0 (n-lv)0.2760.2650.116

    VTH0 (n)0.740.610.550.44333333330.3620.3550.3104862150.28257507430.25926817640.23951305030.22255528920.2078400106

    1/VTHN01.35135135141.63934426231.81818181822.25563909772.76243093922.81690140853.22075490583.53888255143.8570101974.17513784264.49326548824.8113931338

    VTH0 (p)0.940.780.60.57726315790.52560.4730.42647044540.39216598990.36296943270.33781898240.3159280670.2967015886

    1/VTHP01.06382978721.28205128211.66666666671.73231218091.9025875192.11416490492.34482837152.54994065182.75505293222.96016521263.1652774933.3703897733

    Capacity65000180000500000112400040740008000000120000001600000024000000310000004100000050000000

    I-leak0.00000003980.00000079430.00000316230.00000316230.00023988330.00028183830.00575439940.0015995580.00492260.00771701260.01127824820.0156021613

    P-static0.0129384830.47183097143.95284707528.88600022511759.11215617543382.059517517494597.229551942531276.5684601045130315.180108833240453.366401281426885.699763206665891.22280577

    P-static0.00000382560.000139510.00116876920.00262739320.5201304551127.97030302459.247787715838.531308935871.0967282379126.2206349569196.8892680205

    I-leak0.00000003980.00000079430.00000316230.00003686950.00023988330.00028183830.00078548490.00149365140.00255457250.00402596040.0059490430.0083483355

    P-static0.0129384830.47183097143.9528470752103.6031563131759.11215617543382.059517517312912.676414328529205.749415877867626.7776599712125444.363227566225173.391068971356302.132895923

    P-static0.00000382560.000139510.00116876920.03063315590.520130455113.81799206898.635492446119.995738487137.091116397566.5787783753105.35063947

    tuan:/tools/xicad/solomon/spice/models/umc/u05-2hw3.lib

    tuan:/tools/xicad/solomon/spice/models/umc/u035-3s.lib

    tuan:/tools/xicad/solomon/spice/models/umc/u025-22e.lib

    tuan:/tools/xicad/solomon/spice/models/umc/u018-1.lib

    tuan:/tools/xicad/solomon/spice/models/umc/u015-10.lib

    tuan:/tools/xicad/solomon/spice/models/ibm/xi010tt-01.lib

    tuan:/tools/xicad/solomon/spice/models/ibm/v212/ESD_85/xi013tt-212.mod

    00

    00

    00

    00

    00

    00

    00

    00

    1/VTHN0

    1/VTHP0

    ITRS Roadmap 2002 Predictions

    Dynamic Power

    Process13011510090807065

    ITRS Year2001200220032004200520062007

    Xilinx Year200220032004200520062007

    Speed168.4231.7308.8399517.3563.1673.9

    Energy/xtor0.3470.2120.1370.0990.0650.0520.032

    Capacity2763484395536978781106

    Power/chip16128.004817093.899218572.158421844.05323436.276525708.893623850.6688

    Pwr/chip (norm)11.05988926791.15154717711.35441756561.45314171161.59405294821.478835671

    tuan:/tools/xicad/solomon/spice/models/umc/u025-22e.lib

    Watts

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    tuan:/tools/xicad/solomon/spice/models/umc/u05-2hw3.lib

    tuan:/tools/xicad/solomon/spice/models/umc/u035-3s.lib

    tuan:/tools/xicad/solomon/spice/models/umc/u025-22e.lib

    tuan:/tools/xicad/solomon/spice/models/umc/u025-22e.lib

    tuan:/tools/xicad/solomon/spice/models/umc/u018-1.lib

    tuan:/tools/xicad/solomon/spice/models/umc/u015-10.lib

    \

    Watts

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    tuan:/tools/xicad/solomon/spice/models/umc/u025-22e.lib

    tuan:ITRS Table1i: Functions per chip (million transistors) for Logic (low-volume, high performance)* Assuming constant die size

    tuan:Table 4c: Chip Frequency / 10.

    tuan:Table 35a: Energy per device switching transition.

    0

    0

    0

    0

    0

    0

    Watts

    Dynamic Power

    0

    0

    0

    0

    0

    0

    Speed

    0

    0

    0

    0

    0

    0

    Energy/xtor

    0

    0

    0

    0

    0

    0

    #Xtors/chip

    ITRS Roadmap 2002 Predictions

    Static Power

    Process13011510090807065

    ITRS Year2001200220032004200520062007

    Xilinx Year200220032004200520062007

    Static Power per device5.60E-096.70E-091.00E-081.10E-082.60E-085.30E-085.30E-08

    Capacity2763484395536978781106

    Static Power/chip1.55E-062.33E-064.39E-066.08E-061.81E-054.65E-05

    S Pwr/chip (norm)11.51E+002.84E+003.94E+001.17E+013.01E+01

    0

    0

    0

    0

    0

    0

    0

    Watts

    Static Power

    tuan:ITRS Table1i: Functions per chip (million transistors) for Logic (low-volume, high performance)* Assuming constant die size

    tuan:Table 35a: Static power per device (Watts/Device)* ITRS Requirement (projection?)* Gate leakage not included* Assuming non-CMOS technology in 2007

  • Static versus Dynamic

  • The Age of Accumulation

  • Incremental Designlessens the impact of design changesNext Generation technologyEasy set-up through floorplanning along HDL hierarchy boundariesChanges only affect the module that was changedThe remainder of the design stays locked and intactTiming repeatabilitypreserves routingFaster turnaround for localized design changesNew in ISE5.1i

  • Partial ReconfigurabilityFPGA Flexibility for the FieldRe-program part of an FPGA while its still runningVirtex-II and Virtex-E011011User Definable BoundariesFixedLogicPRLogicPRLogicFixedLogicFixedLogic

  • System Level Design FlowEnabling Technologies

  • Reconfigurability : 4th dimensionApplications open, close, and change priority over timeUse model : Object-Oriented Multiprocessing

  • The mother of all complex flows

    No established formal models / semanticsDesign capture and simulation issuesHardware/software co-designModular design with uniform interfacesDynamic management of diverse hardware resources creates new place & route issuesBitstream production, storage, loading and linking

  • Combining the Best of FPGA and ASIC: XBlue 100% Programmable100% Fixed LogicTraditional FPGA MarketTraditional ASIC Market Flexible, but expensive Inflexible, but highestperformance/integrationBlockRAMSpecial FunctionsPowerPCCoreBlockRAMEmbeddedFPGA Core

  • ConclusionsFPGAs ride the tideToday : Programmable System PlatformChallengesDesign TechnologyInterconnectLow PowerExploit 3rd and 4th dimension

  • A trend in the 1990s is that process geometries are smaller than the wavelength of light used in wafer fabrication. This creates an obstacle in manufacturing deep submicron geometries.

    How do companies such as Xilinx and UMC overcome this obstacle? Through optical proximity correction (OPC) technology. Wafer processing masks have to be corrected as shown on the next slide.

    Future system cannot be upscaled 10GHz monoprocessor also for physical reasons. There is 4* more data transport but the average wire length also increases. Memory delay does not scale unless we decrease their size This leads to the need for parallel architectures with distributed embedded storage. Localized computation and storage are the key features to exploit DS technology. Fortunately this is consistent with domain specific multimedia image and sound stream processing as opposed to GP C program execution. The digitization of consumer products has also had another very profound effect on the dynamics of the consumer market place. The digitization of Consumer products has accelerated the rate at which products and standards become obsolete For the manufactures of these products, being able to quickly design and deliver a product to market will have a significant impact on his ability to gain market share.The window of design authoring is the most crucial in the design flow. This is where product innovation and design excellence takes place. It is becoming a smaller portion of the overall design task. Because deep-submicron design calls for additional emphasis on design rules and silicon effects, tools to analyze and correct these problems proliferate. The logic designer must comprehend interconnect timing issues, power and electromigration effects at the time of authoring, to reduce the iterations between design and layout. The average number of iterations is 20 to 23, impacting the efficiency of design.The number of point tools to ensure complete analysis increases as process geometries decrease. The design task becomes longer and more complex. No silicon design or verification requiredProgrammability means design freeze moves much farther to the rightlower cost, easier to use toolsets

    from a study published in California Management Review, 1998by Stefan Thomke and Donald Reinertsen from Harvard391 ASIC and FPGA projects surveyedaverage 18 months per ASIC design, 8 months per FPGA design55% time-to-market advantage for FPGA design flow demonstrated

    37% of new digital products were late to marketentering late significantly reduces the amount of profit available to a company

    data is from a study by Price Waterhouse Coopers

    The inherent programmability of FPGAs leads to greater time IN marketFPGAs can now be reprogrammed securely via the internetgreater profitslower cost of supportfix and debug designs while in the fieldlower susceptibility to changing standards or specslonger time in marketWith the PowerPC 405 tightly integrated into the FPGA fabric, the CoreConnect bus architecture can be used to connect the processor to the soft peripherals and other IP built in progrmamable gates. The following example illustrates how the three busses of the CoreConnect standard are used in a typical system.

    The main bus is the Processor Local Bus, or P-L-B, which is primarily used for connecting high speed peripherals directly to the processor, such as memory and external bus controllers. It has separate 32-bit addresses and 64-bit data busses for the data and instruction caches of the 405. Furthermore, it has separate data read and write busses for overlapped transfers, and at a typical speed of 133MHz, it has a peak data bandwidth to the 405 of 2.1 GB/s. It supports pipelining, split transactions, and burst and cache line transfers. ***MOUSE CLICK***

    CoreConnect also has a secondary bus, the On-chip Peripheral Bus, or O-P-B., used primarily to offload slower peripherals such as UARTs and General Purpose I/Os from the PLB. It has 32-bit address and data busses and supports single cycle data transfers. ***MOUSE CLICK***

    The third CoreConnect bus is the Device Control Register, or D-C-R. It provides direct data transfers between the General Purpose Registers and DCR registers included in any peripherals. It is used primarily for system initialization and device configuration.Please not that these bus structures built in soft gates and can be customized by users for optimal performance . ***MOUSE CLICK***And now, lets move on to the System Generator for PowerPC. With this tool designers can specify, configure and assemble a PowerPC system. The tool will automatically integrate the PowerPC with the selected and customized soft peripherals using the CoreConnect bus.The flow is simple;First, the designer has a concept, a block diagram, of a processor system.Second, the designer specifies the architecture by selecting the different peripherals and connecting them together using the GUI shown on the foil.Third, each peripheral is parameterized using a CORE generator like GUI, as shown.Fourth, the designer assembles the entire system by specifying memory maps etc, using the GUI shown on the foil. Here Xilinx has an advantage over Altera. In the Altera tool, the designer is forced to define the memory map while defining each individual peripheral. Its difficult to get the system overview doing it that way. The Xilinx approach is to allow the designer to define global parameters in one global view.

    As part of the Xilinx Empower! strategy, the MicroBlaze soft processor complements the Xilinx embedded PowerPC solution. The MicroBlaze processor will utilize the IBM CoreConnect bus for peripherals, the same bus used for the embedded PowerPC, providing customers with a plug-and-play infrastructure for building complex system-on-a-chip. In addition, peripherals built for the MicroBlaze processor are compatible with PowerPC products. Finally, because of it's compact size and the use of the CoreConnect interface, customers will be able to build intelligent processor systems with many multiple MicroBlaze processors as on-chip slave peripherals to the PowerPC.Incremental Design - is focused on minimizing the impact of late cycle design changes.

    Based on the pioneering BLIS technology Xilinx introduced, Incremental Design is "next generation" technology that works the way customers work.

    The engineer can quickly and easily floorplan along hierarchy boundaries, and then finish the design as normal. Then if a late design change comes along, the engineer only has to re-implement the portion of the design that changed, the rest of the design stays completed and intact.

    Every one of our global customers said that they had to implement at least one late-cycle design change in their projects released in the last year.

    Incremental Design leads to faster overall design closure with better, repeatable timing results.Partial Reconfigurability is an exciting new FPGA option available for Virtex-II and Virtex-E devices and enabled with the Modular Design ISE software option.

    Partial Reconfigurability lets you re-program part of an FPGA while the FPGA continues running. An industry exclusive, Partial Reconfigurability offers customers lower support costs, with less field support required with no product down time.Accumulating oher stuffAdded features dont get dropped -- they get improved. Gates, mem, arithmetic, clocksThe age of accumulation has been gaining speed.As with any evolution, it starts earler than notices, but gains speed quickly: punctuated equilibrium.Top 4 boxes in 3 yearsFor many years, programmable logic consisted of gates and routing. However, Xilinx has been expanding the capabilities offered by programmable logic over the years. First special arithmetic functions were added to increase the performance of commonly used building blocks like adders and counters. In 1992, Xilinx pioneered the XC4000 family, which included RAM into programmable logic and greatly expanded the application space which could be addressed. More recently, Xilinx has brought sophisticated clock management to the masses in the Virtex architecture, along with extremely flexible I/Os. In all cases, these sub-systems have been done in a programmable way that maximizes the flexibility for the FPGA user.Looking into the future, we can see that serial gigabit I/O, microprocessors and mixed signal capability will all be added to FPGAs to create a truly programmable system on a chip.Lest anyone doubt this vision, consider for a moment the steady forward progress of Moores Law. Extrapolated, this says that Xilinx will have FPGAs in 2005 that contain 2 billion transistors. At this point, the cost of adding some of these additional features is negligible. The trick is to implement the features in the correct way, something that Xilinx has demonstrated over the years.

    Incremental Design - is focused on minimizing the impact of late cycle design changes.

    Based on the pioneering BLIS technology Xilinx introduced, Incremental Design is "next generation" technology that works the way customers work.

    The engineer can quickly and easily floorplan along hierarchy boundaries, and then finish the design as normal. Then if a late design change comes along, the engineer only has to re-implement the portion of the design that changed, the rest of the design stays completed and intact.

    Every one of our global customers said that they had to implement at least one late-cycle design change in their projects released in the last year.

    Incremental Design leads to faster overall design closure with better, repeatable timing results.Partial Reconfigurability is an exciting new FPGA option available for Virtex-II and Virtex-E devices and enabled with the Modular Design ISE software option.

    Partial Reconfigurability lets you re-program part of an FPGA while the FPGA continues running. An industry exclusive, Partial Reconfigurability offers customers lower support costs, with less field support required with no product down time.