
8 Xcell Journal First Quarter 2011

COVER STORY

Stacked & Loaded: Xilinx SSI, 28-Gbps I/O Yield Amazing FPGAs

by Mike Santarini
Publisher, Xcell Journal
Xilinx, Inc.
[email protected]


Xilinx recently added to its lineup two innovations that will further expand the application possibilities and market reach of FPGAs. In late October, Xilinx® announced it is adding stacked silicon interconnect (SSI) FPGAs to the high end of its forthcoming 28-nanometer Virtex®-7 series (see Xcell Journal, Issue 72). The new, innovative architecture connects several dice on a single silicon interposer, allowing Xilinx to field Virtex-7 FPGAs that pack as many as 2 million logic cells, twice the logic capacity of any other announced 28-nm FPGA, enabling next-generation capabilities in the current generation of process technology.

Then, in late November, Xilinx tipped the Virtex-7 HT line of devices. Leveraging this SSI technology to combine FPGA and high-speed transceiver dice on a single IC, the Virtex-7 HT devices are a giant technological leap forward for customers in the communications sector and for the growing number of applications requiring high-speed I/O. These new FPGAs carry up to sixteen 28-Gbps transceivers along with dozens of 13.1-Gbps transceivers in the same device, facilitating the development of 100-Gbps communications equipment today and 400-Gbps communications line cards well in advance of established standards for equipment running at that speed.

MORE THAN MOORE

Ever since Intel co-founder Gordon Moore published his seminal article “Cramming More Components onto Integrated Circuits” in the April 19, 1965 issue of Electronics magazine, the semiconductor industry has doubled the transistor counts of new devices every 22 months, in lockstep with the introduction of every new silicon process. Like other companies in the semiconductor business, Xilinx has learned over the years that to lead the market, it must keep pace with Moore’s Law and create silicon on each new generation of process technology or, better yet, be the first company to do so.

Now, at a time when the complexity, cost and thus risk of designing on the latest process geometries are becoming prohibitive for a greater number of companies, Xilinx has devised a unique way to more than double the capacity of its next-generation devices, the Virtex-7 FPGAs. By introducing one of the semiconductor industry’s first stacked-die architectures, Xilinx will field a line of the world’s largest FPGAs. The biggest of these, the 28-nm Virtex-7 XC7V2000T, offers 2 million logic cells along with 46,512 kbits of block RAM, 2,160 DSP slices and 36 GTX 10.3125-Gbps transceivers. The Virtex-7 family includes multiple SSI FPGAs as well as monolithic FPGA configurations. Virtex-7 is the high end of the 7 series, which also includes the new low-cost, low-power Artix™ FPGAs and the midrange Kintex™ FPGAs, all implemented on a unified Application Specific Modular Block (ASMBL) architecture.

The new SSI technology is more than just a windfall for customers itching to use the biggest FPGAs the industry can muster. The successful deployment of stacked dice in a mainstream logic chip marks a huge semiconductor engineering accomplishment. Xilinx is delivering a stacked silicon chip at a time when most companies are just evaluating stacked-die architectures in hopes of reaping capacity, integration, PCB real-estate and even yield benefits. Most of these companies are looking to stacked-die technology to simply keep up with Moore’s Law; Xilinx is leveraging it today as a way to exceed it and as a way to mix and match complementary types of dice on a single IC footprint to offer vast leaps forward in system performance, bill-of-materials (BOM) savings and power efficiency.

‘More than Moore’ stacked silicon interconnect technology and 28-Gbps transceivers lead new era of FPGA-driven innovations.

THE STACKED SILICON ARCHITECTURE

“This new stacked silicon interconnect technology allows Xilinx to offer next-generation density in the current generation of process technology,” said Liam Madden, corporate vice president of FPGA development and silicon technology at Xilinx. “As die size gets larger, the yield goes down exponentially, so building large dice is quite difficult and very costly. The new architecture allows us to build a number of smaller dice and then use a silicon interposer to connect those smaller dice lying side-by-side on top of the interposer so they appear to be, and function as, one integrated die” (Figure 1).
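Madden’s remark that yield falls exponentially with die size can be illustrated with the textbook Poisson yield model, in which the fraction of defect-free dice is exp(-area × defect density). The sketch below is only an illustration under that assumed model: the defect density and die area are hypothetical placeholders, not Xilinx or TSMC figures, and the calculation ignores interposer and assembly losses.

```python
import math

def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dice under the simple Poisson yield model."""
    return math.exp(-area_cm2 * defects_per_cm2)

def silicon_per_good_device(total_area_cm2: float, slices: int,
                            defects_per_cm2: float) -> float:
    """Wafer area consumed per good device when the device is built from
    `slices` equal dice that are tested before assembly (known-good dice)."""
    slice_area = total_area_cm2 / slices
    return slices * slice_area / poisson_yield(slice_area, defects_per_cm2)

# Hypothetical values, chosen only to show the trend.
D0 = 0.5          # defects per cm^2 (assumed)
TOTAL_AREA = 6.0  # cm^2 of FPGA silicon in the finished device (assumed)

monolithic = silicon_per_good_device(TOTAL_AREA, 1, D0)
four_slice = silicon_per_good_device(TOTAL_AREA, 4, D0)

print(f"yield of one large die   : {poisson_yield(TOTAL_AREA, D0):.3f}")
print(f"yield of each small slice: {poisson_yield(TOTAL_AREA / 4, D0):.3f}")
print(f"silicon per good device, monolithic: {monolithic:.1f} cm^2")
print(f"silicon per good device, 4 slices  : {four_slice:.1f} cm^2")
```

Under these assumptions, splitting the same silicon area into four slices cuts the wafer area spent per good device sharply, because a defect scraps only one small slice rather than the whole die.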

Each of the dice is interconnected via layers in the silicon interposer in much the same way that discrete components are interconnected on the many layers of a printed-circuit board (Figure 2). The dice and the silicon interposer layer connect by means of multiple microbumps. The architecture also uses through-silicon vias (TSVs) that run through the passive silicon interposer to facilitate direct communication between regions of each die on the device and resources off-chip (Figure 3). Data flows between adjacent FPGA dice across more than 10,000 routing connections.

Madden said using a passive silicon interposer rather than going with a system-in-package or multichip-module configuration has huge advantages. “We use regular silicon interconnect or metallization to connect up the dice on the device,” said Madden. “We can get many more connections within the silicon than you can with a system-in-package. But the biggest advantage of this approach is power savings. Because we are using chip interconnect to connect the dice, it is much more economical in power than connecting dice through big traces, through packages or through circuit boards.”

In fact, the SSI technology provides more than 100 times the die-to-die connectivity bandwidth per watt, at one-fifth the latency, without consuming any high-speed serial or parallel I/O resources.

Madden also notes that the microbumps are not directly connected to the package. Rather, they are interconnected to the passive interposer, which in turn is linked to the adjacent die. This setup offers great advantages by shielding the microbumps from electrostatic discharge. By positioning dice next to each other and interfaced to the ball-grid array, the device avoids the thermal flux, signal integrity and design tool flow issues that would have accompanied a purely vertical die-stacking approach.

Figure 1 – The stacked silicon architecture places several dice (aka slices) side-by-side on a silicon interposer. [Diagram labels: ASMBL-optimized FPGA slices side-by-side on the silicon interposer; >10K routing connections between slices; ~1-ns latency.]

As with the monolithic 7 series devices, Xilinx implemented the SSI members of the Virtex-7 family in TSMC’s 28-nm HPL (high-performance, low-power) process technology, which Xilinx and TSMC developed to create FPGAs with the right mix of power efficiency and performance (see cover story sidebar, Xcell Journal, Issue 72).

NO NEW TOOLS REQUIRED

While the SSI technology offers some radical leaps forward in terms of capacity, Madden said it will not force a radical change in customer design methodologies. “One of the beautiful aspects of this architecture is that we were able to establish the edges of each slice [individual die in the device] along natural partitions where we would have traditionally run long wires had these structures been in our monolithic FPGA architecture,” said Madden. “This meant that we didn’t have to do anything radical in the tools to support the devices.” As a result, “customers don’t have to make any major adjustments to their design methods or flows,” he said.

At the same time, Madden said that customers will benefit from adding floor-planning tools to their flows because they now have so many logic cells to use.

A SUPPLY CHAIN FIRST

While the design is in and of itself quite innovative, one of the biggest challenges of fielding such a device was in putting together the supply chain to manufacture, assemble, test and distribute it. To create the end product, each of the individual dice must first be tested extensively at the wafer level, binned and sorted, and then attached to the interposer. The combined structure then needs to be packaged and given a final test to ensure connectivity before the end product ships to customers.

Figure 2 – Xilinx’s stacked silicon technology uses passive silicon-based interposers, microbumps and TSVs.

Layer labels in the diagram: 28nm FPGA slices, microbumps, silicon interposer, through-silicon vias, C4 bumps, package substrate, BGA balls.

Microbumps
• Access to power / ground / IOs
• Access to logic regions
• Leverages ubiquitous image sensor microbump technology

Through-Silicon Vias (TSVs)
• Only bridge power / ground / IOs to C4 bumps
• Coarse pitch, low density aid manufacturability
• Etch process (not laser drilled)

Passive Silicon Interposer
• 4 conventional metal layers connect microbumps & TSVs
• No transistors means low risk and no TSV-induced performance degradation
• Etch process (not laser drilled)

Side-by-Side Die Layout
• Minimal heat flux issues
• Minimal design tool flow impact

Madden’s group worked with TSMC and other partners to build this supply chain. “This is another first in the industry, as no other company has put in place a supply chain like this across a foundry and OSAT [outsourced semiconductor assembly and test],” said Madden.

“Another beautiful aspect of this approach is that we can use essentially the same test approach that we use in our current devices,” he went on. “Our current test technology allows us to produce known-good dice, and that is a big advantage for us because in general, one of the biggest barriers of doing stacked-die technology is how do you test at the wafer level.”

Because the stacked silicon technology integrates multiple Xilinx FPGA dice on a single IC, it logically follows that the architecture would also lend itself to mixing and matching FPGA and other dice to create entirely new devices. And that’s exactly what Xilinx did with its ultrafast Virtex-7 HT line, announced just weeks after the SSI technology rollout.

DRIVING COMMUNICATIONS TO 400 GBPS

The new Virtex-7 HT line of devices is targeted squarely at communications companies that are developing 100- to 400-Gbps equipment. The Virtex-7 HT combines on a single IC multiple 28-nm FPGA dice, bearing dozens of 13.1-Gbps transceivers, with 28-Gbps transceiver dice. The result is a device with a formidable mix of logic cells as well as cutting-edge transceiver performance and reliability.

The largest of the Virtex-7 HT line includes sixteen GTZ 28-Gbps transceivers, seventy-two 13.1-Gbps transceivers plus logic and memory, offering transceiver performance and capacity far greater than competing devices (see Video 1, http://www.youtube.com/user/XilinxInc#p/c/71A9E924ED61B8F9/1/eTHjt67ViK0).
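To put that transceiver mix in perspective, the short calculation below adds up the raw line rates quoted for the largest Virtex-7 HT device and estimates how many 28-Gbps lanes a 100- or 400-Gbps interface would need. It is a back-of-the-envelope sketch: the lane-count helper deliberately ignores coding, framing and forward-error-correction overhead, which real line cards must budget for.

```python
import math

# Transceiver mix of the largest Virtex-7 HT device, as quoted in the article.
TRANSCEIVERS = [
    (16, 28.0),  # sixteen GTZ transceivers at 28 Gbps
    (72, 13.1),  # seventy-two transceivers at 13.1 Gbps
]

def aggregate_gbps(transceivers):
    """Sum of raw line rates in one direction, in Gbps."""
    return sum(count * rate for count, rate in transceivers)

def lanes_needed(payload_gbps, lane_rate_gbps):
    """Minimum lane count for a payload, ignoring coding/framing overhead."""
    return math.ceil(payload_gbps / lane_rate_gbps)

one_way = aggregate_gbps(TRANSCEIVERS)
print(f"raw one-way bandwidth : {one_way:.1f} Gbps")
print(f"raw full-duplex total : {2 * one_way:.1f} Gbps")
print(f"28G lanes for 100 Gbps: {lanes_needed(100, 28.0)}")
print(f"28G lanes for 400 Gbps: {lanes_needed(400, 28.0)}")
```

Under these simplifying assumptions, the sixteen 28-Gbps lanes are enough to carry a 400-Gbps interface on a single device, with the 13.1-Gbps transceivers left over for other links.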


Figure 3 – Actual cross-section of the 28-nm Virtex-7 device. TSVs can be seen connecting the microbumps (dotted line, top) through the silicon interposer.

Video 1 – Dr. Howard Johnson introduces the 28-Gbps transceiver-laden Virtex-7 HT. http://www.youtube.com/user/XilinxInc#p/c/71A9E924ED61B8F9/1/eTHjt67ViK0


“We leveraged stacked interconnect technology to offer Virtex-7 devices with a 28G capability,” said Madden. “By offering the transceivers on a separate die, we can optimize our 28-Gbps transceiver performance and electrically isolate functions to offer an even higher degree of reliability for applications requiring cutting-edge transceiver performance and reliability.”

With the need for bandwidth exploding, the communications sector is frantically racing to establish new networks. The wireless industry is scrambling to produce equipment supporting 40-Gbps data transfer today, while wired networking is approaching 100 Gbps. FPGAs have played a key role in just about every generation of networking equipment since their inception (see cover stories in Xcell Journal, Issues 65 and 67).

Communications equipment design teams traditionally have used FPGAs to receive signals sent to equipment in multiple protocols, translate those signals to common protocols that the equipment and network use, and then forward the data to the next destination. Traditionally companies have placed a processor in between FPGAs monitoring and translating incoming signals and those FPGAs forwarding signals to their destination. But as FPGAs advance and grow in capacity and functionality, a single FPGA can both send and receive, while also performing processing, to add greater intelligence and monitoring to the system. This lowers the BOM and, more important, reduces the power and cooling costs of networking equipment, which must run reliably 24 hours a day, seven days a week.

In a white paper titled “Industry’s Highest-Bandwidth FPGA Enables World’s First Single-FPGA Solution for 400G Communications Line Cards,” Xilinx’s Greg Lara outlines several communications equipment applications that can benefit from the Virtex-7 HT devices (see http://www.xilinx.com/support/documentation/white_papers/wp385_V7_28G_for_400G_Comm_Line_Cards.pdf).

To name a few, Virtex-7 HT FPGAs can find a home in 100-Gbps line cards supporting OTU-4 (Optical Transport Unit) transponders. They can be used as well in muxponders or service aggregation routers, in lower-cost 120-Gbps packet-processing line cards for highly demanding data processing, in multiple 100G Ethernet ports and bridges, and in 400-Gbps Ethernet line cards. Other potential applications include base stations and remote radio heads with 19.6-Gbps Common Public Radio Interface requirements, and 100-Gbps and 400-Gbps test equipment.

JITTER AND EYE DIAGRAM

A key to playing in these markets is ensuring the FPGA transceiver signals are robust, reliable and resistant to jitter or interference and to fluctuations caused by power system noise. For example, the CEI-28G specification calls for 28-Gbps networking equipment to have extremely tight jitter budgets.

Signal integrity is an extremely crucial factor for 28-Gbps operation, said Panch Chandrasekaran, senior marketing manager of FPGA components at Xilinx. To meet the stringent CEI-28G jitter budgets, the transceivers in the new Xilinx FPGAs employ phase-locked loops (PLLs) based on an LC tank design and advanced equalization circuits to offset deterministic jitter.
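A rough sense of why the budget is so tight comes from the bit period itself: at 28 Gbps, one unit interval (UI) lasts under 36 picoseconds. The sketch below computes the UI and combines deterministic and random jitter with the widely used dual-Dirac approximation, TJ = DJ + 2·Q(BER)·RJ. The jitter values are hypothetical examples, not CEI-28G limits or measured Xilinx data, and the target bit-error rate of 1e-12 is likewise an assumption.

```python
from statistics import NormalDist

def unit_interval_ps(line_rate_gbps: float) -> float:
    """Duration of one bit in picoseconds."""
    return 1000.0 / line_rate_gbps

def total_jitter_ps(dj_ps: float, rj_rms_ps: float, ber: float = 1e-12) -> float:
    """Dual-Dirac estimate: TJ = DJ + 2 * Q(BER) * RJ_rms, where Q is the
    Gaussian tail distance for the target bit-error rate (about 7.03 at 1e-12)."""
    q = -NormalDist().inv_cdf(ber)
    return dj_ps + 2.0 * q * rj_rms_ps

# Hypothetical jitter figures, for illustration only.
DJ_PS = 5.0      # deterministic jitter, peak-to-peak (assumed)
RJ_RMS_PS = 0.5  # random jitter, rms (assumed)

ui = unit_interval_ps(28.0)
tj = total_jitter_ps(DJ_PS, RJ_RMS_PS)
print(f"unit interval at 28 Gbps: {ui:.2f} ps")
print(f"estimated total jitter  : {tj:.2f} ps ({tj / ui:.2f} UI)")
print(f"remaining eye width     : {ui - tj:.2f} ps")
```

Because the random term is multiplied by roughly 14 at this error rate, even a fraction of a picosecond of rms jitter consumes a visible share of the bit period, which is why low-noise PLLs and equalization matter at these speeds.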

“Noise isolation becomes a very important parameter at 28-Gbps signaling speeds,” said Chandrasekaran. “Because the FPGA fabric and transceivers are on separate dice, the sensitive 28-Gbps analog circuitry is isolated from the digital FPGA circuits, providing superior isolation compared to monolithic implementations” (Figures 4a and 4b).

The FPGA design also includes features that minimize lane-to-lane skew, allowing the devices to support stringent optical standards such as the Scalable SerDes Framer Interface standard (SFI-S).

Further, the GTZ transceiver design eliminates the need for designers to employ external reference resistors, lowering the BOM costs and simplifying the board design. A built-in “eye scan” function automatically measures the height and width of the post-equalization data eye. Engineers can use this diagnostic tool to perform jitter budget analysis on an active channel and optimize transceiver parameters to get optimal signal integrity, all without the expense of specialized equipment.

ISE® Design Suite software tool support for 7 series FPGAs is available today. Virtex-7 FPGAs with massive logic capacity, thanks to the SSI technology, will be available this year. Samples of the first Virtex-7 HT devices are scheduled to be available in the first half of 2012. For more information on Virtex-7 FPGAs and SSI technology, visit http://www.xilinx.com/technology/roadmap/7-series-fpgas.htm.


Figure 4a – Xilinx 28-Gbps transceiver displays an excellent eye opening and jitter performance (using PRBS31 data pattern).

Figure 4b – This is a competing device’s 28-Gbps signal using a much simpler PRBS7 pattern. The signal is extremely noisy with a significantly smaller eye opening. Eye size is shown close to relative scale.
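The captions contrast PRBS31 with the “much simpler” PRBS7. Both are pseudo-random bit sequences produced by a linear-feedback shift register; the difference lies in register length, and therefore in how long the pattern runs before repeating and how long its runs of identical bits can be. The sketch below is an illustrative generator using the commonly cited polynomials x^7 + x^6 + 1 and x^31 + x^28 + 1; the function names are hypothetical, and nothing here reflects how the transceivers or test equipment implement their pattern generators.

```python
from itertools import islice

def _step(state: int, order: int, tap: int):
    """Advance a Fibonacci LFSR one bit; return (new_state, output_bit)."""
    new_bit = ((state >> (order - 1)) ^ (state >> (tap - 1))) & 1
    mask = (1 << order) - 1
    return ((state << 1) | new_bit) & mask, new_bit

def prbs(order: int, tap: int, seed: int = 1):
    """Endless bit stream for the polynomial x^order + x^tap + 1."""
    state = seed & ((1 << order) - 1)
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        state, bit = _step(state, order, tap)
        yield bit

def measured_period(order: int, tap: int, seed: int = 1) -> int:
    """Count steps until the register returns to its starting state."""
    start = state = seed & ((1 << order) - 1)
    steps = 0
    while True:
        state, _ = _step(state, order, tap)
        steps += 1
        if state == start:
            return steps

print("first 16 PRBS7 bits:", list(islice(prbs(7, 6), 16)))
print("PRBS7 period (measured)    :", measured_period(7, 6))
print("PRBS31 period (theoretical):", 2**31 - 1)
```

A maximal-length sequence of order n repeats after 2^n − 1 bits, so PRBS7 cycles every 127 bits while PRBS31 runs for more than two billion. The longer pattern exercises far more bit combinations and longer runs without transitions, which is why it is generally considered the more demanding test of a serial link.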