xcell journal issue 80

www.xilinx.com/xcell/ SOLUTIONS FOR A PROGRAMMABLE WORLD Xcell journal Xcell journal ISSUE 80, THIRD QUARTER 2012 FPGA-Based Instrumentation Withstands the Chill of Deep Space Partial Dynamic Reconfiguration Fuels FSK Demodulator Design Demystifying FPGA Mathematics Ins and Outs of ADCs and DACs Xilinx Rolls World’s First Heterogeneous 3D FPGA Virtex-7 H580T Device Enables 2x100G Transponder-on-Chip for CFP2 Optical Nets Artix-7 FPGA Brings High-End Value to Low-Cost Market page 14

DESCRIPTION

The summer 2012 edition of Xcell Journal magazine features in-depth looks at Xilinx’s Virtex®-7 H580T, the world’s first heterogeneous 3D FPGA, and the Artix™-7 A100T, the first device shipping from Xilinx’s feature-rich, low-power, low-cost 28nm generation of All Programmable devices. The issue also includes several informative how-to and design methodology articles from the Xilinx user community.

TRANSCRIPT

Page 1: Xcell Journal issue 80

www.xilinx.com/xcell/

SOLUTIONS FOR A PROGRAMMABLE WORLD

Xcell journal

ISSUE 80, THIRD QUARTER 2012

FPGA-Based Instrumentation Withstands the Chill of Deep Space

Partial Dynamic Reconfiguration Fuels FSK Demodulator Design

Demystifying FPGA Mathematics

Ins and Outs of ADCs and DACs

Xilinx Rolls World’s First Heterogeneous 3D FPGA

Virtex-7 H580T Device Enables 2x100G Transponder-on-Chip for CFP2 Optical Nets

Artix-7 FPGA Brings High-End Value to Low-Cost Market

page 14

Page 3: Xcell Journal issue 80

New stacked silicon architecture from Xilinx makes your big design much easier to prototype. Partitioning woes are forgotten, and designs run at near final chip speed. The DINI Group DNV7F1 board puts this new technology in your hands with a board that gets you to market easier, faster and more confident of your design’s functionality running at high speed. DINI Group engineers put the features you need most, right on the board:

• 10GbE

• USB 2

• PCIe, Gen 1, 2, and 3

• 240 pin UDIMM for DDR3

There is a Marvell processor for any custom interfaces you might need and plenty of power and cooling for high speed logic emulation. Software and firmware developers will appreciate the productivity gains that come with this low cost, stand-alone development platform.

Prototyping just got a lot easier. Call DINI today and get your chip up to speed.

www.dinigroup.com • 7469 Draper Avenue • La Jolla, CA 92037 • (858) 454-3419 • e-mail: [email protected]

Page 4: Xcell Journal issue 80

LETTER FROM THE PUBLISHER

Xilinx, Inc.
2100 Logic Drive
San Jose, CA 95124-3400
Phone: 408-559-7778
FAX: 408-879-4780
www.xilinx.com/xcell/

© 2012 Xilinx, Inc. All rights reserved. XILINX, the Xilinx Logo, and other designated brands included herein are trademarks of Xilinx, Inc. All other trademarks are the property of their respective owners.

The articles, information, and other materials included in this issue are provided solely for the convenience of our readers. Xilinx makes no warranties, express, implied, statutory, or otherwise, and accepts no liability with respect to any such articles, information, or other materials or their use, and any use thereof is solely at the risk of the user. Any person or entity using such information in any way releases and waives any claim it might have against Xilinx for any loss, damage, or expense caused thereby.

PUBLISHER Mike [email protected]

EDITOR Jacqueline Damian

ART DIRECTOR Scott Blair

DESIGN/PRODUCTION Teie, Gelwicks & Associates, 1-800-493-5551

ADVERTISING SALES Dan [email protected]

INTERNATIONAL Melissa Zhang, Asia [email protected]

Christelle Moraga, Europe/Middle East/[email protected]

Miyuki Takegoshi, [email protected]

REPRINT ORDERS 1-800-493-5551

Xcell journal

www.xilinx.com/xcell/

Hardware-Assisted Prototyping: Make vs. Buy?

As long as there are ASICs, ASSPs, microprocessors and other types of digital ICs following the silicon process technology curve, there will be a need to prototype those devices in real hardware before they go to production. But is it easier to make or buy a prototyping system? That’s the question panelists attacked in a recent Pavilion Panel session at the 49th Design Automation Conference in San Francisco.

Gabe Moretti, EDA veteran and owner of the popular website GabeonEDA.com, moderated the panel, entitled “Hardware-Assisted Prototyping and Verification: Make vs. Buy?” Gabe’s panelists were Qualcomm engineering director Albert Camilleri; Austin Lesea of Xilinx Research Labs, co-author of the book The FPGA Prototyping Methodology Manual; and Mike Dini, CEO of the hardware-assisted verification company the Dini Group.

Panelists agreed that with SoCs now reaching well over 100 million gates, it’s imperative to emulate the chip you are designing before it goes to manufacturing so as to reduce corner-case bugs and minimize respins. Likewise, as software becomes a greater part of the overall system development, hardware-assisted prototyping systems, such as those offered by the Dini Group and Synopsys’ HAPS group, are becoming an imperative.

“I look at the size and complexity of IC designs people are doing today, and I’m not sure I’d want to build a prototype for something this large,” said Moretti. “The commercial systems are pretty expensive, but I’m not sure if it makes more sense today to buy one rather than build one. So, should I build one or buy one?” Moretti asked panelists.

“I think a lot of people start out thinking there are some good reasons to build their own, but quickly realize it probably would have been easier to buy one,” said Lesea. “In the modern SoC world, about 80 percent of the project is software. The sooner you can get something for the software team to work on, the sooner your project is not stuck. … The time you save in time-to-market pretty much pays for the commercial prototyping system and more.”

“I’ve long argued that unless software guys have real hardware working at some reasonable frequency, they can’t really do effective development,” added Dini. “Unless they can get a real blue screen of death, if they can’t run an interrupt into the weeds and smoke a piece of hardware, generally development slows or stalls … pure software-only simulation can’t cover enough corner cases.”

Camilleri said he has both built and purchased prototyping systems depending on what a given design required and what commercial offerings were available. “If there is already a commercial offering that will do what you need, then why build one?” Camilleri noted, however, that sometimes a design’s performance requirements will force design groups to build a custom system. But he said that in general it’s only a good idea if you are building it for a specific project in which everyone on the design team is intimately aware of all the nuances. “If you are building a prototyping system for your division, for example, that can be a huge undertaking,” he said. “It’s easy to underestimate the time it takes and the quality of EDA software it takes to develop a commercial-class prototyping system.”

With a few exceptions in the emulation world, the vast majority of emulation and prototyping systems are FPGA based.

Panelists were especially encouraged by FPGA vendors’ embrace of 3D IC technology to exceed the doubling of capacity for the next generation of FPGAs that one typically expects with each new silicon process technology. “The Virtex®-7 2000T is a game changer,” said Dini. Panelists said that the fewer FPGAs there are in an emulation system, the easier it is to partition a design and stabilize it in the prototyping system. An FPGA with massive capacity allows ever-larger designs, with hundreds of millions of ASIC gates, to get to market much sooner.

Mike Santarini
Publisher

Page 5: Xcell Journal issue 80

www.aldec.com

© 2012 Aldec and Xilinx are trademarks of Aldec, Inc. and Xilinx, Inc. respectively. All other trademarks or registered trademarks are property of their respective owners.

Headquarters-US
2260 Corporate Circle, Henderson, NV 89074, USA
Phone: +1.702.990.4400
E-mail: [email protected]

Europe
Mercia House, 51 The Green, South Bar, Banbury, OX16 9AB, United Kingdom
Phone: +44.1295.20.1240
Email: [email protected]

Japan
Shinjyuku Estate Bldg. 9F, 1-34-15, Shinjyuku, Shinjyuku-ku, Tokyo 160-0022, Japan
Phone: +81.3.5312.1791
Email: [email protected]

China
Suite 2004, BaoAn Building, #800 DongFang Road, PuDong District, Shanghai City, 200122, P.R. China
Phone: +86.21.6875.2030
Email: [email protected]

Israel
1 Haofe Street, Kadima 60920, Israel
Phone: +972.072.2288316
E-mail: [email protected]

Taiwan
No. 37, Section 2, Liujia 5th Road, Hsinchu County 302, Zhubei City, Taiwan
Phone: +886.3.6587712
E-mail: [email protected]

India
#2145, 17th Main 2nd Cross, HAL 2nd Stage Indiranagar, Bangalore, 560008, India
Phone: +91.80.3255.1030
Email: [email protected]

VHDL, Verilog, SystemVerilog
OVM/UVM, VMM
Code and Functional Coverage
DSP Co-Simulation (MATLAB® & Simulink®)
Emulation (Hardware-Assisted Verification)

Your Global Verification Solutions Provider

THE DESIGN VERIFICATION COMPANY

Page 6: Xcell Journal issue 80

CONTENTS

Cover Story
Xilinx Introduces First Heterogeneous 3D FPGA: Virtex-7 H580T… 8

Product Feature
Artix-7 FPGA Brings High-End Value to Low-Cost Market… 14

XCELLENCE BY DESIGN APPLICATION FEATURES

Xcellence in Aerospace & Defense
FPGA-Based Instrumentation Withstands the Chill of Deep Space… 24

Xcellence in Green Technology
Using Spartan Technology to Support Green Energy Development… 28

Xcellence in Solid-State Storage
Designing a 19-nm Flash PCIe SSD with Kintex-7 FPGAs… 32

VIEWPOINTS

Letter From the Publisher
Hardware-Assisted Prototyping: Make vs. Buy?… 4

Xpert Opinion
FPGAs Head for the Cloud… 20

Page 7: Xcell Journal issue 80

THIRD QUARTER 2012, ISSUE 80

THE XILINX XPERIENCE FEATURES

Xperts Corner
How Partial Dynamic Reconfiguration Helped Build an FSK Demodulator… 38

Xplanation: FPGA 101
The Basics of FPGA Mathematics… 44

Xplanation: FPGA 101
The FPGA Engineer’s Guide to Using ADCs and DACs… 50

XTRA READING

Profiles of Xcellence
The sky’s the limit for SSD enterprise storage startup Skyera… 54

Tools of Xcellence
Xilinx FPGAs improve beamforming system design… 58

Xtra, Xtra
Your questions answered on the new Vivado Design Suite… 64

Xclamations!
Share your wit and wisdom by supplying a caption for our techy cartoon. Three chances to win an Avnet Spartan-6 LX9 MicroBoard!… 66

Excellence in Magazine & Journal Writing: 2010, 2011

Excellence in Magazine & Journal Design and Layout: 2010, 2011, 2012

Page 8: Xcell Journal issue 80

COVER STORY

Xilinx Introduces First Heterogeneous 3D FPGA: Virtex-7 H580T

8 Xcell Journal Third Quarter 2012

Page 9: Xcell Journal issue 80

by Mike Santarini
Publisher, Xcell Journal
Xilinx, [email protected]

Built with Xilinx 3D SSI technology, this device enables designers to create a 2x100G OTN transponder-on-a-chip.

Hot on the heels of breaking capacity and transistor-count records with the release of its 28-nanometer Virtex®-7 2000T (the industry’s first 28-nm FPGA implemented in 3D stacked-silicon interconnect technology), Xilinx® in May released a device using SSI technology that breaks the record for FPGA bandwidth. The new Virtex-7 H580T device is the world’s first heterogeneous 3D FPGA, integrating a dedicated eight-channel 28-Gbps transceiver slice (or die) alongside two transceiver-rich FPGA dice on a single silicon interposer. All told, the new product gives wired communications companies a device with up to forty-eight 13.1-Gbps transceivers as well as the eight 28-Gbps transceivers and 580,480 logic cells, making the Virtex-7 H580T FPGA the only single-chip solution for addressing key 2x100G applications and functions (Figure 1). Product details are spelled out at http://www.xilinx.com/publications/prod_mktg/Virtex7-Product-Table.pdf.

“Combined with Xilinx’s 100-Gbps gearbox, Ethernet MAC, OTN and Interlaken IP, the Virtex-7 HT devices provide customers with the kind of system integration they need to meet space, power and cost challenges as they transition to 100-Gbps low-power optical modules in the new CFP2 form factor,” said Ephrem Wu, senior director of advanced communications at Xilinx. “The 28-Gbps transceivers are independent from the 13.1-Gbps transceivers. Customers can use all available 28-Gbps transceivers without having to give up any 13.1-Gbps transceivers.”

The Virtex-7 H580T FPGA is the first of three heterogeneous 3D devices Xilinx will field in its 28-nm family. The Virtex-7 H870T, due in the near future, comprises two eight-channel transceiver dice alongside three FPGA logic dice on a single device, yielding a total of sixteen 28-Gbps transceivers, seventy-two 13.1-Gbps transceivers and 876,160 logic cells on one chip. A third heterogeneous device, the Virtex-7 H290T, places one eight-channel transceiver die alongside one FPGA logic slice on a single device, yielding twenty-four 13.1-Gbps transceivers, eight 28-Gbps transceivers and 284,000 logic cells on one chip.

“Our 3D SSI technology enables Xilinx to jump ahead of the technology curve and offer All Programmable devices that enable the highest levels of integration, system performance, power reduction, BOM cost reduction and productivity,” said Wu. “With the Virtex-7 2000T, we used 3D SSI technology to stack four logic slices side-by-side on a silicon interposer to create a device with 6.8 billion transistors and 1,954,560 logic cells, double the capacity of the largest competing 28-nm FPGA and well beyond the expected doubling of transistor counts dictated by Moore’s Law. Now, with the Virtex-7 HT devices, we have leveraged our 3D SSI technology to stack 28-Gbps transceiver slices alongside 28-nm FPGA slices on a silicon interposer, all in a single chip.”

Page 10: Xcell Journal issue 80

The SSI technology, Wu said, “enables Xilinx to field a device today that will allow customers to offer distinctly compelling value to their customers of 100-Gbps optics-enabled equipment and allow the wired communications industry to accelerate the development of next-generation 400G equipment.”

THE INSATIABLE BANDWIDTH REQUIREMENT

Alex Goldhammer, senior product line manager at Xilinx for Virtex-7 FPGAs, notes that as more and more systems connect to the Internet and private networks, the demand spirals for more bandwidth to transfer ever-larger files and stream higher-quality video and audio across the globe. To accommodate this demand, service providers want higher-bandwidth wired communications equipment at a lower cost per bit. The wired communications sector in particular is currently building equipment to conform to recently formalized 100-Gbps communications optical transceiver standards, most notably CFP2 optics, OIF CEI-28-VSR and IEEE 802.3ba.

At the heart of the 100-Gbps infrastructure buildout are optical transport network (OTN) transponders and muxponders, and 100G Ethernet cards. Network companies place these OTN cards at the center or core of the optical network (the fastest sections) to ensure the integrity and proper routing of data as it flies across the globe on fiber-optic cables.

Goldhammer said companies already have equipment with first-generation 100-Gbps OTN transponder cards today, each typically composed of a series of one or two ASSPs and one FPGA. These first-generation 100-Gbps OTN cards transmit and receive input from fiber optics via a CFP optical module. (The acronym stands for C Form-factor Pluggable.) An ASSP then takes 10x11.1G OTL 4.10 or CAUI (100-Gbps attachment unit interface) from the CFP and performs 100-Gbps forward error correction (GFEC), OTU-4 framing and 100GE mapping before sending the data to the FPGA via a CAUI. The FPGA is commonly used to translate the protocol to the required form for the backplane to route the data to the next point in the network and, ultimately, its destination.

The CFP optical modules, relatively bulky and relatively expensive, are the main stumbling blocks in these first-generation 100-Gbps OTN transport cards, said Goldhammer. To address this problem, the industry recently created the CFP2 form factor, defining an optical module for 100-Gbps line cards that is half the width (pitch) of CFP with slightly less depth, in the same power envelope. The advent of CFP2 means equipment companies can swap out existing CFP-based line cards for new cards that have two CFP2 channels per unit area, thus doubling the bandwidth of each card slot and potentially doubling the bandwidth of data centers (see Figure 2).

Figure 1 – Xilinx 3D SSI technology forms the foundation for the Virtex-7 H580T, the world’s first heterogeneous FPGA. It marries 28-nm FPGA logic slices and a dedicated 28-Gbps transceiver die on a silicon interposer.

[Figure 2 diagram: a 100G CFP optics module (10x10G lanes, CAUI interface) compared with a CFP2 optics module (4x25G lanes, CAUI4 interface). Four CFPs: 400 Gbps, 60 watts. Eight CFP2s: 800 Gbps, 60 watts.]

Figure 2 – The CFP2 form factor makes it possible to double the bandwidth of 100-Gbps OTN cards. CFP2 cards are half the width of a CFP and use half the power, enabling drastically lower system costs.

Page 11: Xcell Journal issue 80

But Goldhammer said that CFP2 comes with new technical challenges. “CFP2 requires 25- to 28-Gbps transceivers, PCB channel modeling using IBIS-AMI models and high-speed serial modeling software tools. And each card must maintain the same power budget as the CFP card it is replacing.” Although the move from CFP to CFP2 delivers twice the bandwidth per watt, “simply doubling the amount of chips on each card to handle the bandwidth is not going to be viable, especially when it comes to power budgets,” he said. “CFP2 requires more sophisticated silicon devices with greater degrees of integration.”
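The "twice the bandwidth per watt" claim follows directly from the Figure 2 numbers (four CFPs at 400 Gbps and eight CFP2s at 800 Gbps, both within 60 watts). The short Python sketch below simply redoes that arithmetic; the class and field names are illustrative, and the assumption that every module carries exactly 100 Gbps is taken from the figure rather than from any module datasheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CardConfig:
    """One line-card configuration, using the Figure 2 numbers."""
    name: str
    modules: int             # optical modules fitted in the same space
    gbps_per_module: float   # assumed 100 Gbps per module, as in Figure 2
    power_w: float           # power envelope quoted in Figure 2

    @property
    def total_gbps(self) -> float:
        return self.modules * self.gbps_per_module

    @property
    def gbps_per_watt(self) -> float:
        return self.total_gbps / self.power_w


cfp = CardConfig("CFP", modules=4, gbps_per_module=100.0, power_w=60.0)
cfp2 = CardConfig("CFP2", modules=8, gbps_per_module=100.0, power_w=60.0)

# Ratio of bandwidth-per-watt between the two form factors
ratio = cfp2.gbps_per_watt / cfp.gbps_per_watt

for cfg in (cfp, cfp2):
    print(f"{cfg.name}: {cfg.total_gbps:.0f} Gbps, {cfg.gbps_per_watt:.2f} Gbps/W")
print(f"CFP2 vs. CFP bandwidth per watt: {ratio:.1f}x")
```

Because the power envelope stays fixed while the module count doubles, every chip behind each module has roughly half the power it had on a CFP card, which is the integration pressure Goldhammer describes.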

One architecture that equipment manufacturers are currently contemplating for CFP2 cards consists of five devices, Goldhammer said: four ASSPs and one FPGA. Each card will have two CFP2 optical modules, which would use a 4x27G OTL 4.4 interface to a gearbox ASSP. The gearbox in its turn demultiplexes the 4x27G OTL 4.4 signal to 10x11.1G OTL 4.10. Then, another ASSP performs 100-Gbps GFEC, OTU-4 framing and 100GE mapping, and transports the data on a CAUI interface to the FPGA. Next, each of the CFP2’s two channels sends the data to the one FPGA on the board, which serves as a CAUI-to-Interlaken bridge for the backplane to in turn send data to the next point in the network, and ultimately its destination (Figure 3).
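The gearbox changes the number of lanes, not the aggregate rate: the same OTU4 signal is carried either on four fast lanes (OTL 4.4) or on ten slower ones (OTL 4.10). As a rough check on the "4x27G" and "10x11.1G" labels, the sketch below derives per-lane rates from the OTU4 line rate defined in ITU-T G.709 (255/227 × 99,532,800 kbit/s). That constant comes from the standard, not from this article, and the function name is illustrative.

```python
from fractions import Fraction

# OTU4 line rate from ITU-T G.709: 255/227 x 99,532,800 kbit/s.
# (This figure is taken from the standard, not from the article.)
OTU4_KBPS = Fraction(255, 227) * 99_532_800


def lane_rate_gbps(lanes: int) -> float:
    """Per-lane rate when the OTU4 signal is striped evenly over `lanes` lanes."""
    return float(OTU4_KBPS / lanes) / 1e6


otu4_gbps = float(OTU4_KBPS) / 1e6
otl4_4 = lane_rate_gbps(4)     # CFP2 side of the gearbox
otl4_10 = lane_rate_gbps(10)   # ASSP side of the gearbox

print(f"OTU4 aggregate: {otu4_gbps:.3f} Gbps")
print(f"OTL 4.4  lane:  {otl4_4:.3f} Gbps x 4")
print(f"OTL 4.10 lane:  {otl4_10:.3f} Gbps x 10")
```

Under this assumption each OTL 4.4 lane runs at just under 28 Gbps, which is consistent with the 25- to 28-Gbps transceiver requirement quoted for CFP2.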

“That configuration will typically require four ASSPs and one FPGA,” said Goldhammer. “The biggest problems with that configuration are power, complexity and cost. Simply doubling the ASSPs will blow the power budgets.”

While a CFP2 card will enable twice the bandwidth of CFP, each CFP2 module (with two 100-Gbps CFP2 ports) must maintain the power budget allotted to a single-port CFP module so as to stay within the same power budgets allocated for the entire card. Goldhammer said that the operational expense of this equipment is a significant concern for carriers, as there are many of these systems in the carrier’s plant and they must maintain strict power caps. “They have to stay in these power budgets, but they want that 2x increase in bandwidth, so much of that burden falls to the semiconductor vendors to also lower power,” he said.

With the new Virtex-7 H580T FPGA and Xilinx IP, Goldhammer said, 100-Gbps OTN line card makers can further maximize the value of their CFP2-based OTN cards by using just one Virtex-7 H580T to do the job of what would otherwise take all five chips. “The Virtex-7 H580T FPGA is a groundbreaking device timed perfectly to the CFP2 100-Gbps OTN transponder card market requirements,” said Goldhammer.

[Figure 3 diagram. ASSP Solution – Five-Chip OTN 2x100G Transponder: each of two CFP2 modules feeds a gearbox ASSP, followed over CAUI by an ASSP containing the 100G GFEC, OTU-4 framer and 100GE mapper, followed over CAUI by a single FPGA that acts as a MAC-to-Interlaken bridge to the backplane interface. Xilinx Virtex-7 H580T – Single-Chip OTN 2x100G Transponder: two CFP2 modules feed one Virtex-7 H580T that contains the gearbox, 100G GFEC, OTU-4 framer, 100GE mapper and Interlaken functions for both channels and connects to the backplane interface.]

Figure 3 – The Virtex-7 H580T FPGA and Xilinx IP will enable customers to quickly create single-chip, CFP2-based 100-Gbps OTN transponder cards instead of using five-chip cards.

Page 12: Xcell Journal issue 80

With the Virtex-7 H580T FPGA and Xilinx IP, companies can implement an architecture for their CFP2-based cards in which two CFP2 channels on a card feed into one Virtex-7 H580T FPGA. The FPGA integrates gearbox, 100-Gbps GFEC, OTU-4 framing, 100GE mapping and Interlaken bridging in one device (as again shown in Figure 3).

“This is a single-chip solution that is not only much lower power than a multiple-chip ASSP or ASIC configuration, but faster, more reliable and of course much less expensive to produce,” said Goldhammer. “It eliminates the need for multiple chips and their related power and cooling circuitry. With the Virtex-7 H580T FPGA, we offer customers more value in terms of integration, BOM cost reduction and improved system performance, without exceeding the power cap requirements for CFP2-based OTN transport cards.”

What’s more, Xilinx also has the right IP to enable communication equipment companies to accelerate their design productivity and get their single-chip 100G optics-based cards to market faster. Through internal development and a series of strategic acquisitions, Xilinx offers the whole package: 100-Gbps gearbox, Ethernet MACs, OTN and Interlaken IP. “We’ve optimized all these cores for designing into the 28-nm Virtex-7 FPGA logic cell slices on the device,” said Goldhammer. “Xilinx manufactured the slices in TSMC’s 28-nm high-performance, low-power [HPL] technology, which greatly reduces leakage, for the optimal mix of high performance and low power.”

SSI TECHNOLOGY AND 28-GBPS TRANSCEIVERS

One of the biggest challenges of high-speed communications equipment today is ensuring the proper function of transceivers so that they maintain good signal integrity. “Transceivers are analog circuits and that means they can be affected by a number of factors, especially noise,” said Goldhammer. “In most mixed-signal devices, transceivers are usually placed in isolation on the edges of devices to shield them from the digital circuitry in the middle of the device. Digital circuitry tends to be noisy, so that’s why they usually isolate it from analog.”

Over the last decade, to increase bandwidth into the gigabits-per-second range, the industry has turned to high-speed analog transceivers to quickly send and receive fast-traveling signals. Traditionally, the rule of thumb has been that the higher the bandwidth of the transceiver, the harder it is to ensure solid signal integrity.

Goldhammer said that because the Virtex-7 H580T FPGA is a highly integrated one-chip SSI technology solution, CFP2-based line cards built around it will achieve much improved performance. “Moving to 4x25G interfaces greatly reduces the complexity of routing 10x10G interfaces,” he said. “Although some are concerned about 25G-to-28G transceivers, with Xilinx SSI technology, Xilinx is able to substantially reduce this complexity. The 28G transceivers, which have sensitive analog circuitry, are physically separated from the digital logic. This architecture ensures good isolation from the digital transceiver-rich dice.”

The 28G transceivers are manufactured on a high-speed process technology, Goldhammer said, ensuring they are best in class. “The FPGA slices, by contrast, are manufactured in 28-nm HPL to ensure the lowest total power.” The result, he said, is stellar 28-Gbps transceiver performance and signal integrity on the Virtex-7 H580T FPGA. To see these transceivers in action, check out this video on YouTube: http://www.youtube.com/watch?v=FFZVwSjRC4c&feature=player_profilepage.

Goldhammer said the physical isolation afforded by the SSI architecture enabled Xilinx to give the Virtex-7 H580T FPGA eight 28-Gbps transceivers, twice the number found in the largest device fielded by the competition.

Video: A Virtex-7 H580T device demonstrates its ability to deliver the eye and jitter characteristics necessary to reach the performance required to interface to CFP2 optic modules.

Page 13: Xcell Journal issue 80

What’s more impressive is that the Virtex-7 H580T FPGA is not even the highest-transceiver device Xilinx will offer in its 28-nm family. The company will soon roll out the Virtex-7 H870T device, which will have sixteen 28-Gbps transceivers, seventy-two 13.1-Gbps transceivers and 876,160 logic cells. Goldhammer said that if a customer used all the transceiver capabilities of the H870T device, they could conceivably have a design with a serial connectivity totaling 2.78 terabits per second.
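The 2.78-Tbps figure can be reproduced from the transceiver counts quoted in this article, provided it is read as counting both the transmit and receive direction of every link. That reading is an assumption, not something the article states; under it, the total matches the H870T's complement of sixteen 28-Gbps and seventy-two 13.1-Gbps transceivers. A small Python sketch of the arithmetic, with illustrative names:

```python
# Transceiver counts as quoted in the article:
# device -> (number of 28-Gbps transceivers, number of 13.1-Gbps transceivers)
DEVICES = {
    "Virtex-7 H290T": (8, 24),
    "Virtex-7 H580T": (8, 48),
    "Virtex-7 H870T": (16, 72),
}


def one_way_gbps(n_28g: int, n_13g1: int) -> float:
    """Sum of line rates in one direction, with every transceiver at full rate."""
    return n_28g * 28.0 + n_13g1 * 13.1


def full_duplex_tbps(n_28g: int, n_13g1: int) -> float:
    """Assumption: the headline figure counts transmit plus receive."""
    return 2 * one_way_gbps(n_28g, n_13g1) / 1000.0


totals = {name: full_duplex_tbps(*counts) for name, counts in DEVICES.items()}

for name, tbps in totals.items():
    print(f"{name}: {tbps:.2f} Tbps")
```

The same calculation gives roughly 1.71 Tbps for the H580T and 1.08 Tbps for the H290T, so the three devices scale with the number of transceiver and logic dice placed on the interposer.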

“It was impractical and cost-prohibitive to place that many 28-Gbps transceivers on a monolithic FPGA,” he said. “Fortunately, SSI technology enabled us to create a scalable FPGA family today that has eight to sixteen 28-Gbps transceivers.” ASSP suppliers and other FPGA vendors have at most four 28G transceivers. This seems indicative of the challenges in doing the job monolithically in 40- and 28-nm processes.

The Virtex-7 H870T device is targeted at the next generation in wired communications: the 400G market, Goldhammer said. “The 400G market is a ways off, and if anything, companies are just starting to look at it in their labs and the standards bodies haven’t gotten to it yet,” he said. “What’s beautiful is that we already have a device that’s capable of doing it. We can help them speed up development of 400G, speed up the pace of innovation.”

In addition to the Virtex-7 H580T and H870T FPGAs, Xilinx will also release as part of the 28-nm family the Virtex-7 H290T. By leveraging Xilinx’s 3D SSI technology, the H290T will offer twenty-four 13.1-Gbps transceivers, eight 28-Gbps transceivers and 284,000 logic cells. Goldhammer said the Virtex-7 H290T is particularly well suited for the 2x100G gearbox market.

First silicon of the Virtex-7 H580T FPGAs is shipping to key customers today, with development tool support available in the recently announced Vivado™ Design Suite. Customers interested in using the Virtex-7 H580T device can contact their local Xilinx representative for further pricing and availability details. You can also find new white papers and videos on Xilinx’s 28-Gbps Serial Transceiver Technology page: http://www.xilinx.com/products/technology/transceivers/index.htm.


Page 14: Xcell Journal issue 80


PRODUCT FEATURE

Xilinx Artix-7 FPGA Ships High-End Value to Low-Cost Market by Mike SantariniPublisher, Xcell JournalXilinx, [email protected]

Page 15: Xcell Journal issue 80

With an eye on helpingits customers offergreater value to theircustomers, Xilinx®

in July announced itis shipping its Artix™-7 A100T FPGA,the first of three parts in a feature-richline of low-cost, low-power, All Prog-rammable devices. The larger Artix-7A200T and A350T FPGAs will follow inthe coming months.

The first Artix-7 device shipment tocustomers represents another majormilestone for Xilinx. It means the com-pany is now shipping FPGAs from allof the families in its 28-nanometer AllProgrammable device rollout. Xilinxearlier released the world’s first 3D ICFPGAs, the Kintex™-7 line, and thenbroke new ground with the Zynq™-7000 Extensible Processing Platform,which marries an ARM processor andFPGA logic on the same die.

Ehab Mohsen, product marketingmanager at Xilinx, predicts the Artix-7lineup will prove to be a smash hitwith customers and will set a newstandard for feature-set sophistica-tion, power consumption and ulti-mately value in what the press has tra-ditionally called the “low end” of theFPGA market. FPGA vendors refer tothis sector as the “value-based,” “high-volume” or “cost-sensitive” market.

“If you look at the feature set of theArtix-7 family, it’s hard to call it ‘low-end.’ It’s certainly the highest-end andhighest-value FPGA line in that marketto date,” said Mohsen. “Where thebiggest Spartan®-6 FPGA was 150klogic cells, the Artix-7 family starts at100k logic cells and runs all the way upto 350k logic cells.” Beyond logic cellcount, he said, these FPGAs have eightto sixteen 6.6-Gbps transceivers, up to18,540 kbits of block RAM and as manyas 1,040 DSP48E1 slices.

“The Artix-7 family offers twice theperformance of the Spartan-6 familyand half the power. That’s a pretty high-end, ‘low-end’ FPGA,” added Maureen

Third Quarter 2012 Xcell Journal 15

P R O D U C T F E A T U R E

Xilinx is now shippingthe first device in itsAll Programmable Artix-7 FPGA series,setting new powerand performance standards for cost-sensitive applications.

Page 16: Xcell Journal issue 80

ing the traditional “FPGA” label, whichhas long tied FPGA advancementsmerely to a doubling of logic cells every22 months in keeping with Moore’sLaw. Even the Artix-7 family—which isin fact Xilinx’s smallest 28-nm device—is loaded with programmable systemfeatures well beyond logic cells.

Mohsen said that with a blockRAM-to-logic ratio of up to 18.5 Mbitswithin 360k logic cells and 1,040DSP48E1 slices for the same capacity,the Artix-7 rivals the logic density ofcompeting midrange products whilestill benefiting from lower power andlower cost. The DSP resources pro-

vide up to 1,306 GMACs of DSP per-formance—three times that of thecompetition. This signal-processingclout is useful for imaging and com-munications applications requiringextensive processing capacity.

In addition, the Artix-7 family sup-ports up to 16 configurable 6.6-Gbpstransceivers that Xilinx has optimizedfor low power, giving the Artix-7 thefastest line rates for the cost-sensitivemarket. These transceivers supportpre-emphasis and continuous-time lin-ear equalization (CTLE) to compen-sate for signal distortion across trans-mission channels. “With 211 Gbps of

Smerdon, strategic marketing manag-er at Xilinx. “In fact, you would have togo to our competitor’s ‘midrange’ line,a more expensive device family, to finda comparable feature set, and eventhen, the Xilinx Artix-7 family still hasadvantages.”

LEVERAGING HPL AND 7 SERIES’SCALABLE ARCHITECTUREReducing power consumption was atop priority for Xilinx’s 28-nm genera-tion of devices (see cover story, Xcell

Journal issue 76). In fact, Xilinxworked very closely with TSMC to for-mulate TSMC’s HPL (high-perform-

ance, low-power) 28-nm silicon manu-facturing process to a sweet spot forFPGA production. As a result, theentire Xilinx 28-nm line halved totalpower consumption compared withthe previous generation of FPGAs.

“Across all product families, cus-tomers had been asking for lowerpower consumption, but especially soin the cost-sensitive market,” saidMohsen. “These devices go into a widerange of applications where lowerpower is needed for multiple reasonsranging from longer battery life tolower energy costs, better power dissi-pation, lower BOM (not requiring

16 Xcell Journal Third Quarter 2012

P R O D U C T F E A T U R E

extra shielding and power circuitry)and smaller end-product form factor.”

As such, Mohsen said, the Artix-7family line takes full advantage ofthe 50 percent power savings whiledelivering the needed performancefor its target markets. “The 50 per-cent power reduction provides head-room for additional performance,logic density, I/O bandwidth and sig-nal processing,” said Mohsen. Thatgives designers “the flexibility toeither lower power by 50 percent ortake advantage of greater perform-ance and capacity at previous powerbudgets,” he said.

Mohsen noted that all of Xilinx’s 28-nm All Programmable devices use the same logic architecture. The Artix-7 FPGA’s slice architecture is based closely on that of the Xilinx Virtex®-6 and Spartan-6 FPGA families, using the same LUT structure, control logic and outputs. “This scalable architecture provides users with an easy migration path when moving their designs between Spartan-6 and Artix-7 FPGAs,” said Mohsen.

MOORE THAN LOGIC CELLS

The Artix-7 is a prime example of how all Xilinx devices are quickly outgrow-

[Figure 1 block diagram. Labels: 128-Channel Transducer; ADC, 128 Channels; Deserializer; RX Beamformer Control; Data High-Speed I/O; Control High-Speed I/O; Artix-7 FPGA; RX 128-Channel Beamformer; 547 Pins (LVDS); 46 Pins]

Figure 1 – The Artix-7 FPGA’s DSP performance and I/O count can be leveraged for 128-channel portable ultrasound equipment.


total throughput, the Artix-7 is a low-cost alternative for bandwidth-sensitive applications that would otherwise require midrange solutions,” said Mohsen.
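The 211-Gbps figure can be reproduced with simple arithmetic. The sketch below assumes, as an interpretation the article does not spell out, that the total counts all 16 transceivers running at the full 6.6-Gbps line rate in both the transmit and receive directions. The function and parameter names are illustrative only.

```python
def aggregate_throughput_gbps(num_transceivers: int, line_rate_gbps: float,
                              full_duplex: bool = True) -> float:
    """Peak raw serial bandwidth summed over all transceivers."""
    # Assumption: a full-duplex link is counted once per direction.
    directions = 2 if full_duplex else 1
    return num_transceivers * line_rate_gbps * directions


total = aggregate_throughput_gbps(16, 6.6)
print(f"{total:.1f} Gbps")
```

This is raw line rate; any line-coding or protocol overhead would reduce the usable payload bandwidth.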

What’s more, Mohsen said that because memory read/write bandwidth can affect overall system performance, the Artix-7 family offers DDR3 data rates of up to 1,066 Mbps, the highest in the industry for FPGAs in its class. The memory solution consists of a flexible controller and physical layer (PHY) for interfacing designs and AMBA® AXI4 slave interfaces to DDR3 and DDR2 SDRAM devices. The controller supports an array of external memories for flexible system design, such as for streamlined access to video and data storage.

As such, the Artix-7 A100T devices are ideally suited for a number of applications that will allow customers to innovate, offer a rich new set of features to their customers and expand their markets. To illustrate, Mohsen described three markets (portable medical equipment, handheld radios and small cellular basestations) that will greatly benefit from the Artix-7 FPGA family’s feature set.

PREMIUM VALUE FOR PORTABLE MEDICAL

Mohsen said that companies creating devices for the medical electronics field are eager to expand their product portfolios beyond million-dollar, large-form-factor, hospital-class equipment. They are striving to also offer lower-cost, portable electronic equipment lines to smaller doctors’ offices, hospital departments and even individual practitioners.

“Portable ultrasound equipment is a prime example of a market that can greatly benefit from the feature set of the Artix-7 FPGAs,” said Mohsen. “Instead of having to wheel a patient into a special room to be tested with a very large ultrasound system, these portable systems are much smaller and can be on a cart or even handheld and brought to the patient. Paramedics can use them in ambulances, and doctors who still perform house calls can use them, too. What’s amazing is that with the Artix-7 FPGA line, companies can give their next generation of portable ultrasound equipment many of the advanced features found previously only in high-end systems.”

That’s not to say that these new classes of equipment will replace those bigger systems, Mohsen added, “because those systems also continue to add incredible new features thanks in part to the rich feature set of our larger Kintex-7 and Virtex-7 FPGA families.”

Mohsen said that because the Artix-7 family offers 65 percent lower static

[Figure 2 block diagram. Labels: 300-MHz to 2-GHz Antenna; Tx/Rx Switch; Traditional RF Section; Low-Noise Amplifier Section; RF Tuning; SAW Filter; A/D; Data Formatting I/O; Digital Wideband Front End; Software-Defined Radio Processing Engine; Software Control Processing Engine; Encryption Processing Engine; Human Interface; Artix-7 FPGA]

Figure 2 – The system integration and DSP processing capacity in the Artix-7 FPGA is critical for software-radio design.


challenging to support all these waveforms, but they must also be completely secure and able to operate in rugged conditions where radio frequency is difficult. So the military is always looking for better, lighter systems that can run longer and more securely.”

All these requirements make the Artix-7 family ideal for SDR systems. Indeed, the new device is particularly well suited for SDR modem management. Mohsen explained that the modem in an SDR system performs baseband signal preprocessing and RF signal improvements, which require enormous amounts of parallel processing and reconfigurability. “FPGAs are a natural fit for this application and most systems today do indeed use FPGAs, but the Artix-7 offers a vast performance improvement,” he said. With up to 1,040 DSP slices, the Artix-7 can provide up to 1,306 GMACs of DSP performance, three times the performance of competing FPGAs and far greater than any standalone DSP or GPU.
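The quoted DSP figures imply a per-slice rate that is easy to check. The sketch below simply divides the article's 1,306 GMACs by its 1,040 slices. Reading the result as a clock rate with two multiply-accumulates per slice per cycle (for example, by using a pre-adder in a symmetric filter) is our assumption, not a statement from the article.

```python
DSP_SLICES = 1_040   # from the article
PEAK_GMACS = 1_306   # from the article

# Peak multiply-accumulate rate attributed to each DSP slice.
per_slice_gmacs = PEAK_GMACS / DSP_SLICES

# Assumption (not from the article): each slice completes two MACs per clock.
MACS_PER_CYCLE = 2
implied_clock_mhz = per_slice_gmacs / MACS_PER_CYCLE * 1_000

print(f"{per_slice_gmacs:.3f} GMAC/s per slice")
print(f"about {implied_clock_mhz:.0f} MHz implied clock")
```

Under a different assumption about operations per cycle, the implied clock scales accordingly, so the second number should be read as illustrative only.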

and 50 percent lower dynamic power consumption than Xilinx’s Spartan-6 devices, while delivering up to sixteen 6.6-Gbps transceivers, designers of portable ultrasound equipment can achieve the highest image quality for meeting JESD204B high-speed serial interface standards. At the same time, they can extend battery life and meet safety standards while implementing a 128-channel beamformer at 41 percent less power than alternative FPGA implementations.

Figure 1 shows an example of the All Programmable advantage the Artix-7 FPGA affords to the portable ultrasound market.

REDUCING BOM, WEIGHT AND COST FOR MILITARY SDR

Another example of a market that will greatly benefit from the Artix-7 FPGA’s rich feature set is military software-defined radio (SDR), Mohsen said. Over the last decade, the U.S. military has been diligently constructing a highly sophisticated worldwide communications network called the Global Information Grid, or GIG (see cover story, Xcell Journal issue 69), that allows U.S. forces and allies to communicate globally and run intelligence and military operations more precisely. While Xilinx’s larger Virtex-7 and Kintex-7 FPGAs play an increasing role in the larger communications equipment in the GIG, from networking equipment to aircraft and UAVs, the military is seeking better ways to connect all its assets, even individual soldiers, to the grid more efficiently.

“Many of the portable SDR systems in deployment today suffer from increased power and short battery life,” said Mohsen. “They are also too big and heavy as well as expensive. They are fairly complex, too. These systems require extensive DSP processing capabilities to support a variety of radio protocols or waveforms for voice, data and video communications across the globe. Not only is it

[Figure 3 block diagram. Labels: Ethernet; Transceivers; Ethernet Switch; Control Plane Processor; Traffic Management, Packet Processing; Time Synchronization; Modem; Channel 1; Channel 2; Transceivers; ADC; DAC; Radio; Artix-7 FPGA]

Figure 3 – Designers can integrate multichip functionality for microwave mobile backhaul using the Artix-7 FPGA.


Offering 101,440 logic cells, the Artix-7 is available in a 15 x 15-mm package, which makes it the industry’s smallest device at that capacity level. The mix of increased capacity and smaller size allows design teams to create smaller and lighter systems.

Figure 2 shows an example of the All Programmable advantage the Artix-7 FPGA affords in the portable SDR systems market.

WIRELESS BACKHAUL BUILDOUT

Wireless backhaul is another example of an application that will greatly benefit from the Artix-7 family. Mohsen said that the vast majority of the growth in cellular traffic today is occurring in urban and suburban areas. To address this trend, operators plan to boost the capacity of their networks by deploying small cell basestations on lamp posts, traffic lights and even the walls of adjoining buildings. “They need to connect all these small cells in clusters and to the nearest aggregation points, so operators must deploy low-power, low-cost backhaul units whose microwave radio links can span up to tens of miles,” said Mohsen.

Whereas a traditional mobile backhaul unit typically supports several Ethernet links, a wireless mobile backhaul forwards the traffic between Ethernet links and radio channels using an internal Ethernet switch. “Both ends of the unit require high-speed transceivers, so that’s where the Artix-7 FPGA makes sense as an ideal low-cost alternative to bigger, more expensive devices,” said Mohsen. “The Artix-7 family delivers maximal bandwidth with its sixteen 6.6-Gbps transceivers for both Ethernet and RF links using JEDEC JESD204B connectivity to data converters.”

Artix-7 devices also allow wireless-equipment providers to achieve greater system integration and reduce BOM costs. Half of the backhaul unit contains packet-processing, traffic-management and timing-synchronization functions, Mohsen explained. Meanwhile, the other half of the unit supports modem channels for signal processing. The key requirements for the modem are adequate high-performance DSP processing and high-speed transceivers to interconnect with the data converters for high data throughput.

“The Artix-7 family fits nicely for these functions because it has the right mix of logic density, intellectual-property support and DSP resources,” said Mohsen. “The Artix-7 A200T, which will follow the Artix-7 A100T release later this year, has 215,360 logic cells, allowing wireless-equipment companies to create a backhaul solution that integrates all the needed packet-processing, traffic-management, timing and synchronization blocks as well as a single high-speed radio channel into one chip.” Likewise, the third member of the family, the Artix-7 A350T device, will allow vendors of wireless-network equipment to integrate two high-speed radio channels on a single chip.

Mohsen also noted that equipment vendors make a concerted effort to ensure the units have a low visual impact and don’t appear to clutter the urban and suburban landscapes. This design requirement typically means the units must be very small, which can make it challenging for designers to ensure that each unit effectively dissipates the heat it generates. The Artix-7 family allows equipment vendors to keep power in check while further reducing the overall unit size of their systems.

Figure 3 shows an example of the All Programmable advantage the Artix-7 FPGA affords to small cell wireless backhaul systems.

The first Artix-7 A100T FPGAs are available today, with production qualification scheduled for the first quarter of 2013. Designers can begin their Artix-7 family designs today using Xilinx design tools. For more information, please visit www.xilinx.com/artix.


XPERT OPINION

FPGAs Head for the Cloud

by Michaela Blott
Senior Research Engineer
Xilinx, [email protected]

Tom English
Research Scientist
Xilinx, [email protected]

Emilio Billi
CTO
EB [email protected]

The last decade has seen the emergence of a new global market for cloud computing. This new paradigm, which delivers computing as a service over the Internet, represents a fundamental shift in the way computers are used. The cloud offers enterprises the means to shift tasks from their local IT infrastructure into remote, optimized computing clusters and thus into the hands of the operator providing the cloud service. For consumers, the cloud delivers storage, video, messaging, social networking, gaming, Web search and many other services coherently across diverse computing devices anywhere in the world.


At the heart of the cloud computing revolution is the data center, which integrates the compute power, storage and interconnect required to service a global user base. Data centers are experiencing phenomenal growth, which is translating directly into massive investment. According to Synergy Research Group, data center network infrastructure sales grew 22 percent in 2010 alone. Companies such as Google and Facebook are at the forefront of the cloud computing revolution and correctly anticipated the need for colossal data center infrastructure to address a massive global user base.

THE FPGA ADVANTAGE

FPGA technology can bring numerous advantages for computing, storage and networking as data centers strive to become faster, larger, cheaper and greener. Within the networking infrastructure, FPGAs can address the ever-increasing throughput and processing requirements while remaining highly power-efficient. Furthermore, the inherent flexibility of the FPGA is a crucial benefit in this landscape, given the continual arrival of new communications protocols.

On a basic level, FPGAs offer the right physical interfaces and provide the required support and bandwidth for high-speed memory interfaces. They offer sufficient device complexity to implement packet-processing pipelines greater than 100G. Their flexibility allows for the implementation of perfectly optimized custom circuits that operate at maximum efficiency.

Major improvements in high-level synthesis, as offered for example with AutoESL, are helping to overcome the greatest disadvantage of FPGAs in this space, namely the low abstraction level of the FPGA programming flow. Finally, a basic FPGA IP portfolio exists that covers the fundamental networking functions. However, more data-center-specific solutions around data center bridging (DCB), VXLAN, virtual switching and other specialized technologies have yet to be developed.

Within servers, FPGAs are an attractive implementation on network interface cards (NICs). Although a plethora of controllers from Intel, Broadcom and others are available for implementing standard adapters for Ethernet and Fibre Channel, FPGAs are ideal when additional processing functions are also integrated on the data path between the network and the CPU. Examples of additional processing functions include encryption, high-frequency trading and TCP offload engines (TOE).

FPGAs are also attractive when either the network interface or the processing function needs to be customized in any way. In these scenarios the FPGA offers high-speed serial transceivers, memory interfaces, PCIe® endpoints and a sufficiently large fabric to accommodate high-throughput data stream processing along with the basic IP blocks. A more sophisticated IP-and-solutions portfolio addressing the specific needs of this market could make FPGAs more competitive in an environment where the end user is accustomed to deploying fully integrated platforms. For example, a more sophisticated TOE IP block capable of thousands of simultaneous sessions (and accompanied by a full Linux driver and TCP/IP stack) would open up a range of new FPGA data center applications.

A special case of such a network adapter is a QuickPath Interconnect (QPI) network adapter. QPI is Intel’s proprietary high-bandwidth, low-latency CPU interconnect. Xilinx has developed IP that allows FPGAs to directly attach to the CPU via QPI, significantly reducing latency on the host interface as well as providing higher bandwidth between CPU and network interface. Such a network adapter could be highly attractive within data centers, since latency is rapidly becoming the key performance bottleneck in applications that are already heavily parallelized. A QPI NIC has as much as four times the bidirectional peak bandwidth to the host as a typical PCIe Gen2 server NIC. QPI’s higher transfer rates, direct FPGA-to-CPU transfers and shorter headers make it possible to transfer small messages with much lower latency than in PCIe. As latency becomes the key performance bottleneck in applications that are already heavily parallelized, an ultralow-latency, high-bandwidth QPI NIC becomes a very attractive proposition.

Robust growth in the data center opens new opportunities for existing and advanced FPGA devices.

ON THE MOTHERBOARD

We see further opportunities for FPGAs on the motherboard itself. Some common applications in the data center such as in-memory caching are currently implemented on x86-based servers, despite the fact that the x86 is not particularly well-suited to these types of applications. FPGAs could offer a dramatic improvement in performance, power and latency. The trend is to move compute from a distributed number of cores to a more pipelined style of data processing. This approach is beneficial for FPGA architectures. The volume of the silicon fits well with the FPGA opportunities as well. However, the low abstraction level of FPGA programming tools will have to be addressed in order to compete with C compilers on x86-based servers for end users such as Facebook.

On a more speculative note, there is a subclass of servers in data centers that are casually referred to as “wimpy nodes.” Xilinx already has many of the key technologies needed to accommodate the new server and SoC architectures emerging for this space, such as ARM processor cores, PCIe interface blocks, memory interfaces and programmable logic. The current Zynq™-7000 Extensible Processing Platform is not yet equipped to compete with ARM-based server SoCs, such as Applied Micro’s X-Gene, for this market, but using the technology blocks already available, a future Zynq device could conceivably power a data center server.

Finally, increasing processing demands, especially at the high-performance computing end of the spectrum, can greatly benefit from hybrid computing solutions that combine both FPGAs and CPUs. Existing solutions from Convey and Maxeler showcase the tremendous performance and power benefits of such an approach. For example, Maxeler’s implementation of a credit derivative pricing system for a financial customer was up to 37 times faster than software on an Intel E5430 server and reduced energy consumption by more than 97 percent. The QPI technology has the potential to increase these advantages even further, as hardware accelerators can become more tightly coupled with the CPU over a low-latency, high-bandwidth, cache-coherent interface.

DATA STORAGE, WAREHOUSING AND ANALYTICS

Similar to the server and networking scenarios, existing FPGAs can provide competitive implementations within storage, data warehousing and data analytics in three very different ways. First, recent trends integrate flash-based storage closer with the host. A new crop of PCIe SSD controllers allows for direct attachment of flash to PCIe. FPGAs already compete in this space, offering the key functionality and the basic IP building blocks. A further key advantage is the FPGA’s flexibility. Currently, the flash-based interface lacks an industry standard, although new standardization efforts such as the Open NAND Flash Interface (ONFi) are on their way.

Then too, FPGAs can assist in acceleration of query processing, handling filtering, decompression and execution of some of the relational operators. This capability could be vital in future storage appliances that need to provide more intelligence in order to cope with throughput bottlenecks. Finally, in so-called “super-storage” applications, FPGAs can play a major role, accelerating file system operations that would otherwise consume considerable CPU cycles. These currently run on separate servers co-located with the storage servers in the storage-area network. FPGA acceleration would allow for a reduction of the control-to-storage server ratio, increasing available storage and performance.

Existing FPGA technology can service these requirements. In particular, the embedded ARM processor in the current Zynq architecture can already tackle OS functionality. Again, a more sophisticated IP-and-solutions portfolio could of course further increase the potential of FPGAs and help accelerate new design developments.


[Figure 1 diagram. Column headings: Group 1, Group 2, Group 3. Labels: Hybrid Computing; Desktop Virtualization; OPST; Optical Backhaul; Custom NICs; Smart Analytics; Super Storage; Smart NICs; Flash Controllers; QPI NIC; QPI I/O & Memory Expansion; Application-specific Servers; Cloud RAN; Smart Networking; Wimpy Nodes; Optical Interconnects]

Figure 1 – Data center opportunities in three broad categories await FPGA implementation.


THREE KINDS OF OPPORTUNITIES

As shown in Figure 1, we categorize these varied opportunities in three groups. The first group contains applications that require no additional development effort. Silicon features, IP portfolio, related software and programmability are adequate to address these markets, some of which already employ FPGAs. For example, Intune Networks uses FPGAs to implement optical packet switch and transport (OPST) solutions, claiming up to 300 percent cost reduction in power consumption. Maxeler and Convey Computers offer hybrid computing solutions on an FPGA basis. Smart analytics are based on FPGAs within IBM/Netezza’s products. BlueArc demonstrates how super storage can be significantly improved through FPGAs, while FusionIO uses FPGAs for flash controllers. Napatech and Nallatech are among the many vendors offering FPGA-based smart or custom NICs.

The second group addresses opportunities that require some advanced development effort. Most notable in this category are the QPI-related opportunities: QPI NIC and memory and I/O expansion.

The final group includes longer-term opportunities that will involve significant research-and-development effort on silicon features or programming environments. For example, to address wimpy nodes through an FPGA device would take a new generation of Zynq devices that offers more integration of 64-bit ARM processors, as well as wider and faster memory interfaces.

Application-specific servers and C-RAN, or Cloud RAN, lie on the border between groups 2 and 3. Both of them necessitate advanced development efforts to provide necessary infrastructure and platforms, and would benefit from new programming tools with a higher abstraction level. However, the fact that a traditional RTL-based design flow might be adequate to some of the end users moves these opportunities to the second group.

HIGHLY DYNAMIC MARKET

Data centers are a highly dynamic market in which interface standards and protocols are changing rapidly. Particularly within networking equipment, but also for compute functions, this environment offers great opportunities for the deployment of FPGA-based high-speed processing systems. These types of applications are well suited for FPGA implementation with current Xilinx technology, such as the Kintex™ and Virtex® families of devices, leveraging high-speed serial I/O and corresponding IP.

The opportunities expand with an additional focus on the needs of the data center market. In particular, memory access (access bandwidth and density), as well as hash and search function support in the form of silicon or IP, are extremely important features within data centers, given that most applications center around large amounts of data that must be searched and sorted. Future FPGA devices based on Xilinx’s stacked-silicon interconnect (SSI) technology can potentially play a key role here.

Finally, we believe that these opportunities, in particular those within the server space, will be contingent on improvements in FPGA programmability. FPGA programming must be abstracted to a level that is acceptable to programmers of a data center.




XCELLENCE IN AEROSPACE & DEFENSE

The Xilinx Virtex-5 proves able to endure, operate and survive cryogenic temperatures. No wonder NASA is onboard.

FPGA-Based Instrumentation Withstands the Chill of Deep Space


Current and future NASA robotic-flight missions to outer planets and asteroids require avionics systems, computers, controllers and data-processing units capable of enduring the extreme low-temperature environments of deep space and lunar and Martian surfaces. With recent technological advances in FPGAs, it has become possible to architect a complete system-on-a-chip (SoC) using a single FPGA. Large FPGAs that are radiation-hardened by design (RHBD) have increased the number of gates per square inch, reduced power consumption per gate and included microprocessors, soft and hard IP, arithmetic modules, sizable onboard memory and analog-to-digital converters.

B&A Engineering (BAENG) conducted studies with the Xilinx® Virtex®-5 mixed-signal RHBD FPGA to address NASA’s need for protected, reliable data-acquisition controllers and computer electronics able to operate in cryogenic temperatures. This RHBD FPGA will be the workhorse of future NASA computer and data-handling systems targeted for outer-planet landing, orbiting and sample-retrieval missions.

To conduct the experiment, BAENG designed and built a test board based on a commercial Xilinx XC5VLX30 FPGA and support circuitry (resistors, capacitors and oscillator), as shown in Figure 1. What’s remarkable is that we found that the chip works at temperatures well below spec for a commercial part and even well below spec for a space-grade Xilinx FPGA.

The FPGA included circuits using internal phase-locked loops (PLLs), as well as a ring oscillator and a number of basic circuits built from LUTs. During the test, we monitored both circuit functionality and FPGA power consumption.

We disabled all the regulators, switches, resets and configuration-mode pins on the board with the exception of the 100-MHz oscillator. We simulated FPGA and external flash memory voltages and currents, switches, resets and configuration-mode pins and monitored them using external test equipment and power supplies.

The FPGA was reconfigured at every 10-degree decrement of temperature, from room temperature down to -150°C, from both the JTAG interface (Xilinx IMPACT) and flash (XCF08). We erased and reprogrammed onboard flash during each temperature measurement. Additionally, using Xilinx’s ChipScope™ Pro and the internal FPGA system monitor, we monitored Xilinx die temperature, along with 2.5V auxiliary and 1.0V internal voltages. This data provided additional reference points to the chamber and test equipment monitoring. The internal system monitor can be accessed preconfiguration through the JTAG interface.

Figure 1 – The Xilinx Virtex-5, XC5VLX30 test board

by Alireza Bakhshi
Principal
B&A Engineering Systems [email protected]

TEST RESULTS

Cryogenic testing was done using liquid nitrogen. We started the testing at room temperature of 24°C, having programmed the test chamber to proceed in steps to 10°C, 0°C, -10°C, all the way down to -150°C. Figure 2 charts the current drawn on each Xilinx voltage rail vs. temperature.
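The temperature schedule described above can be written out explicitly. This sketch assumes the chamber starts at the 24°C room-temperature reading and then steps in 10-degree decrements from 10°C down to -150°C. The function name and defaults are illustrative, not taken from the test setup.

```python
def chamber_setpoints(start_c: int = 24, first_step_c: int = 10,
                      final_c: int = -150, decrement_c: int = 10) -> list[int]:
    """Room-temperature start followed by evenly spaced cold setpoints."""
    # range() excludes its stop value, so step one past the final setpoint.
    cold_steps = range(first_step_c, final_c - 1, -decrement_c)
    return [start_c] + list(cold_steps)


setpoints = chamber_setpoints()
print(len(setpoints), setpoints[:4], setpoints[-1])
```

Each setpoint corresponds to one reconfiguration attempt over JTAG and one from flash, so the list length gives the number of measurement stops in a full cool-down.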

In the course of our testing, we made some interesting observations. For starters, we found that the 2.5V/2.5V auxiliary voltage currents remained stable over the test temperature range. Both of these voltages are used for system monitor and JTAG communication. All of the I/Os are tied to 3.3V.

Second, the internal 1.0V current was significantly reduced from 140 mA at +20°C to 81 mA at -150°C. This was no surprise, since a reduction in power is expected at low temperatures.
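To put the current drop in terms of power, a rough estimate is P = V × I on the 1.0V rail. The sketch assumes the rail stays at exactly its nominal 1.0V across temperature, which the article does not state, and uses only the two endpoint readings quoted above.

```python
CORE_VOLTAGE_V = 1.0  # nominal internal rail; assumed constant over temperature


def core_power_mw(current_ma: float, voltage_v: float = CORE_VOLTAGE_V) -> float:
    """Static estimate of core power in milliwatts: P = V * I."""
    return voltage_v * current_ma


warm_mw = core_power_mw(140)  # reading at +20 degrees C
cold_mw = core_power_mw(81)   # reading at -150 degrees C
ratio = cold_mw / warm_mw

print(f"{warm_mw:.0f} mW -> {cold_mw:.0f} mW ({ratio:.0%} of the warm value)")
```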

Finally, we found that the flash memory’s 1.8V current remained almost zero down to -50°C and then changed to 10 mA from -50 to -90°C. It dropped to zero from the -100 to -120°C temperature range, and went back to 10 mA from -130 to -150°C. We are not sure what to make of this finding; it could be due to test measurement errors.

Importantly, both clocks remained stable over the test temperature range (see Figure 3). We used a PLL to both divide and multiply the master onboard 100-MHz oscillator so as to generate the 50- and 150-MHz clocks.
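The two test clocks follow from the usual PLL relationship f_out = f_in × M / (D × O), where M is the feedback multiplier, D the input divider and O a per-output divider. The values below (M = 6, D = 1, O = 12 and 4) are one hypothetical combination that yields 50 and 150 MHz from a 100-MHz reference. The article does not give the settings actually used, and real devices also constrain the intermediate VCO frequency, which this sketch does not check.

```python
from fractions import Fraction


def pll_output_mhz(f_in_mhz: int, mult: int, in_div: int, out_div: int) -> Fraction:
    """f_out = f_in * M / (D * O), kept exact with rational arithmetic."""
    return Fraction(f_in_mhz * mult, in_div * out_div)


F_IN_MHZ = 100           # onboard oscillator, from the article
M, D = 6, 1              # hypothetical settings; the VCO would run at 600 MHz
OUTPUT_DIVIDERS = (12, 4)  # hypothetical per-output dividers

clocks = [pll_output_mhz(F_IN_MHZ, M, D, o) for o in OUTPUT_DIVIDERS]
for out_div, f_out in zip(OUTPUT_DIVIDERS, clocks):
    print(f"O = {out_div}: {f_out} MHz")
```

Because both outputs derive from the same VCO, they stay frequency-locked to each other and to the reference, which is why a single oscillator is enough to observe both clocks across temperature.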

XILINX LOGIC

All of the logic circuits remained functional through the test temperature range. However, the same was not true of the flash memory. During testing, we found that the onboard flash memory became unstable at -110°C. Starting at that temperature, it took a couple of attempts to program the flash memory from JTAG. Nevertheless, we were able to do so after two tries. In addition, the onboard flash memory became nonfunctional at -140°C. We weren’t able to program the flash from JTAG thereafter.

The internal 1.0V current increased significantly (384 mA) when we tried to configure the FPGA from nonfunctional flash memory. This result makes sense, since we are not sure what the state of FPGA I/Os would be once the FPGA is configured from nonfunctional flash memory. The 1.0V internal current became normal once the FPGA was configured through the JTAG port. Meanwhile, JTAG communication through IMPACT was functional throughout the test temperature range.


[Figure 2 plot. Horizontal axis: temperature, -200°C to 50°C. Vertical axis: current (mA), 20 to 160. Series: Xilinx I/O 3.3V current; Flash 1.8V current; Xilinx internal 1.0V current; Xilinx 2.5V/2.5V auxiliary current]

Figure 2 – Xilinx FPGA voltage currents vs. temperature

[Figure 3 plot. Horizontal axis: temperature, -200°C to 50°C. Vertical axis ticks: 20 to 160. Series: Xilinx 50 MHz; Xilinx 150 MHz]

Figure 3 – Xilinx 50/150-MHz clock vs. temperature


Additionally, ChipScope Pro was functional throughout the test temperature range and with it we were able to monitor the die temperature, 1.0V internal and 2.5V auxiliary voltage. They all tracked very closely with the external LabVIEW and temperature sensor measurements. This is important since it tells us that the Xilinx system monitor including the internal A/D was functional throughout the test.

At the end, we brought the temperature back to ambient, in increments, and left the unit under test to stabilize for 48 hours. In performing our end-to-end test, we were able to configure the Xilinx FPGA from JTAG only once. JTAG communication including the ChipScope Pro stopped thereafter, even though we were able to initialize the JTAG chain through IMPACT. We weren’t able to program the flash or configure the FPGA from either JTAG or flash memory. After removing the flash and rewiring the TDI/TDO chain, we were able to configure the FPGA through JTAG once again. This is important, since it shows that the FPGA wasn’t damaged.

It is important to note that the JTAG chain is serial. The TDO of the IMPACT is connected to TDI of the flash memory, and then the TDO of the flash memory is linked to the TDI of the Xilinx FPGA. This means that the flash memory is sitting between the IMPACT and the Xilinx FPGA.
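Because the chain is serial, a device that stops passing data blocks access to everything downstream of it. The toy model below illustrates the idea. It is a deliberate simplification that treats each device as a pass-through stage and is not a model of the actual JTAG TAP state machine; the class, function and device names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    functional: bool = True


def reachable(chain: list[Device]) -> list[str]:
    """Names of devices whose data can complete the TDI -> TDO round trip.

    In a serial chain every bit must pass through every device, so a single
    dead stage makes the whole chain unusable in this simplified model.
    """
    if all(dev.functional for dev in chain):
        return [dev.name for dev in chain]
    return []


healthy = reachable([Device("flash"), Device("fpga")])
broken = reachable([Device("flash", functional=False), Device("fpga")])
rewired = reachable([Device("fpga")])  # flash removed, TDI/TDO rewired

print(healthy, broken, rewired)
```

The third case mirrors the recovery step in the test: once the failed flash is taken out of the chain, the FPGA can be reached again.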

PROMISING RESULTS

Xilinx FPGA testing using commercial parts showed some promising results for reconfiguration at very low temperatures. Reconfiguration through the JTAG interface continued to work down to -150°C. However, due to what appears to be a failure of the flash memory chip at -130°C, we were not able to reconfigure the Virtex-5 chip from flash memory below that temperature.

As expected, current on the internal 1.0V rail (and with it internal power consumption) declined steadily and ended at 66 percent of where it was at room temperature. Basic circuits, ring oscillator, shift registers and PLL outputs continued to function normally with hardly any detectable changes.


XCELLENCE IN GREEN TECHNOLOGY

Using Spartan Technology to Support Development of Green Energy

The Xilinx Spartan-3A FPGA augments control algorithm implementation for a multiterminal DRI power inverter.

by Phillip Southard
Senior Design Engineer
PDS Consulting, [email protected]


Product development for industrial applications involves extensive research and preparation in an environment of rolling deadlines and ever-evolving product specifications. While time-to-market for this sector may not be as short as it is for consumer electronics, products must ship quickly and with as many essential functions, features and potential hooks for the next generation as possible. Companies vie to be industry leaders in their respective competitive arenas, especially in new markets such as green power, which, being in their infancy and without defined leaders, require pioneers to design, develop and deliver new products. Success depends not only on an inspired, dedicated team of engineers, advanced computing technology and new materials, but also on angel investors or government agencies to provide grants for promising approaches to improved energy generation, distribution, monitoring, metering and consumption.

In the fall of 2011, engineers from Princeton Power Systems (PPS), a New Jersey-based manufacturer of advanced power-conversion products and alternative-energy systems, demonstrated their latest green power product. This demand response inverter (DRI) was the result of a three-year collaboration among PPS, the United States Department of Energy and Sandia National Laboratories’ Solar Energy Grid Integration Systems (SEGIS).

The resulting multiterminal DRI (Figure 1) is uniquely flexible and is designed to be more reliable, more efficient and more cost-effective than currently available inverters. Equipped with multiple AC and DC terminals, the DRI can route power to the grid, a microgrid, DC energy storage or dynamic loads. Programmable power curves and charge profiles enhance control for generators, loads and batteries, ensuring greater efficiency. And the use of advanced high-capacity long-lifespan switches maximizes reliability.

Princeton Power Systems showcased features of the DRI that improve electrical-grid interconnectivity and efficiency, enhance the performance of renewable energy systems and allow for better integration of electric vehicles and distributed power generation. The DRI was part of the company’s “An Island in the Sun” microgrid demonstration (Figure 2), which detailed key advancements in clean technology and manufacturing, including a 200-kilowatt solar array and lithium-ion battery system.

A microgrid can operate independently of a major utility grid to supply reliable, low-carbon-emission energy. PPS’ DRI is compatible with AC generators such as diesel or gas, and with photovoltaic (PV) or wind inputs. A small community using a DRI is less dependent on the grid and can reduce its carbon footprint and utility costs. The DRI can also provide grid services, PV with storage and charging for electric vehicles.

XILINX SPARTAN TECHNOLOGY

To meet the demands of industrial product design, companies like Princeton Power Systems leverage flexible development vehicles such as Xilinx’s Targeted Design Platforms (TDPs), with their rich ecosystem of design services support. In this case, however, the engineering team faced an initial challenge of determining how to expand the inputs and outputs of the DRI system’s digital signal processor, and how to implement control and communication interfaces that functioned in parallel. PDS Consulting offers design services in programmable digital systems for a variety of markets including aerospace and defense, broadcast, industrial, scientific and medical. The firm supported work on the project as a member of Xilinx’s Alliance Program.

Figure 1 – The flexibility in Princeton Power Systems’ demand response inverter comes from FPGAs.

The PDS Consulting team provided on-site, hands-on system debug and PCB bring-up, as well as off-site RTL and IP design services. We also advised Princeton Power Systems developers on how to implement the system control interface for their green power control algorithm. In the end, engineers chose a Xilinx® Spartan® XC3SD3400A FPGA married to a DSP as a prime system control component (Figure 3).

The Spartan-3A FPGA, with its extensive SelectIO™ capabilities, offered flexibility in implementation, particularly for trigger signals and ADC input channels. Xilinx’s Spartan-3A family is a superior alternative to mask-programmed ASICs because these FPGAs permit design upgrades in the field and avoid the high initial cost, lengthy development cycles and inherent inflexibility of conventional ASICs. The integrated technology afforded by the Spartan-3A made the implementation of Princeton Power Systems’ patented control algorithm for green power conversion a possibility.

It took more than 300 I/Os to implement the DRI system interface, which enabled access to 8 Mbytes of flash, a 256-Mbit SDRAM and USB/RS-232 at >900 kbits/second. In addition, the team utilized the generous amount of fast, distributed 32-bit dual-port RAM inherent to the Spartan architecture. The configurable logic block (CLB) lookup tables used as dual-port RAMs enabled the efficient local storage of new energy waveform samples that the ADCs supplied, while the DSP read the previous samples and a PicoBlaze™ embedded processor concurrently analyzed new values from the second port.

THE BENEFITS OF XILINX FPGAS

Princeton Power Systems’ algorithms required extensive calculations that can only be accomplished by floating-point DSPs, which traditionally do not have the same features as FPGAs. Some of the features of Xilinx FPGAs that particularly suited the PPS project included multivoltage, multistandard SelectIO I/O pins; configurable logic blocks; block RAM; and memory interfaces that can implement a large number of programmable trigger signals. These signals generate and execute pulse trains that trigger power electronic switches like IGBTs and control a large number of fast ADC channels to read important system measurements on every pulse or custom high-speed serial interfaces.

FPGAs not only allowed Princeton Power Systems to design and implement custom peripherals that matched its specific requirements, but also provided additional computational resources for the processing of input values, which otherwise would have to be done by the DSP. The Spartan-3 FPGA-based design completes several processes: It accomplishes system error checking using the values read from ADCs connected to the DSP. It implements timer-driven activities like reading ADCs precisely when necessary. And it averages ADC values.
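As a rough illustration of the averaging and error-checking duties described above, the following sketch models them in software. It is an assumption-laden stand-in, not the PPS design: the `AdcMonitor` class, the window length and the limits are invented for the example, and the real logic runs in FPGA hardware rather than in Python.

```python
class AdcMonitor:
    """Hypothetical model: average a fixed window of ADC codes, then limit-check."""

    def __init__(self, window, low, high):
        self.window = window  # number of samples per average
        self.low = low        # lowest acceptable average
        self.high = high      # highest acceptable average
        self.acc = 0
        self.count = 0

    def on_timer_tick(self, adc_code):
        """Called once per sample period.

        Returns (average, fault) when a window completes, otherwise None.
        """
        self.acc += adc_code
        self.count += 1
        if self.count < self.window:
            return None
        avg = self.acc // self.window
        self.acc = 0
        self.count = 0
        fault = not (self.low <= avg <= self.high)
        return avg, fault


monitor = AdcMonitor(window=4, low=100, high=200)
for code in [150, 152, 148, 150, 300, 300, 300, 300]:
    result = monitor.on_timer_tick(code)
    if result is not None:
        print(result)
```

Keeping this kind of bookkeeping out of the DSP is the point of the paragraph above: the processor sees only a finished average and a fault flag instead of servicing every sample.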

Without the FPGA, some of these functional requirements would have been impossible to implement. Other functionalities would have required more components on the DRI’s control board or a significantly more complex software architecture. The PPS team knew it was crucial to avoid the latter, since the control board acts as the heart of the DRI system.

Figure 2 – Princeton Power’s flexible, multiterminal DRI is here configured for an electrical microgrid. (Terminals shown: PV connection, DC energy storage, AC grid connection, motor/generator.)

“While an increasing number of DSPs now offer peripherals that were previously absent, the importance of having an FPGA still remains,” said Frank Hoffmann, the R&D manager at Princeton Power Systems. “With each new generation, the amount of computational resources inside the FPGA increases (for example, from a Spartan-3 to a Spartan-6), and it has now become possible to outsource more computational work to the FPGA. And this could mean running our complex control algorithms faster and therefore improving the quality of a generated output like the one in the DRI.”

THE BOTTOM LINE

While the technical benefits of using an FPGA are clear (quick prototyping, flexible architecture, advanced support tools like Xilinx’s ChipScope™ Integrated Logic Analyzer for quick in-system debug), the decision has also affected Princeton Power Systems’ bottom line.

“Using an FPGA has made development much faster, reducing R&D expenses and time-to-market for new and innovative alternative-energy systems,” said executive vice president Darren Hammell. “The programming environment was easy to use and enabled us to rapidly develop and test our innovative software. This enabled us to complete the prototype for the demonstration much quicker than otherwise would have been possible.” The product is now shipping, and PPS has added two new customers: BMW and SuperPlug have included a DRI in new power system designs.

In fields like green power technology, engineers face new challenges, including determining how to optimize algorithm implementation while retaining necessary functionality. With the right tools, technology and team, enhancements in this field lie just within reach.

For more information on Princeton Power’s multiterminal DRI, please visit http://www.princetonpower.com/prod_demand.shtml. You can reach PDS Consulting at [email protected].

Figure 3 – Engineers chose a Spartan-3A FPGA, with its extensive SelectIO capabilities, as the main system peripheral.

XCELLENCE IN SOLID-STATE DISKS

Designing a 19-nm Flash PCIe SSD with Kintex-7 FPGAs

A solid-state disk design based on PCI Express gains speed and performance thanks to Xilinx 7 series devices.

by Yilei Wang
Senior Hardware Engineer
Memblaze China [email protected]

Xiangfeng Lu
CTO
Memblaze China [email protected]

Solid-state disk (SSD) technology based on NAND flash memory provides higher throughput and lower power consumption than traditional mechanical-drive-based storage systems. For that reason, SSD usage has mushroomed over the last decade, moving from handheld devices to laptop and desktop computers and, now, making incursions into the enterprise storage market. The rapid rate of expansion has been further aided by the enterprise storage industry’s adoption of SSDs based on the Serial Advanced Technology Attachment (SATA) standard.

However, as SSD manufacturers look toward next-generation systems that achieve new performance and density highs by using flash memory that is implemented in 19-nanometer process technology, SATA hasn’t kept up. Even with the latest revision (SATA 3.0), the 6-Gbps physical interface hardly meets the highest throughput of the SSD NAND flash arrays, and thus leaves extra performance on the table.

To break the interface bottleneck, SSDs based on PCI Express® are making a huge impact on the market. PCIe® is an industry-standard local bus with higher performance and scalability than SATA. It is based on multilane high-speed serial links that support one to 16 lanes, each operating at up to 8 Gbps (2.5 Gbps for Gen1, 5 Gbps for Gen2, 8 Gbps for Gen3). The PCIe interface for SSDs supports gigabyte-per-second throughput and better margins for the foreseeable future as NAND flash technology evolves.
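A quick calculation shows how much headroom a multilane PCIe link offers over SATA 3.0. The per-lane rates are those quoted above; the line-encoding overheads (8b/10b for Gen1, Gen2 and SATA, 128b/130b for Gen3) come from the respective interface specifications rather than from this article, and the function name is only illustrative. Packet-header and flow-control overhead is ignored, so the results are upper bounds.

```python
# Per-lane signalling rate in Gbit/s and line-code efficiency per generation.
PCIE = {
    "Gen1": (2.5, 8 / 10),
    "Gen2": (5.0, 8 / 10),
    "Gen3": (8.0, 128 / 130),
}


def payload_gbytes_per_s(gen, lanes):
    """Upper bound on one-direction payload bandwidth in Gbyte/s."""
    rate_gbps, efficiency = PCIE[gen]
    return rate_gbps * efficiency * lanes / 8


# SATA 3.0: a single 6-Gbit/s link with 8b/10b encoding.
sata3 = 6.0 * (8 / 10) / 8

for gen in PCIE:
    print(gen, "x8:", round(payload_gbytes_per_s(gen, 8), 2), "Gbyte/s")
print("SATA 3.0:", round(sata3, 2), "Gbyte/s")
```

Under these assumptions an eight-lane Gen2 link carries about 4 Gbyte/s in each direction, against roughly 0.6 Gbyte/s for SATA 3.0.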

However, creating a PCIe-based SSD system using 19-nm flash has its share of challenges. The PCIe interface requires more high-speed serial links and more-complex interconnect than SATA. The throughput demands require the PCIe direct memory access (DMA) to operate at a gigabyte bandwidth level. In addition, at the 19-nm process node, flash reliability, or specifically the metric known as “wear” (the number of times a NAND can read or write before encountering an error), is a growing issue. At 19 nm, companies must perform wear leveling and error correction faster than ever before.

Xilinx® Kintex™-7 FPGAs establish a new benchmark for FPGA high-end performance at less than half the price of previous-generation FPGAs. The Kintex-7 family is one of four product lines Xilinx built using TSMC’s HPL (high-performance, low-power) 28-nm process, designed for maximum power efficiency and delivering a twofold price/performance improvement while consuming 50 percent less power than previous generations. Kintex-7 FPGAs offer high-density logic, high-performance transceivers, memory and DSP, plus Agile Mixed Signal, all to enable higher system-level performance and the next level of integration. These capabilities allow for continued innovation and differentiation in designs at volume price points. As such, Xilinx’s Kintex-7 series FPGAs are ideally suited for use as 19-nm flash PCIe SSD controllers.

Figure 1 – The Kintex-7 SoC solution for a PCIe 19-nm NAND flash SSD consists of three subsystems: CPU, storage and PCIe SG-DMA.

Figure 1 shows the Memblaze SSD controller architecture, featuring three subsystems interconnected with a high-speed AXI4 bus. The PCIe SG-DMA subsystem, which includes the Kintex FPGA hard core, scatters and gathers data between the host computer and the SSD data buffer (the “SG” stands for scatter and gather). The CPU subsystem manages peripherals and executes SSD access commands, while the storage subsystem manages the SSD sector data processing with a multichannel NAND controller, error-correcting code (ECC) block and wear-leveling block. These three subsystems share a 2-Gbyte DDR3 SDRAM with ECC function. It’s easy to generate an ECC DDR3 SDRAM controller with Xilinx Memory Interface Generator (MIG) tools.

In our design, the 7 series PCIe hard core implements the physical-to-TLP layer and allows the design to function as a high-performance PCIe endpoint with minimal latency. The new embedded MicroBlaze® core with ARM® AXI4 interconnect completely removes the bottlenecks of the on-chip bus. The DDR3 hard core provides a 51.2-Gbps ECC solution for the disk cache. Meanwhile, the low-power logic resources make it easy to achieve high performance of wear leveling and intelligent ECC algorithm execution. In addition, abundant high-performance I/O resources provide an easy way to interconnect to the 19-nm NAND flash arrays.

PCI EXPRESS SG-DMA

Our design’s PCIe interface required a fast DMA controller to implement high-speed communications between the host and the local AXI4 bus. The throughput of the SSD flash arrays can reach up to 2.5 GBps. To simplify the PCIe interface design and get greater margin as flash chips evolve, we chose to use an eight-lane PCIe Gen2/Gen3 architecture.

The PCIe endpoint has many complex protocols to process in the physical, data link and transaction layers. Luckily, designing the PCIe SG-DMA controller in the Xilinx 7 series FPGAs was quick and easy. The PCIe hard core, which Xilinx had implemented in the device’s fabric, handled all PCIe operations. This allowed our design team to focus on the functions of the SG-DMA operation itself. The integrated block for the PCIe solution supports one-lane, two-lane, four-lane and eight-lane endpoint configurations at speeds up to 5 Gbps per lane (Gen2), compliant with the PCIe Base Specification, rev. 2.1. Table 1 shows the 7 series FPGAs’ integrated block for PCIe configurations. The core can be configured as Gen1 or Gen2 with up to eight lanes (x8), providing up to 40-Gbps bandwidth.

We used CORE Generator™ tools to configure and generate the PCIe endpoint IP, which includes the user guide, source code, simulation code and example design, all of which helped us get up to speed quickly using the core. Figure 2 shows the PCIe hard core’s top-level functional blocks and interfaces.

Figure 2 – Top-level functional blocks and interfaces in the PCI Express hard core

The main function of the SG-DMA core is to process TLP packets from the host and respond. SG-DMA operates as a PCIe master with access to the host memory, moving data between the host and local memory. The host sends commands to the DMA controller to control the DMA access. The command code is embedded in the data of a specific host TLP register write. The SG-DMA controller initiates the SG-DMA write request to move data from the local memory to the host memory in response to the host’s read commands. Similarly, for host write commands, the SG-DMA controller initiates a DMA read request to move data from the host memory to local memory. Figure 3 illustrates the flow.
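The command handling just described can be summarized in a few lines of code. This is a behavioral sketch only: the dictionary fields, the action names and the `handle_register_tlp` function are hypothetical, and the branch structure is one reading of the text rather than the actual controller logic.

```python
def handle_register_tlp(tlp, registers):
    """Return the list of actions the DMA controller takes for one host TLP.

    `tlp` is a plain dict with hypothetical keys:
      'kind'  - 'read' or 'write' (register access from the host)
      'addr'  - register address
      'value' - data for a register write
      'dma'   - optional; 'write' moves local -> host, 'read' moves host -> local
    """
    if tlp["kind"] == "read":
        # Register read: answer with a completion carrying the register value.
        return [("completion", registers.get(tlp["addr"], 0))]

    # Register write: store the value first.
    registers[tlp["addr"]] = tlp["value"]
    dma = tlp.get("dma")
    if dma is None:
        return []  # ordinary register write, not a DMA command
    if dma == "write":
        # Host read command: push local data into host memory.
        return [("mem_write_request", "local->host")]
    # Host write command: request host data, then wait for the completions.
    return [("mem_read_request", "host->local"), ("await_completion", None)]


regs = {}
print(handle_register_tlp({"kind": "write", "addr": 0x10, "value": 7}, regs))
print(handle_register_tlp({"kind": "read", "addr": 0x10}, regs))
print(handle_register_tlp({"kind": "write", "addr": 0x20, "value": 1, "dma": "write"}, regs))
print(handle_register_tlp({"kind": "write", "addr": 0x20, "value": 2, "dma": "read"}, regs))
```

Note the inversion that often trips people up: a host read of the SSD becomes a DMA write toward host memory, and a host write becomes a DMA read from host memory.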

                                Artix-7   Kintex-7   Virtex-7 T   Virtex-7 XT   Virtex-7 HT
PCIe Gen (integrated block)*    Gen2      Gen2       Gen2         Gen3          Gen3
Width                           x4        x8         x8           x8            x8
Number of blocks                1         1          3-4          2-4           1-3
Serial data rate (Gbps)         5         5          8            8             8

*Based on symmetric filter implementation

Table 1 – 7 Series FPGA integrated blocks for PCI Express

Figure 3 – Operation of the SG-DMA controller

AXI4 INTERCONNECT

The AXI interconnect IP connects one or more AXI memory-mapped master devices to one or more memory-mapped slave devices. The AXI interfaces conform to the AMBA® AXI version 4 specifications from ARM, including the AXI4-Lite control register interface subset. The interconnect IP is intended for memory-mapped transfers only; AXI4-Stream transfers are not applicable. The AXI interconnect IP can be used as a pCORE from Xilinx’s Embedded Development Kit (EDK) or as a standalone core from Xilinx’s CORE Generator IP catalog.

The designer can select from two modes of operation that the Xilinx AXI4 IP supports. The performance-optimized crossbar mode has a shared-address, multiple-data (SAMD) crossbar architecture with parallel pathways for write and read data channels. The area-optimized shared-access mode features shared write data, shared read data and single shared address pathways. Both of these modes support burst lengths up to 256 for incremental (INCR) bursts, and variable data width from 32 up to 1,024 bits. Propagated USER signals are also supported on each channel, if any; an independent USER signal width per channel is optional.

The AXI4 interconnect provides high performance between the PCIe SG-DMA and the DDR3 memory. We found that the AXI4-Lite shared bus is also a perfect solution for the low-speed on-chip interconnect, requiring minimal consumption of logic resources.

WEAR-LEVELING TECHNOLOGY

Wear leveling is a design technique that storage-media companies employ to prolong the service life of various kinds of erasable computer storage types, such as the flash memory used in solid-state drives. There are a few wear-leveling mechanisms used in flash memory systems, each with varying levels of longevity enhancement.

A flash memory storage system without wear leveling will not last very long if it is writing data to the flash. Without wear leveling, the flash controller must permanently assign the logical addresses from the operating system (OS) to the physical addresses of the flash memory. This means that for every write to a previously written block, the block must first be read, erased, modified and rewritten to the same location. This is very time-consuming, and highly written locations will wear out quickly, even when other locations on the flash are completely unused. Once a few blocks reach their end of life, the drive is no longer operable.

The first type of wear leveling is called “dynamic wear leveling.” It uses a map to link logical block addresses (LBAs) from the OS to the physical flash memory. Each time the OS writes replacement data, the map is updated to mark the original physical block as invalid data, and a new block is linked to that map entry. Each time a block of data is rewritten to the flash memory, it is written to a new location. However, blocks that never get replacement data sit with no additional wear on the flash memory. The drive may last longer than one with no wear leveling, but some blocks, while still remaining active, will go unused.

Another technique, called “static wear leveling,” also uses a map to link the LBA to a physical memory address. Static wear leveling works the same way as dynamic wear leveling, except that static blocks that do not change are periodically moved so that other data may use these low-usage cells. This rotational effect enables the SSD to operate until most of the blocks are near their end of life.

Figure 4 shows flash pages with and without wear leveling after a long write/erase operation. The one without wear leveling, with black pages, is broken and can no longer record any data, while the one with wear leveling still functions with all pages.

Figure 4 – Flash pages with and without wear leveling

INTELLIGENT ECC ALGORITHM

Another key component of SSD system design is error correction. There are a number of anomalies that can cause bit errors, which in turn can affect data integrity and even the proper operation of the system itself. To deal with these errors, our design team employs complex ECC algorithms that get even more elaborate when we use new, smaller-geometry flash in these systems.
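Before going further into error correction, it may help to make the wear-leveling discussion concrete with a minimal sketch of dynamic wear leveling. The class and method names are hypothetical, and the policy (always pick the least-erased free block) is a simplifying assumption, not the Memblaze implementation. The example also exposes the weakness that static wear leveling addresses: a block holding data that is never rewritten accumulates no wear at all.

```python
class DynamicWearLeveler:
    """Toy LBA -> physical block map; every rewrite lands on a fresh block."""

    def __init__(self, num_blocks):
        self.erase_count = [0] * num_blocks
        self.free = list(range(num_blocks))
        self.lba_map = {}

    def write(self, lba):
        """Write (or rewrite) one logical block; return the physical block used."""
        old = self.lba_map.get(lba)
        # Pick the least-worn free block for the new data.
        new = min(self.free, key=lambda b: self.erase_count[b])
        self.free.remove(new)
        self.lba_map[lba] = new
        if old is not None:
            # The stale copy is erased and returned to the free pool.
            self.erase_count[old] += 1
            self.free.append(old)
        return new


ssd = DynamicWearLeveler(num_blocks=4)
ssd.write(1)              # "cold" data, written once and never touched again
for _ in range(90):
    ssd.write(0)          # "hot" data, rewritten constantly
print(ssd.erase_count)    # block 0 holds the cold data and shows no wear
```

In this run the 89 erases are shared among three blocks while the block holding the cold data stays at zero. A static scheme would periodically relocate that cold data so the fourth block could take its share.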

One ECC algorithm we use for 19-nm NAND flash memory is called an “anti-random data error record.” The algorithm addresses bit errors caused by temperature changes, noise and reliability of the storage cell. In addition, the storage cell of NAND flash normally has a limited lifetime of erase/program cycles. The bit error rate (BER) increases with the accumulation of erase/program operations until the limited lifetime runs out. The SSD ECC function also requires the algorithm to detect the BER of each cell and to understand its lifetime. Designers set a certain BER threshold to indicate that a lifetime has been reached and to identify a replacement block. However, optimizing this threshold is critical. A BER threshold that’s too low can cause the system to abandon a reliable cell too early, ultimately reducing the SSD’s lifetime. On the other hand, a higher BER threshold runs the risk of losing data as it tries to write to an unreliable cell. Thus, an ECC algorithm must find the proper balance between reliability and lifetime.
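The trade-off can be seen with a deliberately simple model. The wear curve below (bit errors per billion bits rising linearly with program/erase cycles) and every number in it are invented for illustration; real NAND behaves differently, and the production algorithm is not described here.

```python
def errors_per_gbit(cycle):
    """Hypothetical wear curve: bit errors per 10**9 bits after `cycle` P/E cycles."""
    return 1000 + cycle


def cycles_until_retired(threshold):
    """Number of P/E cycles a block serves before its BER reaches `threshold`."""
    cycle = 0
    while errors_per_gbit(cycle) < threshold:
        cycle += 1
    return cycle


def ecc_can_correct(cycle, correctable_limit):
    """True while the error rate is still within what the ECC can repair."""
    return errors_per_gbit(cycle) <= correctable_limit


# Compare a cautious, a moderate and an aggressive retirement threshold,
# assuming the ECC copes with up to 5000 errors per 10**9 bits.
for threshold in (2000, 4000, 6000):
    life = cycles_until_retired(threshold)
    print(threshold, life, ecc_can_correct(life, correctable_limit=5000))
```

In this toy model the cautious threshold retires a block after 1,000 cycles although the ECC could have handled far more, while the aggressive one keeps the block in service past the point where errors remain correctable.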

The 19-nm NAND flash offers more density in storage but less reliability. That’s why our design introduces high-speed, high-level error correction. The ECC part occupies more than 35 percent of the design resources, implementing a parallel computing ability of a maximum 49 bits of error correction in a 1,024-bit sector, at 4-Gbyte read speeds. The new 28-nm Kintex-7 technology increases system-level performance by up to 50 percent and increases capacity twofold while lowering total power consumption by up to 50 percent over the previous-generation FPGAs. Compared with the same ECC block in a Virtex-5 device, our Kintex-7 implementation reduced the area by 5 percent while increasing the performance more than 40 percent and maintaining the same cost.

Xilinx Kintex-7 series FPGAs are ideally suited for 19-nm flash PCIe SSD designs. The hard PCIe core, performance, capacity and low power make it the perfect chip for this market. With this device, our SSD throughput easily maintains 2 GBps for both writing and reading. The device allows us to bring great value to our customers and gain a large margin for our 19-nm NAND flash system.

XPERTS CORNER

How Partial Dynamic Reconfiguration Helped Make an FSK Demodulator

The ability to reconfigure a portion of a Xilinx FPGA on the fly allowed a European research team to create a more dependable system.

by Fabio Giovagnini
Software Manager
Aurion S.r.l., SESM [email protected]

Antonio Di Marzo
Embedded System Dept. Manager
SESM [email protected]

Partial dynamic reconfiguration (PDR) is a radical new way to configure and reprogram an FPGA. Unlike the standard FPGA reconfiguration process, PDR allows you to change a small part of the device based on the needs of your design, while other parts are still running. There is no need to hold the device in reset while an external controller or internal piece of glue logic reloads a design onto it, as the standard reconfiguration methodology requires. With PDR, critical parts of the design continue operating while a controller, either on or off the FPGA, loads a partial design into a reconfigurable module. The technique pays off in hardware resource optimization and a reduction in power consumption.

The PDR method has emerged as a topic for investigation in the context of the European Union research project pSHIELD. This project aims at pioneering techniques to build security, privacy and dependability (SPD) into embedded systems, rather than tacking them on as “add-on” functionalities. The idea behind pSHIELD is to take a first step toward SPD certification for future embedded systems. The leading concept is to demonstrate the composability of SPD technologies.

In such a context, we have identified PDR as a key technology to implement a secure, dependable and reconfigurable embedded system. Our investigation of this new technology involved implementing a project demonstrator, a reconfigurable frequency shift-keying (FSK) demodulator system, within the Xilinx® PDR design flow.

FREQUENCY SHIFT-KEYING ADAPTABLE DEMODULATOR

The FSK adaptable demodulator is a proof of concept that we developed to demonstrate the pSHIELD SPD paradigm. In fact, it implements a simple system managing a data stream. Figure 1 shows the block diagram of the hardware implementation of the A-FSK demodulator SPD node.

One of the most common forms of digital modulation in the high-frequency radio spectrum, frequency shift keying has important applications in telephone circuits. This technology transmits data by shifting the frequency of a continuous carrier in a binary way. One frequency is designated as the “mark,” with frequency f0, and the other as the “space,” with frequency f1. The mark is associated with the symbol one, the higher frequency, while the space is associated with the symbol zero, the lower frequency. In the example of the FSK signal seen in Table 1, the mark is at frequency 1,031 Hz and the space at 968 Hz.
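For readers who want to experiment, the mark and space frequencies can be turned into samples with a few lines of code. This sketch assumes the lower set of values from Table 1 (968 Hz and 1,031 Hz, 16-kHz sampling, 1 Vpp) and a rate of 100 symbols per second; the function name and the choice of a phase-continuous modulator are illustrative assumptions, not details taken from the project.

```python
import math


def fsk_samples(bits, f_mark=1031.0, f_space=968.0, fs=16000, baud=100):
    """Phase-continuous binary FSK: 1 -> mark frequency, 0 -> space frequency."""
    samples_per_bit = fs // baud
    phase = 0.0
    out = []
    for bit in bits:
        # Phase advance per sample for the frequency that encodes this bit.
        step = 2 * math.pi * (f_mark if bit else f_space) / fs
        for _ in range(samples_per_bit):
            out.append(0.5 * math.sin(phase))  # 1 Vpp means 0.5 V amplitude
            phase += step
    return out


signal = fsk_samples([1, 0, 1, 1, 0])
print(len(signal), "samples,", len(signal) / 16000, "seconds")
```

Carrying the phase across bit boundaries avoids the discontinuities that would otherwise appear each time the frequency changes.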

To perform the demodulation, the FSK signal and the carrier are multiplied together (the NCO and multiplier are the I2 and I1 blocks in Figure 2) and the product is then low-pass filtered. This low-pass, or loop, filter (the I3 block in Figure 2) discriminates the symbol mark from the space. The amplitude of the space symbol will be higher than that of the mark.

The output of that loop filter goes to a 16-tap finite impulse response (FIR) filter (the I4 block in Figure 2) to perform digital low-pass filtering. The FIR filter is essentially an average filter, since its output is equal to the average value of its input over the last n samples, where n is the number of taps used. This configuration needs 16 coefficients, but you can simplify the process by assuming all the coefficients are the same, 1/16. In practice, you can implement the multiplication by 1/16 simply by performing a 4-bit right-shift operation.
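The shift trick is easy to check in software. The sketch below is a plain integer model of a 16-tap averaging filter, written for illustration and not derived from the project's RTL; the class name and the running-sum arrangement are assumptions.

```python
from collections import deque


class MovingAverage16:
    """16-tap FIR with all coefficients equal to 1/16, computed as sum-then-shift."""

    def __init__(self):
        self.taps = deque([0] * 16, maxlen=16)  # delay line, oldest sample first
        self.total = 0                          # running sum of the 16 taps

    def step(self, sample):
        """Push one integer sample and return the filtered output."""
        self.total += sample - self.taps[0]  # add the newest, drop the oldest
        self.taps.append(sample)             # maxlen discards the oldest tap
        return self.total >> 4               # right shift by 4 divides by 16


fir = MovingAverage16()
print([fir.step(160) for _ in range(16)])
```

Fed a constant input of 160, the output climbs in steps of 10 and settles at 160 once all 16 taps are filled. In Python, as in a two's-complement arithmetic shift, `>> 4` rounds toward negative infinity for negative sums.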

The FSK adaptable demodulator is capable of adapting dynamically to a different carrier frequency, Fc0 or Fc1. In a general communications scheme, two modules are present: the modulator and the demodulator. An adaptable demodulator is able to switch automatically between two different carriers in order to match a carrier switch made by the modulator. The modulator might switch carriers for several reasons, including transmission errors, too much noise or the risk of intrusion on the current carrier.

Figure 1 – Hardware implementation of the A-FSK demodulator SPD node

DIGITAL SIGNAL TO TRANSMIT
  A-FSK rate                        Variable between 100 Hz and 50 Hz
A-FSK MODULATED SIGNAL
  “Space” frequency                 968 or 1,937 Hz
  “Mark” frequency                  1,031 or 2,062 Hz
  Amplitude                         1 Vpp
  Analog-to-digital sampling rate   16 kHz or 32 kHz

Table 1 – Mark and space in an example FSK signal

The FSK adaptive demodulator has an additional built-in block named the carrier controller. This block continually checks the integrity of the transmitted signal by analyzing the consistency of received data. Based on that analysis, the carrier controller drives the reconfiguration condition.

The FSK adaptive demodulator can reconfigure itself in two distinct modes, each able to decode the signal modulated at one of the given carrier frequencies, Fc0 or Fc1. The process of configuration takes place in accordance with the partial dynamic reconfiguration methods. Figure 3 shows the general layout of the FSK adaptive demodulator. The carrier controller, which we implemented in software, runs on a PowerPC® 440 as a single task and performs a data integrity check. In the case of a communication error, the carrier controller will force a reconfiguration event, using the Internal Configuration Access Port (ICAP) software primitives.
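The decision logic of such a controller can be sketched in a few lines. Everything here is hypothetical: the class name, the rule of switching after a number of consecutive bad frames and the `load_partial` callback, which merely stands in for whatever routine writes the partial bitstream through the ICAP. The real task runs on the PowerPC 440 and is not reproduced here.

```python
class CarrierController:
    """Toy model: switch demodulator variants after repeated integrity failures."""

    def __init__(self, load_partial, max_errors=3):
        self.load_partial = load_partial  # stand-in for the ICAP write routine
        self.max_errors = max_errors      # consecutive bad frames tolerated
        self.active = "1k"                # demodulator variant currently loaded
        self.errors = 0

    def on_frame(self, frame_ok):
        """Feed one integrity-check result; return the active variant."""
        self.errors = 0 if frame_ok else self.errors + 1
        if self.errors >= self.max_errors:
            # Too many failures: assume the modulator has changed carrier.
            self.active = "2k" if self.active == "1k" else "1k"
            self.errors = 0
            self.load_partial(self.active)
        return self.active


loaded = []
ctrl = CarrierController(load_partial=loaded.append, max_errors=2)
for ok in [True, False, False, True]:
    ctrl.on_frame(ok)
print(ctrl.active, loaded)
```

Requiring more than one failed check before reconfiguring is a design choice of this sketch, intended to keep a single corrupted frame from triggering a needless swap.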

We designed our FSK adaptive demodulator using the Xilinx ML507 development board. Equipped with RocketIO™ GTX transceivers, this embedded-system FPGA development board provides a feature-rich, general-purpose evaluation and development platform. It includes onboard memory and industry-standard connectivity interfaces to deliver a versatile development platform for embedded applications.

PDR DESIGN FLOW

A typical static Xilinx ISE® Design Suite flow consists of four main steps:

■ Design/edit
■ Synthesis
■ Implementation
■ Device configuration

The partial-reconfiguration design flow is more sophisticated and complex than this. Figure 4 depicts a simplified PDR design flow.


Figure 2 – Block diagram of the FSK demodulator circuit


The first step is to identify the partially reconfigurable modules (PRMs) in our top-level design. For each of these modules, we must define the interfaces in terms of input and output signals. In our case, we have identified a single PRM, named the demodulator. The following code describes the demodulator interface:

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY Demodulator IS
  PORT (
    clk   : IN  std_logic;                      -- Main entity clock
    reset : IN  std_logic;                      -- Active-high reset
    fmin  : IN  std_logic_vector(7 DOWNTO 0);   -- Modulated FSK signal
    dmout : OUT std_logic_vector(11 DOWNTO 0);  -- Pre-demodulated signal
    clko  : OUT std_logic;                      -- Synch. FIFO signal
    dbg   : OUT std_logic                       -- Debug line (last port, so no trailing semicolon)
  );
END Demodulator;

Given the n PRMs we need in the current design, the next step is to generate the n PRM netlist files using the XST tools. The output of XST consists of NGC files. An NGC file is a netlist that contains both logical data and constraints. At the end, we have to generate n NGC files. If we intend to use those NGC files in a partial-reconfiguration project, it is necessary to ensure that the IOBUF is disabled.

For our project, we have two NGC files: a 1k and a 2k demodulator. The difference between the two modules is that one has the numerically controlled oscillator (NCO) set at around 1 kHz and the other at around 2 kHz.
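As a rough cross-check of the "around 1 kHz" and "around 2 kHz" NCO settings, the following Python sketch takes the space and mark frequencies listed in Table 1 and computes their midpoint and separation. Pairing the lower tones (968 and 1,031 Hz) with the 1k module and the higher tones (1,937 and 2,062 Hz) with the 2k module is an assumption, as is treating the midpoint as the nominal carrier. The names `TONES_HZ` and `center_and_shift` are purely illustrative.

```python
# Space/mark pairs in Hz, read from Table 1. Which pair belongs to which
# PRM is an assumption made for this illustration.
TONES_HZ = {
    "demodulator1k": (968, 1031),
    "demodulator2k": (1937, 2062),
}


def center_and_shift(space_hz, mark_hz):
    """Return the midpoint of the two tones and their separation, in Hz."""
    return (space_hz + mark_hz) / 2, mark_hz - space_hz


for name, (space_hz, mark_hz) in TONES_HZ.items():
    center, shift = center_and_shift(space_hz, mark_hz)
    print(f"{name}: center {center} Hz, shift {shift} Hz")
```

Under these assumptions the midpoints come out at 999.5 Hz and 1,999.5 Hz, which is consistent with the two NCO settings described above.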

Now that we have the two NGC files to use in our system, we have to create a design that will host the PRMs. Using XPS, we build up a system-on-a-chip (SoC), instantiating all the needed modules and controllers via the XPS menu. Additionally, we have to create a black-box IP that will host our PRM modules, previously created by the XST tool. To do that, we can use the “Create and Import Peripheral Wizard” option available in XPS.

In our case, we created a black-box module, named FSKDemodulator, using the appropriate options. The wizard process generates two VHDL files, FSKDemodulator.vhd and User_logic.vhd. The FSKDemodulator.vhd file is the top level associated with our PRM modules (demodulator) at the programmable-system level. This file defines the interface of the PRM module with the programmable-system components such as the Processor Local Bus (PLB) v4.6. The User_logic.vhd file represents the user logic function, and it includes the instance of the PRM module. Once the programmable-system design has been completed, we can generate the NGC file for this configuration.

Figure 3 – Overview of the FSK adaptive demodulator design (a static carrier controller and two dynamic FSK demodulators, one for Fc0 and one for Fc1)

Figure 4 – Simplified view of a PDR design flow using PlanAhead, XST and XPS

Using XPS, we defined the programmable system as well as the general system architecture. It’s important to note that we defined all the PRMs as black boxes in XPS.

Using PlanAhead™ we can merge the outputs and NGC files coming from the two processes, XST and XPS, in order to have just a few PRM bitstreams and a default bitstream. A top-level implementation will be defined and built up using the NGC file from XPS and one of the NGC files from XST. The designer then needs to add a partial-reconfiguration region to the design and must specify the NGC file associated with it. The last step is to promote this configuration in order to make this implementation the default system implementation that will load into the system at startup.

The output of this process will be the default bitstream. To build the PRMs’ bitstream files, we must reopen PlanAhead and, starting from scratch, check the option “PR project” in order to import all the NGC files. PlanAhead will generate a distinct bitstream for each of those PRMs. In our case, it outputs two PRM bitstreams, one for demodulator1k and the other for demodulator2k.

For debug purposes, we suggest creating n (the number of PRMs) different static implementations, one for each PRM. In this case the designer will have n static full implementations, each of which will perform the function of one PRM statically attached to the FPGA. In our opinion, this is a good compromise between debug needs and development complexity.

The last step is to download the generated bitstream into our target device. You can program the device using the iMPACT tool, or use the command-line data2mem and Xilinx Microprocessor Debugger (XMD) tools in case you need to store the bitstream and System ACE™ file in a CompactFlash. In our case, we opted to program the device by command-line methods, since the Xilinx ML507 board has a CompactFlash on it and the System ACE manages the CompactFlash as a boot device.

TRANSFORMABLE DEVICES
Compared with static reconfiguration, the technique of partial dynamic reconfiguration is extremely efficient in terms of the time it takes for reconfiguration to occur. The time is related to the physical dimensions of the PRMs, so if these modules are smaller than the full bitstream by even a factor of 10, reconfiguration takes tens rather than hundreds of milliseconds. The use of PDR brings FPGA system design to another level, giving the designer an opportunity to drastically reduce the power consumption and the cost of a whole system.

In the context of the EU’s pSHIELD research project, where design criteria such as security, privacy and dependability are the main factors, we found the PDR technique extremely useful. The possibility of changing a cryptographic algorithm or communication protocol on the fly while keeping the other functionality alive was a big advantage. With this approach, we think that the FPGA is going to usher in a new era in electronic design. We are envisioning systems that can change their functionality and adapt themselves to a specific scenario or threat. In short, we are thinking of a world made up of transformable devices.


XPLANATION: FPGA 101

The Basics of FPGA Mathematics

by Adam Taylor
Principal Engineer
EADS [email protected]


One of the many benefits of an FPGA-based solution is the ability to implement a mathematical algorithm in the best possible manner for the problem at hand. For example, if response time is critical, then we can pipeline the stages of mathematics. But if accuracy of the result is more important, we can use more bits to ensure we achieve the desired precision. Of course, many modern FPGAs also provide the benefit of embedded multipliers and DSP slices, which can be used to obtain the optimal implementation in the target device.

Let’s take a look at the rules and techniques that you can use to develop mathematical functions within an FPGA or other programmable device.

REPRESENTATION OF NUMBERS
There are two methods of representing numbers within a design: fixed-point and floating-point number systems. Fixed-point representation maintains the decimal point within a fixed position, allowing for straightforward arithmetic operations. The major drawback of the fixed-point system is that to represent larger numbers or to achieve a more accurate result with fractional numbers, you will need to use a larger number of bits. A fixed-point number consists of two parts, integer and fractional.

Floating-point representation allows the decimal point to “float” to different places within the number, depending upon the magnitude. Floating-point numbers, too, are divided into two parts, the exponent and the mantissa. This scheme is very similar to scientific notation, which represents a number as A times 10 to the power of B, where A is the mantissa and B is the exponent. However, the base of the exponent in a floating-point number is 2, that is, A times 2 to the power of B. The floating-point number is standardized by IEEE/ANSI standard 754-1985. The basic IEEE floating-point number utilizes an 8-bit exponent and a 24-bit mantissa (23 stored bits plus an implied leading 1), along with a sign bit.

Due to the complexity of floating-point numbers, we as designers tend wherever possible to use fixed-point representations. Consider a 16-bit fixed-point number with 8 integer bits and 8 fractional bits, whose bit weights are

$2^{7}\ 2^{6}\ 2^{5}\ 2^{4}\ 2^{3}\ 2^{2}\ 2^{1}\ 2^{0}\ .\ 2^{-1}\ 2^{-2}\ 2^{-3}\ 2^{-4}\ 2^{-5}\ 2^{-6}\ 2^{-7}\ 2^{-8}$

This fixed-point number is capable of representing an unsigned number between 0.0 and 255.99609375 or a signed number between -128.0 and 127.99609375 using two’s complement representation. Within a design you have the choice, typically constrained by the algorithm you are implementing, to use either unsigned or signed numbers. Unsigned numbers are capable of representing a range of 0 to $2^{n} - 1$, and always represent positive numbers. By contrast, the range of a signed number depends upon the encoding scheme used: sign and magnitude, one’s complement or two’s complement.
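The range limits quoted for an x,y number follow directly from the bit weights. As a minimal sketch (the function name `fixed_point_range` is hypothetical, and two's complement is assumed for the signed case), the smallest value, largest value and step size can be computed like this:

```python
from fractions import Fraction


def fixed_point_range(int_bits, frac_bits, signed=False):
    """Return (min, max, lsb) for a fixed-point number with the given split.

    The signed case assumes two's complement encoding.
    """
    total_bits = int_bits + frac_bits
    lsb = Fraction(1, 2 ** frac_bits)        # weight of the least-significant bit
    if signed:
        lowest = -(2 ** (total_bits - 1))
        highest = 2 ** (total_bits - 1) - 1
    else:
        lowest = 0
        highest = 2 ** total_bits - 1
    return float(lowest * lsb), float(highest * lsb), float(lsb)


print(fixed_point_range(8, 8))               # (0.0, 255.99609375, 0.00390625)
print(fixed_point_range(8, 8, signed=True))  # (-128.0, 127.99609375, 0.00390625)
```

Exact fractions are used internally so that the printed limits are not affected by floating-point rounding.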

The sign-and-magnitude scheme utilizes the left-most bit to represent the sign of the number (0 = positive, 1 = negative). The remainder of the bits represent the magnitude. Therefore, in this system, positive and negative numbers have the same magnitude but the sign bit differs. As a result, it is possible to have both a positive and a negative zero within the sign-and-magnitude system.

One’s complement uses the same unsigned representation for positive numbers as sign and magnitude. However, for negative numbers it uses the inversion (one’s complement) of the positive number.

Two’s complement is the most widely used encoding scheme for representing signed numbers. Here, as in the other two schemes, positive numbers are represented in the same manner as unsigned numbers, while negative numbers are represented as the binary number you add to a positive number of the same magnitude to get zero. You calculate a negative two’s complement number by first taking the one’s complement (inversion) of the positive number and then adding one to it. The two’s complement number system allows you to subtract one number from another by performing an addition of the two numbers. The range a two’s complement number can represent is given by

$$-(2^{n-1}) \text{ to } +(2^{n-1} - 1)$$

One way to convert a number to its two’s complement format is to work right to left, leaving the bits unchanged up to and including the first “1” you encounter. After this point, each bit is inverted.
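Both conversion methods described above can be checked against each other with a short Python sketch. The function names are illustrative only, and values are treated as unsigned bit patterns of a fixed width.

```python
def negate_invert_add(value, width):
    """Two's complement by inverting every bit and adding one."""
    mask = (1 << width) - 1
    return ((value ^ mask) + 1) & mask


def negate_scan(value, width):
    """Two's complement by scanning right to left.

    Bits are copied up to and including the first '1'; every bit to the
    left of that point is inverted.
    """
    bits = format(value, f"0{width}b")
    first_one = bits.rfind("1")          # position of the right-most '1'
    if first_one == -1:                  # zero negates to zero
        return 0
    inverted = "".join("1" if b == "0" else "0" for b in bits[:first_one])
    return int(inverted + bits[first_one:], 2)


print(format(negate_invert_add(6, 8), "08b"))  # 11111010
print(format(negate_scan(6, 8), "08b"))        # 11111010
```

For an 8-bit value of 6 (00000110), both routes give 11111010, the pattern for -6.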

One of the main advantages of the FPGA is its ability to perform mathematical functions as desired. Here’s a refresher on the basic rules and methods involved.

FIXED-POINT MATHEMATICS
The normal way of representing the split between integer and fractional bits within a fixed-point number is x,y where x represents the number of integer bits and y the number of fractional bits. For example, 8,8 represents 8 integer bits and 8 fractional bits, while 16,0 represents 16 integer and 0 fractional. In many cases you will determine the correct number of integer and fractional bits required at design time, normally following conversion from a floating-point algorithm. Thanks to the flexibility of FPGAs, we can represent a fixed-point number of any bit length; the number of integer bits required depends upon the maximum integer value the number is required to store, while the number of fractional bits will depend upon the accuracy of the final result. To determine the number of integer bits required, use the following equation:

$$\text{Integer Bits Required} = \text{Ceil}\left(\frac{\log_{10} \text{Integer\_Maximum}}{\log_{10} 2}\right)$$

For example, the number of integer bits required to represent a value between 0.0 and 423.0 is given by

$$9 = \text{Ceil}\left(\frac{\log_{10} 423}{\log_{10} 2}\right)$$

That means you would need 9 integer bits, allowing a range of 0 to 511 to be represented. Representing the number using 16 bits would allow for 7 fractional bits. The accuracy this representation would be capable of providing is given by

$$\text{Accuracy} = \left(\frac{\text{Actual\_Value} - \text{FPGA\_Value}}{2^{\text{Fractional Bits}}}\right) \cdot 100$$

You can increase the accuracy of a fixed-point number by using more bits to store the fractional number. When designing, there are times when you may wish to store only fractional numbers (0,16), depending upon the size of the number you wish to scale up. Scaling up by $2^{16}$ may yield a number that still does not provide an accurate enough result. In this case you can multiply up by a larger power of 2, chosen such that the number can still be represented within a 16-bit number. You can then remove this scaling at a further stage within the implementation. For example, to represent the number $1.45309806319 \times 10^{-4}$ in a 16-bit number, the first step is to multiply it by $2^{16}$.

$$65536 \cdot 1.45309806319 \times 10^{-4} = 9.523023$$

Storing the integer of the result (9) will result in the number being stored as $1.37329101563 \times 10^{-4}$ (9 / 65536). This difference between the number required to be stored and the stored number is substantial and could lead to an unacceptable error in the calculated result. You can obtain a more accurate result by scaling the number up by a larger power of 2, so that the result lies between 32768 and 65535 and therefore still fits in a 16-bit number. Using the earlier example of storing $1.45309806319 \times 10^{-4}$, multiplying by a factor of $2^{28}$ will yield a number that can be stored in 16 bits and closely approximates the desired number.

$$268435456 \cdot 1.45309806319 \times 10^{-4} = 39006.3041205$$

The integer of the result will give you a stored number of $1.45308673382 \times 10^{-4}$, which will provide for a much more accurate calculation, assuming you can address the scaling factor of $2^{28}$ at a later stage within the calculation. For example, multiplying the scaled number with a 16-bit number scaled 4,12 will produce a result of 4,40 (28 + 12). The result, however, will be stored in a 32-bit result.

FIXED-POINT RULES
To add, subtract or divide, the decimal points of both numbers must be aligned. That is, you can only add to, subtract from or divide an x,8 number by a number that is also in an x,8 representation. To perform arithmetic operations on numbers of a different x,y format, you must first ensure the decimal points are aligned. To align a number to a different format, you have two choices: either multiply the number with more integer bits by $2^{X}$ or divide the number with the fewest integer bits by $2^{X}$. Division, however, will reduce your accuracy and may lead to a result that is outside the allowable tolerance. Since all numbers are stored in base-two scaling, you can easily scale the number up or down in an FPGA by shifting one place to the left or right for each power of 2 required to balance the two decimal points. To add together two numbers that are scaled 8,8 and 9,7, you can either scale up the 9,7 number by a factor of $2^{1}$ or scale the 8,8 format down to a 9,7 format, if the loss of a least-significant bit is acceptable.

For example, say you want to add 234.58 and 312.732, which are stored in an 8,8 and a 9,7 format respectively. The first step is to determine the actual 16-bit numbers that will be added together.

$$234.58 \cdot 2^{8} = 60052.48$$

$$312.732 \cdot 2^{7} = 40029.69$$

The two numbers to be added are 60052 and 40029. However, before you can add them you must align the decimal points. To align the decimal points by scaling up the number with the larger number of integer bits, you must scale up the 9,7-format number by a factor of $2^{1}$.

$$40029 \cdot 2^{1} = 80058$$

You can then calculate the result by performing an addition of

$$80058 + 60052 = 140110$$

This represents 547.3046875 in a 10,8 format ($140110 / 2^{8}$).

When multiplying two numbers together, you do not need to align the decimal points, as the multiplication will provide a result that is X1 + X2, Y1 + Y2 wide. Multiplying two numbers that are formatted 14,2 and 10,6 will produce a result that is formatted as 24 integer bits and 8 fractional bits.

You can multiply by a fractional number instead of using division within an equation through multiplying by the reciprocal of the divisor. This approach can reduce the complexity of your design significantly. For example, to divide the number 312.732, represented in 9,7 (40029) format, by 15, the first stage is to calculate the reciprocal of the divisor.

$$\frac{1}{15} = 0.0666\ldots$$

This reciprocal must then be scaled up, to be represented within a 16-bit number.

$$65536 \cdot 0.06666 = 4369$$

This step will produce a result that is formatted 9,23 when the two numbers are multiplied together.

$$4369 \cdot 40029 = 174886701$$

The result of this multiplication is thus

$$\frac{174886701}{8388608} = 20.84811937811$$

The expected result is 20.8488. If the result is not accurate enough, then you can scale up the reciprocal by a larger factor to produce a more accurate result. Therefore, never divide by a number when you can multiply by the reciprocal.

ISSUES OF OVERFLOW
When implementing algorithms, the result must not be larger than what is capable of being stored within the result register. Otherwise a condition known as overflow occurs. When that happens, the stored result will be incorrect and the most significant bits are lost. A very simple example of overflow would be if you added two 16-bit numbers, each with a value of 65535, and the result was stored within a 16-bit register.

$$65535 + 65535 = 131070$$

The above calculation would result in the 16-bit result register containing a value of 65534, which is incorrect. The simplest way to prevent overflow is to determine the maximum value that will result from the mathematical operation and use this equation to determine the size of the result register required.

$$\text{Integer Bits Required} = \text{Ceil}\left(\frac{\log_{10} \text{Integer\_Maximum}}{\log_{10} 2}\right)$$

If you were developing an averager to calculate the average of up to fifty 16-bit inputs, the size of the required result register could be calculated.

$$50 \cdot 65535 = 3276750$$

Using this same equation, this would require a 22-bit result register to prevent overflow occurring. You must also take care, when working with signed numbers, to ensure there is no overflow when using negative numbers. Using the averager example again, taking 10 averages of a signed 16-bit number returns a 16-bit result.
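The worked figures in the fixed-point rules and overflow discussion can be reproduced with a few lines of Python. This is only a numerical sketch: the helper names `integer_bits` and `to_fixed` are hypothetical, scaling is assumed to truncate to an integer, and the reciprocal is computed from 1/15 directly rather than from the rounded 0.06666.

```python
import math


def integer_bits(max_value):
    """Integer bits needed, using the Ceil(log10(max) / log10(2)) rule."""
    return math.ceil(math.log10(max_value) / math.log10(2))


def to_fixed(value, frac_bits):
    """Scale by 2**frac_bits and keep only the integer part (truncation)."""
    return int(value * 2 ** frac_bits)


# Sizing: 0..423 needs 9 integer bits; summing fifty 16-bit values needs 22.
bits_423 = integer_bits(423)
bits_avg = integer_bits(50 * 65535)

# Scaling a small constant: 2**16 leaves one digit, 2**28 leaves five.
coarse = to_fixed(1.45309806319e-4, 16)    # 9
fine = to_fixed(1.45309806319e-4, 28)      # 39006

# Addition: align the 9,7 operand to x,8 with a one-place left shift.
a_8_8 = to_fixed(234.58, 8)                # 60052
b_9_7 = to_fixed(312.732, 7)               # 40029
total = a_8_8 + (b_9_7 << 1)               # 140110, eight fractional bits
total_real = total / 2 ** 8                # 547.3046875

# Division by 15 as multiplication by a 0,16 reciprocal: 9,7 x 0,16 = 9,23.
recip_15 = to_fixed(1 / 15, 16)            # 4369
quotient = (b_9_7 * recip_15) / 2 ** 23    # roughly 20.848

# Overflow: a 16-bit register keeps only the low 16 bits of the sum.
wrapped = (65535 + 65535) & 0xFFFF         # 65534

print(bits_423, bits_avg, coarse, fine, total, total_real, recip_15, wrapped)
```

Shifting left by one place is the software equivalent of scaling the 9,7 operand by a factor of 2, which is why no multiplier is needed for the alignment step.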

$$10 \cdot (-32768) = -327680$$

Since it is easier to multiply the result by a scaled reciprocal of the divisor, you can multiply this number by $1/10 \cdot 65536 \approx 6554$ to determine the average.

$$-327680 \cdot 6554 = -2147614720$$

This number when divided by $2^{16}$ equals -32770, which cannot be represented correctly within a 16-bit output. The module design must therefore take the overflow into account and detect it to ensure you don’t output an incorrect result.
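A minimal Python sketch of the averager case shows how the scaled reciprocal pushes the result just outside the signed 16-bit range, and how a range check can catch it. The helper names are hypothetical, and clamping to the nearest representable value is just one possible response to a detected overflow, not something the text prescribes.

```python
def fits_signed(value, width):
    """True if value is representable as a width-bit two's complement number."""
    return -(1 << (width - 1)) <= value <= (1 << (width - 1)) - 1


def saturate_signed(value, width):
    """Clamp value to the width-bit two's complement range."""
    lowest, highest = -(1 << (width - 1)), (1 << (width - 1)) - 1
    return max(lowest, min(highest, value))


accumulator = 10 * -32768                  # ten worst-case negative samples
recip_10 = round(65536 / 10)               # 6554, the reciprocal scaled by 2**16
average = (accumulator * recip_10) >> 16   # -32770

overflow = not fits_signed(average, 16)
print(average, overflow, saturate_signed(average, 16))  # -32770 True -32768
```

The error comes from rounding 6553.6 up to 6554: the slightly oversized reciprocal makes the computed average larger in magnitude than any single input.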

REAL-WORLD IMPLEMENTATION
Let’s say that you are designing a module to implement a transfer function that is used to convert atmospheric pressure, measured in millibars, into altitude, measured in meters.

$$-0.0088x^{2} + 1.7673x + 131.29$$

The input value will range between 0 and 10 millibars, with a resolution of 0.1 millibar. The output of the module is required to be accurate to +/-0.01 meters. As the module specification does not determine the input scaling, you can figure it out by the following equation.

$$4 = \text{Ceil}\left(\frac{\log_{10} 10}{\log_{10} 2}\right)$$

Therefore, to ensure maximum accuracy you should format the input data as 4 integer and 12 fractional bits. The next step in the development of the module is to use a spreadsheet to calculate the expected result of the transfer function across the entire input range using the unscaled values. If the input range is too large to reasonably achieve this, then calculate an acceptable number of points. For this example, you can use 100 entries to determine the expected result across the entire input range.

Input (millibar)    Output (meters)
0                   131.2900
0.1                 131.4666
0.2                 131.6431
0.3                 131.8194
0.4                 131.9955
0.5                 132.1715
0.6                 132.3472
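The same expected values can be generated outside a spreadsheet. The sketch below evaluates the transfer function in floating point over the 0 to 10 millibar range in 0.1 steps, which gives 101 points when both endpoints are included. The function name `altitude_m` is an illustrative choice.

```python
def altitude_m(pressure_mbar):
    """Unscaled (floating-point) transfer function from the text."""
    return -0.0088 * pressure_mbar ** 2 + 1.7673 * pressure_mbar + 131.29


# Build the reference table; i / 10 avoids accumulating 0.1 step errors.
expected = [(i / 10, altitude_m(i / 10)) for i in range(101)]

for pressure, altitude in expected[:7]:
    print(f"{pressure:4.1f}  {altitude:.4f}")
```

These reference values are what the scaled, fixed-point calculation is later compared against.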


Once you have calculated the initial unscaled expected values, the next step is to determine the correct scaling factors for the constants and calculate the expected outputs using the scaled values. To ensure maximum accuracy, each of the constants used within the equation will be scaled by a different factor.

The scaling factor for the first polynomial constant (A) is given by

$$8 = \text{Ceil}\left(\frac{\log_{10} 131.29}{\log_{10} 2}\right)$$

so A needs 8 integer bits and is stored in an 8,8 format (scaled up by $2^{8}$).

The second polynomial constant (B) scaling factor is given by

$$1 = \text{Ceil}\left(\frac{\log_{10} 1.7673}{\log_{10} 2}\right)$$

so B needs 1 integer bit and is stored in a 1,15 format (scaled up by $2^{15}$).

The final polynomial constant (C) can be scaled up by a factor of $2^{16}$, as it is completely fractional.

These scaling factors allow you to calculate the scaled spreadsheet, as shown in Table 1. The results of each stage of the calculation will produce a result that will require more than 16 bits.

The calculation of the Cx2 will produce a result that is 32 bits long formatted 4,12 + 4,12 = 8,24. This is then multiplied by the constant C, producing a result that will be 48 bits long formatted 8,24 + 0,16 = 8,40. For the accuracy required in this example, 40 bits of fractional representation is excessive. Therefore, the result will be divided by $2^{32}$ to produce a result with a bit length of 16 bits formatted 8,8. The same reduction to 16 bits is carried out upon the calculation of Bx to produce a result formatted 5,11.

The result is the addition of columns Cx2, Bx and A. However, to obtain the correct result you must first align the radix points, either by shifting up A and Cx2 to align the numbers in an x,11 format, or shifting down the calculated Bx to a format of 8,8, aligning the radix points with the calculated values of A and Cx2.

In this example, we shifted down the calculated value of Bx by $2^{3}$ to align the radix points in an 8,8 format. This approach minimized the number of shifts required, thus reducing the logic needed to implement the example. Note that if you cannot achieve the required accuracy by shifting down to align the radix points, then you must align the radix points by shifting up the calculated values of A and Cx2. In this example, the calculated result is scaled up by a factor of $2^{8}$. You can then scale down the result and compare it against the result obtained with unscaled values. The difference between the calculated result and the expected result is then the accuracy; applying the spreadsheet commands MAX() and MIN() to this difference gives the maximum and minimum error of the calculated result across the entire range of spreadsheet entries.

Once the calculated spreadsheet confirms that you can achieve the required accuracy, you can write and simulate the RTL code. If desired, you could design the testbench such that the input values are the same as those used in the spreadsheet. This allows you to compare the simulation outputs against the spreadsheet-calculated results to ensure the correct RTL implementation.


Polynomial Constant    Unscaled    Scaled
A                      131.29      33610
B                      1.7673      57910
C                      -0.0088     -577

Input Scaled    C       B       A        Result    Result Scaled    Expected Result    Difference
0               0       0       33610    33610     131.289          131.2900           0.0009
409             -6      361     33610    33655     131.465          131.4666           0.0018
819             -24     723     33610    33700     131.641          131.6431           0.0025
1228            -52     1085    33610    33745     131.816          131.8194           0.0030
1638            -93     1447    33610    33790     131.992          131.9955           0.0033
2048            -145    1809    33610    33835     132.168          132.1715           0.0035
2457            -208    2171    33610    33880     132.344          132.3472           0.0035
2867            -283    2533    33610    33925     132.520          132.5228           0.0033

Table 1 – Real results against the fixed-point mathematics
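Before writing RTL, the scaled calculation can also be modeled in a few lines of Python and compared with the floating-point reference. The sketch below assumes a 4,12 input and the scaled constants 33610, 57910 and -577, and it reduces each product with right shifts of 32 and 19 places so that everything ends up with 8 fractional bits. The right shift floors toward negative infinity, so individual results may differ by one least-significant bit from a spreadsheet that rounds differently. The function names are illustrative.

```python
A = 33610    # 131.29  scaled by 2**8  (8,8)
B = 57910    # 1.7673  scaled by 2**15 (1,15)
C = -577     # -0.0088 scaled by 2**16 (0,16)


def transfer_fixed(x_4_12):
    """Evaluate the polynomial on a 4,12 input; returns an integer with 8 fractional bits."""
    cx2 = (x_4_12 * x_4_12 * C) >> 32    # 8,40 reduced to 8 fractional bits
    bx = (x_4_12 * B) >> 19              # 5,27 reduced to 8 fractional bits
    return A + cx2 + bx


def transfer_float(x):
    """Floating-point reference for the same transfer function."""
    return -0.0088 * x ** 2 + 1.7673 * x + 131.29


worst = 0.0
for i in range(101):                     # 0.0 to 10.0 millibar in 0.1 steps
    x = i / 10
    fixed = transfer_fixed(int(x * 4096)) / 256
    worst = max(worst, abs(fixed - transfer_float(x)))

print(f"worst-case error: {worst:.4f} m")
```

Under these assumptions the worst-case error stays below the +/-0.01 meter requirement, since every truncation in the chain costs less than one output LSB (1/256 of a meter).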


RTL IMPLEMENTATION
The RTL example uses signed parallel mathematics to calculate the result within four clock cycles. Because of the signed parallel multiplication, you must take care to correctly handle the extra sign bits generated by the multiplications.

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY transfer_function IS PORT(
  sys_clk  : IN  std_logic;
  reset    : IN  std_logic;
  data     : IN  std_logic_vector(15 DOWNTO 0);
  new_data : IN  std_logic;
  result   : OUT std_logic_vector(15 DOWNTO 0);
  new_res  : OUT std_logic);
END ENTITY transfer_function;

ARCHITECTURE rtl OF transfer_function IS

  -- this module performs the following transfer function: -0.0088x2 + 1.7673x + 131.29
  -- input data is scaled 4,12, while the output data will be scaled 8,8.
  -- this module utilizes signed parallel mathematics

  TYPE control_state IS (idle, multiply, add, result_op);

  CONSTANT c : signed(16 DOWNTO 0) := to_signed(-577,17);
  CONSTANT b : signed(16 DOWNTO 0) := to_signed(57910,17);
  CONSTANT a : signed(16 DOWNTO 0) := to_signed(33610,17);

  SIGNAL current_state : control_state;
  SIGNAL buf_data : std_logic;            -- used to detect rising edge upon the new_data
  SIGNAL squared  : signed(33 DOWNTO 0);  -- register holds input squared
  SIGNAL cx2      : signed(50 DOWNTO 0);  -- register used to hold Cx2
  SIGNAL bx       : signed(33 DOWNTO 0);  -- register used to hold bx
  SIGNAL res_int  : signed(16 DOWNTO 0);  -- register holding the temporary result

BEGIN

  fsm : PROCESS(reset, sys_clk)
  BEGIN
    IF reset = '1' THEN
      buf_data <= '0';
      squared <= (OTHERS => '0');
      cx2 <= (OTHERS => '0');
      bx <= (OTHERS => '0');
      result <= (OTHERS => '0');
      res_int <= (OTHERS => '0');
      new_res <= '0';
      current_state <= idle;
    ELSIF rising_edge(sys_clk) THEN
      buf_data <= new_data;
      CASE current_state IS
        WHEN idle =>
          new_res <= '0';
          IF (new_data = '1') AND (buf_data = '0') THEN  -- detect rising edge new data
            squared <= signed('0' & data) * signed('0' & data);
            current_state <= multiply;
          ELSE
            squared <= (OTHERS => '0');
            current_state <= idle;
          END IF;
        WHEN multiply =>
          new_res <= '0';
          cx2 <= (squared * c);
          bx <= (signed('0' & data) * b);
          current_state <= add;
        WHEN add =>
          new_res <= '0';
          -- align radix points: Cx2 divided by 2^32, Bx divided by 2^19
          res_int <= a + cx2(48 DOWNTO 32) + ("000" & bx(32 DOWNTO 19));
          current_state <= result_op;
        WHEN result_op =>
          result <= std_logic_vector(res_int(res_int'high - 1 DOWNTO 0));
          new_res <= '1';  -- flag that a new result is available
          current_state <= idle;
      END CASE;
    END IF;
  END PROCESS;

END ARCHITECTURE rtl;

The architecture of FPGAs makes them ideal for implementing mathematical functions, although the implementation of your algorithm may take a little more initial thought and modeling in system-level tools such as MATLAB® or Excel. You can quickly implement mathematical algorithms once you have mastered some of the basics of FPGA mathematics.


XPLANATION: FPGA 101

The FPGA Engineer’s Guide to Using ADCs and DACs

by Adam Taylor
Principal Engineer
EADS [email protected]


Once it’s performed the task it was designed to do, an FPGA-based system next has to interface with the real world, and as every engineer knows, the real world tends to function around analog as opposed to digital signals. That means conversion is going to be required between the digital domain and the analog realm. Just as you face a plethora of choices in selecting the correct FPGA for the job at hand, so too will you find an abundance of riches when choosing the correct ADC or DAC for a system.

The first thing to establish is the sampling rate you will need to convert the signal. This parameter will drive not only the converter selection but also your FPGA choice, to ensure the device can address the processing speed and logic footprint required. The sampling rate of the converter needs to be at least twice the highest frequency of the signal being sampled. Therefore, if you need to sample a 50-MHz signal, your sampling rate must be at least 100 MHz. Otherwise the converted signal will be aliased back upon itself and will not be correctly represented. This aliasing is not always a bad thing; in fact, if the converter bandwidth is wide enough, you can employ the aliasing to fold signals back into the usable bandwidth.
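To see where an undersampled tone lands, the folding can be computed directly. The sketch below is a simple model that assumes an ideal sampler and a single real-valued tone; the function name `alias_frequency` and the example frequencies are illustrative.

```python
def alias_frequency(signal_hz, sample_rate_hz):
    """Frequency at which a tone appears after sampling, folded into 0..fs/2."""
    remainder = signal_hz % sample_rate_hz
    return min(remainder, sample_rate_hz - remainder)


FS = 100e6  # 100-MHz sampling rate
for tone in (40e6, 50e6, 70e6, 130e6):
    folded = alias_frequency(tone, FS)
    print(f"{tone / 1e6:.0f} MHz appears at {folded / 1e6:.0f} MHz")
```

With a 100-MHz sampling rate, tones at 70 MHz and 130 MHz both appear at 30 MHz, which is the folding effect that deliberate undersampling exploits.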

ADC AND DAC KEY PARAMETERS
Analog-to-digital converters are constructed by many different techniques. Some of the most common are flash, ramp and successive approximation.

• Known for their speed, flash converters use a series of scaled analog comparators to compare the input voltage against a reference voltage; these ADCs use the outputs of the comparators to determine the digital code.

• Ramp converters utilize a free-running counter connected to a digital-to-analog converter, comparing the output of the DAC against the input voltage. When the two are equal, the count is held.

• Successive-approximation (SAR) converters are an adaptation of ramp converters and also utilize a DAC and a comparator against the analog input. However, instead of counting up, the SAR converter determines whether the analog representation of the count is above or below the input signal, allowing a trial-and-error-based approach to determining the digital code.

Digital-to-analog converters also come in several implementations, some of the most common being binary-weighted, R-2R ladder and pulse-width modulation.

• Binary-weighted is one of the fastest DAC architectures. These devices sum the result of individual conversions for each logic bit. For example, a resistor-based DAC will switch resistors in or out depending upon the current code.

• R-2R ladder converters use a structure of cascaded resistors of values R and 2R. Due to the ease with which precision resistors can be produced and matched, these DACs are more accurate than the binary-weighted types.

• Pulse-width modulation, the simplest type of DAC architecture, passes the PWM waveform through a simple low-pass analog filter. These devices are commonly used in motor control but also form the basis for delta-sigma converters.

Many manufacturers of specialist devices have developed their own internal conversion architectures to provide the best possible performance in specific areas depending upon the intended use. Each of these varieties has pros and cons relating to the speed of conversion, accuracy and resolution. Just as you look at the number of I/Os, I/O standards supported, clock management, logic resources and memory when selecting an FPGA, when selecting a converter you will look at parameters specific to the device type: the maximum sampling rate, signal-to-noise ratio (SNR), spurious-free dynamic range (SFDR) and effective number of bits (ENOB).

The sampling frequency is pretty simple; it’s the maximum rate at which an ADC can digitize its input. SNR represents the ratio of the signal to the noise level, which is assumed to be uncorrelated with the input signal. You can determine the SNR theoretically using the equation

$$\text{SNR} = 6.02N + 1.76\ \text{dB}$$

where N is the resolution in bits. This equation is valid for a full-scale sine wave.
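As a quick illustration of the theoretical SNR equation, the following sketch evaluates it for a few common resolutions. The function name `ideal_snr_db` is hypothetical and the resolutions are chosen only as examples.

```python
def ideal_snr_db(bits):
    """Theoretical SNR in dB of an ideal converter for a full-scale sine wave."""
    return 6.02 * bits + 1.76


for bits in (8, 12, 16):
    print(f"{bits}-bit converter: {ideal_snr_db(bits):.2f} dB")
```

Each extra bit of resolution adds about 6 dB, so a 12-bit converter tops out at roughly 74 dB in the ideal case.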

You can determine the actual SNR during system test by taking a fast Fourier transform (FFT) of the output and measuring the difference between the level of the input signal and the noise floor.

The SFDR, meanwhile, is the ratio between the input signal and the next

Interfacing the signal-processing FPGA with the real world requires the use of an analog-to-digital or digital-to-analog converter.

especially if the ADC has a wide input bandwidth. With careful consideration, aliasing can allow you to directly convert signals without the need for downconverters. For this reason, the frequency spectrum is divided up into a number of zones.

Using the information presented in Table 1, it is possible to alias signals from one Nyquist band to another if the converter has a wide enough bandwidth.
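The zone arithmetic can be sketched as follows. The function names and the 100-MHz sample rate are assumptions for illustration; frequencies may be in any unit as long as both arguments use the same one.

```python
import math


def nyquist_zone(f_in: float, fs: float) -> int:
    """Zone 1 spans DC to 0.5 FS, zone 2 spans 0.5 FS to FS, and so on."""
    return int(f_in // (fs / 2)) + 1


def aliased_frequency(f_in: float, fs: float) -> float:
    """Frequency at which f_in appears in the first Nyquist zone after sampling."""
    f_mod = math.fmod(f_in, fs)
    return fs - f_mod if f_mod > fs / 2 else f_mod


FS_MHZ = 100.0  # hypothetical sample rate
for f_mhz in (30.0, 70.0, 120.0, 170.0):
    zone = nyquist_zone(f_mhz, FS_MHZ)
    kind = "folds (spectrum inverted)" if zone % 2 == 0 else "direct"
    print(f"{f_mhz} MHz: zone {zone}, appears at "
          f"{aliased_frequency(f_mhz, FS_MHZ)} MHz, {kind}")
```

Signals in even-numbered zones fold, so their spectrum appears mirrored, while signals in odd-numbered zones translate directly.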

COMMUNICATION CHOICES

As with all external devices, ADCs and DACs come with several interfacing options, either parallel or serial. Typically, higher-speed devices will implement a parallel interface while slower ones will utilize a serial interface. However, your choice of application may drive you down a particular route. It is easier, for instance, to detect a stuck-at bit in a serial interface than it is in a parallel one. Really high-speed interfaces may provide multiple output buses (I and Q) or use double-data-rate (DDR) outputs; some devices might even offer both options. Having multiple buses or DDR outputs allows you to maintain the data rate while reducing the frequency of operation required for the interface. For example, an interface sampling at 600 MHz will produce an output at 300 MHz (half the sampling frequency).

It is much easier to recover the data if the clock frequency is 150 MHz (FS/4) and there are two data buses that provide samples from the device using DDR. This kind of ADC relaxes the input timing you must achieve. Many high-speed converters utilize LVDS signaling on their I/O, as the lower voltage swing and low current reduce the coupling that can occur with other signaling standards such as LVCMOS. This coupling problem can affect the converter’s mixed-signal performance.
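The relationship between sample rate, bus count and DDR can be sketched with a one-line calculation. The function and its parameters are illustrative assumptions; a real converter data sheet defines the actual output modes.

```python
def interface_clock_hz(sample_rate_hz: float, n_buses: int = 1,
                       ddr: bool = False) -> float:
    """Interface clock needed to carry every sample across the given buses."""
    samples_per_clock = n_buses * (2 if ddr else 1)
    return sample_rate_hz / samples_per_clock


FS = 600e6
print(interface_clock_hz(FS))                        # one bus, single data rate
print(interface_clock_hz(FS, n_buses=2))             # two buses
print(interface_clock_hz(FS, n_buses=2, ddr=True))   # two buses with DDR
```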

DAC FILTERING

Most DACs will hold the analog output until the next sampling period. The result is an interesting effect upon the output frequency domain. You will

highest peak, usually a harmonic of the fundamental. Normally the SFDR is given in terms of dBc and will degrade as the input signal power drops.

From these measurements of the converter, you can calculate the effective number of bits using the equation

ENOB = (SNR – 1.76) / 6.02
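As a worked example, the calculation simply inverts the theoretical SNR relationship. The measured value below is a made-up figure used only to show the arithmetic.

```python
def enob(snr_db: float) -> float:
    """Effective number of bits implied by a measured SNR in dB."""
    return (snr_db - 1.76) / 6.02


measured_snr_db = 68.5  # hypothetical measurement
print(f"ENOB = {enob(measured_snr_db):.2f} bits")
```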

When performing this testing, take care to ensure that the FFT you’re using is correctly sized, so that you don’t inadvertently miscalculate the noise floor. An incorrect FFT size will throw off your calculations. The FFT noise floor is given by

FFT noise floor = 6.02N + 1.76 dB + 10 log10(FFT size / 2)
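The effect of FFT size on the noise floor can be sketched numerically. The function name and the example sizes are assumptions; the result is the distance in dB from a full-scale tone down to the average noise floor.

```python
import math


def fft_noise_floor_db(n_bits: int, fft_size: int) -> float:
    """Distance in dB from a full-scale tone down to the average FFT noise floor."""
    return 6.02 * n_bits + 1.76 + 10 * math.log10(fft_size / 2)


for size in (1024, 4096, 16384):
    print(f"12-bit converter, {size}-point FFT: "
          f"{fft_noise_floor_db(12, size):.1f} dB below full scale")
```

Each doubling of the FFT size pushes the floor down by about 3 dB, which is why an FFT of the wrong size gives a misleading picture of the converter's noise.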

You should perform these steps using a single-tone test (normally, a simple sine wave) to reduce the complexity of the output spectrum. To guarantee you will get the best results, be sure to take coherent samples of the output. Coherent sampling takes place when there is an integer number of cycles within the data window. For this to hold, the input frequency Fin must satisfy

Fin / FS = Ncycles / FFT size
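One way to pick a coherent test frequency is sketched below. Choosing a cycle count that shares no common factor with the FFT size is a common practice assumed here rather than something the article states, and the sample rate, FFT size and target frequency are hypothetical.

```python
from math import gcd


def coherent_input_frequency(fs_hz: float, fft_size: int, target_hz: float):
    """Return (cycle count, input frequency) near the target that fits the window exactly."""
    n_cycles = max(1, round(target_hz * fft_size / fs_hz))
    while gcd(n_cycles, fft_size) != 1:
        # Coprime counts avoid sampling the same points of the waveform repeatedly.
        n_cycles += 1
    return n_cycles, n_cycles * fs_hz / fft_size


cycles, f_in = coherent_input_frequency(100e6, 4096, 10e6)
print(f"{cycles} cycles in the window, Fin = {f_in:.4f} Hz")
```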

THE FREQUENCY SPECTRUM

On another front, you must also be aware of the Nyquist criterion when implementing your system, to ensure the signal is correctly converted or quantized. This means that you must sample at a rate of at least twice the maximum frequency of the signal of interest to ensure correct conversion. When signals are sampled outside of this criterion, aliasing will occur, potentially resulting in unwanted performance if it is not correctly understood.

It is also for this reason that ADCs require anti-aliasing filters to prevent signals or noise being aliased back into the quantized signal. However, aliasing can be very useful to the engineer,

Nyquist Zone   Range Lower   Range Upper   Aliasing
First          DC            0.5 FS        None
Second         0.5 FS        FS            Folds
Third          FS            1.5 FS        Direct
Fourth         1.5 FS        2 FS          Folds

Table 1 – Nyquist zones and aliasing

Tap   Coefficient
1     -6.22102953898351E-003
2     9.56204928971727E-003
3     -1.64864415228791E-002
4     3.45071042895427E-002
5     -0.107027889432584
6     1.166276
7     -0.107027889432584
8     3.45071042895427E-002
9     -1.64864415228791E-002
10    9.56204928971727E-003
11    -6.22102953898351E-003

Table 2 – The first 11 coefficients for a DAC compensation FIR filter

notice that both images exist across the output spectrum, and the output signal in all of the Nyquist zones exhibits a roll-off due to the sinc effect, being nearly 4 dB (3.92 dB) lower at 0.5 FS, as shown in Figure 1. The solution to both of these issues is to utilize filters.
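The 3.92-dB figure can be checked with a few lines of Python, assuming the usual zero-order-hold model in which the roll-off follows sin(πx)/(πx) with x = f/FS. The function name is illustrative.

```python
import math


def dac_rolloff_db(f_over_fs: float) -> float:
    """Zero-order-hold (sinc) attenuation in dB at a frequency given as a fraction of FS."""
    if f_over_fs == 0:
        return 0.0
    x = math.pi * f_over_fs
    return 20 * math.log10(abs(math.sin(x) / x))


for frac in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"{frac:.1f} FS: {dac_rolloff_db(frac):.2f} dB")
```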

You can implement the first filter digitally within the FPGA, before the DAC. This simple digital filter can correct for the roll-off. But if you use an anti-image filter, you will need to position it after the DAC output, since the images are a consequence of the reconstruction.

The sinc correction filter can be implemented very easily as a FIR filter. The simplest method to undertake in developing this filter is to plot the sinc roll-off using the equation

roll-off = sin(π f / FS) / (π f / FS)

Create the correction factor, which is the reciprocal of the values calculated for the roll-off, and then take an inverse Fourier transform to obtain the coefficients you will need to design the filter. Typically, you will be able to implement this filter with a few taps. Table 2 shows the first 11 coefficients for the filter, while Figure 2 shows the correction against the roll-off.
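The recipe can be sketched with a frequency-sampling design in plain Python. This is one possible reading of the method, assuming an odd-length, linear-phase filter whose gain is sampled at multiples of FS divided by the tap count; it is not expected to reproduce the Table 2 coefficients, which depend on design choices the article does not give.

```python
import cmath
import math


def sinc(x: float) -> float:
    """Normalized sinc, sin(pi x)/(pi x): the DAC hold roll-off at x = f/FS."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def sinc_correction_taps(n_taps: int = 11):
    """Odd-length, linear-phase FIR from an inverse DFT of the reciprocal roll-off."""
    if n_taps % 2 == 0:
        raise ValueError("this sketch only handles an odd number of taps")
    mid = (n_taps - 1) // 2
    # Correction factor sampled at f = k * FS / n_taps, up to just below FS/2.
    gain = [1.0 / sinc(k / n_taps) for k in range(mid + 1)]
    taps = []
    for n in range(n_taps):
        acc = gain[0]
        for k in range(1, mid + 1):
            acc += 2.0 * gain[k] * math.cos(2 * math.pi * k * (n - mid) / n_taps)
        taps.append(acc / n_taps)
    return taps


def response(taps, f_over_fs: float) -> complex:
    """Frequency response of the FIR at a frequency given as a fraction of FS."""
    return sum(h * cmath.exp(-2j * math.pi * f_over_fs * n)
               for n, h in enumerate(taps))


taps = sinc_correction_taps(11)
for k in range(6):
    f = k / 11
    print(f"f = {f:.3f} FS: filter x roll-off = {abs(response(taps, f)) * sinc(f):.6f}")
```

At the sampled frequencies the filter gain multiplied by the roll-off comes out flat; between those points the correction is only approximate, which is the usual trade-off of a short filter.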

IN-SYSTEM TEST

Many of these systems will require the converter to achieve specific performance characteristics for the end application, for example CDMA or GSM. Doing the testing necessary to confirm this performance can require a significant investment in test systems (arbitrary waveform generators, logic analyzers, pattern generators, spectrum analyzers and the like). However, the reprogrammable flexibility of the FPGA allows you to insert specific test programs into the device to either capture and analyze the output of an ADC or provide the stimulus for a DAC, reducing the need for additional test equipment.

CONVERSION 101

Because FPGAs are commonly called upon to interface with ADCs and DACs, it is crucial for any FPGA engineer to acquire at least a basic understanding of the parameters of importance in these devices. This is especially true when you are planning to have the FPGA test the performance of the converter, using the reprogrammable flexibility of the FPGA, as part of the design-proving and commissioning process.

Figure 1 – DAC roll-off between 0 and FS (x-axis: sampling frequency in units of FS, 0 to 1.2; y-axis: gain in dB, 0 to -45)

Figure 2 – DAC roll-off and compensation filter to FS/2 (x-axis: sampling frequency in units of FS, 0 to 0.6; y-axis: gain, 0 to 1.8; curves: DAC roll-off and DAC compensation filter)

The Sky’s the Limit for SSD Enterprise Storage Startup Skyera

Serial entrepreneur and noted chip architect Radoslav Danilak’s latest innovation leverages Spartan-6 FPGAs.

PROFILES IN XCELLENCE

By Mike Santarini
Publisher, Xcell Journal
Xilinx, Inc.

With a rich history of creating bleeding-edge IC designs for microprocessors, graphics processors and ASICs, Radoslav Danilak and his team at startup Skyera are now set to introduce a product that promises to ignite the enterprise market for solid-state storage systems. Creating innovative architectures isn’t new for Danilak, but in this design the processing heart of the Skyera solid-state drive (SSD) storage system isn’t implemented in an SoC or standalone processor; it’s implemented in a Xilinx® Spartan®-6 FPGA.

Over the last 15 years, the data storage sector has gone through a remarkable renaissance thanks to the advent of flash memory-based solid-state storage systems. For well over a decade, the slowest link in most compute-intensive systems has been a hard drive’s read-and-write speed. However, in the early 2000s, memory vendors started using NAND flash memory as a faster, lower-power alternative to mechanical disks. NAND products found their first big uses in mobile phones and handset devices such as Apple’s iPod. Soon afterward, higher-capacity configurations found their way into laptop computers, mainly for power-savings reasons, and into desktop computers for performance.

The price of SSD systems is typically much higher than that of traditional mechanical hard drives. However, over the last five years, as the capacity of NAND flash has ballooned, the price per byte of NAND has declined markedly, making it inevitable that SSD will sooner rather than later push mechanical hard drives into obsolescence even in the enterprise storage market.

At SandForce, the last company Danilak founded, his team created a custom memory controller SoC that sits at the heart of many of today’s first-generation enterprise SSD systems. With his new company, Skyera, he’s now on the verge of introducing an enterprise system that the company claims will offer a 10x improvement in both performance and capacity at a price point in parity with slower and lower-capacity enterprise storage systems. Skyera will introduce the first commercial version of its product in the coming months, and so at the time of this interview Danilak was not at liberty to disclose its full feature set. However, he said that a key attribute of the product, beyond massive capacity and performance at a competitive price point, will be its ability to extend the life of the flash memory in the system.

The Achilles’ heel of NAND flash is that as process geometries continue to shrink, the memory cells in the NAND become physically smaller, and so does the charge they hold. Repeated programming and erasing degrade the ability of each cell to maintain a charge reliably. Danilak said that where a 43-nanometer single-level-cell (SLC) NAND could perform 100,000 writes before encountering an uncorrectable error, a 15-nm multilevel-cell (MLC) NAND will perform only 1,000 writes before encountering an uncorrectable error.

This phenomenon is often referred to as NAND “wear.” To counter wear as it gets worse with every process shrink, vendors of SSD systems must develop ever-more-sophisticated NAND controllers to perform wear leveling along with a number of proprietary techniques to get the most out of their systems. Danilak said that vendors have traditionally developed these controllers in ASICs, with every generation a bit more complex than the previous one to ensure better endurance, reliability and performance.

So what’s especially noteworthy about Skyera’s new product is that to get this revolutionary system to market quickly, Skyera is implementing its unique controller functions on a Xilinx Spartan-6. “The challenge was, not only did we have to increase performance and reliability 10x over competing designs, but we had to figure out how to fit the design into an FPGA that is 10x smaller than what an ASIC would give us,” said Danilak. “Having the right balance of capacity, performance and cost is essential for capturing market share in the SSD enterprise market today. We could have implemented the controller SoC for our system in an ASIC design, but that would have taken us 18 months to do and millions of dollars in development costs. If there was any kind of a problem in design or manufacturing, it would cost us millions more to pinpoint and then fix the problem.”

To get to market ASAP, Danilak’s group instead chose to implement the production design in a Spartan-6 FPGA. What’s more remarkable is that his team was able to get several critical blocks (for error correction and encryption) running at 250 MHz. “The Xilinx data sheet said that 250 MHz was the theoretical limit for the clock tree and so that was our target,” said Danilak. “At the same time, we needed to shrink the area of the design by 20 times to fit in an FPGA. We had to throw out conventional thinking and rethink the architecture, but we were able to do it.”

Danilak said that he helped come up with the unique architecture for the design, but that his skilled team under Rod Mullendore, chief architect at Skyera and previously the chief architect at flash controller company SandForce, gets the credit for implementing the architecture to squeeze the performance out of the design. Mullendore said the team paid careful attention to the coding of the RTL in Verilog and performed a little, but not excessive, manual layout to achieve the desired performance. “We were very careful in how we structured the RTL, and we did a certain amount of placement to minimize long lines and keep high-speed blocks grouped close together,” said Mullendore. “We also limited the levels of logic. With an FPGA we can use an iterative process where we program the design to find long paths and bottlenecks, and fix them.”

Danilak has a long and storied career as an IC architect that started when he was a PhD student and a teacher at the Technical University of Kosice in the Slovak Republic, where he studied math and developed database architectures. From there he went on to develop processor-centric architectures at his first startup, DanSoft (advanced VLIW). Then came stints at Gizmo Technologies (64-bit, x86 architecture), Toshiba (memory and PlayStation II group), Nishan Systems (network processors), NVIDIA (nForce4 chip sets and Fermi HPC) and his prior startup SandForce (memory controllers).

“I went from a PhD in software database systems and mathematics to microprocessors, to chip sets, to graphics processors, and then to flash controllers and now, even an FPGA architecture for a company still in stealth mode … always something new,” said Danilak. “At every step in my career, I’ve learned something new about technology and about the business of technology.”

Indeed, one of those “business of technology” lessons involves finding the best scenario for funding in which the company is either bootstrapped or the number of backers is minimized so the startup can focus its efforts on fielding innovative technology rather than on placating investors.

“Funding can be very complex and if you need funding for a custom chip, you need a lot of money,” said Danilak. “Implementing this on an FPGA meant we didn’t get hit with a huge NRE and we could show to backers the entire system working, rather than have them tapping their watches waiting for the silicon to come back. That’s of course in addition to getting the product to customers faster.”

Danilak said that the plan of record is to implement this and Skyera’s next-generation system on Xilinx FPGAs and use the FPGAs in production, not just prototyping. “I’ve used FPGAs for prototyping in the past but today their performance, capacity, power consumption and cost mean we don’t have to go to an ASIC or ASSP,” said Danilak. “That’s not to say we may not do an ASIC down the road, but it’s great that we don’t have to do one now or in the near future.”

For more information on Skyera, visit the company’s site at http://skyera.com/.

Get Published

Would you like to be published in Xcell Publications? It's easier than you think! Submit an article draft for our Web-based or printed Xcell Publications and we will assign an editor and a graphic artist to work with you to make your work look as good as possible.

For more information on this exciting and highly rewarding program, please contact:

Mike Santarini
Publisher, Xcell Publications

TOOLS OF XCELLENCE

Xilinx FPGAs Improve Beamforming System Design

Virtex-6 FPGA-based board with native and custom IP is foundation for eight-channel demo system.

by Rodger H. Hosking
Vice President and Co-founder
Pentek, Inc.

Beamforming is a signal-processing technique that utilizes an array of sensors to achieve directionality, increase the strength of transmitted signals and improve the quality of received signals. Communications, radar, countermeasures, weapons systems, oil and mineral exploration, medical imaging and direction finding make extensive use of beamforming.

In direction finding, we steer the beamformed antenna to locate the arrival angle of a signal source. We can use two or more arrays to triangulate the exact location of the source, which is essential for many signal intelligence and counterterrorism efforts. The accuracy of this technique depends on the exact settings of gain and phase among the beamforming channels. We used a Pentek product built around a Xilinx® Virtex®-6 FPGA with native and custom IP to achieve these fine adjustments that improve system performance and accuracy.

PRINCIPLES OF BEAMFORMING

We typically use beamforming with an array of sensors or antennas to improve receptivity in a specific direction, for example, from a single cell phone as shown in Figure 1. The signal from this source arrives at each antenna with a delay and attenuation that depend on the distance between the source and the antenna, so the antenna signals have relative phase and amplitude offsets.

The beamforming process adjusts the gain and phase of each antenna signal to compensate for the different delays in the signal paths. The adjustments align, across all of the antennas, the signals arriving from one particular direction. When the signals are summed together, the nonaligned signals arriving from other directions tend to cancel each other, while the signals from the beamformed direction add constructively for greatly improved signal-to-noise ratio. In this way, by electronically adjusting the gain and phase in each path, we effectively steer the antenna toward the direction of the signal source.
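The align-and-sum principle can be sketched for a narrowband signal in plain Python. The uniform linear array, the half-wavelength element spacing and all function names are assumptions made for illustration; they are not taken from the Pentek design.

```python
import cmath
import math


def element_signals(n_elements, arrival_deg, spacing_wavelengths=0.5):
    """Unit-amplitude phasors seen by each element of a uniform linear array."""
    phase_step = 2 * math.pi * spacing_wavelengths * math.sin(math.radians(arrival_deg))
    return [cmath.exp(1j * phase_step * n) for n in range(n_elements)]


def steering_weights(n_elements, steer_deg, spacing_wavelengths=0.5, gains=None):
    """Per-channel gain and phase adjustments that align signals from steer_deg."""
    gains = gains or [1.0] * n_elements
    phase_step = 2 * math.pi * spacing_wavelengths * math.sin(math.radians(steer_deg))
    return [g * cmath.exp(-1j * phase_step * n) for n, g in enumerate(gains)]


def beamform(signals, weights):
    """Weighted sum of the channel signals."""
    return sum(w * x for w, x in zip(weights, signals))


weights = steering_weights(8, steer_deg=20.0)
on_target = abs(beamform(element_signals(8, 20.0), weights))
off_target = abs(beamform(element_signals(8, -40.0), weights))
print(f"steered direction: {on_target:.2f}, other direction: {off_target:.2f}")
```

A source in the steered direction adds coherently across all eight channels, while a source elsewhere sums to a much smaller value.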

EIGHT-CHANNEL SYSTEM

In this system, we arranged eight antennas in a linear array, as shown in the overall block diagram of Figure 2. The antenna frequency here is 2.5 GHz, so each antenna signal needs to be amplified, filtered and then downconverted to an intermediate frequency (IF) so that an A/D converter can digitize it at a reasonable sampling rate. It is mandatory to use synchronous sampling across all eight channels in order to preserve a fixed phase relationship for beamforming.

We then downconvert samples from each A/D to complex I+Q baseband signals in a digital downconverter (DDC), which also includes channel-specific phase and gain adjustments for the beamforming “weights.” Finally, we add together all eight baseband signals in summation blocks to produce the beamformed sum signal. A CPU analyzes this sum signal and makes adjustments to the phase and gain coefficients to track or adapt to new targets.

Figure 1 – Typical cell phone beamforming system (four antenna paths, each with a gain adjust, G1 to G4, and a phase adjust, P1 to P4)

PENTEK MODEL 53661 BEAMFORMING BOARD

The Pentek Model 53661 software radio board is a 3U OpenVPX Cobalt board, shown in the simplified block diagram of Figure 3. It features four 200-MHz, 16-bit A/D converters; a timing, clock and synchronization section; and a Xilinx Virtex-6 FPGA.

The FPGA has access to all of the board’s data and control paths, enabling factory-installed functions such as data multiplexing, channel selection, data packing, gating, triggering and memory control. The Cobalt architecture organizes the FPGA as a container for data-processing applications where each function exists as an intellectual property (IP) module.

We can use a variety of different FPGAs to match the specific requirements of the processing task. Supported FPGAs include the LX240T, LX365T, SX315T and SX475T. The SXT parts feature up to 2,016 DSP48E slices and are ideal for modulation/demodulation, encoding/decoding, encryption/decryption and channelization of the signals between transmission and reception.

Factory-installed in the FPGA are four DDC IP cores, each capable of accepting A/D samples from any of the four A/Ds. Each DDC has a decimation range of 2 to 64k and can deliver downconverted baseband bandwidths from 2.5 kHz to 80 MHz. Each DDC has programmable gain and phase shift controls accessible to the processor across the VPX backplane. In this system we will be assigning one A/D to each DDC.

A power meter at the output of each DDC calculates the downconverted signal power. Each power meter is equipped with a threshold detector that generates a system interrupt if the output power exceeds the upper threshold or falls below the lower threshold. These features simplify gain calibration and signal-monitoring tasks that the system processor would otherwise have to do in software.

The 53661 FPGA also includes a native Aurora summation block that adds the four DDC outputs together to perform the channel combining for beamforming. Aurora is a lightweight link-layer gigabit serial protocol for Xilinx FPGAs. In this board, the Aurora interface accepts a propagated sum on one input port with four serial links (4X), and delivers the new propagated sum on a 4X output port, including the contributions from the four onboard channels. Operating at a bit rate of 3.125 Gbits/s per lane, each 4X link can transfer data at 1.25 Gbytes/s.

Figure 2 – Eight-channel beamforming system block diagram (each of the eight channels comprises an RF tuner, IF, A/D and DDC with gain and phase adjustment; a CPU provides frequency, gain and phase control)

A native PCIe® x4 interface IP operating at a 2.5-Gbit/s serial clock rate provides a 1-Gbyte/s link to the control processor for programming the DDC and the beamforming parameters. This PCIe link also supports delivery of the four DDC outputs as well as the beamforming summation output.
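The quoted link rates follow from simple arithmetic if the serial lanes use 8b/10b line coding, in which every payload byte occupies 10 line bits. That coding is an assumption here (the article does not name it), though it is consistent with both figures in the text.

```python
def payload_rate_bytes_per_s(lanes: int, line_rate_bps: float,
                             coding_efficiency: float = 0.8) -> float:
    """Payload throughput of a multi-lane serial link (0.8 models 8b/10b coding)."""
    return lanes * line_rate_bps * coding_efficiency / 8


aurora = payload_rate_bytes_per_s(4, 3.125e9)   # 4X Aurora link
pcie = payload_rate_bytes_per_s(4, 2.5e9)       # x4 PCIe link
print(f"Aurora 4X: {aurora / 1e9:.2f} Gbytes/s, PCIe x4: {pcie / 1e9:.2f} Gbytes/s")
```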

A programmable gigabit serial crossbar switch connects the two 4X Aurora summation links and the x4 PCIe link to the VPX P1 backplane connector. The flexibility of this crossbar switch allows the 53661 to operate in a variety of OpenVPX backplane topologies and slot profiles. In this system, we map the Aurora links onto the OpenVPX expansion plane; we likewise map the PCIe interface onto the OpenVPX data plane, which also assumes the role of control plane.

EIGHT-CHANNEL 3U OPENVPX BEAMFORMING SYSTEM

The complete eight-channel OpenVPX beamforming system is shown in Figure 4. Two Model 53661 boards are installed in slots 1 and 2 of an OpenVPX backplane, along with a CPU board in slot 3. Eight dipole antennas designed for receiving 2.5-GHz signals feed RF tuners containing low-noise amplifiers, local oscillators and mixers. The RF tuners translate the 2.5-GHz antenna frequency signal down to an IF of 50 MHz.

The 200-MHz 16-bit A/D converters digitize the IF signals, and the DDCs perform further frequency downconversion to baseband with a decimation of 128. This provides I+Q complex output samples with a bandwidth of about 1.25 MHz. Phase and gain coefficients for each channel are applied to steer the array for directionality.
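The bandwidth figure can be reproduced with a short calculation. The 80 percent usable fraction of the output sample rate is an assumption inferred from the numbers quoted in the article, not a documented property of the DDC core.

```python
def ddc_output(sample_rate_hz: float, decimation: int,
               usable_fraction: float = 0.8):
    """Return (complex output sample rate, usable bandwidth) of a decimating DDC."""
    output_rate = sample_rate_hz / decimation
    return output_rate, output_rate * usable_fraction


rate, bandwidth = ddc_output(200e6, 128)
print(f"output rate {rate / 1e6:.4f} MHz, usable bandwidth {bandwidth / 1e6:.2f} MHz")
```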

The CPU board in VPX slot 3 sends commands and coefficients across the backplane over two x4 PCIe links, or OpenVPX “fat pipes.”

We process the first four signal channels in the upper left portion of the 53661 board in VPX slot 1, where the four-channel beamformed sum is propagated through the 4X Aurora sum-out link across the backplane to the 4X Aurora sum-in port of the second 53661 in slot 2. We then add the four-channel local summation from the second 53661 to the propagated sum from the first board to form the complete eight-channel sum. This final sum is sent across the x4 PCIe link to the CPU card in slot 3.

Assignment of the three OpenVPX 4X links (OpenVPX fat pipes) on the Model 53661 boards is simplified through the use of the crossbar switch shown in the previous block diagram. This allows the 53661 to operate with a wide variety of different backplanes. Because OpenVPX does not restrict the use of serial protocols across the backplane links, the system supports mixed-protocol architectures like the one shown.

Figure 3 – Pentek Cobalt 53661 OpenVPX beamforming software radio board with a Xilinx Virtex-6 FPGA (four 200-MHz 16-bit A/Ds feed DDC 1 to DDC 4 with gain and phase control inside the FPGA, which also holds the Aurora beamform summation block and the PCIe x4 interface; a crossbar switch routes the Aurora 4X sum-in, Aurora 4X sum-out and x4 PCIe links to the expansion-plane and data-plane fat pipes)

BEAMFORMING DEMO SYSTEM

Pentek engineers have set up an eight-channel beamforming demo system equipped with a control panel that runs under Windows on the CPU board. An automatic signal scanner detects the strongest signal frequency arriving from a test transmitter. This frequency is centered around the 50-MHz IF of the RF downconverter. Once the frequency is identified, the eight DDCs are tuned accordingly to bring that signal down to 0 Hz for summation. The control panel software also allows specific hardware settings for all of the parameters for the eight channels, including gain, phase and sync delay.
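Bringing a signal down to 0 Hz amounts to multiplying the samples by a complex local oscillator at the tuned frequency and then low-pass filtering. The sketch below uses the article's 200-MHz sample rate and 50-MHz IF, but the function name, the test tone and the use of a plain average in place of the DDC's decimating filter are illustrative assumptions.

```python
import cmath
import math

FS = 200e6    # A/D sample rate from the article
F_IF = 50e6   # IF from the article


def mix_to_baseband(samples, f_tune_hz, fs_hz):
    """Multiply real samples by a complex local oscillator at -f_tune."""
    return [x * cmath.exp(-2j * math.pi * f_tune_hz * n / fs_hz)
            for n, x in enumerate(samples)]


n_samples = 1024
if_tone = [math.cos(2 * math.pi * F_IF * n / FS) for n in range(n_samples)]
baseband = mix_to_baseband(if_tone, F_IF, FS)
# Averaging stands in for the DDC's low-pass filter: it keeps the 0 Hz term
# and rejects the image that mixing creates at -2 * F_IF.
dc_term = sum(baseband) / n_samples
print(f"magnitude at 0 Hz: {abs(dc_term):.3f}")
```

A real tone of unit amplitude splits into two complex components of half amplitude, so only half of it lands at 0 Hz after mixing.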

An additional display shows the beamformed pattern of the array. This display is formed by adjusting the phase shift of each of the eight channels to provide maximum sensitivity across arrival angles from –90° to +90°, measured from the perpendicular to the plane of the array.

The theoretical seven-lobe pattern of an ideal eight-element array for a signal arriving at a 0° angle (directly in front of the array) is displayed for comparison with an actual plot. Below the lobe pattern is a polar plot showing a single vector pointing to the computed angle of arrival. This is derived from identifying the lobe with the maximum response.
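Finding the angle of arrival from the strongest response can be sketched as a scan over steering angles. This is an idealized model with an assumed uniform linear array at half-wavelength spacing and a single noise-free source; the function names and the 1° scan step are hypothetical.

```python
import cmath
import math


def array_response(n_elements, arrival_deg, steer_deg, spacing_wavelengths=0.5):
    """Magnitude of the beamformed sum for a source at arrival_deg when steered to steer_deg."""
    k = 2 * math.pi * spacing_wavelengths
    delta = k * (math.sin(math.radians(arrival_deg)) - math.sin(math.radians(steer_deg)))
    return abs(sum(cmath.exp(1j * delta * n) for n in range(n_elements)))


def estimate_arrival_deg(n_elements, true_arrival_deg, step_deg=1):
    """Scan steering angles from -90 to +90 degrees and return the one with peak response."""
    angles = range(-90, 91, step_deg)
    return max(angles, key=lambda a: array_response(n_elements, true_arrival_deg, a))


print(estimate_arrival_deg(8, 25))
```

In practice, reflections, cable length variations and antenna differences distort the measured pattern, so the peak is broader and less clean than in this model.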

An actual plot of a real-life transmitter is also shown for a source directly in front of the display. In this case, the perfect lobe pattern is affected by physical objects, reflections, cable length variations and minor differences in the antennas. Nevertheless, the directional information is computed quite well. As the signal source moves left and right in front of the array, the peak lobe moves with it, changing the computed angle of arrival.

This demo system is available online at Pentek. If you’d like to view a live demonstration, please let us know of your interest by visiting http://pentek.com/go/xcellbf.

Figure 4 – Eight-channel OpenVPX demo beamforming system utilizes two Pentek Cobalt 53661 beamforming boards. (Each board has four RF tuners, four 200-MHz 16-bit A/Ds and a Xilinx Virtex-6 FPGA with four DDCs, Aurora beamform summation and a PCIe x4 interface; the VPX backplane carries one 4X Aurora link between the boards and an x4 PCIe link from each board to the OpenVPX CPU module.)

WHAT IS THE VIVADO DESIGN SUITE?

It’s all about improving designer productivity. This entirely new tool suite was architected to increase your overall productivity in designing, integrating and implementing with the 28-nanometer family of Xilinx All Programmable devices. With 28-nm manufacturing, Xilinx devices are now much larger and come with a variety of new technologies including stacked-silicon interconnect, high-speed I/O interfaces operating at up to 28 Gbps, hardened microprocessors and peripherals, and Agile Mixed Signal. With these larger and more complex devices, developers are faced with multidimensional design challenges that can prevent them from hitting market windows and increasing productivity.

The Vivado Design Suite is a complete replacement for the existing Xilinx ISE Design Suite of tools. It replaces all of the ISE Design Suite point tools such as Project Navigator, XST, implementation, CORE Generator™, Timing Constraints Editor, ISim, ChipScope™, Xilinx Power Analyzer (XPA), FPGA Editor, PlanAhead™ and SmartXplorer, among others. All of those capabilities are now built directly into the Vivado Integrated Design Environment (IDE), leveraging a shared scalable data model.

With the Vivado Design Suite, developers are able to accelerate design creation with high-level synthesis and implementation by using place-and-route to analytically optimize for multiple and concurrent design metrics, such as timing, congestion, total wire length, utilization and power. Built on Vivado’s shared scalable data model, the entire design process can be executed in memory without the need to write or translate any intermediate file formats, accelerating runtimes, debug and implementation while reducing memory requirements.

XTRA, XTRA

Xilinx Tool & IP Updates

The Vivado™ Design Suite 2012.2 is now available, at no additional cost, to all Xilinx® ISE® Design Suite customers that are currently in warranty. The Vivado Design Suite provides a highly integrated design environment with a completely new generation of system-to-IC-level features, including high-level synthesis, analytical place-and-route and an advanced timing engine. These tools enable developers to increase design integration and implementation productivity.

Vivado provides users with upfrontmetrics that allow for design andtool-setting modifications earlier inthe design process, when they haveless overall impact on the schedule.This capability reduces design itera-tions and accelerates productivity.

Users can manage the entiredesign process in a pushbutton man-ner by using the Flow Navigator fea-ture in the Vivado IDE, or control itmanually by using Tcl scripting.

SHOULD I CONTINUE TO USETHE ISE DESIGN SUITE ORMOVE TO VIVADO? The ISE Design Suite is an industry-proven solution for all generations ofXilinx’s All Programmable devices.The Xilinx ISE Design Suite continuesto bring innovations to a broad baseof developers, and extends the famil-iar design flow for 7 series and XilinxZynq™-7000 Extensible ProcessingPlatform (EPP) projects. ISE 14.2,which brings new innovations andcontains updated device support, isavailable for immediate download.

The Vivado Design Suite 2012.2,Xilinx’s next-generation design envi-ronment, supports 7 series devicesincluding Virtex®-7, Kintex™-7 andArtix™-7 FPGAs. It offers enhancedtool performance, especially on largeor congested designs.

IS VIVADO DESIGN SUITE TRAINING AVAILABLE? Vivado is new and takes full advan-tage of industry standards such aspowerful interactive Tcl scripting,Synopsys Design Constraints,SystemVerilog and more. To reduceyour learning curve, Xilinx hasrolled out 10 new instructor-ledclasses that include information onhow to use the Vivado tools. Wealso encourage you to view theVivado Quick Take videos found atwww.xilinx.com/design-tools.

ARE THERE DIFFERENT EDITIONS OF THE VIVADO DESIGN SUITE?

The Vivado Design Suite is available in either the Design or System edition (see Table 1). In-warranty ISE Design Suite Logic and Embedded Edition customers will receive the new Vivado Design Edition. ISE Design Suite DSP and System Edition customers will receive the new Vivado System Edition. Vivado is not yet available for WebPACK™ users. Vivado WebPACK is currently planned for later this year.

For more information about Xilinx design tools for the next generation of All Programmable devices, please visit www.xilinx.com/design-tools.

Xilinx recommends that customers starting a “new” design on a Kintex K410 or larger device contact their local FAE to determine if Vivado is right for the design. Xilinx does not recommend transitioning during the middle of a current ISE Design Suite project, as design constraints and scripts are not compatible between the environments.

For more information, please read the ISE 14.2 and Vivado 2012.2 release notes.

WHAT ARE THE LICENSING TERMS FOR VIVADO?

There is no additional cost for the Vivado Design Suite during the remainder of 2012. A single download at the Xilinx download center contains both ISE Design Suite 14.2 and Vivado 2012.2. All current, in-warranty seats of ISE Design Suite are entitled to a copy of the Vivado Design Suite beginning with the 2012.2 release.

For customers who generated an ISE Design Suite license for versions 13 or 14 after Feb. 2, 2012, your current license will also work for Vivado. Customers who are still in warranty but who generated licenses prior to February 2 will need to regenerate their licenses in order to use Vivado. For license generation, please visit www.xilinx.com/getlicense.


VIVADO DESIGN SUITE EDITIONS

Columns: Pillars of Productivity | Features | WebPACK (Device Limited) | Design Edition | System Edition

Pillars of Productivity: IP Integration and Implementation; Verification and Debug; Design Exploration and IP Generation

Features: Integrated Design Environment; Software Development Kit (SDK); Vivado Simulator (Limited); Vivado Logic Analyzer; Vivado Serial I/O Analyzer; Vivado High-Level Synthesis; System Generator for DSP

Table 1 – Vivado Design Suite editions; WebPACK support will roll out later in the year.


Xpress Yourself in Our Caption Contest

We hope this issue’s challenge won’t find you at a loss for words. If you have a yen to Xercise your funny bone, we invite you to step up and submit an engineering- or technology-related caption for this cartoon showing a technical presentation by a mime. The image might inspire a caption like “The demand to present his system diagram before the board of directors left Edgar speechless.”

Send your entries to [email protected]. Include your name, job title, company affiliation and location, and indicate that you have read the contest rules at www.xilinx.com/xcellcontest. After due deliberation, we will print the submissions we like the best in the next issue of Xcell Journal. The winner and two runners-up will each receive an Avnet Spartan®-6 LX9 MicroBoard, an entry-level development environment for evaluating the Xilinx® Spartan®-6 family of FPGAs (http://www.xilinx.com/products/boards-and-kits/AES-S6MB-LX9.htm).

The deadline for submitting entries is 5:00 pm Pacific Time (PT) on Oct. 1, 2012. So, pull on your beret and get writing!

GREG ZAKER, senior hardware design engineer at ViaSat Inc., won a shiny new Spartan-6 evaluation board with this caption for the scene of the hypnotic image on the computer monitor from Issue 79 of Xcell Journal:

“No, I said let’s look at the Virtex code, not vortex.”

Congratulations as well to our two runners-up. Like our winner, they will each receive an Avnet Spartan-6 LX9 MicroBoard.

“... and this is Bob. He’s taking our online fast-track management course ...”

– Richard M. Myers, VP, senior staff development engineer, AISD Inc.

“Attempting an innovative computer penetration technique, John became a victim of his own hack.”

– Tom Drake, electrical engineer, Harris

DANIEL GUIDERA

XCLAMATIONS!

NO PURCHASE NECESSARY. You must be 18 or older and a resident of the fifty United States, the District of Columbia, or Canada (excluding Quebec) to enter. Entries must be entirely original and must be received by 5:00 pm Pacific Time (PT) on October 1, 2012. Official rules available online at www.xilinx.com/xcellcontest. Sponsored by Xilinx, Inc. 2100 Logic Drive, San Jose, CA 95124.


No Room for Error

Synopsys FPGA Design Tools are an Essential Element for Today’s High Reliability Designs

There is no room for error in today’s communications infrastructure systems, medical and industrial applications, and automotive and military/aerospace designs. These applications require highly reliable operation and high availability in the face of radiation-induced errors that could result in a satellite failing to broadcast, a telecom router shutting down or an automobile failing to respond to a command.

To learn more about Synopsys FPGA Design tools visit www.synopsys.com/fpga

Hi-Rel Checklist

- Built-in redundancy
- Safety critical design
- Traceability and equivalence checks
- Reproducible, documented design process
- Power reduction
- DO-254 compliance


www.xilinx.com/designtools

Vivado™ Design Suite. For the Next Decade of All-Programmable Devices.

You innovate.

We attack the major design bottlenecks.

Vivado tools not only speed the design of programmable logic and I/O, but accelerate programmable systems integration and implementation into devices incorporating 3D stacked silicon interconnect technology, ARM® processing systems, Analog Mixed Signal (AMS), and a significant percentage of intellectual-property (IP) cores.

Your design productivity just got multiplied.

Productivity. Multiplied.


ALL PROGRAMMABLE

© Copyright 2012 Xilinx, Inc. XILINX, the Xilinx logo, Vivado, and other designated brands included herein are trademarks of Xilinx in the United States and other countries. ARM is the registered trademark of ARM Limited in the EU and other countries. All other trademarks are the property of their respective owners.

PN 2521