xcell82

68
www.xilinx.com/xcell/ SOLUTIONS FOR A PROGRAMMABLE WORLD Xcell journal Xcell journal ISSUE 82, FIRST QUARTER 2013 Implementing Analog Functions in Rugged, Rad-Hard FPGAs Software-Programmable Digital Predistortion on the Zynq SoC Nuts and Bolts of Designing an FPGA into Your Hardware Do You Have the Latest ISE, Vivado and IP Updates? Getting Your Zynq SoC Design Up and Running: A Hands-on Tutorial How a MicroBlaze Can Peaceably Coexist with the Zynq SoC page 30

Upload: mayam-ayo

Post on 13-Apr-2015

61 views

Category:

Documents


1 download

DESCRIPTION

silinx monthly magazine

TRANSCRIPT

Page 1: Xcell82

www.xilinx.com/xcell/

S O L U T I O N S F O R A P R O G R A M M A B L E W O R L D

Xcell journalXcell journalI S SUE 82 , F I RST QUAR TER 2013

Implementing Analog Functions in Rugged, Rad-Hard FPGAs

Software-Programmable DigitalPredistortion on the Zynq SoC

Nuts and Bolts of Designing an FPGA into Your Hardware

Do You Have the Latest ISE, Vivado and IP Updates?

Getting Your Zynq SoC Design Up and Running: A Hands-on Tutorial

How a MicroBlaze Can Peaceably Coexist with the Zynq SoC

page30

Page 2: Xcell82

Avnet Electronics Marketing presents a new series of video-based SpeedWay

Design Workshops™, featuring the new Xilinx Zynq™-7000 All Programmable SoC

Architecture. This workshop series includes three online courses progressing

from introductory, through running a Linux operating system on the Zynq-7000

SoC, and finally to the integration of the high-speed analog signal chain with

digital signal processing from RF to baseband for wireless communications.

© Avnet, Inc. 2013. All rights reserved. AVNET is a registered trademark of Avnet, Inc. Xilinx® and Zynq™ are trademarks or registered trademarks of Xilinx®, Inc.

www.zedboard.org/trainings-and-videos

www.zedboard.org

Page 4: Xcell82

L E T T E R F R O M T H E P U B L I S H E R

Xilinx, Inc.2100 Logic DriveSan Jose, CA 95124-3400Phone: 408-559-7778FAX: 408-879-4780www.xilinx.com/xcell/

© 2013 Xilinx, Inc. All rights reserved. XILINX, the Xilinx Logo, and other designated brands includedherein are trademarks of Xilinx, Inc. All other trade-marks are the property of their respective owners.

The articles, information, and other materials includedin this issue are provided solely for the convenience ofour readers. Xilinx makes no warranties, express,implied, statutory, or otherwise, and accepts no liabilitywith respect to any such articles, information, or othermaterials or their use, and any use thereof is solely atthe risk of the user. Any person or entity using suchinformation in any way releases and waives any claim itmight have against Xilinx for any loss, damage, orexpense caused thereby.

PUBLISHER Mike [email protected]

EDITOR Jacqueline Damian

ART DIRECTOR Scott Blair

DESIGN/PRODUCTION Teie, Gelwicks & Associates1-800-493-5551

ADVERTISING SALES Dan [email protected]

INTERNATIONAL Melissa Zhang, Asia [email protected]

Christelle Moraga, Europe/Middle East/[email protected]

Tomoko Suto, [email protected]

REPRINT ORDERS 1-800-493-5551

Xcell journal

www.xilinx.com/xcell/

Getting Real with the Zynq-7000 All Programmable SoC

I f you have been a regular reader of Xcell Journal over the last five years, you’ve fol-lowed the development and deployment of Xilinx’s 28-nanometer line of AllProgrammable devices and our next-generation software suite, Vivado™. This mas-

sive effort, which was kicked off by CEO Moshe Gavrielov in 2008 and executed by thehardworking folks here at Xilinx, is now starting to pay dividends.

As you’ve read in past issues, back in 2008 Xilinx’s management decided to make sever-al breakout moves starting with the 28-nm node. Instead of just producing FPGAs withtwice the capacity and faster and more transceivers, the company expanded its portfolioby adding the Zynq™-7000 All Programmable SoC, while also pioneering homogeneous andheterogeneous 3D ICs. Xilinx was the first to commercially release such 3D parts, a gener-ation ahead of the competition.

Since the effort began, we here at Xilinx have been very excited about the prospects ofthe 28-nm lineup, especially the Zynq-7000 SoC. The device pairs a 28-nm ARM® dual-coreCortex™-A9 MPCore™ that runs at up to 1 GHz with programmable logic—all on the samechip. As someone who has covered the IC design space for many years, after hearing aboutthe 28-nm architecture plans, I couldn’t help but think the Zynq-7000 was a stroke of semi-conductor business genius—truly the right device at the right time.

Today we are seeing how all this hard work is beginning to pay off in the real world. TheZynq-7000 All Programmable SoC (previously called the Extensible Processing Platform) hasreceived top honors from several of the top trade publications and engineering organizations.It was named the SoC Product of the year in 2011 by CMP Media (now called UBM, the pub-lisher of EE Times and EDN), and won the IET Innovation Award as well as the EmbeddedSystem Product of the Year Award from Elektra (the European Electronics Industry) thatsame year. In 2012, Electronic Products Magazine named the Zynq-7000 its Product of theYear and The Microprocessor Report gave the device its Analyst Choice Award.

But by far the most rewarding part of the rollout has been seeing how excited you—Xilinx customers—are about the Zynq All Programmable SoCs and how today you areactively designing with them. In this issue of Xcell Journal, you will read several articlesabout how to implement Zynq-7000 designs, starting with the cover story, a hands-on tuto-rial by EADS Astrium principal engineer Adam Taylor.

In addition to these feature articles, there are a growing number of other informativeresources in which your colleagues are actively sharing their Zynq-7000 All ProgrammableSoC design methods and experiences. Two of these resources are Avnet’s ZedBoard.org(www.zedboard.org) and UBM’s All Programmable Planet (www.programmableplanet.org). I encourage you to not only read and learn from these sources, but to actively participate inthe discussions at both sites. And of course, Xilinx’s official training group (http://www.xilinx.com/training/index.htm) and Xilinx User Community Forums(http://forums.xilinx.com/) are fabulous resources as well.

If you don’t have a Zynq-7000 yet, contact Avnet to get a ZedBoard (http://

www.em.avnet.com/en-us/design/drc/Pages/Zedboard.aspx). And if you feel lucky,(and live in North America, excluding Quebec), you can try your hand at writing a wittycaption for our caption contest on page 66. The person who writes the funniest caption,according to our judges and in compliance with the contest rules, will win a newZedBoard, and his or her caption will run in the next issue of Xcell Journal.

Get interactive, get designing and have fun innovating. Oh, and if you feel like writingabout your experiences for an upcoming Xcell Journal, please feel free to contact me.

Mike SantariniPublisher

Page 5: Xcell82

Interested in adding “published author” to your resume and achieving a greater level of credibilityand recognition in your peer community? Consider submitting an article for global publication inthe highly respected, award-winning Xcell Journal.

The Xcell team regularly guides new and experienced authors from idea development to publishedarticle with an editorial process that includes planning, copy editing, graphics development, andpage layout. Our author guidelines and article template provide ample direction to keep you ontrack and focused on how best to present your chosen topic, be it a new product,research breakthrough, or inventive solution to a common design challenge.

Our design staff can even help you turn artistic concepts into effective graphics orredraw graphics that need a professional polish.

We ensure the highest standards of technical accuracy by communicating withyou throughout the editorial process and allowing you to review your article tomake any necessary tweaks.

Submit final draft articles for publication and we will assign an editor and a graphics artist to work with you to make your papers clear, professional, and effective.

For more information on this exciting and highly rewarding opportunity, please contact:

Mike SantariniPublisher, Xcell [email protected]

Would you like to write for Xcell Publications? It’s easier than you think.

Get Published

www.xilinx.com/xcell/

Page 6: Xcell82

C O N T E N T S

VIEWPOINTS XCELLENCE BY DESIGN APPLICATION FEATURES

Cover StoryGetting Your Zynq SoC Design Up and Running Using PlanAhead 88

Xcellence in Aerospace & Defense

Implementing Analog Functions in Rugged, Rad-Hard FPGAs… 18

Xcellence in Wireless Communications

Software-Programmable Digital Predistortion on the Zynq SoC… 22

Letter From the PublisherGetting Real with the Zynq-7000

All Programmable SoC… 4

Xpert OpinionDesigning to Forestall Failure… 14

1818

2222

Page 7: Xcell82

F I R S T Q U A R T E R 2 0 1 3 , I S S U E 8 2

Xperts Corner

How a MicroBlaze Can Peaceably Coexist with the Zynq SoC… 30

Xplanation: FPGA 101

Debugging High-Speed Memories… 36

Xplanation: FPGA 101

Nuts and Bolts of Designing an FPGA into Your Hardware… 42

THE XILINX XPERIENCE FEATURES

Excellence in Magazine & Journal Writing2010, 2011

Excellence in Magazine & Journal Design and Layout2010, 2011, 2012

Tools of Xcellence Using the Parallel FFT for Multigigahertz

FPGA Signal Processing… 50

OmniTek, Xilinx Roll Zynq-7000

Video Development Kit… 56

Xamples... A mix of new

and popular application notes… 60

Xtra, Xtra The latest Xilinx tool updates

and patches… 64

Xclamations! Share your wit and wisdom by supplying

a caption for our wild and wacky artwork… 66

XTRA READING

4242

5050

3030

Page 8: Xcell82

COVER STORY

8 Xcell Journal First Quarter 2013

Xilinx’s All Programmable SoC offers many advantages to the designer, and while development may seem complicated at first, in fact it is not.

Getting Your Zynq SoC Design Up and Running Using PlanAhead

Page 9: Xcell82

First Quarter 2013 Xcell Journal 9

C O V E R S T O R Y

The Zynq™-7000 All Programmable SoC isthe first of a new class of Xilinx® devicesthat marry a dual-core ARM® Cortex™-A9processor with programmable logic on a sin-gle chip (see cover story, Xcell Journal issue75). As such, the device offers a great leapforward in not only system flexibility, butalso performance and integration. This sys-tem-on-chip platform does, however, requirethe FPGA engineer to consider a slightly dif-ferent development path than is customaryfor logic-based FPGAs.

The good news is that development is notas difficult as you might think, thanks inlarge part to the availability of the XilinxPlanAhead™ tool. Let’s take a closer look atthe steps involved in generating a Zynq-7000system that you can load via JTAG.

LOTS OF FPGA RESOURCESThe Zynq-7000 family combines the dual-core ARM Cortex-A9 processor—the chip’sprocessing system (PS)—with programma-ble logic (PL) resources equivalent todevices from the Xilinx Artix™ or Kintex™families. Depending upon the device andspeed grade you select, it is possible for theprocessing system to operate at frequenciesup to 1 GHz.

The processing system side of the ZynqSoC provides supporting hardware for anumber of memory and communicationsperipherals. This architecture allows thePS to be the master and to operateautonomously, without the need to evenpower up or configure the programmable

Tby Adam TaylorPrincipal EngineerEADS [email protected]

This is the first in a planned series of hands-on

Zynq-700 All Progammable SoC tutorials by

Adam Taylor. A frequent contributor to Xcell Journal,Adam wrote a second article in this issue on how to

design an FPGA into your hardware (see page 42).

He is also a blogger at All Progrmmable Planet

(http://www.programmableplanet.com/profile_content.asp?piddl_userid=450978). – Ed.

Page 10: Xcell82

logic system. The key features the pro-cessing system supports are:

• Dynamic RAM interface—double-data-rate SDRAM, supporting bothDDR2 and DDR 3 standards, includ-ing LPDDR2.

• Static RAM interfaces—support forSRAM devices along with NAND,NOR and QSPI flash memories.

• Communications interfaces—UART,CAN, I2C, SPI, USB, GigabitEthernet and SD/SDIO.

• General-purpose I/O—support for upto four 32-bit general-purpose I/Os.

With the exception of the DRAMlink, the remaining interfaces utilizethe Multiuse I/O, a bank of 54 I/O pinsthat you can assign to different func-tions as required. This does, of course,mean you cannot implement all periph-erals at once. However, you can extendthis interface into the programmablelogic if you wish.

Communication between the pro-cessing system and programmablelogic sides is comprehensive, andincludes:

• Advanced Microcontroller BusArchitecture (AMBA®) andAdvanced eXtensible Interface(AXI), with both master and slaveinterfaces, including direct links toexternal DDR and on-chip memory.In addition, the AcceleratorCoherency Port (ACP) providescoherent access to the CPU memo-ry space from the programmablelogic side.

• Extended Multiuse I/O, which allowsyou to increase the number of pro-cessing system peripherals by usingprogrammable logic I/O.

• Direct memory access and inter-rupts, including 16 interrupts fromthe PL to the PS, along with fourDMA channels.

• Four clocks and resets from the pro-cessing system to the programma-ble logic.

Zynq configuration differs from theprocess of configuring many of theprevious families of Xilinx FPGAs,even those that contained hard macroprocessors like the Virtex®-2 Pro,Virtex-4 and Virtex-5 families. Withinthe Zynq system, the processing sideis the master and therefore boots fol-lowing a normal software bootprocess, loading in the applicationfrom nonvolatile memory and theneither executing in place or cross-loading the application into fasterDDR memory using a more complexboot loader. The programmable logicside of the Zynq loads via theProcessor Configuration Access Port(PCAP), which allows both partial andfull configuration. As always, you canalso configure the device through theJTAG interface. Because of this addedcomplexity, the Zynq device has moremode pins than you normally find onan FPGA to support all of these vari-ous configuration methods.

WHAT TOOLS DO WE NEED?Creating a system-on-chip (SoC)requires a little more effort than devel-oping a logic-based FPGA design.However, it is still pretty straightfor-ward and the tool chain provides goodguidance. To create a Zynq SoC designyou will need four tools at a minimum:Xilinx Platform Studio, the ISE®

Design Suite, Xilinx’s SoftwareDevelopment Kit (SDK) and iMPACT.

Xilinx Platform Studio is the placewhere you create your processing sys-tem, be it PowerPC®, MicroBlaze™ or,in this case, the Zynq PS. Here, in XPS,you define the configuration, interfaces,timing and address ranges—everythingneeded to generate a processor system.The output from this process is an HDLnetlist defining your system.

Most FPGA engineers are familiarwith ISE. This Xilinx tool chain takesyour HDL design, including the XPSnetlist, and generates the required bitfile. The Software Development Kit,

10 Xcell Journal First Quarter 2013

C O V E R S T O R Y

Figure 1 – The Base System Builder for the AXI bus

Page 11: Xcell82

meanwhile, is where you will developthe software for the processing system.To correctly generate the software, theSDK needs to be aware of the hard-ware configuration of the system.Finally, the iMPACT tool allows you toprogram the bit file into the system andgenerate the bitstream format.

While you can use these four tools inisolation to create an All ProgrammableSoC, rather helpfully Xilinx’s PlanAheadtool is capable of integrating them all,delivering a much simpler developmentprocess. Therefore, let’s focus on thisPlanAhead approach as we dig deeperinto the Zynq flow.

CREATING THE HARDWAREThe first step in the developmentprocess is to open PlanAhead and cre-ate a new RTL project. This is a fairlystraightforward matter. Since you willhave no existing RTL, IP or constraintsinitially, keep selecting “next option”until you reach the device selectiontab. If you are using a developmentboard, then select the board you intendto target. Otherwise, select the deviceyou wish to target for your system.

Once you have created the project,you will see the default screen withinPlanAhead. At this point you will needto add a source file to the design. Youcan do this by selecting “add sources”under project management options onthe flow options tab, which should beon the left-hand side of the screen.Initially we are interested in getting theprocessing system side of the Zynq upand running; therefore, select the “addor create embedded sources” option.This will open yet another tab fromwhich you can select the “create subde-sign” button, and then enter the selectedname for the processing system.

The tool will now take you through anumber of stages to create a base sys-tem using what is called the BaseSystem Builder (BSB) Wizard, asshown in Figure 1. This step will openXilinx Platform Studio, which will pres-ent you with a graphical representationof the processing system within thedevice (Figure 2).

If you have targeted a developmentboard, the tool will automatically con-figure the peripheral interfaces foryour board. However, if you are tar-

geting a single device you will need toimplement your own peripherals with-in this system—for example, DDR tim-ings and the MIO connections. Doingso is very straightforward, and youcan make sure you have made noerrors by running the design rulechecker before you leave XPS andreturn to PlanAhead.

Once back in PlanAhead, you willneed to generate an RTL netlist for theprocessing system in either VHDL orVerilog, picking your choice of languagevia the project settings. Generating thisHDL file is as simple as right-clicking onthe name of your processing system andselecting the “generate HDL” option.

You can then create RTL designs forthe programmable logic side of thedevice within PlanAhead as well, usingthe “add sources” option. Once youhave created the RTL you require, youwill need to define the top-level moduleby right-clicking on it and selecting the“set top” option. If your design has botha processing system and a programma-ble logic system, you will need to createa top-level RTL file to connect themtogether as you desire.

First Quarter 2013 Xcell Journal 11

C O V E R S T O R Y

Figure 2 – The XPS view of the processing system

Page 12: Xcell82

The final remaining step before theSoC is complete and a bit file can begenerated is to create a constraintsfile with pinout information for the PLside of the Zynq-7000 device. Do thiswithin PlanAhead by right-clicking onthe constraints option within thesource window and selecting “addsources.” This action will allow you tocreate the file. Now it is time to imple-ment the design using the “run imple-mentation” button.

CREATING THE SOFTWAREHaving defined your system usingPlanAhead and Xilinx Platform Studio,you now need to create the software thatis to run on the processing system. Forthe majority of this development youwill be using the Software DevelopmentKit. However, there’s one more thing youneed to do within PlanAhead—namely,

export the hardware configuration to theSDK by means of the export options with-in PlanAhead.

Performing this export should resultin the opening of the SDK and theimportation of the hardware specifica-tion, as seen in Figure 3.

Once you have imported the hard-ware, you will need to create a boardsupport package for your hardware, asshown in Figure 4. This BSP will be spe-cific to your hardware implementation. Itwill contain drivers for peripherals alongwith a number of hardware-specific Cheader files such as xparameters.h,which defines the memory map of thesystem along with other system configu-ration parameters. When creating theBSP, be sure to select your target pro-cessing system and CPU0. (We are exam-ining a simple system and will leave mul-ticore operations to another day.)

The next step is to create the actualsoftware project itself. At this point,you will get to choose a language,either C or C++, and a number of tem-plates ranging from a simple “helloworld” to a first-stage boot loader. Youcan also choose whether to implementthe project on a bare-metal system—that is, one with no operating sys-tem—or within the Linux OS. Onceyou have completed this step and cre-ated your project, you are free to beginwriting code for your application.

If you are new to developing embed-ded software, the task may at firstseem a little daunting. However, theSDK provides plenty of support andassistance to both first-time and expe-rienced developers. Should you beunsure of the peripheral drivers withinyour system or how to drive them, thesystem.mss file under your BSP within

C O V E R S T O R Y

12 Xcell Journal First Quarter 2013

Figure 3 – Importing the hardware to the SDK

The board support package will be specific to your hardware implementation. It will contain drivers for peripherals along with

a number of hardware-specific C header files.

Page 13: Xcell82

the tool’s Project Explorer will showthe memory map. This view alsooffers a list of helpful links to docu-mentation and examples.

There are a large number of Cheader files for you to call into a proj-ect. However, certain commonly usedfiles contain parameters and func-tions to drive peripherals.

• Stdio.h defines the standardinput and output. This fileshould be familiar to many peo-ple who have used C before.

• Platform.h defines basic func-tions for the implementationand platform initialization, andfor cleanup.

• Xil_types.h defines a number oftypes needed for the Xilinx IP.

• Xgpio.h defines the drivers forthe AXI-connected general-pur-pose I/O module.

• Xparameters.h defines thehardware architecture andconfiguration for this imple-mentation.

Having written your code and suc-cessfully compiled it, you will then ofcourse wish to download the applica-tion to your board. To do this, youfirst need to create a linker file scriptthat will determine where the exe-cutable is placed within the systemmemory. While your first programmay be small and would fit within theon-chip memory, it is often best toplace it within the DDR memory (ifused) to demonstrate that this inter-face was also correctly configured.You can create the linker script byright-clicking on your project andselecting “generate linker script.”

The final thing to do is to set up therun configurations (project explorer -> project -> run -> run, as within theright-click menu). Here you can defineyour STDIO connection—the serialport you are using for the STDIO (ifany). Once you are happy, you canapply the configuration and then closeit (do not click “run,” as we are notquite ready for that). The final stage isto attach the board to your PC, con-necting the JTAG cable and all other

cables you may need, including anRS232 serial cable, and programmingthe hardware via the Xilinx tools (usethe “program FPGA” option). Checkthat the bit file is the correct one andthen program the device.

Having completed the configura-tion, you can now download the ELFfile and try out your program. This isas simple as selecting your projectwithin PlanAhead’s Project Explorer,right-clicking “run as” and then“launch on hardware.” If you havenot previously built or compiled yourcode, running this command willautomatically build your design, pro-vided it is error-free.

The Zynq-7000 device introduces anew type of all programmable sys-tem-on-chip development flow thatmay seem complicated to the newuser. However, the tool chain is inte-grated within, and accessible from,Xilinx’s PlanAhead tool, creating aseamless, stage-by-stage develop-ment process that ensures you willbe able to create a Zynq system withrelative ease.

First Quarter 2013 Xcell Journal 13

C O V E R S T O R Y

Figure 4 – Creating the board support package

Page 14: Xcell82

14 Xcell Journal First Quarter 2013

XPERT OPINION

Designing to Forestall FailurePlanning a safety-critical system demands close attention to hardware failure rates and redundancy.

Page 15: Xcell82

First Quarter 2013 Xcell Journal 15

When it comes to safety-critical sys-tems (planes, trains, automobiles), orto safe industrial controls (windmills,factory steam plants), the designermust take a number of factors intoaccount. First, failures are always an“option”—they can and will happen. [1]Second, when a failure occurs, it isyour job to make sure it does notresult in damage or loss of life.

As opposed to a commercial sys-tem, where a failure results in incon-venience or frustration, the conse-quences of a failure in a safety-criticalapplication are dire. Therefore, thedesigner of a fail-safe system must dif-ferentiate and apply the proper designphilosophy to eliminate or at leastminimize failures.

In a server, router, telephone sys-tem, cell phone switch or other com-mercial system, it’s extremely unlikelythat failure will result in damage,destruction or loss of life. Dependingon the service grade (number of cus-tomers affected) and the use of theequipment (business telephone switchor 911 service center), a more strin-gent set of rules may apply. But if thesystem is deemed noncritical and itfails, the result will be a loss of serv-ice, which is something that is not verypleasant, but is acceptable. Of course,a reputation for a low failure rate willalways help in keeping customershappy (and paying for your products).

Let’s take a closer look at systemsin which failure is acceptable andsimply means a loss of service, beforediscussing systems where failure isunacceptable.

SYSTEMS THAT ARE ‘OK’ TO FAILTypically, designers wish to make theirdesigns as reliable as possible. Thehard failure rates of Xilinx® FPGAdevices by technology are listed in theQuarterly Reliability Report [2], inTables 1-16. Take the hard failure ratefor the 7 series devices, which is listedas 24 FIT (failures per billion hours),as of Aug. 22, 2012. That equates toone failure per 4,756 years. Imaginehaving 10,000 systems in the field, andyou would expect to see one customerfield failure return every 174 days onaverage. This is the baseline, “do noth-ing special,” failure rate for the systemif the system consisted of just onecomponent: a Xilinx 28-nanometer 7series FPGA device. Now, no systemconsists of just one component. Thereare the printed-circuit boards, powersupplies, connectors, LEDs, switchesand so forth.

A typical system, with all of itsassociated components, is going tohave a lifetime much shorter thanthat of the Xilinx FPGA device alone.Typical numbers range from 1,000 to10,000 FIT for systems in which the 7series FPGA device is a small part.This equates to failures ranging fromevery 114 years down to 11.4 years,or from four days down to less than aday between customer returns.Again, in these non-safety-criticalsystems, a failure represents aninconvenience, but no one is harmedand no damage results.

Not all components have a constantfailure rate. Most devices have what isknown as a “bathtub” curve—one withsteep sides and a flat bottom—for theirmean time between failures. The initialperiod of life harbors more failuresthan the latter portion, a phenomenonknown as “infant mortality.” There isusually a long stretch after this initialperiod of a lower failure rate, known asthe “normal operating life.” Near theend of the component’s lifetime, thefailure rate increases, and this period isknown as the “end of life.”

X P E R T O P I N I O N

Wby Austin LeseaPrincipal EngineerXilinx [email protected]

Page 16: Xcell82

A laptop computer, for example,has a typical service life of threeyears. Failures that occur immediate-ly after purchase or after three yearsof ownership dominate the failuresthat the user will experience, with farfewer failures in between.

Xilinx designs its commercial andindustrial products for a 15-year min-imum life. Depending on junctiontemperature, this time period may bemore or less. The reliability and userguides spell out the details for thevarious Xilinx FPGAs.

SYSTEMS THAT ARE NOT ‘OK’ TO FAILIn a system where life or the environ-ment is at stake, there is always anallowed failure rate. It may be a verystringent requirement, but since nosystem is perfect and no designexists that is completely fail-safe,there is a number, however tough itmay be to achieve.

One example is for a power-con-trol system in a windmill. The wind-mill, if it fails, may throw off a wind-mill blade, totally destroying thedevice and potentially injuring orkilling a person nearby. Let us sup-pose for argument’s sake that thatnumber is 10 FIT, or one failure every11,416 years. The first thing to recog-nize is that no single device is reli-able enough to meet this require-ment; the system will have to havesome form of redundancy.

As the hardware failure rate of thecomponents is too large to meet therequirement, the designer will haveto provide duplication for all of thecritical components, subsystems andpower supplies, along with othermeans to verify that there are no fail-ures. Such systems are oftendesigned such that when they fail,they fail safely.

Imagine two completely separateassemblies, each with a failure rateof 10,000 FIT, each checking theother through a duplicated communi-cations channel (in this case, perhaps

just two UARTs on four wires in eachsystem). The single point of failure isnow the four wires or the UARTs, butany failure would be detectable; thesubassembly would be unable to talkto its twin, and the system would beable to immediately shut down every-thing safely.

There is still the possibility thatboth systems might fail at the sametime. That is the probability of failureof each, times itself, or one simulta-neous failure every 130 million years.That performance is probably goingto meet the requirements.

As another example, the spaceshuttle used five redundant systems,and as long as they all agreed withone another, they could launch. Afterlaunch, as long as three of the fiveworked (agreed), the systems weredeemed safe. Similarly, commercialairplanes use three-way redundancyfor their flight controls.

WHEN AM I DONE?Often designers forget that designingfor reliability is an endless taskunless they have a clearly stated goal.In the above examples, if the soft fail-ure rate (from atmospheric neu-trons) was 1 FIT for an element in aXilinx 7 series device (that’s the actu-al rate for a PicoBlaze™ soft proces-sor), what should a designer do?Well, the general rule that I have seenused for a very long time now is thatonce any failure rate is less than thehardware failure rate, you are done.

If the final system does not meetthe requirement, then you have thewrong architecture (no redundancy,for example).

THERE ARE STANDARDS FOR THAT, NOW…Fortunately, designers now have inter-national standards for industrial, aero-nautic, automotive, medical and othersystems. [3] IEC 61508 is the umbrellastandard that applies to all at the high-est level, with each industry having itsown subset standard. The one thing to

remember about these standards isthat none of the manufacturers of thecomponents are certified (or certifi-able), but the final system and its man-ufacturers must be certified.

Anyone who claims to be able tosell you a 26262 (IEC automotivestandard) certified FPGA device islying; there is no such thing. Thesame is true for DO-254, a standardfor airplanes. It is the entire systemand its design that must meet thestandard, not the components. Ofcourse, if there is one FPGA device inthe system and nothing else, it is pret-ty easy to find the failure rate. Justlook it up in UG116, if it is a Xilinxcomponent. But that is never thecase; all systems consist of more thanjust one device.

References

1. “Failure is always an option,” Adam

Savage, from Myth Busters: http://www.

brainyquote.com/quotes/quotes/a/adamsa

vage207580.html

2. http://www.xilinx.com/support/docu-

mentation/user_guides/ug116.pdf

3. For more information and some fun

reading, visit www.xilinx.com/applica-

tions/aerospace-and-defense/avionics/.

Here you will find white papers on DO-

254 and SEU. Another helpful link is

http://en.wikipedia.org/wiki/ IEC_61508.

Austin Lesea graduated

from UC Berkeley with

his BS EECS in electro-

magnetic theory (1974)

and MS EECS in com-

munications and infor-

mation theory (1975).

He worked in the

telecommunications field for 20 years designing

optical, microwave and copper-based transmis-

sion systems. For 10 years Austin was in the IC

Design department for the Virtex® product line

at Xilinx. For the last three years he has worked

for Xilinx Research Labs, where he is looking

beyond the present technology issues. Austin

holds 70 patents.

16 Xcell Journal First Quarter 2013

X P E R T O P I N I O N

Page 17: Xcell82

No Room for Error

Synopsys FPGA Design Tools are an Essential Element for Today’s High Reliability Designs

There is no room for error in today’s communications infrastructure systems, medical and industrial applications, automotive, military/aerospace designs. These applications require highlyreliable operation and high availability in the face of radiation-induced errors that could result in a satellite failing to broadcast, a telecom router shutting down or an automobile failing to respond to a command.

To learn more about Synopsys FPGA Design tools visit www.synopsys.com/fpga

Hi-Rel Checklist

� Built-in redundancy

� Safety critical design

� Traceability and

equivalence checks

� Reproducible, documented

design process

� Power reduction

� DO-254 compliance

Page 18: Xcell82

18 Xcell Journal First Quarter 2013

XCELLENCE IN AEROSPACE & DEFENSE

Implementing Analog Functions in Rugged, Rad-Hard FPGAs

Aerospace engineers are pulling converter and clocking functionsonboard Xilinx FPGAs to streamline the design of their high-reliability systems.

Page 19: Xcell82

First Quarter 2013 Xcell Journal 19

FPGAs have alreadychanged the cost/reli-ability paradigm forembedded systems inhigh-reliability appli-cations, thanks to

advances in hardness and power reduc-tion. But on many embedded applica-tions for high-reliability markets,designers depend on a number ofperipheral analog components such asanalog-to-digital and digital-to-analogconverters to talk to the real world.Other system components such asphase-locked loops (PLLs) and DC/DCconverters are usually required to com-plete a system design. These peripheralsimpact overall cost, size and reliability.Peripheral analog parts can also be chal-lenging to work with and to source forradiation environments, as an example.

To further leverage the power ofFPGAs, military-and-aerospace engi-neers are actively looking for ways tointegrate many of these analog func-tions onto the FPGA. Synthesizable,digital IP cores that replace some ana-log functions now exist, allowingmil/aero designers to implement ADC,DAC, DC/DC controller and clock-mul-tiplier functions in fully digital process-es such as FPGAs. Not only does thisnew ability leverage the advantages ofFPGAs, it also helps mitigate manychallenges of using analog componentsin high-reliability applications.

OVERCOMING HIGH-RELIABILITYDESIGN CHALLENGES The engineering challenges of design-ing for military or high-reliability appli-cations such as aerospace are numer-ous. Power and weight are usuallyunder strict budgets because they canaffect operating costs and insertioncosts exponentially. Physical shocksafeguards, force survival and protec-tion from single-event upsets (SEUs)and latchup often mean that parts arelarger, heavier and more power hun-gry than commercial devices. Forinstance, a commercial 12-bit, 10-MHz

bandwidth ADC measures approxi-mately .71 by .42 inches and con-sumes 280 milliwatts. The equivalentradiation-hardened part is .81 by .72inches and consumes 335 mW. That’salmost double the size at 20 percentmore power.

A wide temperature range is anotherissue. Typically, temperatures of -40°Cto +80°C are expected for many mili-tary embedded applications here onEarth. Temperature takes on anothercomplexion in space. In satellite elec-tronics design, for instance, the normaloperating junction temperature mightbe -55°C to +125°C. Monitoring thisonboard temperature is key to effectivesystem maintenance, but installing arad-hard ADC part to provide this func-tion can add up to one square inch ofboard and require additional compo-nents and testing.

When a high-reliability designmakes use of peripherals such asADCs, DACs, DC/DC converters orPLLs, each one of those componentsrepresents a possible point of failure.Each must be qualified and tested,and each is most likely not optimallydesigned for the specific need. Thereis also always a risk that the manufac-turer will discontinue the part, forcingrequalification of the entire system.

These challenges to working withanalog components in high-reliabilityenvironments can evaporate by usingthe FPGA for a unified, all-digitalapproach. Let’s take a look at this newparadigm in military/aerospace design.

PULLING ANALOG FUNCTIONSONTO THE FPGA Regardless of how you define “ana-log” and “digital,” significant differ-ences and integration issues existbetween the two. Because of theseissues, it can be very advantageous tohave digital designers pull analogfunctions onto an FPGA and testthem. Herein, we will define “digital”as using standard digital library cellsand passive components for a fully

X C E L L E N C E I N A E R O S PA C E & D E F E N S E

by Allan Chin CEO [email protected]

Luciano [email protected]

Page 20: Xcell82

synthesizable and digitally testabledesign. Designers can create digital IPblocks of ADCs, DACs, DC/DC con-verter controllers and clock multipli-ers in RTL format and implement themin all-digital processes.

With these IP blocks, militarydesigners can take advantage ofrugged and radiation-hardened FPGAsto implement customized analog func-tions. Not only does this approachleverage the inherent protection prop-

erties of the FPGA, but these blocksare also a great way to utilize unusedFPGA resources. Xilinx recognizesthis advantage and now partners withStellamar to provide these functions.Increasingly, aerospace companies areturning to these solutions to attackanalog-integration problems.

DIGITAL ADC CORE YIELDS BENEFITS Figure 1 depicts an example blockdiagram of a Stellamar Digital ADCIP core. With the digital approach,the core requires only a few externalpassive components. The IP core is

instantiated right inside the FPGA andis easy to implement through digitalsynthesis. On a Xilinx® Virtex®-5QVdevice, a scenario such as that picturedin Figure 1 utilizes less than 1 percentof FPGA resources.

Proprietary signal processingmakes it possible to replicate analogsigma-delta ADC performance withall-digital library cells. Companieslike SEAKR Engineering and theFinnish Meteorological Institute are

using Digital ADC IP in their OnBoard Processor Program and LunarLanding Missions, respectively. Somebenefits are:

• 50 percent lower power than ana-log ADC parts

• 68 percent smaller area than ana-log ADC parts

• Process technology independence

• Reduced risk and cycle time

• Digital integration, synthesis and testing

• Easier radiation-hardened design

PERFORMANCE PLUS APPLICATIONSCurrent performance is up to 15 bits ofresolution and several hundred kilo-hertz of bandwidth. Bandwidth dependson the selected resolution. This level ofperformance is suitable for a host ofapplications including sensors (temper-ature, pressure, voltage, current andacceleration), touchscreen integration,high-quality voice and motor control.

As an example, many design teamsuse a radiation-hardened, 12-bit, 10-MHz bandwidth ADC part for moni-toring onboard temperature and volt-age. Some FPGAs, such as the XilinxVirtex-5QV space-grade FPGA, evenhave embedded diodes highlightingthe importance of the temperature-sensing function. However, normalbandwidths for these types of meas-urements are 0.5 Hz to 10 Hz, sousing bandwidth in the megahertz islike driving the head of a pin with asledgehammer. A Digital ADC IP coreon a radiation-hardened FPGA canget down to 0.5-Hz bandwidth perchannel and can consume less than 6mW, compared with 335 mW for theexternal part. Why waste criticalboard space and power for such alow-level task?

CONTROLLING DC/DCPOWER MANAGEMENT Power management is becoming alarger part of overall system design.Sometimes a single design caninclude more than 30 power supplies.External radiation-hardened DC/DCconverters retain the same difficultiesas external ADCs. Thus, the use ofthese parts to control power com-plexity in high-reliability applicationsdoes not scale well.

20 Xcell Journal First Quarter 2013

X C E L L E N C E I N A E R O S PA C E & D E F E N S E

AnalogInput

ReconstructionFilter

ClockInput

LVDSInput

DigitalOutput

DigitalCore

FPGA or ASIC

+

Figure 1 – An example of a fully digital ADC IP core interface

A Digital ADC IP core on a radiation-hardened FPGA can get down to 0.5-Hz bandwidth per channel and can consume less than

6 mW, compared with 335 mW for an external part.

Page 21: Xcell82

All-digital DC/DC controller IP nowexists to take advantage of radiation-hardened FPGAs’ processes and toallow for simplification of control,redundant power supplies, infinitesequencing and infinite throttling (seeFigure 2). You will still need an exter-nal power transistor, but this can bemuch easier to work with than a fullDC/DC converter part.

DIGITAL CLOCKING SOLUTIONSPhase-locked loops are some of themost widely used analog blocks forclock generation; thus, most FPGAshave incorporated PLL capability with-in the package. However, some FPGAs,including certain radiation-hardenedFPGA families, do not include PLLs atall. Other radiation-hardened FPGAsgenerally do not include the PLLs inthe rad-hard portion of the package.

Digital clock multiplier IP used onthese FPGAs can provide the ability togenerate any clock up to about 2 GHzwith no lock time. Models show 50-picosecond peak and 35-ps RMS, with5-ns to 1-ns rise/fall. As with DigitalADC IP, this solution requires very fewoff-the-shelf passive components.

PUTTING IT TOGETHERHistorically, FPGAs did not lend advan-tages to analog functions, forcing high-reliability design teams to use nonopti-mal external analog parts. This is nolonger the case, as mil/aero engineersnow have robust options for integratinganalog functions into any digital fabric,including radiation-hardened FPGAs.By using digital implementations of ana-log functions from Stellamar, engineerscan add critical functionality such asthermal monitoring, redundant powersupplies and clocking functions–allwithout adding weight, power or size tothe design. The digital synthesis and testmethodology ensures the operabilityand greatly increases reliability.

Further, designers can easily lever-age these technologies across projectsand across the whole organization.With budgets being slashed and per-formance more important than ever,these digital IP cores give mil/aeroengineering teams the flexibility andproductivity they need to meet criticalmission objectives.

For more information aboutStellamar cores, call (480) 664-9594 orgo to www.stellamar.com.

X C E L L E N C E I N A E R O S PA C E & D E F E N S E

First Quarter 2013 Xcell Journal 21

Vref

Vin+

SW1 L

CSW2

CLK

Vout

Load

Digital Control Circuit

Figure 2 – Example DC/DC controller block diagram

FPGA SOLUTIONS

■ 100K Logic Cells + 240 DSP Slices■ DDR3 SDRAM + Gigabit Ethernet■ SO-DIMM form factor (68 x 30 mm)

Mars AX3 Artix®-7 FPGA Module

from

Mars ZX3 SoC Module

■ Xilinx Zynq™-7000 All Programmable SoC (Dual Cortex™-A9 + Xilinx Artix®-7 FPGA)

■ DDR3 SDRAM + NAND Flash■ Gigabit Ethernet + USB 2.0 OTG■ SO-DIMM form factor (68 x 30 mm)

We speak FPGA.

www.enclustra.com

■ Plug and play solution for USB 3.0 connectivity between FPGA and host

■ 300+ MBytes/sec data transfer rate■ Uses Cypress EZ-USB FX3 device controller

FPGA Manager FX3 USB 3.0

■ Xilinx Kintex™-7 FPGA■ High-performance DDR3 SDRAM■ USB 3.0, PCIe 2.0 + 2 Gigabit Ethernet ports■ Smaller than a credit card

Mercury KX1 FPGA Module

Page 22: Xcell82

22 Xcell Journal First Quarter 2013

XCELLENCE IN WIRELESS COMMUNICATIONS

Xilinx’s Zynq All Programmable SoC and Vivado HLS tool provide an efficient and flexible way to implement wireless digital front-end applications.

by Baris OzgulResearch ScientistXilinx, Dublin, [email protected]

Jan LangerResearch ScientistXilinx, Dublin, [email protected]

Juanjo NogueraResearch ScientistXilinx, Dublin, [email protected]

Kees VissersDistinguished EngineerXilinx, [email protected]

Software-Programmable Digital Predistortion on the Zynq SoC

Software-Programmable Digital Predistortion on the Zynq SoC

Page 23: Xcell82

First Quarter 2013 Xcell Journal 23

Digital predistortion (DPD)is an advanced digital sig-nal-processing techniquethat mitigates the effects

of power amplifier (PA) nonlinearityin wireless transmitters. DPD plays akey role in providing efficient radiodigital front-end (DFE) solutions for3G/4G basestations and beyond. Ageneric DPD system consists of a pre-distorter that compensates for thenonlinearity effects prior to the inputof the PA, and an estimator on thefeedback path from the output of thePA, which updates the predistortercoefficients to reflect the possiblechanges in operational characteris-tics. Based on the modulation type,power amplifier technology and trans-mission bandwidth, the DPD solutioncan differ from system to system.Hence, it is worthwhile to provide aflexible design methodology for DFEsthat facilitates the implementationand integration of new DPD coeffi-cient estimation algorithms.

Modern FPGAs are a promising tar-get platform for the implementationof flexible wireless DFE solutions,

including DPD. The Xilinx® Zynq™-7000 All Programmable SoC, whichintegrates the programmable-logicfabric with a multicore ARM® proces-sor system, provides an especiallyinteresting new option. The Zynq-7000 All Programmable SoC enablesthe partitioning of functionalityamong hardware and software com-ponents to increase the overall sys-tem performance. As shown in Figure1, this All Programmable SoC is capa-ble of implementing all the requiredfunctions of a DFE in a single chip.

Our team created a software-pro-grammable design flow to imple-ment the DPD coefficient estimatorson a Zynq-7000 device using amethodology that allows the flexiblepartitioning of the functionalityamong the hardware and softwarecomponents, depending on the com-plexity of the estimation algorithmin use. We achieved a significantproductivity increase thanks to theVivado™ high-level synthesis (HLS)tool, which allowed the implementa-tion and verification of our hard-ware design at the software level.

SOFTWARE-PROGRAMMABLEDESIGN FLOW ON ZYNQ The Zynq-7000 SoC is a hybrid comput-ing platform that consists of two majorparts. First, there are two embeddedARM Cortex™-A9 processors operat-ing at up to 1 GHz and their supportinfrastructure, including a cache hier-archy, memory controllers and I/Operipherals. This section in itself rep-resents a complete programmableembedded platform that is fit for usewithout any FPGA programming. Thetwo ARM processor cores come with asingle-instruction, multiple-data(SIMD) extension called NEON thatprovides a 128-bit-wide data path.

Second, the Zynq-7000 devices alsocontain an area of programmable logicthat represents a conventional FPGA.The major advantage of this device isthe high bandwidth available betweenthe FPGA and the embedded proces-sors by means of a multitude of AXI4communication ports. In this way, it ispossible to develop software for theprocessor and, as necessary, offloadcompute-intensive tasks into the pro-grammable logic.

X C E L L E N C E I N W I R E L E S S C O M M U N I C A T I O N S

MemoryControl

6/10GSerdes

OpticalModules

CPRI/OBSAIMaster

DigitalUp-

conversion(DUC)

CrestFactor

Reduction(CFR)

DigitalDown-

conversion(DDC)

DigitalPre-

distortion(DPD)

CPRI/OBSAIMaster

6/10GSerdes

SPI/I2C UARTSARM A9:

DPDUpdate

ARM A9:O&MRTOS

Memory

GPIODAC

JESD204Or LVDS DAC

ADC

ADC

ADC

JESD204Or LVDS

JESD204Or LVDS

Figure 1 – Programmable wireless digital front end on the Zynq All Programmable SoC

Page 24: Xcell82

Figure 2 illustrates a shortoverview of our proposed design flow.The first step is no different from atypical software design flow. Itinvolves implementing the applicationin pure software, where the resultscan be tested and verified thoroughly.Next, a profiling step reveals the bot-tlenecks of the application, such ascompute-intensive subfunctions thatneed hardware acceleration.

This so-called hardware/softwarepartitioning involves not only theselection of functions to accelerate,but also the decision on an adequatecommunication infrastructure such asDMA transfers vs. memory mapping.Additionally, some changes to thesoftware are necessary in order to callthe hardware accelerator instead ofthe original C function.

In a traditional design flow, an expe-rienced hardware designer implementsthe functions to be accelerated in ahardware description language likeVHDL or Verilog. With the availabilityof the Xilinx Vivado HLS tool, wereplaced the manual hardware imple-

mentation with an automatic step thatuses the original software functions togenerate corresponding hardwareaccelerators. The conversion requiresseveral incremental manual refinementsteps that include adding directives tothe code or even restructuring the algo-rithm in order to obtain an efficienthardware implementation. During thisprocess, the code is still executable assoftware, so that the designer can usethe original test and verification envi-ronment. This is a big advantage overthe traditional hardware design flow.

After the completion of the refine-ment step, the Vivado HLS tool gener-ates a hardware accelerator imple-mentation that meets the design con-straints—for example, the requiredclock frequency and the amount ofhardware resources. Next, the integra-tion step requires the instantiation ofcommunication components (forexample, DMA) to enable the interac-tion between the hardware acceleratorand the processor. Finally, system syn-thesis generates a bitstream to pro-gram the programmable logic.

HIGH-LEVEL SYNTHESIS FORPROGRAMMABLE LOGICHLS tools raise the level of abstrac-tion for designs in the programmablelogic, and make the time-consumingand error-prone register-transfer-level (RTL) design tasks transparent.These tools take as their input ahigh-level description of the specificalgorithm to implement and generatethe RTL design for the target hard-ware accelerator.

Modern HLS tools accept as theirinput untimed C/C++ descriptions,from which they interpret the sequen-tial semantics of the input/outputbehavior and the architecture specifi-cations. Based on the C/C++ code,compiler directives and targetthroughput requirements, these toolsgenerate high-performance pipelinedarchitectures. Furthermore, theyenable automatic pipeline stage inser-tion and resource sharing to reducehardware resource utilization.

We have adopted the overall hard-ware design flow seen in Figure 3.The first step in this flow is restruc-turing a reference C/C++ code thatthe designer could have derived froma MATLAB® functional description.Here, restructuring means doingmodifications in the original code(which is typically coded for clarityand ease of conceptual understand-ing rather than for optimized per-formance) to turn it into a formatmore suitable for the target process-ing engine. This is similar to rearrang-ing an application’s code to havemore efficient performance on a DSPprocessor. When targeting FPGAs,this restructuring might involve, forexample, rewriting the code so it rep-resents an architecture specificationthat can achieve the desired through-put, or rewriting to make efficientuse of specific FPGA features such asembedded DSP macros.

The functional verification of theimplementation code uses traditionalC/C++ compilers (gcc, for example)and reuses C/C++ level testbenches

24 Xcell Journal First Quarter 2013

X C E L L E N C E I N W I R E L E S S C O M M U N I C A T I O N S

So

ftw

are

Des

ign

Flo

w SoftwareDevelopment

SW/HWPartitioning

SoftwareRefinement

Vivado HLSRefinement

Profiling

SW Compiler

Executable

HW Synthesis

Bitstream

Figure 2 – Overview of the design flow, which begins in software development

Page 25: Xcell82

developed for the verification of thereference code. In addition to theimplementation code, constraints andcompiler directives (e.g., pragmasinserted in the code) are the otherimportant input of the HLS tool. Twoessential constraints are the targetFPGA family (that is, the technology)and the target clock frequency.

Naturally, both of these factors willhave an effect on the number of pipelinestages in the generated architecture.The designer can apply different typesof directives to different sections of thecode. For example, there are directivesfor loop unrolling. As another example,there are directives to limit theinstances of specific functions or opera-

tions in order to minimize the corre-sponding FPGA resource utilization.

The HLS tool takes all these inputs(the implementation C/C++ code, con-straints and directives) to generate anRTL output and to report the through-put of the generated architecture. Ifthe generated architecture does notmeet the required throughput, you canmodify the implementation C/C++code or the directives, or both. If thearchitecture meets the requiredthroughput, then you can use the RTLoutput as the input to the XilinxVivado or ISE®/EDK tools. The report-ing of final achievable clock frequencyand number of FPGA resources usedoccurs only after you have run logicsynthesis and place-and-route. If thedesign does not meet timing or theFPGA resources are not as expected,you should modify the implementationC/C++ code or the compiler directives.

It is worth noting that this is an iter-ative design flow, so the implementa-tion code can go through differenttypes of restructuring until the designrequirements are met. A key concept tokeep in mind is that the C-level verifica-tion infrastructure is reused to verifyany change to the implementation. Inthis way, you do not carry out the veri-fication at the register-transfer level,avoiding time-consuming RTL simula-tion and hence, contributing to a reduc-tion in the overall development time.

First Quarter 2013 Xcell Journal 25

X C E L L E N C E I N W I R E L E S S C O M M U N I C A T I O N S

Reference C/C++ Code

Manual Code Restructuring

C/C

++

Cod

eV

erifi

catio

n

Implementation C/C++ Code

RTL Output

Bitstream

Vivado HLSTool

Constraints

Directives

Vivado(ISE/EDK)

CrestFactor

ReductionPredistorter

x z y0PA

coefficients

LS Estimator

CC AMC

Alignment

y

Antenna

Hardware

Software +Accelerators

Figure 3 – High-level synthesis design flow for programmable logic

Figure 4 – An algorithmic view of the digital predistorter (DPD)

Page 26: Xcell82

SYSTEM MODEL FOR DPDHigh peak-to-average power ratio(PAPR) is a major problem of the non-constant envelope signals (for example,wideband code-division multiple accessand orthogonal frequency-division mul-tiple access signals) widely adopted in3G/4G and emerging wireless systems.Due to high PAPR and PA nonlinearity,the transmitted signals get distortedduring transmission. This distortion typ-ically results in a growth of out-of-bandspurious emissions. A straightforwardsolution to this problem is to back offthe PA input so as to keep it in the linearoperating range of the PA. However, themain disadvantage of this approach isthe inefficient use of the PA, whichresults in a higher cost than required forthe same output power. Another solu-tion is to use digital predistortion. DPDnegates the nonlinearity effects of thePA and increases efficiency.

As Figure 4 shows, a DPD systemconsists of a predistorter employedprior to the amplification and a param-eter estimator on the feedback pathfrom the output of the PA. (Please notethat this illustration is an algorithmicview, which excludes the digital-to-ana-log and analog-to-digital converters atthe PA input and output, respectively,as well as the RF circuitry in between.)

The parameter estimator computesthe coefficients of the predistorterbased on the samples of the PA inputand output. To separate the PA behaviorfrom the additional analog hardwareeffects, the PA output y0 is alignedprior to the parameter estimation. Thealigned PA output y matches the ampli-tude, delay and phase variations of z.

The predistorter and parameterestimator rely on a memory model thatis used to describe the nonlinearityeffects of the PA. For wideband DPD

applications, it is quite common toemploy models based on the Volterraseries, which is widely used for theapproximation of nonlinear systems.

We consider the efficient implemen-tation of alignment and a least-square(LS) estimator in Figure 4. The DPDsystem employs these blocks toupdate the predistorter coefficientswhen there are major changes in thesignal characteristics or power dynam-ics. The autocorrelation matrix com-putation (AMC) block in Figure 4 isthe most computationally complexfunction in our LS estimator design. Itrelies on a Volterra-series-based modeland generates the inputs to the coeffi-cient compute (CC) block, which esti-mates the predistorter coefficients.

Unlike the predistorter filter inhardware, the alignment and the LSestimator blocks do not operate atthe sample rate, since they are not inuse unless the DPD system updatesthe predistorter coefficients. Here,our main goal is to reduce the overallcoefficient update time. In this way,the DPD solution reacts faster tochanging conditions, leading to more-effective predistortion correction.Furthermore, faster updates enablethe support of more-complex DPDsolutions, using a larger number ofactive coefficients. With shorterupdate times, it is also possible to runthe same design multiple times in aserial fashion, in order to update pre-distorter coefficients of different datapaths. This approach makes it possi-ble to implement efficient DPD solu-tions for multiantenna basestations.

Let’s take a closer look at thedetails of our software-programmabledesign flow, which facilitates theimplementation of efficient DPDcoefficient update solutions for mod-ern wireless transmitters.

SOFTWARE IMPLEMENTATIONFOR DIGITAL PREDISTORTIONDuring the software developmentprocess, we used a test environmentthat reads the z and y samples in

26 Xcell Journal First Quarter 2013

X C E L L E N C E I N W I R E L E S S C O M M U N I C A T I O N S

Alignment

Autocorrelation matrix computation

Coefficient computation

97%

0%3%

32 32

32

multiply multiply

32

64 64

Figure 5 – Results of initial profiling

Figure 6 – Two parallel 32 x 32-bit multiplications using NEON

Page 27: Xcell82

Figure 4 from a reference vector andwrites a set of coefficients. Sub-sequently, we compared the coeffi-cients to a reference implementationwritten in MATLAB that visualizes thedifference between both sets of coeffi-cients. We performed the software pro-filing in two levels. First, we ran thesoftware on a standard x86 server andused the gprof software profiling tool,to get a first estimate of the expectedbottlenecks. Second, we ran the soft-ware on the ARM processor and,depending on the gprof results, instru-mented the subfunctions of interestwith calls to the global CPU timer of

the Zynq All Programmabale SoC. Thistimer runs at half the CPU frequency,hence giving excellent resolution withthe overhead of only a few cycles. Figure5 shows the profiling results of the threemain function blocks running on theARM processor, indicating that the AMCblock is the bottleneck of the applica-tion. It consumes 97 percent of the over-all update time, making it a prime candi-date for hardware acceleration.

Prior to profiling, we expected thesolver used for coefficient computa-tion in Figure 4 would consume a larg-er part of the update time, because incontrast to the other functionality it

was performing double-precisionfloating-point operations. However,the ARM’s floating-point unit solvedthe task very efficiently.

Before actually implementing ahardware accelerator for the AMCblock, we examined potential softwareoptimization possibilities. The SIMDNEON engine of the ARM processorhas a 128-bit-wide data path. Since theAMC algorithm works on 64-bit fixed-point data types, the NEON engine cancarry out two parallel computations, asFigure 6 illustrates. Instead of usinglow-level assembly instructions toaccess the NEON engine, the compilerprovides a set of functionlike wrappersfor the instructions. These wrappers,which are called intrinsics, providetype-safe operations, while allowingthe compiler to automatically schedulethe C variables to NEON registers.

Applying the intrinsics in the C coderesults in a speed-up factor of two.Furthermore, during the NEON opera-tions, the normal ARM processor is freeand can continue processing simplenon-NEON instructions like loop condi-tions and pointer increments, while theNEON engine runs in parallel.

HARDWARE IMPLEMENTATIONFOR DIGITAL PREDISTORTIONTo improve the overall parameterupdate time, we implemented an AMCaccelerator using the Vivado HLS toolbased on the design flow in Figure 3.Our accelerator’s programmable config-urations support a number of differentpredistorter coefficients and allow theflexible selection of nonlinear terms inthe Volterra-series-based model. Hence,it is possible to support several DPDconfigurations using the same AMCaccelerator. In addition, you can makenew changes in the existing C++ codeand in the compiler directives to gener-ate a brand-new accelerator in a muchshorter time than if you were doing ahand-coded RTL design. Let’s take acloser look at some specific examples ofcode rewriting and compiler directivesthat we used for the AMC accelerator.

First Quarter 2013 Xcell Journal 27

X C E L L E N C E I N W I R E L E S S C O M M U N I C A T I O N S

1: typedef struct {2: ap_int<32> real;3: ap_int<32> imag;4: } CINT32;5:6: typedef struct {7: ap_int<64> real;8: ap_int<64> imag;9: } CINT64;10:11: CINT64 CMULT32(CINT32 x, CINT32 y){12: CINT64 res;13: ap_int<33> preAdd1, preAdd2, preAdd3;14: ap_int<65> sharedMul;15:16: preAdd1 = (ap_int<33>)x.real + x.imag;17: preAdd2 = (ap_int<33>)x.imag - x.real;18: preAdd3 = (ap_int<33>)y.real + y.imag;19:20: sharedMul = x.real * preAdd3;21: res.real = sharedMul - y.imag * preAdd1;22: res.imag = sharedMul + y.real * preAdd2;23: return res;24: }

3 pre-adders

3 multiplicationsinstead of 4

Non-standard bit-widths

1: void amc_accelerator_top(_)2: {3: #pragma HLS allocation instances=mul limit=3*UNROLL_FACTOR operation4:5: <function body>6: CINT64 Marray[MSIZE];7:8: #pragma HLS array_partition variable=Marray cyclic factor=UNROLL_FACTOR dim=19:10:11:#pragma HLS resource variable=Marray core=RAM_2P12: <function body>13: label_compute_M: for (int i=0; i < MSIZE; ++i)14: {15:#pragma HLS pipeline16:#pragma HLS unroll factor=UNROLL_FACTOR17: <loop body>18: }19: <function body>20: }

Limit the number of multiplications

Partition based on the unrolling factor

Resource mapping

Loop pipelining and unrolling

Figure 7 – Optimized complex multiplication code

Figure 8 – Code snippet for loop unrolling

Page 28: Xcell82

It’s possible to rewrite the C/C++code to more efficiently utilize specificFPGA resources and, hence, improvetiming and reduce area. There are twovery specific examples of this type ofoptimization: bit-width optimizationsand efficient use of embedded DSPblocks (DSP48s). For example, thestandard approach uses built-in C/C++data types (such as short, int) in thereference C/C++ code, whereas theactual design may require fixed-pointdata types having word lengths that arenot integer multiples of the byte size.Here, the Vivado HLS tool supportsC++ template classes that can repre-sent integer data types with an arbi-trary bit width. For our AMC accelera-tor, we leveraged these template class-es, hence reducing FPGA resourcesand minimizing the impact on timing.

The snippet of C++ code in Figure 7is a good example of code rewriting,bit-width optimization and the effi-cient use of DSP48s. The examplefocuses on complex multiplication,which we widely used in our AMCaccelerator. A standard complex mul-tiplication carries out four real multi-

plications, and requires the use of fourdifferent multipliers in a fullypipelined implementation. However,we show in Figure 7 that we canachieve the equivalent functionality byrewriting this code to use three multi-pliers (employing fewer DSP48blocks), at the expense of three addi-tional pre-adders and a 1-bit increasein the multiplier word length. In thisexample, the Vivado HLS tool gener-ates three 33 x 32-bit multipliers, eachgiving a 65-bit result.

The original reference C++ code forthe complex multiplication uses fourmultipliers, each multiplying two 32-bitnumbers. Please note that the originalcode uses built-in C/C++ data types andthe 32-bit inputs should be cast to 64-bitinteger to avoid loss of information(because the result is a 64-bit integer).However, based on this code, the HLStool generates four 64 x 64-bit multipli-ers, which are clearly much moreexpensive in terms of DSP48s. On theother hand, by using the C++ templateclasses in Figure 7 for the data types,the C++ code functionality works finewithout any casting, while the Vivado

HLS tool generates only three 33 x 32-bit multipliers, each using four DSP48s.

Our main reason for implementingan AMC accelerator was to reduce thetime that it took to run the AMC algo-rithm, and hence to improve the totaltime to update predistorter coeffi-cients. For this purpose, we pipelinedthe loops in the C++ code and alsoapplied loop unrolling when possible.Using the Vivado HLS tool, we verifiedthat the most time-consuming loop inour implementation was the matrixcomputation loop. We unrolled thisloop by a configurable factor that wepredefined as a C macro in the compil-er options. Depending on theunrolling factor, the designer canchoose between better resource shar-ing and shorter computation time.

Figure 8 shows how we employedthe configurable unrolling factor(UNROLL_FACTOR) and the VivadoHLS directives for loop unrolling at thesoftware level. In the matrix computa-tion loop, we called the CMULT32 func-tion in Figure 7, using three multipliersfor complex multiplication. In Figure 8,the HLS directive on line 3 enables the

28 Xcell Journal First Quarter 2013

X C E L L E N C E I N W I R E L E S S C O M M U N I C A T I O N S

ARM CoreSight Multicore and Trace Debug

NEON/FPU Engine

Cortex-A9 MP Core32/32/KB I/D Caches

512 KB L2 Cache

Timers/Counters

General Interrupt Controller

Amba Switches Memory

AXI4-Lite Interconnect

AXI FIFO

AMC Accelerator s

s

m m

NEON/FPU Engine

Cortex-A9 MP Core32/32/KB I/D Caches

Snoop Control Unit (SCU)

256 KB On-Chip Memory

DMA Configuration

Figure 9 – Integration of the processor system with the hardware accelerator

Page 29: Xcell82

sharing of 3xUNROLL_FACTOR multi-pliers in the case of loop parallelization.The directive on line 8 partitions thearray that stores the matrix values bythe unrolling factor, to avoid parallelaccess problems. Lines 15 and 16 showhow we applied the loop pipelining andunrolling directives, respectively.Furthermore, Vivado HLS allows you tospecify the resource to implement avariable in RTL. For example, wemapped the Marray variable to dual-port RAM on line 11.

INTEGRATING HARDWARE ANDSOFTWARE COMPONENTSThe communication protocol betweenthe processor system and hardwareaccelerator is the Advanced eXtensibleInterface (AXI). The second and most-recent version of AXI is AXI4. Ourdesign uses the Xilinx AXI interconnectto transfer data from the Zynq process-ing system (ARMs) to the hardwareaccelerator. Given the bandwidthrequirements of this application, wedon’t need DMA transfers. As Figure 9shows, we used the AXI interconnectcore to link AXI4-Lite masters to slaves.

AXI4-Lite is a lightweight, single-trans-action memory-mapped interface. Whenconnected to AXI4-Lite slaves, the AXIinterconnect core stores the transactionIDs and restores them in the responsetransfers. Furthermore, it controls thetransactions and does not propagate anyillegal transaction to the AXI4-Lite slave.

The AXI FIFOs in Figure 9 are forthe input and output data samples ofthe accelerator, which are using AXI4-Stream interfaces of the accelerator.The AXI4-Lite slave on the acceleratoris for the AMC configuration parame-ters. We generated all the streamingand AXI4-Lite interfaces for the accel-erator at the software level, usingVivado HLS.

PERFORMANCE RESULTS ON ZYNQ-7000 SOCFigure 10 illustrates the total DPDcoefficient update time at differentdesign stages. Our target was theZynq-7000 SoC ZC706 board using aZynq 7045 device. The target clock fre-quency that we used in Vivado HLS togenerate the AMC accelerator was 250MHz. However, it is possible to

increase the target clock frequencyconstraint in Vivado HLS in order togenerate faster accelerators. Ourimplementation with Vivado HLS isquite efficient, resulting in 3 percentarea utilization.

In Figure 10, the update time for thesoftware-only solution using the origi-nal code is around 1.250 seconds, withthe AMC as the main bottleneck forthe application. After some code opti-mization for the AMC, we obtained aspeed-up factor of about 2. NEONoptimizations added another speed-upfactor of 2. After accelerating the AMCin the programmable logic, weachieved a 70x speed-up for this block.We have tested our designs successful-ly on the target device.

Our accelerated design will bepreferable in the high-complexity DPDapplications going forward. For exam-ple, in a multiantenna basestation, it ispossible to run an accelerated designmore than once in a serial fashion(computing predistorter coefficientsfor a different antenna each time). Theaccelerated design becomes more fea-sible if the number of predistortercoefficients increases, as well. Forlow-complexity DPD applications, oursoftware-only design using NEONoptimizations can be sufficient.

Performance results corroboratethat our software-programmabledesign flow allows the implementa-tion of several DPD designs, tradingoff faster predistorter coefficientupdate times. Our design flow sup-ports the flexible partitioning of func-tionality among hardware/softwarecomponents and facilitates the imple-mentation of efficient DPD solutionsfor modern wireless transmitters.

We would like to thank the XilinxDPD Engineering Team for their valu-able comments and cooperation.

First Quarter 2013 Xcell Journal 29

X C E L L E N C E I N W I R E L E S S C O M M U N I C A T I O N S

1250

Upd

ate

Tim

e (m

s) 1000

750

500

250

0Original Optimized NEON PL Accelerator

CC

AMC

Alignment

3% Zynq 7045

70x speed-upfor AMC

Figure 10 – Performance results on a ZC706 board

Performance results corroborate that our software-programmabledesign flow allows the implementation of several DPD designs,

trading off faster predistorter coefficient update times.

Page 30: Xcell82

30 Xcell Journal First Quarter 2013

XPERTS CORNER

How a MicroBlaze Can Peaceably Coexistwith the Zynq SoC

Page 31: Xcell82

The Xilinx® Zynq™-7000 AllProgrammable SoC alreadyhas plenty of processingpower onboard. But the

presence of powerful twin Cortex™-A9processors and associated peripheralsin Zynq’s application processing unit(APU) should not keep you fromadding one or more MicroBlaze™processors in the same package, if yourapplication would benefit from them.

Why might you want to add aMicroBlaze to a solution alreadyendowed with serious processingclout? First there is the issue of relia-bility. Single-threading dramaticallyimproves reliability. You can cleanlyplace one thread per Cortex-A9 (forcomputationally intensive tasks), andinstantiate as many MicroBlaze proces-sors as you need for other threads.Second, you can farm out any house-keeping chores that don’t require thepower of a Cortex-A9 to a MicroBlaze,thus saving critical performance cyclesfor the jobs that need them most.

Here’s an example that covers bothof the above situations. Consider atask that requires long stretches ofintense computing while monitoringuser input. Here the MicroBlaze couldmanage the user input (lower frequen-cy, non-computationally intensive)and write into the APU’s memoryspace so that when the APU “comesup for air”—that is, completes its pro-

cessing task—it can see what infor-mation it needs to process next.

Once you’ve made the decision toinclude a MicroBlaze processor inyour Zynq-based design, severalissues become immediately appar-ent. First and foremost is the ques-tion of how the APU will communi-cate with the MicroBlaze, and whatprocessing system (PS) resourcesare available for the MicroBlaze.Many boards, such as the ZC702 andZedboard, map many of the peripher-als directly to the pins connected tothe PS. These pins are not directlyaccessible to the MicroBlaze in theprogrammable logic (PL). The PSalso contains a variety of timers andinterrupt sources. Is there any wayto access them from the domain ofthe MicroBlaze?

INTERFACES BETWEEN THE PS AND PLThe processor system and the pro-grammable logic are well coupled.This means that there are multipletightly integrated connections be-tween the Cortex-A9s, snoop controlunit (SCU), PS peripherals, clockmanagement and other functions, andthe programmable logic. In fact, thereare six different types of intercon-nects between the PS and the PL, andyou can use them in conjunction withone another. Additionally, many of

First Quarter 2013 Xcell Journal 31

X P E R T S C O R N E R

Processor Systemwith Dual Cortex-A9s

Programmable Logicwith MicroBlaze

?

Figure 1 – Is the boundary between the PS and the MicroBlaze within the PL a minefield, or can the two share resources?

by Bill KafigSenior Content Development Engineer Xilinx, [email protected]

Praveen VenugopalSolutions Development EngineerXilinx, [email protected]

It’s a fairly simple matter to add MicroBlaze processors to a design based on Xilinx’sZynq All Programmable SoC.

Page 32: Xcell82

job of the SCU is to ensure coherencyamong the L1, L2 and DDR memories.Using the ACP, you can access the fastcache memory for each of the Cortex-A9 processors in the PS and not be con-cerned with synchronizing data withthe main memory (as the hardware willautomatically take care of this). Thiscapability greatly reduces the burden ofdesign and provides a significantlyfaster way of moving data between theprocessors and the PL.

these paths are symmetric—that is,the PS can initiate or “master” connec-tions to the PL and the PL can masterconnections to the PS.

Much of the information presentlyavailable from Xilinx, from app notesto user guides and white papers, illus-trates how the Zynq-7000 APU, as the“center” of the design, can use theprogrammable logic to access memo-ry, PL-based peripherals and hard sili-con peripherals such as the PCIe®

block, Block RAMs, DSP48s andmultigigabit transceivers. In examin-ing how the MicroBlaze can be thecaptain of its domain, the logical placeto begin is by looking at the six inter-face varieties, starting with threetypes of AXI interfaces: general pur-pose, high performance and theAccelerator Coherency Port.

The PS is equipped with two masterAXI channels to the PL and two slavechannels mastered by the PL (Figure2). “Master” in this context means thatthe AXI channel is the initiator and canbegin data exchanges, whereas a“slave” can only respond to arrivingdata. The master AXI channels are typ-ically used to communicate withperipherals located in the PL. The slaveAXI channels respond to requestsmade from the PL, which can includetransactions made by MicroBlazeprocessors. These AXI channels tieinto the central interconnect of the PSand can be routed to many resources.

In addition, there are four channelsof high-performance (64-bit-wide) AXIattachment points. All four of thesechannels are slaves from the PS’ per-spective and are connected to thememory interface subsystem withinthe PS (Figure 3). The purpose ofthese four channels is to allow mastersin the PL to initiate double-data-rate(DDR) memory transactions.

This memory interconnect andDDR memory controller are the gate-ways to the DDR memory from allsources. While the Cortex-A9 proces-sors usually have priority over theslave AXI connections, each one of the

32 Xcell Journal First Quarter 2013

X P E R T S C O R N E R

four slave AXI connections has a “serv-ice me now” signal that gives priorityto the requesting channel. When thissignal is not asserted, the architectureuses a round-robin scheme to deter-mine which requestor can gain accessto the specific type of memory.

The Accelerator Coherency Port(ACP) is another 32-bit AXI PS slaveconnection from the PL. What makesthe ACP unique is that it is tied directlyinto the snoop control unit (SCU). The

I/O Peripherals

APB Register Access

Debug (DAP)

L2 Cache Memory

DMA (8 channels)

General-PurposeMaster AXI

32-bit

General-PurposeSlave AXI

32-bit

Central Interconnect

Programmable Logic

OCM

DDR Memory Controller

L2Cache

CentralInterconnect

AXI_HP toMemory

Interconnect(64-bit paths)

AXI HP 0

AXI HP 1

AXI HP 2

AXI HP 3

M1

M2

M0

FIFO

FIFO

FIFO

FIFO

Figure 3 – Simplified connections to the DDR memory controller and on-chip memory (OCM)

Figure 2 – Simplified connections to the processing system’s central interconnect

Page 33: Xcell82

X P E R T S C O R N E R

First Quarter 2013 Xcell Journal 33

Beyond AXI links, the ExtendedMultiplexed Input and Output (EMIO)signals are available for routing manyof the PS’ hard peripherals through thePL to access the package pins. Thereare only 54 package pins tied directly tothe PS; however, the PS’ hard peripher-als can use considerably more thanthese 54 pins. The EMIO is the conduitbetween the PS’ hard peripherals andthe PL. These I/O signals can be routeddirectly to the package pins available tothe PL. Alternatively, you may use themto communicate with a compatibleperipheral located in the PL.

Another variety of miscellaneoussignals between the PS and PL fallsinto five basic categories: clocks and

resets; interrupt signals; event signals;idle AXI; DDR memory signals; andDMA signals.

• Clocks and resets: There are fourindependent programmable frequen-cies that the PS makes available tothe PL. Typically one of these clocksis used for the AXI connections. Eachof these clock domains has its owndomain reset signals for resetting anydevice associated with that domain.

• Interrupt signals: The general inter-rupt controller (GIC) in the PS col-lects interrupts from all availablesources, including all of the interruptsources from the PS’ peripherals and16 “peripheral” type interrupts from

the programmable logic. Additionally,there are four direct interrupts thattie to the CPUs (IRQ0, IRQ1, FIQ0and FIQ1). A total of 28 interrupts(from the PS’ peripherals) are avail-able to the PL.

• Event signals: These “out-of-band”asynchronous signals indicate a spe-cial condition of the PS. The PS pro-vides a number of signals that indicatewhich CPU has entered a standbymode and which CPU has executed aSEV (“send event”) instruction. The PScan leverage an event signal to wakefrom a WFE (“wait for event”) state.

• Idle AXI and DDR memory signals:The idle AXI signal to the PS is used toindicate that there are no outstandingAXI transactions in the PL. Driven bythe PL, this signal is one of the condi-tions used to initiate a PS bus clockshutdown by ensuring that all PL busdevices are idle. The DDR urgent/arbsignal is used to indicate a criticalmemory-starvation situation to theDDR arbitration for the four AXI portsof the PS DDR memory controller.

• DMA signals: The direct-memory-access module within the PS commu-nicates with the PL slaves via a seriesof request-and-acknowledge signals.

ACCESSING DDR MEMORYLet’s take a look at an example designthat covers several common needs thetypical MicroBlaze user might have,including how to access DDR memory,how to use the peripherals in the PS’IOP, how to pass blocks of data betweenthe MicroBlaze and the PS, and how tosynchronize events between theMicroBlaze and the PS. Figure 4 showsspecific techniques to address each ofthese issues (labeled 1-4, respectively).

The easiest way to access the DDRmemory is to connect through one ormore of the four high-performance(HP) AXI interfaces (top right sectionof the diagram). Four 64-bit-wide portsare accessible to the programmablelogic. You must first enable a port; thenyou can connect an AXI to it. The

PS Design

MicroBlaze (PL)Design

GIC

BRAMController

BR

AM

Con

trol

ler

BR

AM

Con

trol

ler

BRAMController

DCM_locked

BRAM

MicroBlaze

Local AXI PeripheralGPIO

intc

BRAM

DDR

MIO PIO

BLOCK

Interruptfrom UART

Interruptfrom Timer

GPIO LED 8bit

DLMB ILMB

Cortex A9

HPS-AXI (x4)

S_AXI_HP3

S_AXI_HP2

S_AXI_HP1

S_AXI_HP0

Inte

rrup

t fro

m U

AR

T

µB D

ata

Sid

e

µB In

terr

uptio

n S

ide

TTC Waveform

clk S-AXI (x2)

M_AXI_GP0 S_AXI_GP0

100

MH

z cl

k

rese

t_n

8 LEDs

1

42

3

Figure 4 – Block diagram of example hardware design; the numbers represent (1) ways to access DDR memory; (2) use of the peripherals in the PS’ IOP; (3) how to pass blocks of data between the MicroBlaze and the PS; and (4) how to synchronize

events between the MicroBlaze and the PS.

Page 34: Xcell82

other peripherals in the MicroBlaze’saddress space can overlap this rangeof addresses, as they will be sharedbetween the processors.

Meanwhile, a number of mecha-nisms support passing blocks of databetween the MicroBlaze and the PS—DMA to and from DDR and on-chipmemory (OCM) being but two possibil-ities. Another option, as implementedin the example design, uses a dual-portBlock RAM (circle “3” in the figure).The PS masters one side of a BRAMcontroller while the MicroBlaze mas-ters the other side. Software managespartitioning and use of the sharedBRAM memory space, since both sideshave full read/write access to theentire contents of this memory.

The final issue involves how to syn-chronize events between the MicroBlazeand the PS. The PS contains a plethoraof timers, and you can instantiate virtual-ly any number of timers in the PL. Theexample design uses one channel withinone of the triple-timer/counter peripher-als to generate a 100-millisecond pulse(circle “4” in the figure). The trick here isto realize that the TTC generates a wave-form that is accessible outside the PS,but the TTC interrupt is only availableinside the PS. You can get around thisissue by using the waveform itself as aninterrupt to the MicroBlaze processor.Alternatively, you may employ a hard-ware timer (whether an AXI timer or onethat’s user-coded) to provide an inter-rupt to both the PS and the MicroBlaze.

SOFTWARE SUPPORTAs with any embedded design, youmust consider not only the hardwareimplementation, but the software sup-port as well. For this example, considerthe areas of overlap between the twoprocessors, as outlined in Figure 5.

In terms of the hardware, think ofthe system as two intersecting embed-ded designs: the PS and the MicroBlaze.Our reference design has the PS gener-ating the clocks for both itself and theMicroBlaze as well as sourcing theinterrupts. The MicroBlaze can use

MicroBlaze utilizes the AXI4-Liteinterface, whereas the HP ports areexpecting full AXI4 connections.Fortunately, the Xilinx design toolsautomatically compensate to make asuccessful connection without anymanual modification. This connectionis shown in the block diagram at cir-

cle “1.” The advantage in making thistype of connection is that it is veryeasy, and the DDR memory appears tothe design as regular memory. Thedisadvantage is that AXI-Lite doesn’tsupport burst mode and is only 32 bitswide, so you lose quite a bit of per-formance.

There are other ways to connect PL-based peripherals to the DDR memoryas well, including the use of a DMAcontroller, likewise by means of the HPports. While more complex, this mech-anism offers better performance fortransferring large blocks of data.

Within the PS, a UART and a tripletimer/counter (TTC) are enabled. TheM_AXI_GP0 connects to a BRAMcontroller that resides in the PL but isconceptually (and practically) part ofthe PS design.

The other portion of the design,the MicroBlaze, uses BRAMs to pro-vide 64 Kbytes of combined codeand data space for exclusive use bythe MicroBlaze. It is certainly possi-ble to connect these ports to the HP

34 Xcell Journal First Quarter 2013

X P E R T S C O R N E R

ports of the PS in order to access theDDR memory. However, doing soinvites additional complexity in theimplementation.

There are three standard peripher-als attached to the MicroBlaze—aninterrupt controller that receives theTTC’s PWM waveform input and inter-

prets it as an interrupt source and theUART-character received interrupt; aGPIO that drives the eight LEDs onthe ZC702 board; and a BRAM con-troller that connects to the “B” side ofthe same BRAM that the PS connectsto. Then there are two additional con-nections: one to the S_AXI_GP0 sothat the peripherals internal to the PScan be addressed, and one to the high-performance ports so that theMicroBlaze has access to a portion ofthe DDR memory.

PERIPHERALS, DATA BLOCKS AND SYNCHRONIZATIONThe key to accessing the programma-ble system’s IOP block is the connec-tion to the PS’ S_AXI_GP0 port (circle“2” in the figure). The “S” denotes thatthis is a slave port—one that canaccept transactions that elements inthe PL initiate. When you make theconnection between the MicroBlazeand the S_AXI_GP0 port, the IOPblock spans the range of 0xE000_0000to 0xE02F_FFFF. This means that no

Cortex-A9Software

Realm

MicroBlazeSoftware

Realm

Shared:Interrupt sources

Clock(s)

Overlap:BRAM addressesIOP addressesDDR memory addresses

Figure 5 – Independent software realms and items of overlap

Page 35: Xcell82

X P E R T S C O R N E R

First Quarter 2013 Xcell Journal 35

these signals, but it can’t influence oralter them. Both the PS and the PLshare the PS’ IOP block, the DDRmemory and controller, and the com-mon dual-port BRAM.

From an address map standpoint,the IOP addresses are fixed in hard-ware and cannot be modified. Boththe MicroBlaze and the Cortex-A9must use the same addresses for anyof these peripherals. The TechnicalReference Manual is especially help-ful in this regard. There are two por-tions of the TRM that are useful fordealing with peripherals in thisdesign: the sections on the specificperipheral (such as Section 8 – Timersand Section 19 – UART Controller),and Appendix “B,” which lists all ofthe registers for each of the peripher-als. Each register in this section is list-ed by its address along with a briefdescription. It is left as an exercise tothe user to manage the resourcesbetween the two processors to avoiddata collisions, starvation and so on.

Use of the DDR memory and itscontroller is a bit easier, since theDDR memory controller is capable ofmanaging collisions and has appropri-ate heuristics to avoid data starvationfor any of the requestors. Since this is

not under user control, the latencymay become a bit unpredictable.

The common BRAM will have iden-tical lower address bits; however, thelocation in each processor’s memoryspace for this block is selectable usingthe address tab in Xilinx PlatformStudio. As before, data collisions, syn-chronization and memory allocationare left as an exercise to the reader.

Our example software design ismore about demonstrating the processof connecting a Cortex-A9 with aMicroBlaze than doing anything prac-tical. A couple of techniques willillustrate some of the possibilities inthis regard.

The Zynq-7000 application shownin the left side of Figure 6 is a simpledesign that uses the Xilinx Standaloneboard support package. It runs a pro-gram that, for the most part, keeps theprocessor idling and periodicallyissues a “T” character (for “tic”) viathe serial port to indicate that it is stillalive. Additionally, it counts the num-ber of elapsed tics and places thiscount in the Block RAM that is sharedwith the MicroBlaze.

The MicroBlaze application alsoruns a Standalone board support pack-age. This approach supports interrupts,

and each time a character appears onthe serial port the MicroBlaze buffersthat character and echoes its Hex-ASCIIequivalent back to the serial port in thePS’ IOP block. Additionally, it builds astring of received characters and placesit in the DDR memory for each charac-ter received. When a carriage return isreceived, a checksum is computed andadded to the DDR memory immediatelyfollowing the string. A flag is set in apredetermined location in BRAM andwhen the Cortex code “sees” this flag, itreads the string from the DDR memoryand verifies the checksum. If the check-sum is correct, the MicroBlaze emits a“+.” Otherwise, it transmits an “X.”

In short, our example designproves that the MicroBlaze and Zynq-7000 PS can coexist peacefully. Thecopious number of AXI connectionpoints provide the MicroBlaze (oranything in the PL) easy access to theperipherals in the IOP block, theOCM and the DDR. The programma-ble logic can easily access interruptsfrom the IOP block, just as a numberof interrupts can be generated in thePL and supplied to the PS. Softwarecoordination and well-defined behav-ior are key to avoiding race condi-tions and addressing conflicts.

PS - Cortex-A9 PL - MicroBlaze

TTC Interrupt

Wait fortic

BRAM

UARTInterrupt

UARTInterrupt

Wait forchar

Increment the tic counterSet the BRAM offset

and address

Get the occurencecounter from BRAM

Set the LEDs Decrementthe LED

Convert to hex ACSII

Send char pair via UART

Build a buffer with checksum and write to DDR

Increment the occurrencecounter

Occurrence= limit?

Send msg via serial port

Validate DDR memory

LEDs

DDR Memory(data space)

Figure 6 – Flow diagram for both the Cortex-A9 application and the MicroBlaze application

Page 36: Xcell82

36 Xcell Journal First Quarter 2013

XPLANATION: FPGA 101

Debugging High-Speed Memories

Debugging High-Speed Memoriesby David J. EastonFounderDesignVerify [email protected]

Page 37: Xcell82

Modern FPGAs often havehigh-speed SRAM andSDRAM memories con-nected to them. These

devices can be difficult to debug toensure error-free operation. Thereare a number of factors that all haveto be correct to guarantee a success-ful working memory design, includ-ing the board layout, the power sup-plies and the memory interface cir-cuitry within the FPGA.

You may encounter several problemswhile debugging SRAM and SDRAMmemories that might leave you scratch-ing your head for days. It is essential tohave a good debug methodology inplace when designing and commission-ing high-speed memory interfaces. Thedesigns that I will be describing here asmodels for an efficient flow usedXilinx® Virtex®-4 and Virtex-5 FPGAs,but the problems and solutions areequally valid for Virtex-6 and 7 seriesdevices too. My colleagues and I usedthe Xilinx Memory Interface Generator(MIG) tool to generate all the memoryIP cores, and the Xilinx ISE® tools tosynthesize, place and route the designs.

Before examining specific designfeatures in detail, let’s take a look atthe Xilinx MIG tool, the memory cali-bration process specific to XilinxFPGAs and the built-in self-test (BIST)circuitry that we used to validate thememory interface designs.

XILINX MEMORY INTERFACE GENERATOR Xilinx provides a Memory InterfaceGenerator tool to generate memoryinterface cores for its FPGA devices.

The MIG tool, which is invoked from theXilinx CORE Generator™, can produceinterfaces for many types of memoriesincluding DDR-II SRAM, DDR2 SDRAMand QDR-II SRAM, to name a few. Figure1 shows a Xilinx CORE Generatorscreenshot detailing the Xilinx familiesand memory technologies supported inMIG3.6.1. There is also a 7 series MIGtool for the newest Xilinx devices.

The MIG relies on a graphical userinterface (GUI) to input the details ofthe memory and FPGA types. The tooloutputs register-transfer-level (RTL)files in either VHDL or Verilog, alongwith a User Constraints File (UCF) forthe memory interface. You can thenintegrate both of these types of fileswith the rest of your design. The MIGalso generates an infrastructure blockthat provides the clocks and resetsrequired for the memory interface. Inaddition, the tool can provide test-bench circuitry with which to gener-ate writes and reads to the memorydevice to verify correct operation. Youcan download a full description of theMIG tool, and details of the FPGAsand memories it supports, from theXilinx website.

CALIBRATION CONCERNSTo ensure that the data is correctlycaptured from the memory device intothe FPGA, the memory interface coreneeds to calibrate the read datapathbefore it can be used. The calibrationprocess takes place automatically onpower-up. Essentially, a data trainingpattern is written to the memorydevice and then continually read back.This is a three-stage process in which

First Quarter 2013 Xcell Journal 37

X P L A N A T I O N : F P G A 1 0 1

Many FPGA designs have high-speedmemory interfaces that can be difficult to debug, but the rightmethodology will ensure success.

Page 38: Xcell82

minimize skew, and also necessary tominimize crosstalk between signals andto provide adequate decoupling for alldevices. The FPGAs and memories havea number of power supplies, which aretypically powered using a mixture ofswitching and linear voltage regulators.It is important to follow the manufactur-ers’ guidelines to ensure correct opera-tion of all of these devices. You willrecoup any time you spend studying thedetails of the schematics and layout ofthe power supplies to the FPGAs andmemories in reduced debug time.

Now, let’s take a closer look at threeexamples of memories that exhibited biterrors when we tested them using theBIST circuit previously described. Allthe examples exhibited similar symp-toms, which were bit errors thatoccurred after a successful calibrationof the memory device. However, in eachexample, the error had a different cause.We found these bit errors by repeatedlyrunning the BIST tester. It should alsobe mentioned that all the boards hadsuccessfully passed JTAG testing.

EXAMPLE 1: VIRTEX-4 BOARD WITH DDR-II SRAMThis board used a Virtex-4 FPGA witha DDR-II SRAM memory operating at200 MHz. The memory interface cali-brated successfully, and then whenthe BIST started running, single biterrors occurred. These errors were

the first stage centers the read-datawindow with respect to the data strobesignal, clocking the data into the FPGA.The second stage ensures that the cen-tered data and clock are synchronizedto the FPGA clock domain so that thedata can be transferred between theinput flip-flops and the flip-flops in theFPGA fabric. The third stage then pro-vides a read-valid signal. The XilinxMIG documentation offers full descrip-tions of the calibration process for thedifferent types of memories.

CLOCK GENERATOR AND BISTWe generated all of our memory inter-face designs using the Xilinx MIGtools. But we diverted from the MIGflow in a couple of instances. TheFPGAs that we were using resided onboards that are designed for a varietyof users and for a variety of applica-tions. For this reason, we designed acustom clock generator and reset mod-ule to provide all the clocks and resetsrequired for user applications. Thismeant removing the MIG-generatedinfrastructure module and supplyingthe clocks and resets to the memoryinterface from our custom module.

We also designed our own customBIST module to fully test the memoriesby writing and reading every memorylocation using various data patterns,including 0s, Fs, As, 5s, walking 0s,walking 1s, sequential data and

38 Xcell Journal First Quarter 2013

X P L A N A T I O N : F P G A 1 0 1

pseudorandom binary sequence(PRBS) data. The BIST circuitry runsunder control of a host interface, andreports back any errors found. TheBIST circuitry can store the address ofany data errors found, and also signalswhich data bit or bits are in error.Figure 2 shows a block diagram of thegeneric design, with the MIG core beingspecific to the memory technology onthe board—that is, DDR-II SRAM,DDR2 SDRAM or QDR-II SRAM.

It bears repeating that for anyFPGA memory interface to operatecorrectly, it is essential that you followthe PCB layout recommendations forboth the FPGA and the memorydevice. It is important to match thetrack lengths between the devices to

Clock Generator& Reset

HostInterface

MIGCore

BISTHost Memory

FPGA

Figure 1 – The CORE Generator MIG tool serves many generations of Xilinx FPGAs.

Figure 2 – The MIG core in this memory interface block diagram will be specific to the memory technology on the board.

Page 39: Xcell82

X P L A N A T I O N : F P G A 1 0 1

First Quarter 2013 Xcell Journal 39

cropping up at different memory loca-tions and in different bit positionswithin the data word.

The first thing we did was to meas-ure all the power supplies to the FPGAand memory, and we found them all tobe within the correct voltage specifica-tion. We then checked that the oscilla-tor on the board was the correct fre-quency, and that there was no excessivejitter on the signal. Next, we decided tocheck the voltages again, using an oscil-loscope when the BIST was running.

The memory power supply VDD issupplied by a 1.8-volt regulator, andwe discovered that the output voltagefrom the regulator had excessive rip-ple that was going out with the speci-fication of the memory device. Wechecked the regulator circuit andfound that it had an incorrect valuefor the output-smoothing capacitor,being only 10 percent of what wasrequired. When we replaced it withthe correct value, the ripple droppedto acceptable levels. As a result, thevoltage came within the specificationand the bit errors disappeared.

EXAMPLE 2: VIRTEX-5 BOARD WITH DDR2 SDRAMOur second example board used aVirtex-5 FPGA with DDR2 SDRAMmemory operating at 250 MHz. Thememory interface calibrated successful-ly, but when the BIST started running,single bit errors occurred. These weretaking place at different memory loca-tions and in different bit positions with-in the data word, but then gradually,every bit within the data word failed.

Just as in the first example, wechecked all the power supplies andclocks on the board and found these tobe within specification. Again, wedecided to check all the power supplyvoltages when the BIST was running.The memory devices have a voltagereference input VREF, supplied by avoltage regulator. This regulator wasalso supplying some other circuitry.When all circuits were active, the reg-ulator was slightly overloaded and the

voltage on VREF was falling below thespecification. We modified the regula-tor circuitry to power it using a differ-ent supply, and that change kept theVREF voltage within specification.Once again the bit errors disappeared.

EXAMPLE 3: VIRTEX-5 BOARD WITH QDR-II SRAMOur third board used a Virtex-5 FPGAwith a QDR-II SRAM memory operat-ing at 250 MHz. In this case, there weretwo failure mechanisms. The first wasthat occasionally the memory interfacewould fail to calibrate. The memoryinterface outputs a cal_done status bitthat is asserted when calibration com-pletes. The user needs to monitor thisprocess to determine that the memoryinterface is ready to use.

The second failure that we were see-ing was that even though the memoryinterface completed the calibrationstage, repeated BIST testing was report-ing bit errors. Once again, we checkedthe voltages and clocks on the board,and then we started looking at the FPGARTL design. Because the calibration wasfailing, this pointed toward somethinggoing awry shortly after the power-up orupon reset of the FPGA, as the calibra-tion happens immediately afterward.

The QDR-II memory interface thatthe Xilinx MIG tool generates has aSIM_ONLY generic, which is set at thetop-level module when it is instantiatedin the design. Its purpose is to bypass a200-microsecond power-on delay dur-ing simulation, so as to speed up yoursimulations. When the memory inter-face is used in hardware, you must setthe SIM generic at 0 to enable the 200-µsdelay, and when simulating you shouldset it to 1 to bypass it.

The QDR-II memory device has inputclocks K/K#, which are provided by thememory interface, and output clocksCQ/CQ#, which are generated by aninternal phase-locked loop (PLL) in thememory device and are also referencedto the K/K# clocks. The CQ/CQ# outputclocks are used to clock the data intothe FPGA. The PLL needs to be locked

correctly, and this typically requiresaround 20 µs of stable K/K# clocks.

What we discovered was that we hadincorrectly set the SIM_ONLY genericso that the calibration process was run-ning immediately after the system reset,at which point we were also just start-ing to output the K/K# clocks. The cali-bration process was running before thePLL was locked, and the CQ/CQ# out-put clocks would not be stable.Sometimes the calibration would pass

Visual Board Inspection

Check Power Supplies

JTAG Test Board

Check Clocks and Frequencies

Check FPGA Timing Closure

Check FPGA Reset Methodology

Run Built-In Self-Tests

Check Power Supplies During BIST

Run BIST Over Temperature

Figure 3 – A typical debug flow like this one will uncover problems in both

hardware and firmware.

Page 40: Xcell82

The problems we found in ourthree examples were relativelystraightforward to unearth and cor-rect, but this is not always the case.Both hardware and firmware can bethe source of problems in FPGAdesigns involving high-speed memory.

Figure 3 outlines a typical debugmethodology to use. Unfortunately,sometimes things go awry when thepressure of commissioning new hard-ware kicks in, and a few things may fallthrough the cracks—especially theSIM_ONLY generic!

DesignVerify Ltd. is a Scottish-based consultancy specializing in thedesign and verification of FPGA andCPLD devices. For more information,see www.designandverify.com.

and sometimes it would fail, but evenwhen it passed it would not have beencompleted optimally; hence the biterrors during subsequent BIST runs.Correcting the SIM_ONLY genericallowed 200 µs for the K/K# clocks tostabilize. Once we had corrected theSIM_ONLY generic, our calibration andbit-error problems disappeared.

I am embarrassed to admit that thisSIM_ONLY issue has caught me outmore than once, much to the amuse-ment of colleagues. To avoid the sameproblem, make sure that the top-leveldesign has this generic set to alwaysenable the power-on delay, and thatthe simulation testbench always hasthe delay skipped while simulating. Donot design it so that an engineer has to

40 Xcell Journal First Quarter 2013

X P L A N A T I O N : F P G A 1 0 1

remember to modify the SIM_ONLYgeneric before either simulating orbuilding a design for hardware testing.He or she will forget.

TIPS ON METHODOLOGYA good methodology—in our case, usingPCBs with the board layout done cor-rectly—is mandatory when designingand commissioning high-speed memoryinterfaces. We also used FPGA designswith timing constraints correctlyapplied to the complete design, andensured that we achieved timing closureusing the ISE tools. It cannot be stressedenough that if you have not achievedtiming closure, then all manner of timingerrors may surface and obfuscate anyunderlying problems.

www.alpha-data.com · 4 West Silvermills Lane, Edinburgh, EH3 5BD, UK · +44 131 558 2600

[email protected] · 3507 Ringsby Court, Suite 105, Denver, CO 80216 · (303) 954 8768

Camera Link: dna srossecorp AGPF lufrewop neewteb noitcennoc eht sedivorp KNILAREMAC-CMF s’ataD ahplA

a wide range of cameras, image sensors, and scientific instruments. Providing base, dual base, medium and full

Camera Link modes, the FMC-CAMERALINK offers the ability to implement applications such as

frame-grabbers, camera emulators, machine vision, digital video and image processing

systems in the FPGA.

XRM-CLINK-MINI

XRM-CLINK-GIGE

FMC-CAMERALINK

ADM-XRC-7V1

ADPE-XRC-6T

DESIGN · DEVELOP · DEPLOY

Combined with Alpha Data’s latest Virtex-6 and 7-Series FPGA boards, and

Camera Link IP blocks (7-Series coming soon), it provides a total solution for

computer vision, scientific imaging, and defense surveillance applications

enabled with Camera Link.

Page 42: Xcell82

42 Xcell Journal First Quarter 2013

XPLANATION: FPGA 101

Nuts and Bolts ofDesigning an FPGAinto Your Hardware

by Adam TaylorPrincipal EngineerEADS [email protected]

Page 43: Xcell82

To many engineers and proj-ect managers, implementingthe functionality within anFPGA and achieving timing

closure are the main areas of focus.However, actually designing the FPGAonto the printed-circuit board at thehardware level can provide a numberof interesting challenges that you mustsurmount for a successful design.

It all begins with the architecture.The initial stage of hardware develop-ment involves defining the architectureof the solution. This architecture shouldbe a response to the system require-ments, detailing how they are to beimplemented within the hardware.Although this architecture will varyfrom system to system—and each sys-tem will vary from application to appli-cation—many will contain similar archi-tectural blocks. You can and shouldreuse frequently needed hardware mod-ules, in just the same way that you reusecommon HDL modules.

Figures 1 and 2 show examples ofan overall architecture and a powerarchitecture, respectively, while thesidebar lists common considerationsto take into account when designingan FPGA-based system.

DEVICE SELECTIONThe most important choice you willface at the outset is selecting the cor-rect FPGA from the large numberavailable. There are many factorsthat will drive the choice of thedevice. The first and most importantfactor is the resources available with-

in the FPGA and whether they aresufficient to implement the function-ality you desire at the operating fre-quency required.

These parameters will quickly thindown the field to a manageable numberof suitable devices, in a process thatallows you to apply further selectioncriteria to these correctly sized FPGAs.Another driving factor will be addition-al resources your system might require,such as DSP slices or multipliers,embedded processors or high-speedserial links. The availability of theseresources should further narrow thefield in terms of device selection. Inmany cases, the list of resources willlead you to subfamilies of devices, forexample, the Xilinx® Spartan®-6 LX orSpartan-6 LXT if logic or high-speedserial links are required.

The number of inputs and outputsrequired from the device will driveboth the device selection and thepackaging selection, as it is possible toget the same device in several differ-ent packages, each providing a differ-ent number of user I/Os. Here, it’soften worth your while to consider anupgrade path and choose a footprintthat is common across a number ofdevices within the selected family.

You must also consider the operat-ing environment when selecting adevice. For example, is a commercialcomponent sufficient, or do you needto consider an industrial, medical orautomotive part? In some cases, yoursystem may require a defense- orspace-grade component.

First Quarter 2013 Xcell Journal 43

X P L A N A T I O N : F P G A 1 0 1

Many engineers feel the job is doneonce they’ve defined the functionalityof their FPGA. However, insertingthat FPGA into your PCB can provideits own set of challenges.

Page 44: Xcell82

FPGA, along with the dynamic currentonce the device is configured andclocking.

Power estimation of the design isthus very important and is requiredearly in the project to correctly sizethe power architecture. Xilinx pro-vides power estimation spreadsheets,which you can download fromhttp://www.origin.xilinx.com/prod-

ucts/design_tools/logic_design/xpe.htm.Within these spreadsheets, you canselect the FPGA resources’ clock ratesand toggle rates along with environ-mental parameters for the chosendevice. If you are not sure, err on theside of caution—be pessimistic ratherthan optimistic in your estimates.Once you have determined the powerrequirements for the FPGA, you canmerge them into a larger, overallpower budget for the system and use itto determine the power architecture.

In systems that require large cur-rents, it is advisable to utilize switchingDC/DC converters to maintain overallefficiency and ensure that the thermaldesign of the unit is not complicated.For lower-current requirements and

Then too, you must also take the con-figuration architecture into account.Are SRAM-based FPGAs acceptable forthe application or is a nonvolatile solu-tion like the Xilinx Spartan-3AN series abetter fit? Security of the design shouldalso factor into your deliberations. Ifyour application needs it, you shouldconsider preventing readback andencrypting the data stream.

As always, the cost of the compo-nent will also be a driver, regardless ofwhether the application is to be pro-duced in volume or as a one-off,bespoke spaceflight design. Youshould have a target cost budget andstrive to ensure you achieve it.

A number of the above parametersare also important when consideringother devices for use within the sys-tem. In particular, one important crite-rion is selecting components thatoperate off the same voltage rails, thussimplifying the power architecture.

You should also attempt where pos-sible to derate components within thedesign. All manufacturers’ componentdata sheets provide absolute-maxi-mum and operating electrical stresses

44 Xcell Journal First Quarter 2013

X P L A N A T I O N : F P G A 1 0 1

for the device in question. If wedesigned a system such that a devicewas operating with an electrical stressjust below the absolute maximum, thereliability of the design would plummetbecause the system would be function-ing outside the recommended operat-ing conditions. Depending upon theend application, this failure couldresult in anything from the loss oflife/mission to your company’s gaininga bad reputation for quality. To pro-duce reliable equipment, you mustreduce the electrical stress upon thedesign. You should therefore considerthe electrical stresses that devices willbe subjected to when selecting one foryour application.

POWER ARCHITECTURE After the selection of the device, thepower architecture is the next mostcritical area for a successful project,and one that is often overlooked. TheFPGA will have a core voltage that willvary from 0.9 to 1.5 V for modernFPGAs. Often these devices willrequire a quiescent current that can bepretty large for a high-performance

FPGA

Config Memory NVRAM

Power Supplies

Core Voltage

I/O Voltages

Misc. Voltages

Ad

dress

Data

Co

ntro

l

Config

CANIF

RX Data

TX Data

Reset

Clock

Clock[7:0]

DisplayIF

Syncs

Red

Green

Blue

JTAG Connector

JTAG

Pixel Clock

Figure 1 – A typical overall system architecture

Page 45: Xcell82

X P L A N A T I O N : F P G A 1 0 1

First Quarter 2013 Xcell Journal 45

systems that must be particularlynoise free—namely, those supplyingpower rails for high-speed serial linksor sensitive ADC and DAC compo-nents—you may need to implementlinear voltage regulators and addition-al filtering. In all cases, you shouldthoroughly read the data sheet of thedevice in question.

Once you have completed theFPGA design, you can obtain a more-detailed power estimation by usingthe XPower Analyzer within theXilinx ISE® design suite. This stepshould help you close the loop interms of the power architecture.

In addition, you should also consid-er the supply decoupling of devices onthe board. Modern devices are capa-ble of switching many times fasterthan the power supplies regulating thevoltage. Without decoupling capaci-tors, this situation could lead to a dipin the power rail while the regulatorattempts to keep up. Ideally, select thedecoupling capacitors’ values suchthat the impedance profile they pres-ent when combined with inter-PCBplane capacitance is below 0.1 ohm,between 100 kHz and 1 GHz, if possi-

ble. To achieve this performance, youwill need to use a range of decoupling-capacitor values, each with a differentself-resonant frequency. This techniqueenables devices with a poor power sup-ply rejection ratio (PSRR) to achievethe best possible performance, asPSRR tends to decrease with operatingfrequency on some devices.

Finally, try to minimize the number ofadditional voltage rails. Doing so willreduce the complexity of the solution.You also need to be aware of any powerrail sequencing or ramp rates, andensure that your solution meets them.

CLOCKS AND RESET TREE Your system will require at least oneclock to operate. It is normal to use alogic-level oscillator at the desired fre-quency. When considering an oscillator,there are a number of factors to takeinto account. The desired output fre-quency and stability of the oscillator arekey parameters. Oscillator stability isusually given in parts per million; typicalPPMs are +/-50 PPM or +/-100 PPM.

A low phase noise and jitter are alsooften required for oscillators drivinghigh-speed serial links or providing

clocks to ADCs or DACs. Keepingnoise and jitter low is important, aslow noise reduces the bit-error rate ofthe high-speed link, while high noiseand jitter will increase the noise flooron the ADC and DAC, reducing thesignal-to-noise ratio.

You should also consider the outputsignal standard. Differential signalingsuch as LVDS or LVPECL provides for abetter noise immunity than single-ended LVCMOS or LVTTL outputs.Differential signaling also lessens EMIconcerns and delivers faster rise andfall times. However, differential signal-ing can have a worse phase-noise per-formance than a single-ended output.High-performance systems will usesine wave oscillators as the primaryclock source to reduce the effects ofphase noise and jitter; this is especiallytrue when clocking ADCs and DACs.

Since FPGAs can support severalclock domains internally, it is there-fore common these days for systemsto have more than one clock domain.Often, designers will use a high-speedclock and a lower-rate clock if theycannot achieve the necessary divisionvia internal clocking resources such as

IsolatingDC/DC

IP Voltage Switching Regulator

Switching Regulator

Switching Regulator

Intermediate

VoltageCore Voltage

I/O Voltage

Misc. VoltageLinear

Regulator

Reset Reset

Figure 2 – Typical power architecture

Page 46: Xcell82

as Xilinx’s PlanAhead™ to pull in thepin allocation and check for I/O bank-ing rule and placement violations. Thisis especially helpful when using high-speed serial links, as you need toensure that user I/O around the supplypins for these HSSLs is not used.Otherwise, inadvertent noise couldcouple onto the supply rail and affectthe performance of the link.

You should also consider the signalintegrity (remember, it is not the sig-nal frequency but the edge rate thatdetermines the length of trace beforeit will need termination). Most FPGAssupport on-chip termination, reduc-ing the need for discrete components.Of course, this approach works cor-rectly only if you have defined theprinted-circuit-board stack-up as con-trolled impedance.

DLLs, DCMs or PLLs. Alternatively,varying protocols or algorithms couldrequire different clock frequencies,creating multiple clocks and domainswithin the design. Figure 3 shows atypical clock tree.

Many FPGA designs will alsorequire a reset that is connected to anumber of devices to ensure they areinitialized properly following thepower-up of the system or when com-manded, for example via a pushbut-ton. Resets can cause many issues in adesign that can be tough to get a gripon, since they occur only randomly.Once the power rails have reachedtheir operating range, the oscillatorson the board will begin to function,though it will take time for them tobecome stable. It is therefore a goodidea to ensure the reset on the board isasserted until the voltage rails haverisen and the oscillator is stable.

Because this signal will be subject-ed to a potentially high fan-out withinthe FPGA, it’s best to filter the signalheavily to ensure that glitches and EMIcannot result in accidental resets ofthe system.

You must also take care within theFPGA design to ensure that the resetis connected only where it is reallyrequired (remember, with SRAM,FPGA registers and RAMs can be ini-tialized as part of the configuration).Also, take care within the FPGA toensure that the reset removal cannotlead to a metastable event—that is,that its removal near a clock edgecauses metastability. It is thereforeimportant to correctly synchronizethe reset to all clock domains.

Be sure to allocate the clocks andresets to the correct input style. Clockscan be allocated to global or localclock resources, and the hardware

46 Xcell Journal First Quarter 2013

X P L A N A T I O N : F P G A 1 0 1

engineer will have to work closely withthe FPGA engineers to ensure the cor-rect global resources are used.

I/O STANDARDS AND SIGNAL INTEGRITY FPGAs are capable of supporting a num-ber of single-ended and differential I/Ostandards. Planning the I/Os of a deviceand ensuring that the banking rules aremaintained can be a major challenge.This stage of the design requires closecooperation between the FPGA engi-neer and the hardware designer.

This is where the benefits of mini-mizing the voltage rails and interfacetypes prove themselves. Once youhave completed the schematic designor used a tool like Mentor Graphics’I/O Designer to successfully plan thedevice’s I/Os, you can use tools such

ClockSelection

Prime Clock

Redundant Clock

ClockFan-Out

Clock[0]

Clock[1]

Clock[2]

Clock[3]

Clock[4]

Clock[5]

Clock[6]

Clock[7]

PixelClock

Pixel Clock

Figure 3 – Typical clock tree

Many FPGA designs will also require a reset that is connected to a number of devices to ensure they initialize properly. Resets can cause issues that can be tough to get a grip on, since they occur only randomly.

Page 47: Xcell82

X P L A N A T I O N : F P G A 1 0 1

First Quarter 2013 Xcell Journal 47

MECHANICAL AND THERMAL CONSIDERATIONSThe hardware engineer will alsohave to work closely with themechanical engineers involved inthe project to determine the physicalboard outline, positions of connec-tors and the like. Modern FPGAs canget pretty hot, and require thermalconsiderations. This will impact themechanical design, maybe requiringforced airflow (fans) or thermalstraps and heat sinks to achieve thedesired operating environment.

If the application is to be based in aharsh or rugged environment, themechanical design will add more con-straints to the PCB placement, as youwill need to take shock and vibrationlevels into account. Depending uponthe tool sets used, the mechanical andPCB engineers should be able totransfer information as DXF filesbetween CAD systems to check thatboard outlines, keep-out zones andskylines do not clash.

PCB LAYOUT The printed-circuit-board layout isone of the most critical areas inachieving the design performancerequired of the system. For any high-performance system, the board mustbe a controlled-impedance, multilay-er PCB. Achieving this requires dis-cussion between the PCB layoutengineer and the PCB supplier todetermine the stack-up of the board.The stack-up and correspondingtrace widths and separations are crit-ical to implementing a controlled-impedance board. Within most high-performance systems, you must cal-culate both microstrip and striplinecorrectly to provide a controlledimpedance of 50 Ω for single-endedsignals and 100 Ω for differential sig-nals. This enables the hardware engi-neer to correctly terminate any trans-mission lines that occur using eitheron-chip or discrete termination.

It is also important within thestack-up to ensure that the voltage

planes and the return planes are closeto each other in order to allow capaci-tive coupling. This setup enables thesubstrate of the PCB between theplanes to act like a decoupling capaci-tor, adding the decoupling perform-ance. Figure 4 shows a typical imped-ance-control multilayer PCB stack-up.

You must also specify a series ofconstraints to the layout engineer at

this stage as well. These will guide theplacement of components, separatinganalog from digital. The list should alsospecify constraints on the groupingand length matching of buses and sig-nals within the design. To aid with thistask, you might find it helpful to alsospecify a time of flight for the signal.

In addition, don’t forget to consid-er crosstalk between signals within

1. The FPGA itself: The system requirements and functions will dictatethe size, technology and speed grade, while other parameters (high-speedserial links, embedded processors, etc.) will dictate the FPGA choice.

2. Power architecture: In evaluating the power architecture, be sure toconsider the numerous voltage rails needed, the efficiency of the overallsolution and—particularly in more-sensitive applications—the noisepresent on these rails.

3. Configuration: Modern FPGAs require a large nonvolatile memory tostore the configuration data. As these devices have several methods of con-figuration, you need to choose the one that’s most optimal for your design.

4. JTAG port and chain: You will need these to configure both the FPGAand the configuration memory, as well as for boundary-scan testing of thehardware following manufacture to identify production issues.

5. Clocks and reset tree: Providing the different clocks required for theFPGA design and the reset tree will ensure the design is reset correctly.

6. Application-specific interfaces. Depending on your design, thesemight include communications links such as Ethernet, PCI Express™,CAN and RS422 or RS232, allowing the solution to talk to other compo-nents of the system.

7. Application-specific devices: Typically, most designs use discrete devicesinterfacing to the FPGA. Commonly used devices are A/D and D/A converters.

8. Application-specific memories. Use these devices to store large quan-tities of data during processing or application-configuration data, such asimage overlays for an imaging system. Examples include high-capacitymemories (double-data-rate or quad-data-rate SRAMs) and nonvolatilememory—such as flash or FRAM devices—for storing data that must beretained when the system loses power.

9. Operating environment. Attention to this parameter will ensure youpick the correct grade of components.

9 THINGS TO TAKE INTO ACCOUNT WHEN DESIGNING YOUR HARDWARE

Page 48: Xcell82

the PCB. It’s important to separatesignal traces correctly so as to reducesignals on the same layer and ensurethat signals on the layers above orbelow within stripline structures donot have long parallel runs, as this toowill be a source of crosstalk. It is agood idea if possible to declareimpedance-controlled layers betweenplanes as vertical or horizontal rout-ing to ensure that tracks on differentlayers do not have long parallel runs.Increasing the space between twotraces on a PCB will reduce thecrosstalk; a good rule of thumb is tokeep traces separated by at least threetrack widths. However, for more-criti-cal signals, increase the distance tofive or even seven track widths.

Be sure to keep analog and digitalsignals as far apart as possible, andwhenever you can, use separate ana-log returns and power supplies.Neither digital nor analog signals

48 Xcell Journal First Quarter 2013

X P L A N A T I O N : F P G A 1 0 1

Voltage / Ground Plane

Voltage / Ground Plane

Voltage / Ground Plane

Voltage / Ground Plane

Micro Via and Breakout

Top (Component Mounting) Layer 1

Layer 2

Layer 3

Layer 4

Layer 5

Layer 6

Layer 7

Layer 8

Layer 9

Layer 10

Layer 11

Layer 12

Layer 13

Layer 14

Micro Via and Breakout

Bottom (Component Mounting)

50-Ohm

Single-Ended

Signals

50-Ohm

Single-Ended

Signals

100-Ohm

Differential

Signals

Figure 4 – Typical 14-layer impedance-controlled PCB stack-up

Figure 5 -- Results of a DC power drop analysis for a space-grade Virtex-5 device

Page 49: Xcell82

X P L A N A T I O N : F P G A 1 0 1

First Quarter 2013 Xcell Journal 49

should be routed referencing theother’s plane, as it complicates thereturn-current path and will degradethe system performance.

One of the most important stagestoward the end of the layout, oncethe planes have been entered, is thevalidation of both the signal integrityof the completed design and thepower integrity, using tools likeMentor Graphics’ HyperLynx SI andPI. This validation my be an iterativeprocess—more so for the powerintegrity, as the high currents theFPGA core requires may necessitateadjusting more planes to achieve therequired voltage performance of theplanes. Figure 5 shows the results ofa PI simulation for a large, high-per-formance FPGA.

TEST ISSUESThe engineer must also give somethought to how the board and its per-formance will be verified as achiev-ing all requirements. You can use theJTAG controller and boundary-scantesting to test the infrastructure andinterconnectivity of most digitaldevices and drivers. However, youmust also consider the performanceof power, clock and reset circuits.Test headers or points connected tovoltage rail outputs allow quick andeasy checking of the power architec-ture. (Make sure you use a seriesresistor to limit the current should

the test point accidentally becomeshorted to something.) You can usevery low-value resistors (in the mil-liohm range) to determine the cur-rents downstream by the FPGA andsupporting devices. Similarly, if yourdesign uses clock buffers and there isa spare output, it is wise to correctlyterminate this output to allow for test-ing of the clocks.

Debug LEDs to show the status ofthe FPGA done line, and others thatthe application will use, can be ofgreat assistance in terms of debug-ging as the board is commissioned.

It may also be necessary to addlogic-analyzer or pattern-generatorheaders to the board. Make surethese are easily accessible not only atopen-layer testing but also at systemtesting, when the unit is within itsenclosure.

Tools like Xilinx’s ChipScope™will help you debug the FPGA func-tionality and interfaces, and candetermine the bit-error rate on high-speed serial links.

Designing FPGAs into the hard-ware of the system can present anumber of interesting challenges foran engineer. A number of tools areavailable to assist you in this task,while numerous data sheets and userguides can also provide guidance. Butthe best resource is a solid under-standing of the basic hardware ele-ments of your system.

One of the most important stages toward the end of the layout, once the planes have been entered, is the validation of both the

signal integrity of the completed design and the power integrity, using tools like Mentor

Graphics’ HyperLynx SI and PI. This validationmy be an iterative process.

Common Features• On-Board Power Supplies• Very Low-Cost• Long-Term Available• Open Reference Designs• Ruggedized forIndustrial Applications

• Customizable• Custom Integration Services

Development Services• Hardware Design• HDL Design• Software Development

TE0630Xilinx Spartan-6 LX (USB)

www.trenz-electronic.de

GigaBeeXilinx Spartan-6 LX (Ethernet)

• Scalable:45 k to 150 k Logic Cells

• Hi-Speed USB 2.0• Very Small: 47.5 × 40.5 mm• Compatible with TE0300

• Scalable:45 k to 150 k Logic Cells

• Gigabit Ethernet• 2 Independent DDR3Memory Banks

• LVDS I/Os• Very Small: 40 x 50 mm• Coming Soon: Version

Page 50: Xcell82

50 Xcell Journal First Quarter 2013

TOOLS OF XCELLENCE

Using the Parallel FFTfor MultigigahertzFPGA Signal Processing

Page 51: Xcell82

First Quarter 2013 Xcell Journal 51

Very high-speed fast Fouriertransform (FFT) cores are anessential requirement for anyreal-time spectral-monitoringsystem. As the demand formonitoring bandwidth grows in

pace with the proliferation of wireless devices indifferent parts of the spectrum, these systemsmust convert time domain to spectrum ever morerapidly, necessitating faster FFT operations.Indeed, in most modern monitoring systems, it isoften necessary to use parallel FFTs to run at sam-ple throughputs of multiple times the pace of thehighest clock rate achievable in state-of-the-artFPGAs like the Xilinx® Virtex®-7, taking advan-tage of wideband A/D converters that can easilyattain sample rates of 12.5 Gigasamples/secondand more. [1]

At the same time, as communications proto-cols become increasingly packetized, the dutycycles of signals that need to be monitored aredecreasing. This phenomenon requires a dramat-ic decrease in scan repeat time, which necessi-tates low-latency FFT cores. Parallel FFTs canhelp in this regard as well, since the latencyscales down almost proportionally to the ratio ofsample rate to clock speed.

For all of these reasons, let’s delve into thedesign of a parallel FFT (PFFT) with runtime-configurable transform length, taking note of thethroughput and utilization numbers that areachievable when using parallel FFT.

HARDWARE PARALLELISM FOR FFTSDue to the complexity of implementing FFTsdirectly in logic, many hardware designers useoff-the-shelf FFT cores from various vendors.[2] However, most off-the-shelf FFT cores use“streaming” or “block” architectures thatprocess only one or fewer samples per clock,which limits the throughput to the maximumclock speed achievable by the FPGA or ASICdevice. A PFFT offers a faster alternative. APFFT can accept multiple samples per clockand process them in parallel, to deliver multipleoutput samples per clock. This architecturemultiplies the throughput beyond the achiev-able device clock speed, but comes at an addi-tional cost in area and complexity. Thus, to usea PFFT you will have to make trade-offs inthroughput vs. area. The trade-offs for a typicalVirtex-7 FPGA design are outlined in Figure 1and Table 1.

T O O L S O F X C E L L E N C E

by Chris EddingtonSenior Technical Marketing ManagerSynopsys [email protected]

Baijayanta RayCorporate Application Engineer for Synphony Model CompilerSynopsys [email protected]

Page 52: Xcell82

Looking at the table, a few general fea-tures can be seen in the trade-off curve:

1. As parallel throughput increases,multiplier (area) utilization increas-es, with a slightly lower multiple(better than linear).

2. Slower system clocks and timing clo-sure yield sublinear throughputgrowth as parallelism increases.However, on modern FPGAs thisdegradation is diminishing.

3. Overall better-than-linear through-put/area growth is realized due to No.1 and No. 2 above.

4. Latency decreases as parallelismincreases.

Note that the specific numbers meas-ured in Table 1 are valid only for agiven target and configuration of theFFT. In this case, that is a length of1024, with 16-bit input, dynamic lengthprogrammability (4 through 1024) andflow control. Flow control is veryimportant for applications such asspectral monitoring, where side-chan-nel information is often utilized tochange the FFT size (in order tochange the resolution bandwidth) orto temporarily stall the FFT while

other operations, such as acquisition,are going on. In theory, you canaccomplish flow control by insertingbuffers before the transform opera-tion. But for acquisition-driven opera-tions like spectral monitoring, it’s noteasy to precompute the size of thebuffer required, resulting in the needto maintain large, fast and expensivememory banks.

IMPLEMENTATIONARCHITECTURE While there are a number of ways toimplement FFTs, a parallelized versionof the Radix2 Multi-Path DelayCommutator kernel (Radix2-MDC) [3]works very well as a modular methodto create configurable parallel-FFTcores that scale well in advancedFPGA devices. The Radix2-MDC is aclassical approach to buildingpipelined FFTs of varying lengths, asshown in Figure 2a for a 16-length FFT.It breaks the input sequence into twoparallel data streams flowing forwardwith the correct “distance” betweendata elements that are entering the but-terfly (a subelement of FFT algo-rithms) and that are scheduled byproper delays. The Radix2-MDC is rel-atively easy to parallelize using a widerdata path and vector operations, asshown in Figure 2b. MDC structuresalso lend themselves easily to flowcontrol and dynamic length reconfigu-

T O O L S O F X C E L L E N C E

Multiple outputs/clk

Optional flow controlsynchronization, and

dynamic lenghth inputs

Multiple outputs/clk

Optional flow control andsynchronization outputs

System clock

ParallelFFT

Figure 1 – A parallel FFT processes multiple samples at a time to scale throughput beyondachievable system clocks of the target device. Optional features include flow control,

synchronization and dynamic length programmability.

FFTArchitectureLength 1024

16-bit

Number ofComplex Input

Samples

Max SystemClock

Achieved

FFTThroughput(Samples/s)

HardwareMultiplierUtilization

Latency(SystemCycles)

Streaming

Parallel x2

Parallel x4

Parallel x8

Parallel x16

1

2

4

8

16

500 MHz

500 MHz

490 MHz

490 MHz

440 MHz

500 MS/s

1 GS/s

1.968 GS/s

3.92 GS/s

7.088 GS/s

32

64

128

260

408

1260

630

360

220

145

Table 1 – Area scalability is generalized by hardware multiplier utilization. Throughput scalability vs. area is slightly better than linear and generally very usable for increasing throughput to multigigahertz sample rates.

Typical performance and area trade-offs for parallel FFTs on Virtex-7 class devices

52 Xcell Journal First Quarter 2013

Page 53: Xcell82

ration, as opposed to single-path delayfeedback (SDF) structures, where theincorporation of flow control (stall)signals typically reduces maximumthroughput considerably.

Another choice that can affect scal-ability is the complex-multiplierimplementation—that is, either 4mul-tiply (4M) or 3multiply (3M) struc-tures. Choosing a 3M complex multi-ply often leads to lower area usage inyour design, but at the expense ofslower clock speeds. [4] This trade-offis also very dependent on the DSPhardware of the FPGA device. Beloware the most important parametersand the choices we made in the casestudy that we are about to present:

• Length = 1024

• Input precision = 16 bits

• Radix2-MDC architecture using4Mult-5add complex multipliers

• Data path precision = 1-bit growthper stage (10 stages / bits forL=1024)

• Dynamic length programmabilityincluded

• Optional flow control and synchro-nization turned on

CASE STUDY: A 1024-LENGTHPARALLEL FFT For our case study, we used Synopsys’Synphony Model Compiler (SynphonyMC or SMC) high-level synthesis tool [5]to explore a set of parallel-FFT resultsfor Xilinx Virtex-7 and Virtex-5 FPGAs.

Synphony MC has a parallel-FFTIP block that is designed specificallyto achieve high-quality-of-results(QoR) DSP mapping for advancedFPGA devices like the Virtex-7. Thiscustom IP block instantiates arith-metic primitives in the subsystemunderneath to create an architecture

based on user-specified options suchas length, precision, flow controland dynamic programmability.

Using the vector support and acustom block methodology inSynphony MC, we were able to createa concise, parameterizable parallelRadix2-MDC block that enabledquick instantiation of parallel FFTsacross different configurations.Figure 3 shows a subset of a 1024x16PFFT data path using these customblocks (called PR2MDC). Each blockinputs and outputs a length-32 vectorrepresenting 16 complex samples ofreal and imaginary values, and alength and twiddle-factor addressparameter. The fixed-point wordlength grows 1 bit in each stage, butis parameterized for easy adjustment.Underneath this block is the imple-mentation of Figure 2b usingSynphony MC arithmetic operators(multiply, add, shift registers, etc.).

T O O L S O F X C E L L E N C E

First Quarter 2013 Xcell Journal 53

C2

8 4 2 1

BF2 BF2 BF2

4

C2 C2 C2BF2

2 1

a) Radix2-MDC for 16x1 streaming FFT

b) Parallel Radix2-MDC for NxK parallel FFT

K parallelcomplexinputs 2K

2K Delay line

Delay line

+

-

Butterfly #P

2KMux

2K 2KTwiddlefactor mult

K parallelcomplexinputs

K parallelcomplexoutputs

K parallelcomplexinputs

Butterfly#1

Butterfly#2

Butterfly#2

Butterfly#log2(N)-k

K pointAll Parallel FFT

Figure 2 – The Radix2-MDC kernel (a, at top) can effectively be parallelized and used in

a modular way to create parallel-FFT implementations (b).

Page 54: Xcell82

Although SMC is a high-level designenvironment, you can use it to achievespecific mapping for a variety of targettechnologies. In this case, we chose a4M complex multiply that maps intoone DSP48E unit using 18x25 multiplymode. The block is designed to daisy-chain the twiddle-factor lookup forperformance and flow control capa-bility (note that Figure 3 does notshow flow control). The SMC parallel-FFT core does not show any signifi-cant timing degradation between turn-ing flow control on or off, or usingdynamic length programmability; thethroughput scalability it offers on

advanced FPGA devices is a useful fea-ture when building real-time spectral-monitoring applications.

We used Synphony MC to generateRTL optimized for Virtex-5 andVirtex-7 targets, and then used the2012.09 release of Synphony ModelCompiler and Synplify Pro withXilinx’s ISE® 14.2 tool suite for place-and-route. Table 2 shows the post-place-and-route area and timingresults. The observed scalabilitymatches very well to theoreticalexpectations and the FPGA family’scapabilities, achieving better than 7Gsamples/s using an X16 parallel con-

figuration with only approximately6.3 times the resources of the 1X con-figuration. The better-than-linear areagrowth is due to multiply operationsthat become fixed-coefficient as par-allelism increases, and thus get imple-mented more efficiently in logic thanif you were using a full multiply in aDSP unit. This happens mostly in thelast K-point FFT (see Figure 2b).

HIGHER THROUGHPUTParallel FFTs are required for high-throughput gigasample applicationssuch as spectral monitoring. Our PFFThardware implementation offers some

54 Xcell Journal First Quarter 2013

T O O L S O F X C E L L E N C E

Parallel FFT1

Virtex-5 Virtex-5 Virtex-5 Virtex-5 Virtex-5 Virtex-5

Max SystemClock

Achieved(Virtex-7)

FFTThroughput2

(Samples/s)(Virtex-7)

DSP48EUtilization2

(Virtex-7)

BRAMUtilization2

(Virtex-7)

Latency(SystemCycles)

Parallel x2

Parallel x4

Parallel x8

Parallel x16

500 MHz

492 MHz

490 MHz

443 MHz

64

128

260

408

1 GS/s

1.968 GS/s

3.92 GS/s

7.088 GS/s

8

11

16

8

630

360

220

145

Parallel x2

Parallel x4

Parallel x8

Parallel x16

1. Using Synphony Model Compiler, parallel Radix2-MDC architecture and configuration. 2. Implementation flow with Synphony Model Compiler 2012.09., Synplify Pro 2012.09 and ISE 14.2

349 MHz

312 MHz

280 MHz

251 MHz

64

128

260

408

0.698 GS/s

1.248 GS/s

2.24 GS/s

4.016 GS/s

8

11

16

8

630

360

220

145

Table 2 – The PFFT block’s portability is illustrated by the excellent results achieved on both the Virtex-7 485T: -3 and Virtex-5 SX95T: -2.

Figure 3 – The parallel-FFT IP block user interface (left) delivers a microarchitecture based on user-specified configuration.

Page 55: Xcell82

insight and guidance on the throughputscalability for advanced Xilinx FPGAdevices. A case study demonstratesthat the parallel FFTs using the parallelRadix2-MDC architecture can effec-tively achieve throughput of more than7 GHz on FPGA devices in an X16 par-allel configuration. We developed thePFFT block in a high-level design flowusing Synphony Model Compiler, withthe ability to fine-tune the implementa-tion for DSP hardware mapping acrossdifferent Xilinx devices. [5] Synopsys ismaking this block freely available to allSynphony Model Compiler users.

For more information on Syn-phony Model Compiler, please visitht tp : / /www.synopsys . com/cg i -

bin/sld/pdfdla/pdfr1.cgi?file=syn-

phony_model_comp_ds.pdf?cmp=FP

GA-SMC-xcell.

References

1. Tektronix Component Solutions

Web Page, http://component-solutions.tek.com/services/data-converter/

2. Xilinx FFT Core Datasheet,

http://www.xilinx.com/support/documen-tation/ip_documentation/ds808_xfft.pdf

3. Shousheng He; Torkelson, M.; “A

new approach to pipeline FFT proces-

sor,” Parallel Processing Symposium,

1996, Proceedings of IPPS ’96, The

10th International, April 1996

4. Xilinx Complex Multiplier v5.0

Block Datasheet,

http://www.xilinx.com/support/docu-mentation/ip_documentation/cmpy/v5_0/ds793_cmpy.pdf

5. Synphony Model Compiler

Datasheet, http://www.synopsys.com/cgi-bin/sld/pdfdla/pdfr1.cgi?file=synphony

T O O L S O F X C E L L E N C E

First Quarter 2013 Xcell Journal 55

Page 56: Xcell82

56 Xcell Journal First Quarter 2013

The new OZ745 platform eases thedesign of real-time professionalbroadcast and other apps.

TOOLS OF XCELLENCE

OmniTek, Xilinx RollZynq-7000 VideoDevelopment Kit by Mike Santarini

Publisher, Xcell JournalXilinx, [email protected]

Page 57: Xcell82

First Quarter 2013 Xcell Journal 57

Xilinx and Xilinx Alliance mem-ber OmniTek this month willmake available a real-time

video-processing platform targeted atcompanies creating high-bandwidthequipment for a wide range of motion-picture industries.

The new platform, called theOmniTek OZ745 Zynq™-7000 SoCVideo Development Kit, includes allthe key components to help OEMsimprove system performance, reduceBOM costs and get to market quicklywith new and innovative monitors, stu-dio cameras, production switchers andan expanding list of other applicationsin the medical, automotive and aero-space industries that require leading-edge video technologies.

The platform—which includes hard-ware, design tools, IP, an operating sys-tem and preverified reference designs—is built around the Xilinx® Zynq-7045 AllProgrammable SoC. This device inte-

grates high-speed transceivers capableof supporting uncompressed HD video(for example, SD/HD 3G-SDI). “Withthis platform, the broadcast and profes-sional A/V industry has, for the firsttime, a completely programmabledevice that bridges software and hard-ware domains,” said Roger Fawcett,managing director at OmniTek.

Fawcett said that the Zynq-7000 SoCenables system architects to implementvideo- and audio-processing algorithmsin two ways: either in software on thedual-core ARM processors onboard, oroffloaded into FPGA hardware if theapplication needs acceleration to meetreal-time demands. An FPGA solutionis particularly useful for multichannelHD, 4K, faster frame rates and othersophisticated video demands. Further,the tight coupling between softwareprocessor and FPGA hardware makesit possible to develop codec algorithmsin both domains.

“It is also worth noting that eachARM® Cortex™-A9 processor includesa NEON single-/double-precision float-ing-point unit. Added to the DSP-richFPGA fabric, this FPU gives the DSPdesigner a highly flexible platform onwhich to design all kinds of signal-pro-cessing algorithms,” said Fawcett.“Additionally, the ability to stream andprocess uncompressed video fromtransceivers through the FPGA hard-ware, while monitoring the videostream in software as well as imple-menting overall system control, meansthat the innovation possibilities areendless.” Then too, he said, a Zynq-based solution “also means savingpower and cost through integration ofmultiple ASSPs.”

The OmniTek board (see Figure 1)extends the capabilities of the Zynqdevice further by offering support fora wide range of signal types throughfive SD/HD 3G-SDI input/outputs; an

T O O L S O F X C E L L E N C E

USB Comms

LEDs

S Video + VGA Input

USB x 2

1G EthernetUSB x 2

CompIn

SFP+Cage

SDI I/OSD, HD, 3G PCB Dimensions: 190 mm x 140 mm

HDMIIn/Out

DIPSwitch

Genlock andClock Synthesis

HPC FMC

XilinxZynq-7045FFG900

2GB x 64-bit DDR3 PL

512MB x 32-bit DDR3 PS

16MB + 256MB QSPI FLASH

ARM PJTAG

SD Card Slot

Analog + DigitalAudio I/O

XADC

SerialPorts

FPD Power

JTAG

10 x LVDS I/O

Figure 1 – The board for OmniTek’s OZ745 Zynq-based Video Development Kit

Page 58: Xcell82

HDMI input/output pair; and compos-ite, VGA and S-Video inputs.

The board also provides analogand digital audio I/O; up to four USBports; a serial port; a 1-Gbps Ethernetport; and 10 bidirectional LVDS portsto use, for example, with an LCD flat-panel display. Also included are LCDflat-panel power, an HPC FMC expan-sion connector and an SD Card slot,along with FPGA and ARM JTAGdebug ports. There is also an SFP+cage, which designers can use, forexample, to add 10-Gbps Ethernetcapabilities for supporting new andemerging standards for video-over-Internet Protocol networks.

TWO REFERENCE DESIGNS“Just as important as the hardware isthe board support package that is

provided alongside the OZ745 board,”said Fawcett. In addition to a Linuxbuild with Qt graphics and boardbuilt-in self-test (BIST), “this packagealso offers two reference designs foruse as the basis for the user’s owndesigns,” he said. The first referencedesign provides design teams withthe firmware, drivers and other ele-ments they need to enable the ARMprocessor to access and control allthe I/O and peripherals on the board.A simple control application allowsdesigners to exercise the I/O on theboard as well as peek at and poke theI/O and the onboard SDRAM.

Meanwhile, the second referencedesign is a version of the latestXilinx multichannel Real-Time VideoEngine (RTVE 2.1). This referencedesign implements a range of video

IP blocks within the FPGA fabric ofthe Zynq-7045 SoC to deliver a work-ing FPGA design that design teamscan use to evaluate image quality(see Figure 2).

READY-MADE VIDEO PIPELINEFawcett said that a key element ofRTVE 2.1 is OmniTek’s new OSVP IPblock, which provides all the firmwareneeded to handle multiformat conver-sion and compositing in a single IPblock. Among the functions the OSVPblock handles are video deinterlacingand resizing of up to 10 SDI channels;chroma resampling; and multichannelcompositing of video inputs onto asmany as 10 video outputs, facilitatingpicture-in-picture, quad-split displayand other effects. The IP block alsosupports video graphics overlay; 3:2

58 Xcell Journal First Quarter 2013

T O O L S O F X C E L L E N C E

SDI Rx

SDI Rx

SDI Rx

SDI Rx

SDI Rx

SDI Rx

SDI Rx

SDI Rx

HDMIRx

AXI MIG SDRAM Controller

AXI Interconnect

ARM Cortex-A9

InterruptController

1GEthernet

ChromaResample

CCM

OmniTek OSVPScalable Video

Processor

SDI Tx

HDMITx

Figure 2 – Outline structure of the RTVE 2.1 reference design

A second reference design implements a range of video IPblocks within the FPGA fabric of the Zynq-7045 SoC, delivering

a working FPGA design to use in evaluating image quality.

Page 59: Xcell82

and 2:2 film-cadence detection andprocessing; detail enhancement; andframe synchronization, even acrosschanges in video standards.

Fawcett said the company specifi-cally optimized the OSVP block forXilinx FPGA technology. “All theinterfaces to the block conform to theAXI4 protocol, and Xilinx COREGenerator™ system and EDK integra-tion are also supported, making theblock easy to interface to other IPintended for implementation in Xilinxtechnology,” he said.

Applications include flat-panel dis-play controllers, video-format convert-ers, multiviewer displays and videopattern generators. The platform sup-

ports all SD, HD and 3-Gbit/secondSDI formats, and can also interfacewith flat-panel displays with resolu-tions up to 4K and beyond.

RANGE OF IP BLOCKSIn addition to providing develop-ment platforms like the new Zynqboard, OmniTek also supplies anexpanding range of video-process-ing-related IP blocks, many of whichare available optimized for Xilinxtechnology. The functions coveredinclude video deinterlace and resize;SDI audio embed and extract; videoand audio monitoring; video andaudio test-pattern generation; audiocodecs; 608, 708, OP-47 and PAL

closed-caption and teletext decode;SDI eye pattern and jitter monitoring;and digital effects.

OmniTek also operates an IP anddesign consultancy service that spe-cializes in FPGA, electronics andsoftware design for video- andimage-processing applications in thebroadcast, medical-imaging, industrialand defense markets.

The OmniTek OZ745 Video Develop-ment Kit featuring Xilinx’s Zynq-7000All Programmable SoC is available now.For more information, please visit theXilinx Broadcast page (http://www.

xilinx.com/applications/broadcast/

index.htm) or the OmniTek website, athttp://www.omnitek.tv.

T O O L S O F X C E L L E N C E

First Quarter 2013 Xcell Journal 59

Page 60: Xcell82

60 Xcell Journal First Quarter 2013

XAPP744: HARDWARE-IN-THE-LOOP (HIL)SIMULATION FOR THE ZYNQ-7000 ALLPROGRAMMABLE SOChttp://www.xilinx.com/support/documentation/applica-

tion_notes/xapp744-HIL-Zynq-7000.pdf

The fact that the Xilinx® Zynq™-7000 All Programmable SoCdoes not deliver a simulation model poses a problem fordesigners. This application note describes a way to bring theMPCore processor subsystem into the ISE® Design SuiteSimulator (ISim) simulation environment with the help ofZynq-7000 platform hardware-in-the-loop (HIL) simulationtechnology. Author Umang Parekh explains how to bring theprocessing system in a Zynq-7000 AP SoC design into thesimulation and guides the user in setting up a system envi-ronment for HIL simulation.

The Zynq-7000 All Programmable SoC is a new class ofproduct from Xilinx that combines an industry-standardARM® dual-core Cortex™-A9 MPCore processor subsys-tem (PS) with Xilinx 28-nanometer programmable logic(PL). Traditionally, Xilinx has offered the MicroBlaze™embedded processor. Designers created peripheral IP inthe FPGA logic in RTL and simulated the entire systemusing an RTL simulator for which Xilinx provided theMicroBlaze simulation RTL model.

HIL technology simulates, debugs and tests both the PSand the PL portions of a Zynq-7000 SoC design. All of theAXI-based IP connected to the PS through the master/slavegeneral-purpose, high-performance or Accelerator Co-herency Port (ACP) interfaces can be simulated in ISim,while the PS is simulated in the hardware (the ZC702 board).Clocking the AXI-based PL interfaces using the testbenchclock allows cycle-accurate simulation of IP in the PL. The PSand the DDR memory operate in free-running mode.

This approach is very useful for developing IP with theprocessor system, and also for IP driver development andsoftware debugging. Software support for the Zynq-7000 HILis available in the 14.2 version of the Xilinx ISE design tools.

HIL simulation is much faster than RTL simulation, ensuringquick design turns for both hardware and software.

XAPP892: IMPLEMENTING SMPTE SDI INTERFACES WITH VIRTEX-7 GTX TRANSCEIVERShttp://www.xilinx.com/support/documentation/applica-

tion_notes/xapp892-smpte-sdi-if-v7-gtx-transceivers.pdf

The Society of Motion Picture and Television Engineers’(SMPTE) serial digital interface family of standards formsthe foundation for much professional broadcast video equip-ment. Broadcast studios and video production centers usethese SDI interfaces to carry uncompressed digital video,along with embedded ancillary data such as multiple audiochannels. The Xilinx SMPTE SD/HD/3G-SDI LogiCORE™ IPis a generic SDI receive/transmit datapath that does not haveany device-specific control functions. This application noteprovides a module containing control logic that couples theSMPTE SDI LogiCORE IP with the Virtex®-7 GTX trans-ceivers to form a complete SDI interface. Author John Snowalso provides several example SD/HD/3G-SDI designs thatrun on the Xilinx Virtex-7 FPGA VC707 evaluation board.

Designers can connect the Xilinx SDI core to a Virtex-7GTX transceiver to implement an SDI interface capable ofsupporting the SMPTE SD-SDI, HD-SDI and 3G-SDI stan-dards. Some additional logic is necessary to connect theSDI core and GTX transceiver together to implement afully functional SDI interface. The application notedescribes this extra control and interface logic, and pro-vides the necessary control and interface modules in bothVerilog and VHDL source code.

Also included is a wrapper file that contains an instanceof the control module for the GTX transceiver and aninstance of the SMPTE SDI core with the necessary connec-tions between them. This wrapper file simplifies the processof creating an SDI interface. Two SDI demonstration appli-cations provide detailed examples of SDI implementationsin a Virtex-7 FPGA design.

XAMPLES. . .

Application NotesIf you want to do a bit more reading about how our FPGAs lend themselves to a broad number of applications,we recommend these application notes.

Page 61: Xcell82

First Quarter 2013 Xcell Journal 61

XAPP794: 1080P60 CAMERA IMAGE PROCESSING REFERENCE DESIGNhttp://www.xilinx.com/support/documentation/applica-

tion_notes/xapp794-1080p60-camera.pdf

The Xilinx Zynq-7000 All Programmable SoC Video andImaging Kit (ZVIK) builds on the ZC702 evaluation kit byadding hardware, software and IP components for thedevelopment of custom video applications. The includedvideo reference designs, WUXGA color image sensor andvideo I/O FPGA mezzanine card (FMC) with HDMI inputand output enable users to immediately start developmentof video system software, firmware and hardware designs.

This application note by Mario Bergeron (Avnet, Inc.)and Steve Elzinga, Gabor Szedo, Greg Jewett and Tom Hillof Xilinx describes how to set up and run the 1080p60 cam-era image-processing reference design using the ZVIK. Theauthors also provide instructions on how to build the hard-ware and software components and how to create the SDcard boot image.

The VITA-2000 image sensor from ON Semiconductor,which is configured for 1080p60 resolution, generates thevideo input. An image-processing pipeline implementedusing LogiCORE IP video cores converts the raw Bayersubsampled image to an RGB image; the IP cores removedefective pixels, de-mosaic and color-correct the image. Avideo frame buffer is implemented in the processing sys-tem’s DDR3 memory, making images accessible to theARM processor cores via the AXI Video Direct MemoryAccess (VDMA). The video frame buffer is not required forthe operation of the image-processing pipeline, but isincluded in the design to enable the capture of input videoimages for analysis.

XAPP524: SERIAL LVDS HIGH-SPEED ADC INTERFACEhttp://www.xilinx.com/support/documentation/applica-

tion_notes/xapp524-serial-lvds-adc-interface.pdf

This application note by Marc Defossez describes a methodof utilizing dedicated SelectIO™ deserializer components(ISERDESE2 primitives) in 7 series FPGAs to interface withanalog-to-digital converters (ADCs) that have serial low-volt-age, differential-signaling (LVDS) outputs. The associated ref-erence design illustrates a basic LVDS interface connecting aKintex™-7 FPGA to an ADC with high-speed, serial LVDSoutputs. The high-speed ADCs used today have a resolutionof 12, 14 or 16 bits, with possible multiple converters in a sin-gle package. Each converter in the package can function instandalone mode or can be combined in an interleaved modeto double or quadruple the conversion (sample) speed.

In both standalone and interleaved modes, designerscan use one or two physical serial outputs as a connectionto the interfacing device. One set of differential outputs iscalled a data lane. Using one data lane means that the con-verter is operating in one-wire mode, while a two-data-lane design works in two-wire mode. For every possibledata output combination there is always one high-speedbit clock and one sample rate frame clock available. Theone-wire mode is used in single- and double-data-rate(SDR and DDR) configurations, while two-wire mode isused only in DDR mode.

The FPGA’s SelectIO technology deserializer compo-nents are configured as ISERDESE2 primitives. The appli-cation note uses two ISERDESE2s in SDR mode to cap-ture a DDR signal. One ISERDESE2 is clocked at the ris-ing edge and the second at the falling edge of the bit clock.This method allows capturing up to 16 bits, since eachISERDESE2 can capture 8 bits.

XAPP743: EYE SCAN WITH MICROBLAZE PROCESSOR MCShttp://www.xilinx.com/support/documentation/applica-

tion_notes/xapp743-eye-scan-mb-mcs.pdf

This application note by Mike Jenkins and DavidMahashin describes code that executes on an internalMicroBlaze processor, implementing the algorithm tomeasure a statistical eye (bit-error ratio vs. time and volt-age offset) at the post-equalization, data-sampling pointwithin the receiver of a 7 series FPGA GTX transceiver.Point-by-point measured data is stored in Block RAM tobe burst-read by an external host PC. This software-cen-tric approach for implementing a statistical eye measure-ment provides a flexible and extensible method for incor-porating eye scan capability into existing designs.

As line rates and channel attenuation increase,receiver equalizers are more often enabled to over-come channel attenuation. This situation poses a chal-lenge to system bring-up, because it’s impossible todetermine the quality of the line by measuring the far-end eye opening at the receiver pins. At high line rates,the received eye measured on the printed-circuit boardcan appear to be completely closed even though theinternal eye, which follows the receiver equalizer, isopen. The RX Eye Scan in GTH, GTX and GTP trans-ceivers of 7 series FPGAs provides a mechanism tomeasure and visualize the receiver eye margin after theequalizer step. Additional use modes enable severalother methods to determine and diagnose the effects ofequalization settings.

Page 62: Xcell82

62 Xcell Journal First Quarter 2013

XAPP792: DESIGNING HIGH-PERFORMANCE VIDEOSYSTEMS WITH THE ZYNQ-7000 ALLPROGRAMMABLE SOChttp://www.xilinx.com/support/documentation/applica-

tion_notes/xapp792-high-performance-video-zynq.pdf

With high-end processing systems like the Xilinx Zynq-7000 All Programmable SoC, customers want to fully uti-lize the processing system (PS) and custom peripheralswithin the device. An example is multiple video pipelinesin which live video streams are written into memory(input) and memory is read to send out live video streams(output) in which the processor is accessing memory. Thisapplication note by James Lucero and Ygal Arbel coversdesign principles that target customers needing high per-formance from the Zynq-7000 SoC’s memory interfaces,from AXI master interfaces implemented in the program-mable logic (PL) and from the ARM Cortex-A9 processors.

Guaranteed worst-case latency is required for the videostreams in such designs, to ensure that no frames aredropped or corrupted. Providing high-speed AXI masterinterfaces in the PL with lower latency and direct accessto the device’s memory interfaces demands high-perform-ance (HP) interfaces. The Zynq-7000 AP SoC contains fourHP links that are 64-bit or 32-bit AXI3 slave interfacesdesigned for high throughput.

The design uses three AXI video direct-memory access(VDMA) engines to simultaneously move six streams(three transmit and three receive), each in 1,920 x 1,080pformat, with a 60-Hz refresh rate and up to 32 data bits perpixel. Each VDMA is driven from a video test pattern gen-erator (TPG) with a video timing controller (VTC) core toset up the necessary video timing signals. Data read byeach AXI VDMA goes to a common on-screen display(OSD) core capable of multiplexing or overlaying multiplevideo streams to a single output video stream. The outputof the OSD core drives the HDMI video display interfaceon the board.

Performance-monitor cores are added to capture per-formance data. The three AXI VDMAs are connected tothree separate HP interfaces by means of the AXI inter-connect and are controlled by the Cortex-A9 processor.This design uses 70 percent of the memory controllerbandwidth. The reference system is targeted for the Zynq-7000 ZC702 evaluation board.

XAPP586: USING SPI FLASH WITH 7 SERIES FPGAShttp://www.xilinx.com/support/documentation/applica-

tion_notes/xapp586-spi-flash.pdf

This application note by Arthur Yang describes the advan-tages of selecting a serial peripheral interface (SPI) flash asthe configuration memory storage for the Xilinx 7 seriesFPGAs. The document includes the required connectionsbetween the FPGA and the SPI flash memory, and the detailsnecessary to select the proper SPI flash.

The SPI flash is a simple, low-pin-count solution for con-figuring 7 series FPGAs. Support of indirect programmingenhances ease of use by allowing in-system programmingupdates through reuse of connections already required forthe configuration solution. Although some other configura-tion options permit faster configuration times or higher den-sity, the SPI flash solution offers a good balance of speedand simplicity.

XAPP733: APPLYING MULTIBOOT AND THE LOGICORE IP SOFT-ERROR MITIGATION CONTROLLERhttp://www.xilinx.com/support/documentation/applica-

tion_notes/xapp733-multiboot-sem-controller.pdf

This document and six supporting reference designs detailthe implementation of the MultiBoot feature and theLogiCORE IP Soft Error Mitigation (SEM) controller forSpartan®-6, Virtex-6 and 7 series FPGAs. The MultiBoot fea-ture supports robust design management and field-upgradecapabilities. Designers initiate a MultiBoot event by com-manding the FPGA to fully reconfigure itself with an alter-nate bitstream, usually over the Internal ConfigurationAccess Port (ICAP).

Typically, says author Eric Crabill, the SEM controller hasexclusive control of the ICAP to meet its functional and per-formance specifications. So, to apply MultiBoot and theSEM controller, you must coordinate ICAP sharing andmight also need to organize multiple sets of SEM controllerdata in addition to multiple sets of FPGA configuration data.This application note demonstrates how to do so, while alsoillustrating how to organize multiple sets of SEM controllerdata. The supporting reference designs yield implementa-tions for hardware evaluation on the SP605, ML605 andKC705 evaluation kits.

XAMPLES. . .

Page 63: Xcell82

First Quarter 2013 Xcell Journal 63

XAPP523: LVDS 4X ASYNCHRONOUSOVERSAMPLING USING 7 SERIES FPGAShttp://www.xilinx.com/support/documentation/appli-

cation_notes/xapp523-lvds-4x-asynchronous-over-

sampling.pdf

Xilinx 7 series FPGAs can implement asynchronous com-munication using SelectIO™ interface resources, makingit possible to reserve GT transceivers for other uses. Thisapplication note by Marc Defossez describes a method ofcapturing asynchronous communication using LVDS withSelectIO interface primitives. The method consists ofoversampling the data with a clock of similar frequency(±100 ppm) by taking multiple samples of the data at dif-ferent clock phases to get a sample of the data at themost ideal point.

The SelectIO interface in 7 series FPGAs can perform4x asynchronous oversampling at 1.25 Gbits/second byusing ISERDESE2 primitives. A mixed-mode clock man-ager (MMCME2_ADV) generates the clocks through ded-icated high-performance paths between the components.The technique also helps to reduce cost by making it pos-sible to select a smaller FPGA for your design.

XAPP538: SOFT ERROR MITIGATION USING PRIORITIZED ESSENTIAL BITShttp://www.xilinx.com/support/documentation/appli-

cation_notes/xapp538-soft-error-mitigation-essential-

bits.pdf

A soft error in configuration memory can corrupt adesign. However, the proper operation of a typical designloaded into a Xilinx FPGA involves only a fraction of thetotal number of configuration memory cells. This appli-cation note describes a method for defining the hierar-chical regions of interest in a design and identifying theprioritized essential bits associated with the defined userlogic, using the ISE design tools, version 13.4 and later.Author Robert Le also explains how to use the LogiCOREIP Soft Error Mitigation (SEM) controller with the prior-itized essential bits to detect and correct soft errors inthe configuration memory of Xilinx 7 series and Virtex-6FPGAs. The method reduces the effective failures in time(FIT) and increases design availability.

GetonTarget

Is your marketing message reaching the right people?

Hit your target by advertising your product or service in the Xilinx Xcell Journal, you’ll reach

thousands of qualified engineers, designers, and engineering managers worldwide.

The Xilinx Xcell Journal is an award-winning publication, dedicated specifically to helping

programmable logic users – and it works.

We offer affordable advertising rates and a variety of advertisement sizes to meet any budget!

Call today: (800) 493-5551 or e-mail us at

[email protected]

Page 64: Xcell82

INTRODUCING VIVADOWEBPACK EDITION

The Vivado WebPACK™ tool enables youto simulate, synthesize and implementyour design at no cost when targetingselect devices. The Vivado Design SuiteWebPACK Edition supports the Artix™-7(7A100T, 7A200T) and Kintex™-7(7K70T, 7K160T) devices.

You can download the WebPACKtoday at www.xilinx.com/download.

VIVADO 2012.4 DEVICE SUPPORT

The following devices are productionready:

■ Virtex®-7 2000T (including Low Voltage)

■ Artix-7 100T and 200T

The Virtex-7 X1140T device is in gen-eral engineering sampling.

LOGIC SIMULATION

Vivado now offers faster simulationmodels for GTHE2 primitives, fea-

turing a 5x to 6x speedup over cur-rent models. The simulator nowsupports Aldec Riviera-PRO incompile_simlib.

VIVADO HIGH-LEVELSYNTHESIS (HLS)

Xilinx has expanded register-trans-fer-level (RTL) co-simulation sup-port to include four new simula-tors: Cadence’s Incisive and Aldec’sRiviera-PRO, as well as the XilinxVivado and ISE simulators.

Support has been added forfloating-point single- and double-precision tan, atan, sinh, cosh andexponent cmath.h functions.

In addition, intellectual property(IP) exported in Pcore format nowsupports a fully or partially synthe-sized netlist, reducing the overall syn-thesis time when the IP is used inXilinx Platform Studio. Vivado nowsupports Xilinx development boardsas targets for synthesis in the graphi-cal user interface (GUI). The VivadoHLS library contains 31 beta functionsfor video and OpenCV I/O interfaces.

SYSTEM GENERATOR FOR DSP

A new model-upgrade feature pro-vides seamless migration whenusing the latest version of supersed-ed blocks. Also, System Generatorfor DSP has several new floating-point blocks, including multiplyadd, accumulator and exponential.The latest version of the tool offersa 5x simulation time improvementfor the DSP48 macro block. Modelsusing multiple DSP48 macro blockswill see significant simulation-timeimprovement.

XILINX INTELLECTUALPROPERTY

For a detailed list of Xilinx IP cores,see the IP Release Notes Guide(XTP025).

For more information regarding thelatest version of Vivado, please see theVivado 2012.4 Release Notes, athttp://www.xilinx.com/support/docu-

m e n t a t i o n / s w _ m a n u a l s / x i l -

inx2012_4/irn.pdf.

64 Xcell Journal First Quarter 2013

What’s New in theVivado 2012.4 Release? The Vivado™ Design Suite 2012.4 is now available, at no additional cost, to all

Xilinx® ISE® Design Suite customers that are currently in warranty. The Vivado

Design Suite provides a highly integrated design environment with a completely

new generation of system-to-IC-level features, including high-level synthesis,

analytical place-and-route and an advanced timing engine. These tools enable

developers to increase design integration and implementation productivity.

XTRA, XTRA

Page 65: Xcell82

Q: WHAT IS THE VIVADO

DESIGN SUITE?

A: It’s all about improving design-er productivity. This entirely

new tool suite was architected toincrease your overall productivity indesigning, integrating and implement-ing with the 28-nanometer family ofXilinx All Programmable devices. With28-nm manufacturing, Xilinx devicesare now much larger and come with avariety of new technologies includingstacked-silicon interconnect, high-speed I/O interfaces operating at up to28 Gbps, hardened microprocessorsand peripherals, and analog/mixed sig-nal. These larger and more complexdevices present developers with multi-dimensional design challenges thatcan prevent them from hitting marketwindows and increasing productivity.

The Vivado Design Suite is a completereplacement for the existing XilinxISE Design Suite of tools. All of thecapabilities of the ISE Design Suitepoint tools are now built directly intothe Vivado integrated developmentenvironment (IDE), leveraging ashared scalable data model. With theVivado Design Suite, developers areable to accelerate design creation withhigh-level synthesis and implementa-tion by using place-and-route to ana-lytically optimize for multiple and con-current design metrics, such as timing,congestion, total wire length, utiliza-tion and power. Thanks to Vivado’sshared scalable data model, the entiredesign process can be executed inmemory without the need to write ortranslate any intermediate file for-mats, accelerating runtimes, debugand implementation while reducingmemory requirements.

Vivado provides users with upfrontmetrics that allow for design andtool-setting modifications earlier inthe design process, when they haveless overall impact on the schedule.This capability reduces design itera-tions and accelerates productivity.Users can manage the entire designprocess in a pushbutton manner by

and now WebPACK, the no-cost,device-limited version of the toolsuite. In-warranty ISE Design SuiteLogic and Embedded Edition cus-tomers will receive the new VivadoDesign Edition. ISE Design Suite DSPand System Edition customers willreceive the new Vivado SystemEdition. To download today, pleasevisit www.xilinx.com/download.

Xilinx recommends that customersstarting a new design contact theirlocal FAE to determine if Vivado isright for the design. Xilinx does notrecommend transitioning duringthe middle of a current ISE DesignSuite project, as design constraintsand scripts are not compatiblebetween the environments.

For more information, please readthe ISE 14.4 and Vivado 2012.4release notes.

Q: WHAT ARE THE LICENS-

ING TERMS FOR VIVADO?

A: The Vivado Design SuiteWebPACK Edition is a free

download that provides support forArtix-7 100T and 200T and Kintex-770T and 160T devices. The VivadoDesign Suite Design Edition isavailable at no additional cost to in-warranty ISE Design Suite LogicEdition and Embedded Edition cus-tomers, while the Vivado DesignSuite System Edition with high-level synthesis is available at noadditional cost to ISE Design SuiteDSP and System Edition customers.

For customers who generated anISE Design Suite license for ver-sions 13 or 14 after Feb. 2, 2012,your current license will also workfor Vivado. Customers who are stillin warranty but who generatedlicenses prior to February 2 of lastyear will need to regenerate theirlicenses in order to use Vivado.

For license generation, please visitwww.xilinx.com/getlicense.

using the Flow Navigator feature inthe Vivado IDE, or control it manuallyby using Tcl scripting.

Q: SHOULD I CONTINUE TO

USE THE ISE DESIGN

SUITE OR MOVE TO VIVADO?

A: The ISE Design Suite is anindustry-proven solution for all

generations of Xilinx’s All Pro-grammable devices. The Xilinx ISEDesign Suite continues to bring innova-tions to a broad base of developers, andextends the familiar design flow for 7series and Xilinx Zynq™-7000 AllProgrammable SoC projects. ISE 14.4,which brings new innovations and con-tains updated device support, is avail-able for immediate download.

The Vivado Design Suite 2012.4, Xilinx’snext-generation design environment,supports 7 series devices includingVirtex-7, Kintex-7 and Artix-7 FPGAs. Itoffers enhanced tool performance, espe-cially on large or congested designs.

Q: IS VIVADO DESIGN SUITE

TRAINING AVAILABLE?

A: Vivado is new and takes fulladvantage of industry standards

such as powerful interactive Tcl script-ing, Synopsys Design Constraints,SystemVerilog and more. To reduceyour learning curve, Xilinx has rolledout new instructor-led classes to showyou how to use the Vivado tools. Formore information, please visitwww.xilinx.com/training. We alsoencourage you to view the VivadoQuick Take videos found at www.xil-

inx.com/training/vivado.

Q: ARE THERE DIFFERENT

EDITIONS OF THE VIVA-

DO DESIGN SUITE?

A: The Vivado Design Suite comesin three editions: Design, System

First Quarter 2013 Xcell Journal 65

Page 66: Xcell82

66 Xcell Journal First Quarter 2013

Xpress Yourself in Our Caption Contest

If you’re looking to Xercise your funny bone, here’s your opportunity. We invite readers to swing over to our verbal challenge and submit anengineering- or technology-related caption for this cartoon showing a

Tarzengineer in the jungle of his workplace. The image might inspire a captionlike “Eddie made a New Year’s resolution to get more exercise, but his col-leagues disapproved of the way he retrofitted the lab.”

Send your entries to [email protected]. Include your name, job title, companyaffiliation and location, and indicate that you have read the contest rules atwww.xilinx.com/xcellcontest. After due deliberation, we will print the sub-missions we like the best in the next issue of Xcell Journal. The winner willreceive an Avnet ZedBoard, featuring the Zynq™-7000 All Programmable SoC(http://www.zedboard.org/). Two runners-up will gain notoriety, fame and acool, Xilinx-branded gift from our swag closet.

The contest begins at 12:01 a.m. Pacific Time on Feb. 6, 2013. All entries mustbe received by sponsor by 5 p.m. PT on April 1, 2013.

So, grab a vine and get swinging!

ROBERT TAYLOR, senior engineerat MacAulay Brown, Inc.

(Dayton, Ohio), won a shiny newAvnet ZedBoard with this caption

for the cartoon of lab cats in Issue 81 of Xcell Journal:

Congratulations as well to

our two runners-up:

Bob had his hopes of designing withVivado dashed when he discovered his

new abbreviated job title actuallystood for “Feline Play and Grooming

Accessory Design Engineer.”

– John Blessing, senior design engineer,

Garmin International (Olathe, Kans.)

“I thought you said you had a mouse problem.”

– Hector Herrera, lead technologist,

Pier Programming Services Ltd.

(Vancouver, British Columbia)

“I asked for anti-static mats, not cats!”

DA

NIE

LG

UID

ER

AXCLAMATIONS!

NO PURCHASE NECESSARY. You must be 18 or older and a resident of the fifty United States, the District of Columbia, or Canada (excluding Quebec) to enter. Entries must be entirely original. Contest begins onFeb. 6, 2013. Entries must be received by 5:00 pm Pacific Time (PT) on April 1, 2013. Official rules available online at www.xilinx.com/xcellcontest. Sponsored by Xilinx, Inc. 2100 Logic Drive, San Jose, CA 95124.