unit iii : cpld & fpga architecture & applications

Dr. Y. Narasimmha Murthy, Ph.D.

UNIT III : CASE STUDIES

[CPLD & FPGA ARCHITECTURE & APPLICATIONS]

INTRODUCTION:

The Field Programmable Gate Arrays consist of an array of programmable logic blocks

including general logic, memory and multiplier blocks, surrounded by a programmable routing

fabric that allows blocks to be programmably interconnected. The array is surrounded by programmable input/output blocks,

labeled I/O in the figure, that connect the chip to the outside world. Here the term

“programmable” indicates an ability to program a function into the chip after completion of

silicon fabrication. This is made possible by the programming technology, which is a method that

can cause a change in the behavior of the pre-fabricated chip after fabrication, in the “field,”

where system users create designs. The first programmable logic devices used very small fuses

as the programming technology. Every FPGA depends on a programming technology that is

used to control the programmable switches that give FPGAs their programmability.

Programming Technologies

There are a number of programming technologies that have been used for reconfigurable

architectures. Each of these technologies has different characteristics and has a significant effect

on the programmable architecture. Some of the well-known technologies are

(i) SRAM-based programming technology

(ii) Flash (EEPROM) programming technology

(iii) Anti-fuse-based programming technology

SRAM-Based Programming Technology

Static memory cells are the basic cells used in SRAM-based FPGAs. Most commercial vendors, such as

Xilinx, Lattice and Altera, use static memory (SRAM) based programming technology in their

devices. The static memory cells are distributed throughout the FPGA to provide

configurability. An example of such a memory cell is shown below. In an SRAM-based FPGA, SRAM

cells are mainly used for the following purposes:

(i) To program the routing interconnect of FPGAs, which is generally steered by small multiplexers.

(ii) To program the Configurable Logic Blocks (CLBs) that are used to implement logic functions.

There are two primary uses for the SRAM cells. Most are used to set the select lines to

multiplexers that steer interconnect signals. The majority of the remaining SRAM cells are used

to store the data in the lookup-tables (LUTs) that are typically used in SRAM-based FPGAs to

implement logic functions. Historically, SRAM cells were used to control the tri-state buffers

and simple pass transistors that were also used for programmable interconnect.
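
The two uses of SRAM cells described above can be sketched in a few lines of Python. This is only an illustrative model, not a description of any vendor's circuit: the function names `lut_eval` and `routing_mux` and the bit ordering (input 0 is the least significant address bit) are assumptions made for the example.

```python
def lut_eval(config_bits, inputs):
    """Evaluate a K-input LUT: the inputs form an address into the SRAM bits."""
    address = 0
    for position, bit in enumerate(inputs):
        address |= (bit & 1) << position
    return config_bits[address]


def routing_mux(select_bits, wires):
    """Steer one of several wires to the output using SRAM select bits."""
    index = 0
    for position, bit in enumerate(select_bits):
        index |= (bit & 1) << position
    return wires[index]


# Four SRAM cells configure a 2-input LUT as XOR (truth table for addresses 0..3).
xor_config = [0, 1, 1, 0]
xor_result = lut_eval(xor_config, [1, 0])

# Two SRAM cells select which of four interconnect wires drives the output.
wire_values = [0, 1, 0, 0]
routed = routing_mux([1, 0], wire_values)  # select bits encode index 1
```

Changing the stored bits changes the logic function or the routed wire without changing the hardware, which is what makes the device reprogrammable.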

SRAM-based programming technology has become the dominant approach for FPGAs because

of its re-programmability and its use of standard CMOS process technology. Because no special

process steps are needed, SRAM-based FPGAs can use the newest CMOS processes and benefit from the

increased integration, higher speed and lower dynamic power consumption of smaller geometries.

There are however a number of drawbacks associated with SRAM-based programming

technology. For example, an SRAM cell requires six transistors, which makes this technology

costly in terms of area compared to other programming technologies.

Further, SRAM cells are volatile, and external devices are required to permanently store

the configuration data. These external devices add to the cost and area overhead of SRAM-based

FPGAs.

Security of the configuration data is also a concern. Since the configuration information must be

loaded into the device at power up, there is the possibility that the configuration information

could be intercepted and stolen for use in a competing system. To overcome this problem, certain

encryption techniques are used.

The electrical properties of pass transistors are not ideal. SRAM-based FPGAs typically rely on

the use of pass transistors to implement multiplexers. However, they are far from ideal switches

as they have significant on-resistances and present an appreciable capacitive load. As FPGAs

migrate to smaller device geometries these issues may be exacerbated.

Flash Programming Technology

An important alternative to the SRAM-based programming technology is the use of flash or

EEPROM based programming technology. This technology injects charge onto a gate that

“floats” above the transistor. This approach is used in flash or EEPROM memory cells. These

cells are non-volatile; they do not lose information when the device is powered down. With

modern IC fabrication processes, it has become possible to use the floating gate cells directly as

switches. Flash memory cells, in particular, are now used because of their improved area

efficiency. The widespread use of flash memory cells for non-volatile memory chips ensures that

flash manufacturing processes will benefit from steady decreases in process geometries.

Flash-based programming technology offers several advantages. For example, this programming

technology is nonvolatile in nature. Flash-based programming technology is also more area

efficient than SRAM-based programming technology. Flash-based programming technology has

its own disadvantages also. Unlike SRAM-based programming technology, flash based devices

cannot be reconfigured/reprogrammed an infinite number of times. Also, flash-based technology

uses a non-standard CMOS process.

This flash-based programming technology offers several unique advantages, most importantly

non-volatility. This feature eliminates the need for the external resources required to store and

load configuration data when SRAM-based programming technology is used. Additionally,

a flash-based device can function immediately upon power-up instead of having to wait for the

loading of configuration data. The flash approach is also more area efficient than SRAM-based

technology which requires up to six transistors to implement the programmable storage. The

programming circuitry, such as the high and low voltage buffers needed to program the cell,

contributes an area overhead not present in SRAM-based devices. However, this cost is

relatively modest as it is amortized across numerous programmable elements. In comparison to

anti-fuses, an alternative non-volatile programming technology, flash-based FPGAs are

reconfigurable and can be programmed without being removed from a printed circuit board. The

use of a floating-gate to control the switching transistor adds design complexity because care

must be taken to ensure the source–drain voltage remains sufficiently low to prevent charge

injection into the floating gate. Since newer processes require lower voltage levels, this issue

may become less of a concern in the future. One disadvantage of flash-based devices is that they

cannot be reprogrammed an infinite number of times. Charge buildup in the oxide eventually

prevents a flash-based device from being properly erased and programmed. Devices such as the

Actel ProASIC3 are rated for only 500 programming cycles. For most uses of

FPGAs, this programming count is more than sufficient. In many cases FPGAs are programmed

for only one use. Another significant disadvantage of flash devices is the need for a

non-standard CMOS process. Also, like the static memory-based technology, this programming

technology suffers from relatively high resistance and capacitance due to the use of

transistor-based switches. One trend that has recently emerged is the use of flash storage in combination

with SRAM programming technology.

In devices from Altera, Xilinx and Lattice, on-chip flash memory is used to provide non-

volatile storage while SRAM cells are still used to control the programmable elements in the

design. This addresses the problems associated with the volatility of pure-SRAM approaches,

such as the cost of additional storage devices or the possibility of configuration data interception,

while maintaining the infinite re-configurability of SRAM-based devices.

It is important to recognize that, since the programming technology is still based on SRAM cells,

the devices are no different than pure-SRAM based devices from an FPGA architecture

standpoint. However, the incorporation of flash memory generally means that the processing

technology will not be as advanced as pure-SRAM devices. Additionally, the devices incur more

area overhead than pure-SRAM devices since both flash and SRAM bits are required for every

programmable element.

Anti-fuse Programming Technology

An alternative to SRAM and floating gate-based technologies is anti-fuse programming

technology. This technology is based on structures which exhibit very high-resistance under

normal circumstances but can be programmably “blown” (in reality, connected) to create a low

resistance link.

An anti-fuse is a two terminal device with an unprogrammed state presenting a very high

resistance between its terminals. When a high voltage (from 11 to 20 volts, depending on the

type of anti-fuse) is applied across its terminals the anti-fuse will “blow” and create a low

resistance link. This link is permanent. Anti-fuses in use today are built using either an

oxide-nitride-oxide (ONO) dielectric between N+ diffusion and polysilicon, or amorphous silicon

between metal layers or between polysilicon and the first layer of metal.

Programming an anti-fuse requires extra circuitry to deliver the high programming voltage and a

relatively high current of 5 mA or more. This is done through fairly sizable pass transistors that

provide addressing to each anti-fuse. Anti-fuse technology is used in the FPGAs from Actel,

QuickLogic, and Crosspoint.

A major advantage of the anti-fuse is its small size, little more than the cross-section of two

metal wires. But this advantage is limited by the large size of the necessary programming

transistors, which handle large currents, and the inclusion of isolation transistors that are

sometimes needed to protect low voltage transistors from high programming voltages.

A second major advantage of an anti-fuse is its relatively low series resistance. The on-resistance

of the ONO anti-fuse is 300 to 500 ohms, while that of the amorphous silicon anti-fuse is 50 to 100 ohms.

Additionally, the parasitic capacitance of an unprogrammed amorphous anti-fuse is significantly

lower than for other programming technologies.

This technology has two main limitations: it does not make use of a standard CMOS

process, and anti-fuse-based devices cannot be reprogrammed. The

ideal technology would be re-programmable, non-volatile, and built on a standard CMOS

process. None of the above technologies satisfies all three conditions.

Nevertheless, SRAM-based programming technology is the most widely used programming

technology. The main reason is its use of a standard CMOS process. For this reason it is

expected that this technology will continue to dominate the other two programming technologies.

Comparison of Programming Technologies

Programming   Re-Programmable   Volatile   Series       Capacitance   Cell
Technology                      Storage    Resistance   in pF         Area

Static RAM    In-circuit        Yes        1 kΩ         15            5X

Anti-Fuse     No                No         50-500 Ω     1.2 – 5.0     1X

EPROM         Outside circuit   No         2 kΩ         10            1X

EEPROM        In-circuit        No         2 kΩ         10            2X

XILINX XC3000 FPGA Device

Xilinx introduced the first FPGA family, called the XC2000 series, in 1984 and subsequently offered

further series of FPGAs, including the XC3000, XC4000, and XC5000. The first modern-era

FPGA was introduced with 64 logic blocks and 58 inputs and outputs.

The XC3000 series of FPGA devices was introduced in 1985 by Xilinx Inc. This was the most

successful family of FPGAs. The XC3000 architecture includes enhancements to the XC2000

architecture to improve performance, density and usability. The XC3000 architecture was

developed with manual tools for design implementation, and the architecture also shows a bias

towards manual design. The XC3000 family covers a range of nominal device densities from

2,000 to 9,000 gates, with practically achievable densities from 1,000 to 6,000 gates and up to 144

user-definable I/Os. Device speeds, described in terms of maximum guaranteed toggle

frequencies, range from 70 to 125 MHz. The XC3000 Configurable Logic Block is substantially

larger than that of the XC2000. Each of its lookup tables has four inputs and requires 16 bits of

configuration memory.

The two lookup tables can be combined with a multiplexer to produce any function of five inputs

and some functions of up to seven inputs. The XC3000 architecture allows faster logic

implementation with minimum CLBs in series.

There are now four distinct families within the XC3000 series of FPGA devices:

• XC3000A Family

• XC3000L Family

• XC3100A Family

• XC3100L Family

All four families share a common architecture, development software, design and programming

methodology, and also common package pin-outs.

• XC3000A Family : The XC3000A is an enhanced version of the basic XC3000 family,

featuring additional interconnect resources and other user-friendly enhancements.

• XC3000L Family : The XC3000L is identical in architecture and features to the XC3000A

family, but operates at a nominal supply voltage of 3.3 V. The XC3000L is the right solution for

battery-operated and low-power applications.

• XC3100A Family : The XC3100A is a performance-optimized relative of the XC3000A

family. While both families are bitstream and footprint compatible, the XC3100A family

extends toggle rates to 370 MHz and in-system performance to over 80 MHz. The XC3100A

family also offers one additional array size, the XC3195A.

• XC3100L Family : The XC3100L is identical in architecture and features to the XC3100A

family, but operates at a nominal supply voltage of 3.3V.

The details of XC3000 family of devices are given below in the table.

S.No   Member of    CLB Array   IOs   Gate Capacity
       the family   Size              Max      Typical

1      XC3020       8X8         64    2000     1200

2      XC3030       10X10       80    3000     1800

3      XC3042       12X12       96    4200     2500

4      XC3064       16X14       120   6400     3800

5      XC3090       16X20       144   9000     5500

6      XC3195       22X22       168   13000    7500

The basic LCA (Logic Cell Array) of the XC3000 consists of three components:

Programmable I/O Blocks, Configurable Logic Blocks and Programmable Interconnect. In

addition, a small amount of configurable memory is also present.

Programmable I/O Block : The I/O Block (IOB) of the XC3000 is more complex than the XC2000 IOB. The important addition is

a flip-flop in the output path. Because the data is registered in the IOB, the clock-to-output time does

not include interconnect delays. The result is a fast, predictable clocked output. The XC3000 IOB

also includes a programmable pull-up, optional output inversion and a selectable slew rate. A

lower I/O slew rate for low-speed signals reduces power surges, simplifying board-level

design. Input from the pad can be brought into the interior of the chip directly, registered,

or both. By allowing both, the I/O block can de-multiplex external signals such as address/data

buses, storing the address in the IOB flip-flop and feeding the data directly into the wiring.

Each user-configurable IOB, as shown below, provides an interface between the external

package pin of the device and the internal user logic. Each IOB includes both registered and

direct input paths. Each IOB provides a programmable 3-state output buffer, which may be

driven by a registered or direct output signal. Configuration options allow each IOB an

inversion, a controlled slew rate and a high impedance pull-up. Each input circuit also provides

input clamping diodes to provide electrostatic protection, and circuits to inhibit latch-up

produced by input currents.

Each IOB includes input and output storage elements and I/O options selected by configuration

memory cells. A choice of two clocks is available on each die edge. The polarity of each clock

line (not each flip-flop or latch) is programmable. A clock line that triggers the flip-flop on the

rising edge is an active Low Latch Enable (Latch transparent) signal and vice versa. Passive pull-

up can only be enabled on inputs, not on outputs. All user inputs are programmed for TTL or

CMOS thresholds.

Configurable Logic Block : The XC3000 CLB is substantially larger than the XC2000 CLB.

Each of the look-up tables has four inputs rather than three and hence requires sixteen bits of

configuration memory rather than eight. The lookup tables can be combined with a multiplexer

to produce any function of five inputs and some functions of up to seven inputs. This allows the

XC3000 architecture to implement faster logic. The XC3000 CLB has two flip-flops, to ensure

that all combinational logic can be followed by a pipelining flip-flop. The register-rich CLB

allows the XC3000 to implement state-intensive applications and heavily pipelined designs

efficiently.
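
The claim that two four-input lookup tables plus a multiplexer can produce any function of five inputs follows from splitting the function on its fifth input. The sketch below illustrates the idea in Python; the helper names (`cofactor_tables`, `clb_eval`, `majority5`) and the address bit ordering are assumptions made for this example, not part of the XC3000 documentation.

```python
def cofactor_tables(func):
    """Build two 16-bit LUT contents: func with E=0 and func with E=1."""
    f_bits = []
    g_bits = []
    for address in range(16):
        a, b, c, d = [(address >> i) & 1 for i in range(4)]
        f_bits.append(func(a, b, c, d, 0))
        g_bits.append(func(a, b, c, d, 1))
    return f_bits, g_bits


def clb_eval(f_bits, g_bits, a, b, c, d, e):
    """Read both 4-input LUTs at the same address; E drives the multiplexer."""
    address = a | (b << 1) | (c << 2) | (d << 3)
    return g_bits[address] if e else f_bits[address]


def majority5(a, b, c, d, e):
    """Example 5-input function: 1 when at least three inputs are 1."""
    return int(a + b + c + d + e >= 3)


f_bits, g_bits = cofactor_tables(majority5)
result = clb_eval(f_bits, g_bits, 1, 1, 0, 0, 1)  # three inputs high
```

Each lookup table holds 16 bits, so the pair stores the full 32-entry truth table of the five-input function.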

As shown in the block diagram, each CLB includes a combinatorial logic section, two flip-flops

and a program-memory-controlled multiplexer selection of function. It has the following

inputs and outputs:

• Five logic variable inputs: A, B, C, D and E

• A direct data input: DI

• An enable clock: EC

• A clock (invertible): K

• An asynchronous direct reset: RD

• Two outputs: X and Y

XC3000 CLB

Each CLB has a combinatorial logic section, two flip-flops, and an internal control section. The

CLB has five logic inputs (A, B, C, D and E); a common clock input (K); an asynchronous

direct RESET input (RD) and an enable clock (EC) as shown in the block diagram. Each CLB

also has two outputs (X and Y) which may drive interconnect networks. Data input for the flip-

flops within a CLB is supplied from the function F or G outputs of the combinatorial logic, or

the block input, DI. Both flip-flops in each CLB share the asynchronous RD which, when

enabled, is dominant over clocked inputs. All flip-flops are reset by the active-Low chip input,

RESET, or during the configuration process. The flip-flops share the enable clock (EC) which,

when Low, recirculates the flip-flops' present states and inhibits response to the data-in or

combinatorial function inputs on a CLB. The user may enable these control inputs and select

their sources. The user may also select the clock net input (K), as well as its active sense within

each CLB. This programmable inversion eliminates the need to route both phases of a clock

signal throughout the device.
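
The priority among the flip-flop control inputs described above can be summarised with a small behavioural model. This is a simplified sketch under stated assumptions: the class name `ClbFlipFlop` and its method are hypothetical, the asynchronous reset is evaluated only at the moment of the call, and clock polarity selection is left out.

```python
class ClbFlipFlop:
    """Behavioural sketch of one CLB flip-flop with RD and EC controls."""

    def __init__(self):
        self.q = 0  # flip-flops start reset, as after configuration

    def clock_edge(self, data, ec=1, rd=0):
        """Apply one active clock edge and return the new state.

        rd: asynchronous direct reset, dominant over clocked inputs.
        ec: enable clock; when Low the present state is recirculated.
        data: value selected from F, G or DI by the configuration.
        """
        if rd:
            self.q = 0
        elif ec:
            self.q = data & 1
        # ec Low and rd Low: hold the present state
        return self.q


ff = ClbFlipFlop()
ff.clock_edge(1)          # captures 1
ff.clock_edge(0, ec=0)    # EC Low: state is held
ff.clock_edge(1, rd=1)    # RD active: state is cleared
```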

Programmable Interconnect :

Programmable-interconnection resources in the Field Programmable Gate Array provide routing

paths to connect inputs and outputs of the IOBs and CLBs into logic networks. Interconnections

between blocks are composed of a two-layer grid of metal segments. Specially designed pass

transistors, each controlled by a configuration bit, form programmable interconnect points (PIPs)

and switching matrices used to implement the necessary connections between selected metal

segments and block pins.

The XC3000 interconnect structure has five general interconnect lines both vertically and

horizontally. In addition, each CLB has direct connections to adjacent CLBs both vertically and

horizontally.

Three types of metal resources are provided to accommodate various network interconnect

requirements.

• General Purpose Interconnect

• Direct Connection

• Long lines (multiplexed busses and wide AND gates)

These interconnects are shown in the diagrams below.

XC3000 Interconnect

The channels in the XC3000 are at the ends of the fixed output wires. The output channels are not

both adjacent to the CLB. This enlarges the immediate neighbourhood for high-speed

connections between CLBs, since a signal can skip a switch box in two of the four directions. In the

XC3000, the pins are accessible from more than one channel. Therefore the routability depends

on which channel the placer expects the router to use to route to the pin, and on the ability of the

router to bring the signal into the channel on the correct track.

Additional enhancements to the XC3000 speed and density came as a result of software

improvements. These improvements do not change the maximum gate capacity, nor do they

change the maximum toggle frequency, but they do increase the typical capacity and narrow the

difference between the toggle frequency and the automatically achievable clock

frequency. Software has improved the speed of automatically placed and routed designs by about

50%.

XILINX XC4000 FPGA Device : The XC4000 was designed to improve performance and gate

density for large designs. Several dedicated features were added to the general-purpose logic

features of the XC3000, resulting in an interesting combination of special-purpose and

general-purpose functions. The XC4000 family was designed using placement and routing tools to

evaluate architectural decisions. The architectural features were designed to interact efficiently

with an automated design methodology.

The basic building blocks used in the XC4000 family are :

(i) Look-up tables for the implementation of logic functions. A designer can use a function generator

to implement any Boolean function of a given number of inputs by pre-loading the memory with

the bit pattern corresponding to the truth table of the function. All functions of a function

generator have the same timing: the time to look up the result in the memory. Therefore, the inputs to the

function generator are fully interchangeable by simple rearrangement of the bits in the look-up

table.

(ii) A Programmable Interconnect Point (PIP) is a pass transistor controlled by a memory cell. The

PIP is the basic unit of the configurable interconnect mechanism. The wire segments on each side of

the transistor are connected or not depending on the value in the memory cell. The pass transistor

introduces resistance into the interconnect paths, and hence adds delay.
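
The interchangeability of function generator inputs mentioned in item (i) can be illustrated as follows. The sketch assumes a LUT stored as a Python list indexed by an address whose bit i is input pin i; the function name `permute_lut_inputs` and the `pin_map` convention are hypothetical and introduced only for this example.

```python
def permute_lut_inputs(config_bits, pin_map):
    """Rearrange LUT bits so the function survives a re-wiring of its pins.

    pin_map[j] is the old pin whose signal is now wired to new pin j.
    """
    num_inputs = len(pin_map)
    new_bits = [0] * len(config_bits)
    for new_address in range(len(config_bits)):
        old_address = 0
        for new_pin in range(num_inputs):
            bit = (new_address >> new_pin) & 1
            old_address |= bit << pin_map[new_pin]
        new_bits[new_address] = config_bits[old_address]
    return new_bits


# f(a, b) = a AND (NOT b), with a on pin 0 and b on pin 1.
and_not = [0, 1, 0, 0]
# The router swaps the two pins; the LUT contents are rearranged to match.
swapped = permute_lut_inputs(and_not, [1, 0])
```

A router can therefore pick whichever pin is easiest to reach and leave the correction to the LUT contents, with no timing penalty.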

The advanced features of the XC4000 FPGAs are:

(i) CLBs can be used as on-chip RAM

(ii) Fast carry chain for high-speed implementation of arithmetic

(iii) Boundary scan compatibility (JTAG)

(iv) Wide decode logic

(v) More global clocks

(vi) Faster placement and routing algorithms

(vii) Scaled routing resources

Configurable Logic Block (CLB): The XC4000 CLB is similar to the XC3000 CLB. It contains

three lookup tables and two flip-flops. The two primary look-up tables, F and G, each implement any

function of four variables. These two results can be brought out of the block independently, or

they can be combined with another input in the H look-up table to make any function of five

inputs or some functions of up to nine inputs. This allows functions such as nine-input AND, OR,

exclusive OR (parity) or address decode to be done at high speed in one clock. The flip-flops can

take their inputs independently from the look-up tables or from external signals, but they share

control signals. Unlike the XC2000 and XC3000, flip-flop outputs are not recirculated internally. A

registered feedback signal in the XC4000 must be routed in the general interconnect back to a

CLB input pin.
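
As an illustration of the nine-input case, the sketch below builds nine-input parity from two four-input tables (standing in for F and G) and one three-input table (standing in for H). The helper names `make_lut`, `lut_read` and `nine_input_parity` are assumptions for this example, and the model ignores timing and the carry logic.

```python
def make_lut(func, num_inputs):
    """Pre-load a LUT with the truth table of func."""
    return [
        func(*[(address >> i) & 1 for i in range(num_inputs)])
        for address in range(2 ** num_inputs)
    ]


def lut_read(bits, *inputs):
    """Use the inputs as an address into the stored bits."""
    address = sum((bit & 1) << i for i, bit in enumerate(inputs))
    return bits[address]


# F and G: 4-input parity. H: 3-input parity of F, G and one extra input.
PARITY4 = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d, 4)
PARITY3 = make_lut(lambda f, g, h1: f ^ g ^ h1, 3)


def nine_input_parity(bits9):
    """Combine F, G and H as one CLB would to get 9-input parity."""
    f = lut_read(PARITY4, *bits9[0:4])
    g = lut_read(PARITY4, *bits9[4:8])
    return lut_read(PARITY3, f, g, bits9[8])


example = nine_input_parity([1, 0, 1, 1, 0, 0, 1, 0, 1])
```

Only some nine-input functions decompose this way, which is why the text says "some functions of up to nine inputs" rather than "any".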

The XC3000 can implement arithmetic with the sum in one look-up table and the carry in another

look-up table. The XC4000 CLB can also implement arithmetic in this way, but as the speed of the

arithmetic operation is dominated by the speed of the carry chain, the XC4000 CLB includes

dedicated high-speed carry logic.
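
The "sum in one look-up table, carry in another" arrangement can be sketched as a ripple-carry adder built from two three-input tables per bit. This models the LUT-only approach, not the dedicated carry logic; the names `SUM_LUT`, `CARRY_LUT` and `ripple_add` are illustrative assumptions.

```python
# Address bits: bit 0 = a, bit 1 = b, bit 2 = carry in.
SUM_LUT = [
    ((addr >> 0) & 1) ^ ((addr >> 1) & 1) ^ ((addr >> 2) & 1)
    for addr in range(8)
]
CARRY_LUT = [
    int(((addr >> 0) & 1) + ((addr >> 1) & 1) + ((addr >> 2) & 1) >= 2)
    for addr in range(8)
]


def ripple_add(x, y, width):
    """Add two width-bit numbers one bit at a time using the two LUTs."""
    carry = 0
    result = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        address = a | (b << 1) | (carry << 2)
        result |= SUM_LUT[address] << i
        carry = CARRY_LUT[address]  # each bit waits for the previous carry
    return result, carry


total, carry_out = ripple_add(5, 3, 4)
```

The loop makes the bottleneck visible: every bit position depends on the carry from the one before it, so the carry path sets the speed of the whole adder.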

The block diagram below shows the XC4000 CLB. The dedicated carry logic in the XC4000

substantially speeds up arithmetic while doubling its density. This XC4000 Configurable Logic

Block (CLB) is based on look-up tables (LUTs). A LUT is a small one bit wide memory array,

where the address lines for the memory are inputs of the logic block and the one bit output from

the memory is the LUT output. A LUT with K inputs would then correspond to a 2^K x 1 bit

memory and can realize any logic function of its K inputs by programming the logic function’s

truth table directly into the memory. The XC4000 CLB contains three separate LUTs, in the

configuration as shown below. There are two 4-input LUTs that are fed by CLB inputs, and the

third LUT can be used in combination with the other two. This arrangement allows the CLB to

implement a wide range of logic functions of up to nine inputs, two separate functions of four

inputs or other possibilities. Each CLB also contains two flip-flops.

Xilinx XC4000 Configurable Logic Block (CLB).

XC4000 I/O BLOCK:

The block diagram below shows the I/O block. The signals to be output from the chip can be

registered before output and enabled by a separate control signal. Outputs can be optionally

pulled up or down, and the output driver can be configured with either a fast or a slow slew

rate. Inputs from the pad can be brought into the interior of the chip directly, registered, or both,

to facilitate multiplexed bus interfaces. Furthermore, inputs can drive dedicated decoders, built

into the edge interconnect, for fast recognition of addresses.

The XC4000 IOB includes boundary scan logic compatible with the ANSI/IEEE

1149.1 (JTAG) boundary scan standard. The boundary scan can check internal logic or

external logic. Scan operations can take place before and after the FPGA is programmed and do

not interfere with the operation of the part.

To provide high density devices that support the integration of entire systems, the XC4000

chips have “system oriented” features. For example, each CLB contains circuitry that allows it to

efficiently perform arithmetic (i.e., a circuit that can implement a fast carry operation for adder-

like circuits) and also the LUTs in a CLB can be configured as read/write RAM cells. A new

version of this family, the 4000E, has the additional feature that the RAM can be configured as a

dual port RAM with a single write and two read ports. In the 4000E, RAM blocks can be

synchronous RAM. Also, each XC4000 chip includes very wide AND-planes around the

periphery of the logic block array to facilitate implementing circuit blocks such as wide

decoders.

Interconnect Structure :

The other important feature of this FPGA is its interconnect structure. The XC4000

interconnect is arranged in horizontal and vertical channels. Each channel contains some number

of short wire segments that span a single CLB (the number of segments in each channel depends

on the specific part number), longer segments that span two CLBs, and very long segments that

span the entire length or width of the chip. Programmable switches are available to connect the

inputs and outputs of the CLBs to the wire segments, or to connect one wire segment to another.

The figure below shows only the wire segments in a horizontal channel, and does not show the

vertical routing channels, the CLB inputs and outputs, or the routing switches.

The salient feature about the Xilinx interconnect is that signals must pass through switches to

reach one CLB from another, and the total number of switches traversed depends on the

particular set of wire segments used. Thus, speed-performance of an implemented circuit

depends in part on how the wire segments are allocated to individual signals by CAD tools.
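
The dependence of speed on the number of switches traversed can be made concrete with a first-order Elmore delay estimate, which treats each pass-transistor switch and the wire segment after it as one RC stage. This is a generic textbook approximation, not a Xilinx timing model, and the resistance and capacitance values below are purely illustrative assumptions.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay of an RC ladder: each capacitor sees all upstream resistance."""
    delay = 0.0
    upstream_resistance = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_resistance += r
        delay += upstream_resistance * c
    return delay


# Hypothetical per-stage values, chosen only to show the trend.
R_SWITCH = 1000.0   # ohms per pass-transistor switch (assumed)
C_SEGMENT = 50e-15  # farads per wire segment (assumed)

delays = {
    n: elmore_delay([R_SWITCH] * n, [C_SEGMENT] * n)
    for n in (1, 2, 4, 8)
}
```

With identical stages the estimate grows roughly with the square of the number of switches, which is why a route through several short segments can be slower than one long line.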

Actel FPGAs

In contrast to Xilinx FPGAs, the devices manufactured by Actel are based on anti-fuse

technology. Actel offers three main families: Act 1, Act 2, and Act 3. Actel devices

are based on a structure similar to traditional gate arrays; the logic blocks are arranged in rows

and there are horizontal routing channels between adjacent rows. This architecture is shown in

the figure below. The logic blocks in the Actel devices are relatively small in comparison to the

LUT-based ones, and are based on multiplexers. The figure illustrates the logic block in the Act 3

and shows that it comprises an AND and OR gate that are connected to a multiplexer based

circuit block. The multiplexer circuit is arranged such that, in combination with the two logic

gates, a very wide range of functions can be realized in a single logic block. About half of the

logic blocks in an Act 3 device also contain a flip-flop.
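
To show how a multiplexer fed by an AND gate and an OR gate can realise many functions, the sketch below models a simplified logic block. The exact wiring is an assumption made for illustration (a 4-to-1 multiplexer whose select lines are driven by a two-input AND and a two-input OR); the function name `logic_module` and its pin names are hypothetical and should not be read as the Act 3 schematic.

```python
def logic_module(d00, d01, d10, d11, a0, b0, a1, b1):
    """Assumed structure: 4:1 mux, select 0 = a0 AND b0, select 1 = a1 OR b1."""
    s0 = a0 & b0
    s1 = a1 | b1
    data = (d00, d01, d10, d11)
    return data[(s1 << 1) | s0]


def xor2(x, y):
    """XOR by tying data inputs to constants and steering with x and y."""
    return logic_module(0, 1, 1, 0, a0=x, b0=1, a1=y, b1=0)


def and3(x, y, z):
    """3-input AND: the AND gate covers x and y, the OR gate passes z."""
    return logic_module(0, 0, 0, 1, a0=x, b0=y, a1=z, b1=0)


def mux2(sel, in0, in1):
    """Plain 2:1 multiplexer using only the low select line."""
    return logic_module(in0, in1, in0, in1, a0=sel, b0=1, a1=0, b1=0)
```

Different functions come from tying inputs to constants or signals, with no truth table stored anywhere, which is the key contrast with LUT-based blocks.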

Actel FPGA structure.

Actel’s interconnect is organized in horizontal routing channels. The channels consist of wire

segments of various lengths with anti-fuses to connect logic blocks to wire segments or one wire

to another. Also, Actel chips have vertical wires that overlay the logic blocks, for signal paths

that span multiple rows. In terms of speed-performance, it is evident that Actel chips are not fully

predictable, because the number of anti-fuses traversed by a signal depends on how the wire

segments are allocated during circuit implementation by CAD tools. However, Actel provides a

rich selection of wire segments of different length in each channel and has developed algorithms

that guarantee strict limits on the number of anti-fuses traversed by any two-point connection in

a circuit which improves speed-performance significantly.

Quicklogic pASIC FPGAs :

Quicklogic is the main competitor to Actel in anti-fuse-based FPGAs. It produces two

families of devices, called pASIC and pASIC-2. The pASIC-2 is an enhanced version of

pASIC. The pASIC consists of a regular two-dimensional array of blocks called pASIC Logic

Blocks (pLBs). The logic capacity of the first generation of Quicklogic FPGAs ranges from 48 to

380 pLBs, or 500 to 4,000 equivalent mask-programmed gate array (MPGA) gates.

As shown in the figure below, pASIC has similarities to other FPGAs: the overall structure is

array-based like Xilinx FPGAs, the logic blocks use multiplexers similar to Actel FPGAs, and

the interconnect consists of only long lines, as in the Altera FLEX 8000. Note that the

pASIC architecture is now also independently developed by Cypress.

Structure of Quicklogic pASIC FPGA.

The pASIC anti-fuse, called ViaLink, consists of a top layer of metal, an insulating layer of amorphous silicon, and a bottom layer of

metal. Compared to Actel’s PLICE anti-fuse, ViaLink offers a very low on-resistance of

about 50 ohms (PLICE is about 300 ohms) and a low parasitic capacitance. ViaLink

anti-fuses are present at every crossing of logic block pins and interconnect wires, providing generous

connectivity.

Quicklogic (Cypress) Logic Cell

pASIC’s multiplexer-based logic block is shown in the above figure. It is more complex than

Actel’s Logic Module, with more inputs and wide (6-input) AND-gates on the multiplexer select

lines. Every logic block also contains a flip-flop.

Altera FLEX 8000 and FLEX 10000 FPGAs :

The first FPGA chips from Altera were simple arrays of logic cells, which are relatively simple

logic elements (LEs), each comprising a three-input look-up table (LUT) to generate

logic functions, a single configurable flip-flop, and multiplexers for routing the signals and

selecting clocks. The logic cells were connected by switch boxes instead of fixed interconnect.

The general architecture of Altera’s FPGAs is shown in the diagram below.

There are two high performance FPGA series called FLEX series. Altera’s FLEX 8000 series

consists of a three-level hierarchy similar to CPLDs. However, the lowest level of the hierarchy

consists of a set of lookup tables, rather than an SPLD like block, and so the FLEX 8000 is

categorized here as an FPGA. It should be noted, however, that FLEX 8000 is a combination of

FPGA and CPLD technologies. FLEX 8000 is SRAM-based and features a four-input LUT as its

basic logic block. Logic capacity ranges from about 4,000 gates to more than 15,000 for the 8000

series.

The architecture of FLEX 8000 is shown in figure below. The basic logic block, called a Logic

Element (LE) contains a four-input LUT, a flip-flop, and special-purpose carry circuitry for

arithmetic circuits (similar to Xilinx XC4000). The LE also includes cascade circuitry that

allows for efficient implementation of wide AND functions.

In the FLEX 8000, LEs are grouped into sets of eight, called Logic Array Blocks (LABs, a term borrowed from Altera’s CPLDs). As shown in the figure below, each LAB contains local interconnect, and each local wire can connect any LE to any other LE within the same LAB.

Architecture of Altera FLEX 8000 FPGAs.

Altera FLEX 8000 Logic Element (LE).

Local interconnect also connects to the FLEX 8000’s global interconnect, called FastTrack. FastTrack is similar to Xilinx long lines in that each FastTrack wire extends the full width or height of the device. However, a major difference between FLEX 8000 and Xilinx chips is that FastTrack consists of only long lines. This makes the FLEX 8000 easy for CAD tools to configure automatically. All horizontal FastTrack wires are identical, so interconnect delays in the FLEX 8000 are more predictable than in FPGAs that employ many shorter segments, because a long path passes through fewer programmable switches. Predictability is further aided by the fact that connections between horizontal and vertical lines pass through active buffers.

The FLEX 8000 architecture has been extended in the state-of-the-art FLEX 10000 family. FLEX 10000 offers all of the features of FLEX 8000, with the addition of variable-sized blocks of SRAM called Embedded Array Blocks (EABs). Each row in a FLEX 10000 chip has an EAB at one end. Each EAB is configurable to serve as an SRAM block with a variable aspect ratio: 256 x 8, 512 x 4, 1K x 2, or 2K x 1. Alternatively, an EAB can be configured to implement a complex logic circuit, such as a multiplier, by employing it as a large multi-output lookup table. Altera provides, as part of its CAD tools, several macrofunctions that implement useful logic circuits in EABs. Counting the EABs as logic gates, FLEX 10000 offers the highest logic capacity of any FPGA, although it is hard to provide an accurate number.
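To see why an EAB can act as a multiplier, note that every aspect ratio listed above holds the same 2048 bits, and that the 256 x 8 configuration has exactly enough address and data bits for a 4-bit by 4-bit product. The sketch below is illustrative only; the table layout (operand a in the upper four address bits) and the function names are assumptions, not Altera's macrofunction.

```python
EAB_BITS = 2048
ASPECT_RATIOS = [(256, 8), (512, 4), (1024, 2), (2048, 1)]  # (words, bits per word)


def build_multiplier_table():
    """Fill a 256 x 8 memory so that address (a, b) holds the product a * b."""
    table = [0] * 256
    for a in range(16):
        for b in range(16):
            # Largest product is 15 * 15 = 225, which fits in an 8-bit word.
            table[(a << 4) | b] = a * b
    return table


def eab_multiply(table, a, b):
    """Multiply two 4-bit values with a single memory read."""
    return table[(a << 4) | b]


if __name__ == "__main__":
    for words, width in ASPECT_RATIOS:
        print(words, "x", width, "=", words * width, "bits")
    table = build_multiplier_table()
    print(eab_multiply(table, 13, 11))
```

Used this way, the EAB is simply a LUT with eight inputs and eight outputs: the multiplication is computed once when the table is loaded, and each later lookup costs one memory access.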

Concurrent Logic FPGA Device : Concurrent Logic offers the CFA6006 FPGA device, which is based on a two-dimensional array of identical blocks, where each block is symmetrical on its four sides. The array holds 3136 such blocks, providing a total logic capacity of about 5000 equivalent gates. Connections are formed using multiplexers that are configured by a static RAM programming technology.

The structure of the Concurrent logic block is shown in the diagram below. It comprises user-configurable multiplexers, basic gates and a D-type flip-flop. The Concurrent FPGA is especially suitable for register-intensive and arithmetic applications, since the logic block can easily implement a half-adder and a register bit.
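The usefulness of a block that holds a half-adder and a register bit can be seen from a simple binary counter, in which each bit needs exactly those two elements. The following is a hedged behavioural sketch, not the CFA6006 circuit; CounterBit and count are hypothetical names.

```python
def half_adder(a, b):
    """Return (sum, carry) of two bits."""
    return a ^ b, a & b


class CounterBit:
    """One logic block: a half-adder feeding a D-type flip-flop."""

    def __init__(self):
        self.q = 0  # register bit

    def step(self, carry_in):
        # Sum and carry are computed from the value held before the clock edge.
        s, carry_out = half_adder(self.q, carry_in)
        self.q = s
        return carry_out


def count(bits, ticks):
    """Chain `bits` blocks, least significant first, and clock them `ticks` times."""
    cells = [CounterBit() for _ in range(bits)]
    for _ in range(ticks):
        carry = 1  # adding one on every clock
        for cell in cells:
            carry = cell.step(carry)
    return sum(cell.q << i for i, cell in enumerate(cells))


if __name__ == "__main__":
    print(count(4, 5))
    print(count(4, 16))  # a 4-bit counter wraps around after 16 ticks
```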


There are two direct connections, A and B, formed by routing signals through the multiplexers within the blocks. Long connections are implemented using a bussing network, in which wires of various lengths are superimposed on the array of logic blocks.

Crosspoint Solutions FPGAs:

The Crosspoint FPGAs differ from other FPGAs in that they are configurable at the transistor level, as opposed to the logic block level. The architecture basically consists of rows of transistor pairs, where the rows are separated by horizontal wiring segments. Vertical wiring segments are also available for connections among the rows. Each transistor row comprises two lines of series-connected transistors, one line being NMOS and the other PMOS. The wiring resources allow individual transistor pairs to be interconnected to implement CMOS logic gates. The programming technology used for the programmable switches is similar to the ViaLink anti-fuse and is based on amorphous silicon.

The structure of the transistor pair rows is shown in the diagram below. The diagram shows the implementation of a NOR gate and a NAND gate using the transistor lines. The transistor gates, drains and sources can be programmably interconnected to other transistors and also to power and ground. The series connection along a line is broken where necessary by permanently holding a transistor in its OFF state. A wide range of logic gates can be implemented by the transistor lines and the interconnection patterns.
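How NMOS and PMOS transistor lines combine into NAND and NOR gates can be checked with a switch-level model. This is a minimal sketch of standard static CMOS gates, assuming ideal switches; it does not model the Crosspoint wiring or anti-fuses, and all function names are hypothetical.

```python
def nmos_on(gate):
    """An NMOS transistor conducts when its gate is high."""
    return gate == 1


def pmos_on(gate):
    """A PMOS transistor conducts when its gate is low."""
    return gate == 0


def series(*switches):
    """A series chain conducts only if every transistor conducts."""
    return all(switches)


def parallel(*switches):
    """A parallel group conducts if any transistor conducts."""
    return any(switches)


def cmos_output(pull_up, pull_down):
    """Resolve the output from the pull-up (to power) and pull-down (to ground) networks."""
    if pull_up and not pull_down:
        return 1
    if pull_down and not pull_up:
        return 0
    raise ValueError("output is floating or shorted")


def nand2(a, b):
    # NMOS in series to ground, PMOS in parallel to power.
    return cmos_output(parallel(pmos_on(a), pmos_on(b)),
                       series(nmos_on(a), nmos_on(b)))


def nor2(a, b):
    # NMOS in parallel to ground, PMOS in series to power.
    return cmos_output(series(pmos_on(a), pmos_on(b)),
                       parallel(nmos_on(a), nmos_on(b)))


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "NAND:", nand2(a, b), "NOR:", nor2(a, b))
```

The two gates use the same four transistors and differ only in which line is wired in series and which in parallel, which is why a row of uncommitted transistor pairs plus programmable wiring is enough to build either.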

The FPGA currently offered by Crosspoint Solutions has a total logic capacity of 4200 gates. The chip has 256 rows of transistor pairs, and an additional 64 rows of multiplexer-like structures are provided. With their row-based architecture, anti-fuse programming technology and multiplexers, the Crosspoint FPGAs are most similar to Actel FPGAs.

ALGOTRONIX CAL-1024

This design has a two-dimensional mesh array structure which resembles the gate array “sea of gates” architecture. Like the Xilinx architecture, Algotronix uses static RAM programming technology to specify the function performed by each logic cell and to control the switching of connections between cells. The CAL-1024 design contains 1024 identical logic cells arranged in a 32 x 32 matrix. The design is considered to be a mesh-connected architecture since each cell is directly connected to its nearest north, south, east, and west neighbors. In addition to these direct connections, two global interconnect signals are routed to each cell to distribute clock and other “low skew requirement” control signals. The figure below shows the basic array architecture, indicating both nearest-neighbor and global connections to the logic cells.


ALGOTRONIX Array Architecture

The basic building block of the Algotronix design is a configurable cell containing multiplexers

and a function unit. As indicated in the figure , the function unit is preceded by multiplexers

which select the source for the X1 and X2 inputs. The function unit is capable of generating any

logic function of the two inputs, or of operating as a D-type latch. There are four additional

multiplexers which select the function output or one of the external inputs for routing to each of

the four outputs (north, south, east, and west).
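The cell just described can be modelled in a few lines. A function of two inputs has four truth-table entries, so 4 configuration bits select any of the 16 possible functions. The sketch below assumes that encoding (it is not the actual CAL-1024 configuration format) and leaves out the D-type latch mode; function_unit and cell_outputs are hypothetical names.

```python
DIRECTIONS = ("north", "south", "east", "west")


def function_unit(config, x1, x2):
    """Evaluate a 2-input function stored as a 4-bit truth table.

    Assumption: bit number (2 * x1 + x2) of `config` is the output.
    """
    return (config >> ((x1 << 1) | x2)) & 1


def cell_outputs(config, x1_sel, x2_sel, out_sel, inputs):
    """Model one cell.

    inputs  : value arriving from each neighbour, keyed by direction
    x1_sel  : direction feeding input X1 (input multiplexer)
    x2_sel  : direction feeding input X2 (input multiplexer)
    out_sel : for each output direction, either "f" (function output)
              or the name of an input direction to pass straight through
    """
    f = function_unit(config, inputs[x1_sel], inputs[x2_sel])
    return {d: (f if src == "f" else inputs[src]) for d, src in out_sel.items()}


if __name__ == "__main__":
    XOR = 0b0110
    inputs = {"north": 1, "south": 0, "east": 1, "west": 0}
    routing = {"north": "f", "south": "f", "east": "west", "west": "east"}
    print(cell_outputs(XOR, "north", "south", routing, inputs))
```

Because each output multiplexer can pass a neighbour's signal through unchanged, a cell can serve as logic, as wiring, or as both at once, which is how a mesh with only nearest-neighbour links still carries signals across the array.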


A unique feature in the Algotronix I/O pad design is its capability to provide simultaneous input

and output on the same pin when communicating with another Algotronix chip. This is done

through a 3-level (ternary) logic signaling scheme in which I/O pads sense whenever two outputs

are driving each other via a contention scheme. Even during contention, the pad can deduce the

correct input value and pass it along to the internal circuitry. This makes it easier to partition a

single design across multiple FPGAs because the increased connectivity reduces pin limitations

on communications bandwidth.
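The idea behind the ternary scheme can be sketched as follows. The model assumes three line levels (0 when both pads drive low, 2 when both drive high, and an intermediate level 1 during contention); these numeric levels and the function names are illustrative assumptions, not the actual Algotronix pad circuit.

```python
def line_level(drive_a, drive_b):
    """Level seen on the shared pin when two pads drive it at the same time.

    Assumed levels: 0 = low, 1 = intermediate (contention), 2 = high.
    """
    return drive_a + drive_b


def received(own_drive, level):
    """Recover the bit sent by the other chip from the line level.

    When the level is intermediate, the two pads disagree, so the other
    pad must be driving the opposite of our own output.
    """
    if level == 1:
        return 1 - own_drive
    return level // 2


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            level = line_level(a, b)
            print("A sends", a, "B sends", b, "level", level,
                  "A reads", received(a, level), "B reads", received(b, level))
```

Each pad already knows what it is driving, so one extra signal level is enough for both chips to transmit and receive on the same pin in the same cycle.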

AMD Mach : AMD offers a CPLD family comprising five subfamilies called Mach. Each Mach device consists of multiple PAL-like blocks (or optimized PALs). Mach 1 and 2 consist of optimized 22V16 PALs, Mach 3 and 4 consist of several optimized 34V16 PALs, and Mach 5 is similar to Mach 3 and 4 but offers enhanced speed performance. All Mach chips use EEPROM technology, and together the five subfamilies provide a wide range of selection, from small, inexpensive chips to larger, state-of-the-art ones.

Figure (a) below depicts a Mach 4 chip, showing the multiple 34V16 PAL-like blocks and the interconnect, called the central switch matrix. The in-circuit programmable chips range in size from 6 to 16 PAL-like blocks, corresponding roughly to 2,000 to 5,000 equivalent gates. All connections between PAL-like blocks (even from a PAL-like block to itself) pass through the central switch matrix. Thus, the device is not merely a collection of PAL-like blocks but a single, large device. Since all connections travel through the same path, circuit timing delays are predictable.

Figure (b) illustrates a Mach 4 PAL-like block. It has 16 outputs and a total of 34 inputs (16 of which are the fed-back outputs), so it corresponds to a 34V16 PAL. However, there are two key differences between this block and a normal PAL: 1) a product term (PT) allocator between the AND plane and the macrocells (each macrocell comprises an OR gate, an EXOR gate, and a flip-flop), and 2) an output switch matrix between the OR gates and the I/O pins. These features make a Mach 4 chip easier to use because they decouple sections of the PAL-like block. More specifically, the product term allocator distributes and shares product terms from the AND plane among the OR gates that require them, allowing much more flexibility than the fixed-size OR gates in regular PALs. The output switch matrix enables any macrocell output (OR gate or flip-flop) to drive any I/O pin connected to the PAL-like block, again providing greater flexibility than a PAL, in which each macrocell can drive only one specific I/O pin. Mach 4’s combination of in-system programmability and high flexibility allows easy hardware design changes.

AMD Mach 4 structure


Summary of Commercially Available FPGAs :

Manufacturer           General Architecture   Type of Logic Block              Programming Technology Used
XILINX                 Symmetrical Array      Look-up Table                    Static RAM
ACTEL                  Row-Based              Multiplexer Based                Anti-fuse
ALTERA                 Hierarchical-PLD       PLD-Block                        EPROM
Plessey                Sea of Gates           NAND-Gate                        Static RAM
PLUS Logic             Hierarchical-PLD       PLD-Block                        EPROM
AMD                    Hierarchical-PLD       PLD-Block                        EEPROM
QuickLogic             Symmetrical Array      Multiplexer Based                Anti-fuse
ALGOTRONIX             Sea of Gates           Multiplexers and Basic Gates     Static RAM
CONCURRENT             Sea of Gates           Multiplexers and Basic Gates     Static RAM
CROSSPOINT Solutions   Row-Based              Transistor Pairs & Multiplexers  Anti-fuse

FPGA Design Flow:


Early PLD and FPGA designs were performed largely by hand, but today’s complex programmable logic devices require the use of an integrated Computer-Aided Design (CAD) system. Both commercial CAD tool vendors and FPGA companies offer appropriate tools. For example, traditional Electronic Design Automation (EDA) vendors such as Cadence, Mentor Graphics, Synopsys, and Viewlogic offer tools to support FPGA design. These tools are typically used for front-end design entry and simulation, and provide the necessary interfaces to vendor-specific back-end tools for chip placement and routing.

Examples of vendor-specific tools are the Xilinx XACT system and the Altera MAX+PLUS II software. Altera’s MAX+PLUS II software supports the entire design flow on either PC or workstation platforms.

The first step in the design process is the description of the logic circuit, which can be done either with a schematic capture tool (Cadence, Viewlogic, OrCAD, etc.) or with Boolean expressions. This is followed by a translation step that converts the original circuit description into a standard format used by the relevant CAD tools (for example, the Xilinx CAD tools). The circuit is then passed through CAD programs that partition it into appropriate logic blocks, select a specific location in the FPGA for each logic block, and form the required interconnections.

The performance of the implemented circuit can then be checked and its functionality verified. Finally, a bitmap is generated and downloaded serially to configure the FPGA.

Initial Design Entry: The detailed description of the logic circuit is entered using a schematic capture program. In the design entry phase, RTL or schematic entry is used to create the logic to be implemented in the device. Pin assignments can also be made at this stage, including pin placement information and any timing constraints needed to build a functioning design. In the design entry step, a schematic or Block Design File (.bdf) is created as the top-level design. Library of Parameterized Modules (LPM) functions can be added to it, and Verilog HDL code can be used to add a logic block.

The component library may be supplied either by the vendor of the schematic capture program or by the FPGA vendor (such as Xilinx or Altera). An alternative way to specify the logic circuit is to use Boolean expressions or a state machine language, which does not require the graphical interface. Sometimes it is possible to use a mixture of schematics and Boolean expressions.


Translation to XNF Format: After the logic circuit has been successfully designed and merged into one circuit, it is translated into a special format that is understood by the CAD tools. For Xilinx, this format is called the Xilinx Netlist Format, or XNF. The translation utility is supplied either by Xilinx or by the vendor of the logic entry tool. The translation process may also involve automatic optimization of the circuit.

Partition: The XNF circuit is partitioned into logic cells (this step is also known as technology mapping). Technology mapping converts the XNF circuit, which is a netlist of basic logic gates, into a netlist of Xilinx logic cells. The logic cell used depends on the Xilinx product in which the circuit is to be implemented. The XACT tools also attempt to optimize the circuit during this step. For example, circuitry associated with unused logic block inputs or outputs is eliminated from the design. In addition, the mapping procedure attempts to minimize either the total number of logic cells (CLBs) required or the number of logic stages in the time-critical paths.

Place and Route: This step is performed by the CAD tools, manually by the user, or by a mixture of the two. The first step is placement, in which each logic cell generated during the partition step is assigned to a specific location in the FPGA. Automatic placement can be done using the simulated annealing algorithm.
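A simulated annealing placer can be sketched in a few dozen lines. The example below is a toy version under stated assumptions: cost is the sum of net bounding-box half-perimeters, a move relocates one cell to a random site (swapping with any occupant), and the temperature follows a simple geometric schedule. The parameter values and names are illustrative and are not taken from any vendor tool.

```python
import math
import random


def wirelength(placement, nets):
    """Sum over nets of the half-perimeter of the net's bounding box."""
    total = 0
    for net in nets:
        xs = [placement[c][0] for c in net]
        ys = [placement[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def move(placement, occupant, cell, target):
    """Move `cell` to `target`, swapping with the occupant if there is one.

    Calling move() again with the cell's old site undoes the move.
    """
    source = placement[cell]
    other = occupant[target]
    placement[cell], occupant[target] = target, cell
    occupant[source] = other
    if other is not None:
        placement[other] = source


def anneal(cells, nets, width, height, seed=0,
           t_start=5.0, t_end=0.01, alpha=0.95, moves_per_t=200):
    rng = random.Random(seed)
    sites = [(x, y) for x in range(width) for y in range(height)]
    rng.shuffle(sites)
    placement = dict(zip(cells, sites))          # random initial placement
    occupant = {s: None for s in sites}
    for c, s in placement.items():
        occupant[s] = c
    cost = wirelength(placement, nets)
    best, best_cost = dict(placement), cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_t):
            cell, target = rng.choice(cells), rng.choice(sites)
            old_site = placement[cell]
            move(placement, occupant, cell, target)
            delta = wirelength(placement, nets) - cost
            # Always accept improvements; accept worse moves with
            # probability exp(-delta / t) so the search can escape local minima.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost += delta
                if cost < best_cost:
                    best, best_cost = dict(placement), cost
            else:
                move(placement, occupant, cell, old_site)  # undo
        t *= alpha                                # cool down
    return best, best_cost


if __name__ == "__main__":
    cells = ["a", "b", "c", "d"]
    nets = [("a", "b"), ("b", "c"), ("c", "d")]
    print(anneal(cells, nets, width=2, height=2))
```

At high temperature almost every move is accepted, so the placement is shuffled freely; as the temperature falls the acceptance test becomes stricter and the search settles into a low-cost arrangement.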

After placement, the required interconnections among the logic cells must be realized by selecting wire segments and routing switches within the FPGA interconnection resources. An automatic routing algorithm, based on the maze routing algorithm, is used for this task.
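Maze routing is a breadth-first wave expansion from the source until the target is reached, followed by a backtrace. The sketch below works on a plain grid with blocked cells, which is a simplifying assumption: a real FPGA router searches a graph of wire segments and routing switches rather than a uniform grid. The function name maze_route is hypothetical.

```python
from collections import deque


def maze_route(grid, source, target):
    """Find a shortest path on a grid; grid[r][c] == 1 marks a blocked cell.

    Returns the list of (row, col) cells from source to target, or None
    if no route exists.
    """
    rows, cols = len(grid), len(grid[0])
    parent = {source: None}           # also serves as the visited set
    frontier = deque([source])
    while frontier:
        cell = frontier.popleft()
        if cell == target:
            path = []                 # backtrace from target to source
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in parent):
                parent[(nr, nc)] = cell
                frontier.append((nr, nc))
    return None


if __name__ == "__main__":
    grid = [
        [0, 0, 0, 0],
        [1, 1, 1, 0],
        [0, 0, 0, 0],
    ]
    print(maze_route(grid, (0, 0), (2, 0)))
```

After a net has been routed, its cells would be marked as blocked before the next net is attempted, so the order in which nets are routed affects whether all of them can be completed.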

Placement and routing are usually done automatically, although the user can sometimes perform them manually. With the physical placement and routing completed, exact timing values can be used to determine chip performance. The XACT tools provide a critical path timing analyzer, which reports delay information for the paths through the chip, ordered from longest to shortest. In addition, the physical layout timing information can be back-annotated to the schematics to obtain more accurate simulation results. The final step in the Xilinx design flow is the creation of the BIT file, which contains the binary programming data needed to configure the SRAM bits of the target chip. This file is then downloaded to configure the chip for final functional and timing tests of the programmed chip.
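What a critical path analyzer computes can be shown on a tiny example: the arrival time at each node is its own delay plus the latest arrival among its fan-in nodes, and the critical path is found by tracing back from the node with the latest arrival. This is a generic longest-path sketch, not the XACT analyzer; the node names and delay values are made up for illustration.

```python
def arrival_times(delays, fanin):
    """Latest signal arrival time at the output of every node.

    delays : node -> delay of that node
    fanin  : node -> list of nodes driving it (missing for primary inputs)
    """
    memo = {}

    def arrive(node):
        if node not in memo:
            preds = fanin.get(node, [])
            memo[node] = delays[node] + max((arrive(p) for p in preds), default=0)
        return memo[node]

    for node in delays:
        arrive(node)
    return memo


def critical_path(delays, fanin):
    """Return (path, delay) for the slowest path through the circuit."""
    times = arrival_times(delays, fanin)
    end = max(times, key=times.get)
    path, node = [end], end
    while fanin.get(node):
        node = max(fanin[node], key=times.get)  # follow the latest-arriving input
        path.append(node)
    return path[::-1], times[end]


if __name__ == "__main__":
    # Hypothetical delays in nanoseconds, including one routing segment.
    delays = {"in_a": 0, "in_b": 0, "lut1": 2, "route": 3, "lut2": 2, "ff": 1}
    fanin = {"lut1": ["in_a", "in_b"], "route": ["lut1"],
             "lut2": ["route", "in_b"], "ff": ["lut2"]}
    print(critical_path(delays, fanin))
```

Only after placement and routing are the routing delays known, which is why this analysis gives exact figures at this point in the flow and only estimates earlier.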

After the design has been created, it must be compiled. Compilation converts the design into a bitstream that can be downloaded into the FPGA. In the Altera flow, the most important output of compilation is an SRAM Object File (.sof), which is used to program the device. The software also generates report files that provide information about the design as it compiles.

Simulation is a very important part of the design flow, and there are entire applications devoted to simulating hardware designs. There are two types of simulation: RTL and timing. RTL (or functional) simulation allows you to verify that your code behaves as intended. Timing (or post-place-and-route) simulation verifies that the design meets timing and functions appropriately in the device.

After the design is complete, its performance is checked either by downloading the configuration bits into the FPGA or by using an interface to a timing simulation program. If the performance is not satisfactory, suitable modifications are made at the appropriate point in the design flow. Once the timing and functionality have been verified, the implementation is complete.

--------------------xxxxxx------------------

