arbitery document

7/27/2019 Arbitery Document


Acknowledgement

We would like to thank our beloved parents for their endless and kind support, both mental and financial, and for encouraging us, without which we would not be what we are today.

At the outset, we sincerely thank Mr. Director, for his kind cooperation and encouragement towards the successful completion of the project work, and for providing the necessary facilities.

We are most obliged and grateful to our Principal, H.O.D ECE Dept, and internal guide,

Associate Professor, ECE Dept, for giving us guidance in completing this project successfully.

We are grateful to , Project Guide, Hyderabad, for their sagacious guidance, scholarly

advice and the inspiration offered in an amiable and pleasant manner in helping us complete this project successfully.

Last but not the least, we are thankful to our friends and well-wishers.



Design and Implementation of A Lottery-based Bandwidth

Guaranteed and Low Latency Arbiter for On-Chip Bus

Abstract

In this paper, we propose a two-level Lottery-based bus arbitration algorithm, called the RB_Lottery arbitration algorithm, where R stands for real-time and B stands for the binary group logic used for priority selection. The proposed bus arbitration solves the fairness and starvation problems that exist in the previous Lottery method, and reduces the average latency of bus requests for real-time applications. The software simulation results show that the proposed RB_Lottery algorithm provides better bandwidth guarantees and a lower average latency of bus requests than Lottery arbitration.

The bus arbiter decides which master is granted access to the bus when multiple masters issue bus requests at the same time in a system-on-chip. The previous bus arbitration algorithms, namely the static fixed priority algorithm, the time division multiplexing (TDM)/round-robin algorithm, and the Lottery ticket bus algorithm, have some drawbacks: the bus starvation problem, low system performance caused by bus distribution latency during arbitration, and large latency for masters that are assigned a low ratio of tickets. Recently, the real-time issue has been considered for bus arbitration [5]. In this paper, we propose a two-level static priority bus arbitration algorithm, the RB_Lottery algorithm, which is based on the Lottery bus arbitration method. The proposed RB_Lottery bus arbitration can handle the real-time requirements of all masters, solve the traditional bus distribution problem, guarantee the bandwidth requirement of each master, and reduce the bus arbitration latency. In the first level of bus arbitration, we use static priority real-time counters to satisfy the real-time requirements; this level is named the static priority real-time handler. In the second level of bus arbitration, we adopt a Lottery-based algorithm with binary group partition to avoid starvation and reduce bus latency.
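As a rough illustration of the two-level idea described above, the following Python sketch combines a first-level check of per-master real-time counters with a second-level ticket lottery. It is not the RB_Lottery algorithm itself: the binary group partition is omitted, and the names `Master`, `arbitrate`, `tickets`, `deadline` and `waited`, as well as the counter policy, are assumptions made only for this sketch.

```python
import random
from dataclasses import dataclass


@dataclass
class Master:
    name: str
    tickets: int        # assumed lottery weight (share of the bus)
    priority: int       # assumed static priority, lower value wins
    deadline: int = 0   # cycles a request may wait; 0 means no real-time limit
    waited: int = 0     # real-time counter: cycles spent waiting so far


def arbitrate(masters, requesting, rng):
    """Return the name of the master granted the bus this cycle, or None."""
    pending = [m for m in masters if m.name in requesting]
    if not pending:
        return None

    # Level 1 (assumed policy): masters whose real-time counter has reached
    # the deadline are served first, ordered by static priority.
    urgent = [m for m in pending if m.deadline and m.waited >= m.deadline]
    if urgent:
        winner = min(urgent, key=lambda m: m.priority)
    else:
        # Level 2: plain lottery, each pending master wins with probability
        # proportional to its ticket count.
        weights = [m.tickets for m in pending]
        winner = rng.choices(pending, weights=weights, k=1)[0]

    # Update the real-time counters of all requesting masters.
    for m in pending:
        m.waited = 0 if m is winner else m.waited + 1
    return winner.name
```

With a master holding few tickets but a deadline of 2 cycles, this sketch never lets that master lose more than two consecutive arbitration rounds, which is the kind of starvation bound the first level is meant to provide.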



INTRODUCTION

Xilinx introduced field-programmable gate arrays (FPGAs) in 1985. Figure 1 is a conceptual model of an FPGA.

FPGAs are constructed of three basic elements: logic blocks, I/O cells, and interconnection resources. A useful analogy for an FPGA is the layout of a city. The logic blocks correspond to city blocks occupied by different businesses. These businesses receive products from various suppliers within the city, just as the logic blocks receive data from other logic blocks within the FPGA, and process those products for consumption by other firms or end users, just as logic block outputs are sent to other blocks and ultimately to the device utilizing the FPGA. FPGAs and our mythical city both utilize interconnections between blocks (wire segments for FPGAs; streets and telephone connections for the city) that can be flexibly designed to meet changing needs, with routers in both cases and stoplights in one. The final elements in the model are the mechanisms for interaction with the outside world: I/O cells are to the FPGA as airports, freeways, and long-distance telephone lines are to the city. The rest of this report explores implementations of this basic three-element model in greater detail.

Configurable Logic Blocks:

The heart of the FPGA lies in the CLBs. CLBs appear in rows and columns within all FPGAs and implement the logic functions desired by the programmer. Most CLBs accomplish this with a lookup table.2 Lookup tables (LUTs) are digital memory arrays that contain the truth table for any logic function that can be implemented with the number of logic inputs available to a CLB. The output of the CLB is then the logical result of the function recorded in the lookup table. To program the CLBs, truth tables must be loaded into the LUTs of each CLB. The CLB architecture of the Xilinx XC5200 chip is described later in this report as an example.2
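The lookup-table idea can be sketched in a few lines of Python: programming a LUT means storing a truth table, and evaluating it means using the input bits as a memory address. The function names `program_lut` and `read_lut` and the bit ordering are assumptions chosen for this illustration, not part of any vendor tool.

```python
from itertools import product


def program_lut(func, n_inputs):
    """Build the truth table of func as a list of 2**n_inputs bits."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=n_inputs)]


def read_lut(table, *inputs):
    """Use the input bits as an address into the stored truth table."""
    address = 0
    for bit in inputs:  # the first input is the most significant address bit
        address = (address << 1) | bit
    return table[address]


# Example: a 4-input function, (a AND b) XOR (c OR d)
lut = program_lut(lambda a, b, c, d: (a & b) ^ (c | d), 4)
```

Once the table is loaded, `read_lut(lut, a, b, c, d)` performs no logic at all; it only indexes memory, which is why a LUT with n inputs can stand in for any n-input function.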

I/O Blocks:

I/O blocks provide for interaction with the outside world. An I/O pin can be used for input or output.4 I/O blocks can contain logic functionality, although high logic utilization decreases pin placement flexibility, because I/O blocks used for logic cannot be reassigned mid-design.5

Interconnection (Routing) Architecture:

The routing architecture usually covers 60-90% of the FPGA chip area and, fittingly, requires the longest description of the three basic FPGA elements.2 The routing architecture of FPGAs is constructed of wires segmented into various lengths that intersect each other at routing switches.3 The most popular programmable switch element (PSE) technology for implementing these routing switches, static RAM, is briefly discussed in the next section.

Two types of routing architecture are common: row-based routing, where only horizontal channels are used to connect CLBs, and symmetrical routing, where vertical and horizontal channels are utilized, as in Figure 1. Direct connection wires link neighboring CLBs across routing channels. Connections to distant blocks are implemented through programmable switch matrices (PSMs), which contain a set of PSEs that switch perpendicular wires.4 The wires routed through a PSM are either single lines, which must pass through one PSE for each CLB bypassed, or double lines, which pass two CLBs for every switch. Long lines skip switching altogether. The implementation of complex routing techniques is described later for the Xilinx XC5200.
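The difference between single, double and long lines can be made concrete by counting the switch elements a signal crosses. The sketch below is a simplified model under stated assumptions: the function name `switches_traversed` is hypothetical, and rounding up for an odd number of CLBs on a double line is an assumption rather than something stated in the text.

```python
import math


def switches_traversed(clbs_bypassed, line_type):
    """Count the programmable switch elements on a route of one line type."""
    if line_type == "single":
        return clbs_bypassed                 # one switch per CLB bypassed
    if line_type == "double":
        return math.ceil(clbs_bypassed / 2)  # one switch per two CLBs (rounded up, assumed)
    if line_type == "long":
        return 0                             # long lines skip the switches
    raise ValueError(f"unknown line type: {line_type}")
```

For a route that bypasses six CLBs, this model gives six switches on single lines, three on double lines, and none on a long line, which reflects why longer segments are preferred for distant connections.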

Static RAM Programmable Switch

The most common programmable element used for FPGA implementation is static RAM. FPGAs use permanent memory, usually PROM, to store the logic configuration of the chip. Upon power-up, each RAM cell gets a value based upon the PROM configuration. When the cell is high, the transistor conducts and current flows; when the cell is low, the transistor is cut off and no current flows.

Configurable Logic Blocks:

The diagram on the left shows one of the four identical logic cells that constitute each CLB. The segment labeled F contains a lookup table for four inputs (F4-F1). The trapezoidal objects are 2:1 multiplexers. The chip enable (CE), clock (CK) and clear (CLR) signals travel to this cell and all others in the architecture via global long lines. Each cell can be cleared individually, or all can be cleared at once. Each logic cell can implement either a D flip-flop or a latch. When the clock transitions high, the D flip-flop (FD) passes the output of the programmed logic operation to the output (Q). A feed-through path from DI to DO that does not change the logic of the input can also be implemented. This is used in routing applications discussed later.

In this case, two lookup tables (F) are used for input, each fed with the same four logic lines. The fifth input is used to toggle the 2:1 mux between the lookup tables, adding a fifth bit to the logic function. There are four lookup tables in each CLB, so four independent four-input logic functions or two independent five-input logic functions can be implemented in each block.
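The way two four-input lookup tables and a 2:1 mux yield a five-input function can be sketched as follows. This is an illustrative model only; the helper names `make_lut4` and `five_input_cell` are assumptions and do not correspond to XC5200 primitives.

```python
from itertools import product


def make_lut4(func):
    """Truth table of a 4-input function, keyed by the four input bits."""
    return {bits: func(*bits) & 1 for bits in product((0, 1), repeat=4)}


def five_input_cell(func5):
    """Split a 5-input function into two 4-input LUTs selected by a 2:1 mux."""
    lut_when_0 = make_lut4(lambda a, b, c, d: func5(0, a, b, c, d))
    lut_when_1 = make_lut4(lambda a, b, c, d: func5(1, a, b, c, d))

    def cell(sel, a, b, c, d):
        # Both LUTs see the same four lines; the fifth input drives the mux.
        return lut_when_1[(a, b, c, d)] if sel else lut_when_0[(a, b, c, d)]

    return cell


# Example: 5-input odd parity built from two 4-input tables
parity5 = five_input_cell(lambda e, a, b, c, d: e ^ a ^ b ^ c ^ d)
```

Each table stores the function with the fifth input fixed at 0 or 1, so the pair consumes two of the four LUTs in a block, consistent with two five-input functions per CLB.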

I/O Blocks:

The I/O blocks of the XC5200 are completely decoupled from the internal logic of the CLBs.5 The I/O blocks are attached to the internal logic through interconnect cells that form a ring around the chip. This extra routing layer provides connections to nearby CLBs, as well as to far-away CLBs through long lines. The XC5200 can be connected with TTL or CMOS logic.

Interconnection (Routing) Architecture:

This chip has six levels of routing hierarchy: single-length lines (1), double-length lines (2), direct connects (3), long lines/global lines (4), local interconnection matrices (5), and logic cell feed-through paths (6). The global routing matrix (GRM) contains the switch matrix architecture discussed earlier in this report. The GRM routes logic signals over the single, double and long lines, then communicates with the CLB via a 24-line interface to the LIM.4 These matrices connect far-away sections of the chip and link all CLBs to a global command structure. The remaining routing architecture of the XC5200 is contained within the Versa-Block units. These units comprise the CLBs as well as the local interconnection matrices. The local interconnection matrix (LIM) handles connections to neighboring CLBs through direct connect lines that bypass the GRMs. The LIM also handles logic cell feed-through paths, which do not perform any calculations but merely re-power a signal that has faded while passing through the chip. This splitting of the routing resources between local and global areas simplifies router design, decreases the chip space necessary for routing, and decreases the use of routing switches, which add resistance and capacitance to circuits.

A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by a customer or a designer after manufacturing, hence "field-programmable". The FPGA configuration is generally specified using a hardware description language (HDL), similar to that used for an application-specific integrated circuit (ASIC). Circuit diagrams were previously used to specify the configuration, as they were for ASICs, but this is increasingly rare. Contemporary FPGAs have large resources of logic gates and RAM blocks to implement complex digital computations. As FPGA designs employ very fast I/Os and bidirectional data buses, it becomes a challenge to verify correct timing of valid data within setup time and hold time. Floor planning enables resource allocation within the FPGA to meet these time constraints. FPGAs can be used to implement any logical function that an ASIC could perform. The ability to update the functionality after shipping, partial re-configuration of a portion of the design, and the low non-recurring engineering costs relative to an ASIC design (notwithstanding the generally higher unit cost) offer advantages for many applications.

FPGAs contain programmable logic components called "logic blocks", and a

hierarchy of reconfigurable interconnects that allow the blocks to be "wired together"

somewhat like many (changeable) logic gates that can be inter-wired in (many) different

configurations. Logic blocks can be configured to perform complex combinational functions, or

merely simple logic gates like AND and XOR. In most FPGAs, the logic blocks also include

memory elements, which may be simple flip-flops or more complete blocks of memory.

Some FPGAs have analog features in addition to digital functions. The most

common analog feature is programmable slew rate and drive strength on each output pin,

allowing the engineer to set slow rates on lightly loaded pins that would

otherwise ring unacceptably, and to set stronger, faster rates on heavily loaded pins on high-

speed channels that would otherwise run too slow. Another relatively common analog feature is

differential comparators on input pins designed to be connected to differential

signaling channels. A few "mixed signal FPGAs" have integrated peripheral analog-to-digital

converters (ADCs) and digital-to-analog converters (DACs) with analog signal conditioning

blocks allowing them to operate as a system-on-a-chip. Such devices blur the line between an

FPGA, which carries digital ones and zeros on its internal programmable interconnect fabric,

and field-programmable analog array (FPAA), which carries analog values on its internal

programmable interconnect fabric.

HISTORY

The FPGA industry sprouted from programmable read-only memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs both had the option of being programmed in batches in a factory or in the field (field-programmable). However, programmable logic was hard-wired between logic gates.

In the late 1980s the Naval Surface Warfare Department funded an experiment proposed

by Steve Casselman to develop a computer that would implement 600,000 reprogrammable

gates. Casselman was successful and a patent related to the system was issued in 1992.

Some of the industry's foundational concepts and technologies for programmable logic arrays, gates, and logic blocks are founded in patents awarded to David W. Page and LuVerne R. Peterson in 1985.

Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the first commercially viable field-programmable gate array in 1985: the XC2064. The XC2064 had programmable gates and programmable interconnects between gates, the beginnings of a new technology and market. The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two 3-input lookup tables (LUTs). More than 20 years later, Freeman was entered into the National Inventors Hall of Fame for his invention.

Xilinx continued unchallenged and grew quickly from 1985 to the mid-1990s, when competitors sprouted up, eroding significant market share. By 1993, Actel was serving about 18 percent of the market.

The 1990s were an explosive period of time for FPGAs, both in sophistication and the

volume of production. In the early 1990s, FPGAs were primarily used in telecommunications

and networking. By the end of the decade, FPGAs found their way into consumer, automotive,

and industrial applications.

Modern Developments

A recent trend has been to take the coarse-grained architectural approach a step further by combining the logic blocks and interconnects of traditional FPGAs with embedded microprocessors and related peripherals to form a complete "system on a programmable chip". This work mirrors the architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems Group, which combined a reconfigurable CPU architecture on a single chip called the SB24. That work was done in 1982. Examples of such hybrid technologies can be found in the Xilinx Zynq-7000 All Programmable SoC, which includes a 1.0 GHz dual-core ARM Cortex-A9 MPCore processor embedded within the FPGA's logic fabric, or in the Altera Arria V FPGA, which includes an 800 MHz dual-core ARM Cortex-A9 MPCore. The Atmel FPSLIC is another such device, which uses an AVR processor in combination with Atmel's programmable logic architecture. The Actel SmartFusion devices incorporate an ARM Cortex-M3 hard processor core (with up to 512 kB of flash and 64 kB of RAM) and analog peripherals such as a multi-channel ADC and DACs into their flash-based FPGA fabric.

In 2010, Xilinx Inc. introduced the first All Programmable System on a Chip, branded Zynq-7000, which fused features of an ARM high-end microcontroller (hard-core implementations of a 32-bit processor, memory, and I/O) with an FPGA fabric to make FPGAs

easier for embedded designers to use. By incorporating the ARM processor-based platform into a

28 nm FPGA family, the extensible processing platform enables system architects and embedded

software developers to apply a combination of serial and parallel processing to address the

challenges they face in designing today's embedded systems, which must meet ever-growing

demands to perform highly complex functions. By allowing them to design in a familiar ARM

environment, embedded designers benefit from multiple advantages including: decreased time-

to-market, significantly reduced power, reduced BOM (bill of materials) cost, etc. These are

among many advantages of an All Programmable FPGA platform compared to more traditional

design cycles associated with ASICs.

An alternate approach to using hard-macro processors is to make use of soft processor cores that are implemented within the FPGA logic. Nios II, MicroBlaze and Mico32 are examples of popular soft core processors.

As previously mentioned, many modern FPGAs have the ability to be reprogrammed at "run time," and this is leading to the idea of reconfigurable computing or reconfigurable systems: CPUs that reconfigure themselves to suit the task at hand.

Additionally, new, non-FPGA architectures are beginning to emerge. Software-

configurable microprocessors such as the Stretch S5000 adopt a hybrid approach by providing an

array of processor cores and FPGA-like programmable cores on the same chip.

Gates

1987: 9,000 gates, Xilinx

1992: 600,000, Naval Surface Warfare Department

Early 2000s: Millions

Market size

1985: First commercial FPGA : Xilinx XC2064

1987: $14 million

~1993: >$385 million

2005: $1.9 billion

2010 estimates: $2.75 billion

FPGA design starts

2005: 80,000

2008: 90,000

FPGA comparisons

Historically, FPGAs have been slower, less energy efficient and generally achieved

less functionality than their fixed ASIC counterparts. An older study had shown that designs

implemented on FPGAs need on average 40 times as much area, draw 12 times as much dynamic

power, and are three times slower than the corresponding ASIC implementations; however, the

times are changing. Today's FPGAs such as the Xilinx Virtex-7 or the Altera Stratix 5 rival

ASIC and ASSP solutions providing significantly reduced power, increased speed, lower BOM

cost, minimal implementation real-estate, and maximum on-the-fly configurability. Where

previously a design may have included 6 to 10 ASICs, today the same design can be achieved

using only one FPGA.

A Xilinx Zynq-7000 All Programmable System on a Chip.

Advantages include the ability to re-program in the field to fix bugs, and may include a shorter time to market and lower non-recurring engineering costs. Vendors can also take a middle road by developing their hardware on ordinary FPGAs, but manufacturing their final version so that it can no longer be modified after the design has been committed.

Xilinx claims that several market and technology dynamics are changing the ASIC/FPGA paradigm:

Integrated circuit costs are rising aggressively

ASIC complexity has lengthened development time

R&D resources and headcount are decreasing

Revenue losses for slow time-to-market are increasing

Financial constraints in a poor economy are driving low-cost technologies

These trends make FPGAs a better alternative than ASICs for a larger number of higher-volume applications than they have been historically used for, to which the company attributes the growing number of FPGA design starts.

Some FPGAs have the capability of partial re-configuration that lets one portion of the device be re-programmed while other portions continue running.

Complex Programmable Logic Devices (CPLD)

The primary differences between CPLDs (Complex Programmable Logic Devices)

and FPGAs are architectural. A CPLD has a somewhat restrictive structure consisting of one or

more programmable sum-of-products logic arrays feeding a relatively small number of clocked

registers. The result of this is less flexibility, with the advantage of more predictable timing

delays and a higher logic-to-interconnect ratio. The FPGA architectures, on the other hand, are

dominated by interconnect. This makes them far more flexible (in terms of the range of designs

that are practical for implementation within them) but also far more complex to design for.

In practice, the distinction between FPGAs and CPLDs is often one of size, as FPGAs are usually much larger in terms of resources than CPLDs. Typically only FPGAs contain more complex embedded functions such as adders, multipliers, memory, and SerDes. Another common distinction is that CPLDs contain embedded flash to store their configuration, while FPGAs usually, but not always, require an external flash memory.

Security considerations

With respect to security, FPGAs have both advantages and disadvantages as

compared to ASICs or secure microprocessors. FPGAs' flexibility makes malicious

modifications during fabrication a lower risk. Previously, for many FPGAs, the design bitstream was exposed while the FPGA loaded it from external memory (typically on every power-on). All major FPGA vendors now offer a spectrum of security solutions to designers, such as bitstream encryption and authentication. For example, Altera and Xilinx offer AES (up to 256 bit) encryption for bitstreams stored in an external flash memory.

FPGAs that store their configuration internally in nonvolatile flash memory, such as Microsemi's ProASIC 3 or Lattice's XP2 programmable devices, do not expose the bitstream and do not need encryption. In addition, flash memory for the LUTs provides single event upset (SEU) protection for space applications.

Applications

Applications of FPGAs include digital signal processing, software-defined radio, ASIC prototyping, medical imaging, computer vision, speech recognition, cryptography, bioinformatics, computer hardware emulation, radio astronomy, metal detection and a growing range of other areas.

FPGAs originally began as competitors to CPLDs and competed in a similar space, that of glue logic for PCBs. As their size, capabilities, and speed increased, they began to take over larger and larger functions, to the point where some are now marketed as full systems on chips (SoC). Particularly with the introduction of dedicated multipliers into FPGA architectures in the late 1990s, applications which had traditionally been the sole reserve of DSPs began to incorporate FPGAs instead.

Traditionally, FPGAs have been reserved for specific vertical applications where the

volume of production is small. For these low-volume applications, the premium that companies

pay in hardware costs per unit for a programmable chip is more affordable than the development

resources spent on creating an ASIC for a low-volume application.

Today, new cost and performance dynamics have broadened the range of viable

applications.

Common FPGA Applications:

Aerospace and Defense: Avionics/DO-254, MILCOM, Missiles & Munitions, Secure Solutions, Space

ASIC Prototyping

Audio: Connectivity Solutions, Portable Electronics, Radio, Digital Signal Processing (DSP)

Automotive: High Resolution Video, Image Processing, Vehicle Networking and Connectivity, Automotive Infotainment

Broadcast: Real-Time Video Engine, Edge QAM, Encoders, Displays, Switches and Routers

Consumer Electronics: Digital Displays, Digital Cameras, Multi-function Printers, Portable Electronics, Set-top Boxes

Data Center: Servers, Security, Routers, Switches, Gateways, Load Balancing

High Performance Computing: Servers, Super Computers, SIGINT Systems, High-end RADARs, High-end Beam Forming Systems, Data Mining Systems

Industrial: Industrial Imaging, Industrial Networking, Motor Control

Medical: Ultrasound, CT Scanner, MRI, X-ray, PET, Surgical Systems

Security: Industrial Imaging, Secure Solutions, Image Processing

Video & Image Processing: High Resolution Video, Video Over IP Gateway, Digital Displays, Industrial Imaging

Wired Communications: Optical Transport Networks, Network Processing, Connectivity Interfaces

Wireless Communications: Baseband, Connectivity Interfaces, Mobile Backhaul, Radio

Architecture

The most common FPGA architecture consists of an array of logic blocks (called

Configurable Logic Block, CLB, or Logic Array Block, LAB, depending on vendor), I/O pads,

and routing channels. Generally, all the routing channels have the same width (number of wires).

Multiple I/O pads may fit into the height of one row or the width of one column in the array.

An application circuit must be mapped into an FPGA with adequate resources. While

the number of CLBs/LABs and I/Os required is easily determined from the design, the number of

routing tracks needed may vary considerably even among designs with the same amount of logic.

For example, a crossbar switch requires much more routing than a systolic array with the same

gate count. Since unused routing tracks increase the cost (and decrease the performance) of the

part without providing any benefit, FPGA manufacturers try to provide just enough tracks so that

most designs that will fit in terms of lookup tables (LUTs) and I/Os can be routed. This is

determined by estimates such as those derived from Rent's rule or by experiments with existing

designs.

In general, a logic block (CLB or LAB) consists of a few logical cells (called ALM,

LE, Slice, etc.). A typical cell consists of a 4-input LUT, a full adder (FA) and a D-type flip-flop,

as shown below. In this figure the LUT is split into two 3-input LUTs. In normal mode those are

combined into a 4-input LUT through the left mux. In arithmetic mode, their outputs are fed

to the FA. The selection of mode is programmed into the middle multiplexer. The output can be

either synchronous or asynchronous, depending on the programming of the mux to the right, in

the figure example. In practice, all or part of the FA is implemented as functions in the LUTs in

order to save space.

Simplified example illustration of a logic cell
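The behavior of the example logic cell can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the function names, the argument order and the LSB-first input ordering are hypothetical, and the D-type flip-flop and the output mux are left out.

```python
def lut(table, inputs):
    """Return one output bit from a lookup table.

    `table` holds 2**k bits and `inputs` holds k input bits, LSB first.
    """
    index = sum(bit << i for i, bit in enumerate(inputs))
    return table[index]


def full_adder(a, b, cin):
    """Return (sum, carry_out) of three input bits."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def logic_cell(lut_lo, lut_hi, a, b, c, d, cin, arithmetic_mode):
    """Model of the cell: two 3-input LUTs, a mode mux and a full adder.

    Returns (output, carry_out). In normal mode the carry is always 0.
    """
    lo = lut(lut_lo, (a, b, c))
    hi = lut(lut_hi, (a, b, c))
    if arithmetic_mode:
        # Arithmetic mode: both LUT outputs feed the full adder.
        return full_adder(lo, hi, cin)
    # Normal mode: input d selects one 3-input LUT, giving a 4-input LUT.
    return (hi if d else lo), 0


# Example configuration: a 4-input AND gate in normal mode.
AND4_LO = [0] * 8
AND4_HI = [0] * 7 + [1]
```

With this configuration, `logic_cell(AND4_LO, AND4_HI, 1, 1, 1, 1, 0, False)` evaluates the 4-input AND of the inputs.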

ALMs and Slices usually contain 2 or 4 structures similar to the example figure, with some

shared signals.

CLBs/LABs typically contain a few ALMs/LEs/Slices.

In recent years, manufacturers have started moving to 6-input LUTs in their high

performance parts, claiming increased performance.

Since clock signals (and often other high-fan-out signals) are normally routed via special-

purpose dedicated routing networks in commercial FPGAs, they and other signals are separately

managed.

For this example architecture, the locations of the FPGA logic block pins are shown below.

Logic Block Pin Locations

Each input is accessible from one side of the logic block, while the output pin can

connect to routing wires in both the channel to the right and the channel below the logic block.

Each logic block output pin can connect to any of the wiring segments in the channels adjacent

to it.

Similarly, an I/O pad can connect to any one of the wiring segments in the channel

adjacent to it. For example, an I/O pad at the top of the chip can connect to any of the W wires

(where W is the channel width) in the horizontal channel immediately below it.

Generally, the FPGA routing is unsegmented. That is, each wiring segment spans

only one logic block before it terminates in a switch box. By turning on some of the

programmable switches within a switch box, longer paths can be constructed. For higher speed

interconnect, some FPGA architectures use longer routing lines that span multiple logic blocks.

Whenever a vertical and a horizontal channel intersect, there is a switch box. In this

architecture, when a wire enters a switch box, there are three programmable switches that allow

it to connect to three other wires in adjacent channel segments. The pattern, or topology, of

switches used in this architecture is the planar or domain-based switch box topology. In this

switch box topology, a wire in track number one connects only to wires in track number one in

adjacent channel segments, wires in track number 2 connect only to other wires in track number

2 and so on. The figure below illustrates the connections in a switch box.

Switch box topology
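The planar (domain-based) rule described above, where a wire on a given track can only reach the same track on the other three sides, can be sketched as follows. The side names and the function name are assumptions made for illustration.

```python
SIDES = ("north", "east", "south", "west")


def planar_switch_box(side, track, channel_width):
    """Return the (side, track) pairs reachable from a wire entering the box.

    In the planar topology a wire on track t connects only to track t
    in the three other adjacent channel segments.
    """
    if side not in SIDES:
        raise ValueError(f"unknown side: {side}")
    if not 0 <= track < channel_width:
        raise ValueError("track is outside the channel")
    return [(other, track) for other in SIDES if other != side]
```

For example, `planar_switch_box("west", 2, 4)` lists the three programmable connections available to the wire on track 2 that enters from the west.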

Modern FPGA families expand upon the above capabilities to include higher level

functionality fixed into the silicon. Having these common functions embedded into the silicon

reduces the area required and gives those functions increased speed compared to building them

from primitives. Examples of these include multipliers, generic DSP blocks, embedded

processors, high speed IO logic and embedded memories.

FPGAs are also widely used for systems validation including pre-silicon validation,

post-silicon validation, and firmware development. This allows chip companies to validate their

design before the chip is produced in the factory, reducing the time-to-market.

To shrink the size and power consumption of FPGAs, vendors such as Tabula and Xilinx have

introduced new 3D or stacked architectures. Following the introduction of its 28 nm 7-series

FPGAs, Xilinx revealed that several of the highest-density parts in those FPGA product lines

will be constructed using multiple dies in one package, employing technology developed for 3D

construction and stacked-die assemblies. The technology stacks several (three or four) active

FPGA dice side-by-side on a silicon interposer, a single piece of silicon that carries passive

interconnect.

FPGA design and programming

To define the behavior of the FPGA, the user provides a hardware description

language (HDL) or a schematic design. The HDL form is more suited to work with large

structures because it's possible to just specify them numerically rather than having to draw every

piece by hand. However, schematic entry can allow for easier visualization of a design.

Then, using an electronic design automation tool, a technology-mapped netlist is

generated. The netlist can then be fitted to the actual FPGA architecture using a process

called place-and-route, usually performed by the FPGA company's proprietary place-and-route

software. The user will validate the map, place and route results via timing analysis, simulation,

and other verification methodologies. Once the design and validation process is complete, the

binary file generated (also using the FPGA company's proprietary software) is used to

(re)configure the FPGA. This file is transferred to the FPGA/CPLD via a serial interface (JTAG)

or to an external memory device like an EEPROM.

The most common HDLs are VHDL and Verilog, although in an attempt to reduce the

complexity of designing in HDLs, which have been compared to the equivalent of assembly

languages, there are moves to raise the abstraction level through the introduction of alternative

languages. National Instruments' LabVIEW graphical programming language (sometimes

referred to as "G") has an FPGA add-in module available to target and program FPGA hardware.

To simplify the design of complex systems in FPGAs, there exist libraries of

predefined complex functions and circuits that have been tested and optimized to speed up the

design process. These predefined circuits are commonly called IP cores, and are available from

FPGA vendors and third-party IP suppliers (rarely free, and typically released under proprietary

licenses). Other predefined circuits are available from developer communities such

as OpenCores (typically released under free and open source licenses such as the GPL, BSD or

similar license), and other sources.

In a typical design flow, an FPGA application developer will simulate the design at

multiple stages throughout the design process. Initially the RTL description

in VHDL or Verilog is simulated by creating test benches to simulate the system and observe

results. Then, after the synthesis engine has mapped the design to a netlist, the netlist is

translated to a gate level description where simulation is repeated to confirm the synthesis

proceeded without errors. Finally the design is laid out in the FPGA at which point propagation

delays can be added and the simulation run again with these values back-annotated onto the

netlist.

Basic process technology types

SRAM - Based on static memory technology. In-system programmable and re-programmable.

Requires external boot devices. CMOS. Currently in use.

Antifuse - One-time programmable. CMOS.

PROM - Programmable Read-Only Memory technology. One-time programmable because of

plastic packaging. Obsolete.

EPROM - Erasable Programmable Read-Only Memory technology. One-time programmable,

but devices with a window can be erased with ultraviolet (UV) light. CMOS. Obsolete.

EEPROM - Electrically Erasable Programmable Read-Only Memory technology. Can be

erased, even in plastic packages. Some but not all EEPROM devices can be in-system

programmed. CMOS.

Flash - Flash-erase EPROM technology. Can be erased, even in plastic packages. Some but not

all flash devices can be in-system programmed. Usually, a flash cell is smaller than an equivalent

EEPROM cell and is therefore less expensive to manufacture. CMOS.

Fuse - One-time programmable. Bipolar. Obsolete.

Major manufacturers

Xilinx and Altera are the current FPGA market leaders and long-time industry rivals.

Together, they control over 80 percent of the market. Both Xilinx and Altera provide

free Windows and Linux design software that supports a limited set of devices. Other

competitors include Lattice Semiconductor (SRAM based with integrated configuration Flash,

instant-on, low power, live reconfiguration), Actel (now Microsemi, antifuse, flash-based,

mixed-signal), SiliconBlue Technologies (extremely low power SRAM-based FPGAs with

optional integrated nonvolatile configuration memory; acquired by Lattice in

2011), Achronix (SRAM based, 1.5 GHz fabric speed), and QuickLogic (handheld focused

CSSP, no general purpose FPGAs).

Design and Implementation of A Lottery-based Bandwidth

Guaranteed and Low Latency Arbiter for On-Chip Bus

1. Introduction

The bus arbiter decides which master is granted access to the bus when multiple masters

issue bus requests at the same time in a system-on-chip. The previous bus arbitration

algorithms, namely the static fixed priority algorithm, the time division multiplexing (TDM)/

Round-robin algorithm, and the Lottery bus algorithm, have several drawbacks: bus starvation,

low system performance caused by bus distribution latency during arbitration, and large latency

for masters that are assigned a small share of the tickets. Recently, the real-time issue has been

considered for bus arbitration [5]. In this paper, we propose a two-level static priority bus

arbitration algorithm, the RB_Lottery algorithm, which is based on the Lottery bus arbitration

method. The proposed RB_Lottery bus arbitration can handle the real-time requirements of all

masters, solves the traditional bus distribution problem, guarantees the bandwidth

requirement of each master, and reduces the bus arbitration latency. In the first level of

arbitration, static priority real-time counters satisfy the real-time requirements; this level

is named the static priority real-time handler. In the second level of arbitration, we adopt a

Lottery-based algorithm with binary group partition to avoid starvation and reduce bus

latency.


2. Previous Bus Arbitration Schemes for SOC Bus

Communications

2.1 Static Fixed Priority Algorithm

The static fixed priority algorithm assigns a unique priority value to each master, and

the arbiter periodically checks the requests of all masters. When several masters

issue requests simultaneously, the master with the highest priority is granted

access to the bus. The advantages of the scheme are quick arbitration and a simple architecture.

However, static priority based arbitration allocates communication bandwidth to each

master according to its priority, so a low priority master suffers

bandwidth starvation if there is heavy high priority traffic on the bus.
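The starvation effect can be seen in a minimal sketch of fixed priority arbitration. The function names are hypothetical, and the convention that index 0 is the highest priority master is an assumption of this example.

```python
def fixed_priority_grant(requests):
    """Return the index of the granted master, or None if nobody requests.

    `requests` is a list of booleans; index 0 has the highest priority.
    """
    for master, requesting in enumerate(requests):
        if requesting:
            return master
    return None


def count_grants(cycles, requests):
    """Apply the same request pattern for `cycles` bus cycles and count grants."""
    grants = [0] * len(requests)
    for _ in range(cycles):
        winner = fixed_priority_grant(requests)
        if winner is not None:
            grants[winner] += 1
    return grants
```

When master 0 requests in every cycle, `count_grants(100, [True, False, True])` gives every grant to master 0, and master 2 never reaches the bus.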

2.2 Two-level TDM / Round-robin Algorithm

The time division multiplexed (TDM) scheme divides the scheduling execution time on the bus into

time slots and allocates the time slots to the masters. Each time slot can span several

physical transactions on the bus. The arbitration can provide flexible bandwidth assignments,

because a master that has reserved more than one slot can be granted access to the bus

multiple times. The 1st level of arbitration uses a timing wheel, where each slot is statically

reserved for a unique master. If the master that owns the current time slot does

not issue a request, the current time slot would be wasted. To repair this defect, the 2nd level of

arbitration, which uses the Round-robin algorithm, reallocates the

unused slot to another requesting master.
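The two levels can be sketched as below. This is a simplified model rather than a reference implementation: the class name, the one-grant-per-slot simplification and the way the round-robin pointer advances are assumptions of the example.

```python
class TdmRoundRobinArbiter:
    """Two-level arbiter: a TDM timing wheel with a round-robin fallback."""

    def __init__(self, wheel, n_masters):
        self.wheel = list(wheel)      # wheel[slot] = master that owns the slot
        self.n_masters = n_masters
        self.slot = 0                 # current position on the timing wheel
        self.rr_next = 0              # next candidate for the round-robin level

    def arbitrate(self, requests):
        """Return the granted master for the current slot, or None."""
        owner = self.wheel[self.slot]
        self.slot = (self.slot + 1) % len(self.wheel)
        if requests[owner]:
            return owner              # level 1: the slot owner uses its slot
        # Level 2: hand the unused slot to the next requesting master.
        for offset in range(self.n_masters):
            candidate = (self.rr_next + offset) % self.n_masters
            if requests[candidate]:
                self.rr_next = (candidate + 1) % self.n_masters
                return candidate
        return None
```

A wheel such as `[0, 0, 1, 2]` reserves two of four slots for master 0, which corresponds to the case of a master holding more than one slot.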

2.3 Lottery Bus Algorithm

For the Lottery bus arbitration algorithm, the arbiter acts like a lottery manager

that decides which participant wins the prize. Each master is statically assigned a number of

lottery tickets, and the lottery manager gathers the bus access requests from all of the masters.

The lottery manager then generates a pseudo random number, which corresponds to

one ticket number, so a master that owns more tickets is more likely to be granted. The

number of tickets in the lottery arbitration algorithm is equal to the weight of each master.

Lottery arbitration is a probability-based distribution, which can avoid bus

starvation. Meanwhile, Lottery arbitration gives good control over the communication

bandwidth allocated to each master, but a master that owns fewer tickets has a larger average

latency than the other masters. In Figure 1, let the bus masters be C1, C2, ...,

Cn, and let the number of tickets held by each master be t1, t2, ..., tn. At any bus cycle,

let the pending requests be represented by a set of Boolean variables ri for i = 1, 2,

..., n, where ri = 1 if master Ci has a pending request and ri = 0 otherwise. For Lottery

arbitration, the granted master is chosen in a randomized way, i.e. the probability of granting

master Ci is $P(C_i) = \frac{r_i t_i}{\sum_{j=1}^{n} r_j t_j}$.

Lottery bus arbiter for four bus masters
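A minimal sketch of a lottery draw, assuming that a requesting master wins with probability proportional to its share of the tickets held by all requesting masters. The function name and the optional `rng` parameter are hypothetical; a hardware arbiter would use a pseudo random number generator circuit instead of Python's `random` module.

```python
import random


def lottery_grant(requests, tickets, rng=random):
    """Return the index of the granted master, or None if nobody requests.

    requests[i] is True when master i has a pending request and
    tickets[i] is the number of lottery tickets held by master i.
    """
    total = sum(t for r, t in zip(requests, tickets) if r)
    if total == 0:
        return None
    draw = rng.randrange(total)       # winning ticket number in [0, total)
    running = 0
    for master, (r, t) in enumerate(zip(requests, tickets)):
        if r:
            running += t              # partial sum of tickets so far
            if draw < running:
                return master
    return None
```

With tickets `[1, 2, 3, 4]` and all four masters requesting, master 3 should win about 40 percent of the draws, while master 0 wins about 10 percent and therefore waits longer on average.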


3. Proposed Bus Arbitration Scheme

3.1 Scope of the Arbitration

Since the previous arbitration algorithms cannot handle strict real-time requirements, we

propose a two-level arbitration algorithm called RB_Lottery bus arbitration. The

proposed arbiter architecture is shown in Figure 2. In the first level, the static priority real-time

handler handles the real-time requirements. In the second level, the binary group

partition with the Lottery-based scheme guarantees the bandwidth that each master

needs and reduces the distribution latency during bus arbitration. Note that once the static

priority real-time handler in the first level generates a valid grant output (grant=1) for one

bus master, the output of the second level arbitration is disabled. Conversely, the grant

output of the second level arbitration is valid only when the first level arbitration does not

output a valid grant. The proposed RB_Lottery scheme is described in detail

in the following sections.

3.2 Proposed Arbitration Algorithm

The static priority real-time handler provides a priority real-time counter for the real-time requirement

of each master. Initially, a suitable initial value is loaded into the real-time counter of every

bus master. While a master has a request pending, its real-time counter is

decreased by 1 until the master is granted. Two cases can occur when a master issues a

request for a bus grant. In the first case, the counter value reaches zero; the first

level arbitration then generates a valid grant, and the real-time counter is

reset. In the second case, the second level arbitration grants the master before its counter

reaches zero, and the corresponding real-time counter is likewise reset.

If several real-time counters reach zero simultaneously, the master with the highest priority is granted.
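The counter behavior of the first level can be sketched as follows. This is a behavioral model under assumptions: one call to `step` stands for one bus cycle, index 0 is the highest priority master, and the class and method names are hypothetical.

```python
class RealTimeHandler:
    """First-level arbiter: one down-counter per master, static priority on ties."""

    def __init__(self, initial_counts):
        self.initial = list(initial_counts)
        self.counters = list(initial_counts)

    def step(self, requests):
        """Advance one cycle and return the master granted by level 1, or None."""
        # Count down for every master whose request is still pending.
        for master, requesting in enumerate(requests):
            if requesting and self.counters[master] > 0:
                self.counters[master] -= 1
        # Grant the highest-priority requesting master whose counter expired.
        for master, requesting in enumerate(requests):
            if requesting and self.counters[master] == 0:
                self.counters[master] = self.initial[master]
                return master
        return None

    def notify_second_level_grant(self, master):
        """Reset the counter of a master that level 2 granted before expiry."""
        self.counters[master] = self.initial[master]
```

A return value of `None` corresponds to the case where the first level outputs no valid grant, so the second-level grant is the one that takes effect.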


Two-level arbitration scheme for the proposed RB_Lottery arbiter

In the binary group logic, each master is given an identification number for its priority

request, and the masters are grouped in pairs into binary sets. If the higher priority master of a

set issues a request signal, this signal is delivered to the Lottery-based block. The binary group logic

architecture is shown in Figure 3. In Figure 3, the input net Lo priority request must be connected

to the request signal of the lower priority master in a binary set, and the Hi priority request must be

connected to the request signal of the higher priority master in the same binary set.

Binary group logic circuit for priority selection
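One possible reading of the binary group logic is a multiplexer that forwards the Hi priority request when it is asserted and the Lo priority request otherwise. The sketch below models that reading only; the function name and the returned `selected` label are assumptions, and the actual circuit in Figure 3 may differ.

```python
def binary_group(hi_request, lo_request):
    """Model one binary set of two masters.

    Returns (group_request, selected), where group_request is the bit
    passed to the Lottery-based block and selected names the master of
    the set that would receive a grant ("hi", "lo" or None).
    """
    if hi_request:
        return 1, "hi"
    if lo_request:
        return 1, "lo"
    return 0, None
```

Under this reading, a set in which both masters request always forwards the higher priority one, so `binary_group(1, 1)` selects `"hi"`.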


3.3 Proposed Arbiter Architecture

To describe the architecture of the proposed RB_Lottery arbiter clearly, we discuss the proposed

bus arbitration for the 4-master case in Figure 4. Let the request signals of

Master 1 (M1), Master 2 (M2), Master 3 (M3) and Master 4 (M4) be r1, r2, r3

and r4, respectively. The priority order is M1 > M2 > M3 > M4, where M1

has the highest priority. In the Lottery-based part, t1=1 and t2=2. If a master wants to access or

transfer data over the bus, its request signal is set high (i.e., 1);

otherwise, its request signal is set low (i.e., 0). In condition 1, suppose that

r1=r3=1, r2=r4=0, and the real-time counters of M1 and M3 reach zero simultaneously.

Since the priority of M1 is more important than that of M3, M1 will obtain the bus grant,

and then the grant signal, gnt[3], will be set to 1 in the static priority real-time handler. At the

same time, the grant output of the binary group with Lottery-based block is disabled. In

condition 2, suppose that r1=r3=1, r2=r4=0, but the real-time counters of M1 and M3 have not

reached zero. Then the rb1 signal at the output of the binary group logic, MUX 1, is set to 1.

Meanwhile, the rb2 signal at the output of the binary group logic, MUX 2, is also set to 1.

Thus, the grant output of the Lottery-based scheme is active. Table 1 shows the truth table

of the grant decoder in the RB_Lottery arbiter for the 4-master case. The generated random number is

compared in parallel with two partial sums, and the outputs of the comparators are b0 and b1.

A comparator outputs 1 if the value of the random number is less than the partial sum at

its other input.


The architecture of the proposed RB_Lottery bus arbiter for the 4-master case
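The partial-sum comparison can be sketched as below, using the group request bits rb1, rb2 and the ticket counts t1=1, t2=2 from the text. The decoding rule (the first comparator that outputs 1 picks the winning group) is an assumption, because Table 1 is not reproduced here, and the function name is hypothetical.

```python
def lottery_stage(rb, tickets, random_number):
    """Compare a random number with the partial sums of requested tickets.

    rb[i] is the request bit of binary group i and tickets[i] its ticket
    count. Returns (partial_sums, comparator_bits, winning_group).
    """
    partial_sums = []
    running = 0
    for request, count in zip(rb, tickets):
        running += request * count        # groups without a request add 0
        partial_sums.append(running)
    # Each comparator outputs 1 when the random number is below its partial sum.
    bits = [1 if random_number < s else 0 for s in partial_sums]
    # Assumed decode rule: the first asserted comparator selects the group.
    winner = next((i for i, bit in enumerate(bits) if bit), None)
    return partial_sums, bits, winner
```

For condition 2 in the text (rb1=rb2=1), the partial sums are 1 and 3, so a random number drawn from 0 to 2 selects the first group in one case out of three and the second group in two cases out of three.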


AMBA AXI4 architecture

AMBA AXI4 [3] supports data transfers up to 256 beats and unaligned data transfers using byte

strobes. In the AMBA AXI4 system, 16 masters and 16 slaves are interfaced. Each master and slave

has its own 4-bit ID tag. The AMBA AXI4 system consists of masters, slaves and a bus (arbiters and

decoders). The system has five channels, namely the write address channel, write data

channel, read data channel, read address channel, and write response channel. The AXI4 protocol

supports the following mechanisms:

Unaligned data transfers and updated write response requirements.

Variable-length bursts, from 1 to 256 data transfers per burst for INCR bursts (1 to 16 for other burst types).

Transfer sizes of 8, 16, 32, 64, 128, 256, 512 or 1024 bits are supported.

Updated AWCACHE and ARCACHE signalling details.

Each transaction is burst-based and has address and control information on the address

channel that describes the nature of the data to be transferred. The data is transferred between

master and slave using a write data channel to the slave or a read data channel to the master.

Table gives the information of signals used in the complete design of the protocol. The write

operation starts when the master sends an address and control information on the write

address channel as shown in fig. 1. The master then sends each item of write data over the write

data channel. The master keeps the VALID signal LOW until the write data is available. When the

master sends the last data item, the WLAST signal goes HIGH. When the slave has accepted all

the data items, it drives a write response signal BRESP[1:0] back to the master to indicate that

the write transaction is complete. This signal indicates the status of the write transaction. The

allowable responses are OKAY, EXOKAY, SLVERR, and DECERR. For a read operation, after the read address

appears on the address bus, the data transfer occurs on the read data channel as shown in the figure. The

slave keeps the VALID signal LOW until the read data is available. For the final data transfer of

the burst, the slave asserts the RLAST signal to show that the last data item is being transferred.

The RRESP[1:0] signal indicates the status of the read transfer. The allowable responses are

OKAY, EXOKAY, SLVERR, and DECERR.
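The order of events in a write transaction can be sketched as a list of channel transfers. This is an illustrative model, not the design used in this project: the function name and the event format are hypothetical, VALID/READY handshaking is omitted, and the slave is assumed to always answer OKAY. The response encodings and the AWLEN convention (number of transfers minus one) follow the AXI protocol.

```python
from enum import IntEnum


class Resp(IntEnum):
    """Two-bit BRESP/RRESP encodings."""
    OKAY = 0b00
    EXOKAY = 0b01
    SLVERR = 0b10
    DECERR = 0b11


def write_burst(address, data_items):
    """Return the channel events of one write burst, in order."""
    if not 1 <= len(data_items) <= 256:
        raise ValueError("a burst carries 1 to 256 data transfers")
    # Write address channel: address and control information first.
    events = [("AW", {"AWADDR": address, "AWLEN": len(data_items) - 1})]
    # Write data channel: WLAST is HIGH only on the final data item.
    last = len(data_items) - 1
    for index, item in enumerate(data_items):
        events.append(("W", {"WDATA": item, "WLAST": int(index == last)}))
    # Write response channel: the slave reports the transaction status.
    events.append(("B", {"BRESP": Resp.OKAY}))
    return events
```

Calling `write_burst(0x1000, [10, 20, 30])` yields one address event, three data events with WLAST set on the third, and one response event.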


The work carried out in this project is the achievement of communication between one

master and one slave. The AMBA AXI4 slave is designed with an operating frequency of 100 MHz,

which gives a clock period of 10 ns. To access the slave, an interconnect is needed; hence the

interconnect signals are also studied. Master block functions are assumed to be available and the

slave characteristics are studied. The AMBA AXI4 system components consist of

1) Master

2) AMBA AXI4 Interconnect

2.1) Arbiters

2.2) Decoders

3) Slave

The master is connected to the interconnect using a slave interface and the slave is

connected to the interconnect using a master interface, as shown in the figure. That is, the AXI4 master

connects to the AXI4 slave interface port of the interconnect, and the AXI4 slave connects

to the AXI4 master interface port of the interconnect. The parallel capability of this

interconnect enables master M1 to access one slave at the same time as master M0 is accessing the

other.


AMBA AXI4 slave Read/Write block Diagram.

SOFTWARE TOOLS

Verilog HDL

Verilog is used to model the design in this project. It is a hardware description language (HDL)

used to model electronic systems. It is sometimes called Verilog HDL, which supports the

design, verification, and implementation of analog, digital, and mixed-signal circuits at various

levels of abstraction.

Verilog was originally developed and owned by Gateway Design Automation in 1984. Cadence

Design Systems later purchased Gateway and, in 1990, continued selling Verilog-XL as a

Verilog-HDL simulator with PLI support. In 1995 Cadence released the specs for Verilog-HDL

and they were accepted as the IEEE 1364 standard, which included the PLI1.0 (TF/ACC) routines as

a standard for all Verilog simulators. In 1993 PLI2.0 (VPI) routines were released as a standard

by OVI, and in 1999 the IEEE was due to vote on updating the 1364 standard to include PLI2.0. In 2001

the IEEE accepted the updated Verilog standard, commonly known as Verilog 2001, and today there

are a dozen simulators that simulate Verilog HDL.

Verilog was created as a language for industry rather than academia. It has a

C-like programming style that closely represents hardware. VHDL supports 9-valued logic,

whereas Verilog supports 7 strengths on 3 values. Compared to Verilog, VHDL offers more

programming constructs, whereas Verilog is closer to hardware.

Verilog HDL is a general purpose hardware description language that is easy to learn

and easy to use. It is similar to C programming language. Designers with C programming

experience will find it easy to learn Verilog HDL, and will be comfortable with its syntax.

Besides, Verilog allows different levels of abstraction to be mixed in the same models. Thus, a

designer can define a hardware model in terms of switches, gates, RTL, or behavioral code. Also,

a designer needs to learn only one language for stimulus and hierarchical design. On top of that,

most popular logic synthesis tools support Verilog HDL. This makes it the language of choice

for many ASIC companies. More importantly, all fabrication vendors provide Verilog HDL

libraries for post logic synthesis simulation. Thus, designing a chip in Verilog HDL allows the

widest choice of vendors.

3.3 INTRODUCTION TO XILINX ISE 12.1 EDA TOOL:

ISE (Integrated Software Environment) continues to be the design tool of choice for

FPGA designers, based on independent media surveys. Xilinx once again earned the top ranking

this year in all FPGA EDA categories and scored higher than any other FPGA vendor in user

satisfaction.

Xilinx ISE Overview

The Integrated Software Environment (ISE) is the Xilinx design software suite that

allows you to take your design from design entry through Xilinx device programming. The ISE


Project Navigator manages and processes your design through the following steps in the ISE

design flow.

A simplified version of the design flow is given in the following diagram.

Figure 3.5: FPGA Design Flow

Design Entry

There are different techniques for design entry: schematic based, hardware description

language (HDL) based, or a combination of both. The selection of a method depends on the design and

the designer. If the designer wants to deal more with hardware, then schematic entry is the better

choice. When the design is complex, or the designer thinks of the design in an algorithmic way, then

HDL is the better choice. Language based entry is faster but lags in performance and density.

HDLs represent a level of abstraction that can isolate designers from the details of the

hardware implementation. Schematic based entry gives designers much more visibility into the

hardware, and it is the better choice for those who are hardware oriented. Another, rarely

used, method is state machine entry. It is the better choice for designers who think of the design

as a series of states, but the tools for state machine entry are limited. In this documentation we

deal with HDL based design entry.

Synthesis

Synthesis is the process that translates VHDL or Verilog code into a device netlist format, i.e. a complete

circuit with logical elements (gates, flip-flops, etc.) for the design. If the design contains more

than one sub-design (e.g. to implement a processor, we need a CPU as one design element,

RAM as another, and so on), then the synthesis process generates a netlist for each design element.

The synthesis process checks the code syntax and analyzes the hierarchy of the design, which ensures

that the design is optimized for the design architecture the designer has selected. The resulting

netlist(s) is saved to an NGC (Native Generic Circuit) file (for Xilinx Synthesis Technology

(XST)).

Implementation

This process consists of a sequence of three steps:

1. Translate

2. Map

3. Place and Route

The Translate process combines all the input netlists and constraints into a logic design file. This information is saved as an NGD (Native Generic Database) file. This is done using the NGDBuild program. Here, defining constraints means assigning the ports in the design to the physical elements (e.g. pins, switches, buttons) of the targeted device and specifying the timing requirements of the design. This information is stored in a UCF (User Constraints File).

Tools used to create or modify the UCF include PACE and Constraint Editor.

The Map process divides the whole circuit of logical elements into sub-blocks such that they fit into the FPGA logic blocks. That is, the Map process fits the logic defined by the NGD file into the targeted FPGA elements (Configurable Logic Blocks (CLBs), Input/Output Blocks (IOBs)) and generates an NCD (Native Circuit Description) file, which physically represents the design mapped to the components of the FPGA. The MAP program is used for this purpose.


Place and Route: The PAR program is used for this process. The place and route process places the sub-blocks from the Map process into logic blocks according to the constraints and connects the logic blocks. For example, if a sub-block is placed in a logic block very near an I/O pin, it may save time but may affect some other constraint. The place and route process therefore takes the trade-off between all the constraints into account.

The PAR tool takes the mapped NCD file as input and produces a completely routed NCD file as output. The output NCD file contains the routing information.

Device Programming

Now the design must be loaded onto the FPGA, but it must first be converted to a format that the FPGA can accept. The BITGEN program performs this conversion: the routed NCD file is given to BITGEN to generate a bitstream (a .BIT file), which can be used to configure the target FPGA device. The bitstream is downloaded using a cable; the selection of cable depends on the design.
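
The chain of tools and intermediate files described above (XST, NGDBuild, MAP, PAR, BITGEN) can be summarised in a small Python sketch. It does not invoke any Xilinx tool; it only records which file type each step reads and writes, as stated in the text. The tuple layout and the names FLOW, check_chain and describe are illustrative assumptions.

```python
# Stages of the flow as described in the text: (step, tool, input, output).
# This is a bookkeeping model only; no Xilinx program is called.
FLOW = [
    ("Synthesis",          "XST",      "HDL",          "NGC"),
    ("Translate",          "NGDBuild", "NGC + UCF",    "NGD"),
    ("Map",                "MAP",      "NGD",          "NCD (mapped)"),
    ("Place and Route",    "PAR",      "NCD (mapped)", "NCD (routed)"),
    ("Device Programming", "BITGEN",   "NCD (routed)", "BIT"),
]


def check_chain(flow):
    """Return True if every stage consumes what the previous stage produced."""
    for (_, _, _, prev_out), (_, _, next_in, _) in zip(flow, flow[1:]):
        # Translate also reads the UCF, so compare only the first input.
        if next_in.split(" + ")[0] != prev_out:
            return False
    return True


def describe(flow):
    """One human-readable line per stage."""
    return [f"{step}: {tool} reads {src}, writes {dst}"
            for step, tool, src, dst in flow]


if __name__ == "__main__":
    print("\n".join(describe(FLOW)))
```

Keeping the stages in one table makes it easy to see that the constraints (UCF) enter the flow only at the Translate step, while every later step works on the output of the one before it.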

Design Verification

Verification can be done at different stages of the design flow.

Behavioral Simulation (RTL Simulation): This is the first of the simulation steps encountered in the design flow. It is performed before synthesis to verify the RTL (behavioral) code and to confirm that the design functions as intended. Behavioral simulation can be performed on either VHDL or Verilog designs. In this process, signals and variables are observed, procedures and functions are traced, and breakpoints are set. This is a very fast simulation, so the designer can quickly change the HDL code if the required functionality is not met. Since the design is not yet synthesized to gate level, timing and resource usage properties are still unknown.

Functional Simulation (Post-Translate Simulation): Functional simulation gives information about the logic operation of the circuit. The designer can verify the functionality of the design using this process after the Translate process. If the functionality is not as expected, the designer has to make changes to the code and follow the design flow steps again.

Static Timing Analysis: This can be done after the MAP or PAR processes. The post-MAP timing report lists signal path delays of the design derived from the design logic. The post-Place and Route timing report incorporates timing delay information to provide a comprehensive timing summary of the design.
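
As a rough illustration of what a timing report is checked against, the sketch below compares path delays with a clock period and computes the slack of each path. It is deliberately simplified (setup time and clock skew are ignored) and it is not the algorithm used by the Xilinx timing tools. The 50 MHz figure is the board oscillator listed later in this document; the path names and delays are hypothetical values chosen for the example.

```python
def clock_period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def slack_ns(period_ns, path_delay_ns):
    """Positive slack: the path meets timing. Negative: it violates it."""
    return period_ns - path_delay_ns


period = clock_period_ns(50)  # 50 MHz oscillator gives a 20 ns period

# Hypothetical path delays in ns, not taken from any real timing report.
paths = {"path_a": 12.5, "path_b": 21.0}

report = {name: slack_ns(period, delay) for name, delay in paths.items()}
failing = [name for name, slack in report.items() if slack < 0]

if __name__ == "__main__":
    for name, slack in report.items():
        print(f"{name}: slack {slack:+.1f} ns")
    print("failing paths:", failing)
```

A path with negative slack is the kind of result that sends the designer back to the constraints or the HDL code before the design is programmed into the device.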

FPGA AND ISE DEVELOPMENT SOFTWARE BASICS

The Spartan-3 EDK Board provides a powerful, self-contained development platform for designs targeting the Spartan-3 FPGA from Xilinx. It features a 200K-gate Spartan-3, on-board I/O devices, and 1 MB of fast asynchronous SRAM, making it a suitable platform for experimenting with any new design, from a simple logic circuit to an embedded processor core. The board also contains a Platform Flash JTAG-programmable ROM, so designs can easily be made non-volatile.

Xilinx Spartan3 FPGA:

200,000-gate Xilinx Spartan 3 FPGA in a 144-TQFP (XC3S200-4TQG144C)

4,320 logic cell equivalents

Twelve 18K-bit block RAMs (216K bits)

Twelve 18x18 hardware multipliers

Four Digital Clock Managers (DCMs)

Up to 97 user-defined I/O signals

External Peripherals Modules

2x16 LCD with contrast adjust

2 Nos. of common-anode seven-segment displays

8 Nos. of general-purpose point LEDs

8 Nos. of toggle switches (digital inputs)

4 Nos. of push buttons

PS/2 keyboard or mouse interface

Communication protocols

Full Duplex UART (EIA RS232)

Other Features:

VGA Interface Connector

On-board 4 MB Platform Flash Memory (PROM)

8 MB On Board SRAM

JTAG Interface Connector for parallel programming Spartan3 FPGA

50 MHz crystal oscillator clock source


SPARTAN3 (EDK) Board Components placement top view


Block Diagram

Figure 2. Xilinx Spartan3 Advanced Development Board Block Diagram

On-board Peripherals

The Spartan3 FPGA Lab Kit comes with many interfacing options:

2 Nos. of seven-segment displays

8 Nos. of toggle switches (digital inputs)

4 Nos. of push buttons (digital inputs)

8 Nos. of point LEDs (digital outputs)

2x16 character LCD

UART for serial port communication through PC

PS/2 keyboard interface

3-bit VGA interface

GETTING STARTED

Software Requirements

To use this tutorial, you must install the following software:

ISE 12.1

For more information about installing Xilinx software, see the ISE Release Notes and

Installation Guide at: http://www.xilinx.com/support/software_manuals.htm.

Hardware Requirements

To use this tutorial, you must have the following hardware:

Spartan-3 Startup Kit, containing the Spartan-3 Startup Kit Demo Board

STARTING THE ISE SOFTWARE

To start ISE, double-click the desktop icon

or start ISE from the Start menu by selecting:

Start > All Programs > Xilinx ISE 12.1 > Project Navigator

Note: Your start-up path is set during the installation process and may differ from the one above.

ACCESSING HELP

At any time during the tutorial, you can access online help for additional information

about the ISE software and related tools.

To open Help, do either of the following:

Press F1 to view Help for the specific tool or function that you have selected or highlighted.

Launch the ISE Help Contents from the Help menu. It contains information about creating and

maintaining your complete design flow in ISE.


Figure 1: ISE Help Topics

Create a New Project

Create a new ISE project which will target the FPGA device on the Spartan-3 Startup Kit demo

board.

To create a new project:

1. Select File > New Project... The New Project Wizard appears.

2. Type tutorial in the Project Name field.

3. Enter or browse to a location (directory path) for the new project. A tutorial subdirectory is

created automatically.

4. Verify that HDL is selected from the Top-Level Source Type list.

5. Click Next to move to the device properties page.

6. Fill in the properties in the table as shown below:

Product Category: All

Family: Spartan3

Device: XC3S200

Package: TQ144

Speed Grade: -4

Top-Level Source Type: HDL

Synthesis Tool: XST (VHDL/Verilog)

Simulator: ISE Simulator (VHDL/Verilog)

Preferred Language: Verilog (or VHDL)


Verify that Enable Enhanced Design Summary is selected.

Leave the default values in the remaining fields.

When the table is complete, your project properties will look like the following:

Figure 2: Project Device Properties

Creating a Verilog Source

Create the top-level Verilog source file for the project as follows:

1. Click New Source in the New Project dialog box.

2. Select Verilog Module as the source type in the New Source dialog box.

3. Type in the file name counter.

4. Verify that the Add to Project checkbox is selected.

5. Click Next.

6. Declare the ports for the design by filling in the port information.

The source file containing the row DWT module displays in the Workspace, and the counter

displays in the Sources tab, as shown below:


Checking the Syntax of the New Counter Module

When the source files are complete, check the syntax of the design to find errors and typos.

1. Verify that Implementation is selected from the drop-down list in the Sources window.

2. Select the row DWT design source in the Sources window to display the related processes in the Processes window.

3. Click the + next to the Synthesize-XST process to expand the process group.

4. Double-click the Check Syntax process.

Note: You must correct any errors found in your source files. You can check for errors in the

Console tab of the Transcript window. If you continue without valid syntax, you will not be able

to simulate or synthesize your design.

5. Close the HDL file.

BEHAVIORAL SIMULATION

ISIM SETUP

ISim is automatically installed and set up with the ISE Design Suite 12 installer on supported

operating systems. To see a list of operating systems supported by ISim, please see the ISE

Design Suite 12: Installation, Licensing, and Release Notes available from the Xilinx website.


Getting Started

The following sections outline the requirements for performing behavioral simulation in this

tutorial.

Required Files

The behavioral simulation flow requires design files, a test bench file, and Xilinx simulation libraries.

Design Files (VHDL, Verilog, or Schematic)

This chapter assumes that you have completed the design entry tutorial in either Chapter 2,

HDL-Based Design, or Chapter 3, Schematic-Based Design. After you have completed one

of these chapters, your design includes the required design files and is ready for simulation.

Test Bench File

To simulate the design, a test bench file is required to provide stimulus to the design. VHDL and

Verilog test bench files are available with the tutorial files. You may also create your own test

bench file.

Simulation Libraries

Xilinx simulation libraries are required when a Xilinx primitive or IP core is instantiated in the

design. The design in this tutorial requires the use of simulation libraries because it contains

instantiations of a digital clock manager (DCM) and a CORE Generator software component.

For information on simulation libraries and how to compile them, see the next section, Xilinx

Simulation Libraries.

XILINX SIMULATION LIBRARIES

To simulate designs that contain instantiated Xilinx primitives, CORE Generator software

components, and other Xilinx IP cores you must use the Xilinx simulation libraries. These

libraries contain models for each component. These models reflect the functions of each

component, and provide the simulator with the information required to perform simulation. For a

detailed description of each library, see the Synthesis and Simulation Design Guide. This guide

is available from the ISE Software Manuals collection, automatically installed with your ISE

software. To open the Software Manuals collection, select Help > Software Manuals. The

Software Manuals collection is also available from the Xilinx website.


Adding an HDL Test Bench

To add an HDL test bench to your design project, you can either add a test bench file provided with this tutorial, or create your own test bench file and add it to your project.

Adding the Tutorial Test Bench File

This section demonstrates how to add an existing test bench file to the project. A VHDL

test bench and Verilog test fixture are provided with this tutorial.

Note: To create your own test bench file in the ISE software, select Project > New Source, and

select either VHDL Test Bench or Verilog Test Fixture in the New Source Wizard. An empty

stimulus file is added to your project. You must define the test bench in a text editor.

Verilog Simulation

To add the tutorial Verilog test fixture to the project, do the following:

1. In Project Navigator, select Project > Add Source.

2. Select the file tb_rowDWT.v.

3. Click Open.

4. Ensure that Simulation is selected for the file association type.

5. Click OK.

Behavioral Simulation Using ISim

Follow this section of the tutorial if you have skipped the previous section, Behavioral

Simulation Using ModelSim.

Now that you have a test bench in your project, you can perform behavioral simulation on

the design using ISim. The ISE software has full integration with ISim. The ISE software

enables ISim to create the work directory, compile the source files, load the design, and

perform simulation based on simulation properties.

To select ISim as your project simulator, do the following:

1. In the Hierarchy pane of the Project Navigator Design panel, right-click the device line

(xc3s200-4tq144 for the device properties selected above), and select Design Properties.

2. In the Design Properties dialog box, set the Simulator field to ISim (VHDL/Verilog).

Locating the Simulation Processes

The simulation processes in the ISE software enable you to run simulation on the design using ISim. To locate the ISim processes, do the following:

1. In the View pane of the Project Navigator Design panel, select Simulation, and select

Behavioral from the drop-down list.

2. In the Hierarchy pane, select the test bench file (tb_rowDWT).

3. In the Processes pane, expand ISim Simulator to view the process hierarchy.

Performing Simulation

After the process properties have been set, you are ready to run ISim to simulate the

design. To start the behavioral simulation, double-click Simulate Behavioral Model. ISim creates

the work directory, compiles the source files, loads the design, and performs

simulation for the time specified.

Adding Signals

To view signals during the simulation, you must add them to the Waveform window. The

ISE software automatically adds all the top-level ports to the Waveform window.

Additional signals are displayed in the Instances and Processes panel. The following

procedure explains how to add additional signals in the design hierarchy. For the purpose

of this tutorial, add the DCM signals to the waveform.

To add additional signals in the design hierarchy, do the following:

1. In the Instances and Processes panel, expand tb_rowDWT, and expand UUT.

The following figure shows the contents of the Instances and Processes panel for the Verilog flow.

Drag all the selected signals to the waveform. Alternatively, right click on a selected

signal and select Add To Waveform.

Notice that the waveforms have not been drawn for the newly added signals. This is

because ISim did not record the data for these signals. By default, ISim records data only

for the signals that are added to the waveform window while the simulation is running.

Therefore, when new signals are added to the waveform window, you must rerun the simulation for the desired amount of time.

Rerunning Simulation

To rerun the simulation in ISim, do the following:

1. Click the Restart Simulation icon.

Figure: ISim Restart Simulation Icon

2. At the ISim command prompt in the Console, enter run 2000 ns and press Enter.

The simulation runs for 2000 ns. The waveforms for the counter block are now visible in the

Waveform window.
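
To give a feel for what a 2000 ns run covers, the following Python sketch models a free-running up-counter and counts the rising clock edges that fall inside the run. It is a behavioral stand-in, not the tutorial's Verilog source or test bench: the 20 ns clock period (matching the board's 50 MHz oscillator), the 4-bit width, the absence of a reset and the name simulate_counter are all assumptions made for illustration.

```python
def simulate_counter(run_ns, period_ns=20, width=4):
    """Count rising clock edges seen in run_ns, wrapping at 2**width.

    Assumes the clock starts low at t = 0, so rising edges occur at
    period/2, 3*period/2, ... (all times in ns).
    """
    count = 0
    trace = []
    t = period_ns // 2                      # time of the first rising edge
    while t <= run_ns:
        count = (count + 1) % (1 << width)  # increment and wrap
        trace.append((t, count))
        t += period_ns
    return count, trace


final, trace = simulate_counter(2000)

if __name__ == "__main__":
    print(f"{len(trace)} rising edges, final count = {final}")
    print("first edges:", trace[:3])
```

Under these assumptions a 2000 ns run contains 100 rising edges, so a 4-bit counter wraps several times; a wider counter or a slower clock would change what is visible in the waveform.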

Running a Simulation in ISim

Simulation, the process of verifying the logic and timing of a design, can be run from

ISim using functions in the interface or at the command line.

To Run a Simulation From the ISim GUI

The following GUI menu commands can be used to run simulation.

Simulation > Restart - Stops simulation and sets simulation time back to 0. Use

the Run All, Run For or Step command to run the simulation over again without

reloading the design. See restart Tcl command.

Simulation > Run All - Runs simulation until all events are executed. You can

also use the run Tcl command with the all option.

Simulation > Run - Runs simulation for 100ns or for specified amount of time in

the toolbar. Time and time unit are entered in the Value box in the toolbar. You can

also use the run Tcl command with a length and unit specified.

Simulation > Step - Runs simulation for one executable HDL instruction at a

time. See Stepping a Simulation. See also step Tcl command.

In addition, you can run simulation until a specific point in your HDL source code

is reached. To do so, use breakpoints and the Run All command. See Source Level

Debugging Overview.

Note: The current simulation time is displayed on the status bar in the lower right corner.

Pausing a Simulation

While running a simulation for any length of time, you can pause a simulation using the

Break command, which leaves the simulation session open.


To close the session of ISim, see Closing ISim.

To Pause a Running Simulation

You can pause a running simulation using the Break command as follows:

Select Simulation > Break.

Click the Break toolbar button.

Enter Ctrl+C at the command line only.

The simulator stops at the next executable HDL line. The line at which the simulation

stopped is displayed in the text editor.

Note: This behavior applies to designs that have not been compiled with the -nodebug switch.

The simulation can be resumed at any time by using the Run All, Run, or Step

commands. See Running a Simulation in ISim for details.

Closing ISim

You can terminate a simulation and close the ISim session.

To Close ISim

Select File > Exit.

Enter the quit -f command in the Console panel at the prompt. This prevents an "Are you sure?" dialog box from opening.

Click the X at the top-right corner of the main window.

The simulation terminates and the session of ISim closes.

Changing the Radix

To Set the Default Radix

The default radix controls the bus radix displayed in the wave configuration, Objects

panel, and the Console panel. To change the default radix from the default binary:

1. Select Edit > Preferences.

2. In the Preferences dialog box, click ISim Simulator in the left pane.

3. Select a radix from the Default Radix field drop-down list.

4. Click Apply, and click OK.

To Change an Individual Radix

You can change the radix of an individual signal (HDL object) in the Objects panel as follows:

1. Right-click on a bus in the Objects panel.

2. Select Radix, and the desired format from the submenu:

Binary

Hexadecimal

Unsigned Decimal

Signed Decimal

Octal

ASCII
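
The radix options listed above are different renderings of the same bit pattern. The Python sketch below shows that relationship for a bus of a given width. It is an illustration only and makes no claim about how ISim formats values internally; the function name format_bus and the radix keywords are assumptions chosen for the example.

```python
def format_bus(value, width, radix):
    """Render an unsigned bus value of `width` bits in the given radix."""
    value &= (1 << width) - 1               # keep only `width` bits
    if radix == "binary":
        return format(value, f"0{width}b")
    if radix == "hexadecimal":
        return format(value, f"0{(width + 3) // 4}x")
    if radix == "octal":
        return format(value, f"0{(width + 2) // 3}o")
    if radix == "unsigned":
        return str(value)
    if radix == "signed":                   # two's complement interpretation
        return str(value - (1 << width) if value >> (width - 1) else value)
    if radix == "ascii":                    # one character per byte, MSB first
        nbytes = (width + 7) // 8
        return value.to_bytes(nbytes, "big").decode("ascii", errors="replace")
    raise ValueError(f"unknown radix: {radix}")


if __name__ == "__main__":
    for radix in ("binary", "hexadecimal", "octal", "unsigned", "signed"):
        print(radix, format_bus(0xC1, 8, radix))
    print("ascii", format_bus(0x41, 8, "ascii"))
```

The signed and unsigned decimal views differ only when the most significant bit is set, which is why the same 8-bit bus can read as 193 in one radix and -63 in the other.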

vcd Command

The vcd command generates simulation results in VCD format. This command enables

you to dump specified instances to a VCD file, to name the VCD file, to start and stop

the dump process, and other functions. See also Writing Activity Data of the Design.

Note: This command is case sensitive.


Syntax vcd (option)

Examples

The vcd command can be used as follows.

Following are the commands you would use to write the VCD simul
