arbitery document

Upload: hvchillal88

Post on 14-Apr-2018


  • 7/27/2019 Arbitery Document

    1/78

    Acknowledgement

    We would like to thank our beloved parents for their endless support, both moral and
    financial, and for encouraging us; without them we would not be what we are today.

    At the outset we sincerely thank Mr. Director, for his kind cooperation and
    encouragement towards the successful completion of the project work and for providing the
    necessary facilities.

    We are most obliged and grateful to our Principal, H.O.D ECE Dept, and internal guide,

    Associate Professor, ECE Dept, for giving us guidance in completing this project successfully.

    We are grateful to , Project Guide, Hyderabad, for their sagacious guidance, scholarly
    advice and the inspiration offered in an amiable and pleasant manner, which helped us
    complete this project successfully.

    Last but not least, we are thankful to our friends and well-wishers.


    Design and Implementation of A Lottery-based Bandwidth

    Guaranteed and Low Latency Arbiter for On-Chip Bus

    Abstract

    In this paper, we propose a two-level Lottery-based bus arbitration algorithm, called the
    RB_Lottery arbitration algorithm, where R stands for real-time and B for the binary group
    logic used for priority selection. The proposed bus arbitration solves the fairness and
    starvation problems of the earlier Lottery method and reduces the average latency of bus
    requests for real-time applications. Software simulation results show that the proposed
    RB_Lottery algorithm provides better bandwidth guarantees and a lower average bus request
    latency than Lottery arbitration.

    The bus arbiter decides which master is granted bus access when multiple masters issue bus
    requests at the same time in a system-on-chip. Earlier bus arbitration algorithms, namely the
    static fixed priority algorithm, the time division multiplexing (TDM)/round-robin algorithm,
    and the Lottery ticket bus algorithm, have drawbacks such as bus starvation, low system
    performance caused by bus distribution latency during arbitration, and large latency for
    masters that are assigned a low share of tickets. Recently, the real-time issue has been
    considered for bus arbitration [5]. In this paper, we propose a two-level static priority bus
    arbitration algorithm, the RB_Lottery algorithm, which is based on the Lottery bus arbitration
    method. The proposed RB_Lottery bus arbitration can handle the real-time requirements of all
    masters, solve the traditional bus distribution problem, guarantee the bandwidth requirement
    of each master, and reduce the bus arbitration latency. In the first level of bus arbitration,
    we use static priority real-time counters to satisfy the real-time requirements; this level is
    named the static priority real-time handler. In the second level, we adopt a Lottery-based
    algorithm with binary group partition to avoid starvation and reduce bus latency.
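
    The two-level idea described in the abstract can be illustrated with a minimal Python
    sketch. It is not the authors' RB_Lottery implementation: the binary group partition is not
    modeled, and the names lottery_grant, two_level_grant, tickets and rt_counters, as well as
    the rule that a lower master index means a higher static priority, are assumptions made
    purely for illustration.

```python
import random


def lottery_grant(requests, tickets, rng=random):
    """Grant one requesting master with probability proportional to its tickets."""
    contenders = [m for m, req in enumerate(requests) if req]
    if not contenders:
        return None  # nobody is requesting the bus
    total = sum(tickets[m] for m in contenders)
    draw = rng.randrange(total)  # uniform integer in [0, total)
    for m in contenders:
        if draw < tickets[m]:
            return m
        draw -= tickets[m]


def two_level_grant(requests, tickets, rt_counters, rng=random):
    """Level 1: static priority among masters whose real-time counter expired.
    Level 2: plain lottery among all requesting masters (assumed fallback)."""
    urgent = [m for m, req in enumerate(requests)
              if req and rt_counters[m] == 0]
    if urgent:
        return urgent[0]  # assumption: lowest index = highest static priority
    return lottery_grant(requests, tickets, rng)
```

    With tickets of 1, 2 and 3 and only masters 0 and 2 requesting, master 2 should win roughly
    three grants out of four in the lottery level, while a master whose real-time counter has
    reached zero is served ahead of the draw.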

    INTRODUCTION

    Xilinx introduced field-programmable gate arrays, or FPGAs, in 1985. Figure 1 is a conceptual
    model of an FPGA.

    FPGAs are constructed of three basic elements: logic blocks, I/O cells, and interconnection
    resources. A useful analogy for an FPGA is the layout of a city. The logic blocks correspond
    to city blocks occupied by different businesses. These businesses receive products from
    various suppliers within the city, just as the logic blocks receive data from other logic
    blocks within the FPGA, and process those products for consumption by other firms or end
    users, just as logic block outputs are sent to other blocks and ultimately to the device
    utilizing the FPGA. FPGAs and our mythical city both utilize interconnections between blocks
    (wire segments for FPGAs, streets and telephone connections for the city) that can be
    flexibly designed to meet changing needs, with routers in both cases and stoplights in one.
    The final elements in the model are the mechanisms for interaction with the outside world:
    I/O cells are to the FPGA as airports, freeways, and long distance telephone lines are to the
    city. The rest of this report will explore in greater detail implementations of this basic
    three-element model.

    Configurable Logic Blocks:

    The heart of the FPGA lies in the CLBs. CLBs appear in rows and columns within all FPGAs and
    implement the logic functions desired by the programmer. Most CLBs accomplish this with a
    lookup table [2]. Lookup tables (LUTs) are digital memory arrays that contain truth tables
    for any logic function that can be implemented by the given number of logic inputs for a CLB.
    The output of the CLB is then the logical result of the function recorded in the lookup
    table. In order to program the CLBs, truth tables must be loaded into the LUTs of each CLB.
    Refer to page 3 for an example of the CLB architecture for a Xilinx XC5200 chip [2].
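
    As a rough illustration of the lookup-table idea, the Python sketch below stores a truth
    table in a list and reads it back by using the input bits as an address. The helper names
    make_lut and lut_read and the bit ordering (first input is the least significant address
    bit) are assumptions for this example, not details of any particular device.

```python
def make_lut(func, n_inputs):
    """'Program' a LUT: record func's output for every input combination."""
    table = []
    for addr in range(2 ** n_inputs):
        bits = [(addr >> i) & 1 for i in range(n_inputs)]
        table.append(func(*bits) & 1)
    return table


def lut_read(table, *bits):
    """Evaluate the logic function by a single memory lookup."""
    addr = sum(bit << i for i, bit in enumerate(bits))
    return table[addr]


# Example: a 4-input XOR (parity) function stored in a 16-entry table.
xor4 = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d, 4)
```

    Changing the implemented function only requires loading a different table; the lookup
    hardware itself stays the same, which is what makes the block configurable.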

    I/O Blocks:

    I/O blocks provide for interaction with the outside world. An I/O pin can be used for input
    or output [4]. I/O blocks can contain logic functionality, although high logic utilization
    decreases pin placement flexibility, as I/O blocks utilized in logic cannot be reassigned
    mid-design [5].

    Interconnection (Routing) Architecture:

    The routing architecture usually covers 60-90% of FPGA chip area [2] and, fittingly, requires
    the longest description of the three basic FPGA elements. The routing architecture of FPGAs
    is constructed of wires segmented into various lengths that intersect each other at routing
    switches [3]. The most popular programmable switch element (PSE) technology for implementing
    these routing switches, static RAM, is briefly discussed in the next section.

    Two types of routing architecture are common: row-based routing, where only horizontal
    channels are used to connect CLBs, and symmetrical routing, where vertical and horizontal
    channels are utilized, as in Figure 1. Direct connection wires link neighboring CLBs across
    routing channels. Connections to distant blocks are implemented through programmable switch
    matrices (PSMs) [4], which contain a set of PSEs that switch perpendicular wires. The wires
    routed through a PSM are either single lines, which must pass through one PSE for each CLB
    bypassed, or double lines, which pass two CLBs for every switch. Long lines skip switching
    altogether. The implementation of complex routing techniques is described later for the
    Xilinx XC5200.

    Static RAM Programmable Switch

    The most common programmable element used for FPGA implementation is static RAM. FPGAs use
    permanent memory, usually PROM, to store the logic configuration of the chip. Upon power-up,
    each RAM cell gets a value based upon the PROM configuration. When the cell is high, the
    transistor is conducting and current flows; when the cell is low, the transistor is cut off
    and no current flows.

    Configurable Logic Blocks:

    The diagram on the left is of one of the four identical logic cells that constitute each CLB.
    The segment labeled F contains a lookup table for four inputs (F4-F1). The trapezoidal
    objects are 2:1 multiplexers. The chip enable (CE), clock (CK) and clear (CLR) signals travel
    to this cell and all others in the architecture via global long lines. Each cell can be
    cleared individually or all can be cleared at once. Each logic cell can implement either a D
    flip-flop or a latch. When the clock transitions high, the D flip-flop (FD) passes the output
    of the programmed logic operation to the output (Q). From DI to DO, a feed-through path that
    does not change the logic of the input can be implemented. This is used in routing
    applications discussed later.

    In this case, two lookup tables (F) are used for input, each fed with the same four logic
    lines. The fifth input is used to toggle the 2:1 mux between the lookup tables, adding a
    fifth bit to the logic function. There are four lookup tables in each CLB, so four
    independent four-input logic functions or two independent five-input logic functions can be
    implemented in each block.
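
    The way two four-input lookup tables and a 2:1 multiplexer combine into one five-input
    function can be sketched in Python as follows. The function names (build_tables, lut4, lut5)
    and the choice of a majority function as the example are assumptions for illustration only;
    they do not describe the XC5200 configuration format.

```python
from itertools import product


def build_tables(func5):
    """Split a 5-input function into two 16-entry tables, one per value of F5."""
    table_f5_low, table_f5_high = [], []
    for addr in range(16):
        f1, f2, f3, f4 = ((addr >> i) & 1 for i in range(4))
        table_f5_low.append(func5(f1, f2, f3, f4, 0))
        table_f5_high.append(func5(f1, f2, f3, f4, 1))
    return table_f5_low, table_f5_high


def lut4(table, f1, f2, f3, f4):
    """Four-input lookup: the inputs form the table address."""
    return table[f1 | (f2 << 1) | (f3 << 2) | (f4 << 3)]


def lut5(tables, f1, f2, f3, f4, f5):
    """Both LUTs see the same four inputs; the fifth input drives the 2:1 mux."""
    low, high = tables
    out_low = lut4(low, f1, f2, f3, f4)
    out_high = lut4(high, f1, f2, f3, f4)
    return out_high if f5 else out_low


def majority(*bits):
    """Example 5-input function: 1 when at least three inputs are 1."""
    return int(sum(bits) >= 3)


tables = build_tables(majority)
matches = all(lut5(tables, *b) == majority(*b)
              for b in product((0, 1), repeat=5))
```

    Because each five-input function consumes two of the four tables in a block, this
    construction is consistent with the limit of two five-input functions per CLB stated above.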

    I/O Blocks:

    The I/O blocks of the XC5200 are completely decoupled from the internal logic of the CLBs
    [5]. The I/O blocks are attached to the internal logic through interconnect cells that form a
    ring around the chip. This extra routing layer provides connections to nearby CLBs as well as
    to far-away CLBs through long lines. The XC5200 can be connected with TTL or CMOS logic.

    Interconnection (Routing) Architecture:

    This chip has six levels of routing hierarchy: single-length lines (1), double-length lines
    (2), direct connects (3), long lines/global lines (4), local interconnection matrices (5),
    and logic cell feed-through paths (6). The global routing matrix (GRM) contains the switch
    matrix architecture discussed earlier in this report. The GRM routes logic signals over the
    single, double and long lines, then communicates with the CLB via a 24-line interface to the
    LIM [4]. These matrices connect far-away sections of the chip and link all CLBs to a global
    command structure. The remaining routing architecture for the XC5200 is contained within the
    Versa-Block units. These units comprise the CLBs as well as the local interconnection
    matrices. The local interconnection matrix (LIM) handles connections to neighboring CLBs
    through direct connect lines that bypass the GRMs. The LIM also handles logic cell
    feed-through paths, which do not perform any calculations but merely re-power a signal that
    has faded while passing through the chip. This splitting of the routing resources between
    local and global areas simplifies router design, decreases the chip space necessary for
    routing, and decreases the use of routing switches, which add resistance and capacitance to
    circuits.

    A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by
    a customer or a designer after manufacturing, hence "field-programmable". The FPGA
    configuration is generally specified using a hardware description language (HDL), similar to
    that used for an application-specific integrated circuit (ASIC). (Circuit diagrams were
    previously used to specify the configuration, as they were for ASICs, but this is
    increasingly rare.) Contemporary FPGAs have large resources of logic gates and RAM blocks to
    implement complex digital computations. As FPGA designs employ very fast I/Os and
    bidirectional data buses, it becomes a challenge to verify correct timing of valid data
    within setup time and hold time. Floor planning enables resource allocation within the FPGA
    to meet these timing constraints. FPGAs can be used to implement any logical function that an
    ASIC could perform. The ability to update the functionality after shipping, partial
    re-configuration of a portion of the design, and the low non-recurring engineering costs
    relative to an ASIC design (notwithstanding the generally higher unit cost) offer advantages
    for many applications.

    FPGAs contain programmable logic components called "logic blocks", and a

    hierarchy of reconfigurable interconnects that allow the blocks to be "wired together"

    somewhat like many (changeable) logic gates that can be inter-wired in (many) different

    configurations. Logic blocks can be configured to perform complex combinational functions, or

    merely simple logic gates like AND and XOR. In most FPGAs, the logic blocks also include

    memory elements, which may be simple flip-flops or more complete blocks of memory.

    Some FPGAs have analog features in addition to digital functions. The most

    common analog feature is programmable slew rate and drive strength on each output pin,

    allowing the engineer to set slow rates on lightly loaded pins that would

    otherwise ring unacceptably, and to set stronger, faster rates on heavily loaded pins on high-

    speed channels that would otherwise run too slow. Another relatively common analog feature is

    differential comparators on input pins designed to be connected to differential

    signaling channels. A few "mixed signal FPGAs" have integrated peripheral analog-to-digital

    converters (ADCs) and digital-to-analog converters (DACs) with analog signal conditioning

    blocks allowing them to operate as a system-on-a-chip. Such devices blur the line between an

    FPGA, which carries digital ones and zeros on its internal programmable interconnect fabric,

    and field-programmable analog array (FPAA), which carries analog values on its internal

    programmable interconnect fabric.


    HISTORY

    The FPGA industry sprouted from programmable read-only memory (PROM)

    and programmable logic devices (PLDs). PROMs and PLDs both had the option of being

    programmed in batches in a factory or in the field (field programmable), however programmable

    logic was hard-wired between logic gates.

    In the late 1980s the Naval Surface Warfare Department funded an experiment proposed

    by Steve Casselman to develop a computer that would implement 600,000 reprogrammable

    gates. Casselman was successful and a patent related to the system was issued in 1992.

    Some of the industry's foundational concepts and technologies for programmable logic arrays,
    gates, and logic blocks are founded in patents awarded to David W. Page and LuVerne R.
    Peterson in 1985.

    Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the first commercially
    viable field-programmable gate array in 1985, the XC2064. The XC2064 had programmable gates
    and programmable interconnects between gates, the beginnings of a new technology and market.
    The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two 3-input lookup tables
    (LUTs). More than 20 years later, Freeman was entered into the National Inventors Hall of
    Fame for his invention.

    Xilinx continued unchallenged and grew quickly from 1985 to the mid-1990s, when competitors
    sprouted up, eroding significant market share. By 1993, Actel was serving about 18 percent of
    the market.

    The 1990s were an explosive period of time for FPGAs, both in sophistication and the

    volume of production. In the early 1990s, FPGAs were primarily used in telecommunications

    and networking. By the end of the decade, FPGAs found their way into consumer, automotive,

    and industrial applications.

    Modern Developments


    A recent trend has been to take the coarse-grained architectural approach a step further by
    combining the logic blocks and interconnects of traditional FPGAs with embedded
    microprocessors and related peripherals to form a complete "system on a programmable chip".
    This work mirrors the architecture by Ron Perlof and Hana Potash of Burroughs Advanced
    Systems Group, which combined a reconfigurable CPU architecture on a single chip called the
    SB24. That work was done in 1982. Examples of such hybrid technologies can be found in the
    Xilinx Zynq-7000 All Programmable SoC, which includes a 1.0 GHz dual-core ARM Cortex-A9
    MPCore processor embedded within the FPGA's logic fabric, and in the Altera Arria V FPGA,
    which includes an 800 MHz dual-core ARM Cortex-A9 MPCore. The Atmel FPSLIC is another such
    device, which uses an AVR processor in combination with Atmel's programmable logic
    architecture. The Actel SmartFusion devices incorporate an ARM Cortex-M3 hard processor core
    (with up to 512 kB of flash and 64 kB of RAM) and analog peripherals such as a multi-channel
    ADC and DACs in their flash-based FPGA fabric.

    In 2010, Xilinx Inc introduced the first All Programmable System on a Chip, branded
    Zynq-7000, that fused features of an ARM high-end microcontroller (hard-core
    implementations of a 32-bit processor, memory, and I/O) with an FPGA fabric to make FPGAs

    easier for embedded designers to use. By incorporating the ARM processor-based platform into a

    28 nm FPGA family, the extensible processing platform enables system architects and embedded

    software developers to apply a combination of serial and parallel processing to address the

    challenges they face in designing today's embedded systems, which must meet ever-growing

    demands to perform highly complex functions. By allowing them to design in a familiar ARM

    environment, embedded designers benefit from multiple advantages including: decreased time-

    to-market, significantly reduced power, reduced BOM (bill of materials) cost, etc. These are

    among many advantages of an All Programmable FPGA platform compared to more traditional

    design cycles associated with ASICs.

    An alternate approach to using hard-macro processors is to make use of soft processor cores
    that are implemented within the FPGA logic. Nios II, MicroBlaze and Mico32 are examples of
    popular soft-core processors.


    As previously mentioned, many modern FPGAs have the ability to be reprogrammed at "run
    time," and this is leading to the idea of reconfigurable computing or reconfigurable
    systems: CPUs that reconfigure themselves to suit the task at hand.

    Additionally, new, non-FPGA architectures are beginning to emerge. Software-

    configurable microprocessors such as the Stretch S5000 adopt a hybrid approach by providing an

    array of processor cores and FPGA-like programmable cores on the same chip.

    Gates

    1987: 9,000 gates, Xilinx

    1992: 600,000, Naval Surface Warfare Department

    Early 2000s: Millions

    Market size

    1985: First commercial FPGA : Xilinx XC2064

    1987: $14 million

    ~1993: >$385 million

    2005: $1.9 billion

    2010 estimates: $2.75 billion

    FPGA design starts

    2005: 80,000

    2008: 90,000

    FPGA comparisons

    Historically, FPGAs have been slower, less energy efficient and generally achieved less
    functionality than their fixed ASIC counterparts. An older study showed that designs
    implemented on FPGAs need on average 40 times as much area, draw 12 times as much dynamic
    power, and are three times slower than the corresponding ASIC implementations; however, the
    times are changing. Today's FPGAs, such as the Xilinx Virtex-7 or the Altera Stratix V, rival
    ASIC and ASSP solutions, providing significantly reduced power, increased speed, lower BOM
    cost, minimal implementation real estate, and maximum on-the-fly configurability. Where
    previously a design may have included 6 to 10 ASICs, today the same design can be achieved
    using only one FPGA.

    A Xilinx Zynq-7000 All Programmable System on a Chip.

    Advantages of FPGAs include the ability to re-program in the field to fix bugs, and may
    include a shorter time to market and lower non-recurring engineering costs. Vendors can also
    take a middle road by developing their hardware on ordinary FPGAs, but manufacturing their
    final version so that it can no longer be modified after the design has been committed.

    Xilinx claims that several market and technology dynamics are changing the ASIC/FPGA
    paradigm:

    Integrated circuit costs are rising aggressively
    ASIC complexity has lengthened development time
    R&D resources and headcount are decreasing
    Revenue losses for slow time-to-market are increasing
    Financial constraints in a poor economy are driving low-cost technologies

    These trends make FPGAs a better alternative than ASICs for a larger number of
    higher-volume applications than they have been historically used for, to which the company
    attributes the growing number of FPGA design starts.


    Some FPGAs have the capability of partial re-configuration, which lets one portion of the
    device be re-programmed while other portions continue running.

    Complex Programmable Logic Devices (CPLD)

    The primary differences between CPLDs (Complex Programmable Logic Devices)

    and FPGAs are architectural. A CPLD has a somewhat restrictive structure consisting of one or

    more programmable sum-of-products logic arrays feeding a relatively small number of clocked

    registers. The result of this is less flexibility, with the advantage of more predictable timing

    delays and a higher logic-to-interconnect ratio. The FPGA architectures, on the other hand, are

    dominated by interconnect. This makes them far more flexible (in terms of the range of designs

    that are practical for implementation within them) but also far more complex to design for.
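
    The CPLD structure described above, a programmable sum-of-products array feeding a clocked
    register, can be sketched in Python. This is a simplified model under stated assumptions:
    the class name RegisteredMacrocell, the dictionary encoding of product terms (input index
    mapped to the required bit value) and the single-output macrocell are illustrative choices,
    not a description of any vendor's device.

```python
def sum_of_products(product_terms, inputs):
    """OR together AND terms; each term maps an input index to its required value."""
    return int(any(all(inputs[i] == v for i, v in term.items())
                   for term in product_terms))


class RegisteredMacrocell:
    """A sum-of-products array whose result is captured by a clocked register."""

    def __init__(self, product_terms):
        self.product_terms = product_terms
        self.q = 0  # register output, cleared at start

    def clock(self, inputs):
        """On a clock edge, latch the current sum-of-products result."""
        self.q = sum_of_products(self.product_terms, inputs)
        return self.q


# Example: XOR of inputs 0 and 1 written as (a AND NOT b) OR (NOT a AND b).
xor_terms = [{0: 1, 1: 0}, {0: 0, 1: 1}]
cell = RegisteredMacrocell(xor_terms)
```

    Every output passes through the same fixed AND-OR-register path, which is one way to see
    why timing in such a structure is more predictable than in an interconnect-dominated FPGA.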

    In practice, the distinction between FPGAs and CPLDs is often one of size, as FPGAs are
    usually much larger in terms of resources than CPLDs. Typically only FPGAs contain more
    complex embedded functions such as adders, multipliers, memory, and SerDes. Another common
    distinction is that CPLDs contain embedded flash to store their configuration, while FPGAs
    usually, but not always, require an external flash memory.

    Security considerations

    With respect to security, FPGAs have both advantages and disadvantages as compared to ASICs
    or secure microprocessors. FPGAs' flexibility makes malicious modifications during
    fabrication a lower risk. Previously, for many FPGAs, the design bitstream was exposed while
    the FPGA loaded it from external memory (typically on every power-on). All major FPGA vendors
    now offer a spectrum of security solutions to designers, such as bitstream encryption and
    authentication. For example, Altera and Xilinx offer AES (up to 256 bit) encryption for
    bitstreams stored in an external flash memory.

    FPGAs that store their configuration internally in nonvolatile flash memory, such as
    Microsemi's ProASIC 3 or Lattice's XP2 programmable devices, do not expose the bitstream and
    do not need encryption. In addition, flash memory for LUTs provides single event upset (SEU)
    protection for space applications.


    Applications

    Applications of FPGAs include digital signal processing, software-defined radio, ASIC
    prototyping, medical imaging, computer vision, speech recognition, cryptography,
    bioinformatics, computer hardware emulation, radio astronomy, metal detection and a growing
    range of other areas.

    FPGAs originally began as competitors to CPLDs and competed in a similar space, that of glue
    logic for PCBs. As their size, capabilities, and speed increased, they began to take over
    larger and larger functions, to the point where some are now marketed as full systems on
    chips (SoC). Particularly with the introduction of dedicated multipliers into FPGA
    architectures in the late 1990s, applications which had traditionally been the sole reserve
    of DSPs began to incorporate FPGAs instead.

    Traditionally, FPGAs have been reserved for specific vertical applications where the

    volume of production is small. For these low-volume applications, the premium that companies

    pay in hardware costs per unit for a programmable chip is more affordable than the development

    resources spent on creating an ASIC for a low-volume application.
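
    The low-volume trade-off described above can be made concrete with a small break-even
    calculation. The sketch below assumes a simple cost model (one-time engineering cost plus
    per-unit cost times volume); the function names and every number in the example are
    hypothetical and chosen only to show the arithmetic, not taken from the report or from any
    vendor.

```python
def total_cost(nre_cost, unit_cost, volume):
    """One-time non-recurring engineering cost plus per-unit cost for a given volume."""
    return nre_cost + unit_cost * volume


def break_even_volume(fpga_nre, fpga_unit, asic_nre, asic_unit):
    """Volume at which FPGA and ASIC total costs are equal.

    Assumes the FPGA has the lower NRE cost and the higher unit cost.
    """
    return (asic_nre - fpga_nre) / (fpga_unit - asic_unit)


# Hypothetical figures, for illustration only.
FPGA_NRE, FPGA_UNIT = 50_000, 50.0
ASIC_NRE, ASIC_UNIT = 1_000_000, 5.0

crossover = break_even_volume(FPGA_NRE, FPGA_UNIT, ASIC_NRE, ASIC_UNIT)
fpga_cheaper_at_1k = (total_cost(FPGA_NRE, FPGA_UNIT, 1_000)
                      < total_cost(ASIC_NRE, ASIC_UNIT, 1_000))
```

    Below the crossover volume the higher per-unit price of the FPGA is outweighed by the ASIC's
    development cost; above it the ASIC becomes the cheaper option under this model.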

    Today, new cost and performance dynamics have broadened the range of viable

    applications.

    Common FPGA Applications:

    Aerospace and Defense: Avionics/DO-254, MILCOM, Missiles & Munitions, Secure Solutions, Space
    ASIC Prototyping
    Audio: Connectivity Solutions, Portable Electronics, Radio, Digital Signal Processing (DSP)
    Automotive: High Resolution Video, Image Processing, Vehicle Networking and Connectivity,
    Automotive Infotainment
    Broadcast: Real-Time Video Engine, Edge QAM, Encoders, Displays, Switches and Routers
    Consumer Electronics: Digital Displays, Digital Cameras, Multi-function Printers, Portable
    Electronics, Set-top Boxes
    Data Center: Servers, Security, Routers, Switches, Gateways, Load Balancing
    High Performance Computing: Servers, Super Computers, SIGINT Systems, High-end RADARS,
    High-end Beam Forming Systems, Data Mining Systems
    Industrial: Industrial Imaging, Industrial Networking, Motor Control
    Medical: Ultrasound, CT Scanner, MRI, X-ray, PET, Surgical Systems
    Security: Industrial Imaging, Secure Solutions, Image Processing
    Video & Image Processing: High Resolution Video, Video Over IP Gateway, Digital Displays,
    Industrial Imaging
    Wired Communications: Optical Transport Networks, Network Processing, Connectivity Interfaces
    Wireless Communications: Baseband, Connectivity Interfaces, Mobile Backhaul, Radio

    Architecture

    The most common FPGA architecture consists of an array of logic blocks (called

    Configurable Logic Block, CLB, or Logic Array Block, LAB, depending on vendor), I/O pads,

    and routing channels. Generally, all the routing channels have the same width (number of wires).

    Multiple I/O pads may fit into the height of one row or the width of one column in the array.

    An application circuit must be mapped into an FPGA with adequate resources. While

    the number of CLBs/LABs and I/Os required is easily determined from the design, the number of

    routing tracks needed may vary considerably even among designs with the same amount of logic.

    For example, a crossbar switch requires much more routing than a systolic array with the same

    gate count. Since unused routing tracks increase the cost (and decrease the performance) of the

    part without providing any benefit, FPGA manufacturers try to provide just enough tracks so that

    most designs that will fit in terms of lookup tables (LUTs) and I/Os can be routed. This is

    determined by estimates such as those derived from Rent's rule or by experiments with existing

    designs.

    In general, a logic block (CLB or LAB) consists of a few logical cells (called ALM,

    LE, Slice, etc.). A typical cell consists of a 4-input LUT, a full adder (FA) and a D-type flip-flop,

    as shown below. In this figure, the LUT is split into two 3-input LUTs. In normal mode those

    are combined into a 4-input LUT through the left mux. In arithmetic mode, their outputs are fed

    to the FA. The selection of mode is programmed into the middle multiplexer. The output can be

    either synchronous or asynchronous, depending on the programming of the mux to the right in

    the figure. In practice, all or part of the FA is implemented as functions in the LUTs in

    order to save space.

    Simplified example illustration of a logic cell
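    The behaviour of the logic cell described above can be sketched in a few lines of Python. This is only an illustrative model, not vendor code: the class name, the truth-table layout and the choice of the fourth input d as the select of the left mux are assumptions made for the example.

```python
class LogicCell:
    """Toy model of one logic cell: two 3-input LUTs, a full adder, a flip-flop."""

    def __init__(self, lut_lo, lut_hi, arithmetic=False, registered=False):
        # Each LUT is an 8-entry truth table indexed by (c << 2) | (b << 1) | a.
        assert len(lut_lo) == 8 and len(lut_hi) == 8
        self.lut_lo = lut_lo
        self.lut_hi = lut_hi
        self.arithmetic = arithmetic  # setting of the middle mux (mode select)
        self.registered = registered  # setting of the right mux (sync/async output)
        self.flip_flop = 0

    def step(self, a, b, c, d=0, carry_in=0):
        """Apply inputs for one clock cycle and return (output, carry_out)."""
        index = (c << 2) | (b << 1) | a
        lo, hi = self.lut_lo[index], self.lut_hi[index]
        if self.arithmetic:
            # Arithmetic mode: both LUT outputs feed the full adder.
            total = lo + hi + carry_in
            comb, carry_out = total & 1, total >> 1
        else:
            # Normal mode: the left mux uses d to pick one 3-input LUT,
            # which together behave as a single 4-input LUT.
            comb, carry_out = (hi if d else lo), 0
        # A registered output shows the value captured on the previous cycle.
        output = self.flip_flop if self.registered else comb
        self.flip_flop = comb
        return output, carry_out


# Normal mode: a 4-input AND built from two 3-input tables.
and4 = LogicCell(lut_lo=[0] * 8, lut_hi=[0] * 7 + [1])

# Arithmetic mode: the LUTs pass a and b through, so the cell adds a + b + carry_in.
adder = LogicCell(lut_lo=[i & 1 for i in range(8)],
                  lut_hi=[(i >> 1) & 1 for i in range(8)],
                  arithmetic=True)
```

    Calling and4.step(1, 1, 1, 1) evaluates the 4-input function, while adder.step(1, 1, 0, carry_in=1) exercises the full-adder path.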

    ALMs and Slices usually contain 2 or 4 structures similar to the example figure, with some

    shared signals.

    CLBs/LABs typically contain a few ALMs/LEs/Slices.

    In recent years, manufacturers have started moving to 6-input LUTs in their high

    performance parts, claiming increased performance.

    Since clock signals (and often other high-fan-out signals) are normally routed via special-

    purpose dedicated routing networks in commercial FPGAs, they and other signals are separately

    managed.

    For this example architecture, the locations of the FPGA logic block pins are shown below.

    Logic Block Pin Locations

    Each input is accessible from one side of the logic block, while the output pin can

    connect to routing wires in both the channel to the right and the channel below the logic block.

    Each logic block output pin can connect to any of the wiring segments in the channels adjacent

    to it.

    Similarly, an I/O pad can connect to any one of the wiring segments in the channel

    adjacent to it. For example, an I/O pad at the top of the chip can connect to any of the W wires

    (where W is the channel width) in the horizontal channel immediately below it.

    Generally, the FPGA routing is unsegmented. That is, each wiring segment spans

    only one logic block before it terminates in a switch box. By turning on some of the

    programmable switches within a switch box, longer paths can be constructed. For higher speed

    interconnect, some FPGA architectures use longer routing lines that span multiple logic blocks.

    Whenever a vertical and a horizontal channel intersect, there is a switch box. In this

    architecture, when a wire enters a switch box, there are three programmable switches that allow

    it to connect to three other wires in adjacent channel segments. The pattern, or topology, of

    switches used in this architecture is the planar or domain-based switch box topology. In this

    switch box topology, a wire in track number one connects only to wires in track number one in

    adjacent channel segments, wires in track number 2 connect only to other wires in track number

    2 and so on. The figure below illustrates the connections in a switch box.

    Switch box topology
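    The planar (domain-based) switch box described above can be written down as a small connection table. The sketch below is a hypothetical model: the side names and the function name are chosen for illustration only. It shows that a wire on track k reaches exactly three other wires, all of them on track k.

```python
SIDES = ("north", "east", "south", "west")


def planar_switch_box(channel_width):
    """Map each (side, track) wire to the wires it can reach through a switch."""
    connections = {}
    for side in SIDES:
        for track in range(channel_width):
            # Planar topology: only the same track number on the other three sides.
            connections[(side, track)] = [
                (other, track) for other in SIDES if other != side
            ]
    return connections


box = planar_switch_box(channel_width=4)
print(box[("north", 1)])
```

    Because tracks never mix, a signal that enters the routing fabric on one track number stays on that track number along its whole path, which is the main routing restriction of this topology.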

    Modern FPGA families expand upon the above capabilities to include higher level

    functionality fixed into the silicon. Having these common functions embedded into the silicon

    reduces the area required and gives those functions increased speed compared to building them

    from primitives. Examples of these include multipliers, generic DSP blocks, embedded

    processors, high speed IO logic and embedded memories.

    FPGAs are also widely used for systems validation including pre-silicon validation,

    post-silicon validation, and firmware development. This allows chip companies to validate their

    design before the chip is produced in the factory, reducing the time-to-market.

    To shrink the size and power consumption of FPGAs, vendors such as Tabula and Xilinx have

    introduced new 3D or stacked architectures. Following the introduction of its 28 nm 7-series

    FPGAs, Xilinx revealed that several of the highest-density parts in those FPGA product lines

    will be constructed using multiple dies in one package, employing technology developed for 3D

    construction and stacked-die assemblies. The technology stacks several (three or four) active

    FPGA dice side-by-side on a silicon interposer, a single piece of silicon that carries passive

    interconnect.

    FPGA design and programming

    To define the behavior of the FPGA, the user provides a hardware description

    language (HDL) or a schematic design. The HDL form is more suited to work with large

    structures because it's possible to just specify them numerically rather than having to draw every

    piece by hand. However, schematic entry can allow for easier visualization of a design.

    Then, using an electronic design automation tool, a technology-mapped netlist is

    generated. The netlist can then be fitted to the actual FPGA architecture using a process

    called place-and-route, usually performed by the FPGA company's proprietary place-and-route

    software. The user will validate the map, place and route results via timing analysis, simulation,

    and other verification methodologies. Once the design and validation process is complete, the

    binary file generated (also using the FPGA company's proprietary software) is used to

    (re)configure the FPGA. This file is transferred to the FPGA/CPLD via a serial interface (JTAG)

    or to an external memory device like an EEPROM.

    The most common HDLs are VHDL and Verilog. Because designing in HDLs has been

    compared to programming in assembly languages, there are moves to reduce this complexity by

    raising the abstraction level through the introduction of alternative languages. National

    Instruments' LabVIEW graphical programming language (sometimes referred to as "G") has an

    FPGA add-in module available to target and program FPGA hardware.

    To simplify the design of complex systems in FPGAs, there exist libraries of

    predefined complex functions and circuits that have been tested and optimized to speed up the

    design process. These predefined circuits are commonly called IP cores, and are available from

    FPGA vendors and third-party IP suppliers (rarely free, and typically released under proprietary

    licenses). Other predefined circuits are available from developer communities such

    as OpenCores (typically released under free and open source licenses such as the GPL, BSD or

    similar license), and other sources.

    In a typical design flow, an FPGA application developer will simulate the design at

    multiple stages throughout the design process. Initially the RTL description

    in VHDL or Verilog is simulated by creating test benches to simulate the system and observe

    results. Then, after the synthesis engine has mapped the design to a netlist, the netlist is

    translated to a gate level description where simulation is repeated to confirm the synthesis

    proceeded without errors. Finally the design is laid out in the FPGA at which point propagation

    delays can be added and the simulation run again with these values back-annotated onto the

    netlist.

    Basic process technology types

    SRAM - Based on static memory technology. In-system programmable and re-programmable.

    Requires external boot devices. CMOS. Currently in use.

    Antifuse - One-time programmable. CMOS.

    PROM - Programmable Read-Only Memory technology. One-time programmable because of

    plastic packaging. Obsolete.

    EPROM - Erasable Programmable Read-Only Memory technology. One-time programmable

    but with window, can be erased with ultraviolet (UV) light. CMOS. Obsolete.

    EEPROM - Electrically Erasable Programmable Read-Only Memory technology. Can be

    erased, even in plastic packages. Some but not all EEPROM devices can be in-system

    programmed. CMOS.

    Flash - Flash-erase EPROM technology. Can be erased, even in plastic packages. Some but not

    all flash devices can be in-system programmed. Usually, a flash cell is smaller than an equivalent

    EEPROM cell and is therefore less expensive to manufacture. CMOS.

    Fuse - One-time programmable. Bipolar. Obsolete.

    Major manufacturers

    Xilinx and Altera are the current FPGA market leaders and long-time industry rivals.

    Together, they control over 80 percent of the market. Both Xilinx and Altera provide

    free Windows and Linux design software that supports a limited set of devices. Other

    competitors include Lattice Semiconductor (SRAM based with integrated configuration Flash,

    instant-on, low power, live reconfiguration), Actel (now Microsemi, antifuse, flash-based,

    mixed-signal), SiliconBlue Technologies (extremely low power SRAM-based FPGAs with

    optional integrated nonvolatile configuration memory; acquired by Lattice in

    2011), Achronix (SRAM based, 1.5 GHz fabric speed), and QuickLogic (handheld focused

    CSSP, no general purpose FPGAs).

    Design and Implementation of A Lottery-based Bandwidth

    Guaranteed and Low Latency Arbiter for On-Chip Bus

    1. Introduction

    The bus arbiter decides which master is granted bus access when multiple masters issue bus

    requests at the same time in a system-on-chip. The previous bus arbitration algorithms, namely

    the static fixed priority algorithm, the time division multiplexing (TDM)/Round-robin algorithm,

    and the Lottery ticket bus algorithm, have several drawbacks: the bus starvation problem, low

    system performance because of bus distribution latency during bus arbitration, and large latency

    for masters that are assigned a low proportion of the tickets. Recently, the real-time issue has

    been considered for bus arbitrations [5]. In our paper, we propose the two-level static priority bus

    arbitration algorithm, the RB_Lottery algorithm, which is based on the Lottery bus arbitration

    method. The proposed RB_Lottery bus arbitration can handle the real-time requirements of all

    masters, solve the traditional bus distribution problem, guarantee the bandwidth requirement of

    each master, and reduce the bus arbitration latency. In the first level of bus arbitration, we use

    static priority real-time counters to satisfy the real-time requirements; this level is named the

    static priority real-time handler. In the second level of bus arbitration, we adopt a Lottery-based

    algorithm with binary group partition to avoid starvation and reduce bus latency.

    2. Previous Bus Arbitration Schemes for SOC Bus Communications

    2.1 Static Fixed Priority Algorithm

    The static fixed priority algorithm assigns a unique priority value to each master, and the

    arbiter periodically checks the requests of the masters. When several masters issue requests

    simultaneously, the master that owns the highest priority is granted access to the bus. The

    advantages of the scheme are quick arbitration and a simple architecture. However, static

    priority based arbitration allocates communication bandwidth to each master according to its

    priority, so a low priority master suffers bandwidth starvation if there is heavy high priority

    traffic on the bus.
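    The starvation effect of static fixed priority arbitration is easy to reproduce in a short simulation. The sketch below is illustrative only; the function names and the convention that index 0 is the highest priority are assumptions made for the example.

```python
def fixed_priority_grant(requests):
    """Return the index of the granted master, or None if nobody requests.

    requests is a list of 0/1 flags; index 0 is assumed to be the highest priority.
    """
    for master, requesting in enumerate(requests):
        if requesting:
            return master
    return None


def count_grants(traffic):
    """Count how often each master wins over a sequence of request vectors."""
    grants = [0] * len(traffic[0])
    for requests in traffic:
        winner = fixed_priority_grant(requests)
        if winner is not None:
            grants[winner] += 1
    return grants


# Master 0 requests in every cycle, so master 2 never gets the bus.
busy = [[1, 0, 1]] * 10
print(count_grants(busy))  # [10, 0, 0]
```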

    2.2 Two-level TDM / Round-robin Algorithm

    The time division multiplexed (TDM) scheme divides the scheduling execution time on the bus

    into time slots, and then allocates the time slots to the masters. Each time slot can span several

    physical transactions on the bus. The arbitration can provide flexible bandwidth assignments,

    because a master that has reserved more than one slot can be granted access to the bus

    multiple times. The 1st level of arbitration uses a timing wheel, where each slot is statically

    reserved for a unique master. If the master that owns the current time slot does not issue a

    request, the current time slot is wasted. To repair this defect, the 2nd level of arbitration,

    which uses the Round-robin algorithm, reallocates the unused slots to other requesting

    masters.
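    A minimal sketch of the two-level TDM / Round-robin idea follows. The class name, the wheel layout and the way the round-robin pointer advances are assumptions for illustration; the source does not specify them.

```python
class TdmRoundRobinArbiter:
    """First level: timing wheel. Second level: round-robin over unused slots."""

    def __init__(self, wheel, num_masters):
        self.wheel = wheel              # wheel[slot] is the master that owns the slot
        self.num_masters = num_masters
        self.slot = 0
        self.rr_pointer = 0             # next master to consider in round-robin

    def arbitrate(self, requests):
        """Return the granted master for the current slot, or None."""
        owner = self.wheel[self.slot]
        self.slot = (self.slot + 1) % len(self.wheel)
        if requests[owner]:
            return owner                # the slot owner uses its reserved slot
        # The owner is idle: hand the slot to the next requesting master.
        for offset in range(self.num_masters):
            candidate = (self.rr_pointer + offset) % self.num_masters
            if requests[candidate]:
                self.rr_pointer = (candidate + 1) % self.num_masters
                return candidate
        return None


# Master 0 reserves two of the four slots, masters 1 and 2 one slot each.
arbiter = TdmRoundRobinArbiter(wheel=[0, 0, 1, 2], num_masters=3)
print(arbiter.arbitrate([1, 1, 1]))  # slot 0 belongs to master 0
print(arbiter.arbitrate([0, 1, 1]))  # master 0 is idle, so the slot is reallocated
```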

    2.3 Lottery Bus Algorithm

    In the Lottery bus arbitration algorithm, the arbiter acts like a lottery manager, which

    decides which lucky one wins the prize. Each master is statically assigned a number of lottery

    tickets, and the lottery manager gathers the bus access requests from all of the masters. The

    lottery manager generates a pseudo random number, which corresponds to one ticket number,

    so the master that owns more tickets is more likely to be granted. The ticket number in the

    lottery arbitration algorithm is equal to the weight of each master. The Lottery arbitration

    algorithm is a probability-based distribution, which can avoid bus starvation. Meanwhile, the

    Lottery arbitration gives good control over the communication bandwidth allocated to each

    master, but a master that owns fewer tickets has a higher average latency than the other

    masters. In Figure 1, let the bus masters be C1, C2, ..., Cn, and let the number of tickets held

    by each master be t1, t2, ..., tn. At any bus cycle, let the pending requests be represented by a

    set of Boolean variables ri for i = 1, 2, ..., n, where ri = 1 if the master Ci has a pending

    request, and ri = 0 otherwise. In the Lottery arbitration, the granted master is chosen in a

    randomized way, i.e. the probability of granting master Ci is

    $$P(C_i) = \frac{r_i \, t_i}{\sum_{j=1}^{n} r_j \, t_j}$$

    Lottery bus arbiter for four bus masters
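    The lottery draw can be sketched as follows, assuming the usual lottery rule in which only requesting masters take part and the chance of a grant is proportional to the tickets held. The ticket values 1, 2, 3 and 4 and all function names are hypothetical; the optional draw argument exists only to make the example deterministic.

```python
import random
from fractions import Fraction


def grant_probabilities(requests, tickets):
    """Probability of granting each master: r_i * t_i over the sum of r_j * t_j."""
    total = sum(r * t for r, t in zip(requests, tickets))
    return [Fraction(r * t, total) for r, t in zip(requests, tickets)]


def lottery_grant(requests, tickets, draw=None):
    """Pick a winner by comparing a random ticket number with running partial sums."""
    total = sum(r * t for r, t in zip(requests, tickets))
    if total == 0:
        return None                     # no pending request
    if draw is None:
        draw = random.randrange(total)  # pseudo random ticket number
    draw %= total
    partial_sum = 0
    for master, (r, t) in enumerate(zip(requests, tickets)):
        partial_sum += r * t
        if draw < partial_sum:
            return master


tickets = [1, 2, 3, 4]                  # assumed ticket assignment for four masters
requests = [1, 0, 1, 1]                 # the second master is idle
print(grant_probabilities(requests, tickets))
print(lottery_grant(requests, tickets, draw=3))
```

    With these values the requesting masters hold 8 tickets in total, so the master with a single ticket wins only one draw in eight on average, which illustrates the higher latency seen by masters with few tickets.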

    3. Proposed Bus Arbitration Scheme

    3.1 Scope of the Arbitration

    Since the previous arbitration algorithms cannot handle strict real-time requirements, we

    propose a two-level arbitration algorithm, which is called the RB_Lottery bus arbitration. The

    proposed arbiter architecture is shown in Figure 2. In the first level, the static priority real-time

    handler handles the real-time requirements. In the second level, the binary group partition with

    the Lottery-based scheme guarantees the bandwidth that each master needs and reduces the

    distribution latency during bus arbitration. Note that once the static priority real-time handler

    in the first level generates a valid grant output (grant=1) for one bus master, the output of

    the second level arbitration is disabled. Conversely, the grant output of the second level

    arbitration is valid only when the first level arbitration does not output a valid grant. The

    proposed RB_Lottery scheme is described in detail in the following sections.

    3.2 Proposed Arbitration Algorithm

    The static priority real-time handler keeps a priority real-time counter for the real-time

    requirement of each master. Initially, a suitable initial value is loaded into the real-time counter

    of every bus master. When a master issues a request, the corresponding real-time counter is

    decreased by 1 until the master is granted. One of two conditions then occurs. On the one

    hand, when the counter value reaches zero, the first level arbitration generates a valid grant and

    the real-time counter is reset. On the other hand, when the counter value has not reached zero

    and the second level arbitration generates a valid grant for the same master, the corresponding

    real-time counter is also reset. If several real-time counters reach zero simultaneously, the

    master that owns the highest priority is granted.
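    The counter behaviour of the first level can be sketched as below. This is a behavioural model under stated assumptions, not the authors' circuit: one decrement per arbitration cycle, index 0 as the highest priority, and all names are hypothetical.

```python
class RealTimeHandler:
    """First-level arbiter: one down-counter per master, fixed priority on ties."""

    def __init__(self, initial_counts):
        self.initial = list(initial_counts)    # reload values, index 0 = highest priority
        self.counters = list(initial_counts)

    def tick(self, requests):
        """Advance one cycle; return the master granted by this level, or None."""
        for master, requesting in enumerate(requests):
            if requesting and self.counters[master] > 0:
                self.counters[master] -= 1     # a pending request ages the counter
        for master, requesting in enumerate(requests):
            if requesting and self.counters[master] == 0:
                self.counters[master] = self.initial[master]   # reset on grant
                return master                  # first match has the highest priority
        return None

    def granted_by_second_level(self, master):
        """Reset the counter when the lottery level grants the master first."""
        self.counters[master] = self.initial[master]


handler = RealTimeHandler([2, 3])
print(handler.tick([1, 1]))  # None: no counter has reached zero yet
print(handler.tick([1, 1]))  # 0: the counter of master 0 expires
```

    A return value of None corresponds to the case in which the first level produces no valid grant, so the decision is left to the second level.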

    Two-level arbitration scheme for the proposed RB_Lottery arbiter

    In the binary group logic, each master is given an identification number for its priority

    request. We then group two masters into a binary set. If the higher priority master issues a

    request signal, this signal is delivered to the Lottery-based block. The binary group logic

    architecture is shown in the figure below. In Figure 3, the input net Lo priority request must be

    connected to the request signal of the lower priority master in a binary set, and the Hi priority

    request must be connected to the request signal of the higher priority master in the same binary set.

    Binary group logic circuit for priority selection
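    One possible reading of the binary group logic is a 2-to-1 selection steered by the Hi priority request. The sketch below assumes that the Lo priority request passes through only when the Hi priority master is idle; this behaviour and the function name are assumptions, since the circuit itself is only shown in the figure.

```python
def binary_group(hi_request, lo_request):
    """Return (group_request, selected) for one binary set of two masters.

    group_request is the single request bit forwarded to the Lottery-based block;
    selected tells which member of the set would receive a grant ("hi", "lo" or None).
    """
    if hi_request:
        return 1, "hi"      # the higher priority request always wins inside the set
    if lo_request:
        return 1, "lo"      # assumed: the lower priority request passes when hi is idle
    return 0, None


for hi, lo in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(hi, lo, binary_group(hi, lo))
```

    Grouping halves the number of request lines seen by the lottery stage: four masters become two group requests.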

    3.3 Proposed Arbiter Architecture

    To describe the architecture of the proposed RB_Lottery arbiter clearly, we discuss the

    4-master case shown in Figure 4. Let the request signals of Master 1 (M1), Master 2 (M2),

    Master 3 (M3) and Master 4 (M4) be r1, r2, r3 and r4, respectively. The priority order is

    assigned as M1 > M2 > M3 > M4, where M1 owns the highest priority. In the Lottery-based

    part, t1=1 and t2=2. If a master wants to access the bus or communicate data over it, the

    corresponding request signal is set high (i.e., 1); otherwise, the corresponding request signal is

    set low (i.e., 0). In condition 1, suppose that r1=r3=1 and r2=r4=0, and the real-time counters

    of M1 and M3 reach zero simultaneously. Since the priority of M3 is more important than that

    of M1, M3 obtains the bus grant, and the grant signal, gnt[3], is set to 1 in the static priority

    real-time handler. At the same time, the grant output of the binary group with Lottery-based

    block is disabled. In condition 2, suppose that r1=r3=1 and r2=r4=0, but the real-time counters

    of M1 and M3 have not reached zero. Then the rb1 signal at the output of the binary group

    logic, MUX 1, is set to 1, and the rb2 signal at the output of the binary group logic, MUX 2, is

    also set to 1. Thus, the grant output of the Lottery-based scheme is active. Table 1 shows the

    truth table of the grant decoder in the RB_Lottery arbiter for the 4-master case. The generated

    random number is compared in parallel with two partial sums, and the outputs of the

    comparators are b0 and b1. A comparator outputs 1 if the value of the random number is less

    than the partial sum at its other input.
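    The comparator stage can be sketched with the ticket values t1=1 and t2=2 from the text. Two points are assumptions of this sketch rather than statements from the source: the random number is reduced modulo the total number of active tickets, and the first comparator that outputs 1 selects the winning group (the truth table in Table 1 is not reproduced here).

```python
def lottery_stage(rb, tickets, random_number):
    """Second-level draw between two binary groups.

    rb is (rb1, rb2), the group request bits; tickets is (t1, t2).
    Returns (b0, b1, winner) where winner is 0, 1 or None.
    """
    partial0 = rb[0] * tickets[0]               # tickets of group 1 if it requests
    partial1 = partial0 + rb[1] * tickets[1]    # plus tickets of group 2
    if partial1 == 0:
        return 0, 0, None                       # no group is requesting
    draw = random_number % partial1             # assumed range reduction
    b0 = int(draw < partial0)                   # comparator against the first partial sum
    b1 = int(draw < partial1)                   # comparator against the second partial sum
    winner = 0 if b0 else 1                     # assumed decode: first asserted comparator wins
    return b0, b1, winner


# Condition 2 from the text: rb1 = rb2 = 1 with t1 = 1 and t2 = 2.
for number in range(3):
    print(number, lottery_stage((1, 1), (1, 2), number))
```

    With both groups requesting, group 2 wins two draws out of three, matching its share of the tickets.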

    The architecture of the proposed RB_Lottery bus arbiter for 4-master case

    AMBA AXI4 architecture

    AMBA AXI4 [3] supports data transfers of up to 256 beats and unaligned data transfers using

    byte strobes. In the AMBA AXI4 system, 16 masters and 16 slaves are interfaced. Each master

    and slave has its own 4-bit ID tag. The AMBA AXI4 system consists of masters, slaves and the

    bus (arbiters and decoders). The system has five channels, namely the write address channel,

    write data channel, write response channel, read address channel, and read data channel. The

    AXI4 protocol supports the following mechanisms:

    Unaligned data transfers and updated write response requirements.

    Variable-length bursts, from 1 to 256 data transfers per burst for incrementing (INCR) bursts and from 1 to 16 for the other burst types.

    A burst with a transfer size of 8, 16, 32, 64, 128, 256, 512 or 1024 bits wide is supported.

    Updated AWCACHE and ARCACHE signalling details.

    Each transaction is burst-based and has address and control information on the address

    channel that describes the nature of the data to be transferred. The data is transferred between

    master and slave using a write data channel to the slave or a read data channel to the master.

    Table gives the information of signals used in the complete design of the protocol. The write

    operation process starts when the master sends an address and control information on the write

    address channel as shown in fig. 1. The master then sends each item of write data over the write

    data channel. The master keeps the VALID signal LOW until the write data is available. When the

    master sends the last data item, the WLAST signal goes HIGH. When the slave has accepted all

    the data items, it drives a write response signal BRESP[1:0] back to the master to indicate that

    the write transaction is complete. This signal indicates the status of the write transaction. The

    allowable responses are OKAY, EXOKAY, SLVERR, and DECERR. After the read address

    appears on the address bus, the data transfer occurs on the read data channel as shown in fig. The

    slave keeps the VALID signal LOW until the read data is available. For the final data transfer of

    the burst, the slave asserts the RLAST signal to show that the last data item is being transferred.

    The RRESP[1:0] signal indicates the status of the read transfer. The allowable responses are

    OKAY, EXOKAY, SLVERR, and DECERR.
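    The write sequence described above (address, data beats with WLAST on the final beat, then a response) can be sketched as a small behavioural model. The two-bit response encodings are those of the AXI protocol; the function name, the word-addressed dictionary used as slave memory and the always-OKAY slave are assumptions made for the example.

```python
# Two-bit encodings shared by BRESP[1:0] and RRESP[1:0].
RESP_CODES = {0b00: "OKAY", 0b01: "EXOKAY", 0b10: "SLVERR", 0b11: "DECERR"}


def write_burst(address, data_items, memory):
    """Model one write transaction; return (beats, bresp).

    Each beat is a (data, wlast) pair as seen on the write data channel.
    memory is a dict standing in for a word-addressed slave (hypothetical).
    """
    if not 1 <= len(data_items) <= 256:
        raise ValueError("an AXI4 burst carries 1 to 256 data transfers")
    beats = []
    for index, data in enumerate(data_items):
        wlast = int(index == len(data_items) - 1)   # HIGH only on the last data item
        memory[address + index] = data              # the slave accepts the beat
        beats.append((data, wlast))
    bresp = 0b00                                    # this simple slave always answers OKAY
    return beats, bresp


slave_memory = {}
beats, bresp = write_burst(0x10, [0xA, 0xB, 0xC], slave_memory)
print(beats, RESP_CODES[bresp])
```

    A read transaction follows the same pattern in the opposite direction, with RLAST marking the final beat and RRESP carrying one of the same four response codes.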

    The work carried out in this project is the achievement of communication between one

    master and one slave. The AMBA AXI4 slave is designed with an operating frequency of 100 MHz,

    which gives a clock period of 10 ns. To access the slave, an interconnect is needed; hence the

    interconnect signals are also studied. Master block functions are assumed to be available and the

    slave characteristics are studied. The AMBA AXI4 system components consist of

    1) Master

    2) AMBA AXI4 Interconnect

    2.1) Arbiters

    2.2) Decoders

    3) Slave

    The master is connected to the interconnect using a slave interface and the slave is

    connected to the interconnect using a master interface as shown in fig. The AXI4 master gets

    connected to the AXI4 slave interface port of the interconnect and the AXI slave gets connected

    to the AXI4 Master interface port of the interconnect. The parallel capability of this

    interconnect enables master M1 to access one slave at the same time as master M0 is accessing

    the other.

    AMBA AXI4 slave Read/Write block Diagram.

    SOFTWARE TOOLS

    Verilog HDL

    Verilog is used to model the design in this project. It is a hardware description language (HDL)

    used to model electronic systems. It is sometimes called Verilog HDL, which supports the

    design, verification, and implementation of analog, digital, and mixed-signal circuits at various

    levels of abstraction.

    Verilog was originally developed and owned by Gateway Design Automation in 1984. Cadence

    Design Systems then purchased Gateway and, in 1990, continued selling Verilog-XL as a

    Verilog-HDL simulator with PLI support. In 1995 Cadence released the specs for Verilog-HDL

    and they were accepted as the IEEE 1364 standard, which included the PLI 1.0 (TF/ACC)

    routines as a standard for all Verilog simulators. In 1993 the PLI 2.0 (VPI) routines were

    released as a standard by OVI, and in 1999 the IEEE was to vote on updating the 1364 standard

    to include PLI 2.0. In 2001 the IEEE accepted the updated Verilog standard, commonly known

    as Verilog 2001, and today about a dozen simulators simulate Verilog HDL.

    Verilog was created as a language for industry rather than academia. It has a very C-like

    programming style that closely represents hardware. VHDL supports nine-value logic, whereas

    Verilog supports seven strengths on three values. Compared with Verilog, VHDL offers more

    programming constructs, whereas Verilog is closer to the hardware.

    Verilog HDL is a general-purpose hardware description language that is easy to learn
    and easy to use. Its syntax is similar to that of the C programming language, so designers
    with C programming experience will find it easy to learn. Verilog also allows different
    levels of abstraction to be mixed in the same model: a designer can define a hardware
    model in terms of switches, gates, RTL, or behavioral code. A designer needs to learn
    only one language for both stimulus and hierarchical design. In addition, most popular
    logic synthesis tools support Verilog HDL, which makes it the language of choice for many
    ASIC companies. More importantly, all fabrication vendors provide Verilog HDL libraries
    for post-synthesis simulation, so designing a chip in Verilog HDL allows the widest
    choice of vendors.
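
    As an illustration of mixing abstraction levels, the sketch below describes the same 2:1
    multiplexer three times (with gate primitives, with a continuous assignment, and with a
    procedural block) and then combines all three in one 4:1 multiplexer. The module names are
    assumptions chosen for this example and are not part of the project design.

```verilog
// Gate-level description: built from primitive gates.
module mux2_gate (input wire a, b, sel, output wire y);
    wire nsel, t0, t1;
    not g0 (nsel, sel);
    and g1 (t0, a, nsel);
    and g2 (t1, b, sel);
    or  g3 (y, t0, t1);
endmodule

// RTL (dataflow) description: a continuous assignment.
module mux2_rtl (input wire a, b, sel, output wire y);
    assign y = sel ? b : a;
endmodule

// Behavioral description: a procedural block.
module mux2_beh (input wire a, b, sel, output reg y);
    always @(*) begin
        if (sel) y = b;
        else     y = a;
    end
endmodule

// The three styles mixed in one model: a 4:1 multiplexer.
module mux4 (input wire [3:0] d, input wire [1:0] sel, output wire y);
    wire lo, hi;
    mux2_gate u_lo (.a(d[0]), .b(d[1]), .sel(sel[0]), .y(lo));
    mux2_rtl  u_hi (.a(d[2]), .b(d[3]), .sel(sel[0]), .y(hi));
    mux2_beh  u_y  (.a(lo),   .b(hi),   .sel(sel[1]), .y(y));
endmodule
```

    All three mux2 variants are functionally equivalent, so the choice between them is a matter
    of how much hardware detail the designer wants to express.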

    3.3 INTRODUCTION TO XILINX ISE 12.1 EDA TOOL:

    ISE (Integrated Software Environment) continues to be the design tool of choice for
    FPGA designers, according to independent media surveys. Xilinx again earned the top ranking
    in all FPGA EDA categories and scored higher than any other FPGA vendor in user
    satisfaction.

    Xilinx ISE Overview

    The Integrated Software Environment (ISE) is the Xilinx design software suite that
    allows you to take your design from design entry through Xilinx device programming. The ISE
    Project Navigator manages and processes your design through the following steps in the ISE
    design flow.

    A simplified version of the design flow is given in the following diagram.

    Figure 3.5: FPGA Design Flow

    Design Entry

    There are different techniques for design entry: schematic-based entry, hardware description
    language (HDL) entry, or a combination of both. The choice of method depends on the design and
    the designer. Schematic-based entry gives designers much more visibility into the hardware and
    is the better choice for those who are hardware oriented. When the design is complex, or when
    the designer thinks of the design in an algorithmic way, HDL is the better choice. HDLs
    represent a level of abstraction that can isolate designers from the details of the hardware
    implementation. Language-based entry is faster, but the result may lag in performance and
    density. Another, rarely used, method is state-machine entry. It suits designers who think of
    the design as a series of states, but the tools for state-machine entry are limited. This
    document deals with HDL-based design entry.

    Synthesis

    Synthesis is the process that translates VHDL or Verilog code into a device netlist, i.e. a
    complete circuit of logical elements (gates, flip-flops, etc.) for the design. If the design
    contains more than one sub-design (for example, a processor implemented with a CPU as one
    design element, RAM as another, and so on), the synthesis process generates a netlist for
    each design element. The synthesis process checks the code syntax and analyzes the hierarchy
    of the design, which ensures that the design is optimized for the architecture the designer
    has selected. The resulting netlist(s) are saved to an NGC (Native Generic Circuit) file
    when Xilinx Synthesis Technology (XST) is used.
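
    To make the mapping from HDL to logical elements concrete, the sketch below shows a small
    hierarchical design of the kind a synthesis tool handles. The module names adder4 and
    reg_adder4 are assumptions for illustration. The continuous assignment would typically be
    synthesized to combinational gates, and the clocked always block to flip-flops; the exact
    result depends on the tool and the target device.

```verilog
// Sub-design: combinational logic, typically synthesized to gates
// (look-up tables on an FPGA).
module adder4 (input wire [3:0] a, b, output wire [4:0] sum);
    assign sum = a + b;   // 5-bit result keeps the carry
endmodule

// Top-level design: instantiates the sub-design and registers its result.
module reg_adder4 (
    input  wire       clk, rst,
    input  wire [3:0] a, b,
    output reg  [4:0] q
);
    wire [4:0] sum;
    adder4 u_add (.a(a), .b(b), .sum(sum));

    // Clocked always block: typically synthesized to five flip-flops
    // with a synchronous reset.
    always @(posedge clk) begin
        if (rst) q <= 5'd0;
        else     q <= sum;
    end
endmodule
```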

    Implementation

    This process consists of a sequence of three steps:

    1. Translate

    2. Map

    3. Place and Route

    The Translate process combines all the input netlists and constraints into a logic design
    file. This information is saved as an NGD (Native Generic Database) file. This is done by the
    NGDBuild program. Here, defining constraints means assigning the ports in the design to the
    physical elements (e.g. pins, switches, buttons) of the targeted device and specifying the
    timing requirements of the design. This information is stored in a UCF (User Constraints
    File).

    Tools used to create or modify the UCF include PACE and the Constraints Editor.

    The Map process divides the whole circuit of logical elements into sub-blocks that fit into
    the FPGA logic blocks. That is, the Map process fits the logic defined by the NGD file into
    the targeted FPGA elements (Configurable Logic Blocks (CLBs) and Input/Output Blocks (IOBs))
    and generates an NCD (Native Circuit Description) file, which physically represents the
    design mapped to the components of the FPGA. The MAP program is used for this purpose.


    Place and Route: The PAR program is used for this process. The place and route process places
    the sub-blocks from the Map process into logic blocks according to the constraints and
    connects the logic blocks. For example, placing a sub-block in a logic block very near an I/O
    pin may save time, but it may affect some other constraint. The place and route process takes
    the trade-off between all the constraints into account.

    The PAR tool takes the mapped NCD file as input and produces a completely routed NCD file as
    output. The output NCD file contains the routing information.

    Device Programming

    The design must now be loaded onto the FPGA, but first it must be converted to a format that
    the FPGA can accept. The BitGen program performs this conversion: the routed NCD file is
    given to BitGen to generate a bitstream (a .BIT file), which can be used to configure the
    target FPGA device. The bitstream is downloaded using a cable; the choice of cable depends
    on the design.

    Design Verification

    Verification can be done at different stages of the process steps.

    Behavioral Simulation (RTL Simulation): This is the first of the simulation steps
    encountered in the design flow. It is performed before the synthesis process to verify the
    RTL (behavioral) code and to confirm that the design functions as intended. Behavioral
    simulation can be performed on either VHDL or Verilog designs. In this process, signals and
    variables are observed, procedures and functions are traced, and breakpoints are set. This
    simulation is very fast, so the designer can quickly change the HDL code if the required
    functionality is not met. Since the design is not yet synthesized to gate level, timing
    and resource usage properties are still unknown.

    Functional Simulation (Post-Translate Simulation): Functional simulation gives information
    about the logic operation of the circuit. The designer can verify the functionality of the
    design using this process after the Translate process. If the functionality is not as
    expected, the designer has to make changes in the code and follow the design flow steps again.

    Static Timing Analysis: This can be done after the MAP or PAR processes. The post-MAP timing
    report lists signal path delays of the design derived from the design logic. The post-Place
    and Route timing report incorporates timing delay information to provide a comprehensive
    timing summary of the design.

    FPGA AND ISE DEVELOPMENT SOFTWARE BASICS

    The Spartan-3 EDK Board provides a powerful, self-contained development platform for designs
    targeting the Spartan-3 FPGA from Xilinx. It features a 200K-gate Spartan-3, on-board I/O
    devices, and 1 MB of fast asynchronous SRAM, making it a suitable platform for experimenting
    with any new design, from a simple logic circuit to an embedded processor core. The board also
    contains a Platform Flash JTAG-programmable ROM, so designs can easily be made non-volatile.

    Xilinx Spartan3 FPGA:

    200,000-gate Xilinx Spartan-3 FPGA in a 144-pin TQFP package (XC3S200-4TQG144C)
    4,320 logic cell equivalents
    Twelve 18K-bit block RAMs (216K bits)
    Twelve 18x18 hardware multipliers
    Four Digital Clock Managers (DCMs)
    Up to 97 user-defined I/O signals

    External Peripherals Modules

    2x16 LCD with contrast adjustment
    Two common-anode seven-segment displays
    Eight general-purpose point LEDs
    Eight toggle switches (digital inputs)
    Four push buttons
    PS/2 keyboard or mouse interface

    Communication protocols

    Full-duplex UART (EIA RS232)

    Other Features:

    VGA interface connector
    On-board 4 MB Platform Flash memory (PROM)
    8 MB on-board SRAM
    JTAG interface connector for parallel programming of the Spartan3 FPGA
    50 MHz crystal oscillator clock source


    SPARTAN3 (EDK) Board Components placement top view


    Block Diagram

    Figure 2. Xilinx Spartan3 Advanced Development Board Block Diagram

    On-board Peripherals

    The Spartan3 FPGA Lab Kit comes with many interfacing options:

    Two seven-segment displays
    Eight toggle switches (digital inputs)
    Four push buttons (digital inputs)
    Eight point LEDs (digital outputs)
    2x16 character LCD
    UART for serial port communication with a PC
    PS/2 keyboard interface
    3-bit VGA interface

    GETTING STARTED

    Software Requirements

    To use this tutorial, you must install the following software:

    ISE 12.1

    For more information about installing Xilinx software, see the ISE Release Notes and

    Installation Guide at: http://www.xilinx.com/support/software_manuals.htm.

    Hardware Requirements

    To use this tutorial, you must have the following hardware:

    Spartan-3 Startup Kit, containing the Spartan-3 Startup Kit Demo Board

    STARTING THE ISE SOFTWARE

    To start ISE, double-click the desktop icon

    or start ISE from the Start menu by selecting:

    Start > All Programs > Xilinx ISE 12.1 > Project Navigator

    Note: Your start-up path is set during the installation process and may differ from the one above.

    ACCESSING HELP

    At any time during the tutorial, you can access online help for additional information

    about the ISE software and related tools.

    To open Help, do either of the following:

    Press F1 to view Help for the specific tool or function that you have selected or highlighted.

    Launch the ISE Help Contents from the Help menu. It contains information about creating and

    maintaining your complete design flow in ISE.


    Figure 1: ISE Help Topics

    Create a New Project

    Create a new ISE project which will target the FPGA device on the Spartan-3 Startup Kit demo

    board.

    To create a new project:

    1. Select File > New Project... The New Project Wizard appears.

    2. Type tutorial in the Project Name field.

    3. Enter or browse to a location (directory path) for the new project. A tutorial subdirectory is

    created automatically.

    4. Verify that HDL is selected from the Top-Level Source Type list.

    5. Click Next to move to the device properties page.

    6. Fill in the properties in the table as shown below:

    Product Category: All

    Family: Spartan3

    Device: XC3S200

    Package: TQ144

    Speed Grade: -4

    Top-Level Source Type: HDL

    Synthesis Tool: XST (VHDL/Verilog)

    Simulator: ISE Simulator (VHDL/Verilog)

    Preferred Language: Verilog (or VHDL)


    Verify that Enable Enhanced Design Summary is selected.

    Leave the default values in the remaining fields.

    When the table is complete, your project properties will look like the following:

    Figure 2: Project Device Properties

    Creating a Verilog Source

    Create the top-level Verilog source file for the project as follows:

    1. Click New Source in the New Project dialog box.

    2. Select Verilog Module as the source type in the New Source dialog box.

    3. Type in the file name counter.

    4. Verify that the Add to Project checkbox is selected.

    5. Click Next.

    6. Declare the ports for the design by filling in the port information.
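
    After the wizard has generated the module skeleton, the body of the module still has to be
    written. The following is a minimal sketch of what the counter source file could contain,
    assuming a 4-bit up/down counter. The port names (clock, reset, direction, count_out) and
    the counter width are illustrative assumptions, not the ports used in this project.

```verilog
// Hypothetical contents of the counter source file: a 4-bit up/down counter.
module counter (
    input  wire       clock,      // rising-edge clock
    input  wire       reset,      // synchronous, active-high reset
    input  wire       direction,  // 1 = count up, 0 = count down
    output reg  [3:0] count_out   // current count value
);
    always @(posedge clock) begin
        if (reset)
            count_out <= 4'b0000;
        else if (direction)
            count_out <= count_out + 1'b1;
        else
            count_out <= count_out - 1'b1;
    end
endmodule
```

    The ports declared in step 6 correspond to the port list of the module; everything inside
    the always block is added by hand in the editor.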

    The source file containing the counter module displays in the Workspace, and the counter
    displays in the Sources tab, as shown below:


    Checking the Syntax of the New Counter Module

    When the source files are complete, check the syntax of the design to find errors and typos.

    1. Verify that Implementation is selected from the drop-down list in the Sources window.

    2. Select the counter design source in the Sources window to display the related processes in
    the Processes window.

    3. Click the + next to the Synthesize-XST process to expand the process group.

    4. Double-click the Check Syntax process.

    Note: You must correct any errors found in your source files. You can check for errors in the

    Console tab of the Transcript window. If you continue without valid syntax, you will not be able

    to simulate or synthesize your design.

    5. Close the HDL file.

    BEHAVIORAL SIMULATION

    ISIM SETUP

    ISim is automatically installed and set up with the ISE Design Suite 12 installer on supported

    operating systems. To see a list of operating systems supported by ISim, please see the ISE

    Design Suite 12: Installation, Licensing, and Release Notes available from the Xilinx website.


    Getting Started

    The following sections outline the requirements for performing behavioral simulation in this

    tutorial.

    Required Files

    The behavioral simulation flow requires design files, a test bench file, and Xilinx simulation
    libraries.

    Design Files (VHDL, Verilog, or Schematic)

    This chapter assumes that you have completed the design entry tutorial in either Chapter 2,

    HDL-Based Design, or Chapter 3, Schematic-Based Design. After you have completed one

    of these chapters, your design includes the required design files and is ready for simulation.

    Test Bench File

    To simulate the design, a test bench file is required to provide stimulus to the design. VHDL and

    Verilog test bench files are available with the tutorial files. You may also create your own test

    bench file.

    Simulation Libraries

    Xilinx simulation libraries are required when a Xilinx primitive or IP core is instantiated in the

    design. The design in this tutorial requires the use of simulation libraries because it contains

    instantiations of a digital clock manager (DCM) and a CORE Generator software component.

    For information on simulation libraries and how to compile them, see the next section, Xilinx

    Simulation Libraries.

    XILINX SIMULATION LIBRARIES

    To simulate designs that contain instantiated Xilinx primitives, CORE Generator software

    components, and other Xilinx IP cores you must use the Xilinx simulation libraries. These

    libraries contain models for each component. These models reflect the functions of each

    component, and provide the simulator with the information required to perform simulation. For a

    detailed description of each library, see the Synthesis and Simulation Design Guide. This guide

    is available from the ISE Software Manuals collection, automatically installed with your ISE

    software. To open the Software Manuals collection, select Help > Software Manuals. The

    Software Manuals collection is also available from the Xilinx website.


    Adding an HDL Test Bench

    To add an HDL test bench to your design project, you can either add a test bench file provided
    with this tutorial, or create your own test bench file and add it to your project.

    Adding the Tutorial Test Bench File

    This section demonstrates how to add an existing test bench file to the project. A VHDL

    test bench and Verilog test fixture are provided with this tutorial.

    Note: To create your own test bench file in the ISE software, select Project > New Source, and
    select either VHDL Test Bench or Verilog Test Fixture in the New Source Wizard. An empty
    stimulus file is added to your project. You must define the test bench in a text editor.
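
    As a sketch of what a hand-written Verilog test fixture might look like, the example below
    generates a clock and a reset, lets a small counter run for five clock edges, and reports
    whether the count matches the expected value. The design under test is included so that the
    example is self-contained. The module names (counter, tb_counter), the 20 ns clock period,
    and the checked value are assumptions for illustration and are not taken from the tutorial
    files.

```verilog
`timescale 1ns / 1ps

// Small design under test, included so the example is self-contained.
module counter (input wire clock, reset, output reg [3:0] count_out);
    always @(posedge clock)
        if (reset) count_out <= 4'd0;
        else       count_out <= count_out + 1'b1;
endmodule

// Hypothetical test fixture: drives clock and reset, then checks the count.
module tb_counter;
    reg        clock = 1'b0;
    reg        reset = 1'b1;
    wire [3:0] count_out;

    counter uut (.clock(clock), .reset(reset), .count_out(count_out));

    always #10 clock = ~clock;          // 20 ns clock period

    initial begin
        #25 reset = 1'b0;               // release reset after the first rising edge
        repeat (5) @(posedge clock);    // let five counting edges pass
        #1;                             // allow the nonblocking update to settle
        if (count_out == 4'd5) $display("PASS: count_out = %0d", count_out);
        else                   $display("FAIL: count_out = %0d", count_out);
        $finish;
    end
endmodule
```

    The instance name uut matches the UUT level that appears in the simulator hierarchy when
    signals are added to the waveform.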

    Verilog Simulation

    To add the tutorial Verilog test fixture to the project, do the following:

    1. In Project Navigator, select Project > Add Source.

    2. Select the file tb_rowDWT.v.

    3. Click Open.

    4. Ensure that Simulation is selected for the file association type.

    5. Click OK.

    Behavioral Simulation Using ISim

    Follow this section of the tutorial if you have skipped the previous section, Behavioral

    Simulation Using ModelSim.

    Now that you have a test bench in your project, you can perform behavioral simulation on

    the design using ISim. The ISE software has full integration with ISim. The ISE software

    enables ISim to create the work directory, compile the source files, load the design, and

    perform simulation based on simulation properties.

    To select ISim as your project simulator, do the following:

    1. In the Hierarchy pane of the Project Navigator Design panel, right-click the device line
    (xc3s200-4tq144), and select Design Properties.

    2. In the Design Properties dialog box, set the Simulator field to ISim (VHDL/Verilog).

    Locating the Simulation Processes

    The simulation processes in the ISE software enable you to run simulation on the design
    using ISim. To locate the ISim processes, do the following:

    1. In the View pane of the Project Navigator Design panel, select Simulation, and select

    Behavioral from the drop-down list.

    2. In the Hierarchy pane, select the test bench file (tb_rowDWT).

    3. In the Processes pane, expand ISim Simulator to view the process hierarchy.

    Performing Simulation

    After the process properties have been set, you are ready to run ISim to simulate the

    design. To start the behavioral simulation, double-click Simulate Behavioral Model. ISim creates

    the work directory, compiles the source files, loads the design, and performs

    simulation for the time specified.

    Adding Signals

    To view signals during the simulation, you must add them to the Waveform window. The

    ISE software automatically adds all the top-level ports to the Waveform window.

    Additional signals are displayed in the Instances and Processes panel. The following

    procedure explains how to add additional signals in the design hierarchy. For the purpose

    of this tutorial, add the DCM signals to the waveform.

    To add additional signals in the design hierarchy, do the following:

    1. In the Instances and Processes panel, expand tb_rowDWT, and expand UUT.

    The following figure shows the contents of the Instances and Processes panel for the
    Verilog flow.

    Drag all the selected signals to the waveform. Alternatively, right click on a selected

    signal and select Add To Waveform.

    Notice that the waveforms have not been drawn for the newly added signals. This is

    because ISim did not record the data for these signals. By default, ISim records data only

    for the signals that are added to the waveform window while the simulation is running.

    Therefore, when new signals are added to the waveform window, you must rerun the
    simulation for the desired amount of time.

    Rerunning Simulation

    To rerun the simulation in ISim, do the following:

    1. Click the Restart Simulation icon.

    Figure: ISim Restart Simulation Icon

    2. At the ISim command prompt in the Console, enter run 2000 ns and press Enter.

    The simulation runs for 2000 ns. The waveforms for the counter block are now visible in the

    Waveform window.

    Running a Simulation in ISim

    Simulation, the process of verifying the logic and timing of a design, can be run from

    ISim using functions in the interface or at the command line.

    To Run a Simulation From the ISim GUI

    The following GUI menu commands can be used to run simulation.

    Simulation > Restart - Stops simulation and sets simulation time back to 0. Use

    the Run All, Run For or Step command to run the simulation over again without

    reloading the design. See restart Tcl command.

    Simulation > Run All - Runs simulation until all events are executed. You can
    also use the run Tcl command with the all option.

    Simulation > Run - Runs simulation for 100ns or for specified amount of time in

    the toolbar. Time and time unit are entered in the Value box in the toolbar. You can

    also use the run Tcl command with a length and unit specified.

    Simulation > Step - Runs simulation for one executable HDL instruction at a

    time. See Stepping a Simulation. See also step Tcl command.

    In addition, you can run simulation until a specific point in your HDL source code

    is reached. To do so, use breakpoints and the Run All command. See Source Level

    Debugging Overview.

    Note: The current simulation time is displayed on the status bar in the lower right corner.

    Pausing a Simulation

    While running a simulation for any length of time, you can pause a simulation using the

    Break command, which leaves the simulation session open.


    To close the session of ISim, see Closing ISim.

    To Pause a Running Simulation

    You can pause a running simulation using the Break command as follows:

    Select Simulation > Break.

    Click the Break toolbar button.

    Press Ctrl+C (at the command line only).

    The simulator stops at the next executable HDL line. The line at which the simulation

    stopped is displayed in the text editor.

    Note: This behavior applies to designs that have not been compiled with the -nodebug
    switch.

    The simulation can be resumed at any time by using the Run All, Run, or Step

    commands. See Running a Simulation in ISim for details.
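
    A pause can also be requested from within the HDL source. The sketch below uses the
    standard Verilog system tasks $stop and $finish rather than the ISim Break command: $stop
    suspends the simulation and leaves the session open, much like Break, whereas $finish ends
    the simulation. The module name tb_pause and the delay values are assumptions chosen for
    illustration.

```verilog
`timescale 1ns / 1ps

module tb_pause;
    reg [3:0] value = 4'd0;

    always #10 value = value + 1'b1;    // simple activity to observe

    initial begin
        #100 $display("Pausing at %0d ns", $time);
        $stop;                          // suspend; the simulator session stays open
        #100 $display("Finishing at %0d ns", $time);
        $finish;                        // end the simulation
    end
endmodule
```

    After the simulator stops at $stop, the run can be continued with the Run All, Run, or
    Step commands in the same way as after a manual Break.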

    Closing ISim

    You can terminate a simulation and close the ISim session.

    To Close ISim

    Select File > Exit.

    Enter the quit -f command in the Console panel at the prompt. The -f option prevents
    an "Are you sure?" confirmation dialog box from opening.

    Click the X at the top-right corner of the main window.

    The simulation terminates and the session of ISim closes.

    Changing the Radix

    To Set the Default Radix

    The default radix controls the bus radix displayed in the wave configuration, Objects

    panel, and the Console panel. To change the default radix from the default binary:

    1. Select Edit > Preferences.

    2. In the Preferences dialog box, click ISim Simulator in the left pane.

    3. Select a radix from the Default Radix field drop-down list.

    4. Click Apply, and click OK.

    To Change an Individual Radix

    You can change the radix of an individual signal (HDL object) in the Objects panel
    as follows:

    1. Right-click on a bus in the Objects panel.

    2. Select Radix, and the desired format from the submenu:

    Binary

    Hexadecimal

    Unsigned Decimal

    Signed Decimal

    Octal

    ASCII
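
    The same choice of radix exists on the HDL side when values are printed from a test bench.
    As a sketch using the standard Verilog $display format specifiers (not an ISim setting), the
    module below prints one 8-bit value in each of the radices listed above. The module name
    tb_radix and the value 8'h41, which is the ASCII code of the letter A, are assumptions for
    illustration.

```verilog
module tb_radix;
    reg [7:0] bus = 8'h41;   // example value; 8'h41 is the ASCII code for "A"

    initial begin
        $display("Binary:           %b",  bus);
        $display("Hexadecimal:      %h",  bus);
        $display("Unsigned decimal: %0d", bus);
        $display("Signed decimal:   %0d", $signed(bus));
        $display("Octal:            %o",  bus);
        $display("ASCII:            %s",  bus);
        $finish;
    end
endmodule
```

    The stored bits are identical in every case; only the way they are displayed changes, which
    is also true of the radix setting in the waveform and Objects panels.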

    vcd Command

    The vcd command generates simulation results in VCD format. This command enables

    you to dump specified instances to a VCD file, to name the VCD file, to start and stop

    the dump process, and other functions. See also Writing Activity Data of the Design.

    Note: This command is case sensitive.
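
    For comparison, a VCD file can also be produced from the HDL side. The sketch below uses the
    standard Verilog system tasks $dumpfile, $dumpvars, $dumpoff, and $dumpon rather than the
    ISim vcd Tcl command described in this section. It names the VCD file, selects the instances
    to dump, and stops and restarts the dump process. The file name dump.vcd, the module name
    tb_vcd, and the delays are assumptions for illustration.

```verilog
`timescale 1ns / 1ps

module tb_vcd;
    reg       clk = 1'b0;
    reg [3:0] cnt = 4'd0;

    always #5 clk = ~clk;                     // 10 ns clock period
    always @(posedge clk) cnt <= cnt + 1'b1;  // activity to record

    initial begin
        $dumpfile("dump.vcd");      // name the VCD file
        $dumpvars(0, tb_vcd);       // dump all signals in tb_vcd and below
        #100 $dumpoff;              // stop the dump process
        #50  $dumpon;               // start dumping again
        #100 $finish;
    end
endmodule
```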


    Syntax vcd (option)

    Examples

    The vcd command can be used as follows.

    Following are the commands you would use to write the VCD simul