![Page 1: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/1.jpg)
A Survey of Logic Block Architectures
For Digital Signal Processing Applications
![Page 2: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/2.jpg)
Presentation Outline
- Considerations in Logic Block Design
  - Computation Requirements
  - Why Inefficiencies?
- Representative Logic Block Architectures
  - Proposed
  - Commercial
- Conclusions: What is suitable Where?
![Page 3: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/3.jpg)
Why DSP??? The Context
- Representative of the computationally intensive class of applications: datapath oriented and arithmetic oriented
- Increasingly large use of FPGAs for DSP: multimedia signal processing, communications, and much more
- To study the “issues” in reconfigurable fabric design for compute-intensive applications: what is involved in making a fabric that accelerates multimedia reconfigurable computing?
![Page 4: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/4.jpg)
Elements of a Reconfigurable Architecture
- Logic Block/Processing Element
  - Differing Grains: Fine >> Coarse >> ALUs
- Routing
- Dynamic Reconfiguration
![Page 5: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/5.jpg)
So what’s wrong with the typical FPGA?
- Meant to be general purpose: lower risks
- Too flexible!
- Result: Efficiency Gap
  - Higher implementation cost, larger delay, and larger power consumption than ASICs
- Performance vs. Flexibility Tradeoff
- Postponing Mapping and Silicon Re-use
![Page 6: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/6.jpg)
Solution? See how FPGAs are Used
- FPGAs are being used for “classes” of applications: encryption, DSP, multimedia, etc.
- Here lies the key: design FPGAs for a class of applications
  - Application Domain Characterization
  - Application Domain Tuning
![Page 7: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/7.jpg)
Domain Specialization
- COMPUTATION defines ARCHITECTURE
- Target application characteristics known beforehand? Yes
1. Characterize the application domain
2. Determine a balance between flexibility and efficiency
3. Tune the architecture accordingly
![Page 8: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/8.jpg)
Categorizing the “Computation”
- Control: Random Logic Implementation
- Datapath: Processing of Multi-bit Data
- Conflicting Requirements???
![Page 9: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/9.jpg)
Datapath Element Requirements
- Operates on word slices or bit slices
- Produces multi-bit outputs
- Requires many smaller elements to produce each bit output, i.e. multiple small LUTs
![Page 10: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/10.jpg)
Control Logic Requirements
- Produces a single output from many single-bit inputs
- Benefits from large-grain LUTs, as the number of logic levels is reduced
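To make the logic-level argument concrete, here is a minimal Python sketch that counts the levels of k-input LUTs needed to reduce many single-bit inputs to one output. It assumes the wide function decomposes into a balanced tree (true for a wide AND, OR, or XOR, not for arbitrary functions); the helper name `lut_levels` is hypothetical and not taken from the slides.

```python
def lut_levels(n_inputs, k):
    """Levels of k-input LUTs in a balanced tree reducing n_inputs signals to one.

    Assumption: the function is tree-decomposable (e.g. a wide AND/OR/XOR).
    """
    levels = 0
    signals = n_inputs
    while signals > 1:
        signals = -(-signals // k)  # ceiling division: LUTs needed at this level
        levels += 1
    return levels


if __name__ == "__main__":
    for k in (2, 4, 6):
        print(f"16-input function with {k}-input LUTs: {lut_levels(16, k)} levels")
```

Under this model, a 16-input reduction needs 4 levels of 2-input LUTs but only 2 levels of 4-input LUTs, which is the benefit the slide attributes to larger-grain LUTs.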
![Page 11: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/11.jpg)
Logic Block Design: Considerations
- “How much” of “what kinds” of computations to support?
- Tradeoff: Generality vs. Specialization
![Page 12: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/12.jpg)
How much of What?
- Applications benchmarking
![Page 13: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/13.jpg)
So what do we have to support?
- Datapath functionality, in particular arithmetic, is dominant in DSP.
- The datapath functions have different bit-widths.
- DSP designs heavily use multiplexers of various sizes. Thus, an efficient mapping of multiplexers should be supported.
- DSP functions do contain random logic. The amount of random logic varies per design.
- Some DSP designs use wide Boolean functions.
![Page 14: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/14.jpg)
DSP Building Blocks
- Some techniques widely used to achieve area-speed efficient DSP implementations:
- Bit-Serial Computation
  - Routing efficient
  - Bit-level pipelining increases throughput even more
- Digit-Serial Computation
  - Combines the “area efficiency” of bit-serial with the “time efficiency” of bit-parallel
![Page 15: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/15.jpg)
Classes of DSP-optimized FPGA Architectures
1. Architectures with Dedicated DSP Logic
   - Homogeneous
   - Heterogeneous
   - Globally Homogeneous, Locally Heterogeneous
2. Architectures of Coarser Granularity
3. With DSP-Specific Improvements (e.g. Carry Chains, Input Sharing, CBS)
![Page 16: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/16.jpg)
Some Representative Architectures
![Page 17: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/17.jpg)
Bit-Serial FPGA with SR LUT
- The bit-serial paradigm suits existing FPGAs, so why not optimize the FPGA for it?
- Logic block to support efficient implementation of bit-serial datapaths and bit-level pipelining
- LUTs can be used for combinational logic as well as for shift registers
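The dual use of a LUT can be illustrated with a small behavioural model in Python. This is only a sketch under assumptions: the class name `ShiftRegisterLUT` and its methods are hypothetical, and the model simply treats the LUT's configuration bits as a memory that can either be addressed (combinational mode) or shifted (shift-register mode). The modified LUT circuit on the slides may be organized differently.

```python
class ShiftRegisterLUT:
    """Behavioural model of a k-input LUT whose storage bits can also shift."""

    def __init__(self, k=4):
        self.k = k
        self.bits = [0] * (1 << k)  # 2^k configuration/storage bits

    def load_truth_table(self, func):
        """Combinational mode: store func's truth table in the LUT bits."""
        for addr in range(1 << self.k):
            inputs = [(addr >> i) & 1 for i in range(self.k)]
            self.bits[addr] = func(*inputs) & 1

    def read(self, addr):
        """Address the storage: a function output, or a tap of the shift chain."""
        return self.bits[addr]

    def shift_in(self, bit):
        """Shift-register mode: push one bit in, return the bit that falls out."""
        out = self.bits[-1]
        self.bits = [bit & 1] + self.bits[:-1]
        return out


if __name__ == "__main__":
    sr = ShiftRegisterLUT(k=4)
    for b in (1, 0, 1, 1):
        sr.shift_in(b)
    # Address n selects the bit that entered n + 1 shifts ago.
    print([sr.read(n) for n in range(4)])
```

In shift mode the address inputs select a tap, so one LUT in this model can act as a variable-length delay line of up to 16 bits, which is the kind of storage bit-serial datapaths rely on.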
![Page 18: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/18.jpg)
A Bit-Serial Adder
A Bit-Serial Adder which processes two bits at a time
Interface Block Diagram
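As a rough illustration of the bit-serial idea, the following Python sketch models a one-bit-per-cycle adder: a single full adder plus a carry flip-flop, with operands arriving least significant bit first. This is an assumption-laden simplification (the adder on the slide processes two bits at a time), and the function name `bit_serial_add` is hypothetical.

```python
def bit_serial_add(a, b, width):
    """Add two unsigned `width`-bit values one bit per clock cycle, LSB first.

    Returns (sum modulo 2**width, final carry).
    """
    carry = 0   # models the carry flip-flop, cleared before the LSB arrives
    result = 0
    for cycle in range(width):
        a_bit = (a >> cycle) & 1
        b_bit = (b >> cycle) & 1
        sum_bit = a_bit ^ b_bit ^ carry
        carry = (a_bit & b_bit) | (carry & (a_bit ^ b_bit))
        result |= sum_bit << cycle  # the sum bit leaves the adder this cycle
    return result, carry


if __name__ == "__main__":
    print(bit_serial_add(5, 3, 4))
    print(bit_serial_add(15, 1, 4))
```

The hardware cost is independent of the word length; only the number of cycles grows with `width`, which is why bit-serial operators are so compact.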
![Page 19: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/19.jpg)
A Bit-Serial Multiplier Cell
![Page 20: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/20.jpg)
The Proposed Bit Serial Logic Block Architecture
- Four 4-input LUTs and 6 flip-flops.
- The two multiplexers in front of the LUTs are targeted mainly at carry-save operations, which are frequently used in bit-serial computations.
- There are 18 signal inputs and 6 signal outputs, plus a clock input.
- Feedback inputs c2, c3, c4, c5 can be connected to GND, to VDD, or to one of the 4 outputs d0, d1, d2, d3. Therefore, each LUT can implement any 4-input function controlled by inputs a0, a1, a2, a3 or b0, b1, b2, b3.
- Programmable switches connected to inputs a4 and b4 control the functionality of the four multiplexers at the outputs of the LUTs. As a result, 2 LUTs can implement any 5-input function.
- The final outputs d0, d1, d2, d3 can either be the direct outputs from the multiplexers or the outputs from flip-flops. All bit-serial operators use the outputs from flip-flops, so the attached programmable switches are not needed for them. They are present only so that logic functions other than bit-serial datapath circuits can be implemented.
- Two flip-flops are added (inputs c0 and c1) to implement shift registers, which are frequently used in bit-serial operations.
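The claim that two 4-input LUTs plus an output multiplexer can realize any 5-input function follows from Shannon expansion on the fifth input. The Python sketch below checks this for an example function. The helper names (`make_lut`, `build_5_input`, `matches_everywhere`) are hypothetical, and the select input `x4` is only assumed to play the role of a4 or b4; the exact wiring in the proposed block is not reproduced here.

```python
from itertools import product


def make_lut(func, k):
    """Return the 2^k-entry truth table of a k-input function."""
    return [func(*[(addr >> i) & 1 for i in range(k)]) & 1
            for addr in range(1 << k)]


def lut_read(table, inputs):
    """Look up a LUT output; inputs[0] is the least significant address bit."""
    return table[sum(bit << i for i, bit in enumerate(inputs))]


def build_5_input(func5):
    """Shannon expansion: f = x4 ? f(x0..x3, 1) : f(x0..x3, 0)."""
    lut_lo = make_lut(lambda x0, x1, x2, x3: func5(x0, x1, x2, x3, 0), 4)
    lut_hi = make_lut(lambda x0, x1, x2, x3: func5(x0, x1, x2, x3, 1), 4)

    def evaluate(x0, x1, x2, x3, x4):
        lo = lut_read(lut_lo, (x0, x1, x2, x3))
        hi = lut_read(lut_hi, (x0, x1, x2, x3))
        return hi if x4 else lo  # the output multiplexer

    return evaluate


def matches_everywhere(func5):
    """Compare the two-LUT realization with func5 on all 32 input vectors."""
    mapped = build_5_input(func5)
    return all(mapped(*bits) == func5(*bits) & 1
               for bits in product((0, 1), repeat=5))


if __name__ == "__main__":
    majority5 = lambda a, b, c, d, e: int(a + b + c + d + e >= 3)
    print(matches_everywhere(majority5))
```

Because each LUT stores one cofactor of the function, the construction works for every one of the 2^32 possible 5-input functions, not just the majority example.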
![Page 21: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/21.jpg)
The Modified LUT Implementing a Shift Register
![Page 22: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/22.jpg)
Performance Results
![Page 23: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/23.jpg)
Digit-Serial Logic Block Architecture
- Digit-serial architectures process one digit (N=4 bits) at a time
- They offer area efficiency similar to bit-serial architectures and time efficiency close to bit-parallel architectures
- N=4 bits can serve as an optimal granularity for processing larger digit sizes (N=8, 16, etc.)
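A minimal Python sketch of digit-serial addition shows how the digit size trades cycles for hardware. It assumes the word width is a multiple of the digit size; the function name `digit_serial_add` and its `digit_bits` parameter are hypothetical. With `digit_bits=1` the model degenerates to bit-serial, and with `digit_bits=width` to bit-parallel.

```python
def digit_serial_add(a, b, width, digit_bits=4):
    """Add two unsigned `width`-bit values one digit (digit_bits bits) per cycle.

    Assumes width is a multiple of digit_bits.
    Returns (sum modulo 2**width, final carry, cycles used).
    """
    mask = (1 << digit_bits) - 1
    carry = 0   # carry register between digit cycles
    result = 0
    cycles = 0
    for shift in range(0, width, digit_bits):
        a_digit = (a >> shift) & mask
        b_digit = (b >> shift) & mask
        total = a_digit + b_digit + carry  # one N-bit adder does this per cycle
        result |= (total & mask) << shift
        carry = total >> digit_bits
        cycles += 1
    return result, carry, cycles


if __name__ == "__main__":
    print(digit_serial_add(0x1234, 0x0FFF, 16))                # N=4: 4 cycles
    print(digit_serial_add(0x1234, 0x0FFF, 16, digit_bits=1))  # bit-serial: 16 cycles
```

A 16-bit addition takes 4 cycles with N=4 against 16 cycles bit-serially, while needing only a 4-bit adder instead of a 16-bit one.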
![Page 24: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/24.jpg)
Digit-Serial Building Blocks
- A Digit-Serial Adder
- A Digit-Serial Unsigned Multiplier
![Page 25: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/25.jpg)
Digit-Serial Building Blocks
A Pipelined Digit-Serial Unsigned Multiplier For Y=8 bits
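The behaviour of a digit-serial unsigned multiplier can be sketched in Python as a serial-parallel scheme: one operand arrives a digit at a time while the other is held in parallel. This is a behavioural model only, with no pipelining and with hypothetical names (`digit_serial_multiply`, `x_width`, `digit_bits`); it is not the circuit shown on the slide.

```python
def digit_serial_multiply(x, y, x_width, digit_bits=4):
    """Multiply unsigned x (fed one digit per cycle, low digit first) by parallel y.

    Assumes x_width is a multiple of digit_bits.
    """
    mask = (1 << digit_bits) - 1
    accumulator = 0   # running partial product held between cycles
    product = 0
    out_shift = 0
    for shift in range(0, x_width, digit_bits):
        x_digit = (x >> shift) & mask
        accumulator += x_digit * y
        # The lowest digit of the accumulator is final and is emitted now.
        product |= (accumulator & mask) << out_shift
        accumulator >>= digit_bits
        out_shift += digit_bits
    # Flush the remaining high-order part after the last input digit.
    product |= accumulator << out_shift
    return product


if __name__ == "__main__":
    print(digit_serial_multiply(0xB7, 0x5C, 8))  # same value as 0xB7 * 0x5C
```

One product digit is emitted per cycle as soon as it can no longer change, so downstream digit-serial operators can start before the multiplication has finished.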
![Page 26: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/26.jpg)
Digit-Serial Signed Multiplier Blocks
- First Stage Module
- Middle Stages Module
- Last Stage Module
![Page 27: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/27.jpg)
Signed Digit-Serial Multiplier
A Digit-Serial Signed Booth’s Pipelined Multiplier with Y=8
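For readers unfamiliar with Booth's technique, the Python sketch below shows radix-4 Booth recoding of a signed multiplier and checks it against ordinary multiplication. The radix, the function names, and the purely arithmetic (non-pipelined, non-serial) formulation are all assumptions made for illustration; the slide does not state which Booth variant the architecture uses.

```python
def booth_radix4_digits(y_bits, width):
    """Recode a two's-complement bit pattern (width even) into digits in -2..2.

    Digit i equals y[2i] + y[2i-1] - 2*y[2i+1], with y[-1] taken as 0.
    """
    digits = []
    prev = 0  # the implicit bit to the right of the LSB
    for i in range(0, width, 2):
        b0 = (y_bits >> i) & 1
        b1 = (y_bits >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x, y, width=8):
    """Signed multiply of x by a `width`-bit two's-complement y via Booth digits."""
    y_bits = y & ((1 << width) - 1)  # two's-complement bit pattern of y
    return sum(d * x * 4 ** i
               for i, d in enumerate(booth_radix4_digits(y_bits, width)))


if __name__ == "__main__":
    print(booth_radix4_digits(0xFF, 8))   # recoding of -1
    print(booth_multiply(-37, 95))
```

Each recoded digit selects 0, ±X, or ±2X, so an 8-bit signed multiplier needs only four partial products and no separate sign-correction step.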
![Page 28: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/28.jpg)
Proposed Digit-Serial Logic Block
![Page 29: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/29.jpg)
Detailed Structure of Digit-Serial Logic Block
![Page 30: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/30.jpg)
The Basic Logic Module (LM)
The Structure of the LM
Table of Functions Implemented
![Page 31: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/31.jpg)
Examples of Implementations
N=4 Unsigned Multiplier
N=4 Signed Multiplier
Two N=2 Multipliers
Bit-Level Pipelined
![Page 32: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/32.jpg)
Area Comparison with Xilinx 4000 Series
![Page 33: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/33.jpg)
Mixed-Grain Logic Block Architecture
- Exploits the adder inverting property
- Efficiently implements both datapath and random logic in the same logic block design
![Page 34: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/34.jpg)
Adder Inverting Property
- Full Adder and Equations Showing the Inverting Property
- An optimal structure derived from the property
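The inverting property states that complementing all three inputs of a full adder complements both of its outputs. A short Python check over all eight input combinations confirms it. The function names are hypothetical, and the sketch verifies only the property itself, not the optimized structure derived from it on the slide.

```python
from itertools import product


def full_adder(a, b, cin):
    """Standard full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (b & cin) | (a & cin)
    return s, cout


def inverting_property_holds():
    """Check that FA(~a, ~b, ~cin) == (~sum, ~carry_out) for all inputs."""
    for a, b, cin in product((0, 1), repeat=3):
        s, cout = full_adder(a, b, cin)
        s_inv, cout_inv = full_adder(1 - a, 1 - b, 1 - cin)
        if (s_inv, cout_inv) != (1 - s, 1 - cout):
            return False
    return True


if __name__ == "__main__":
    print(inverting_property_holds())
```

The property holds because the sum is a three-input XOR (an odd number of inversions flips the result) and the carry is a majority function, which is self-dual.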
![Page 35: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/35.jpg)
LUT Bits Utilization in Datapath and Logic Modes
![Page 36: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/36.jpg)
Structure of a Single Slice
![Page 37: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/37.jpg)
Complete Logic Block
![Page 38: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/38.jpg)
Modified ALU Like Functionality
![Page 39: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/39.jpg)
Comparison Results
![Page 40: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/40.jpg)
Comparison Results (Cont…)
![Page 41: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/41.jpg)
Comparison Results (cont…)
![Page 42: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/42.jpg)
Coarser ALU Like Architectures
![Page 43: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/43.jpg)
CHESS Architecture
![Page 44: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/44.jpg)
CHESS ALU Based Logic Block
![Page 45: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/45.jpg)
Structure of a Switch Box
![Page 46: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/46.jpg)
Comparison Results
![Page 47: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/47.jpg)
Computation Field Programmable Architecture
- A heterogeneous architecture with clusters of datapath logic blocks
- Separate LUT-based logic blocks for supporting random logic mapping
- Basic logic block called a Partial Adder Subtraction Multiplier (PASM) module
![Page 48: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/48.jpg)
PASM Logic Block of CFPA
![Page 49: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/49.jpg)
Cluster of PASM Logic Blocks
![Page 50: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/50.jpg)
Comparison Results
![Page 51: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/51.jpg)
Some Industry Architecture Designs
![Page 52: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/52.jpg)
Altera APEX II Logic Element
![Page 53: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/53.jpg)
Altera MAX II Logic Element
![Page 54: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/54.jpg)
LE Configuration in Arithmetic Mode
![Page 55: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/55.jpg)
LE in Random Logic Implementation
![Page 56: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/56.jpg)
Altera Stratix Logic Element
![Page 57: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/57.jpg)
Altera Stratix II Architecture
![Page 58: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/58.jpg)
Stratix II Adaptive Logic Module
![Page 59: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/59.jpg)
Stratix II ALM in Arithmetic Mode
![Page 60: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/60.jpg)
Various Configurations in an ALM of Stratix II
![Page 61: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/61.jpg)
Multiplier Resources in Stratix II
![Page 62: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/62.jpg)
Structure of a DSP Block in Stratix II
![Page 63: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/63.jpg)
XILINX Virtex II Pro Architecture
![Page 64: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/64.jpg)
Basic Logic Element of Virtex II Pro
![Page 65: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/65.jpg)
Dedicated Multipliers in Virtex II Pro
![Page 66: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/66.jpg)
Processor-Programmable Logic Coupled Architecture
![Page 67: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/67.jpg)
PiCoGA Architecture Coupled with a VLIW processor
![Page 68: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/68.jpg)
PiCoGA Logic Block
![Page 69: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/69.jpg)
Conclusions
- Traditional general-purpose FPGAs are inefficient for datapath mapping
- Logic blocks with DSP-specific enhancements seem a promising solution
- Coarse-grained logic can achieve better application mapping for datapaths but sacrifices flexibility
- Dedicated blocks (multipliers) increase performance but also increase cost significantly
![Page 70: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/70.jpg)
Conclusions
- PDSPs with embedded FPGAs can achieve a good balance between performance and power consumption
- So… which approach is the best? No single best exists
![Page 71: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/71.jpg)
Suitability of Approaches
Highly computationally intensive applications with large amounts of parallelism can use platform FPGAs where often large resources are required and power consumption is not an issue.
Here cost/function will be lowest
![Page 72: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/72.jpg)
Suitability of Approaches
Field Programmable Logic based coprocessors can benefit from coarse grained blocks where most control functions are implemented by the PDSP itself
![Page 73: A Survey of Logic Block Architectures](https://reader035.vdocument.in/reader035/viewer/2022062323/568159f8550346895dc7436d/html5/thumbnails/73.jpg)
Suitability of Approaches
Higher flexibility and lower cost can be achieved with logic blocks that have DSP-specific enhancements but retain the flexibility to implement control logic efficiently.