Design Methodologies for Dynamic Reconfigurable
Multi-FPGA Systems
BY
Alessandro Panella
alessandro.panella@dresd.org
Thesis defense – May 5, 2008
THESIS COMMITTEE:
John Lillis, Marco D. Santambrogio, Ajay Kshemkalyani
2
About this thesis (1/2)
PROBLEM STATEMENT: extend the range of application of dynamic reconfigurability techniques from the single-FPGA case to multi-FPGA systems
NOVELTY: a methodology for the design of multi-FPGA systems
– Dynamic reconfigurability: seen as a solution for implementing applications that require more area than is available; only used “when needed”
– Regularity-driven partitioning for run-time reuse
About this thesis (2/2)
Major contribution:
– Development of a multi-FPGA system design flow that exploits dynamic reconfigurability for block reuse.
Useful contributions:
– Creation of an intermediate representation for structural and hierarchical circuits.
– Creation of a framework for the extraction of the design from VHDL.
– Design and implementation of static global layout algorithms.
– Exploitation of hierarchy information for regular-pattern extraction.
3
4
Outline
Context definition
– FPGA
– Multi-FPGA Systems (MFS)
– Dynamic reconfigurability
Related work
– MFS design flows
– Dynamically reconfigurable MFSs
Proposed methodology
– Design extraction
– Global layout
– Reuse and dynamic reconfigurability
Experimental results
Conclusion and future work
5
Field Programmable Gate Array
Re-programmable semi-custom hardware
– Low Non-Recurring Engineering (NRE) costs
– Good performance
– High flexibility
Composed of Configurable Logic Blocks (CLBs)
– Xilinx Virtex CLB: 2 slices, each containing two 4-input Look-Up Tables (LUTs)
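A 4-input LUT is, in effect, a 16-entry truth table whose contents are configuration bits. This is why the same CLB can implement any 4-input Boolean function. The Python sketch below illustrates the idea. The bit ordering (input `a` as the least significant index bit) and the helper names `lut4` and `truth_table` are assumptions made for illustration; they do not reflect the Xilinx configuration format.

```python
def lut4(init: int, a: int, b: int, c: int, d: int) -> int:
    """Evaluate a 4-input LUT whose 16 configuration bits are packed in `init`.

    Assumed convention: the inputs form a 4-bit index with `a` as the least
    significant bit, and the output is the bit of `init` at that index.
    """
    if not 0 <= init < (1 << 16):
        raise ValueError("a 4-input LUT holds exactly 16 configuration bits")
    index = a | (b << 1) | (c << 2) | (d << 3)
    return (init >> index) & 1


def truth_table(fn) -> int:
    """Build the 16-bit configuration that makes the LUT compute fn(a, b, c, d)."""
    init = 0
    for index in range(16):
        bits = [(index >> k) & 1 for k in range(4)]  # a, b, c, d
        if fn(*bits):
            init |= 1 << index
    return init
```

Reprogramming the LUT means loading a different `init` value. The evaluation logic itself never changes.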
6
Multi-FPGA Systems (MFS)
Ensembles of multiple FPGAs (from 2 to thousands)
Motivations:
– Massively parallel computing
– Need to implement large applications
– General trend in VLSI towards multi-core computers
Applications:
– Supercomputing
– Logic emulation
– Neural networks, …
Terminology:
– Architecture: physical cluster of FPGAs
– Application: programmed functionality
– System: architecture + application
7
MFS topologies (1/2)
Connections:
– Hardwired vs. programmable
– Dedicated vs. shared (bus, point-to-point)
Complete graph (clique)
– PRO: direct connection between any two chips
– CON: planarity; pin requirements
Mesh: 4(8)-neighbor pattern
– PRO: expandability
– CON: no fixed-length path; communication logic in intermediate chips
8
MFS topologies (2/2)
Crossbar: logic-bearing chips and routing chips
– Total (one routing chip)
– Partial (several routing chips)
– PRO: equal communication delays
– CON: low scalability
Hybrid: combines the benefits of the two approaches
– Example: Hybrid Complete-Graph Partial-Crossbar (HCGP) (from Khalid, M.: Routing Architecture and Layout Synthesis for Multi-FPGA Systems, Ph.D. Thesis, University of Toronto, 1999)
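Later placement cost functions depend on the distance between FPGAs, and the topologies above differ precisely in how that distance behaves. Below is a minimal Python sketch of the pairwise hop-count matrices for the three basic topologies. The function names are hypothetical. Modelling every crossbar path as two hops (logic chip to routing chip to logic chip) is also an assumption.

```python
from typing import List

Matrix = List[List[int]]


def clique_distance(n: int) -> Matrix:
    """Complete graph: every pair of distinct FPGAs is directly connected (1 hop)."""
    return [[0 if i == j else 1 for j in range(n)] for i in range(n)]


def mesh_distance(rows: int, cols: int, diagonal: bool = False) -> Matrix:
    """Mesh: FPGA (r, c) has index r * cols + c.

    With the 4-neighbor pattern the hop count is the Manhattan distance;
    with the 8-neighbor pattern (diagonal=True) it is the Chebyshev distance.
    """
    n = rows * cols
    dist = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dr = abs(i // cols - j // cols)
            dc = abs(i % cols - j % cols)
            dist[i][j] = max(dr, dc) if diagonal else dr + dc
    return dist


def crossbar_distance(n: int) -> Matrix:
    """Crossbar (assumed model): every path crosses one routing chip,
    so all pairs of distinct logic chips are equally distant (2 hops)."""
    return [[0 if i == j else 2 for j in range(n)] for i in range(n)]
```

The mesh matrix is the only one whose entries grow with chip separation. That growth is the "no fixed-length path" drawback listed above.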
9
Reconfigurability
Reconfiguration: altering the location or functionality of a system element (G. Estrin, 1960)
– FPGAs: a suitable physical substrate
Partial vs. total
(Partial) dynamic vs. static:
– Only some parts of the system take part in each reconfiguration
– The execution of the system does not cease
Motivations and applications:
– Provide a larger virtual area
– React to sudden and frequent changes in application needs
– Fault tolerance
10
Dynamically Reconfigurable MFSs
Rationale: expand the capabilities of static MFSs
– Going beyond MFS physical limitations
– Providing a high level of flexibility
• E.g., in logic emulation: dynamic fault fixing
Partial vs. total reconfiguration in MFS
Two main scenarios (not mutually exclusive):
– Reconfiguration of logic chips
– Reconfiguration of routing chips
The interconnections are dynamically mutable
Components can be reused
11
Design hierarchy
Application composed of:
– Blocks
• Can have sub-blocks
– Nets
• Block-to-block
• Block-to-interface
Advantages:
– Handles the complexity of the design
– Reuse of modules
• IP-core libraries
[Figure: example hierarchy showing a block-to-block net and a block-to-interface net]
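The hierarchy described above can be captured with a small tree structure. The following Python sketch is a hypothetical illustration; the thesis's own representation is a C++ structure. `Block`, `leaves` and `classify_nets` are invented names. Marking a block's interface ports with an `IF:` prefix is also an assumed convention.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Net = Tuple[str, str]  # (endpoint, endpoint)


@dataclass
class Block:
    name: str
    children: List["Block"] = field(default_factory=list)  # sub-blocks
    # Nets inside this block. An endpoint is a child block name, or
    # "IF:<port>" for this block's own interface (assumed notation).
    nets: List[Net] = field(default_factory=list)


def leaves(block: Block, prefix: str = "") -> List[str]:
    """Flattened view: hierarchical paths of all leaf blocks."""
    path = f"{prefix}/{block.name}" if prefix else block.name
    if not block.children:
        return [path]
    out: List[str] = []
    for child in block.children:
        out.extend(leaves(child, path))
    return out


def classify_nets(block: Block) -> Tuple[List[Net], List[Net]]:
    """Split one block's nets into block-to-block and block-to-interface nets."""
    def touches_interface(net: Net) -> bool:
        return any(end.startswith("IF:") for end in net)

    b2b = [n for n in block.nets if not touches_interface(n)]
    b2i = [n for n in block.nets if touches_interface(n)]
    return b2b, b2i
```

Keeping sub-blocks nested, rather than flattening them immediately, is what later lets identical sub-trees be recognized and reused.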
12
What’s next
Context definition
– FPGA
– Multi-FPGA Systems (MFS)
– Dynamic reconfigurability
Related work
– MFS design flows
– Dynamically reconfigurable MFSs
Proposed methodology
– Design extraction
– Global layout
– Reuse and dynamic reconfigurability
Experimental results
Conclusion and future work
13
Related work - MFS design flow
All MFS design flows have a similar structure
– Different algorithms are used in each phase
Examples: Hauck (a) and Khalid (b)
Global layout tasks: partitioning, placement and routing
a) Hauck, S.: Multi-FPGA Systems, Ph.D. Thesis, University of Washington, 1995
b) Khalid, M.: Routing Architecture and Layout Synthesis for Multi-FPGA Systems, Ph.D. Thesis, University of Toronto, 1999
14
Complete MFS design flows (a)
Integrated solution to partitioning, placement and routing
– Recursive bi-partitioning
• Multilevel approach: clustering and refinement phases
– Partition orderings for placement
• Identify the bottlenecks in the architecture
• Assign the two initial partitions to the least connected parts of the architecture, and so on recursively
– The connections are routed as the bisections are computed
PROS: the architecture is taken into account
CONS: no flexibility in routing once partitioning and placement are fixed
15
Complete MFS design flows (b)
Partitioning: recursive bisection using the Fiduccia-Mattheyses heuristic
Placement: dependent on the topology
– Mesh: force-directed
– Crossbar: trivial task, since all FPGAs are at the same distance
Routing: two approaches
– General (obtain a graph from the architecture)
– Specific (tailored to the particular MFS topology)
PROS: uses existing, effective and robust algorithms
CONS: emphasis on routing and topology evaluation
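To illustrate the partitioning step of flow (b), here is a simplified, hypothetical Python sketch of one Fiduccia-Mattheyses refinement pass on a bisection. Every node is moved exactly once, each time choosing the legal move with the highest gain, and the pass then keeps the best prefix of the move sequence. Unlike the real heuristic, gains are recomputed naively instead of being kept in bucket lists. The balance rule (`max_imbalance` on total area) is also an assumption. Recursive bisection would apply such passes to each half in turn.

```python
from typing import List, Optional, Tuple


def cut_size(side: List[int], nets: List[List[int]]) -> int:
    """Number of nets with pins on both sides of the bisection (side[i] is 0 or 1)."""
    return sum(1 for net in nets if len({side[v] for v in net}) == 2)


def fm_pass(side: List[int], nets: List[List[int]], area: List[int],
            max_imbalance: int) -> Tuple[List[int], int]:
    """One simplified Fiduccia-Mattheyses pass; returns (new sides, cut size)."""
    side = list(side)
    n = len(side)
    total = sum(area)
    locked = [False] * n

    def balanced(s: List[int]) -> bool:
        left = sum(a for a, x in zip(area, s) if x == 0)
        return abs(2 * left - total) <= max_imbalance  # bound on |left - right|

    current = best_cut = cut_size(side, nets)
    moves: List[int] = []
    best_prefix = 0
    for _ in range(n):
        best_gain: Optional[int] = None
        best_node: Optional[int] = None
        for v in range(n):
            if locked[v]:
                continue
            side[v] ^= 1  # tentatively move v to the other side
            if balanced(side):
                gain = current - cut_size(side, nets)
                if best_gain is None or gain > best_gain:
                    best_gain, best_node = gain, v
            side[v] ^= 1  # undo the tentative move
        if best_node is None:
            break  # no legal move left
        side[best_node] ^= 1  # commit the move, even if its gain is negative
        locked[best_node] = True
        current -= best_gain
        moves.append(best_node)
        if current < best_cut:
            best_cut, best_prefix = current, len(moves)
    for v in moves[best_prefix:]:  # roll back everything after the best prefix
        side[v] ^= 1
    return side, best_cut
```

Accepting negative-gain moves and then rolling back to the best prefix is what lets FM escape some local minima that a purely greedy swap loop would get stuck in.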
16
Partial MFS design flows
Address only some phases of the design
– Usually partitioning and placement
Iterative approaches
– Genetic algorithm [Hidalgo et al., DSD ’02]
– Simulated annealing [Roy et al., ICCAD ’93; Vicente et al., FPL ’99]
Hierarchical approaches
– Exploit the design hierarchy in partitioning
– Behrens et al., ICCAD ’96
• Hierarchy exploration heuristic
– Fang et al., TODAES ’00
• Hierarchy extraction from Verilog specification
• Set-covering procedure
17
Dynamically Reconfigurable MFS
Extraction of a directed task graph from VHDL
Task graph divided into time segments
– Using a non-linear programming model
Each segment is spatially partitioned
[Ouaiss et al., An Integrated Partitioning and Synthesis System for Dynamically Reconfigurable Multi-FPGA Architectures, 1998]
Dynamic?
18
What’s next
Context definition
– FPGA
– Multi-FPGA Systems (MFS)
– Dynamic reconfigurability
Related work
– MFS design flows
– Dynamically reconfigurable MFSs
Proposed methodology
– Design extraction
– Global layout
– Reuse and dynamic reconfigurability
Experimental results
Conclusion and future work
19
Proposed methodology
Multi-FPGA design flow with three main phases:
1. Design extraction
2. Static global physical layout
• Partitioning
• Placement
• Routing
3. Reuse through dynamic reconfigurability
– Reuse introduces extra delays (reconfiguration times, sequential execution, …)
– It is only adopted when needed
– In such a case, the introduced delay has to be minimized
20
Design Extraction
Input: VHDL description
Output: intermediate representation
– Ad hoc data structure
Two sub-phases:
– VHDL preprocessing
– VHDL structural parsing
21
Intermediate representation
C++ data structure
Contains both structural and hierarchical information
Graphs implemented using the Boost Graph Library
A container class provides an API
22
VHDL Parsing
VHDL preprocessing: obtain a purely structural VHDL description
– Features of each component are retrieved using vendor synthesis tools (e.g., Xilinx XST, Synplify PRO)
Create the intermediate representation from the purely structural VHDL description
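The structural-parsing step essentially turns component instantiations into instances and nets. The Python sketch below only illustrates that idea. It handles a simplified subset of VHDL: named port association, plain signal names, and no generic maps or slices. The regular expression and the function names `extract_instances` and `build_nets` are assumptions. This is not the thesis's parser, which builds a C++ representation.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Simplified, hypothetical pattern for:
#   label : [entity lib.]component port map (formal => actual, ...);
INSTANCE_RE = re.compile(
    r"(\w+)\s*:\s*(?:entity\s+\w+\.)?(\w+)\s+port\s+map\s*\(([^()]*)\)\s*;",
    re.IGNORECASE,
)

Instance = Tuple[str, str, Dict[str, str]]


def extract_instances(vhdl: str) -> List[Instance]:
    """Return (label, component, {formal: actual}) for each instantiation."""
    instances = []
    for label, component, assoc in INSTANCE_RE.findall(vhdl):
        ports = {}
        for pair in assoc.split(","):
            formal, actual = (s.strip() for s in pair.split("=>"))
            ports[formal] = actual
        instances.append((label, component, ports))
    return instances


def build_nets(instances: List[Instance]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (instance, port) pins by the signal they are attached to."""
    nets: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for label, _component, ports in instances:
        for formal, actual in ports.items():
            nets[actual].append((label, formal))
    return dict(nets)
```

A signal shared by several instances becomes a block-to-block net. A signal that is also a port of the enclosing entity would become a block-to-interface net.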
23
Example
[Figures: hierarchy and flattened view of the DES encryption core (part of the 3DES core circuit)]
24
Static Global Layout
This phase addresses partitioning and placement
Two implemented approaches:
– Integrated P&P
– Sequential P&P
25
Integrated P&P
Simulated annealing algorithm
– Iterative randomized approach
• Suitable for coping with high-dimensionality problems
• Partitioning + placement is such a problem
– Aim: minimize a cost function f
– The algorithm starts with a “high” temperature T
– At each iteration:
• M random moves are performed
• Each move is accepted according to the Metropolis criterion: always if the cost decreases or remains equal, and with probability $e^{-\Delta c/T}$ if the cost increases by $\Delta c$
• T is decreased by a cooling factor α
– Stop after S consecutive non-accepted moves
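The annealing loop described above can be sketched generically in Python. This is an illustration under stated assumptions, not the thesis implementation. The default values for T, α, M and S are arbitrary. The caller supplies the state and the move function. The function also remembers the best solution seen, which the slide does not specify.

```python
import math
import random
from typing import Callable, Tuple, TypeVar

State = TypeVar("State")


def metropolis_accept(delta: float, temperature: float, rng: random.Random) -> bool:
    """Metropolis criterion: always accept non-worsening moves,
    accept worsening ones with probability exp(-delta / T)."""
    if delta <= 0:
        return True
    if temperature <= 0:
        return False
    return rng.random() < math.exp(-delta / temperature)


def anneal(initial: State,
           cost: Callable[[State], float],
           random_move: Callable[[State, random.Random], State],
           t0: float = 10.0,          # initial "high" temperature T (assumed value)
           alpha: float = 0.9,        # cooling factor (assumed value)
           moves_per_step: int = 50,  # M (assumed value)
           max_rejections: int = 200, # S (assumed value)
           seed: int = 0) -> Tuple[State, float]:
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    temperature = t0
    rejected_in_a_row = 0
    while rejected_in_a_row < max_rejections:
        for _ in range(moves_per_step):
            candidate = random_move(current, rng)
            candidate_cost = cost(candidate)
            if metropolis_accept(candidate_cost - current_cost, temperature, rng):
                current, current_cost = candidate, candidate_cost
                rejected_in_a_row = 0
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
            else:
                rejected_in_a_row += 1
                if rejected_in_a_row >= max_rejections:
                    break  # S consecutive non-accepted moves: stop
        temperature *= alpha  # geometric cooling
    return best, best_cost
```

For example, `anneal(20, lambda x: (x - 3) ** 2, lambda x, rng: x + rng.choice((-1, 1)))` walks an integer down to the minimum at 3.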
26
Annealing implementation
Solution: array [c_i], where node i is placed in FPGA c_i
Cost: Weighted Estimated Wire Length (WEWL)
Random move: single-node move or swap, with equal probability
Constraints:
– Area constraint
– I/O pin constraint
– Both handled with penalties
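A minimal sketch of the cost and move generation follows. The slides do not give the WEWL formula, so the cost below assumes it is the sum, over two-pin connections, of connection weight times FPGA distance. The penalty weights and the pin model (each cut connection uses `weight` pins on both FPGAs) are likewise assumptions. `wewl_cost` and `random_move` are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple

Edge = Tuple[int, int, int]  # (node u, node v, weight)


def wewl_cost(assign: Sequence[int], edges: List[Edge], dist: List[List[int]],
              area: Sequence[int], capacity: Sequence[int], pin_limit: int,
              area_penalty: float = 100.0, pin_penalty: float = 100.0) -> float:
    """Assumed WEWL-style cost of placing node i on FPGA assign[i], plus penalties."""
    wire = sum(w * dist[assign[u]][assign[v]] for u, v, w in edges)

    n_fpgas = len(capacity)
    used_area = [0] * n_fpgas
    for node, fpga in enumerate(assign):
        used_area[fpga] += area[node]

    # Assumption: a cut connection consumes `w` I/O pins on both FPGAs.
    used_pins = [0] * n_fpgas
    for u, v, w in edges:
        if assign[u] != assign[v]:
            used_pins[assign[u]] += w
            used_pins[assign[v]] += w

    over_area = sum(max(0, used_area[f] - capacity[f]) for f in range(n_fpgas))
    over_pins = sum(max(0, used_pins[f] - pin_limit) for f in range(n_fpgas))
    return wire + area_penalty * over_area + pin_penalty * over_pins


def random_move(assign: Sequence[int], n_fpgas: int, rng: random.Random) -> List[int]:
    """Single-node move or swap of two nodes, each with probability 1/2."""
    new = list(assign)
    if rng.random() < 0.5:
        i = rng.randrange(len(new))
        new[i] = rng.randrange(n_fpgas)
    else:
        i, j = rng.sample(range(len(new)), 2)
        new[i], new[j] = new[j], new[i]
    return new
```

Treating constraint violations as penalties keeps infeasible intermediate solutions reachable. This lets the annealer cross them on the way to feasible, lower-cost placements.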
27
Sequential P&P
Partitioning: bottom-up clustering
1-to-1 placement: annealing
– Simplified version of the integrated P&P algorithm
Clustering:
– Initialization: each node is considered a cluster
– At each iteration:
• Choose two nodes on the basis of a metric
• Collapse them
– Stop when:
• Only one cluster is left, or
• No clusters can be formed due to the area constraint or the I/O pin constraint
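The clustering loop can be sketched as greedy agglomeration under both constraints. In this hypothetical Python version, the closeness metric is simply the total connection weight between two clusters. It stands in for the Connection metric, whose exact formula is not reproduced here. The recorded list of merges plays the role of a dendrogram.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Cluster = FrozenSet[int]


def bottom_up_clustering(area: List[int], edges: Dict[Tuple[int, int], int],
                         max_area: int, max_pins: int
                         ) -> Tuple[List[Cluster], List[Tuple[Cluster, Cluster]]]:
    """Greedy agglomerative clustering (hypothetical sketch).

    edges maps (u, v) to the number of nets between nodes u and v.
    Returns the final clusters and the sequence of merges performed.
    """
    clusters: List[Cluster] = [frozenset([i]) for i in range(len(area))]
    merges: List[Tuple[Cluster, Cluster]] = []

    def weight_between(a: Cluster, b: Cluster) -> int:
        return sum(w for (u, v), w in edges.items()
                   if (u in a and v in b) or (u in b and v in a))

    def external_pins(c: Cluster) -> int:
        return sum(w for (u, v), w in edges.items() if (u in c) != (v in c))

    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            merged = a | b
            if sum(area[i] for i in merged) > max_area:
                continue  # area constraint
            if external_pins(merged) > max_pins:
                continue  # I/O pin constraint
            score = weight_between(a, b)  # assumed closeness metric
            if best is None or score > best[0]:
                best = (score, a, b)
        if best is None:
            break  # no cluster can be formed under the constraints
        _, a, b = best
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((a, b))
    return clusters, merges
```

Each merge is chosen on local information only. This is the greediness that the future-work slide proposes to go beyond.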
28
Clustering metrics
1. Connection:
2. Communication ratio:
– Internal communication
– External communication
3. Communication density:
29
Block reuse
Problem: the application does not fit onto the architecture
– Reuse similar parts of the circuit in order to save space
Def.: dynamically-interconnected structure
Architectural scenarios:
– Bus
– Crossbar
30
Isomorphic clusters
Which parts of the structure should be considered for reuse?
Def.: isomorphic clusters
– Substructures that contain the same blocks with the same connections
– [Figure: example of isomorphic clusters]
Two subproblems:
– Finding isomorphic clusters
– Selecting the ones to reuse (and how many times)
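The definition of isomorphic clusters can be made concrete with a brute-force check. Two clusters are isomorphic if some one-to-one mapping of their blocks preserves both block types and connections. The Python helper below is a hypothetical illustration only. It tries every permutation, so it is practical only for very small clusters. Treating connections as directed pairs is an assumption.

```python
from itertools import permutations
from typing import List, Set, Tuple


def are_isomorphic(types_a: List[str], edges_a: Set[Tuple[int, int]],
                   types_b: List[str], edges_b: Set[Tuple[int, int]]) -> bool:
    """True if some bijection of blocks preserves block types and connections.

    types_x[i] is the component type of block i; edges_x holds directed
    connections (i, j) between blocks of the same cluster.
    """
    if sorted(types_a) != sorted(types_b) or len(edges_a) != len(edges_b):
        return False
    n = len(types_a)
    for perm in permutations(range(n)):
        # perm maps block i of cluster A onto block perm[i] of cluster B
        if any(types_a[i] != types_b[perm[i]] for i in range(n)):
            continue
        if {(perm[i], perm[j]) for i, j in edges_a} == set(edges_b):
            return True
    return False
```

The regularity-driven approach on the following slides avoids this exponential search by carrying type information through the hierarchy.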
31
Isomorphic clusters extraction (1/2)
Regularity-driven clustering
Def.: the type of a node is the component of which the node is an instance
If two nodes selected for collapsing have the same parent:
– Look for nodes in the hierarchy with the same type as the parent
– Execute the same collapsing operation on them
– Assign the same type to the newly created clusters
Clustering itself benefits from this enhancement:
– Problem of standard clustering: lack of a global metric
– Regularity provides global information
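The regularity-driven step can be sketched as follows. When two children of a node are collapsed, the same collapse is replayed inside every node of the same type, and all resulting clusters share one new type. In this hypothetical Python sketch, children are addressed by their local instance label inside the parent. That labelling is an assumption about how corresponding nodes are matched. The naming scheme for cluster types is invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    type: str  # the component this node is an instance of
    children: Dict[str, "Node"] = field(default_factory=dict)  # local label -> child


def collapse(parent: Node, a: str, b: str, cluster_type: str) -> None:
    """Replace children `a` and `b` of `parent` with a single cluster node."""
    x = parent.children.pop(a)
    y = parent.children.pop(b)
    label = f"{a}+{b}"
    parent.children[label] = Node(f"{parent.name}.{label}", cluster_type, {a: x, b: y})


def regularity_driven_collapse(all_nodes: List[Node], parent: Node, a: str, b: str) -> str:
    """Collapse children a and b of `parent`, replay the same collapse in every
    node of the same type, and give all new clusters one shared type."""
    cluster_type = f"{parent.type}[{a}+{b}]"
    for node in all_nodes:
        if node.type == parent.type and a in node.children and b in node.children:
            collapse(node, a, b, cluster_type)
    return cluster_type
```

Because the clusters created this way share a type by construction, they are isomorphic candidates for reuse without any graph-isomorphism test.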
32
Isomorphic clusters extraction (2/2)
The key feature is the assignment of a “type” to clusters
Example:
33
Block reuse choices
Choose which blocks to reuse
Difficulty: high complexity due to hierarchical clusters
– Some clusters contain others
Solution:
– ILP model, fast even for a high number of nodes
– Run the ILP model on each “cut” of the dendrogram
– Each cut is a flattened structural view of the application
34
ILP model for block reuse
x_i: number of times cluster type t_i is reused (= number of needed reconfigurations)
35
What’s next
Context definition
– FPGA
– Multi-FPGA Systems (MFS)
– Dynamic reconfigurability
Related work
– MFS design flows
– Dynamically reconfigurable MFSs
Proposed methodology
– Design extraction
– Global layout
– Reuse and dynamic reconfigurability
Experimental results
Conclusion and future work
Experiments
Test circuit description (slide 37)
Integrated vs. sequential partitioning & placement
– Methodologically, both approaches are valid
– They are compared numerically
• Partitioning evaluation (slide 38)
• Placement evaluation (slide 39)
Sequential P&P vs. Metis (slide 40)
– Provides a comparison with an external approach
Block reuse evaluation (slide 41)
– Execution time
– Example of application
36
37
Results: test circuits
Triple-DES encryption+decryption core (3DES)
Finite Impulse Response filter (FIR)
Noekeon cipher (NOEK)
Composed module FIR+3DES
Integrated vs. Sequential P&P (1/2)
Partitioning evaluation
38
NOTE: by setting the distance between any two FPGAs equal to 1, the integrated annealing approach actually becomes a pure partitioning algorithm.
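Suppose, as assumed here, that the WEWL of a solution sums each connection's weight $w_e$ times the distance $d$ between the FPGAs $c_u$ and $c_v$ that host its endpoints; the slides do not spell out the formula. The note then follows in one step: with unit distances, the cost is exactly the weighted cut size, which is a pure partitioning objective.

$$
\mathrm{WEWL}(c) = \sum_{e=(u,v)} w_e \, d(c_u, c_v), \qquad
d(a,b) = \begin{cases} 0 & a = b \\ 1 & a \neq b \end{cases}
\;\;\Longrightarrow\;\;
\mathrm{WEWL}(c) = \sum_{e=(u,v):\; c_u \neq c_v} w_e
$$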
Integrated vs. Sequential P&P (2/2)
Placement evaluation (on mesh architectures)
– Integrated P&P
– Sequential P&P
39
Clustering vs. Metis
40
41
Results: ILP model solving
Timing results
ILP result, example:
• 3DES-FIR circuit
• Connection metric
• 4 FPGAs of 600 slices are needed, but only 3 are available
• Reuse is adopted
• Dendrogram cuts 2-7 provide the lowest estimated reconfiguration time
42
What’s next
Context definition
– FPGA
– Multi-FPGA Systems (MFS)
– Dynamic reconfigurability
Related work
– MFS design flows
– Dynamically reconfigurable MFSs
Proposed methodology
– Design extraction
– Global layout
– Reuse and dynamic reconfigurability
Experimental results
Conclusion and future work
43
Conclusion: contributions
Major contribution:
– Development of a multi-FPGA system design flow that exploits dynamic reconfigurability for block reuse while minimizing the estimated execution time.
Useful contributions:
– Creation of an intermediate representation for structural and hierarchical circuits.
– Creation of a framework for the extraction of the design from VHDL.
– Design and implementation of static global layout algorithms.
– Exploitation of hierarchy information for regular-pattern extraction.
The proposed approaches have been validated through experimental evaluation.
44
Conclusion: future work
Improvements:
– Go beyond the inherent greediness of clustering
– More powerful closeness metrics
– A more accurate time-estimation function for block reuse
Additions:
– Development of a robust and effective routing algorithm for both static and dynamic implementations
– Partitioning and placement for dynamically-interconnected structures
– Binding and scheduling of application blocks on the instantiated clusters
45
The end.
Questions?
46
That’s all folks!
Thank you.
How ’bout a funny joke?