ICCAD’01: November, 2001
Instruction Generation for Hybrid Reconfigurable Systems
Ryan Kastner, Seda Ogrenci-Memik,
Elaheh Bozorgzadeh and Majid Sarrafzadeh
{kastner,seda,elib,majid}@cs.ucla.edu
Embedded and Reconfigurable Systems Group
Computer Science Department
UCLA
Los Angeles, CA 90095

Outline
Introduction
  Programmability
  Hybrid Reconfigurable Systems
  Strategically Programmable System
Instruction Generation
  Uses in Hybrid Reconfigurable Systems
  Relation to Template Generation and Matching
Algorithm for Template Generation and Matching
Experiments
Conclusion

Programmability
Future systems need programmability at multiple levels of the computational hierarchy
Computational hierarchy:
Level                      Gate level                        Micro-architecture level     Architecture level
Programmability            Bit                               Byte                         Instruction (8-128 bits)
Basic unit of computation  Boolean operation (and, or, xor)  Arithmetic operation         Functional operation
Communication              Direct wire connections           Bundles of wires, registers  Bus, memory
[Figure: block diagrams of the three levels, with blocks labeled ADD, MUL, Register, Control, FU, Memory and Register Bank]
Hybrid Reconfigurable Systems have programmability at one or more levels

Tradeoffs
Level                        Architecture level                   Micro-architecture level          Gate level
Types of programmable units  Custom instructions, register banks  Datapath unit, control unit, RAM  CLBs, LUTs
Example platform             Tensilica, Improv                    Chameleon Systems                 Xilinx, Altera
[Figure: the three levels compared along two axes, flexibility and configuration time, with configuration time ranging from hundreds of cycles to thousands of cycles]
Hybrid Reconfigurable Systems should find a happy medium

SPS - Strategically Programmable System
Embed (hard or soft) computational units, called Versatile Programmable Blocks (VPBs), into an FPGA-like fabric
Combine programmable units from the gate, micro-architecture and architecture levels
Balance flexibility and configuration time
[Figure: fabric containing VPB and Memory blocks]
Need an automated method of determining the functionality of VPBs

Overview of SPS
Set of applications specified in high-level code (C/C++, Fortran, MOC)
  Compile to low-level specification
  Determine VPB functionality
[Figure: SPS flow with blocks labeled SPS Compiler, SPS Architecture Generation, SPS Architecture, Routing Arch., VPB Synthesis and SPS Module Placement]

VPB Instruction Generation
Given a set of applications, what computation should be implemented on VPBs?
Want complex, commonly occurring computation patterns
Look for computational patterns at the instruction level
  Basic operation is add, multiply, shift, etc.
[Figure: a set of applications mapped onto a fabric of VPB and RAM blocks]

Problem Definition
Determining VPB functionality requires regularity extraction
Regularity extraction: find common sub-structures (templates) in one or a collection of graphs
Each application can be specified by a collection of graphs (CDFGs)
Templates are implemented as VPBs
Two related sub-problems:
  Template Matching
  Template Generation

Template Matching - Formal Definition
Problem 1: Given a directed, labeled graph G(N, A) and a library of templates, each of which is a directed labeled graph Ti(V, E), find every subgraph of G that is isomorphic to any Ti.
[Figure: a directed labeled graph G whose nodes are operations (+, *, &, ||, %), next to a library of templates T1 to T6]
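Problem 1 can be illustrated with a brute-force matcher. The following is only a minimal sketch for very small templates, not the method used in the talk: the function name `find_matches`, the dictionary-based graph encoding and the example graph are all assumptions made for illustration. It tries every assignment of template nodes to distinct graph nodes and keeps those that preserve node labels and every template edge.

```python
from itertools import permutations

def find_matches(g_labels, g_edges, t_labels, t_edges):
    """Return every mapping template-node -> graph-node that preserves
    node labels and all template edges (exhaustive search, small inputs only)."""
    g_edge_set = set(g_edges)
    t_nodes = list(t_labels)
    matches = []
    for cand in permutations(g_labels, len(t_nodes)):
        mapping = dict(zip(t_nodes, cand))
        # Labels must agree node by node.
        if any(g_labels[mapping[v]] != t_labels[v] for v in t_nodes):
            continue
        # Every directed template edge must exist in G.
        if all((mapping[u], mapping[v]) in g_edge_set for u, v in t_edges):
            matches.append(mapping)
    return matches

# Hypothetical graph G: two multiplies feeding an add, which feeds another add.
g_labels = {0: "*", 1: "+", 2: "*", 3: "+"}
g_edges = [(0, 1), (2, 1), (1, 3)]

# Hypothetical template: a multiply followed by an add (MUL-ADD).
t_labels = {"a": "*", "b": "+"}
t_edges = [("a", "b")]

matches = find_matches(g_labels, g_edges, t_labels, t_edges)
```

The search is exponential in the template size, so it is only meant to make the definition concrete. A template with internal symmetry would be reported once per symmetric mapping.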

Template Matching - Formal Definition
Problem 2: Given an unlimited supply of each template in the set {T1, …, Tk} and an overlapping set of subgraphs of the given graph G(N, A) that are isomorphic to some member of this set, minimize k as well as xi, where xi is the number of templates of type Ti used, such that the number of nodes left uncovered is the minimum.
[Figure: example graph G with operation nodes (+, *, &, %, ||) to be covered by templates]
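One simple way to approach the covering in Problem 2 is a greedy selection of non-overlapping matches. This is a heuristic sketch under stated assumptions, not the algorithm from the talk: `greedy_cover`, the "largest match first" rule and the example data are hypothetical, and a greedy choice does not guarantee the minimum number of uncovered nodes.

```python
from collections import Counter

def greedy_cover(nodes, matches):
    """matches: list of (template_name, set_of_graph_nodes).
    Picks non-overlapping matches, trying larger matches first."""
    covered = set()
    chosen = []
    for name, m_nodes in sorted(matches, key=lambda m: -len(m[1])):
        if covered.isdisjoint(m_nodes):       # overlapping matches are skipped
            chosen.append((name, m_nodes))
            covered |= m_nodes
    usage = Counter(name for name, _ in chosen)   # x_i for each template type
    uncovered = set(nodes) - covered
    return chosen, usage, uncovered

# Hypothetical overlapping matches on a five-node graph.
nodes = {0, 1, 2, 3, 4}
matches = [
    ("T1", {0, 1}),
    ("T1", {2, 1}),
    ("T2", {1, 3, 4}),
    ("T1", {2, 0}),
]
chosen, usage, uncovered = greedy_cover(nodes, matches)
```

Here `usage` plays the role of the xi counts and `len(usage)` the role of k, so the same return values can be used to compare different selection rules.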

Template Generation
Templates may not always be given as input
An automatic regularity extraction algorithm must develop its own templates
Generate a set of templates such that:
  Number of templates is minimized
  Covering of the graph is maximized

Related Work
Useful in a wide variety of CAD applications
  Data path regularity [Chowdhary98], [Callahan99]
  Scheduling [Ly95]
  System partitioning [Rao93]
  Low power design [Mehra96]
  Soft macros - CPR [Cadambi99] for PipeRench architecture

An Algorithm for Simultaneous Template Generation and Matching
Formal Definition
  Given a labeled digraph G(V, E)
  # C is a set of edge types
  C ← ∅
  while (stop_conditions_not_met(G))
      C ← profile_graph(G)
      cluster_common_edges(G, C)
Informal Definition
  1. Find the most common edge type
  2. Contract common edges
  3. Repeat until stopping condition met
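The loop above can be sketched in Python. This is an illustrative reading of the pseudocode, not the authors' implementation: the function names follow the slide, but the graph encoding, the template naming scheme `(a->b)`, the parameters `min_count` and `max_templates`, and the greedy choice of node-disjoint edges are all assumptions. Input/output pin restrictions are ignored.

```python
from collections import Counter

def profile_graph(labels, edges):
    """Count occurrences of each edge type (source label, sink label)."""
    return Counter((labels[u], labels[v]) for u, v in edges)

def cluster_common_edges(labels, edges, edge_type):
    """Contract a node-disjoint set of edges of one type (greedy choice)."""
    used, absorbed = set(), {}
    for u, v in edges:
        if u != v and (labels[u], labels[v]) == edge_type and not {u, v} & used:
            used.update((u, v))
            absorbed[v] = u                      # v will be merged into u
    template = f"({edge_type[0]}->{edge_type[1]})"
    for v, u in absorbed.items():
        labels[u] = template                     # merged node carries the template
        del labels[v]
    new_edges = []
    for a, b in edges:
        a, b = absorbed.get(a, a), absorbed.get(b, b)
        if a != b and (a, b) not in new_edges:   # drop contracted and duplicate edges
            new_edges.append((a, b))
    return new_edges, template

def generate_templates(labels, edges, min_count=2, max_templates=10):
    labels, edges = dict(labels), list(edges)    # work on copies
    templates = []
    while edges and len(templates) < max_templates:
        edge_type, count = profile_graph(labels, edges).most_common(1)[0]
        if count < min_count:                    # no frequently occurring edge type
            break
        edges, template = cluster_common_edges(labels, edges, edge_type)
        if template not in templates:
            templates.append(template)
    return templates, labels, edges

# Hypothetical graph: two MUL->ADD pairs, one of which feeds a final ADD.
labels = {0: "*", 1: "+", 2: "*", 3: "+", 4: "+"}
edges = [(0, 1), (2, 3), (1, 4)]
templates, final_labels, final_edges = generate_templates(labels, edges)
```

In this example the `*` to `+` edge type occurs twice, so both instances are contracted into a single template, and the loop then stops because no remaining edge type occurs at least `min_count` times.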

Explanation of Algorithm
Edge contraction: merge adjacent nodes and maintain connectivity
Stopping conditions
  Reach certain number of templates
  Graph sufficiently covered
  No frequently occurring edge type
Profile edges: find most common edge types
[Figure: an example graph of + and * nodes in which the most common edge type is identified and contracted]
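Edge contraction on its own can be shown in a few lines. The sketch below assumes a graph stored as a label dictionary plus a set of directed edges; `contract_edge` and the `(a->b)` label format are hypothetical names chosen for illustration.

```python
def contract_edge(labels, edges, u, v):
    """Merge node v into node u; every edge that touched v now touches u."""
    merged_labels = {n: lab for n, lab in labels.items() if n != v}
    merged_labels[u] = f"({labels[u]}->{labels[v]})"
    merged_edges = set()
    for a, b in edges:
        a = u if a == v else a
        b = u if b == v else b
        if a != b:                     # the contracted edge itself disappears
            merged_edges.add((a, b))
    return merged_labels, merged_edges

# Hypothetical graph: nodes 1 and 3 are multiplies, nodes 2 and 4 are adds.
labels = {1: "*", 2: "+", 3: "*", 4: "+"}
edges = {(1, 2), (3, 2), (2, 4)}

new_labels, new_edges = contract_edge(labels, edges, 1, 2)
```

After contracting edge (1, 2), the former neighbours of node 2 are attached to the merged node, which is what "maintain connectivity" refers to.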

Algorithm in Action
[Figure: example graph with *, >>, %, & and + nodes, with four candidate edges labeled Edge 1 to Edge 4]
Steps shown:
  Create conflict graph over Edge 1 to Edge 4
  Determine MIS
  Contract edges 2 and 4, producing templates
  Iteration 2: contract edges, producing further templates
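The conflict-graph step can be sketched as follows, assuming that two candidate edges conflict when they share a node and so cannot both be contracted. The slide only says "MIS"; the code below uses a simple lowest-degree-first greedy heuristic, which yields a maximal independent set and is not guaranteed to be a maximum one. The candidate edges are hypothetical and are not the ones in the slide's figure.

```python
from itertools import combinations

def conflict_graph(candidates):
    """candidates: dict name -> (u, v). Two candidates conflict if they share a node."""
    conflicts = {name: set() for name in candidates}
    for a, b in combinations(candidates, 2):
        if set(candidates[a]) & set(candidates[b]):
            conflicts[a].add(b)
            conflicts[b].add(a)
    return conflicts

def greedy_independent_set(conflicts):
    """Repeatedly pick the candidate with the fewest remaining conflicts."""
    remaining = {n: set(nb) for n, nb in conflicts.items()}
    chosen = []
    while remaining:
        pick = min(remaining, key=lambda n: (len(remaining[n]), n))
        chosen.append(pick)
        removed = remaining[pick] | {pick}
        remaining = {n: nb - removed
                     for n, nb in remaining.items() if n not in removed}
    return sorted(chosen)

# Hypothetical chain of five multiply nodes a->b->c->d->e: four candidate edges.
candidates = {
    "e1": ("a", "b"),
    "e2": ("b", "c"),
    "e3": ("c", "d"),
    "e4": ("d", "e"),
}
conflicts = conflict_graph(candidates)
selected = greedy_independent_set(conflicts)
```

The selected edges are pairwise node-disjoint, so they can all be contracted in the same iteration without one contraction invalidating another.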

Algorithm Summary
Algorithm can be generalized and used in a variety of applications
Easily extended to hypergraphs
Input/output pin restrictions can easily be added
Performs template generation and matching simultaneously
We target the algorithm towards VPB generation in SPS

Experimental Setup
[Figure: compilation flow. A set of applications specified in C is passed through SUIF and Machine-SUIF to obtain a control flow graph; a dataflow graph generation pass then produces the control dataflow graph]

Experimental Setup
[Figure: experimental flow. MediaBench files are compiled to CDFGs (control dataflow graphs), template generation and matching is performed, and statistics are gathered: graph coverage and number of templates]

Experimental Setup - Benchmarks
Selected files from MediaBench
Benchmark  C File        Description
mpeg2      motion.c      Motion vector decoding
mpeg2      getblk.c      DCT block decoding
adpcm      adpcm.c       ADPCM to/from 16-bit PCM
epic       convolve.c    2D general image convolution
jpeg       jctrans.c     Transcoding compression
jpeg       jdmerge.c     Color conversion
rasta      fft.c         Fast Fourier Transform
rasta      noise_est.c   Noise estimation functions
gsm        gsm_decode.c  GSM decoding
gsm        gsm_encode.c  GSM encoding

Similarity Across Applications
Columns are MediaBench file names
Operation   motion   jdmerge  getblk   gsm_dec  jctrans
ADD         50.3%    84.6%    44.5%    29.6%    84.6%
MUL         36.3%    13.8%    24.0%    22.4%    13.8%
Template coverage
MUL-MUL     0.0%     0.0%     1.3%     0.0%     0.0%
ADD-ADD     14.5%    9.1%     3.2%     3.6%     9.1%
ADD-MUL     0.0%     0.4%     0.6%     0.0%     0.4%
MUL-ADD     36.3%    13.0%    21.5%    22.4%    13.0%

Experimental Results
[Figure: % nodes covered (axis from 30% to 90%) versus number of templates (axis from 0 to 30), with one curve for "No restrictions" and one for "Simple"]
Techniques
  Simple: restrict templates to two operations
  No restrictions: unlimited number of operations
Stopping condition: most common edge occurs < x% (x from 5 to 25)

Summary
Systems need programmability at multiple levels of the computational hierarchy
Introduced SPS as a Hybrid Reconfigurable System
Developed an instruction generation algorithm to determine VPB functionality
Showed that common templates can be found across a similar set of applications
An efficient covering is possible using simple templates
Future work: create methods to uncover more complex templates