Compiling Application-Specific Hardware
Mihai Budiu, Seth Copen Goldstein
Carnegie Mellon University
Post on 21-Dec-2015
Problems
• Complexity
• Power
• Global Signals
• Limited issue window => limited ILP
We propose a scalable architecture
Our Solution
• General: applicable to today’s software
- programming languages
- applications
• Automatic: compiler-driven
• Scalable:
- run-time: with clock, hardware
- compile-time: with program size
• Parallelism: exploit application parallelism
New
• Entire C applications
• Dynamically scheduled circuits
• Custom dataflow machines
- application-specific
- direct execution (no interpretation)
- spatial computation
Primitives
• Arithmetic/logic (+)
• Multiplexors (inputs: data, predicates)
• Merge
• Eta (gateway) (inputs: data, predicate)
• Memory (ld, st)
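The primitives can be mimicked in software to get a feel for their behavior. The following is a minimal sketch, assuming each wire carries a value plus a validity flag; the type `token_t` and the function names `eta`, `merge` and `mux2` are hypothetical and do not come from the slides.

```c
#include <stdbool.h>

/* Hypothetical model of a wire: a value plus a flag saying whether it is present. */
typedef struct {
    bool valid;   /* true when the wire carries a value */
    int  value;
} token_t;

/* Eta (gateway): forwards the data only when the predicate is true. */
token_t eta(token_t data, bool predicate) {
    token_t out = { data.valid && predicate, data.value };
    return out;
}

/* Merge: forwards whichever input carries a value (assumes at most one does). */
token_t merge(token_t a, token_t b) {
    return a.valid ? a : b;
}

/* Decoded mux: one predicate per data input; assumes at most one predicate is true. */
token_t mux2(token_t d0, bool p0, token_t d1, bool p1) {
    token_t none = { false, 0 };
    if (p0) return d0;
    if (p1) return d1;
    return none;
}
```

Unlike a conventional encoded mux with a single select input, the decoded form takes a separate predicate for every data input.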
Forward Branches
if (x > 0) y = -x;
else y = b*x;
[Figure: dataflow circuit for the conditional, with inputs x, b, 0, operators *, -, >, !, and a decoded mux producing y]
Conditionals => Speculation
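In software terms, the conditional becomes straight-line code: both sides are evaluated speculatively and the predicate only selects the result. A minimal sketch follows; the function names `branchy` and `speculative` are illustrative, and the gating trick used for the mux is just one way to model it in C.

```c
/* Branching form, as written on the slide. */
int branchy(int x, int b) {
    int y;
    if (x > 0) y = -x;
    else       y = b * x;
    return y;
}

/* Speculative form: both sides are computed unconditionally,
   then a decoded mux picks one using the predicate and its negation. */
int speculative(int x, int b) {
    int neg  = -x;        /* then-side, computed regardless of x > 0 */
    int prod = b * x;     /* else-side, computed regardless of x > 0 */
    int p    = (x > 0);   /* predicate */
    int np   = !p;        /* negated predicate */
    /* Each data input is gated by its own predicate; exactly one gate is open. */
    return (p ? neg : 0) | (np ? prod : 0);
}
```

Speculation is safe here because neither side has side effects; operations such as stores would still need to be guarded by the predicate.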
Lenient Operations
if (x > 0) y = -x;
else y = b*x;
[Figure: dataflow circuit for the conditional, with inputs x, b, 0, operators *, -, >, !, and output y]
Solve the problem of unbalanced paths
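The effect on unbalanced paths can be illustrated with a small timing model. This is only a sketch, assuming that "lenient" means an operator may produce its output as soon as the inputs that determine the result have arrived; the struct `arrival_t`, its fields, and the time units are hypothetical.

```c
/* Hypothetical arrival times of the mux inputs, in arbitrary time units. */
typedef struct {
    int t_pred;   /* when the predicates are known */
    int t_then;   /* when -x is ready (short path) */
    int t_else;   /* when b*x is ready (long path) */
} arrival_t;

int max2(int a, int b) { return a > b ? a : b; }

/* Strict mux: waits for every input, so the slowest path always sets the delay. */
int strict_mux_ready(arrival_t a) {
    return max2(a.t_pred, max2(a.t_then, a.t_else));
}

/* Lenient mux: waits only for the predicates and the selected data input. */
int lenient_mux_ready(arrival_t a, int take_then) {
    int t_data = take_then ? a.t_then : a.t_else;
    return max2(a.t_pred, t_data);
}
```

With arrival times {1, 1, 3}, the strict mux is ready at time 3 in all cases, while the lenient mux is ready at time 1 whenever the short side is selected.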
Loops
[Figure: dataflow circuit for the loop, with nodes i, +1, < 100, *, +, sum, !, ret and initial values 0 for i and sum]
int sum=0, i;
for (i=0; i < 100; i++)
sum += i*i;
return sum;
Control flow => data flow
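The loop can be rewritten so that the roles of the merge and eta nodes become visible. This is a software sketch of one plausible reading of the circuit, not the compiler's actual output; the function name `dataflow_sum` and the parameter `limit` (100 on the slide) are illustrative.

```c
/* Software sketch of the loop as merge/eta dataflow. */
int dataflow_sum(int limit) {
    /* Merge nodes: the initial values enter once, later values arrive on back edges. */
    int i = 0, sum = 0;
    for (;;) {
        int p        = (i < limit);   /* loop predicate */
        int next_sum = sum + i * i;   /* computed speculatively each iteration */
        int next_i   = i + 1;
        if (!p) {
            /* Eta gated by !p: releases sum to the return node. */
            return sum;
        }
        /* Etas gated by p: release the new values onto the back edges. */
        i   = next_i;
        sum = next_sum;
    }
}
```

The only remaining control is the predicate p, which steers values through the etas instead of steering a program counter.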
Compilation
• Translate C to dataflow machines
• Optimizations: software-, hardware-, dataflow-specific
• Expose parallelism
– predication
– speculation
– localized synchronization
– pipelining
Pipelining
[Figure: dataflow circuit for the loop, showing i’s loop (i, +, 1, <=, 100) as the critical path next to sum’s loop (*, +, sum)]
Predicate ack edge is on the critical path.
ASH Features
• What you code is what you get
– no hidden control logic
– lean hardware (no CAM, multi-ported files, etc.)
– no global signals
• Compiler has complete control
• Dynamic scheduling => latency tolerant
• Natural ILP and loop pipelining
Conclusions
• ASH: compiler-synthesized hardware from HLL
• Exposes program parallelism
• Dataflow techniques applied to hardware
• ASH promises to scale with:
– circuit speed
– transistors
– program size
Backup slides
• Hyperblocks
• Predication
• Speculation
• Memory access
• Procedure calls
• Recursive calls
• Resources
• Performance
Memory Access
[Figure: a load node (labels: address, predicate, token, data) and a store node (labels: address, pred, token, data) attached to a load-store queue, which connects through an interconnection network to Memory]
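One way to read the figure labels is that every memory operation takes a predicate and an ordering token besides its address. The sketch below assumes that an operation accesses memory only when its predicate is true and always passes the token on, so later operations stay ordered. The names `mtoken_t`, `mem`, `MEM_WORDS`, and the `seq` counter are hypothetical; `ld` and `st` follow the slide labels.

```c
#include <stdbool.h>

#define MEM_WORDS 16                    /* hypothetical memory size */

/* Hypothetical ordering token; seq counts the operations it has passed through. */
typedef struct { int seq; } mtoken_t;

int mem[MEM_WORDS];

/* Store: writes only when the predicate holds; always passes the token on. */
mtoken_t st(int addr, int data, bool pred, mtoken_t in) {
    if (pred) mem[addr] = data;
    mtoken_t out = { in.seq + 1 };
    return out;
}

/* Load: reads only when the predicate holds; always passes the token on. */
mtoken_t ld(int addr, bool pred, mtoken_t in, int *data) {
    if (pred) *data = mem[addr];
    mtoken_t out = { in.seq + 1 };
    return out;
}
```

Chaining the token from one call to the next forces the operations to run in program order, while operations on independent token chains could proceed in parallel.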
Procedure calls
[Figure: call P sends args through an interconnection network to Procedure P (Extract args, ret); the result returns through the network to the caller]
Resources
• Estimated for SpecINT95 and Mediabench
• Average < 100 bit-operations/line of code
• Routing resources harder to estimate
• Detailed data in paper