Computing Without Processors
Thesis Proposal
Mihai Budiu
July 30, 2001
This presentation uses TeXPoint by George Necula
Thesis Committee:
Seth Goldstein, chair
Todd Mowry
Peter Lee
Babak Falsafi, ECE
Nevin Heintze, Agere Systems
2
Four Types of Research
• Solve nonexistent problems
• Solve past problems
• Solve current problems
• Solve future problems
11
Premises of this Research
• We will have lots of gates
– Moore’s law continues
– Nanotechnology
• Contemporary architectures do not scale
12
Outline
• Motivation
• ASH: Application-Specific Hardware
• The spatial model of computation
• CASH: Compiling for ASH
• Evolutionary path
• Conclusions
• Future work
14
ASH: A Scalable Architecture
-- Thesis Statement --
Application-specific hardware on a reconfigurable-hardware substrate is a solution for the smooth evolution of computer architecture.
We can provide scalable compilers for translating high-level languages into hardware.
16
Outline
• Motivation
• ASH: Application-Specific Hardware
• The spatial model of computation
• CASH: Compiling for ASH
• Evolutionary path
• Conclusions
• Future work
17
ASH and Nanotechnology
• Build reconfigurable hardware using nanotechnology
• Low power: 10^10 gates use less than 2 W
• Low cost: nanocents/gate
• High density: 10^5x over CMOS
Huge structures
[Figure: a nano-RAM cell; in yellow, a CMOS RAM cell]
18
A Limit Study of Performance
A graph of the whole program execution:
[Figure legend: memory word, basic block, memory write, memory read, control-flow transfer]
19
Typical Program Graph (g721_e)
[Figure labels: control flow transfer, 100% memory cluster, memory reads, 100% code cluster, memcpy]
21
Application Slowdown
[Chart: application slowdown, in times slower than native (axis from -1 to 11), for two configurations: 1 clock/square and 5 clocks/square]
22
How Time Is Spent
[Chart: percent of time (0% to 100%) per benchmark, broken down into idle, execution, control flow, and register traffic. Benchmarks: 099.go, 129.compress, 130.li, 132.ijpeg, adpcm_d, adpcm_e, epic_e, g721_Q_d, g721_Q_e, gsm_d, gsm_e, jpeg_d, jpeg_e, mpeg2_d]
No caches: reads expensive
No speculation
24
Outline
• Motivation
• ASH: Application-Specific Hardware
• The spatial model of computation
• CASH: Compiling for ASH
• Evolutionary path
• Future work
26
Compilation
1. Program
int reverse(int x)
{
    int k, r = 0;
    for (k = 0; k < 32; k++) {
        r = (r << 1) | (x & 1);  /* shift the result left, then append the low bit of x */
        x = x >> 1;
    }
    return r;
}
2. Split-phase Abstract Machines
– Computations & local storage
– Unknown latency ops.
3. Configurations placed independently
4. Placement on chip
– Reliability
31
The SAM FSM
Computation
Predicates (control)
Combinational logic
start exit
Register
args results
32
Computation = Dataflow
• Variables => wires + tokens
• No token store; no token matching
• Local communication only
Programs:
x = a & 7;
...
y = x >> 2;
Circuits:
[Figure: dataflow circuit in which a and 7 feed an & node; its output signal x and the constant 2 feed a >> node]
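The program-to-circuit mapping above can be sketched in C. This is only an illustrative model, not the CASH implementation: the `Wire` type, the `fire` helper, and the `circuit` function are hypothetical names, and a "token" is modeled as a valid flag travelling with the value. A node fires only when both of its input wires carry tokens, so no token store or matching is needed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a wire: a value plus a "token present" flag. */
typedef struct {
    int  value;
    bool valid;
} Wire;

/* Put a token carrying v on a wire. */
static Wire token(int v)   { Wire w = { v, true };  return w; }

/* A wire on which no token has arrived yet. */
static Wire no_token(void) { Wire w = { 0, false }; return w; }

typedef int (*BinOp)(int, int);

static int op_and(int a, int b) { return a & b; }

static int op_shr(int a, int b)
{
    assert(b >= 0 && b < 32);   /* shift amount in range (32-bit int assumed) */
    return a >> b;
}

/* A two-input node fires only when both inputs carry tokens. */
static Wire fire(BinOp op, Wire lhs, Wire rhs)
{
    if (!lhs.valid || !rhs.valid)
        return no_token();
    return token(op(lhs.value, rhs.value));
}

/* The slide's program: x = a & 7; y = x >> 2; with & feeding >>. */
static Wire circuit(Wire a)
{
    Wire x = fire(op_and, a, token(7));
    Wire y = fire(op_shr, x, token(2));
    return y;
}
```

Calling `circuit(token(29))` yields a valid token, while `circuit(no_token())` yields none: the >> node simply never fires because its input x never arrives.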
33
Tokens & Synchronization
• Tokens signal operation completion
• Possible implementations:
– Local: data, valid, ack
– Global: data, valid, reset
– Static: data valid
34
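Speculation and Eager Muxes
if (x > 0)
    y = -x;
else
    y = b*x;
[Figure: dataflow circuit with a computation part (-x and the slow * of b and x) and a predicates part (x > 0 and its negation, !), both feeding a mux that produces y]
Static-Single Assignment implemented in hardware
ILP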
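One way to read the eager mux is sketched below in C, under stated assumptions: both sides of the branch are computed speculatively, and the mux emits a result as soon as a true predicate and its matching data have arrived, without waiting for the other side. The `Wire` type, `eager_mux`, `select_y`, and the `multiply_done` flag (standing in for the slow multiplier's latency) are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical wire model: a value plus a "token present" flag. */
typedef struct {
    int  value;
    bool valid;
} Wire;

/* Eager mux: output the data whose predicate is true as soon as both
 * that predicate and that data are available. */
static Wire eager_mux(Wire p_then, Wire v_then, Wire p_else, Wire v_else)
{
    Wire none = { 0, false };
    bool take_then = p_then.valid && p_then.value != 0;
    bool take_else = p_else.valid && p_else.value != 0;

    assert(!(take_then && take_else));  /* predicates are mutually exclusive */
    if (take_then && v_then.valid) return v_then;
    if (take_else && v_else.valid) return v_else;
    return none;                        /* nothing selectable yet */
}

/* if (x > 0) y = -x; else y = b*x;  with both sides computed speculatively.
 * multiply_done models whether the slow multiplier has produced its token. */
static Wire select_y(int x, int b, bool multiply_done)
{
    Wire p_then = { x > 0, true };
    Wire p_else = { !(x > 0), true };
    Wire neg    = { -x, true };                                  /* fast side */
    Wire prod   = { multiply_done ? b * x : 0, multiply_done };  /* slow side */
    return eager_mux(p_then, neg, p_else, prod);
}
```

With `x = 3`, the result `-3` is available even while the multiply is still in flight; with `x = -3`, the mux has to wait for the product.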
35
Predicates
• Guard side-effects
– Memory access (e.g. *q = 2;)
– Procedure calls
• Control looping
• Decide exit branch
• Select variable definition (x=... and x=... both reaching ...=x)
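The first use of predicates listed above, guarding side-effects, can be sketched as follows. Pure computation may run speculatively, but a store must not commit unless its predicate is true. The function names `guarded_store` and `store_two`, and the null-pointer test used as the predicate, are assumptions made for this example only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Commit the store only when the predicate is true. Returns whether the
 * store happened. When pred is false, addr is never dereferenced. */
static bool guarded_store(bool pred, int *addr, int value)
{
    if (!pred)
        return false;          /* squashed: no side effect */
    assert(addr != NULL);      /* a committed store needs a real address */
    *addr = value;
    return true;
}

/* "if (q != NULL) *q = 2;" rewritten with an explicit predicate. */
static bool store_two(int *q)
{
    bool p = (q != NULL);      /* predicate computed like any other value */
    return guarded_store(p, q, 2);
}
```

Because the guard sits on the store itself, `store_two(NULL)` is harmless: the predicate is false and the memory access never takes place.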
36
Computing Predicates
• Correct for irreducible graphs
• Correct even when speculatively computed
• Can be eagerly computed
[Figure: control-flow graph fragment with basic blocks s, t, and b]
37
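Loops + Dataflow = Pipelining
for (i=0; i < 10; i++)
    a[i] += i;
[Figure: dataflow circuit for the loop: i starts at 0 and is incremented by 1 through a + node; the address is formed from &a[0]; a load, a +, and a store follow in sequence; elements a[0], a[1], a[2], a[3] are shown along the circuit]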
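The claim that a loop mapped onto a dataflow circuit behaves like a pipeline can be illustrated with a small cycle-level simulation. This is a sketch under assumptions not stated on the slide: three stages (load, add, store), each taking one cycle, with the induction variable advancing every cycle. The `Latch` type and `run_pipelined` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

enum { LEN = 10 };

/* Hypothetical pipeline latch: an index, a data value, and a valid flag. */
typedef struct {
    int  i;
    int  v;
    bool valid;
} Latch;

/* Runs "for (i = 0; i < LEN; i++) a[i] += i;" as a three-stage pipeline
 * and returns the number of simulated cycles. Stages are evaluated from
 * last to first so each one consumes what was latched in the previous cycle. */
static int run_pipelined(int a[LEN])
{
    Latch loaded = { 0, 0, false };
    Latch added  = { 0, 0, false };
    int next_i = 0, stored = 0, cycles = 0;

    while (stored < LEN) {
        /* Stage 3: store the sum produced in the previous cycle. */
        if (added.valid) {
            assert(added.i >= 0 && added.i < LEN);
            a[added.i] = added.v;
            stored++;
        }
        /* Stage 2: add i to the value loaded in the previous cycle. */
        added = loaded;
        if (added.valid)
            added.v += added.i;
        /* Stage 1: advance the induction variable and load a[i]. */
        if (next_i < LEN) {
            loaded = (Latch){ next_i, a[next_i], true };
            next_i++;
        } else {
            loaded.valid = false;
        }
        cycles++;
    }
    return cycles;
}
```

Under these assumptions the ten iterations finish in LEN + 2 cycles, rather than three cycles per iteration, because a new iteration enters the circuit every cycle while earlier ones are still being added and stored.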
38
Outline
• Motivation
• ASH: Application-Specific Hardware
• The spatial model of computation
• CASH: Compiling for ASH
• Evolutionary path
• Conclusions
• Future work
43
ASH Benefits
Problem | Solution
Reliability | Configuration around defects
Power | Only “useful” gates switching
Signals | Localized computation
ILP | Statically extracted
45
Summary
• Contemporary CPU architecture faces lots of problems
• Application-Specific Hardware (ASH) provides a scalable technology
• Compiling HLL into hardware dataflow machines is an effective solution
46
Timeline
[Gantt chart spanning 06/01 to 12/02, with marks at 09/01, 12/01, 04/02, 06/02, and 09/02, and a "now" marker. Tasks: CASH core; memory partitioning; loop parallelization; ASH simulation; cost models; hw/sw partitioning (ASH + CPU); explore architectural/compiler trade-offs; write thesis]
47
Extras
• Related work
• Reconfigurable hardware
• Other cross-over phenomena
• A CPU + ASH study
• More about predicates
48
Related Work
• Hardware synthesis from HLL
• Reconfigurable hardware
• Predicated execution
• Dataflow machines
• Speculative execution
• Predicated SSA
49
Reconfigurable Hardware
Universal gates
and/or
storage elements
Interconnection network
Programmable Switches
50
Main RH Ingredient: RAM Cell
Switch controlled by a 1-bit RAM cell
Universal gate = RAM
[Figure labels: RAM contents 0001; address inputs a0, a1; data output a1 & a2; switch with data in, control, and 0]
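The idea that a universal gate is just a small RAM can be sketched as a two-input lookup table: the inputs form an address, and the stored bit at that address is the gate's output. The `Lut2` type, the cell ordering (address = 2*a1 + a2), and the names `LUT_AND` and `LUT_XOR` are assumptions made for this example; the contents 0, 0, 0, 1 correspond to the "0001" pattern and the AND function shown in the figure.

```c
#include <assert.h>

/* Hypothetical 2-input lookup table: four 1-bit RAM cells. */
typedef struct {
    unsigned char cell[4];
} Lut2;

/* Evaluate the gate: the two input bits address one RAM cell. */
static int lut2_eval(const Lut2 *lut, int a1, int a2)
{
    assert((a1 == 0 || a1 == 1) && (a2 == 0 || a2 == 1));
    return lut->cell[(a1 << 1) | a2];
}

/* Reconfiguring the gate means rewriting the RAM contents. */
static const Lut2 LUT_AND = { { 0, 0, 0, 1 } };  /* output 1 only for inputs 1,1 */
static const Lut2 LUT_XOR = { { 0, 1, 1, 0 } };  /* same hardware, other function */
```

The same `lut2_eval` logic serves every two-input function; only the four stored bits change, which is what makes the gate "universal".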
51
Reconfigurable Computing
• Back to ENIAC-style computation
• Synthesize one machine to solve one problem
56
ASH Benefits
Problem | Solution
Reliability | Configuration around defects
Power | Only “useful” gates switching
Signals | Localized computation
ILP | Statically extracted
Complexity | Hierarchy of abstractions
CAD | Compiler + local place & route
Efficiency | Circuit customized to application
Cost | No masks, no physics, same substrate
Performance | Scalable
57
CPU+ASH Study
• Reconfigurable functional unit on processor pipeline
• Adapted SimpleScalar 3.0
• ASH & CPU use the same memory hierarchy (incl. L1)
• ASH can access CPU registers
• CPU pipeline interlocked with ASH
• Results pending