(Keynote) (From HPC to) New Horizons of Very High Performance Computing (VHPC): Hurdles and Chances
Reiner Hartenstein
TU Kaiserslautern
Rhodes Island, Greece, April 25-26, 2006
© 2006, [email protected] http://hartenstein.de
Reconfigurable Supercomputing (VHPC) going commercial
Cray XD1
Silicon Graphics RASC
… and other vendors
… it's a paradigm shift!
The Pervasiveness of RC
[Figure: # of hits by Google for “FPGA and ….” queries, in two panels: ECE-savvy scene and Math/SW-savvy scene (unqualified for RC?). One panel: 162,000; 127,000; 158,000; 113,000; 171,000; 194,000. Other panel: 1,620,000; 915,000; 398,000; 272,000; 647,000; 1,490,000.]
A world-wide mass movement
Methodology?
reminds me of the mass migration of lemmings
terminology chaos
not really a sense of direction
an urgent need to get organized
>> Outline <<
• Reconfigurable Computing Paradox
• The Supercomputing Paradox
• We are using the wrong model
• Coarse-grained Reconfigurable Devices
• Super Pentium for Desktop Supercomputer
http://www.uni-kl.de
The Reconfigurable Computing Paradox
poor FPGA technology:
• very poor effective integration density
• „very power-hungry“ [Rick Kornfeld*]
• lower clock frequencies, and more expensive
poor tools:
• very poor application development support
• languages and tools unacceptable for software people
• most hardware experts (86%**) hate their tools
poor education:
• RC education: extremely poor, or none
• ignored by CS curricula
• … teaching as if for a 50-year-old mainframe …
However, brilliant results everywhere
what paradox?
*) personal communication **) DeHon '98
Education?
Computing Curricula 2004 (Joint Task Force) fully ignores Reconfigurable Computing
FPGA & synonyms: 0 hits
not even here
(Google: 10 million hits)
Completed?
Computing Curricula v.2005: no changes other than „… FPGA, etc.“ (not really mentioning that it's missing)
Task force activity completed? Next task force in 2020 or later?
Tools?
End of this week: brainstorming session at DARPA (urgently needed, overdue!)
Fine-grained RC: 1st DeHon's Law [1996: Ph.D., MIT]
Technology: immense area inefficiency
overhead: >> 10 000 (reconfigurability overhead, wiring overhead, routing congestion)
[Figure: density in transistors / microchip (10^0 to 10^9) over 1980 to 2010. Curves: microprocessor (Gordon Moore curve), FPGA physical, FPGA logical, FPGA routed.]
published speed-up factors
[Figure: relative performance (10^0 to 10^9) over 1980 to 2010. Curves: microprocessor (8080 to Pentium 4), memory, FPGA. Growth-rate annotations: 7%/yr, 50%/yr, x1.25/yr (Moore), x2/yr (FPGA). Source: http://xputers.informatik.uni-kl.de/faq-pages/fqa.html]
Application areas: DSP and wireless; image processing, pattern matching, multimedia; bioinformatics; astrophysics
Speed-up factors shown in the chart:
• Los Alamos traffic simulation: 47
• real-time face detection: 6000
• video-rate stereo vision: 900
• pattern recognition: 730
• SPIHT wavelet-based image compression: 457
• Smith-Waterman pattern matching: 288
• BLAST: 52
• protein identification: 40
• molecular dynamics simulation: 88
• Reed-Solomon decoding: 2400
• Viterbi decoding: 400
• FFT: 100
• MAC: 1000
• crypto: 1000
• GRAPE: 20
• 2-D FIR filter [TU-KL]: 39.4
• Lee routing (by TU-KL): 160
pre-FPGA era (DPLA, MoM Xputer architecture):
• Grid-based DRC (no FPGA: DPLA on MoM by TU-KL): 2000
• Grid-based DRC („fair comparison“): 15000
Pre-FPGA era: why DPLA* was so good
Large arrays of canonical Boolean expressions
PLA layout similar to RAM / ROM layout: close to Moore because of small overhead (wiring, programmability, routing)
Mid-1980s: first very tiny FPGAs available
GAG: Generic Address Generator, to avoid address computation overhead
ASM: Auto-Sequencing Memory
[M. Herz et al.: ICECS 2003, Dubrovnik]
*) designed by TU-KL, fabricated by E.I.S., a German multi-university project
ASM: Auto-Sequencing Memory (anti-von-Neumann machine paradigm)
Data counter instead of program counter
Generalization of the DMA
[Figure: ASM = data counter (GAG) + RAM]
GAG & enabling technology: published 1989 [by TU-KL]; patented by TI 1995
Survey paper: [M. Herz et al.*: IEEE ICECS 2003, Dubrovnik]
Storage scheme optimization methodology, etc.
*) IMEC & TU-KL
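The data-counter idea can be illustrated with a few lines of Python. This is only a minimal sketch, not the published GAG design: the function name `gag_scan` and its parameters (base address, row stride, block size) are assumptions introduced here. The generator produces the complete address sequence of a 2-D block scan, so the data path itself never has to compute an address.

```python
from typing import Iterator


def gag_scan(base: int, row_stride: int, rows: int, cols: int) -> Iterator[int]:
    """Hypothetical address generator: yields the RAM addresses of a
    rows x cols block scanned row by row, starting at `base`."""
    for r in range(rows):
        for c in range(cols):
            yield base + r * row_stride + c


# A toy "auto-sequencing memory": a RAM plus a data counter that steps
# through the generated addresses and streams the data items out.
ram = list(range(100, 164))          # 64 words, read as an 8-word-wide image
addresses = list(gag_scan(base=9, row_stride=8, rows=2, cols=3))
stream = [ram[a] for a in addresses]

print(addresses)  # [9, 10, 11, 17, 18, 19]
print(stream)     # [109, 110, 111, 117, 118, 119]
```

The scan pattern is fixed before the data start to flow, which is what distinguishes a data counter from a program counter fetching address-computing instructions.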
Thousands or millions of $ for free
Application migration [from supercomputer] results not only in massive speed-ups: you may also get, for free, electricity bills reduced by an order of magnitude and even more
… up to millions of dollars per year
Google, Amsterdam, NY
(also a matter of national energy policy)
Reconfigurable Scientific Computing
How do software types program the FPGAs? By hiring a good student from the EE Dept.?
Because of missing RC education: far away from optimum solutions?
Much higher speed-up achievable? 1 or 2 more orders of magnitude? 100,000? 1,000,000?
By education: better speed-up factors?
[Figure: the chart of published speed-up factors shown again (source: http://xputers.informatik.uni-kl.de/faq-pages/fqa.html), with the overlay „tools & edu available?“]
The Supercomputing Paradox
promising technology:
• Growing listed Teraflops
• Increasing number of processors running in parallel
• COTS processors at decreasing cost
poor results:
• Often limited sustained Teraflops
• Almost stalled application implementation progress
• Very high total cost of the Tera(?)flops
• Scientists waiting for affordable compute capacity
The Law of More
Why traditional supercomputing / HPC failed
instruction-stream-based: memory-cycle-hungry
the wrong way of moving the data around
because of the wrong multi-core interconnect architecture
extremely unbalanced
[Figure: CPU; stolen from Bob Colwell]
Moving data around inside the Earth Simulator
5120 processors, 5000 pins each
ES 20: TFLOPS
Crossbar weight: 220 t, 3000 km of thick cable
Bringing together data and processor
by Software
Moving data to the processor: moving the grand piano
Coarse-grained RC: Hartenstein's Law [1996: ISIS, Austin, TX]
area efficiency very close to Moore's law
e.g. KressArray family
[Figure: transistors / microchip (10^0 to 10^9) over 1980 to 2010. Curves: Gordon Moore curve, rDPA physical, rDPA logical, and, for comparison, FPGA routed (overhead >> 10 000).]
Higher speed-up factors by coarse-grained?
[Figure: the chart of published speed-up factors shown again (source: http://xputers.informatik.uni-kl.de/faq-pages/fqa.html), with the overlay „Coarse-grained arrays?“]
Coarse grain is about computing, not logic
SNN filter on KressArray (mainly a pipe network) [Ulrich Nageldinger]
array size: 10 x 16 = 160 rDPUs
rDPU: reconfigurable Data Path Unit, e.g. 32 bits wide
no CPU
Legend: rDPU not used; used for routing only (route-through only); operator and routing; port location marker; backbus connect
SW to coarse-grained CW migration example
[Figure: array of rDPUs; an adder (+) placed in the array delivers the result S]
Compare it to software solution on CPU
S = R + (if C then A else B endif);
on a very simple CPU, C = 1: memory cycles and nanoseconds per step (read instruction, instruction decoding, read operand*, operate & register transfers, add & store, store result, total)
[Figure: data path with inputs A, B, R, C; a multiplexer (=1) and an adder (+) produce S; clock 200]
hypothetical branching example to illustrate software-to-configware migration
S = R + (if C then A else B endif);
simple conservative CPU example, C = 1:

step                          memory cycles   nanoseconds
if C then read A
  read instruction                  1             100
  instruction decoding
  read operand*                     1             100
  operate & reg. transfers
if not C then read B
  read instruction                  1             100
  instruction decoding
add & store
  read instruction                  1             100
  instruction decoding
  operate & reg. transfers
  store result                      1             100
total                               5             500

*) if no intermediate storage in register file
[Figure: configware data path with inputs A, B, R, C; a multiplexer (=1) and an adder (+) produce S]
clock 200 MHz (5 nanosec), no memory cycles: speed-up factor = 100
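The arithmetic behind the factor of 100 can be checked with a short Python sketch. The numbers are the ones on the slide (5 memory cycles of 100 ns each versus one 5 ns clock period at 200 MHz); the variable names are illustrative, and treating the configware result as available after a single clock period is an assumption of this sketch.

```python
# Software on a simple, conservative CPU: only memory cycles are counted.
MEMORY_CYCLE_NS = 100
memory_cycles = {
    "read instruction (if C then read A)": 1,
    "read operand": 1,
    "read instruction (if not C then read B)": 1,
    "read instruction (add & store)": 1,
    "store result": 1,
}
software_ns = sum(memory_cycles.values()) * MEMORY_CYCLE_NS

# Configware: the operators sit in the data path, no memory cycles at all.
CLOCK_MHZ = 200
configware_ns = 1_000 / CLOCK_MHZ      # one clock period in nanoseconds

speed_up = software_ns / configware_ns
print(software_ns, configware_ns, speed_up)  # 500 5.0 100.0
```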
Why the speed-up? What's the difference?
moving the locality of operation into the route of the data stream by P&R (placement & routing)
instead of moving data by instruction streams
The wrong mind set ....
„but you can't implement decisions!“
S = R + (if C then A else B endif);
[Figure: section of a very large pipe network [Ulrich Nageldinger]: inputs A, B, R, C; the decision (=1) feeds the adder (+). Legend: rDPU not used; used for routing only (route-through only); operator and routing; port location marker; backbus connect]
not knowing this solution: symptom of the hardware / software chasm and the configware / software chasm
We need Reconfigurable Computing Education
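How a decision becomes part of the data path, without any branch instruction, can be sketched in Python. This is an illustration only: the function names `mux` and `pipe_network` are assumptions made here, each function stands for one rDPU operator, and the data streams are modelled as plain lists.

```python
from typing import Iterable, Iterator


def mux(c: int, a: int, b: int) -> int:
    """Multiplexer operator: the decision is a data path element,
    not a branch in an instruction stream."""
    return a if c == 1 else b


def pipe_network(r_s: Iterable[int], c_s: Iterable[int],
                 a_s: Iterable[int], b_s: Iterable[int]) -> Iterator[int]:
    """S = R + (if C then A else B endif), applied to whole data streams.
    One result leaves the pipe per step; no instruction is fetched."""
    for r, c, a, b in zip(r_s, c_s, a_s, b_s):
        yield r + mux(c, a, b)


R = [10, 20, 30, 40]
C = [1, 0, 1, 0]
A = [1, 2, 3, 4]
B = [5, 6, 7, 8]
S = list(pipe_network(R, C, A, B))
print(S)  # [11, 26, 33, 48]
```

Both A and B are always present at the multiplexer inputs; C merely selects which one is routed on to the adder.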
The new paradigm: how the data are traveling
no, not by instruction execution
not transport-triggered: old hat
vN Move Processor: instruction-driven [Jack Lipovski, EUROMICRO, Nice, 1975]
pipeline, or chaining: DPU, DPU, DPU
super systolic array
P&R: move locality of operation, not data!
Data streams (pipe network)
H. T. Kung paradigm (systolic array)
„data streams“: define ... which data item at which time at which port
[Figure: DPA with input data streams and output data streams, each drawn as data items over time and port #]
implemented by distributed memory: ASM blocks (data counter: GAG + RAM) placed around the DPA
ASM: Auto-Sequencing Memory
50 & more on-chip ASM are feasible
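The definition "which data item at which time at which port" can be written down as a small Python table. This is a sketch under assumptions: the function name `skewed_schedule` is hypothetical, and the skew rule (port p starts p steps late) is the textbook way of feeding a systolic array, chosen here only as an example of a data stream specification.

```python
from typing import Dict, List, Tuple


def skewed_schedule(streams: List[List[str]]) -> Dict[Tuple[int, int], str]:
    """Map (time, port) -> data item. Port p starts p steps late."""
    schedule = {}
    for port, items in enumerate(streams):
        for k, item in enumerate(items):
            schedule[(k + port, port)] = item
    return schedule


streams = [["a0", "a1", "a2"], ["b0", "b1", "b2"], ["c0", "c1", "c2"]]
schedule = skewed_schedule(streams)

# Print the streams as a time / port diagram; "-" marks an idle slot.
last_time = max(t for t, _ in schedule)
rows = []
for t in range(last_time + 1):
    cells = [schedule.get((t, p), "-").rjust(2) for p in range(len(streams))]
    rows.append(" ".join(cells))
print("\n".join(rows))
```

Each printed row is one time step and each column one port, the same time / port # layout as the data stream diagrams on the slide.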
The Generalization of the Systolic Array
Systolic array with algebraic synthesis methods: only for applications with regular data dependencies
remedy? discard algebraic synthesis methods
[R. Kress]: use optimization algorithms, e.g. simulated annealing
Achievement: also non-linear and non-uniform pipes, and even wilder pipe structures, are possible
reconfigurability makes sense
Kress-Kung paradigm: super systolic array
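What "use optimization algorithms, e.g. simulated annealing" can look like for placement is sketched below in Python. This is a generic textbook annealer, not R. Kress's actual mapper: the names `wire_length` and `anneal`, the Manhattan-distance cost, the linear cooling schedule and all parameter values are assumptions made for illustration. The sketch also assumes the array has at least as many cells as there are operators.

```python
import math
import random
from typing import Dict, List, Tuple

Pos = Tuple[int, int]
Net = Tuple[str, str]


def wire_length(place: Dict[str, Pos], nets: List[Net]) -> int:
    """Cost: total Manhattan distance over all operator-to-operator nets."""
    return sum(abs(place[u][0] - place[v][0]) + abs(place[u][1] - place[v][1])
               for u, v in nets)


def anneal(ops: List[str], nets: List[Net], rows: int, cols: int,
           steps: int = 5000, t0: float = 2.0, seed: int = 0) -> Dict[str, Pos]:
    rng = random.Random(seed)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    rng.shuffle(cells)
    # Pad with placeholders so unused rDPU positions take part in the swaps.
    slots = ops + [f"_unused{i}" for i in range(len(cells) - len(ops))]
    place = dict(zip(slots, cells))               # random initial placement
    cost = wire_length(place, nets)
    best, best_cost = dict(place), cost
    for step in range(steps):
        temp = t0 * (1.0 - step / steps) + 1e-9   # linear cooling schedule
        a, b = rng.sample(slots, 2)
        place[a], place[b] = place[b], place[a]   # move: swap two positions
        new_cost = wire_length(place, nets)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost                       # accept, possibly uphill
            if cost < best_cost:
                best, best_cost = dict(place), cost
        else:
            place[a], place[b] = place[b], place[a]   # reject: undo the swap
    return {op: best[op] for op in ops}


# A small, non-uniform pipe: in -> mul -> mux -> add -> out, plus a side net.
ops = ["in", "mul", "mux", "add", "out"]
nets = [("in", "mul"), ("mul", "mux"), ("mux", "add"), ("add", "out"), ("in", "mux")]
placement = anneal(ops, nets, rows=3, cols=3)
print(placement, wire_length(placement, nets))
```

Because the cost function only looks at the list of nets, irregular data dependencies are handled exactly like regular ones, which is the point of replacing algebraic synthesis by an optimizer.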
Here is the common model
instruction-stream-based: CPU, software code
data-stream-based: reconfigurable accelerator (configware code), hardwired accelerator
it's not von Neumann: the vN monopoly in our curricula is severely harmful
the tail is wagging the dog
we need dual paradigm education
A potential Pentium successor
Discard most caches
have 64* cores, 0.5 - 1 GHz
with clever interconnect for concurrent processes, for multithreading, and for a Kung-Kress pipe network
The Desk-top Supercomputer!
*) CPU mode / DPU mode capability
“Super Pentium” configuration example
[Figure: array of 64 cores, 60 of them configured as rDPUs and 4 as CPUs]
World TV & game console & multi media center
e.g.: ~ 8 x 8 rDPA: all feasible under 500 MHz
[Figure: SMeXPP rDPA serving games, music, videos, camera, baseband processor, radio interface, audio interface, SD/MMC cards, LCD display]
• Variable resolutions and refresh rates
• Variable scan mode characteristics
• Noise reduction and artifact removal
• High performance requirements
• Variable file encoding formats
• Variable content security formats
• Variable displays
• Luminance processing
• Detail enhancement
• Color processing
• Sharpness enhancement
• Shadow enhancement
• Differentiation
• Programmable de-interlacing heuristics
• Frame rate detection and conversion
• Motion detection & estimation & compensation
• Different standards (MPEG2/4, H.264)
• A single device handles all modes
http://pactcorp.com
Dual Paradigm Application Development
high level language
software / configware co-compiler
instruction-stream-based: CPU, software code
data-stream-based: reconfigurable accelerator (configware code), hardwired accelerator
Software / Configware Co-Compilation
Juergen Becker's CoDe-X, 1996
C language source, then Partitioner, then SW compiler (for the CPU) and CW compiler (for the rDPU array)
Resource Parameters: supporting different platforms
CW compiler: Placement & Routing (Move the Locality of Operation)
Bringing together data and processor
by Configware
Place the location of execution into the data pipe: move the stool
>> Conclusions <<
Conclusions (1): Hurdles
Obstacles are:
unbelievably disastrous tools market: enabling technologies available, partly decades old, but not used
unbelievably ignorant curricula: … teaching as if for a 50-year-old mainframe …
transdisciplinary models neither available nor taught in CS, nor elsewhere
fragmentation into application-domain-specific cultures and trick boxes
Conclusions (2): Future Work
The CS disciplines must recognize and accept their strategic role and their responsibility toward all their application disciplines: embedded and scientific computing.
The monopoly of the von-Neumann-based mind set in CS education:
• heavily stalls progress in R&D, not only in HPC
• causes high cost in R&D, not only in supercomputing
The von-Neumann-only mind set in CS urgently needs to go: adopt the dual paradigm common model
CS graduates are not qualified for our job market
Conclusions (3): Chances
New horizons: chances are brilliant
Co-Compiler Enabling Technology
is available from academia
only a small team needed for commercial re-implementation
on the road map to the Personal Supercomputer
Compilation: Software vs. Configware
Software Engineering: source program (C, FORTRAN, MATLAB), software compiler, software code
Configware Engineering: source „program“, configware compiler consisting of mapper (placement & routing), which produces configware code, and scheduler, which produces flowware code (data)
Nick Tredennick's Paradigm Shifts explain the differences
Software Engineering (CPU): resources: fixed; algorithm: variable (software); 1 programming source needed
Configware Engineering: resources: variable (configware); algorithm: variable (flowware); 2 programming sources needed
Co-Compilation
Software / Configware Co-Compiler
source: C, FORTRAN, MATLAB
automatic SW / CW partitioner (simulated annealing)
software compiler: software code
configware compiler: mapper (simulated annealing) produces configware code; scheduler produces flowware code (data)
Co-Compiler for Hardwired Kress/Kung Machine [e.g. Brodersen]
Software / Flowware Co-Compiler
source
automatic SW / CW partitioner
software compiler: software code
flowware compiler: scheduler produces flowware code (data)
The first archetype machine model
“von Neumann”: instruction-stream-based mind set
mainframe, CPU
Software Industry's Secret of Success: simple basic Machine Paradigm
compile or assemble: procedural personalization
personalization: RAM-based
The 2nd archetype machine model
“Kress-Kung”: data-stream-based mind set
reconfigurable accelerator
Configware Industry's Secret of Success: simple basic Machine Paradigm
compile: structural personalization
personalization: RAM-based
„Saves more than $10,000 in electricity bills per year (7¢ / kWh) - .... per 64-processor 19" rack“ [Herb Riley, R. Associates]
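A back-of-the-envelope check of what the quoted figure implies, written as a short Python sketch. The dollar amount, the electricity price and the 64 processors come from the quote; running around the clock for a full year (8760 hours) is an assumption of this sketch that the quote does not state.

```python
SAVING_USD_PER_YEAR = 10_000     # from the quote ("more than $10,000")
PRICE_USD_PER_KWH = 0.07         # 7 cents / kWh, from the quote
HOURS_PER_YEAR = 24 * 365        # assumption: continuous operation
PROCESSORS_PER_RACK = 64         # from the quote

saved_kwh = SAVING_USD_PER_YEAR / PRICE_USD_PER_KWH
saved_kw = saved_kwh / HOURS_PER_YEAR
saved_w_per_processor = 1000 * saved_kw / PROCESSORS_PER_RACK

print(round(saved_kwh), round(saved_kw, 1), round(saved_w_per_processor))
# 142857 16.3 255
```

Under that assumption the quote corresponds to roughly 16 kW less average power draw per rack.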
The new model is reality
modern FPGA bestsellers: FPGA fabrics, together with several µprocessors, many memory banks, and other IP cores, on the same COTS microchip
DSP platform FPGA [courtesy Xilinx Corp.]
• 500 MHz flexible soft logic architecture, 200K logic cells
• 500 MHz programmable DSP execution units
• 0.6-11.1 Gbps serial transceivers
• 500 MHz PowerPC™ processors (680 DMIPS) with Auxiliary Processor Unit
• 1 Gbps differential I/O
• 500 MHz multi-port distributed 10 Mb SRAM
• 500 MHz DCM digital clock management