EPIMap: Using Epimorphism to Map Applications on CGRAs
Mahdi Hamzeh, Aviral Shrivastava, and Sarma Vrudhula. School of Computing, Informatics, and Decision Systems Engineering, Arizona State University. June 2012.
This work was supported in part by NSF IUCRC Center for Embedded Systems under Grant DWS-0086, by the Science Foundation of Arizona (Grant SRG 0211-07), and by the Stardust Foundation.
Accelerators for Energy Efficiency
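[Figure: a processor and an accelerator connected through shared memory. Plot of performance (Giga Ops per second, log scale from 1 to 100) against power (W, 50 to 250) for DRESC CGRA, Intel Core i7, and NVIDIA Tesla C2050; annotated efficiencies: 60 GOpS/W, 1.4 GOpS/W, and 4.3 GOpS/W.]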
• Demand for performance
• Power consumption
• Technology scaling
Coarse-grained Reconfigurable Architectures
• 2D array of Processing Elements (PEs)
• PE = ALU + local register file
• Mesh interconnection
• Shared data bus
• PE inputs:
– 4 neighboring PEs
– Local register file
– Data memory
[Figure: a 2 × 2 CGRA (PEs 1 to 4) repeated along the time axis.]
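The architecture bullets above can be made concrete with a small model. The following is only a sketch under stated assumptions: the names `PE`, `mesh_neighbors`, and `consumers_next_cycle` are hypothetical, the mesh is assumed to have no wrap-around links (so border PEs have fewer than four neighbors), and the shared data bus and data memory are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PE:
    """One processing element: an ALU plus a local register file."""
    row: int
    col: int


def mesh_neighbors(pe, rows, cols):
    """PEs directly wired to `pe` in a rows x cols mesh (no wrap-around assumed)."""
    candidates = [
        (pe.row - 1, pe.col),  # north
        (pe.row + 1, pe.col),  # south
        (pe.row, pe.col - 1),  # west
        (pe.row, pe.col + 1),  # east
    ]
    return [PE(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


def consumers_next_cycle(pe, rows, cols):
    """PEs that can read a value produced on `pe` in the following cycle:
    the PE itself (through its register file) and its mesh neighbors."""
    return [pe] + mesh_neighbors(pe, rows, cols)


# On the 2 x 2 array drawn on the slide, a corner PE reaches itself
# and two neighbors in the next cycle.
print(consumers_next_cycle(PE(0, 0), 2, 2))
```

Repeating the array once per cycle and connecting each PE to its `consumers_next_cycle` set gives the kind of time-extended view sketched in the figure.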
What to Map on CGRA and How?
[Figure: a loop's data-flow graph (operations a to g) mapped onto a 2 × 2 CGRA (PEs 1 to 4) over time steps 0 to 3.]
The initiation interval (II), the number of cycles between the starts of successive loop iterations, is the performance metric.
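A quick way to see what II can be achieved is the textbook modulo-scheduling lower bound, the minimum II (MII). The sketch below assumes the usual definition, MII = max(ResMII, RecMII), with ResMII = ceil(operations / PEs); the function names are hypothetical, and this is not the "more accurate MII extraction" that EPIMap itself performs, which the slides do not detail.

```python
def res_mii(num_ops, num_pes):
    """Resource-constrained bound: every PE executes at most one
    operation per cycle, so II >= ceil(num_ops / num_pes)."""
    if num_ops < 1 or num_pes < 1:
        raise ValueError("num_ops and num_pes must be positive")
    return (num_ops + num_pes - 1) // num_pes  # integer ceiling


def min_ii(num_ops, num_pes, rec_mii=1):
    """Textbook lower bound on II. `rec_mii` is the recurrence-constrained
    bound from loop-carried dependence cycles (1 if there are none)."""
    return max(res_mii(num_ops, num_pes), rec_mii)


# Seven operations (a to g) on a 2 x 2 CGRA with four PEs.
print(min_ii(7, 4))
```

With seven operations and four PEs the bound is 2: no mapping can start a new iteration more often than every two cycles, regardless of how placement and routing are done.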
Re-Computation
[Figure: a data-flow graph with operations a to f mapped onto a 2 × 2 CGRA (PEs 1 to 4) over time steps 0 to 3, with operation b duplicated.]
Re-computation can lead to better mapping
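As a rough illustration of re-computation as a graph transformation, the sketch below duplicates one operation so that each copy feeds only part of the original fan-out. The representation (a dict from node to successor list), the function name `recompute`, and the example graph are assumptions made here for illustration; the graph is assumed acyclic, and the slides do not specify how EPIMap chooses which operation to duplicate.

```python
def recompute(dfg, node, moved_successors, copy_name):
    """Return a new DFG in which `copy_name` re-computes `node` and
    takes over the edges from `node` to `moved_successors`."""
    moved = set(moved_successors)
    if copy_name in dfg:
        raise ValueError("copy_name already exists in the graph")
    if not moved <= set(dfg[node]):
        raise ValueError("moved_successors must be successors of node")

    new_dfg = {}
    for src, dsts in dfg.items():
        dsts = list(dsts)
        if node in dsts:
            dsts.append(copy_name)  # the copy needs the same inputs
        new_dfg[src] = dsts
    # Split the fan-out between the original and its copy.
    new_dfg[node] = [d for d in dfg[node] if d not in moved]
    new_dfg[copy_name] = [d for d in dfg[node] if d in moved]
    return new_dfg


# Hypothetical graph: b has four consumers, which is hard to serve from
# a single PE; after duplication each copy of b feeds two of them.
dfg = {"a": ["b"], "b": ["c", "d", "e", "f"],
       "c": [], "d": [], "e": [], "f": []}
print(recompute(dfg, "b", ["e", "f"], "b2"))
```

The transformation adds one operation to the graph, so it trades an extra PE slot for a lower fan-out per operation.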
Re-Computation and Routing
[Figure: a data-flow graph with operations a to f mapped onto a 2 × 2 CGRA (PEs 1 to 4) over time steps 0 to 3, with several copies of operation b.]
Related Works and Contributions
• Several CGRA architectures have been designed
• XPP, PADDI, PipeRench, KressArray, etc. Survey in [Hartenstein 2001]
• Compilers for CGRAs
– EMS [Park 2008], semi-simulated annealing based [Mei 2004], simulated annealing based [Hatanaka 2007, Friedman 2009]
– Use routing to resolve the resource limitation problem
– No existing technique exploits re-computation for mapping
• Contributions of this work
– General problem formulation
  • Re-computation, routing, or both for the resource limitation problem
– Application mapping heuristic: EPIMap
  • More accurate MII extraction
  • Resource-aware routing
  • Efficient placement (Maximum Common Subgraph problem)
  • Use information from unsuccessful attempts for the next mapping
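Routing, as used by the compilers listed above, can be pictured as inserting pass-through nodes on an edge whose producer and consumer are scheduled more than one cycle apart. The following is a minimal sketch of that basic idea only; the function name, the `route_...` node naming, and the schedule format are assumptions, and it does not reproduce EPIMap's resource-aware routing, which the slides do not describe.

```python
def insert_routing_nodes(edges, time):
    """Split every edge (u, v) that spans more than one cycle into a
    chain of routing nodes, one per intermediate cycle.

    edges: list of (producer, consumer) pairs
    time:  dict mapping each operation to its scheduled cycle
    Returns (new_edges, new_time)."""
    new_edges = []
    new_time = dict(time)
    for u, v in edges:
        gap = time[v] - time[u]
        if gap < 1:
            raise ValueError(f"{v} must be scheduled after {u}")
        prev = u
        for step in range(1, gap):
            r = f"route_{u}_{v}_{step}"  # hypothetical naming scheme
            new_time[r] = time[u] + step
            new_edges.append((prev, r))
            prev = r
        new_edges.append((prev, v))
    return new_edges, new_time


# Hypothetical schedule: a -> d spans three cycles, so two routing
# nodes carry the value through cycles 1 and 2.
edges = [("a", "b"), ("a", "d")]
time = {"a": 0, "b": 1, "d": 3}
new_edges, new_time = insert_routing_nodes(edges, time)
print(new_edges)
```

Each routing node occupies a PE slot for one cycle, which is why routing and re-computation both compete for the same limited resources.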
Experimental Setup
• Loops from SPEC2006 and multimedia benchmarks
• 4 × 4 CGRA with enough instruction and data memory
• Shared data bus for each row
• 1-cycle latency and 2 registers per PE
• Compared against EMS [Park 2008] and BCEMS (best among 500 runs)
Mapping Results
[Figure: bar chart of initiation interval (y-axis, 0 to 25) for EPI, EMS, and BCEMS on SOR, Swim_cal1, Swim_cal2, Sobel, lowpass, laplace, forward, wavelet, Bzip2, H.264, Jpeg, Libquantum, Milc, sjeng, and their average.]
The lower the II, the better the performance.
EPIMap's II is 2.8X lower than EMS and 2.2X lower than BCEMS.
EPIMap improves performance on average by 2.8X over EMS.
Achieved II vs. Minimum II
[Figure: bar chart of achieved II relative to MII (y-axis, 0 to 100) for EPI, EMS, and BCEMS on SOR, Swim_cal1, Swim_cal2, Sobel, lowpass, laplace, forward, wavelet, Bzip2, H.264, Jpeg, Libquantum, Milc, Sjeng, and their average.]
The minimum II (MII) may not be achievable.
Relative to MII = (MII / achieved II) × 100%
The higher the value, the closer to the optimum II
92.9%
EPIMap finds MII in 9 out of 14 loops
Reasonable Running time
[Figure: running time in seconds (log scale, 0.001 to 1000); annotated values: 252 and 38.]
Summary
• Accelerators for energy efficiency
• Coarse-grained reconfigurable architecture, a programmable accelerator
• Contributions
– Problem formulation
– Re-computation, routing, or both
– EPIMap
  • Better mappings: 2.8X performance improvement
  • Optimum mapping in 9 out of 14 loops
  • Reasonable compilation time