Presenter: Ming-Shiun Yang
National Sun Yat-sen University
Embedded System Laboratory
2013/01/21
SAGA : SystemC Acceleration on GPU Architectures
Design Automation Conference (DAC), 2012 49th ACM/EDAC/IEEE
Sara Vinco (Italy), Debapriya Chatterjee (USA), Valeria Bertacco (USA), Franco Fummi (Italy)

Abstract
SystemC is a widespread language for HW/SW system simulation and design exploration, and thus a key development platform in embedded system design. However, the growing complexity of SoC designs is having an impact on simulation performance, leading to limited SoC exploration potential, which in turn affects development and verification schedules and time-to-market for new designs. Previous efforts have attempted to parallelize SystemC simulation, targeting both multiprocessors and GPUs. However, for practical designs, those approaches fall far short of satisfactory performance. This paper proposes SAGA, a novel simulation approach that fully exploits the intrinsic parallelism of RTL SystemC descriptions, targeting GPU platforms. By limiting synchronization events with ad-hoc static scheduling and separate independent dataflows, we show that we can simulate complex SystemC descriptions up to 16 times faster than traditional simulators.

What's the problem
Original SystemC simulation:
  Uses a scheduler to dispatch all processes to one core.
  Sequential processing.
The growing complexity of SoC designs is having an impact on simulation performance.

Related Works
[1,4,9,10] Parallel SystemC environment: heavy overhead, code modification
[7] CUDA Programming Guide: general-purpose programming interface
[3] HIFSuite
This paper: mapping SystemC to CUDA

Compute Unified Device Architecture (CUDA)
A programming interface for general-purpose GPU (GP-GPU) computing.
The GPU is a co-processor capable of executing many threads in parallel.
(Figure: NVIDIA CUDA architecture)
Mapping SystemC to CUDA:
SystemC → HIFSuite: sc2hif → HIF file → HIFSuite: hif2C → C file → CUDA

Proposed Method
SAGA:
  Exploits static scheduling to eliminate the need for frequent synchronization.
  Carves out independent dataflows, which are then mapped to distinct threads and processors (parallel execution).
(Figure: Traditional Simulator vs. Proposed Simulator (SAGA))
SystemC → HIFSuite: sc2hif → HIF file → SAGA → modified HIF file → HIFSuite: hif2C → C file → CUDA
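
The idea of running independent dataflows side by side, each in a fixed static order and with no synchronization between them, can be sketched on a CPU. This is only an illustration under stated assumptions: Python worker threads stand in for GPU threads and multiprocessors, and the processes `incr`, `double` and `accumulate` are hypothetical. It is not the CUDA code that SAGA generates.

```python
from concurrent.futures import ThreadPoolExecutor


def run_dataflow(processes, inputs, cycles):
    """Run one dataflow for `cycles` cycles in its fixed, static order.

    Each dataflow works on a private copy of the state, so no locks or
    barriers are needed between dataflows.
    """
    state = dict(inputs)
    for _ in range(cycles):
        for proc in processes:  # static schedule: same order every cycle
            proc(state)
    return state


def simulate(dataflows, inputs, cycles):
    """Run every dataflow on its own worker and collect the final states."""
    with ThreadPoolExecutor(max_workers=len(dataflows)) as pool:
        futures = [pool.submit(run_dataflow, procs, inputs, cycles)
                   for procs in dataflows]
        return [f.result() for f in futures]


# Hypothetical processes, used only for this illustration.
def incr(state):
    state["count"] = state["count"] + 1


def double(state):
    state["twice"] = 2 * state["count"]


def accumulate(state):
    state["acc"] = state.get("acc", 0) + state["x"]


# Two independent dataflows: [incr, double] and [accumulate].
results = simulate([[incr, double], [accumulate]], {"count": 0, "x": 3}, cycles=4)
```

Because each dataflow owns its state, the only synchronization point in the sketch is the final collection of results.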

SAGA methodology – Step 1: Constructing the dependency graph
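
The slide gives no detail on how the dependency graph is built. A minimal sketch follows, assuming (this is not stated in the slides) that a process depends on every other process that writes a signal it reads. The process and signal names are hypothetical.

```python
from collections import defaultdict


def build_dependency_graph(processes):
    """Build a process dependency graph from signal reads and writes.

    processes: dict mapping process name -> (reads, writes), two sets of
               signal names (an assumed input format).
    Returns a dict mapping each process to the sorted list of processes
    whose outputs it reads.
    """
    # Index: signal name -> processes that write it.
    writers = defaultdict(set)
    for name, (_reads, writes) in processes.items():
        for sig in writes:
            writers[sig].add(name)

    graph = {}
    for name, (reads, _writes) in processes.items():
        deps = set()
        for sig in reads:
            # Signals with no writer are primary inputs and add no edge.
            deps |= writers.get(sig, set())
        deps.discard(name)  # ignore self-dependencies
        graph[name] = sorted(deps)
    return graph


# Hypothetical four-process design.
example_processes = {
    "P1": (set(), {"a"}),
    "P2": (set(), {"b"}),
    "P6": ({"a", "b"}, {"c"}),
    "P8": ({"c"}, {"out"}),
}
example_graph = build_dependency_graph(example_processes)
```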

SAGA methodology – Step 2: Partitioning into concurrent dataflows

Example – Step 2
(Figure: queue-based trace starting from P8. While the queue is not empty, a process is popped and added to the current dataflow list: P8 is popped first, then P6, then P7.)
Final current dataflow list: P8 P6 P7 P1 P2 P3 P4
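
The queue-driven traversal in this example can be sketched as a breadth-first walk over the dependency graph. The edges in `GRAPH` are an assumption chosen to reproduce the final list on the slide, since the figure itself is not available; `carve_dataflow` and `replication_counts` are hypothetical helper names.

```python
from collections import deque

# Hypothetical edges: each process maps to the processes it depends on.
GRAPH = {
    "P8": ["P6", "P7"],
    "P6": ["P1", "P2", "P3"],
    "P7": ["P3", "P4"],
    "P1": [], "P2": [], "P3": [], "P4": [],
}


def carve_dataflow(graph, root):
    """Collect the processes reachable from `root`, in breadth-first order."""
    queue = deque([root])
    seen = {root}
    dataflow = []
    while queue:                      # "Queue != Empty"
        proc = queue.popleft()        # pop the next process
        dataflow.append(proc)         # add it to the current dataflow list
        for dep in graph.get(proc, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return dataflow


def replication_counts(graph, roots):
    """Count in how many dataflows each process appears.

    A count above 1 means the process would be replicated so that the
    dataflows stay independent (an assumption based on the 'Replicated
    processes' column of the experimental table).
    """
    counts = {}
    for root in roots:
        for proc in carve_dataflow(graph, root):
            counts[proc] = counts.get(proc, 0) + 1
    return counts


dataflow = carve_dataflow(GRAPH, "P8")
```

With these assumed edges, starting one dataflow at P6 and another at P7 would place P3 in both, which is the situation where replication is needed.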

SAGA methodology – Step 3: Process levelization and scheduling

Example – Step 3
1. Set all leaf nodes to level 0.
2. Set all non-leaf nodes to level -1.
3. If parent level < child level, set parent level = child level + 1.
   e.g. P6's level < P1's level, so P6's level = 0 + 1 = 1, …
(Figure: seven leaf nodes at level 0; the five non-leaf nodes start at -1 and end at levels 1, 1, 1, 2, 2.)
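
The three levelization rules can be sketched as a fixed-point loop. Two points are assumptions rather than something the slide states: the comparison uses `<=` instead of `<`, so that a parent always ends up strictly above every child, and the graph edges are hypothetical. The graph must be acyclic for the loop to terminate.

```python
# Hypothetical graph: each process maps to its children (its dependencies).
GRAPH = {
    "P8": ["P6", "P7"],
    "P6": ["P1", "P2", "P3"],
    "P7": ["P3", "P4"],
    "P1": [], "P2": [], "P3": [], "P4": [],
}


def levelize(graph):
    """Assign a level to every process of an acyclic dependency graph."""
    # Rules 1 and 2: leaves start at level 0, all other nodes at -1.
    level = {n: (0 if not kids else -1) for n, kids in graph.items()}
    changed = True
    while changed:
        changed = False
        for parent, kids in graph.items():
            for child in kids:
                # Rule 3, applied only once the child has a level.
                if level[child] >= 0 and level[parent] <= level[child]:
                    level[parent] = level[child] + 1
                    changed = True
    return level


def schedule(level):
    """Group processes by level; lower levels are evaluated first."""
    by_level = {}
    for name, lvl in level.items():
        by_level.setdefault(lvl, []).append(name)
    return [sorted(by_level[lvl]) for lvl in sorted(by_level)]


levels = levelize(GRAPH)
static_order = schedule(levels)
```

Processes within one level do not depend on each other, so a static schedule only has to respect the order between levels.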

Experimental setup
Column 3: LOC – lines of code
Column 4: Dataflows (#) – number of dataflows produced by the partitioning in step 2
Column 5: Replicated processes / the maximum amount of replication for these processes

SAGA Performance and Speedup
Up to 16 times faster than a traditional SystemC simulator.

Costs of Compilation
HIFSuite: a set of tools and APIs that provide support for modeling and verification of HW/SW systems.
SystemC → HIFSuite: sc2hif → HIF file → SAGA → modified HIF file → HIFSuite: hif2C → C file → CUDA

Conclusion
Proposed a parallel scheduling method for SystemC simulation.
Proposed a novel partitioning technique that carves out independent dataflows, which are mapped to distinct threads and multiprocessors.

My comments
Translating SystemC to CUDA with HIFSuite takes a very long time.
。They expect that a mature version could operate directly on SystemC source code (future work).
This paper is good: it is clearly illustrated.
Experimental results:
。They achieve their goal (reducing the simulation time).
。Many analyses.
。Comparison with other works.