efficient and portable real-time application implementation on heterogeneous mpsoc platforms
TRANSCRIPT
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
1/40
Efficient and Portable
ea - me pp ca on mp emen a on
on Heterogeneous MPSoC Platforms
Gerd Ascheid
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
2/40
Outline
n ro uc on
Case Study : MIMO OFDM Transceiver
Tool Flow
Summary & Outlook
2
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
3/40
Application Specific MPSoC Platforms
SDR Platform
P2012 Platform
Proprietary Platform
3
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
4/40
Application Specific MPSoC Platforms
SDR PlatformPlatform programmability issues:
Portability
Software is portable onto different platforms. _ , ..., _
Loadability
P2012 Platform
Platform is capable of running different standards
Device Standard_1.exe, ..., Standard_n.exe
Proprietary PlatformSpecific mobile platform issue:
Efficiency
Power consumption must be close to power consumption
4
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
5/40
Application Specific MPSoC Platforms
SDR PlatformPlatform programmability issues:
Portability
Software is portable onto different platforms. _ , ..., _
LoadabilityContradicting Requirements !
P2012 Platform
Platform is capable of running different standards
Device Standard_1.exe, ..., Standard_n.exeFlexibility (programmability) vs.Energy Efficiency
Proprietary PlatformSpecific mobile platform issue:
Efficiency
Power consumption must be close to power consumption
5
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
6/40
Energy Efficiency Comparison
CAESAR
Algorithmic Flexibility Cost by Energy Efficiency
~
nJ
1
0.1
.
of magnitude
ionBits/
10-2
~ . or ers
of magnitude
QPSKIRISC-fp(C-code,
hardwaretInform
at
10-3
10-4
64 QAM
fixed pt.)
IRISC(portableCorrec
10-5
C-code,
software fixed pt. emul.)10-6
6
Signal-to-Noise Ratio
0 10 20 3040
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
7/40
Outline
n ro uc on
Case Study: MIMO OFDM Transceiver
Tool Flow
Summary & Outlook
7
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
8/40
Nucleus Methodology
Transceiver
Description
Transceiver DescriptionN 1
N 7 N 5N 2Non N
Tasks
Nuclei N 1 N 2 N 7 NNN 5
Nucleus
Library
Critical, demanding, algorithmic kernel
Kernel is common among different waveforms
o wave orm nor ar ware spec c
PE 2
(rASIP)
PE 3
(DSP)PE1
(ASIP)HW
8
MEM
Comm. Arch.
PE 5(FPGA)
PE 4(GPP)
Platform
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
9/40
Nucleus Methodology
Transceiver
Description
Transceiver DescriptionN 1
N 7 N 5N 2Non N
Tasks
Nuclei N 1 N 2 N 7 NNN 5
Nucleus
Library
NINI
Flavor
PEsPE 2 PE 3 PE 4 PE 5PE 1
PE 2
(rASIP)
PE 3
(DSP)PE1
(ASIP)HW
MEM
Comm. Arch.
PE 5(FPGA)
PE 4(GPP)
Platform
9
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
10/40
Nucleus Methodology
Transceiver
Description
Transceiver DescriptionN 1
N 7 N 5N 2Non N
Tasks
Nuclei N 1 N 2 N 7 NNN 5
Nucleus
Library
Mapping
&
Evaluation
Compile
Board
NI
NINI
PEs
uppor
Package
PE 2PE 1 PE 3 PE 4 PE 5
Flavors
PE 2
(rASIP)
PE 3
(DSP)PE1
(ASIP)HW
MEM
Comm. Arch.
PE 5(FPGA)
PE 4(GPP)
Platform
10
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
11/40
Outline
n ro uc on
Case Study: MIMO OFDM Transceiver
Application analysis - Nuclei identification Mapping on a ST P2012 platform
-
Using a preprocessing ASIP
Tool Flow
11
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
12/40
Nuclei Identification: Transceiver Structure
Outer Modem
IEEE 802.11n
Channel (De-)coding (De-)Interleaving
nner o em
RX OFDM Processing
Channel Estimation
OFDM Slot
Spatial Equalizing: Mitigate channel impact on payload
Soft Demapping: Calculate soft bits (LLRs)
, , ,
12
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
13/40
Nuclei Identification: Kernel Identification
Analyze different algorithmic choices within RX blocks
Identify computational kernels
Operate on data with certain alignment
Build application as composition of kernels
13
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
14/40
Nuclei Identification: Kernel Overview
Application variants consist of a few kernels only!14
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
15/40
Nuclei Identification: Kernel Overview
Wide variety of algorithms is coveredfor key transceiver functions
Spatial Equalization
Channel Coding
15
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
16/40
Frame Error Rate of 4x4 MIMO System
Fix-point issues at low FER
mus e cons ere n mapp ng me o o ogy(not discussed in this presentation)
16
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
17/40
What Describes a Nucleus Library Element?
Flavors and configuration parameters influenceperformance properties
Latency
Throughput
Implementation properties
rea
Memory
Algorithmic properties
Bit error rate
V. Ramakrishnan et al.: Efficient Implementations
From Libraries: Analyzing the Influence of
Properties, IEEE PIMRC Conference 2009
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
18/40
Outline
n ro uc on
Case Study: MIMO OFDM Transceiver
Application analysis - Nuclei identification
Mapping on a ST P2012 platform
-
Using a preprocessing ASIP
Tool Flow
18
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
19/40
Application Implementation:
P2012 Platform (ST Microelectronics)
SoC platform with maximum of 32 clusters
One cluster provides
Max. 16 RISC cores (STxP70) @ 600MHz
VECx vector extension (SIMD)
128 bit vector re isters
8x16 bit or 4x32 bit operations
Hardware synchronizer for inter-core signaling
19
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
20/40
Application Implementation: Kernel Overview
For 2x2 and 4x4 MIMO use case
Cycles for execution on
single STxP70 processorcore including VECX unit
Corresponding time for
600MHz clock frequency
In the range of
IEEE 802.11n real time(4s per OFDM slot)
20
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
21/40
Application-to-Platform Mapping:
Assigning Cores to PGs
Given: Single core timing requirements
Goal: Assi n cores to match real time constraints 4 s er slot
Task time (us) #cores
LS Channel Estimation 17.47
Equalizer Preprocessing 159.36
4
4
Actual Processing (per OFDM slot)
OFDM Demodulation (mem. realign) 6.83
E ualizer Actual Detection 6.08
2
Soft Demapping (64 QAM) 3.78 2
21
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
22/40
Application-to-Platform Mapping:
Assigning Cores
Final mapping includes parallelism on task level Partitioning of components into processing groups
um er o cores per group
8 cores enable real time
PG 2
PP&EQ
4 PEs
PG 1Modulation
2 PEs
Demapping2 PEs
22
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
23/40
Occupation Graph
Implementation on P2012 platform using 8 cores
Minimum latency for 21 or more OFDM slots of data payload
.
IEEE 802.11n allows 16s (including MAC layer)
23
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
24/40
Outline
n ro uc on
Case Study: MIMO OFDM Transceiver
Application analysis - Nuclei identification
Mapping on a ST P2012 platform
-
Using a preprocessing ASIP
Tool Flow
24
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
25/40
TI C6474 Evaluation Board
25
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
26/40
Execution Time 2x2
Component Cycles Time (us) @850MHzPreprocessing Steps
OFDM Demodulation 1,296 1.525
Subcarrier Demapping 1,220 1.435
CP remove 360 0.424
, , .
Detection Preproc. (MIMO 2x2, MMSE, DS) 39,499 46.469
Sum 45,647 53.702
Detection Steps
OFDM Demodulation 648 0.762
Subcarrier Demapping 610 0.359
CP remove 180 0.718
Equalizing - MIMO 984 1.158
. .
Soft Demapping 16QAM (per OFDM sym.) 618 0.727
Soft Demapping 64QAM (per OFDM sym.) 1,062 1.249
Close to real time
requirements (4s)
already with 1 core
Close to real time
requirements (4s)
already with 1 core
26
Sum 3,484 4.099
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
27/40
Outline
n ro uc on
Case Study: MIMO OFDM Transceiver
Application analysis - Nuclei identification
Mapping on a ST P2012 platform
-
Using a preprocessing-optimized ASIP
Tool Flow
27
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
28/40
Issue
Some preprocessing functionsoperate on short vectors and small matrices (4x4)
rregu ar process ng no so su a e or vec or processors
ASIP with register file large enough
to hold all vectors and matrices
erna ve :
Coarse grained reconfigurable ASIP
28
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
29/40
Issue
Some preprocessing functionsoperate on short vectors and small matrices (4x4)
rregu ar process ng no so su a e or vec or processors
ASIP with register file large enough
to hold all vectors and matrices
erna ve :
Coarse grained reconfigurable ASIP
29
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
30/40
Alternative Solution 2: CGRA
Pipelined Coarse Grained Reconfigurable Array (CGRA) Processing Element (PE)
omp ex mu p er
Complex ALU
Shifter PE 2x2
d1 d2 d3 d4
PE 2x2
Local register file
PE 2x2
PE PE
PE PE
PE PE
PE PE
Broadcast inputs PE Chain
PEChainPE PECenterAlpha PE PE
Results from PE 2x2
Center Alpha Unit
S ecial al ha arameter
CGRA
PE PE
PE PE
PE PE
PE PE
in matrix inversion
30
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
31/40
Integration with Processor
LLR
CGRA
on . em a r
Ctrl. Info.
LLR Reg. idx
H, y, N0
conf. bitsRISC Pipeline
PF FE DC WBEX
Wide
Register
FileData
Memory
(128-bit) Mem. Write Req. & Data
Mem. Read Data
. .
31
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
32/40
Results: Area and Throughput
Synthesis results*Component Area / kGE
(Gate Equivalents)
Complex multiplier 13,0
Complex adder 1,1
Processing Element 18,6
CGRA 393
Configuration Memory 43
Processing performance Input: H, y
Output: soft
Throughput: yHIHHx HH 10
N
Speed up > 2 orders of magnitude
(150 Msym/s @ 588 MHz 30 ns/sym)^
* Synopsys Design Compiler Ultra 2010.12-SP2 Faraday 65nm technology,
Standard Performance, typical case library & conditions, 1.00V, 25C
32
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
33/40
Outline
n ro uc on
Case Study: MIMO OFDM Transceiver
Tool Flow
Summary & Outlook
33
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
34/40
Tool Flow: Overview
Flow inputs
Nucleus
Library
User inputs
N1 N2 NN1
Waveform descriptionConfig & constraints
Flow inputs
N3 .cpn.xml .xml
Flavor
Board Support
Package
Library
Platform
Description
Nucleus Mapper MAPS*
.xmlCode Generator
-
Virtual Platform
-
implementation
-
Virtual Platform
34
* For further details see the presentation by Rainer Leupers
Programming Heterogeneous MPSoC Platforms: The MAPS Approach
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
35/40
Tool Flow: Overview (2)
Screenshot of Graphical User Interface:
35
T l Fl M i
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
36/40
Tool Flow: Mapping
Mapping: Finding a flavor for every nucleus Architecture: List ofOption-Graphs that is refined
Exports: Feasible options for further analysis
Flavor
Library
Platform
Description
Provides timing analysis,Provides timing analysis,Provides timing analysis,
Interface Agent
.xml mapp ng an sc e u ngmapp ng an sc e u ngmapp ng an sc e u ng
Option Graphs
(NiFij)
Throughput Agent
Latency AgentMAPS
Mapping and.
36
p on rap
Scheduler descriptor
T l Fl C d G ti
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
37/40
Tool Flow: Code Generation
With option graph and schedule descriptorgenerates code for different platforms
Multi-threaded host
implementation
Generate
avor .
MAPS
CPN -
Virtual PlatformCompiler
Mapper
ISS-based
Virtual Platform
Code generation for target system or simulation platform
37
O tli
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
38/40
Outline
n ro uc on
Case Study: MIMO OFDM Transceiver
Tool Flow
Summary & Outlook
38
Summary & Outlook
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
39/40
Summary & Outlook
Summary
Nucleus methodology represents a systematic design approach
for flexible and efficient transceiver design
Nucleus methodology successfully appliedto a MIMO OFDM transceiver with mapping
onto different MPSoC platforms
Tool support for methodology available
Outlook
xpan concep o o er cr ca process ng unc ons
in portable multimedia devices
39
Basic References
-
7/27/2019 Efficient and Portable Real-Time Application Implementation on Heterogeneous MPSoC Platforms
40/40
Basic References
Ramakrishnan, V., Witte, E. M., Kempf, T., Kammler, D., Ascheid, G., Meyr, H., Adrat, M., and Antweiler, M.:
Efficient and Portable SDR Waveform Development: The Nucleus Concept,
in Proceedings of the IEEE Military Communications Conference (MILCOM) (Boston, USA), pp. 918924, Oct. 2009
Kempf, T., Guenther, D., Ishaque, A., and Ascheid, G.:
MIMO OFDM transceiver for a Many-Core Computing Fabric - A Nucleus based Implementation,
in SDR'11 - The Wireless Innovation Forum Conference on Communications Technologies and
Software Defined Radio (Washington D.C., USA), Dec. 2011
Chen, X., Minwegen, A., Hassan, Y., Kammler, D., Li, S., Kempf, T., Chattopadhyay, A., Ascheid, G., Leupers, R.:
FLEXDET: Flexible, Efficient Multi-Mode MIMO Detection using reconfigurable ASIP,
20th Annual IEEE International Symposium on Field-Programmable Custom Computing Machines 2012
J. Castrillon, S. Schrmans, A. Stulova, W. Sheng, T. Kempf, R. Leupers, G. Ascheid, and H. Meyr,
Component-Based Waveform Development: The Nucleus Tool Flow for Efficient and Portable Software Defined Radio,Journal of Analog Integrated Circuits and Signal Processing, vol. 69, no. 2, pp. 173190, Oct. 2011
, ., , ., , .
MAPS: Mapping Concurrent Dataflow Applications to Heterogeneous MPSoCs,
IEEE Transactions on Industrial Informatics, p. 19, Nov. 2011
40