The importance of memory in the next generation of real-time...
©2017 University of Modena and Reggio Emilia
Future embedded systems
The four horsemen
1. Heavy workloads
– Sensor fusion and image processing
2. Reduced power consumption
– Smaller batteries and renewable power sources
3. Quick interaction with the environment
– Prompt processing of sensor data
4. Running the highest-criticality workloads
– Replacing safety-critical human activities
[Figure: application domains: artificial intelligence, Industry 4.0, Internet of Things, autonomous driving, health and medicine, cyber-physical systems]
IWES @Rome, September 8, 2017 2
Real-Time multi-core systems?
Multi- and many-core platforms are the solution for horsemen 1 and 2 (and, partly, 3)
✓ Climbing "the power wall"
✓ High performance at low power
Real-time system: produces results within a guaranteed (bounded) amount of time
✓ By construction
✓ Application fields: automotive, avionics, industry, medical…
The keyword: predictability
✓ Provide the correct result… when expected
✓ The system must be simple to analyze
Real-Time systems – traditional approach
Single core, multiple tasks/applications
1. Analyze the system (HW/SW)
2. Derive a (mathematical?) model
3. Do some magic mathematics…
…guaranteed timing bounds!
Optimal sharing of the core among tasks
✓ ...and guaranteed by construction
✓ Scheduling (also, mapping)
[Figure: an application (taskset) of tasks T on a single CPU with an L1 $ and a Level-2 $, backed by main memory or L3 cache and off-chip memory (DRAM) through an interface (IF)]
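One textbook example of the "magic mathematics" for a single core is response-time analysis for fixed-priority preemptive scheduling. It iterates R = C_i + Σ ⌈R / T_j⌉ · C_j over the higher-priority tasks j until R reaches a fixed point. The sketch below assumes independent periodic tasks, deadlines equal to periods, and a list sorted by decreasing priority. The task parameters are hypothetical, and the deck does not say which analysis the authors use.

```python
import math
from dataclasses import dataclass


@dataclass
class Task:
    wcet: int    # worst-case execution time C_i (hypothetical units)
    period: int  # period T_i; the deadline is assumed equal to the period


def response_times(tasks):
    """Classic fixed-priority response-time analysis for one core.

    `tasks` must be ordered from highest to lowest priority. Returns, for each
    task, its worst-case response time, or None if the bound exceeds its deadline.
    """
    results = []
    for i, task in enumerate(tasks):
        higher = tasks[:i]
        r = task.wcet
        while True:
            # R = C_i + sum over higher-priority tasks of ceil(R / T_j) * C_j
            r_next = task.wcet + sum(math.ceil(r / hp.period) * hp.wcet for hp in higher)
            if r_next > task.period:
                results.append(None)  # deadline can be missed
                break
            if r_next == r:
                results.append(r)     # fixed point: guaranteed bound
                break
            r = r_next
    return results


if __name__ == "__main__":
    taskset = [Task(1, 4), Task(2, 6), Task(3, 12)]
    print(response_times(taskset))  # [1, 3, 10]
```

The analysis only holds if C_i is a safe bound. On a multi-core, C_i silently absorbs memory interference from the other cores.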
Multi-core systems
Architectural bottlenecks
✓ Shared memory banks
✓ Caches ($)
✓ I/Os
[Figure: four CPUs (0–3), each running tasks T with a private L1 $, sharing a Level-2 $, main memory or L3 cache, and off-chip memory through an interface (IF)]
It's (mainly) a memory issue!
Beyond traditional techniques
1. More parameters
– Shared resources (e.g., memory, SSDs, I/Os, caches...)
– The complexity of the analysis grows exponentially with the number of cores
2. Memory accesses: instead of thin lines, big bars
– Memory is the most accessed resource in the system
– Traditional techniques are too conservative (the bounds are too high)
[Figure: memory accesses (MEM) of tasks T drawn as bars, shown against the number of cores (8, 4, 2, 1)]
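To see why bounds become too conservative, consider a deliberately simple model. This is an assumption for illustration, not the analysis used in the deck. A task performs N memory accesses, each taking L ns in isolation. A round-robin arbiter may make every access wait behind one access from each other core. Under these assumptions the memory term grows linearly with the core count, even if the other cores rarely touch memory in practice.

```python
def wcet_bound_ns(compute_ns: float, accesses: int, latency_ns: float, cores: int) -> float:
    """Pessimistic WCET bound under a hypothetical round-robin memory arbiter.

    Each of the task's `accesses` may wait for one access from each of the
    other (cores - 1) cores before its own access is served.
    """
    if cores < 1:
        raise ValueError("need at least one core")
    per_access = latency_ns * cores  # own access + (cores - 1) interfering ones
    return compute_ns + accesses * per_access


if __name__ == "__main__":
    # Hypothetical task: 100 us of computation and 10,000 memory accesses of 60 ns.
    for m in (1, 2, 4, 8):
        print(m, wcet_bound_ns(100_000, 10_000, 60.0, m))
```

With these made-up numbers the bound grows from 0.7 ms on one core to 4.9 ms on eight, and the memory term dominates.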
Many-core systems
✓ Thousands of cores arranged in CLUSTERS
✓ Host-accelerator architecture (e.g., GP-GPUs)
✓ ...even worse!
[Figure: a general-purpose host (CPUs with L1 $, MMU and L2 $ on a coherent interconnect) and a many-core accelerator (clusters of CPUs, each with L1 MEM and DMA, plus L2 MEM and a network interface), both connected to main memory; annotated with 100s and 1000s of cores]
Knowledge of the platform is power
Two motivating examples
✓ Both from real systems
1. Many-core accelerator-based platforms
– Quad-/octa-core host
– Integrated GPU (iGPU) or FPGA
– Powerful enough to run neural networks
2. Reference industrial system
– Multi-core ARM
– Multi-OS (embedded Linux + Windows for the UI)
– Hypervisor-based
Testbed #1: "automotive" platforms
Qualitatively analyze and characterize the conflicts caused by parallel accesses to main memory from both the CPU cores and the iGPU
1. NVIDIA Tegra K1 w/Kepler GPU
2. NVIDIA Tegra X1 w/Maxwell GPU
3. NVIDIA Tegra X2 ("Parker") w/Pascal GPU – automotive-grade
4. Intel i7-6700 w/Intel integrated GPU
5. Xilinx Zynq UltraScale multi-core + FPGA (+GPU)
Roberto Cavicchioli, Nicola Capodieci and Marko Bertogna, "Memory Interference Characterization between CPU cores and integrated GPUs in Mixed-Criticality Platforms", 22nd IEEE International Conference on Emerging Technologies and Factory Automation
NVIDIA Tegra K2
✓ Shared memory between the CPU and GPU complexes
– "Unified Virtual Memory"
– Unlike traditional "discrete" GPU systems
Notable contention points: [Figure: block diagram with contention point 1 marked]
Test 'A' - Tegra X2 – A57
Xilinx Zynq Ultrascale
✓ Latest-generation FPGA-based heterogeneous SoC
– FPGA = (re-)programmability
✓ Quad-core ARM A53 as host ("PS", processing system)
✓ FPGA as accelerator ("PL", programmable logic)
Notable contention points: [Figure: block diagram with contention point 1 marked]
Test 'A' - Xilinx Zynq
Test 'B' - Tegra X2
Test 'B' - Xilinx Ultrascale
Test 'C' - Tegra X2 – A57
Test 'C' - Xilinx Ultrascale
Prefetching
✓ Interfere with the prefetching mechanism
✓ Interfering cores read strided, increasing addresses
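Below is a sketch of how such an interfering pattern can be generated. The addresses increase monotonically, but with a stride much larger than a cache line, so every read lands on a new line. On prefetchers that stop at page boundaries, these reads may not be covered by a prefetch. The 64-byte line and 4 KiB stride are assumed values; the deck does not give the parameters actually used.

```python
LINE_SIZE = 64   # assumed cache-line size in bytes
STRIDE = 4096    # assumed stride: one access per 4 KiB page


def sequential_addresses(base: int, count: int, step: int = 4) -> list[int]:
    """Consecutive 4-byte reads: the pattern a stream prefetcher handles best."""
    return [base + i * step for i in range(count)]


def strided_addresses(base: int, count: int, stride: int = STRIDE) -> list[int]:
    """Increasing addresses with a large stride: every read lands on a new line."""
    return [base + i * stride for i in range(count)]


def lines_touched(addresses, line_size: int = LINE_SIZE) -> int:
    """Number of distinct cache lines a pattern touches."""
    return len({addr // line_size for addr in addresses})


if __name__ == "__main__":
    print(lines_touched(sequential_addresses(0, 64)))  # 4 lines for 256 bytes
    print(lines_touched(strided_addresses(0, 64)))     # 64 lines, one per read
```

For the same number of reads, the strided pattern forces sixteen times as many line fills as the sequential one.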
Testbed #2: industrial platform
✓ NXP i.MX 6 from Egicon
– Components for F1 teams, industrial telescopic arms
– Credits to Francesco Bellei
iMX6 memory hierarchy
✓ More "traditional"
[Figure: cores 1–4, each with a private L1 cache (fast), sharing an L2 cache and main memory (RAM, slow)]
Memory latency - sequential (ns)
[Figure: "Sequential access": latency in nanoseconds (0–300) vs. working set in KB (1 to 16384), without interference and with interference from 1, 2 and 3 cores]
$$\text{Max BW} = \frac{\text{Cache line size}}{\text{Lat}} = \frac{32\ \text{bytes}}{62.8\ \text{ns}} \approx 0.5\ \text{GB/s}$$
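The bandwidth figure follows from dividing the cache-line size by the measured latency. As we read the slide, this assumes one line is transferred per latency period, with no overlapping misses. The helper below is a minimal sketch of that arithmetic.

```python
def max_bandwidth_gb_per_s(line_bytes: float, latency_ns: float) -> float:
    """Bandwidth if exactly one cache line is fetched per memory latency.

    Bytes per nanosecond equals (decimal) gigabytes per second.
    """
    return line_bytes / latency_ns


if __name__ == "__main__":
    # Values from the slide: 32-byte cache lines, 62.8 ns measured latency.
    print(f"{max_bandwidth_gb_per_s(32, 62.8):.2f} GB/s")  # 0.51 GB/s
```

The same relation shows why interference matters: any latency added by other cores lowers this ceiling proportionally.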
Memory interference impact
[Figure: "Random vs Sequential": latency in nanoseconds (0–600) vs. working set (1 to 16384) for sequential and random reads, each without and with interference; annotations mark 3 random and 3 sequential interferers and the L1 $, L2 $ and memory (Mem) regions]
What do we do with this knowledge?
Single-core equivalence
✓ A set of techniques to turn the software's view of the system...
[Figure, "From this...": CPU 0 and CPU 1 sharing a $ and RAM through an interconnect]
...into this:
[Figure: CPU 0 and CPU 1, each seeing its own $ and RAM through the interconnect]
Techniques: cache coloring/partitioning; Time Division Multiple Access (TDMA); multi-port memory with bank partitioning
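Two of these techniques reduce to simple arithmetic, sketched below under stated assumptions.

- **Page coloring.** The set-index bits above the page offset define a page's "color". Pages of different colors map to disjoint sets of a physically indexed cache, so giving each core its own colors partitions a shared cache.
- **TDMA.** Each core owns a fixed slot in a repeating frame. A core's wait therefore depends only on the frame layout, never on what the other cores do.

The 1 MiB 16-way cache, 4 KiB pages, slot length and "start only at your own slot" rule are all illustrative assumptions.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def num_colors(cache_size: int, ways: int, page_size: int = PAGE_SIZE) -> int:
    """Page colors of a physically indexed set-associative cache.

    A color is the part of the set index that lies above the page offset, so
    the count is the size of one way divided by the page size.
    """
    way_size = cache_size // ways
    return max(1, way_size // page_size)


def page_color(paddr: int, colors: int, page_size: int = PAGE_SIZE) -> int:
    """Color of the physical page that contains `paddr`."""
    return (paddr // page_size) % colors


def colors_for_core(core: int, cores: int, colors: int) -> list[int]:
    """Give each core a disjoint, contiguous range of colors (static partitioning)."""
    per_core = colors // cores  # leftover colors, if any, stay unassigned
    return list(range(core * per_core, (core + 1) * per_core))


def tdma_wait(t: int, core: int, cores: int, slot: int) -> int:
    """Delay before `core` may start a memory access under TDMA.

    Assumes accesses start only at the beginning of the core's own slot in a
    repeating frame of `cores` slots; the wait never depends on other cores.
    """
    frame = cores * slot
    since_own_start = (t - core * slot) % frame
    return 0 if since_own_start == 0 else frame - since_own_start


if __name__ == "__main__":
    colors = num_colors(1 << 20, 16)               # assumed 1 MiB, 16-way cache -> 16 colors
    print(colors, page_color(0x12345000, colors))  # 16 5
    print(colors_for_core(1, 4, colors))           # [4, 5, 6, 7]
    print(max(tdma_wait(t, 2, 4, 10) for t in range(200)))  # 39: less than one frame
```

Coloring is typically enforced by the OS or hypervisor page allocator, which hands each core only pages of its colors. The TDMA bound is tight but wasteful, because slots stay idle even when their owner has nothing to fetch.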
PREM - PRedictable Execution Models
✓ Group memory accesses at the beginning of every software task
✓ Co-schedule memory accesses and task-to-core mapping
✓ Greatly reduces the complexity of the scheduling problem
…and increases performance
Up to 4x predictable performance on a many-core platform
2015 paper @ RTEST
[Figure: non-PREM tasks T with memory accesses (MEM) spread throughout execution, vs. PREM tasks split into memory (M) and compute (C) phases, with M phases arbitrated by a memory scheduler]
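Below is a toy model of the PREM idea, built on our own simplifying assumptions:

- Each task has a memory phase M (fetching its data into local memory) followed by a compute phase C that touches only local memory.
- There is one task per core.
- A memory scheduler grants memory to one M phase at a time, in a fixed, hypothetical order, while C phases run in parallel.

Because M phases never overlap, each lasts exactly its isolated duration, which is what makes the schedule analyzable.

```python
from dataclasses import dataclass


@dataclass
class PremTask:
    name: str
    mem: int      # memory (M) phase length, measured in isolation
    compute: int  # compute (C) phase length, uses only local memory


def prem_schedule(tasks):
    """Serialize M phases in list order; each C phase starts right after its M phase.

    Returns (name, m_start, m_end, c_end) per task, one task per core.
    """
    timeline = []
    bus_free = 0  # time at which the memory becomes available again
    for task in tasks:
        m_start = bus_free
        m_end = m_start + task.mem
        bus_free = m_end  # the next M phase must wait: no overlapping accesses
        timeline.append((task.name, m_start, m_end, m_end + task.compute))
    return timeline


if __name__ == "__main__":
    tasks = [PremTask("t0", 2, 10), PremTask("t1", 3, 8), PremTask("t2", 1, 12)]
    schedule = prem_schedule(tasks)
    for row in schedule:
        print(row)
    print("makespan:", max(row[3] for row in schedule))  # makespan: 18
```

A real PREM scheduler would also choose the order of M phases and the task-to-core mapping (the "co-scheduling" on the slide), rather than taking them as fixed.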
Backup
Test case A – intra-CPU interference
1. One observed core reads sequentially within a variable-sized working set, while the other cores interfere sequentially
2. The observed core reads randomly within a variable-sized working set, while the other cores interfere sequentially
3. The observed core reads sequentially within a variable-sized working set, while the other cores interfere randomly
4. The observed core reads randomly within a variable-sized working set, while the other cores interfere randomly
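A common way to implement such sequential and random reads is a pointer-chasing loop: each element stores the index of the next one, so every load depends on the previous one. The deck does not show its benchmark code, so the sketch below assumes this approach and only builds the index chains. It produces a sequential ring and a random single-cycle permutation (Sattolo's algorithm), so each lap visits every element of the working set exactly once. The 64-byte element size is an assumption. Actual timing would be done in a compiled language on the target.

```python
import random


def sequential_chain(n: int) -> list[int]:
    """next[i] = i + 1 (wrapping): a sequential walk over n elements."""
    return [(i + 1) % n for i in range(n)]


def random_chain(n: int, seed: int = 0) -> list[int]:
    """Random single-cycle permutation (Sattolo's algorithm) over n elements."""
    rng = random.Random(seed)
    chain = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i; excluding i is what yields a single cycle
        chain[i], chain[j] = chain[j], chain[i]
    return chain


def walk(chain: list[int], steps: int, start: int = 0) -> list[int]:
    """Indices visited by the pointer chase; on hardware each step is a dependent load."""
    visited, i = [], start
    for _ in range(steps):
        visited.append(i)
        i = chain[i]
    return visited


def elements_for(working_set_kb: int, element_bytes: int = 64) -> int:
    """Elements needed to cover a working set, one element per assumed 64 B line."""
    return working_set_kb * 1024 // element_bytes


# The four configurations of test case A: (observed core, interfering cores).
TEST_CASES_A = [(observed, interfering)
                for interfering in ("sequential", "random")
                for observed in ("sequential", "random")]

if __name__ == "__main__":
    n = elements_for(16)                 # 16 KB working set -> 256 elements
    lap = walk(random_chain(n, seed=1), n)
    print(n, len(set(lap)) == n)         # 256 True: every element visited once per lap
    print(walk(sequential_chain(4), 6))  # [0, 1, 2, 3, 0, 1]
```

Sweeping `working_set_kb` from 1 to 16384 reproduces the x-axis of the latency plots. This is how the L1 $, L2 $ and memory regions show up as steps in the curves.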
NVIDIA Tegra family (TK1, TX1/2)
✓ Shared memory between the CPU and GPU complexes
– "Unified Virtual Memory"
Notable contention points: [Figure: block diagrams of TK1 and TX1/2 with contention point 1 marked]
Intel i7-6700 Skylake
✓ Powerful x86_64 host + iGPU
– Sharing the L3 $, external DRAM…
Notable contention points: [Figure: block diagram with contention point 1 marked]
Tegra X1
Tegra X2 - Denver
Test case B – iGPU interference on CPU
1. One CPU core reads sequentially within a variable-sized working set, while the GPU accesses memory according to different paradigms:
– CUDA memcpy
– CUDA memcpy on UVM
– CUDA memcpy on pinned memory
– CUDA memset (0)
2. Same, but the CPU core reads randomly
Tegra X1
Test case C – CPU interference on iGPU
1. The CPU generates sequential interfering memory accesses, while the GPU accesses memory according to different paradigms:
– CUDA memcpy
– CUDA memcpy on UVM
– CUDA memcpy on pinned memory
– CUDA memset (0)
2. Same, but the CPU interference is random
Tegra X1
Tegra X2 - Denver
Test 'C' - Intel i7-6700