understanding a problem in multicore and how to solve it from onur mutlu carnegie mellon university
TRANSCRIPT
![Page 1: Understanding a Problem in Multicore and How to Solve It From Onur Mutlu Carnegie Mellon University](https://reader036.vdocument.in/reader036/viewer/2022072005/56649ccc5503460f94995e36/html5/thumbnails/1.jpg)
Understanding a Problem in Multicore and How to Solve It
From Onur MutluCarnegie Mellon University
![Page 2: Understanding a Problem in Multicore and How to Solve It From Onur Mutlu Carnegie Mellon University](https://reader036.vdocument.in/reader036/viewer/2022072005/56649ccc5503460f94995e36/html5/thumbnails/2.jpg)
An Example: Multi-Core Systems
2
CORE 1
L2 C
AC
HE
0
SH
AR
ED
L3 C
AC
HE
DR
AM
INT
ER
FA
CE
CORE 0
CORE 2 CORE 3L
2 CA
CH
E 1
L2 C
AC
HE
2
L2 C
AC
HE
3
DR
AM
BA
NK
S
Multi-CoreChip
*Die photo credit: AMD Barcelona
DRAM MEMORY CONTROLLER
![Page 3: Understanding a Problem in Multicore and How to Solve It From Onur Mutlu Carnegie Mellon University](https://reader036.vdocument.in/reader036/viewer/2022072005/56649ccc5503460f94995e36/html5/thumbnails/3.jpg)
Unexpected Slowdowns in Multi-Core
3
Memory Performance HogLow priority
High priority
(Core 0) (Core 1)
Moscibroda and Mutlu, “Memory performance attacks: Denial of memory service in multi-core systems,” USENIX Security 2007.
![Page 4: Understanding a Problem in Multicore and How to Solve It From Onur Mutlu Carnegie Mellon University](https://reader036.vdocument.in/reader036/viewer/2022072005/56649ccc5503460f94995e36/html5/thumbnails/4.jpg)
A Question or Two Can you figure out why there is a disparity in
slowdowns if you do not know how the processor executes the programs?
Can you fix the problem without knowing what is happening “underneath”?
4
![Page 5: Understanding a Problem in Multicore and How to Solve It From Onur Mutlu Carnegie Mellon University](https://reader036.vdocument.in/reader036/viewer/2022072005/56649ccc5503460f94995e36/html5/thumbnails/5.jpg)
5
Why the Disparity in Slowdowns?
CORE 1 CORE 2
L2 CACHE
L2 CACHE
DRAM MEMORY CONTROLLER
DRAM
Bank 0
DRAM
Bank 1
DRAM
Bank 2
Shared DRAMMemory System
Multi-CoreChip
unfairnessINTERCONNECT
matlab gcc
DRAM
Bank 3
![Page 6: Understanding a Problem in Multicore and How to Solve It From Onur Mutlu Carnegie Mellon University](https://reader036.vdocument.in/reader036/viewer/2022072005/56649ccc5503460f94995e36/html5/thumbnails/6.jpg)
DRAM Bank Operation
6
Row Buffer
(Row 0, Column 0)
Row
dec
oder
Column mux
Row address 0
Column address 0
Data
Row 0Empty
(Row 0, Column 1)
Column address 1
(Row 0, Column 85)
Column address 85
(Row 1, Column 0)
HITHIT
Row address 1
Row 1
Column address 0
CONFLICT !
Columns
Row
s
Access Address:
![Page 7: Understanding a Problem in Multicore and How to Solve It From Onur Mutlu Carnegie Mellon University](https://reader036.vdocument.in/reader036/viewer/2022072005/56649ccc5503460f94995e36/html5/thumbnails/7.jpg)
7
DRAM Controllers
A row-conflict memory access takes significantly longer than a row-hit access
Current controllers take advantage of the row buffer
Commonly used scheduling policy (FR-FCFS) [Rixner 2000]*
(1) Row-hit first: Service row-hit memory accesses first(2) Oldest-first: Then service older accesses first
This scheduling policy aims to maximize DRAM throughput
*Rixner et al., “Memory Access Scheduling,” ISCA 2000.*Zuravleff and Robinson, “Controller for a synchronous DRAM …,” US Patent 5,630,096, May 1997.
![Page 8: Understanding a Problem in Multicore and How to Solve It From Onur Mutlu Carnegie Mellon University](https://reader036.vdocument.in/reader036/viewer/2022072005/56649ccc5503460f94995e36/html5/thumbnails/8.jpg)
8
The Problem Multiple threads share the DRAM controller DRAM controllers designed to maximize DRAM
throughput
DRAM scheduling policies are thread-unfair Row-hit first: unfairly prioritizes threads with high row buffer
locality Threads that keep on accessing the same row
Oldest-first: unfairly prioritizes memory-intensive threads
DRAM controller vulnerable to denial of service attacks Can write programs to exploit unfairness
![Page 9: Understanding a Problem in Multicore and How to Solve It From Onur Mutlu Carnegie Mellon University](https://reader036.vdocument.in/reader036/viewer/2022072005/56649ccc5503460f94995e36/html5/thumbnails/9.jpg)
// initialize large arrays A, B
for (j=0; j<N; j++) { index = rand(); A[index] = B[index]; …}
9
A Memory Performance Hog
STREAM
- Sequential memory access - Very high row buffer locality (96% hit rate)- Memory intensive
RANDOM
- Random memory access- Very low row buffer locality (3% hit rate)- Similarly memory intensive
// initialize large arrays A, B
for (j=0; j<N; j++) { index = j*linesize; A[index] = B[index]; …}
streaming random
Moscibroda and Mutlu, “Memory Performance Attacks,” USENIX Security 2007.
![Page 10: Understanding a Problem in Multicore and How to Solve It From Onur Mutlu Carnegie Mellon University](https://reader036.vdocument.in/reader036/viewer/2022072005/56649ccc5503460f94995e36/html5/thumbnails/10.jpg)
10
What Does the Memory Hog Do?
Row Buffer
Row
dec
oder
Column mux
Data
Row 0
T0: Row 0
Row 0
T1: Row 16
T0: Row 0T1: Row 111
T0: Row 0T0: Row 0T1: Row 5
T0: Row 0T0: Row 0T0: Row 0T0: Row 0T0: Row 0
Memory Request Buffer
T0: STREAMT1: RANDOM
Row size: 8KB, cache block size: 64B128 (8KB/64B) requests of T0 serviced before T1
Moscibroda and Mutlu, “Memory Performance Attacks,” USENIX Security 2007.
![Page 11: Understanding a Problem in Multicore and How to Solve It From Onur Mutlu Carnegie Mellon University](https://reader036.vdocument.in/reader036/viewer/2022072005/56649ccc5503460f94995e36/html5/thumbnails/11.jpg)
Now That We Know What Happens Underneath How would you solve the problem?
What is the right place to solve the problem? Programmer? System software? Compiler? Hardware (Memory controller)? Hardware (DRAM)? Circuits?
Two other goals of this course: Make you think critically Make you think broadly
11
Microarchitecture
ISA (Architecture)
Program/Language
Algorithm
Problem
Logic
Circuits
Runtime System(VM, OS, MM)
Electrons