Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature
![Page 1: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/1.jpg)
Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature
Mrinmoy Ghosh
Ripal Nathuji
Min Lee
Karsten Schwan
Hsien-Hsin S. Lee
ARM, Microsoft Research, Georgia Tech
![Page 2: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/2.jpg)
Cache Interference in “Concurrent Processes”
[Diagram: Core A and Core B, each with an L1 cache, above a shared L2 cache, with processes P1 and P2. Callouts: P1 $ Line, P2 $ Line, Line Hit !!!, Conflict !!!]
![Page 3: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/3.jpg)
Cache Interference Effect (Concurrent Processes)
Maximum performance degradation less than 10%
[Figure: bar chart of Relative Run Time (y-axis, 0.96 to 1.10) for perlbench, gobmk, hmmer, soplex, povray, omnetpp, mcf, libquantum, astar, bwaves, sphinx3, and xalancbmk. Bars are annotated with benchmark names (mcf, libq, perl).]
![Page 4: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/4.jpg)
Cache Interference in “Shared Cache Multi-Core”
[Diagram: Core A and Core B, each with an L1 cache, above a shared L2 cache, with processes P1 and P2. Callouts: P1 $ Line, P2 $ Line, Conflict !!!]
![Page 5: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/5.jpg)
Cache Interference Effect (Shared Cache Multi-Core)
Performance degraded by as much as 65%
[Figure: bar chart of Relative Run Times (y-axis, 0.0 to 1.8) for perlbench, gobmk, hmmer, soplex, povray, omnetpp, mcf, libquantum, astar, bwaves, sphinx3, and xalancbmk. Bars are annotated with benchmark names (lbm, libq, bwaves, mcf, soplex).]
Intelligent Process Management Needed !!
![Page 6: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/6.jpg)
Process (In-)Compatibility in Multi-Cores
• Problem
– Processes in different cores can be incompatible
– Shared resource contention
• Observation
– Less contention of incompatible processes when running on the same core
• Insight
– Process incompatibility severely affects performance
– Compatibility-based scheduling increases throughput
![Page 7: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/7.jpg)
Ideas
• Use Counting Bloom Filter to record memory access signature
• Compatibility test using signature
![Page 8: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/8.jpg)
Insertion: Counting Bloom Filter
[Diagram: N-bit data address A passes through two N-to-m hash functions, X and Y. Each selects a counter, and the two corresponding presence bits are 1.]
![Page 9: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/9.jpg)
Insertion: Counting Bloom Filter
[Diagram: N-bit data address B passes through hash functions X and Y. Three presence bits are now 1, and one counter reads 2.]
![Page 10: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/10.jpg)
Deletion: Counting Bloom Filter
[Diagram: data address A was evicted. Its address passes through hash functions X and Y to update the counters; two presence bits remain 1, and counter values 1 and 2 are shown.]
![Page 11: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/11.jpg)
Query: Counting Bloom Filter
[Diagram: a query for data address A passes through hash functions X and Y. One of the selected presence bits is 0.]
Data Not Present !!!
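The insertion, deletion, and query steps on the preceding slides can be sketched in a few lines of Python. This is a minimal illustration, not the hardware design from the talk: the class name, the filter size `m = 8`, and the two stand-in hash functions are assumptions (the slides only name the hashes X and Y).

```python
class CountingBloomFilter:
    """Counting Bloom filter with m counters and two hash functions."""

    def __init__(self, m):
        self.m = m
        self.counters = [0] * m

    def _indexes(self, address):
        # ASSUMPTION: simple stand-in hashes; the slides only call them X and Y.
        x = address % self.m
        y = (address // self.m) % self.m
        # A set, so one address never bumps the same counter twice.
        return {x, y}

    def insert(self, address):
        for i in self._indexes(address):
            self.counters[i] += 1

    def delete(self, address):
        # Called when the line for this address is evicted.
        for i in self._indexes(address):
            if self.counters[i] > 0:
                self.counters[i] -= 1

    def presence_bits(self):
        # A presence bit is 1 exactly when its counter is non-zero.
        return [1 if c > 0 else 0 for c in self.counters]

    def query(self, address):
        # False means "definitely not present"; True means "possibly present".
        return all(self.counters[i] > 0 for i in self._indexes(address))


cbf = CountingBloomFilter(m=8)
A, B = 19, 11            # hypothetical addresses that share one counter
cbf.insert(A)            # counters 2 and 3 become 1
cbf.insert(B)            # counter 3 becomes 2, counter 1 becomes 1
cbf.delete(A)            # counter 3 drops to 1, counter 2 drops to 0
print(cbf.query(A))      # False: data not present
print(cbf.query(B))      # True
```

Counters are what make deletion safe: evicting A clears only the bit that A alone had set, while the counter shared with B stays non-zero.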
![Page 12: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/12.jpg)
Bloom Filter Signatures vs. Cache Footprint
Strong Correlation !!!
[Figure: plot relating Signature Value to Cache Footprint; axis ranges 0 to 3500 and 0 to 1700.]
![Page 13: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/13.jpg)
Architectural Support
![Page 14: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/14.jpg)
Bloom Filter Signature Multi-Core Architecture
[Diagram: Core A and Core B, each with an L1 cache, a Core Filter, and a Last Filter, above a shared L2 cache with Bloom Filter Counters.]
![Page 15: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/15.jpg)
Bloom Filter Signature Multi-Core Architecture
[Diagram: the same architecture (Core A and Core B, each with an L1 cache, a Core Filter, and a Last Filter; shared L2 cache with Bloom Filter Counters), now with processes P1, P2, and P3.]
![Page 16: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/16.jpg)
Metric for Execution State
[Diagram: the Last Filter and the Core Filter are combined into the RBV (Running Bit Vector). The Occupancy Weight is the number of 1s in the RBV.]
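As a rough sketch, the occupancy weight is a population count over the RBV. The transcript does not preserve how the Last Filter and Core Filter are combined, so the rule below (keep bits set in the Core Filter but not in the Last Filter) is an assumption, as are the function names and the sample bit vectors.

```python
def running_bit_vector(core_filter, last_filter):
    # ASSUMPTION: the RBV keeps bits that are set in the Core Filter
    # but were not set in the Last Filter. The slide does not show the operator.
    return [c & (1 - l) for c, l in zip(core_filter, last_filter)]


def occupancy_weight(rbv):
    # From the slide: the occupancy weight is the number of 1s in the RBV.
    return sum(rbv)


core_filter = [1, 1, 0, 1, 1, 0, 1, 0]   # hypothetical presence bits
last_filter = [1, 0, 0, 0, 1, 0, 0, 0]   # hypothetical earlier snapshot
rbv = running_bit_vector(core_filter, last_filter)
print(rbv)                    # [0, 1, 0, 1, 0, 0, 1, 0]
print(occupancy_weight(rbv))  # 3
```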
![Page 17: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/17.jpg)
Interference Metric (Complement of Symbiosis)
[Diagram: a process pool (processes waiting to be scheduled: Proc0, Proc1, Proc2, ...). The RBV of Proc1 is combined with a Core Filter. In the example, Symbiosis = 5 and Interference Metric = N - 5.]
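The relation between the two metrics can be sketched as follows. Two things here are assumptions rather than facts from the slide: the combining operator (the transcript shows only a "+" symbol, read here as bitwise XOR) and the meaning of N (taken here as the filter length in bits). The bit vectors are hypothetical and chosen so that the symbiosis comes out as 5, matching the slide's example.

```python
def symbiosis(rbv, core_filter):
    # ASSUMPTION: combine with bitwise XOR, then count the 1s.
    return sum(a ^ b for a, b in zip(rbv, core_filter))


def interference_metric(rbv, core_filter):
    # From the slide: interference is the complement of symbiosis, N - symbiosis.
    # ASSUMPTION: N is the number of bits in the filter.
    return len(rbv) - symbiosis(rbv, core_filter)


proc1_rbv   = [1, 0, 1, 0, 1, 0, 1, 0]   # hypothetical RBV of a waiting process
core_filter = [0, 1, 0, 1, 1, 0, 1, 1]   # hypothetical Core Filter of a candidate core
print(symbiosis(proc1_rbv, core_filter))            # 5
print(interference_metric(proc1_rbv, core_filter))  # 3, i.e. N - 5 with N = 8
```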
![Page 18: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/18.jpg)
Process-to-Core
Mapping Algorithms
• A1: Use Occupancy Weight
• A2: Use Interference Graph
• A3: Use Weighted Interference Graph
![Page 19: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/19.jpg)
A1: Weight Sorted Algorithm
• Sort all processes according to occupancy weight
• Processes form groups using sorted weight
– # of processes in a group = Processes/Cores
• Map processes to cores based on sorting results
[Diagram: processes listed with their occupancy weights (P0 = 100, P4 = 99, P2 = 70, P5 = 65, P6 = 43, P3 = 20, P1 = 15) and mapped onto Core A, Core B, Core C, and Core D, each with an L1 cache.]
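A minimal sketch of the three steps above, using the occupancy weights from the slide. The slide does not spell out how sorted processes are cut into groups, so two choices here are assumptions: neighbours in the sorted order share a core, and the group size is rounded up when the process count is not a multiple of the core count. The function name is hypothetical.

```python
import math


def weight_sorted_mapping(weights, cores):
    """Sort by occupancy weight, cut into equal groups, assign one group per core."""
    # Step 1: sort processes by occupancy weight, heaviest first.
    ordered = sorted(weights, key=weights.get, reverse=True)
    # Step 2: group size = processes / cores (ASSUMPTION: rounded up).
    group_size = math.ceil(len(ordered) / len(cores))
    # Step 3: consecutive processes in sorted order share a core (ASSUMPTION).
    return {
        core: ordered[i * group_size:(i + 1) * group_size]
        for i, core in enumerate(cores)
    }


weights = {"P0": 100, "P4": 99, "P2": 70, "P5": 65, "P6": 43, "P3": 20, "P1": 15}
mapping = weight_sorted_mapping(weights, ["A", "B", "C", "D"])
print(mapping)
# {'A': ['P0', 'P4'], 'B': ['P2', 'P5'], 'C': ['P6', 'P3'], 'D': ['P1']}
```

Under this grouping the two heaviest processes time-share one core and so never occupy the shared cache at the same moment, which is in line with the earlier observation that incompatible processes contend less on the same core.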
![Page 20: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/20.jpg)
A2: Interference Graph Algorithm
• Form interference graph using interference metric
• Find MAX-CUT of the graph
P0 (was in CA): CA=20, CB=30
P1 (was in CA): CA=10, CB=45
P2 (was in CB): CA=40, CB=25
P3 (was in CB): CA=15, CB=50
[Diagram: interference graph with nodes P0(A), P1(A), P2(B), P3(B); the values 30 and 40 are placed on the graph.]
![Page 21: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/21.jpg)
A2: Interference Graph Algorithm
[Diagram: same processes and CA/CB values as on the previous slide; the interference graph on P0(A), P1(A), P2(B), P3(B) now shows an edge weight of 70.]
![Page 22: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/22.jpg)
A2: Interference Graph Algorithm
[Diagram: same processes and CA/CB values; the complete interference graph on P0(A), P1(A), P2(B), P3(B) with edge weights 70, 60, 30, 75, 45, and 85.]
![Page 23: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/23.jpg)
A2: Interference Graph Algorithm
[Diagram: the complete interference graph on P0(A), P1(A), P2(B), P3(B) with edge weights 70, 60, 30, 75, 45, and 85, followed by the graph redrawn after MAX-CUT with nodes P1(A), P3(B), P0(A), P2(B) and the edge weights 85 and 45.]
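The graph construction can be sketched from the CA/CB values on the slides. The edge rule below is inferred, not quoted: the weight of an edge is one process's metric against the core the other last ran on, plus the reverse. It is used here because it reproduces all six edge weights shown (70, 60, 30, 75, 45, 85). The brute-force search is an assumption too, since the slides do not say how MAX-CUT is solved or whether the two sides must be equal in size.

```python
from itertools import combinations

# Values from the slides: the core each process last ran on, and its
# interference metric against Core A's filter (CA) and Core B's filter (CB).
procs = {
    "P0": {"was_in": "A", "A": 20, "B": 30},
    "P1": {"was_in": "A", "A": 10, "B": 45},
    "P2": {"was_in": "B", "A": 40, "B": 25},
    "P3": {"was_in": "B", "A": 15, "B": 50},
}


def edge_weight(p, q):
    # ASSUMPTION (inferred rule): p's metric against q's last core plus
    # q's metric against p's last core.
    return procs[p][procs[q]["was_in"]] + procs[q][procs[p]["was_in"]]


edges = {frozenset(pair): edge_weight(*pair) for pair in combinations(procs, 2)}


def max_cut(nodes, edges):
    """Exhaustive MAX-CUT; only practical for a handful of nodes."""
    nodes = list(nodes)
    best_value, best_side = -1, set()
    # Keep nodes[0] on one side so mirror-image partitions are not repeated.
    for r in range(len(nodes)):
        for rest in combinations(nodes[1:], r):
            side = {nodes[0], *rest}
            # An edge is cut when exactly one endpoint lies in `side`.
            value = sum(w for e, w in edges.items() if len(e & side) == 1)
            if value > best_value:
                best_value, best_side = value, side
    return best_value, best_side


print(sorted(edges.values()))   # [30, 45, 60, 70, 75, 85]
print(max_cut(procs, edges))    # cut value 260, with P0 and P1 on one side
```

How the two sides of the cut are then turned into a process-to-core mapping is not recoverable from the transcript, so the sketch stops at the partition.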
![Page 24: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/24.jpg)
A3: Weighted Interference Graph Algorithm
• To address high interference issues
• Weight the edges of the interference graph
• The rest is the same as A2
P0 (was in CA): OW=90, CA=20, CB=30
P1 (was in CA): OW=85, CA=10, CB=45
P2 (was in CB): OW=50, CA=40, CB=25
P3 (was in CB): OW=100, CA=15, CB=50
[Diagram: interference graph with nodes P0(A), P1(A), P2(B), P3(B); the products 90*30 and 50*40 are placed on the graph.]
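A short sketch of the weighting step, using the OW, CA, and CB values from the slide. The slide shows two products, 90*30 and 50*40, which match P0's and P2's occupancy weight times their interference metric against the other core. Adding the two products into one edge weight is an assumption, made by analogy with the unweighted graph, and the function name is hypothetical.

```python
# Values from the slide: occupancy weight (OW), last core, and the
# interference metric against Core A's filter (CA) and Core B's filter (CB).
procs = {
    "P0": {"ow": 90,  "was_in": "A", "A": 20, "B": 30},
    "P1": {"ow": 85,  "was_in": "A", "A": 10, "B": 45},
    "P2": {"ow": 50,  "was_in": "B", "A": 40, "B": 25},
    "P3": {"ow": 100, "was_in": "B", "A": 15, "B": 50},
}


def weighted_edge(p, q):
    # Each term scales a process's interference metric by its own occupancy weight.
    term_p = procs[p]["ow"] * procs[p][procs[q]["was_in"]]
    term_q = procs[q]["ow"] * procs[q][procs[p]["was_in"]]
    # ASSUMPTION: the two products are summed into a single edge weight.
    return term_p + term_q


print(weighted_edge("P0", "P2"))   # 90*30 + 50*40 = 4700
```

With the edges reweighted this way, processes with a large footprint contribute more to each edge, and the partitioning step then proceeds as in A2.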
![Page 25: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/25.jpg)
Performance Evaluation
![Page 26: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/26.jpg)
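Evaluation Methodology
[Diagram: processes P1, P2, P3, ... PN run on Fedora Linux under Simics x86, where the footprint is gathered in the emulator through the "magic" interface. The resulting process-to-core mapping is then used for two runs on Intel Core 2: a native x86 run of P1, P2, P3, ... PN, and a VM run with P1, P2, ... PN in separate Linux guests on the Xen hypervisor.]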
![Page 27: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/27.jpg)
Performance Results
[Figure: two bar charts over the benchmarks astar, gobmk, hmmer, lbm, libquantum, mcf, omnetpp, perlbench, povray, soplex, sphinx, and xalancbmk; the first has a y-axis from 0% to 60%, the second from 0% to 25%.]
Maximum performance improvement of up to 54%
Average performance improvement of up to 23%
![Page 28: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/28.jpg)
Performance of Virtualized Systems
Maximum performance improvement of up to 26%
Average performance improvement of up to 9.5%
[Figure: bar chart (y-axis, 0% to 10%) over the benchmarks astar, gobmk, hmmer, lbm, libquantum, mcf, omnetpp, perlbench, povray, soplex, sphinx, and xalancbmk.]
![Page 29: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/29.jpg)
Performance Sensitivity of 3 Algorithms
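[Figure: bar chart of Performance Benefit (y-axis, 0% to 16%) per Application Mix for three algorithms: Sorted, Graph, and Weighted Graph. Application mixes: mcf + gobmk + povray + omnetpp; mcf + hmmer + libquantum + omnetpp; perlbench + gobmk + libquantum + omnetpp; gobmk + hmmer + libquantum + povray; mcf + hmmer + libquantum + povray.]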
Weighted Interference Graph has the best performance
![Page 30: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/30.jpg)
Conclusion
Shared Resource (e.g., LLC) Management is Critical
Process Scheduling using Compatibility in Multi-Core
Capturing Cache Reference Behavior for Processes
Symbiotic Scheduling with Bloom Filter Signature
Measured Speedup of 22% (up to 54%) on Intel Core 2
![Page 31: Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature](https://reader036.vdocument.in/reader036/viewer/2022081417/56814963550346895db6b90f/html5/thumbnails/31.jpg)
That’s All, Folks!
Georgia Tech ECE MARS Lab
http://arch.ece.gatech.edu