Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores
![Page 1: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores](https://reader036.vdocument.in/reader036/viewer/2022062814/56816727550346895ddbbfdc/html5/thumbnails/1.jpg)
Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores
Zheng Wu
![Page 2: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores](https://reader036.vdocument.in/reader036/viewer/2022062814/56816727550346895ddbbfdc/html5/thumbnails/2.jpg)
Outline
• Background
• Motivation
• Analysis Framework
• Intra-Core Cache Analysis
• Cache Conflict Analysis
• Optimization Techniques
• WCRT Analysis
• Experiment Setup
• Experiment Results
• Contribution
• Conclusion
![Page 3: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores](https://reader036.vdocument.in/reader036/viewer/2022062814/56816727550346895ddbbfdc/html5/thumbnails/3.jpg)
Background
• Hard Real-time Systems
– Worst-case execution time (WCET): essential input for schedulability analysis
• Static Program Analysis
– Program path modeling: infeasible path detection, loop bound detection
– Micro-architecture modeling: instruction/data cache, branch prediction, out-of-order pipeline
[Figure: distribution of execution times (x-axis: Execution Time; y-axis: Distribution; legend: Actual, Observed), marking the Observed WCET and the Actual WCET]
![Page 4: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores](https://reader036.vdocument.in/reader036/viewer/2022062814/56816727550346895ddbbfdc/html5/thumbnails/4.jpg)
Background
• Concurrent Programs
– Task interaction: control/data dependency, preemption
– Resource contention: shared-cache multi-core architectures
• Problem: shared L2 instruction cache contention
[Figure: multi-core architecture. Core 1 … Core n each contain a CPU and an L1 cache; all cores share one L2 cache]
![Page 5: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores](https://reader036.vdocument.in/reader036/viewer/2022062814/56816727550346895ddbbfdc/html5/thumbnails/5.jpg)
System Model
• Message Sequence Chart (MSC)
[Figure: MSC with a vertical time axis and three processes (Process 1, Process 2, Process 3) mapped onto Core 1 and Core 2; tasks fm1, fm2, fm4, fr0, fr1, fs0; legend: task, message communication]
![Page 6: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores](https://reader036.vdocument.in/reader036/viewer/2022062814/56816727550346895ddbbfdc/html5/thumbnails/6.jpg)
Problem: Shared L2 cache conflicts
• Different cores conflict in the shared L2 cache.
[Figure: the MSC (Process 1, Process 2, Process 3 on Core 1 and Core 2; tasks fm1, fm2, fm4, fr0, fr1, fs0) next to an L2 cache with Set 0 to Set 3; memory blocks m and m' belong to concurrent tasks]
![Page 7: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores](https://reader036.vdocument.in/reader036/viewer/2022062814/56816727550346895ddbbfdc/html5/thumbnails/7.jpg)
Related Work
• J. Yan and W. Zhang, RTAS 2008
– T and T' are tasks from different cores that conflict for cache set C.
– In the worst case, all accesses from T and T' to C are cache misses.
[Figure: L2 cache with Set C - 1, Set C, Set C + 1; blocks m1, …, mk (Core 1) and m'1, …, m'k (Core 2) access Set C and are all counted as misses; the MSC tasks fm1, fm2, fm4, fr0, fr1, fs0 of Process 1, Process 2, Process 3 are shown alongside]
![Page 8: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores](https://reader036.vdocument.in/reader036/viewer/2022062814/56816727550346895ddbbfdc/html5/thumbnails/8.jpg)
Motivation
• Task Execution Lifetime
– Scenario 1: overlapping lifetimes. The [start time, end time] intervals of Task 1 (Core 1) and Task 2 (Core 2) intersect: conflicts.
– Scenario 2: disjoint lifetimes. The intervals of Task 1 (Core 1) and Task 2 (Core 2) do not intersect: no conflicts.
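The two scenarios reduce to an interval-intersection test. The sketch below is only an illustration: the dictionary layout, the function names and the half-open interval convention are assumptions, not taken from the slides.

```python
def lifetimes_overlap(first, second):
    """Each lifetime is a (start_time, end_time) pair, read as a half-open interval."""
    return first[0] < second[1] and second[0] < first[1]


def may_conflict(task_a, task_b):
    """Two tasks can evict each other in the shared L2 cache only if they
    run on different cores and their lifetimes intersect."""
    return (task_a["core"] != task_b["core"]
            and lifetimes_overlap(task_a["lifetime"], task_b["lifetime"]))


# Scenario 1: overlapping lifetimes (hypothetical numbers).
scenario1 = may_conflict({"core": 1, "lifetime": (0, 50)},
                         {"core": 2, "lifetime": (30, 80)})
# Scenario 2: disjoint lifetimes.
scenario2 = may_conflict({"core": 1, "lifetime": (0, 50)},
                         {"core": 2, "lifetime": (60, 90)})
```

A sound analysis has to use safe bounds for the interval ends, for example the earliest possible start and the latest possible finish of each task.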
![Page 9: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores](https://reader036.vdocument.in/reader036/viewer/2022062814/56816727550346895ddbbfdc/html5/thumbnails/9.jpg)
Analysis Framework
[Figure: flowchart. Intra-core cache analysis (Core 1 … Core n) and the initial task interference feed the L2 cache conflict analysis, followed by WCRT analysis. Decision "Task interference changes?": yes, iterate; no, output the estimated WCRT]
![Page 10: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores](https://reader036.vdocument.in/reader036/viewer/2022062814/56816727550346895ddbbfdc/html5/thumbnails/10.jpg)
Analysis Framework
[Figure: flowchart. Intra-core cache analysis (Core 1 … Core n) and the initial task interference feed the L2 cache conflict analysis, followed by WCRT analysis. Decision "Interference changes?": yes, iterate with the modified task interference; no, output the estimated WCRT]
![Page 11: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores](https://reader036.vdocument.in/reader036/viewer/2022062814/56816727550346895ddbbfdc/html5/thumbnails/11.jpg)
Intra-core Cache Analysis
• Must Analysis: Always Hit (AH)
– Memory blocks guaranteed to be present in the cache.
• May Analysis: Always Miss (AM)
– Computes the memory blocks that may be present in the cache; a block outside this set is guaranteed absent, so its access always misses.
• Persistence Analysis
– Memory blocks never evicted from the cache after the first iteration.
• Others: Non Classified (NC)
H. Theiling, C. Ferdinand, and R. Wilhelm. Fast and precise WCET prediction by separated cache and path analyses. RTS 2000.
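To make the must analysis concrete, here is a minimal sketch of the usual abstract LRU update and join for a single cache set, in the style of the cited work. The function names and the 1-based age convention are assumptions chosen for illustration; a real analyzer runs this over the control flow graph until a fixed point is reached.

```python
def must_update(state, block, assoc):
    """Abstract LRU must-update for one cache set.

    state maps each block guaranteed to be cached to an upper bound on its
    age (1 = most recently used, assoc = next to be evicted).
    """
    # A block that is not guaranteed to be cached counts as older than all others.
    old_age = state.get(block, assoc + 1)
    updated = {}
    for other, age in state.items():
        if other == block:
            continue
        # Only blocks younger than the accessed one age by one step.
        new_age = age + 1 if age < old_age else age
        if new_age <= assoc:  # beyond assoc the block may have been evicted
            updated[other] = new_age
    updated[block] = 1
    return updated


def must_join(left, right):
    """Merge two paths: keep blocks cached on both, with the larger age bound."""
    return {b: max(left[b], right[b]) for b in left.keys() & right.keys()}


def is_always_hit(state, block):
    """An access is AH if the block is in the must-state just before the access."""
    return block in state


# Hypothetical 2-way set accessed by m0, m1, m2 in sequence.
state = {}
for accessed in ["m0", "m1", "m2"]:
    state = must_update(state, accessed, assoc=2)
```

After the three accesses only m1 and m2 are guaranteed to be cached, so a later access to m0 can no longer be classified AH by this analysis.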
![Page 12: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores](https://reader036.vdocument.in/reader036/viewer/2022062814/56816727550346895ddbbfdc/html5/thumbnails/12.jpg)
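Intra-core Cache Analysis
• L1 Intra-core Cache Analysis
– Always Hit (AH), Always Miss (AM), Non Classified (NC)
• L2 Intra-core Cache Analysis
[Figure: the L1 cache analysis classifies each access as AH, AM or NC; this classification determines whether the L2 cache is accessed ("access") or not ("not access"); the L2 cache analysis then classifies the access as AH, AM or NC]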
![Page 13: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores](https://reader036.vdocument.in/reader036/viewer/2022062814/56816727550346895ddbbfdc/html5/thumbnails/13.jpg)
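Analysis Framework
[Figure: analysis framework flowchart, repeated for two cores. Intra-core cache analysis (Core 1, Core 2) and the initial task interference feed the L2 cache conflict analysis, followed by WCRT analysis. Decision "Interference changes?": yes, iterate with the modified task interference; no, output the estimated WCRT]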
![Page 14: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores](https://reader036.vdocument.in/reader036/viewer/2022062814/56816727550346895ddbbfdc/html5/thumbnails/14.jpg)
L2 Cache Conflict Analysis
• Initial task interference graph
![Page 15: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores](https://reader036.vdocument.in/reader036/viewer/2022062814/56816727550346895ddbbfdc/html5/thumbnails/15.jpg)
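L2 Cache Conflict Analysis
• Analyze each cache set individually
• Intra-core L2 analysis
– Always miss
– Non classified
– Always hit
[Figure: L2 cache with Set i and Set j. Blocks m0, m1 of task T map to Set i together with blocks m'0, m'1 of conflicting tasks and become Non classified; blocks m2, m3 of task T map to Set j and stay Always hit]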
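The per-set rule can be sketched as follows. This is an illustrative reading of the slide, not the author's implementation: the function names, the address-to-set mapping and the example addresses are assumptions.

```python
def cache_set(address, block_size, num_sets):
    """Index of the cache set that the memory block at this address maps to."""
    return (address // block_size) % num_sets


def apply_conflicts(classification, block_set, rival_sets):
    """Downgrade AH to NC for every block whose set is also used by a
    conflicting task; AM and NC results are left unchanged.

    classification: block -> "AH" | "AM" | "NC" (intra-core L2 result)
    block_set:      block -> cache set index
    rival_sets:     set indices touched by conflicting tasks on other cores
    """
    return {
        block: "NC" if result == "AH" and block_set[block] in rival_sets else result
        for block, result in classification.items()
    }


# Hypothetical layout: 32-byte blocks, 4 sets.
addresses = {"m0": 0, "m1": 128, "m2": 32, "m3": 160}
block_set = {b: cache_set(a, block_size=32, num_sets=4) for b, a in addresses.items()}
rival_sets = {cache_set(256, 32, 4), cache_set(384, 32, 4)}  # blocks m'0 and m'1
result = apply_conflicts({b: "AH" for b in addresses}, block_set, rival_sets)
```

Here m0 and m1 share set 0 with the conflicting blocks and lose their AH classification, while m2 and m3 in set 1 keep it.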
![Page 16: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores](https://reader036.vdocument.in/reader036/viewer/2022062814/56816727550346895ddbbfdc/html5/thumbnails/16.jpg)
Optimization for Set Associativity
• Consider memory block age: LRU replacement
– age(m): upper bound on the age of m.
[Figure: a 4-way set (ages 1 to 4) holding blocks m0, m1, m2 of task T, all Always hit in the intra-core analysis; conflicting tasks contribute 2 memory blocks (m'0, m'1). Without optimization: m0, m1, m2 all become Non classified. With optimization: m0 and m1 stay Always hit, m2 becomes Non classified]
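One reading that is consistent with the example in the figure: a block stays Always hit if its age bound plus the number of conflicting blocks mapped to the same set still fits within the associativity. The sketch below encodes that reading; the function name and the concrete ages 1, 2, 3 for m0, m1, m2 are assumptions.

```python
def classify_with_age(age_bounds, rival_block_count, assoc):
    """Shift each AH block's age bound by the number of conflicting blocks in
    its set; the block stays AH only if the shifted age still fits in the set."""
    return {
        block: "AH" if age + rival_block_count <= assoc else "NC"
        for block, age in age_bounds.items()
    }


# 4-way set, two conflicting blocks (m'0, m'1) from other cores.
with_optimization = classify_with_age({"m0": 1, "m1": 2, "m2": 3},
                                      rival_block_count=2, assoc=4)
```

Without the age information, any conflicting block in the set forces all three blocks to Non classified; with it, only m2 is downgraded.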
![Page 17: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores](https://reader036.vdocument.in/reader036/viewer/2022062814/56816727550346895ddbbfdc/html5/thumbnails/17.jpg)
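Analysis Framework
[Figure: analysis framework flowchart, repeated for two cores. Intra-core cache analysis (Core 1, Core 2) and the initial task interference feed the L2 cache conflict analysis, followed by WCRT analysis. Decision "Interference changes?": yes, iterate with the modified task interference; no, output the estimated WCRT]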
![Page 18: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores](https://reader036.vdocument.in/reader036/viewer/2022062814/56816727550346895ddbbfdc/html5/thumbnails/18.jpg)
BCET and WCET Analysis

| L1 cache | L2 cache | Best-Case | Worst-Case |
|----------|----------|-----------|------------|
| AH       | ---      | L1 hit    | L1 hit     |
| AM       | AH       | L2 hit    | L2 hit     |
| AM       | AM       | L2 miss   | L2 miss    |
| AM       | NC       | L2 hit    | L2 miss    |
| NC       | AH       | L1 hit    | L2 hit     |
| NC       | AM       | L1 hit    | L2 miss    |
| NC       | NC       | L1 hit    | L2 miss    |

• BCET and WCET
• Access latency for best case and worst case
– Assumption: no timing anomalies with other architecture features
• Shortest (longest) path
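The classification table translates directly into a lookup of best-case and worst-case access latencies. In this sketch the cycle counts are the ones listed under Experiments Parameters (L1 hit: 1, L2 hit: 10, memory access: 100); treating an L2 miss as a flat 100-cycle memory access and the names `LATENCY`, `OUTCOMES` and `access_bounds` are assumptions.

```python
# Latency in cycles per outcome; "L2 miss" is assumed to cost one memory access.
LATENCY = {"L1 hit": 1, "L2 hit": 10, "L2 miss": 100}

# (L1 classification, L2 classification) -> (best-case outcome, worst-case outcome)
OUTCOMES = {
    ("AH", None): ("L1 hit", "L1 hit"),
    ("AM", "AH"): ("L2 hit", "L2 hit"),
    ("AM", "AM"): ("L2 miss", "L2 miss"),
    ("AM", "NC"): ("L2 hit", "L2 miss"),
    ("NC", "AH"): ("L1 hit", "L2 hit"),
    ("NC", "AM"): ("L1 hit", "L2 miss"),
    ("NC", "NC"): ("L1 hit", "L2 miss"),
}


def access_bounds(l1, l2=None):
    """Return (best-case, worst-case) latency in cycles for one memory access."""
    if l1 == "AH":
        l2 = None  # the L2 cache is never reached on a guaranteed L1 hit
    best, worst = OUTCOMES[(l1, l2)]
    return LATENCY[best], LATENCY[worst]


bounds = [access_bounds("AH"), access_bounds("AM", "NC"), access_bounds("NC", "AM")]
```

Summing the first components along the shortest path gives a BCET estimate and summing the second components along the longest path gives a WCET estimate, under the slide's no-timing-anomaly assumption.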
![Page 19: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores](https://reader036.vdocument.in/reader036/viewer/2022062814/56816727550346895ddbbfdc/html5/thumbnails/19.jpg)
WCRT Analysis
• Compute the earliest and latest ready and finish times of each task
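A simplified sketch of how these four times can be propagated through a task dependency graph is shown below. It assumes that a task becomes ready when all its predecessors have finished and that it then runs without preemption or same-core interference, which a full WCRT analysis would also have to model. The graph and the BCET/WCET values are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def time_bounds(preds, bcet, wcet):
    """preds maps each task to the tasks it waits for; bcet/wcet map tasks to cycles."""
    bounds = {}
    for task in TopologicalSorter(preds).static_order():  # predecessors come first
        waits_for = preds.get(task, ())
        earliest_ready = max((bounds[p]["earliest_finish"] for p in waits_for), default=0)
        latest_ready = max((bounds[p]["latest_finish"] for p in waits_for), default=0)
        bounds[task] = {
            "earliest_ready": earliest_ready,
            "earliest_finish": earliest_ready + bcet[task],
            "latest_ready": latest_ready,
            "latest_finish": latest_ready + wcet[task],
        }
    return bounds


# Hypothetical graph: a precedes b and c, which both precede d.
preds = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
bcet = {"a": 2, "b": 3, "c": 1, "d": 2}
wcet = {"a": 4, "b": 5, "c": 6, "d": 3}
bounds = time_bounds(preds, bcet, wcet)
wcrt = max(b["latest_finish"] for b in bounds.values())
```

The interval from a task's earliest ready time to its latest finish time is a safe lifetime for deciding whether two tasks on different cores can interfere.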
![Page 20: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores](https://reader036.vdocument.in/reader036/viewer/2022062814/56816727550346895ddbbfdc/html5/thumbnails/20.jpg)
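Putting Together
[Figure: iterative flow. The initial task interference feeds the L2 cache conflict analysis and WCRT analysis, which produce an interference graph. Decision "Change?": yes, repeat the analysis with the new interference graph; no, output the estimated WCRT]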
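The loop can be sketched as a fixed-point iteration. Everything below is a toy model built for illustration, not the analysis from the slides: each task has a fixed start time, and its WCET is a base cost plus a hypothetical penalty per interfering task, standing in for the L2 conflict analysis.

```python
from itertools import combinations


def overlaps(first, second):
    return first[0] < second[1] and second[0] < first[1]


def analyse(tasks, interference):
    """One round: lifetimes under the given interference, then the new interference."""
    lifetimes = {}
    for name, task in tasks.items():
        rivals = sum(1 for pair in interference if name in pair)
        wcet = task["base"] + task["penalty"] * rivals  # toy stand-in for cache analysis
        lifetimes[name] = (task["start"], task["start"] + wcet)
    updated = {
        frozenset((a, b))
        for a, b in combinations(tasks, 2)
        if tasks[a]["core"] != tasks[b]["core"] and overlaps(lifetimes[a], lifetimes[b])
    }
    wcrt = max(end for _, end in lifetimes.values())
    return wcrt, updated


def iterate(tasks):
    # Start conservatively: every pair of tasks on different cores interferes.
    interference = {
        frozenset((a, b))
        for a, b in combinations(tasks, 2)
        if tasks[a]["core"] != tasks[b]["core"]
    }
    rounds = 0
    while True:
        rounds += 1
        wcrt, updated = analyse(tasks, interference)
        if updated == interference:  # interference unchanged: fixed point reached
            return wcrt, interference, rounds
        interference = updated


tasks = {
    "t1": {"core": 0, "start": 0, "base": 10, "penalty": 5},
    "t2": {"core": 1, "start": 12, "base": 10, "penalty": 5},
    "t3": {"core": 1, "start": 30, "base": 5, "penalty": 5},
}
wcrt, interference, rounds = iterate(tasks)
```

In this toy model the interference set can only shrink from one round to the next, because fewer rivals mean shorter lifetimes and therefore fewer overlaps, so the loop terminates.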
![Page 21: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores](https://reader036.vdocument.in/reader036/viewer/2022062814/56816727550346895ddbbfdc/html5/thumbnails/21.jpg)
Experiments Parameters
• Cache access latency
– L1 hit: 1 cycle, L2 hit: 10 cycles, memory access: 100 cycles
• Various core numbers
– 1 core, 2 cores and 4 cores
• Various cache configurations
– Cache size, block size, associativity
• Real-World Benchmarks: DEBIE.
![Page 22: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores](https://reader036.vdocument.in/reader036/viewer/2022062814/56816727550346895ddbbfdc/html5/thumbnails/22.jpg)
Experiments Parameters
• Real-World Benchmarks: DEBIE.
• Space Debris Monitoring Software
• 8 MSCs, 35 tasks.
[Figure: Code Size Distribution. x-axis: Task Code Size (0-1k, 1k-2k, 2k-4k, 4k-8k, 8k-16k, 16k-); y-axis: # of tasks (0 to 12)]
![Page 23: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores](https://reader036.vdocument.in/reader036/viewer/2022062814/56816727550346895ddbbfdc/html5/thumbnails/23.jpg)
Experimental Results
• Comparison with Yan-Zhang RTAS 2008.
• Direct mapped cache only.
[Figure (a) WCRT Comparison: Estimated WCRT (million), Yan-Zhang's Method vs. Our Method; x-axis: Core Configuration (L1: 2KB): 1-core, L2: 8KB; 2-core, L2: 16KB; 4-core, L2: 32KB]
[Figure (b) Inter-core Eviction Comparison: Inter-core Evictions, Yan-Zhang's Method vs. Our Method; x-axis: Core Configuration (L1: 2KB): 1-core, L2: 8KB; 2-core, L2: 16KB; 4-core, L2: 32KB]
![Page 24: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores](https://reader036.vdocument.in/reader036/viewer/2022062814/56816727550346895ddbbfdc/html5/thumbnails/24.jpg)
Experimental Results
• Vary L1 and L2 Size.
[Figure (a) Varying L1 Size: Estimated WCRT (million), Yan-Zhang's Method vs. Our Method; x-axis: L1 size (512B, 1KB, 2KB, 4KB); Core Configuration (2-core, L2: 16KB)]
[Figure (b) Varying L2 Size: Estimated WCRT (million), Yan-Zhang's Method vs. Our Method; x-axis: L2 size (4KB, 8KB, 16KB, 32KB); Core Configuration (2-core, L1: 2KB)]
![Page 25: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores](https://reader036.vdocument.in/reader036/viewer/2022062814/56816727550346895ddbbfdc/html5/thumbnails/25.jpg)
Experimental Results
• Set associative cache optimizations
[Figure: Estimated WCRT (million), w/o optimization vs. with optimization; x-axis: associativity (1way, 2way, 4way, 8way); Core Configuration (2-core, L1: 2KB). Inset: blocks m0, m1, m2 in a set with ages 1 to 4; w/o optimization: Non classified, Non classified, Non classified; with optimization: Always hit, Always hit, Non classified]
![Page 26: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores](https://reader036.vdocument.in/reader036/viewer/2022062814/56816727550346895ddbbfdc/html5/thumbnails/26.jpg)
Experimental Results
• Runtime of our iterative analysis
[Figure: Analysis Time (sec) vs. Shared L2 Cache Size (2KB, 4KB, 8KB, 16KB, 32KB); series: L1: 2x512B, L1: 2x1KB, L1: 2x2KB]
![Page 27: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores](https://reader036.vdocument.in/reader036/viewer/2022062814/56816727550346895ddbbfdc/html5/thumbnails/27.jpg)
Conclusion
• WCRT analysis of concurrent programs running on shared-cache multi-cores.
• Use task lifetimes to identify real conflicts.
• Optimizations for set-associative caches.
• Experiments: tighter WCRT estimates than the state of the art.
• Future work: data cache, other replacement policies.
![Page 28: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores](https://reader036.vdocument.in/reader036/viewer/2022062814/56816727550346895ddbbfdc/html5/thumbnails/28.jpg)
Thank You!