applying control theory to the caches of multiprocessors department of eecs university of tennessee,...
![Page 1: Applying Control Theory to the Caches of Multiprocessors Department of EECS University of Tennessee, Knoxville Kai Ma](https://reader035.vdocument.in/reader035/viewer/2022062804/56649eec5503460f94bfd93d/html5/thumbnails/1.jpg)
Applying Control Theory to the Caches of Multiprocessors
Department of EECS, University of Tennessee, Knoxville
Kai Ma
Applying Control Theory to the Caches of Multiprocessors
The shared L2 cache is one of the most important on-chip shared resources.
• Largest area and leakage power consumer
• One of the dominant players in terms of performance
Two papers:
• Relative Cache Latency Control for Performance Differentiations in Power-Constrained Chip Multiprocessors
• SHARP Control: Controlled Shared Cache Management in Chip Multiprocessors
Relative Cache Latency Control for Performance Differentiations in Power-Constrained Chip Multiprocessors
Department of EECS, University of Tennessee, Knoxville
Xiaorui Wang, Kai Ma, Yefu Wang
Background
NUCA (Non-Uniform Cache Architecture)
Key idea: Different cache banks have different access latencies.
Introduction
• The power of the cache needs to be constrained.
• With power under control, the performance of the caches also needs to be guaranteed.
• Why control relative latency (the ratio between the average cache access latencies of two threads)?
  1. Accelerate critical threads
  2. Reduce priority inversion
System Design
[Figure: system architecture with two control loops around the shared L2 cache]
• Cores: Thread 0 on core 0, Thread 1 on core 1, Thread 2 on core 2, Thread 3 on core 3
• Relative latency control loop: four latency monitors and three relative latency controllers
• Power control loop: power monitor and power controller
• Actuation: cache resizing and partitioning modulator
• Shared L2 cache: cache banks of Thread 0, Thread 1, Thread 2, and Thread 3, plus inactive cache banks
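Relative Latency Controller (RLC)
• PI (Proportional-Integral) controller
  - System modeling
  - Controller design
  - Control analysis
[Figure: feedback loop. The relative latency set point (1.5) is compared with the relative latency (RL) measured on the shared L2 caches (1.2); the error (0.3) feeds the RLC, which outputs a new cache ratio (increase of 0.2). Disturbances: workload variation and total cache size variation.]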
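Relative Latency Model
RL model:
$$ l_i(k) = \sum_{j=1}^{n_1} a_j\, l_i(k-j) + \sum_{j=1}^{n_2} b_j\, c_i(k-j) $$
• $l_i(k)$ is the relative latency between the $i$th and $(i+1)$th cores.
• $c_i$ is the cache size ratio between the $i$th and $(i+1)$th cores.
System identification
• Model orders: $n_1, n_2$
• Parameters: $a_j, b_j$
Model Orders and Error

| | $n_1 = 0$ | $n_1 = 1$ | $n_1 = 2$ |
|---|---|---|---|
| $n_2 = 1$ | 0.25 | 0.17 | 0.17 |
| $n_2 = 2$ | 0.22 | 0.17 | 0.17 |
| $n_2 = 3$ | 0.18 | 0.15 | 0.15 |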
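The system identification step can be illustrated on a first-order instance of the relative latency model, l(k) = a1·l(k-1) + b1·c(k-1). The sketch below is only an illustration, not the authors' procedure: the function names, the "true" values 0.5 and 0.4, and the random excitation of the cache ratio are all assumptions. It generates synthetic samples and recovers a1 and b1 by ordinary least squares, solved through the 2x2 normal equations.

```python
import random


def simulate(a1, b1, cache_ratios, l0=1.0):
    """Generate relative latencies from l(k) = a1*l(k-1) + b1*c(k-1)."""
    latencies = [l0]
    for c_prev in cache_ratios[:-1]:
        latencies.append(a1 * latencies[-1] + b1 * c_prev)
    return latencies


def identify_first_order(latencies, cache_ratios):
    """Least-squares estimate of (a1, b1) via the 2x2 normal equations."""
    s_ll = s_lc = s_cc = s_ly = s_cy = 0.0
    for k in range(1, len(latencies)):
        l_prev, c_prev, y = latencies[k - 1], cache_ratios[k - 1], latencies[k]
        s_ll += l_prev * l_prev
        s_lc += l_prev * c_prev
        s_cc += c_prev * c_prev
        s_ly += l_prev * y
        s_cy += c_prev * y
    det = s_ll * s_cc - s_lc * s_lc
    a1 = (s_ly * s_cc - s_cy * s_lc) / det
    b1 = (s_ll * s_cy - s_lc * s_ly) / det
    return a1, b1


# Hypothetical ground truth and a pseudo-random excitation of the cache ratio.
TRUE_A1, TRUE_B1 = 0.5, 0.4
excitation = random.Random(1)
ratios = [excitation.uniform(0.5, 2.0) for _ in range(200)]
samples = simulate(TRUE_A1, TRUE_B1, ratios)
est_a1, est_b1 = identify_first_order(samples, ratios)
print(f"a1 = {est_a1:.3f}, b1 = {est_b1:.3f}")
```

On real measurements the fit would not be exact, and the residual error is what a table of model orders versus error would report for each candidate (n1, n2).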
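Controller Design
• PID controller
  - Proportional term: $K_P\,e(k)$
  - Integral term: $K_I \sum e(k)$
• Design: Root Locus
$$ c_i(k) = c_i(k-1) + K_1\, e(k) - K_2\, e(k-1) $$
[Figure: feedback loop. The error $e(k)$ between the relative latency set point and the measured relative latency drives the controller, which outputs the new cache ratio applied to the shared L2 caches.]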
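As a rough illustration of how such a controller behaves, the sketch below closes the loop around a first-order plant l(k+1) = a1·l(k) + b1·c(k) using the incremental PI law c(k) = c(k-1) + K1·e(k) - K2·e(k-1), with e(k) taken as set point minus measured relative latency. The signs in the control law are an assumption (the transcript drops the operators), and the plant parameters and gains are hypothetical values chosen so that the loop is stable; they are not taken from the paper.

```python
def run_pi_loop(set_point, a1, b1, k1, k2, l0, c0, steps):
    """Simulate plant l(k+1) = a1*l(k) + b1*c(k) under the incremental PI law
    c(k) = c(k-1) + k1*e(k) - k2*e(k-1), where e(k) = set_point - l(k)."""
    latency, ratio, prev_error = l0, c0, 0.0
    trace = []
    for _ in range(steps):
        error = set_point - latency                   # e(k)
        ratio = ratio + k1 * error - k2 * prev_error  # new cache ratio c(k)
        prev_error = error
        latency = a1 * latency + b1 * ratio           # plant response
        trace.append(latency)
    return trace


# Hypothetical plant and gains (assumptions for illustration only).
A1, B1 = 0.5, 0.4
K1, K2 = 0.8, 0.4

# Start at relative latency 1.2 with set point 1.5, i.e. an initial error of 0.3.
trace = run_pi_loop(set_point=1.5, a1=A1, b1=B1, k1=K1, k2=K2,
                    l0=1.2, c0=1.5, steps=60)
print(f"relative latency after {len(trace)} periods: {trace[-1]:.4f}")
```

Because the law integrates the error, the relative latency settles at the set point for any stable choice of gains; the gains only shape how quickly and how smoothly it gets there.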
Control Analysis
• Derive the transfer function of the controller
• Derive the transfer function of the system with system model variations
• Derive the transfer function of the closed-loop system and compute the poles
• The control period of the power control loop is selected to be longer than the settling time of the relative latency control loop.
System with model variation:
$$ l_i(k) = a_1'\, l_i(k-1) + b_1\, c_i(k-1) $$
Stability range: $0.96 < a_1' < 1.81$
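The pole computation can be sketched numerically. Assuming the varied plant l(k) = a1'·l(k-1) + b1·c(k-1) and an incremental PI law c(k) = c(k-1) + K1·e(k) - K2·e(k-1), the plant transfer function is b1/(z - a1') and the controller is (K1·z - K2)/(z - 1), so the closed-loop characteristic equation is z² - (1 + a1' - b1·K1)·z + (a1' - b1·K2) = 0. The values of b1, K1, and K2 below are hypothetical, so the range this sketch finds is not the range reported on the slide.

```python
import cmath


def closed_loop_poles(a1_actual, b1, k1, k2):
    """Roots of z^2 - (1 + a1' - b1*k1)*z + (a1' - b1*k2) = 0."""
    p = -(1.0 + a1_actual - b1 * k1)
    q = a1_actual - b1 * k2
    root = cmath.sqrt(p * p - 4.0 * q)
    return ((-p + root) / 2.0, (-p - root) / 2.0)


def is_stable(a1_actual, b1=0.4, k1=0.8, k2=0.4):
    """Stable when every closed-loop pole lies strictly inside the unit circle."""
    return all(abs(z) < 1.0 for z in closed_loop_poles(a1_actual, b1, k1, k2))


# Scan the actual plant parameter a1' to find where the fixed controller
# (designed for a hypothetical nominal a1 = 0.5) keeps the loop stable.
stable_values = [a / 100 for a in range(-200, 301) if is_stable(a / 100)]
print(f"stable for a1' between about {stable_values[0]:.2f} "
      f"and {stable_values[-1]:.2f}")
```

The same scan applied to the identified model and the designed gains would give the kind of stability range quoted for a1'.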
Power Controller
System model: $p(k) = c \cdot s(k) + d$
• $s(k)$ is the total cache size in the $k$th power control period.
• $p(k)$ is the cache power in the $k$th power control period.
• $c, d$ are parameters that depend on the applications.
• Leakage power is proportional to the cache size.
• Leakage power accounts for the largest portion of cache power.
PI Controller
• Controller analysis: $c' > 0$ and $c' < 0.67$
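To see why a linear power model leads to a bounded range for the actual slope, consider a simplified loop. The sketch below uses an integral-type update s(k+1) = s(k) + gain·(budget - p(k)), which is a simplification of the PI controller named on the slide, together with the plant p(k) = c'·s(k) + d. The error then obeys e(k+1) = (1 - gain·c')·e(k), so the loop converges only when 0 < gain·c' < 2. All numeric values, the treatment of cache size as a continuous quantity, and the function names are assumptions.

```python
def power_loop(budget, slope_actual, offset, gain, s0, steps):
    """Integral-type power control: s(k+1) = s(k) + gain * (budget - p(k)),
    with the linear plant p(k) = slope_actual * s(k) + offset."""
    size = s0
    powers = []
    for _ in range(steps):
        power = slope_actual * size + offset   # measured cache power p(k)
        powers.append(power)
        size += gain * (budget - power)        # resize the cache
    return powers


def converges(slope_actual, gain):
    """The power error obeys e(k+1) = (1 - gain * slope_actual) * e(k)."""
    return abs(1.0 - gain * slope_actual) < 1.0


# Hypothetical numbers: nominal slope c = 0.5 W per bank, offset d = 2 W,
# and a gain of 1/c, which is deadbeat for the nominal model.
NOMINAL_SLOPE, OFFSET = 0.5, 2.0
GAIN = 1.0 / NOMINAL_SLOPE

# The actual slope differs from the nominal one (0.7 instead of 0.5).
powers = power_loop(budget=8.0, slope_actual=0.7, offset=OFFSET,
                    gain=GAIN, s0=16.0, steps=50)
print(f"power after {len(powers)} periods: {powers[-1]:.3f} W")
print("stable for slope 1.1?", converges(1.1, GAIN))
```

With these assumed numbers the loop tolerates any actual slope between 0 and 1.0; a different gain would give a different upper bound.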
Simulation
• Simulator: SimpleScalar with NUCA cache (Alpha 21264-like core)
• Power reading
  - Dynamic part: Wattch (with CACTI)
  - Leakage part: HotLeakage
• Workload: selected workloads from SPEC2000
• Actuator: cache bank resizing and partitioning
[Figure: four panels, each showing a 16-bank cache layout with banks numbered 1 to 16]
Single Control Evaluation
[Figure panels: RLC set point change; power controller set point change; workload switch (annotated "Switch workloads here"); total cache bank count change]
Relative Latency & IPC
Coordination
Cache access latencies and IPC values of the four threads on the four cores of the CMP.
Cache access latencies and IPC values of the two threads on Core 0 and Core 1 for different benchmarks.
Conclusions
Relative Cache Latency Control for Performance Differentiations in Power-Constrained Chip Multiprocessors
• Simultaneously control power and relative latency
• Achieve desired performance differentiations
• Theoretically analyze the single-loop control and coordinated system stability
SHARP Control: Controlled Shared Cache Management in Chip Multiprocessors
Shekhar Srikantaiah, Mahmut Kandemir, *Qian Wang
Department of CSE
*Department of MNE
The Pennsylvania State University
Introduction
• Lack of control over shared on-chip resources
  - Faded performance isolation
  - Lack of Quality of Service (QoS) guarantees
• It is challenging to achieve high utilization while guaranteeing QoS.
  - Static/dynamic resource reservations may lead to low resource utilization.
  - Existing heuristic adjustments cannot provide theoretical guarantees such as "settling time" or "stability range".
Contribution
• Two-layer, control-theory-based SHARP (SHAred Resource Partitioning) architecture
• Propose an empirical model
• Design a customized application controller (Reinforced Oscillation Resistant controller)
• Study two policies that can be used in SHARP
  - SD (Service Differentiation)
  - FSI (Fair Speedup Improvement)
$$ FS = \frac{N}{\sum_{i=1}^{N} \dfrac{IPC_{app_i}(base)}{IPC_{app_i}(scheme)}} $$
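The fair speedup (FS) metric is the harmonic mean of the per-application speedups of a scheme over a baseline: FS = N / Σ (IPC_base / IPC_scheme). A minimal sketch follows; the function name and the sample IPC values are illustrative assumptions.

```python
def fair_speedup(ipc_base, ipc_scheme):
    """Harmonic mean of per-application speedups IPC_scheme / IPC_base."""
    if not ipc_base or len(ipc_base) != len(ipc_scheme):
        raise ValueError("need one baseline and one scheme IPC per application")
    slowdowns = (base / scheme for base, scheme in zip(ipc_base, ipc_scheme))
    return len(ipc_base) / sum(slowdowns)


# Hypothetical two-application mix: one application doubles its IPC,
# the other drops to half of its baseline IPC.
fs = fair_speedup(ipc_base=[1.0, 1.0], ipc_scheme=[2.0, 0.5])
print(f"FS = {fs:.2f}")
```

Because it is a harmonic mean, FS penalizes a scheme that speeds up one application at the expense of another: in the mix above the arithmetic mean of the speedups is 1.25, yet FS falls below 1.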
System Design
Why not PID?
Disadvantages of a PID (Proportional-Integral-Derivative) controller:
• Painstaking to tune the parameters
• Hard to integrate with a hierarchical architecture
• Sensitive to model variation at run time
• Static parameters
• Generic controller (not problem-specific)
• Linear-model-based controller
Application Controller
Pre-Actuation Negotiator (PAN)
• Map an overly demanded cache partition to a feasible partition
• Policies:
  - SD (Service Differentiation)
  - FSI (Fair Speedup Improvement)
$$ w_i^* = \mathrm{floor}\!\left( w_i \left( 1 - \frac{spill}{\sum_{i=0}^{N} w_i} \right) \right), \qquad spill = \sum_{i=0}^{N} w_i - W $$
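One plausible reading of the garbled PAN formula is a proportional scale-down: when the requested ways w_i add up to more than the W ways available, each request is reduced by the fraction spill / Σ w_i and rounded down, with spill = Σ w_i - W. The sketch below implements that reading only; the function name, the sample demands, and the assumption that this is the rule used (the slide does not say which policy it belongs to) are all hypothetical.

```python
def negotiate(demands, total_ways):
    """Scale over-subscribed way demands down to a feasible partition.

    Implements w_i* = floor(w_i * (1 - spill / sum(w))) with
    spill = sum(w) - W. For spill > 0 this equals floor(w_i * W / sum(w)),
    which is evaluated with integer arithmetic to avoid rounding surprises.
    """
    total_demand = sum(demands)
    spill = total_demand - total_ways
    if spill <= 0:
        return list(demands)  # already feasible, nothing to negotiate
    return [(w * total_ways) // total_demand for w in demands]


# Hypothetical example: three applications request 20 ways of a 16-way cache.
allocation = negotiate([8, 6, 6], total_ways=16)
print(allocation, "uses", sum(allocation), "of 16 ways")
```

Rounding down guarantees feasibility but can leave a few ways unassigned, which is one reason a higher-level controller may want to reclaim underutilized ways.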
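SHARP Controller
• Increase IPC set points when cache ways are underutilized
• FSI & SD policies
• The proof of guaranteed optimal utilization
[Equation not recoverable from the transcript: an update rule for the IPC set points $P_i^{ref}$ involving the allocated ways $w_j^*$, the measured outputs $P_j^{out}$, the total number of ways $W$, and sums over $j = 0, \dots, N$.]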
Experimental Setup
• Simulator: Simics (full-system simulator)
• Operating system: Solaris 10
• Configuration: 2 and 8 cores
• Workload: 6 mixes of applications selected from SPEC2000
Evaluation (Application Controller)
Long run results of PID controller and ROR controller
Evaluation (FSI)
SHARP vs Baselines
Evaluation (SD)
Adaptation of IPC with the SD policy using the ROR controllers.
Sensitivity & Scalability
Sensitivity analysis for different reference points
Scalability (8 cores)
Conclusion
SHARP Control: Controlled Shared Cache Management in Chip Multiprocessors
• Propose and design the SHARP control architecture for shared L2 caches
• Validate SHARP with different management policies (FSI or SD)
• Achieve desired FS and SD specifications
Critiques (1)
How to decide the relative latency set point?
For the purpose of accelerating critical threads, parallel workloads may be more applicable.
Critiques (2)
No stability proof
Insufficient description of how to update the parameters of the application controllers
Comparison

| | Relative latency control with the power constraint | SHARP control architecture |
|---|---|---|
| Goal | Guarantee NUCA L2 cache relative latency under different power budgets | Improve normal L2 cache utilization while guaranteeing the QoS metrics |
| Design | Two-layer hierarchical design | Two-layer hierarchical design |
| Controller | PID | ROR |
| Coordination & Stability | Yes | No |
| Actuator | Cache bank resizing and partitioning | Cache way resizing and partitioning |
| Evaluation | SimpleScalar | Simics |
Q & A
Thank you
Backup Slides Start
Relative Controller Evaluation (2)
Application Controller Evaluation (2)
Guaranteed Optimal Utilization Proof
• $K_i$ and a second per-application coefficient (its symbol is lost in the transcript) are time-varying and depend on the applications.
[Derivation not recoverable from the transcript: a chain of equations relating the allocated ways $w_i^*(t)$, the IPC set points $P_i^{ref}$, the measured outputs $P_i^{out}(t)$, and the total number of ways $W$, with sums over $i = 0, \dots, N$.]
System Design