fairness via source throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfb1 a1 a2 a3...
TRANSCRIPT
Fairness via Source Throttling:
A configurable and high-performance fairness substrate for multi-core memory systems
Eiman Ebrahimi*
Chang Joo Lee*
Onur Mutlu‡
Yale N. Patt*
* HPS Research Group
The University of Texas at Austin
‡ Computer Architecture Laboratory
Carnegie Mellon University
Wednesday, March 17, 2010
Background and Problem
Core 0 Core 1 Core 2 Core N...
2
Wednesday, March 17, 2010
Background and Problem
Core 0 Core 1 Core 2 Core N
Shared Cache
Memory Controller
DRAMBank 0
DRAMBank 1
DRAM Bank 2
... DRAMBank K
...
Chip BoundaryOn-chipOff-chip
2
Wednesday, March 17, 2010
Background and Problem
Core 0 Core 1 Core 2 Core N
Shared Cache
Memory Controller
DRAMBank 0
DRAMBank 1
DRAM Bank 2
... DRAMBank K
...
Chip BoundaryOn-chipOff-chip
2
Wednesday, March 17, 2010
Background and Problem
Core 0 Core 1 Core 2 Core N
Shared Cache
Memory Controller
DRAMBank 0
DRAMBank 1
DRAM Bank 2
... DRAMBank K
...
Chip BoundaryOn-chipOff-chip
2
Wednesday, March 17, 2010
Background and Problem
Core 0 Core 1 Core 2 Core N
Shared Cache
Memory Controller
DRAMBank 0
DRAMBank 1
DRAM Bank 2
... DRAMBank K
...
Chip BoundaryOn-chipOff-chip
2
Wednesday, March 17, 2010
Background and Problem
Core 0 Core 1 Core 2 Core N
Shared Cache
Memory Controller
DRAMBank 0
DRAMBank 1
DRAM Bank 2
... DRAMBank K
...
Chip BoundaryOn-chipOff-chip
2
Wednesday, March 17, 2010
Background and Problem
Core 0 Core 1 Core 2 Core N
Shared Cache
Memory Controller
DRAMBank 0
DRAMBank 1
DRAM Bank 2
... DRAMBank K
...
Shared MemoryResources
Chip BoundaryOn-chipOff-chip
2
Wednesday, March 17, 2010
•Applications slow down due to interference from memory requests of other applications
3
Background and Problem
Wednesday, March 17, 2010
• A memory system is fair if slowdowns of same-priority applications are equal(MICRO ’06, MICRO ’07, ISCA ’08)
•Applications slow down due to interference from memory requests of other applications
3
Background and Problem
Wednesday, March 17, 2010
• A memory system is fair if slowdowns of same-priority applications are equal(MICRO ’06, MICRO ’07, ISCA ’08)
•Applications slow down due to interference from memory requests of other applications
3
Background and Problem
SharedTi
•Slowdown of application i = Ti
Alone
Wednesday, March 17, 2010
• A memory system is fair if slowdowns of same-priority applications are equal(MICRO ’06, MICRO ’07, ISCA ’08)
•Applications slow down due to interference from memory requests of other applications
3
Background and Problem
SharedTi
•Slowdown of application i = Ti
Alone
Max{Slowdown i} over all applications i
Min{Slowdown i} over all applications i(MICRO ’07)
•Unfairness =
Wednesday, March 17, 2010
4
Background and Problem
Wednesday, March 17, 2010
• Magnitude of each application’s slowdown depends on concurrently running applications’ memory behavior
4
Background and Problem
Wednesday, March 17, 2010
0
1
2
3
4
5
zeus art
Slow
dow
n• Magnitude of each application’s slowdown depends on
concurrently running applications’ memory behavior
4
Background and Problem
Wednesday, March 17, 2010
0
1
2
3
4
5
zeus art
Slow
dow
n• Magnitude of each application’s slowdown depends on
concurrently running applications’ memory behavior
01234567
lbm omnet apsi vortexSl
owdo
wn
4
Background and Problem
Wednesday, March 17, 2010
0
1
2
3
4
5
zeus art
Slow
dow
n• Magnitude of each application’s slowdown depends on
concurrently running applications’ memory behavior
• Large disparities in slowdowns are unacceptable
01234567
lbm omnet apsi vortexSl
owdo
wn
4
Background and Problem
Wednesday, March 17, 2010
0
1
2
3
4
5
zeus art
Slow
dow
n• Magnitude of each application’s slowdown depends on
concurrently running applications’ memory behavior
• Large disparities in slowdowns are unacceptable• Low system performance
01234567
lbm omnet apsi vortexSl
owdo
wn
4
Background and Problem
Wednesday, March 17, 2010
0
1
2
3
4
5
zeus art
Slow
dow
n• Magnitude of each application’s slowdown depends on
concurrently running applications’ memory behavior
• Large disparities in slowdowns are unacceptable• Low system performance • Vulnerability to denial of service attacks
01234567
lbm omnet apsi vortexSl
owdo
wn
4
Background and Problem
Wednesday, March 17, 2010
0
1
2
3
4
5
zeus art
Slow
dow
n• Magnitude of each application’s slowdown depends on
concurrently running applications’ memory behavior
• Large disparities in slowdowns are unacceptable• Low system performance • Vulnerability to denial of service attacks• Difficult for system software to enforce priorities
01234567
lbm omnet apsi vortexSl
owdo
wn
4
Background and Problem
Wednesday, March 17, 2010
•Background and Problem
•Motivation for Source Throttling
• Fairness via Source Throttling (FST)
•Evaluation
•Conclusion
5
Outline
Wednesday, March 17, 2010
6
Prior Approaches
Wednesday, March 17, 2010
• Primarily manage inter-application interference in only one particular resource
• Shared Cache, Memory Controller, Interconnect, etc.
6
Prior Approaches
Wednesday, March 17, 2010
• Primarily manage inter-application interference in only one particular resource
• Shared Cache, Memory Controller, Interconnect, etc.
• Combining techniques for the different resources can result in negative interaction
6
Prior Approaches
Wednesday, March 17, 2010
• Primarily manage inter-application interference in only one particular resource
• Shared Cache, Memory Controller, Interconnect, etc.
• Combining techniques for the different resources can result in negative interaction
• Approaches that coordinate interaction among techniques for different resources require complex implementations
6
Prior Approaches
Wednesday, March 17, 2010
• Primarily manage inter-application interference in only one particular resource
• Shared Cache, Memory Controller, Interconnect, etc.
• Combining techniques for the different resources can result in negative interaction
• Approaches that coordinate interaction among techniques for different resources require complex implementations
Our Goal: Enable fair sharing ofthe entire memory system by dynamically detecting and controlling interference in a coordinated manner
6
Prior Approaches
Wednesday, March 17, 2010
7
Our Approach
Wednesday, March 17, 2010
• Manage inter-application interference at the cores, not at the shared resources
7
Our Approach
Wednesday, March 17, 2010
• Manage inter-application interference at the cores, not at the shared resources
• Dynamically estimate unfairness in the memory system
7
Our Approach
Wednesday, March 17, 2010
• Manage inter-application interference at the cores, not at the shared resources
• Dynamically estimate unfairness in the memory system
• If unfairness > system-software-specified target thenthrottle down core causing unfairness & throttle up core that was unfairly treated
7
Our Approach
Wednesday, March 17, 2010
Wednesday, March 17, 2010
A:B:
Unmanaged Interference
A:
B:Fair Source Throttling
Wednesday, March 17, 2010
Oldest ⎧||⎩
Shared MemoryResources
A:B:
Unmanaged Interference
A:
B:Fair Source Throttling
queue of requests to shared resources
Wednesday, March 17, 2010
Oldest ⎧||⎩
Shared MemoryResources
A: Compute
ComputeB:Unmanaged Interference
A:
B:Fair Source Throttling
queue of requests to shared resources
Wednesday, March 17, 2010
Oldest ⎧||⎩
Shared MemoryResources
A: Compute
ComputeB:
Request Generation Order: A1, A2, A3, A4, B1
Unmanaged Interference
A:
B:Fair Source Throttling
queue of requests to shared resources
Wednesday, March 17, 2010
A4B1
A1
A2
A3
Oldest ⎧||⎩
Shared MemoryResources
A: Compute
ComputeB:
Request Generation Order: A1, A2, A3, A4, B1
Unmanaged Interference
A:
B:Fair Source Throttling
queue of requests to shared resources
Wednesday, March 17, 2010
A4B1
A1
A2
A3
Oldest ⎧||⎩
Shared MemoryResources
A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4
ComputeB:
Request Generation Order: A1, A2, A3, A4, B1
Unmanaged Interference
A:
B:Fair Source Throttling
queue of requests to shared resources
Wednesday, March 17, 2010
A4B1
A1
A2
A3
Oldest ⎧||⎩
Shared MemoryResources
A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4
ComputeB:
Request Generation Order: A1, A2, A3, A4, B1
Unmanaged Interference
Core A’s stall time
A:
B:Fair Source Throttling
queue of requests to shared resources
Wednesday, March 17, 2010
A4B1
A1
A2
A3
Oldest ⎧||⎩
Shared MemoryResources
A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4
Compute Stall waiting for shared resourcesB:
Request Generation Order: A1, A2, A3, A4, B1
Unmanaged Interference
Core A’s stall time
A:
B:Fair Source Throttling
queue of requests to shared resources
Wednesday, March 17, 2010
A4B1
A1
A2
A3
Oldest ⎧||⎩
Shared MemoryResources
A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4
Compute Stall waiting for shared resources Stall on B1B:
Request Generation Order: A1, A2, A3, A4, B1
Unmanaged Interference
Core A’s stall time
A:
B:Fair Source Throttling
queue of requests to shared resources
Wednesday, March 17, 2010
A4B1
A1
A2
A3
Oldest ⎧||⎩
Shared MemoryResources
A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4
Compute Stall waiting for shared resources Stall on B1B:
Request Generation Order: A1, A2, A3, A4, B1
Unmanaged Interference
Core A’s stall timeCore B’s stall time
A:
B:Fair Source Throttling
queue of requests to shared resources
Wednesday, March 17, 2010
A4B1
A1
A2
A3
Oldest ⎧||⎩
Shared MemoryResources
A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4
Compute Stall waiting for shared resources Stall on B1B:
Request Generation Order: A1, A2, A3, A4, B1
Unmanaged Interference
Core A’s stall timeCore B’s stall time
A:
B:Fair Source Throttling
queue of requests to shared resources
Intensive application A generates many requests and causes long stall times for less intensive application B
Wednesday, March 17, 2010
A4B1
A1
A2
A3
Oldest ⎧||⎩
Shared MemoryResources
A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4
Compute Stall waiting for shared resources Stall on B1B:
Request Generation Order: A1, A2, A3, A4, B1
Unmanaged Interference
Core A’s stall timeCore B’s stall time
A: Compute
ComputeB:Fair Source Throttling
Request Generation OrderA1, A2, A3, A4, B1
queue of requests to shared resources
Intensive application A generates many requests and causes long stall times for less intensive application B
Wednesday, March 17, 2010
A4B1
A1
A2
A3
Oldest ⎧||⎩
Shared MemoryResources
A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4
Compute Stall waiting for shared resources Stall on B1B:
Request Generation Order: A1, A2, A3, A4, B1
Unmanaged Interference
Core A’s stall timeCore B’s stall time
A: Compute
ComputeB:Fair Source Throttling
Request Generation OrderA1, A2, A3, A4, B1
queue of requests to shared resources
Intensive application A generates many requests and causes long stall times for less intensive application B
Wednesday, March 17, 2010
A4B1
A1
A2
A3
Oldest ⎧||⎩
Shared MemoryResources
A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4
Compute Stall waiting for shared resources Stall on B1B:
Request Generation Order: A1, A2, A3, A4, B1
Unmanaged Interference
Core A’s stall timeCore B’s stall time
A: Compute
ComputeB:Fair Source Throttling
Request Generation OrderA1, B1, A2, A3, A4
queue of requests to shared resources
Intensive application A generates many requests and causes long stall times for less intensive application B
Throttled Requests
Wednesday, March 17, 2010
A4B1
A1
A2
A3
Oldest ⎧||⎩
Shared MemoryResources
A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4
Compute Stall waiting for shared resources Stall on B1B:
Request Generation Order: A1, A2, A3, A4, B1
Unmanaged Interference
Core A’s stall timeCore B’s stall time
⎧||⎩
Shared MemoryResources
A: Compute
ComputeB:Fair Source Throttling
Request Generation OrderA1, B1, A2, A3, A4
queue of requests to shared resources
queue of requests to shared resources
Oldest
Intensive application A generates many requests and causes long stall times for less intensive application B
Throttled Requests
Wednesday, March 17, 2010
A4B1
A1
A2
A3
Oldest ⎧||⎩
Shared MemoryResources
A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4
Compute Stall waiting for shared resources Stall on B1B:
Request Generation Order: A1, A2, A3, A4, B1
Unmanaged Interference
Core A’s stall timeCore B’s stall time
A4
B1
A1
A2
A3
⎧||⎩
Shared MemoryResources
A: Compute
ComputeB:Fair Source Throttling
Request Generation OrderA1, B1, A2, A3, A4
queue of requests to shared resources
queue of requests to shared resources
Oldest
Intensive application A generates many requests and causes long stall times for less intensive application B
Throttled Requests
Wednesday, March 17, 2010
A4B1
A1
A2
A3
Oldest ⎧||⎩
Shared MemoryResources
A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4
Compute Stall waiting for shared resources Stall on B1B:
Request Generation Order: A1, A2, A3, A4, B1
Unmanaged Interference
Core A’s stall timeCore B’s stall time
A4
B1
A1
A2
A3
⎧||⎩
Shared MemoryResources
A: Compute Stall on A1
Compute Stall wait.B:Fair Source Throttling
Request Generation OrderA1, B1, A2, A3, A4
queue of requests to shared resources
queue of requests to shared resources
Oldest
Intensive application A generates many requests and causes long stall times for less intensive application B
Throttled Requests
Wednesday, March 17, 2010
A4B1
A1
A2
A3
Oldest ⎧||⎩
Shared MemoryResources
A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4
Compute Stall waiting for shared resources Stall on B1B:
Request Generation Order: A1, A2, A3, A4, B1
Unmanaged Interference
Core A’s stall timeCore B’s stall time
A4
B1
A1
A2
A3
⎧||⎩
Shared MemoryResources
A: Compute Stall on A1
Compute Stall wait. Stall on B1B:Fair Source Throttling
Stall wait.
Request Generation OrderA1, B1, A2, A3, A4
queue of requests to shared resources
queue of requests to shared resources
Oldest
Intensive application A generates many requests and causes long stall times for less intensive application B
Throttled Requests
Wednesday, March 17, 2010
A4B1
A1
A2
A3
Oldest ⎧||⎩
Shared MemoryResources
A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4
Compute Stall waiting for shared resources Stall on B1B:
Request Generation Order: A1, A2, A3, A4, B1
Unmanaged Interference
Core A’s stall timeCore B’s stall time
A4
B1
A1
A2
A3
⎧||⎩
Shared MemoryResources
A: Compute Stall on A1
Compute Stall wait. Stall on B1B:
Core B’s stall time
Fair Source Throttling
Stall wait.
Request Generation OrderA1, B1, A2, A3, A4
queue of requests to shared resources
queue of requests to shared resources
Oldest
Intensive application A generates many requests and causes long stall times for less intensive application B
Throttled Requests
Wednesday, March 17, 2010
A4B1
A1
A2
A3
Oldest ⎧||⎩
Shared MemoryResources
A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4
Compute Stall waiting for shared resources Stall on B1B:
Request Generation Order: A1, A2, A3, A4, B1
Unmanaged Interference
Core A’s stall timeCore B’s stall time
A4
B1
A1
A2
A3
⎧||⎩
Shared MemoryResources
A: Compute Stall on A1 Stall on A2
Compute Stall wait. Stall on B1B:
Core B’s stall time
Fair Source Throttling
Stall wait.
Request Generation OrderA1, B1, A2, A3, A4
queue of requests to shared resources
queue of requests to shared resources
Oldest
Intensive application A generates many requests and causes long stall times for less intensive application B
Throttled Requests
Stall on A4Stall on A3
Wednesday, March 17, 2010
A4B1
A1
A2
A3
Oldest ⎧||⎩
Shared MemoryResources
A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4
Compute Stall waiting for shared resources Stall on B1B:
Request Generation Order: A1, A2, A3, A4, B1
Unmanaged Interference
Core A’s stall timeCore B’s stall time
A4
B1
A1
A2
A3
⎧||⎩
Shared MemoryResources
A: Compute Stall on A1 Stall on A2
Compute Stall wait. Stall on B1B:
Core A’s stall time
Core B’s stall time
Fair Source Throttling
Stall wait.
Request Generation OrderA1, B1, A2, A3, A4
queue of requests to shared resources
queue of requests to shared resources
Oldest
Intensive application A generates many requests and causes long stall times for less intensive application B
Throttled Requests
Stall on A4Stall on A3
Extra Cycles Core A
Wednesday, March 17, 2010
A4B1
A1
A2
A3
Oldest ⎧||⎩
Shared MemoryResources
A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4
Compute Stall waiting for shared resources Stall on B1B:
Request Generation Order: A1, A2, A3, A4, B1
Unmanaged Interference
Core A’s stall timeCore B’s stall time
A4
B1
A1
A2
A3
⎧||⎩
Shared MemoryResources
A: Compute Stall on A1 Stall on A2
Compute Stall wait. Stall on B1B:
Core A’s stall time
Core B’s stall time
Fair Source Throttling
Stall wait.
Request Generation OrderA1, B1, A2, A3, A4
queue of requests to shared resources
queue of requests to shared resources
Saved Cycles Core BOldest
Intensive application A generates many requests and causes long stall times for less intensive application B
Throttled Requests
Stall on A4Stall on A3
Extra Cycles Core A
Wednesday, March 17, 2010
A4B1
A1
A2
A3
Oldest ⎧||⎩
Shared MemoryResources
A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4
Compute Stall waiting for shared resources Stall on B1B:
Request Generation Order: A1, A2, A3, A4, B1
Unmanaged Interference
Core A’s stall timeCore B’s stall time
A4
B1
A1
A2
A3
⎧||⎩
Shared MemoryResources
A: Compute Stall on A1 Stall on A2
Compute Stall wait. Stall on B1B:
Dynamically detect application A’s interference for application B and throttle down application A
Core A’s stall time
Core B’s stall time
Fair Source Throttling
Stall wait.
Request Generation OrderA1, B1, A2, A3, A4
queue of requests to shared resources
queue of requests to shared resources
Saved Cycles Core BOldest
Intensive application A generates many requests and causes long stall times for less intensive application B
Throttled Requests
Stall on A4Stall on A3
Extra Cycles Core A
Wednesday, March 17, 2010
•Background and Problem
•Motivation for Source Throttling
• Fairness via Source Throttling (FST)
•Evaluation
•Conclusion
9
Outline
Wednesday, March 17, 2010
10
Fairness via Source Throttling (FST)
Wednesday, March 17, 2010
•Runtime Unfairness Evaluation• Dynamically estimates the unfairness in the
memory system
10
Fairness via Source Throttling (FST)
Wednesday, March 17, 2010
•Runtime Unfairness Evaluation• Dynamically estimates the unfairness in the
memory system
•Dynamic Request Throttling• Adjusts how aggressively each core makes
requests to the shared resources
10
Fairness via Source Throttling (FST)
Wednesday, March 17, 2010
Runtime UnfairnessEvaluation
DynamicRequest Throttling
FST
TimeInterval 1 Interval 2 Interval 3
Fairness via Source Throttling (FST)
Wednesday, March 17, 2010
Runtime UnfairnessEvaluation
DynamicRequest Throttling
FST
| ⎨ | ⎧⎩Slowdown Estimation
TimeInterval 1 Interval 2 Interval 3
Fairness via Source Throttling (FST)
Wednesday, March 17, 2010
Runtime UnfairnessEvaluation
DynamicRequest Throttling
1- Estimating system unfairness
FST
| ⎨ | ⎧⎩Slowdown Estimation
TimeInterval 1 Interval 2 Interval 3
Runtime UnfairnessEvaluation
Fairness via Source Throttling (FST)
Wednesday, March 17, 2010
Runtime UnfairnessEvaluation
DynamicRequest Throttling
1- Estimating system unfairness 2- Find app. with the highest slowdown (App-slowest)
FST
| ⎨ | ⎧⎩Slowdown Estimation
TimeInterval 1 Interval 2 Interval 3
Runtime UnfairnessEvaluation
Fairness via Source Throttling (FST)
Wednesday, March 17, 2010
Runtime UnfairnessEvaluation
DynamicRequest Throttling
1- Estimating system unfairness 2- Find app. with the highest slowdown (App-slowest)3- Find app. causing most interference for App-slowest (App-interfering)
FST
| ⎨ | ⎧⎩Slowdown Estimation
TimeInterval 1 Interval 2 Interval 3
Runtime UnfairnessEvaluation
Fairness via Source Throttling (FST)
Wednesday, March 17, 2010
Runtime UnfairnessEvaluation
DynamicRequest Throttling
1- Estimating system unfairness 2- Find app. with the highest slowdown (App-slowest)3- Find app. causing most interference for App-slowest (App-interfering)
FSTUnfairness Estimate
App-slowestApp-interfering
| ⎨ | ⎧⎩Slowdown Estimation
TimeInterval 1 Interval 2 Interval 3
DynamicRequest Throttling
Fairness via Source Throttling (FST)
Wednesday, March 17, 2010
Runtime UnfairnessEvaluation
DynamicRequest Throttling
1- Estimating system unfairness 2- Find app. with the highest slowdown (App-slowest)3- Find app. causing most interference for App-slowest (App-interfering)
if (Unfairness Estimate >Target) {
FSTUnfairness Estimate
App-slowestApp-interfering
| ⎨ | ⎧⎩Slowdown Estimation
TimeInterval 1 Interval 2 Interval 3
DynamicRequest Throttling
Fairness via Source Throttling (FST)
Wednesday, March 17, 2010
Runtime UnfairnessEvaluation
DynamicRequest Throttling
1- Estimating system unfairness 2- Find app. with the highest slowdown (App-slowest)3- Find app. causing most interference for App-slowest (App-interfering)
if (Unfairness Estimate >Target) { 1-Throttle down App-interfering
FSTUnfairness Estimate
App-slowestApp-interfering
| ⎨ | ⎧⎩Slowdown Estimation
TimeInterval 1 Interval 2 Interval 3
DynamicRequest Throttling
Fairness via Source Throttling (FST)
Wednesday, March 17, 2010
Runtime UnfairnessEvaluation
DynamicRequest Throttling
1- Estimating system unfairness 2- Find app. with the highest slowdown (App-slowest)3- Find app. causing most interference for App-slowest (App-interfering)
if (Unfairness Estimate >Target) { 1-Throttle down App-interfering 2-Throttle up App-slowest}
FSTUnfairness Estimate
App-slowestApp-interfering
| ⎨ | ⎧⎩Slowdown Estimation
TimeInterval 1 Interval 2 Interval 3
DynamicRequest Throttling
Fairness via Source Throttling (FST)
Wednesday, March 17, 2010
Runtime UnfairnessEvaluation
DynamicRequest Throttling
1- Estimating system unfairness 2- Find app. with the highest slowdown (App-slowest)3- Find app. causing most interference for App-slowest (App-interfering)
if (Unfairness Estimate >Target) { 1-Throttle down App-interfering 2-Throttle up App-slowest}
FSTUnfairness Estimate
App-slowestApp-interfering
12
Fairness via Source Throttling (FST)
Wednesday, March 17, 2010
13
SharedTi
•Slowdown of application i = Ti
Alone
Max{Slowdown i} over all applications i
Min{Slowdown i} over all applications i•Unfairness =
Estimating System Unfairness
Wednesday, March 17, 2010
•How can be estimated in shared mode?TiAlone
13
SharedTi
•Slowdown of application i = Ti
Alone
Max{Slowdown i} over all applications i
Min{Slowdown i} over all applications i•Unfairness =
Estimating System Unfairness
Wednesday, March 17, 2010
•How can be estimated in shared mode?TiAlone
13
SharedTi
•Slowdown of application i = Ti
Alone
Max{Slowdown i} over all applications i
Min{Slowdown i} over all applications i•Unfairness =
TiExcess
• is the number of extra cycles it takes application i to execute due to interference
Estimating System Unfairness
Wednesday, March 17, 2010
•How can be estimated in shared mode?TiAlone
13
SharedTi
•Slowdown of application i = Ti
Alone
Max{Slowdown i} over all applications i
Min{Slowdown i} over all applications i•Unfairness =
TiExcess
• is the number of extra cycles it takes application i to execute due to interference
TiShared
=TiAlone
- TiExcess
•
Estimating System Unfairness
Wednesday, March 17, 2010
14
Tracking Inter-Core Interference
Wednesday, March 17, 2010
Core 0 Core 1 Core 2 Core 3
Bank 0 Bank 1 Bank 2 Bank 7...
Memory Controller
Shared Cache
14
Tracking Inter-Core Interference
Wednesday, March 17, 2010
Core 0 Core 1 Core 2 Core 3
Bank 0 Bank 1 Bank 2 Bank 7...
Memory Controller
Shared Cache
Three interference sources:
14
Tracking Inter-Core Interference
Wednesday, March 17, 2010
Core 0 Core 1 Core 2 Core 3
Bank 0 Bank 1 Bank 2 Bank 7...
Memory Controller
Shared Cache
Three interference sources:1. Shared Cache
14
Tracking Inter-Core Interference
Wednesday, March 17, 2010
Core 0 Core 1 Core 2 Core 3
Bank 0 Bank 1 Bank 2 Bank 7...
Memory Controller
Shared Cache
Three interference sources:1. Shared Cache2. DRAM bus and bank
14
Tracking Inter-Core Interference
Wednesday, March 17, 2010
Core 0 Core 1 Core 2 Core 3
Bank 0 Bank 1 Bank 2 Bank 7...
Memory Controller
Shared Cache
Three interference sources:1. Shared Cache2. DRAM bus and bank3. DRAM row-buffers
14
Bank 2
Tracking Inter-Core Interference
Row
Wednesday, March 17, 2010
Core 0 Core 1 Core 2 Core 3
Bank 0 Bank 1 Bank 2 Bank 7...
Memory Controller
Shared Cache
Three interference sources:1. Shared Cache2. DRAM bus and bank3. DRAM row-buffers
14
Bank 2
Tracking Inter-Core Interference
Row
Wednesday, March 17, 2010
0 0 0 0
Interference per corebit vector
Core # 0 1 2 3
Core 0 Core 1 Core 2 Core 3
Bank 0 Bank 1 Bank 2 Bank 7...
Memory Controller
Shared Cache
Three interference sources:1. Shared Cache2. DRAM bus and bank3. DRAM row-buffers
FST hardware
14
Bank 2
Tracking Inter-Core Interference
Row
Wednesday, March 17, 2010
0 0 0 0
Interference per corebit vector
Core # 0 1 2 3
Core 0 Core 1 Core 2 Core 3
Bank 0 Bank 1 Bank 2 Bank 7...
Memory Controller
Shared Cache
Three interference sources:1. Shared Cache2. DRAM bus and bank3. DRAM row-buffers
FST hardware
14
Bank 2
Tracking Inter-Core Interference
Row
Wednesday, March 17, 2010
Core 0 Core 1
15
Tracking DRAM Row-Buffer Interference
Wednesday, March 17, 2010
Bank 2Bank 0 Bank 1 Bank 7...
Core 0 Core 1
15
Tracking DRAM Row-Buffer Interference
Wednesday, March 17, 2010
Row B
Row A
Bank 2Bank 0 Bank 1 Bank 7...
Core 0 Core 1
Row Aqueue of requests to bank 2
15
Tracking DRAM Row-Buffer Interference
Wednesday, March 17, 2010
Row B
Row A
Bank 2
Row B
Bank 0 Bank 1 Bank 7...
Core 0 Core 1
Row Aqueue of requests to bank 2
Row Buffer:
15
Tracking DRAM Row-Buffer Interference
Wednesday, March 17, 2010
Row B
Row A
Bank 2
Row B
Bank 0 Bank 1 Bank 7...
Core 0 Core 1
Row Aqueue of requests to bank 2
FST additions
Row Buffer:
15
Tracking DRAM Row-Buffer Interference
Wednesday, March 17, 2010
Row B
Row A
Bank 2
Row B
Bank 0 Bank 1 Bank 7...
Core 0 Core 1
Row A
Interference per corebit vector
0 0Core # 0 1
queue of requests to bank 2
FST additions
Row Buffer:
15
Tracking DRAM Row-Buffer Interference
Wednesday, March 17, 2010
Row B
Row A
Bank 2
Row B
Bank 0 Bank 1 Bank 7...
Core 0 Core 1
Row A
Shadow Row Address Register(SRAR) Core 0 :
Interference per corebit vector
0 0Core # 0 1
Shadow Row Address Register(SRAR) Core 1 :
queue of requests to bank 2
FST additions
Row Buffer:
15
Tracking DRAM Row-Buffer Interference
Wednesday, March 17, 2010
Row B
Row A
Bank 2
Row B
Bank 0 Bank 1 Bank 7...
Core 0 Core 1
Row A
Shadow Row Address Register(SRAR) Core 0 :
Interference per corebit vector
0 0Core # 0 1
Shadow Row Address Register(SRAR) Core 1 :
queue of requests to bank 2
FST additions
Row Buffer:
15
Tracking DRAM Row-Buffer Interference
Wednesday, March 17, 2010
Row B
Row A
Bank 2
Row B
Bank 0 Bank 1 Bank 7...
Core 0 Core 1
Row Hit
Row A
Shadow Row Address Register(SRAR) Core 0 :
Interference per corebit vector
0 0Core # 0 1
Shadow Row Address Register(SRAR) Core 1 :
queue of requests to bank 2
FST additions
Row Buffer:
15
Tracking DRAM Row-Buffer Interference
Wednesday, March 17, 2010
Row B
Row A
Bank 2
Row B
Bank 0 Bank 1 Bank 7...
Core 0 Core 1
Row Hit
Row A
Row BShadow Row Address Register(SRAR) Core 0 :
Interference per corebit vector
0 0Core # 0 1
Shadow Row Address Register(SRAR) Core 1 :
queue of requests to bank 2
FST additions
Row Buffer:
15
Tracking DRAM Row-Buffer Interference
Wednesday, March 17, 2010
Row B
Row A
Bank 2
Row B
Bank 0 Bank 1 Bank 7...
Core 0 Core 1
Row ConflictRow A
Row BShadow Row Address Register(SRAR) Core 0 :
Row A
Interference per corebit vector
0 0Core # 0 1
Shadow Row Address Register(SRAR) Core 1 :
queue of requests to bank 2
FST additions
Row Buffer:
15
Tracking DRAM Row-Buffer Interference
Wednesday, March 17, 2010
Row B
Row A
Bank 2
Row B
Bank 0 Bank 1 Bank 7...
Core 0 Core 1
Row ConflictRow A
Row BRow B
Shadow Row Address Register(SRAR) Core 0 :
Row A
Interference per corebit vector
0 0Core # 0 1
Shadow Row Address Register(SRAR) Core 1 :
queue of requests to bank 2
FST additions
Row Buffer:
15
Tracking DRAM Row-Buffer Interference
Wednesday, March 17, 2010
Row BRow A
Bank 2
Row B
Bank 0 Bank 1 Bank 7...
Core 0 Core 1
Row A
Row Hit
Row BRow B
Shadow Row Address Register(SRAR) Core 0 :
Row A
Interference per corebit vector
0 0Core # 0 1
Shadow Row Address Register(SRAR) Core 1 :
queue of requests to bank 2
FST additions
Row Buffer:
15
Tracking DRAM Row-Buffer Interference
Wednesday, March 17, 2010
Row BRow A
Bank 2
Row B
Bank 0 Bank 1 Bank 7...
Core 0 Core 1
Row ARow Conflict
Row B
Row BShadow Row Address Register(SRAR) Core 0 :
Row A
Interference per corebit vector
0 0Core # 0 1
Shadow Row Address Register(SRAR) Core 1 :
queue of requests to bank 2
FST additions
Row Buffer:
15
Tracking DRAM Row-Buffer Interference
Wednesday, March 17, 2010
Row BRow A
Bank 2
Row B
Bank 0 Bank 1 Bank 7...
Core 0 Core 1
Row ARow Conflict
Row B
Row BShadow Row Address Register(SRAR) Core 0 :
Row A
Interference inducedrow conflict
Interference per corebit vector
0 0Core # 0 1
Shadow Row Address Register(SRAR) Core 1 :
queue of requests to bank 2
FST additions
Row Buffer:
15
Tracking DRAM Row-Buffer Interference
Wednesday, March 17, 2010
Row BRow A
Bank 2
Row B
Bank 0 Bank 1 Bank 7...
Core 0 Core 1
Row ARow Conflict
Row B
Row BShadow Row Address Register(SRAR) Core 0 :
Row A
Interference inducedrow conflict
Interference per corebit vector
0Core # 0 1
1
Shadow Row Address Register(SRAR) Core 1 :
queue of requests to bank 2
FST additions
Row Buffer:
15
Tracking DRAM Row-Buffer Interference
Wednesday, March 17, 2010
0 0 0 0
Interference per corebit vector
Core # 0 1 2 3
FST hardwareCore 0 Core 1 Core 2 Core 3
Bank 0 Bank 1 Bank 2 Bank 7...
Memory Controller
Shared Cache
16
Tracking Inter-Core Interference
Wednesday, March 17, 2010
0 0 0 0
Interference per corebit vector
Core # 0 1 2 3
0
0
0
0
Excess Cycles Counters per core
FST hardwareCore 0 Core 1 Core 2 Core 3
Bank 0 Bank 1 Bank 2 Bank 7...
Memory Controller
Shared Cache
16
TiExcess
|⎨|⎧
⎩
Tracking Inter-Core Interference
Wednesday, March 17, 2010
0 0 0 0
Interference per corebit vector
Core # 0 1 2 3
0
0
0
0
Excess Cycles Counters per core
TCycle Count
FST hardwareCore 0 Core 1 Core 2 Core 3
Bank 0 Bank 1 Bank 2 Bank 7...
Memory Controller
Shared Cache
16
TiExcess
|⎨|⎧
⎩
Tracking Inter-Core Interference
Wednesday, March 17, 2010
0 0 0
Interference per corebit vector
Core # 0 1 2 3
0
0
0
0
Excess Cycles Counters per core
1
TCycle Count
FST hardwareCore 0 Core 1 Core 2 Core 3
Bank 0 Bank 1 Bank 2 Bank 7...
Memory Controller
Shared Cache
16
TiExcess
|⎨|⎧
⎩
Tracking Inter-Core Interference
Wednesday, March 17, 2010
0 0 0
Interference per corebit vector
Core # 0 1 2 3
0
0
0
0
Excess Cycles Counters per core
1
TCycle Count
FST hardwareCore 0 Core 1 Core 2 Core 3
Bank 0 Bank 1 Bank 2 Bank 7...
Memory Controller
Shared Cache
16
TiExcess
|⎨|⎧
⎩
Tracking Inter-Core Interference
Wednesday, March 17, 2010
0 0 0
Interference per corebit vector
Core # 0 1 2 3
0
0
0
Excess Cycles Counters per core
1
Cycle Count T+2
2FST hardware
Core 0 Core 1 Core 2 Core 3
Bank 0 Bank 1 Bank 2 Bank 7...
Memory Controller
Shared Cache
16
TiExcess
|⎨|⎧
⎩
Tracking Inter-Core Interference
Wednesday, March 17, 2010
0 0 0
Interference per corebit vector
Core # 0 1 2 3
0
0
0
Excess Cycles Counters per core
1
Cycle Count T+2
2FST hardware
Core 0 Core 1 Core 2 Core 3
Bank 0 Bank 1 Bank 2 Bank 7...
Memory Controller
Shared Cache
16
TiExcess
|⎨|⎧
⎩
Tracking Inter-Core Interference
Wednesday, March 17, 2010
0 0 0
Interference per corebit vector
Core # 0 1 2 3
0
0
0
Excess Cycles Counters per core
1
Cycle Count T+2
2FST hardware
1
Core 0 Core 1 Core 2 Core 3
Bank 0 Bank 1 Bank 2 Bank 7...
Memory Controller
Shared Cache
16
TiExcess
|⎨|⎧
⎩
Tracking Inter-Core Interference
Wednesday, March 17, 2010
0 0 0
Interference per corebit vector
Core # 0 1 2 3
0
0
0
Excess Cycles Counters per core
1
Cycle Count T+2
2FST hardware
1
Core 0 Core 1 Core 2 Core 3
Bank 0 Bank 1 Bank 2 Bank 7...
Memory Controller
Shared Cache
16
TiExcess
|⎨|⎧
⎩
Tracking Inter-Core Interference
Wednesday, March 17, 2010
0 0 0
Interference per corebit vector
Core # 0 1 2 3
0
0
0
Excess Cycles Counters per core
1
Cycle Count T+2
2FST hardware
1
T+3
3
1
Core 0 Core 1 Core 2 Core 3
Bank 0 Bank 1 Bank 2 Bank 7...
Memory Controller
Shared Cache
16
TiExcess
|⎨|⎧
⎩
Tracking Inter-Core Interference
Wednesday, March 17, 2010
Runtime UnfairnessEvaluation
DynamicRequest Throttling
1- Estimating system unfairness 2- Find app. with the highest slowdown (App-slowest)3- Find app. causing most interference for App-slowest (App-interfering)
if (Unfairness Estimate >Target) { 1-Throttle down App-interfering 2-Throttle up App-slowest}
FSTUnfairness Estimate
App-slowestApp-interfering
17
Fairness via Source Throttling (FST)
Wednesday, March 17, 2010
18
Tracking Inter-Core Interference
Wednesday, March 17, 2010
• To identify App-interfering, for each core i
18
Tracking Inter-Core Interference
Wednesday, March 17, 2010
• To identify App-interfering, for each core i• FST separately tracks interference
caused by each core j ( j ! i )
18
Tracking Inter-Core Interference
Wednesday, March 17, 2010
0
Interference per corebit vector
Core # 0 1 2 3
• To identify App-interfering, for each core i• FST separately tracks interference
caused by each core j ( j ! i )
0 0 0
18
Tracking Inter-Core Interference
Wednesday, March 17, 2010
0
0 0 0 -
Interference per corebit vector
Core # 0 1 2 3
• To identify App-interfering, for each core i• FST separately tracks interference
caused by each core j ( j ! i )
0 0 - 0
0 - 0 0
- 0 0 00123
18
Tracking Inter-Core Interference
Wednesday, March 17, 2010
0
0 0 0 -
Interference per corebit vector
Core # 0 1 2 3
• To identify App-interfering, for each core i• FST separately tracks interference
caused by each core j ( j ! i )
0 0 - 0
0 - 0 0
- 0 0 0
|⎨|⎧ ⎩
|⎨|⎧
⎩
Interfered with core
Interfering core
0123
18
Tracking Inter-Core Interference
Wednesday, March 17, 2010
Cnt 3Cnt 2Cnt 1Cnt 00
0 0 0 -
Interference per corebit vector
Core # 0 1 2 3
Excess Cycles Counters per core
• To identify App-interfering, for each core i• FST separately tracks interference
caused by each core j ( j ! i )
0 0 - 0
0 - 0 0
- 0 0 0
|⎨|⎧ ⎩
|⎨|⎧
⎩
Interfered with core
Interfering core
0123
18
Tracking Inter-Core Interference
Wednesday, March 17, 2010
Cnt 3Cnt 2Cnt 1Cnt 00
0 0 0 -
Interference per corebit vector
Core # 0 1 2 3-
Cnt 1,0
Cnt 2,0
Cnt 3,0
Excess Cycles Counters per core
• To identify App-interfering, for each core i• FST separately tracks interference
caused by each core j ( j ! i )
0 0 - 0
0 - 0 0
- 0 0 0
|⎨|⎧ ⎩
|⎨|⎧
⎩
Interfered with core
Interfering core
Cnt 0,1
-
Cnt 2,1
Cnt 3,1
Cnt 0,2
Cnt 1,2
-
Cnt 3,2
Cnt 0,3
Cnt 1,3
Cnt 2,3
-
0123
18
Tracking Inter-Core Interference
Wednesday, March 17, 2010
Cnt 3Cnt 2Cnt 1Cnt 00
0 0 0 -
Interference per corebit vector
Core # 0 1 2 3-
Cnt 1,0
Cnt 2,0
Cnt 3,0
Excess Cycles Counters per core
• To identify App-interfering, for each core i• FST separately tracks interference
caused by each core j ( j ! i )
0 0 - 0
0 - 0 0
- 0 0 0
|⎨|⎧ ⎩
|⎨|⎧
⎩
Interfered with core
Interfering core
Cnt 0,1
-
Cnt 2,1
Cnt 3,1
Cnt 0,2
Cnt 1,2
-
Cnt 3,2
Cnt 0,3
Cnt 1,3
Cnt 2,3
-
1
0123
18
Tracking Inter-Core Interference
Wednesday, March 17, 2010
Cnt 3Cnt 2Cnt 1Cnt 00
0 0 0 -
Interference per corebit vector
Core # 0 1 2 3-
Cnt 1,0
Cnt 2,0
Cnt 3,0
Excess Cycles Counters per core
• To identify App-interfering, for each core i• FST separately tracks interference
caused by each core j ( j ! i )
0 0 - 0
0 - 0 0
- 0 0 0
|⎨|⎧ ⎩
|⎨|⎧
⎩
Interfered with core
Interfering core
Cnt 0,1
-
Cnt 2,1
Cnt 3,1
Cnt 0,2
Cnt 1,2
-
Cnt 3,2
Cnt 0,3
Cnt 1,3
Cnt 2,3
-
1core 2
interfered with
core 1
0123
18
Tracking Inter-Core Interference
Wednesday, March 17, 2010
Cnt 3Cnt 2Cnt 1Cnt 00
0 0 0 -
Interference per corebit vector
Core # 0 1 2 3-
Cnt 1,0
Cnt 2,0
Cnt 3,0
Excess Cycles Counters per core
• To identify App-interfering, for each core i• FST separately tracks interference
caused by each core j ( j ! i )
0 0 - 0
0 - 0 0
- 0 0 0
|⎨|⎧ ⎩
|⎨|⎧
⎩
Interfered with core
Interfering core
Cnt 0,1
-
Cnt 3,1
Cnt 0,2
Cnt 1,2
-
Cnt 3,2
Cnt 0,3
Cnt 1,3
Cnt 2,3
-
1core 2
interfered with
core 1
Cnt 2,1++
0123
18
Tracking Inter-Core Interference
Wednesday, March 17, 2010
Cnt 3Cnt 2Cnt 1Cnt 00
0 0 0 -
Interference per corebit vector
Core # 0 1 2 3-
Cnt 1,0
Cnt 2,0
Cnt 3,0
Excess Cycles Counters per core
• To identify App-interfering, for each core i• FST separately tracks interference
caused by each core j ( j ! i )
0 0 - 0
0 - 0 0
- 0 0 0
|⎨|⎧ ⎩
|⎨|⎧
⎩
Interfered with core
Interfering core
Cnt 0,1
-
Cnt 3,1
Cnt 0,2
Cnt 1,2
-
Cnt 3,2
Cnt 0,3
Cnt 1,3
Cnt 2,3
-
1core 2
interfered with
core 1
Cnt 2,1++
0123
App-slowest = 2
18
Tracking Inter-Core Interference
Wednesday, March 17, 2010
Cnt 3Cnt 2Cnt 1Cnt 00
0 0 0 -
Interference per corebit vector
Core # 0 1 2 3-
Cnt 1,0
Cnt 2,0
Cnt 3,0
Excess Cycles Counters per core
• To identify App-interfering, for each core i• FST separately tracks interference
caused by each core j ( j ! i )
0 0 - 0
0 - 0 0
- 0 0 0
|⎨|⎧ ⎩
|⎨|⎧
⎩
Interfered with core
Interfering core
Cnt 0,1
-
Cnt 3,1
Cnt 0,2
Cnt 1,2
-
Cnt 3,2
Cnt 0,3
Cnt 1,3
Cnt 2,3
-
1core 2
interfered with
core 1
Cnt 2,1++
0123
Row with largest count determines App-interfering
App-slowest = 2
18
Tracking Inter-Core Interference
Wednesday, March 17, 2010
Runtime UnfairnessEvaluation
DynamicRequest Throttling
1- Estimating system unfairness 2- Find app. with the highest slowdown (App-slowest)3- Find app. causing most interference for App-slowest (App-interfering)
if (Unfairness Estimate >Target) { 1-Throttle down App-interfering 2-Throttle up App-slowest}
FSTUnfairness Estimate
App-slowestApp-interfering
19
Fairness via Source Throttling (FST)
Wednesday, March 17, 2010
20
Dynamic Request Throttling
Wednesday, March 17, 2010
• Goal: Adjust how aggressively each core makes requests to the shared resources
20
Dynamic Request Throttling
Wednesday, March 17, 2010
• Goal: Adjust how aggressively each core makes requests to the shared resources
•Mechanisms:• Miss Status Holding Register (MSHR) quota
20
Dynamic Request Throttling
Wednesday, March 17, 2010
• Goal: Adjust how aggressively each core makes requests to the shared resources
•Mechanisms:• Miss Status Holding Register (MSHR) quota• Controls the number of concurrent requests
accessing shared resources from each application
20
Dynamic Request Throttling
Wednesday, March 17, 2010
• Goal: Adjust how aggressively each core makes requests to the shared resources
•Mechanisms:• Miss Status Holding Register (MSHR) quota• Controls the number of concurrent requests
accessing shared resources from each application
• Request injection frequency
20
Dynamic Request Throttling
Wednesday, March 17, 2010
• Goal: Adjust how aggressively each core makes requests to the shared resources
•Mechanisms:• Miss Status Holding Register (MSHR) quota• Controls the number of concurrent requests
accessing shared resources from each application
• Request injection frequency• Controls how often memory requests are issued
to the last level cache from the MSHRs
20
Dynamic Request Throttling
Wednesday, March 17, 2010
• Throttling level assigned to each core determines both MSHR quota and request injection rate
21
Dynamic Request Throttling
Wednesday, March 17, 2010
• Throttling level assigned to each core determines both MSHR quota and request injection rate
Throttling level MSHR quota Request Injection Rate
100% 128 Every cycle50% 64 Every other cycle25% 32 Once every 4 cycles10% 12 Once every 10 cycles5% 6 Once every 20 cycles4% 5 Once every 25 cycles3% 3 Once every 30 cycles2% 2 Once every 50 cycles
Total # ofMSHRs: 128
21
Dynamic Request Throttling
Wednesday, March 17, 2010
• Throttling level assigned to each core determines both MSHR quota and request injection rate
Throttling level MSHR quota Request Injection Rate
100% 128 Every cycle50% 64 Every other cycle25% 32 Once every 4 cycles10% 12 Once every 10 cycles5% 6 Once every 20 cycles4% 5 Once every 25 cycles3% 3 Once every 30 cycles2% 2 Once every 50 cycles
Total # ofMSHRs: 128
21
Dynamic Request Throttling
Wednesday, March 17, 2010
Time
Runtime UnfairnessEvaluation Dynamic
Request Throttling
FSTUnfairness Estimate
App-slowest
App-interfering
Throttling Levels
Core 0 Core 1 Core 3Interval i
Interval i + 1Interval i + 2
Core 0 Core 2
System software fairness goal: 1.4
22
FST at Work
Wednesday, March 17, 2010
TimeInterval i
Runtime UnfairnessEvaluation Dynamic
Request Throttling
FSTUnfairness Estimate
App-slowest
App-interfering
Throttling Levels
Core 0 Core 1 Core 350% 100% 10% 100%Interval i
Interval i + 1Interval i + 2
Core 0 Core 2
System software fairness goal: 1.4
22
FST at Work
Slowdown Estimation
| ⎨ | ⎧⎩
Wednesday, March 17, 2010
TimeInterval i
Runtime UnfairnessEvaluation Dynamic
Request Throttling
FSTUnfairness Estimate
App-slowest
App-interfering
Throttling Levels
Core 0 Core 1 Core 350% 100% 10% 100%Interval i
Interval i + 1Interval i + 2
Core 0 Core 2
System software fairness goal: 1.4
22
FST at Work
Slowdown Estimation
| ⎨ | ⎧⎩
Wednesday, March 17, 2010
TimeInterval i
Runtime UnfairnessEvaluation Dynamic
Request Throttling
FSTUnfairness Estimate
App-slowest
App-interfering
Throttling Levels
Core 0 Core 1 Core 350% 100% 10% 100%Interval i
Interval i + 1Interval i + 2
3
Core 2
Core 0
Core 0 Core 2
System software fairness goal: 1.4
22
FST at Work
Slowdown Estimation
| ⎨ | ⎧⎩
Wednesday, March 17, 2010
TimeInterval i
Runtime UnfairnessEvaluation Dynamic
Request Throttling
FSTUnfairness Estimate
App-slowest
App-interfering
Throttling Levels
Core 0 Core 1 Core 350% 100% 10% 100%Interval i
Interval i + 1Interval i + 2
3
Core 2
Core 0
Core 0 Core 2
System software fairness goal: 1.4
22
FST at Work
Slowdown Estimation
| ⎨ | ⎧⎩
Wednesday, March 17, 2010
TimeInterval i
Runtime UnfairnessEvaluation Dynamic
Request Throttling
FSTUnfairness Estimate
App-slowest
App-interfering
Throttling Levels
Core 0 Core 1 Core 350% 100% 10% 100%Interval i
Interval i + 1Interval i + 2
3
Core 2
Core 0
Core 0 Core 2Throttle down Throttle up
System software fairness goal: 1.4
22
FST at Work
Slowdown Estimation
| ⎨ | ⎧⎩
Wednesday, March 17, 2010
TimeInterval i Interval i+1
Runtime UnfairnessEvaluation Dynamic
Request Throttling
FSTUnfairness Estimate
App-slowest
App-interfering
Throttling Levels
Core 0 Core 1 Core 350% 100% 10% 100%25% 100% 25% 100%
Interval iInterval i + 1Interval i + 2
3
Core 2
Core 0
Core 0 Core 2Throttle down Throttle up
System software fairness goal: 1.4
22
FST at Work
Slowdown Estimation
| ⎨ | ⎧⎩
Wednesday, March 17, 2010
TimeInterval i Interval i+1
Runtime UnfairnessEvaluation Dynamic
Request Throttling
FSTUnfairness Estimate
App-slowest
App-interfering
Throttling Levels
Core 0 Core 1 Core 350% 100% 10% 100%25% 100% 25% 100%
Interval iInterval i + 1Interval i + 2
Core 0 Core 2
2.5
Core 2
Core 1
System software fairness goal: 1.4
22
FST at Work
Slowdown Estimation
| ⎨ | ⎧⎩
Wednesday, March 17, 2010
TimeInterval i Interval i+1
Runtime UnfairnessEvaluation Dynamic
Request Throttling
FSTUnfairness Estimate
App-slowest
App-interfering
Throttling Levels
Core 0 Core 1 Core 350% 100% 10% 100%25% 100% 25% 100%
Interval iInterval i + 1Interval i + 2
Core 0 Core 2
2.5
Core 2
Core 1
Throttle down Throttle up
System software fairness goal: 1.4
22
FST at Work
Slowdown Estimation
| ⎨ | ⎧⎩
Wednesday, March 17, 2010
TimeInterval i Interval i+1 Interval i+2
Runtime UnfairnessEvaluation Dynamic
Request Throttling
FSTUnfairness Estimate
App-slowest
App-interfering
Throttling Levels
Core 0 Core 1 Core 350% 100% 10% 100%25% 100% 25% 100%25% 50% 50% 100%
Interval iInterval i + 1Interval i + 2
Core 0 Core 2
2.5
Core 2
Core 1
Throttle down Throttle up
System software fairness goal: 1.4
22
FST at Work
Slowdown Estimation
| ⎨ | ⎧⎩
Wednesday, March 17, 2010
23
System Software Support
Wednesday, March 17, 2010
•Different fairness objectives can be configured by system software
23
System Software Support
Wednesday, March 17, 2010
•Different fairness objectives can be configured by system software• Estimated Unfairness > Target Unfairness
23
System Software Support
Wednesday, March 17, 2010
•Different fairness objectives can be configured by system software• Estimated Unfairness > Target Unfairness
• Estimated Max Slowdown > Target Max Slowdown
23
System Software Support
Wednesday, March 17, 2010
•Different fairness objectives can be configured by system software• Estimated Unfairness > Target Unfairness
• Estimated Max Slowdown > Target Max Slowdown
• Estimated Slowdown(i) > Target Slowdown(i)
23
System Software Support
Wednesday, March 17, 2010
•Different fairness objectives can be configured by system software• Estimated Unfairness > Target Unfairness
• Estimated Max Slowdown > Target Max Slowdown
• Estimated Slowdown(i) > Target Slowdown(i)
•Support for thread priorities
23
System Software Support
Wednesday, March 17, 2010
•Different fairness objectives can be configured by system software• Estimated Unfairness > Target Unfairness
• Estimated Max Slowdown > Target Max Slowdown
• Estimated Slowdown(i) > Target Slowdown(i)
•Support for thread priorities• Weighted Slowdown(i) =
Estimated Slowdown(i) x Weight(i)
23
System Software Support
Wednesday, March 17, 2010
•Total storage cost required for 4 cores is ! 12KB
• FST does not require any structures or logic that are on the processor’s critical path
24
Hardware Cost
Wednesday, March 17, 2010
•Background and Problem
•Motivation for Source Throttling
• Fairness via Source Throttling (FST)
•Evaluation
•Conclusion
25
Outline
Wednesday, March 17, 2010
• x86 cycle accurate simulator
• Baseline processor configuration• Per-core• 4-wide issue, out-of-order, 256 entry ROB
• Shared (4-core system)• 128 MSHRs • 2 MB, 16-way L2 cache
• Main Memory• DDR3 1333 MHz• Latency of 15ns per command (tRP, tRCD, CL)• 8B wide core to memory bus
26
Evaluation Methodology
Wednesday, March 17, 2010
0
1
2
3
4
5
6
7
Syst
em U
nfai
rnes
s
No FairnessFair Cache Capacity (VPC)Parallelism-Aware Batch Scheduling+VPCFairness via Source Throttling (FST)
grom+a
rt+ast
ar+h
264
art+
games
+Gem
s+h2
64
lbm+o
mnet+
apsi+
vorte
x
art+
leslie
+gam
es+g
rom
art+
astar
+lesli
e+cr
afty
art+
milc+v
ortex
+calc
ulix
lucas+
ammp+
xalan
c+gro
m
lbm+G
ems+
astar
+mes
a
mgrid+
parse
r+so
plex+
perlb
gcc0
6+xa
lanc+
lbm+c
actu
s
gmea
n
27
System Unfairness Results
Wednesday, March 17, 2010
0
1
2
3
4
5
6
7
Syst
em U
nfai
rnes
s
No FairnessFair Cache Capacity (VPC)Parallelism-Aware Batch Scheduling+VPCFairness via Source Throttling (FST)
grom+a
rt+ast
ar+h
264
art+
games
+Gem
s+h2
64
lbm+o
mnet+
apsi+
vorte
x
art+
leslie
+gam
es+g
rom
art+
astar
+lesli
e+cr
afty
art+
milc+v
ortex
+calc
ulix
lucas+
ammp+
xalan
c+gro
m
lbm+G
ems+
astar
+mes
a
mgrid+
parse
r+so
plex+
perlb
gcc0
6+xa
lanc+
lbm+c
actu
s
gmea
n
27
System Unfairness Results
Wednesday, March 17, 2010
0
1
2
3
4
5
6
7
Syst
em U
nfai
rnes
s
No FairnessFair Cache Capacity (VPC)Parallelism-Aware Batch Scheduling+VPCFairness via Source Throttling (FST)
grom+a
rt+ast
ar+h
264
art+
games
+Gem
s+h2
64
lbm+o
mnet+
apsi+
vorte
x
art+
leslie
+gam
es+g
rom
art+
astar
+lesli
e+cr
afty
art+
milc+v
ortex
+calc
ulix
lucas+
ammp+
xalan
c+gro
m
lbm+G
ems+
astar
+mes
a
mgrid+
parse
r+so
plex+
perlb
gcc0
6+xa
lanc+
lbm+c
actu
s
gmea
n
27
System Unfairness Results
Wednesday, March 17, 2010
0
1
2
3
4
5
6
7
Syst
em U
nfai
rnes
s
No FairnessFair Cache Capacity (VPC)Parallelism-Aware Batch Scheduling+VPCFairness via Source Throttling (FST)
grom+a
rt+ast
ar+h
264
art+
games
+Gem
s+h2
64
lbm+o
mnet+
apsi+
vorte
x
art+
leslie
+gam
es+g
rom
art+
astar
+lesli
e+cr
afty
art+
milc+v
ortex
+calc
ulix
lucas+
ammp+
xalan
c+gro
m
lbm+G
ems+
astar
+mes
a
mgrid+
parse
r+so
plex+
perlb
gcc0
6+xa
lanc+
lbm+c
actu
s
gmea
n
27
System Unfairness Results
Wednesday, March 17, 2010
0
1
2
3
4
5
6
7
Syst
em U
nfai
rnes
s
No FairnessFair Cache Capacity (VPC)Parallelism-Aware Batch Scheduling+VPCFairness via Source Throttling (FST)
grom+a
rt+ast
ar+h
264
art+
games
+Gem
s+h2
64
lbm+o
mnet+
apsi+
vorte
x
art+
leslie
+gam
es+g
rom
art+
astar
+lesli
e+cr
afty
art+
milc+v
ortex
+calc
ulix
lucas+
ammp+
xalan
c+gro
m
lbm+G
ems+
astar
+mes
a
mgrid+
parse
r+so
plex+
perlb
gcc0
6+xa
lanc+
lbm+c
actu
s
gmea
n
27
System Unfairness Results
Wednesday, March 17, 2010
0
1
2
3
4
5
6
7
Syst
em U
nfai
rnes
s
No FairnessFair Cache Capacity (VPC)Parallelism-Aware Batch Scheduling+VPCFairness via Source Throttling (FST)
grom+a
rt+ast
ar+h
264
art+
games
+Gem
s+h2
64
lbm+o
mnet+
apsi+
vorte
x
art+
leslie
+gam
es+g
rom
art+
astar
+lesli
e+cr
afty
art+
milc+v
ortex
+calc
ulix
lucas+
ammp+
xalan
c+gro
m
lbm+G
ems+
astar
+mes
a
mgrid+
parse
r+so
plex+
perlb
gcc0
6+xa
lanc+
lbm+c
actu
s
gmea
n
27
System Unfairness Results
Wednesday, March 17, 2010
0
1
2
3
4
5
6
7
Syst
em U
nfai
rnes
s
No FairnessFair Cache Capacity (VPC)Parallelism-Aware Batch Scheduling+VPCFairness via Source Throttling (FST)
grom+a
rt+ast
ar+h
264
art+
games
+Gem
s+h2
64
lbm+o
mnet+
apsi+
vorte
x
art+
leslie
+gam
es+g
rom
art+
astar
+lesli
e+cr
afty
art+
milc+v
ortex
+calc
ulix
lucas+
ammp+
xalan
c+gro
m
lbm+G
ems+
astar
+mes
a
mgrid+
parse
r+so
plex+
perlb
gcc0
6+xa
lanc+
lbm+c
actu
s
gmea
n
44.4%
27
System Unfairness Results
Wednesday, March 17, 2010
0
1
2
3
4
5
6
7
Syst
em U
nfai
rnes
s
No FairnessFair Cache Capacity (VPC)Parallelism-Aware Batch Scheduling+VPCFairness via Source Throttling (FST)
grom+a
rt+ast
ar+h
264
art+
games
+Gem
s+h2
64
lbm+o
mnet+
apsi+
vorte
x
art+
leslie
+gam
es+g
rom
art+
astar
+lesli
e+cr
afty
art+
milc+v
ortex
+calc
ulix
lucas+
ammp+
xalan
c+gro
m
lbm+G
ems+
astar
+mes
a
mgrid+
parse
r+so
plex+
perlb
gcc0
6+xa
lanc+
lbm+c
actu
s
gmea
n
44.4%
36%
27
System Unfairness Results
Wednesday, March 17, 2010
0
0.4
0.8
1.2
1.6
2
Syst
em P
erf.
Nor
mal
ized
to
No
Fair
ness Fair Cache Capacity (VPC)
Parallelism-Aware Batch Scheduling + VPCFairness via Source Throttling (FST)
gmea
n
grom+a
rt+ast
ar+h
264
art+
games
+Gem
s+h2
64
lbm+o
mnet+
apsi+
vorte
x
art+
leslie
+gam
es+g
rom
art+
astar
+lesli
e+cr
afty
art+
milc+v
ortex
+calc
ulix
lucas+
ammp+
xalan
c+gro
m
lbm+G
ems+
astar
+mes
a
mgrid+
parse
r+so
plex+
perlb
gcc0
6+xa
lanc+
lbm+c
actu
s
28
System Performance Results
Wednesday, March 17, 2010
0
0.4
0.8
1.2
1.6
2
Syst
em P
erf.
Nor
mal
ized
to
No
Fair
ness Fair Cache Capacity (VPC)
Parallelism-Aware Batch Scheduling + VPCFairness via Source Throttling (FST)
gmea
n
grom+a
rt+ast
ar+h
264
art+
games
+Gem
s+h2
64
lbm+o
mnet+
apsi+
vorte
x
art+
leslie
+gam
es+g
rom
art+
astar
+lesli
e+cr
afty
art+
milc+v
ortex
+calc
ulix
lucas+
ammp+
xalan
c+gro
m
lbm+G
ems+
astar
+mes
a
mgrid+
parse
r+so
plex+
perlb
gcc0
6+xa
lanc+
lbm+c
actu
s
28
System Performance Results
Wednesday, March 17, 2010
0
0.4
0.8
1.2
1.6
2
Syst
em P
erf.
Nor
mal
ized
to
No
Fair
ness Fair Cache Capacity (VPC)
Parallelism-Aware Batch Scheduling + VPCFairness via Source Throttling (FST)
gmea
n
25.6%
grom+a
rt+ast
ar+h
264
art+
games
+Gem
s+h2
64
lbm+o
mnet+
apsi+
vorte
x
art+
leslie
+gam
es+g
rom
art+
astar
+lesli
e+cr
afty
art+
milc+v
ortex
+calc
ulix
lucas+
ammp+
xalan
c+gro
m
lbm+G
ems+
astar
+mes
a
mgrid+
parse
r+so
plex+
perlb
gcc0
6+xa
lanc+
lbm+c
actu
s
28
System Performance Results
Wednesday, March 17, 2010
0
0.4
0.8
1.2
1.6
2
Syst
em P
erf.
Nor
mal
ized
to
No
Fair
ness Fair Cache Capacity (VPC)
Parallelism-Aware Batch Scheduling + VPCFairness via Source Throttling (FST)
gmea
n
25.6%
14%
grom+a
rt+ast
ar+h
264
art+
games
+Gem
s+h2
64
lbm+o
mnet+
apsi+
vorte
x
art+
leslie
+gam
es+g
rom
art+
astar
+lesli
e+cr
afty
art+
milc+v
ortex
+calc
ulix
lucas+
ammp+
xalan
c+gro
m
lbm+G
ems+
astar
+mes
a
mgrid+
parse
r+so
plex+
perlb
gcc0
6+xa
lanc+
lbm+c
actu
s
28
System Performance Results
Wednesday, March 17, 2010
• Fairness via Source Throttling (FST) is a new fair and high-performance shared resource management approach for CMPs
• Dynamically monitors unfairness and throttles down sources of interfering memory requests
• Eliminates the need for and complexity of multiple per-resource fairness techniques
• Improves both system fairness and performance
• Incorporates thread weights and enables different fairness objectives
29
Conclusion
Wednesday, March 17, 2010
Fairness via Source Throttling:
A configurable and high-performance fairness substrate for multi-core memory systems
Eiman Ebrahimi*
Chang Joo Lee*
Onur Mutlu‡
Yale N. Patt*
* HPS Research Group
The University of Texas at Austin
‡ Computer Architecture Laboratory
Carnegie Mellon University
Wednesday, March 17, 2010