Datacenter application interference
Chip multiprocessors (CMPs), popular in datacenters, offer increased throughput and reduced power consumption.
They also increase resource sharing between applications, which can result in negative interference.
Resource contention is well studied
… at least on single machines.
Three main methods:
(1) Gladiator-style match-ups (co-run pairs of applications head-to-head)
(2) Static analysis to predict application resource usage
(3) Measure benchmark resource usage; apply results to live applications
New methodology for understanding datacenter interference is needed.
One that can handle the complexities of a datacenter:
• (10s of) thousands of applications
• real user inputs
• production hardware
• financially feasible, low overhead
Hardware counter measurements of live applications.
Our contributions
1. Identify the complexities of measuring interference in datacenters
2. A new measurement methodology
3. The first large-scale study of measured interference on live datacenter applications
Complexities of understanding application interference in a datacenter
Large chips and high core utilizations
Profiling 1000 12-core, 24-hyperthread Google servers running production workloads revealed the average machine had >14/24 HW threads in use.
Heterogeneous application mixes
Often applications have more than one co-runner on a machine.
Observed max of 19 unique co-runner threads (out of 24 HW threads).
[Chart: distribution of machines by unique co-runner count: 0-1, 2-3, and 4+ co-runners]
Application complexities
• Fuzzy definitions
• Varying and sometimes unpredictable inputs
• Unknown optimal performance
Hardware & Economic Complexities
• Varying micro-architectural platforms
• The necessity for low overhead limits measurement capabilities
• Corporate policies
Measurement methodology
Measurement Methodology
The goal:
A generic methodology to collect application interference data on live production datacenter servers
[Figure: timeline of App. A and App. B executing on the same machine]
1. Use sample-based monitoring to collect per-machine, per-core event (HW counter) samples.
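Concretely, one of these samples can be modeled as a small record. A minimal sketch in Python; the field names and schema here are assumptions for illustration, not Google's actual collection format:

```python
from dataclasses import dataclass

# Hypothetical schema for one hardware-counter sample; the real
# collection format used at Google is not specified in the slides.
@dataclass(frozen=True)
class Sample:
    machine: str      # which server the sample came from
    cpu: int          # hardware thread (hyperthread) ID
    app: str          # application running when the sample fired
    index: int        # per-thread sample number, e.g. A:1, A:2, ...
    start_ts: float   # timestamp when the sample window opened
    end_ts: float     # timestamp when the counter overflowed
    instructions: int # sampling period, e.g. 2_500_000 instructions
    cycles: int       # cycles elapsed during the sample

    @property
    def ipc(self) -> float:
        """Instructions per cycle for this sample."""
        return self.instructions / self.cycles

s = Sample("machine-01", cpu=3, app="A", index=1,
           start_ts=0.0, end_ts=0.001,
           instructions=2_500_000, cycles=1_250_000)
assert s.ipc == 2.0
```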
[Figure: App. A's execution is divided into samples A:1-A:6 and App. B's into B:1-B:4, each sample covering 2M instructions]
2. Identify sample-sized co-runner relationships…
Samples A:1-A:6 are co-runners with App. B.
Samples B:1-B:4 are co-runners with App. A.
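The co-runner identification in step 2 amounts to a time-overlap test between samples on different hardware threads of the same machine. A sketch, using a hypothetical simplified sample layout:

```python
from collections import namedtuple

# Simplified sample: which app, which hardware thread, and the time
# window the sample covers. This layout is an illustrative assumption.
Sample = namedtuple("Sample", "app cpu start end")

def corunners(target, all_samples):
    """Return the set of apps whose samples overlap `target` in time
    on a different hardware thread."""
    return {s.app for s in all_samples
            if s.cpu != target.cpu
            and s.start < target.end and target.start < s.end}

samples = [
    Sample("A", cpu=0, start=0.0, end=1.0),
    Sample("A", cpu=0, start=1.0, end=2.0),
    Sample("B", cpu=1, start=0.5, end=1.5),
]
assert corunners(samples[0], samples) == {"B"}  # A:1 overlaps B's sample
```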
Say that a new App. C starts running on CPU 1…
… B:4 no longer has a co-runner.
3. Filter relationships by architecture-independent interference classes…
For example, a pair of co-runners might be on opposite sockets, or share only I/O.
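Such architecture-independent classes can be derived from the machine topology alone. A sketch assuming a toy two-socket layout; the thread-to-core numbering below is illustrative, not Google's actual mapping:

```python
# Classify the interference relationship between two hardware threads
# from the machine topology. The core/socket mapping is a toy model of
# a 2-socket, 12-core, 24-hyperthread machine.
def interference_class(hw_a: int, hw_b: int,
                       threads_per_core: int = 2,
                       cores_per_socket: int = 6) -> str:
    core_a, core_b = hw_a // threads_per_core, hw_b // threads_per_core
    sock_a, sock_b = core_a // cores_per_socket, core_b // cores_per_socket
    if core_a == core_b:
        return "shared core"
    if sock_a == sock_b:
        return "shared socket"
    return "opposite socket"

assert interference_class(0, 1) == "shared core"      # sibling hyperthreads
assert interference_class(0, 2) == "shared socket"    # different cores, same socket
assert interference_class(0, 13) == "opposite socket" # different sockets
```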
4. Aggregate equivalent co-schedules.
For example:
• Aggregate all samples of App. A that have App. B as a shared-core co-runner.
• Aggregate all samples of App. A that have App. B as a shared-core co-runner and App. C as a shared-socket co-runner.
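The aggregation step can be sketched as grouping samples by their co-schedule key; the tuple layout below is an assumption for illustration:

```python
from collections import defaultdict

def aggregate(samples):
    """Group samples by (app, co_schedule).

    samples: iterable of (app, co_schedule, ipc), where co_schedule is a
    frozenset of (co_runner, interference_class) pairs seen during the
    sample. Returns {(app, co_schedule): [ipc, ...]}.
    """
    groups = defaultdict(list)
    for app, co_schedule, ipc in samples:
        groups[(app, co_schedule)].append(ipc)
    return groups

samples = [
    ("A", frozenset({("B", "shared core")}), 1.4),
    ("A", frozenset({("B", "shared core")}), 1.6),
    ("A", frozenset({("B", "shared core"), ("C", "shared socket")}), 1.1),
]
groups = aggregate(samples)
# The first two samples share a co-schedule; the third is its own group.
assert groups[("A", frozenset({("B", "shared core")}))] == [1.4, 1.6]
```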
5. Finally, calculate statistical indicators (means, medians) to get a midpoint performance for application interference comparisons.
[Figure: after aggregation, App. A's samples average an IPC of 2.0 and App. B's an IPC of 1.5]
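Step 5 then reduces each aggregated group of IPC samples to its statistical midpoints; a minimal sketch with made-up IPC values:

```python
from statistics import mean, median

def midpoints(ipcs):
    """Reduce a group of IPC samples to mean and median, the midpoint
    indicators used for interference comparisons."""
    return {"mean": mean(ipcs), "median": median(ipcs)}

# Illustrative IPC samples for one aggregated co-schedule.
shared_core_with_B = [1.0, 1.5, 2.0, 3.5]
stats = midpoints(shared_core_with_B)
assert stats["mean"] == 2.0    # (1.0 + 1.5 + 2.0 + 3.5) / 4
assert stats["median"] == 1.75 # midpoint of the two middle samples
```

The median is the more robust indicator here, since a few outlier samples (e.g. during application startup) can drag the mean.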
Applying the measurement methodology at Google.
Applying the Methodology @ Google
Experiment details:
• Events: instructions, IPC
• Sampling period: 2.5 million instructions
• Number of machines*: 1000
* All had Intel Westmere chips (24 hyperthreads, 12 cores), matching clock speed, RAM, O/S
Method:
1. Collect samples
Method (cont.):
2. Identify sample-sized relationships
3. Filter by interference classes
Collection results:
• Unique binary apps: 1102
• Co-runner relationships (top 8 apps):
  • Avg. shared core: 1M (min 2K)
  • Avg. shared socket: 9.5M (min 12K)
  • Avg. opposite socket: 11M (min 14K)
Method (cont.):
4. Aggregate equivalent schedules
5. Calculate statistical indicators
Analyze Interference
[Figure: streetview’s IPC changes with its top co-runners]
[Figure: overall median IPC across 1102 applications]
Beyond noisy interferers (shared core)
[Figure: grid of base applications vs. co-running applications, each cell marked as less or positive interference, negative interference, or noisy data]
* Recall minimum pair has 2K samples; medians across full grid of 1102 apps
Performance Strategies
Restrict negative beyond-noisy interferers (or encourage positive interferers as co-runners)
Isolate sensitive or antagonistic applications
Takeaways
1. New datacenter application interference studies can use our identified complexities as a checklist.
2. Our measurement methodology (verified at Google in the first large-scale measurement of live datacenter interference) is generally applicable and shows promising initial performance opportunities.