CPU Optimizations in the CERN Cloud - February 2016
TRANSCRIPT
CPU optimizations in the CERN Cloud - OpenStack Ops Midcycle: High Performance Computing with OpenStack - Manchester, 2016
Belmiro Moreira
[email protected] @belmiromoreira
Arne Wiebalck Tim Bell
Sean Crosby (Univ. of Melbourne) Ulrich Schwickerath
What is CERN?
CERN Cloud – LHC and Experiments
CMS detector
https://www.google.com/maps/streetview/#cern
CERN Cloud – AMS
OpenStack at CERN by numbers
~ 5500 Compute Nodes (~140k cores) • ~5300 KVM • ~200 Hyper-V
~ 2800 Images (~44 TB in use)
~ 2000 Volumes (~800 TB allocated)
~ 2200 Users
~ 2500 Projects
> 17000 VMs running
Number of VMs created (green) and VMs deleted (red) every 30 minutes
The “20% overhead” problem
• When running the batch system on top of the cloud infrastructure, we reach the limit on the total number of hosts in LSF
• On our full-node batch VMs we noticed that the HS06 rating was ~20% lower than on the underlying host
• Smaller VMs behaved much better: ~8% overhead (sum of simultaneous HS06 runs on 4x 8-core VMs on a 32-core host)
HS06 on virtual batch workers
HWDB HS06: 357±16 (Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz)

VM size (cores) | Per-VM HS06 | Total HS06 | Overhead
4x 8            | 82.3±11     | 329        | 7.8%
2x 16           | 150±5       | 300        | 16%
1x 32           | 284±11      | 284        | 20.4%
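The overhead column is simply the gap between the summed per-VM scores and the bare-metal (HWDB) rating; a small sketch of that arithmetic, using the numbers from the table above:

```python
# Overhead of virtualized HS06 relative to the bare-metal (HWDB) rating.
# Numbers taken from the table above (Intel Xeon E5-2650 v2 host, 357 HS06).
HWDB_HS06 = 357

configs = {
    "4x 8":  (4, 82.3),   # (number of VMs, per-VM HS06)
    "2x 16": (2, 150.0),
    "1x 32": (1, 284.0),
}

for name, (n_vms, per_vm) in configs.items():
    total = n_vms * per_vm
    overhead = (HWDB_HS06 - total) / HWDB_HS06 * 100
    print(f"{name}: total {total:.0f} HS06, overhead {overhead:.1f}%")
```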
Testing Optimizations – KSM off
• ATLAS T0 batch VMs show an IOwait of 20-30%
• Compute nodes started to swap even when leaving 2 GB for the OS
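The slides do not show how KSM was switched off for the test; as a hedged illustration (assuming the standard Linux sysfs control file /sys/kernel/mm/ksm/run and root privileges), it could look like this:

```python
# Hedged sketch: disable Kernel Samepage Merging (KSM) on a KVM compute node.
# Assumes the standard Linux sysfs interface /sys/kernel/mm/ksm; run as root.
from pathlib import Path

KSM_RUN = Path("/sys/kernel/mm/ksm/run")

def set_ksm(enabled: bool) -> None:
    # "1" starts the ksmd scanner, "0" stops it (already-merged pages stay
    # merged; writing "2" would additionally unmerge them).
    KSM_RUN.write_text("1" if enabled else "0")

if __name__ == "__main__":
    set_ksm(False)
    print("KSM run flag is now:", KSM_RUN.read_text().strip())
```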
Optimization by numbers – EPT off
HWDB HS06: 357±16

Before:
VM size (cores) | Per-VM HS06 | Total HS06 | Overhead
4x 8            | 82.3±11     | 329        | 7.8%
2x 16           | 150±5       | 300        | 16%
1x 32           | 284±11      | 284        | 20.4%

After (EPT off):
VM size (cores) | Per-VM HS06 | Total HS06 | Overhead | Overhead reduction
4x 8            | 87±11       | 348        | 2.5%     | 68%
2x 16           | 163.5±1     | 327        | 8.4%     | 52%
1x 32           | 311±1       | 311        | 12.9%    | 37%
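EPT is controlled by a kvm_intel module parameter on Intel hosts; a minimal sketch (assuming the usual Linux sysfs paths) for checking the current setting, with the modprobe option used to turn it off noted in a comment:

```python
# Hedged sketch: check whether EPT (Extended Page Tables) is enabled for KVM
# on an Intel host. The parameter is exposed by the kvm_intel kernel module.
from pathlib import Path

EPT_PARAM = Path("/sys/module/kvm_intel/parameters/ept")

def ept_enabled() -> bool:
    # Boolean module parameters read back as "Y"/"N"; accept "1" defensively.
    return EPT_PARAM.read_text().strip() in ("Y", "1")

if __name__ == "__main__":
    print("EPT enabled:", ept_enabled())
    # Disabling it requires reloading the module with ept=0, e.g. via
    # /etc/modprobe.d/kvm_intel.conf:  options kvm_intel ept=0
```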
General virtualization issue?
• Cross-check with SLC6 VMs on Hyper-V
  - 0.8% HS06 loss on 4x 8-core
  - 3.3% HS06 loss on 1x 32-core SLC6 VM
• No general virtualization overhead issue!
  - Rather a feature or configuration issue
• What’s the difference between the VMs on Hyper-V and KVM?
NUMA
• Hyper-V VMs have vCPUs pinned to physical NUMA nodes
  - Pinned to sets that correspond to physical NUMA nodes
• Wider OpenStack support for this is available in Kilo
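In OpenStack, a guest NUMA topology and dedicated CPU pinning are requested through flavor extra specs; a hedged sketch of what such a flavor definition might look like (the flavor name and values are illustrative, not CERN's production settings):

```python
# Hedged sketch: flavor extra specs asking Nova (Kilo and later) for a
# NUMA-aware, CPU-pinned guest. Values are illustrative.
extra_specs = {
    "hw:numa_nodes": "2",          # expose two guest NUMA nodes, matching the host
    "hw:cpu_policy": "dedicated",  # pin each vCPU to a dedicated host pCPU
}

# Equivalent CLI (one --property per key); "m1.numa" is a made-up flavor name.
for key, value in extra_specs.items():
    print(f"openstack flavor set m1.numa --property {key}='{value}'")
```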
NUMA - in the lab
… reduced the overhead to ~3% relative to bare metal
Deploying in production
• EPT off; KSM on; NUMA-aware
• System services add ~1-2% overhead
• We got a total overhead of ~5%
… and then: extremely slow nodes
• Small fraction of jobs 10x slower
  - VMs look OK, actually pretty good
  - Hosts: 30-50% system load, >100k IRQ/s (mostly TLB shoot-downs; see the sketch below)
• Load attributed to qemu-kvm
  - ‘perf top’: 90% in _raw_spin_lock
  - ‘systemtap’: paging64_page_fault and kvm_mmu_pte* …
[Charts: VM CPU utilization and compute node CPU utilization]
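To put a number on the TLB shoot-down interrupt rate mentioned above, one option is to sample the TLB line of /proc/interrupts; a minimal sketch, assuming an x86 Linux compute node:

```python
# Hedged sketch: estimate the TLB shootdown rate on a compute node by sampling
# the "TLB" line of /proc/interrupts (present on x86 Linux hosts).
import time

def tlb_shootdowns() -> int:
    with open("/proc/interrupts") as f:
        for line in f:
            if line.lstrip().startswith("TLB:"):
                # Sum the per-CPU counters; trailing words are the description.
                return sum(int(tok) for tok in line.split()[1:] if tok.isdigit())
    return 0

if __name__ == "__main__":
    before = tlb_shootdowns()
    time.sleep(5)
    after = tlb_shootdowns()
    print(f"TLB shootdowns/s: {(after - before) / 5:.0f}")
```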
Back to the drawing board
• Needed to combine the optimizations with EPT on
• Huge pages a way out?
  - Idea: reduce the number of pages to be handled, increase the hit ratio
• 1GB huge pages
  - Best HS06 results (with EPT on)
• 2MB huge pages
  - Also one of the default sizes
  - Performance loss around 5% compared to bare metal on batch VMs
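Huge pages have to be reserved on the host and then requested for the guest; a hedged sketch, assuming the standard Linux sysfs counter and Nova's hw:mem_page_size extra spec (the page count is illustrative, and 1GB pages typically need to be reserved at boot instead):

```python
# Hedged sketch: reserve 2MB huge pages on a compute node and check the result.
# Assumes the standard Linux sysfs interface; run as root.
from pathlib import Path

NR_HUGEPAGES = Path("/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages")

def reserve_2mb_hugepages(count: int) -> int:
    NR_HUGEPAGES.write_text(str(count))      # kernel may reserve fewer if memory is fragmented
    return int(NR_HUGEPAGES.read_text())

if __name__ == "__main__":
    reserved = reserve_2mb_hugepages(16384)  # 16384 x 2MB = 32 GB for guest memory
    print(f"2MB huge pages reserved: {reserved}")
    # Guests then request huge-page backing via the flavor, e.g.:
    #   openstack flavor set m1.hugepages --property hw:mem_page_size=2048
```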
Optimization by numbers
Configuration: NUMA + pinning, 2MB huge pages, EPT on, KSM on

VM size (cores) | Overhead before | Overhead after
4x 8            | 7.8%            | 3.3%
2x 16           | 16%             | 4.6%
1x 32           | 20.4%           | 3-6%
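Putting the pieces together, the configuration above would roughly correspond to a flavor carrying both the NUMA/pinning and the huge-page properties from the earlier sketches (illustrative values, hypothetical flavor name; EPT and KSM remain host-side settings):

```python
# Hedged sketch: combined flavor extra specs matching the configuration above
# (NUMA + pinning + 2MB huge pages). Values are illustrative.
production_specs = {
    "hw:numa_nodes": "2",            # match the host's NUMA layout
    "hw:cpu_policy": "dedicated",    # pin vCPUs to host cores
    "hw:mem_page_size": "2048",      # back guest RAM with 2MB huge pages
}

# "batch.fullnode" is a made-up flavor name for illustration.
for key, value in production_specs.items():
    print(f"openstack flavor set batch.fullnode --property {key}='{value}'")
```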
Deploy in production
• A small fraction can cause a lot of trouble…
Summary
• Reduced the virtualization HS06 overhead to a few percent compared to bare metal
  - On full-node VMs!
  - NUMA + pinning + huge pages + EPT on + KSM on
• Pre-deployment testing very difficult
  - EPT off side-effects initially undetected
[email protected] @belmiromoreira
http://openstack-in-production.blogspot.com