Data Center Server Agnostic Power Management
Bryan Kelly, Principal Firmware Eng Manager
Track: Datacenter Facility
Project Olympus System Overview
[Diagram: rack overview with labels PSU, Rack, Compute, PMDU, Server, Management Switch, TOR, Rack Manager, Management]
Project Olympus
• System and Rack design
• Manufacturing collateral; schematics and board files open sourced in Feb 2017
• Open-sourced firmware: Open EDKII, OpenBMC, and Open PDU and Rack Manager
Modules and Features

PMDU (Power Management Distribution Unit)
• 48U and 42U support
• AC Power for rack devices (FCI and Cable)
• Enables blade management and power control
• Dual Feed with Circuit Breakers
• Hot Plug Rack Manager and AC/DC Converter
• No active single point of repair

Rack Manager + AC/DC
• Hot pluggable
• 1GbE Management
• Power Meter for PMDU
• UARTs for switch and Digi
• Blade power control

Management Switch
• COTS 1GbE managed switch
• Cold aisle cabling

Standalone Rack Manager
• 1U Standalone unit
• Non-WCS rack management
• Row management
• Colo management
• Reuse of Rack Manager Board Assy
Three Phase Power Supply, Universal PMDU
• Normal mode: 1020W total (340W + 340W + 340W), 12V output
• Phase fault: 680W total (340W + 340W + 0W), 12V output
• Double fault: 340W total (340W + 0W + 0W), 12V output
• Throttles & triggers repair
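The fault-mode figures above follow from simple arithmetic: each healthy phase contributes 340W, so losing a phase removes 340W of capacity. A minimal sketch of that calculation follows, assuming three equal per-phase contributions; the function names and the throttle check are hypothetical and not taken from the Project Olympus firmware.

```python
# Per-phase output and phase count as read from the slide figures.
PHASE_OUTPUT_WATTS = 340
PHASE_COUNT = 3


def available_output_watts(failed_phases: int) -> int:
    """Return the supply capacity left after `failed_phases` phases are lost."""
    if not 0 <= failed_phases <= PHASE_COUNT:
        raise ValueError("failed_phases must be between 0 and 3")
    return (PHASE_COUNT - failed_phases) * PHASE_OUTPUT_WATTS


def needs_throttle(server_draw_watts: float, failed_phases: int) -> bool:
    """Hypothetical check: throttle when the draw exceeds the remaining capacity."""
    return server_draw_watts > available_output_watts(failed_phases)


if __name__ == "__main__":
    for failed, label in enumerate(["Normal mode", "Phase fault", "Double fault"]):
        print(label, available_output_watts(failed), "W")
```

With these numbers, a server drawing 800W would fit within the normal-mode capacity but would have to throttle after a single phase fault.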
Rack Level Power Control Mechanism
• PMDU fans out GPIO logic from the Rack Manager to the servers
• Servers are configured with a Power Policy
• The policy is activated upon an alert signal
• Rack Manager contains a rack-level power meter that runs in a dedicated real-time coprocessor
• Rack Manager is assigned a power limit for the rack
• If the limit is reached, the signal asserts, causing servers to activate their policy (instantly throttle, depending on policy)
• Rack Manager has an option for auxiliary inputs (row, colo)
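The mechanism described above can be modeled in a few lines. This is only an illustrative sketch, not the Rack Manager firmware: the `Server` and `RackManager` classes, their fields, and the idea of passing the meter reading into `step` are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Server:
    name: str
    throttled: bool = False

    def on_alert(self, asserted: bool) -> None:
        # Hypothetical policy: throttle while the alert line is asserted.
        self.throttled = asserted


@dataclass
class RackManager:
    limit_watts: float
    servers: List[Server]
    # Optional auxiliary inputs (for example row or colo level alerts).
    aux_inputs: List[Callable[[], bool]] = field(default_factory=list)
    alert_asserted: bool = False

    def step(self, rack_power_watts: float) -> bool:
        """Evaluate one meter sample and fan the alert out to every server."""
        over_limit = rack_power_watts >= self.limit_watts
        aux_alert = any(read() for read in self.aux_inputs)
        self.alert_asserted = over_limit or aux_alert
        for server in self.servers:
            server.on_alert(self.alert_asserted)
        return self.alert_asserted


if __name__ == "__main__":
    rack = RackManager(20_000, [Server("s0"), Server("s1")])
    print(rack.step(15_000), rack.step(20_500))
```

In the real system the fan-out is a hardware GPIO path through the PMDU, so the per-server loop here only stands in for a shared signal line.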
Intel Baseboard Power Control Logic
[Block diagram: Server Motherboard with CPU Domain (CPU0, CPU1), BMC, PSU, HSC, RM, and PmgtCtrl logic. Labeled signals: PSU_OC_ALERT#, SMLINK, RM THROTTLE, ALERT#, FORCE, ENABLE, GPIO, PROCHOT#, MEMHOT#, PCAP#]
• Alert logic can immediately throttle CPU power (<1ms)
• BMC intercepts signals and applies power control policy to chipset, then deasserts logical alert
• Chipset implements power/frequency limits on the CPU
• Chipset perpetually polls HSC for platform power, and adjusts CPU limit
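The two-stage behavior in these bullets (an immediate hardware throttle, followed by a policy-driven limit once the BMC services the alert) can be sketched as a small state model. Everything named here is hypothetical, and the assumption that the hardware throttle releases when the logical alert is deasserted is an inference from the slide, not a documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Baseboard:
    hw_throttle: bool = False          # stands in for PROCHOT#/MEMHOT# being driven
    alert_latched: bool = False        # logical alert visible to the BMC
    cpu_limit_watts: Optional[float] = None

    def raise_alert(self) -> None:
        """Stage 1: alert logic throttles the CPUs at once, with no firmware involved."""
        self.alert_latched = True
        self.hw_throttle = True

    def bmc_service_alert(self, policy_limit_watts: float) -> None:
        """Stage 2: the BMC applies the configured policy, then deasserts the alert."""
        if not self.alert_latched:
            return
        self.cpu_limit_watts = policy_limit_watts
        self.alert_latched = False
        # Assumption: clearing the logical alert releases the hard throttle.
        self.hw_throttle = False


if __name__ == "__main__":
    board = Baseboard()
    board.raise_alert()
    board.bmc_service_alert(policy_limit_watts=150.0)
    print(board)
```

The point of the split is that the fast path needs no software, while the slower path replaces a blunt throttle with a limit chosen by policy.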
CPU Agnostic Baseboard Power Control Logic
• Alert logic can immediately throttle CPU power (<1ms)
• BMC intercepts signals and the BMC co-processor applies the power policy to the CPUs, then deasserts the logical alert
• BMC implements power/frequency limits on the CPU
• BMC perpetually polls HSC for platform power, and adjusts the CPU limit
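The last bullet describes a feedback loop: read platform power from the HSC, compare it with the cap, and move the CPU limit accordingly. The slides do not say what control law is used, so the sketch below assumes a simple proportional adjustment with clamping; the function name, the `gain` parameter, and all wattages are illustrative.

```python
def adjust_cpu_limit(
    platform_power_watts: float,
    platform_cap_watts: float,
    cpu_limit_watts: float,
    cpu_min_watts: float,
    cpu_max_watts: float,
    gain: float = 0.5,
) -> float:
    """Return a new CPU limit after one polling interval.

    A positive error means headroom (raise the limit); a negative error
    means the platform is over its cap (lower the limit).
    """
    error = platform_cap_watts - platform_power_watts
    new_limit = cpu_limit_watts + gain * error
    # Keep the limit inside the range the CPU can actually honor.
    return max(cpu_min_watts, min(cpu_max_watts, new_limit))


if __name__ == "__main__":
    limit = 200.0
    # Hypothetical HSC readings over successive polls, against a 400W cap.
    for reading in [500.0, 450.0, 410.0, 390.0]:
        limit = adjust_cpu_limit(reading, 400.0, limit, 50.0, 250.0)
        print(f"platform={reading}W -> cpu limit={limit}W")
```

Because the loop reads platform power rather than CPU power alone, it does not depend on any vendor-specific CPU telemetry, which is what makes the approach CPU agnostic.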
Power Control – Rack Level Interactions
1. Detect OC power events
2. Hardware interrupt (1ms response)
3. Emergency Hard Power Cap (1ms of power event signal, 21ms including OC fault detection time)
4. Performance-aware Soft Power Cap (within ms of power event)
5. Read Alert events
   - Under voltage
   - Phase loss
   - Over current
   - Feed failure
   - HW Int
6. Event-based Power Policy Remediation
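The two timings quoted for the emergency hard power cap imply a detection budget. As a small worked example, assuming the 21ms figure is simply detection time plus the 1ms signal-to-cap response, the implied OC fault detection time can be computed as below. The `cap_lands_in_time` helper and its tolerance parameter are hypothetical additions for illustration.

```python
# Figures quoted on the slide.
HARD_CAP_AFTER_SIGNAL_MS = 1
HARD_CAP_INCLUDING_DETECTION_MS = 21


def implied_detection_time_ms() -> int:
    """Detection time implied by the two quoted latencies (assumes they simply add)."""
    return HARD_CAP_INCLUDING_DETECTION_MS - HARD_CAP_AFTER_SIGNAL_MS


def cap_lands_in_time(overcurrent_tolerance_ms: float) -> bool:
    """Hypothetical check: does the hard cap take effect before the tolerance expires?"""
    return HARD_CAP_INCLUDING_DETECTION_MS <= overcurrent_tolerance_ms


if __name__ == "__main__":
    print("implied detection time:", implied_detection_time_ms(), "ms")
```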
Dynamic Rack Level Power Control
• A static server-level limit can impact system performance.
• Rack-level control permits servers to run as high as they like, provided their cumulative draw stays below the rack limit
• Tighter power provisioning is made possible by internal evitable workloads
• Fabric orchestrates policy, priority given to critical workloads
[Figure: per-server power levels within a rack, plotted on a 0KW to 20KW scale]
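The difference between a static per-server limit and a rack-level limit is easiest to see with numbers. The sketch below uses the 20KW rack scale from the figure, but the server count and the individual demands are made up for illustration, and an even split is only one possible way to set a static limit.

```python
from typing import List


def static_per_server_cap(rack_limit_watts: float, n_servers: int) -> float:
    """Static scheme: divide the rack budget evenly between servers."""
    return rack_limit_watts / n_servers


def throttled_under_static(demands: List[float], rack_limit_watts: float) -> List[bool]:
    """Which servers would be held back by their individual static cap."""
    cap = static_per_server_cap(rack_limit_watts, len(demands))
    return [demand > cap for demand in demands]


def throttled_under_rack_level(demands: List[float], rack_limit_watts: float) -> bool:
    """Rack-level scheme: throttling is triggered only by the cumulative draw."""
    return sum(demands) > rack_limit_watts


if __name__ == "__main__":
    rack_limit = 20_000.0
    demands = [7_000.0, 3_000.0, 4_000.0, 2_000.0]  # hypothetical per-server demand
    print(throttled_under_static(demands, rack_limit))
    print(throttled_under_rack_level(demands, rack_limit))
```

In this example the busiest server exceeds its 5KW static share and would be capped, even though the rack as a whole draws 16KW and has 4KW of headroom. Under rack-level control nothing is throttled until the total crosses 20KW.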
Row Management
Middle of Row (MOR) Manager: same hardware and firmware, different role.
• Standalone Rack Manager located in MOR
• Controls up to 24 Rack Managers
• Remote Power Control
• Bootstrap
• Rack Level Throttle
• RJ45 Cables
• RM Throttle bypass
Colo Power Management
• Redundant Colo Managers at Power Source (same device, firmware, different role).
• Standalone Rack Manager located in Row distribution panels
• Hierarchical design
• Alert signal distribution propagates from data center infrastructure to server infrastructure
[Diagram: power distribution hierarchy with labels 2000AMP, 600AMP, 250AMP, PM, Colo Manager, Row Manager, and 20x Server Racks per row]
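The hierarchical alert distribution described here can be pictured as a tree in which an alert asserted at any level reaches everything beneath it and nothing else. The sketch below is a toy model under that assumption: the `Node` class, the helper functions, and the small two-row topology are all hypothetical, not the deployed layout.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    alert: bool = False

    def assert_alert(self) -> None:
        """Assert the alert here and propagate it down to every descendant."""
        self.alert = True
        for child in self.children:
            child.assert_alert()


def alerted_servers(root: Node) -> List[str]:
    """Collect the names of leaf nodes (servers) that currently see an alert."""
    if not root.children:
        return [root.name] if root.alert else []
    names: List[str] = []
    for child in root.children:
        names.extend(alerted_servers(child))
    return names


def build_example() -> Tuple[Node, Node, Node]:
    """Build a tiny colo -> row -> rack -> server tree for illustration."""
    def rack(name: str) -> Node:
        return Node(name, [Node(f"{name}/server{i}") for i in range(2)])

    row_a = Node("rowA", [rack("rowA/rack0"), rack("rowA/rack1")])
    row_b = Node("rowB", [rack("rowB/rack0")])
    colo = Node("colo", [row_a, row_b])
    return colo, row_a, row_b


if __name__ == "__main__":
    colo, row_a, _ = build_example()
    row_a.assert_alert()
    print(alerted_servers(colo))
```

Asserting at a Row Manager throttles only the servers in that row, while asserting at the Colo Manager reaches every server in the colo.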
Further Information
• Project Olympus: https://github.com/opencomputeproject/Project_Olympus
• Rack Manager: https://github.com/Project-Olympus/rackmanager
• OpenUEFI: https://github.com/tianocore/edk2-platforms/tree/devel-MinPlatform/Platform/Intel/PurleyOpenBoardPkg/BoardMtOlympus