System Power Management
Power Architecture and Power Monitoring
Ken Boyden
International Rectifier
September, 2006
Problem
• Management of cooling simply by monitoring temperature has several problems
• Thermal Latency – Thermal response lag time is usually long compared to the stimulating events
• Reduction of cooling usually comes too early
• No knowledge of what is coming next
• Typical hysteretic mode or linear mode fan control has several issues
• Cooling response is triggered by thermostatic trigger events rather than actual power requirements
• Or by linear thermal response where the cooling lags the thermal rise
• This causes excessive response by the fan cooling system
• Efficiency is dynamic, not just static
• Thermal spikes are caused by power loss spikes
• If we could react to power rather than thermal events we could reduce loss peaks by controlling thermal-resistive elements in the system
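The latency argument above can be illustrated with a minimal sketch, assuming a first-order (single time constant) thermal model. The thermal resistance, time constant, ambient temperature, and trigger temperature are hypothetical values chosen for illustration; none of them come from the presentation.

```python
import math

R_TH = 0.5      # hypothetical thermal resistance, degC per watt
TAU = 20.0      # hypothetical thermal time constant, seconds
AMBIENT = 25.0  # hypothetical ambient temperature, degC


def steady_temp(power_w):
    """Steady-state temperature for a constant power dissipation."""
    return AMBIENT + R_TH * power_w


def thermal_trigger_delay(p_before_w, p_after_w, trigger_c):
    """Seconds after a power step until the modeled temperature
    crosses trigger_c. Returns None if it never gets there."""
    t0 = steady_temp(p_before_w)
    t_final = steady_temp(p_after_w)
    if t0 >= trigger_c:
        return 0.0
    if t_final <= trigger_c:
        return None
    # T(t) = t_final + (t0 - t_final) * exp(-t / TAU), solved for T(t) = trigger_c
    return -TAU * math.log((trigger_c - t_final) / (t0 - t_final))


delay = thermal_trigger_delay(40.0, 100.0, 60.0)
print(f"thermostat reacts {delay:.1f} s after the power step")
```

Under these assumed values a thermostat set at 60 °C responds roughly 14 seconds after the load step, while a power measurement sees the step as soon as it is sampled.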
Power Distribution in a Data Center
Data Center efficiency = Power Delivered to IT Equipment / Total input Power
Source: Electrical Efficiency Modeling for Data Centers
Typical Data Center Efficiency – 30% to 60%
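As a worked illustration of the efficiency ratio, the sketch below computes how much input power never reaches the IT load at the two ends of the typical range. The 300 kW IT load and the function names are assumptions made for the example.

```python
def data_center_efficiency(it_power_kw, total_input_kw):
    """Power delivered to IT equipment divided by total input power."""
    if total_input_kw <= 0:
        raise ValueError("total input power must be positive")
    return it_power_kw / total_input_kw


def overhead_kw(it_power_kw, efficiency):
    """Input power that does not reach the IT equipment."""
    return it_power_kw / efficiency - it_power_kw


IT_LOAD_KW = 300.0  # hypothetical IT load
for eff in (0.30, 0.60):
    print(f"{eff:.0%} efficient: {overhead_kw(IT_LOAD_KW, eff):.0f} kW of overhead")
```

For the same assumed IT load, moving from 30% to 60% efficiency cuts the overhead from about 700 kW to about 200 kW.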
Annual Utility Cost for just the Server
• Taken from an independent study of server cost of ownership for industry standard servers
• All servers were chosen to provide nearly equal performance, ≈120K ops/sec
Server Annual Cost of Operation
Column groups: Total Consumption (KW, Electrical); Cooling (AC Watts, AC Cost); Fans Annual Cost of Power (75% eff., BLDC 85% eff.); VR Annual Cost of Power (VR now 87% eff., VR future 92% eff.); Savings (Electrical, AC, Total)
Server Package   KW    Electrical  AC Watts  AC Cost  75% eff.  BLDC 85% eff.   VR now 87% eff.  VR future 92% eff.  Electrical  AC    Total
p650/Linux       1.60  $1,136      591.41    $420     $171      $151            $535             $478                $76         $28   $105
DL740/Linux      1.60  $1,136      591.41    $420     $171      $151            $535             $478                $76         $28   $105
DL740/Windows    1.60  $1,136      591.41    $420     $171      $151            $535             $478                $76         $28   $105
rx5670/Linux     2.79  $1,981      1031.28   $732     $297      $262            $932             $834                $133        $49   $182
rx5670/Windows   2.79  $1,981      1031.28   $732     $297      $262            $932             $834                $133        $49   $182
SunFire/Solaris  3.92  $2,783      1448.96   $1,029   $418      $369            $1,310           $1,172              $187        $69   $256
Cluster/Linux    1.85  $1,314      683.82    $486     $197      $174            $618             $553                $88         $33   $121
Cluster/Windows  1.85  $1,314      683.82    $486     $197      $174            $618             $553                $88         $33   $121
Other Cost Factors
• Reliability
• Transistor MTBF is an exponential function of operating junction temperature
• A junction temperature rise of as little as 10°C can halve the lifetime of the component
• Performance
• The microprocessor can operate at higher clock speeds with lower junction temperatures.
• Gate delays are also reduced.
• Power due to leakage current is also reduced at lower temperatures.
• Noise
• Using PMAC motors with sinusoidal drive and tightly controlled power significantly reduces both acoustic and EM noise.
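The reliability bullet can be turned into a quick estimate. This is a minimal sketch that treats "a 10 °C rise can halve the lifetime" as a fixed rule of thumb, which is a simplifying assumption; the function name and the default step are illustrative only.

```python
def relative_lifetime(delta_t_c, halving_step_c=10.0):
    """Lifetime multiplier for a junction temperature change of
    delta_t_c degC, assuming lifetime halves every halving_step_c."""
    return 2.0 ** (-delta_t_c / halving_step_c)


for change in (-10.0, 10.0, 20.0):
    print(f"{change:+.0f} degC -> lifetime x {relative_lifetime(change):.2f}")
```

Under this rule, running 10 °C cooler doubles the expected lifetime and running 20 °C hotter cuts it to a quarter.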
Dynamic Thermal Management
• Most package and cooling designs are based upon peak thermal events
• It takes over 5 ms to retrieve processor temperature data via Serial Management Buses
• Dynamic thermal management allows us to design for lower thermal events
• Dynamic voltage positioning already provides about a 10% savings in overall thermal budget.
• Sensing instantaneous and average power provides extra trigger points other than just extreme thermal events
• By monitoring both power and temperature it is possible to dynamically profile the processing environment. Statistical analysis can be used to determine trigger points for cooling based upon power and temperature sensing.
• Thermal reduction mechanisms:
• Fan
• Clock reduction
• Voltage Scaling
• Cache/Core enabling
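The idea of deriving cooling trigger points statistically from profiled power and temperature can be sketched as follows. The rule used here (mean plus k standard deviations of the profiled power) is one possible choice, not a method stated in the presentation, and every name and number is hypothetical.

```python
import statistics


def power_trigger(samples_w, k=2.0):
    """Trigger threshold in watts: mean plus k population standard
    deviations of the profiled power samples."""
    return statistics.mean(samples_w) + k * statistics.pstdev(samples_w)


def should_boost_cooling(avg_power_w, temp_c, power_trigger_w, temp_trigger_c):
    """Request more cooling when either the average power or the
    temperature exceeds its trigger point."""
    return avg_power_w > power_trigger_w or temp_c > temp_trigger_c


profile_w = [90.0, 100.0, 110.0]   # hypothetical profiled power samples
threshold_w = power_trigger(profile_w)
print(f"power trigger: {threshold_w:.1f} W")
print(should_boost_cooling(120.0, 50.0, threshold_w, 70.0))
```

The power check lets cooling respond before the temperature trigger is reached, while the temperature check remains as a backstop.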
Thermal Throttling – Intel Pentium
• Performance throttling (clock/voltage) is currently used to control the thermal envelope.
• The big issue with this is the long thermal response time which causes ‘thermal overshoot’
• Throttling of VIDs also causes efficiency losses
Source: Intel Technology Journal
VR Efficiency
[Figure: efficiency (0 to 100) versus current in amps (20 to 140), marking the Thermal Max. Design Point]
VR Efficiency
[Figure: efficiency (0 to 100) versus current in amps (20 to 140), marking the Dynamic Power Controlled Design Point]
Initial Server Costs
Column groups: Total Consumption (KW, Electrical); Cooling (AC Watts, AC Cost)
Server Package   KW    Electrical  AC Watts  AC Cost
p650/Linux       1.60  $1,136      591.41    $420
DL740/Linux      1.60  $1,136      591.41    $420
DL740/Windows    1.60  $1,136      591.41    $420
rx5670/Linux     2.79  $1,981      1031.28   $732
rx5670/Windows   2.79  $1,981      1031.28   $732
SunFire/Solaris  3.92  $2,783      1448.96   $1,029
Cluster/Linux    1.85  $1,314      683.82    $486
Cluster/Windows  1.85  $1,314      683.82    $486
Costs with 20% reduction in Cooling Power Consumption
Column groups: Total Consumption (KW, Electrical); Cooling (AC Watts, AC Cost)
Server Package   KW    Electrical  AC Watts  AC Cost
p650/Linux       1.60  $1,022      473.13    $336
DL740/Linux      1.60  $1,022      473.13    $336
DL740/Windows    1.60  $1,022      473.13    $336
rx5670/Linux     2.79  $1,783      825.02    $586
rx5670/Windows   2.79  $1,783      825.02    $586
SunFire/Solaris  3.92  $2,505      1159.17   $823
Cluster/Linux    1.85  $1,182      547.06    $388
Cluster/Windows  1.85  $1,182      547.06    $388

Server Package   Electrical Savings  AC Savings  Total
p650/Linux       $114                $84         $198
DL740/Linux      $114                $84         $198
DL740/Windows    $114                $84         $198
rx5670/Linux     $198                $146        $345
rx5670/Windows   $198                $146        $345
SunFire/Solaris  $278                $206        $484
Cluster/Linux    $131                $97         $228
Cluster/Windows  $131                $97         $228
• 10% electrical savings assumed by controlling the load point for the entire power train
• 20% savings assumed by reducing the AC requirements
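The savings figures for the 20% case follow from applying the two assumed percentages to the baseline annual costs. The sketch below shows that arithmetic for two of the servers; the function and dictionary names are illustrative, and whole-dollar results can differ from the slide by a dollar because the slide's unrounded inputs are not available.

```python
# Baseline annual costs from the Initial Server Costs slide:
# (electrical cost in $, AC cost in $)
BASELINE = {
    "p650/Linux": (1136, 420),
    "SunFire/Solaris": (2783, 1029),
}


def annual_savings(electrical_cost, ac_cost, electrical_cut=0.10, ac_cut=0.20):
    """Return (electrical, AC, total) annual savings in dollars."""
    electrical = electrical_cost * electrical_cut
    ac = ac_cost * ac_cut
    return electrical, ac, electrical + ac


for server, (electrical_cost, ac_cost) in BASELINE.items():
    e, a, t = annual_savings(electrical_cost, ac_cost)
    print(f"{server}: electrical ${e:.0f}, AC ${a:.0f}, total ${t:.0f}")
```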
Costs with 30% reduction in Cooling Power Consumption
Column groups: Total Consumption (KW, Electrical); Cooling (AC Watts, AC Cost)
Server Package   KW    Electrical  AC Watts  AC Cost
p650/Linux       1.60  $1,022      473.13    $336
DL740/Linux      1.60  $1,022      473.13    $336
DL740/Windows    1.60  $1,022      473.13    $336
rx5670/Linux     2.79  $1,783      825.02    $586
rx5670/Windows   2.79  $1,783      825.02    $586
SunFire/Solaris  3.92  $2,505      1159.17   $823
Cluster/Linux    1.85  $1,182      547.06    $388
Cluster/Windows  1.85  $1,182      547.06    $388

Server Package   Electrical Savings  AC Savings  Total
p650/Linux       $114                $130        $243
DL740/Linux      $114                $130        $243
DL740/Windows    $114                $130        $243
rx5670/Linux     $198                $226        $424
rx5670/Windows   $198                $226        $424
SunFire/Solaris  $278                $318        $596
Cluster/Linux    $131                $150        $281
Cluster/Windows  $131                $150        $281
• 10% electrical savings assumed by controlling the load point for the entire power train
• 30% savings assumed by reducing the AC requirements
Data Center Example
• In a Data Center example, we see the greatest savings from Dynamic power control and designing for the actual power envelope
Source: Electrical Efficiency Modeling for Data Centers
Requirements for Solution
• Board based Power Management Control
• Control loop based upon Load Power rather than thermal events
• Accurate monitoring of each system load point
• This includes FBDIMM
• VR
• Chipset
• Drive Modules
• Graphics control
• System Based Power Control
• Consolidates inputs from board/module power controllers
• Control enclosure fans
• Provide system loading commands
• Control VRs
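A control loop driven by load power rather than thermal events could look like the sketch below: a system-level controller sums the readings reported by the board/module power controllers and sets the enclosure fan duty directly from that total. The linear mapping, the wattage limits, and all names are hypothetical, chosen only to make the idea concrete.

```python
def total_load_power(board_reports_w):
    """Consolidate per-load-point power readings (watts) into one total."""
    return sum(board_reports_w.values())


def fan_duty(load_power_w, idle_w=100.0, max_w=600.0, min_duty=0.2):
    """Feedforward fan duty (0.0 to 1.0), scaled linearly with load power
    between an assumed idle power and an assumed maximum power."""
    span = (load_power_w - idle_w) / (max_w - idle_w)
    span = min(max(span, 0.0), 1.0)  # clamp outside the assumed range
    return min_duty + (1.0 - min_duty) * span


# Hypothetical readings from the monitored load points.
reports_w = {"VR": 250.0, "FBDIMM": 60.0, "Chipset": 25.0, "Drive Modules": 15.0}
power_w = total_load_power(reports_w)
print(f"load {power_w:.0f} W -> fan duty {fan_duty(power_w):.0%}")
```

Because the fan command tracks measured power, cooling rises with the load step itself instead of waiting for the temperature to follow.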
[Figure: block diagram with labels: Intelligent Platform System Management Controller; Power Info; VR with PM (repeated); Control; Fan Control; Chipset; FBDIMM (two)]
Future Developments
• Processor
• Integrated Power Detection elements
• Energy per Operation detection
• Instruction Cache toggling
• Clock gating
• VR
• Intimate tie between CPU and VR voltage
• Operating system
• Speculative Processing
• Like speculative branching but set up to minimize peak power events
Summary and Feedback
• The majority of data center and server costs come from controlling the operating environment
• Most of the innovation has gone into the power train
• A method of determining and communicating actual dynamic power is needed
• Next Steps…
Bibliography
• Dynamic Thermal Management for High-Performance Microprocessors. David Brooks, Margaret Martonosi
• Dynamic Thermal Management for Distributed Systems. Andreas Weissel, Frank Bellosa
• Electrical Efficiency Modeling for Data Centers. Neil Rasmussen
• Intelligent Power Management Interface Specification
• Increasing Data Center Density While Driving Down Power and Cooling Costs. Intel Corporation
• Energy Efficient Server Clusters. E.N. Elnozahy, Michael Kistler, Ramakrishnan Rajamony
• SharkRack: The Problem of Thermal Management. HP corporation
• Thermal Performance Challenges from Silicon to Systems. Ram Viswanath, Vijay Wakharkar, Abhay Watwe, Vassou Lebonheur, Intel Corp.
• Total Cost of Ownership for Enterprise Application Workloads. Robert Frances Group
IR Variable Speed Blower Controller
A New Thermal Management Concept
[Figure: block diagram with labels: Server Management Module; CPU and System Thermal Inputs; PMBus or I2C; VRMs & POLs; Power Feedback and load control via PMBus or I2C; PFC; 8051 uController; To PFC FET gate; Blower Controller #1; Blower Controller #2]
Server Thermal Management: A Different Approach
• A dual Motor/Blower controller utilizing Sensor-less control to
• Remove costly Hall Effect Sensors
• Remove the PSOC in each of the (n) blowers
• Remove Housekeeping supplies for PSOC in each Blower
• Improve Machine performance and efficiency
• An improved PFC
• That increases overall conversion efficiency, thus reducing Blower Power Dissipation, Data Center TCO and/or increases CPU utilization
• Removes costly redundancy of PFC in each blower
• Improved overall Blower System Efficiency
• Using Proprietary IR technology in PFC, Bridge and VS controls
• Increase CPU Utilization, or reduction in power, cooling requirements and TCO
• Increased blower control functionality
• Improve granularity of blower control
• Further Reduce Noise
• Remove System Latency, which improves efficiency
Total Platform Power Control: A VISION
• Imagine Knowing the power draw of each power subsystem
• Imagine Being Able to Make Real Time Decisions to Optimize Performance While Maintaining a Thermal Envelope
• Imagine Knowing which subsystem to throttle to maximize User demanded performance
• Imagine Knowing the system is running its coolest and quietest
IR Subsystem Power Monitoring
IR3720 Discrete Power Monitor And Temperature Sensor
[Figure: IR3720 block diagram. Pin labels: VCC, RTN, GND, VSEN, ISEN, IOUT, SENREF, SCL, SDA, ADD1, ADD2, STROBE, ALARM#. Block labels: VCO, ÷ N, Set N, Counter, Clock, PWR Register, TAVG Register, ALARM Register, SMBus Slave]
Block Diagram
• Flexible Current Sensing
• DCR or Resistor
• Single Output or Multiphase
• Remote Temperature Sense
• Uses External NTC
• Programmable Alarm
• Alerts microcontroller of excessive average power, or Over Temperature