
System Power Management: Power Architecture and Power Monitoring

Ken Boyden

International Rectifier

September, 2006

Heat Densities – Future trend

Problem

• Management of cooling simply by monitoring temperature has several problems

• Thermal Latency – Thermal response lag time is usually long compared to the stimulating events

• Reduction of cooling usually comes too early

• No knowledge of what is coming next

• Typical hysteretic mode or linear mode fan control has several issues

• Cooling response is triggered by thermostatic trigger events rather than actual power requirements

• Or by linear thermal response where the cooling lags the thermal rise

• This causes excessive response by the fan cooling system

• Efficiency is dynamic, not just static

• Thermal spikes are caused by power loss spikes

• If we could react to power rather than thermal events we could reduce loss peaks by controlling thermal-resistive elements in the system
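The thermal latency described in the bullets above can be illustrated with a first-order thermal model. This is only a sketch: the thermal resistance, time constant, ambient temperature, and trip point are assumed values chosen for illustration, not figures from the presentation.

```python
import math

def temperature_after(t_s, power_w, r_th=0.5, tau_s=30.0, t_ambient=25.0):
    """Temperature (deg C) t_s seconds after a power step, first-order RC model."""
    return t_ambient + power_w * r_th * (1.0 - math.exp(-t_s / tau_s))

def thermal_trip_delay(power_w, t_trip, r_th=0.5, tau_s=30.0, t_ambient=25.0):
    """Seconds until the model reaches t_trip, or None if it never does."""
    final_rise = power_w * r_th          # steady-state rise above ambient
    needed_rise = t_trip - t_ambient
    if needed_rise <= 0:
        return 0.0
    if needed_rise >= final_rise:
        return None
    return -tau_s * math.log(1.0 - needed_rise / final_rise)

# A 100 W step with a 60 deg C thermostat trip point (both assumed values).
delay_s = thermal_trip_delay(power_w=100.0, t_trip=60.0)
print(f"Thermal trigger fires {delay_s:.1f} s after the power step")
```

Under these assumptions a thermostat-style trigger reacts roughly 36 seconds after the power step, whereas a controller that senses load power directly could respond as soon as the step is measured.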

Power Distribution in a Data Center

Data Center efficiency = Power Delivered to IT Equipment / Total Input Power

Source: Electrical Efficiency Modeling for Data Centers

Typical Data Center Efficiency – 30% to 60%
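As a small worked example of the efficiency ratio above, the sketch below computes how much input power a facility needs for a given IT load. The function names and the 100 kW load are hypothetical; only the 30% to 60% range comes from the slide.

```python
def data_center_efficiency(it_power_kw, total_input_kw):
    """Ratio of power delivered to IT equipment to total input power."""
    if total_input_kw <= 0:
        raise ValueError("total input power must be positive")
    return it_power_kw / total_input_kw

def total_input_for(it_power_kw, efficiency):
    """Total input power needed to deliver it_power_kw at a given efficiency."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return it_power_kw / efficiency

# Hypothetical 100 kW IT load at both ends of the typical range.
for eff in (0.30, 0.60):
    print(f"{eff:.0%} efficient: {total_input_for(100.0, eff):.1f} kW input")
```

At the low end of the range, more than two thirds of the input power never reaches the IT equipment.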

Annual Utility Cost for just the Server

• Taken from an independent study of server cost of ownership for industry-standard servers

• All servers were chosen to provide nearly equal performance, ≈120K ops/sec

Server Annual Cost of Operation

                   Total Consumption    Cooling               Fans Annual Cost of Power    VR Annual Cost of Power                  Savings
Server Package     kW      Electrical   AC Watts    AC Cost   75% eff.    BLDC 85% eff.    VR now 87% eff.    VR future 92% eff.    Electrical    AC     Total
p650/Linux         1.60    $1,136       591.41      $420      $171        $151             $535               $478                  $76           $28    $105
DL740/Linux        1.60    $1,136       591.41      $420      $171        $151             $535               $478                  $76           $28    $105
DL740/Windows      1.60    $1,136       591.41      $420      $171        $151             $535               $478                  $76           $28    $105
rx5670/Linux       2.79    $1,981       1031.28     $732      $297        $262             $932               $834                  $133          $49    $182
rx5670/Windows     2.79    $1,981       1031.28     $732      $297        $262             $932               $834                  $133          $49    $182
SunFire/Solaris    3.92    $2,783       1448.96     $1,029    $418        $369             $1,310             $1,172                $187          $69    $256
Cluster/Linux      1.85    $1,314       683.82      $486      $197        $174             $618               $553                  $88           $33    $121
Cluster/Windows    1.85    $1,314       683.82      $486      $197        $174             $618               $553                  $88           $33    $121
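The annual cost figures in the table can be related to the power figures with simple arithmetic. The sketch below assumes continuous operation (8,760 hours per year) and backs out the utility rate from the first row; the slide states neither, so both are assumptions made for illustration.

```python
HOURS_PER_YEAR = 8760  # assumes the server runs continuously all year

def implied_rate(annual_cost_usd, load_kw):
    """Utility rate ($/kWh) implied by an annual cost and a constant load."""
    return annual_cost_usd / (load_kw * HOURS_PER_YEAR)

def annual_cost(load_kw, rate_usd_per_kwh):
    """Annual cost of a constant load at the given rate."""
    return load_kw * HOURS_PER_YEAR * rate_usd_per_kwh

# Rate implied by the p650/Linux row: 1.60 kW costing $1,136 per year.
rate = implied_rate(1136, 1.60)
print(f"Implied rate: ${rate:.4f}/kWh")

# The same rate applied to that row's cooling load of 591.41 W.
print(f"Cooling cost: ${annual_cost(591.41 / 1000, rate):.0f}")
```

With these assumptions the implied rate is roughly $0.081 per kWh, and the cooling load works out to about $420 per year, consistent with the AC Cost column.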

Other Cost Factors

• Reliability

• Transistor MTBF is an exponential function of operating junction temperature

• A junction temperature rise of as little as 10°C can halve the lifetime of the component

• Performance

• The microprocessor can operate at higher clock speeds with lower junction temperatures.

• Gate delays are also reduced.

• Power due to leakage current is also reduced at lower temperatures.

• Noise

• Using PMAC motors with sinusoidal drive and tightly controlled power significantly reduces both acoustic and EM noise.
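The reliability rule of thumb above (a 10°C junction temperature rise can halve component lifetime) can be written as a simple exponential. The sketch below takes that rule literally across the whole temperature range, which is an assumption; real devices follow it only approximately.

```python
def relative_lifetime(delta_t_c, halving_step_c=10.0):
    """Lifetime relative to baseline after a junction temperature change.

    Assumes lifetime halves for every halving_step_c degrees of rise,
    per the rule of thumb on the slide.
    """
    return 0.5 ** (delta_t_c / halving_step_c)

for rise in (-10, 0, 10, 20):
    print(f"{rise:+d} deg C -> {relative_lifetime(rise):.2f}x baseline lifetime")
```

Read the other way, running 10°C cooler would roughly double the expected lifetime under the same assumption.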

Dynamic Thermal Management

• Most Package and cooling designs are based upon peak thermal events

• It takes over 5 ms to retrieve processor temperature data via Serial Management Buses

• Dynamic thermal management allows us to design for lower thermal events

• Dynamic voltage positioning already provides about a 10% savings in overall thermal budget.

• Sensing instantaneous and average power provides extra trigger points other than just extreme thermal events

• By monitoring both power and temperature it is possible to dynamically profile the processing environment. Statistical analysis can be used to determine trigger points for cooling based upon power and temperature sensing.

• Thermal reduction mechanisms:

• Fan

• Clock reduction

• Voltage Scaling

• Cache/Core enabling
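One way to picture the trigger points mentioned above is a statistical threshold derived from a profiled power trace, combined with checks on both instantaneous and average power. The sketch below is a hypothetical illustration: the class name, the mean-plus-k-sigma rule, the window length, and all limits are assumptions, not the method used in the presentation.

```python
from collections import deque
from statistics import mean, pstdev

def trigger_point(profile_w, k=2.0):
    """Trigger level set at the mean plus k standard deviations of a profile."""
    return mean(profile_w) + k * pstdev(profile_w)

class PowerTrigger:
    """Flags a cooling response on instantaneous or windowed-average power."""

    def __init__(self, instant_limit_w, average_limit_w, window=8):
        self.instant_limit_w = instant_limit_w
        self.average_limit_w = average_limit_w
        self.history = deque(maxlen=window)  # most recent power samples

    def update(self, power_w):
        """Record a sample; return True if either limit is exceeded."""
        self.history.append(power_w)
        average_w = mean(self.history)
        return power_w > self.instant_limit_w or average_w > self.average_limit_w

# Hypothetical profiled power samples in watts.
profile = [80, 90, 100, 110, 120]
print(f"Statistical trigger point: {trigger_point(profile):.1f} W")
```

A trigger of this kind can fire on a power spike well before the corresponding temperature rise becomes visible to a thermal sensor.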

Thermal Throttling – Intel Pentium

• Performance throttling (clock/voltage) is currently used to control the thermal envelope.

• The big issue with this is the long thermal response time which causes ‘thermal overshoot’

• Throttling of VIDs also causes efficiency losses

[Figure: actual thermal envelope, throttling area, and the envelope controlled with Dynamic Power Management. Source: Intel Technology Journal]

Design for Power vs. Performance

Source: Intel Technology Journal

VR Efficiency

[Figure: VR efficiency (0 to 100) versus current in amps (20 to 140), marked at the Thermal Max. Design Point]

[Figure: VR efficiency (0 to 100) versus current in amps (20 to 140), marked at the Dynamic Power Controlled Design Point]

Fan Efficiency

[Figure: fan efficiency at the Thermal Max. Design Point]

[Figure: fan efficiency at the Dynamic Power Controlled Design Point]

Initial Server Costs

                   Total Consumption    Cooling
Server Package     kW      Electrical   AC Watts    AC Cost
p650/Linux         1.60    $1,136       591.41      $420
DL740/Linux        1.60    $1,136       591.41      $420
DL740/Windows      1.60    $1,136       591.41      $420
rx5670/Linux       2.79    $1,981       1031.28     $732
rx5670/Windows     2.79    $1,981       1031.28     $732
SunFire/Solaris    3.92    $2,783       1448.96     $1,029
Cluster/Linux      1.85    $1,314       683.82      $486
Cluster/Windows    1.85    $1,314       683.82      $486

Costs with 20% reduction in Cooling Power Consumption

                   Total Consumption    Cooling
Server Package     kW      Electrical   AC Watts    AC Cost
p650/Linux         1.60    $1,022       473.13      $336
DL740/Linux        1.60    $1,022       473.13      $336
DL740/Windows      1.60    $1,022       473.13      $336
rx5670/Linux       2.79    $1,783       825.02      $586
rx5670/Windows     2.79    $1,783       825.02      $586
SunFire/Solaris    3.92    $2,505       1159.17     $823
Cluster/Linux      1.85    $1,182       547.06      $388
Cluster/Windows    1.85    $1,182       547.06      $388

Server Package     Electrical Savings    AC Savings    Total
p650/Linux         $114                  $84           $198
DL740/Linux        $114                  $84           $198
DL740/Windows      $114                  $84           $198
rx5670/Linux       $198                  $146          $345
rx5670/Windows     $198                  $146          $345
SunFire/Solaris    $278                  $206          $484
Cluster/Linux      $131                  $97           $228
Cluster/Windows    $131                  $97           $228

• 10% electrical savings assumed by controlling the load point for the entire power train

• 20% savings assumed by reducing the AC requirements
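The savings figures for this scenario follow directly from the two stated assumptions. The sketch below applies the 10% electrical and 20% AC savings to the initial p650/Linux costs ($1,136 electrical, $420 AC); the function name is hypothetical.

```python
def scenario_savings(electrical_cost, ac_cost, electrical_saving=0.10, ac_saving=0.20):
    """Return (electrical, AC, total) annual savings in dollars."""
    electrical = electrical_cost * electrical_saving
    ac = ac_cost * ac_saving
    return electrical, ac, electrical + ac

# Initial annual costs for the p650/Linux server.
electrical, ac, total = scenario_savings(1136, 420)
print(f"Electrical ${electrical:.0f}, AC ${ac:.0f}, total ${total:.0f}")
```

The results round to $114, $84, and $198, matching the first row of the savings table.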

Costs with 30% reduction in Cooling Power Consumption

                   Total Consumption    Cooling
Server Package     kW      Electrical   AC Watts    AC Cost
p650/Linux         1.60    $1,022       473.13      $336
DL740/Linux        1.60    $1,022       473.13      $336
DL740/Windows      1.60    $1,022       473.13      $336
rx5670/Linux       2.79    $1,783       825.02      $586
rx5670/Windows     2.79    $1,783       825.02      $586
SunFire/Solaris    3.92    $2,505       1159.17     $823
Cluster/Linux      1.85    $1,182       547.06      $388
Cluster/Windows    1.85    $1,182       547.06      $388

Server Package     Electrical Savings    AC Savings    Total
p650/Linux         $114                  $130          $243
DL740/Linux        $114                  $130          $243
DL740/Windows      $114                  $130          $243
rx5670/Linux       $198                  $226          $424
rx5670/Windows     $198                  $226          $424
SunFire/Solaris    $278                  $318          $596
Cluster/Linux      $131                  $150          $281
Cluster/Windows    $131                  $150          $281

• 10% electrical savings assumed by controlling the load point for the entire power train

• 30% savings assumed by reducing the AC requirements

Data Center Example

• In a Data Center example, we see the greatest savings from Dynamic power control and designing for the actual power envelope

Source: Electrical Efficiency Modeling for Data Centers

Requirements for Solution

• Board based Power Management Control

• Control loop based upon Load Power rather than thermal events

• Accurate monitoring of each system load point

• This includes FBDIMM

• VR

• Chipset

• Drive Modules

• Graphics control

• System Based Power Control

• Consolidates inputs from board/module power controllers

• Control enclosure fans

• Provide system loading commands

• Control VRs
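The requirements above describe a system-level controller that consolidates per-load power reports and drives the fans from load power rather than temperature. The sketch below is one hypothetical shape such a controller could take; the `LoadReport` type, the linear duty-cycle mapping, and all numbers are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class LoadReport:
    """Power report from one board or module power controller (hypothetical)."""
    name: str
    power_w: float   # measured load power
    budget_w: float  # power the load is allowed to draw

def fan_duty(reports, min_duty=0.2, max_duty=1.0):
    """Fan duty cycle scaled linearly with total power as a share of budget."""
    total_w = sum(r.power_w for r in reports)
    budget_w = sum(r.budget_w for r in reports)
    if budget_w <= 0:
        raise ValueError("total budget must be positive")
    utilization = min(max(total_w / budget_w, 0.0), 1.0)
    return min_duty + (max_duty - min_duty) * utilization

def loads_to_throttle(reports):
    """Names of loads drawing more than their budget."""
    return [r.name for r in reports if r.power_w > r.budget_w]

reports = [
    LoadReport("VR", 90.0, 120.0),
    LoadReport("FBDIMM", 30.0, 40.0),
    LoadReport("Chipset", 30.0, 40.0),
]
print(f"Fan duty: {fan_duty(reports):.0%}")
```

Because the fan command is derived from measured power, it changes as soon as the load changes instead of waiting for a temperature sensor to catch up.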

[Figure: block diagram. Labels: System Management Controller; Intelligent Platform; Power Info; VR with PM (several instances); Control; Fan Control; Chipset; FBDIMM (two instances)]

Controlling the Data Center - IPMI

[Figure: IPMI management path, with labels "Blade or MP board" and "Internet". Source: Intel]

Future Developments

• Processor

• Integrated Power Detection elements

• Energy per Operation detection

• Instruction Cache toggling

• Clock gating

• VR

• Intimate tie between CPU and VR voltage

• Operating system

• Speculative Processing

• Like speculative branching but set up to minimize peak power events

Summary and Feedback

• The Majority of Data Center and Server Costs come from controlling the operating environment

• Most of the innovation has gone into the power train

• A method of determining and communicating actual dynamic power is needed

• Next Steps…

Bibliography

• Dynamic Thermal Management for High-Performance Microprocessors. David Brooks, Margaret Martonosi

• Dynamic Thermal Management for Distributed Systems. Andreas Weissel, Frank Bellosa

• Electrical Efficiency Modeling for Data Centers. Neil Rasmussen

• Intelligent Platform Management Interface Specification

• Increasing Data Center Density While Driving Down Power and Cooling Costs. Intel Corporation

• Energy Efficient Server Clusters. E.N. Elnozahy, Michael Kistler, Ramakrishnan Rajamony

• SharkRack: The Problem of Thermal Management. HP Corporation

• Thermal Performance Challenges from Silicon to Systems. Ram Viswanath, Vijay Wakharkar, Abhay Watwe, Vassou Lebonheur, Intel Corp.

• Total Cost of Ownership for Enterprise Application Workloads. Robert Frances Group

A New Thermal Management Concept

[Figure: block diagram of the IR Variable Speed Blower Controller. Labels: Server Management Module; CPU and System Thermal Inputs; PMBus or I2C; VRMs & POLs; Power Feedback and load control via PMBus or I2C; PFC; 8051 uController; To PFC FET gate; Blower Controller #1; Blower Controller #2]

Server Thermal Management: A Different Approach

• A dual Motor/Blower controller utilizing sensor-less control to

• Remove costly Hall Effect Sensors

• Remove PSOC times (n) blowers

• Remove Housekeeping supplies for PSOC in each Blower

• Improve Machine performance and efficiency

• An improved PFC

• That increases overall conversion efficiency, thus reducing blower power dissipation and data center TCO and/or increasing CPU utilization

• Removes costly redundancy of PFC in each blower

• Improved overall Blower System Efficiency

• Using proprietary IR technology in PFC, Bridge and VS controls

• Increased CPU utilization, or reduced power, cooling requirements, and TCO

• Increased blower control functionality

• Improve granularity of blower control

• Further Reduce Noise

• Remove System Latency, which improves efficiency

Total Platform Power Control: A VISION

• Imagine Knowing the power draw of each power subsystem

• Imagine Being Able to Make Real Time Decisions to Optimize Performance While Maintaining a Thermal Envelope

• Imagine Knowing which subsystem to throttle to maximize User demanded performance

• Imagine Knowing the system is running its coolest and quietest

IR Subsystem Power Monitoring

IR3720 Discrete Power Monitor And Temperature Sensor

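[Figure: IR3720 block diagram. Pins: VCC, RTN, VSEN, ISEN, IOUT, SENREF, GND, SCL, SDA, ADD1, ADD2, ALARM#. Internal blocks: multiplier, VCO, ÷N dividers (Set N), counter, clock, strobe, PWR register, TAVG register, ALARM register with comparator, SMBus slave. One signal is labeled with a product involving DCR, Io, and (VO − RTN).]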

Block Diagram

• Flexible Current Sensing

• DCR or Resistor

• Single Output or Multiphase

• Remote Temperature Sense

• Uses External NTC

• Programmable Alarm

• Alerts microcontroller of excessive average power or over temperature
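The monitoring functions listed above (current sensing through inductor DCR or a sense resistor, average power, and a programmable alarm) can be mimicked in software to show the underlying arithmetic. The sketch below is a generic illustration, not the IR3720's register interface or internal algorithm; every function name, sample value, and limit is an assumption.

```python
def average_power(v_out_samples, v_sense_samples, sense_ohm):
    """Average power from paired output-voltage and sense-voltage samples.

    Current is inferred as sense voltage divided by the sense resistance
    (inductor DCR or a discrete resistor).
    """
    if not v_out_samples or len(v_out_samples) != len(v_sense_samples):
        raise ValueError("need equal, non-empty sample lists")
    powers = [v * (vs / sense_ohm) for v, vs in zip(v_out_samples, v_sense_samples)]
    return sum(powers) / len(powers)

def alarm(avg_power_w, temperature_c, power_limit_w, temp_limit_c):
    """True if average power or temperature exceeds its programmed limit."""
    return avg_power_w > power_limit_w or temperature_c > temp_limit_c

# Hypothetical samples: 1.2 V output, 1 milliohm sense element.
avg_w = average_power([1.2, 1.2], [0.05, 0.07], sense_ohm=0.001)
print(f"Average power: {avg_w:.1f} W, alarm: {alarm(avg_w, 60.0, 70.0, 85.0)}")
```

In this example the two samples correspond to 50 A and 70 A, giving an average of about 72 W, which exceeds the assumed 70 W limit and raises the alarm.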

XPhase Generation 3

Gen 3 Enhancements

• 25% higher switching frequency with same power loss

• Reduced output capacitors

• Reduced external components

• Reduced system cost

• Improved accuracy

• 25% improvement in power density

• Easier to use