Avoiding the Pitfalls of Redundant Power Systems Design

IBM Systems and Technology Group
September 20, 2005
© 2005 IBM Corporation

Steve Ahladas, [email protected]



N+1 Power System Availability

[Figure: PS #1 and PS #2 feeding a common load]

Power supply MTBF: 200K hours. Outages per 10K servers over a 5-year life:

Nonredundant power system: 1812
Redundant power system with quarterly scheduled service: 19.5
Hot-swap redundant power system with 48 hr replacement: 0.4

If only it were true…
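The idealized figures above come from a simple reliability model. The Python sketch below makes several assumptions. Supply failures are exponentially distributed and independent. A 1+1 system suffers an outage only if the surviving supply fails during the exposure window before the failed one is replaced. The function names, and the 1095 h average exposure used for quarterly service (half a quarter), are assumptions for illustration. The slide does not state its model, so this sketch will not reproduce its figures exactly. It gives about 1967, 24 and 1.05 outages per 10K servers, which follows roughly the same progression.

```python
import math

HOURS_PER_YEAR = 8760.0


def nonredundant_outages(mtbf_h: float, life_h: float, fleet: int) -> float:
    """Expected number of servers in `fleet` that lose power at least once
    during `life_h` hours with a single supply (exponential failures)."""
    return fleet * (1.0 - math.exp(-life_h / mtbf_h))


def redundant_outages(mtbf_h: float, life_h: float, fleet: int,
                      exposure_h: float) -> float:
    """Approximate outages for a 1+1 supply pair.

    Each server sees first failures at rate 2/MTBF; after each one the
    surviving supply must last `exposure_h` hours until the replacement.
    Valid only while outages are rare (the result per server << 1)."""
    lam = 1.0 / mtbf_h
    first_failures = 2.0 * lam * life_h           # expected per server
    p_second = 1.0 - math.exp(-lam * exposure_h)  # survivor fails in window
    return fleet * first_failures * p_second


if __name__ == "__main__":
    mtbf, life, fleet = 200_000.0, 5 * HOURS_PER_YEAR, 10_000
    print(f"nonredundant: {nonredundant_outages(mtbf, life, fleet):.1f}")
    # assumed: with quarterly service a failed supply waits half a quarter on average
    print(f"quarterly:    {redundant_outages(mtbf, life, fleet, 1095):.1f}")
    print(f"hot swap 48h: {redundant_outages(mtbf, life, fleet, 48):.2f}")
```

The gap between the last two lines comes entirely from the exposure window. The redundant numbers also leave out every failure that is not a second supply failure, which is the point of the remark above.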


Single Points of Failure

System outage probability = probability of 2 failed power supplies in the system + probability of all single failures that can cause a system outage

The design of a high-availability power system needs to focus on reducing critical failures more than on increasing power supply reliability.
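The sketch below evaluates both terms of the outage formula for a 1+1 system, using the same exponential assumptions as a textbook reliability model. The single-point-of-failure term is lumped into one aggregate rate of 500 FIT (failures per 10^9 hours). Shorted ORing devices or a shared input protector are examples of what it covers. That rate is a placeholder, not a figure from the presentation.

With it, the single-point term is about 217 outages per 10K servers, against about 1 from double supply failures. Improving supply MTBF would barely move the total.

```python
import math


def outage_terms(mtbf_h: float, life_h: float, exposure_h: float,
                 spof_fit: float) -> tuple[float, float, float]:
    """Return (p_double, p_single, p_total) for one 1+1 system over life_h.

    p_double: both supplies failed (survivor dies during the exposure window).
    p_single: any single failure that takes the system down, lumped into one
              hypothetical rate given in FIT (failures per 1e9 hours).
    The sum is the rare-event approximation of the outage formula."""
    lam = 1.0 / mtbf_h
    p_double = 2.0 * lam * life_h * (1.0 - math.exp(-lam * exposure_h))
    p_single = 1.0 - math.exp(-spof_fit * 1e-9 * life_h)
    return p_double, p_single, p_double + p_single


if __name__ == "__main__":
    # 200K hr supplies, 5-year life, 48 h hot-swap replacement, and a
    # hypothetical 500 FIT of residual single points of failure
    d, s, total = outage_terms(200_000.0, 43_800.0, 48.0, 500.0)
    print(f"per 10K servers: double={d * 1e4:.2f} single={s * 1e4:.1f} "
          f"total={total * 1e4:.1f}")
```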


Where Single Points of Failure Can Occur

[Figure: dual power cords feed two power supplies; redundant DC buses feed redundant bus converters and redundant POL converters; a service system is attached to the group]

Even in this redundant structure, single points of failure can occur through:
Shorted outputs and shorted inputs on the redundant converters
Group converter parameters
The service system: faulty sensors and faulty service actions


Shorted Output Sections

[Figure: Regulator #1 and Regulator #2, each with its own regulation controls, paralleled onto a common load; in the second version each output feeds the bus through an ORing FET with its own controls]

A shorted output FET or capacitor will bring down the output bus.

Addition of an ORing device (FET or rectifier):
Rectifier: simple but lossy
FET: needs a control circuit; more efficient
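The trade-off between the two ORing choices is mostly conduction loss. The sketch below compares them for a hypothetical 50 A supply. It assumes a 0.45 V rectifier drop and a 1.5 mΩ FET; these are placeholder values, not from the slide. The result is about 22.5 W for the rectifier versus 3.75 W for the FET.

The FET's advantage costs the control circuit noted above. That circuit is sketched here as a simple ideal-diode rule (an assumption, not a specific controller) that turns the FET off when the bus sits above the supply output.

```python
def diode_oring_loss_w(current_a: float, vf_v: float) -> float:
    """Conduction loss of a rectifier used as the ORing element: P = I * Vf."""
    return current_a * vf_v


def fet_oring_loss_w(current_a: float, rds_on_ohm: float) -> float:
    """Conduction loss of a fully enhanced ORing FET: P = I^2 * Rds(on)."""
    return current_a ** 2 * rds_on_ohm


def oring_gate_on(v_supply: float, v_bus: float, v_rev_thresh: float = 0.0) -> bool:
    """Ideal-diode style control: keep the FET on only while current flows
    forward (supply side above the bus). Turning it off on reverse current is
    what keeps a shorted regulator output from pulling the shared bus down."""
    return (v_supply - v_bus) > v_rev_thresh


if __name__ == "__main__":
    # hypothetical 50 A supply: 0.45 V rectifier vs 1.5 mOhm FET
    print(f"rectifier: {diode_oring_loss_w(50, 0.45):.2f} W")
    print(f"FET:       {fet_oring_loss_w(50, 1.5e-3):.2f} W")
```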


Latent Failures

[Figure: regulator with regulation controls feeding the bus through an ORing FET with its own controls; the ORing FET is marked]

Undetected failures can compromise redundancy.
When this fails, we need to know it…
…and report it to the service system.

The addition of the “+1” supply is simple, but the implications are not.
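A shorted ORing FET is a typical latent failure. The system keeps running, but the isolation that protects the bus from a shorted regulator output is gone. One possible periodic self-test is to briefly turn the gate off while forward current flows. A healthy FET then conducts through its body diode, so its drop rises to a diode drop, while a shorted channel keeps the drop near zero.

The sketch below encodes that idea. The thresholds, the classification names, and the list standing in for the service system are all hypothetical.

```python
from enum import Enum


class OringHealth(Enum):
    OK = "ok"
    SHORTED = "shorted"    # latent: system runs, but isolation is gone
    DEGRADED = "degraded"  # on-state drop too high: gate drive or FET problem
    UNKNOWN = "unknown"    # no forward current, so the test is not meaningful


def check_oring_fet(current_a: float, vds_gate_on: float, vds_gate_off: float,
                    rds_on_max: float = 3e-3, vf_min: float = 0.3) -> OringHealth:
    """Classify an ORing FET from a hypothetical periodic self-test.

    vds_gate_on:  drop measured with the gate driven normally
    vds_gate_off: drop measured while the gate is briefly turned off; a healthy
                  FET conducts through its body diode, so this rises to >= vf_min
    """
    if current_a <= 0.0:
        return OringHealth.UNKNOWN
    if vds_gate_off < vf_min:
        return OringHealth.SHORTED
    if vds_gate_on > current_a * rds_on_max:
        return OringHealth.DEGRADED
    return OringHealth.OK


def report(name: str, health: OringHealth, service_log: list[str]) -> None:
    """Forward anything other than OK to the service system (here, a list)."""
    if health is not OringHealth.OK:
        service_log.append(f"{name}: ORing FET {health.value}")
```

Reporting matters as much as detecting. A redundant supply whose ORing FET is shorted should be scheduled for service before its partner fails.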


Separation of Regulation and Protection Circuits

[Figure: two versions of a UC3844-based regulator (pins COMP 1, VFB 2, ISNS 3, RT/CT 4, OUT 6, VCC 7, VREF 8; outputs to current sense and gate drive). In the first, one remote sense amplifier (+Vlocal, -Vlocal, leads to the remote sense point) feeds both the regulation loop and the overvoltage indicator. In the second, the overvoltage indicator has its own sense amplifier connected to Remote Sense Point #2.]

First version: regulation and protection share the remote sense amp, and the regulation reference is also shared.
Second version: a second remote sense amp, a second reference, and a second pair of sense leads.

The open-sense-lead latent failure is omitted from these drawings.
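The benefit of separating the two paths shows up in a toy model. The sketch below uses hypothetical voltages and thresholds. It assumes the regulation loop forces its sensed voltage to equal its reference.

If the regulation sense path fails low (modeled as gain 0), the output runs away to its limit. A protection circuit that shares that sense amp sees nothing wrong, while an independent one trips. The same happens when a shared reference drifts high, because an OV threshold derived from it drifts too.

```python
V_NOM = 1.2       # hypothetical nominal output and reference (V)
V_MAX = 3.3       # hypothetical voltage the loop saturates at (V)
OV_RATIO = 1.125  # hypothetical OV trip point as a multiple of its reference


def settled_vout(v_ref: float, sense_gain: float) -> float:
    """Output a closed loop settles to when its sense path reports
    sense_gain * Vout. A dead sense path (gain 0) sends the output to V_MAX."""
    if sense_gain <= 0.0:
        return V_MAX
    return min(v_ref / sense_gain, V_MAX)


def ov_trips(reg_gain: float, reg_ref: float, shared: bool) -> tuple[float, bool]:
    """Return (Vout, OV detected) for a regulator whose sense gain and
    reference may have failed.

    shared=True:  protection reuses the regulation sense amp and reference.
    shared=False: protection has its own healthy sense amp (gain 1.0) and
                  its own reference at V_NOM."""
    v_out = settled_vout(reg_ref, reg_gain)
    if shared:
        sensed, ov_ref = reg_gain * v_out, reg_ref
    else:
        sensed, ov_ref = v_out, V_NOM
    return v_out, sensed > OV_RATIO * ov_ref


if __name__ == "__main__":
    for label, gain, ref in [("sense fails low", 0.0, V_NOM),
                             ("reference drifts +20%", 1.0, 1.2 * V_NOM)]:
        for shared in (True, False):
            v, tripped = ov_trips(gain, ref, shared)
            kind = "shared" if shared else "independent"
            print(f"{label:22s} {kind:11s} Vout={v:.2f} V  OV detected={tripped}")
```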


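Regulator to Load Impedance (Penalty of Hot Plug)

[Figure: a total of N phases, each with phase inductance Lp, draw from Vbulk and feed the load through distribution inductance Ld; load capacitance CL holds the output Vo]

Large-signal approximation:

$$\Delta V_o \approx \frac{\Delta I^2 \, L}{2 \, C_L \, (V_{bulk} - V_o)}$$

where L is the total inductance from bulk to load:

$$L = \frac{L_p}{N} + L_d$$

Numerical example: I = 140 A; Lp = 200 nH; Ld = 150 nH; N = 15; Vbulk = 12 V; Vo = 2 V.
For ΔV = 20 mV, CL = 4000 µF.
If this were a VRM on the board (Ld ≈ 3 nH), CL = 400 µF.

The decision to add hot plug should not be taken lightly. It may have a significant impact on cost and packaging.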
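The sizing follows from the large-signal approximation ΔVo ≈ ΔI²·L / (2·CL·(Vbulk − Vo)), with L = Lp/N + Ld, solved for CL. The sketch below transcribes that formula directly and applies the slide's component values.

Taking ΔI as the full 140 A, it gives about 8000 µF, twice the slide's 4000 µF. The slide does not state the step size, and a step of about 100 A gives roughly 4000 µF. The tenfold ratio between the hot-plug case and an on-board VRM (Ld ≈ 3 nH) holds either way, because both cases scale with L alone.

```python
def total_inductance(lp_h: float, ld_h: float, n_phases: int) -> float:
    """Inductance from bulk to load: N phase inductors in parallel plus
    the distribution inductance, L = Lp/N + Ld."""
    return lp_h / n_phases + ld_h


def required_cl(delta_i_a: float, l_h: float, v_bulk: float, v_o: float,
                delta_v: float) -> float:
    """Load capacitance that holds the droop to delta_v for a load step
    delta_i_a, assuming inductor current slews at (v_bulk - v_o) / l_h:
        C_L = delta_i^2 * L / (2 * delta_v * (v_bulk - v_o))"""
    return delta_i_a ** 2 * l_h / (2.0 * delta_v * (v_bulk - v_o))


if __name__ == "__main__":
    hot_plug = total_inductance(200e-9, 150e-9, 15)  # Ld = 150 nH
    on_board = total_inductance(200e-9, 3e-9, 15)    # Ld ~ 3 nH (VRM on board)
    for step in (140.0, 100.0):
        c_hp = required_cl(step, hot_plug, 12.0, 2.0, 0.020)
        c_ob = required_cl(step, on_board, 12.0, 2.0, 0.020)
        print(f"step {step:.0f} A: hot plug {c_hp * 1e6:.0f} uF, "
              f"on board {c_ob * 1e6:.0f} uF")
```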


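Overvoltage Discrimination

[Figure: regulator control loop with voltage sense, error amp (against Vref), PWM comparator (against a ramp) and gate drive; a second comparator against Vref2 detects OV. Two OV-disable points are marked, Option 1 and Option 2, captioned "Simple solution" and "Better yet". Inset: PS #1 (the OV culprit, I > 0) and PS #2 (I = 0) on a shared bus.]

Bus OV does not indicate which regulator caused the OV.

During the OV, the healthy regulator's loop drives its duty cycle to 0, so it delivers no current (I = 0), while the culprit keeps delivering current (I > 0). Duty cycle must go to 0 to use this.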
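The discrimination rule can be written down directly. The sketch below assumes the service system can read each regulator's commanded duty cycle and output current during the OV event. The field names and the 0.5 A noise floor are hypothetical. The rule depends on the condition above: a healthy regulator's duty cycle must actually reach 0.

```python
from dataclasses import dataclass


@dataclass
class RegulatorStatus:
    name: str
    duty_cycle: float  # commanded duty cycle, 0.0 to 1.0
    current_a: float   # measured output current


def ov_culprits(regs: list[RegulatorStatus], i_noise_a: float = 0.5) -> list[str]:
    """On a bus overvoltage, a healthy regulator's loop drives its duty cycle
    to 0, so it delivers (almost) no current. Anything still switching, or
    still sourcing more than the hypothetical noise floor, is flagged so that
    only it is shut down instead of the whole bus."""
    return [r.name for r in regs
            if r.duty_cycle > 0.0 or r.current_a > i_noise_a]


if __name__ == "__main__":
    readings = [RegulatorStatus("PS #1", 0.45, 38.0),
                RegulatorStatus("PS #2", 0.0, 0.0)]
    print(ov_culprits(readings))
```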


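Featurable Loads

[Figure: PS #1 (5V @ 50A) and PS #2 (5V @ 50A) share a bus feeding ten PCI slots, one populated with a 5V @ 2A PCI card and nine empty; each supply carries I = 1 A]

To assure N+1 redundancy, accurate current sharing and sensing are required.
1% current sharing with 1% current measurement accuracy is marginal here. If both figures are relative to the 50A rating, each error is 0.5A, and together they match the 1A that each supply actually carries.
Periodic Vout adjustment up/down by the service system will help.
Most I/O (PCI cards, disks) and memory tend to be highly featurable (installed only in some configurations), so the actual load can be far below the supplies' rating.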
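The "marginal" verdict comes from a short error budget. The sketch below computes the ratio of each supply's expected current to its worst-case error band. It assumes, as above, that both percentages are relative to the supply rating. The function name is illustrative.

With the single 2 A card the ratio is 1.0, so a supply that carries nothing is indistinguishable from one that shares properly. A fully featured 60 A load gives a ratio of 30.

```python
def sharing_margin(load_a: float, n_supplies: int, rating_a: float,
                   share_err: float, meas_err: float) -> float:
    """Ratio of each supply's expected current to its worst-case error band.

    share_err and meas_err are fractions of the supply rating (full scale),
    which is an assumption. A ratio near or below 1 means a supply that is
    not sharing at all can look the same as one that is."""
    expected_a = load_a / n_supplies
    error_band_a = (share_err + meas_err) * rating_a
    return expected_a / error_band_a


if __name__ == "__main__":
    print(f"one 2 A card:   {sharing_margin(2.0, 2, 50.0, 0.01, 0.01):.1f}")
    print(f"60 A populated: {sharing_margin(60.0, 2, 50.0, 0.01, 0.01):.1f}")
```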


Faulty Sensor (Group Overcurrent)

[Figure: PS #1, PS #2 and PS #3 (each 1.1V @ 50A) share a 100A load; the service system sees reported currents of 33A, 33A and 60A]

Total output current must be observed to prevent or minimize smoke during load faults.
Here a defective current sensor is likely: a real group overcurrent would still show balanced regulator output currents.
This group does not need to be shut down, only PS #3.
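The reasoning above can be sketched as a consistency check on the reported currents. The sketch assumes the service system sees one current reading per supply. The 5 A tolerance and the 120 A group limit are hypothetical.

With the slide's readings, a naive sum (126 A) would exceed such a limit. The median test instead singles out PS #3 (index 2) and estimates the real load at 99 A, close to the actual 100 A.

```python
from statistics import median


def diagnose_group(reported_a: list[float], tol_a: float,
                   group_limit_a: float) -> tuple[str, list[int]]:
    """Distinguish a real group overcurrent from a faulty current sensor.

    A real load fault raises every regulator's current together, so the
    readings stay balanced. A reading far from the median points at one
    supply (or its sensor). tol_a and group_limit_a are hypothetical."""
    m = median(reported_a)
    suspects = [i for i, x in enumerate(reported_a) if abs(x - m) > tol_a]
    agreeing = [x for i, x in enumerate(reported_a) if i not in suspects]
    if agreeing:
        # estimate the true total from the supplies that agree with each other
        est_total = sum(agreeing) / len(agreeing) * len(reported_a)
    else:
        est_total = sum(reported_a)  # no consensus: be conservative
    if est_total > group_limit_a:
        return "shut down group", suspects
    return ("shut down suspects" if suspects else "no action"), suspects


if __name__ == "__main__":
    print(diagnose_group([33.0, 33.0, 60.0], tol_a=5.0, group_limit_a=120.0))
```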


Input Faults

[Figure: PS #1, PS #2 and PS #3 fed through a shared branch circuit protector (critical path)]

Selective coordination can be nontrivial.
Circuit protection of similar technology in the PS and in the branch protector will generally not coordinate.

This applies to distributed DC buses also:
[Figure: VRM #1 and VRM #2 fed from the buses of Bus Converter #1 and Bus Converter #2]
A shorted input on a VRM can bring down both buses.
Solid-state protection circuits are practical here.
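Selective coordination means the protector nearest the fault clears it before the shared upstream protector reacts. The sketch below checks this against simplified trip curves. The curve shapes, the 2× timing margin, and every rating and time in them are hypothetical.

Two thermal-magnetic breakers with the same style of instantaneous trip both clear a large fault in the same 10 ms, so they do not coordinate. This illustrates the point about similar-technology protection. A much faster solid-state protector downstream does coordinate.

```python
import math
from functools import partial
from typing import Callable, Optional

TripCurve = Callable[[float], float]  # fault current (A) -> clearing time (s)


def breaker_trip_time(i_a: float, rating_a: float, inst_multiple: float = 10.0,
                      k: float = 100.0, t_inst: float = 0.01) -> float:
    """Hypothetical thermal-magnetic breaker: inverse-square delay below the
    instantaneous pickup (inst_multiple * rating), fixed time above it."""
    if i_a <= rating_a:
        return math.inf
    if i_a >= inst_multiple * rating_a:
        return t_inst
    return max(k / (i_a / rating_a) ** 2, t_inst)


def solid_state_trip_time(i_a: float, limit_a: float, t_trip: float = 10e-6) -> float:
    """Hypothetical solid-state protector: opens within t_trip above its limit."""
    return math.inf if i_a <= limit_a else t_trip


def coordinates(down: TripCurve, up: TripCurve, fault_currents: list[float],
                margin: float = 2.0) -> tuple[bool, Optional[float]]:
    """Selective if, at every fault current, the downstream device clears at
    least `margin` times faster than the upstream one (an upstream device
    that never trips always passes). Returns the first failing current."""
    for i in fault_currents:
        if up(i) < margin * down(i):
            return False, i
    return True, None


if __name__ == "__main__":
    upstream = partial(breaker_trip_time, rating_a=20.0)
    ps_breaker = partial(breaker_trip_time, rating_a=15.0)
    ps_solid_state = partial(solid_state_trip_time, limit_a=20.0)
    faults = [300.0, 1000.0]
    print("breaker below breaker:    ", coordinates(ps_breaker, upstream, faults))
    print("solid state below breaker:", coordinates(ps_solid_state, upstream, faults))
```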


Design Practices That Can Help

FMEA analysis at system level
Testing with real hardware
Service modeling by service personnel
Design reviews of microcode

“Failure of Imagination”
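A system-level FMEA is often tabulated with a risk priority number to decide what to fix first. The number is severity × occurrence × detection, each scored from 1 to 10. The sketch below shows that bookkeeping for a few failure modes discussed in this presentation. The scores are invented placeholders, and the deck does not say which FMEA scoring scheme, if any, was used.

A high detection score (hard to detect) is what makes latent failures rise to the top of such a ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    item: str
    mode: str
    severity: int    # 1-10, effect on the system
    occurrence: int  # 1-10, likelihood
    detection: int   # 1-10, 10 = least likely to be caught before it matters

    @property
    def rpn(self) -> int:
        """Risk priority number: severity * occurrence * detection."""
        return self.severity * self.occurrence * self.detection


def rank(modes: list[FailureMode]) -> list[FailureMode]:
    """Highest-risk failure modes first."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


if __name__ == "__main__":
    # Failure modes taken from this deck; the scores are placeholders only.
    modes = [
        FailureMode("ORing FET", "shorted (latent)", 8, 3, 9),
        FailureMode("Current sensor", "reads high", 5, 4, 4),
        FailureMode("Shared remote sense amp", "fails low", 9, 2, 7),
    ]
    for m in rank(modes):
        print(f"{m.rpn:4d}  {m.item}: {m.mode}")
```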