
Mercury

Laser Driver Reliability Considerations

HAPL Integration Group

Earl Ault

June 20, 2005

UCRL-POST-213303

RJB/VG

Purpose of this work

• So far the HAPL project has characterized the required reliability of the laser driver by a single number, the shot life, in the range of 10^8 to 10^9 shots

• At 10 Hz this translates into a “lifetime” of ~2,800 to ~28,000 hours (10^8 shots / 10 Hz = 10^7 s ≈ 2,800 hr)

• This is a simplification; the real question is “How long does an individual unit (beam line) have to run, and how long do we have to repair a unit, in order to achieve an acceptable system availability?”

• In the work described below we deal with this question

• Once we know the relationship between unit failure rate and system availability, we can establish the reliability requirements for an IFE driver at the subsystem level

• From this we can determine how much it costs to run and repair the system at a given availability
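The shot-life arithmetic can be checked with a short conversion between shot count and operating hours. This is a minimal sketch; the function name is hypothetical and the repetition rate is passed in rather than assumed.

```python
SECONDS_PER_HOUR = 3600.0

def shots_to_hours(shots: float, rep_rate_hz: float) -> float:
    """Operating hours needed to accumulate `shots` at `rep_rate_hz` shots per second."""
    return shots / rep_rate_hz / SECONDS_PER_HOUR

# Shot-life range quoted for the driver, evaluated at 10 Hz.
for shots in (1e8, 1e9):
    hours = shots_to_hours(shots, rep_rate_hz=10.0)
    print(f"{shots:.0e} shots at 10 Hz -> {hours:,.0f} hr")
```

The same helper can be used to translate any unit lifetime in hours into a shot count by inverting the relation.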


Previous experience

• Our experience in solid state IFE storage lasers is based on single shot systems where the issue is making sure nearly all the beam lines fire at the right time and with the right power balance

• For a rep-rated system the additional requirement is availability, the fraction of the system that is online compared to its full up capacity

• In the Mercury laser the main “unreliability” is caused by optical damage of critical components

• Diodes, pin holes, transport optics, etc., are expected to fail gracefully, leading to degraded performance over time

• Critical optical elements can be single point, catastrophic failures

• In Mercury, comprehensive damage diagnostics allow us to intervene before catastrophic damage occurs

• Repairs can be effected to mitigate the cost of full replacement once damage initiation is discovered, but at a cost in dollars and availability


Reliability Considerations

• For the IFE driver we don’t yet know the characteristics of these failures because we do not have design and testing information

• What we can do is use simple tools to understand what is required of these designs and see the impact of system architecture decisions

• For this poster we will assume that testing has caught most infant failures and the design is robust enough to have a low random failure component

• We assume the operations management is sufficiently mature that failures due to operator errors or QC problems are rare

• This leaves some sort of wear out failure, e.g., critical optical component damage


Wear out or lifetime failures

[Figure: Failure Modes. Failure probability versus relative time (0 to 1.4) for infant failures, random failures, and wear out failures, which follow different failure distributions; the MTBF is marked.]

IRE Plant Driver Simulator

• A numerical simulator is used to clock through system operating hours, with failed units dropping out, being repaired, and coming back on-line

• NIF-like architecture: 192 beam lines

• Assumptions:

– All beam lines are identical

– Beam lines are grouped in quads (4) for delivery to the target

– Beam lines have a Mean Time Before Failure that can be characterized by a Weibull distribution

– Repair time equals clock tic time

• System availability is defined as units running divided by units installed

• Two cases are considered: service by quad (repair 1, idle 3), and repair of single beam lines while all others run normally


Weibull Distribution

• Well behaved function, found to be appropriate for complex systems characterized by a lifetime

• We define the characteristic time α as the MTBF

• β is a shape factor; β = 1 defines an exponential distribution

• We use β in the range of 6-10 to get a smeared out failure probability, modeling the statistical effect of beam lines having a distribution of lifetimes around some mean lifetime

Reliability function: $R = e^{-t^{\beta}/\alpha}$
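The effect of the shape factor (beta) on the reliability curve can be tabulated directly. The sketch below assumes time is expressed in units of the characteristic time, so that R falls to 1/e at t = 1 for every beta; the function name and sample times are illustrative choices.

```python
import math

def weibull_reliability(t: float, beta: float, mtbf: float = 1.0) -> float:
    """Weibull reliability with time scaled by the characteristic time `mtbf`."""
    return math.exp(-((t / mtbf) ** beta))

# Compare the exponential case (beta=1) with progressively steeper curves.
for beta in (1, 3, 7):
    values = [round(weibull_reliability(t, beta), 3) for t in (0.5, 1.0, 1.5)]
    print(f"beta={beta}: R at t = 0.5, 1.0, 1.5 -> {values}")
```

Larger beta keeps R close to 1 until shortly before the characteristic time and then drops it quickly, which is the "wear out" behavior the model is meant to capture.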


Weibull distribution continued

[Figure: Reliability $R = e^{-t^{\beta}/\alpha}$ versus relative time (0 to 2.5) for beta = 1, beta = 3, and beta = 7.]

Probability Matrix: $R = e^{-t^{\beta}/\alpha}$

• We need to find a way to distribute the possible failure times of an ensemble of beam lines all running at the same time

• Form a matrix in which each row corresponds to a reliability value and each column to an age of the device

• Where the reliability R at that age is greater than the row's value, assign a one

• Where it is less than the row's value, assign a zero

• Select elements by asking the question “At a given age, what is the reliability value?” Select a value at random from the total available row values in that column. If the element is 1, continue running. If zero, declare a failure and repair.
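The matrix construction and the random selection step can be sketched as follows. The 100 by 100 size and the shape-factor range come from the poster; the mapping of columns to age (here 0 to 2 characteristic times), the specific shape factor, and all function names are assumptions made for illustration.

```python
import math
import random

N_ROWS, N_COLS = 100, 100  # matrix size used in the poster's model
T_MAX = 2.0                # assumption: columns span 0..2 characteristic times
BETA = 7.0                 # assumption: one value from the quoted 6-10 range

def reliability(age, beta=BETA):
    """Weibull reliability with age in units of the characteristic time."""
    return math.exp(-(age ** beta))

def build_matrix(n_rows=N_ROWS, n_cols=N_COLS, t_max=T_MAX, beta=BETA):
    """Element is 1 where R(age of column) exceeds the row's reliability value."""
    matrix = []
    for i in range(n_rows):
        level = (i + 0.5) / n_rows            # reliability value of this row
        row = []
        for j in range(n_cols):
            age = (j + 0.5) / n_cols * t_max  # age represented by this column
            row.append(1 if reliability(age, beta) > level else 0)
        matrix.append(row)
    return matrix

def survives(matrix, age, rng, t_max=T_MAX):
    """Pick a random row in the column for `age`; 1 means keep running."""
    n_cols = len(matrix[0])
    col = min(int(age / t_max * n_cols), n_cols - 1)
    row = rng.randrange(len(matrix))
    return matrix[row][col] == 1

matrix = build_matrix()
rng = random.Random(0)
print(survives(matrix, age=0.5, rng=rng), survives(matrix, age=1.9, rng=rng))
```

With this layout the fraction of ones in a column approximates R at that column's age, so a random draw from the column returns "keep running" with probability close to R.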


Probability Matrix continued: $R = e^{-t^{\beta}/\alpha}$

1111111111000000000000000

1111111111100000000000000

1111111111110000000000000

1111111111110000000000000

1111111111111000000000000

1111111111111000000000000

1111111111111100000000000

1111111111111111000000000

1111111111111111111000000

Create a matrix of 10,000 elements for this model:

100 rows (related to probability)

100 columns (related to time)

Test of the Probability Matrix

[Figure: Weibull Distribution Test. Accumulated score versus column number.]

20,000 tries at randomly selecting matrix elements show that the 10,000 element probability matrix displays the desired shape of a Weibull distribution

The MTBF is defined as the time at which the probability equals 1/e
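A test of this kind can be reproduced with a few lines of Monte Carlo sampling. The sketch below assumes that the 20,000 tries are spread evenly as 200 draws per column and that the "accumulated score" is the number of ones drawn in each column; both are interpretations, as are the column-to-age mapping and the shape factor.

```python
import math
import random

def build_matrix(n=100, t_max=2.0, beta=7.0):
    """n x n matrix: 1 where R(column age) exceeds the row's reliability value."""
    return [
        [1 if math.exp(-(((j + 0.5) / n * t_max) ** beta)) > (i + 0.5) / n else 0
         for j in range(n)]
        for i in range(n)
    ]

def accumulated_scores(matrix, tries_per_column=200, seed=0):
    """For each column, count how many random row draws return a one."""
    rng = random.Random(seed)
    n_rows = len(matrix)
    scores = []
    for col in range(len(matrix[0])):
        hits = sum(matrix[rng.randrange(n_rows)][col] for _ in range(tries_per_column))
        scores.append(hits)
    return scores

scores = accumulated_scores(build_matrix())
# Print every tenth column to show the fall-off with age.
print([scores[c] for c in range(0, 100, 10)])
```

Plotted against column number, the scores start at the full number of tries, stay flat through early ages, and fall to zero around the characteristic time, tracing the Weibull reliability curve.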


Run model

• The model consists of a matrix of cells, each of which is a beam line

• The accumulated running hours on that line since the last repair are shown in the cell

• A running cell is green

• An idle cell is yellow

• A failed cell is red

• The program updates each cell by a tic equal to the MTTR (not required; could be any time step)

• Each cell is interrogated and the run time compared to the probability matrix to see if it should fail or not

• On the next tic all the units are back in service

• We keep track of the total failures, integrated failures, and availability

• Operating costs can then be estimated based on repair cost and number of units per day repaired

Beam line age (hr), shown per cell:

Quad

530 510 350 450

640 0 310 520

10 540 310 590

Port bundle of single lines

280 215

410 0
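The run loop described above can be sketched for the simplest case, where each failed beam line is repaired on its own within one tic. The MTBF, MTTR, and unit count are the poster's example values; the shape factor, the all-new starting ages, and the function names are assumptions, and drawing a uniform random number against R(age) stands in for picking a random row of the probability matrix.

```python
import math
import random

MTBF_HR = 1000.0  # characteristic time from the poster's example
MTTR_HR = 5.0     # repair time, also used as the clock tic
N_UNITS = 192
BETA = 7.0        # assumption: shape factor within the quoted 6-10 range

def survives(age_hr, rng):
    """Keep running with probability R(age), as a random matrix row would."""
    return rng.random() < math.exp(-((age_hr / MTBF_HR) ** BETA))

def run(hours, seed=0):
    rng = random.Random(seed)
    ages = [0.0] * N_UNITS  # assumption: every unit starts new at time zero
    total_failures = 0
    history = []            # availability recorded at each tic
    for _ in range(int(hours / MTTR_HR)):
        failed = 0
        for i in range(N_UNITS):
            ages[i] += MTTR_HR
            if not survives(ages[i], rng):
                failed += 1
                ages[i] = 0.0  # repaired within this tic, back in service next tic
        total_failures += failed
        history.append((N_UNITS - failed) / N_UNITS)
    return total_failures, history

total, history = run(hours=1400)
repairs_per_day = total / (1400 / 24.0)
print(total, round(min(history), 3), round(repairs_per_day, 2))
```

Because every cell is re-tested against R(age) on each tic, the lifetimes realized in this sketch depend on the tic length as well as on the Weibull parameters. The repairs-per-day figure is the quantity that a repair cost per unit would multiply to give an operating cost estimate.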


Screen shot: Case 1, repair by quads

35 30 625 235 50 755 140 625 530 5 140 670

500 640 595 65 140 525 210 720 615 85 305 55

15 555 325 160 100 625 510 195 15 35 280 140

185 585 415 5 730 690 660 170 115 20 570 110

135 620 35 150 520 85 75 145 245 15 50 140

495 120 55 575 160 30 70 675 115 105 205 275

700 620 5 90 695 80 605 5 260 80 185 155

45 540 155 275 710 50 265 10 250 70 135 65

140 20 40 585 30 705 95 195 95 85 135 195

45 70 160 215 75 125 220 345 545 700 85 145

680 55 305 0 175 170 140 675 210 395 35 285

635 805 115 220 105 20 70 215 235 75 305 265

315 120 130 15 610 360 115 220 135 35 155 185

365 700 35 605 680 25 610 220 350 100 25 140

15 55 20 175 0 245 185 70 345 310 310 170

Time Step: 389

Failed Units: 2

Idle Units: 6

Total Failures: 525

System Time: 1945

System Availability: 0.958

Total Units: 192

Random multiplier: 250

Quad install time: 10 hr

Shot count: 7.0E+06

MTTR: 5 hr

MTBF: 1000 hr

MTBF shots: 3.6E+06

System time 1945 hr - repair as quads

[Figure: System Availability. Availability (0.5 to 1) versus system time (0 to 2000 hr).]

[Figure: Current Failures and Integrated Failures versus system time (0 to 2000 hr), plotted on two vertical axes (0 to 600 and 0 to 12).]

Comments concerning the previous slides

• The peaks and valleys in availability, as well as in the number of failures, are an artifact of the system being activated 4 beams at a time. Over time they smear out as the individual beam ages become random.

• The failures begin to show up at a few hundred hours because the system is activated in groups every 10 hours. Therefore there is a range of ages when we start the clock.

• The dips in availability are significant and would likely require the plant to be out of service for the repair time (5 hr in this example)

• Even with an MTBF of 1000 hours, or 3.6x10^6 shots, and a fairly broad Weibull distribution of failure lifetimes, as in this example, system availability is over 90% most of the time

• If the distribution of the failures can be narrowed (a steeper Weibull centered at a given age), then it is possible to have a preventative maintenance program that could synchronize repairs with other plant activities so as to achieve higher availability when the driver is operating
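How much a steeper Weibull narrows the spread of lifetimes can be estimated from the standard Weibull moments. The sketch below assumes the common parameterization R = exp(-(t/alpha)^beta) with alpha as the characteristic time, which matches the poster's curves when time is expressed relative to the characteristic time; the function name is hypothetical.

```python
import math

def weibull_mean_and_std(beta, alpha=1.0):
    """Mean and standard deviation of a Weibull lifetime with scale `alpha`."""
    g1 = math.gamma(1.0 + 1.0 / beta)
    g2 = math.gamma(1.0 + 2.0 / beta)
    mean = alpha * g1
    std = alpha * math.sqrt(g2 - g1 * g1)
    return mean, std

# Relative spread (std / mean) shrinks as the shape factor grows.
for beta in (1, 3, 7, 10):
    mean, std = weibull_mean_and_std(beta)
    print(f"beta={beta}: mean={mean:.3f}, std={std:.3f}, std/mean={std / mean:.3f}")
```

A smaller relative spread means that most units reach end of life within a narrower window, which is what would let repairs be scheduled ahead of failure.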


Screen shot: Case 2, repair individually

295 240 120 770 745 315 600 375 355 340 240 150

365 90 80 365 230 210 230 80 325 255 190 225

45 95 130 340 115 775 240 385 70 430 360 100

240 470 330 140 130 175 305 230 220 10 65 335

305 40 695 220 130 470 220 220 205 40 25 235

170 235 230 325 775 280 245 235 220 130 250 775

775 250 270 80 360 200 120 130 205 220 240 400

155 75 170 205 340 175 60 165 160 295 155 585

145 220 125 345 165 200 165 105 325 255 175 180

430 185 195 40 45 450 110 305 115 225 45 310

95 740 185 140 595 180 285 310 0 55 85 160

135 200 125 170 280 320 265 315 230 165 90 110

90 145 225 775 335 185 160 110 265 235 695 320

150 0 135 95 280 180 290 100 700 230 80 280

130 305 145 290 435 200 65 20 450 110 200 0

140 295 100 260 40 130 220 45 255 255 315 310

Time Step: 280

Failed Units: 3

Total Failures: 370

System Time: 1400

System Availability: 0.984

Total Units: 192

Random multiplier: 250

Activation time: 2

48 port bundles in yellow

Shot count: 5.0E+06

MTBF hours: 1000

MTBF shots: 3.6E+06

MTTR hours: 5

Load random ages
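The availability values in the two screen shots follow directly from the definition of units running divided by units installed. As a quick check, with a hypothetical helper name:

```python
def availability(total_units: int, failed: int, idle: int = 0) -> float:
    """Fraction of installed units that are running."""
    return (total_units - failed - idle) / total_units

# Repair by quads: 2 failed units, 6 idled quad-mates.
quad = availability(192, failed=2, idle=6)
# Repair individually: 3 failed units, none idled.
single = availability(192, failed=3)
print(f"{quad:.3f} {single:.3f}")  # prints "0.958 0.984"
```

In the quad case each failed beam line takes three running beam lines out of service with it, which is why two failures cost more availability than three failures do when beam lines are repaired individually.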


System time 1400 hr - repair individually

[Figure: System Availability. Availability (0.5 to 1) versus system time (0 to 1600 hr).]

[Figure: Current Failures and Integrated Failures versus system time (0 to 1600 hr), plotted on two vertical axes (0 to 20 and 0 to 400).]

Model results with beams repaired individually

• Target symmetry will likely require fewer out of service beams than shown in the first case

• This can be addressed by designing the system to allow single beam line repair, by increasing the individual reliability, or both

• Here we have relaxed the requirement to idle 3 beams when repairing a single beam while leaving the MTBF and MTTR the same

• We see that the availability is significantly improved, never dipping below 95%

• Even with an MTBF of 1000 hr (3.6x10^6 shots), driver performance may be adequate in terms of beam balance

• Obviously, extended shot life and the possibility of a preventative maintenance program will reduce costs and down time
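The difference between the two repair policies can be explored with a small side-by-side simulation. This is a sketch under stated assumptions, not the poster's simulator: the shape factor, the random starting ages, the rule that idle units do not accumulate hours, and the function name are all illustrative choices, while the unit count, MTBF, and tic come from the example.

```python
import math
import random

def simulate(hours, repair_by_quad, n_units=192, mtbf=1000.0, tic=5.0,
             beta=7.0, seed=1):
    """Return the availability at each tic for one repair policy."""
    rng = random.Random(seed)
    # Assumption: random starting ages, to suppress the start-up transient.
    ages = [rng.uniform(0.0, 0.5 * mtbf) for _ in range(n_units)]
    down = set()  # units out of service (failed or idled) during this tic
    availability = []
    for _ in range(int(hours / tic)):
        next_down = set()
        for i in range(n_units):
            if i in down:
                continue  # not running this tic, so no hours accumulate
            ages[i] += tic
            if rng.random() >= math.exp(-((ages[i] / mtbf) ** beta)):
                ages[i] = 0.0  # failed; repaired during the next tic
                next_down.add(i)
                if repair_by_quad:
                    first = (i // 4) * 4  # idle the whole quad during repair
                    next_down.update(range(first, first + 4))
        availability.append((n_units - len(down)) / n_units)
        down = next_down
    return availability

by_quad = simulate(2000, repair_by_quad=True)
by_single = simulate(2000, repair_by_quad=False)
for name, series in (("quad", by_quad), ("single", by_single)):
    print(name, round(sum(series) / len(series), 3), round(min(series), 3))
```

Each failure removes four units for a tic under the quad policy and one unit under the single-line policy, so the quad case shows both a lower mean availability and deeper dips for the same MTBF and MTTR.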


Summary

• The minimum availability the plant can tolerate will depend on the beam bundling architecture, power balance on target, and beam symmetry

• Integration choices and selection of the line replaceable unit (LRU) design are important issues that ultimately drive system performance

• This tool allows us to study these questions and can be extended to reliability assessment at the beam line or lower level

• In addition, the simulator gives us a way to partition the system into appropriate LRUs for optimum repair and operation

• With these two simple examples we see the impact of the decision to repair at the quad level, as in a NIF architecture (4 beams, 3 idle, 1 repaired), versus the ability to repair at the single beam level
