Mercury
Laser Driver Reliability Considerations
HAPL Integration Group
Earl Ault
June 20, 2005
UCRL-POST-213303
RJB/VG
Purpose of this work
• So far the HAPL project has characterized the required reliability of the laser driver by a single number, the shot life, in the range of 10^8 to 10^9 shots
• At 10 Hz this translates into a “lifetime” of ~2,800 to ~28,000 hours
• This is a simplification; the real question is “How long does an individual unit (beam line) have to run, and how long do we have to repair a unit, in order to achieve an acceptable system availability?”
• In the work described below we deal with this question
• Once we know the relationship between unit failure rate and system availability, we can establish the reliability requirements for an IFE driver at the subsystem level
• From this we can determine how much it costs to run and repair the system at a given availability
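The conversion between shot life and operating hours is plain arithmetic: hours = shots / (repetition rate × 3600 s/hr). A minimal sketch follows; the function names are illustrative and are not part of the original work.

```python
SECONDS_PER_HOUR = 3600


def shots_to_hours(shots: float, rep_rate_hz: float) -> float:
    """Operating hours needed to accumulate `shots` at `rep_rate_hz`."""
    return shots / (rep_rate_hz * SECONDS_PER_HOUR)


def hours_to_shots(hours: float, rep_rate_hz: float) -> float:
    """Shots accumulated in `hours` of continuous operation."""
    return hours * rep_rate_hz * SECONDS_PER_HOUR


# Shot-life range quoted for the driver, evaluated at 10 Hz.
for shots in (1e8, 1e9):
    hours = shots_to_hours(shots, rep_rate_hz=10.0)
    print(f"{shots:.0e} shots at 10 Hz -> {hours:,.0f} hr")
```

The same helpers can be used to express a unit MTBF either in hours or in shots once a repetition rate is chosen.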
Previous experience
• Our experience in solid state IFE storage lasers is based on single shot systems where the issue is making sure nearly all the beam lines fire at the right time and with the right power balance
• For a rep-rated system the additional requirement is availability: the fraction of the system that is online relative to its full capacity
• In the Mercury laser the main “unreliability” is caused by optical damage of critical components
• Diodes, pinholes, transport optics, etc., are expected to fail gracefully, leading to degraded performance over time
• Critical optical elements can be single-point, catastrophic failures
• In Mercury, comprehensive damage diagnostics allow us to intervene before catastrophic damage occurs
• Repairs can be effected to mitigate the cost of full replacement once damage initiation is discovered, but at a cost in dollars and availability
Reliability Considerations
• For the IFE driver we don’t yet know the characteristics of these failures because we do not have design and testing information
• What we can do is use simple tools to understand what is required of these designs and see the impact of system architecture decisions
• For this poster we will assume that testing has caught most infant failures and the design is robust enough to have a low random failure component
• We assume the operations management is sufficiently mature that failures due to operator errors or QC problems are rare
• This leaves some sort of wear-out failure, e.g., critical optical component damage
Wear out or lifetime failures
[Figure: Failure Modes. Failure probability versus relative time for different failure distributions: infant failure, random failures, and wear-out failure, with the MTBF marked.]
IRE Plant Driver Simulator
• A numerical simulator is used to clock through system operating hours, with failed units dropping out, being repaired, and coming back online
• NIF-like architecture: 192 beam lines
• Assumptions:
– All beam lines are identical
– Beam lines are grouped in quads (4) for delivery to the target
– Beam lines have a Mean Time Before Failure (MTBF) that can be characterized by a Weibull distribution
– Repair time equals the clock tick time
• System availability is defined as units running divided by units installed
• Two cases are considered: service by quad (repair 1, idle 3), and repair of single beam lines while all others run normally
Weibull Distribution
• Well-behaved function, found to be appropriate for complex systems characterized by a lifetime
• We define the characteristic time α as the MTBF
• β is a shape factor; β = 1 defines an exponential distribution
• We use β in the range of 6-10 to get a smeared-out failure probability, modeling the statistical effect of beam lines having a distribution of lifetimes around some mean lifetime
Reliability function: R(t) = exp[-(t/α)^β]
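As a minimal sketch of the reliability function, assuming the standard two-parameter Weibull form R(t) = exp[-(t/α)^β] with α the characteristic time (taken as the MTBF) and β the shape factor, the curve can be evaluated as follows. The function name and the 1000 hr value are illustrative choices.

```python
import math


def weibull_reliability(t: float, alpha: float, beta: float) -> float:
    """Probability that a unit is still running at age t.

    alpha: characteristic time (taken here as the MTBF), same units as t
    beta:  shape factor (1 = exponential; larger = sharper wear-out)
    """
    return math.exp(-((t / alpha) ** beta))


mtbf_hr = 1000.0  # illustrative value
for beta in (1, 3, 7):
    values = [weibull_reliability(f * mtbf_hr, mtbf_hr, beta) for f in (0.5, 1.0, 1.5)]
    formatted = ", ".join(f"{v:.3f}" for v in values)
    print(f"beta={beta}: R at 0.5, 1.0, 1.5 x MTBF = {formatted}")
```

Whatever the shape factor, R equals 1/e at t = α, which is why the characteristic time is a convenient definition of the MTBF in this model.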
Weibull distribution continued
[Figure: reliability R plotted for β = 1, β = 3, and β = 7]
Probability Matrix R = exp[-(t/α)^β]
• We need to find a way to distribute the possible failure times of an ensemble of beam lines all running at the same time
• Form a matrix with rows corresponding to a reliability value and columns related to the age of the device
• For reliability values greater than R assign a one
• For values less than R assign a zero
• Select elements by asking the question “At a given age, what is the reliability value?” Select a value at random from the total available row values. If the element is 1, continue running. If zero, declare a failure and repair.
Probability Matrix continued R = exp[-(t/α)^β]
1111111111000000000000000
1111111111100000000000000
1111111111110000000000000
1111111111110000000000000
1111111111111000000000000
1111111111111000000000000
1111111111111100000000000
1111111111111111000000000
1111111111111111111000000
For this model, create a matrix of 10,000 elements:
100 rows (related to probability)
100 columns (related to time)
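One way such a 100 x 100 matrix could be built is sketched below. This is an illustration under stated assumptions, not the original program: the Weibull form R = exp[-(t/α)^β] is assumed, the characteristic time is placed at column 50, β is set to 7, and the orientation is chosen so that the fraction of ones in a column equals R at that age (ones at young ages, zeros at old ages, as in the pattern shown above).

```python
import math

N_ROWS, N_COLS = 100, 100


def build_probability_matrix(alpha_cols: float, beta: float) -> list[list[int]]:
    """Return an N_ROWS x N_COLS matrix of 0/1 flags.

    Column j stands for age j (in column units); alpha_cols is the
    characteristic time expressed in the same column units.
    Row i stands for the reliability level (i + 0.5) / N_ROWS.
    A cell is 1 when the Weibull reliability at that age is above the
    row's level, so the fraction of ones in a column approximates R(age).
    """
    matrix = []
    for i in range(N_ROWS):
        level = (i + 0.5) / N_ROWS
        row = []
        for j in range(N_COLS):
            r = math.exp(-((j / alpha_cols) ** beta))
            row.append(1 if r > level else 0)
        matrix.append(row)
    return matrix


# Assumed placement: characteristic time at column 50, shape factor 7.
matrix = build_probability_matrix(alpha_cols=50.0, beta=7.0)
for j in (0, 25, 50, 75):
    ones = sum(matrix[i][j] for i in range(N_ROWS))
    print(f"column {j}: fraction of ones = {ones / N_ROWS:.2f}")
```

Picking a random row within the column for a given age then returns 1 with probability close to R at that age, which is the selection rule described on the previous slide.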
Test of the Probability Matrix
[Figure: Weibull Distribution Test. Accumulated score versus column number.]
20,000 tries at randomly selecting matrix elements show that the 10,000-element probability matrix displays the desired shape of a Weibull distribution
Define α as the MTBF (probability = 1/e)
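The sampling test can be sketched as follows: pick matrix elements at random and accumulate the ones per column, so the score per column should trace out the Weibull reliability curve. The matrix construction, the placement of the MTBF at column 50, β = 7, and the random seed are all assumptions made for this illustration.

```python
import math
import random

N_ROWS, N_COLS = 100, 100
ALPHA_COLS, BETA = 50.0, 7.0  # assumed: MTBF placed at column 50, shape 7


def reliability(col: int) -> float:
    """Weibull reliability at the age represented by a column."""
    return math.exp(-((col / ALPHA_COLS) ** BETA))


# Cell (i, j) is 1 when R(age j) exceeds the row level (i + 0.5) / N_ROWS.
matrix = [[1 if reliability(j) > (i + 0.5) / N_ROWS else 0
           for j in range(N_COLS)] for i in range(N_ROWS)]


def accumulate_scores(n_tries: int, rng: random.Random) -> tuple[list[int], list[int]]:
    """Pick random cells; return (score per column, draws per column)."""
    score = [0] * N_COLS
    draws = [0] * N_COLS
    for _ in range(n_tries):
        i = rng.randrange(N_ROWS)
        j = rng.randrange(N_COLS)
        score[j] += matrix[i][j]
        draws[j] += 1
    return score, draws


score, draws = accumulate_scores(20_000, random.Random(1))
for j in (0, 40, 50, 60, 90):
    print(f"column {j}: score {score[j]} of {draws[j]} draws, R = {reliability(j):.3f}")
```

With 20,000 tries spread over 100 columns, each column receives about 200 draws, so the accumulated score starts near 200 at young ages and falls toward zero past the characteristic time.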
Run model
• The model consists of a matrix of cells, each of which is a beam line
• The accumulated running hours on that line since the last repair are shown in the cell
• A running cell is green
• An idle cell is yellow
• A failed cell is red
• The program updates each cell by a tick equal to the MTTR (not required; could be any time step)
• Each cell is interrogated and the run time compared to the probability matrix to see if it should fail or not
• On the next tick all the units are back in service
• We keep track of the total failures, integrated failures, and availability
• Operating costs can then be estimated based on repair cost and number of units repaired per day
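The update loop described above can be sketched in a few lines. This is a simplified illustration, not the original simulator: the survival test draws a random number against R(age) directly (equivalent to picking a random row in the matrix column for that age), β = 7 is an assumed shape factor, and the units start with random ages, which is also an assumption.

```python
import math
import random

N_UNITS = 192     # beam lines, grouped in quads of 4
TICK_HR = 5.0     # clock tick, set equal to the MTTR
MTBF_HR = 1000.0  # Weibull characteristic time
BETA = 7.0        # assumed shape factor (the text quotes a range of 6-10)


def survives(age_hr: float, rng: random.Random) -> bool:
    """Equivalent to picking a random row in the matrix column for this age."""
    return rng.random() < math.exp(-((age_hr / MTBF_HR) ** BETA))


def run(n_ticks: int, repair_by_quad: bool, seed: int = 0):
    """Return (availability per tick, total failures)."""
    rng = random.Random(seed)
    # Assumption: start with random ages so units do not all wear out together.
    ages = [rng.uniform(0.0, 0.5 * MTBF_HR) for _ in range(N_UNITS)]
    availability, total_failures = [], 0
    for _ in range(n_ticks):
        failed = {u for u in range(N_UNITS) if not survives(ages[u], rng)}
        down = set(failed)
        if repair_by_quad:
            # Servicing one line idles the other three lines in its quad.
            for u in failed:
                quad_start = 4 * (u // 4)
                down.update(range(quad_start, quad_start + 4))
        total_failures += len(failed)
        availability.append((N_UNITS - len(down)) / N_UNITS)
        for u in range(N_UNITS):
            if u in failed:
                ages[u] = 0.0        # repaired; back in service next tick
            elif u not in down:
                ages[u] += TICK_HR   # running units accumulate hours
            # idle units keep their age
    return availability, total_failures


for by_quad in (True, False):
    avail, fails = run(n_ticks=400, repair_by_quad=by_quad)
    label = "quad" if by_quad else "single"
    print(f"{label}: mean availability {sum(avail) / len(avail):.3f}, failures {fails}")
```

The `repair_by_quad` flag switches between the two service cases, so the same loop can be used to compare availability under each repair policy.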
530 510 350 450
640 0 310 520
10 540 310 590
Quad
Beam line age (hr)
280 215
410 0
Port bundle of single lines
Screen shot: Case 1, repair by quads
35 30 625 235 50 755 140 625 530 5 140 670
500 640 595 65 140 525 210 720 615 85 305 55
15 555 325 160 100 625 510 195 15 35 280 140
185 585 415 5 730 690 660 170 115 20 570 110
135 620 35 150 520 85 75 145 245 15 50 140
495 120 55 575 160 30 70 675 115 105 205 275
700 620 5 90 695 80 605 5 260 80 185 155
45 540 155 275 710 50 265 10 250 70 135 65
140 20 40 585 30 705 95 195 95 85 135 195
45 70 160 215 75 125 220 345 545 700 85 145
680 55 305 0 175 170 140 675 210 395 35 285
635 805 115 220 105 20 70 215 235 75 305 265
315 120 130 15 610 360 115 220 135 35 155 185
365 700 35 605 680 25 610 220 350 100 25 140
15 55 20 175 0 245 185 70 345 310 310 170
Time Step: 389
Failed Units: 2
Idle Units: 6
Total Failures: 525
System Time: 1945 hr
System Availability: 0.958
Total Units: 192
Random multiplier: 250
Quad install time: 10 hr
Shot count: 7.0E+06
MTTR: 5 hr
MTBF: 1000 hr
MTBF shots: 3.6E+06
System time 1945 hr - repair as quads
[Figure: System Availability. Availability (axis from 0.5 to 1) versus system time in hours (0 to 2000).]
[Figure: Current Failures and Integrated Failures versus system time in hours (0 to 2000).]
Comments concerning the previous slides
• The peaks and valleys in availability, as well as in the number of failures, are an artifact of the system being activated 4 beams at a time. Over time they smear out as the individual beam ages become random.
• The failures begin to show up at a few hundred hours because the system is activated in groups every 10 hours; there is therefore a range of ages when we start the clock
• The dips in availability are significant and would likely require the plant to be out of service for the repair time (5 hr in this example)
• Even with an MTBF of 1000 hours (3.6×10^6 shots) and a fairly broad Weibull distribution of failure lifetimes, as in this example, system availability is over 90% most of the time
• If the distribution of the failures can be narrowed (a steeper Weibull centered at a given age), then it is possible to have a preventative maintenance program that synchronizes repairs with other plant activities so as to achieve higher availability when the driver is operating
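To illustrate how a steeper Weibull narrows the spread of failure ages, the sketch below inverts the standard Weibull form R(t) = exp[-(t/α)^β] to find the ages at which 10% and 90% of units would have failed. The function name and the β = 20 case are hypothetical choices for illustration, and this treats R as a lifetime survival curve rather than reproducing the simulator's tick-by-tick draw.

```python
import math


def age_at_failure_fraction(p: float, alpha: float, beta: float) -> float:
    """Age t at which a fraction p of units has failed, i.e. 1 - R(t) = p."""
    return alpha * (-math.log(1.0 - p)) ** (1.0 / beta)


alpha_hr = 1000.0  # characteristic time (MTBF) from the example
for beta in (3, 7, 20):  # 20 is a hypothetical "steeper" case
    t10 = age_at_failure_fraction(0.10, alpha_hr, beta)
    t90 = age_at_failure_fraction(0.90, alpha_hr, beta)
    print(f"beta={beta}: 10% failed by {t10:.0f} hr, 90% by {t90:.0f} hr, "
          f"window {t90 - t10:.0f} hr")
```

A narrower window between the two ages means replacements could be scheduled shortly before the window opens without discarding much useful life.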
Screen shot: Case 2, repair individually
295 240 120 770 745 315 600 375 355 340 240 150
365 90 80 365 230 210 230 80 325 255 190 225
45 95 130 340 115 775 240 385 70 430 360 100
240 470 330 140 130 175 305 230 220 10 65 335
305 40 695 220 130 470 220 220 205 40 25 235
170 235 230 325 775 280 245 235 220 130 250 775
775 250 270 80 360 200 120 130 205 220 240 400
155 75 170 205 340 175 60 165 160 295 155 585
145 220 125 345 165 200 165 105 325 255 175 180
430 185 195 40 45 450 110 305 115 225 45 310
95 740 185 140 595 180 285 310 0 55 85 160
135 200 125 170 280 320 265 315 230 165 90 110
90 145 225 775 335 185 160 110 265 235 695 320
150 0 135 95 280 180 290 100 700 230 80 280
130 305 145 290 435 200 65 20 450 110 200 0
140 295 100 260 40 130 220 45 255 255 315 310
Time Step: 280
Failed Units: 3
Total Failures: 370
System Time: 1400 hr
System Availability: 0.984
Total Units: 192
Random multiplier: 250
Activation time: 2
48 port bundles in yellow
Shot count: 5.0E+06
MTBF hours: 1000
MTBF shots: 3.6E+06
MTTR hours: 5
Load random ages
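The availability values in the two screen shots follow directly from the definition of availability as units running divided by units installed. A short check, using the failed and idle counts shown (2 failed and 6 idle units in the quad case, 3 failed units in the individual case); the function name is illustrative:

```python
def availability(total_units: int, failed: int, idle: int = 0) -> float:
    """Units running divided by units installed."""
    return (total_units - failed - idle) / total_units


print(f"repair by quads:     {availability(192, failed=2, idle=6):.3f}")  # 0.958
print(f"repair individually: {availability(192, failed=3):.3f}")          # 0.984
```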
System time 1400 hr - repair individually
[Figure: System Availability. Availability (axis from 0.5 to 1) versus system time in hours (0 to 1600).]
[Figure: Current Failures and Integrated Failures versus system time in hours (0 to 1600).]
Model results with beams repaired individually
• Target symmetry will likely require fewer out-of-service beams than shown in the first case
• This can be addressed by designing the system to allow single beam line repair, by increasing the individual reliability, or both
• Here we have relaxed the requirement to idle 3 beams when repairing a single beam while leaving the MTBF and MTTR the same
• We see that the availability is significantly improved, never dipping below 95%
• Even with an MTBF of 1000 hr (3.6×10^6 shots), driver performance may be adequate in terms of beam balance
• Extended shot life and the possibility of a preventative maintenance program will reduce costs and downtime
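As noted for the run model, operating cost can be estimated from the repair cost and the number of units repaired per day. The sketch below derives the repair rate from the total failures and system time reported in the two screen shots; the cost per repair is a placeholder in arbitrary units, not a figure from this work, and the function names are illustrative.

```python
def repairs_per_day(total_failures: int, system_time_hr: float) -> float:
    """Average number of unit repairs per 24 hours of system time."""
    return total_failures / (system_time_hr / 24.0)


def daily_repair_cost(total_failures: int, system_time_hr: float,
                      cost_per_repair: float) -> float:
    """Average repair cost per day, in the units of cost_per_repair."""
    return repairs_per_day(total_failures, system_time_hr) * cost_per_repair


COST_PER_REPAIR = 1.0  # placeholder, arbitrary units

for label, failures, hours in (("quads", 525, 1945.0), ("individual", 370, 1400.0)):
    rate = repairs_per_day(failures, hours)
    cost = daily_repair_cost(failures, hours, COST_PER_REPAIR)
    print(f"{label}: {rate:.2f} repairs/day, cost {cost:.2f} per day (arbitrary units)")
```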
Summary
• The minimum availability the plant can tolerate will depend on the beam bundling architecture, power balance on target, and beam symmetry
• Integration choices and the selection of line-replaceable unit (LRU) design are important issues that ultimately drive system performance
• This tool allows us to study these questions and can be extended to the reliability assessment at the beam line or lower level
• In addition, the simulator gives us a way to partition the system into appropriate LRUs for optimum repair and operation
• With these two simple examples we see the impact of the decision to repair at the quad level, as in a NIF architecture (4 beams: 3 idle, 1 repaired), versus the ability to repair at the single beam level