
IEEE TRANSACTIONS ON RELIABILITY, VOL. 48, NO. 4, 1999 DECEMBER 377

A Failure-Time Model for Infant-Mortality and Wearout Failure Modes

Victor Chan Iowa State University, Ames

William Q. Meeker, Senior Member IEEE Iowa State University, Ames

Key Words - Bathtub hazard, Censored data, Limited failure population, Maximum likelihood.

Summary & Conclusions -- Some populations of electronic devices or other system components are subject to both infant-mortality & wearout failure modes. Typically, interest is in the estimation of reliability metrics such as distribution quantiles or fraction failing at a point in time for the population of units. This involves
• modeling the failure time,
• estimating the parameters of the failure-time distributions for the different failure modes, as well as the proportion of defective units.

This paper:
• Proposes GLFP (general limited failure population) for this purpose.
• Uses the ML (maximum likelihood) method to estimate the unknown model parameters; the formulas for the likelihood contribution corresponding to different types of censoring are provided.
• Describes a likelihood-based method to construct statistical-confidence intervals and simultaneous statistical-confidence bands for quantities of interest.
• Fits the model to a set of censored data to illustrate the estimation technique and some of the model's characteristics.

The model-fitting indicates that identification of the failure mode of at least a few failed units is necessary to estimate model-parameters.

Based on the fitting of the data from the lifetime of circuit boards, the GLFP model provides a useful description of the failure-time distribution for components that have both wearout and some infant-mortality behavior. However, the data must include the cause of failure for at least a few observations in order to avoid complications in the ML estimation. The more failed units whose failure mode has been identified, the better the model estimates are in terms of model-fitting.

1. INTRODUCTION

Acronyms & Abbreviations¹

GLFP    general LFP
LFP     limited-failure-population (model)
ML      maximum likelihood
MLE     ML estimate/estimator
Mode-1  failure mode due to infant mortality
Mode-2  failure mode due to wearout
W       Weibull (distribution)
LN      LogNormal (distribution)
CB      circuit board

¹The singular & plural of an acronym are always spelled the same.

Notation

$T_1$                  failure time due to infant mortality
$T_2$                  failure time due to wearout
$T$                    failure time for a GLFP
$p$                    proportion of population subject to infant mortality
$\theta_k$             vector k of unknown model parameters, k = 1, 2
$F_k(t;\theta_k)$      Cdf of $T_k$ with $\theta_k$ at time t, k = 1, 2
$F_T(t;\theta)$        Cdf of $T$ and model parameter $\theta$ at time t
$\bar{F}(\cdot)$       $1 - F(\cdot)$
$f_T(t;\theta)$        pdf corresponding to $F_T(t;\theta)$
$L(\theta)$            likelihood with parameter $\theta$
$L_i(\theta)$          likelihood contribution of observation i
CR                     s-confidence region

1.1 Problem

Some product populations contain a mixture of defective and nondefective units. The units with manufacturing defects will usually lead to an infant-mortality failure early in their lifetimes, whereas the nondefective units will eventually fail from wearout. Typically, the proportion of defective units is small. Especially in applications where some useful life is obtained from the defective units, it is useful to use a common model for the two types of failures. Examples of products & components that exhibit such behavior in a population include lasers, high-power transmitting tubes, and computer disk-drives.

This paper describes a (statistical) GLFP for such populations. The model combines 2 failure-time distributions,
• one for Mode-1,
• one for Mode-2.

Because the defect-related Mode-1 occurs in only a fraction of the population, this model also takes p into account.

When data become available, the cause of failure is not always known for all units. There are situations where the cause of failure has been determined for only some selected units, possibly by using failure mode analysis. This GLFP accounts for this in estimating the model parameters.

0018-9529/99/$10.00 ©1999 IEEE


1.2 Related Work

LFP [9] does not allow for wearout failures. The GLFP extends the LFP to allow for units with wearout failure mechanisms.

Ref [6] studied a related model with 2 s-independent causes of failures, both of which are assumed to be active for all the units in a population. It used the exponential distribution to model early failures and a Weibull distribution to model wearout failures. It investigated the properties of the MLE for this model under the assumption that the exact failure times were known and all units were susceptible to both failure modes, but that the cause of failure was not available for any of the failures.

The classical competing risk model (eg, [12: chapter 5]) includes two or more s-independent causes of failures, all of which are assumed to be active for all the units in a population.

The traditional analyses presume that the data include the cause of failure for all failed units. But often, life data contain observations whose exact cause of failure is not known. Such data are masked. Ref [14] describes a method based on ML for estimating the parameters of component life distributions for uncensored masked data. Ref [7] provides more discussion on the method, including the treatment of time-censored masked data. Ref [5] improves the estimation suggested by the earlier papers.

1.3 Overview

This paper describes MLE for the parameters in the GLFP. As an illustration, the results of fitting the GLFP to a data set from electronic CB are given.
• Section 2 describes the mathematical formulation of the GLFP. The Weibull & lognormal distributions are commonly used in models for lifetime data, and are also used here in an example of GLFP fitting.
• Section 3 briefly discusses the two distributions.
• Section 4 discusses in detail the likelihood function and contributions used in the MLE.
• Sections 5 & 6 describe likelihood-based methods for constructing s-confidence intervals for the estimated parameters of the model.
• Section 7 presents the results of fitting the GLFP to some CB data.

2. GLFP

GLFP can be viewed as a special case of the s-independent competing risks (or series-system failure) model with 2 failure causes. Information on competing failure risk models is available in many reliability texts, eg, [12: chapter 5].

Assumptions

1. Every unit in a population is subject to a universal cause of failure, such as failure due to wearout.

2. There is another cause of failure that affects only p of the population. When present, this failure mode usually causes a unit to fail prematurely (and, hence, the population-fraction at risk to this failure is considered defective).

3. The two failure modes are s-independent.

$$\mathrm{pdf}\{T_k\} = f_k(t;\theta_k), \quad k = 1, 2.$$

If we were to ignore the second failure mode at this point, then $p \cdot F_1(t;\theta_1)$ is the Cdf of the failure time in the LFP used in [9].

The true failure time of a randomly selected unit is: $T = \min(T_1, T_2)$. By the s-independence between the two failure modes, T has Cdf & pdf:

$$F_T(t;\theta) = \Pr\{T \le t\} = 1 - [1 - p \cdot F_1(t;\theta_1)] \cdot \bar{F}_2(t;\theta_2), \quad \theta = (\theta_1, \theta_2, p)$$

$$\mathrm{pdf}\{T\} = f_T(t;\theta) = \frac{dF_T(t;\theta)}{dt} = p \cdot f_1(t;\theta_1) \cdot \bar{F}_2(t;\theta_2) + f_2(t;\theta_2) \cdot [1 - p \cdot F_1(t;\theta_1)]$$

For simplicity, log-location-scale distributions such as Weibull & lognormal are used for $F_k(t;\theta_k)$, k = 1, 2. Other distributions could be substituted without difficulty.
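As a minimal sketch of the GLFP Cdf and pdf, the Python below evaluates $F_T(t) = 1 - [1 - p F_1(t)][1 - F_2(t)]$ and $f_T(t) = p f_1(t)[1 - F_2(t)] + f_2(t)[1 - p F_1(t)]$ with Weibull components, then checks the pdf against a numerical derivative of the Cdf. The function names and all parameter values are hypothetical, chosen only for illustration; they are not estimates from this paper.

```python
import math


def weibull_cdf(t, alpha, beta):
    """Weibull Cdf with scale alpha and shape beta."""
    return 1.0 - math.exp(-((t / alpha) ** beta))


def weibull_pdf(t, alpha, beta):
    """Weibull pdf with scale alpha and shape beta (t > 0)."""
    z = (t / alpha) ** beta
    return (beta / t) * z * math.exp(-z)


def glfp_cdf(t, p, mode1, mode2):
    """F_T(t) = 1 - [1 - p*F1(t)] * [1 - F2(t)]."""
    F1 = weibull_cdf(t, *mode1)
    F2 = weibull_cdf(t, *mode2)
    return 1.0 - (1.0 - p * F1) * (1.0 - F2)


def glfp_pdf(t, p, mode1, mode2):
    """f_T(t) = p*f1(t)*[1 - F2(t)] + f2(t)*[1 - p*F1(t)]."""
    F1 = weibull_cdf(t, *mode1)
    F2 = weibull_cdf(t, *mode2)
    f1 = weibull_pdf(t, *mode1)
    f2 = weibull_pdf(t, *mode2)
    return p * f1 * (1.0 - F2) + f2 * (1.0 - p * F1)


# Hypothetical values for illustration only (not fitted to any data).
P_DEFECTIVE = 0.01        # proportion subject to Mode-1
MODE1 = (50.0, 0.5)       # (alpha, beta) for infant mortality
MODE2 = (20000.0, 3.0)    # (alpha, beta) for wearout

t0, h = 1000.0, 0.1
exact_pdf = glfp_pdf(t0, P_DEFECTIVE, MODE1, MODE2)
# Central difference of the Cdf, as a check on the pdf formula.
numeric_pdf = (glfp_cdf(t0 + h, P_DEFECTIVE, MODE1, MODE2)
               - glfp_cdf(t0 - h, P_DEFECTIVE, MODE1, MODE2)) / (2.0 * h)
```

Passing the two modes as tuples keeps the sketch short; other component distributions could be swapped in by replacing the two Weibull helper functions.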

3. WEIBULL & LOGNORMAL DISTRIBUTIONS

In the numerical example (section 7), the failure time for the individual failure modes follows either a Weibull or lognormal distribution. These distributions belong to the log-location-scale family which, in general, has a Cdf that can be expressed as:

$$\Pr\{T \le t\} = F(t;\mu,\sigma) = \Phi\left[\frac{\log(t) - \mu}{\sigma}\right]$$

$\mu$ = location parameter, $\sigma$ = scale parameter, for the Cdf of log(T); $\Phi(z)$ = a standardized Cdf (the Cdf of log(T) with $\mu = 0$ and $\sigma = 1$).

Due to its flexibility in fitting data and its physical motivation as a distribution of smallest extreme values, the Weibull distribution is a very popular model for failure times. The Weibull Cdf is:

$$\Pr\{T \le t\} = F(t;\alpha,\beta) = 1 - \exp\left[-\left(\frac{t}{\alpha}\right)^{\beta}\right] = \Phi_{\mathrm{sev}}\left[\frac{\log(t) - \mu}{\sigma}\right]$$

$\Phi_{\mathrm{sev}}(z) = 1 - \exp[-\exp(z)]$ is the standardized smallest extreme value distribution Cdf, $\alpha = \exp(\mu)$ = the positive scale parameter, $\beta = 1/\sigma$ = the positive shape parameter.

Another popular distribution for failure times, especially for electronic components, is the lognormal distribution. Its Cdf is:

$$\Pr\{T \le t\} = F(t;\mu,\sigma) = \Phi_{\mathrm{nor}}\left[\frac{\log(t) - \mu}{\sigma}\right]$$

$\Phi_{\mathrm{nor}}$ = standard s-normal Cdf, $\mu$ = mean of log(T), $\exp(\mu)$ = lognormal median, $\sigma$ = a shape parameter (standard deviation of log(T)).
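The log-location-scale form $F(t) = \Phi[(\log t - \mu)/\sigma]$ can be sketched in a few lines of Python. The example checks that the smallest-extreme-value form with $\alpha = \exp(\mu)$ and $\beta = 1/\sigma$ agrees with the direct Weibull Cdf $1 - \exp[-(t/\alpha)^\beta]$, and that $\exp(\mu)$ is the lognormal median. The function names and the values of `mu` and `sigma` are assumptions made only for this illustration.

```python
import math


def phi_sev(z):
    """Standardized smallest-extreme-value Cdf: 1 - exp(-exp(z))."""
    return 1.0 - math.exp(-math.exp(z))


def phi_nor(z):
    """Standard normal Cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def log_location_scale_cdf(t, mu, sigma, phi):
    """F(t) = phi((log t - mu) / sigma) for t > 0."""
    return phi((math.log(t) - mu) / sigma)


def weibull_cdf_direct(t, alpha, beta):
    """Weibull Cdf in the usual (scale, shape) parameterization."""
    return 1.0 - math.exp(-((t / alpha) ** beta))


# Hypothetical location and scale, for illustration only.
mu, sigma = 9.0, 0.5
alpha, beta = math.exp(mu), 1.0 / sigma

weibull_via_sev = log_location_scale_cdf(5000.0, mu, sigma, phi_sev)
weibull_direct = weibull_cdf_direct(5000.0, alpha, beta)
lognormal_at_median = log_location_scale_cdf(math.exp(mu), mu, sigma, phi_nor)
```

Passing the standardized Cdf as an argument mirrors the way the two distributions differ only in the choice of $\Phi$.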

4. MAXIMUM LIKELIHOOD ESTIMATION

Given specified parametric forms of $F_1$ and $F_2$, the ML method is used to estimate $\theta_1$, $\theta_2$, and p.

4.1 Likelihood Function

The likelihood function for the GLFP depends on the data type and on whether the cause of failure for each failed unit has been identified. See [12: chapter 8] for details.

The $L(\theta)$ for a sample of n s-independent observations can be expressed as:

$$L(\theta) = \prod_{i=1}^{n} L_i(\theta)$$

$L_i(\theta)$ = the likelihood contribution for observation i. Correspondingly, the log likelihood for such a sample is:

$$\mathcal{L}(\theta) = \log[L(\theta)] = \sum_{i=1}^{n} \log[L_i(\theta)]$$

The MLE of the unknown parameters in $\theta = (\theta_1, \theta_2, p)$ are found by maximizing (2).

4.2 Likelihood Contributions

In general, $L_i(\theta)$ can be thought of as the probability that observation i would fail at the time (or in the time interval) of the actual failure. Assumptions 1 - 3 are used here, with $T = \min(T_1, T_2)$.

4.2.1 Interval-censored observation

Suppose that unit i is known to have failed between times $t_i^L$ and $t_i^U$.
• If the cause of failure is Mode-1,

$$L_i(\theta) = \Pr\{(t_i^L < T \le t_i^U) \cap (T_1 < T_2)\} = \int_{t_i^L}^{t_i^U} p \cdot f_1(s;\theta_1) \cdot \bar{F}_2(s;\theta_2) \, ds.$$

• Similarly, if the cause of failure is Mode-2,

$$L_i(\theta) = \int_{t_i^L}^{t_i^U} f_2(s;\theta_2) \cdot [1 - p \cdot F_1(s;\theta_1)] \, ds.$$

• If the failure mode is not known,

$$L_i(\theta) = \Pr\{t_i^L < T \le t_i^U\} = F_T(t_i^U;\theta) - F_T(t_i^L;\theta) = [1 - p \cdot F_1(t_i^L;\theta_1)] \cdot \bar{F}_2(t_i^L;\theta_2) - [1 - p \cdot F_1(t_i^U;\theta_1)] \cdot \bar{F}_2(t_i^U;\theta_2).$$

4.2.2 Exact failure observation

Even when failures occur in a continuous time process with continuous monitoring, failure-time data are really discrete because reported failure times are rounded in some manner. Then the correct likelihood is that for interval censoring. Nevertheless, failures are often reported as exact times. For convenience (or because the rounding rule has not been reported), the following density-approximation approach is commonly used.

Suppose that the failure time of unit i is reported as $t_i$. When the failure mode has been identified, the likelihood function can be obtained by taking the limit of the appropriate probability for the interval-censored case, as the width of the interval becomes infinitesimally small. There are 3 cases:
• The failure cause is known to be Mode-1,

$$L_i(\theta) = p \cdot f_1(t_i;\theta_1) \cdot [1 - F_2(t_i;\theta_2)].$$

• The failure cause is known to be Mode-2,

$$L_i(\theta) = f_2(t_i;\theta_2) \cdot [1 - p \cdot F_1(t_i;\theta_1)].$$

• The failure cause is not known,

$$L_i(\theta) = p \cdot f_1(t_i;\theta_1) \cdot \bar{F}_2(t_i;\theta_2) + f_2(t_i;\theta_2) \cdot [1 - p \cdot F_1(t_i;\theta_1)].$$

4.2.3 Left-censored observation

Let $t_i$ be the left-censoring time for unit i (the failure time for unit i is known only to be some time before $t_i$). Where the cause of failure is known, this is a special case of interval-censoring.
• The failure is known to be Mode-1,

$$L_i(\theta) = \int_0^{t_i} p \cdot f_1(s;\theta_1) \cdot \bar{F}_2(s;\theta_2) \, ds.$$

• The failure is known to be Mode-2,

$$L_i(\theta) = \int_0^{t_i} f_2(s;\theta_2) \cdot [1 - p \cdot F_1(s;\theta_1)] \, ds.$$

• The failure cause is not known,

$$L_i(\theta) = \Pr\{T \le t_i\} = 1 - [1 - p \cdot F_1(t_i;\theta_1)] \cdot \bar{F}_2(t_i;\theta_2).$$

4.2.4 Right-censored observation

Unit i is known only to have survived beyond $t_i$ (the failure time is greater than $t_i$),

$$L_i(\theta) = \Pr\{T > t_i\} = [1 - p \cdot F_1(t_i;\theta_1)] \cdot \bar{F}_2(t_i;\theta_2).$$
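A minimal sketch of how two of these contributions combine into a log likelihood: the Python below handles interval-censored observations with unknown failure mode, $L_i = \Pr\{T > t_i^L\} - \Pr\{T > t_i^U\}$, and right-censored observations, $L_i = \Pr\{T > t_i\}$, with Weibull components for both modes. The function names, the toy data layout (rows shaped like inspection-interval counts), and the value of `theta` are assumptions for illustration; no optimizer is included, so this only evaluates the log likelihood at one hypothetical point.

```python
import math


def weibull_cdf(t, mu, sigma):
    """Weibull Cdf in log-location-scale form; defined as 0 at t <= 0."""
    if t <= 0.0:
        return 0.0
    return 1.0 - math.exp(-math.exp((math.log(t) - mu) / sigma))


def glfp_survival(t, theta):
    """Pr{T > t} = [1 - p*F1(t)] * [1 - F2(t)], theta = (mu1, sigma1, mu2, sigma2, p)."""
    mu1, sigma1, mu2, sigma2, p = theta
    return (1.0 - p * weibull_cdf(t, mu1, sigma1)) * (1.0 - weibull_cdf(t, mu2, sigma2))


def log_likelihood(theta, intervals, right_censored):
    """Sum of log contributions.

    intervals      : list of (lower, upper, count), failure mode unknown
    right_censored : list of (time, count), units still unfailed at `time`
    """
    total = 0.0
    for lower, upper, count in intervals:
        prob = glfp_survival(lower, theta) - glfp_survival(upper, theta)
        total += count * math.log(prob)
    for time, count in right_censored:
        total += count * math.log(glfp_survival(time, theta))
    return total


# Toy data and a hypothetical parameter vector, for illustration only.
intervals = [(0.0, 1.0, 10), (1.0, 2.0, 1), (5000.0, 6000.0, 3)]
right_censored = [(10000.0, 4897)]
theta = (4.0, 2.5, 10.5, 0.5, 0.01)

ll = log_likelihood(theta, intervals, right_censored)
```

Contributions for observations with a known failure mode would need the integrals of sections 4.2.1 and 4.2.3, for example by numerical quadrature; they are left out to keep the sketch short.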

5. s-CONFIDENCE REGIONS & INTERVALS

Ref [9] demonstrated the important advantages of likelihood-based s-confidence intervals for the LFP. In view of this, likelihood-based methods are used here to construct s-confidence intervals for functions of the unknown parameters. Also see [2, 3, 13, 15]. Assuming log-location-scale distributions for the failure time of Mode-1 & Mode-2 in the GLFP, there are altogether 5 unknown parameters ($\theta$ has 5 elements). Therefore, the construction of likelihood-based s-confidence intervals and regions can require a substantial amount of computation. This section briefly discusses general methods for constructing likelihood-based simultaneous s-confidence bands and intervals for functions of the unknown parameters in a model.

5.1 Likelihood-Based Joint s-Confidence Region

Assuming that standard regularity conditions are satisfied, a large-sample approximate $(1 - \alpha)$ joint s-confidence region for $\theta$ is given by the set

$$CR_1 = \left\{\theta : L(\theta) > L(\hat\theta) \cdot \exp\left[-\tfrac{1}{2}\chi^2_{(P,1-\alpha)}\right]\right\} \quad (3)$$

P = number of unknown parameters in $\theta$, $\chi^2_{(P,1-\alpha)}$ = $1 - \alpha$ quantile of the $\chi^2$ distribution with P degrees of freedom, $\hat\theta$ = MLE of $\theta$.

5.2 Simultaneous s-Confidence Bands for a Function of $\theta$

Let $g(\theta, t)$ be a function of $\theta$ and some other variable (or variables), say t. In particular, $g(\theta, t)$ could be the Cdf of T, to be estimated over a range of values of t. To quantify uncertainty in $g(\theta, t)$ over some range of t, one can use a simultaneous s-confidence band. The upper & lower points of an approximate $(1 - \alpha)$ simultaneous s-confidence band at a particular point t are obtained by fixing t and letting $g(\theta, t)$ range over all possible values of $\theta$ in $CR_1$ defined in (3). Hence, the endpoints of the s-confidence band for $g(\theta, t)$ for a given t are
Lower: $\min\{g(\theta, t) : \theta \in CR_1\}$,
Upper: $\max\{g(\theta, t) : \theta \in CR_1\}$.

Repeating this process for a range of values of t gives an s-confidence band for $g(\theta, t)$ with a simultaneous s-confidence level of approximately $(1 - \alpha)$. This generalizes the approach suggested in [1].

5.3 Profile Likelihood and Approximate s-Confidence Region for a Parameter Subset

Profile likelihoods can be used to obtain a joint s-confidence region (or interval) for any subset of the unknown parameters in $\theta$. By definition, the profile likelihood for $\theta_1$, a partition of $\theta = (\theta_1, \theta_2)$, is

$$R(\theta_1) = \max_{\theta_2}\left[\frac{L(\theta_1, \theta_2)}{L(\hat\theta)}\right]$$

$\hat\theta$ = MLE of $\theta$. Then a large-sample approximate $(1 - \alpha)$ s-confidence region for $\theta_1$ is the set

$$\left\{\theta_1 : R(\theta_1) > \exp\left[-\tfrac{1}{2}\chi^2_{(r,1-\alpha)}\right]\right\}$$

r = number of elements in $\theta_1$. As an example, if $\theta = (\theta_1, \theta_2, \theta_3, \ldots)$, the profile likelihood for $(\theta_1, \theta_2)$ is

$$R(\theta_1, \theta_2) = \max_{\theta_3, \theta_4, \ldots}\left[\frac{L(\theta_1, \theta_2, \theta_3, \ldots)}{L(\hat\theta)}\right]$$

Then a large-sample approximate $(1 - \alpha)$ joint s-confidence region for $(\theta_1, \theta_2)$ is the set of all $(\theta_1, \theta_2)$ such that $R(\theta_1, \theta_2) > \exp[-\tfrac{1}{2}\chi^2_{(2,1-\alpha)}]$.

5.4 Profile Likelihood and Approximate s-Confidence Intervals for Scalar Functions of $\theta$

As a special case, the profile likelihood provides an approximate $(1 - \alpha)$ s-confidence interval for a scalar function of $\theta$, $g(\theta)$, provided that $g(\theta)$ is reparameterizable in terms of $\theta$ (eg, see [8: chapter 5] or [10: chapter 8]). In other words, what is required is a one-to-one transformation between $g(\theta)$ and one of the unknown parameters in $\theta$. By reparameterizing $g(\theta)$ in terms of the chosen parameter, the profile likelihood, and hence the s-confidence interval for $g(\theta)$, can be obtained.

For example, with $\theta = (\mu, \sigma)$, let $g(\theta)$ be the Cdf of a log-location-scale distribution,

$$F(t) = \Phi\left[\frac{\log(t) - \mu}{\sigma}\right]$$

where t is fixed. Then there exists a one-to-one transformation between $F(t)$ and $\mu$, with

$$\mu = \log(t) - \sigma \cdot \Phi^{-1}[F(t)].$$

The profile likelihood for $F(t)$ is obtained by using $F(t)$ instead of $\mu$ as a parameter of the likelihood:

$$R[F(t)] = \max_{\sigma}\left\{\frac{L[F(t), \sigma]}{L(\hat\theta)}\right\}$$

An approximate $(1 - \alpha)$ s-confidence interval for $g(\theta) = F(t)$ consists of all values of $F(t)$ such that

$$R[F(t)] > \exp\left[-\tfrac{1}{2}\chi^2_{(1,1-\alpha)}\right].$$

Because the s-confidence interval is 1-dimensional, a chi-square distribution with 1 degree of freedom is required.

The same method can easily be generalized to vector functions. For example, with $\theta = (\mu, \sigma)$ as before, let

$$g(\mu, \sigma) = [g_1(\mu, \sigma), g_2(\mu, \sigma)]$$

be a vector function with a one-to-one transformation with $(\mu, \sigma)$. Then substituting:
• $g_1(\mu, \sigma)$ for $\mu$,
• $g_2(\mu, \sigma)$ for $\sigma$,
in the likelihood, enables one to compute the profile likelihood for the elements of $g(\mu, \sigma)$ and thus to obtain an s-confidence region for $g(\mu, \sigma)$.


6. CONSTRUCTING s-CONFIDENCE INTERVALS FOR GLFP

This section describes the numerical method used in finding the s-confidence intervals and simultaneous s-confidence bands for the GLFP fitted to the CB failure data.

6.1 Alternative Method For Finding A Pointwise s-Confidence Interval For a Scalar Function

Constructing s-confidence intervals using profile likelihood methods can be computationally difficult and impractical, especially where:
• reparameterization of the scalar function $g(\theta)$ in terms of one of the parameters in $\theta$ is not feasible,
• the dimension of $\theta$ is more than 2 or 3.

It also might be inefficient when s-confidence intervals for many different scalar functions of $\theta$, $g(\theta)$, are needed. An alternative and equivalent way of obtaining an s-confidence interval for $g(\theta)$ that gets around such difficulties uses the set:

$$\left\{g(\theta) : L(\theta) > L(\hat\theta) \cdot \exp\left[-\tfrac{1}{2}\chi^2_{(1,1-\alpha)}\right]\right\}. \quad (4)$$

The maximum & minimum values of $g(\theta)$ in this set form a pointwise approximate $(1 - \alpha)$ s-confidence interval for $g(\theta)$.

This method requires no reparameterization of $g(\theta)$ and, as long as one has the set:

$$CR_2 = \left\{\theta : L(\theta) > L(\hat\theta) \cdot \exp\left[-\tfrac{1}{2}\chi^2_{(1,1-\alpha)}\right]\right\}$$

in hand, the pointwise s-confidence interval for any scalar function $g(\theta)$ can be constructed. But this raises the question of how to obtain the set $CR_2$ (or a good approximation for it) in the first place.

One way of solving this problem is the scheme suggested in [11], which is discussed briefly in section 6.2. This method also enables one to obtain the set $CR_1$ defined in (3), needed to construct simultaneous s-confidence bands.

6.2 Random Walk Approximation of s-Confidence Intervals

Ref [11] proposed an algorithm to construct approximate s-confidence intervals based on a variation of Gibbs sampling from a uniform distribution over an s-confidence region in the parameter space. The basic idea is to approximate the target s-confidence region CR by generating many uniformly distributed points within CR using the:

Random-Walk Algorithm

1. Choose a line at random in the parameter space that passes through a point in CR ($CR_1$ and/or $CR_2$, depending on the type of s-confidence intervals or bands being constructed). A simple choice for the point is the MLE of $\theta$.

2. Using a uniform distribution, randomly select a point that lies on the line.

3. Check to see if the point is contained in CR. If so, record the point; Else return to step 2; End-If.

4. Return to step 1, this time with the line passing through the point chosen in step 3. Continue this procedure and repeat until enough simulated points inside the s-confidence region have been generated.

End-Algorithm

A simple modification can be made to speed up the Random-Walk Algorithm. In step 3, if the randomly selected point, say $\theta_1$, does not fall within CR, then restrict the selection of the next random point to the set of points on the line not farther than $\theta_1$, as seen from the previously chosen point that gives rise to the line. This does not alter the distribution of the chosen points in CR [11].
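The steps of the Random-Walk Algorithm, including the shrinking speed-up, can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the implementation of [11]: the function names are hypothetical, the region is supplied as a membership test, the search along each line is limited to an assumed bound `half_width`, and a unit disk stands in for a likelihood-based region CR.

```python
import math
import random


def random_direction(dim, rng):
    """Unit vector with a uniformly distributed direction in `dim` dimensions."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0.0:
            return [x / norm for x in v]


def random_walk_region(in_region, start, half_width, n_points, seed=0):
    """Generate n_points inside a region by the line-based random walk.

    in_region  : function(point) -> bool, membership test for CR
    start      : point known to lie in CR (for example the MLE)
    half_width : assumed bound; each line is searched over [-half_width, half_width]
    """
    if not in_region(start):
        raise ValueError("start must lie inside the region")
    rng = random.Random(seed)
    current = list(start)
    points = []
    while len(points) < n_points:
        direction = random_direction(len(current), rng)   # step 1: random line
        lo, hi = -half_width, half_width
        while True:
            s = rng.uniform(lo, hi)                       # step 2: point on the line
            cand = [c + s * d for c, d in zip(current, direction)]
            if in_region(cand):                           # step 3: keep if inside
                break
            # Speed-up: never look farther out than a rejected point.
            if s < 0.0:
                lo = s
            else:
                hi = s
        current = cand                                    # step 4: move and repeat
        points.append(cand)
    return points


def unit_disk(point):
    """Toy stand-in for a confidence region: the unit disk in 2 dimensions."""
    return point[0] ** 2 + point[1] ** 2 <= 1.0


S = random_walk_region(unit_disk, start=[0.0, 0.0], half_width=3.0, n_points=2000)
g_values = [pt[0] for pt in S]            # g(theta) = first coordinate
interval = (min(g_values), max(g_values))
```

With a real model, `in_region` would compare $L(\theta)$ with $L(\hat\theta) \exp[-\tfrac{1}{2}\chi^2]$; the range of `g_values` then approximates the interval limits, subject to how well the generated points cover the region.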

Notation
• S = Set of m points generated by the Random-Walk Algorithm in the parameter space; these are approximately uniformly distributed within the P-dimensional s-confidence region.
• g(S) = Set of values of $g(\theta)$ obtained by evaluating it at each point $\theta$ in S.

If the points in S adequately cover the s-confidence region, the maximum & minimum values of g(S) can be used as the limits of the s-confidence interval for $g(\theta)$.

However, one can question the accuracy of the endpoints of the interval obtained this way because S might not
• sufficiently represent the s-confidence region, or
• extend fully to the boundaries of the region.

The minimum number of points for a specified accuracy increases very rapidly with P (number of parameters in $\theta$) because of the exponential growth in the volume of the P-dimensional region. Table 1 [11] lists the approximate sample sizes m needed vs the number of unknown parameters to attain 2-digit accuracy when using the range of g(S) as the limits of the s-confidence interval (under the assumption that $g(\theta)$ is approximately linear). Table 1 indicates that for a parameter space of 5 dimensions such as that of the GLFP, the required number of simulated trials is of the order $10^5$.

Table 1: Approximate m Needed From a Beta[½(P + 1), ½(P + 1), L, U] Distribution
[So that the distribution-mode of the sample-minimum < L + 0.01(U - L) [11]]
[P = Dimension, m = Required number of points]

P     m
3     9·10^2
4     ?·10^3
5     5·10^4
6     3·10^5
7     2·10^6
8     1·10^7
9     5·10^?
10    3·10^?


When the number of unknown parameters in $\theta$ is small, say 3 or 4, then the amount of computation needed is relatively modest. For higher dimensions, [11] suggested the use of an extrapolated quantile-quantile plot to obtain more accurate endpoints of the s-confidence interval than the range of g(S). This extrapolation idea is based on the fact that a 1-dimensional linear projection of uniformly distributed points in the interior of a P-dimensional ellipsoid has a Beta[½(P + 1), ½(P + 1), L, U] distribution, where L and U are the endpoints of the support of the beta distribution [4: p 75], [11]. L and U serve as the endpoints of the approximate $(1 - \alpha)$ s-confidence interval.

The $CR_1$ (or $CR_2$) is generally not ellipsoidal, and $g(\theta)$ is generally not equivalent to a 1-dimensional linear projection. In large samples, however, $CR_1$ is approximately ellipsoidal and, if $g(\theta)$ is approximately linear, then g(S) has, approximately, a Beta[½(P + 1), ½(P + 1), L, U] distribution.

A plot of the sample quantiles of the simulated values of $g(\theta)$ vs the quantiles of the Beta[½(P + 1), ½(P + 1), L, U] distribution should be a smooth curve passing through (0, L) and (1, U). To estimate L and U, first construct a sample quantile-quantile plot of g(S) and the Beta distribution and fit a curve through the plot. By extrapolating the curve, the $g(\theta)$ values that intersect with the Beta quantiles of 0 and 1 on the curve provide the approximations for L and U. For P < 4, extrapolation is not needed. See [11] for more details & examples.

7. EXAMPLE OF GLFP FITTING

This section illustrates the fitting of the GLFP model to CB failure data and describes the results of MLE of the model parameters.

7.1 Data

The data came from a field trial of 4992 CB [10: example 1.3]. The trial lasted $10^4$ hours. Table 2 gives
• the actual number of failures,
• the number of failures observed for Vendor 1 during periodic inspections in the trial.

Engineering judgment indicated that most of the early failures were caused by defective CB, and that later failures were caused by a wearout failure mechanism. Interest centered on finding a model that would adequately describe the data and make predictions for the proportion of units that would fail in the future.

7.2 Maximum Likelihood Results

Fitting the GLFP to the data requires specifying the form of the distribution for each of the two failure modes and whether the cause of failure of each unit is known or not. Unfortunately, no cause-of-failure information was available for these data (the data are masked). Engineering judgment, however, suggested that:
• failures of the CB during the early periods were probably due to infant mortality caused by component defects,
• late failures were due to a wearout-type failure mode.

Table 2: Failure Data from a CB Field-Tracking Study
[Operating Hours (Interval Endpoints): LB = Lower Bound, UB = Upper Bound]
[NF = Number Failing]

LB       UB       NF
0        1        10
1        2        1
2        5        3
5        10       1
10       20       2
20       50       6
50       100      3
100      200      2
200      500      8
500      1000     4
1000     2000     5
2000     5000     6
5000     6000     3
6000     7000     9
7000     8000     10
8000     9000     16
9000     10000    7
10000    ∞        4897

After 10k hours of operation, there were 4897 unfailed boards.

Now, examine the estimation results arising from several combinations of model distributions and types of failure-identification to see how well they fit the data.

Consider the 3 assumptions on what is known about the cause of failure for the individual CB.

1. Analysis 1: The cause of failure of each unit failing before 1k hours is infant mortality. The cause of failure of all other units is wearout.

2. Analysis 2: CB that failed before 200 hours were infant mortality failures. CB that failed after 5k hours were wearout failures. The failure-cause between 200 and 5k hours is not known.

3. Analysis 3: The failure mode is not identified for any unit.

In every case, consider both the Weibull and lognormal distributions for both failure modes in order to assess sensitivity.

7.2.1 Analysis 1

Table 3 gives the MLE of the 5 unknown parameters and the corresponding approximate Gaussian-theory standard errors for the GLFP with: W/W and W/LN (Mode-1 distribution / Mode-2 distribution). The fitted models with LN/W and LN/LN are virtually the same as the W/W and the W/LN models, respectively, as far as their behaviors are concerned. Thus they are omitted here.

The MLE of p and the parameters for the infant-mortality failure mode distribution are the same (to the number of digits shown) for both of the wearout distributional assumptions.


Dist.

104

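Table 3: MLE and (Estimated Standard Errors)

Dist.   $\hat\mu_1$   $\hat\sigma_1$   $\hat\mu_2$   $\hat\sigma_2$   $\hat p$
        $(\widehat{se}_{\hat\mu_1})$   $(\widehat{se}_{\hat\sigma_1})$   $(\widehat{se}_{\hat\mu_2})$   $(\widehat{se}_{\hat\sigma_2})$   $(\widehat{se}_{\hat p})$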

W/LN (0.44) (0.16) (0.30) (0.136) (0.0013)

4.10 2.607 12.5 1.442 0.0080 (0.44) (0.16) (0.42) (0.12) (0.0013)

Figure 1 shows a Weibull probability plot of the W/W and the W/LN GLFP MLE for the CB data. The plot of the data (represented by the dots) alone indicates the possibility of two failure modes, each with a different distribution. While both models fit the data very well, the one with W/W predicts a higher failure-rate among the units that are still running after $10^4$ hours.

Figure 1: Weibull Probability Plot, Analysis 1
[Compares the W/W and the W/LN GLFP MLE of F(t) for the CB failure-data, under Analysis 1; horizontal axis: Hours]

Figure 2: Hazard Function Plot, Analysis 1
[Compares the W/W and W/LN GLFP MLE of the CB failure-data; horizontal axis: Hours]

Figure 2 shows a plot of the hazard functions corresponding to the W/W and W/LN models. The hazard function was computed using

$$h(t;\theta) = \frac{f_T(t;\theta)}{1 - F_T(t;\theta)}$$

$F_T(t;\theta)$ = Cdf of GLFP, $f_T(t;\theta)$ = pdf of GLFP. The plot shows that the hazard function for both models is decreasing until the wearout failure mode takes over as the cause of failure at 1k hours (the turning point, or the minimum, for the two curves occurs very close to t = 1k hours). As is consistent with figure 1, after 1k hours the W/W model shows a more rapid increase in the failure rate.

Figure 3 shows a Weibull probability plot of the W/W distributions GLFP fitted to the data, just as in figure 1, but includes the simultaneous & pointwise s-confidence intervals. These intervals are calculated using the method in section 6. Not surprisingly, during the time period between 1 and $10^4$ hours, the s-confidence intervals for the MLE from the GLFP are comparatively narrow, whereas, due to the lack of data, the intervals get substantially wider as the time progresses beyond $10^4$ hours.

Figure 3: Weibull Probability Plot, Analysis 1
[Compares the W/W GLFP MLE of the CB failure-data with pointwise s-confidence intervals and simultaneous s-confidence bands; horizontal axis: Hours]

Figure 4 gives the simultaneous & pointwise s-confidence intervals for the GLFP with W/LN distributions plotted on Weibull probability paper. The most striking contrast between the s-confidence interval plots of the W/W model and the W/LN model is the width of the intervals at times beyond $10^4$ hours. The difference is very large; eg, at t = $10^5$ hours, the simultaneous s-confidence interval for proportion failing under the
• W/W model is (0.21, 1),
• W/LN model is (0.10, 0.52).

The Weibull distribution, as a model for the wearout failure mode, is not only more pessimistic, but also expresses more uncertainty when compared with the lognormal distribution.

Figure 4: Weibull Probability Plot, Analysis 1
[Compares the W/LN GLFP MLE of the CB failure-data with pointwise s-confidence intervals and simultaneous s-confidence bands; horizontal axis: Hours]

7.2.2 Analysis 2

For the analysis with the failure mode identified for the early and late failures, table 4 gives the MLE of the GLFP parameters and approximate standard errors with W/W and W/LN distributions. GLFP with LN/W distributions and LN/LN distributions are almost identical to the first two, and as in Analysis 1, they are not discussed here.

Table 4: MLE and (Estimated Standard Errors)

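Dist.   $\hat\mu_1$   $\hat\sigma_1$   $\hat\mu_2$   $\hat\sigma_2$   $\hat p$
        $(\widehat{se}_{\hat\mu_1})$   $(\widehat{se}_{\hat\sigma_1})$   $(\widehat{se}_{\hat\mu_2})$   $(\widehat{se}_{\hat\sigma_2})$   $(\widehat{se}_{\hat p})$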

W/LN (0.59) (0.17) (0.30) (0.197) (0.0015)

5.05 2.97 11.33 0.9092 0.0096 (0.59) (0.17) (0.37) (0.174) (0.0015)

Figure 5: Weibull Probability Plot, Analysis 2
[Compares the W/W and the W/LN GLFP MLE of F(t) for the CB failure-data, under Analysis 2; horizontal axis: Hours]

Figure 6: Hazard Function Plot, Analyses 1 & 2
[Compares the W/W and W/LN GLFP MLE of the CB failure-data]

7.2.3 Analysis 3

When the cause of failure for all units is unknown, there is a serious identification problem due to the lack of knowledge of the failure mode responsible for the failures in the GLFP. The early failures can be due to either the failure mode associated with only a fraction p of the population (infant mortality) or the failure mode that affects all units (wearout). Likewise for later failures. In other words, there is an ambiguity in matching the two failure modes to the early and late failures.

As a result, maximizing the likelihood of the GLFP results in two possible solutions, depending on the starting point of the optimization search algorithm. In some cases, the maximization algorithm fails to converge. The existence of two possible MLE of the GLFP parameters is a consequence of the inability to distinguish between the two failure modes.


Figure 7: Weibull Probability Plot, Analysis 2
[Compares the W/W GLFP MLE of the CB failure-data with pointwise s-confidence intervals and simultaneous s-confidence bands]

Table 5: MLE

Dist.       $\hat\mu_1$   $\hat\sigma_1$   $\hat\mu_2$   $\hat\sigma_2$   $\hat p$
W/W (A)     6.37          3.65             10.62         0.2997           0.0115
W/W (B)     9.038         0.120            33.5          5.45             0.0079
W/LN (A)    6.65          3.72             11.17         0.8272           0.0121
W/LN (B)    9.037         0.121            44.63         16.52            0.0080

Table 5 lists the two possible sets of MLE of the GLFP for each of the two distribution assumptions when convergence of the Random-Walk Algorithm is successful. The sets, (A) & (B), correspond to the two maxima in the likelihood. Unlike tables 3 & 4, the standard error for each estimate is not given because the standard errors have little meaning when the likelihood function has two maxima. There is a large disparity between the parameter estimates in sets (A) and (B) for each distributional assumption.

Fignre 9 illustrates t,lie comparison between the Cdf of t,hc GLFP for two possible set,s of MLE for the W/W distributions and that of the W/LN distributions on a Weihull prohability plot. The curves corresponding to maximum point, (A) are nenrly the same as those in figures 1 & 5, implying tlmt:

1. The failure mode tliat afFects only p of tlic population is responsible for t,he early failures.

2. The otlicr failurc mode, Lo which all units are sus- 4 ccptihle, accounts for tlie late failures.

This, of course, is consistent with our intuition and engineering judgment.

The curves corresponding to maximum point (B) show an unusual behavior. The fit within the range of the data is nearly identical to that of (A). But when extrapolating beyond 10^4 hours, the (B) curves increase at near their initial rate, instead of following the trend of steep increase. This can be explained by the fact that the two failure modes have switched roles. Statistically, the failures occurring before 5k hours have been attributed to Mode-2. The late failures (between 5k and 10k hours) have been attributed to Mode-1. Plots of the pdf (figure 10) and Cdf (figure 11) of the two failure modes are helpful in clarifying the W/W case (the W/LN case is similar).

Figure 8: Weibull Probability Plot, Analysis 2 [Compares the W/LN GLFP MLE of the CB failure-data, pointwise s-confidence intervals, and simultaneous s-confidence bands]

Figure 9: Weibull Probability Plot, Analysis 3 [Compares the W/W and the W/LN GLFP MLE of F(t) for the CB failure-data]

As is clear from figure 11, Mode-2 has a Cdf that increases very slowly, not even reaching 0.04 after 10^6 hours. In contrast, the Cdf of Mode-1 is very close to its maximum value of 1 after 10^4 hours. Figures 10 & 11 indicate that Mode-1 has almost all of its probability mass concentrated between 5k and 10k hours. Mode-2 is the failure mode to which all units are at risk and Mode-1 only affects fraction p of the population, with p = 0.0079 for the W/W (B) curve; thus Mode-2 is primarily responsible for the failures that occur before 5k hours. Between 5k and 10k hours, Mode-1 affects 0.79% of the population. By the end of 10k hours almost all of the units in this fraction of the population have failed. Beyond that time-point, what is left are predominantly the units whose cause of failure is Mode-2. This explains why the slope of curves (B) after 10k hours is almost the same as that of curves (B) at the beginning.

Figure 10: Plot of the Weibull pdf Estimates, Analysis 3 [Of Mode-1 and Mode-2 for maximum point B]

Figure 11: Plot of the Weibull Cdf Estimates, Analysis 3 [Of Mode-1 and Mode-2 for maximum point B]
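The role switch described above can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes the GLFP Cdf has the form F(t) = 1 - [1 - p F1(t)][1 - F2(t)] and that each Weibull mode is written in log-location-scale form with parameters (μ, σ). It also assumes the W/W values are those read from table 5. The function and variable names are hypothetical.

```python
import math


def weibull_cdf(t, mu, sigma):
    """Weibull Cdf in log-location-scale form: 1 - exp(-exp((ln t - mu) / sigma))."""
    return -math.expm1(-math.exp((math.log(t) - mu) / sigma))


def glfp_cdf(t, mu1, sigma1, mu2, sigma2, p):
    """Assumed GLFP Cdf: Mode-1 affects fraction p of units, Mode-2 affects all units."""
    f1 = weibull_cdf(t, mu1, sigma1)
    f2 = weibull_cdf(t, mu2, sigma2)
    return 1.0 - (1.0 - p * f1) * (1.0 - f2)


# W/W estimates as read from table 5 (assumed order: mu1, sigma1, mu2, sigma2, p).
WW_A = (6.37, 3.65, 10.62, 0.2997, 0.0115)
WW_B = (9.038, 0.120, 33.5, 5.45, 0.0079)


def main():
    print("hours      F(t) set A   F(t) set B")
    for t in (1e3, 5e3, 1e4, 1e5, 1e6):
        print(f"{t:8.0f}   {glfp_cdf(t, *WW_A):10.5f}   {glfp_cdf(t, *WW_B):10.5f}")
    # Per-mode Cdf for set (B), to compare with the discussion of figure 11.
    print("Mode-1 (B) Cdf at 1e4 hours:", weibull_cdf(1e4, WW_B[0], WW_B[1]))
    print("Mode-2 (B) Cdf at 1e6 hours:", weibull_cdf(1e6, WW_B[2], WW_B[3]))


if __name__ == "__main__":
    main()
```

Under these assumptions the two parameter sets give nearly the same fraction failing up to 10^4 hours, while beyond that point set (A) rises steeply and set (B) continues at close to its initial rate.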

Another interesting observation can be made from the profile likelihood for p in figure 12. This plot is similar to [9: figure 4]. The right tail end of the profile likelihood of p for maximum point (A) levels off at about 0.5, indicating that the upper endpoint of any likelihood-based s-confidence interval for p is 1. This large uncertainty in p with Mode-1 cause of failure is undoubtedly the result of the lack of information on the failure modes. In comparison, maximum point (B) seems to allow construction of a likelihood-ratio-based 95% s-confidence interval for p.
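The link between a plateau near 0.5 and an upper endpoint of 1 can be made concrete. The sketch below assumes the usual chi-square calibration, in which a likelihood-ratio-based interval for a single parameter keeps every value whose profile relative likelihood is at least exp(-χ²(1; level)/2). The two profile shapes are hypothetical stand-ins that only mimic the qualitative behavior described for maximum points (A) and (B); they are not the profiles in figure 12, and all names are illustrative.

```python
import math
from statistics import NormalDist


def lr_threshold(conf_level):
    """Relative-likelihood cutoff exp(-q/2), where q is the chi-square (1 df) quantile."""
    z = NormalDist().inv_cdf(0.5 + conf_level / 2.0)
    return math.exp(-0.5 * z * z)  # a chi-square (1 df) quantile equals z squared


def lr_interval(grid, rel_lik, conf_level=0.95):
    """Smallest and largest grid values meeting the cutoff (assumes one connected region)."""
    cut = lr_threshold(conf_level)
    inside = [x for x, r in zip(grid, rel_lik) if r >= cut]
    return (min(inside), max(inside)) if inside else None


def plateau_profile(p, p_hat=0.0115):
    """Hypothetical profile whose right tail levels off at 0.5, as described for (A)."""
    if p < p_hat:
        return math.exp(-((p - p_hat) / 0.003) ** 2)
    return 0.5 + 0.5 * math.exp(-((p - p_hat) / 0.01) ** 2)


def closed_profile(p, p_hat=0.0079):
    """Hypothetical profile that falls to zero on both sides, as described for (B)."""
    return math.exp(-((p - p_hat) / 0.002) ** 2)


grid = [i / 1000 for i in range(1, 1001)]  # p from 0.001 to 1.000
interval_a = lr_interval(grid, [plateau_profile(p) for p in grid])
interval_b = lr_interval(grid, [closed_profile(p) for p in grid])

if __name__ == "__main__":
    print("95% cutoff:", round(lr_threshold(0.95), 4))
    print("plateau profile interval:", interval_a)
    print("closed profile interval:", interval_b)
```

Because the 95% cutoff is well below 0.5, a profile that never drops under 0.5 on the right keeps every value up to p = 1 inside the interval, whereas a profile that decays on both sides yields a bounded interval.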

Figure 12: Comparison of Profile Likelihood of p [For the two different ML solutions under the assumption of W/W distributions] (horizontal axis: p, from 0.00 to 0.05)

The ambiguity between the two failure modes, hence the possibility of two solutions, is easily eliminated by identifying the failure mode responsible for a few early and late failures. For example, assign
. one failure mode to the failures that occurred within the first 5 hours,
. the other failure mode to the failures occurring between 8k and 10k hours;
then the plot of the MLE of Cdf of the GLFP consists of curves that look like either curves (A) or curves (B). However, the fewer the failure modes that are identified, the more likely that nonconvergence in the ML maximization will arise, especially when the starting point in the ML maximization is far from the actual maximum in the log-likelihood space. Convergence might also occur due to a false or local maximum corresponding to MLE that results in a model that does not fit the data very well.
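A small sketch can show why identifying the failure mode of a few units removes the ambiguity. It assumes the GLFP Cdf F(t) = 1 - [1 - p F1(t)][1 - F2(t)] with Weibull modes in log-location-scale form, and derives the likelihood contributions from that form; the paper's own formulas are given in an earlier section and may be written differently. Parameter values are those read from table 5 for the W/W case, and all function names are hypothetical.

```python
import math


def _z(t, mu, sigma):
    return (math.log(t) - mu) / sigma


def weibull_cdf(t, mu, sigma):
    return -math.expm1(-math.exp(_z(t, mu, sigma)))


def weibull_pdf(t, mu, sigma):
    z = _z(t, mu, sigma)
    return math.exp(z - math.exp(z)) / (sigma * t)


def contribution(t, kind, params):
    """Likelihood contribution of one unit observed at time t (assumed model form).

    kind: "mode1"    exact failure known to be Mode-1
          "mode2"    exact failure known to be Mode-2
          "unknown"  exact failure, mode not identified
          "censored" unit still unfailed at t (right censored)
    """
    mu1, sigma1, mu2, sigma2, p = params
    F1, F2 = weibull_cdf(t, mu1, sigma1), weibull_cdf(t, mu2, sigma2)
    f1, f2 = weibull_pdf(t, mu1, sigma1), weibull_pdf(t, mu2, sigma2)
    mode1 = p * f1 * (1.0 - F2)      # Mode-1 strikes first
    mode2 = f2 * (1.0 - p * F1)      # Mode-2 strikes first
    if kind == "mode1":
        return mode1
    if kind == "mode2":
        return mode2
    if kind == "unknown":
        return mode1 + mode2         # density of the GLFP Cdf
    if kind == "censored":
        return (1.0 - p * F1) * (1.0 - F2)
    raise ValueError(f"unknown kind: {kind}")


# W/W estimates as read from table 5 (assumed order: mu1, sigma1, mu2, sigma2, p).
WW_A = (6.37, 3.65, 10.62, 0.2997, 0.0115)
WW_B = (9.038, 0.120, 33.5, 5.45, 0.0079)

if __name__ == "__main__":
    t = 8000.0  # a late failure, inside the 8k to 10k hour window
    for label, prm in (("A", WW_A), ("B", WW_B)):
        for kind in ("unknown", "mode1", "mode2"):
            print(label, kind, contribution(t, kind, prm))
```

With the mode unidentified, a late failure contributes the sum of the two terms, which is of similar size under both parameter sets. Once the failure is labeled, only one term remains, and under these assumptions a late failure labeled Mode-2 favors set (A) while one labeled Mode-1 favors set (B), so the two maxima are no longer equally supported.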

ACKNOWLEDGMENT

We are pleased to thank Luis A. Escobar, Steve Vardeman, and Eric Salisbury for helpful comments & suggestions on an earlier version of this paper. We are also pleased to thank Dr. Duncan J. Murdoch for providing a preprint of his paper and for answering our questions about his s-confidence interval scheme.

Computing for the research reported in this paper was done using equipment purchased with funds provided by an NSF SCREMS grant award DMS 9707740 to Iowa State University.

REFERENCES

[1] R.C.H. Cheng, T.C. Iles, "Confidence bands for cumulative distribution functions of continuous random variables", Technometrics, vol 25, 1983, pp 77 - 86.

[2] N. Doganaksoy, "Likelihood ratio confidence intervals in life-data analysis", Recent Advances in Life-Testing and Reliability, 1995, chapter 20; CRC Press.

[3] N. Doganaksoy, J. Schmee, "Comparisons of approximate confidence intervals for distributions used in life-data analysis", Technometrics, vol 35, 1993, pp 175 - 184.

[4] K.T. Fang, S. Kotz, K.W. Ng, Symmetric Multivariate and Related Distributions, 1990; Chapman & Hall.

[5] T. Gastaldi, "Improved maximum likelihood estimation for component reliabilities with Miyakawa-Usher-Hodgson-Guess' estimators under censored search for the cause of failure", Statistics & Probability Letters, vol 19, 1994, pp 5 - 18.

[6] I.B. Gertsbakh, L. Friedman, "Maximum likelihood estimation in a minimum-type model with exponential and Weibull failure modes", J. American Statistical Assoc, vol 75, 1980 Jun, pp 460 - 465.

[7] M.F. Guess, J.S. Usher, T.J. Hodgson, "Estimating system and component reliabilities under partial information on cause of failure", J. Statistical Planning and Inference, vol 29, 1991, pp 75 - 85.

[8] J.F. Lawless, Statistical Models and Methods for Lifetime Data, 1982; John Wiley & Sons.

[9] W.Q. Meeker, "Limited failure population life tests: Application to integrated circuit reliability", Technometrics, vol 29, 1987 Feb, pp 51 - 65.

[10] W.Q. Meeker, L.A. Escobar, Statistical Methods for Reliability Data, 1998; John Wiley & Sons.

[11] D.J. Murdoch, "Random walk approximation of confidence intervals", Quality Improvement Through Statistical Methods, 1998, pp 393 - 404; Birkhauser.

[12] W. Nelson, Applied Life Data Analysis, 1982; John Wiley & Sons.

[13] G. Ostrouchov, W.Q. Meeker, "Accuracy of approximate confidence bounds computed from interval censored Weibull and lognormal data", J. Statistical Computation and Simulation, vol 29, 1988, pp 43 - 76.

[14] J.S. Usher, T.J. Hodgson, "Maximum likelihood analysis of component reliability using masked system life-test data", IEEE Trans. Reliability, vol 37, 1988 Dec, pp 550 - 555.

[15] S.A. Van der Wiel, W.Q. Meeker, "Accuracy of approximate confidence bounds using censored Weibull regression data from accelerated life tests", IEEE Trans. Reliability, vol 39, 1990 Aug, pp 346 - 351.

AUTHORS

Victor Chan; Dep't of Statistics; Iowa State University; Ames, Iowa 50011 USA. Internet (e-mail): [email protected]

Victor Chan is a PhD candidate in Statistics at Iowa State University. He has a BA (1992) from Whitman College, an MS (1995) in Atmospheric Science from State University of New York at Stony Brook and an MS (1997) in Statistics from Iowa State University. His research interests include accelerated testing, stochastic modeling of degradation due to environmental effects and applied probability.

Dr. William Q. Meeker; Dep't of Statistics; Iowa State University; Ames, Iowa 50011 USA. Internet (e-mail): [email protected]

William Q. Meeker is a Professor of Statistics and Distinguished Professor of Liberal Arts and Sciences at Iowa State University. He holds a BS (1972) from Clarkson University and MS (1973) and PhD (1975) from Union College. He has consulted extensively on problems in reliability data analysis, reliability test planning, accelerated testing, and statistical computing, for companies such as AT&T Bell Laboratories, General Electric Corporate Research and Development, and the Ford Motor Company. He is a Fellow of the American Statistical Association and a past Editor of Technometrics. He is an Associate Editor for International Statistical Review. He has won the American Society for Quality (ASQ) Youden prize twice and the ASQ Wilcoxon Prize twice. He is co-author of 3 books, 5 book chapters, and of numerous publications in the engineering and statistical literature.

Manuscript TR1998-076 received: 1998 May 21; revised: 1999 September 11

Responsible editor: H. Ayhan

Publisher Item Identifier S 0018-9529(99)10312-9
