State Probability of a Series-parallel Repairable System
with Two-types of Failure States
Gregory Levitin
Reliability Department, Planning, Development and Technology Division,
Israel Electric Corporation Ltd., P.O. Box 10, Haifa, 31000 Israel
Tieling Zhang, Min Xie
Department of Industrial and Systems Engineering
National University of Singapore, Singapore 117576
ABSTRACT
This paper presents a method for the analysis of series-parallel safety-critical systems
in which the system states can be divided into failure-safe and failure-dangerous. The
method combines a Markov chain model with the universal generating function technique. In the
model considered, both periodic inspection and repair (perfect and imperfect) of system
elements are taken into account. The system state distributions and the overall system
safety function are derived based on the developed model. The proposed method is
applicable to complex systems for analyzing state distributions and it is also useful in
decision-making such as determining the optimal proof-test interval or repair resource
allocation. An illustrative example is given.
Keywords: Availability, Safety-critical system, Markov model, Universal generating function,
Periodic inspection, Failure-safe, Failure-dangerous
1. Introduction
Safety is of paramount concern for large and complex systems such as nuclear power and
chemical processing plants, aircraft navigation control system, power transmission and
high speed railway networks, and so on. The complexity of large systems raises many
important problems concerning safety such that it may be very difficult or even impossible
to ensure that the systems will always behave as expected under all foreseeable conditions.
Dangerous faults may be caused by not only random hardware failure but also systematic
faults inadvertently designed into the system. Safety analysis or risk assessment for such a
system thus becomes a complex problem that involves study of human factors (human
error), production process, manufacturing control, on-line measurement or test and repair,
diagnosis with periodic inspections and so on. See Dominguez-Garcia et al. (2006), DeLong
et al. (2005), Cowing et al. (2004), Marseguerra et al. (2004), Burgazzi (2003) for some
related discussions on the recent reliability related research for safety-critical systems.
The use of safety-critical systems represents taking proactive measures to prevent a
process plant from occurrence of dangerous events. For example, emergency shutdown
controllers are widely used in chemical processing industry. Their function is to monitor a
plant process and to identify if the process is operating within the acceptable limits. If the
process moves outside of an acceptable operation range, the controller automatically shuts
the process down in a safe manner (Bukowski, 2001). To analyze safety-critical systems
properly, dangerous and non-dangerous failures should be distinguished; they
correspond to the failure-dangerous and failure-safe states of the system, respectively.
The international standard IEC 61508 (1998) includes two frameworks: One is risk
reduction with Safety-Related System (SRS) and the other is the Overall Safety Life-cycle.
Since its publication, it has been widely adopted in various safety related studies and
applications (see, e.g., Faller, 2004, Hokstad and Corneliussen, 2004, Zhang et al., 2003,
Nunns, 2000, and Knegtering, 1999). A typical architecture of SRS is regarded to consist
of components with diagnosis and periodic inspection, where the failures in each
component are classified into detectable and undetectable. There are a number of studies
on safety-critical systems which correspond to different specific system structures, see,
e.g., some recent references such as Kang and Jang (2006), Kim et al. (2005), Weber et al.
(2005), Lee et al. (2004), Latif-Shabgahi (2004) and Son and Seong (2003).
Periodic inspection is important for safety-critical systems and it has been studied in
reliability analysis in general (see, e.g., Cui et al., 2004, Biswas, 2003, Bris et al., 2003,
Bukowski, 2001). In various studies of safety-critical system performance, the effects of
periodic inspection have been either ignored or modeled by assigning much longer average
repair times to unrecognized degraded states (Zhang et al., 2003 or 2006). In practice, an
unrecognized fault cannot be repaired until the next periodic inspection (proof-test), so
the repair of this kind of fault is carried out at predetermined times. However, only very
few studies have addressed this problem. Bukowski (2001) gives a method of
incorporating periodic inspection and repair into Markov model in which both perfect and
imperfect inspection and repair can be modeled. However, in Bukowski (2001), the
situation that both unrecognized and recognized degraded states may exist simultaneously
was not included in the Markov model. As an unrecognized failure can only be found at
periodic inspection, the two kinds of faults can coexist for some period of time.
The purpose of this paper is to present a method for evaluating the probabilities of
failure-safe and failure-dangerous states for arbitrary complex series-parallel systems with
imperfect diagnostics and imperfect periodic inspections and repairs of elements. Each
kind of element failure, whether failure-safe or failure-dangerous, can be either
detected or undetected. The emphasis is on exact state probability or availability of such a
system. See Bowles and Dobbins (2004), Chandrasekhar et al. (2004) and Carrasco (2004)
for some related study of other systems.
The remainder of this paper presents a Markov model for determining the state
distribution of a single system element, the universal generating function technique for
determining the state distribution of the entire system, and an illustrative example.
Acronyms & Notations
FD failure-dangerous state
FS failure-safe state
W operational state
G set of states of element (system): G = {W, FS, FD}
$\varphi$ structure function
$\varphi_{par}$ structure function for elements connected in parallel
$\varphi_{ser}$ structure function for elements connected in series
$S_j$ random discrete state variable of element j
$s_{jk}$ k-th realization of $S_j$: $s_{jk} \in G$
Fd detected failure
Fu undetected failure
FDd detected failure-dangerous
FDu undetected failure-dangerous
FSd detected failure-safe
FSu undetected failure-safe
pfd probability of failure on demand
pfdD probability of failure-dangerous on demand
pfdS probability of failure-safe on demand
$\Lambda$ system transition rate matrix
$\mathbf{0}_k$ zero column vector of size $k \times 1$
$\mathbf{1}_k$ unit column vector of size $k \times 1$
PW(t), PFS(t), PFD(t) probability that a subsystem or the entire system is in state W, FS, FD at time t
$\lambda_{sd}, \lambda_{dd}, \lambda_{du}, \lambda_{su}$ failure rate of FSd, FDd, FDu, FSu
$\mu_{sd}, \mu_{dd}, \mu_{du}, \mu_{su}$ repair rate of FSd, FDd, FDu, FSu
d fraction of detected failures that are detected correctly
TI proof-test interval
Assumptions
1. System is composed of elements and each element can experience two categories of
failures: Dangerous and non-dangerous, corresponding respectively to failure-
dangerous and failure-safe events. Failure-dangerous and failure-safe events are
independent.
2. Both categories of failures can be detected and undetected.
3. Detected and undetected failures constitute independent events.
4. Failure rates for both kinds of failures are constant.
5. The element is in operation state if no failure event (detected or undetected) has
occurred.
6. The element is in failure-safe state if at least one non-dangerous failure (detected or
undetected) has occurred and no dangerous failure has occurred.
7. The element is in failure-dangerous state if at least one dangerous failure (detected or
undetected) has occurred.
8. The elements are independent and can undergo periodic inspections at different times.
9. The state of any composition of elements is unambiguously defined by the states of
these elements and the nature of elements interaction in the system.
10. The elements’ interaction is represented by series-parallel block diagram.
2. State distribution of single system element
According to IEC 61508, the typical system structure is composed of elements to which
diagnosis and periodic inspection and repair are applied. Failure-safe or failure-dangerous
events can occur independently. The failure category depends on the effects of a fault
occurrence. For example, if a failure results in shutdown of a properly operating process, it
is of the failure-safe (FS) type. This type of failure is variously referred to as a
false trip or a false alarm. However, if a safety-critical system fails to act when it
is required to shut down a process, the consequences can be hazardous, as with the failure of a
monitor that is used to control an important process. This type of failure is generally
called failure-dangerous (FD).
Both FS and FD events can be detected or undetected. A detected failure is one that is
revealed instantly by diagnostic devices. An imperfect diagnosis model presumes that only a
fraction d of detected failures is revealed instantaneously by the diagnostic devices.
Whenever a failure of this kind is detected, on-line repair is initiated. Failures
that cannot be detected by the diagnostic devices, or that remain undetected because of
imperfect diagnosis, are considered to be undetected failures. These failures can be found
only by the proof-test (periodic inspection) at the end of a proof-test interval. We
assume that the failure rates of detected failure-safe and failure-dangerous events ($\lambda_{sd}$ and $\lambda_{dd}$,
respectively) as well as undetected failure-safe and failure-dangerous events ($\lambda_{su}$ and $\lambda_{du}$,
respectively) can be calculated or elicited from tests.
The state of any single element can be represented as combination of two independent
states corresponding to detected and undetected failures. Each of the two failures can be in
three different states of no failure (state W), failure of category FS and failure of category
FD. According to assumptions 5-7, the state of each element can be determined based on
each combination of states of failures using Table 1.
Table 1. States of single element.
The state of each element j can be represented by a discrete random variable Sj that
takes values from the set G = {W, FS, FD}. In order to obtain the element state
distribution pjW = Pr(Sj = W), pjFS = Pr(Sj = FS) and pjFD = Pr(Sj = FD), one should
sum the probabilities of all combinations of states of detected and undetected
failures that result in the element states W, FS and FD, respectively. Based on element
state transition analysis, one can obtain the Markov state transition diagram presented in
Fig. 1. In this diagram, each possible combination of the states of detected and undetected
failures (marked inside the circles) belongs to one of the three sets corresponding to the three
different states of the element defined according to Table 1.
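The aggregation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `element_state` and `element_distribution` are hypothetical, and the nine combination probabilities are made-up numbers, not results from the paper.

```python
from itertools import product

# Sub-states of the detected and undetected failure processes (Table 1).
DETECTED = ("W", "FSd", "FDd")
UNDETECTED = ("W", "FSu", "FDu")


def element_state(detected: str, undetected: str) -> str:
    """Map one combination of failure sub-states to the element state (Table 1)."""
    if detected.startswith("FD") or undetected.startswith("FD"):
        return "FD"  # assumption 7: any dangerous failure dominates
    if detected.startswith("FS") or undetected.startswith("FS"):
        return "FS"  # assumption 6: a safe failure and no dangerous failure
    return "W"       # assumption 5: no failure at all


def element_distribution(combo_probs: dict) -> dict:
    """Sum the combination probabilities into pW, pFS and pFD."""
    dist = {"W": 0.0, "FS": 0.0, "FD": 0.0}
    for (det, undet), prob in combo_probs.items():
        dist[element_state(det, undet)] += prob
    return dist


# Hypothetical probabilities for the nine combinations (they sum to 1).
combo_probs = {combo: 0.0 for combo in product(DETECTED, UNDETECTED)}
combo_probs[("W", "W")] = 0.90
combo_probs[("FSd", "W")] = 0.03
combo_probs[("W", "FSu")] = 0.02
combo_probs[("FSd", "FSu")] = 0.01
combo_probs[("W", "FDu")] = 0.02
combo_probs[("FDd", "W")] = 0.01
combo_probs[("FDd", "FSu")] = 0.01

dist = element_distribution(combo_probs)
print(dist)
```

Keying the probabilities by the (detected, undetected) pair rather than by a state number keeps the sketch independent of any particular numbering of the nine Markov states.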
In practice, no repair action is applied to an undetected failure until the next proof-test.
In general, the periodic inspection and repair take a very short time compared with the
proof-test interval TI, and the whole system stops operation (is in the down state) during the
process of periodic inspection and repair. Therefore, it is reasonable to set the repair rates for
undetected failures to $\mu_{du} = \mu_{su} = 0$ when analyzing the behavior of a safety-critical system
within the proof-test interval (unlike the equivalent repair rates $\mu_{du}$ and $\mu_{su}$ used in Zhang et
al., 2003).
                      Detected failure
Undetected failure    W     FSd    FDd
W                     W     FS     FD
FSu                   FS    FS     FD
FDu                   FD    FD     FD
Fig. 1. Markov state transition diagram used for calculating state distribution of a single element.
According to Fig. 1, the following system of equations describes the element's
behavior:
$$\dot{P}_j(t) = P_j(t)\,\Lambda_j \tag{1}$$
where $P_j(t) = (p_{j1}(t), p_{j2}(t), \ldots, p_{j9}(t))$ is the vector of state probabilities, $\dot{P}_j(t)$ is the derivative of $P_j(t)$
with respect to t, and $\Lambda_j$ is the transition rate matrix (see Appendix). According to Table 1, state
1 in the Markov diagram corresponds to state W of the element, states 2 - 4 correspond to
state FS of the element and states 5 - 9 correspond to state FD of the element. Having the
solution $P_j(t)$ of Eq. (1) for any element j, one can obtain pjW = pj1, pjFS = pj2 + pj3 + pj4 and
pjFD = pj5 + pj6 + pj7 + pj8 + pj9. The solution of Eq. (1) can be expressed as
$$P_j(t) = P_j(0)\exp(\Lambda_j t), \quad t \ge 0; \tag{2}$$
$$P_j(t) = P_j(nT_I^{+})\exp(\Lambda_j (t - nT_I)), \quad nT_I^{+} \le t < (n+1)T_I^{+}, \quad n = 0, 1, 2, \ldots$$
To account for imperfect inspection and repair, an undetected fault may not be restored to
as-good-as-new condition, and some faults may still exist after inspection and repair. A matrix Mji is used to
describe this behavior. Each element of the matrix Mji gives the probability of transition
from one state to another at the ith proof-test. Thus, we have
$$P_j(T_I^{+}) = P_j(T_I)\,M_{j1} = P_j(0)\exp(\Lambda_j T_I)\,M_{j1} \tag{3}$$
$$P_j(2T_I^{+}) = P_j(2T_I)\,M_{j2} = P_j(0)\exp(\Lambda_j T_I)\,M_{j1}\exp(\Lambda_j T_I)\,M_{j2}$$
$$\begin{aligned}
P_j(nT_I^{+}) &= P_j(nT_I)\,M_{jn} \\
&= P_j((n-1)T_I^{+})\exp(\Lambda_j T_I)\,M_{jn} \\
&= P_j((n-2)T_I^{+})\exp(\Lambda_j T_I)\,M_{j(n-1)}\exp(\Lambda_j T_I)\,M_{jn} \\
&= P_j(0)\exp(\Lambda_j T_I)\,M_{j1}\exp(\Lambda_j T_I)\,M_{j2}\cdots\exp(\Lambda_j T_I)\,M_{j(n-1)}\exp(\Lambda_j T_I)\,M_{jn}, \quad n = 1, 2, 3, \ldots
\end{aligned} \tag{4}$$
In Eq. (4), n is the index of the proof-test interval and Mji (i = 1, 2, 3, …, n) is the matrix
associated with the ith proof-test.
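The recursion of Eq. (4) can be illustrated with a small pure-Python sketch. It is not the nine-state model of Fig. 1: a three-state toy element is assumed for brevity, and all rates, the interval length and the inspection matrix are illustrative values chosen here. The helper names (`expm`, `propagate`) are hypothetical, and the matrix exponential is computed by a simple scaling-and-squaring Taylor scheme rather than by a library routine.

```python
def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[k] * m[k][j] for k in range(len(v))) for j in range(len(m[0]))]


def expm(a, terms=18):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = len(a)
    norm = max(sum(abs(x) for x in row) for row in a)
    squarings = 0
    while norm > 0.5:  # scale the matrix until its norm is small
        norm /= 2.0
        squarings += 1
    scaled = [[x / 2.0 ** squarings for x in row] for row in a]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):  # accumulate scaled^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):  # undo the scaling
        result = mat_mul(result, result)
    return result


def propagate(p0, rate_matrix, interval, inspection_matrices):
    """Return [P(T_I+), P(2T_I+), ...] by repeating P <- P exp(Lambda T_I) M_n."""
    step = expm([[x * interval for x in row] for row in rate_matrix])
    history, p = [], list(p0)
    for m in inspection_matrices:
        p = vec_mat(vec_mat(p, step), m)
        history.append(p)
    return history


# Toy 3-state element (all numbers are illustrative assumptions, rates per hour):
# state 0 = working, 1 = undetected failure, 2 = detected failure under repair.
lam_u, lam_d, mu_d = 1e-5, 2e-5, 0.25
rate_matrix = [
    [-(lam_u + lam_d), lam_u, lam_d],
    [0.0, 0.0, 0.0],  # no repair of undetected failures between proof-tests
    [mu_d, 0.0, -mu_d],
]
# Imperfect proof-test: an undetected failure is removed with probability 0.9.
m_inspect = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [1.0, 0.0, 0.0]]

history = propagate([1.0, 0.0, 0.0], rate_matrix, 8760.0, [m_inspect] * 3)
for n, p in enumerate(history, start=1):
    print(n, [round(x, 6) for x in p])
```

The same inspection matrix is reused for all three proof-tests only to keep the sketch short; Eq. (4) allows a different matrix at each proof-test.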
3. State distribution of the entire series-parallel system
In order to obtain the state distribution of the entire system, the procedure used in this paper is based on the universal generating function (u-function) technique. This method was introduced in Ushakov (1987) and has proven to be very effective for the reliability evaluation of different types of multi-state systems, see Levitin et al. (1998) and Lisnianski and Levitin (2003). A comprehensive description of the method and its numerous applications in reliability engineering can be found in Levitin (2005). For some recent and related applications, see e.g., Levitin (2004 and 2005), and Korczak et al. (2006).
The u-function of a discrete random variable Y is defined as a polynomial
$$u(z) = \sum_{k=1}^{K} q_k z^{y_k} \tag{5}$$
where the variable Y has K possible values and qk is the probability that Y takes the value
of yk. In our case, the polynomial u(z) can define state distributions, i.e. it represents all of
the possible mutually exclusive states of the element (or any subsystem) by relating the
probabilities of each state to the value that takes the random state variable corresponding
to this element (subsystem) in that state. Note that the performance distribution of the
basic element j (probability mass function of discrete random variable Sj) can now be
represented as
$$u_j(z) = \sum_{k=1}^{3} p_{jk} z^{s_{jk}}, \tag{6}$$
where $p_{jk} = \Pr(S_j = s_{jk})$ and sj1 = FD, sj2 = FS, sj3 = W for any j.
To obtain the u-function of a subsystem consisting of two elements, composition
operators are introduced. These operators determine the u-function for two elements
connected in parallel and in series, respectively, using simple algebraic operations on the
individual u-functions of basic elements. All the composition operators take the form
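$$u_j(z) \otimes_{\varphi} u_i(z) = \sum_{k=1}^{3} p_{jk} z^{s_{jk}} \otimes_{\varphi} \sum_{h=1}^{3} p_{ih} z^{s_{ih}} = \sum_{k=1}^{3}\sum_{h=1}^{3} p_{jk}\,p_{ih}\,z^{\varphi(s_{jk},\,s_{ih})}. \tag{7}$$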
The obtained u-function relates the probability of each combination of states of the
independent elements (which is equal to the product of the probabilities of these states) to
the value that the random state variable of the entire subsystem takes when this
combination is realized. The function $\varphi(\cdot)$ in the composition operators expresses the
dependence of the entire subsystem state on the states of both of its elements. The
definition of the function $\varphi(\cdot)$ strictly depends on the physical nature of the system and on
the nature of the interaction of the system elements.
The structure functions for pairs of elements connected in parallel and in series should
be defined for any specific application based on analysis of system functioning. For
example, in the widely applied conservative approach the following assumptions are
made. Any subsystem consisting of two parallel elements is in the failure-dangerous state if at
least one of the elements is in the failure-dangerous state; otherwise, it is in the operational state if at least one
of the elements is in the operational state. In the remaining cases, the subsystem is in the failure-safe
state. This can be expressed by the structure function $\varphi_{par}(\cdot)$ presented in Table 2. A
subsystem consisting of two elements connected in series is in the operational state if both
of the elements are in the operational state, whereas it is in the failure-dangerous state if at
least one of the elements is in the failure-dangerous state. In the remaining cases, the subsystem is in the
failure-safe state. This can be expressed by the structure function $\varphi_{ser}(\cdot)$ presented in Table
3.
Table 2. Structure function for pair of elements connected in parallel.
                Element 1
Element 2       W     FS    FD
W               W     W     FD
FS              W     FS    FD
FD              FD    FD    FD
Table 3. Structure function for pair of elements connected in series.
                Element 1
Element 2       W     FS    FD
W               W     FS    FD
FS              FS    FS    FD
FD              FD    FD    FD
In the numerical realization of the composition operator in Eq. (7), we can encode the
states W, FS and FD by the integers 3, 2 and 1, respectively, so that sjk = k for any j
and k = 1, 2, 3. With this encoding, the functions $\varphi_{par}(\cdot)$
and $\varphi_{ser}(\cdot)$ defined above take the form:
$$\varphi_{par}(s_{jk}, s_{ih}) = \begin{cases} 1, & \min(s_{jk}, s_{ih}) = 1, \\ \max(s_{jk}, s_{ih}), & \text{otherwise}, \end{cases} \quad \text{and} \quad \varphi_{ser}(s_{jk}, s_{ih}) = \min(s_{jk}, s_{ih}).$$
Note that the nine possible different combinations of element states produce only three
possible states of the subsystem. The probabilities of combinations that produce the same
subsystem state should be summed in order to obtain this state probability. This can be
done by collecting terms with equal exponents in the u-function obtained by Eq. (7).
Finally, any subsystem state distribution can be represented by the u-function taking the
form of Eq. (6).
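A minimal sketch of the pairwise composition may help here. It assumes the integer encoding given above (FD = 1, FS = 2, W = 3) and stores a u-function as a dictionary from state to probability; the names `compose`, `phi_par` and `phi_ser` are hypothetical, and the two element distributions are made-up numbers used only for illustration.

```python
from collections import defaultdict

# Integer encoding from the text: FD = 1, FS = 2, W = 3.
FD, FS, W = 1, 2, 3


def phi_ser(a, b):
    """Series pair: the worse of the two states decides."""
    return min(a, b)


def phi_par(a, b):
    """Parallel pair: FD if any element is FD, otherwise the better state."""
    return FD if min(a, b) == FD else max(a, b)


def compose(u1, u2, phi):
    """Composition operator in the spirit of Eq. (7) for two independent elements."""
    out = defaultdict(float)
    for s1, p1 in u1.items():
        for s2, p2 in u2.items():
            # Collect terms with equal exponents (same resulting state).
            out[phi(s1, s2)] += p1 * p2
    return dict(out)


# Hypothetical element state distributions.
u_a = {W: 0.90, FS: 0.06, FD: 0.04}
u_b = {W: 0.80, FS: 0.15, FD: 0.05}

u_series = compose(u_a, u_b, phi_ser)
u_parallel = compose(u_a, u_b, phi_par)
print("series:  ", u_series)
print("parallel:", u_parallel)
```

Both results have the same failure-dangerous probability, 1 - 0.96 * 0.95 = 0.088, because under the conservative approach a dangerous failure of either element makes the pair failure-dangerous in both connections.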
Any subsystem consisting of two elements can be further treated as a single equivalent
element with a performance distribution that is equal to the performance distribution of
this subsystem. Consecutively applying the composition operators and replacing pairs of
elements by equivalent elements, one can obtain the u-function representing the
performance distribution of the entire system.
The recursive algorithm
The following recursive algorithm obtains the u-function that represents the entire system
state distribution:
Step 1. Obtain the state probabilities for each element j using the Markov
transition diagram method presented in Section 2.
Step 2. Define the u-functions uj(z) for each element j using Eq. (6).
Step 3. If the system contains a pair of elements connected in parallel or in
series, replace this pair with an equivalent element whose u-function is obtained
by the operator of Eq. (7) with the structure function $\varphi_{par}(\cdot)$ or $\varphi_{ser}(\cdot)$,
respectively.
Step 4. If the system contains more than one element, return to Step 3.
Otherwise, the algorithm stops.
The coefficients of the obtained u-function are equal to probabilities of operational,
failure-safe and failure-dangerous states of the entire system.
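The steps above can be sketched as a short recursive reduction in Python. The nested-tuple representation of the block diagram, the function names and the element probabilities are assumptions made for this illustration. In particular, the example structure (two fuel supply elements in parallel, in series with a turbine, and two such units combined in parallel) is only one plausible reading of a two-unit plant and is not taken from Fig. 2.

```python
from functools import reduce

# Integer encoding from the text: FD = 1, FS = 2, W = 3.
FD, FS, W = 1, 2, 3

# Structure functions for the conservative approach (Tables 2 and 3).
PHI = {
    "ser": lambda a, b: min(a, b),
    "par": lambda a, b: FD if min(a, b) == FD else max(a, b),
}


def compose(u1, u2, phi):
    """Pairwise composition; u-functions are dicts {state: probability}."""
    out = {}
    for s1, p1 in u1.items():
        for s2, p2 in u2.items():
            s = phi(s1, s2)
            out[s] = out.get(s, 0.0) + p1 * p2
    return out


def u_function(node):
    """Reduce a nested ("ser" | "par", child, child, ...) structure to one u-function."""
    if isinstance(node, dict):  # a basic element: already a u-function
        return node
    kind, *children = node
    return reduce(lambda a, b: compose(a, b, PHI[kind]), map(u_function, children))


# Hypothetical element distributions at one fixed time instant.
fuel = {W: 0.95, FS: 0.03, FD: 0.02}
turbine = {W: 0.97, FS: 0.02, FD: 0.01}

unit = ("ser", ("par", fuel, fuel), turbine)
system = ("par", unit, unit)  # assumed connection of the two units

u_unit = u_function(unit)
u_system = u_function(system)
safety = u_system[W] + u_system[FS]  # S = P_W + P_FS
print(u_unit)
print(u_system, safety)
```

Repeating the reduction for each time point of interest, with element probabilities taken from the Markov model, gives the system state probabilities and the safety as functions of time.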
With the state probabilities of each element in the form of functions of time, one can
use the algorithm presented above to get the probability values corresponding to any given
time. Finally, the entire system state probabilities and the overall system safety (defined as
the sum of operational probability and failure-safe state probability) as functions of time
can be obtained. In the following section, we use an example to illustrate the procedure
described here.
4. Illustrative example
Consider a combined-cycle power plant with two generating units. Each unit consists of a
gas turbine block and fuel supply systems. The fuel to each turbine block can be supplied
by two parallel systems. The simplified reliability block diagram of the plant is presented
in Fig. 2. Each fuel supply system as well as each turbine can experience both safe and
dangerous failures (detected and undetected).
Fig. 2. Reliability block diagram of the combined-cycle power plant (elements 1-4: fuel supply systems; elements 5 and 6: turbine blocks)
The parameters of the fuel supply systems are: $\lambda_{sd} = 2.56 \times 10^{-5}$, $\lambda_{su} = 10^{-5}$, $\lambda_{dd} = 8.9 \times 10^{-6}$,
$\lambda_{du} = 1 \times 10^{-6}$, $\mu_{sd} = 0.25$, $\mu_{dd} = 0.0833$, $\mu_{su} = \mu_{du} = 0$; d = 0.99; TI = 1.5 years. The fuel
supply systems are statistically identical, but the inspection times of systems 2 and 4 are
shifted 0.5 year earlier relative to the inspection times of systems 1 and 3. The matrix Mji
associated with each fuel supply system is M1i (i = 1, 2, 3, 4), as shown in Eq. (A2) in the
Appendix.
The turbine blocks are also statistically identical. The parameters of the turbine blocks
are: $\lambda_{sd} = 2.56 \times 10^{-5}$, $\lambda_{su} = 6.540 \times 10^{-6}$, $\lambda_{dd} = 7.9 \times 10^{-6}$, $\lambda_{du} = 7.8 \times 10^{-7}$; $\mu_{sd} = 0.25$,
$\mu_{dd} = 0.0625$, $\mu_{su} = \mu_{du} = 0$; d = 0.99; TI = 2 years. The matrix Mji associated with each turbine
block is M2i (i = 1, 2, 3), as shown in Eq. (A3) in the Appendix.
The probabilities pjW(t), pjFS(t) and pjFD(t) for each system element, obtained by solving
Eqs. (2) and (3) over a period of 65000 hours, are presented in Figs. 3 - 5. The
probabilities PW(t), PFS(t) and PFD(t) for a single generating unit and for the
entire system (the structure functions are defined in accordance with Tables 2 and 3),
obtained using the algorithm given in Section 3, are also presented in Figs. 3
through 5. These figures show that the probabilities for a single
generating unit and for the entire system also vary periodically.
The system safety S(t) = PW(t) + PFS(t) as a function of time is presented in Fig. 6.
Fig. 3. Probabilities of working states (PW versus t in thousands of hours; curves for elements 1,3, elements 2,4, elements 5,6, a single unit and the system)
Fig. 4. Probabilities of failure-safe states (PS versus t in thousands of hours; curves for elements 1,3, elements 2,4, elements 5,6, a single unit and the system)
Fig. 5. Probabilities of failure-dangerous states (PD versus t in thousands of hours; curves for elements 1,3, elements 2,4, elements 5,6, a single unit and the system)
Fig. 6. Overall system safety (S versus t in thousands of hours)
5. Conclusions
In this paper a method is proposed for the study of series-parallel systems with
imperfect diagnostics and imperfect periodic inspections and repairs of elements. Element
failures can be failure-safe and failure-dangerous and can be either detected or undetected.
The proposed model incorporates periodic inspection and repair (both perfect and
imperfect) of system elements. The Markov model is used to determine the state
distribution of a single system element, while the universal generating function technique is used to determine the
state distribution of the entire system. The presented example shows that the procedure
can be easily implemented to estimate the state probabilities and the overall safety of a
safety-critical system.
The method presented in this paper can be applied to different research fields such as
power generation units, electronic devices and chips, data storage based on redundant
array of inexpensive disks (Katz et al., 1989; Gibson and Patterson, 1993, etc.) and so on. It
can be used for evaluating safety of a fault-tolerant single-chip multiple microprocessors
architecture (Yao, et al., 2004) which represents a promising solution to partly mitigate the
system faults and to increase the system dependability in mission-critical applications.
Acknowledgement:
This research was carried out while the first author was visiting National University of
Singapore supported by the research grant R-266-000-020-112 at National University of
Singapore. The authors would like to thank three referees for their constructive comments.
References
Biswas, A.; Sarkar, J. and Sarkar, S. (2003). Availability of a periodically inspected system, maintained under an imperfect-repair policy. IEEE Transactions on Reliability, 52 (3), 311-318.
Bowles, J.B. and Dobbins, J.G. (2004). Approximate reliability and availability models for high availability and fault-tolerant systems with repair. Quality and Reliability Engineering International, 20 (7), 679-697.
Bris, R., Chatelet, E. and Yalaoui, F. (2003). New method to minimize the preventive maintenance cost of series-parallel systems. Reliability Engineering & System Safety, 82 (3), 247-255.
Bukowski, J.W. (2001). Modeling and analyzing the effects of periodic inspection on the performance of safety-critical systems, IEEE Transactions on Reliability, 50 (2), 321 – 329.
Burgazzi, L. (2003). Reliability evaluation of passive systems through functional reliability assessment. Nuclear Technology, 144 (2), 145-151.
Carrasco, J.A. (2004). Solving large interval availability models using a model transformation approach. Computers & Operations Research, 31 (6), 807-861.
Chandrasekhar, P.; Natarajan, R. and Yadavalli, V.S.S. (2004). A study on a two unit standby system with Erlangian repair time. Asia-Pacific Journal of Operational Research, 21 (3), 271-277
Cowing, M.M.; Pate-Cornell, M.E. and Glynn, P.W. (2004). Dynamic modeling of the tradeoff between productivity and safety in critical engineering systems. Reliability Engineering & System Safety, 86 (3), 269-284.
Cui, L.R.; Loh, H.T. and Xie, M. (2004). Sequential inspection strategy for multiple systems under availability requirement. European Journal of Operational Research, 155 (1), 170-177.
DeLong, T.A.; Smith, D.T. and Johnson, B.W. (2005). Dependability metrics to assess safety-critical systems. IEEE Transactions on Reliability, 54, 498-505.
Dominguez-Garcia, A.D.; Kassakian, J.G. and Schindall, J.E. (2006). Reliability evaluation of the power supply of an electrical power net for safety-relevant applications. Reliability Engineering & System Safety, 91, 505-514.
Faller, R. (2004). Project experience with IEC 61508 and its consequences. Safety Science, 42 (5), 405-422.
Gibson G. A. and Patterson D.A. (1993). Designing Disk Arrays for High Data Reliability, Journal of Parallel and Distributed Computing, 17, 4 – 27.
Goble, W.M. (1998). Control Systems Safety Evaluation and Reliability, 2nd ed: ISA.
Hokstad, P. and Corneliussen, J. (2004). Loss of safety assessment and the IEC 61508 standard. Reliability Engineering & System Safety, 83 (1), 111-120.
IEC 61508 (1998). Functional safety of electric/electronic/programmable electronic safety-related systems, Parts. 1–7, October 1998–May 2000.
Inagaki, T. and Ikebe, Y. (1989). Performance analysis of a safety monitoring system under human-machine interface of safety-presentation type, Microelectronics and Reliability, 29 (2), 1989, 165 – 175.
Kang, H.G. and Jang, S.C. (2006). Application of condition-based HRA method for a manual actuation of the safety features in a nuclear power plant. Reliability Engineering & System Safety, 91, 627-633.
Katz R.H.; Gibson G.A. and Patterson D. (1989). Disk System Architectures for High Performance Computing, Proceedings of the IEEE, 77, No. 12, pp. 1842 – 1858.
Kim, H.; Lee, H. and Lee, K. (2005). The design and analysis of AVTMR (all voting triple modular redundancy) and dual-duplex system. Reliability Engineering & System Safety, 88, 291-300.
Korczak, E.; Levitin, G. and Ben-Haim, H. (2005). Survivability of series-parallel systems with multilevel protection. Reliability Engineering & System Safety, 66, 45-54.
Knegtering, B. and Brombacher, A.C. (1999). Application of micro Markov models for quantitative safety assessment to determine safety integrity levels as defined by the IEC 61508 standard for functional safety. Reliability Engineering & System Safety, 66 (2), 171-175.
Latif-Shabgahi, G.; Bass, J.M. and Bennett, S. (2004). Taxonomy for software voting algorithms used in safety-critical systems. IEEE Transactions on Reliability, 53 (3), 319-328.
Lee, D.Y.; Han, J.B. and Lyou, J. (2004). Reliability analysis of the reactor protection system with fault diagnosis. Key Engineering Materials, 270, 1749-1754.
Levitin, G. (2004). A universal generating function approach for the analysis of multi-state systems with dependent elements. Reliability Engineering & System Safety, 66, 285-292.
Levitin, G. (2005). Uneven allocation of elements in linear multi-state sliding window system. European Journal of Operational Research, 163, 418-433.
Levitin G.; Lisnianski A.; Ben-Haim H. and Elmakis, D. (1998). Redundancy optimization for series-parallel multi-state systems, IEEE Transactions on Reliability, 47 (2), 165-172.
Lisnianski, A. and Levitin, G. (2003). Multi-state System Reliability, World Scientific, Singapore.
Levitin, G. (2005). The Universal Generating Function in Reliability Analysis and Optimisation. Springer-Verlag: Berlin, Springer Series in Reliability Engineering.
Marseguerra, M.; Zio, E. and Podofillini, L. (2004). A multiobjective genetic algorithm approach to the optimization of the technical specifications of a nuclear safety system. Reliability Engineering & System Safety, 84 (1), 87-99.
Nunns, S.R. (2000). Conformity assessment of safety related systems to IEC 61508 - the CASS initiative. Computing & Control Engineering Journal, 11 (1), 33-39.
Olbrich, T; Richardson, A.M.D. and Bradley, D.A. (1996). Built-in self-test and diagnostic support for safety critical Microsystems, Microelectronics and Reliability, 36, 1125– 1136.
Son, H.S. and Seong, P.H. (2003). Development of a safety critical software requirements verification method with combined CPN and PVS: a nuclear power plant protection system application. Reliability Engineering & System Safety, 80 (1), 19-32.
Ushakov I., (1987). Optimal standby problems and a universal generating function, Soviet Journal of Computer System Science, 25, 79-82.
Wang, D. and Inagaki, T. (1994).Time-dependent optimality of an alarm subsystem, Microelectronics and Reliability, 34, 1623 – 1633.
Weber, W.; Tondok, H. and Bachmayer, M.B. (2005). Enhancing software safety by fault trees: experiences from an application to flight critical software. Reliability Engineering & System Safety, 89, 57-70.
Yao, W.B.; Wang D.S. and Zheng W.M. (2004). A Fault-tolerant Single-chip Multiprocessor, ACSAC 2004 Proceedings of Advances in Computer Systems Architecture: 9 th Asia-Pacific Conference, Pen-Cheng Yew and Jingling Xue (eds.), Berlin: Springer, 2004, p. 137-145.
Zhang, T.L.; Long, W. and Sato, Y. (2003). Availability of systems with self-diagnostic components—applying Markov model to IEC 61508-6, Reliability Engineering & System Safety, 80, 133 – 141.
Zhang, T.L.; Xie, M. and Horigome, M. (2006). Availability and reliability of k-out-of-(M plus N): G warm standby systems. Reliability Engineering & System Safety, 91, 381-387.
Zhou, Z. (1987). Analysis of a two unit standby redundant fail-safe system. Microelectronics and Reliability, 27, 469 – 474.
Appendix
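The transition rate matrix for one element is
$$\Lambda_j = \begin{pmatrix}
-c & \lambda_{su} & \lambda_{sd} & \lambda_{du} & \lambda_{dd} & 0 & 0 & 0 & 0 \\
0 & -(\lambda_{sd}+\lambda_{dd}) & 0 & 0 & 0 & \lambda_{sd} & 0 & \lambda_{dd} & 0 \\
\mu_{sd} & 0 & -(\lambda_{su}+\lambda_{du}+\mu_{sd}) & 0 & 0 & \lambda_{su} & \lambda_{du} & 0 & 0 \\
0 & 0 & 0 & -(\lambda_{sd}+\lambda_{dd}) & 0 & 0 & \lambda_{sd} & 0 & \lambda_{dd} \\
\mu_{dd} & 0 & 0 & 0 & -(\lambda_{su}+\lambda_{du}+\mu_{dd}) & 0 & 0 & \lambda_{su} & \lambda_{du} \\
0 & \mu_{sd} & 0 & 0 & 0 & -\mu_{sd} & 0 & 0 & 0 \\
0 & 0 & 0 & \mu_{sd} & 0 & 0 & -\mu_{sd} & 0 & 0 \\
0 & \mu_{dd} & 0 & 0 & 0 & 0 & 0 & -\mu_{dd} & 0 \\
0 & 0 & 0 & \mu_{dd} & 0 & 0 & 0 & 0 & -\mu_{dd}
\end{pmatrix} \tag{A1}$$
where $c = \lambda_{sd} + \lambda_{dd} + \lambda_{du} + \lambda_{su}$.
The matrices M1i (i = 1, 2, 3, 4) for fuel supply system are given below. The rows of each matrix correspond to p1; p2, p3, p4; and p5, ..., p9.
$$M_{11} = \begin{pmatrix} \begin{matrix} 1 & 0 & 0 & 0 \\ 0.90 & 0.10 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0.80 & 0 & 0 & 0.20 \\ \mathbf{1}_5 & \mathbf{0}_5 & \mathbf{0}_5 & \mathbf{0}_5 \end{matrix} & \mathbf{0}_9 & \mathbf{0}_9 & \mathbf{0}_9 & \mathbf{0}_9 & \mathbf{0}_9 \end{pmatrix},$$
$$M_{12} = \begin{pmatrix} \begin{matrix} 1 & 0 & 0 & 0 \\ 0.88 & 0.12 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0.776 & 0 & 0 & 0.224 \\ \mathbf{1}_5 & \mathbf{0}_5 & \mathbf{0}_5 & \mathbf{0}_5 \end{matrix} & \mathbf{0}_9 & \mathbf{0}_9 & \mathbf{0}_9 & \mathbf{0}_9 & \mathbf{0}_9 \end{pmatrix},$$
$$M_{13} = \begin{pmatrix} \begin{matrix} 1 & 0 & 0 & 0 \\ 0.85 & 0.15 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0.747 & 0 & 0 & 0.253 \\ \mathbf{1}_5 & \mathbf{0}_5 & \mathbf{0}_5 & \mathbf{0}_5 \end{matrix} & \mathbf{0}_9 & \mathbf{0}_9 & \mathbf{0}_9 & \mathbf{0}_9 & \mathbf{0}_9 \end{pmatrix},$$
$$M_{14} = \begin{pmatrix} \begin{matrix} 1 & 0 & 0 & 0 \\ 0.808 & 0.192 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0.711 & 0 & 0 & 0.289 \\ \mathbf{1}_5 & \mathbf{0}_5 & \mathbf{0}_5 & \mathbf{0}_5 \end{matrix} & \mathbf{0}_9 & \mathbf{0}_9 & \mathbf{0}_9 & \mathbf{0}_9 & \mathbf{0}_9 \end{pmatrix}. \tag{A2}$$
The matrices M2i (i = 1, 2, 3) for turbine block are
$$M_{21} = \begin{pmatrix} \begin{matrix} 1 & 0 & 0 & 0 \\ 0.92 & 0.08 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0.85 & 0 & 0 & 0.15 \\ \mathbf{1}_5 & \mathbf{0}_5 & \mathbf{0}_5 & \mathbf{0}_5 \end{matrix} & \mathbf{0}_9 & \mathbf{0}_9 & \mathbf{0}_9 & \mathbf{0}_9 & \mathbf{0}_9 \end{pmatrix},$$
$$M_{22} = \begin{pmatrix} \begin{matrix} 1 & 0 & 0 & 0 \\ 0.804 & 0.096 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0.832 & 0 & 0 & 0.168 \\ \mathbf{1}_5 & \mathbf{0}_5 & \mathbf{0}_5 & \mathbf{0}_5 \end{matrix} & \mathbf{0}_9 & \mathbf{0}_9 & \mathbf{0}_9 & \mathbf{0}_9 & \mathbf{0}_9 \end{pmatrix},$$
$$M_{23} = \begin{pmatrix} \begin{matrix} 1 & 0 & 0 & 0 \\ 0.882 & 0.118 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0.810 & 0 & 0 & 0.190 \\ \mathbf{1}_5 & \mathbf{0}_5 & \mathbf{0}_5 & \mathbf{0}_5 \end{matrix} & \mathbf{0}_9 & \mathbf{0}_9 & \mathbf{0}_9 & \mathbf{0}_9 & \mathbf{0}_9 \end{pmatrix}. \tag{A3}$$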
Gregory Levitin received a PhD degree in Industrial Automation from Moscow Research Institute of Metalworking Machines in 1989. From 1982 to 1990 he worked as software engineer and research associate in the field of industrial automation. From 1991 to 1993 he worked at the Technion (Israel Institute of Technology) as a postdoctoral fellow at the faculty of Industrial Engineering and Management. Dr. Levitin is presently an engineer-expert at the Reliability Department of the Israel Electric Corporation and adjunct senior lecturer at the Technion. His current interests are in operations research and artificial intelligence applications in reliability and power engineering. In this field Dr. Levitin has published over 100 papers and two books. He is senior member of IEEE. He serves in editorial boards of IEEE Transactions on Reliability and Reliability Engineering and System Safety.
Tieling Zhang received a Ph.D. in engineering from Tokyo University of Mercantile Marine in 2001. He has six years’ experience of teaching, three years’ working in industry and a few years holding research positions. Currently he is with Hitachi GST, Singapore. He has 30 articles included in peer-review journals and international conference proceedings. He holds a new practical patent of China. His research interests include system reliability, maintainability and safety, system optimization and vibration control.
Min Xie received his Ph.D. in Quality Technology from Linkoping University, Sweden, in 1987. Dr Xie has been active in reliability and quality related research since then. He has authored or co-authored over 100 articles in refereed journals and 6 books, including Software Reliability Modelling by World Scientific, Statistical Models and Control Charts for High Quality Processes by Kluwer Academic Publisher, and Weibull Models by John Wiley & Sons. He is a department editor of IIE Transactions, an associate editor of IEEE Trans on Reliability, and on the editorial board of several other journals. He is a fellow of IEEE.