[ieee 2013 13th international conference on intelligent systems design and applications (isda) -...

Random Validation and Fault Detection Method in Systems Implementations

Danilo Valeros Bernardo

Db2P Research Institute Sydney, AUSTRALIA

e-mail: [email protected]

Bee Bee Chua

HCTD- The University of Technology Sydney, AUSTRALIA

e-mail: [email protected]

978-1-4799-3516-1/13/$31.00 ©2013 IEEE

Abstract- The central point of discussion in this paper is the problematic absence of a structured technique that ensures complex infrastructure implementations and software deployments make use of prior knowledge of the existing infrastructure, and apply information obtained from preceding and historical outcomes, in achieving successful validation cases. Markov process and chain validation is based on the Bayesian approach to parametric models for implementations, which can employ prior knowledge, skills and preceding outcomes for parameter estimation. This paper proposes a validation technique drawn from the Markov process and the Monte Carlo method, and presents a statistical analysis examining the effectiveness of combining Markov chains with basic random validation.

Keywords-Bayesian approach, Markov Chain Monte Carlo, Random validation

I. INTRODUCTION

Validating a technical implementation is essential for verifying the reliability of systems newly introduced into an infrastructure. It is therefore important to consider how validation can be performed more effectively and at lower cost. We argue that this can be achieved through the use of available systematic and effective methods.

However, since exhaustive validation and evaluation of implemented systems can be unavoidably expensive, organizations must make the most of their limited resources by generating validation and evaluation cases that achieve a high probability of detecting faults [5] and design anomalies.

This paper introduces Random Validation Review (RVR), a technique that is simple and easy to implement. However, RVR is often considered limited because it makes no attempt to incorporate existing, available information about the systems that need to change.

Adaptive Random Validation (ARV), introduced as an improved failure-detection version of RVR, may address this limitation. In RVR, cases are simply generated at random: for example, network traffic and data flow are evaluated randomly across disparate areas, which we refer to as zones, and the outcomes are captured and documented.

In ARV, cases are not only randomly selected but also evenly spread, for example by validating cases across multiple areas and gathering pressure points across the systems in an infrastructure. This technique explores the areas as extensively as possible to gather information across a segregated environment. The main motivation of ARV is to spread validation cases evenly in order to increase the chance of finding faults and design anomalies.

This paper proposes combining Markov chain Monte Carlo (MCMC) [8,15] with RVR, giving MCMC-RVR, to improve the failure- and anomaly-finding capability of RVR. The motivation for combining MCMC and RVR is to use a statistical model to generate the validation cases, since the probabilities of failure-causing areas are not evenly spread over the input domain, whereas ARV spreads its cases evenly. Failures attached to relatively high-probability cases will affect the stochastic process more than failures attached to lower-probability cases; the MCMC method is therefore used to find the inputs with a high probability of failure.

When a new system implementation is validated with the MCMC-RVR method, it is important to highlight that the statistical model uses prior knowledge of the existing semantics. The main benefit of MCMC [16-17] is that it allows statistical inference techniques to be used to complete the probabilistic aspects of the validation process. Validation cases are generated with the MCMC-RVR method, which derives each new validation case from the previously generated cases based on a constructed model of the system input.

The rest of this paper is organized as follows: Section II summarizes the validation methods, Section III describes the techniques, and Section IV concludes and discusses future work in the area of system validation.

II. METHODS

Random Validation Review (RVR) is regarded as a simple method. It avoids complex analysis of structures and simply selects cases from the whole input matrix randomly.
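To make the baseline concrete, the following Python sketch draws validation cases uniformly at random from an n-by-n input matrix until the first failure is found. The oracle `has_fault`, the `max_cases` budget and the encoding of inputs as (row, column) tuples are illustrative assumptions, not part of the original method description.

```python
import random
from typing import Callable, Optional, Tuple

Cell = Tuple[int, int]


def random_validation_review(
    n: int,
    has_fault: Callable[[Cell], bool],  # hypothetical oracle: executes one validation case
    max_cases: int,                     # assumed stopping condition
    rng: random.Random,
) -> Tuple[Optional[Cell], int]:
    """Plain RVR: pick cells uniformly from the whole n-by-n input matrix.

    Returns the first failure-causing cell (or None) and the number of
    cases executed. Cases are drawn with replacement and no information
    from earlier cases is used.
    """
    for executed in range(1, max_cases + 1):
        case = (rng.randrange(n), rng.randrange(n))
        if has_fault(case):
            return case, executed
    return None, max_cases
```

Because earlier outcomes are ignored, the same cell may be drawn more than once; this is the limitation that ARV and MCMC-RVR aim to address.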

In comparison, ARV improves the fault-detection capability of RVR by capitalizing on successful validation cases. ARV is based on the observation that failure-causing inputs are normally clustered together in one or more regions of the input domain. In other words, failure-causing inputs are more prominent in some areas than in others.

Consider distance-based RVR (D-RVR), the first implementation of ARV, which maintains a set of candidate validation cases $V = \{V_1, V_2, \ldots, V_k\}$ and a set of successful validation cases $S = \{S_1, S_2, \ldots, S_m\}$ [8,25]. The candidate set consists of a fixed number of randomly selected validation cases. The successful set holds all successful validation cases, which are used to guide the selection of the next case. For each candidate $V_i$, D-RVR computes $D_i$, the minimum distance between $V_i$ and the cases in $S$, and then selects the candidate with the maximum $D_i$ as the next validation case.
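A minimal sketch of the D-RVR selection rule described above follows. For illustration only, it assumes a continuous input domain (the unit square) with Euclidean distance, a hypothetical oracle `has_fault`, and a candidate-set size `k` chosen by the caller; the first case is drawn purely at random because S is still empty.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]


def d_rvr(
    has_fault: Callable[[Point], bool],  # hypothetical validation oracle
    k: int,                              # size of the candidate set V (assumed)
    max_cases: int,                      # assumed stopping condition
    rng: random.Random,
) -> Tuple[Optional[Point], int]:
    """Distance-based selection over the unit square [0, 1) x [0, 1)."""
    successful: List[Point] = []  # the set S of successful validation cases
    for executed in range(1, max_cases + 1):
        if not successful:
            case = (rng.random(), rng.random())  # S is empty: pick at random
        else:
            # Candidate set V: k fresh random cases.
            candidates = [(rng.random(), rng.random()) for _ in range(k)]
            # D_i = min distance from V_i to S; choose the V_i with maximum D_i.
            case = max(candidates,
                       key=lambda v: min(math.dist(v, s) for s in successful))
        if has_fault(case):
            return case, executed
        successful.append(case)
    return None, max_cases
```

Each step costs O(k·|S|) distance computations, which is the usual price ARV pays for spreading cases evenly.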

Another method widely considered in the research is Restricted Random Validation. It maintains only the successful set $S = \{S_1, S_2, \ldots, S_m\}$, without any candidate set $V$. It specifies exclusion zones around each successful validation case and generates validation cases one by one until a case outside all exclusion zones is found.
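The exclusion-zone rule can be sketched in the same style. The fixed exclusion radius `radius`, the Euclidean distance on the unit square, the draw limit `max_draws` and the oracle `has_fault` are all assumptions made for this illustration; published variants often shrink the zones as S grows, which this sketch does not do.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]


def restricted_random_validation(
    has_fault: Callable[[Point], bool],  # hypothetical validation oracle
    radius: float,                       # exclusion-zone radius (assumed, fixed)
    max_cases: int,                      # assumed stopping condition
    rng: random.Random,
    max_draws: int = 10_000,             # guard in case the zones cover the domain
) -> Tuple[Optional[Point], int]:
    successful: List[Point] = []  # only S is kept; there is no candidate set V
    for executed in range(1, max_cases + 1):
        # Draw one by one until a case falls outside every exclusion zone
        # (or the draw limit is hit, in which case the last draw is used).
        case = (rng.random(), rng.random())
        draws = 1
        while draws < max_draws and any(math.dist(case, s) < radius for s in successful):
            case = (rng.random(), rng.random())
            draws += 1
        if has_fault(case):
            return case, executed
        successful.append(case)
    return None, max_cases
```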

Both D-RVR and Restricted Random Validation select validation cases based on the locations of successful validation cases, and use distance to gauge whether the next validation case is sufficiently far from all the successful validation cases.

MCMC [8] is a technique for generating fair samples from a probability distribution over a high-dimensional space. It is considered simple: a Markov chain is constructed whose stationary distribution is the target distribution, and the chain is then simulated to produce samples. Samples obtained after a long simulation can be regarded as drawn from the stationary distribution, i.e., the target distribution.

The Markov chain must therefore be constructed so that its stationary distribution is the probability distribution from which the samples are to be generated.
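One standard way to build such a chain is the Metropolis algorithm. The sketch below is a generic illustration, not the specific sampler used in MCMC-RVR: it samples a finite state space whose target probabilities are known only up to a normalizing constant. The state names and weights in the usage example are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List


def metropolis_sample(
    weights: Dict[Hashable, float],  # unnormalized target probabilities pi(s)
    n_samples: int,
    burn_in: int,
    rng: random.Random,
) -> List[Hashable]:
    """Metropolis sampler over a finite state space.

    The chain's stationary distribution is `weights` normalized to sum to 1.
    """
    states = list(weights)
    current = rng.choice(states)
    samples: List[Hashable] = []
    for step in range(burn_in + n_samples):
        proposal = rng.choice(states)  # symmetric proposal: uniform over states
        # Accept with probability min(1, pi(proposal) / pi(current)); written
        # without division, and the normalizing constant cancels out.
        if rng.random() * weights[current] < weights[proposal]:
            current = proposal
        if step >= burn_in:  # discard the burn-in part of the chain
            samples.append(current)
    return samples


if __name__ == "__main__":
    draws = metropolis_sample({"I": 1.0, "II": 2.0, "III": 5.0}, 50_000, 1_000, random.Random(42))
    counts = Counter(draws)
    # Frequencies should be close to 1/8, 2/8 and 5/8.
    print({s: counts[s] / len(draws) for s in ("I", "II", "III")})
```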

III. TECHNIQUES

A. Markov Chain Monte Carlo

The posterior probability $\mathrm{Pos}_p(\theta \mid x)$ of an unknown parameter $\theta$ given data $x$ is computed from the likelihood $\mathrm{Pos}_p(x \mid \theta)$ and the prior probability $\mathrm{Pr}_p(\theta)$ using Bayes' rule [8,15,25]:

$$\mathrm{Pos}_p(\theta \mid x) = \frac{\mathrm{Pos}_p(x \mid \theta)\,\mathrm{Pr}_p(\theta)}{Z_c}, \qquad (1)$$

where $Z_c$ is the normalizing constant [8,25]:

$$Z_c = \int \mathrm{Pos}_p(x \mid \theta)\,\mathrm{Pr}_p(\theta)\,d\theta. \qquad (2)$$

As Eq. (2) shows, the normalizing constant is an integral, and it is arduous to compute when the number of dimensions is high [9].
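For intuition about Eqs. (1) and (2), the sketch below approximates the posterior of a single parameter on a grid, replacing the integral in Eq. (2) by a sum. It assumes, purely for illustration, that θ is the failure probability of one zone, that the prior is uniform, and that the data are k failures in n validation cases (a binomial likelihood). With d parameters the grid would need `grid_size ** d` points, which is why the normalizing constant becomes impractical to compute in high dimensions.

```python
from math import comb
from typing import List, Tuple


def grid_posterior(k: int, n: int, grid_size: int = 101) -> Tuple[List[float], List[float], float]:
    """Discretized Bayes' rule for one parameter theta in [0, 1]."""
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0 / grid_size] * grid_size  # Pr_p(theta): uniform prior (assumed)
    # Likelihood of k failures in n cases (binomial model, assumed).
    likelihood = [comb(n, k) * t**k * (1 - t) ** (n - k) for t in thetas]
    unnormalized = [l * p for l, p in zip(likelihood, prior)]
    z_c = sum(unnormalized)                        # discrete analogue of Eq. (2)
    posterior = [u / z_c for u in unnormalized]    # Eq. (1)
    return thetas, posterior, z_c
```

For example, `grid_posterior(3, 20)` places the posterior mode at θ = 0.15, the observed failure rate.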

MCMC-RVR is built on statistical parameter estimation, and its fault-detection capability depends on the validation model which is utilized to generate validation cases.

In this paper, Bayesian Networks (BNs) [6-7,10] are used to build an effective validation model for MCMC.

B. Bayesian Technique and Markov Chain

The proposed technique is built on statistical parameter estimation. Its fault-detection capability depends on the validation model used to generate the validation cases. We use Bayesian Networks (BNs) [6-7,10] to design an effective validation model.

BNs are annotated directed graphs [11,20,22] that encode probabilistic relationships among distinctions of interest in an uncertain-reasoning problem. They provide a formal framework for combining data with the decisions and judgments of validators. A BN represents a joint probability distribution over a set of random variables $V$ consisting of $n$ discrete variables $W_1, \ldots, W_n$. We define the network as a pair $B = \langle G, E \rangle$ [23], where $G$ is a directed acyclic graph and $E$ is the set of parameters of the network. $E$ contains a parameter for each realization $w_i$ of a variable $W_i$ conditioned on $\Pi_i$, the set of parents of $W_i$ in $G$ [24-25].
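The factorization encoded by B = <G, E> can be illustrated with a three-node network. The variable names, the graph (W1 as the parent of W2 and W3) and all conditional probabilities below are hypothetical; the code only shows how the joint distribution is the product of each variable's probability given its parents, and how evidence about one member changes the probability of a fault in another.

```python
from itertools import product
from typing import Dict

# Hypothetical graph G: W1 -> W2 and W1 -> W3 (all binary: 1 = fault).
PARENTS = {"W1": (), "W2": ("W1",), "W3": ("W1",)}
# Hypothetical parameters E: P(W_i = 1 | values of the parents of W_i).
CPT = {
    "W1": {(): 0.1},
    "W2": {(0,): 0.05, (1,): 0.7},
    "W3": {(0,): 0.02, (1,): 0.6},
}


def joint_probability(assignment: Dict[str, int]) -> float:
    """P(w_1, ..., w_n) = prod_i P(w_i | parents of W_i in G)."""
    p = 1.0
    for var, parents in PARENTS.items():
        p_one = CPT[var][tuple(assignment[q] for q in parents)]
        p *= p_one if assignment[var] == 1 else 1.0 - p_one
    return p


def probability(evidence: Dict[str, int]) -> float:
    """Marginal probability of a partial assignment, by brute-force summation."""
    free = [v for v in PARENTS if v not in evidence]
    total = 0.0
    for values in product((0, 1), repeat=len(free)):
        total += joint_probability({**evidence, **dict(zip(free, values))})
    return total
```

With these numbers, observing a fault in W2 raises the probability of a fault in W1 from 0.1 to about 0.61.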

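To represent a validation model using BNs, we use a two-dimensional n-by-n input matrix model, where node $T(i,j)$ represents the state of input $(i,j)$ and $I$ denotes the available information (e.g., preceding validation outcomes) [8,25]:

$$P(T(1,1), T(1,2), \ldots, T(n,n) \mid I) \propto p(I \mid T(1,1), T(1,2), \ldots, T(n,n)) \times p(T(1,1), T(1,2), \ldots, T(n,n)), \qquad (4)$$

$$\begin{aligned} P(T(i,j) = 1 \mid\ & T(i-1,j) = t_1,\ T(i+1,j) = t_2,\ T(i,j-1) = t_3,\ T(i,j+1) = t_4) \\ \propto\ & P(T(i-1,j) = t_1 \mid T(i,j) = 1)\, P(T(i+1,j) = t_2 \mid T(i,j) = 1) \\ & \times P(T(i,j-1) = t_3 \mid T(i,j) = 1)\, P(T(i,j+1) = t_4 \mid T(i,j) = 1)\, P(T(i,j) = 1). \qquad (5) \end{aligned}$$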
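Eq. (5) gives the fault probability of node T(i,j) only up to proportionality. The sketch below normalizes it by evaluating the same product for T(i,j) = 0, which is an assumption of this illustration, as are the likelihood values passed in. Like Eq. (5), it treats the four neighbours as conditionally independent given T(i,j).

```python
from typing import Sequence


def neighbour_posterior(
    neighbours: Sequence[int],      # t1..t4: states of the four adjacent nodes (1 = fault)
    prior_fault: float,             # P(T(i,j) = 1), assumed known
    p_nb_fault_given_fault: float,  # P(neighbour = 1 | T(i,j) = 1), hypothetical
    p_nb_fault_given_ok: float,     # P(neighbour = 1 | T(i,j) = 0), hypothetical
) -> float:
    """Normalized P(T(i,j) = 1 | t1, t2, t3, t4) following Eq. (5)."""

    def score(p_nb_fault: float, prior: float) -> float:
        s = prior
        for t in neighbours:
            s *= p_nb_fault if t == 1 else 1.0 - p_nb_fault
        return s

    fault = score(p_nb_fault_given_fault, prior_fault)  # right-hand side of Eq. (5)
    ok = score(p_nb_fault_given_ok, 1.0 - prior_fault)  # same product for T(i,j) = 0
    return fault / (fault + ok)                          # resolve the proportionality
```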

An input tends to have faults when other members of the graph have faults. This conditional probability is given by (6) and (7) [25]:

$$P(T = t \mid S = 1) = \frac{\exp(\beta_1 t)}{\exp(\beta_1 t) + \exp(\beta_1 t')} \qquad (6)$$

$$P(T = t \mid S = -1) = \frac{\exp(\beta_2 t)}{\exp(\beta_2 t) + \exp(\beta_2 t')} \qquad (7)$$


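where $S$ is one of the member inputs and $t'$ is the changed value of $t$. From Eqs. (5)-(7) we derive

$$P(T(i,j) = t \mid T_{\partial(i,j)} = t_{\partial(i,j)}) = \frac{\exp\left(\beta t \sum_{k=1}^{\Phi} t_{\partial_k(i,j)}\right)}{\exp\left(\beta t \sum_{k=1}^{\Phi} t_{\partial_k(i,j)}\right) + \exp\left(\beta t' \sum_{k=1}^{\Phi} t_{\partial_k(i,j)}\right)} \qquad (8)$$

where $\beta$ is a constant, $\partial_k(i,j)$ is the $k$-th member adjacent to input $(i,j)$, and $\Phi$ is the total number of such members.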
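Eq. (8) can be evaluated directly once the state values are fixed. The sketch assumes states in {-1, +1}, so that the changed value is t' = -t; this encoding is not specified in the text above, and the value of `beta` is supplied by the caller.

```python
import math
from typing import Sequence


def conditional_state_probability(t: int, neighbour_states: Sequence[int], beta: float) -> float:
    """Eq. (8), assuming states take values in {-1, +1} so that t' = -t."""
    if t not in (-1, 1):
        raise ValueError("state must be -1 or +1")
    field = sum(neighbour_states)  # sum over the Phi adjacent members
    t_prime = -t                   # the changed (flipped) value of t
    num = math.exp(beta * t * field)
    return num / (num + math.exp(beta * t_prime * field))
```

For example, with `beta = 0.5` and four faulty neighbours the node is in state +1 with probability 1 / (1 + e^-4), roughly 0.98; with two neighbours of each state the probability is exactly 0.5.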

C. Simple practical application: Stochastic matrix and Markov process.

Suppose that, in the first state, the technical validations of a project are distributed as follows:

I (Network connectivity validation) 30 %

II (Security policies validation) 20 %

III (User acceptance validation) 50 %

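We need to determine the rates for the second, third, and fourth validations, assuming that the transition probabilities for the given intervals are given by the matrix

$$A = \begin{bmatrix} 0.8 & 0.1 & 0.1 \\ 0.1 & 0.7 & 0.2 \\ 0 & 0.1 & 0.9 \end{bmatrix},$$

whose rows correspond to the current state (from I, II, III) and whose columns correspond to the next state (to I, II, III).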

Remark: A square matrix with nonnegative entries and row sums all equal to 1 is called a stochastic matrix; A is therefore a stochastic matrix. A stochastic process for which the probability of entering a certain state depends only on the last state occupied (and on the matrix governing the process) is called a Markov process. The given example therefore concerns a Markov process.

Solution: From matrix A and the first state we can compute the second state:

I (Network connectivity validation): 0.8*30 + 0.1*20 + 0*50 = 26 [%]

II (Security policies validation): 0.1*30 + 0.7*20 + 0.1*50 = 22 [%]

III (User acceptance validation): 0.1*30 + 0.2*20 + 0.9*50 = 52 [%].

The sum is 100%, as it should be. We now present this in matrix form. Let the column vector $x$ denote the first state, so that $x^T = [30\ 20\ 50]$, and let $y$ denote the second state. Then

$$y^T = x^T A = [30\ 20\ 50] \begin{bmatrix} 0.8 & 0.1 & 0.1 \\ 0.1 & 0.7 & 0.2 \\ 0 & 0.1 & 0.9 \end{bmatrix} = [26\ 22\ 52].$$

Similarly, for the third and fourth states we obtain the state vectors (as the reader may verify)

$$z^T = y^T A = (x^T A) A = x^T A^2 = [23.0\ 23.2\ 53.8],$$

$$u^T = z^T A = (x^T A^2) A = x^T A^3 = [20.72\ 23.92\ 55.36].$$

In the second state, network connectivity validation accounts for 26%, security policies validation for 22%, and user acceptance validation for 52%. For the third state the corresponding figures are 23%, 23.2%, and 53.8%; for the fourth state they are 20.72%, 23.92%, and 55.36%.
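The state vectors above can be reproduced with a few lines of Python that repeatedly multiply the row vector by A; the helper names are illustrative. Iterating much further approaches the stationary distribution of A, which solving π = πA by hand gives as [12.5 25 62.5] (in %). This long-run behaviour is the property that MCMC relies on.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]

A: Matrix = [
    [0.8, 0.1, 0.1],  # from I
    [0.1, 0.7, 0.2],  # from II
    [0.0, 0.1, 0.9],  # from III
]


def is_stochastic(m: Matrix, tol: float = 1e-12) -> bool:
    """Nonnegative entries and every row summing to 1."""
    return all(all(x >= 0 for x in row) and abs(sum(row) - 1.0) < tol for row in m)


def step(x: Vector, m: Matrix) -> Vector:
    """Row vector times matrix: (x^T A)_j = sum_i x_i * A[i][j]."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]


def states(x0: Vector, m: Matrix, count: int) -> List[Vector]:
    """The first `count` state vectors, starting from x0."""
    out = [x0]
    for _ in range(count - 1):
        out.append(step(out[-1], m))
    return out


if __name__ == "__main__":
    for k, s in enumerate(states([30.0, 20.0, 50.0], A, 4), start=1):
        print(k, [round(v, 2) for v in s])
    # Long run: approaches the stationary distribution [12.5, 25.0, 62.5].
    print([round(v, 2) for v in states([30.0, 20.0, 50.0], A, 500)[-1]])
```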

The above example shows how a basic Markov process can help produce reasonable estimates of how validations should be performed in future projects. It is a prerequisite for understanding how MCMC-Random Validation Review can be beneficial in estimating project validation and generating validation cases.

D. MCMC - Random Validation Review

In the preceding subsection, we presented a simple application of the Markov process to validation.

In this subsection, we present a model that uses observations of the previous validation states. The technique is based on the Bayesian [16-18] approach to parametric models, which use preceding and existing information as their statistical parameters.

Within the matrix-model framework, MCMC is used to determine, through Bayesian estimation, the input with the highest probability of failure, which is then taken as the validation case. Examples of such inputs are connectivity, routes and traffic. Therefore, the first step of MCMC-RVR is to calculate the state probability of each input by using MCMC with prior information and information on the preceding outcomes. If the failure-causing inputs are found to form a cluster, the failure probabilities [15] of inputs around a successful input (such as other areas) are lower than those of the others. This probability calculation in MCMC-RVR plays a role similar to the distance calculation in ARV.

The concrete MCMC-RVR steps in the case of input matrix model are as follows:


Step 1. Construct the matrix model and define the initial state of each node in the model (for example, successful PING, ROUTES, Trace, DROP, DENY and ALLOW traffic).

Step 2. Repeat the following steps, noting the time, and return. Choose a node randomly from the input domain.

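Step 3. Calculate the fault-existence probability P of the node using Eq. (9):

$$P(T(i,j) = t \mid T_{\partial(i,j)} = t_{\partial(i,j)}) = \frac{\exp\left(\beta t \sum_{k=1}^{\Phi} t_{\partial_k(i,j)}\right)}{\exp\left(\beta t \sum_{k=1}^{\Phi} t_{\partial_k(i,j)}\right) + \exp\left(\beta t' \sum_{k=1}^{\Phi} t_{\partial_k(i,j)}\right)} \qquad (9)$$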

Step 4. Randomly select a node with state 1 from the input as the validation case.

Step 5. Execute the validation scheme on the selected case. If no fault is found, set the state of the node to the desired value and go to Step 2; repeat until the first failure is found or the stopping condition is reached.
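Steps 1-5 can be sketched as a small Gibbs-sampling loop over the n-by-n matrix. This is an illustrative reading of the steps, not the authors' implementation: the {-1, +1} state encoding, the random initial state (standing in for prior knowledge), the number of updates per case, fixing each passed node at -1 as the "desired value", and the oracle `has_fault` are all assumptions.

```python
import math
import random
from typing import Callable, Dict, Optional, Set, Tuple

Cell = Tuple[int, int]


def mcmc_rvr(
    n: int,
    has_fault: Callable[[Cell], bool],  # hypothetical validation oracle
    beta: float,                        # constant of Eq. (9), assumed value
    updates: int,                       # Gibbs updates per case (Step 2), assumed
    max_cases: int,                     # stopping condition
    rng: random.Random,
) -> Tuple[Optional[Cell], int]:
    max_cases = min(max_cases, n * n)   # each cell is validated at most once
    # Step 1: n-by-n matrix model with states in {-1, +1}; +1 marks a node as
    # possibly failure-causing. A random start stands in for prior knowledge.
    state: Dict[Cell, int] = {(i, j): rng.choice((-1, 1))
                              for i in range(n) for j in range(n)}
    validated: Set[Cell] = set()        # passed validation; held fixed at -1

    def p_fault(cell: Cell) -> float:
        i, j = cell
        neighbours = ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
        field = sum(state[nb] for nb in neighbours if nb in state)
        # Eq. (9) with t = +1 and t' = -1, written in logistic form.
        return 1.0 / (1.0 + math.exp(-2.0 * beta * field))

    for executed in range(1, max_cases + 1):
        free = [c for c in state if c not in validated]
        # Steps 2-3: Gibbs updates of randomly chosen, not-yet-validated nodes.
        for _ in range(updates):
            cell = rng.choice(free)
            state[cell] = 1 if rng.random() < p_fault(cell) else -1
        # Step 4: choose a node in state +1 at random (any free node if none).
        suspects = [c for c in free if state[c] == 1]
        case = rng.choice(suspects or free)
        # Step 5: execute the validation; on success, fix the node at -1.
        if has_fault(case):
            return case, executed
        state[case] = -1
        validated.add(case)
    return None, max_cases
```

With beta > 0, nodes fixed at -1 lower the fault probability of their neighbours through Eq. (9), so later cases tend to move away from areas that have already passed, which plays a role similar to the distance calculation in ARV.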

MCMC-RVR is designed to be an effective alternative to RVR: it retains most of the characteristics of RVR while offering nearly optimal effectiveness. It adds two extensions: (1) validation cases chosen from the input matrix are generated probabilistically, based on a probability distribution that profiles the actual or anticipated use of the systems; and (2) a statistical analysis [20-21,25] is applied to the history, enabling measurement of various probabilistic aspects of the validation process. To ensure that MCMC-RVR works for the business, it is important to construct a model for obtaining the test cases by developing comprehensive validation cases, including the pressure areas, when implementing large systems.

IV. CONCLUSION AND FUTURE WORK

In this work, fundamental validation techniques have been presented. The basic ones simply select validation cases from disparate parts of the whole input domain and can effectively detect faults and design anomalies in an implemented system. ARV was first reviewed as a way to improve the fault-detection capability of the basic RVR technique. However, we have determined that MCMC-RVR can address most of RVR's limitations in fault-finding capability.

The motivation behind MCMC-RVR is to use a statistical model to develop the validation cases, because the probabilities of failure-causing inputs are not evenly spread over the input domain [8,25]. Failures attached to relatively high-probability validation cases will affect the stochastic process more than failures attached to lower-probability cases. The input domain model and the MCMC method are thus used to find inputs with high probabilities of failure.

Examples of such inputs are network connectivity, security policies, and UAT [13-14] outcomes.

We determined that addressing a practical problem during systems implementations with a technique that employs Markov and stochastic processes requires fewer evaluation and validation cases to detect certain sets of important factors than the other techniques, which rely on preceding outcomes. Nevertheless, whether the outcome is desired or unexpected, it remains fundamentally important to rely on past data to achieve a better statistical distribution.

Future work includes applying statistical techniques to proprietary network systems and tailored processes.

ACKNOWLEDGEMENT

This work is partially supported by DB2P Grant 20013090

REFERENCES

[1] I. Ajzen, and T. Madden, (1986). "Prediction of Goal-Directed Behavior: Attitudes, Intentions, and Perceived Behavioral Control." Journal of Experimental Social Psychology 22: 453-474.

[2] Australian Bureau of Statistics, 8153.0 - Internet Activity, Australia, December 2010. http://www.abs.gov.au/ausstats/[email protected]/mf/8153.0I; last accessed Oct. 16, 2013.

[3] J. Baudrillard, (1994). Simulacra and Simulation. Ann Arbor: University of Michigan Press.

[4] J.O. Berger, (2006). The case for objective Bayesian analysis. Bayesian Analysis, 1, 385-402.

[5] J.O. Berger, & R.L. Wolpert, (1988). The likelihood principle. Hayward, CA: The Institute of Mathematical Statistics.

[6] J.M. Bernardo, & A.F.M. Smith, (1994). Bayesian theory. New York: Wiley.

[7] D.A. Berry, (1996). Statistics: A Bayesian perspective. London: Duxbury.

[8] S.P. Brooks, (1998). Markov chain Monte Carlo method and its application. Journal of the Royal Statistical Society, Series D (The Statistician), 47(1), 69-100.

[9] T.Y. Chen, D. Huang, T.H. Tse, & Z. Yang, (2007). An innovative approach to tackling the boundary effect in adaptive random testing. In: Proceedings of the 40th Annual Hawaii International Conference on System Sciences, p. 262a.

[10] P. Congdon, (2001). Bayesian statistical modelling. Chichester, UK: Wiley.

[11] P. Congdon, (2003). Applied Bayesian models. Chichester, UK: Wiley.

[12] F.D. Davis, (1986). A Technology Acceptance Model for Empirically Testing New End-User Information Systems: Theory and Results. Boston, MIT. PhD thesis.

[13] F.D. Davis, (1989). "Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology." MIS Quarterly 10(3): 318-340.

[14] F.D. Davis, R. Bagozzi, and P. Warshaw, (1989). "User Acceptance of Computer Technology: A Comparison of Two Theoretical Models." Management Science 35(8): 982-1003.

M. Fishbein, and I. Ajzen, (1975). Belief, Attitude, Intention, and Behavior: An Introduction to Theory and Research. Reading, MA: Addison-Wesley.

[15] W.R. Gilks, S. Richardson, & D.J. Spiegelhalter (Eds.), (1995). Markov chain Monte Carlo in practice. London: Chapman & Hall.

[16] M. Goldstein, (2006). Subjective Bayesian analysis: Principles and practice. Bayesian Analysis, 1, 403-420.

[17] C. Howson, & P. Urbach, (1993). Scientific reasoning: The Bayesian approach (2nd ed.). Chicago: Open Court.

[18] P.M. Lee, (2004). Bayesian statistics: An introduction (3rd ed.). London: Edward Arnold.

[19] D.V. Lindley, (1980). Making decisions (2nd ed.). New York: Wiley.

[20] H.S. Migon, & D. Gamerman, (1999). Statistical inference: An integrated approach. London: Edward Arnold.

[21] A. O'Hagan, (1988). Probability: Methods and measurement. London: Chapman & Hall.

[22] A. O'Hagan, (2006). Bayesian analysis of computer code outputs: A tutorial. Reliability Engineering and System Safety, 91,1290-1300.

[23] A. O'Hagan, C.E. Buck, A. Daneshkhah, J.R. Eiser, P.H. Garthwaite, D.J. Jenkinson, et al., (2006). Uncertain judgements: Eliciting expert probabilities. Chichester, UK: Wiley.

[24] A. O'Hagan, & J.J. Forster, (2004). Bayesian inference (2nd ed., Vol. 2B). London: Edward Arnold.

[25] B. Zhou, H. Okamura, & T. Dohi, (2010). Markov chain Monte Carlo random testing. In: Advances in Computer Science and Information Technology, AST/UCMA, LNCS 6059, pp. 447-456.


