
ELSEVIER    Decision Support Systems 17 (1996) 13-30

An information theoretic technique to design belief network based expert systems

Sumit Sarkar a,1, Ram S. Sriram b, Shibu Joykutty a, Ishwar Murthy a

a Department of Information Systems and Decision Sciences, College of Business Administration, Louisiana State University, Baton Rouge, LA 70803, USA

b School of Accountancy, Georgia State University, Atlanta, GA 30303, USA

Abstract

This paper addresses the problem of constructing belief network based expert systems. We discuss a design tool that assists in the development of such expert systems by comparing alternative representations. The design tool uses information theoretic measures to compare alternative structures. Three important capabilities of the design tool are discussed: (i) evaluating alternative structures based on sample data; (ii) finding optimal networks with specified connectivity conditions; and (iii) eliminating weak dependencies from derived network structures. We have examined the performance of the design tool on many sets of simulated data, and show that the design tool can accurately recover the important dependencies across variables in a problem domain. We illustrate how this program can be used to design a belief network for evaluating the financial distress situation for banks.

Keywords: Belief networks; Expert systems; Information theory; Knowledge acquisition; Probabilistic reasoning

1. Introduction

The representation of uncertainty, and its subsequent manipulation, has become an important issue in expert systems. Different uncertainty calculi that have been proposed for this task include probability theory [8,23], the Dempster-Shafer theory of evidence [32], the certainty factor calculus [33], and fuzzy set theory [42]. Each of these calculi has its relative advantages and disadvantages, and the choice of a calculus often depends on the task that is being represented in an expert system. Among the different belief calculi, probability theory has the richest historical and philosophical foundations for representing uncertainty [5,11,16]. Probability measures have a sound theoretical basis, and provide a meaningful communication tool for representing uncertainty. For example, the statement "the probability that it will rain today is 0.6" can be easily interpreted; other uncertainty measures do not have an equally well-understood interpretation. Moreover, probability measures can be empirically tested, which is not possible for any of the other measures. For this reason, many researchers have addressed the problem of representing beliefs in expert systems using probability measures [10,33]. An important drawback has been that traditional rule-based expert systems, which are very appropriate for

1 Sumit Sarkar was supported in part by a grant from the College of Business, Louisiana State University.

0167-9236/96/$15.00 © 1996 Elsevier Science B.V. All rights reserved SSDI 0167-9236(95)00020-8


representing deterministic knowledge, have been shown to have some severe limitations for representing uncertainty using a probability calculus [13,31]. It has been shown in recent years that belief networks provide a theoretically consistent representation scheme for capturing uncertainty using probability measures [21,24].

Consider the example of an investor assessing a firm's profitability based on the financial ratios Return on Assets (ROA) and Return on Equity (ROE). In practice, observing a set of values for the ratios ROA and ROE usually does not allow the investor to categorically determine whether a firm is consistently profitable or not. Instead, the observed ratios either strengthen or weaken the investor's belief that the firm is profitable. This relationship between the firm's profitability and its ratios ROA and ROE can be represented using a belief network as shown in Fig. 1. In a belief network, the nodes correspond to the different variables of interest (e.g., ROA, ROE and Profitability in this example). Each variable is categorical in nature; for each candidate firm, ROA and ROE are either above or below the industry average, while Profitability is considered to be either Good or Poor.

Each variable in the belief network has a probability distribution associated with it. The arcs in the network signify dependencies between the linked variables. When a variable has one or more parent nodes in the network, then the probability distribution for that variable is stored conditioned on all combinations of the outcomes of

its parent variable(s). Since the variable ROA does not have any parent node in Fig. 1, the probability distribution associated with it is the prior probability for each outcome of that variable. The variable Profitability has two parents, and, therefore, the distribution associated with it is conditioned on all the possible outcomes of ROA and ROE. Thus, the first element of the left column in the matrix is the probability that a firm's Profitability is Good when both ROA and ROE are above average. Similarly, the second element in the left column is the probability that a firm's Profitability is Good when ROA is above average and ROE is below average, etc. The probabilities that the firm's Profitability is Poor for different outcomes of ROA and ROE are indicated in the right column. Representing uncertainty using probabilities as shown allows them to be used for making inferences about some variable of interest, based on observing the value(s) of one or more other variables. For example, if the ratio ROA for a firm is observed to be above average, then the distributions stored in the network can be used to evaluate the revised belief that the firm's Profitability is Good. This, then, is the way a belief network is used as an expert system. The interested reader is referred to Ref. [24] for details of belief propagation techniques in network structures.
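The revised-belief computation described above can be sketched from the numbers in Fig. 1. Note that the figure residue does not show P(ROE), so the prior (0.7, 0.3) below is a hypothetical value chosen purely for illustration.

```python
# Sketch of the revised belief P(Profitability = Good | ROA) for the
# Fig. 1 network. P(ROE) is NOT given in the figure; (0.7, 0.3) is an
# assumed prior used only for illustration.

p_roe = {"above": 0.7, "below": 0.3}   # assumed prior for ROE
# P(Profitability = Good | ROA, ROE): the left column of the Fig. 1 matrix
p_good = {("above", "above"): 0.9, ("above", "below"): 0.7,
          ("below", "above"): 0.5, ("below", "below"): 0.1}

def revised_belief_good(roa):
    """P(Profitability = Good | ROA = roa). ROA and ROE are both root
    nodes, hence marginally independent, so only ROE is summed out."""
    return sum(p_good[(roa, roe)] * p_roe[roe] for roe in p_roe)

print(revised_belief_good("above"))   # 0.9*0.7 + 0.7*0.3 = 0.84
```

Observing an above-average ROA thus raises the belief in Good profitability from whatever the prior was to 0.84 under the assumed P(ROE).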

Designing a belief network for a problem do- main requires identification of the structure of the network, followed by specification of the con- ditional and prior probabilities for the variables in the network. To obtain the desired structure,

P(ROA) = (0.6, 0.4)

P(Profitability | ROA, ROE):
                Good   Poor
  above/above    0.9    0.1
  above/below    0.7    0.3
  below/above    0.5    0.5
  below/below    0.1    0.9

Fig. 1. Representing probabilistic dependencies in a belief network.


experts have to identify the important determinants for each variable of interest in the problem domain. Subsequently, the probability parameters that are required to specify the underlying dependencies are elicited from domain experts. In practice, this is a hard problem. While experienced domain experts can often provide good probability estimates [41], obtaining a complete and consistent set of prior and conditional probabilities for all required dependencies is usually very difficult. Furthermore, humans (even domain experts) can demonstrate many forms of biases in their judgments [40,41]. Therefore, in order to obtain accurate assessments, knowledge engineers need to be trained in probability elicitation techniques. Both domain experts and knowledge engineers are often expensive resources for an organization, and elicitation of probabilistic knowledge requires a large investment of time and effort on their part. Usually, the expert(s) would like to consider many alternative feasible structures for representing a problem domain. In order to evaluate such structures, additional probability parameters are needed to completely specify each structure. This drastically increases the number of probability parameters that the expert(s) must provide, which substantially adds to the knowledge elicitation costs. Furthermore, it increases the likelihood of more inconsistencies across the probability parameters obtained. Often, the expert is simply unable to provide accurate estimates for so many different probability parameters. Consequently, choosing the most appropriate structure is usually done in an ad hoc manner. The focus of this research, then, is to develop a systematic method for designing belief network structures that successfully overcomes the above difficulties.

The objectives of this paper are twofold. First, we discuss a design tool that uses rigorous analytical techniques to assist in the development of belief network based expert systems. The design tool is a program that has been developed in C. The techniques incorporated in the program are based upon a prior body of theoretical work for comparing alternative structures [29,30]. The prior research has shown that the well-known mutual information measure is appropriate for evaluating

belief network structures. The design tool uses this mutual information measure to compare different structures, when the underlying probability distribution across variables in the network is known. The program can also be used to estimate the necessary probability parameters from sample data. This is of particular interest, since large volumes of historical data are available in computerized form for many business applications. Such data reflects probabilistic dependencies across entities of interest in their respective problem domains, and can serve as an important source of knowledge for building belief-network based expert systems. Further, using such data for estimation purposes can greatly reduce the burden of eliciting probability estimates from experts when designing such systems. This point is crucial, since, as discussed earlier, it is often infeasible to obtain a large number of probability estimates that are reliable and consistent from the experts alone. Three important capabilities of the design tool are discussed: (i) evaluating alternative structures based on sample data; (ii) finding optimal networks with specified connectivity conditions; and (iii) eliminating weak dependencies from derived network structures. These capabilities are demonstrated on simulated data obtained by generating random samples from known network structures with specified probability distributions. We have examined the performance of the design tool on many additional sets of simulated data that are obtained from network structures that are themselves randomly generated. The design tool is shown to be fairly accurate in recovering from sample data the underlying dependencies across variables in a problem domain.

The second objective of this paper is to show how the design tool can be used to design a belief network based expert system for a real-world application. We address the decision problem faced by auditors in examining the financial viability of banks. An important practical requirement for a belief network based system is that it should represent the important dependencies across variables in the problem domain in a parsimonious manner for efficient belief propagation (i.e., only the essential dependencies should be stored by the system). We show how variables


that are not directly measurable (called latent variables) can be used to obtain efficient structures. The use of latent variables helps in obtaining structures that are consistent with an expert's reasoning process. The design tool is used to obtain computationally efficient network structures that are strongly supported by the sample data.

The rest of the paper is organized as follows. Section 2 discusses the information theoretic techniques that form the basis for comparing alternative representations. In Section 3, we discuss the capabilities of the design tool that is used to compare alternatives. Experiments conducted to examine the performance of the design tool are also presented in this section. In Section 4, we discuss the problem of evaluating financial distress for banks, and demonstrate how the design tool can be used to build a belief network for this application. Concluding remarks are presented in Section 5.

2. Evaluating belief network structures

In this section, we provide a summary of theoretical results obtained in prior research that form the basis for comparing alternative belief network representations [29,30]. We first discuss the choice of an appropriate measure for performing the desired comparisons, and then discuss two important results that lead to a computationally tractable technique for comparing alternatives.

2.1. Theoretical considerations

A belief network structure represents a joint distribution over the different variables in the

STRUCTURE I        STRUCTURE II

Fig. 2. Two alternative network structures.

network. The joint distribution can be represented as a product of the conditional and prior probabilities used to specify the network. For instance, the portion of the belief network shown in Fig. 1 can be represented by the following product form:

P(Profitability, ROA, ROE) = P(Profitability | ROA, ROE) × P(ROA) P(ROE).

In general, for a problem with variables X1, ..., Xn, the product form is given by:

P(X1, ..., Xn) = Π_i=1,n P(Xi | F(Xi)).

F(Xi) refers to the set of parent variables for Xi. For variables with no parents, F(Xi) = Ø, which indicates that the marginal probabilities are stored for those variables.
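The product form can be sketched in a few lines of code, again using the Fig. 1 network. As before, P(ROE) is not given in the figure, so its prior (0.7, 0.3) is a hypothetical value.

```python
# Minimal sketch of the product-form factorization
# P(X1, ..., Xn) = prod_i P(Xi | F(Xi)), for the Fig. 1 network.
# The P(ROE) prior (0.7, 0.3) is an assumed value for illustration.

parents = {"ROA": (), "ROE": (), "Prof": ("ROA", "ROE")}
cpt = {
    "ROA": {((), "above"): 0.6, ((), "below"): 0.4},
    "ROE": {((), "above"): 0.7, ((), "below"): 0.3},   # assumed
    "Prof": {
        (("above", "above"), "Good"): 0.9, (("above", "above"), "Poor"): 0.1,
        (("above", "below"), "Good"): 0.7, (("above", "below"), "Poor"): 0.3,
        (("below", "above"), "Good"): 0.5, (("below", "above"), "Poor"): 0.5,
        (("below", "below"), "Good"): 0.1, (("below", "below"), "Poor"): 0.9,
    },
}

def joint(assign):
    """P(full assignment) as the product of the P(Xi | F(Xi)) terms."""
    p = 1.0
    for v, pa in parents.items():
        p *= cpt[v][(tuple(assign[u] for u in pa), assign[v])]
    return p

print(joint({"ROA": "above", "ROE": "below", "Prof": "Good"}))
# 0.6 * 0.3 * 0.7 = 0.126
```

Summing `joint` over all eight assignments returns 1, confirming that the factored parameters define a valid joint distribution.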

For illustration, consider the two structures shown in Fig. 2 as candidate networks for representing the important dependencies across variables X1, X2, X3, X4 and X5. For the moment, we assume that the complete underlying probability distribution over all the variables in the network is available (we subsequently show how this assumption can be relaxed in practice). Usually, the complete joint distribution cannot be used in its entirety because of the enormous number of probability parameters that would have to be stored and manipulated. In order to choose between the two alternatives, we must determine the best set of probability parameters for each structure, and then evaluate the structures by comparing the probability parameters associated with each structure against the true underlying distribution. We need a measure that we can use to determine the best set of probability parameters for each structure, as well as to compare alternative structures.

Measures used to compare probability distributions are called scoring rules [37]. The expected score associated with a feasible alternative, S(p,r), is a function of the vectors p and r, where p denotes the true distribution (expressed as a vector of the probability masses corresponding to the different states that each variable may have), and r the distribution associated with the alternative. A


scoring rule is said to be proper if S(p,p) ≥ S(p,r). This implies that assessments other than p cannot get a higher score than p itself. Among the various possible proper scoring rules, three that have received particular attention in the literature are: (i) the quadratic scoring rule [6]; (ii) the logarithm scoring rule [12]; and (iii) the spherical scoring rule [26]. A requirement often imposed on a scoring rule is that the score depend only on the probability assigned to the event that is actually realized (called the principle of relevance [37]). For instance, in a three event state space, the two assessments (0.6, 0.3, 0.1) and (0.6, 0.2, 0.2) should receive the same score if the first event was realized (if one of the other two events were to occur, the two assessments would get different scores). The logarithm rule is the only proper scoring rule that satisfies this requirement for any arbitrary distribution [37]. It has been shown that using the logarithm rule is equivalent to using the I-Divergence measure commonly used in communication theory [15]. Sarkar et al. [29] have performed an experimental comparison of the logarithm and the quadratic rule over a large range of probability distributions and loss parameters, and show that the results obtained using the logarithm rule are usually better than those obtained using the quadratic rule.

For the above reasons, the logarithm rule is deemed appropriate for comparing alternative belief network representations. The logarithm rule evaluates the distance between the true distribution and the distribution associated with a feasible network structure using the formula shown:

S(p,r) = Σ_i p_i log r_i. (1)

A high score is preferred as it indicates that r is close to p. The score is maximized when the distributions p and r are identical. We state a result that allows us to obtain the "best" set of probability parameters for a given structure in a simple and intuitively appealing manner.
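The behaviour of the logarithm score in Eq. (1) can be checked directly, using the two three-event assessments from the relevance example above:

```python
import math

# Sketch of the logarithm score S(p, r) = sum_i p_i * log(r_i),
# applied to the two assessments from the relevance example.

def log_score(p, r):
    """Expected logarithm score of assessment r under true distribution p."""
    return sum(pi * math.log(ri) for pi, ri in zip(p, r) if pi > 0)

p = (0.6, 0.3, 0.1)   # true distribution
r = (0.6, 0.2, 0.2)   # an alternative assessment
print(log_score(p, p))   # maximized at r = p (propriety) ...
print(log_score(p, r))   # ... so this value is strictly lower
```

The first score exceeds the second, illustrating that the rule is proper: no assessment other than p itself can achieve a higher expected score.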

Result 1: When using the logarithm rule, the best set of probability parameters that correspond to a given topology T is one where Pr(Xi | F(Xi)) = P(Xi | F(Xi)) for all i.

The above result is obtained by deriving the optimality conditions for maximizing the logarithm score associated with a given network structure (a formal proof of this result is shown in [30]). Thus, for Structure I in Fig. 2, the optimal conditional probability distribution associated with variable X5 for that structure is obtained by estimating the true conditional probability distribution P(X5 | X3, X4). This result will hold irrespective of the rest of the network structure. Subsequently, the full joint distribution for that structure is obtained by independently estimating the different conditional and marginal probability distributions that constitute the product-form for the structure. The complete true distribution underlying the problem domain is not required; only the relevant conditional and marginal distributions need to be evaluated. Estimating the conditional and marginal probabilities can be done much more easily and with greater accuracy than estimating a complete joint distribution.

Once the probability parameters are obtained, the two structures can be compared to each other by evaluating the logarithm measure associated with each structure, and choosing the one with the higher score (as shown in Eq. (1)). It has been shown that the mutual information across the conditioned and conditioning variables for each term that appears in the product-form can be used to compare the two structures [30]. The mutual information between variables Xi and F(Xi) is defined as [15]:

I(Xi; F(Xi)) = Σ_{Xi, F(Xi)} P(Xi, F(Xi)) × log [ P(Xi, F(Xi)) / (P(Xi) · P(F(Xi))) ].

The mutual information I(Xi; F(Xi)) represents the amount of information contributed by the variables F(Xi) about variable Xi. The best topology is characterized as follows:
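The definition above translates directly into code. A parent set F(Xi) with several members is handled by letting the second index range over tuples of parent outcomes; the two small joints below are made up to show the two extreme cases.

```python
import math

# Sketch of the mutual information I(X; F(X)) computed from a joint
# distribution given as a dict {(x, f): P(x, f)}; f may be a tuple
# when F(X) contains several parent variables.

def mutual_information(joint):
    """I(X; F(X)) in nats, from the joint distribution of X and F(X)."""
    px, pf = {}, {}
    for (x, f), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        pf[f] = pf.get(f, 0.0) + p
    return sum(p * math.log(p / (px[x] * pf[f]))
               for (x, f), p in joint.items() if p > 0)

indep = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
dep   = {(0, 0): 0.40, (0, 1): 0.10, (1, 0): 0.10, (1, 1): 0.40}
print(mutual_information(indep))   # 0: an uninformative parent
print(mutual_information(dep))     # > 0: an informative parent
```

The measure is zero exactly when the child is independent of the candidate parent set, which is what makes it suitable as an evaluation function.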

Result 2: The structure with a higher logarithm score is the one that has a higher sum of mutual information terms for components of the product-form distribution, i.e., the topology with a higher Σ_i=1,n I(Xi; F(Xi)).


As a result, a structure is preferred if, for each component of the product-form, the set of conditioning (parent) variables is highly informative about the child variable. The evaluation function, Σ_i=1,n I(Xi; F(Xi)), is linearly separable in the mutual information terms associated with each variable. Therefore, if two alternative structures differ only in the parent sets associated with one node in the network, then the only mutual information terms needed for comparing the two structures are those associated with that node. Thus, the two structures shown in Fig. 2 can be compared by evaluating just the expressions I(X5; X3, X4) and I(X5; X2, X3) from the underlying distribution, and then choosing the structure with the higher associated mutual information term. This is intuitively appealing since alternative structures are compared only on the basis of where they differ. More generally, the above result allows us to identify the right parent set for each node of the network, independent of the rest of the network. Therefore, a potentially hard problem (in terms of combinatorial possibilities) is rendered relatively easy by this decomposition. The only constraint is that there be no cycles in the network, since a cyclic structure is not a legitimate representation of any underlying joint distribution.

Using the mutual information measure can also help to detect the right set of dependencies that exist across variables in a network. In the example networks shown in Fig. 2, both the variables X3 and X4 depend on variable X1. Usually, the variables X3 and X4 would display a mutual dependence in such a situation (i.e., I(X4; X3) > 0). However, when the mutual information measure is being used, considering both X1 and X3 as a parent set for X4 evaluates exactly the same as when only X1 is considered as a parent for X4, i.e., I(X4; X1, X3) = I(X4; X1). This property of the mutual information measure is used to prevent spurious dependencies from showing up in a network structure.
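The equality I(X4; X1, X3) = I(X4; X1) can be verified numerically. The distribution below is made up, but it has exactly the dependence pattern discussed: X3 and X4 each depend only on X1, so they are conditionally independent given X1.

```python
import math

# Numerical check that adding X3 to the parent set of X4 contributes
# no extra information when X4 depends on X1 alone. The probability
# values are illustrative.

def mi(joint):
    """I(A; B) in nats from {(a, b): P(a, b)}; b may be a tuple."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

p1 = {0: 0.3, 1: 0.7}                              # P(x1)
p3 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}    # P(x3 | x1)
p4 = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.4, 1: 0.6}}    # P(x4 | x1)

j_x4_x1, j_x4_x1x3 = {}, {}
for x1 in p1:
    for x3 in (0, 1):
        for x4 in (0, 1):
            p = p1[x1] * p3[x1][x3] * p4[x1][x4]
            j_x4_x1[(x4, x1)] = j_x4_x1.get((x4, x1), 0.0) + p
            j_x4_x1x3[(x4, (x1, x3))] = j_x4_x1x3.get((x4, (x1, x3)), 0.0) + p

print(mi(j_x4_x1), mi(j_x4_x1x3))   # the two values coincide
```

With exact probabilities the two terms are identical; as the paper notes later, estimates from finite samples generally break this equality, which is what motivates the significance test of Section 3.3.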

3. A design tool to evaluate belief networks

In this section, we discuss the design tool that uses the analytic results to compare alternative

network structures. The design tool is a program developed in C that can be used in several different ways to assist developers of belief network based expert systems. We first discuss the three important capabilities of the design tool in assisting users to obtain belief networks: (i) evaluating alternative structures based on sample data; (ii) finding optimal networks with specified connectivity conditions; and (iii) eliminating weak dependencies from derived network structures. These features are demonstrated on simulated data obtained by generating random samples from known network structures with specified probability distributions. We have examined the performance of the design tool on many additional sets of simulated data. These simulated data sets are obtained from network structures that are themselves randomly generated. Finally, we also present the results of experiments that illustrate how well the design tool is able to recover from sample data the important dependencies across variables in a problem domain.

3.1. Evaluating alternative structures based on sample data

The design tool requires as input the number of different variables being considered for inclusion in a network structure, the maximum number of realizations that each variable may have, and a collection of records where each record stores the actual realizations of all the variables of interest for one instance of the problem. This collection of records constitutes the sample data for the problem domain, and is made available to the program through external data files. The design tool prompts the user to provide alternative feasible structures that are to be considered. Using a question-and-answer interface, it requires the user to specify the parent set for each variable included in a structure. The user can compare multiple network structures at a time. For each suggested topology it does the following. It evaluates the mutual information term associated with every variable and its corresponding parent set. This is accomplished by estimating from the sample data the necessary conditional and marginal probabilities associated with that variable. It then


Fig. 3. A network structure used to generate sample data.

uses the sum of the mutual information terms to compare a topology with other suggested topologies, and identifies the structure with the highest value. The program also provides estimates for prior and conditional probabilities associated with different nodes for the selected structure.
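The estimation step can be sketched as follows. The record format (one dictionary per sample) and the variable names are assumptions for illustration; the design tool itself reads records from external data files.

```python
import math
from collections import Counter

# Sketch of estimating I(child; parent set) directly from sample
# records, and of comparing two candidate parents. The synthetic
# records below are made up: X4 copies X2 and is unrelated to X1.

def mi_from_samples(records, child, parents):
    """Estimate I(child; parents) in nats from a list of record dicts."""
    n = len(records)
    joint = Counter((r[child], tuple(r[p] for p in parents)) for r in records)
    child_marg = Counter(r[child] for r in records)
    par_marg = Counter(tuple(r[p] for p in parents) for r in records)
    return sum((c / n) * math.log(
                   (c / n) / ((child_marg[x] / n) * (par_marg[y] / n)))
               for (x, y), c in joint.items())

records = [{"X1": i % 2, "X2": (i // 2) % 2, "X4": (i // 2) % 2}
           for i in range(100)]
best = max(["X1", "X2"], key=lambda v: mi_from_samples(records, "X4", (v,)))
print(best)   # X2 is identified as the more informative parent for X4
```

Replacing the true probabilities by relative frequencies in this way is exactly what makes sample data usable in place of elicited parameters.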

The program also allows users to evaluate alternative parent sets for a variable without requiring users to specify complete topologies. We have used this feature to examine the performance of the program on data sets generated from many different network structures. One structure used for this purpose is shown in Fig. 3. Each variable in this network can take on up to 3 different values. The network has a known probability distribution associated with it. Sample data was generated from this distribution using Monte Carlo simulation, where each sample (record) consisted of a single realization of each of the variables in the network. We used the program to determine, based on the sample data, whether variable X4 was more strongly dependent on variable X1 or on variable X2 (i.e., does X1 make a better predictor for X4, or does X2?). The program evaluated I(X4; X1) and I(X4; X2) by estimating the necessary conditional and marginal probabilities from the sample data. We performed such a comparison using as few as 250 samples to estimate the necessary probability parameters. The program correctly identified X2 as the parent for X4. Similarly, when three feasible parents, X1, X2 and X3, were examined for variable X7, the program correctly selected X3 as the best parent variable. Proceeding in a similar manner, we have been able to reconstruct the complete network by finding the best set of parent variables for each node, one at a time.

3.2. Finding the best degree-constrained network structure

Quite often, an important pragmatic requirement for an expert system is that it should perform its reasoning in a specified time frame. The response time for belief network based expert systems is critically dependent on the number of parent variables that nodes are allowed to have in the network. This is so because the complexity of belief revision algorithms is exponential in the size of the largest parent set in a network [17]. We call the size of the largest parent set in a network the degree of the network.

When the desired response time is stringent, then it is necessary to limit the allowable degree of belief networks that are considered for implementation. The second capability of the design tool is to assist users in obtaining the best degree-constrained network representation. In this mode, the design tool prompts users to provide the following information: (i) the allowable degree of networks to consider; and (ii) for each variable, the set of other variables from which its parent set may be drawn. Based on this user-provided information and the sample data, the program identifies the best degree-constrained network structure as follows. For each variable, the program generates all feasible parent sets that include the specified number of variables. It evaluates the mutual information measure for each parent set, and selects the parent set that is most informative for the variable under consideration. Repeating this process for all variables in the network, it arrives at the best solution.
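The per-node search can be sketched as follows. The scoring function here is a made-up stand-in for the mutual information estimated from sample data, and this sketch enumerates parent sets up to the allowed degree; as Result 2 implies, each node can be handled independently, with cycles avoided by restricting candidates (e.g., to lexicographically preceding variables).

```python
from itertools import combinations

# Sketch of the degree-constrained search: enumerate candidate parent
# sets up to the allowed degree and keep the most informative one.

def best_parent_set(child, candidates, degree, mi):
    """Return the most informative parent set of size <= degree."""
    best, best_score = (), 0.0
    for k in range(1, degree + 1):
        for subset in combinations(candidates, k):
            score = mi(child, subset)
            if score > best_score:
                best, best_score = subset, score
    return best

# Toy scores: X6 is driven by X2 and X3 jointly (values are made up).
toy = {("X2",): 0.21, ("X3",): 0.25, ("X2", "X3"): 0.58}
mi = lambda child, s: toy.get(s, 0.05)
print(best_parent_set("X6", ["X1", "X2", "X3", "X4", "X5"], 2, mi))
```

With a degree constraint of 2, the search correctly returns the pair (X2, X3) rather than either single variable.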

One of the network structures used to test this feature of the design tool is shown in Fig. 4. The prior and conditional distributions associated with variables in this network were specified, and sample data generated as before. The program was instructed to find the best structure with a degree

Fig. 4. Second test network.


constraint of 2 (which is the degree of the true network as well). For variable X4, the parent set was selected from {X1, X2, X3}, for X5 it was selected from {X1, X2, X3, X4}, and for X6 it was selected from {X1, X2, X3, X4, X5} (i.e., all lexicographically preceding variables). When a sample size of 100 was used to estimate probability parameters, the program selected the right parent sets for X4 and X5. For X6, it selected (X3, X4) instead of the true parent set (X2, X3). When the sample size was increased to 500, it detected the correct parent set for X6 as well.

3.3. Eliminating weak dependencies from derived network structures

The third capability of the design tool is to eliminate insignificant dependencies from the network. While the degree constraint ensures that belief network structures under consideration are efficient for making inferences, the structure may nevertheless include redundant arcs across some of the nodes. This is because in practice, we may not know beforehand the exact number of parent variables that each variable should have. If the degree constraint is larger than the number of truly predictive variables for a node in the network, then the program would identify as many variables in the parent set as the degree will allow. In theory, the mutual information test should identify only those parents that contribute additional information. However, it is not possible to identify independence and conditional independence properties in an exact manner when estimating probabilities from samples. For instance, sampling errors usually cause the mutual information conveyed by two parent variables to exceed that conveyed by either one of the two taken individually, even for those variables that have only one parent in the original network. Therefore, when identifying the parent set for a variable, we wish to include only those variables that significantly contribute additional information about that variable.

The design tool eliminates weak dependencies in the network by performing a significance test on the mutual information terms associated with having additional variables in parent sets of nodes in the network. The likelihood-ratio chi-square

test is appropriate for this purpose. When used in this mode, the program prompts the user to specify the set of feasible parent variables for each node in the network. The following procedure is used for each node in the network. The program first considers all parent sets that consist of a single variable and finds the one that is most predictive. The selected variable is tested for significance by comparing the mutual information term against the chi-square value for the appropriate degrees of freedom. The applicable degrees of freedom depend on the number of possible realizations of the parent variable being considered, and are evaluated by the program. The program performs the significance test at the 5% significance level. If the mutual information term is not found to be significant, then the program concludes that the node under consideration should have no parent. If the mutual information term is significant, the program stores the predictive variable as the best parent set of size 1. The program then finds the best two-variable parent set. The difference in the mutual information terms for the different parent sizes is tested using the chi-square value for the appropriate degrees of freedom (which now depend on the number of possible realizations for each of the two parent sets being compared). The process is repeated until increasing the number of variables in the parent set does not lead to a significant increase in the mutual information value.
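The test can be sketched using the standard relationship between mutual information and the likelihood-ratio statistic: G = 2N·Î (with Î in nats) is asymptotically chi-square distributed. To keep the sketch dependency-free, the 5% critical value is approximated with the Wilson-Hilferty formula; exact tables or a statistics library would be used in practice.

```python
import math
from statistics import NormalDist

# Sketch of the likelihood-ratio chi-square test on an estimated
# mutual information value. G = 2 * N * I_hat (I_hat in nats) is
# asymptotically chi-square with the appropriate degrees of freedom.

def chi2_critical(df, alpha=0.05):
    """Approximate upper-alpha chi-square critical value (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    return df * (1.0 - 2.0 / (9.0 * df) + z * math.sqrt(2.0 / (9.0 * df))) ** 3

def is_significant(mi_nats, n_samples, df, alpha=0.05):
    """Decide whether an estimated mutual information gain is significant."""
    return 2.0 * n_samples * mi_nats > chi2_critical(df, alpha)

# Binary child and binary candidate parent: df = (2 - 1) * (2 - 1) = 1.
print(is_significant(0.0005, 250, df=1))   # G = 0.25: leave the arc out
print(is_significant(0.05, 250, df=1))     # G = 25: keep the arc
```

A tiny estimated mutual information is thus attributed to sampling noise and the corresponding arc is dropped, which is how the weak dependencies in the example below get eliminated.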

We have tested this capability of the design tool on data generated from the network shown in Fig. 3. The parent set for each variable was chosen from its lexicographically preceding variables. When 100 sample records were used, the program excluded the three arcs (X1,X3), (X1,X5) and (X2,X5) at the 5% significance level. At the same time, it included the arc (X3,X5) although X3 was not a parent of X5 in the true distribution. The program identified the exact set of parent variables for each node in the network when 500 sample records were used instead to estimate the different probability parameters.

3.4. Performance evaluation of the design tool

In the preceding sections, we demonstrated the ability of the design tool to recover underlying network structures for two small sized problems. Subsequently, we conducted extensive experiments to examine how the program works for problem sizes ranging from 10 nodes to 30 nodes. The experiments were performed using many different belief network structures for each problem size. The accuracy of the design tool was measured in terms of how "close" network structures obtained from the program were to the network structures that were used to generate sample data.

The belief network structures used in the experiment were themselves randomly generated using another program (which is called the generating program). Binary variables were considered for the generated networks. For the generated structures, each variable was allowed to have up to 4 variables in its parent set. For a given problem size, the generating program first randomly generated the number of realizations that each variable may have. Then, for each variable the program randomly generated the number of parents (within the specified size restrictions). For each node, based on its randomly generated parent size, the program randomly identified the parent variable set from the other variables. In order to eliminate cycles in the structures that were generated, the feasible parent set for each node was restricted to its lexicographically preceding variables. Once the structure was specified for a network, it was used to randomly generate probability distributions associated with each node in the network. For variables with no parents, only the prior probability distributions were needed. For variables with parents, the conditional probability distributions were generated for each possible realization of its parent set. Finally, based on these prior and conditional probability distributions, the sample data was randomly generated for each variable in the network.
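The generation scheme just described can be sketched as follows. The sketch is ours, not the authors' generating program; it uses the binary-variable simplification mentioned above, and all names and defaults are illustrative assumptions.

```python
import random
from itertools import product

def random_network(n_vars, max_parents=4, rng=random):
    """Random structure plus random CPTs; restricting each node's parents
    to lexicographically preceding variables guarantees acyclicity."""
    parents, cpt = {}, {}
    for v in range(n_vars):
        k = rng.randint(0, min(max_parents, v))
        parents[v] = sorted(rng.sample(range(v), k))
        # one P(v = 1 | parent configuration) per configuration (binary case)
        cpt[v] = {cfg: rng.random() for cfg in product((0, 1), repeat=k)}
    return parents, cpt

def sample_record(parents, cpt, rng=random):
    """Ancestral (forward) sampling: visit variables in lexicographic
    order so every parent is realized before its children."""
    rec = {}
    for v in sorted(parents):
        cfg = tuple(rec[p] for p in parents[v])
        rec[v] = 1 if rng.random() < cpt[v][cfg] else 0
    return rec
```

Repeated calls to `sample_record` yield the kind of sample data set that was fed to the design program in these experiments.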

The sample data was then used as input for the design program. The design program was also provided with the following information: (i) the lexicographic order of the variables, (ii) the number of realizations for each variable in the network, (iii) the number of variables constituting the parent set of each variable in the network, and (iv) the true parent set for each variable in the network. Sample data from ten different networks were generated for each problem size, and the accuracy of the design tool was averaged over these instances. The design tool was tested in two different ways. First, it was used to identify, from a node's lexicographically preceding variables, the best set of parents corresponding to the true size of its parent set. This enabled us to examine how accurately the design tool could compare alternative parent sets of the same size given different sample sizes. Table 1 summarizes the performance of the design tool for this test. Each row in the table corresponds to a combination of problem size (in number of nodes) and sample size (in number of records used to estimate probabilities). The performance of the design tool is reflected by the number of arcs correctly identified by the design tool for each such combination. The last column in Table 1 represents the percentage of arcs that were correctly detected on average.

Table 1
Performance of design tool when parent sizes are known

Problem size    Sample    Average # of       Avg. # of arcs       %
(# of nodes)    size      arcs in network    correctly detected   Accuracy
10              100       16.0               11.1                 69.4%
                500       15.1               13.9                 92.0%
                1000      15.8               14.7                 93.0%
                2000      15.1               14.2                 94.0%
20              100       38.8               22.7                 58.5%
                500       38.1               35.1                 92.1%
                1000      36.2               34.4                 95.0%
                2000      40.0               39.5                 98.8%
30              100       63.6               29.2                 45.9%
                500       65.1               56.4                 86.6%
                1000      65.7               63.0                 95.9%
                2000      68.0               67.1                 98.7%

The results in Table 1 clearly indicate that the design tool is very accurate in predicting the true parent variables of different nodes in a network when samples consist of 500 or more records. Further, the performance was found to be equally good for networks of all three sizes that were examined.

Next, the design tool was used to identify the set of significant parent variables for each node in a network. This enabled us to examine how accurately the design tool could identify the exact parent set when testing for statistical significance given different sample sizes. In our experiments, we allowed for a 5% Type I error, or a 95% confidence level. The accuracy of the design tool was averaged over ten instances as before. Table 2 summarizes the performance of the design tool for this test. Once again, when sample sizes of 500 or more records were available, the design tool could recover around 90% of all the arcs in a network. In addition to the arcs correctly identified, this table also reports the number of extraneous arcs added by the program, as well as the number of existing arcs that were dropped by the program. While few arcs were missed on average for sample sizes of 500 or larger, a relatively larger number of extraneous arcs were picked up by the program (ranging from 6% to 47% of the number of arcs in a network). Further, the number of extraneous arcs increased as the size of the network grew. A possible explanation is that for larger networks, there are many more feasible parent sets (of any given size), and, hence, more opportunities to pick redundant arcs in the network.

The experimental results indicate that when the parent sizes are available to the design tool, it is very good at recovering the right structure. Further, the program accomplishes this with relatively small sample sizes. Thus, if the design tool is used to compare alternative structures specified by a user, then it is very reliable in identifying the better network structure. Similarly, when it is used to obtain degree-constrained network structures, it identifies optimal structures very accurately. However, when used to identify the statistically significant parent variables for each node, it is less accurate. The ability to detect existing arcs improves with the size of the sample data, as is to be expected. However, increasing the sample size does not seem to significantly affect the number of extraneous arcs selected by the program. The analysis indicates that the design tool is not as reliable when selecting parent sets for individual variables based on significance tests as when it is used to compare alternatives.

4. Constructing belief networks for evaluating financial distress in banks

The experiments described in the previous section show that the techniques we use are quite robust for evaluating alternative structures. The implemented program makes it relatively easy to evaluate belief network structures that represent different dependencies across variables in a problem domain. In this section, we show how the design tool can be used to design a belief network

Table 2
Performance of design tool when parent sizes are not known (all tests at the 5% level)

Problem size    Sample    Average # of       Arcs correctly    Arcs added        Arcs deleted
(# of nodes)    size      arcs in network    detected          incorrectly       incorrectly
                                             Avg. #    %       Avg. #    %       Avg. #    %
10              100       16.0               8.4       52.5%   3.3       20.6%   7.6       47.5%
                500       15.1               12.1      80.1%   1.3       8.6%    3.0       19.9%
                1000      15.8               14.0      88.6%   1.4       8.9%    1.8       11.4%
                2000      15.1               13.7      90.7%   0.9       6.0%    1.4       9.3%
20              100       38.8               20.6      53.1%   10.6      27.3%   18.2      46.9%
                500       38.1               34.2      89.8%   11.8      30.9%   3.9       10.2%
                1000      36.2               33.3      91.9%   8.1       22.4%   2.9       8.0%
                2000      40.0               39.8      99.5%   10.0      25.0%   0.2       0.5%
30              100       63.6               32.3      50.8%   29.8      46.8%   31.3      49.2%
                500       65.1               56.5      86.8%   21.1      32.6%   8.6       13.2%
                1000      65.7               62.1      94.5%   23.7      36.1%   3.6       5.5%
                2000      68.0               67.1      98.7%   21.5      31.6%   0.1       0.1%


structure for the decision problem faced by auditors when examining the financial viability of banks. We briefly describe the problem environment, and discuss why a belief network representation could be appropriate for this task. We then show how the use of latent (unmeasurable) variables helps in obtaining parsimonious models. Finally, we illustrate how a belief network can be generated for such an application with the help of the design tool.

4.1. The going-concern decision for banking institutions

During the audit of a bank, auditors consider whether there is reasonable doubt as to the financial viability of the audited bank. The financial stability evaluation is a difficult and complex decision. The increase in bank failures witnessed in the US in the last few years has made the financial stability evaluation even more important. For auditors, the most important evidence of financial instability is provided by the financial statements in the form of financial ratios. The financial ratios indicate such problems as "poor loan quality", "inadequate capital", or "managerial inefficiency".

In recent years, many alternative models have been proposed for examining the financial viability of banks [1,2,4,7,14,18-20,28,39]. According to one body of auditing researchers, an auditor's judgment process of financial stability can be viewed as one of belief revision, where an auditor starts with an initial belief regarding the viability of an institution, and then revises the belief upward or downward, depending on the observed evidence [3,9,22]. A belief network based expert system would be appropriate for modeling the financial viability task in such a fashion, since it provides an automated means of performing the revision of belief. Auditors can input data which they deem relevant in a given situation and then use the network to provide the revised belief in the viability of the bank under consideration. Based on the revised beliefs, the auditor can choose to either pass a judgment on the bank, or perform further analysis on one or more aspects of the bank's financial viability.

When predicting a bank's future, a large number of different financial ratios are used. For expositional simplicity, we consider the nine ratios shown in Table 3 that have been frequently used in prior studies [34,35]. Since belief network representations require categorical data, the different ratios are classified as either above the group average, or below the group average, for each bank.

Using financial ratios to make predictions about financial stability is a complex process, and it usually takes an auditor many years of experience. Due to cognitive limitations, it is usually hard even for experienced auditors to evaluate the effect of a large number of evidence variables simultaneously on some unobservable decision variable [18]. In practice, auditors examine different aspects of a bank's performance, and then arrive at a comprehensive evaluation regarding its financial health. When analyzing a bank's performance, four important factors considered by an auditor are: loan quality, efficiency, profitability, and capital adequacy [34,35]. Three of these factors, loan quality, efficiency and profitability, are not directly measurable (we call such variables intermediate variables). Instead, auditors use surrogate financial ratios to examine a bank's performance in these areas. Such intermediate variables have been called latent variables in the econometric and sociology literature [27,36,38]. The use of these intermediate latent variables helps to decompose large complex problems into smaller tractable ones.

Table 3
Financial ratios used in study

Variable name    Description
1. PROVNLNS      Provision for loan losses to Net Loans
2. PROVOPIN      Provision for loan losses to Total Operating Income
3. NLNSASST      Net Loans/Total Assets
4. ROA           Net Income/Average Total Assets
5. ROE           Net Income/Average Total Equity
6. MARGIN        (Total Interest Income - Total Interest Expenses)/Total Assets
7. OPINCAST      Total Operating Income/Total Assets
8. OEOPINC      Total Operating Expenses/Total Operating Income
9. CAPADQ        Total Equity Capital/Total Assets

Fig. 5. A belief network structure without latent performance factors.

We include such intermediate variables in belief networks for two important reasons. First, by incorporating these variables, we can obtain parsimonious models that capture the important dependencies, without having to store a joint distribution that completely enumerates the conditional probability for the Financial Distress variable for every combination of outcomes of the different variables in the network. For example, if we were to represent the dependencies directly between the measurable financial ratios and the Financial Distress condition of a bank, we would have a network of the form shown in Fig. 5.

Since there are nine incoming arcs to the decision node (the Financial Distress condition) in the network, the distribution that has to be stored must include the conditional probabilities associated with all the possible realizations for the set of nine observable variables. This requires storing and processing an enormous number of probability parameters when the system is to be used in practice. In addition, obtaining precise estimates for all these parameters is very hard. The latent variables serve as mediating factors between the measurable variables and the final hypothesis, and their use can substantially reduce the number of parameters required to represent the relevant dependencies. For example, consider the case where the latent factor Loan Quality captures the effect of the ratios PROVNLNS, PROVOPIN and NLNSASST on the Financial Distress variable. The revised structure would be as shown in Fig. 6.

Fig. 6. Using latent factors in the network structure.

Using the Loan Quality factor as an intermediate variable helps to reduce the number of incoming arcs for the Financial Distress node in the network, by having the arcs from PROVNLNS, PROVOPIN and NLNSASST point to the Loan Quality node instead. The added increase in probability parameters required as a result of incorporating this new Loan Quality variable is more than adequately compensated by the reduction in the probability parameters required for the Financial Distress variable. For instance, if all the variables are considered to be binary, then the number of additional probability parameters required to capture the conditional probability distribution associated with the Loan Quality node is 2^4 (i.e., 16). However, the reduction in parameters required for the Financial Distress node is 2^10 - 2^8, which is 768. Even when the intermediate variable, Loan Quality, is allowed to take on three values instead of two, the number of new parameters required, 3 x 2^3 (which is 24), is a small fraction of the parameters eliminated for the Financial Distress node (which is 2^10 - 3 x 2^7 = 640). This example clearly illustrates how incorporating intermediate latent variables makes the representation far more compact, and subsequently much more efficient for belief propagation.
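The parameter counts in this example can be checked mechanically. The short sketch below is ours; it only multiplies realization counts, using the table-entry convention stated in the text (one probability per node value per parent configuration).

```python
from math import prod

def table_size(r_node, parent_levels):
    """Entries in a node's conditional probability table: one probability
    for each node value under each joint parent configuration."""
    return r_node * prod(parent_levels)

flat = table_size(2, [2] * 9)        # all nine binary ratios as direct parents
with_lq = table_size(2, [2] * 7)     # six ratios plus a binary Loan Quality parent
lq_cost = table_size(2, [2] * 3)     # the new Loan Quality table itself
assert (flat, with_lq, lq_cost) == (1024, 256, 16)
assert flat - with_lq == 768         # the reduction quoted in the text

lq3_cost = table_size(3, [2] * 3)    # three-valued Loan Quality node
with_lq3 = table_size(2, [2] * 6 + [3])
assert lq3_cost == 24 and flat - with_lq3 == 640
```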

A second reason for incorporating intermediate variables in the belief network structure is that the resulting structure matches the expert's intuitive model of the problem. Prior studies indicate that users more readily accept expert systems which use a reasoning model that matches their reasoning process [25]. For example, when using such a structure, the effect of a measurable variable like PROVOPIN on the Financial Distress condition of a firm is examined as follows. First, the belief regarding the latent variable Loan Quality is revised, and then this revised belief is used to update the belief regarding the Financial Distress condition.

4.2. Designing the belief network

We show how our program can be used to examine feasible belief network structures for this particular problem. The study uses data on the nine financial ratios obtained from 911 commercial banks 2, 126 of which eventually filed for bankruptcy. The sample included banks of varying sizes from all regions of the United States. In order to use the latent factors in the belief network model, we needed the historical information about these factors as well. However, such information was not publicly available. Therefore, in order to illustrate the design process, we have augmented the available data by artificially generating data for the three intermediate latent factors. This was done by using the available financial ratios to classify each bank on the three intermediate factors. We have used the rules shown below to generate this data. While the rules used are ad hoc, they capture the essential nature of these variables.

2 This data was obtained from the accounting firm KPMG Peat Marwick, and consists of year-end figures for the year 1988.

Loan quality
• If all three ratios, PROVNLNS, PROVOPIN and NLNSASST, are above the group average, then Loan Quality = Good
• If any two of the three ratios PROVNLNS, PROVOPIN and NLNSASST are above average, and the other ratio is within 20% of the average, then Loan Quality = Average
• For all other cases, Loan Quality = Poor

Profitability
• If at least two of the ratios ROA, ROE and MARGIN are above average, then Profitability = Good
• If exactly one of the ratios ROA, ROE and MARGIN is above average, then Profitability = Average
• If all three ratios ROA, ROE and MARGIN are below average, then Profitability = Poor

Efficiency
• If the ratio OEOPINC is below average, then Efficiency = Good
• If both the ratios OEOPINC and OPINCAST are above average, then Efficiency = Average 3
• If the ratio OEOPINC is above average, and OPINCAST is below average, then Efficiency = Poor

Fig. 7. A belief network for predicting financial distress in banks.

The entire data set (that includes the financial

ratios, the intermediate latent variables and the subsequent bankruptcy status for each bank in the sample) was provided as input to the program. To ensure that the available data captured the dependencies that are typically expected across the different variables for this problem, we used the design tool to compare the predictive power of some of the ratios that have been identified in prior studies as important predictors of the financial distress situation. The design tool was used to rank the predictive power of these individual ratios. This was accomplished by providing as alternative structures the different ratios, one at a time, as parents of the financial distress variable. The different alternatives were ranked using the mutual information measure. These rankings were found to be consistent with results obtained in prior studies.

3 When the ratio OEOPINC is above average, that is a sign of poor Efficiency. However, if OPINCAST is above average, that is a sign of good Efficiency. Therefore, when the two are observed simultaneously, the overall Efficiency is deemed to be average.
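Ranking single ratios by the information each conveys about the distress variable can be sketched as follows. This is an illustrative sketch, not the authors' program; the record layout and variable names are our assumptions.

```python
import math
from collections import Counter

def pairwise_mi(pairs):
    """I(X; Y) in nats, estimated from a list of (x, y) observations."""
    n = len(pairs)
    cx = Counter(x for x, _ in pairs)
    cy = Counter(y for _, y in pairs)
    cxy = Counter(pairs)
    return sum((k / n) * math.log(k * n / (cx[x] * cy[y]))
               for (x, y), k in cxy.items())

def rank_predictors(records, target, ratios):
    """Order candidate parent variables by the mutual information each
    conveys about the target variable, most predictive first."""
    score = {v: pairwise_mi([(r[v], r[target]) for r in records]) for v in ratios}
    return sorted(ratios, key=score.get, reverse=True)
```

Each ratio is evaluated in isolation here, mirroring the one-at-a-time comparison described in the text; interactions among ratios are handled only when larger parent sets are compared.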

To obtain the desired belief network, we allowed the program to determine, for the financial distress node and the intermediate variables, the best parent sets of different sizes ranging from one to five variables. For each of the three intermediate variables, all the financial ratios were provided as feasible parent variables. For the financial distress variable, the feasible parent variables included all the financial ratios as well as the three intermediate variables. We also used the program to perform the significance tests on the parent sets generated for each node. For example, to determine the final parent set for the Loan Quality variable, the program first calculated the mutual information independently conveyed by each financial ratio about Loan Quality. The ratio PROVOPIN was found to be the most predictive. Although the variable Loan Quality was generated based on the ratios PROVNLNS, PROVOPIN and NLNSASST only, the other financial ratios, when considered one at a time, also demonstrated significant predictive power for Loan Quality. This was because of the correlations that exist between the financial ratios under consideration. Subsequently, the program considered the set of two parents that were most predictive, and found the ratios PROVOPIN and NLNSASST to be the most predictive pair. The improvement in the mutual information for the two-parent case over the single-parent case (i.e., using both PROVOPIN and NLNSASST as compared to using only PROVOPIN) was found to be significant at the 5% level. As expected, the best three-parent set was determined to be PROVNLNS, PROVOPIN and NLNSASST. Once again, the additional parent (PROVNLNS) was found to significantly increase the mutual information measure. When parent sets with four variables were considered, none of the other ratios was found to contribute significantly to the Loan Quality variable. This was consistent with the way the data was generated for the Loan Quality variable. The program obtained the parent sets for the intermediate variables Profitability and Efficiency in a similar manner. The significant parent sets identified for each intermediate variable are shown in Fig. 7. As expected, the program selected exactly those financial ratios as parents for Profitability and Efficiency that were used to generate the data itself.

Next, the program examined the financial ratios as well as the intermediate variables as potential parents for the Financial Distress node. The variables selected as the best parent sets for the Financial Distress variable are shown in Table 4.

Table 4
Optimal parent sets of different sizes for the financial distress node

No. of parents    Variables selected
1                 ROA
2                 Profitability, Efficiency
3                 Profitability, Efficiency, Loan Quality
4                 Profitability, Efficiency, Loan Quality, CAPADQ
5                 Profitability, Efficiency, Loan Quality, CAPADQ, MARGIN

It is interesting to note that the financial ratio ROA was most predictive of the Financial Distress condition when a single variable was considered as a parent. When two variables were considered, Profitability and Efficiency taken together conveyed the maximum information about the Financial Distress condition. The fact that ROA was no longer included is not unusual, since our technique explicitly considers the mutual interaction of the variables in the parent set. The best parent set of size three included the three intermediate variables Profitability, Efficiency and Loan Quality. We found that the increase in mutual information on adding Loan Quality to the parent set was not significant at the 5% level. A likely reason for this is that we have used fairly rudimentary rules to generate data for the Loan Quality variable. An experienced auditor would usually be able to classify the Loan Quality of a bank in a more precise manner. When such information is used for constructing the network, we expect Loan Quality to be more significant. We found that the financial ratio CAPADQ was included in the best parent set, along with the three intermediate variables, when parent sets consisting of four variables were considered. Once again, the increase in mutual information was not significant at the 5% level. The same was true for the best five-variable parent set, which added MARGIN to the earlier solution.

In practice, we expect designers to consider solutions that result in the most logical structure for an application domain. Thus, even when the effect of a variable is not at the level of significance often associated with statistical tests, the expert may feel that it should be incorporated in the network. In such cases, an important consideration will be to examine the effect of the additional variable on the computational efficiency of the resulting network (when used to make inferences in actual use). For instance, having the three intermediate variables as parents for the Financial Distress node requires a conditional probability distribution for that node with 54 (= 2 x 3^3, since the Financial Distress variable is binary and each intermediate variable takes three values) probability parameters. Adding CAPADQ as a fourth parent requires 108 probability parameters. The increase in computational requirements for propagating beliefs is proportional to the increase in the probability parameters required. Since having a larger number of variables as parents usually leads to better solutions (in some small measure), the decision to include them will depend on the computational resources that are available in the work environment. Benchmark tests can be performed to estimate the response times that result from different sizes of the parent set. Results of such tests can help to decide whether a larger parent set is worthwhile or not.

5. Discussion

In this paper, we present a design tool to assist in the development of belief network based expert systems. The design tool uses information theoretic measures to compare alternative structures. The program can estimate the necessary probability parameters for network representations from sample data. Three important capabilities of the design tool are discussed with the help of examples. They are: (i) evaluating alternative structures; (ii) finding optimal networks with specified connectivity; and (iii) eliminating weak dependencies from derived network structures. We examine the performance of the design tool on many sets of simulated data, and show that the design tool is fairly accurate in recovering the important dependencies across variables in a problem domain. We illustrate how this program can be used to design a belief network for evaluating the financial distress situation for banks. We also show that the use of intermediate latent variables can lead to a parsimonious representation of the important dependencies for the financial distress problem. The techniques implemented in the design tool allow different structures to be easily evaluated, and can enable designers to consider many alternatives before arriving at a final design.

An interesting aspect of the design tool is that it allows users who have expertise in the application domain to guide the search for a desired network structure using their domain knowledge. Such users can often restrict the search process to consider only those structures that are appropriate for the relevant problem. The experimental results indicate that when used in this fashion, the program can serve as a powerful design tool. Relatively naive users can use the design tool to generate computationally efficient structures, and test the dependencies in proposed structures for statistical significance when sample data is used to generate these structures. When used in this fashion, the program will usually identify the right set of predictive attributes for each variable. The potential drawback is that it could also select additional variables that may not always be truly predictive.

In this research, the design of a belief network for the financial distress problem has been of an exploratory nature. An important issue for future research is to examine the performance of such a belief network based expert system. One way to do this would be to empirically validate an expert system that uses a belief network mechanism by comparing it to practicing auditors. Designs obtained from the current program can be implemented in the expert system, and used to revise beliefs about a bank's financial distress status based on observations of the financial ratios that are part of the structure. The results obtained from the system could then be compared with the revised beliefs of auditors when provided with the same information. To this end, we are currently examining the financial distress problem in further detail.

Acknowledgements

We wish to thank the anonymous referees whose helpful comments and suggestions have improved the quality of this paper.

References

[1] E.I. Altman, Financial Ratios, Discriminant Analysis, and the Prediction of Corporate Bankruptcy, Journal of Finance, September (1968) 589-609.

[2] E.I. Altman, R. Haldeman and P. Narayanan, Zeta Analysis: A New Model to Identify Bankruptcy Risk of Corporations, Journal of Banking and Finance, June (1977) 29-54.


[3] S.K. Asare and W.F. Messier, A Review of Audit Re- search Using the Belief-Adjustment Model, in: L.A. Ponemon and D.R.L. Gabhart (Eds.), Auditing: Ad- vances in Behavioral Research (Springer-Verlag, Berlin, 1991) pp. 75-91.

[4] R. Barniv and A. Raveh, Identifying Financial Distress: A New Nonparametric Approach, Journal of Business Finance and Accounting, Summer (1989) 361-383.

[5] D. Bobrow, Qualitative Reasoning About Physical Sys- tems (Elsevier, Amsterdam, 1984).

[6] G.W. Brier, Verification of Forecasts Expressed in Terms of Probability, Monthly Weather Review 78(1) (1958) 1-3.

[7] C.J. Casey and N. Bartczak, Using Operating Cash Flow Data to Predict Financial Distress: Some Extensions, Journal of Accounting Research, Spring (1985) 384-401.

[8] G.F. Cooper, NESTOR: A Computer-Based Medical Di- agnostic Aid that i[ntegrates Causal and Probabilistic Knowledge, PhD Di:~sertation, Stanford University, Stan- ford, CA, 1984.

[9] W.N. Dilla, R.G. File, I. Solomon and L.A. Tomassini, Predictive bankruptcy Judgements by Auditors: A Proba- bilistic Approach, in: L.A. Ponemon and D.R.L. Gabhart (Eds.), Auditing: Advances in Behavioral Research (Springer-Verlag, Berlin, 1991) 75-91.

[10] R.O. Duda, P.E.Hart, K.Konolige and R.Reboh, A Com- puter-Based Consukant for Mineral Exploration, Final Report, SR1 Projects 6415, SRI International, Menlo Park, CA, September 1979.

Il l] T.L. Fine, Theories of Probability (Academic Press, New York. 1973).

[12] l.J. Good, Rational Decisions, Journal of the Royal Sta- tistical Society B14 (1952) 107-114.

[13] D.E. Heckerman, Probabilistic Interpretation for MY- CIN's Certainty Factors, in: L.N. Kanal and J.F. Lemmer (Eds.), Uncertainty in Artificial Intelligence (North Hol- land, Amsterdam, 1586).

[14] F.L. Jones, Current Techniques in Bankruptcy Prediction, Journal of Accounting Literature 6 (1987) 131-164.

[15] S. Kullback, Information Theory and Statistics (Wiley, New York, 1959).

[16] H. Kyburg, Probability and Inductive Logic (MacMillan, London, 1970).

[17] S.L. Lauritzen and D.J. Spiegelhalter, Local Computations with Probabilities on Graphical Structures and Their Application to Expert Systems, Journal of the Royal Statistical Society B 50(2) (1988) 157-224.

[18] R. Libby, The Use of Simulated Decision Makers in Information Evaluation, The Accounting Review, July (1975) 475-489.

[19] D. Martin, Early Warning of Bank Failure, Journal of Banking and Finance 1 (1977) 249-276.

[20] W.F. Messier, Jr. and J.V. Hansen, Inducing Rules for Expert Systems Development, Management Science 34(12) (1988) 1403-1415.

[21] R.E. Neapolitan, Probabilistic Reasoning in Expert Systems: Theory and Algorithms (Wiley, New York, 1990).

[22] J.A. Ohlson, Financial Ratios and the Probabilistic Prediction of Bankruptcy, Journal of Accounting Research, Spring (1980) 109-131.

[23] J. Pearl, Fusion, Propagation, and Structuring in Belief Networks, Artificial Intelligence 29 (1986) 241-288.

[24] J. Pearl, Probabilistic Reasoning in Intelligent Systems (Morgan Kaufmann, San Mateo, CA, 1988).

[25] R. Quinlan, Decision Trees and Decision Making, IEEE Transactions on Systems, Man and Cybernetics 20 (1990) 339-346.

[26] T.B. Roby, Belief States and the Uses of Evidence, Behavioral Science 10 (1965) 255-270.

[27] D.E. Rumelhart and J.L. McClelland, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vols. I and II (MIT Press, Cambridge, MA, 1986).

[28] L.M. Salchenberger, E.M. Cinar and N.A. Lash, Neural Networks: A New Tool for Predicting Thrift Failures, Decision Sciences 23(4) (1992).

[29] S. Sarkar and I. Murthy, Criteria to Evaluate Approxi- mate Belief Network Representations in Expert Systems, Decision Support Systems 15 (1995) 323-350.

[30] S. Sarkar and I. Murthy, Some Theoretical Results for Approximating Belief Network Structures, Working Pa- per, Department of Information Systems & Decision Sciences, Louisiana State University, Baton Rouge, 1994.

[31] S. Schocken and P.R. Kleindorfer, Artificial Intelligence Dialects of the Bayesian Belief Revision Language, IEEE Transactions on Systems, Man and Cybernetics 19 (1989) 1106-1121.

[32] G. Shafer, A Mathematical Theory of Evidence (Princeton University Press, Princeton, NJ, 1976).

[33] E.H. Shortliffe and B.G. Buchanan, A Model of Inexact Reasoning in Medicine, in: B.G. Buchanan and E.H. Shortliffe (Eds.), Rule-Based Expert Systems (Addison Wesley, Reading, MA, 1984).

[34] J.F. Sinkey, A Multivariate Statistical Analysis of the Characteristics of Problem Banks, Journal of Finance, March (1975) 21-35.

[35] J.F. Sinkey, Problem Banks: Identification and Characteristics, Journal of Bank Research, Winter (1975) 208-217.

[36] P. Spirtes, C. Glymour and R. Scheines, Causation, Prediction and Search, Lecture Notes in Statistics (Springer-Verlag, Berlin, 1993).

[37] C.-A.S. Stael von Holstein, Assessment and Evaluation of Subjective Probability Distributions, The Economic Research Institute at the Stockholm School of Economics, Stockholm, 1970.

[38] L. Steels, Components of Expertise, AI Magazine, Winter (1990).

[39] K.Y. Tam and M.Y. Kiang, Managerial Applications of Neural Networks: The Case of Bank Failure Predictions, Management Science 38(7) (1992) 926-947.

[40] A. Tversky and D. Kahneman, Judgement Under Uncertainty: Heuristics and Biases, Science 185 (1974) 1124-1131.


[41] R.L. Winkler and R.M. Poses, Evaluating and Combining Physicians' Probabilities of Survival in an Intensive Care Unit, Management Science 39(12) (1993) 1526-1543.

[42] L.A. Zadeh, The Concept of a Linguistic Variable and Its Application to Approximate Reasoning, Information Sciences 9 (1975) 43-80.

Sumit Sarkar is Assistant Professor of Management Information Systems at the College of Business at Louisiana State University. He received his MS and PhD degrees in Computers and Information Systems from the University of Rochester. His current research interests include the design of knowledge-based systems, the representation of uncertainty in expert systems and databases, and the economics of information systems. He is a member of AAAI, ACM, INFORMS and IEEE Computer Society.

Ram S. Sriram is an Associate Professor of Accounting Information Systems at Georgia State University. He received his PhD in Accounting from the University of North Texas. He is also a Certified Public Accountant and Certified Fraud Examiner. His current research interests are in the areas of auditing, neural networks, and expert systems.

Shibu Joykutty received his BS degree in Physics from Gandhi University, Kerala, India in July 1987, and his MS degree in Operations Research and Computer Applications from Cochin University of Science and Technology, India, in December 1989. He worked as a Programmer Analyst and Consultant at Delta Management Consultants from January 1990 to December 1992. He received his MS degree in Quantitative Business Analysis from Louisiana State University, Baton Rouge, in May 1995. Currently he is an internal auditor at the Corporate Audit Services at Sprint, Inc.

Ishwar Murthy is Associate Professor in the Department of Information Systems and Decision Sciences at Louisiana State University, Baton Rouge. He received his PhD degree in Management Science from Texas A&M University. Dr. Murthy's research interests include Network Optimization, Multiobjective Optimization and Mathematical Programming Applications in Telecommunications and Expert Systems. His published articles have appeared in Annals of Operations Research, European Journal of Operational Research, Naval Research Logistics and Operations Research, among others.