

Bayesian Belief Networks: Computational Considerations for the Environmental Researcher

Stephen Jensen & Alix I. Gitelman
Statistics Department

Oregon State University
August 2003

Bayesian Belief Networks (BBN)

Bayesian Belief Networks are a class of models that succinctly describe the dependencies and interactions among large sets of variables. BBN have been used extensively in fields such as artificial intelligence and the decision sciences. BBN are also well suited to environmental research, with its large numbers of variables and extensive dependencies and interactions. In fact, BBN are ideally suited to studying relationships between effluent limitations and water quality (e.g., Borsuk et al., 2003).

BBN are graphical models: a set of nodes (vertices) represents the random variables, and a set of edges represents relationships between the nodes. A BBN can contain directed or undirected edges, or a mixture of the two (some examples are shown in the figure below).

Undirected BBN have their genesis in the statistical physics work of Gibbs (1902). They have also been used over the years for modeling spatial interactions (Besag, 1974) and for interpreting hierarchical log-linear models by analogy to Markov random fields (Darroch et al., 1980). Directed BBN originated in path analysis (Wright, 1921). They have seen much more use recently in the guise of causal networks (Pearl, 1986; Lauritzen & Spiegelhalter, 1988).

Estimating BBN

Several directed BBN algorithms have been devised, including HUGIN (Andersen et al., 1989), TETRAD (Spirtes et al., 1993) and DEAL (Bøttcher & Dethlefsen, 2003). To determine a BBN model for a set of data, a canned package such as HUGIN or DEAL can be used on its own. These packages search across some space of likely models, using either an exhaustive search or a heuristic such as greedy search, and score the candidate networks in some manner to determine the best ones. These packages are useful for structural learning, that is, for determining which nodes are connected to which. However, they give neither the probability that an edge is included in the graph nor standard error estimates for the parameter (i.e., node probability) estimates. Furthermore, for efficiency, these canned packages all require frequent triangulation of the graph, an NP-hard problem, although restricting attention to decomposable models requires only a single triangulation (Deshpande et al., 2001). A minimal sketch of this score-and-search approach follows.
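To make the score-and-search idea concrete, here is a minimal sketch of greedy hill climbing over DAGs with a BIC score for continuous (Gaussian) data. It illustrates only the generic approach described above; it is not the algorithm implemented in HUGIN, TETRAD, or DEAL, and all function names are hypothetical.

```python
# Minimal sketch of greedy score-and-search structure learning for a Gaussian
# DAG. The score is BIC, which decomposes into per-node regression terms.
import itertools
import numpy as np

def node_bic(X, child, parents):
    """BIC term for one node: regress the child on its parents."""
    n = X.shape[0]
    y = X[:, child]
    Z = np.column_stack([np.ones(n)] + [X[:, p] for p in parents])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    sigma2 = max(float(resid @ resid) / n, 1e-12)
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = len(parents) + 2                      # slopes + intercept + variance
    return loglik - 0.5 * k * np.log(n)

def bic_score(X, dag):
    """Total score: sum of per-node BIC terms (dag maps child -> parent set)."""
    return sum(node_bic(X, c, sorted(ps)) for c, ps in dag.items())

def is_acyclic(dag):
    """Acyclicity check by repeatedly deleting parentless nodes."""
    rem = {c: set(ps) for c, ps in dag.items()}
    while rem:
        roots = [c for c, ps in rem.items() if not ps]
        if not roots:
            return False
        for r in roots:
            del rem[r]
        for ps in rem.values():
            ps.difference_update(roots)
    return True

def greedy_search(X):
    """Hill climbing: toggle single edges until no move improves the score."""
    d = X.shape[1]
    dag = {c: set() for c in range(d)}        # start from the empty graph
    best = bic_score(X, dag)
    improved = True
    while improved:
        improved = False
        for a, b in itertools.permutations(range(d), 2):
            cand = {c: set(ps) for c, ps in dag.items()}
            if a in cand[b]:
                cand[b].discard(a)            # candidate move: delete edge a -> b
            else:
                cand[b].add(a)                # candidate move: add edge a -> b
            if not is_acyclic(cand):
                continue
            s = bic_score(X, cand)
            if s > best:
                dag, best, improved = cand, s, True
    return dag, best
```

The hill climber returns a single best-scoring graph, which is precisely the limitation noted above: there is no probability attached to any individual edge.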

An extension of this method is to first use HUGIN or another package to find a set of candidate models (that is, to perform the structural learning), and then to estimate the parameters of those models using a Markov chain Monte Carlo method (the parameter learning phase). In this way, variability estimates for the model parameters can be calculated. However, the approach is two-stage, and the probability that a given edge is included in the model remains uncalculated. Others have simply assumed a structure for a BBN and then used extant software to estimate the strengths of the edges (Borsuk et al., 2003). A sketch of the parameter-learning stage appears below.
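Here is a hedged sketch of that second, parameter-learning stage: with the DAG held fixed, a random-walk Metropolis-Hastings sampler draws the edge weights, whose posterior spread supplies the variability estimates mentioned above. The Gaussian model, unit noise variance, and N(0,1) priors are simplifying assumptions, not the method of any particular package.

```python
# Sketch of fixed-structure parameter learning by random-walk Metropolis-
# Hastings. The DAG (child -> set of parents) is fixed; only the edge
# weights are sampled. Unit noise variance and N(0,1) weight priors are
# simplifying assumptions made for brevity.
import math
import random
import numpy as np

def log_post(w, dag, X):
    """Gaussian log-likelihood of each node given its parents, plus
    independent N(0,1) priors on the edge weights."""
    lp = sum(-0.5 * wi ** 2 for wi in w.values())
    for child, parents in dag.items():
        mean = sum(w[(p, child)] * X[:, p] for p in parents) if parents else 0.0
        resid = X[:, child] - mean
        lp -= 0.5 * float(resid @ resid)
    return lp

def mh_parameters(dag, X, n_iter=5000, step=0.1):
    """Return posterior draws of the edge weights for a fixed DAG."""
    edges = [(p, c) for c, ps in dag.items() for p in ps]
    w = {e: 0.0 for e in edges}
    draws = []
    for _ in range(n_iter):
        for e in edges:                       # update one weight at a time
            cand = dict(w)
            cand[e] = w[e] + random.gauss(0.0, step)
            if math.log(random.random()) < log_post(cand, dag, X) - log_post(w, dag, X):
                w = cand
        draws.append(dict(w))
    return draws
```

Means and standard deviations across the draws give the parameter estimates and standard errors that the canned packages omit; what this stage still cannot give is the probability that an edge belongs in the graph.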

Reversible jump Markov chain Monte Carlo (RJMCMC; Green, 1995) is a generalization of the Metropolis-Hastings algorithm to models whose parameter spaces vary in dimension. This makes it well suited to graphical model estimation, where the number (and direction) of edges may be unknown a priori. Using RJMCMC, we can accomplish both structural and parameter learning, and edge probabilities and estimates of variability are calculated as a matter of course. RJMCMC works well for discrete undirected graphical models (Dellaportas & Forster, 1999) but is computationally demanding; efficiency is improved by restricting the search to decomposable models. Algorithms for undirected decomposable graphs (UDGs) exist in purely continuous (Giudici & Green, 1999) and purely discrete (Giudici et al., 2000) settings. Fronk & Giudici (2000) provide an algorithm for directed acyclic graphs (DAGs).

The RJMCMC algorithms for graphical models involve three steps:

1. Select an edge for addition, removal or reversal.

2. Decide whether to make the structural change and update the structure and parameters.

3. Update the parameters in some fashion without making a structural change.

RJMCMC algorithms often require the same re-triangulation of the graph as the traditional algorithms, making the structural updating quite computationally intensive. Restriction to decomposable models eases the computation, as the model structure can then be updated in polynomial time (Deshpande et al., 2001). Updating the parameters (step 3 above) is a simple Metropolis-Hastings step, which is rather quick. A skeleton of one such sweep is sketched below.
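The following skeleton walks through the three steps listed above for a Gaussian DAG with one weight per edge. It is an illustrative sketch under simplifying assumptions (unit noise variance, N(0,1) weight priors, new weights proposed from their prior, move-choice probabilities treated as symmetric); it is not the exact algorithm of Fronk & Giudici (2000), and all names are hypothetical.

```python
# Skeleton of one three-step RJMCMC sweep over DAG structures and parameters.
import math
import random
import numpy as np

def log_lik(dag, w, X):
    """Gaussian log-likelihood of each node given its parents (unit variance)."""
    ll = 0.0
    for child, parents in dag.items():
        mean = sum(w[(p, child)] * X[:, p] for p in parents) if parents else 0.0
        resid = X[:, child] - mean
        ll -= 0.5 * float(resid @ resid)
    return ll

def is_acyclic(dag):
    """Acyclicity check by repeatedly deleting parentless nodes."""
    rem = {c: set(ps) for c, ps in dag.items()}
    while rem:
        roots = [c for c, ps in rem.items() if not ps]
        if not roots:
            return False
        for r in roots:
            del rem[r]
        for ps in rem.values():
            ps.difference_update(roots)
    return True

def sweep(dag, w, X):
    d = len(dag)
    # Step 1: pick an ordered pair of nodes; propose add, remove, or reverse.
    a, b = random.sample(range(d), 2)
    nd = {c: set(ps) for c, ps in dag.items()}
    nw = dict(w)
    if a in nd[b]:                            # edge a -> b currently present
        nd[b].discard(a)
        if random.random() < 0.5:             # reverse it, carrying the weight over
            nd[a].add(b)
            nw[(b, a)] = nw.pop((a, b))
        else:                                 # remove it (a dimension-dropping move)
            nw.pop((a, b))
    else:                                     # add a -> b, weight drawn from its prior
        nd[b].add(a)
        nw[(a, b)] = random.gauss(0.0, 1.0)
    # Step 2: accept or reject the structural change. With prior draws as the
    # proposal and a Jacobian of 1, the acceptance ratio reduces to a
    # likelihood ratio (move-choice probabilities treated as symmetric).
    if is_acyclic(nd):
        if math.log(random.random()) < log_lik(nd, nw, X) - log_lik(dag, w, X):
            dag, w = nd, nw
    # Step 3: fixed-dimension random-walk Metropolis update of one weight.
    if w:
        e = random.choice(list(w))
        cw = dict(w)
        cw[e] = w[e] + random.gauss(0.0, 0.1)
        log_a = (log_lik(dag, cw, X) - log_lik(dag, w, X)
                 + 0.5 * (w[e] ** 2 - cw[e] ** 2))   # N(0,1) prior ratio
        if math.log(random.random()) < log_a:
            w = cw
    return dag, w
```

Edge probabilities are then estimated by running many sweeps and recording the fraction of sweeps in which each edge is present.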

Recommendations

Depending on the nature of prior information regarding the ecological systems one wishes to model using BBN, there are several computational approaches from which to choose.

• Assume “known” nodes, and perform structural learning (e.g., DEAL, HUGIN).

• Assume “known” edges, and perform parameter learning (e.g., WinBUGS).

• Perform a two-step approach: first learn the structure, then the parameters.

• Perform structural and parameter learning simultaneously (RJMCMC; e.g., BayesX).

Tradeoffs among these approaches include computational expense (both programming cost and CPU time) and the feasibility (really, the availability) of probability and uncertainty estimates.

Four-Node Example: Structural Learning vs. RJMCMC Methods

or, “This is all very nice, Steve, but will it work?”

In this example we consider a simple data set of four continuous variables, to which we fit a directed graphical model. The model is fit using a traditional algorithm (in this case DEAL) and also using an RJMCMC algorithm (the package BayesX). The data come from the Mid-Atlantic Integrated Assessment (MAIA) data set for 1997-1998. Specifically, the variables are BUGIBI, an index of biotic integrity for macro-invertebrates; LAT, the latitude of the sample point; ANC, acid neutralizing capacity; and SO4, sulfate concentration.
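As a purely hypothetical driver, the greedy-search sketch from the “Estimating BBN” section could be pointed at these four variables as follows; the file name and column layout are assumptions about how such a MAIA extract might be stored.

```python
# Hypothetical driver for the four-node example, reusing greedy_search from
# the sketch in "Estimating BBN" above. File and column names are assumed.
import numpy as np
import pandas as pd

cols = ["BUGIBI", "LAT", "ANC", "SO4"]
df = pd.read_csv("maia_1997_1998.csv")[cols].dropna()
X = ((df - df.mean()) / df.std()).to_numpy()   # standardize before fitting
dag, bic = greedy_search(X)                    # from the earlier sketch
for child, parents in dag.items():
    print({cols[p] for p in parents}, "->", cols[child])
```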

References

Andersen, S.K., Olesen, K.G., Jensen, F.V. and Jensen, F. (1989). HUGIN: a shell for building Bayesian belief universes for expert systems. In Shafer, G. and Pearl, J. (eds) (1990), Readings in Uncertain Reasoning. Morgan Kaufmann Publishers, San Mateo, CA.

BayesX. http://www.stat.uni-muenchen.de/~lang/bayesx/bayesx.html

Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems (with discussion). J. Roy. Statist. Soc. Ser. B 36 192-236.

Borsuk, M.E., Stow, C.A. and Reckhow, K.H. (2003). Integrated approach to total maximum daily load development for Neuse River Estuary using Bayesian probability network model. J. Water Res. Pl. & Mgmt. 129(4) 271-282.

Bøttcher, S.G. and Dethlefsen, C. (2003). DEAL: A package for learning Bayesian networks. Technical report, R-2003-03. Department of Mathematical Sciences, Aalborg University.

BUGS. http://www.mrc-bsu.cam.ac.uk/bugs/welcome.shtml

Darroch, J.N., Lauritzen, S.L. and Speed, T.P. (1980). Markov fields and log-linear interaction models for contingency tables. Ann. Stat. 8 522-539.

Dawid, A. and Lauritzen, S. (1993). Hyper Markov laws in the statistical analysis of decomposable graphical models. Ann. Stat. 21 1272-1317.

DEAL. http://www.math.auc.dk/novo/deal/

Dellaportas, P. and Forster, J.J. (1999). Markov chain Monte Carlo model determination for hierarchical and graphical log-linear models. Biometrika 86 615-633.

Deshpande, A., Garofalakis, M. and Jordan, M.I. (2001). Efficient stepwise selection in decomposable models. In Breese, J. and Koller, D. (eds), Uncertainty in Artificial Intelligence (UAI): Proceedings of the Seventeenth Conference, 2001.

Fronk, E. and Giudici, P. (2000). Markov chain Monte Carlo model selection for DAG models. Technical report #118, Dept. of Political Economy and Quantitative Methods, University of Pavia.

Gansner, E., Koutsofios, E. and North, S. Drawing graphs with dot. http://www.research.att.com/sw/tools/graphviz/dotguide.pdf

Gibbs, W. (1902). Elementary Principles of Statistical Mechanics. Yale University Press, New Haven, Connecticut.

Giudici, P., Green, P.J. and Tarantola, C. (2000). Efficient model determination for discrete graphical models. Biometrika. To appear.

Giudici, P. and Green, P.J. (1999). Decomposable graphical Gaussian model determination. Biometrika 86 785-801.

Green, P.J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82 711-732.

HUGIN. http://www.hugin.com/

Lauritzen, S.L. (2003). gRaphical models -- A Software Perspective. Slides from DSC-2003.

Lauritzen, S.L. and Spiegelhalter, D.J. (1988). Local computations with probabilities on graphical structures and their application to expert systems (with discussion). J. Roy. Stat. Soc. Ser. B 50 157-224.

Pearl, J. (1986). Fusion, propagation and structuring in belief networks. Artificial Intelligence 29 241-288.

Spirtes, P., Glymour, C. and Scheines, R. (1993). Causation, Prediction and Search. Springer-Verlag, New York.

TETRAD. http://www.phil.cmu.edu/projects/tetrad/

Waagepetersen, R. and Sorensen, D. (2001). A tutorial on reversible jump MCMC with a view toward QTL-mapping. International Statistical Review 69 49-61.

Wright, S. (1921). Correlation and causation. J. Agric. Res. 20 557-585.

[Figure: examples of BBN graph types referenced above: a directed graph, an undirected graph, and a chain graph.]

[Figure: four candidate models (deal1-deal4, drawn with dot) for the four-node example, compared under the Structural Learning Method (relative scores: 1, 0.8211, 0.7234, 0.7999) and the Reversible Jump MCMC Method (probabilities: 0.130, 0.125, 0.107, 0.638).]

This research is funded by the U.S. EPA Science To Achieve Results (STAR) Program, Cooperative Agreement #CR-829095.

The work reported here was developed under the STAR Research Assistance Agreement CR-829095 awarded by the U.S. Environmental Protection Agency (EPA) to Colorado State University. This poster has not been formally reviewed by EPA. The views expressed here are solely those of the authors and of STARMAP, the program they represent. EPA does not endorse any products or commercial services mentioned in this presentation.


