visualizing inference in large bayesian networks


  • 8/13/2019 Visualizing Inference in Large Bayesian Networks


1 Introduction

The application of machine learning in the 21st century is increasingly both exciting and challenging, with many orders of magnitude more digital data available than before. The amount of text data on the internet has increased from an estimated couple of terabytes in 1997 to Twitter.com alone storing 5 gigabytes of new tweets daily [1][2]. This does not count the many private databases used in enterprise, such as the petabytes of customer and transaction information Wal-Mart Stores retains [3].

Though the collection of raw data will continue to have its challenges and costs, significant attention has now turned to the problem of utilizing all of this data. Whether the goal is indexing, data mining, or building predictive models, today's challenge is fundamentally tied to the enormous number of observations and variables captured. This is the so-called "big data" problem.

Equally important as the algorithms or storage systems are the visualization methods. Presentation is not simply an aesthetic concern. Edward Tufte writes that "often the most effective way to describe, explore, and summarize a set of numbers, even a very large set, is to look at pictures of those numbers," and that data graphics can be both "the simplest [and] most powerful" of methods [4].

Bayesian networks in many applications enable efficient and scalable statistical and causal modeling [5], and have a natural visual representation following from their graph structure. By viewing a rendering of the graph structure, one can quickly identify potential correlations or causal relationships between variables simply from the presence of edges. Beyond this simple case, more sophisticated analysis by visual means alone is difficult, especially as the model grows in size. Viewing conditional distributions with more than a couple of parent variables quickly becomes unwieldy, and networks with upwards of 50 variables can be difficult for a user to navigate and parse visually. Recent work has focused on improving visualization and navigation in large networks of up to thousands of variables [6], but this is not yet a solved problem. To understand these large, modern networks more efficiently, and in turn better utilize the wealth of data available in the era of big data, new methods of visualization are needed.

To this end, we introduce two ideas to assist the visual analysis of large Bayesian networks: inference diffs and relevance filtering.

2 Summary of Prior Work

Though creating effective ways to visualize Bayesian networks is not a new problem relative to the age of "Bayesian networks" as proposed in detail by Pearl in 1988 [7], it appears to be a problem that has received relatively little attention. While there have been advances in visualizing large graphs, such as those surveyed by Schaeffer [8], these methods depend at most on basic graph-theoretic information, such as cliques and node degree, and do not directly consider the probabilistic aspects of Bayesian networks.

Draft v5, 2013 December 9

Visualizing Inference in Large Bayesian Networks

Clifford Champion, Department of Computer Science and Engineering

In this project we address the challenge of viewing and using Bayesian networks as their structural size and complexity grow. We introduce two new visualization methods, inference diffs and relevance filtering, to enable visual analysis of information flow in these networks, and to enable direct comparison of two evidence configurations simultaneously. We implement and discuss the performance of these visualization methods on two modestly large networks, built from real-world data.

Nevertheless, a variety of visual designs and principles specific to Bayesian networks have been developed or explored over time, and these are briefly recounted here.

To visualize causal relationships globally, so-called temporal or causal layouts are popular, placing ancestors (e.g. independent variables) near the top of the visual layout and descendants (e.g. dependent variables) near the bottom, for a generally downward flow of edge directions and thus a downward flow of causation. This kind of layout is often used without explicit mention and is a feature of some directed-graph layout algorithms, but Zapata-Rivera et al. and Chiang et al. called out this layout explicitly [9][10].

To visualize local influence (i.e. between exactly two variables), the direction of the edge arrow is of course well established for indicating the direction of modeled cause and effect. Beyond simply the edge direction, or its presence at all, Zapata-Rivera et al. explored fixed color assignments for independent variables, and color mixtures thereof for dependent variables, weighted to indicate the relative strength of influence from the parent variables [9]. Zapata-Rivera also considered varying edge lengths, so that mutually influential nodes appear nearer to one another than if uninfluential. Further, both Zapata-Rivera et al. and Koiter explored varying edge thickness to indicate influence between parent and child variables, using thick lines for strong influence. Each discussed various analytical definitions for computing the input values needed for these visualization techniques [9][11].

To visualize conditional probability tables (CPTs), Chiang et al. proposed miniature 2D heatmaps attached to edges [10]; however, this appears to be well defined only for children with exactly one parent each. Cossalter et al. introduced "bubble lines" connecting nodes in the network to floating CPT windows, making it easier for the user to keep their bearings while debugging CPTs in large networks. They also introduced a numerical difference view for viewing the CPTs of two variables expected to have similar local distributions [12].

To visualize the presence of evidence, common practice in the literature is to draw a double border around observed nodes (variables with evidence), or to use shading on the interior of the node. Williams and Amant experimented with using different colors of shading to indicate different evidence values [13].

To visualize marginal and posterior distributions, at least three techniques have been explored. The software application Netica used rectangular nodes instead of circular ones, in order to embed bar charts for the marginal probability masses of each variable [14]. The software application BayesiaLab allowed the user to open a distribution window for each variable and compare the prior (no evidence) and posterior (with evidence) distributions of a variable in two horizontal bar charts overlaid on one another [15]. Zapata-Rivera et al. used node diameter to indicate large or small posterior probabilities for binary-valued variables, and animation thereof to indicate changes to posteriors under changing evidence [9].

To visualize local and global information simultaneously, Sundararajan et al. employed a partition and fish-eye approach to graph layout, letting the user define and inspect local areas of interest in the network while still seeing the context and structure of the full network [6].

A common trait among most of these approaches is their dependence on relatively static information about the network, whether this be the conditional probability tables, or simple posterior or marginal distributions. Our goal is to create a visualization that captures a more dynamic view of Bayesian networks, hopefully shining new light on information flow, and to scale effectively in large networks. We outline our basic design choices next, using or iterating on prior work, and upon this foundation introduce a more dynamic approach to visualizing Bayesian networks using inference diffs.



3 Visual Foundation

3.1 Assumptions and Principles

Our approach is to define a visual language equally suited for print or personal computer, consistent with the principles proposed by Edward Tufte on "graphical displays" [4]. When a human-computer interaction is discussed, we assume one user at a time, that they are using a mouse or a touch-screen interface, and that their display size and DPI are those of an average tablet or desktop display.

Our underlying model to visualize is a Bayesian network of finitely many random variables, each variable having a finite event space. We assume at the very least the user would desire to view the Bayesian network structure, inspect local conditional probability distributions, see marginal or posterior distributions, inspect the event space of each variable, and otherwise clearly see the basic makeup of the network instance. These assumptions are sufficient for defining the foundation of our visual design.

To stay disciplined and focused, we will also seek to avoid "chart junk", avoid distorting data or potentially misleading the user, and avoid unnecessary ink, maximizing the "data-ink ratio" [4]. Every pixel or drop of ink should convey information, and convey it unambiguously.

3.2 Network Structure and Random Variables

The most basic information in a Bayesian network is the structure and the random variables therein. To present the structure is to present the learned (or constructed) causal influence between variables. Logically, this object is a directed acyclic graph (DAG); visually, it is traditionally a collection of labeled circles and arrows between them. We largely continue this tradition.

Random variables must be clearly identifiable, while at the same time clutter must be kept under control, otherwise it becomes noise. To this end, there are two views: structural and legend. The structural view presents a variable as a single, circumscribed capital letter, taken from the first letter in the name of the variable. The legend view maps these letters to their full variable names, such as A to Age. Capital letters are used in the structural view for readability. As with previous methods, vertical ordering in the structural view is presented causally top-down to the extent possible, though this is not always achievable, especially in large networks.

To scale to large networks, we do two more things. First, where two or more random variables share the same single letter in the structural view, we suffix their names with a unique number, chosen sequentially from 1. This numerical suffix appears in both the structural view and the legend view, in subscript type. Second, both views are scrollable, and both follow loosely the same top-to-bottom variable ordering.

On the appearance of random variables in the structural view, we de-emphasize the dark stroke that traditionally circumscribes the variable, as we will be using this stroke to carry meaning later.
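The letter-and-suffix labeling scheme above can be sketched in a few lines of Python. The variable names are hypothetical, and the actual application renders the numeric suffix in subscript type:

```python
from collections import Counter

def abbreviate(names):
    """Map each variable name to its structural-view label: the capitalized
    first letter, plus a sequential numeric suffix (from 1) when two or more
    variables share that letter."""
    first = [name[0].upper() for name in names]
    totals = Counter(first)   # how many variables claim each letter
    seen = Counter()          # running count per letter, for suffixing
    labels = {}
    for name, letter in zip(names, first):
        if totals[letter] > 1:
            seen[letter] += 1
            labels[name] = f"{letter}{seen[letter]}"
        else:
            labels[name] = letter
    return labels
```

For example, `abbreviate(["age", "travel", "tenure"])` labels the two T-variables `T1` and `T2` while `age` stays simply `A`.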

3.3 Event Space Colors

Color assignments are scoped within each variable, not over the collection of variables. This permits us to design a reusable, optimal color palette with minimal visual confusion within the context of a single variable. Though there is possible ambiguity in values of different variables sharing the same color, our design avoids this ambiguity by always framing color in the context of a specific variable. Where two or more variables share the same event space, we reuse the color mapping for stronger consistency.

To indicate what the color assignments are for each value of each variable, we augment our legend view to list each event-space value with its corresponding assigned color, as seen in Figure 2.

For categorical event spaces our color map is constructed so that no two contiguous colors are perceived as too near to one another: for example, orange may follow green but may not follow brown. For ordered event spaces the color map is constructed in the opposite fashion, by sequentially choosing neighboring hues on the color wheel, e.g. from the blue region, through yellow, to the red region.

As will be important later, we ensure any presented color order (value order) is constant; e.g. for a particular variable, blue always appears first, before orange.
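As a rough sketch of the two palette constructions described above, one can walk the color wheel sequentially for ordered event spaces, and reshuffle those same hues for categorical ones. The saturation and value constants and the interleaving scheme here are our own assumptions, not the paper's actual palette:

```python
import colorsys

def ordered_palette(n):
    """n hues stepping sequentially through the color wheel, from the blue
    region (hue 2/3) through yellow toward the red region (hue 0)."""
    return [colorsys.hsv_to_rgb(2.0 / 3.0 * (1.0 - i / max(1, n - 1)), 0.8, 0.9)
            for i in range(n)]

def categorical_palette(n):
    """The same hues, interleaved from the front and back halves of the ramp so
    that contiguous colors are not perceived as too near one another."""
    base = ordered_palette(n)
    half = (n + 1) // 2
    out = []
    for i in range(half):
        out.append(base[i])
        if i + half < n:
            out.append(base[i + half])
    return out
```

Both functions return the same set of RGB colors; only the presentation order differs, which is the property the text asks for.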

3.4 Conditional Probability Tables

One of the practical difficulties with Bayesian networks can be the size of each conditional probability table (CPT), or local distribution. Recall that the size of a random variable's CPT is generally the number of probability weight assignments specified, which grows exponentially in the number of parent variables (the in-degree of the variable). Some tools present the CPT as a single table with columns for each permutation of parent values, but this tends to require a large horizontal scrolling area. We propose that vertical scrolling is more natural, and present our CPTs vertically. We present each conditional probability distribution for a given parent permutation as a simple vertical list of probability densities. We stack each such list vertically, and separate each by its corresponding parent value permutation, placed above it. We use our event-space color mappings here, for each probability density and each parent value, and again use the corresponding abbreviated variable names (e.g. A1).
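The size claim above reduces to a one-line computation; the example numbers are illustrative:

```python
from math import prod

def cpt_size(num_own_values, parent_value_counts):
    """Number of probability entries in a variable's CPT: one per combination
    of its own value and a permutation of its parents' values, so the table
    grows exponentially with in-degree."""
    return num_own_values * prod(parent_value_counts)

# e.g. a binary variable with four 3-valued parents needs 2 * 3**4 = 162 entries
```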

3.5 Embedded Distributions

Viewing the marginal distribution of each variable should be convenient; that is, we want a clear way to see P(X) for each random variable X. Most tools require the user to open an additional window to see such distributions, either in tabular form or as a bar chart. We simply embed the distribution directly in the variable in the structural view. To do this we construct a pie chart using our event-space color mapping, and render pie slices proportional in size to the posterior probability mass of each value, starting at the 12 o'clock position and allocating slices in clockwise order.

This highly visual approach conveniently presents an overview of the entire statistical model to the user, without their needing to inspect variables one by one in sequence, or in additional views. Where more precise numerical inspection is needed, we use an additional color-coded tabular view similar to our CPT view.

Figure 2: Values in each random variable's event space are mapped to a preset color palette.

Figure 3: Conditional probability table view for a variable T2.

Figure 4: Marginal or posterior distributions are embedded directly in the variable via an area pie chart.

3.6 Evidence

Arguably the greatest power of a Bayesian network model, once it is trained or constructed, is in computing posterior distributions under arbitrary evidence, i.e. P(X | E). We will not explicitly discuss causal modeling and interventions (the analogue of evidence) until later, as most of our visual design for statistical models has a natural extension to causal models.

From a visualization perspective it is important that the user clearly see which variables currently have evidence and which do not, and furthermore, what specific values of evidence have been specified. To indicate that a variable has user-defined evidence, we circumscribe the variable in the structural view with a strong black stroke. Moreover, the interior of the node is colored entirely with the associated color of the evidence value, as seen in Figure 5.

Finally, all embedded distributions for non-evidence nodes are updated in the structural view to reflect each variable's new, posterior distribution. Our previous visualization, which embeds marginal distributions, is of course simply a special case of visualizing posterior distributions with an empty evidence set. We will formalize our notion of evidence in finer detail shortly.

4 Making an Inference Diff

4.1 Motivation

With a basic visual foundation established, we turn our attention to more sophisticated visual analysis methods. The basic idea we introduce first is that of an inference diff.

Inference and information flow are important capabilities of Bayesian networks. Consider a large network which models the health of components in a large multi-component system. One may wish to use this network to ask which components' probabilities of failure are affected by one or more other variables, for instance ambient air temperature, and for those affected, to what degree; or, inversely, to ask what are the most likely environmental conditions given a failure in one or more components. As networks grow in size, the answers to these questions can be as difficult to find as the right question to ask in the first place.

There are analytical tools such as d-separation; however, such a tool is limited in its application, largely because a Bayesian network is itself inherently limited in its ability to describe certain independence relations [16]. Recall that for a network G, its I-map I(G) may not be a minimal I-map, meaning G contains unnecessary edges and is too safe in its conditional independence claims. Moreover, for a true joint distribution it may be impossible for any Bayesian network to have a perfect I-map or P-map. There may also be context-specific independencies in the network, not discernible from the network structure alone. Further, the user may not need to know or care about certain dependencies if they are small or approximately independent. In each case these issues can lessen the usefulness of d-separation analysis in practice, or require more complicated analysis.

On the other hand, there is exhaustive computation: using inference algorithms to produce complete posterior distributions for some or all of the non-evidence variables. Such output is much more detailed and precise, but suffers from another limitation in that it is static. From it there is no direct indication of how, or to what degree, belief propagation occurred. There is simply a before and after state of the distribution.

Figure 5: Evidence nodes are circumscribed with a strong black stroke, and colored according to their evidence. All other nodes' embedded distributions are updated to reflect their posterior distributions. For example, T1's embedded distribution now reflects P(T1 | V=v, A=a) rather than simply P(T1).

What we would like is a way to visualize, in an obvious way, the effects of information flow through the network. To this end we find inspiration in modern software engineering practices and so-called "diff" tools (short for "difference" tools). Reviewing the "diff" of two or more human-readable files is an everyday practice in commercial software development, generally aided by the use of color and side-by-side before-and-after views. As a means of visualizing change, diffs are highly effective and visually intuitive. Our goal is to find an equally effective method of viewing inference and information flow in Bayesian networks, in hopes of enabling a more powerful kind of visual analysis.

4.2 Definition

We start with a mathematical definition of an "inference diff". Given a Bayesian network B describing a probabilistic model over n random variables {Xi : i ∈ [1, n]}, each with finite event space Si, and given two evidence sets E1 and E2, each an element of the set of partial observations ∏_{i=1}^{n} (Si ∪ {?}), we define an inference diff to be the set of pairs

Δ = { ( P(Xi | E1), P(Xi | E2) ) : i ∈ [1, n] }.

In other words, an inference diff is the set of pairs of conditional probability distributions, one pair per random variable, according to the two sets of evidence.

For example, if the random variables of the network are X, A, and B, and B takes on value b in E1, then E1 = (?, ?, b) and P(X | E1) = P(X | B=b). If an evidence set is equal to (?, ?, ..., ?) we say that it is empty. Note that we use ? to represent "unobserved" or "unspecified".
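To make the definition concrete, here is a minimal sketch that computes an inference diff by brute-force enumeration over a toy two-variable network A -> B. The network and its probabilities are invented for illustration; a real implementation would use a proper inference algorithm rather than enumerating the joint:

```python
from itertools import product

# Toy chain A -> B, both binary. These CPT numbers are illustrative only.
P_A = {True: 0.3, False: 0.7}
P_B_given_A = {True: {True: 0.9, False: 0.1},
               False: {True: 0.2, False: 0.8}}
VARS = ["A", "B"]

def joint(a, b):
    # P(A=a, B=b) for the toy chain.
    return P_A[a] * P_B_given_A[a][b]

def posterior(var, evidence):
    """P(var | evidence), by enumerating the full joint distribution.
    evidence maps variable names to observed values; '?' (unobserved)
    entries are simply omitted from the dict."""
    dist = {}
    for a, b in product([True, False], repeat=2):
        world = {"A": a, "B": b}
        if any(world[k] != v for k, v in evidence.items()):
            continue  # world inconsistent with the evidence
        dist[world[var]] = dist.get(world[var], 0.0) + joint(a, b)
    z = sum(dist.values())
    return {value: mass / z for value, mass in dist.items()}

def inference_diff(e1, e2):
    # The set of pairs (P(Xi | E1), P(Xi | E2)), one pair per variable.
    return {v: (posterior(v, e1), posterior(v, e2)) for v in VARS}

# E1 empty, E2 observes B = true.
diff = inference_diff({}, {"B": True})
```

Each entry of `diff` is exactly one pair from Δ: the variable's posterior under E1 alongside its posterior under E2.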

4.3 Visualization

To visualize inference diffs we extend our use of pie charts. First, we establish the convention that evidence set E1 is the evidence set in effect when simply using the network to view a single set of posterior results. For instance, Figure 5 represents an E1 with values for two variables, and an E2 that is empty. To augment our visualization for the case when E2 is non-empty, we introduce for each variable a "ring" chart, concentric with the variable's existing pie chart. We reuse the event-space color map established for that variable, maintain a consistent event-space ordering, and again weight the chart slices in proportion to the posterior probability masses for that variable, this time conditioned on E2.

To indicate which variables have evidence specified, we continue to use the strong black stroke, applied to either the pie, the ring, or both, in accordance with which variable and evidence set has evidence. By carefully reserving use of the black stroke earlier, we are able to apply it here in a more nuanced fashion, to help disambiguate from which evidence set a variable's evidence is specified.

This concentric design allows the user to make direct comparisons of the effects of evidence sets E1 and E2, quickly and easily. At least two classes of queries can now be performed that produce interesting visual answers. First, one can view information flow more concretely, simply by setting E1 to the empty evidence set and E2 to any partial observation. Second, one can make direct comparisons between different non-empty evidence sets, such as asking whether observing some three variables is different from observing only two of them, or asking how different it is to observe variable Xi having some value versus observing variable Xj having that value.

Figure 6: An inference diff between two evidence sets. Here variable A1 has observed values in both sets, indicated by a black stroke around both the inner circle (evidence set 1) and the outer ring (evidence set 2).

5 Relevance Filtering

While the inference diff enables direct comparisons of the effects of different evidence on each variable, it doesn't necessarily help guide the user to the variables they may be most interested in. This can be a problem in large Bayesian networks, where there is simply too much information visible simultaneously, or where the user lacks familiarity with the variables in the model. What we hope to achieve is a way to guide the user to the variables they are likely to be interested in, given some provided evidence. To accomplish this we use the inference diff as our basis, and add to it relevance filtering.

5.1 Definition

We define the relevance of a random variable Xi as simply the symmetric Kullback-Leibler (KL) divergence [17] of that random variable given its inference diff. More precisely, given an inference diff Δ derived from evidence sets E1 and E2, we define the relevance r of a random variable X as

r(X) = D_KL( P(X | E1) || P(X | E2) ) + D_KL( P(X | E2) || P(X | E1) ).

The function D_KL indicates the Kullback-Leibler divergence, with the standard definition

D_KL(P || Q) = Σ_i P(i) ln( P(i) / Q(i) )

for probability distributions P and Q sharing the same event space. We use the symmetric KL divergence so that our definition of relevance is also symmetric. Given two random variables Xi and Xj, if r(Xi) < r(Xj) then we say that Xj is more relevant than Xi given evidence sets E1 and E2.

This definition of relevance is chosen intuitively. Because distributions which differ greatly have high KL divergence values, we are saying that the variables whose conditional distributions changed the most between E1 and E2 are the variables that are most relevant to the user.
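A direct transcription of the relevance score, assuming each posterior is a mapping from event values to probability masses over a shared event space. (In practice, a value with zero mass in one distribution but not the other makes the divergence infinite, so some smoothing may be needed; that detail is not addressed here.)

```python
import math

def kl(p, q):
    """D_KL(P || Q) = sum_i P(i) * ln(P(i) / Q(i)); terms with P(i) = 0
    contribute 0 by the usual convention."""
    return sum(mass * math.log(mass / q[value])
               for value, mass in p.items() if mass > 0.0)

def relevance(p1, p2):
    # Symmetric KL over the inference-diff pair (P(X | E1), P(X | E2)).
    return kl(p1, p2) + kl(p2, p1)
```

Identical posteriors score 0, and the score grows as the two posteriors diverge, matching the intuition in the text.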

5.2 Visualization

With relevance defined, we have the final mathematical tool needed to complete our visualization method. For large networks it can be easy for evidence sets to produce only minimal differences in the posterior distributions of many variables. It is with this situation in mind that we apply our definition of relevance.

To do this, we first decide which variables are relevant enough for the user. Given the current inference diff Δ, we sort the variables of Δ in descending order according to their relevance. Second, we introduce a user-configurable relevance threshold, represented as a percent value c between 0% and 100%. Finally, we tag each random variable Xi as either "relevant" or "irrelevant" according to whether Xi is in the top c percent of variables ordered by relevance.

Lastly, we adjust our visualization in the structure and legend views. Variables which are irrelevant according to threshold c are shrunk, dimmed, and their pie and ring charts are removed in the structure view. We also shorten the edge lengths between any two collapsed variables, and edges connecting at least one irrelevant variable are changed to a dotted-line rendering.
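The tagging step can be sketched as a sort followed by a percentile cut; the names here are illustrative:

```python
def filter_by_relevance(relevances, c):
    """relevances: mapping of variable name -> relevance score.
    c: user-configurable threshold as a percent value in [0, 100].
    Returns (relevant, irrelevant) sets of variable names."""
    ordered = sorted(relevances, key=relevances.get, reverse=True)
    keep = round(len(ordered) * c / 100.0)  # size of the top-c-percent cut
    return set(ordered[:keep]), set(ordered[keep:])
```

For example, with scores `{"A": 0.9, "B": 0.1, "C": 0.5, "D": 0.0}` and c = 50, variables A and C are tagged relevant and B and D irrelevant.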

The overall effect is to shrink the virtual space needed for the entire graph structure, which focuses the user's attention on the relatively small number of variables remaining. For these remaining variables we continue to show the pie and ring charts associated with the inference diff. We also remove irrelevant variables from the legend view.

The end result is a user-controllable level of visual complexity, and a clear, concise, qualitative view of information flow where it is most impactful.

Figure 7: An inference diff with relevance filtering enabled. Variables A and V contain evidence in at least one evidence set each. Compared with the structure seen in Figure 1, the variables least relevant given the evidence sets are reduced in size and visibility.

6 Data Set Examples

To test our proposed visualizations, we developed a simple application called B-Vis. In building this application we implemented our own structure learning and inference based on existing algorithms in the literature, open-sourced separately [18]. For graph layout we chose the library GraphX [19], making small modifications as necessary.

6.1 Traffic Data Set

… were most affected by the change in evidence. Because the Sugiyama layout algorithm is layered, there is also a resemblance between the before and after layouts. The Traffic data set is a somewhat special case in that all variables share the same event space, and the variables are highly correlated for identical event values.

6.2 U.S. 1990 Census Data Set

For a larger and more interesting network, we consider the 1990 U.S. Census data set.

Figure 10: Census network, consisting of 68 variables, zoomed out to reveal the entire structure.

Figure 11: Census network, with relevance filtering enabled for the current inference diff. The top 20% most relevant variables retain their embedded posterior distributions, while all other variables are reduced in size and visibility.

7 Future Work

7.1 Challenges and Scaling Further

Maintaining layout stability is particularly important. The user must be able to adjust relevance filtering without radical changes to the layout, otherwise the experience quickly becomes disorienting. We were able to maintain a minimally sufficient level of stability simply by using the "Sugiyama Efficient" algorithm in GraphX. Though this has some inherent stability due to its layered solution, it is not quite perfect for our needs. We would like to incorporate a customized or more sophisticated layout algorithm designed with inference diffs and relevance filtering in mind. Such an algorithm may continue to be layered, for instance with stability addressed in more detail as in Sugiyama's original works [20]; or force-directed with constraints, which has ongoing exploration [21]. It may also be possible to combine the fish-eye techniques of Sundararajan et al., by automatically configuring their interest areas using the locations of our relevant variables after relevance filtering [6].

Our choice of color as a modality for values is challenging when scaling to large event spaces. For instance, some variables in the Census data set contained over 15 possible values. Our color mapping at present contains only a small number of unique color values, meaning that for such variables some colors were used multiple times. Because we present a consistent ordering, both radially and in the legend view's values, ambiguity is mostly removed, but this requires additional mental energy on the part of the user. For categorical variables with significantly more than 15 values, the effectiveness of the color-mapped approach is expected to fall apart. For color-blind users, a limited color palette creates further difficulty in scaling. One interesting possibility may lie in collapsing colors in inference diffs where the probability masses of certain values are small or have changed very little in the diff for that variable. Coloring continuous-valued variables is not addressed here, but may be possible as well, perhaps by bounding the event space and assigning distinct colors to special points of significance in the event space, with weighted color blending radially.
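The value-to-color scheme above can be sketched as follows: values are assigned palette entries by their fixed position in a consistently ordered event space, cycling the palette when the event space is larger. The palette and the cycling rule here are illustrative assumptions, not F-Vis's actual mapping.

```python
# Sketch: mapping event-space values to a limited color palette with a
# consistent ordering. The palette is illustrative only.

PALETTE = ["red", "orange", "yellow", "green", "blue", "purple"]

def color_for_value(event_space, value, palette=PALETTE):
    """Return the color for `value`, cycling the palette when the event
    space has more values than the palette has colors."""
    ordered = sorted(event_space)      # same ordering radially and in legend
    return palette[ordered.index(value) % len(palette)]

# A 15-value variable (e.g. from the Census data set) must reuse colors:
space = list(range(15))
# with a 6-color palette, values 0 and 6 collide onto the same color
```

Because the ordering is identical in the structural view and the legend, a reused color remains resolvable by position, which is exactly why ambiguity is "mostly removed" rather than eliminated.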

6.2 Applications to Other Graphical Models

With respect to Bayesian networks, though the models presented here are statistical rather than causal, an inference diff is possibly most useful in a network with truly causal modeling. For example, in medicine one could quickly ask the network for a visual answer to the question: given a patient with conditions X and Y, i.e. X=true, Y=true in both E1 and E2, what are the largest differences expected between prescribing treatment A versus treatment B, i.e. do(A=true, B=false) in E1 and do(A=false, B=true) in E2?
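The comparison underlying such a query can be sketched directly: per variable, an inference diff is the per-value change in probability mass between the posterior under E1 and the posterior under E2. The toy posteriors below are assumptions standing in for output from an inference engine.

```python
# Sketch: an inference diff compares a variable's posterior under
# evidence set E1 against its posterior under E2. Toy posteriors only.

def inference_diff(posterior_e1, posterior_e2):
    """Per-value change in probability mass when moving from E1 to E2."""
    values = set(posterior_e1) | set(posterior_e2)
    return {v: posterior_e2.get(v, 0.0) - posterior_e1.get(v, 0.0)
            for v in values}

# Hypothetical outcome variable under do(A=true) vs. do(B=true):
p1 = {"improved": 0.40, "unchanged": 0.45, "worse": 0.15}
p2 = {"improved": 0.65, "unchanged": 0.25, "worse": 0.10}
diff = inference_diff(p1, p2)
# The largest absolute change is the natural thing to highlight visually.
most_changed = max(diff, key=lambda v: abs(diff[v]))
```

Note that the per-variable diffs always sum to zero, since both posteriors are normalized; only the redistribution of mass is informative.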

Other probabilistic network models may benefit from inference-diff and relevance-filtering visualizations, such as Markov random fields, which have similar inference and belief-propagation capabilities.

6.3 Unused Modalities

Instead of beginning the embedded distributions' slices at the 12 o'clock position, the start angle could be varied and demarcated to carry significance of some kind. The shape of nodes in the structural view could also convey information, such as whether the node captures a causal dependency.

Draft v5 · 11 · 2013 December 9

Figure 12: Inference diff of the variable 'means of transportation'. The diff is generated from an empty evidence set 1 and an evidence set 2 with 'income' = true.


6.4 Other Metrics for Relevance

Metrics other than KL-divergence may be useful or more appropriate for relevance filtering. Regardless of the basic metric chosen, it may be useful if weights can be attached to values in the event spaces, such that large changes to the probability masses of low-weight events count less toward making that variable relevant.
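One simple realization of the weighting idea above is to scale each term of the KL sum by a per-value weight. This particular weighted form is our own illustration, not a standard definition; with all weights equal to 1 it reduces to the usual KL divergence.

```python
import math

# Sketch: a weighted variant of KL divergence for relevance filtering.
# Per-value weights shrink the contribution of low-importance events.

def weighted_divergence(p, q, weights=None, eps=1e-12):
    """sum_v w_v * p_v * log(p_v / q_v); missing weights default to 1."""
    weights = weights or {}
    return sum(weights.get(v, 1.0) * pv *
               math.log((pv + eps) / (q.get(v, 0.0) + eps))
               for v, pv in p.items())

# Down-weighting the low-importance 'walk' value changes its influence:
p = {"car": 0.7, "bus": 0.2, "walk": 0.1}
q = {"car": 0.5, "bus": 0.3, "walk": 0.2}
low_weight_walk = {"walk": 0.1}
score = weighted_divergence(p, q, low_weight_walk)
```

A relevance filter would then compare such per-variable scores against a threshold, surfacing only variables whose weighted divergence between the two evidence sets is large.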

7 Conclusion

Visualization methods are increasingly important as the scope and quantity of data increase. The flexibility and distributed nature of Bayesian networks, and of graphical models in general, make them one of many useful tools for machine learning. In this project we explored ways to visualize these networks, and introduced a number of new visualization techniques.

First, viewing the structure of a network is but half the coin. We complete the other half by placing posterior distributions directly in the network as embedded pie charts. Second, direct comparisons are incredibly useful tools, yet past visualizations have struggled to provide them. We propose inference diffs as a method of meaningful direct comparison, with concentric pie and ring charts for visualization. Finally, navigating efficiently in large Bayesian networks is generally a challenge. We introduce relevance filtering, with KL divergence as its mathematical basis, as a tool to guide the user to variables of interest in the model.

Appendix A: Specifying Evidence

In implementing F-Vis we sought a convenient interaction model for allowing the user to add or remove evidence. Further, we wanted the interaction to exhibit cohesiveness with the existing design language. The practical challenges are in presenting potentially many possible values efficiently (e.g. without resorting to scrolling or multi-level menus), and in providing a mechanism for choosing the target evidence set (i.e. E1 or E2).

In keeping with the use of color and circular charts, we opted for a radial-menu approach, seen in Figure 13. Each node in the network is directly clickable, opening a radial menu with color-coded choices available for each possible value in the event space for that variable. The user then simply drags a value from the radial menu onto either the inner circle or the inner ring, corresponding to evidence sets 1 and 2 respectively. To remove evidence, the user performs the action in reverse, dragging from the inner circle or ring to anywhere outside of the node.

As before, the legend view provides a reference of the event-space-to-color mapping for the user. We also fade the rest of the network to remove visual complexity while choosing evidence. Also note that because no keyboard input is necessary, this interaction model is touch-screen friendly.

Appendix B: Inference Algorithm Used

For inference in the above networks we implemented approximate inference using Gibbs sampling, a Markov chain method with theoretical convergence to the true posterior distribution [16]. This implementation is available publicly under our F-AI project [18]. To reduce the effects of statistical dependence from particle to particle, we retain only one out of every several adjacent particles when building new posterior distributions. To seed the Markov chain we begin with a forward sampling of the network, followed by a warm-up period of 25 Gibbs samples.

Because this computational approach is iterative, we are able to regularly update the embedded distributions in F-Vis, effectively creating a real-time animation of the convergence toward the stationary distribution. We update the structural view in F-Vis approximately once every 3,000 variable-iterations, or every 60 iterations for a 50-variable network.

Figure 13: An example of the evidence menu. This variable's event space is of size six, with corresponding color-coded values available for drag-and-drop into evidence sets 1 or 2.
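The sampling scheme described above (forward-sampled seed, warm-up, then thinning) can be sketched on a deliberately tiny network. The two-variable network X → Y, its CPTs, and the parameter values are toy assumptions, not the networks or constants used in F-Vis.

```python
import random

# Sketch: Gibbs sampling with a forward-sampled seed state, a warm-up
# period, and thinning (keeping one of every `thin` adjacent particles).

P_X = 0.3                               # P(X = 1)
P_Y_GIVEN_X = {0: 0.2, 1: 0.9}          # P(Y = 1 | X)

def sample_x_given_y(y, rng):
    """Draw X from P(X | Y = y) via Bayes' rule over the two X states."""
    w1 = P_X * (P_Y_GIVEN_X[1] if y else 1 - P_Y_GIVEN_X[1])
    w0 = (1 - P_X) * (P_Y_GIVEN_X[0] if y else 1 - P_Y_GIVEN_X[0])
    return 1 if rng.random() < w1 / (w1 + w0) else 0

def gibbs_posterior_x(y_evidence, n_samples=2000, warmup=25, thin=5, seed=0):
    """Estimate P(X = 1 | Y = y_evidence) with warm-up and thinning."""
    rng = random.Random(seed)
    kept, i = [], 0
    x = 1 if rng.random() < P_X else 0  # forward-sample the seed state
    while len(kept) < n_samples:
        x = sample_x_given_y(y_evidence, rng)
        i += 1
        if i > warmup and i % thin == 0:
            kept.append(x)              # keep 1 of every `thin` particles
    return sum(kept) / len(kept)

estimate = gibbs_posterior_x(1)         # exact answer: 0.27 / 0.41 ≈ 0.659
```

Because the kept particles accumulate incrementally, the running estimate can be redrawn after each batch, which is precisely what enables the real-time convergence animation described above.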

Appendix C: Learning Algorithm Used

To create the Bayesian networks seen here, we wrote our own implementation of general structure learning and CPT learning. Our algorithm is based almost entirely on the ideas presented in section 18.4 of Koller and Friedman [16], and amounts to an iterative search over the space of possible network structures. Each possible structure has an associated score, and the goal is to find a structure of maximal score. We use the Bayesian information criterion (BIC) score for this purpose. We use a uniform Dirichlet prior to effectively adjust the sufficient statistics of the training set, which, combined with the BIC score, acts to control the level of complexity in the resulting network structure.

The search algorithm is initialized with a fully disconnected network, and at each iteration chooses the best edge operation from a set of candidates. Each candidate operation is either an edge addition, edge deletion, or edge reversal, so long as the operation does not violate acyclicity. We compute the change to the total score of the network for each candidate action, and for that iteration of the search choose the action with the largest improvement to the total network score. If no action is found that increases the total score, the search is halted and the most recent structure is returned.

Because each edge operation affects at most two local family scores, most family scores do not change from iteration to iteration. As further suggested in Koller and Friedman [16], we exploit this fact to cache previously computed family scores, invalidating a cached score only when a chosen edge operation affects its associated variable. This makes the algorithm a dynamic programming algorithm and significantly reduces its computational demand. Further, we also cache various sufficient-statistics values associated with the training set, speeding up the overall process.
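The greedy search with cached family scores can be sketched as follows. To stay short, this sketch makes assumptions beyond the description above: binary variables only, no edge-reversal operation, and candidate edges restricted to a fixed variable ordering so that acyclicity holds by construction. It is an illustration, not the paper's F# implementation.

```python
import math
from itertools import product

# Sketch: greedy hill-climbing structure search with cached family (BIC)
# scores. Each edge toggle touches only one family, so its cached score
# is reused across iterations exactly as described above.

def family_bic(data, child, parents, cache):
    """BIC score of one family P(child | parents), cached by signature."""
    key = (child, tuple(sorted(parents)))
    if key not in cache:
        ps = sorted(parents)
        loglik = 0.0
        for assign in product([0, 1], repeat=len(ps)):
            rows = [r for r in data
                    if all(r[p] == v for p, v in zip(ps, assign))]
            ones = sum(r[child] for r in rows)
            for c in (ones, len(rows) - ones):
                if c:
                    loglik += c * math.log(c / len(rows))
        n_params = 2 ** len(ps)          # one Bernoulli per parent config
        cache[key] = loglik - 0.5 * n_params * math.log(len(data))
    return cache[key]

def learn_structure(data, order):
    """Hill-climb over single-edge additions/deletions until no gain."""
    parents = {v: set() for v in order}
    cache = {}                           # family scores survive iterations
    while True:
        best_gain, best_op = 1e-9, None
        for i, u in enumerate(order):
            for v in order[i + 1:]:      # u precedes v, so u -> v is acyclic
                current = family_bic(data, v, parents[v], cache)
                trial = parents[v] ^ {u} # toggle edge u -> v
                gain = family_bic(data, v, trial, cache) - current
                if gain > best_gain:
                    best_gain, best_op = gain, (u, v)
        if best_op is None:
            return parents               # no improving operation: halt
        u, v = best_op
        parents[v] ^= {u}

# Y copies X in this toy data, so the search adds the edge X -> Y:
data = [{"X": x, "Y": x} for x in (0, 1)] * 50
learned = learn_structure(data, ["X", "Y"])
```

Note how the BIC penalty term (0.5 · #params · log N) plays the complexity-controlling role described above: the edge is added only because the likelihood gain outweighs the doubled parameter count.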

Appendix D: Software Design

An important philosophy in the application design was that anything that can be visualized should be visualized. The software design of F-Vis can be divided into three main modules: the statistical learning & inference, the application data model, and the graphics presentation.

Learning and inference are implemented in F# (a derivative of OCaml) on the Microsoft .NET Framework v4.5. Variables, event spaces, observations, distributions, and Bayesian networks are all represented through object-oriented types, while algorithms are generally implemented as pure functions. An important realization partway through the development process was that complexity was significantly reduced by adopting an immutable design approach for each object type. The final implementations of most objects are in fact immutable, such that most methods that would otherwise mutate an object instead return a new instance of a derivative object. Anecdotally, when the probability-distribution type was converted to be immutable, a significant speed-up in structure learning occurred, presumably due to a more hardware-cache-friendly memory access pattern. All learning and inference code is available publicly on GitHub as project F-AI [18].
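The immutable-object pattern described above can be sketched briefly. The real types are F#; this Python stand-in, with its hypothetical `with_mass` method, is purely illustrative of the "mutators return new instances" design.

```python
# Sketch: an immutable distribution type whose would-be mutators return
# new instances, leaving the original untouched.

class Distribution:
    def __init__(self, masses):
        total = sum(masses.values())
        # normalize once at construction; the dict is never mutated again
        self._masses = {v: m / total for v, m in masses.items()}

    def mass(self, value):
        return self._masses.get(value, 0.0)

    def with_mass(self, value, mass):
        """Return a NEW distribution with `value`'s mass replaced and the
        whole distribution renormalized; `self` is unchanged."""
        updated = dict(self._masses)
        updated[value] = mass
        return Distribution(updated)

d1 = Distribution({"a": 1, "b": 1})
d2 = d1.with_mass("b", 1.5)   # d1 still has mass("b") == 0.5
```

Because instances never change after construction, they can be shared freely across the learning, inference, and layout threads without locking, which is one plausible reason the immutable conversion helped performance.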

The application data model, written in C#, was an important separate layer in the development of F-Vis. The data model maintains a source of truth for the application, and manages a number of background threads for simultaneous user input and background processing. At any given moment up to four threads are active: for learning, inference, layout, and user input & rendering. In a model-view-controller paradigm, our data model acts as both model and controller, translating user actions into messages for the learning, inference, and layout threads. This approach was advantageous in that it separated user-interface and rendering concerns from threading and computational concerns.

Finally, for user input and presentation we used the vector drawing features of Microsoft WPF v4.5. We implemented a number of reusable visual controls with individual responsibilities, such as rendering pie charts and ring charts, rendering vertices and edges, and routing raw user-input events to the application data model.

Acknowledgments

I'd like to acknowledge the CSE department at UC San


Diego for their thoughtful curriculum design and faculty.

I'd like to thank Prof. Charles Elkan for his instruction and guidance, both in the subjects of machine learning and in data graphics.

References

[1] M. Lesk. How Much Information Is There In the World? 1997. http://courses.cs.washington.edu/courses/cse590s/03au/lesk.pdf

[2] Twitter, Inc.

[4] E. Tufte. The Visual Display of Quantitative Information. Graphics Press LLC. 2001.

[5] J. Pearl. Bayesian Networks. 2000.

[6] P. K. Sundararajan; O. J. Mengshoel; T. Selker. Multi-focus and Multi-window Techniques for Interactive Network Exploration. 2013.

[7] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann. 1988.

[8] S. E. Schaeffer. Graph Clustering. Computer Science Review 1(1). 2007.

[9] J.-D. Zapata-Rivera; E. Neufeld; J. E. Greer. Visualization of Bayesian Belief Networks. Proceedings of IEEE Visualization '99. 1999.

[10] C.-H. Chiang; P. Shaughnessy; G. Livingston; G. Grinstein. Visualizing Graphical Probabilistic Models. Technical Report, Integration of Analytics and Visualization. 2011.

[13] L. Williams; R. St. Amant. A Visualization Technique for Bayesian Modeling. 2000.

[14] Netica. http://www.norsys.com/netica.html. Accessed 2013.

[15] BayesiaLab. http://www.bayesia.com/. Accessed 2013.

[16] D. Koller, N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press. 2009.

[17] S. Kullback; R. A. Leibler. On Information and Sufficiency. The Annals of Mathematical Statistics 22(1). 1951.

[18] F-AI. https://github.com/duckmaestro/F-AI. Accessed 2013.

[19] Graph#. http://graphsharp.codeplex.com/. Accessed 2013.

[20] K. Sugiyama. Graph Drawing and Applications for Software and Knowledge Engineers. World Scientific Pub Co Inc. 2002.

[21] A. Krause, C. E. Guestrin. Near-optimal Nonmyopic Value of Information in Graphical Models. Twenty-First Conference on Uncertainty in Artificial Intelligence. 2005.

[22] D. Shahaf; A. Chechetka; C. Guestrin. Learning Thin Junction Trees via Graph Cuts. International Conference on Artificial Intelligence and Statistics. 2009.

[23] C. Meek, B. Thiesson, D. Heckerman. The Learning-Curve Sampling Method Applied to Model-Based Clustering. The Journal of Machine Learning Research. 2002.

[24] K. Bache, M. Lichman. UCI Machine Learning Repository. 2013.