visualizing inference in large bayesian networks


  • 8/13/2019 Visualizing Inference in Large Bayesian Networks


1 Introduction

The application of machine learning in the 21st century is increasingly both exciting and challenging, with many orders of magnitude more digital data available than before. The amount of text data on the internet has increased from an estimated couple of terabytes in 1997 to Twitter.com alone storing 5 gigabytes of new tweets daily [1][2]. This does not count the many private databases used in enterprise, such as the petabytes of customer and transaction information Wal-Mart Stores retains [3].

Though the collection of raw data will continue to have its challenges and costs, significant attention has now turned to the problem of utilizing all of this data. Whether the goal is indexing, data mining, or building predictive models, today's challenge is fundamentally tied to the enormous number of observations and variables captured. This is the so-called "big data" problem.

Equally important as the algorithms or storage systems are the visualization methods. Presentation is not simply an aesthetic concern. Edward Tufte writes that "often the most effective way to describe, explore, and summarize a set of numbers, even a very large set, is to look at pictures of those numbers," and that data graphics can be both "the simplest [and] most powerful" of methods [4].

Bayesian networks in many applications enable efficient and scalable statistical and causal modeling [5], and have a natural visual representation following from their graph structure. By viewing a rendering of the graph structure, one can quickly identify potential correlations or causal relationships between variables simply from the presence of edges. Beyond this simple case, more sophisticated analysis by visual means alone is difficult, especially as the model grows in size. Viewing conditional distributions with more than a couple of parent variables quickly becomes unwieldy, and networks with upwards of 50 variables can be difficult for a user to navigate and parse visually. Recent work has focused on improving visualization and navigation in large networks of up to thousands of variables [6], but this is not yet a solved problem. To understand these large, modern networks more efficiently, and in turn better utilize the wealth of data available in the era of big data, new methods of visualization are needed.

To this end, we introduce two ideas to assist the visual analysis of large Bayesian networks: inference diffs and relevance filtering.

2 Summary of Prior Work

Though creating effective ways to visualize Bayesian networks is not a new problem relative to the age of "Bayesian networks" as proposed in detail by Pearl in 1988 [7], it appears to be a problem that has received relatively little attention. While there have been advances in visualizing large graphs, such as those surveyed by Schaeffer [8], these methods depend at most on basic graph-theoretic information, such as cliques and node degree, and do not directly consider the probabilistic aspects of Bayesian networks.

Draft v5, 2013 December 9

Visualizing Inference in Large Bayesian Networks

Clifford Champion, Department of Computer Science and Engineering

In this project we address the challenge of viewing and using Bayesian networks as their structural size and complexity grow. We introduce two new visualization methods, inference diffs and relevance filtering, to enable visual analysis of information flow in these networks, and to enable direct comparison of two evidence configurations simultaneously. We implement and discuss the performance of these visualization methods on two modestly large networks, built from real-world data.

Nevertheless, a variety of visual designs and principles specific to Bayesian networks have been developed or explored over time, and these are briefly recounted here.

To visualize causal relationships globally, so-called temporal or causal layouts are popular, placing ancestors (e.g. independent variables) near the top of the visual layout and descendants (e.g. dependent variables) near the bottom, for a generally downward flow of edge directions and thus a downward flow of causation. This kind of layout is often used without explicit mention and is a feature of some directed-graph layout algorithms, but Zapata-Rivera et al. and Chiang et al. called out this layout explicitly [9][10].

To visualize local influence (i.e. between exactly two variables), the direction of the edge arrow is of course well established for indicating the direction of modeled cause and effect. Beyond simply the edge direction, or its presence at all, Zapata-Rivera et al. explored fixed color assignments for independent variables, and color mixtures thereof for dependent variables, weighted to indicate the relative strength of influence from the parent variables [9]. Zapata-Rivera also considered varying edge lengths, so that mutually influential nodes appear nearer to one another than if uninfluential. Further, both Zapata-Rivera et al. and Koiter explored varying edge thickness to indicate influence between parent and child variables, using thick lines for strong influence. Each discussed various analytical definitions for computing the input values needed for these visualization techniques [9][11].

To visualize conditional probability tables (CPTs), Chiang et al. proposed miniature 2D heatmaps attached to edges [10]; however, this appears to be well defined only for children with exactly one parent each. Cossalter et al. introduced "bubble lines" connecting nodes in the network to floating CPT windows, making it easier for the user to keep their bearings while debugging CPTs in large networks. They also introduced a numerical difference view for viewing the CPTs of two variables expected to have similar local distributions [12].

To visualize the presence of evidence, common practice in the literature is to draw a double border around observed nodes (variables with evidence), or to use shading on the interior of the node. Williams and Amant experimented with using different colors of shading to indicate different evidence values [13].

To visualize marginal and posterior distributions, at least three techniques have been explored. The software application Netica used rectangular nodes instead of circular ones, in order to embed bar charts for the marginal probability masses of each variable [14]. The software application BayesiaLab allowed the user to open a distribution window for each variable and compare the prior (no evidence) and posterior (with evidence) distributions of a variable in two horizontal bar charts overlaid on one another [15]. Zapata-Rivera et al. used node diameter to indicate large or small posterior probabilities for binary-valued variables, and animation thereof to indicate changes to posteriors under changing evidence [9].

To visualize local and global information simultaneously, Sundararajan et al. employed a partition and fish-eye approach to graph layout, letting the user define and inspect local areas of interest in the network while still seeing the context and structure of the full network [6].

A common trait among most of these approaches is their dependence on relatively static information about the network, whether this be the conditional probability tables, or simple posterior or marginal distributions. Our goal is to create a visualization that captures a more dynamic view of Bayesian networks, hopefully shining new light on information flow, and to scale effectively in large networks. We outline our basic design choices next, using or iterating on prior work, and upon this foundation introduce a more dynamic approach to visualizing Bayesian networks using inference diffs.



3 Visual Foundation

3.1 Assumptions and Principles

Our approach is to define a visual language equally suited for print or personal computer, consistent with the principles proposed by Edward Tufte on "graphical displays" [4]. When a human-computer interaction is discussed, we assume one user at a time, that they are using a mouse or a touch-screen interface, and that their display size and DPI are those of an average tablet or desktop display.

Our underlying model to visualize is a Bayesian network of finitely many random variables, each variable having a finite event space. We assume at the very least the user would desire to view the Bayesian network structure, inspect local conditional probability distributions, see marginal or posterior distributions, inspect the event space of each variable, and otherwise clearly see the basic makeup of the network instance. These assumptions are sufficient for defining the foundation of our visual design.

To stay disciplined and focused, we will also seek to avoid "chart junk", avoid distorting data or potentially misleading the user, and avoid unnecessary ink, maximizing the "data-ink ratio" [4]. Every pixel or drop of ink should convey information, and convey it unambiguously.

3.2 Network Structure and Random Variables

The most basic information in a Bayesian network is the structure and the random variables therein. To present the structure is to present the learned (or constructed) causal influence between variables. Logically, this object is a directed acyclic graph (DAG); visually, it is traditionally a collection of labeled circles and arrows between them. We largely continue this tradition.

Random variables must be clearly identifiable, while at the same time clutter must be kept under control, otherwise it becomes noise. To this end, there are two views: structural and legend. The structural view presents a variable as a single, circumscribed capital letter, taken from the first letter in the name of the variable. The legend view maps these letters to their full variable names, such as A to Age. Capital letters are used in the structural view for readability. As with previous methods, vertical ordering in the structural view is presented causally top-down to the extent possible, though this is not always achievable, especially in large networks.

To scale to large networks, we do two more things. First, where two or more random variables share the same single letter in the structural view, we suffix their names with a unique number, chosen sequentially from 1. This numerical suffix appears in both the structural view and the legend view, in subscript type. Second, both views are scrollable, and both follow loosely the same top-to-bottom variable ordering.

On the appearance of random variables in the structural view, we de-emphasize the dark stroke that traditionally circumscribes the variable, as we will be using this stroke to carry meaning later.
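The letter-and-suffix labeling scheme above can be sketched in a few lines of Python. The variable names are hypothetical, and the actual application renders the numeric suffix in subscript type:

```python
from collections import Counter

def abbreviate(names):
    """Map each variable name to its structural-view label: the capitalized
    first letter, plus a sequential numeric suffix (from 1) when two or more
    variables share that letter."""
    first = [name[0].upper() for name in names]
    totals = Counter(first)   # how many variables claim each letter
    seen = Counter()          # running count per letter, for suffixing
    labels = {}
    for name, letter in zip(names, first):
        if totals[letter] > 1:
            seen[letter] += 1
            labels[name] = f"{letter}{seen[letter]}"
        else:
            labels[name] = letter
    return labels
```

For example, `abbreviate(["age", "travel", "tenure"])` labels the two T-variables `T1` and `T2` while `age` stays simply `A`.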

3.3 Event Space Colors

Color assignments are scoped within each variable, not over the collection of variables. This permits us to design a reusable, optimal color palette with minimal visual confusion within the context of a single variable. Though there is possible ambiguity in values of different variables sharing the same color, our design avoids this ambiguity by always framing color in the context of a specific variable. Where two or more variables share the same event space, we reuse the color mapping for stronger consistency.

To indicate what the color assignments are for each value of each variable, we augment our legend view to list each event-space value with its corresponding assigned color, as seen in Figure 2.

For categorical event spaces our color map is constructed so that no two contiguous colors are perceived as too near to one another: for example, orange may follow green but may not follow brown. For ordered event spaces the color map is constructed in the opposite fashion, by sequentially choosing neighboring hues on the color wheel, e.g. from the blue region, through yellow, to the red region.

As will be important later, we ensure any presented color order (value order) is constant; e.g. for a particular variable, blue always appears first, before orange.
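As a rough sketch of the two palette constructions described above, one can walk the color wheel sequentially for ordered event spaces, and reshuffle those same hues for categorical ones. The saturation and value constants and the interleaving scheme here are our own assumptions, not the paper's actual palette:

```python
import colorsys

def ordered_palette(n):
    """n hues stepping sequentially through the color wheel, from the blue
    region (hue 2/3) through yellow toward the red region (hue 0)."""
    return [colorsys.hsv_to_rgb(2.0 / 3.0 * (1.0 - i / max(1, n - 1)), 0.8, 0.9)
            for i in range(n)]

def categorical_palette(n):
    """The same hues, interleaved from the front and back halves of the ramp so
    that contiguous colors are not perceived as too near one another."""
    base = ordered_palette(n)
    half = (n + 1) // 2
    out = []
    for i in range(half):
        out.append(base[i])
        if i + half < n:
            out.append(base[i + half])
    return out
```

Both functions return the same set of RGB colors; only the presentation order differs, which is the property the text asks for.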

3.4 Conditional Probability Tables

One of the practical difficulties with Bayesian networks can be the size of each conditional probability table (CPT), or local distribution. Recall that the size of a random variable's CPT is generally the number of probability weight assignments specified, which grows exponentially in the number of parent variables (the in-degree of the variable). Some tools present the CPT as a single table with columns for each permutation of parent values, but this tends to require a large horizontal scrolling area. We propose that vertical scrolling is more natural, and present our CPTs vertically. We present each conditional probability distribution for a given parent permutation as a simple vertical list of probability densities. We stack each such list vertically, and separate each by its corresponding parent value permutation, placed above it. We use our event-space color mappings here, for each probability density and each parent value, and again use the corresponding abbreviated variable names (e.g. A1).
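The size claim above reduces to a one-line computation; the example numbers are illustrative:

```python
from math import prod

def cpt_size(num_own_values, parent_value_counts):
    """Number of probability entries in a variable's CPT: one per combination
    of its own value and a permutation of its parents' values, so the table
    grows exponentially with in-degree."""
    return num_own_values * prod(parent_value_counts)

# e.g. a binary variable with four 3-valued parents needs 2 * 3**4 = 162 entries
```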

3.5 Embedded Distributions

Viewing the marginal distribution of each variable should be convenient; that is, we want a clear way to see P(X) for each random variable X. Most tools require the user to open an additional window to see such distributions, either in tabular form or as a bar chart. We simply embed the distribution directly in the variable in the structural view. To do this we construct a pie chart using our event-space color mapping, and render pie slices proportional in size to the posterior probability mass of each value, starting at the 12 o'clock position and allocating slices in clockwise order.

This highly visual approach conveniently presents an overview of the entire statistical model to the user, without their needing to inspect variables one by one in sequence, or in additional views. Where more precise numerical inspection is needed, we use an additional color-coded tabular view similar to our CPT view.

Figure 2: Values in each random variable's event space are mapped to a preset color palette.

Figure 3: Conditional probability table view for a variable T2.

Figure 4: Marginal or posterior distributions are embedded directly in the variable via an area pie chart.

3.6 Evidence

Arguably the greatest power of a Bayesian network model, once it is trained or constructed, is in computing posterior distributions under arbitrary evidence, i.e. P(X | E). We will not explicitly discuss causal modeling and interventions (the analogue of evidence) until later, as most of our visual design for statistical models has a natural extension to causal models.

From a visualization perspective it is important that the user clearly see which variables currently have evidence and which do not, and furthermore, what specific values of evidence have been specified. To indicate that a variable has user-defined evidence, we circumscribe the variable in the structural view with a strong black stroke. Moreover, the interior of the node is colored entirely with the associated color of the evidence value, as seen in Figure 5.

Finally, all embedded distributions for non-evidence nodes are updated in the structural view to reflect each variable's new, posterior distribution. Our previous visualization, which embeds marginal distributions, is of course simply a special case of visualizing posterior distributions with an empty evidence set. We will formalize our notion of evidence in finer detail shortly.

4 Making an Inference Diff

4.1 Motivation

With a basic visual foundation established, we turn our attention to more sophisticated visual analysis methods. The basic idea we introduce first is that of an inference diff.

Inference and information flow are important capabilities of Bayesian networks. Consider a large network which models the health of components in a large multi-component system. One may wish to use this network to ask which components' probabilities of failure are affected by one or more other variables, for instance ambient air temperature, and for those affected, to what degree; or, inversely, to ask what are the most likely environmental conditions given a failure in one or more components. As networks grow in size, the answers to these questions can be as difficult to find as the right question to ask in the first place.

There are analytical tools such as d-separation; however, such a tool is limited in its application, largely because a Bayesian network is itself inherently limited in its ability to describe certain independence relations [16]. Recall that for a network G, its I-map I(G) may not be a minimal I-map, meaning G contains unnecessary edges and is too safe in its conditional independence claims. Moreover, for a true joint distribution it may be impossible for any Bayesian network to have a perfect I-map or P-map. There may also be context-specific independencies in the network, not discernible from the network structure alone. Further, the user may not need to know or care about certain dependencies if they are small or approximately independent. In each case these issues can lessen the usefulness of d-separation analysis in practice, or require more complicated analysis.

On the other hand, there is exhaustive computation: using inference algorithms to produce complete posterior distributions for some or all of the non-evidence variables. Such output is much more detailed and precise, but suffers from another limitation in that it is static. From it there is no direct indication of how, or to what degree, belief propagation occurred. There is simply a before and after state of the distribution.

Figure 5: Evidence nodes are circumscribed with a strong black stroke, and colored according to their evidence. All other nodes' embedded distributions are updated to reflect their posterior distributions. For example, T1's embedded distribution now reflects P(T1 | V=v, A=a) rather than simply P(T1).

What we would like is a way to visualize, in an obvious way, the effects of information flow through the network. To this end we find inspiration in modern software engineering practices and so-called "diff" tools (short for "difference" tools). Reviewing the "diff" of two or more human-readable files is an everyday practice in commercial software development, generally aided by the use of color and side-by-side before-and-after views. As a means of visualizing change, diffs are highly effective and visually intuitive. Our goal is to find an equally effective method of viewing inference and information flow in Bayesian networks, in hopes of enabling a more powerful kind of visual analysis.

4.2 Definition

We start with a mathematical definition of an "inference diff". Given a Bayesian network B describing a probabilistic model over n random variables {Xi : i ∈ [1, n]}, each with finite event space Si, and given two evidence sets E1 and E2, each an element of the set of partial observations ∏_{i=1}^{n} (Si ∪ {?}), we define an inference diff to be the set of pairs

Δ = { ( P(Xi | E1), P(Xi | E2) ) : i ∈ [1, n] }.

In other words, an inference diff is the set of pairs of conditional probability distributions, one pair per random variable, according to the two sets of evidence.

For example, if the random variables of the network are X, A, and B, and B takes on value b in E1, then E1 = (?, ?, b) and P(X | E1) = P(X | B=b). If an evidence set is equal to (?, ?, ..., ?) we say that it is empty. Note that we use ? to represent "unobserved" or "unspecified".
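To make the definition concrete, here is a minimal sketch that computes an inference diff by brute-force enumeration over a toy two-variable network A -> B. The network and its probabilities are invented for illustration; a real implementation would use a proper inference algorithm rather than enumerating the joint:

```python
from itertools import product

# Toy chain A -> B, both binary. These CPT numbers are illustrative only.
P_A = {True: 0.3, False: 0.7}
P_B_given_A = {True: {True: 0.9, False: 0.1},
               False: {True: 0.2, False: 0.8}}
VARS = ["A", "B"]

def joint(a, b):
    # P(A=a, B=b) for the toy chain.
    return P_A[a] * P_B_given_A[a][b]

def posterior(var, evidence):
    """P(var | evidence), by enumerating the full joint distribution.
    evidence maps variable names to observed values; '?' (unobserved)
    entries are simply omitted from the dict."""
    dist = {}
    for a, b in product([True, False], repeat=2):
        world = {"A": a, "B": b}
        if any(world[k] != v for k, v in evidence.items()):
            continue  # world inconsistent with the evidence
        dist[world[var]] = dist.get(world[var], 0.0) + joint(a, b)
    z = sum(dist.values())
    return {value: mass / z for value, mass in dist.items()}

def inference_diff(e1, e2):
    # The set of pairs (P(Xi | E1), P(Xi | E2)), one pair per variable.
    return {v: (posterior(v, e1), posterior(v, e2)) for v in VARS}

# E1 empty, E2 observes B = true.
diff = inference_diff({}, {"B": True})
```

Each entry of `diff` is exactly one pair from Δ: the variable's posterior under E1 alongside its posterior under E2.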

4.3 Visualization

To visualize inference diffs we extend our use of pie charts. First, we establish the convention that evidence set E1 is the evidence set in effect when simply using the network to view a single set of posterior results. For instance, Figure 5 represents an E1 with values for two variables, and an E2 that is empty. To augment our visualization for the case when E2 is non-empty, we introduce for each variable a "ring" chart, concentric with the variable's existing pie chart. We reuse the event-space color map established for that variable, maintain a consistent event-space ordering, and again weight the chart slices in proportion to the posterior probability masses for that variable, this time conditioned on E2.

To indicate which variables have evidence specified, we continue to use the strong black stroke, applied to either the pie, the ring, or both, in accordance with which variable and evidence set has evidence. By carefully reserving use of the black stroke earlier, we are able to apply it here in a more nuanced fashion, to help disambiguate from which evidence set a variable's evidence is specified.

This concentric design allows the user to make direct comparisons of the effects of evidence sets E1 and E2, quickly and easily. At least two classes of queries can now be performed that produce interesting visual answers. First, one can view information flow more concretely, simply by setting E1 to the empty evidence set and E2 to any partial observation. Second, one can make direct comparisons between different non-empty evidence sets, such as asking whether observing some three variables is different from observing only two of them, or asking how different it is to observe variable Xi having some value versus observing variable Xj having that value.

Figure 6: An inference diff between two evidence sets. Here variable A1 has observed values in both sets, indicated by a black stroke around both the inner circle (evidence set 1) and the outer ring (evidence set 2).

5 Relevance Filtering

While the inference diff enables direct comparisons of the effects of different evidence on each variable, it doesn't necessarily help guide the user to the variables they may be most interested in. This can be a problem in large Bayesian networks, where there is simply too much information visible simultaneously, or where the user lacks familiarity with the variables in the model. What we hope to achieve is a way to guide the user to the variables they are likely to be interested in, given some provided evidence. To accomplish this we use the inference diff as our basis, and add to it relevance filtering.

5.1 Definition

We define the relevance of a random variable Xi as simply the symmetric Kullback-Leibler (KL) divergence [17] of that random variable given its inference diff. More precisely, given an inference diff Δ derived from evidence sets E1 and E2, we define the relevance r of a random variable X as

r(X) = D_KL( P(X | E1) || P(X | E2) ) + D_KL( P(X | E2) || P(X | E1) ).

The function D_KL indicates the Kullback-Leibler divergence, with the standard definition

D_KL(P || Q) = Σ_i P(i) ln( P(i) / Q(i) )

for probability distributions P and Q sharing the same event space. We use the symmetric KL divergence so that our definition of relevance is also symmetric. Given two random variables Xi and Xj, if r(Xi) < r(Xj) then we say that Xj is more relevant than Xi given evidence sets E1 and E2.

This definition of relevance is chosen intuitively. Because distributions which differ greatly have high KL divergence values, we are saying that the variables whose conditional distributions changed the most between E1 and E2 are the variables that are most relevant to the user.
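A direct transcription of the relevance score, assuming each posterior is a mapping from event values to probability masses over a shared event space. (In practice, a value with zero mass in one distribution but not the other makes the divergence infinite, so some smoothing may be needed; that detail is not addressed here.)

```python
import math

def kl(p, q):
    """D_KL(P || Q) = sum_i P(i) * ln(P(i) / Q(i)); terms with P(i) = 0
    contribute 0 by the usual convention."""
    return sum(mass * math.log(mass / q[value])
               for value, mass in p.items() if mass > 0.0)

def relevance(p1, p2):
    # Symmetric KL over the inference-diff pair (P(X | E1), P(X | E2)).
    return kl(p1, p2) + kl(p2, p1)
```

Identical posteriors score 0, and the score grows as the two posteriors diverge, matching the intuition in the text.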

5.2 Visualization

With relevance defined, we have the final mathematical tool needed to complete our visualization method. For large networks it can be easy for evidence sets to produce only minimal differences in the posterior distributions of many variables. It is with this situation in mind that we apply our definition of relevance.

To do this, we first decide which variables are relevant enough for the user. Given the current inference diff Δ, we sort the variables of Δ in descending order according to their relevance. Second, we introduce a user-configurable relevance threshold, represented as a percent value c between 0% and 100%. Finally, we tag each random variable Xi as either "relevant" or "irrelevant" according to whether Xi is in the top c percent of variables ordered by relevance.

Lastly, we adjust our visualization in the structure and legend views. Variables which are irrelevant according to threshold c are shrunk, dimmed, and their pie and ring charts are removed in the structure view. We also shorten the edge lengths between any two collapsed variables, and edges connecting at least one irrelevant variable are changed to a dotted-line rendering.
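The tagging step can be sketched as a sort followed by a percentile cut; the names here are illustrative:

```python
def filter_by_relevance(relevances, c):
    """relevances: mapping of variable name -> relevance score.
    c: user-configurable threshold as a percent value in [0, 100].
    Returns (relevant, irrelevant) sets of variable names."""
    ordered = sorted(relevances, key=relevances.get, reverse=True)
    keep = round(len(ordered) * c / 100.0)  # size of the top-c-percent cut
    return set(ordered[:keep]), set(ordered[keep:])
```

For example, with scores `{"A": 0.9, "B": 0.1, "C": 0.5, "D": 0.0}` and c = 50, variables A and C are tagged relevant and B and D irrelevant.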

The overall effect is to shrink the virtual space needed for the entire graph structure, which focuses the user's attention on the relatively small number of variables remaining. For these remaining variables we continue to show the pie and ring charts associated with the inference diff. We also remove irrelevant variables from the legend view.

The end result is a user-controllable level of visual complexity, and a clear, concise, qualitative view of information flow where it is most impactful.

Figure 7: An inference diff with relevance filtering enabled. Variables A and V contain evidence in at least one evidence set each. Compared with the structure seen in Figure 1, the variables least relevant given the evidence sets are reduced in size and visibility.

6 Data Set Examples

To test our proposed visualizations, we developed a simple application called B-Vis. In building this application we implemented our own structure learning and inference based on existing algorithms in the literature, open-sourced separately [18]. For graph layout we chose the library GraphX [19], making small modifications as necessary.

6.1 Traffic Data Set

… were most affected by the change in evidence. Because the Sugiyama layout algorithm is layered, there is also a resemblance between the before and after layouts. The Traffic data set is a somewhat special case in that all variables share the same event space, and the variables are highly correlated for identical event values.

6.2 U.S. 1990 Census Data Set

For a larger and more interesting network, we consider the 1990 U.S. Census data set.

Figure 10: Census network, consisting of 68 variables, zoomed out to reveal the entire structure.

Figure 11: Census network, with relevance filtering enabled for the current inference diff. The top 20% most relevant variables retain their embedded posterior distributions, while all other variables are reduced in size and visibility.

7 Future Work

7.1 Challenges and Scaling Further

Maintaining layout stability is particularly important. The user must be able to adjust relevance filtering without radical changes to the layout, otherwise the experience quickly becomes disorienting. We were able to maintain a minimally sufficient level of stability simply by using the "Sugiyama Efficient" algorithm in GraphX. Though this has some inherent stability due to its layered solution, it is not quite perfect for our needs. We would like to incorporate a customized or more sophisticated layout algorithm designed with inference diffs and relevance filtering in mind. Such an algorithm may continue to be layered, for instance with stability addressed in more detail as in Sugiyama's original works [20]; or force-directed with constraints, which has ongoing exploration [21]. It may also be possible to combine the fish-eye techniques of Sundararajan et al., by automatically configuring their interest areas using the locations of our relevant variables after relevance filtering [6].

Our choice of color as a modality for values is challenging when scaling to large event spaces. For instance, some variables in the Census data set contained over 15 possible values. Our color mapping at present contains only a small number of unique color values, meaning that for such variables some colors were used multiple times. Because we present a consistent ordering, both radially and in the legend view's values, ambiguity is mostly removed, but this requires additional mental energy on the part of the user. For categorical variables with significantly more than 15 values, the effectiveness of the color-mapped approach is expected to fall apart. For color-blind users, a limited color palette creates further difficulty in scaling. One interesting possibility may lie in collapsing colors in inference diffs where the probability masses of certain values are small or have changed very little in the diff for that variable. Coloring continuous-valued variables is not addressed here, but may be possible as well, perhaps by bounding the event space and assigning distinct colors to special points of significance in the event space, with weighted color blending radially.
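The value-to-color scheme above can be sketched as follows: values are assigned palette entries by their fixed position in a consistently ordered event space, cycling the palette when the event space is larger. The palette and the cycling rule here are illustrative assumptions, not F-Vis's actual mapping.

```python
# Sketch: mapping event-space values to a limited color palette with a
# consistent ordering. The palette is illustrative only.

PALETTE = ["red", "orange", "yellow", "green", "blue", "purple"]

def color_for_value(event_space, value, palette=PALETTE):
    """Return the color for `value`, cycling the palette when the event
    space has more values than the palette has colors."""
    ordered = sorted(event_space)      # same ordering radially and in legend
    return palette[ordered.index(value) % len(palette)]

# A 15-value variable (e.g. from the Census data set) must reuse colors:
space = list(range(15))
# with a 6-color palette, values 0 and 6 collide onto the same color
```

Because the ordering is identical in the structural view and the legend, a reused color remains resolvable by position, which is exactly why ambiguity is "mostly removed" rather than eliminated.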

6.2 Applications to Other Graphical Models

With respect to Bayesian networks, though the models presented here are statistical rather than causal, an inference diff is possibly most useful in a network with truly causal modeling. For example, in medicine one could quickly ask the network for a visual answer to the question: given a patient with conditions X and Y, i.e. X=true, Y=true in both E1 and E2, what are the largest differences expected between prescribing treatment A versus treatment B, i.e. do(A=true, B=false) in E1 and do(A=false, B=true) in E2?
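The comparison underlying such a query can be sketched directly: per variable, an inference diff is the per-value change in probability mass between the posterior under E1 and the posterior under E2. The toy posteriors below are assumptions standing in for output from an inference engine.

```python
# Sketch: an inference diff compares a variable's posterior under
# evidence set E1 against its posterior under E2. Toy posteriors only.

def inference_diff(posterior_e1, posterior_e2):
    """Per-value change in probability mass when moving from E1 to E2."""
    values = set(posterior_e1) | set(posterior_e2)
    return {v: posterior_e2.get(v, 0.0) - posterior_e1.get(v, 0.0)
            for v in values}

# Hypothetical outcome variable under do(A=true) vs. do(B=true):
p1 = {"improved": 0.40, "unchanged": 0.45, "worse": 0.15}
p2 = {"improved": 0.65, "unchanged": 0.25, "worse": 0.10}
diff = inference_diff(p1, p2)
# The largest absolute change is the natural thing to highlight visually.
most_changed = max(diff, key=lambda v: abs(diff[v]))
```

Note that the per-variable diffs always sum to zero, since both posteriors are normalized; only the redistribution of mass is informative.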

Other probabilistic network models may benefit from inference-diff and relevance-filtering visualizations, such as Markov random fields, which have similar inference and belief-propagation capabilities.

6.3 Unused Modalities

Instead of beginning the embedded distributions' slices at the 12 o'clock position, the start angle could be varied and demarcated to carry significance of some kind. The shape of nodes in the structural view could also convey information, such as whether the node captures a causal dependency.

Draft v5 · 11 · 2013 December 9

Figure 12: Inference diff of the variable 'means of transportation'. The diff is generated from an empty evidence set 1 and an evidence set 2 with 'income' = true.


6.4 Other Metrics for Relevance

Metrics other than KL-divergence may be useful or more appropriate for relevance filtering. Regardless of the basic metric chosen, it may be useful if weights can be attached to values in the event spaces, such that large changes to the probability masses of low-weight events count less toward making that variable relevant.
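One simple realization of the weighting idea above is to scale each term of the KL sum by a per-value weight. This particular weighted form is our own illustration, not a standard definition; with all weights equal to 1 it reduces to the usual KL divergence.

```python
import math

# Sketch: a weighted variant of KL divergence for relevance filtering.
# Per-value weights shrink the contribution of low-importance events.

def weighted_divergence(p, q, weights=None, eps=1e-12):
    """sum_v w_v * p_v * log(p_v / q_v); missing weights default to 1."""
    weights = weights or {}
    return sum(weights.get(v, 1.0) * pv *
               math.log((pv + eps) / (q.get(v, 0.0) + eps))
               for v, pv in p.items())

# Down-weighting the low-importance 'walk' value changes its influence:
p = {"car": 0.7, "bus": 0.2, "walk": 0.1}
q = {"car": 0.5, "bus": 0.3, "walk": 0.2}
low_weight_walk = {"walk": 0.1}
score = weighted_divergence(p, q, low_weight_walk)
```

A relevance filter would then compare such per-variable scores against a threshold, surfacing only variables whose weighted divergence between the two evidence sets is large.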

7 Conclusion

Visualization methods are increasingly important as the scope and quantity of data increase. The flexibility and distributed nature of Bayesian networks, and of graphical models in general, make them one of many useful tools for machine learning. In this project we explored ways to visualize these networks, and introduced a number of new visualization techniques.

First, viewing the structure of a network is but half the coin. We complete the other half by placing posterior distributions directly in the network as embedded pie charts. Second, direct comparisons are incredibly useful tools, yet past visualizations have struggled to provide them. We propose inference diffs as a method of meaningful direct comparison, with concentric pie and ring charts for visualization. Finally, navigating efficiently in large Bayesian networks is generally a challenge. We introduce relevance filtering, with KL divergence as its mathematical basis, as a tool to guide the user to variables of interest in the model.

Appendix A: Specifying Evidence

In implementing F-Vis we sought a convenient interaction model for allowing the user to add or remove evidence. Further, we wanted the interaction to exhibit cohesiveness with the existing design language. The practical challenges are in presenting potentially many possible values efficiently (e.g. without resorting to scrolling or multi-level menus), and in providing a mechanism for choosing the target evidence set (i.e. E1 or E2).

In keeping with the use of color and circular charts, we opted for a radial-menu approach, seen in Figure 13. Each node in the network is directly clickable, opening a radial menu with color-coded choices available for each possible value in the event space for that variable. The user then simply drags a value from the radial menu onto either the inner circle or the inner ring, corresponding to evidence sets 1 and 2 respectively. To remove evidence, the user performs the action in reverse, dragging from the inner circle or ring to anywhere outside of the node.

As before, the legend view provides a reference of the event-space-to-color mapping for the user. We also fade the rest of the network to remove visual complexity while choosing evidence. Also note that because no keyboard input is necessary, this interaction model is touch-screen friendly.

Appendix B: Inference Algorithm Used

For inference in the above networks we implemented approximate inference using Gibbs sampling, a Markov chain method with theoretical convergence to the true posterior distribution [16]. This implementation is available publicly under our F-AI project [18]. To reduce the effects of statistical dependence from particle to particle, we retain only one out of every several adjacent particles when building new posterior distributions. To seed the Markov chain we begin with a forward sampling of the network, followed by a warm-up period of 25 Gibbs samples.

Because this computational approach is iterative, we are able to regularly update the embedded distributions in F-Vis, effectively creating a real-time animation of the convergence toward the stationary distribution. We update the structural view in F-Vis approximately once every 3,000 variable-iterations, or every 60 iterations for a 50-variable network.

Figure 13: An example of the evidence menu. This variable's event space is of size six, with corresponding color-coded values available for drag-and-drop into evidence sets 1 or 2.
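The sampling scheme described above (forward-sampled seed, warm-up, then thinning) can be sketched on a deliberately tiny network. The two-variable network X → Y, its CPTs, and the parameter values are toy assumptions, not the networks or constants used in F-Vis.

```python
import random

# Sketch: Gibbs sampling with a forward-sampled seed state, a warm-up
# period, and thinning (keeping one of every `thin` adjacent particles).

P_X = 0.3                               # P(X = 1)
P_Y_GIVEN_X = {0: 0.2, 1: 0.9}          # P(Y = 1 | X)

def sample_x_given_y(y, rng):
    """Draw X from P(X | Y = y) via Bayes' rule over the two X states."""
    w1 = P_X * (P_Y_GIVEN_X[1] if y else 1 - P_Y_GIVEN_X[1])
    w0 = (1 - P_X) * (P_Y_GIVEN_X[0] if y else 1 - P_Y_GIVEN_X[0])
    return 1 if rng.random() < w1 / (w1 + w0) else 0

def gibbs_posterior_x(y_evidence, n_samples=2000, warmup=25, thin=5, seed=0):
    """Estimate P(X = 1 | Y = y_evidence) with warm-up and thinning."""
    rng = random.Random(seed)
    kept, i = [], 0
    x = 1 if rng.random() < P_X else 0  # forward-sample the seed state
    while len(kept) < n_samples:
        x = sample_x_given_y(y_evidence, rng)
        i += 1
        if i > warmup and i % thin == 0:
            kept.append(x)              # keep 1 of every `thin` particles
    return sum(kept) / len(kept)

estimate = gibbs_posterior_x(1)         # exact answer: 0.27 / 0.41 ≈ 0.659
```

Because the kept particles accumulate incrementally, the running estimate can be redrawn after each batch, which is precisely what enables the real-time convergence animation described above.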

Appendix C: Learning Algorithm Used

To create the Bayesian networks seen here, we wrote our own implementation of general structure learning and CPT learning. Our algorithm is based almost entirely on the ideas presented in section 18.4 of Koller and Friedman [16], and amounts to an iterative search over the space of possible network structures. Each possible structure has an associated score, and the goal is to find a structure of maximal score. We use the Bayesian information criterion (BIC) score for this purpose. We use a uniform Dirichlet prior to effectively adjust the sufficient statistics of the training set, which, combined with the BIC score, acts to control the level of complexity in the resulting network structure.

The search algorithm is initialized with a fully disconnected network, and at each iteration chooses the best edge operation from a set of candidates. Each candidate operation is either an edge addition, edge deletion, or edge reversal, so long as the operation does not violate acyclicity. We compute the change to the total score of the network for each candidate action, and for that iteration of the search choose the action with the largest improvement to the total network score. If no action is found that increases the total score, the search is halted and the most recent structure is returned.

Because each edge operation affects at most two local family scores, most family scores do not change from iteration to iteration. As further suggested in Koller and Friedman [16], we exploit this fact to cache previously computed family scores, invalidating a cached score only when a chosen edge operation affects its associated variable. This makes the algorithm a dynamic programming algorithm and significantly reduces its computational demand. Further, we also cache various sufficient-statistics values associated with the training set, speeding up the overall process.
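The greedy search with cached family scores can be sketched as follows. To stay short, this sketch makes assumptions beyond the description above: binary variables only, no edge-reversal operation, and candidate edges restricted to a fixed variable ordering so that acyclicity holds by construction. It is an illustration, not the paper's F# implementation.

```python
import math
from itertools import product

# Sketch: greedy hill-climbing structure search with cached family (BIC)
# scores. Each edge toggle touches only one family, so its cached score
# is reused across iterations exactly as described above.

def family_bic(data, child, parents, cache):
    """BIC score of one family P(child | parents), cached by signature."""
    key = (child, tuple(sorted(parents)))
    if key not in cache:
        ps = sorted(parents)
        loglik = 0.0
        for assign in product([0, 1], repeat=len(ps)):
            rows = [r for r in data
                    if all(r[p] == v for p, v in zip(ps, assign))]
            ones = sum(r[child] for r in rows)
            for c in (ones, len(rows) - ones):
                if c:
                    loglik += c * math.log(c / len(rows))
        n_params = 2 ** len(ps)          # one Bernoulli per parent config
        cache[key] = loglik - 0.5 * n_params * math.log(len(data))
    return cache[key]

def learn_structure(data, order):
    """Hill-climb over single-edge additions/deletions until no gain."""
    parents = {v: set() for v in order}
    cache = {}                           # family scores survive iterations
    while True:
        best_gain, best_op = 1e-9, None
        for i, u in enumerate(order):
            for v in order[i + 1:]:      # u precedes v, so u -> v is acyclic
                current = family_bic(data, v, parents[v], cache)
                trial = parents[v] ^ {u} # toggle edge u -> v
                gain = family_bic(data, v, trial, cache) - current
                if gain > best_gain:
                    best_gain, best_op = gain, (u, v)
        if best_op is None:
            return parents               # no improving operation: halt
        u, v = best_op
        parents[v] ^= {u}

# Y copies X in this toy data, so the search adds the edge X -> Y:
data = [{"X": x, "Y": x} for x in (0, 1)] * 50
learned = learn_structure(data, ["X", "Y"])
```

Note how the BIC penalty term (0.5 · #params · log N) plays the complexity-controlling role described above: the edge is added only because the likelihood gain outweighs the doubled parameter count.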

Appendix D: Software Design

An important philosophy in the application design was that anything that can be visualized should be visualized. The software design of F-Vis can be divided into three main modules: the statistical learning & inference, the application data model, and the graphics presentation.

Learning and inference are implemented in F# (a derivative of OCaml) on the Microsoft .NET Framework v4.5. Variables, event spaces, observations, distributions, and Bayesian networks are all represented through object-oriented types, while algorithms are generally implemented as pure functions. An important realization partway through the development process was that complexity was significantly reduced by adopting an immutable design approach for each object type. The final implementations of most objects are in fact immutable, such that most methods that would otherwise mutate an object instead return a new instance of a derivative object. Anecdotally, when the probability-distribution type was converted to be immutable, a significant speed-up in structure learning occurred, presumably due to a more hardware-cache-friendly memory access pattern. All learning and inference code is available publicly on GitHub as project F-AI [18].
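The immutable-object pattern described above can be sketched briefly. The real types are F#; this Python stand-in, with its hypothetical `with_mass` method, is purely illustrative of the "mutators return new instances" design.

```python
# Sketch: an immutable distribution type whose would-be mutators return
# new instances, leaving the original untouched.

class Distribution:
    def __init__(self, masses):
        total = sum(masses.values())
        # normalize once at construction; the dict is never mutated again
        self._masses = {v: m / total for v, m in masses.items()}

    def mass(self, value):
        return self._masses.get(value, 0.0)

    def with_mass(self, value, mass):
        """Return a NEW distribution with `value`'s mass replaced and the
        whole distribution renormalized; `self` is unchanged."""
        updated = dict(self._masses)
        updated[value] = mass
        return Distribution(updated)

d1 = Distribution({"a": 1, "b": 1})
d2 = d1.with_mass("b", 1.5)   # d1 still has mass("b") == 0.5
```

Because instances never change after construction, they can be shared freely across the learning, inference, and layout threads without locking, which is one plausible reason the immutable conversion helped performance.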

The application data model, written in C#, was an important separate layer in the development of F-Vis. The data model maintains a source of truth for the application, and manages a number of background threads for simultaneous user input and background processing. At any given moment up to four threads are active: for learning, inference, layout, and user input & rendering. In a model-view-controller paradigm, our data model acts as both model and controller, translating user actions into messages for the learning, inference, and layout threads. This approach was advantageous in that it separated user-interface and rendering concerns from threading and computational concerns.

Finally, for user input and presentation we used the vector drawing features of Microsoft WPF v4.5. We implemented a number of reusable visual controls with individual responsibilities, such as rendering pie charts and ring charts, rendering vertices and edges, and routing raw user-input events to the application data model.

Acknowledgments

I'd like to acknowledge the CSE department at UC San


Diego for their thoughtful curriculum design and faculty.

I'd like to thank Prof. Charles Elkan for his instruction and guidance, both in the subjects of machine learning and in data graphics.

References

[1] M. Lesk. How Much Information Is There In the World? 1997. http://courses.cs.washington.edu/courses/cse590s/03au/lesk.pdf

[2] Twitter, Inc.

[4] E. Tufte. The Visual Display of Quantitative Information. Graphics Press LLC. 2001.

[5] J. Pearl. Bayesian Networks. 2000.

[6] P. K. Sundararajan; O. J. Mengshoel; T. Selker. Multi-focus and Multi-window Techniques for Interactive Network Exploration. 2013.

[7] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann. 1988.

[8] S. E. Schaeffer. Graph Clustering. Computer Science Review 1(1). 2007.

[9] J.-D. Zapata-Rivera; E. Neufeld; J. E. Greer. Visualization of Bayesian Belief Networks. Proceedings of IEEE Visualization '99. 1999.

[10] C.-H. Chiang; P. Shaughnessy; G. Livingston; G. Grinstein. Visualizing Graphical Probabilistic Models. Technical Report, Integration of Analytics and Visualization. 2011.

[13] L. Williams; R. St. Amant. A Visualization Technique for Bayesian Modeling. 2000.

[14] Netica. http://www.norsys.com/netica.html. Accessed 2013.

[15] BayesiaLab. http://www.bayesia.com/. Accessed 2013.

[16] D. Koller, N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press. 2009.

[17] S. Kullback; R. A. Leibler. On Information and Sufficiency. The Annals of Mathematical Statistics 22(1). 1951.

[18] F-AI. https://github.com/duckmaestro/F-AI. Accessed 2013.

[19] Graph#. http://graphsharp.codeplex.com/. Accessed 2013.

[20] K. Sugiyama. Graph Drawing and Applications for Software and Knowledge Engineers. World Scientific Pub Co Inc. 2002.

[21] A. Krause, C. E. Guestrin. Near-optimal Nonmyopic Value of Information in Graphical Models. Twenty-First Conference on Uncertainty in Artificial Intelligence. 2005.

[22] D. Shahaf; A. Chechetka; C. Guestrin. Learning Thin Junction Trees via Graph Cuts. International Conference on Artificial Intelligence and Statistics. 2009.

[23] C. Meek, B. Thiesson, D. Heckerman. The Learning-Curve Sampling Method Applied to Model-Based Clustering. The Journal of Machine Learning Research. 2002.

[24] K. Bache, M. Lichman. UCI Machine Learning Repository. 2013.