
A Tutorial on Learning with Bayesian Networks

David Heckerman

Presented by Haimonti Dutta, Department of Computer and Information Science

Outline

  Introduction
  Bayesian interpretation of probability and a review of methods
  Bayesian networks and their construction from prior knowledge
  Algorithms for probabilistic inference
  Learning probabilities and structure in a Bayesian network
  Relationships between Bayesian-network techniques and methods for supervised and unsupervised learning
  Conclusion


What Do Bayesian Networks and Bayesian Methods Have to Offer?

  Handling of incomplete data sets
  Learning about causal networks
  Facilitating the combination of domain knowledge and data
  An efficient and principled approach for avoiding the overfitting of data


The Bayesian Approach to Probability and Statistics

  Bayesian probability: the degree of belief in an event
  Classical probability: the true or physical probability of an event


Some Criticisms of Bayesian Probability

  Why do degrees of belief satisfy the rules of probability?
  On what scale should probabilities be measured?
  What probabilities should be assigned to beliefs that are not at the extremes?


Some Answers

  Researchers have suggested different sets of properties that are satisfied by degrees of belief.


Scaling Problem

  The probability wheel: a tool for assessing probabilities.
  What is the probability that the wheel stops in the shaded region?


Probability Assessment

  An evident problem, SENSITIVITY: how can we say that the probability of an event is 0.601 and not 0.599?
  Another problem, ACCURACY: methods for improving accuracy are available in decision-analysis techniques.


Learning with Data

  The thumbtack problem: when tossed, a thumbtack can come to rest on either heads or tails.
  [Figure: the two resting positions, labeled Heads and Tails]


Problem

  From N observations we want to determine the probability of heads on the (N+1)th toss.


Two Approaches

  Classical approach:
    Assert some (unknown) physical probability of heads
    Estimate this physical probability from N observations
    Use this estimate as the probability of heads on the (N+1)th toss


The Other Approach

  Bayesian approach:
    Assert some physical probability
    Encode the uncertainty about this physical probability using Bayesian probabilities
    Use the rules of probability to compute the required probability


Some Basic Probability Formulas

  Bayes' theorem gives the posterior probability of θ given data D and background knowledge ξ:

    p(θ | D, ξ) = p(θ | ξ) p(D | θ, ξ) / p(D | ξ)

  where p(D | ξ) = ∫ p(D | θ, ξ) p(θ | ξ) dθ

  Note: θ is an uncertain variable whose value corresponds to the possible true values of the physical probability.
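To make the update concrete, here is a minimal numerical sketch (not from the slides) that applies Bayes' theorem on a discretized grid of θ values for the thumbtack; the uniform prior and the observed counts are invented for illustration.

    import numpy as np

    # Posterior over the physical probability theta after observing
    # h heads and t tails, computed on a grid (assumed uniform prior).
    theta = np.linspace(0.001, 0.999, 999)   # candidate values of theta
    prior = np.ones_like(theta)              # p(theta | xi), uniform here
    h, t = 3, 7                              # made-up observed counts
    likelihood = theta**h * (1 - theta)**t   # p(D | theta, xi)
    evidence = np.trapz(likelihood * prior, theta)   # p(D | xi), numeric integral
    posterior = likelihood * prior / evidence        # Bayes' theorem
    print(theta[np.argmax(posterior)])       # posterior mode, about h/(h+t) = 0.3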


Likelihood Function

  How good is a particular value of θ? That depends on how likely it is to generate the observed data:

    L(θ : D) = p(D | θ)

  Hence the likelihood of the sequence H, T, H, T, T is

    L(θ : D) = θ (1-θ) θ (1-θ) (1-θ) = θ²(1-θ)³
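As a quick sketch (the sequence is the one on the slide; the evaluation point is arbitrary), the likelihood of a heads/tails sequence depends only on the counts:

    # Likelihood of a heads/tails sequence as a function of theta.
    def likelihood(theta, seq="HTHTT"):
        h, t = seq.count("H"), seq.count("T")
        return theta**h * (1 - theta)**t   # theta^2 * (1 - theta)^3 here

    print(likelihood(0.4))   # 0.4**2 * 0.6**3 = 0.03456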


Sufficient Statistics

  To compute the likelihood in the thumbtack problem we require only h and t (the number of heads and the number of tails).
  h and t are called sufficient statistics for the binomial distribution.
  A sufficient statistic is a function that summarizes, from the data, the information relevant to the likelihood.


To Remember

  We need a method to assess the prior distribution for θ.
  A common approach is to assume that the prior is a beta distribution.
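The beta prior is convenient because it is conjugate to the binomial likelihood: the posterior is again a beta distribution, with the observed counts added to the hyperparameters. A minimal sketch (the hyperparameters and counts are assumed values, not from the slides):

    from scipy.stats import beta

    alpha_h, alpha_t = 2, 2    # beta prior "pseudo-counts" (made up)
    h, t = 3, 7                # observed heads and tails (made up)
    posterior = beta(alpha_h + h, alpha_t + t)   # conjugate update
    # Probability of heads on the next toss = posterior mean.
    print(posterior.mean())    # (alpha_h + h) / (alpha_h + alpha_t + h + t)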


Maximum Likelihood Estimation

  The MLE principle: learn the parameters that maximize the likelihood function.
  It is one of the most commonly used estimators in statistics and is intuitively appealing.
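For the thumbtack the MLE has a closed form: setting the derivative of h log θ + t log(1-θ) to zero gives θ = h/(h+t). A one-line sketch with made-up counts:

    # MLE for the binomial: the observed fraction of heads.
    h, t = 3, 7
    theta_mle = h / (h + t)   # maximizes theta^h * (1 - theta)^t
    print(theta_mle)          # 0.3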


What Is a Bayesian Network?

  A graphical model that efficiently encodes the joint probability distribution for a large set of variables.


Definition

  A Bayesian network for a set of variables X = {X1, ..., Xn} contains:
    a network structure S encoding conditional independence assertions about X
    a set P of local probability distributions

  The network structure S is a directed acyclic graph whose nodes are in one-to-one correspondence with the variables in X. The lack of an arc denotes a conditional independence.


Some Conventions

  Variables are depicted as nodes.
  Arcs represent probabilistic dependence between variables.
  Conditional probabilities encode the strength of the dependencies.


An Example

  Detecting credit-card fraud.
  [Figure: a network over Fraud, Age, Sex, Gas, and Jewelry, with arcs from Fraud to Gas and Jewelry, and from Age and Sex to Jewelry]
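Under the structure sketched above, the chain rule shortens: the joint over the five variables factors as p(F,A,S,G,J) = p(F) p(A) p(S) p(G|F) p(J|F,A,S). A small sketch of the parameter savings this buys, assuming for simplicity that every variable is binary (the slide does not fix the state counts):

    # Free parameters: full joint table vs. the factored network.
    full_joint = 2**5 - 1            # 31 numbers for an unstructured joint
    factored = 1 + 1 + 1 + 2 + 8     # p(F), p(A), p(S), p(G|F), p(J|F,A,S)
    print(full_joint, factored)      # 31 vs 13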


Tasks

  Correctly identify the goals of modeling.
  Identify the many possible observations that may be relevant to the problem.
  Determine which subset of those observations is worthwhile to model.
  Organize the observations into variables having mutually exclusive and collectively exhaustive states.
  Finally, build a directed acyclic graph that encodes the assertions of conditional independence.


A Technique for Constructing a Bayesian Network

  The approach is based on the following observations:
    People can often readily assert causal relationships among variables.
    Causal relations typically correspond to assertions of conditional dependence.

  To construct a Bayesian network we simply draw arcs, for a given set of variables, from the cause variables to their immediate effects. In the final step we determine the local probability distributions.


Bayesian Inference

  Once a Bayesian network is constructed, we need to determine the various probabilities of interest from the model.
  Computing a probability of interest given a model is probabilistic inference (see the sketch below).
  [Figure: observed cases x[1], ..., x[m] and a query about x[m+1]]
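A minimal sketch of inference by enumeration on a two-variable fragment of the fraud example; all probability values are invented placeholders, not numbers from the tutorial:

    # Inference by enumeration: p(Fraud | Gas = True).
    p_fraud = {True: 1e-5, False: 1 - 1e-5}              # made-up prior
    p_gas_given_fraud = {True: {True: 0.2, False: 0.8},  # p(Gas | Fraud)
                         False: {True: 0.01, False: 0.99}}

    def posterior_fraud(gas):
        unnorm = {f: p_fraud[f] * p_gas_given_fraud[f][gas] for f in p_fraud}
        z = sum(unnorm.values())             # evidence term p(Gas = gas)
        return {f: v / z for f, v in unnorm.items()}

    print(posterior_fraud(True))   # fraud becomes more likely, but stays small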


Learning Probabilities in a Bayesian Network

  Problem: use data to update the probabilities of a given network structure.
  Thumbtack problem: we do not learn the probability of heads; we update the posterior distribution for the variable that represents the physical probability of heads.
  The problem restated: given a random sample D, compute the posterior distribution for θ.


Assumptions for Computing the Posterior Probability

  There is no missing data in the random sample D.
  The parameters are independent.


But...

  Data may be missing. How do we then proceed?


Obvious Concerns

  Why was the data missing?
    Missing values
    Hidden variables
  Is the absence of an observation dependent on the actual states of the variables?
  We deal with missing data that are independent of state.


Incomplete Data (contd.)

  For any interesting set of local likelihoods and priors, exact computation of the posterior distribution is intractable.
  We therefore require approximations for incomplete data.


Methods of Approximation for Incomplete Data

  Monte Carlo sampling methods
  Gaussian approximation
  MAP and ML approximations and the EM algorithm


Gibbs Sampling

  The steps involved (a minimal sketch follows):
  Start: choose an initial state for each of the variables in X at random.
  Iterate: unassign the current state of X1; sample a new state given the states of the other n-1 variables; repeat this procedure for every variable in X, creating a new sample of X.
  After a burn-in phase, the possible configurations of X will be sampled with probability p(x).
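A minimal Gibbs sampler over two binary variables with an invented joint table, showing the iterate-and-burn-in pattern described above:

    import random

    # Made-up target joint p(x1, x2) over two binary variables.
    joint = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.40}

    def resample(i, state):
        """Redraw variable i from its conditional given the other variable."""
        weights = []
        for v in (0, 1):
            s = list(state)
            s[i] = v
            weights.append(joint[tuple(s)])  # conditional is proportional to joint
        state[i] = 0 if random.random() < weights[0] / sum(weights) else 1

    state = [random.randint(0, 1), random.randint(0, 1)]  # random initial state
    samples = []
    for it in range(20000):
        for i in (0, 1):
            resample(i, state)
        if it >= 1000:                   # discard the burn-in phase
            samples.append(tuple(state))
    print(samples.count((1, 1)) / len(samples))   # approaches p(1,1) = 0.40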


A Problem with the Monte Carlo Method

  It is intractable when the sample size is large.

Gaussian Approximation

  Idea: for large amounts of data, the posterior can be approximated by a multivariate Gaussian distribution.


Criteria for Model Selection

  Some criterion must be used to determine the degree to which a network structure fits the prior knowledge and data.
  Such criteria include:
    Relative posterior probability
    Local criteria


Relative Posterior Probability

  A criterion for model selection is the logarithm of the relative posterior probability:

    log p(D, Sh) = log p(Sh) + log p(D | Sh)
                    log prior   log marginal likelihood
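For the thumbtack model the marginal likelihood term has a closed form under a beta prior, p(D | Sh) = B(αh + h, αt + t) / B(αh, αt). A sketch with assumed hyperparameters and counts:

    from math import lgamma

    def log_marginal_likelihood(h, t, a_h=1.0, a_t=1.0):
        """log p(D | Sh) for a binomial sequence under a Beta(a_h, a_t) prior."""
        def log_beta(a, b):
            return lgamma(a) + lgamma(b) - lgamma(a + b)
        return log_beta(a_h + h, a_t + t) - log_beta(a_h, a_t)

    # Higher values favor the structure; compare across candidate structures.
    print(log_marginal_likelihood(3, 7))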


Local Criteria

  An example: a Bayesian network structure for medical diagnosis.
  [Figure: an Ailment node with arcs to Finding 1, Finding 2, ..., Finding n]


Priors

  To compute the relative posterior probability we assess:
    structure priors p(Sh)
    parameter priors p(θs | Sh)


Priors on Network Parameters

  Key concepts:
    Independence equivalence
    Distribution equivalence


Illustration of Independence Equivalence

  Independence assertion: X and Z are conditionally independent given Y.
  [Figure: three structures encoding the same assertion: X → Y → Z, X ← Y → Z, and X ← Y ← Z]
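These structures are equivalent because their factorizations can be rewritten into one another with Bayes' rule. A quick numeric check, with invented CPTs, that the chain X → Y → Z satisfies X ⊥ Z | Y:

    from itertools import product

    # Made-up binary CPTs for the chain X -> Y -> Z.
    pX = {0: 0.6, 1: 0.4}
    pY_X = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}   # p(y | x)
    pZ_Y = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}   # p(z | y)

    def p(x, y, z):
        return pX[x] * pY_X[x][y] * pZ_Y[y][z]

    for y in (0, 1):
        py = sum(p(x, y, z) for x, z in product((0, 1), repeat=2))
        for x, z in product((0, 1), repeat=2):
            pxz_y = p(x, y, z) / py
            px_y = sum(p(x, y, zz) for zz in (0, 1)) / py
            pz_y = sum(p(xx, y, z) for xx in (0, 1)) / py
            assert abs(pxz_y - px_y * pz_y) < 1e-12   # factorizes given Y
    print("X is independent of Z given Y")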


Priors on Structures

  Various methods:
    Assume every hypothesis is equally likely (usually for convenience).
    Order the variables and treat the presence or absence of each arc as mutually independent.
    Use prior networks.
    Use imaginary data from domain experts.


Benefits of Learning Structure

  Efficient learning: more accurate models with less data.
    Compare estimating P(A) and P(B) versus P(A,B); the former requires less data.
  Discover structural properties of the domain.
  Helps to order events that occur sequentially, and aids sensitivity analysis and inference.
  Predict the effects of actions.


Search Methods

  Problem: find the best network from the set of all networks in which each node has no more than k parents.
  Search techniques (a skeleton of the first appears below):
    Greedy search
    Greedy search with restarts
    Best-first search
    Monte Carlo methods
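A skeletal greedy search over structures; the scoring function and the neighbor generator (single-arc additions, deletions, and reversals) are left abstract, so this illustrates only the loop, not any particular implementation:

    def greedy_search(initial, neighbors, score):
        """Hill-climb over network structures: repeatedly take the best
        single-arc change until no change improves the score."""
        current, current_score = initial, score(initial)
        improved = True
        while improved:
            improved = False
            for candidate in neighbors(current):  # e.g. add/delete/reverse an arc
                s = score(candidate)
                if s > current_score:
                    current, current_score = candidate, s
                    improved = True
        return current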


What Is All This Good For, Anyway?

  Implementations in real life:
    Microsoft products (Microsoft Office)
    Medical applications and biostatistics (BUGS)
    NASA's AutoClass project for data analysis
    Collaborative filtering (Microsoft MSBN)
    Fraud detection (AT&T)
    Speech recognition (UC Berkeley)


Limitations of Bayesian Networks

  They typically require initial knowledge of many probabilities; the quality and extent of prior knowledge play an important role.
  Significant computational cost (learning is an NP-hard task).
  The probability of an unanticipated event is not accounted for.


Conclusion

  [Figure: data + prior knowledge feed an Inducer, which outputs a Bayesian network]
