
A Tutorial on Learning with Bayesian Networks

David Heckerman

Presented by Haimonti Dutta, Department of Computer and Information Science

Outline

  Introduction
  Bayesian interpretation of probability and a review of methods
  Bayesian networks and their construction from prior knowledge
  Algorithms for probabilistic inference
  Learning probabilities and structure in a Bayesian network
  Relationships between Bayesian-network techniques and methods for supervised and unsupervised learning
  Conclusion


What Do Bayesian Networks and Bayesian Methods Have to Offer?

  Handling of incomplete data sets
  Learning about causal networks
  Facilitating the combination of domain knowledge and data
  An efficient and principled approach for avoiding the overfitting of data


The Bayesian Approach to Probability and Statistics

  Bayesian probability: the degree of belief in an event
  Classical probability: the true or physical probability of an event


Some Criticisms of Bayesian Probability

  Why do degrees of belief satisfy the rules of probability?
  On what scale should probabilities be measured?
  What probabilities should be assigned to beliefs that are not at the extremes?


Some Answers

  Researchers have suggested different sets of properties that are satisfied by degrees of belief.


Scaling Problem

  The probability wheel: a tool for assessing probabilities.
  What is the probability that the wheel stops in the shaded region?


Probability Assessment

  An evident problem, SENSITIVITY: how can we say that the probability of an event is 0.601 and not 0.599?
  Another problem, ACCURACY: methods for improving accuracy are available in decision-analysis techniques.


Learning with Data

  The thumbtack problem: when tossed, a thumbtack can come to rest on either heads or tails.
  [Figure: the two resting positions, labeled Heads and Tails]


Problem

  From N observations we want to determine the probability of heads on the (N+1)th toss.


Two Approaches

  Classical approach:
    Assert some (unknown) physical probability of heads
    Estimate this physical probability from N observations
    Use this estimate as the probability of heads on the (N+1)th toss


The Other Approach

  Bayesian approach:
    Assert some physical probability
    Encode the uncertainty about this physical probability using Bayesian probabilities
    Use the rules of probability to compute the required probability


Some Basic Probability Formulas

  Bayes' theorem gives the posterior probability of θ given data D and background knowledge ξ:

    p(θ | D, ξ) = p(θ | ξ) p(D | θ, ξ) / p(D | ξ)

  where p(D | ξ) = ∫ p(D | θ, ξ) p(θ | ξ) dθ

  Note: θ is an uncertain variable whose value corresponds to the possible true values of the physical probability.
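To make the update concrete, here is a minimal numerical sketch (not from the slides) that applies Bayes' theorem on a discretized grid of θ values for the thumbtack; the uniform prior and the observed counts are invented for illustration.

    import numpy as np

    # Posterior over the physical probability theta after observing
    # h heads and t tails, computed on a grid (assumed uniform prior).
    theta = np.linspace(0.001, 0.999, 999)   # candidate values of theta
    prior = np.ones_like(theta)              # p(theta | xi), uniform here
    h, t = 3, 7                              # made-up observed counts
    likelihood = theta**h * (1 - theta)**t   # p(D | theta, xi)
    evidence = np.trapz(likelihood * prior, theta)   # p(D | xi), numeric integral
    posterior = likelihood * prior / evidence        # Bayes' theorem
    print(theta[np.argmax(posterior)])       # posterior mode, about h/(h+t) = 0.3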


Likelihood Function

  How good is a particular value of θ? That depends on how likely it is to generate the observed data:

    L(θ : D) = p(D | θ)

  Hence the likelihood of the sequence H, T, H, T, T is

    L(θ : D) = θ (1-θ) θ (1-θ) (1-θ) = θ²(1-θ)³
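As a quick sketch (the sequence is the one on the slide; the evaluation point is arbitrary), the likelihood of a heads/tails sequence depends only on the counts:

    # Likelihood of a heads/tails sequence as a function of theta.
    def likelihood(theta, seq="HTHTT"):
        h, t = seq.count("H"), seq.count("T")
        return theta**h * (1 - theta)**t   # theta^2 * (1 - theta)^3 here

    print(likelihood(0.4))   # 0.4**2 * 0.6**3 = 0.03456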


Sufficient Statistics

  To compute the likelihood in the thumbtack problem we require only h and t (the number of heads and the number of tails).
  h and t are called sufficient statistics for the binomial distribution.
  A sufficient statistic is a function that summarizes, from the data, the information relevant to the likelihood.


To Remember

  We need a method to assess the prior distribution for θ.
  A common approach is to assume that the prior is a beta distribution.
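The beta prior is convenient because it is conjugate to the binomial likelihood: the posterior is again a beta distribution, with the observed counts added to the hyperparameters. A minimal sketch (the hyperparameters and counts are assumed values, not from the slides):

    from scipy.stats import beta

    alpha_h, alpha_t = 2, 2    # beta prior "pseudo-counts" (made up)
    h, t = 3, 7                # observed heads and tails (made up)
    posterior = beta(alpha_h + h, alpha_t + t)   # conjugate update
    # Probability of heads on the next toss = posterior mean.
    print(posterior.mean())    # (alpha_h + h) / (alpha_h + alpha_t + h + t)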


Maximum Likelihood Estimation

  The MLE principle: learn the parameters that maximize the likelihood function.
  It is one of the most commonly used estimators in statistics and is intuitively appealing.
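For the thumbtack the MLE has a closed form: setting the derivative of h log θ + t log(1-θ) to zero gives θ = h/(h+t). A one-line sketch with made-up counts:

    # MLE for the binomial: the observed fraction of heads.
    h, t = 3, 7
    theta_mle = h / (h + t)   # maximizes theta^h * (1 - theta)^t
    print(theta_mle)          # 0.3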


What Is a Bayesian Network?

  A graphical model that efficiently encodes the joint probability distribution for a large set of variables.


Definition

  A Bayesian network for a set of variables X = {X1, ..., Xn} contains:
    a network structure S encoding conditional independence assertions about X
    a set P of local probability distributions

  The network structure S is a directed acyclic graph whose nodes are in one-to-one correspondence with the variables in X. The lack of an arc denotes a conditional independence.


Some Conventions

  Variables are depicted as nodes.
  Arcs represent probabilistic dependence between variables.
  Conditional probabilities encode the strength of the dependencies.


An Example

  Detecting credit-card fraud.
  [Figure: a network over Fraud, Age, Sex, Gas, and Jewelry, with arcs from Fraud to Gas and Jewelry, and from Age and Sex to Jewelry]
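Under the structure sketched above, the chain rule shortens: the joint over the five variables factors as p(F,A,S,G,J) = p(F) p(A) p(S) p(G|F) p(J|F,A,S). A small sketch of the parameter savings this buys, assuming for simplicity that every variable is binary (the slide does not fix the state counts):

    # Free parameters: full joint table vs. the factored network.
    full_joint = 2**5 - 1            # 31 numbers for an unstructured joint
    factored = 1 + 1 + 1 + 2 + 8     # p(F), p(A), p(S), p(G|F), p(J|F,A,S)
    print(full_joint, factored)      # 31 vs 13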


Tasks

  Correctly identify the goals of modeling.
  Identify the many possible observations that may be relevant to the problem.
  Determine which subset of those observations is worthwhile to model.
  Organize the observations into variables having mutually exclusive and collectively exhaustive states.
  Finally, build a directed acyclic graph that encodes the assertions of conditional independence.


A Technique for Constructing a Bayesian Network

  The approach is based on the following observations:
    People can often readily assert causal relationships among variables.
    Causal relations typically correspond to assertions of conditional dependence.

  To construct a Bayesian network we simply draw arcs, for a given set of variables, from the cause variables to their immediate effects. In the final step we determine the local probability distributions.


Bayesian Inference

  Once a Bayesian network is constructed, we need to determine the various probabilities of interest from the model.
  Computing a probability of interest given a model is probabilistic inference (see the sketch below).
  [Figure: observed cases x[1], ..., x[m] and a query about x[m+1]]
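A minimal sketch of inference by enumeration on a two-variable fragment of the fraud example; all probability values are invented placeholders, not numbers from the tutorial:

    # Inference by enumeration: p(Fraud | Gas = True).
    p_fraud = {True: 1e-5, False: 1 - 1e-5}              # made-up prior
    p_gas_given_fraud = {True: {True: 0.2, False: 0.8},  # p(Gas | Fraud)
                         False: {True: 0.01, False: 0.99}}

    def posterior_fraud(gas):
        unnorm = {f: p_fraud[f] * p_gas_given_fraud[f][gas] for f in p_fraud}
        z = sum(unnorm.values())             # evidence term p(Gas = gas)
        return {f: v / z for f, v in unnorm.items()}

    print(posterior_fraud(True))   # fraud becomes more likely, but stays small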


Learning Probabilities in a Bayesian Network

  Problem: use data to update the probabilities of a given network structure.
  Thumbtack problem: we do not learn the probability of heads; we update the posterior distribution for the variable that represents the physical probability of heads.
  The problem restated: given a random sample D, compute the posterior distribution for θ.


Assumptions for Computing the Posterior Probability

  There is no missing data in the random sample D.
  The parameters are independent.


But...

  Data may be missing. How do we then proceed?


Obvious Concerns

  Why was the data missing?
    Missing values
    Hidden variables
  Is the absence of an observation dependent on the actual states of the variables?
  We deal with missing data that are independent of state.


Incomplete Data (contd.)

  For any interesting set of local likelihoods and priors, exact computation of the posterior distribution is intractable.
  We therefore require approximations for incomplete data.


Methods of Approximation for Incomplete Data

  Monte Carlo sampling methods
  Gaussian approximation
  MAP and ML approximations and the EM algorithm


Gibbs Sampling

  The steps involved (a minimal sketch follows):
  Start: choose an initial state for each of the variables in X at random.
  Iterate: unassign the current state of X1; sample a new state given the states of the other n-1 variables; repeat this procedure for every variable in X, creating a new sample of X.
  After a burn-in phase, the possible configurations of X will be sampled with probability p(x).
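A minimal Gibbs sampler over two binary variables with an invented joint table, showing the iterate-and-burn-in pattern described above:

    import random

    # Made-up target joint p(x1, x2) over two binary variables.
    joint = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.40}

    def resample(i, state):
        """Redraw variable i from its conditional given the other variable."""
        weights = []
        for v in (0, 1):
            s = list(state)
            s[i] = v
            weights.append(joint[tuple(s)])  # conditional is proportional to joint
        state[i] = 0 if random.random() < weights[0] / sum(weights) else 1

    state = [random.randint(0, 1), random.randint(0, 1)]  # random initial state
    samples = []
    for it in range(20000):
        for i in (0, 1):
            resample(i, state)
        if it >= 1000:                   # discard the burn-in phase
            samples.append(tuple(state))
    print(samples.count((1, 1)) / len(samples))   # approaches p(1,1) = 0.40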


A Problem with the Monte Carlo Method

  It is intractable when the sample size is large.

Gaussian Approximation

  Idea: for large amounts of data, the posterior can be approximated by a multivariate Gaussian distribution.


Criteria for Model Selection

  Some criterion must be used to determine the degree to which a network structure fits the prior knowledge and data.
  Such criteria include:
    Relative posterior probability
    Local criteria


Relative Posterior Probability

  A criterion for model selection is the logarithm of the relative posterior probability:

    log p(D, Sh) = log p(Sh) + log p(D | Sh)
                    log prior   log marginal likelihood
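For the thumbtack model the marginal likelihood term has a closed form under a beta prior, p(D | Sh) = B(αh + h, αt + t) / B(αh, αt). A sketch with assumed hyperparameters and counts:

    from math import lgamma

    def log_marginal_likelihood(h, t, a_h=1.0, a_t=1.0):
        """log p(D | Sh) for a binomial sequence under a Beta(a_h, a_t) prior."""
        def log_beta(a, b):
            return lgamma(a) + lgamma(b) - lgamma(a + b)
        return log_beta(a_h + h, a_t + t) - log_beta(a_h, a_t)

    # Higher values favor the structure; compare across candidate structures.
    print(log_marginal_likelihood(3, 7))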


Local Criteria

  An example: a Bayesian network structure for medical diagnosis.
  [Figure: an Ailment node with arcs to Finding 1, Finding 2, ..., Finding n]


Priors

  To compute the relative posterior probability we assess:
    structure priors p(Sh)
    parameter priors p(θs | Sh)


Priors on Network Parameters

  Key concepts:
    Independence equivalence
    Distribution equivalence


Illustration of Independence Equivalence

  Independence assertion: X and Z are conditionally independent given Y.
  [Figure: three structures encoding the same assertion: X → Y → Z, X ← Y → Z, and X ← Y ← Z]
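These structures are equivalent because their factorizations can be rewritten into one another with Bayes' rule. A quick numeric check, with invented CPTs, that the chain X → Y → Z satisfies X ⊥ Z | Y:

    from itertools import product

    # Made-up binary CPTs for the chain X -> Y -> Z.
    pX = {0: 0.6, 1: 0.4}
    pY_X = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}   # p(y | x)
    pZ_Y = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}   # p(z | y)

    def p(x, y, z):
        return pX[x] * pY_X[x][y] * pZ_Y[y][z]

    for y in (0, 1):
        py = sum(p(x, y, z) for x, z in product((0, 1), repeat=2))
        for x, z in product((0, 1), repeat=2):
            pxz_y = p(x, y, z) / py
            px_y = sum(p(x, y, zz) for zz in (0, 1)) / py
            pz_y = sum(p(xx, y, z) for xx in (0, 1)) / py
            assert abs(pxz_y - px_y * pz_y) < 1e-12   # factorizes given Y
    print("X is independent of Z given Y")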


Priors on Structures

  Various methods:
    Assume every hypothesis is equally likely (usually for convenience).
    Order the variables and treat the presence or absence of each arc as mutually independent.
    Use prior networks.
    Use imaginary data from domain experts.


Benefits of Learning Structure

  Efficient learning: more accurate models with less data.
    Compare estimating P(A) and P(B) versus P(A,B); the former requires less data.
  Discover structural properties of the domain.
  Helps to order events that occur sequentially, and aids sensitivity analysis and inference.
  Predict the effects of actions.


Search Methods

  Problem: find the best network from the set of all networks in which each node has no more than k parents.
  Search techniques (a skeleton of the first appears below):
    Greedy search
    Greedy search with restarts
    Best-first search
    Monte Carlo methods
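A skeletal greedy search over structures; the scoring function and the neighbor generator (single-arc additions, deletions, and reversals) are left abstract, so this illustrates only the loop, not any particular implementation:

    def greedy_search(initial, neighbors, score):
        """Hill-climb over network structures: repeatedly take the best
        single-arc change until no change improves the score."""
        current, current_score = initial, score(initial)
        improved = True
        while improved:
            improved = False
            for candidate in neighbors(current):  # e.g. add/delete/reverse an arc
                s = score(candidate)
                if s > current_score:
                    current, current_score = candidate, s
                    improved = True
        return current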


What Is All This Good For, Anyway?

  Implementations in real life:
    Microsoft products (Microsoft Office)
    Medical applications and biostatistics (BUGS)
    NASA's AutoClass project for data analysis
    Collaborative filtering (Microsoft MSBN)
    Fraud detection (AT&T)
    Speech recognition (UC Berkeley)


Limitations of Bayesian Networks

  They typically require initial knowledge of many probabilities; the quality and extent of prior knowledge play an important role.
  Significant computational cost (learning is an NP-hard task).
  The probability of an unanticipated event is not accounted for.


Conclusion

  [Figure: data + prior knowledge feed an Inducer, which outputs a Bayesian network]
