25.01.11
582636 Probabilistic Models, Spring 2011
Petri Myllymäki, Department of Computer Science
University of Helsinki, Finland
http://www.cs.helsinki.fi/courses/582636/2011/k/k/1
Three concepts
● Information: compression, coding, modeling
● Probability: uncertain reasoning
● Utility: (stochastic) search, decision making
Our motivation: Uncertain reasoning
● Computer Science
● Mathematical Statistics
● Information Theory
Yet another probability course?
● Computer science point of view
● Artificial intelligence point of view
– Agent point of view
– Knowledge representation (KR)
– Reasoning
– Rationality
– Decision making
– Grounding
● Machine learning point of view
– Computational methods for data analysis
– Large databases
Reasoning under uncertainty
● The world is a very uncertain place
● Thirty years of Artificial Intelligence and database research danced around this fact
● And then a few AI researchers decided to use some ideas from the eighteenth century
● The Uncertainty in Artificial Intelligence conference series, 1985-
● Probabilistic reasoning is now mainstream AI
But first there was logic
● Historically, logic also came first: syllogisms as a model of rationality
● Certainty, correctness, modularity, monotonicity
● BUT limited applicability, since agents almost never have access to the whole truth about their environment!
Acting by certain knowledge only?
● Is it enough to leave home 90 minutes before the flight departure? Anything can happen.
● How about X minutes before departure?
● Are you bound to stay home?
Qualification problem: what are the things that have to be taken into account?
Knowledge representation in FOPL
● Let us try to use FOPL for dental diagnosis
∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity)
– Wrong!
∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity) ∨ Disease(p, GumDisease) ∨ Disease(p, Abscess) ∨ ...
– Incomplete!
∀p Disease(p, Cavity) ⇒ Symptom(p, Toothache)
– Wrong again, need to add qualifications!
FOPL representation fails because
● Laziness
– It is too much work to list all the factors needed to ensure exceptionless rules
● Theoretical ignorance
– We do not know all the factors that play a role in the phenomenon
● Practical ignorance
– Even if we know all the factors in general, we do not know them for each particular case
Probability to the rescue
Probability provides a way to summarize the uncertainty
that comes from our laziness and ignorance
On modeling
In building intelligent systems, in statistics, and in the rest of the world ...
Modeling framework
Problem → Modeling → Prediction → Decision making
What does this mean?
● Problem: there is a need to model some part of the universe and make decisions based on the model
● Modeling: build the best model possible from a priori knowledge and data available
● Prediction: use the model to predict properties of interest
● Decision making: decide actions based on the predictions
For example
● Problem: online troubleshooting of software/hardware
● Modeling: build a latent-variable (Bayes) model of the problems a user encounters, based on knowledge about the software and symptom data
● Prediction: use the model to predict the underlying problem given symptoms
● Decision making: propose actions to remove the problem (or to find more symptoms)
Microsoft Technical support
Example: Printer Troubleshooter
Bayesian email spam filters
● SpamBayes, OPFile, Outclass, bayespam, bogofilter, ifile, PASP, spamoracle, Spam Assassin Annoyance Filter, BSpam, Spam Bully, Death2Spam, InBoxer, …
● Software: – http://spambayes.sourceforge.net/related.html
● Background:– http://spambayes.sourceforge.net/background.html
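The internals of these tools are not shown on the slide; as an illustration of the word-based Bayesian scoring such filters typically use, here is a toy naive Bayes sketch. All word probabilities and the prior are invented for illustration.

```python
# Toy sketch of word-based Bayesian spam scoring (naive Bayes).
# The prior and per-word probabilities below are invented for illustration.
from math import exp, log

p_spam = 0.5                                   # assumed prior P(spam)
p_word = {                                     # word: (P(word | spam), P(word | ham))
    "free":    (0.20, 0.02),
    "meeting": (0.01, 0.10),
    "viagra":  (0.15, 0.001),
}

def spam_posterior(words):
    """Naive Bayes: assume words are independent given the class."""
    log_spam = log(p_spam)
    log_ham = log(1 - p_spam)
    for w in words:
        if w in p_word:
            pw_s, pw_h = p_word[w]
            log_spam += log(pw_s)
            log_ham += log(pw_h)
    # Normalize the two joint log-scores into P(spam | words).
    return 1 / (1 + exp(log_ham - log_spam))

print(round(spam_posterior(["free", "viagra"]), 3))  # close to 1: very spammy words
print(round(spam_posterior(["meeting"]), 3))         # low: a typical ham word
```

The log-space accumulation matters in practice: real messages contain hundreds of words, and multiplying that many small probabilities directly would underflow.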
Real questions are ...
● Infinite number of models: which models do we consider?
– A model is always chosen from a set of possible models!
● How do we compare models (i.e., measure that one model is better than another) given some data?
● How do we find good models?
…and more
● How do we use the models to predict unobserved quantities of interest?
● What actions do we choose given the predictions?
General “rational agent” framework
Problem domain -(sampling)→ Data
Model family + Data -(learning)→ Domain model
Domain model + Problem -(inference)→ Predictive distribution
Predictive distribution + Utilities -(decision making)→ Actions
Choice of models
● Simple models vs. complex models
● Linear models vs. non-linear models
● Parametric models vs. non-parametric models
● Flat models vs. structural models
● What counts as complex is a totally nontrivial question
● One intuition: a complex model has more effective parameters
The Occam’s razor principle
● The problem:
– You are given the following sequence: -1, 3, 7, 11
– Question: What are the two next numbers?
● Solution 1:
– Answer: 15 and 19
– Explanation: add 4 to the previous number
● Solution 2:
– Answer: -19.9 and 1043.8
– Explanation: if the previous number is x, the next one is −x³/11 + (9/11)x² + 23/11
● “Of two competing hypotheses both conforming to our observations, choose the simpler one.”
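Both rules above really do reproduce the observed sequence; this can be checked mechanically (a small sketch, using exact rational arithmetic for the cubic rule):

```python
# Check that both explanations fit -1, 3, 7, 11 and see what they predict next.
from fractions import Fraction

def rule_simple(x):
    """Solution 1: add 4 to the previous number."""
    return x + 4

def rule_complex(x):
    """Solution 2: next = -x^3/11 + (9/11)x^2 + 23/11."""
    x = Fraction(x)
    return -x**3 / 11 + Fraction(9, 11) * x**2 + Fraction(23, 11)

def continue_seq(rule, start, n):
    """Generate n terms, starting from `start` and applying `rule` repeatedly."""
    seq = [start]
    for _ in range(n - 1):
        seq.append(rule(seq[-1]))
    return seq

simple = continue_seq(rule_simple, -1, 6)
complex_ = continue_seq(rule_complex, Fraction(-1), 6)

print(simple)                                # [-1, 3, 7, 11, 15, 19]
print([float(x) for x in complex_[:4]])      # [-1.0, 3.0, 7.0, 11.0] -- same observed data
print(round(float(complex_[4]), 1), round(float(complex_[5]), 1))  # -19.9 1043.8
```

The point of the slide survives the check: two hypotheses agree perfectly on all observed data yet diverge wildly on the next terms, so fit alone cannot choose between them.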
Occam’s Razor in Modeling
● There is a trade-off between model complexity and fit to the data
[Figure: number of car accidents vs. age, with a too simple fit, a too complex fit, and an Occam-style compromise in between.]
Simpler models are better than complex models
● interpretation: they are easier to understand
● computation: predictions are typically easier to compute (though not necessarily!)
● universality: they can be applied in more domains (more accurate predictions)
● “models should be only as complex as the data justifies”
● BUT: simpler models are NOT more probable a priori!
● Bayesian model selection: an automatic Occam’s razor for model complexity regularization
Two types of modeling
● Descriptive models (“statistical modeling”)
– describe objects (e.g., data) as they are
– typically exploratory structures
● Predictive models (“predictive inference”)
– models that are able to predict unknown objects (e.g., future data)
– models of the underlying process
Some viewpoints
● “Prediction is our business”
● Why the best fit to the data is not the best predictor:
– data can be erroneous; a perfect fit is too “specialized” and models the errors as well!
– a sample can only “identify” up to a certain level of complexity
● Intuitive goal: minimize model complexity + prediction error; it keeps you honest!
Alternatives
● Probabilistic inference
– Statistical inference
– Bayesian inference
● Fuzzy inference
● Dempster-Shafer inference
● Non-monotonic logic
Bayesian inference: basic concepts
Some early history
● Bernoulli (1654-1705)
● Bayes (1701-1761)
● Laplace (1749-1827)
● Prediction problem (“forward probability”):
– If the probability of an outcome in a single trial is p, what is the relative frequency of occurrence of this outcome in a series of trials?
● Learning problem (“inverse probability”):
– Given a number of observations in a series of trials, what are the probabilities of the different possible outcomes?
The Bayes rule
● Axioms of probability theory:
– The sum rule: P(A | C) + P(Ā | C) = 1
– The product rule: P(AB | C) = P(A | BC) P(B | C)
● The Bayes rule:
– P(A | BC) = P(A | C) P(B | AC) / P(B | C)
● A rule for updating our beliefs after obtaining new information
● H = hypothesis (model), I = background information, D = data (observations):
– P(H | D I) = P(H | I) P(D | H I) / P(D | I)
Do I have a good test?
● A new home HIV test is assumed to have “95% sensitivity and 98% specificity”
● A population has an HIV prevalence of 1/1000. If you use the test, what is the chance that someone testing positive actually has HIV?
Test continued ...
● P(HIV+ | test HIV+) = ?
● We know that
– P(test HIV+ | HIV+) = 0.95
– P(test HIV+ | HIV−) = 0.02
● From the Bayes rule we can calculate the probability of having HIV given a positive test result:

P(HIV+ | test HIV+) = P(test HIV+ | HIV+) P(HIV+) / [P(test HIV+ | HIV+) P(HIV+) + P(test HIV+ | HIV−) P(HIV−)]
= (0.95 × 0.001) / (0.95 × 0.001 + 0.02 × 0.999) ≈ 0.045
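The same calculation as a small script, with the denominator expanded by the sum rule exactly as above:

```python
# Posterior probability of HIV given a positive test result (Bayes rule).
p_hiv = 0.001                 # prevalence: 1/1000
p_pos_given_hiv = 0.95        # sensitivity
p_pos_given_healthy = 0.02    # false-positive rate, i.e. 1 - specificity (98%)

# P(test+) = P(test+ | HIV+) P(HIV+) + P(test+ | HIV-) P(HIV-)
p_pos = p_pos_given_hiv * p_hiv + p_pos_given_healthy * (1 - p_hiv)

# Bayes rule: P(HIV+ | test+) = P(test+ | HIV+) P(HIV+) / P(test+)
p_hiv_given_pos = p_pos_given_hiv * p_hiv / p_pos

print(round(p_hiv_given_pos, 3))  # 0.045
```

Playing with `p_hiv` shows why the result surprises: the posterior is dominated by the tiny prior, not by the test's accuracy.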
Thus finally
● Thus over 95% of those testing positive will, in fact, not have HIV
● The right question is: how should the test result change our belief that we are HIV positive?
Bayesian?
● Probabilities can be interpreted in various ways:
– Frequentist interpretation (Fisher, Neyman, Cramér)
– “Degree of belief” interpretation (Bernoulli, Bayes, Laplace, Jeffreys, Lindley, Jaynes)
Frequentist says ...
● The long-run frequency of an event is the proportion of the time it occurs in a long sequence of trials; probability is this frequency
● Probability can only be attached to “random variables”, not to individual events
Bayesian says ...
● An event x = a state of some part of the universe
● The probability of x is the degree of belief that event x will occur
● Probability will always depend on the state of knowledge
● p(x | y, C) means the probability of event x given that event y is true and background knowledge C
Frequentist language for solving problems
● P(data | model)
● sampling distributions
[Figure: a model generating the observed data.]
Bayesian language for solving problems
● Bayesian: P(data | model) & P(model | data)
[Figure: prior knowledge and data combined into a model.]
Isn’t this what I already do? No.
[Figure: data fed into an estimator (a function of the data); the estimator is judged via the “sampling distribution of the estimator” and hypothesis testing.]
“The Bayesian way”
[Figure: the prior distribution of the models is combined with the likelihood of the data to produce the posterior distribution of the models.]
Reasons for using probability theory
● Cox/Jaynes argument: probability is an appealing choice as the language for plausible inference
● Berger argument: Decision theory offers a theoretical framework for optimal decision making, and decision theory needs probabilities
● Pragmatic argument: it is a very general framework and it works
On plausible reasoning
● “The actual science of logic is conversant at present only with things either certain, impossible, or entirely doubtful, none of which (fortunately) we have to reason on. Therefore the true logic for this world is the calculus of Probabilities, which takes account of the magnitude of the probability which is, or ought to be, in a reasonable man’s mind” (James Clerk Maxwell)
● Probabilistic reasoning is intuitively easy to understand, but on the other hand intuition may be a poor guide when facing probabilistic evidence
● “Inside every non-Bayesian there is a Bayesian struggling to get out” (Dennis V. Lindley)
Real questions
● Q1: Given plausibilities Plaus(A) and Plaus(B), what is Plaus(AB)?
● Q2: How is Plaus(~A) related to Plaus(A)?
Qualitative properties of plausible reasoning
● D1. Degrees of plausibility are represented by real numbers
● D2. Direction of inference has a qualitative correspondence with common sense
– For example: if Plaus(A | C′) > Plaus(A | C) and Plaus(B | C′) = Plaus(B | C), then Plaus(AB | C′) > Plaus(AB | C)
– Ensures consistency in the limit (with perfect certainty) with deductive logic
● D3. If a conclusion can be inferred in more than one way, every possible way should lead to the same result
● D4. All relevant information is always taken into account
● D5. Equivalent states of knowledge must be represented by equivalent plausibility assignments
Cox/Jaynes/Cheeseman argument
● Every allowed extension of Aristotelian logic to plausibility theory is isomorphic to Bayesian probability theory
● Product rule (answers question Q1): P(AB | C) = P(A | BC) P(B | C)
● Sum rule (answers question Q2): P(A | C) + P(Ā | C) = 1
Bayesian inference: How to update beliefs?
● Select the model space
● Use the Bayes theorem to obtain the posterior probability of models (given data):

P(Model | Data) = P(Data | Model) P(Model) / P(Data)

● The posterior distribution is “the result” of the inference; what one needs from the posterior depends on what decisions are to be made
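A minimal sketch of this update, using a hypothetical discrete model space: three candidate coin biases with a uniform prior, observing 8 heads in 10 flips (all numbers invented for illustration).

```python
# Posterior over a discrete model space via the Bayes rule.
from math import comb

models = [0.3, 0.5, 0.8]                      # candidate values of P(heads), assumed for illustration
prior = {m: 1 / len(models) for m in models}  # uniform prior over the model space
heads, flips = 8, 10                          # observed data

def likelihood(m):
    """P(Data | Model): binomial likelihood of `heads` heads in `flips` flips."""
    return comb(flips, heads) * m**heads * (1 - m)**(flips - heads)

# P(Data) = sum over models of P(Data | Model) P(Model)
evidence = sum(likelihood(m) * prior[m] for m in models)

# P(Model | Data) = P(Data | Model) P(Model) / P(Data)
posterior = {m: likelihood(m) * prior[m] / evidence for m in models}

for m, p in posterior.items():
    print(f"P(bias={m} | data) = {p:.3f}")    # the 0.8-bias model dominates
```

Note how the evidence term P(Data) is just the normalizer that makes the posterior sum to one over the chosen model space.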
The Bayesian modeling viewpoint
● Explicitly include prediction (and intervention) in modeling
Models are a means (a language) to describe interesting properties of the phenomenon to be studied, but they are not intrinsic to the phenomenon itself.
“All models are false, but some are useful.”
(Being predictive …)
Good predictive models describe useful regularities of the data generating mechanism, while models that give a high probability to the observed data have possibly only learnt to memorize it.
True prediction performance is a function of future data, not of the model’s fit to the current data
Bayesian decision making for kids
● assign a benefit for every possible outcome (for every possible decision)
● assign a probability to every possible outcome given every possible decision
● what is the best decision?
Decision theory argument
● Decision theory offers a theoretical framework for optimal decision making
● Which of the three gambles should you choose?
– Option 1: P($100)=0.1, P($50)=0.9. Expected utility: 0.1·$100 + 0.9·$50 = $55
– Option 2: P($200)=0.2, P($5)=0.8. Expected utility: 0.2·$200 + 0.8·$5 = $44
– Option 3: P($80)=0.5, P($50)=0.5. Expected utility: 0.5·$80 + 0.5·$50 = $65
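The expected-utility comparison above can be sketched in a few lines of Python; the labels "A", "B", "C" for the three gambles are names added here for convenience, not from the slides:

```python
# Expected-utility comparison of the three lotteries on this slide.
lotteries = {
    "A": {100: 0.1, 50: 0.9},
    "B": {200: 0.2, 5: 0.8},
    "C": {80: 0.5, 50: 0.5},
}

def expected_utility(lottery):
    """E[U] = sum over outcomes of payoff * probability."""
    return sum(payoff * p for payoff, p in lottery.items())

utilities = {name: expected_utility(lot) for name, lot in lotteries.items()}
best = max(utilities, key=utilities.get)  # the maximal-expected-utility choice
```

Note that the optimal choice (the $80/$50 gamble, expected utility $65) is neither the one with the largest possible payoff nor the safest one.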
Optimal actions
● Optimal policy: choose the action with maximal expected utility
● The Dutch book argument: betting agencies must be Bayesians
● Where to get the utilities? (decision theory)
“Pragmatic” reasons for using probability theory
● The predictor and predicted variables (the inference task) do not have to be determined in advance
– probabilistic models can be used for classification (discriminative tasks), configuration problems, and prediction (regression problems)
– predictive performance can also serve as a criterion in data mining (exploring structure)
More pragmatic reasons for using probability theory
● Consistent calculus
– creating a consistent calculus for uncertain inference is not easy (the Cox theorem)
– cf. fuzzy logic
● Probabilistic models can handle both discrete and continuous variables at the same time
● Various approaches for handling missing data (both in model building and in reasoning)
Nice theory, but...
● “isn’t probabilistic reasoning counter-intuitive, something totally different from human reasoning?”
● A cause for confusion: the old frequentist interpretation. But probabilities do NOT have to be thought of as frequencies; they can be measures of belief.
● The so-called paradoxes are often misleading, e.g. choosing between
– A: P(€1,000,000) = 1.0
– B: P(€1,000,000) = 0.25, P(€4,000,000) = 0.25, P(€0) = 0.5
● Even if that were true, maybe that would be a good thing!
Nice theory, but...
● “Where do all the numbers come from?”
– Bayesian networks: a small number of parameters
– the numbers do not have to be accurate
– probability theory offers a framework for constructing models from sample data, from domain knowledge, or from a combination of the two
We can learn from Bayesians :-)
● Bayesian approaches never overfit (in principle)
● Bayesian approaches infer only from observed data (not possible data)
● Bayesian inference is always relative to a model family
● Does all this semi-philosophical debate really matter in practice?
– YES!!
– see e.g. “The great health hoax” by Robert Matthews, The Sunday Telegraph, September 13, 1998, or “Why Most Published Research Findings are False” by John Ioannidis, PLOS Medicine 2 (2005) 8.
“I rest my case”
Fundamental questions
● What is the model space?
● How do we search?
● How do we compare models?
Bayesian answers
● Model family (space) is made explicit
● The comparison criterion is a probability
● No restrictions on the search algorithm
(Classical statistics answers:
● Model family is implicit (normal distributions)
● Comparison criterion is fit to data, deviation from “random” behavior, a “model index”
● Simple deterministic “greedy” algorithms)
Bayesian inference: basic operations
Probability of propositions
● Notation P(x): read “probability of x-pression”
● Expressions are statements about the contents of random variables
● Random variables are much like variables in computer programming languages:
– Boolean: statements, propositions
– Enumerated, discrete: a small set of possible values
– Integers or natural numbers: idealized to infinity
– Floating point (continuous): real numbers to ease calculations
Elementary “probositions”
● P(X=x)
– the probability that random variable X has the value x
● We like to use words starting with capital letters to denote random variables.
● For example:
– P(It_will_snow_tomorrow = true)
– P(The_weekday_I'll_graduate = sunday)
– P(Number_of_planets_around_Gliese_581 = 7)
– P(The_average_height_of_adult_Finns = 1702mm)
Semantics of P(X=x)=p
● So what does it mean?
– P(The_weekday_I'll_graduate = sunday) = 0.20
– P(Number_of_planets_around_Gliese_581 = 7) = 0.3
● Bayesian interpretation:
– The proposition is either true or false, nothing in between, but we may be unsure about the truth. Probabilities measure that uncertainty.
– The greater p is, the more we believe that X=x:
● P(X=x) = 1: the agent totally believes that X=x.
● P(X=x) = 0: the agent does not believe that X=x at all.
Compound “probositions”
● Elementary propositions can be combined using the logical operators ˄, ˅ and ¬
– like P(X=x ˄ ¬ Y=y) etc.
– Possible shorthand: P(X ϵ S), or P(X≤x) for continuous variables
– Operator ˄ is the most common one, and is often replaced by just a comma, as in P(A=a, B=b).
– Naturally, the other logical operators can also be defined as derived notions.
Axioms of probability
● Kolmogorov's axioms:
1. 0 ≤ P(x) ≤ 1
2. P(true) = 1, P(false) = 0
3. P(x ˅ y) = P(x) + P(y) – P(x ˄ y)
● Some extra technical axioms are needed to make the theory rigorous
● The axioms can also be derived from common-sense requirements (the Cox/Jaynes argument)
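Axiom 3 can be sanity-checked mechanically. The following is a toy example added here (a fair six-sided die, not from the slides), using exact rational arithmetic so the identity holds without rounding:

```python
# Sanity check of axiom 3 (inclusion-exclusion) on a fair six-sided die.
from fractions import Fraction

die = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event):
    """P(event) = sum of the probabilities of the outcomes satisfying it."""
    return sum(p for face, p in die.items() if event(face))

even = lambda f: f % 2 == 0   # x: the face is even
high = lambda f: f > 3        # y: the face is greater than 3

lhs = prob(lambda f: even(f) or high(f))                             # P(x or y)
rhs = prob(even) + prob(high) - prob(lambda f: even(f) and high(f))  # axiom 3
```

Both sides come out to 2/3: the outcomes {2, 4, 5, 6} are counted once each, with the double-counted {4, 6} subtracted on the right-hand side.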
Axiom 3 again
– P(x or y) = P(x) + P(y) – P(x and y)
– It is there to avoid double counting. [Venn diagram: regions A, B, and A and B]
– e.g. P(“day_is_sunday” or “day_is_in_July”) = 1/7 + 31/365 – P(“day is a Sunday in July”), where the subtracted term is about 4.4/365 (on average about 4.4 of July's 31 days are Sundays).
Discrete probability distribution
● Instead of stating that
– P(D=d1) = p1,
– P(D=d2) = p2,
– ..., and
– P(D=dn) = pn,
● we often compactly say
– P(D) = (p1, p2, ..., pn).
● P(D) is called a probability distribution of D.
– NB! p1 + p2 + ... + pn = 1.
[Bar chart: an example distribution P(D) over the weekdays Mon–Fri.]
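A discrete distribution like the weekday example can be represented directly as a mapping from values to probabilities; the individual bar heights below are illustrative assumptions, since only the chart's axes survive in the source:

```python
# A discrete distribution P(D) over weekdays, as in the bar chart on this
# slide; the individual probabilities here are illustrative assumptions.
P_D = {"Mon": 0.1, "Tue": 0.2, "Wed": 0.4, "Thu": 0.2, "Fri": 0.1}

# The probabilities p1, ..., pn of a distribution must sum to one.
total = sum(P_D.values())
```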
Continuous probability distribution
● In the continuous case, the area under the density P(X=x) must equal one. For example, P(X=x) = exp(−x) for x ≥ 0.
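The unit-area claim for exp(−x) can be checked numerically; this sketch uses a simple midpoint rule on [0, 40], which is accurate enough here because the tail mass beyond 40 is about e⁻⁴⁰:

```python
# Numerical check that the density p(x) = exp(-x) on x >= 0 has unit area.
import math

n, upper = 200_000, 40.0
h = upper / n
# Midpoint rule: sum f(midpoint of each strip) * strip width.
area = sum(math.exp(-(i + 0.5) * h) for i in range(n)) * h
```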
Main toolbox of the Bayesians
● Definition of conditional probability
● Chain rule
● The Bayes rule
● Marginalization
● NB: these are all direct consequences of the axioms of probability theory
Conditional probability
● Let us define a notation for the probability of x given that we know (for sure) that y, and that we know nothing else:

P(x | y) = P(x ˄ y) / P(y)

● Bayesians say that all probabilities are conditional, since they are relative to the agent's background knowledge K:

P(x | y, K) = P(x ˄ y | K) / P(y | K)

– But Bayesians are lazy too, so they often drop K.
– Notice that the product form P(x, y) = P(x | y) P(y) is also very useful!
Chain rule
● From the definition of conditional probability, we get:

P(X1, X2) = P(X2 | X1) P(X1)

● And more generally:

P(X1, ..., Xn) = P(X1) P(X2 | X1) ··· P(Xn | X1, X2, ..., Xn−1) = ∏i P(Xi | X1, ..., Xi−1)
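The chain rule can be verified mechanically: build any joint distribution, compute each conditional as joint/marginal, and check that the factors multiply back to the joint. The three-variable joint below is made up (random weights) just for the check:

```python
# Numerical check of the chain rule P(x1,x2,x3) = P(x1) P(x2|x1) P(x3|x1,x2)
# on a small random joint distribution over three binary variables.
import itertools
import random

rng = random.Random(0)
atoms = list(itertools.product([True, False], repeat=3))
weights = [rng.random() for _ in atoms]
joint = {a: w / sum(weights) for a, w in zip(atoms, weights)}

def p(pred):
    """Probability of an event, as a sum over atomic outcomes."""
    return sum(q for a, q in joint.items() if pred(a))

max_err = 0.0
for x1, x2, x3 in atoms:
    lhs = joint[(x1, x2, x3)]
    p1 = p(lambda a: a[0] == x1)                      # P(x1)
    p12 = p(lambda a: a[0] == x1 and a[1] == x2)      # P(x1, x2)
    rhs = p1 * (p12 / p1) * (lhs / p12)               # P(x1) P(x2|x1) P(x3|x1,x2)
    max_err = max(max_err, abs(lhs - rhs))
```

The factors telescope algebraically, which is exactly why the chain rule holds: each conditional is a joint divided by a marginal.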
Marginalization
● Assume we have a joint probability distribution for a set S of random variables, and that S1 and S2 partition S.
● Now

P(S1 = s1) = Σ_{s ∈ dom(S2)} P(S1 = s1, S2 = s)
           = Σ_{s ∈ dom(S2)} P(S1 = s1 | S2 = s) P(S2 = s),

● where s1 and s are vectors of possible value combinations of S1 and S2 respectively.
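Marginalization is just summing the joint table over the variables being removed. A minimal sketch, using a made-up (Rain, Sprinkler) joint whose numbers are illustrative assumptions:

```python
# Marginalization sketch: P(S1) from a joint table P(S1, S2), summing S2 out.
joint = {  # (rain, sprinkler): probability -- illustrative numbers
    (True, True): 0.05, (True, False): 0.25,
    (False, True): 0.20, (False, False): 0.50,
}

def marginal(joint, index):
    """Sum the joint over all variables except the one at `index`."""
    out = {}
    for assignment, p in joint.items():
        key = assignment[index]
        out[key] = out.get(key, 0.0) + p
    return out

P_rain = marginal(joint, 0)  # P(Rain=r) = sum over s of P(Rain=r, Sprinkler=s)
```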
Joint probability distribution
● P(Toothache=x, Catch=y, Cavity=z) for all combinations of truth values (x, y, z):

Toothache  Catch  Cavity  probability
true       true   true    0.108
true       true   false   0.016
true       false  true    0.012
true       false  false   0.064
false      true   true    0.072
false      true   false   0.144
false      false  true    0.008
false      false  false   0.576
                  (sum:   1.000)

● You may also think of this as P(Too_Cat_Cav=x), where x is a 3-dimensional vector of truth values.
● Generalizes naturally to any set of discrete variables, not only Booleans.
Joys of the joint probability distribution
● By summing the rows of the joint probability table that match a condition, you can calculate the probability of any subset of events.
● P(Cavity=true or Toothache=true): sum of the six matching rows,

0.108 + 0.016 + 0.012 + 0.064 + 0.072 + 0.008 = 0.280
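The row-summing trick above is a one-liner once the joint table is a dictionary keyed by truth-value tuples:

```python
# The toothache joint table from this slide; the probability of any event
# is the sum of the matching rows.
joint = {  # (toothache, catch, cavity): probability
    (True, True, True): 0.108,  (True, True, False): 0.016,
    (True, False, True): 0.012, (True, False, False): 0.064,
    (False, True, True): 0.072, (False, True, False): 0.144,
    (False, False, True): 0.008, (False, False, False): 0.576,
}

def prob(event):
    """P(event) = sum of the joint-table rows where the event holds."""
    return sum(p for row, p in joint.items() if event(*row))

p_cav_or_tooth = prob(lambda toothache, catch, cavity: cavity or toothache)
```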
Marginal probabilities are probabilities too
● P(Cavity=x, Toothache=y): probabilities of the rows with equal values for the marginal variables are simply summed:
– P(Cavity=t, Toothache=t) = 0.108 + 0.012 = 0.120
– P(Cavity=f, Toothache=t) = 0.016 + 0.064 = 0.080
– P(Cavity=t, Toothache=f) = 0.072 + 0.008 = 0.080
– P(Cavity=f, Toothache=f) = 0.144 + 0.576 = 0.720
● The marginal probabilities again sum to 1.000, so the marginal is a probability distribution too.
Conditioning
● Marginalization can be used to calculate conditional probability:

P(Cavity=t | Toothache=t) = P(Cavity=t ˄ Toothache=t) / P(Toothache=t)
= (0.108 + 0.012) / (0.108 + 0.016 + 0.012 + 0.064) = 0.12 / 0.2 = 0.6
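The same conditioning step, written against the joint table: both the numerator and the denominator are sums over matching rows.

```python
# Conditioning on the toothache table: P(Cavity=t | Toothache=t) is the
# ratio of two sums over the joint distribution of this slide.
joint = {  # (toothache, catch, cavity): probability
    (True, True, True): 0.108,  (True, True, False): 0.016,
    (True, False, True): 0.012, (True, False, False): 0.064,
    (False, True, True): 0.072, (False, True, False): 0.144,
    (False, False, True): 0.008, (False, False, False): 0.576,
}

p_both = sum(p for (t, c, v), p in joint.items() if t and v)  # 0.108 + 0.012
p_tooth = sum(p for (t, c, v), p in joint.items() if t)       # toothache rows
p_cavity_given_toothache = p_both / p_tooth                   # 0.12 / 0.2
```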
Conditioning via marginalization

P(X | Y) = P(X, Y) / P(Y)                          (definition)
         = Σ_Z P(X, Z, Y) / P(Y)                   (marginalization)
         = Σ_Z P(X | Z, Y) P(Z | Y) P(Y) / P(Y)    (chain rule)
         = Σ_Z P(X | Z, Y) P(Z | Y).
Bayes' rule
● Combining

P(x | y, K) = P(x ˄ y | K) / P(y | K)

● and

P(x ˄ y | K) = P(y ˄ x | K) = P(y | x, K) P(x | K)

● yields the famous Bayes formula

P(x | y, K) = P(x | K) P(y | x, K) / P(y | K)

● or

P(h | e) = P(h) P(e | h) / P(e)
Bayes formula as an update rule
● Prior belief P(h) is updated to the posterior belief P(h|e1). This, in turn, gets updated to P(h|e1,e2) using the very same formula, with P(h|e1) as the prior. Finally, denoting P(·|e1) by P1, we get

P(h | e1, e2) = P(h, e1, e2) / P(e1, e2)
             = P(h, e1) P(e2 | h, e1) / (P(e1) P(e2 | e1))
             = P(h | e1) P(e2 | h, e1) / P(e2 | e1)
             = P1(h) P1(e2 | h) / P1(e2)
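Sequential updating can be sketched as repeated application of one function: the posterior after e1 becomes the prior for e2. The likelihood numbers below (standing in for P(e1|h) and P(e2|h,e1)) are hypothetical, chosen only to show the mechanics:

```python
# Sequential Bayesian updating sketch: posterior after e1 is the prior for e2.
def update(prior, likelihood):
    """Bayes formula: posterior(h) = prior(h) * likelihood(h) / evidence."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnorm.values())  # P(e), by marginalization over h
    return {h: q / evidence for h, q in unnorm.items()}

prior = {"h": 0.5, "not_h": 0.5}
posterior_1 = update(prior, {"h": 0.8, "not_h": 0.3})        # P(. | e1)
posterior_2 = update(posterior_1, {"h": 0.9, "not_h": 0.4})  # P(. | e1, e2)
```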
Bayes formula for diagnostics
● The Bayes formula can be used to calculate the probabilities of possible causes for observed symptoms:

P(cause | symptoms) = P(cause) P(symptoms | cause) / P(symptoms)

● Causal probabilities P(symptoms | cause) are usually easier for experts to estimate than diagnostic probabilities P(cause | symptoms).
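A worked diagnostic example (the numbers are illustrative assumptions, not from the slides): a condition with a 1% base rate, P(symptoms|cause) = 0.9, and P(symptoms|no cause) = 0.1.

```python
# Diagnostic Bayes: turn causal probabilities into a diagnostic one.
p_cause = 0.01                  # P(cause): 1% base rate (assumed)
p_symptom_given_cause = 0.9     # P(symptoms | cause)      (assumed)
p_symptom_given_no_cause = 0.1  # P(symptoms | no cause)   (assumed)

# P(symptoms), by marginalizing over the presence/absence of the cause
p_symptom = (p_symptom_given_cause * p_cause
             + p_symptom_given_no_cause * (1 - p_cause))

# Bayes formula
p_cause_given_symptom = p_symptom_given_cause * p_cause / p_symptom
```

Despite the seemingly strong symptom, the cause is only about 8% probable, because the low base rate dominates; ignoring it is exactly the base-rate fallacy.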
Bayes formula for model selection
● The Bayes formula can be used to calculate the probabilities of hypotheses, given observations:

P(H1 | D) = P(H1) P(D | H1) / P(D)
P(H2 | D) = P(H2) P(D | H2) / P(D)
...
General recipe for Bayesian inference
● X: something you don't know and need to know
● Y: the things you know
● Z: the things you don't know and don't need to know
● Compute:

P(X | Y) = Σ_Z P(X | Z, Y) P(Z | Y)

● That's it – we're done.
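The recipe can be run directly on the toothache joint table from the earlier slides, taking X = Cavity=true, Y = Toothache=true, and Z = Catch as the nuisance variable to be summed out:

```python
# The recipe P(X|Y) = sum_Z P(X|Z,Y) P(Z|Y) on the toothache joint table.
joint = {  # (toothache, catch, cavity): probability
    (True, True, True): 0.108,  (True, True, False): 0.016,
    (True, False, True): 0.012, (True, False, False): 0.064,
    (False, True, True): 0.072, (False, True, False): 0.144,
    (False, False, True): 0.008, (False, False, False): 0.576,
}

def p(event):
    """P(event) = sum of the matching joint-table rows."""
    return sum(q for row, q in joint.items() if event(*row))

y = lambda t, c, v: t                                 # Y: Toothache = true
p_x_given_y = 0.0
for z_val in (True, False):                           # sum over Z = Catch
    zy = lambda t, c, v, zv=z_val: t and c == zv
    xzy = lambda t, c, v, zv=z_val: t and c == zv and v
    p_x_given_y += (p(xzy) / p(zy)) * (p(zy) / p(y))  # P(X|Z,Y) * P(Z|Y)
```

The result agrees with the direct conditioning computed earlier, P(Cavity=t | Toothache=t) = 0.6, as it must.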