Mathematical Tools for Neural and Cognitive Science
Probability & Statistics: Intro, summary statistics, probability
Fall semester, 2018
- Efron & Tibshirani, Introduction to the Bootstrap, 1998
Some history…
• 1600’s: Early notions of data summary/averaging
• 1700’s: Bayesian prob/statistics (Bayes, Laplace)
• 1920’s: Frequentist statistics for science (e.g., Fisher)
• 1940’s: Statistical signal analysis and communication, estimation/decision theory (e.g., Shannon, Wiener, etc.)
• 1950’s: Return of Bayesian statistics (e.g., Jeffreys, Wald, Savage, Jaynes…)
• 1970’s: Computation, optimization, simulation (e.g., Tukey)
• 1990’s: Machine learning (large-scale computing + statistical inference + lots of data)
• Since 1950’s!: statistical neural/cognitive models
Scientific process (a cycle):
Observe / measure data → Summarize / fit model(s), compare with predictions → Create / modify hypothesis / model → Generate predictions, design experiment → Observe / measure data → …
Descriptive statistics: Central tendency

• We often summarize data with the average. Why?
• The average minimizes the squared error (as in regression!):

  $\mu(\vec{x}) = \arg\min_c \frac{1}{N}\sum_{n=1}^{N}(x_n - c)^2 = \frac{1}{N}\sum_{n=1}^{N} x_n$

• Generalize: minimize the $L_p$ norm:

  $\arg\min_c \left[\frac{1}{N}\sum_{n=1}^{N}|x_n - c|^p\right]^{1/p}$

  – minimize $L_1$ norm: median, $m(\vec{x})$
  – minimize $L_0$ norm: mode
  – minimize $L_\infty$ norm: midpoint of range
• Issues: outliers, asymmetry, bimodality
• How do we choose?
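The $L_p$ view can be checked numerically. A minimal sketch (the sample data and the brute-force grid are invented for illustration, not from the slides):

```python
# Verify numerically that the mean minimizes the L2 error and the median
# minimizes the L1 error, by brute-force search over candidate centers c.
from statistics import mean, median

x = [1.0, 2.0, 2.0, 3.0, 10.0]  # small sample with an outlier

def lp_cost(c, p):
    return (sum(abs(xn - c) ** p for xn in x) / len(x)) ** (1 / p)

candidates = [i / 100 for i in range(1201)]  # grid over [0, 12]
best_l2 = min(candidates, key=lambda c: lp_cost(c, 2))
best_l1 = min(candidates, key=lambda c: lp_cost(c, 1))

print(best_l2, mean(x))    # L2 minimizer lands on the mean (3.6)
print(best_l1, median(x))  # L1 minimizer lands on the median (2.0)
```

Note how the outlier pulls the mean toward 10 but leaves the median untouched, which is exactly the "issues: outliers" point above.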
Descriptive statistics: Dispersion

• Sample standard deviation:

  $\sigma(\vec{x}) = \min_c \left[\frac{1}{N}\sum_{n=1}^{N}(x_n - c)^2\right]^{1/2} = \left[\frac{1}{N}\sum_{n=1}^{N}(x_n - \mu(\vec{x}))^2\right]^{1/2}$

• Mean absolute deviation (MAD) about the median:

  $d(\vec{x}) = \frac{1}{N}\sum_{n=1}^{N}\left|x_n - m(\vec{x})\right|$

• Quantiles
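Both dispersion measures follow directly from their definitions; a sketch (function names and sample data are mine):

```python
# Sample standard deviation and MAD about the median, computed from the
# definitions above (normalizing by N, as on the slide).
from statistics import mean, median

def std_dev(x):
    mu = mean(x)
    return (sum((xn - mu) ** 2 for xn in x) / len(x)) ** 0.5

def mad(x):
    m = median(x)
    return sum(abs(xn - m) for xn in x) / len(x)

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(std_dev(x))  # 2.0
print(mad(x))      # 1.5
```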
Descriptive statistics: Dispersion

Summary statistics (e.g., sample mean/var) can be interpreted as estimates of model parameters. To formalize this, we need tools from probability…
data $\{x_n\}$ → histogram $\{c_k, h_k\}$ → probability distribution $p(x)$
probabilistic model $p_\theta(\vec{x})$ → data $\{\vec{x}_n\}$ (measurement); data → model (inference)
Probabilistic Middleville

In Middleville, every family has two children, brought by the stork.

probabilistic model: The stork delivers boys and girls randomly, with family probabilities {BB, BG, GB, GG} = {0.2, 0.3, 0.2, 0.3}.

data: You pick a family at random and discover that one of the children is a girl.

inference: What are the chances that the other child is a girl?
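Under the reading that "one of the children is a girl" means "at least one is a girl", the answer follows by conditioning on the given family probabilities:

```python
# Condition on "at least one girl": renormalize over the consistent families.
p = {"BB": 0.2, "BG": 0.3, "GB": 0.2, "GG": 0.3}

at_least_one_girl = p["BG"] + p["GB"] + p["GG"]  # 0.8
p_other_girl = p["GG"] / at_least_one_girl

print(p_other_girl)  # 0.375
```

Note the answer differs from the marginal probability of a girl; how the information was obtained matters to the conditioning.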
Statistical Middleville

In Middleville, every family has two children, brought by the stork.

data: In a survey of 100 of the Middleville families, 32 have two girls, 23 have two boys, and the remainder have one of each.

inference: estimate the probabilistic model, i.e., the family probabilities {BB, BG, GB, GG} with which the stork delivers boys and girls.
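The inference here runs the opposite way: from counts to model parameters. A sketch of the relative-frequency (maximum-likelihood) estimate; note the survey cannot distinguish BG from GB, so only the combined "mixed" probability is identifiable:

```python
# Estimate category probabilities from the survey counts by relative frequency.
counts = {"GG": 32, "BB": 23, "mixed": 45}  # 100 families; mixed = BG or GB
n = sum(counts.values())
estimates = {k: v / n for k, v in counts.items()}

print(estimates)  # {'GG': 0.32, 'BB': 0.23, 'mixed': 0.45}
```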
Probability basics (outline)

• distributions: discrete and continuous
• expected value, moments
• cumulative distributions, quantiles, Q-Q plots, drawing samples
• transformations: affine, monotonic nonlinear
Probability: Definitions/notation

Let X, Y, Z be random variables.

They can take on values (like ‘heads’ or ‘tails’; or integers 1–6; or real-valued numbers).

Let x, y, z stand generically for values they can take, and denote events such as X = x.

Write the probability that X takes on value x as P(X = x), or P_X(x), or sometimes just P(x).

P(x) is a function over values x, which we call the probability “distribution” function (pdf) (for continuous variables, “density”).

[Useful to have this notation up on the slide while introducing concepts on the board.]
Probability distributions

Discrete random variable, P(x):

  $0 \le P(x_i) \le 1, \;\; \forall i$

  $\sum_i P(x_i) = 1$

Continuous random variable, p(x):

  $p(x) \ge 0$

  $\int_{-\infty}^{\infty} p(x)\,dx = 1$

  $P(a < x < b) = \int_a^b p(x)\,dx$
Example distributions

[Figures: roll of a fair die; a not-quite-fair coin; sum of two rolled fair dice; clicks of a Geiger counter in a fixed time interval (and time between clicks); horizontal velocity of gas molecules exiting a fan]
Expected value - discrete

  $E(X) = \sum_{i=1}^{N} x_i\, p(x_i)$   [the mean, $\mu$]

[Figure: histogram of # of credit cards per student, and the corresponding probability distribution P(x), with the mean $\mu$ marked]

More generally:

  $E(f(X)) = \sum_{i=1}^{N} f(x_i)\, p(x_i)$
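A sketch of the discrete expectation on a fair die (my example, not the credit-card data from the figure):

```python
# E(X) and, more generally, E(f(X)) as probability-weighted sums.
xs = [1, 2, 3, 4, 5, 6]  # faces of a fair die
p = [1 / 6] * 6

mean_x = sum(x * px for x, px in zip(xs, p))        # E(X) = 3.5
mean_x2 = sum(x ** 2 * px for x, px in zip(xs, p))  # E(X^2) = 91/6

print(mean_x, mean_x2)
```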
Expected value - continuous

  $E(x) = \int x\, p(x)\, dx$   [mean, $\mu$]

  $E(x^2) = \int x^2\, p(x)\, dx$   [“second moment”, $m_2$]

  $E\left((x-\mu)^2\right) = \int (x-\mu)^2\, p(x)\, dx = \int x^2\, p(x)\, dx - \mu^2$   [variance, $\sigma^2$; equal to $m_2$ minus $\mu^2$]

  $E(f(x)) = \int f(x)\, p(x)\, dx$   [“expected value of f”]

Note: this is an inner product, and thus linear:

  $E(af(x) + bg(x)) = aE(f(x)) + bE(g(x))$
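The variance identity $\sigma^2 = m_2 - \mu^2$ can be checked on any distribution; a sketch using a small discrete one (the pmf is invented, and the integrals become sums):

```python
# Check var = m2 - mu^2 by computing both sides from the definitions.
xs = [0, 1, 2, 3, 4]
p = [0.1, 0.2, 0.4, 0.2, 0.1]  # an arbitrary symmetric pmf

mu = sum(x * px for x, px in zip(xs, p))                      # 2.0
m2 = sum(x * x * px for x, px in zip(xs, p))                  # 5.2
var_direct = sum((x - mu) ** 2 * px for x, px in zip(xs, p))

print(var_direct, m2 - mu ** 2)  # both 1.2
```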
Cumulatives

  $c(y) = \int_{-\infty}^{y} p(x)\, dx$

[Figures: pdf p(x) and cumulative c(x), for the sum of two rolled dice (discrete) and for a continuous density]
Drawing samples - discrete

[Figure: a discrete pmf and its cumulative; a uniform draw on [0, 1] is mapped through the inverse cumulative to select a sample]
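The figure's construction, inverting the cumulative at a uniform draw, is a standard way to sample a discrete distribution; a sketch (the values and probabilities are made up):

```python
# Inverse-CDF sampling: a uniform draw in [0, 1) selects the value whose
# cumulative interval contains it.
import random

values = ["a", "b", "c", "d"]
probs = [0.125, 0.375, 0.375, 0.125]

def sample(u):
    c = 0.0
    for v, p in zip(values, probs):
        c += p
        if u < c:
            return v
    return values[-1]  # guard against round-off at u near 1

random.seed(0)
draws = [sample(random.random()) for _ in range(10000)]
print(draws.count("b") / len(draws))  # close to 0.375
```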
Multi-variate probability [on board]

• joint distributions
• marginals (integrating)
• conditionals (slicing)
• Bayes’ rule (inverse probability)
• statistical independence (separability)
• linear transformations
Joint and conditional probability - discrete

P(Ace)   P(Heart)   P(Ace & Heart)   P(Ace | Heart)   P(not Jack of Diamonds)   P(Ace | not Jack of Diamonds)

“Independence”
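These card probabilities can be computed by counting; a sketch with exact fractions (the deck encoding is my own):

```python
# Count events over a 52-card deck; rank and suit are independent,
# but "not Jack of Diamonds" is informative about rank.
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))

def prob(event):
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

p_ace = prob(lambda c: c[0] == "A")                    # 1/13
p_heart = prob(lambda c: c[1] == "hearts")             # 1/4
p_ace_and_heart = prob(lambda c: c == ("A", "hearts")) # 1/52
p_ace_given_heart = p_ace_and_heart / p_heart          # 1/13

# "Independence": knowing the suit tells you nothing about the rank.
print(p_ace_given_heart == p_ace)          # True
print(p_ace_and_heart == p_ace * p_heart)  # True

# But excluding one specific card shifts the rank probabilities slightly:
p_not_jd = prob(lambda c: c != ("J", "diamonds"))                          # 51/52
p_ace_given_not_jd = prob(lambda c: c[0] == "A" and c != ("J", "diamonds")) / p_not_jd
print(p_ace_given_not_jd)  # 4/51, not 1/13
```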
Joint distribution (continuous): $p(x, y)$

Marginal distribution: $p(x) = \int p(x, y)\, dy$
Conditional probability

[Venn diagram: regions A, B, A & B, and neither A nor B]

p(A | B) = probability of A given that B is asserted to be true = $\frac{p(A \,\&\, B)}{p(B)}$
Conditional distribution

  $p(x \mid y = 68) = p(x, y = 68) \Big/ \int p(x, y = 68)\, dx = p(x, y = 68) \big/ p(y = 68)$

More generally:

  $p(x \mid y) = p(x, y) / p(y)$

[slice the joint distribution, then normalize by the marginal]
Bayes’ Rule

  $p(A \,\&\, B) = p(B)\, p(A \mid B) = p(A)\, p(B \mid A)$

  $\Rightarrow \; p(A \mid B) = \frac{p(B \mid A)\, p(A)}{p(B)}$
Bayes’ Rule
p(x|y) = p(y|x) p(x)/p(y)
(a direct consequence of the definition of conditional probability)
Conditional vs. marginal

[Figure: the conditional P(x | Y = 120) overlaid on the marginal P(x)]

In general, the conditionals for different Y values differ. When are they the same? In particular, when are all conditionals equal to the marginal?
Statistical independence

Random variables X and Y are statistically independent if (and only if):

  $p(x, y) = p(x)\, p(y) \;\; \forall\, x, y$

Independence implies that all conditionals are equal to the corresponding marginal:

  $p(x \mid y) = p(x, y) / p(y) = p(x) \;\; \forall\, x, y$

[note: for discrete distributions, this is an outer product!]
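The "outer product" note can be made concrete; a sketch with made-up marginals:

```python
# An independent discrete joint is the outer product of its marginals.
px = [0.2, 0.8]
py = [0.5, 0.3, 0.2]

joint = [[a * b for b in py] for a in px]  # independent by construction

def is_independent(j):
    # Recompute the marginals, then compare the joint to their outer product.
    mx = [sum(row) for row in j]
    my = [sum(row[k] for row in j) for k in range(len(j[0]))]
    return all(abs(j[i][k] - mx[i] * my[k]) < 1e-12
               for i in range(len(mx)) for k in range(len(my)))

print(is_independent(joint))                     # True
print(is_independent([[0.5, 0.0], [0.0, 0.5]]))  # False: perfectly dependent
```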
Sums of RVs

Let Z = X + Y. Since expectation is linear:

  $E(X + Y) = E(X) + E(Y)$

In addition, if X and Y are independent, then

  $E(XY) = E(X)\, E(Y)$

  $\sigma_Z^2 = E\left(\left((X + Y) - (\mu_X + \mu_Y)\right)^2\right) = \sigma_X^2 + \sigma_Y^2$

and $p_Z(z)$ is a convolution of $p_X(x)$ and $p_Y(y)$. [on board]
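Both facts (the convolution and the additivity of variances) can be checked on two independent dice; a sketch:

```python
# pmf of Z = X + Y by convolving the two pmfs; variances add under independence.
die = {k: 1 / 6 for k in range(1, 7)}

def convolve(pa, pb):
    pz = {}
    for a, qa in pa.items():
        for b, qb in pb.items():
            pz[a + b] = pz.get(a + b, 0.0) + qa * qb
    return pz

def mean_var(p):
    mu = sum(k * q for k, q in p.items())
    return mu, sum((k - mu) ** 2 * q for k, q in p.items())

two_dice = convolve(die, die)
mu_z, var_z = mean_var(two_dice)
mu_x, var_x = mean_var(die)

print(max(two_dice, key=two_dice.get))  # 7, the most likely sum
print(abs(var_z - 2 * var_x) < 1e-9)    # variances add: True
```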
Mean and variance

• Mean and variance summarize the centroid/width
• Translation and rescaling of random variables
• Mean/variance of weighted sum of random variables
• The sample average
  – ... converges to the true mean (except for bizarre distributions)
  – ... with variance $\sigma^2/N$
  – ... most common choice for an estimate ...
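The $\sigma^2/N$ behavior of the sample average can be seen by simulation; a sketch with uniform draws (the sample size and trial count are arbitrary choices):

```python
# Variance of the average of N iid draws shrinks as sigma^2 / N.
import random

random.seed(1)
sigma2 = 1 / 12  # variance of Uniform(0, 1)
N = 16
trials = 20000

avgs = [sum(random.random() for _ in range(N)) / N for _ in range(trials)]
mu = sum(avgs) / trials
var_avg = sum((a - mu) ** 2 for a in avgs) / trials

print(var_avg, sigma2 / N)  # both near 1/192
```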
Central limit for a uniform distribution...

[Figures: histograms of 10^4 samples from a uniform density (σ = 1); of (u + u)/√2; of (u + u + u + u)/√4; and of 10 u’s divided by √10, approaching a Gaussian]
Central limit for a binary distribution...

[Figures: histograms of one coin, and of the average of 4, 16, 64, and 256 coins, approaching a Gaussian]
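The coin-averaging demo can be reproduced in a few lines; a sketch (the trial counts are arbitrary):

```python
# Averages of n fair coins: the spread shrinks like 0.5 / sqrt(n) while the
# histogram becomes increasingly bell-shaped.
import random

random.seed(0)

def avg_of_coins(n):
    return sum(random.random() < 0.5 for _ in range(n)) / n

def spread(n, trials=5000):
    s = [avg_of_coins(n) for _ in range(trials)]
    m = sum(s) / trials
    return (sum((x - m) ** 2 for x in s) / trials) ** 0.5

s4, s64 = spread(4), spread(64)
print(s4, s64)  # roughly 0.25 and 0.0625: a 4x shrink for 16x more coins
```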