
Posted on 21-Dec-2015


Page 1

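>> x=rand(2,10000);            % uniform in square
>> ix=find(x(1,:)<x(2,:));     % keep points above the diagonal: x(2,:) gets a linear density
>> x=x(:,ix);
>> plot(x(1,:),x(2,:),'*');    % scatter plot
>> d=x(2,:)*2;                 % random point distances: distance from the centre of a uniform point in a disk (2-D sphere) of radius 2
>> d=sort(d);
>> plot(d);                    % sorted d: empirical cdf with the axes swapped
>> k=d.^2;
>> plot(k);

HW2: linear density and squares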
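The same experiment can be reproduced outside MATLAB. The following is a minimal Python sketch using only the standard library. The function name `hw2_sample` and the seed are our own choices. It mirrors the steps of the session: uniform points, the rejection step x1 < x2, scaling by 2, sorting, and squaring.

```python
import random
import statistics

def hw2_sample(n_points=10000, seed=0):
    """Python mirror of the HW2 session (rand, find, scale, sort, square)."""
    rng = random.Random(seed)
    # x=rand(2,N): N points uniform in the unit square
    pts = [(rng.random(), rng.random()) for _ in range(n_points)]
    # rejection step: keep points with x1 < x2, so x2 has density 2*y on [0, 1]
    kept_y = [b for a, b in pts if a < b]
    # d = 2*x2 has density r/2 on [0, 2]; sorted as in the session
    d = sorted(2 * y for y in kept_y)
    # k = d.^2 is uniform on [0, 4] (and still sorted, since d >= 0)
    k = [r * r for r in d]
    return d, k

d, k = hw2_sample(200000, seed=1)
print(statistics.mean(d), statistics.median(d))  # close to 4/3 and sqrt(2)
print(statistics.mean(k), statistics.median(k))  # both close to 2
```

A larger sample than the original 10000 points is used here only to make the agreement with theory easier to see.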

Page 2

>> mean(d)
ans = 1.3384
>> median(d)
ans = 1.4239
>> mean(k)
ans = 2.0085
>> median(k)
ans = 2.0275
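The printed values can be checked against a short derivation. It assumes, as the commands do, that x_1 and x_2 are independent and uniform on [0, 1], and that only points with x_1 < x_2 are kept:

```latex
f_{x_2}(y \mid x_1 < x_2) = 2y, \quad 0 \le y \le 1
\;\Rightarrow\;
f_d(r) = \tfrac{r}{2}, \quad F_d(r) = \tfrac{r^2}{4}, \quad 0 \le r \le 2 \quad (d = 2x_2)

E[d] = \int_0^2 r \cdot \tfrac{r}{2}\, dr = \tfrac{4}{3} \approx 1.333,
\qquad \operatorname{median}(d) = \sqrt{2} \approx 1.414

P(k \le s) = P(d \le \sqrt{s}) = \tfrac{s}{4}, \quad 0 \le s \le 4
\;\Rightarrow\; k = d^2 \sim U[0, 4], \quad E[k] = \operatorname{median}(k) = 2
```

The session's values are 1.3384, 1.4239, 2.0085 and 2.0275. With roughly 5000 kept points, these agree with 4/3, √2, 2 and 2 to within sampling error. The uniform distribution of k is also why the plot of sorted k is close to a straight line.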

Page 3

Rejection sampling: the y-coordinates have a linear density function

Page 4

Plot of cdf of d

Plot of cdf of d^2

Page 5

Statistical Data Models, Non-parametrics, Dynamics

Page 6

Non-informative, proper and improper priors

• For a real quantity bounded to an interval, the standard prior is the uniform distribution

• For an unbounded real quantity, the standard prior is also uniform, but with what density? No constant density integrates to 1 over the whole real line.

• For a real quantity on a half-open interval such as (0, ∞), the standard prior is f(s) = 1/s, but its integral diverges!

• Divergent priors are called improper; they can only be used with likelihoods that make the posterior integral converge
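Here is a small worked case. The Poisson-process likelihood is our choice of example and does not come from the slides. The 1/λ prior on a positive rate λ is improper, yet observing n ≥ 1 events in time T makes the posterior a proper Gamma distribution:

```latex
% Improper prior on a positive rate \lambda
f(\lambda) \propto \frac{1}{\lambda}, \qquad \int_0^\infty \frac{d\lambda}{\lambda} = \infty

% Poisson-process likelihood: n events observed during time T
f(D \mid \lambda) \propto \lambda^{n} e^{-\lambda T}

% The posterior is proper as soon as n \ge 1
f(\lambda \mid D) \propto \lambda^{n-1} e^{-\lambda T},
\qquad \int_0^\infty \lambda^{n-1} e^{-\lambda T}\, d\lambda = \frac{\Gamma(n)}{T^{n}} < \infty
\;\Rightarrow\; \lambda \mid D \sim \mathrm{Gamma}(n, \text{rate } T)
```

With n = 0 the posterior integral still diverges as λ → 0. This is the sense in which an improper prior needs a likelihood that makes the integral converge.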

Page 7

Dirichlet Distribution: prior for a discrete distribution
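The Dirichlet density on this slide is in a figure that did not survive extraction. One standard way to draw from Dirichlet(α_1, …, α_K) is to normalise independent Gamma(α_i, 1) draws. Below is a Python sketch using the standard library's `random.gammavariate`; the function name `sample_dirichlet` is ours.

```python
import random

def sample_dirichlet(alpha, rng=random):
    """Draw one probability vector from Dirichlet(alpha).

    Uses the Gamma construction: if g_i ~ Gamma(alpha_i, 1) independently,
    then g / sum(g) ~ Dirichlet(alpha).
    """
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [v / total for v in g]

rng = random.Random(0)
draws = [sample_dirichlet([1, 2, 3], rng) for _ in range(20000)]
means = [sum(p[i] for p in draws) / len(draws) for i in range(3)]
print(means)  # approaches alpha_i / sum(alpha) = 1/6, 2/6, 3/6
```

The sample means approach α_i / Σα, which is the Dirichlet mean.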

Page 8

Mean of Dirichlet: Laplace's estimator
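Start from the uniform prior Dirichlet(1, …, 1) and observe counts n_1, …, n_K with n = Σn_i. The posterior is then Dirichlet(n_1+1, …, n_K+1), and its mean is Laplace's estimator (n_i+1)/(n+K). A minimal Python sketch follows; the function name and the optional general-α argument are our additions.

```python
def laplace_estimate(counts, alpha=None):
    """Posterior mean of a Dirichlet prior after observing counts.

    With the default uniform prior (all alpha_i = 1) this is Laplace's
    estimator (n_i + 1) / (n + K).
    """
    if alpha is None:
        alpha = [1] * len(counts)
    total = sum(counts) + sum(alpha)
    return [(c + a) / total for c, a in zip(counts, alpha)]

print(laplace_estimate([3, 0, 1]))  # i.e. [4/7, 1/7, 2/7]
```

Note that a category with zero observations still gets positive probability, 1/7 here.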

Page 9

Occurrence table probability

Page 10

Occurrence table probability

Uniform prior:
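The formula on this slide did not survive extraction. Under a uniform Dirichlet(1, …, 1) prior on a K-valued distribution, the probability of one particular sequence of n observations with counts n_1, …, n_K is (K−1)! ∏ n_i! / (n+K−1)!. Ignoring the order of the observations multiplies this by n!/∏ n_i!, which gives 1/C(n+K−1, K−1). Below is a Python sketch in log space using `math.lgamma`; the function names are ours, and which of the two quantities the slide intends is not shown.

```python
import math

def log_sequence_prob(counts):
    """log P(one particular sequence with these counts) under a uniform
    Dirichlet(1, ..., 1) prior: (K-1)! * prod(n_i!) / (n+K-1)!."""
    K, n = len(counts), sum(counts)
    return (math.lgamma(K) + sum(math.lgamma(c + 1) for c in counts)
            - math.lgamma(n + K))

def log_table_prob(counts):
    """Same, with the order of observations ignored (multiply by n!/prod(n_i!))."""
    n = sum(counts)
    log_multinom = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_sequence_prob(counts) + log_multinom

print(math.exp(log_sequence_prob([1, 1])))  # about 1/6: integral of p(1-p) over [0, 1]
```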

Page 11

Non-parametric inference

• How to perform inference about a distribution without assuming a distribution family?

• A distribution over the reals can be approximated by a piecewise uniform distribution or by a mixture of real distributions

• But how many parts? This is non-parametric inference
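For a fixed set of parts, the piecewise uniform approximation is just a normalised histogram. Choosing the number and placement of the parts is the non-parametric question. Below is a Python sketch with the bin edges given; the function name is ours.

```python
import bisect

def piecewise_uniform_density(data, edges):
    """Heights of the piecewise uniform density with the given (sorted) bin edges.

    Each bin gets probability count/n, spread uniformly over its width.
    Points outside [edges[0], edges[-1]] are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for x in data:
        if edges[0] <= x <= edges[-1]:
            # bins are [e_b, e_{b+1}); the right end point goes in the last bin
            b = min(bisect.bisect_right(edges, x) - 1, len(counts) - 1)
            counts[b] += 1
    n = sum(counts)
    return [c / (n * (edges[b + 1] - edges[b])) for b, c in enumerate(counts)]

heights = piecewise_uniform_density([0.1, 0.2, 0.7], [0.0, 0.5, 1.0])
print(heights)  # 4/3 and 2/3, up to rounding
```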

Page 12

Non-parametric inference: Change-points, Rao-Blackwell

• Given times of events (e.g. coal-mining disasters), infer a piecewise constant intensity function (change-point problem)

• The state is the set of change-points, with intensities in between

• But how many pieces? This is non-parametric inference

• MCMC: given the current state, propose a change in a segment boundary or intensity

• But it is possible to integrate out the proposed intensities

Page 13

Probability ratio in MCMC

For a proposed merge of intervals j and j+1, with sizes proportional to (p, 1-p): were the counts n_j and n_{j+1} obtained by tossing a ‘coin’ with success probability p, or not? Compute the model probability ratio as in HW1.

Also, the total number of breakpoints has a Poisson prior distribution with parameter (average) λ. Probability ratio in favor of split:
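HW1 and the exact formula are not part of this transcript. The following is therefore a hypothetical reading, written as a Python sketch:

- **Merge:** the n_j + n_{j+1} events fall left or right like tosses of a coin with known bias p.
- **Split:** the coin's bias is unknown, has a uniform prior, and is integrated out, giving n_j! n_{j+1}! / (n_j + n_{j+1} + 1)!.
- **Prior:** the Poisson prior on the number of breakpoints contributes λ/(m+1) when going from m to m+1 breakpoints.

A full reversible-jump acceptance ratio would also need the prior on breakpoint positions and the proposal probabilities, which are left out here. The function name is ours.

```python
import math

def log_split_merge_ratio(n_left, n_right, p, lam, m):
    """Hypothetical log of P(split) / P(merge) for two adjacent intervals.

    n_left, n_right: event counts n_j and n_{j+1}
    p: fraction of the merged interval's length in the left part (0 < p < 1)
    lam: Poisson prior mean for the number of breakpoints
    m: number of breakpoints in the merged state
    """
    # merge: each event falls left with known probability p ("coin" with bias p)
    log_merge = n_left * math.log(p) + n_right * math.log(1 - p)
    # split: unknown bias with uniform prior, integrated out:
    #   integral of q^a (1-q)^b dq = a! b! / (a+b+1)!
    log_split = (math.lgamma(n_left + 1) + math.lgamma(n_right + 1)
                 - math.lgamma(n_left + n_right + 2))
    # Poisson prior on the number of breakpoints: P(m+1) / P(m) = lam / (m+1)
    log_prior = math.log(lam) - math.log(m + 1)
    return log_split - log_merge + log_prior

# 9 of 10 events in the left half of the interval: the split is favoured
print(math.exp(log_split_merge_ratio(9, 1, 0.5, 1.0, 0)))
```

Balanced counts, such as 50 and 50 with p = 0.5, give a ratio below 1, so the merge is favoured.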

Page 14

Averaging MCMC run: positions and number of breakpoints

Page 15

Averaging MCMC run: positions with uniform test data

Page 16

Mixture of Normals

Page 17

Mixture of Normals: elimination of nuisance parameters

Page 18

Mixture of Normals: elimination of nuisance parameters

(integrate using normalization constant of Gaussian and Gamma distributions)
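The slides do not show which priors are used. For illustration, assume a flat prior on a component mean μ and the improper 1/τ prior on its precision τ = 1/σ². For the n points currently assigned to one component, the Gaussian normalisation constant removes μ, and the Gamma normalisation constant then removes τ:

```latex
% Assumed priors (not stated on the slides): flat on \mu, f(\tau) \propto 1/\tau
f(x_{1:n} \mid \mu, \tau) = \left(\tfrac{\tau}{2\pi}\right)^{n/2}
  \exp\!\Big(-\tfrac{\tau}{2}\big[S + n(\mu - \bar{x})^2\big]\Big),
  \qquad S = \sum_{i=1}^{n} (x_i - \bar{x})^2

% Gaussian normalisation constant removes \mu
\int_{-\infty}^{\infty} e^{-\frac{n\tau}{2}(\mu - \bar{x})^2}\, d\mu = \sqrt{\tfrac{2\pi}{n\tau}}

% Gamma normalisation constant removes \tau
\int_0^\infty \tau^{a-1} e^{-b\tau}\, d\tau = \frac{\Gamma(a)}{b^{a}},
  \qquad a = \tfrac{n-1}{2}, \; b = \tfrac{S}{2}

% Result, valid for n \ge 2 and S > 0
\int_0^\infty \!\! \int_{-\infty}^{\infty} f(x_{1:n} \mid \mu, \tau)\, \frac{1}{\tau}\, d\mu\, d\tau
  = (2\pi)^{-\frac{n-1}{2}}\, n^{-1/2}\, \Gamma\!\left(\tfrac{n-1}{2}\right) \left(\tfrac{S}{2}\right)^{-\frac{n-1}{2}}
```

The result depends on the data only through n and S, so a proposed change of labels can be scored from per-component counts and sums of squares. Because the priors are improper, the result is defined only up to a constant. Comparing models with different numbers of components would need proper priors.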

Page 19

Matlab Mixture of Normals, MCMC (AutoClass method)

function [lh,lab,trlpost,trm,trstd,trlab,trct,nbounc]= mmnonu1(x,N,k,labi,NN);
%[lh,lab,trlpost,trm,trstd,trlab,trct,nbounc]=
% MMNONU1(x,N,k,labi,NN);
% 1D MCMC mixture modelling
% inputs
% x - 1D data column vector
% N - MCMC iterations
% k - number of components
% lab,labi - component labelling of data vector
% NN - thinning (optional)

Page 20

Matlab Mixture of Normals, MCMC

function [lh,lab,trlpost,trm,trstd,trlab,trct,nbounc]= mmnonu1(x,N,k,labi,NN);
%[lh,lab,trlpost,trm,trstd,trlab,trct,nbounc]=
% MMNONU1(x,N,k,labi,NN);
% outputs
% trlpost - thinned trace of log probability (optional)
% trm - thinned trace of means vector (optional)
% trstd - thinned vector of standard deviations (optional)
% trlab - thinned trace of labels vector, size(x,1) by N/NN (optional)
% trct - thinned trace of mixing proportions

Page 21

Matlab Mixture of Normals, MCMC

N=10000; NN=100;
x=[randn(100,1)-1; randn(100,1)*3; randn(100,1)+1]; % 3 components synthetic data
k=2; labi=ceil(rand(size(x))*k);                    % random initial labels in 1..k
[llhc,lab2,trl,trm,trstd,trlab,trct,nbounc]= ...
    mmnonu1(x,N,k,labi,NN);
[llhc2,lab2,trl2,trm2,trstd2,trlab2,trct2,nbounc]= ...
    mmnonu1(x,N,k,lab2,NN);                         % restart from the last labelling
… (k=3, 4, 5)
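The internals of mmnonu1 are not shown on the slides. As a hedged illustration of one kind of move such samplers make, here is a Python sketch of a single Gibbs sweep over the component labels. The component means, standard deviations and mixing proportions are held fixed, and each label is drawn from p(c | x_i) ∝ w_c N(x_i; m_c, s_c). A complete sampler would alternate this with parameter updates, or integrate the parameters out as on the nuisance-parameter slides. The function names are ours, and labels here are 0-based rather than MATLAB's 1-based.

```python
import math
import random

def log_normal_pdf(x, mean, std):
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)

def gibbs_label_sweep(x, means, stds, weights, rng):
    """Draw every label from p(c | x_i) ∝ w_c N(x_i; m_c, s_c), parameters fixed.

    Given the parameters, labels are conditionally independent, so one pass
    over the data is an exact Gibbs update of the whole label vector.
    """
    labels = []
    for xi in x:
        logp = [math.log(w) + log_normal_pdf(xi, m, s)
                for w, m, s in zip(weights, means, stds)]
        top = max(logp)  # shift by the maximum to avoid underflow
        probs = [math.exp(v - top) for v in logp]
        labels.append(rng.choices(range(len(probs)), weights=probs)[0])
    return labels

rng = random.Random(0)
x = [rng.gauss(-1, 1) for _ in range(100)] + [rng.gauss(1, 1) for _ in range(100)]
labels = gibbs_label_sweep(x, means=[-1, 1], stds=[1, 1], weights=[0.5, 0.5], rng=rng)
```

With overlapping components, as in the synthetic data above, many labels stay uncertain from sweep to sweep. This is consistent with the label traces shown later.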

Page 22

Matlab Mixture of Normals, MCMC

The three components and the joint empirical distribution

Page 23

Matlab Mixture of Normals, MCMC

Putting them together makes the identification seem harder.

Page 24

Matlab Mixture of Normals, MCMC

K=2: traces of std and mean (figure)

Page 25

Matlab Mixture of Normals, MCMC

K=3: traces of std and mean (figure)

Burn-in progressing

Page 26

Matlab Mixture of Normals, MCMC

K=3: traces of std and mean (figure)

Burnt in

Page 27

Matlab Mixture of Normals, MCMC

K=4: low probability. Traces of std and mean (figure)

No focus: no interpretation as 4 clusters

Page 28

Matlab Mixture of Normals, MCMC

K=5: low probability. Traces of std and mean (figure)

Page 29

Matlab Mixture of Normals, MCMC

Sample x: elements 1-100 have (mean -1, std 1); 101-200 have (mean 0, std 3); 201-300 have (mean 1, std 1)

Trace of state labels (figure panels: unsorted sample; label trace sorted)

Page 30

Dynamic Systems, Time Series

• An abundance of linear prediction models exists

• For non-linear and chaotic systems, methods were developed in the 1990s (Santa Fe)

• Gershenfeld, Weigend: The Future of Time Series

• Online/offline: prediction/retrodiction

Page 31


Page 32

Berry and Linoff have eloquently stated their preferences with the often-quoted sentence:

"Neural networks are a good choice for most classification problems when the results of the model are more important than understanding how the model works".

“Neural networks typically give the right answer”

Page 33

Dynamic Systems and Takens’ Theorem

• Lag vectors (x_i, x_{i-1}, …, x_{i-T+1}), for all i, occupy a submanifold of E^T if T is large enough

• This manifold is ‘diffeomorphic’ to the original state space and can be used to create a good dynamic model

• Takens’ theorem assumes noise-free data, so its applicability must be verified empirically.
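Below is a Python sketch of the lag-vector construction, with T components per vector, so the vectors live in E^T. It also includes a simple nearest-neighbour ("method of analogues") predictor that works in the reconstructed state space. The predictor is our choice for illustration and not necessarily the model used on these slides; the function names are ours.

```python
def lag_vectors(series, T):
    """Lag vectors (x_i, x_{i-1}, ..., x_{i-T+1}) for every i with T values of history."""
    return [tuple(series[i - j] for j in range(T)) for i in range(T - 1, len(series))]

def predict_next(series, T):
    """Nearest-neighbour ('method of analogues') one-step prediction in lag space.

    Finds the past lag vector closest to the most recent one and returns the
    value that followed it. Needs at least T + 1 observations.
    """
    vecs = lag_vectors(series, T)
    query = vecs[-1]
    best_idx, best_dist = None, float("inf")
    # the last vector has no known successor, so it is excluded
    for idx, v in enumerate(vecs[:-1]):
        dist = sum((a - b) ** 2 for a, b in zip(v, query))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    # vecs[idx] ends at time idx + T - 1, so its successor is series[idx + T]
    return series[best_idx + T]

print(lag_vectors([1, 2, 3, 4, 5], 3))            # [(3, 2, 1), (4, 3, 2), (5, 4, 3)]
print(predict_next([0, 1, 2, 0, 1, 2, 0, 1], 2))  # 2
```

If T is too small, distinct states share nearby lag vectors and the nearest neighbour can be misleading. This is the practical reason T must be "large enough".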

Page 34

Dynamic Systems and Takens’ Theorem

Page 35

Santa Fe 1992 Competition

Unstable Laser

Intensive Care Unit Data, Apnea

Exchange rate Data

Synthetic series with drift

White Dwarf Star Data

Bach’s unfinished Fugue

Page 36

Stereoscopic 3D view of state-space manifold, series A (Laser). The points seem to lie on a surface, which suggests that a lag vector of length 3 gives good prediction of the time series.

Page 37


Page 38

Variational Bayes

Page 39


True trajectory in state space

Page 40


Reconstructed trajectory in inferred state space

Page 41

Hidden Markov Models

• Given a sequence of discrete signals x_i

• Is there a model likely to have produced x_i from a sequence of states s_i of a finite Markov chain?

• P(.|s) - transition probabilities from state s

• S(.|s) - signal probabilities in state s

• Speech recognition, bioinformatics, …
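Scoring a candidate model means computing P(x_1, …, x_N | P, S). The forward recursion does this in O(N n²) time for n states. Below is a Python sketch. It assumes the column convention from the hmmsim help text, in which each column of P and S is a discrete pdf, and an initial state distribution `init`, which the slides do not specify. The function name is ours.

```python
import math

def hmm_log_likelihood(signals, P, S, init):
    """Forward algorithm with per-step scaling; returns log P(signals | P, S, init).

    Column convention:
      P[i][j] = Pr(next state i | current state j)
      S[k][j] = Pr(signal k | state j)
    init[j] = Pr(first state j)  (an assumption; not on the slides)
    """
    n = len(init)
    alpha = [init[j] * S[signals[0]][j] for j in range(n)]
    loglik = 0.0
    for t in range(len(signals)):
        if t > 0:
            x = signals[t]
            # predict with the transition matrix, then weight by the signal probability
            alpha = [S[x][i] * sum(P[i][j] * alpha[j] for j in range(n))
                     for i in range(n)]
        c = sum(alpha)  # c = P(x_t | x_1, ..., x_{t-1})
        loglik += math.log(c)
        alpha = [a / c for a in alpha]  # rescale to avoid underflow
    return loglik

P = [[0.9, 0.2], [0.1, 0.8]]  # columns sum to 1
S = [[0.8, 0.3], [0.2, 0.7]]
print(hmm_log_likelihood([0, 0, 1, 1, 0], P, S, [0.5, 0.5]))
```

An MCMC sampler over P and S, such as the one traced on the following slides, needs a likelihood of this kind, or the equivalent joint probability with the hidden states, for its acceptance ratio.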

Page 42

Hidden Markov Models

function [Pn,Sn,stn,trP,trS,trst,tll]= ...
    hmmsim(A,N,n,s,prop,Po,So,sto,NN);
%[Pn,Sn,stn,trP,trS,trst]=HMMSIM(A,N,n,s,prop,Po,So,sto,NN);
% Compute trace of posterior for hmm parameters
% A - the sequence of signals
% N - the length of trace
% n - number of states in Markov chain
% s - number of signal values
% prop - proposal stepsize
% optional inputs:
% Po - starting transition matrix (each of n columns a discrete pdf
%      in n-vector)
% So - starting signal matrix (each of n columns a discrete pdf

Page 43

Hidden Markov Models

function [Pn,Sn,stn,trP,trS,trst,tll]= ...
    hmmsim(A,N,n,s,prop,Po,So,sto,NN);
%      in s-vector)
% sto - starting state sequence (congruent to vector A)
% NN - thinning of trace, default 10
% outputs
% Pn - last transition matrix in trace
% Sn - last signal emission matrix
% stn - last hidden state vector (congruent to A)
% trP - trace of transition matrices
% trS - trace of signal matrices
% trst - trace of hidden state vectors

Page 44

Hidden Markov Models

Page 45

Hidden Markov Models

Page 46

Hidden Markov Models

Page 47

Hidden Markov Models

Over 100000 iterations, burn-in is visible. 2 states, 2 signals; P: transition matrix, S: signal matrix

Page 48

Chapman Kolmogorov version of Bayes’ rule

$$f(\lambda_t \mid D_t) \propto f(d_t \mid \lambda_t) \int f(\lambda_t \mid \lambda_{t-1})\, f(\lambda_{t-1} \mid D_{t-1})\, d\lambda_{t-1}$$

Page 49

Chapman Kolmogorov version of Bayes’ rule

$$f(\lambda_t \mid D_t) \propto f(d_t \mid \lambda_t) \int f(\lambda_t \mid \lambda_{t-1})\, f(\lambda_{t-1} \mid D_{t-1})\, d\lambda_{t-1}$$

Page 50

Observation- and video-based particle filter tracking

Defence: tracking with heterogeneous observations

Crowd analysis: tracking from video

Page 51

Cycle in Particle filter

Time step cycle (figure panels): importance (weighted) sample; resampled ordinary sample; diffused sample; weighted by likelihood. X: state, Z: observation.
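The cycle of weighting by likelihood, resampling and diffusing has the structure of the bootstrap particle filter. That filter is a sample-based approximation of the Chapman-Kolmogorov recursion on the previous slides. Below is a Python sketch for a hypothetical 1-D random-walk state observed with Gaussian noise. The model, noise levels, initial range and function name are all assumptions made for illustration.

```python
import math
import random

def bootstrap_particle_filter(observations, n_particles=1000, q=0.1, r=0.5,
                              init_low=0.0, init_high=10.0, seed=0):
    """Bootstrap particle filter for x_t = x_{t-1} + N(0, q^2), z_t = x_t + N(0, r^2).

    Returns the posterior-mean estimate of the state after each observation.
    """
    rng = random.Random(seed)
    particles = [rng.uniform(init_low, init_high) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # diffuse: propagate every particle through the random-walk dynamics
        particles = [x + rng.gauss(0.0, q) for x in particles]
        # weight by the likelihood f(z | x), normalised in a numerically safe way
        logw = [-0.5 * ((z - x) / r) ** 2 for x in particles]
        top = max(logw)
        w = [math.exp(v - top) for v in logw]
        total = sum(w)
        w = [v / total for v in w]
        # the importance (weighted) sample gives the state estimate
        estimates.append(sum(wi * x for wi, x in zip(w, particles)))
        # resample to an ordinary (unweighted) sample for the next step
        particles = rng.choices(particles, weights=w, k=n_particles)
    return estimates

est = bootstrap_particle_filter([8.0] * 20)
print(est[-1])  # close to 8
```

Resampling every step is the simplest choice. A common refinement is to resample only when the weights become too uneven.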

Page 52

Particle filter: general tracking