
Markov Chains: Transitional Modeling

Qi Liu

Contents

Terminology
Transitional Models without Explanatory Variables
Inference for Markov Chains
Data Analysis: Example 1 (ignoring explanatory variables)
Transitional Models with Explanatory Variables
Data Analysis: Example 2 (with explanatory variables)

Terminology

Transitional models
Markov chain
kth-order Markov chain
Transition probabilities and transition matrix

Transitional models

Let $\{y_0, y_1, \ldots, y_{t-1}\}$ denote the responses observed before time $t$. Our focus is on the dependence of $Y_t$ on $\{y_0, y_1, \ldots, y_{t-1}\}$ as well as on any explanatory variables. Models of this type are called transitional models.

Markov chain

A stochastic process $\{Y_t\}$ is a Markov chain if, for all $t$, the conditional distribution of $Y_{t+1}$ given $Y_0, Y_1, \ldots, Y_t$ is identical to the conditional distribution of $Y_{t+1}$ given $Y_t$ alone. That is, given $Y_t$, $Y_{t+1}$ is conditionally independent of $Y_0, Y_1, \ldots, Y_{t-1}$. Once the present state of a Markov chain is known, information about the past states does not help us predict the future:

$$P(Y_{t+1} \mid Y_0, Y_1, \ldots, Y_t) = P(Y_{t+1} \mid Y_t)$$

kth-order Markov chain

For all $t$, the conditional distribution of $Y_{t+1}$ given $Y_0, Y_1, \ldots, Y_t$ is identical to the conditional distribution of $Y_{t+1}$ given $(Y_{t-k+1}, \ldots, Y_t)$:

$$P(Y_{t+1} \mid Y_0, Y_1, \ldots, Y_t) = P(Y_{t+1} \mid Y_{t-k+1}, Y_{t-k+2}, \ldots, Y_t)$$

That is, given the states at the previous $k$ times, the future behavior of the chain is independent of its behavior before those $k$ times. Here we discuss the first-order Markov chain, with $k = 1$.

Transition probabilities

Denote the conditional probability $P(Y_t = j \mid Y_{t-1} = i)$ by $\pi_{j|i}(t)$. The $\{\pi_{j|i}(t)\}$ are called transition probabilities, and they satisfy

$$\sum_{j} \pi_{j|i}(t) = 1$$

The $I \times I$ matrix $\{\pi_{j|i}(t),\ i = 1, \ldots, I,\ j = 1, \ldots, I\}$ is a transition probability matrix. It is called one-step to distinguish it from the matrix of probabilities for $k$-step transitions from time $t-k$ to time $t$.
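As a minimal sketch of these definitions, the Python snippet below checks that every row of a one-step transition matrix sums to 1 and forms a two-step matrix as the product of two consecutive one-step matrices. The product rule is the standard Chapman-Kolmogorov relation, which the slides do not state, and the matrices `P1` and `P2` and the helper names are hypothetical values chosen only for illustration.

```python
# Hypothetical one-step transition matrices for I = 2 states.
# Row i holds P(Y_t = j | Y_{t-1} = i) for j = 1, 2.
P1 = [[0.8, 0.2], [0.3, 0.7]]  # transitions from time 0 to time 1
P2 = [[0.6, 0.4], [0.1, 0.9]]  # transitions from time 1 to time 2


def rows_sum_to_one(matrix, tol=1e-12):
    """Check the constraint sum_j pi_{j|i}(t) = 1 for every row i."""
    return all(abs(sum(row) - 1.0) < tol for row in matrix)


def matmul(a, b):
    """Plain matrix product of two square matrices given as nested lists."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


# Two-step transition matrix from time 0 to time 2.
two_step = matmul(P1, P2)
print(rows_sum_to_one(P1), rows_sum_to_one(P2), rows_sum_to_one(two_step))
print(two_step)
```

The two-step matrix is itself a transition matrix, so its rows also sum to 1.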

Transitional Models without Explanatory Variables

At first, we ignore explanatory variables. Let $f(y_0, \ldots, y_T)$ denote the joint probability mass function of $(Y_0, \ldots, Y_T)$. Transitional models use the factorization

$$f(y_0, \ldots, y_T) = f(y_0)\, f(y_1 \mid y_0)\, f(y_2 \mid y_0, y_1) \cdots f(y_T \mid y_0, y_1, \ldots, y_{T-1})$$

This model is conditional on the previous responses. For Markov chains,

$$f(y_0, \ldots, y_T) = f(y_0)\, f(y_1 \mid y_0)\, f(y_2 \mid y_1) \cdots f(y_T \mid y_{T-1}) \qquad (*)$$

From (*), a Markov chain depends only on the one-step transition probabilities and the marginal distribution of the initial state. It also follows that the joint distribution satisfies the loglinear model $(Y_0Y_1, Y_1Y_2, \ldots, Y_{T-1}Y_T)$.

For a sample of realizations of a stochastic process, a contingency table displays counts of the possible sequences. A test of fit of this loglinear model checks whether the process plausibly satisfies the Markov property.
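The Markov factorization (*) can be sketched in a few lines of Python. The initial distribution and the transition matrix below are hypothetical numbers, not estimates from the slides, and the chain is assumed to use the same transition matrix at every step purely to keep the example short.

```python
from itertools import product

# Hypothetical two-state chain (states coded 0 and 1).
initial = [0.4, 0.6]                   # f(y0)
transition = [[0.8, 0.2], [0.3, 0.7]]  # transition[i][j] = f(y_t = j | y_{t-1} = i)


def joint_probability(sequence, initial, transition):
    """f(y0, ..., yT) = f(y0) f(y1 | y0) ... f(yT | y_{T-1}) for a Markov chain."""
    prob = initial[sequence[0]]
    for prev, curr in zip(sequence, sequence[1:]):
        prob *= transition[prev][curr]
    return prob


p_001 = joint_probability((0, 0, 1), initial, transition)  # 0.4 * 0.8 * 0.2
total = sum(joint_probability(seq, initial, transition)
            for seq in product(range(2), repeat=3))
print(p_001, total)
```

Summing over all sequences of a given length gives 1, which is a quick sanity check that the factorization defines a proper joint distribution.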

Inference for Markov chains

Use standard methods of categorical data analysis, e.g., ML estimation of transition probabilities. Let $n_{ij}(t)$ denote the number of transitions from state $i$ at time $t-1$ to state $j$ at time $t$. For fixed $t$, $\{n_{ij}(t)\}$ form the two-way marginal table for dimensions $t-1$ and $t$ of an $I^{T+1}$ contingency table. For the $n_{i+}(t)$ subjects in category $i$ at time $t-1$, suppose that $\{n_{ij}(t),\ j = 1, \ldots, I\}$ have a multinomial distribution with parameters $\{\pi_{j|i}(t)\}$. Let $\{n_{i0}\}$ denote the initial counts, and suppose that they also have a multinomial distribution, with parameters $\{\pi_{i0}\}$.

Inference for Markov chains (continued)

If subjects behave independently, then from (*) the likelihood function is proportional to

$$\prod_{i=1}^{I} \pi_{i0}^{\,n_{i0}} \; \prod_{t=1}^{T} \prod_{i=1}^{I} \prod_{j=1}^{I} \pi_{j|i}(t)^{\,n_{ij}(t)}$$

The transition probabilities are parameters of $IT$ independent multinomial distributions. From Anderson and Goodman (1957), the ML estimates are

$$\hat{\pi}_{j|i}(t) = n_{ij}(t) / n_{i+}(t)$$
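To make the ML formula concrete, here is a small Python sketch that computes the estimated transition probabilities from age 9 to age 10, using the wheeze counts as they appear in the SAS data step of Example 1 later in these slides. The function name `transition_estimates` and the tuple layout are assumptions made for this illustration.

```python
# (y9, y10, y11, y12, count) with 1 = wheeze, 2 = no wheeze,
# copied from the SAS data step of Example 1.
data = [
    (1, 1, 1, 1, 94), (1, 1, 1, 2, 30), (1, 1, 2, 1, 15), (1, 1, 2, 2, 28),
    (1, 2, 1, 1, 14), (1, 2, 1, 2, 9), (1, 2, 2, 1, 12), (1, 2, 2, 2, 63),
    (2, 1, 1, 1, 19), (2, 1, 1, 2, 15), (2, 1, 2, 1, 10), (2, 1, 2, 2, 44),
    (2, 2, 1, 1, 17), (2, 2, 1, 2, 42), (2, 2, 2, 1, 35), (2, 2, 2, 2, 572),
]


def transition_estimates(data, prev_idx, curr_idx):
    """Return {(i, j): n_ij / n_i+} for the transition prev_idx -> curr_idx."""
    counts = {}
    for row in data:
        key = (row[prev_idx], row[curr_idx])
        counts[key] = counts.get(key, 0) + row[4]
    row_totals = {}
    for (i, _), n in counts.items():
        row_totals[i] = row_totals.get(i, 0) + n
    return {(i, j): n / row_totals[i] for (i, j), n in counts.items()}


est = transition_estimates(data, 0, 1)  # age 9 -> age 10
for (i, j), p in sorted(est.items()):
    print(f"P(Y10 = {j} | Y9 = {i}) = {p:.4f}")
```

Here the estimate for staying in the wheeze state is 167/265 and the estimate for moving from no wheeze to wheeze is 88/754, each a simple row proportion of the two-way marginal table.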

Example 1 (ignoring explanatory variables)

A study at Harvard examined the effects of air pollution on respiratory illness in children. The children were examined annually at ages 9 through 12 and classified according to the presence or absence of wheeze. Let $Y_t$ denote the binary response at age $t$, $t = 9, 10, 11, 12$, coded 1 = wheeze and 2 = no wheeze.

y9   y10   y11   y12   count
 1     1     1     1      94
 1     1     1     2      30
 1     1     2     1      15
 1     1     2     2      28
 1     2     1     1      14
 1     2     1     2       9
 1     2     2     1      12
 1     2     2     2      63
 2     1     1     1      19
 2     1     1     2      15
 2     1     2     1      10
 2     1     2     2      44
 2     2     1     1      17
 2     2     1     2      42
 2     2     2     1      35
 2     2     2     2     572

Code of Example 1

Code of 11.7

data breath;
  input y9 y10 y11 y12 count;
  datalines;
1 1 1 1 94
1 1 1 2 30
1 1 2 1 15
1 1 2 2 28
1 2 1 1 14
1 2 1 2 9
1 2 2 1 12
1 2 2 2 63
2 1 1 1 19
2 1 1 2 15
2 1 2 1 10
2 1 2 2 44
2 2 1 1 17
2 2 1 2 42
2 2 2 1 35
2 2 2 2 572
;

/* First-order Markov chain: (y9y10, y10y11, y11y12) */
proc genmod data=breath;
  class y9 y10 y11 y12;
  model count = y9 y10 y11 y12 y9*y10 y10*y11 y11*y12
        / dist=poi lrci type3 residuals obstats;
run;

/* Second-order Markov chain: (y9y10y11, y10y11y12) */
proc genmod data=breath;
  class y9 y10 y11 y12;
  model count = y9 y10 y11 y12 y9*y10 y9*y11 y10*y11 y10*y12 y11*y12
        y9*y10*y11 y10*y11*y12
        / dist=poi lrci type3 residuals obstats;
run;

/* All pairwise associations */
proc genmod data=breath;
  class y9 y10 y11 y12;
  model count = y9 y10 y11 y12 y9*y10 y9*y11 y9*y12 y10*y11 y10*y12 y11*y12
        / dist=poi lrci type3 residuals obstats;
run;

/* Simpler structure: a = ages 1 year apart, b = ages more than 1 year apart */
data breath_new;
  set breath;
  a = y9*y10 + y10*y11 + y11*y12;
  b = y9*y12 + y10*y12 + y9*y11;
run;

proc genmod data=breath_new;
  class y9 y10 y11 y12;
  model count = y9 y10 y11 y12 a b
        / dist=poi lrci type3 residuals obstats;
run;

Data analysis

The loglinear model (y9y10, y10y11, y11y12) represents a first-order Markov chain, for which

$$P(Y_{11} \mid Y_9, Y_{10}) = P(Y_{11} \mid Y_{10}), \qquad P(Y_{12} \mid Y_{10}, Y_{11}) = P(Y_{12} \mid Y_{11})$$

It fits poorly, with G²=122.9025, df=8, and p-value<0.0001. So, given the state at time t, the classification at time t+1 still depends on the states at times before t.
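The G² statistic for the first-order Markov model can be checked without fitting a Poisson GLM. The Python sketch below assumes the standard closed-form fitted counts for this chain-structured loglinear model, namely the product of the three adjacent two-way margins divided by the two shared one-way margins. That formula is not shown on the slides, and the variable names are chosen for illustration. The counts are those of the SAS data step.

```python
import math
from collections import Counter

# Cell counts keyed by (y9, y10, y11, y12), from the SAS data step of Example 1.
cells = {
    (1, 1, 1, 1): 94, (1, 1, 1, 2): 30, (1, 1, 2, 1): 15, (1, 1, 2, 2): 28,
    (1, 2, 1, 1): 14, (1, 2, 1, 2): 9, (1, 2, 2, 1): 12, (1, 2, 2, 2): 63,
    (2, 1, 1, 1): 19, (2, 1, 1, 2): 15, (2, 1, 2, 1): 10, (2, 1, 2, 2): 44,
    (2, 2, 1, 1): 17, (2, 2, 1, 2): 42, (2, 2, 2, 1): 35, (2, 2, 2, 2): 572,
}


def margin(cells, axes):
    """Sum the cell counts over every dimension not listed in axes."""
    out = Counter()
    for key, n in cells.items():
        out[tuple(key[a] for a in axes)] += n
    return out


n_9_10 = margin(cells, (0, 1))
n_10_11 = margin(cells, (1, 2))
n_11_12 = margin(cells, (2, 3))
n_10 = margin(cells, (1,))
n_11 = margin(cells, (2,))

g2_first_order = 0.0
for (a, b, c, d), n in cells.items():
    # Fitted count under (y9y10, y10y11, y11y12).
    fitted = n_9_10[(a, b)] * n_10_11[(b, c)] * n_11_12[(c, d)] / (n_10[(b,)] * n_11[(c,)])
    g2_first_order += 2.0 * n * math.log(n / fitted)

# 16 cells minus 1 intercept, 4 main effects, and 3 two-way terms.
df_first_order = 16 - (1 + 4 + 3)
print(round(g2_first_order, 4), df_first_order)
```

The result should agree with the deviance reported by `proc genmod` for the first model, up to rounding.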

Data analysis (cont.)

Next we consider the model (y9y10y11, y10y11y12), a second-order Markov chain, which satisfies conditional independence of the responses at ages 9 and 12, given the states at ages 10 and 11.

This model also fits poorly, with G²=23.8632, df=4, and p-value<0.001.
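The second-order fit can be checked with a short Python sketch as well. It assumes the standard closed form for the fitted counts under conditional independence of y9 and y12 given (y10, y11): the product of the two three-way margins divided by the shared (y10, y11) margin. The formula and the variable names are additions for illustration, not part of the slides.

```python
import math
from collections import Counter

# Cell counts keyed by (y9, y10, y11, y12), from the SAS data step of Example 1.
cells = {
    (1, 1, 1, 1): 94, (1, 1, 1, 2): 30, (1, 1, 2, 1): 15, (1, 1, 2, 2): 28,
    (1, 2, 1, 1): 14, (1, 2, 1, 2): 9, (1, 2, 2, 1): 12, (1, 2, 2, 2): 63,
    (2, 1, 1, 1): 19, (2, 1, 1, 2): 15, (2, 1, 2, 1): 10, (2, 1, 2, 2): 44,
    (2, 2, 1, 1): 17, (2, 2, 1, 2): 42, (2, 2, 2, 1): 35, (2, 2, 2, 2): 572,
}

n_first_three = Counter()   # margin over (y9, y10, y11)
n_last_three = Counter()    # margin over (y10, y11, y12)
n_middle = Counter()        # margin over (y10, y11)
for (a, b, c, d), n in cells.items():
    n_first_three[(a, b, c)] += n
    n_last_three[(b, c, d)] += n
    n_middle[(b, c)] += n

g2_second_order = 0.0
for (a, b, c, d), n in cells.items():
    # Fitted count under (y9y10y11, y10y11y12).
    fitted = n_first_three[(a, b, c)] * n_last_three[(b, c, d)] / n_middle[(b, c)]
    g2_second_order += 2.0 * n * math.log(n / fitted)

# 16 cells minus 1 intercept, 4 main effects, 5 two-way and 2 three-way terms.
df_second_order = 16 - (1 + 4 + 5 + 2)
print(round(g2_second_order, 4), df_second_order)
```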

Data analysis (cont.)

The loglinear model (y9y10, y9y11, y9y12, y10y11, y10y12, y11y12), which permits association at each pair of ages, fits well, with G²=1.4585, df=5, and p-value=0.9178086.

Parameter   Estimate   Standard Error   95% Confidence Limits   Chi-Square   Pr > ChiSq
y9*y10       1.8064       0.1943         1.4263    2.1888         86.42        <.0001
y9*y11       0.9478       0.2123         0.5282    1.3612         19.94        <.0001
y9*y12       1.0531       0.2133         0.6323    1.4696         24.37        <.0001
y10*y11      1.6458       0.2093         1.2356    2.0569         61.85        <.0001
y10*y12      1.0742       0.2205         0.6393    1.5045         23.74        <.0001
y11*y12      1.8497       0.2071         1.4449    2.2574         79.81        <.0001

From the estimates above, the association appears similar for pairs of ages 1 year apart, and somewhat weaker for pairs of ages more than 1 year apart. So we consider the simpler model in which

$$\lambda_{ij}^{y_9 y_{10}} = \lambda_{ij}^{y_{10} y_{11}} = \lambda_{ij}^{y_{11} y_{12}} \quad \text{and} \quad \lambda_{ij}^{y_9 y_{11}} = \lambda_{ij}^{y_9 y_{12}} = \lambda_{ij}^{y_{10} y_{12}}$$

It also fits well, with G²=2.3, df=9, and p-value=0.9857876.

Estimated Conditional Log Odds Ratios

Association   Estimate   Simpler Structure
Y9Y10           1.81          1.75
Y10Y11          1.65          1.75
Y11Y12          1.85          1.75
Y9Y11           0.95          1.04
Y9Y12           1.05          1.04
Y10Y12          1.07          1.04
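Log odds ratios are easier to read after exponentiating. The short Python sketch below converts the two common values of the simpler structure into conditional odds ratios. The dictionary labels are descriptive names chosen for this illustration.

```python
import math

# Common log odds ratios from the "Simpler Structure" column.
log_odds_ratios = {
    "ages 1 year apart": 1.75,
    "ages more than 1 year apart": 1.04,
}

odds_ratios = {label: math.exp(value) for label, value in log_odds_ratios.items()}
for label, value in odds_ratios.items():
    print(f"{label}: odds ratio = {value:.2f}")
```

Under the simpler structure, the estimated conditional odds of wheeze are roughly 5.8 times higher when the child wheezed at an adjacent age, and roughly 2.8 times higher when the child wheezed at an age more than 1 year away.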

Transitional Models with Explanatory Variables

The joint mass function of T sequential responses is

$$f(y_1, \ldots, y_T; X) = f(y_1; X)\, f(y_2 \mid y_1; X)\, f(y_3 \mid y_1, y_2; X) \cdots f(y_T \mid y_1, y_2, \ldots, y_{T-1}; X)$$

For binary y, we can use a logistic regression model for each term in the above factorization:

$$f(y_t \mid y_1, y_2, \ldots, y_{t-1}; X_t) = \frac{\exp[y_t(\alpha + \beta_1 y_1 + \cdots + \beta_{t-1} y_{t-1} + \beta' X_t)]}{1 + \exp(\alpha + \beta_1 y_1 + \cdots + \beta_{t-1} y_{t-1} + \beta' X_t)}, \qquad y_t = 0, 1$$

The model treats previous responses as explanatory variables. It is called a regressive logistic model (Bonney 1987). The interpretation and magnitude of $\beta$ depend on how many previous observations are in the model.

Within-cluster effects may diminish markedly when conditioning on previous responses. This is an important difference from marginal models, for which the interpretation does not depend on the specification of the dependence structure. In the special case of first-order Markov structure, the coefficients of $(y_1, \ldots, y_{t-2})$ equal 0 in the model for $y_t$. Given the predictor, the model treats repeated transitions by a subject as independent. Thus, one can fit the model with ordinary GLM software, treating each transition as a separate observation (Bonney 1986).

Data Analysis

Example 2 (with explanatory variables)

At ages 7 to 10, children were evaluated annually for the presence of respiratory illness. A predictor is maternal smoking at the start of the study, where s=1 for smoking regularly and s=0 otherwise.

Child’s Respiratory Illness by Age and Maternal Smoking

                             No Maternal Smoking (S=0)   Maternal Smoking (S=1)
Child’s Respiratory Illness          Age 10                     Age 10
Age 7    Age 8    Age 9           No       Yes               No       Yes
No       No       No             237        10              118         6
No       No       Yes             15         4                8         2
No       Yes      No              16         2               11         1
No       Yes      Yes              7         3                6         4
Yes      No       No              24         3                7         3
Yes      No       Yes              3         2                3         1
Yes      Yes      No               6         2                4         2
Yes      Yes      Yes              5        11                4         7

Let $y_t$ denote the response at age $t$ ($t = 7, 8, 9, 10$). The regressive logistic model is

$$\text{logit}[P(Y_t = 1)] = \alpha + \beta_1 s + \beta_2 t + \beta_3 y_{t-1}, \qquad t = 8, 9, 10$$

Each subject contributes three observations to the model fitting. The data set consists of 12 binomials, for the 2×3×2 combinations of $(s, t, y_{t-1})$. For example, for the combination (0, 8, 0), that is, s=0, t=8, and $y_{t-1} = y_7 = 0$, we have $y_8 = 0$ for 237+10+15+4=266 subjects and $y_8 = 1$ for 16+2+7+3=28 subjects.
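The bookkeeping that turns the sequence table into transition records can be sketched in Python. The nested dictionary `counts` re-enters the table of child's respiratory illness by age and maternal smoking (coded 1 = illness, 0 = no illness). The layout and all names are assumptions made for this illustration.

```python
from collections import Counter

# counts[s][(y7, y8, y9)] = (count with y10 = 0, count with y10 = 1)
counts = {
    0: {(0, 0, 0): (237, 10), (0, 0, 1): (15, 4), (0, 1, 0): (16, 2), (0, 1, 1): (7, 3),
        (1, 0, 0): (24, 3), (1, 0, 1): (3, 2), (1, 1, 0): (6, 2), (1, 1, 1): (5, 11)},
    1: {(0, 0, 0): (118, 6), (0, 0, 1): (8, 2), (0, 1, 0): (11, 1), (0, 1, 1): (6, 4),
        (1, 0, 0): (7, 3), (1, 0, 1): (3, 1), (1, 1, 0): (4, 2), (1, 1, 1): (4, 7)},
}

# transitions[(s, t, y_prev, y_t)] = number of subjects making that transition
transitions = Counter()
for s, table in counts.items():
    for (y7, y8, y9), by_y10 in table.items():
        for y10, n in enumerate(by_y10):
            seq = (y7, y8, y9, y10)
            # Each subject contributes one record for each of t = 8, 9, 10.
            for k, t in enumerate((8, 9, 10), start=1):
                transitions[(s, t, seq[k - 1], seq[k])] += n

print(transitions[(0, 8, 0, 0)], transitions[(0, 8, 0, 1)])
```

For the combination (s, t, y_{t-1}) = (0, 8, 0) this gives 266 subjects with y8 = 0 and 28 with y8 = 1, matching the counts worked out by hand above.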

Code of Example 2

data illness;
  input t tp ytp yt s count;
  datalines;
8 7 0 0 0 266
8 7 0 0 1 134
8 7 0 1 0 28
8 7 0 1 1 22
8 7 1 0 0 32
8 7 1 0 1 14
8 7 1 1 0 24
8 7 1 1 1 17
9 8 0 0 0 274
9 8 0 0 1 134
9 8 0 1 0 24
9 8 0 1 1 14
9 8 1 0 0 26
9 8 1 0 1 18
9 8 1 1 0 26
9 8 1 1 1 21
10 9 0 0 0 283
10 9 0 0 1 140
10 9 0 1 0 17
10 9 0 1 1 12
10 9 1 0 0 30
10 9 1 0 1 21
10 9 1 1 0 20
10 9 1 1 1 14
;
run;

/* Regressive logistic model: each transition is one observation, weighted by count */
proc logistic data=illness descending;
  freq count;
  model yt = t ytp s / scale=none aggregate;
run;

Output from SAS

Deviance and Pearson Goodness-of-Fit Statistics

Criterion   DF    Value    Value/DF   Pr > ChiSq
Deviance     8   3.1186     0.3898      0.9267
Pearson      8   3.1275     0.3909      0.9261

Analysis of Maximum Likelihood Estimates

Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept    1    -0.2926       0.8460             0.1196         0.7295
t            1    -0.2428       0.0947             6.5800         0.0103
ytp          1     2.2111       0.1582           195.3589         <.0001
s            1     0.2960       0.1563             3.5837         0.0583
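As a cross-check on the SAS estimates, the same logistic model can be fitted with a few Newton-Raphson steps in plain Python. This is only a sketch under stated assumptions: it is not how `proc logistic` is implemented, the helper names `solve` and `fit_logistic` are hypothetical, and the rows are the 24 records of the SAS data step with the `tp` column dropped.

```python
import math

# (t, y_prev, y_t, s, count), from the SAS data step of Example 2.
rows = [
    (8, 0, 0, 0, 266), (8, 0, 0, 1, 134), (8, 0, 1, 0, 28), (8, 0, 1, 1, 22),
    (8, 1, 0, 0, 32), (8, 1, 0, 1, 14), (8, 1, 1, 0, 24), (8, 1, 1, 1, 17),
    (9, 0, 0, 0, 274), (9, 0, 0, 1, 134), (9, 0, 1, 0, 24), (9, 0, 1, 1, 14),
    (9, 1, 0, 0, 26), (9, 1, 0, 1, 18), (9, 1, 1, 0, 26), (9, 1, 1, 1, 21),
    (10, 0, 0, 0, 283), (10, 0, 0, 1, 140), (10, 0, 1, 0, 17), (10, 0, 1, 1, 12),
    (10, 1, 0, 0, 30), (10, 1, 0, 1, 21), (10, 1, 1, 0, 20), (10, 1, 1, 1, 14),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_logistic(rows, iterations=25):
    """Newton-Raphson for logit P(y_t = 1) = b0 + b1*t + b2*y_prev + b3*s."""
    beta = [0.0, 0.0, 0.0, 0.0]
    for _ in range(iterations):
        grad = [0.0] * 4
        info = [[0.0] * 4 for _ in range(4)]
        for t, y_prev, y, s, count in rows:
            x = [1.0, float(t), float(y_prev), float(s)]
            eta = sum(b * v for b, v in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            for i in range(4):
                grad[i] += count * (y - p) * x[i]          # score vector
                for j in range(4):
                    info[i][j] += count * p * (1.0 - p) * x[i] * x[j]  # information
        step = solve(info, grad)
        beta = [b + d for b, d in zip(beta, step)]
    return beta


intercept, b_t, b_prev, b_s = fit_logistic(rows)
print(round(intercept, 4), round(b_t, 4), round(b_prev, 4), round(b_s, 4))
```

The four values should agree with the Intercept, t, ytp, and s estimates in the SAS output to about three decimal places.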

Analysis

The ML fit is

$$\text{logit}[\hat{P}(Y_t = 1)] = \log \frac{\hat{P}(Y_t = 1)}{1 - \hat{P}(Y_t = 1)} = -0.2926 - 0.2428\,t + 0.2960\,s + 2.2111\,y_{t-1}$$

so that

$$\hat{P}(Y_t = 1) = \frac{\exp[-0.2926 - 0.2428\,t + 0.2960\,s + 2.2111\,y_{t-1}]}{1 + \exp[-0.2926 - 0.2428\,t + 0.2960\,s + 2.2111\,y_{t-1}]} = \frac{1}{1 + \exp[0.2926 + 0.2428\,t - 0.2960\,s - 2.2111\,y_{t-1}]}$$

which is an increasing function of s and $y_{t-1}$ and a decreasing function of t. Then:

If s and $y_{t-1}$ are fixed, the estimated probability of illness decreases as t increases, which means that younger children are more likely to have the illness.

If t and $y_{t-1}$ are fixed, $\hat{P}(Y_t = 1)$ is larger when s=1 than when s=0, which means that a child whose mother smokes is more likely to have the illness than a child whose mother does not smoke.

If t and s are fixed, $\hat{P}(Y_t = 1)$ is larger when $y_{t-1} = 1$ than when $y_{t-1} = 0$, which means that a child who had the illness at age t-1 is more likely to have it at age t than a child who did not have it at age t-1.

The model fits well, with G²=3.1186, df=8, and p-value=0.9267.

The coefficient of $y_{t-1}$ is 2.2111, with SE 0.1582, Wald chi-square statistic 195.3589, and p-value <.0001, which shows that the previous observation has a strong positive effect. A child who had the illness at age t-1 is therefore more likely to have it at age t than a child who did not.

The coefficient of s is 0.2960. The Wald test of the hypothesis that this coefficient equals 0 gives a chi-square statistic of 3.5837 with df=1 and p-value 0.0583. There is slight evidence of a positive effect of maternal smoking.

Interpretation of Parameters β

$$\text{logit}[\hat{P}(Y_t = 1)] = \log \frac{P(Y_t = 1)}{1 - P(Y_t = 1)} = \alpha + \beta_1 s + \beta_2 t + \beta_3 y_{t-1}$$

Then

$$\frac{P(y_t = 1 \mid s = 0, t = 8, y_7 = 0)}{P(y_t = 0 \mid s = 0, t = 8, y_7 = 0)} = \exp(\alpha + 8\beta_2) = \exp(-0.2926 + 8 \times (-0.2428)) = 0.107$$

so $P(y_t = 1) = 0.0967$. If a child did not have the illness at age 7 and his mother did not smoke, the estimated probability that he has the illness at age 8 is 0.0967.

$$\frac{P(y_t = 1 \mid s = 0, t = 8, y_7 = 1)}{P(y_t = 0 \mid s = 0, t = 8, y_7 = 1)} = \exp(\alpha + 8\beta_2 + \beta_3) = \exp(-0.2926 - 8 \times 0.2428 + 2.2111) = 0.9764$$

so $P(y_t = 1) = 0.494 \gg 0.0967$. So, among children whose mothers did not smoke, a child who had the illness at age 7 has an estimated probability of 0.494 of having the illness at age 8.

$$\beta_3 = \log \frac{P(y_t = 1 \mid s = 0, t = 8, y_7 = 1)}{P(y_t = 0 \mid s = 0, t = 8, y_7 = 1)} - \log \frac{P(y_t = 1 \mid s = 0, t = 8, y_7 = 0)}{P(y_t = 0 \mid s = 0, t = 8, y_7 = 0)}$$

The other parameters can be interpreted in the same way.
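The arithmetic above can be reproduced with a short Python sketch that plugs the SAS estimates into the fitted model. The function names `odds` and `prob` are hypothetical helpers introduced for this illustration.

```python
import math

# ML estimates taken from the SAS output.
ALPHA, BETA_S, BETA_T, BETA_PREV = -0.2926, 0.2960, -0.2428, 2.2111


def odds(s, t, y_prev):
    """Estimated odds P(y_t = 1) / P(y_t = 0) for given s, t, and y_{t-1}."""
    return math.exp(ALPHA + BETA_S * s + BETA_T * t + BETA_PREV * y_prev)


def prob(s, t, y_prev):
    """Estimated probability P(y_t = 1) for given s, t, and y_{t-1}."""
    o = odds(s, t, y_prev)
    return o / (1.0 + o)


odds_no_prior = odds(0, 8, 0)   # no illness at age 7, nonsmoking mother
odds_prior = odds(0, 8, 1)      # illness at age 7, nonsmoking mother
print(round(odds_no_prior, 3), round(prob(0, 8, 0), 4))
print(round(odds_prior, 4), round(prob(0, 8, 1), 3))

# The log of the ratio of the two odds recovers the coefficient of y_{t-1}.
log_odds_ratio = math.log(odds_prior / odds_no_prior)
print(round(log_odds_ratio, 4))
```

The same helper gives the effect of maternal smoking by comparing `odds(1, t, y_prev)` with `odds(0, t, y_prev)`, whose ratio is the exponential of the coefficient of s.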

Thank you!
