
Count Data

A coin toss: H, T

$X_H \sim \mathrm{Bin}(n, p_H)$

$(X_H, X_T) \sim \mathrm{MNom}(n;\ p_H, p_T)$

$(X_1, X_2, \ldots, X_6) \sim \mathrm{MNom}(n;\ p_1, p_2, \ldots, p_6)$

Cleopatra VII & Marcus Antony

$(X_{CA}, X_{Ca}, X_{cA}, X_{ca}) \sim \mathrm{MNom}(n;\ p_{CA}, p_{Ca}, p_{cA}, p_{ca})$

[Figure: the two coin faces (C/c for Cleopatra, A/a for Antony) and a roulette-style layout with EVEN/ODD and 1st/2nd/3rd 12 sections]

Gregor Mendel, 1822-1884

            RY    Ry    rY    ry    Total
Obs.        950   250   350   50    1600
Expected    900   300   300   100   1600

$H_0: (p_{RY} : p_{Ry} : p_{rY} : p_{ry}) = (9 : 3 : 3 : 1)$

$H_1: (p_{RY} : p_{Ry} : p_{rY} : p_{ry}) \ne (9 : 3 : 3 : 1)$

Which statement is right, $H_0$ or $H_1$?

            RY     Ry     rY     ry     Total
Obs.        950    250    350    50     1600
Expected    900    300    300    100    1600
O - E       50     -50    50     -50    0
(O-E)^2     2500   2500   2500   2500   10000

$(X_1, X_2, X_3, X_4) \sim \mathrm{MNom}(n;\ p_1, p_2, p_3, p_4)$

Approximately, $X_i \sim \mathrm{Poisson}(\mu_i)$ with $\mu_i = n p_i$, $i = 1, 2, 3, 4$.

$\mathrm{Poisson}(\mu_i) \approx N(\mu_i, \mu_i)$ for large $\mu_i$, so

$\dfrac{X_i - \mu_i}{\sqrt{\mu_i}} \approx N(0, 1)$ and $\dfrac{(X_i - \mu_i)^2}{\mu_i} \approx \chi^2(1)$.

Because the counts are linked by $X_1 + X_2 + X_3 + X_4 = n$, one degree of freedom is lost:

$\displaystyle\sum_{i=1}^{4} \frac{(X_i - \mu_i)^2}{\mu_i} \approx \chi^2(4 - 1)$
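A quick way to see this in R (a simulation sketch, not part of the original slides; the settings below are illustrative):

set.seed(1)                                 # illustrative simulation settings
p <- c(9, 3, 3, 1) / 16
n <- 1600
X <- rmultinom(10000, size = n, prob = p)   # one multinomial sample per column
E <- n * p
stat <- colSums((X - E)^2 / E)              # Pearson statistic for each sample
mean(stat > qchisq(0.95, df = 3))           # should be close to 0.05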

            1      2      3      4     Total
Obs.        950    250    350    50    1600
Expected    900    300    300    100   1600
O - E       50     -50    50     -50   0
(O-E)^2/E   25/9   25/3   25/3   25    25*16/9 = 44.44

$X^2 = \displaystyle\sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i} \approx \chi^2(4 - 1) = \chi^2(3)$

Upper-tail quantiles $\chi^2_{\alpha, n}$, defined by $P(\chi^2(n) > \chi^2_{\alpha, n}) = \alpha$:

α \ n     1       3       5        8        15       24
0.975     0.001   0.216   0.831    2.180    6.262    12.401
0.95      0.004   0.352   1.145    2.733    7.261    13.848
0.05      3.841   7.815   11.071   15.507   24.996   36.415
0.025     5.024   9.348   12.833   17.535   27.488   39.364
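These quantiles can be reproduced with qchisq in R (a small sketch added here, not from the slides):

qchisq(1 - 0.05,  df = c(1, 3, 5, 8, 15, 24))   # 3.841  7.815 11.071 15.507 24.996 36.415
qchisq(1 - 0.025, df = c(1, 3, 5, 8, 15, 24))   # 5.024  9.348 12.833 17.535 27.488 39.364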

[Figure: chi-square density curves; the shaded upper tail of area α defines the quantile $\chi^2_\alpha(n)$]

            RY    Ry    rY    ry    Total
Obs.        950   250   350   50    1600
Expected    900   300   300   100   1600

$H_0: (p_{RY} : p_{Ry} : p_{rY} : p_{ry}) = (9 : 3 : 3 : 1)$
$H_1: (p_{RY} : p_{Ry} : p_{rY} : p_{ry}) \ne (9 : 3 : 3 : 1)$

$X^2 = \displaystyle\sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i} = 25 \cdot 16/9 = 44.44 > 7.815 = \chi^2_{0.05,\,3}$, so reject $H_0$.

> x <- c(950,250,350,50)
> p <- c(9,3,3,1)/16
> chisq.test(x, p=p)

        Chi-squared test for given probabilities

data:  x
X-squared = 44.4444, df = 3, p-value = 1.214e-09

Under $H_0$ the cell probabilities are completely specified:

        RY     Ry     rY     ry
p*      9/16   3/16   3/16   1/16

Let $M$ be the set of all probability vectors $p = (p_{RY}, p_{Ry}, p_{rY}, p_{ry})$ with $\sum_i p_i = 1$. The hypotheses are

$H_0: p = p^*$  vs.  $H_1: p \in M,\ p \ne p^*$

$df = \dim(H_0 \cup H_1) - \dim(H_0) = 3 - 0 = 3$

          Y      y      Total
R         950    250    1200
r         350    50     400
Total     1300   300    1600

$H_0: p_{(s,c)} = p_s\, p_c$ for every cell $(s, c)$  vs.  $H_1: p_{(s,c)} \ne p_s\, p_c$ for some $(s, c)$

Joint and (estimated) marginal probabilities:

          Y             y
R         p(R,Y)        p(R,y)        p_R = 12/16
r         p(r,Y)        p(r,y)        p_r = 4/16
          p_Y = 13/16   p_y = 3/16    1

Under $H_0$, each cell probability is the product of its margins:

          Y                 y
R         (13/16)(12/16)    (3/16)(12/16)    12/16
r         (13/16)(4/16)     (3/16)(4/16)     4/16
          13/16             3/16             1

Chi-square test for independence

            RY    Ry    rY    ry    Total
Obs.        950   250   350   50    1600
Expected    975   225   325   75    1600

Row and column totals:

          Y      y      Total
R                       1200
r                       400
Total     1300   300    1600

Using the marginal probabilities estimated above (p_R = 12/16, p_r = 4/16, p_Y = 13/16, p_y = 3/16) and n = 1600, the expected counts under $H_0$ are:

          Y                           y
R         1600(13/16)(12/16) = 975    1600(3/16)(12/16) = 225
r         1600(13/16)(4/16)  = 325    1600(3/16)(4/16)  = 75
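The expected counts can also be computed directly from the observed margins; a minimal R sketch (not from the slides):

obs <- matrix(c(950, 350, 250, 50), 2, 2)          # rows R/r, columns Y/y
E <- outer(rowSums(obs), colSums(obs)) / sum(obs)  # (row total)(column total)/n
E                                                  # 975 225 / 325 75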

            RY     Ry     rY     ry     Total
Obs.        950    250    350    50     1600
Expected    975    225    325    75     1600
(O-E)^2/E   0.64   2.77   1.92   8.33   13.67

$H_0: p_{(s,c)} = p_s\, p_c$  vs.  $H_1: p_{(s,c)} \ne p_s\, p_c$

$X^2 = \displaystyle\sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i} = 13.67 > 3.84 = \chi^2_{0.05,\,1}$, so reject $H_0$.

> mx <- matrix(c(950,250,350,50),2,)
> mx
     [,1] [,2]
[1,]  950  350
[2,]  250   50
> chisq.test(mx, correct=F)

        Pearson's Chi-squared test

data:  mx
X-squared = 13.6752, df = 1, p-value = 0.0002173

Under $H_0$ the cell probabilities are determined by the marginal probabilities:

          Y         y
R         p_R p_Y   p_R p_y   p_R
r         p_r p_Y   p_r p_y   p_r
          p_Y       p_y       1

with $p_R + p_r = 1$ and $p_Y + p_y = 1$, so $\dim(H_0) = (2-1) + (2-1) = 2$.

Without the independence restriction, the four cell probabilities $(p_{RY}, p_{Ry}, p_{rY}, p_{ry})$ only have to sum to 1, so $\dim(H_0 \cup H_1) = 3$.

$df = \dim(H_0 \cup H_1) - \dim(H_0) = 3 - 2 = 1 = (2-1)(2-1)$

In general, for a $k \times m$ table with row categories $r_1, \ldots, r_k$ and column categories $y_1, \ldots, y_m$:

$\dim(H_0 \cup H_1) = km - 1$, $\dim(H_0) = (k-1) + (m-1)$,

$df = \dim(H_0 \cup H_1) - \dim(H_0) = (km - 1) - (k-1) - (m-1) = (k-1)(m-1)$
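For example, for the 4 x 3 Australian rare-plants table analysed later, the formula gives (a one-line check, not from the slides):

(4 - 1) * (3 - 1)   # df = 6, matching chisq.test's df for that table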

            1     2     3     4     5     6     Total
Obs.        8     12    7     14    9     10    60
Expected    10    10    10    10    10    10    60
(O-E)^2/E   0.4   0.4   0.9   1.6   0.1   0     3.4

$H_0: (p_1 : p_2 : \cdots : p_6) = (1 : 1 : \cdots : 1)$
$H_1: (p_1 : p_2 : \cdots : p_6) \ne (1 : 1 : \cdots : 1)$

$X^2 = \displaystyle\sum_{i=1}^{6} \frac{(O_i - E_i)^2}{E_i} = 3.4 < 11.07 = \chi^2_{0.05,\,5}$, so do not reject $H_0$.

> x <- c(8,12,7,14,9,10)
> p <- rep(1,6)/6
> chisq.test(x, p=p)

        Chi-squared test for given probabilities

data:  x
X-squared = 3.4, df = 5, p-value = 0.6386

            H    T    Total
Obs.        60   40   100
Expected    50   50   100
(O-E)^2/E   2    2    4

$H_0: p_H = 1/2$, i.e. $(p_H : p_T) = (1 : 1)$
$H_1: p_H \ne 1/2$, i.e. $(p_H : p_T) \ne (1 : 1)$

$X^2 = \displaystyle\sum_{i=1}^{2} \frac{(O_i - E_i)^2}{E_i} = 4 > 3.84 = \chi^2_{0.05,\,1}$, so reject $H_0$.

> chisq.test(c(60,40), p=c(1,1)/2)

        Chi-squared test for given probabilities

data:  c(60, 40)
X-squared = 4, df = 1, p-value = 0.0455

Do the two coins have the same probability of heads, $p_1 = p_2$?

Caesar coin:  560 heads, 440 tails (1000 tosses)
Ptolemy coin: 640 heads, 360 tails (1000 tosses)

> head2 <- c( 560, 640)
> toss2 <- c( 1000, 1000)
> prop.test(head2, toss2)

        2-sample test for equality of proportions ...

data:  head2 out of toss2
X-squared = 13.0021, df = 1, p-value = 0.0003111
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.12379728 -0.03620272
sample estimates:
prop 1 prop 2 
  0.56   0.64 

        Caesar   Ptolemy
Head    560      640
Tail    440      360

> mx <- matrix(c(560,440,640,360),2,)
> mx
     [,1] [,2]
[1,]  560  640
[2,]  440  360
> chisq.test(mx, cor=F)

        Pearson's Chi-squared test

data:  mx
X-squared = 13.3333, df = 1, p-value = 0.0002607

> chisq.test(mx)

        Pearson's Chi-squared test with Yates' continuity correction

data:  mx
X-squared = 13.0021, df = 1, p-value = 0.0003111

Chi-square test for Homogeneity of distributions

> # H0 : all four coins have the same proportion showing head side
> # H1 : at least one coin has a different proportion from the others
> head4 <- c( 83, 90, 129, 70 )
> toss4 <- c( 86, 93, 136, 82 )
> prop.test(head4, toss4)

        4-sample test for equality of proportions without continuity correction

data:  head4 out of toss4
X-squared = 12.6004, df = 3, p-value = 0.005585
alternative hypothesis: two.sided
sample estimates:
   prop 1    prop 2    prop 3    prop 4 
0.9651163 0.9677419 0.9485294 0.8536585 

        Coin 1   Coin 2   Coin 3   Coin 4
Head    83       90       129      70       (Alive)
Tail    3        3        7        12       (Dead)
Total   86       93       136      82

The same table can also be read as Alive/Dead counts for Hospital 1 to Hospital 4.

> mx <- matrix(c(83,3,90,3,129,7,70,12),2,)
> chisq.test(mx)

        Pearson's Chi-squared test

data:  mx
X-squared = 12.6004, df = 3, p-value = 0.005585

      D    W    WD
CC    37   190  94
CR    23   59   23
RC    10   141  28
RR    15   58   16

Australia rare plants data

Common (C) and rare (R) status in (South Australia, Victoria) and in (Tasmania): CC, CR, RC, RR.

The number of plants in dry (D), wet (W), and wet-or-dry (WD) regions.

Question (null hypothesis): is the distribution of plants over (D, W, WD) the same for all of CC, CR, RC and RR?

Australia rare plants data

> rareplants <- matrix(c(37,23,10,15,190,59,141,58,94,23,28,16),4,)
> dimnames(rareplants) <- list(c("CC","CR","RC","RR"), c("D","W","WD"))
> rareplants
> (sout <- chisq.test(rareplants))

        Pearson's Chi-squared test

data:  rareplants
X-squared = 34.9863, df = 6, p-value = 4.336e-06

> round( sout$expected, 1 )
      D     W   WD
CC 39.3 207.2 74.5
CR 12.9  67.8 24.4
RC 21.9 115.6 41.5
RR 10.9  57.5 20.6
> round( sout$resid, 3 )
        D      W     WD
CC -0.369 -1.196  2.263
CR  2.828 -1.067 -0.275
RC -2.547  2.368 -2.099
RR  1.242  0.072 -1.023

The lady tasting tea

http://www.youtube.com/watch?v=lgs7d5saFFc

http://en.wikipedia.org/wiki/Fisher's_exact_test

Fisher’s exact test for 2X2 tables with small n (n<25)

> chisq.test(matrix(c(7,2,1,5),2,))

        Pearson's Chi-squared test with Yates' continuity correction

X-squared = 3.2254, df = 1, p-value = 0.0725
Warning message:
Chi-squared approximation may be incorrect

> fisher.test(matrix(c(7,2,1,5),2,))

        Fisher's Exact Test for Count Data

p-value = 0.04056
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
   0.8646648 934.0087368
sample estimates:
odds ratio 
  13.59412 

> fisher.test(matrix(c(7,2,1,5),2,), alter="greater")

        Fisher's Exact Test for Count Data

p-value = 0.03497
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
 1.179718      Inf
sample estimates:
odds ratio 
  13.59412 

Guess \ Making   Milk 1st   Tea 1st   Sum
Milk 1st         7          1         8
Tea 1st          2          5         7
sum              9          6         15

There are 7 possible tables for given marginal counts.

With the row sums (8, 7) and column sums (9, 6) fixed, each table is determined by a, the count in the (Guess Milk 1st, Making Milk 1st) cell:

a (Milk/Milk)   8   7   6   5   4   3   2
b (Milk/Tea)    0   1   2   3   4   5   6
c (Tea/Milk)    1   2   3   4   5   6   7
d (Tea/Tea)     6   5   4   3   2   1   0

What is the probability of observing each table in the experiment?

G\M       M 1st   T 1st   Sum
M 1st     a       b       a+b
T 1st     c       d       c+d
sum       a+c     b+d     n        (n = a + b + c + d)

Column-wise (conditional) probabilities of each guess:

G\M       M 1st   T 1st   Overall
M 1st     r       q       v
T 1st     1-r     1-q     1-v
sum       1       1       1

Odds ratio: $\psi = \dfrac{r/(1-r)}{q/(1-q)} = \dfrac{r(1-q)}{q(1-r)}$

$\psi = 1$ means no discernible ability.

$H_0: \psi = 1$  vs.  $H_1: \psi \ne 1$

With all margins fixed, the probability of a given table under $H_0$ is hypergeometric:

$P(a, b, c, d) = \dfrac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{n}{a+c}} = \dfrac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{n!\,a!\,b!\,c!\,d!}$

The odds ratio is estimated by $\hat\psi = \dfrac{ad}{cb}$, with some correction.
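For the observed table (a = 7, b = 1, c = 2, d = 5) the plain estimate ad/(cb) can be computed directly; note that fisher.test reports a conditional maximum-likelihood estimate instead, which is why its value (13.59) differs. A small sketch, not from the slides:

a <- 7; b <- 1; c <- 2; d <- 5
a * d / (c * b)    # 17.5; fisher.test's conditional MLE is 13.59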

When $\psi = 1$, the probabilities of the 7 possible tables (indexed by a, the Milk-1st/Milk-1st cell) are:

a           8         7         6         5         4         3         2
P(table)    0.00140   0.03356   0.19580   0.39161   0.29370   0.07832   0.00560

0.00140 + 0.03356 + 0.00560 = 0.04056: the sum over all tables no more probable than the observed one (a = 7); this is the two-sided p-value of the Fisher exact test.

0.00140 + 0.03356 = 0.03497: the sum over tables with a >= 7; this is the one-sided p-value.
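These probabilities and p-values can be reproduced with the hypergeometric distribution in R (a sketch, not from the slides; a is the count in the Milk-1st/Milk-1st cell):

round(dhyper(8:2, m = 9, n = 6, k = 8), 5)   # 0.00140 0.03356 0.19580 0.39161 0.29370 0.07832 0.00560
sum(dhyper(c(8, 7, 2), 9, 6, 8))             # 0.04056, two-sided Fisher p-value
sum(dhyper(8:7, 9, 6, 8))                    # 0.03497, one-sided p-value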

If every cup had been classified correctly, the table (and its guess margins) would have been:

G\M       M 1st   T 1st   Sum
M 1st     9       0       9
T 1st     0       6       6
sum       9       6       15

100% correct answers.

If some cups are misclassified, the guess margins change, for example:

G\M       M 1st   T 1st   Sum
M 1st     4       4       8
T 1st     5       2       7
sum       9       6       15

Some are misclassified.

The Fisher exact test considers only the tables with the same fixed margins.

The probabilities of tables with different margins are completely ignored.

This is sometimes referred to as data-respecting (?) inference.

Use Fisher's exact test only for small n (less than 25).

> chisq.test(matrix(c(14,4,2,10),2,), correct=F)

        Pearson's Chi-squared test

X-squared = 10.8036, df = 1, p-value = 0.001013

> chisq.test(matrix(c(14,4,2,10),2,))

        Pearson's Chi-squared test with Yates' continuity correction

X-squared = 8.4877, df = 1, p-value = 0.003576

> fisher.test(matrix(c(14,4,2,10),2,))

        Fisher's Exact Test for Count Data

p-value = 0.002185
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
   2.123319 202.143800
sample estimates:
odds ratio 
  15.40804 

Guess \ Making   Milk 1st   Tea 1st   Sum
Milk 1st         14         2         16
Tea 1st          4          10        14
sum              18         12        30

No big difference when n is large !

Yates' continuity correction

For a 2x2 table

G\M       M 1st   T 1st   Sum
M 1st     a       b       a+b
T 1st     c       d       c+d
sum       a+c     b+d     n

$X^2 = \displaystyle\sum_i \frac{(O_i - E_i)^2}{E_i} = \frac{n(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)} = 10.8036$

$X^2_{\mathrm{corrected}} = \displaystyle\sum_i \frac{(|O_i - E_i| - 1/2)^2}{E_i} = \frac{n(|ad - bc| - n/2)^2}{(a+b)(c+d)(a+c)(b+d)} = 8.4877$
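A manual check of the two statistics above for the table with a = 14, b = 2, c = 4, d = 10 (a sketch, not from the slides):

a <- 14; b <- 2; c <- 4; d <- 10; n <- a + b + c + d
denom <- (a + b) * (c + d) * (a + c) * (b + d)
n * (a * d - b * c)^2 / denom                  # 10.8036, uncorrected
n * (abs(a * d - b * c) - n / 2)^2 / denom     # 8.4877, Yates-corrected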

Odds ratio: $\psi = \dfrac{r/(1-r)}{q/(1-q)}$, so

$\log(\psi) = \log\dfrac{r}{1-r} - \log\dfrac{q}{1-q} = \mathrm{logit}(r) - \mathrm{logit}(q)$

[Figure: left, the logit function $\log\frac{y}{1-y}$ for y in (0, 1); right, the logistic function $\frac{1}{1+e^{-x}}$ for x in (-6, 6)]
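The two curves can be drawn with a few lines of R (a sketch, not from the slides):

p <- seq(0.01, 0.99, by = 0.01)
plot(p, log(p / (1 - p)), type = "l", ylab = "logit(p)")      # logit: (0,1) -> real line
x <- seq(-6, 6, by = 0.1)
plot(x, 1 / (1 + exp(-x)), type = "l", ylab = "logistic(x)")  # inverse: real line -> (0,1)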

Linear Model (LM)

Regression: independent $Y_i \sim N(\mu_i, \sigma^2)$, $\mu_i = \beta_0 + \beta_1 X_i$

ANOVA: independent $Y_{ij} \sim N(\mu_{ij}, \sigma^2)$, $\mu_{ij} = \mu + \alpha_i$

Regression + ANOVA: independent $Y_{ij} \sim N(\mu_{ij}, \sigma^2)$, $\mu_{ij} = \mu + \alpha_i + \beta X_{ij}$

Generalized Linear Model (GLM)

Poisson regression: independent $Y_{ij} \sim \mathrm{Poisson}(\mu_{ij})$, $\log(\mu_{ij}) = \mu + \alpha_i + \beta X_{ij}$

Binomial regression (logistic regression): independent $Y_{ij} \sim \mathrm{Bin}(n, p_{ij})$, $\log\dfrac{p_{ij}}{1 - p_{ij}} = \mu + \alpha_i + \beta X_{ij}$
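As a schematic, the three model types correspond to the following R calls (a sketch with simulated data; the variable names are illustrative only, not from the slides):

set.seed(1)
x <- runif(50)
g <- gl(5, 10)                                            # a factor with 5 levels
y_norm <- rnorm(50, mean = 1 + 2 * x)
y_pois <- rpois(50, lambda = exp(0.5 + x))
y_bin  <- rbinom(50, size = 10, prob = plogis(-1 + 2 * x))

lm(y_norm ~ g + x)                                        # LM: identity link, normal errors
glm(y_pois ~ g + x, family = poisson)                     # GLM: log link, Poisson counts
glm(cbind(y_bin, 10 - y_bin) ~ g + x, family = binomial)  # GLM: logit link, binomial counts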

Guess \ Making   Milk 1st   Tea 1st   Sum
Milk 1st         7          1         8
Tea 1st          2          5         7
sum              9          6         15

$Y_1 \sim \mathrm{Bin}(9, p_1)$,  $Y_2 \sim \mathrm{Bin}(6, p_2)$

$Y_i \sim \mathrm{Bin}(n_i, p_i)$, $i = 1, 2$  (i: Making; j: Guess)

$V_1 = 9 - Y_1$, $V_2 = 6 - Y_2$;  $Y_1 = 7$, $Y_2 = 1$ are observed!

Logistic regression

$\log\dfrac{p_i}{1 - p_i} = \beta_0 + \beta_1 X_i$

> tm <- data.frame(gm=c(7,1), gt=c(2,5), making=c("M","T"))
> summary( glm(cbind(gm,gt)~making, family=binomial, data=tm) )

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)   1.2528     0.8018   1.562    0.118  
makingT      -2.8622     1.3575  -2.108    0.035 *

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 5.7863e+00  on 1  degrees of freedom
Residual deviance: 8.8818e-16  on 0  degrees of freedom
AIC: 8.1909

Number of Fisher Scoring iterations: 4

Logistic regression with the lady tasting tea data
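The fitted coefficients simply recover the observed proportions; a quick check (not from the slides):

plogis(1.2528)             # about 7/9, fitted P(guess Milk 1st | milk poured first)
plogis(1.2528 - 2.8622)    # about 1/6, fitted P(guess Milk 1st | tea poured first)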

InsectSprays data (insect counts for sprays A to F, 12 observations each):

A    B    C    D    E    F
10   11   0    3    3    11
7    17   1    5    5    9
20   21   7    12   3    15
14   11   2    6    5    22
14   16   3    4    3    15
12   14   1    3    6    16
10   17   2    5    1    13
23   17   1    5    1    10
17   19   3    5    3    26
20   21   0    5    2    26
14   7    1    2    6    24
13   13   4    4    4    13

[Figure: InsectSprays data; boxplots of insect count by type of spray (A to F)]

> sx <- rep(LETTERS[1:6], e=12)
> dx <- c(10,7,20,14,14,12,10,23,17,20,14,13,11,17,21,11,16,14,17,17,19,21,7,13,
+   0,1,7,2,3,1,2,1,3,0,1,4,3,5,12,6,4,3,5,5,5,5,2,4,3,5,3,5,3,6,1,1,3,2,6,
+   4,11,9,15,22,15,16,13,10,26,26,24,13)
> ax <- 30-dx
> insect <- data.frame(dead=dx, alive=ax, spray=sx)
> gout <- glm(cbind(dead,alive)~spray, family=binomial, data=insect)
> summary( gout )

Call: glm(formula = cbind(dead, alive) ~ spray, family = binomial, data = insect)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.06669    0.10547  -0.632   0.5272    
sprayB       0.11114    0.14913   0.745   0.4561    
sprayC      -2.52856    0.23259 -10.871   <2e-16 ***
sprayD      -1.56288    0.17719  -8.821   <2e-16 ***
sprayE      -1.95769    0.19513 -10.033   <2e-16 ***
sprayF       0.28983    0.14958   1.938   0.0527 .  

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 614.07  on 71  degrees of freedom
Residual deviance: 171.24  on 66  degrees of freedom
AIC: 416.16

Number of Fisher Scoring iterations: 4

$\log\dfrac{p_i}{1 - p_i} = \mu + \alpha_i$

> gres <- rbind(unique(fitted(gout)), unique(predict(gout)))
> dimnames(gres)[[2]] <- LETTERS[1:6]
> gres
               A          B           C          D          E         F
[1,]  0.48333333 0.51111111  0.06944445  0.1638889  0.1166667 0.5555556
[2,] -0.06669137 0.04445176 -2.59525468 -1.6295728 -2.0243818 0.2231436

The first row gives the fitted probabilities $\hat p_i$ and the second row the fitted logits $\log\frac{\hat p_i}{1-\hat p_i}$ for sprays A to F.

> anova(gout)
Analysis of Deviance Table

Model: binomial, link: logit

Response: cbind(dead, alive)

Terms added sequentially (first to last)

      Df Deviance Resid. Df Resid. Dev
NULL                     71     614.07
spray  5   442.83        66     171.24

Model: $\log\dfrac{p_i}{1 - p_i} = \mu + \alpha_i$, with one fitted probability $\hat p_i$ per spray.

Correlation and causality

The more Starbucks (STBK) stores, the higher the APT price increase?

The more Starbucks, the higher the APT price!

APT prices in Seoul

               STBK   APT price
Gangnam-gu     45     1030
Gangdong-gu    2      530
Jung-gu        24     520
Jungnang-gu    0      330

STBK: the number of Starbucks stores

APT price: average APT (apartment) price per 1 m2

$Y_i \sim \mathrm{Poisson}(\mu_i)$, $\log(\mu_i) = \beta_0 + \beta_1 X_i$

$Y_i$: the number of Starbucks stores in district $i$;  $X_i$: the APT price.

y <- c(45,2,1,4,4,6,4,2,1,0,2,3,10,8,21,3,5,5,3,12,7,1,20,24,0)
x <- c(3373,1907,1115,1413,1286,1861,1218,1018,1250,1135,1240,1528,
       1675,1220,2854,1644,1247,2427,2034,1723,2594,1138,1634,1729,1101)

xm <- x/(3.3)   # price per m2 (x is price per pyeong; 1 pyeong is about 3.3 m2)

( res <- glm(y~xm, family=poisson) )

anova(res)
summary(res)

plot(xm, y, ylab="Starbucks", xlab="APT price/m2")

points(xm, fitted(res), col="red", pch=16)   # exp(predict(res)) = fitted(res)

[Figure: scatter plot of Starbucks counts against APT price/m2, with the fitted Poisson-regression means shown in red]

> summary(res)

Call:
glm(formula = y ~ xm, family = poisson)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.6923  -1.7239  -0.6041   0.5783   5.3036  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.0072064  0.2128074  -0.034    0.973    
xm           0.0035630  0.0003009  11.841   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 235.19  on 24  degrees of freedom
Residual deviance: 111.52  on 23  degrees of freedom
AIC: 195.4

Number of Fisher Scoring iterations: 5

> anova(res)
Analysis of Deviance Table

Model: poisson, link: log

Response: y

Terms added sequentially (first to last)

     Df Deviance Resid. Df Resid. Dev
NULL                    24     235.19
xm    1   123.67        23     111.52

Three candidate probability distributions over six outcomes (each row sums to 1):

A    0.75   0.05   0.05   0.05   0.05   0.05
B    0.1    0.5    0.1    0.1    0.1    0.1
C    0.05   0.05   0.05   0.05   0.05   0.75

distribution & likelihood

$X \sim \mathrm{Bin}(n, p)$,  $f(x) = \binom{n}{x} p^x (1-p)^{n-x}$

$X = x_0$ is observed. What is $p$?  ($p \in (0, 1)$)

[Figure: binomial probability functions over x = 0, 1, ..., 6 for p = 0.1 and p = 0.7]

distribution & likelihood

[Figure: left, a binomial probability function over x = 0, 1, ..., 6; right, the likelihood of p for the observed x = 2 (annotated values: 0.133, 0.587, 0.15)]

$f(2) = \binom{6}{2} p^2 (1-p)^4$
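The likelihood curve can be reproduced in R (a sketch, not from the slides):

p <- seq(0, 1, by = 0.01)
lik <- dbinom(2, size = 6, prob = p)   # choose(6,2) * p^2 * (1-p)^4
plot(p, lik, type = "l", ylab = "likelihood")
abline(v = 2/6, lty = 2)               # maximized at p-hat = x/n = 1/3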

$SSE = \sum_j (Y_j - \mu_j)^2$

independent $Y_j \sim N(\mu_j, \sigma^2)$, $j = 1, 2, \ldots, n$

$f(y_j) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(y_j - \mu_j)^2/(2\sigma^2)}$

likelihood $= \displaystyle\prod_{j=1}^{n} f(y_j \mid \mu_j) = (2\pi\sigma^2)^{-n/2} \exp\!\Big(-\sum_{j=1}^{n}(y_j - \mu_j)^2/(2\sigma^2)\Big)$

$-2\log(\text{likelihood}) = -2\log\displaystyle\prod_{j=1}^{n} f(y_j \mid \mu_j) = \sum_{j=1}^{n}(y_j - \mu_j)^2/\sigma^2 + n\log(2\pi\sigma^2)$

Deviance $= -2\log(\text{likelihood})$
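In R, -2 log(likelihood) for a fitted normal linear model can be read off with logLik(); a minimal sketch using the built-in cars data (not from the slides):

fit <- lm(dist ~ speed, data = cars)   # an ordinary linear model
-2 * as.numeric(logLik(fit))           # -2 log(likelihood) at the fitted values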

independent $Y_j \sim \mathrm{Poisson}(\mu_j)$, $j = 1, 2, \ldots, n$

$f(y) = \dfrac{\mu^y e^{-\mu}}{y!}$, $y = 0, 1, 2, \ldots$

likelihood $= \displaystyle\prod_{j=1}^{n} f(y_j \mid \mu_j) = \prod_{j=1}^{n} e^{-\mu_j}\mu_j^{y_j}/y_j!$

$-2\log(\text{likelihood}) = -2\displaystyle\sum_{j=1}^{n}\big(y_j\log\mu_j - \mu_j - \log(y_j!)\big)$

Link function (for the Poisson family): $\log(\mu_j) = \beta_0 + \beta_1 X_j$, linear modeling for the link function.

Substituting the link, $-2\log(\text{likelihood}) = 2\displaystyle\sum_{j=1}^{n}\big(e^{\beta_0 + \beta_1 x_j} - y_j(\beta_0 + \beta_1 x_j) + \log(y_j!)\big)$

$\mathrm{AIC} = -2\log(\text{likelihood}) + 2k$,  $k$: the number of parameters

Deviance $= -2\log(\text{likelihood})$
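Assuming the Starbucks model `res` fitted earlier is still in the workspace, the AIC relation can be checked directly (a sketch, not from the slides):

-2 * as.numeric(logLik(res)) + 2 * 2   # -2 log(likelihood) + 2k with k = 2; equals AIC(res) = 195.4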

independent $Y_j \sim \mathrm{Bin}(n, p_j)$, $j = 1, 2, \ldots, n$

$f(y) = f(y \mid p) = \binom{n}{y} p^y (1-p)^{n-y}$, $y = 0, 1, \ldots, n$

likelihood $= \displaystyle\prod_{j=1}^{n} f(y_j \mid p_j) = \prod_{j=1}^{n} \binom{n}{y_j} p_j^{y_j}(1-p_j)^{n-y_j}$

$-2\log(\text{likelihood}) = -2\displaystyle\sum_{j=1}^{n}\Big(y_j\log\dfrac{p_j}{1-p_j} + n\log(1-p_j) + \log\binom{n}{y_j}\Big)$

Link function (for the binomial family): $\log\dfrac{p_i}{1-p_i} = \beta_0 + \beta_1 X_i$, linear modeling for the link function.
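The same AIC relation holds for the binomial fit `gout` from the InsectSprays example, which has k = 6 coefficients; a sketch assuming that object is still available:

-2 * as.numeric(logLik(gout)) + 2 * 6   # equals AIC(gout) = 416.16 reported above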

Independence test in GLM for Australia rare plants data

> rareplants <- matrix(c(37,23,10,15,190,59,141,58,94,23,28,16),4,)
> dimnames(rareplants) <- list(c("CC","CR","RC","RR"), c("D","W","WD"))
> rareplants
> (sout <- chisq.test(rareplants))

        Pearson's Chi-squared test

data:  rareplants
X-squared = 34.9863, df = 6, p-value = 4.336e-06

> wdx <- rep(c("D","W","WD"), e=4)
> crx <- rep(c("CC","CR","RC","RR"), 3)
> rplants <- data.frame(wd=wdx, cr=crx, r=c(rareplants))
> anova( glm(r~wd*cr, family=poisson, data=rplants) )

Analysis of Deviance Table
Model: poisson, link: log, Response: r

Terms added sequentially (first to last)

      Df Deviance Resid. Df Resid. Dev
NULL                     11     522.11
wd     2   305.28         9     216.83
cr     3   181.88         6      34.95
wd:cr  6    34.95         0  -9.77e-15

      D    W    WD
CC    37   190  94
CR    23   59   23
RC    10   141  28
RR    15   58   16

The deviance (-2 log likelihood ratio) is approximately $\chi^2(df)$:

> 1 - pchisq(34.95, 6)
[1] 4.406699e-06

> # H0 : all four coins have the same proportion showing head side
> # H1 : at least one coin has a different proportion from the others
> head4 <- c( 83, 90, 129, 70 )
> toss4 <- c( 86, 93, 136, 82 )
> prop.test(head4, toss4)

        4-sample test for equality of proportions without continuity correction

X-squared = 12.6004, df = 3, p-value = 0.005585
alternative hypothesis: two.sided

> coins <- factor(LETTERS[1:4])
> anova(glm(cbind(head4, toss4-head4)~coins, family=binomial))

Analysis of Deviance Table
Terms added sequentially (first to last)

      Df Deviance Resid. Df Resid. Dev
NULL                      3     10.667
coins  3   10.667         0  1.132e-14

        Coin 1   Coin 2   Coin 3   Coin 4
Head    83       90       129      70       (Alive)
Tail    3        3        7        12       (Dead)
Total   86       93       136      82

The same table can be read as Alive/Dead counts for Hospital 1 to Hospital 4.

The deviance (-2 log likelihood ratio) is approximately $\chi^2(df)$:

> 1 - pchisq(10.667, 3)
[1] 0.01366980

Homogeneity test in GLM for coin tossing example

Thank you !!
