Logistic Regression Using R


Upload: soma1243

Post on 17-Feb-2018


7/23/2019 Logistic Regression Using R

http://slidepdf.com/reader/full/logistic-regression-using-r 1/31

 

Association Analysis, Logistic Regression, R and S-PLUS

Richard Mott

http://bioinformatics.well.ox.ac.uk/lectures/


Logistic Regression in Statistical Genetics

• Applicable to Association Studies
• Data:
  – Binary outcomes (eg disease status)
  – Dependent on genotypes + sex, environment
• Aim is to identify which factors influence the outcome
• Rigorous tests of statistical significance
• Flexible modelling language
• Generalisation of Chi-Squared Test


What is R?

• Statistical analysis package
• Free
• Similar to commercial package S-PLUS
• Runs on Unix, Windows, Mac
• www.r-project.org
• Many packages for statistical genetics, microarray analysis available in R
• Easily Programmable


Modelling in R

• Data for individual labelled $i = 1 \ldots n$:
  – Response $y_i$
  – Genotypes $g_{ij}$ for markers $j = 1 \ldots m$


Coding Unphased Genotypes

• Several possibilities:
  – AA, AG, GG original genotypes
  – 12, 21, 22
  – 1, 2, 3
  – 0, 1, 2 # of G alleles
• Missing Data
  – NA default in R
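
As a rough illustration of the last coding (number of G alleles), base R can count the G characters in each genotype string. The vector `geno` below is made-up data for this sketch, not taken from the lecture, and missing calls simply stay `NA`.

```r
# Hypothetical genotype calls for five individuals; NA marks a missing call
geno <- c("AA", "AG", "GG", NA, "AG")

# Number of G alleles = string length minus length after deleting every "G"
# AA -> 0, AG -> 1, GG -> 2; NA propagates through nchar() and gsub()
n_g <- nchar(geno) - nchar(gsub("G", "", geno, fixed = TRUE))

print(n_g)
```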


Using R

• Load genetic logistic regression tools
  – > source('logistic.R')
• Read data table from file
  – > t <- read.table('geno.dat', header=TRUE)
• Column names
  – names(t)
  – t$y response (0,1)
  – t$m1, t$m2, … Genotypes for each marker


Contingency Tables in R

• ftable(t$y,t$m1) prints the contingency table

> ftable(t$y,t$m1)
     11   12   22
0     …    …    …
1     …    …    …
>


Chi-Squared Test in R

> chisq.test(t$y,t$m1)

        Pearson's Chi-squared test

data:  t$y and t$m1
X-squared = …, df = 2, p-value = …

Warning message:
Chi-squared approximation may be incorrect in: chisq.test(t$y, t$m1)
>
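
The two steps above (tabulate, then test) can be tried without the lecture's data file. The sketch below uses simulated genotypes and responses, so the names `y` and `m1` and all the numbers it produces are illustrative assumptions only. It uses base R's `table()` and `chisq.test()`.

```r
# Simulated data for illustration only (not the lecture's geno.dat)
set.seed(1)
n  <- 300
y  <- rbinom(n, 1, 0.3)                          # binary response (0,1)
m1 <- factor(sample(c("11", "12", "22"), n, replace = TRUE,
                    prob = c(0.5, 0.4, 0.1)),
             levels = c("11", "12", "22"))       # unphased genotypes

tab <- table(y, m1)       # 2 x 3 contingency table of response by genotype
print(tab)

res <- chisq.test(tab)    # Pearson chi-squared test with (2-1)*(3-1) = 2 df
print(res)
```

A warning like the one on the slide appears when some expected cell counts are small, which is common for rare homozygotes.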


The Logistic Model

• $\mathrm{Prob}(Y_i = 1) = \exp(\eta_i)/(1+\exp(\eta_i))$
• $\eta_i = \sum_j x_{ij} b_j$ – Linear Predictor
• $x_{ij}$ – Design Matrix (genotypes etc)
• $b_j$ – Model Parameters (to be estimated)
• Model is investigated by
  – estimating the $b_j$'s by maximum likelihood
  – testing if the estimates are different from 0


The Logistic Function

$\mathrm{Prob}(Y_i = 1) = \exp(\eta_i)/(1+\exp(\eta_i))$

[Figure: Prob(Y=1) plotted against η]
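
The logistic function is easy to evaluate directly. The short sketch below computes it by hand and with base R's `plogis()`, which implements the same formula; the three values of `eta` are arbitrary choices for illustration.

```r
# Linear predictor values chosen for illustration
eta <- c(-2, 0, 2)

# Logistic function written out as on the slide
p_manual <- exp(eta) / (1 + exp(eta))

# Base R's built-in logistic distribution function gives the same values
p_builtin <- plogis(eta)

print(round(p_manual, 2))   # 0.12 0.50 0.88
```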


Types of genetic effect at a single locus

            AA   AG   GG
Recessive    0    0    1
Dominant     0    1    1
Additive     0    1    2
Genotype     0    α    β


Additive Genotype Model

• Code genotypes as
  – AA x=0,
  – AG x=1,
  – GG x=2
• Linear Predictor $\eta = b_0 + x b_1$
• $P(Y=1|x) = \exp(b_0 + x b_1)/(1+\exp(b_0 + x b_1))$
• $P_{AA} = P(Y=1|x=0) = \exp(b_0)/(1+\exp(b_0))$
• $P_{AG} = P(Y=1|x=1) = \exp(b_0 + b_1)/(1+\exp(b_0 + b_1))$
• $P_{GG} = P(Y=1|x=2) = \exp(b_0 + 2b_1)/(1+\exp(b_0 + 2b_1))$


Additive Model: $b_0 = -2$, $b_1 = 2$

$P_{AA} = .12$   $P_{AG} = .5$   $P_{GG} = .88$

[Figure: Prob(Y=1) plotted against η]


Additive Model: $b_0 = 0$, $b_1 = 2$

$P_{AA} = .5$   $P_{AG} = .88$   $P_{GG} = .98$

[Figure: Prob(Y=1) plotted against η]
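
The two additive examples can be checked numerically. In this sketch the helper name `additive_prob` is invented for illustration; it evaluates the logistic function at $b_0 + x b_1$ for x = 0, 1, 2.

```r
# Additive coding: number of G alleles for AA, AG, GG
x <- c(0, 1, 2)

# Hypothetical helper: genotype probabilities under the additive model
additive_prob <- function(b0, b1) plogis(b0 + b1 * x)

p_a <- additive_prob(-2, 2)   # first example:  b0 = -2, b1 = 2
p_b <- additive_prob(0, 2)    # second example: b0 = 0,  b1 = 2

print(round(p_a, 2))   # 0.12 0.50 0.88
print(round(p_b, 2))   # 0.50 0.88 0.98
```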


Recessive Model

• Code genotypes as
  – AA x=0,
  – AG x=0,
  – GG x=1
• Linear Predictor $\eta = b_0 + x b_1$
• $P(Y=1|x) = \exp(b_0 + x b_1)/(1+\exp(b_0 + x b_1))$
• $P_{AA} = P_{AG} = P(Y=1|x=0) = \exp(b_0)/(1+\exp(b_0))$
• $P_{GG} = P(Y=1|x=1) = \exp(b_0 + b_1)/(1+\exp(b_0 + b_1))$


Recessive Model: $b_0 = 0$, $b_1 = 2$

$P_{AA} = P_{AG} = .5$   $P_{GG} = .88$

[Figure: Prob(Y=1) plotted against η]


Genotype Model

• Each genotype has an independent probability
• Code genotypes as (for example)
  – AA x=0, y=0
  – AG x=1, y=0
  – GG x=0, y=1
• Linear Predictor $\eta = b_0 + x b_1 + y b_2$, two parameters
• $P(Y=1|x,y) = \exp(b_0 + x b_1 + y b_2)/(1+\exp(b_0 + x b_1 + y b_2))$
• $P_{AA} = P(Y=1|x=0,y=0) = \exp(b_0)/(1+\exp(b_0))$
• $P_{AG} = P(Y=1|x=1,y=0) = \exp(b_0 + b_1)/(1+\exp(b_0 + b_1))$
• $P_{GG} = P(Y=1|x=0,y=1) = \exp(b_0 + b_2)/(1+\exp(b_0 + b_2))$


Genotype Model: $b_0 = 0$, $b_1 = 2$, $b_2 = -1$

$P_{AA} = .5$   $P_{AG} = .88$   $P_{GG} = .27$

[Figure: Prob(Y=1) plotted against η]
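
The genotype-model example can be reproduced from its two indicator variables. The data frame `design` and the vector `b` below are names chosen for this sketch; the coding and parameter values are the ones quoted in the example.

```r
# Indicator coding for the three genotypes (x marks AG, y marks GG)
design <- data.frame(genotype = c("AA", "AG", "GG"),
                     x = c(0, 1, 0),
                     y = c(0, 0, 1))

# Parameter values from the example: b0 = 0, b1 = 2, b2 = -1
b <- c(b0 = 0, b1 = 2, b2 = -1)

# Linear predictor and probability for each genotype
eta <- b[["b0"]] + design$x * b[["b1"]] + design$y * b[["b2"]]
p   <- plogis(eta)

print(round(p, 2))   # 0.50 0.88 0.27
```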


Models in R

response y, genotype g

            AA   AG   GG   model              DF
Recessive    0    0    1   y ~ recessive(g)    1
Dominant     0    1    1   y ~ dominant(g)     1
Additive     0    1    2   y ~ additive(g)     1
Genotype     0    α    β   y ~ genotype(g)     2


Data Transformation

• g <- t$m1
• use these functions to treat a genotype vector in a certain way:
  – a <- additive(g)
  – r <- recessive(g)
  – d <- dominant(g)
  – g <- genotype(g)
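
The definitions of these functions live in the lecture's logistic.R, which is not reproduced here. The sketch below is only a guess at how such helpers could be written, assuming genotypes are stored as the strings "11", "12", "22" and that allele 2 is the one being counted; the real functions may differ.

```r
# ASSUMPTION: genotypes are coded "11", "12", "22"; NA marks missing data.
# These are hypothetical stand-ins, not the contents of logistic.R.
geno_levels <- c("11", "12", "22")

additive  <- function(g) match(as.character(g), geno_levels) - 1L  # 0, 1, 2
recessive <- function(g) as.integer(additive(g) == 2L)             # 0, 0, 1
dominant  <- function(g) as.integer(additive(g) >= 1L)             # 0, 1, 1
genotype  <- function(g) factor(as.character(g), levels = geno_levels)

g <- c("11", "12", "22", NA)
print(additive(g))
print(recessive(g))
print(dominant(g))
print(genotype(g))
```

Returning a factor from `genotype()` lets `glm()` build the two indicator columns itself, which is what gives that model its 2 degrees of freedom.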


Fitting the Model

• afit <- glm( t$y ~ additive(g), family='binomial')
• rfit <- glm( t$y ~ recessive(g), family='binomial')
• dfit <- glm( t$y ~ dominant(g), family='binomial')
• gfit <- glm( t$y ~ genotype(g), family='binomial')
• Equivalent models:
  – genotype = dominant + recessive
  – genotype = additive + recessive
  – genotype = additive + dominant
  – genotype ~ standard chi-squared test of genotype association
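
The four fits, and the claim that the genotype model equals dominant + recessive, can be tried on simulated data. Everything below (sample size, allele counts `x`, effect sizes) is an assumption made for this sketch, and the codings are written inline rather than through the lecture's helper functions.

```r
# Simulated data for illustration only
set.seed(42)
n <- 500
x <- sample(0:2, n, replace = TRUE, prob = c(0.49, 0.42, 0.09))  # G allele count
y <- rbinom(n, 1, plogis(-1 + 0.8 * x))                          # disease status

afit <- glm(y ~ x,         family = "binomial")   # additive, 1 df
rfit <- glm(y ~ I(x == 2), family = "binomial")   # recessive, 1 df
dfit <- glm(y ~ I(x >= 1), family = "binomial")   # dominant, 1 df
gfit <- glm(y ~ factor(x), family = "binomial")   # genotype, 2 df

# Dominant + recessive spans the same model space as the genotype model,
# so the two fits should reach the same residual deviance
drfit <- glm(y ~ I(x >= 1) + I(x == 2), family = "binomial")
print(c(genotype = deviance(gfit), dom_plus_rec = deviance(drfit)))
```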


Parameter Estimates

> summary(glm( t$y ~ genotype(t$m1), family='binomial'))

Coefficients:
                       Estimate Std. Error z value Pr(>|z|)
b0 (Intercept)                …          …       …   <2e-16 ***
b1 genotype(t$m1)12           …          …       …        … .
b2 genotype(t$m1)22           …          …       …        …
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>


Analysis of Deviance
Chi-Squared Test

> anova(glm( t$y ~ genotype(t$m1), family='binomial'))
Analysis of Deviance Table

Model: binomial, link: logit

Response: t$y

Terms added sequentially (first to last)

                 Df Deviance Resid. Df Resid. Dev
NULL                                 …          …
genotype(t$m1)    2        …         …          …


Model Comparison

• Compare general model with additive, dominant or recessive models:

> afit <- glm(t$y ~ additive(t$m2), family='binomial')
> gfit <- glm(t$y ~ genotype(t$m2), family='binomial')
> anova(afit,gfit)
Analysis of Deviance Table

Model 1: t$y ~ additive(t$m2)
Model 2: t$y ~ genotype(t$m2)
  Resid. Df Resid. Dev Df Deviance
1         …          …
2         …          …  1        …
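
To turn the deviance difference into a p-value, `anova()` can be asked for a chi-squared test. The sketch below uses simulated data (all names and effect sizes are assumptions for illustration) and also computes the same p-value by hand with `pchisq()`.

```r
# Simulated data for illustration only
set.seed(7)
n <- 500
x <- sample(0:2, n, replace = TRUE, prob = c(0.4, 0.4, 0.2))   # G allele count
y <- rbinom(n, 1, plogis(-1 + 0.5 * x))

afit <- glm(y ~ x,         family = "binomial")   # additive model
gfit <- glm(y ~ factor(x), family = "binomial")   # general genotype model

# Additive is nested in genotype; the extra parameter costs 1 df
cmp <- anova(afit, gfit, test = "Chisq")
print(cmp)

# The same test by hand: drop in deviance referred to chi-squared on 1 df
dev_diff <- deviance(afit) - deviance(gfit)
p_manual <- pchisq(dev_diff, df = 1, lower.tail = FALSE)
print(p_manual)
```

A large p-value here suggests the simpler additive model describes the marker adequately.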


Scanning all Markers

> logscan(t,model='additive')
      Deviance  DF  Pval  LogPval
m1           …   1     …        …
m2           …   1     …        …
m3           …   1     …        …
m4           …   1     …        …
m5           …   1     …        …
m6           …   1     …        …
m7           …   1     …        …
m8           …   1     …        …
m9           …   1     …        …
m10          …   1     …        …
m11          …   1     …        …
…
…
…
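
The implementation of `logscan` is not shown in the lecture. A scan of this kind could be sketched as a loop over markers that fits the additive model to each and records the deviance explained. The function name `scan_additive`, the simulated data, and the choice of -log10 for the LogPval column are all assumptions of this sketch.

```r
# Simulated data for illustration only: 5 markers coded as G allele counts
set.seed(3)
n <- 400
m <- 5
geno <- as.data.frame(matrix(sample(0:2, n * m, replace = TRUE), n, m,
                             dimnames = list(NULL, paste0("m", 1:m))))
y <- rbinom(n, 1, plogis(-1 + 0.7 * geno$m3))   # only m3 has a true effect

# Hypothetical scan: one additive logistic fit per marker
scan_additive <- function(y, geno) {
  null_dev <- deviance(glm(y ~ 1, family = "binomial"))
  t(sapply(geno, function(x) {
    dev <- null_dev - deviance(glm(y ~ x, family = "binomial"))
    p   <- pchisq(dev, df = 1, lower.tail = FALSE)
    c(Deviance = dev, DF = 1, Pval = p, LogPval = -log10(p))
  }))
}

res <- scan_additive(y, geno)
print(res)
```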


Multilocus Models

• Can test the effects of fitting two or more markers simultaneously
• Several multilocus models are possible
• Interaction Model assumes that each combination of genotypes has a different effect
• eg t$y ~ t$m1 * t$m15


Multi-Locus Models

> f <- glm( t$y ~ genotype(t$m1) * genotype(t$m26), family='binomial')
> anova(f)
Analysis of Deviance Table

Model: binomial, link: logit

Response: t$y

Terms added sequentially (first to last)

                                 Df Deviance Resid. Df Resid. Dev
NULL                                                 …          …
genotype(t$m1)                    2        …         …          …
genotype(t$m26)                   2        …         …          …
genotype(t$m1):genotype(t$m26)    …        …         …          …

> pchisq(…, …, lower.tail=F)   # calculate p-value
[1] …
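
The interaction test can be reproduced on simulated data. In this sketch the markers `g1` and `g2`, the sample size and the null response are assumptions for illustration; the deviance and degrees of freedom for the interaction row are read from the `anova()` table and passed to `pchisq()`.

```r
# Simulated data for illustration only: two unlinked markers, no true effect
set.seed(11)
n  <- 600
lv <- c("11", "12", "22")
g1 <- factor(sample(lv, n, replace = TRUE, prob = c(0.4, 0.4, 0.2)), levels = lv)
g2 <- factor(sample(lv, n, replace = TRUE, prob = c(0.4, 0.4, 0.2)), levels = lv)
y  <- rbinom(n, 1, 0.3)

# Main effects plus interaction: every genotype combination gets its own effect
f   <- glm(y ~ g1 * g2, family = "binomial")
tab <- anova(f, test = "Chisq")
print(tab)

# p-value for the interaction term from its deviance and df
int_dev <- tab["g1:g2", "Deviance"]
int_df  <- tab["g1:g2", "Df"]
p_int   <- pchisq(int_dev, int_df, lower.tail = FALSE)
print(p_int)
```

With all nine genotype combinations observed, the interaction uses (3-1)*(3-1) = 4 degrees of freedom; fewer are available when some combinations are absent.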


Adding the effects of Sex and other Covariates

• Read in sex and other covariate data, eg. age, from a file into variables, say a$sex, a$age
• Fit models of the form
• fit1 <- glm(t$y ~ additive(t$m1) + a$sex + a$age, family='binomial')
• fit2 <- glm(t$y ~ a$sex + a$age, family='binomial')


Adding the effects of Sex and other Covariates

• Compare models using anova – test if the effect of the marker m1 is significant after taking into account sex and age
• anova(fit1,fit2)
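
A covariate-adjusted marker test can be sketched with simulated data. The variables `sex`, `age` and `x` below, and their effect sizes, are assumptions for illustration; `test = "Chisq"` is added so that `anova()` prints a p-value for the marker term.

```r
# Simulated data for illustration only
set.seed(5)
n   <- 500
sex <- factor(sample(c("F", "M"), n, replace = TRUE))
age <- round(runif(n, 20, 70))
x   <- sample(0:2, n, replace = TRUE, prob = c(0.4, 0.4, 0.2))  # G allele count
y   <- rbinom(n, 1, plogis(-2 + 0.6 * x + 0.02 * age))

fit1 <- glm(y ~ x + sex + age, family = "binomial")  # marker + covariates
fit2 <- glm(y ~ sex + age,     family = "binomial")  # covariates only

# Smaller model first: the second row tests the marker after sex and age
cmp <- anova(fit2, fit1, test = "Chisq")
print(cmp)
```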


Multiple Testing

• Take care interpreting significance levels when performing multiple tests
• Linkage disequilibrium can reduce the effective number of independent tests
• Permutation is a safe procedure to determine significance
• Repeat j = 1..N times:
  – Permute disease status y between individuals
  – Fit all markers
  – Record maximum deviance maxdev[j] over all markers
• Permutation p-value for a marker is the proportion of times the permuted maximum deviance across all markers exceeds the observed deviance for the marker
  – logscan(t,permute=…) slow!
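
The permutation procedure above can be sketched in a few lines. This is not the lecture's `logscan` code; the helper `marker_dev`, the simulated data and the choice of N = 200 permutations are assumptions made to keep the example small and fast.

```r
# Simulated data for illustration only: 4 markers coded as G allele counts
set.seed(9)
n <- 200
m <- 4
geno <- matrix(sample(0:2, n * m, replace = TRUE), n, m,
               dimnames = list(NULL, paste0("m", 1:m)))
y <- rbinom(n, 1, plogis(-1 + 0.8 * geno[, "m2"]))   # only m2 has a true effect

# Hypothetical helper: deviance explained by each marker (additive model)
marker_dev <- function(y, geno) {
  apply(geno, 2, function(x) {
    fit <- glm(y ~ x, family = "binomial")
    fit$null.deviance - fit$deviance
  })
}

obs <- marker_dev(y, geno)          # observed deviance per marker

# Permute disease status, refit all markers, keep the maximum deviance
N <- 200
maxdev <- replicate(N, max(marker_dev(sample(y), geno)))

# Proportion of permutations whose maximum exceeds each observed deviance
perm_p <- sapply(obs, function(d) mean(maxdev > d))
print(perm_p)
```

Because each permutation keeps the genotype matrix intact, correlation between markers (linkage disequilibrium) is preserved in the null distribution.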


Haplotype Association

• Haplotype Association
  – Different from multiple genotype models
  – Phase taken into account
  – Haplotype association can be modelled in a similar logistic framework
• Treat haplotypes as extended alleles
• Fit additive, recessive, dominant & genotype models as before
  – Eg haplotypes are h = AAGCAT, ATGCTT, etc
  – y ~ additive(h)
  – y ~ dominant(h) etc