Logistic Regression Using R
TRANSCRIPT
7/23/2019 Logistic Regression Using R
http://slidepdf.com/reader/full/logistic-regression-using-r 1/31
Association Analysis,
Logistic Regression,
R and S-PLUS
Richard Mott
http://bioinformatics.well.ox.ac.uk/lectures/

Logistic Regression in Statistical Genetics
• Applicable to Association Studies
• Data:
– Binary outcomes (eg disease status)
– Dependent on genotypes, sex, environment
• Aim is to identify which factors influence the outcome
• Rigorous tests of statistical significance
• Flexible modelling language
• Generalisation of Chi-Squared Test

What is R?
• Statistical analysis package
• Free
• Similar to commercial package S-PLUS
• Runs on Unix, Windows, Mac
• www.r-project.org
• Many packages for statistical genetics, microarray analysis available in R
• Easily Programmable

Modelling in R
• Data for individual labelled i = 1…n:
– Response y_i
– Genotypes g_ij for markers j = 1..m

Coding Unphased Genotypes
• Several possibilities:
– AA, AG, GG original genotypes
– 11, 12, 22
– 1, 2, 3
– 0, 1, 2 # of G alleles
• Missing Data
– NA default in R
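The codings listed above can be derived from one another in a few lines of R. This is a minimal sketch, assuming genotypes arrive as the strings AA, AG and GG; the vector `g` is made-up example data, not from the lecture.

```r
# Made-up example genotypes; NA marks a missing call.
g <- c("AA", "AG", "GG", NA, "AG", "AA")

# Keep the original labels as a factor with a fixed level order.
g_factor <- factor(g, levels = c("AA", "AG", "GG"))

# Numeric codes 1, 2, 3 in that order.
g_code <- as.integer(g_factor)

# Number of G alleles: 0, 1 or 2. Missing genotypes stay NA.
g_count <- g_code - 1L

print(g_count)
print(table(g_factor, useNA = "ifany"))
```

Fixing the factor levels explicitly keeps the coding stable even when one genotype class happens to be absent from the data.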

Using R
• Load genetic logistic regression tools
> source('logistic.R')
• Read data table from file
> t <- read.table('geno.dat', header=TRUE)
• Column names
– names(t)
– t$y response (0,1)
– t$m1, t$m2, … Genotypes for each marker
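To try the reading step without the lecture's files, a small table can be written to a temporary file first. This sketch assumes a whitespace-separated layout with a header row; the file contents are invented, and `logistic.R` itself is not reproduced here.

```r
# Write a tiny made-up table to a temporary file so the example is self-contained.
path <- tempfile(fileext = ".dat")
writeLines(c("y m1 m2",
             "0 11 12",
             "1 12 22",
             "0 NA 11",
             "1 22 12"), path)

# colClasses keeps genotype codes such as 11 as text rather than numbers.
# Note: the name t (used on the slides) shadows base R's t() for data objects only;
# calls like t(x) still find the transpose function.
t <- read.table(path, header = TRUE,
                colClasses = c("integer", "character", "character"))

names(t)
str(t)
```

The string NA in the file is read as a missing value, matching the default missing-data code mentioned earlier.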

Contingency Tables in R
• ftable(t$y,t$m1) prints the contingency table
> ftable(t$y,t$m1)
    11  12  22

0  515 387  75
1   28  11   2
>
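A contingency table like this one can be rebuilt from individual-level data with `table()` or `ftable()`. The sketch below uses illustrative counts expanded into one row per individual; the data frame is an assumption made for this example, not the lecture's `geno.dat`.

```r
# Illustrative counts, expanded to one record per individual.
n_per_cell <- c(515, 28, 387, 11, 75, 2)
y  <- rep(c(0, 1, 0, 1, 0, 1), times = n_per_cell)
m1 <- rep(c("11", "11", "12", "12", "22", "22"), times = n_per_cell)
t  <- data.frame(y = y, m1 = m1)

# Cross-tabulate outcome against genotype.
tab <- table(t$y, t$m1)
print(ftable(tab))

# Proportion of each outcome within each genotype column.
print(round(prop.table(tab, margin = 2), 3))
```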

Chi-Squared Test in R
> chisq.test(t$y,t$m1)
Pearson's Chi-squared test
data: t$y and t$m1
X-squared = 3.8424, df = 2, p-value = 0.1464
Warning message:
Chi-squared approximation may be incorrect in: chisq.test(t$y, t$m1)
>
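The warning from `chisq.test` usually means that some expected cell counts are small. A hedged sketch with an illustrative 2 x 3 table of counts (an assumption for this example) shows how to inspect the expected counts and obtain a simulation-based p-value instead.

```r
# Illustrative 2 x 3 table: rows are outcome 0/1, columns are genotypes.
counts <- matrix(c(515, 387, 75,
                    28,  11,  2),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(y = c("0", "1"), m1 = c("11", "12", "22")))

# Pearson test; the warning is suppressed because its cause is inspected below.
res <- suppressWarnings(chisq.test(counts))
res$statistic
res$p.value

# Expected counts under independence. Small values (a common rule of thumb
# is below 5) are what trigger the warning.
res$expected

# Monte Carlo p-value that does not rely on the chi-squared approximation.
set.seed(1)
sim <- chisq.test(counts, simulate.p.value = TRUE, B = 2000)
sim$p.value
```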

The Logistic Model
• Prob(Y_i = 1) = exp(η_i) / (1 + exp(η_i))
• η_i = Σ_j x_ij b_j – Linear Predictor
• x_ij – Design Matrix (genotypes etc)
• b_j – Model Parameters (to be estimated)
• Model is investigated by
– estimating the b_j's by maximum likelihood
– testing if the estimates are different from 0
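The model above can be written out directly in R. This is a small sketch with an invented design matrix and illustrative parameter values; `plogis` is base R's inverse logit and gives the same result as the explicit formula.

```r
# Invented design matrix: an intercept column plus a genotype score for 3 individuals.
X <- cbind(intercept = 1, g = c(0, 1, 2))

# Illustrative parameter values b_0 and b_1 (not estimates from real data).
b <- c(-2, 2)

# Linear predictor eta_i = sum_j x_ij * b_j.
eta <- drop(X %*% b)

# Logistic transform, written out and via the built-in function.
p <- exp(eta) / (1 + exp(eta))
print(cbind(eta = eta, p = p, plogis = plogis(eta)))
```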

The Logistic Function
Prob(Y_i = 1) = exp(η_i) / (1 + exp(η_i))
[Plot: Prob(Y = 1) against η]

Types of genetic effect at a single locus
           AA  AG  GG
Recessive   0   0   1
Dominant    0   1   1
Additive    0   1   2
Genotype    0   α   β

Additive Genotype Model
• Code genotypes as
– AA x=0
– AG x=1
– GG x=2
• Linear Predictor η = b_0 + x b_1
• P(Y=1|x) = exp(b_0 + x b_1) / (1 + exp(b_0 + x b_1))
• P_AA = P(Y=1|x=0) = exp(b_0) / (1 + exp(b_0))
• P_AG = P(Y=1|x=1) = exp(b_0 + b_1) / (1 + exp(b_0 + b_1))
• P_GG = P(Y=1|x=2) = exp(b_0 + 2b_1) / (1 + exp(b_0 + 2b_1))

Additive Model: b_0 = -2, b_1 = 2
P_AA = 0.12, P_AG = 0.5, P_GG = 0.88
[Plot: Prob(Y = 1) against η]

Additive Model: b_0 = 0, b_1 = 2
P_AA = 0.5, P_AG = 0.88, P_GG = 0.98
[Plot: Prob(Y = 1) against η]
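The genotype probabilities for an additive model follow from evaluating the logistic function at x = 0, 1, 2. A short sketch, where `additive_prob` is a helper name invented for this example:

```r
# Probability of Y = 1 for genotypes AA, AG, GG under an additive model.
additive_prob <- function(b0, b1, x = 0:2) {
  p <- plogis(b0 + b1 * x)
  names(p) <- c("AA", "AG", "GG")
  p
}

p1 <- additive_prob(-2, 2)
p2 <- additive_prob(0, 2)
print(round(p1, 2))
print(round(p2, 2))

# Under the additive model each extra G allele multiplies the odds by exp(b1).
odds <- p1 / (1 - p1)
print(odds[-1] / odds[-3])
print(exp(2))
```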

Recessive Model
• Code genotypes as
– AA x=0
– AG x=0
– GG x=1
• Linear Predictor η = b_0 + x b_1
• P(Y=1|x) = exp(b_0 + x b_1) / (1 + exp(b_0 + x b_1))
• P_AA = P_AG = P(Y=1|x=0) = exp(b_0) / (1 + exp(b_0))
• P_GG = P(Y=1|x=1) = exp(b_0 + b_1) / (1 + exp(b_0 + b_1))

Recessive Model: b_0 = 0, b_1 = 2
P_AA = P_AG = 0.5, P_GG = 0.88
[Plot: Prob(Y = 1) against η]

Genotype Model
• Each genotype has an independent probability
• Code genotypes as (for example)
– AA x=0, y=0
– AG x=1, y=0
– GG x=0, y=1
• Linear Predictor η = b_0 + x b_1 + y b_2 (two parameters)
• P(Y=1|x,y) = exp(b_0 + x b_1 + y b_2) / (1 + exp(b_0 + x b_1 + y b_2))
• P_AA = P(Y=1|x=0,y=0) = exp(b_0) / (1 + exp(b_0))
• P_AG = P(Y=1|x=1,y=0) = exp(b_0 + b_1) / (1 + exp(b_0 + b_1))
• P_GG = P(Y=1|x=0,y=1) = exp(b_0 + b_2) / (1 + exp(b_0 + b_2))

Genotype Model: b_0 = 0, b_1 = 2, b_2 = -1
P_AA = 0.5, P_AG = 0.88, P_GG = 0.27
[Plot: Prob(Y = 1) against η]
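The recessive and genotype codings can be checked the same way as the additive one. In this sketch the indicator names `x_gg` and `x_ag` are invented, chosen to avoid confusion with the response, and the parameter values are illustrative.

```r
g <- c("AA", "AG", "GG")

# Recessive coding: only GG differs from the baseline.
x_rec <- as.numeric(g == "GG")
p_rec <- plogis(0 + 2 * x_rec)
names(p_rec) <- g
print(round(p_rec, 2))

# Genotype coding: one indicator per non-baseline genotype,
# so AG and GG each get their own parameter.
x_ag <- as.numeric(g == "AG")
x_gg <- as.numeric(g == "GG")
p_gen <- plogis(0 + 2 * x_ag - 1 * x_gg)
names(p_gen) <- g
print(round(p_gen, 2))
```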

Models in R
response y, genotype g
           AA  AG  GG  model              DF
Recessive   0   0   1  y ~ recessive(g)    1
Dominant    0   1   1  y ~ dominant(g)     1
Additive    0   1   2  y ~ additive(g)     1
Genotype    0   α   β  y ~ genotype(g)     2

Data Transformation
• g <- t$m1
• use these functions to treat a genotype vector in a certain way:
– a <- additive(g)
– r <- recessive(g)
– d <- dominant(g)
– g <- genotype(g)
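The transformation functions come from the lecture's `logistic.R`, which is not shown. The definitions below are hypothetical stand-ins written for illustration only, assuming genotypes are coded as the strings "11", "12", "22" with allele 2 playing the role of G; the real implementations may differ.

```r
# Hypothetical stand-ins for the helpers in logistic.R (not the originals).
# Count of allele 2 (0, 1 or 2); unknown or missing codes become NA.
n_alt <- function(g) match(as.character(g), c("11", "12", "22")) - 1L

additive  <- function(g) n_alt(g)                       # 0, 1, 2
recessive <- function(g) as.integer(n_alt(g) == 2L)     # 0, 0, 1
dominant  <- function(g) as.integer(n_alt(g) >= 1L)     # 0, 1, 1
genotype  <- function(g) factor(as.character(g),        # one level per genotype
                                levels = c("11", "12", "22"))

g <- c("11", "12", "22", NA, "12")
print(data.frame(g,
                 additive  = additive(g),
                 recessive = recessive(g),
                 dominant  = dominant(g),
                 genotype  = genotype(g)))
```

Returning a factor from `genotype` lets `glm` build the two indicator columns itself, which is why that model uses 2 degrees of freedom.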

Fitting the Model
• afit <- glm( t$y ~ additive(g), family='binomial')
• rfit <- glm( t$y ~ recessive(g), family='binomial')
• dfit <- glm( t$y ~ dominant(g), family='binomial')
• gfit <- glm( t$y ~ genotype(g), family='binomial')
• Equivalent models:
– genotype = dominant + recessive
– genotype = additive + recessive
– genotype = additive + dominant
– genotype ~ standard chi-squared test of genotype association
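The equivalences listed above can be checked numerically: any two of the one-parameter codings together span the same model as the genotype factor, so the residual deviances agree. A sketch on illustrative data, with the codings built inline rather than through `logistic.R`:

```r
# Illustrative data: one record per individual.
n_per_cell <- c(515, 28, 387, 11, 75, 2)
y <- rep(c(0, 1, 0, 1, 0, 1), times = n_per_cell)
g <- rep(c("11", "11", "12", "12", "22", "22"), times = n_per_cell)

# Inline versions of the four codings (assumed, not the lecture's helpers).
n_alt <- match(g, c("11", "12", "22")) - 1L
add <- n_alt
rec <- as.integer(n_alt == 2L)
dom <- as.integer(n_alt >= 1L)
gen <- factor(g)

gfit   <- glm(y ~ gen,       family = binomial)
dr_fit <- glm(y ~ dom + rec, family = binomial)
ar_fit <- glm(y ~ add + rec, family = binomial)
ad_fit <- glm(y ~ add + dom, family = binomial)

# All four residual deviances should coincide.
print(c(genotype = deviance(gfit), dom_rec = deviance(dr_fit),
        add_rec = deviance(ar_fit), add_dom = deviance(ad_fit)))

# Likelihood-ratio statistic for the genotype model versus the Pearson statistic.
print(gfit$null.deviance - deviance(gfit))
print(unname(suppressWarnings(chisq.test(table(y, g)))$statistic))
```

The likelihood-ratio and Pearson statistics are close but not identical, which is the sense in which the genotype model matches the standard chi-squared test.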

Parameter Estimates
> summary(glm( t$y ~ genotype(t$m1), family='binomial'))
Coefficients:
                        Estimate Std. Error z value Pr(>|z|)
b_0 (Intercept)          -2.9120     0.1941 -15.006   <2e-16 ***
b_1 genotype(t$m1)12     -0.6486     0.3621  -1.791   0.0733 .
b_2 genotype(t$m1)22     -0.7124     0.7423  -0.960   0.3372
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
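In the genotype model the coefficients have a direct reading: the intercept is the log odds of disease in the baseline genotype, and each other coefficient is a log odds ratio against that baseline. A sketch on illustrative counts (assumed for this example):

```r
# Illustrative data: one record per individual.
n_per_cell <- c(515, 28, 387, 11, 75, 2)
y <- rep(c(0, 1, 0, 1, 0, 1), times = n_per_cell)
g <- factor(rep(c("11", "11", "12", "12", "22", "22"), times = n_per_cell))

fit <- glm(y ~ g, family = binomial)
print(coef(summary(fit)))

# The same quantities computed by hand from the cell counts.
by_hand <- c(b0 = log(28 / 515),
             b1 = log((11 / 387) / (28 / 515)),
             b2 = log((2 / 75) / (28 / 515)))
print(by_hand)

# Odds ratios relative to genotype 11.
print(exp(coef(fit)[-1]))
```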

Analysis of Deviance
Chi-Squared Test
> anova(glm( t$y ~ genotype(t$m1), family='binomial'))
Analysis of Deviance Table
Model: binomial, link: logit
Response: t$y
Terms added sequentially (first to last)
               Df Deviance Resid. Df Resid. Dev
NULL                            1017     343.71
genotype(t$m1)  2     3.96      1015     339.76
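The deviance in this table is a likelihood-ratio statistic, so a p-value follows by comparing it with a chi-squared distribution on the stated degrees of freedom. A sketch on illustrative data; passing `test = "Chisq"` to `anova` asks R to add that p-value to the table.

```r
# Illustrative data: one record per individual.
n_per_cell <- c(515, 28, 387, 11, 75, 2)
y <- rep(c(0, 1, 0, 1, 0, 1), times = n_per_cell)
g <- factor(rep(c("11", "11", "12", "12", "22", "22"), times = n_per_cell))

fit <- glm(y ~ g, family = binomial)

# Table with a chi-squared p-value column added.
print(anova(fit, test = "Chisq"))

# The same test by hand: drop in deviance and drop in residual df.
dev <- fit$null.deviance - deviance(fit)
df  <- fit$df.null - fit$df.residual
p   <- pchisq(dev, df, lower.tail = FALSE)
print(c(deviance = dev, df = df, p = p))
```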

Model Comparison
• Compare general model with additive, dominant or recessive models:
> afit <- glm(t$y ~ additive(t$m2), family='binomial')
> gfit <- glm(t$y ~ genotype(t$m2), family='binomial')
> anova(afit,gfit)
Analysis of Deviance Table
Model 1: t$y ~ additive(t$m2)
Model 2: t$y ~ genotype(t$m2)
  Resid. Df Resid. Dev Df Deviance
1      1016 [illegible]
2      1015 [illegible]  1 [illegible]
>
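Comparing the additive and genotype fits tests whether the extra parameter of the genotype model is needed. The sketch below uses illustrative data and inline codings (assumptions for this example), with the smaller model listed first so that the Df and Deviance columns come out positive.

```r
# Illustrative data: one record per individual.
n_per_cell <- c(515, 28, 387, 11, 75, 2)
y <- rep(c(0, 1, 0, 1, 0, 1), times = n_per_cell)
g <- rep(c("11", "11", "12", "12", "22", "22"), times = n_per_cell)
add <- match(g, c("11", "12", "22")) - 1L   # allele count, 1 parameter
gen <- factor(g)                            # free genotype effects, 2 parameters

afit <- glm(y ~ add, family = binomial)
gfit <- glm(y ~ gen, family = binomial)

cmp <- anova(afit, gfit, test = "Chisq")
print(cmp)

# By hand: difference in residual deviance on 1 degree of freedom.
lr <- deviance(afit) - deviance(gfit)
print(pchisq(lr, df = 1, lower.tail = FALSE))
```

A large p-value here would suggest the additive model is an adequate summary of the genotype effects.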

Scanning all Markers
> logscan(t,model='additive')
     Deviance DF Pval LogPval
[rows m1 to m11, each with DF 1; the Deviance, Pval and LogPval values are illegible in the transcript]
…
…
…
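The source of `logscan` is not shown, so the following is only a guess at what such a scan does: fit one logistic model per marker and tabulate the deviance, degrees of freedom, p-value and -log10 p-value. The function name `scan_markers`, the simulated data and the additive coding are all assumptions.

```r
# Simulated data: 5 markers coded as allele counts; only m3 affects the outcome.
set.seed(1)
n <- 500
dat <- as.data.frame(replicate(5, rbinom(n, 2, 0.3)))
names(dat) <- paste0("m", 1:5)
dat$y <- rbinom(n, 1, plogis(-1 + 0.8 * dat$m3))

# Hypothetical scan: one additive logistic fit per marker.
scan_markers <- function(dat, response = "y") {
  markers <- setdiff(names(dat), response)
  rows <- lapply(markers, function(m) {
    keep <- !is.na(dat[[m]]) & !is.na(dat[[response]])
    y <- dat[[response]][keep]
    x <- dat[[m]][keep]
    fit <- glm(y ~ x, family = binomial)
    dev <- fit$null.deviance - deviance(fit)   # likelihood-ratio statistic
    df  <- fit$df.null - fit$df.residual
    p   <- pchisq(dev, df, lower.tail = FALSE)
    data.frame(Deviance = dev, DF = df, Pval = p,
               LogPval = -log10(p), row.names = m)
  })
  do.call(rbind, rows)
}

res <- scan_markers(dat)
print(res)
```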

Multilocus Models
• Can test the effects of fitting two or more markers simultaneously
• Several multilocus models are possible
• Interaction Model assumes that each combination of genotypes has a different effect
• eg t$y ~ t$m1 * t$m15

Multi-Locus Models
> f <- glm( t$y ~ genotype(t$m1) * genotype(t$m26) , family='binomial')
> anova(f)
Analysis of Deviance Table
Model: binomial, link: logit
Response: t$y
Terms added sequentially (first to last)
                                Df Deviance Resid. Df Resid. Dev
NULL
genotype(t$m1)                   2
genotype(t$m26)                  2
genotype(t$m1):genotype(t$m26)
[remaining numeric values are illegible in the transcript; the marker numbers may also have lost digits]
> pchisq([illegible], [illegible], lower.tail=F)   # calculate p-value
[1] [illegible]
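The interaction test can be reproduced by comparing a main-effects model with the full interaction model. This sketch uses simulated genotypes for two markers (the names `a` and `b` are invented); with all nine genotype combinations present the interaction has 4 degrees of freedom, and fewer if some combinations are never observed.

```r
# Simulated genotypes at two markers, stored as factors with levels 0, 1, 2.
set.seed(2)
n <- 800
a <- factor(rbinom(n, 2, 0.4))
b <- factor(rbinom(n, 2, 0.4))

# Outcome depends on marker a only, so no true interaction is present.
y <- rbinom(n, 1, plogis(-1 + 0.5 * (a != "0")))

f_main <- glm(y ~ a + b, family = binomial)   # main effects only
f_int  <- glm(y ~ a * b, family = binomial)   # main effects plus interaction

# Sequential analysis of deviance, as on the slide.
print(anova(f_int, test = "Chisq"))

# Interaction test by hand.
dev <- deviance(f_main) - deviance(f_int)
df  <- f_main$df.residual - f_int$df.residual
print(pchisq(dev, df, lower.tail = FALSE))
```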

Adding the effects of Sex and other Covariates
• Read in sex and other covariate data, eg. age, from a file into variables, say a$sex, a$age
• Fit models of the form
• fit1 <- glm(t$y ~ additive(t$m1) + a$sex + a$age, family='binomial')
• fit2 <- glm(t$y ~ a$sex + a$age, family='binomial')

Adding the effects of Sex and other Covariates
• Compare models using anova – test if the effect of the marker m1 is significant after taking into account sex and age
• anova(fit1,fit2)
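A covariate-adjusted marker test compares the model with and without the marker term. The sketch below simulates sex, age and one marker (all values invented) and lists the smaller model first so the change in deviance is reported as positive.

```r
# Simulated covariates and one marker coded as an allele count.
set.seed(3)
n <- 600
a <- data.frame(sex = factor(sample(c("F", "M"), n, replace = TRUE)),
                age = round(runif(n, 20, 70)))
m1 <- rbinom(n, 2, 0.3)
y  <- rbinom(n, 1, plogis(-3 + 0.03 * a$age + 0.5 * m1))

fit1 <- glm(y ~ m1 + a$sex + a$age, family = binomial)  # marker plus covariates
fit2 <- glm(y ~ a$sex + a$age,      family = binomial)  # covariates only

# Likelihood-ratio test for the marker, adjusted for sex and age.
print(anova(fit2, fit1, test = "Chisq"))
```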

Multiple Testing
• Take care interpreting significance levels when performing multiple tests
• Linkage disequilibrium can reduce the effective number of independent tests
• Permutation is a safe procedure to determine significance
• Repeat j = 1..N times:
– Permute disease status y between individuals
– Fit all markers
– Record maximum deviance maxdev[j] over all markers
• Permutation p-value for a marker is the proportion of times the permuted maximum deviance across all markers exceeds the observed deviance for the marker
– logscan(t,permute=1[digits illegible]) slow!
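The permutation procedure above can be sketched in a few lines. This is not the `logscan` implementation; the helper `marker_dev`, the simulated data and the small number of permutations are assumptions chosen to keep the example quick.

```r
# Simulated data: 10 markers as allele counts; only m4 affects the outcome.
set.seed(4)
n <- 300
n_markers <- 10
n_perm <- 100
geno <- matrix(rbinom(n * n_markers, 2, 0.3), nrow = n,
               dimnames = list(NULL, paste0("m", 1:n_markers)))
y <- rbinom(n, 1, plogis(-1 + 0.7 * geno[, "m4"]))

# Deviance (likelihood-ratio statistic) of an additive fit for every marker.
marker_dev <- function(y, geno) {
  apply(geno, 2, function(g) {
    fit <- glm(y ~ g, family = binomial)
    fit$null.deviance - deviance(fit)
  })
}

obs <- marker_dev(y, geno)

# Permute disease status, refit all markers, keep the maximum deviance.
maxdev <- replicate(n_perm, max(marker_dev(sample(y), geno)))

# Proportion of permutations whose maximum reaches the observed deviance.
perm_p <- sapply(obs, function(d) mean(maxdev >= d))
print(round(cbind(deviance = obs, perm_p = perm_p), 3))
```

Because every marker is compared with the same distribution of maxima, the resulting p-values are already adjusted for testing all markers, and correlation between markers is handled automatically.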

Haplotype Association
• Haplotype Association
– Different from multiple genotype models
– Phase taken into account
– Haplotype association can be modelled in a similar logistic framework
• Treat haplotypes as extended alleles
• Fit additive, recessive, dominant & genotype models as before
– Eg haplotypes are h = AAGCAT, ATGCTT, etc
– y ~ additive(h)
– y ~ dominant(h) etc
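Treating a haplotype as an extended allele means counting how many copies of it each individual carries and then coding that count as before. A sketch assuming phased data with one haplotype string per chromosome; the vectors and the choice of target haplotype are invented for illustration.

```r
# Invented phased haplotypes: one string per chromosome, six individuals.
hap_a <- c("AAGCAT", "AAGCAT", "ATGCTT", "ATGCTT", "AAGCAT", "ATGCTT")
hap_b <- c("AAGCAT", "ATGCTT", "ATGCTT", "AAGCAT", "AAGCAT", "ATGCTT")

# Count copies of the haplotype of interest (0, 1 or 2 per individual).
target <- "AAGCAT"
copies <- (hap_a == target) + (hap_b == target)

# The same codings as for single-marker genotypes.
additive_h  <- copies
dominant_h  <- as.integer(copies >= 1)
recessive_h <- as.integer(copies == 2)

print(data.frame(hap_a, hap_b, additive_h, dominant_h, recessive_h))
```

Any of these columns can then be used as a predictor in `glm(..., family = binomial)` in the same way as a single-marker coding.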