A General Modeling Framework for Studying Candidate Genes
Copy files from f:\edwin\example
Why a general modeling framework?
• Candidate genes for quantitative traits are usually tested for a “main effect” on the mean.
• Genetic advantages of a more extensive modeling framework
– Some candidate genes may be more likely to be detected
• One reason is power: e.g., a gene with pleiotropic effects may be easier to detect in a multivariate study
• Some genes may not work in a simple “main effect” fashion: e.g., they may exert their effects only in severely deprived environments, or influence the sensitivity to environmental fluctuations (variance)
– Correct tests, e.g., when genotypic variances differ in selected samples
• Substantive advantages of a general modeling framework
– A more extensive picture of genetic effects
– New light on traditional research questions
• Continuity, change, and heterotypy
• Comorbidity/pleiotropy
• Complex traits: causal mechanisms involving multiple factors
– New issues: the interplay between genotypes and environment
• Vulnerability, resilience, and protective factors
• Risk behavior and the construction of favorable environments
• Sensitivity to environmental fluctuations
– Instrumental function due to unique properties
Requirements of the modeling framework
• Genetic effects on the means, variances, and relations between variables
• Stratification effects on all these components
• Nuclear families of various sizes
• Interpretable parameterization
• Di- and multi-allelic loci, marker haplotypes, multiple loci simultaneously, and parental genotypes
• Easy to fit in existing (Mx) software
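LISREL-based model
$$\eta_{(s)} = \alpha_{jk(s)} + B_{jk(s)}\,\eta_{(s)} + \Gamma_{jk(s)}\,\xi + \zeta_{jk(s)}$$
$$y_{(s)} = \nu^{y}_{jk(s)} + \Lambda^{y}_{jk(s)}\,\eta_{(s)} + \varepsilon^{y}_{jk(s)}$$
$$x = \nu^{x}_{k} + \Lambda^{x}_{k}\,\xi + \varepsilon^{x}_{k}$$
y: subject variables (n_y indicators of the n_η latent subject variables η)
x: family variables (n_x indicators of the n_ξ latent family variables ξ)
Subscripts: s = subject, j = genotype group, k = family type (population stratum)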
Names, Symbols and Function of Model Matrices
Subject (=y) variables, structural part
Name     Symbol       Function
Alpha    α_jk         Means
Beta     B_jk         Causal effects of subject variables on each other
Gamma    Γ_jk         Causal effects of family variables on subject variables
Psi      Ψ^y_jk       diagonal: residual variances
                      off-diagonal: residual covariances
Names, Symbols and Function of Model Matrices
Subject (=y) variables, measurement part
Name     Symbol       Function
Nu       ν^y_jk       Intercepts or means of indicators
Lambda   Λ^y_jk       Factor loadings of indicators
Theta    Θ^y_jk       diagonal: variances of measurement errors
                      off-diagonal: covariances between measurement errors
         C_k          Covariances between y variables of subjects from the same family
Names, Symbols and Function of Model Matrices
Family (=x) variables
Name     Symbol       Function
Psi      Ψ^x_k        diagonal: variances
                      off-diagonal: covariances
Nu       ν^x_k        Intercepts or means of indicators
Lambda   Λ^x_k        Factor loadings of indicators
Theta    Θ^x_k        diagonal: variances of measurement errors
                      off-diagonal: covariances between measurement errors
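Alternative Models
• Conditional model
$$\eta_{(s)} = \alpha_{jk(s)} + B_{jk(s)}\,\eta_{(s)} + \Gamma_{jk(s)}\,x_{s} + \zeta_{jk(s)}$$
$$y_{(s)} = \nu^{y}_{jk(s)} + \Lambda^{y}_{jk(s)}\,\eta_{(s)} + \Gamma^{y}_{jk(s)}\,x_{s} + \varepsilon^{y}_{jk(s)}$$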
• The x-variables are independent variables and may be subject as well as family variables
– relaxes the assumption of full multivariate normality
– allows curvilinear or non-linear effects of the x-variables
• Disadvantages:
– optimization
– no measurement model for the x-variables
Other modeling frameworks
Partitioning parameter matrices
• Most matrices are partitioned into:
– a) general matrices, not subscripted, that represent the overall model in all genotype groups and population strata
– b) genetic matrices, subscripted j, that represent deviations from the general model caused by locus effects
– c) matrices subscripted k that represent deviations from the general model caused by population stratification
How?
• Example: matrix Beta, the causal effects of subject variables on each other
$$B_{jk(s)} = B + B_j\,(g_s \otimes I) + B_k\,(f \otimes I)$$
• Main effects are in B, which has dimension n_η × n_η
• Genetic effects are in the term B_j(g_s ⊗ I)
– The n_g × 1 vector g_s contains n_g dummy variables coding the genotype (haplotype) of subject s
• The dummies code deviations from B, so n_g is at most the number of genotypes - 1
• Sets of dummy variables can be used to study multiple loci simultaneously or effects of parental genotypes
$$B_j = [\,B_{j1} \mid B_{j2} \mid \cdots \mid B_{jn_g}\,]$$
with dimension n_η × (n_g n_η), where B_{j1} is the n_η × n_η submatrix containing the effects of the first dummy variable, etc.
        A1A1   A1A2   A2A2
G1       1      0     -1
G2       0      1      0
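Example: two subject variables, with a genetic effect only on β21
$$B_j = \left[\begin{array}{cc|cc} 0 & 0 & 0 & 0 \\ \beta_{21(1)} & 0 & \beta_{21(2)} & 0 \end{array}\right]$$
A1A1 subjects, g_s = (1, 0)^t:
$$B_j(g_s \otimes I) = \left[\begin{array}{cc|cc} 0 & 0 & 0 & 0 \\ \beta_{21(1)} & 0 & \beta_{21(2)} & 0 \end{array}\right] \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ \beta_{21(1)} & 0 \end{bmatrix}$$
A1A2 subjects, g_s = (0, 1)^t:
$$B_j(g_s \otimes I) = \left[\begin{array}{cc|cc} 0 & 0 & 0 & 0 \\ \beta_{21(1)} & 0 & \beta_{21(2)} & 0 \end{array}\right] \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ \beta_{21(2)} & 0 \end{bmatrix}$$
A2A2 subjects, g_s = (-1, 0)^t:
$$B_j(g_s \otimes I) = \left[\begin{array}{cc|cc} 0 & 0 & 0 & 0 \\ \beta_{21(1)} & 0 & \beta_{21(2)} & 0 \end{array}\right] \begin{bmatrix} -1 & 0 \\ 0 & -1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ -\beta_{21(1)} & 0 \end{bmatrix}$$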
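The three cases above can be checked numerically. A minimal sketch, assuming NumPy and two subject variables; the values chosen for β21(1) and β21(2) are hypothetical, and `genetic_deviation` is an illustrative helper, not part of Mx.

```python
import numpy as np

# Dummy coding of the genotypes from the slides: g_s = (G1, G2)
CODING = {"A1A1": (1, 0), "A1A2": (0, 1), "A2A2": (-1, 0)}


def genetic_deviation(B_j, g_s):
    """Return B_j (g_s kron I), the genotype-specific deviation from B.

    B_j is n x (n_g * n), i.e. [B_j1 | B_j2 | ... | B_jn_g];
    g_s holds the n_g dummy variables of subject s.
    """
    n = B_j.shape[0]
    g = np.asarray(g_s, dtype=float).reshape(-1, 1)  # n_g x 1 column vector
    return B_j @ np.kron(g, np.eye(n))               # (n x n_g n)(n_g n x n) -> n x n


# Hypothetical values for beta_21(1) and beta_21(2)
beta_21_1, beta_21_2 = 0.3, 0.1
B_j = np.array([[0.0, 0.0, 0.0, 0.0],
                [beta_21_1, 0.0, beta_21_2, 0.0]])

for genotype, g in CODING.items():
    # element (2, 1) equals beta_21(1), beta_21(2) and -beta_21(1), respectively
    print(genotype, genetic_deviation(B_j, g)[1, 0])
```

The Kronecker product g_s ⊗ I stacks g_s copies of the identity, so multiplying by it simply adds up the submatrices B_j1, B_j2, … weighted by the subject's dummy values.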
Stratification effects in the term B_k(f ⊗ I)
• The n_f × 1 vector f contains the n_f dummy variables used to code family types
– The dummies code deviations, so n_f is at most the number of family types - 1
$$B_k = [\,B_{k1} \mid B_{k2} \mid \cdots \mid B_{kn_f}\,]$$
with dimension n_η × (n_f n_η), where B_{k1} is the n_η × n_η submatrix containing the effects of the first dummy variable, etc.
• ⊗ I selects the proper submatrix for each dummy variable
Sibling pairs (genotypes are numbers of A1 alleles; F1 to F5 are the family-type dummy variables in f)
                    Subject A  Subject B  F1  F2  F3  F4  F5
Not informative     2          2          1   0   0   0   0
of stratification   1          1          0   1   0   0   0
                    0          0          0   0   1   0   0
Informative         2          1          0   0   0   1   0
of stratification   2          0          0   0   0   0   1
                    1          0          0   0   0   0   0
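The sibling-pair table can be turned into a lookup that builds the dummy vector f for a family. A sketch in Python; it assumes that Subject A and Subject B are exchangeable (so a (1, 2) pair is the same family type as (2, 1)), and the rule in `is_informative` is simply read off the table (the three types with identical siblings are the non-informative ones). Both helper names are hypothetical.

```python
# Sibling-pair family types (numbers of A1 alleles), in the order of F1..F5.
# The pair (1, 0) is the reference type and gets all-zero dummies.
FAMILY_TYPES = [(2, 2), (1, 1), (0, 0), (2, 1), (2, 0)]
REFERENCE = (1, 0)


def family_dummies(subject_a, subject_b):
    """Return the dummy vector f (F1..F5) for one sibling pair."""
    pair = tuple(sorted((subject_a, subject_b), reverse=True))  # siblings treated as exchangeable
    if pair == REFERENCE:
        return [0] * len(FAMILY_TYPES)
    if pair not in FAMILY_TYPES:
        raise ValueError(f"genotypes must be 0, 1 or 2 A1 alleles, got {pair}")
    return [int(pair == t) for t in FAMILY_TYPES]


def is_informative(subject_a, subject_b):
    """Per the table, a pair is informative of stratification when the siblings differ."""
    return subject_a != subject_b


print(family_dummies(1, 2), is_informative(1, 2))  # [0, 0, 0, 1, 0] True
```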
Two parents, one “child” (genotypes are numbers of A1 alleles; F1 to F5 are the family-type dummy variables in f)
                    Parent A  Parent B  Subject A  F1  F2  F3  F4  F5
Not informative     2         2         2          1   0   0   0   0
of stratification   2         0         1          0   1   0   0   0
                    0         0         0          0   0   1   0   0
Informative         2         1         2          0   0   0   1   0
of stratification                       1          0   0   0   1   0
                    1         1         2          0   0   0   0   1
                                        1          0   0   0   0   1
                                        0          0   0   0   0   1
                    1         0         1          0   0   0   0   0
                                        0          0   0   0   0   0
Family Types in a Sample of Singletons and Pairs of Siblings With or Without Genotyped Parents (a)
Columns: I = parents not genotyped, one subject; II = parents not genotyped, two subjects; III = parents genotyped, one subject; IV = parents genotyped, two subjects. P = parent, S = subject.
                    I    II      III        IV
                    S1   S1 S2   P1 P2 S1   P1 P2 S1 S2
Not informative     2    2  2    2  2  2    2  2  2  2
of stratification   1    1  1    2  0  1    2  0  1  1
                    0    0  0    0  0  0    0  0  0  0
Informative              2  1    2  1  2    2  1  2  2
of stratification        2  0          1          2  1
                         1  0    1  1  2          1  1
                                       1    1  1  2  2
                                       0          2  1
                                 1  0  1          2  0
                                       0          1  1
                                                  1  0
                                                  0  0
                                            1  0  1  1
                                                  0  1
                                                  0  0
(a) The cells list the number of A1 alleles. A blank parent cell means the same parents as in the row above.
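The pattern in both parent tables can be reproduced from Mendelian transmission. Reading the tables, each parental mating type defines one family type, and a mating type is listed as informative of stratification exactly when it can produce more than one offspring genotype. A sketch in Python under that reading; the helper names are hypothetical and genotypes are numbers of A1 alleles.

```python
from itertools import combinations_with_replacement, product

# Alleles (1 = A1, 0 = A2) that a parent with 2, 1 or 0 copies of A1 can transmit
TRANSMITTED = {2: (1,), 1: (1, 0), 0: (0,)}


def offspring_genotypes(parent_1, parent_2):
    """All possible child genotypes for a mating, highest first."""
    kids = {a + b for a, b in product(TRANSMITTED[parent_1], TRANSMITTED[parent_2])}
    return sorted(kids, reverse=True)


def sibling_pairs(parent_1, parent_2):
    """Unordered genotype combinations of two children of the same parents."""
    return list(combinations_with_replacement(offspring_genotypes(parent_1, parent_2), 2))


for parents in combinations_with_replacement((2, 1, 0), 2):  # mating types, e.g. (2, 1)
    kids = offspring_genotypes(*parents)
    label = "informative" if len(kids) > 1 else "not informative"
    print(parents, label, kids, sibling_pairs(*parents))
```

The printed sibling combinations match column IV: three for the 2 × 1 and 1 × 0 matings, six for the 1 × 1 mating, and one for each non-informative mating.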
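Subject (=y) variables, structural part
$$\alpha_{jk(s)} = \alpha + \alpha_j\,g_s + \alpha_k\,f$$
with dimension α: n_η × 1, α_j: n_η × n_g, α_k: n_η × n_f
$$B_{jk(s)} = B + B_j\,(g_s \otimes I) + B_k\,(f \otimes I)$$
with dimension B: n_η × n_η, B_j: n_η × (n_g n_η), B_k: n_η × (n_f n_η)
$$\Gamma_{jk(s)} = \Gamma + \Gamma_j\,(g_s \otimes I) + \Gamma_k\,(f \otimes I)$$
with dimension Γ: n_η × n_ξ, Γ_j: n_η × (n_g n_ξ), Γ_k: n_η × (n_f n_ξ)
$$\Psi^{y}_{jk(s)} = \Psi^{y} + \Psi^{y}_j\,(g_s \otimes I) + \Psi^{y}_k\,(f \otimes I)$$
with dimension Ψ^y: n_η × n_η, Ψ^y_j: n_η × (n_g n_η), Ψ^y_k: n_η × (n_f n_η)
Other matrices are partitioned in the same way.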
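Subject (=y) variables, measurement part
$$\nu^{y}_{jk(s)} = \nu^{y} + \nu^{y}_j\,g_s + \nu^{y}_k\,f$$
with dimension ν^y: n_y × 1, ν^y_j: n_y × n_g, ν^y_k: n_y × n_f
$$\Lambda^{y}_{jk(s)} = \Lambda^{y} + \Lambda^{y}_j\,(g_s \otimes I) + \Lambda^{y}_k\,(f \otimes I)$$
with dimension Λ^y: n_y × n_η, Λ^y_j: n_y × (n_g n_η), Λ^y_k: n_y × (n_f n_η)
$$\Theta^{y}_{jk(s)} = \Theta^{y} + \Theta^{y}_j\,(g_s \otimes I_y) + \Theta^{y}_k\,(f \otimes I_y)$$
with dimension Θ^y: n_y × n_y, Θ^y_j: n_y × (n_g n_y), Θ^y_k: n_y × (n_f n_y)
Covariance between subjects from the same family:
$$\Sigma^{y}_{k(s=A,s=B)} = C + C_k\,(f \otimes I_y)$$
with dimension C: n_y × n_y, C_k: n_y × (n_f n_y).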
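Family (=x) variables:
$$\Psi^{x}_{k} = \Psi^{x} + \Psi^{x}_{k}\,(f \otimes I)$$
with dimension Ψ^x: n_ξ × n_ξ, Ψ^x_k: n_ξ × (n_f n_ξ)
$$\nu^{x}_{k} = \nu^{x} + \nu^{x}_{k}\,f$$
with dimension ν^x: n_x × 1, ν^x_k: n_x × n_f
$$\Lambda^{x}_{k} = \Lambda^{x} + \Lambda^{x}_{k}\,(f \otimes I)$$
with dimension Λ^x: n_x × n_ξ, Λ^x_k: n_x × (n_f n_ξ)
$$\Theta^{x}_{k} = \Theta^{x} + \Theta^{x}_{k}\,(f \otimes I_x)$$
with dimension Θ^x: n_x × n_x, Θ^x_k: n_x × (n_f n_x)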
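All of these formulas share one pattern: a general matrix plus a genetic and a stratification deviation, where the identity in the Kronecker product has the order of the number of columns of the general matrix (n_η for B, n_ξ for Γ, n_y for Θ^y, and 1 for mean vectors, for which g ⊗ I reduces to g). A minimal NumPy sketch of that pattern; `partitioned` is a hypothetical helper, not an Mx function, and the parameter values are made up.

```python
import numpy as np


def partitioned(M, M_j=None, g=None, M_k=None, f=None):
    """Return M + M_j (g kron I) + M_k (f kron I), with I of order cols(M).

    Works for mean vectors (M is n x 1, so g kron I = g) and for
    parameter matrices such as B, Gamma, Lambda or Theta alike.
    """
    M = np.asarray(M, dtype=float)
    I = np.eye(M.shape[1])
    result = M.copy()
    for deviation, dummies in ((M_j, g), (M_k, f)):
        if deviation is None:
            continue  # this matrix has no genetic or no stratification part
        d = np.asarray(dummies, dtype=float).reshape(-1, 1)
        result += np.asarray(deviation, dtype=float) @ np.kron(d, I)
    return result


# Hypothetical example: alpha_jk(s) for two subject variables,
# two genotype dummies and one family-type dummy
alpha = np.array([[1.0], [2.0]])
alpha_j = np.array([[0.5, 0.2], [0.0, 0.0]])
alpha_k = np.array([[0.0], [0.3]])
print(partitioned(alpha, alpha_j, [1, 0], alpha_k, [1]))  # alpha_jk(s) = (1.5, 2.3)^t
```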
General interpretation
• Genetic effects on:
– means are “main” effects
– relations between variables are interaction effects
– residuals are variance effects
[Figure: genotype (G1, G2) affects y1 via α1(1), α1(2) and y2 via α2(1), α2(2); y1 and y2 affect each other via β21 and β12; residual (co)variances ψ11, ψ21, ψ22.]
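Simple example
$$y = \alpha + \alpha_j\,g + B\,y + \zeta, \qquad E(\zeta\zeta^{t}) = \Psi^{y}$$
or,
$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix} + \begin{bmatrix} \alpha_{1(1)} & \alpha_{1(2)} \\ \alpha_{2(1)} & \alpha_{2(2)} \end{bmatrix} \begin{bmatrix} G_1 \\ G_2 \end{bmatrix} + \begin{bmatrix} 0 & \beta_{12} \\ \beta_{21} & 0 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} + \begin{bmatrix} \zeta_1 \\ \zeta_2 \end{bmatrix}, \qquad \Psi^{y} = \begin{bmatrix} \psi_{11} & \psi_{21} \\ \psi_{21} & \psi_{22} \end{bmatrix}$$
or,
$$y_1 = \alpha_1 + \alpha_{1(1)}G_1 + \alpha_{1(2)}G_2 + \beta_{12}\,y_2 + \zeta_1$$
$$y_2 = \alpha_2 + \alpha_{2(1)}G_1 + \alpha_{2(2)}G_2 + \beta_{21}\,y_1 + \zeta_2$$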
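Because the residuals ζ have mean zero, the genotype means follow from the reduced form y = (I − B)^{-1}(α + α_j g + ζ). A sketch in NumPy with hypothetical parameter values: a gene acting on y1 only, with y1 affecting y2 (β21 = 0.4, β12 = 0). `genotype_means` is an illustrative name.

```python
import numpy as np

CODING = {"A1A1": (1, 0), "A1A2": (0, 1), "A2A2": (-1, 0)}  # (G1, G2)


def genotype_means(alpha, alpha_j, B):
    """Expected (y1, y2) for each genotype: (I - B)^{-1} (alpha + alpha_j g)."""
    alpha = np.asarray(alpha, dtype=float)
    I_minus_B = np.eye(len(alpha)) - np.asarray(B, dtype=float)
    return {genotype: np.linalg.solve(I_minus_B, alpha + alpha_j @ np.array(g, dtype=float))
            for genotype, g in CODING.items()}


# Hypothetical values: alpha_1(1) = 0.5, alpha_1(2) = 0.1, beta_21 = 0.4
alpha = [0.0, 0.0]
alpha_j = np.array([[0.5, 0.1],
                    [0.0, 0.0]])
B = [[0.0, 0.0],
     [0.4, 0.0]]
for genotype, mean in genotype_means(alpha, alpha_j, B).items():
    print(genotype, mean.round(3))
```

Solving the linear system instead of forming the inverse also covers a feedback loop (β12 ≠ 0 as well), as long as I − B is invertible.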
Genetic effects on y1
[Figure: mean of y1 by genotype, relative to α1: -α1(1) for A2A2, α1(2) for A1A2, α1(1) for A1A1.]
        A1A1   A1A2   A2A2
G1       1      0     -1
G2       0      1      0
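With this coding the genotype means of y1 are α1 + α1(1) (A1A1), α1 + α1(2) (A1A2) and α1 − α1(1) (A2A2). So α1 is the midpoint of the two homozygotes, α1(1) is half their difference (the additive effect), and α1(2) is the deviation of the heterozygote from that midpoint (dominance). A small Python sketch that inverts this relation; the means used are hypothetical.

```python
def decompose(mean_A1A1, mean_A1A2, mean_A2A2):
    """Recover (alpha_1, alpha_1(1), alpha_1(2)) from three genotype means."""
    alpha_1 = (mean_A1A1 + mean_A2A2) / 2    # midpoint of the homozygotes
    alpha_1_1 = (mean_A1A1 - mean_A2A2) / 2  # coefficient of G1 (additive)
    alpha_1_2 = mean_A1A2 - alpha_1          # coefficient of G2 (dominance)
    return alpha_1, alpha_1_1, alpha_1_2


def genotype_mean(alpha_1, alpha_1_1, alpha_1_2, g1, g2):
    """Mean of y1 for a genotype coded (G1, G2)."""
    return alpha_1 + alpha_1_1 * g1 + alpha_1_2 * g2


params = decompose(3.0, 2.5, 1.0)
print(params)  # (2.0, 1.0, 0.5)
# Round trip with the coding table (G1, G2): A1A1 = (1, 0), A1A2 = (0, 1), A2A2 = (-1, 0)
print([genotype_mean(*params, g1, g2) for g1, g2 in [(1, 0), (0, 1), (-1, 0)]])  # [3.0, 2.5, 1.0]
```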
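Additive model
[Figure: genotype (G1, G2) affects y2 (α2(1), α2(2)); y1 affects y2 (β21); residual variances ψ11, ψ22.]
Mediator model
[Figure: genotype (G1, G2) affects y1 (α1(1), α1(2)); y1 affects y2 (β21); residual variances ψ11, ψ22.]
Reversed effect model
[Figure: genotype (G1, G2) affects y2 (α2(1), α2(2)); y2 affects y1 (β12); residual variances ψ11, ψ22.]
Common gene model
[Figure: genotype (G1, G2) affects y1 (α1(1), α1(2)) and y2 (α2(1), α2(2)); residual variances ψ11, ψ22 and residual covariance ψ21.]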
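These four models differ only in which elements of α_j, B and Ψ^y are free, which is what has to be selected when specifying them. A sketch in Python that records, per model, the free parameters read off the path diagrams above (the residual variances ψ11 and ψ22 are free in all four and omitted) and turns them into 0/1 free-element patterns; the dictionary and helper names are hypothetical.

```python
# Parameter names laid out as in the matrices alpha_j (2 x 2) and B (2 x 2)
ALPHA_J = [["alpha_1(1)", "alpha_1(2)"],
           ["alpha_2(1)", "alpha_2(2)"]]
BETA = [[None, "beta_12"],
        ["beta_21", None]]  # diagonal of B is fixed at 0

MODELS = {
    "additive": {"alpha_2(1)", "alpha_2(2)", "beta_21"},
    "mediator": {"alpha_1(1)", "alpha_1(2)", "beta_21"},
    "reversed effect": {"alpha_2(1)", "alpha_2(2)", "beta_12"},
    "common gene": {"alpha_1(1)", "alpha_1(2)", "alpha_2(1)", "alpha_2(2)", "psi_21"},
}


def free_pattern(model):
    """0/1 patterns (1 = free) for alpha_j and B, plus whether psi_21 is free."""
    free = MODELS[model]
    alpha_j = [[int(name in free) for name in row] for row in ALPHA_J]
    beta = [[int(name in free) if name else 0 for name in row] for row in BETA]
    return alpha_j, beta, "psi_21" in free


for model in MODELS:
    print(model, free_pattern(model))
```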
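Interactions
$$y = \alpha + \alpha_j\,g + B\,y + B_j\,(g \otimes I)\,y + \zeta$$
with
$$B_j = \left[\begin{array}{cc|cc} 0 & 0 & 0 & 0 \\ \beta_{21(1)} & 0 & \beta_{21(2)} & 0 \end{array}\right]$$
Applied to the additive model:
$$y_1 = \alpha_1 + \zeta_1$$
$$y_2 = \alpha_2 + \alpha_{2(1)}G_1 + \alpha_{2(2)}G_2 + \beta_{21}\,y_1 + \beta_{21(1)}\,y_1G_1 + \beta_{21(2)}\,y_1G_2 + \zeta_2$$
[Figure: y2 plotted against y1 with separate lines for A1A1, A1A2 and A2A2 subjects; first panel β21(1) > 0 and β21(2) = 0, second panel β21(1) and β21(2) > 0.]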
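Collecting the y1 terms in the equation for y2 shows that the slope of y2 on y1 is β21 + β21(1)G1 + β21(2)G2, so it differs by genotype. A short Python sketch with hypothetical values for the case β21(1) > 0 and β21(2) = 0:

```python
CODING = {"A1A1": (1, 0), "A1A2": (0, 1), "A2A2": (-1, 0)}  # (G1, G2)


def slopes(beta_21, beta_21_1, beta_21_2):
    """Genotype-specific slope of y2 on y1 in the interaction model."""
    return {genotype: beta_21 + beta_21_1 * g1 + beta_21_2 * g2
            for genotype, (g1, g2) in CODING.items()}


print(slopes(0.5, 0.25, 0.0))  # {'A1A1': 0.75, 'A1A2': 0.5, 'A2A2': 0.25}
```

With β21(2) = 0 the heterozygotes keep the main-effect slope β21, and the two homozygote groups deviate from it by the same amount in opposite directions.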
Estimation and specification in Mx
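Expected means and covariances, single subject
$$E(y_{(s)}) = \mu^{y}_{jk(s)} = \nu^{y}_{jk(s)} + \Lambda^{y}_{jk(s)}\,(I - B_{jk(s)})^{-1}\,\alpha_{jk(s)}$$
$$E(x) = \mu^{x}_{k} = \nu^{x}_{k}$$
$$E\big((y_{(s)} - \mu^{y}_{jk(s)})(y_{(s)} - \mu^{y}_{jk(s)})^{t}\big) = \Sigma^{y}_{jk(s)} = \Lambda^{y}_{jk(s)}\,(I - B_{jk(s)})^{-1}\big(\Gamma_{jk(s)}\,\Psi^{x}_{k}\,\Gamma^{t}_{jk(s)} + \Psi^{y}_{jk(s)}\big)\big((I - B_{jk(s)})^{-1}\big)^{t}\,(\Lambda^{y}_{jk(s)})^{t} + \Theta^{y}_{jk(s)}$$
$$E\big((x - \mu^{x}_{k})(x - \mu^{x}_{k})^{t}\big) = \Sigma^{x}_{k} = \Lambda^{x}_{k}\,\Psi^{x}_{k}\,(\Lambda^{x}_{k})^{t} + \Theta^{x}_{k}$$
$$E\big((y_{(s)} - \mu^{y}_{jk(s)})(x - \mu^{x}_{k})^{t}\big) = \Sigma^{yx}_{jk(s)} = \Lambda^{y}_{jk(s)}\,(I - B_{jk(s)})^{-1}\,\Gamma_{jk(s)}\,\Psi^{x}_{k}\,(\Lambda^{x}_{k})^{t}$$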
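These expectations translate directly into matrix code. A NumPy sketch, assuming all matrices have already been evaluated for the subject's genotype group j and family type k, and that mean vectors are column vectors; `implied_moments` is a hypothetical name and the check values are made up.

```python
import numpy as np


def implied_moments(nu_y, Lam_y, B, alpha, Gamma, Psi_y, Theta_y, nu_x, Lam_x, Psi_x, Theta_x):
    """Expected means and covariances of one subject's y and the family's x."""
    A = np.linalg.inv(np.eye(B.shape[0]) - B)  # (I - B)^{-1}
    mu_y = nu_y + Lam_y @ A @ alpha
    mu_x = nu_x
    Sigma_y = Lam_y @ A @ (Gamma @ Psi_x @ Gamma.T + Psi_y) @ A.T @ Lam_y.T + Theta_y
    Sigma_x = Lam_x @ Psi_x @ Lam_x.T + Theta_x
    Sigma_yx = Lam_y @ A @ Gamma @ Psi_x @ Lam_x.T
    return mu_y, mu_x, Sigma_y, Sigma_x, Sigma_yx


# Smallest possible check: one latent subject and one latent family variable,
# one indicator each, loadings 1 and no measurement error (hypothetical values)
one, zero = np.eye(1), np.zeros((1, 1))
mu_y, mu_x, S_y, S_x, S_yx = implied_moments(
    nu_y=zero, Lam_y=one, B=zero, alpha=np.array([[1.0]]), Gamma=np.array([[0.5]]),
    Psi_y=one, Theta_y=zero, nu_x=zero, Lam_x=one, Psi_x=np.array([[2.0]]), Theta_x=zero)
# mu_y = 1, Sigma_y = 0.5**2 * 2 + 1 = 1.5, Sigma_x = 2, Sigma_yx = 0.5 * 2 = 1
print(mu_y, S_y, S_x, S_yx)
```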
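Expected means and covariances, whole family
Complete data vector z^t = (x^t, y^t):
$$z^{t} = \big[\,x^{t},\; y^{t}_{(s=1)},\; \ldots,\; y^{t}_{(s=n_s)}\,\big]$$
$$\mu^{t}_{jk} = \big[\,(\mu^{x}_{k})^{t},\; (\mu^{y}_{jk(s=1)})^{t},\; \ldots,\; (\mu^{y}_{jk(s=n_s)})^{t}\,\big]$$
$$E\big((z - \mu_{jk})(z - \mu_{jk})^{t}\big) = \Sigma_{jk} = \begin{bmatrix} \Sigma^{x}_{k} & \Sigma^{xy}_{jk(s=1)} & \cdots & \Sigma^{xy}_{jk(s=n_s)} \\ \Sigma^{yx}_{jk(s=1)} & \Sigma^{y}_{jk(s=1)} & \cdots & \Sigma^{y}_{k(s=A,s=B)} \\ \vdots & \vdots & \ddots & \vdots \\ \Sigma^{yx}_{jk(s=n_s)} & \Sigma^{y}_{k(s=A,s=B)} & \cdots & \Sigma^{y}_{jk(s=n_s)} \end{bmatrix}$$
with Σ^{xy} = (Σ^{yx})^t, and Σ^y_k(s=A,s=B) the covariances between subjects from the same family.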
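Assembling μ_jk and Σ_jk for a family is bookkeeping with blocks. A NumPy sketch; it assumes the per-subject moments are already available and, as in the block matrix above, that one matrix Σ^y_k(s=A,s=B) describes every pair of subjects in the family, with its transpose filling the blocks below the diagonal. `family_moments` is a hypothetical name and the numbers are made up.

```python
import numpy as np


def family_moments(mu_x, Sigma_x, subjects, Sigma_between):
    """Stack the means and build the block covariance matrix of z = (x, y_1, ..., y_ns).

    subjects: list of (mu_y, Sigma_y, Sigma_yx) per subject, evaluated at that
    subject's j and k; Sigma_between: n_y x n_y covariance of y_A with y_B.
    """
    n_x, n_y, n_s = Sigma_x.shape[0], subjects[0][1].shape[0], len(subjects)
    mu = np.vstack([mu_x] + [m for m, _, _ in subjects])
    Sigma = np.zeros((n_x + n_s * n_y, n_x + n_s * n_y))
    Sigma[:n_x, :n_x] = Sigma_x

    def rows(s):  # position of subject s's y-block
        return slice(n_x + s * n_y, n_x + (s + 1) * n_y)

    for a, (_, Sigma_y, Sigma_yx) in enumerate(subjects):
        Sigma[rows(a), :n_x] = Sigma_yx
        Sigma[:n_x, rows(a)] = Sigma_yx.T
        for b in range(n_s):
            if a == b:
                Sigma[rows(a), rows(b)] = Sigma_y
            elif a < b:
                Sigma[rows(a), rows(b)] = Sigma_between
            else:
                Sigma[rows(a), rows(b)] = Sigma_between.T
    return mu, Sigma


# Hypothetical sibling pair with one x and one y variable
subject = (np.array([[1.0]]), np.array([[1.5]]), np.array([[1.0]]))
mu, Sigma = family_moments(np.array([[0.0]]), np.array([[2.0]]), [subject, subject], np.array([[0.4]]))
print(mu.ravel())  # [0. 1. 1.]
print(Sigma)
```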
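$$\ln L(\theta; z_1, \ldots, z_N) = \sum_{i=1}^{N} \ln L_i$$
where the individual (family) log-likelihoods equal
$$\ln L_i = -\tfrac{1}{2}\Big\{ n_{z_i}\log(2\pi) + \log\big|\Sigma_{jk}\big| + (z_i - \mu_{jk})^{t}\,\Sigma_{jk}^{-1}\,(z_i - \mu_{jk}) \Big\}$$
and n_{z_i} is the number of elements in z_i.
The log-likelihood function is maximized, given the observed data, by raw maximum likelihood.
Minus two times the difference between the log-likelihoods of two nested models is asymptotically chi-square distributed, with the difference in the number of estimated parameters as the degrees of freedom.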
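A NumPy sketch of the individual log-likelihood and of the likelihood-ratio statistic described above. Missing elements of z_i are assumed to be coded NaN (an assumption about the data layout); they are dropped together with the matching rows and columns of μ and Σ, so n_{z_i} counts only the elements that are present. The function names are hypothetical.

```python
import numpy as np


def family_loglik(z, mu, Sigma):
    """ln L_i for one family under multivariate normality, using observed elements only."""
    z = np.asarray(z, dtype=float).ravel()
    mu = np.asarray(mu, dtype=float).ravel()
    observed = ~np.isnan(z)
    d = z[observed] - mu[observed]
    S = np.asarray(Sigma, dtype=float)[np.ix_(observed, observed)]
    _, logdet = np.linalg.slogdet(S)  # S is assumed positive definite
    n_z = observed.sum()
    return -0.5 * (n_z * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(S, d))


def likelihood_ratio(loglik_general, loglik_nested, k_general, k_nested):
    """Return -2 (lnL_nested - lnL_general) and its degrees of freedom."""
    return -2.0 * (loglik_nested - loglik_general), k_general - k_nested


# A family whose data equal the expected means, with the second variable missing
print(family_loglik([0.0, np.nan], [0.0, 0.0], np.eye(2)))  # -0.5 * log(2 pi)
print(likelihood_ratio(-100.0, -103.0, 10, 8))               # (6.0, 2)
```

The total ln L is the sum of `family_loglik` over all N families; the statistic from `likelihood_ratio` is then compared with a chi-square distribution with the returned degrees of freedom.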
Specification
– In most instances, specification amounts to a selection of matrices
– Specifying the matrix dimensions is boring and leads to errors
– Getting started
Therefore a simple program, run in batch mode or by answering questions
MxScript
• Data structure
– Number of (latent) subject variables?
– Number of subjects in the largest family?
– Number of dummy variables for genotypes?
• Matrices to be used
– Do the subject variables have causal effects on each other? BETA?
– GENETIC: causal relations between subject variables? BETA?
– STRATIFICATION: means of subject variables? ALPHA?
• File names
– Name of the file with your data (DOS name)?
– Name of the file for the Mx script (DOS name)?
Structure of the Mx script
• In most instances, four groups:
Group  Function                 Free parameters  Starting values
1      General part             yes              yes
2      Genetic effects          yes
3      Stratification effects   yes
4      Fit model to data
From the DOS prompt, type: MxScript <ENTER> (answering questions)
or, from the DOS prompt, type: MxScript input.dat <ENTER> (batch)
Example
• Name of the data file: example.dat
• Sibling pairs, no parents
• Three genotype groups
• Family variables in the data file (indicate that you want to specify admixture effects)
• Starting values: a sample drawn from a multivariate distribution with means 0 and variances 1.5
[Figure: path model with the latent variables BMD (indicators Hip, Arm, Spine) and exercise (indicators Duration, Intensity)]
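General part
Identification of the measurement model:
$$\Lambda^{y} = \begin{bmatrix} 1 & 0 \\ \lambda_{21} & 0 \\ 0 & 1 \\ 0 & \lambda_{42} \\ 0 & \lambda_{52} \end{bmatrix}$$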
Genetic + stratification effects
[Figure: the same path model for BMD and exercise, with genetic and stratification effects]
Common pathway?
Independent pathway?
Tests
Common pathway: estimate a model with genetic and stratification effects on the mean of the second latent variable, and test for significance of:
1. Genetic effects
2. Stratification effects
3. Genetic + stratification effects
Independent pathway: estimate a model with genetic and stratification effects on the means of the indicators of the second latent variable, and test for significance of:
1. Genetic effects
2. Stratification effects
3. Genetic + stratification effects
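The degrees of freedom of these tests follow from counting the deviation parameters that are fixed to zero. A sketch in Python under stated assumptions: n_g = 2 genotype dummies (three genotype groups, as in the example), n_f = 5 family-type dummies (the six sibling-pair family types tabulated earlier; the number in the actual example file may differ), and three indicators of the second latent variable (assumed to be Hip, Arm and Spine, as the path diagram suggests). `df_for_tests` is a hypothetical helper.

```python
def df_for_tests(n_targets, n_g=2, n_f=5):
    """Degrees of freedom for dropping genetic and/or stratification effects on
    the means of n_targets variables (one latent mean or several indicator means)."""
    return {
        "genetic": n_targets * n_g,
        "stratification": n_targets * n_f,
        "genetic + stratification": n_targets * (n_g + n_f),
    }


print("common pathway:", df_for_tests(n_targets=1))       # 2, 5 and 7 df
print("independent pathway:", df_for_tests(n_targets=3))  # 6, 15 and 21 df
```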
Free elements
a Full 2 1 Free          [in the Matrices section, before End Matrices]
Free a 1 1 a 2 1         [after End Matrices: individual free elements]
Free a 1 1 to a 2 1      [after End Matrices: a range of free elements]
Solution
Copy files from f:\edwin\solution