A General Modeling Framework for Studying Candidate Genes

Copy files from f:\edwin\example

Why a general modeling framework?

• Candidate-gene studies of quantitative traits usually test only a “main effect” on the mean.

• Genetic advantages of a more extensive modeling framework:

– Some candidate genes may be more likely to be detected.
   • One reason is power, e.g. pleiotropic genes are easier to detect in a multivariate study.
   • Some genes may not work in a simple “main effect” fashion, e.g. they may exert their effects in severely deprived environments only, or influence the sensitivity to environmental fluctuations (variance).

– Correct tests? E.g. genotypic variances may differ in selected samples.

• Substantive advantages of a general modeling framework:

– A more extensive picture of genetic effects.

– New light on traditional research questions:
   • Continuity, change, and heterotypy
   • Comorbidity/pleiotropy
   • Complex traits: causal mechanisms involving multiple factors

– New issues: the interplay between genotypes and environment.
   • Vulnerability, resilience, and protective factors
   • Risk behavior and the construction of favorable environments
   • Sensitivity to environmental fluctuations

– Instrumental function due to unique properties.

Requirements for the modeling framework

• Genetic effects on the means, variances, and relations between variables

• Stratification effects on all these components

• Nuclear families of various sizes

• Interpretable parameterization

• Di- and multi-allelic loci, marker haplotypes, multiple loci simultaneously, and parental genotypes

• Easy to fit in existing (Mx) software

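LISREL based model

$\eta_{(s)} = \alpha_{jk(s)} + B_{jk(s)}\,\eta_{(s)} + \Gamma_{jk(s)}\,\xi + \zeta_{jk(s)}$

$y_{(s)} = \nu^{y}_{jk(s)} + \Lambda^{y}_{jk(s)}\,\eta_{(s)} + \varepsilon^{y}_{jk(s)}$

$x = \nu^{x}_{k} + \Lambda^{x}_{k}\,\xi + \varepsilon^{x}_{k}$

y: subject variables

x: family variables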

Names, Symbols and Function of Model Matrices

Name      Symbol               Function

Subject (=y) variables

Structural part
Alpha     $\alpha_{jk}$        Means
Beta      $B_{jk}$             Causal effects of subject variables on each other
Gamma     $\Gamma_{jk}$        Causal effects of family variables on subject variables
Psi       $\Psi^{y}_{jk}$      Diagonal: residual variances; off-diagonal: residual covariances

Measurement part
Nu        $\nu^{y}_{jk}$       Intercepts or means of indicators
Lambda    $\Lambda^{y}_{jk}$   Factor loadings of indicators
Theta     $\Theta^{y}_{jk}$    Diagonal: variances of errors of measurement; off-diagonal: covariances between errors of measurement

Covariances between y variables of subjects from same family
          $C_{k}$

Family (=x) variables
Psi       $\Psi^{x}_{k}$       Diagonal: variances; off-diagonal: covariances
Nu        $\nu^{x}_{k}$        Intercepts or means of indicators
Lambda    $\Lambda^{x}_{k}$    Factor loadings of indicators
Theta     $\Theta^{x}_{k}$     Diagonal: variances of errors of measurement; off-diagonal: covariances between errors of measurement

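Alternative Models

• Conditional model

$\eta_{(s)} = \alpha_{jk(s)} + B_{jk(s)}\,\eta_{(s)} + \Gamma_{jk(s)}\,x_{s} + \zeta_{jk(s)}$

$y_{(s)} = \nu_{jk(s)} + \Lambda_{jk(s)}\,\eta_{(s)} + [\ldots]_{jk(s)}\,x_{s} + \varepsilon_{jk(s)}$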

• The x-variables are independent variables (subject plus family variables):

– relaxes the assumption of full multivariate normality

– allows curvilinear or non-linear effects of the x-variables

• Disadvantages:

– Optimization

– Measurement model for the x-variables

Other modeling frameworks

Partitioning parameter matrices

• Most matrices are partitioned into:

– a) general matrices, not subscripted, that represent the overall model in all genotype groups and population strata

– b) genetic matrices, subscripted j, that represent deviations from the general model caused by locus effects

– c) stratification matrices, subscripted k, that represent deviations from the general model caused by population stratification

How?

• Example: matrix Beta, the causal effects of subject variables on each other

$B_{jk(s)} = B + B_{j}(g_{s} \otimes I) + B_{k}(f \otimes I)$

• Main effects are in $B$, which has dimension $n \times n$.

• Genetic effects are in the term $B_{j}(g_{s} \otimes I)$.

– The $n_g \times 1$ vector $g_s$ contains the $n_g$ dummy variables coding the genotype (haplotype) of subject s.
   • The maximum number of deviations from $B$ is thus #genotypes - 1.
   • Sets of dummy variables can be used to study multiple loci simultaneously or the effects of parental genotypes.

– $B_j = [B_1 | B_2 | \ldots | B_{n_g}]$ has dimension $n \times (n_g n)$, where $B_1$ is the $n \times n$ submatrix containing the effects of the first dummy variable, etc.

Coding of the dummy variables for a locus with genotypes A1A1, A1A2, and A2A2:

      A1A1  A1A2  A2A2
G1      1     0    -1
G2      0     1     0

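Example

$B = \begin{pmatrix} 0 & 0 \\ \beta_{21} & 0 \end{pmatrix}, \quad B_j = \begin{pmatrix} 0 & 0 & 0 & 0 \\ \beta_{21(1)} & 0 & \beta_{21(2)} & 0 \end{pmatrix}$

A1A1 subjects:

$(g_s \otimes I) = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}, \quad B_j(g_s \otimes I) = \begin{pmatrix} 0 & 0 \\ \beta_{21(1)} & 0 \end{pmatrix}$

A1A2 subjects:

$(g_s \otimes I) = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad B_j(g_s \otimes I) = \begin{pmatrix} 0 & 0 \\ \beta_{21(2)} & 0 \end{pmatrix}$

A2A2 subjects:

$(g_s \otimes I) = \begin{pmatrix} -1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} -1 & 0 \\ 0 & -1 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}, \quad B_j(g_s \otimes I) = \begin{pmatrix} 0 & 0 \\ -\beta_{21(1)} & 0 \end{pmatrix}$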
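The selection of genotype-specific effects by the Kronecker product can be checked numerically. The following is a minimal sketch using NumPy; the numerical values of the beta parameters and the function name `beta_for_genotype` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Effect coding from the slides: genotype -> (G1, G2)
CODING = {"A1A1": (1, 0), "A1A2": (0, 1), "A2A2": (-1, 0)}

def beta_for_genotype(B, B_j, genotype):
    """Return B + B_j (g_s kron I) for a subject with the given genotype."""
    n = B.shape[0]
    g = np.array(CODING[genotype], dtype=float).reshape(-1, 1)  # n_g x 1
    selector = np.kron(g, np.eye(n))                            # (n_g * n) x n
    return B + B_j @ selector

# Hypothetical values: beta21 = 0.5, beta21(1) = 0.2, beta21(2) = -0.1
B = np.array([[0.0, 0.0],
              [0.5, 0.0]])
B_j = np.array([[0.0, 0.0, 0.0, 0.0],
                [0.2, 0.0, -0.1, 0.0]])

for genotype in CODING:
    # Element (2,1) is the effect of y1 on y2 in that genotype group
    print(genotype, beta_for_genotype(B, B_j, genotype)[1, 0])
```

With this coding the A2A2 group receives the negative of the first deviation, so the three genotype groups need only two dummy variables.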

Stratification effects in term $B_{k}(f \otimes I)$

• The $n_f \times 1$ vector $f$ contains the $n_f$ dummy variables used to code family types.
– The maximum number of deviations is thus #family types - 1.

• $B_k = [B_1 | B_2 | \ldots | B_{n_f}]$ has dimension $n \times (n_f n)$,
– where $B_1$ is the $n \times n$ submatrix containing the effects of the first dummy variable, etc.

• $\otimes$ and $I$ select the proper submatrix for each dummy variable.

Sibling pairs

                                    Subject A  Subject B    F1  F2  F3  F4  F5
Not informative of stratification       2          2         1   0   0   0   0
                                        1          1         0   1   0   0   0
                                        0          0         0   0   1   0   0
Informative of stratification           2          1         0   0   0   1   0
                                        2          0         0   0   0   0   1
                                        1          0         0   0   0   0   0
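The sibling-pair coding can be written as a simple lookup from the two genotypes (number of A1 alleles per sibling) to the family-type dummy variables F1 to F5. This is a minimal sketch; the function names are hypothetical, and it assumes that the order of the two siblings does not matter, which the table does not state explicitly.

```python
# (A1 alleles subject A, A1 alleles subject B) -> (F1, F2, F3, F4, F5)
SIB_PAIR_TYPES = {
    (2, 2): (1, 0, 0, 0, 0),
    (1, 1): (0, 1, 0, 0, 0),
    (0, 0): (0, 0, 1, 0, 0),
    (2, 1): (0, 0, 0, 1, 0),
    (2, 0): (0, 0, 0, 0, 1),
    (1, 0): (0, 0, 0, 0, 0),  # reference family type: all dummies zero
}

def family_dummies(count_a, count_b):
    """Return the dummy vector f for a sibling pair.

    Assumption: sibling order is irrelevant, so the pair is sorted first.
    """
    key = tuple(sorted((count_a, count_b), reverse=True))
    return SIB_PAIR_TYPES[key]

def is_informative(count_a, count_b):
    """In the table, the informative family types are the discordant pairs."""
    return count_a != count_b

print(family_dummies(1, 2))
print(is_informative(1, 1))
```

Six family types are coded with five dummy variables, in line with the rule that the number of deviations is at most #family types - 1.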

Two parents, one “child”

                                    Parent A  Parent B  Subject A    F1  F2  F3  F4  F5
Not informative of stratification      2         2         2          1   0   0   0   0
                                       2         0         1          0   1   0   0   0
                                       0         0         0          0   0   1   0   0
Informative of stratification          2         1         2          0   0   0   1   0
                                                           1          0   0   0   1   0
                                       1         1         2          0   0   0   0   1
                                                           1          0   0   0   0   1
                                                           0          0   0   0   0   1
                                       1         0         1          0   0   0   0   0
                                                           0          0   0   0   0   0

Family Types in a Sample of Singletons and Pairs of Siblings With or Without Genotyped Parents (a)

S = subject, P = parent. A blank parent cell means the same parental genotypes as in the row above.

  Parent not genotyped          Parent genotyped
  One subject   Two subjects    One subject         Two subjects
  S1            S1  S2          P1  P2  S1          P1  P2  S1  S2

Family types not informative of stratification
  2             2   2           2   2   2           2   2   2   2
  1             1   1           2   0   1           2   0   1   1
  0             0   0           0   0   0           0   0   0   0

Family types informative of stratification
                2   1           2   1   2           2   1   2   2
                2   0                   1                   2   1
                1   0           1   1   2                   1   1
                                        1           1   1   2   2
                                        0                   2   1
                                1   0   1                   2   0
                                        0                   1   1
                                                            1   0
                                                            0   0
                                                    1   0   1   1
                                                            0   1
                                                            0   0

(a) The cells list the number of A1 alleles.

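Subject (=y) variables, Structural part

$\alpha_{jk(s)} = \alpha + \alpha_{j}\,g_{s} + \alpha_{k}\,f$

with dimension $\alpha$ is $n \times 1$, $\alpha_j$ is $n \times n_g$, $\alpha_k$ is $n \times n_f$

$B_{jk(s)} = B + B_{j}(g_{s} \otimes I) + B_{k}(f \otimes I)$

with dimension $B$ is $n \times n$, $B_j$ is $n \times (n_g n)$, $B_k$ is $n \times (n_f n)$

$\Gamma_{jk(s)} = \Gamma + \Gamma_{j}(g_{s} \otimes I) + \Gamma_{k}(f \otimes I)$

$\Psi^{y}_{jk(s)} = \Psi^{y} + \Psi^{y}_{j}(g_{s} \otimes I) + \Psi^{y}_{k}(f \otimes I)$

Other matrices are partitioned in the same way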
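Because every matrix is partitioned in the same way, one helper for vectors and one for matrices is enough to build all subject-specific parameters. The sketch below uses NumPy; the function names and all numerical values are hypothetical and only illustrate the dimensions given above.

```python
import numpy as np

def partition_vector(base, dev_j, dev_k, g, f):
    """alpha_jk(s) = alpha + alpha_j g_s + alpha_k f."""
    return base + dev_j @ g + dev_k @ f

def partition_matrix(base, dev_j, dev_k, g, f):
    """M_jk(s) = M + M_j (g_s kron I) + M_k (f kron I).

    I has as many rows as `base` has columns, so the same function
    works for square and non-square parameter matrices.
    """
    identity = np.eye(base.shape[1])
    sel_g = np.kron(g.reshape(-1, 1), identity)
    sel_f = np.kron(f.reshape(-1, 1), identity)
    return base + dev_j @ sel_g + dev_k @ sel_f

# Hypothetical example: n = 2 subject variables, n_g = 2, n_f = 5
g = np.array([1.0, 0.0])                  # A1A1 subject
f = np.array([0.0, 0.0, 0.0, 1.0, 0.0])   # family type coded by F4

alpha = np.array([1.0, 2.0])
alpha_j = np.array([[0.5, 0.2], [0.0, 0.0]])
alpha_k = np.zeros((2, 5))
alpha_k[0, 3] = 0.3                       # stratification effect of F4 on mean of y1

B = np.array([[0.0, 0.0], [0.5, 0.0]])
B_j = np.array([[0.0, 0.0, 0.0, 0.0], [0.2, 0.0, -0.1, 0.0]])
B_k = np.zeros((2, 10))
B_k[1, 6] = 0.1                           # element (2,1) of the fourth n x n block

alpha_s = partition_vector(alpha, alpha_j, alpha_k, g, f)
B_s = partition_matrix(B, B_j, B_k, g, f)
print(alpha_s)
print(B_s)
```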

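Subject (=y) variables, measurement part

$\nu^{y}_{jk(s)} = \nu^{y} + \nu^{y}_{j}\,g_{s} + \nu^{y}_{k}\,f$

with dimension $\nu^y = n_y \times 1$, $\nu^y_j = n_y \times n_g$, $\nu^y_k = n_y \times n_f$

$\Lambda^{y}_{jk(s)} = \Lambda^{y} + \Lambda^{y}_{j}(g_{s} \otimes I) + \Lambda^{y}_{k}(f \otimes I)$

with dimension $\Lambda^y = n_y \times n$, $\Lambda^y_j = n_y \times (n_g n)$, $\Lambda^y_k = n_y \times (n_f n)$

$\Theta^{y}_{jk(s)} = \Theta^{y} + \Theta^{y}_{j}(g_{s} \otimes I_{y}) + \Theta^{y}_{k}(f \otimes I_{y})$

with dimension $\Theta^y = n_y \times n_y$, $\Theta^y_j = n_y \times (n_g n_y)$, $\Theta^y_k = n_y \times (n_f n_y)$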

Covariance between subjects from same family:

$\Sigma_{k(s=A,s=B)} = C + C_{k}(f \otimes I_{y})$

with dimension $C = n_y \times n_y$, $C_k = n_y \times (n_f n_y)$.

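Family (=x) variables:

On each right-hand side, the matrix subscripted k is the block matrix of stratification deviations; on the left-hand side, k indexes the family type.

$\Psi^{x}_{k} = \Psi^{x} + \Psi^{x}_{k}(f \otimes I)$

with dimension $\Psi^x$ is $n \times n$, $\Psi^x_k$ is $n \times (n_f n)$

$\nu^{x}_{k} = \nu^{x} + \nu^{x}_{k}\,f$

with dimension $\nu^x = n_x \times 1$, $\nu^x_k = n_x \times n_f$

$\Lambda^{x}_{k} = \Lambda^{x} + \Lambda^{x}_{k}(f \otimes I)$

with dimension $\Lambda^x = n_x \times n$, $\Lambda^x_k = n_x \times (n_f n)$

$\Theta^{x}_{k} = \Theta^{x} + \Theta^{x}_{k}(f \otimes I_{x})$

with dimension $\Theta^x = n_x \times n_x$, $\Theta^x_k = n_x \times (n_f n_x)$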

General interpretation

• Genetic effects on:

– means are “main” effects

– relations between variables are interaction effects

– residuals are variance effects

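[Figure: path diagram. The genotype dummy variables G1 and G2 affect y1 and y2 through $\alpha_{1(1)}$, $\alpha_{1(2)}$, $\alpha_{2(1)}$, and $\alpha_{2(2)}$; y1 and y2 affect each other through $\beta_{12}$ and $\beta_{21}$; the residual terms are $\psi_{11}$, $\psi_{21}$, and $\psi_{22}$.]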

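Simple example

$y = \alpha + \alpha_{j}\,g + B\,y + \zeta$

$E(\zeta\zeta^{t}) = \Psi^{y}$

or,

$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} + \begin{pmatrix} \alpha_{1(1)} & \alpha_{1(2)} \\ \alpha_{2(1)} & \alpha_{2(2)} \end{pmatrix} \begin{pmatrix} G_1 \\ G_2 \end{pmatrix} + \begin{pmatrix} 0 & \beta_{12} \\ \beta_{21} & 0 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} + \begin{pmatrix} \zeta_1 \\ \zeta_2 \end{pmatrix}$

or,

$y_1 = \alpha_1 + \alpha_{1(1)}G_1 + \alpha_{1(2)}G_2 + \beta_{12}y_2 + \zeta_1$

$y_2 = \alpha_2 + \alpha_{2(1)}G_1 + \alpha_{2(2)}G_2 + \beta_{21}y_1 + \zeta_2$

$\Psi^{y} = \begin{pmatrix} \psi_{11} & \\ \psi_{21} & \psi_{22} \end{pmatrix}$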
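Because y appears on both sides of the simple example, the expected values per genotype follow from the reduced form $E(y \mid g) = (I - B)^{-1}(\alpha + \alpha_j g)$. The sketch below computes these means with NumPy for a mediator-type specification (genotype affects y1, y1 affects y2); all parameter values and the function name `expected_y` are hypothetical.

```python
import numpy as np

# Effect coding from the slides: genotype -> (G1, G2)
CODING = {"A1A1": (1, 0), "A1A2": (0, 1), "A2A2": (-1, 0)}

def expected_y(alpha, alpha_j, B, genotype):
    """Solve (I - B) E(y) = alpha + alpha_j g for one genotype group."""
    g = np.array(CODING[genotype], dtype=float)
    rhs = alpha + alpha_j @ g
    return np.linalg.solve(np.eye(len(alpha)) - B, rhs)

# Hypothetical parameter values
alpha = np.array([10.0, 20.0])
alpha_j = np.array([[1.0, 0.5],    # genetic effects on y1
                    [0.0, 0.0]])   # no direct genetic effects on y2
B = np.array([[0.0, 0.0],
              [0.5, 0.0]])         # beta21: effect of y1 on y2

for genotype in CODING:
    print(genotype, expected_y(alpha, alpha_j, B, genotype))
```

In this specification the genotype differences in y2 arise only indirectly, through y1.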

Genetic effects on y1

Genotype                     A2A2               A1A2              A1A1
Deviation from $\alpha_1$    $-\alpha_{1(1)}$   $\alpha_{1(2)}$   $\alpha_{1(1)}$

      A1A1  A1A2  A2A2
G1      1     0    -1
G2      0     1     0

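Additive model

[Figure: path diagram. Genotype (G1, G2) affects y2 through $\alpha_{2(1)}$ and $\alpha_{2(2)}$; y1 affects y2 through $\beta_{21}$; residual variances $\psi_{11}$ and $\psi_{22}$.]

Mediator model

[Figure: path diagram. Genotype (G1, G2) affects y1 through $\alpha_{1(1)}$ and $\alpha_{1(2)}$; y1 affects y2 through $\beta_{21}$; residual variances $\psi_{11}$ and $\psi_{22}$.]

Reversed effect model

[Figure: path diagram. Genotype (G1, G2) affects y2 through $\alpha_{2(1)}$ and $\alpha_{2(2)}$; y2 affects y1 through $\beta_{12}$; residual variances $\psi_{11}$ and $\psi_{22}$.]

Common gene model

[Figure: path diagram. Genotype (G1, G2) affects both y1 and y2 through $\alpha_{1(1)}$, $\alpha_{1(2)}$, $\alpha_{2(1)}$, and $\alpha_{2(2)}$; residual variances $\psi_{11}$ and $\psi_{22}$, residual covariance $\psi_{21}$.]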

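Interactions

$y = \alpha + \alpha_{j}\,g + B\,y + B_{j}(g \otimes I)\,y + \zeta$

$B_j = \begin{pmatrix} 0 & 0 & 0 & 0 \\ \beta_{21(1)} & 0 & \beta_{21(2)} & 0 \end{pmatrix}$

Applied to additive model:

$y_1 = \alpha_1 + \zeta_1$

$y_2 = \alpha_2 + \alpha_{2(1)}G_1 + \alpha_{2(2)}G_2 + \beta_{21}y_1 + \beta_{21(1)}y_1G_1 + \beta_{21(2)}y_1G_2 + \zeta_2$

[Figure: regression lines of y2 on y1 for the genotypes A1A1, A1A2, and A2A2 when $\beta_{21(1)} > 0$ and $\beta_{21(2)} = 0$.]

[Figure: regression lines of y2 on y1 for the genotypes A2A2, A1A2, and A1A1 when $\beta_{21(1)}$ and $\beta_{21(2)} > 0$.]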
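In the interaction model the slope of y2 on y1 differs per genotype: collecting the y1 terms gives $\beta_{21} + \beta_{21(1)}G_1 + \beta_{21(2)}G_2$. A minimal sketch of that calculation follows; the function name and the numerical values are hypothetical.

```python
# Effect coding from the slides: genotype -> (G1, G2)
CODING = {"A1A1": (1, 0), "A1A2": (0, 1), "A2A2": (-1, 0)}

def slope_y2_on_y1(beta21, beta21_1, beta21_2, genotype):
    """Genotype-specific effect of y1 on y2 in the interaction model."""
    g1, g2 = CODING[genotype]
    return beta21 + beta21_1 * g1 + beta21_2 * g2

# Hypothetical case with beta21(1) > 0 and beta21(2) = 0
for genotype in ("A1A1", "A1A2", "A2A2"):
    print(genotype, slope_y2_on_y1(0.5, 0.2, 0.0, genotype))
```

With $\beta_{21(2)} = 0$ the heterozygote slope equals the general slope, and the two homozygote slopes lie symmetrically around it.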

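Estimation and specification in Mx

Expected means and covariances single subject

$E(y_{(s)}) = \mu^{y}_{jk(s)} = \nu^{y}_{jk(s)} + \Lambda^{y}_{jk(s)}(I - B_{jk(s)})^{-1}\alpha_{jk(s)}$

$E(x) = \mu^{x}_{k} = \nu^{x}_{k}$

$E((y_{(s)} - \mu^{y}_{jk(s)})(y_{(s)} - \mu^{y}_{jk(s)})^{t}) = \Sigma^{y}_{jk(s)} = \Lambda^{y}_{jk(s)}(I - B_{jk(s)})^{-1}(\Gamma_{jk(s)}\Psi^{x}_{k}\Gamma_{jk(s)}^{t} + \Psi^{y}_{jk(s)})((I - B_{jk(s)})^{-1})^{t}(\Lambda^{y}_{jk(s)})^{t} + \Theta^{y}_{jk(s)}$

$E((x - \mu^{x}_{k})(x - \mu^{x}_{k})^{t}) = \Sigma^{x}_{k} = \Lambda^{x}_{k}\Psi^{x}_{k}(\Lambda^{x}_{k})^{t} + \Theta^{x}_{k}$

$E((y_{(s)} - \mu^{y}_{jk(s)})(x - \mu^{x}_{k})^{t}) = \Sigma^{yx}_{jk(s)} = \Lambda^{y}_{jk(s)}(I - B_{jk(s)})^{-1}\Gamma_{jk(s)}\Psi^{x}_{k}(\Lambda^{x}_{k})^{t}$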
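The expected means and covariances of the y variables of a single subject can be computed directly from the subject-specific matrices. The sketch below uses NumPy and assumes the standard LISREL form, with the means equal to $\nu^y + \Lambda^y (I - B)^{-1}\alpha$ and the covariance matrix equal to $\Lambda^y (I - B)^{-1}(\Gamma\Psi^x\Gamma^t + \Psi^y)((I - B)^{-1})^t(\Lambda^y)^t + \Theta^y$. The function name and all numerical values are hypothetical.

```python
import numpy as np

def subject_moments(nu_y, lam_y, B, alpha, gamma, psi_x, psi_y, theta_y):
    """Expected mean vector and covariance matrix of y for one subject."""
    n = B.shape[0]
    inv = np.linalg.inv(np.eye(n) - B)               # (I - B)^-1
    mu_y = nu_y + lam_y @ inv @ alpha
    sigma_eta = inv @ (gamma @ psi_x @ gamma.T + psi_y) @ inv.T
    sigma_y = lam_y @ sigma_eta @ lam_y.T + theta_y
    return mu_y, sigma_y

# Hypothetical example: two subject variables measured without error,
# one family variable that has no effect (gamma = 0)
nu_y = np.zeros(2)
lam_y = np.eye(2)
B = np.array([[0.0, 0.0], [0.5, 0.0]])
alpha = np.array([1.0, 2.0])
gamma = np.zeros((2, 1))
psi_x = np.array([[1.0]])
psi_y = np.eye(2)
theta_y = np.zeros((2, 2))

mu_y, sigma_y = subject_moments(nu_y, lam_y, B, alpha, gamma, psi_x, psi_y, theta_y)
print(mu_y)
print(sigma_y)
```

In the framework these inputs would be the partitioned matrices for a given genotype and family type, so that each subject can have its own expected means and covariances.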

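Expected means and covariances whole family

Complete data vector $z^{t} = (x^{t}, y^{t})$:

$E(z^{t}) = \mu^{t}_{jk} = [(\mu^{x}_{k})^{t}, (\mu^{y}_{jk(s=1)})^{t}, \ldots, (\mu^{y}_{jk(s=n_s)})^{t}]$

$E((z - \mu_{jk})(z - \mu_{jk})^{t}) = \Sigma_{jk} = \begin{pmatrix} \Sigma^{x}_{k} & \Sigma^{xy}_{jk(s=1)} & \cdots & \Sigma^{xy}_{jk(s=n_s)} \\ \Sigma^{yx}_{jk(s=1)} & \Sigma^{y}_{jk(s=1)} & \cdots & \Sigma_{k(s=A,s=B)} \\ \vdots & \vdots & \ddots & \vdots \\ \Sigma^{yx}_{jk(s=n_s)} & \Sigma_{k(s=A,s=B)} & \cdots & \Sigma^{y}_{jk(s=n_s)} \end{pmatrix}$

$\Sigma_{k(s=A,s=B)}$: covariances between subjects from same family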

Maximize the log-likelihood function given the observed data by raw maximum likelihood:

$\ln L(\theta; z) = \sum_{i=1}^{N} \ln L_i$

where the individual log-likelihoods equal

$\ln L_i = -\tfrac{1}{2}\{ n_{z_i}\log(2\pi) + \log|\Sigma_{jk}| + (z_i - \mu_{jk})^{t}\Sigma_{jk}^{-1}(z_i - \mu_{jk}) \}$

Minus two times the difference between the log-likelihoods of two nested models is chi-square distributed, with the difference in the number of estimated parameters as the degrees of freedom.
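The raw maximum likelihood function is a sum of multivariate normal log-densities, one per family, each evaluated with that family's own expected mean vector and covariance matrix. The following NumPy sketch shows the calculation; the function names are hypothetical and the data are made up for illustration.

```python
import numpy as np

def family_loglik(z, mu, sigma):
    """Multivariate normal log-likelihood of one family's data vector z."""
    n_z = len(z)
    dev = z - mu
    sign, logdet = np.linalg.slogdet(sigma)          # log of the determinant
    quad = dev @ np.linalg.solve(sigma, dev)         # dev' sigma^-1 dev
    return -0.5 * (n_z * np.log(2.0 * np.pi) + logdet + quad)

def sample_loglik(families):
    """Sum the individual log-likelihoods over families.

    `families` is a list of (z, mu, sigma) tuples, so families of
    different sizes can have vectors of different lengths.
    """
    return sum(family_loglik(z, mu, sigma) for z, mu, sigma in families)

# Hypothetical data: one pair of subjects and one singleton
families = [
    (np.array([0.3, -0.1]), np.zeros(2), np.array([[1.0, 0.4], [0.4, 1.0]])),
    (np.array([0.5]), np.zeros(1), np.array([[1.0]])),
]
print(sample_loglik(families))
```

Because each family contributes its own term, incomplete families need no special treatment beyond shorter data vectors.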

Specification

– In most instances, specification is a selection of matrices

– Working out the dimensions of the matrices is boring and error-prone

– Get started

Therefore a simple program
– Runs in batch mode or by asking questions

MxScript

• Data structure
– Number of (latent) subject variables?
– Number of subjects in largest family?
– Number of dummy variables for genotypes?

• Matrices to be used
– Do the subject variables have causal effects on each other? BETA?
– GENETIC: causal relations between subject variables? BETA?
– STRATIFICATION: means of subject variables? ALPHA?

• File names
– Name of file with your data? (DOS name)
– Name of the file for the Mx script? (DOS name)

Structure Mx script

• In most instances four groups

Group  Function                Free parameters  Starting values
1      General part            yes              yes
2      Genetic effects         yes
3      Stratification effects  yes
4      Fit model to data

Type from DOS-prompt: MxScript <ENTER>

Type from DOS-prompt: MxScript input.dat <ENTER>

Example

• Name data file: example.dat

• Sibling pairs, no parents

• Three genotype groups

• Family variables in data file (indicate that you want to specify admixture effects)

• Starting values: sample drawn from multivariate distribution with means 0 and variances 1.5

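General part

[Figure: path diagram with two latent variables, BMD (indicators Hip, Arm, Spine) and exercise (indicators Duration, Intensity).]

Identification measurement model:

$\Lambda^{y} = \begin{pmatrix} \ldots & 0 \\ \ldots & 0 \\ 0 & 1 \\ 0 & \lambda_{42} \\ 0 & \lambda_{52} \end{pmatrix}$

Genetic + Stratification effects

[Figure: the same path diagram of BMD (Hip, Arm, Spine) and exercise (Duration, Intensity), with genetic and stratification effects added.]

Common pathway?

Independent pathway?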

Tests

Common pathway: Estimate a model with genetic and stratification effects on the means of the second latent variable and test for significance of:

1. Genetic effects

2. Stratification effects

3. Genetic + stratification effect

Independent pathway: Estimate a model with genetic and stratification effects on the means of the indicators of the second latent variable and test for significance of:

1. Genetic effects

2. Stratification effects

3. Genetic + stratification effect
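Each of these tests compares a full model with a nested model in which the effects of interest are fixed to zero. A minimal sketch of the likelihood-ratio calculation is given below; the function name and the log-likelihood values are hypothetical, and the lookup table holds the standard chi-square critical values at the 0.05 level for 1 to 4 degrees of freedom only.

```python
# Chi-square critical values at alpha = 0.05 for df = 1..4
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def likelihood_ratio_test(loglik_restricted, loglik_full,
                          n_par_restricted, n_par_full):
    """Return (chi-square statistic, degrees of freedom, significant at 0.05)."""
    chi2 = -2.0 * (loglik_restricted - loglik_full)
    df = n_par_full - n_par_restricted
    return chi2, df, chi2 > CRITICAL_05[df]

# Hypothetical fit results: dropping two genetic effects on the mean
# lowers the log-likelihood from -1000.0 to -1005.0
chi2, df, significant = likelihood_ratio_test(-1005.0, -1000.0, 10, 12)
print(chi2, df, significant)
```

For a locus with three genotypes, the test of the genetic effects on a single mean has two degrees of freedom, one per dummy variable.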

Free elements

a Full 2 1 Free [Matrices-End matrices section]

Free a 1 1 a 2 1 [After End matrices - free elements]

Free a 1 1 to a 2 1 [After End matrices - free range]




Solution

Copy files from f:\edwin\solution
