gene mapping quantitative traits using ibd sharing references: introduction to quantitative...

39
Gene Mapping Quantitative Traits using IBD sharing rences: oduction to Quantitative Genetics, by D.S. Falconer T. F.C. Mackay (1996) Longman Press ter 5, Statistics in Human Genetics by P. Sham (199 ld Press ter 8, Mathematical and Statistical Methods for Gen ysis by K. Lange (2002) Springer

Upload: calvin-morgan

Post on 03-Jan-2016

221 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

Gene Mapping Quantitative Traits using IBD sharing

References:

Introduction to Quantitative Genetics, by D.S. Falconerand T. F.C. Mackay (1996) Longman Press

Chapter 5, Statistics in Human Genetics by P. Sham (1998)Arnold Press

Chapter 8, Mathematical and Statistical Methods for GeneticAnalysis by K. Lange (2002) Springer

Page 2: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

What is a Quantitative Trait?

A quantitative trait has numerical values that can be ordered highest to lowest. Examples include height, weight, cholesterol level, reading scores etc. There are discrete values where the values differ by a fixed amount and continuous values where the difference in two values can be arbitrarily small. Most methods for quantitative traits assume that the data are continuous (at least approximately).

Page 3: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

Why use quantitative traits?(1) More power. Fewer subjects may need to be examined (phenotyped)

if one uses the quantitative trait rather than dichotomizing it to create qualitative trait.

0

0.005

0.01

0.015

0.02

0.025

0.03

0.035

0.04

0.045

-5 5

affectedsxy w z

unaffecteds

Individuals w and x have similar trait values, yet w is grouped with z and x is grouped with y. Note that even among affecteds, knowing the trait value is useful (v and z are more similar than v and w).

v

Page 4: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

Why use Quantitative Traits?(2) The genotype to phenotype relationship may be more direct. Affection

with a disease could be the culmination of many underlying events involving gene products, environmental factors and gene-environment interactions. The underlying events may differ among people, resulting in heterogeneity.

Some quantitative traits are more likely under the control of a single gene than others. An example: Intermediate traits like factor IX levels are influenced by fewer genes than clotting times. Genes influencing factor IX level will be easier to map than genes influencing clotting times.

Page 5: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

Why use quantitative traits?

(3) End stage disease may be too late. If the disease is late onset, then parents may not be available anymore. However if there is a quantitative trait that is known to predict increased risk of the disease, then it might be measured earlier in a person’s lifetime. Their parents may also be available for genotyping resulting in more information.

Page 6: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

Why not use quantitative traits?

(1) The quantitative trait doesn’t meet the assumptions of the proposed statistical method. For example many methods assume the quantitative traits are unimodal but not all quantitative traits are unimodal.

(2) The values of the quantitative trait might be very unreliable.

(3) There are no good intermediate quantitative phenotypes for a particular disease. The quantitative traits available aren’t telling the whole story.

Page 7: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

Components of the Phenotypic Variance of a Quantitative Trait

The total variance in a quantitative trait, termed the phenotypic variance, can be partitioned into the variance due to genetic components, the environmental components and gene-environment interaction components.

genes

sharedenvironment

Independent environment

genes

Trait value Poly-genes

Page 8: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

Components of Phenotypic Variance of a Quantitative Trait Often we make simplifying assumptions, for example that there is no

variance component due to interactions, that there is no shared environment and that all genes are acting independently.

genes

Independent environment

Trait value

In this case we can write the phenotypic variance,

VP, as the sum of the

genetic variance, VG, and the environmental

variance, VE.

VP = VE+VG

Page 9: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

The Additive and Dominance Components of Variance

VG = VA + VD

VA, the additive genetic variance is attributedthe inheritance of individual alleles.

VD, the dominance genetic variance is attributed to the alleles acting together as genotypes.

VG / VP= heritability in the broad-sense.VA /VP= heritability in the narrow-sense.

Page 10: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

The degree of correlation between two relatives depends on the theoretical kinship coefficient

• An important measure of family relationship is the theoretical Kinship coefficient.

• It is the probability that two alleles, at a randomly chosen locus, one chosen randomly from individual i and one from j are identical by descent.

• The kinship coefficient does not depend on the observed genotype data.

Page 11: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

Covariance between relatives under an polygenic model depends on the theoretical kinship coefficient and the

probability that, at any arbitrary autosomal locus, the pair share both genes IBD

Relationship kinship coefficient P(IBD=2) covarianceparent-offspring 1/4 0 1/2*VA

full siblings 1/4 1/4 1/2*VA+1/4*VD

uncle-nephew 1/8 0 1/4*VA

first cousins 1/16 0 1/8*VA

Note: This doesn’t depend on any measured genotype effects(marker information).

Page 12: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

Covariance among relatives also depends upon theallele sharing at a trait locus

Allele Sharing: Identity-by-Descent (IBD)

Parental genotypes1 / 2 3 / 4

1 / 3 2 / 4

1 / 31 / 3

1 / 42 / 3

1 / 3 1 / 3

Alleles shared IBD0

1

2

Proportion ofAlleles shared

IBD0

0.5

1.0

Page 13: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

The proportion of alleles shared IBD is equivalent to twice the conditional kinship

coefficient.

The conditional kinship coefficient is the probability that a gene chosen randomly from person i at a specific locus matches a gene chosen randomly from person j given the available genotype information at markers.

Page 14: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

We expect two siblings with similar, extreme trait values to share more alleles IBD at the trait locus than two siblings who have dissimilar extreme trait values.

1-2

Y = -3.011-1 1-1 1-2

1 2

IBD34 = 0= 0

3 4 5 6

1-2Y=0.2 Y=0.35

2-2Y = 3.78

IBD45 = 2

Y = 3.22 Y = 1.06IBD56 = 1

0ˆ ij2

1ˆ ij 4

1ˆ ij

Page 15: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

The dependence of the trait’s covariance on the IBD sharing at a marker is a function of the distance between the trait andthe marker loci as well as the strength of the QTL.

As the map distance increases, the covariance of the trait values becomes less dependent on IBD sharing at the marker and so the apparent QTL variance component will decrease.

Page 16: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

We expect two siblings with similar, extreme trait values to share more alleles IBD at the trait locus than two siblings who have dissimilar extreme trait values. Or another to think about it, we expect that the correlation among trait values will depend on IBD sharing.

-4-3-2-101234

-4 -2 0 2 4

IBD = 0

-4

-2

0

2

4

-4 -2 0 2 4

IBD = 1

j

-4-3-2-10

1234

-4 -3 -2 -1 0 1 2 3 4

IBD = 2

If a marker is strongly linked to the trait locus

-4

-2

0

2

4

-4 -2 0 2 4

-4

-2

0

2

4

-4 -2 0 2 4

Under no linkage we expect

-4

-2

0

2

4

-4 -2 0 2 4

-4

-2

0

2

4

-4 -2 0 2 4

If no linkage we expect

Page 17: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

H a s e m a n - E l s t o n R e g r e s s i o n :

L e t Y 1 k d e n o t e t h e t r a i t v a l u e f o r s i b l i n g 1 i n s i b l i n g p a i rk a n d Y 2 k t h e v a l u e f o r s i b l i n g 2 .

I f t h e t r a i t a n d t h e m a r k e r a r e l i n k e d t h e n , t h ed i f f e r e n c e i n t h e t r a i t v a l u e s f o r t h e p a i r w i l l b e r e l a t e dt o t h e n u m b e r o f g e n e s s h a r e d I B D b y t h e p a i r .

M o r e p r e c i s e l y , l e t k d e n o t e t h e p r o p o r t i o n o f a l l e l e s

s h a r e d I B D a t t h e m a r k e r f o r s i b l i n g p a i r k t h e n ,

e k i s t h e v a r i a t i o n d u e t o m e a s u r e m e n t e r r o r o ru n c o n t r o l l e d e n v i r o n m e n t d i f f e r e n c e s .

I f t h e r e i s n o l i n k a g e b e t w e e n t h e t r a i t a n d m a r k e r t h e n = 0 . I f t h e m a r k e r a n d t r a i t l o c u s a r e l i n k e d t h e n > 0 . 0 .

kkkk eYY 221

Page 18: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

Haseman Elston Regression

00.20.40.60.8

11.21.41.6

0 0.5 1 1.5 2

IBD

squ

are

dif

f

square diff

Page 19: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

********************************************* * * * ANALYSIS FOR TRAIT NUMBER 01 ( TRAIT ) * * * *********************************************

SIMPLE LINEAR REGRESSION ANALYSIS Effective Regress Y on Pi Trait Locus D.F. t-value P-values Intercept Slope() ----- ----- ---- ------- -------- --------- ------- TRAIT M1 102 -3.0735 .001357** 2.4431 1.5060

t-value and p-value are for the slope

Page 20: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

INTERPRETING THE RESULTS: Under the nullhypothesis of no linkage between the trait and the markerthe probability of observing a T-stat as negative or morenegative than -3.0735 is 0.001357

CAVEATS:

(1) Method subject to the same cautions as linearregression. For example, severe non-normality of the traitcan influence the estimates.

(2) The caveats listed for qualitative, model free linkage alsoapply.

Page 21: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

Another way to test whether the covariance among relatives trait values is correlated with the IBD sharing at a locus is to use a variance component model.

QTL mapping using a variance component model

Page 22: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

A simple variance component model has one major trait locus, a polygenic effect, environmental factors that are independent of genetic effects and independent across family members (no household effects). The major gene and polygenic effects are also independent

Polygenes Independent environment

QTL

Trait value

Page 23: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

Mathematically:

Yi=+Tai+gi+qi+ei

where is the population mean, a are the “environmental” predictor variables, q is the major trait locus, g is the polygenic effect, and e is the residual error.

Polygenes Independent environment

QTL

Trait value

Page 24: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

The variance of Y is: var(Y)=VA +VD+VG +VE

and for relatives i and j:

where VA= additive genetic variance, VD = dominance genetic variance, VG= polygenic variance,ij = the theoretical kinship coefficient for i and j, = the conditional kinship coefficient for i and j at a map location = the probability that i and j share both alleles ibd at a map location,

Bottom line: the trait covariance increases as ibd sharing increases.

GijDijAijji VVVYYCov 2ˆˆ2),(

ij

ij

Page 25: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

Some things to consider:

The estimates of VA and VD are not the actual variances due to the QTL - they depend on how far the map location is from the QTL and the sampled data.

For a parent-child pair,cov(Yi,Yj)=VA +VG

for any map location. Why is the conditional kinship coefficient always 1/4? Why is the dominance variance missing from this equation?

For two siblings i and j,

GDijAijji VVVYYCov2

1ˆˆ2),(

Page 26: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

Often VD is assumed to be negligible (VD = 0): Then for any relative pair i and j

GijAijji VVYYCov 2ˆ2),(

Under the null hypothesis of no linkage to the map location,

GAijGijAijji VVVVYYCov 222),(

Page 27: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

Variance component methods of linkage analysis example overview:

(1) Estimate the IBD sharing at specified locations along the genome using marker data.

(2) Estimate the variance componentsVG and VE, under the null model by maximizing the likelihood.

(3) Given the IBD sharing, estimate the variance components, VA,VG and VE by maximizing the likelihood using the IBD sharing at specified map positions Z.

(4) Calculate the location score for each map position Z,

Identify the map positions where the location score is large.

)()(10 ZLZLLog

Page 28: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

As the number of traits increases the complexity of the log-likelihood also increases

The loglikelihood is maximized using a steepest ascent algorithm.

It becomes more and more difficult to find the global maximum as multiple local maximum exist.One “solution” is to use several starting points for the maximization.

s1f1 s2 f2

Page 29: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

QTL Example:

The mystery trait example from the Mendel manual:

Besides the usual commands:

PREDICTOR = Grand :: Trait1PREDICTOR = SEX :: Trait1PREDICTOR = AGE :: Trait1PREDICTOR = BMI :: Trait1COEFFICIENT_FILE = Coefficient19b.in <ibd info from sibwalkQUANTITATIVE_TRAIT = Trait1COVARIANCE_CLASS = Additive <polygenicCOVARIANCE_CLASS = EnvironmentalCOVARIANCE_CLASS = Qtl <now specify an additive qtlGRID_INCREMENT = 0.005 <spacing of the map pointsANALYSIS_OPTION = Polygenic_QtlVARIABLE_FILE = Variable19b.inPROBAND = 1PROBAND_FACTOR = PROBAND

Page 30: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

Results• Get a summary file and a full output file• The summary file looks like: MARKER MAP LOCATION AIC NUMBER

OF DISTANCE SCORE FACTORSMarker01 0.0000 1.5892 6.6816 1Marker02 0.0010 1.5679 6.7798 1-- 0.0050 1.6693 6.3126 1-- 0.0100 1.8603 5.4329 1-- 0.0150 2.1112 4.2778 1-- 0.0200 2.4028 2.9346 1Marker03 0.0228 2.5740 2.1463 1Marker04 0.0238 2.5757 2.1383 1-- 0.0250 2.5666 2.1804 1-- 0.0300 2.4896 2.5351 1

AIC = -2*ln(L(Z))+2n The smaller the AIC the better the fitn = number of parameters – number of constraintsFactors will be explained in a little while.

Page 31: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

There is more information in the output file including parameter estimates. However, the estimates of locus specific additive variance and narrow sense heritability obtained from genome wide scans are upwardly biased. Therefore these estimates could lead one to over estimate the importance of the QTL indetermining trait values (Goring et al, 2001, AJHG 69:1357-1369).

Page 32: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

The model we have been considering is very simple:

(1)When examining large pedigrees, it may be possible to consider more realistic models for the environmental covariance. Example: Modeling common environmental effects using a household indicator. H=1 if i and j are members of the same household and H=0 if i and j are not members of the same household.

(2) The variance component model can use more than one quantitative trait at once as the outcome.

cijGijDijAijP VHVVVV 2ˆˆ2

Page 33: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

Using more than one quantitative trait in the analysis

• The model extends so that multiple traits can be considered at the same time.

• The phenotypic variance is now a matrix.

• The variance components get more complicated. Instead of one term per variance component, there are (1+…+n) = (n+1)*n/2 terms where n is the number of quantitative traits.

• As an example, consider two traits X and Y.

eYeXY

eXYeX

AYAXY

AXYAX

gYgXY

gXYgX

PYPXY

PXYPX

VV

VV

VV

VV

VV

VV

VV

VV

Page 34: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

For technical reasons it is better to reparameterize the variances using factor analytic approach

• Factor refers to hidden underlying variables that capture the essence of the data

• Each variance component is parameters in terms of factors.

• We will illustrate with the additive genetic variance matrix for two traits X and Y (in principle any number of traits or any of the components could have been used).

• There exists a matrix such that:

22

212121

21 ,, AAAYAAAXYAAX VVV

212

1 0

AA

A

Page 35: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

Factors can be used to search for pleiotropic effects?

Could a single factor explain QTL variance component?

A single factor is consistent with pleiotropy although there may be other explanations a single factor.

When are more than two traits we could have reduced numbers of factors.

Page 36: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

Reduction in ParametersRecall the original factor matrix for the QTL

Set A2 = 0

212121

21 ,, AAYAAAXYAAX VVV

Page 37: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

Modifications to the control file

QUANTITATIVE_TRAIT = Trait1QUANTITATIVE_TRAIT = Trait2PREDICTOR = Grand :: Trait1PREDICTOR = SEX :: Trait1PREDICTOR = AGE :: Trait1PREDICTOR = BMI :: Trait1PREDICTOR = Grand :: Trait2PREDICTOR = SEX :: Trait2PREDICTOR = AGE :: Trait2PREDICTOR = BMI :: Trait2COVARIANCE_CLASS = AdditiveCOVARIANCE_CLASS = EnvironmentalCOVARIANCE_CLASS = Qtl

Page 38: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

One factor explains the results as well as two

MARKER MAP LOCATION AIC NUMBER OF DISTANCE SCORE FACTORSMarker01 0.0000 1.6533 24.3863 1Marker02 0.0010 1.6492 24.4052 1-- 0.0050 1.7558 23.9143 1-- 0.0100 1.9508 23.0161 1...

Marker10 0.0931 0.5852 29.3049 1Marker11 0.0941 0.5111 29.6464 1

2 factors:Marker01 0.0000 1.6605 26.3529 2Marker02 0.0010 1.6520 26.3925 2-- 0.0050 1.7568 25.9098 2-- 0.0100 1.9508 25.0162 2...

Marker10 0.0931 0.5914 31.2764 2Marker11 0.0941 0.5179 31.6149 2

Page 39: Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman

Summary

• Variance component models can be used to understand the correlations among traits in families

• They can also be used to map QTLs

• Variance component models provide a powerful approach for multivariate quantitative trait data.