genome wide association studies and genomic...

5/23/2016

Genome-Wide Association Studies and Genomic Selection

Animal selection

• Selection based on EBV has been very effective

– Substantial increases in the majority of traits and livestock species

– An efficient procedure is routinely applied with little to no intervention by the user

– Accuracies depend on heritabilities and the amount of information

• For some traits, the results are modest or even in the opposite direction

– Fertility in dairy cattle

– Health traits in poultry


Animal selection

• Accuracies of BV estimation depend on:

– Heritabilities

– Amount of information

– Precision in phenotype measurements

• Genetic progress is limited by

– Accuracies of EBV

– Generation interval

– Cost

How can we improve selection?

• Use of molecular information

– Identify causative major genes

– QTL linked to markers

– Develop gene or marker assisted selection

• Very little success!!

– Very few major genes were identified

– Only QTLs with large effects were identified

• Small portion of the total variation

• Hard to use in commercial applications


Things could be better if

• More markers are made available

– This was not possible in the early days of molecular information

• The identification of genes (QTLs) is not needed

– Elimination of the mapping step

• Removes the uncertainties

• Removes the complexity of use in a commercial setup

• Account even for small QTLs

Two Approaches

• Infinitesimal model:

– Assumes that traits are determined by an infinite number of additive loci, each with a small effect

– Basic tool of animal breeding

– Spectacularly successful in many cases


Two Approaches

• Finite loci model:

– There is a finite amount of DNA and genes

– Quantify any gene contribution to variation

– Allows for the dissection of the genetic complexity

• Increase the accuracy of breeding value estimation

• Increase selection response

Two Approaches

• But how do we find these QTLs?

– Candidate gene approach

• Some success

• Number of candidate genes is too large

• Very difficult to find candidates

– Linkage mapping

• Track genome segments from one generation to the next using markers

• QTL are not precisely mapped

• Wide confidence intervals of QTL location


Two approaches

– Fine mapping

• Use Linkage disequilibrium mapping approaches

• The process is slow to find causative mutation

• Marker density was a problem

– Result : Failure of MAS

Then …

• The completion of a working draft of the human and several livestock species genomes

– Sequencing of every nucleotide in the genome

• Developments in high-throughput technologies

– Hybridization techniques

– Polymorphisms genotyping


Polymorphisms

• A specific sequence variation that occurs with a certain frequency in the population

– At least 1%, often 5%

• For example

– Blood type

– Copy number variation (CNV): segments of DNA that are found in different numbers of copies among individuals

– Single Nucleotide Polymorphism (SNP)

SNP

[Figure: SNP (Murray 2007)]


SNPs

AGATTTAGATCCCGATAGAG

AGATTTAGATCACGATAGAG

A SNP is a genomic position at which two or more different bases occur in the population, each with a frequency above a threshold (for example, 1%)

Alleles are C/A
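The comparison of the two sequences above can be sketched in a few lines of Python. This is only an illustration: the function name `find_snps` is hypothetical, and a mismatch between two sequences is just a candidate SNP, since the definition also requires each allele to exceed a population frequency threshold.

```python
def find_snps(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, base in seq_a, base in seq_b) for every mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# The two sequences shown on the slide.
snps = find_snps("AGATTTAGATCCCGATAGAG", "AGATTTAGATCACGATAGAG")
print(snps)  # [(12, 'C', 'A')]
```

The single mismatch at position 12 carries the alleles C/A, as stated on the slide.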

SNPs

• Human Genome ~ 3 Billion base pairs

• Bovine Genome ~ 3 Billion base pairs

• Two individuals are 99.5 to 99.9% identical

– They differ in 3-10 M base pairs

• SNPs occur once every ~600 bp (the average human gene is ~27 kb)

• Around 50 SNPs per gene (27,000 / 600 ≈ 45)


SNPs

• Currently available SNP marker chips

– 1- 2 M SNPs in human

– 50k -500K SNPs in cattle

– 60K SNPs in chickens and pigs

– ~200K for dogs

– Varying numbers for fish, sheep and plants

– Cost ~ $120- $200 for 60 K chip

SNP Haplotypes

[Figure (Sorin Istrail, 2009): two aligned sequences, G C T C G A C A A C A G and G T T C G T C A A C A G, with the SNP positions marked and the resulting haplotypes C A G and T T G]


Polymorphisms

• Single nucleotide polymorphisms (SNPs)

• Difference between any two individuals: about 3 × 10^6 SNPs

… ataggtccCtatttcgcgcCgtatacacgggActata …

… ataggtccGtatttcgcgcCgtatacacgggTctata …

… ataggtccCtatttcgcgcCgtatacacgggTctata …

Haplotypes and Genotypes

• Haplotype: SNP alleles on a chromosome

– 0/1 vector: 0 for major allele, 1 for minor

• Genotype: alleles of a SNP on both chromosomes

– A vector with entries 0/1/2

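Haplotypes and Genotypes

Individual's two haplotypes:

011100110

001000010

Genotype:

021200210

In this example, 0 and 1 mark sites that are homozygous for allele 0 or allele 1, and 2 marks a heterozygous site.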
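A minimal sketch of how a genotype vector can be built from two haplotypes. The first function reproduces the convention that matches the numbers in this example (2 = heterozygous); the second gives the minor allele count (0, 1, 2) that is used as a covariable in regression models. Both function names are hypothetical.

```python
def genotype_het_code(hap1: str, hap2: str) -> str:
    """0 or 1 where both haplotypes agree, 2 where they differ (heterozygous)."""
    return "".join(a if a == b else "2" for a, b in zip(hap1, hap2))


def genotype_allele_count(hap1: str, hap2: str) -> str:
    """Number of copies of allele 1 at each site (0, 1 or 2)."""
    return "".join(str(int(a) + int(b)) for a, b in zip(hap1, hap2))


h1, h2 = "011100110", "001000010"
print(genotype_het_code(h1, h2))      # 021200210
print(genotype_allele_count(h1, h2))  # 012100120
```

The two codings carry the same information but are not interchangeable, so the convention has to be known before genotypes enter a model.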

Coding

Genotype   Dominant   Codominant   Recessive
AA         1          2            1
AG         1          1            0
GG         0          0            0

1. Coding depends on the genetic model (there is no unique way to code genotypes)

2. Interpretation of results depends on the coding procedure
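The coding table can be expressed as a small lookup, sketched below. The dictionary and function names are illustrative only; the numeric codes are the ones in the table, with A as the counted allele.

```python
# Codes for genotypes AA, AG, GG under each genetic model (from the table above).
CODING = {
    "dominant":   {"AA": 1, "AG": 1, "GG": 0},
    "codominant": {"AA": 2, "AG": 1, "GG": 0},
    "recessive":  {"AA": 1, "AG": 0, "GG": 0},
}


def code_genotypes(genotypes: list[str], model: str) -> list[int]:
    """Translate genotype labels into covariate values for the chosen model."""
    table = CODING[model]
    return [table[g] for g in genotypes]


sample = ["AA", "AG", "GG", "AG"]
for model in CODING:
    print(model, code_genotypes(sample, model))
```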


Genotypes vs Haplotypes

• It depends on the application

– Degrees of freedom vs model parameters

• There is no simple answer (application dependent)

• If possible, you could try both

• Keep in mind, you are trying to fit a model (explain the variation in the dependent variable)

How to use this information?

• Genome-wide Association Study (GWAS)

– Identify genetic associations with observable traits

– Identify possible causative mutations

• Genomic selection

– Estimation of genetic quantities that could be used for selection

– Estimation of relationships between individuals

– Paternity checking

• Genomic enhanced management

– Decision support tool


Data analysis

• Single marker analysis

– Compare phenotype means across the marker's genotype classes

• Simple, but too many analyses have to be conducted

– False positives/negatives

– High linkage disequilibrium

$$y_i = \mu + \mathrm{gen}_{ij} + e_i$$

Data analysis

• Example

• Using a linear model

– SNP effect estimate = 10

SNP genotype   Trait (EBV) average
TT             30
GT             20
GG             10
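The estimate of 10 can be checked with a regression of the class averages on the number of T alleles. The sketch below assumes the genotypes are coded by T-allele count (GG = 0, GT = 1, TT = 2), which is one possible coding and not stated on the slide.

```python
import numpy as np

# Genotype classes coded as the number of T alleles (an assumed coding).
t_allele_count = np.array([2.0, 1.0, 0.0])   # TT, GT, GG
trait_average = np.array([30.0, 20.0, 10.0])

# Least-squares line: trait = intercept + slope * allele count.
slope, intercept = np.polyfit(t_allele_count, trait_average, 1)
print(round(slope, 6), round(intercept, 6))
```

The slope is the allele substitution effect: each extra copy of T raises the average by 10 units.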


Data analysis

• Based on single marker analysis results

– Select only significant SNPs for marker assisted selection!

– Use all SNPs

• Some will be missed

• False positives/negatives

• A better solution: a model that uses all the SNPs simultaneously

Multi-marker analysis: Genomic selection

• Use all SNPs in the panel (e.g. 50K) in a single analysis

– Unless at least as many observations as SNPs are available, a fixed-effects model cannot be fitted

– Treating SNP effects as random removes the problem

• Prior information is needed

Data analysis

• Model

$$y_i = \mu + \sum_{j=1}^{\#SNP} g_{ij}\,\beta_j + e_i$$

where $y_i$ is the trait value (often a pseudo record) for individual i, $\beta_j$ is the effect of SNP j, and $g_{ij}$ is the genotype of SNP j for animal i

• The SNP genotypes will be coded as covariables

– Minor allele content (0, 1, 2)

Data analysis

• The resulting system of equations is very dense

– Sparse matrix techniques are of little interest

– Constructing and inverting the coefficient matrix is seldom possible!

• Gauss-Seidel with residual update (GSRU)

– matrix-free BLUP-like estimation procedure

– Low Computational cost

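Data analysis

• Gauss-Seidel

$$\hat\beta_j^{t+1} = \frac{x_j'\left(y - X_{1:j-1}\hat\beta_{1:j-1}^{t+1} - X_{j+1:p}\hat\beta_{j+1:p}^{t}\right)}{x_j'x_j + \alpha}$$

• Further

$$x_j'\left(y - X_{1:j-1}\hat\beta_{1:j-1}^{t+1} - X_{j+1:p}\hat\beta_{j+1:p}^{t}\right) = x_j'e^{t,j} + x_j'x_j\hat\beta_j^{t}$$

where $e^{t,j} = y - X_{1:j-1}\hat\beta_{1:j-1}^{t+1} - X_{j:p}\hat\beta_{j:p}^{t}$ is the current residual vector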

Data analysis

• GSRU

$$\hat\beta_j^{t+1} = \frac{x_j'e^{t,j} + x_j'x_j\hat\beta_j^{t}}{x_j'x_j + \alpha}$$

• Only vectors are involved in the computation
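A minimal sketch of Gauss-Seidel with residual update, assuming centered genotypes, no fixed effects, and a single shrinkage term `alpha` (taken here to be the ratio of residual to SNP variance, which is an assumption about the slide's α). The function name `gsru` and the simulated data are illustrative. The result is compared with the direct solution of $(X'X + \alpha I)\beta = X'y$, which GSRU avoids building.

```python
import numpy as np


def gsru(X, y, alpha, n_sweeps=500):
    """Solve (X'X + alpha*I) beta = X'y without forming X'X."""
    n, p = X.shape
    beta = np.zeros(p)
    e = y.astype(float).copy()            # residuals for beta = 0
    xtx = np.einsum("ij,ij->j", X, X)     # x_j'x_j for every SNP
    for _ in range(n_sweeps):
        for j in range(p):
            xj = X[:, j]
            new_bj = (xj @ e + xtx[j] * beta[j]) / (xtx[j] + alpha)
            e -= xj * (new_bj - beta[j])  # residual update
            beta[j] = new_bj
    return beta


rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(30, 10)).astype(float)
X -= X.mean(axis=0)                       # center genotype columns
y = X @ rng.normal(size=10) + rng.normal(size=30)
y -= y.mean()
alpha = 5.0

beta_gsru = gsru(X, y, alpha)
beta_direct = np.linalg.solve(X.T @ X + alpha * np.eye(10), X.T @ y)
print(np.max(np.abs(beta_gsru - beta_direct)))
```

Each update touches only one genotype column and the residual vector, which is why the method scales to dense systems with tens of thousands of SNPs.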


Data analysis

• Implementation depends on the assumed prior information

– BLUP type, BayesA, BayesB, BayesC, etc.

• Implementation depends on what you are interested in estimating

– SNP effects

– Genomic EBVs: GBLUP (similar to regular BLUP, except that A is replaced by the genomic relationship matrix)

• In both cases, an estimated GBV is obtained

– Directly: GBLUP

– As the sum of SNP effects: $\sum_{j=1}^{\#SNP} g_{ij}\,\hat\beta_j$

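Data analysis

• Model

$$y_i = \mu + \sum_{j=1}^{p} X_{ij}\beta_j + e_i$$

• Likelihood function

$$p(y \mid \mu, \beta, \sigma_e^2) = \left(\frac{1}{2\pi\sigma_e^2}\right)^{N/2} \exp\left[-\frac{1}{2\sigma_e^2}\sum_{i=1}^{N}\Big(y_i - \mu - \sum_{j=1}^{p} X_{ij}\beta_j\Big)^2\right]$$

BLUP-like approach

Prior distributions for $\mu$ and $\sigma_e^2$

$$p(\mu) \sim \text{constant}$$

$$p(\sigma_e^2 \mid \nu_e, s_e^2) \sim \nu_e s_e^2 \chi^{-2}_{\nu_e}$$

Prior distribution for $\beta_j$

$$p(\beta_j \mid \sigma_\beta^2) \sim N(0, \sigma_\beta^2)$$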

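BLUP

Assume $\sigma_\beta^2$ is known. Then the joint posterior distribution is easily obtained as the product of the likelihood function and the prior distributions:

$$p(\mu, \beta, \sigma_e^2 \mid y) \propto p(y \mid \mu, \beta, \sigma_e^2)\, p(\mu, \beta, \sigma_e^2) \propto p(y \mid \mu, \beta, \sigma_e^2)\, p(\mu)\, p(\beta)\, p(\sigma_e^2)$$

Given that

$$p(\mu, \beta, \sigma_e^2) = p(\mu)\, p(\beta)\, p(\sigma_e^2) \propto p(\beta)\, p(\sigma_e^2)$$

Then

$$p(\mu, \beta, \sigma_e^2 \mid y) \propto p(y \mid \mu, \beta, \sigma_e^2)\, p(\beta)\, p(\sigma_e^2)$$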

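BLUP

$$p(\mu, \beta, \sigma_e^2 \mid y) \propto \left(\frac{1}{2\pi\sigma_e^2}\right)^{N/2} \exp\left[-\frac{1}{2\sigma_e^2}\sum_{i=1}^{N}\Big(y_i - \mu - \sum_{j=1}^{p} X_{ij}\beta_j\Big)^2\right] (\sigma_e^2)^{-(\nu_e/2+1)} \exp\left(-\frac{\nu_e s_e^2}{2\sigma_e^2}\right) \prod_{i=1}^{p} (2\pi\sigma_\beta^2)^{-1/2} \exp\left(-\frac{(\beta_i - 0)^2}{2\sigma_\beta^2}\right)$$

For an implementation via the Gibbs sampler, we have to derive the full conditional distributions:

$$p(\mu \mid \beta, \sigma_e^2, y), \qquad p(\beta \mid \mu, \sigma_e^2, y), \qquad p(\sigma_e^2 \mid \mu, \beta, y)$$

BLUP

$$p(\mu \mid \beta, \sigma_e^2, y) \propto \left(\frac{1}{2\pi\sigma_e^2}\right)^{N/2} \exp\left[-\frac{1}{2\sigma_e^2}\sum_{i=1}^{N}\Big(y_i - \sum_{j=1}^{p} X_{ij}\beta_j - \mu\Big)^2\right]$$

$$\propto \left(\frac{1}{2\pi\sigma_e^2}\right)^{N/2} \exp\left[-\frac{1}{2\sigma_e^2}\Big(\sum_{i=1}^{N}\big(y_i - \sum_{j=1}^{p} X_{ij}\beta_j\big)^2 - 2\mu\sum_{i=1}^{N}\big(y_i - \sum_{j=1}^{p} X_{ij}\beta_j\big) + N\mu^2\Big)\right]$$

Putting $\left(\frac{1}{2\pi\sigma_e^2}\right)^{N/2}$ into the proportionality term,

$$p(\mu \mid \beta, \sigma_e^2, y) \propto \exp\left[-\frac{N}{2\sigma_e^2}\Big\{\mu^2 - 2\mu\frac{\sum_{i}\big(y_i - \sum_{j} X_{ij}\beta_j\big)}{N} + \frac{\sum_{i}\big(y_i - \sum_{j} X_{ij}\beta_j\big)^2}{N}\Big\}\right]$$

Further, adding and subtracting $\left[\frac{\sum_{i}(y_i - \sum_{j} X_{ij}\beta_j)}{N}\right]^2$ leads to:

$$p(\mu \mid \beta, \sigma_e^2, y) \propto \exp\left[-\frac{N}{2\sigma_e^2}\Big\{\mu^2 - 2\mu\frac{\sum_{i}\big(y_i - \sum_{j} X_{ij}\beta_j\big)}{N} + \Big[\frac{\sum_{i}\big(y_i - \sum_{j} X_{ij}\beta_j\big)}{N}\Big]^2 - \Big[\frac{\sum_{i}\big(y_i - \sum_{j} X_{ij}\beta_j\big)}{N}\Big]^2 + \frac{\sum_{i}\big(y_i - \sum_{j} X_{ij}\beta_j\big)^2}{N}\Big\}\right]$$

BLUP

Organizing elements within the exponential

$$p(\mu \mid \beta, \sigma_e^2, y) \propto \exp\left[-\frac{N}{2\sigma_e^2}\Big\{\mu - \frac{\sum_{i}\big(y_i - \sum_{j} X_{ij}\beta_j\big)}{N}\Big\}^2\right] \exp\left[-\frac{N}{2\sigma_e^2}\Big(\frac{\sum_{i}\big(y_i - \sum_{j} X_{ij}\beta_j\big)^2}{N} - \Big[\frac{\sum_{i}\big(y_i - \sum_{j} X_{ij}\beta_j\big)}{N}\Big]^2\Big)\right]$$

Since the second exponential does not depend on $\mu$,

$$p(\mu \mid \beta, \sigma_e^2, y) \propto \exp\left[-\frac{N}{2\sigma_e^2}\Big\{\mu - \frac{\sum_{i}\big(y_i - \sum_{j} X_{ij}\beta_j\big)}{N}\Big\}^2\right]$$

BLUP

Hence,

$$p(\mu \mid \beta, \sigma_e^2, y) \sim N(\hat\mu, V)$$

where

$$\hat\mu = \frac{\sum_{i=1}^{N}\big(y_i - \sum_{j=1}^{p} X_{ij}\beta_j\big)}{N}, \qquad V = \frac{\sigma_e^2}{N}$$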

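BLUP

Conditional distribution of $\beta$

$$p(\beta \mid \mu, \sigma_e^2, y) = p(\beta_1, \beta_2, \ldots, \beta_p \mid \mu, \sigma_e^2, y)$$

In general it is much easier to work with univariate distributions. Hence, we will derive the conditional distribution for an element “k” of the vector $\beta$

$$p(\beta_k \mid \beta_{-k}, \mu, \sigma_e^2, y) \propto \left(\frac{1}{2\pi\sigma_e^2}\right)^{N/2} \exp\left[-\frac{1}{2\sigma_e^2}\sum_{i=1}^{N}\Big(y_i - \mu - \sum_{j=1}^{p} X_{ij}\beta_j\Big)^2\right] (2\pi\sigma_\beta^2)^{-1/2} \exp\left(-\frac{\beta_k^2}{2\sigma_\beta^2}\right)$$

BLUP

$$p(\beta_k \mid \beta_{-k}, \mu, \sigma_e^2, y) \propto \exp\left[-\frac{1}{2\sigma_e^2}\sum_{i=1}^{N}\Big(y_i - \mu - \sum_{j \neq k} X_{ij}\beta_j - X_{ik}\beta_k\Big)^2\right] \exp\left(-\frac{\beta_k^2}{2\sigma_\beta^2}\right)$$

$$\propto \exp\left[-\frac{1}{2\sigma_e^2}\sum_{i=1}^{N}\big(w_i - X_{ik}\beta_k\big)^2\right] \exp\left(-\frac{\beta_k^2}{2\sigma_\beta^2}\right), \qquad w_i = y_i - \mu - \sum_{j \neq k} X_{ij}\beta_j$$

Then, with $W = (w_1, \ldots, w_N)'$ and $X_k$ the k-th column of X,

$$p(\beta_k \mid \beta_{-k}, \mu, \sigma_e^2, y) \propto \exp\left[-\frac{1}{2\sigma_e^2}(W - X_k\beta_k)'(W - X_k\beta_k)\right] \exp\left(-\frac{\beta_k^2}{2\sigma_\beta^2}\right)$$

$$\propto \exp\left[-\frac{1}{2\sigma_e^2}\Big\{W'W - 2\beta_k X_k'W + \beta_k^2\Big(X_k'X_k + \frac{\sigma_e^2}{\sigma_\beta^2}\Big)\Big\}\right]$$

Further

BLUP

$$p(\beta_k \mid \beta_{-k}, \mu, \sigma_e^2, y) \propto \exp\left[-\frac{1}{2\sigma_e^2}\Big(X_k'X_k + \frac{\sigma_e^2}{\sigma_\beta^2}\Big)\Big\{\beta_k^2 - 2\beta_k\Big(X_k'X_k + \frac{\sigma_e^2}{\sigma_\beta^2}\Big)^{-1}X_k'W\Big\}\right]$$

Finally,

$$p(\beta_k \mid \beta_{-k}, \mu, \sigma_e^2, y) \sim N\left(\Big[X_k'X_k + \frac{\sigma_e^2}{\sigma_\beta^2}\Big]^{-1}X_k'W,\; \frac{\sigma_e^2}{X_k'X_k + \sigma_e^2/\sigma_\beta^2}\right)$$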

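BLUP

Conditional distribution of $\sigma_e^2$

$$p(\sigma_e^2 \mid \mu, \beta, y) \propto \left(\frac{1}{2\pi\sigma_e^2}\right)^{N/2} \exp\left[-\frac{1}{2\sigma_e^2}\sum_{i=1}^{N}\Big(y_i - \mu - \sum_{j=1}^{p} X_{ij}\beta_j\Big)^2\right] (\sigma_e^2)^{-(\nu_e/2+1)} \exp\left(-\frac{\nu_e s_e^2}{2\sigma_e^2}\right)$$

$$\propto (\sigma_e^2)^{-\left(\frac{N+\nu_e}{2}+1\right)} \exp\left[-\frac{1}{2\sigma_e^2}\big(e'e + \nu_e s_e^2\big)\right]$$

where

$$e_i = y_i - \mu - \sum_{j=1}^{p} X_{ij}\beta_j$$

Then

$$p(\sigma_e^2 \mid \mu, \beta, y) \sim (e'e + \nu_e s_e^2)\,\chi^{-2}_{N+\nu_e}$$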
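The three full conditionals (normal for $\mu$, normal for each $\beta_k$, scaled inverted chi-square for $\sigma_e^2$) can be combined into a single-site Gibbs sampler. The sketch below treats $\sigma_\beta^2$ as known; the function name, the hyperparameter defaults, the chain length and the simulated data are all assumptions chosen for illustration.

```python
import numpy as np


def gibbs_blup(X, y, var_beta, nu_e=4.0, s2_e=1.0, n_iter=2000, burn_in=500, seed=1):
    """Single-site Gibbs sampler for y = mu + X beta + e with known var_beta."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    mu, beta, var_e = float(y.mean()), np.zeros(p), float(y.var())
    xtx = np.einsum("ij,ij->j", X, X)      # x_k'x_k for every SNP
    e = y - mu - X @ beta                  # current residuals
    sums = {"mu": 0.0, "beta": np.zeros(p), "var_e": 0.0}
    for it in range(n_iter):
        # mu | rest ~ N(mean of (y - X beta), var_e / N)
        e += mu
        mu = rng.normal(e.mean(), np.sqrt(var_e / n))
        e -= mu
        for k in range(p):
            xk = X[:, k]
            e += xk * beta[k]              # e now equals W for SNP k
            c = xtx[k] + var_e / var_beta
            beta[k] = rng.normal(xk @ e / c, np.sqrt(var_e / c))
            e -= xk * beta[k]
        # var_e | rest ~ (e'e + nu_e * s2_e) / chi-square(N + nu_e)
        var_e = (e @ e + nu_e * s2_e) / rng.chisquare(n + nu_e)
        if it >= burn_in:
            sums["mu"] += mu
            sums["beta"] += beta
            sums["var_e"] += var_e
    kept = n_iter - burn_in
    return {name: total / kept for name, total in sums.items()}


rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(300, 5)).astype(float)
X -= X.mean(axis=0)
true_beta = np.array([1.0, -1.0, 0.5, 0.0, 2.0])
y = 10.0 + X @ true_beta + rng.normal(size=300)

post = gibbs_blup(X, y, var_beta=1.0)
print(post["mu"], post["beta"], post["var_e"])
```

Keeping the residual vector up to date, as in GSRU, means each draw needs only one genotype column rather than the full design matrix.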


BLUP

$\sigma_\beta^2$ was assumed known, but what is $\sigma_\beta^2$?

If locus j is a random sample of all possible loci then

$$\sigma_\beta^2 = var(\beta_j)$$

where $\beta_j$ is the random variable

However, our model is:

$$g_i = X_i\beta$$

where $\beta$ is constant for all individuals. So the genetic variation is due to the genotypes!

BLUP

Genetic variance (assuming LE)

$$V_a = \sum_{i=1}^{p} 2p_iq_i\beta_i^2$$

where $p_i$ and $q_i$ are the allele frequencies at SNP i

Let

$$V_i = 2p_iq_i, \qquad U_i = \beta_i^2$$

Then

$$V_a = \sum_{i=1}^{p} V_iU_i$$

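BLUP

Covariance between V and U

$$COV(V,U) = E[(V - E(V))(U - E(U))] = E(VU) - E(V)E(U) = \frac{\sum_{i=1}^{p} V_iU_i}{p} - \Big(\frac{\sum_{i=1}^{p} V_i}{p}\Big)\Big(\frac{\sum_{i=1}^{p} U_i}{p}\Big)$$

Then, the genetic variance could be re-written as:

$$V_a = p\,COV(V,U) + \Big(\sum_{i=1}^{p} 2p_iq_i\Big)\frac{\sum_{i=1}^{p}\beta_i^2}{p}$$

BLUP

Let

$$\sigma_\beta^2 = \frac{\sum_{i=1}^{p}\beta_i^2}{p}$$

Then

$$V_a = p\,COV(V,U) + \Big(\sum_{i=1}^{p} 2p_iq_i\Big)\sigma_\beta^2$$

and

$$\sigma_\beta^2 = \frac{V_a - p\,COV(V,U)}{\sum_{i=1}^{p} 2p_iq_i}$$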
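As a worked illustration of the last expression, the sketch below computes the SNP effect variance from an additive variance and a set of allele frequencies. The function name is hypothetical, and setting the covariance between heterozygosity and squared effect to zero by default is an assumption (a common simplification), not something the derivation requires.

```python
import numpy as np


def snp_effect_variance(v_a, freqs, cov_vu=0.0):
    """sigma_beta^2 = (V_a - p * COV(V, U)) / sum(2 * p_i * q_i)."""
    freqs = np.asarray(freqs, dtype=float)
    sum_2pq = np.sum(2.0 * freqs * (1.0 - freqs))
    return (v_a - freqs.size * cov_vu) / sum_2pq


# Four SNPs at frequency 0.5: each contributes 2pq = 0.5, so the sum is 2.
sigma2_beta = snp_effect_variance(1.0, [0.5, 0.5, 0.5, 0.5])
print(sigma2_beta)  # 0.5
```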

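BLUP

Assume that $\sigma_\beta^2$ is unknown, then we have to specify a prior

$$p(\sigma_\beta^2 \mid \nu_\beta, s_\beta^2) \sim \nu_\beta s_\beta^2 \chi^{-2}_{\nu_\beta}$$

$$p(\sigma_\beta^2 \mid \mu, \beta, \sigma_e^2, y) \propto \left(\frac{1}{2\pi\sigma_e^2}\right)^{N/2} \exp\left[-\frac{1}{2\sigma_e^2}\sum_{i=1}^{N}\Big(y_i - \mu - \sum_{j=1}^{p} X_{ij}\beta_j\Big)^2\right] (\sigma_e^2)^{-(\nu_e/2+1)} \exp\left(-\frac{\nu_e s_e^2}{2\sigma_e^2}\right) \prod_{i=1}^{p} (2\pi\sigma_\beta^2)^{-1/2} \exp\left(-\frac{(\beta_i - 0)^2}{2\sigma_\beta^2}\right) (\sigma_\beta^2)^{-(\nu_\beta/2+1)} \exp\left(-\frac{\nu_\beta s_\beta^2}{2\sigma_\beta^2}\right)$$

$$\propto \prod_{i=1}^{p} (2\pi\sigma_\beta^2)^{-1/2} \exp\left(-\frac{\beta_i^2}{2\sigma_\beta^2}\right) (\sigma_\beta^2)^{-(\nu_\beta/2+1)} \exp\left(-\frac{\nu_\beta s_\beta^2}{2\sigma_\beta^2}\right)$$

$$\propto (\sigma_\beta^2)^{-\left(\frac{p+\nu_\beta}{2}+1\right)} \exp\left(-\frac{1}{2\sigma_\beta^2}\big(\beta'\beta + \nu_\beta s_\beta^2\big)\right)$$

Conditional distribution

$$p(\sigma_\beta^2 \mid \mu, \beta, \sigma_e^2, y) \sim (\beta'\beta + \nu_\beta s_\beta^2)\,\chi^{-2}_{p+\nu_\beta}$$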

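Bayes-A

• Model

$$y_i = \mu + \sum_{j=1}^{p} X_{ij}\beta_j + e_i$$

With

Bayes-A

$$p(\sigma_e^2 \mid \nu_e, s_e^2) \sim \nu_e s_e^2 \chi^{-2}_{\nu_e}$$

$$p(\mu) \sim \text{constant}$$

$$p(\beta_j \mid \sigma_j^2) \sim N(0, \sigma_j^2)$$

Prior distributions for $\sigma_j^2$

$$p(\sigma_j^2 \mid \nu_j, s_j^2) \sim \nu_j s_j^2 \chi^{-2}_{\nu_j}$$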

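Bayes-A

Joint posterior distribution

$$p(\mu, \beta, \sigma_e^2, \sigma_1^2, \ldots, \sigma_p^2 \mid y) \propto \left(\frac{1}{2\pi\sigma_e^2}\right)^{N/2} \exp\left[-\frac{1}{2\sigma_e^2}\sum_{i=1}^{N}\Big(y_i - \mu - \sum_{j=1}^{p} X_{ij}\beta_j\Big)^2\right] (\sigma_e^2)^{-(\nu_e/2+1)} \exp\left(-\frac{\nu_e s_e^2}{2\sigma_e^2}\right) \prod_{i=1}^{p} (2\pi\sigma_i^2)^{-1/2} \exp\left(-\frac{\beta_i^2}{2\sigma_i^2}\right) \prod_{i=1}^{p} (\sigma_i^2)^{-(\nu_i/2+1)} \exp\left(-\frac{\nu_i s_i^2}{2\sigma_i^2}\right)$$

Conditional distributions

For $\mu$, $\beta$ and $\sigma_e^2$: the same conditional distributions as with the BLUP-type model

- Normal distributions for position parameters

- Scaled inverted Chi-Square for the residual variance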

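Bayes-A

Conditional distribution of $\sigma_j^2$

$$p(\sigma_j^2 \mid \mu, \beta, \sigma_e^2, \sigma_{i \neq j}^2, y) \propto \left(\frac{1}{2\pi\sigma_e^2}\right)^{N/2} \exp\left[-\frac{1}{2\sigma_e^2}\sum_{i=1}^{N}\Big(y_i - \mu - \sum_{j=1}^{p} X_{ij}\beta_j\Big)^2\right] (\sigma_e^2)^{-(\nu_e/2+1)} \exp\left(-\frac{\nu_e s_e^2}{2\sigma_e^2}\right) \prod_{i=1}^{p} (2\pi\sigma_i^2)^{-1/2} \exp\left(-\frac{\beta_i^2}{2\sigma_i^2}\right) \prod_{i=1}^{p} (\sigma_i^2)^{-(\nu_i/2+1)} \exp\left(-\frac{\nu_i s_i^2}{2\sigma_i^2}\right)$$

$$\propto (2\pi\sigma_j^2)^{-1/2} \exp\left(-\frac{\beta_j^2}{2\sigma_j^2}\right) (\sigma_j^2)^{-(\nu_j/2+1)} \exp\left(-\frac{\nu_j s_j^2}{2\sigma_j^2}\right)$$

$$\propto (\sigma_j^2)^{-\left(\frac{\nu_j+1}{2}+1\right)} \exp\left(-\frac{\beta_j^2 + \nu_j s_j^2}{2\sigma_j^2}\right)$$

So,

$$p(\sigma_j^2 \mid \mu, \beta, \sigma_e^2, \sigma_{i \neq j}^2, y) \sim (\beta_j^2 + \nu_j s_j^2)\,\chi^{-2}_{\nu_j+1}$$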
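Because this conditional is a scaled inverted chi-square, a draw can be obtained by dividing the scale by a chi-square variate. The sketch below draws all SNP variances at once; the function name and the hyperparameter values (`nu`, `s2`) are illustrative assumptions, with the same `nu` and `s2` used for every SNP.

```python
import numpy as np


def sample_snp_variances(beta, nu, s2, rng):
    """Draw sigma_j^2 ~ (beta_j^2 + nu * s2) / chi-square(nu + 1) for every SNP."""
    beta = np.asarray(beta, dtype=float)
    return (beta**2 + nu * s2) / rng.chisquare(nu + 1, size=beta.size)


rng = np.random.default_rng(0)
beta = np.array([0.0, 0.1, -2.0])        # current SNP effect values
draws = np.array([sample_snp_variances(beta, nu=4.0, s2=0.01, rng=rng)
                  for _ in range(5000)])
print(np.median(draws, axis=0))
```

Each SNP variance is informed by a single squared effect, so the prior degrees of freedom and scale remain influential in Bayes-A.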

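Bayes-B

• Model

$$y_i = \mu + \sum_{j=1}^{p} X_{ij}\beta_j + e_i$$

With

Bayes-B

$$p(\sigma_e^2 \mid \nu_e, s_e^2) \sim \nu_e s_e^2 \chi^{-2}_{\nu_e}$$

$$p(\mu) \sim \text{constant}$$

$$p(\beta_j \mid \sigma_j^2, \pi) \begin{cases} = 0 & \text{with probability } \pi \\ \sim N(0, \sigma_j^2) & \text{with probability } (1-\pi) \end{cases}$$

$$p(\sigma_j^2 \mid \nu_j, s_j^2) \sim \nu_j s_j^2 \chi^{-2}_{\nu_j}$$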

Bayes-B

• Full conditional distributions

– Same as with normal prior for $\mu$ and $\sigma_e^2$

• Normal for the mean

• Scaled inverted Chi-square for the residual variance

• Conditional distributions for $\beta_j$ and $\sigma_j^2$

• Not in closed form: we cannot use the Gibbs sampler!!

$$p(\beta_j, \sigma_j^2 \mid \mu, \beta_{i \neq j}, \sigma_e^2, y) = p(\sigma_j^2 \mid \mu, \beta_{i \neq j}, \sigma_e^2, y)\, p(\beta_j \mid \sigma_j^2, \mu, \beta_{i \neq j}, \sigma_e^2, y)$$

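Bayes-B

• If the prior is used as proposal distribution

• Then

$$\alpha(\sigma_j^2, \sigma_{j,cand}^2) = \min\left\{1, \frac{p(y \mid \sigma_{j,cand}^2, \mu, \beta_{i \neq j}, \sigma_e^2)\, p(\sigma_{j,cand}^2)\, p(\sigma_j^2)}{p(y \mid \sigma_j^2, \mu, \beta_{i \neq j}, \sigma_e^2)\, p(\sigma_j^2)\, p(\sigma_{j,cand}^2)}\right\} = \min\left\{1, \frac{p(y \mid \sigma_{j,cand}^2, \mu, \beta_{i \neq j}, \sigma_e^2)}{p(y \mid \sigma_j^2, \mu, \beta_{i \neq j}, \sigma_e^2)}\right\}$$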

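Bayes-B

• Conditional distribution of $\beta_j$

– Given $\sigma_j^2$

$$p(\beta_j \mid \beta_{-j}, \mu, \sigma_j^2, \sigma_e^2, y) \sim N\left(\Big[X_j'X_j + \frac{\sigma_e^2}{\sigma_j^2}\Big]^{-1}X_j'W,\; \frac{\sigma_e^2}{X_j'X_j + \sigma_e^2/\sigma_j^2}\right)$$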

Bayes-B

• A better re-parametrization

– Create a dummy variable $\lambda_i$ taking values of 1 and 0 with probability $(1-\pi)$ and $\pi$ for each SNP

• Assuming $\pi$ is known

$$p(\lambda_i \mid \pi) \sim (1-\pi)^{\lambda_i}\,\pi^{(1-\lambda_i)}$$

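Bayes-B

• Given $\lambda$

$$p(\beta_j \mid \lambda_j, \sigma_j^2) \begin{cases} \sim N(0, \sigma_j^2) & \text{if } \lambda_j = 1 \\ = 0 & \text{if } \lambda_j = 0 \end{cases}$$

$$p(\lambda_j \mid y, \mu, \beta, \sigma_j^2, \sigma_e^2, \pi) = \frac{p(y \mid \lambda_j, \mu, \beta, \sigma_j^2, \sigma_e^2)\, p(\lambda_j \mid \pi)}{p(y \mid \lambda_j = 1, \mu, \beta, \sigma_j^2, \sigma_e^2)(1-\pi) + p(y \mid \lambda_j = 0, \mu, \beta, \sigma_j^2, \sigma_e^2)(\pi)}$$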

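Bayes-B

• Assuming $\pi$ unknown

$$p(\pi) \sim U[0,1]$$

• Posteriors of SNP effects and variances remain the same

• We need to derive the conditional distribution of $\pi$

Bayes-B

• Conditional of $\pi$

$$p(\pi \mid \lambda, \mu, \beta, \sigma_1^2, \ldots, \sigma_p^2, \sigma_e^2, y) \propto \pi^{(p-r)}(1-\pi)^{r}$$

where r is the number of SNPs with indicator variable equal to 1.

• Thus,

$$p(\pi \mid \lambda, \mu, \beta, \sigma_1^2, \ldots, \sigma_p^2, \sigma_e^2, y) \sim Beta(p - r + 1,\; r + 1)$$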
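With a uniform prior, updating $\pi$ therefore only requires counting how many indicators are currently equal to 1. A minimal sketch follows; the function name and the example indicator vector (50 of 1000 SNPs in the model) are assumptions for illustration.

```python
import numpy as np


def sample_pi(indicators, rng):
    """Draw pi ~ Beta(p - r + 1, r + 1), with r the number of indicators equal to 1."""
    indicators = np.asarray(indicators)
    p = indicators.size
    r = int(indicators.sum())
    return rng.beta(p - r + 1, r + 1)


rng = np.random.default_rng(0)
indicators = np.r_[np.ones(50, dtype=int), np.zeros(950, dtype=int)]
pi_draws = np.array([sample_pi(indicators, rng) for _ in range(2000)])
print(pi_draws.mean())  # close to 951 / 1002
```

Here $\pi$ is the proportion of SNPs with zero effect, so draws concentrate near 0.95 when 950 of 1000 indicators are 0.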


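Bayes-C

• Model

$$y_i = \mu + \sum_{j=1}^{p} X_{ij}\beta_j + e_i$$

With

Bayes-C

$$p(\sigma_e^2 \mid \nu_e, s_e^2) \sim \nu_e s_e^2 \chi^{-2}_{\nu_e}$$

$$p(\mu) \sim \text{constant}$$

$$p(\beta_j \mid \sigma_\beta^2, \pi) \begin{cases} = 0 & \text{with probability } \pi \\ \sim N(0, \sigma_\beta^2) & \text{with probability } (1-\pi) \end{cases}$$

$$p(\sigma_\beta^2 \mid \nu_\beta, s_\beta^2) \sim \nu_\beta s_\beta^2 \chi^{-2}_{\nu_\beta}$$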

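Bayes-C

• It is similar to attaching a dummy variable $\lambda_i$, taking values of 1 and 0 with probability $(1-\pi)$ and $\pi$, such that:

$$p(\beta_j \mid \lambda_j, \sigma_\beta^2) \begin{cases} \sim N(0, \sigma_\beta^2) & \text{if } \lambda_j = 1 \\ = 0 & \text{if } \lambda_j = 0 \end{cases}$$

Thus, $\lambda_i \sim Bernoulli(1-\pi)$

and

$$p(\pi) \sim U[0,1]$$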

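Bayes-C

• Conditional distributions:

– Given $\lambda = \{\lambda_1, \lambda_2, \ldots, \lambda_p\}$

• Normal for position parameters

• Scaled inverted Chi-square for the residual variance

$$p(\lambda_j \mid y, \mu, \beta, \sigma_\beta^2, \sigma_e^2, \pi) = \frac{p(y \mid \lambda_j, \mu, \beta, \sigma_\beta^2, \sigma_e^2)\, p(\lambda_j \mid \pi)}{p(y \mid \lambda_j = 1, \mu, \beta, \sigma_\beta^2, \sigma_e^2)(1-\pi) + p(y \mid \lambda_j = 0, \mu, \beta, \sigma_\beta^2, \sigma_e^2)(\pi)}$$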

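Bayes-C

Conditional distribution of $\sigma_\beta^2$

$$p(\sigma_\beta^2 \mid \mu, \beta, \sigma_e^2, y) \propto \prod_{i=1}^{p'} (2\pi\sigma_\beta^2)^{-1/2} \exp\left(-\frac{\beta_i^2}{2\sigma_\beta^2}\right) (\sigma_\beta^2)^{-(\nu_\beta/2+1)} \exp\left(-\frac{\nu_\beta s_\beta^2}{2\sigma_\beta^2}\right) \propto (\sigma_\beta^2)^{-\left(\frac{p'+\nu_\beta}{2}+1\right)} \exp\left(-\frac{1}{2\sigma_\beta^2}\big(\beta'\beta + \nu_\beta s_\beta^2\big)\right)$$

$$p(\sigma_\beta^2 \mid \mu, \beta, \sigma_e^2, y) \sim (\beta'\beta + \nu_\beta s_\beta^2)\,\chi^{-2}_{p'+\nu_\beta}$$

where p' is the number of SNPs with non-zero effect

Bayes-C

$$p(\pi \mid \lambda, \mu, \beta, \sigma_\beta^2, \sigma_e^2, y) \propto \pi^{(p-r)}(1-\pi)^{r}$$

where r is the number of SNPs with indicator variable equal to 1.

$$p(\pi \mid \lambda, \mu, \beta, \sigma_\beta^2, \sigma_e^2, y) \sim Beta(p - r + 1,\; r + 1)$$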

Non-Genotyped animals

• Multiple step procedures

– Only genotyped and phenotyped individuals

• Can we include non-genotyped animals?

– Impute missing genotypes

– Use expected relationships

• Produce genomic breeding values for all animals

$$y_i = \mu + \sum_{j=1}^{p} X_{ij}\beta_j + e_i$$


• For non-genotyped animals

– X matrix is unknown

– Can it be imputed?

• In matrix notation

$$y = 1\mu + \begin{bmatrix} Z_1 & 0 \\ 0 & Z_2 \end{bmatrix} \begin{bmatrix} g_1 \\ g_2 \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}$$

– where $g_1$ and $g_2$ are the BV for genotyped and non-genotyped animals

• Let

$$g_1 = M_1\beta, \qquad g_2 = M_2\beta$$

– where $M_1$ and $M_2$ (unknown) are the matrices of marker genotypes for genotyped and non-genotyped animals and $\beta$ is the vector of SNP effects


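Non-genotyped animals

• Using some properties of the multivariate normal distribution

$$p(u_1, u_2 \mid A, \sigma_u^2) \sim N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, A\sigma_u^2\right)$$

• where

$$p(u_1, u_2 \mid A, \sigma_u^2) = p(u_1 \mid A, \sigma_u^2)\, p(u_2 \mid u_1, A, \sigma_u^2)$$

$$p(u_1 \mid A, \sigma_u^2) \sim N(0, A_{11}\sigma_u^2)$$

$$p(u_2 \mid u_1, A, \sigma_u^2) \sim N\left(A_{21}A_{11}^{-1}u_1,\; [A_{22} - A_{21}A_{11}^{-1}A_{12}]\sigma_u^2\right)$$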


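• Replacing $u_1$ by the marker effects

$$p(u_2 \mid \beta, A, \sigma_u^2) \sim N\left(A_{21}A_{11}^{-1}M_1\beta,\; [A_{22} - A_{21}A_{11}^{-1}A_{12}]\sigma_u^2\right)$$

• Thus,

$$y = 1\mu + \begin{bmatrix} Z_1 & 0 \\ 0 & Z_2 \end{bmatrix} \begin{bmatrix} M_1\beta \\ A_{21}A_{11}^{-1}M_1\beta + \varepsilon \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}$$

• Where

$$\varepsilon \mid A, \sigma_u^2 \sim N\left(0,\; [A_{22} - A_{21}A_{11}^{-1}A_{12}]\sigma_u^2\right)$$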


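• Further,

$$y = 1\mu + \begin{bmatrix} Z_1 & 0 \\ 0 & Z_2 \end{bmatrix} \begin{bmatrix} M_1\beta \\ M_2\beta + \varepsilon \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}, \qquad M_2 = A_{21}A_{11}^{-1}M_1$$

• Let

$$W_1 = Z_1M_1, \qquad W_2 = Z_2M_2, \qquad W = \begin{bmatrix} W_1 \\ W_2 \end{bmatrix}, \qquad U = \begin{bmatrix} 0 \\ Z_2 \end{bmatrix}$$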


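• Model

$$y = 1\mu + W\beta + U\varepsilon + e$$

• In matrix notation,

$$\begin{bmatrix} 1'1 & 1'W & 1'U \\ W'1 & W'W + I\frac{\sigma_e^2}{\sigma_\beta^2} & W'U \\ U'1 & U'W & U'U + (A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}\frac{\sigma_e^2}{\sigma_\varepsilon^2} \end{bmatrix} \begin{bmatrix} \hat\mu \\ \hat\beta \\ \hat\varepsilon \end{bmatrix} = \begin{bmatrix} 1'y \\ W'y \\ U'y \end{bmatrix}$$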


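• Re-parametrization using $A^{-1}$

• It is easy to prove that, with

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \quad \text{and} \quad A^{-1} = \begin{bmatrix} A^{11} & A^{12} \\ A^{21} & A^{22} \end{bmatrix}$$

$$[A_{22} - A_{21}A_{11}^{-1}A_{12}] = (A^{22})^{-1}$$

$$A_{21}A_{11}^{-1} = -(A^{22})^{-1}A^{21}$$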
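Both partitioned-inverse identities can be checked numerically. The sketch below uses a random symmetric positive definite matrix as a stand-in for A (an assumption made for illustration; a real A would come from a pedigree), with the first four rows playing the role of the genotyped animals.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(6, 6))
A = B @ B.T + 6.0 * np.eye(6)       # random SPD stand-in for A
n1 = 4                              # size of the first (genotyped) block

A11, A12 = A[:n1, :n1], A[:n1, n1:]
A21, A22 = A[n1:, :n1], A[n1:, n1:]

A_inv = np.linalg.inv(A)
A21_sup = A_inv[n1:, :n1]           # block A^{21} of the inverse
A22_sup = A_inv[n1:, n1:]           # block A^{22} of the inverse

schur = A22 - A21 @ np.linalg.inv(A11) @ A12
identity_1 = np.allclose(schur, np.linalg.inv(A22_sup))
identity_2 = np.allclose(A21 @ np.linalg.inv(A11), -np.linalg.inv(A22_sup) @ A21_sup)
print(identity_1, identity_2)
```

The practical value is that $A^{-1}$ can be built directly from the pedigree, so the conditional mean and variance of the non-genotyped animals do not require inverting $A_{11}$.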


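• In matrix notation using $A^{-1}$,

$$\begin{bmatrix} 1'1 & 1'W & 1'U \\ W'1 & W'W + I\frac{\sigma_e^2}{\sigma_\beta^2} & W'U \\ U'1 & U'W & U'U + A^{22}\frac{\sigma_e^2}{\sigma_\varepsilon^2} \end{bmatrix} \begin{bmatrix} \hat\mu \\ \hat\beta \\ \hat\varepsilon \end{bmatrix} = \begin{bmatrix} 1'y \\ W'y \\ U'y \end{bmatrix}$$

• If # genotyped animals > # of non-genotyped animals then use $A^{-1}$ for calculating the conditional mean

• If # genotyped animals < # of non-genotyped animals then use A for calculating the conditional mean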


$$g = \begin{bmatrix} g_1 \\ g_2 \end{bmatrix} = \begin{bmatrix} M_1 \\ M_2 \end{bmatrix}\beta + U\varepsilon$$

Genomic “animal” model

• For selection purposes, we are interested in animal effects rather than SNP effects

• For a 50K panel and a few thousand animals, the system of equations is huge

• With the availability of even higher density panels, the situation could get worse


• Model

$$y_i = \mu + \sum_{j=1}^{p} X_{ij}\beta_j + e_i$$

• In matrix notation

$$y = 1\mu + \sum_{j=1}^{p} (X_j\beta_j) + e$$

• Model

$$y = 1\mu + Zu + e$$

With:

$$u = \sum_{j=1}^{p} (X_j\beta_j), \qquad Z = I$$


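• MME

$$\begin{bmatrix} 1'1 & 1'Z \\ Z'1 & Z'Z + G^{-1}\sigma_e^2 \end{bmatrix} \begin{bmatrix} \hat\mu \\ \hat u \end{bmatrix} = \begin{bmatrix} 1'y \\ Z'y \end{bmatrix}$$

With $Z = I$:

$$\begin{bmatrix} N & 1' \\ 1 & I + G^{-1}\sigma_e^2 \end{bmatrix} \begin{bmatrix} \hat\mu \\ \hat u \end{bmatrix} = \begin{bmatrix} 1'y \\ Z'y \end{bmatrix}$$

$$var(u) = G$$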


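• MME

$$\begin{bmatrix} N & 1' \\ 1 & I + \big[var\big(\sum_{j=1}^{p} X_j\beta_j\big)\big]^{-1}\sigma_e^2 \end{bmatrix} \begin{bmatrix} \hat\mu \\ \sum_{j=1}^{p} X_j\hat\beta_j \end{bmatrix} = \begin{bmatrix} 1'y \\ y \end{bmatrix}$$

$$G = var(u) = var\Big(\sum_{j=1}^{p} X_j\beta_j\Big)$$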


• The G matrix

$$var\Big(\sum_{j=1}^{p} X_j\beta_j\Big) = \sum_{j=1}^{p} X_j\,var(\beta_j)\,X_j' = \sum_{j=1}^{p} X_jX_j'\sigma_j^2$$

where

$$\sigma_j^2 = var(\beta_j)$$
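The sum over SNPs can be computed either term by term or as one matrix product. The sketch below assumes uncorrelated SNP effects, as the expression above does, and uses simulated genotypes and arbitrary SNP variances purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 40
X = rng.integers(0, 3, size=(n, p)).astype(float)   # genotype covariates
var_j = rng.uniform(0.01, 0.05, size=p)             # assumed SNP variances

# Term-by-term sum of X_j X_j' sigma_j^2.
G_sum = sum(var_j[j] * np.outer(X[:, j], X[:, j]) for j in range(p))

# Same matrix as X D X' with D = diag(sigma_j^2).
G_fast = (X * var_j) @ X.T

print(np.allclose(G_sum, G_fast))
```

With a common variance for every SNP the expression reduces to $XX'\sigma_\beta^2$, which is the form behind GBLUP.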



• Only few animals are genotyped

– Few hundreds

• Extensive phenotyping

– Several millions

• “pseudo” phenotypes

– Estimates – Non observed

– Accuracies


• Can we combine phenotypes and genomic information in a single analysis?

• Make use of both sources of information

• Remove the need for “pseudo” records

• Works well for regular genetic evaluation, but not necessarily for other applications

• Decision making tools

• Causative mutation identification

• Model

$$y = Wb + Zu + e$$

• Further,

$$u = \begin{bmatrix} u_1' & u_2' \end{bmatrix}'$$

where y is a vector of “raw” phenotypes, and $u_1$ and $u_2$ are the vectors of genetic merit of non-genotyped and genotyped animals, respectively.

• If no genomic information is available

$$y = Wb + Zu + e$$

$$p(u \mid A, \sigma_u^2) = p(u_1, u_2 \mid A, \sigma_u^2) \sim N(0, A\sigma_u^2)$$

• Further,

$$var(u_1, u_2 \mid A, \sigma_u^2) = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}\sigma_u^2$$


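• If animals in $u_2$ have been genotyped

$$p(u \mid A, \sigma_u^2) = p(u_1, u_2 \mid A, \sigma_u^2) \sim N(0, H\sigma_u^2)$$

$$var(u_1, u_2 \mid A, G, \sigma_u^2) = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & G \end{bmatrix}\sigma_u^2$$

– Naive approach

– Inverse of the matrix could be hard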


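• If animals in $u_2$ have been genotyped

• Let

$$p(u \mid A, \sigma_u^2) = p(u_1, u_2 \mid A, \sigma_u^2) = p(u_2 \mid G)\, p(u_1 \mid u_2, A)$$

$$p(u_2 \mid G) \sim N(0, G)$$

$$p(u_1 \mid u_2, A) \sim N\left(A_{12}A_{22}^{-1}u_2,\; A_{11} - A_{12}A_{22}^{-1}A_{21}\right)$$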


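• Thus,

$$p(u \mid A, \sigma_u^2) = p(u_2 \mid G)\, p(u_1 \mid u_2, A) \sim N(0, H\sigma_u^2)$$

• where

$$H = \begin{bmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{bmatrix} = \begin{bmatrix} A_{11} - A_{12}A_{22}^{-1}A_{21} + A_{12}A_{22}^{-1}GA_{22}^{-1}A_{21} & A_{12}A_{22}^{-1}G \\ GA_{22}^{-1}A_{21} & G \end{bmatrix}$$

• Further,

$$H^{-1} = A^{-1} + \begin{bmatrix} 0 & 0 \\ 0 & G^{-1} - A_{22}^{-1} \end{bmatrix}$$

• It is very important for computation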
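The expression for $H^{-1}$ can be verified numerically by building H block by block and inverting it directly. In the sketch below, A and G are random symmetric positive definite stand-ins (assumptions for illustration only), with the last two animals treated as genotyped.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_spd(k):
    """Random symmetric positive definite matrix, used only as a stand-in."""
    B = rng.normal(size=(k, k))
    return B @ B.T + k * np.eye(k)


A = random_spd(6)
n1 = 4                                  # non-genotyped animals come first
A11, A12 = A[:n1, :n1], A[:n1, n1:]
A21, A22 = A[n1:, :n1], A[n1:, n1:]
G = random_spd(2)                       # genomic matrix of the genotyped animals
A22_inv = np.linalg.inv(A22)

H = np.block([
    [A11 - A12 @ A22_inv @ A21 + A12 @ A22_inv @ G @ A22_inv @ A21, A12 @ A22_inv @ G],
    [G @ A22_inv @ A21, G],
])

H_inv_direct = np.linalg.inv(H)
H_inv_formula = np.linalg.inv(A)
H_inv_formula[n1:, n1:] += np.linalg.inv(G) - A22_inv
print(np.allclose(H_inv_direct, H_inv_formula))
```

Only the genotyped block of $A^{-1}$ is modified, which is what makes the combined analysis feasible for large pedigrees with few genotyped animals.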



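• Implementation

• Conditional distribution of the data

$$y = W\beta + Zu + e$$

• Priors

$$p(u \mid A, \sigma_u^2) \sim N(0, H\sigma_u^2)$$

$$p(\beta) \sim \text{constant}$$

$$p(\sigma_u^2 \mid \nu_u, s_u^2) \sim \nu_u s_u^2 \chi^{-2}_{\nu_u}$$

$$p(\sigma_e^2 \mid \nu_e, s_e^2) \sim \nu_e s_e^2 \chi^{-2}_{\nu_e}$$

• Implementation

– Joint prior

$$p(\beta, u, \sigma_u^2, \sigma_e^2) \propto (\sigma_u^2)^{-q/2} \exp\left(-\frac{1}{2\sigma_u^2}u'H^{-1}u\right) (\sigma_e^2)^{-(\nu_e+2)/2} \exp\left(-\frac{\nu_e s_e^2}{2\sigma_e^2}\right) (\sigma_u^2)^{-(\nu_u+2)/2} \exp\left(-\frac{\nu_u s_u^2}{2\sigma_u^2}\right)$$

where q is the number of elements in u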


• Joint posterior

• Conditional distributions

• and

$$\theta = (\beta', u')', \qquad \hat\theta = (\hat\beta', \hat u')', \qquad W = [X \;\; Z]$$

$(W'R^{-1}W)^{-1}$ (co)variance matrix

• Conditional of

• Let

• and
• Thus,


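• Implementation via Gibbs sampler

1. Set $\beta = \beta^{(0)}$, $u = u^{(0)}$, $\sigma_e^2 = \sigma_e^{2(0)}$ and $\sigma_u^2 = \sigma_u^{2(0)}$

2. Sample $\beta^{(i)}$ from $p(\beta \mid u^{(i-1)}, \sigma_e^{2(i-1)}, \sigma_u^{2(i-1)}, y)$

Sample $\beta_1^{(i)}$ from $p(\beta_1 \mid \beta_{-1}^{(i-1)}, u^{(i-1)}, \sigma_e^{2(i-1)}, \sigma_u^{2(i-1)}, y)$

Sample $\beta_2^{(i)}$ from $p(\beta_2 \mid \beta_1^{(i)}, \beta_3^{(i-1)}, \ldots, \beta_p^{(i-1)}, u^{(i-1)}, \sigma_e^{2(i-1)}, \sigma_u^{2(i-1)}, y)$

...

Sample $\beta_p^{(i)}$ from $p(\beta_p \mid \beta_{-p}^{(i)}, u^{(i-1)}, \sigma_e^{2(i-1)}, \sigma_u^{2(i-1)}, y)$

3. Sample $u^{(i)}$ from $p(u \mid \beta^{(i)}, \sigma_e^{2(i-1)}, \sigma_u^{2(i-1)}, y)$

Sample $u_1^{(i)}$ from $p(u_1 \mid u_{-1}^{(i-1)}, \beta^{(i)}, \sigma_e^{2(i-1)}, \sigma_u^{2(i-1)}, y)$

Sample $u_2^{(i)}$ from $p(u_2 \mid u_1^{(i)}, u_3^{(i-1)}, \ldots, u_q^{(i-1)}, \beta^{(i)}, \sigma_e^{2(i-1)}, \sigma_u^{2(i-1)}, y)$

...

Sample $u_q^{(i)}$ from $p(u_q \mid u_{-q}^{(i)}, \beta^{(i)}, \sigma_e^{2(i-1)}, \sigma_u^{2(i-1)}, y)$

4. Sample $\sigma_e^{2(i)}$ from $p(\sigma_e^2 \mid \beta^{(i)}, u^{(i)}, \sigma_u^{2(i-1)}, y)$

5. Sample $\sigma_u^{2(i)}$ from $p(\sigma_u^2 \mid \beta^{(i)}, u^{(i)}, \sigma_e^{2(i)}, y)$

6. Set i = i + 1 and go to step 2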

Genomic matrix

• Let X be the matrix of SNP genotypes with

entries

• For each locus, let $p_i$ be the frequency of the second allele. Further, let P be a matrix with elements in column i equal to

Genomic matrix

• Let

• Adjusting with P sets mean allele effects to

zero

• Minor allele frequencies have to be

calculated on the base population

• Using data from other generations could lead to +/- relationships and inbreeding
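The centering step described above can be sketched as follows. The slide's own formulas did not survive extraction, so this assumes one widely used construction (VanRaden's first method): genotypes coded 0/1/2, centered by twice the allele frequency, and scaled by $2\sum p_i(1-p_i)$. The function name is hypothetical, and allele frequencies are estimated from the sample here only for convenience, whereas the slide asks for base-population frequencies.

```python
import numpy as np


def genomic_matrix(X, freq=None):
    """G = Z Z' / (2 * sum p_i (1 - p_i)), with Z = X - 2p (assumed construction)."""
    X = np.asarray(X, dtype=float)
    if freq is None:
        freq = X.mean(axis=0) / 2.0      # sample frequencies, illustration only
    Z = X - 2.0 * freq                   # centering sets mean allele effects to zero
    return Z @ Z.T / (2.0 * np.sum(freq * (1.0 - freq)))


rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(5, 200))    # 5 animals, 200 SNPs coded 0/1/2
G = genomic_matrix(X)
inbreeding = np.diag(G) - 1.0            # assumed: F_i = G_ii - 1 under this scaling
print(np.round(G, 3))
```

Because the columns are centered with frequencies from the same animals, every row of G sums to zero, which is one way negative relationships appear when the frequencies used are not those of the base population.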


Genomic matrix

• Computing G

• The denominator scales G to A

• Inbreeding for animal i is obtained by

Genomic matrix

• Computing G

• Where

• Often used in human genetics


Genomic matrix

• Functional formula

• Where $x_{ij}$ is coded as 0, 1, and 2

• In all cases, relationships are non-zero even between unrelated animals!

Genomic matrix

• Example

Animal   Sire   Dam
841      34     552
842      34     552
843      34     580
844      34     580
845      34     533
846      34     533
847      34     446
848      34     446
849      34     536
850      34     536
851      34     186

Genomic matrix

• Example

0.9995 0.4950 0.2588 0.2356 0.2089 0.2638 0.2707 0.2939 0.1937 0.3118 0.1834

0.4950 1.0038 0.2340 0.2233 0.2262 0.2402 0.2260 0.2919 0.1764 0.2389 0.2367

0.2588 0.2340 0.9886 0.6394 0.1527 0.2846 0.2175 0.2655 0.2178 0.2398 0.2405

0.2356 0.2233 0.6394 0.9894 0.2432 0.2661 0.1954 0.2703 0.2420 0.2006 0.1858

0.2089 0.2262 0.1527 0.2432 0.9987 0.5352 0.1843 0.2067 0.2975 0.1586 0.2206

0.2638 0.2402 0.2846 0.2661 0.5352 0.9968 0.2329 0.2845 0.2915 0.1992 0.1812

0.2707 0.2260 0.2175 0.1954 0.1843 0.2329 0.9935 0.5320 0.2446 0.2756 0.2884

0.2939 0.2919 0.2655 0.2703 0.2067 0.2845 0.5320 0.9867 0.2264 0.2632 0.1975

0.1937 0.1764 0.2178 0.2420 0.2975 0.2915 0.2446 0.2264 0.9918 0.5263 0.2634

0.3118 0.2389 0.2398 0.2006 0.1586 0.1992 0.2756 0.2632 0.5263 0.9906 0.2785

0.1834 0.2367 0.2405 0.1858 0.2206 0.1812 0.2884 0.1975 0.2634 0.2785 0.9975

Some comment on G

• For a given population, it is one realization of A

• Even with a few thousand well-distributed SNPs you get good estimates

– HD panels will have less burden on variance components based methods

– More sensitive to errors in the genotype data

• Comparisons with A are somewhat meaningless

Factors affecting GS

• Accuracy of genomic selection

– Corr(TBV, GEBV); Corr(Y, Ŷ)

• It depends on:

– Linkage disequilibrium between QTL and markers

• Density of the marker panel

• Accuracy increased from 0.65 to 0.80 when r2 increased from 0.095 to 0.21.

• Bovine genome = 3B bp

– Modeling of SNP effects

• Single, joint, haplotypes of markers


[Figure: average r2 as a function of distance (kb)]


• It depends on:

– Modeling of SNP effects

• Single, joint, haplotypes of markers

– Method of analysis

• Regression, Bayesian, variance based methods

– Size of the genotyped population

• Size of training population

• Estimation of the genomic relationship matrix

• Relationships between genotyped and non-genotyped animals

[Figure: accuracy as a function of the number of phenotypes (500, 1000, 2200) for LS, BLUP and BayesB]


• Stratification, admixture and crossbred populations

– Change in minor allele frequency of markers and QTLs

– Change in LD between markers and between markers and QTL

– Change in phase of the linkage between markers and QTLs

• The same problem, although with less severity, happens across generations

• Change in minor allele frequencies

– 10 markers


• Change in LD between QTLs and markers

– 1 QTL and 10 markers

• Change in LD between markers (lines L1 and L2)

[Figures: difference in LD between markers; difference in LD between markers and QTL for lines 1 and 2]

• Estimates of SNP effects


• Change in phase of LD between QTL-

markers

A-----Q A------Q

a------q A------Q

a------q a-------q

a------q A------Q

L1

A-----q A------q

a------Q A------q

a------Q a-------Q

a------Q A------q

L2


• Dealing with admixture

– Pooled data

• Mixed results: 1) genetic similarity between mixture components; 2) loss in quality of fit vs. number of parameters in the model

– In general it tends to lead to intermediate results

– No explicit solution exists yet

• It was thought that HD panels would solve the problem. That is not the case


• Non-additive effects

– Dominance

– Epistasis

• Interactions between alleles at different loci

• Joint effect is greater than the sum of marginal effects

• Very costly to model

[Figure: genotypes aa, Aa, AA]


• Epistasis models

$$y_i = \mu + \sum_{j=1}^{p} X_{ij}\beta_j + \sum_{j=1}^{p}\sum_{k>j}^{p} X_{ij}X_{ik}\gamma_{jk} + e_i$$

• $\gamma_{jk}$ is the epistatic effect between SNPs j and k

• Although possible to fit, the model is not well defined
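To see why such a model is costly, it helps to count the pairwise interaction terms. The short sketch below assumes each unordered pair of SNPs gets one epistatic effect; the function name is illustrative.

```python
from math import comb


def n_pairwise_terms(n_snps: int) -> int:
    """Number of distinct SNP pairs (j, k) with j < k."""
    return comb(n_snps, 2)


for n_snps in (10, 1_000, 50_000):
    print(n_snps, n_pairwise_terms(n_snps))
# 50,000 SNPs give 1,249,975,000 pairwise terms
```

A 50K panel therefore implies over a billion interaction effects, far more than the number of phenotyped animals in any data set.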



• Keep in mind that:

• Only additive effects are inherited

• EBVs should contain only additive effects

• However,

– Non-additive effects can be used to improve the genetic potential of commercial animals

– If carefully mapped, they could be used in the genetic improvement program

Training population and re-estimation of segment effects

• The usefulness of the estimated SNP effects decays with the number of generations

– The higher the LD between markers and QTLs, the longer they can be used

– In the extreme, if the SNPs were the QTLs then there would be no need for re-estimation

– Recombination between markers and QTLs will reduce the accuracy of the GEBV in subsequent generations

re-estimation of segment effects

• Decay in accuracy

re-estimation of segment effects

• Fit a polygenic effect in the model

$$y_i = \mu + \sum_{j=1}^{p} X_{ij}\beta_j + u_i + e_i, \qquad u \sim N(0, A\sigma_u^2)$$

$$GEBV_i = u_i + \sum_{j} X_{ij}\beta_j$$

• Re-estimate SNP effects when accuracy falls below a certain threshold

It could get worse!

• If imputed genotypes are used

– Less than perfect accuracy

– The decay could be faster

– It depends on the number of animals being genotyped in subsequent generations

• No one has looked at this yet!
