5/23/2016
Genome-Wide Association Studies and Genomic Selection
Animal selection
• Selection based on EBV has been very effective
– Substantial increases in the majority of traits and
livestock species
– An efficient procedure is routinely applied with
little to no intervention by the user
– Accuracies depend on heritabilities and amount of information
• For some traits, the results are modest and even
in the opposite direction
– Fertility in dairy cattle
– Health traits in poultry
Animal selection
• Accuracies of BV estimation depend on:
– Heritabilities
– Amount of information
– Precision in phenotype measurements
• Genetic progress is limited by
– Accuracies of EBV
– Generation interval
– Cost
How can we improve selection?
• Use of molecular information
– Identify causative major genes
– QTL linked to markers
– Develop gene or marker assisted selection
• Very little success!!
– Very few major genes were identified
– Only QTLs with large effects were identified
• Small portion of the total variation
• Hard to use in commercial applications
Things could be better if
• More markers are made available
– This was not possible in the early days of molecular
information
• The identification of genes (QTLs) is not
needed
– Elimination of the mapping step
• Removes the uncertainties
• Removes the complexity of use in a commercial setup
• Account even for small QTLs
Two Approaches
• Infinitesimal model:
– Assumes that traits are determined by
an infinite number of additive loci, each
with a small effect
– Basic tool of animal breeding
– Spectacularly successful in many cases
Two Approaches
• Finite loci model:
– There is a finite amount of DNA and genes
– Quantify any gene contribution to variation
– Allows for the dissection of the genetic complexity
• increase the accuracy of breeding values estimation
• Increase selection response
Two Approaches
• But how do we find these QTLs?
– Candidate gene approach
• Some success
• Number of candidate genes is too large
• Very difficult to find candidates
– Linkage mapping
• Track genome segments from one generation to the
next using markers
• QTL are not precisely mapped
• Wide confidence intervals of QTL location
Two approaches
– Fine mapping
• Use Linkage disequilibrium mapping approaches
• The process is slow to find causative mutation
• Marker density was a problem
– Result : Failure of MAS
Then …
• The completion of a working draft of the human and several livestock species genomes
– Sequencing of every nucleotide in the genome
• Development in high-throughput technologies
– Hybridization techniques
– Polymorphisms genotyping
Polymorphisms
• A specific sequence variation that happens
with a certain frequency in the population
– At least 1%, often 5%
• For example
– Blood type
– CNV (segments of DNA that are found in different
numbers of copies among individuals)
– Single Nucleotide Polymorphism (SNP)
[Figure: SNP (Murray 2007)]
SNPs
AGATTTAGATCCCGATAGAG
AGATTTAGATCACGATAGAG
A SNP is a genomic position at which two or more different bases occur in
the population, each with a frequency > Threshold (example 1%)
Alleles are C/A
SNPs
• Human Genome ~ 3 Billion base pairs
• Bovine Genome ~ 3 Billion base pairs
• Two individuals are 99.5 to 99.9% identical
– differ in 3 -10 M base pairs.
• SNPs occur once every ~600 bp (average
gene in the human ~27Kb)
• Around 50 SNPs per gene
SNPs
• Currently available SNP marker chips
– 1- 2 M SNPs in human
– 50k -500K SNPs in cattle
– 60K SNPs in chickens and pigs
– ~200K for dogs
– Varying numbers for fish, sheep and plants
– Cost ~ $120- $200 for 60 K chip
[Figure: SNP haplotypes (Sorin Istrail, 2009). Two aligned sequences, GCTCGACAACAG and GTTCGTCAACAG, with the SNP positions labelled SNP 1, SNP 2 and SNP 3; the resulting haplotypes are CAG and TTG.]
Polymorphisms
• single nucleotide polymorphisms
(SNPs)
• Difference between any two individuals:
about $3 \times 10^6$ SNPs
… ataggtccCtatttcgcgcCgtatacacgggActata …
… ataggtccGtatttcgcgcCgtatacacgggTctata …
… ataggtccCtatttcgcgcCgtatacacgggTctata …
Haplotypes and Genotypes
• Haplotype: SNP alleles on a chromosome
– 0/1 vector: 0 for major allele, 1 for minor
• Genotype: Alleles of an SNP on both
chromosomes
– 0/1/2 vector: 0 for homozygous major, 1 for homozygous minor, 2 for heterozygous
Haplotypes and Genotypes
Two haplotypes of an individual:
  011100110
+ 001000010
Genotype:
  021200210
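The combination of two haplotypes into a genotype can be sketched in a few lines. This is only an illustration: the function name `genotype_from_haplotypes` is hypothetical, and the coding (0 = both alleles major, 1 = both alleles minor, 2 = heterozygous) is inferred from the example on this slide.

```python
def genotype_from_haplotypes(h1: str, h2: str) -> str:
    """Combine two 0/1 haplotype strings into one 0/1/2 genotype string.

    Assumed coding (read off the slide's example):
    0 = homozygous major, 1 = homozygous minor, 2 = heterozygous.
    """
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same SNPs")
    genotype = []
    for a, b in zip(h1, h2):
        # Same allele on both chromosomes: keep it; otherwise heterozygous.
        genotype.append(a if a == b else "2")
    return "".join(genotype)


print(genotype_from_haplotypes("011100110", "001000010"))  # 021200210
```

Going the other way (genotype to haplotypes) is not unique whenever more than one position is heterozygous, which is why phasing is a separate problem.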
Coding
Genotypes   Dominant   Codominant   Recessive
AA          1          2            1
AG          1          1            0
GG          0          0            0
1. Coding depends on the genetic model (there is no unique way to code genotypes)
2. Interpretation of results depends on the coding procedure
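As a small sketch of the coding table above, the three genetic models can be written as lookup tables. The names `CODING` and `code_genotypes` are hypothetical, and allele A is taken as the counted allele, as in the table.

```python
# Lookup tables mirroring the coding table (A is the counted allele).
CODING = {
    "dominant":   {"AA": 1, "AG": 1, "GG": 0},
    "codominant": {"AA": 2, "AG": 1, "GG": 0},
    "recessive":  {"AA": 1, "AG": 0, "GG": 0},
}


def code_genotypes(genotypes, model="codominant"):
    """Turn genotype labels such as 'AG' into numeric covariates."""
    table = CODING[model]
    # Sort the two alleles so that 'GA' and 'AG' are treated alike.
    return [table["".join(sorted(g))] for g in genotypes]


sample = ["AA", "GA", "GG", "AG"]
for model in CODING:
    print(model, code_genotypes(sample, model))
```

The same four animals get different covariates under each model, which is why the interpretation of an estimated effect depends on the coding that was used.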
Genotypes vs Haplotypes
• It depends on the application
– Degrees of freedom vs model parameters
• There is no simple answer (application
dependent)
• If possible, you could try both
• Keep in mind, you are trying to fit a model
(explain the variation in the dependent
variable)
How to use this information?
• Genome-wide Association Study (GWAS)
– Identify genetic associations with observable traits
– Identify possible causative mutations
• Genomic selection
– Estimate genetic quantities that could be used for
selection
– Estimate relationships between individuals
– Paternity checking
• Genomic enhanced management
– Decision support tool
Data analysis
• Single marker analysis
– Compare phenotype means across marker’s
classes
• Simple but too many analyses to be
conducted
– False positive/negatives
– High linkage disequilibrium
$$y_i = \mu + \mathrm{gen}_{ij} + e_i$$
Data analysis
• Example
SNP genotype   Trait (EBV) average
TT             30
GT             20
GG             10
• Using linear model
– SNP effect estimate = 10
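The estimate of 10 in the example can be reproduced by regressing the genotype means on allele content. This is a minimal sketch; counting copies of the T allele (TT = 2, GT = 1, GG = 0) is an assumption about the coding, and `allele_substitution_effect` is a hypothetical helper name.

```python
import numpy as np


def allele_substitution_effect(allele_count, trait_mean):
    """Least-squares slope and intercept of trait mean on allele count."""
    slope, intercept = np.polyfit(allele_count, trait_mean, deg=1)
    return slope, intercept


t_count = np.array([2.0, 1.0, 0.0])        # TT, GT, GG coded as copies of T
ebv_mean = np.array([30.0, 20.0, 10.0])    # averages from the example table
effect, base = allele_substitution_effect(t_count, ebv_mean)
print(round(effect, 6), round(base, 6))    # slope per copy of T, GG baseline
```

Counting copies of G instead would give the same magnitude with the opposite sign, so the sign of a reported SNP effect only has meaning once the counted allele is known.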
Data analysis
• Based on single marker analysis results
– Select only significant SNPs for marker
assisted selection!
– Use all SNPs
• Some will be missed
• False positive/negatives
• A better solution: A model that uses all the
SNPs simultaneously
Multi-marker analysis: Genomic selection
• Use all SNPs in the panel (i.e. 50K) in a
single analysis
– Unless there are at least as many observations as SNPs,
a fixed-effects model cannot be
fitted
– Treating SNP effects as random removes
the problem
• Prior information is needed
Data analysis
• Model
$$y_i = \mu + \sum_{j=1}^{\#\mathrm{SNP}} g_{ij}\beta_j + e_i$$
where $y_i$ is the trait value (often a pseudo record) for individual i, $\beta_j$ is the effect of SNP j, and $g_{ij}$ is the genotype of SNP j for animal i
• The SNP genotypes will be coded as
covariables
– Minor allele content (0, 1, 2)
Data analysis
• The resulting system of equations is very dense
– Sparse matrix techniques are of little interest
– Constructing and inverting the coefficient matrix is seldom possible!
• Gauss-Seidel with residual update (GSRU)
– matrix-free BLUP-like estimation procedure
– Low Computational cost
Data analysis
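• Gauss-Seidel
$$\beta_j^{t+1} = \frac{\mathbf{x}_j'\left(\mathbf{y} - \mathbf{X}_{1:j-1}\boldsymbol\beta_{1:j-1}^{t+1} - \mathbf{X}_{j+1:p}\boldsymbol\beta_{j+1:p}^{t}\right)}{\mathbf{x}_j'\mathbf{x}_j + \alpha}$$
• Further, with the current residual $\mathbf{e}^{t,j} = \mathbf{y} - \mathbf{X}_{1:j-1}\boldsymbol\beta_{1:j-1}^{t+1} - \mathbf{X}_{j:p}\boldsymbol\beta_{j:p}^{t}$,
$$\mathbf{x}_j'\left(\mathbf{y} - \mathbf{X}_{1:j-1}\boldsymbol\beta_{1:j-1}^{t+1} - \mathbf{X}_{j+1:p}\boldsymbol\beta_{j+1:p}^{t}\right) = \mathbf{x}_j'\mathbf{e}^{t,j} + \mathbf{x}_j'\mathbf{x}_j\beta_j^{t}$$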
Data analysis
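• GSRU
$$\beta_j^{t+1} = \frac{\mathbf{x}_j'\mathbf{e}^{t,j} + \mathbf{x}_j'\mathbf{x}_j\beta_j^{t}}{\mathbf{x}_j'\mathbf{x}_j + \alpha}$$
where $\mathbf{e}^{t,j}$ is the current residual vector
• Only vectors are involved in
computation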
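The GSRU update above can be sketched as follows. This is an illustration under stated assumptions rather than the exact routine used in practice: `alpha` is taken to be the variance ratio $\sigma_e^2/\sigma_\beta^2$, the mean is updated without shrinkage, the residual update $\mathbf{e} \leftarrow \mathbf{e} - \mathbf{x}_j(\beta_j^{t+1} - \beta_j^{t})$ follows the usual description of the method, and the data are simulated.

```python
import numpy as np


def gsru(X, y, alpha, n_iter=500):
    """Gauss-Seidel with residual update for y = 1*mu + X*beta + e.

    alpha is the value added to each SNP's diagonal element
    (assumed here to be sigma2_e / sigma2_beta); the mean is not shrunk.
    """
    n, p = X.shape
    mu, beta = 0.0, np.zeros(p)
    e = y - mu - X @ beta                  # current residuals
    xtx = np.einsum("ij,ij->j", X, X)      # x_j'x_j for every SNP
    for _ in range(n_iter):
        mu_new = mu + e.mean()             # update the mean
        e -= mu_new - mu
        mu = mu_new
        for j in range(p):
            xj = X[:, j]
            b_new = (xj @ e + xtx[j] * beta[j]) / (xtx[j] + alpha)
            e -= xj * (b_new - beta[j])    # residual update: vectors only
            beta[j] = b_new
    return mu, beta


# Small simulated data set (hypothetical numbers, centred genotypes)
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(40, 5)).astype(float)
X -= X.mean(axis=0)
y = 3.0 + X @ np.array([1.0, -0.5, 0.0, 0.3, 0.0]) + rng.normal(size=40)
mu_hat, beta_hat = gsru(X, y, alpha=2.0)

# Direct solution of the same equations, for comparison only
T = np.column_stack([np.ones(len(y)), X])
C = T.T @ T
C[1:, 1:] += 2.0 * np.eye(X.shape[1])
direct = np.linalg.solve(C, T.T @ y)
```

The coefficient matrix `C` is built here only to check the result; the iterative routine itself never forms it, which is the point of the method when there are tens of thousands of SNPs.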
Data analysis
• Implementation depends on the assumed prior information
– BLUP type, BayesA, BayesB, BayesC, etc.
• Implementation depends on what you are interested in estimating
– SNP effects
– Genomic EBVs: GBLUP (similar to regular BLUP, except A is replaced by the genomic relationship matrix)
• In both cases, an estimated GBV is obtained
– Directly: GBLUP
– Sum of SNP effects
$$\widehat{GBV}_i = \sum_{j=1}^{\#\mathrm{SNP}} g_{ij}\hat\beta_j$$
Data analysis
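• Model
$$y_i = \mu + \sum_{j=1}^{p} X_{ij}\beta_j + e_i$$
• Likelihood function
$$p(\mathbf{y}\mid\mu,\boldsymbol\beta,\sigma_e^2) = \left(\frac{1}{2\pi\sigma_e^2}\right)^{N/2}\exp\left[-\frac{1}{2\sigma_e^2}\sum_{i=1}^{N}\Big(y_i-\mu-\sum_{j=1}^{p}X_{ij}\beta_j\Big)^2\right]$$
BLUP
Prior distributions for $\mu$ and $\sigma_e^2$
$$p(\mu) \sim \text{constant}$$
$$p(\sigma_e^2\mid\nu_e,s_e^2) \sim \nu_e s_e^2\chi^{-2}_{\nu_e}$$
Prior distributions for $\beta_j$
$$p(\beta_j\mid\sigma_\beta^2) \sim N(0,\sigma_\beta^2)$$
BLUP Like approach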
BLUP
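Assume $\sigma_\beta^2$ is known. Then
the joint posterior distribution is easily obtained as the product of the likelihood function
and the prior distributions:
$$p(\mu,\boldsymbol\beta,\sigma_e^2\mid\mathbf{y}) \propto p(\mathbf{y}\mid\mu,\boldsymbol\beta,\sigma_e^2)\,p(\mu,\boldsymbol\beta,\sigma_e^2)$$
Given that
$$p(\mu,\boldsymbol\beta,\sigma_e^2) = p(\mu)\,p(\boldsymbol\beta)\,p(\sigma_e^2)$$
Then
$$p(\mu,\boldsymbol\beta,\sigma_e^2\mid\mathbf{y}) \propto p(\mathbf{y}\mid\mu,\boldsymbol\beta,\sigma_e^2)\,p(\mu)\,p(\boldsymbol\beta)\,p(\sigma_e^2)$$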
BLUP
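$$p(\mu,\boldsymbol\beta,\sigma_e^2\mid\mathbf{y}) \propto \left(\frac{1}{2\pi\sigma_e^2}\right)^{N/2}\exp\left[-\frac{1}{2\sigma_e^2}\sum_{i=1}^{N}\Big(y_i-\mu-\sum_{j=1}^{p}X_{ij}\beta_j\Big)^2\right](\sigma_e^2)^{-(\nu_e/2+1)}\exp\left(-\frac{\nu_e s_e^2}{2\sigma_e^2}\right)\prod_{i=1}^{p}(2\pi\sigma_\beta^2)^{-1/2}\exp\left(-\frac{(\beta_i-0)^2}{2\sigma_\beta^2}\right)$$
For an implementation via the Gibbs sampler, we have to derive the full conditional distributions. For $\mu$:
$$p(\mu\mid\boldsymbol\beta,\sigma_e^2,\mathbf{y}) \propto \left(\frac{1}{2\pi\sigma_e^2}\right)^{N/2}\exp\left[-\frac{1}{2\sigma_e^2}\sum_{i=1}^{N}\Big(y_i-\mu-\sum_{j=1}^{p}X_{ij}\beta_j\Big)^2\right]$$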
BLUP
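Putting $(1/2\pi)^{N/2}$ into the proportionality term,
$$\begin{aligned}
p(\mu\mid\boldsymbol\beta,\sigma_e^2,\mathbf{y}) &\propto \left(\frac{1}{\sigma_e^2}\right)^{N/2}\exp\left[-\frac{1}{2\sigma_e^2}\sum_{i=1}^{N}\Big(y_i-\mu-\sum_{j=1}^{p}X_{ij}\beta_j\Big)^2\right]\\
&= \left(\frac{1}{\sigma_e^2}\right)^{N/2}\exp\left[-\frac{1}{2\sigma_e^2}\sum_{i=1}^{N}\Big(\mu^2-2\mu\Big(y_i-\sum_{j=1}^{p}X_{ij}\beta_j\Big)+\Big(y_i-\sum_{j=1}^{p}X_{ij}\beta_j\Big)^2\Big)\right]\\
&= \left(\frac{1}{\sigma_e^2}\right)^{N/2}\exp\left[-\frac{1}{2\sigma_e^2}\Big(N\mu^2-2\mu\sum_{i=1}^{N}\Big(y_i-\sum_{j=1}^{p}X_{ij}\beta_j\Big)+\sum_{i=1}^{N}\Big(y_i-\sum_{j=1}^{p}X_{ij}\beta_j\Big)^2\Big)\right]
\end{aligned}$$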
BLUP
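$$p(\mu\mid\boldsymbol\beta,\sigma_e^2,\mathbf{y}) \propto \left(\frac{1}{\sigma_e^2}\right)^{N/2}\exp\left[-\frac{N}{2\sigma_e^2}\left\{\mu^2-2\mu\frac{\sum_{i=1}^{N}\big(y_i-\sum_{j=1}^{p}X_{ij}\beta_j\big)}{N}+\frac{\sum_{i=1}^{N}\big(y_i-\sum_{j=1}^{p}X_{ij}\beta_j\big)^2}{N}\right\}\right]$$
Further, adding and subtracting $\left[\dfrac{\sum_{i=1}^{N}\big(y_i-\sum_{j=1}^{p}X_{ij}\beta_j\big)}{N}\right]^2$ leads to:
$$p(\mu\mid\boldsymbol\beta,\sigma_e^2,\mathbf{y}) \propto \left(\frac{1}{\sigma_e^2}\right)^{N/2}\exp\left[-\frac{N}{2\sigma_e^2}\left\{\mu^2-2\mu\frac{\sum_{i=1}^{N}\big(y_i-\sum_{j=1}^{p}X_{ij}\beta_j\big)}{N}+\left[\frac{\sum_{i=1}^{N}\big(y_i-\sum_{j=1}^{p}X_{ij}\beta_j\big)}{N}\right]^2-\left[\frac{\sum_{i=1}^{N}\big(y_i-\sum_{j=1}^{p}X_{ij}\beta_j\big)}{N}\right]^2+\frac{\sum_{i=1}^{N}\big(y_i-\sum_{j=1}^{p}X_{ij}\beta_j\big)^2}{N}\right\}\right]$$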
BLUP
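Organizing elements within the exponential
$$p(\mu\mid\boldsymbol\beta,\sigma_e^2,\mathbf{y}) \propto \left(\frac{1}{\sigma_e^2}\right)^{N/2}\exp\left[-\frac{N}{2\sigma_e^2}\left\{\mu-\frac{\sum_{i=1}^{N}\big(y_i-\sum_{j=1}^{p}X_{ij}\beta_j\big)}{N}\right\}^2\right]\exp\left(-\frac{N}{2\sigma_e^2}\left\{\frac{\sum_{i=1}^{N}\big(y_i-\sum_{j=1}^{p}X_{ij}\beta_j\big)^2}{N}-\left[\frac{\sum_{i=1}^{N}\big(y_i-\sum_{j=1}^{p}X_{ij}\beta_j\big)}{N}\right]^2\right\}\right)$$
Since the second exponential does not depend on $\mu$,
$$p(\mu\mid\boldsymbol\beta,\sigma_e^2,\mathbf{y}) \propto \left(\frac{1}{\sigma_e^2}\right)^{N/2}\exp\left[-\frac{N}{2\sigma_e^2}\left\{\mu-\frac{\sum_{i=1}^{N}\big(y_i-\sum_{j=1}^{p}X_{ij}\beta_j\big)}{N}\right\}^2\right]$$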
BLUP
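Hence,
$$p(\mu\mid\boldsymbol\beta,\sigma_e^2,\mathbf{y}) \sim N(\hat\mu, V)$$
where
$$\hat\mu = \frac{\sum_{i=1}^{N}\big(y_i-\sum_{j=1}^{p}X_{ij}\beta_j\big)}{N},\qquad V = \frac{\sigma_e^2}{N}$$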
BLUP
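Conditional distribution of $\boldsymbol\beta$
$$p(\boldsymbol\beta\mid\mu,\sigma_e^2,\mathbf{y}) = p(\beta_1,\beta_2,\ldots,\beta_p\mid\mu,\sigma_e^2,\mathbf{y})$$
In general it is much easier to work with univariate distributions. Hence, we will derive the conditional
distribution for an element “k” of the vector $\boldsymbol\beta$
$$\begin{aligned}
p(\beta_k\mid\boldsymbol\beta_{-k},\mu,\sigma_e^2,\mathbf{y}) &\propto \left(\frac{1}{2\pi\sigma_e^2}\right)^{N/2}\exp\left[-\frac{1}{2\sigma_e^2}\sum_{i=1}^{N}\Big(y_i-\mu-\sum_{j=1}^{p}X_{ij}\beta_j\Big)^2\right](\sigma_e^2)^{-(\nu_e/2+1)}\exp\left(-\frac{\nu_e s_e^2}{2\sigma_e^2}\right)\prod_{i=1}^{p}(2\pi\sigma_\beta^2)^{-1/2}\exp\left(-\frac{(\beta_i-0)^2}{2\sigma_\beta^2}\right)\\
&\propto \left(\frac{1}{2\pi\sigma_e^2}\right)^{N/2}\exp\left[-\frac{1}{2\sigma_e^2}\sum_{i=1}^{N}\Big(y_i-\mu-\sum_{j=1}^{p}X_{ij}\beta_j\Big)^2\right](2\pi\sigma_\beta^2)^{-1/2}\exp\left(-\frac{\beta_k^2}{2\sigma_\beta^2}\right)
\end{aligned}$$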
BLUP
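$$\begin{aligned}
p(\beta_k\mid\boldsymbol\beta_{-k},\mu,\sigma_e^2,\mathbf{y}) &\propto \left(\frac{1}{2\pi\sigma_e^2}\right)^{N/2}\exp\left[-\frac{1}{2\sigma_e^2}\sum_{i=1}^{N}\Big(y_i-\mu-\sum_{j\neq k}^{p}X_{ij}\beta_j-X_{ik}\beta_k\Big)^2\right](2\pi\sigma_\beta^2)^{-1/2}\exp\left(-\frac{\beta_k^2}{2\sigma_\beta^2}\right)\\
&= \left(\frac{1}{2\pi\sigma_e^2}\right)^{N/2}\exp\left[-\frac{1}{2\sigma_e^2}\sum_{i=1}^{N}\big(w_i-X_{ik}\beta_k\big)^2\right](2\pi\sigma_\beta^2)^{-1/2}\exp\left(-\frac{\beta_k^2}{2\sigma_\beta^2}\right)
\end{aligned}$$
with $w_i = y_i-\mu-\sum_{j\neq k}X_{ij}\beta_j$.
Then,
$$p(\beta_k\mid\boldsymbol\beta_{-k},\mu,\sigma_e^2,\mathbf{y}) \propto \left(\frac{1}{2\pi\sigma_e^2}\right)^{N/2}(2\pi\sigma_\beta^2)^{-1/2}\exp\left[-\frac{1}{2\sigma_e^2}(\mathbf{W}-\mathbf{X}_k\beta_k)'(\mathbf{W}-\mathbf{X}_k\beta_k)\right]\exp\left(-\frac{\beta_k^2}{2\sigma_\beta^2}\right)$$
Further
$$p(\beta_k\mid\boldsymbol\beta_{-k},\mu,\sigma_e^2,\mathbf{y}) \propto \left(\frac{1}{2\pi\sigma_e^2}\right)^{N/2}(2\pi\sigma_\beta^2)^{-1/2}\exp\left[-\frac{1}{2\sigma_e^2}\left\{\mathbf{W}'\mathbf{W}-2\beta_k\mathbf{X}_k'\mathbf{W}+\beta_k^2\Big(\mathbf{X}_k'\mathbf{X}_k+\frac{\sigma_e^2}{\sigma_\beta^2}\Big)\right\}\right]$$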
BLUP
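Then
$$p(\beta_k\mid\boldsymbol\beta_{-k},\mu,\sigma_e^2,\mathbf{y}) \propto \exp\left[-\frac{1}{2\sigma_e^2}\Big(\mathbf{X}_k'\mathbf{X}_k+\frac{\sigma_e^2}{\sigma_\beta^2}\Big)\left\{\beta_k^2-2\beta_k\Big(\mathbf{X}_k'\mathbf{X}_k+\frac{\sigma_e^2}{\sigma_\beta^2}\Big)^{-1}\mathbf{X}_k'\mathbf{W}\right\}\right]$$
Finally,
$$p(\beta_k\mid\boldsymbol\beta_{-k},\mu,\sigma_e^2,\mathbf{y}) \sim N\left(\Big[\mathbf{X}_k'\mathbf{X}_k+\frac{\sigma_e^2}{\sigma_\beta^2}\Big]^{-1}\mathbf{X}_k'\mathbf{W},\ \frac{\sigma_e^2}{\mathbf{X}_k'\mathbf{X}_k+\sigma_e^2/\sigma_\beta^2}\right)$$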
BLUP
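Conditional distribution of $\sigma_e^2$
$$\begin{aligned}
p(\sigma_e^2\mid\mu,\boldsymbol\beta,\mathbf{y}) &\propto \left(\frac{1}{2\pi\sigma_e^2}\right)^{N/2}\exp\left[-\frac{1}{2\sigma_e^2}\sum_{i=1}^{N}\Big(y_i-\mu-\sum_{j=1}^{p}X_{ij}\beta_j\Big)^2\right](\sigma_e^2)^{-(\nu_e/2+1)}\exp\left(-\frac{\nu_e s_e^2}{2\sigma_e^2}\right)\\
&\propto (\sigma_e^2)^{-\left(\frac{N+\nu_e}{2}+1\right)}\exp\left[-\frac{1}{2\sigma_e^2}\big(\mathbf{e}'\mathbf{e}+\nu_e s_e^2\big)\right]
\end{aligned}$$
where
$$e_i = y_i-\mu-\sum_{j=1}^{p}X_{ij}\beta_j$$
Then
$$p(\sigma_e^2\mid\mu,\boldsymbol\beta,\mathbf{y}) \sim (\mathbf{e}'\mathbf{e}+\nu_e s_e^2)\,\chi^{-2}_{N+\nu_e}$$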
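The three full conditionals derived for the BLUP-type model (normal for $\mu$, normal for each $\beta_k$, scaled inverted chi-square for $\sigma_e^2$) can be collected into one Gibbs sweep. The sketch below is illustrative only: the function names are hypothetical, $\sigma_\beta^2$ is treated as known, and the scaled inverted chi-square draw is taken as `scale / chisquare(df)`.

```python
import numpy as np


def mu_conditional(y, X, beta, sigma2_e):
    """Mean and variance of p(mu | beta, sigma2_e, y)."""
    r = y - X @ beta
    return r.mean(), sigma2_e / len(y)


def beta_k_conditional(y, X, mu, beta, k, sigma2_e, sigma2_b):
    """Mean and variance of p(beta_k | beta_-k, mu, sigma2_e, y)."""
    xk = X[:, k]
    w = y - mu - X @ beta + xk * beta[k]       # data adjusted for all but SNP k
    lhs = xk @ xk + sigma2_e / sigma2_b
    return (xk @ w) / lhs, sigma2_e / lhs


def sigma2_e_conditional(y, X, mu, beta, nu_e, s2_e):
    """Scale and degrees of freedom of the scaled inverted chi-square."""
    e = y - mu - X @ beta
    return e @ e + nu_e * s2_e, len(y) + nu_e


def gibbs_sweep(y, X, mu, beta, sigma2_e, sigma2_b, nu_e, s2_e, rng):
    """One pass over mu, every beta_k and sigma2_e (beta is updated in place)."""
    m, v = mu_conditional(y, X, beta, sigma2_e)
    mu = rng.normal(m, np.sqrt(v))
    for k in range(X.shape[1]):
        m, v = beta_k_conditional(y, X, mu, beta, k, sigma2_e, sigma2_b)
        beta[k] = rng.normal(m, np.sqrt(v))
    scale, df = sigma2_e_conditional(y, X, mu, beta, nu_e, s2_e)
    sigma2_e = scale / rng.chisquare(df)
    return mu, beta, sigma2_e


# Tiny hypothetical data set, just to show the call
rng = np.random.default_rng(0)
y = np.array([1.0, 2.0, 3.0])
X = np.array([[1.0], [0.0], [1.0]])
state = gibbs_sweep(y, X, 0.0, np.zeros(1), 1.0, 1.5, 4.0, 0.5, rng)
```

In a real analysis the sweep would be repeated many times, with an initial burn-in discarded, and posterior means computed from the saved samples.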
BLUP
$\sigma_\beta^2$ was assumed known, but what is $\sigma_\beta^2$?
If locus j is a random sample of all possible loci then
$$\sigma_\beta^2 = \mathrm{var}(\beta_j)$$
where $\beta_j$ is the random variable
However, our model is:
$$g_i = \mathbf{X}_i\boldsymbol\beta$$
where $\boldsymbol\beta$ is constant for all individuals. So the genetic variation is due to the
genotypes!
BLUP
Genetic variance (assuming LE)
$$V_a = \sum_{i=1}^{p} 2p_iq_i\beta_i^2$$
where $p_i$ and $q_i$ are the allele frequencies at SNP i
Let
$$V_i = 2p_iq_i,\qquad U_i = \beta_i^2$$
Then
$$V_a = \sum_{i=1}^{p} V_iU_i$$
BLUP
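Covariance between V and U
$$\begin{aligned}
COV(V,U) &= E\{[V-E(V)][U-E(U)]\}\\
&= E(UV)-E(V)\,E(U)\\
&= \frac{\sum_{i=1}^{p}U_iV_i}{p}-\left(\frac{\sum_{i=1}^{p}V_i}{p}\right)\left(\frac{\sum_{i=1}^{p}U_i}{p}\right)
\end{aligned}$$
Then, the genetic variance could be re-written as:
$$V_a = p\,COV(V,U)+\left(\sum_{i=1}^{p}2p_iq_i\right)\frac{\sum_{i=1}^{p}\beta_i^2}{p}$$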
BLUP
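Let
$$\sigma_\beta^2 = \frac{\sum_{i=1}^{p}\beta_i^2}{p}$$
Then
$$V_a = p\,COV(V,U)+\sigma_\beta^2\sum_{i=1}^{p}2p_iq_i$$
and
$$\sigma_\beta^2 = \frac{V_a-p\,COV(V,U)}{\sum_{i=1}^{p}2p_iq_i}$$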
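The last expression gives a practical way to set $\sigma_\beta^2$ from an additive variance and the allele frequencies. A minimal sketch, assuming by default that effects and heterozygosities are uncorrelated (`cov_vu = 0`); the function name and the numbers are hypothetical.

```python
def sigma2_beta(v_a, allele_freqs, cov_vu=0.0):
    """sigma2_beta = (V_a - p * COV(V, U)) / sum(2 * p_i * q_i).

    v_a          additive genetic variance
    allele_freqs allele frequency p_i of each SNP (q_i = 1 - p_i)
    cov_vu       covariance between 2*p_i*q_i and beta_i**2 (assumed 0 by default)
    """
    n_snp = len(allele_freqs)
    sum_2pq = sum(2.0 * f * (1.0 - f) for f in allele_freqs)
    return (v_a - n_snp * cov_vu) / sum_2pq


freqs = [0.5, 0.5, 0.2, 0.1]         # hypothetical allele frequencies
print(sigma2_beta(3.0, freqs))       # sum of 2pq is 1.5, so about 2.0
```

With `cov_vu = 0` this reduces to dividing the additive variance by the sum of heterozygosities across SNPs.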
BLUP
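Assume that $\sigma_\beta^2$ is unknown, then we have to specify a prior
$$p(\sigma_\beta^2\mid\nu_\beta,s_\beta^2) \sim \nu_\beta s_\beta^2\chi^{-2}_{\nu_\beta}$$
$$\begin{aligned}
p(\sigma_\beta^2\mid\mu,\boldsymbol\beta,\sigma_e^2,\mathbf{y}) &\propto \left(\frac{1}{2\pi\sigma_e^2}\right)^{N/2}\exp\left[-\frac{1}{2\sigma_e^2}\sum_{i=1}^{N}\Big(y_i-\mu-\sum_{j=1}^{p}X_{ij}\beta_j\Big)^2\right](\sigma_e^2)^{-(\nu_e/2+1)}\exp\left(-\frac{\nu_e s_e^2}{2\sigma_e^2}\right)\prod_{i=1}^{p}(2\pi\sigma_\beta^2)^{-1/2}\exp\left(-\frac{(\beta_i-0)^2}{2\sigma_\beta^2}\right)(\sigma_\beta^2)^{-(\nu_\beta/2+1)}\exp\left(-\frac{\nu_\beta s_\beta^2}{2\sigma_\beta^2}\right)\\
&\propto \prod_{i=1}^{p}(2\pi\sigma_\beta^2)^{-1/2}\exp\left(-\frac{(\beta_i-0)^2}{2\sigma_\beta^2}\right)(\sigma_\beta^2)^{-(\nu_\beta/2+1)}\exp\left(-\frac{\nu_\beta s_\beta^2}{2\sigma_\beta^2}\right)\\
&\propto (\sigma_\beta^2)^{-\left(\frac{p+\nu_\beta}{2}+1\right)}\exp\left[-\frac{1}{2\sigma_\beta^2}\big(\boldsymbol\beta'\boldsymbol\beta+\nu_\beta s_\beta^2\big)\right]
\end{aligned}$$
Conditional distribution
$$p(\sigma_\beta^2\mid\mu,\boldsymbol\beta,\sigma_e^2,\mathbf{y}) \sim (\boldsymbol\beta'\boldsymbol\beta+\nu_\beta s_\beta^2)\,\chi^{-2}_{p+\nu_\beta}$$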
Bayes-A
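• Model
$$y_i = \mu + \sum_{j=1}^{p} X_{ij}\beta_j + e_i$$
• Likelihood function
$$p(\mathbf{y}\mid\mu,\boldsymbol\beta,\sigma_e^2) = \left(\frac{1}{2\pi\sigma_e^2}\right)^{N/2}\exp\left[-\frac{1}{2\sigma_e^2}\sum_{i=1}^{N}\Big(y_i-\mu-\sum_{j=1}^{p}X_{ij}\beta_j\Big)^2\right]$$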
Bayes-A
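Prior distributions for $\mu$ and $\sigma_e^2$
$$p(\mu) \sim \text{constant}$$
$$p(\sigma_e^2\mid\nu_e,s_e^2) \sim \nu_e s_e^2\chi^{-2}_{\nu_e}$$
Prior distributions for $\beta_j$
$$p(\beta_j\mid\sigma_{\beta_j}^2) \sim N(0,\sigma_{\beta_j}^2)$$
Prior distributions for $\sigma_{\beta_j}^2$
$$p(\sigma_{\beta_j}^2\mid\nu_j,s_j^2) \sim \nu_j s_j^2\chi^{-2}_{\nu_j}$$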
Bayes-A
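Joint posterior distribution
$$p(\mu,\boldsymbol\beta,\sigma_e^2,\sigma_{\beta_1}^2,\ldots,\sigma_{\beta_p}^2\mid\mathbf{y}) \propto \left(\frac{1}{2\pi\sigma_e^2}\right)^{N/2}\exp\left[-\frac{1}{2\sigma_e^2}\sum_{i=1}^{N}\Big(y_i-\mu-\sum_{j=1}^{p}X_{ij}\beta_j\Big)^2\right](\sigma_e^2)^{-(\nu_e/2+1)}\exp\left(-\frac{\nu_e s_e^2}{2\sigma_e^2}\right)\prod_{i=1}^{p}(2\pi\sigma_{\beta_i}^2)^{-1/2}\exp\left(-\frac{\beta_i^2}{2\sigma_{\beta_i}^2}\right)\prod_{i=1}^{p}(\sigma_{\beta_i}^2)^{-(\nu_i/2+1)}\exp\left(-\frac{\nu_i s_i^2}{2\sigma_{\beta_i}^2}\right)$$
Conditional distributions
For $\mu$, $\boldsymbol\beta$ and $\sigma_e^2$, the same conditional distributions as with the BLUP-type model (with $\sigma_{\beta_j}^2$ in place of $\sigma_\beta^2$)
- Normal distributions for position parameters
- Scaled inverted Chi-Square for the residual variance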
Bayes-A
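Conditional distribution of $\sigma_{\beta_j}^2$
$$\begin{aligned}
p(\sigma_{\beta_j}^2\mid\mu,\boldsymbol\beta,\sigma_e^2,\sigma_{\beta_{i\neq j}}^2,\mathbf{y}) &\propto \left(\frac{1}{2\pi\sigma_e^2}\right)^{N/2}\exp\left[-\frac{1}{2\sigma_e^2}\sum_{i=1}^{N}\Big(y_i-\mu-\sum_{j=1}^{p}X_{ij}\beta_j\Big)^2\right](\sigma_e^2)^{-(\nu_e/2+1)}\exp\left(-\frac{\nu_e s_e^2}{2\sigma_e^2}\right)\prod_{i=1}^{p}(2\pi\sigma_{\beta_i}^2)^{-1/2}\exp\left(-\frac{\beta_i^2}{2\sigma_{\beta_i}^2}\right)\prod_{i=1}^{p}(\sigma_{\beta_i}^2)^{-(\nu_i/2+1)}\exp\left(-\frac{\nu_i s_i^2}{2\sigma_{\beta_i}^2}\right)\\
&\propto (2\pi\sigma_{\beta_j}^2)^{-1/2}\exp\left(-\frac{\beta_j^2}{2\sigma_{\beta_j}^2}\right)(\sigma_{\beta_j}^2)^{-(\nu_j/2+1)}\exp\left(-\frac{\nu_j s_j^2}{2\sigma_{\beta_j}^2}\right)
\end{aligned}$$
$$p(\sigma_{\beta_j}^2\mid\mu,\boldsymbol\beta,\sigma_e^2,\sigma_{\beta_{i\neq j}}^2,\mathbf{y}) \propto (\sigma_{\beta_j}^2)^{-\left(\frac{\nu_j+1}{2}+1\right)}\exp\left(-\frac{\beta_j^2+\nu_j s_j^2}{2\sigma_{\beta_j}^2}\right)$$
So,
$$p(\sigma_{\beta_j}^2\mid\mu,\boldsymbol\beta,\sigma_e^2,\sigma_{\beta_{i\neq j}}^2,\mathbf{y}) \sim (\beta_j^2+\nu_j s_j^2)\,\chi^{-2}_{\nu_j+1}$$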
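The Bayes-A update of each SNP variance is a single draw from the scaled inverted chi-square above. A minimal sketch, with hypothetical function names and prior values; the draw is taken as `scale / chisquare(df)`.

```python
import numpy as np


def bayes_a_variance_conditional(beta_j, nu_j, s2_j):
    """Scale and degrees of freedom of p(sigma2_beta_j | beta_j, ...)."""
    return beta_j ** 2 + nu_j * s2_j, nu_j + 1


def sample_scaled_inv_chi2(scale, df, rng):
    """Draw from scale * inverse-chi-square(df)."""
    return scale / rng.chisquare(df)


rng = np.random.default_rng(42)
scale, df = bayes_a_variance_conditional(beta_j=2.0, nu_j=4.0, s2_j=0.5)
draws = np.array([sample_scaled_inv_chi2(scale, df, rng) for _ in range(5)])
print(scale, df, draws.round(3))
```

Because only one effect informs each variance, the degrees of freedom rise by just one over the prior, so the prior hyperparameters keep a strong influence on the result.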
Bayes-B
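• Model
$$y_i = \mu + \sum_{j=1}^{p} X_{ij}\beta_j + e_i$$
• Likelihood function
$$p(\mathbf{y}\mid\mu,\boldsymbol\beta,\sigma_e^2) = \left(\frac{1}{2\pi\sigma_e^2}\right)^{N/2}\exp\left[-\frac{1}{2\sigma_e^2}\sum_{i=1}^{N}\Big(y_i-\mu-\sum_{j=1}^{p}X_{ij}\beta_j\Big)^2\right]$$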
Bayes-B
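Prior distributions for $\mu$ and $\sigma_e^2$
$$p(\mu) \sim \text{constant}$$
$$p(\sigma_e^2\mid\nu_e,s_e^2) \sim \nu_e s_e^2\chi^{-2}_{\nu_e}$$
Prior distributions for $\beta_j$
$$p(\beta_j\mid\sigma_{\beta_j}^2)\ \begin{cases} = 0 & \text{with probability } \pi\\ \sim N(0,\sigma_{\beta_j}^2) & \text{with probability } (1-\pi)\end{cases}$$
Prior distributions for $\sigma_{\beta_j}^2$
$$p(\sigma_{\beta_j}^2\mid\nu_j,s_j^2) \sim \nu_j s_j^2\chi^{-2}_{\nu_j}$$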
Bayes-B
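• Full conditional distributions
– Same as with normal prior for $\mu$ and $\sigma_e^2$
• Normal for the mean
• Scaled inverted Chi-square for the residual variance
• Conditional distributions for $\beta_j$ and $\sigma_{\beta_j}^2$
• Not in closed form: We cannot use Gibbs
Sampler!!
$$p(\beta_j,\sigma_{\beta_j}^2\mid\mu,\boldsymbol\beta_{i\neq j},\sigma_e^2,\mathbf{y}) = p(\sigma_{\beta_j}^2\mid\mu,\boldsymbol\beta_{i\neq j},\sigma_e^2,\mathbf{y})\;p(\beta_j\mid\sigma_{\beta_j}^2,\mu,\boldsymbol\beta_{i\neq j},\sigma_e^2,\mathbf{y})$$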
Bayes-B
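• If the prior is used as proposal distribution
• Then
$$\alpha(\sigma_{\beta_j}^2,\sigma_{cand}^2) = \min\left\{1,\ \frac{p(\mathbf{y}\mid\sigma_{cand}^2,\boldsymbol\beta_{i\neq j},\mu,\sigma_e^2)\,p(\sigma_{cand}^2)\,p(\sigma_{\beta_j}^2)}{p(\mathbf{y}\mid\sigma_{\beta_j}^2,\boldsymbol\beta_{i\neq j},\mu,\sigma_e^2)\,p(\sigma_{\beta_j}^2)\,p(\sigma_{cand}^2)}\right\}$$
$$\alpha(\sigma_{\beta_j}^2,\sigma_{cand}^2) = \min\left\{1,\ \frac{p(\mathbf{y}\mid\sigma_{cand}^2,\boldsymbol\beta_{i\neq j},\mu,\sigma_e^2)}{p(\mathbf{y}\mid\sigma_{\beta_j}^2,\boldsymbol\beta_{i\neq j},\mu,\sigma_e^2)}\right\}$$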
Bayes-B
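• Conditional distribution of $\beta_j$
– Given $\sigma_{\beta_j}^2$
$$p(\beta_j\mid\boldsymbol\beta_{-j},\mu,\sigma_e^2,\sigma_{\beta_j}^2,\mathbf{y}) \sim N\left(\Big[\mathbf{X}_j'\mathbf{X}_j+\frac{\sigma_e^2}{\sigma_{\beta_j}^2}\Big]^{-1}\mathbf{X}_j'\mathbf{W},\ \frac{\sigma_e^2}{\mathbf{X}_j'\mathbf{X}_j+\sigma_e^2/\sigma_{\beta_j}^2}\right)$$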
Bayes-B
• A better re-parametrization
– Create a dummy variable $\lambda_i$ taking values of 1
and 0 with probability $(1-\pi)$ and $\pi$ for each SNP
• Assuming $\pi$ is known
$$p(\lambda_i\mid\pi) \sim \pi^{(1-\lambda_i)}(1-\pi)^{\lambda_i}$$
Bayes-B
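• Given $\lambda$
$$p(\beta_j\mid\lambda_j,\sigma_{\beta_j}^2)\ \begin{cases}\sim N(0,\sigma_{\beta_j}^2) & \text{if } \lambda_j = 1\\ = 0 & \text{if } \lambda_j = 0\end{cases}$$
• Conditional distribution of $\lambda_j$
$$p(\lambda_j\mid\boldsymbol\beta,\mu,\sigma_e^2,\pi,\mathbf{y}) \propto p(\mathbf{y}\mid\lambda_j,\boldsymbol\beta,\mu,\sigma_e^2)\,p(\lambda_j\mid\pi)$$
$$P(\lambda_j=1\mid\cdot) \propto p(\mathbf{y}\mid\lambda_j=1,\boldsymbol\beta,\mu,\sigma_e^2)\,(1-\pi),\qquad P(\lambda_j=0\mid\cdot) \propto p(\mathbf{y}\mid\lambda_j=0,\boldsymbol\beta,\mu,\sigma_e^2)\,(\pi)$$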
Bayes-B
• Assuming $\pi$ unknown
• Posterior of SNP effects and variances
remain the same
• We need to derive the conditional
distribution of $\pi$
$$p(\pi) \sim U[0,1]$$
Bayes-B
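• Conditional of $\pi$
$$p(\pi\mid\lambda_{j\,(1,\ldots,p)},\boldsymbol\beta,\mu,\sigma_e^2,\mathbf{y}) \propto \pi^{(p-r)}(1-\pi)^{r}$$
where r is the number of SNPs with indicator variable equal to 1.
• Thus,
$$p(\pi\mid\lambda_{j\,(1,\ldots,p)},\boldsymbol\beta,\mu,\sigma_e^2,\mathbf{y}) \sim Beta(p-r+1,\ r+1)$$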
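The two indicator-related updates can be sketched directly from the expressions above: the probability that $\lambda_j = 1$ is the normalised product of likelihood and prior, and $\pi$ is drawn from a Beta distribution. Function names and the likelihood values below are hypothetical.

```python
import numpy as np


def prob_indicator_one(lik_one, lik_zero, pi):
    """P(lambda_j = 1 | ...) from the two likelihood values and pi."""
    num = lik_one * (1.0 - pi)
    return num / (num + lik_zero * pi)


def pi_conditional(indicators):
    """Beta parameters (p - r + 1, r + 1) of the conditional of pi."""
    p = len(indicators)
    r = int(sum(indicators))        # SNPs currently in the model
    return p - r + 1, r + 1


rng = np.random.default_rng(7)
indicators = [1, 0, 0, 1, 0]
a, b = pi_conditional(indicators)
pi_draw = rng.beta(a, b)            # one Gibbs draw of pi
print(a, b, round(pi_draw, 3), prob_indicator_one(0.2, 0.1, 0.5))
```

In practice the two likelihood values are usually handled on the log scale, since the raw densities underflow for realistic sample sizes.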
Bayes-C
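• Model
$$y_i = \mu + \sum_{j=1}^{p} X_{ij}\beta_j + e_i$$
• Likelihood function
$$p(\mathbf{y}\mid\mu,\boldsymbol\beta,\sigma_e^2) = \left(\frac{1}{2\pi\sigma_e^2}\right)^{N/2}\exp\left[-\frac{1}{2\sigma_e^2}\sum_{i=1}^{N}\Big(y_i-\mu-\sum_{j=1}^{p}X_{ij}\beta_j\Big)^2\right]$$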
Bayes-C
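Prior distributions for $\mu$ and $\sigma_e^2$
$$p(\mu) \sim \text{constant}$$
$$p(\sigma_e^2\mid\nu_e,s_e^2) \sim \nu_e s_e^2\chi^{-2}_{\nu_e}$$
Prior distributions for $\beta_j$
$$p(\beta_j\mid\sigma_\beta^2)\ \begin{cases} = 0 & \text{with probability } \pi\\ \sim N(0,\sigma_\beta^2) & \text{with probability } (1-\pi)\end{cases}$$
Prior distributions for $\sigma_\beta^2$
$$p(\sigma_\beta^2\mid\nu_\beta,s_\beta^2) \sim \nu_\beta s_\beta^2\chi^{-2}_{\nu_\beta}$$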
Bayes-C
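• It is similar to attaching a dummy variable $\lambda_i$,
taking values of 1 and 0 with probability $(1-\pi)$
and $\pi$, such that:
$$p(\beta_j\mid\sigma_\beta^2,\lambda_i)\ \begin{cases}\sim N(0,\sigma_\beta^2) & \text{if } \lambda_i = 1\\ = 0 & \text{if } \lambda_i = 0\end{cases}$$
Thus, $\lambda_i \sim Bernoulli(1-\pi)$
and
$$p(\pi) \sim U[0,1]$$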
Bayes-C
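• Conditional distributions:
– Given $\lambda = \{\lambda_1,\lambda_2,\ldots,\lambda_p\}$
• Normal for position parameters
• Scaled inverted Chi-square for the residual variance
• Conditional distribution of $\lambda_i$
$$p(\lambda_j\mid\boldsymbol\beta,\mu,\sigma_e^2,\pi,\mathbf{y}) \propto p(\mathbf{y}\mid\lambda_j,\boldsymbol\beta,\mu,\sigma_e^2)\,p(\lambda_j\mid\pi)$$
$$P(\lambda_j=1\mid\cdot) \propto p(\mathbf{y}\mid\lambda_j=1,\boldsymbol\beta,\mu,\sigma_e^2)\,(1-\pi),\qquad P(\lambda_j=0\mid\cdot) \propto p(\mathbf{y}\mid\lambda_j=0,\boldsymbol\beta,\mu,\sigma_e^2)\,(\pi)$$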
Bayes-C
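Conditional distribution of $\sigma_\beta^2$
$$\begin{aligned}
p(\sigma_\beta^2\mid\mu,\boldsymbol\beta,\sigma_e^2,\mathbf{y}) &\propto \prod_{i:\,\lambda_i=1}(2\pi\sigma_\beta^2)^{-1/2}\exp\left(-\frac{\beta_i^2}{2\sigma_\beta^2}\right)(\sigma_\beta^2)^{-(\nu_\beta/2+1)}\exp\left(-\frac{\nu_\beta s_\beta^2}{2\sigma_\beta^2}\right)\\
&\propto (\sigma_\beta^2)^{-\left(\frac{p'+\nu_\beta}{2}+1\right)}\exp\left[-\frac{1}{2\sigma_\beta^2}\big(\boldsymbol\beta'\boldsymbol\beta+\nu_\beta s_\beta^2\big)\right]
\end{aligned}$$
$$p(\sigma_\beta^2\mid\mu,\boldsymbol\beta,\sigma_e^2,\mathbf{y}) \sim (\boldsymbol\beta'\boldsymbol\beta+\nu_\beta s_\beta^2)\,\chi^{-2}_{p'+\nu_\beta}$$
where p’ is the number of SNPs with non-zero effect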
Bayes-C
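Conditional distribution of $\pi$
$$p(\pi\mid\lambda,\boldsymbol\beta,\mu,\sigma_e^2,\mathbf{y}) \propto \pi^{(p-r)}(1-\pi)^{r}$$
where r is the number of SNPs with indicator variable equal to 1.
$$p(\pi\mid\lambda,\boldsymbol\beta,\mu,\sigma_e^2,\mathbf{y}) \sim Beta(p-r+1,\ r+1)$$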
Non-Genotyped animals
• Multiple step procedures
– Only genotyped and phenotyped individuals
• Can we include non-genotyped animals?
– Impute missing genotypes
– Use expected relationships
• Produce genomic breeding values to all
animals
Non-Genotyped animals
$$y_i = \mu + \sum_{j=1}^{p} X_{ij}\beta_j + e_i$$
Non-Genotyped animals
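• For non-genotyped animals
– X matrix is unknown
– Can it be imputed?
• In matrix notation
$$\begin{bmatrix}\mathbf{y}_1\\ \mathbf{y}_2\end{bmatrix} = \mathbf{1}\mu + \begin{bmatrix}\mathbf{Z}_1 & \mathbf{0}\\ \mathbf{0} & \mathbf{Z}_2\end{bmatrix}\begin{bmatrix}\mathbf{g}_1\\ \mathbf{g}_2\end{bmatrix} + \begin{bmatrix}\mathbf{e}_1\\ \mathbf{e}_2\end{bmatrix}$$
– Where g1 and g2 are the BV for genotyped
and non-genotyped animals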
Non-Genotyped animals
• Let,
$$\mathbf{g}_1 = \mathbf{M}_1\boldsymbol\beta$$
$$\mathbf{g}_2 = \mathbf{M}_2\boldsymbol\beta$$
– Where M1 and M2 (unknown) are the
matrices of marker genotypes for genotyped
and non-genotyped animals and $\boldsymbol\beta$ is the
vector of SNP effects
Non-genotyped animals
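• Using some properties of the multivariate
normal distribution
$$p(\mathbf{u}_1,\mathbf{u}_2\mid\mathbf{A},\sigma_u^2) \sim N\left(\begin{bmatrix}\mathbf{0}\\ \mathbf{0}\end{bmatrix},\ \mathbf{A}\sigma_u^2\right)$$
• where
$$p(\mathbf{u}_1,\mathbf{u}_2\mid\mathbf{A},\sigma_u^2) = p(\mathbf{u}_1\mid\mathbf{A},\sigma_u^2)\;p(\mathbf{u}_2\mid\mathbf{u}_1,\mathbf{A},\sigma_u^2)$$
$$p(\mathbf{u}_1\mid\mathbf{A},\sigma_u^2) \sim N(\mathbf{0},\ \mathbf{A}_{11}\sigma_u^2)$$
$$p(\mathbf{u}_2\mid\mathbf{u}_1,\mathbf{A},\sigma_u^2) \sim N\left(\mathbf{A}_{21}\mathbf{A}_{11}^{-1}\mathbf{u}_1,\ [\mathbf{A}_{22}-\mathbf{A}_{21}\mathbf{A}_{11}^{-1}\mathbf{A}_{12}]\sigma_u^2\right)$$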
Non-genotyped animals
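• Replacing u1 by the marker effects
$$p(\mathbf{u}_2\mid\boldsymbol\beta,\mathbf{A},\sigma_u^2) \sim N\left(\mathbf{A}_{21}\mathbf{A}_{11}^{-1}\mathbf{M}_1\boldsymbol\beta,\ [\mathbf{A}_{22}-\mathbf{A}_{21}\mathbf{A}_{11}^{-1}\mathbf{A}_{12}]\sigma_u^2\right)$$
• Thus,
$$\begin{bmatrix}\mathbf{y}_1\\ \mathbf{y}_2\end{bmatrix} = \mathbf{1}\mu + \begin{bmatrix}\mathbf{Z}_1 & \mathbf{0}\\ \mathbf{0} & \mathbf{Z}_2\end{bmatrix}\begin{bmatrix}\mathbf{M}_1\boldsymbol\beta\\ \mathbf{A}_{21}\mathbf{A}_{11}^{-1}\mathbf{M}_1\boldsymbol\beta+\boldsymbol\varepsilon\end{bmatrix} + \begin{bmatrix}\mathbf{e}_1\\ \mathbf{e}_2\end{bmatrix}$$
• Where
$$\boldsymbol\varepsilon\mid\mathbf{A},\sigma_u^2 \sim N\left(\mathbf{0},\ [\mathbf{A}_{22}-\mathbf{A}_{21}\mathbf{A}_{11}^{-1}\mathbf{A}_{12}]\sigma_u^2\right)$$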
Non-genotyped animals
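• Further, with $\hat{\mathbf{M}}_2 = \mathbf{A}_{21}\mathbf{A}_{11}^{-1}\mathbf{M}_1$,
$$\begin{bmatrix}\mathbf{y}_1\\ \mathbf{y}_2\end{bmatrix} = \mathbf{1}\mu + \begin{bmatrix}\mathbf{Z}_1 & \mathbf{0}\\ \mathbf{0} & \mathbf{Z}_2\end{bmatrix}\begin{bmatrix}\mathbf{M}_1\boldsymbol\beta\\ \hat{\mathbf{M}}_2\boldsymbol\beta+\boldsymbol\varepsilon\end{bmatrix} + \begin{bmatrix}\mathbf{e}_1\\ \mathbf{e}_2\end{bmatrix}$$
• Let
$$\mathbf{W} = \begin{bmatrix}\mathbf{W}_1\\ \mathbf{W}_2\end{bmatrix} = \begin{bmatrix}\mathbf{Z}_1\mathbf{M}_1\\ \mathbf{Z}_2\hat{\mathbf{M}}_2\end{bmatrix},\qquad \mathbf{U} = \begin{bmatrix}\mathbf{0}\\ \mathbf{Z}_2\end{bmatrix}$$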
Non-genotyped animals
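• Model
$$\mathbf{y} = \mathbf{1}\mu + \mathbf{W}\boldsymbol\beta + \mathbf{U}\boldsymbol\varepsilon + \mathbf{e}$$
• In matrix notation,
$$\begin{bmatrix}\mathbf{1}'\mathbf{1} & \mathbf{1}'\mathbf{W} & \mathbf{1}'\mathbf{U}\\ \mathbf{W}'\mathbf{1} & \mathbf{W}'\mathbf{W}+\mathbf{I}\dfrac{\sigma_e^2}{\sigma_\beta^2} & \mathbf{W}'\mathbf{U}\\ \mathbf{U}'\mathbf{1} & \mathbf{U}'\mathbf{W} & \mathbf{U}'\mathbf{U}+(\mathbf{A}_{22}-\mathbf{A}_{21}\mathbf{A}_{11}^{-1}\mathbf{A}_{12})^{-1}\dfrac{\sigma_e^2}{\sigma_\varepsilon^2}\end{bmatrix}\begin{bmatrix}\hat\mu\\ \hat{\boldsymbol\beta}\\ \hat{\boldsymbol\varepsilon}\end{bmatrix} = \begin{bmatrix}\mathbf{1}'\mathbf{y}\\ \mathbf{W}'\mathbf{y}\\ \mathbf{U}'\mathbf{y}\end{bmatrix}$$
where $\sigma_\varepsilon^2$ is the variance component of $\boldsymbol\varepsilon$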
Non-genotyped animals
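• Re-parametrization using $\mathbf{A}^{-1}$
• It is easy to prove that, with
$$\mathbf{A} = \begin{bmatrix}\mathbf{A}_{11} & \mathbf{A}_{12}\\ \mathbf{A}_{21} & \mathbf{A}_{22}\end{bmatrix}\quad\text{and}\quad \mathbf{A}^{-1} = \begin{bmatrix}\mathbf{A}^{11} & \mathbf{A}^{12}\\ \mathbf{A}^{21} & \mathbf{A}^{22}\end{bmatrix}$$
$$[\mathbf{A}_{22}-\mathbf{A}_{21}\mathbf{A}_{11}^{-1}\mathbf{A}_{12}] = (\mathbf{A}^{22})^{-1}$$
$$\mathbf{A}_{21}\mathbf{A}_{11}^{-1} = -(\mathbf{A}^{22})^{-1}\mathbf{A}^{21}$$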
Non-genotyped animals
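• In matrix notation using $\mathbf{A}^{-1}$,
$$\begin{bmatrix}\mathbf{1}'\mathbf{1} & \mathbf{1}'\mathbf{W} & \mathbf{1}'\mathbf{U}\\ \mathbf{W}'\mathbf{1} & \mathbf{W}'\mathbf{W}+\mathbf{I}\dfrac{\sigma_e^2}{\sigma_\beta^2} & \mathbf{W}'\mathbf{U}\\ \mathbf{U}'\mathbf{1} & \mathbf{U}'\mathbf{W} & \mathbf{U}'\mathbf{U}+\mathbf{A}^{22}\dfrac{\sigma_e^2}{\sigma_\varepsilon^2}\end{bmatrix}\begin{bmatrix}\hat\mu\\ \hat{\boldsymbol\beta}\\ \hat{\boldsymbol\varepsilon}\end{bmatrix} = \begin{bmatrix}\mathbf{1}'\mathbf{y}\\ \mathbf{W}'\mathbf{y}\\ \mathbf{U}'\mathbf{y}\end{bmatrix}$$
• If # genotyped animals > # of non-genotyped animals then use $\mathbf{A}^{-1}$
for calculating the conditional mean
• If # genotyped animals < # of non-genotyped animals then use $\mathbf{A}$ for
calculating the conditional mean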
Non-genotyped animals
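$$\mathbf{g} = \begin{bmatrix}\mathbf{g}_1\\ \mathbf{g}_2\end{bmatrix} = \begin{bmatrix}\mathbf{M}_1\\ \hat{\mathbf{M}}_2\end{bmatrix}\boldsymbol\beta + \mathbf{U}\boldsymbol\varepsilon$$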
Genomic “animal” model
• For selection purposes, we are interested
in animal effects rather than SNP effects
• For a 50K panel and a few thousand animals,
the system of equations is huge
• With the availability of even higher density
panels, the situation could be worse
Genomic “animal” model
• Model
$$y_i = \mu + \sum_{j=1}^{p} X_{ij}\beta_j + e_i$$
• In matrix notation
$$\mathbf{y} = \mathbf{1}\mu + \sum_{j=1}^{p}(\mathbf{X}_j\beta_j) + \mathbf{e}$$
Genomic “animal” model
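• Model
$$\mathbf{y} = \mathbf{1}\mu + \mathbf{Z}\mathbf{u} + \mathbf{e}$$
With:
$$\mathbf{u} = \sum_{j=1}^{p}(\mathbf{X}_j\beta_j),\qquad \mathbf{Z} = \mathbf{I}$$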
Genomic “animal” model
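• MME
$$\begin{bmatrix}\mathbf{1}'\mathbf{1} & \mathbf{1}'\mathbf{Z}\\ \mathbf{Z}'\mathbf{1} & \mathbf{Z}'\mathbf{Z}+\mathbf{G}^{-1}\sigma_e^2\end{bmatrix}\begin{bmatrix}\hat\mu\\ \hat{\mathbf{u}}\end{bmatrix} = \begin{bmatrix}\mathbf{1}'\mathbf{y}\\ \mathbf{Z}'\mathbf{y}\end{bmatrix}$$
With $\mathbf{Z} = \mathbf{I}$:
$$\begin{bmatrix}N & \mathbf{1}'\\ \mathbf{1} & \mathbf{I}+\mathbf{G}^{-1}\sigma_e^2\end{bmatrix}\begin{bmatrix}\hat\mu\\ \hat{\mathbf{u}}\end{bmatrix} = \begin{bmatrix}\mathbf{1}'\mathbf{y}\\ \mathbf{y}\end{bmatrix}$$
$$\mathrm{var}(\mathbf{u}) = \mathbf{G}$$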
Genomic “animal” model
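• MME
$$\begin{bmatrix}N & \mathbf{1}'\\ \mathbf{1} & \mathbf{I}+\left[\mathrm{var}\Big(\sum_{j=1}^{p}\mathbf{X}_j\beta_j\Big)\right]^{-1}\sigma_e^2\end{bmatrix}\begin{bmatrix}\hat\mu\\ \sum_{j=1}^{p}\mathbf{X}_j\hat\beta_j\end{bmatrix} = \begin{bmatrix}\mathbf{1}'\mathbf{y}\\ \mathbf{y}\end{bmatrix}$$
$$\mathrm{var}(\mathbf{u}) = \mathrm{var}\Big(\sum_{j=1}^{p}\mathbf{X}_j\beta_j\Big) = \mathbf{G}$$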
Genomic “animal” model
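• The G matrix
$$\mathrm{var}\Big(\sum_{j=1}^{p}\mathbf{X}_j\beta_j\Big) = \sum_{j=1}^{p}\mathbf{X}_j\,\mathrm{var}(\beta_j)\,\mathbf{X}_j' = \sum_{j=1}^{p}\mathbf{X}_j\mathbf{X}_j'\sigma_{\beta_j}^2$$
where
$$\sigma_{\beta_j}^2 = \mathrm{var}(\beta_j)$$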
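The link between the SNP model and the genomic “animal” model can be checked numerically: building $\mathbf{G}$ as $\sum_j \mathbf{X}_j\mathbf{X}_j'\sigma_{\beta_j}^2$ and solving the animal-model MME gives the same breeding values as summing estimated SNP effects. The sketch below uses simulated genotypes, a common $\sigma_\beta^2$ for all SNPs and hypothetical variance values.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 40                                  # animals, SNPs (hypothetical)
X = rng.integers(0, 3, size=(n, p)).astype(float)
y = rng.normal(size=n)
sigma2_b, sigma2_e = 0.05, 1.0                # assumed variance components

# G as the sum over SNPs of x_j x_j' sigma2_b, and the same thing in one product
G_sum = sum(np.outer(X[:, j], X[:, j]) * sigma2_b for j in range(p))
G = X @ X.T * sigma2_b

# SNP model: solve for mu and beta, then sum SNP effects per animal
ones = np.ones((n, 1))
T = np.hstack([ones, X])
C = T.T @ T
C[1:, 1:] += np.eye(p) * sigma2_e / sigma2_b
sol_snp = np.linalg.solve(C, T.T @ y)
u_from_snps = X @ sol_snp[1:]

# Animal model with Z = I: [[N, 1'], [1, I + G^-1 sigma2_e]]
C_g = np.block([[np.array([[float(n)]]), ones.T],
                [ones, np.eye(n) + np.linalg.inv(G) * sigma2_e]])
sol_g = np.linalg.solve(C_g, np.concatenate([[y.sum()], y]))
u_gblup = sol_g[1:]

print(np.abs(u_from_snps - u_gblup).max())    # differences at rounding level
```

The animal-model system has n + 1 equations instead of p + 1, which is the practical reason for preferring it when there are many more SNPs than genotyped animals.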
Genomic “animal” model
• Only few animals are genotyped
– Few hundreds
• Extensive phenotyping
– Several millions
• “pseudo” phenotypes
– Estimates
– Non observed
– Accuracies
Genomic “animal” model
• Can we combine phenotypes and genomic
information in a single analysis?
• Make use of both sources of information
• Remove the need for “pseudo” records
• Work well for regular genetic evaluation,
but not necessarily for other applications
• Decision making tools
• Causative mutation identification
Genomic “animal” model
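• Model
$$\mathbf{y} = \mathbf{W}\mathbf{b} + \mathbf{Z}\mathbf{u} + \mathbf{e}$$
• Further,
$$\mathbf{u}' = [\mathbf{u}_1'\ \ \mathbf{u}_2']$$
Where y is a vector of “raw” phenotypes, and u1 and u2 are the vectors of genetic
merit of non-genotyped and genotyped animals, respectively.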
Genomic “animal” model
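• If no genomic information is available
$$\mathbf{y} = \mathbf{W}\mathbf{b} + \mathbf{Z}\mathbf{u} + \mathbf{e}$$
• Further,
$$p(\mathbf{u}\mid\mathbf{A},\sigma_u^2) = p(\mathbf{u}_1,\mathbf{u}_2\mid\mathbf{A},\sigma_u^2) \sim N(\mathbf{0},\ \mathbf{A}\sigma_u^2)$$
$$\mathrm{var}(\mathbf{u}_1,\mathbf{u}_2\mid\mathbf{A},\sigma_u^2) = \begin{bmatrix}\mathbf{A}_{11} & \mathbf{A}_{12}\\ \mathbf{A}_{21} & \mathbf{A}_{22}\end{bmatrix}\sigma_u^2$$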
Genomic “animal” model
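• If animals in u2 have been genotyped
$$p(\mathbf{u}\mid\mathbf{A},\mathbf{G},\sigma_u^2) = p(\mathbf{u}_1,\mathbf{u}_2\mid\mathbf{A},\mathbf{G},\sigma_u^2) \sim N(\mathbf{0},\ \mathbf{H}\sigma_u^2)$$
$$\mathrm{var}(\mathbf{u}_1,\mathbf{u}_2\mid\mathbf{A},\mathbf{G},\sigma_u^2) = \begin{bmatrix}\mathbf{A}_{11} & \mathbf{A}_{12}\\ \mathbf{A}_{21} & \mathbf{G}\end{bmatrix}\sigma_u^2$$
- Naive approach
- Inverse of the matrix could be hard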
Genomic “animal” model
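• If animals in u2 have been genotyped
• Let
$$p(\mathbf{u}\mid\mathbf{A},\mathbf{G},\sigma_u^2) = p(\mathbf{u}_1,\mathbf{u}_2\mid\mathbf{A},\mathbf{G},\sigma_u^2) = p(\mathbf{u}_2\mid\mathbf{G},\sigma_u^2)\;p(\mathbf{u}_1\mid\mathbf{u}_2,\mathbf{A},\sigma_u^2)$$
$$p(\mathbf{u}_2\mid\mathbf{G},\sigma_u^2) \sim N(\mathbf{0},\ \mathbf{G}\sigma_u^2)$$
$$p(\mathbf{u}_1\mid\mathbf{u}_2,\mathbf{A},\sigma_u^2) \sim N\left(\mathbf{A}_{12}\mathbf{A}_{22}^{-1}\mathbf{u}_2,\ [\mathbf{A}_{11}-\mathbf{A}_{12}\mathbf{A}_{22}^{-1}\mathbf{A}_{21}]\sigma_u^2\right)$$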
Genomic “animal” model
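• Thus,
$$p(\mathbf{u}\mid\mathbf{A},\mathbf{G},\sigma_u^2) = p(\mathbf{u}_2\mid\mathbf{G},\sigma_u^2)\;p(\mathbf{u}_1\mid\mathbf{u}_2,\mathbf{A},\sigma_u^2) \sim N(\mathbf{0},\ \mathbf{H}\sigma_u^2)$$
• where
$$\mathbf{H} = \begin{bmatrix}\mathbf{H}_{11} & \mathbf{H}_{12}\\ \mathbf{H}_{21} & \mathbf{H}_{22}\end{bmatrix} = \begin{bmatrix}\mathbf{A}_{12}\mathbf{A}_{22}^{-1}\mathbf{G}\mathbf{A}_{22}^{-1}\mathbf{A}_{21}+\mathbf{A}_{11}-\mathbf{A}_{12}\mathbf{A}_{22}^{-1}\mathbf{A}_{21} & \mathbf{A}_{12}\mathbf{A}_{22}^{-1}\mathbf{G}\\ \mathbf{G}\mathbf{A}_{22}^{-1}\mathbf{A}_{21} & \mathbf{G}\end{bmatrix}$$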
Genomic “animal” model
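• Further,
$$\mathbf{H}^{-1} = \mathbf{A}^{-1} + \begin{bmatrix}\mathbf{0} & \mathbf{0}\\ \mathbf{0} & \mathbf{G}^{-1}-\mathbf{A}_{22}^{-1}\end{bmatrix}$$
• It is very important for computation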
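The expression for $\mathbf{H}^{-1}$ can be verified numerically against the explicit form of $\mathbf{H}$. The sketch below uses arbitrary positive-definite matrices standing in for $\mathbf{A}$ and $\mathbf{G}$ (not real relationship matrices), with non-genotyped animals ordered first; the function names are hypothetical.

```python
import numpy as np


def build_H(A, G, n1):
    """H from its blocks; the first n1 animals are non-genotyped."""
    A11, A12 = A[:n1, :n1], A[:n1, n1:]
    A21, A22 = A[n1:, :n1], A[n1:, n1:]
    A22i = np.linalg.inv(A22)
    H11 = A12 @ A22i @ G @ A22i @ A21 + A11 - A12 @ A22i @ A21
    H12 = A12 @ A22i @ G
    return np.block([[H11, H12], [H12.T, G]])


def H_inverse(A, G, n1):
    """H^-1 = A^-1 + [[0, 0], [0, G^-1 - A22^-1]]."""
    Hinv = np.linalg.inv(A)
    Hinv[n1:, n1:] += np.linalg.inv(G) - np.linalg.inv(A[n1:, n1:])
    return Hinv


rng = np.random.default_rng(0)
B = rng.normal(size=(5, 5))
A = B @ B.T + 5.0 * np.eye(5)          # stand-in for A (5 animals)
D = rng.normal(size=(2, 2))
G = D @ D.T + 2.0 * np.eye(2)          # stand-in for G (2 genotyped animals)
H = build_H(A, G, n1=3)
check = H @ H_inverse(A, G, n1=3)      # should be the identity matrix
```

The shortcut matters because $\mathbf{A}^{-1}$ is sparse and cheap to build from the pedigree, so only $\mathbf{G}$ and $\mathbf{A}_{22}$, which involve genotyped animals only, need to be inverted explicitly.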
Genomic “animal” model
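• Implementation
• Conditional distribution of the data
$$\mathbf{y} = \mathbf{W}\boldsymbol\beta + \mathbf{Z}\mathbf{u} + \mathbf{e}$$
• Priors
$$p(\mathbf{u}\mid\mathbf{A},\mathbf{G},\sigma_u^2) \sim N(\mathbf{0},\ \mathbf{H}\sigma_u^2)$$
$$p(\boldsymbol\beta) \sim \text{constant}$$
$$p(\sigma_u^2\mid\nu_u,s_u^2) \sim \nu_u s_u^2\chi^{-2}_{\nu_u}$$
$$p(\sigma_e^2\mid\nu_e,s_e^2) \sim \nu_e s_e^2\chi^{-2}_{\nu_e}$$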
Genomic “animal” model
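• Implementation
– Joint prior
$$p(\boldsymbol\beta,\mathbf{u},\sigma_e^2,\sigma_u^2) \propto (\sigma_u^2)^{-q/2}\exp\left(-\frac{\mathbf{u}'\mathbf{H}^{-1}\mathbf{u}}{2\sigma_u^2}\right)(\sigma_e^2)^{-(\nu_e+2)/2}\exp\left(-\frac{\nu_e s_e^2}{2\sigma_e^2}\right)(\sigma_u^2)^{-(\nu_u+2)/2}\exp\left(-\frac{\nu_u s_u^2}{2\sigma_u^2}\right)$$
where q is the number of elements in $\mathbf{u}$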
Genomic “animal” model
• Joint posterior
• Conditional distributions
Genomic “animal” model
• Conditional distributions
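• and
$\boldsymbol\theta = (\boldsymbol\beta', \mathbf{u}')'$, $\hat{\boldsymbol\theta} = (\hat{\boldsymbol\beta}', \hat{\mathbf{u}}')'$, $\mathbf{W} = [\mathbf{X}\ \ \mathbf{Z}]$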
Genomic “animal” model
$(\mathbf{W}'\mathbf{R}^{-1}\mathbf{W})^{-1}$: (co)variance matrix
Genomic “animal” model
• Conditional of
• Let
• and
Genomic “animal” model
• Conditional distribution of
• Thus,
Genomic “animal” model
• Implementation via Gibbs sampler
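1. Set $\boldsymbol\beta = \boldsymbol\beta^{(0)}$, $\mathbf{u} = \mathbf{u}^{(0)}$, $\sigma_e^2 = \sigma_e^{2(0)}$ and $\sigma_u^2 = \sigma_u^{2(0)}$
2. Sample $\boldsymbol\beta^{(i)}$ from $p(\boldsymbol\beta^{(i)}\mid\mathbf{u}^{(i-1)},\sigma_e^{2(i-1)},\sigma_u^{2(i-1)},\mathbf{y})$
Sample $\beta_1^{(i)}$ from $p(\beta_1^{(i)}\mid\boldsymbol\beta_{-1}^{(i-1)},\mathbf{u}^{(i-1)},\sigma_e^{2(i-1)},\sigma_u^{2(i-1)},\mathbf{y})$
Sample $\beta_2^{(i)}$ from $p(\beta_2^{(i)}\mid\beta_1^{(i)},\beta_3^{(i-1)},\ldots,\beta_p^{(i-1)},\mathbf{u}^{(i-1)},\sigma_e^{2(i-1)},\sigma_u^{2(i-1)},\mathbf{y})$
.
.
.
Sample $\beta_p^{(i)}$ from $p(\beta_p^{(i)}\mid\boldsymbol\beta_{-p}^{(i)},\mathbf{u}^{(i-1)},\sigma_e^{2(i-1)},\sigma_u^{2(i-1)},\mathbf{y})$
3. Sample $\mathbf{u}^{(i)}$ from $p(\mathbf{u}^{(i)}\mid\boldsymbol\beta^{(i)},\sigma_e^{2(i-1)},\sigma_u^{2(i-1)},\mathbf{y})$
Sample $u_1^{(i)}$ from $p(u_1^{(i)}\mid\boldsymbol\beta^{(i)},\mathbf{u}_{-1}^{(i-1)},\sigma_e^{2(i-1)},\sigma_u^{2(i-1)},\mathbf{y})$
Sample $u_2^{(i)}$ from $p(u_2^{(i)}\mid\boldsymbol\beta^{(i)},u_1^{(i)},u_3^{(i-1)},\ldots,u_q^{(i-1)},\sigma_e^{2(i-1)},\sigma_u^{2(i-1)},\mathbf{y})$
.
.
.
Sample $u_q^{(i)}$ from $p(u_q^{(i)}\mid\boldsymbol\beta^{(i)},\mathbf{u}_{-q}^{(i)},\sigma_e^{2(i-1)},\sigma_u^{2(i-1)},\mathbf{y})$
4. Sample $\sigma_e^{2(i)}$ from $p(\sigma_e^{2(i)}\mid\boldsymbol\beta^{(i)},\mathbf{u}^{(i)},\sigma_u^{2(i-1)},\mathbf{y})$
5. Sample $\sigma_u^{2(i)}$ from $p(\sigma_u^{2(i)}\mid\boldsymbol\beta^{(i)},\mathbf{u}^{(i)},\sigma_e^{2(i)},\mathbf{y})$
6. Set i = i + 1 and go to step 2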
Genomic matrix
• Let X be the matrix of SNP genotypes with
entries
• For each locus, let $p_i$ be the frequency of
the second allele. Further, let P be a matrix
with elements in column i equal to
Genomic matrix
• Let
• Adjusting with P sets mean allele effects to
zero
• Minor allele frequencies have to be
calculated on the base population
• Using data from other generations could
lead to biased (+/-) relationships and inbreeding
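The formulas on these slides did not survive extraction, so the following is only a sketch of one widely used construction (VanRaden's first method), which may differ in detail from the slide. It assumes genotypes coded 0/1/2 as copies of the second allele, column i of P equal to $2p_i$, $\mathbf{Z} = \mathbf{X} - \mathbf{P}$, and $\mathbf{G} = \mathbf{Z}\mathbf{Z}' / (2\sum_i p_i(1-p_i))$. The function name is hypothetical.

```python
import numpy as np


def genomic_relationship(X, freqs=None):
    """G = ZZ' / (2 * sum p_i (1 - p_i)) with Z = X - 2p (assumed construction).

    X      animals x SNPs matrix of 0/1/2 allele counts
    freqs  allele frequencies; estimated from X if not supplied
           (ideally they would come from the base population)
    """
    X = np.asarray(X, dtype=float)
    if freqs is None:
        freqs = X.mean(axis=0) / 2.0
    Z = X - 2.0 * freqs                     # centre each SNP column
    return Z @ Z.T / (2.0 * np.sum(freqs * (1.0 - freqs)))


X = np.array([[0, 1, 2],
              [2, 1, 0],
              [1, 1, 1]])
G = genomic_relationship(X)
print(G.round(3))
```

Under this construction the diagonal minus one is commonly read as genomic inbreeding, and centring by allele frequencies is what allows negative relationships between animals that share fewer alleles than expected.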
Genomic matrix
• Computing G
• The denominator scales G to A
• Inbreeding for animal i is obtained by
Genomic matrix
• Computing G
• Where
• Often used in human genetics
Genomic matrix
• Functional formula
• Where xij is coded as 0, 1, and 2
• In all cases non zero relationships even
with non related animals!
Genomic matrix
• Example
Animal   Sire   Dam
841      34     552
842      34     552
843      34     580
844      34     580
845      34     533
846      34     533
847      34     446
848      34     446
849      34     536
850      34     536
851      34     186
Genomic matrix
• Example
0.9995 0.4950 0.2588 0.2356 0.2089 0.2638 0.2707 0.2939 0.1937 0.3118 0.1834
0.4950 1.0038 0.2340 0.2233 0.2262 0.2402 0.2260 0.2919 0.1764 0.2389 0.2367
0.2588 0.2340 0.9886 0.6394 0.1527 0.2846 0.2175 0.2655 0.2178 0.2398 0.2405
0.2356 0.2233 0.6394 0.9894 0.2432 0.2661 0.1954 0.2703 0.2420 0.2006 0.1858
0.2089 0.2262 0.1527 0.2432 0.9987 0.5352 0.1843 0.2067 0.2975 0.1586 0.2206
0.2638 0.2402 0.2846 0.2661 0.5352 0.9968 0.2329 0.2845 0.2915 0.1992 0.1812
0.2707 0.2260 0.2175 0.1954 0.1843 0.2329 0.9935 0.5320 0.2446 0.2756 0.2884
0.2939 0.2919 0.2655 0.2703 0.2067 0.2845 0.5320 0.9867 0.2264 0.2632 0.1975
0.1937 0.1764 0.2178 0.2420 0.2975 0.2915 0.2446 0.2264 0.9918 0.5263 0.2634
0.3118 0.2389 0.2398 0.2006 0.1586 0.1992 0.2756 0.2632 0.5263 0.9906 0.2785
0.1834 0.2367 0.2405 0.1858 0.2206 0.1812 0.2884 0.1975 0.2634 0.2785 0.9975
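For orientation, the pedigree expectations for the same eleven animals can be computed directly. The sketch assumes the sire and all dams are unrelated and non-inbred, so full sibs are expected at 0.5 and paternal half sibs at 0.25; the function name is hypothetical.

```python
# (animal, sire, dam) taken from the example pedigree
PEDIGREE = [
    (841, 34, 552), (842, 34, 552), (843, 34, 580), (844, 34, 580),
    (845, 34, 533), (846, 34, 533), (847, 34, 446), (848, 34, 446),
    (849, 34, 536), (850, 34, 536), (851, 34, 186),
]


def expected_relationship(a, b):
    """Pedigree relationship, assuming unrelated, non-inbred parents."""
    if a[0] == b[0]:
        return 1.0
    shared_sire = 0.25 if a[1] == b[1] else 0.0
    shared_dam = 0.25 if a[2] == b[2] else 0.0
    return shared_sire + shared_dam


A = [[expected_relationship(a, b) for b in PEDIGREE] for a in PEDIGREE]
print(A[0][:4])   # animal 841 with 841, 842, 843, 844
```

In the genomic matrix above the corresponding entries are 0.4950 for the full sibs 841 and 842 and between about 0.15 and 0.31 for the half sibs, which illustrates that G is one realization around the pedigree expectation.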
Some comments on G
• For a given population, it is one realization
of A
• Even with a few thousand well-distributed
SNPs you get good estimates
– HD panels will have less burden on variance
components based methods
– More sensitive to errors in the genotype data
• Comparisons with A are somewhat
meaningless
Factors affecting GS
• Accuracy of Genomic selection
– Corr(TBV, GEBV); Corr($Y$, $\hat{Y}$)
• It depends on:
– Linkage disequilibrium between QTL and
markers
• Density of the marker panel
• Accuracy increased from 0.65 to 0.80 when r2
increased from 0.095 to 0.21.
• Bovine genome = 3B bp
– Modeling of SNP effects
• Single, joint, haplotypes of markers
Factors affecting GS
[Figure: average r² plotted against distance (kb)]
Factors affecting GS
• It depends on:
– Modeling of SNP effects
• Single, joint, haplotypes of markers
– Method of analysis
• Regression, Bayesian, variance based methods
– Size of the genotyped population
• Size of training population
• Estimation of the genomic relationship matrix
• Relationships between genotyped and non
genotyped animals
Factors affecting GS
[Figure: accuracy plotted against number of phenotypes (500, 1000, 2200) for LS, BLUP and BayesB]
Factors affecting GS
• Stratification, admixture and crossbred
populations
– Change in minor allele frequency of markers
and QTLs
– Change in LD between markers and between
markers and QTL
– Change in phase of the linkage between
markers and QTLs
• The same problem, although with less
severity, happens across generations
Factors affecting GS
• Change in minor alleles frequencies
– 10 markers
Factors affecting GS
• Change in LD between QTLs and markers
– 1 QTL and 10 markers
Factors affecting GS
• Change in LD between markers (lines L1 and L2)
Factors affecting GS
[Figure: two panels, “Difference in LD between markers” and “Difference of LD between
markers and QTL”, for lines 1 and 2]
Factors affecting GS
• Estimates of SNP effects
Factors affecting GS
• Change in phase of LD between QTL-
markers
A-----Q A------Q
a------q A------Q
a------q a-------q
a------q A------Q
L1
A-----q A------q
a------Q A------q
a------Q a-------Q
a------Q A------q
L2
Factors affecting GS
• Dealing with admixture
– Pooled data
• Mixed results: 1) Genetic similarity between
mixture components; 2) loss in quality of fit vs.
number of parameters in the model
– In general it tends to lead to intermediate
results
– No explicit solution exists yet
• It was thought that with HD panels the
problem could be solved. This is not the case
Factors affecting GS
• Non-additive effects
– Dominance
– Epistasis
• Interactions between alleles at different loci
• Joint effect is greater than the sum of marginal
effects
• Very costly to model
[Figure: genotypic values for genotypes aa, Aa, AA]
Factors affecting GS
• Epistasis models
$$y_i = \mu + \sum_{j=1}^{p} X_{ij}\beta_j + \sum_{j=1}^{p}\sum_{k>j}^{p} X_{ij}X_{ik}\gamma_{jk} + e_i$$
• $\gamma_{jk}$ is the epistatic effect between SNPs j
and k
• Although possible to fit, the model is not
well defined
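The cost of the epistasis model is easy to quantify: with p SNPs there are p(p-1)/2 pairwise interaction terms. A short sketch (the function name is hypothetical; the panel sizes are the ones quoted earlier for cattle):

```python
from math import comb


def n_pairwise_interactions(p):
    """Number of SNP pairs (j, k) with k > j."""
    return comb(p, 2)


for panel in (50_000, 500_000):
    print(panel, n_pairwise_interactions(panel))
# 50000 1249975000
# 500000 124999750000
```

Even for a 50K panel this is more than a billion extra effects to estimate from a few thousand records, which is one reason the model is very costly and poorly defined in practice.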
Factors affecting GS
• Keep in mind that:
– Only additive effects are inherited
– EBVs should contain only additive effects
• However, non-additive effects
– Can be used to improve genetic potential of
commercial animals
– If carefully mapped, they could be used in the
genetic improvement program
Training population and re-estimation of segment effects
• The usefulness of the estimated SNP
effects decays with the number of
generations
– The higher the LD between markers and
QTLs, the longer they can be used
– In the extreme, if the SNPs were the QTLs then
there is no need for re-estimation
– Recombination between markers- QTLs will
reduce the accuracy of the GEBV in
subsequent generations
re-estimation of segment effects
• Decay in accuracy
re-estimation of segment effects
• Fit a polygenic effect in the model
• Re-estimate SNP effects when accuracy
falls below a certain threshold
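$$y_i = \mu + \sum_{j=1}^{p} X_{ij}\beta_j + u_i + e_i$$
$$\mathbf{u} \sim N(\mathbf{0},\ \mathbf{A}\sigma_u^2)$$
$$GEBV_i = \hat{u}_i + \sum_{j=1}^{p} X_{ij}\hat\beta_j$$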
It could get worse!
• If imputed genotypes are used
– Less than perfect accuracy
– The decay could be faster
– It depends on the number of animals being
genotyped in subsequent generations
• No one has looked at this yet!