TRANSCRIPT
Use of kinship matrices blending pedigree and genomic-based relatedness for prediction of parental
line breeding values in sorghum
Julio Velazco, Marcos Malosetti, Colleen Hunt, Emma Mace,
David Jordan & Fred van Eeuwijk
DO NOT COPY
Breeding program for female parents
[Diagram: A/B-lines (pedigree + SNP genotypes) are crossed to a tester; the test-cross is phenotyped to obtain the A/B-line BV]
After Acquaah (2012)
Phenotypic data
Advanced Yield Testing:
• 646 female lines × 5 male testers → 2645 testcross hybrids
• 26 trials across the major sorghum cropping region in Australia
• 15 locations, between 2008 and 2014
• Traits: grain yield (and plant height)
Pedigree and marker data
• Pedigree: 646 female lines + 499 ancestors (tracing back 28 generations)
  Full-sib families with complex pedigree
• Genotyped: 646 female lines (bottom layer)
  4.8K SNPs after filtering
[Figure: pedigree graph; the genotyped lines form the bottom layer]
Pedigree (A) and genomic (G) relationship matrices
• A estimates expected average relationships (IBD), assuming infinite loci
• G gives realized relationships (IBS), considering finite size of the genome
• G (based on many markers) is similar to A, but more precise: SNPs follow Mendel’s rules, tracing alleles (“fine relationships”)
  Genotype of descendant = half of each parent + Mendelian sampling
  e.g. some full-sibs are more similar than others (rA ≠ 0.5)
• A (based on deep pedigree) can account for potential LD not traced by G:
  at population level
  at family level
  across chromosomes
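A minimal sketch of how a realized relationship matrix can be computed from SNP genotypes. The slides do not say which estimator was used for G; this example assumes VanRaden's first method with 0/1/2 allele counts, and the function name `vanraden_g` and the toy genotypes are hypothetical.

```python
import numpy as np

def vanraden_g(M):
    """Genomic relationship matrix from an (n_lines x n_snps) 0/1/2 genotype matrix.

    Assumption (not stated in the slides): VanRaden's first method,
    G = W W' / (2 * sum_j p_j (1 - p_j)), where W is M with each SNP column
    centered by twice its allele frequency p_j.
    """
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0              # allele frequency of each SNP
    W = M - 2.0 * p                       # center each SNP column
    scale = 2.0 * np.sum(p * (1.0 - p))   # expected variance summed over SNPs
    return W @ W.T / scale

# Toy genotypes: 4 inbred lines x 6 SNPs (made-up values, for illustration only)
M = np.array([
    [0, 2, 2, 0, 2, 0],
    [0, 2, 0, 0, 2, 2],
    [2, 0, 2, 2, 0, 0],
    [2, 0, 0, 2, 0, 2],
])
G = vanraden_g(M)
```

Unlike A, the off-diagonal values in such a matrix vary between pairs with the same pedigree relationship, which is the "fine relationships" point made above.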
Aim
Explore prediction models that combine genomic and pedigree-based relationships for genetic evaluation of parental lines
1. when all the reference lines are genotyped
2. when only some of the reference lines are genotyped (preliminary)
1. All lines are genotyped
[Diagram: Ancestors, Lines and Test-crosses; relationship matrices A and G; phenotypes (P)]
Phenotypic analysis
Stage 1: Spatial analysis to obtain spatially adjusted testcross means from each trial
Stage 2: MET analysis to obtain adjusted line means across testers and trials (yL)
Prediction models
A-BLUP: $y_L = 1\mu + Za + e$
• $a$: vector of familial additive effects, $a \sim N(0, A\sigma_a^2)$
G-BLUP: $y_L = 1\mu + Zg + e$
• $g$: vector of additive genomic effects, $g \sim N(0, G\sigma_g^2)$
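As a rough illustration of how predictions follow from either model, the sketch below computes BLUPs for all lines from the adjusted means of a training subset. It assumes one adjusted mean per line, variance components that are already known (in practice they would be estimated, for example by REML), and hypothetical names (`blup_predict`, `sigma2_u`, `sigma2_e`). Passing A as the relationship matrix gives A-BLUP; passing G gives G-BLUP.

```python
import numpy as np

def blup_predict(K, y_train, train, sigma2_u, sigma2_e):
    """BLUP of genetic effects for all lines in K, given phenotypes of the training lines.

    K         : (n x n) relationship matrix (A or G), all lines
    y_train   : adjusted means of the training lines, in the order of `train`
    train     : indices of the training lines within K
    sigma2_u  : genetic variance (assumed known here)
    sigma2_e  : residual variance (assumed known here)
    """
    K = np.asarray(K, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    train = np.asarray(train)

    # Phenotypic covariance of the training records: V = K_tt * s2u + I * s2e
    V = sigma2_u * K[np.ix_(train, train)] + sigma2_e * np.eye(len(train))
    ones = np.ones(len(train))

    # Generalized least squares estimate of the overall mean
    mu = (ones @ np.linalg.solve(V, y_train)) / (ones @ np.linalg.solve(V, ones))

    # BLUP for every line, including lines without phenotypes
    u_hat = sigma2_u * K[:, train] @ np.linalg.solve(V, y_train - mu)
    return mu, u_hat

# Toy example: line 1 has no phenotype but is related to line 0
K = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
mu, u_hat = blup_predict(K, y_train=[2.0, 0.0], train=[0, 2], sigma2_u=1.0, sigma2_e=1.0)
```

In the toy example the unphenotyped line borrows information only through its relationship with line 0, so its predicted value is half that of line 0.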
Prediction models
G-BLUP plus a polygenic effect:
$y_L = 1\mu + Zk + e$
• $k$: vector of total additive genetic effects, $k \sim N(0, K\sigma_k^2)$
• $K = wA + (1-w)G$
• $w$: proportion of additive genetic variance not explained by the SNPs
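The blending step itself is a weighted sum of two matrices. The sketch below shows it under the assumption that A and G are restricted to the same lines in the same order; the function name `blend_kinship` and the toy matrices are hypothetical.

```python
import numpy as np

def blend_kinship(A, G, w):
    """Return K = w*A + (1 - w)*G for a weight w between 0 and 1.

    Assumes A and G have identical shape and the same row/column order of lines.
    """
    A = np.asarray(A, dtype=float)
    G = np.asarray(G, dtype=float)
    if A.shape != G.shape:
        raise ValueError("A and G must cover the same lines in the same order")
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie between 0 and 1")
    return w * A + (1.0 - w) * G

# Toy matrices for two full-sibs (made-up values)
A = np.array([[1.0, 0.5], [0.5, 1.0]])   # expected relationship
G = np.array([[1.1, 0.3], [0.3, 0.9]])   # realized relationship
K = blend_kinship(A, G, w=0.5)
```

With w = 0 the result is G (plain G-BLUP) and with w = 1 it is A (plain A-BLUP).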
Two different approaches to set w:
• Use the w that maximizes the (RE)likelihood based on phenotyped lines; equivalent to fitting G+A: $y_L = 1\mu + Zg + Za + e$
• Use the w that optimizes predictions of unphenotyped lines: K-BLUP
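A short worked step on why the likelihood-based weight corresponds to the G+A model, assuming g and a are independent with the variances defined for G-BLUP and A-BLUP:

```latex
\operatorname{Var}(g + a) = G\sigma_g^2 + A\sigma_a^2
  = \sigma_k^2 \left[ (1-w)\,G + w\,A \right] = K\sigma_k^2,
\qquad
\sigma_k^2 = \sigma_g^2 + \sigma_a^2,
\qquad
w = \frac{\sigma_a^2}{\sigma_g^2 + \sigma_a^2}.
```

Under this reading, estimating the two variance components in the G+A model fixes w implicitly, whereas K-BLUP treats w as a tuning value chosen for predictive performance.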
Validation of prediction models
Within-family prediction (W-fam)
• Validation set: 20% of lines from each full-sib family; the remaining lines form the training set
• High genetic relatedness between tested and predicted lines
Among-family prediction (A-fam)
• Validation set: 20% of complete full-sib families; the remaining families form the training set
• Low genetic relatedness between tested and predicted lines
Results averaged over 100 realizations (5 folds × 20 reps)
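The two validation schemes differ only in how lines are assigned to folds. The following is a sketch of one way to build such folds, not the procedure actually used in the study; the function names and the toy family structure are hypothetical.

```python
import numpy as np

def within_family_folds(family, n_folds=5, seed=0):
    """W-fam: spread the lines of every family over the folds."""
    rng = np.random.default_rng(seed)
    family = np.asarray(family)
    fold = np.empty(len(family), dtype=int)
    for fam in np.unique(family):
        idx = rng.permutation(np.flatnonzero(family == fam))
        fold[idx] = np.arange(len(idx)) % n_folds   # deal lines out to folds in turn
    return fold

def among_family_folds(family, n_folds=5, seed=0):
    """A-fam: assign whole families to folds, so no family is split."""
    rng = np.random.default_rng(seed)
    family = np.asarray(family)
    fams = rng.permutation(np.unique(family))
    fam_fold = {fam: i % n_folds for i, fam in enumerate(fams)}
    return np.array([fam_fold[f] for f in family])

# Toy population: 10 hypothetical full-sib families of 5 lines each
family = np.repeat(np.arange(10), 5)
w_fold = within_family_folds(family)
a_fold = among_family_folds(family)

# For fold k, lines with fold == k form the validation set, the rest the training set
validation_mask = w_fold == 0
```

Repeating either function with 20 different seeds would give the 5 folds × 20 reps layout mentioned above.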
Evaluation of prediction models
Prediction quality criteria:
• Predictive ability: r(yL(VS), EBV)
• Bias (inflation): coefficient of regression (b); b < 1 indicates inflation of predicted genetic variance
• Accuracy: mean squared error of prediction (MSEP)
The optimal predictor maximizes r, is unbiased and not inflated (b = 1), and minimizes MSEP
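A minimal sketch of the three criteria for one validation set. Two points are assumptions rather than statements from the slides: b is taken as the slope of the regression of observed values on predicted values, and the predictions passed in include the estimated mean so that MSEP is on the scale of the observations. The function name `prediction_quality` is hypothetical.

```python
import numpy as np

def prediction_quality(y_obs, y_pred):
    """Return (r, b, msep) for observed and predicted values of the validation lines.

    r    : Pearson correlation between observed and predicted values
    b    : slope of the regression of observed on predicted (assumed definition)
    msep : mean squared difference between observed and predicted values
    """
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    r = np.corrcoef(y_obs, y_pred)[0, 1]
    b = np.cov(y_obs, y_pred)[0, 1] / np.var(y_pred, ddof=1)
    msep = np.mean((y_obs - y_pred) ** 2)
    return r, b, msep

# Toy case: predictions rank the lines perfectly but are twice as spread out
r, b, msep = prediction_quality([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

The toy case shows why r alone is not enough: the ranking is perfect, yet the slope below 1 flags over-dispersed predictions and the MSEP is large.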
Effect of different weights w on predictive ability using K-BLUP
K = wA + (1-w)G
[Figure: predictive ability (r) plotted against weight (w) from 0 (G-BLUP) to 0.8, for W-fam and A-fam prediction; one panel for GY and one for PH]
G vs K: 6.6% and 8.1% (GY); 1.2% and 1.9% (PH)
Effect of different weights w on bias (inflation)
K = wA + (1-w)G
[Figure: regression coefficient (b) plotted against weight (w) from 0 (G-BLUP) to 0.8, for W-fam and A-fam prediction; one panel for PH and one for GY]
Effect of different weights w on accuracy
K = wA + (1-w)G
[Figure: MSE plotted against weight (w) from 0 (G-BLUP) to 0.8, for W-fam and A-fam prediction; one panel for PH and one for GY]
Prediction quality of models

GY:
                          W-fam                           A-fam
Quality criterion         A      G      G+A    K(0.6)     A      G      G+A    K(0.5)
Predictive ability (r)    0.299  0.323  0.339  0.345      0.184  0.230  0.243  0.249
Gain in r (%)             −      8.1    13.4   15.3       −      25.0   31.9   35.1
Inflation (b)             0.963  0.924  0.924  0.948      0.858  0.788  0.828  0.853
MSEP                      0.217  0.214  0.211  0.210      0.231  0.227  0.225  0.224
• G more important for A-fam predictions
• K optimized predictions

PH:
                          W-fam                           A-fam
Quality criterion         A      G      G+A    K(0.8)     A      G      G+A    K(0.8)
Predictive ability (r)    0.420  0.574  0.579  0.581      0.235  0.468  0.469  0.477
Gain in r (%)             −      36.6   37.8   38.3       −      99.1   99.4   102.9
Inflation (b)             0.997  0.935  0.944  0.949      0.791  0.865  0.884  0.902
MSEP                      30.4   24.8   24.6   24.4       35.0   29.0   28.9   28.6
• Higher impact of G
• Better optimization for A-fam predictions
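The "Gain in r (%)" rows read as relative improvements over A-BLUP. The slides do not state the exact formula, so the sketch below is an assumption: gain = 100 × (r_model − r_A) / r_A, applied to the rounded GY predictive abilities for W-fam. Recomputing from rounded values agrees with the reported gains only to within a few tenths of a percentage point, which suggests the reported figures were derived from unrounded or per-fold results.

```python
def gain_percent(r_model, r_baseline):
    """Relative gain in predictive ability over a baseline model, in percent."""
    return 100.0 * (r_model - r_baseline) / r_baseline

# Rounded GY predictive abilities for W-fam, copied from the table
r_wfam = {"A": 0.299, "G": 0.323, "G+A": 0.339, "K(0.6)": 0.345}

# Gains relative to A-BLUP (assumed baseline)
gains = {
    model: gain_percent(r, r_wfam["A"])
    for model, r in r_wfam.items()
    if model != "A"
}
```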
2. When only part of the reference population is genotyped
• Genotyping the whole reference population may not be economically feasible or reasonable
• In practice, GP models are limited to using only the phenotypic data of the genotyped lines
• On the other hand, pedigrees typically connect most (if not all) of the phenotyped lines in the breeding population
It would be better if we had more accurate relationships for all the genotypes...
Combined pedigree-genomic matrix H
Legarra et al., 2009; Aguilar et al., 2010; Christensen & Lund, 2010
[Diagram: Matrix A (entire pedigree), with blocks for the lines (genotyped) and the ancestors (non-genotyped)]
[Diagram: Matrix H (updated pedigree), with the block for the lines replaced by G; annotation: "Now more related!"]
• H as a Bayesian updating of A
• G is projected on the rest of the pedigree
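A sketch of how H can be assembled from A and G, following the block form given in Legarra et al. (2009). The function name `build_h`, the index handling and the toy pedigree are assumptions for illustration; G is taken to be ordered like the genotyped individuals listed in `geno`.

```python
import numpy as np

def build_h(A, G, geno):
    """Combined pedigree-genomic matrix H (block form of Legarra et al., 2009).

    A    : (n x n) pedigree relationship matrix for all individuals
    G    : genomic relationship matrix for the genotyped individuals
    geno : indices of the genotyped individuals within A (same order as G)
    """
    A = np.asarray(A, dtype=float)
    G = np.asarray(G, dtype=float)
    geno = np.asarray(geno)
    non = np.setdiff1d(np.arange(A.shape[0]), geno)   # non-genotyped individuals

    A11 = A[np.ix_(non, non)]
    A12 = A[np.ix_(non, geno)]
    A22 = A[np.ix_(geno, geno)]

    # B = A12 * inv(A22): projects genotyped relationships onto the non-genotyped
    B = np.linalg.solve(A22, A12.T).T

    H = np.empty_like(A)
    H[np.ix_(non, non)] = A11 + B @ (G - A22) @ B.T
    H[np.ix_(non, geno)] = B @ G
    H[np.ix_(geno, non)] = G @ B.T
    H[np.ix_(geno, geno)] = G
    return H

# Toy pedigree: two unrelated parents (0, 1) and two genotyped full-sibs (2, 3)
A = np.array([[1.0, 0.0, 0.5, 0.5],
              [0.0, 1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 1.0]])
geno = [2, 3]
G = np.array([[1.0, 0.7],
              [0.7, 1.0]])   # made-up realized relationships
H = build_h(A, G, geno)
```

When G equals the pedigree block for the genotyped individuals, H reduces to A, which is a convenient sanity check. In the toy example the sibs are more alike in G than the pedigree expects, and this carries over to their non-genotyped parents.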
Comparing predictions from A-, G- and H-BLUP
[Diagram: A-BLUP and H-BLUP use the phenotypes (y) of all lines; G-BLUP uses the phenotypes of the genotyped lines only]
• H vs A: gain due to genomic information
• H vs G: gain due to more phenotypic information (extra observations)
Genotyping scenarios
Combinations of:
3 proportions of genotyped lines: 80%, 60%, 40%
2 strategies for genotyping:
• Within-family (W-fam): high genetic relatedness between genotyped and non-genotyped lines
• Across-family (A-fam): low genetic relatedness between genotyped and non-genotyped lines
K-fold cross-validation using random sampling and constant size of VS across scenarios and models
Prediction quality of A-BLUP vs H-BLUP models
Gain due to genomic information:
                          A             W-fam                  A-fam
Quality criterion         (benchmark)   H80    H60    H40      H80    H60    H40
Predictive ability (r)    0.288         0.327  0.326  0.323    0.322  0.321  0.318
Gain in r (%)             −             11.9   11.7   10.8     10.6   10.3   9.4
Inflation (b)             0.954         0.998  1.002  0.992    1.006  0.988  1.004
MSEP                      0.218         0.215  0.215  0.216    0.215  0.216  0.216
• Higher quality of prediction in all genotyping strategies
• Slightly better when some lines of each family are genotyped
Prediction quality of G-BLUP vs H-BLUP models
Gain due to more phenotypic information:

W-fam:
                          80% genotyped    60% genotyped    40% genotyped
Quality criterion         G      H80       G      H60       G      H40
Predictive ability (r)    0.324  0.325     0.306  0.323     0.244  0.309
Gain in r (%)             −      0.3       −      5.6       −      26.6
Inflation (b)             0.940  0.990     0.926  0.978     0.829  0.938
MSEP                      0.216  0.214     0.217  0.216     0.221  0.207

A-fam:
                          80% genotyped    60% genotyped    40% genotyped
Quality criterion         G      H80       G      H60       G      H40
Predictive ability (r)    0.316  0.317     0.308  0.320     0.260  0.302
Gain in r (%)             −      0.3       −      3.9       −      16.2
Inflation (b)             0.961  0.981     0.938  1.003     0.906  0.944
MSEP                      0.218  0.216     0.218  0.214     0.225  0.216
• H uses 150% more information than G for prediction
• Increases r, reduces bias and minimizes MSEP
Summary
When all selection candidates are genotyped:
• Combining genomic and pedigree information can enhance prediction quality in sorghum
• Blending information with K-BLUP optimized predictive ability and reduction of bias
• The optimal weight and its impact on prediction quality differed between traits
When not all selection candidates are genotyped:
• Combining A and G through H-BLUP was always beneficial
A can make more phenotypic data available
G provides more precise genetic relationships
H combines both benefits to improve predictions
Acknowledgments
Fred van Eeuwijk (WUR)
Marcos Malosetti (WUR)
David Jordan (QAAFI-UQ)
Colleen Hunt (DAF, QAAFI-UQ)
Emma Mace (DAF, QAAFI-UQ)
Phenotyping/genotyping funding: PhD funding:
Construction of matrix H
Legarra et al. 2009; Aguilar et al., 2010; Christensen & Lund, 2010
genotyped
non genotyped
Consider:
Simple inverse:
Correcting the pedigree
Joint distribution:
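The equations on this slide did not survive extraction. For orientation only, the forms below are the standard ones from the cited papers, with subscript 1 for non-genotyped and subscript 2 for genotyped individuals; the notation is an assumption and may differ from what the slide showed.

```latex
A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix},
\qquad
H = \begin{bmatrix}
  A_{11} + A_{12} A_{22}^{-1} (G - A_{22}) A_{22}^{-1} A_{21} & A_{12} A_{22}^{-1} G \\
  G A_{22}^{-1} A_{21} & G
\end{bmatrix},
\qquad
H^{-1} = A^{-1} + \begin{bmatrix} 0 & 0 \\ 0 & G^{-1} - A_{22}^{-1} \end{bmatrix}.
```

The last expression is the "simple inverse": only the block for the genotyped individuals changes relative to the inverse of A.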