
Use of kinship matrices blending pedigree and genomic-based relatedness for prediction of parental line breeding values in sorghum

Julio Velazco, Marcos Malosetti, Colleen Hunt, Emma Mace, David Jordan & Fred van Eeuwijk


Breeding program for female parents

[Diagram, after Acquaah (2012): pedigree + SNP genotypes; tester; test-cross; phenotype; A/B-line BV]

Phenotypic data

Advanced Yield Testing:

• 646 female lines × 5 male testers, giving 2645 testcross hybrids

• 26 trials across the major sorghum cropping region in Australia

• 15 locations, between 2008 and 2014

• Traits: grain yield (and plant height)


genotyped lines

Pedigree and marker data

• Pedigree: 646 female lines + 499 ancestors (tracing back 28 generations)

Full-sib families with complex pedigree

• Genotyped: 646 female lines (bottom layer)

4.8 K SNPs after filtering


Pedigree (A) and genomic (G) relationship matrices

• A estimates expected average relationships (IBD), assuming infinite loci

• G gives realized relationships (IBS), considering finite size of the genome

• G (based on many markers) is similar to A, but more precise: SNPs follow Mendel’s rules, tracing alleles (“fine relationships”)
  Genotype of descendant = half of each parent + Mendelian sampling
  e.g. some full-sibs are more similar than others (rA ≠ 0.5)

• A (based on deep pedigree) can account for potential LD not traced by G:
  at population level
  at family level
  across chromosomes
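As an illustration of how a realized relationship matrix G can be computed from SNP data, here is a minimal sketch. The slides do not say which estimator was used; the VanRaden-style scaling below is an assumption, and the function name `vanraden_g` and the toy marker matrix are hypothetical.

```python
import numpy as np

def vanraden_g(markers: np.ndarray) -> np.ndarray:
    """Genomic relationship matrix from an (n_lines x n_snps) matrix coded 0/1/2.

    Assumed estimator (not stated in the slides): G = W W' / (2 * sum p(1-p)),
    where W is the marker matrix centred by twice the allele frequency.
    """
    p = markers.mean(axis=0) / 2.0           # allele frequency of each SNP
    centered = markers - 2.0 * p             # centre every SNP column
    scale = 2.0 * np.sum(p * (1.0 - p))      # scaling by expected SNP variance
    return centered @ centered.T / scale

# Hypothetical toy data: 6 lines genotyped at 200 SNPs
rng = np.random.default_rng(1)
M = rng.integers(0, 3, size=(6, 200)).astype(float)
G = vanraden_g(M)
print(np.round(G, 2))
```

Unlike the pedigree-based A, the off-diagonal elements of G differ between pairs of lines with the same pedigree relationship, which is the Mendelian sampling the slide refers to.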

Aim

Explore prediction models that combine genomic and pedigree-based relationships for genetic evaluation of parental lines

1. when all the reference lines are genotyped

2. when only some of the reference lines are genotyped (preliminary)

1. All lines are genotyped

P

Ancestors

Lines

Test-crosses

A

G


Phenotypic analysis

Stage 1: Spatial analysis to obtain spatially-adjusted testcross means from each trial

Stage 2: Multi-environment trial (MET) analysis to obtain adjusted line means across testers and trials (yL)

Prediction models

A-BLUP: yL = 1μ + Za + e

• a: vector of familial additive effects, a ~ N(0, Aσa²)

G-BLUP: yL = 1μ + Zg + e

• g: vector of additive genomic effects, g ~ N(0, Gσg²)
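Both models share the form yL = 1μ + Zu + e and differ only in the relationship matrix. The sketch below solves that model for a given relationship matrix, assuming the variance components are already known (in practice they would be estimated, e.g. by REML). The function name `blup` and the toy data are hypothetical.

```python
import numpy as np

def blup(y, Z, K, var_u, var_e):
    """BLUP for y = 1*mu + Z u + e, with u ~ N(0, K*var_u) and e ~ N(0, I*var_e).

    Returns the GLS estimate of mu and predictions for every line in K,
    including lines without a phenotype (zero columns in Z).
    """
    n = len(y)
    ones = np.ones(n)
    V = var_u * Z @ K @ Z.T + var_e * np.eye(n)          # covariance of y
    mu_hat = (ones @ np.linalg.solve(V, y)) / (ones @ np.linalg.solve(V, ones))
    u_hat = var_u * K @ Z.T @ np.linalg.solve(V, y - mu_hat * ones)
    return mu_hat, u_hat

# Hypothetical toy example: 4 related lines, the last one has no phenotype
K = np.array([[1.0, 0.5, 0.5, 0.5],
              [0.5, 1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 1.0]])
Z = np.eye(4)[:3]                 # rows = phenotyped lines, columns = all lines
y = np.array([4.0, 5.0, 6.0])
mu_hat, u_hat = blup(y, Z, K, var_u=1.0, var_e=1.0)
print(mu_hat, np.round(u_hat, 3))
```

Passing A as `K` gives A-BLUP and passing G gives G-BLUP; the unphenotyped line gets its prediction only through its relationships with the phenotyped lines.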


Prediction models

G-BLUP plus a polygenic effect:

yL = 1μ + Zk + e

• k: vector of total additive genetic effects, k ~ N(0, Kσk²)

• K = wA + (1−w)G

• w: proportion of additive genetic variance not explained by the SNPs

Two different approaches to set w:

1. Use the w that maximizes the (RE)likelihood based on phenotyped lines; equivalent to fitting G+A: yL = 1μ + Zg + Za + e

2. Use the w that optimizes predictions of unphenotyped lines: K-BLUP
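The equivalence between the G+A model and a blended matrix K can be checked numerically: the covariance of g + a is Gσg² + Aσa², which equals (σa² + σg²)K when w = σa² / (σa² + σg²). A minimal sketch, with hypothetical function names and made-up matrices and variance components:

```python
import numpy as np

def blend(A, G, w):
    """K = w*A + (1 - w)*G, with w the weight given to the pedigree matrix."""
    return w * A + (1.0 - w) * G

def implied_weight(var_a, var_g):
    """Weight w implied by the variance components of the G+A model."""
    return var_a / (var_a + var_g)

# Hypothetical 2 x 2 matrices and variance components, for illustration only
A = np.array([[1.0, 0.5], [0.5, 1.0]])
G = np.array([[1.1, 0.3], [0.3, 0.9]])
var_a, var_g = 0.4, 0.6

w = implied_weight(var_a, var_g)
cov_g_plus_a = var_g * G + var_a * A            # covariance of g + a
cov_k = (var_a + var_g) * blend(A, G, w)        # covariance of k
print(w, np.allclose(cov_g_plus_a, cov_k))
```

For K-BLUP, w is instead treated as a tuning parameter: `blend` would be evaluated over a grid of weights and the one giving the best cross-validated predictions retained.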


Validation of prediction models

• Within-family prediction (W-fam): the validation set is 20% of lines from each full-sib family; high genetic relatedness between tested and predicted lines

• Among-family prediction (A-fam): the validation set is 20% of complete full-sib families; low genetic relatedness between tested and predicted lines

Results averaged over 100 realizations (5 folds x 20 reps)
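The two validation schemes differ only in how lines are assigned to folds. A minimal sketch of one possible fold assignment, assuming a vector of family labels per line; the function names and the balanced toy data are hypothetical, not the authors' code:

```python
import numpy as np

def within_family_folds(family, k=5, seed=0):
    """W-fam: every fold holds out about 1/k of the lines of each family."""
    rng = np.random.default_rng(seed)
    family = np.asarray(family)
    fold = np.empty(len(family), dtype=int)
    for fam in np.unique(family):
        idx = rng.permutation(np.flatnonzero(family == fam))
        fold[idx] = np.arange(len(idx)) % k      # spread the family over folds
    return fold

def among_family_folds(family, k=5, seed=0):
    """A-fam: every fold holds out about 1/k of the complete families."""
    rng = np.random.default_rng(seed)
    family = np.asarray(family)
    shuffled = rng.permutation(np.unique(family))
    fam_fold = {fam: i % k for i, fam in enumerate(shuffled)}
    return np.array([fam_fold[f] for f in family])

# Hypothetical toy population: 10 full-sib families of 10 lines each
family = np.repeat(np.arange(10), 10)
w_fold = within_family_folds(family)
a_fold = among_family_folds(family)
print(w_fold[:10], a_fold[:10])
```

Repeating either function with 20 different seeds and 5 folds would give the 100 realizations mentioned on the slide.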

Evaluation of prediction models

Prediction quality criteria:

• Predictive ability: r(yL(VS), EBV)

• Bias (inflation): coefficient of regression (b); b < 1 indicates inflation of predicted genetic variance

• Accuracy: mean squared error of prediction (MSEP)

The optimal predictor maximizes r, is unbiased and not inflated (b = 1), and minimizes MSEP
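The three criteria can be computed from the observed values in the validation set and their predictions. This sketch assumes that b is the slope of the regression of observed on predicted values and that predictions are on the same scale as the observations; the function name `prediction_quality` and the toy numbers are hypothetical.

```python
import numpy as np

def prediction_quality(y_obs, pred):
    """Return (r, b, MSEP) for observed values and predictions of a validation set."""
    y_obs = np.asarray(y_obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    r = np.corrcoef(y_obs, pred)[0, 1]                     # predictive ability
    b = np.cov(y_obs, pred)[0, 1] / np.var(pred, ddof=1)   # slope of y on pred
    msep = np.mean((y_obs - pred) ** 2)                    # mean squared error
    return r, b, msep

# Toy case: predictions have the right ranking but twice the spread (inflated)
y_obs = np.array([1.0, 2.0, 3.0, 4.0])
pred = 2.0 * y_obs
r, b, msep = prediction_quality(y_obs, pred)
print(r, b, msep)
```

In the toy case the ranking is perfect, so r does not reveal the problem; the slope b below 1 and the large MSEP do, which is why the slide uses all three criteria.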

Effect of different weights w on predictive ability using K-BLUP

K = wA + (1−w)G

[Figure: predictive ability (r) plotted against weight w (0 to 0.8; w = 0 is G-BLUP) for W-fam and A-fam predictions, with one panel for grain yield (GY) and one for plant height (PH). Annotated gains, G vs K: 6.6% and 8.1% (GY); 1.2% and 1.9% (PH)]

Effect of different weights w on bias (inflation)

K = wA + (1−w)G

[Figure: regression coefficient (b) plotted against weight w (0 to 0.8; w = 0 is G-BLUP) for W-fam and A-fam predictions, with one panel for PH and one for GY]

Effect of different weights w on accuracy

K = wA + (1−w)G

[Figure: MSE plotted against weight w (0 to 0.8; w = 0 is G-BLUP) for W-fam and A-fam predictions, with one panel for PH and one for GY]

Prediction quality of models

GY:

Quality criterion        W-fam                            A-fam
                         A      G      G+A    K(0.6)      A      G      G+A    K(0.5)
Predictive ability (r)   0.299  0.323  0.339  0.345       0.184  0.230  0.243  0.249
Gain in r (%)            −      8.1    13.4   15.3        −      25.0   31.9   35.1
Inflation (b)            0.963  0.924  0.924  0.948       0.858  0.788  0.828  0.853
MSEP                     0.217  0.214  0.211  0.210       0.231  0.227  0.225  0.224

• G more important for A-fam predictions

• K optimized predictions

PH:

[Table with the same layout, with columns A, G, G+A, K(0.8) for both W-fam and A-fam]

• Higher impact of G

• Better optimization for A-fam predictions

2. When only part of the reference population is genotyped

• Genotyping the whole reference population may not be economically feasible or reasonable

• In practice, genomic prediction (GP) models can only use phenotypic data from the genotyped lines

• On the other hand, pedigrees typically connect most (if not all) of the phenotyped lines in the breeding population

It would be better if we had more accurate relationships for all the genotypes...

Combined pedigree-genomic matrix H

Matrix A: entire pedigree

lines (genotyped)

Ancestors (non-genotyped)

Legarra et al., 2009; Aguilar et al., 2010; Christensen & Lund, 2010

Matrix H: updated pedigree

lines (replaced by G)

Now more related!

• H as a Bayesian updating of A

• G is projected on the rest of the pedigree
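The projection of G onto the rest of the pedigree can be sketched as follows, using the block form of H given by Legarra et al. (2009), one of the references cited on the previous slide. The function name `h_matrix` and the small pedigree are hypothetical, and refinements used in practice (such as scaling G to be compatible with A) are left out.

```python
import numpy as np

def h_matrix(A, G, genotyped):
    """Combined relationship matrix H, keeping the line order of A.

    With 1 = non-genotyped and 2 = genotyped lines:
      H22 = G
      H12 = A12 A22^-1 G
      H11 = A11 + A12 A22^-1 (G - A22) A22^-1 A21
    """
    A = np.asarray(A, dtype=float)
    g = np.asarray(genotyped)
    n = np.setdiff1d(np.arange(A.shape[0]), g)    # non-genotyped lines
    A22 = A[np.ix_(g, g)]
    P = A[np.ix_(n, g)] @ np.linalg.inv(A22)      # projects G onto other lines
    H = np.empty_like(A)
    H[np.ix_(g, g)] = G
    H[np.ix_(n, g)] = P @ G
    H[np.ix_(g, n)] = (P @ G).T
    H[np.ix_(n, n)] = A[np.ix_(n, n)] + P @ (G - A22) @ P.T
    return H

# Hypothetical pedigree: two unrelated parents (0, 1) and two full-sibs (2, 3)
A = np.array([[1.0, 0.0, 0.5, 0.5],
              [0.0, 1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 1.0]])
G = np.array([[1.0, 0.7],
              [0.7, 1.0]])                        # sibs more alike than expected
H = h_matrix(A, G, genotyped=[2, 3])
print(np.round(H, 3))
```

When G equals the pedigree block A22 nothing is updated and H reduces to A; any departure of G from A22 is propagated to the relationships of the non-genotyped ancestors.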


Comparing predictions from A-, G- and H-BLUP

• A-BLUP vs H-BLUP (both use the phenotypes, y, of all lines): gain due to genomic information

• G-BLUP (phenotypes of genotyped lines only) vs H-BLUP (all lines, i.e. extra observations): gain due to more phenotypic information

Genotyping scenarios

Combinations of:

3 proportions of genotyped lines: 80%, 60%, 40%

2 strategies for genotyping:

• Within-family (W-fam): high genetic relatedness between genotyped and non-genotyped lines

• Across-family (A-fam): low genetic relatedness between genotyped and non-genotyped lines

K-fold cross-validation using random sampling and constant size of the validation set (VS) across scenarios and models

Prediction quality of A-BLUP vs H-BLUP models

Gain due to genomic information:

Quality criterion        A              W-fam                  A-fam
                         (benchmark)    H80    H60    H40      H80    H60    H40
Predictive ability (r)   0.288          0.327  0.326  0.323    0.322  0.321  0.318
Gain in r (%)            −              11.9   11.7   10.8     10.6   10.3   9.4
Inflation (b)            0.954          0.998  1.002  0.992    1.006  0.988  1.004
MSEP                     0.218          0.215  0.215  0.216    0.215  0.216  0.216

• Higher quality of prediction in all genotyping strategies

• Slightly better when some lines of each family are genotyped

Prediction quality of G-BLUP vs H-BLUP models

Gain due to more phenotypic information:

W-fam:

Quality criterion        G      H80      G      H60      G      H40
Predictive ability (r)   0.324  0.325    0.306  0.323    0.244  0.309
Gain in r (%)            −      0.3      −      5.6      −      26.6
Inflation (b)            0.940  0.990    0.926  0.978    0.829  0.938
MSEP                     0.216  0.214    0.217  0.216    0.221  0.207

A-fam:

[Table with the same columns: G, H80, G, H60, G, H40]

• H uses 150% more information than G for prediction

• Increases r, reduces bias and minimizes MSEP

Summary

When all selection candidates are genotyped:

• Combining genomic and pedigree information can enhance prediction quality in sorghum

• Blending information with K-BLUP optimized predictive ability and reduction of bias

• The optimal weight and its impact on prediction quality differed between traits

When not all selection candidates are genotyped:

• Combining A and G through H-BLUP was always beneficial

A can make more phenotypic data available

G provides more precise genetic relationships

H combines both benefits to improve predictions

Acknowledgments

Fred van Eeuwijk (WUR)

Marcos Malosetti (WUR)

David Jordan (QAAFI-UQ)

Colleen Hunt (DAF, QAAFI-UQ)

Emma Mace (DAF, QAAFI-UQ)

Phenotyping/genotyping funding: PhD funding:


Construction of matrix H



genotyped

non genotyped

Consider:

Simple inverse:

Correcting the pedigree

Joint distribution:
