
Page 1: The effect of genetic relationships and other factors on ...ksiconnect.icrisat.org/wp-content/uploads/2014/02/Ramil-Mauleon.pdf · The effect of genetic relationships and other factors

The effect of genetic relationships and other factors on genomic prediction accuracy in public plant breeding programs

Aaron Lorenz NGGIBCI-2014

ICRISAT, Feb 21, 2014


Jon Luetchens Drought phenotyping Corn breeding

Liakat Ali Drought phenotyping Physiology/genetics

Collin Lamkey Heterotic groups Corn breeding/QG

Nonoy Bandillo Time series GWAS GWAS methods

Amritpal Singh Goss’s wilt GWAS/Genome analysis

Dnyaneshwar Kadam Hybrid prediction Corn breeding/Genomic prediction

Ibrahim El-Basyoni Winter wheat Genomic prediction

Diego Jarquin Statistics Soybean genomic pred.


Plant breeding in the 21st century: two important trends

[Figure: two panels, each with a $ axis label: Genotypic data and Phenotypic data]


Genomic selection

DNA marker data

Phenotypic data

y = Xb + Zu + e

Model training

Predict and select

Selection candidates

Training population / calibration set

• No QTL mapping
• No testing for significant markers
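The training and prediction steps above can be sketched in a few lines. This is a minimal illustration, not the speaker's pipeline: it fits marker effects u in y = Xb + Zu + e by ridge regression with X reduced to an intercept, on simulated data. The function names, the simulated data, and the penalty value `lam` are all assumptions; in RR-BLUP the penalty would normally come from estimated variance components.

```python
import numpy as np

def rrblup_fit(y, Z, lam):
    """Ridge estimate of marker effects u in y = 1*b + Z u + e.

    lam plays the role of sigma_e^2 / sigma_u^2; here the caller supplies it
    (an assumption) instead of estimating it by REML.
    """
    mu = y.mean()
    col_means = Z.mean(axis=0)
    Zc = Z - col_means                      # center marker columns
    p = Zc.shape[1]
    u = np.linalg.solve(Zc.T @ Zc + lam * np.eye(p), Zc.T @ (y - mu))
    return mu, col_means, u

def rrblup_predict(model, Z_new):
    """Genomic estimated breeding values (GEBV) for new lines."""
    mu, col_means, u = model
    return mu + (Z_new - col_means) @ u

# Simulated example: all sizes and effects below are hypothetical.
rng = np.random.default_rng(0)
n_train, n_cand, p = 200, 50, 300
Z = rng.integers(0, 2, size=(n_train + n_cand, p)).astype(float)
g = Z @ rng.normal(0.0, 0.1, size=p)               # true genetic values
y = g + rng.normal(0.0, g.std(), size=g.size)      # phenotypes, h2 near 0.5

model = rrblup_fit(y[:n_train], Z[:n_train], lam=100.0)   # model training
gebv = rrblup_predict(model, Z[n_train:])                  # predict candidates
selected = np.argsort(gebv)[::-1][:10]                     # select the top 10
```

Every marker receives an effect estimate, so no marker is tested for significance before prediction, which matches the two bullet points on the slide.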


Estimation methods for genomic selection

1. Shrinkage models
   • RR-BLUP, BayesA

2. Dimension reduction methods
   • Partial least squares
   • Principal component regression

3. Variable selection models
   • BayesB, BayesCπ, BayesDπ

4. Kernel and machine learning methods
   • Support vector machine regression

Training population

Line     Yield   Mrk 1   Mrk 2   …   Mrk p
Line 1   76      1       1       …   1
Line 2   56      1       1       …   1
Line 3   45      1       1       …   1
Line 4   67      0       1       …   0
Line n   22      1       1       …   1

LARGE p !! smaller n !!


A genome-wide approach typically provides better predictions

Lorenzana and Bernardo (2009) Lorenz (2013)

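[Figure: comparison of accuracy (rA) for marker-assisted selection (MAS) and genomic selection (GS); axis labels: Genomic rA, MAS rA; bar labels: MAS, GS]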


Genomic selection in motion

Line Development Cycle
• New Germplasm
• Make crosses and advance generations
• Genotype selection candidates
• Genomic Selection
• Advance lines with highest GEBV
• Test varieties and release

Model Training Cycle
• Advance lines informative for model improvement
• Phenotype (lines have already been genotyped)
• Train prediction model
• Updated Model

Modified from Heffner, Sorrells, and Jannink 2009. Crop Sci.


Genomic selection in motion: open questions

• Which model?
• Which lines?
• Marker platform? Marker subset?


Effect of genetic distance between training population and selection candidates on prediction accuracy


Genetic distance between subpopulations


Information sharing decreases with greater genetic distance

1. Epistasis

– Genetic background-by-QTL interactions

2. Differing marker-QTL linkage phases

3. Polymorphic loci not shared

Pop 1 M------Q M------Q m------q m------q

Pop 2 m------Q m------Q M------q M------q
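The two haplotype sets above can be turned into numbers. The sketch below codes M and Q as 1 and m and q as 0, then computes the signed LD correlation r and the apparent marker effect in each population. The QTL effect of 1.0 and the helper name `ld_r` are illustrative assumptions.

```python
import numpy as np

def ld_r(marker, qtl):
    """Signed LD correlation r between two biallelic loci coded 0/1."""
    return np.corrcoef(marker, qtl)[0, 1]

# Columns: marker, QTL. Rows follow the haplotypes listed on the slide.
pop1 = np.array([[1, 1], [1, 1], [0, 0], [0, 0]])  # M-Q, M-Q, m-q, m-q
pop2 = np.array([[0, 1], [0, 1], [1, 0], [1, 0]])  # m-Q, m-Q, M-q, M-q

r1 = ld_r(pop1[:, 0], pop1[:, 1])
r2 = ld_r(pop2[:, 0], pop2[:, 1])

# Phenotype driven only by the QTL (hypothetical effect of 1.0 for allele Q).
y1 = 1.0 * pop1[:, 1]
y2 = 1.0 * pop2[:, 1]

# Regressing phenotype on the marker gives the effect a marker model would learn.
slope1 = np.polyfit(pop1[:, 0], y1, 1)[0]
slope2 = np.polyfit(pop2[:, 0], y2, 1)[0]
```

The marker effect has opposite sign in the two populations, so an effect estimated in Pop 1 would point the wrong way when applied to Pop 2.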


Predicting across subpopulations

[Figure: plot of PC 1 vs. PC 2 from 1180 polymorphic markers; groups labeled BuschAg, University of MN, and NDSU 6-row]

[Figure: prediction accuracy by training sets and validation sets (Subpop 1, Subpop 2)]

Lorenz et al. (2012)


Objectives

1. Examine relationship between prediction accuracy and genetic distance between training population and selection candidates.

2. Devise a method to intelligently sample a training dataset to maximize prediction accuracy.


FHB Genomic Selection Project

U.S. Wheat & Barley Scab Initiative
Kevin Smith, UMN

Parents (training pop)
• UM: N = 384
• NDSU: N = 384

Progeny from crosses (validation pop)
• UM x UM: N = 100
• UM x ND: N = 100
• ND x ND: N = 100

Genotyping: 3072 SNPs; 384 SNPs


[Figure: heat map of Âij between the training population (MN, ND) and the validation population (MN x MN, MN x ND, ND x ND); color scale labeled -1, 0, 1.5]

Realized relationship matrix calculated with method of Endelman and Jannink (2012)
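As a rough illustration of what a realized relationship matrix is, the sketch below computes the plain marker-based estimator WW'/(2Σp(1-p)) from genotypes coded -1/0/1. This is a simplification: the Endelman and Jannink (2012) method adds shrinkage, which is not implemented here, and the function name and simulated genotypes are assumptions.

```python
import numpy as np

def realized_relationship(M):
    """Unshrunk realized relationship matrix from a lines x markers matrix.

    M is coded -1/0/1. The shrinkage step of Endelman and Jannink (2012)
    is deliberately omitted in this sketch.
    """
    p = (M.mean(axis=0) + 1.0) / 2.0        # allele frequency per marker
    W = M - (2.0 * p - 1.0)                 # center each marker column
    return W @ W.T / (2.0 * np.sum(p * (1.0 - p)))

# Hypothetical inbred lines: 20 training (TP) and 10 validation (VP) lines.
rng = np.random.default_rng(1)
M = rng.choice([-1.0, 1.0], size=(30, 400))
tp = np.arange(20)
vp = np.arange(20, 30)

A = realized_relationship(M)
# Mean relationship of each TP line to the whole VP (TP x VP block of A).
mean_rel_to_vp = A[np.ix_(tp, vp)].mean(axis=1)
```

The TP-by-VP block of `A` corresponds to the kind of matrix shown in the heat map, and its row means give one relatedness score per training line.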


Sliding window approach

1. Order TP by average relatedness to VP.
2. Select TP of 200 lines.
3. Slide the window in increments of 10.
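The three steps can be sketched as a small generator. It assumes a vector `mean_rel` holding each training line's average relationship to the validation population; that vector is simulated here, and the total of 768 lines is only an illustrative choice based on the two sets of 384 parents.

```python
import numpy as np

def sliding_windows(mean_rel, window=200, step=10):
    """Yield index sets of `window` TP lines, from most to least related."""
    order = np.argsort(mean_rel)[::-1]          # step 1: order by relatedness
    for start in range(0, len(order) - window + 1, step):
        yield order[start:start + window]       # steps 2 and 3

# Hypothetical relatedness scores for 768 training lines.
rng = np.random.default_rng(3)
mean_rel = rng.normal(size=768)

windows = list(sliding_windows(mean_rel))
window_means = [mean_rel[w].mean() for w in windows]
```

Each window would then be used as a separate training set, so that prediction accuracy can be plotted against the mean relatedness of the window.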


Can adding increasingly unrelated individuals actually hurt prediction accuracy?


“Next kin” plots

1. Rank TP individuals according to avg relationship with selection candidates
2. Select 10 most closely related individuals and predict
3. Add next closest 10 and repeat

[Figure: r(pred, obs) and Âij plotted against TP size; legend: 100% MN, 99% - 90%, 89% - 80%, 79% - 70%, 69% - 60%, <60% MN]

Adj R²: DON 0.70, FHB 0.89, HT 0.77

TP: MN+ND parents; VP: MN x MN prog.
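The ranking-and-adding procedure behind these plots could look roughly like the sketch below. Everything in it is simulated, the relatedness score is a crude marker-similarity stand-in for the mean Âij, and the ridge predictor with `lam=200.0` is an assumed placeholder for whatever model the study used.

```python
import numpy as np

def next_kin_sets(mean_rel, step=10):
    """Yield nested TP index sets: closest `step` lines, then 2*step, ..."""
    order = np.argsort(mean_rel)[::-1]
    for size in range(step, len(order) + 1, step):
        yield order[:size]

def ridge_predict(Z_tr, y_tr, Z_new, lam):
    """Ridge prediction in dual form (solves an n x n system)."""
    mu = y_tr.mean()
    zbar = Z_tr.mean(axis=0)
    Zc = Z_tr - zbar
    alpha = np.linalg.solve(Zc @ Zc.T + lam * np.eye(len(y_tr)), y_tr - mu)
    return mu + (Z_new - zbar) @ (Zc.T @ alpha)

# Hypothetical data: 60 TP lines, 30 VP lines, 200 markers.
rng = np.random.default_rng(2)
Z_tp = rng.choice([-1.0, 1.0], size=(60, 200))
Z_vp = rng.choice([-1.0, 1.0], size=(30, 200))
u = rng.normal(size=200)
y_tp = Z_tp @ u + rng.normal(scale=5.0, size=60)
y_vp = Z_vp @ u + rng.normal(scale=5.0, size=30)

# Stand-in for each TP line's average relationship to the VP.
mean_rel = (Z_tp @ Z_vp.T).mean(axis=1) / Z_tp.shape[1]

curve = []  # (TP size, r(pred, obs)) pairs, i.e. the points of the plot
for idx in next_kin_sets(mean_rel, step=10):
    pred = ridge_predict(Z_tp[idx], y_tp[idx], Z_vp, lam=200.0)
    curve.append((len(idx), np.corrcoef(pred, y_vp)[0, 1]))
```

Plotting the second element of each pair in `curve` against the first gives the shape of a “next kin” plot for the simulated data.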


“Next kin” plots: ND x ND validation population

[Figure: r(pred, obs) and Âij plotted against TP size; legend: 100% ND, 99% - 90%, 89% - 80%, 79% - 70%, 69% - 60%, <60% ND]

Adj R²: DON 0.16, FHB 0.58, HT 0.57

TP: MN+ND parents; VP: ND x ND prog.


Comparing TP selection schemes

[Figure: bar chart of r(pred, obs), axis 0 to 0.6, for DON, FHB, and HT under four schemes: Random, A_Mean, A_Ind Specific, A_Fam Specific]

TP: MN+ND; VP: MN x MN


Comparing TP selection schemes

[Figure: bar chart of r(pred, obs), axis 0 to 0.5, for DON, FHB, and HT under four schemes: Random, A_Mean, A_Ind Specific, A_Fam Specific]

TP: MN+ND; VP: ND x ND


Can we establish a standard cutoff for inclusion and exclusion of training individuals based on relatedness?


Persistence of LD phase across populations

Correlation of r

[Figure: example two-locus haplotypes (alleles M/m and Q/q) in several populations, illustrating consistent and differing linkage phase]


Persistence of LD phase across populations

Correlation of r

1. Calculate r between each pair of adjacent markers in each population.
   • m – 1 r values
2. Correlate the m – 1 r values.
   • cor(r1, r2)
3. Populations with consistent LD phases between adjacent markers will have high “correlation of r”.

$r = \dfrac{D}{\sqrt{p_{A_1} p_{A_2} p_{B_1} p_{B_2}}}$
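The “correlation of r” calculation can be sketched directly from the three steps and the formula for r. The simulation helper, sample sizes, and flip rate are assumptions made only to produce populations with shared or unshared linkage phase; inbred lines are treated as haplotypes coded 0/1.

```python
import numpy as np

def adjacent_r(H):
    """Signed r = D / sqrt(pA1 pA2 pB1 pB2) for each adjacent marker pair.

    H is an individuals x markers 0/1 matrix in map order; returns m - 1 values.
    """
    p = H.mean(axis=0)
    D = (H[:, :-1] * H[:, 1:]).mean(axis=0) - p[:-1] * p[1:]
    denom = np.sqrt(p[:-1] * (1 - p[:-1]) * p[1:] * (1 - p[1:]))
    return D / denom

def correlation_of_r(H1, H2):
    """Correlate the adjacent-marker r values of two populations."""
    return np.corrcoef(adjacent_r(H1), adjacent_r(H2))[0, 1]

def simulate(phase, n, rng, flip=0.1):
    """Hypothetical population whose adjacent-marker phase follows `phase`."""
    H = np.empty((n, len(phase) + 1))
    H[:, 0] = rng.integers(0, 2, n)
    for j, s in enumerate(phase, start=1):
        prev = H[:, j - 1] if s > 0 else 1 - H[:, j - 1]
        noise = rng.random(n) < flip
        H[:, j] = np.where(noise, 1 - prev, prev)
    return H

rng = np.random.default_rng(4)
phase_a = rng.choice([-1, 1], size=49)
phase_b = rng.choice([-1, 1], size=49)      # unrelated phase pattern

pop1 = simulate(phase_a, 200, rng)
pop2 = simulate(phase_a, 200, rng)          # shares phase with pop1
pop3 = simulate(phase_b, 200, rng)          # does not

cor_same = correlation_of_r(pop1, pop2)
cor_diff = correlation_of_r(pop1, pop3)
```

Populations built from the same phase pattern give a correlation of r close to 1, while the population with an independent pattern gives a much lower value.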


de Roos et al. (2008)


[Figure: scatter plot of correlation of r (y-axis, -0.1 to 0.5) against mean Aij between TP and VP (x-axis, -0.4 to 0.4)]


[Figure: grid of panels for traits DON, FHB, and HT and validation populations MNxMN, NDxND, and MNxND; axis labels: Ind cor of r, TP Size]

Calculate cor of r between whole VP and every individual in TP


Future work

• Determine if similar relationships exist in other species/breeding populations

• Continue to validate the “individual cor of r” criterion for designing TP and compare to multi-locus LD measures


Marker platform


GBS vs 92K iSelect assay: hard red winter wheat diversity panel

• Available: 299 lines sampled from winter wheat breeding programs

• Phenotyping

– Two N levels in 2012 and 2013.

– Three reps

– Mead, NE

• Genotyping

– 92K Illumina iSelect assay (Eduard Akhunov)

– Two-enzyme GBS (Jesse Poland)

• 10-fold CV replicated 100 times

Stephen Baenziger UNL
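Replicated 10-fold cross-validation, as used for this panel, can be sketched as follows. The slide does not say how accuracy was summarized within a replicate; this sketch assumes one correlation per replicate across all held-out predictions. The ridge predictor, its penalty, and the simulated data are placeholders, and the demo runs 5 replicates instead of 100 to stay fast.

```python
import numpy as np

def ridge_fit_predict(Z_tr, y_tr, Z_te, lam=100.0):
    """Placeholder genomic predictor: ridge regression in dual form."""
    mu = y_tr.mean()
    zbar = Z_tr.mean(axis=0)
    Zc = Z_tr - zbar
    alpha = np.linalg.solve(Zc @ Zc.T + lam * np.eye(len(y_tr)), y_tr - mu)
    return mu + (Z_te - zbar) @ (Zc.T @ alpha)

def repeated_kfold_accuracy(Z, y, fit_predict, k=10, reps=100, seed=0):
    """Return one r(predicted, observed) per replicate of k-fold CV."""
    rng = np.random.default_rng(seed)
    n = len(y)
    accuracies = []
    for _ in range(reps):
        folds = np.array_split(rng.permutation(n), k)   # new random split
        pred = np.empty(n)
        for test in folds:
            train = np.setdiff1d(np.arange(n), test)
            pred[test] = fit_predict(Z[train], y[train], Z[test])
        accuracies.append(np.corrcoef(pred, y)[0, 1])
    return np.array(accuracies)

# Hypothetical panel: 80 lines, 150 markers.
rng = np.random.default_rng(5)
Z = rng.choice([-1.0, 1.0], size=(80, 150))
y = Z @ rng.normal(size=150) + rng.normal(scale=8.0, size=80)

accs = repeated_kfold_accuracy(Z, y, ridge_fit_predict, k=10, reps=5)
mean_accuracy = accs.mean()
```

Running the same function on two marker matrices for the same lines, one per platform, would give directly comparable accuracy distributions.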


Platform      #SNPs (MAF > 0.05, %NA < 0.50)
iSelect 92K   28,083
GBS           20,021


What do we do with all these markers?


Genomic prediction in soybean

• UNL soybean breeding lines
  – 301 lines

• Traits

  Trait           Grain yield   Plant height   Maturity date
  Entry-mean h2   0.78          0.79           0.97

• Genotyping-by-sequencing
  – Institute for Genomic Diversity, Cornell
  – 219,035 potential SNPs

• 10-fold cross validation replicated 200 times

George Graef, UNL


Soybean genotyping-by-sequencing

[Figure tracks, outside to inside: unique tag count, SNP density, MAF, percent missing]

Katie Hyma, Cornell IGD


SNP Number

Filter (MAF > 0.05)   #SNPs
%NA < 0.05            16,502
%NA < 0.80            52,349
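The two filters in this table (minor allele frequency and percent missing) can be expressed as a boolean mask over markers. The sketch assumes genotypes coded as 0/1/2 allele copies with `np.nan` for missing calls; the function name and the toy matrix are illustrative only.

```python
import numpy as np

def filter_markers(M, min_maf=0.05, max_na=0.80):
    """Return a boolean mask of markers passing MAF and missingness filters.

    M is a lines x markers matrix coded 0/1/2 with np.nan for missing calls.
    """
    n_obs = (~np.isnan(M)).sum(axis=0)
    na_frac = 1.0 - n_obs / M.shape[0]
    # Allele frequency from observed calls only; guard against all-missing columns.
    freq = np.nansum(M, axis=0) / (2.0 * np.maximum(n_obs, 1))
    maf = np.minimum(freq, 1.0 - freq)
    return (maf > min_maf) & (na_frac < max_na)

nan = np.nan
M = np.array([
    [0.0, 2.0, 0.0, nan],
    [2.0, 2.0, 0.0, nan],
    [0.0, 2.0, nan, nan],
    [2.0, 2.0, 2.0, 0.0],
])

keep_loose = filter_markers(M, min_maf=0.05, max_na=0.80)
keep_strict = filter_markers(M, min_maf=0.05, max_na=0.05)
```

In the toy matrix the third marker has 25% missing calls, so it survives the loose threshold but not the strict one, which mirrors how the SNP count grows from the first row of the table to the second.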


Results: Average Prediction Accuracy - GY

[Figure: two panels, Naïve imputation and Random Forest imputation; y-axis cor(predicted, observed), x-axis MAF]


Results: Bootstrap Confidence Intervals

• 200 repetitions
• 95% CI

[Figure: two panels, Naïve imputation and Random Forest imputation; y-axis cor(predicted, observed), x-axis MAF]


Prediction models equal

[Figure: bar chart of accuracy (axis 0 to 1) for DON and FHB under RR-BLUP, BayesCpi, and Bayesian LASSO]

Models also equivalent in:
• Bernardo and Yu (2007) [Maize]
• Lorenzana and Bernardo (2009) [Several plant species]
• VanRaden et al. (2009) [Holstein]
• Hayes (2009) [Holstein]


Genome-wide modeling of epistasis

Columns G, G#G, X#X, and E give % of variance.

Models     rA      G      G#G    X#X    E
G          0.589   91.1   --     --     9.0
G#G        0.585   --     87.4   --     12.7
Kaa        0.585   --     --     87.4   12.6
G + G#G    0.592   65.0   25.0   --     10.0
G + Kaa    0.588   74.8   --     15.6   9.7

# = Hadamard product
G = additive realized relationship matrix
Kaa = additive-by-additive relationship matrix as shown by Xu (2013)
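For readers unfamiliar with the notation, G#G is the element-by-element (Hadamard) product of the additive relationship matrix with itself. The sketch below builds a simple G from simulated markers and forms G#G; the way G is computed here and any scaling of G#G are assumptions, and the Kaa matrix of Xu (2013) is not reproduced.

```python
import numpy as np

# Hypothetical marker matrix: 20 lines x 300 markers coded -1/1.
rng = np.random.default_rng(6)
M = rng.choice([-1.0, 1.0], size=(20, 300))

# A simple additive realized relationship matrix (unshrunk, for illustration).
p = (M.mean(axis=0) + 1.0) / 2.0
W = M - M.mean(axis=0)
G = W @ W.T / (2.0 * np.sum(p * (1.0 - p)))

# Hadamard product: multiply matching entries, not a matrix product.
GG = G * G
```

Note that `G * G` in NumPy is the Hadamard product, whereas `G @ G` would be the ordinary matrix product and is not what the # symbol denotes.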


Conclusions

• GBS seems to work well for genomic prediction

• Use all polymorphic markers and impute
  – Don’t worry about removing markers with high %NA

• Pay special attention to the genetic distance between selection candidates and the training population.
  – Ind cor of r simultaneously factors in marker density and relationships


Thank you www.lorenzlab.net


Acknowledgements

Maize silage
• Natalia de Leon, PI, UW
• Renato Rodrigues, Postdoc, UW
• Tim Beissinger, Student, UW

Wheat
• Stephen Baenziger, PI, UNL
• Ibrahim Salah, Postdoc, UNL
• Jesse Poland, PI, K-State
• Eduard Akhunov, PI, K-State
• Mary Guttierri, Student, UNL
• Katherine Frels, Student, UNL

Barley
• Kevin Smith, PI, University of Minnesota
• Shiaoman Chao, PI, USDA-ARS
• Vikas Vikram, Student, UMN
• Jean-Luc Jannink, PI, USDA-ARS

Soybean
• George Graef, PI, UNL
• Diego Jarquin, Postdoc, UNL
• Kyle Kocak, Student, UNL
• Katie Hyma, Cornell IGD
• Luis Posada, Postdoc, UNL
• Joey Jedlica, Student, UNL

U.S. Wheat & Barley Scab Initiative