extension of single-step ssgblup to many genotyped individuals · genomic selection and single-step...

38
Extension of single-step ssGBLUP to many genotyped individuals Ignacy Misztal University of Georgia

Upload: others

Post on 20-May-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

Extension of single-step ssGBLUP to many genotyped individuals

Ignacy MisztalUniversity of Georgia

Page 2: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

Genomic selection and single-step

Simplicity–

No DYD or DP–

No index –

No complexity

Accuracy–

Avoids double counting–

Avoids fixed index–

Accounts for preselection bias

H-1 = A-1 + 0 00 G-1 − A22

-1

Aguilar et al., 2010

Christensen and Lund, 2010

Page 3: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

Current implementation of SS

G and A22

created explicitly •

Quadratic memory and cubic computations•

Cost per 100k genotypes -

1.5 hr (Aguilar et al.,2014)

-1 -1

-1 -122

H = A +0 00 G A

Page 4: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

Number of genotypes and impending problem

> 2 M for Holsteins> 400k for Angus

Genomic pre-selection issue (Patry and Ducrocq, 2011; VanRaden et al., 2013)

BLUP increasingly biased–

Need all data on preselection included

Page 5: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

Unsymmetric equations

Misztal et al., 2009

No convergence without good preconditionerNo convergence with large H or A

Page 6: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

No G or A22

inverse model

Legarra and Ducrocq (2011)

Slow convergence with few genotypesDivergence with many genotypes

Page 7: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

SNP model for genotyped animals

Legarra and Ducrocq, 2011

No successful programming

Page 8: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

SNP model for genotyped animals

Liu et al, 2014

Page 9: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

SNP effects for all animals (Fernando et al., 2014)

centered genotypes

imputed genotypes

Cost of imputationRequires new type of programmingExtension to complex models unclear

Page 10: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

Can regular ssGBLUP be made more efficient?

Page 11: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

Scaling up A22-1

Page 12: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

Is dimensionality of genomic information limited?

Regular G not positive definite past ~5k –

Blending with A (VanRaden, 2008)

Dimensionality of SNP BLUP small (Maciotta et al., 2013)

Success of imputation

Manhattan plots noisy until averaged by 300k-10Mb (depending on species)

Page 13: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –
Page 14: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

…………

Heterogenetic and homogenic tracts in genome (Stam, 1980)

E(#tracts)=4NeL (Stam, 1980)Ne –

effective population sizeL –length of genome in Morgans

Holsteins: Ne ≈100 L=30 Me=12,000

Page 15: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

Inversion via SVD/eigenvalue decompositionAssume 1 million animals genotyped with 60k chip

Page 16: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

Inverse by Woodbury formula

Ostersen et al., 2017

Mantysaari et al., 2017

Page 17: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

If G has limited dimensionality, can G-1

be sparse like A-1?

Page 18: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

Use of a la Henderson’s rules?

Use of relatives for G-1

Accuracies not good enoughTheory not clear

Page 19: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

Assumption of limited dimensionality

= +u Ts eBreeding value

s

n x 1 vector containing additive information ofpopulation (haplotypes, chromosome segments, LD blocks)?

1 c c−≈s T uIf uc

contains n animals:

Very small error

Breeding values of any n animals contains all additive information

Page 20: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

=

c c

ncn n

I 0u uP Iu ε

= 0 cncc

nc nn

I 0 I PG 0G

P I 0 IM

−−

− = −

11 1cccn

ncnn

I 0G 0I PG

P I0 M0 I

Choose core “c”

and noncore “n”

animals

ε= +n nc c nu P u=c cu u

=var( )n nnε M

Page 21: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

How to estimate P

and inv(G)?2var c cc cnu

n nc nn

σ

=

u G Gu G G

cc cc cnnc cc

-1 -1-1 -1 -1G 0 G GG = + M G 'G I

0 0 I

1 1| ,n c nc cc c nc cc− −= =u u G G u P G G

G is “true”

relationship matrix

APY algorithm(Algorithm for Proven and Young)

Page 22: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

Properties of APY algorithm

Cost: Almost linear memory and computations

G G-1

G

APY G-1

Cost: Quadratic memory and cubic computations

Page 23: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

EAAP meeting 2016

Page 24: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

Reliabilities –

Holsteins (77k)

Final score

regular G-1

4.5k 8k 14k 19k 77kNeL 2NeL 4NeL

Pocrnic et al., 2016b

Page 25: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

Distribution of segments/haplotypes/..

Assumed dimensionality

Relia

bilit

y

100%

≈4NeL

Page 26: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

Costs with 720k genotyped animals

30 M Holsteins•

50 M records•

764k 60k genotypes

Item BLUP ssGBLUPAPY G - 7 hA22-1 - 10 minrounds 402 464Time/round 51 s 83 sTotal time 6 h 17 h

Masuda et al., 2017

Page 27: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

Which core animals in APY?Bradford et al. (2017)

Simulated populations (QMSim; Sargolzaei and Schenkel, 2009)

Ne = 40

#genotyped animals = 50,000

Core animals:

Random gen 6 || gen 7 || gen8 || gen9 || gen 10 (y)

Random all generations30

Page 28: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

Which core animals in APY?

G-1

31

Bradford et al. (2016)

4NeL NeL

Page 29: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

Persistence over generations

Generations

Relia

bilit

y

1.0

BLUP

GBLUP –

small population

BayesB –

small population

GBLUP –

very large population

GBLUP –

very large population90% genome

Very large –

equivalent to 4NeL animals with 99% accuracyAre SNP effects from Holstein national populations converging

Page 30: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

Theory of limited dimensionalityNumber of haplotypes: 4 Ne LNe within each ¼

Morgan segment

¼

Morgan

Ne haplotypes

QTLsGenome haplotypes

Dimensionality of ¼

Morgan case: Ne

Fragomeni et al., 2018

Ne haplotypes

Ne haplotypes

or number of identified QTLsReduced dimensionality with weighted GRM

Page 31: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

ssGBLUP accuracies using SNP60K and 100 QTNs –

simulation study

Fragomeni et al. (2017)

Rank

19k

5k

98

Page 32: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

Multitrait ssGBLUP or SNP selection?

SNP selection/weighting (BayesB, etc.) –

Large impact with few genotypes–

Little or no impact with many

Page 33: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

Variance components

Based on SNP–

limitations

REML based on relationships–

Equations no longer sparse–

YAMS sparse matrix package –up to 100 times speedup (Masuda et al., 2017)

APY for REML

Method R (Legarra and Reverter, 2017)

Page 34: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

Extra topics

Matching pedigrees and genomic relationships•

Missing pedigrees•

Crossbreeding•

Causative SNP

Haplotypes for crossbreds (Christensen et al., 2016)•

Metafounders (Legarra et al., 2016)•

Approximation of reliabilities

Page 35: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

Conclusions

Limited dimensionality of genomic information due to limited effective population size

ssGBLUP suitable for any data set and model

With large data sets for Holsteins:–

Good persistence of predictions–

Convergence of predictions from different countries

Page 36: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

Acknowledgements

Yutaka Masuda

Shogo Tsuruta

Daniela Lourenco

Breno Fragomeni

Ignacio Aguilar

Andres Legarra

Ivan Pocrnic

Heather Bradford

Tom Lawlor, Holstein AssocPaul VanRaden, AGIL USDA

Page 37: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

Theory for APY

Breeding values of core animals linear functions of:–

Independent chromosome segments (Me)–

Independent effective SNP

E(Me)=4 Ne L (Stam, 1980; VanRaden, 2008)Ne –effective population sizeL –

length of genome in Morgans

Me = 4 (Ne=100) (L=30) =12,000

Page 38: Extension of single-step ssGBLUP to many genotyped individuals · Genomic selection and single-step • Simplicity – No DYD or DP – No index – No complexity • Accuracy –

QTL

Accuracy and distance from markers to QTL

46

Fragomeni

et al. (2017)