HIGHLIGHTED ARTICLE | GENOMIC SELECTION

Multi-population Genomic Relationships for Estimating Current Genetic Variances Within and Genetic Correlations Between Populations

Yvonne C. J. Wientjes,1 Piter Bijma, Jérémie Vandenplas, and Mario P. L. Calus

Wageningen University and Research, Animal Breeding and Genomics, 6700 AH, The Netherlands

ABSTRACT Different methods are available to calculate multi-population genomic relationship matrices. Since those matrices differ in base population, it is anticipated that the method used to calculate genomic relationships affects the estimate of genetic variances, covariances, and correlations. The aim of this article is to define the multi-population genomic relationship matrix to estimate current genetic variances within and genetic correlations between populations. The genomic relationship matrix containing two populations consists of four blocks, one block for population 1, one block for population 2, and two blocks for relationships between the populations. It is known, based on literature, that by using current allele frequencies to calculate genomic relationships within a population, current genetic variances are estimated. In this article, we theoretically derived the properties of the genomic relationship matrix to estimate genetic correlations between populations and validated it using simulations. When the scaling factor of across-population genomic relationships is equal to the product of the square roots of the scaling factors for within-population genomic relationships, the genetic correlation is estimated unbiasedly even though estimated genetic variances do not necessarily refer to the current population. When this property is not met, the correlation based on estimated variances should be multiplied by a correction factor based on the scaling factors. In this study, we present a genomic relationship matrix which directly estimates current genetic variances as well as genetic correlations between populations.

KEYWORDS genetic correlation between populations; genomic relationships; genetic variance; multi-trait model; Genomic Selection; Shared Data Resources; GenPred

Copyright © 2017 by the Genetics Society of America
doi: https://doi.org/10.1534/genetics.117.300152
Manuscript received March 17, 2017; accepted for publication August 15, 2017; published Early Online August 16, 2017.
Supplemental material is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.117.300152/-/DC1.
1 Corresponding author: Wageningen University and Research, Animal Breeding and Genomics, P.O. box 338, 6700 AH Wageningen, The Netherlands. E-mail: yvonne. [email protected]
Genetics, Vol. 207, 503–515 October 2017

When estimating the additive genetic values of individuals, relationships between individuals are used to describe the covariances between additive genetic values. Those relationships are expressed relative to a base population, consisting of, on average, unrelated individuals that have average self-relationships of one, for which the additive genetic variance is estimated. The method used to calculate the relationships affects the used base population and, therefore, the estimated additive genetic variance as well (Speed and Balding 2015; Legarra 2016). By using current allele frequencies to calculate a genomic relationship matrix, the current population is the base population for which additive genetic variances are estimated (Hayes et al. 2009).

Genomic data enable the calculation of relationships between distantly related individuals, for example between individuals from different populations. Those relationships can be used to estimate genetic correlations between populations using a multi-trait model (Karoui et al. 2012), where the same trait in each population is modeled as a different trait. Due to differences in allele frequencies and environments, in combination with nonadditive effects and genotype-by-environment interactions, allele substitution effects of causal loci can differ between populations (e.g., Fisher 1918, 1930; Falconer 1952). Moreover, some causal loci might segregate in only one population. Therefore, genetic correlations between populations can differ from one.

The genetic correlation between populations is an important parameter since it is used to understand the genetic architecture and evolution of complex traits, such as disease traits in humans (De Candia et al. 2013; Brown et al. 2016). In genomic prediction, combining populations in one training population is important for applications in animals (e.g., Karoui et al. 2012; Olson et al. 2012), plants (e.g., Lehermeier et al. 2015), and humans (e.g., De Candia


et al. 2013). Although the genetic correlation between those populations is not needed as an explicit parameter for implementing multi-population genomic prediction, its value does determine the benefit of combining those populations in one training population (Wientjes et al. 2016).

Different methods are available to calculate multi-population genomic relationship matrices (Harris and Johnson 2010; Erbe et al. 2012; Chen et al. 2013; Makgahlela et al. 2013). The two most important differences between the methods are: (1) the assumed relation between effect size and allele frequency of loci, namely assuming that across loci effect size and allele frequency are independent (e.g., method 1 of VanRaden 2008) or assuming that loci with a lower allele frequency have a larger effect (e.g., method 2 of VanRaden 2008 and Yang et al. 2010); and (2) the used allele frequency, namely allele frequencies specific to each population, the average allele frequency across the populations, or the estimated allele frequency when the populations separated. The method used to calculate the genomic relationships is likely to affect the genetic correlation estimate between populations, but the required properties for unbiasedly estimating this genetic correlation are not yet known.

The aim of this article is to define the multi-population genomic relationship matrix to directly estimate current genetic variances within and genetic correlations between populations. We theoretically derived this relationship matrix and validated it using simulations. Since the true relationships between individuals for a certain trait are defined at the causal loci and the aim of this article is to define the theoretically appropriate genomic relationship matrix, we present a relationship matrix based on genotypes at causal loci.

Materials and Methods

Theory

The additive genetic correlation, $r_g$, is the correlation between additive genetic values ($A$) for two traits of the same individual (Bohren et al. 1966; Falconer and Mackay 1996). In an additive model, $r_g$ can be shown to be equal to the average correlation between allele substitution effects at causal loci of two traits ($r_\alpha$), denoted as trait 1 and 2, under the following assumptions: (1) the correlation originates from pleiotropy; (2) genetic values are independent between loci (i.e., the effect at one locus for a certain trait is not a predictor of the effect at another locus for the same trait. Note that this does not require linkage equilibrium (LE) between causal loci, but an equal probability for a positive allele at one locus to be linked to either a positive or a negative allele at the other locus); and (3) across loci, allele substitution effects and allele frequency are independent from each other (i.e., the allele substitution effect at a locus does not depend on the allele frequency at that locus). This equality can be shown for individual $i$ by considering both genotypes ($z$) and allele substitution effects ($\alpha$) at all $n_c$ causal loci as random:

$$
\begin{aligned}
\mathrm{Var}(A_{i1}) &= \mathrm{Var}\Big(\sum_j z_{ij}\alpha_{1j}\Big)
 = E\Big[\Big(\sum_j z_{ij}\alpha_{1j}\Big)\Big(\sum_l z_{il}\alpha_{1l}\Big)\Big] \\
 &= \sum_j E(z_{ij}z_{ij})\,E(\alpha_{1j}\alpha_{1j})
 = n_c\,E(z_{ij}z_{ij})\,E(\alpha_{1j}\alpha_{1j}) \\
\mathrm{Var}(A_{i2}) &= n_c\,E(z_{ij}z_{ij})\,E(\alpha_{2j}\alpha_{2j}) \\
\mathrm{Cov}(A_{i1},A_{i2}) &= \mathrm{Cov}\Big(\sum_j z_{ij}\alpha_{1j},\ \sum_l z_{il}\alpha_{2l}\Big)
 = E\Big[\Big(\sum_j z_{ij}\alpha_{1j}\Big)\Big(\sum_l z_{il}\alpha_{2l}\Big)\Big] \\
 &= \sum_j E(z_{ij}z_{ij})\,E(\alpha_{1j}\alpha_{2j})
 = n_c\,E(z_{ij}z_{ij})\,E(\alpha_{1j}\alpha_{2j}) \\
r_g &= \frac{\mathrm{Cov}(A_{i1},A_{i2})}{\sqrt{\mathrm{Var}(A_{i1})}\sqrt{\mathrm{Var}(A_{i2})}}
 = \frac{n_c\,E(z_{ij}z_{ij})\,E(\alpha_{1j}\alpha_{2j})}{\sqrt{n_c\,E(z_{ij}z_{ij})\,E(\alpha_{1j}\alpha_{1j})}\ \sqrt{n_c\,E(z_{ij}z_{ij})\,E(\alpha_{2j}\alpha_{2j})}} \\
 &= \frac{E(\alpha_{1j}\alpha_{2j})}{\sqrt{E(\alpha_{1j}\alpha_{1j})}\ \sqrt{E(\alpha_{2j}\alpha_{2j})}}
 = \frac{\sigma_{\alpha_{12}}}{\sqrt{\sigma^2_{\alpha_1}}\ \sqrt{\sigma^2_{\alpha_2}}} = r_\alpha,
\end{aligned}
\tag{1}
$$

where $j$ and $l$ denote different causal loci, $\sigma^2_{\alpha_1}$ and $\sigma^2_{\alpha_2}$ are variances of allele substitution effects across causal loci within population 1 and 2, and $\sigma_{\alpha_{12}}$ is the covariance between allele substitution effects of population 1 and 2 across causal loci. Genotypes are represented by allele counts coded as 0, 1, and 2 that are centered by subtracting $2p$, where $p$ is the allele frequency for the counted allele.
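The equality $r_g = r_\alpha$ in Equation 1 can be checked numerically. The following is a minimal sketch, not the authors' script: the sample sizes, the uniform allele-frequency distribution, and all variable names are assumptions chosen for illustration. It draws one set of allele substitution effects with correlation 0.5 across many independent loci and compares the correlation between genetic values across individuals with the correlation between effects across loci.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes (assumed here; much smaller than in the article).
n_ind, n_loci, r_alpha = 1000, 2000, 0.5

# Allele frequencies and centered genotypes z = g - 2p (HWE, independent loci).
p = rng.uniform(0.05, 0.95, size=n_loci)
g = rng.binomial(2, p, size=(n_ind, n_loci))
z = g - 2.0 * p

# Allele substitution effects for trait 1 and trait 2, drawn independently of p.
cov = np.array([[1.0, r_alpha], [r_alpha, 1.0]])
alpha = rng.multivariate_normal(np.zeros(2), cov, size=n_loci)  # shape (n_loci, 2)

# Additive genetic values A_i1 and A_i2 for every individual.
A = z @ alpha  # shape (n_ind, 2)

r_g = np.corrcoef(A[:, 0], A[:, 1])[0, 1]
r_alpha_hat = np.corrcoef(alpha[:, 0], alpha[:, 1])[0, 1]
print(f"r_g across individuals: {r_g:.3f}; r_alpha across loci: {r_alpha_hat:.3f}")
```

Equation 1 takes expectations over both genotypes and effects; the sketch approximates this with a single draw of effects over a large number of loci, so the two correlations agree only up to sampling error.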

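Similar to genetic correlations between traits in one population, the genetic correlation ($r_g$) between populations can be estimated in a multi-trait model using a relationship matrix and REML by modeling the phenotypes of two populations as different traits (Karoui et al. 2012). This approach is known as multi-trait GREML. In the following, we refer to trait 1 as the trait expressed in population 1, and to trait 2 as the trait expressed in population 2. When considering performance in different populations as different traits, individuals have a phenotype for only one trait. Therefore, the (co)variance structure of the additive genetic values can be written as (Visscher et al. 2014)

$$
\begin{bmatrix}\mathbf{a}_1\\ \mathbf{a}_2\end{bmatrix}
\sim N\left(\begin{bmatrix}\mathbf{0}\\ \mathbf{0}\end{bmatrix},
\begin{bmatrix}\mathrm{Var}(\mathbf{a}_1) & \mathrm{Cov}(\mathbf{a}_1,\mathbf{a}_2)\\ \mathrm{Cov}(\mathbf{a}_2,\mathbf{a}_1) & \mathrm{Var}(\mathbf{a}_2)\end{bmatrix}\right)
= N\left(\begin{bmatrix}\mathbf{0}\\ \mathbf{0}\end{bmatrix},
\begin{bmatrix}\mathbf{G}_{11}\sigma^2_1 & \mathbf{G}_{12}\sigma_{12}\\ \mathbf{G}_{21}\sigma_{12} & \mathbf{G}_{22}\sigma^2_2\end{bmatrix}\right),
\tag{2}
$$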

where $\mathbf{a}_1$ is the vector with additive genetic values for individuals from population 1 for trait 1, $\mathbf{a}_2$ is the analogous vector for individuals from population 2 for trait 2, $\sigma^2_1$ and $\sigma^2_2$ are genetic variances for trait 1 and 2, $\sigma_{12}$ is the genetic covariance between the traits, $\mathbf{G}_{11}$ is a matrix with genomic relationships in population 1, $\mathbf{G}_{22}$ is a matrix with genomic relationships in population 2, and $\mathbf{G}_{12}$ and $\mathbf{G}_{21}$ ($= \mathbf{G}_{12}'$) are matrices with genomic relationships between population 1 and 2.

To derive the definition of the genomic relationships in Equation 2, we derive the variances and covariance of the additive genetic values for the two traits. Naturally, this will result in an equation to calculate the genomic relationship matrix (G) using multiple populations to estimate (co)variances in the current populations.

When both populations are in Hardy–Weinberg equilibrium, allele substitution effects are independent from allele frequency across loci, and, within a trait, genetic values at causal loci are independent from each other, the genetic variance for trait 1 is $\sigma^2_1 = \sum_j 2p_{1j}(1-p_{1j})\,\sigma^2_{\alpha_1}$, where $p_{1j}$ is the allele frequency at locus $j$ in population 1 (Falconer and Mackay 1996). Hence, the variance of $\mathbf{a}_1$ is:

$$
\mathrm{Var}(\mathbf{a}_1) = \mathrm{Var}(\mathbf{Z}_1\boldsymbol{\alpha}_1) = \mathbf{Z}_1\mathbf{Z}_1'\,\sigma^2_{\alpha_1}
= \frac{\mathbf{Z}_1\mathbf{Z}_1'}{\sum_j 2p_{1j}(1-p_{1j})}\,\sigma^2_1,
\tag{3}
$$

where $\mathbf{Z}_1$ is an $n_1 \times n_c$ matrix of centered genotypes for all individuals from population 1 ($n_1$) for all causal loci, and $\boldsymbol{\alpha}_1$ is a vector of length $n_c$ with allele substitution effects at causal loci for trait 1.

Similarly,

$$
\mathrm{Var}(\mathbf{a}_2) = \frac{\mathbf{Z}_2\mathbf{Z}_2'}{\sum_j 2p_{2j}(1-p_{2j})}\,\sigma^2_2.
\tag{4}
$$

The genetic covariance between two traits is:

$$
\begin{aligned}
\sigma_{12} &= r_g\sqrt{\sigma^2_1}\sqrt{\sigma^2_2}
 = r_g\sqrt{\sum_j 2p_{1j}(1-p_{1j})\,\sigma^2_{\alpha_1}}\ \sqrt{\sum_j 2p_{2j}(1-p_{2j})\,\sigma^2_{\alpha_2}} \\
 &= \sigma_{\alpha_{12}}\sqrt{\sum_j 2p_{1j}(1-p_{1j})}\ \sqrt{\sum_j 2p_{2j}(1-p_{2j})}.
\end{aligned}
\tag{5}
$$

Therefore, the covariance between the genetic values of population 1 and 2 is:

$$
\mathrm{Cov}(\mathbf{a}_1,\mathbf{a}_2) = \mathrm{Cov}(\mathbf{Z}_1\boldsymbol{\alpha}_1,\mathbf{Z}_2\boldsymbol{\alpha}_2) = \mathbf{Z}_1\mathbf{Z}_2'\,\sigma_{\alpha_{12}}
= \frac{\mathbf{Z}_1\mathbf{Z}_2'}{\sqrt{\sum_j 2p_{1j}(1-p_{1j})}\ \sqrt{\sum_j 2p_{2j}(1-p_{2j})}}\,\sigma_{12}.
\tag{6}
$$

From Equations 3, 4, and 6, it follows that the genomic relationship matrix (G) is:

$$
\mathbf{G} = \begin{bmatrix}\mathbf{G}_{11} & \mathbf{G}_{12}\\ \mathbf{G}_{21} & \mathbf{G}_{22}\end{bmatrix}
= \begin{bmatrix}
\dfrac{\mathbf{Z}_1\mathbf{Z}_1'}{\sum_j 2p_{1j}(1-p_{1j})} &
\dfrac{\mathbf{Z}_1\mathbf{Z}_2'}{\sqrt{\sum_j 2p_{1j}(1-p_{1j})}\ \sqrt{\sum_j 2p_{2j}(1-p_{2j})}} \\[2ex]
\dfrac{\mathbf{Z}_2\mathbf{Z}_1'}{\sqrt{\sum_j 2p_{1j}(1-p_{1j})}\ \sqrt{\sum_j 2p_{2j}(1-p_{2j})}} &
\dfrac{\mathbf{Z}_2\mathbf{Z}_2'}{\sum_j 2p_{2j}(1-p_{2j})}
\end{bmatrix}.
\tag{7}
$$

When allele frequencies from the current population are used, G from Equation 7 estimates current genetic (co)variances. Lourenco et al. (2016) presented a comparable G matrix for combining purebred and crossbred animals. Note that the covariance of the genotypes between the populations, $\mathbf{Z}_2\mathbf{Z}_1'$, is divided by the SDs of the genotypes in each population, $\sqrt{\sum_j 2p_{1j}(1-p_{1j})}$ and $\sqrt{\sum_j 2p_{2j}(1-p_{2j})}$. Therefore, the relationships in this G are defined as correlations between the genotypes of the individuals.

In Equation 7, G uses three different scaling factors for the different blocks, $k_1 = \sum_j 2p_{1j}(1-p_{1j})$, $k_2 = \sum_j 2p_{2j}(1-p_{2j})$, and $k_{12} = \sqrt{\sum_j 2p_{1j}(1-p_{1j})}\ \sqrt{\sum_j 2p_{2j}(1-p_{2j})}$. Note that $k_{12} = \sqrt{k_1}\sqrt{k_2}$, but since this is not a general property of genomic relationship matrices, we separately defined $k_{12}$. Hence, the variance–covariance matrix in Equation 2 becomes:

$$
\begin{bmatrix}\mathbf{G}_{11}\sigma^2_1 & \mathbf{G}_{12}\sigma_{12}\\ \mathbf{G}_{21}\sigma_{12} & \mathbf{G}_{22}\sigma^2_2\end{bmatrix}
= \begin{bmatrix}
\dfrac{\mathbf{Z}_1\mathbf{Z}_1'}{k_1}\,\sigma^2_1 & \dfrac{\mathbf{Z}_1\mathbf{Z}_2'}{k_{12}}\,\sigma_{12} \\[2ex]
\dfrac{\mathbf{Z}_2\mathbf{Z}_1'}{k_{12}}\,\sigma_{12} & \dfrac{\mathbf{Z}_2\mathbf{Z}_2'}{k_2}\,\sigma^2_2
\end{bmatrix}.
\tag{8}
$$

Equation 8 shows that the scaling factors in G and the (co)variances are completely confounded. Therefore, when using other scaling factors in G for which $k_{12}$ is not necessarily equal to $\sqrt{k_1}\sqrt{k_2}$, the genetic correlation can be estimated as

$$
r_g = \frac{\dfrac{\sigma_{12}}{k_{12}}}{\sqrt{\dfrac{\sigma^2_1}{k_1}}\ \sqrt{\dfrac{\sigma^2_2}{k_2}}}
= \frac{\sqrt{k_1}\sqrt{k_2}}{k_{12}}\ \frac{\sigma_{12}}{\sqrt{\sigma^2_1}\sqrt{\sigma^2_2}}.
\tag{9}
$$

Equation 9 shows that the genetic correlation is directly estimated from the estimated variances when the scaling factors of G fulfill the property $k_{12} = \sqrt{k_1}\sqrt{k_2}$. When $k_{12} \neq \sqrt{k_1}\sqrt{k_2}$, the correlation based on estimated variances should be multiplied by $\left(\frac{\sqrt{k_1}\sqrt{k_2}}{k_{12}}\right)$ to correct the estimate. By changing the scaling factors, the estimated genetic variances change as well. When genetic variances of the current population are of interest, the within-population blocks in G should be scaled as in Equation 7 and allele frequencies from the current population should be used (Hayes et al. 2009; Legarra 2016), or the estimated variance component should be multiplied by $\frac{\sum_j 2p_{1j}(1-p_{1j})}{k_1}$ for population 1 or by $\frac{\sum_j 2p_{2j}(1-p_{2j})}{k_2}$ for population 2.
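As an illustration of Equation 7, the sketch below builds the four blocks of G from 0/1/2 allele counts of two populations. It is a hedged example rather than the authors' implementation (their R script is in File S1): the function names `center_genotypes` and `g_new`, the toy dimensions, and the choice to estimate current allele frequencies from the genotypes themselves are assumptions made here.

```python
import numpy as np

def center_genotypes(g):
    """Center 0/1/2 allele counts by 2p, with p estimated within this population."""
    p = g.mean(axis=0) / 2.0
    return g - 2.0 * p, p

def g_new(g1, g2):
    """Sketch of Equation 7 for two populations genotyped at the same causal loci."""
    z1, p1 = center_genotypes(g1)
    z2, p2 = center_genotypes(g2)
    k1 = np.sum(2.0 * p1 * (1.0 - p1))   # scaling factor of the population 1 block
    k2 = np.sum(2.0 * p2 * (1.0 - p2))   # scaling factor of the population 2 block
    k12 = np.sqrt(k1) * np.sqrt(k2)      # scaling factor of the across-population blocks
    G = np.block([[z1 @ z1.T / k1, z1 @ z2.T / k12],
                  [z2 @ z1.T / k12, z2 @ z2.T / k2]])
    return G, (k1, k2, k12)

# Toy data (assumed): independent loci in Hardy-Weinberg equilibrium.
rng = np.random.default_rng(7)
n1, n2, n_loci = 60, 40, 500
g1 = rng.binomial(2, rng.uniform(0.1, 0.9, n_loci), size=(n1, n_loci))
g2 = rng.binomial(2, rng.uniform(0.1, 0.9, n_loci), size=(n2, n_loci))

G, (k1, k2, k12) = g_new(g1, g2)
print(G.shape, np.mean(np.diag(G)[:n1]), np.mean(np.diag(G)[n1:]))
```

With current allele frequencies, the average self-relationship within each population is expected to be close to one, which is the base-population property described in the introduction.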

Equation 8 and Equation 9 show that the genetic correlation is estimated when the scaling factors in G are the same for all blocks. When all scaling factors are equal to 1, so effectively no scaling factor is used, the (co)variances represent the (co)variances of the allele substitution effects across causal loci, i.e., $\sigma^2_{\alpha_1}$, $\sigma^2_{\alpha_2}$, and $\sigma_{\alpha_{12}}$. A disadvantage of this scaling is that elements of G can become very large, which can result in very small estimated variances that may be flagged as too small in statistical software. This might be prevented by either scaling up the phenotypic variance by multiplying all phenotypes by a constant, or by scaling down the elements in G by dividing all elements by the same constant. Both scaling approaches have no influence on the genetic correlation, but do affect the estimated genetic (co)variances.
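The confounding between scaling factors and (co)variances can be made concrete with a small sketch of Equation 9. The helper names below are hypothetical, and `estimates_under_scaling` encodes an assumption taken from Equation 8: the data only identify the products of G blocks and (co)variances, so an estimate obtained with scaling factor $k$ equals $k$ times the corresponding (co)variance of allele substitution effects.

```python
import math

def genetic_correlation(s12, s1_sq, s2_sq, k1=1.0, k2=1.0, k12=1.0):
    """Equation 9: correlation from estimated (co)variances and G scaling factors."""
    raw = s12 / (math.sqrt(s1_sq) * math.sqrt(s2_sq))
    correction = math.sqrt(k1) * math.sqrt(k2) / k12
    return correction * raw

def rescale_variance_to_current(s_sq, k_used, sum_2pq):
    """Express a variance estimated with scaling k_used on the current-population scale."""
    return s_sq * sum_2pq / k_used

def estimates_under_scaling(k1, k2, k12, var_a1=1.0, var_a2=1.0, cov_a12=0.5):
    """Assumed (co)variance estimates implied by Equation 8 for given scaling factors."""
    return k12 * cov_a12, k1 * var_a1, k2 * var_a2

# Different scalings change the (co)variance estimates but not the corrected correlation.
for ks in [(1.0, 1.0, 1.0), (150.0, 180.0, math.sqrt(150.0 * 180.0)), (150.0, 180.0, 120.0)]:
    s12, s1_sq, s2_sq = estimates_under_scaling(*ks)
    print(ks, round(genetic_correlation(s12, s1_sq, s2_sq, *ks), 6))
```

The third scaling in the loop violates $k_{12} = \sqrt{k_1}\sqrt{k_2}$, so the uncorrected ratio would be biased, while the corrected value is not.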

Simulations

Simulations were used to validate G (Equation 7). Two scenarios were simulated, with causal loci either in LE or in linkage disequilibrium (LD) with each other. Note that in both scenarios, no selection was present and genetic values were independent between causal loci.

For both scenarios, two populations (2500 individuals each) with phenotypes for a trait influenced by the same 15,000 loci were simulated. For the first scenario, with causal loci in LE, allele frequencies of loci were randomly sampled from a U-shape distribution, independently in both populations. Thereafter, genotypes were allocated to individuals according to the Hardy–Weinberg equilibrium, assuming that loci were segregating independently.
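The LE scenario can be sketched as follows. This is an assumption-laden illustration, not the simulation code of File S1: the article does not state which U-shaped distribution was used, so a Beta(0.5, 0.5) distribution is assumed here, and the function name, the frequency bounds, and the reduced dimensions are hypothetical.

```python
import numpy as np

def simulate_le_genotypes(n_ind, n_loci, rng, beta_shape=0.5, freq_floor=0.01):
    """Sample U-shaped allele frequencies (assumed Beta) and genotypes under HWE."""
    p = rng.beta(beta_shape, beta_shape, size=n_loci)
    # Assumption: bound frequencies away from 0 and 1 so that loci can segregate.
    p = np.clip(p, freq_floor, 1.0 - freq_floor)
    # Independent loci in Hardy-Weinberg equilibrium: allele count ~ Binomial(2, p).
    g = rng.binomial(2, p, size=(n_ind, n_loci))
    return g, p

rng = np.random.default_rng(42)
# Frequencies are sampled independently for the two populations.
g1, p1 = simulate_le_genotypes(50, 200, rng)
g2, p2 = simulate_le_genotypes(50, 200, rng)
print(g1.shape, g2.shape, np.corrcoef(p1, p2)[0, 1])
```

The article used 2500 individuals per population and 15,000 loci; the smaller sizes here only keep the example quick to run.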

For the second scenario, with causal loci in LD, a population structure was simulated in QMSim software (Sargolzaei and Schenkel 2009). An historical population was simulated for 1000 generations. Population size was 2000 (1000 males, 1000 females) in generation 1 and this was gradually reduced to 100 individuals in generation 500, after which it increased again to 2000 individuals in generation 1000. This bottleneck was simulated to generate LD. The simulated genome consisted of 30 chromosomes of 100 cM each, with 100,000 randomly positioned dimorphic loci per chromosome; a recurrent mutation rate of 0.00005; and on average one recombination per chromosome. After 1000 generations, the historical population was split in two populations with 250 males and 500 females, and a litter size of 5. At this split, 60,000 loci with a minor allele frequency of at least 0.05 were selected and the mutation rate was set to zero. After another 500 nonoverlapping generations with random mating, 15,000 loci segregating in both populations were randomly selected to become causal loci. Allele frequencies of causal loci followed a uniform distribution, and neighboring causal loci were on average 0.2 cM apart with an average r2 value of 0.25.

For both scenarios, allele substitution effects were sampled from a bivariate normal distribution with mean zero and variance 1, and a correlation of 0.5 between allele substitution effects in both populations. Allele substitution effects were multiplied by the corresponding genotypes to calculate additive genetic values for individuals, assuming additive gene action. Environmental effects were sampled from a normal distribution with variance $\left(\frac{1}{h^2} - 1\right)$ times the genetic variance, where the genetic variance was calculated across all individuals in both populations. Heritability was set to 0.9, to ensure that there was sufficient power to estimate (co)variances. Phenotypes were the sum of additive genetic and environmental effects, and were standardized to an average of 0 and SD of 100. Simulations were replicated 100 times.
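The phenotype simulation described above can be sketched in a few lines. The sketch rests on several assumptions that are not stated in the article: genotypes are centered before multiplying by effects, uniform allele frequencies are used for brevity, standardization is applied across both populations jointly, and the dimensions are reduced.

```python
import numpy as np

rng = np.random.default_rng(11)
n_per_pop, n_loci, h2, r_alpha = 300, 400, 0.9, 0.5   # reduced, illustrative sizes

# Centered genotypes for two populations (assumed: independent loci, HWE).
p = rng.uniform(0.05, 0.95, size=(2, n_loci))
z = [rng.binomial(2, p[m], size=(n_per_pop, n_loci)) - 2.0 * p[m] for m in range(2)]

# Allele substitution effects: column 0 applies in population 1, column 1 in population 2.
cov = np.array([[1.0, r_alpha], [r_alpha, 1.0]])
alpha = rng.multivariate_normal(np.zeros(2), cov, size=n_loci)

# Additive genetic values, pooled over both populations.
a = np.concatenate([z[0] @ alpha[:, 0], z[1] @ alpha[:, 1]])

# Environmental variance = (1/h2 - 1) times the genetic variance across all individuals.
var_e = (1.0 / h2 - 1.0) * a.var()
y = a + rng.normal(0.0, np.sqrt(var_e), size=a.size)

# Standardize phenotypes to mean 0 and SD 100 (assumed: across both populations).
y = 100.0 * (y - y.mean()) / y.std()
print(y.mean(), y.std())
```

With a heritability of 0.9 the environmental variance is one ninth of the genetic variance, which is why the (co)variances can be estimated with little noise.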

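Phenotypes were analyzed using the following bivariate model:

$$
\begin{bmatrix}\mathbf{y}_1\\ \mathbf{y}_2\end{bmatrix}
= \begin{bmatrix}\mathbf{x}_1 & \mathbf{0}\\ \mathbf{0} & \mathbf{x}_2\end{bmatrix}
\begin{bmatrix}\mu_1\\ \mu_2\end{bmatrix}
+ \begin{bmatrix}\mathbf{Z}_1 & \mathbf{0}\\ \mathbf{0} & \mathbf{Z}_2\end{bmatrix}
\begin{bmatrix}\mathbf{a}_1\\ \mathbf{a}_2\end{bmatrix}
+ \begin{bmatrix}\mathbf{e}_1\\ \mathbf{e}_2\end{bmatrix},
$$

where $\mathbf{y}_1$ and $\mathbf{y}_2$ are vectors with phenotypes for population 1 and 2, $\mathbf{x}_1$ and $\mathbf{x}_2$ are incidence vectors relating phenotypes to the mean in population 1 ($\mu_1$) or population 2 ($\mu_2$), $\mathbf{Z}_1$ and $\mathbf{Z}_2$ are incidence matrices relating phenotypes to estimated additive genetic values for performance in population 1 ($\mathbf{a}_1$) or population 2 ($\mathbf{a}_2$), and $\mathbf{e}_1$ and $\mathbf{e}_2$ are vectors containing residual effects. Estimated additive genetic values were assumed to follow a normal distribution ($\sim N\left(\begin{bmatrix}\mathbf{0}\\ \mathbf{0}\end{bmatrix}, \begin{bmatrix}\mathbf{G}_{11}\sigma^2_1 & \mathbf{G}_{12}\sigma_{12}\\ \mathbf{G}_{21}\sigma_{12} & \mathbf{G}_{22}\sigma^2_2\end{bmatrix}\right)$, Equation 2), and residuals were assumed to be independent ($\sim N\left(\begin{bmatrix}\mathbf{0}\\ \mathbf{0}\end{bmatrix}, \begin{bmatrix}\mathbf{I}\sigma^2_{e1} & \mathbf{0}\\ \mathbf{0} & \mathbf{I}\sigma^2_{e2}\end{bmatrix}\right)$, where $\sigma^2_{e1}$ and $\sigma^2_{e2}$ are the residual variances in population 1 and 2). All analyses were performed in ASReml (Gilmour et al. 2015) using a REML approach, which is known to result in unbiased variance estimates (Henderson 1984).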

Four G matrices were used: two G matrices derived above, and two commonly used G matrices for multiple populations (Chen et al. 2013; Makgahlela et al. 2013). The methods differed in scaling factors as well as in centering of genotypes, being performed either within or across populations. For all four methods, G was based on genotypes at causal loci and G was bent when singularities appeared by replacing eigenvalues below $10^{-6}$ with $10^{-6}$ (Jorjani et al. 2003).

The first three methods centered genotypes in Z within population as $g_{ijm} - 2p_{jm}$, where $g_{ijm}$ is the allele count of individual $i$ from population $m$ at locus $j$, and $p_{jm}$ is the allele frequency in population $m$ at locus $j$. The first method, G_New, scaled G following Equation 7:

$$
\mathrm{G\_New} = \begin{bmatrix}
\dfrac{\mathbf{Z}_1\mathbf{Z}_1'}{\sum_j 2p_{1j}(1-p_{1j})} &
\dfrac{\mathbf{Z}_1\mathbf{Z}_2'}{\sqrt{\sum_j 2p_{1j}(1-p_{1j})}\ \sqrt{\sum_j 2p_{2j}(1-p_{2j})}} \\[2ex]
\dfrac{\mathbf{Z}_2\mathbf{Z}_1'}{\sqrt{\sum_j 2p_{1j}(1-p_{1j})}\ \sqrt{\sum_j 2p_{2j}(1-p_{2j})}} &
\dfrac{\mathbf{Z}_2\mathbf{Z}_2'}{\sum_j 2p_{2j}(1-p_{2j})}
\end{bmatrix}
$$

In the second method, G_1, scaling factors were equal to 1:

$$
\mathrm{G\_1} = \begin{bmatrix}\mathbf{Z}_1\mathbf{Z}_1' & \mathbf{Z}_1\mathbf{Z}_2'\\ \mathbf{Z}_2\mathbf{Z}_1' & \mathbf{Z}_2\mathbf{Z}_2'\end{bmatrix}
$$

The third method, G_Chen, calculated G according to Chen et al. (2013):

$$
\mathrm{G\_Chen} = \begin{bmatrix}
\dfrac{\mathbf{Z}_1\mathbf{Z}_1'}{\sum_j 2p_{1j}(1-p_{1j})} &
\dfrac{\mathbf{Z}_1\mathbf{Z}_2'}{\sum_j 2\sqrt{p_{1j}(1-p_{1j})\,p_{2j}(1-p_{2j})}} \\[2ex]
\dfrac{\mathbf{Z}_2\mathbf{Z}_1'}{\sum_j 2\sqrt{p_{1j}(1-p_{1j})\,p_{2j}(1-p_{2j})}} &
\dfrac{\mathbf{Z}_2\mathbf{Z}_2'}{\sum_j 2p_{2j}(1-p_{2j})}
\end{bmatrix}
$$

The fourth method, G_Across, centered genotypes using the average allele frequency across populations instead of population-specific allele frequencies (e.g., Makgahlela et al. 2013). Thus, the matrix of genotypes, denoted $\mathbf{Z}^*$, had elements $g_{ijm} - 2\bar{p}_j$, where $\bar{p}_j$ is the average allele frequency across populations at locus $j$. The scaling factor was the same for all blocks:

$$
\mathrm{G\_Across} = \begin{bmatrix}
\dfrac{\mathbf{Z}^*_1\mathbf{Z}^{*\prime}_1}{\sum_j 2\bar{p}_j(1-\bar{p}_j)} &
\dfrac{\mathbf{Z}^*_1\mathbf{Z}^{*\prime}_2}{\sum_j 2\bar{p}_j(1-\bar{p}_j)} \\[2ex]
\dfrac{\mathbf{Z}^*_2\mathbf{Z}^{*\prime}_1}{\sum_j 2\bar{p}_j(1-\bar{p}_j)} &
\dfrac{\mathbf{Z}^*_2\mathbf{Z}^{*\prime}_2}{\sum_j 2\bar{p}_j(1-\bar{p}_j)}
\end{bmatrix}
$$

G_New, G_1, and G_Across fulfilled the property $k_{12} = \sqrt{k_1}\sqrt{k_2}$ to directly estimate the genetic correlation. In G_Chen, $k_{12} \neq \sqrt{k_1}\sqrt{k_2}$ when allele frequencies in the two populations were different. Therefore, the correlation estimated with G_Chen was multiplied by $\frac{\sqrt{k_1}\sqrt{k_2}}{k_{12}} = \frac{\sqrt{\sum_j 2p_{1j}(1-p_{1j})}\ \sqrt{\sum_j 2p_{2j}(1-p_{2j})}}{\sum_j 2\sqrt{p_{1j}(1-p_{1j})\,p_{2j}(1-p_{2j})}}$ to correct the estimate. Moreover, the current populations were the base population for within-population blocks of G_New and G_Chen, so genetic variances in the current populations were estimated (Speed and Balding 2015; Legarra 2016). As explained before, estimated variances of G_1 represented the variances of allele substitution effects across causal loci. For G_Across, the base population was not clearly defined, so the interpretation of the estimated variances is unclear.

Data availability

Supplemental Material, File S1, contains the R-script and seeds to simulate genotypes and phenotypes and to calculate G matrices for the scenario with causal loci in LE. File S2 contains the QMSim input file, R-script, and seeds to simulate genotypes and phenotypes and to calculate G matrices for the scenario with causal loci in LD.

Results

Genetic variances

Estimated genetic variances using G_New varied only slightly around the simulated values, both when causal loci were in LE or in LD (Figure 1 and Figure 2). This shows that G_New unbiasedly estimated genetic variances in the current populations for both scenarios.

As expected, G_New and G_Chen estimated the same genetic variances (Figure 3 and Figure 4). The estimated variances of G_1 represent the variances of allele substitution effects across causal loci, i.e., $\sigma^2_{\alpha_1}$ and $\sigma^2_{\alpha_2}$. By multiplying those variances by $\sum_j 2p_{jm}(1-p_{jm})$ for population $m$, genetic variances identical to G_New and G_Chen were obtained. When causal loci were in LE, genetic variances estimated with G_Across were higher than genetic variances estimated with G_New and G_Chen by a factor of ~1.5. The scaling factors $k_1$ and $k_2$ were higher by a factor of ~1.5. Hence, when multiplying estimated variances of G_Across by the ratio in scaling factors, variances became identical to those with G_New and G_Chen. The same applied for the estimated genetic variances with causal loci in LD, where the factor was 1.15. So, the difference in estimated variances between methods to calculate G was completely explained by the difference in scaling factors, while centering genotypes within or across populations had no effect on estimated variances. Estimated residual variances were exactly the same for the four G matrices.
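Because the differences between methods come down to scaling factors, it can help to compute $k_1$, $k_2$, and $k_{12}$ for the four matrices side by side. The sketch below does this from two vectors of allele frequencies. The function names are hypothetical, and the across-population average frequency is taken as the simple mean of the two population frequencies, which assumes equally sized populations.

```python
import numpy as np

def scaling_factors(p1, p2):
    """Return (k1, k2, k12) for the four G matrices, given allele frequencies per locus."""
    h1 = 2.0 * p1 * (1.0 - p1)
    h2 = 2.0 * p2 * (1.0 - p2)
    k1, k2 = h1.sum(), h2.sum()
    p_bar = (p1 + p2) / 2.0                      # assumed: equal population sizes
    k_across = np.sum(2.0 * p_bar * (1.0 - p_bar))
    return {
        "G_New": (k1, k2, np.sqrt(k1) * np.sqrt(k2)),
        "G_1": (1.0, 1.0, 1.0),
        # sum_j 2*sqrt(p1(1-p1) p2(1-p2)) equals sum_j sqrt(h1 * h2)
        "G_Chen": (k1, k2, np.sum(np.sqrt(h1 * h2))),
        "G_Across": (k_across, k_across, k_across),
    }

def correction_factor(k1, k2, k12):
    """Factor sqrt(k1)*sqrt(k2)/k12 that multiplies the uncorrected correlation."""
    return np.sqrt(k1) * np.sqrt(k2) / k12

rng = np.random.default_rng(3)
p1 = rng.uniform(0.05, 0.95, 1000)
p2 = rng.uniform(0.05, 0.95, 1000)
for name, ks in scaling_factors(p1, p2).items():
    print(name, round(float(correction_factor(*ks)), 3))
```

By the Cauchy-Schwarz inequality the G_Chen factor is at least 1, and it equals 1 only when heterozygosities in the two populations are proportional across loci, in line with the downward bias of the uncorrected G_Chen correlation.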

Genetic correlation

Despite differences in (co)variance estimates, G_New, G_1, and G_Across yielded the same average estimated genetic correlation (Figure 5), which was an unbiased estimate of the simulated genetic correlation (Figure 6 and Figure 7). This is because differences in genetic covariances among models were compensated by corresponding differences in genetic variances. When causal loci were in LE, the estimated genetic correlation using G_Chen was ~20% lower. When multiplying this estimate by $\sqrt{k_1}\sqrt{k_2}/k_{12} \approx 1.23$, the genetic correlation became identical to the other three methods. When causal loci were in LD, the estimated genetic correlation using G_Chen was ~7% lower, which was in agreement with $\sqrt{k_1}\sqrt{k_2}/k_{12} \approx 1.07$.
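The correction factor for G_Chen depends only on the allele frequencies in the two populations, so it can be computed before any variance components are estimated. The sketch below assumes the G_Chen scaling factors as listed in Table 1; the function name and the simulated frequencies are hypothetical.

```python
import numpy as np

def correction_factor(p1, p2):
    """Return sqrt(k1) * sqrt(k2) / k12 for the G_Chen scaling factors."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    h1 = 2.0 * p1 * (1.0 - p1)    # expected heterozygosity per locus, population 1
    h2 = 2.0 * p2 * (1.0 - p2)    # expected heterozygosity per locus, population 2
    k1 = h1.sum()
    k2 = h2.sum()
    k12 = np.sqrt(h1 * h2).sum()  # sum over loci of 2 * sqrt(p1(1-p1) p2(1-p2))
    return np.sqrt(k1) * np.sqrt(k2) / k12

# Made-up allele frequencies, drawn independently per population.
rng = np.random.default_rng(5)
p1 = rng.uniform(0.05, 0.95, size=1000)
p2 = rng.uniform(0.05, 0.95, size=1000)
factor = correction_factor(p1, p2)
corrected = 0.40 * factor         # 0.40 is an arbitrary uncorrected estimate
```

By the Cauchy-Schwarz inequality, k12 of G_Chen cannot exceed the product of the square roots of k1 and k2, so the factor is at least 1 and equals 1 when allele frequencies are identical in both populations. This is consistent with the direction of the difference reported above.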

Discussion

The aim of this article was to define the multi-population genomic relationship matrix to estimate current genetic variances within and genetic correlations between populations. Our derived genomic relationship matrix, G_New, yields unbiased estimates of current genetic variances, covariances, and correlations, both when causal loci are in LE and when they are in LD with each other. Moreover, we showed the required property for other genomic relationship matrices to estimate genetic correlations between populations, even though estimated genetic variances are not necessarily related to the current populations.

Methods to calculate the genomic relationship matrix

Since G_New unbiasedly estimated both current genetic variances within as well as genetic correlations between populations, we conclude that G_New correctly defines the relationships at causal loci within as well as between populations. G_Chen also estimated current genetic variances, but estimated genetic correlations had to be multiplied by $\sqrt{k_1}\sqrt{k_2}/k_{12}$. G_1 estimated the correct genetic correlation, but estimated the variance of allele substitution effects across causal loci instead of the genetic variance. Although the base population in G_Across was not well defined, genetic correlations were correctly estimated, but there was no clear interpretation of estimated genetic variances. Results also showed that genetic (co)variances were not affected by centering the allele counts, as shown before by Strandén and Christensen (2011).
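The origin of the correction factor can be traced in a short worked derivation. This is a sketch under the assumption that every method uses the same centered genotype matrices, so that methods differ only in $k_1$, $k_2$, and $k_{12}$; the symbols $u_1$, $u_2$, $u_{12}$, $r_g$, and the hats are notation introduced here for illustration.

```latex
\begin{align*}
% Fitted covariance structure of the additive genetic effects in both populations
\operatorname{var}\begin{pmatrix}\mathbf{a}_1\\ \mathbf{a}_2\end{pmatrix}
 &= \begin{pmatrix}
   \dfrac{\mathbf{Z}_1\mathbf{Z}_1'}{k_1}\,\hat\sigma^2_1 &
   \dfrac{\mathbf{Z}_1\mathbf{Z}_2'}{k_{12}}\,\hat\sigma_{12}\\[2ex]
   \dfrac{\mathbf{Z}_2\mathbf{Z}_1'}{k_{12}}\,\hat\sigma_{12} &
   \dfrac{\mathbf{Z}_2\mathbf{Z}_2'}{k_2}\,\hat\sigma^2_2
 \end{pmatrix} \\
% Only the ratios of estimates to scaling factors are determined by the data
u_1 &= \frac{\hat\sigma^2_1}{k_1}, \qquad
u_2 = \frac{\hat\sigma^2_2}{k_2}, \qquad
u_{12} = \frac{\hat\sigma_{12}}{k_{12}} \\
% Correlation with all scaling factors equal to 1 (G_1)
r_g &= \frac{u_{12}}{\sqrt{u_1 u_2}} \\
% Correlation computed from estimates obtained with general scaling factors
\hat r_g &= \frac{\hat\sigma_{12}}{\sqrt{\hat\sigma^2_1\,\hat\sigma^2_2}}
  = \frac{k_{12}\,u_{12}}{\sqrt{k_1 k_2\,u_1 u_2}}
  = r_g\,\frac{k_{12}}{\sqrt{k_1}\sqrt{k_2}} \\
% Hence the correction factor
r_g &= \hat r_g\,\frac{\sqrt{k_1}\sqrt{k_2}}{k_{12}}
\end{align*}
```

When $k_{12} = \sqrt{k_1}\sqrt{k_2}$, the factor equals 1 and no correction is needed, which matches the behavior reported for G_New, G_1, and G_Across.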

Figure 1 Estimated vs. simulated genetic variances when causal loci were in LE. The estimated genetic variance in both populations in each of the 100 replicates using the genomic relationship matrix derived in this study (G_New) vs. the simulated genetic variance. The gray line represents the line y = x.

Figure 2 Estimated vs. simulated genetic variances when causal loci were in LD. The estimated genetic variance in both populations in each of the 100 replicates using the genomic relationship matrix derived in this study (G_New) vs. the simulated genetic variance. The gray line represents the line y = x.


Table 1 gives an overview of the most frequently used methods to calculate G using multiple populations, with scaling factors and correction factors for the estimated genetic correlation. G_New, G_1, G_Across, and the method described by Erbe et al. (2012) directly estimate the correct genetic correlation. G_Chen does not directly estimate the genetic correlation, but the estimate can be corrected using the scaling factors. Those five methods assume that allele substitution effects are independent from allele frequency across loci, similar to method 1 of VanRaden (2008). This is in contrast to another regularly used method, namely method 2 of VanRaden (2008), also described by Yang et al. (2010). This method yields a valid definition of relationships between individuals only when the average effect at a locus is proportional to the reciprocal of the square root of expected heterozygosity at that locus (Appendix, Equation A8). So this method assumes that, across loci, allele substitution effects are fully dependent on their allele frequency, with larger effects for rarer alleles. For traits determined by relatively few genes and undergoing directional selection, this assumption may be plausible, since selection acts more strongly on causal loci with larger effects (Haldane 1924; Wright 1931, 1937). It is, however, a very strong assumption in general. Many traits may experience only weak selection,

Figure 3 Estimated genetic variances in population 1 when causal loci were in LE. The estimated genetic variance in population 1 in each of the 100 replicates using the genomic relationship matrix derived in this study (G_New) vs. the estimated genetic variance using population-specific allele frequencies and either a genomic relationship matrix with scaling factors set to 1 (G_1), or based on the method of Chen et al. (2013) (G_Chen), or using allele frequencies across populations (G_Across).

Figure 4 Estimated genetic variances in population 2 when causal loci were in LE. The estimated genetic variance in population 2 in each of the 100 replicates using the genomic relationship matrix derived in this study (G_New) vs. the estimated genetic variance using population-specific allele frequencies and either a genomic relationship matrix with scaling factors set to 1 (G_1), or based on the method of Chen et al. (2013) (G_Chen), or using allele frequencies across populations (G_Across).


and/or are determined by many genes. In those cases, the allele frequency distribution is determined mainly by the interplay of mutation and drift, and a direct relationship between effect size and allele frequency is not expected. Therefore, the assumption that across loci allele substitution effects and allele frequency are independent seems more realistic for most traits. Moreover, when across loci allele substitution effects depend on allele frequency, effects for exactly the same trait would differ between populations when allele frequencies differ. This makes the interpretation of estimated genetic correlations between populations using method 2 of VanRaden (2008) rather difficult. For those reasons, we advise the use of G matrices based on method 1 instead of method 2 of VanRaden (2008), especially when multiple populations are considered.

For a specific trait, relationships between individuals are defined by the relationships at causal loci for that trait. Because LD between causal loci surfaces in the genomic relationships, LD between causal loci does not create bias in estimated genetic (co)variances and correlations. Since causal loci are generally unknown, genomic marker data are used to estimate genomic relationships. By using markers, differences in LD between markers and causal loci can reduce the estimated genetic correlation (Gianola et al. 2015). This may be especially important for estimating genetic correlations between populations, since the strength and phase of LD differ across populations in humans (Sawyer et al. 2005), livestock (e.g., Heifetz et al. 2005; Gautier et al. 2007; Veroneze et al. 2013), and plants (Flint-Garcia et al. 2003; Lehermeier et al. 2014). Moreover, markers might not explain all genetic variance (e.g., Yang et al. 2010; Daetwyler et al. 2013), which can affect the estimated genetic correlation when the genetic effects captured by markers have a higher or lower genetic correlation than the part not captured (Bulik-Sullivan et al. 2015). Here, the focus was to theoretically define the multi-population genomic relationship matrix. Since the true relationships between individuals for a certain trait are defined at the causal loci, we used genotypes

Figure 5 Estimated genetic correlations between population 1 and 2 when causal loci were in LE. The estimated genetic correlation between population 1 and 2 in each of the 100 replicates using the genomic relationship matrix derived in this study (G_New) vs. the estimated genetic correlation using population-specific allele frequencies and either a genomic relationship matrix with scaling factors set to 1 (G_1), based on the method of Chen et al. (2013) (G_Chen), or using allele frequencies across populations (G_Across).

Figure 6 Boxplot of estimated genetic correlations using four methods to calculate genomic relationships with causal loci in LE. The estimated genetic correlation between population 1 and 2 in each of the 100 replicates using the genomic relationship matrix derived in this study (G_New), using population-specific allele frequencies and either a genomic relationship matrix with scaling factors set to 1 (G_1), or based on the method of Chen et al. (2013) (G_Chen), or using allele frequencies across populations (G_Across). The simulated genetic correlation was 0.5.


of causal loci to define G. A clear definition of the genomic relationships is the essential starting point for estimating genomic relationships using marker information.

Other approaches to estimate genetic correlations between populations

We focused on using genomic relationships in a multi-trait model to estimate genetic correlations between populations. Genetic correlations can also be estimated using summary statistics of genome-wide association studies (GWAS) (Bulik-Sullivan et al. 2015; Brown et al. 2016) or using random regression on genotypes (Sørensen et al. 2012; Krag et al. 2013). The method based on summary statistics of GWAS combines information from multiple studies and weights estimated marker effects by LD overlap and corresponding z score (Bulik-Sullivan et al. 2015; Brown et al. 2016). This method is beneficial when collecting enough data is expensive and data sharing is not possible. It is, however, not known whether this method estimates the correct genetic correlation. The method using random regression on genotypes is equivalent to the multi-trait GREML method used in this study, since both estimate the same additive genetic values when genotypes are centered and scaled in the same way (Habier et al. 2007; VanRaden 2008; Goddard 2009). Variances estimated with random regression on centered genotypes represent variances of allele substitution effects across loci (Meuwissen et al. 2001), similar to G_1. Hence, random regression on centered genotypes can also be used to estimate genetic correlations between populations. When genotypes for random regression are centered and scaled, estimated genetic correlations become equal to the estimates using G based on method 2 of VanRaden (VanRaden 2008; Yang et al. 2010). Therefore, the interpretation of this estimated genetic correlation remains unclear as well.
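The equivalence between random regression on centered genotypes and a model based on G can be checked numerically for a single population. The sketch below assumes a common variance for allele substitution effects across loci; the variable names and the value 0.02 are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(7)
n_ind, n_loci = 6, 50
x = rng.binomial(2, 0.4, size=(n_ind, n_loci)).astype(float)
p = x.mean(axis=0) / 2.0
z = x - 2.0 * p                     # centered (not scaled) genotypes
k = np.sum(2.0 * p * (1.0 - p))     # scaling factor of VanRaden method 1

sigma2_alpha = 0.02                 # variance of allele substitution effects (made up)

# Random regression: a = Z alpha with var(alpha) = I * sigma2_alpha
cov_regression = z @ z.T * sigma2_alpha

# Model with G = ZZ'/k: var(a) = G * sigma2_A with sigma2_A = k * sigma2_alpha
g = z @ z.T / k
sigma2_a = k * sigma2_alpha
cov_greml = g * sigma2_a

print(np.allclose(cov_regression, cov_greml))  # True
```

With the scaling factor set to 1, as in G_1, the variance component of the G-based model is the variance of allele substitution effects itself.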

Importance of the genetic correlation between populations

Populations differ in both genetic and environmental factors, which can result in considerable differences in the expression of complex traits across populations. The genetic correlation between populations provides insight into the differences in

Table 1 Overview of frequently used methods to calculate G across populations, with scaling and correction factors

| Method of calculating G (a) | Described by | Scaling factor $k_1$ (b, c) | Scaling factor $k_2$ (b, c) | Scaling factor $k_{12}$ (b, c) | Correction factor for the genetic correlation |
| --- | --- | --- | --- | --- | --- |
| G_New | This study | $\sum 2p_{1i}(1-p_{1i})$ | $\sum 2p_{2i}(1-p_{2i})$ | $\sqrt{\sum 2p_{1i}(1-p_{1i})}\,\sqrt{\sum 2p_{2i}(1-p_{2i})}$ | Not needed |
| G_1 | This study | 1 | 1 | 1 | Not needed |
| G_Chen | Chen et al. (2013) | $\sum 2p_{1i}(1-p_{1i})$ | $\sum 2p_{2i}(1-p_{2i})$ | $\sum 2\sqrt{p_{1i}(1-p_{1i})\,p_{2i}(1-p_{2i})}$ | $\dfrac{\sqrt{\sum 2p_{1i}(1-p_{1i})}\,\sqrt{\sum 2p_{2i}(1-p_{2i})}}{\sum 2\sqrt{p_{1i}(1-p_{1i})\,p_{2i}(1-p_{2i})}}$ |
| G_Across | VanRaden (2008), Makgahlela et al. (2013) | $\sum 2\bar{p}_i(1-\bar{p}_i)$ | $\sum 2\bar{p}_i(1-\bar{p}_i)$ | $\sum 2\bar{p}_i(1-\bar{p}_i)$ | Not needed |
| Erbe | Erbe et al. (2012) | $\sum 2p^*_i(1-p^*_i)$ | $\sum 2p^*_i(1-p^*_i)$ | $\sum 2p^*_i(1-p^*_i)$ | Not needed |
| VanRaden method 2/Yang | VanRaden (2008), Yang et al. (2010) | No. of loci (d) | No. of loci (d) | No. of loci (d) | Unknown |

(a) Methods were compared assuming that no adjustment for inbreeding or regression toward the pedigree relationship matrix was performed.

(b) $k_1$ is the scaling factor of the block containing relationships in population 1, $k_2$ is the scaling factor of the block containing relationships in population 2, and $k_{12}$ is the scaling factor of the block containing relationships between population 1 and 2.

(c) $p_{1i}$ is the allele frequency in population 1, $p_{2i}$ is the allele frequency in population 2, $\bar{p}_i$ is the average allele frequency across populations, and $p^*_i$ is the estimated allele frequency when the populations separated.

(d) Per locus i, genotypes are scaled by $\sqrt{2p_i(1-p_i)}$.

Figure 7 Boxplot of estimated genetic correlations using four methods to calculate genomic relationships with causal loci in LD. The estimated genetic correlation between population 1 and 2 in each of the 100 replicates using the genomic relationship matrix derived in this study (G_New), using population-specific allele frequencies and either a genomic relationship matrix with scaling factors set to 1 (G_1), or based on the method of Chen et al. (2013) (G_Chen), or using allele frequencies across populations (G_Across). The simulated genetic correlation was 0.5.


genetic architecture of traits across populations (Brown et al. 2016). A low genetic correlation between populations indicates that causal loci have different effects and/or that different causal loci are underlying the trait. This information has important implications for transferring the results of biomedical studies or GWAS from one population to another. Moreover, the genetic correlation provides insight into the potential to use information across populations for genomic prediction. When the genetic correlation is low, the accuracy of estimated genetic values is unlikely to increase by combining populations in one training population or by using information about the location of causal variants across populations, as is done in multi-task Bayesian models (Chen et al. 2014; Technow and Totir 2015), since effects and locations of causal loci are likely to be different.

Another factor affecting the benefit of sharing information across populations is the relatedness between the populations, which is expected to be at least partly related to the genetic correlation between populations. More distantly related populations generally have more different allele frequencies due to an accumulation of the effects of selection and genetic drift over generations (e.g., Falconer and Mackay 1996). In combination with nonadditive effects (Fisher 1918, 1930), those differences in allele frequencies reduce the genetic correlation. The genetic correlation, however, differs across traits (e.g., Karoui et al. 2012; Zhou et al. 2014; Brown et al. 2016) and is also affected by differences in the environments of the populations (Falconer 1952). This shows the importance of investigating the genetic correlation for the trait of interest as well as the relatedness between the populations when deciding to use information across populations.

For implementing multi-population genomic prediction, explicit and accurate knowledge of genetic (co)variances and correlations is not required. Therefore, accuracy of estimated genetic values is quite consistent across methods for calculating G (Makgahlela et al. 2013, 2014; Lourenco et al. 2016). For predicting the accuracy in those scenarios, however, an accurate estimation of genetic correlations is essential (Wientjes et al. 2015, 2016). Generally, combining populations is beneficial when the training population for one of the populations is small and the genetic relatedness and correlation between the populations are high, which is for example the case between subpopulations from the same breed kept in different environments.

Conclusions

The properties of genomic relationships affect estimates of genetic variances within as well as genetic correlations between populations. For estimating current genetic variances, allele frequencies of the current population should be used to calculate relationships within that population. For estimating genetic correlations between populations, scaling factors of the different blocks of the relationship matrix, based on method 1 of VanRaden (2008), should fulfill the property $k_{12} = \sqrt{k_1}\sqrt{k_2}$. When this property is not fulfilled, estimated genetic correlations can be corrected by multiplying the estimate by $\sqrt{k_1}\sqrt{k_2}/k_{12}$. In this study we present a genomic relationship matrix, G_New, which directly results in current genetic variances as well as genetic correlations between populations.

Acknowledgments

This research is supported by the Netherlands Organisation of Scientific Research (NWO) and the Breed4Food consortium partners Cobb Europe, CRV, Hendrix Genetics, and Topigs Norsvin.

Literature Cited

Bohren, B. B., W. G. Hill, and A. Robertson, 1966 Some observations on asymmetrical correlated responses to selection. Genet. Res. 7: 44–57.

Brown, B. C., C. J. Ye, A. L. Price, and N. Zaitlen, 2016 Transethnic genetic-correlation estimates from summary statistics. Am. J. Hum. Genet. 99: 76–88.

Bulik-Sullivan, B., H. K. Finucane, V. Anttila, A. Gusev, F. R. Day et al., 2015 An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47: 1236–1241.

Bulmer, M. G., 1971 The effect of selection on genetic variability. Am. Nat. 105: 201–211.

Chen, L., F. Schenkel, M. Vinsky, D. Crews, and C. Li, 2013 Accuracy of predicting genomic breeding values for residual feed intake in Angus and Charolais beef cattle. J. Anim. Sci. 91: 4669–4678.

Chen, L., C. Li, S. Miller, and F. Schenkel, 2014 Multi-population genomic prediction using a multi-task Bayesian learning model. BMC Genet. 15: 53.

Daetwyler, H. D., M. P. L. Calus, R. Pong-Wong, G. De los Campos, and J. M. Hickey, 2013 Genomic prediction in animals and plants: simulation of data, validation, reporting, and benchmarking. Genetics 193: 347–365.

De Candia, T. R., S. H. Lee, J. Yang, B. L. Browning, P. V. Gejman et al., 2013 Additive genetic variation in schizophrenia risk is shared by populations of African and European descent. Am. J. Hum. Genet. 93: 463–470.

Erbe, M., B. J. Hayes, L. K. Matukumalli, S. Goswami, P. J. Bowman et al., 2012 Improving accuracy of genomic predictions within and between dairy cattle breeds with imputed high-density single nucleotide polymorphism panels. J. Dairy Sci. 95: 4114–4129.

Falconer, D. S., 1952 The problem of environment and selection. Am. Nat. 86: 293–298.

Falconer, D. S., and T. F. C. Mackay, 1996 Introduction to Quantitative Genetics. Pearson Education Limited, Harlow, United Kingdom.

Fisher, R. A., 1918 The correlation between relatives on the supposition of Mendelian inheritance. Trans. R. Soc. Edinb. 52: 399–433.

Fisher, R. A., 1930 The Genetical Theory of Natural Selection. Oxford University Press, Oxford.

Flint-Garcia, S. A., J. M. Thornsberry, and E. S. Buckler, IV, 2003 Structure of linkage disequilibrium in plants. Annu. Rev. Plant Biol. 54: 357–374.

Gautier, M., T. Faraut, K. Moazami-Goudarzi, V. Navratil, M. Foglio et al., 2007 Genetic and haplotypic structure in 14 European and African cattle breeds. Genetics 177: 1059–1070.


Gianola, D., G. De los Campos, M. A. Toro, H. Naya, C.-C. Schön et al., 2015 Do molecular markers inform about pleiotropy? Genetics 201: 23–29.

Gilmour, A. R., B. J. Gogel, B. R. Cullis, S. J. Welham, and R. Thompson, 2015 ASReml User Guide Release 4.1. VSN International Ltd, Hemel Hempstead, United Kingdom.

Goddard, M. E., 2009 Genomic selection: prediction of accuracy and maximisation of long term response. Genetica 136: 245–257.

Habier, D., R. L. Fernando, and J. C. M. Dekkers, 2007 The impact of genetic relationship information on genome-assisted breeding values. Genetics 177: 2389–2397.

Haldane, J. B. S., 1924 A mathematical theory of natural and artificial selection. Part I. Trans. Camb. Phil. Soc. 23: 19–41.

Harris, B. L., and D. L. Johnson, 2010 Genomic predictions for New Zealand dairy bulls and integration with national genetic evaluation. J. Dairy Sci. 93: 1243–1252.

Hayes, B. J., P. M. Visscher, and M. E. Goddard, 2009 Increased accuracy of artificial selection by using the realized relationship matrix. Genet. Res. 91: 47–60.

Heifetz, E. M., J. E. Fulton, N. O’Sullivan, H. Zhao, J. C. M. Dekkers et al., 2005 Extent and consistency across generations of linkage disequilibrium in commercial layer chicken breeding populations. Genetics 171: 1173–1181.

Henderson, C. R., 1984 Applications of Linear Models in Animal Breeding. University of Guelph, Guelph, Canada.

Jorjani, H., L. Klei, and U. Emanuelson, 2003 A simple method for weighted bending of genetic (co)variance matrices. J. Dairy Sci. 86: 677–679.

Karoui, S., M. Carabaño, C. Díaz, and A. Legarra, 2012 Joint genomic evaluation of French dairy cattle breeds using multiple-trait models. Genet. Sel. Evol. 44: 39.

Krag, K., N. A. Poulsen, M. K. Larsen, L. B. Larsen, L. L. Janss et al., 2013 Genetic parameters for milk fatty acids in Danish Holstein cattle based on SNP markers using a Bayesian approach. BMC Genet. 14: 79.

Legarra, A., 2016 Comparing estimates of genetic variance across different relationship models. Theor. Popul. Biol. 107: 26–30.

Lehermeier, C., N. Krämer, E. Bauer, C. Bauland, C. Camisan et al., 2014 Usefulness of multiparental populations of maize (Zea mays L.) for genome-based prediction. Genetics 198: 3–16.

Lehermeier, C., C.-C. Schön, and G. De los Campos, 2015 Assessment of genetic heterogeneity in structured plant populations using multivariate whole-genome regression models. Genetics 201: 323–337.

Lourenco, D. A. L., S. Tsuruta, B. O. Fragomeni, C. Y. Chen, W. O. Herring et al., 2016 Crossbreed evaluations in single-step genomic best linear unbiased predictor using adjusted realized relationship matrices. J. Anim. Sci. 94: 909–919.

Makgahlela, M. L., I. Strandén, U. S. Nielsen, M. J. Sillanpää, and E. A. Mäntysaari, 2013 The estimation of genomic relationships using breedwise allele frequencies among animals in multibreed populations. J. Dairy Sci. 96: 5364–5375.

Makgahlela, M. L., I. Strandén, U. S. Nielsen, M. J. Sillanpää, and E. A. Mäntysaari, 2014 Using the unified relationship matrix adjusted by breed-wise allele frequencies in genomic evaluation of a multibreed population. J. Dairy Sci. 97: 1117–1127.

Meuwissen, T. H. E., B. J. Hayes, and M. E. Goddard, 2001 Prediction of total genetic value using genome-wide dense marker maps. Genetics 157: 1819–1829.

Olson, K. M., P. M. VanRaden, and M. E. Tooker, 2012 Multibreed genomic evaluations using purebred Holsteins, Jerseys, and Brown Swiss. J. Dairy Sci. 95: 5378–5383.

Sargolzaei, M., and F. S. Schenkel, 2009 QMSim: a large-scale genome simulator for livestock. Bioinformatics 25: 680–681.

Sawyer, S. L., N. Mukherjee, A. J. Pakstis, L. Feuk, J. R. Kidd et al., 2005 Linkage disequilibrium patterns vary substantially among populations. Eur. J. Hum. Genet. 13: 677–686.

Sørensen, L. P., L. Janss, P. Madsen, T. Mark, and M. S. Lund, 2012 Estimation of (co)variances for genomic regions of flexible sizes: application to complex infectious udder diseases in dairy cattle. Genet. Sel. Evol. 44: 18.

Speed, D., and D. J. Balding, 2015 Relatedness in the post-genomic era: is it still useful? Nat. Rev. Genet. 16: 33–44.

Strandén, I., and O. F. Christensen, 2011 Allele coding in genomic evaluation. Genet. Sel. Evol. 43: 25.

Technow, F., and L. R. Totir, 2015 Using Bayesian multilevel whole genome regression models for partial pooling of training sets in genomic prediction. G3 5: 1603–1612.

VanRaden, P. M., 2008 Efficient methods to compute genomic predictions. J. Dairy Sci. 91: 4414–4423.

Veroneze, R., P. S. Lopes, S. E. F. Guimarães, F. F. Silva, M. S. Lopes et al., 2013 Linkage disequilibrium and haplotype block structure in six commercial pig lines. J. Anim. Sci. 91: 3493–3501.

Visscher, P. M., G. Hemani, A. A. E. Vinkhuyzen, G.-B. Chen, S. H. Lee et al., 2014 Statistical power to detect genetic (co)variance of complex traits using SNP data in unrelated samples. PLoS Genet. 10: e1004269.

Wientjes, Y. C. J., R. F. Veerkamp, P. Bijma, H. Bovenhuis, C. Schrooten et al., 2015 Empirical and deterministic accuracies of across-population genomic prediction. Genet. Sel. Evol. 47: 5.

Wientjes, Y. C. J., P. Bijma, R. F. Veerkamp, and M. P. L. Calus, 2016 An equation to predict the accuracy of genomic values by combining data from multiple traits, breeds, lines, or environments. Genetics 202: 799–823.

Wright, S., 1931 Evolution in Mendelian populations. Genetics 16: 97–159.

Wright, S., 1937 The distribution of gene frequencies in populations. Proc. Natl. Acad. Sci. USA 23: 307–320.

Yang, J., B. Benyamin, B. P. McEvoy, S. Gordon, A. K. Henders et al., 2010 Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42: 565–569.

Zhou, L., M. S. Lund, Y. Wang, and G. Su, 2014 Genomic predictions across Nordic Holstein and Nordic Red using the genomic best linear unbiased prediction model with different genomic relationship matrices. J. Anim. Breed. Genet. 131: 249–257.

Communicating editor: M. Sillanpaa


Appendix

The G matrix based on method 2 of VanRaden (2008) and Yang et al. (2010), G_VR2, weights each locus by the reciprocal of the square root of the variance of its genotypes. In this Appendix, it is shown that this is only correct under the assumption that the variance of the average effect ($\alpha$) at a locus, say l, is inversely proportional to expected heterozygosity at that locus,

$$\sigma^2_{\alpha_l} = \frac{c}{2p_l(1-p_l)}, \tag{A1}$$

where c is a constant and $p_l$ is the allele frequency at locus l.

Consider the single-trait mixed model $\mathbf{y} = \mathbf{Xb} + \mathbf{Za} + \mathbf{e}$, where $\mathbf{a}$ is the vector of random additive genetic effects, with $\operatorname{var}(\mathbf{a}) = \mathbf{G}\sigma^2_A$. This mixed model is valid only when $\mathbf{G}\sigma^2_A$ indeed represents the covariances between additive genetic effects (A) of individuals. This requires that

$$G_{ij} = \frac{\operatorname{cov}(A_i, A_j)}{\operatorname{var}(A)}, \tag{A2}$$

where i and j are individuals.

By definition, the additive genetic effect of an individual is the sum of the average effects at its loci, weighted by the centered allele count (Fisher 1918; Falconer and Mackay 1996),

$$A_i = \sum_l (x_{il} - 2p_l)\,\alpha_l, \tag{A3}$$

where $x_{il}$ is the allele count of individual i at locus l, taking values 0, 1, or 2. Thus,

$$\operatorname{cov}(A_i, A_j) = \operatorname{cov}\left[\sum_l (x_{il} - 2p_l)\,\alpha_l,\ \sum_l (x_{jl} - 2p_l)\,\alpha_l\right]. \tag{A4}$$

For the genetic covariance, the $(x_{il} - 2p_l)\alpha_l$ terms are independent between loci by definition when there is no selection (Bulmer 1971), so that the covariance reduces to

$$\operatorname{cov}(A_i, A_j) = \sum_l (x_{il} - 2p_l)(x_{jl} - 2p_l)\,\sigma^2_{\alpha_l}. \tag{A5}$$

Substituting the relationship between average effects and allele frequency given by Equation A1 yields

$$\operatorname{cov}(A_i, A_j) = c \sum_l \left[\frac{(x_{il} - 2p_l)(x_{jl} - 2p_l)}{2p_l(1-p_l)}\right]. \tag{A6}$$

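Analogously, the genetic variance equals

$$\operatorname{var}(A) = \sum_l 2p_l(1-p_l)\,\sigma^2_{\alpha_l} = \sum_l c = n_l c,$$

where $n_l$ is the number of loci. Finally, from Equation A2,

$$G_{ij} = \frac{\operatorname{cov}(A_i, A_j)}{\operatorname{var}(A)} = \frac{1}{n_l} \sum_l \left[\frac{(x_{il} - 2p_l)(x_{jl} - 2p_l)}{2p_l(1-p_l)}\right], \tag{A7}$$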


which is G_VR2. Thus obtaining G_VR2 requires Equation A1.

Hence, G_VR2 is valid under the assumption that the magnitude of the average effect at a locus is proportional to the reciprocal of the square root of expected heterozygosity at that locus,

$$\alpha_l \propto \frac{1}{\sqrt{2p_l(1-p_l)}}. \tag{A8}$$

Equation A7 shows that elements of G_VR2 are the genome-wide average of the correlations at individual loci; the term in square brackets is the correlation between additive genetic effects at locus l, and the sum of these terms is divided by the number of loci. Thus G_VR2 may have been motivated as the genome-wide average of relationships at individual loci.
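The step from Equation A5 to Equation A7 can be verified numerically. The sketch below is only an illustration: genotypes are simulated, allele frequencies are treated as known, and the constant c = 0.5 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
n_ind, n_loci = 5, 40
p = rng.uniform(0.1, 0.9, size=n_loci)   # allele frequencies (treated as known)
x = rng.binomial(2, p, size=(n_ind, n_loci)).astype(float)
z = x - 2.0 * p                          # centered allele counts (x_il - 2 p_l)
h = 2.0 * p * (1.0 - p)                  # expected heterozygosity per locus

c = 0.5                                  # arbitrary constant of Equation A1
sigma2_alpha = c / h                     # Equation A1

cov_a = (z * sigma2_alpha) @ z.T         # Equation A5 for all pairs (i, j)
var_a = np.sum(h * sigma2_alpha)         # genetic variance, equals n_loci * c
g_from_cov = cov_a / var_a               # Equation A2

g_vr2 = (z / h) @ z.T / n_loci           # Equation A7

print(np.allclose(g_from_cov, g_vr2))    # True
```

The constant c cancels in the ratio, so any positive value gives the same G_VR2.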

However, relatedness refers to the correlation between the total additive genetic effects of individuals (Equation A2), which are sums of additive genetic effects at individual loci. In general, the correlation between sums does not equal the average correlation between components of the sums,

$$G_{ij} \neq \frac{1}{n_l} \sum_l G_{ijl}, \tag{A9}$$

but is defined as the ratio of the covariance and variance of the sum,

$$G_{ij} = \frac{\operatorname{cov}(A_i, A_j)}{\sigma^2_A}. \tag{A10}$$

Equation A9 and Equation A10 are only equal to each other under the assumption given in Equation A1.
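The difference between a ratio of sums and an average of per-locus ratios can be illustrated by comparing method 1 and method 2 of VanRaden (2008) on the same genotypes. This sketch assumes, for method 1, that the variance of average effects is the same at every locus. The function names and simulated data are hypothetical.

```python
import numpy as np

def g_method1(z, p):
    """Ratio of sums: ZZ' / sum(2p(1-p)), as in method 1 of VanRaden (2008)."""
    return z @ z.T / np.sum(2.0 * p * (1.0 - p))

def g_method2(z, p):
    """Average of per-locus ratios, as in method 2 of VanRaden (2008)."""
    h = 2.0 * p * (1.0 - p)
    return (z / h) @ z.T / z.shape[1]

rng = np.random.default_rng(11)
n_ind, n_loci = 5, 60

# Case 1: heterozygosity differs across loci, so the two definitions differ.
p_var = rng.uniform(0.05, 0.95, size=n_loci)
z_var = rng.binomial(2, p_var, size=(n_ind, n_loci)) - 2.0 * p_var
differ = not np.allclose(g_method1(z_var, p_var), g_method2(z_var, p_var))

# Case 2: identical allele frequency at every locus, so they coincide.
p_eq = np.full(n_loci, 0.3)
z_eq = rng.binomial(2, p_eq, size=(n_ind, n_loci)) - 2.0 * p_eq
same = np.allclose(g_method1(z_eq, p_eq), g_method2(z_eq, p_eq))
```

In the second case, expected heterozygosity is constant across loci, so Equation A1 and the assumption of a common variance of average effects describe the same situation and the two matrices agree.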
