
Estimation of Genotype Effects for Milk Proteins with Animal and Sire Transmitting Ability Models

THOMAS R. FAMULA and JUAN FERNANDO MEDRANO
Department of Animal Science
University of California
Davis 95616

Received August 12, 1993.
Accepted May 3, 1994.

ABSTRACT

The objective of this work was to estimate the contribution of milk protein genotype to production traits of Holstein cattle. Two approaches were employed. The first was based on an animal model analysis of milk records in which cows were genotyped for milk protein variants. The second approach was the analysis of PTA of sires in which the sires were genotyped for their milk protein variants. Results of the animal model analysis agreed qualitatively with previously published analyses of cow records, indicating only a minor contribution of milk protein genotype to production traits. Qualitative similarity was also found between the analysis of sire transmitting abilities and the animal model. Algebraic results suggested, however, that an indirect analysis of published sire transmitting abilities does not provide unbiased estimates of differences in milk protein genotypes. Although analysis of PTA is simple, only a direct analysis of genotyped cows with an animal model provides unbiased estimates of genotype differences. (Key words: animal model, milk protein genotype, sire transmitting ability)

Abbreviation key: QTL = quantitative trait loci.

INTRODUCTION

The expanding technologies of molecular biology cause animal breeders to approach selection decisions in new and challenging ways. Traditionally, animal breeders have worked with models of continuously distributed phenotypes and an infinite number of loci. Current developments in molecular genetics permit identification of single loci. This ability requires that the contributions of such loci to production traits be determined. Moreover, when possible, efficient means are needed for including this information in breeding decisions.

For dairy breeding, several methods have emerged for identification of quantitative trait loci (QTL) for production traits that can be applied to field-collected data (7, 13, 25). The focus for many of these methods is often a marker locus [see (8)]: a segregating gene that may be linked to a useful QTL but is not necessarily a QTL itself. The objective is to make use of the marker as a pointer to a nearby QTL to enhance the efficiency of selection decisions.

Our objective was to examine methods of QTL detection for which candidate genes have been proposed. One example of such a search is the work of Van Eenennaam and Medrano (24), in which cows were unambiguously genotyped at several casein loci and at the β-LG locus to estimate the contribution of each locus to production of milk, fat, and total milk protein. Other investigators (5, 6, 22) estimated the contribution of potential QTL to production traits using the PTA of dairy sires for these traits as the dependent variable. The PTA thus serves as the observation of interest rather than as the direct measure of performance for daughters of these sires. The comparison of these two methods (cows and phenotypes vs. sires and PTA), particularly the theoretical differences, was the focus of this investigation.

One objective of this work was to ascertain the contribution of milk protein genotype to production traits of Holstein cattle. Two approaches were discussed. The first method was analogous to that used by Van Eenennaam and Medrano (24), in which the milk production records of cows are directly analyzed in a mixed linear model and each record is classified by the genotype of each milk protein. The analysis incorporates all known additive relationships as part of the estimation of genotype contributions. Kennedy et al. (14) discussed the advantages of this approach. The second method of QTL detection used PTA for Holstein bulls, classified by the milk protein genotype of the bull. This method is the statistical approach adopted by Cowan et al. (5) for their analysis of prolactin contributions, Cowan et al. (6) for the analysis of milk protein effects, and Shanks et al. (22) for the analysis of uridine monophosphate synthase deficiency and its effect on production traits. Our intent is to extend the work of Kennedy et al. (14) and to demonstrate that the analysis of sire PTA is no substitute for the analysis with animal models.

1994 J Dairy Sci 77:3153-3162

MATERIALS AND METHODS

To address the quantification of contributions of milk protein genotypes to production traits, two data files were used. The first contained 915 milk records in which each Holstein cow was individually genotyped at several milk protein loci. These data were first analyzed with a completely fixed linear model using least squares, as was done by Van Eenennaam and Medrano (24). Subsequently, the same data were analyzed by a mixed linear model, including all known relationships among cows in the data. The second data file was PTA taken from the USDA evaluations of 200 Holstein bulls from January 1992. Each bull had also been genotyped at several milk protein loci. The contribution of these loci to production traits was evaluated indirectly through the predicted genetic merit of the sires rather than through the direct observation of genotyped cows.

Direct Analysis of Milk Records: Animal Model

Milk samples were collected from 10 cooperating dairy herds distributed throughout California. This process and the techniques of milk protein determination were described by Van Eenennaam and Medrano (24). For each milk sample from the 915 first lactation cows (by 187 sires), the genotype was determined for β-LG, β-CN, and κ-CN. The number of distinct genotypic classes was 3 for β-LG, 8 for β-CN, and 3 for κ-CN. For the purposes of this analysis, the 8 β-CN genotypic classes were reduced to 2 classes, AA and AB, such that all A alleles were grouped together, rather than treated as separate alleles, A1, A2, and A3, to facilitate comparison of the direct analysis with results of the indirect analysis of sire transmitting abilities to follow. Sire genotypes were determined by DNA analysis, and only A and B alleles of β-CN were classified. The analysis of these data by Van Eenennaam and Medrano (24) was limited to a completely fixed linear model. The proposed analysis used an animal model to account for the remaining polygenic variation not explained by milk protein genotype. A more complete discussion of the use of mixed linear models for the analysis of candidate QTL was given by Kennedy et al. (14). For this analysis, the mixed linear model was

$$ y = Xb + Qg + Za + e \tag*{[1]} $$

where y is a vector of 915 milk yields of cows (e.g., total milk, fat, and protein records); X is a known incidence matrix relating herd and season effects to observations in y; b is an 18 x 1 vector of unknown fixed effects of herd, season, and age; Q is a known incidence matrix relating fixed effects of milk genotypes to records of cows in y; g is an 8 x 1 vector of unknown contributions of milk protein genotypes for the three β-LG classes, the two β-CN classes, and the three κ-CN classes; and Z is a known incidence matrix relating observations in y to the remaining additive polygenic contributions to phenotype represented in the random vector a. The vector e represents unknown random residuals. Both a and e are assumed to have null means and covariance structure

$$ \mathrm{Var}\begin{bmatrix} a \\ e \end{bmatrix} = \begin{bmatrix} A\gamma & 0 \\ 0 & I \end{bmatrix}\sigma_e^2 $$

where $\gamma$ is the ratio of polygenic additive genetic variance, $\sigma_a^2$, to the residual variance, $\sigma_e^2$ (i.e., $\gamma = \sigma_a^2/\sigma_e^2$), and A is the numerator relationship matrix among animals represented in vector a. The order of the vector a is 1963 when animals without production records are included in the analysis to permit appropriate construction of matrix A and, more importantly, $A^{-1}$. These additional animals were the sires and dams of the genotyped cows and were included without records of their own because they were not genotyped for milk proteins.

Journal of Dairy Science Vol. 77, No. 10, 1994

For fixed effects in b, the data were collected on 10 California dairy herds (24). In addition, phenotypes were classified into groups for four seasons of calving and seven ages at first calving. Seasonal groups divided the calendar year into four 3-mo periods (i.e., December to February, March to May, June to August, and September to November). Age at calving was classified into seven groups (<690 d, 690 to 749 d, 750 to 809 d, 810 to 869 d, 870 to 929 d, 930 to 989 d, and >990 d).
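The season and age groupings described above can be sketched as two small classification functions. This is an illustrative sketch only: the function names are hypothetical, and the treatment of exactly 990 d as belonging to the oldest class is an assumption, because the text lists the last class as ">990 d".

```python
def season_group(month: int) -> int:
    """Return the calving season (1-4) for a calendar month (1-12).

    1 = December to February, 2 = March to May,
    3 = June to August, 4 = September to November.
    """
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    # December (12) wraps around to 0 so that Dec, Jan, Feb share a group.
    return (month % 12) // 3 + 1


def age_class(age_days: int) -> int:
    """Return the age-at-first-calving class (1-7) for an age in days.

    Classes follow the 60-d bands given in the text; assigning 990 d to
    class 7 is an assumption of this sketch.
    """
    if age_days < 690:
        return 1
    if age_days >= 990:
        return 7
    # 690-749 -> 2, 750-809 -> 3, ..., 930-989 -> 6
    return (age_days - 690) // 60 + 2


if __name__ == "__main__":
    print(season_group(1), age_class(760))
```

Each record then contributes one herd, one season, and one age column to the incidence matrix X.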

The improvement to analysis offered by Model [1] over that used by Van Eenennaam and Medrano (24) can be found in the inclusion of a. First, this term allows a more realistic representation of the data in a linear model; i.e., the study of single gene contributions is not reason enough to discount the contributions of many genes, each of small effect. A second reason is more statistical; regardless of the success of randomizing in the sampling of herds and primiparous cows, estimates of milk genotype difference can be influenced by the random segregation of other QTL to specific genotypes for milk proteins. In effect, quantitative contributions may segregate to one genotypic class or another by chance alone. Inclusion of a, coupled with the use of $A^{-1}$ in the mixed model equations, reduces the effects of any such random segregation (14, 23).

Consideration of a adds several difficulties to the analysis, including computational difficulties caused by the addition of >1000 equations. However, the computational complexity does not preclude the general application of mixed models to this type of data. Hypothesis testing for mixed models relies on prior knowledge of variance components, at least to proportionality; however, the separation of potential QTL from the remaining polygenic fraction alters our a priori knowledge of ratio $\gamma$. Our analysis reestimates ratio $\gamma$ with milk protein genotype in the model via derivative-free REML (9, 19).
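As a minimal sketch of how a model of the form y = Xb + Qg + Za + e can be solved through mixed model equations, the NumPy example below uses a toy pedigree of one sire without a record and four half-sib daughters. The data, the function name `solve_mme`, and the value of the variance ratio are all hypothetical; the genotype effect is coded as a single AB-versus-AA contrast so that the fixed effects are of full rank, which is an assumption of this sketch and not the parameterization of the paper. With $\gamma$ defined as the ratio of additive genetic to residual variance, the block for a receives $A^{-1}/\gamma$.

```python
import numpy as np


def solve_mme(y, X, Q, Z, A, gamma):
    """Solve mixed model equations for y = Xb + Qg + Za + e.

    gamma = additive genetic variance / residual variance, so the
    equations for a are augmented by inv(A) / gamma.
    Returns (b_hat, g_hat, a_hat).
    """
    W = np.hstack([X, Q, Z])
    lhs = W.T @ W
    n_fixed = X.shape[1] + Q.shape[1]
    lhs[n_fixed:, n_fixed:] += np.linalg.inv(A) / gamma
    rhs = W.T @ y
    sol = np.linalg.solve(lhs, rhs)
    p, k = X.shape[1], Q.shape[1]
    return sol[:p], sol[p:p + k], sol[p + k:]


# Toy pedigree (hypothetical): animal 0 is the sire, animals 1-4 are his
# half-sib daughters out of unrelated dams.
A = np.array([
    [1.00, 0.50, 0.50, 0.50, 0.50],
    [0.50, 1.00, 0.25, 0.25, 0.25],
    [0.50, 0.25, 1.00, 0.25, 0.25],
    [0.50, 0.25, 0.25, 1.00, 0.25],
    [0.50, 0.25, 0.25, 0.25, 1.00],
])
X = np.ones((4, 1))                            # overall mean only
Q = np.array([[0.0], [0.0], [1.0], [1.0]])     # 1 = AB, 0 = AA
Z = np.hstack([np.zeros((4, 1)), np.eye(4)])   # sire has no record
y = np.array([98.0, 104.0, 109.0, 113.0])      # made-up records

b_hat, g_hat, a_hat = solve_mme(y, X, Q, Z, A, gamma=0.5)
print("AB - AA contrast:", g_hat[0])
```

The sire appears in a only through the relationship matrix, which mirrors the way animals without records were added to build A in the analysis described above.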

Indirect Analysis of Sire PTA

Several AI companies (American Breeders Service, DeForest, WI; Eastern AI Cooperative, Ithaca, NY; Landmark Genetics, Hughson, CA; Select Sires, Inc., Plain City, OH; Tri-State Breeders Cooperative, Baraboo, WI; and 21st Century Genetics Cooperative, Shawano, WI) provided samples of blood or semen from 200 Holstein bulls chosen at their discretion. The DNA was extracted from these samples and subsequently used to classify bulls for allelic genetic variants of β-LG, β-CN, and κ-CN. Procedures for extraction and classification were described by Medrano and Aguilar-Cordova (16, 17) and Medrano and Sharrow (18). For these data, published bull evaluations for production traits could readily be related to their known genotypes for milk proteins. However, this type of analysis may not provide a correct assessment of the contribution of these loci to production traits.

Analysis of predictions of genetic merit, classified by milk protein genotype, follows from a model analogous to Model [1] in which y is a vector of 382 PTA for milk, fat, and other production traits (the additional 182 PTA are those of sires included to build relationships among the 200 genotyped sires); b is a scalar unknown constant; X is a column vector of unity values; Q is a known incidence matrix relating fixed effects of milk genotypes of the bulls to PTA in y; g is an 8 x 1 vector of unknown genotype effects for milk proteins for the three β-LG classes, the two β-CN classes, and the three κ-CN classes; Z is an incidence matrix relating PTA in vector y to the remaining polygenic components represented in vector a; and e is a vector of random, uncorrelated residuals. The random effects, a and e, have an assumed covariance structure of

$$ \mathrm{Var}\begin{bmatrix} a \\ e \end{bmatrix} = \begin{bmatrix} A\gamma & 0 \\ 0 & D \end{bmatrix}\sigma_e^2 $$

where $\gamma$ is the ratio of additive genetic variance of PTA, $\sigma_a^2$, to the residual genetic variance of PTA, $\sigma_e^2$; A is the numerator relationship matrix among animals represented in a; and D is a diagonal matrix of coefficients for the residual variance. The precise variance of PTA is based on elements of A and the inverse of the coefficient matrix of the mixed model equations used to predict the PTA (12). Therefore, the variance of PTA is approximated by accounting for the differing accuracies of evaluation of each sire. The process is identical to that discussed by Cowan et al. (5). Elements of D are determined by the published reliability (accuracy) of the PTA [see (26)].

With the appropriate form of D (and its inverse), genotype differences are estimated with the mixed model equations, and hypotheses are tested accordingly (12). This analysis and equivalent approaches have been presented by other investigators (5). The assumption of these models is that the covariance among PTA is only due to relationships. However, the actual model for PTA involves elements of the inverse of the mixed model equations used in their prediction and covariances between genetic and residual effects (10). The model used herein and elsewhere [e.g., (5)], simplifies the analysis with the assumption that PTA can be equated to the actual transmitting abilities, with simultaneous adjustment for the unequal variance of each PTA.
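The text does not give the formula that converts a published reliability into an element of D. The sketch below therefore rests on a loudly stated assumption: each diagonal element is taken as (1 - reliability) / reliability, so that a less reliable PTA receives a larger residual coefficient. Both the formula and the function names are hypothetical illustrations, not the procedure of the cited references.

```python
def residual_coefficient(reliability: float) -> float:
    """Diagonal element of D for one sire.

    ASSUMPTION: d = (1 - reliability) / reliability. The source only says
    that D is determined by published reliability; this form is a stand-in.
    """
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    return (1.0 - reliability) / reliability


def build_d_diagonal(reliabilities):
    """Return the diagonal of D for a list of published reliabilities."""
    return [residual_coefficient(r) for r in reliabilities]


if __name__ == "__main__":
    # Hypothetical reliabilities for three sires.
    print(build_d_diagonal([0.99, 0.80, 0.50]))
```

Whatever form is used, the inverse of D enters the mixed model equations as a set of weights, so sires with more accurate evaluations contribute more to the estimated genotype differences.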

RESULTS

Table 1 presents the frequencies of β-LG, β-CN, and κ-CN genotypes for each class of the 915 first lactation cows and the 200 bulls. Deviations of genotype frequencies between first lactation cows and bulls were examined with a chi-square test. Deviations were significant (P < .001) only at the β-CN locus. The frequency of AB heterozygote bulls was higher than expected, possibly because of the nonrandom distribution of two heterozygote grandsires that were represented for 10 of the AB bulls. Of course, selection among sires also may have played a role, although the β-CN locus did not have a significant effect on any production trait (see Table 3).

TABLE 1. Genotypic frequencies and observed number of the β-LG, β-CN, and κ-CN genotypes for 915 first lactation cows and 200 bulls of the Holstein breed.

| Locus | Genotype | Cows: Frequency (%) | Cows: Number | Bulls: Frequency (%) | Bulls: Number |
|---|---|---|---|---|---|
| β-LG | AA | 16.7 | 153 | 18.0 | 36 |
| β-LG | AB | 51.4 | 470 | 48.5 | 97 |
| β-LG | BB | 31.9 | 292 | 33.5 | 67 |
| κ-CN | AA | 67.8 | 620 | 74.5 | 149 |
| κ-CN | AB | 29.2 | 267 | 23.5 | 47 |
| κ-CN | BB | 3.1 | 28 | 2.0 | 4 |
| β-CN | AA | 95.8 | 877 | 90.0 | 180 |
| β-CN | AB | 4.2 | 38 | 10.0 | 20 |
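The comparison of genotype counts between cows and bulls can be sketched with a Pearson chi-square statistic computed from the numbers reported in Table 1. The sketch assumes a plain contingency-table test without continuity correction, which the text does not specify; the function name is hypothetical, and the critical values quoted in the comments are the standard upper .001 points of the chi-square distribution.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a
    rows x columns table of observed counts (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Observed numbers from Table 1: first row cows, second row bulls.
counts = {
    "beta-LG": [[153, 470, 292], [36, 97, 67]],
    "kappa-CN": [[620, 267, 28], [149, 47, 4]],
    "beta-CN": [[877, 38], [180, 20]],
}

# Upper .001 critical values: 10.828 for 1 df, 13.816 for 2 df.
for locus, table in counts.items():
    stat, df = chi_square(table)
    print(f"{locus}: chi-square = {stat:.2f}, df = {df}")
```

Under these assumptions only the β-CN statistic exceeds its .001 critical value, which is in line with the result stated in the text.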

Table 2 summarizes estimates of genotype effects for milk proteins across the five major breeds of dairy cattle that are analogous to those of Van Eenennaam and Medrano (24). The present analysis was restricted to the 915 Holstein records. As in the original analysis of Van Eenennaam and Medrano (24), the estimates of genotype differences in Table 2 are based on a fixed linear model and least squares, ignoring any additional polygenic contribution to phenotype beyond milk protein genes. Table 3 presents an analysis of the same Holstein data that incorporated a polygenic contribution to phenotype (using Model [1] and all known relationships among females), which is additional to effects induced by milk protein genotype.

Comparison of the results of two analyses of the same data is not a definitive method for contrasting models. A comparison of models is based on how the data are created, sampled, and used to estimate unknown parameters. Thus, comparison of results in Tables 2 and 3 is based on the assumption that the mixed model estimates (Table 3) are more accurate than those from the fixed analysis (Table 2). The failure of the fixed analysis to incorporate known covariances among cows because of additional polygenic inheritance supports that assumption. A comparison of standard errors for contrasts across Tables 2 and 3 reveals little change in accuracy that is due to the choice of analysis. However, such a conclusion implies that the standard errors of Table 2 (the fixed model analysis) are correct. Henderson (11) demonstrated that the true standard errors of estimation are inflated when random effects are ignored in the analysis but exist in the true model. The computed standard errors in Table 2 are underestimates of the true values for these variances because they were computed from the diagonal elements of the generalized inverse of the left-hand-side of the least squares equations. Results in Table 3, based on the mixed model analysis, should provide not only more accurate estimates of the genotype contrasts, but also more accurate estimates of the variability of those contrasts. The similarity of standard errors across Tables 2 and 3 is artificial, because the values in Table 2 are computed without consideration of the random polygenic effects that are ignored, but assumed to exist, for milk production traits.

TABLE 2. Estimates (Est.) of genotype differences for milk proteins of β-LG, κ-CN, and β-CN for production traits of Holsteins from a least squares analysis of the data presented by Van Eenennaam and Medrano (24) on 915 first lactation cows.

| Genotype difference | Milk Est. | Milk SE | Fat Est. | Fat SE | Protein Est. | Protein SE | Percentage of fat Est. | Percentage of fat SE | Percentage of protein Est. | Percentage of protein SE |
|---|---|---|---|---|---|---|---|---|---|---|
| β-LG AB to β-LG AA | -29 | 120 | -2.0 | 4.5 | -3.1 | 3.6 | -.01 | .03 | -.03 | .02 |
| β-LG BB to β-LG AA | 4 8 | 129 | -.2 | 4.8 | -5.0 | 3.9 | .02 | .04 | -.04 | .02* |
| κ-CN AB to κ-CN AA | 164 | 98† | 3.4 | 3.1 | 5.6 | 3.0† | -.03 | .03 | 0 | .01 |
| κ-CN BB to κ-CN AA | 297 | 255 | 2.1 | 9.5 | 9.9 | 7.1 | -.10 | .Ol | 0 | .04 |
| β-CN AB to β-CN AA | -386 | 227† | -12.6 | 8.5 | -9.0 | 6.9 | -.01 | .06 | .02 | .03 |

†P < .10. *P < .05.

A nonspecific review of the results in Tables 2 and 3 reveals the general similarity of genotype differences for milk proteins. When the results of Tables 2 and 3 are compared, distinctions in point estimates of genotype differences generally are small and inconsequential. For example, differences in 305-d milk production among κ-CN genotypes are similar, although not identical, as shown in Tables 2 and 3. According to both the fixed and mixed analyses, the difference between the AA and AB genotypes of κ-CN is approximately 150 kg of milk yield (P < .1). The difference between the BB and AA genotypes of κ-CN remains nonsignificant across Tables 2 and 3. The low frequency of the B allele in the Holstein population (24) provides for few BB cows and, thus, a somewhat higher standard error for the estimates of genotype differences. The inclusion of more BB cows in the data would provide more power for this comparison. Earlier studies (1, 21) using more cows than in the present study indicated a superiority of the B allele; however, a significant effect of κ-CN on milk yield has yet to be demonstrated.

Moreover, in the work of Van Eenennaam and Medrano (24), the allelic frequencies were examined for 1965 [see Table 2 (24)]. Allelic frequencies have remained stable for the last 30 yr in spite of intensive selection for milk production. These alleles therefore may have little positive association with milk production.

TABLE 3. Estimates (Est.) of genotype differences for milk proteins of β-LG, κ-CN, and β-CN for production traits of Holsteins from a mixed model analysis of the data presented by Van Eenennaam and Medrano (24) on 915 first lactation cows.

| Genotype difference | Milk Est. | Milk SE | Fat Est. | Fat SE | Protein Est. | Protein SE | Percentage of fat Est. | Percentage of fat SE | Percentage of protein Est. | Percentage of protein SE |
|---|---|---|---|---|---|---|---|---|---|---|
| β-LG AB to β-LG AA | -11 | 118 | -1.9 | 4.4 | -2.5 | 3.6 | -.01 | .03 | -.03 | .02† |
| β-LG BB to β-LG AA | -3 | 129 | 1.1 | 4.8 | -3.2 | 3.9 | .02 | .04 | -.04 | .02* |
| κ-CN AB to κ-CN AA | 136 | 98† | 3.3 | 3.7 | 5.0 | 3.0† | -.02 | .03 | .01 | .01 |
| κ-CN BB to κ-CN AA | 288 | 252 | 2.9 | 9.5 | 10.5 | 7.6 | -.08 | .Ol | .01 | .04 |
| β-CN AB to β-CN AA | -403 | 227† | -13.2 | 8.5 | -9.5 | 6.9 | -.02 | .06 | .03 | .03 |

†P < .10. *P < .05.

When the results of Tables 2 and 3 are compared, only the contribution of β-LG to the percentage of protein provides for a change in significance (and this difference is nearly imperceptible). Specifically, in the fixed analysis, the AB and AA genotypes for β-LG did not differ significantly for their impact on the percentage of total milk protein, but the mixed model analysis suggested a possible significant difference (P < .1) between these two genotypes.

A comparison of Tables 3 and 4, however, suggests a potentially different role for milk protein genotypes for production traits, although the qualitative difference between results in Tables 3 and 4 is minimal. Table 4 presents results of the indirect analysis of milk protein genotype through PTA of AI sires, assuming that the values represent one-half of the expected difference in milk protein genotypes in Tables 2 and 3. Table 4 shows that no genotypic class for milk proteins contributes significantly to a production trait.

Despite distinction of certain details, comparison of specific contrasts across Tables 3 and 4 yields little valuable information because the values in Table 4 are best interpreted as not different from 0. Thus, according to the indirect analysis, information on milk protein genotype should be ignored, despite a few instances of contrary information in Table 3. These distinctions, from the analysis of different data files, suggest that use of PTA from AI sires to estimate differences in phenotype of progeny may have some inherent problems.

DISCUSSION

The objective of this study was to reevaluate the contribution of milk protein genotype to production traits. The basis of the reevaluation was the comparison of different statistical models and the analysis of different data files. The first comparison was of the results of an animal model analysis, which included terms for the polygenic contribution to phenotype, with the results of a completely fixed model analysis of the same data. A second comparison was between the results of the animal model analysis and the results of separate data collected on AI sires. The second data file related milk protein genotype of AI sires to their computed PTA for production traits.

Theoretically, an animal model is easily adapted to estimation of the contributions of single genotypes to production traits. An animal model permits consideration and correction for the effects of fixed nongenetic contributions to production records, additional polygenic effects, nonadditive genetic components, maternal effects, and additional confounding complexities (14). The principal disadvantage of the animal model is in data collection, because each animal with a recorded phenotype must also be genotyped at the locus under study. Depending on the gene under investigation, the laboratory costs in time and money usually prohibit the analysis of large data files.

TABLE 4. Estimates (Est.) of one-half of the genotype differences for milk proteins of β-LG, κ-CN, and β-CN for production traits of Holsteins from an indirect analysis of PTA of 200 AI sires.

| Genotype difference | Milk Est. | Milk SE | Fat Est. | Fat SE | Protein Est. | Protein SE | Percentage of fat Est. | Percentage of fat SE | Percentage of protein Est. | Percentage of protein SE |
|---|---|---|---|---|---|---|---|---|---|---|
| β-LG AB to β-LG AA | 31 | 43 | 1.5 | 1.5 | .2 | .9 | .01 | .03 | .08 | .09 |
| β-LG BB to β-LG AA | -12 | 46 | 1.9 | 1.6 | 1.1 | 1.0 | -.03 | .03 | .04 | .09 |
| κ-CN AB to κ-CN AA | 41 | 39 | .5 | 1.3 | -.5 | .8 | .02 | .02 | .06 | .08 |
| κ-CN BB to κ-CN AA | 63 | 114 | -.3 | 4.0 | -1.2 | 2.4 | 0 | .07 | -.02 | .23 |
| β-CN AB to β-CN AA | -50 | 45 | 1.4 | 1.6 | 1.5 | 1.0 | -.04 | .03 | 0 | .09 |

†P < .10. *P < .05.

The indirect analysis of sire PTA is a simple and attractive alternative to identify contributions of single genes to production traits. The effect of a single locus is embedded in the computed PTA. Investigators hope to reveal the effect of the single locus through a statistical analysis. Unfortunately, this goal is not feasible without additional information about the putative QTL and the computations of the PTA. Sire PTA is a regressed statistic, and the differences among sires for single genotypes are masked by other genetic contributions and by the statistics involved in the computation of PTA.
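The sense in which a PTA is a regressed statistic can be illustrated with the textbook regression of a sire's transmitting ability on the mean deviation of n daughters, with weight n / (n + k) and k = (4 - h2) / h2. This is a deliberately simplified single-trait, unrelated-dams sketch and not the USDA procedure used to compute the published PTA; the function names and example values are hypothetical.

```python
def regression_weight(n_daughters: int, h2: float) -> float:
    """Weight applied to a progeny mean deviation under a simple sire model.

    k = (4 - h2) / h2 is the ratio of residual to sire variance when
    daughters are paternal half-sibs out of unrelated dams.
    """
    k = (4.0 - h2) / h2
    return n_daughters / (n_daughters + k)


def apparent_half_difference(true_half_difference: float,
                             n_daughters: int, h2: float) -> float:
    """Genotype half-difference as it would appear after regression,
    assuming it is shrunk like any other part of the progeny deviation."""
    return regression_weight(n_daughters, h2) * true_half_difference


if __name__ == "__main__":
    # Hypothetical example: h2 = 0.25, true half-difference of 100 kg.
    for n in (15, 50, 500):
        print(n, round(apparent_half_difference(100.0, n, 0.25), 1))
```

Because the weight depends on the number of daughters, the same true genotype effect would appear as a different quantity for each sire, which is one way to see why an analysis that treats PTA as an observed phenotype does not recover the effect directly.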

To examine the problems of an indirect analysis, Model [1] can be reconsidered as the true model for a production trait, y, where existence of the QTL remains unknown. Accordingly,

$$ y = Xb + Qg + Za + e \tag*{[1]} $$

where Qg represents the contribution of the QTL to be identified, and Xb and Za represent fixed effects and random additive genetic values, respectively. Sire transmitting abilities are included in a. In contrast to Model [1] is the model used for the prediction of sire PTA,

$$ y = Xb + Za + e, \tag*{[2]} $$

which ignores the existence of the QTL represented in Qg of Model [1]. Accordingly, E[y] = Xb + Qg under the true model, but the computational model assumes that E[y] = Xb alone. Consequently, the equations used to predict sire PTA, represented in $\hat{a}$, generate a model for solutions that are not a function of a and e alone. Henderson's (10) notation and concepts on models for solutions were used to derive the following model for predictions of genetic merit (defined as Model [3]):

$$ \begin{aligned} \hat{a} &= [C_{21}X' + C_{22}Z']R^{-1}Qg + [I - C_{22}A^{-1}]a + [C_{21}X' + C_{22}Z']R^{-1}e \\ &= W_1 g + W_2 a + W_3 e \end{aligned} \tag*{[3]} $$

where

$$ \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix} $$

represents the generalized inverse of the coefficient matrix of the mixed model equations constructed from the computational model without Qg (Model [2]). Because a and e are assumed to have null means, Model [3] shows that $E[\hat{a}] = W_1 g$ when a QTL is assumed to exist but not considered in the prediction of a.
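The step from the computational model to Model [3] can be written out explicitly. The derivation below is a sketch under two assumptions that are not stated in the text: the coefficient matrix is nonsingular, so the generalized inverse is an ordinary inverse, and G denotes Var(a), so that the term written with the inverse of A in Model [3] appears here as the inverse of G (the two agree only up to the variance scaling adopted).

```latex
\begin{aligned}
\hat{a} &= [C_{21}X' + C_{22}Z']R^{-1}y
  && \text{(solution of the equations built from Model [2])} \\
        &= [C_{21}X' + C_{22}Z']R^{-1}(Xb + Qg + Za + e)
  && \text{(substituting the true Model [1])} \\[4pt]
[C_{21}X' + C_{22}Z']R^{-1}X &= 0
  && \text{(second block row of } C \text{ times the coefficient matrix)} \\
[C_{21}X' + C_{22}Z']R^{-1}Z &= I - C_{22}G^{-1}
  && \text{(same block row, second column)} \\[4pt]
\hat{a} &= \underbrace{[C_{21}X' + C_{22}Z']R^{-1}Q}_{W_1}\,g
         + \underbrace{[I - C_{22}G^{-1}]}_{W_2}\,a
         + \underbrace{[C_{21}X' + C_{22}Z']R^{-1}}_{W_3}\,e
\end{aligned}
```

The fixed effects in b drop out of the prediction, but the term in g does not, because Q was never part of the equations that produced the predictions.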

Equation [3] is the true model for sire PTA. The indirect analysis of sire PTA is only an approximation of the true model. When PTA is considered to be a phenotype of the sire, and a contribution of the sire's genotype at the putative QTL is included, the computational model for the indirect analysis is

$$ \hat{a} = Pg + Zs + e \tag*{[4]} $$

where P is an incidence matrix relating sire genotype at the putative QTL to PTA. The random vector s represents the remaining polygenic contribution to PTA, and e is an unexplained residual. Both s and e are assumed to have null means. Accordingly, the computational model assumes that $E[\hat{a}] = Pg \neq W_1 g$. Thus, the indirect analysis of sire PTA is biased.

The bias in estimation of g in a sire PTA analysis is found in two places. The first is in the computation of PTA under Model [2]. The incidence matrix P does not contain elements of $C_{21}$ and $C_{22}$, which are part of the prediction of PTA. The second source of bias is the failure to include Q in the PTA analysis. The Q matrix relates QTL genotype of daughters to their own phenotypes. Matrix P relates the QTL genotype of the sire of these daughters to his PTA. Although P and Q are related by the segregation of alleles from parent to progeny, substitution of P for Q is not sufficient to explain the true expected value of $\hat{a}$. The unbiased identification of QTL must rely on animal models, in which phenotypes can be directly assigned to the appropriate QTL class. The simple analysis of sire PTA cannot estimate g without bias.
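A small numerical check can make the structure of the expected value of the predictions concrete. The NumPy sketch below builds a toy pedigree of two sires with three daughters each, predicts additive effects from equations that omit the genotype term, and verifies that the predictions decompose into terms in g, a, and e. All data, variances, and genotype assignments are hypothetical, and G = Var(a) is used where the text writes the inverse of A, which is a scaling assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy pedigree (hypothetical): animals 0 and 1 are sires without records;
# animals 2-4 are daughters of sire 0 and animals 5-7 daughters of sire 1.
q = 8
A = np.eye(q)
families = {0: [2, 3, 4], 1: [5, 6, 7]}
for sire, daughters in families.items():
    for d in daughters:
        A[sire, d] = A[d, sire] = 0.5
        for d2 in daughters:
            if d2 != d:
                A[d, d2] = 0.25

sigma_a2, sigma_e2 = 1.0, 3.0          # assumed variances
G = sigma_a2 * A
n = 6
R_inv = np.eye(n) / sigma_e2
X = np.ones((n, 1))                     # overall mean
Z = np.hstack([np.zeros((n, 2)), np.eye(n)])
# Daughter genotypes, columns [AA, AB]: AB, AB, AA, AA, AA, AB
Q = np.array([[0, 1], [0, 1], [1, 0], [1, 0], [1, 0], [0, 1]], dtype=float)
g = np.array([0.0, 2.0])                # true genotype effects
b = np.array([10.0])

# Coefficient matrix of the equations that ignore Qg, and its inverse.
M = np.block([[X.T @ R_inv @ X, X.T @ R_inv @ Z],
              [Z.T @ R_inv @ X, Z.T @ R_inv @ Z + np.linalg.inv(G)]])
C = np.linalg.inv(M)
p = X.shape[1]
C21, C22 = C[p:, :p], C[p:, p:]
K = (C21 @ X.T + C22 @ Z.T) @ R_inv
W1, W2, W3 = K @ Q, K @ Z, K

# Simulate one data set from the true model and predict a without Qg.
a = rng.multivariate_normal(np.zeros(q), G)
e = rng.normal(0.0, np.sqrt(sigma_e2), n)
y = X @ b + Q @ g + Z @ a + e
rhs = np.concatenate([X.T @ R_inv @ y, Z.T @ R_inv @ y])
a_hat = np.linalg.solve(M, rhs)[p:]

model3 = W1 @ g + W2 @ a + W3 @ e
print("decomposition holds:", np.allclose(a_hat, model3))
print("expected predictions W1 g:", np.round(W1 @ g, 3))
```

The rows of W1 are fractional weights that depend on the pedigree and the variance ratio, so they do not resemble a 0/1 incidence matrix for sire genotype.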

The animal model, however, is not without problems. First, the animal model has additional computational demands not encountered with fixed models. However, these demands can be alleviated by the growing number of statistical packages that accommodate mixed linear models (e.g., SAS). In addition, animal models require a greater demand on laboratory work to identify the genotypes of individuals with production records. To avoid these large sampling costs, Weller et al. (25) introduced granddaughter designs, which, although not ideal, offer a feasible alternative between animal models and the indirect analysis of sire PTA.

Perhaps the most critical problem for identifying QTL for field data is the impact of selection. Selection creates a disequilibrium between the candidate locus (if it is a contributor to the production phenotype) and other polygenic effects with an influence on the trait. Animal models can correct for this effect if relatives are included in the analysis (14). Granddaughter designs (and also the indirect analysis of sire PTA) are equally vulnerable to the effects of selection, particularly when data from AI sires that have already passed their progeny test are evaluated. Such sires generally have favorable polygenes for the trait of study, and differences between QTL genotypes are underestimated. To correct this underestimation, information on relatives could be included, although this solution is only partially helpful.

Whether included in animal models or granddaughter analyses, information on relatives can be accommodated in two ways. The first and simplest is to include pedigree information, expanding the size and detail of the relationship matrix. This method allows for more ties in the data but only partially offsets the complicating effects of selection. The second, which is needed to account for selection, is to include the records of these relatives as well. However, to be useful to the analysis (beyond expanding the pedigree), the animals contributing these additional records must also be genotyped for the given candidate locus. The additional genotyping may be impossible (as in the case of animals that are already deceased) or may simply increase the cost of the research. However, unless the animals are genotyped, their records cannot be added to the analysis. The use of ungenotyped records is equivalent to substitution of the incorrect incidence matrix in a mixed model analysis. Henderson (11) established that the use of incorrect or approximate design matrices, in place of the true incidence matrix, leads to biased estimators of fixed effects. Thus, the correction for selection must rely on use of large data files, with ties back to some unselected base, or else data must be collected from random samples within the population.
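The effect of substituting an approximate incidence matrix can be sketched with a short derivation. The notation is an assumption made for illustration: y is the vector of records, Q the true incidence matrix for genotype effects g, V the variance of y, and \tilde{Q} an approximate matrix with the same dimensions as Q that the analyst uses in its place. Other fixed effects are omitted for simplicity, so this is a simplified version of the general result attributed to Henderson (11), not a restatement of it.

```latex
% True model (assumed for illustration): E(y) = Qg, Var(y) = V.
% The analyst fits \tilde{Q} in place of Q by generalized least squares.
\begin{align}
  \hat{g} &= (\tilde{Q}' V^{-1} \tilde{Q})^{-1} \tilde{Q}' V^{-1} y, \\
  E(\hat{g}) &= (\tilde{Q}' V^{-1} \tilde{Q})^{-1} \tilde{Q}' V^{-1} Q \, g .
\end{align}
% E(\hat{g}) = g for every g only if
% (\tilde{Q}' V^{-1} \tilde{Q})^{-1} \tilde{Q}' V^{-1} Q = I,
% which holds when \tilde{Q} = Q but not in general.
```

Under these assumptions, the estimator is unbiased for arbitrary g only when the fitted matrix reproduces the true one in the sense shown, which is why records without genotypes cannot simply be assigned an approximate genotype class.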

As for the results of the analyses presented here, the production of milk and fat and the fat percentage were uninfluenced by milk protein genotype. This finding agrees with the earlier results of Van Eenennaam and Medrano (24). Moreover, most of the genotype differences (Table 3) do not differ from 0 except for the contribution of the β-LG locus to milk protein percentage. In this case, the analysis of Van Eenennaam and Medrano (24) was unable to detect a significant difference among all breeds. Our analysis of Holsteins only, in a completely fixed model, identifies a difference among homozygotes (Table 2). The mixed model also detects a slight difference (P < .10) between the heterozygote and β-LG AA classes. The higher protein percentage associated with the β-LG genotype (AA greater than AB greater than BB) was demonstrated earlier (4, 15, 19, 20), in contrast to results of Bovenhuis et al. (2). The contrasts in Table 3 can be used to estimate the contribution of the β-LG locus to the genetic variance of protein percentage (3). Given the frequencies estimated by Van Eenennaam and Medrano (24), .43 for the β-LG A allele (.57 for the B allele), the additive genetic variance for protein percentage generated by the β-LG locus is .00017; dominance variance is 2.5 × 10⁻⁵.

For a model that did not include milk protein genotype (but included all other terms of Model [1]), the estimated additive genetic variance was .0177 [using a derivative-free REML program, DFREML (9, 19)]. The estimated residual variance was .0149. Thus, although genotypic contrasts were significantly different from zero, the β-LG locus was responsible for 4% of the total genetic variance in milk protein percentage.
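The step from genotypic contrasts and allele frequencies to locus-specific variances can be sketched with the standard single-locus expressions of quantitative genetics. The parameterization is an assumption made for illustration: genotypic values are +a for AA, d for AB, and -a for BB, p is the frequency of allele A, and the population is in Hardy-Weinberg proportions. The values of a and d in the usage lines are hypothetical and are not taken from Table 3.

```python
def single_locus_variances(p, a, d):
    """Additive and dominance variance generated by one biallelic locus.

    Assumptions (illustrative only): genotypic values are +a (AA), d (AB),
    and -a (BB); p is the frequency of allele A; genotypes occur in
    Hardy-Weinberg proportions.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    q = 1.0 - p
    # Average effect of an allele substitution.
    alpha = a + d * (q - p)
    additive_variance = 2.0 * p * q * alpha ** 2
    dominance_variance = (2.0 * p * q * d) ** 2
    return additive_variance, dominance_variance


if __name__ == "__main__":
    # p = .43 is the reported frequency of the A allele; a and d below are
    # hypothetical values chosen only to show how the function is called.
    var_a, var_d = single_locus_variances(p=0.43, a=0.02, d=0.01)
    print(f"additive variance:  {var_a:.6f}")
    print(f"dominance variance: {var_d:.6f}")
```

Dividing the locus-specific additive variance by an estimate of the total additive genetic variance then gives the share of genetic variance attributable to the locus.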

CONCLUSIONS

The animal model analysis presented in this report attempted to identify several production traits influenced by milk protein genotype. However, milk production traits did not appear to be significantly affected by milk protein genotype, regardless of the analysis used. Milk production of AB κ-CN cows was only slightly greater than the milk production of their AA herdmates, and the significance of this difference was questionable. The β-CN locus may also have had an impact on milk production; AA cows produced approximately 400 kg (-27) more milk than their AB herdmates. However, this difference also failed to reach a level of significance commensurate with genes of major effect.

Identification of QTL from candidate genes should be conducted with production phenotypes of genotyped individuals rather than an indirect analysis of sire PTA. This conclusion is based on theoretical considerations rather than on the specific results of our two analyses. The identification of single genes with large effect is a difficult task when data are collected from the field. Screening of populations through an indirect sire analysis is a simple and effective beginning to this search, but the potential for drawing incorrect conclusions from an analysis by sire only can be considerable. Investigators must weigh the risk of misleading conclusions against the significant cost of an animal model approach. The animal model analysis should be considered to be the standard method to ascertain the existence and magnitude of QTL for production traits from field data.

ACKNOWLEDGMENTS

Appreciation is extended to K. Meyer, who provided the DFREML program, and to C. M. Finley for his assistance in computation. This work was supported with funding from the California Milk Advisory Board.

REFERENCES

1 Aleandri, R., L. G. Buttazzoni, J. C. Schneider, A. Caroli, and R. Davoli. 1990. The effects of milk protein polymorphisms on milk components and cheese-producing ability. J. Dairy Sci. 73:241.

2 Bovenhuis, H., J.A.M. van Arendonk, and S. Korver. 1992. Associations between milk protein polymorphisms and milk production traits. J. Dairy Sci. 75:2549.

3 Bulmer, M. G. 1985. The Mathematical Theory of Quantitative Genetics. Clarendon Press, Oxford, England.

4 Cerbulis, J., and J. M. Farrel. 1975. Composition of milks of dairy cattle. I. Protein, lactose and fat contents and distribution of protein fraction. J. Dairy Sci. 58:817.

5 Cowan, C. M., M. R. Dentine, R. L. Ax, and L. A. Schuler. 1990. Structural variation around prolactin gene linked to quantitative traits in an elite Holstein sire family. Theor. Appl. Genet. 79:577.

6 Cowan, C. M., M. R. Dentine, and T. Coyle. 1992. Chromosome substitution effects associated with κ-casein and β-lactoglobulin in Holstein cattle. J. Dairy Sci. 75:1097.

7 Dentine, M. R., and C. M. Cowan. 1990. An analytical model for the estimation of chromosome substitution effects in the offspring of individuals heterozygous at a segregating marker locus. Theor. Appl. Genet. 79:775.

8 Fernando, R. L., and M. Grossman. 1989. Marker assisted selection using best linear unbiased prediction. Genet. Sel. Evol. 21:467.

9 Graser, H. U., S. P. Smith, and B. Tier. 1987. A derivative-free approach for estimating variance components in animal models by restricted maximum likelihood. J. Anim. Sci. 64:1362.

10 Henderson, C. R. 1973. Sire evaluation and genetic trends. Page 10 in Proc. Anim. Breeding Genet. Symp. in Honor of Dr. J. L. Lush, Am. Soc. Anim. Sci., Am. Dairy Sci. Assoc., Champaign, IL.

11 Henderson, C. R. 1975. Comparison of alternative sire evaluation methods. J. Anim. Sci. 41:760.

12 Henderson, C. R. 1984. Applications of linear models in animal breeding. Univ. Guelph, Guelph, ON, Canada.

13 Hoeschele, I. 1988. Genetic evaluation with data presenting evidence of mixed major gene and polygenic inheritance. Theor. Appl. Genet. 76:81.

14 Kennedy, B. W., M. Quinton, and J.A.M. van Arendonk. 1992. Estimation of effects of single genes on quantitative traits. J. Anim. Sci. 70:2000.

15 McLean, D. M., E.R.B. Graham, R. W. Ponzoni, and H. A. McKenzie. 1984. Effects of milk protein genetic variants on milk yield and composition. J. Dairy Res. 51:531.

16 Medrano, J. F., and E. Aguilar-Cordova. 1990. Genotyping of bovine kappa-casein loci following DNA sequence amplification. Bio/Technology 8:144.

17 Medrano, J. F., and E. Aguilar-Cordova. 1990. Polymerase chain reaction amplification of bovine β-lactoglobulin genomic sequences and identification of genetic variants by RFLP analysis. Anim. Biotechnol. 1:73.

18 Medrano, J. F., and L. Sharrow. 1991. Genotyping of bovine β-casein loci by restriction site modification of polymerase chain reaction (PCR) amplified DNA. J. Dairy Sci. 74(Suppl. 1):282.(Abstr.)

19 Meyer, K. 1988. DFREML. Programs to estimate variance components for individual animal models by restricted maximum likelihood. User notes. Edinburgh Univ., Edinburgh, Scotland.

20 Ng-Kwai-Hang, K. F., J. F. Hayes, J. E. Moxley, and H. G. Monardes. 1984. Association of genetic variants of casein and milk serum proteins with milk, fat and protein production by dairy cattle. J. Dairy Sci. 67:835.

21 Ng-Kwai-Hang, K. F., H. G. Monardes, and J. F. Hayes. 1990. Association between genetic polymorphism of milk proteins and production traits during three lactations. J. Dairy Sci. 73:3414.

22 Shanks, R. D., J. L. Robinson, and M. H. Healy. 1986. Productive and reproductive performance of cattle heterozygous for deficiency of uridine monophosphate synthase. Proc. 3rd World Congr. Genet. Appl. Livest. Prod., Lincoln, NE XI:78.

23 Sorensen, D. A., and B. W. Kennedy. 1983. The use of the relationship matrix to account for genetic drift variance in the analysis of genetic experiments. Theor. Appl. Genet. 66:217.

24 Van Eenennaam, A. L., and J. F. Medrano. 1991. Milk protein polymorphisms in California dairy cattle. J. Dairy Sci. 74:1730.

25 Weller, J. I., Y. Kashi, and M. Soller. 1990. Power of daughter and granddaughter designs for determining linkage between marker loci and quantitative trait loci in dairy cattle. J. Dairy Sci. 73:2525.

26 Wiggans, G. R., H. D. Norman, and R. L. Powell. 1984. Changes in genetic evaluation procedures for January 1984. DHI Lett. 60:1.
