Center of Statistical Genetics, University of Pisa
Traceability of cattle breeds by DNA analysis
Silvano Presciuttini
Limoges, 29 June 2007
Center of Statistical Genetics
S. Presciuttini – University of Pisa
What is traceability in the food chain?
Traceability of animal products to their source breed is fundamental to guaranteeing food quality, safety and authenticity, and it protects both consumers and producers from fraud.
In 2002, a European Union regulation defined traceability as "the ability to trace and follow food, feed and ingredients through all stages of production, processing and distribution".
Three levels of traceability
INDIVIDUAL: At the time of slaughter, a small tissue sample is taken from each carcass; this sample provides a unique DNA profile of the animal.
POPULATION OR BREED: Breed traceability is essential whenever products with Protected Designation of Origin (PDO) or Protected Geographical Indication (PGI) are obtained from animals of particular breeds for which cost, logistic or technical reasons make individual traceability impractical or impossible.
SPECIES: Methods based on the analysis of DNA fragments help identify the species in sterilized fish products, e.g. canned tuna.
The importance of breed traceability
Breed names are increasingly used as brand names, and food producers show growing interest in the ability to assign anonymous samples to known populations. Tests of breed identity would therefore be a valuable means of validating the quality and origin of livestock products.
Tracing the breed of origin of animal products is an opportunity to promote local genetic resources, with benefits for the local economy, breed valorization and the sustainable conservation of biodiversity.
For these reasons breed traceability is an important research topic, particularly in Mediterranean countries (Italy, Spain and France), where a large number of typical products are declared to be mono-breed.
Breed assignment by DNA analysis
Two different approaches are possible:
1) BASED ON ANONYMOUS MARKERS: If allele frequencies vary substantially among breeds, a large number of loci typed in an individual may provide enough statistical power to assign it to its true breed of origin. This approach requires building a database of allele frequencies in the relevant breeds.
2) BASED ON BREED-SPECIFIC TRAIT LOCI: If we identify the loci and alleles responsible for a breed's phenotypic characteristics, we can assign individuals to their breed by inferring the phenotype from the genotype.
Use of microsatellites to assign individuals to populations: an example from humans
Background
Brescia ranks among the Italian towns with the highest proportion of censused immigrants from non-EU countries (about 10% of the local resident population).
Most blood crimes in this area happen within ethnically defined groups.
A test that could assign a biological stain to a subject from a particular population would be highly welcome.
Population samples
Unrelated individuals from the historical series of the Institute of Legal Medicine of the University of Brescia, including a large number of immigrants from a variety of countries, were selected for the present analysis.
In addition, blood samples from subjects of known ethnicity were collected from the local hospital, following IRB approval.
Subjects were typed with the Profiler Plus™ and SGM Plus™ kits (13 loci in total), following standard protocols.
The composition of the final sample is shown below:
Ethnic group                        Sample size
Italians (born in Brescia)          120
Indians                             38
Maghrebians                         69
Mongolians                          21
Blacks (sub-Saharan countries)      122
Slavonians (including Albanians)    66
Total                               436
Population assignment test (1)
Theoretically, assigning an individual to a given population on the basis of its multilocus genotype is surprisingly easy.
Sample BL-160 was typed at 13 STR loci (VWA, D7S820, D8S1179, FGA, TH01, D21S11, D2S1338, D3S1358, D5S818, D13S317, D16S539, D18S51, D19S433). Its genotype, as allele codes in the slide's column order: 5/6, 2/9, 11/11, 2/8, 4/5, 5/9, 3/4, 4/7, 4/4, 6/7, 9/9, 4/7, 5/8.
Example locus D19S433, genotype 2/8:
Allele frequencies: Blacks 0.053 and 0.096; Italians 0.005 and 0.027
Genotype probabilities: Blacks 0.0102; Italians 0.0003
Likelihood ratio: 38.1 at this locus; cumulative LR over all 13 loci: 3,699.9
The frequency of the D19S433 2/8 genotype is 38-fold higher in Blacks than in Italians.
The likelihood of the BL-160 multilocus genotype is about 3,700 times higher in Blacks than in Italians.
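The per-locus calculation above can be reproduced with a short Python sketch. It assumes Hardy-Weinberg proportions (p² for a homozygote, 2pq for a heterozygote), which match the slide's figures (2 × 0.053 × 0.096 ≈ 0.0102), and independence between loci, so that the cumulative LR is the product of the single-locus LRs. The function names are illustrative, and only the D19S433 frequencies reported on the slides are used.

```python
from math import prod


def genotype_probability(freqs, a1, a2):
    """Probability of genotype a1/a2, assuming Hardy-Weinberg proportions."""
    p, q = freqs[a1], freqs[a2]
    return p * p if a1 == a2 else 2 * p * q


def cumulative_lr(genotype, freqs_pop1, freqs_pop2):
    """Multilocus LR of pop1 vs pop2, assuming independent loci.

    genotype:   {locus: (allele1, allele2)}
    freqs_pop*: {locus: {allele: frequency}}
    """
    return prod(
        genotype_probability(freqs_pop1[locus], a1, a2)
        / genotype_probability(freqs_pop2[locus], a1, a2)
        for locus, (a1, a2) in genotype.items()
    )


# Locus D19S433, genotype 2/8, with the allele frequencies shown on the slides
blacks = {"D19S433": {2: 0.052885, 8: 0.096154}}
italians = {"D19S433": {2: 0.005, 8: 0.026667}}
lr = cumulative_lr({"D19S433": (2, 8)}, blacks, italians)
print(f"LR = {lr:.1f}")  # LR = 38.1
```

Adding the other 12 loci to the genotype and frequency dictionaries would yield the cumulative LR, which the slide reports as about 3,700 for BL-160.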
Population assignment test (2)
Sample NE-160, same 13-locus genotype (allele codes): 5/6, 2/9, 11/11, 2/8, 4/5, 5/9, 3/4, 4/7, 4/4, 6/7, 9/9, 4/7, 5/8.
Example locus D19S433, genotype 2/8:
Allele frequencies: Negroids 0.052885 and 0.096154; Italians 0 and 0.026667
Genotype probabilities: Negroids 0.01017; Italians 0
LR (Negroid vs Italian): #DIV/0! at this locus, and therefore #DIV/0! for the cumulative LR
Loci: D13S317, D16S539, D18S51, D19S433, D21S11, D2S1338, D3S1358, D5S818, VWA, D7S820, D8S1179, FGA, TH01
A computational problem (a division by zero) arises when a particular allele is absent from a population sample; in this case, an arbitrary frequency must be assigned to that allele in that population.
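A minimal sketch of this fix replaces the frequency of an unobserved allele with a fixed floor. The floor of 1/200 is the value reported for the Blacks vs Italians comparison; the function names are hypothetical, and the remaining frequencies are not renormalized after the substitution.

```python
MISSING_ALLELE_FREQ = 1 / 200  # value reported for the Blacks vs Italians comparison


def allele_freq(freqs, allele, floor=MISSING_ALLELE_FREQ):
    """Observed frequency, or the arbitrary floor if the allele was never seen."""
    f = freqs.get(allele, 0.0)
    return f if f > 0 else floor


def locus_lr(a1, a2, freqs_pop1, freqs_pop2):
    """Single-locus LR (pop1 vs pop2) with missing alleles replaced by the floor."""

    def genotype_prob(freqs):
        p, q = allele_freq(freqs, a1), allele_freq(freqs, a2)
        return p * p if a1 == a2 else 2 * p * q

    return genotype_prob(freqs_pop1) / genotype_prob(freqs_pop2)


negroids = {2: 0.052885, 8: 0.096154}
italians = {2: 0.0, 8: 0.026667}  # allele 2 not observed in the Italian sample
print(f"{locus_lr(2, 8, negroids, italians):.1f}")  # 38.1 instead of #DIV/0!
```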
Blacks vs Italians
(Figure: log-likelihoods of ITALIANS and BLACKS; both axes show log-likelihood, from -70 to -30.)
We assigned a value of 1/200 (= 0.005) to the frequency of the missing alleles and calculated the likelihoods and the likelihood ratios using the allele frequencies estimated from all available data (about 100 subjects typed for each locus in both samples).
(Figure: Log(likelihood ratio), mean LR ± 95% C.L., for ITALIANS and BLACKS; axis from -20 to 20.)
Simulated samples
To estimate the statistical power of the assignment test more accurately, we simulated 10,000 individuals for both the Italian and the Black samples.
(Figure: LOG(LR) DISTRIBUTIONS IN SAMPLES OF 10,000 SIMULATED INDIVIDUALS. Density of Log(likelihood ratio), from -20 to 20, for ITALIANS and BLACKS, with the test value marked.)
False positives: proportion of Italians that are erroneously classified as Blacks (α).
True positives: proportion of Blacks that are correctly classified as Blacks (1 - β).
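The simulation step can be sketched as follows: draw two alleles per locus at random from a population's allele frequencies, then compute ln(LR) for each simulated individual. This is an illustration under stated assumptions (Hardy-Weinberg proportions, unlinked loci); the two-locus frequencies and the population labels A and B are hypothetical, not the Italian and Black data of the study.

```python
import math
import random

# Hypothetical allele frequencies at two loci (illustrative, NOT the study's data)
POP_A = {"L1": {"x": 0.6, "y": 0.3, "z": 0.1}, "L2": {"u": 0.2, "v": 0.8}}
POP_B = {"L1": {"x": 0.2, "y": 0.3, "z": 0.5}, "L2": {"u": 0.7, "v": 0.3}}


def simulate_genotype(freqs, rng):
    """Draw two alleles per locus independently (Hardy-Weinberg, no linkage)."""
    return {
        locus: tuple(rng.choices(list(t), weights=list(t.values()), k=2))
        for locus, t in freqs.items()
    }


def genotype_prob(t, a1, a2):
    return t[a1] ** 2 if a1 == a2 else 2 * t[a1] * t[a2]


def ln_lr(genotype, freqs_a, freqs_b):
    """Natural log of the multilocus LR, population A vs population B."""
    return sum(
        math.log(genotype_prob(freqs_a[locus], a1, a2) / genotype_prob(freqs_b[locus], a1, a2))
        for locus, (a1, a2) in genotype.items()
    )


def simulate_ln_lr(n, source, rng):
    """ln(LR) values for n individuals simulated from the `source` frequencies."""
    return [ln_lr(simulate_genotype(source, rng), POP_A, POP_B) for _ in range(n)]


rng = random.Random(2007)
from_a = simulate_ln_lr(10_000, POP_A, rng)
from_b = simulate_ln_lr(10_000, POP_B, rng)
mean_a = sum(from_a) / len(from_a)
mean_b = sum(from_b) / len(from_b)
print(f"mean ln(LR): simulated from A {mean_a:.2f}, simulated from B {mean_b:.2f}")
```

Individuals simulated from A have positive ln(LR) on average and those from B negative; the overlap of the two distributions is what limits the power of the test.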
Statistical power of discriminating between the two simulated samples (Blacks vs Italians)
Test value of ln(LR)    α (rate of false positives)     1 - β (rate of true positives)
0.0                     6.6%                            93.7%
1.0                     3.1%                            88.4%
2.0                     1.5%                            80.3%
3.0                     0.6%                            70.3%
4.0                     0.3%                            57.9%
5.0                     0.1%                            45.3%
6.0                     0.01%                           32.9%
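Each row of the table follows from counting how many individuals of each group exceed the test value. This sketch assumes the rule "classify as Black if ln(LR) > test value" (the slides do not say whether the comparison is strict); the five ln(LR) values per group are made up for illustration.

```python
def error_rates(ln_lr_target, ln_lr_other, threshold):
    """Rates for the rule "assign to the target population if ln(LR) > threshold".

    Returns (alpha, power): alpha = share of the other population wrongly
    assigned (false positives), power = 1 - beta = share of the target
    population correctly assigned (true positives).
    """
    alpha = sum(v > threshold for v in ln_lr_other) / len(ln_lr_other)
    power = sum(v > threshold for v in ln_lr_target) / len(ln_lr_target)
    return alpha, power


# Toy ln(LR) values (hypothetical, not the simulated samples of the study)
blacks = [-1.0, 0.5, 2.5, 4.0, 7.0]
italians = [-6.0, -3.0, -0.5, 1.5, 3.5]
for t in range(0, 7):
    alpha, power = error_rates(blacks, italians, float(t))
    print(f"ln(LR) > {t}: alpha = {alpha:.0%}, 1 - beta = {power:.0%}")
```

Raising the test value lowers the false-positive rate at the cost of fewer true positives, which is the trade-off shown in the table.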
Conclusions
The 13 STR loci included in the two commercial kits provided limited but significant power to infer the ethnicity of immigrant groups.
Not surprisingly, the highest level of discrimination was achieved by contrasting Blacks with resident Whites.
When two alternative hypotheses about the ethnic origin of a sample can be formulated with confidence, a population assignment test can already be applied to real cases.
The objective of the present study was to assess the practicability of assigning individuals among four cattle breeds using STRs. This goal was divided into three major tasks: 1) validating the markers used in the assignment tests through an analysis of genetic heterogeneity; 2) calculating the likelihood that each animal originated from its true breed as well as from each of the others; 3) performing a statistical analysis of the assignment tests in terms of sensitivity and specificity.
Chianina (N = 67) is a large, high-priced beef breed that originated in central Italy and is the source of the renowned "Florentine steak".
Charolaise (N = 69) and Limousine (N = 67) are beef breeds of French origin that hold an important share of the Italian beef market.
The Italian Friesian (N = 66) is the main dairy breed reared in Italy, but it is also a relevant source of meat.
Estimate of allele frequencies
When an allele is missing from a putative source breed, the multilocus likelihood is zero and the LR is undefined. Several solutions have been proposed:
1) adding the test genotype to all samples;
2) assigning arbitrarily low values to the missing alleles;
3) replacing them with the inverse of the number of gene copies in each sample;
4) using a uniform prior distribution of allele frequencies.
Since the practical application of assignment tests may ultimately lead to charges of fraud, we devised a conservative method of estimating allele frequencies:
$$p_i = \frac{f_i + 1}{n_i + a}$$
where $f_i$ is the number of copies of an allele observed in breed $i$, $n_i$ is the number of gene copies for that locus in that breed (equal to twice its sample size), and $a$ is the number of alleles at that locus observed in the total sample.
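The conservative estimator takes only a few lines of Python. The function name and the input layout (a flat list of the allele copies observed in one breed at one locus) are assumptions for illustration; the example breed is hypothetical.

```python
from collections import Counter


def conservative_freqs(breed_alleles, total_sample_alleles):
    """p_i = (f_i + 1) / (n_i + a) for each allele seen anywhere at this locus.

    breed_alleles:        the 2N allele copies observed in breed i
    total_sample_alleles: the alleles observed at the locus in all breeds
    """
    alleles = set(total_sample_alleles)
    counts = Counter(breed_alleles)  # f_i for each allele (0 if unseen)
    n_i = len(breed_alleles)         # gene copies in breed i (twice its sample size)
    a = len(alleles)                 # alleles observed at the locus overall
    return {allele: (counts[allele] + 1) / (n_i + a) for allele in alleles}


# Hypothetical breed of 3 animals (6 gene copies); 4 alleles seen in the total sample
freqs = conservative_freqs([12, 12, 13, 13, 13, 15], [11, 12, 13, 15])
print(sorted(freqs.items()))  # [(11, 0.1), (12, 0.3), (13, 0.4), (15, 0.2)]
```

Because one pseudo-count is added per allele, the estimates always sum to 1, and no allele seen in the total sample gets a zero frequency, so the LR is always defined.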
Discrimination between two cattle breeds using 15 STRs
Limousine vs. Charolaise: true positives 0.962; false positives 0.005; probability of assignment 0.995
Charolaise vs. Limousine: true positives 0.995; false positives 0.038; probability of assignment 0.963
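The reported probabilities of assignment appear consistent with Bayes' rule applied to the true- and false-positive rates with equal priors, i.e. TP / (TP + FP): 0.962 / (0.962 + 0.005) ≈ 0.995 and 0.995 / (0.995 + 0.038) ≈ 0.963. This reading is an inference, not stated on the slide; the hypothetical function below simply checks it.

```python
def probability_of_assignment(true_pos, false_pos, prior=0.5):
    """P(animal really is from breed A | test assigns it to A), by Bayes' rule."""
    return true_pos * prior / (true_pos * prior + false_pos * (1 - prior))


print(round(probability_of_assignment(0.962, 0.005), 3))  # 0.995 (Limousine vs. Charolaise)
print(round(probability_of_assignment(0.995, 0.038), 3))  # 0.963 (Charolaise vs. Limousine)
```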
A more complete picture of individual allocation among four cattle breeds
Breed allocation using coat colour loci
Whereas anonymous microsatellite markers have been used extensively for breed allocation (and, more recently, anonymous SNPs have been proposed), coat colour genes have so far received little attention.
However, coat colour has been used as a trademark for different cattle breeds in Europe for at least the past 200 years, so systematic selection has been applied to particular alleles expressed at the level of the colour phenotype.
As a consequence, some breeds carry specific alleles that are directly related to their morphological identification, for example at loci associated with coat colour.
Private alleles and fixed alleles
Breed allocation based on breed-specific trait loci relies on the concepts of "private alleles" and "fixed alleles":
Private alleles: alleles found only in a single population.
Fixed alleles: alleles for which all members of the population under study are homozygous, so that no other allele at this locus segregates in that population.
When we identify an allele that is connected to a phenotypic trait, is private to a particular breed, and is fixed in that breed, identifying that breed as the source of any of its products from which DNA can be amplified is virtually certain.
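Both definitions can be checked mechanically against a table of allele frequencies. The sketch below uses the Silver-locus frequencies reported for four breeds in the Limoges pilot study (Blonde d'Aquitaine, Charolais, Limousine, Simmental); the function names are hypothetical, and "private" and "fixed" refer to the sample, not to the whole breed.

```python
# Silver locus frequencies for four breeds from the Limoges pilot study
SILVER = {
    "Blonde d'Aquitaine": {"D": 0.3, "Dc": 0.0, "Ds": 0.7},
    "Charolais": {"D": 0.0, "Dc": 1.0, "Ds": 0.0},
    "Limousine": {"D": 1.0, "Dc": 0.0, "Ds": 0.0},
    "Simmental": {"D": 0.55, "Dc": 0.0, "Ds": 0.45},
}


def private_alleles(freqs):
    """Alleles observed (f > 0) in exactly one breed."""
    result = {}
    for breed, table in freqs.items():
        for allele, f in table.items():
            elsewhere = any(freqs[b].get(allele, 0.0) > 0 for b in freqs if b != breed)
            if f > 0 and not elsewhere:
                result.setdefault(breed, []).append(allele)
    return result


def fixed_alleles(freqs):
    """Alleles with frequency 1 in a breed (every sampled animal homozygous)."""
    return {b: [a for a, f in t.items() if f == 1.0]
            for b, t in freqs.items() if any(f == 1.0 for f in t.values())}


def diagnostic_alleles(freqs):
    """Alleles that are both private and fixed: near-certain breed markers."""
    private, fixed = private_alleles(freqs), fixed_alleles(freqs)
    result = {}
    for breed, alleles in private.items():
        both = [a for a in alleles if a in fixed.get(breed, [])]
        if both:
            result[breed] = both
    return result


print(diagnostic_alleles(SILVER))  # {'Charolais': ['Dc']}
```

Limousine is fixed for D, but D also occurs in other breeds, so only Charolais Dc qualifies.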
A pilot study at the University of Limoges
The goal of the analysis was to explore the feasibility of using coat colour genes for breed traceability in cattle.
A total of 819 animals from 22 French cattle breeds had been typed by Labogena for three coat colour genes: MC1R, Silver, and Agouti.
After data cleaning (removing duplicated records, animals with one or more blank loci, and breeds with fewer than 25 animals), the final database included 624 animals from 18 breeds, or 34.7 animals per breed on average.
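The three cleaning steps can be sketched as a filter over genotype records. The record layout (an `id`, a `breed`, and one genotype string per locus), the order of the steps, and the rule that only identical records count as duplicates are all assumptions; the example data are synthetic.

```python
from collections import Counter

LOCI = ("MC1R", "Silver", "Agouti")


def clean(records, min_animals=25):
    """Drop duplicated records, animals with any blank locus, then small breeds."""
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))  # identical records count as duplicates
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    complete = [r for r in unique if all(r.get(locus) for locus in LOCI)]
    breed_size = Counter(r["breed"] for r in complete)
    return [r for r in complete if breed_size[r["breed"]] >= min_animals]


records = [{"id": f"A{i}", "breed": "A", "MC1R": "e/e", "Silver": "D/D", "Agouti": "A/A"}
           for i in range(30)]
records.append(dict(records[0]))  # a duplicated record
records[1]["Silver"] = ""         # an animal with a blank locus
records += [{"id": f"B{i}", "breed": "B", "MC1R": "E/e", "Silver": "D/D", "Agouti": "A/A"}
            for i in range(10)]   # a breed with fewer than 25 animals
kept = clean(records)
print(len(kept), {r["breed"] for r in kept})  # 29 {'A'}
```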
Allele frequencies at the three loci
Breed (N): MC1R ED, E, E1, e | Silver D, Dc, Ds | Agouti A, Abr
Abondance (38): 0.08, 0.01, 0, 0.91 | 1, 0, 0 | 0.76, 0.24
Aubrac (49): 0, 0.11, 0.89, 0 | 1, 0, 0 | 1, 0
Blanc Bleu Belge (33): 0.94, 0, 0, 0.06 | 1, 0, 0 | 0.89, 0.11
Blonde d'Aquitaine (33): 0, 0, 0, 1 | 0.3, 0, 0.7 | 1, 0
Brune des Alpes (34): 0, 0.65, 0.35, 0 | 1, 0, 0 | 1, 0
Charolais (34): 0, 0, 0, 1 | 0, 1, 0 | 0.88, 0.12
Créole (32): 0, 0.81, 0, 0.19 | 1, 0, 0 | 1, 0
Gascon (36): 0, 0.39, 0.61, 0 | 1, 0, 0 | 1, 0
Limousine (39): 0, 0, 0, 1 | 1, 0, 0 | 1, 0
Montbéliarde (28): 0, 0, 0, 1 | 1, 0, 0 | 0.89, 0.11
Normande (29): 0, 1, 0, 0 | 1, 0, 0 | 0, 1
Parthenaise (37): 0, 0.35, 0.65, 0 | 1, 0, 0 | 1, 0
Pie Rouge (28): 0.02, 0.14, 0, 0.84 | 1, 0, 0 | 0.98, 0.02
Prim' Holstein (36): 0.93, 0.07, 0, 0 | 1, 0, 0 | 1, 0
Rouge des Près (30): 0, 0, 0, 1 | 1, 0, 0 | 0.32, 0.68
Salers (27): 0, 0.06, 0, 0.94 | 1, 0, 0 | 0.98, 0.02
Simmental (43): 0.02, 0, 0, 0.98 | 0.55, 0, 0.45 | 0.99, 0.01
Tarentaise (38): 0, 0.08, 0.92, 0 | 1, 0, 0 | 1, 0
Grand total (624): 0.111, 0.194, 0.219, 0.477 | 0.877, 0.054, 0.068 | 0.888, 0.112
The Silver Dc allele of Charolais is both fixed and private.
Breeds excluded based on presence/absence of alleles
Based on the allele frequencies, the genotypes of each breed were checked for compatibility with the alleles present in every other breed.
When a genotype included an allele that was missing in another breed, it was declared incompatible with that breed.
The Table shows, for each breed, the number and percentage of breeds with which all animals of that breed are incompatible.
The only breed whose animals are all incompatible with all other breeds is Charolais (100% incompatible breeds), but Normande (82% incompatibilities) and Blanc Bleu Belge, Prim' Holstein, and Tarentaise (76% incompatibilities) are also well discriminated.
Breed: number of incompatible breeds (%)
Abondance: 7 (41%)
Aubrac: 7 (41%)
Blanc Bleu Belge: 13 (76%)
Blonde d'Aquitaine: 8 (47%)
Brune des Alpes: 7 (41%)
Charolais: 17 (100%)
Créole: 1 (6%)
Gascon: 7 (41%)
Limousine: 8 (47%)
Montbéliarde: 8 (47%)
Normande: 14 (82%)
Parthenaise: 7 (41%)
Pie Rouge: 8 (47%)
Prim' Holstein: 13 (76%)
Rouge des Près: 8 (47%)
Salers: 8 (47%)
Simmental: 8 (47%)
Tarentaise: 13 (76%)
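The presence/absence check can be sketched as follows. Frequencies for four breeds (Charolais, Limousine, Montbéliarde, Normande) are copied from the allele-frequency table; the animals are hypothetical genotypes consistent with them, and the function names are illustrative. With four breeds, percentages are computed over 3 other breeds instead of 17.

```python
# Frequencies from the allele-frequency table (unlisted alleles have frequency 0)
FREQS = {
    "Charolais": {"MC1R": {"e": 1.0}, "Silver": {"Dc": 1.0}, "Agouti": {"A": 0.88, "Abr": 0.12}},
    "Limousine": {"MC1R": {"e": 1.0}, "Silver": {"D": 1.0}, "Agouti": {"A": 1.0}},
    "Montbéliarde": {"MC1R": {"e": 1.0}, "Silver": {"D": 1.0}, "Agouti": {"A": 0.89, "Abr": 0.11}},
    "Normande": {"MC1R": {"E": 1.0}, "Silver": {"D": 1.0}, "Agouti": {"Abr": 1.0}},
}

# Hypothetical animals whose genotypes are consistent with those frequencies
ANIMALS = {
    "Charolais": [{"MC1R": ("e", "e"), "Silver": ("Dc", "Dc"), "Agouti": ("A", "A")},
                  {"MC1R": ("e", "e"), "Silver": ("Dc", "Dc"), "Agouti": ("A", "Abr")}],
    "Limousine": [{"MC1R": ("e", "e"), "Silver": ("D", "D"), "Agouti": ("A", "A")}],
    "Montbéliarde": [{"MC1R": ("e", "e"), "Silver": ("D", "D"), "Agouti": ("A", "A")},
                     {"MC1R": ("e", "e"), "Silver": ("D", "D"), "Agouti": ("A", "Abr")}],
    "Normande": [{"MC1R": ("E", "E"), "Silver": ("D", "D"), "Agouti": ("Abr", "Abr")}],
}


def compatible(genotype, breed_freqs):
    """False if the animal carries any allele missing from the breed."""
    return all(breed_freqs[locus].get(allele, 0.0) > 0
               for locus, alleles in genotype.items() for allele in alleles)


def incompatible_breeds(animals, freqs):
    """For each breed: (count, %) of other breeds with which ALL its animals are incompatible."""
    result = {}
    for breed, herd in animals.items():
        others = [b for b in freqs if b != breed]
        excluded = [b for b in others if not any(compatible(g, freqs[b]) for g in herd)]
        result[breed] = (len(excluded), round(100 * len(excluded) / len(others)))
    return result


print(incompatible_breeds(ANIMALS, FREQS))
```

In this toy example Charolais and Normande are incompatible with every other breed, while Limousine and Montbéliarde remain compatible with each other.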
Breeds excluded based on genotype likelihood
A more refined measure of the usefulness of these three markers for cattle breed traceability is based on the calculation of multilocus likelihoods.
From the genotype at the three loci, the likelihood that each animal of each breed originated from its own breed, as well as from every other breed, was calculated and then converted into posterior probabilities assuming equal priors.
Taking a mean value of this probability below 1% over all animals of a breed as evidence of exclusion, the number of breeds with which each breed is incompatible changes as shown in the Table.
Both Charolais and Normande show 100% incompatibilities. The next highest values are those of Prim' Holstein (94% incompatibilities) and Blanc Bleu Belge (82% incompatibilities).
Breed: number of incompatible breeds (%)
Abondance: 9 (53%)
Aubrac: 13 (76%)
Blanc Bleu Belge: 14 (82%)
Blonde d'Aquitaine: 11 (65%)
Brune des Alpes: 12 (71%)
Charolais: 17 (100%)
Créole: 9 (53%)
Gascon: 12 (71%)
Limousine: 10 (59%)
Montbéliarde: 10 (59%)
Normande: 17 (100%)
Parthenaise: 12 (71%)
Pie Rouge: 9 (53%)
Prim' Holstein: 16 (94%)
Rouge des Près: 12 (71%)
Salers: 9 (53%)
Simmental: 11 (65%)
Tarentaise: 13 (76%)
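The likelihood-based exclusion can be sketched in the same toy setting: multilocus likelihoods (p² or 2pq per locus, loci assumed independent), posterior probabilities with equal priors, and exclusion when the mean posterior over a breed's animals is below 1%. Frequencies for four breeds come from the allele-frequency table; the animal genotypes and function names are hypothetical. The sketch assumes every animal has a non-zero likelihood for at least one breed.

```python
from math import prod

# Frequencies from the allele-frequency table (unlisted alleles have frequency 0)
FREQS = {
    "Charolais": {"MC1R": {"e": 1.0}, "Silver": {"Dc": 1.0}, "Agouti": {"A": 0.88, "Abr": 0.12}},
    "Limousine": {"MC1R": {"e": 1.0}, "Silver": {"D": 1.0}, "Agouti": {"A": 1.0}},
    "Montbéliarde": {"MC1R": {"e": 1.0}, "Silver": {"D": 1.0}, "Agouti": {"A": 0.89, "Abr": 0.11}},
    "Normande": {"MC1R": {"E": 1.0}, "Silver": {"D": 1.0}, "Agouti": {"Abr": 1.0}},
}

# Hypothetical animals whose genotypes are consistent with those frequencies
ANIMALS = {
    "Charolais": [{"MC1R": ("e", "e"), "Silver": ("Dc", "Dc"), "Agouti": ("A", "A")},
                  {"MC1R": ("e", "e"), "Silver": ("Dc", "Dc"), "Agouti": ("A", "Abr")}],
    "Limousine": [{"MC1R": ("e", "e"), "Silver": ("D", "D"), "Agouti": ("A", "A")}],
    "Montbéliarde": [{"MC1R": ("e", "e"), "Silver": ("D", "D"), "Agouti": ("A", "A")},
                     {"MC1R": ("e", "e"), "Silver": ("D", "D"), "Agouti": ("A", "Abr")}],
    "Normande": [{"MC1R": ("E", "E"), "Silver": ("D", "D"), "Agouti": ("Abr", "Abr")}],
}


def likelihood(genotype, breed_freqs):
    """Multilocus genotype probability: p^2 or 2pq per locus, loci independent."""
    def locus_prob(locus, a1, a2):
        p = breed_freqs[locus].get(a1, 0.0)
        q = breed_freqs[locus].get(a2, 0.0)
        return p * p if a1 == a2 else 2 * p * q
    return prod(locus_prob(locus, a1, a2) for locus, (a1, a2) in genotype.items())


def posteriors(genotype, freqs):
    """Posterior breed probabilities with equal priors (the priors cancel out)."""
    like = {b: likelihood(genotype, f) for b, f in freqs.items()}
    total = sum(like.values())
    return {b: v / total for b, v in like.items()}


def excluded_breeds(animals, freqs, threshold=0.01):
    """Other breeds whose mean posterior, over a breed's animals, is below the threshold."""
    result = {}
    for breed, herd in animals.items():
        post = [posteriors(g, freqs) for g in herd]
        mean = {b: sum(p[b] for p in post) / len(post) for b in freqs}
        result[breed] = sorted(b for b in freqs if b != breed and mean[b] < threshold)
    return result


for breed, excluded in excluded_breeds(ANIMALS, FREQS).items():
    print(breed, excluded)
```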
Breeds clustered by coat colour genes
The average probabilities of assignment of the animals of each breed can easily be converted into a similarity matrix, from which a neighbour-joining dendrogram can be obtained.
Six major clusters of breeds can be distinguished.
The genotypes at the three investigated loci make it easy to assign animals to a cluster, but not to assign them to a breed within a cluster.
The figure also shows the prevalent genotypes that are mostly responsible for the observed clustering.
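(Figure: neighbour-joining dendrogram of the 18 breeds: Normande, Abondance, Rouge des Près, Pie Rouge, Montbéliarde, Limousine, Salers, Blonde d'Aquitaine, Simmental, Créole, Aubrac, Tarentaise, Brune des Alpes, Gascon, Parthenaise, Blanc Bleu Belge, Prim' Holstein, Charolais. Prevalent genotypes marked on the clusters: Agouti Abr/Abr; MC1R e/e; Dilution Ds/-; MC1R E/- and E1/-; MC1R ED/-; Dilution Dc/Dc.)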
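The slides do not say how the assignment probabilities were turned into similarities, so the sketch below assumes a simple conversion: average the two directions of the mean assignment matrix and use 1 minus that value as a distance. It then applies a minimal neighbour-joining implementation that returns only the unrooted topology. The four-breed matrix and the breed codes are hypothetical; in practice a phylogenetics package would be used to build the dendrogram with branch lengths.

```python
def distances_from_assignment(mean_prob):
    """Turn mean assignment probabilities into pairwise distances (assumed conversion).

    mean_prob[a][b]: mean posterior probability that animals of breed a are
    assigned to breed b. Similarity = mean of both directions; distance = 1 - similarity.
    """
    names = list(mean_prob)
    dist = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dist[frozenset((a, b))] = 1.0 - 0.5 * (mean_prob[a][b] + mean_prob[b][a])
    return names, dist


def neighbour_joining(names, dist):
    """Unrooted neighbour-joining topology as a Newick string (no branch lengths)."""
    d = dict(dist)
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        pairs = [(a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]]
        # Q criterion: join the pair minimising (n - 2) d(a, b) - r(a) - r(b)
        a, b = min(pairs, key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        u = f"({a},{b})"
        for c in nodes:
            if c not in (a, b):
                d[frozenset((u, c))] = 0.5 * (d[frozenset((a, c))] + d[frozenset((b, c))]
                                              - d[frozenset((a, b))])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    return "(" + ",".join(nodes) + ");"


# Hypothetical mean assignment probabilities (not the study's values):
# BBB = Blanc Bleu Belge, PRH = Prim' Holstein, GAS = Gascon, PAR = Parthenaise
MEAN_PROB = {
    "BBB": {"BBB": 0.55, "PRH": 0.45, "GAS": 0.00, "PAR": 0.00},
    "PRH": {"BBB": 0.40, "PRH": 0.60, "GAS": 0.00, "PAR": 0.00},
    "GAS": {"BBB": 0.00, "PRH": 0.00, "GAS": 0.50, "PAR": 0.50},
    "PAR": {"BBB": 0.00, "PRH": 0.00, "GAS": 0.50, "PAR": 0.50},
}
tree = neighbour_joining(*distances_from_assignment(MEAN_PROB))
print(tree)
```

Breeds that are often confused with each other get small distances and are joined first, so BBB pairs with PRH and GAS with PAR.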
Perspectives
In conclusion, this work shows that the three typed loci could form a reasonable basis for implementing a traceability system for French cattle breeds.
More work is needed to increase the breed sample size (ideally, at least 100 animals from each breed should be typed and entered into a validated database, so that allele frequencies can be estimated more precisely).
In addition, other genes responsible for variation in coat colour could be typed, increasing the discrimination capacity of a test that the industry can easily implement.