Genomic selection and systems biology – lessons from dairy cattle breeding
DESCRIPTION
Presentation made to the staff of Keygene N.V. in Wageningen, The Netherlands. (I don't know what the problem is with the template here. It looks fine if you use a dark background.)
TRANSCRIPT
J. B. Cole, Animal Improvement Programs Laboratory, Agricultural Research Service, USDA, Beltsville, MD 20705-2350, USA
Genomic selection and systems biology – lessons from dairy cattle breeding
Keygene N.V., Wageningen, The Netherlands, 28 May 2013 (2) Cole
Dairy Cattle
9 million cows in US
Attempt to have a calf born every year
Replaced after 2 or 3 years of milking
Bred using artificial insemination
Popular bulls have 10,000+ progeny
Cows can have many progeny through superovulation and embryo transfer
Lifecycle of bull
Parents selected
Dam inseminated
Embryo transferred to recipient
Bull born
Semen collected (1 y)
Daughters born (9 m later)
Daughters have calves (2 y later)
Bull receives progeny test (5 y)
Genomic test
Phenotypes recorded
Monthly recording
Milk, fat, and protein yields
Somatic cell count (udder health)
Visual appraisal for type traits
Breed associations record pedigree
Calving difficulty and stillbirth
Available data
Type of data                  Number of records
Cows with lactation data             28,394,976
Lactations                           68,373,863
Individual test days                508,574,532
Dystocia records                     20,770,758
Animals in pedigree file             58,893,009
Genotyped bulls                         105,654
Genotyped cows                          276,173
Many animals have been genotyped
[Figure: number of genotyped bulls and cows by evaluation date (YYMM), from 1004 to 1304; y-axis 0 to 300,000 genotypes]
381,827 genotyped animals
How does genetic selection work?
ΔG = genetic gain each year
reliability = how certain we are about our estimate of an animal’s genetic merit (genomics can increase this)
selection intensity = how “picky” we are when making mating decisions (management can increase this)
genetic variance = variation in the population due to genetics (we can’t really change this)
generation interval = time between generations (genomics can decrease this)
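The four terms above combine in the standard breeder's equation, ΔG = i · r · σ_A / L, where accuracy r is the square root of reliability and σ_A is the square root of the additive genetic variance. The sketch below only illustrates that relationship; the function name and all input numbers are made up for the example and are not from the talk.

```python
import math

def genetic_gain_per_year(reliability, selection_intensity,
                          genetic_variance, generation_interval):
    """Breeder's equation: dG = i * r * sigma_A / L."""
    accuracy = math.sqrt(reliability)          # r
    genetic_sd = math.sqrt(genetic_variance)   # sigma_A
    return selection_intensity * accuracy * genetic_sd / generation_interval

# Hypothetical inputs: progeny testing vs. genomic selection of young bulls.
progeny_test = genetic_gain_per_year(0.80, 2.0, 1.0, 5.0)
genomic = genetic_gain_per_year(0.70, 2.0, 1.0, 2.5)
print(f"progeny test: {progeny_test:.3f} genetic SD per year")
print(f"genomic:      {genomic:.3f} genetic SD per year")
```

In this made-up comparison, halving the generation interval more than offsets the lower reliability of a young bull's evaluation.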
Calculation of genomic evaluations
Deregressed PTA derived from traditional evaluations of predictor animals
Allele substitution effects estimated for 45,188 SNP
Polygenic effect estimated for genetic variation not captured by SNP
Selection index combines genomic predictions with traditional information not included in the genomic evaluation
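As a rough illustration of the SNP-effect step, the sketch below estimates allele substitution effects by ridge regression (SNP-BLUP style) on simulated data. It is not the evaluation software described in the talk: the function name, the shrinkage value, and the simulated genotypes are assumptions, and the polygenic effect and the final selection index are left out.

```python
import numpy as np

def estimate_snp_effects(genotypes, deregressed_pta, shrinkage):
    """Ridge-regression estimates of allele substitution effects.

    genotypes: (animals x SNP) array coded as 0/1/2 copies of one allele.
    shrinkage: hypothetical ridge parameter (residual / per-SNP variance ratio).
    """
    Z = genotypes - genotypes.mean(axis=0)        # center each SNP column
    y = deregressed_pta - deregressed_pta.mean()  # center the phenotypes
    lhs = Z.T @ Z + shrinkage * np.eye(Z.shape[1])
    return np.linalg.solve(lhs, Z.T @ y)

# Simulated predictor population: 200 animals, 50 SNP, two SNP with real effects.
rng = np.random.default_rng(0)
n_animals, n_snp = 200, 50
genotypes = rng.integers(0, 3, size=(n_animals, n_snp)).astype(float)
true_effects = np.zeros(n_snp)
true_effects[[3, 17]] = [1.0, -0.5]
deregressed_pta = genotypes @ true_effects + rng.normal(0, 0.5, n_animals)

effects = estimate_snp_effects(genotypes, deregressed_pta, shrinkage=10.0)
# Direct genomic value of each animal = sum of its (centered) SNP effects.
dgv = (genotypes - genotypes.mean(axis=0)) @ effects
print("largest estimated effect at SNP", int(np.abs(effects).argmax()))
```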
Many chips are available
[Figure: chip images labeled HD, 50K V2, LD, and GGP HD]
BovineSNP50
Version 1: 54,001 SNP
Version 2: 54,609 SNP
45,188 used in evaluations
High-density (HD)
777,962 SNP
Only 50K SNP used
Low-density (LD)
6,909 SNP
GeneSeek Genomic Profiler (GGP) & GGP-HD
What is a SNP genotype worth?
For protein yield (h2 = 0.30), the SNP genotype provides information equivalent to an additional 34 daughters
Pedigree is equivalent to information on about 7 daughters
For daughter pregnancy rate (h2 = 0.04), the SNP genotype is equivalent to 131 daughters
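Daughter counts can be put on a reliability scale with the textbook progeny-test formula REL = n / (n + k), where k = (4 − h2) / h2. The sketch below applies it to the two traits on these slides. Whether the slide's daughter equivalents were derived in exactly this way is an assumption, and the function names are illustrative.

```python
def reliability_from_daughters(n_daughters, h2):
    """Reliability of a sire evaluation from n daughters with one record each."""
    k = (4.0 - h2) / h2
    return n_daughters / (n_daughters + k)

def daughters_from_reliability(reliability, h2):
    """Inverse: number of daughter equivalents implied by a reliability."""
    k = (4.0 - h2) / h2
    return k * reliability / (1.0 - reliability)

for trait, h2, n in [("protein yield", 0.30, 34),
                     ("daughter pregnancy rate", 0.04, 131)]:
    rel = reliability_from_daughters(n, h2)
    print(f"{trait}: {n} daughters -> reliability {rel:.2f}")
```

The lower the heritability, the more daughters are needed to reach a given reliability, which is why the same genotype is "worth" more daughters for daughter pregnancy rate than for protein yield.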
High density SNP chip
Currently only 50K subset of SNP used
Some increase in accuracy from better tracking of QTL possible
Realized gains have been small
Potential for across-breed evaluations
Requires few new HD genotypes once adequate base for imputation developed
Low density SNP chip
6,909 SNP, mostly from the SNP50 chip
Evenly spaced across 30 chromosomes
Addresses performance issues with 3K while providing low-cost genotyping
Provides over 98% accuracy imputing 50K genotypes
Parentage validation and discovery
Parent-progeny conflicts detected
Animal checked against all other genotypes
Reported to breeds and requesters
Correct sire usually detected
Maternal grandsire checking
SNP-at-a-time checking
Haplotype checking more accurate
Breeds moving to accept SNP in place of microsatellites
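SNP-at-a-time parentage checking can be sketched as counting opposite homozygotes: a parent with two copies of one allele cannot have a calf with two copies of the other. This is a minimal illustration, not the production system; the 0/1/2 coding, the -1 missing code, the 1% conflict threshold, and the animal names are all assumptions.

```python
import numpy as np

def conflict_rate(parent, progeny):
    """Share of SNP where parent and progeny are opposite homozygotes.

    Genotypes are coded as 0/1/2 copies of an allele; -1 marks a missing call.
    """
    parent = np.asarray(parent)
    progeny = np.asarray(progeny)
    called = (parent >= 0) & (progeny >= 0)
    opposite = called & (np.abs(parent - progeny) == 2)
    return opposite.sum() / max(called.sum(), 1)

def find_parent(progeny, candidates, max_rate=0.01):
    """Check one animal against all candidate genotypes (parentage discovery)."""
    rates = {name: conflict_rate(g, progeny) for name, g in candidates.items()}
    best = min(rates, key=rates.get)
    return best if rates[best] <= max_rate else None

calf = [0, 1, 2, 2, 0, 1, -1, 2]
bulls = {
    "bull_A": [0, 1, 1, 2, 1, 0, 2, 2],   # no opposite homozygotes
    "bull_B": [2, 1, 0, 2, 0, 1, 1, 0],   # three conflicts in seven called SNP
}
print(find_parent(calf, bulls))
```

Haplotype-based checking, which the slide notes is more accurate, would compare phased segments instead of single SNP and is not shown here.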
Imputation
Based on splitting the genotype into individual chromosomes (the maternal and paternal haplotypes)
Missing SNP assigned by tracking inheritance from ancestors and descendants
Imputed dams increase predictor population
3K, LD, & 50K genotypes merged by imputing SNP not on LD or 3K
Genotypes and haplotypes
Genotypes indicate how many copies of each allele were inherited
Haplotypes indicate which alleles are on which chromosome
Observed genotypes partitioned into the two unknown haplotypes
Pedigree haplotyping uses relatives
Population haplotyping finds matching allele patterns
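The relationship between genotypes and haplotypes can be shown with a toy example: the genotype is the per-SNP sum of the two haplotypes, so once one haplotype is known (for example from a relative), the other follows by subtraction. The 0/1 allele coding and the function names are assumptions made for this illustration.

```python
def genotype_from_haplotypes(hap_a, hap_b):
    """Genotype = allele count at each SNP (0, 1, or 2 copies of allele 1)."""
    return [a + b for a, b in zip(hap_a, hap_b)]

def complement_haplotype(genotype, known_hap):
    """Given one haplotype, the other is genotype minus haplotype.

    Returns None if the known haplotype is incompatible with the genotype.
    """
    other = [g - h for g, h in zip(genotype, known_hap)]
    return other if all(x in (0, 1) for x in other) else None

paternal = [1, 0, 1, 1, 0]
maternal = [0, 0, 1, 0, 1]
genotype = genotype_from_haplotypes(paternal, maternal)
print(genotype)
print(complement_haplotype(genotype, paternal))
```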
Haplotyping program – findhap.f90
Begin with population haplotyping
Divide chromosomes into segments, ~250 to 75 SNP / segment
List haplotypes by genotype match
Similar to fastPHASE, IMPUTE
End with pedigree haplotyping
Detect crossover, fix noninheritance
Impute nongenotyped ancestors
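The population-haplotyping idea of listing haplotypes by genotype match can be sketched for a single segment as follows. This is a simplified illustration and not the findhap.f90 code: missing genotypes, haplotype frequencies, segment lengths, and the pedigree step are ignored, and the library contents are hypothetical.

```python
def is_compatible(genotype, hap):
    """A haplotype fits a genotype if it never needs an allele the animal lacks."""
    return all(0 <= g - h <= 1 for g, h in zip(genotype, hap))

def phase_segment(genotype, library):
    """Phase one segment against a haplotype library (most frequent first).

    Returns (hap1, hap2); the complement is added to the library if it is new.
    """
    for hap in library:
        if is_compatible(genotype, hap):
            other = tuple(g - h for g, h in zip(genotype, hap))
            if other not in library:
                library.append(other)
            return hap, other
    return None  # no match: the segment stays unphased in this sketch

library = [(1, 0, 1, 1), (0, 0, 1, 0)]   # hypothetical segment haplotypes
first = phase_segment((1, 0, 2, 1), library)
second = phase_segment((2, 1, 2, 1), library)
print(first)
print(second)
print(library)
```

The second animal matches a known haplotype, and its complement is a haplotype not seen before, so the library grows by one entry.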
O-Style Haplotypes Chromosome 15
We’re working on new tools
Recessive defect discovery
Check for homozygous haplotypes
7 to 90 expected but none observed
5 of top 11 are potentially lethal
936 to 52,449 carrier sire-by-carrier MGS fertility records
3.1% to 3.7% lower conception rates
Some slightly higher stillbirth rates
Confirmed Brachyspina same way
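The "expected but none observed" check can be sketched under a random-mating assumption: if no homozygotes survive, the haplotype frequency is half the carrier frequency, and the expected number of homozygotes is the number of genotyped animals times that frequency squared. The population size and carrier frequency below are hypothetical, and the Poisson tail is just one simple way to judge how surprising zero observations would be.

```python
import math

def expected_homozygotes(n_genotyped, carrier_frequency):
    """Expected homozygous animals under random mating.

    With no homozygotes alive, haplotype frequency is half the carrier frequency.
    """
    haplotype_frequency = carrier_frequency / 2.0
    return n_genotyped * haplotype_frequency ** 2

def prob_none_observed(expected):
    """Poisson probability of seeing zero homozygotes by chance."""
    return math.exp(-expected)

n, carriers = 50_000, 0.04          # hypothetical numbers, not from the talk
exp_hom = expected_homozygotes(n, carriers)
print(f"expected {exp_hom:.0f}, P(none) = {prob_none_observed(exp_hom):.1e}")
```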
Impact on producers
Young-bull evaluations with accuracy of early first-crop evaluations
AI organizations marketing genomically evaluated 2-year-olds
Rate of genetic improvement may increase by up to 50%
Studs reducing progeny-test programs
Why genomics works in dairy
Extensive historical data available
Well-developed genetic evaluation program
Widespread use of AI sires
Progeny test programs
High-valued animals, worth the cost of genotyping
Long generation interval which can be reduced substantially by genomics
Where do we go from here?
We found a few QTL
Most traits show infinitesimal inheritance
Dominance effects also are small
What about epistasis?
Systems biology – gene/protein/transcription factor networks
We confirmed known QTL
Cole, J.B. et al. 2009. Distribution and location of genetic effects for dairy traits. ICAR Tech Ser. 13:355–360.
Gene set enrichment analysis-SNP
Holden et al., 2008 (Bioinformatics 24:2784–2785)
[Figure: GSEA-SNP workflow]
GWAS results: SNP ranked by significance (L)
Gene pathways (G): SNP in pathway genes (S)
S includes all SNP in pathway genes that are also in L
Score increases for each Li in S
Score increase is proportional to SNP test statistic
The more SNP in S that appear near the top of L, the higher the enrichment score
Permutation test and FDR
Nominal p-value corrected for multiple testing
Pathways with moderate effects
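The enrichment score can be sketched as a weighted running sum over the ranked SNP list: it steps up at each SNP in the pathway set S, in proportion to its test statistic, and steps down otherwise. This is a generic GSEA-style sketch, not the published GSEA-SNP code; the SNP names and statistics are invented, and the permutation test and FDR step are omitted.

```python
def enrichment_score(ranked_snp, statistics, pathway_snp, weight=1.0):
    """Weighted running-sum (Kolmogorov-Smirnov-like) enrichment score.

    ranked_snp: SNP names ordered from most to least significant (list L).
    statistics: dict of SNP name -> test statistic.
    pathway_snp: set of SNP located in the pathway's genes (set S).
    """
    hits = [s for s in ranked_snp if s in pathway_snp]
    n_miss = len(ranked_snp) - len(hits)
    if not hits or n_miss == 0:
        return 0.0
    hit_total = sum(abs(statistics[s]) ** weight for s in hits)
    running, best = 0.0, 0.0
    for snp in ranked_snp:
        if snp in pathway_snp:
            running += abs(statistics[snp]) ** weight / hit_total
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running   # keep the largest deviation from zero
    return best

stats = {"s1": 5.0, "s2": 4.0, "s3": 3.0, "s4": 2.0, "s5": 1.0, "s6": 0.5}
ranked = sorted(stats, key=stats.get, reverse=True)
top = enrichment_score(ranked, stats, {"s1", "s2"})      # pathway SNP at the top
bottom = enrichment_score(ranked, stats, {"s5", "s6"})   # pathway SNP at the bottom
print(top, bottom)
```

A pathway whose SNP cluster at the top of the list gets a large positive score; one whose SNP sit at the bottom gets a negative score.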
Association weight matrix
Find gene coexpression networks (Fortes et al., 2010)
Select SNP by significance, correlation, distribution, etc.
− Favor intragenic SNP significant across traits
Construct weight matrix
− Rows are SNP, columns are traits
− Cells are normalized z-scores of the additive effect of the ith SNP on the jth trait
Significant correlations are identified using PCIT (Reverter and Chan, 2008) and visualized
− Cells randomly permuted as control
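A toy version of the weight-matrix steps is sketched below: effects are standardized within trait, SNP-by-SNP correlations across traits are computed, and cells are shuffled to build a control. A fixed correlation threshold stands in for PCIT, which is a different and more careful algorithm; the effect values, threshold, and function names are assumptions.

```python
import numpy as np

def association_weight_matrix(effects):
    """Standardize additive effects within trait: rows = SNP, columns = traits."""
    effects = np.asarray(effects, dtype=float)
    return (effects - effects.mean(axis=0)) / effects.std(axis=0)

def snp_network(awm, threshold=0.9):
    """Edges between SNP whose profiles across traits are strongly correlated.

    A fixed |r| threshold stands in for PCIT here; it is NOT the PCIT algorithm.
    """
    corr = np.corrcoef(awm)                       # SNP-by-SNP correlations
    rows, cols = np.where(np.triu(np.abs(corr) >= threshold, k=1))
    return [(int(i), int(j), float(corr[i, j])) for i, j in zip(rows, cols)]

def permuted_control(awm, seed=0):
    """Randomly permute all cells to get a control matrix of the same shape."""
    rng = np.random.default_rng(seed)
    return rng.permutation(awm.ravel()).reshape(awm.shape)

# Hypothetical additive effects of 4 SNP on 5 traits.
effects = [[ 1.0,  2.0,  3.0,  4.0,  5.0],
           [ 2.1,  3.9,  6.2,  8.0,  9.9],
           [ 5.0,  1.0,  4.0,  2.0,  3.0],
           [-1.0, -2.0, -3.1, -4.0, -5.2]]
awm = association_weight_matrix(effects)
edges = snp_network(awm)
control_edges = snp_network(permuted_control(awm))
print(len(edges), "edges in data;", len(control_edges), "edges in permuted control")
```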
Can we identify regulatory networks?
Fortes et al., 2011 (J. Animal Sci. 89:1669-1683. doi:10.2527/jas.2010-3681)
Candidate genes and pathways that affect age at puberty common to both breeds
Network analysis
Fortes et al., 2011 (J. Animal Sci. 89:1669-1683. doi:10.2527/jas.2010-3681)
Gene network – the red center identifies highly connected nodes.
Subnetwork of interacting transcription factors from the puberty network.
Subnetwork of interacting transcription factors from a collection of mouse and human data. (Validation step.)
Enriched pathways
Fortes et al., 2011 (J. Animal Sci. 89:1669-1683. doi:10.2527/jas.2010-3681)
Transcription factor network
Fortes et al., 2011 (J. Animal Sci. 89:1669-1683. doi:10.2527/jas.2010-3681)
Yellow genes were submitted to database. Other nodes were mined from FunCoup.
Red: protein-protein interaction. Blue: mRNA coexpression.
How do we rank allele effects?
GSEA and AWM require that we order SNP on some criterion
p-values (actual or nominal)
q-values (false discovery rate)
Not all models provide p-values
Allele substitution effects (not so good)
Scaled substitution effects (better)
It’s not clear (to me) which is best
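Two of the ranking criteria above can be sketched in a few lines. Benjamini-Hochberg adjusted p-values are used here as a common stand-in for q-values (they are not identical to Storey's q-values), and scaling substitution effects by sqrt(2pq) is only one plausible reading of "scaled substitution effects"; the talk does not say which scaling was meant. All numbers are hypothetical.

```python
import numpy as np

def bh_adjusted(p_values):
    """Benjamini-Hochberg adjusted p-values (a common stand-in for q-values)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downward.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out

def scaled_effects(effects, allele_freq):
    """Substitution effects scaled by sqrt(2pq), so rare alleles do not dominate."""
    effects = np.asarray(effects, dtype=float)
    p = np.asarray(allele_freq, dtype=float)
    return effects * np.sqrt(2.0 * p * (1.0 - p))

p_values = [0.001, 0.04, 0.03, 0.20]
q = bh_adjusted(p_values)
ranking = np.argsort(q, kind="stable")   # SNP indices, most significant first
print(q, ranking)
```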
Aren’t P-values easy?
Single SNP, fixed-effects model
Inflation of error variances
Spurious associations
e.g., PLINK
Multiple SNP, mixed-effects model
Accounts for population structure
e.g., TASSEL, GoldenHelix SVS
A recent example from dairy
Extreme birth weights are associated with increased risk of stillbirth and calving difficulty
Birth weights are not measured on most dairy farms in the US
With German colleagues, we developed a predictor based on traits we do measure
GWAS for birth weight PTA
Cole et al. (2013), unpublished data
KEGG pathways for birth weight
What does regulation of the actin cytoskeleton have to do with birth weight in cattle? That is, do these results make sense?
Maybe… these pathways may be involved in establishment & maintenance of pregnancy, as well as coordination of growth and development.
Cole et al. (2013), unpublished data
A new project
The Brown Swiss, Holstein, and Jersey breeds experience dystocia at different rates
We are applying the AWM method of Fortes et al. to these data
The goal is to identify gene networks…
Common to all breeds
Different by breed
We have divergent populations
Cole et al., 2005 (J. Dairy Sci. 88(4):1529–1539)
Challenges
Annotation
This is a mess in the cow
The reference assembly may not be representative of all taurine cows
Validation
Doing functional genomics with large mammals is expensive – who pays?
When have we proven something?
Conclusions
We’re not going to find big QTL for most traits
We may identify gene networks affecting complex phenotypes
We’re learning how much we don’t know about functional genomics in the cow
Validation remains a problem
Partners
Illumina
Marylinn Munson
Cindy Lawley
Christian Haudenschild
BARC
Curt Van Tassell
Lakshmi Matukumalli
Tad Sonstegard
Missouri
Jerry Taylor
Bob Schnabel
Stephanie McKay
Alberta
Steve Moore
USMARC – Clay Center
Tim Smith
Mark Allan
iBMAC Consortium
Funding Agencies
USDA/NRI/CSREES
2006-35616-16697
2006-35205-16888
2006-35205-16701
USDA/ARS
1265-31000-081D
1265-31000-090D
5438-31000-073D
Merial
Stewart Bauck
NAAB
Gordon Doak
ABS Global
Accelerated Genetics
Alta Genetics
CRI/Genex
Select Sires
Semex Alliance
Taurus Service
Questions?
http://gigaom.com/2012/05/31/t-mobile-pits-its-math-against-verizons-the-loser-common-sense/shutterstock_76826245/