
Upload: kerrie-gregory

Post on 21-Dec-2015



Jonathan B. Puritz, Christopher M. Hollenbeck, and John R. Gold

Fishing for selection, but only catching bias: library effects in double-digest RAD data in a non-model marine species

Marine Genomics Laboratory, Harte Research Institute, Texas A&M University-Corpus Christi

Introduction

Double-digest restriction-site associated DNA sequencing (ddRAD) has become a powerful approach for population genomics, especially for non-model organisms (Peterson et al. 2012). However, once a population-genomics study extends beyond a single library, imprecise agarose-gel size selection during library preparation can introduce bias if the same set of homologous genomic fragments is not sequenced across all samples, or if SNP allelic identity is linked to RAD genomic fragment length. In high gene flow species, such as many marine organisms, small amounts of systematic bias could overwhelm a weak signal of population structure and lead to spurious outlier SNPs being misinterpreted as candidate adaptive loci.

Results

[Figure not reproduced in transcript; surviving labels: 13/64 and 227/5246]

Methods

A total of 532 red snapper (Lutjanus campechanus) from 15 localities were sequenced across four ddRAD libraries: two prepared with agarose gel-based size selection and two prepared with automated size selection on a Pippin Prep. Sequence data were processed with the dDocent pipeline (Puritz et al. 2014).

Individuals from two geographic localities were sequenced in multiple libraries prepared with different size-selection techniques. Each of the two localities was separated into library ‘populations’, and FST-based outlier detection (LOSITAN, Antao et al. 2008) was used to identify 1,751 SNPs with potential library bias (high FST between different library subsets from the same locality).
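The idea of treating library subsets from one locality as separate ‘populations’ can be sketched in a few lines of Python. This is only an illustration, not the LOSITAN procedure: LOSITAN judges outliers against a simulated neutral distribution, whereas the sketch below computes a simple two-group FST from allele frequencies and applies a fixed cutoff. The function names, the 0/1/2 genotype coding, and the threshold value are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Assumed coding: alternate-allele count (0, 1, or 2); None marks a missing call.
Genotype = Optional[int]


def alt_freq(genotypes: Sequence[Genotype]) -> Optional[float]:
    """Alternate-allele frequency among called genotypes (None if no calls)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    return sum(called) / (2 * len(called))


def fst_two_groups(lib_a: Sequence[Genotype],
                   lib_b: Sequence[Genotype]) -> Optional[float]:
    """Simple FST = (H_T - H_S) / H_T for one biallelic locus, two groups,
    equal weighting. A stand-in for the estimator used by LOSITAN."""
    p1, p2 = alt_freq(lib_a), alt_freq(lib_b)
    if p1 is None or p2 is None:
        return None
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)          # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0                          # locus is monomorphic overall
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-group
    return (h_t - h_s) / h_t


def flag_library_biased(
    loci: Dict[str, Tuple[Sequence[Genotype], Sequence[Genotype]]],
    threshold: float = 0.1,                 # hypothetical cutoff, not from the poster
) -> List[str]:
    """Return names of loci whose between-library FST exceeds the cutoff."""
    flagged = []
    for name, (lib_a, lib_b) in loci.items():
        fst = fst_two_groups(lib_a, lib_b)
        if fst is not None and fst > threshold:
            flagged.append(name)
    return flagged
```

Because both groups come from the same locality, any locus flagged here differs between libraries rather than between places, which is the signature of library bias described above.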

The total SNP data set was then filtered to remove low-coverage individuals and by minor allele frequency, call rate, minimum and maximum mean site depth, quality vs. depth, allele balance, paired status, uneven balance of forward and reverse reads, SNP clusters, and Hardy-Weinberg equilibrium (HWE).
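Two of the filters named above, call rate and minor allele frequency, are simple enough to sketch per locus. The poster does not report its thresholds, so the default values below are placeholders, and the function names and 0/1/2 genotype coding are assumptions for illustration rather than the filtering code actually used.

```python
from typing import Optional, Sequence

# Assumed coding: alternate-allele count (0, 1, or 2); None marks a missing call.
Genotype = Optional[int]


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of individuals with a called genotype at this locus."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_freq(genotypes: Sequence[Genotype]) -> float:
    """Frequency of the rarer allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)


def passes_filters(genotypes: Sequence[Genotype],
                   min_call_rate: float = 0.9,   # placeholder threshold
                   min_maf: float = 0.05) -> bool:  # placeholder threshold
    """True if the locus meets both the call-rate and MAF thresholds."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_freq(genotypes) >= min_maf)
```

The remaining filters in the list (depth, allele balance, strand balance, and so on) depend on read-level annotations in the variant calls and are not covered by this sketch.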

The final SNP call set was then separated into two subsets: one containing all loci, and one from which the SNPs identified as potentially library-biased had been removed. BAYESCAN (Foll and Gaggiotti 2008) and Discriminant Analysis of Principal Components (Jombart et al. 2010) were used to examine the potential effects of library-biased loci.

Conclusions

• There is significant library bias across multiple ddRAD libraries, and biased loci are disproportionately found among outlier loci relative to neutral loci.

• Stringent bioinformatic filtering does not remove all biased loci (~20% remain in inferred outliers), and even a few biased loci can overwhelm weak signatures of selection or population structure in a data set.

• Library bias in ddRAD data could be mitigated by (i) randomizing individuals across libraries, and (ii) repeating localities or subsets of localities across libraries.
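The two mitigation steps in the last point can be combined in a single assignment scheme: shuffle individuals within each locality, then deal them across libraries in turn, so that every locality with enough samples is represented in every library. The sketch below is one way to do this, offered as an assumption-laden example rather than a protocol from the poster; the function name, input layout, and seed parameter are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def assign_libraries(individuals: List[Tuple[str, str]],
                     n_libraries: int,
                     seed: int = 0) -> Dict[str, int]:
    """Assign each (sample_id, locality) pair to a library index.

    Individuals are shuffled within their locality and then dealt
    round-robin across libraries, so localities are repeated across
    libraries and library sizes stay balanced.
    """
    rng = random.Random(seed)              # fixed seed makes the design reproducible
    by_locality = defaultdict(list)
    for sample_id, locality in individuals:
        by_locality[locality].append(sample_id)

    assignment: Dict[str, int] = {}
    offset = 0                             # carries the deal on across localities
    for locality in sorted(by_locality):
        samples = by_locality[locality]
        rng.shuffle(samples)               # randomize order within the locality
        for i, sample_id in enumerate(samples):
            assignment[sample_id] = (offset + i) % n_libraries
        offset += len(samples)
    return assignment
```

With this layout, any locality can later be split by library and tested for between-library differentiation, which is the check the poster used to detect biased loci.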

Acknowledgements

Field Collection (for data presented): North Carolina Department of Environmental and Natural Resources, Division of Marine Fisheries; Florida Fish and Wildlife Research Institute, Florida Fish and Wildlife Conservation Commission; Panama City Laboratory and Pascagoula Laboratory, Southeast Fisheries Science Center, National Marine Fisheries Service; and Crew of the Oregon II, National Marine Fisheries Service. Funding: MARFIN Program of the National Marine Fisheries Service.

Literature Cited

Antao, T., Lopes, A., Lopes, R.J., Beja-Pereira, A., Luikart, G., 2008. BMC Bioinformatics.

Foll, M., Gaggiotti, O.E., 2008. Genetics 180.

Jombart, T., Devillard, S., Balloux, F., 2010. BMC Genetics.

Peterson, B.K., Weber, J.N., Kay, E.H., Fisher, H.S., Hoekstra, H.E., 2012. PLoS One.

Puritz, J.B., Hollenbeck, C.M., Gold, J.R., 2014. PeerJ.