sequencing the genome of rhipicephalus microplus€¦ · boophilus) microplus. mi bellgard, pm...
TRANSCRIPT
Sequencing the Genome of Rhipicephalus microplus
Felix D. Guerrero USDA-ARS Knipling-Bushland
U. S. Livestocks Insect Laboratory
Kerrville, TX, USA
on behalf of the Cattle Tick Genome Sequencing Project Team
Historical Background of the Texas Cattle Fever Problem
• Cattle tick was endemic in US
• Problems for cattle industry • 1795 & 1796: Severe outbreaks of
disease in cattle in NC & PA following a shipment of cattle northward from the south
• 1850 – Texas cattle began to be driven through AK, MO, KS & western states: 50-90 % death in cattle
• 1868: disasters in IL & IN left 15,000 cattle dead
$3 billion estimated annual savings in today's dollars realized by livestock industry
• 1200 km Line
• From 70 m -70 km wide
U.S.-Mexico Border
Import >1,000,000 head annually
Mandatory dipping in OP
OP resistant ticks in Mexico
Ticks are Common in Mexico
Based on our belief that the genome holds the key to development of new control technologies!
Critical factors drove us to a two-phase approach Huge genome: 2.4 X larger than human genome
Difficult genome: due to >70% repetitive DNA content
Phase 1 focused on sequencing gene-rich regions
Phase 2 used PacBio to sequence complex genome regions
Phase I Approach: 2004-2009
Identified ~40% of expressed genes in collaboration with TIGR-Vish Nene (2004-7) 50,000 Sanger ESTs sequenced and assembled into Gene Index
~15,000 contigs (currently BmiGI Ver2.1)
Available at http://compbio.dfci.harvard.edu/tgi/ ESTs in GenBank
Murdoch Univ. collaboration produced CattleTickBase http://ccg.murdoch.edu.au/organism/rhipicephalus/
Transcriptome of 28,900 contigs from 24,673,517 bp
Phase I Approach: 2009-2012
Sequence "unique" gene-rich regions of genomic DNA through a reassociation kinetics-based approach
• Cot selection Collaboration with Mississippi State University and
Molecular Research LP (MR DNA)
Phase I Approach: 2009-2012
Genomic DNA Fractions
Highly repetitive: Sequencing technology not available
Moderately repetitive: Available technology too expensive
Unique: Appropriate technology existed Isolate and sequence
Assembly into database
(Database consists of exons, introns, promoter)
Cot Selection Overview
Selection Protocol:
- Use highly purified genomic DNA
- Shear DNA to ~500 bp
- Resuspend DNA in sodium phosphate buffer
- Use DNA renaturation kinetics protocol
Basis of This Approach: DNA Renaturation Kinetics
Rate at which a specific single stranded DNA sequence reassociates in a solution (finds its complement) depends on how often it occurs in the genome. Thus: Repetitive DNA quickly reassociates to double-
stranded DNA Gene-rich low-copy DNA slowly reassociates
boil
Incubate for mathematically determined time and temperature
Dissociation: Reassociation
Stop reassociation with sodium phosphate buffer
ds ss DNA state change
1. Pour over hydroxyapatite resin. Wash resin with 0.03M.
1. Recover ssDNA (has not reassociated) with 0.12M
2. 454 sequencing protocol
Followup with Hydroxyapatite Chromatography
Selective binding of ssDNA using hydroxyapatite
Total Low Cot Data Assembly
Run Total Reads Total Bases
454 FLX 6 runs: 1,365,419 ~0.3 x 109
454 Titanium 6 runs: 5,851,291 ~1.5 x 109
Illumina HiSeq 3 runs: 509,552,505 ~50 x 109
RESULT: Total ~25X coverage of gene-rich region of genome
Overview of Illumina Cot DNA Scaffold Production Methodology
HiSeq Quality-checked Reads
50 x 109 seqs split three ways 50 x 109 seqs
Cap3 assembler (In progress)
Genome assembly
Phase 1 Product: Gene-rich Dataset
CattleTickBase: Website hosts the transcriptome and genome sequencing datasets
Collaboration with Murdoch University, Perth, Australia.
http://ccg.murdoch.edu.au/organism/rhipicephalus/
Transcriptome of 28,900 contigs Reassociation kinetics-based approach for partial genome sequencing of the cattle tick,
Rhipicephalus (Boophilus) microplus. FD Guerrero, P Moolhuijzen, et al., BMC Genomics 11: 374, 2010.
CattleTickBase: An integrated Internet-based bioinformatics resource for Rhipicephalus (Boophilus) microplus. MI Bellgard, PM Moolhuijzen, et al. Intl J Parasitol 42: 161-169, 2012.
Phase 2: Sequencing the Difficult Genome Regions 2012-13
Utilizing Next-Next Gen technology: Pac Bio in long-read mode
Collaboration with National Center for Genome Resources in Santa Fe, New Mexico.
177 Pac Bio SMRT cells (runs)
>200 million reads per SMRT cell New Version 3 cells
New Pac Bio XL enzyme
Avg. sequence read length ~5,000 bp
The Size of the Problem
• Statistics:
• 14,000,000 Pac Bio sequences
• Average read insert length = 2,371
• Maximum read insert length = 26,364
• Total number of tick genome basepairs = 33 x 109 (5X coverage)
• Must error correct PacBio reads before assembly.
• Using LSC Pac Bio error correcting program
• http://www.healthcare.uiowa.edu/labs/au/LSC/
• Such a large PacBio read set covering an entire complex eukaryotic
genome been never been reported as assembled.
Phase 2 Product: Draft Quality Genome Sequence
~12X genome coverage 52 x 109 bp from Illumina/454
33 x 109 bp from Pac Bio
2 x 109 bp from BAC library clone sequencing Collaboration with Amplicon Express
Assembly underway at Murdoch University Anticipate manuscripts in late-2014 and having draft
sequence available to scientific community through GenBank and CattleTickBase.
(1) Low cot 454
(2) Low Cot Illumina
(3) 18,000 BACs
Combined sets 1, 2
Soap, Ray, cap3 assemblers Combined
sets 1, 2, 3
(4) LSC Pacbio error correction
(5) 15K BAC
scaffolds
Combined sets 1,2,3,4
Combined sets 1,2,3,4
cap3 assembler
cap3, MIRA4 assemblers
cap3 Completed Ongoing Pending
Assembly Overview
12X-Coverage in CattleTickBase
Whole Genome of R. microplus 7.1 Gbp
High and Moderate Repetitive Unique Gene-rich
Phase II PacBio + BACs
Phase I ESTs + low Cot
Human Genome 3.2 Gbp
By comparison >>>
Route to Anti-Cattle Tick Vaccine
Genome studies identified tick genes and sequence
Bioinformatics computational analysis helps identify gene function
For vaccine development, we focus on tick genes/proteins with critical function Development
Feeding
Pathogen transmission
Vaccine Antigen Candidates Selection Criteria
Selected from tissue-specific transcriptomes
Verified expression in gut or ovary membrane.
Low sequence similarity to mammalian proteins.
Exclude members of large gene families. Prefer single or low copy genes to avoid functional
redundancy among gene family members.
Critical function to the tick.
Genome Project-Selected Vaccine Candidates
Name Basis for Selection Status
Antigen 1 (Aquaporin) In silico structure and function In cattle trials
Antigen 2 Midgut/saliva protein Upregulated w/Babesia inf.
In cattle trials
Antigen 4 Midgut upregulated w/Babesia inf. + outperformed Bm86 during in vitro tick feeding
In cattle trials
Antigen 5 Antigen 6 Antigen 7 Antigen 8
Upregulated in adult gut, receptor binding function Adult gut protein upregulated after Babesia infection Adult ovary protein upregulated after Babesia infection Protein upregulated in ovary and salivary glands after Babesia infection
Lab scale up Lab scale up Lab scale up Lab scale up
Acknowledgements
Consortium Leadership Team Felix D. Guerrero, USDA-ARS Kerrville, TX, USA
Matthew I. Bellgard, Murdoch University, Murdoch, WA, Australia
Robert J. Miller, USDA-ARS Edinburg, TX, USA
Collaborating Organizations U. S. Dept. of Agriculture, Agricultural Research Service
Kerrville, TX, Edinburg, TX, and Beltsville, MD
Murdoch University, Murdoch, WA, Australia
National Center for Genome Resources, Santa Fe, NM, USA
Purdue University, West Layfayette, IN, USA
Mississippi State University, Starkville, MS, USA
J. Craig Venter Institute, Rockville, MD, USA
Queensland Alliance for Ag. and Food Innovation, Brisbane, QLD, Australia
Amplicon Express, Pullman, WA, USA
Molecular Research LP, Shallowwater, TX, USA
Acknowledgements
Individuals from each organization (apologies to any that might have been left off) U. S. Dept. of Agriculture, Agricultural Research Service
Felix D. Guerrero
Robert J. Miller
Adalberto Perez de Leon
Kylie G. Bendele
John E. George
Daniel Strickman
Murdoch University, Murdoch, WA, Australia Matthew I Bellgard
Roberto Barrero
Paula Moolhuijzen
Adam Hunter
John McCook
Michael Black
National Center for Genome Resources, Santa Fe, NM, USA Ernie Retzel
Callum Bell
Andrew Farmer
Jennifer Jacobi
Nico Devitt
Peter Ngam
Patricia Mena
Faye Schilkey
Acknowledgements
Purdue University, West Layfayette, IN, USA Catherine Hill
Mississippi State University, Starkville, MS, USA Daniel G. Peterson
J. Craig Venter Institute/TIGR, Rockville, MD, USA Vish Nene
Appolinaire Djikeng
Shelby Bidwell
Lis Caler
Mathangi Thiagarajan
Linda Hannick
Vinita Joardar
Queensland Alliance for Ag. and Food Innovation, Brisbane, QLD, Australia Ala Lew-Tabor
Manuel Rodriguez Valle
Amplicon Express, Pullman, WA, USA Robert Bogden
Evan Hart
Suresh Iyer
Amy Mraz
Bandie Harrison
Travis Ruff
Molecular Research LP, Shallowater, TX, USA Scot E. Dowd