peter hollingsworth - plants plenary


Upload: consortium-for-the-barcode-of-life-cbol

Post on 25-May-2015


Page 1: Peter Hollingsworth - Plants Plenary

Plant DNA Barcoding using matK
some work on new primer sets

Dr. Alan Forrest
Prof. Pete Hollingsworth
Royal Botanic Garden Edinburgh

Damon Little, New York Botanical Garden
Aron Fazekas, University of Guelph
Gao Lian-Ming, Kunming Institute of Botany
Sean Graham, University of British Columbia
Mehrdad Hajibabaei, CCDB, University of Guelph
Maria Kuzmina, CCDB, University of Guelph

Hollingsworth, Graham, Little (2011). "Choosing and using a plant DNA barcode." PLoS ONE 6: e19254.

Page 2: Peter Hollingsworth - Plants Plenary
Page 3: Peter Hollingsworth - Plants Plenary

Angiosperms: matK baseline

How good are the current “best” matK primers?

Ca. 10K PCR & sequencing attempts from 5 labs:

Kim 1R+3F = 72% success (N=9424)

2-step protocol: Kim 1R+3F and 390F+1326R: 80% success

Poorly performing orders include Malpighiales, Piperales, Poales, and Myrtales (especially Melastomataceae)

*ACDB African Centre for DNA Barcoding, University of Johannesburg, South Africa

*CCDB Canadian Centre for DNA Barcoding, University of Guelph, Canada

*KIB Kunming Institute of Botany, Chinese Academy of Sciences, China

*NYBG New York Botanical Garden, USA

*UBC University of British Columbia, Canada

Page 4: Peter Hollingsworth - Plants Plenary

Angiosperms: 3 approaches to improve matK retrieval

1) ePCR of existing published primers against ca. 10K matK sequences

Genetic algorithms to search for new primers

2) CODEHOP: COnsensus DEgenerate Hybrid Oligonucleotide Primer

Primer cocktails with a degenerate ‘core’ coupled with variant 3’ triplets for all known exact matches in GenBank

3) New primers/combinations tested alongside existing primers:

1R+3F          KJ Kim, unpublished
390F+1326R     Cuenoud et al (2002) Am J Bot 89
472F+1248R     Yu et al (2011) J Syst Evol 49, 1-6
xF+MALPR1      New combination (Ford et al 2009; Dunning & Savolainen 2010)
398Fb4+1311R   CODEHOP; this study

matK primer location
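The ePCR step in approach 1 can be illustrated with a minimal sketch: scan a template for sites where a (possibly degenerate) primer pair would anneal within a mismatch budget, and report the predicted amplicon length. The function names, the toy template, and the toy primers below are assumptions for illustration only; they are not the matK primers or the software used in this study.

```python
# IUPAC nucleotide codes: each primer symbol maps to the bases it accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement, including degenerate symbols."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(primer, site):
    """Count positions where the template base is not allowed by the primer symbol."""
    return sum(base not in IUPAC[sym] for sym, base in zip(primer, site))


def find_sites(primer, template, max_mismatches=0):
    """Start positions where the primer matches within the mismatch budget."""
    k = len(primer)
    return [i for i in range(len(template) - k + 1)
            if mismatches(primer, template[i:i + k]) <= max_mismatches]


def epcr(forward, reverse, template, max_mismatches=1):
    """Predicted amplicon length for the first compatible site pair, or None."""
    fwd_sites = find_sites(forward, template, max_mismatches)
    # The reverse primer anneals to the opposite strand, so search the
    # template for its reverse complement.
    rev_sites = find_sites(reverse_complement(reverse), template, max_mismatches)
    for f in fwd_sites:
        for r in rev_sites:
            if r >= f + len(forward):
                return r + len(reverse) - f
    return None


# Toy template and primers (hypothetical, not real matK sequence).
template = "TTACGTACCCGGGTTTAAACCGGTT"
amplicon_length = epcr("ACGTR", "CCGGTT", template, max_mismatches=0)
```

Running such a check across a large alignment, and tallying the sequences for which `epcr` returns `None`, gives a rough per-taxon picture of where a primer pair is likely to fail.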

Page 5: Peter Hollingsworth - Plants Plenary

Angiosperms: the test sample

5 Plates of samples
• Wide taxonomic sample: N=470
• 52/61 orders and 172 families sensu APG3
• All samples previously sequenced for rbcL
• DNA extractions standardized, concentration equilibrated

A 188 samples from accessions that worked previously for 1R+3F (retain current success rates)

B 188 samples from accessions that failed previously for 1R+3F (improve on current success rates)

C 94 samples from 5 orders that performed particularly poorly (check that the nightmare groups are fixed)

Page 6: Peter Hollingsworth - Plants Plenary

Angiosperms: testing different protocols

PCR:
Different additives (acetamide, betaine, BSA, DMSO, DTT, formamide, glycerol, sulfolane, trehalose, 2-pyrrolidone, CES solution), primer and magnesium concentrations, annealing time and temperature
Best results: Platinum Taq polymerase, 1M betaine, 0.2M trehalose

PCR clean-up:
Nothing, Qiagen columns, ExoSAP-IT (neat and dilute)
No clean-up = poor sequence quality
Best results: ExoSAP-IT (dilute 1:10)

Sequencing PCR:
Different additives tested (nothing, betaine, DMSO, trehalose, BDX64)
Best results: 0.2M trehalose increased read length by up to 150bp

Full details of tests available from Alan Forrest, to be posted on Connect
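When setting up reactions with the best-performing additives, the stock volumes follow from C1·V1 = C2·V2. The sketch below is only an illustration: the final concentrations (1M betaine, 0.2M trehalose) come from the slide, whereas the stock concentrations and the 20 µl reaction volume are assumptions, not values from the tested protocol.

```python
def additive_volume_ul(final_molar, stock_molar, reaction_volume_ul):
    """Volume of stock needed to reach final_molar in the reaction (C1*V1 = C2*V2)."""
    if final_molar > stock_molar:
        raise ValueError("stock is more dilute than the requested final concentration")
    return final_molar * reaction_volume_ul / stock_molar


# Hypothetical reaction volume and stock concentrations.
REACTION_UL = 20.0
betaine_ul = additive_volume_ul(1.0, 5.0, REACTION_UL)    # 1 M final, assumed 5 M stock
trehalose_ul = additive_volume_ul(0.2, 1.0, REACTION_UL)  # 0.2 M final, assumed 1 M stock
```

With these assumed stocks, each additive takes up 4 µl of a 20 µl reaction, which leaves 12 µl for template, primers, buffer, polymerase and water.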

Page 7: Peter Hollingsworth - Plants Plenary

Angiosperms: PCR results from different primer pairs

                        Total   A (worked before)   B (failed before)   C (bad clades)

Collaborating labs:
rbcL                    100%    100%                100%                100%
matK 1R+3F              40%     100%                0%                  0%

Test lab:
rbcL                    99%     99%                 98%                 97%
matK 390F+1326R         71%     79%                 63%                 71%
matK 1R+3F              85%     97%                 85%                 63%
matK 398Fb4+1311R       86%     87%                 87%                 83%
matK 472F+1248R         88%     93%                 92%                 71%
matK xF+MALPR1          91%     94%                 92%                 85%
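The "Total" column appears to be the per-set success rates pooled by set size (A = 188, B = 188, C = 94 samples, as described for the test sample). A minimal sketch of that pooling follows; the function name is an assumption, and because the inputs are already rounded percentages, the pooled value may differ from the reported total by a point in some rows.

```python
# Set sizes from the test-sample slide: A and B have 188 samples, C has 94.
SET_SIZES = {"A": 188, "B": 188, "C": 94}


def pooled_success(rates_by_set, sizes=SET_SIZES):
    """Size-weighted mean of per-set success rates (rates given as fractions)."""
    total = sum(sizes.values())
    return sum(rates_by_set[name] * n for name, n in sizes.items()) / total


# Test-lab PCR rates for xF+MALPR1 and collaborating-lab rates for 1R+3F.
xf_malpr1 = round(100 * pooled_success({"A": 0.94, "B": 0.92, "C": 0.85}))
kim_collab = round(100 * pooled_success({"A": 1.00, "B": 0.00, "C": 0.00}))
```

The 40% collaborating-lab figure for 1R+3F is therefore a consequence of how the plates were built (set A was chosen because it worked, sets B and C because they failed), not an independent estimate of primer performance.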

Page 8: Peter Hollingsworth - Plants Plenary

Angiosperms: 2-step matK PCR amplification

1st Round       2nd Round       Samples amplified

1R+3F           390F+1326R      91%
390F+1326R      398Fb4+1311R    90%
xF+MALPR1       390F+1326R      93%
xF+MALPR1       398Fb4+1311R    94%
1R+3F           398Fb4+1311R    95%
472F+1248R      1R+3F           95%
472F+1248R      390F+1326R      95%
xF+MALPR1       1R+3F           96%
472F+1248R      398Fb4+1311R    97%
xF+MALPR1       472F+1248R      98%
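A two-step figure of this kind can be computed from per-sample PCR outcomes as the union of the samples amplified by either primer pair. The sketch below assumes that a sample's outcome with a given pair does not depend on which round it is run in; the primer labels and sample IDs are hypothetical toy data, not results from this study.

```python
from itertools import combinations


def two_step_success(first, second, amplified, n_samples):
    """Fraction of samples amplified by `first`, or by `second` where `first` failed."""
    return len(amplified[first] | amplified[second]) / n_samples


def rank_two_step(amplified, n_samples):
    """All primer-pair combinations, ordered from best to worst combined coverage."""
    scored = [(two_step_success(a, b, amplified, n_samples), a, b)
              for a, b in combinations(sorted(amplified), 2)]
    return sorted(scored, reverse=True)


# Hypothetical data: sets of sample IDs (0-9) that each primer pair amplified.
toy = {
    "P1": {0, 1, 2, 3, 4, 5, 6},
    "P2": {0, 1, 2, 3, 7, 8},
    "P3": {4, 5, 6, 7, 9},
}
best = rank_two_step(toy, n_samples=10)[0]
```

In the toy data the best combination is not built from the best single pair: P1 amplifies the most samples alone, but P2 and P3 fail on different samples and so complement each other. Under this assumption the combined coverage is symmetric, so the order of the two rounds mainly affects how many second-round reactions are needed.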

Page 9: Peter Hollingsworth - Plants Plenary

Angiosperms: 2-step protocol results: xF+MALPR1 & 472F+1248R

470 samples sequenced

High-quality bi-directional reads obtained for 94% of samples (96% inc. single reads, 97% inc. Phusion recoveries)

Complete failures: 3 (all failed for rbcL)
Sequence failures: 17 (low quality, unable to contig)

Of these failures, 10 subsequently recovered with Phusion Taq, but 3 were potentially pseudogenes

Single reads: 9
Contaminants/mix-ups: 15

Of these, 7 are also contaminants when sequenced with rbcL; 8 are matK problems, but OK for rbcL

Contaminants as fails: success 91% (92% inc. Phusion recoveries)
Contaminants as missing: success 96% (97% inc. Phusion recoveries)

Page 10: Peter Hollingsworth - Plants Plenary

Angiosperms: recommended work flow

Acquire samples and extract DNA

Dilute DNA 1:10

PCR and SEQUENCE rbcL

1st ROUND: all samples
PCR matK primers xF+MALPR1
1M betaine, 0.2M trehalose, Platinum Taq
Clean successful PCR products
Sequence clean PCR products (0.2M trehalose)

2nd ROUND: all PCR and SEQ failures
3F+1R or 472F+1248R
1M betaine, 0.2M trehalose, Platinum Taq
Clean successful PCR products
Sequence clean PCR products (0.2M trehalose)

>95% matK sequence success rate

ALL poor quality sequences/mononucleotide motifs
PCR and sequence matK primers xF+ERIR
1M betaine, 0.2M trehalose, Phusion Taq
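The decision logic of this work flow can be written down as a small lookup, which may be convenient when tracking many samples in a spreadsheet or LIMS. This is a sketch only: the function name, its arguments, and the returned dictionary layout are assumptions, while the primers, additives and polymerases are those listed on the slide.

```python
from typing import Optional

# PCR additives recommended on the slide for every matK reaction.
PCR_ADDITIVES = ("1M betaine", "0.2M trehalose")


def next_matk_reaction(failed_rounds: int, motif_problem: bool = False) -> Optional[dict]:
    """Suggest the next matK reaction for one sample.

    failed_rounds: number of matK PCR/sequencing rounds that have already failed.
    motif_problem: True if the sequence was poor because of a mononucleotide motif.
    Returns None when the two standard rounds are exhausted.
    """
    if motif_problem:
        return {"primer_options": ["xF+ERIR"],
                "polymerase": "Phusion Taq",
                "pcr_additives": PCR_ADDITIVES}
    if failed_rounds == 0:
        return {"primer_options": ["xF+MALPR1"],
                "polymerase": "Platinum Taq",
                "pcr_additives": PCR_ADDITIVES}
    if failed_rounds == 1:
        # Either primer pair may be used in the second round.
        return {"primer_options": ["3F+1R", "472F+1248R"],
                "polymerase": "Platinum Taq",
                "pcr_additives": PCR_ADDITIVES}
    return None


first_round = next_matk_reaction(0)
second_round = next_matk_reaction(1)
motif_rescue = next_matk_reaction(1, motif_problem=True)
```

Samples for which the function returns `None` are the residual failures; the slides suggest checking whether they also failed for rbcL before suspecting primer mismatch.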

Page 11: Peter Hollingsworth - Plants Plenary

Angiosperms: recommendations and protocols

• PCR using a good quality thermostable Taq polymerase
– fewer amplicons obtained with cheaper alternatives
• Clean up amplicons and sequence using 0.2M trehalose
• Poor sequences due to mononucleotide motifs can be sequenced using Phusion Taq and primer xF+ERIR

Online resources:
matK barcoding protocols will be made available on Connect
Ordinal alignments available for specific primer design for problematic taxa
Statistics on primer mismatch and mononucleotide motifs available sorted by taxon

Page 12: Peter Hollingsworth - Plants Plenary

Angiosperms: matK barcode summary

The 2-step protocol recommended here allowed >90% of samples from a wide taxonomic range to be sequenced for matK

Need to assess whether this is robust to different laboratory environments and plant groups

Page 13: Peter Hollingsworth - Plants Plenary
Page 14: Peter Hollingsworth - Plants Plenary

The Guardian, 17th November, 2007

Page 15: Peter Hollingsworth - Plants Plenary

Gymnosperms: matK barcodes

Gymnosperms include ca. 1100 species
Many economically/ecologically important and/or rare taxa

Full length matK alignment for primer design:
>800 accessions representing all genera downloaded from GenBank

Gymnosperm matK quite conserved:
conserved priming sites can be located, but divergent in Gnetales

Sample set:
All 86/86 genera (N=119) including Ginkgo, sensu Christenhusz et al (2011) Phytotaxa 19, 55-70

Page 16: Peter Hollingsworth - Plants Plenary

Gymnosperms: matK barcodes

All gymnosperms:     Conifers (N=95)   Cycads (N=16)   Gnetophytes (N=8)

rbcL                 89%               100%            100%
A  GYMF1A+R1A        86%               100%            38%
B1 GYM-F+GYM-R       86%               100%            25%
B2 GNE-F+GNE-R       na                na              88%
matK A+B             95%               100%            100%

7 failures in conifers for matK also failed for rbcL, which suggests primer mismatch is not the reason for failure

Recommendation:
1st round PCR and SEQ with GYM-F1A+R1A; 2nd round PCR and SEQ using GYM-F+GYM-R for conifers and cycads, and GNE-F+GNE-R for gnetophytes

Page 17: Peter Hollingsworth - Plants Plenary

Ferns & allies: matK barcodes

Ferns and allies include ca. 10,000 species
ca. 90% of these are Polypodiales

Full length matK alignment for primer design:
159 accessions representing all major groups derived from several published and unpublished sources

Fern matK very variable:
difficult to locate conserved sites for primer design

Variability means potentially useful barcode:
Recent publication* supports use of rbcL + matK as the core fern barcode, but further empirical utility tests required

Sample set:
14/14 orders and 44/48 families (N=95) sensu Christenhusz et al (2011) Phytotaxa 19, 7-54

*Li et al (2011) PLoS ONE 6, e26597

Page 18: Peter Hollingsworth - Plants Plenary

Ferns & allies: matK barcodes

ePCR and manual examination of alignment failed to locate any universal priming sites:
Primers therefore designed at the ordinal level

Cyatheales: Single primer pair amplifies 100% (8/8 accessions)

Polypodiales: 81% successfully sequenced
Single primer pair amplifies 43/57 accessions, with 2nd primer pair adding 3 accessions
5/15 failures also failed for rbcL

Primers for lycophytes and earlier diverging orders designed but as yet untested

Page 19: Peter Hollingsworth - Plants Plenary

Liverworts: matK barcodes

Liverworts include ca. 5000 known species
ca. 90% of these are leafy liverworts

Full length matK alignment for primer design:
56 accessions representing all major groups including many de novo sequences

Liverwort matK very variable:
difficult to locate conserved sites for primer design

Variability means potentially useful barcode

Sample set:
15/15 orders and 74/82 families (N=94) sensu Crandall-Stotler et al (2009) Edin J Bot 66, 1-44

Page 20: Peter Hollingsworth - Plants Plenary

Liverworts: matK barcodes

Two-step approach:
A Best single primer pair gives 72%
B Four primer pairs representing major clades used separately on failures from step 1: complex thalloids (400 spp.), simple thalloids 1 (200 spp.), simple thalloids 2 (150 spp.), leafy (4300 spp.)

Using these 4 primer pairs as a cocktail gave lower PCR success

rbcL 100% success
matK A plus B results in 90% success
Failures include early diverging Treubiales and Calobryales (only ca. 20 spp.)
Full length matK sequences are the rate limiting step

Page 21: Peter Hollingsworth - Plants Plenary

Mosses: matK barcodes

Mosses include ca. 12,800 species
Greatest numbers and diversity in Hypnales

Full length matK alignment for primer design:
66 accessions representing all major groups including many de novo sequences

Moss matK quite conserved compared to ferns and liverworts:
conserved priming sites located and range of primer pairs tested

matK barcode utility unknown:
lack of moss matK primers has precluded any meaningful comparisons with other markers

Sample set:
29/30 orders and 92/111 families (N=107) sensu Goffinet & Shaw

Page 22: Peter Hollingsworth - Plants Plenary

Mosses: matK barcodes

rbcL 100% PCR success

matK: 4 primer pairs tested
Best primer pair sequences 82% (Best 2-step = 94%, all 4 primers = 98%)

However:
All mosses except Sphagnum contain a mononucleotide motif in the centre of the barcode region, which is difficult to sequence across.

Phusion Taq polymerase alleviates the problem, but PCR is more difficult to optimize

Best primer pair sequences 62%
Best 2-step = 75%, best 3-step = 82% (Hypnales = 85%)
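Mononucleotide motifs of the kind described here can be flagged in existing sequences before deciding which samples to route to the Phusion protocol. The sketch below finds runs of a single repeated base; the function name and the 8 bp threshold are assumptions for illustration, since the slides do not state the run length at which sequencing becomes unreliable.

```python
import re


def homopolymer_runs(seq, min_len=8):
    """Return (start, base, length) for each run of one base at least min_len long."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # One base, followed by at least (min_len - 1) further copies of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(seq.upper())]


# Toy sequence (hypothetical): a 9 bp T run flanked by ordinary sequence.
toy_seq = "ACG" + "T" * 9 + "ACGAAAC"
runs = homopolymer_runs(toy_seq, min_len=8)
```

Tallying such runs per taxon across an alignment is one way to produce the kind of motif statistics mentioned under the online resources.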

Page 23: Peter Hollingsworth - Plants Plenary

2-step protocol = >95%

2-step protocol = >95%

2-step protocol = ca. 80% Polypodiales
1-step protocol = 100% Cyatheales
Lycophyte and early-diverging lineage primers require testing

1-step protocol = >80%

3-step protocol = >80%
Further primer optimization required

2-step protocol = ca. 90%
Further primer optimization required

Page 24: Peter Hollingsworth - Plants Plenary

Acknowledgements

Collaborating laboratories:

Damon Little
New York Botanical Garden

Sean Graham
University of British Columbia

Gao Lian-Ming, Li De-Zhu
Kunming Institute of Botany

Maria Kuzmina, Mehrdad Hajibabaei
CCDB, University of Guelph

Aron Fazekas
University of Guelph

Suppliers of data and samples:

Olivier Maurin, Michelle van der Bank
ACDB, University of Johannesburg

Harald Schneider
Natural History Museum, London

Dietmar Quandt, Susann Wicke
Nees Institute, University of Bonn

Fay Wei Li, ChunNeng Wang, others
National Taiwan University

Paul Wolf
Utah State University

Juan Carlos Villarreal
University of Connecticut