Hannah McPherson - plants plenary
TRANSCRIPT
Do Next Generation Sequencing approaches provide the answer for DNA barcoding of plants?
Hannah McPherson, Marlien van der Merwe, Paul Rymer, Mark Edwards, Maurizio Rossetto
Landscape-level studies of the Australian flora
Species and population dynamics
Historical and current processes shaping distributions and assemblages of native trees
Using a range of molecular tools, life history traits and modelling
Reproduced from Crisp et al. 2004
Next generation sequencing
Exploring new molecular tools and approaches
NGS to assemble whole chloroplast genomes
Use of the whole chloroplast genome as a barcode?
Technical approach
Full genome shotgun sequencing
Solexa Illumina platform (7 Gb/lane)
• 8 labelled paired-end libraries multiplexed in one lane
• Sub-sampled data from single lanes
No reference sequence
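A back-of-envelope sketch of what this multiplexing design could mean for chloroplast read depth. The 7 Gb lane yield and the 8 libraries per lane come from the slide; the ~150 kbp plastome length and the fractions of reads derived from the chloroplast are hypothetical values chosen only for illustration.

```python
# Sketch: expected chloroplast depth for one library in a multiplexed lane.
# ASSUMPTIONS (hypothetical, not from the talk): the lane yield is split evenly
# between libraries, and a fixed fraction of each library's bases is chloroplast.

LANE_YIELD_BP = 7e9          # 7 Gb per lane, as on the slide
LIBRARIES_PER_LANE = 8       # 8 labelled paired-end libraries per lane
CP_GENOME_BP = 150_000       # hypothetical plastome length (~150 kbp)


def expected_cp_coverage(lane_yield_bp: float, n_libraries: int,
                         cp_fraction: float, cp_genome_bp: int) -> float:
    """Mean read depth across the chloroplast genome for a single library."""
    per_library_bp = lane_yield_bp / n_libraries
    return per_library_bp * cp_fraction / cp_genome_bp


if __name__ == "__main__":
    for frac in (0.01, 0.05, 0.10):  # hypothetical chloroplast read fractions
        cov = expected_cp_coverage(LANE_YIELD_BP, LIBRARIES_PER_LANE,
                                   frac, CP_GENOME_BP)
        print(f"chloroplast fraction {frac:.0%}: ~{cov:,.0f}x")
```

Under these assumptions even a 1% chloroplast fraction gives a mean depth of roughly 58x per library, which fits the idea that sub-sampled data from a single lane may be enough to assemble a plastome.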
Sampling
2 locations: Sydney (S) and Nightcap (N)
20 rainforest tree species
4 individuals pooled from each species for each site
reality check: sampling from rainforests
Collecting and identifying samples
Preserving leaf material
DNA extraction
9 of 20 species successfully sequenced from both North and South
questions
Can we bioinformatically assemble chloroplast genomes from whole genomic shotgun sequencing without a reference?
What levels of variation do we find across a broad range of species/families?
Can we mine the data for non-chloroplast regions too?
Is whole/partial chloroplast genome sequencing a viable option for barcoding?
From Angiosperm Phylogeny Website
http://www.mobot.org/MOBOT/Research/APweb/welcome.html
Atherospermataceae
Monimiaceae
Lauraceae
Malvaceae
Pittosporaceae
Sapindaceae
Meliaceae
Euphorbiaceae
Proteaceae
Urticaceae
[Figure: Angiosperm Phylogeny and model organism tree. Malvales: Malvaceae (Gossypium, Theobroma, Brachychiton). Laurales: Atherospermataceae (Doryphora), Monimiaceae (Wilkiea), Lauraceae (Cinnamomum), Calycanthaceae (Calycanthus)]
assembling chloroplast genomes
Map trimmed reads to the whole cp genome of the closest relative available on GenBank (CLC)
• Consensus of N & S
De novo assembly (CLC and Velvet)
• N & S separately
• Local BLAST / cpDNA genome database
Assemble contigs to N & S reference (Geneious Pro)
Align with annotated
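Velvet is a de Bruijn graph assembler. The toy sketch below (not Velvet's or CLC's code) illustrates the core idea behind the de novo step: reads are cut into k-mers, overlaps of k-1 bases become graph edges, and unbranched paths are read off as contigs. It assumes error-free reads from a single strand; the function names and example reads are hypothetical.

```python
# Toy de Bruijn graph assembler illustrating the idea behind Velvet.
# Simplifications (assumptions): error-free reads, a single strand (no reverse
# complements), no coverage cut-offs, and cycle-only components are ignored.
from collections import defaultdict


def build_graph(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer adds an edge prefix -> suffix."""
    edges = defaultdict(set)
    indegree = defaultdict(int)
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    for kmer in kmers:
        edges[kmer[:-1]].add(kmer[1:])
    for targets in edges.values():
        for target in targets:
            indegree[target] += 1
    return edges, indegree


def assemble_unitigs(reads, k):
    """Walk maximal non-branching paths and spell them out as contigs."""
    edges, indegree = build_graph(reads, k)
    nodes = set(edges) | set(indegree)

    def is_simple(node):
        # exactly one way in and one way out: an interior node of a contig
        return indegree.get(node, 0) == 1 and len(edges.get(node, ())) == 1

    contigs = []
    for start in sorted(nodes):
        if is_simple(start):
            continue
        for nxt in sorted(edges.get(start, ())):
            contig, current = start + nxt[-1], nxt
            while is_simple(current):
                current = next(iter(edges[current]))
                contig += current[-1]
            contigs.append(contig)
    return contigs


if __name__ == "__main__":
    reads = ["ATTGCCGA", "GCCGATCAG", "ATCAGGT"]  # hypothetical overlapping reads
    print(assemble_unitigs(reads, k=5))
```

Repeats longer than k-1 bases create branches that break contigs; the chloroplast inverted repeat is one plausible example, which may be part of why de novo contigs are then assembled against a reference.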
[Figure: for each species (Brachychiton, Cinnamomum, Claoxylon, Diploglottis, Doryphora, Pittosporum, Synoum, Toona, Wilkiea), length of the closest cpDNA reference, length of mapped cpDNA and length of assembled contigs (bp; axis 90,000 to 170,000)]
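The figure compares how much of each closest reference is recovered by mapping and by assembly. A minimal sketch of that kind of summary, assuming contig placements are available as half-open [start, end) reference coordinates; the coordinates and reference length in the example are made up.

```python
# Sketch: how much of a reference plastome is covered by assembled contigs.
# ASSUMPTION: contig placements are given as half-open [start, end) reference
# coordinates (e.g. parsed from an alignment); the numbers below are made up.


def covered_length(intervals):
    """Total bp in the union of half-open [start, end) intervals."""
    total = 0
    run_start = run_end = None
    for start, end in sorted(intervals):
        if run_end is None or start > run_end:
            # gap before this interval: close the previous run
            if run_end is not None:
                total += run_end - run_start
            run_start, run_end = start, end
        else:
            # overlapping or touching: extend the current run
            run_end = max(run_end, end)
    if run_end is not None:
        total += run_end - run_start
    return total


if __name__ == "__main__":
    reference_length = 160_000                        # hypothetical
    contigs = [(0, 84_000), (80_000, 110_000), (125_000, 160_000)]
    covered = covered_length(contigs)
    print(covered, f"{covered / reference_length:.1%}")
```

Merging overlaps before summing matters because neighbouring contigs often overlap on the reference; summing raw contig lengths would overstate recovery.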
assembling chloroplast genomes
[Figure: tree of the nine study species (Diploglottis cunninghamii, Pittosporum multiflorum, Toona ciliata, Synoum glandulosum, Doryphora sassafras, Claoxylon australe, Cinnamomum oliveri, Brachychiton acerifolius, Wilkiea huegelii) with reference chloroplast genomes NC_008641 Gossypium barbadense, NC_008325 Daucus carota, NC_008334 Citrus sinensis, NC_010433 Manihot esculenta and NC_004993 Calycanthus floridus var. glaucus]
Aligned with MAFFT
RAxML tree from the CIPRES Science Gateway
~40 kbp excluding gaps
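One reading of "~40 kbp excluding gaps" is the number of alignment columns with no gap in any taxon. The sketch below counts and keeps only such columns before tree building; whether the talk computed the figure exactly this way is an assumption, and the toy alignment is hypothetical.

```python
# Sketch: keep only alignment columns with no gap in any sequence.
# ASSUMPTION: "excluding gaps" means gap-containing columns were dropped;
# the talk does not say exactly how the ~40 kbp figure was computed.


def gap_free_columns(alignment, gap_chars="-?"):
    """Indices of columns where no sequence has a gap character."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    (length,) = lengths
    return [i for i in range(length)
            if all(seq[i] not in gap_chars for seq in alignment.values())]


def strip_gapped_columns(alignment, gap_chars="-?"):
    """Return a new alignment containing only the gap-free columns."""
    keep = gap_free_columns(alignment, gap_chars)
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}


if __name__ == "__main__":
    aln = {"Synoum": "AC-GT", "Toona": "ACTGT", "Citrus": "A-TGT"}  # toy data
    print(len(gap_free_columns(aln)), strip_gapped_columns(aln))
```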
quantifying variation
Map trimmed reads to newly constructed references (assembled contigs)
SNP detection (CLC)
SNP verification
• exploring data
• Sanger sequencing
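A minimal sketch of threshold-based SNP calling from per-position base counts, to show the kind of decision a SNP detector makes. This is not CLC's algorithm; the depth and frequency thresholds are hypothetical. Because each library pools 4 individuals and the chloroplast is effectively haploid, a variant carried by one individual would be expected in roughly 25% of reads if the individuals contribute equally, so the default frequency cut-off is set below that.

```python
# Sketch of threshold-based SNP calling from per-position base counts.
# NOT CLC's algorithm; min_depth and min_alt_freq are hypothetical defaults.
# min_alt_freq=0.2 assumes a pool of 4 haploid chloroplast genomes in equal
# proportions, where one carrier gives an allele frequency of ~0.25.


def call_snps(reference, pileup, min_depth=10, min_alt_freq=0.2):
    """Return (position, ref_base, alt_base, alt_freq) for candidate SNPs.

    reference: reference sequence (0-based positions)
    pileup: {position: {base: read_count}}
    """
    snps = []
    for pos, counts in sorted(pileup.items()):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything
        ref_base = reference[pos]
        alt_base, alt_count = max(
            ((base, n) for base, n in counts.items() if base != ref_base),
            key=lambda item: item[1],
            default=(None, 0),
        )
        if alt_base is not None and alt_count / depth >= min_alt_freq:
            snps.append((pos, ref_base, alt_base, alt_count / depth))
    return snps


if __name__ == "__main__":
    ref = "ACGTACGT"
    pileup = {1: {"C": 20}, 3: {"T": 5, "G": 15}, 5: {"G": 3, "A": 2}}
    print(call_snps(ref, pileup))
```

Candidates from a sketch like this would still need the verification step on the slide (inspecting the data and Sanger sequencing).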
SNP detection
Synoum glandulosum (~140 kbp)
• SNPs between N and S: ~1 in 550 bp
• SNPs within N: ~1 in 2,800 bp
• SNPs within S: ~1 in 4,500 bp
[Figure: SNP detection, example alignments of the reference, Synoum N and Synoum S]
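The "1 in X bp" rates for Synoum glandulosum can be converted into expected SNP counts and densities. This sketch assumes each rate applies uniformly across the ~140 kbp of chloroplast sequence, which the talk does not state.

```python
# Converting "1 in X bp" SNP rates into expected counts and densities.
# ASSUMPTION: each rate applies uniformly across the ~140 kbp sequence.

CP_LENGTH_BP = 140_000
ONE_IN_BP = {"between N and S": 550, "within N": 2_800, "within S": 4_500}


def expected_snps(length_bp, one_in_bp):
    """Expected number of SNPs over length_bp at a rate of 1 per one_in_bp."""
    return length_bp / one_in_bp


def snps_per_kbp(one_in_bp):
    """SNP density per 1,000 bp."""
    return 1_000 / one_in_bp


if __name__ == "__main__":
    for label, one_in in ONE_IN_BP.items():
        print(f"{label}: ~{expected_snps(CP_LENGTH_BP, one_in):.0f} SNPs, "
              f"{snps_per_kbp(one_in):.2f} per kbp, "
              f"{100 / one_in:.3f}% of sites")
```

Under that assumption the between-site rate corresponds to roughly 255 SNPs (about 0.18% of sites), against roughly 50 within N and 31 within S.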
data mining
Chloroplast barcoding genes
Universal cpSSR markers
Other data: BLAST
The question of coverage
chloroplast barcoding loci
[Figure: chloroplast barcoding primer sites across Brachychiton, Cinnamomum, Claoxylon, Diploglottis, Doryphora, Pittosporum, Synoum, Toona, Wilkiea and the reference genomes Daucus, Gossypium, Calycanthus and Citrus. Primers: rbcLa-F, rbcLa-R, rbcL 1F, rbcL 724R; accD 1F, 2F, 3R, 4R; matK 2.1F, 2.1aF, XF, 3.2R, 5R, 390F, 1326R, matK_1F, matK_1R, matK_2F, matK_2R; rpoB 1F, 2F, 3R, 4R; rpoC1 1F, 2F, 3R, 4R; ycf5 1F, 2F, 3R, 4R; ndhJ 1F, 2F, 3R, 4R; trnH2 F, psbAF R, trnH(GUG) F, psbA R; atpF F, atpH R; psbK R, psbI R; trnL-c F, trnL-d R, trnL-e F, trnL-f R, trnL-g F, trnL-h R]
Vijayan and Tsou 2010
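Checking whether standard barcoding primers would bind each assembled plastome can be done in silico. A minimal sketch, assuming hypothetical primer and contig sequences (not the actual primer sequences used in the study) and a simple mismatch-count criterion that honours IUPAC ambiguity codes in the primer.

```python
# Sketch: in-silico check for primer binding sites in an assembled contig.
# ASSUMPTIONS: sequences are hypothetical; a site is a hit if it has at most
# max_mismatches mismatches. Real primer checks also weigh 3'-end mismatches,
# melting temperature, etc., which this ignores.

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatches(primer, site):
    """Count positions where the template base is not allowed by the primer code."""
    return sum(1 for p, s in zip(primer, site) if s not in IUPAC[p])


def find_primer_sites(primer, template, max_mismatches=2):
    """Return (strand, start, n_mismatches) for every acceptable site."""
    template = template.upper()
    n = len(primer)
    hits = []
    # '+': primer matches the template as written; '-': it binds the other strand
    for strand, probe in (("+", primer.upper()), ("-", reverse_complement(primer))):
        for start in range(len(template) - n + 1):
            mm = mismatches(probe, template[start:start + n])
            if mm <= max_mismatches:
                hits.append((strand, start, mm))
    return hits


if __name__ == "__main__":
    contig = "GGGATGCAAACCCTTTGGG"  # hypothetical contig
    print(find_primer_sites("ATGCAAAC", contig, max_mismatches=1))
```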
universal cpSSR primers
[Figure: universal cpSSR primers ccmp1 to ccmp10 (forward and reverse) across Brachychiton, Cinnamomum, Claoxylon, Diploglottis, Doryphora, Pittosporum, Synoum, Toona, Wilkiea, Daucus, Gossypium, Calycanthus and Citrus]
Weising and Gardner 1999
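Chloroplast SSRs are typically short mononucleotide runs, mostly of A or T, so candidate cpSSR loci can be located directly in assembled contigs. A minimal sketch, assuming a minimum run length of 10 bp; that threshold is illustrative and not taken from the talk or from Weising and Gardner (1999).

```python
# Sketch: locate mononucleotide repeats (candidate cpSSRs) in a sequence.
# ASSUMPTION: a run of >= min_len identical bases counts as a candidate;
# the default of 10 is illustrative only.
import re


def find_mononucleotide_repeats(seq, min_len=10):
    """Return (start, base, length) for each run of >= min_len identical bases."""
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    # a base followed by at least (min_len - 1) more copies of itself
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    seq = "GC" + "A" * 11 + "GC" + "T" * 9 + "CG"  # toy sequence
    print(find_mononucleotide_repeats(seq))
    print(find_mononucleotide_repeats(seq, min_len=9))
```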
data mining
26S: coverage ~35-300
RPB2: only returned when a sequence was available from the same family or a sister family; coverage ~3-5
Resistance genes: good return, but coverage ~2-10
LEAFY: no returns
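To see why coverage of ~2-10 is limiting for single-copy nuclear genes, a simple Poisson (Lander-Waterman) model gives the chance that a given base is covered at all, or by enough reads to call it. This sketch assumes reads land uniformly at random, which real NGS data does not quite do, so the numbers are optimistic; the 3-read calling threshold is hypothetical.

```python
# Sketch: fraction of bases with enough reads under a Poisson coverage model.
# ASSUMPTION: depth at a base ~ Poisson(mean coverage) (Lander-Waterman);
# real coverage is more uneven, so these figures are optimistic.
import math


def prob_depth_at_least(mean_coverage, min_reads):
    """P(depth >= min_reads) when depth ~ Poisson(mean_coverage)."""
    below = sum(math.exp(-mean_coverage) * mean_coverage ** i / math.factorial(i)
                for i in range(min_reads))
    return 1.0 - below


if __name__ == "__main__":
    for c in (2, 3, 5, 10, 35):  # values spanning the ranges quoted in the talk
        print(f"mean {c:>2}x: "
              f"uncovered {1 - prob_depth_at_least(c, 1):.4f}, "
              f">=3 reads {prob_depth_at_least(c, 3):.3f}")
```

At a mean of 3x about 5% of bases would receive no reads at all, whereas at the 35x lower bound quoted for 26S essentially every base is covered.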
data mining
Matches were good
Matches seem to be in the more conserved regions
Single copy nuclear genes present but at low coverage
Some difficulty retrieving regions, depending on the data available for BLAST
viability for barcoding
A large portion of the chloroplast genome was retrieved and easily assembled, even without a reference
Potential for retrieving other regions with increased coverage or carefully designed multiplexing
to sum up the story so far
We can assemble large portions of chloroplast genomes from whole genomic shotgun sequencing even without a reference
Variation is low and varies from family to family
Single copy nuclear genes present but low coverage?
Is whole/partial chloroplast genome sequencing a viable option for barcoding?
acknowledgements
Friends of the Botanic Gardens Trust
Southern Cross University: Robert Henry, Nicole Rice, Stirling Bowen
Evolutionary Ecology team at the Royal Botanic Gardens Sydney: Emma McIntosh, Alexander Dohms, Juelian Siow, Ashlee Wakefield