Conservation Genetics https://doi.org/10.1007/s10592-019-01224-x
REVIEW ARTICLE
Genome projects in invasion biology
Michael A. McCartney1 · Sophie Mallez1 · Daryl M. Gohl2,3
Received: 9 July 2018 / Accepted: 4 September 2019 © Springer Nature B.V. 2019
Abstract
Advances in sequencing and informatics and rapidly falling costs have made genome sequencing projects far more accessible to researchers in all of the life sciences, including invasion biology. A complete genome is now the most efficient first step towards identifying and characterizing candidate genes that control invasiveness. At the genomic level, fundamental problems in invasion science can be pursued with great precision and rigor. This includes reconstruction of the history of invasions, analysis of demographic dynamics within colonizing populations, and study of the rapid, adaptive evolution of invasiveness. This update documents new developments in the emerging field of invasion genomics. Our review found that of 100 of the world’s most damaging invasive species, assembled genomes are available for 27, a minority but still a considerable resource. This calls for a larger investment in genomics, but also highlights publicly available genomic resources for invasive species that remain underutilized. We examine the value of reference genomes. We conclude that while some technologies (e.g. genotyping by Next Generation Sequencing) can be applied without reference genomes or with fragmented ones, investments in high quality genome assemblies will provide considerable long-term benefits in invasion and conservation genomics research programs.
Keywords Invasion genomics · Invasive species · Population genomics · Genome assembly
Introduction
A sequenced genome will soon become a routine element of biological research. Costs drop almost monthly, and advances in data collection and analysis occur so rapidly that projects often take advantage of new inventions while underway. One notable recent example is the Mexican axolotl, which required a new algorithm to be written to assemble its 32 gigabase genome (Nowoshilow et al. 2018), about 10 times the length of the Homo sapiens genome. Many of the life sciences can now benefit from the power of genomics, and this includes invasion biology. Here we review contributions of genomics to the study of biological invasions to date, highlight some future directions, and comment on research strategies.
Historical framework
Although both fields have a similarly brief history, progress in invasion genomics has been slow relative to conservation genomics; the latter is the subject of excellent reviews (Luikart et al. 2003; Allendorf et al. 2010, 2013; Allendorf 2017). Allendorf (2017) notes that genomics has been applied to natural populations of non-model species for less than 2 decades. He cites Black et al. (2001) as the first publication to use the term “population genomics.” This is an important paper, aimed at entomologists but of interest to a broader audience in population genetics. The authors made a strong case for genomic approaches to previously unresolved issues in molecular population genetics theory, and to insect pest control (Black et al. 2001). Luikart et al. (2003) is another important foundational review and perspective, focused on the utility of genomics to approach long-standing problems in conservation and population genetics, such as the relationship between census size and Ne, and outlier testing for loci under selection.
* Michael A. McCartney [email protected]
1 Minnesota Aquatic Invasive Species Research Center and Dept. of Fisheries, Wildlife and Conservation Biology, University of Minnesota, 2003 Upper Buford Circle, St. Paul, MN 55108, USA
2 University of Minnesota Genomics Center, 2003 6th Street SE, Minneapolis, MN 55455, USA
3 Department of Genetics, Cell Biology, and Developmental Biology, University of Minnesota, Minneapolis, MN 55455, USA
While its origins can be traced to work on the evolutionary genetics of colonizing species more than 50 years ago (ya) (e.g. Baker and Stebbins 1965), “invasion genetics” emerged as a discipline in the late 1990s (Barrett 2017). Estoup and Guillemaud (2010) is an influential paper that helped transform invasion genetics into the active, vibrant field it is today by popularizing statistically robust methods for contrasting scenarios for the history of invasions. They considered natural and human-mediated range expansions under the same umbrella, stressed their complementary value for research (different timescales capture different phases of an historical process), and offered pragmatic advice for analysis. For example, given the short time scales over which “biological” (i.e. human-mediated) invasions occur, they emphasized analyses that do not depend on population genetic equilibria, such as genetic clustering methods, and provided a how-to for using coalescent simulations to test alternative invasion scenarios using Approximate Bayesian Computation (ABC). Importantly, they recognized that resolving invasion sources and pathways would inform management, but also that such information is needed in basic research. For example, identification of geographic source populations is required for evolutionary comparisons between native and invasive-range populations (Estoup and Guillemaud 2010).
The literature of invasion genetics since 2010 is filled with attempts to reconstruct colonization history (Cristescu 2015). Until very recently, these studies utilized single or few loci [a few using multilocus PCR-based AFLP fingerprinting are exceptions], but genome-scale studies are gaining momentum (Cristescu 2015; Elleouet and Aitken 2018). Like Allendorf (2017), Cristescu (2015) recognized work on natural range expansions of the threespine stickleback [Gasterosteus aculeatus: (Hohenlohe et al. 2010; Catchen et al. 2013a)] as the first genuine population genomic studies. Reviews of invasion genetics document the growing interest in genomic technologies. Examples are Chown et al. (2015) on climate change and invasiveness, Sherman et al. (2016) and Bourne et al. (2018) on genomics of marine invasive species, and Pelissie et al. (2018) on insect pest species.
The stickleback studies (and many others since) employed RAD-seq (Restriction-site Associated DNA sequencing), a technology for genome-wide discovery and genotyping of single nucleotide polymorphisms (SNPs) via next generation sequencing (NGS) of restriction-digested genomic DNA regions flanking the restriction sites. Genotyping at SNP loci via RAD-seq, Genotyping By Sequencing (GBS), and related protocols have been covered in several excellent reviews that report on the continuous improvement of molecular and analytical methods (Davey et al. 2011; Nielsen et al. 2011; Catchen et al. 2013b; Narum et al. 2013; Mastretta-Yanes et al. 2015; Andrews et al. 2016). These are “reduced representation” methods: low coverage NGS protocols that sequence portions of the genome adjacent to the ends of genomic DNA fragments generated by various methods, and broadly survey the genome to discover and genotype thousands of SNPs.
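The idea of sequencing only the regions that flank restriction sites can be illustrated with a small in silico digest. This is a minimal sketch, not the protocol of any study cited here: the function name `rad_tags`, the default recognition sequence (CCTGCAGG, the SbfI site, chosen only as an example enzyme) and the tag length are assumptions made for illustration, and the position of the cut within the site is ignored.

```python
import re


def rad_tags(sequence, site="CCTGCAGG", tag_length=20):
    """Locate restriction sites and return the sequence flanking each one.

    Returns a list of (site_start, upstream_tag, downstream_tag) tuples.
    Illustrative simplification: tags are taken from either side of the
    whole recognition sequence rather than from the actual cut position.
    """
    sequence = sequence.upper()
    tags = []
    # A lookahead pattern lets overlapping occurrences of the site be found.
    pattern = re.compile(f"(?={re.escape(site)})")
    for match in pattern.finditer(sequence):
        start = match.start()
        end = start + len(site)
        upstream = sequence[max(0, start - tag_length):start]
        downstream = sequence[end:end + tag_length]
        tags.append((start, upstream, downstream))
    return tags


if __name__ == "__main__":
    toy_genome = "ACGTACGTAACCTGCAGGTTGACCATGCATGCCTGCAGGAATT"
    for position, left, right in rad_tags(toy_genome, tag_length=8):
        print(position, left, right)
```

Because only the tags next to each site are sequenced, the number of sites in a genome (and hence the choice of enzyme) sets how many loci a reduced representation survey can sample.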
Scope of this review
Our group is involved in a de novo genome sequencing project on the zebra mussel, a highly damaging aquatic invasive species. The present report was inspired by the review of NGS studies of biological invasions by Rius et al. (2015). Our goals are to update this valuable review, to revisit progress in invasion genomics and to evaluate the contributions of de novo genome sequencing projects. Rius et al. (2015) tallied few whole-genome sequencing projects on invasive species; in fact the NGS applications they reviewed were dominated by analysis without a reference genome or with a fragmented one. In the 4 years since, high-quality reference genomes have become increasingly accessible: they are already available for several invasive species, and are well within reach for a growing number of researchers to generate for their species of interest. Still, since a reference genome represents a substantial up-front cost for projects in invasion genomics, one of our aims was to examine the advantages of having one.
Genomes in invasion biology
To re-evaluate contributions of genomics to invasive species research to date, we first searched the 100 alien invasive species that are the “world’s worst” [according to the International Union for the Conservation of Nature (IUCN: Lowe et al. 2000)] for assembled genomes deposited at the US National Center for Biotechnology Information (NCBI 2019a, accessed 12 Aug 2019). We found assemblies for 27 of these 100 species (Table 1). At first glance, this represents a sizable investment in genomics. But of course, there are many reasons to sequence a genome, so next we examined the publications that announced the public release of these genomes, and in some cases those published soon after, to recheck our conclusions. About half of the projects made no mention of the invasiveness of the species, and stressed instead the economic value and/or use of the species as a research model [e.g. Oncorhynchus mykiss (rainbow trout), Sus scrofa (pig), Capra hircus (goat), Mus musculus (mouse)]. In 13 of these projects, authors referred to the species’ invasiveness in the rationale for the sequencing project; in most of these cases they also conducted explicit analysis of the genome to address questions relevant to invasion biology (Table 2). The earlier review by Rius et al. (2015) tallied 118 projects on invasive species that had used NGS technologies. At that time, only 7 were de novo genome sequencing projects. They also found that invasion biology per se had motivated only 40, or about 33%, of projects, lower than our count.

Table 1 Sequenced genomes from 100 of the world’s worst alien invasive species

| Species | Common name | Higher taxon or group | Major impacts | Strain or isolate (year): assembly accession # | Assembly level | Number of scaffolds | Scaffold N50 (bp) | Number of contigs | Contig N50 (bp) | Genome length (Gb) |
|---|---|---|---|---|---|---|---|---|---|---|
| Rana catesbeiana | Bullfrog | Amphibian | Preys upon and outcompetes native amphibians | Bruno isolate (2017): GCA_002284835.2 | Scaffold | 1,544,635 | 39,363 | 2,124,505 | 5415 | 6.25 |
| Rhinella marina | Cane toad | Amphibian | Toxic skin glands poison predators upon ingestion, endangering native species | Wild (2018): GCA_900303285.1 | Contig | – | – | 31,391 | 167,498 | 2.552 |
| Pomacea canaliculata | Golden apple snail | Aquatic invertebrate | Voracious feeder on crops and native vegetation | Isolate SZHN2017 (2018): GCA_003073045.1 | Chromosome | 24 | 31,531,291 | 746 | 1,072,857 | 0.44 |
| Mnemiopsis leidyi | Comb jelly | Aquatic invertebrate | Invasive carnivore that consumes zooplankton | Wild (2011): GCA_000226015.1 | Scaffold | 5100 | 187,314 | 24,927 | 11,914 | 0.156 |
| Mytilus galloprovincialis | Mediterranean blue mussel | Aquatic invertebrate | Marine mussel that displaces native species | Wild (2017): GCA_001676915.1 | Scaffold | 1,002,334 | 2931 | 1,136,100 | 2627 | 1.5 |
| Sturnus vulgaris | Starling | Bird | Outcompetes native birds for nesting sites and damages fruits and other crops | Isolate 715 (2015): GCA_001447265.1 | Scaffold | 2361 | 3,416,708 | 22,666 | 151,865 | 1.037 |
| Gambusia affinis | Western mosquitofish | Fish | Causes decline and extinction of other small native fishes through competition | NE01/NJP1002.9 (2018): GCA_003097735.1 | Scaffold | 2943 | 6,651,460 | 73,682 | 17,511 | 0.599 |
| Cyprinus carpio | Common carp | Fish | Uproots aquatic vegetation, causing declines in plants, other fishes and water quality | NA (2014): GCA_000951615.2 | Chromosome | 9378 | 7,828,959 | 53,088 | 75,080 | 1.714 |
| Oncorhynchus mykiss | Rainbow trout | Fish | Preys upon and outcompetes native fishes; often hybridizes with native trout | Swanson isolate (2017): GCA_002163495.1 | Chromosome | 139,800 | 1,670,138 | 559,855 | 13,827 | 2.179 |
| Aphanomyces astaci | Crayfish plague | Fungus | Water mold lethal to European crayfish but endemic in North American host species | Strain APO3 (2014): GCA_000520075.1 | Scaffold | 835 | 657,536 | 4659 | 36,439 | 0.076 |
| Batrachochytrium dendrobatidis | Frog chytrid fungus | Fungus | Cause of population declines and extinctions of amphibian species | Isolate JAM81 (2011): GCA_000203795.1 | Scaffold | 127 | 1,484,462 | 510 | 318,114 | 0.024 |
| Phytophthora cinnamomi | Phytophthora root rot | Fungus | Emerging plant pathogen infecting ~ 5000 native forest trees and crop plants worldwide | Strain MP94-48 (2015): GCA_001314365.1 | Scaffold | 5777 | 24,869 | 5831 | 24,715 | 0.054 |
| Euphorbia esula | Leafy spurge | Land plant | Aggressive weed in rangelands of North America | Cultivar 1984-ND001 (2018): GCA_002919075.1 | Scaffold | 1,633,094 | 1035 | 2,242,201 | 605 | 1.125 |
| Mus musculus | Mouse | Mammal | Economic pests, carriers of human disease, several negative impacts on invaded ecosystems | Strain C57BL/6J (2017): GCA_000001635.8 | Chromosome | 162 | 54,517,951 | 605 | 32,813,180 | 2.731 |
| Oryctolagus cuniculus | Rabbit | Mammal | Degrades biodiversity, particularly in introduced areas that lack predators | Thorbecke inbred breed (2005): GCA_000003625.1 | Chromosome | 3318 | 35,972,871 | 84,024 | 64,648 | 2.737 |
| Felis catus | Domestic cat | Mammal | Voracious predators on native birds, reptiles and mammals, causing local extinctions | Cinnamon isolate (2017): GCA_000181335.4 | Chromosome | 4525 | 83,967,707 | 4909 | 41,915,695 | 2.522 |
| Macaca fascicularis | Crab-eating macaque | Mammal | Lower native bird diversity by eating eggs and chicks, and competing for food | Wild (2013): GCA_000364345.1 | Chromosome | 7625 | 88,649,475 | 87,764 | 86,040 | 2.947 |
| Cervus elaphus | Red deer | Mammal | Strong impacts on native forest flora and fauna in invaded range | Subspecies hippelaphus, Hungarian isolate (2017): GCA_002197005.1 | Chromosome | 11,479 | 107,358,006 | 406,637 | 7944 | 3.439 |
| Sus scrofa | Pig | Mammal | Feral pigs are pests of crops and property, dig up native vegetation, prey on several native species | Crossbreed isolate 201423004 (2017): GCA_000003025.6 | Chromosome | 14,157 | 131,458,098 | 14,818 | 6,372,407 | 2.755 |
| Capra hircus | Goat | Mammal | Voracious grazers with great impacts to vegetation and cascading effects, particularly on islands | San Clemente breed (2016): GCA_001704415.1 | Chromosome | 29,907 | 87,277,232 | 30,399 | 26,244,591 | 2.923 |
| Plasmodium relictum | Avian malaria | Protist | Parasites of birds, causing wide-ranging levels of mortality; extinctions of Hawaiian bird species | Strain SGS1 (2015): GCA_900005765.1 | Chromosome | 514 | 1,287,098 | 724 | 583,861 | 0.023 |
| Linepithema humile | Argentine ant | Terrestrial invertebrate | Often displaces native ants | Wild (2011): GCA_000217595.1 | Scaffold | 3030 | 1,402,257 | 18,227 | 35,858 | 0.22 |
| Anoplophora glabripennis | Asian longhorned beetle | Terrestrial invertebrate | Wood feeding pest of trees in forests and urban settings | ALB-LARVAE (2016): GCA_000390285.2 | Scaffold | 9867 | 678,234 | 26,749 | 80,490 | 0.707 |
| Bemisia tabaci | Sweet potato whitefly | Terrestrial invertebrate | Pest of vegetable crops and ornamentals with vast host range | Isolate MEAM1 (2016): GCA_001854935.1 | Scaffold | 19,751 | 3,232,964 | 31,571 | 84,501 | 0.615 |
| Solenopsis invicta | Red imported fire ant | Terrestrial invertebrate | Highly damaging nuisance species and pest of crop plants, livestock | Wild (2018): GCA_000188075.2 | Scaffold | 66,904 | 621,039 | 87,016 | 21,161 | 0.398 |
| Wasmannia auropunctata | Little fire ant | Terrestrial invertebrate | Stinging ants that displace native species and harm crop plants | Strain WASHAW1 (2015): GCA_000956235.1 | Scaffold | 77,788 | 1,175,369 | 103,610 | 37,912 | 0.324 |
| Aedes albopictus | Asian tiger mosquito | Terrestrial invertebrate | Widespread vector of yellow, dengue and Chikungunya fever viruses | Foshan isolate (2015): GCA_001444175.2 | Scaffold | 154,782 | 201,017 | 355,061 | 18,430 | 1.923 |

From these 100 species, genomes have been sequenced, assembled and deposited at NCBI for the 27 species listed. The five rightmost columns provide metrics for the length and quality of the sequenced genomes (Box 1). Assembly level: chromosome level is when scaffolds are linked together such that biological chromosomes are assembled to completion, or nearly so. Because gaps remain, assemblies at the chromosome level typically have contigs that were not assigned to chromosomes, such that the number of scaffolds exceeds the number of biological chromosomes in most cases. Genome length is the total length of the assembled genome, given here in gigabase pairs (Gb)
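The scaffold N50 and contig N50 columns of Table 1 summarize assembly contiguity. As a minimal sketch, assuming the commonly used definition (the length of the shortest sequence in the smallest set of longest sequences that together hold at least half of the assembled bases), N50 can be computed from a list of scaffold or contig lengths as follows. The function name `n50` and the toy lengths are illustrative and are not taken from any assembly in the table.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    Sequences are sorted from longest to shortest and accumulated until
    the running total reaches half of the total assembly length; the
    length of the sequence that crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
    print(n50(toy_contigs))  # 8: the two longest contigs hold 16 of 32 bases
```

Under this definition, a fragmented assembly with millions of short scaffolds has an N50 of a few kilobases at most, whereas a chromosome-level assembly has a scaffold N50 in the tens of megabases, which is the contrast visible across the rows of Table 1.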
Meanwhile, whole genomes of invasive species are being sequenced at an increasingly brisk pace. Of the 27 we reviewed, 13 genomes were completed from 2005 to 2015, with the remaining 14 in the last three years. This parallels the rise in the number of “Genome sequencing and assembly” BioProjects deposited over the same time intervals (NCBI 2019b, accessed 13 August 2019). So our first conclusion is that the majority of high-priority invasive species lack sequenced genomes, pointing to a need for more. On the other hand, a growing number of genomes can be mined to ask interesting, fundamental questions.
Topics in invasion genomics research
History and routes of invasion
Determining history and geographic pathways of colonization is the most common goal of invasion genetic and, more recently, genomic studies. Below we describe several cases in which both have been used, to allow comparison. We begin with examples of natural range expansions. In a recent review Cristescu (2015) concluded that natural and human-mediated range expansions share similar evolutionary dynamics, played out over different time scales. We examine two natural range expansions that followed the retreat of Pleistocene glaciers across North America.
Earlier population genetic work with multiple-locus nuclear markers (microsatellites and single nuclear loci) demonstrated that threespine stickleback (Gasterosteus aculeatus) populations in freshwater drainages along the Pacific coast of North America originated from multiple independent colonizations from ancestral marine populations (Cresko et al. 2004; Catchen et al. 2013a). These events have provided an opportunity to study the evolutionary consequences of these “replicated” freshwater invasions during the past 10,000–20,000 years. For example, freshwater populations have evolved, in parallel, reductions in the bony plate armor covering the body of these small fish. One likely cause is that selection favoring the protective armor in the more predator-rich marine environment was relaxed as sticklebacks entered lakes and streams (Cresko et al. 2004).
Table 2 Genome sequencing of the world’s worst alien invasive species: project goals

| Species | Common name | Higher taxon or group | Stated goals for genome project | Reference(s) | Invasion genomic analysis |
|---|---|---|---|---|---|
| Rana catesbeiana | Bullfrog | Amphibian | Genomic resource for biology of true frogs (Ranidae); focus on developmental biology | Hammond et al. (2017) | N |
| Rhinella marina | Cane toad | Amphibian | Lack of draft genome to understand invasiveness | Edwards et al. (2018) | N |
| Pomacea canaliculata | Golden apple snail | Aquatic invertebrate | High quality genome to study genes evolving adaptively; genes controlling stress tolerance; metagenomics of gut flora | Liu et al. (2018) | Y |
| Mnemiopsis leidyi | Comb jelly | Aquatic invertebrate | Early metazoan evolution, origins of cell types | Ryan et al. (2013) | N |
| Mytilus galloprovincialis | Mediterranean blue mussel | Aquatic invertebrate | Genomic knowledge of mytilid mussels | Murgarella et al. (2016) | N |
| Sturnus vulgaris | Starling | Bird | Genomic resource invasive species | Unpublished | – |
| Gambusia affinis | Western mosquitofish | Fish | Genomic resource invasive species | Hoffberg et al. (2018) | Y |
| Cyprinus carpio | Common carp | Fish | Genomic resource for aquaculture and marker-assisted breeding of ornamental varieties (e.g. koi) | Xu et al. (2014) | N |
| Oncorhynchus mykiss | Rainbow trout | Fish | Evolutionary analysis of vertebrate whole genome duplication; resource for biological research on model fish species and for aquaculture | Berthelot et al. (2014) | N |
| Aphanomyces astaci | Crayfish plague | Fungus | Reference genome for mitogenomic analysis and source tracking of invasive genotypes | Makkonen et al. (2016) and Minardi et al. (2018) | Y |
| Batrachochytrium dendrobatidis | Frog chytrid fungus | Fungus | Population genomics of emerging disease and evolutionary transition to pathogenicity | Farrer et al. (2011) and Joneson et al. (2011) | Y |
| Phytophthora cinnamomi | Phytophthora root rot | Fungus | Factors involved in plant-pathogen interactions and marker development to study spread of emerging disease | Studholme et al. (2016) and Engelbrecht et al. (2017) | Y |
| Euphorbia esula | Leafy spurge | Land plant | Draft genome (of allohexaploid) to understand weediness genes in species with more genomic resources | Horvath et al. (2018) | Y |
| Mus musculus | Mouse | Mammal | Genomic resource, biomedical model species | Roberts et al. (2009) | N |
| Oryctolagus cuniculus | Rabbit | Mammal | Evolutionary genomics of domestication | Carneiro et al. (2014) | N |
| Felis catus | Domestic cat | Mammal | Evolutionary genomics of domestication | Tamazian et al. (2014) | N |
| Macaca fascicularis | Crab-eating macaque | Mammal | Knowledge of introgression from congener advises use of species as biomedical model | Higashino et al. (2012) | N |
| Cervus elaphus | Red deer | Mammal | GWAS for knowledge of half-domesticated farm-bred animals | Bana et al. (2018) | N |
| Sus scrofa | Pig | Mammal | De novo assembly and comparison of SNPs across pig breeds worldwide | Li et al. (2017) | N |
| Capra hircus | Goat | Mammal | Hi-C scaffolding of reference genome: domestic goat breeds (most contiguous mammalian genome at the time) | Bickhart et al. (2017) | N |
| Plasmodium relictum | Avian malaria | Protist | Draft genomes from lineage that infects birds and study of evolution of invasion-related genes | Bohme et al. (2018) | Y |
| Linepithema humile | Argentine ant | Terrestrial invertebrate | Genomic resource for highly invasive species; model ant species for invasion genomics | Smith et al. (2011) | Y |
| Anoplophora glabripennis | Asian longhorned beetle | Terrestrial invertebrate | Genomics of plant-feeding invasive species | McKenna et al. (2016) | Y |
| Bemisia tabaci | Sweet potato whitefly | Terrestrial invertebrate | Genomic resource for highly invasive crop pest and virus vector | Chen et al. (2016) | Y |
| Solenopsis invicta | Red imported fire ant | Terrestrial invertebrate | Genomic resource for highly invasive pest; evolution of genes associated with sociality in Hymenoptera | Wurm et al. (2011) | Y |
| Wasmannia auropunctata | Little fire ant | Terrestrial invertebrate | Genomic resource | Unpublished | – |
| Aedes albopictus | Asian tiger mosquito | Terrestrial invertebrate | Evolutionary genomic analysis of invasive pest and vector of Dengue and Chikungunya | Chen et al. (2015) | Y |

For the 27 species tallied in Table 1, the reference(s) listed (and NCBI resources in the case of unpublished genomes) were examined for the stated goals of these genome projects. “Invasion genomic analysis” was evidenced by published application to research problems in invasion biology and citations to literature in the field

Hohenlohe et al. (2010) investigated parallel phenotypic evolution in two marine and three freshwater stickleback populations in southeast Alaska, using genomic approaches. The first stickleback reference genome (NCBI 2019c, accessed 13 August 2019) came from one of the freshwater populations, and aided in the scoring of SNP genotypes by RAD-seq. Genomic data supported the “replicate invasion scenario” described above. Next, the authors used a moving average to scan windows of DNA sequence, and found that most of the signals of balancing and divergence-promoting selection mapped to the same genomic regions in each of the three freshwater populations (Hohenlohe et al. 2010), which is strong evidence for parallel adaptive evolution post-colonization. Included was the genomic region previously implicated in the control of bony plate reduction, from genome-wide association studies in offspring of crosses between (the fully interfertile) marine and freshwater populations (Cresko et al. 2004; Hohenlohe et al. 2010).
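A moving-average scan of the kind mentioned above can be sketched in a few lines. This is an illustrative simplification and not the procedure of Hohenlohe et al. (2010): it uses an unweighted mean within fixed-size windows, whereas the weighting used by the authors may differ. The function name `sliding_window_mean`, the window parameters and the per-SNP statistic (for example an FST-like divergence value) are assumptions made for the example.

```python
def sliding_window_mean(positions, values, window_size, step):
    """Average a per-SNP statistic in successive windows along a chromosome.

    positions: SNP coordinates in bp; values: the statistic at each SNP.
    Returns a list of (window_start, window_end, mean), where mean is None
    for windows that contain no SNPs. Windows are half-open: [start, end).
    """
    if len(positions) != len(values):
        raise ValueError("positions and values must have the same length")
    if not positions:
        return []
    pairs = sorted(zip(positions, values))
    last_position = pairs[-1][0]
    results = []
    start = 0
    while start <= last_position:
        end = start + window_size
        in_window = [v for p, v in pairs if start <= p < end]
        mean = sum(in_window) / len(in_window) if in_window else None
        results.append((start, end, mean))
        start += step
    return results


if __name__ == "__main__":
    snp_positions = [10, 20, 110, 120]
    divergence = [0.25, 0.75, 0.5, 1.0]
    for window in sliding_window_mean(snp_positions, divergence, 100, 100):
        print(window)
```

Running the same scan separately for each freshwater population and comparing which windows stand out is one simple way to ask whether signals map to the same genomic regions.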
A second case of recent, postglacial range expansion comes from US Atlantic coast populations of the pitcher plant mosquito Wyeomyia smithii (Emerson et al. 2010). Most of this history could not be resolved using phylogeographic studies of mitochondrial DNA (mtDNA) or microsatellite DNAs; only a northern and southern clade were revealed. With > 3700 SNPs from a RAD-seq screen, several subclades within both the southern and northern group were revealed, as was evidence that populations north of the line of southern-most advance of the Laurentian ice sheet were recolonized from refugia in the southern Appalachians (Emerson et al. 2010).
Human-mediated biological invasions occur over decades to centuries, at most, and not the tens to hundreds-of-thousand-year spans of natural range expansions. Nevertheless, in several cases, population genetic and phylogeographic analysis has allowed inference of invasion sources and pathways (e.g. Miller et al. 2005; Darling et al. 2008; Ascunce et al. 2011; Lombaert et al. 2014). What new information can be gleaned from genomes? We examine two well-documented cases. European green crabs (Carcinus maenas) are one of the most successful and well-studied marine invasive species, having spread from their large Eastern Atlantic native range (from North African Atlantic coast to Norway) to invade all continents worldwide except Antarctica. Jeffery et al. (2017) focused on the Western Atlantic invaded range along the East Coast of the US and Atlantic Canada. This was the region where the first invasive population, anywhere, was reported in the early 1800s, followed by a much more recent one in the 1980s. Jeffery et al. (2017) superimposed a study of 9100 SNP loci (scored using RAD-seq) onto an updated mtDNA survey. As in earlier work (Roman 2006; Darling et al. 2008, 2014), the mtDNA haplotypes detected the presence of two differentiated populations. But the SNP data provide the best evidence to date that these derive from two independent introductions, followed by secondary contact between two sets of descendant populations with strong genome-wide divergence.
A second example of a population genomic study of invasion history focuses on the Asian tiger mosquito, Aedes albopictus (Chen et al. 2015; Kotsakiozi et al. 2017), one of the 100 “world’s worst” (Tables 1 and 2) and an important vector of human arboviruses (e.g. dengue, Chikungunya and Zika). Aedes albopictus is native to Southeast Asia, China and Japan. Its spread began about 1500 ya (SW Indian Ocean Islands), with another wave about 100 ya (Hawaii and Guam), followed 30–40 ya by spread to Europe, Africa, and North and South America. While A. albopictus is less competent at carrying virus than A. aegypti, public health concerns center around its ability to persist in temperate environments at higher latitudes, due to facultative hormonally controlled diapause tied to photoperiod.
Recent invasion genetic studies include an analysis of its global expansion, using ABC model tests based on 17 microsatellite loci (Manni et al. 2017). This study revealed an intricate invasion history, further complicated by different levels of diversity in source populations, so-called “chaotic” dispersion patterns, and extensive admixture post-colonization. A population genomic study used a panel of 58,000 SNPs derived from double digest RAD-seq (ddRAD-seq), focusing on the native range and the most recent wave of invasions (Kotsakiozi et al. 2017). The ddRAD-seq study brought more order to the results, due to greater resolution of genetic differentiation within the native Asian and the invaded ranges. The authors question the “chaotic dispersion out of Asia” conclusion from the microsatellite study (Manni et al. 2017), which lacked the power to discern genomic differentiation across the native range. Colonization and admixture patterns are still very complex, but the SNP dataset sets the stage for future work to clarify the patterns of spread and, if possible, to use the results to plan prevention efforts.
Genomic studies even have the potential to resolve invasion history at the most local scales relevant to management. In North America, for example, invasive species prevention is often implemented by US States and Canadian Provinces. Sard et al. (2019) examined genomics of populations of the round goby, a highly damaging Ponto-Caspian invader of the Laurentian Great Lakes, using SNPs genotyped using RAD-seq. Gobies were collected from 18 sites along the shorelines of Lakes Michigan and Huron, and three rivers draining into Lake Huron; all sites are in the state of Michigan and all were colonized within < 15 generations. ABC model testing was used to infer sources and estimate size and number of introductions. The Flint River population showed the most unambiguous result: it was derived from a single introduction from the adjacent Saginaw Bay in Lake Huron, and experienced the strongest bottleneck. But models for the Au Sable and Cheboygan Rivers also showed evidence of origins from Saginaw Bay, rejected other sites in Lake Huron and more distant sites in Lake Michigan as sources, and in all cases, inferred founding populations of < 50 gobies. Since bait buckets dumped into inland waters are one likely vector for spread, management recommendations included enhancing education for boaters in Saginaw Bay, and encouraging bait dealers to more effectively sort out and remove gobies from their harvests (Sard et al. 2019).
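The logic of ABC can be shown with a toy rejection sampler that estimates the number of founders of a single introduction. This is a sketch under strong assumptions and is not the model of Sard et al. (2019) or of any other study cited here: the simulator samples 2N gene copies per locus from a source population with a fixed allele frequency, the only summary statistic is the ratio of mean expected heterozygosity after versus before founding, and the prior on founder number is uniform. All function names, parameters and default values are hypothetical.

```python
import random


def simulate_heterozygosity_ratio(n_founders, source_freq, n_loci, rng):
    """Simulate one founding event and return H_after / H_before.

    For each biallelic locus, 2 * n_founders gene copies are drawn from a
    source population in which the focal allele has frequency source_freq.
    """
    h_before = 2 * source_freq * (1 - source_freq)
    total = 0.0
    for _ in range(n_loci):
        copies = sum(rng.random() < source_freq for _ in range(2 * n_founders))
        p = copies / (2 * n_founders)
        total += 2 * p * (1 - p)
    return (total / n_loci) / h_before


def abc_rejection(observed_ratio, prior_max, n_sims, tolerance,
                  source_freq=0.5, n_loci=200, seed=1):
    """Return founder numbers whose simulations match the observation.

    Each simulation draws a founder number from a uniform prior on
    1..prior_max and is kept if its summary statistic lies within
    `tolerance` of the observed value. The accepted values approximate
    the posterior distribution of the founder number.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_sims):
        n_founders = rng.randint(1, prior_max)
        simulated = simulate_heterozygosity_ratio(
            n_founders, source_freq, n_loci, rng)
        if abs(simulated - observed_ratio) <= tolerance:
            accepted.append(n_founders)
    return accepted


if __name__ == "__main__":
    posterior = abc_rejection(observed_ratio=0.9, prior_max=50,
                              n_sims=300, tolerance=0.05, n_loci=100)
    print(sorted(posterior))
```

Published analyses replace this toy simulator with coalescent simulations of competing invasion scenarios and use many summary statistics, but the accept-or-reject comparison between simulated and observed data is the same in principle.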
Demography of colonizing populations
Founder effect and other demographic outcomes are often the focus of genetic studies of colonizing populations. Interest rose with the popularization of the so-called “genetic paradox” of biological invasions, i.e. the question of how colonizers, despite expected reductions in genetic diversity, can establish growing populations and maintain the adaptive capacity to persist over time (Allendorf and Lundquist 2003; Frankham 2004; Roman and Darling 2007). The deep theoretical roots of the issue trace to the authors of the Modern Synthesis (Baker and Stebbins 1965; Dobzhansky 1965). Based on population genetic studies, Roman and Darling (2007) found that the paradox is far from universal across aquatic species; invasive populations often show no detectable change in diversity compared to their native ranges. Estoup et al. (2016), moreover, listed several ways in which the paradox may be spurious, or even more interestingly, how evolution post-invasion may come to overshadow evidence for it. What do genome-scale studies have to say about the “paradox?”
Genomic markers confirm the lack of universal reduction in genetic diversity that has been found with population genetic markers. For example, RAD-seq revealed high levels of diversity and cryptic variation within invasive populations of Carcinus maenas in North America (Jeffery et al. 2017), which was attributed to secondary contact between independent introductions with admixture between them, as well as geographic gradients in selection driving clinal variation at selected loci. Levels of genomic variation were found to be similar among native and invasive populations at a global scale in Aedes albopictus. Lack of diversity loss was attributed to multiple introductions and high propagule pressure, although sampling was inadequate to distinguish those alternatives (Kotsakiozi et al. 2017). With invasive plants, the genetic paradox has been a topic of considerable study, with some complex findings. While founder effects have been found to be common (Dlugosch and Parker 2007), changes in molecular marker diversity often do not parallel those in quantitative characters, nor do they correlate with invasive potential (Dlugosch and Parker 2007; Dlugosch et al. 2015; Barker et al. 2017).
Genomes from invasive plants are confirming these results. For example, yellow starthistle (Centaurea solstitialis) is a damaging invasive weed in Western Europe and the Americas. Populations in its native Eurasian and introduced ranges on three continents were genotyped at ~ 1000 SNP loci developed from a ddRAD screen (Barker et al. 2017). ABC contrasts favored a well-supported invasion scenario in which western European populations derived from admixture, long ago, between populations in eastern Europe and Asia. More recently, western Europe functioned as a genetic “bridgehead” in which evolution produced adaptations that
facilitated the success of subsequent introductions to South and North America. California invasions came from a single source region (Chile). Then divergent genotypes founded multiple, independent introductions into the Pacific Northwest (from California and from admixed Western European populations). The bridgehead scenario was first introduced to invasion genetics in studies of Harmonia axyridis, the multicolored Asian ladybird beetle (Lombaert et al. 2010, 2014). Admixed bridgehead source populations in eastern North America were suggested to be where adaptations evolved that enabled rapid spread to western North America, western Europe, South America and Africa. In both H. axyridis and yellow starthistle, the lack of evidence for the paradox is attributed to admixture in bridgehead locations, and to adaptations evolved within genetically diverse bridgehead populations that facilitated their later invasive spread.
Evolution of invasiveness
Of great theoretical and applied interest is whether and how invasions are facilitated by adaptive evolution (Lee 2002; Sax et al. 2007; Cristescu 2015). Genomic analysis is a powerful approach for identifying evolutionary change in genes and functional networks that contribute to invasiveness. [By “invasiveness,” we refer to the ability to establish and spread, and not to the broader definition that includes impact on biodiversity (e.g. Colautti and MacIsaac 2004; Ricciardi and Cohen 2006)]. Also, whether adaptations that favor invasiveness occur prior to, during or after establishment is an important question for research and management (e.g. Hufbauer et al. 2012).
The chytrid fungus Batrachochytrium dendrobatidis is a high impact invasive species (Tables 1 and 2) responsible for epizootics that caused widespread extinctions of amphibians. Previous invasion genetic studies with molecular markers lacked the power to resolve invasion sources or to identify evolutionary events responsible for the emergence of this virulent pathogen. The species is a member of an early-branching lineage of non-pathogenic fungi (Division Chytridiomycota) in which all other known species are saprobes, feeding on decaying organic matter. Population genomic studies (Farrer et al. 2011; Joneson et al. 2011) have shed some light on the mystery. Whole genomes (just 24 Mb) were sequenced from 20 isolates collected worldwide (Farrer et al. 2011). Phylogenomic analysis identified a “global panzootic lineage” (termed BdGPL) that included all of the isolates from regional epizootic infections across five continents. This lineage showed evidence for positive selection on multiple gene products and numerous recombination breakpoints both within and between chromosomes; the latter is a pattern consistent with the hypothesized origin of BdGPL as an ancient hybrid lineage whose clonal descendants spread across the globe. Experimental
exposures of common toads (Bufo bufo) to all lineages revealed higher virulence of BdGPL. Finally, molecular clock dating of divergence between isolates collected over a decade dated the emergence of BdGPL to recent times (25–257 ya), overlapping with its hypothesized origin in the pet trade. While these studies have yet to identify virulence genes, they provide the important testable hypothesis that recombination between genomes initiates epizootics (Farrer et al. 2011).
Joneson et al. (2011) sought to further examine the evolutionary events that may have promoted the emergence of the single known vertebrate pathogen among this group of fungi. By sampling across 5 fungal phyla, they identified a long list of duplication events in protease gene families that are unique to the B. dendrobatidis lineage. But because the phylogeny of the available genomes spans hundreds of millions of years, duplication events cannot be placed with confidence on the tree. Using molecular clock dating of gene duplication events, Joneson et al. (2011) acknowledge that gene family expansions occurred millions of years prior to the outbreak of this pathogenic lineage, and suggest that finer scale intraspecific comparisons of paralogs will be required to determine whether protease duplications are involved in pathogenicity.
A comparative genomics study that also asks whether gene duplications promote invasiveness, but in a better-resolved phylogenetic framework, was performed with the Southeast Asian fruit fly Drosophila suzukii (Asplen et al. 2015). The species has been expanding rapidly in Europe and North America since arriving about 2008. Unlike other (genetically better-characterized) Drosophila, D. suzukii shows the unusual behaviors of egg laying and larval feeding on ripening rather than fermenting fruit, and as a consequence has become a damaging pest of soft fruits (e.g. blueberries, blackberries, strawberries). As part of research to develop integrated pest management, nuclear genomes, mitogenomes, and transcriptomes were recently sequenced and analyzed (Ometto et al. 2013).
To examine whether adaptive molecular changes helped facilitate the ecological shift to ripening fruit, Ramasamy et al. (2016) analyzed the repertoire of 131 genes involved in olfaction throughout the genus in three gene families: the odorant receptors (ORs), the antennal expressed ionotropic receptors (aIRs: odorant-responsive ion channels expressed in the sensory neurons of antennae), and the odorant binding proteins (OBPs). To do so they annotated the entire repertoire of these olfactory genes in two D. suzukii strains and in the closely related species, Drosophila biarmipes. They then searched 12 Drosophila genomes in FlyBase (Drysdale et al. 2005), together with the D. suzukii assembly of Ometto et al. (2013), for orthologs within the three gene families. The authors were able to establish several instances of gene loss, gene duplication and positive selection within
these gene families along the D. suzukii lineage—candidate adaptations that facilitated the switch in larval feeding and egg laying behaviors and promoted the success of this shift to new host plants (Ramasamy et al. 2016).
Also on our list of high-impact invasive species (Tables 1 and 2) is the Asian longhorned beetle (Anoplophora glabripennis), which causes damage to > 100 tree species worldwide. It belongs to the insect family Cerambycidae, containing the greatest diversity of animals capable of feeding on woody plants (> 35,000 species), all within the order Coleoptera (> 400,000 species). The A. glabripennis project (McKenna et al. 2016) allowed a detailed comparative genomic analysis of polyphagy and invasiveness. This analysis utilized 14 additional genomes across 6 insect orders, all of which are from species capable of digesting woody plants. It included 2 other beetle species whose genomes were analyzed for the first time (one being the emerald ash borer, a devastating pest of ash trees in North America) and a termite whose plant cell wall degrading enzymes are produced by symbiotic gut protists. The A. glabripennis genome encodes one of the largest repertoires of enzymes that can digest polysaccharides in wood. The analysis found large expansions of arthropod gene families that encode these enzymes, and many horizontal gene transfers (HGTs) from bacteria and fungi. Several of the HGTs are in the glycoside hydrolase gene family that digests plant cell walls, and are ancient insertions that have evolved into functional genes in A. glabripennis. The genome also contains large tandem gene expansions of detoxifying enzymes, allowing the plasticity to feed on a diversity of woody plants, and a large number of genes with chemosensory functions involved in locating host plants and finding mates.
Populations of the fall armyworm Spodoptera frugiperda (a noctuid moth) exist as two sympatric host races: the “corn strain” (C strain), feeding mostly on maize, cotton and sorghum, and the “rice strain” (R strain), feeding mostly on rice and pasture grasses. The C and R strains are morphologically indistinguishable, but differ in numerous physiological, behavioral and developmental traits. They also show fitness differences on their respective host plants, are estimated to have diverged 2 million ya, and have evolved partial pre- and postzygotic reproductive isolation, due in part to differential response by females to pheromones, i.e. they may represent incipient pheromonal host races. Native to North and South America and Caribbean islands, they are damaging crop pests in their home range. Gouin et al. (2017) used a fragmented genome assembly that, due to synteny across order Lepidoptera, still enabled establishment of orthology. The authors compared the diversity of genes associated with host plant use in this highly polyphagous species to some Lepidoptera with single host plants (i.e. the monophagous species Bombyx mori, Manduca sexta, Danaus plexippus,
and Heliconius melpomene), and between the C and R host races. The most striking of their findings is the vast diversification of the gustatory receptor genes (expressed on taste sensilla on tarsi, mouthparts and ovipositors) in S. frugiperda compared to the monophagous species. The C and R races showed no differences in their complement of chemosensory genes, but they show differences in copy number and sequence of genes controlling digestive and detoxification functions.
The elements of a genome sequencing project: essential concepts and quality metrics
Genome projects are complex, involving multiple steps (most conducted sequentially: Fig. 1), several technologies, and a team of investigators that collaborate across disciplines. Below we discuss the major elements in more detail.
Genome sequencing
The efforts leading up to the completion of the first draft of the human genome in 2001 marked a shift in the general strategies employed for genome sequencing (Fleischmann et al. 1995; Goffeau et al. 1996; Adams et al. 2000). Historically, due to the high cost of obtaining sequencing data, genome projects relied on an ordered assembly approach in which genome fragments were first cloned into vectors
such as bacterial artificial chromosomes (BACs), yeast artificial chromosomes (YACs) or Fosmids, and the order and chromosomal locations of these fragments were determined by various methods. This approach was employed by the public consortium sequencing the human genome (Lander et al. 2001). The massive parallelization and automation of capillary electrophoresis-based Sanger sequencing dramatically reduced cost and increased throughput, which allowed the Celera Genomics team to employ a shotgun sequencing approach, in which random genome fragments that had not been previously mapped or ordered were sequenced and subsequently assembled computationally (Venter et al. 2001). As sequencing costs have continued to fall with the advent of NGS technologies, shotgun sequencing has become standard practice for genome sequencing projects.
Initial NGS technologies had read lengths comparable to or shorter than typical Sanger sequencing reads (in the 600–1000 bp range). [Sanger sequencing is used in many “1st generation” DNA sequencing technologies, in which DNA molecules are synthesized enzymatically, and random incorporation of modified “chain terminating” nucleotides into the growing molecule allows the DNA base present at each position to be identified.] Next generation “short-read” sequencing technologies are exemplified by sequencing-by-synthesis platforms on Illumina (San Diego, CA) instruments. Illumina sequencing allows for either single end or paired end reads (in the latter, sequence is read from both ends of a DNA molecule), with read lengths between 50
Fig. 1 Elements of a genome project. A flow chart connecting the elements or steps in a genome project. Arrows connect steps that are sequential, i.e. the previous step outputs information necessary to conduct the next one. Transcriptome work can be done in parallel with genome sequencing. In parentheses are some prominent molecular technologies, with examples of some popular analysis programs in italics
and 300 bp. Illumina sequencing has several advantages that make it a standard in genome sequencing projects as a means to generate at least some short-read data. Illumina data are both cost-effective and highly accurate, with quality scores often exceeding modified Phred scores of 30 (one error in a thousand bases). While short reads are insufficient to bridge many repetitive sequences, additional strategies can be used to improve the quality of a short-read assembly, including the use of mate-pair sequencing (Metzker 2010) that can be used to span larger intervening segments of difficult-to-assemble DNA, or information from BAC or fosmid mapping (e.g. Zhang et al. 2012). More recently, techniques such as Hi-C [chromosome conformational capture from high-throughput sequencing: (Burton et al. 2013)] and synthetic long-read approaches, such as 10× Genomics (Zheng et al. 2016), have been successfully used to improve genome assemblies and for long-range mapping of polymorphisms to parental chromosomes [i.e. haplotype phasing (Seo et al. 2016; Moll et al. 2017)].
The availability and rapid improvement of long read sequencing technologies, such as Pacific Biosciences (PacBio, Menlo Park CA) and Oxford Nanopore (Oxford, UK), has been a major boon for genome sequencing and assembly. PacBio sequencing can routinely achieve read lengths of tens of kilobases (and up to hundreds of kilobases) and has been successfully used for de novo assembly of a large number of eukaryotic genomes (Rhoads and Au 2015). Such long reads are extremely powerful for assembly and have been used to complete a single 25 Mb contig that spans all of Drosophila melanogaster chromosome arm 3L (Berlin et al. 2015). Oxford Nanopore sequencing was recently used for de novo assembly of the human genome and can in extreme cases attain read lengths approaching 1 megabase. Both of these long-read technologies rely on the sequencing of single molecules, and thus have much higher intrinsic error rates than Illumina (Koren et al. 2012). However, several strategies for mitigating the higher intrinsic error rates have been employed, including circular consensus sequencing (CCS) in the case of PacBio, and reading both DNA strands in the case of Oxford Nanopore (Rhoads and Au 2015; Jain et al. 2016). Often, long-read sequencing projects employ a step known as “polishing” (Fig. 1), in which an additional bout of Illumina sequencing is performed to check and correct errors in sequences generated by long read technologies.
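The Phred scale mentioned above relates a quality score Q to the probability P that a base call is wrong through Q = -10 log10(P), which is why a score of 30 corresponds to one error in a thousand bases. The following is a minimal Python sketch of that conversion; the function names are illustrative and are not taken from any sequencing toolkit.

```python
import math

def phred_to_error_prob(q):
    """Probability that a base call is wrong, given its Phred score Q."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred score Q for a given per-base error probability P."""
    return -10 * math.log10(p)

# Print the expected error rate for a few commonly quoted quality scores.
for q in (10, 20, 30, 40):
    bases_per_error = round(1 / phred_to_error_prob(q))
    print(f"Q{q}: about 1 error in {bases_per_error:,} bases")
```

Because the scale is logarithmic, every 10-point increase in Q reduces the expected error rate tenfold.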
Genome assembly and annotation
Assembly can be the most time consuming and expensive step in genome sequencing of eukaryotes de novo. Completeness and contiguity of the assembly depend on several key factors, including intrinsic properties of the
genome (in particular, the number and types of repetitive sequences) and technical factors such as the length of the sequencing reads and the sequencing depth that can be obtained (Kingsford et al. 2010; Schatz et al. 2010; Henson et al. 2012). Genomes of eukaryotic organisms typically contain millions of DNA segments consisting of repeated sequence motifs that do not code for genes. In fact, over half the genome of humans and other mammals is composed of repetitive DNA (de Koning et al. 2011; Padeken et al. 2015), most of which is associated with insertions of transposable elements (DNA transposons and retrotransposons).
The quality of a genome assembly can be assessed in multiple ways (Box 1; Fig. 2). Contiguity measures, such as contig numbers and contig N50 values, provide one metric of assembly quality. The contig N50 is defined as the value (in units of contig length) where 50% of the assembled genome is in contigs larger than the N50 value. Depending on genome complexity, N50s in the range of kilobases to tens of kilobases are achievable with short-read NGS technologies, such as Illumina. By incorporating long-read technologies such as PacBio Single Molecule Real Time (SMRT) sequencing (Kingan et al. 2019) or Oxford Nanopore (Goodwin et al. 2015), contig N50 values in the megabase range can be obtained. The completeness of an assembly can be assessed by examining its gene content in an evolutionary context. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis (Simao et al. 2015) compares an assembly to a database of single copy orthologs present in > 90% of species within a clade (e.g. Metazoa) to provide a measure of the representation and complete annotation (discussed next) of these expected, highly conserved genes.
Once a draft assembly is completed, the genome can be annotated (i.e. the locations in the genome and functions of genes are predicted and the genes are named). There exist a number of different gene prediction programs, which look for open reading frames and other predicted gene features (Stanke and Morgenstern 2005; Goodswen et al. 2012). Additional algorithms can be used to functionally annotate the genome based on homology of the predicted genes to known proteins or protein domains. Annotation can be further improved by acquiring transcriptome data (Fig. 1), via RNA-sequencing (RNA-seq). As with genome assembly, long-read approaches for RNA-seq can also help to refine gene models and provide a fuller picture of the transcriptome. They do so by generating full length sequences of transcripts, to distinguish differences among transcript splice isoforms that short read RNA-seq cannot resolve (Sharon et al. 2013).
In contrast to the steadily changing and improving approaches to sequencing and assembly, and in fact to
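To give a feel for what "looking for open reading frames" means, here is a deliberately simplified Python sketch that scans the three forward reading frames of a DNA string for a start codon followed in frame by a stop codon. It is a toy illustration under stated assumptions (forward strand only, no introns, a hypothetical `min_codons` cutoff); the gene prediction programs cited above use far richer statistical models.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return sorted (start, end) half-open intervals of forward-strand ORFs.

    An ORF here runs from the first ATG in a frame through the next in-frame
    stop codon (stop included). min_codons counts the stop codon as well.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the frame codon by codon.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)

example = "CCATGAAATTTTAGGG"  # made-up sequence for illustration
for start, end in find_orfs(example):
    print(start, end, example[start:end])
```

A real annotation pipeline would also scan the reverse strand and model splice sites, which is where transcriptome evidence becomes valuable.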
virtually all other steps in a genome project (Fig. 1), annotations are done just about as they were nearly 25 ya (Salzberg 2019). The first step, automated annotation, is particularly challenging for the number of large, fragmented, “draft” eukaryotic genomes that are accumulating, and accuracy is worsened by the tendency for errors to propagate as investigators annotate new genome drafts. RNA-seq will reduce errors, and nanopore technologies on the horizon that will sequence RNAs without first converting them to DNA (as in RNA-seq) will do so even more (Salzberg 2019). In our experience,
moreover, the highly unfragmented zebra mussel genome remained difficult to annotate in part due to the absence of closely related sequenced bivalves (McCartney et al. 2019). Oysters, scallops and other sequenced bivalves diverged from zebra mussels hundreds of millions of years ago (McCartney et al. 2019). A strong argument in favor of generating a high-quality zebra mussel genome assembly was to begin to fill this gap in the mollusk phylogenomic tree, and the same argument can be made for many other taxa in our disciplines.
Fig. 2 Genome assembly and scaffolding. Top: Reads, Contigs, and Scaffolds. Illustration of how reads are assembled into contigs, which are then further placed on scaffolds which contain ordered contigs interspersed with gaps. Bottom: Contiguity Measures. Illustration of how N50 and L50 statistics are calculated using a hypothetical 1 Mb genome. Figure adapted from: https://github.com/schatzlab/teachingarchive/blob/master/2012/CSHL.Sequencing/Whole%20Genome%20Assembly%20and%20Alignment.pdf
Box 1: Key terms and concepts in genome sequencing and assembly
Sequencing depth and coverage These terms are often used interchangeably depending on the application. Coverage or depth of 100 × means that, on average, 100 reads contain a given DNA base at a position along the sequence that is “reconstructed” after piling up all of the sequencing reads. Depth is a function of the total length of DNA being sequenced, the capacity of the technology (e.g. the number of bases that can be sequenced on a lane of an Illumina instrument) and whether the reads are sequenced from only one end, or from both ends (as in paired-end and mate-pair sequencing, discussed below).
Contig A contig is a contiguous block of sequence that can be assembled at high confidence (Fig. 2). Contigs are assembled by stitching together overlapping sequencing reads. The full set of reads at a given genomic position (often referred to as a “pileup”) can be used to correct sequencing errors by consensus. Ploidy can be used to set an assumption a priori for the number of different reads expected at a given position; however, the existence of closely related paralogs can complicate such analysis. [Paralogs are members of gene families resulting from gene duplication events that generate multiple gene copies at different positions on the chromosomes.]
Scaffold A scaffold is a set of contigs which are linked by gaps of known or estimated length. Repetitive or difficult to sequence regions can create contig break points that cannot be unambiguously assembled (Fig. 2). However, it may be possible to generate evidence that supports the ordering of a set of contigs and in some cases measure the distance between them, which can be used to position contigs on scaffolds. This evidence can include paired-end or mate-pair reads that bridge two contigs, as well as other forms of evidence, such as optical mapping and Hi-C (Schwartz et al. 1993; Burton et al. 2013).
Contiguity measures: N50 and L50 The N50 and L50 values are measures of the contiguity of an assembly. These statistics can be applied to either contigs or scaffolds. N50 is defined as the contig (or scaffold) size where 50% of the genome is in contigs (or scaffolds) larger than the N50 size (Fig. 2). L50 is the minimum number of contigs (or scaffolds) in which 50% of the genome is contained (Fig. 2). Other similar statistics, such as N75 and L75 (where the percentage of the genome described by these values is 75%), are also sometimes reported.
A note on N50 and L50 values While N50 and L50 (and similar contiguity measures) are typically used to assess the technical quality of an assembled genome, it should be noted that these statistics are only strictly comparable when assessing genomes of the same size and number of chromosomes. For instance, the current human genome release (Hg38, RefSeq accession: GCF_000001405.38) still has many gaps and exists in 472 scaffolds (more than an order of magnitude greater than the expected 23 chromosomes), while the Escherichia coli K-12 (E. coli) genome is perfectly complete (RefSeq accession: GCF_000005845.2: Blattner et al. 1997), existing as a single closed contig. Yet due to the much larger size of the human genome compared to E. coli, the scaffold N50 of the human Hg38 assembly is > 67 Mb, while the E. coli N50 is only 4.6 Mb (the size of the full genome). Most metazoan genomes are sufficiently large that N50 and L50 values at both contig and scaffold levels provide useful information about the technical quality of the assembly. Therefore, we provide them (Table 2) to evaluate the quality of invasive species genome assemblies. However, recent technological advances such as Hi-C (Burton et al. 2013) are capable of scaffolding reads at the scale of a full chromosome, in which case the scaffold N50 and L50 values are essentially determined by the properties of the genome being studied as opposed to technical factors.
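The N50 and L50 definitions in Box 1 translate directly into a few lines of code. The Python sketch below is one way to compute them from a list of contig (or scaffold) lengths; the function name and the example lengths are made up for illustration, and the total assembly length is used as a stand-in for genome size, which is an assumption.

```python
def n50_l50(lengths, fraction=0.5):
    """Return (N, L) for a list of contig or scaffold lengths.

    N is the length of the contig at which the running total, taken from the
    longest contig downward, first reaches `fraction` of the assembly length.
    L is the number of contigs needed to reach that point.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count

# A hypothetical 1 Mb assembly split into seven contigs.
contigs = [300_000, 250_000, 150_000, 100_000, 100_000, 50_000, 50_000]
print(n50_l50(contigs))            # N50 and L50
print(n50_l50(contigs, 0.75))      # N75 and L75
# A genome closed into a single contig: N50 equals the genome size.
print(n50_l50([4_600_000]))
```

The last call mirrors the point of the note above: for a genome assembled into one closed contig, the N50 is simply the genome length, so the statistic reflects genome size rather than assembly effort.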
Reference genomes in invasion and conservation genomics
Still, investigators have limited budgets, and whole genome sequences are overkill for some applications. GBS, RAD-seq and related technologies offer finer-scale resolution of population histories for invasive species and endangered species than ever before available, with more robust results from sampling thousands of independent gene genealogies, all without the need for high quality reference genomes. Elleouet and Aitken (2018) explored with coalescent simulations the ability of ABC to accurately estimate parameters (e.g. size of founding populations; age of introductions) for scenarios of recent spatial expansion of one ancestral population into a descendant, growing population; a model relevant to range expansion and biological invasions (as well as recovery of an endangered population). They find that shallow sequencing of large numbers of individuals estimates known parameters more accurately than deep sequencing of fewer individuals per population, and they show that phased haplotype sequences and linkage disequilibrium information provide no more accuracy than SNPs scored without this information. Their message from this study and several reviews (Davey et al. 2011; Nielsen et al. 2011; Mastretta-Yanes et al. 2015) is to focus on reducing genotyping error and not on a genome assembly.
In contrast, high quality assemblies are required when knowledge of genomic architecture is needed. Genomic studies of inbreeding and inbreeding depression in wild populations (Kardos et al. 2017a, b) are an example. Kardos et al. (2017a) surveyed genomes of 97 grey wolves from bottlenecked Scan-dinavian populations for Runs of Homozygosity (ROH): iden-tical-by-descent blocks of chromosomes evidenced by contigu-ous stretches of homozygous loci. Whole genome sequences mapped to a high-quality reference genome revealed that long ROH—those marking regions of inbreeding estimated to have arisen < 10 generations in the past—made up the majority of DNA regions that are identical by descent. This genomic approach offers a powerful alternative to Genome Wide Asso-ciation (GWA) mapping, and its requirement for extensive ped-igree information. In the wolf study, pedigree information was available, and correlations were calculated between the real-ized inbreeding coefficient FROH (from genome-wide ROH), the pedigree derived value (FP) and values of F estimated from 500 to 10,000 subsampled SNPs. Values of F from the smaller SNP panels were more closely correlated with the genome-wide FROH value than was FP (Kardos et al. 2017a), confirming the accuracy of the genomic methods. The success of ROH studies in humans and domesticated mammals indicate their power to complement any available pedigree information from inbred natural populations that are, for example, candidates for genetic restoration (Kardos et al. 2016).
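As a rough illustration of how an ROH-based inbreeding coefficient can be derived from mapped genotypes, the Python sketch below finds runs of consecutive homozygous SNPs along one chromosome and reports the fraction of the chromosome they cover. It assumes the common definition of FROH as total ROH length divided by the length of genome surveyed; the length threshold, the function names and the toy data are hypothetical, and this is not the procedure used by Kardos et al. (2017a).

```python
def find_roh(positions, is_homozygous, min_length_bp):
    """Return (start, end) base-pair intervals of runs of homozygous SNPs.

    positions: SNP coordinates in ascending order along one chromosome.
    is_homozygous: one boolean per SNP for a single individual.
    Runs shorter than min_length_bp are discarded.
    """
    runs = []
    start = None
    prev = None
    for pos, hom in zip(positions, is_homozygous):
        if hom:
            if start is None:
                start = pos
            prev = pos
        else:
            # A heterozygous SNP ends the current run, if any.
            if start is not None and prev - start >= min_length_bp:
                runs.append((start, prev))
            start = None
    if start is not None and prev - start >= min_length_bp:
        runs.append((start, prev))
    return runs

def f_roh(runs, surveyed_length_bp):
    """Fraction of the surveyed sequence that lies within ROH."""
    return sum(end - start for start, end in runs) / surveyed_length_bp

# Toy data: 11 SNPs spaced 100 kb apart on a 1 Mb chromosome.
positions = list(range(0, 1_000_001, 100_000))
genotypes = [True, True, True, True, False, True, True, False, True, True, True]
runs = find_roh(positions, genotypes, min_length_bp=200_000)
print(runs, f_roh(runs, 1_000_000))
```

The sketch also shows why assembly contiguity matters here: SNP positions must be ordered correctly along chromosomes for long runs to be detected at all.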
Seven of the invasive-species projects that we reviewed were mammal genome projects motivated by domestic breed-ing programs, or by utility of the species as a biomedical research model (Table 2). Nevertheless, these provide infor-mation useful to our disciplines, as they can be used to gauge the utility of improved genome assemblies. The most recent release of the goat genome (Bickhart et al. 2017) employed long and short read sequencing, combined with optical map-ping and Hi-C to scaffold the assembly, resulting in what the authors claimed to be the most well-assembled genome of any non-human mammal. The new assembly mapped to chromosomes 90% of the 1723 previously unmapped SNP markers on the 52 K SNP Chip, a high throughput genotyping tool used in breeding programs, and improved their scoring call rate—apparently because these unmapped markers fall within repeat regions. Immune system genes showed particu-lar improvements with the new assembly due to their repeti-tive nature and extreme polymorphism, with two of the major gene regions mapping to a single scaffold. The project was completed at a total cost of $100,000 (Bickhart et al. 2017).
Scans for differentiation (based on FST and related indices) have been used for decades now to survey genomes for selec-tion. A population genomic perspective on this approach in birds recommends caution (Wolf and Ellegren 2017; Peona et al. 2018). Surveys of genomes of hooded and carrion crow in their European hybrid zone located a putative “speciation island” of high interspecific differentiation on chromosome
18, but this island spans an assembly gap of unknown size. Addition of long-read sequencing and optimal mapping found the gap to coincide with a large repeat region potentially asso-ciated with a centromere, and population resequencing data confirmed that this region showed depressed recombination, complicating its analysis (Weissensteiner et al. 2017). Recom-bination rate variation across the genomes of another avian speciation model, Ficedula albicollis and F. hypoleuca fly-catchers, is associated with structural features of the genome, such as transposable elements and locations for promoters in intergenic regions. Without high quality assembly, the fine scale “landscape” of recombination hotspots and coldspots would be missed, confounding analysis of the closely related genomes of these hybridizing species (Burri et al. 2015).
Here we point out that genome sequencing projects are works in progress, in which assembly and annotation are con-tinuously improved upon, and technological advancements as well as investigator effort drives progress. To illustrate, the first release of the human genome assembly (WGSA: from the December 2001 assembly by Celera Genomics) yielded a con-tig N50 of 23,350 bp. Since then, 221 assembly versions later, the latest update of the human reference genome (GRCh38.p12), boasts a contig N50 of 57,879,411 bp (NCBI 2019d, accessed 1 August 2019).
These and other examples underscore the value of community efforts. Projects on model organisms often use networks of investigators to undertake whole genome sequencing projects on strategically sampled populations worldwide, such as the 1001 Genomes Consortium (2016) for Arabidopsis and the Drosophila melanogaster Genome Nexus [623 genomes: (Lack et al. 2015)]. One effort in a non-model taxon is a low-coverage (20×) whole-genome sequencing project to study genomic variation associated with domestication, phenotypic variation and disease states across > 5000 dog breeds and other canids (Ostrander et al. 2019). Another is in birds. There were three avian assemblies in 2001 and 101 by 2018, and the B10K initiative has the stated goal of sequencing 10,500 genomes, i.e. nearly all bird species on Earth, for phylogenomics and a variety of other applications (Jarvis 2016; Peona et al. 2018). Variation in assembly and annotation quality has been problematic in the phylogenomics project, and Jarvis (2016) suggests that researchers can help alleviate these issues and make genome projects more efficient by cooperatively generating reference genomes from strategically chosen species.
Genomics in invasive species management: development of biocontrols
Some invasive species genome projects have been launched to discover biocontrols. For example, vector-directed biocontrol drove the sequencing of the genomes of the invasive mosquito species that carry malaria [Anopheles gambiae: (Holt et al. 2002)] and yellow fever, dengue and Zika viruses [Aedes aegypti: (Nene et al. 2007; Matthews et al. 2018)]. Sequencing of the genome of the crown-of-thorns sea star (Acanthaster planci spp. group) identified the genes for an array of molecules released when animals aggregate to spawn, including a large number of unique ependymin-family proteins, which are active in the central nervous system of many animals, and their putative receptors (Hall et al. 2017). This communication system may be a target for biocontrol using synthetic peptides that mimic aggregation cues.
Genetic modification technologies for biocontrol of invasive species, crop pests and vectors of human disease are drawing much recent attention. Molecular biologists have invented several techniques with which they can deliver foreign DNA, or make precise edits in native DNA. The CRISPR/Cas9 gene editing system has received the greatest recent attention for applications in biological conservation, including control of invasive species (Champer et al. 2016; Noble et al. 2018; Rode et al. 2019). This approach offers the potential for spreading genetically edited alleles throughout wild populations (even when they lower fitness), through a mechanism known as a “gene drive.” The enzyme Cas9 cleaves target genes and directs their precise editing. When properly engineered, the Cas9 editing system can initiate conversion of the non-modified allele on the homologous chromosome to the modified allele, making the edited organism homozygous and leading to super-Mendelian inheritance (Burt 2003; Gantz and Bier 2015; Gantz et al. 2015). Mosquito vectors of disease have been used in several laboratory trials (e.g., Gantz et al. 2015), including one in which a sex ratio distortion gene drive caused complete collapse of caged populations of Anopheles gambiae (Kyrou et al. 2018).
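The claim that a drive allele can spread even when it lowers fitness can be illustrated with a simple deterministic recursion. The Python sketch below is a textbook-style toy model under stated assumptions, not the model of any study cited here: random mating, an infinite population, selection acting on zygotes, and conversion in the germline of heterozygotes, so that a heterozygote transmits the drive allele with probability (1 + c)/2, where c is the conversion efficiency. The parameter names (`conversion`, `s`, `h`) and function names are hypothetical.

```python
from typing import List


def next_drive_frequency(q: float, conversion: float, s: float = 0.0, h: float = 0.5) -> float:
    """One generation of a homing gene drive under random mating.

    q          -- current frequency of the drive allele
    conversion -- probability c that the wild-type allele is converted in
                  the germline of a heterozygote (c = 0 is Mendelian)
    s          -- fitness cost to drive homozygotes (fitness 1 - s)
    h          -- dominance of the cost in heterozygotes (fitness 1 - h*s)
    """
    p = 1.0 - q
    f_dd = q * q * (1.0 - s)              # drive homozygotes after selection
    f_dw = 2.0 * q * p * (1.0 - h * s)    # heterozygotes after selection
    f_ww = p * p                          # wild-type homozygotes
    mean_fitness = f_dd + f_dw + f_ww
    # Heterozygotes pass on the drive allele with probability (1 + c) / 2.
    return (f_dd + f_dw * (1.0 + conversion) / 2.0) / mean_fitness


def simulate_drive(q0: float, conversion: float, generations: int,
                   s: float = 0.0, h: float = 0.5) -> List[float]:
    """Return the drive-allele frequency trajectory, starting at q0."""
    trajectory = [q0]
    for _ in range(generations):
        trajectory.append(next_drive_frequency(trajectory[-1], conversion, s, h))
    return trajectory
```

Under these assumptions, an allele with a 20% homozygous fitness cost declines when inherited in Mendelian fashion (`conversion=0.0`) but rises towards fixation when conversion is efficient (for instance `conversion=0.9`). Real systems add complications that this sketch ignores, such as resistance alleles, population structure and non-random mating.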
The first step in genetic modification is the selection of target genes and biological processes that, when modified, will produce the desired fitness effect (lethality, reduced viability, infertility). Genome sequence information is required for selecting target genes and designing editing constructs. For example, Drury et al. (2017) generated genomic sequences from four global populations of the flour beetle Tribolium castaneum to examine population DNA variation at Cas9 sites. Edits at these sites are expected to produce a range of fitness costs, due to their effects on eye pigmentation, female and male fertility, and insecticide sensitivity. Maselko et al. (2017) searched the yeast genome for target gene promoter regions that, when modified [using a “dead” Cas9 enzyme (dCas9) that does not cause a gene drive], would direct lethal overexpression of the gene product. To screen for efficacy, they searched population genomic data from rice and fruit flies for variants in dCas9 sites within promoter regions.
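The kind of screen described above, locating candidate Cas9 sites and then asking whether they are polymorphic in natural populations, can be sketched in a few lines of Python. This is an illustrative sketch only and not the pipeline used by Drury et al. (2017) or Maselko et al. (2017). It assumes the commonly used Streptococcus pyogenes Cas9, whose target is a 20-nucleotide protospacer followed by an NGG motif, searches the forward strand only, and takes variant positions as 0-based coordinates; the function names are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# A lookahead is used so that overlapping candidate sites are all reported.
_SITE_PATTERN = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")

SITE_LENGTH = 23  # 20-nt protospacer plus the 3-nt NGG motif


def find_cas9_sites(sequence: str) -> List[Tuple[int, str]]:
    """Return (start, protospacer) for every 20-mer followed by NGG.

    Only the forward strand is scanned (an assumption made for brevity);
    start positions are 0-based.
    """
    return [(m.start(), m.group(1)) for m in _SITE_PATTERN.finditer(sequence.upper())]


def polymorphic_sites(
    sites: Iterable[Tuple[int, str]], variant_positions: Iterable[int]
) -> List[Tuple[int, str]]:
    """Return the sites that contain at least one known variant position."""
    variants = set(variant_positions)
    flagged = []
    for start, protospacer in sites:
        if any(pos in variants for pos in range(start, start + SITE_LENGTH)):
            flagged.append((start, protospacer))
    return flagged
```

Sites returned by `polymorphic_sites` are those where standing variation could prevent recognition in some individuals, which is why population-level sequence data are useful when choosing targets. A fuller screen would also scan the reverse strand and weight variants by their position within the site.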
Conclusions
Genomic analyses have already demonstrated their greater power for the most common application of genetics in invasion biology: reconstruction of invasion history. Source populations and invasion pathways can be identified with much finer resolution, even offering the potential to perform “invasion forensics” at the geographic scales at which agencies implement invasive species management programs. Genomics has also shown greater power to investigate the “genetic paradox” of invasions, and to provide better estimates of the number and size of introductions.
In addition to these largely quantitative improvements in power (e.g. from genotyping thousands of unlinked SNPs), genomics provides qualitatively different information about genome structure that is just beginning to be explored in studies of endangered and invasive species. In endangered species, studies of the genomic architecture of homozygosity have the potential to revolutionize studies of inbreeding and inbreeding depression, as well as the design of captive breeding and genetic restoration programs. In genomes of invasive species, one of the most commonly reported structural features is gene family expansion, and comparative genomics is providing improved rigor to determine its role in the evolution of invasiveness.
A major challenge remaining for both invasion and conservation genomics is the identification of genes that control invasiveness, disease resistance, or countless other traits of conservation relevance. The perspective of Salzberg (2019) reminds us that success will be limited by the quality of the annotations of genomes of the mostly non-model species that we study. We suggest that this leads to the strongest argument in favor of community efforts to gather the needed genomic resources, and for efficient strategies to generate high quality reference genomes from carefully chosen species.
Acknowledgements We thank Benjamin Auch and Kenneth Beckman in the University of Minnesota Genomics Center, Adam Herman, Thomas Kono, Kevin Silverstein, and Ying Zhang of the Minnesota Supercomputing Institute and many other collaborators for their exceptional contributions to the zebra mussel genome project that inspired this review. Funding was provided by grants from the Minnesota Environment and Natural Resources Trust Fund and the Minnesota Aquatic Invasive Species Research Center, and from private donations.
References
Adams MD, Celniker SE, Holt RA et al (2000) The genome sequence of Drosophila melanogaster. Science 287:2185–2195
Allendorf FW (2017) Genetics and the conservation of natural populations: allozymes to genomes. Mol Ecol 26:420–430
Allendorf FW, Lundquist LL (2003) Introduction: population biology, evolution, and control of invasive species. Conserv Biol 17:24–30
Allendorf FW, Hohenlohe PA, Luikart G (2010) Genomics and the future of conservation genetics. Nat Rev Genet 11:697–709
Allendorf FW, Luikart G, Aitken SN (2013) Conservation and the genetics of populations, 2nd edn. Wiley-Blackwell, Chichester
Andrews KR, Good JM, Miller MR, Luikart G, Hohenlohe PA (2016) Harnessing the power of RADseq for ecological and evolutionary genomics. Nat Rev Genet 17:81–92
Ascunce MS, Yang C-C, Oakey J, Calcaterra L, Wu W-J, Shih C-J, Goudet J, Ross KG, Shoemaker D (2011) Global invasion history of the fire ant Solenopsis invicta. Science 331:1066–1068
Asplen MK, Anfora G, Biondi A et al (2015) Invasion biology of spotted wing Drosophila (Drosophila suzukii): a global perspective and future priorities. J Pest Sci 88:469–494
Baker HG, Stebbins GL (eds) (1965) The genetics of colonizing species. Academic Press, New York
Bana NA, Nyiri A, Nagy J et al (2018) The red deer Cervus elaphus genome CerEla1.0: sequencing, annotating, genes, and chromosomes. Mol Genet Genom 293:665–684
Barker BS, Andonian K, Swope SM, Luster DG, Dlugosch KM (2017) Population genomic analyses reveal a history of range expansion and trait evolution across the native and invaded range of yellow starthistle (Centaurea solstitialis). Mol Ecol 26:1131–1147
Barrett SCH (2017) Foundations of invasion genetics: The Baker and Stebbins legacy. In: Barrett SCH, Colautti RI, Dlugosch KM, Rieseberg LH (eds) Invasion genetics: the Baker and Stebbins legacy. Wiley, Chichester, pp 1–20
Berlin K, Koren S, Chin C-S, Drake JP, Landolin JM, Phillippy AM (2015) Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. Nat Biotechnol 33:623–630
Berthelot C, Brunet F, Chalopin D et al (2014) The rainbow trout genome provides novel insights into evolution after whole-genome duplication in vertebrates. Nat Commun 5:3657
Bickhart DM, Rosen BD, Koren S et al (2017) Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome. Nat Genet 49:643–650
Black WC IV, Baer CF, Antolin MF, DuTeau NM (2001) Population genomics: genome-wide sampling of insect populations. Annu Rev Entomol 46:441–469
Blattner FR, Plunkett G 3rd, Bloch CA et al (1997) The complete genome sequence of Escherichia coli K-12. Science 277:1453–1462
Bohme U, Otto TD, Cotton JA et al (2018) Complete avian malaria parasite genomes reveal features associated with lineage-specific evolution in birds and mammals. Genome Res 28:547–560
Bourne SD, Hudson J, Holman LE, Rius M (2018) Marine invasion genomics: revealing ecological and evolutionary consequences of biological invasions. In: Rajora OP (ed) Population genomics: concepts, approaches and applications. Springer, Switzerland, pp 1–36
Burri R, Nater A, Kawakami T et al (2015) Linked selection and recombination rate variation drive the evolution of the genomic landscape of differentiation across the speciation continuum of Ficedula flycatchers. Genome Res 25:1656–1665
Burt A (2003) Site-specific selfish genes as tools for the control and genetic engineering of natural populations. Proc R Soc Lond Ser B 270:921–928
Burton JN, Adey A, Patwardhan RP, Qiu R, Kitzman JO, Shendure J (2013) Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat Biotechnol 31:1119–1125
Carneiro M, Rubin C-J, Di Palma F et al (2014) Rabbit genome analysis reveals a polygenic basis for phenotypic change during domestication. Science 345:1074–1079
Catchen J, Bassham S, Wilson T, Currey M, O’Brien C, Yeates Q, Cresko WA (2013a) The population structure and recent colonization history of Oregon threespine stickleback determined using restriction-site associated DNA-sequencing. Mol Ecol 22:2864–2883
Catchen J, Hohenlohe PA, Bassham S, Amores A, Cresko WA (2013b) Stacks: an analysis tool set for population genomics. Mol Ecol 22:3124–3140
Champer J, Buchman A, Akbari OS (2016) Cheating evolution: engineering gene drives to manipulate the fate of wild populations. Nat Rev Genet 17:146–159
Chen X-G, Jiang X, Gu J et al (2015) Genome sequence of the Asian Tiger mosquito, Aedes albopictus, reveals insights into its biology, genetics, and evolution. Proc Natl Acad Sci 112:E5907–E5915
Chen W, Hasegawa DK, Kaur N et al (2016) The draft genome of whitefly Bemisia tabaci MEAM1, a global crop pest, provides novel insights into virus transmission, host adaptation, and insecticide resistance. BMC Biol 14:110
Chown SL, Hodgins KA, Griffin PC, Oakeshott JG, Byrne M, Hoffmann AA (2015) Biological invasions, climate change and genomics. Evol Appl 8:23–46
Colautti RI, MacIsaac HJ (2004) A neutral terminology to define “invasive” species. Divers Distrib 10:135–141
Cresko WA, Amores A, Wilson C, Murphy J, Currey M, Phillips P, Bell MA, Kimmel CB, Postlethwait JH (2004) Parallel genetic basis for repeated evolution of armor loss in Alaskan threespine stickleback populations. Proc Natl Acad Sci USA 101:6050–6055
Cristescu ME (2015) Genetic reconstructions of invasion history. Mol Ecol 24:2212–2225
Darling JA, Bagley MJ, Roman J, Tepolt CK, Geller JB (2008) Genetic patterns across multiple introductions of the globally invasive crab genus Carcinus. Mol Ecol 17:4992–5007
Darling JA, Tsai YH, Blakeslee AM, Roman J (2014) Are genes faster than crabs? Mitochondrial introgression exceeds larval dispersal during population expansion of the invasive crab Carcinus maenas. R Soc Open Sci 1:140202
Davey JW, Hohenlohe PA, Etter PD, Boone JQ, Catchen JM, Blaxter ML (2011) Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nat Rev Genet 12:499–510
de Koning APJ, Gu W, Castoe TA, Batzer MA, Pollock DD (2011) Repetitive elements may comprise over two-thirds of the human genome. PLoS Genet 7:e1002384
Dlugosch KM, Parker IM (2007) Founding events in species invasions: genetic variation, adaptive evolution, and the role of multiple introductions. Mol Ecol 17:431–449
Dlugosch KM, Anderson SR, Braasch J, Cang FA, Gillette HD (2015) The devil is in the details: genetic variation in introduced populations and its contributions to invasion. Mol Ecol 24:2095–2111
Dobzhansky T (1965) “Wild” and “domestic” species of Drosophila. In: Baker HG, Stebbins GL (eds) The genetics of colonizing species. Academic Press, New York, pp 533–546
Drury DW, Dapper AL, Siniard DJ, Zentner GE, Wade MJ (2017) CRISPR/Cas9 gene drives in genetically variable and nonrandomly mating wild populations. Sci Adv 3:e1601910
Drysdale RA, Crosby MA, FlyBase C (2005) FlyBase: genes and gene models. Nucleic Acids Res 33:D390–D395
Edwards RJ, Tuipulotu DE, Amos TG et al (2018) Draft genome assembly of the invasive cane toad, Rhinella marina. GigaScience 7:1–13
Elleouet JS, Aitken SN (2018) Exploring Approximate Bayesian Computation for inferring recent demographic history with genomic markers in nonmodel species. Mol Ecol Resour 18:525–540
Emerson KJ, Merz CR, Catchen JM, Hohenlohe PA, Cresko WA, Bradshaw WE, Holzapfel CM (2010) Resolving postglacial phylogeography using high-throughput sequencing. Proc Natl Acad Sci USA 107:16196–16200
Engelbrecht J, Duong TA, Berg NVD (2017) New microsatellite markers for population studies of Phytophthora cinnamomi, an important global pathogen. Sci Rep 7:17631
Estoup A, Guillemaud T (2010) Reconstructing routes of invasion using genetic data: why, how and so what? Mol Ecol 19:4113–4130
Estoup A, Ravigné V, Hufbauer R, Vitalis R, Gautier M, Facon B (2016) Is there a genetic paradox of biological invasion? Annu Rev Ecol Evol Syst 47:51–72
Farrer RA, Weinert LA, Bielby J et al (2011) Multiple emergences of genetically diverse amphibian-infecting chytrids include a globalized hypervirulent recombinant lineage. Proc Natl Acad Sci USA 108:18732–18736
Fleischmann RD, Adams MD, White O et al (1995) Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269:496–512
Frankham R (2004) Resolving the genetic paradox in invasive species. Heredity 94:385
Gantz VM, Bier E (2015) The mutagenic chain reaction: a method for converting heterozygous to homozygous mutations. Science 348:442–444
Gantz VM, Jasinskiene N, Tatarenkova O, Fazekas A, Macias VM, Bier E, James AA (2015) Highly efficient Cas9-mediated gene drive for population modification of the malaria vector mosquito Anopheles stephensi. Proc Natl Acad Sci USA 112:E6736
1001 Genomes Consortium (2016) 1135 genomes reveal the global pattern of polymorphism in Arabidopsis thaliana. Cell 166:481–491
Goffeau A, Barrell BG, Bussey H et al (1996) Life with 6000 genes. Science 274:546
Goodswen SJ, Kennedy PJ, Ellis JT (2012) Evaluating high-throughput ab initio gene finders to discover proteins encoded in eukaryotic pathogen genomes missed by laboratory techniques. PLoS ONE 7:e50609
Goodwin S, Gurtowski J, Ethe-Sayers S, Deshpande P, Schatz MC, McCombie WR (2015) Oxford Nanopore sequencing, hybrid error correction, and de novo assembly of a eukaryotic genome. Genome Res 25:1750–1756
Gouin A, Bretaudeau A, Nam K et al (2017) Two genomes of highly polyphagous lepidopteran pests (Spodoptera frugiperda, Noctuidae) with different host-plant ranges. Sci Rep 7:11816
Hall MR, Kocot KM, Baughman KW et al (2017) The crown-of-thorns starfish genome as a guide for biocontrol of this coral reef pest. Nature 544:231–234
Hammond SA, Warren RL, Vandervalk BP et al (2017) The North American bullfrog draft genome provides insight into hormonal regulation of long noncoding RNA. Nat Commun 8:1433
Henson J, Tischler G, Ning Z (2012) Next-generation sequencing and large genome assemblies. Pharmacogenomics 13:901–915
Higashino A, Sakate R, Kameoka Y, Takahashi I, Hirata M, Tanuma R, Masui T, Yasutomi Y, Osada N (2012) Whole-genome sequencing and analysis of the Malaysian cynomolgus macaque (Macaca fascicularis) genome. Genome Biol 13:R58
Hoffberg SL, Troendle NJ, Glenn TC, Mahmud O, Louha S, Chalopin D, Bennetzen JL, Mauricio R (2018) A high-quality reference genome for the invasive mosquitofish Gambusia affinis using a Chicago Library. G3 (Bethesda) 8:1855–1861
Hohenlohe PA, Bassham S, Etter PD, Stiffler N, Johnson EA, Cresko WA (2010) Population genomics of parallel adaptation in threespine stickleback using sequenced RAD tags. PLoS Genet 6:e1000862
Holt RA, Subramanian GM, Halpern A et al (2002) The genome sequence of the malaria mosquito Anopheles gambiae. Science 298:129–149
Horvath DP, Patel S, Doğramaci M et al (2018) Gene space and transcriptome assemblies of leafy spurge (Euphorbia esula) identify promoter sequences, repetitive elements, high-quality markers, and a full-length chloroplast genome. Weed Sci 66:355–367
Hufbauer RA, Facon B, Ravigne V, Turgeon J, Foucaud J, Lee CE, Rey O, Estoup A (2012) Anthropogenically induced adaptation to invade (AIAI): contemporary adaptation to human-altered habitats within the native range can promote invasions. Evol Appl 5:89–101
Jain M, Olsen HE, Paten B, Akeson M (2016) The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biol 17:239
Jarvis ED (2016) Perspectives from the avian phylogenomics project: questions that can be answered with sequencing all genomes of a vertebrate class. Annu Rev Anim Biosci 4:45–59
Jeffery NW, DiBacco C, Van Wyngaarden M et al (2017) RAD sequencing reveals genomewide divergence between independent invasions of the European green crab (Carcinus maenas) in the Northwest Atlantic. Ecol Evol 7:2513–2524
Joneson S, Stajich JE, Shiu S-H, Rosenblum EB (2011) Genomic transition to pathogenicity in chytrid fungi. PLoS Pathog 7:e1002338
Kardos M, Taylor HR, Ellegren H, Luikart G, Allendorf FW (2016) Genomics advances the study of inbreeding depression in the wild. Evol Appl 9:1205–1218
Kardos M, Åkesson M, Fountain T et al (2017a) Genomic consequences of intensive inbreeding in an isolated wolf population. Nat Ecol Evol 2:124–131
Kardos M, Qvarnstrom A, Ellegren H (2017b) Inferring individual inbreeding and demographic history from segments of identity by descent in Ficedula flycatcher genome sequences. Genetics 205:1319–1334
Kingan SB, Heaton H, Cudini J, Lambert CC, Baybayan P, Galvin BD, Durbin R, Korlach J, Lawniczak MKN (2019) A high-quality de novo genome assembly from a single mosquito using PacBio sequencing. Genes (Basel) 10:1–11
Kingsford C, Schatz MC, Pop M (2010) Assembly complexity of prokaryotic genomes using short reads. BMC Bioinformatics 11:21
Koren S, Schatz MC, Walenz BP et al (2012) Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nat Biotechnol 30:693–700
Kotsakiozi P, Richardson JB, Pichler V, Favia G, Martins AJ, Urbanelli S, Armbruster PA, Caccone A (2017) Population genomics of the Asian tiger mosquito, Aedes albopictus: insights into the recent worldwide invasion. Ecol Evol 7:10143–10157
Kyrou K, Hammond AM, Galizi R, Kranjc N, Burt A, Beaghton AK, Nolan T, Crisanti A (2018) A CRISPR-Cas9 gene drive targeting doublesex causes complete population suppression in caged Anopheles gambiae mosquitoes. Nat Biotechnol 36:1062–1066
Lack JB, Cardeno CM, Crepeau MW, Taylor W, Corbett-Detig RB, Stevens KA, Langley CH, Pool JE (2015) The Drosophila genome nexus: a population genomic resource of 623 Drosophila melanogaster genomes, including 197 from a single ancestral range population. Genetics 199:1229–1241
Lander ES, Linton LM, Birren B et al (2001) Initial sequencing and analysis of the human genome. Nature 409:860–921
Lee CE (2002) Evolutionary genetics of invasive species. Trends Ecol Evol 17:386–391
Li M, Chen L, Tian S et al (2017) Comprehensive variation discovery and recovery of missing sequence in the pig genome using multiple de novo assemblies. Genome Res 27:865–874
Liu C, Jiang F, Wang H et al (2018) The genome of the golden apple snail Pomacea canaliculata provides insight into stress tolerance and invasive adaptation. GigaScience 7:giy101
Lombaert E, Guillemaud T, Cornuet J-M, Malausa T, Facon B, Estoup A (2010) Bridgehead effect in the worldwide invasion of the biocontrol harlequin ladybird. PLoS ONE 5:e9743
Lombaert E, Guillemaud T, Lundgren J et al (2014) Complementarity of statistical treatments to reconstruct worldwide routes of invasion: the case of the Asian ladybird Harmonia axyridis. Mol Ecol 23:5979–5997
Lowe S, Browne M, Boudjelas S, De Poorter M (2000) 100 of the world’s worst invasive alien species: a selection from the global invasive species database. Auckland, New Zealand, p 12
Luikart G, England PR, Tallmon D, Jordan S, Taberlet P (2003) The power and promise of population genomics: from genotyping to genome typing. Nat Rev Genet 4:981
Makkonen J, Vesterbacka A, Martin F, Jussila J, Dieguez-Uribeondo J, Kortet R, Kokko H (2016) Mitochondrial genomes and comparative genomics of Aphanomyces astaci and Aphanomyces invadans. Sci Rep 6:36089
Manni M, Guglielmino CR, Scolari F et al (2017) Genetic evidence for a worldwide chaotic dispersion pattern of the arbovirus vector, Aedes albopictus. PLoS Negl Trop Dis 11:e0005332
Maselko M, Heinsch SC, Chacón JM, Harcombe WR, Smanski MJ (2017) Engineering species-like barriers to sexual reproduction. Nat Commun 8:883
Mastretta-Yanes A, Arrigo N, Alvarez N, Jorgensen TH, Pinero D, Emerson BC (2015) Restriction site-associated DNA sequencing, genotyping error estimation and de novo assembly optimization for population genetic inference. Mol Ecol Resour 15:28–41
Matthews BJ, Dudchenko O, Kingan SB et al (2018) Improved reference genome of Aedes aegypti informs arbovirus vector control. Nature 563:501–507
McCartney MA, Auch B, Kono T et al (2019) The genome of the zebra mussel, Dreissena polymorpha: a resource for invasive species research. bioRxiv. https://doi.org/10.1101/696732v1
McKenna DD, Scully ED, Pauchet Y et al (2016) Genome of the Asian longhorned beetle (Anoplophora glabripennis), a globally significant invasive species, reveals key functional and evolutionary innovations at the beetle–plant interface. Genome Biol 17:227
Metzker ML (2010) Sequencing technologies—the next generation. Nat Rev Genet 11:31–46
Miller N, Estoup A, Toepfer S et al (2005) Multiple transatlantic introductions of the western corn rootworm. Science 310:992
Minardi D, Studholme DJ, van der Giezen M, Pretto T, Oidtmann B (2018) New genotyping method for the causative agent of crayfish plague (Aphanomyces astaci) based on whole genome data. J Invertebr Pathol 156:6–13
Moll KM, Zhou P, Ramaraj T et al (2017) Strategies for optimizing BioNano and Dovetail explored through a second reference quality assembly for the legume model, Medicago truncatula. BMC Genom 18:578
Murgarella M, Puiu D, Novoa B, Figueras A, Posada D, Canchaya C (2016) A first insight into the genome of the filter-feeder mussel Mytilus galloprovincialis. PLoS ONE 11:e0151561
Narum SR, Buerkle CA, Davey JW, Miller MR, Hohenlohe PA (2013) Genotyping-by-sequencing in ecological and conservation genomics. Mol Ecol 22:2841–2847
NCBI (2019a) [National Center for Biotechnology Information, National Institutes of Health, US National Library of Medicine], Genome resource. https://www.ncbi.nlm.nih.gov/genome/
NCBI (2019b) List of BioProjects, filtered for data type “Genome sequencing and assembly”. https://www.ncbi.nlm.nih.gov/bioproject/browse/
NCBI (2019c) Gasterosteus aculeatus reference genome. https://www.ncbi.nlm.nih.gov/assembly/GCA_000180675.1/#/s
NCBI (2019d) Homo sapiens genome assemblies from 2001 to present: https://www.ncbi.nlm.nih.gov/assembly/?term=Homo+sapiens
Nene V, Wortman JR, Lawson D et al (2007) Genome sequence of Aedes aegypti, a major arbovirus vector. Science 316:1718
Nielsen R, Paul JS, Albrechtsen A, Song YS (2011) Genotype and SNP calling from next-generation sequencing data. Nat Rev Genet 12:443–451
Noble C, Adlam B, Church GM, Esvelt KM, Nowak MA (2018) Current CRISPR gene drive systems are likely to be highly invasive in wild populations. eLife 7:e33423
Nowoshilow S, Schloissnig S, Fei J-F et al (2018) The axolotl genome and the evolution of key tissue formation regulators. Nature 554:50–55
Ometto L, Cestaro A, Ramasamy S et al (2013) Linking genomics and ecology to investigate the complex evolution of an invasive Drosophila pest. Genome Biol Evol 5:745–757
Ostrander EA, Wang G-D, Larson G et al (2019) Dog10K: an international sequencing effort to advance studies of canine domestication, phenotypes and health. Natl Sci Rev. https://doi.org/10.1093/nsr/nwz049/5437695
Padeken J, Zeller P, Gasser SM (2015) Repeat DNA in genome organization and stability. Curr Opin Genet Dev 31:12–19
Pelissie B, Crossley MS, Cohen ZP, Schoville SD (2018) Rapid evolution in insect pests: the importance of space and time in population genomics studies. Curr Opin Insect Sci 26:8–16
Peona V, Weissensteiner MH, Suh A (2018) How complete are “complete” genome assemblies? An avian perspective. Mol Ecol Resour 18:1188–1195
Ramasamy S, Ometto L, Crava CM et al (2016) The evolution of olfactory gene families in Drosophila and the genomic basis of chemical-ecological adaptation in Drosophila suzukii. Genome Biol Evol 8:2297–2311
Rhoads A, Au KF (2015) PacBio sequencing and its applications. Genomics Proteomics Bioinf 13:278–289
Ricciardi A, Cohen J (2006) The invasiveness of an introduced species does not predict its impact. Biol Invas 9:309–315
Rius M, Bourne S, Hornsby HG, Chapman MA (2015) Applications of next-generation sequencing to the study of biological invasions. Curr Zool 61:488–504
Roberts RJ, Church DM, Goodstadt L et al (2009) Lineage-specific biology revealed by a finished genome assembly of the mouse. PLoS Biol 7:e1000112
Rode NO, Estoup A, Bourguet D, Courtier-Orgogozo V, Débarre F (2019) Population management using gene drive: molecular design, models of spread dynamics and assessment of ecological risks. Conserv Genet 20:671–690
Roman J (2006) Diluting the founder effect: cryptic invasions expand a marine invader’s range. Proc R Soc Lond Ser B 273:2453–2459
Roman J, Darling JA (2007) Paradox lost: genetic diversity and the success of aquatic invasions. Trends Ecol Evol 22:454–464
Ryan JF, Pang K, Schnitzler CE et al (2013) The genome of the ctenophore Mnemiopsis leidyi and its implications for cell type evolution. Science 342:1242592
Salzberg SL (2019) Next-generation genome annotation: we still struggle to get it right. Genome Biol 20:92
Sard N, Robinson J, Kanefsky J, Herbst S, Scribner K (2019) Coalescent models characterize sources and demographic history of recent round goby colonization of Great Lakes and inland waters. Evol Appl 12:1034–1049
Sax DF, Stachowicz JJ, Brown JH et al (2007) Ecological and evolutionary insights from species invasions. Trends Ecol Evol 22:465–471
Schatz MC, Delcher AL, Salzberg SL (2010) Assembly of large genomes using second-generation sequencing. Genome Res 20:1165–1173
Schwartz DC, Li X, Hernandez LI, Ramnarain SP, Huff EJ, Wang YK (1993) Ordered restriction maps of Saccharomyces cerevisiae chromosomes constructed by optical mapping. Science 262:110–114
Seo JS, Rhie A, Kim J et al (2016) De novo assembly and phasing of a Korean human genome. Nature 538:243–247
Sharon D, Tilgner H, Grubert F, Snyder M (2013) A single-molecule long-read survey of the human transcriptome. Nat Biotechnol 31:1009–1014
Sherman CDH, Lotterhos KE, Richardson MF, Tepolt CK, Rollins LA, Palumbi SR, Miller AD (2016) What are we missing about marine invasions? Filling in the gaps with evolutionary genomics. Mar Biol 163:198
Simao FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM (2015) BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31:3210–3212
Smith CD, Zimin A, Holt C et al (2011) Draft genome of the globally widespread and invasive Argentine ant (Linepithema humile). Proc Natl Acad Sci USA 108:5673–5678
Stanke M, Morgenstern B (2005) AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Res 33:W465–W467
Studholme DJ, McDougal RL, Sambles C, Hansen E, Hardy G, Grant M, Ganley RJ, Williams NM (2016) Genome sequences of six Phytophthora species associated with forests in New Zealand. Genom Data 7:54–56
Tamazian G, Simonov S, Dobrynin P et al (2014) Annotated features of domestic cat–Felis catus genome. GigaScience 3:13
Venter JC, Adams MD, Myers EW et al (2001) The sequence of the human genome. Science 291:1304–1351
Weissensteiner MH, Pang AWC, Bunikis I, Hoijer I, Vinnere-Petterson O, Suh A, Wolf JBW (2017) Combination of short-read, long-read, and optical mapping assemblies reveals large-scale tandem repeat arrays with population genetic implications. Genome Res 27:697–708
Wolf JB, Ellegren H (2017) Making sense of genomic islands of differentiation in light of speciation. Nat Rev Genet 18:87–100
Xu P, Zhang X, Wang X et al (2014) Genome sequence and genetic diversity of the common carp, Cyprinus carpio. Nat Genet 46:1212–1219
Zhang G, Fang X, Guo X et al (2012) The oyster genome reveals stress adaptation and complexity of shell formation. Nature 490:49–54
Zheng GXY, Lau BT, Schnall-Levin M et al (2016) Haplotyping germline and cancer genomes with high-throughput linked-read sequencing. Nat Biotechnol 34:303–311
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.