

Variable evolutionary routes to host establishment across repeated rabies virus host shifts among bats

Daniel G. Streicker a,b,1, Sonia M. Altizer a, Andrés Velasco-Villa b, and Charles E. Rupprecht b

a Odum School of Ecology, University of Georgia, Athens, GA 30602; and b Poxvirus and Rabies Branch, US Centers for Disease Control and Prevention, Atlanta, GA 30333

Edited by Edward C. Holmes, Pennsylvania State University, University Park, PA, and accepted by the Editorial Board October 14, 2012 (received for review February 27, 2012)

Determining the genetic pathways that viruses traverse to establish in new host species is crucial to predict the outcome of cross-species transmission but poorly understood for most host–virus systems. Using sequences encoding 78% of the rabies virus genome, we explored the extent, repeatability and dynamic outcome of evolution associated with multiple host shifts among New World bats. Episodic bursts of positive selection were detected in several viral proteins, including regions associated with host cell interaction and viral replication. Host shifts involved unique sets of substitutions, and few sites exhibited repeated evolution across adaptation to many bat species, suggesting diverse genetic determinants over host range. Combining these results with genetic reconstructions of the demographic histories of individual viral lineages revealed that although rabies viruses shared consistent three-stage processes of emergence in each new bat species, host shifts involving greater numbers of positively selected substitutions had longer delays between cross-species transmission and enzootic viral establishment. Our results point to multiple evolutionary routes to host establishment in a zoonotic RNA virus that may influence the speed of viral emergence.

comparative | phylogeny | chiroptera | Lyssavirus

Understanding how natural selection operates in novel or changing environments is important for managing a variety of ecological problems, including species responses to climate change and the dynamics of biological invasions. Evolutionary dynamics are particularly salient in pathogens such as RNA viruses, whose tendency to jump between host species makes them a major source of newly emerging infectious diseases affecting humans, domestic animals, and wildlife (1). The success of RNA viruses in crossing species barriers is enhanced by their high mutation rates (generated by error-prone RNA polymerases and large within-host population sizes), which provide genetic and phenotypic variability that enables ongoing transmission in new hosts (2). However, because RNA viruses have small genomes and multifunctional proteins, many potentially beneficial mutations have deleterious consequences for other aspects of viral fitness, making epistatic interactions a potentially important constraint on viral evolution (3, 4). This could reduce the number of evolutionary routes to host establishment or enforce fundamental limits on host range (5).

Evolutionary biologists have long portrayed adaptive change as a fitness landscape, with isolated peaks separated by valleys of lower fitness (6). For host-shifting viruses, an analogous landscape, where potential host species constitute distinct peaks, may emerge from the need for viruses to adapt to the physiological and ecological differences between donor and recipient host species (7). If adaptation requires few evolutionary changes, viral establishment should occur quickly, and disease control efforts might focus more on limiting initial cross-species transmission than on posttransmission measures. Alternatively, if establishing in novel hosts requires greater numbers of adaptive changes or suites of co-occurring or sequential changes, this could provide a longer window for intervention within populations of the newly infected host species. Empirical support exists for both pathways to viral establishment. For example, Venezuelan equine encephalitis virus required few adaptive changes to infect horses, whereas other host shifts [e.g., severe acute respiratory syndrome (SARS) in humans] were associated with extensive genetic change (8, 9). Predicting the evolutionary dynamics of future host shifts from single emergence events is challenging because the strength of barriers to establishment likely depends on both the host species and viral variant involved. Comparative data from multiple host shifts are therefore needed to assess whether molecular pathways to host establishment are repeatable and whether the extent of evolutionary change affects the speed of emergence.

Rabies virus (RV) (Lyssavirus, Rhabdoviridae), a zoonotic RNA virus with an evolutionary history dominated by host shifts within and among bats and carnivores, provides a rare opportunity to compare the evolutionary dynamics of repeated viral establishment (10). Host shifts are especially prominent among New World bats, many of which harbor species-specific viral lineages that share a relatively recent common ancestor (11). Previous studies investigated the determinants of Lyssavirus adaptation to new host species using the genomic distribution of amino acid sites under selection. These studies focused on the glycoprotein (G), the sole surface protein responsible for host cell interaction and entry into the nervous system, and the nucleoprotein (N), which forms the viral capsid and plays a role in transcription and replication. Highly localized amino acid changes were observed in the ectodomain of G, a region associated with host cell entry, but the mechanisms driving these changes were unclear (10, 12). Another possibility is that adaptation depends on changes in the RNA-dependent RNA polymerase [the large gene (L)]. This gene regulates viral transcription and replication, which could ultimately alter pathogenesis, virulence, and transmission, as observed in avian metapneumoviruses and paramyxoviruses (13, 14). However, no studies have investigated the role of L in the establishment of new RV reservoirs.

Here, we applied Bayesian phylogenetic ancestral state reconstruction to sequence data from the N, G, and L genes of 30 bat RV lineages to identify plausible host shifts between species. Combining these estimates of donor–recipient relationships with recently developed analyses of historical selection pressures allowed us to identify episodic positive selection in the RV genome and to compare the extent and repeatability of evolutionary changes associated with numerous host shifts. Finally, we used estimates of past viral demography derived from genetic data to investigate associations between the number of positively selected changes since viral introduction and the speed of establishment in each bat species.

Author contributions: D.G.S. designed research; D.G.S. performed research; D.G.S. analyzed data; and D.G.S., S.M.A., A.V.-V., and C.E.R. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission. E.C.H. is a guest editor invited by the Editorial Board.

Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. JQ595307–JQ595379).

1 To whom correspondence should be addressed. E-mail: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1203456109/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1203456109 PNAS | November 27, 2012 | vol. 109 | no. 48 | 19715–19720

Results

Spatial and Temporal Patterns of Selection Along the RV Genome. We quantified the rates of nonsynonymous (dN) and synonymous (dS) substitutions across the branches of maximum likelihood (ML) phylogenetic trees inferred for the N, G, and L genes of major bat RV lineages from a total of 184 viral isolates. The relative difference in dN and dS indicates the selection pressure exerted on genes, with dN/dS > 1 (or equivalently, dN − dS > 0) indicative of positive selection. The overall dN/dS ratios were low (0.05, 0.15, and 0.05 in the N, G, and L genes, respectively), indicating a dominance of purifying selection when averaging across all sites and branches of the phylogenetic trees. Next, we examined selection on specific amino acid positions in two ways: first, we used a fixed effects likelihood (FEL) analysis that assumed constant selective pressure over time; and second, we used a mixed effects model of episodic selection (MEME) that allowed for temporally varying positive selection (15, 16). For all three genes, both methods identified amino acid sites putatively evolving under positive selection (Table S1). Nearly all sites detected by FEL were also supported by MEME; however, MEME further identified positively selected sites that were classified as evolving neutrally or under purifying selection by FEL (Table S1). These disagreements reflect the divergent assumptions of each model and are consistent with transient positive selection followed by enduring purifying selection to maintain those changes (15). Indeed, likelihood ratio tests (LRTs) comparing the MEME to the FEL models for individual sites commonly favored the episodic selection model over the constant selection model (Table S1). On average, positively selected codons in N, G, and L experienced selection on only 0.8%, 4.6%, and 2.6% of branches in their respective phylogenies. However, several codons underwent more frequent substitutions. For example, site G493 switched nine times between six different amino acid residues, G357 underwent seven flip-flop substitutions between valine and isoleucine, and L1620 switched repeatedly between glycine and four other residues (Fig. 1 B and C). In each of these sites, LRTs favored the FEL model over the more complex MEME, consistent with more pervasive selection over time (Table S1).

Fig. 1. Episodic positive selection in the bat RV phylogeny. Maximum likelihood topologies of the bat RV N, G and L genes are shown in A, B and C, respectively. Symbols indicate amino acid changes in putatively positively selected sites only. Color spectra of points follow the relative position along each gene. Inset pie charts show the ratio of internal to tip substitutions for all sites in each gene. Lineage labels denote the reservoir host species as follows: Ap, Antrozous pallidus; Ct, Corynorhinus townsendii; Dr, Desmodus rotundus; Ef, Eptesicus fuscus; Efr, Eptesicus furinalis; Hm, Histiotus montanus; Ln, Lasionycteris noctivagans; Lb, Lasiurus borealis; Lc, Lasiurus cinereus; Le, Lasiurus ega; Li, Lasiurus intermedius; Ls, Lasiurus seminolus; Lx, Lasiurus xanthinus; Mm, Molossus molossus; M, Myotis sp.; Ma, Myotis austroriparius; Mc, Myotis californicus; Mn, Myotis nigricans; My, Myotis yumanensis; Nh, Nycticeius humeralis; Nl, Nyctinomops laticaudatus; Ph, Parastrellus hesperus; Ps, Perimyotis subflavus; Tb, Tadarida brasiliensis.

Positively selected sites were located mainly within the first 160 codons of N (Table S1). In contrast to G and L, the substitutions in N occurred almost exclusively on the tips of the tree, indicative of false positives with respect to host adaptation (Fig. 1 and Fig. S1). We therefore restricted further analyses of positive selection to G and L, where substitutions along internal branches were more consistent with a role in host establishment. In G, nine positively selected sites were found in the ectodomain, including a main antigenic site, and two others were found in the endodomain of G, which interacts with internal viral proteins (Table S1). We also found an elevated dN in site 333 of the ectodomain, a position of known importance for the attenuation of laboratory RV strains, although this did not meet our criteria of statistical significance (dN/dS = 1.189; P_FEL = 0.81; P_MEME = 0.07) (17). In L, positive selection was detected in all of the putative functional domains except region II (Table S1).

To characterize and quantify the positively selected changes associated with RV shifts into new bat species, we used ancestral state reconstruction to assign host states and genetic changes to the internal branches of a consensus phylogeny of RV estimated from a joint Bayesian analysis of the N, G, and L genes (SI Text and Fig. S2). Transfers into new host species were followed by substitutions at zero to five positively selected sites. Host shifts involved unique evolutionary routes, rather than frequent alteration of a few, key sites linked to host tropism (Fig. 1 and Fig. S2). Pairs of amino acid sites rarely underwent more substitutions on the same branch than expected by chance, and such epistatic interactions were particularly rare among positively selected sites (Table S2).
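As a rough illustration of the dN − dS criterion described at the start of this subsection, the following Python sketch labels a single codon from its rate estimates and a site-level P value. The `SiteEstimate` fields, the `classify_site` helper, and the 0.05 threshold are assumptions made for this example; the text above does not state the significance cutoff, and this is not the FEL or MEME likelihood machinery itself. The G333 rates are hypothetical values chosen only to reproduce the reported ratio of 1.189.

```python
from dataclasses import dataclass


@dataclass
class SiteEstimate:
    """Per-codon estimates; field names are illustrative, not from the paper."""
    site: str
    dn: float       # nonsynonymous substitution rate
    ds: float       # synonymous substitution rate
    p_value: float  # P value from a site-level test (e.g., FEL or MEME)


def classify_site(est: SiteEstimate, alpha: float = 0.05) -> str:
    """Label a codon by the sign of dN - dS, gated by an assumed alpha."""
    if est.p_value >= alpha:
        return "not significant"
    diff = est.dn - est.ds
    if diff > 0:
        return "positive"
    if diff < 0:
        return "purifying"
    return "neutral"


# Hypothetical rates giving dN/dS = 1.189, paired with the reported MEME P value.
g333 = SiteEstimate(site="G333", dn=1.189, ds=1.0, p_value=0.07)
print(g333.site, classify_site(g333))
```

Under these assumptions, an elevated dN alone is not enough to call a site positively selected; the site-level test must also pass the chosen threshold, which mirrors how site 333 is treated in the text.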

Linking the Epidemiological and Evolutionary Dynamics of Emergence. To characterize the epizootic dynamics of each host shift, we applied a Bayesian coalescent approach to infer past viral population dynamics. In 11 of the 13 RV lineages for which we had sufficient N gene sequence data for this analysis, we detected significant signatures of viral population growth (Table S3). In most of these lineages, reconstructions of past demographic histories using the nonparametric Bayesian skyline model revealed similar three-stage processes of host shifts. Nearly all RV lineages experienced a lag phase with a low effective number of infections, an epizootic phase during which infections increased, and an enzootic phase where the effective number of infections plateaued (Fig. 2).

Fig. 2. Demographic histories and transition times to epizootic growth for 11 bat RV lineages. Each graph shows a Bayesian skyline plot estimated from N gene data. The effective number of infections is the product of the effective population size (Ne) and the generation time between infections (τ). Dashed lines are the 95% highest posterior density. Overlaid histograms show the posterior distributions of the transition times between historical (pregrowth) and contemporary demographic periods from the two-epoch models, with 95% limits shaded in red. Small black triangles are median transition times. The vertical axis for histograms was varied across graphs to enable comparable visualization, and the scale was therefore omitted. Top Right graph shows a schematic for calculating the epizootic lag time from the two-epoch models and the joint phylogeny. The epizootic lag time for virus A is T0 − T2, when including inferred ancestral-host states (scenario 2); or T1 − T2, when the origins of host shifts were treated as unknown (scenario 1).

[Fig. 2 panels: DrV, EfV1a, EfV2, EfV3, LbV2, LcV, LnV, MV2, PhV, PsV, TbV, plus the schematic. Axes: year (x); effective number of RV infections, log scale from 1 to 10,000 (y).]

We hypothesized that viruses might establish more quickly in new hosts when adaptation required changes in fewer numbers of positively selected sites, whereas longer lag phases might correspond to more extensive evolutionary change (7). To investigate the speed of viral emergence, we estimated the temporal delay between cross-species transmission and the onset of epizootic growth for each RV lineage. The date of cross-species transmission was estimated from the consensus bat RV phylogeny, assuming that transmission occurred either: scenario 1, at the base of the “stem” branch of each RV lineage; or scenario 2, including additional branches of the RV phylogeny assigned to each host species by ancestral state reconstruction following a criterion of statistical significance (Fig. 2, Top Right graph and SI Text). The latter estimate is more biologically accurate because evolutionary changes on internal branches occurred in a bat species, but it suffers increased statistical uncertainty because ancestral host states could not always be assigned unambiguously, particularly in deeper nodes of the RV phylogeny (Table S4). The “epizootic lag time” was then calculated probabilistically as the difference between the posterior distribution of the year of cross-species transmission and the posterior distribution of the year of the transition between historical (small) and contemporary (large) effective numbers of viral infections according to Bayesian “two-epoch” demographic models (Fig. 2, Top Right graph). By iteratively regressing random draws from the posterior distribution of each epizootic lag time against the number of sites that underwent positively selected amino acid changes, we tested the relationship between the extent of molecular adaptation and delays in viral establishment, while accounting for phylogenetic and demographic uncertainty. Under scenario 2, greater numbers of amino acid changes in G and L were associated with longer delays until enzootic viral establishment, as evidenced by the lack of overlap of the 95% confidence interval (CI) of the slope parameter with zero (Fig. 3, slope; β: 95% CI = 10.71–79.96). The relationship was also significant when we considered only the number of changes in L (β: 95% CI = 14.02–116.54) but was less supported by data from G alone (β: 95% CI = −8.86–81.03). The more conservative dates of cross-species transmission and, therefore, shorter lag times (scenario 1 above), yielded similar results for L (β: 95% CI = 12.27–133.81), but a negative relationship emerged between lag time and the number of selected sites in G (β: 95% CI = −107.96 to −5.33), perhaps reflecting the greater frequency of changes in G on deeper branches of the phylogeny (Fig. 1) or a spurious effect arising from the low numbers of sites that underwent positively selected changes in this analysis.

Because the more abundant synonymous and nonpositively selected, nonsynonymous (NPN) substitutions largely determined the branch lengths that defined the epizootic lag time in this analysis, these substitutions were, not surprisingly, also correlated with the epizootic lag time in univariate tests (Fig. S3). However, neither was robust to inclusion in multivariate generalized linear models that contained positively selected sites, which remained highly significant (positively selected: F(1,8) = 8.20, P = 0.021; synonymous: F(1,8) = 1.47, P = 0.250; NPN: F(1,8) = 3.11, P = 0.12). Moreover, models with positively selected sites alone provided a significantly better fit than models including either of the other substitution types according to Akaike’s information criterion (versus synonymous: ΔAIC = 4.83; versus NPN: ΔAIC = 4.14). These differences in predictive power reflected the absence of strong correlations between the numbers of positively selected and synonymous substitutions (r = 0.56; P = 0.071) or NPN substitutions (r = 0.50; P = 0.12), suggesting that positively selected changes do not simply and necessarily accumulate as a function of time.
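The iterated regression described in this subsection can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: all data below are synthetic, the helper names (`lag_samples`, `ols_slope`, `slope_interval`) are hypothetical, transmission and transition years are treated as independent posterior draws, and dates are expressed as calendar years so that the lag is the transition year minus the transmission year.

```python
import random


def lag_samples(transmission_years, transition_years):
    """Lag = transition year minus transmission year, paired draw by draw."""
    return [t2 - t0 for t0, t2 in zip(transmission_years, transition_years)]


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def slope_interval(n_sites, lag_posteriors, n_iter=2000, seed=1):
    """Percentile interval of the slope over random posterior draws.

    n_sites[i] is the number of positively selected sites for lineage i;
    lag_posteriors[i] holds posterior samples of that lineage's lag (years).
    """
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_iter):
        # One random lag per lineage, then one regression per iteration.
        lags = [rng.choice(samples) for samples in lag_posteriors]
        slopes.append(ols_slope(n_sites, lags))
    slopes.sort()
    return slopes[int(0.025 * n_iter)], slopes[int(0.975 * n_iter) - 1]


# Synthetic data (assumed): six lineages with 0 to 5 selected sites, and
# transmission dates that are earlier when more sites changed.
data_rng = random.Random(0)
n_sites = [0, 1, 2, 3, 4, 5]
lag_posteriors = []
for k in n_sites:
    transmission = [data_rng.gauss(1900 - 50 * k, 7) for _ in range(500)]
    transition = [data_rng.gauss(1950, 7) for _ in range(500)]
    lag_posteriors.append(lag_samples(transmission, transition))

lo, hi = slope_interval(n_sites, lag_posteriors)
print(f"95% interval for the slope: {lo:.1f} to {hi:.1f} years per site")
```

Resampling one lag per lineage on every iteration propagates the uncertainty in each lag time into the slope, so an interval that excludes zero is supported across the posterior rather than only at the median lag times.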

DiscussionThe repeated emergence of RV among New World bats provideda unique opportunity to explore the whether viral adaptation tonew host species was repeatable and predictable and to assess therelationship between evolutionary pathways to host establishmentand the timing of emergence. Positive selection occurred episod-ically within the evolutionary history of RV in both exposed andinternal viral proteins and the adaptive changes associated with

host shifts were largely unique to each virus. Moreover, ouranalysis demonstrated that the extent of evolution requiredfor establishment in a new host was related to the speedof emergence.Previous studies found mixed evidence for positive selection in

some regions of G, but our study also implicated L in the es-tablishment of new RV reservoirs (10, 12). Because our datasetencompassed more host species, viral lineages and genes, welikely had greater power to identify positive selection. Ouranalysis also used biologically plausible models for host switchingthat allowed selection to vary across genomic space and evolu-tionary time (15). Like all analytical methods to detect positiveselection, this approach may be conservative in detecting somepositively selected sites and liberal in others; however, it affordeda major advantage in the ability to detect episodic positive se-lection in an evolutionary history that was otherwise dominatedby purifying selection (Fig. S1). Assuming that positive selectionoccurred during the process of host shifting, our results suggestthat adaptation commonly occurred during transient periods,followed by the longer periods of purifying selection that aremore typical of contemporary RV evolution (10, 12, 18). In RV,the rarity of temporally pervasive positive selection might beexplained by the low efficiency of innate or adaptive hostdefenses against a productive viral infection in the central ner-vous system (18). More constant positive selection might beexpected for viruses replicating in tissues with greater contactwith host immune systems or reduced physiological stability.The specific locations of amino acid sites under selection

provided clues to the biological mechanisms that enabled viralestablishment in new host species. First, most sites evolving un-der positive selection in G were found in the ectodomain. Al-though this region mediates viral interaction with host cells, theability of RV to infect and replicate in all mammals studied todate suggests that entry into host cells is not a barrier for hostshifts (18). Instead, viral establishment may rely on mechanismsto achieve a balance between virulence and transmission. In thisregard, several regions of the ectodomain, particularly antigenicsites II and III, hold residues that can affect pathogenicity (20,21). We found evidence for selection operating on a cluster ofsites in or adjacent to antigenic site III (Table S1 and Fig. S1).Moreover, dN was elevated marginally at the site 333 of theectodomain, including substitutions away from arginine and ly-sine, which can reduce or eliminate pathogenicity, disrupt cell-to-cell spread and block pathways to penetrate the central nervoussystem in laboratory RV strains (17, 22, 23). Although thesesubstitutions are unlikely to have such dramatic effects on thepathogenicity of naturally circulating RVs, processes critical forthe within-host progression of RV could be directly affected bythese changes or indirectly affected by compensatory mutationsthat restore pathogenicity.Selection was also common on internal viral regions that have

minimal interaction with the host environment. These includedthe catalytic domain of L, the region of L that interacts with thephosphoprotein to form the RNA polymerase and the endodo-main of G. One explanation for selection on these regions is thatthey could modulate viral transcription and replication to favorRV arrival in the salivary glands before overt host morbidity anddeath, enabling transmission to new hosts. Considering the di-versity of bat RV reservoirs in colonial aggregation, dispersalbehavior and seasonality, it seems probable that RVs vary theirinfection strategies across bat species (24). For example, virusesmay differ in replication rate, incubation period, or ability toreplicate at low temperatures in epithelial tissues, traits thatmight increase the probability of transmission to other species(including humans) given exposure (25).Most of the 33 codons that were putatively linked with host

adaptation experienced selection only in a handful of host shifts, and pairs of sites showed little correlated evolution (Fig. 1 and Table S2). Unique genomic routes across host shifts could result from a diverse array of functionally equivalent changes or if adaptation to each bat species required unique sets of changes. In either case, our results imply low predictability of specific sites involved in adaptation during future host shifts. This observation diverges from expectations from experimental passaging of foot-and-mouth disease virus and vesicular stomatitis virus, where the small, multifunctional genomes typical of RNA viruses appeared to promote convergent evolutionary routes to adaptation, although the environmental treatments in these studies were relatively homogenous compared with the multiple host species studied here (26, 27). On the other hand, recent studies of avian H5N1 influenza viruses demonstrated that alternative sets of amino acid changes allowed airborne transmission among ferrets, suggesting disparate genetic routes to a similar infection phenotype (28, 29). Interestingly, our results and studies in other RNA viruses suggested that replication rate could contribute to host range by altering pathogenesis and ultimately transmission (14, 30). One possibility is that replication might be manipulated through numerous processes occurring across the viral genome, opening up the potential for functionally similar but genetically distinct routes to establishment that might differ in complexity and probability. In contrast, more specific barriers such as use of novel cell receptors or evasion of immunity might constrain evolutionary pathways, particularly in surface proteins. Thus, it is possible that constraints on viral evolution might vary predictably both among viral proteins and among virus species according to the type of barriers to host-range expansion.

Fig. 3. Relationship between the number of positively selected amino acid changes in G and L and the epizootic lag time. Substitutions on terminal branches and in N were excluded because the timing of these substitutions was inconsistent with a postulated role in host adaptation. The black line and points show the median relationship with corresponding statistics in the upper left corner (F1,9 = 14.86, p = 0.004; r2 = 0.62). Gray lines show 2,000 model predictions using random draws from the posterior distribution of epizootic lag time for each viral lineage. The inset histogram shows the distribution of the slope parameter from the iterated models, with 95% limits shaded in black (note the lack of overlap with zero). [Axes: x, positively selected amino acid sites (0–5); y, epizootic lag time (years); inset x, slope, with the 95% CI marked.]

For RV lineages in bats, zero to five positively selected amino
acid changes in G and L occurred along emergent branches, leading to new host species. Shortened evolutionary pathways to establishment might reflect ancestral changes present in donor host species that remained adaptive in the recipient host species, effectively providing an evolutionary shortcut to host adaptation (i.e., preadaptation). On the other hand, greater distance between the optimal viral genotypes for a given pair of host species could increase the number of substitutions needed for a host shift. A major question for anticipating the evolutionary and epidemiological dynamics of viral host shifts is, therefore, the degree to which variability in the evolutionary routes to adaptation affects the speed (or likelihood) of viral emergence (7). Consistent with theoretical expectations, across the series of host shifts that we studied, greater numbers of positively selected amino acid changes were associated with delayed viral establishment in new host species (Fig. 3). This suggests that viruses that undergo fewer necessary adaptive changes before establishment, either because of similarity in the adaptive landscapes of recipient and donor host species or through functionally equivalent but genetically distinct routes to adaptation, could emerge faster and with higher probability.

Despite using the best available methods and explicitly incorporating uncertainty into our statistical models, our analyses of epizootic lag time faced several challenges, including how best to define the true date of host shifts on our phylogenetic trees. We addressed this by using statistical cutoffs in our assignment of ancestral-host states to branches, assigning the dates of cross-species transmission based on different scenarios, and by including all known bat RV lineages. Although our estimates of the dates of host shifts are consistent with the first historical reports of bat rabies in the Americas during European colonization, the actual dates should be treated with caution. In particular, because we could not include undiscovered or extinct RV lineages, it is possible that some host shifts occurred more recently than we estimated (although there is no reason to suspect a systematic effect that would influence our comparative analysis). Next, because the epizootic lag times were calculated from sequence data, positively selected changes could simply have accumulated over time, with shifts to epizootic growth caused by
some other factor. However, support for this scenario would also require a strong correlation between positively selected changes and synonymous substitutions, which, in turn, should have been the best predictor of epizootic lag time: neither of these criteria was supported by our analyses. Thus, although the present data cannot definitively resolve causality, our findings are most consistent with the idea that the number of positively selected sites was the underlying driver of variation in epizootic lag times. Testing this relationship in systems where experimental manipulation is possible or where ecological data exist to parameterize the epizootic lag time is an important next step. Finally, we note that although our assumption that the positive selection observed was related to cross-species transmission was supported by the timing of substitutions along branches, only controlled experiments can reveal the biological effects of these changes and whether they enable onward transmission in new host species.

In conclusion, our results showed evidence for episodic positive selection on several RV genes during the early phases of host shifts. Evolutionary routes to viral establishment shared few steps in common, suggesting that diverse genetic changes can accompany adaptation to new hosts and limiting the utility of past host shifts for predicting the evolutionary dynamics of future emergence. Importantly, variation in evolutionary routes to viral establishment might have demographic consequences: the number of positively selected changes was the best predictor of the duration of adaptive periods and, therefore, the speed of viral emergence. Identifying determinants of the extent of viral evolution needed for host shifts is therefore an outstanding question to anticipate the speed of viral establishment. These determinants could include evolutionary similarity among donor and recipient species or ecological factors, such as differences in population density, social structure, migration, or overwintering behavior (11, 24, 31, 32). Future studies linking specific viral mutations to adaptation to host biology will be important for predicting pathogen emergence, and we highlight the utility of a comparative approach that analyzes multiple, natural host shift events to guide these efforts.
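As a reading aid for the regression summarized in Fig. 3 (F1,9 = 14.86, p = 0.004, r2 = 0.62), the following minimal sketch checks that the three reported statistics are mutually consistent. It assumes an ordinary least-squares fit with one predictor and nine residual degrees of freedom, which is how the subscripts on F are read here; the function names are hypothetical and this is not the authors' analysis code. It relies on two standard identities: r2 = F·df1 / (F·df1 + df2), and, for one numerator degree of freedom, F = t², so the P value is a two-sided t-test probability.

```python
import math


def r_squared_from_f(f_stat, df_model, df_resid):
    """Recover r^2 from an F statistic: r^2 = F*df1 / (F*df1 + df2)."""
    return (f_stat * df_model) / (f_stat * df_model + df_resid)


def t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    norm = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return norm * (1 + x * x / nu) ** (-(nu + 1) / 2)


def f_test_p_value_1df(f_stat, df_resid, steps=20000):
    """P value for an F test with 1 numerator df, using F(1, n) = t_n^2.

    The central area of the t density on [0, sqrt(F)] is integrated with
    Simpson's rule (steps must be even), then doubled for both tails.
    """
    t = math.sqrt(f_stat)
    h = t / steps
    total = t_pdf(0.0, df_resid) + t_pdf(t, df_resid)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df_resid)
    central_area = total * h / 3
    return 1 - 2 * central_area


# Values as reported in the Fig. 3 annotation.
r2 = r_squared_from_f(14.86, 1, 9)
p = f_test_p_value_1df(14.86, 9)
print(f"r^2 = {r2:.2f}, p = {p:.4f}")
```

Under these assumptions the recovered r2 rounds to the reported 0.62, and the P value falls in the same range as the reported 0.004.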

Materials and Methods

Estimating Selection Pressures Along the Viral Genome. We assembled datasets for the N, G, and L genes of bat RVs by supplementing published sequences with sequences generated from the virus archive of the Centers for Disease Control and Prevention (CDC) Rabies Program. The L (6,387 bp) and G (1,575 bp) sequences from this study were generated for 21 and 19 representative RV lineages (one to six isolates/lineage), respectively, from 15 bat species (SI Text and Table S5). All sequences from this study have been deposited in the GenBank database (accession nos. JQ595307–JQ595379; Table S6). Additional RV N (n = 625), G (n = 39), and L (n = 3) sequences associated with bats were collected from GenBank. For each gene, datasets were assembled that contained a maximum of 10 randomly selected but unique sequences per lineage. Final datasets comprised 30 viral lineages for N (184 unique sequences), 26 lineages for G (68 unique sequences), and 23 lineages for L (48 unique sequences). We estimated a phylogenetic tree for each dataset using five replicate ML searches in Garli Version 0.96b8 under substitution models selected by jModelTest, using a raccoon RV sequence as an outgroup (33, 34). The tree with the highest log likelihood was used in analyses of selection after removing the outgroup. The selection pressures at specific codon sites were estimated using the FEL method, which independently fits dN and dS to each codon position and compares the fit of these models to a null assuming dN = dS via an LRT with 1 df (16). We also used a mixed-effects model of episodic selection (MEME), which considered the dN/dS at each site as a fixed effect, while allowing for two categories of branches, those with dN/dS ≤ 1 and those with dN/dS > 1, which was treated as a random effect (15). This model was tested against a null that constrained all branches to have dN/dS < 1. Significant positive or negative selection was indicated by P values of <0.05 in the MEME analysis and P values of <0.1 in the FEL analysis because the more conservative nature of the latter test reduces the probability of type I errors (15, 16). As described in ref. 15, because FEL is nested within MEME, with the crucial difference being the presence of branch-to-branch variation in substitution rates in MEME, we compared the fit of the pervasive and episodic selection models at each site using LRTs with 2 df. Sites were included in later analyses if either method detected significant positive selection.
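The decision rule described above can be illustrated with a minimal sketch. It assumes the per-site log likelihoods and the MEME P value have already been produced by dedicated selection-analysis software; all numeric inputs and function names below are hypothetical. The chi-square tail probabilities for 1 and 2 df have exact closed forms, so no statistics library is needed. MEME's own P value is taken as an input rather than recomputed, because the sketch does not attempt to reproduce that test.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate (closed forms for df 1 and 2)."""
    if x < 0:
        raise ValueError("test statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")


def lrt_p_value(lnl_null, lnl_alt, df):
    """Likelihood ratio test: statistic = 2 * (lnL_alt - lnL_null)."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return chi2_sf(stat, df)


def positively_selected(dn, ds, fel_p, meme_p, fel_alpha=0.1, meme_alpha=0.05):
    """Keep a site if either method detects significant positive selection.

    FEL compares dN and dS directly, so a significant FEL test only counts
    as positive selection when dN > dS; MEME tests for episodic positive
    selection, so its P value is used as is.
    """
    fel_hit = dn > ds and fel_p < fel_alpha
    meme_hit = meme_p < meme_alpha
    return fel_hit or meme_hit


# Hypothetical site: FEL null (dN = dS) vs. alternative (dN, dS free), 1 df.
fel_p = lrt_p_value(lnl_null=-12.40, lnl_alt=-10.80, df=1)
print(positively_selected(dn=1.2, ds=0.4, fel_p=fel_p, meme_p=0.5))
```

The same `lrt_p_value` helper with `df=2` corresponds to the per-site comparison of the pervasive and episodic selection models mentioned in the text.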

Bayesian Ancestral Host–State Estimation. To identify bat species in which positive selection occurred, we used Bayesian phylogenetic ancestral host–state reconstruction using Bayesian Evolutionary Analysis by Sampling Trees (BEAST) Version 1.7 (35, 36). We extended a previously published analysis by allowing host shifts to occur asymmetrically between host species, by including viral lineages from Central and South American bat species and, most importantly, by integrating information from the N, G, and L genes in a joint phylogenetic analysis (11). For each of the 30 host-associated lineages, we selected a maximum of three representative isolates, preferentially choosing those isolates for which we had sequences from multiple genes. Thus, our dataset comprised 87 isolates with a total of 86, 58, and 44 sequences for N, G, and L, respectively. Each alignment was treated as a separate data partition, allowing us to estimate a consensus evolutionary history of bat rabies and the most probable host states along branches (see SI Text for analytical details).
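The isolate-selection step described above (at most three isolates per lineage, preferring isolates sequenced for more genes) can be sketched as follows. The record layout, the function name, and the tie-breaking by isolate identifier are assumptions made for illustration; the text does not state how ties were resolved.

```python
def pick_representatives(isolates, max_per_lineage=3):
    """Choose up to max_per_lineage isolates per lineage.

    Each isolate is a dict with hypothetical keys: "id" (str),
    "lineage" (str), and "genes" (set of sequenced genes, e.g. {"N", "G"}).
    Isolates with more sequenced genes are preferred; ties are broken by
    isolate id so the result is deterministic (an assumption of this sketch).
    """
    by_lineage = {}
    for iso in isolates:
        by_lineage.setdefault(iso["lineage"], []).append(iso)

    chosen = {}
    for lineage, group in by_lineage.items():
        ranked = sorted(group, key=lambda iso: (-len(iso["genes"]), iso["id"]))
        chosen[lineage] = [iso["id"] for iso in ranked[:max_per_lineage]]
    return chosen


# Hypothetical records for two lineages.
example = [
    {"id": "A1", "lineage": "lineage_1", "genes": {"N"}},
    {"id": "A2", "lineage": "lineage_1", "genes": {"N", "G", "L"}},
    {"id": "A3", "lineage": "lineage_1", "genes": {"N", "G"}},
    {"id": "A4", "lineage": "lineage_1", "genes": {"N", "G"}},
    {"id": "B1", "lineage": "lineage_2", "genes": {"N"}},
]
print(pick_representatives(example))
```

Here the isolate with only an N sequence is dropped from the first lineage because three isolates with broader gene coverage are available.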

Reconstructing Viral Demographic Histories and Transitions to Epizootic Growth. We assembled 13 datasets of N gene sequences for which a minimum of 20 sequences (mean, 45; range, 20–82) were sampled over at least 10 y (mean, 22.3; range, 10–30 y). N was chosen because it was the most thoroughly sampled gene in terms of the number and temporal range of sequences. Bayesian skyline plots (BSPs) were estimated in BEAST using substitution and molecular clock models that were customized for each lineage (Table S3). We compared models assuming constant population sizes, exponential growth, and logistic growth using the stepping-stone method to calculate the marginal likelihood of each demographic model, enabling the use of Bayes factors (BFs) for hypothesis testing (37). Rejection of the constant population size model (BF > 3) by the exponential or logistic growth model was considered evidence of population growth. A single exception was made for the virus maintained by Desmodus rotundus (DrV), which had a complex demographic history that was poorly captured by the parametric models relative to the BSP (Fig. 2 and Table S3). When viral population growth was supported, we estimated the beginning of the growth phase using parametric two-epoch models, characterized by two historical periods of constant but distinct viral population sizes and a parameter describing the transition time between them (38, 39). Priors for demographic parameters were informed from the results of the BSPs. The uniform prior on the molecular clock was constrained to the 95% bounds of that parameter from the corresponding BSP analysis; the 95% bounds of population sizes at the beginning and end of the BSP were used as the bounds of the 1/x prior distributions for each constant size period; the lower limit of the transition time prior was the lower 95% bound of the time since the most recent common ancestor of the lineage and the upper limit was 5–15 y before the most recently sampled virus. Thus, we effectively fit the two-epoch model to the observed BSP for each lineage but verified that posterior distributions were not strongly impinged by our prior choices and used more diffuse priors when necessary (38). Demographic inference combined three to four replicate runs of 100 million generations after discarding the first 20% of each as burn-in.
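The model-comparison rule above (reject constant population size when BF > 3) can be sketched in a few lines. The sketch assumes log marginal likelihoods have already been estimated for each demographic model, for example by the stepping-stone method; the numeric values and function names are hypothetical. The comparison is done on the log scale, ln BF > ln 3, to avoid overflow when log marginal likelihoods differ by a large amount.

```python
import math


def log_bayes_factors(ln_marginal_likelihoods, null_model="constant"):
    """ln BF of each alternative model against the null model.

    ln BF = ln ML(alternative) - ln ML(null).
    """
    ln_null = ln_marginal_likelihoods[null_model]
    return {
        model: ln_ml - ln_null
        for model, ln_ml in ln_marginal_likelihoods.items()
        if model != null_model
    }


def growth_supported(ln_marginal_likelihoods, bf_threshold=3.0):
    """True if any growth model rejects the constant-size model (BF > threshold)."""
    ln_bfs = log_bayes_factors(ln_marginal_likelihoods)
    supported = any(ln_bf > math.log(bf_threshold) for ln_bf in ln_bfs.values())
    return supported, ln_bfs


# Hypothetical log marginal likelihoods for one viral lineage.
lineage = {"constant": -5230.4, "exponential": -5228.1, "logistic": -5229.9}
supported, ln_bfs = growth_supported(lineage)
print(supported, {model: round(value, 2) for model, value in ln_bfs.items()})
```

In this hypothetical case the exponential model exceeds the threshold while the logistic model does not, so population growth would be considered supported under the stated rule.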

ACKNOWLEDGMENTS. We thank Ivan Kuzmin for providing some of the oligonucleotide sequences, Sergei Kosakovsky Pond for suggestions on detection of positive selection, and Eric Crandall for help with the two-epoch demographic models. We thank many colleagues in the US state health departments and abroad for submission of samples for diagnostic confirmation and viral characterization. D.G.S. was supported by a Dissertation Completion Award from the University of Georgia and National Science Foundation Grant DEB-1020966.

1. Daszak P, Cunningham AA, Hyatt AD (2000) Emerging infectious diseases of wildlife—threats to biodiversity and human health. Science 287(5452):443–449.

2. Moya A, Holmes EC, González-Candelas F (2004) The population genetics and evolutionary epidemiology of RNA viruses. Nat Rev Microbiol 2(4):279–288.

3. Shapiro B, Rambaut A, Pybus OG, Holmes EC (2006) A phylogenetic method for detecting positive epistasis in gene sequences and its application to RNA virus evolution. Mol Biol Evol 23(9):1724–1730.

4. Holmes EC (2003) Error thresholds and the constraints to RNA virus evolution. Trends Microbiol 11(12):543–546.

5. Holmes EC (2009) The Evolution and Emergence of RNA Viruses (Oxford Univ Press, New York).

6. Wright S (1932) The roles of mutation, inbreeding, crossbreeding and selection in evolution. Proceedings of the Sixth International Congress of Genetics (Genetics Society of America, Austin, TX), pp 356–366.

7. Kuiken T, et al. (2006) Host species barriers to influenza virus infections. Science 312(5772):394–397.

8. Anishchenko M, et al. (2006) Venezuelan encephalitis emergence mediated by a phylogenetically predicted viral mutation. Proc Natl Acad Sci USA 103(13):4994–4999.

9. Song HD, et al. (2005) Cross-host evolution of severe acute respiratory syndrome coronavirus in palm civet and human. Proc Natl Acad Sci USA 102(7):2430–2435.

10. Badrane H, Tordo N (2001) Host switching in Lyssavirus history from the Chiroptera to the Carnivora orders. J Virol 75(17):8096–8104.

11. Streicker DG, et al. (2010) Host phylogeny constrains cross-species emergence and establishment of rabies virus in bats. Science 329(5992):676–679.

12. Holmes EC, Woelk CH, Kassis R, Bourhy H (2002) Genetic constraints and the adaptive evolution of rabies virus in nature. Virology 292(2):247–257.

13. Brown PA, et al. (2011) A single polymerase (L) mutation in avian metapneumovirus increased virulence and partially maintained virus viability at an elevated temperature. J Gen Virol 92(Pt 2):346–354.

14. Dortmans JCFM, Rottier PJ, Koch G, Peeters BPH (2011) Passaging of a Newcastle disease virus pigeon variant in chickens results in selection of viruses with mutations in the polymerase complex enhancing virus replication and virulence. J Gen Virol 92(Pt 2):336–345.

15. Murrell B, et al. (2012) Detecting individual sites subject to episodic diversifying selection. PLoS Genet 8(7):e1002764.

16. Kosakovsky Pond SL, Frost SDW (2005) Not so different after all: A comparison of methods for detecting amino acid sites under selection. Mol Biol Evol 22(5):1208–1222.

17. Dietzschold B, et al. (1983) Characterization of an antigenic determinant of the glycoprotein that correlates with pathogenicity of rabies virus. Proc Natl Acad Sci USA 80(1):70–74.

18. Rupprecht CE, Turmelle A, Kuzmin IV (2011) A perspective on lyssavirus emergence and perpetuation. Curr Opin Virol 1(6):662–670.

19. Johnson N, Cunningham AF, Fooks AR (2010) The immune response to rabies virus infection and vaccination. Vaccine 28(23):3896–3901.

20. Prehaud C, Coulon P, LaFay F, Thiers C, Flamand A (1988) Antigenic site II of the rabies virus glycoprotein: Structure and role in viral virulence. J Virol 62(1):1–7.

21. Seif I, Coulon P, Rollin PE, Flamand A (1985) Rabies virulence: Effect on pathogenicity and sequence characterization of rabies virus mutations affecting antigenic site III of the glycoprotein. J Virol 53(3):926–934.

22. Dietzschold B, et al. (1985) Differences in cell-to-cell spread of pathogenic and apathogenic rabies virus in vivo and in vitro. J Virol 56(1):12–18.

23. Kucera P, Dolivo M, Coulon P, Flamand A (1985) Pathways of the early propagation of virulent and avirulent rabies strains from the eye to the brain. J Virol 55(1):158–162.

24. Streicker DG, Lemey P, Velasco-Villa A, Rupprecht CE (2012) Rates of viral evolution are linked to host geography in bat rabies. PLoS Pathog 8(5):e1002720.

25. Morimoto K, et al. (1996) Characterization of a unique variant of bat rabies virus responsible for newly emerging human cases in North America. Proc Natl Acad Sci USA 93(11):5653–5658.

26. Fares MA, et al. (2001) Evidence for positive selection in the capsid protein-coding region of the foot-and-mouth disease virus (FMDV) subjected to experimental passage regimens. Mol Biol Evol 18(1):10–21.

27. Cuevas JM, Elena SF, Moya A (2002) Molecular basis of adaptive convergence in experimental populations of RNA viruses. Genetics 162(2):533–542.

28. Herfst S, et al. (2012) Airborne transmission of influenza A/H5N1 virus between ferrets. Science 336(6088):1534–1541.

29. Imai M, et al. (2012) Experimental adaptation of an influenza H5 HA confers respiratory droplet transmission to a reassortant H5 HA/H1N1 virus in ferrets. Nature 486(7403):420–428.

30. Steel J, Lowen AC, Mubareka S, Palese P (2009) Transmission of influenza virus in a mammalian host is increased by PB2 amino acids 627K or 627E/701N. PLoS Pathog 5(1):e1000252.

31. Longdon B, Hadfield JD, Webster CL, Obbard DJ, Jiggins FM (2011) Host phylogeny determines viral persistence and replication in novel hosts. PLoS Pathog 7(9):e1002260.

32. Parrish CR, et al. (2008) Cross-species virus transmission and the emergence of new epidemic diseases. Microbiol Mol Biol Rev 72(3):457–470.

33. Zwickl DJ (2006) Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion. PhD dissertation (Univ of Texas at Austin, Austin, TX).

34. Posada D (2008) jModelTest: Phylogenetic model averaging. Mol Biol Evol 25(7):1253–1256.

35. Drummond AJ, Suchard MA, Xie D, Rambaut A (2012) Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol Biol Evol 29(8):1969–1973.

36. Lemey P, Rambaut A, Drummond AJ, Suchard MA (2009) Bayesian phylogeography finds its roots. PLOS Comput Biol 5(9):e1000520.

37. Baele G, et al. (2012) Improving the accuracy of demographic and molecular clock model comparison while accommodating phylogenetic uncertainty. Mol Biol Evol 29(9):2157–2167.

38. Crandall ED, Sbrocco EJ, Deboer TS, Barber PH, Carpenter KE (2012) Expansion dating: Calibrating molecular clocks in marine species from expansions onto the Sunda Shelf Following the Last Glacial Maximum. Mol Biol Evol 29(2):707–719.

39. Pybus OG, Drummond AJ, Nakano T, Robertson BH, Rambaut A (2003) The epidemiology and iatrogenic transmission of hepatitis C virus in Egypt: A Bayesian coalescent approach. Mol Biol Evol 20(3):381–387.
