structural venomics reveals evolution of a complex venom ... · planet, with >100,000 extant...

10
Structural venomics reveals evolution of a complex venom by duplication and diversification of an ancient peptide-encoding gene Sandy S. Pineda a,b,1,2,3,4 , Yanni K.-Y. Chin a,c,1 , Eivind A. B. Undheim c,d,e , Sebastian Senff a , Mehdi Mobli c , Claire Dauly f , Pierre Escoubas g , Graham M. Nicholson h , Quentin Kaas a , Shaodong Guo a , Volker Herzig a,5 , John S. Mattick b,6 , and Glenn F. King a,2 a Institute for Molecular Bioscience, The University of Queensland, St Lucia, QLD 4072, Australia; b Garvan-Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia; c Centre for Advanced Imaging, The University of Queensland, St Lucia, QLD 4072, Australia; d Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology, 7491 Trondheim, Norway; e Centre for Ecological & Evolutionary Synthesis, Department of Biosciences, University of Oslo, 0316 Oslo, Norway; f Thermo Fisher Scientific, 91941 Courtaboeuf Cedex, France; g University of Nice Sophia Antipolis, 06000 Nice, France; and h School of Life Sciences, University of Technology Sydney, Broadway, NSW 2007, Australia Edited by Adriaan Bax, National Institutes of Health, Bethesda, MD, and approved March 18, 2020 (received for review August 21, 2019) Spiders are one of the most successful venomous animals, with more than 48,000 described species. Most spider venoms are dominated by cysteine-rich peptides with a diverse range of phar- macological activities. Some spider venoms contain thousands of unique peptides, but little is known about the mechanisms used to generate such complex chemical arsenals. We used an integrated transcriptomic, proteomic, and structural biology approach to demonstrate that the lethal Australian funnel-web spider pro- duces 33 superfamilies of venom peptides and proteins. Twenty- six of the 33 superfamilies are disulfide-rich peptides, and we show that 15 of these are knottins that contribute >90% of the venom proteome. NMR analyses revealed that most of these disulfide-rich peptides are structurally related and range in complexity from sim- ple to highly elaborated knottin domains, as well as double-knot toxins, that likely evolved from a single ancestral toxin gene. spider venom | venom evolution | structural venomics | transcriptomics | proteomics S piders evolved from an arachnid ancestor in the Late Or- dovician around 450 million years ago (1), and they have since become one of the most successful animal lineages on the planet, with >100,000 extant species (2). A key contributor to their evolutionary success is the use of venom to capture prey and defend against predators. The constant selection pressure on venoms over hundreds of millions of years has enabled them to evolve into complex mixtures of bioactive compounds with a diverse range of pharmacological activities. Spider venoms are a heterogeneous mixture of salts, low molecular weight organic compounds (<1 kDa), linear and disulfide-rich peptides (DRPs) (typically, 3 to 9 kDa with three to six disulfide bonds), and proteins (10 to 120 kDa) (35). However, peptides are the major components of most spider venoms, with some containing >1,000 peptides (6). The majority of these peptides are shortDRPs (2.5 to 5 kDa), but there is also a significant proportion of longDRPs (6.5 to 8.5 kDa) (5). As the primary function of spider venom is to rapidly immobilize prey, it is perhaps not surprising that most spider-venom DRPs that have been functionally characterized target neuronal ion channels and receptors (5, 7). Although spider-venom DRPs have been shown to adopt a variety of three-dimensional (3D) structures, including the Kunitz (8), prokineticin/colipase (9), disulfide-directed β-hairpin (DDH) (10), and inhibitor cystine knot (ICK) fold (11), the majority of spider-venom DRP structures solved to date conform to the ICK motif. The ICK motif is defined as an antiparallel β-sheet stabilized by a cystine knot (11). In spider toxins, the β-sheet typically comprises only two β-strands although a third N-terminal strand is sometimes present (12). The cystine knot comprises a ringformed by two disulfide bonds and the in- tervening sections of the peptide backbone, with a third disulfide piercing the ring to create a pseudoknot (11). This knot provides Significance The venom of the Australian funnel-web spider is one of the most complex chemical arsenals in the natural world, com- prising thousands of peptide toxins. These toxins have a di- verse range of pharmacological activities and vary in size from short (3 to 4 kDa) to long (8 to 9 kDa). It is unclear how spiders evolved such complex venoms and whether there is an evolu- tionary relationship between short and long toxins. Here, we introduce a structural venomicsapproach to show that the venom of Australian funnel-web spiders evolved primarily by duplication and elaboration of a single ancestral knottin gene; short toxins are simple knottins whereas most long toxins are either highly elaborated single-domain knottins or double-knot toxins created by intragene duplications. Author contributions: S.S.P., E.A.B.U., J.S.M., and G.F.K. designed research; S.S.P., Y.K.-Y.C., E.A.B.U., S.S., M.M., C.D., P.E., G.M.N., Q.K., S.G., and V.H. performed research; S.S.P., Y.K.-Y.C., E.A.B.U., S.S., M.M., C.D., P.E., G.M.N., Q.K., S.G., V.H., and G.F.K. analyzed data; and S.S.P., Y.K.-Y.C., E.A.B.U., and G.F.K. wrote the paper. Competing interest statement: C.D. is affiliated with Thermo Fisher Scientific. This article is a PNAS Direct Submission. Published under the PNAS license. Data deposition: Atomic coordinates for protein structures determined in this study were deposited in the Protein Data Bank under accession codes 2N6N, 2N6R, 6BA3, and 2N8K while corresponding NMR chemical shifts were deposited in BioMagResBank under accessions BMRB25774, BMRB25778, BMRB25853, and BMRB30352. Metadata and annotated nucleotide sequences were deposited in the European Nucleotide Archive under project accessions PRJEB6062 (ERA298588) and PRJEB35693. Mass spectrometry data has been deposited to ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD016886. 1 S.S.P. and Y.K.-Y.C. contributed equally to the work. 2 To whom correspondence may be addressed. Email: [email protected] or glenn. [email protected]. 3 Present address: Brain and Mind Centre, University of Sydney, Camperdown, NSW 2050, Australia. 4 Present address: St Vincents Clinical School, University of New South Wales, Sydney, NSW 2010, Australia. 5 Present address: School of Science & Engineering, University of the Sunshine Coast, Sippy Downs, QLD 4556, Australia. 6 Present address: School of Biotechnology & Biomolecular Sciences, University of New South Wales, Sydney, NSW 2010, Australia. This article contains supporting information online at https://www.pnas.org/lookup/suppl/ doi:10.1073/pnas.1914536117/-/DCSupplemental. First published May 12, 2020. www.pnas.org/cgi/doi/10.1073/pnas.1914536117 PNAS | May 26, 2020 | vol. 117 | no. 21 | 1139911408 BIOCHEMISTRY Downloaded by guest on September 8, 2021

Upload: others

Post on 20-Aug-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Structural venomics reveals evolution of a complex venom ... · planet, with >100,000 extant species (2). A key contributor to their evolutionary success is the use of venom to capture

Structural venomics reveals evolution of a complexvenom by duplication and diversification of anancient peptide-encoding geneSandy S. Pinedaa,b,1,2,3,4, Yanni K.-Y. China,c,1, Eivind A. B. Undheimc,d,e

, Sebastian Senffa, Mehdi Moblic,Claire Daulyf, Pierre Escoubasg, Graham M. Nicholsonh

, Quentin Kaasa, Shaodong Guoa, Volker Herziga,5,

John S. Mattickb,6, and Glenn F. Kinga,2

aInstitute for Molecular Bioscience, The University of Queensland, St Lucia, QLD 4072, Australia; bGarvan-Weizmann Centre for Cellular Genomics, GarvanInstitute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia; cCentre for Advanced Imaging, The University of Queensland, St Lucia, QLD 4072,Australia; dCentre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology, 7491 Trondheim, Norway; eCentrefor Ecological & Evolutionary Synthesis, Department of Biosciences, University of Oslo, 0316 Oslo, Norway; fThermo Fisher Scientific, 91941 CourtaboeufCedex, France; gUniversity of Nice Sophia Antipolis, 06000 Nice, France; and hSchool of Life Sciences, University of Technology Sydney, Broadway, NSW 2007,Australia

Edited by Adriaan Bax, National Institutes of Health, Bethesda, MD, and approved March 18, 2020 (received for review August 21, 2019)

Spiders are one of the most successful venomous animals, withmore than 48,000 described species. Most spider venoms aredominated by cysteine-rich peptides with a diverse range of phar-macological activities. Some spider venoms contain thousands ofunique peptides, but little is known about the mechanisms used togenerate such complex chemical arsenals. We used an integratedtranscriptomic, proteomic, and structural biology approach todemonstrate that the lethal Australian funnel-web spider pro-duces 33 superfamilies of venom peptides and proteins. Twenty-six of the 33 superfamilies are disulfide-rich peptides, and we showthat 15 of these are knottins that contribute >90% of the venomproteome. NMR analyses revealed that most of these disulfide-richpeptides are structurally related and range in complexity from sim-ple to highly elaborated knottin domains, as well as double-knottoxins, that likely evolved from a single ancestral toxin gene.

spider venom | venom evolution | structural venomics | transcriptomics |proteomics

Spiders evolved from an arachnid ancestor in the Late Or-dovician around 450 million years ago (1), and they have

since become one of the most successful animal lineages on theplanet, with >100,000 extant species (2). A key contributor totheir evolutionary success is the use of venom to capture preyand defend against predators. The constant selection pressure onvenoms over hundreds of millions of years has enabled them toevolve into complex mixtures of bioactive compounds with adiverse range of pharmacological activities.Spider venoms are a heterogeneous mixture of salts, low

molecular weight organic compounds (<1 kDa), linear anddisulfide-rich peptides (DRPs) (typically, 3 to 9 kDa with threeto six disulfide bonds), and proteins (10 to 120 kDa) (3–5).However, peptides are the major components of most spidervenoms, with some containing >1,000 peptides (6). The majorityof these peptides are “short” DRPs (2.5 to 5 kDa), but there isalso a significant proportion of “long” DRPs (6.5 to 8.5 kDa) (5).As the primary function of spider venom is to rapidly immobilizeprey, it is perhaps not surprising that most spider-venom DRPsthat have been functionally characterized target neuronal ionchannels and receptors (5, 7).Although spider-venom DRPs have been shown to adopt a

variety of three-dimensional (3D) structures, including theKunitz (8), prokineticin/colipase (9), disulfide-directed β-hairpin(DDH) (10), and inhibitor cystine knot (ICK) fold (11), themajority of spider-venom DRP structures solved to date conformto the ICK motif. The ICK motif is defined as an antiparallelβ-sheet stabilized by a cystine knot (11). In spider toxins, theβ-sheet typically comprises only two β-strands although a third

N-terminal strand is sometimes present (12). The cystine knotcomprises a “ring” formed by two disulfide bonds and the in-tervening sections of the peptide backbone, with a third disulfidepiercing the ring to create a pseudoknot (11). This knot provides

Significance

The venom of the Australian funnel-web spider is one of themost complex chemical arsenals in the natural world, com-prising thousands of peptide toxins. These toxins have a di-verse range of pharmacological activities and vary in size fromshort (3 to 4 kDa) to long (8 to 9 kDa). It is unclear how spidersevolved such complex venoms and whether there is an evolu-tionary relationship between short and long toxins. Here, weintroduce a “structural venomics” approach to show that thevenom of Australian funnel-web spiders evolved primarily byduplication and elaboration of a single ancestral knottin gene;short toxins are simple knottins whereas most long toxins areeither highly elaborated single-domain knottins or double-knottoxins created by intragene duplications.

Author contributions: S.S.P., E.A.B.U., J.S.M., and G.F.K. designed research; S.S.P.,Y.K.-Y.C., E.A.B.U., S.S., M.M., C.D., P.E., G.M.N., Q.K., S.G., and V.H. performed research;S.S.P., Y.K.-Y.C., E.A.B.U., S.S., M.M., C.D., P.E., G.M.N., Q.K., S.G., V.H., and G.F.K. analyzeddata; and S.S.P., Y.K.-Y.C., E.A.B.U., and G.F.K. wrote the paper.

Competing interest statement: C.D. is affiliated with Thermo Fisher Scientific.

This article is a PNAS Direct Submission.

Published under the PNAS license.

Data deposition: Atomic coordinates for protein structures determined in this study weredeposited in the Protein Data Bank under accession codes 2N6N, 2N6R, 6BA3, and 2N8Kwhile corresponding NMR chemical shifts were deposited in BioMagResBank underaccessions BMRB25774, BMRB25778, BMRB25853, and BMRB30352. Metadata andannotated nucleotide sequences were deposited in the European Nucleotide Archiveunder project accessions PRJEB6062 (ERA298588) and PRJEB35693. Mass spectrometrydata has been deposited to ProteomeXchange Consortium via the PRIDE partnerrepository with the dataset identifier PXD016886.1S.S.P. and Y.K.-Y.C. contributed equally to the work.2To whom correspondence may be addressed. Email: [email protected] or [email protected].

3Present address: Brain and Mind Centre, University of Sydney, Camperdown, NSW 2050,Australia.

4Present address: St Vincent’s Clinical School, University of New South Wales, Sydney,NSW 2010, Australia.

5Present address: School of Science & Engineering, University of the Sunshine Coast, SippyDowns, QLD 4556, Australia.

6Present address: School of Biotechnology & Biomolecular Sciences, University of NewSouth Wales, Sydney, NSW 2010, Australia.

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1914536117/-/DCSupplemental.

First published May 12, 2020.

www.pnas.org/cgi/doi/10.1073/pnas.1914536117 PNAS | May 26, 2020 | vol. 117 | no. 21 | 11399–11408

BIOCH

EMISTR

Y

Dow

nloa

ded

by g

uest

on

Sep

tem

ber

8, 2

021

Page 2: Structural venomics reveals evolution of a complex venom ... · planet, with >100,000 extant species (2). A key contributor to their evolutionary success is the use of venom to capture

ICK peptides (also known as knottins) (13) with exceptionalresistance to chemicals, heat, and proteases (14, 15), which hasmade them of interest as drug and insecticide leads (5, 14, 16).Some spider toxins show minor (17) or more significant (18)elaborations of the basic ICK fold involving an additional sta-bilizing disulfide bond. More recently, “double-knot” spidertoxins have been reported in which two structurally independentICK domains are joined by a short linker (19, 20).Like other folds with stabilizing disulfide bridges, knottins

display a remarkable diversity of biological functions, includingmodulation of many different types of ligand- and voltage-gatedion channels (5). Despite strong conservation of the knottinscaffold across a taxonomically diverse range of spiders, severalfactors have hampered analysis of their evolutionary history (21).First, it is not until recently that a large number of knottin pre-cursor sequences have become available from venom-glandtranscriptomes. Second, the disulfide framework in small DRPsgenerally constrains the peptide fold to such an extent that mostnoncysteine residues can be mutated without damaging the pep-tide’s structural integrity, a luxury not afforded to most globularproteins (21). Thus, evolution of DRPs is typically characterizedby the accumulation of many mutations, leaving very few con-served residues available for deep evolutionary analyses (21).Third, very few structures have been solved for DRPs larger than 5kDa, making it unclear whether just a few or many of the largerDRPs are highly derived or duplicated knottins.The only attempt so far to study the phylogeny of spider-

venom knottins concluded that both orthologous diversificationand lineage-specific paralogous diversification were important ingenerating the diverse arsenal of spider-venom knottins (22).These authors also reported that knottins that act as gatingmodifiers of voltage-gated ion channels likely evolved on multi-ple independent occasions. However, whether this resulted fromconvergent functional radiation of toxins or independent evolu-tion of gating modifiers from nonvenom knottins remains un-clear. The lack of nonvenom gland homologs and the use of onlya single nonspider knottin outgroup precluded firm conclusionsregarding the toxin, or nontoxin, origins of these convergentlyevolved spider-venom knottins. Thus, the deep evolutionaryhistory of spider-venom knottins remains enigmatic (22).Here, we report the combined use of proteomics, transcriptomics,

and structural biology to determine the complete repertoire andevolutionary history of DRPs expressed in the venom ofHadronycheinfensa, a member of the family of lethal Australian funnel-webspiders. This structural venomics approach has provided the mostcomprehensive overview to date of the toxin arsenal of a spider, andit revealed that the incredibly diverse repertoire of spider-venomDRPs is largely derived from a single ancestral knottin gene.

Results and DiscussionMass Spectrometry (MS) Reveals a Complex Spider-Venom Peptidome.MS analyses revealed that the venom peptidome of H. infensa isextremely complex. We used two complementary MS platforms toexamine the distribution of peptide masses in pooled venom fromfemale H. infensa (23). Analysis of secreted venom using matrix-assisted laser desorption/ionization tandem time-of-flight (MAL-DI-TOF/TOF) and Orbitrap MS resulted in 2,053 and 1,649 dis-tinguishable peptide masses, respectively (Fig. 1 A–D); only 651masses were shared between the two MS datasets, leading to atotal of 3,051 unique venom peptides (Fig. 1E). This is consider-ably more than the ∼1,000 peptides reported to be present invenom of the Australian funnel-web spider Hadronyche versuta (6)although this difference is likely a reflection of recent improve-ments in MS sensitivity rather than an indication of major dif-ferences in venom complexity between these closely relatedspecies. We conclude that Australian funnel-web spiders have oneof the most complex venom peptidomes reported for a terrestrialanimal, with similar diversity to those of marine cone snails (24).

The distribution of peptide masses in H. infensa venom is bi-modal. Most peptides fall in the mass range 2.5 to 5.5 kDa, butthere is a significant cohort of larger peptides with mass 6.5 to8.5 kDa (Fig. 1 C and D). This bimodal distribution matches thatpreviously described for venom from related funnel-web spiders(6, 25), various tarantulas (26), and the spitting spider Scytodesthoracica (27), and it is also reflected in the mass profile generatedfor all spider toxins reported to date (5). As reported previouslyfor Australian funnel-web spiders (6, 25), the 3D venom landscape(Fig. 1F) revealed no correlation between peptide mass andpeptide hydrophobicity, as judged by reversed-phase (RP) high-pressure liquid chromatography (HPLC) retention time.

Transcriptomics Uncovers the Biochemical Diversity of H. infensaVenom. Consistent with MS analysis of secreted venom, se-quencing of a venom-gland transcriptome from H. infensarevealed a biochemically diverse venom, with at least 33 toxinsuperfamilies (Fig. 2). In light of their likely toxic function, eachsuperfamily of toxins was named, as suggested previously (28),after gods/deities of death, destruction, or the underworld.Expressed sequence tags were sequenced using the 454 platformand assembled using MIRA v3.2 (29). This produced a total of26,980 contigs and 7,194 singlets, with an average contig lengthof 496 base pairs (bp) (maximum length 3,159 bp, N50 674 bp).After assembly, all contigs and singlets were submitted toBlast2GO (30) to acquire BLAST and functional annotations.Sequences were then grouped into three categories: 1) proteinsand other enzymes (61%); 2) toxins and toxin-like peptides(14%); and 3) sequences with no hits in ArachnoServer (31) orthe National Center for Biotechnology Information (NCBI,25%) databases (SI Appendix, Fig. S1). Recovered BLAST hitsincluded other spider toxins, as well as peptides and proteinsfound in other venomous arthropods with recently annotatedgenomes, including the tick Ixodes scapularis, the honey bee Apismellifera, the jumping ant Harpegnathos saltator, and the para-sitoid wasp Nasonia vitripennis.To maximize discovery of DRPs, we used an in-house algo-

rithm to identify potential open reading frames (ORFs). Bothannotated and previously unidentified peptides were then com-bined into a single file that was used to search for processingsignals in all precursors. After determination of cleavage regions(SI Appendix, Fig. S2), toxins were classified into superfamilies(SFs) based on their signal-peptide sequence (SI Appendix, Fig.S3), total number of cysteine residues, and their known or pre-dicted cysteine framework (Fig. 2 and SI Appendix, Fig. S3). Abrief description of each superfamily is provided in SI Appendix,Figs. S4–S36.Transcript abundance for each superfamily was estimated us-

ing RSEM v1.2.31 (32), and these values are reported as tran-scripts per million (TPM) in SI Appendix, Table S1 and Fig. 3.Abundance varied greatly from very low for SF11, SF21, andSF25 (TPM 2 to 12) to very high for SF2, SF3, SF4, SF13, andSF23 (TPM 9,000 to 13,000). Due to the low throughput of 454sequencing compared to newer sequencing technologies, we in-vestigated the extent of sequence coverage by random samplingof the total read data to produce 11 fractional datasets and thendetermined the total number of superfamilies recovered for eachdata subset (SI Appendix, Fig. S37). Even at the lowest level ofsampling (5%), 16 of 33 superfamilies were recovered, and de-tection saturation (30 of 33 superfamilies) was reached when80% of reads were used for reassembly. Not surprisingly, themore reads added to the assembly, the more isoforms we re-covered, and, similarly, superfamilies with low transcript abun-dance needed higher sampling to be detected. We conclude thatthe 454 sequencing platform provided good coverage of the H.versuta venome.Using this approach, we determined that the venom gland of

H. infensa expresses at least 26 families of DRPs, three families

11400 | www.pnas.org/cgi/doi/10.1073/pnas.1914536117 Pineda et al.

Dow

nloa

ded

by g

uest

on

Sep

tem

ber

8, 2

021

Page 3: Structural venomics reveals evolution of a complex venom ... · planet, with >100,000 extant species (2). A key contributor to their evolutionary success is the use of venom to capture

of enzymes (hyaluronidase, lipase, and phospholipase A2 [PLA2]),one family of cysteine-rich secretory proteins (CRiSPs), andthree families of uncharacterized secreted proteins (Figs. 2 and3). Hyaluronidase and PLA2 have been convergently recruitedinto many animal venoms (33). Venom hyaluronidases act ashemostatic factors (33) or facilitate spread of neurotoxins byenhancing tissue permeability (34). Sequence diversity acrosstaxa is minimal, and no new activities have been reported forvenom hyaluronidases (33). In contrast, venom PLA2s are di-verse, and they have acquired a range of new functions, some ofwhich are independent of their catalytic activity (35). There areonly a few reports of lipases in animal venoms, but the H. infensavenom lipase has significant homology to those found in waspvenoms (36, 37). It is unclear what function lipases might play inspider venom. Finally, consistent with an actively regeneratingvenom gland, the transcriptome is enriched with transcriptsencoding proteins involved in translation, processing, folding,and posttranslational modification of venom toxins (SI Appendix,Fig. S38).

Proteomic Analyses Confirm the Biochemical Complexity of H. infensaVenom. To further probe the proteomic complexity of H. infensavenom, we searched our annotated transcriptome with tandemMS spectra generated from fractionated, reduced, alkylated, andtrypsin-digested venom. Using a stringent confidence threshold(<1% false discovery rate [FDR] calculated from a decoy-basedFDR analysis in ProteinPilot v.4.5.1), we detected 1,108 of the 1,224venom proteins and peptides predicted from the transcriptomic

analyses. Using the same criteria but a different search program(Mascot v2.5.1), we identified 877 of the same of the 1,224venom proteins and peptides, 849 of which were identified byboth programs (SI Appendix, File 1). These proteomically iden-tified venom components spanned 22 (ProteinPilot) or 20(Mascot) of the predicted 33 superfamilies, including all of themost diverse superfamilies, and three of the seven predictedprotein superfamilies. There was a strong correlation betweenlow expression and lack of proteomic detection; the predictedpeptide superfamilies with cumulative expression values <100TPM (Fig. 3) accounted for all peptide superfamilies not de-tected in the venom. Taken together, our transcriptomic andproteomic analyses indicate that H. infensa has a highly complexpeptide-dominated venom.

The Venom of H. infensa Contains both Classical and Highly ElaboratedICK Toxins.Twenty-six of the 33 toxin superfamilies identified in thevenom-gland transcriptome of H. infensa are DRPs (Fig. 2 and SIAppendix, Fig. S3). Thus, DRPs are the dominant polypeptides inthe venom, even when transcript abundance is considered (Fig. 3).Eight of the 26 DRP superfamilies could be confidently classifiedas knottins based on prior determination of the 3D structure ofone of the superfamily members (SF3) or strong homology with anorthologous knottin (i.e., from another spider venom) whosestructure has been elucidated (SF8 to -10, SF13, SF16, SF17, andSF20). A further three superfamilies were confidently predicted tobe knottins because their cysteine framework and intercysteinespacing conform to the ICK motif (SF4, SF19, and SF24), albeit

A B

C D

FE

3

2

1

0 Abs

orba

nce

215

nm x

10–

5

Counts/H

PLC fraction

Retention time (min)

LC-MALDI-TOF-TOF

LC-MALDI-TOF-TOF LC-ESI Orbitrap

Cou

nts

per m

ass

bin

( )

Cum

ulative mass count ( )

Mas

s (m

/z)

Retention time (min)

Mass, Bin Range (kDa) Mass, Bin Range (kDa)

LC-MALDI-TOF-TOF

Orbitrap MALDI

998 1402651

3051 venom peptides

1.0–

1.5

n = 2052

n = 1648

1.5–

2.0

2.0–

2.5

2.5–

3.0

3.0–

3.5

3.5–

4.0

4.0–

4.5

4.5–

5.0

5.0–

5.5

5.5–

6.0

6.0–

6.5

6.5–

7.0

7.0–

7.5

7.5–

8.0

8.0–

8.5

8.5–

9.0

9.0–

9.5

9.5.

–10.

010

.0.–

10.5

10.5

.–11

.011

.0.–

11.5

11.5

.–12

.012

.0.–

12.5

12.5

.–13

.013

.0.–

13.5

13.5

.–14

.014

.0.–

14.5

14.0

.–14

.5

1.5–

2.0

2.0–

2.5

2.5–

3.0

3.0–

3.5

3.5–

4.0

4.0–

4.5

4.5–

5.0

5.0–

5.5

5.5–

6.0

6.0–

6.5

6.5–

7.0

7.0–

7.5

7.5–

8.0

8.0–

8.5

8.5–

9.0

9.0–

9.5

9.5.

–10.

010

.0.–

10.5

10.5

.–11

.011

.0.–

11.5

11.5

.–12

.012

.0.–

12.5

12.5

.–13

.013

.0.–

13.5

13.5

.–14

.014

.0.–

14.5

14.0

.–14

.5

1.0–

1.5

14.5

.–15

.0

110100908070605040 HP

LC fr

actio

nnu

mbe

r

25000

20000

15000

10000

5000

Abu

ndan

ce

3400

3600

3800

4000

4200

4400

4400

4800

5000

5200Mass (Da)

3D VENOM LANDSCAPE

Cou

nts

per m

ass

bin

( )

Cum

ulative mass count ( )

Fig. 1. Mass profile of H. infensa venom. (A) RP-HPLC chromatogram showing fractionation of crude H. infensa venom. Absorbance at 215 nm is shown indark gray (left ordinate-axis) while mass count for each HPLC fraction is shown in light gray (right ordinate-axis). (B) Distribution of MALDI-TOF/TOF masses asa function of RP-HPLC retention time. (C and D) Histograms showing distribution and abundance of peptides in H. infensa venom as detected using (C)MALDI-TOF/TOF and (D) Orbitrap MS. Masses are grouped in 500-Da bins. Gray bars indicate cumulative total number of toxins (right ordinate-axis). (E) Eulerplot showing overlap between mass counts generated via MALDI-TOF/TOF and Orbitrap MS. (F) A 3D landscape of H. infensa venom showing the correlationbetween RP-HPLC retention time and peptide mass and abundance generated by MALDI-TOF/TOF analysis. (Inset) A photo of a female H. infensa. Imagecredit: Bastian Rast (photographer).

Pineda et al. PNAS | May 26, 2020 | vol. 117 | no. 21 | 11401

BIOCH

EMISTR

Y

Dow

nloa

ded

by g

uest

on

Sep

tem

ber

8, 2

021

Page 4: Structural venomics reveals evolution of a complex venom ... · planet, with >100,000 extant species (2). A key contributor to their evolutionary success is the use of venom to capture

with an additional disulfide bond in the case of SF4. In addition,SF1 is a family of double-knot toxins, one of whose structure wasrecently solved (Fig. 4A) (20). Thus, 12 of the 26 DRP super-families can be assigned as knottins. Only six DRP superfamiliescould be confidently assigned as not being knottins: SF2 corre-sponds to the MIT/Bv8/prokineticin superfamily (38), SF5 is afamily of β-hairpin toxins with high homology to the antimicrobialpeptide gomesin isolated from hemocytes of the tarantula Acan-thoscurria gomesiana (39), SF11 is a family of cystatin-like toxins,SF12 is a derived MIT/Bv8/prokineticin family with four ratherthan the canonical five disulfide bonds, and SF7 and SF21 haveless than the minimum of six cysteine residues required to form anICK motif.Eight of the 26 DRP superfamilies with three or more disul-

fide bonds have sequences with unknown structure and bi-ological activity (SF6, SF14, SF15, SF18, SF22, SF23, SF25, andSF26). In order to better understand the structural diversity andevolutionary origin of spider-venom DRPs, we attempted todetermine the structure of representative members of SF6, SF14,SF26, and homologous sequences to those of SF22 and SF23.These superfamilies were prioritized over SF15, SF18, and SF25,

which contain only low-abundance transcripts (Fig. 3). Peptideswere expressed in the periplasm of Escherichia coli using a pre-viously optimized system (40), then purified using nickel-affinitychromatography followed by RP-HPLC (SI Appendix, Fig. S5).All of the chosen peptides were successfully produced with theexception of SF14, which failed to express in E. coli. The SF6,SF22, SF23, and SF26 peptides were uniformly labeled with 15Nand 13C, and their structures were determined using an NMRstrategy we described previously that takes advantage of non-uniform sampling to expedite data acquisition and improvestructure quality (41). Each of the determined structures is ofhigh precision and has excellent stereochemical quality, as evi-denced by the structural statistics presented in SI Appendix,Table S2.The SF6 family member adopts a highly derived knottin fold

with several unique elaborations (Fig. 4B): 1) The β-hairpin loopencompasses an eight-residue α-helix; 2) an extended C-terminaltail is locked in place by a disulfide bridge to the adjacentC-terminal β-strand of the ICK motif; and 3) a highly extendedN-terminal region (i.e., preceding the first cysteine residue) in-cludes a short six-residue α-helix and a β-strand that forms a

Toxin superfamilies

SF10Kisin

C-C-CC-C-C

SF9Shiva

C-C-CC-C-C

SF8Anat

C-C-CC-C-C-CPC-C-C

SF5Chi-YouC-C-C-C

SF2Ixtab

C-C-CC-C-C

SF1AhPuch

C-C-CC-C-C-C-C-CC-C-C

SF3Thanatos

C-C-C-C

SF4Erlik

C-C-C-CC-C-C-C

SF6‘Oro

C-C-CC-C-C-C-C

SF7Pele

C-C-C-C-C

SF11Hades

C-C-CC-C-C-C-C-hexatoxin-2-hexatoxin-1MIT/Bv8-like Gomesin Cystatin-like

Unknownfold #1

SF21AnubisC-C-C-C

SF20EreshkigalC-C-CC-C-C

SF19Yama

C-C-CC-C-C

SF16Zahhak

SF13Ares

C-C-CCC-C-C-C

SF12Sekhmet

C-C-C-C-CPC-C-C

SF14Wurrukatte

C-C-C-CC-C-C-C

SF15Pluto

C-C-C-CC-C-C-C-C-C

SF17Nergal

C-C-CC-C-C

SF18Grim Reaper

C-C-C-C-C-C-C-C

SF22Loke

C-C-CC-C-C-C-C-C-C

SF33Mehnhit

SF32Midgardsormen

SF31Hulda

SF28Tuoni

SF24Ankuo

C-C-CC-C-C

SF23Qumaits

C-C-CC-C-C-C-C

SF25TezcatlipocaC-C-C-C-C-C-C-C-C-C-C-C

SF27Mara

SF29Mars

SF30Sauron

Hyaluronidase Phospholipase A2LipaseENZYMES SECRETED PROTEINS

-hexatoxin

CRISP

2KRA

μ-agatoxin-like

Unknownfold #5

1KQH 1KFP 1AXH 1G9P 3L0R

1VTX 2H1Z

1FCU 1POC 1QNX

1S6X

ICK ICK

ICK ICKICK ICK

ICKDouble ICK ICK

ICK

ICK

-grammotoxin-like

1KOZ

VSTx1-like

Venom allergen 5

ICK

non-ICKnon-ICK non-ICKICK

non-ICK

Techylectin-like

1JC9non-ICK

2N6N

non-ICK

PredictedICK fold

C-C-CC-C-C

ICK

SF26Set

C-C-CC-C-C-C-C

ICK

PredictedICK fold

non-ICK

non-ICK

Modifiedcolipase/Bv8/MIT1/

prokineticinfold

MIT/Bv8-like

2N6R2N8F

2N8K

Unknownfold #2

Unknownfold #3

Unknownfold #4

Unknownfold #6

6BA3 1W52

1IE6

Unknownfold #7

Imperatoxin-like

PredictedICK fold

Unknownfold #8

Fig. 2. Overview of the venom proteome of H. infensa. The venom proteome of H. infensa consists of 33 toxin superfamilies (SF1 to SF33). For each SF, thecysteine framework is shown in blue, and the 3D fold is classified as ICK, putative ICK, double ICK, non-ICK, or unknown. Light blue boxes enclose previouslysolved structures of superfamily members from H. infensa. Black boxes enclose structures of orthologous superfamily members from related mygalomorphspiders or, in the case of enzymes and CRiSP proteins, orthologs from venomous hymenopterans (bees and wasps). Red boxes enclose structures solved in thecurrent study (SF6, SF22, SF23, and SF26). Stars enclosed by dashed black boxes signify toxin superfamilies for which no structural information is currentlyavailable. For each of the structures, β-strands are shown in blue, helices are green, and core disulfide bonds are shown as solid red tubes. For DRPs containingan ICK motif, additional disulfide bonds that do not form part of the core ICK motif are shown as orange tubes. Toxin superfamilies are named after gods ordeities of death, destruction, and the underworld. For each structure shown, PDB accession numbers are given in the lower right corner.

11402 | www.pnas.org/cgi/doi/10.1073/pnas.1914536117 Pineda et al.

Dow

nloa

ded

by g

uest

on

Sep

tem

ber

8, 2

021

Page 5: Structural venomics reveals evolution of a complex venom ... · planet, with >100,000 extant species (2). A key contributor to their evolutionary success is the use of venom to capture

small β-sheet via hydrogen bonds with residues in the CysIV–CysV intercystine loop. The SF26 family member also adopts ahighly modified knottin fold (Fig. 4C), with the following elab-orations of the classical ICK motif: 1) Similar to SF6 toxins, anextended C-terminal tail is stapled in place by a disulfide bridgebetween the C-terminal Cys residue and a Cys residue at the baseof the adjacent C-terminal β-strand of the ICK motif; 2) the longN-terminal region includes a short six-residue 310 helix followedby an extended segment that stretches across toward the muchlonger β-sheet of the cystine knot; and 3) a five-residue 310 helix

is inserted in the CysII–CysIII loop in the ICK motif. The SF23family member adopts a prototypical knottin fold with an addi-tional stabilizing disulfide bond at the base of the β-hairpin loop(Fig. 4D), which is a relatively common elaboration in spidertoxins (42). Superimposition of the 3D structures of SF6, SF23,SF26, and either of the two ICK domains of SF1 reveals that,regardless of the size and structural elaborations acquired inthese peptides (Fig. 4E and Movie S1), the central ICK motifremains essentially the same, despite enormous variations inamino acid sequence.Elucidation of the 3D structure of toxin U33-TRTX-Cg1c from

the Chinese tarantula Chilobrachys guanxiensis, which is homol-ogous to SF22, revealed that these toxins do not adopt an ICKfold. The overall topology of SF22 (Fig. 4F) is similar to theprokineticin/colipase fold adopted by other five-disulfide DRPs,such as the SF2 and SF12 toxins, but they are not homologous.The prokineticin/colipase fold found in vertebrate proteins canbe viewed as a head-to-tail duplication of two DDH motifs withvarious elaborations on the loops (10). SF22 also contains twocore motifs. The C-terminal motif is structurally similar to theC-terminal subdomain of prokineticin and colipase, and it adoptsa DDH motif with a long 17-residue loop between CysVI andCysVIII (cf. four to six residues in the equivalent loop of mostICK motifs). The extended loop bridges over to the N-terminalmotif, stabilized by an intersubdomain disulfide bond (CysIII–CysVII) and a hydrogen bond between Cys15 (CysIV) and Tyr49.However, in contrast with prokineticin and colipase, theN-terminal region does not adopt a DDH fold. Rather, theN-terminal core is composed of two stacked antiparallelβ-hairpins in which the parallel strands in the adjacent hairpinsare covalently linked by two disulfide bonds (Fig. 4F). Thisstacked-hairpin fold resembles the “mini-granulin” fold seen indiverse proteins, such as granulin, serine proteases, growth fac-tors, and cone snail venom peptides (43, 44). Given the uncertain

Superfamily number1 2 3 4 5 6 7 8 9 10 1112131415161718192021222324252627282930313233

Proteins

500

400

300

200

100

0

2000

5000

8000

11000

14000

Tran

scrip

ts p

er m

illio

n

ICKNon-ICKProteins

Disulfide-rich peptides

Fig. 3. Abundance of transcripts encoding DRPs and proteins obtainedfrom sequencing of an H. infensa venom-gland transcriptome. Blue barsrepresent protein-encoding transcripts while red and gray bars denotetranscripts encoding DRPs that do or don’t have an ICK scaffold, respectively.ICK transcripts dominate the venom-gland transcriptome. Superfamilynumbers correspond to those shown in Fig. 2.

Fig. 4. Structures of selected DRPs found in the venom of H. infensa, highlighting key structural innovations in “short” and “long” peptide toxins. (A)Schematic representation of the SF1 double-knot toxin, which is comprised of two independently folded ICK domains joined by an inflexible linker. (B) SF6family peptides adopt a typical ICK fold with several unique elaborations, including an extended C-terminal tail that is stapled to the rest of the structure viaan additional disulfide bond (“tail-lock”), an enlarged intercystine loop 4 that contains an α-helical insertion, and a highly extended N-terminal region thatincludes a six-residue α-helix and a short two-stranded β-sheet. (C) SF26 DRPs also adopt a highly elaborated knottin fold with a C-terminal tail tail-lock, a five-residue 310 helix in the core ICK region between CysII and CysIII, and a long N-terminal extension that includes a short α-helix. (D) SF23 DRPs only differ fromclassical ICK toxins by having an extra disulfide bond (“β-sheet staple”) that stabilizes the β-hairpin loop. (E) Structural alignment of the ICK core regions ofDRPs from SF1, SF6, SF17, SF22, and SF23, highlighting the strong conservation of the ICK motif (DDH in the case of SF22) regardless of the extent of structuralelaborations outside this core region. (F) SF22 DRP represents a previously unidentified toxin fold comprised of two independent structural domains con-nected by a disulfide bond. The C-terminal domain forms a DDH core while the N-terminal domain adopts a DABS fold. In all panels, disulfide bondscomprising the ICK motif are shown as red tubes while additional noncore disulfide bonds are shown as orange tubes. β-sheets and α-helices are highlightedin blue and green, respectively.

Pineda et al. PNAS | May 26, 2020 | vol. 117 | no. 21 | 11403

BIOCH

EMISTR

Y

Dow

nloa

ded

by g

uest

on

Sep

tem

ber

8, 2

021

Page 6: Structural venomics reveals evolution of a complex venom ... · planet, with >100,000 extant species (2). A key contributor to their evolutionary success is the use of venom to capture

evolutionary history and diverse taxa that express this proteinfold, we refer to it here as a “disulfide-stabilized antiparallelβ-hairpin stack” (DABS). Covalent linkage of a DDH and DABSdomain has not been reported in venom peptides, and thus SF22represents a new class of toxins.

ICK Toxins Dominate the Venom Arsenal of H. infensa.Our combinedtranscriptomic, proteomic, and structural biology data revealedthat ICK toxins are the major contributor to the diversity of the

H. infensa venom peptidome, with 15 of the 26 identified DRPsuperfamilies comprised of simple or highly derived knottins.Moreover, analysis of transcript abundance revealed that 11 ofthe 14 most highly expressed DRP superfamilies are ICK toxins(Fig. 3). Overall, based on transcript abundance, ICK toxinsrepresent ∼91% of the total venom peptidome. If one includesthe seven superfamilies of putative protein toxins (SF27–SF33),which are all expressed at very low levels, the venom proteome iscomposed of 90.5% ICK toxins, 9.1% non-ICK DRPs, and 0.4%

A0A4Q8K9A4

A0A4Q8K1W5

A0A4Q8K4K9

A0A4Q8K1B6

A0A4Q8K3R5

A0A1D0BPK9

δ-HXTX-Hi1a_1699

98

SF13

TXM14_MACGS

P83560

ICK2_TRILK

ADF28491B3FIV1

TXBS1_BRASM

99

9882

A3F7X1

TXLT4_LASSB

82

93 D2Y271

H16D1_HAPHA

H16B1_HAPHA

TX16C_HAPSC

B3FIQ5ABY77693D2Y283D2Y299B3FIQ7U15-H

XTX-Hi1a

A0A4Q8K6Z3

A0A4Q8K5T9

SF20TXPT6_MACGS

TXMG2_MACGS

P83558W4VRV3

ICK6_TRILK

91

87

75

99

98

75

95

92

9898

7075

91

9390

A0A

1D0B

R98

A0A

4Q8K

7E1

U 19-H

XTX

-Hi1

a_2

A0A

4Q8K

AF9

A0A

1D0C

028

U 12-H

XTX

-Hi1

a_2

U 12-H

XTX

-Hi1

b_3

U 6-H

XTX

-Hi1

a_1

U 6-H

XTX-

Hi1

b

A0A1

D0B

NA3

A0A4

Q8K

0R5

A0A4

82Z8

A8

B1P1

H6

JZT5

6_C

HIG

U

SF8

SF24

SF17

Haplopelma hainanum clade

M. giganteus(whip scorpion)

P6098

0

P0DQJ5

A0A48

2Z9S

1B1P

1D2

A0A48

2Z7T

0

A0A48

2Z6H

4B1P

1B9

B1P1C

6B1P

1A4

B1P1

C7

B1P1

C8

P0DM

67

Q75W

H5

Q5Y4V

1

Q5Y4U

8

82

67

99

9897

86

93

A0A1D0BZG7

U 18-HXTX-H

i1b_3

U18-HXTX-H

i1a

A0A1D0BRB084

84

99

D2Y2C5

A0A4Q8KB62

A0A4Q8KBP3

A0A4Q8K2E8

P60272

U2-HXTX-Hi1a_2

56

62

98

50

97

95

95

96

9899

99

99

99

9277

72

97 97

9964

86

99

94

71

96

68

7892

64

56

82

81

94

70

9785

93

97

83

7565

94

83

9980

82

83

83

75 62 73

67

9188

885551

9799

99

Mg2_c0_g1_i1Cshyp4_0_g2_i1_11A0A2P6L7U6Na_c0_g1_i2

A0A4Y2B2M7

LmL2_0_g1_i1_1A0A4Q8K4G6

A0A4Q8K3N9

A0A482ZAI5

A0A4Q8K2E3

A0A4Q8KCP8A0A4Q8K123

A0A4V2H9S7

A0A087T772

A0A2D0PC75

A0A4Q8K427A0A4Q8K559

A0A4Q8K5W3

lmL2_0_g2_i1_1

Q5Y4U4

A0A4Q8K2G6

A0A2D0PBN7Cshyp5_0_g1_i1_11

A0A4Q8K501

83

A0A4Y2B7R1Cshyp6_0_g2_i1_7Hh_DN37914_c0_g1_i1

A0A087TBW8A0A4Q8K9M1

A0A4Y2B988A0A087TD82

0.7

SF23

SF3

P83562

A0A4Q8K6G9

-HXTX-Hi1b_d1

-HXTX-Hi1a_d1

-HXTX-Hi1d_d2

-HXTX-Hi1b_d2

-HXTX-Hi1e_d2

-HXTX-Hi1a_d2

-HXTX-Hi1c_d2

-HXTX-Hi1d_d2

Q9BJV7TOTA_ATRIL

A0A4Q8KC10A0A4Q8K995

A0A4Q8K2P9A0A1D0C005TOTA_HADINo-Hi1-2do-Hi2g_1

TT

SF10

A0A4Q8K0Y5

SF1

A0A2D0PBU5B1P1H8

A0A4Q8KAE2A0A1D0BR67

A0A1D0BNB4U9-HXTX-Hi1a_2

W4VRX0

W4VSI5

ICK24_TRILK

Q75WG7

A0A4Q8K3Q5

A0A4Q8K088

A0A4V2H993

A0A4Q8K765

A0A4Q8K2N9

A0A4Q8K630

A0A4Q8KA16

A0A4Q8K8J7

A0A4V6MK97

A0A4Q8K7H1

U3-HXTX-Hi1c_1

U3-HXTX-Hi1a_1U3-HXTX-Hi1b

A0A4Q8K1T3

B3FIN4

TX18A_HAPSCA0A4Q8K1V5

A0A4Q8K442

A0A4Q8K4S4

ABY77683

A0A482Z6E1

A0A482ZHB5

B1P1I3

JZT64_CHIGU

JZT65_CHIGU

I2FKH7

BAM15662

BAN13533

96A0A4Q8K436

D2Y251

H18A1_HAPHA

D5J6X8

W4VS46

W4VRV2

W4VSB9

ICK8_TRILK

A0A4K1D8L1

A0A4K1D8B0

A0A0P0D113

A0A482ZD57A0A482ZAB2

BPP1I1

D2Y2B6

W4VSI6

W4VSI7

W4VS32

W4VSB7

ICK19_TR

ILK

W4VR

V1IC

K17_TRILK

TXP1_APOSC

TXP

4_AP

OS

CTX

P9_A

PO

SC

W4V

RU

3W

4VR

X8

U21 -H

XTX

-Hi1a

U14 -H

XT

X-H

i1c

U14 -H

XT

X-H

i1b

U14 -H

XTX

-Hi1a_2

B1P

1J5B

3FIP1

M5A

XR

5

U11 -H

XT

X-H

i1a

A0A

088BP

H1

A0A

4V2H

9N9

A0A

482ZF

69B

6DC

T3

A0A

4Q8K

7P6

A0A

04Q8K

AQ

9A

0A04Q

8KA

Q8

A0A

04Q8K

7Y5

A0A

04Q8K

2H9

A0A

1D0B

NE

8

S0F

1N0

o-Hi1b_11

2_h1iH- o

6_g1iH-o o-H

i1c_

2

A0A

4Q8K

AG

8P

0DM

Q3

CD

F44

148

S0F

1N7

TO

1F_H

AD

VN

TO

1F_A

TR

OP

0DM

Q0

TO

1C_H

AD

MO

P83

580

AO

A4Q

8K62

1A

5A3H

1S

0F1M

6C

DF4

4163

TOK

1H_H

AD

VE

4i_1g_0c_73643N

D_Y

TINI

RT

SF14

SF4

HHHH

SF19

SF16

SF9

Non

-ven

om g

land

tiss

ue

Venom gland transcripts

SF26

Fig. 5. Evolution of spider-venom ICK toxins. Maximum likelihood tree showing the phylogenetic relationship between H. infensa DRPs and DRPs isolatedfrom the venom gland or other tissues of other spider species. The tree was rooted using the whip scorpion M. giganteus as the outgroup. Bootstrap valuesare shown at each node, except for nodes where support was 100. The tree shows that, although many of the phylogenetic relationships between super-families remain unresolved, all venom-derived DRPs form a well-supported monophyletic clade. Superfamily sequences belonging to H. infensa are high-lighted in blue text, and representative structures for each superfamily are shown. DRP sequences from muscle and other tissues are highlighted in red; allother sequences (denoted by the light blue broken circle) represent venom DRPs. Venom peptides isolated from other species have their correspondingaccession numbers/common toxin names listed in the labels, with the exception of the H. hainanum superfamily XVI clade. A summary of all accessions usedcan be found in SI Appendix.

11404 | www.pnas.org/cgi/doi/10.1073/pnas.1914536117 Pineda et al.

Dow

nloa

ded

by g

uest

on

Sep

tem

ber

8, 2

021

Page 7: Structural venomics reveals evolution of a complex venom ... · planet, with >100,000 extant species (2). A key contributor to their evolutionary success is the use of venom to capture

protein toxins. We conclude that ICK toxins dominate thevenom arsenal of H. infensa both in terms of molecular diversityand abundance.To expand on the activities reported for ICK toxins from

funnel-web spider venom (SI Appendix, Table S4), we examinedwhether SF4, SF6, SF23, and SF26 are insecticidal. All fourtoxins induced paralysis when injected into sheep blowflies (SIAppendix, Table S3). SF4 and SF23 were only active at high dose,and flies recovered within 24 h. SF6 caused irreversible paralysis,leading to death at moderate doses (9.7 nmol/g), while SF26paralyzed all flies within 30 min, even at low dose (0.3 nmol/g),but the effects were reversible. The activity of SF26 is compa-rable to other potent insecticidal spider toxins tested in thisassay (45).

Phylogenetic Analyses Suggest Structural Radiation of a Single Lineageof Spider-Venom ICK Toxins. In order to examine whether ICKtoxins in funnel-web spider venom have a common origin, weexamined their phylogenetic relationship with homologous se-quences from nonvenom gland spider tissues (hypodermal tissue,book lung, and leg), as well as tissue from the whip scorpionMastigoproctus giganteus, a nonvenomous arachnid. All knownspider-venom sequences deposited in UniProtKB were also used(Fig. 5 and SI Appendix). Briefly, sequence files were downloaded,trimmed, and assembled, and then contigs/singlets were analyzedto identify ICK variants. These ICK variants included non-venom-gland transcripts from H. infensa (leg contig DN34637_c0_g1_i4)and Haplopelma hainanum (book-lung contigs DN16443_c0_g1_i1,DN40049_c0_g1_i5, and i6, and DN40049_c0_g2_i1) that encodedpeptide sequences that are identical or near identical to confirmedvenom components. In the case of H. infensa, the leg transcript wasidentical to that of ω-HXTX-Hi1h (venom gland contig RL5_rep_c1832), which was confidently identified in the venom (SI Appendix).Similarly, three transcripts from the book lung transcriptome ofH.hainanum are identical with toxins previously characterized fromthe venom of this species. Detection of these sequences in non-venom gland transcriptomes is likely due to “leaky” expression ofbona fide toxin genes, which has been reported in other venomouslineages (46).Strikingly, our analyses of nonvenom gland tissues only

returned hits to peptides with simple “classical” ICK folds. Thus,the highly elaborated ICK structures observed for severalvenom-peptide superfamilies in H. infensa appear to representextreme cases of structural diversification following functional

recruitment as toxins. Phylogenetic reconstruction of these andall other spider ICK peptides identified using our approachgrouped all confirmed venom knottins into a well-supportedmonophyletic clade (Fig. 5). Our analysis also clustered severalputative venom toxins with sequences from the leg muscletranscriptome of Liphistius malayanus, which could indicate ei-ther a physiological role for these putative toxins or multiplerecruitment of ICK peptides into spider venom. However, re-gardless of the number of ICK recruitments into venom, our re-sults suggest that the vast majority of the structural diversity ofICK toxins, which make up the bulk of the molecular diversity inthe venom ofH. infensa, arose from a single lineage of weaponized“classical” ICK fold.

Elaborations on the Basic Knottin Disulfide Architecture Facilitate ICKDiversification. Although we were unable to resolve the higherorder phylogenetic relationships between ICK toxin superfam-ilies, our analyses highlight the importance of loss and gain ofdisulfide bonds in their evolutionary radiation (Fig. 5). Func-tional radiation of ICK and other DRP folds is thought to occurprimarily by diversification of the intercystine loop regions,which does not affect the structural integrity of the peptides.However, our data show that elaborations on the disulfide ar-chitecture itself can play a prominent role in the diversificationof DRPs. While the core ICK motif contains only three disulfidebonds (11), our data suggest that the ancestral ICK spider toxineither contained a fourth disulfide bond stabilizing the β-sheet inloop 4 or that this additional disulfide was gained on at least twoseparate occasions. A loss of intramolecular disulfide bonds mayseem counterintuitive during the evolution of venom DRPs forwhich stability is paramount. However, intriguingly, the stabi-lizing loop-4 disulfide bridge was present in all homologs iden-tified in the outgroup datasets. Moreover, disulfide loss appearsto have been crucial during evolution of DDH toxins (10) fromICK precursors in scorpions (21), suggesting that this scenariomay not be as unlikely as previously postulated (10, 47).Regardless of whether the ancestral ICK toxin contained the

loop-4 disulfide, our data indicate that gain of disulfide bondsunderlies the myriad of structural variations in ICK spider toxins.This not only highlights the importance of disulfide bonds instabilizing the structure of spider toxins but further illustrates theextraordinary versatility of the ICK fold. The permissiveness of theICK fold to both sequence variation (21) and disulfide elabora-tions likely explains why it appears to be the most widespread

2N8F

1 4

32

65

C

N

2N6R

N

1G9PC

6 32 5

41 1 4

32

65

C

N

2N6N1VTX

1

6 32 5

4

N C

2M36

1

6 32 5

4

NC

1

6 32 5

4

N C

1DL0

1

6 32 5

4

N C

5

3

24 9

810

N C

6

7

1

14

32

N C

DDH fold ICK fold

Domain duplication: limited intragenic duplication to produce double-knot and double-DDH toxins

Domain elaboration: single-knot toxins with extended termini, enlarged loops, additional SS bonds

Ancestralfold

1

26 3

5

4 7

89

10

CN

1112

15 2

4

3

6

CN

9810

7

Double-knot

toxin

Double DDH toxin

(prokineticin/MIT1 fold)

2KRA

Conjugation: limited conjugation to produce mixed-domain toxins

2N8KOther domains

Conjugated

DDH-DABS fold

Fig. 6. Overview of structural innovations in spider-venom ICK peptides. Schematic overview of the mechanisms by which an ancestral ICK or DDH toxin wasduplicated, conjugated, and elaborated upon to form the diversity of ICK scaffolds found in extant mygalomorph spider venoms. Gray numbered circlesrepresent cysteine residues, with core disulfide bonds that form the cystine knot or DDH motif indicated by solid red lines. Additional noncore disulfides arehighlighted in orange, with dashed lines indicating interdomain disulfide bonds. Blue arrows and green rectangles denote β-strands and α-helices, re-spectively. N and C termini are labeled, and PDB accession codes for representative structures are given below each schematic peptide fold.

Pineda et al. PNAS | May 26, 2020 | vol. 117 | no. 21 | 11405

BIOCH

EMISTR

Y

Dow

nloa

ded

by g

uest

on

Sep

tem

ber

8, 2

021

Page 8: Structural venomics reveals evolution of a complex venom ... · planet, with >100,000 extant species (2). A key contributor to their evolutionary success is the use of venom to capture

cysteine-rich fold known, being found in diverse taxa, includingarachnids, fungi, insects, molluscs, plants, sea anemones, sponges,and even viruses (21, 48). ICK peptides also appear to constitutethe most abundant fold in the “dark proteome” (49), suggestingthat we still have much to learn about their structural andfunctional plasticity.In summary, our structural venomics approach revealed the

pathway for evolution of complex venoms in Australian funnel-web spiders. Multiple duplications of an ancestral ICK gene werefollowed by periods of diversification, generation of structuralvariants with additional disulfide bonds, and selection of variantsvia adaptive evolution to generate small disulfide-rich toxins.Limited intragenic duplication of an ICK gene produced double-knot toxins, but most venom peptides with mass >6 kDa arosefrom diverse elaborations of a single ICK scaffold, often ac-companied by inclusion of additional disulfide bonds (Fig. 6).Thus, the extraordinary pharmacological diversity of spidervenom, which is one of the most complex chemical arsenals inthe natural world, is conferred by a panel of structurally relatedDRPs that all evolved from an ancestral ICK toxin.

Materials and MethodsSpider and Venom Collection. H. infensa (mudjar nhiling guran) were col-lected on private land from Orchid Beach, K’gari (Fraser Island), QLD, Aus-tralia. Spiders were housed individually at 23 to 25 °C in plastic containers indark cabinets. Adult female spiders were aggravated using forceps untilvenom was expressed on the fang tips, and then the venom was collectedcontinuously by aspiration until no more venom could be obtained. Weassume this corresponds to exhaustion of the venom-gland contents, andhence the venom collected should closely represent the complete venomproteome. Venom was stored at –20 °C. If impurities, such as soil or sand,were detected, the venom was diluted 10-fold with ultrapure water, andcontaminants were removed using a low-protein–binding 0.22-μm Ultrafree-MC centrifugal filter (Millipore).

Venom Profiling. The venom profile of H. infensa was obtained using acombination of RP-HPLC and MS techniques (6). Offline liquid chromatog-raphy (LC)-MALDI-TOF/TOF was utilized to maximize the amount of massinformation extracted. Liquid chromatography-electrospray ionization(LC-ESI) OrbiTrap MS data were acquired as described (50). A conservativeanalysis was used to manually curate both MS datasets. Analysis criteria in-cluded: 1) Only spectra with well-defined peaks were considered as “real”; 2)a signal-to-noise ratio of 15 was set before masses were recorded; 3) masseswere excluded if they met the following criteria relative to another recordedmass: +16 Da and +32 Da (oxidized), –18 Da (dehydrated), –17 Da (deami-dated), +22 Da (sodium adduct); 4) masses <1,000 Da, which were presumedto correspond to small organic compounds such as polyamines or matrixclusters, rather than peptides, were excluded from the analysis. Duplicatedmasses were eliminated from adjacent fractions, and a final list of sortedmasses was generated. Potential dimers and doubly charged species werealso eliminated (±3 to 5 Da). Then 3D contour plots, or “venom landscapes,”were constructed from LC-MALDI-TOF/TOF MS data as described (6).

For proteomic analysis, 5 mg of milked venom pooled from three speci-mens was fractionated on a Shimadzu Prominence HPLC (Kyoto, Japan) usinga Vydac C18 column (300 Å, 5 μm, 4.6 mm × 250 mm). Venom was elutedusing a gradient of 5 to 80% solvent B (90% acetonitrile [ACN], 0.043%trifluoroacetic acid [TFA]) in solvent A (0.05% TFA) over 2 h. Twelve fractionswere manually collected (one every 10 min), and absorbance was monitoredat 214 nm and 280 nm. Fractions were lyophilized and reconstituted in ul-trapure water, and 1/10 were removed for proteomic analyses. Fractionatedproteins and peptides were reduced with dithiothreitol (5 mM in 50 mMammonium bicarbonate, 15% ACN, pH 8), alkylated with iodoacetamide(10 mM in 50 mM ammonium bicarbonate, 15% ACN, pH 8), and digestedovernight with trypsin (30 ng/μL in 15 μL of 50 mM ammonium bicarbonate,15% ACN, pH 8). Upon completion of digestion, formic acid (FA) was addedto a final concentration of 5%, and the digested samples were desaltedusing a C18 ZipTip (Thermo Fisher, Waltham, MA).

Desalted, digested samples were dried in a vacuum centrifuge and dis-solved in 0.5% FA, and 2 μg were analyzed by LC–tandem MS (LC-MS/MS) onan AB Sciex 5600TripleTOF (AB Sciex, Framingham, MA) equipped with aTurbo-V source heated to 550 °C and coupled to a Shimadzu Nexera UHPLC.Samples were fractionated on an Agilent Zorbax stable-bond C18 column(2.1 × 100 mm, 1.8 μm, 300 Å pore size) using a gradient of 1 to 40% solvent

B (90% ACN, 0.1% FA) in 0.1% FA over 45 min, using a flow rate of180 μL/min. MS1 survey scans were acquired at 300 to 1,800 m/z over 250ms, and the 20 most intense ions with a charge of +2 to +5 and an in-tensity of at least 120 counts per second were selected for MS2. The unitmass precursor ion inclusion window was ±0.7 Da, and isotopes within±2 Da were excluded from MS2, which scans were acquired at 80 to1,400 m/z over 100 ms and optimized for high resolution. For proteinidentification, MS/MS spectra were searched against the translated an-notated H. infensa venom-gland transcriptome using ProteinPilot v4.5.1(AB SCIEX, Framingham, MA). Searches were run as thorough identifi-cation searches, specifying tryptic digestion and cysteine alkylation byiodoacetamide. Decoy-based FDRs were estimated by ProteinPilot; forprotein identification, we used a protein confidence cutoff corresponding to alocal FDR of <1%. Spectra were also manually examined to further eliminatefalse positives. MS/MS data were also searched against the venom-gland tran-scriptome using Mascot v2.5.1 (Matrix Science, Boston, MA) using a decoy-basedFDR cutoff of <1%. For this search, we set the instrument to ESI-quadrupoletime-of-flight (QUAD-TOF), specified trypsin as the cleavage enzyme, allowedup to one missed cleavage, and used a peptide tolerance of ±10 parts permillion andMS/MS tolerance of ±0.1 Da, with cysteine carbamidomethylation asa fixed modification. C-terminal amidation and N-terminal pyroglutamate for-mation were allowed as variable modifications. These searches returned 3,012and 2,944 hits for ProteinPilot and Mascot, respectively.

Complementary DNA (cDNA) Library Construction and Sequencing. Three spi-ders were milked via aggravation to deplete their glands of venom.Three days later, they were anesthetized, and their venom glands weredissected and placed in TRIzol reagent (Life Technologies). Total RNA frompooled venom glands was extracted following the standard TRIzol protocol.Messenger RNA (mRNA) enrichment from total RNA was performed using anOligotex direct mRNA mini kit (Qiagen). RNA quality and concentration weremeasured using a Bioanalyzer 2100 pico chip (Agilent Technologies).

A cDNA library was constructed from 100 μg of mRNA using the standardRoche cDNA rapid library preparation and emPCR method. Sequencing wascarried at the Australian Genome Research Facility using a ROCHE GS-FLXsequencer. The raw standard flowgram file (.SFF) was processed using Cangssoftware, and low-quality sequences were discarded (Phred score cutoff of25) (51). De novo assembly was performed using MIRA software v3.2 (29)using the following parameters: -GE:not = 4–project = Hinfensa–job =denovo,est,accurate,454 454_SETTINGS -CL:qc = no -AS:mrpc = 1 -AL:mrs =99,egp = 1. Consensus sequences from contigs and singlets were submittedto Blast2GO to acquire functional annotations (30). In parallel to the func-tional annotation, an in-house algorithm (Toxin|seek) was written to allowprediction of potential toxin ORFs that may represent novel DRPs. Onceannotated and predicted, lists were merged, redundancies were removed,and signal sequences were determined using SignalP v3.1 (52). Putativepropeptide cleavage sites were predicted using a sequence logo analysis (53)of all known spider precursors (SI Appendix, Fig. S2). After identifying allprocessing signals, toxins were classified into superfamilies based on theirsignal sequence and cysteine framework. Signal sequences were considereddifferent if the level of amino acid sequence identity was <65%; the pair-wise comparison of signal sequences in SI Appendix, Fig. S3 shows that, formost signal sequence comparisons, the level of identity is <30%.

To examine the extent of sequence coverage provided by the 454 plat-form, we generated 11 subsets of data that randomly sampled the entire readset in 10% increments. Each randomly sampled dataset was assembled aspreviously described using MIRA v3.2. Each dataset was then submitted toTox|Blast, a module of Tox|Note (31), or Transdecoder v5.5.0. Hits wereisolated from the Tox|Blast output. We used cd-hit2d v4.6 (54) to cluster andcompare results from the Tox|Blast/Transdecoder outputs (settings: -c 0.6 -n4 -d 100) and then recorded the number of superfamilies recovered for eachdata subset (SI Appendix, Fig. S37).

Nomenclature. Toxins were named using the rational nomenclature describedpreviously (55). Spider taxonomy was taken from the World Spider Catalogv19.5 (https://wsc.nmbe.ch).

Peptide Expression and Purification. Genes encoding toxins of interest weresynthesized using codons optimized for E. coli expression and subcloned intoa pLICC_D168 vector by GeneArt (Regensburg, Germany). Plasmids weretransformed into E. coli BL21 (λDE3), and toxin expression and purificationwere carried out using published methods (40) (see SI Appendix, Fig. S5 foran example). Peptides were separated from salts, His6-MBP and His6-TEVprotease using a Shimadzu Prominence HPLC system (Kyoto, Japan). Sepa-ration was performed on a Jupiter C4 reverse-phase HPLC column (250 × 10 mm,

11406 | www.pnas.org/cgi/doi/10.1073/pnas.1914536117 Pineda et al.

Dow

nloa

ded

by g

uest

on

Sep

tem

ber

8, 2

021

Page 9: Structural venomics reveals evolution of a complex venom ... · planet, with >100,000 extant species (2). A key contributor to their evolutionary success is the use of venom to capture

300 Å, 10 μm; Phenomenex), using a flow rate of 5 mL/min and an elutiongradient of 5 to 60% solvent B over 30 min. Absorbance was monitored at214 nm and 280 nm, and fractions were manually collected. Purified pep-tides were lyophilized and reconstituted in water or buffer, and approxi-mate peptide concentration was determined from absorbance at 280 nmmeasured using a NanoDrop spectrophotometer (Thermo Scientific). ESI-MSspectra were acquired using an API 2000 LC-MS/MS triple quadrupole massspectrometer. Mass spectra were obtained using positive ion mode over m/zrange of 400 to 1,900 Da.

Structure Determination. The structures of toxins from SF6, SF22, SF23, andSF26 were determined using heteronuclear NMR. Each sample contained300 μL of 13C/15N-labeled peptide (100 to 300 μM) in 20 mM 2-(N-morpho-lino)ethanesulfonic acid (MES) buffer, 0.02% NaN3, 5% D2O, pH 6. NMR datawere acquired at 25 °C on an Avance II+ 900 MHz spectrometer (BrukerBioSpin) equipped with a cryogenically cooled triple resonance probe. Res-onance assignments were obtained from two-dimensional (2D) 1H-15N het-eronuclear single-quantum coherence (HSQC), 2D 1H-13C HSQC, 3D HNCACB,3D CBCA(CO)NH, 3D HNCO, 3D HBHA(CO)NH, and four-dimensional (4D)HCC(CO)NH–total correlated spectroscopy (41) spectra. The 3D and 4Dspectra were acquired using nonuniform sampling and were reconstructedusing the maximum entropy algorithm in the Rowland NMR Toolkit.13C-aliphatic, 13C-aromatic and 15N-edited nuclear Overhauser effect spec-troscopy (NOESY)-HSQC spectra were acquired using uniform sampling forextraction of interproton distance restraints.

Resonance assignments and integration of NOESY peaks were achievedusing SPARKY 3 (56), followed by automatic peak list assignment, extractionof distance restraints, and structure calculation using the torsion angle dy-namics package CYANA 3.0 (57). Dihedral-angle restraints derived fromTALOS chemical shift analysis (58) were also integrated in the calculation,with the restraint range set to twice the estimated SD. CYANA was used tocalculate 200 structures from random starting conformations, and then thebest 20 structures were selected based on their stereochemical quality asjudged using MolProbity (59). Structural statistics and Protein Data Bank(PDB) accession codes are summarized in SI Appendix, Table S2.

Homology Modeling. Superfamilies identified to have a structural counterpartin the PDB (SI Appendix, Table S5) were modeled using automodel param-eters in Modeler v9.12 (60). A total of 100 to 500 models were generated foreach structure. The lowest energy models were selected and visualized usingPyMOL v.2.0 (Schrödinger, LLC).

Phylogenetic Analyses. Bayesian inference (BI) of phylogeny and maximumlikelihood (ML) reconstruction were used to examine the evolution of ICKtoxins. To search for nonvenom outgroups, we examined datasets from onespecies of whip scorpion (M. giganteus [Uropygi]; SRR1145698), a non-venomous sister group to spiders (61), and nonvenom gland transcriptomesfrom six species of taxonomically diverse spiders: leg muscle from H. infensa(PRJEB35676), Macrothele calpeiana (SRR6994009), L. malayanus (SRR1145736),and Neoscona arabesca (SRR1145741); hypodermal tissue from Cupiennius salei(SRR880446); and book-lung tissue from H. hainanum (SRR2155568). Fastq fileswere downloaded, trimmed using trimmomatic v0.39 (62) with window quality30, window size 4, minimum read length 60 bp, and then assembled using de-fault settings in Trinity v2.3 (63). The resulting assemblies were combined, andpredicted coding DNA sequences (CDSs) were extracted using a combination oftools (31, 64). Accession numbers for all sequences are given in SI Appendix,File 2. The resulting sequences were filtered based on the presence of a signal

peptide predicted using SignalP v4.1 (52) and aligned with MAFFT v7.304busing a local Smith–Waterman alignment method with affine gap cost (L-INS-Ioption) (65), and then the alignment was adjusted manually to correct mis-aligned structurally equivalent cysteine residues before realigning the inter-cysteine loops using the MAFFT regional realignment script. The resultingalignment was used to estimate phylogenetic relationships via ML re-construction using IQ-TREE v.6.12 (66). The TESTNEW command was used toallow IQ-Tree to determine the best fitting model according to Bayesian in-formation criterion analysis, and node support was tested by ultrafast boot-strap approximation (UFBoot) (67) using 10,000 iterations.

We also estimated phylogenetic relationships via BI using MrBayes v3.2.2(68), with standard settings, except that we specified the same amino acidsubstitution model identified by IQ-TREE (69, 70). The Markov chain MonteCarlo algorithm was run for 10,000,000 generations with sampling fre-quency of 500. Two or four runs were performed simultaneously with chaintemperatures of 0.05, 0.1, 0.2, or 0.5. However, none of the calculationsreached convergence as judged by comparison of mean SD of split fre-quencies (SD > 0.5). A comparison of phylogenies obtained by ML and BIanalysis of a reduced dataset that did converge (SI Appendix, Fig. S40)showed good agreement between the two approaches and supported theML phylogeny based on the full sequence alignment (SI File 3). For the BItree summary, the log-likelihood score of each saved tree was plottedagainst the number of generations to establish the point at which the loglikelihood scores reached asymptote. Posterior probabilities for clades wereestablished by constructing a majority-rule consensus tree for all treesgenerated after burn-in. Trees were visualized and exported using Archae-opteryx v0.99 (www.phylosoft.org/archaeopteryx) and FigTree v1.3.1 (http://tree.bio.ed.ac.uk/software/figtree/).

Blowfly Toxicity Assay. Recombinant toxins were dissolved in water and in-jected into the ventrolateral thoracic region of sheep blowflies (Luciliacuprina; mass 23.9 to 36.5 mg) as previously described (42). The extent oflethality and paralysis was scored at 0.5, 1, and 24 h postinjection.

Data Deposition. Metadata and annotated nucleotide sequences were de-posited in the European Nucleotide Archive under project accessionsPRJEB6062 (ERA298588) and PRJEB35693. Atomic coordinates were depositedin the Protein Data Bank under accession codes 2N6N, 2N6R, 6BA3, and 2N8Kwhile the corresponding NMR chemical shifts were deposited inBioMagResBank under accession codes BMRB25774, BMRB25778, BMRB25853,and BMRB30352.MS datawe deposited to the ProteomeXchange Consortium(71) via the PRIDE (72) partner repository with the dataset identifierPXD016886.

ACKNOWLEDGMENTS. We thank David Wilson for collection of H. infensa;Philip Lawrence for curating MS data; Julie Klint, Niraj Bende, and Jessie Erfor recombinant peptide production; Lars Ellgaard for alerting us to themini-granulin fold; Geoff Brown (Department of Agriculture and Fisheries,Queensland, Australia) for supply of blowflies; and Michael Nuhn and Lien Li(European Molecular Biology Laboratory-Australia Bioinformatics Resource)for help with data submission to European Nucleotide Archive. This researchwas supported by Australian National Health & Medical Research CouncilPrincipal Research Fellowship APP1136889 (to G.F.K.), Australian ResearchCouncil Discovery Early Career Researcher Award Fellowship DE160101142(to E.A.B.U.), Research Council of Norway Funding Scheme for IndependentProjects–Young Research Talents Fellowship 287462, and a University ofQueensland International Postgraduate Scholarship (to S.S.P.).

1. J. Lozano-Fernandez et al., A molecular palaeobiological exploration of ar-thropod terrestrialization. Philos. Trans. R. Soc. Lond. B Biol. Sci. 371, 20150133(2016).

2. C. Mora, D. P. Tittensor, S. Adl, A. G. Simpson, B. Worm, How many species are thereon Earth and in the ocean? PLoS Biol. 9, e1001127 (2011).

3. F. C. Schroeder et al., NMR-spectroscopic screening of spider venom reveals sulfatednucleosides as major components for the brown recluse and related species. Proc.Natl. Acad. Sci. U.S.A. 105, 14283–14287 (2008).

4. L. Kuhn-Nentwig, R. Stocklin, W. Nentwig, Venom composition and strategies inspiders: Is everything possible? Adv. In Insect Phys. 40, 1–86 (2011).

5. G. F. King, M. C. Hardy, Spider-venom peptides: Structure, pharmacology, and po-tential for control of insect pests. Annu. Rev. Entomol. 58, 475–496 (2013).

6. P. Escoubas, B. Sollod, G. F. King, Venom landscapes: Mining the complexity of spidervenoms via a combined cDNA and mass spectrometric approach. Toxicon 47, 650–663(2006).

7. J. J. Smith et al., ““Therapeutic applications of spider-venom peptides”” in Venoms toDrugs: Venom as a Source for the Development of Human Therapeutics, G. F. King,Ed. (The Royal Society of Chemistry, 2015), pp. 221–244.

8. C.-H. Yuan et al., Discovery of a distinct superfamily of Kunitz-type toxin (KTT) fromtarantulas. PLoS One 3, e3414 (2008).

9. R. A. V. Morales et al., Chemical synthesis and structure of the prokineticin Bv8.ChemBioChem 11, 1882–1888 (2010).

10. X. Wang et al., Discovery and characterization of a family of insecticidal neurotoxinswith a rare vicinal disulfide bridge. Nat. Struct. Biol. 7, 505–513 (2000).

11. P. K. Pallaghy, K. J. Nielsen, D. J. Craik, R. S. Norton, A common structural motif in-corporating a cystine knot and a triple-stranded β-sheet in toxic and inhibitorypolypeptides. Protein Sci. 3, 1833–1839 (1994).

12. G. F. King, H. W. Tedford, F. Maggio, Structure and function of insecticidal neuro-toxins from australian funnel-web spiders. Toxin Rev. 21, 361–389 (2002).

13. G. Postic, J. Gracy, C. Périn, L. Chiche, J. C. Gelly, KNOTTIN: The database of inhibitorcystine knot scaffold after 10 years, toward a systematic structure modeling. NucleicAcids Res. 46, D454–D458 (2018).

14. H. Kolmar, Natural and engineered cystine knot miniproteins for diagnostic andtherapeutic applications. Curr. Pharm. Des. 17, 4329–4336 (2011).

15. V. Herzig, G. F. King, The cystine knot is responsible for the exceptional stability of theinsecticidal spider toxin ω-hexatoxin-Hv1a. Toxins (Basel) 7, 4366–4380 (2015).

Pineda et al. PNAS | May 26, 2020 | vol. 117 | no. 21 | 11407

BIOCH

EMISTR

Y

Dow

nloa

ded

by g

uest

on

Sep

tem

ber

8, 2

021

Page 10: Structural venomics reveals evolution of a complex venom ... · planet, with >100,000 extant species (2). A key contributor to their evolutionary success is the use of venom to capture

16. G. F. King, Venoms as a platform for human drugs: Translating toxins into thera-peutics. Expert Opin. Biol. Ther. 11, 1469–1484 (2011).

17. J. I. Fletcher et al., The structure of a novel insecticidal neurotoxin, ω-atracotoxin-HV1,from the venom of an Australian funnel web spider. Nat. Struct. Biol. 4, 559–566(1997).

18. N. S. Bende et al., The insecticidal neurotoxin Aps III is an atypical knottin peptide thatpotently blocks insect voltage-gated sodium channels. Biochem. Pharmacol. 85,1542–1554 (2013).

19. C. J. Bohlen et al., A bivalent tarantula toxin activates the capsaicin receptor, TRPV1,by targeting the outer pore domain. Cell 141, 834–845 (2010).

20. I. R. Chassagnon et al., Potent neuroprotection after stroke afforded by a double-knot spider-venom peptide that inhibits acid-sensing ion channel 1a. Proc. Natl. Acad.Sci. U.S.A. 114, 3750–3755 (2017).

21. E. A. B. Undheim, M. Mobli, G. F. King, Toxin structures as evolutionary tools: Usingconserved 3D folds to study the evolution of rapidly evolving peptides. BioEssays 38,539–548 (2016).

22. R. C. Rodríguez de la Vega, A note on the evolution of spider toxins containing theICK-motif. Toxin Rev. 24, 383–395 (2005).

23. P. Escoubas, L. Quinton, G. M. Nicholson, Venomics: Unravelling the complexity ofanimal venoms with mass spectrometry. J. Mass Spectrom. 43, 279–295 (2008).

24. S. Dutertre et al., Deep venomics reveals the mechanism for expanded peptide di-versity in cone snail venom. Mol. Cell. Proteomics 12, 312–329 (2013).

25. A. Palagi et al., Unravelling the complex venom landscapes of lethal Australianfunnel-web spiders (Hexathelidae: Atracinae) using LC-MALDI-TOF mass spectrome-try. J. Proteomics 80, 292–310 (2013).

26. P. Escoubas, L. Rash, Tarantulas: Eight-legged pharmacists and combinatorial chem-ists. Toxicon 43, 555–574 (2004).

27. P. A. Zobel-Thropp, S. M. Correa, J. E. Garb, G. J. Binford, Spit and venom fromscytodes spiders: A diverse and distinct cocktail. J. Proteome Res. 13, 817–835 (2014).

28. S. S. Pineda et al., Diversification of a single ancestral gene into a successful toxinsuperfamily in highly venomous Australian funnel-web spiders. BMC Genomics 15,177 (2014).

29. B. Chevreux et al., Using the miraEST assembler for reliable and automated mRNAtranscript assembly and SNP detection in sequenced ESTs. Genome Res. 14, 1147–1159(2004).

30. A. Conesa et al., Blast2GO: A universal tool for annotation, visualization and analysisin functional genomics research. Bioinformatics 21, 3674–3676 (2005).

31. S. S. Pineda et al., ArachnoServer 3.0: An online resource for automated discovery,analysis and annotation of spider toxins. Bioinformatics 34, 1074–1076 (2018).

32. B. Li, C. N. Dewey, RSEM: Accurate transcript quantification from RNA-seq data withor without a reference genome. BMC Bioinformatics 12, 323 (2011).

33. B. G. Fry et al., The toxicogenomic multiverse: Convergent recruitment of proteinsinto animal venoms. Annu. Rev. Genomics Hum. Genet. 10, 483–511 (2009).

34. O. Biner et al., Isolation, N-glycosylations and function of a hyaluronidase-like enzymefrom the venom of the spider Cupiennius salei. PLoS One 10, e0143963 (2015).

35. R. M. Kini, Excitement ahead: Structure, function and mechanism of snake venomphospholipase A2 enzymes. Toxicon 42, 827–840 (2003).

36. D. C. de Graaf et al., Insights into the venom composition of the ectoparasitoid waspNasonia vitripennis from bioinformatic and proteomic studies. Insect Mol. Biol. 19(suppl. 1), 11–26 (2010).

37. B. Vincent et al., The venom composition of the parasitic wasp Chelonus inanitusresolved by combined expressed sequence tags analysis and proteomic approach.BMC Genomics 11, 693 (2010).

38. S. Wen et al., Discovery of an MIT-like atracotoxin family: Spider venom peptides thatshare sequence homology but not pharmacological properties with AVIT familyproteins. Peptides 26, 2412–2426 (2005).

39. P. I. Silva Jr., S. Daffre, P. Bulet, Isolation and characterization of gomesin, an18-residue cysteine-rich defense peptide from the spider Acanthoscurria gomesianahemocytes with sequence similarities to horseshoe crab antimicrobial peptides of thetachyplesin family. J. Biol. Chem. 275, 33464–33470 (2000).

40. J. K. Klint et al., Production of recombinant disulfide-rich venom peptides for struc-tural and functional analysis via expression in the periplasm of E. coli. PLoS One 8,e63865 (2013).

41. M. Mobli, A. S. Stern, W. Bermel, G. F. King, J. C. Hoch, A non-uniformly sampled 4DHCC(CO)NH-TOCSY experiment processed using maximum entropy for rapid proteinsidechain assignment. J. Magn. Reson. 204, 160–164 (2010).

42. N. S. Bende et al., The insecticidal spider toxin SFI1 is a knottin peptide that blocks thepore of insect voltage-gated sodium channels via a large β-hairpin loop. FEBS J. 282,904–920 (2015).

43. A. H. Jin et al., Conotoxin Φ-MiXXVIIA from the superfamily G2 employs a novelcysteine framework that mimics granulin and displays anti-apoptotic activity. Angew.Chem. Int. Ed. Engl. 56, 14973–14976 (2017).

44. L. D. Nielsen et al., The three-dimensional structure of an H-superfamily conotoxinreveals a granulin fold arising from a common ICK cysteine framework. J. Biol. Chem.294, 8745–8759 (2019).

45. N. S. Bende et al., A distinct sodium channel voltage-sensor locus determines insectselectivity of the spider toxin Dc1a. Nat. Commun. 5, 4350 (2014).

46. J. J. Smith, E. A. B. Undheim, True lies: Using proteomics to assess the accuracy oftranscriptome-based venomics in centipedes uncovers false positives and revealsstartling intraspecific variation in Scolopendra subspinipes. Toxins (Basel) 10, 96(2018).

47. J. J. Smith et al., Unique scorpion toxin with a putative ancestral fold provides insightinto evolution of the inhibitor cystine knot motif. Proc. Natl. Acad. Sci. U.S.A. 108,10478–10483 (2011).

48. B. Madio, E. A. B. Undheim, G. F. King, Revisiting venom of the sea anemone Sti-chodactyla haddoni: Omics techniques reveal the complete toxin arsenal of a well-studied sea anemone genus. J. Proteomics 166, 83–92 (2017).

49. N. Perdigão et al., Unexpected features of the dark proteome. Proc. Natl. Acad. Sci.U.S.A. 112, 15898–15903 (2015).

50. C. Dauly, P. Escoubas, G. F. King, G. M. Nicholson, “Characterization of spider venompeptides by high-resolution LC-MS/MS analysis” (Application Note: 511, ThermoFisher Scientific, 2011).

51. R. V. Pandey, V. Nolte, C. Schlötterer, CANGS: A user-friendly utility for processing andanalyzing 454 GS-FLX data in biodiversity studies. BMC Res. Notes 3, 3 (2010).

52. T. N. Petersen, S. Brunak, G. von Heijne, H. Nielsen, SignalP 4.0: Discriminating signalpeptides from transmembrane regions. Nat. Methods 8, 785–786 (2011).

53. T. D. Schneider, R. M. Stephens, Sequence logos: A new way to display consensussequences. Nucleic Acids Res. 18, 6097–6100 (1990).

54. L. Fu, B. Niu, Z. Zhu, S. Wu, W. Li, CD-HIT: Accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012).

55. G. F. King, M. C. Gentz, P. Escoubas, G. M. Nicholson, A rational nomenclature fornaming peptide toxins from spiders and other venomous animals. Toxicon 52,264–276 (2008).

56. T. D. Goddard, D. G. Kneller, SPARKY (Version 3.11.5, University of California, SanFrancisco, 2001).

57. P. Güntert, Automated NMR structure calculation with CYANA. Methods Mol. Biol.278, 353–378 (2004).

58. Y. Shen, F. Delaglio, G. Cornilescu, A. Bax, TALOS+: A hybrid method for predictingprotein backbone torsion angles from NMR chemical shifts. J. Biomol. NMR 44,213–223 (2009).

59. V. B. Chen et al., MolProbity: All-atom structure validation for macromolecular crys-tallography. Acta Crystallogr. D Biol. Crystallogr. 66, 12–21 (2010).

60. A. Sali, T. L. Blundell, Comparative protein modelling by satisfaction of spatial re-straints. J. Mol. Biol. 234, 779–815 (1993).

61. P. P. Sharma et al., Phylogenomic interrogation of arachnida reveals systemic conflictsin phylogenetic signal. Mol. Biol. Evol. 31, 2963–2984 (2014).

62. A. M. Bolger, M. Lohse, B. Usadel, Trimmomatic: A flexible trimmer for Illumina se-quence data. Bioinformatics 30, 2114–2120 (2014).

63. M. G. Grabherr et al., Full-length transcriptome assembly from RNA-Seq data withouta reference genome. Nat. Biotechnol. 29, 644–652 (2011).

64. E. Afgan et al., Genomics virtual laboratory: A practical bioinformatics workbench forthe cloud. PLoS One 10, e0140829 (2015).

65. K. Katoh, D. M. Standley, MAFFT multiple sequence alignment software version 7:Improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).

66. L. T. Nguyen, H. A. Schmidt, A. von Haeseler, B. Q. Minh, IQ-TREE: A fast and effectivestochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol.32, 268–274 (2015).

67. B. Q. Minh, M. A. T. Nguyen, A. von Haeseler, Ultrafast approximation for phyloge-netic bootstrap. Mol. Biol. Evol. 30, 1188–1195 (2013).

68. F. Ronquist, J. P. Huelsenbeck, MrBayes 3: Bayesian phylogenetic inference undermixed models. Bioinformatics 19, 1572–1574 (2003).

69. T. Müller, M. Vingron, Modeling amino acid replacement. J. Comput. Biol. 7, 761–776(2000).

70. J. Soubrier et al., The influence of rate heterogeneity among sites on the time de-pendence of molecular rates. Mol. Biol. Evol. 29, 3345–3358 (2012).

71. E. W. Deutsch et al., The ProteomeXchange consortium in 2017: Supporting the cul-tural change in proteomics public data deposition. Nucleic Acids Res. 45,D1100–D1106 (2017).

72. Y. Perez-Riverol et al., The PRIDE database and related tools and resources in 2019:Improving support for quantification data. Nucleic Acids Res. 47, D442–D450 (2019).

11408 | www.pnas.org/cgi/doi/10.1073/pnas.1914536117 Pineda et al.

Dow

nloa

ded

by g

uest

on

Sep

tem

ber

8, 2

021