SUPPLEMENTARY MATERIAL
Immunity-related genes and gene families in Anopheles gambiae: A comparative
genomic analysis
George K. Christophides,1* Evgeny Zdobnov,1* Carolina Barillas-Mury,2 Ewan Birney,3
Stephanie Blandin,1 Claudia Blass,1 Paul T. Brey,4 Frank H. Collins,5 Alberto Danielli,1
George Dimopoulos,6 Charles Hetru,7 Ngo T. Hoa,8 Jules A. Hoffmann,7 Stefan M.
Kanzok,8 Ivica Letunic,1 Elena Levashina,1 Thanasis G. Loukeris,9 Gareth Lycett,1
Stephan Meister,1 Kristin Michel,1 Luis F. Moita,1 Hans-Michael Mueller,1 Mike A. Osta,1
Susan M. Paskewitz,10 Jean-Marc Reichhart,7 Andrey Rzhetsky,11 Laurent Troxler,7
Kenneth D. Vernick,12 Dina Vlachou,1 Jennifer Volz,1 Christian von Mering,1 Jiannong
Xu,12 Liangbiao Zheng,8 Peer Bork,1 Fotis C. Kafatos1#
1European Molecular Biology Laboratory, Meyerhofstr. 1, D-69117 Heidelberg, Germany. 2Colorado State
University, Department of Microbiology, Immunology and Pathology (MIP), Fort Collins, CO 80523-1682,
USA. 3European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10
1SD, UK. 4Unité de Biochimie et Biologie Moléculaire des Insectes, Institut Pasteur, 25 rue du Dr. Roux
75724 Paris Cedex 15 France. 5Center for Tropical Disease Research and Training, University of Notre
Dame, P.O. Box 369, Notre Dame, IN 46556-0369, USA. 6Department of Biological Sciences, Centre for
Molecular Microbiology & Infection, Imperial College of Science, Technology and Medicine, London
SW7 2AZ, UK. 7Institut de Biologie Moléculaire et Cellulaire, Unité Propre de Recherche, 9022 du Centre
National de la Recherche Scientifique, 15 rue Descartes, F67084 Strasbourg Cedex France. 8Yale
University School of Medicine, Epidemiology and Public Health, 60 College Street, New Haven, CT 06520
USA. 9IMBB-FORTH, Vassilika Vouton, P.O.Box 1527, GR-711 10 Heraklion, Crete, Greece. 10Department of Entomology, 237 Russell Lab, 1630 Linden Drive, University of Wisconsin, Madison,
Wisconsin 53706, USA. 11Columbia Genome Center and Department of Medical Informatics, Columbia
University, Russ Berrie Medical Science Pavilion, 1150 St. Nicholas Avenue, New York, NY 10032 USA. 12Department of Medical and Molecular Parasitology, New York University School of Medicine, 341 East
25th Street, Room 613, New York, NY 10010, USA.
*Contributed equally to the work
#To whom correspondence should be addressed. Email: [email protected]
1. Methods
Identification of immunity protein families. We employed a range of sequence analysis
procedures combined with careful manual analysis to derive the immunity-related
proteins. First, the predicted proteomes of A. gambiae (1) and D. melanogaster (2) were
compared by means of Smith-Waterman pairwise alignments of all against all proteins.
Second, sequences with significant similarity were clustered together based on pairwise
scores by a single linkage clustering algorithm. Knowing the pitfalls of this method, we
also experimented with the newly introduced MCL clustering algorithm (3). Using a single
linkage cut-off at an e-value of 10⁻²⁰ and an MCL inflation value of 3, both methods produced a
similar number of clusters. However, MCL assigned more proteins to clusters, including some
that the single linkage algorithm left as singletons. These data were combined with all
identified InterPro signatures of known protein domains and families using the
InterProScan (4) package. In general, the combination of InterPro and single linkage
clustering yielded the most relevant results, as judged by further manual analysis by
experts. In some cases, protein families were further screened for possible missed gene
predictions by scanning Anopheles and Drosophila genomic sequences (release 3) with
characteristic HMMs (5), trained on manually verified multiple alignments of the known
family members, using GeneWise (6) and HMMer software (S. Eddy,
http://hmmer.wustl.edu/).
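Single linkage clustering at a fixed e-value cut-off is equivalent to finding connected components in the graph whose edges are all pairwise hits at or below the cut-off. The sketch below illustrates that step with a union-find structure; the protein identifiers and hit list are hypothetical, the all-against-all Smith-Waterman search and the MCL alternative are not reproduced, and this is not the authors' actual pipeline.

```python
from typing import Dict, Iterable, List, Tuple

E_VALUE_CUTOFF = 1e-20  # single linkage cut-off quoted in the Methods


def single_linkage_clusters(
    proteins: Iterable[str],
    hits: Iterable[Tuple[str, str, float]],
    cutoff: float = E_VALUE_CUTOFF,
) -> List[List[str]]:
    """Group proteins joined by any chain of hits with e-value <= cutoff.

    Uses a union-find (disjoint-set) structure: each significant pairwise
    hit merges the two clusters it touches, which is single linkage on a
    thresholded similarity graph.
    """
    parent: Dict[str, str] = {p: p for p in proteins}

    def find(x: str) -> str:
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, evalue in hits:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        if evalue <= cutoff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b  # one significant hit suffices to merge

    clusters: Dict[str, List[str]] = {}
    for p in list(parent):
        clusters.setdefault(find(p), []).append(p)
    # Deterministic order; proteins never merged come out as singletons.
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    proteins = ["Ag1", "Ag2", "Ag3", "Dm1", "Dm2"]
    hits = [("Ag1", "Dm1", 1e-40), ("Dm1", "Ag2", 1e-25), ("Dm2", "Ag3", 1e-5)]
    print(single_linkage_clusters(proteins, hits))
    # [['Ag1', 'Ag2', 'Dm1'], ['Ag3'], ['Dm2']]
```

The transitive merge (Ag1 and Ag2 share a cluster only through Dm1) is exactly the chaining behaviour that makes single linkage prone to over-merging, one of the pitfalls that motivated the comparison with MCL.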
Phylogenetic analysis. Full-length (or, where appropriate, partial) predicted sequences were
aligned using Clustal X, and cladograms were constructed by neighbour-joining
analysis and displayed with Treeview. Detailed nucleotide comparison and
cytogenetic mapping were performed with reference to the Ensembl genome interface
(www.ensembl.org/Anopheles_gambiae) and Flybase (http://flybase.bio.indiana.edu/).
Genes were only considered as 1:1 orthologs if the relevant bootstrap values were above
800 (1000 iterations).
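To make the tree-building step concrete, the sketch below implements the textbook neighbour-joining algorithm of Saitou and Nei (join the pair minimising the Q-criterion, then recompute distances to the new node) together with the orthology rule stated above (bootstrap support above 800 of 1000 replicates). It is an illustration under those assumptions only: Clustal X's own implementation and its bootstrap resampling are not reproduced, and the five-taxon distance matrix in the usage note is a standard teaching example, not data from this study.

```python
from typing import Dict, List, Tuple

Join = Tuple[str, str, float, float, str]  # (child_i, child_j, len_i, len_j, new_node)


def neighbor_joining(
    labels: List[str], dist: List[List[float]]
) -> Tuple[List[Join], Tuple[str, str, float]]:
    """Neighbour-joining on a symmetric distance matrix with zero diagonal.

    Returns the successive joins and the final edge between the last two nodes.
    """
    n0 = len(labels)
    d: Dict[Tuple[str, str], float] = {
        (labels[i], labels[j]): float(dist[i][j]) for i in range(n0) for j in range(n0)
    }
    nodes = list(labels)
    joins: List[Join] = []
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence r_i: sum of distances from i to all current nodes.
        r = {i: sum(d[(i, k)] for k in nodes) for i in nodes}
        # Q-criterion: join the pair minimising (n - 2) d(i, j) - r_i - r_j.
        best = None
        for a in range(n):
            for b in range(a + 1, n):
                i, j = nodes[a], nodes[b]
                q = (n - 2) * d[(i, j)] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to their new parent u.
        len_i = 0.5 * d[(i, j)] + (r[i] - r[j]) / (2 * (n - 2))
        len_j = d[(i, j)] - len_i
        u = f"node{len(joins) + 1}"
        for k in nodes:
            if k not in (i, j):
                d[(u, k)] = d[(k, u)] = 0.5 * (d[(i, k)] + d[(j, k)] - d[(i, j)])
        d[(u, u)] = 0.0
        nodes = [k for k in nodes if k not in (i, j)] + [u]
        joins.append((i, j, len_i, len_j, u))
    last_a, last_b = nodes
    return joins, (last_a, last_b, d[(last_a, last_b)])


def is_supported_ortholog(bootstrap_count: int, threshold: int = 800) -> bool:
    """Orthology rule from the Methods: support above 800 of 1000 replicates."""
    return bootstrap_count > threshold
```

With taxa a to e and distances a-b 5, a-c 9, a-d 9, a-e 8, b-c 10, b-d 10, b-e 9, c-d 8, c-e 7, d-e 3, the first join is (a, b) with branch lengths 2 and 3. Note that a node supported in exactly 800 replicates would not qualify, since the text requires values above 800.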
EST and oligonucleotide microarray analysis. Mosquito EST microarray construction,
hybridization and analysis were performed as described (7). Developmental profiling was
performed using embryonic, fourth-instar larval, pupal and newly emerged adult female
stages of A. gambiae, Suakoko strain. A pool of total RNA prepared from all stages was
used as the reference sample. For immune challenges, 2- to 3-day-old female mosquitoes of
the Plasmodium-susceptible strain 4a r/r (8) were pricked with a needle that was either sterile or
dipped in a thick suspension of E. coli or S. aureus. Total RNA was collected 12 hrs after
challenge. For malaria infection experiments, 4a r/r mosquitoes were fed on control
Balb/c mice or on mice infected with P. berghei, and mosquito RNA samples collected at
24 hrs, 28 hrs, 6 days, 11 days and 16 days post-infection.
Oligonucleotide primers were designed to amplify individual genes from a cDNA
library or adult genomic DNA (average probe length 500 bp). PCR products were purified
with ion exchange columns (Macherey-Nagel GmbH & Co. KG, Dueren, Germany) and
spotted at 500 ng/µl in 3X SSC. Anti-sense oligonucleotides (60- to 70-mers) were
designed by EUROGENTEC (Seraing, Belgium) and resuspended in 3X SSC at 50 µM
prior to spotting. A 5′ amino modification of the oligonucleotides proved
unnecessary. Spotting was performed on aminosilane coated glass slides using the
Omnigrid arrayer (GeneMachines, San Carlos, CA) and Telechem Stealth Pins
(Telechem International, Sunnyvale, CA). Cell line 4a3B (9) was challenged with
paraformaldehyde-fixed E. coli and S. aureus (OD 0.05), PGN (10 µg/ml) and H₂O₂
(2 µM). Duplicate RNA samples were collected 12 hrs after challenge and hybridized to
arrays as described (7). RNA prepared from naïve cells was used as the reference.
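In a common-reference design such as this one, each spot compares a sample channel against the pooled or naïve reference, and expression is usually summarised as a log2 ratio averaged over replicate hybridizations. The sketch below shows only that generic summary step; the normalisation and statistics actually used are those of reference (7) and are not reproduced, and the probe names and intensities are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List, Tuple


def mean_log2_ratios(
    spots: Dict[str, List[Tuple[float, float]]]
) -> Dict[str, float]:
    """Mean log2(sample / reference) per probe over replicate measurements.

    `spots` maps a probe name to (sample, reference) intensity pairs that are
    assumed to be background-subtracted and normalised already. Pairs with a
    non-positive intensity are skipped because their log ratio is undefined.
    """
    result: Dict[str, float] = {}
    for probe, pairs in spots.items():
        ratios = [math.log2(s / r) for s, r in pairs if s > 0 and r > 0]
        if ratios:
            result[probe] = mean(ratios)
    return result


if __name__ == "__main__":
    spots = {
        "probe_A": [(400.0, 100.0), (800.0, 200.0)],  # duplicate hybridizations
        "probe_B": [(50.0, 100.0)],
        "probe_C": [(0.0, 120.0)],  # unusable spot, dropped
    }
    print(mean_log2_ratios(spots))  # {'probe_A': 2.0, 'probe_B': -1.0}
```

Working in log2 space makes induction and repression symmetric: a fourfold increase over the reference gives +2 and a twofold decrease gives -1.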
Figure legend for phylogenetic trees. The following color scheme was used in all the
trees unless noted otherwise: red, A. gambiae; blue, D. melanogaster; green, vertebrates;
black, other invertebrates and common stems. Pink and blue shadings indicate putative
gene family expansions in the mosquito and the fruitfly, respectively. The physical location
(chromosomal subdivision) of genes or gene clusters is given. 1:1 orthologs and orthologous
groups are highlighted with filled and open circles, respectively. Aclu, Acalolepta luxuriosa;
Aeae, Aedes aegypti; Aeal, Aedes albopictus; Ag, Anopheles
gambiae; Aecy, Aeshna cyanea; Aldi, Allomyrina dichotoma; Anau, Androctonus
australis hector; Anpe, Antheraea pernyi; Anst, Anopheles stephensi; Apme, Apis
mellifera; Arsu, Armigeres subalbatus; Bomo, Bombyx mori; Bopa, Bombus pascuorum;
Bota, Bos taurus; Caet, Calpodes ethlius; Ce, Caenorhabditis elegans; Ceca, Ceratitis
capitata; Chpl, Chironomus plumosus; Crgi, Crassostrea gigas; Dm, D. melanogaster;
Dv, Drosophila virilis; Foru, Formica rufa; Game, Galleria mellonella; Hevi, Heliothis
virescens; Hs, Homo sapiens; Hyce, Hyalophora cecropia; Hycu, Hyphantria cunea;
Lequ, Leiurus quinquestriatus; Maja, Marsupenaeus japonicus; Mase, Manduca sexta;
Mumu, Mus musculus; Pale, Pacifastacus leniusculus; Myed, Mytilus edulis; Papr,
Palomena prasina; Pemo, Penaeus monodon; Pihy, Pimpla hypochondriaca; Poma,
Podisus maculiventris; Prte, Protophormia terraenovae; Pyap, Pyrrhocoris apterus;
Sabu, Sarcophaga bullata; Sape, Sarcophaga peregrina; Spfr, Spodoptera frugiperda;
Stca, Stomoxys calcitrans; Susc, Sus scrofa; Temo, Tenebrio molitor; Tefl, Tetraodon
fluviatilis; Trni, Trichoplusia ni.
2. Supporting online text
Nomenclature
This study identified a substantial number of genes, approximately 2% of the total
predicted in the current annotation of the Anopheles genome, belonging to more than
18 protein families. To facilitate future work on this large gene set, we named the genes
systematically according to provisional nomenclature rules modeled on those
recommended by the Human Genome Organization (HUGO) for naming human genes.
Following consultation within the Anopheles genomics community, and to avoid unsystematic
and duplicate names, we recommend this as a provisional nomenclature system for the
entire A. gambiae genome, to be supervised by an international committee that is being
set up. The rules are as follows:
1. The names are mnemonic symbols, designed for easy recall. They do not aim to
summarize all current information (orthology, function, chromosomal location), which
in any case is incomplete and subject to errors.
2. To avoid errors in electronic communication all names consist exclusively of
capital letters of the Latin alphabet and Arabic numerals; no punctuation marks,
dashes etc. are used.
3. To minimize length, the formal names do not include taxonomic initials. If
similarly named genes of two organisms are being compared, taxonomic initials
can be added for convenience but do not constitute part of the name (e.g. aTEP, to
be easily distinguished from dTep).
4. Roman (upright) letters and numerals indicate the protein; italics indicate the gene or RNA.
5. The name is based on sequence similarities and carries no functional implications,
which must be determined experimentally.
6. The name consists of two to three contiguous fields, as follows:
- The first field includes three to five letters and is an abbreviation of the
highest sequence grouping used, usually a protein family, e.g. CLIP (for
Clip-domain serine protease).
- The second field, if present, includes one or more letters identifying a
subgroup such as subfamily (e.g. CLIPD), or class (e.g. SCRB).
- The third field enumerates each gene by using consecutive numerals (e.g.
SCRB1,… 12).
- Sometimes the third field numeral can be preceded by letter(s) indicating
gene types within a subgroup (e.g. SCRBQ1, for a gene belonging to the
SCRB class and to the croquemort type).
- For historical reasons, in certain families, the third field can also
enumerate by letters rather than numerals (e.g. PGRPLA, for gene A of the
Long subfamily in the PGRP family).
7. It is recommended that names previously used in the literature or in database
submissions be gradually replaced by systematic names, following consultation
with the original author (we have done so for genes previously described by the
authors of the present study). Historical names or names that may be developed
eventually to indicate experimentally verified function or orthology can be used as
synonyms.
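Rules 2, 3 and 6 are mechanical enough to check in code. The sketch below validates the character set, strips informal taxonomic initials and splits a symbol into its family field, the letters that follow it, and the trailing enumeration. It is a hypothetical helper, not an official tool of the proposed committee; the family list is assembled from symbols used in this supplement, and because the rules let subgroup and type letters abut, the sketch does not try to separate them without a registry of subgroups.

```python
import re
import string
from typing import Optional, Tuple

# Hypothetical registry of first-field (family) symbols, taken from this text.
FAMILIES = ("CLIP", "CTL", "GALE", "GNBP", "PGRP", "SCR", "SRPN", "TEP", "TOLL")

VALID_SYMBOL = re.compile(r"[A-Z0-9]+")  # rule 2: capitals and Arabic numerals only
REMAINDER = re.compile(r"([A-Z]*)([0-9]*[A-Z]*)")  # letters, then enumeration


def strip_taxon_prefix(name: str) -> str:
    """Rule 3: drop informal lowercase taxonomic initials, e.g. 'aTEP1' -> 'TEP1'."""
    return name.lstrip(string.ascii_lowercase)


def parse_symbol(name: str) -> Optional[Tuple[str, str, str]]:
    """Split a symbol into (family, subgroup/type letters, enumeration).

    Returns None if the symbol breaks rule 2 or has no known family prefix.
    'SCRBQ1' gives ('SCR', 'BQ', '1'); 'PGRPLA' gives ('PGRP', 'LA', '').
    """
    symbol = strip_taxon_prefix(name)
    if not VALID_SYMBOL.fullmatch(symbol):
        return None
    # Try longer family symbols first so a short one never shadows a long one.
    for family in sorted(FAMILIES, key=len, reverse=True):
        if symbol.startswith(family):
            match = REMAINDER.fullmatch(symbol[len(family):])
            if match:
                return family, match.group(1), match.group(2)
    return None


if __name__ == "__main__":
    for name in ("SCRBQ1", "aTEP1", "CLIPB14", "TOLL1A", "CLIP-B1"):
        print(name, parse_symbol(name))
```

A name such as "CLIP-B1" is rejected because rule 2 forbids punctuation and dashes; symbols like TOLL1A, whose enumeration mixes a numeral and a letter, are kept whole in the last field.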
Recognition of infectious non-self
Peptidoglycan Recognition Proteins (PGRPs).
Members of this family have one or more PGRP domains and are important components
of insect immune reactions. The first PGRP was characterized in B. mori as a hemolymph
protein that binds peptidoglycan (PGN) and activates the prophenoloxidase (PPO) cascade (10).
In Drosophila, secreted PGRP-SA is essential for activating the Toll pathway-mediated
response to Gram-positive (Gram+) bacteria, but not to fungi (11), while two
PGRP-LC isoforms act via an alternative immune signaling pathway (Imd) that responds
to Gram-negative (Gram-) bacteria to induce certain antimicrobial peptides (12, 13). A
genome-wide RNAi screen of Drosophila cells in culture (14) points to PGRP-LC as a key
player in phagocytosis of Gram- but not Gram+ bacteria.
Of the 13 Drosophila genes that encode PGRP domains (15), seven are classified
as short (S) and encode secreted proteins, while six genes of the long (L) subfamily
encode transmembrane or intracellular products. The Anopheles genome includes three
members of the short subfamily (S1, S2 and S3) and four of the long subfamily (LA, LB,
LC and LD). The latter are clear orthologs of correspondingly named Drosophila genes.
However, unlike the Drosophila PGRP-LB protein, which is thought to be intracellular,
Anopheles PGRPLB has a putative transmembrane domain; this gene is strongly
upregulated in cells challenged with immune elicitors (7).
Thioester-containing proteins (TEPs).
In C. elegans, the only TEP (cTEP) identified in the genome displays the conserved
thioester (TE) motif. It is mostly expressed in epithelial cells throughout worm
development and is not induced by immune stimuli.
All six Drosophila Tep proteins (dTep) except dTepV have clear signal peptides,
indicating that they are secreted proteins. dTepI, II and IV are immune-inducible, whereas
dTepIII is expressed only during early developmental stages. dTepII is unique among the
TEP genes as it encodes five alternatively spliced forms. In contrast, dTepV may not be
an active gene, as it has never been amplified from cDNA libraries. In all, Drosophila can
produce nine or ten distinct TEPs (16, 17).
Sequence comparison of all cTEP, dTeps and aTEPs identifies only a single 1:1
orthologous pair, dTepVI and aTEP13 (Fig. 3), suggesting that these proteins (which lack the TE
motif) might serve highly similar functions in the two insect species. In addition, dTepIII
forms an orthologous group (OG) with aTEP2 and aTEP15; all three proteins have a TE
motif. Three TEP sequences, aTEP12, aTEP14 and cTEP, are highly diverged, forming
deep branches in the tree. Finally, two sequence clusters represent species-specific
expansions of the family, one including exclusively four dTeps and the other ten aTEPs.
The Drosophila-specific expansion includes three proteins with and one without a TE
motif, while the Anopheles-specific expansion includes two with and six without a TE
motif (the status of the two other proteins is uncertain, or they are only partially represented
in the sequence). The aTEP1 protein (originally designated aTEP-I) is bacterially induced and
promotes phagocytosis of bacteria (18); aTEP4 (originally designated IMCR14) is
strongly upregulated by Plasmodium (19); aTEP3 (lacking a TE motif) is upregulated
upon bacterial challenge. We speculate that the species-specific family expansions
represent a finely tuned radiation of TEPs in response to distinct pathogenic environments
in the fruitfly and the mosquito, and that in insect TEPs, more often than in vertebrate
TEPs, protein-specific function does not require the TE motif.
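Classifying TEPs as having or lacking the thioester motif, as in the counts above, amounts to a motif scan over each predicted protein. The sketch below assumes the canonical thioester motif G-C-G-E-Q known from complement C3 and alpha2-macroglobulin; the text does not state the exact pattern the authors used, and the toy sequences are hypothetical. A truncated prediction that misses the motif region cannot be scored reliably, which is why some proteins above remain uncertain.

```python
import re
from typing import Dict, List, Tuple

# Assumption: canonical thioester motif GCGEQ, as in complement C3 / alpha2-macroglobulin.
TE_MOTIF = re.compile(r"GCGEQ")


def te_motif_cys_positions(seq: str) -> List[int]:
    """1-based positions of the thioester Cys in each GCGEQ occurrence."""
    return [m.start() + 2 for m in TE_MOTIF.finditer(seq.upper())]


def tally_te_motifs(proteins: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split proteins into those with and those without a TE motif."""
    with_motif = sorted(n for n, s in proteins.items() if te_motif_cys_positions(s))
    without_motif = sorted(n for n, s in proteins.items() if not te_motif_cys_positions(s))
    return with_motif, without_motif


if __name__ == "__main__":
    toy = {"toy_1": "MKLGCGEQNM", "toy_2": "MKLGCAEQNM"}  # hypothetical fragments
    print(te_motif_cys_positions(toy["toy_1"]))  # [5]
    print(tally_te_motifs(toy))  # (['toy_1'], ['toy_2'])
```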
Gram Negative Binding Proteins (GNBPs).
Proteins of this family show homology to the catalytic region of bacterial β-1,3- and
β-1,3-1,4-glucanases (20, 21). Drosophila GNBP-1 exists in both soluble and
GPI-anchored forms and plays an important role in innate immune signaling in response to
bacterial lipopolysaccharides (22). In addition to the silkworm (Bombyx mori) and fruitfly
homologs, one A. gambiae GNBP was previously known. Characterized moth and mosquito genes are
upregulated by immune challenge, whereas the fruitfly genes are constitutively expressed
at specific developmental stages.
The A. gambiae genome includes six GNBP genes which, together with known
moth and fruitfly homologs, reveal two distinct sequence groups (Fig. S1A). Subfamily
A includes all known fruitfly and moth sequences as well as two mosquito sequences
(GNBPA1, 2). Anopheles GNBPA2 and Drosophila GNBP3 are orthologs. A new
subfamily B is mosquito-specific (GNBPB1,2,3,4), and three of its four members are
tightly clustered on chromosomal subdivision 13E (Fig. S1B). Interestingly, the mosquito
genes differ widely in intron-exon structure, showing two to five introns at non-conserved
locations (Fig. S1C).
Scavenger Receptors (SCRs).
Members of this diverse family of multidomain, transmembrane or secreted receptors
play important roles in innate immunity and development. They recognize modified LDL,
multiple polyanionic ligands and cell wall components, and thus help internalize bacteria
and clear apoptotic cells (23). We considered three major classes, named A, B and C
(Fig. S2A).
Proteins of the A class (SCRA) are associated with macrophages and bear
collagenous and coiled-coil domains that bind polyanionic ligands and mediate receptor
trimerization, respectively. Some members of this class contain a Scavenger
Receptor Cysteine-Rich (SRCR) domain which, in a human protein (MARCO), binds
both Gram+ and Gram- bacteria (24). Five SRCR-containing proteins and 4 orthologous
pairs exist both in the fruitfly and in the mosquito (Fig. S2B). The Drosophila protein
Tequila/GRAAL and its Anopheles ortholog SCRASP1 (formerly Sp22D, enriched in
hemocytes (25)) additionally bear multiple chitin-binding domains (CBDs) and a
C-terminal domain related to coagulation and inflammatory serine proteases (Fig. S2A).
SCRASP2 is similar but lacks CBDs. SCRAC bears a partial C-type lectin domain, and
the fourth orthologous pair, SCRAL, matches the lysyl oxidase (lys_ox) domain of
human Lox proteins (copper-containing amine oxidases that convert primary amines to
reactive aldehydes) (26).
The numerous members of the B class (SCRB) represent the CD36 family of
receptors, associated with the uptake of multiple ligands and erythrocytes infected with P.
falciparum (27). A total of 15 Anopheles and 12 Drosophila genes belong to this family,
including 8 orthologous pairs (Fig. S2C). One sequence cluster (SCRBQ) includes five
Drosophila and four Anopheles members but only a single 1:1 ortholog; one member is
Croquemort, a macrophage receptor that mediates binding and phagocytosis of apoptotic
corpses (28). Another sequence cluster consists almost exclusively of 1:1 orthologs and
includes the fruitfly epithelial membrane protein, EMP (29).
The third class, SCRC, includes four Drosophila members, each with two
complement-control protein (CCP) domains followed by a MAM domain (Meprin A5
antigen and RPTP Mu), and usually a somatomedin-B-like (BO) domain (Fig. S2A).
Three members have been described previously as dSR-CI, -CII and -CIII (30) and are
thought to function as pattern recognition receptors (PRRs) in phagocytosis and innate
immunity; CCP together with MAM bind bacteria in vitro (31). The macrophage-specific
dSR-CI recognizes a broad range of polyanionic ligands, much like the mammalian SCRA homologs. The single
mosquito member of this class resembles dSR-CI and dSR-CII but, surprisingly, bears
two transmembrane domains, at the NH2 and COOH termini (Fig. S2A), according to the
current annotation.
C-Type Lectins (CTL).
These extracellular proteins, which are membrane-bound or soluble, are named for their
Ca²⁺ dependence and have a ca. 130-residue carbohydrate recognition domain (CRD)
with 18 highly conserved residues, including 4 cysteines (32). Some insect CTLs show
affinity for LPS, increase in abundance after body wall injury or are stage-specific,
suggesting roles in both immunity and development (33). In general, seven groups (I-VII)
have been defined by sequence similarity. In both Drosophila (24/34 members) and
Anopheles (17/22), group VII predominates; it bears a single CRD without flanking
accessory domains. Eleven orthologous pairs exist; those bearing a QPD tripeptide motif
are expected to bind galactose preferentially (CTLGA subfamily), while a pair showing
an EPN motif (CTLMA subfamily) should bind mannose (34). Several species-specific
family expansions exist. The largest one is in Drosophila and includes 12 genes, all of
which are limited to two chromosomal regions. The largest mosquito expansion has
generated five additional CTLMA members, four of which are clustered within 12 kb at
25D (Fig. S3).
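The galactose/mannose assignment above rests on a single tripeptide in the CRD (QPD versus EPN). The sketch below encodes that rule. Because the motif is only meaningful at its structural position in the CRD, the function optionally takes the motif's start index (for example from an alignment to a reference CRD, a hypothetical input here); without it, the whole sequence is scanned, which can produce false positives. The sequences are toy examples.

```python
from typing import Optional

# Tripeptide -> expected preferred sugar, as stated in the text.
SUGAR_MOTIFS = {"EPN": "mannose", "QPD": "galactose"}


def predicted_sugar(crd: str, site: Optional[int] = None) -> str:
    """Predict preferred sugar from the EPN/QPD tripeptide of a CTL CRD.

    If `site` (0-based start of the tripeptide) is given, only that window is
    inspected; otherwise the whole CRD is scanned.
    """
    seq = crd.upper()
    window = seq if site is None else seq[site:site + 3]
    hits = {sugar for motif, sugar in SUGAR_MOTIFS.items() if motif in window}
    if len(hits) == 1:
        return hits.pop()
    return "ambiguous" if hits else "unknown"


if __name__ == "__main__":
    print(predicted_sugar("AAEPNWG"))          # mannose
    print(predicted_sugar("AAQPDWG", site=2))  # galactose
    print(predicted_sugar("AAEPNWG", site=0))  # unknown
```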
Three 1:1 orthologs are complex lectins possessing additional recognizable
domains. One of these orthologous pairs includes SCRAC1 (see above). The two others show
resemblance to vertebrate selectins, integral transmembrane proteins involved in cell
adhesion (35), although they lack an EGF domain present in all other selectins. Each of
these CTLSEs contains 10 Sushi repeats, adhesive units of 50-70 residues that are found
in many proteins participating in the complement immune responses of mammals (36).
The Drosophila gene furrowed encodes one of these CTLSE members, CG1500.
Galectins (GALEs).
These thiol-dependent lectins are widely distributed in metazoans, including sponges, and in
multicellular fungi. A conserved 130-residue core forms the family-specific globular
CRD, which can interact with β-galactosides. Sequence variation within the core is
significant (20-40% in mammals), and together with a multiplicity of family members
suggests involvement in diverse functions. Galectins function in both development and
immunity: in vertebrates they are involved in cell fate determination, cell proliferation,
apoptosis and innate immunity (37, 38). The Drosophila galectin, Dmgal, contains two
tandem CRDs.
The A. gambiae and D. melanogaster genomes encode eight and five galectins,
respectively. A group of five galectins, GALE4-8, represents a species-specific expansion
in the mosquito (Fig. S4), and these galectins have a single CRD architecture. Two
sequence clusters represent the double-CRD Dmgal architecture; they include GALE1
and GALE2 and their respective orthologs, Dmgal and CG5335, plus an additional
Drosophila member in each cluster. Finally, GALE3 and its ortholog, CG18565, bear
single CRDs and dysferlin domains of unknown function.
Signal Modulation and Amplification
Clip domain serine proteases (CLIPs).
Seventy-six CLIPs with complete clip domains were analyzed: 41 in A. gambiae and 35
in D. melanogaster. Most are structurally similar, beginning with a short signal peptide
followed by the clip domain, then a linker region of highly variable length and the serine
protease domain. A few contain additional domains upstream of the clip domain or
downstream of the protease domain. Seven CLIPs contain two clip domains, 3 in A.
gambiae and 4 in D. melanogaster (see Table S1).
CLIPs can be grouped into 4 subfamilies, A-D. Subfamily A contains 21 members
(ten in Anopheles), but only two 1:1 orthologs. Most members have substitutions at one
or more residues of the critical His/Asp/Ser triad within the catalytic domain. Most members also
have an unusual arrangement within the conserved serine motif: GDGGSP, instead of
GDSGGP. Vertebrate serine proteases usually have eight cysteines within the catalytic
domain, and six of these are shared by invertebrate serine proteases (39, 40). A striking
feature of subfamily A is the appearance of the missing pair of cysteines within the
catalytic domain (C13/C17). Subfamily A is also characterized by an unusually short
spacing between C1 and C2 of the clip domain. This subfamily contains several CLIPs that
have recently been identified as upregulated after bacterial infection in A. gambiae
(CLIPA1, 6, 7).
Subfamily B has 27 members in Anopheles and Drosophila, including Easter and
several A. gambiae CLIPs (CLIPB1, 2, 4, 8, 9, 10, 14, 15) previously identified (41-43).
This subfamily includes three 1:1 orthologs and one OG. CLIPB1, 4 and 9 show modest
levels of upregulation following bacterial or malaria parasite infections in Anopheles.
Nearly all members of this subfamily can be identified by the presence of another pair of
cysteines within a short insertion in a region between the His and Asp residues of the
catalytic triad (C10/C11). They also tend to have clear activation sites at the beginning of
the catalytic domain of the form RXXGG, suggesting that proteases with specificity for
cleaving after Arg will be necessary to activate most members of this subfamily. The
prophenoloxidase activating enzymes of Holotrichia, Bombyx and Manduca all belong to
subfamily B.
Subfamily C has twelve members, including the previously identified Drosophila
proteins Persephone and Snake, and the Anopheles CLIPC3 and C4 (41). The subfamily
contains two OGs but no 1:1 orthologs. Several members contain a Cys residue at
position 13 after the active-site Ser residue. This subfamily is also characterized by an
activation site at which cleavage is expected to occur after a His or Leu residue.
Subfamily D has 15 members, including three 1:1 orthologs and two OGs. A known member
is the Drosophila protein Stubble, which is important in cytoskeleton organization and
biogenesis (44, 45). Members of this group share the motif RIVGG at the activation site.
In total, we identified eight putative 1:1 orthologs and five putative OGs between
the two fly species, as well as several specific expansions in each species (Fig. 4A). An
analysis of the chromosomal location of all the proteases by subfamily also indicates
additional substantial clustering (data not shown). Finally, four Drosophila and three
Anopheles CLIPs bear two clip domains.
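Several of the subfamily descriptions above reduce to short sequence patterns: the canonical GDSGGP serine motif versus the GDGGSP variant typical of subfamily A, an RXXGG activation site near the start of the catalytic domain (subfamily B; subfamily D's RIVGG is a special case), and cysteine content. The sketch below scans a segment for these features. It assumes the input starts just before the activation cleavage site, and the 10-residue search window is a hypothetical choice, not a parameter from this study; real subfamily assignment relied on the phylogenetic analysis, not on motifs alone.

```python
import re
from typing import Dict, Optional, Union

# Serine-motif variants quoted in the text.
SERINE_MOTIFS = ("GDSGGP", "GDGGSP")  # canonical vs. the unusual subfamily A form
ACTIVATION_SITE = re.compile(r"R..GG")  # RXXGG; subfamily D's RIVGG also matches


def clip_protease_features(
    segment: str, window: int = 10
) -> Dict[str, Union[bool, int, Optional[str]]]:
    """Scan a protease-domain segment for motifs used to describe CLIP subfamilies.

    `segment` is assumed to begin just before the activation cleavage site;
    `window` limits where the RXXGG activation site is searched for.
    """
    seq = segment.upper()
    serine = next((m for m in SERINE_MOTIFS if m in seq), None)
    return {
        "activation_RXXGG": ACTIVATION_SITE.search(seq[:window]) is not None,
        "serine_motif": serine,
        "cysteines": seq.count("C"),
    }


if __name__ == "__main__":
    print(clip_protease_features("RIVGGAACDGDSGGPC"))
    # {'activation_RXXGG': True, 'serine_motif': 'GDSGGP', 'cysteines': 2}
```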
Serpins (SRPNs).
These are well-conserved proteins, 350-400 residues long. Inhibitory serpins act as
suicide substrates, mostly for serine and more rarely for cysteine proteases. Following a
variable, more or less structured N-terminal region, the compact serpin core fold consists
of three β-sheets, 7-9 α-helices, a hinge and a C-terminal flexible Reactive Center Loop
(RCL) (46). The RCL acts as bait for the target protease, which cleaves a scissile P1-P1′
peptide bond in the RCL. The sequence of the hinge region determines whether the
cleaved RCL can insert efficiently into β-sheet A, dramatically distorting and
repositioning the bound protease (inhibitory serpins), or not (non-inhibitory serpins) (47).
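The P1-P1′ notation follows the standard Schechter and Berger convention: residues are numbered P1, P2, P3, ... moving N-terminally from the scissile bond and P1′, P2′, P3′, ... moving C-terminally. The sketch below labels an RCL segment given the index of its P1 residue. The sequence and P1 position are made up for illustration and do not correspond to any serpin discussed here.

```python
from typing import Dict, Tuple


def schechter_berger_labels(rcl: str, p1_index: int, span: int = 4) -> Dict[str, str]:
    """Label residues around the scissile bond, which lies between P1 and P1'."""
    labels: Dict[str, str] = {}
    for k in range(1, span + 1):
        left = p1_index - (k - 1)  # P1, P2, ... run N-terminally from the bond
        right = p1_index + k  # P1', P2', ... run C-terminally
        if 0 <= left < len(rcl):
            labels[f"P{k}"] = rcl[left]
        if 0 <= right < len(rcl):
            labels[f"P{k}'"] = rcl[right]
    return labels


def scissile_bond(rcl: str, p1_index: int) -> Tuple[str, str]:
    """The P1-P1' residue pair whose peptide bond the target protease cleaves."""
    return rcl[p1_index], rcl[p1_index + 1]


if __name__ == "__main__":
    rcl = "GAVLKSTPQ"  # hypothetical RCL fragment, P1 = K at index 4
    print(scissile_bond(rcl, 4))  # ('K', 'S')
    print(schechter_berger_labels(rcl, 4))
```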
The majority of the serpin genes in Drosophila and Anopheles are physically
clustered, at 4 chromosomal locations in each species. Eight fruit fly serpins appear to
represent a species-specific expansion (Fig. 4B).
Signal transduction pathways
Tolls and the Toll Pathway.
The Drosophila Toll (which we suggest be renamed Toll-1 for consistency) was
originally identified as a key player in dorsoventral patterning during
embryogenesis, and was later shown to be also required for immune induction of
antifungal and anti-Gram+ bacterial responses (48, 11). This discovery and the
subsequent characterization of Toll-like receptors (TLR) in mammals have made the Toll
family and its signaling pathway paradigms of innate immune regulation (49). Toll
family genes encode single-pass transmembrane proteins with leucine-rich repeats (LRRs)
interspersed with cysteine knots in the rapidly evolving N-terminal extracellular domain,
and a C-terminal intracellular Toll/interleukin-1 receptor (TIR) domain. Drosophila
encodes 8 additional family members, Toll-2 to -9 (50, 51). All members show
developmentally specific expression patterns (52). While the dual function of Drosophila
Toll-1 is clear, the functions of Toll-2 to -9 are currently emerging. Toll-5 and -9 have
been associated with the antifungal response (50, 51, 53) and Toll-2 is required for
general antimicrobial gene expression in larvae (54).
Two fruitfly genes, Toll-1 and -5, form a probable OG together with four mosquito
genes, TOLL1A, 1B, 5A and 5B (Fig. 4C); only the mosquito genes have an additional
intron within the TIR-coding region. Interestingly, Drosophila Toll-1 and -5 are closely
related in their TIR domains, lack this intron and cluster in the phylogenetic tree apart
from the mosquito genes, suggesting that divergent evolution may have occurred in
parallel within each taxonomic lineage. However, a long C-terminal extension of similar
sequence with a nearly identical 18-residue segment identifies Toll-1 and TOLL1A as
putative orthologs. A related protein, TOLL1B, has a shorter and more divergent
extension. Two others, TOLL5A and 5B, have the shortest tails, a feature they share
with Toll-5. Although Drosophila Toll-1 and -5 are unlinked, the mosquito genes are
physically clustered, as determined by in situ hybridization: TOLL1A with 5A at
cytological location 6C, and TOLL1B with 5B at 39B. We suggest that type 1 and type 5
Tolls were fixed ancestrally and that the pair was duplicated in the mosquito lineage,
giving rise to the TOLL1A/5A and TOLL1B/5B gene clusters, and that selection maintains
both the sequence similarities and the contrasting features (length of the C-terminal
tail) of type 1 and type 5 Tolls.
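Putative orthology here is inferred from sequence similarity together with shared features such as the C-terminal extension. A widely used first-pass heuristic for proposing one-to-one orthologs is the reciprocal best hit (RBH): two genes are paired only if each is the other's highest-scoring match. The Python sketch below illustrates that heuristic; it is not necessarily the procedure used in this study, and the gene names and scores are hypothetical placeholders.

```python
from typing import Dict, Set, Tuple

# scores[query][subject] = similarity score (higher is better). The values used
# below are hypothetical placeholders, not data from this study.
Scores = Dict[str, Dict[str, float]]


def best_hit(scores: Scores, query: str) -> str:
    """Subject with the highest score for `query` (ties go to the first listed)."""
    hits = scores[query]
    return max(hits, key=hits.get)


def invert(scores: Scores) -> Scores:
    """Re-index species-A-vs-B scores as B-vs-A, assuming symmetric scores."""
    inverted: Scores = {}
    for query, hits in scores.items():
        for subject, value in hits.items():
            inverted.setdefault(subject, {})[query] = value
    return inverted


def reciprocal_best_hits(fly_vs_mosquito: Scores) -> Set[Tuple[str, str]]:
    """Pairs (fly gene, mosquito gene) that are each other's best hit."""
    mosquito_vs_fly = invert(fly_vs_mosquito)
    pairs: Set[Tuple[str, str]] = set()
    for fly_gene in fly_vs_mosquito:
        mosquito_gene = best_hit(fly_vs_mosquito, fly_gene)
        if best_hit(mosquito_vs_fly, mosquito_gene) == fly_gene:
            pairs.add((fly_gene, mosquito_gene))
    return pairs


# Hypothetical toy input: two fly genes, three mosquito genes.
toy = {
    "flyA": {"mosqA1": 0.9, "mosqA2": 0.7, "mosqB1": 0.4},
    "flyB": {"mosqA1": 0.6, "mosqA2": 0.5, "mosqB1": 0.8},
}
print(sorted(reciprocal_best_hits(toy)))
```

With these toy scores the function returns the pairs (flyA, mosqA1) and (flyB, mosqB1). mosqA2 scores well against flyA but is left unpaired, the pattern one would expect for a lineage-specific duplicate; this is one reason RBH alone under-reports one-to-many relationships of the kind discussed in this section.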
In Drosophila, the end points of the Toll pathway are the Rel transcription factors
Dorsal and Dif. The Anopheles ortholog of Dorsal, Gambif-1, was identified previously
(55). Examination of the genome sequence identified a new 3′-end exon for Gambif-1,
which may be alternatively spliced as in D. melanogaster (where a BigDorsal cDNA has
been characterized). However, no mosquito gene encoding an ortholog of Dif has been
found.
Imd and STAT pathway.
In Drosophila, Gram- bacterial infections signal through the alternative Imd pathway,
leading to nuclear translocation of the Rel transcription factor Relish and subsequent
expression of antibacterial AMPs such as cecropin (56). An Anopheles ortholog of Relish
has been detected (Fig. S5). PGRP-LC has recently been shown to function as a receptor
for the Imd pathway in Drosophila (57). An exhaustive analysis of additional signaling
pathways is beyond the scope of the present study, but the Imd pathway appears to be
operative in the mosquito as well.
Besides Rel factors, STATs are important nuclear mediators of immunity. In
Drosophila, only one STAT gene is known. In Anopheles, STAT1 is encoded by an
intronless gene, and the protein undergoes nuclear translocation after immune challenge
(58). A second gene, STAT2, was identified (Fig. S6); it has six introns, three of them at
the same positions as in DmSTAT. The two mosquito STAT genes are more closely related
to each other (46% identity) than either is to DmSTAT, indicating that they duplicated
after the mosquito and Drosophila lineages separated. Because STAT1 lacks introns, we
cannot exclude the possibility that it arose by retrotransposon-mediated gene duplication (59).
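A percent-identity figure such as 46% depends on how identity is counted. As an illustration only, the sketch below computes percent identity from a pairwise alignment under one common convention (identical columns divided by ungapped columns); the convention actually used for the STAT comparison is not stated here, and the sequences are hypothetical toy fragments.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent identity between two sequences from the same pairwise alignment.

    Convention assumed here: identical columns divided by columns in which
    neither sequence has a gap. Dividing by the full alignment length, or by
    the length of the shorter sequence, gives values smaller or equal to this.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("alignment has no ungapped columns")
    return 100.0 * identical / compared


# Hypothetical toy alignment (not STAT sequences).
print(round(percent_identity("MKV-LLA", "MRVGLLS"), 1))
```

For the toy alignment, six columns are ungapped and four of them are identical, giving 66.7%.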
Immune Effectors
Prophenoloxidases (PPOs).
PPOs evolved from hemocyanin oxygen transporters and, like them, contain two
copper-binding sites, each with three essential histidines at conserved positions. The three
newly described members of the PPO family, AgPPO7, 8 and 9, contain the usual conserved
features, except that PPO9 lacks the two sites (RF and RE) where proteolytic activation
normally occurs; alternative tryptic target sites may be used instead.
Seven of the mosquito PPO genes are clustered in tandem orientation within ca. 65 kb at
21B (2L), while PPO2 is located at 24B (2L). The apparently primitive PPO1 gene lies on a
different chromosome arm (2R, subdivision 13B). The three Drosophila genes are not
physically clustered (Table S1).
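Cytological locations such as 21B or 23A-25D combine a numbered division with a lettered subdivision, and each division lies on a single chromosome arm. The sketch below maps bands to arms. The division ranges per arm are an assumption taken from the standard A. gambiae polytene map rather than from this text, although they agree with the assignments given above (21B and 24B on 2L, 13B on 2R); the helper names are hypothetical.

```python
import re
from typing import List, Tuple

# Division ranges per chromosome arm of the A. gambiae polytene map. These are
# an assumption taken from the standard cytogenetic map, not from this text;
# they agree with the arm assignments quoted above (21B and 24B on 2L, 13B on 2R).
ARM_DIVISIONS: List[Tuple[str, int, int]] = [
    ("X", 1, 6),
    ("2R", 7, 19),
    ("2L", 20, 28),
    ("3R", 29, 37),
    ("3L", 38, 46),
]

_BAND = re.compile(r"(\d{1,2})([A-Z])")


def parse_band(band: str) -> Tuple[int, str]:
    """Split a cytological band such as '21B' into (21, 'B')."""
    match = _BAND.fullmatch(band.strip())
    if match is None:
        raise ValueError(f"unrecognized band: {band!r}")
    return int(match.group(1)), match.group(2)


def arm_of(band: str) -> str:
    """Chromosome arm containing a band, under the assumed map above."""
    division, _ = parse_band(band)
    for arm, first, last in ARM_DIVISIONS:
        if first <= division <= last:
            return arm
    raise ValueError(f"division {division} is outside the assumed map")


def arms_of_location(location: str) -> List[str]:
    """Arms spanned by a location such as '23A-25D' (a single band also works)."""
    return sorted({arm_of(band) for band in location.split("-")})


print(arm_of("21B"), arm_of("24B"), arm_of("13B"))  # 2L 2L 2R
print(arms_of_location("23A-25D"))                  # ['2L']
```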
Antimicrobial peptides (AMPs).
Insect AMPs can be assigned to three major classes: peptides with intramolecular
cysteine bonds (e.g. defensins); linear peptides forming amphipathic, hydrophobic
α-helices (e.g. cecropins); and peptides in which particular amino acids are
over-represented. In infected Drosophila, at least seven distinct families of AMPs are produced (60).
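The amphipathicity of a helix such as that of a cecropin is often quantified with Eisenberg's hydrophobic moment: each residue's hydrophobicity h_n is treated as a vector rotated a further 100 degrees around the helix axis than the previous residue, and the length of the summed vector, |sum over n of h_n exp(i n 100°)|, is reported. This is a standard technique rather than one used in this study; the sketch below swaps in the Kyte-Doolittle scale for Eisenberg's consensus scale, and the peptides are hypothetical.

```python
import math
from typing import Dict

# Kyte-Doolittle hydropathy values. Eisenberg's hydrophobic moment was defined
# with a different (consensus) scale; using Kyte-Doolittle is an assumption.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(seq: str, degrees_per_residue: float = 100.0,
                       scale: Dict[str, float] = KYTE_DOOLITTLE) -> float:
    """Length of the vector sum of residue hydrophobicities around a helix.

    Residue n points in direction n * degrees_per_residue (100 degrees for an
    ideal alpha-helix). Hydrophobic residues clustered on one face add up to a
    large moment (amphipathic helix); evenly spread residues cancel out.
    """
    step = math.radians(degrees_per_residue)
    x = y = 0.0
    for n, residue in enumerate(seq.upper()):
        h = scale[residue]  # raises KeyError for non-standard residue codes
        x += h * math.cos(n * step)
        y += h * math.sin(n * step)
    return math.hypot(x, y)


# Hypothetical toy peptides, not real cecropins.
print(hydrophobic_moment("LKKLLKLLKKLLKKL"))
print(hydrophobic_moment("L" * 18))  # 18 x 100 degrees = 5 full turns, so ~0
```

The homopolymer check is a useful sanity test: 18 residues span exactly five turns, so identical vectors cancel and the moment is zero up to rounding.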
Insect defensins (DEFs) are preferentially active against Gram+ bacteria. More than
48 defensins have been reported from a wide range of insect orders, and even from
scorpions and mollusks (61). They are typically 34-46 residues long and are synthesized
as precursors with propeptides. The four Anopheles defensin genes are unlinked and very
diverse (Fig. S7). The most common types of defensin can now be described in terms of
three clades. Clade I may be specific to Diptera; it includes all the Aedes members,
Drosophila defensin and the previously described Anopheles defensin (62), which we
designate DEF1. Clade II includes hymenopteran defensins but also those of the fly
Stomoxys. Clade III includes both hemipteran and dipteran (Chironomus) defensins.
Clade IV includes highly divergent defensins: three new Anopheles members (DEF2, DEF3
and DEF4), the dipteran Sapecin C and the lepidopteran Heliomicin. Finally, Clade V
comprises the ancient defensins known from a mussel and the dragonfly Aeschna.
Cecropins (CECs) are especially potent against Gram- bacteria. More than 40 are
now known from various higher insects (Diptera and Lepidoptera), and they are usually
31-39 residues long. Typical features are a tryptophan residue at position 1 or 2 and
post-translational amidation of a C-terminal glycine; both features are thought to increase
peptide stability and efficacy against bacteria. Interestingly, when compared with the
Drosophila Cecropin A, which has both features, the only previously reported Anopheles
cecropin, CEC1, which lacks the tryptophan residue, was found to be active against yeast
and a wider spectrum of Gram+ bacteria. Drosophila has four physically clustered
cecropin genes, and Anopheles also has four (Fig. S8). However, the peptides of the two
species belong to different clades, specific to brachyceran and nematoceran Diptera,
respectively; a third clade comprises the cecropins of Lepidoptera. Interestingly, of the
Anopheles cecropins only CEC1 closely resembles the Aedes cecropins, whereas CEC2, 3
and 4 are highly divergent, suggesting diversified antimicrobial specificities. None of the
Anopheles cecropins has the N-terminal tryptophan.
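The two cecropin hallmarks above are positional, so they can be checked directly on a sequence. The sketch below is a hypothetical helper that flags a tryptophan at position 1 or 2 (1-based, counted from the N-terminus) and a C-terminal glycine, the residue that donates the C-terminal amide. The sequences are toy examples, and a terminal glycine only marks a potential amidation site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CecropinHallmarks:
    trp_at_position_1_or_2: bool  # W among the first two residues (1-based positions 1-2)
    terminal_glycine: bool        # C-terminal G, the usual amide donor


def cecropin_hallmarks(peptide: str) -> CecropinHallmarks:
    """Sequence-level check for the two cecropin features discussed above.

    Assumes `peptide` is the processed peptide before amidation, written from
    N- to C-terminus. Whether amidation actually occurs cannot be confirmed
    from sequence alone.
    """
    seq = peptide.strip().upper()
    if not seq:
        raise ValueError("empty peptide sequence")
    return CecropinHallmarks(
        trp_at_position_1_or_2="W" in seq[:2],
        terminal_glycine=seq.endswith("G"),
    )


# Hypothetical toy sequences, not real cecropins.
print(cecropin_hallmarks("KWKLFKKIEG"))
print(cecropin_hallmarks("RLKKLGKKIEK"))
```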
One other antimicrobial peptide, Gambicin, has been reported from Anopheles
(63). It has 61 residues and is similar to only one other known peptide, from Aedes
(GenBank accession number AAL76025). No paralogue of this gene was found in the
complete Anopheles genome. In summary, at this stage it appears that the Anopheles
AMPs are encoded by members of the two most widespread AMP gene families plus a
mosquito-specific AMP gene.
Caspases.
Caspases are a family of cysteine proteases that cleave their substrates after aspartate
residues. One subfamily includes enzymes that process pro-inflammatory cytokines (e.g.
interleukin-1β converting enzyme, ICE), and the other includes enzymes that regulate cell
death (e.g. the product of the ced-3 gene in C. elegans). Both subfamilies are divided into groups carrying either long (L) or
short (S) prodomains (64). L-prodomain caspases are initiators that respond to pro- and
anti-apoptotic regulators via adaptor molecules carrying homologous interaction domains.
Oligomerized initiator caspases auto-activate by self-cleavage, yielding active
heterotetramers. For example, the Drosophila initiator caspase DRONC and the adaptor
Dark interact via their caspase-recruitment domains (CARDs) to form apoptotically active
complexes. Similarly, the initiator caspase DREDD interacts via its death effector domain
(DED) with the apoptotic adaptor dFADD. These activated apoptotic complexes initiate a
proteolytic cascade of S-prodomain effector caspases, which in turn cleave vital substrates
and thus cause cell death.
Genomic analysis has identified eleven, three and seven caspases in humans, C.
elegans and D. melanogaster, respectively (65). Omitting putative haplotypes, we
identified 12 caspases in A. gambiae (Fig. S9), of which two are L-prodomain initiators and
10 are S-prodomain effectors. Of the three Drosophila L-prodomain caspases, DRONC,
DREDD and DREAM, the first two have Anopheles orthologs. DREAM does not; its nearest
Drosophila paralog, DAMM, is a short-prodomain caspase, as are the most similar
Anopheles caspases, S9 and S10, which are clustered at locus 40B. This OG therefore
presumably originated with a short-prodomain caspase, and DREAM represents a novel
L-prodomain member that evolved in the Drosophila lineage. Two other Drosophila
S-prodomain caspases, DRICE and DCP-1, are grouped with two mosquito caspases, S7
and S8. Finally, the fourth Drosophila effector caspase (DECAY) is associated in an OG
with an expanded group of six Anopheles effector caspases, which are physically
clustered at two chromosomal loci, 21A and 43D. Interestingly, the mosquito initiators
CASPL1 and CASPL2 map to the same loci (Fig. S9B).
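The orthologous groups described above differ in copy number: DECAY groups with six Anopheles effector caspases, whereas DRICE and DCP-1 group with S7 and S8. The sketch below labels a group by its copy-number pattern in each species. The function name and labels are illustrative assumptions, and the six mosquito entries are placeholders because the individual effector caspases are not named in the text.

```python
from typing import Sequence


def og_copy_pattern(fly_genes: Sequence[str], mosquito_genes: Sequence[str]) -> str:
    """Describe an orthologous group by gene copy number in each species.

    Returns '1:1', '1:many', 'many:1' or 'many:many' (fly first), or
    'lineage-specific' when one species contributes no gene.
    """
    def side(count: int) -> str:
        return "1" if count == 1 else "many"

    if not fly_genes or not mosquito_genes:
        return "lineage-specific"
    return f"{side(len(fly_genes))}:{side(len(mosquito_genes))}"


# Groups as described in the text; the six mosquito names are placeholders
# because the individual effector caspases are not named there.
decay_group = (["DECAY"], [f"placeholder_{i}" for i in range(1, 7)])
drice_group = (["DRICE", "DCP-1"], ["S7", "S8"])
print(og_copy_pattern(*decay_group))  # 1:many
print(og_copy_pattern(*drice_group))  # many:many
```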
The negative cellular regulators of caspases, the IAPs, are characterized by a
70-residue domain, the baculoviral IAP repeat (BIR). Most members also carry additional
C-terminal domains, such as a RING finger motif or a ubiquitin-conjugating (UBC) domain.
The BIR domains are involved in binding and inhibiting mature caspases, whereas the
RING and UBC domains may regulate caspase activity through protein degradation
pathways (66). In Drosophila, four IAPs with distinct domain architectures have
documented anti-apoptotic activity. The Anopheles IAPs include clear orthologs of three
Drosophila IAPs (THREAD, DETERIN and BRUCE). IAP2 is the closest homologue of
DIAP2, although its predicted protein sequence contains only one BIR domain. The two
THREAD-related members physically linked to Anopheles IAP1 (at 25D) may represent a
clade lost from the Drosophila lineage. The expansion of both IAPs and effector caspases
in the mosquito relative to Drosophila may reflect co-evolution of apoptotic regulators
that fine-tune cell death and/or immune responses in the mosquito, such as those in
midgut cells invaded by Plasmodium. A sixth Anopheles IAP gene exists but is incomplete
in the genome assembly and cannot be classified with certainty.
The search for mosquito pro-apoptotic genes was hampered by the rapid sequence
diversification of the main players (65). We were unable to identify Anopheles
homologues of the three clustered fruit fly genes (rpr, grim and wrinkled) that are
responsible for most embryonic apoptotic cell death. However, BLAST analysis
identified loci potentially encoding orthologs of FADD, Apaf-1, Acinus, Aif and Scythe,
as well as the two pro-apoptotic Bcl-2 homologues found in Drosophila, dBorg-1 and
dBorg-2.
Figure Legends
Table S1: Anopheles and Drosophila immunity gene list
The table lists A. gambiae immunity-related genes according to the nomenclature
convention proposed here, together with: synonyms (if any); Ensembl gene predictions and
protein predictions; scaffold names; the best estimate of chromosomal location, based on
BAC hybridizations to polytene chromosomes performed in the F. H. Collins and F. C.
Kafatos laboratories (1); Drosophila orthologs (if any); and various comments, such as
likely duplicate haplotypes (1), genome coordinates and accession numbers of protein or
gene features. Gene families are presented in the order in which they are discussed in the
article and supplementary materials.
Figure S1: GNBP family
(A) Unrooted phylogenetic tree of the GNBP family. (B) Exon-intron structure of the
GNBPA and GNBPB genes (scale is indicated). Intron positions differ among all the
mosquito genes, and even between the orthologous GNBPA3 genes of fruit fly and
mosquito (data not shown), consistent with the hypothesis of recent intron invasion into
eukaryotic genes. (C) Schematic representation of the chromosomal arrangement of the
GNBPB genes. GNBPB2, GNBPB3 and GNBPB4 are tightly clustered (B2 and B3 are
separated by 4,441 bp, and B3 and B4 by 5,136 bp). Given the bootstrap values (not shown)
of the phylogenetic tree in (A) and the physical clustering, the four GNBPB genes most
likely arose by gene duplication.
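The tight GNBPB cluster is defined by short intergenic distances. The sketch below groups genes on one scaffold into physical clusters by chaining neighbours whose gap is at most a chosen threshold. Only the two gaps (4,441 and 5,136 bp) come from the legend; the gene lengths, the distant gene and the 10-kb threshold are hypothetical, and this is not necessarily how clustering was assessed in the study.

```python
from typing import List, Optional, Tuple

# (name, start, end): 0-based, half-open coordinates on one scaffold; strand ignored.
Gene = Tuple[str, int, int]


def physical_clusters(genes: List[Gene], max_gap: int) -> List[List[str]]:
    """Chain genes into clusters when the gap to the previous gene is <= max_gap bp."""
    clusters: List[List[str]] = []
    cluster_end: Optional[int] = None
    for name, start, end in sorted(genes, key=lambda g: g[1]):
        if cluster_end is not None and start - cluster_end <= max_gap:
            clusters[-1].append(name)
            cluster_end = max(cluster_end, end)
        else:
            clusters.append([name])
            cluster_end = end
    return clusters


# Only the two intergenic gaps (4,441 and 5,136 bp) come from the legend; gene
# lengths, the distant gene and the 10-kb threshold are hypothetical.
genes = [
    ("GNBPB2", 0, 1_000),
    ("GNBPB3", 5_441, 6_441),      # 5,441 - 1,000 = 4,441 bp gap
    ("GNBPB4", 11_577, 12_577),    # 11,577 - 6,441 = 5,136 bp gap
    ("distant_gene", 500_000, 501_000),
]
print(physical_clusters(genes, max_gap=10_000))
```

With a 10-kb threshold the three GNBPB genes form one cluster; lowering it to 5 kb splits off GNBPB4, which shows how sensitive such calls are to the chosen cutoff.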
Figure S2: SCR family
(A) Schematic representation of protein domain arrays detected in the Anopheles and
Drosophila putative scavenger receptors (SCRs). Predicted proteins are grouped in
Classes A, B and C. SRCR, Scavenger receptor cysteine-rich; FRI, Frizzled domain; LD,
low density lipoprotein; Lys_ox, Lysyl oxidase; CBD, chitin binding domain; SP,
trypsin-like serine protease; CL, C-type lectin; BO, Somatomedin B; CCP, complement-
control protein; MAM, Meprin A5 antigen and RPTP Mu; TM, transmembrane domain.
Presence of a methionine at the protein NH2-terminus is represented by a circle. (B)
Phylogenetic analysis of Class A-like SCRs (SCRAs). Sequences cluster into 3 groups
(shaded in gray): SCRASP (SCRAs with SP domains), SCRAC (SCRAs with CL
domains) and SCRAL (SCRAs with Lys_ox domains). (C) Phylogenetic analysis of the
Class B SCRs (SCRBs). The central regions (containing the CD36 domain) of 15
Anopheles and 12 Drosophila predicted proteins are compared. Croquemort-related genes
(SCRBQ) are shaded in gray.
Asterisks (*) indicate genes that possibly belong to an orthologous group, as defined by
microsyntenic analysis.
Figure S3: CTL family
Tree is based on C-type lectin domain sequence alignment. Several protein clusters are
highlighted: CTLSE, selectins with C-type lectin domains; CTLMA and CTLGA, C-type
lectins with mannose- and galactose-binding motifs, respectively; SCRAC, scavenger
receptors with C-type lectin domains.
Figure S4: GALE family
Color scheme and further information on the tree can be found in the Methods section of
the Supplementary Materials.
Figure S5: REL family
Color scheme and further information on the tree can be found in the Methods section of
the Supplementary Materials.
Figure S6: STAT family
Color scheme and further information on the tree can be found in the Methods section of
the Supplementary Materials.
Figure S7: DEF family
Gray-shaded areas indicate the 5 different clades of defensins. Clade I: Diptera; II:
Hymenoptera and Diptera; III: Hemiptera and Diptera; IV: divergent defensins; V: ancient
defensins.
Figure S8: CEC family
Note: D. melanogaster Cecropins A1 and A2 are identical at the protein level (A1/A2
Dm) although encoded by 2 different genes. Gray shaded areas indicate the 3 different
clades of Cecropins.
Figure S9: CASP family
(A) Tree was constructed from caspase domains (InterPro IPR002398) of predicted
proteins. Underlined names indicate long prodomain (initiator) caspases. (B) Anopheles
predicted CASPs (arrows) are physically located in 3 clusters. Putative haplotypes are
indicated by asterisks and are not presented in A.
Figure S10: IAP family
Tree was constructed from alignment of complete predicted sequences. Conserved
structural architecture of domains is indicated at the bottom of the figure (BIR,
baculoviral IAP repeat; RING, Ring-finger motif; CARD, caspase-recruitment domain).
BIR domains are color-coded to indicate their similarity relative to THREAD.
3. Supporting figures
4. Supporting tables
Tab
le S
1: A
noph
eles
and
Dro
soph
ila im
mun
ity
gene
list
Ano
phel
es g
ene
list
Fam
ilyS
ubfa
mily
Gen
e N
ame
Syn
onym
Ens
embl
#P
rote
in p
redi
ctio
nS
caff
old
Chr
. loc
atio
nD
m O
rtho
logs
(if
any)
Com
men
ts
PG
RP
PG
RP
LP
GR
PLB
EN
SA
NG
G00
0000
1145
9ag
CP
1201
7A
AA
B01
0089
877A
PG
RP
-LB
(C
G14
704)
P
GR
PLD
NO
T P
RE
DIC
TE
DN
OT
PR
ED
ICT
ED
AA
AB
0100
8960
23A
PG
RP
-LD
(C
G55
23)
PG
RP
LAE
NS
AN
GG
0000
0007
952
agC
P15
020/
1511
4A
AA
B01
0089
0021
FP
GR
P-L
A (
CG
4384
)A
lso
CG
1861
4 an
d C
G43
61
PG
RP
LCE
NS
AN
GG
0000
0007
834
agC
P15
107
AA
AB
0100
8900
21F
PG
RP
-LC
(C
G44
32)
PG
RP
SP
GR
PS
1E
NS
AN
GG
0000
0014
831
agC
P13
479
AA
AB
0100
8846
1D
P
GR
PS
2E
NS
AN
GG
0000
0010
489
agC
P58
98A
AA
B01
0089
6023
A-2
5D
P
GR
PS
3E
NS
AN
GG
0000
0010
490
agC
P59
06A
AA
B01
0089
6023
A-2
5D
GN
BP
GN
BP
AG
NB
PA
1
EN
SA
NG
G00
0000
1777
1ag
CP
3847
AA
AB
0100
8807
25D
G
NB
PA
2
EN
SA
NG
G00
0000
0671
9E
BIP
8943
AA
AB
0100
8986
43D
CG
5008
(G
NB
PA
3)
GN
BP
BG
NB
PB
1A
gGN
BP
EN
SA
NG
G00
0000
1520
5ag
CP
1409
3A
AA
B01
0088
9819
C
G
NB
PB
2
EN
SA
NG
G00
0000
1452
8ag
CP
1153
AA
AB
0100
8851
13E
G
NB
PB
3
EN
SA
NG
G00
0000
1454
6ag
CP
1164
AA
AB
0100
8851
13E
GN
BP
B4
E
NS
AN
GG
0000
0013
732
agC
P17
31A
AA
B01
0088
5113
E
SC
RS
CR
AS
CR
AS
P1
CP
6127
(SpD
22)
EN
SA
NG
G00
0000
1930
7ag
CP
6127
AA
AB
0100
8960
23A
Teq
uila
S
CR
AS
P2
E
NS
AN
GG
0000
0008
472
agC
P55
48A
AA
B01
0089
6023
A-2
5DC
G21
05
S
CR
AS
P3
E
NS
AN
GG
0000
0005
937
EB
IP78
71A
AA
B01
0089
878D
S
CR
AC
1
EN
SA
NG
G00
0000
1901
4ag
CP
4856
AA
AB
0100
8984
33D
CG
3921
SC
RA
L1
EN
SA
NG
G00
0000
1782
3ag
CP
2405
AA
AB
0100
8880
18A
LOX
2
SC
RB
SC
RB
1
EN
SA
NG
G00
0000
1171
9ag
CP
1279
AA
AB
0100
8859
11C
S
CR
B2
E
NS
AN
GG
0000
0013
404
agC
P65
00A
AA
B01
0089
6023
AC
G74
22
S
CR
B3
E
NS
AN
GG
0000
0013
400
agC
P64
64A
AA
B01
0089
6023
AC
G18
87
S
CR
B4
E
NS
AN
GG
0000
0013
409
agC
P65
24A
AA
B01
0089
6023
A
S
CR
B5
E
NS
AN
GG
0000
0007
786
agC
P17
14A
AA
B01
0088
5913
E
S
CR
B6
E
NS
AN
GG
0000
0001
210
EB
IP14
14A
AA
B01
0089
5219
C-1
9DC
G10
345
S
CR
B7
E
NS
AN
GG
0000
0010
163
agC
P43
28A
AA
B01
0089
0520
DC
G27
36
S
CR
B8
E
NS
AN
GG
0000
0010
154
agC
P43
98A
AA
B01
0089
0520
DC
G38
29
S
CR
B9
E
NS
AN
GG
0000
0010
167
agC
P43
29A
AA
B01
0089
0520
DE
mp
S
CR
B10
E
NS
AN
GG
0000
0017
284
agC
P13
309
AA
AB
0100
8846
5A
S
CR
B11
E
NS
AN
GG
0000
0018
883
agC
P13
24A
AA
B01
0088
5911
CC
G70
00
![Page 40: 1 3 4 1 8 9 1 11 7 1 1# - science.sciencemag.org...1 SUPPLEMENTARY MATERIAL Immunity-related genes and gene families in Anopheles gambiae: A comparative genomic analysis George K](https://reader035.vdocument.in/reader035/viewer/2022070909/5f8f19b07dbeb373966ca214/html5/thumbnails/40.jpg)
40
Ano
phel
es g
ene
list
Fam
ilyS
ubfa
mily
Gen
e N
ame
Syn
onym
Ens
embl
#P
rote
in p
redi
ctio
nS
caff
old
Chr
. loc
atio
nD
m O
rtho
logs
(if
any)
Com
men
ts
SC
R (
cont
.)
SC
RB
12
EN
SA
NG
G00
0000
1399
8ag
CP
5912
AA
AB
0100
8960
23A
S
CR
BQ
1
EN
SA
NG
G00
0000
0981
6ag
CP
8667
AA
AB
0100
8980
36C
S
CR
BQ
2
EN
SA
NG
G00
0000
0979
9ag
CP
8642
AA
AB
0100
8980
36C
S
CR
BQ
3
EN
SA
NG
G00
0000
1941
4ag
CP
1157
1A
AA
B01
0089
6429
F
SC
RB
Q4
E
NS
AN
GG
0000
0016
196
agC
P15
529
AA
AB
0100
8904
15B
CG
1278
9
agC
P10
081
Hap
loty
pe
SC
RC
SC
RC
1
EN
SA
NG
G00
0000
1271
5ag
CP
9671
AA
AB
0100
8986
43D
TE
P
TE
P1
aTE
P-I
EN
SA
NG
G00
0000
1436
8ag
CP
4020
AA
AB
0100
8951
40A
T
EP
2E
NS
AN
GG
0000
0017
238
agC
P11
437
AA
AB
0100
8964
29A
Tep
III, T
EP
15
T
EP
3E
NS
AN
GG
0000
0013
794
agC
P40
24A
AA
B01
0089
5140
A
T
EP
4ag
IMC
R14
EN
SA
NG
G00
0000
1872
7ag
CP
8988
AA
AB
0100
8979
39C
TE
P5
ENSA
NG
G00
0000
1379
4/a
gC
P40
16/
AA
AB
0100
8951
40A
ENSA
NG
G00
0000
1435
5a
gC
P40
17
T
EP
6E
NS
AN
GG
0000
0014
364
agC
P40
19A
AA
B01
0089
5140
A
T
EP
7E
NS
AN
GG
0000
0014
360
agC
P40
18A
AA
B01
0089
5140
A
T
EP
8E
NS
AN
GG
0000
0015
631
agC
P10
561
AA
AB
0100
8823
40B
T
EP
9E
NS
AN
GG
0000
0015
632
agC
P10
570
AA
AB
0100
8823
40B
T
EP
10E
NS
AN
GG
0000
0015
628
agC
P10
523
AA
AB
0100
8823
40B
T
EP
11E
NS
AN
GG
0000
0015
629
agC
P10
531
AA
AB
0100
8823
40B
T
EP
12E
NS
AN
GG
0000
0010
537
agC
P15
205
AA
AB
0100
8944
30E
T
EP
13E
NS
AN
GG
0000
0005
017
EB
I662
9A
AA
B01
0089
6429
AT
epV
I
T
EP
14E
NS
AN
GG
0000
0017
173
agC
P11
010
AA
AB
0100
8964
29A
T
EP
15E
NS
AN
GG
0000
0017
033
agC
P10
937
AA
AB
0100
8964
29A
Tep
III, T
EP
2
TE
P16
ENSA
NG
G00
0000
1879
3a
gC
P90
15A
AA
B01
0089
7939
C-4
0A
Put
ativ
e ha
plot
ype
of T
EP
1
TE
P17
ENSA
NG
G00
0000
1878
9a
gC
P90
09A
AA
B01
0089
7939
C
Put
ativ
e ha
plot
ype
of T
EP
5
TE
P18
ENSA
NG
G00
0000
1879
1a
gC
P90
14A
AA
B01
0089
7939
C
Put
ativ
e ha
plot
ype
of T
EP
6
TE
P19
EN
SAN
GG
0000
0015
630
ag
CP
1055
2A
AA
B01
0088
2340
B
Put
ativ
e ha
plot
ype
of T
EP
8
GA
LE
GA
LE1
E
NS
AN
GG
0000
0014
203
agC
P49
92A
AA
B01
0089
8432
AG
alec
tin (
CG
1137
2), C
G11
374
G
ALE
2
EN
SA
NG
G00
0000
1239
5 ag
CP
1373
7A
AA
B01
0088
461D
CG
5335
G
ALE
3
EN
SA
NG
G00
0000
1974
6ag
CP
2078
AA
AB
0100
8948
21A
CG
1856
5
G
ALE
4
EN
SA
NG
G00
0000
1318
0ag
CP
7067
AA
AB
0100
8816
42B
G
ALE
5
EN
SA
NG
G00
0000
1313
5ag
CP
6926
AA
AB
0100
8816
42B
G
ALE
6
EN
SA
NG
G00
0000
0818
1ag
CP
2657
AA
AB
0100
8968
20D
![Page 41: 1 3 4 1 8 9 1 11 7 1 1# - science.sciencemag.org...1 SUPPLEMENTARY MATERIAL Immunity-related genes and gene families in Anopheles gambiae: A comparative genomic analysis George K](https://reader035.vdocument.in/reader035/viewer/2022070909/5f8f19b07dbeb373966ca214/html5/thumbnails/41.jpg)
41
Ano
phel
es g
ene
list
Fam
ilyS
ubfa
mily
Gen
e N
ame
Syn
onym
Ens
embl
#P
rote
in p
redi
ctio
nS
caff
old
Chr
. loc
atio
nD
m O
rtho
logs
(if
any)
Com
men
ts
GA
LE (
cont
.)
GA
LE7
E
NS
AN
GG
0000
0002
705
EB
IP33
68A
AA
B01
0089
6820
D
GA
LE8
IGA
LE20
NO
T P
RE
DIC
TE
DN
OT
PR
ED
ICT
ED
UN
KN
OW
NU
NK
NO
WN
CT
LC
TLG
AC
TLG
A1
E
NS
AN
GG
0000
0009
790
agC
P83
97A
AA
B01
0089
8036
CC
G60
55
C
TLG
A2
E
NS
AN
GG
0000
0017
954
agC
P59
99A
AA
B01
0088
0723
A-2
5DC
G41
15
C
TLG
A3
E
NS
AN
GG
0000
0009
745
agC
P83
53A
AA
B01
0089
8036
CC
G32
44
CT
LGA
4
NO
T P
RE
DIC
TE
D
AA
AB
0100
8859
13E
CT
LMA
CT
LMA
1
EN
SA
NG
G00
0000
1584
2ag
CP
3775
AA
AB
0100
8807
25D
C
TLM
A2
N
OT
PR
ED
ICT
ED
A
AA
B01
0089
635C
C
TLM
A3
E
NS
AN
GG
0000
0015
840
agC
P37
70A
AA
B01
0088
0725
D
C
TLM
A4
E
NS
AN
GG
0000
0015
439
agC
P38
53A
AA
B01
0088
0725
D
C
TLM
A5
E
NS
AN
GG
0000
0016
025
agC
P33
02A
AA
B01
0088
0725
D
CT
LMA
6
EN
SA
NG
G00
0000
1844
9ag
CP
5680
AA
AB
0100
8960
21F
-23A
CG
9134
CT
LSE
CT
LSE
1
EN
SA
NG
G00
0000
0946
9ag
CP
5322
AA
AB
0100
8815
5CC
G15
00
CT
LSE
2
EN
SA
NG
G00
0000
0614
3E
BIP
8141
AA
AB
0100
8846
4CC
G90
95
CT
L C
TL1
E
NS
AN
GG
0000
0018
421
agC
P56
20A
AA
B01
0089
6021
F-2
3A
C
TL2
N
OT
PR
ED
ICT
ED
A
AA
B01
0088
4839
C
C
TL3
E
NS
AN
GG
0000
0008
945
agC
P27
87A
AA
B01
0089
6820
C
C
TL4
E
NS
AN
GG
0000
0018
677
agC
P64
06A
AA
B01
0089
6021
F-2
3A
C
TL5
E
NS
AN
GG
0000
0018
273
agC
P13
553
AA
AB
0100
8846
1D
C
TL6
E
NS
AN
GG
0000
0018
058
agC
P57
16A
AA
B01
0089
6023
A-2
5DC
G14
866
C
TL7
E
NS
AN
GG
0000
0018
029
agC
P42
67A
AA
B01
0088
115C
CG
1576
5
CT
L8
EN
SA
NG
G00
0000
0940
1ag
CP
7946
AA
AB
0100
8888
15D
-16A
CG
1486
6
C
TL9
EN
SA
NG
G00
000
008
133
EB
IP10
622
AA
AB
0100
8859
11C
-13E
CG
1843
1
S
CR
AC
1
EN
SA
NG
G00
0000
0940
1ag
CP
7946
AA
AB
0100
8984
33D
CG
3921
Als
o lis
ted
in S
CR
FB
NF
BN
1
EN
SA
NG
G00
0000
1252
3ag
CP
6864
AA
AB
0100
8816
42B
FB
N2
EN
SA
NG
G00
0000
0877
6ag
CP
7061
AA
AB
0100
8816
42B
FB
N3
EN
SA
NG
G00
0000
0877
3ag
CP
7060
AA
AB
0100
8816
42B
FB
N4
EN
SA
NG
G00
0000
0622
7E
BIP
8256
AA
AB
0100
8816
42B
ha
plot
ype
EN
SA
NG
G00
0000
1319
4
FB
N5
EN
SA
NG
G00
0000
0625
4E
BIP
8288
AA
AB
0100
8816
42B
FB
N6
EN
SA
NG
G00
0000
1315
5ag
CP
6947
AA
AB
0100
8816
42B
FB
N7
EN
SA
NG
G00
0000
0624
8E
BIP
8282
AA
AB
0100
8816
42B
FB
N8
EN
SA
NG
G00
0000
0876
3ag
CP
7043
AA
AB
0100
8816
42B
FB
N9
AgF
BN
L11
EN
SA
NG
G00
0000
0875
9ag
CP
7034
AA
AB
0100
8816
42B
![Page 42: 1 3 4 1 8 9 1 11 7 1 1# - science.sciencemag.org...1 SUPPLEMENTARY MATERIAL Immunity-related genes and gene families in Anopheles gambiae: A comparative genomic analysis George K](https://reader035.vdocument.in/reader035/viewer/2022070909/5f8f19b07dbeb373966ca214/html5/thumbnails/42.jpg)
42
Ano
phel
es g
ene
list
Fam
ilyS
ubfa
mily
Gen
e N
ame
Syn
onym
Ens
embl
#P
rote
in p
redi
ctio
nS
caff
old
Chr
. loc
atio
nD
m O
rtho
logs
(if
any)
Comments
FBN (cont.)
FBN10 | ENSANGG00000013191 | agCP7093 | AAAB01008816 | 42B
FBN11 | ENSANGG00000008804 | agCP7104 | AAAB01008816 | 42B
FBN12 | ENSANGG00000018084 | agCP11754 | AAAB01008933 | 39A
FBN13 | ENSANGG00000019285 | agCP8979 | AAAB01008979 | 39C
FBN14 | ENSANGG00000018808 | agCP8950 | AAAB01008979 | 39C-40B
FBN15 | ENSANGG00000010305 | agCP13033 | AAAB01007495 | UNKNOWN
FBN16 | ENSANGG00000017186 | agCP8960 | AAAB01008979 | 39C-40B
FBN17 | ENSANGG00000017165 | agCP9023 | AAAB01008979 | 39C
FBN18 | ENSANGG00000017792 | agCP8985 | AAAB01008979 | 39C
FBN19 | ENSANGG00000017195 | agCP8965 | AAAB01008979 | 39C
FBN20 | ENSANGG00000017749 | agCP8966 | AAAB01008979 | 39C
FBN21 | ENSANGG00000017189 | agCP8961 | AAAB01008979 | 39C | haplotype ENSANGG00000017158
FBN22 | ENSANGG00000017793 | agCP8986 | AAAB01008979 | 39C
FBN23 | AgFBNU10 | ENSANGG00000015703 | agCP10493 | AAAB01008823 | 40B
FBN24 A/B | ENSANGG00000001963 | EBIP2335 | AAAB01008948 | 21A
FBN25 | AgFBNE3 | ENSANGG00000018829 | agCP2049 | AAAB01008948 | 21A
FBN26 | ENSANGG00000008959 | agCP2145 | AAAB01008948 | 21B | haplotype ENSANGG00000013950
FBN27 | ENSANGG00000008365 | agCP2143 | AAAB01008948 | 21B
FBN28 | ENSANGG00000008989 | agCP2150 | AAAB01008948 | 21B
FBN29 | ENSANGG00000009032 | agCP2016 | AAAB01008948 | 21B | haplotype ENSANGG00000013951
FBN30 | ENSANGG00000019209 | agCP3444 | AAAB01008807 | 25D
FBN31 | ENSANGG00000017779 | agCP3894 | AAAB01008807 | 25D
FBN32 | ENSANGG00000005762 | EBIP7635 | AAAB01008807 | 25D
FBN33 | AgFBN27 | ENSANGG00000017827 | agCP3143 | AAAB01008807 | 25D
FBN34 | ENSANGG00000009075 | agCP12272 | AAAB01008987 | 7A | CG9593
FBN35 | ENSANGG00000008819 | agCP12893 | AAAB01008987 | 8D1
FBN36 | ENSANGG00000011322 | agCP5142 | AAAB01000889 | 43D | haplotype ENSANGG00000013971
FBN37 | ENSANGG00000017074 | agCP4816 | AAAB01008984 | 32A-32D
FBN38 | ENSANGG00000007746 | agCP8057 | AAAB01008980 | 35B-36C
FBN39 | ENSANGG00000018891 | agCP7333 | AAAB01008847 | 1A
FBN40 | NP1 | NOT PREDICTED | 25D | 2L-21456823-21461521
FBN41 | NP2 | NOT PREDICTED | 25D | SCAB2L-40128389-40132874
FBN42 | NP3 | NOT PREDICTED | 21A | 2L-5479571-5484071
FBN (cont.)
FBN43 | NP4 | NOT PREDICTED | 42B | 3L-19335118-19339735
FBN44 | NP5 | NOT PREDICTED | 42B | 3L-19390511-19395119 | A
FBN45 | NP6 | NOT PREDICTED | 42B | 3L-19390511-19395119 | B
FBN46 | NP7 | NOT PREDICTED | NOT PREDICTED | 42B | 3L-19562111-19566722
FBN47 | NP8 | NOT PREDICTED | NOT PREDICTED | 42B | 3L-19565611-19570225
FBN48 | NP9 | NOT PREDICTED | NOT PREDICTED | 39C | 3L-8773099-8777683
FBN49 | NP10 | NOT PREDICTED | NOT PREDICTED | 39C | 3L-8795783-8800286
FBN50 | NP11 | NOT PREDICTED | NOT PREDICTED | 39C | 3L-8806357-8810968
FBN51 | NP12 | NOT PREDICTED | NOT PREDICTED | 39C | 3L-8824145-8828762
FBN52 | NP13 | NOT PREDICTED | NOT PREDICTED | 39C | 3L-9020287-9024808
FBN53 | NP14 | NOT PREDICTED | NOT PREDICTED | 39C | 3L-9046294-9050902
FBN54 | NP15, AgFBN252 | NOT PREDICTED | NOT PREDICTED | 42B | 3L-19338754-19343368
FBN55 | NP16 | NOT PREDICTED | NOT PREDICTED | 33D | 3R-30324402-30328938
FBN56 | NP17 | NOT PREDICTED | NOT PREDICTED | UNKNOWN | UNKN-22739971-22744579
FBN57 | NP18 | NOT PREDICTED | NOT PREDICTED | UNKNOWN | UNKN-46276577-46280993
ENSANGG00000008807 | agCP7105 | AAAB01008816 | 42B | pseudogene
ENSANGG00000017163 | agCP9022 | AAAB01008979 | 39C | pseudogene
ENSANGG00000008751 | agCP6940 | AAAB01008816 | 42B | pseudogene
CLIP
CLIPA
CLIPA1 | ISPR20 | ENSANGG00000017773 | agCP9913 | AAAB01008986 | 43D-46D
CLIPA2 | ISPL5 | ENSANGG00000017763 | agCP9904 | AAAB01008986 | 44B-43D | two clip domains
CLIPA3 | ENSANGG00000012814 | agCP6780 | AAAB01008200 | UNKNOWN | CG13318
CLIPA4 | ENSANGG00000017707 | agCP9858 | AAAB01008986 | 44B-43D
CLIPA5 | ENSANGG00000017770 | agCP9912 | AAAB01008986 | 44B-43D
CLIPA6 | ISPR9like | ENSANGG00000017677 | agCp9547 | AAAB01008986 | 44B-43D
CLIPA7 | ISPR9 | ENSANGG00000017686 | agCp9557 | AAAB01008986 | 44B-43D
CLIPA8 | ENSANGG00000016096 | agCP7214 | AAAB01008848 | 39C
CLIPA9 | ENSANGG00000010217 | agCP10582 | AAAB01008823 | 41A
CLIPA10 | Not predicted | EBIP7690 | AAAB01008807 | 25D-27C | CG4998
CLIPB
CLIPB1 | AgSp14D2 | ENSANGG00000011095 | agCP14256 | AAAB01008794 | 14D
CLIPB2 | AgSer2 | ENSANGG00000011531 | agCP2989 | AAAB01008879 | 14C | Prediction refined
CLIPB3 | ENSANGG00000011053 | agCP14259 | AAAB01008794 | 14C-14D
CLIPB4 | AgSp14D1 | Not predicted | Not predicted | Not predicted | 14D | acc. no: AF007166
CLIPB5 | ENSANGG00000009231 | agCP2480 | AAAB01008880 | 18A | CG1102, Easter
CLIP (cont.)
CLIPB6 | ENSANGG00000010441 | agCP14272 | AAAB01008794 | 14C-14D
CLIPB7 | ENSANGG00000019167 | agCP14747 | AAAB01008799 | 10D-11C
CLIPB8 | AgSer6 | Not predicted | Not predicted | Not predicted | 14A | CG9737 | acc. no: AJ459779
CLIPB9 | AgSp14A | ENSANGG00000017835 | agCP3048 | AAAB01008879 | 14A
CLIPB10 | AgSer8 | ENSANGG00000017835 | agCP3048 | AAAB01008879 | 14A | CG3066
CLIPB11 | ENSANGG00000008059 | agCP4575 | AAAB01008984 | 33D
CLIPB12 | ENSANGG00000008007 | agCP4530 | AAAB01008984 | 33D
CLIPB13 | ENSANGG00000010153 | agCP4397 | AAAB01008905 | 20D | CG5896
CLIPB14 | AgSer4 | ENSANGG00000015633 | agCP10576 | AAAB01008823 | 40A
CLIPB15 | AgSer3 | ENSANGG00000013326 | agCP8806 | AAAB01008980 | 35B-36C
CLIPB16 | ENSANGG00000009880 | agCP4646 | AAAB01008984 | 33D
CLIPB17 | ENSANGG00000019659 | agCP12211 | AAAB01008987 | 7A-10D | two clip domains
CLIPC
CLIPC1 | ENSANGG00000014314 | agCP4543 | AAAB01008984 | 32A | Snake
CLIPC2 | ENSANGG00000010933 | agCP14099 | AAAB01008898 | 18C-19C | Snake
CLIPC3 | AgSp18D | ENSANGG00000010982 | agCP14119 | AAAB01008898 | 18C | Snake
CLIPC4 | Sp2A | Not predicted | Not predicted | Not predicted | 2A | acc. no: AF117752
CLIPC5 | ENSANGG00000014810 | agCP13087 | AAAB01008846 | 1D-4C | CG6361, persephone
CLIPC6 | ENSANGG00000014810 | agCP13087 | AAAB01008846 | 1D-4C | CG6361, persephone
CLIPC7 | ISPR5 | ENSANGG00000018770 | agCp7872 | AAAB01008888 | UNKNOWN
CLIPD
CLIPD1 | AgSer1 | ENSANGG00000012449 | agCP1694 | AAAB01008859 | 11C | CG9372
CLIPD2 | ENSANGG00000019362 | agCP11478 | AAAB01008964 | 29A-31C | CG16821
CLIPD3 | ENSANGG00000013258 | agCP12231 | AAAB01008987 | 7A-10D | CG7432
CLIPD4 | ENSANGG00000013699 | agCP1828 | AAAB01008859 | 13E | CG1299
CLIPD5 | Not predicted | Not predicted | AAAB01008859 | 11B-14C | CG1299 | Prediction refined
CLIPD6 | ENSANGG00000014328 | agCP1849 | AAAB01008859 | 13E | CG1299 | two clip domains
CLIPD7 | ENSANGG00000014695 | agCP4804 | AAAB01008984 | 32A | CG8213, Stubble, CG8172
SRPN
SRPN1 | ENSANGG00000019162 | agCP3418 | AAAB01008807 | 25D | Serpin-27A, SRPN2, SRPN3 | inhibitory: K/F
SRPN2 | ENSANGG00000019323 | agCP3768 | AAAB01008807 | 25D | Serpin-27A, SRPN1, SRPN3 | inhibitory: K/F
SRPN3 | ENSANGG00000005827 | EBI7723 | AAAB01008807 | 25D | Serpin-27A, SRPN1, SRPN2 | inhibitory: T/I
SRPN4 | ENSANGG00000003652 | EBI4662 | AAAB01008980 | 35B | CG7219, SRPN5, SRPN6 | inhibitory: I/S
SRPN5 | ENSANGG00000008018 | agCP4923 | AAAB01008984 | 33D | CG7219, SRPN4, SRPN6 | inhibitory: S/L
SRPN6 | ENSANGG00000008056 | agCP4562 | AAAB01008984 | 33D | CG7219, SRPN4, SRPN5 | inhibitory: I/G
SRPN7 | ENSANGG00000018236 | agCP3300 | AAAB01008807 | 25D | inhibitory: R/V
SRPN (cont.)
SRPN8 | ENSANGG00000012210 | agCP3027 | AAAB01008879 | 14C | CG6680 | inhibitory: K/A
SRPN9 | ENSANGG00000014191 | agCP2980 | AAAB01008879 | 14A-C | CG6687, Sp5 | inhibitory: S/S
SRPN10 | ENSANGG00000013014 | agCP14891 | AAAB01008900 | 21F | inhibitory: K/R
SRPN11 | ENSANGG00000014319 | agCP12201 | AAAB01008987 | 7A | non-inhibitory
SRPN12 | ENSANGG00000014436 | agCP12184 | AAAB01008987 | 7A | non-inhibitory
SRPN13 | ENSANGG00000013014 | agCP12957 | AAAB01008407 | 21C | non-inhibitory
SRPN14 | ENSANGG00000019032 | agCP3574 | AAAB01008807 | 25D | non-inhibitory
SRPN15 | ENSANGG00000012959 | agCP9254 | AAAB01008287 | Putative haplotype of SRPN9
TOLL
TOLL1A | Not predicted | NOT PREDICTED | AAAB01008811 | 6C | Toll1
TOLL5A | ENSANGG00000015554/ENSANGG00000015552 | agCP4197/agCP4196 | AAAB01008811 | 5B | Toll5
TOLL1B | ENSANGG00000014400/ENSANGG00000014747 | agCP7198/agCP1910 | AAAB01008848 | 39B | Toll1
TOLL5B | ENSANGG00000014148 | agCP7266 | AAAB01008848 | 39B | Toll5
TOLL6 | ENSANGG00000006734 | EBIP8963 | AAAB01008986 | 43D | Toll6
TOLL7 | ENSANGG00000006772 | EBIP9016 | AAAB01008986 | 43D | Toll7
TOLL8 | ENSANGG00000008695 | agCP9368 | AAAB01008986 | 43D | Toll8
TOLL9 | ENSANGG00000012039 | agCP3322 | AAAB01008807 | 25D | Toll9
TOLL10 | ENSANGG00000006278 | EBIP8317 | AAAB01008816 | 42B
TOLL11 | ENSANGG00000006280 | EBIP8319 | AAAB01008816 | 42B
Cactus | ENSANGG00000007525 | agCP11355 | AAAB01008964 | 29A
Pelle | ENSANGG00000008397 | agCP1192 | AAAB01008859 | 13E
SPZ
SPZ1 | ENSANGG00000013105 | agCP13439 | AAAB01008846 | 1D
SPZ2 | ENSANGG00000010958 | agCP6506 | AAAB01008960 | 25D
SPZ3 | ENSANGG00000017063 | agCP11277 | AAAB01008964 | 29A
SPZ4 | ENSANGG00000016631 | agCP11242 | AAAB01008964 | 29A
SPZ5 | ENSANGG00000016949 | agCP3573 | AAAB01008807 | 25D
SPZ6 | ENSANGG00000007432 | agCP14930 | AAAB01008900 | 21C
Tube | ENSANGG00000018009 | agCP2914 | AAAB01008879 | 14A
REL
Gambif | ENSANGG00000008612 | agCP14571 | AAAB01008839 | 34B | Dorsal
Relish | ENSANGG00000017745 | agCP3820 | AAAB01008807 | 25D | Relish
MyD88 | ENSANGG00000013260 | agCP14973 | AAAB01008900 | 21F
IMD | NOT PREDICTED | NOT PREDICTED | AAAB01008948 | 21B | Imd, CG5576
STAT
STAT1 | Ag-STAT | ENSANGG00000007793 | agCP10730 | AAAB01008849 | 38B
STAT2 | ENSANGG00000006157 | EBIP8156 | AAAB01008846 | 4C
PPO
PPO1 | L76038 | ENSANGG00000014466 | agCP1154 | AAAB01008859 | 13B
PPO2 | AF004915 | ENSANGG00000018159 | agCP6387 | AAAB01008960 | 24B
PPO3 | AF004916 | ENSANGG00000002044 | ebiP2437 | AAAB01008948 | 21B | PPO3/4/5/6/7/8/9 located within ~65 kb
PPO4 | AJ010193 | ENSANGG00000008967 | agCP2161 | AAAB01008948 | 21B
PPO5 | AJ010194 | ENSANGG00000001835 | EBIP2175 | AAAB01008948 | 21B
PPO6 | AJ010195 | NOT PREDICTED | NOT PREDICTED | AAAB01008948 | 21B
PPO7 | AJ459960 | ENSANGG00000008257 | agCP2095 | AAAB01008948 | 21B
PPO8 | AJ459961 | NOT PREDICTED | NOT PREDICTED | AAAB01008948 | 21B
PPO9 | AJ459962 | ENSANGG00000008251 | agCP2084 | AAAB01008948 | 21B
DEF
DEF1 | Defensin | ENSANGG00000013132 | agCP6915 | AAAB01008816 | 42B
DEF2 | NOT PREDICTED | NOT PREDICTED | AAAB01008952 | 19D
DEF3 | NOT PREDICTED | NOT PREDICTED | AAAB01008807 | 5D
DEF4 | NOT PREDICTED | NOT PREDICTED | AAAB01008960 | 23A
CEC
CEC1 | CecA | ENSANGG00000009468 | agCP7503 | AAAB01008847 | 1A
CEC2 | CecB | NOT PREDICTED | NOT PREDICTED | AAAB01008847 | 1A
CEC3 | CecC | NOT PREDICTED | NOT PREDICTED | AAAB01008847 | 1A
CEC4 | CecD | NOT PREDICTED | NOT PREDICTED | AAAB01008807 | 25D
CASP
CASPS
CASPS1 | NOT PREDICTED | NOT PREDICTED | AAAB01008986 | 43D
CASPS2 | ENSANGG00000008186 | agCP9777 | AAAB01008986 | 43D
CASPS3 | ENSANGG00000006560 | EBIP8707 | AAAB01008986 | 43D
CASPS4 | ENSANGG00000012903 | agCP9776 | AAAB01008986 | 43D
CASPS5 | ENSANGG00000018876 | agCP2034 | AAAB01008948 | 21A | EBIP2383 identical prediction
CASPS6 | ENSANGG00000018878 | agCP2039 | AAAB01008948 | 21A
CASPS7 | ENSANGG00000007830 | agCP9089 | AAAB01008963 | 5A
CASPS8 | ENSANGG00000012629 | agCP8592 | AAAB01008980 | 35B-36C
CASPS9 | ENSANGG00000015625 | agCP10446 | AAAB01008823 | 40B
CASPS10 | ENSANGG00000015640 | agCP10592 | AAAB01008823 | 40B
CASPS11 | ENSANGG00000018873 | agCP2032 | AAAB01008948 | 21A (2L) | Possible haplotype of agCP2034
CASPS12 | ENSANGG00000015624 | agCP10440 | AAAB01008823 | 40B (3R) | Possible haplotype of agCP10446
CASPS13 | ENSANGG00000015623 | agCP10439 | AAAB01008823 | 40B (3R) | Possible haplotype of agCP10592
CASPS14 | ENSANGG00000013016 | agCP3911 | AAAB01008459 | unknown | shows similarity to agCP9777
CASP (cont.)
CASPL
CASPL1 | ENSANGG00000015414 | agCP9556 | AAAB01008986 | 43D | DREDD
CASPL2 | ENSANGG00000008206 | agCP2721 | AAAB01008968 | 20D | DRONC
IAP
IAP1 | ENSANGG00000007147 | EBIP9540 | AAAB01008807 | 25D | DIAP1, THREAD
IAP2 | ENSANGG00000015471 | agCP6860 | AAAB01008816 | 42B | DIAP2
IAP (cont.)
IAP3 | ENSANGG00000014079 | agCP3615 | AAAB01008807 | 25D
IAP4 | ENSANGG00000013521 | agCP3622 | AAAB01008807 | 25D
IAP5 | IAPD1 | ENSANGG00000016684 | agCP10996 | AAAB01008964 | 29A-30E | DETERIN
IAP6 | IAPB1 | ENSANGG00000002327 | EBIP2826 | AAAB01008859 | 11C-13E | BRUCE
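Several FBN entries in the Anopheles list (FBN40 to FBN57) have no Ensembl or protein prediction, and their Comments give a genomic interval instead, written as arm-start-end (for example 2L-21456823-21461521 for FBN40). The minimal Python sketch below parses such strings and tests two intervals for overlap. It assumes the numbers are inclusive base-pair coordinates on the named arm, which the table itself does not state; `Interval`, `parse_interval` and `overlaps` are hypothetical helpers, not part of the annotation pipeline used for this list.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A genomic interval on one chromosome arm (hypothetical helper)."""
    arm: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Assumes inclusive coordinates, so both endpoints are counted.
        return self.end - self.start + 1


def parse_interval(text: str) -> Interval:
    """Parse an 'ARM-START-END' string such as '3L-19335118-19339735'."""
    # Split from the right so arm labels like 'UNKN' or 'SCAB2L' stay whole.
    arm, start, end = text.rsplit("-", 2)
    lo, hi = int(start), int(end)
    if hi < lo:
        raise ValueError(f"end precedes start in {text!r}")
    return Interval(arm, lo, hi)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if both intervals lie on the same arm and share at least one base."""
    return a.arm == b.arm and a.start <= b.end and b.start <= a.end


fbn46 = parse_interval("3L-19562111-19566722")
fbn47 = parse_interval("3L-19565611-19570225")
print(fbn46.length, overlaps(fbn46, fbn47))  # prints: 4612 True
```

Under this reading, the intervals given for FBN46 and FBN47 overlap, while FBN44 and FBN45 share an identical interval; a check like this is one way to flag candidate duplicate or overlapping predictions before manual curation.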
Drosophila immunity gene list
Family (order) | Subfamily (if any) | Gene Name | Synonym (if any) | Accession # | Chromosomal location | Comments
PGRP
PGRPL
PGRP-LA | CG4384, CG18614, CG4361 | 67A7
PGRP-LB | CG14704 | 86E8
PGRP-LC | CG4432 | 67A8
PGRP-LD | CG5523 | 64E7-8
PGRP-LE | CG8995 | 13F1
PGRP-LF | CG4437 | 67A8-9
PGRPS
PGRP-SA | CG11709 | 10C6
PGRP-SB1 | CG9681 | 73C1
PGRP-SB2 | CG9697 | 73C1
PGRP-SC1a | CG14746 | 44E3
PGRP-SC1b | CG8577 | 44E3
PGRP-SC2 | CG14745 | 44E3-4
PGRP-SD | CG7496 | 66A9
TEP
TepI | CG18096 | 35F1-F4
TepII | CG7052 | 28B1-B4
TepIII | CG7068 | 28B1-B4
TepIV | CG10363 | 37F1-F2
TepV | CG13079 | 37F1-F2 | Sequence was refined (M. Lagueux)
TepVI | Mcr | CG7586 | 28D-E
GNBP
GNBP1 | DGNBP-1 | CG6895 | 75D2
GNBP2 | DGNBP-2 | CG4144 | 75D2
GNBP3 | DGNBP-3 | CG5008 | 66E4-E5
SCR
Class A-like
CG11335 | CG11335 | 100B5
CG3921 | CG3921 | 24C7-C8
CG2105 | CG2105 | 43D7-E1
CG4402 | CG4402 | 58A1
Tequila | GRAAL | CG18403 | 66F2
Class B
croquemort | CG4280 | 21C5
CG12789 | CG12789 | 27F3
CG7228 | CG7228 | 28D2
CG7227 | CG7227 | 28D2
CG5750 | CG5750 | 36D2
CG3829 | CG3829 | 60E11
SCR (cont.)
CG2736 | CG2736 | 60E10-11
emp | CG2727 | 60E11
CG1887 | CG1887 | 62B9
CG7422 | CG7422 | 66A10
CG10345 | CG10345 | 89D3
CG7000 | CG7000 | 93B11
Class C
dSR-CI | CG4099 | 24D4
dSR-CII | CG8856 | 48E10
dSR-CIII | Q9N2Q3 | 24D
CG3212 | CG3212 | 23F3
CTL
CTLGA
CG4115 | CG4115 | 87B8
CG6055 | CG6055 | 28A1
CG3244 | CG3244 | 25A6
CG9976 | CG9976 | 37D6
CG9978 | CG9978 | 37D6
CTLMA
CG9134 | CG9134 | 61F4
CG2958 | CG2958 | 24D8
CG16834 | CG16834 | 32C5
lectin_3R Dm | NOT PREDICTED | chrom 3R
CTLSE
furrowed | CG1500 | 11A1
CG9095 | CG9095 | 13B1
CTL
CG15818 | CG15818 | 27F3
CG17799 | CG17799 | 29C1
CG15358 | CG15358 | 22B4
CG15378 | CG15378 | 22C1
CG2839 | CG2839 | 21D2
CG2826 | CG2826 | 21D2
CG7763 | CG7763 | 47F16
CG3410 | CG3410 | 24B3-C1
CG13686 | CG13686 | 21D2
CG17011 | CG17011 | 30A7
CG7106 | CG7106 | 28D2
CG18431 | CG18431 | 54B18-C1
CG17797 | CG17797 | 29C1
CTL (cont.)
CG14500 | CG14500 | 55C2
CG1656 | CG1656 | 46B7
CG1652 | CG1652 | 46B6-7
CG6014 | CG6014 | 78D2
CG11211 | CG11211 | 42A8
CG13086 | CG13086 | 37D2
CG12111 | CG12111 | 7F4-7F5
CG15765 | CG15765 | 5C2-5C3
lectin_42A 2R | lectin_42A 2R | 42A
CG14866 | CG14866 | 88F1
CG3921 | CG3921 | 24C7-C8
GALE
CG11374 | CG11374 | 21A4
CG13950 | CG13950 | 21D4
CG18565 | CG18565 | 77B7
CG5335 | CG5335 | 55E11
DmGal | Galectin | CG11372 | 21-A4
FBN
CG9500 | CG9500 | 26C4
CG8642 | CG8642 | 44D5
scabrous | sca | CG17579 | 49D4
CG5550 | CG5550 | 52E2
CG30280 | CG30280 | 58D2
CG30281 | CG30281 | 58D2
CG10359 | CG10359 | 63E5
CG7668 | CG7668 | 76E1
CG9593 | CG9593 | 89A6
CG6788 | CG6788 | 16E2
CG1791 | CG1791 | 9A3
CG1889 | CG1889 | 9A3
CG31832 | CG31832 | NOT PREDICTED
CLIP
CG1102 | BEST:GH02921 | CG1102 | 82A4-82A4
CG11313 | CG11313 | 100A3-100A3
CG1299 | CG1299 | 64A8-64A8 | two clip domains
CG13318 | CG13318 | 85B1-85B1
CG15046 | CG15046 | 17B4-17B4 | two clip domains
CLIP (cont.)
CG16705 | CG16705 | 95A7-95A7
CG16821 | CG16821 | 34B2-34B2
CG18477 | BG:DS07108.1 | CG18477 | 35D3-35D3
CG18557 | CG18557 | 23B4-23B4
Ser7 | CG2045 | 9A2-9A2
CG2056 | CG2056 | 7F3-7F3
CG3066 | CG3066 | 84D14-84E1
CG3117 | CG3117 | 23B4-23B5
CG3505 | CG3505 | 88C10-88C10
stubble | stubbloid | CG4316 | 89B7-89B9
CG4793 | BG:DS07486.3 | CG4793 | 35D6-35D6
easter | CG4920 | 88F1-88F1
CG4998 | CG4998 | 72E1-72E1
CG5390 | CG5390 | 31D1-31D1
CG5896 | CG5896 | 97E5-97E5
CG5909 | CG5909 | 97E5-97E6
CG6361 | CG6361 | 17B3-17B3
persephone | CG6367 | CG6367 | 17B3-17B4
CG6639 | CG6639 | 36C9-36C9
CG7432 | CG7432 | 92A13-92A13
snake | l(3)87Dg, me(3)4 | CG7996 | 87D9-87D9
CG8172 | CG8172 | 45A1-45A1
CG8213 | CG8213 | 44F12-45A1
CG8586 | CG8586 | 44E2-44E2 | two clip domains
CG8738 | CG8738 | 44E2-44E2 | two clip domains
CG9372 | CG9372 | 76B9-76B9
CG9377 | CG9377 | 34B7-34B7
CG9733 | CG9733 | 99E3-99E3
CG9737 | CG9737 | 99E3-99E3
CG4914 | CG4914 | 70E7-70E7 | not in analysis
SRPN
serpin-27A | CG11331 | 26F6 | inhibitory: K/F
CG6717 | CG6717 | 28B3 | inhibitory: K/K
CG12318 | CG12318 | 28D2 | inhibitory: L/S
CG7219 | CG7219 | 28D5 | inhibitory: S/G
SRPN (cont.)
sp2 | CG8137 | 28F5 | inhibitory: L/S
CG4804 | CG4804 | 31A2 | non-inhibitory
sp3 | CG9334 | 38F2 | inhibitory: K/S
CG14470 | CG14470 | 41F10 | non-inhibitory
CG9455 | CG9455 | 42D4 | inhibitory: M/M
sp1 | CG9456 | 42D4 | inhibitory: R/A
CG9460 | CG9460 | 42D4 | inhibitory: E/S
CG9454 | CG9454 | 42D4 | non-inhibitory
sp4 | CG9453 | 42D4 | inhibitory: K/R
Spn43Aa | CG12172 | 43A | inhibitory: M/S
nec | CG1857 | 43A | inhibitory: L/S
CG1859 | CG1859 | 43A | non-inhibitory
Spn43Ab | CG1865 | 43A | non-inhibitory
CG7722 | CG7722 | 47C7 | non-inhibitory
CG10956 | CG10956 | 54A3 | non-inhibitory
sp6 | CG10913 | 55C1 | inhibitory: R/M
CG1308 | CG1308 | 64A10 | No
acp76A | CG3801 | 75F5 | No
CG6680 | CG6680 | 77B4 | Y: K/A
CG6663 | CG6663 | 77B4 | No
sp5 | CG18525 | 77B4 | Y: S/A
CG6687 | CG6687 | 77B4 | Y: S/S
CG12807 | CG12807 | 85F5 | No
CG1342 | CG1342 | 100A2 | Y: R/T
TOLL
Toll-1 | Toll | CG5490 | 97D1
Toll-2 | 18w | CG8896 | 56F8
Toll-3 | MstProx | CG1149 | 84D5
Toll-4 | CG18241 | 29F7
Toll-5 | Tehao | CG7121 | 34B6
Toll-6 | CG7250 | 71C1
Toll-7 | CG8595 | 56F3
Toll-8 | Tollo | CG6890 | 71B7
Toll-9 | CG5528 | 77B1
Cactus
cactus | CG5848 | 35E5
Pelle
pelle | CG5974 | 97E6
SPZ
spatzle | CG6134 | 97D12
spz2 | CG18318 | 64B9
spz3 | CG7104 | 28C3
spz4 | CG14928 | 32F1
spz5 | CG9972 | 63A1
spz6 | CG9196 | 60E1
Tube
tube | CG10520 | 82B1
Rel
dorsal | CG6667 | 36C5
Dif | CG6794 | 36C4-5
relish | CG11992 | 85C2-3
MyD88 | CG2078 | 45C4
immune deficiency | imd | CG5576 | 55C8
STAT
Stat92E | dSTAT, marelle | CG4257 | 92E11-12
PPO
Bc | Dox-A1 | CG5779 | 55A3
CG8193 | CG8193 | 45A2
Dox-A3 | CG2952 | 59D2
CEC
cecA1 | CG1365 | 99E3 | Protein sequence identical to CecA2
cecA2 | CG1367 | 99E3
cecB | CG1878 | 99E3
CEC (cont.)
cecC | CG1373 | 99E3
DEF
def | CG1385 | 46D7
CASP
Nc | dronc | CG8091 | 67C5
dredd | CG7486 | 1B-7
dcp-1 | CG5370 | 59F5
decay | CG14902 | 89D4
ice | drice | CG7788 | 99C7
dream | CG7863 | 42A2
damm | daydream | CG18188 | 48D2
IAP
thread | DIAP1 | CG12284 | 72D1
Iap2 | DIAP2 | CG8293 | 52D15
deterin | CG12265 | 90A1-2 | Not linked in FlyBase yet
Bruce | CG6303 | 86A7-86A8
5. Supporting references and notes
1. R. A. Holt et al., Science, this issue (2002).
2. M. D. Adams et al., Science 287, 2185 (2000).
3. S. van Dongen, PhD, University of Utrecht (2000).
4. E. M. Zdobnov, R. Apweiler, Bioinformatics 17, 847 (2001).
5. R. Durbin, S. Eddy, A. Krogh, G. Mitchison, Biological Sequence Analysis, Cambridge University Press (1998).
6. E. Birney, R. Durbin, Genome Res 10, 547 (2000).
7. G. Dimopoulos et al., Proc Natl Acad Sci U S A 99, 8814 (2002).
8. F. H. Collins et al., Science 234, 607 (1986).
9. H. M. Muller, G. Dimopoulos, C. Blass, F. C. Kafatos, J Biol Chem 274, 11727
(1999).
10. H. Yoshida, K. Kinoshita, M. Ashida, J Biol Chem 271, 13854 (1996).
11. T. Michel, J. M. Reichhart, J. A. Hoffmann, J. Royet, Nature 414, 756 (2001).
12. M. Gottar et al., Nature 416, 640 (2002).
13. K. M. Choe, T. Werner, S. Stoven, D. Hultmark, K. V. Anderson, Science 296,
359 (2002).
14. M. Ramet, P. Manfruelli, A. Pearson, B. Mathey-Prevot, R. A. Ezekowitz, Nature
416, 644 (2002).
15. T. Werner et al., Proc Natl Acad Sci U S A 97, 13772 (2000).
16. M. Lagueux, E. Perrodou, E. A. Levashina, M. Capovilla, J. A. Hoffmann, Proc
Natl Acad Sci U S A 97, 11427 (2000).
17. M. Lagueux, unpublished.
18. E. A. Levashina et al., Cell 104, 709 (2001).
19. F. Oduol, J. Xu, O. Niare, R. Natarajan, K. D. Vernick, Proc Natl Acad Sci U S A
97, 11397 (2000).
20. J. Hofemeister, A. Kurtz, R. Borriss, J. Knowles, Gene 49, 177 (1986).
21. S. Schimming, W. H. Schwarz, W. L. Staudenbauer, Eur J Biochem 204, 13
(1992).
22. Y. S. Kim et al., J Biol Chem 275, 32721 (2000).
23. L. Peiser, S. Mukhopadhyay, S. Gordon, Curr Opin Immunol 14, 123 (2002).
24. L. J. van der Laan et al., J Immunol 162, 939 (1999).
25. A. Danielli et al., Proc Natl Acad Sci U S A 97, 7136 (2000).
26. K. Csiszar, Prog Nucleic Acid Res Mol Biol 70, 1 (2001).
27. R. Crombie, R. Silverstein, J Biol Chem 273, 4855 (1998).
28. N. C. Franc, J. L. Dimarcq, M. Lagueux, J. Hoffmann, R. A. Ezekowitz, Immunity
4, 431 (1996).
29. K. Hart, M. Wilcox, J Mol Biol 234, 249 (1993).
30. A. Pearson, A. Lux, M. Krieger, Proc Natl Acad Sci U S A 92, 4056 (1995).
31. M. Ramet et al., Immunity 15, 1027 (2001).
32. K. Drickamer, M. E. Taylor, Annu Rev Cell Biol 9, 237 (1993).
33. Y. Fujita, S. Kurata, K. Homma, S. Natori, J Biol Chem 273, 9667 (1998).
34. K. Drickamer, Nature 360, 183 (1992).
35. L. A. Leshko-Lindsay, V. G. Corces, Development 124, 169 (1997).
36. L. B. Klickstein et al., J Exp Med 165, 1095 (1987).
37. D. N. Cooper, S. H. Barondes, Glycobiology 9, 979 (1999).
38. K. E. Pace et al., J Biol Chem 277, 13091 (2002).
39. Q. Jiang, M. Hall, F. G. Noriega, M. Wells, Insect Biochem Mol Biol 27, 283
(1997).
40. C. A. Davis, D. C. Riddell, M. J. Higgins, J. J. Holden, B. N. White, Nucleic
Acids Res 13, 6605 (1985).
41. M. J. Gorman, O. V. Andreeva, S. M. Paskewitz, Insect Biochem Mol Biol 30, 35
(2000).
42. M. J. Gorman, S. M. Paskewitz, Insect Biochem Mol Biol 31, 257 (2001).
43. J. Volz, C. Blass, H. M. Muller, F. C. Kafatos, unpublished data.
44. P. J. Gotwals, J. W. Fristrom, Genetics 127, 747 (1991).
45. L. F. Appel et al., Proc Natl Acad Sci U S A 90, 4937 (1993).
46. G. A. Silverman et al., J Biol Chem 276, 33293 (2001).
47. J. A. Irving, R. N. Pike, A. M. Lesk, J. C. Whisstock, Genome Res 10, 1845
(2000).
48. B. Lemaitre, E. Nicolas, L. Michaut, J. M. Reichhart, J. A. Hoffmann, Cell 86,
973 (1996).
49. C. A. Janeway, Jr., R. Medzhitov, Annu Rev Immunol 20, 197 (2002).
50. J. Y. Ooi, Y. Yagi, X. Hu, Y. T. Ip, EMBO Rep 3, 82 (2002).
51. S. Tauszig, E. Jouanguy, J. A. Hoffmann, J. L. Imler, Proc Natl Acad Sci U S A
97, 10520 (2000).
52. J. L. Imler, unpublished data.
53. C. Luo, L. Zheng, Immunogenetics 51, 92 (2000).
54. P. Ligoxygakis, P. Bulet, J. M. Reichhart, EMBO Rep 3, 666 (2002).
55. C. Barillas-Mury et al., EMBO J 15, 4691 (1996).
56. B. Lemaitre et al., EMBO J 14, 536 (1995).
57. F. Leulier, S. Vidal, K. Saigo, R. Ueda, B. Lemaitre, Curr Biol 12, 996 (2002).
58. C. Barillas-Mury, Y. S. Han, D. Seeley, F. C. Kafatos, EMBO J 18, 959 (1999).
59. L. O. Baumbusch et al., Nucleic Acids Res 29, 4319 (2001).
60. M. Meister, C. Hetru, J. A. Hoffmann, Curr Top Microbiol Immunol 248, 17
(2000).
61. P. Bulet, C. Hetru, J. L. Dimarcq, D. Hoffmann, Dev Comp Immunol 23, 329
(1999).
62. A. M. Richman et al., Insect Mol Biol 5, 203 (1996).
63. J. Vizioli et al., Proc Natl Acad Sci U S A 98, 12630 (2001).
64. Y. Shi, Mol Cell 9, 459 (2002).
65. S. Y. Vernooy et al., J Cell Biol 150, F69 (2000).
66. V. Jesenberger, S. Jentsch, Nat Rev Mol Cell Biol 3, 112 (2002).
67. We thank EUROGENTEC and QIAGEN Operon for the long oligonucleotides
used in the microarray experiments.