SUPPLEMENTARY MATERIAL

Immunity-related genes and gene families in Anopheles gambiae: A comparative

genomic analysis

George K. Christophides,1* Evgeny Zdobnov,1* Carolina Barillas-Mury,2 Ewan Birney,3

Stephanie Blandin,1 Claudia Blass,1 Paul T. Brey,4 Frank H. Collins,5 Alberto Danielli,1

George Dimopoulos,6 Charles Hetru,7 Ngo T. Hoa,8 Jules A. Hoffmann,7 Stefan M.

Kanzok,8 Ivica Letunic,1 Elena Levashina,1 Thanasis G. Loukeris,9 Gareth Lycett,1

Stephan Meister,1 Kristin Michel,1 Luis F. Moita,1 Hans-Michael Mueller,1 Mike A. Osta,1

Susan M. Paskewitz,10 Jean-Marc Reichhart,7 Andrey Rzhetsky,11 Laurent Troxler,7

Kenneth D. Vernick,12 Dina Vlachou,1 Jennifer Volz,1 Christian von Mering,1 Jiannong

Xu,12 Liangbiao Zheng,8 Peer Bork,1 Fotis C. Kafatos1#

1 European Molecular Biology Laboratory, Meyerhofstr. 1, D-69117 Heidelberg, Germany.

2 Colorado State University, Department of Microbiology, Immunology and Pathology (MIP), Fort Collins, CO 80523-1682, USA.

3 European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

4 Unité de Biochimie et Biologie Moléculaire des Insectes, Institut Pasteur, 25 rue du Dr. Roux 75724 Paris Cedex 15 France.

5 Center for Tropical Disease Research and Training, University of Notre Dame, P.O. Box 369, Notre Dame, IN 46556-0369, USA.

6 Department of Biological Sciences, Centre for Molecular Microbiology & Infection, Imperial College of Science, Technology and Medicine, London SW7 2AZ, UK.

7 Institut de Biologie Moléculaire et Cellulaire, Unité Propre de Recherche, 9022 du Centre National de la Recherche Scientifique, 15 rue Descartes, F67084 Strasbourg Cedex France.

8 Yale University School of Medicine, Epidemiology and Public Health, 60 College Street, New Haven, CT 06520 USA.

9 IMBB-FORTH, Vassilika Vouton, P.O.Box 1527, GR-711 10 Heraklion, Crete, Greece.

10 Department of Entomology, 237 Russell Lab, 1630 Linden Drive, University of Wisconsin, Madison, Wisconsin 53706, USA.

11 Columbia Genome Center and Department of Medical Informatics, Columbia University, Russ Berrie Medical Science Pavilion, 1150 St. Nicholas Avenue, New York, NY 10032 USA.

12 Department of Medical and Molecular Parasitology, New York University School of Medicine, 341 East 25th Street, Room 613, New York, NY 10010, USA.

*Contributed equally to the work

#To whom correspondence should be addressed. Email: [email protected]

1. Methods

Identification of immunity protein families. We employed a range of sequence analysis

procedures combined with careful manual analysis to derive the immunity-related

proteins. First, the predicted proteomes of A. gambiae (1) and D. melanogaster (2) were

compared by means of Smith-Waterman pairwise alignments of all against all proteins.

Second, sequences with significant similarity were clustered together based on pairwise

scores by a single-linkage clustering algorithm. Knowing the pitfalls of this method, we

also experimented with the newly introduced MCL clustering algorithm (3). With a

single-linkage cut-off at an e-value of 10^-20 and an MCL inflation value of 3, both methods

produced a similar number of clusters. However, MCL placed in clusters additional proteins

that the single-linkage algorithm treated as singletons. These data were combined with all

identified InterPro signatures of known protein domains and families using the

InterProScan (4) package. In general, the combination of InterPro and single-linkage

clustering yielded the most relevant results as judged by further manual analysis by

experts. In some cases, protein families were further screened for possible missed gene

predictions by scanning Anopheles and Drosophila genomic sequences (release 3) with

characteristic HMMs (5), trained on manually verified multiple alignments of the known

family members, using GeneWise (6) and HMMer software (S. Eddy,

http://hmmer.wustl.edu/).
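The single-linkage step described above can be illustrated with a short sketch. It is a minimal example, not the authors' implementation: the protein identifiers and e-values are invented, the input is assumed to be a list of (protein, protein, e-value) tuples from the all-against-all comparison, and treating the 10^-20 cut-off as inclusive is also an assumption. Any pair at or below the cut-off merges the two clusters involved, so two proteins can end up in the same cluster through an intermediate without sharing a significant hit themselves, a commonly noted weakness of single linkage.

```python
from collections import defaultdict


def single_linkage_clusters(hits, evalue_cutoff=1e-20):
    """Group proteins that are connected by at least one hit at or below the cut-off."""
    parent = {}

    def find(x):
        # Register unseen proteins as their own cluster root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b, evalue in hits:
        root_a, root_b = find(a), find(b)  # registers both proteins, linked or not
        if evalue <= evalue_cutoff and root_a != root_b:
            parent[root_a] = root_b  # merge the two clusters

    clusters = defaultdict(set)
    for protein in parent:
        clusters[find(protein)].add(protein)
    # Largest clusters first; members sorted for stable output.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: (-len(c), c))


# Hypothetical pairwise hits: (protein, protein, e-value).
hits = [
    ("AgP1", "DmP1", 1e-45),
    ("DmP1", "AgP2", 1e-30),  # chains AgP2 to AgP1 through DmP1
    ("AgP3", "DmP2", 1e-5),   # too weak, AgP3 stays a singleton
    ("AgP4", "DmP2", 1e-22),
]
clusters = single_linkage_clusters(hits)
```

In this toy input AgP1 and AgP2 are never compared directly, yet they fall into one cluster because both are linked to DmP1.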

Phylogenetic analysis. Full length, or partial predicted sequences where appropriate, were

aligned using Clustal X programs and cladograms constructed by neighbour-joining

analysis and displayed through Treeview. Detailed nucleotide comparison and

cytogenetic mapping were carried out with reference to the Ensembl genome interface

(www.ensembl.org/Anopheles_gambiae) and Flybase (http://flybase.bio.indiana.edu/).

Genes were only considered as 1:1 orthologs if the relevant bootstrap values were above

800 (1000 iterations).
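The bootstrap criterion amounts to a simple filter on candidate pairs. The sketch below assumes that support is reported as a count out of the number of replicates and reads "above 800" as a strict inequality; the gene pairs and support values are hypothetical and serve only to illustrate the comparison.

```python
def passes_bootstrap(support, replicates=1000):
    """Return True if support is strictly above 800 per 1000 replicates."""
    # Integer arithmetic avoids floating-point edge cases at the threshold.
    return support * 1000 > 800 * replicates


# Hypothetical candidate pairs with made-up bootstrap counts (out of 1000).
candidates = {
    ("AgGeneA", "DmGeneA"): 987,
    ("AgGeneB", "DmGeneB"): 800,  # exactly at the threshold, so not accepted
    ("AgGeneC", "DmGeneC"): 412,
}

accepted = [pair for pair, support in candidates.items() if passes_bootstrap(support)]
```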

EST and oligonucleotide microarray analysis. Mosquito EST microarray construction,

hybridization and analysis were performed as described (7). Developmental profiling was

performed using embryonic, fourth-instar larval, pupal and newly emerged adult female

stages of A. gambiae, Suakoko strain. A pool of total RNA prepared from all stages was

used as the reference sample. For immune challenges, 2- to 3-day-old female mosquitoes of

the Plasmodium-susceptible strain 4a r/r (8) were pricked with a needle that was either sterile or

dipped in a thick suspension of E. coli or S. aureus. Total RNA was collected 12 hrs after

challenge. For malaria infection experiments, 4a r/r mosquitoes were fed on control

Balb/c mice or on mice infected with P. berghei, and mosquito RNA samples collected at

24 hrs, 28 hrs, 6 days, 11 days and 16 days post-infection.

Oligonucleotide primers were designed to amplify individual genes from a cDNA

library or adult genomic DNA (average probe length 500 bp). PCR products were purified

with ion exchange columns (Macherey-Nagel GmbH & Co. KG, Dueren, Germany) and

spotted at 500 ng/µl in 3X SSC. Anti-sense oligonucleotides (60- to 70-mers) were

designed by EUROGENTEC (Seraing, Belgium) and resuspended in 3X SSC at 50 µM

prior to spotting. A 5' amino modification of the oligonucleotides proved

unnecessary. Spotting was performed on aminosilane coated glass slides using the

Omnigrid arrayer (GeneMachines, San Carlos, CA) and Telechem Stealth Pins

(Telechem International, Sunnyvale, CA). Cell line 4a3B (9) was challenged with

paraformaldehyde-fixed E. coli and S. aureus (OD 0.05), PGN (10 µg/ml) and H2O2 (2

µM). Duplicate RNA samples were collected 12 hrs after challenge and hybridized to

arrays as described (7). RNA prepared from naïve cells was used as reference.

Figure legend for phylogenetic trees. The following color scheme was used in all the

trees, if not noted otherwise. Red, A. gambiae; blue, D. melanogaster; green, vertebrates;

black, other invertebrates and common stems. Pink and blue shadings indicate putative

gene family expansions in mosquito and the fruitfly respectively. Physical location

(chromosomal subdivision) of genes or gene clusters is given. 1:1 orthologs or orthologous

groups are highlighted with filled or open circles, respectively. Species abbreviations:

Aclu, Acalolepta luxuriosa; Aeae, Aedes aegypti; Aeal, Aedes albopictus; Ag, Anopheles

gambiae; Aecy, Aeshna cyanea; Aldi, Allomyrina dichotoma; Anau, Androctonus

australis hector; Anpe, Antheraea pernyi; Anst, Anopheles stephensi; Apme, Apis

mellifera; Arsu, Armigeres subalbatus; Bomo, Bombyx mori; Bopa, Bombus pascuorum;

Bota, Bos taurus; Caet, Calpodes ethlius; Ce, Caenorhabditis elegans; Ceca, Ceratitis

capitata; Chpl, Chironomus plumosus; Crgi, Crassostrea gigas; Dm, D. melanogaster;

Dv, Drosophila virilis; Foru, Formica rufa; Game, Galleria mellonella; Hevi, Heliothis

virescens; Hs, Homo sapiens; Hyce, Hyalophora cecropia; Hycu, Hyphantria cunea;

Lequ, Leiurus quinquestriatus; Maja, Marsupenaeus japonicus; Mase, Manduca sexta;

Mumu, Mus musculus; Pale, Pacifastacus leniusculus; Myed, Mytilus edulis; Papr,

Palomena prasina; Pemo, Penaeus monodon; Pihy, Pimpla hypochondriaca; Poma,

Podisus maculiventris; Prte, Protophormia terraenovae; Pyap, Pyrrhocoris apterus;

Sabu, Sarcophaga bullata; Sape, Sarcophaga peregrina; Spfr, Spodoptera frugiperda;

Stca, Stomoxys calcitrans; Susc, Sus scrofa; Temo, Tenebrio molitor; Tefl, Tetraodon

fluviatilis; Trni, Trichoplusia ni.

2. Supporting online text

Nomenclature

This study identified a substantial number of genes, approximately 2% of the total

predicted in the current annotation of the Anopheles genome, and belonging to more than

18 protein families. To facilitate future work on this large gene set we named the genes

systematically according to provisional nomenclature rules, modeled on those

recommended by the Human Genome Organization (HUGO) for naming human genes.

Following consultation in the Anopheles genomics community, and to avoid unsystematic

and duplicate names we recommend this as a provisional nomenclature system for the

entire A. gambiae genome, to be supervised by an international committee that is being

set up. The rules are as follows:

1. The names are mnemonic symbols, designed for easy recall. They do not aim to

summarize all current information, which in any case is incomplete and subject to

errors (orthology, function, chromosomal location).

2. To avoid errors in electronic communication all names consist exclusively of

capital letters of the Latin alphabet and Arabic numerals; no punctuation marks,

dashes etc. are used.

3. To minimize the length the formal names do not include taxonomic initials. If

similarly named genes of two organisms are being compared, taxonomic initials

can be added for convenience, but do not constitute part of the name (e.g. aTEP to

be easily distinguished from dTep).

4. Roman letters and numerals indicate protein, italics indicate gene or RNA.

5. The name is based on sequence similarities and carries no functional implications,

which must be determined experimentally.

6. The name consists of two to three contiguous fields, as follows:

- The first field includes three to five letters and is an abbreviation of the

highest sequence grouping used, usually a protein family, e.g. CLIP (for

Clip-domain serine protease).

- The second field, if present, includes one or more letters identifying a

subgroup such as subfamily (e.g. CLIPD), or class (e.g. SCRB).

- The third field enumerates each gene by using consecutive numerals (e.g.

SCRB1,… 12).

- Sometimes the third field numeral can be preceded by letter(s) indicating

gene types within a subgroup (e.g. SCRBQ1, for a gene belonging to the

SCRB Class, and to the croquemort type).

- For historical reasons, in certain families, the third field can also

enumerate by letters rather than numerals (e.g. PGRPLA, for gene A of the

Long subfamily in the PGRP family).

7. It is recommended that names previously used in the literature or in database

submissions be gradually replaced by systematic names, following consultation

with the original author (we have done so for genes previously described by the

authors of the present study). Historical names or names that may be developed

eventually to indicate experimentally verified function or orthology can be used as

synonyms.
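Rules 2 and 6 lend themselves to a mechanical check. The following is a minimal sketch under stated assumptions, not an official validator: the list of family prefixes is taken from names mentioned in this text and is purely illustrative, and the function names are hypothetical. The sketch only separates the family prefix, any letters that follow it, and whatever remains from the first digit onward. Names that enumerate by letter (e.g. PGRPLA) cannot be split into subgroup and enumerator without family-specific knowledge, so their letters are left together.

```python
import re

# Illustrative family prefixes drawn from names used in this text
# (an assumption, not an official registry).
FAMILIES = ("PGRP", "TOLL", "GNBP", "CLIP", "SRPN", "GALE", "TEP", "SCR", "CTL")

# Rule 2: capital Latin letters and Arabic numerals only.
SYMBOL_RE = re.compile(r"[A-Z0-9]+")


def is_valid_symbol(name):
    """Check rule 2: no lowercase letters, punctuation marks or dashes."""
    return SYMBOL_RE.fullmatch(name) is not None


def parse_gene_name(name):
    """Split a systematic name into (family, letters, remainder)."""
    if not is_valid_symbol(name):
        raise ValueError(f"{name!r} contains characters other than A-Z and 0-9")
    family = next((f for f in FAMILIES if name.startswith(f)), None)
    if family is None:
        raise ValueError(f"{name!r} does not start with a known family prefix")
    rest = name[len(family):]
    # Leading letters form the subgroup/type part; the remainder starts
    # at the first digit (it may end in a letter, as in TOLL1A).
    match = re.fullmatch(r"([A-Z]*)(.*)", rest)
    return family, match.group(1), match.group(2)


examples = ["CLIPD1", "SCRBQ1", "TEP13", "TOLL1A", "PGRPLA"]
parsed = {name: parse_gene_name(name) for name in examples}
```

Names carrying a taxonomic initial or a dash, such as aTEP1 or PGRP-LC, fail the rule 2 check by design, since rule 3 states that such initials are not part of the formal name.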

Recognition of infectious non-self

Peptidoglycan Recognition Proteins (PGRPs).

Members of this family have one or more PGRP domains and are important components

of insect immune reactions. The first PGRP was characterized in B. mori as a hemolymph

protein that binds peptidoglycan (PGN) and activates the PPO cascade (10). In

Drosophila, secreted PGRP-SA is essential for activating the Toll signaling pathway

mediated response to Gram positive (Gram+) bacteria, but not to fungi (11), while two

PGRP-LC isoforms act via an alternative immune signaling pathway (Imd) that responds

to Gram- bacteria to induce certain antimicrobial peptides (12), (13). A genome-wide

RNAi screen of Drosophila cells in culture (14) points to PGRP-LC as a key player for

phagocytosis of Gram- but not Gram+ bacteria.

Of the 13 Drosophila genes that encode PGRP domains (15), seven are classified

as short (S) and encode secreted proteins, while six genes of the long (L) subfamily

encode transmembrane or intracellular products. The Anopheles genome includes three

members of the short subfamily (S1, S2 and S3) and four of the long subfamily (LA, LB,

LC and LD). The latter are clear orthologs of correspondingly named Drosophila genes.

However, unlike the Drosophila PGRP-LB protein that is thought to be intracellular,

Anopheles PGRPLB has a putative transmembrane domain; this gene is strongly

upregulated in cells challenged with immune elicitors (7).

Thioester-containing proteins (TEPs).

In C. elegans, the only TEP (cTEP) identified in the genome displays the conserved

thioester (TE) motif. It is mostly expressed in epithelial cells throughout worm

development and is not induced by immune stimuli.

All six Drosophila Tep proteins (dTep) except dTepV have clear signal peptides,

indicating that they are secreted proteins. dTepI, II and IV are immune-inducible, whereas

dTepIII is expressed only during early developmental stages. dTepII is unique among the

TEP genes as it encodes five alternatively-spliced forms. In contrast, dTepV may not be

an active gene, as it has never been amplified from cDNA libraries. In all, Drosophila can

produce nine or ten distinct TEPs (16, 17).

Sequence comparison of all cTEP, dTeps, and aTEPs identifies only a single 1:1

orthologous pair, dTepVI and aTEP13 (Fig. 3), suggesting that these proteins (which lack the TE

motif) might serve highly similar functions in the two insect species. In addition, dTepIII

forms an orthologous group (OG) with aTEP2 and aTEP15; all three proteins have a TE

motif. Three TEP sequences, aTEP12, aTEP14 and cTEP, are highly diverged, forming

deep branches in the tree. Finally, two sequence clusters represent species-specific

expansions of the family, one including exclusively four dTeps and the other ten aTEPs.

The Drosophila-specific expansion includes three proteins with and one without a TE

motif, while the Anopheles-specific expansion includes two with and six without TE

motif (two other proteins are uncertain, or they are only partially represented in the

sequence). The aTEP1 protein (originally designated aTEP-I) is bacterially induced and

promotes phagocytosis of bacteria (18); aTEP4 (originally designated IMCR14) is

strongly upregulated by Plasmodium (19); aTEP3 (lacking a TE motif) is upregulated

upon bacterial challenge. We speculate that the species-specific family expansions

represent finely tuned radiation of TEPs in response to distinct pathogenic environments

in the fruitfly and the mosquito; and that in insect TEPs more often than in vertebrate

TEPs, protein-specific function does not require the TE motif.

Gram Negative Binding Proteins (GNBPs).

Proteins of this family show homology to the catalytic region of bacterial β-1,3 and β-

1,3-1,4 glucanases (20), (21). Drosophila GNBP-1 exists in both soluble and GPI-

anchored forms and plays an important role in innate immune signaling in response to

bacterial lipopolysaccharides (22). In addition to silkworm Bombyx mori and fruitfly

homologs, one A. gambiae GNBP is known. Characterized moth and mosquito genes are

upregulated by immune challenge, whereas the fruitfly genes are constitutively expressed

at specific developmental stages.

The A. gambiae genome includes 6 GNBP genes which, together with known

moth and fruitfly homologues, reveal two distinct sequence groups (Fig. S1A). Subfamily

A includes all known fruit fly and moth as well as two mosquito sequences (GNBPA1,2).

The Anopheles GNBPA2 gene and the Drosophila GNBP3 gene are orthologs. A new

subfamily B is mosquito-specific (GNBPB1,2,3,4), and three of its four members are

tightly clustered on chromosomal subdivision 13E (Fig. S1B). Interestingly, the mosquito

genes differ widely in intron-exon structure, showing two to five introns at non-conserved

locations (Fig. S1C).

Scavenger Receptors (SCRs).

Members of this diverse family of multidomain, transmembrane or secreted receptors

play important roles in innate immunity and development. They recognize modified LDL,

multiple polyanionic ligands and cell wall components, and thus help internalize bacteria

and clear apoptotic cells (23). We considered three major classes, named A, B and C (Fig.

S2A).

Proteins of the A class (SCRA) are associated with macrophages and bear

collagenous and coiled-coil domains that bind polyanionic ligands and serve receptor

trimerization, respectively. Some members of this subfamily contain a Scavenger

Receptor Cysteine-Rich (SRCR) domain which, in a human protein (MARCO), binds

both Gram+ and Gram- bacteria (24). Five SRCR-containing proteins and 4 orthologous

pairs exist both in the fruitfly and in the mosquito (Fig. S2B). The Drosophila protein

Tequila/GRAAL and its Anopheles ortholog SCRASP1 (formerly Sp22D, enriched in

hemocytes (25)) additionally bear multiple chitin-binding domains (CBDs) and a

C-terminal domain related to coagulation and inflammatory serine proteases (Fig. S2A).

SCRASP2 is similar but lacks CBDs. SCRAC bears a partial C-type lectin domain, and

the fourth orthologous pair, SCRAL, matches the Lysyl oxidase (lys_ox) domain of

human Lox proteins (copper-containing amine oxidases that convert primary amines to

reactive aldehydes) (26).

The numerous members of the B class (SCRB) represent the CD36 family of

receptors, associated with the uptake of multiple ligands and erythrocytes infected with P.

falciparum (27). A total of 15 Anopheles and 12 Drosophila genes belong to this family,

including 8 orthologous pairs (Fig. S2C). One sequence cluster (SCRBQ) includes five

Drosophila and four Anopheles members but only a single 1:1 ortholog; one member is

Croquemort, a macrophage receptor that mediates binding and phagocytosis of apoptotic

corpses (28). Another sequence cluster consists almost exclusively of 1:1 orthologs and

includes the fruitfly epithelial membrane protein, EMP (29).

The third class, SCRC, includes four Drosophila members, each with two

complement-control protein (CCP) domains followed by a MAM domain (Meprin A5

antigen and RPTP Mu), and usually a somatomedin-B-like (BO) domain (Fig. S2A).

Three members have been described previously as dSR-CI, -CII and -CIII (30) and are

thought to function as PRRs in phagocytosis and innate immunity; CCP together with

MAM bind bacteria in vitro (31). The macrophage-specific dSR-CI recognizes a broad

range of polyanionic ligands, much like the mammalian SCRA homologues. The single

mosquito member of this class resembles dSR-CI and dSR-CII but, surprisingly, bears

two transmembrane domains, at the NH2 and COOH termini (Fig. S2A), according to the

current annotation.

C-Type Lectins (CTL).

These extracellular proteins, which are membrane-bound or soluble, are named for their

Ca2+ dependence and have a ca. 130 residue carbohydrate recognition domain (CRD),

with 18 highly conserved residues including 4 cysteines (32). Some insect CTLs show

affinity for LPS, increase in abundance after body wall injury or are stage-specific,

suggesting roles in both immunity and development (33). In general, seven groups (I-VII)

have been defined by sequence similarity. In both Drosophila (24/34 members) and

Anopheles (17/22) group VII predominates; it bears a single CRD without flanking

accessory domains. Eleven orthologous pairs exist; those bearing a QPD tripeptide motif

are expected to bind preferentially galactose (CTLGA subfamily), while a pair showing

an EPN-motif (CTLMA subfamily) should bind mannose (34). Several species-specific

family expansions exist. The largest one is in Drosophila and includes 12 genes, all of

which are limited to two chromosomal regions. The largest mosquito expansion has

generated five additional CTLMA members, four of which are clustered within 12 kb at

25D (Fig. S3).
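The motif-based expectation stated above (QPD for galactose, EPN for mannose) can be expressed as a small lookup. This is a deliberately simplified sketch: it searches for the tripeptide anywhere in the supplied sequence, whereas a real analysis would locate it at the appropriate position within an aligned CRD. The function name and the sequence fragments are hypothetical.

```python
def predicted_specificity(crd_sequence):
    """Guess the preferred sugar from the QPD / EPN tripeptide motifs."""
    seq = crd_sequence.upper()
    has_qpd = "QPD" in seq  # motif associated with galactose binding
    has_epn = "EPN" in seq  # motif associated with mannose binding
    if has_qpd and has_epn:
        return "ambiguous"
    if has_qpd:
        return "galactose"
    if has_epn:
        return "mannose"
    return "unknown"


# Hypothetical CRD fragments, invented for illustration only.
fragments = {
    "toy_CTLGA": "CAELQPDNWGGE",
    "toy_CTLMA": "CVEIEPNNYKGS",
    "toy_other": "ACDEFGHIK",
}
predictions = {name: predicted_specificity(seq) for name, seq in fragments.items()}
```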

Three 1:1 orthologs are complex lectins possessing additional recognizable

domains. One of these orthologs includes SCRAC1 (see above). The two others show

resemblance to vertebrate selectins, integral transmembrane proteins involved in cell

adhesion (35), although they lack an EGF domain present in all other selectins. Each of

these CTLSEs contains 10 Sushi repeats, adhesive units of 50-70 residues that are found

in many proteins participating in the complement immune responses of mammals (36).

The Drosophila gene furrowed encodes one of these CTLSE members, CG1500.

Galectins (GALEs).

These thiol-dependent lectins are distributed widely in metazoa, sponges and

multicellular fungi. A conserved 130 residue core forms the family-specific globular

CRD, which can interact with β-galactosides. Sequence variation within the core is

significant (20-40% in mammals), and together with a multiplicity of family members

suggests involvement in diverse functions. Galectins function in both development and

immunity: in vertebrates they are involved in cell fate determination, cell proliferation,

apoptosis and innate immunity (37, 38). The Drosophila galectin, Dmgal, contains two

tandem CRDs.

The A. gambiae and D. melanogaster genomes encode eight and five galectins,

respectively. A group of five galectins, GALE4-8, represents a species-specific expansion

in the mosquito (Fig. S4), and these galectins have a single CRD architecture. Two

sequence clusters represent the double-CRD Dmgal architecture; they include GALE1

and GALE2 and their respective orthologs, Dmgal and CG5335, plus an additional

Drosophila member in each cluster. Finally, GALE3 and its ortholog, CG18565, bear

single CRDs and dysferlin domains of unknown function.

Signal Modulation and Amplification

Clip domain serine proteases (CLIPs).

Seventy-six CLIPs with complete clip domains were analyzed: 41 in A. gambiae and 35

in D. melanogaster. Most are structurally similar, beginning with a short signal peptide

followed by the clip domain, then a linker region of highly variable length and the serine

protease domain. A few contain additional domains upstream of the clip domain or

downstream of the protease domain. Seven CLIPs contain two clip domains, 3 in A.

gambiae and 4 in D. melanogaster (see Table S1).

CLIPs can be grouped into 4 subfamilies, A-D. Subfamily A contains 21 members

(ten in Anopheles), but only two 1:1 orthologs. Most members have substitutions at one

or more of the critical His/Asp/Ser triad within the catalytic domain. Most members also

have an unusual arrangement within the conserved serine motif: GDGGSP, instead of

GDSGGP. Vertebrate serine proteases usually have eight cysteines within the catalytic

domain, and six of these are shared by invertebrate serine proteases (39, 40). A striking

feature of subfamily A is the appearance of the missing pair of cysteines within the

catalytic domain (C13/C17). Subfamily A is also characterized by an unusually short

spacing between C1 and C2 of the clip domain. This family contains several CLIPs that

have recently been identified as upregulated after bacterial infections in A. gambiae

(CLIPA1, 6, 7).

Subfamily B has 27 members in Anopheles and Drosophila, including Easter and

several A. gambiae CLIPs (CLIPB1, 2, 4, 8, 9, 10, 14, 15) previously identified (41-43).

This subfamily includes three 1:1 orthologs and one OG. CLIPB1, 4 and 9 show modest

levels of upregulation following bacterial or malaria parasite infections in Anopheles.

Nearly all members of this subfamily can be identified by the presence of another pair of

cysteines within a short insertion in a region between the His and Asp residues of the

catalytic triad (C10/C11). They also tend to have clear activation sites at the beginning of

the catalytic domain of the form RXXGG, suggesting that proteases with specificity for

cleaving after Arg will be necessary to activate most members of this subfamily. The

prophenoloxidase activating enzymes of Holotrichia, Bombyx and Manduca all belong to

subfamily B.

Subfamily C has twelve members, including the previously identified Drosophila

proteins Persephone and Snake, and the Anopheles CLIPC3 and C4 (41). The subfamily

contains two OGs but no 1:1 orthologs. Several members contain a Cys residue at

position 13 after the active site Ser residue. This subfamily is also characterized by an

activation site, where cleavage is expected to occur after the His or Leu residues.

Subfamily D has 15 members, including three 1:1 orthologs and two OGs. A known member

is the Drosophila protein Stubble, which is important in cytoskeleton organization and

biogenesis (44, 45). Members of this group share the motif RIVGG at the activation site.

In total, we identified eight putative 1:1 orthologs and five putative OGs between

the two fly species, as well as several specific expansions in each species (Fig. 4A). An

analysis of the chromosomal location of all the proteases by subfamily also indicates

additional substantial clustering (data not shown). Finally, four Drosophila and three

Anopheles CLIPs bear two Clip domains.
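The sequence signatures mentioned for the CLIP subfamilies (the GDSGGP serine motif and its GDGGSP variant, the RXXGG activation site, and the RIVGG motif) can be looked for with plain string and regular-expression searches. The sketch below assumes one-letter amino acid codes and uses invented toy sequences; it reports which signatures are present and makes no attempt at subfamily assignment, which in the study relied on phylogenetic analysis.

```python
import re

# RXXGG: Arg, any two residues, then Gly-Gly. The lookahead lets
# overlapping occurrences be reported as well.
RXXGG = re.compile(r"(?=R..GG)")


def clip_signatures(sequence):
    """Report which of the motifs named in the text occur in a sequence."""
    seq = sequence.upper()
    return {
        "GDSGGP": "GDSGGP" in seq,  # usual arrangement of the serine motif
        "GDGGSP": "GDGGSP" in seq,  # arrangement reported for most subfamily A members
        "RXXGG_positions": [m.start() for m in RXXGG.finditer(seq)],  # 0-based
        "RIVGG": "RIVGG" in seq,    # activation-site motif reported for subfamily D
    }


# Hypothetical toy sequences, not real CLIP proteins.
toy_canonical = "MKTLLRIVGGAAAHCGDSGGPLL"
toy_variant = "MKTLLRAAGGAAAHCGDGGSPLL"

canonical_report = clip_signatures(toy_canonical)
variant_report = clip_signatures(toy_variant)
```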

Serpins (SRPNs).

These are well-conserved proteins, 350-400 residues long. Inhibitory serpins act as

suicide substrates, mostly for serine and more rarely cysteine proteases. Following a

variable, more or less structured N-terminal region, the compact serpin core fold consists

of three β-sheets, 7-9 α helices, a hinge and a C-terminal flexible Reactive Center Loop

(RCL) (46). The RCL acts as bait for the target protease, which cleaves a scissile P1-P1´

peptide bond in the RCL. The sequence of the hinge region determines (47) whether the

cleaved RCL can insert efficiently into the A β-sheet, dramatically distorting and

repositioning the bound protease (inhibitory serpins), or not (non-inhibitory serpins).

The majority of the serpin genes in Drosophila and Anopheles are physically

clustered, at 4 chromosomal locations in each species. Eight fruit fly serpins appear to

represent a species-specific expansion (Fig. 4B).

Signal transduction pathways

Tolls and the Toll Pathway.

Drosophila Toll (which we suggest renaming Toll-1 for consistency) was

originally identified as a key player in dorsoventral patterning in

embryogenesis, and was later shown to be also required for immune induction of

antifungal and anti-Gram+ bacterial responses (48), (11). This discovery and the

subsequent characterization of Toll-like receptors (TLR) in mammals have made the Toll

family and its signaling pathway paradigms of innate immune regulation (49). The Toll

family encodes single-pass transmembrane proteins, with leucine-rich repeats (LRR)

interspersed with cysteine knots in the N-terminal rapidly evolving extracellular domain,

and a C-terminal, intracellular Toll-interleukin 1 receptor (TIR) domain. Drosophila

encodes 8 additional family members, Toll-2 to -9 (50, 51). All members show

developmentally specific expression patterns (52). While the dual function of Drosophila

Toll-1 is clear, the functions of Toll-2 to -9 are only now emerging. Toll-5 and -9 have

been associated with the antifungal response (50, 51, 53) and Toll-2 is required for

general antimicrobial gene expression in larvae (54).

Two fruit fly genes, Toll-1 and -5, form a probable OG together with four mosquito

genes, TOLL1A, 1B, 5A and 5B (Fig. 4C); only the mosquito genes have an additional

intron within the TIR-coding region. Interestingly, Drosophila Toll-1 and -5 are closely

related in their TIR domain, lack this intron and cluster in the phylogenetic tree apart

from the mosquito genes, suggesting the possibility of divergent evolution occurring in

parallel within each taxonomic lineage. However, a long C-terminal extension of similar

sequence, with a nearly identical 18-residue segment, identifies Toll-1 and TOLL1A as

putative orthologs. A related protein, TOLL1B, has a shorter and more divergent

extension. Two others, TOLL5A and 5B, have the shortest tails, a feature that is shared

with Toll-5. Although Toll-1 and -5 are unlinked, the mosquito genes are physically

clustered, TOLL1A with 5A at cytological location 6C and TOLL1B with 5B at 39B, as

determined by in situ hybridization. We suggest that the type 1 and 5 Tolls were

ancestrally fixed and duplicated in the mosquito lineage, giving the TOLL1A/5A and

TOLL1B/5B gene clusters; and that selection is maintaining both sequence similarities

and contrasting features (length of C-terminal tail) in type 1 and 5 Tolls.

In Drosophila, the end points of the Toll pathway are the Rel transcription factors,

Dorsal and Dif. The ortholog of Drosophila Dorsal, Gambif-1, was identified previously

(55). Examination of the genome sequence identified a new 3′-end exon for Gambif-1,

which may be alternatively spliced as in D. melanogaster (where a BigDorsal cDNA has

been characterized). However, no mosquito gene encoding an ortholog of Dif has been

found.

Imd and STAT pathways.

In Drosophila, Gram- bacterial infections signal through the alternative Imd pathway,

leading to nuclear translocation of the Rel transcription factor Relish and subsequent

expression of antibacterial AMPs like cecropin (56). An Anopheles ortholog of Relish has

been detected (Fig. S5). Recently it has been demonstrated that PGRP-LC in Drosophila

functions as a receptor for the Imd pathway (57). Exhaustive analysis of additional

signaling pathways is beyond the scope of the present study, but it appears that the Imd

pathway is also operative in the mosquito.

Other than Rel factors, STATs are important nuclear mediators of immunity. In

Drosophila, only one STAT gene is known. In Anopheles, STAT1 is produced from an

intron-less gene, and undergoes nuclear translocation after immune challenge (58). A

second gene, STAT2, was identified (Fig. S6), which possesses six introns (three at the

same locations as in DmSTAT). The two mosquito STAT genes are closely related (46%

identity, closer than either is to DmSTAT), indicating duplication after separation of the

mosquito and Drosophila lineages. We cannot exclude the possibility that STAT1 arose

by retrotransposon-mediated gene duplication (59).
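The identity comparison behind this argument can be illustrated with a minimal sketch. It assumes the sequences have already been aligned to equal length, with "-" as the gap character. The three short strings are invented toy sequences, not the real STAT proteins, and the function names are hypothetical.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def closest_pair(aligned: dict) -> tuple:
    """Return the pair of names with the highest pairwise percent identity."""
    return max(
        combinations(aligned, 2),
        key=lambda pair: percent_identity(aligned[pair[0]], aligned[pair[1]]),
    )


# Toy alignment (invented sequences, for illustration only).
toy = {
    "mosquito_a": "MKLV-TAGQ",
    "mosquito_b": "MKLVETSGQ",
    "fly": "MRLIETSAQ",
}
```

In this toy case `closest_pair(toy)` picks the two mosquito sequences, which is the pattern expected when a duplication postdates the split between the two lineages. Identity alone is only a rough guide; a phylogenetic tree is the stronger evidence.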

Immune Effectors

Prophenoloxidases (PPOs).

PPOs evolved from hemocyanin oxygen transporters and, like them, contain two copper

binding sites, each with three essential histidines at conserved positions. The three newly

described members of the PPO family, AgPPO7, 8 and 9, contain the usual conserved

features, except that PPO9 lacks two sites (RF and RE) where proteolytic activation

normally occurs; alternative tryptic target sites may be used instead.

Seven mosquito genes are clustered in tandem orientation within ca. 65 kb at 21B

(2L), while PPO2 is located at 24B (2L). The apparently primitive PPO1 gene is located

outside the 2L chromosome (at 2R, subdivision 13B). The three Drosophila genes are not

physically clustered (Table S1).

Antimicrobial peptides (AMPs).

Insect AMPs can be assigned to three major classes: peptides with cysteine bonds (e.g.

defensins), linear peptides forming amphipathic and hydrophobic α-helices (e.g.

cecropins), and those where particular amino acids are over-represented. In infected

Drosophila, at least seven distinct families of AMPs are produced (60).

Insect defensins (DEFs) are preferentially active against Gram+ bacteria. More than

48 defensins have been reported from a wide range of insect orders and even scorpions

and mollusks (61). They are often 34-46 residues long and synthesized as precursors with

propeptides. The four Anopheles defensin genes are unlinked and very diverse (Fig. S7).

The most common types of defensin can now be described in terms of three clades. Clade

I may be specific for Diptera; it includes all the members in Aedes, Drosophila defensin

and the previously described defensin of Anopheles (62) which we shall call DEF1. Clade

II includes hymenopteran defensins but also those of the fly Stomoxys. Clade III includes

both hemipteran and dipteran (Chironomus) defensins. Clade IV includes highly

divergent defensins: three new members from Anopheles (DEF2, 3 and 4) and the dipteran

Sapecin C and lepidopteran Heliomicin. Finally, Clade V comprises the ancient defensins known

from a mussel, and the dragonfly Aeschna.

Cecropins (CECs) are especially potent against Gram- bacteria. More than 40 are

now known from various higher insects (Diptera and Lepidoptera), and they usually are

31-39 residues in length. Typical features are a tryptophan residue at position 1 or 2, and

post-translational amidation of a C-terminal glycine. Both features are thought to increase

peptide stability and efficacy against bacteria. Interestingly, when compared to the

Drosophila Cecropin A, which has these features, the only previously reported Anopheles

cecropin, CEC1, which lacks the tryptophan residue, was found to be effective against yeast

and a wider spectrum of Gram+ bacteria. Four physically clustered cecropin genes exist in

Drosophila. Four are also present in Anopheles (Fig. S8). However, the peptide

sequences of these species belong to different clades, specific for brachyceran and

nematoceran Diptera, respectively; a third clade represents the cecropins of Lepidoptera.

Interestingly, of the Anopheles cecropins only CEC1 is closely similar to the Aedes

cecropins, while CEC2, 3 and 4 are highly divergent, supporting the possibility of

diversified antimicrobial specificities. All Anopheles cecropins lack the tryptophan at the

N-terminus.
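The typical cecropin features described above (length of 31-39 residues, a tryptophan at position 1 or 2, and a C-terminal glycine) can be checked with a minimal sketch. The function name and the two peptides are invented for illustration; the glycine is tested on the unprocessed sequence, on the assumption that it is still present before amidation.

```python
def cecropin_features(peptide: str) -> dict:
    """Check a peptide against the typical cecropin features named in the text."""
    seq = peptide.strip().upper()
    return {
        # Cecropins are usually 31-39 residues long.
        "length_31_to_39": 31 <= len(seq) <= 39,
        # Tryptophan (W) at position 1 or 2.
        "trp_at_position_1_or_2": "W" in seq[:2],
        # C-terminal glycine (G), the residue that is amidated post-translationally.
        "c_terminal_gly": seq.endswith("G"),
    }


# Invented toy peptides, not real cecropin sequences.
with_features = "KW" + "A" * 30 + "G"  # 33 residues, W at position 2, ends in G
without_trp = "GR" + "A" * 30 + "K"    # 33 residues, no W, ends in K
```

Calling `cecropin_features(without_trp)` reports a peptide of typical length that lacks the tryptophan, the situation described for the Anopheles cecropins.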

One other antimicrobial peptide has been reported from Anopheles, Gambicin

(63). It has 61 residues and shows similarity to only one peptide, from Aedes (GenBank

accession number AAL76025). No paralogue of this gene was discovered in the full

Anopheles genome. In summary, at this stage it appears that the Anopheles AMPs are

encoded by the two most widespread AMP gene families and a mosquito-specific AMP

gene.

Caspases.

Caspases are a family of aspartate-dependent endopeptidases. One subfamily includes

enzymes processing pro-inflammatory cytokines (e.g. Interleukin-1β converting enzyme,

ICE), and the other includes enzymes regulating cell death (e.g. the product of the ced3

gene in C. elegans). Both subfamilies are divided into groups carrying either long (L) or

short (S) prodomains (64). L-prodomain caspases are initiators that respond to pro- and

anti-apoptotic regulators via adaptor molecules carrying homologous interaction domains.

Oligomerized initiator caspases auto-activate by self-cleavage into active

heterotetrameric subunits. For example, the Drosophila initiator caspase, DRONC, and

the adaptor Dark interact via their caspase-recruitment domains (CARD) to form

apoptotically active complexes. Similarly, the initiator caspase DREDD interacts via its

death effector domain (DED) with the apoptotic adaptor dFADD. These activated

apoptotic complexes initiate a proteolytic cascade of S-prodomain effector caspases,

which in turn cleave vital substrates and thus lead to cell death.

Genomic analysis has identified eleven, three and seven caspases in humans, C.

elegans and D. melanogaster, respectively (65). Omitting putative haplotypes, we identified 12

caspases in A. gambiae (Fig. S9) of which two are L initiator and 10 are S effector

caspases. Of the three Drosophila L-prodomain caspases, DRONC, DREDD, and

DREAM, the first two have Anopheles orthologs. DREAM does not, but its nearest

Drosophila paralog, DAMM, is a short-prodomain caspase, as are the most similar

Anopheles S9 and S10 caspases, which are clustered at locus 40B; this OG presumably

originated with a short-prodomain caspase, with DREAM representing a novel L-

prodomain member that evolved in the Drosophila lineage. Two other Drosophila S-

prodomain caspases, DRICE and DCP-1, are grouped with two mosquito caspases, S7

and S8. Finally, the fourth Drosophila effector caspase (DECAY) is associated in an OG

together with an expanded group of six Anopheles effector caspases, which are physically

clustered in two chromosomal loci, 21A and 43D. Interestingly, the mosquito initiators

CASPL1 and CASPL2 map to the same loci (Fig. S9B).

The negative cellular regulators of caspases, the inhibitor of apoptosis proteins (IAPs),

are characterized by a 70-residue domain, the baculoviral IAP repeat (BIR). Most members

also carry additional C-terminal domains, such as a RING finger motif or a ubiquitin-conjugating domain

(UBC). The BIR domains are involved with binding and inhibition of mature caspases,

whereas the RING and UBC domains may regulate caspase activity by protein

degradation pathways (66). In Drosophila four IAPs with distinct domain architecture

possess documented anti-apoptotic activity. The Anopheles IAPs include clear orthologs

of three Drosophila IAPs (THREAD, DETERIN and BRUCE). IAP2 is the closest

homologue of DIAP2, although its predicted protein sequence contains only one BIR

domain. The two THREAD-related members physically linked to Anopheles IAP1 (at

25D) may represent a clade lost from the Drosophila lineage. The expansion of both IAPs

and effector caspases in the mosquito as compared to Drosophila possibly suggests

co-evolution of apoptotic regulators that may fine-tune cell death and/or immune responses

in the mosquito, such as those in midgut cells invaded by Plasmodium. A sixth Anopheles

IAP gene exists but is incomplete in the genome assembly and cannot be classified with

certainty.

The search for mosquito pro-apoptotic genes was hampered by the rapid sequence

diversification of the main players (65). We were unable to identify Anopheles

homologues of the three clustered fruit fly genes (rpr, grim, and wrinkled) that are

responsible for the majority of embryonic apoptotic cell death. However, BLAST analysis

has identified loci potentially encoding orthologs of FADD, Apaf-1, Acinus, Aif, Scythe,

as well as the two pro-apoptotic Bcl-2 homologues found in Drosophila, dBorg-1 and

dBorg-2.

Figure Legends

Table S1: Anopheles and Drosophila immunity gene list

The table lists A. gambiae immune-related genes according to the nomenclature convention

proposed here; indicates synonyms (if any); Ensembl gene predictions and protein

predictions; scaffold names; best estimate of chromosomal location based on BAC

hybridizations to polytene chromosomes (performed in the F. H. Collins and F. C.

Kafatos laboratories) (1); names Drosophila orthologs (if any); and various comments

such as likely duplicate haplotypes (1), genome coordinates, protein or gene features,

accession numbers, etc. Gene families are presented in the order they are discussed in the

article and supplementary materials.

Figure S1: GNBP family

(A) Unrooted phylogenetic tree of the GNBP family. (B) Exon-intron structure of

GNBPA and B genes (scale is indicated). The position of introns is different in all genes

within mosquito and even between orthologous genes GNBPA3 in fruit fly and mosquito

(data not shown), consistent with the hypothesis of recent invasion of introns in

eukaryotic genes. (C) Schematic representation of the chromosomal arrangement of

GNBPB genes. GNBPB2, GNBPB3 and GNBPB4 are tightly clustered (B2 and B3 are

separated by 4441 bp and B3 and B4 by 5136 bp). Given the bootstrap values (not shown)

of the phylogenetic tree (A) and the physical clustering it is likely that the four GNBPB

genes arose from duplications.

Figure S2: SCR family

(A) Schematic representation of protein domain arrays detected in the Anopheles and

Drosophila putative scavenger receptors (SCRs). Predicted proteins are grouped in

Classes A, B and C. SRCR, Scavenger receptor cysteine-rich; FRI, Frizzled domain; LD,

low density lipoprotein; Lys_ox, Lysyl oxidase; CBD, chitin binding domain; SP,

trypsin-like serine protease; CL, C-type lectin; BO, Somatomedin B; CCP, complement-

control protein; MAM, Meprin A5 antigen and RPTP Mu; TM, transmembrane domain.

Presence of a methionine at the protein NH2-terminus is represented by a circle. (B)

Phylogenetic analysis of Class A-like SCRs (SCRAs). Sequences cluster into 3 groups

(shaded in gray): SCRASP (SCRAs with SP domains), SCRAC (SCRAs with CL

domains) and SCRAL (SCRAs with Lys_ox domains). (C) Phylogenetic analysis of the

Class B SCRs (SCRBs). The central regions (containing the CD36 domain) of 15

Anopheles and 12 Drosophila predicted proteins are compared. Croquemort-related genes

(SCRBQ) are shaded in gray.

Asterisks indicate genes possibly belonging to an orthologous group as defined by microsyntenic

analysis.

Figure S3: CTL family

Tree is based on C-type lectin domain sequence alignment. Several protein clusters are

highlighted: CTLSE, Selectins with C-type lectin domain; CTLMA and CTLGA, C-type

lectins with mannose and galactose binding motifs, respectively; SCRAC, scavenger

receptors with C-type lectin domains.

Figure S4: GALE family

Color scheme and further information on the tree can be found in the Methods section of

the Supplementary Materials.

Figure S5: REL family

Color scheme and further information on the tree can be found in the Methods section of

the Supplementary Materials.

Figure S6: STAT family

Color scheme and further information on the tree can be found in the Methods section of

the Supplementary Materials.

Figure S7: DEF family

Gray shaded areas indicate the 5 different clades of Defensins. Clade I: Diptera; II:

Hymenoptera and Diptera; III: Hemiptera, Diptera; IV: Divergent defensins; V: Ancient

Defensins.

Figure S8: CEC family

Note: D. melanogaster Cecropins A1 and A2 are identical at the protein level (A1/A2

Dm) although encoded by 2 different genes. Gray shaded areas indicate the 3 different

clades of Cecropins.

Figure S9: CASP family

(A) Tree was constructed from caspase domains (InterPro IPR002398) of predicted

proteins. Underlined names indicate long prodomain (initiator) caspases. (B) Anopheles

predicted CASPs (arrows) are physically located in 3 clusters. Putative haplotypes are

indicated by asterisks and are not presented in A.

Figure S10: IAP family

Tree was constructed from alignment of complete predicted sequences. Conserved

structural architecture of domains is indicated at the bottom of the figure (BIR,

baculoviral IAP repeat; RING, Ring-finger motif; CARD, caspase-recruitment domain).

BIR domains are color-coded to indicate relative similarity compared to THREAD.

3. Supporting figures

4. Supporting tables

Table S1: Anopheles and Drosophila immunity gene list

Anopheles gene list

Family | Subfamily | Gene Name | Synonym | Ensembl # | Protein prediction | Scaffold | Chr. location | Dm Orthologs (if any) | Comments
PGRP | PGRPL | PGRPLB | | ENSANGG00000011459 | agCP12017 | AAAB01008987 | 7A | PGRP-LB (CG14704) |
 | | PGRPLD | | NOT PREDICTED | NOT PREDICTED | AAAB01008960 | 23A | PGRP-LD (CG5523) |
 | | PGRPLA | | ENSANGG00000007952 | agCP15020/15114 | AAAB01008900 | 21F | PGRP-LA (CG4384) | Also CG18614 and CG4361
 | | PGRPLC | | ENSANGG00000007834 | agCP15107 | AAAB01008900 | 21F | PGRP-LC (CG4432) |
 | PGRPS | PGRPS1 | | ENSANGG00000014831 | agCP13479 | AAAB01008846 | 1D | |
 | | PGRPS2 | | ENSANGG00000010489 | agCP5898 | AAAB01008960 | 23A-25D | |
 | | PGRPS3 | | ENSANGG00000010490 | agCP5906 | AAAB01008960 | 23A-25D | |
GNBP | GNBPA | GNBPA1 | | ENSANGG00000017771 | agCP3847 | AAAB01008807 | 25D | |
 | | GNBPA2 | | ENSANGG00000006719 | EBIP8943 | AAAB01008986 | 43D | CG5008 (GNBPA3) |
 | GNBPB | GNBPB1 | AgGNBP | ENSANGG00000015205 | agCP14093 | AAAB01008898 | 19C | |
 | | GNBPB2 | | ENSANGG00000014528 | agCP1153 | AAAB01008851 | 13E | |
 | | GNBPB3 | | ENSANGG00000014546 | agCP1164 | AAAB01008851 | 13E | |
 | | GNBPB4 | | ENSANGG00000013732 | agCP1731 | AAAB01008851 | 13E | |
SCR | SCRA | SCRASP1 | CP6127 (SpD22) | ENSANGG00000019307 | agCP6127 | AAAB01008960 | 23A | Tequila |
 | | SCRASP2 | | ENSANGG00000008472 | agCP5548 | AAAB01008960 | 23A-25D | CG2105 |
 | | SCRASP3 | | ENSANGG00000005937 | EBIP7871 | AAAB01008987 | 8D | |
 | | SCRAC1 | | ENSANGG00000019014 | agCP4856 | AAAB01008984 | 33D | CG3921 |
 | | SCRAL1 | | ENSANGG00000017823 | agCP2405 | AAAB01008880 | 18A | LOX2 |
 | SCRB | SCRB1 | | ENSANGG00000011719 | agCP1279 | AAAB01008859 | 11C | |
 | | SCRB2 | | ENSANGG00000013404 | agCP6500 | AAAB01008960 | 23A | CG7422 |
 | | SCRB3 | | ENSANGG00000013400 | agCP6464 | AAAB01008960 | 23A | CG1887 |
 | | SCRB4 | | ENSANGG00000013409 | agCP6524 | AAAB01008960 | 23A | |
 | | SCRB5 | | ENSANGG00000007786 | agCP1714 | AAAB01008859 | 13E | |
 | | SCRB6 | | ENSANGG00000001210 | EBIP1414 | AAAB01008952 | 19C-19D | CG10345 |
 | | SCRB7 | | ENSANGG00000010163 | agCP4328 | AAAB01008905 | 20D | CG2736 |
 | | SCRB8 | | ENSANGG00000010154 | agCP4398 | AAAB01008905 | 20D | CG3829 |
 | | SCRB9 | | ENSANGG00000010167 | agCP4329 | AAAB01008905 | 20D | Emp |
 | | SCRB10 | | ENSANGG00000017284 | agCP13309 | AAAB01008846 | 5A | |
 | | SCRB11 | | ENSANGG00000018883 | agCP1324 | AAAB01008859 | 11C | CG7000 |

Family | Subfamily | Gene Name | Synonym | Ensembl # | Protein prediction | Scaffold | Chr. location | Dm Orthologs (if any) | Comments
SCR (cont.) | | SCRB12 | | ENSANGG00000013998 | agCP5912 | AAAB01008960 | 23A | |
 | | SCRBQ1 | | ENSANGG00000009816 | agCP8667 | AAAB01008980 | 36C | |
 | | SCRBQ2 | | ENSANGG00000009799 | agCP8642 | AAAB01008980 | 36C | |
 | | SCRBQ3 | | ENSANGG00000019414 | agCP11571 | AAAB01008964 | 29F | |
 | | SCRBQ4 | | ENSANGG00000016196 | agCP15529 | AAAB01008904 | 15B | CG12789 | agCP10081 Haplotype
 | SCRC | SCRC1 | | ENSANGG00000012715 | agCP9671 | AAAB01008986 | 43D | |
TEP | | TEP1 | aTEP-I | ENSANGG00000014368 | agCP4020 | AAAB01008951 | 40A | |
 | | TEP2 | | ENSANGG00000017238 | agCP11437 | AAAB01008964 | 29A | TepIII, TEP15 |
 | | TEP3 | | ENSANGG00000013794 | agCP4024 | AAAB01008951 | 40A | |
 | | TEP4 | agIMCR14 | ENSANGG00000018727 | agCP8988 | AAAB01008979 | 39C | |
 | | TEP5 | | ENSANGG00000013794/ENSANGG00000014355 | agCP4016/agCP4017 | AAAB01008951 | 40A | |
 | | TEP6 | | ENSANGG00000014364 | agCP4019 | AAAB01008951 | 40A | |
 | | TEP7 | | ENSANGG00000014360 | agCP4018 | AAAB01008951 | 40A | |
 | | TEP8 | | ENSANGG00000015631 | agCP10561 | AAAB01008823 | 40B | |
 | | TEP9 | | ENSANGG00000015632 | agCP10570 | AAAB01008823 | 40B | |
 | | TEP10 | | ENSANGG00000015628 | agCP10523 | AAAB01008823 | 40B | |
 | | TEP11 | | ENSANGG00000015629 | agCP10531 | AAAB01008823 | 40B | |
 | | TEP12 | | ENSANGG00000010537 | agCP15205 | AAAB01008944 | 30E | |
 | | TEP13 | | ENSANGG00000005017 | EBI6629 | AAAB01008964 | 29A | TepVI |
 | | TEP14 | | ENSANGG00000017173 | agCP11010 | AAAB01008964 | 29A | |
 | | TEP15 | | ENSANGG00000017033 | agCP10937 | AAAB01008964 | 29A | TepIII, TEP2 |
 | | TEP16 | | ENSANGG00000018793 | agCP9015 | AAAB01008979 | 39C-40A | | Putative haplotype of TEP1
 | | TEP17 | | ENSANGG00000018789 | agCP9009 | AAAB01008979 | 39C | | Putative haplotype of TEP5
 | | TEP18 | | ENSANGG00000018791 | agCP9014 | AAAB01008979 | 39C | | Putative haplotype of TEP6
 | | TEP19 | | ENSANGG00000015630 | agCP10552 | AAAB01008823 | 40B | | Putative haplotype of TEP8
GALE | | GALE1 | | ENSANGG00000014203 | agCP4992 | AAAB01008984 | 32A | Galectin (CG11372), CG11374 |
 | | GALE2 | | ENSANGG00000012395 | agCP13737 | AAAB01008846 | 1D | CG5335 |
 | | GALE3 | | ENSANGG00000019746 | agCP2078 | AAAB01008948 | 21A | CG18565 |
 | | GALE4 | | ENSANGG00000013180 | agCP7067 | AAAB01008816 | 42B | |
 | | GALE5 | | ENSANGG00000013135 | agCP6926 | AAAB01008816 | 42B | |
 | | GALE6 | | ENSANGG00000008181 | agCP2657 | AAAB01008968 | 20D | |

Family | Subfamily | Gene Name | Synonym | Ensembl # | Protein prediction | Scaffold | Chr. location | Dm Orthologs (if any) | Comments
GALE (cont.) | | GALE7 | | ENSANGG00000002705 | EBIP3368 | AAAB01008968 | 20D | |
 | | GALE8 | IGALE20 | NOT PREDICTED | NOT PREDICTED | UNKNOWN | UNKNOWN | |
CTL | CTLGA | CTLGA1 | | ENSANGG00000009790 | agCP8397 | AAAB01008980 | 36C | CG6055 |
 | | CTLGA2 | | ENSANGG00000017954 | agCP5999 | AAAB01008807 | 23A-25D | CG4115 |
 | | CTLGA3 | | ENSANGG00000009745 | agCP8353 | AAAB01008980 | 36C | CG3244 |
 | | CTLGA4 | | NOT PREDICTED | | AAAB01008859 | 13E | |
 | CTLMA | CTLMA1 | | ENSANGG00000015842 | agCP3775 | AAAB01008807 | 25D | |
 | | CTLMA2 | | NOT PREDICTED | | AAAB01008963 | 5C | |
 | | CTLMA3 | | ENSANGG00000015840 | agCP3770 | AAAB01008807 | 25D | |
 | | CTLMA4 | | ENSANGG00000015439 | agCP3853 | AAAB01008807 | 25D | |
 | | CTLMA5 | | ENSANGG00000016025 | agCP3302 | AAAB01008807 | 25D | |
 | | CTLMA6 | | ENSANGG00000018449 | agCP5680 | AAAB01008960 | 21F-23A | CG9134 |
 | CTLSE | CTLSE1 | | ENSANGG00000009469 | agCP5322 | AAAB01008815 | 5C | CG1500 |
 | | CTLSE2 | | ENSANGG00000006143 | EBIP8141 | AAAB01008846 | 4C | CG9095 |
 | CTL | CTL1 | | ENSANGG00000018421 | agCP5620 | AAAB01008960 | 21F-23A | |
 | | CTL2 | | NOT PREDICTED | | AAAB01008848 | 39C | |
 | | CTL3 | | ENSANGG00000008945 | agCP2787 | AAAB01008968 | 20C | |
 | | CTL4 | | ENSANGG00000018677 | agCP6406 | AAAB01008960 | 21F-23A | |
 | | CTL5 | | ENSANGG00000018273 | agCP13553 | AAAB01008846 | 1D | |
 | | CTL6 | | ENSANGG00000018058 | agCP5716 | AAAB01008960 | 23A-25D | CG14866 |
 | | CTL7 | | ENSANGG00000018029 | agCP4267 | AAAB01008811 | 5C | CG15765 |
 | | CTL8 | | ENSANGG00000009401 | agCP7946 | AAAB01008888 | 15D-16A | CG14866 |
 | | CTL9 | | ENSANGG00000008133 | EBIP10622 | AAAB01008859 | 11C-13E | CG18431 |
 | | SCRAC1 | | ENSANGG00000009401 | agCP7946 | AAAB01008984 | 33D | CG3921 | Also listed in SCR
FBN | | FBN1 | | ENSANGG00000012523 | agCP6864 | AAAB01008816 | 42B | |
 | | FBN2 | | ENSANGG00000008776 | agCP7061 | AAAB01008816 | 42B | |
 | | FBN3 | | ENSANGG00000008773 | agCP7060 | AAAB01008816 | 42B | |
 | | FBN4 | | ENSANGG00000006227 | EBIP8256 | AAAB01008816 | 42B | | haplotype ENSANGG00000013194
 | | FBN5 | | ENSANGG00000006254 | EBIP8288 | AAAB01008816 | 42B | |
 | | FBN6 | | ENSANGG00000013155 | agCP6947 | AAAB01008816 | 42B | |
 | | FBN7 | | ENSANGG00000006248 | EBIP8282 | AAAB01008816 | 42B | |
 | | FBN8 | | ENSANGG00000008763 | agCP7043 | AAAB01008816 | 42B | |
 | | FBN9 | AgFBNL11 | ENSANGG00000008759 | agCP7034 | AAAB01008816 | 42B | |

Family | Subfamily | Gene Name | Synonym | Ensembl # | Protein prediction | Scaffold | Chr. location | Dm Orthologs (if any) | Comments
FBN (cont.) | | FBN10 | | ENSANGG00000013191 | agCP7093 | AAAB01008816 | 42B | |
 | | FBN11 | | ENSANGG00000008804 | agCP7104 | AAAB01008816 | 42B | |
 | | FBN12 | | ENSANGG00000018084 | agCP11754 | AAAB01008933 | 39A | |
 | | FBN13 | | ENSANGG00000019285 | agCP8979 | AAAB01008979 | 39C | |
 | | FBN14 | | ENSANGG00000018808 | agCP8950 | AAAB01008979 | 39C-40B | |
 | | FBN15 | | ENSANGG00000010305 | agCP13033 | AAAB01007495 | UNKNOWN | |
 | | FBN16 | | ENSANGG00000017186 | agCP8960 | AAAB01008979 | 39C-40B | |
 | | FBN17 | | ENSANGG00000017165 | agCP9023 | AAAB01008979 | 39C | |
 | | FBN18 | | ENSANGG00000017792 | agCP8985 | AAAB01008979 | 39C | |
 | | FBN19 | | ENSANGG00000017195 | agCP8965 | AAAB01008979 | 39C | |
 | | FBN20 | | ENSANGG00000017749 | agCP8966 | AAAB01008979 | 39C | |
 | | FBN21 | | ENSANGG00000017189 | agCP8961 | AAAB01008979 | 39C | | haplotype ENSANGG00000017158
 | | FBN22 | | ENSANGG00000017793 | agCP8986 | AAAB01008979 | 39C | |
 | | FBN23 | AgFBNU10 | ENSANGG00000015703 | agCP10493 | AAAB01008823 | 40B | |
 | | FBN24 | A/B | ENSANGG00000001963 | EBIP2335 | AAAB01008948 | 21A | |
 | | FBN25 | AgFBNE3 | ENSANGG00000018829 | agCP2049 | AAAB01008948 | 21A | |
 | | FBN26 | | ENSANGG00000008959 | agCP2145 | AAAB01008948 | 21B | | haplotype ENSANGG00000013950
 | | FBN27 | | ENSANGG00000008365 | agCP2143 | AAAB01008948 | 21B | |
 | | FBN28 | | ENSANGG00000008989 | agCP2150 | AAAB01008948 | 21B | |
 | | FBN29 | | ENSANGG00000009032 | agCP2016 | AAAB01008948 | 21B | | haplotype ENSANG00000013951
 | | FBN30 | | ENSANGG00000019209 | agCP3444 | AAAB01008807 | 25D | |
 | | FBN31 | | ENSANGG00000017779 | agCP3894 | AAAB01008807 | 25D | |
 | | FBN32 | | ENSANGG00000005762 | EBIP7635 | AAAB01008807 | 25D | |
 | | FBN33 | AgFBN27 | ENSANGG00000017827 | agCP3143 | AAAB01008807 | 25D | |
 | | FBN34 | | ENSANGG00000009075 | agCP12272 | AAAB01008987 | 7A | CG9593 |
 | | FBN35 | | ENSANGG00000008819 | agCP12893 | AAAB01008987 | 8D | | 1
 | | FBN36 | | ENSANGG00000011322 | agCP5142 | AAAB01000889 | 43D | | haplotype ENSANGG00000013971
 | | FBN37 | | ENSANGG00000017074 | agCP4816 | AAAB01008984 | 32A-32D | |
 | | FBN38 | | ENSANGG00000007746 | agCP8057 | AAAB01008980 | 35B-36C | |
 | | FBN39 | | ENSANGG00000018891 | agCP7333 | AAAB01008847 | 1A | |
 | | FBN40 | NP1 | NOT PREDICTED | | | 25D | | 2L-21456823-21461521
 | | FBN41 | NP2 | NOT PREDICTED | | | 25D | | SCAB2L-40128389-40132874
 | | FBN42 | NP3 | NOT PREDICTED | | | 21A | | 2L-5479571-5484071


Anopheles gene list

| Family | Subfamily | Gene Name | Synonym | Ensembl # | Protein prediction | Scaffold | Chr. location | Dm Orthologs (if any) | Comments |
| FBN (cont.) | | FBN43 | NP4 | NOT PREDICTED | | | 42B | | 3L-19335118-19339735 |
| | | FBN44 | NP5 | NOT PREDICTED | | | 42B | | 3L-19390511-19395119 A |
| | | FBN45 | NP6 | NOT PREDICTED | | | 42B | | 3L-19390511-19395119 B |
| | | FBN46 | NP7 | NOT PREDICTED | NOT PREDICTED | | 42B | | 3L-19562111-19566722 |
| | | FBN47 | NP8 | NOT PREDICTED | NOT PREDICTED | | 42B | | 3L-19565611-19570225 |
| | | FBN48 | NP9 | NOT PREDICTED | NOT PREDICTED | | 39C | | 3L-8773099-8777683 |
| | | FBN49 | NP10 | NOT PREDICTED | NOT PREDICTED | | 39C | | 3L-8795783-8800286 |
| | | FBN50 | NP11 | NOT PREDICTED | NOT PREDICTED | | 39C | | 3L-8806357-8810968 |
| | | FBN51 | NP12 | NOT PREDICTED | NOT PREDICTED | | 39C | | 3L-8824145-8828762 |
| | | FBN52 | NP13 | NOT PREDICTED | NOT PREDICTED | | 39C | | 3L-9020287-9024808 |
| | | FBN53 | NP14 | NOT PREDICTED | NOT PREDICTED | | 39C | | 3L-9046294-9050902 |
| | | FBN54 | NP15, AgFBN252 | NOT PREDICTED | NOT PREDICTED | | 42B | | 3L-19338754-19343368 |
| | | FBN55 | NP16 | NOT PREDICTED | NOT PREDICTED | | 33D | | 3R-30324402-30328938 |
| | | FBN56 | NP17 | NOT PREDICTED | NOT PREDICTED | | UNKNOWN | | UNKN-22739971-22744579 |
| | | FBN57 | NP18 | NOT PREDICTED | NOT PREDICTED | | UNKNOWN | | UNKN-46276577-46280993 |
| | | | | ENSANGG00000008807 | agCP7105 | AAAB01008816 | 42B | | pseudogene |
| | | | | ENSANGG00000017163 | agCP9022 | AAAB01008979 | 39C | | pseudogene |
| | | | | ENSANGG00000008751 | agCP6940 | AAAB01008816 | 42B | | pseudogene |
| CLIP | CLIPA | CLIPA1 | ISPR20 | ENSANGG00000017773 | agCP9913 | AAAB01008986 | 43D-46D | | |
| | | CLIPA2 | ISPL5 | ENSANGG00000017763 | agCP9904 | AAAB01008986 | 44B-43D | | two clip domains |
| | | CLIPA3 | | ENSANGG00000012814 | agCP6780 | AAAB01008200 | UNKNOWN | CG13318 | |
| | | CLIPA4 | | ENSANGG00000017707 | agCP9858 | AAAB01008986 | 44B-43D | | |
| | | CLIPA5 | | ENSANGG00000017770 | agCP9912 | AAAB01008986 | 44B-43D | | |
| | | CLIPA6 | ISPR9like | ENSANGG00000017677 | agCp9547 | AAAB01008986 | 44B-43D | | |
| | | CLIPA7 | ISPR9 | ENSANGG00000017686 | agCp9557 | AAAB01008986 | 44B-43D | | |
| | | CLIPA8 | | ENSANGG00000016096 | agCP7214 | AAAB01008848 | 39C | | |
| | | CLIPA9 | | ENSANGG00000010217 | agCP10582 | AAAB01008823 | 41A | | |
| | | CLIPA10 | | Not predicted | EBIP7690 | AAAB01008807 | 25D-27C | CG4998 | |
| | CLIPB | CLIPB1 | AgSp14D2 | ENSANGG00000011095 | agCP14256 | AAAB01008794 | 14D | | |
| | | CLIPB2 | AgSer2 | ENSANGG00000011531 | agCP2989 | AAAB01008879 | 14C | | Prediction refined |
| | | CLIPB3 | | ENSANGG00000011053 | agCP14259 | AAAB01008794 | 14C-14D | | |
| | | CLIPB4 | AgSp14D1 | Not predicted | Not predicted | Not predicted | 14D | | acc.no:AF007166 |
| | | CLIPB5 | | ENSANGG00000009231 | agCP2480 | AAAB01008880 | 18A | CG1102, Easter | |


Anopheles gene list

| Family | Subfamily | Gene Name | Synonym | Ensembl # | Protein prediction | Scaffold | Chr. location | Dm Orthologs (if any) | Comments |
| CLIP (cont.) | | CLIPB6 | | ENSANGG00000010441 | agCP14272 | AAAB01008794 | 14C-14D | | |
| | | CLIPB7 | | ENSANGG00000019167 | agCP14747 | AAAB01008799 | 10D-11C | | |
| | | CLIPB8 | AgSer6 | Not predicted | Not predicted | Not predicted | 14A | CG9737 | acc.no: AJ459779 |
| | | CLIPB9 | AgSp14A | ENSANGG00000017835 | agCP3048 | AAAB01008879 | 14A | | |
| | | CLIPB10 | AgSer8 | ENSANGG00000017835 | agCP3048 | AAAB01008879 | 14A | CG3066 | |
| | | CLIPB11 | | ENSANGG00000008059 | agCP4575 | AAAB01008984 | 33D | | |
| | | CLIPB12 | | ENSANGG00000008007 | agCP4530 | AAAB01008984 | 33D | | |
| | | CLIPB13 | | ENSANGG00000010153 | agCP4397 | AAAB01008905 | 20D | CG5896 | |
| | | CLIPB14 | AgSer4 | ENSANGG00000015633 | agCP10576 | AAAB01008823 | 40A | | |
| | | CLIPB15 | AgSer3 | ENSANGG00000013326 | agCP8806 | AAAB01008980 | 35B-36C | | |
| | | CLIPB16 | | ENSANGG00000009880 | agCP4646 | AAAB01008984 | 33D | | |
| | | CLIPB17 | | ENSANGG00000019659 | agCP12211 | AAAB01008987 | 7A-10D | | two clip domains |
| | CLIPC | CLIPC1 | | ENSANGG00000014314 | agCP4543 | AAAB01008984 | 32A | Snake | |
| | | CLIPC2 | | ENSANGG00000010933 | agCP14099 | AAAB01008898 | 18C-19C | Snake | |
| | | CLIPC3 | AgSp18D | ENSANGG00000010982 | agCP14119 | AAAB01008898 | 18C | Snake | |
| | | CLIPC4 | Sp2A | Not predicted | Not predicted | Not predicted | 2A | | acc.no:AF117752 |
| | | CLIPC5 | | ENSANGG00000014810 | agCP13087 | AAAB01008846 | 1D-4C | CG6361, persephone | |
| | | CLIPC6 | | ENSANGG00000014810 | agCP13087 | AAAB01008846 | 1D-4C | CG6361, persephone | |
| | | CLIPC7 | ISPR5 | ENSANGG00000018770 | agCp7872 | AAAB01008888 | UNKNOWN | | |
| | CLIPD | CLIPD1 | AgSer1 | ENSANGG00000012449 | agCP1694 | AAAB01008859 | 11C | CG9372 | |
| | | CLIPD2 | | ENSANGG00000019362 | agCP11478 | AAAB01008964 | 29A-31C | CG16821 | |
| | | CLIPD3 | | ENSANGG00000013258 | agCP12231 | AAAB01008987 | 7A-10D | CG7432 | |
| | | CLIPD4 | | ENSANGG00000013699 | agCP1828 | AAAB01008859 | 13E | CG1299 | |
| | | CLIPD5 | | Not predicted | Not predicted | AAAB01008859 | 11B-14C | CG1299 | Prediction refined |
| | | CLIPD6 | | ENSANGG00000014328 | agCP1849 | AAAB01008859 | 13E | CG1299 | two clip domains |
| | | CLIPD7 | | ENSANGG00000014695 | agCP4804 | AAAB01008984 | 32A | CG8213, Stubble, CG8172 | |
| SRPN | | SRPN1 | | ENSANGG00000019162 | agCP3418 | AAAB01008807 | 25D | Serpin-27A, SRPN2, SRPN3 | inhibitory: K/F |
| | | SRPN2 | | ENSANGG00000019323 | agCP3768 | AAAB01008807 | 25D | Serpin-27A, SRPN1, SRPN3 | inhibitory: K/F |
| | | SRPN3 | | ENSANGG00000005827 | EBI7723 | AAAB01008807 | 25D | Serpin-27A, SRPN1, SRPN2 | inhibitory: T/I |
| | | SRPN4 | | ENSANGG00000003652 | EBI4662 | AAAB01008980 | 35B | CG7219, SRPN5, SRPN6 | inhibitory: I/S |
| | | SRPN5 | | ENSANGG00000008018 | agCP4923 | AAAB01008984 | 33D | CG7219, SRPN4, SRPN6 | inhibitory: S/L |
| | | SRPN6 | | ENSANGG00000008056 | agCP4562 | AAAB01008984 | 33D | CG7219, SRPN4, SRPN5 | inhibitory: I/G |
| | | SRPN7 | | ENSANGG00000018236 | agCP3300 | AAAB01008807 | 25D | | inhibitory: R/V |


Anopheles gene list

| Family | Subfamily | Gene Name | Synonym | Ensembl # | Protein prediction | Scaffold | Chr. location | Dm Orthologs (if any) | Comments |
| SRPN (cont.) | | SRPN8 | | ENSANGG00000012210 | agCP3027 | AAAB01008879 | 14C | CG6680 | inhibitory: K/A |
| | | SRPN9 | | ENSANGG00000014191 | agCP2980 | AAAB01008879 | 14A-C | CG6687, Sp5 | inhibitory: S/S |
| | | SRPN10 | | ENSANGG00000013014 | agCP14891 | AAAB01008900 | 21F | | inhibitory: K/R |
| | | SRPN11 | | ENSANGG00000014319 | agCP12201 | AAAB01008987 | 7A | | non inhibitory |
| | | SRPN12 | | ENSANGG00000014436 | agCP12184 | AAAB01008987 | 7A | | non inhibitory |
| | | SRPN13 | | ENSANGG00000013014 | agCP12957 | AAAB01008407 | 21C | | non inhibitory |
| | | SRPN14 | | ENSANGG00000019032 | agCP3574 | AAAB01008807 | 25D | | non inhibitory |
| | | SRPN15 | | ENSANGG00000012959 | agCP9254 | AAAB01008287 | | | Putative haplotype of SRPN9 |
| TOLL | | TOLL1A | | Not predicted | NOT PREDICTED | AAAB01008811 | 6C | Toll1 | |
| | | TOLL5A | | ENSANGG00000015554/ENSANGG00000015552 | agCP4197/agCP4196 | AAAB01008811 | 5B | Toll5 | |
| | | TOLL1B | | ENSANGG00000014400/ENSANGG00000014747 | agCP7198/agCP1910 | AAAB01008848 | 39B | Toll1 | |
| | | TOLL5B | | ENSANGG00000014148 | agCP7266 | AAAB01008848 | 39B | Toll5 | |
| | | TOLL6 | | ENSANGG00000006734 | EBIP8963 | AAAB01008986 | 43D | Toll6 | |
| | | TOLL7 | | ENSANGG00000006772 | EBIP9016 | AAAB01008986 | 43D | Toll7 | |
| | | TOLL8 | | ENSANGG00000008695 | agCP9368 | AAAB01008986 | 43D | Toll8 | |
| | | TOLL9 | | ENSANGG00000012039 | agCP3322 | AAAB01008807 | 25D | Toll9 | |
| | | TOLL10 | | ENSANGG00000006278 | EBIP8317 | AAAB01008816 | 42B | | |
| | | TOLL11 | | ENSANGG00000006280 | EBIP8319 | AAAB01008816 | 42B | | |
| | | Cactus | | ENSANGG00000007525 | agCP11355 | AAAB01008964 | 29A | | |
| | | Pelle | | ENSANGG00000008397 | agCP1192 | AAAB01008859 | 13E | | |
| SPZ | | SPZ1 | | ENSANGG00000013105 | agCP13439 | AAAB01008846 | 1D | | |
| | | SPZ2 | | ENSANGG00000010958 | agCP6506 | AAAB01008960 | 25D | | |
| | | SPZ3 | | ENSANGG00000017063 | agCP11277 | AAAB01008964 | 29A | | |
| | | SPZ4 | | ENSANGG00000016631 | agCP11242 | AAAB01008964 | 29A | | |
| | | SPZ5 | | ENSANGG00000016949 | agCP3573 | AAAB01008807 | 25D | | |
| | | SPZ6 | | ENSANGG00000007432 | agCP14930 | AAAB01008900 | 21C | | |
| | | Tube | | ENSANGG00000018009 | agCP2914 | AAAB01008879 | 14A | | |
| REL | | Gambif | | ENSANGG00000008612 | agCP14571 | AAAB01008839 | 34B | Dorsal | |
| | | Relish | | ENSANGG00000017745 | agCP3820 | AAAB01008807 | 25D | Relish | |
| | | MyD88 | | ENSANGG00000013260 | agCP14973 | AAAB01008900 | 21F | | |
| | | IMD | | NOT PREDICTED | NOT PREDICTED | AAAB01008948 | 21B | Imd, CG5576 | |


Anopheles gene list

| Family | Subfamily | Gene Name | Synonym | Ensembl # | Protein prediction | Scaffold | Chr. location | Dm Orthologs (if any) | Comments |
| STAT | | STAT1 | Ag-STAT | ENSANGG00000007793 | agCP10730 | AAAB01008849 | 38B | | |
| | | STAT2 | | ENSANGG00000006157 | EBIP8156 | AAAB01008846 | 4C | | |
| PPO | | PPO1 | L76038 | ENSANGG00000014466 | agCP1154 | AAAB01008859 | 13B | | |
| | | PPO2 | AF004915 | ENSANGG00000018159 | agCP6387 | AAAB01008960 | 24B | | |
| | | PPO3 | AF004916 | ENSANGG00000002044 | ebiP2437 | AAAB01008948 | 21B | | 3/4/5/6/7/8/9 located within ~65 kb. |
| | | PPO4 | AJ010193 | ENSANGG00000008967 | agCP2161 | AAAB01008948 | 21B | | |
| | | PPO5 | AJ010194 | ENSANGG00000001835 | EBIP2175 | AAAB01008948 | 21B | | |
| | | PPO6 | AJ010195 | NOT PREDICTED | NOT PREDICTED | AAAB01008948 | 21B | | |
| | | PPO7 | AJ459960 | ENSANGG00000008257 | agCP2095 | AAAB01008948 | 21B | | |
| | | PPO8 | AJ459961 | NOT PREDICTED | NOT PREDICTED | AAAB01008948 | 21B | | |
| | | PPO9 | AJ459962 | ENSANGG00000008251 | agCP2084 | AAAB01008948 | 21B | | |
| DEF | | DEF1 | Defensin | ENSANGG00000013132 | agCP6915 | AAAB01008816 | 42B | | |
| | | DEF2 | | NOT PREDICTED | NOT PREDICTED | AAAB01008952 | 19D | | |
| | | DEF3 | | NOT PREDICTED | NOT PREDICTED | AAAB01008807 | 5D | | |
| | | DEF4 | | NOT PREDICTED | NOT PREDICTED | AAAB01008960 | 23A | | |
| CEC | | CEC1 | CecA | ENSANGG00000009468 | agCP7503 | AAAB01008847 | 1A | | |
| | | CEC2 | CecB | NOT PREDICTED | NOT PREDICTED | AAAB01008847 | 1A | | |
| | | CEC3 | CecC | NOT PREDICTED | NOT PREDICTED | AAAB01008847 | 1A | | |
| | | CEC4 | CecD | NOT PREDICTED | NOT PREDICTED | AAAB01008807 | 25D | | |
| CASP | CASPS | CASPS1 | | NOT PREDICTED | NOT PREDICTED | AAAB01008986 | 43D | | |
| | | CASPS2 | | ENSANGG00000008186 | agCP9777 | AAAB01008986 | 43D | | |
| | | CASPS3 | | ENSANGG00000006560 | EBIP8707 | AAAB01008986 | 43D | | |
| | | CASPS4 | | ENSANGG00000012903 | agCP9776 | AAAB01008986 | 43D | | |
| | | CASPS5 | | ENSANGG00000018876 | agCP2034 | AAAB01008948 | 21A | | EBIP2383 identical prediction |
| | | CASPS6 | | ENSANGG00000018878 | agCP2039 | AAAB01008948 | 21A | | |
| | | CASPS7 | | ENSANGG00000007830 | agCP9089 | AAAB01008963 | 5A | | |
| | | CASPS8 | | ENSANGG00000012629 | agCP8592 | AAAB01008980 | 35B-36C | | |
| | | CASPS9 | | ENSANGG00000015625 | agCP10446 | AAAB01008823 | 40B | | |
| | | CASPS10 | | ENSANGG00000015640 | agCP10592 | AAAB01008823 | 40B | | |
| | | CASPS11 | | ENSANGG00000018873 | agCP2032 | AAAB01008948 | 21A (2L) | | Possible haplotype to agCP2034 |
| | | CASPS12 | | ENSANGG00000015624 | agCP10440 | AAAB01008823 | 40B (3R) | | Possible haplotype to agCP10446 |
| | | CASPS13 | | ENSANGG00000015623 | agCP10439 | AAAB01008823 | 40B (3R) | | Possible haplotype of agCP10592 |
| | | CASPS14 | | ENSANGG00000013016 | agCP3911 | AAAB01008459 | unknown | | shows similarity to agCP9777 |


Anopheles gene list

| Family | Subfamily | Gene Name | Synonym | Ensembl # | Protein prediction | Scaffold | Chr. location | Dm Orthologs (if any) | Comments |
| CASP (cont.) | CASPL | CASPL1 | | ENSANGG00000015414 | agCP9556 | AAAB01008986 | 43D | DREDD | |
| | | CASPL2 | | ENSANGG00000008206 | agCP2721 | AAAB01008968 | 20D | DRONC | |
| IAP | | IAP1 | | ENSANGG00000007147 | EBIP9540 | AAAB01008807 | 25D | DIAP1, THREAD | |
| | | IAP2 | | ENSANGG00000015471 | agCP6860 | AAAB01008816 | 42B | DIAP2 | |
| IAP (cont.) | | IAP3 | | ENSANGG00000014079 | agCP3615 | AAAB01008807 | 25D | | |
| | | IAP4 | | ENSANGG00000013521 | agCP3622 | AAAB01008807 | 25D | | |
| | | IAP5 | IAPD1 | ENSANGG00000016684 | agCP10996 | AAAB01008964 | 29A-30E | DETERIN | |
| | | IAP6 | IAPB1 | ENSANGG00000002327 | EBIP2826 | AAAB01008859 | 11C-13E | BRUCE | |


Drosophila immunity gene list

| Family (order) | Subfamily (if any) | Gene Name | Synonym (if any) | Accession # | Chromosomal location | Comments |
| PGRP | PGRPL | PGRP-LA | | CG4384, CG18614, CG4361 | 67A7 | |
| | | PGRP-LB | | CG14704 | 86E8 | |
| | | PGRP-LC | | CG4432 | 67A8 | |
| | | PGRP-LD | | CG5523 | 64E7-8 | |
| | | PGRP-LE | | CG8995 | 13F1 | |
| | | PGRP-LF | | CG4437 | 67A8-9 | |
| | PGRPS | PGRP-SA | | CG11709 | 10C6 | |
| | | PGRP-SB1 | | CG9681 | 73C1 | |
| | | PGRP-SB2 | | CG9697 | 73C1 | |
| | | PGRP-SC1a | | CG14746 | 44E3 | |
| | | PGRP-SC1b | | CG8577 | 44E3 | |
| | | PGRP-SC2 | | CG14745 | 44E3-4 | |
| | | PGRP-SD | | CG7496 | 66A9 | |
| TEP | | TepI | | CG18096 | 35F1-F4 | |
| | | TepII | | CG7052 | 28B1-B4 | |
| | | TepIII | | CG7068 | 28B1-B4 | |
| | | TepIV | | CG10363 | 37F1-F2 | |
| | | TepV | | CG13079 | 37F1-F2 | Sequence was refined (M. Lagueux) |
| | | TepVI | Mcr | CG7586 | 28D-E | |
| GNBP | | GNBP1 | DGNBP-1 | CG6895 | 75D2 | |
| | | GNBP2 | DGNBP-2 | CG4144 | 75D2 | |
| | | GNBP3 | DGNBP-3 | CG5008 | 66E4-E5 | |
| SCR | Class A-like | CG11335 | | CG11335 | 100B5 | |
| | | CG3921 | | CG3921 | 24C7-C8 | |
| | | CG2105 | | CG2105 | 43D7-E1 | |
| | | CG4402 | | CG4402 | 58A1 | |
| | | Tequila | GRAAL | CG18403 | 66F2 | |
| | Class B | croquemort | | CG4280 | 21C5 | |
| | | CG12789 | | CG12789 | 27F3 | |
| | | CG7228 | | CG7228 | 28D2 | |
| | | CG7227 | | CG7227 | 28D2 | |
| | | CG5750 | | CG5750 | 36D2 | |
| | | CG3829 | | CG3829 | 60E11 | |
| SCR (cont.) | | CG2736 | | CG2736 | 60E10-11 | |


Drosophila immunity gene list

| Family (order) | Subfamily (if any) | Gene Name | Synonym (if any) | Accession # | Chromosomal location | Comments |
| | | emp | | CG2727 | 60E11 | |
| | | CG1887 | | CG1887 | 62B9 | |
| | | CG7422 | | CG7422 | 66A10 | |
| | | CG10345 | | CG10345 | 89D3 | |
| | | CG7000 | | CG7000 | 93B11 | |
| | Class C | dSR-CI | | CG4099 | 24D4 | |
| | | dSR-CII | | CG8856 | 48E10 | |
| | | dSR-CIII | | Q9N2Q3 | 24D | |
| | | CG3212 | | CG3212 | 23F3 | |
| CTL | CTLGA | CG4115 | | CG4115 | 87B8 | |
| | | CG6055 | | CG6055 | 28A1 | |
| | | CG3244 | | CG3244 | 25A6 | |
| | | CG9976 | | CG9976 | 37D6 | |
| | | CG9978 | | CG9978 | 37D6 | |
| | CTLMA | CG9134 | | CG9134 | 61F4 | |
| | | CG2958 | | CG2958 | 24D8 | |
| | | CG16834 | | CG16834 | 32C5 | |
| | | lectin_3R Dm | | NOT PREDICTED | chrom 3R | |
| | CTLSE | furrowed | | CG1500 | 11A1 | |
| | | CG9095 | | CG9095 | 13B1 | |
| | CTL | CG15818 | | CG15818 | 27F3 | |
| | | CG17799 | | CG17799 | 29C1 | |
| | | CG15358 | | CG15358 | 22B4 | |
| | | CG15378 | | CG15378 | 22C1 | |
| | | CG2839 | | CG2839 | 21D2 | |
| | | CG2826 | | CG2826 | 21D2 | |
| | | CG7763 | | CG7763 | 47F16 | |
| | | CG3410 | | CG3410 | 24B3-C1 | |
| | | CG13686 | | CG13686 | 21D2 | |
| | | CG17011 | | CG17011 | 30A7 | |
| | | CG7106 | | CG7106 | 28D2 | |
| | | CG18431 | | CG18431 | 54B18-C1 | |
| | | CG17797 | | CG17797 | 29C1 | |
| CTL (cont.) | | CG14500 | | CG14500 | 55C2 | |
| | | CG1656 | | CG1656 | 46B7 | |


Drosophila immunity gene list

| Family (order) | Subfamily (if any) | Gene Name | Synonym (if any) | Accession # | Chromosomal location | Comments |
| | | CG1652 | | CG1652 | 46B6-7 | |
| | | CG6014 | | CG6014 | 78D2 | |
| | | CG11211 | | CG11211 | 42A8 | |
| | | CG13086 | | CG13086 | 37D2 | |
| | | CG12111 | | CG12111 | 7F4-7F5 | |
| | | CG15765 | | CG15765 | 5C2-5C3 | |
| | | lectin_42A 2R | | lectin_42A 2R | 42A | |
| | | CG14866 | | CG14866 | 88F1 | |
| | | CG3921 | | CG3921 | 24C7-C8 | |
| GALE | | CG11374 | | CG11374 | 21A4 | |
| | | CG13950 | | CG13950 | 21D4 | |
| | | CG18565 | | CG18565 | 77B7 | |
| | | CG5335 | | CG5335 | 55E11 | |
| | | DmGal | Galectin | CG11372 | 21-A4 | |
| FBN | | CG9500 | | CG9500 | 26C4 | |
| | | CG8642 | | CG8642 | 44D5 | |
| | | scabrous | sca | CG17579 | 49D4 | |
| | | CG5550 | | CG5550 | 52E2 | |
| | | CG30280 | | CG30280 | 58D2 | |
| | | CG30281 | | CG30281 | 58D2 | |
| | | CG10359 | | CG10359 | 63E5 | |
| | | CG7668 | | CG7668 | 76E1 | |
| | | CG9593 | | CG9593 | 89A6 | |
| | | CG6788 | | CG6788 | 16E2 | |
| | | CG1791 | | CG1791 | 9A3 | |
| | | CG1889 | | CG1889 | 9A3 | |
| | | CG31832 | | CG31832 | NOT PREDICTED | |
| CLIP | | CG1102 | BEST:GH02921 | CG1102 | 82A4-82A4 | |
| | | CG11313 | | CG11313 | 100A3-100A3 | |
| | | CG1299 | | CG1299 | 64A8-64A8 | two clip domains |
| | | CG13318 | | CG13318 | 85B1-85B1 | |
| | | CG15046 | | CG15046 | 17B4-17B4 | two clip domains |
| CLIP (cont.) | | CG16705 | | CG16705 | 95A7-95A7 | |
| | | CG16821 | | CG16821 | 34B2-34B2 | |
| | | CG18477 | BG:DS07108.1 | CG18477 | 35D3-35D3 | |


Drosophila immunity gene list

| Family (order) | Subfamily (if any) | Gene Name | Synonym (if any) | Accession # | Chromosomal location | Comments |
| | | CG18557 | | CG18557 | 23B4-23B4 | |
| | | Ser7 | | CG2045 | 9A2-9A2 | |
| | | CG2056 | | CG2056 | 7F3-7F3 | |
| | | CG3066 | | CG3066 | 84D14-84E1 | |
| | | CG3117 | | CG3117 | 23B4-23B5 | |
| | | CG3505 | | CG3505 | 88C10-88C10 | |
| | | stubble | stubbloid | CG4316 | 89B7-89B9 | |
| | | CG4793 | BG:DS07486.3 | CG4793 | 35D6-35D6 | |
| | | easter | | CG4920 | 88F1-88F1 | |
| | | CG4998 | | CG4998 | 72E1-72E1 | |
| | | CG5390 | | CG5390 | 31D1-31D1 | |
| | | CG5896 | | CG5896 | 97E5-97E5 | |
| | | CG5909 | | CG5909 | 97E5-97E6 | |
| | | CG6361 | | CG6361 | 17B3-17B3 | |
| | | persephone | CG6367 | CG6367 | 17B3-17B4 | |
| | | CG6639 | | CG6639 | 36C9-36C9 | |
| | | CG7432 | | CG7432 | 92A13-92A13 | |
| | | snake | l(3)87Dg, me(3)4 | CG7996 | 87D9-87D9 | |
| | | CG8172 | | CG8172 | 45A1-45A1 | |
| | | CG8213 | | CG8213 | 44F12-45A1 | |
| | | CG8586 | | CG8586 | 44E2-44E2 | two clip domains |
| | | CG8738 | | CG8738 | 44E2-44E2 | two clip domains |
| | | CG9372 | | CG9372 | 76B9-76B9 | |
| | | CG9377 | | CG9377 | 34B7-34B7 | |
| | | CG9733 | | CG9733 | 99E3-99E3 | |
| | | CG9737 | | CG9737 | 99E3-99E3 | |
| | | CG4914 | | CG4914 | 70E7-70E7 | not in analysis |
| SRPN | | serpin-27A | | CG11331 | 26F6 | inhibitory: K/F |
| | | CG6717 | | CG6717 | 28B3 | inhibitory: K/K |
| | | CG12318 | | CG12318 | 28D2 | inhibitory: L/S |
| | | CG7219 | | CG7219 | 28D5 | inhibitory: S/G |
| SRPN (cont.) | | sp2 | | CG8137 | 28F5 | inhibitory: L/S |
| | | CG4804 | | CG4804 | 31A2 | non-inhibitory |
| | | sp3 | | CG9334 | 38F2 | inhibitory: K/S |
| | | CG14470 | | CG14470 | 41F10 | non-inhibitory |


Drosophila immunity gene list

| Family (order) | Subfamily (if any) | Gene Name | Synonym (if any) | Accession # | Chromosomal location | Comments |
| | | CG9455 | | CG9455 | 42D4 | inhibitory: M/M |
| | | sp1 | | CG9456 | 42D4 | inhibitory: R/A |
| | | CG9460 | | CG9460 | 42D4 | inhibitory: E/S |
| | | CG9454 | | CG9454 | 42D4 | non-inhibitory |
| | | sp4 | | CG9453 | 42D4 | inhibitory: K/R |
| | | Spn43Aa | | CG12172 | 43A | inhibitory: M/S |
| | | nec | | CG1857 | 43A | inhibitory: L/S |
| | | CG1859 | | CG1859 | 43A | non-inhibitory |
| | | Spn43Ab | | CG1865 | 43A | non-inhibitory |
| | | CG7722 | | CG7722 | 47C7 | non-inhibitory |
| | | CG10956 | | CG10956 | 54A3 | non-inhibitory |
| | | sp6 | | CG10913 | 55C1 | inhibitory: R/M |
| | | CG1308 | | CG1308 | 64A10 | No |
| | | acp76A | | CG3801 | 75F5 | No |
| | | CG6680 | | CG6680 | 77B4 | Y: K/A |
| | | CG6663 | | CG6663 | 77B4 | No |
| | | sp5 | | CG18525 | 77B4 | Y: S/A |
| | | CG6687 | | CG6687 | 77B4 | Y: S/S |
| | | CG12807 | | CG12807 | 85F5 | No |
| | | CG1342 | | CG1342 | 100A2 | Y: R/T |
| TOLL | | Toll-1 | Toll | CG5490 | 97D1 | |
| | | Toll-2 | 18w | CG8896 | 56F8 | |
| | | Toll-3 | MstProx | CG1149 | 84D5 | |
| | | Toll-4 | | CG18241 | 29F7 | |
| | | Toll-5 | Tehao | CG7121 | 34B6 | |
| | | Toll-6 | | CG7250 | 71C1 | |
| | | Toll-7 | | CG8595 | 56F3 | |
| | | Toll-8 | Tollo | CG6890 | 71B7 | |
| | | Toll-9 | | CG5528 | 77B1 | |
| Cactus | | cactus | | CG5848 | 35E5 | |
| Pelle | | pelle | | CG5974 | 97E6 | |
| SPZ | | spatzle | | CG6134 | 97D12 | |
| | | spz2 | | CG18318 | 64B9 | |
| | | spz3 | | CG7104 | 28C3 | |
| | | spz4 | | CG14928 | 32F1 | |


Drosophila immunity gene list

| Family (order) | Subfamily (if any) | Gene Name | Synonym (if any) | Accession # | Chromosomal location | Comments |
| | | spz5 | | CG9972 | 63A1 | |
| | | spz6 | | CG9196 | 60E1 | |
| Tube | | tube | | CG10520 | 82B1 | |
| Rel | | dorsal | | CG6667 | 36C5 | |
| | | Dif | | CG6794 | 36C4-5 | |
| | | relish | | CG11992 | 85C2-3 | |
| | | MyD88 | | CG2078 | 45C4 | |
| | | immune deficiency | imd | CG5576 | 55C8 | |
| STATS | | Stat92E | dSTAT, marelle | CG4257 | 92E11-12 | |
| PPO | | Bc | Dox-A1 | CG5779 | 55A3 | |
| | | CG8193 | | CG8193 | 45A2 | |
| | | Dox-A3 | | CG2952 | 59D2 | |
| CEC | | cecA1 | | CG1365 | 99E3 | Protein sequence identical to CecA2 |
| | | cecA2 | | CG1367 | 99E3 | |
| | | cecB | | CG1878 | 99E3 | |
| CEC (cont.) | | cecC | | CG1373 | 99E3 | |
| DEF | | def | | CG1385 | 46D7 | |
| CASP | | Nc | dronc | CG8091 | 67C5 | |
| | | dredd | | CG7486 | 1B-7 | |
| | | dcp-1 | | CG5370 | 59F5 | |
| | | decay | | CG14902 | 89D4 | |
| | | ice | drice | CG7788 | 99C7 | |
| | | dream | | CG7863 | 42A2 | |
| | | damm | daydream | CG18188 | 48D2 | |
| IAP | | thread | DIAP1 | CG12284 | 72D1 | |
| | | Iap2 | DIAP2 | CG8293 | 52D15 | |
| | | deterin | | CG12265 | 90A1-2 | Not linked in FlyBase yet |
| | | Bruce | | CG6303 | 86A7-86A8 | |


5. Supporting references and notes

1. R. A. Holt et al., Science This issue (2002).

2. M. D. Adams et al., Science 287, 2185 (2000).

3. S. van Dongen, PhD thesis, University of Utrecht (2000).

4. E. M. Zdobnov, R. Apweiler, Bioinformatics 17, 847 (2001).

5. R. Durbin, S. Eddy, A. Krogh, G. Mitchison, Biological Sequence Analysis (Cambridge University Press, 1998).

6. E. Birney, R. Durbin, Genome Res 10, 547 (2000).

7. G. Dimopoulos et al., Proc Natl Acad Sci U S A 99, 8814 (2002).

8. F. H. Collins et al., Science 234, 607 (1986).

9. H. M. Muller, G. Dimopoulos, C. Blass, F. C. Kafatos, J Biol Chem 274, 11727 (1999).

10. H. Yoshida, K. Kinoshita, M. Ashida, J Biol Chem 271, 13854 (1996).

11. T. Michel, J. M. Reichhart, J. A. Hoffmann, J. Royet, Nature 414, 756 (2001).

12. M. Gottar et al., Nature 416, 640 (2002).

13. K. M. Choe, T. Werner, S. Stoven, D. Hultmark, K. V. Anderson, Science 296, 359 (2002).

14. M. Ramet, P. Manfruelli, A. Pearson, B. Mathey-Prevot, R. A. Ezekowitz, Nature 416, 644 (2002).

15. T. Werner et al., Proc Natl Acad Sci U S A 97, 13772 (2000).

16. M. Lagueux, E. Perrodou, E. A. Levashina, M. Capovilla, J. A. Hoffmann, Proc Natl Acad Sci U S A 97, 11427 (2000).

17. M. Lagueux, unpublished.


18. E. A. Levashina et al., Cell 104, 709 (2001).

19. F. Oduol, J. Xu, O. Niare, R. Natarajan, K. D. Vernick, Proc Natl Acad Sci U S A 97, 11397 (2000).

20. J. Hofemeister, A. Kurtz, R. Borriss, J. Knowles, Gene 49, 177 (1986).

21. S. Schimming, W. H. Schwarz, W. L. Staudenbauer, Eur J Biochem 204, 13 (1992).

22. Y. S. Kim et al., J Biol Chem 275, 32721 (2000).

23. L. Peiser, S. Mukhopadhyay, S. Gordon, Curr Opin Immunol 14, 123 (2002).

24. L. J. van der Laan et al., J Immunol 162, 939 (1999).

25. A. Danielli et al., Proc Natl Acad Sci U S A 97, 7136 (2000).

26. K. Csiszar, Prog. Nucleic Acid Res. Mol. Biol. 70, 1 (2001).

27. R. Crombie, R. Silverstein, J Biol Chem 273, 4855 (1998).

28. N. C. Franc, J. L. Dimarcq, M. Lagueux, J. Hoffmann, R. A. Ezekowitz, Immunity 4, 431 (1996).

29. K. Hart, M. Wilcox, J Mol Biol 234, 249 (1993).

30. A. Pearson, A. Lux, M. Krieger, Proc Natl Acad Sci U S A 92, 4056 (1995).

31. M. Ramet et al., Immunity 15, 1027 (2001).

32. K. Drickamer, M. E. Taylor, Annu Rev Cell Biol 9, 237 (1993).

33. Y. Fujita, S. Kurata, K. Homma, S. Natori, J Biol Chem 273, 9667 (1998).

34. K. Drickamer, Nature 360, 183 (1992).

35. L. A. Leshko-Lindsay, V. G. Corces, Development 124, 169 (1997).

36. L. B. Klickstein et al., J Exp Med 165, 1095 (1987).

37. D. N. Cooper, S. H. Barondes, Glycobiology 9, 979 (1999).


38. K. E. Pace et al., J Biol Chem 277, 13091 (2002).

39. Q. Jiang, M. Hall, F. G. Noriega, M. Wells, Insect Biochem Mol Biol 27, 283 (1997).

40. C. A. Davis, D. C. Riddell, M. J. Higgins, J. J. Holden, B. N. White, Nucleic Acids Res 13, 6605 (1985).

41. M. J. Gorman, O. V. Andreeva, S. M. Paskewitz, Insect Biochem Mol Biol 30, 35 (2000).

42. M. J. Gorman, S. M. Paskewitz, Insect Biochem Mol Biol 31, 257 (2001).

43. J. Volz, C. Blass, H. M. Muller, F. C. Kafatos, unpublished data.

44. P. J. Gotwals, J. W. Fristrom, Genetics 127, 747 (1991).

45. L. F. Appel et al., Proc Natl Acad Sci U S A 90, 4937 (1993).

46. G. A. Silverman et al., J Biol Chem 276, 33293 (2001).

47. J. A. Irving, R. N. Pike, A. M. Lesk, J. C. Whisstock, Genome Res 10, 1845 (2000).

48. B. Lemaitre, E. Nicolas, L. Michaut, J. M. Reichhart, J. A. Hoffmann, Cell 86, 973 (1996).

49. C. A. Janeway, Jr., R. Medzhitov, Annu Rev Immunol 20, 197 (2002).

50. J. Y. Ooi, Y. Yagi, X. Hu, Y. T. Ip, EMBO Rep 3, 82 (2002).

51. S. Tauszig, E. Jouanguy, J. A. Hoffmann, J. L. Imler, Proc Natl Acad Sci U S A 97, 10520 (2000).

52. J. L. Imler, unpublished data.

53. C. Luo, L. Zheng, Immunogenetics 51, 92 (2000).

54. P. Ligoxygakis, P. Bulet, J. M. Reichhart, EMBO Rep 3, 666 (2002).


55. C. Barillas-Mury et al., EMBO J 15, 4691 (1996).

56. B. Lemaitre et al., EMBO J 14, 536 (1995).

57. F. Leulier, S. Vidal, K. Saigo, R. Ueda, B. Lemaitre, Curr Biol 12, 996 (2002).

58. C. Barillas-Mury, Y. S. Han, D. Seeley, F. C. Kafatos, EMBO J 18, 959 (1999).

59. L. O. Baumbusch et al., Nucleic Acids Res 29, 4319 (2001).

60. M. Meister, C. Hetru, J. A. Hoffmann, Curr Top Microbiol Immunol 248, 17 (2000).

61. P. Bulet, C. Hetru, J. L. Dimarcq, D. Hoffmann, Dev Comp Immunol 23, 329 (1999).

62. A. M. Richman et al., Insect Mol Biol 5, 203 (1996).

63. J. Vizioli et al., Proc Natl Acad Sci U S A 98, 12630 (2001).

64. Y. Shi, Mol Cell 9, 459 (2002).

65. S. Y. Vernooy et al., J Cell Biol 150, F69 (2000).

66. V. Jesenberger, S. Jentsch, Nat Rev Mol Cell Biol 3, 112 (2002).

67. We thank EUROGENTEC and QIAGEN Operon for the long oligonucleotides used in the microarray experiments.