On the Evolution and Physiology of Cable Bacteria

Kasper U. Kjeldsen, Lars Schreiber, Casper A. Thorup, Thomas Boesen, Jesper T. Bjerg, Tingting Yang, Morten S. Dueholm, Steffen Larsen, Nils Risgaard-Petersen, Marta Nierychlo, Markus Schmid, Andreas Bøggild, Jack van de Vossenberg, Jeanine S. Geelhoed, Filip J. R. Meysman, Michael Wagner, Per H. Nielsen, Lars Peter Nielsen, and Andreas Schramm

Supporting Information

Methods

Filament extraction, whole genome amplification, and sequencing of marine cable bacteria. Single filaments were extracted with custom-made glass hooks from Aarhus Bay sediment enriched for cable bacteria as described previously (1). Individual filaments were transferred into PCR tubes containing 5 µL of filter- (0.22 µm) and UV-sterilized TE buffer (10 mM Tris-HCl, 1 mM EDTA, pH 7.5). Four microliters of PCR water were added and samples were stored at -80°C. Filaments were lysed by ultrasonic bead-beating: glass beads (diameter, 0.1 mm; PowerLyzer PowerSoil DNA Isolation kit, Qiagen) were heat-sterilized at 220°C for 12 h and subsequently suspended in one volume of filter- and UV-sterilized TE buffer. Approximately 2 µL of glass bead suspension were added to the filament sample. The sample was sonicated on ice (Sonoplus HD2070, Bandelin; sonication parameters: 3 min, continuous mode, amplitude setting: 30% ≈ 21 W). Genomes were amplified using the GenomePlex® Single Cell Whole Genome Amplification Kit (Sigma–Aldrich). Sequencing of the amplified genomic material was performed on an Ion Torrent PGM™ sequencer (Life Technologies) using 316v1 chips and 200 or 400 bp chemistry. Quality trimming and adapter clipping of reads were done using prinseq-lite.pl (2) and Seqclean (https://sourceforge.net/projects/seqclean/), respectively.

Reconstruction of a high-quality draft genome of Candidatus Electrothrix aarhusiensis MCF. The genome of Ca. E. aarhusiensis MCF was reconstructed from two filaments that shared identical 16S rRNA gene sequences and ITS regions. Sequences from both filaments were assembled with gsAssembler version 2.6 (Roche 454 Life Sciences) using 10 different settings: minimum overlap settings of 50 or 100 bp, and minimum sequence identity values of 96-100% in 1% steps. In parallel, reads were assembled using the CLC Genomics Workbench version 5.5.1 (CLCbio) with standard settings for Ion Torrent reads (automatic word and bubble size) and SPAdes version 2.2.1 (3) with the settings: --only-assembler -k 21,33,55. All assemblies were performed on separate data sets for both filaments as well as on a combined data set. In addition to the full-sized assemblies, 10 assemblies of reduced complexity were generated, where 500,000 randomly selected reads from the combined data set were assembled using gsAssembler (100 bp minimum overlap; 98% minimum sequence identity). In total, 46 assemblies were generated. All assembled sequence data were combined and then separated into contigs longer than 5,000 bp (long contigs data set), and those between 1,000 and 5,000 bp (short contigs data set) in length. Contigs shorter than 1,000 bp were excluded from further analysis. As sequence data from both filaments contained contamination from other bacteria presumably attached to the cable bacteria filaments, the genome of Ca. E. aarhusiensis MCF was reconstructed in a multi-step binning approach designed to eliminate contamination from the assembly: 1) Seed identification and extension: Two contigs (8 and 9.8 kbp long) that with high certainty originated from cable bacteria (anchor contigs) were identified in a gsAssembler assembly (98%

www.pnas.org/cgi/doi/10.1073/pnas.1903514116


sequence identity; 50 bp minimum overlap). The contigs contained phylogenetic marker genes (among others dnaG, rplC, rplD, and rplE), which showed a high sequence similarity to genes of Desulfobulbus propionicus, the closest fully sequenced relative of cable bacteria. Contigs featuring regions with a sequence identity of >97% over a stretch of more than 200 bp to either of the two anchor contigs were extracted from the long contig data set using blastn (4) and BioPerl (5). These extracted contigs and the anchor contigs were merged and manually curated using Sequencher version 5.0.1 (Genecodes). This extended the anchor contigs to lengths of 39 kbp and 54 kbp, respectively. 2) K-mer binning: Additional contigs of cable bacteria were identified using k-mer based binning (6): tetra- and tri-nucleotide Z-scores as well as GC contents of the 39 kbp and 54 kbp anchor contigs and contigs from the long contig data set were determined using a custom-made Perl script. Contigs were added to the cable bacteria bin if their tetra-nucleotide Z-score correlation coefficient (Rtetra) with one of the anchor contigs exceeded 0.9. Contigs not meeting this threshold, but exceeding a lower threshold of Rtetra>0.7 and a tri-nucleotide Z-score correlation coefficient of Rtri>0.85, and whose GC content differed by no more than 2% from one of the anchor contigs, were also added to the bin. All binned contigs were assembled together with the anchor contigs using Sequencher (Dirty Data Algorithm; 98% minimum sequence identity; 100 bp minimum overlap). 3) Undirected contig extension: Contigs showing a sequence identity of >97% over a stretch of at least 100 bp to one of the bin contigs were extracted from the long contig data set using blastn and custom Perl scripts. The extracted contigs were merged with the binned contigs and subsequently manually curated using Sequencher. This step was repeated four times, after which no further contigs of the long contig data set met the extraction criteria. 
4) Extension of contig ends using the short contig data set: Contigs showing a sequence identity of >97% over a stretch of at least 100 bp to the terminal 500 bp of a binned contig were extracted from the short contig data set using blastn and custom Perl scripts. The extracted contigs were merged with the binned contigs and subsequently manually curated using Sequencher. This step was repeated 8 times, after which no significant increase of the overall assembly size could be achieved. 5) Reassembly: The iterative extension of the target genome contigs may result in accumulation of assembly errors and redundancy in the data set. In order to correct for this, sequencing reads were mapped onto the contigs contained in the final bin using bowtie2 (7) with the parameters: --very-sensitive-local --score-min L,0,1.7. The score-min parameters ensured that only reads showing a sequence identity of at least 95% to binned contigs were mapped. All correctly mapped reads were extracted from the read data set. The extracted reads were error corrected using SPAdes version 2.3 (single cell mode, k-mer values 21, 33, 55; (3)) and subsequently assembled using gsAssembler (100 bp minimum overlap; 98% minimum sequence identity). The resulting 211 contigs, comprising 3.7 Mbp of sequence information, constituted version 1 of the Ca. E. aarhusiensis MCF genome. 6) Annotation based stitching of contigs: The Ca. E. aarhusiensis MCF genome was annotated using the IMG-ER pipeline (8). Truncated genes at contig termini were identified manually. The closest related homologs (amino acid similarity >40%) of the truncated genes were retrieved from GenBank (9) using blastp (4). If two truncated genes showed a high similarity (amino acid-based) to the same retrieved GenBank sequence, the two truncated genes were aligned to the reference sequence using CLUSTALW (10). 
If the alignment indicated that both truncated genes were part of the same gene, the corresponding contigs were tentatively stitched together. The gap between the contigs was estimated based on the alignment to the reference sequence. Sequencing reads were aligned to the stitched contigs using cross-match (11). The resulting alignments were manually inspected and used to fill gaps between stitched contigs using Consed version 23.0 (11).
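The relationship between the bowtie2 setting --score-min L,0,1.7 in step 5 and the stated identity cutoff of roughly 95% can be illustrated with a short calculation. This is only a sketch under stated assumptions: it uses bowtie2's documented local-mode defaults (match bonus 2, mismatch penalty between 2 and 6 depending on base quality), ignores gaps and soft-clipping, and the function name `min_identity` is hypothetical rather than part of the original workflow.

```python
import math

def min_identity(read_len, slope=1.7, intercept=0.0,
                 match_bonus=2, mismatch_penalty=6):
    """Lowest ungapped, unclipped identity that still passes the
    linear threshold --score-min L,<intercept>,<slope>."""
    min_score = intercept + slope * read_len   # threshold: 0 + 1.7 * read length
    perfect = match_bonus * read_len           # score when every base matches
    # Each mismatch loses the match bonus and additionally incurs the penalty.
    cost = match_bonus + mismatch_penalty
    # Small epsilon guards against floating-point rounding just below an integer.
    max_mismatches = math.floor((perfect - min_score) / cost + 1e-9)
    return (read_len - max_mismatches) / read_len

# Assumed bounds of the default quality-aware mismatch penalty (--mp 6,2).
for penalty in (6, 2):
    print(penalty, min_identity(200, mismatch_penalty=penalty))
```

Under these assumptions a 200-nt read must be 96.5% identical if all mismatches fall on high-quality bases and 92.5% identical if they all fall on low-quality bases, which brackets the approximately 95% cutoff described in the text.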


7) Gap closure using Sanger sequencing: For cases where read mapping could not confirm the tentative contig stitching, primer pairs were designed to bridge stitched contig ends using the CLC Genomics Workbench version 5.5.1. The resulting PCR products were directly sequenced by Sanger sequencing (Macrogen). 8) rRNA operon ambiguity resolution: The ITS regions between 16S rRNA and 23S rRNA genes were identified by PCR amplification and sequencing of fragments spanning this region from the filament genome amplification products. This was done with the 16S rRNA gene primers DSBB+1297F (reverse complement of DSBB+1297R (12)) and DBB1237F (reverse complement of DBB1237 (13,14)) in combination with the 23S rRNA gene primers ITSReub_DBB (5’-GCA TCC GCC GTC AGC C-3’; this study) and 126R_DBB (5’-CCG GGT TTC CCC ATT AGG-3’; this study). PCR reaction mixtures of 50 µL volume contained: 0.5 µM of each primer, 0.6 g L⁻¹ bovine serum albumin, 1x HotStar Taq Master Mix (Qiagen) and 1–10 ng of template DNA. Thermal cycling included: an initial denaturing step at 95°C for 15 min; 30 cycles of denaturation at 94°C for 30 s, annealing at 55°C for 1 min, and elongation at 72°C for 3 min; and a final elongation step at 72°C for 10 min. The resulting PCR products were purified (GenElute PCR CleanUp kit, Sigma), cloned (TOPO TA cloning Kit for sequencing, Invitrogen), and finally Sanger sequenced (GATC-biotech). Genomic regions adjacent to the 16S and 23S rRNA genes were identified by thermal asymmetric interlaced (TAIL)-PCR (15). For regions adjacent to the 16S rRNA gene, the primers SRB385 (16), DSBAC355R_mod (5’-CCA TTG CGC AAT ATT CCT CAC TG-3’ [modified reverse complement of DSBAC355 (17)]), and DBB121R_mod (5’-RGA CAG GTT ATC TAC GCG TTA CTC-3’ [modified reverse complement of DBB121 (14)]) were used sequentially. 
For regions adjacent to the 23S rRNA gene, the primers 2490f_long (5’-GTT TGG CAC CTC GAT GTC GGC-3’ [modified reverse complement of 2490r (18)]), DSB2600f (5’-ACA GTT TGG TCC TTA TCT GTT GCG-3’ [this study]), and DSB2628f (5’-GCA GGA TAT TTG AGG AGA TCT TTC C-3’ [this study]) were used sequentially. The resulting PCR products were purified and directly Sanger sequenced (Macrogen). The obtained Sanger reads of ITS regions and of regions adjacent to rRNA operons were merged with the genome using Sequencher. The resulting assembly represented version 2 of the Ca. E. aarhusiensis MCF genome. 9) Binning based on taxonomic classification: All contigs of the long and short contig data sets were taxonomically classified using MetaWatt 3.1.1 (19) based on best DIAMOND blastx hits (20) to the MetaWatt reference genome database supplemented with all available Desulfobulbaceae genomes (Table S2). 10) Merging of results of taxonomic binning with previous genome: Contigs classified as Desulfobulbaceae genome fragments were extracted and matched against the genome assembly using Sequencher (Dirty Data Algorithm; 98% minimum sequence identity; 50 bp minimum overlap). The resulting assembly represented version 3 of the Ca. E. aarhusiensis MCF genome. To preserve the existing IMG annotation (step 6), annotations from version 1 of the genome were transferred to version 3 using RATT (21). Already annotated regions (i.e. regions already present in genome version 1) were masked using a custom Perl script. Open reading frames (ORFs) were predicted in the non-masked regions using Prodigal 2.6.1 (22) as implemented in the PROKKA annotation pipeline (23), and added to the transferred annotation. 11) Final consolidation: For the final version of the MCF genome (Table S1), all contigs shorter than 1,000 bp were removed. The remaining contigs of version 3 of the Ca. E. 
aarhusiensis MCF genome were re-inspected using k-mer based binning in relation to the original anchor contigs (step 2) and using MetaWatt taxonomic binning (step 9). Only contigs either falling into the k-mer-based bin or taxonomically classified as Desulfobulbaceae were retained for the final version of the Ca. E. aarhusiensis MCF draft genome.
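The k-mer binning rule of step 2 and the retention rule of step 11 can be sketched as follows. This is an illustrative sketch, not the authors' Perl script: it assumes that Z-score vectors have already been computed for each contig, that the "2%" GC criterion means two percentage points, and that the three secondary criteria must hold for the same anchor contig. The dictionary keys (`tetra_z`, `tri_z`, `gc`, `length`, `taxon`) and function names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def in_kmer_bin(contig, anchors):
    """Step 2: Rtetra > 0.9, or Rtetra > 0.7 with Rtri > 0.85 and
    a GC difference of at most 2 percentage points (assumed reading)."""
    for anchor in anchors:
        r_tetra = pearson(contig["tetra_z"], anchor["tetra_z"])
        if r_tetra > 0.9:
            return True
        r_tri = pearson(contig["tri_z"], anchor["tri_z"])
        gc_close = abs(contig["gc"] - anchor["gc"]) <= 2.0
        if r_tetra > 0.7 and r_tri > 0.85 and gc_close:
            return True
    return False

def keep_for_final(contig, anchors):
    """Step 11: drop contigs shorter than 1,000 bp, then keep contigs that are
    either in the k-mer bin or classified as Desulfobulbaceae."""
    if contig["length"] < 1000:
        return False
    return in_kmer_bin(contig, anchors) or contig["taxon"] == "Desulfobulbaceae"
```

A contig would be passed to `keep_for_final` together with the list of anchor contigs; the thresholds are the ones stated in the text, while everything about the data layout is an assumption made for the example.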

Genome reconstructions for Ca. E. communis A1, Ca. E. marina A2, Ca. E. marina A3, and Ca. E. marina A5. Quality- and adapter-trimmed reads were assembled using


gsAssembler version 2.6 with 10 different settings: minimum overlap of 50 or 100 bp, and minimum sequence identity values of 96-100% in 1% steps. In parallel, reads were assembled using SPAdes version 3.5.0 (3) with the settings: -k 21,33,55 --sc. Additionally, a set of 10 low-complexity assemblies was generated, where 500,000 randomly selected reads from the combined data set were assembled using gsAssembler with a minimum overlap of 100 bp and a minimum sequence identity of 98%. The resulting 21 genome assemblies of a given filament sample were then combined. Contigs shorter than 1,000 bp were excluded from further analysis. Contigs were taxonomically classified using MetaWatt 3.1.1 (19) based on best DIAMOND blastx hits (20) to the MetaWatt reference genome database supplemented with all available Desulfobulbaceae genomes (Table S2). Contigs classified as Desulfobulbaceae were extracted for further processing and formed the initial cable bacteria bins. ORFs were identified using FragGeneScan version 1.19 (24). ORFs putatively coding for ribosomal proteins or enzymes of the canonical sulfate-reduction pathway were identified and extracted using MEGAN version 4.70.4 (25) based on blastp hits to NCBI’s nr database (January 25th, 2015 version). The extracted ORFs were inspected manually for correct annotation. Contigs carrying confirmed ribosomal protein or sulfate-reduction genes originating from Desulfobulbaceae were used as anchors for k-mer-based binning of additional contigs from the combined assembly data set as done for the genome of Ca. E. aarhusiensis MCF. Contigs with a sequence identity of at least 99% over a stretch of 100 bp to one of the bin contigs were extracted from the combined assembly data set using blastn and a custom Perl script, and added to the cable bacteria bins. Sequencing reads were mapped (96% minimum sequence identity) onto the cable bacteria bins and extracted using BBMap version 34.94 (sourceforge.net/projects/bbmap/). 
Extracted reads were assembled using SPAdes version 3.5.0 with the settings: -k 21,33,55 --sc. This was followed by a re-assembly using Sequencher with the Dirty Data Algorithm, a minimum sequence identity of 98%, and a minimum overlap of 100 bp. For the final versions of the genomes, all contigs shorter than 1,000 bp were removed. The remaining contigs were re-inspected using k-mer based binning in relation to the original anchor contigs and by using MetaWatt to confirm their taxonomic classification. Only contigs either falling into the k-mer-based bin or taxonomically classified as Desulfobulbaceae were retained for the final draft genome versions (Table S1).

Establishment and metagenomic sequencing of a Ca. Electronema sp. GS enrichment culture. Ca. Electronema sp. GS was enriched by whole core incubations of sediment from a freshwater pond in Vennelystparken, Aarhus, Denmark (56.164796, 10.207805). Repeated transfer of a single cable bacterium filament to autoclaved sediment facilitated the establishment of a stable sediment enrichment culture containing no other cable bacteria than the clonal strain Ca. Electronema sp. GS. This enrichment was used for genomic and proteomic analyses of Ca. Electronema sp. GS. A total of five enrichment cores were sampled in three different ways to obtain cable bacteria biomass: Samples GS1, GS2 and GS5 were obtained from cores on which sterilized sand had been sequentially added on top of the sediment for 7 days, thus forcing the cable bacteria to migrate into the sand due to oxygen limitation. The resulting cable bacteria-enriched sand layer was harvested. Sample GSC consisted of filaments isolated from a core with glass hooks as described previously (26). Sample GSL was obtained by collecting cable bacteria that migrated from sediment onto a glass slide as described in Bjerg et al. (27). DNA was extracted using the PowerLyzer® DNA Isolation Kit (MoBio Inc.). 
Libraries for metagenomic sequencing of the extracted DNA were prepared using the Nextera DNA Library Preparation Kit (Illumina), and sequenced using the Illumina MiSeq kit v3. Reads were quality checked using FastQC version 0.11.4 (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and trimmed using Trimmomatic v. 0.33 (28).

Genome reconstruction of Ca. Electronema sp. GS. The genome of Ca. Electronema sp. GS was extracted from a metagenome reconstructed from the pooled reads of five metagenomic


sequence libraries (GS1, GS2, GS5, GSC and GSL, see above) using the assembly program IDBA v 1.1.1 (29) with the following parameters: --mink 60 --maxk 120 --min_contig 500. The metagenome was binned using mmGenome v. 0.6.3 (30). Contigs were taxonomically classified using blastp against the NCBI RefSeq Database release 73. Coverage files for the differential coverage analysis were generated by separately mapping the reads from the five samples onto the metagenome using BBMap version 35.82 with a minimum sequence identity threshold of 95%. Contigs shorter than 5,000 bp were excluded from the analysis. A bin classified within the Desulfobulbaceae family was identified as a cable bacterium genome. The Desulfobulbaceae contigs were binned in two successive steps to reduce contamination: The coverage files of samples GS1 and GSL were used for an initial lenient binning, followed by a second, stricter binning using the coverage files of samples GS2 and GSC. Reads were re-mapped with a 98% minimum sequence identity threshold onto the resulting Desulfobulbaceae bin and the mapped reads were re-assembled using SPAdes v. 3.6.2 (3) with settings: -careful -kmer 21,33,55,77,99,127. The resultant assembly was retained for the final draft genome version of Ca. Electronema sp. GS (Table S1).

Estimation of genome completeness and genome annotation. Genome completeness was estimated using CheckM 1.0.7 (31), which tested for the 320 conserved single copy genes (CSCGs) of the order Desulfobacterales. Cable bacteria draft genomes were annotated using the IMG-ER pipeline (8).

Comparative genomics and last common ancestor analysis. Amino acid and nucleotide sequences of protein-coding genes of Desulfobulbaceae (Table S2) and cable bacteria were retrieved from IMG-ER (8). De novo prediction and clustering of protein families were performed using the Integrated Toolkit for Exploration of microbial Pan-genomes (ITEP) (32) version 1.1. Initial generation of the ITEP SQL database was done using standard cutoff values for blastp (e-value cutoff: 1E-5) and blastn (e-value cutoff: 1). Genes were clustered based on bidirectional-best blast hits (based on the “maxbit” metric) and using MCL (33) as integrated in ITEP. MCL clustering was performed using an inflation value of 2.0 and a cutoff value of 0.4. Core and pan-genome analyses were performed using R (34) and were based on the presence/absence table of gene families that was generated by ITEP. Last common ancestor (LCA) analysis of the genes in the Ca. E. aarhusiensis MCF genome was performed using MEGAN version 4.70.4 (25) based on blastp hits to NCBI’s nr database (January 25th, 2015 version). LCA assignment was performed using MEGAN’s standard settings (min support: 5; min score: 50; top percent: 10; win score: 0; min complexity: 0.44).

Phylogenetic analysis. Phylogenetic analysis was based on a set of single copy genes generally conserved within Bacteria (31). Genes not present in the draft genome of Ca. E. aarhusiensis MCF were removed. The resulting set contained 31 CSCGs primarily associated with the bacterial translation machinery (e.g. ribosomal proteins, tRNA synthetases, RNA polymerases) and with the following associated PFAM models: PF00164, PF00177, PF00181, PF00189, PF00203, PF00237, PF00238, PF00252, PF00276, PF00281, PF00297, PF00298, PF00312, PF00318, PF00333, PF00366, PF00380, PF00410, PF00411, PF00466, PF00572, PF00573, PF00623, PF00673, PF00687, PF00831, PF00861, PF01509, PF02978, PF03719, PF03946, PF03947, PF04561, PF04565, PF04997, PF05000, PF11987. 
The inferred amino acid sequences for each gene from cable bacteria, Desulfobulbaceae reference genomes (Table S2), and two ‘outgroup species’ (Thermodesulfatator indicus DSM15286, IMG taxon ID: 2505119042; Thermodesulfobacterium geofontis OPF15, IMG taxon ID 2506520012) were retrieved using IMG-ER’s Gene Export feature (8). The retrieved sequences were individually aligned per gene using MAFFT (version 7; (35)) with the “E-INS-i” option. The resulting alignments were concatenated yielding a final superalignment with 10,786 amino acid positions. For phylogenetic


analysis, deletions and highly variable regions were masked by using a 30% positional conservation filter as implemented in the ARB software environment (36). The filter left a total of 9,491 alignment positions. The phylogeny of the Desulfobulbaceae was reconstructed by maximum likelihood analysis using RAxML version 8.2.4 (37) with a Γ model of rate heterogeneity and the JTT protein evolution model. Node stability of the calculated phylogeny was evaluated by 1,000 bootstrap replicates.
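The idea behind a positional conservation filter can be sketched in a few lines. The exact definition used by ARB is not given in the text, so this example assumes one plausible reading: an alignment column is kept when its most frequent residue (gaps excluded) occurs in at least 30% of all sequences. The function names `conservation_mask` and `apply_mask` are hypothetical.

```python
from collections import Counter

def conservation_mask(alignment, min_fraction=0.30, gap_chars="-."):
    """Return one boolean per alignment column (True = keep).
    Assumed rule: most frequent non-gap residue >= min_fraction of sequences."""
    n_seqs = len(alignment)
    mask = []
    for column in zip(*alignment):              # iterate column-wise
        counts = Counter(c for c in column if c not in gap_chars)
        top = counts.most_common(1)[0][1] if counts else 0
        mask.append(top / n_seqs >= min_fraction)
    return mask

def apply_mask(alignment, mask):
    """Remove the masked-out columns from every aligned sequence."""
    return ["".join(c for c, keep in zip(seq, mask) if keep) for seq in alignment]

# Toy alignment: column 3 is too variable, column 4 contains only gaps.
toy = ["MKA-", "MRC-", "MKD-", "MTE-"]
filtered = apply_mask(toy, conservation_mask(toy))
```

Applied to the concatenated superalignment, a filter of this kind removes gap-rich and highly variable columns before tree inference, which corresponds to the reduction from 10,786 to 9,491 positions reported above.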

Proteome analysis. For protein extraction, 500 µL sediment from the Ca. Electronema sp. GS enrichment culture was mixed with 500 µL of buffer (10 µM Tris-HCl, pH 8.0), and transferred to a 2 mL lysing matrix E tube (MP Biomedicals). Cells in the sample were lysed by bead beating (4x 20s, 6.0 m/s) in a FastPrep-24 instrument (MP Biomedicals). The sample was kept on ice for 2 min between each bead beating to prevent sample heating. Large particulate material was allowed to settle by gravity for 5 min and aliquots of 50 µL of supernatant were transferred to six 2 mL microcentrifuge tubes. The samples were lyophilized, and suspended in pure water or 20, 40, 60, 80 or 100% formic acid, which is known to depolymerize strong polymers such as amyloids (38) and membrane proteins at high concentrations (39). The samples were lyophilized again to remove formic acid and water and subjected to SDS-PAGE using a loading buffer containing 8M urea (40). Samples were then analyzed by label-free quantitative mass-spectrometry as previously described (39) using a Q-Exactive mass spectrometer (Thermo Fisher Scientific) with a nano-high pressure liquid chromatography system (Ultimate3000 UHPLC, Thermo Fisher Scientific). Protein identification and quantification were done by comparison against the annotated Ca. Electronema sp. GS draft genome using the open-source software MaxQuant v1.5.8.3. A custom CDS database based on the Ca. Electronema sp. GS draft genome was used as reference. Besides the standard settings, LFQ was activated in MaxQuant. This included a peptide and protein false discovery rate of 1%. Reversed sequences as decoys and contaminant sequences were added automatically by MaxQuant. The minimum ratio count for LFQ was set to one. The reverse and contaminant sequences were removed from the MaxQuant output, and unique identifiers (gene names or id numbers) were created for each protein. 
The mass spectrometry proteomics data have been deposited at the ProteomeXchange Consortium (http://www.proteomexchange.org) via the PRIDE partner repository (41) in the EMBL-EBI database under Project PXD012775; detailed parameters are listed in the file ‘parameters.txt’ of that project.
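The post-processing of the MaxQuant output described above (removal of reverse and contaminant hits, creation of one identifier per protein) could look roughly like the sketch below. It assumes a tab-separated proteinGroups-style table in which flagged rows carry a "+" in columns named "Reverse" and "Potential contaminant", and it takes the first entry of "Protein IDs" as the identifier; these column names and the function name `clean_protein_groups` are assumptions for illustration, not details confirmed by the text.

```python
import csv
import io

def clean_protein_groups(tsv_text,
                         flag_columns=("Reverse", "Potential contaminant")):
    """Drop decoy/contaminant rows and add a single identifier per row.
    Column names are assumed, see the lead-in."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        # MaxQuant-style tables mark flagged rows with "+" (assumption).
        if any((row.get(col) or "").strip() == "+" for col in flag_columns):
            continue
        # Use the first listed protein ID as the unique identifier.
        row["identifier"] = row["Protein IDs"].split(";")[0]
        kept.append(row)
    return kept

# Tiny in-memory example with hypothetical identifiers.
example = (
    "Protein IDs\tReverse\tPotential contaminant\n"
    "gene_0001;gene_0002\t\t\n"
    "REV__gene_0003\t+\t\n"
    "CON__P00761\t\t+\n"
)
rows = clean_protein_groups(example)
```

In this toy table only the first row survives, with `gene_0001` as its identifier.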

PCR validation of the cytochrome-hemoglobin fusion protein. An approx. 1,560 bp-long gene fragment encoding the cable bacteria-unique cytochrome-hemoglobin domain fusion protein was PCR amplified from DNA extracted from Aarhus Bay sediment enriched with cable bacteria. Primers F-5'-CAGTGTGAAGGBTGTCATAC-3' and R-5'-CCTGCTGAAYAAGCTCGTC-3' were custom designed to target positions 301-320 and 1,849-1,867 of the 1,914 bp-long gene H206_00640 of Ca. E. aarhusiensis MCF, and thus span both cytochrome and globin domains. PCR reaction mixtures of 25 µL volume included 12.5 µL Ex Taq HS polymerase (Takara), 2.5 µL 10x Ex Taq buffer (Takara), 0.5 µL BSA (10 mg mL⁻¹), 0.5 µL of each primer (10 pmol µL⁻¹), and 1 µL undiluted DNA extract as template. Thermal cycling consisted of 95°C for 3 min, and 35 cycles of 95°C for 30 sec, 54°C for 40 sec, 72°C for 90 sec. The cycling was completed by an elongation step of 72°C for 10 min. PCR products were evaluated by agarose gel electrophoresis, purified (GenElute PCR Clean-Up Kit, Sigma) and cloned using the pGEM-T Easy vector system (Promega). Eleven clones representing the target gene were Sanger sequenced on both strands with M13 vector primers (Macrogen).
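The primer coordinates given above can be checked with a few lines of arithmetic, and the degenerate positions (B and Y) can be expanded to see how many concrete oligonucleotides each primer represents. The sketch uses the standard IUPAC nucleotide codes; the helper names `expand` and `amplicon_length` are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand(primer):
    """List every non-degenerate sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]

def amplicon_length(fwd_start, rev_end):
    """Length of the product spanning both primer sites (1-based, inclusive)."""
    return rev_end - fwd_start + 1

forward = "CAGTGTGAAGGBTGTCATAC"   # targets positions 301-320
reverse = "CCTGCTGAAYAAGCTCGTC"    # targets positions 1,849-1,867

print(len(forward), len(reverse))              # primer lengths
print(len(expand(forward)), len(expand(reverse)))
print(amplicon_length(301, 1867))
```

The primer lengths (20 and 19 nt) match the stated target ranges, and the expected product of 1,567 bp agrees with the fragment size of approximately 1,560 bp given in the text.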

MAR-FISH. Marine sediment was collected from Aarhus Bay (Marselisborg Marina, Denmark; 56.138377, 10.215787, water depth 4 m) using a Kajak corer (KC Denmark A/S). The upper 10 cm of the sediment were discarded, and the underlying sulfidic sediment was passed through a sieve (pore size: 0.5 mm) to remove macrofauna before homogenization and incubation in small


glass chambers. These chambers were constructed using two microscopy slides (1 mm thick; 75 mm long; 25 mm wide) separated by two pieces of glass slide (1 mm thick; 75 mm long; 10 mm wide), leaving a 5 mm wide hollow space at the center. All glass-glass surfaces were lubricated with silicone grease to minimize oxygen penetration by convection through the junctions and to help the chamber stick together. The chamber was held together using plastic tape. The bottom of the chamber was sealed using a butyl rubber stopper in order to prevent oxygen penetration from below. The chamber was filled with sediment and incubated in the dark with overlying air-saturated Aarhus Bay seawater (25 ‰) at 15°C. After two weeks the sediment contained a fully developed population of cable bacteria as evaluated by microsensor measurements, and ¹⁴C-labeled bicarbonate was added to a final concentration of 10 µCi mL⁻¹: labeled substrate with a concentration of 100 µCi mL⁻¹ was added to the sediment in three tracks 2-3 mm apart using a needle syringe. Control samples were killed by adding 2% formaldehyde (final conc.) together with the labeled substrate. After 8 h of incubation the chambers were dismantled and sediment samples were fixed in 4% formaldehyde for exactly 3 h in order to avoid fixation biases (42). The fixed samples were washed with glycine buffer (pH 3) to remove excess bicarbonate followed by two washes with phosphate-buffered saline (PBS; 130 mM NaCl, 10 mM NaPi, pH 7.4). Samples were stored in 50% ethanol in PBS at 5°C until further processing. The fixed and washed sediment samples containing cable bacteria filaments were transferred to gelatin-coated coverslips. In addition, single cable bacterial filaments were picked from sediment samples and negative controls using a hand-made glass hook, transferred to a water droplet on a gelatin-coated coverslip, and left to air-dry. FISH identification was performed as described previously (1). 
Microscopic analysis was performed using an Axioskop epifluorescence microscope (Carl Zeiss). MAR procedures and quantification of silver grains were done as described previously (42, 43). Image overlays were done using ImageJ (44).

Analysis of isoelectric points and of pI bias values. Protein-coding genes of cable bacteria, alkaliphilic Desulfobulbaceae (optimal growth at pH ≥ 9.5; Desulfurivibrio alkaliphilus AHT2, Deltaproteobacterium MLMS-1) and neutrophilic Desulfobulbaceae (optimal growth at pH 6.7-7.5; Desulfobulbus elongatus, Desulfobulbus japonicus, Desulfobulbus mediterraneus, Desulfobulbus propionicus, Desulfocapsa sulfexigens, Desulfocapsa thiozymogenes, Desulfotalea psychrophila, Desulfofustis glycolicus, Desulfopila aestuarii, Desulforhopalus singaporensis) were retrieved from IMG-ER (Table S2) (8). The isoelectric points (pI) of the proteins encoded by the retrieved genes were calculated iteratively with the bisection method (http://isoelectric.ovh.org) and a custom Perl script. For each species the pI bias (45) was calculated for periplasm-exposed proteins, which were identified by the presence of signal peptides or transmembrane helices as predicted by the IMG-ER pipeline (8). The “pI bias” describes the asymmetry of the bimodal distribution of pI, and ranges from -100% (all proteins have an acidic pI) to 100% (all proteins have a basic pI) (45).
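
The bisection approach to pI estimation can be illustrated with a minimal Python sketch. It is not the authors' Perl script. The pKa values, the pH 7 split between acidic and basic proteins, and the pI bias formula (basic minus acidic, divided by their sum) are assumptions chosen only to be consistent with the stated range of -100% to 100%; the exact definition is given in reference 45. The function names are hypothetical.

```python
# Illustrative pKa values (assumption: the source does not state which pKa scale was used).
PKA_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0}            # +1 when protonated
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}   # -1 when deprotonated
PKA_N_TERMINUS = 9.0
PKA_C_TERMINUS = 2.3


def net_charge(sequence, ph):
    """Henderson-Hasselbalch estimate of the net charge of a protein at a given pH."""
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERMINUS))
    for residue, pka in PKA_POSITIVE.items():
        positive += sequence.count(residue) / (1.0 + 10 ** (ph - pka))
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERMINUS - ph))
    for residue, pka in PKA_NEGATIVE.items():
        negative += sequence.count(residue) / (1.0 + 10 ** (pka - ph))
    return positive - negative


def isoelectric_point(sequence, low=0.0, high=14.0, tolerance=1e-4):
    """Find the pH of zero net charge by bisection.

    The net charge decreases monotonically with pH, so the interval
    [low, high] can be halved until it is narrower than the tolerance.
    """
    while high - low > tolerance:
        middle = (low + high) / 2.0
        if net_charge(sequence, middle) > 0:
            low = middle    # still positively charged: pI lies at higher pH
        else:
            high = middle   # neutral or negative: pI lies at lower pH
    return (low + high) / 2.0


def pi_bias(pi_values, neutral=7.0):
    """Assumed pI bias: percent excess of basic over acidic proteins."""
    acidic = sum(1 for pi in pi_values if pi < neutral)
    basic = sum(1 for pi in pi_values if pi > neutral)
    total = acidic + basic
    return 100.0 * (basic - acidic) / total if total else 0.0


if __name__ == "__main__":
    toy_proteins = ["MKKRLLKAAGK", "MDEEDLLSAED", "MSTHHKDE"]  # made-up sequences
    pis = [isoelectric_point(seq) for seq in toy_proteins]
    print([round(pi, 2) for pi in pis], pi_bias(pis))
```

In practice the input would be the periplasm-exposed proteins of one genome, and the bias would be compared between cable bacteria, alkaliphilic, and neutrophilic relatives.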

Bioorthogonal Noncanonical Amino Acid Tagging (BONCAT). Sediment was collected from Aarhus Bay (Løgten beach, 56.288472, 10.382986), Denmark. The upper 5 cm of the sediment was discarded to minimize bioturbation; the remaining sediment was mixed and distributed into glass beakers and incubated for 19 days in the dark in an aquarium circulating seawater from the sampling site at 15 °C to enrich for cable bacteria. Before initiating the BONCAT experiment, oxygen and sulfide porewater concentration profiles were measured by microsensors (1). The profiles indicated the successful enrichment of cable bacteria, as oxygen penetrated ~3 mm in the sediment, while sulfide began to accumulate only at ~15 mm depth. For the BONCAT experiment, five transparent plastic straws (5.5 cm in length, 7 mm in diameter) were inserted into the sediment of a single beaker. The top of these mini cores was just below the surface water layer. Following the procedure of Hatzenpichler et al. (46), 17 µL of 100 mM L-homopropargylglycine (HPG) was mixed into 200 µL sterile-filtered (0.2 µm pore size) seawater and 50 µL of the diluted HPG solution was gently injected into 4 of the mini cores from the bottom to the top using a Hamilton syringe fitted with a thin needle. The remaining core served as non-HPG control. The final concentration of HPG reached ~500 µM in each mini core. After injection, the beaker was sealed with parafilm and incubated for 19 hours at 15 °C. After incubation, O2 and H2S porewater concentrations were profiled again. The oxic layer extended from 0-3 mm, the suboxic sulfide-free layer from 3-12 mm, and the sulfidic layer started at 12 mm depth. The mini cores were removed and sectioned accordingly into oxic, suboxic and sulfidic zones. Cable bacteria filaments were picked by custom-made thin glass hooks (1) from oxic, suboxic and sulfidic layers of all mini cores, and were transferred to drops of sterile seawater on clean glass slides. Microscopic examination showed that the filaments were motile at this point and thus metabolically active. Cable bacteria filaments were immobilized at 45°C for 30 min, fixed in formaldehyde (3.7%) for 5 hours, and, after three consecutive washes in PBS, dehydrated in an ethanol series (50%, 80% and 96% ethanol; 3 min each). The cells incorporating HPG were fluorescently labeled by click chemistry as described previously (46). In short, Cu(I) click solution was prepared by mixing and incubating 5 µL 20 mM CuSO4, 10 µL 50 mM THPTA (Sigma) solution, and 1.2 µL 1 mM FAM-azide dye in the dark for 3 min at room temperature. This dye premix was gently mixed with a solution consisting of 50 µL 100 mM sodium ascorbate, 50 µL 100 mM aminoguanidine hydrochloride and 884 µL PBS. The dehydrated filaments on the microscope slides were covered in aliquots (60 µL) of the resultant click solution and incubated in the dark at 100% humidity for 30 min. The slides were washed 3 times with PBS, air-dried and stained with DAPI (1 mg mL-1) for 5 min in the dark, then washed and air-dried again. Finally, FISH was performed as described previously (1), and slides were observed on an Axiovert 200M epifluorescence microscope (Carl Zeiss).
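
The volumes and stock concentrations given in the BONCAT protocol can be turned into final working concentrations with simple dilution arithmetic. The Python sketch below is illustrative only: the variable and function names are hypothetical, and the last step (the distribution volume that would correspond to ~500 µM HPG) is an inference from the stated numbers, since the text does not say how the mini core volume was defined.

```python
# (stock concentration in mM, volume in microliters), as listed in the protocol text
CLICK_COMPONENTS = {
    "CuSO4": (20.0, 5.0),
    "THPTA": (50.0, 10.0),
    "FAM-azide": (1.0, 1.2),
    "sodium ascorbate": (100.0, 50.0),
    "aminoguanidine HCl": (100.0, 50.0),
    "PBS": (0.0, 884.0),  # diluent only
}


def final_concentrations(components):
    """Return the total volume (µL) and the final concentration (mM) of each component."""
    total_ul = sum(volume for _, volume in components.values())
    finals = {
        name: stock * volume / total_ul
        for name, (stock, volume) in components.items()
    }
    return total_ul, finals


click_total_ul, click_mM = final_concentrations(CLICK_COMPONENTS)

# HPG working solution: 17 µL of 100 mM stock mixed into 200 µL seawater.
hpg_working_mM = 100.0 * 17.0 / (17.0 + 200.0)

# Amount injected per mini core: 50 µL of the working solution (mM * µL = nmol).
hpg_injected_nmol = hpg_working_mM * 50.0

# Inference, not stated in the source: volume over which the injected HPG
# would have to be distributed to give 500 µM (nmol / µM = mL).
implied_volume_ml = hpg_injected_nmol / 500.0

if __name__ == "__main__":
    print(f"click solution volume: {click_total_ul:.1f} µL")
    for name, conc in click_mM.items():
        print(f"  {name}: {conc:.4f} mM")
    print(f"HPG working solution: {hpg_working_mM:.2f} mM")
    print(f"HPG injected per core: {hpg_injected_nmol:.0f} nmol")
    print(f"implied distribution volume: {implied_volume_ml:.2f} mL")
```

Under these numbers the click solution contains roughly 0.1 mM CuSO4, 0.5 mM THPTA, 1.2 µM FAM-azide, and 5 mM each of sodium ascorbate and aminoguanidine hydrochloride.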

Transmission electron microscopy (TEM), scanning electron microscopy (SEM) and energy-dispersive X-ray spectroscopy (EDX). For electron microscopy analyses, cable bacteria were collected with custom-made glass hooks, transferred onto TEM/SEM copper 220-230 mesh grids, and air-dried at room temperature. TEM was performed on a Tecnai Spirit microscope (120 kV). SEM was performed on a NanoSEM (FEI, Nova 600 NanoSEM) operated in low-vacuum and low-voltage (3 kV) mode for acquisition of charge contrast imaging data. EDX analysis of selected areas identified by SEM was performed with an integrated EDX with 20 keV beam energy. For SEM-EDX, gold-coated TEM grids were mounted on an aluminum stage or a silicon chip using carbon tape. The element mapping signals were collected for ~5 hours.

SI Discussion

COG-based profiling of gene function in cable bacteria genomes. A large fraction (37-66%) of the coding regions of the cable bacteria genomes compared to other Desulfobulbaceae (median: 31%) could not be assigned to any COG. Correspondingly, most COG functional categories appear underrepresented in cable bacteria (Fig. S1). However, the categories “energy production and conversion” (category C), “amino acid transport and metabolism” (category E), “carbohydrate transport and metabolism” (category G) and “signal transduction mechanisms” (category T) show particularly low representation (Fig. S1A). This likely reflects a limited organotrophic catabolic potential as discussed in the main text. The reason for the underrepresentation of signal transduction mechanisms is not clear; notably, cable bacteria generally harbor the same COG profile within this category as other Desulfobulbaceae but with fewer occurrences of the individual COGs (see also Fig. S11).

COG functional gene categories seemingly enriched in the cable bacteria genomes compared to other members of the family Desulfobulbaceae include the categories M (cell wall-membrane-envelope biogenesis), J (translation, ribosomal structure and biogenesis), O (posttranslational modification, protein turnover, chaperones), V (defense mechanisms), and D (cell cycle control, cell division, chromosome partitioning) (Fig. S1A). The numbers of genes belonging to categories M, J, and O in cable bacteria genomes resemble those in other Desulfobulbaceae, in accordance with many genes in these categories encoding essential cell functions. In contrast, overrepresentation of categories V and D may be an ecological adaptation (see discussion on virus attack below).

Sulfide oxidation. Physiological studies of various deltaproteobacterial sulfate reducers, including Desulfobulbus species, suggest that some sulfate-reducing bacteria can oxidize sulfide aerobically via a two-step process by which sulfide is first oxidized to elemental sulfur which is then disproportionated to sulfide and sulfate (47). In cable bacteria (see below), and in Desulfurivibrio alkaliphilus (48), a sulfide:quinone oxidoreductase (SQR) is the candidate enzyme to catalyse the initial oxidation of sulfide. However, we were unable to identify genes encoding SQR in the genomes of characterized sulfide-oxidizing members of the genus Desulfobulbus (Table S2); therefore, multiple mechanisms for sulfide oxidation may exist within the family Desulfobulbaceae. The genomes of Ca. Electronema sp. GS and Ca. E. aarhusiensis MCF encode SQR (H206_01218, Ga0183576_101211), and it was also detected in the proteome of Ca. Electronema sp. GS (SI Data 2). SQR is a periplasmic enzyme that catalyzes the oxidation of sulfide to zerovalent sulfur coupled to the reduction of quinones in the membrane (49). The cable bacteria SQRs belong to the type 3 group of SQR enzymes (Fig. S2). Members of this group are generally poorly functionally characterized (49), but available experimental evidence suggests that they function as bona fide SQRs (50, 51). SQRs, including the one of the cable bacteria, have no membrane spanning regions but interact with the membrane through their C-terminal region (49). The cable bacteria SQR carries an N-terminal signal peptide sequence and thus has a predicted periplasmic localization. During extensive microscopic analyses of cable bacteria, sulfur inclusions were never observed (1, 27, 52), indicating that the sulfur produced by SQR is rapidly consumed, or released and dissolved as hydrophilic polysulfides.
In some aerobic sulfide oxidizers, zerovalent sulfur produced by SQR-dependent sulfide oxidation is oxidized to sulfite via rhodanese and a sulfur dioxygenase (53, 54). A rhodanese with a predicted periplasmic localization is encoded in the genomes of both Ca. Electronema sp. GS and Ca. E. aarhusiensis MCF (H206_00301, Ga0183576_12525), and it is abundant in the proteome of Ca. Electronema sp. GS (SI Data 2). Sulfur dioxygenases, which belong to the metallo-β-lactamase superfamily (IPR001279), generally have a low degree of sequence conservation and are thus difficult to identify bioinformatically (53). The cable bacteria genomes encode members of this superfamily; however, the function of these proteins remains hypothetical. The Ca. E. aarhusiensis MCF genome encodes a protein (H206_01737) with fused N-terminal rhodanese and C-terminal metallo-β-lactamase superfamily domains. Homologs of this protein are also present in the genomes of other marine cable bacteria (Ca. E. communis A1, Ca. E. marina A2, Ca. E. marina A5) and various Desulfobulbus species. Their function may resemble that of sulfur dioxygenase proteins in Burkholderia, which have a fused rhodanese and dioxygenase domain (55). However, as discussed in the main text, rhodanese domain proteins may alternatively be involved in the transfer of zerovalent sulfur from the periplasm to the cytoplasm where it is oxidized. The enzymatic pathway for disproportionation of elemental sulfur is generally poorly resolved (56), but in D. alkaliphilus it seems to involve the enzymes of the canonical sulfate reduction pathway (48). Notably, the capacity for sulfur disproportionation and for autotrophic growth appears to coincide with the presence of a conserved gene cluster encoding a heterodisulfide reductase (HdrC), a methyl-viologen-reducing hydrogenase (mvhD), and a methylene-tetrahydrofolate reductase-like protein (Fig. S10).
This set of genes is fully conserved in autotrophic, sulfur disproportionating members of the family Desulfobulbaceae as well as in non-deltaproteobacterial sulfur disproportionators but is absent in heterotrophic relatives; it also occurs in cable bacteria, which lends support to their sulfur disproportionation capacity, even though the function of this gene cluster in disproportionation or autotrophy remains to be unraveled. As discussed in the main text, our model of sulfide oxidation and sulfur disproportionation in cable bacteria proposes that elemental sulfur produced by the activity of SQR in the periplasm is partly reduced to sulfide by the activity of an energy-conserving polysulfide reductase and partly transported into the cytoplasm, where it is oxidized to sulfate by a reversal of the canonical sulfate-reduction pathway (main text, Fig. 3). The initial step in the latter pathway is the reaction of sulfur (or possibly sulfide) with DsrC (main text Fig. 3). The cable bacteria genomes encode a single DsrC homolog with a CBX10CA conserved cysteine structure associated with the sulfate reduction pathway (57) and its reversal in sulfide-oxidizing prokaryotes (48, 58). Similar to D. alkaliphilus, cable bacteria genomes do not encode DsrEFH, which is likely essential for the sulfide oxidation pathway involving reverse-type DsrAB (rDSR) enzymes (58), but do encode DsrD adjacent to DsrAB, which is considered indicative for a reductive type dissimilatory sulfur metabolism (59). The genomic content together with the sulfide oxidizer phenotype of D. alkaliphilus and cable bacteria calls into question the use of dsrAB and dsrD as functional marker genes for sulfate reduction. In deltaproteobacterial sulfate reducers DsrMK is part of the membrane-bound complex DsrMKJOP (60). DsrMK forms a module that mediates electron transfer between the cytoplasm and the quinone pool while DsrJOP forms a module involved in electron transfer between the quinone pool and the periplasm (61, 62). DsrJOP appears absent in cable bacteria (Fig. S3) and is evidently not essential for the functioning of DsrMK since DsrJOP is also absent in Gram-positive dissimilatory sulfate reducers (63). The cable bacteria also lack other membrane complexes such as Tmc, Hmc, Qrc, and Och that mediate electron transfer between the periplasm and cytoplasm in SRM (62). In this way cable bacteria resemble a cytochrome-poor dissimilatory sulfate reducer (64) even though the cable bacteria genomes encode several periplasmic cytochromes (Table S5), some of which are abundant in the proteome of Ca. Electronema sp. GS (SI Data 2). As also discussed in the main text, we suspect that this reflects a need for cable bacteria to tightly control the exchange of electrons between cytoplasm and periplasm.
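
The two-step aerobic sulfide oxidation described at the start of this section (oxidation of sulfide to elemental sulfur, followed by sulfur disproportionation) can be written as balanced reactions. The equations below are standard textbook stoichiometries, not taken from the source; writing sulfide as HS⁻ and using O2 directly as the oxidant in the first step are simplifying assumptions made for illustration.

```latex
\begin{align}
2\,\mathrm{HS^-} + \mathrm{O_2} + 2\,\mathrm{H^+} &\rightarrow 2\,\mathrm{S^0} + 2\,\mathrm{H_2O}
  && \text{(sulfide oxidation to sulfur)} \\
4\,\mathrm{S^0} + 4\,\mathrm{H_2O} &\rightarrow 3\,\mathrm{HS^-} + \mathrm{SO_4^{2-}} + 5\,\mathrm{H^+}
  && \text{(sulfur disproportionation)} \\
\mathrm{HS^-} + 2\,\mathrm{O_2} &\rightarrow \mathrm{SO_4^{2-}} + \mathrm{H^+}
  && \text{(net: } 2 \times \text{first} + \text{second)}
\end{align}
```

Of every four sulfur atoms disproportionated, three return to the sulfide pool and one ends up as sulfate, so the net reaction equals complete oxidation of one sulfide to sulfate.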

Electron transfer from the membrane quinone pool to the periplasm. In Geobacter, electron transfer from the quinone pool to the periplasm is facilitated by the membrane-bound, proton-translocating cytochrome bc complex that delivers electrons to soluble periplasmic cytochromes (65, 66). The canonical bc1 complex consists of a membrane-bound Rieske Fe-S domain protein and membrane-bound cytochromes b and c1 (67, 68). The genomes of Ca. E. aarhusiensis MCF and Ca. Electronema sp. GS encode a membrane-bound Rieske Fe-S domain protein and an adjacent membrane-bound cytochrome b-domain protein that match both the N (IPR005797) and C terminal domain (IPR005798) models of the bc complex cytochrome b. The bc complex translocates protons by a Q-cycling mechanism, where the quinone/quinol-binding and proton release sites are located in the Rieske and the cytochrome b subunits (67, 68), respectively. Similar to Geobacter (65), cable bacteria do not encode a c1-type cytochrome, but another cytochrome could possibly substitute for c1 in electron transfer.

Another possibility for electron transfer from the quinone pool to the periplasm may be via CydA: Desulfobulbaceae possess a quinol-dependent membrane-bound terminal oxygen reductase encoded by the linked genes cydA and cydB (69). The CydA subunit oxidizes the quinone pool and transfers electrons to the heme d/heme b595-binding active site formed between the CydA and CydB subunits, which catalyzes the reduction of dioxygen (70,71). This high-oxygen affinity bd quinol oxidase is believed to protect sulfate reducers against oxygen stress (72,73).

Unlike other members of the family Desulfobulbaceae (Fig. 1 in the main text), cable bacteria do not possess a bd quinol oxidase as their genomes do not encode a CydB homolog. However, their genomes do encode a CydA-like protein (matching the pfam CydA model PF01654) that is distantly related (<40% identity) to known CydA proteins. The cable bacteria CydA (Ga0183576_10441, H206_01637) has 8 transmembrane helices as compared to the 9 in bona fide CydA proteins (70) and notably it has a cytochrome domain (PF13442) in its C-terminal part with a predicted periplasmic localization (Fig. S4). By this domain structure the cable bacteria CydA resembles the membrane-bound CymA protein in Shewanella, which is involved in transfer of electrons from the quinone pool to the periplasm during growth on extracellular electron acceptors (74). Indeed, the CydA subunit of the bd quinol oxidase harbors the quinol binding site and binds the heme ligands of the oxidase (71). The cable bacteria CydA-like protein also has the conserved residues His19, His186, Lys252, Glu257, and Met393 known to be important for quinol and heme binding (70). We therefore hypothesize that the cable bacteria CydA functions in oxidizing the reduced quinone pool in the membrane. The electrons may then be transferred to the periplasmic cytochrome domain of the cable bacteria CydA and from there to a soluble electron shuttle in the periplasm. Desulfobulbus and Desulfocapsa species all carry two proteins matching the pfam model for CydA. One of them is encoded next to cydB in their genomes and represents the CydA of the bd quinol oxidase. The second, orphan, CydA-like protein in Desulfobulbus and Desulfocapsa also carries a cytochrome domain, but in its central part, and otherwise shares very low amino acid sequence similarity (<30%) with the cable bacteria CydA-like proteins. The bd quinol oxidase contributes to energy conservation by generating a proton gradient across the cytoplasmic membrane by oxidizing quinol on the periplasmic side of the membrane and releasing protons to the periplasm while consuming protons from the cytoplasm by oxygen reduction (70, 71). Whether CydA represents an energy-conserving mechanism in the electron transport chain of the cable bacteria needs further investigation. The cable bacteria genomes encode several c-type cytochromes, most with a predicted periplasmic localization (Table S5), and Raman microscopy showed a high abundance of c-type cytochromes in live cable bacteria (75, Fig. S5). We therefore propose a model where c-type cytochromes play a key role in transferring electrons from the Rieske-cytochrome b complex or CydA to the conductive periplasmic fibers (main text Fig. 3). Possibly this transfer may also involve the diheme cytochrome MacA as in Geobacter (76). A highly expressed MacA homolog is encoded in the genomes of Ca. Electronema sp. GS and Ca. E. aarhusiensis MCF (H206_05370, Ga0183576_10582) sharing 43% full length sequence identity with MacA of Geobacter sulfurreducens PCA (NP_951525).

Pilus-associated genes. Cable bacteria contain at least three operons of genes associated with type IV pili or type II secretion systems (Table S6) that, together with >10 other genes, encode a complete pilus apparatus. This includes genes encoding the PilQ secretin, which forms a pore in the outer membrane through which the pilus normally protrudes into the extracellular environment (77). However, as discussed in the main text, extracellular pili were never observed in cable bacteria despite extensive electron microscopy imaging. PilQ also serves an essential function as initiator of the assembly of the pilus machinery. We therefore speculate that PilQ is also essential for assembly and stabilization of the pilus machinery in cable bacteria and thus for the formation of the periplasmic pili that we hypothesize form (part of) the periplasmic fibers. The cryo-EM structure of the Pseudomonas aeruginosa PilQ secretin at 7.4 Å revealed a central gate that is closed when the pilus is depolymerized (78). It is conceivable that the cable bacterial secretin PilQ is in a constant closed state, and the major functional role of this protein in cable bacteria is structural in relation to the assembly of the pilus machinery. The pilA genes in Ca. E. aarhusiensis MCF and Ca. Electronema sp. GS are not part of any of those operons but instead linked to genes encoding proteins with tetratricopeptide repeats that may aid in assembling individual PilA subunits into continuous fibers that span the entire cable bacterium filament (see main text). In Ca. Electrothrix, the pilA operon also encodes three sugar-modifying enzymes (Fig. S6), which we speculate could be involved in pilus glycosylation and thereby integrating pili into the cell envelope or onto a sugar backbone. Genes encoding the same enzymes are also present in Ca. Electronema, although not co-located with pilA (Fig. S6). This may be due to the genome-rearranging activity of transposases encoded in the vicinity of these genes in the genome of Ca. Electronema. Taken together, we speculate that the genomic arrangement of the pil genes and especially pilA may support an assembly of multiple e-pili, potentially onto a carbohydrate backbone, into larger fiber structures.

Terminal oxygen reduction. The genome of Ca. Electrothrix communis A1 encodes all four subunits of a membrane-bound cytochrome c oxidase (Cox1-4: Ga0068569_12322 and _13881 to _13884), of which the catalytic subunit shares 60% amino acid sequence identity with the cc(o/b)o3 type characterized in Desulfovibrio vulgaris (79). All four subunits are most closely related to homologs in other Desulfobulbaceae species (~75% amino acid sequence identity), yet these genes are present as the sole genes on two single contigs in the Ca. E. communis A1 genome assembly, and their presence may be the result of an erroneous assembly (i.e., contamination). The other cable bacteria genomes did not encode membrane bound cytochrome oxidases, neither a cbb3 nor an aa3 type (80). Therefore the presence of cytochrome c oxidase in Ca. E. communis A1 is most likely either an artifact or a result of horizontal gene transfer. This type of terminal oxidase is thus, just like the quinol-dependent membrane-bound terminal oxygen reductase CydAB discussed in the previous section, unlikely to play a key role in the general oxygen metabolism of cable bacteria. The Ca. E. aarhusiensis MCF and Ca. Electronema sp. GS genomes encode homologous proteins (H206_00640, Ga0183576_12429), consisting of a truncated hemoglobin and a tetraheme cytochrome domain. The two proteins share 55% amino acid identity (Fig. S7); however, the Ca. Electronema sp. GS protein has an N- and a C-terminal truncated hemoglobin domain and a central multiheme cytochrome domain, while the Ca. E. aarhusiensis MCF protein lacks the N-terminal truncated hemoglobin domain and is correspondingly shorter (Fig. S7A). Both proteins possess a predicted N-terminal signal peptide, and as they have no predicted membrane spanning regions they are likely located in the periplasm (Fig. S7A). Fusion proteins with globin and other functional domains are known from lower eukaryotes and bacteria (81). 
However, the presence of globin and cytochrome domains in a single protein is unique and never previously observed according to blastp searches against the NCBI nr database and CDART domain architecture searches (82). We validated that this fusion protein is not a result of a genome assembly error by PCR amplifying, cloning and sequencing the gene from DNA extracted from two Aarhus Bay sediment cable bacterium enrichment cultures using a custom designed primer set targeting the gene regions encoding the cytochrome domain and the C-terminal globin domain of the protein (see SI Methods). The sequencing confirmed that the two domains were encoded in a single ORF devoid of internal frame shifts or stop codons. The cytochrome part of the protein shares 50-60% identity with homologs in other deltaproteobacterial species, while the hemoglobin part shares 60-70% identity with single-domain truncated globins from a taxonomically diverse set of bacteria and eukaryotes and phylogenetically belongs to the Group I clade of hemoglobins (Fig. S7B). Little is known about the functions of hemoglobins in bacteria, but roles in oxygen sensing, catalytic nitrosative stress protection and oxygen binding and reduction have been proposed (81, 83, 84). The fusion of two such redox active domains suggests a role in an oxidation-reduction process. For a fusion protein with a globin domain and a monooxygenase domain in Streptomyces avermitilis, the globin domain is proposed to function in oxygen activation (85), and in mammalian cells neuroglobin can act as an electron donor for cytochrome c (86). As discussed in the main text, it is thus tempting to speculate that the unique truncated hemoglobin-cytochrome domain protein catalyzes periplasmic oxygen reduction in cable bacteria.

Protection from oxidative stress. Similar to many other sulfate reducers, Ca. E. aarhusiensis MCF and Ca. Electronema sp. GS harbor a cytoplasmic oxygen reductase (rubredoxin-oxygen oxidoreductase [H206_00510/11, Ga0183576_1519]) that uses reducing equivalents in the form of NADH from stored polyglucose to reduce and thereby detoxify oxygen (64). In accordance, cable bacteria harbor a complete pathway for synthesizing and degrading polyglucose (Fig. S9) (87, 88). The absence of genes encoding the glycolytic enzyme enolase (Fig. S9, see main text for details) in the cable bacteria genomes may question whether they can produce NADH and thus also ATP from polyglucose degradation, yet the consistent presence of the polyglucose metabolism genes in cable bacteria genomes and the observation of polyglucose granules in cable bacteria cells (main text Fig. 5C) suggest otherwise. In both Ca. Electronema sp. GS and Ca. E. aarhusiensis MCF, the rubredoxin-oxygen oxidoreductase is part of a putative operon (H206_00509-H206_00521; Ga0183576_1519, Ga0183576_1110-Ga0183576_1114) likely involved in protection against oxidative stress. This operon includes genes encoding the superoxide reductase desulfoferredoxin (89) and the hydrogen peroxide reductase rubrerythrin (90). Both of these cytoplasmic enzymes are believed to receive electrons from polyglucose degradation via the electron donor rubredoxin (91), which is also encoded in this operon. The Ca. E. aarhusiensis MCF genome also encodes the cytoplasmic bifunctional catalase/peroxidase KatG (H206_00990/1) for oxidative stress protection. Ca. Electronema sp. GS encodes two cytoplasmic catalases (Ga0183576_1609, Ga0183576_11638). Finally, Ca. Electronema sp. GS and Ca. E. aarhusiensis MCF also encode a superoxide reductase with a predicted periplasmic localization (H206_01519, Ga0183576_1191) and homologs (H206_06117-H206_06119; Ga0183576_10745-Ga0183576_10747) of the BatA-E proteins putatively involved in periplasmic oxidative stress protection in Bacteroides fragilis (92).

Membrane complexes involved in electron transfer from/to NADH and ferredoxin. The genomes of the marine strains Ca. E. aarhusiensis MCF and Ca. E. marina A5 encode a Na+-translocating NADH:ubiquinone oxidoreductase membrane complex (NqrA-F, H206_01090-H206_01081 and Ga0068572_10891-92, Ga0068572_10896, Ga0068572_13931-33). Nqr transfers electrons from NADH to the quinone pool while translocating Na+ across the membrane. The MCF genome encodes several H+/Na+ antiporters and possibly this complex facilitates energy conservation from organotrophic growth. However, it is perhaps more likely that the presence of this complex is an adaptation to growth in a saline environment, as the Nqr complex is absent in the freshwater Ca. Electronema sp. GS. Similarly, this complex is present in the two marine Desulfobulbus species (D. japonicus and D. mediterraneus) but absent in the two freshwater Desulfobulbus species D. propionicus and D. elongatus. The genomes of both Ca. E. aarhusiensis MCF and Ca. Electronema sp. GS encode a proton-translocating NAD(P)H-quinone oxidoreductase complex (Nuo), which is known to serve a similar function as the Nqr complex in transferring electrons between cytoplasmic NADH and the quinone pool (93). The Nuo complex may also catalyze the reduction of NAD+ to NADH with quinol as electron donor by reverse electron flow, thereby producing NAD(P)H (94), e.g., as needed for CO2 assimilation. The cable bacteria do not possess NuoEFG subunits, which form the NADH dehydrogenase module of the complex. This is commonly observed among SRM and it was suggested that their Nuo complexes oxidize ferredoxin (H206_03280, Ga0183576_11234) rather than NADH (62). Whether reverse electron flow will also facilitate the reduction of ferredoxin is not known.
However, as cable bacteria are predicted to assimilate CO2 autotrophically by the acetyl-CoA pathway, they need a mechanism for producing reduced ferredoxin from sulfide oxidation, as it is a key electron donor for the acetyl-CoA pathway along with NADH (95). Notably, the entire set of nuo genes [nuoA-D, H-N] in both Ca. E. aarhusiensis MCF and Ca. Electronema sp. GS seems acquired by a recent lateral gene transfer, as their nuo genes consistently show higher similarity to homologs in members of the genus Fibrobacter (phylum Fibrobacteres) than to homologs in Desulfobulbaceae (Table S4). In agreement, the NuoBC subunits are fused both in cable bacteria and in Fibrobacter species, but not in any other bacteria according to blastp search against the NCBI database. Unlike Fibrobacter intestinalis ATCC 43854 and Fibrobacter succinogenes S85, cable bacteria, however, lack genes encoding NuoEFG subunits. Known members of the genus Fibrobacter are obligate anaerobes unable to grow by respiration and are typically found in herbivore guts (96, 97). The functional significance, if any, of the transfer of the nuo gene cluster from a relative of this group to cable bacteria is not clear. Cable bacteria genomes furthermore encode a putative ferredoxin-NADP reductase (H206_01641, Ga0183576_10951) sharing 29% full-length amino acid sequence identity with the characterized enzyme (B9ZYL6) of Hydrogenobacter thermophilus (98) and sharing the same domain structure with an N-terminal NAD-binding domain (PF00175) and a C-terminal FAD-binding domain (PF00970). This cytoplasmic enzyme catalyzes reversible electron transfer between NADP+/NADPH and ferredoxins and it is possibly involved in production of reduced ferredoxin from NAD(P)H in cable bacteria. The soluble heterodisulfide reductases discussed below are possibly also involved in ferredoxin production. Cable bacteria lack the energy-conserving RNF complex, known from several deltaproteobacterial sulfate reducers, which couples oxidation of ferredoxin by NAD+ to sodium translocation across the cytoplasmic membrane (62).

Soluble heterodisulfide reductases. Heterodisulfide reductase (HdrABC) is encoded in the genome of Ca. E. aarhusiensis MCF by the linked hdrABC genes (H206_03463-H206_03465), but the other cable bacteria do not encode HdrB or carry adjacent hdrA and C in their genomes. Ca. E. aarhusiensis MCF furthermore carries two additional copies of hdrA, both adjacent to a homolog of mvhD encoding a methyl-viologen-reducing hydrogenase, delta subunit with a predicted cytoplasmic localization (H206_00705/H206_03779, H206_01153-55/H206_01156/7). The gene product of the hdrA copies shares 89% sequence identity with the product of an hdrA homolog (Ga0183576_1212) in Ca. Electronema sp. GS located at the end of a contig, thus missing the adjacent mvhD. The HdrA-MvhD pairs are commonly observed in sulfate-reducing bacteria and likely represent electron transfer modules connecting cytoplasmic redox reactions (99). The two hdrA-mvhD pairs in Ca. E. aarhusiensis MCF are respectively located next to genes with a predicted function in fatty acid synthesis and the acetyl-CoA pathway and may serve as redox partners in these pathways. The cable bacteria also encode an HdrC homolog, next to an mvhD homolog and a gene encoding a methylene-tetrahydrofolate reductase-like protein. As discussed above, this set of genes is fully conserved in autotrophic members of the family Desulfobulbaceae, which all can grow by sulfur disproportionation (Fig. S10); the gene set therefore possibly functions in CO2 fixation or disproportionation.

Hydrogen as alternative electron donor? Many sulfate reducers, including members of the family Desulfobulbaceae, utilize H2 as an electron donor via periplasmic hydrogenases (64), which, together with periplasmic cytochromes, also have a role in oxygen detoxification coupled to proton translocation (100). Such hydrogenases, including Fe-only hydrogenases (pfam02906) and Ni-containing hydrogenases (pfam00374), are absent in cable bacteria. While no hydrogenases at all were detected in the genome of Ca. Electronema sp. GS, Ca. E. aarhusiensis MCF encodes a cytoplasmic Hox hydrogenase and associated maturation proteins (SI Data 3) catalyzing the NAD+-dependent oxidation of H2 (101). Overall, this suggests that H2 is not a general electron donor for cable bacteria.

N2 fixation. The Ca. E. aarhusiensis MCF genome encodes the three catalytic subunits of the molybdenum-dependent nitrogenase (NifDHK) as well as the biosynthetic proteins NifENB (SI Data 3). The Ca. Electronema sp. GS genome likewise encodes NifDHK and NifNB yet lacks NifE (SI Data 3). Together this indicates the potential for N2 fixation to ammonium (102). Ammonium and amino acids can also be acquired from the environment, as cable bacteria encode an AmtB-family ammonium transporter as well as predicted amino acid and peptide ABC-type transporters (SI Data 3). Cable bacteria thus resemble deltaproteobacterial sulfate reducers, some of which are heterotrophic diazotrophs (103). Although ammonium is rarely limiting in anaerobic sediment, heterotrophic N2 fixation can be significant in marine sediments (104). The role of N2 fixation in the ecology of cable bacteria remains to be shown.

Motility and chemotaxis. Cable bacteria are motile by an unknown mechanism; based on microscopic observations, they were hypothesized to move by gliding motility (27). In support of this, genes encoding flagella, which are generally conserved within the Desulfobulbaceae, could not be detected in the cable bacteria genomes. Characterized genes involved in gliding motility in Flavobacteria (105, 106) were likewise not detected in the cable bacteria genomes. The genomes of Ca. E. aarhusiensis MCF and Ca. Electronema sp. GS, however, encode several putative polysaccharide exporters consisting of a fused permease and cytoplasmic ATPase component (COG1132), which could be involved in excretion-based gliding motility as known from cyanobacteria (107), Beggiatoa (108), and Myxobacteria (109). Desulfonema limicola is the closest filamentous relative of Ca. E. aarhusiensis MCF with available genome data. This species moves by gliding motility (110) and, similarly to Ca. E. aarhusiensis MCF, features several genes coding for putative polysaccharide exporters (BioProject accession number PRJNA50089). Cable bacteria actively position themselves in an oxygen-sulfide gradient, presumably by chemotaxis (27). The genomes of Ca. Electronema sp. GS and Ca. E. aarhusiensis MCF contain three putative chemotaxis operons (operon 1, Ga0183576_1157-59; H206_01667-69; operon 2, Ga0183576_1376-79; H206_01375-84; and operon 3, Ga0183576_10430-40; H206_02778-84). This is significantly fewer than in other Desulfobulbaceae, which appear to feature 11-23 chemotaxis operons per genome (using the presence of genes with methyl-accepting chemotaxis protein domains as a proxy for chemotaxis operons). This indicates that cable bacteria have limited or highly focused capabilities for responding to their environment. The specificities of the chemotaxis sensing domains could not be determined. In both genomes, chemotaxis operon 1 includes a putative TauE-like anion permease and ABC transporters for anions, while the other operons include hypothetical proteins; in Ca. Electronema sp. GS, operons 2 and 3 also include tRNA synthetase domain proteins. tRNA synthetases were previously suggested to play a role in the regulation of (flagellar) motility (111).
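The proxy used above for comparing chemotaxis capacity across genomes (counting genes that carry a methyl-accepting chemotaxis protein domain) can be illustrated with a minimal Python sketch. The input format, the function name count_mcp_genes, the toy genome and gene names, and the choice of Pfam accession PF00015 for the domain are all assumptions made for illustration; the source does not state which domain model or tooling was used.

```python
from collections import defaultdict

# Assumed Pfam accession for the MCP signalling domain (not stated in the source).
MCP_PFAM = "PF00015"


def count_mcp_genes(annotations):
    """Count distinct genes per genome that carry the MCP domain.

    annotations: iterable of (genome, gene_id, domain_accession) tuples,
    a hypothetical flat export of a domain annotation table.
    Genomes without any MCP hit do not appear in the result.
    """
    genes = defaultdict(set)
    for genome, gene_id, domain in annotations:
        if domain == MCP_PFAM:
            # A set ensures a gene with several domain hits is counted once.
            genes[genome].add(gene_id)
    return {genome: len(ids) for genome, ids in genes.items()}


# Toy data, invented purely to show the call.
toy = [
    ("genome_A", "gene_1", "PF00015"),
    ("genome_A", "gene_1", "PF00015"),  # second hit in the same gene
    ("genome_A", "gene_2", "OTHER"),
    ("genome_B", "gene_3", "PF00015"),
    ("genome_B", "gene_4", "PF00015"),
]
print(count_mcp_genes(toy))
```

Counting genes rather than domain hits keeps multi-domain receptors from inflating the proxy; whether one receptor gene corresponds to one operon is itself an approximation, as the text notes.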

Cable bacteria genomes show evidence of strong virus interaction. As discussed above, the COG functional categories V (defense mechanisms) and D (cell cycle control, cell division, chromosome partitioning) were overrepresented in cable bacteria (Fig. S1A). The high representation of category V in Ca. E. aarhusiensis MCF is in part due to Type I restriction-modification enzymes (COG0732, COG0286, COG0610; n=22). Restriction systems may act as mobile genetic elements (112), yet phylogenetic analysis did not indicate a recent expansion of these genes within the Ca. E. aarhusiensis MCF genome. A high load of mobile elements was reported in the filamentous sulfide oxidizer Ca. Maribeggiatoa sp., for which a role in cell differentiation was hypothesized (113). However, the predominance of these genes may also reflect that cable bacteria are challenged by phages (114). This is further supported by the high representation of COG category D. Most members of this category are antitoxins of toxin-antitoxin modules (COG4118, COG2161; n=10) with adjacent genes annotated as toxins. Toxin-antitoxin modules are widespread in bacterial genomes and may function in defense against phages (115). Although not assigned to single COGs, genes encoding or interacting with restriction enzymes and toxin-antitoxin modules are also overrepresented in the genomes of the other cable bacteria relative to cultivated members of the family Desulfobulbaceae (Table S9). The genome of Ca. Electronema sp. GS is furthermore characterized by an unusually long CRISPR region compared to cultivated Desulfobulbaceae (Table S10), indicative of an elevated immunity to virus attack (116). Together, this suggests that cable bacteria experience a high rate of virus predation, in agreement with a lifestyle in which growth occurs in blooms and with kill-the-winner (117) virus predation dynamics.

Isoelectric point of proteins suggests that cable bacteria may not tolerate high pH. The activity of cable bacteria causes a pronounced pH peak of 8.5 and higher in the oxic sediment surface (1, 118, 119). Since the pH in the periplasm of Gram-negative bacteria largely resembles that of the environment (120), we suspected that periplasm-exposed proteins were adapted to higher pH, which would be indicated by their isoelectric point (pI). However, pI biases of periplasm-exposed proteins (i.e. those with signal peptides or transmembrane helices) of cable bacteria largely resemble those of neutrophilic Desulfobulbaceae and are significantly higher than those of alkaliphilic Desulfobulbaceae (Fig. S13). This suggests that cable bacteria, like their neutrophilic relatives (121, 122), are adapted to environments of neutral pH and may not grow under alkaline conditions. In particular, the high pH in the oxic zone of blooming cable bacteria populations (118, 123) may cause denaturation of periplasmic proteins in the cathodic cells and contribute to the collapse of cable bacteria blooms.
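For readers unfamiliar with how a theoretical pI is obtained from a protein sequence, the following Python sketch shows one common approach: sum the Henderson-Hasselbalch charges of the ionizable groups and search for the pH at which the net charge is zero. The pKa values, the function names, and the example peptides are assumptions chosen for illustration (published pKa sets differ), and this is not necessarily the procedure used for Fig. S13.

```python
# Illustrative pKa values (an assumption; several published sets exist and differ).
PKA_POS = {"K": 10.5, "R": 12.5, "H": 6.0}             # groups that carry +1 when protonated
PKA_NEG = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}    # groups that carry -1 when deprotonated
PKA_NTERM = 9.0
PKA_CTERM = 2.0


def net_charge(seq: str, ph: float) -> float:
    """Net charge of an unmodified peptide chain at the given pH."""
    # One free amino terminus and one free carboxyl terminus per chain.
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_NTERM))
    charge -= 1.0 / (1.0 + 10 ** (PKA_CTERM - ph))
    for aa in seq.upper():
        if aa in PKA_POS:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POS[aa]))
        elif aa in PKA_NEG:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEG[aa] - ph))
    return charge


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection on pH 0-14; net charge decreases monotonically with pH."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid   # still positively charged, so the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0


# Toy peptides: a lysine-rich chain has a basic pI, an aspartate-rich chain an acidic one.
for peptide in ("KKKK", "DDDD", "ACDKR"):
    print(peptide, round(isoelectric_point(peptide), 2))
```

A proteome-level pI bias of the kind discussed above would then be summarized from the per-protein values, for example as the fraction of proteins with pI above or below 7.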

Gene regulation. Relative to their genome sizes, Aarhus Bay cable bacteria have on average fewer predicted regulatory genes, i.e. genes with small-molecule-binding domains (124), genes with cyclic-di-GMP-binding domains (125), and genes potentially interacting with histidine kinases of two-component systems (126), than other members of the Desulfobulbaceae (Fig. S12). Average densities of regulatory genes in cable bacteria are comparable to those of other sulfur-cycling filamentous bacteria, such as the deltaproteobacterial sulfate-reducer Desulfonema limicola or gammaproteobacterial sulfide-oxidizers of the family Beggiatoaceae.
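The comparison above rests on normalizing counts of regulatory genes by genome size. A minimal Python sketch of such a normalization is given below; the unit (genes per megabase pair), the function name, and the numbers in the usage line are assumptions for illustration only, since the source does not specify the exact normalization behind Fig. S12.

```python
def regulatory_gene_density(n_regulatory_genes: int, genome_size_bp: int) -> float:
    """Return predicted regulatory genes per megabase pair of genome sequence."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return n_regulatory_genes / (genome_size_bp / 1e6)


# Invented numbers, used only to show the call: 120 genes in a 4 Mbp genome.
print(regulatory_gene_density(120, 4_000_000))
```

Normalizing by the number of protein-coding genes instead of by sequence length would be an equally plausible choice and would give the same ranking only if coding density is similar across the genomes compared.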

References

1. Pfeffer C, Larsen S, Song J, Dong M, Besenbacher F, Meyer RL, Kjeldsen KU, Schreiber L, Gorby YA, El-Naggar MY, Leung KM, Schramm A, Risgaard-Petersen N, Nielsen LP (2012) Filamentous bacteria transport electrons over centimetre distances. Nature 491(7423):218–221.

2. Schmieder R, Edwards R (2011) Quality control and preprocessing of metagenomic datasets. Bioinformatics 27(6):863–864.

3. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA (2012) SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 19(5):455–477.

4. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: Architecture and applications. BMC Bioinformatics 10:1–9.

5. Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JGR, Korf I, Lapp H, Lehvaslaiho H, Matsalla C, Mungall CJ, Osborne BI, Pocock MR, Schattner P, Senger M, Stein LD, Stupka E, Wilkinson MD, Birney E (2002) The Bioperl Toolkit: Perl modules for the life sciences. Genome Res 12:1611–1618.

6. Iverson V, Morris RM, Frazar CD, Berthiaume CT, Morales RL, Armbrust EV (2012) Untangling genomes from metagenomes: revealing an uncultured class of marine Euryarchaeota. Science 335(6068):587–591.

7. Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9(4):357–359.

8. Markowitz VM, Chen IMA, Palaniappan K, Chu K, Szeto E, Grechkin Y, Ratner A, Jacob B, Huang J, Williams P, Huntemann M, Anderson I, Mavromatis K, Ivanova NN, Kyrpides NC (2012) IMG: The integrated microbial genomes database and comparative analysis system. Nucleic Acids Res 40(D1):D115–D122.

9. Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Ostell J, Pruitt KD, Sayers EW (2018) GenBank. Nucleic Acids Res 46(D1):D41–D47.

10. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ, Higgins DG (2007) Clustal W and Clustal X version 2.0. Bioinformatics 23(21):2947–2948.

11. Gordon D, Abajian C, Green P (1998) Consed: a graphical tool for sequence finishing. Genome Res 8:195–202.

12. Kjeldsen KU, Loy A, Jakobsen TF, Thomsen TR, Wagner M, Ingvorsen K (2007) Diversity of sulfate-reducing bacteria from an extreme hypersaline sediment, Great Salt Lake (Utah). FEMS Microbiol Ecol 60(2):287–298.

13. Solden LM, Hoyt DW, Collins WB, Plank JE, Daly RA, Hildebrand E, Beavers TJ, Wolfe R, Nicora CD, Purvine SO, Carstensen M, Lipton MS, Spalinger DE, Firkins JL, Wolfe BA, Wrighton KC (2017) New roles in hemicellulosic sugar fermentation for the uncultivated Bacteroidetes family BS11. ISME J 11(3):691–703.

14. McCarthy AJ, Daly K, Sharp RJ (2000) Development of oligonucleotide probes and PCR primers for detecting phylogenetic subgroups of sulfate-reducing bacteria. Microbiology 146(7):1693–1705.

15. Liu Y-G, Whittier R (1995) Thermal asymmetric interlaced PCR: automatable amplification and sequencing of insert end fragments from P1 and YAC clones for chromosome walking. Genomics 25(3):674–681.

16. Amann RI, Binder BJ, Olson RJ, Chisholm SW, Devereux R, Stahl DA (1990) Combination of 16S rRNA-targeted oligonucleotide probes with flow cytometry for analyzing mixed microbial populations. Appl Environ Microbiol 56(6):1919–1925.

17. Scheid D, Stubner S (2001) Structure and diversity of Gram-negative sulfate-reducing bacteria on rice roots. FEMS Microbiol Ecol 36(2–3):175–183.

18. Hunt DE, Klepac-Ceraj V, Acinas SG, Gautier C, Bertilsson S, Polz MF (2006) Evaluation of 23S rRNA PCR primers for use in phylogenetic studies of bacterial diversity. Appl Environ Microbiol 72(3):2221–2225.

19. Strous M, Kraft B, Bisdorf R, Tegetmeyer HE (2012) The binning of metagenomic contigs for microbial physiology of mixed cultures. Front Microbiol 3:1–11.

20. Buchfink B, Xie C, Huson DH (2015) Fast and sensitive protein alignment using DIAMOND. Nat Methods 12(1):59–60.

21. Otto TD, Dillon GP, Degrave WS, Berriman M (2011) RATT: Rapid annotation transfer tool. Nucleic Acids Res 39(9):1–7.

22. Hyatt D, Chen GL, LoCascio PF, Land ML, Larimer FW, Hauser LJ (2010) Prodigal: Prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11:119.

23. Seemann T (2014) Prokka: Rapid prokaryotic genome annotation. Bioinformatics 30(14):2068–2069.

24. Rho M, Tang H, Ye Y (2010) FragGeneScan: Predicting genes in short and error-prone reads. Nucleic Acids Res 38(20):1–12.

25. Huson DH, Mitra S, Ruscheweyh H-J, Weber N, Schuster SC (2011) Integrative analysis of environmental sequences using MEGAN4. Genome Res 21(9):1552–1560.

26. Trojan D, Schreiber L, Bjerg JT, Bøggild A, Yang T, Kjeldsen KU, Schramm A (2016) A taxonomic framework for cable bacteria and proposal of the candidate genera Electrothrix and Electronema. Syst Appl Microbiol 39(5):297–306.

27. Bjerg JT, Damgaard LR, Holm SA, Schramm A, Nielsen LP (2016) Motility of electric cable bacteria. Appl Environ Microbiol 82(13):3816–3821.

28. Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30(15):2114–2120.

29. Peng Y, Leung HCM, Chin FY (2010) IDBA – A practical iterative de Bruijn graph de novo assembler. Research in Computational Molecular Biology. RECOMB 2010. Lecture Notes in Computer Science, Vol 6044, eds Istrail S, Pevzner P, Waterman M (Springer, Berlin, Heidelberg), p 305. 1st ed.

30. Karst SM, Kirkegaard RH, Albertsen M (2014) Mmgenome: A toolbox for reproducible genome extraction from metagenomes. bioRxiv:059121.

31. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW (2015) CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res 25(7):1043–1055.

32. Benedict MN, Henriksen JR, Metcalf WW, Whitaker RJ, Price ND (2014) ITEP: An integrated toolkit for exploration of microbial pan-genomes. BMC Genomics 15:8.

33. Van Dongen S (2008) Graph Clustering via a discrete uncoupling process. SIAM J Matrix Anal Appl 30(1):121–141.

34. R Core Team (2016) R: A language and environment for statistical computing. Available at: https://www.r-project.org/.

35. Katoh K, Standley DM (2013) MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol Biol Evol 30(4):772–780.

36. Ludwig W, Strunk O, Westram R, Richter L, Meier H, Yadhukumar A, Buchner A, Lai T, Steppi S, Jacob G, Förster W, Brettske I, Gerber S, Ginhart AW, Gross O, Grumann S, Hermann S, Jost R, König A, Liss T, Lüßmann R, May M, Nonhoff B, Reichel B, Strehlow R, Stamatakis A, Stuckmann N, Vilbig A, Lenke M, Ludwig T, Bode A, Schleifer KH (2004) ARB: A software environment for sequence data. Nucleic Acids Res 32(4):1363–1371.

37. Stamatakis A (2014) RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30(9):1312–1313.

38. Dueholm MS, Nielsen PH (2016) Amyloids – a neglected child of the slime. The Perfect Slime: Microbial Extracellular Polymeric Substances (EPS), eds Flemming HC, Neu JR, Wingender J (IWA Publishing, London), pp 113–133.

39. Danielsen HN, Hansen SH, Herbst FA, Kjeldal H, Stensballe A, Nielsen PH, Dueholm MS (2017) Direct identification of functional amyloid proteins by label-free quantitative mass spectrometry. Biomolecules 7(3):1–9.

40. Nilsson M, Givskov M, Overgaard MT, Søndergaard MT, Stensballe A, Otzen DE, Tolker-Nielsen T, Dueholm MS, Nielsen PH, Christiansen G (2013) Expression of Fap amyloids in Pseudomonas aeruginosa, P. fluorescens, and P. putida results in aggregation and increased biofilm formation. Microbiology Open 2(3):365–382.

41. Vizcaíno JA, Csordas A, del-Toro N, Dianes JA, Griss J, Lavidas I, Mayer G, Perez-Riverol Y, Reisinger F, Ternent T, Xu QW, Wang R, Hermjakob H (2016) 2016 update of the PRIDE database and related tools. Nucleic Acids Res 44(D1):D447–D456.

42. Nielsen JL, Christensen D, Kloppenborg M, Halkjær Nielsen P (2003) Quantification of cell-specific substrate uptake by probe-defined bacteria under in situ conditions by microautoradiography and fluorescence in situ hybridization. Environ Microbiol 5(3):202–211.

43. Nierychlo M, Nielsen JL, Nielsen PH (2017) Studies of the ecophysiology of single cells in microbial communities by (quantitative) microautoradiography and fluorescence in situ hybridization (MAR-FISH). Hydrocarbon and Lipid Microbiology Protocols, eds McGenity TJ, Timmis NK, Nogales B (Springer, Berlin, Heidelberg), pp 115–131. 1st ed.

44. Schneider CA, Rasband WS, Eliceiri KW (2012) NIH Image to ImageJ: 25 years of image analysis. Nat Methods 9(7):671–675.

45. Kiraga J, Mackiewicz P, Mackiewicz D, Kowalczuk M, Biecek P, Polak N, Smolarczyk K, Dudek MR, Cebrat S (2007) The relationships between the isoelectric point and: length of proteins, taxonomy and ecology of organisms. BMC Genomics 8:163.

46. Hatzenpichler R, Scheller S, Tavormina PL, Babin BM, Tirrell DA, Orphan VJ (2014) In situ visualization of newly synthesized proteins in environmental microbes using amino acid tagging and click chemistry. Environ Microbiol 16(8):2568–2590.

47. Fuseler K, Krekeler D, Sydow U, Cypionka H (1996) A common pathway of sulfide oxidation by sulfate-reducing bacteria. FEMS Microbiol Lett 144(2–3):129–134.

48. Thorup C, Schramm A, Findlay AJ, Finster KW, Schreiber L (2017) Disguised as a sulfate reducer: Growth of the deltaproteobacterium Desulfurivibrio alkaliphilus by sulfide oxidation with nitrate. MBio 8(4):1–9.

49. Marcia M, Ermler U, Peng G, Michel H (2010) A new structure-based classification of sulfide:quinone oxidoreductases. Proteins Struct Funct Bioinforma 78(5):1073–1083.

50. Han Y, Perner M (2016) Sulfide consumption in Sulfurimonas denitrificans and heterologous expression of its three sulfide-quinone reductase homologs. J Bacteriol 198(8):1260–1267.

51. Lencina AM, Ding Z, Schurig-Briccio LA, Gennis RB (2013) Characterization of the Type III sulfide:quinone oxidoreductase from Caldivirga maquilingensis and its membrane binding. Biochim Biophys Acta - Bioenerg 1827(3):266–275.

52. Cornelissen R, Bøggild A, Thiruvallur Eachambadi R, Koning RI, Kremer A, Hidalgo-Martinez S, Zetsche E-M, Damgaard LR, Bonné R, Drijkoningen J, Geelhoed JS, Boesen T, Boschker HTS, Valcke R, Nielsen LP, D’Haen J, Manca JV, Meysman FJR (2018) The cell envelope structure of cable bacteria. Front Microbiol 9:3044.

53. Liu H, Xin Y, Xun L (2014) Distribution, diversity, and activities of sulfur dioxygenases in heterotrophic bacteria. Appl Environ Microbiol 80(5):1799–1806.

54. Wu W, Pang X, Lin J, Liu X, Wang R, Lin J, Chen L (2017) Discovery of a new subgroup of sulfur dioxygenases and characterization of sulfur dioxygenases in the sulfur metabolic network of Acidithiobacillus caldus. PLoS One 12(9):1–23.

55. Motl N, Skiba MA, Kabil O, Smith JL, Banerjee R (2017) Structural and biochemical analyses indicate that a bacterial persulfide dioxygenase-rhodanese fusion protein functions in sulfur assimilation. J Biol Chem 292(34):14026–14038.

56. Finster K (2008) Microbiological disproportionation of inorganic sulfur compounds. J Sulfur Chem 29(3–4):281–292.

57. Santos AA, Venceslau SS, Grein F, Leavitt WD, Dahl C, Johnston DT, Pereira IAC (2015) A protein trisulfide couples dissimilatory sulfate reduction to energy conservation. Science 350(6267):1541–1545.

58. Venceslau SS, Stockdreher Y, Dahl C, Pereira IAC (2014) The “bacterial heterodisulfide” DsrC is a key protein in dissimilatory sulfur metabolism. Biochim Biophys Acta - Bioenerg 1837(7):1148–1164.

59. Anantharaman K, Hausmann B, Jungbluth SP, Kantor RS, Lavy A, Warren LA, Rappé MS, Pester M, Loy A, Thomas BC, Banfield JF (2018) Expanded diversity of microbial groups that shape the dissimilatory sulfur cycle. ISME J 12(7):1715–1728.

60. Pires RH, Venceslau SS, Morais F, Teixeira M, Xavier AV, Pereira IAC (2006) Characterization of the Desulfovibrio desulfuricans ATCC 27774 DsrMKJOP complex - A membrane-bound redox complex involved in the sulfate respiratory pathway. Biochemistry 45(1):249–262.

61. Grein F, Venceslau SS, Schneider L, Hildebrandt P, Todorovic S, Pereira IAC, Dahl C (2010) DsrJ, an essential part of the DsrMKJOP transmembrane complex in the purple sulfur bacterium Allochromatium vinosum, is an unusual triheme cytochrome c. Biochemistry 49(38):8290–8299.

62. Pereira IAC, Ramos AR, Grein F, Marques MC, da Silva SM, Venceslau SS (2011) A comparative genomic analysis of energy metabolism in sulfate reducing bacteria and archaea. Front Microbiol 2:69.

63. Junier P, Junier T, Podell S, Sims DR, Detter JC, Lykidis A, Han CS, Wigginton NS, Gaasterland T, Bernier-Latmani R (2010) The genome of the Gram-positive metal- and sulfate-reducing bacterium Desulfotomaculum reducens strain MI-1. Environ Microbiol 12(10):2738–2754.

64. Rabus R, Venceslau SS, Wöhlbrand L, Voordouw G, Wall JD, Pereira IAC (2015) A post-genomic view of the ecophysiology, catabolism and biotechnological relevance of sulphate-reducing prokaryotes. Adv Microb Physiol 66:55–321.

65. Butler JE, Young ND, Lovley DR (2010) Evolution of electron transfer out of the cell. BMC Genomics 11:40.

66. Zacharoff L, Chan CH, Bond DR (2016) Reduction of low potential electron acceptors requires the CbcL inner membrane cytochrome of Geobacter sulfurreducens. Bioelectrochemistry 107:7–13.

67. Crofts AR, Lhee S, Crofts SB, Cheng J, Rose S (2006) Proton pumping in the bc1 complex: a new gating mechanism that prevents short circuits. Biochim Biophys Acta 1757(8):1019–1034.

68. Trumpower BL (1990) Cytochrome bc1 complexes of microorganisms. Microbiol Rev 54(2):101–129.

69. Lemos RS, Gomes M, Santana M, Legall J, Xavier AV, Teixeira M (2001) The “strict” anaerobe Desulfovibrio gigas contains a membrane-bound oxygen-reducing respiratory chain. FEBS Lett 496:40–43.

70. Borisov VB, Gennis RB, Hemp J, Verkhovsky MI (2011) The cytochrome bd respiratory oxygen reductases. Biochim Biophys Acta - Bioenerg 1807(11):1398–1413.

71. Safarian S, Rajendran C, Müller H, Preu J, Langer JD, Ovchinnikov S, Hirose T, Kusumoto T, Sakamoto J, Michel H (2016) Structure of a bd oxidase indicates similar mechanisms for membrane integrated oxygen reductases. Science 352(6285):583–586.

72. Ramel F, Amrani A, Pieulle L, Lamrabet O, Voordouw G, Seddiki N, Brèthes D, Company M, Dolla A, Brasseur G (2013) Membrane-bound oxygen reductases of the anaerobic sulfate-reducing Desulfovibrio vulgaris Hildenborough: Roles in oxygen defence and electron link with periplasmic hydrogen oxidation. Microbiology 159:2663–2673.

73. Ramel F, Brasseur G, Pieulle L, Valette O, Hirschler-Réa A, Fardeau ML, Dolla A (2015) Growth of the obligate anaerobe Desulfovibrio vulgaris Hildenborough under continuous low oxygen concentration sparging: Impact of the membrane-bound oxygen reductases. PLoS One 10(4):1–17.

74. Clarke TA, Edwards MJ, Gates AJ, Hall A, White GF, Bradley J, Reardon CL, Shi L, Beliaev AS, Marshall MJ, Wang Z, Watmough NJ, Fredrickson JK, Zachara JM, Butt JN, Richardson DJ (2011) Structure of a bacterial cell surface decaheme electron conduit. Proc Natl Acad Sci U S A 108(23):9384–9389.

75. Bjerg JT, Boschker HTS, Larsen S, Berry D, Schmid M, Millo D (2018) Long-distance electron transport in individual, living cable bacteria. Proc Natl Acad Sci U S A 115(22):5786–5791.

76. Seidel J, Hoffmann M, Ellis KE, Seidel A, Spatzal T, Gerhardt S, Elliott SJ, Einsle O (2012) MacA is a second cytochrome c peroxidase of Geobacter sulfurreducens. Biochemistry 51(13):2747–2756.

77. Burrows LL (2005) Weapons of mass retraction. Mol Microbiol 57(4):878–888.

78. Koo J, Lamers RP, Rubinstein JL, Burrows LL, Howell PL (2016) Structure of the Pseudomonas aeruginosa Type IVa pilus secretin at 7.4 Å. Structure 24(10):1778–1787.

79. Lamrabet O, Pieulle L, Aubert C, Mouhamar F, Stocker P, Dolla A, Brasseur G (2011) Oxygen reduction in the strict anaerobe Desulfovibrio vulgaris Hildenborough: Characterization of two membrane-bound oxygen reductases. Microbiology 157(9):2720–2732.

80. Pereira MM, Santana M, Teixeira M (2001) A novel scenario for the evolution of haem-copper oxygen reductases. Biochim Biophys Acta - Bioenerg 1505(2–3):185–208.

81. Hade MD, Kaur J, Chakraborti PK, Dikshit KL (2017) Multidomain truncated hemoglobins: New members of the globin family exhibiting tandem repeats of globin units and domain fusion. IUBMB Life 69(7):479–488.

82. Geer LY, Domrachev M, Lipman DJ, Bryant SH (2002) CDART: Protein Homology by Domain Architecture. Genome Res 12(10):1619–1623.

83. Vinogradov SN, Moens L (2008) Diversity of globin function: Enzymatic, transport, storage, and sensing. J Biol Chem 283(14):8773–8777.

84. Fernandez E, Larsson JT, McLean KJ, Munro AW, Gorton L, von Wachenfeldt C, Ferapontova EE (2013) Electron transfer reactions, cyanide and O2 binding of truncated hemoglobin from Bacillus subtilis. Electrochim Acta 110:86–93.

85. Bonamore A, Attili A, Arenghi F, Catacchio B, Chiancone E, Morea V, Boffi A (2007) A novel chimera: The “truncated hemoglobin-antibiotic monooxygenase” from Streptomyces avermitilis. Gene 398(1–2):52–61.

86. Fago A, Mathews AJ, Moens L, Dewilde S, Brittain T (2006) The reaction of neuroglobin with potential redox protein partners cytochrome b5 and cytochrome c. FEBS Lett 580(20):4884–4888.

87. Fareleira P, Legall J, Xavier AV, Santos H (1997) Pathways for utilization of carbon reserves in Desulfovibrio gigas under fermentative and respiratory conditions. J Bacteriol 179(12):3972–3980.

88. Preiss J (1984) Bacterial glycogen synthesis and its regulation. Annu Rev Microbiol 38(1):419–458.

89. Lombard M, Fontecave M, Touati D, Nivière V (2000) Reaction of the desulfoferrodoxin from Desulfoarculus baarsii with superoxide anion. J Biol Chem 275(1):115–121.

90. Lumppio HL, Shenvi NV, Summers AO, Voordouw G, Kurtz DM (2001) Rubrerythrin and rubredoxin oxidoreductase in Desulfovibrio vulgaris: a novel oxidative stress protection system. J Bacteriol 183(1):101–108.

91. Coulter ED, Kurtz DM (2001) A Role for rubredoxin in oxidative stress protection in Desulfovibrio vulgaris: Catalytic electron transfer to rubrerythrin and two-iron superoxide reductase. Arch Biochem Biophys 394(1):76–86.

92. Tang YP, Dallas MM, Malamy MH (1999) Characterization of the BatI (Bacteroides aerotolerance) operon in Bacteroides fragilis: Isolation of a B. fragilis mutant with reduced aerotolerance and impaired growth in in vivo model systems. Mol Microbiol 32(1):139–149.

93. Efremov RG, Baradaran R, Sazanov LA (2010) The architecture of respiratory complex I. Nature 465(7297):441–445.

94. Chance B, Hollunger G (1961) The interaction of energy and electron transfer reactions in mitochondria. I. General properties and nature of the products of succinate-linked reduction of pyridine nucleotide. J Biol Chem 236:1534–1543.

95. Berg IA (2011) Ecological aspects of the distribution of different autotrophic CO2 fixation pathways. Appl Environ Microbiol 77(6):1925–1936.

96. Ransom-Jones E, Jones DL, McCarthy AJ, McDonald JE (2012) The Fibrobacteres: An important phylum of cellulose-degrading bacteria. Microb Ecol 63(2):267–281.

97. Abdul Rahman N, Parks DH, Vanwonterghem I, Morrison M, Tyson GW, Hugenholtz P (2015) A phylogenomic analysis of the bacterial phylum Fibrobacteres. Front Microbiol 6:1469.

98. Ikeda T, Nakamura M, Arai H, Ishii M, Igarashi Y (2009) Ferredoxin-NADP+ reductase from the thermophilic hydrogen-oxidizing bacterium, Hydrogenobacter thermophilus TK-6. FEMS Microbiol Lett 297(1):124–130.

99. Wöhlbrand L, Jacob JH, Kube M, Mussmann M, Jarling R, Beck A, Amann R, Wilkes H, Reinhardt R, Rabus R (2013) Complete genome, catabolic sub-proteomes and key-metabolites of Desulfobacula toluolica Tol2, a marine, aromatic compound-degrading, sulfate-reducing bacterium. Environ Microbiol 15(5):1334–1355.

100. Baumgarten A, Redenius I, Kranczoch J, Cypionka H (2001) Periplasmic oxygen reduction by Desulfovibrio species. Arch Microbiol 176(4):306–309.

101. Lauterbach L, Idris Z, Vincent KA, Lenz O (2011) Catalytic properties of the isolated diaphorase fragment of the NAD+-reducing [NiFe]-hydrogenase from Ralstonia eutropha. PLoS One 6(10):e25939.

102. Dos Santos PC, Fang Z, Mason SW, Setubal JC, Dixon R (2012) Distribution of nitrogen fixation and nitrogenase-like sequences amongst microbial genomes. BMC Genomics 13(1):1–12.

103. Riederer-Henderson MA, Wilson PW (1970) Nitrogen fixation by sulphate-reducing bacteria. Microbiology 61(1):27.

104. Bertics VJ, Löscher CR, Salonen I, Dale AW, Gier J, Schmitz RA, Treude T (2013) Occurrence of benthic microbial nitrogen fixation coupled to sulfate reduction in the seasonally hypoxic Eckernförde Bay, Baltic Sea. Biogeosciences 10(3):1243–1258.

105. Rhodes RG, Pucker HG, McBride MJ (2011) Development and use of a gene deletion strategy for Flavobacterium johnsoniae to identify the redundant gliding motility genes remF, remG, remH, and remI. J Bacteriol 193(10):2418–2428.

106. Shrivastava A, Johnston JJ, Van Baaren JM, McBride MJ (2013) Flavobacterium johnsoniae GldK, GldL, GldM, and SprA are required for secretion of the cell surface gliding motility adhesins SprB and RemA. J Bacteriol 195(14):3201–3212.

107. Hoiczyk E, Baumeister W (1998) The junctional pore complex, a prokaryotic secretion organelle, is the molecular motor underlying gliding motility in cyanobacteria. Curr Biol 8(21):1161–1168.

108. Larkin JM, Henk MC (1996) Filamentous sulfide-oxidizing bacteria at hydrocarbon seeps of the Gulf of Mexico. Microsc Res Tech 33(1):23–31.

109. Wolgemuth C, Hoiczyk E, Kaiser D, Oster G (2002) How myxobacteria glide. Curr Biol 12(5):369–377.

110. Widdel F, Kohring G-W, Mayer F (1983) Studies on dissimilatory sulfate-reducing bacteria that decompose fatty acids. Arch Microbiol 134(4):286–294.

111. Rajagopala SV, Titz B, Goll J, Parrish JR, Wohlbold K, McKevitt MT, Palzkill T, Mori H, Finley RL, Uetz P (2007) The protein network of bacterial motility. Mol Syst Biol 3:128.

112. Kobayashi I (2001) Behavior of R-M systems as selfish mobile elements and their impact on genome evolution. Nucleic Acids Res 29(18):3742–3756.

113. MacGregor BJ, Biddle JF, Teske A (2013) Mobile elements in a single-filament orange Guaymas Basin Beggiatoa (“Candidatus Maribeggiatoa”) sp. draft genome: Evidence for genetic exchange with cyanobacteria. Appl Environ Microbiol 79(13):3974–3985.

114. Loenen WAM, Dryden DTF, Raleigh EA, Wilson GG (2014) Type I restriction enzymes and their relatives. Nucleic Acids Res 42(1):20–44.

115. Sberro H, Leavitt A, Kiro R, Koh E, Peleg Y, Qimron U, Sorek R (2013) Discovery of functional toxin/antitoxin systems in bacteria by shotgun cloning. Mol Cell 50(1):136–148.

116. Iranzo J, Lobkovsky AE, Wolf YI, Koonin EV (2013) Evolutionary dynamics of the prokaryotic adaptive immunity system CRISPR-Cas in an explicit ecological context. J Bacteriol 195(17):3834–3844.

117. Thingstad TF (2000) Elements of a theory for the mechanisms controlling abundance, diversity, and biogeochemical role of lytic bacterial viruses in aquatic systems. Limnol Oceanogr 45(6):1320–1328.

118. Schauer R, Risgaard-Petersen N, Kjeldsen KU, Tataru Bjerg JJ, Jørgensen BB, Schramm A, Nielsen LP (2014) Succession of cable bacteria and electric currents in marine sediment. ISME J 8(6):1314–1322.

119. Nielsen LP, Risgaard-Petersen N (2015) Rethinking sediment biogeochemistry after the discovery of electric currents. Ann Rev Mar Sci 7(1):425–442.

120. Wilks JC, Slonczewski JL (2007) pH of the cytoplasm and periplasm of Escherichia coli: Rapid measurement by green fluorescent protein fluorimetry. J Bacteriol 189(15):5601–5607.

121. Widdel F, Pfennig N (1982) Studies on dissimilatory sulfate-reducing bacteria that decompose fatty acids II. Incomplete oxidation of propionate by Desulfobulbus propionicus gen. nov., sp. nov. Arch Microbiol 131(4):360–365.

122. Suzuki D, Ueki A, Amaishi A, Ueki K (2007) Desulfobulbus japonicus sp. nov., a novel Gram-negative propionate-oxidizing, sulfate-reducing bacterium isolated from an estuarine sediment in Japan. Int J Syst Evol Microbiol 57(4):849–855.

123. van de Velde S, Lesven L, Burdorf LDW, Hidalgo-Martinez S, Geelhoed JS, Van Rijswijk P, Gao Y, Meysman FJR (2016) The impact of electrogenic sulfur oxidation on the biogeochemistry of coastal sediments: A field study. Geochim Cosmochim Acta 194:211–232.

124. Babu MM, Teichmann SA (2003) Evolution of transcription factors and the gene regulatory network in Escherichia coli. Nucleic Acids Res 31(4):1234–1244.

125. Sondermann H, Shikuma NJ, Yildiz FH (2012) You’ve come a long way: C-di-GMP signaling. Curr Opin Microbiol 15(2):140–146.

126. Gao R, Mack TR, Stock AM (2007) Bacterial response regulators: versatile regulatory strategies from common domains. Trends Biochem Sci 32(5):225–234.

127. Walker DJ, Adhikari RY, Holmes DE, Ward JE, Woodard TL, Nevin KP, Lovley DR (2018) Electrically conductive pili from pilin genes of phylogenetically diverse microorganisms. ISME J 12(1):48–58.

128. Jones P, Binns D, Chang HY, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, Pesseat S, Quinn AF, Sangrador-Vegas A, Scheremetjew M, Yong SY, Lopez R, Hunter S (2014) InterProScan 5: Genome-scale protein function classification. Bioinformatics 30(9):1236–1240.

129. Petersen TN, Brunak S, von Heijne G, Nielsen H (2011) SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat Methods 8(10):785–786.

130. Yu NY, Wagner JR, Laird MR, Melli G, Rey S, Lo R, Dao P, Sahinalp SC, Ester M, Foster LJ, Brinkman FS (2010) PSORTb 3.0: improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes. Bioinformatics 26(13):1608–1615.

131. Richter M, Rosselló-Móra R (2009) Shifting the genomic gold standard for the prokaryotic species definition. Proc Natl Acad Sci U S A 106(45):19126–19131.

132. Ragsdale SW, Pierce E (2008) Acetogenesis and the Wood-Ljungdahl pathway of CO2 fixation. Biochim Biophys Acta - Proteins Proteomics 1784(12):1873–1898.

133. Macy JM, Ljungdahl LG, Gottschalk G (1978) Pathway of succinate and propionate formation in Bacteroides fragilis. J Bacteriol 134(1):84–91.

134. Worm P, Koehorst JJ, Visser M, Sedano-Núñez VT, Schaap PJ, Plugge CM, Sousa DZ, Stams AJM (2014) A genomic view on syntrophic versus non-syntrophic lifestyle in anaerobic fatty acid degrading communities. Biochim Biophys Acta - Bioenerg 1837(12):2004–2016.

SI Figures

Figure S1. COG-based profiling of gene function.

(A) Comparison of COG profiles of cable bacteria genomes and of other members of the family Desulfobulbaceae (see Table S2).

(B) COG profiles of the unique gene pool in cable bacteria. Genes unique to cable bacteria were defined as being present in at least the genomes of both Ca. E. aarhusiensis MCF and Ca. Electronema sp. GS among the cable bacteria while being absent in the genomes of all other Desulfobulbaceae. Gene presence/absence was inferred based on gene clustering results of the ITEP pipeline (see SI Methods).

Figure S2. Phylogenetic position of SQR (sulfide:quinone reductase) proteins (indicated by the arrow and highlighted in bold) encoded in the genomes of Ca. Electronema sp. GS and Ca. Electrothrix aarhusiensis MCF. The tree was inferred by RAxML analysis of an amino acid alignment covering 287 positions with the PROTGAMMALG model of evolution. SQR subgroups were named according to Marcia et al. 2010 (49). FCSD, flavocytochrome c:sulfide dehydrogenase. Bootstrap values based on 100 resamplings are shown at nodes receiving >50% support. The scale bar shows 10% estimated sequence divergence.

Figure S3. Conserved organization of dsrMKJOP (highlighted in bold) and flanking genes in the genomes of members of the family Desulfobulbaceae. IMG locus tags (8) shown in blue font color identify the first and last gene of each region. Stars indicate the beginning or the end of a contig. Genes are identified by the name of their gene product and/or their best COG match. EF-2: Translation elongation factor 2. Ser. peptidase fam. S41A: Serine peptidase MEROPS family S41A. RsbT: Anti-sigma-factor antagonist domain protein. Grey fill color indicates hypothetical genes. Notably, DsrJ of Ca. Electronema sp. GS shares low amino acid sequence identity (44%) with DsrJ of the other members of the family.

Figure S4. (A) Comparison of the CydA protein domain structure in cable bacteria with that of orphan CydA proteins of other bacteria (see main text for details). The N-terminal CydA domain is shown in blue, the C-terminal diheme CcoP domain in red, and the multiheme cytochrome domain (present in other Desulfobulbaceae) in yellow. Bacteria encoding electrically conductive pili (127) are underlined; their CydA proteins have a domain structure similar to that of CydA of cable bacteria. IMG (8) locus tags are shown in parentheses. The CydA of CB_MCF is truncated in its C-terminal part due to an assembly error, and a CcoP domain is absent in the predicted gene product. Manual analysis of the nucleotide sequence downstream of the CydA-encoding gene (Contig39 nucleotide position 18600-18931) showed high sequence similarity to the cydA genes of CB_A1, CB_A5, and CB_GS, and the inferred amino acid sequence represents a CcoP domain. Black stars indicate the end of contigs. Abbreviations: CB_GS: Ca. Electronema sp. GS; CB_MCF: Ca. Electrothrix aarhusiensis MCF; CB_A1: Ca. Electrothrix communis A1; CB_A2: Ca. Electrothrix marina A2; CB_A3: Ca. Electrothrix marina A3; CB_A5: Ca. Electrothrix marina A5; D.: Desulfobulbus; C.: Calditerrivibrio; F.: Flexistipes; G.: Geobacter. (B) Proposal for CydA as a quinol-cytochrome oxidoreductase transferring electrons from the membrane quinone pool to periplasmic cytochrome c (pCC).

Figure S5. Raman spectroscopy detection of c-type cytochromes in single cable bacteria filaments (for details see (75)). (A) Typical Raman spectrum showing the four major peaks indicative of c-type cytochromes, as indicated by arrows. (B) Raman line scan (normalized intensities of the 750 cm⁻¹ peak) across an individual cable bacterium, showing a cross-section with characteristic bimodal shape (n = 29, with 10 bimodal, 12 skewed unimodal, and 7 unimodal distributions; only cross sections with three or more hits on the cable bacterium were used). For most cells the cytochrome peaks were larger at the cell edge(s) than in the center. Because of the poor confocality of the system, the Raman signal integrates over approx. 0.5 µm depth and therefore includes more cell envelope / periplasmic space at the edges than in the middle of the cell; hence our data imply that the cytochromes were situated in the cell envelope.

Figure S6. Operon structure of pilA in Ca. E. aarhusiensis MCF, and occurrence of the adjacent genes in other cable bacteria genomes. IMG (8) locus tags are provided for each ORF, with the common identifier under the genome name. Genes are color-coded according to their functional annotation; homologous genes have similar color and occupy the same “column”. The fragmented CB_MCF ORFs 2922, 2923 and 2924 are likely a result of an assembly error; they are homologous to CB_A1 10991-1099, CB_A2 10123, CB_A3 10412 and CB_GS 10763 that encode a cytoplasmic membrane anchored TPR domain protein. Abbreviations: CB_MCF: Ca. Electrothrix aarhusiensis MCF; CB_GS: Ca. Electronema sp. GS; CB_A1: Ca. Electrothrix communis A1; CB_A2: Ca. Electrothrix marina A2; CB_A3: Ca. Electrothrix marina A3; CB_A5: Ca. Electrothrix marina A5; TMH: Transmembrane helix, TPR: tetratricopeptide repeat.

Figure S7A. Alignment of the inferred amino acid sequences of truncated hemoglobin-multiheme cytochrome fusion proteins encoded in the genomes of Ca. E. aarhusiensis MCF (H206_00640) and Ca. Electronema sp. GS (Ga0183576_12429) as well as by gene fragments PCR-amplified, cloned and sequenced from Aarhus Bay sediment enriched in cable bacteria (see “SI Methods” for details). Using the InterProScan webserver (128) the truncated hemoglobin domains, highlighted in brown shaded color, were identified by matching pfam model PF01152. Similarly, the multiheme cytochrome domain, highlighted in green colors, was identified by matching pfam model PF12435 (light shaded green) and Prosite model PS51008 (dark shaded green). N-terminal amino acids highlighted in red color represent a signal peptide as predicted by the SignalP server (129).

Figure S7B. Phylogenetic position of truncated hemoglobin-domain proteins (highlighted in bold) encoded in the genomes of Ca. Electronema sp. GS and Ca. Electrothrix aarhusiensis MCF. The tree was inferred by RAxML analysis of an amino acid alignment covering 101 positions with the PROTGAMMALG model of evolution. Ca. Electronema sp. GS encodes an 820 amino acid long fusion protein (locus tag Ga0183576_12429), which carries a globin domain in both its N-terminal and its C-terminal part; the positions of both truncated hemoglobin domains are shown in the tree. Both Ca. Electronema sp. GS and Ca. Electrothrix aarhusiensis MCF also contain a shorter protein with a truncated hemoglobin domain, which is also included in this tree. Inferred amino acid sequences of cloned PCR products from cable bacteria enrichment cultures from Aarhus Bay sediment are also shown in the tree. Clones named “Jan_” and “T1_” originate from independent PCR reactions. Bootstrap values based on 100 resamplings are shown at nodes receiving >50% support. Numbers in square brackets show the amino acid length of the respective proteins; the PCR-amplified gene fragments from sediment enrichments resulted in inferred fragment lengths of 520 amino acids. The scale bar shows 5% estimated sequence divergence.

Figure S8. (A) MAR-FISH images of cable bacteria tested for incorporation of 14C-bicarbonate. (a-c) Single cable bacterium showing incorporation of bicarbonate, (d-f) cable bacteria with variable degree of incorporation, and (g-i) cable bacterium killed by formaldehyde prior to bicarbonate addition. Analyzed cable bacteria are presented as (a,d,g) FISH images using Desulfobulbaceae specific probe DSB706, (b,e,h) MAR images, and (c,f,i) an overlay of FISH and MAR. Scale bars, 20 μm. (B) Activity distribution of cable bacteria-specific bicarbonate incorporation quantified as number of silver grains per length of filament (n=29).

Figure S9. Central carbon metabolism of cable bacteria inferred from genome and proteome data. Gene products are identified by IMG (8) locus tags for Ca. E. aarhusiensis MCF and Ca. Electronema sp. GS, shown in green and blue color, respectively. Underlined locus tags indicate that a given gene product was detected in the proteome of Ca. Electronema sp. GS (SI Data 2).

Figure S10. Conserved organization of hdrC-mvhD-MTHFR genes in members of the family Desulfobulbaceae. Genes are identified by IMG locus tags (8); the fixed part of the locus tag name is shown in brackets. A star indicates the end of a contig. hdrC: heterodisulfide reductase subunit C. mvhD: Methyl-viologen-reducing hydrogenase, delta subunit. MTHFR: Methylene-tetrahydrofolate reductase-domain protein.

Figure S11. (A) SEM image of a cable bacterium filament, showing the continuous ridge structures of the outer membrane; (B) TEM scanning of the same filament demonstrated large electron-dense intracellular granules; (C) SEM-EDX elemental mapping; phosphate is indicated in yellow; (D) Element energy spectrum analysis on a granule showed a strong P peak, which together with the O, Ca, and Mg peaks is indicative of polyphosphates. The Al, Au and Si peaks are due to the sample preparation (see “SI Methods” for details).

Figure S12. Densities of regulatory genes (as genes per Mbp of coding DNA) in cable bacteria (n = 6) and other available Desulfobulbaceae (n = 12; Table S2). Panels represent the different categories of regulatory genes, i.e. genes coding for histidine kinases, and genes with DNA-binding, small-molecule-binding, or cyclic-di-GMP-binding (CD-GMP-binding) domains. Whiskers extend 1.5 times the interquartile range from the 25th and 75th percentiles (Tukey style).
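The two quantities named in the caption of Figure S12 can be illustrated with a minimal sketch. The function names, the example gene counts, and the choice of quartile method ("inclusive" linear interpolation) are assumptions made for illustration only; the source does not state which software or quartile definition was used.

```python
from statistics import quantiles


def gene_density(gene_count, coding_bp):
    """Return the number of genes per Mbp of coding DNA."""
    if coding_bp <= 0:
        raise ValueError("coding_bp must be positive")
    return gene_count / (coding_bp / 1_000_000)


def tukey_whiskers(values):
    """Return (lower, upper) whisker ends in the Tukey style.

    Each whisker ends at the most extreme observation that lies within
    1.5 x IQR of the 25th (lower) or 75th (upper) percentile.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    reach = 1.5 * (q3 - q1)
    inside = [v for v in values if q1 - reach <= v <= q3 + reach]
    return min(inside), max(inside)


# Hypothetical genomes: (regulatory gene count, coding DNA in bp).
example = [(40, 2_000_000), (66, 3_000_000), (96, 4_000_000), (75, 3_000_000)]
densities = [gene_density(n, bp) for n, bp in example]
low, high = tukey_whiskers(densities)
```

Observations outside the whisker ends would be drawn as individual outlier points in a box plot of this style.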

Figure S13. Isoelectric point (pI) bias of periplasm-exposed proteins of cable bacteria in comparison to alkaliphilic and neutrophilic Desulfobulbaceae (Table S2). Dots represent pI bias values of single genomes.

SI Tables

Table S1. General characteristics of cable bacteria draft genomes.

| Characteristic | Ca. Electrothrix aarhusiensis MCF | Ca. Electrothrix communis A1 | Ca. Electrothrix marina A2 | Ca. Electrothrix marina A3 | Ca. Electrothrix marina A5 | Ca. Electronema sp. GS |
|---|---|---|---|---|---|---|
| NCBI locus tag prefix | H206 | VT98 | VT99 | VU00 | VU00 | CDV28 |
| Assembly size (Mbp) | 3.73 | 1.07 | 0.99 | 0.60 | 2.06 | 2.76 |
| DNA G + C (%) | 47.47 | 48.49 | 49.13 | 48.18 | 49.7 | 51.86 |
| Number of contigs | 143 | 489 | 450 | 349 | 472 | 73 |
| Genome completeness | 93% | 34% | 36% | 21% | 64% | 93% |
| Estimated genome size (Mbp) | 4.04 | 3.15 | 2.73 | 2.87 | 3.24 | 2.97 |
| Estimated contamination (%) | 1.06 | 1.62 | 0.38 | 0.03 | 2.28 | 1.59 |
| CRISPR count | 12 | 0 | 0 | 0 | 0 | 2 |
| Protein coding genes | 4,079 | 1,455 | 1,258 | 839 | 2,161 | 2,649 |
| Genes with function prediction | 2,615 | 938 | 867 | 552 | 1,503 | 1,786 |
| Genes with function prediction (%) | 62.8 | 63.12 | 67.31 | 64.26 | 68.07 | 66.15 |
| Genes assigned to COGs | 1,614 | 389 | 447 | 228 | 950 | 1,410 |
| Genes assigned to COGs (%) | 38.8 | 26.18 | 34.70 | 26.54 | 43.03 | 52.22 |
| NCBI BioProject | PRJNA187269 | PRJNA278504 | PRJNA278504 | PRJNA278504 | PRJNA278504 | PRJNA389779 |
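The estimated genome sizes in Table S1 are close to what one obtains by dividing the assembly size by the completeness fraction. The table does not state the formula, so the sketch below is an assumption, and the function and variable names are hypothetical. Small deviations from the tabulated values are expected because the completeness values are printed as rounded percentages.

```python
# Assembly size (Mbp) and completeness (as a fraction), copied from Table S1.
DRAFTS = {
    "MCF": (3.73, 0.93),
    "A1": (1.07, 0.34),
    "A2": (0.99, 0.36),
    "A3": (0.60, 0.21),
    "A5": (2.06, 0.64),
    "GS": (2.76, 0.93),
}

# Estimated genome sizes (Mbp) as printed in Table S1, for comparison.
TABULATED = {"MCF": 4.04, "A1": 3.15, "A2": 2.73, "A3": 2.87, "A5": 3.24, "GS": 2.97}


def estimate_genome_size(assembly_mbp, completeness):
    """Assumed estimator: scale the assembly size by 1 / completeness."""
    if not 0 < completeness <= 1:
        raise ValueError("completeness must be in (0, 1]")
    return assembly_mbp / completeness


estimates = {name: estimate_genome_size(*row) for name, row in DRAFTS.items()}
deviations = {name: abs(estimates[name] - TABULATED[name]) for name in DRAFTS}
```

Under this assumption every recomputed value lies within 0.05 Mbp of the tabulated estimate.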

Table S2. Overview of reference genomes from members of the family Desulfobulbaceae used for comparative genome analyses.

| Organism name | IMG taxon ID# | Status§ | G+C content [%] | Genome size [Mbp] | Gene count | tRNA count |
|---|---|---|---|---|---|---|
| Deltaproteobacterium sp. MLMS-1 | 638341245 | PD | 60 | 6.06 | 5,487 | 93 |
| Desulfobulbus elongatus DSM 2908 | 2556921601 | PD | 62 | 3.96 | 3,565 | 53 |
| Desulfobulbus japonicus DSM 18378 | 2524614762 | PD | 46 | 5.81 | 5,125 | 48 |
| Desulfobulbus mediterraneus DSM 13871 | 2523533605 | PD | 58 | 4.80 | 4,007 | 50 |
| Desulfobulbus propionicus DSM 2032 | 649633036 | F | 59 | 3.85 | 3,408 | 48 |
| Desulfocapsa sulfexigens DSM 10523 | 2561511172 | F | 45 | 4.02 | 3,598 | 46 |
| Desulfocapsa thiozymogenes DSM 7269* | 2514885009 | PD | 54 | 3.93 | 3,481 | 43 |
| Desulfotalea psychrophila LSv54 | 2606217519 | F | 47 | 3.66 | 3,294 | 64 |
| Desulfurivibrio alkaliphilus AHT2 | 646564528 | F | 60 | 3.10 | 2,732 | 47 |
| Desulfofustis glycolicus DSM 9705 | 4977298 | PD | 56 | 4.98 | 4,536 | 44 |
| Desulfopila aestuarii DSM 18488* | 6065581 | PD | 50 | 6.07 | 5,367 | 48 |
| Desulforhopalus singaporensis DSM 12130 | 5008860 | PD | 51 | 5.01 | 4,449 | 44 |
| Average | - | - | 54 | 4.60 | 4,087 | 49† |

#IMG genome IDs (8).
§PD, permanent draft; F, finished.
†Average tRNA count calculated by excluding that of Deltaproteobacterium sp. MLMS-1.
*Genomes only used for analysis of pI bias, and excluded from comparative genomics and phylogenetic analyses.
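The "Average" row of Table S2 can be recomputed from the listed values, including the exclusion of Deltaproteobacterium sp. MLMS-1 from the tRNA average described in the footnote. This is a minimal sketch; the variable names are hypothetical, and the values are copied from the table in row order.

```python
from statistics import mean

# Columns of Table S2 in row order (MLMS-1 is the first entry of each list).
gc_percent = [60, 62, 46, 58, 59, 45, 54, 47, 60, 56, 50, 51]
genome_mbp = [6.06, 3.96, 5.81, 4.80, 3.85, 4.02, 3.93, 3.66, 3.10, 4.98, 6.07, 5.01]
gene_count = [5487, 3565, 5125, 4007, 3408, 3598, 3481, 3294, 2732, 4536, 5367, 4449]
trna_count = [93, 53, 48, 50, 48, 46, 43, 64, 47, 44, 48, 44]

avg_gc = mean(gc_percent)
avg_size = mean(genome_mbp)
avg_genes = mean(gene_count)
# Per the table footnote, the MLMS-1 value is left out of the tRNA average.
avg_trna = mean(trna_count[1:])
```

Rounded to the precision used in the table, these reproduce the printed averages (54, 4.60, 4,087 and 49).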

Table S3. Average nucleotide identity (ANI)* and aligned percentage (in square brackets) of cable bacteria genomes. All values were calculated using the Pairwise ANI tool of IMG (8).

| | Ca. Electrothrix aarhusiensis MCF | Ca. Electrothrix communis A1 | Ca. Electrothrix marina A2 | Ca. Electrothrix marina A3 | Ca. Electrothrix marina A5 | Ca. Electronema sp. GS |
|---|---|---|---|---|---|---|
| Ca. Electrothrix aarhusiensis MCF | * | | | | | |
| Ca. Electrothrix communis A1 | 87.73% [38.57] | * | | | | |
| Ca. Electrothrix marina A2 | 87.79% [42.92] | 90.05% [18.9] | * | | | |
| Ca. Electrothrix marina A3 | 88.15% [38.04] | 90.18% [19.25] | 98.97% [33.9] | * | | |
| Ca. Electrothrix marina A5 | 87.66% [47.31] | 89.53% [26.88] | 98.93% [39.58] | 99.88% [33.06] | * | |
| Ca. Electronema sp. GS | 71.69% [30.56] | 72.04% [19.35] | 72.49% [21.82] | 72.11% [18.62] | 72.14% [28.93] | * |

*An ANI of 95-96% is commonly accepted as threshold for species demarcation (131).
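Applying the species threshold from the footnote to the pairwise values of Table S3 can be sketched as follows. The grouping procedure (single-linkage clustering at a 95% cutoff) and all names are assumptions for illustration; the source only reports the ANI values and the threshold. Note that the aligned percentages are low for the less complete genomes, so such groupings should be read with caution.

```python
GENOMES = ["MCF", "A1", "A2", "A3", "A5", "GS"]

# Pairwise ANI values (%) copied from Table S3.
ANI = {
    ("A1", "MCF"): 87.73,
    ("A2", "MCF"): 87.79, ("A2", "A1"): 90.05,
    ("A3", "MCF"): 88.15, ("A3", "A1"): 90.18, ("A3", "A2"): 98.97,
    ("A5", "MCF"): 87.66, ("A5", "A1"): 89.53, ("A5", "A2"): 98.93,
    ("A5", "A3"): 99.88,
    ("GS", "MCF"): 71.69, ("GS", "A1"): 72.04, ("GS", "A2"): 72.49,
    ("GS", "A3"): 72.11, ("GS", "A5"): 72.14,
}


def species_clusters(genomes, ani, threshold=95.0):
    """Group genomes linked by any pair with ANI >= threshold (single linkage)."""
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links until the cluster representative is reached.
        while parent[g] != g:
            g = parent[g]
        return g

    for (a, b), value in ani.items():
        if value >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(clusters.values(), key=lambda c: genomes.index(c[0]))


clusters = species_clusters(GENOMES, ANI)
```

With the lower bound of the 95-96% range as cutoff, only A2, A3 and A5 fall into a shared group, which matches their common species name Ca. Electrothrix marina.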

Table S4. The NAD(P)H-quinone oxidoreductase (Nuo) genes encoded in the genomes of Ca. Electronema sp. GS (GS) and Ca. Electrothrix aarhusiensis MCF (MCF) were acquired by lateral gene transfer from a Fibrobacter-related donor. According to blastp search (4) against the NCBI nr database the closest relatives of the cable bacteria Nuo gene products are from members of the genus Fibrobacter, not from members of the family Desulfobulbaceae.

| Gene product | KEGG orthology | GS locus tag | MCF locus tag | % amino acid identity§: MCF | % amino acid identity§: Fibrobacter spp.# | % amino acid identity§: Desulfobulbus spp.* | % amino acid identity§: other Desulfobulbaceae† |
|---|---|---|---|---|---|---|---|
| NuoA | K00330 | Ga0183576_10370 | H206_02745 | 73 | 47-49 | 35-38 | 36-42 |
| NuoBC$ | K00331/K00332 | Ga0183576_10369 | H206_02744 | 68 | 47-49 | – | – |
| NuoD | K00333 | Ga0183576_10366 | H206_01294 | 74 | 53-57 | 45-49 | 47-50 |
| NuoH | K00337 | Ga0183576_10365 | H206_01293 | 71 | 51-53 | 40-41 | 36-42 |
| NuoI | K00338 | Ga0183576_10364 | H206_01291 | 71 | 54-58 | 32-38 | 34-40 |
| NuoJ | K00339 | Ga0183576_10363 | H206_01290 | 63 | 34-36 | <30 | <30 |
| NuoK | K00340 | Ga0183576_10362 | H206_01289 | 87 | 56-61 | 37-41 | 38-42 |
| NuoL | K00341 | Ga0183576_10361 | H206_01288/87 | 82 | 50-51 | 39-42 | 30-41 |
| NuoM | K00342 | Ga0183576_10355 | H206_01284 | 86 | 47-49 | 35-40 | 34-39 |
| NuoN | K00343 | Ga0183576_10354 | H206_01283 | 77 | 47-48 | 32-38 | 34-37 |

§According to blastp search using the respective gene products of GS as queries.
#Range of blastp identity values for: Fibrobacter intestinalis ATCC 43854, Fibrobacter succinogenes S85 and Fibrobacter sp. UWP2 (IMG genome IDs (8): 2582581325, 646311927, 2700988687).
*Range of blastp identity values for: Desulfobulbus elongatus DSM 2908, Desulfobulbus propionicus 1pr3, Desulfobulbus mediterraneus DSM 13871, Desulfobulbus japonicus DSM.
†Range of blastp identity values for: Deltaproteobacterium sp. MLMS-1, Desulfocapsa sulfexigens DSM 10523, Desulfocapsa thiozymogenes DSM 7269, Desulfofustis glycolicus DSM 9705, Desulfopila aestuarii DSM 18488, Desulforhopalus singaporensis DSM 12130, Desulfotalea psychrophila LSv54, Desulfurivibrio alkaliphilus.
$NuoB and NuoC are encoded by a fused gene in cable bacteria and Fibrobacter, while in the other Desulfobulbaceae members NuoB and NuoC are encoded by separate genes.
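The comparison underlying Table S4 can be expressed as a simple check: for each Nuo subunit, is the lowest identity to any Fibrobacter homolog still higher than the highest identity to any Desulfobulbaceae homolog? The sketch below is illustrative only. The dictionary layout, the function name, and the encoding of "<30" as the range (0, 30) are assumptions, and a pairwise identity screen of this kind is weaker evidence than the blastp searches against the NCBI nr database reported in the source. NuoBC is omitted because the table lists no Desulfobulbaceae values for the fused gene.

```python
# Identity ranges (%) from Table S4 as (low, high); "<30" is encoded as (0, 30).
NUO_IDENTITY = {
    "NuoA": {"fibrobacter": (47, 49), "desulfobulbus": (35, 38), "other": (36, 42)},
    "NuoD": {"fibrobacter": (53, 57), "desulfobulbus": (45, 49), "other": (47, 50)},
    "NuoH": {"fibrobacter": (51, 53), "desulfobulbus": (40, 41), "other": (36, 42)},
    "NuoI": {"fibrobacter": (54, 58), "desulfobulbus": (32, 38), "other": (34, 40)},
    "NuoJ": {"fibrobacter": (34, 36), "desulfobulbus": (0, 30), "other": (0, 30)},
    "NuoK": {"fibrobacter": (56, 61), "desulfobulbus": (37, 41), "other": (38, 42)},
    "NuoL": {"fibrobacter": (50, 51), "desulfobulbus": (39, 42), "other": (30, 41)},
    "NuoM": {"fibrobacter": (47, 49), "desulfobulbus": (35, 40), "other": (34, 39)},
    "NuoN": {"fibrobacter": (47, 48), "desulfobulbus": (32, 38), "other": (34, 37)},
}


def closer_to_fibrobacter(row):
    """True if the weakest Fibrobacter hit beats the best Desulfobulbaceae hit."""
    best_family_hit = max(row["desulfobulbus"][1], row["other"][1])
    return row["fibrobacter"][0] > best_family_hit


supported = [gene for gene, row in NUO_IDENTITY.items() if closer_to_fibrobacter(row)]
```

For the values in the table, all nine subunits with data in every column satisfy this criterion.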

Table S5. Cytochrome-domain proteins in cable bacteria.

Annotation* Localization** GS# MCF A1 A2 A3 A5 Evidence/notes

CYTOCHROMES CONSERVED AMONG CABLE BACTERIA$

Quinohemoprotein amine dehydrogenase A, alpha subunit, heme binding UNIQUE

Membrane _101168 _00926 _11062 _11063 _10314

_10971§ _10972

_10763 _11911§

_10762 _10503a

IPR009056 - Cytochrome c-like domain. pfam10643 - Cytochrome-c551. Shares similarity with QHNDH-domain. QHNDH catalyzes the oxidative deamination of a wide range of aliphatic and aromatic amines. Compared to the 60 kDa subunit of the enzyme (Q8VW85) key domains, and beta/gamma subunits are missing

Cytochrome C oxidase, cbb3-type, subunit III-like protein (CydA) UNIQUE

Membrane _10441 _01637 _10513 _11182§ _11571§ _10175§ Putative electron transfer from quinone pool to periplasm. See main text for details.

Ubiquinol-cytochrome c reductase cytochrome b subunit (CytB of bc1 complex)

Membrane _103112 _01513 x x x x PF00033 - Cytochrome_B. 60% aa id to homologs in Desulfobulbaceae. Putative part of Rieske complex. See main text for details.

Cytochrome c peroxidase (MacA) Periplasmic _10582 _05370 x x x x

MacA homolog - PF03150 Di-heme cytochrome c peroxidase. High aa sequence identity (75%) to various Alpha, Beta and Gammaproteobacteria proteins. Share 62% aa identity with Nitrosomonas Cytochrome c551 peroxidase (P55929).

Cytochrome C UNIQUE Periplasmic _16010 _02197 x x x x

PF00034 - Cytochrom_C. single heme-binding site; shares 50% aa identity with homologs from various bacterial ammonia oxidizers.

Cytochrome-hemoglobin fusion protein UNIQUE

Periplasmic/ membrane-associated

_12429 _00640 x x x x PF13435 - Cytochrome_C554/2 and PF01152 - Bac_globin. See main text for details.

Hypothetical protein Periplasmic/ membrane-associated

_10589 _02139 _11646 _11645 _11644

x x x

Multiheme cytochrome domain IPR011031. 11-12 heme-binding sites. Match to homologs in Desulfobulbus (>70% identity): putative amino acid ABC transporter substrate-binding protein

Class III cytochrome C family protein

Periplasmic/ membrane-associated

_13618 _00200 x x x x PF02085 - Cytochrom_CIII. Class 3 cytochrome: IPR020942. Homologs in Desulfobulbaceae (best blastp hits in NCBI nr) but <40% aa identity.

Hypothetical protein Cytoplasmic (GS: Periplasmic) _14513 _06307

_06308 _12212§ x x _10808

IPR011031 - Multiheme cytochrome domain. 9-10 heme-binding sites. 50% aa identity to homologs in Desulfobulbaceae (hypothetical proteins, putative Class III family cytochromes)

Hypothetical protein UNIQUE Extracytoplasmic _11815 _01720 x x x _10913

IPR009056 - Cytochrome c-like domain, single heme-binding site; but domain only detected in Ga0068572_10913 (50-80% aa id with MCF and GS homolog). Homologs (hypothetical proteins) in Gammaproteobacteria <50% aa identity.

Cytochrome c554-domain protein. Putative nitrite reductase UNIQUE

Periplasmic _10612 _03164 x x x _12661§

PF13447 - Multi-heme_cytochrome PF13435 - Cytochrome_C554. 57% identical to orange Maribeggiatoa nitrite reductase. See main text for details.

Cytochrome c554 and phosphatase domain protein UNIQUE

Periplasmic _101169 _03236 x _10545 x _10055

PF13435 - tetraheme Cytochrome_C554. N-terminal phosphatase domain IPR029052. Same domains structure found in homologs in NCBI nr database, all hypotheticals and <40% aa identity.

Cytochrome c554-domain protein UNIQUE

Periplasmic _12721 _06305 x x x x PF13435 - IPR023155 - Cytochrome c-552/4. Share 40-50% aa identity to homologs from various Nitrospira and bacterial S-oxidizers.

OTHER CYTOCHROMES ONLY DETECTED IN SOME CABLE BACTERIA GENOMES

Cytochrome B UNIQUE Membrane _14116 x x x x x

PF01292 - Ni_hydr_CYTB. IPR016174 - Di-heme cytochrome, transmembrane. Low aa identity (<40%) to proteins in NCBI nr database.

Cytochrome c554-domain protein

Extracytoplasmic _12430 x _10702§ _11222 x _10333

PF13435 - IPR023155 - Cytochrome c-552/4 (this protein shares high similarity [60+ % identity] to the c554 domain in the cytochrome-globin fusion protein). Domain = di-heme elbow motif, found among others in cytochrome c-554 and c-552. High similarity (60% aa identity) to proteins in e.g., Desulfocapsa, Desulfuromusa, Geobacter

Cytochrome c554-domain protein Periplasmic x _00346 _10064 _10272 _10293§ x

PF13435 - Cytochrome_C554. IPR011031 Multiheme cytochrome. Homologs in some Desulfobulbaceae (60% aa id).

Cytochrome c7 UNIQUE Periplasmic _10212 x x x x x

PF14522 - Cytochrome_C7, OmcB-like. Contains tetraheme cytochrome family C3 domain. The Geobacter OmcB (GSU_2737) is 744 aa long i.e. not homologous. Shares 50% aa identity with proteins in Gallionella.

c(7)-type cytochrome triheme domain UNIQUE

Extracytoplasmic _10211 x x x x x

Contains TIGR04257 - c(7)-type cytochrome triheme domain; contains C3 tetraheme cytochrome family domain; no homologs >40% aa identity

C(7)-type cytochrome triheme domain-containing protein UNIQUE

Periplasmic x x x x x _11912 TIGR04257 - c(7)-type cytochrome triheme domain. Low score against Nanowire_3heme cytochrome domain TIGR04257. <40% aa identity to NCBI nr database proteins. In Geobacter nanowire cytochrome (GSU_1996, 348 aa), the triheme domain occurs 3 times. Only 2 triheme domains in cable bacterium A5. The TIGR domain also matches proteins encoded in other Desulfobulbaceae genomes.

Cytochrome c551/c552 UNIQUE Periplasmic x _00091 x x x x PF00034 - Cytochrom_C. Homologs in Betaproteobacteria <40% identity.

Cytochrome C UNIQUE Periplasmic x _01754 x x x x

PF00034 - Cytochrom_C. Homologs (50% aa identity) in Alpha- and Gammaproteobacteria. Not in Desulfobulbaceae.

NapC/NirT cytochrome c family, N-terminal region domain-protein UNIQUE

Extracytoplasmic x x x x _11811 x

PF03264 - Cytochrom_NNT, IPR011031 Multihaem cytochrome. <30% identity to any protein in NCBI nr database. Very short contig (2 genes, other = alcohol dehydrogenase)

Cytochrome C oxidase, cbb3-type, subunit III-like protein UNIQUE

Extracytoplasmic x x x x x _102010 PF13442 - Cytochrome C oxidase, cbb3-type, subunit III. Shares <40 % aa identity to any protein in NCBI nr database. Short contig.

*Annotation according to IMG (8). UNIQUE refers to absence of homologs in other Desulfobulbaceae. **Localization as predicted by PSORTb (130). #Locus tags displayed in bold blue indicate that the gene product has been detected in the proteome of Ca. Electronema sp. GS (SI Data 2). $‘Conserved among cable bacteria’ refers to presence in both Ca. Electronema sp. and Ca. Electrothrix aarhusiensis MCF, i.e., the two most complete genomes. §Protein truncated, gene located at end of contig. aSingle domain protein, missing C-terminal relative to its other listed homologs.

Table S6. Genes with homology to type II secretion systems (T2SS) and type IV pilus (T4P) systems.

Annotation IMG* Locus tag Ca. Electronema sp.

GS§**

IMG* Locus tag Ca. Electrothrix

aarhusiensis MCF** O†

Pairwise identity

(%)‡ T4P protein PilB (assembly ATPase) Ga0183576_11524 H206_00056 81

T4P protein PilF (TPR domain containing) Ga0183576_13316 H206_00093 63

T4P protein PilM Ga0183576_10566 H206_00100 77 T4P protein PilN Ga0183576_10565 H206_00101 46 T4P protein PilO Ga0183576_10564 H206_00102 59 T4P protein PilP Ga0183576_10563 H206_00103 49 T4P PilP-like protein NF H206_00104 - T4P PilQ-like outer membrane TPR and secretin domain protein

Ga0183576_10562 H206_00105 42

T4P PilQ-like outer membrane -secretin-domain protein Ga0183576_10561 H206_00106 59

Outer membrane - protein Ga0183576_10560 H206_00107[N], H206_00108[C] 56

TadD-like (Flp pilus) pilotin (TPR domain containing) NF H206_00180 -

T2SS protein E Ga0183576_12720 H206_00314 81 T4P protein PilJ (signal transducing) Ga0183576_15610 H206_00367 58

T4P protein PilT (retraction ATPase) Ga0183576_12026 H206_00466 83

T4P protein PilT (retraction ATPase) Ga0183576_12027 H206_00467 89

T4P protein PilY1 Ga0183576_10754 H206_00472[C],

H206_00473, H206_00474[N]

52-63

T4P PilX N-terminal-domain protein Ga0183576_10755 H206_00475 [C],

H206_00476 (N) 55,63

T4P protein PilW Ga0183576_10756 H206_00477 44 T4P protein PilW Ga0183576_10757 H206_00478 39 Prepilin (GspH-domain protein) Ga0183576_10758 NF - Prepilin (GspH-domain protein) Ga0183576_10759 H206_00479 40 T4P protein PilD (prepilin peptidase 1) Ga0183576_15212 H206_00495 74

T2SS protein G Ga0183576_12122 H206_00715 77 T2SS protein H Ga0183576_12123 H206_00716 46 T2SS protein I Ga0183576_12124 H206_00717 60 T2SS protein J Ga0183576_12125 H206_00718 60

T2SS protein K Ga0183576_12126 H206_00719 [N], H206_00720 [C] 75;49

T2SS protein L Ga0183576_12127 H206_00721 49 T2SS protein M Ga0183576_12128 H206_00722 49 T2SS protein N Ga0183576_12129 H206_00723 42 T2SS protein C Ga0183576_12130 H206_00724 33 T2SS protein D Ga0183576_12131 H206_00727 65 T2SS protein E Ga0183576_12132 H206_00729 80 T4P PilC domain protein NF H206_00836 -

T4P PilX N-terminal-domain protein | NF | H206_00837# | -
Putative T4P protein PilW | NF | H206_00838# | -
T4P protein PilW | NF | H206_00839# | -
Prepilin (GspH-domain protein) | NF | H206_00840# | -
Prepilin (GspH-domain protein) | NF | H206_00841# | -
T4P protein PilC | Ga0183576_10421 | H206_01238[N], H206_01239[C] | 80;95
T4P protein PilJ | NF | H206_01376 | -
T4P protein PilJ | Ga0183576_10428 | H206_01378 | 63
T4P PilP-domain protein (fragment at end of contig) | NF | H206_01619[C] | -
T4P protein PilA | Ga0183576_10762 | H206_02920 | 56¶
T2SS protein PulF | Ga0183576_11453 | H206_03389 | 77
T4P protein PilT (retraction ATPase) | Ga0183576_12821 | H206_03563 | 86
T4P protein PilT (retraction ATPase) | Ga0183576_10937 | H206_03578[C] | 77
T4P protein PilF (TPR domain containing) | Ga0183576_12423 | NF | -
T4P protein PilF (TPR domain containing) | Ga0183576_1597 | H206_03709 | 59

*IMG (8).
§Locus tags displayed in blue, bold font indicate that the gene product was detected in the proteome of Ca. Electronema sp. GS (SI Data 2).
**NF: Gene not found in the genome. [N] and [C] indicate that a gene is likely fragmented and that the locus tag represents the N- or C-terminal part of the gene product.
†Locus tags inferred to belong to the same operon in the two respective genomes are indicated by vertical solid lines.
¶Based on alignment of the first 100 N-terminal amino acids of the gene products.
‡Pairwise amino acid identity between the respective homologous Ca. Electronema sp. GS and Ca. Electrothrix aarhusiensis MCF gene products.
#Gene products of the ORFs in this operon share approximately 40% amino acid identity with the gene products of ORFs in operon H206_00475-H206_00479.

Table S7. Genes involved in the Wood-Ljungdahl pathway (132).

Enzyme | Simplified Reaction§ | Locus tag# Ca. Electronema sp. GS | Locus tag Ca. Electrothrix aarhusiensis MCF
Formate dehydrogenase | CO2 ↔ formate | Ga0183576_102221 | H206_01006/H206_01004
Formate-H4F ligase | Formate + H4F ↔ formyl-H4F | Ga0183576_10716 | H206_00054
Methenyltetrahydrofolate cyclohydrolase / methylenetetrahydrofolate dehydrogenase (NADP+) | Formyl-H4F ↔ methenyl-H4F | Ga0183576_10190 | H206_01143
Methenyltetrahydrofolate cyclohydrolase / methylenetetrahydrofolate dehydrogenase (NADP+) | Methenyl-H4F ↔ methylene-H4F | Ga0183576_10190 | H206_01143
Methylene-H4F reductase | Methylene-H4F ↔ methyl-H4F | Ga0183576_12020 | H206_00998
Methyl transferase | Methyl-H4F ↔ H4F + CH3-CFeSP | Ga0183576_1193 | H206_02173
CO dehydrogenase | CO2 ↔ CO | Ga0183576_10189 | H206_01144
Acetyl-CoA synthase beta, gamma, delta | CO + CH3-CFeSP ↔ acetyl-CoA | Ga0183576_10186, Ga0183576_10182, Ga0183576_10188 | H206_01147, H206_01149, H206_01146

§H4F: Tetrahydrofolate; CFeSP: Corrinoid iron-sulfur protein.
#IMG (8). Locus tags displayed in bold blue indicate that the gene product has been detected in the proteome of Ca. Electronema sp. GS (SI Data 2).

Table S8. Genes involved in assimilation of propionate by the methylmalonyl-CoA pathway (133,134). The listed genes are absent in the Ca. Electronema sp. GS genome assembly.

Enzyme | Genome | IMG# Locus tag
Propionate-CoA synthetase | Ca. E. aarhusiensis MCF | H206_01638/H206_01639
Propionate-CoA synthetase | Ca. E. communis A1 | Ga0068569_10512
Propionate-CoA synthetase | Ca. E. communis A2 | Ga0068570_11181
Propionate-CoA synthetase | Ca. E. communis A5 | Ga0068572_10174
Propionyl-CoA carboxylase | Ca. E. aarhusiensis MCF | H206_05163
Methylmalonyl-CoA epimerase | Ca. E. aarhusiensis MCF | Note§
Methylmalonyl-CoA epimerase | Ca. E. communis A1 | Ga0068569_13571
Methylmalonyl-CoA epimerase | Ca. E. communis A5 | Ga0068572_14453
Methylmalonyl-CoA mutase | Ca. E. aarhusiensis MCF | H206_01842 & H206_01846
Methylmalonyl-CoA mutase | Ca. E. communis A5 | Ga0068572_12861

#(8).
§In Ca. E. aarhusiensis MCF, methylmalonyl-CoA epimerase was missed by the IMG gene calling, but the gene is present on contig 103 (position 5493-5080, - strand), upstream of and adjacent to the gene coding for propionyl-CoA carboxylase (H206_05163, position 4990-4358, - strand).
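The coordinates in the Table S8 footnote can be checked with a few lines of arithmetic. The sketch below assumes 1-based, inclusive contig coordinates (an assumption; the footnote does not state the convention) and uses a hypothetical helper, `gene_span`, to derive each gene's length and the gap between the two genes. On the minus strand the gene at the higher coordinates lies upstream, which matches the footnote's description.

```python
def gene_span(start, end):
    """Return (low, high, length_bp) for a gene given its two end coordinates.

    Assumes 1-based, inclusive coordinates (an assumption, not stated in the source).
    """
    low, high = min(start, end), max(start, end)
    return low, high, high - low + 1


# Coordinates on contig 103 as given in the footnote (both genes on the - strand).
epimerase = gene_span(5493, 5080)    # methylmalonyl-CoA epimerase (missed by gene calling)
carboxylase = gene_span(4990, 4358)  # propionyl-CoA carboxylase, H206_05163

# Number of bases lying strictly between the two genes.
gap_bp = epimerase[0] - carboxylase[1] - 1

print("epimerase length (bp):", epimerase[2])
print("carboxylase length (bp):", carboxylase[2])
print("intergenic gap (bp):", gap_bp)
```

Under the stated coordinate assumption, both lengths are multiples of three, as expected for complete open reading frames.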

Table S9. Abundance of genes encoding or interacting with restriction enzymes and toxin-antitoxin modules in genomes of members of the family Desulfobulbaceae.

Genome name§ | Genome size (bp) | Restriction# (n genes) | Antitoxin* (n genes) | Restriction (genes per Mbp) | Antitoxin (genes per Mbp)
Ca. Electrothrix communis A1 (D) | 1069899 | 5 | 2 | 4.7 | 1.9
Ca. Electronema sp. GS (D) | 2762383 | 18 | 18 | 6.5 | 6.5
Ca. Electrothrix aarhusiensis MCF (D) | 3734523 | 51 | 42 | 13.7 | 11.2
Ca. Electrothrix marina A2 (D) | 991674 | 10 | 12 | 10.1 | 12.1
Ca. Electrothrix marina A3 (D) | 595470 | 3 | 8 | 5.0 | 13.4
Ca. Electrothrix marina A5 (D) | 2059092 | 19 | 26 | 9.2 | 12.6
Deltaproteobacterium sp. MLMS-1 (PD) | 6058912 | 28 | 52 | 4.6 | 8.6
Desulfobulbus elongatus DSM 2908 (PD) | 3961953 | 21 | 10 | 5.3 | 2.5
Desulfobulbus japonicus DSM 18378 (PD) | 5806258 | 11 | 8 | 1.9 | 1.4
Desulfobulbus mediterraneus DSM 13871 (PD) | 4796810 | 6 | 1 | 1.3 | 0.2
Desulfobulbus propionicus DSM 2032 (F) | 3851869 | 8 | 0 | 2.1 | 0.0
Desulfocapsa sulfexigens DSM 10523 (F) | 4023512 | 4 | 6 | 1.0 | 1.5
Desulfocapsa thiozymogenes DSM 7269 (PD) | 3927421 | 22 | 38 | 5.6 | 9.7
Desulfofustis glycolicus DSM 9705 (PD) | 4977298 | 15 | 3 | 3.0 | 0.6
Desulfopila aestuarii DSM 18488 (PD) | 6065581 | 8 | 5 | 1.3 | 0.8
Desulforhopalus singaporensis DSM 12130 (PD) | 5008860 | 4 | 0 | 0.8 | 0.0
Desulfotalea psychrophila LSv54 (F) | 3659634 | 5 | 0 | 1.4 | 0.0
Desulfurivibrio alkaliphilus AHT2 (F) | 3097763 | 7 | 3 | 2.3 | 1.0

§(D) draft genome, (PD) permanent draft genome, (F) finished complete genome.
#Based on IMG annotation (8). Genes encoding or interacting with restriction enzymes were identified by searching for the term “restriction” in the IMG annotation of the genome.
*Genes encoding or interacting with toxin-antitoxin modules were identified by searching for the term “antitoxin” in the IMG annotation of the genome, as well as for genes matching PF02604 (type II toxin-antitoxin system), because genes matching this Pfam model tend to be annotated as “prevent-host-death protein” and thus do not feature the search term “antitoxin” in their name.

Table S10. Presence and size of CRISPR regions in genomes of members of the family Desulfobulbaceae.

Genome name§ | Genome size (bp) | CRISPR count# | Total CRISPR size (bp)#
Ca. Electrothrix communis A1 | 1069899 | 0 | -
Ca. Electronema sp. GS | 2762383 | 2 | 26254
Ca. Electrothrix aarhusiensis MCF | 3734523 | 12 | 5043
Ca. Electrothrix marina A2 | 991674 | 0 | -
Ca. Electrothrix marina A3 | 595470 | 0 | -
Ca. Electrothrix marina A5 | 2059092 | 0 | -
Deltaproteobacterium sp. MLMS-1 | 6058912 | 5 | 6110
Desulfobulbus elongatus DSM 2908 | 3961953 | 1 | 292
Desulfobulbus japonicus DSM 18378 | 5806258 | 0 | -
Desulfobulbus mediterraneus DSM 13871 | 4796810 | 1 | 464
Desulfobulbus propionicus DSM 2032 | 3851869 | 1 | 6081
Desulfocapsa sulfexigens DSM 10523 | 4023512 | 1 | 565
Desulfocapsa thiozymogenes DSM 7269 | 3927421 | 2 | 1274
Desulfofustis glycolicus DSM 9705 | 4977298 | 0 | -
Desulfopila aestuarii DSM 18488 | 6065581 | 2 | 162
Desulforhopalus singaporensis DSM 12130 | 5008860 | 1 | 395
Desulfotalea psychrophila LSv54 | 3659634 | 1 | 959
Desulfurivibrio alkaliphilus AHT2 (F) | 3097763 | 2 | 9415

#Based on IMG annotation (8).
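
The per-Mbp values in Table S9 appear to be simple normalizations of gene counts by genome size. The minimal sketch below recomputes them for three genomes from the table. The function name `genes_per_mbp` and the variable names are illustrative only, and the source does not state the rounding rule, so rounding to one decimal place is an assumption.

```python
def genes_per_mbp(n_genes: int, genome_size_bp: int) -> float:
    """Gene count normalized to genome size, in genes per Mbp (1 Mbp = 1,000,000 bp)."""
    return n_genes / (genome_size_bp / 1_000_000)


# (genome, genome size in bp, first count column, second count column) from Table S9.
# The count columns are labeled "Restriction" and "Antitoxin" in the table header.
rows = [
    ("Ca. Electrothrix communis A1", 1069899, 5, 2),
    ("Ca. Electronema sp. GS", 2762383, 18, 18),
    ("Ca. Electrothrix aarhusiensis MCF", 3734523, 51, 42),
]

# Round to one decimal place (assumed to match the precision shown in the table).
densities = {
    name: (round(genes_per_mbp(restriction, size), 1),
           round(genes_per_mbp(antitoxin, size), 1))
    for name, size, restriction, antitoxin in rows
}

for name, (restriction_density, antitoxin_density) in densities.items():
    print(f"{name}: {restriction_density} and {antitoxin_density} genes per Mbp")
```

Dividing the first count column by genome size reproduces the first density column of each row, and the second count column reproduces the second.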