
Genome-centric resolution of microbial diversity, metabolism and interactions

in anaerobic digestion

Running title: Genome-centric resolution through deep metagenomics

Inka Vanwonterghem1,2, Paul D Jensen1, Korneel Rabaey1,3 and Gene W Tyson1,2*

1Advanced Water Management Centre (AWMC), The University of Queensland, St Lucia, QLD

4072, Australia; 2Australian Centre for Ecogenomics (ACE), School of Chemistry and Molecular

Biosciences, The University of Queensland, St Lucia, QLD 4072, Australia; 3Laboratory for

Microbial Ecology and Technology (LabMET), Ghent University, Coupure Links 653, 9000 Ghent,

Belgium

*Corresponding author: Prof. Gene W. Tyson. Mailing address: Australian Centre for

Ecogenomics (ACE), School of Chemistry and Molecular Biosciences, The University of

Queensland, St Lucia, QLD 4072, Australia. Phone: +617 3365 3829 Fax: +617 336 54511 Email:

[email protected]

Keywords: metagenomics / genome-centric / functional redundancy / metabolic network / novel

diversity / anaerobic digestion

This article has been accepted for publication and undergone full peer review but has not been through the copyediting, typesetting, pagination and proofreading process which may lead to differences between this version and the Version of Record. Please cite this article as an ‘Accepted Article’, doi: 10.1111/1462-2920.13382

This article is protected by copyright. All rights reserved.


Abstract

Our understanding of the complex interconnected processes performed by microbial communities is

hindered by our inability to culture the vast majority of microorganisms. Metagenomics provides a

way to bypass this cultivation bottleneck and recent advances in this field now allow us to recover a

growing number of genomes representing previously uncultured populations from increasingly

complex environments. In this study, a temporal genome-centric metagenomic analysis was

performed on lab-scale anaerobic digesters that host complex microbial communities fulfilling a

series of interlinked metabolic processes to enable the conversion of cellulose to methane. In total,

101 moderately to near-complete population genomes were recovered based primarily on

differential coverage binning. These populations span 19 phyla, represent mostly novel species and

expand the genomic coverage of several rare phyla. Classification into functional guilds based on

their metabolic potential revealed metabolic networks with a high level of functional redundancy as

well as niche specialization, and allowed us to identify potential roles such as hydrolytic specialists

for several rare, uncultured populations. Genome-centric analyses of complex microbial

communities across diverse environments provide the key to understanding the phylogenetic and

metabolic diversity of these interactive communities.

Introduction

Microorganisms are ubiquitous in the environment and play key roles in global biogeochemical

cycles. As the majority of microbial life has eluded cultivation in the laboratory, culture-

independent techniques have been developed to study their diversity and functions (Tringe and

Rubin, 2005; Albertsen et al., 2013; Vanwonterghem et al., 2014a). Metagenomics, the sequencing

of bulk DNA extracted directly from environmental samples, provides direct access to the



metabolic potential of a microbial community. Advances in sequence throughput, read length and

quality, and bioinformatics tools have contributed to a more widespread application of

metagenomics to study natural and engineered systems.

Early metagenomic studies relied largely on gene-centric analyses (Venter et al., 2004; Tringe et al.,

2005) with the recovery of individual genomes limited to environments dominated by few distinct

populations (Tyson et al., 2004). These gene-centric approaches are biased towards existing

databases, thereby overlooking a significant fraction of the novel diversity (Jaenicke et al., 2010;

Wong et al., 2013). In addition, as only an overview of the metabolic potential of the community is

provided without assigning functions to individual populations, important metabolic interactions

may remain undetected. The development of new improved sequencing technologies and

population genome binning algorithms (Wrighton et al., 2012; Albertsen et al., 2013; Imelfort et al.,

2014) has allowed us to move beyond gene-centric approaches and recover population genomes

from increasingly complex environments. This has led to the discovery of novel lineages (Brown et

al., 2015; Castelle et al., 2015), and insight into the metabolic processes (Raghoebarsing et al.,

2006; Haroon et al., 2013) and microbial interactions (Wrighton et al., 2014; Baker et al., 2015)

taking place in these environments.

Engineered systems offer a controlled environment in which to study complex microbial

communities, test hypotheses and explore the efficacy of new metagenomic approaches. Anaerobic

digestion provides an interesting study environment as it consists of a series of metabolic processes

carried out by a consortium of interdependent microorganisms. This process is a critical component

of the global carbon cycle and is industrially relevant both as a waste management strategy and for

the production of bioenergy (Amani et al., 2010). Due to the complexity of the communities

involved, anaerobic digesters (ADs) remain genomically underexplored and most metagenomic

studies have relied on gene-centric approaches (Jaenicke et al., 2010; Hanreich et al., 2013; Wong



et al., 2013; Solli et al., 2014; Stolze et al., 2015). The recovery of population genomes from

various engineered systems has provided genomic insight into candidate phyla such as TM7

(Albertsen et al., 2013) and KSB3 (Sekiguchi et al., 2015), which is responsible for filamentous

bulking in anaerobic wastewater treatment, and microbial interactions such as synergistic networks

within terephthalate-degrading bioreactors (Nobu et al., 2014). Genome-centric approaches can thus

provide a powerful means of understanding the phylogenetic and metabolic diversity in anaerobic

digestion.

Here, a detailed genome-centric exploration of complex microbial communities in ADs was

performed to reconstruct the metabolic network by gaining access to the functional potential of

individual populations involved in the conversion of cellulose to methane. ADs were operated in

triplicate for a year and supplied with cellulose. Metagenomic sequencing was performed on

samples taken at two time points (spanning ~8 months), characterized by differences in

performance. Co-assembly of the six generated metagenomes followed by differential coverage-

based binning resulted in the recovery of 101 population genomes that constitute the majority of the

community. These genomes represent 19 phyla and expand the genomic diversity of several

lineages with few sequenced representatives. The metabolic reconstruction of individual

populations combined with their relative abundance estimates allowed us to study ecological

theories through the identification of a high level of functional redundancy, and construct an

interaction network for the flow of carbon through the community. These results demonstrate the

importance of genome-centric analyses when studying complex communities that harbor novel

diversity, and provide the foundation for further hypothesis-driven experiments.

Results

Metagenomic sequencing and assembly



The phylogenetic and metabolic diversity of microbial communities involved in anaerobic digestion

was studied using a genome-centric metagenomic approach. Three lab-scale ADs (designated AD1,

AD2 and AD3) were used as controlled systems in which to study the community dynamics and

reconstruct the metabolic network. The ADs were inoculated with a mixture of eight samples taken

from anaerobic environmental and engineered systems (Table S1). They were operated for 362 days

and supplied with cellulose as the sole carbon and energy source. Samples for metagenomic

sequencing were collected from the reactors at two time points (T1: Day 96; T2: Day 362) based on

differences in the structure and performance of the microbial communities, which are summarized

in Fig. S1 and Table S2, and have been described in detail previously (Vanwonterghem et al.,

2014b). Briefly, cellulose hydrolysis was stable at both time points at an average efficiency of 86 ±

4%. Accumulation of predominantly acetate and propionate was observed at T1, with highest

volatile fatty acid (VFA) concentrations measured for AD1 which correlated with lower methane

production. At T2, VFAs were efficiently converted to methane and only minor differences were

observed between the reactors. The six metagenomes (111 Gb total raw reads) from the triplicate

ADs at these two time points were co-assembled, generating 494,042 contigs with a combined

length of 908 Mb (Table S3). On average, >85% of the metagenomic reads from each dataset

mapped onto the contigs (>500 bp) from the combined assembly (Table S4).

Microbial community composition and population genome recovery

The community composition was determined by extracting the 16S rRNA gene sequences from the

metagenomes (Fig. 1) and compared to previously reported amplicon-based community profiles

(Fig. S1) (Vanwonterghem et al., 2014b). The most abundant populations belonged to the phyla

Euryarchaeota, Actinobacteria, Bacteroidetes, Fibrobacteres, Firmicutes, Spirochaetes and

Verrucomicrobia, which are commonly found in ADs (Jaenicke et al., 2010). The microbial

communities were highly similar to one another, but shifted in structure over time leading to

significantly different communities at the two time points (P < 0.001). Several differences could be



observed between the metagenome- and amplicon-based community profiles. Interestingly, a

Cellulomonas population was detected at 3-7% relative abundance in the metagenomes, whereas the

amplicon sequencing approach failed to detect this population (Fig. S1) because of a mismatch in the

forward primer (926F) (Fig. S2). Conversely, the abundance of methanogens was

overestimated in the amplicon dataset compared to the metagenome dataset, which is likely due to

PCR primer and amplification biases. Amplicon-based studies using the 454 sequencing platform

also suffer from lower taxonomic resolution compared to metagenomics and may underestimate the

community diversity and dynamics. For example, two Fibrobacter populations were detected in the

metagenome dataset, each dominant at a different time point, yet were grouped together as one

phylotype in the amplicon dataset (Fig. 1 and Fig. S1). A similar observation was made for the

dominant Methanosaeta populations, and such grouping influences our perception of the microbial community

dynamics.

Population genome binning of the co-assembled metagenomes enabled the recovery of 93 bacterial

and 8 archaeal population genomes with ≥50% completeness and ≤10% contamination (Table 1 and

Table S5). Of these genomes, 58 were substantially to near complete (≥80%) with low to medium

contamination, according to the CheckM classification (Table 1) (Parks et al., 2015). The 101

genomes ranged in size between 1.4 and 6.3 Mb, across a GC content range between 29 and 74%

(Table 1 and Table S5). They represent the majority of the community (62 ± 3% and 79 ± 4% at T1

and T2, respectively; based on percentage of reads mapping), with 58% representing relatively high

abundance populations (>0.5% in at least one of the samples) and the remaining 42% representing

low abundance populations (down to 0.09% maximum relative abundance in at least one of the

samples) (Table 1 and Table S5). In addition to recovering genomes for all the abundant populations

identified in the 16S rRNA gene profiles (Fig. 1, Fig. 2 and Fig. S3), a large number of low

abundance population genomes were recovered which highlights the strength of the binning

approach used in this study. The populations were phylogenetically diverse and belong to 19



different phyla (Fig. 2). Many of these genomes represent novel orders, families and/or genera, and

they significantly expand the genomic representation of phyla with relatively few sequenced

genomes such as Fibrobacteres (Rahman et al., 2015), Verrucomicrobia, Planctomycetes and

Candidate division WWE1 (Fig. 2).
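
The quality criteria quoted above (≥50% completeness and ≤10% contamination for retention, ≥80% completeness for substantially to near complete genomes) can be illustrated with a minimal Python sketch. The bin names and values below are hypothetical and are not data from this study; in practice the estimates would come from a tool such as CheckM, and parsing of its output is not shown.

```python
from dataclasses import dataclass


@dataclass
class GenomeBin:
    name: str
    completeness: float   # estimated completeness in percent
    contamination: float  # estimated contamination in percent


def passes_quality(b: GenomeBin, min_completeness: float = 50.0,
                   max_contamination: float = 10.0) -> bool:
    """Return True if the bin meets the retention thresholds quoted in the text."""
    return (b.completeness >= min_completeness
            and b.contamination <= max_contamination)


def is_substantially_complete(b: GenomeBin, threshold: float = 80.0) -> bool:
    """Return True if completeness reaches the >=80% tier."""
    return b.completeness >= threshold


# Hypothetical bins for illustration only (not results from the paper).
bins = [
    GenomeBin("bin_A", 95.2, 1.1),
    GenomeBin("bin_B", 62.0, 4.5),
    GenomeBin("bin_C", 48.9, 0.5),   # fails the completeness threshold
    GenomeBin("bin_D", 88.0, 12.3),  # fails the contamination threshold
]

retained = [b for b in bins if passes_quality(b)]
retained_names = [b.name for b in retained]
near_complete_names = [b.name for b in retained if is_substantially_complete(b)]

print("retained:", retained_names)
print("substantially to near complete:", near_complete_names)
```

Keeping the thresholds as function parameters makes it straightforward to test how sensitive the number of retained genomes is to the chosen cut-offs.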

Classification into functional guilds based on metabolic potential

The metabolic potential of the microbial communities in these reactors was determined in order to

classify individual populations into functional guilds fulfilling the different steps in anaerobic

digestion (hydrolysis, fermentation, syntrophic oxidation and methanogenesis). Based on the

potential substrate utilization for the dominant populations and their relative abundance, the flow of

carbon from cellulose to methane in each community could be inferred, leading to the construction

of a metabolic network.

Hydrolysis. Firstly, a gene-centric approach was applied to examine the hydrolytic capacity of the

AD communities over time and relative to other environments. Glycoside hydrolase (GH) profiles

were generated for each individually assembled metagenome by calculating the total number of

enzymes within each GH family. Comparative analysis of these GH profiles showed no significant

differences between reactors and time points (P<0.05). The AD metagenomes were enriched in

genes belonging to GH5 (5.3 ± 0.4% of total GH) and GH9 (1.6 ± 0.6%), but also showed high

levels of other GH families, including GH2 (4.2 ± 0.3%), GH3 (3 ± 0.4%), GH31 (2.3 ± 0.2%),

GH43 (4.2 ± 0.7%), GH94 (2.0 ± 0.2%), GH78 (3.3 ± 0.3%), GH13 (4.9 ± 0.5%) and GH23 (3.1 ±

0.5%) (Table S6). Enzymes belonging to these GH families are predominantly involved in the

hydrolysis of cellulose, oligosaccharides, sugar side chains, amylose/maltose and peptidoglycan. A

comparison was made between the GH profiles of the ADs and those reported for soil ecosystems

(Tveit et al., 2013), switchgrass compost, termite hindgut and rumen (Allgaier et al., 2010) (Table

S7 and Fig. S4). Principal component analysis showed distinct clustering of the cellulose-degrading



reactor samples together with the wood-feeding termite hindgut community (Allgaier et al., 2010),

which were all enriched for cellulases predominantly belonging to GH5, reflecting the cellulosic

substrate. The soil environments clustered together despite differences in plant cover (moss versus

vascular plants), while the rumen sample was most different and showed a high abundance of

oligosaccharide degrading enzymes belonging to GH2, GH3 and GH51 (Table S7 and Fig. S4),

which is likely driven by the dominant grass hemicellulose found in this environment.
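
A GH profile of the kind described above, expressed as the percentage of total GH genes assigned to each family, could be computed along the following lines. This is a sketch only: the annotation step that assigns genes to GH families is assumed to have been done already, and the list of family labels is hypothetical.

```python
from collections import Counter


def gh_profile(gh_annotations):
    """Return the percentage of total GH genes assigned to each family.

    gh_annotations is a list with one GH family label per annotated gene.
    """
    counts = Counter(gh_annotations)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {family: 100.0 * n / total for family, n in counts.items()}


# Hypothetical annotations for one metagenome (not data from the paper).
annotations = ["GH5"] * 5 + ["GH9"] * 2 + ["GH13"] * 3 + ["GH2"] * 10

profile = gh_profile(annotations)
for family in sorted(profile, key=profile.get, reverse=True):
    print(f"{family}: {profile[family]:.1f}% of total GH")
```

Expressing each family as a fraction of the total GH count, rather than as a raw count, is what allows profiles from metagenomes of different sizes to be compared.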

Cellulose hydrolyzers were identified in the ADs by generating GH profiles for the individual

population genomes and correlating known activities for GH families with gene annotations to

determine the substrate profile (Fig. 3). The potential to degrade cellulose was a common feature

and present in 65% of the bacterial populations, including phyla commonly associated with

cellulose hydrolysis such as Fibrobacteres (Fibro_01-03), Firmicutes (Firm_03-06, Firm_10-11,

Firm_13-14 and Firm_16), Bacteroidetes (Bact_02-03, Bact_08-11, Bact_13 and Bact_24),

Spirochaetes (Spiro_07-10 and Spiro_12), and Actinobacteria (Actino_01-02) (Fig. 3, Fig. 4 and

Fig. 5) (Lynd et al., 2002; Bayer et al., 2008; Bekele et al., 2011; Suen et al., 2011; Naas et al.,

2014). A range of GH enzymes were also detected in the two Verrucomicrobia populations

(Verruco_01-02) (Fig. 3), and it has previously been speculated that certain populations within this

phylogenetically heterogeneous group can make a substantial contribution to polysaccharide

hydrolysis, even when present at low abundance (Martinez-Garcia et al., 2012). Similar to prior

studies, one of the Lentisphaerae genomes (Lenti_02) (Fig. 3) encoded a high abundance and

variety of GH enzymes (Kaoutari et al., 2013). However, only a very limited number of GH

enzymes were detected in the second Lentisphaerae population (Lenti_01), indicating that

polysaccharide hydrolysis is not a representative feature of the whole phylum. Although the

genome completeness of Lenti_01 is lower than Lenti_02, it is unlikely that this large difference in

GH abundance and diversity can be bridged by the missing fraction of the genome. The largest

number of GH enzymes was observed for a Planctomycetes population (Planc_01) (Fig. 3), which



expands our understanding of the metabolic role of Phycisphaerae since only a limited number of

genomes within this class have been sequenced thus far, and this agrees with the recent finding of a

broad range of GH enzymes within Planctomycetes genomes recovered from estuary sediment

(Baker et al., 2015). The discovery of hydrolytic potential within novel species highlights the

importance of genome-centric approaches as these organisms play a crucial role in carbon cycling.

Microorganisms that could use cellobiose but not cellulose were identified in the reactors among

Proteobacteria (Alpha_01, Beta_02, Delta_01 and Epsilon_01), Bacteroidetes (Bact_22-23),

Spirochaetes (Spiro_02-03) and Synergistetes (Syner_01). By assigning functions to individual

populations, discrepancies could be observed between cellobiose opportunists and cellulose

degraders. In contrast to previous studies that reported a minimum ratio of 2:1 for these functional

groups (cellobiose:cellulose) (Berlemont and Martiny, 2013; Wrighton et al., 2014), the number of

cellobiose opportunists in the ADs was lower than cellulose degraders. When taking the relative

abundance into account it could be shown that this ratio was dynamic and became more even over

time (1:7 at T1, 1:3 at T2 of cellobiose:cellulose).
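
The difference between a ratio based on the number of populations and one weighted by relative abundance can be made concrete with a short sketch. The population names, guild labels and abundances below are hypothetical placeholders, and the guild assignment itself is assumed to have been made beforehand from the GH profiles.

```python
def ratio_by_count(populations, guild_a, guild_b):
    """Ratio of the number of populations in guild_a to those in guild_b."""
    n_a = sum(1 for p in populations if p["guild"] == guild_a)
    n_b = sum(1 for p in populations if p["guild"] == guild_b)
    if n_b == 0:
        raise ValueError(f"no populations in guild {guild_b!r}")
    return n_a / n_b


def ratio_by_abundance(populations, guild_a, guild_b):
    """Ratio of summed relative abundance of guild_a to that of guild_b."""
    a = sum(p["abundance"] for p in populations if p["guild"] == guild_a)
    b = sum(p["abundance"] for p in populations if p["guild"] == guild_b)
    if b == 0:
        raise ValueError(f"guild {guild_b!r} has zero total abundance")
    return a / b


# Hypothetical populations (relative abundance in percent), illustration only.
populations = [
    {"name": "pop_1", "guild": "cellulose", "abundance": 10.0},
    {"name": "pop_2", "guild": "cellulose", "abundance": 6.0},
    {"name": "pop_3", "guild": "cellulose", "abundance": 2.0},
    {"name": "pop_4", "guild": "cellobiose", "abundance": 4.0},
    {"name": "pop_5", "guild": "cellobiose", "abundance": 2.0},
]

count_ratio = ratio_by_count(populations, "cellobiose", "cellulose")
abundance_ratio = ratio_by_abundance(populations, "cellobiose", "cellulose")
print(f"cellobiose:cellulose by count     = {count_ratio:.2f}")
print(f"cellobiose:cellulose by abundance = {abundance_ratio:.2f}")
```

In this toy example the two ratios differ (2:3 by count, 1:3 by abundance), which is the kind of discrepancy that only becomes visible once functions are assigned to individual populations.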

The GH profile for each genome was normalized by its relative abundance at each time point (Fig.

S5 and Fig. S6) and this showed a clear shift in the abundant cellulose degraders over time, i.e.

from Bacteroidetes (Bact_02-03) and Ruminococcus (Firm_04-06) populations at T1 (Fig. 4 and

Fig. S5), to Cellulomonas (Actino_01), Fibrobacter (Fibro_03) and Clostridiales (Firm_11)

populations at T2 (Fig. 5 and Fig. S6). Several Spirochaetes (Spiro_07-10 and Spiro_12) and

Verrucomicrobia (Verruco_01-02) were initially present at lower abundance (maximum 1.3%) but

increased over time (maximum 6.1%). Most of the dominant cellulolytic populations possessed a

plurality of genes with cellulase and cellobiosidase activity (Fig. 3), and it has been hypothesized

that higher GH diversity and copy number results in improved cellulose degrading ability

(Berlemont and Martiny, 2013).
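
One plausible way to normalize per-genome GH profiles by relative abundance is to scale each family count by the abundance of the genome at a given time point. The exact normalization used for Fig. S5 and Fig. S6 is not specified here, so the sketch below is an assumption, and all genome names, counts and abundances are hypothetical.

```python
def abundance_weighted_profile(gh_counts, rel_abundance):
    """Scale each genome's GH family counts by its relative abundance.

    gh_counts:     {genome: {family: gene_count}}
    rel_abundance: {genome: relative abundance as a fraction}
    Genomes missing from rel_abundance are treated as absent (weight 0).
    """
    return {
        genome: {family: count * rel_abundance.get(genome, 0.0)
                 for family, count in families.items()}
        for genome, families in gh_counts.items()
    }


def top_contributor(weighted, family):
    """Return the genome with the largest weighted count for one GH family."""
    return max(weighted, key=lambda g: weighted[g].get(family, 0.0))


# Hypothetical inputs for illustration only.
gh_counts = {
    "genome_X": {"GH5": 4, "GH9": 2},
    "genome_Y": {"GH5": 1, "GH94": 3},
}
abundance_t1 = {"genome_X": 0.10, "genome_Y": 0.02}
abundance_t2 = {"genome_X": 0.01, "genome_Y": 0.08}

weighted_t1 = abundance_weighted_profile(gh_counts, abundance_t1)
weighted_t2 = abundance_weighted_profile(gh_counts, abundance_t2)
top_t1 = top_contributor(weighted_t1, "GH5")
top_t2 = top_contributor(weighted_t2, "GH5")
print("dominant GH5 contributor at T1:", top_t1)
print("dominant GH5 contributor at T2:", top_t2)
```

In this toy case the genome with more GH5 copies dominates at the first time point, while the more abundant genome takes over at the second, mirroring the type of shift in dominant degraders described above.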



The presence of multiple high abundance cellulose degraders at the same time within a community

may suggest there is a level of niche specialization. For example, a positive correlation in relative

abundance was observed between Fibro_03 and Firm_11 (Table 1 and Fig. S7). These populations

may utilize different strategies for attachment to cellulose particles since fibro-slime proteins (fsu)

and pili (pil) were identified in Fibro_03, similar to Fibrobacter succinogenes (Suen et al., 2011),

while dockerin and cohesin modules were detected in Firm_11 suggesting the presence of an

organized cellulosome apparatus similar to Clostridium thermocellum (Lynd et al., 2002; Bayer et

al., 2008). Their substrate specificity may also vary as multiple endoglucanases (GH5, GH8, GH9

and GH45) but only one cellobiose phosphorylase (GH94) for cellobiose utilization were found in

Fibro_03, while only few endoglucanases within the GH5 family but multiple cellobiose

phosphorylase (GH94) and beta-glucosidase genes (GH1 and GH3) were detected for Firm_11. In

addition, these populations potentially use different oligosaccharide, cellobiose and glucose

transport mechanisms, such as phosphotransferase systems (pts), non-specific sugar ABC

transporters (e.g. msmK, malK, sugC, and gguAB) and specific cellobiose transporters (cebEFG)

(Fig. S8). These differences in hydrolytic potential suggest that within the same environment and

functional guild, niche specialization may allow seemingly functionally redundant populations to

grow simultaneously and potentially work together.
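
A correlation in relative abundance between two populations, such as the one noted above, could be quantified as follows. The type of correlation coefficient is not specified in the text, so the use of Pearson's r is an assumption, and the abundance values are hypothetical rather than taken from Table 1 or Fig. S7.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical relative abundances (%) of two populations in six samples.
population_a = [0.5, 0.8, 0.6, 5.0, 6.2, 5.5]
population_b = [0.2, 0.3, 0.4, 3.1, 3.9, 3.0]

r = pearson(population_a, population_b)
print(f"Pearson r = {r:.3f}")
```

With only six samples, any such coefficient should be read as descriptive; it indicates co-variation but does not by itself demonstrate an interaction.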

Fermentation. The majority of the community showed a potential to convert glucose to acetate,

with 73% of the bacterial population genomes encoding the acetate kinase (ack) and phosphate

acetyltransferase (pta) genes required for acetate production (Fig. 4 and Fig. 5). An additional 16%

were missing only one of these genes. This indicates a high level of functional redundancy and

confirms acetate as one of the most important intermediates in these types of systems (Amani et al.,

2010).
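
The classification of genomes by their potential for acetate production can be sketched as a simple presence check for the two genes named above (ack and pta). The genome names and gene sets are hypothetical, and the gene annotation step is assumed to have been carried out beforehand.

```python
ACETATE_GENES = frozenset({"ack", "pta"})


def acetate_potential(genes):
    """Classify a genome as 'complete', 'partial' or 'absent' for ack/pta."""
    found = ACETATE_GENES & set(genes)
    if found == ACETATE_GENES:
        return "complete"
    if found:
        return "partial"
    return "absent"


def percentage_by_class(genomes):
    """Return the percentage of genomes falling into each class."""
    classes = [acetate_potential(genes) for genes in genomes.values()]
    total = len(classes)
    if total == 0:
        return {}
    return {label: 100.0 * classes.count(label) / total
            for label in ("complete", "partial", "absent")}


# Hypothetical annotated genomes, for illustration only.
genomes = {
    "genome_1": {"ack", "pta", "buk"},
    "genome_2": {"ack", "pta"},
    "genome_3": {"pta"},          # missing ack
    "genome_4": {"buk"},          # neither gene detected
}

summary = percentage_by_class(genomes)
for label, pct in summary.items():
    print(f"{label}: {pct:.0f}% of genomes")
```

Reporting the 'partial' class separately is useful for population genomes that are not fully complete, since a missing gene may reflect an incomplete bin rather than a true absence.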



Propionate production within these communities occurred via the methylmalonyl-CoA pathway by

populations within the Actinobacteria (Actino_02), Bacteroidetes (Bact_02-03, Bact_09-11,

Bact_13, Bact_19 and Bact_22-24), Rhodospirillum (Alpha_01) and Verrucomicrobia

(Verruco_01-02), which contained the key enzymes methylmalonyl-CoA mutase, methylmalonyl-

CoA epimerase and methylmalonyl-CoA carboxyltransferase. The higher propionate concentrations

observed in the reactors at T1 (Table S2) were likely related to the high relative abundance of

Bact_03 (10 ± 2%) and Actino_02 (4 ± 1%), a population closely related to Propionibacterium

(Fig. 4). The main propionate producers decreased in abundance over time and at T2 the dominant

populations of this functional guild shifted to members of the Bacteroidetes (Bact_19 and Bact_22-

24; 0-5%) and Verrucomicrobia (Verruco_01-02; 0-6%) (Fig. 5). A full complement of genes for

propionate production via the acrylate pathway or propanediol pathway was not detected in the

investigated genomes.

Multiple potential butyrate producers were detected within the phylum Bacteroidetes (Bact_08-11,

Bact_13, Bact_19 and Bact_22-24) (Fig. 4 and Fig. 5). These populations contained the key gene

butyrate kinase (buk) as well as most or all of the remaining genes in the butyrate fermentation

pathway. The alternative pathway using butyryl-CoA:acetate CoA-transferase (but) was not

detected in the studied population genomes. Although butyrate production genes were expected to

be found in the Clostridiales genomes based on what is known from cultured species and genome

representatives (Vital et al., 2016), a complete pathway for butyrate production was not detected in

any of the Clostridiales genomes from this study. Potential for amino acid fermentation to acetate

and butyrate was detected for Synergistetes (Syner_01 and Syner_03) and Treponema (Spiro_12)

populations, which has been observed for species belonging to these phyla previously (Tucci and

Martin, 2007; Ganesan et al., 2008; Chertkov et al., 2010). These populations may be scavengers

utilizing proteins that have been excreted or leaked from dead cells. Potential growth on

proteinaceous compounds and sugars with predominantly acetate and lower amounts of butyrate as



fermentation products may also be possible for the Thermotogae populations (Thermo_01-02),

similar to what has been suggested for Mesotoga prima (Nesbo et al., 2012). Only three mesophilic

Thermotogae genomes have been described so far, providing limited knowledge of their

metabolism. The populations within the reactors seem phylogenetically more closely related to

Mesotoga infera; however, they lack the genes for utilization of sulfur as terminal electron acceptor,

a key feature for this species (Hanaia et al., 2013). Instead, they also contain a selection of

polysaccharide degrading enzymes, which can be related back to the environment in which they are

found.

Syntrophic VFA oxidation. Reduced compounds such as propionate and butyrate can be further

oxidized to acetate, CO2, H2 and formate by syntrophic bacteria when H2 partial pressures are low.

Two Syntrophobacterales genomes (Delta_01 and Delta_02; 47% amino acid identity (AAI))

contained the majority of genes for the methylmalonyl-CoA pathway, indicating a potential

involvement in propionate oxidation. Other members of this family are capable of syntrophic

propionate oxidation, i.e. Syntrophobacter fumaroxidans (Harmsen et al., 1998) (64% AAI to

Delta_01), and syntrophic oxidation of phenol and other aromatics to acetate, i.e. Syntrophorhabdus

aromaticivora (Qiu et al., 2008) (63% AAI to Delta_02). Delta_01 and Delta_02 were present at

<0.2% relative abundance at T1 and increased over time to 0.3-0.9% at T2 (Fig. 5). Although these

relative abundances are still low, syntrophic propionate oxidizers are capable of high substrate

turnover and this likely contributed to the low observed propionate concentrations at T2. It has been

suggested that Candidatus ‘Cloacamonas acidaminovorans’ is a hydrogen-producing syntroph

capable of oxidizing propionate based on its genome sequence combined with cultivation

experiments (Pelletier et al., 2008). Although the WWE1 genome recovered from the reactors

appears to have similar genes required for the utilization of amino acids, sugars and carboxylic

acids, as well as multiple putative Fe-only hydrogenases, the energy-conservation mechanism

required for syntrophic VFA oxidation remains to be elucidated.



Butyrate oxidation was likely performed via the beta-oxidation pathway by another

Syntrophobacterales population (Delta_03), which is most closely related to Syntrophus

aciditrophicus (60% AAI) (McInerney et al., 2007). The Delta_03 genome had a large number of

genes invested in butyrate oxidation and increased in abundance over time from <0.001% at T1 to

~1.4% at T2 (Fig. 5). The Delta_01 and Delta_02 genomes only encode part of the beta-oxidation

pathway, i.e. from butyryl-CoA or crotonyl-CoA to acetate, suggesting intermediates from other

oxidation pathways may feed into the butyrate oxidation pathway at this step.

Methanogenesis. Methane producing populations within these communities were related to

Methanocorpusculum (Methan_05), Methanospirillum (Methan_06), Methanoculleus (Methan_07)

and Methanosaeta (Methan_01-02 and Methan_04) (Fig. 4 and Fig. 5). Over time, there was an

overall increase in methanogen abundance associated with a shift from hydrogenotrophic to

acetoclastic methanogenesis. The presence of multiple populations capable of fulfilling the same

function shows that a level of functional redundancy remained within more specialized functional

guilds. Another interesting finding was the presence of a near complete complement of genes for

hydrogenotrophic methanogenesis within each of the three Methanosaeta genomes (Methan_01-02

and Methan_04), which showed little to no contamination and are reported to be strictly

acetoclastic. Various hypotheses have been developed to explain the potential role of this pathway

(Smith and Ingram-Smith, 2007; Rotaru et al., 2014) but functional assays are needed to determine

whether this pathway is active in these systems.

While methanogen abundance increased over time, the increase in methane production was

disproportional, and this was likely due to a shift in the rate-limiting step. The observed

accumulation of VFAs at T1 indicates syntrophic VFA oxidation and/or methanogenesis was rate-

limiting within the community at this time point. As all VFAs were efficiently converted to biogas



at T2, steps upstream in the metabolic network were more likely rate-limiting. When substrate

concentrations are low, methanogens can use internal storage compounds (e.g. glycogen) for

growth without methane production (Verhees et al., 2003). Also, enzymes for assimilatory and

dissimilatory sulfate reduction were encoded within several populations present at higher

abundance at T2 (Delta_01, Chlorobi_01 and Alpha_01-03), indicating potential competition with

methanogens for H2 and/or acetate (Oremland and Polcin, 1982).

Discussion

The widespread application of metagenomic sequencing has led to the discovery of novel species

and metabolic processes of global importance (Haroon et al., 2013; Wrighton et al., 2014; Baker et

al., 2015). Improved metagenome assembly and binning tools (Imelfort et al., 2014) now allow a

growing number of population genomes to be recovered from increasingly complex environments

(Albertsen et al., 2013; Baker et al., 2015; Brown et al., 2015). Here, a detailed genome-centric

analysis of microbial communities involved in the conversion of cellulose to methane led to the

recovery of 101 population genomes that could be classified into functional guilds based on their

potential substrate utilization. Through the recovery of population genomes for the majority of the

community, we were able to combine the metabolic potential of individual populations with their

relative abundance, and reconstruct a metabolic network for the dominant players in the

communities at two time points (T1: Fig. 4 and T2: Fig. 5). The networks revealed a high level of

functional redundancy, particularly among the hydrolyzers and fermenters, as changes in the

dominant players were observed over time while the overall functionality was maintained. Potential

niche specialization was also observed based on the variety and abundance of GH families. Various

microbial interactions could be inferred including competition for substrates and cellobiose- or

glucose-utilizing opportunists that depend on the activity of primary cellulose degraders. Metabolic

functions that could not have been predicted from known cultured or sequenced representatives



were also identified within each functional guild. By correlating the metabolic network with

performance parameters, observations such as the accumulation of propionate could be explained.

The genome-resolved network also enabled the proportion of the community represented by each

functional guild to be calculated, and this highlighted the importance of a diverse and well-balanced

community with functional flexibility to fulfill a complex multi-step process such as anaerobic

digestion.

The results presented here demonstrate the valuable insights that can be gained into complex

metabolic networks through genome-centric metagenomics. The approach described in this study

can be readily applied to other natural and engineered systems, which will undoubtedly reveal novel

microbial diversity and metabolic interactions. When genome-centric metagenomics is combined

with functional data derived from metatranscriptomics or -proteomics, we will be able to develop a

holistic understanding of the complex roles microorganisms play in these environments.

Experimental procedures

Sample collection and DNA extraction

Triplicate ADs (2 L working volume) were seeded with a diverse inoculum consisting of samples

taken from various anaerobic digesters, an anaerobic lagoon, rumen fluid and anoxic lake sediment

(Vanwonterghem et al., 2014b), and supplied with alpha cellulose (Sigma Aldrich, NSW Australia)

as the sole energy and carbon source. The reactors were designated AD1, AD2 and AD3, and were

run for 362 days at a 10 day sludge retention time (SRT), under mesophilic conditions and at

neutral pH. The medium contained 3 g L⁻¹ Na2HPO4, 1 g L⁻¹ NH4Cl, 0.5 g L⁻¹ NaCl, 0.2465 g L⁻¹

MgSO4.7H2O, 1.5 g L⁻¹ KH2PO4, 14.7 mg L⁻¹ CaCl2, 2.6 g L⁻¹ NaHCO3, 0.5 g L⁻¹ C3H7NO2S, 0.25 g L⁻¹

Na2S.9H2O, and 1 mL of trace solution containing 1.5 g L⁻¹ FeSO4.7H2O, 0.15 g L⁻¹ H3BO3, 0.03 g L⁻¹

CuSO4.5H2O, 0.18 g L⁻¹ KI, 0.12 g L⁻¹ MnCl2.4H2O, 0.06 g L⁻¹ Na2MoO4.2H2O, 0.12 g L⁻¹ ZnSO4.7H2O,

0.15 g L⁻¹ CoCl2.6H2O, 10 g L⁻¹ EDTA and 23 mg L⁻¹ NiCl2.6H2O. It was sparged

with N2 and then autoclaved at 121°C for 60 min for oxygen removal and sterilisation. The reactors

were supplied with alpha cellulose at a concentration of 5 g cellulose L⁻¹ medium semi-continuously,

i.e. at intervals of six hours resulting in 4 feed events per day. Reactor performance parameters and

microbial community composition were monitored over time as part of a previous study

(Vanwonterghem et al., 2014b). Samples for metagenomic sequencing were collected from the

three reactors (2 mL) at two time points (Day 96 and Day 362) based on differences in reactor

performance (Table S2). The samples were centrifuged at 14,000 g for 2 min to collect the biomass,

and the pellet was snap-frozen in liquid nitrogen and stored at -80°C until further processing. DNA

was extracted from these samples using the MP-Bio FastDNA Spin Kit for Soil (MP Biomedicals,

Australia) according to the manufacturer’s instructions.
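The feeding regime described above reduces to simple arithmetic. The following is a minimal sketch that assumes the reactors were operated without biomass retention, so that the hydraulic retention time equals the 10 day SRT. This assumption and the function name feed_schedule are illustrative and are not taken from the study.

```python
def feed_schedule(working_volume_l, srt_days, feeds_per_day, substrate_g_per_l):
    """Return (daily exchange volume in L, volume per feed event in L,
    loading rate in g substrate per L reactor per day).

    Assumes a well-mixed reactor without biomass retention (HRT == SRT).
    """
    daily_volume_l = working_volume_l / srt_days
    volume_per_feed_l = daily_volume_l / feeds_per_day
    loading_g_per_l_day = substrate_g_per_l * daily_volume_l / working_volume_l
    return daily_volume_l, volume_per_feed_l, loading_g_per_l_day


# Values from the text: 2 L working volume, 10 day SRT, 4 feeds per day,
# 5 g cellulose per litre of medium.
daily, per_feed, loading = feed_schedule(2.0, 10.0, 4, 5.0)
print(f"{daily * 1000:.0f} mL medium per day")
print(f"{per_feed * 1000:.0f} mL per feed event")
print(f"{loading:.2f} g cellulose per L per day")
```

Under these assumptions, the call corresponds to about 200 mL of medium exchanged per day, 50 mL per feed event and a loading rate of 0.5 g cellulose L⁻¹ day⁻¹.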

Metagenome library preparation and sequencing

DNA libraries for samples from the first time point were prepared using the TruSeq DNA Sample

Preparation Kits v2 (Illumina, CA) with 2 µg of DNA from each sample, following the

manufacturer’s instructions. The DNA concentration of the libraries was measured using the

Quant-iT kit (Molecular Probes, CA). Paired-end sequencing (2 x 150 bp, average fragment size 250

bp) was performed on the Illumina HiSeq2000 using the TruSeq PE Cluster Kit v3-cBot-HS

(Illumina). The second set of samples was prepared for sequencing using the Nextera DNA

Sample Preparation Kit (Illumina) with 50 ng of DNA from each sample, following the

manufacturer’s instructions. Quantification and quality assessment of the libraries were performed

using the Agilent 2100 Bioanalyser (Agilent Technologies, CA). Paired-end sequencing (2 x 150 bp,

fragment size ranging from 300 to 800 bp) was performed on the Illumina HiSeq2000 platform using

the TruSeq SBS Kit v3 (Illumina). Each sample was sequenced on one third of a flowcell lane,

generating a combined total of 111 Gb of raw sequence data. Three additional large insert (2797 ±

83 bp) mate-pair libraries were generated from the same genomic DNA extracted from the three

reactors at day 362. The libraries were constructed using the Nextera Mate Pair Sample Preparation

Kit (Illumina) and sequenced on the Illumina MiSeq system (2 x 250 bp paired-end sequencing)

using the MiSeq Reagent kit v2. Each sample was sequenced on one quarter of a flowcell lane,

generating a combined total of 5 Gb of raw sequence data.

Community profiling

16S rRNA gene amplicon sequencing of all samples using the Roche 454 GS-FLX Titanium

platform (Roche Diagnostics, Australia) has been previously reported (Vanwonterghem et al.,

2014b). The microbial community composition was also determined by identifying and classifying

all 16S rRNA reads from the paired-end metagenomic datasets using the software CommunityM

v.1.2 with default parameters (https://github.com/dparks1134/CommunityM.git), which uses hidden

Markov models (HMMs) to identify the 16S rRNA gene sequences and classifies them using the

GreenGenes database (DeSantis et al., 2006) with clustering at 97% sequence similarity.

Metagenome assembly and population genome binning

Paired-end reads were quality trimmed using CLC workbench v.6 (CLC Bio, Taiwan) with a

quality score threshold of 0.01 and minimum read length of 100 bp. Illumina sequencing adapters at

the ends of reads were trimmed (if found) and reads containing ambiguous nucleotides were

removed from the dataset. Trimmed sequences were assembled using the CLC de novo assembly

algorithm with a kmer size of 63 and automatic bubble size. All six datasets were assembled

individually and also combined in a single large dataset co-assembly for population genome

binning. Only contigs larger than 500 bp were used in downstream analyses. The raw paired-end


reads from each individual dataset were mapped onto the combined assembly using BWA (Li,

2013) and SAMtools (Li et al., 2009) with default parameters. On average 87 ± 4% of all reads

mapped onto the co-assembly. Population genomes were recovered from the sequence data based

primarily on differential coverage profiles using GroopM v.0.2 (Imelfort et al., 2014), with initial

core formation set at 1500 bp.
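Differential coverage binning relies on each contig having a characteristic coverage profile across the datasets. The sketch below illustrates only that input signal: it applies the 500 bp contig length cut-off and computes the mean coverage per contig in each sample. It does not reproduce the GroopM clustering itself, and the input format (number of mapped read bases per contig and sample) and all names are hypothetical.

```python
def filter_contigs(contig_lengths, min_length=500):
    """Keep only contigs longer than min_length bp."""
    return {name: length for name, length in contig_lengths.items()
            if length > min_length}


def coverage_profiles(contig_lengths, mapped_bases_per_sample):
    """Return, for each contig, its mean coverage in every sample.

    mapped_bases_per_sample holds one dict per sample, giving the number of
    read bases mapped to each contig (hypothetical input format).
    """
    profiles = {}
    for name, length in contig_lengths.items():
        profiles[name] = [sample.get(name, 0) / length
                          for sample in mapped_bases_per_sample]
    return profiles


# Toy values for illustration only (not data from the study).
lengths = {"contig_1": 12000, "contig_2": 450, "contig_3": 3000}
mapped = [
    {"contig_1": 240000, "contig_3": 3000},   # sample 1
    {"contig_1": 12000, "contig_3": 90000},   # sample 2
]
kept = filter_contigs(lengths)
profiles = coverage_profiles(kept, mapped)
for name, profile in profiles.items():
    print(name, profile)
```

Contigs that originate from the same population are expected to show similar profiles, which is the property a binning tool can exploit.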

Population genome bin refinement and quality assessment

The completeness and level of contamination of the population genome bins were calculated with

CheckM v.0.9.4 (Parks et al., 2015), which uses lineage-specific conserved marker gene sets for

each population genome. Manual refinement of the population genome bins was performed using

the GroopM refine function based on coverage profiles, kmer signatures and GC content, leading to

a significant increase in good quality population genomes (Table S8). The resulting population

genome bins were further refined using the mate-pair sequence data. Adapter sequences were

removed, trimmed reads shorter than 50 bp were discarded, and only valid mate pairs, i.e. reads

oriented in the reverse-forward direction, were retained. Scaffolding of the processed mate-pair

reads was performed using SSPACE v.2.0 (Boetzer et al., 2011) with a minimum number of links

set at 2. The population genome bins were improved by adding or removing linked contigs based on

coverage information, the number of connections between contigs and completeness/contamination

estimates (Table S8). The completeness estimates were also used to calculate the expected genome

size. The data has been submitted to the NCBI Short Read Archive under BioProject

PRJNA284316.
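The logic behind marker-based quality estimates can be sketched as follows. This is a deliberately simplified illustration: CheckM works with lineage-specific, collocated marker sets, whereas the code below treats every marker independently. The expected genome size formula (bin size divided by completeness) is an assumption, as the exact calculation is not given in the text, and all function names are hypothetical.

```python
def bin_quality(marker_counts):
    """Estimate completeness and contamination (both in %) for one bin.

    marker_counts maps each expected single-copy marker gene to the number of
    copies found in the bin (0 if absent). Simplified: markers are treated
    independently rather than as collocated sets.
    """
    n_markers = len(marker_counts)
    present = sum(1 for copies in marker_counts.values() if copies >= 1)
    extra_copies = sum(copies - 1 for copies in marker_counts.values() if copies > 1)
    completeness = 100.0 * present / n_markers
    contamination = 100.0 * extra_copies / n_markers
    return completeness, contamination


def expected_genome_size(bin_size_mb, completeness_pct):
    """Scale the bin size by its completeness (assumed formula)."""
    return bin_size_mb / (completeness_pct / 100.0)


# Toy marker counts for illustration only.
toy_markers = {"marker_1": 1, "marker_2": 1, "marker_3": 2, "marker_4": 0}
print(bin_quality(toy_markers))

# Bin size and completeness of Methan_04 as listed in Table 1.
print(round(expected_genome_size(2.3, 85.3), 2))
```

With the Table 1 values for Methan_04 (2.3 Mb, 85.3% complete), this assumed formula gives an expected genome size of about 2.7 Mb.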

Genome tree phylogeny

A genome tree was generated using 38 universal conserved marker genes (Darling et al., 2014)

from 2,015 finished bacterial and archaeal genomes available from the Integrated Microbial

Genomes database (IMG) (Markowitz et al., 2012) and the recovered population genomes (Table


S9). Marker genes were identified using HMMs and the genome tree was generated with FastTree

(Price et al., 2009) using a concatenated alignment of the marker genes. The phylogenetic affiliation

of the population genomes was determined relative to the IMG genomes and compared to the

taxonomy of 16S rRNA sequences identified in the genome bins using CommunityM v.1.2 with

default parameters and the GreenGenes database clustered at 97% sequence similarity

(https://github.com/dparks1134/CommunityM.git).
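The concatenation step that precedes tree inference can be illustrated with a short sketch. It assumes that each marker gene has already been aligned separately and that a genome lacking a marker is padded with gap characters. Both the padding convention and the function name are assumptions for illustration, not details reported for this study.

```python
def concatenate_alignments(alignments, genomes):
    """Concatenate per-marker alignments into one alignment per genome.

    alignments is a list with one dict per marker gene, mapping genome name to
    its aligned sequence. Genomes missing a marker are padded with gaps.
    """
    parts = {genome: [] for genome in genomes}
    for alignment in alignments:
        widths = {len(sequence) for sequence in alignment.values()}
        if len(widths) != 1:
            raise ValueError("sequences within one marker alignment must have equal length")
        width = widths.pop()
        for genome in genomes:
            parts[genome].append(alignment.get(genome, "-" * width))
    return {genome: "".join(chunks) for genome, chunks in parts.items()}


# Toy alignments for two marker genes and three genomes (illustrative only).
marker_a = {"bin_1": "MKV-L", "ref_1": "MKVAL", "ref_2": "MRVAL"}
marker_b = {"bin_1": "GTS", "ref_2": "GSS"}
concatenated = concatenate_alignments([marker_a, marker_b], ["bin_1", "ref_1", "ref_2"])
for genome, sequence in concatenated.items():
    print(genome, sequence)
```

The resulting sequences all have the same length and could be written to a FASTA file as input for a tree-building program.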

Functional annotation of the metagenomes

For each individually assembled metagenome, open reading frames (ORFs) were identified using

PROKKA v.1.8 (Seemann, 2014). Genes encoding carbohydrate active enzymes (CAZy) (Lombard

et al., 2014) were detected using hmmer v.3.1 (Finn et al., 2011) and the HMM-based database for

CAZy annotation (dbCAN v.3) (Yin et al., 2012), which classifies enzymes that degrade glycosidic

bonds into families based on structurally-related catalytic and carbohydrate-binding modules. For

each metagenome, the total number of hits to a glycoside hydrolase (GH) family was calculated for

comparative analysis.
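Counting hits per GH family can be sketched as below. The input is assumed to be a list of (ORF identifier, family label) pairs already parsed from the HMM search output. The label format (for example "GH5", or "GH13_20" for a subfamily) and the choice to collapse subfamilies into their parent family are assumptions made for illustration.

```python
import re
from collections import Counter

# Matches the family part of a label such as "GH5" or "GH13_20".
GH_PATTERN = re.compile(r"^(GH\d+)")


def count_gh_families(hits):
    """Count hits per glycoside hydrolase family.

    hits is an iterable of (orf_id, family_label) pairs (hypothetical format).
    Labels that are not GH families (e.g. CBM or GT modules) are ignored.
    """
    counts = Counter()
    for _orf_id, label in hits:
        match = GH_PATTERN.match(label)
        if match:
            counts[match.group(1)] += 1
    return counts


# Toy hits for illustration only.
toy_hits = [
    ("orf_0001", "GH5"),
    ("orf_0002", "GH5"),
    ("orf_0003", "GH13_20"),
    ("orf_0004", "CBM6"),
    ("orf_0005", "GT2"),
]
gh_counts = count_gh_families(toy_hits)
for family, n_hits in sorted(gh_counts.items()):
    print(family, n_hits)
```

Running the same function on each metagenome yields one count vector per dataset, which can then be compared across samples.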

Functional annotation of the population genomes and metabolic network reconstruction

Population genomes recovered from the combined metagenome assembly were annotated using

PROKKA v.1.8 and validated based on homology search with BLASTP (Altschul et al., 1990)

using the IMG protein database (Markowitz et al., 2012) and KEGG Orthology database (Kanehisa

and Goto, 2000; Kanehisa et al., 2014). Carbohydrate active enzymes were detected for each

population genome using hmmer and dbCAN, as for the individual metagenomes. These results

were combined with known activities of GH families (http://www.cazy.org;

https://www.cazypedia.org) (Allgaier et al., 2010) and the annotations based on the PROKKA and IMG

databases, in order to determine the predominant substrate profile for each GH family. A full

reconstruction of the metabolic potential for each population genome was based on the consensus

of the different annotation methods used and metabolic pathways identified in KEGG and MetaCyc


(Caspi et al., 2008). A metabolic pathway comprising multiple genes was considered present if the

majority (>75%) of genes involved in this pathway were detected in the genome. The populations

could be classified into one or more functional guilds, namely hydrolysis (cellulose/cellobiose),

fermentation (acetate/propionate/butyrate), syntrophic VFA oxidation and methanogenesis

(hydrogenotrophic/acetoclastic), based primarily on their carbon metabolism. In order to reconstruct

the metabolic networks at each time point (Fig. 4 and Fig. 5), only those populations present at

>0.1% relative abundance in at least one of the reactors were considered, and their average relative

abundance across the reactors at each time point was calculated to determine the contribution of

each population to the flow of carbon (represented by the thickness of the lines in Fig. 4 and Fig. 5).

The combined (average) relative abundance of all populations within a functional guild was

calculated to assess the overall distribution of functions across the community and how this balance

shifts over time.
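The two rules described in this section, the >75% criterion for pathway presence and the abundance-weighted summary per functional guild, can be sketched as follows. The abundance values are taken from Table 1 (Day 362), but the guild labels and gene names are placeholders rather than the assignments made in the study, and the function names are hypothetical.

```python
def pathway_present(pathway_genes, detected_genes, threshold=0.75):
    """A pathway counts as present if more than `threshold` of its genes are detected."""
    found = sum(1 for gene in pathway_genes if gene in detected_genes)
    return found / len(pathway_genes) > threshold


def guild_abundance(abundance, guilds, min_abundance=0.1):
    """Sum the mean relative abundance (%) of all populations in each guild.

    abundance maps population -> relative abundances in each reactor at one
    time point; guilds maps population -> set of guild names. Populations that
    do not exceed min_abundance in any reactor are skipped.
    """
    totals = {}
    for population, values in abundance.items():
        if max(values) <= min_abundance:
            continue  # at or below the 0.1% cut-off in every reactor
        mean_abundance = sum(values) / len(values)
        for guild in guilds.get(population, ()):
            totals[guild] = totals.get(guild, 0.0) + mean_abundance
    return totals


# Day 362 relative abundances (%) for AD1, AD2 and AD3, from Table 1.
abundance_t2 = {
    "Methan_01": [10.13, 19.67, 5.79],
    "Fibro_03": [5.83, 4.40, 12.69],
    "Alpha_05": [0.01, 0.01, 0.01],
}
# Placeholder guild labels for illustration; not the study's assignments.
guilds = {
    "Methan_01": {"guild_A"},
    "Fibro_03": {"guild_B"},
    "Alpha_05": {"guild_B"},
}
print(guild_abundance(abundance_t2, guilds))
print(pathway_present(["gene_1", "gene_2", "gene_3", "gene_4", "gene_5"],
                      {"gene_1", "gene_2", "gene_3", "gene_4"}))
```

A population assigned to several guilds contributes its mean abundance to each of them, so the guild totals need not sum to 100%.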

Statistical analyses

All statistical analyses and construction of heatmaps were carried out in RStudio v.2.15.0 using the

R CRAN packages: vegan (Oksanen et al., 2013) and RColorBrewer (Neuwirth, 2011). Tukey’s

honestly significant difference tests were used to statistically compare the datasets and principal

component analysis (PCA) was used to assess the variability between samples. Correlation analyses

were performed through linear regression of the relative abundance profiles and assessing the

respective R² values.
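For a simple linear regression with an intercept, R² equals the squared Pearson correlation between the two profiles. The sketch below computes it in plain Python rather than R, purely for illustration. The two profiles are relative abundances from Table 1, and the pairing is chosen only to show the calculation, not to reproduce a correlation reported in the study.

```python
def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined for a constant profile")
    return sxy ** 2 / (sxx * syy)


# Relative abundances (%) in AD1_T1, AD2_T1, AD3_T1, AD1_T2, AD2_T2, AD3_T2
# for two population genomes listed in Table 1 (illustrative pairing).
fibro_03 = [0.13, 0.45, 0.09, 5.83, 4.40, 12.69]
firm_11 = [0.00, 0.00, 0.00, 3.14, 2.73, 14.64]
print(f"R^2 = {r_squared(fibro_03, firm_11):.3f}")
```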

Acknowledgements


This study was supported by the Commonwealth Scientific and Industrial Research Organisation

(CSIRO) Flagship Cluster “Biotechnological solutions to Australia’s transport, energy and

greenhouse gas challenges”. IV acknowledges support from the University of Queensland

International Scholarship, and PJ acknowledges support from the Australian Meat Processor

Corporation (2013/4008 Technology Fellowship). KR acknowledges support by the European

Research Council (Starter Grant Electrotalk), and GWT was supported by an Australian Research

Council Queen Elizabeth fellowship (DP1093175). The authors would like to thank Serene Low at

the Australian Centre for Ecogenomics for the metagenome library preparation, Mike Imelfort for

assistance with the bioinformatics analysis, Donovan Parks for assistance with the genome quality

assessment and Philip Hugenholtz for providing comments on the manuscript.

Competing financial interests

The authors declare no competing financial interests.

References

Albertsen, M., Hugenholtz, P., Skarshewski, A., Nielsen, K.A., Tyson, G.W., and Nielsen, P.H.

(2013) Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of

multiple metagenomes. Nature Biotechnology 31: 533-538.

Allgaier, M., Reddy, A., Park, J.I., Ivanova, N.N., D'Haeseleer, P., Lowry, P. et al. (2010) Targeted

discovery of glycoside hydrolases from a switchgrass-adapted compost community. Plos One 5: 1-

9.

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990) Basic local alignment

search tool. Journal of Molecular Biology 215: 403-410.


Amani, T., Nosrati, M., and Sreekrishnan, T.R. (2010) Anaerobic digestion from the viewpoint of

microbiological, chemical, and operational aspects - a review. Environmental Reviews 18: 255-278.

Baker, B.J., Lazar, C.S., Teske, A.P., and Dick, G.J. (2015) Genomic resolution of linkages in

carbon, nitrogen, and sulfur cycling among widespread estuary sediment bacteria. Microbiome 3: 1-

12.

Bayer, E.A., Lamed, R., White, B.A., and Flint, H.J. (2008) From cellulosomes to cellulosomics.

The Chemical Record 8: 364-377.

Bekele, A.Z., Koike, S., and Kobayashi, Y. (2011) Phylogenetic diversity and dietary association of

rumen Treponema revealed using group-specific 16S rRNA gene-based analysis. FEMS

Microbiology Letters 316: 51-60.

Berlemont, R., and Martiny, A.C. (2013) Phylogenetic distribution of potential cellulases in

bacteria. Applied and Environmental Microbiology 79: 1545-1554.

Boetzer, M., Henkel, C.V., Jansen, H.J., Butler, D., and Pirovano, W. (2011) Scaffolding pre-

assembled contigs using SSPACE. Bioinformatics 27: 578-579.

Brown, C.T., Hug, L.A., Thomas, B.C., Sharon, I., Castelle, C.J., Singh, A. et al. (2015) Unusual

biology across a group comprising more than 15% of domain Bacteria. Nature: 1-18.

Caspi, R., Foerster, H., Fulcher, C.A., Kaipa, P., Krummenacker, M., Latendresse, M. et al. (2008)

The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of

pathway/genome databases. Nucleic Acids Research 36: 623-631.

Castelle, C.J., Wrighton, K.C., Thomas, B.C., Hug, L.A., Brown, C.T., Wilkins, M.J. et al. (2015)

Genomic expansion of domain Archaea highlights roles for organisms from new phyla in anaerobic

carbon cycling. Current Biology 25: 1-12.

Chertkov, O., Sikorski, J., Brambilla, E., Lapidus, A., Copeland, A., Glavina Del Rio, T. et al.

(2010) Complete genome sequence of Aminobacterium colombiense type strain (ALA-1T).

Standards in Genomic Sciences 2: 280-289.


Darling, A.E., Jospin, G., Lowe, E., Matsen, I.V., Bik, H.M., and Eisen, J.A. (2014) Phylosift:

phylogenetic analysis of genomes and metagenomes. PeerJ 2: 1-28.

DeSantis, T.Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E.L., Keller, K. et al. (2006)

Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB.

Applied and Environmental Microbiology 72: 5069-5072.

Finn, R.D., Clements, J., and Eddy, S.R. (2011) HMMER web server: interactive sequence

similarity searching. Nucleic Acids Research 39: 29-37.

Ganesan, A., Chaussonnerie, S., Tarrade, A., Dauga, C., Boucher, T., Pelletier, E. et al. (2008)

Cloacibacillus evryensis gen. nov., sp. nov., a novel asaccharolytic, mesophilic, amino-acid-

degrading bacterium within the phylum 'Synergistetes', isolated from an anaerobic sludge digester.

International Journal of Systematic and Evolutionary Microbiology 58: 2003-2012.

Hanaia, W.B., Postec, A., Aullo, T., Ranchou-Peyruse, A., Erauso, G., Brochier-Armanet, C. et al.

(2013) Mesotoga infera sp. nov., a mesophilic member of the order Thermotogales, isolated from an

underground gas storage aquifer. International Journal of Systematic and Evolutionary

Microbiology 63: 3003-3008.

Hanreich, A., Schimpf, U., Zakrzewski, M., Schluter, A., Benndorf, D., Heyer, R. et al. (2013)

Metagenome and metaproteome analyses of microbial communities in mesophilic biogas-producing

anaerobic batch fermentations indicate concerted plant carbohydrate degradation. Systematic and

Applied Microbiology 36: 330-338.

Harmsen, H.J.M., Van Kuijk, B.L.M., Plugge, C.M., Akkermans, A.D.L., De Vos, W.M., and

Stams, A.J.M. (1998) Syntrophobacter fumaroxidans sp. nov., a syntrophic propionate-degrading

sulfate-reducing bacterium. International Journal of Systematic Bacteriology 48: 1383-1387.

Haroon, M.F., Hu, S., Shi, Y., Imelfort, M., Keller, J., Hugenholtz, P. et al. (2013) Anaerobic

oxidation of methane coupled to nitrate reduction in a novel archaeal lineage. Nature 500: 567-570.


Imelfort, M., Parks, D.H., Woodcroft, B.J., Dennis, P.D., Hugenholtz, P., and Tyson, G.W. (2014)

GroopM: an automated tool for the recovery of population genomes from related metagenomes.

PeerJ 2: e603.

Jaenicke, S., Ander, C., Bekel, T., Bisdorf, R., Droge, M., Gartemann, K.-H. et al. (2010)

Comparative and joint analysis of two metagenomic datasets from a biogas fermenter obtained by

454-pyrosequencing. Plos One 6: 1-15.

Kanehisa, M., and Goto, S. (2000) KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic

Acids Research 28: 27-30.

Kanehisa, M., Goto, S., Sato, Y., Kawashima, M., Furumichi, M., and Tanabe, M. (2014) Data,

information, knowledge and principle: back to metabolism in KEGG. Nucleic Acids Research 42:

199-205.

Kaoutari, A.E., Armougom, F., Gordon, J.I., Raoult, D., and Henrissat, B. (2013) The abundance

and variety of carbohydrate-active enzymes in the human gut microbiota. Nature Reviews


Li, H. (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM.

arXiv:13033997v2 [q-bioGN].

Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N. et al. (2009) The sequence

alignment/map (SAM) format and SAMtools. Bioinformatics 25: 2078-2079.

Lombard, V., Golaconda Ramulu, H., Drula, E., Coutinho, P.M., and Henrissat, B. (2014) The

carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Research 42: 490-495.

Lynd, L.R., Weimer, P.J., van Zyl, W.H., and Pretorius, I.S. (2002) Microbial cellulose utilization:

Fundamentals and biotechnology. Microbiology and Molecular Biology Reviews 66: 506-577.

Markowitz, V.M., Chen, I.-M.A., Palaniappan, K., Chu, K., Szeto, E., Grechkin, Y. et al. (2012)

IMG: the integrated microbial genomes database and comparative analysis system. Nucleic Acids

Research 40: 115-122.


Martinez-Garcia, M., Brazel, D.M., Swan, B.K., Arnosti, C., Chain, P.S.G., Reitenga, K.G. et al.

(2012) Capturing single cell genomes of active polysaccharide degraders: An unexpected

contribution of Verrucomicrobia. Plos One 7: 1-11.

McInerney, M.J., Rohlin, L., Mouttaki, H., Kim, U., Krupp, R.S., Rios-Hernandez, L. et al. (2007)

The genome of Syntrophus aciditrophicus: Life at the thermodynamic limit of microbial growth.

PNAS 104: 7600-7605.

Naas, A.E., Mackenzie, A.K., Mravec, J., Schuckel, J., Willats, W.G.T., Eijsink, V.G.H., and Pope,

P.B. (2014) Do rumen Bacteroidetes utilize an alternative mechanism for cellulose degradation?

MBio 5: 1-6.

Nesbo, C.L., Bradman, D.M., Adebusuyi, A., Dlutek, M., Petrus, A.K., Foght, J. et al. (2012)

Mesotoga prima gen. nov., sp. nov., the first described mesophilic species of the Thermotogales.

Extremophiles 16: 387-393.

Neuwirth, E. (2011) RColorBrewer: ColorBrewer palettes.

Nobu, M.K., Narihiro, T., Rinke, C., Kamagata, Y., Tringe, S.G., Woyke, T., and Liu, W.-T. (2014)

Microbial dark matter ecogenomics reveals complex synergistic networks in a methanogenic

bioreactor. The ISME Journal: 1-13.

Oksanen, J., Blanchet, G., Kindt, R., Legendre, P., Minchin, P.R., O'Hara, R.B. et al. (2013) Vegan:

community ecology package.

Oremland, R.S., and Polcin, S. (1982) Methanogenesis and sulfate reduction: competitive and

noncompetitive substrates in estuarine sediments. Applied and Environmental Microbiology 44:

1270-1276.

Parks, D.H., Imelfort, M., Skennerton, C.T., Hugenholtz, P., and Tyson, G.W. (2015) CheckM:

assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes.

PeerJ PrePrints 2.


Pelletier, E., Kreimeyer, A., Bocs, S., Rouy, Z., Gyapay, G., Chouari, R. et al. (2008) "Candidatus

Cloacamonas Acidaminovorans": genome sequence reconstruction provides a first glimpse of a new

bacterial division. Journal of Bacteriology 190: 2572-2579.

Price, M.N., Dehal, P.S., and Arkin, A.P. (2009) FastTree: Computing large minimum-evolution

trees with profiles instead of a distance matrix. Molecular Biology and Evolution 26: 1641-1650.

Qiu, Y.-L., Hanada, S., Ohashi, A., Harada, H., Kamagata, Y., and Sekiguchi, Y. (2008)

Syntrophorhabdus aromaticivorans gen. nov., sp. nov., the first cultured anaerobe capable of

degrading phenol to acetate in obligate syntrophic associations with a hydrogenotrophic

methanogen. Applied and Environmental Microbiology 74: 2051-2058.

Raghoebarsing, A.A., Pol, A., van de Pas-Schoonen, K.T., Smolders, A.J.P., Ettwig, K.F., Rijpstra,

I.C. et al. (2006) A microbial consortium couples anaerobic methane oxidation to denitrification.

Nature 440: 918-921.

Rahman, N.A., Parks, D., Vanwonterghem, I., Morrison, M., Tyson, G.W., and Hugenholtz, P.

(2015) A phylogenomic analysis of the bacterial phylum Fibrobacteres. Frontiers in Microbiology.

Rotaru, A.-E., Shrestha, P.M., Liu, F., Shrestha, M., Shrestha, D., Embree, M. et al. (2014) A new

model for electron flow during anaerobic digestion: direct interspecies electron transfer to

Methanosaeta for the reduction of carbon dioxide to methane. Energy and Environmental Science 7:

408-415.

Seemann, T. (2014) Prokka: rapid prokaryotic genome annotation. Bioinformatics 30: 2068-2069.

Sekiguchi, Y., Ohashi, A., Parks, D.H., Yamauchi, T., Tyson, G.W., and Hugenholtz, P. (2015)

First genomic insights into members of a candidate bacterial phylum responsible for wastewater

bulking. PeerJ 3.

Smith, K.S., and Ingram-Smith, C. (2007) Methanosaeta, the forgotten methanogen? Trends in


Solli, L., Havelsrud, O.E., Horn, S.J., and Rike, A.G. (2014) A metagenomic study of the microbial

communities in four parallel biogas reactors. Biotechnology for Biofuels 7: 1-15.


Stolze, Y., Zakrzewski, M., Maus, I., Eikmeyer, F., Jaenicke, S., Rottmann, N. et al. (2015)

Comparative metagenomics of biogas-producing microbial communities from production-scale

biogas plants operating under wet or dry fermentation conditions. Biotechnology for Biofuels 8: 1-

18.

Suen, G., Weimer, P.J., Stevenson, D.M., Aylward, F.O., Boyum, J., Deneke, J. et al. (2011) The

complete genome sequence of Fibrobacter succinogenes S85 reveals a cellulolytic and metabolic

specialist. PLoS ONE 6: 1-15.

Tringe, S.G., and Rubin, E.M. (2005) Metagenomics: DNA sequencing of environmental samples.

Nature Reviews Genetics 6: 805-814.

Tringe, S.G., Von Mering, C., Kobayashi, A., Salamov, A.A., Chen, K., Chang, H.W. et al. (2005)

Comparative metagenomics of microbial communities. Science 308: 554-557.

Tucci, S., and Martin, W. (2007) A novel prokaryotic trans-2-enoyl-CoA reductase from the

spirochete Treponema denticola. FEBS Letters 581: 1561-1566.

Tveit, A., Schwacke, R., Svenning, M.M., and Urich, T. (2013) Organic carbon transformations in

high-Arctic peat soils: key functions and microorganisms. The ISME Journal 7: 299-311.

Tyson, G.W., Chapman, J., Hugenholtz, P., Allen, E.E., Ram, R.J., Richardson, P.M. et al. (2004)

Community structure and metabolism through reconstruction of microbial genomes from the

environment. Nature 428: 37-43.

Vanwonterghem, I., Jensen, P.D., Ho, D.P., Batstone, D.J., and Tyson, G.W. (2014a) Linking

microbial community structure, interactions and function in anaerobic digesters using new

molecular techniques. Current Opinion in Biotechnology 27: 55-64.

Vanwonterghem, I., Jensen, P.D., Dennis, P.G., Hugenholtz, P., Rabaey, K., and Tyson, G.W.

(2014b) Deterministic processes guide long-term synchronised population dynamics in replicate

anaerobic digesters. The ISME Journal: 1-14.

Venter, J.C., Remington, K., Heidelberg, J.F., Halpern, A.L., Rusch, D., Eisen, J.A. et al. (2004)

Environmental genome shotgun sequencing of the Sargasso sea. Science 304: 66-74.


Verhees, C.H., Kengen, S.W.M., Tuininga, J.E., Schut, G.J., Adams, M.W.W., De Vos, W.M., and

Van der Oost, J. (2003) The unique features of glycolytic pathways in Archaea. Biochemical Journal

375: 231-246.

Vital, M., Howe, A.C., and Tiedje, J.M. (2016) Revealing the bacterial butyrate synthesis pathways

by analyzing (meta)genomic data. MBio 5: 1-11.

Wong, M.T., Zhang, D., Li, J., Hui, R.K.H., Tun, H.M., Brar, M.S. et al. (2013) Towards a

metagenomic understanding on enhanced biomethane production from waste activated sludge after

pH 10 pretreatment. Biotechnology for Biofuels 6: 1-14.

Wrighton, K.C., Thomas, B.C., Sharon, I., Miller, C.S., Castelle, C.J., Verberkmoes, N.C. et al.

(2012) Fermentation, hydrogen, and sulfur metabolism in multiple uncultivated bacterial phyla.

Science 337: 1661-1665.

Wrighton, K.C., Castelle, C.J., Wilkins, M.J., Hug, L.A., Sharon, I., Thomas, B.C. et al. (2014)

Metabolic interdependencies between phylogenetically novel fermenters and respiratory organisms

in an unconfined aquifer. The ISME Journal 8: 1452-1463.

Yin, Y., Mao, X., Yang, J.C., Chen, X., Mao, F., and Xu, Y. (2012) dbCAN: a web resource for

automated carbohydrate-active enzyme annotation. Nucleic Acids Research 40: 445-451.

Figure legends

Fig. 1. Metagenome-based microbial community composition. The community profiles are shown

for AD1, AD2 and AD3 on Days 96 (T1) and 362 (T2) based on the 16S rRNA genes extracted

from the metagenomes and clustered at 97% sequence similarity. All populations present at >0.5%

relative abundance in at least one of the samples are shown. The taxonomic classification based on

the 16S rRNA gene is shown at the phylum level (left-hand side) and lowest level of taxonomic

assignment (c: class, o: order, f: family and g: genus; right-hand side).


Fig. 2. Phylogeny of the population genomes. Genome tree based on a concatenated set of marker

genes showing the phylogenetic affiliation of the 101 recovered population genomes from the

anaerobic digesters relative to 2,015 IMG genomes.

Fig. 3. Distribution of glycoside hydrolase (GH) families for 62 population genomes. The number

of open reading frames (ORFs) identified within each GH family is shown by the heatmap and GH

families are grouped by substrate activity. The phylum-level classification of the population

genomes is shown on the left-hand side of the panel.

Fig. 4. Metabolic network based on the functional classification of all populations present at >0.1%

relative abundance in at least one of the anaerobic digesters (AD1, AD2 and AD3) at Day 96. The

color of the edges corresponds to the substrate node and the thickness of the edges is representative

of the relative abundance of each population genome (average for the three reactors). The

percentages on the right-hand side of the panel show the fraction of the community (total relative

abundance) classified within each functional guild.

Fig. 5. Metabolic network based on the functional classification of all populations present at >0.1%

relative abundance in at least one of the anaerobic digesters (AD1, AD2 and AD3) at Day 362. The

color of the edges corresponds to the substrate node and the thickness of the edges is representative

of the relative abundance of each population genome (average for the three reactors). The

percentages on the right-hand side of the panel show the fraction of the community (total relative

abundance) classified within each functional guild.


Table

Table 1. Summary statistics (Compl.: completeness; Cont.: contamination) of 62 population genomes selected for metabolic analysis, which were most

complete and/or abundant in the reactors.

Bin_ID      Size  Scaffolds  Compl.  Cont.  GC    ORFs  Genome tree            Relative abundance (%)                          16S rRNA
            (Mb)  (#)        (%)     (%)    (%)   (#)   phylogeny              AD1_T1  AD2_T1  AD3_T1  AD1_T2  AD2_T2  AD3_T2  gene
Methan_01   2.6   1          99.4    0.0    59.1  2557  Euryarchaeota          0.00    0.00    0.00    10.13   19.67   5.79    +

Methan_04   2.3   235        85.3    2.9    53.4  2083  Euryarchaeota          0.03    0.04    0.49    0.03    0.11    0.12    -

Cren_01     1.8   151        89.0    0.9    58.0  1863  Crenarchaeota          0.00    0.00    0.00    0.23    0.05    0.28    +
Actino_01   3.6   1          99.4    0.0    73.9  3175  Actinobacteria         0.16    0.16    0.22    23.49   7.59    6.24    +
Actino_02   3.2   27         95.2    0.3    67.6  2823  Actinobacteria         3.19    4.26    5.16    0.31    0.21    0.19    -
Alpha_01    3.9   55         99.0    0.5    68.1  3789  Alphaproteobacteria    0.00    0.00    0.00    2.18    0.28    0.26    +
Alpha_02    4.4   182        92.1    6.5    67.6  3768  Alphaproteobacteria    0.00    0.00    0.00    0.65    0.67    0.63    -
Alpha_03    6.4   53         98.9    8.5    60.6  5964  Alphaproteobacteria    0.00    0.00    0.00    1.93    1.01    1.22    +
Alpha_05    3.5   628        80.7    3.9    66.9  3160  Alphaproteobacteria    0.27    0.28    0.37    0.01    0.01    0.01    -
Bact_02     4.5   99         90.9    4.1    41.5  3385  Bacteroidetes          0.76    1.11    0.92    0.02    0.01    0.01    +

Bact_23     4.3   200        94.9    1.7    48.3  3135  Bacteroidetes          0.00    0.00    0.00    0.00    0.00    2.22    -

Beta_02     3.4   139        83.0    2.8    63.7  2793  Betaproteobacteria     0.20    0.17    0.23    0.16    0.12    0.11    -
WWEI_01     2.0   130        91.3    6.0    36.4  1532  WWEI                   0.20    0.06    0.60    0.00    0.00    0.00    -
Clorobi_01  2.3   3          99.5    0.8    56.2  2119  Chlorobi               0.01    0.00    0.00    1.77    2.78    3.01    +
Chloro_02   2.6   237        70.8    0.2    52.5  1804  Chloroflexi            0.03    0.00    0.00    0.05    0.37    0.06    -
Deferri_01  2.9   14         98.2    0.9    44.4  2626  Deferribacterales      1.06    1.27    1.01    0.00    0.00    0.00    +
Delta_01    4.9   267        88.8    4.4    59.7  3909  Deltaproteobacteria    0.00    0.02    0.00    0.37    0.86    0.27    +
Delta_02    4.4   138        92.6    4.9    56.8  3832  Deltaproteobacteria    0.00    0.00    0.00    0.32    0.36    0.90    -
Delta_03    5.1   106        69.0    5.8    61.5  3387  Deltaproteobacteria    0.00    0.00    0.00    1.60    1.34    1.28    +
Epsilon_01  2.7   26         100.0   0.8    43.9  2690  Epsilonproteobacteria  1.42    1.46    1.19    0.01    0.03    0.01    -
Fibro_01    2.9   50         98.9    2.2    37.4  2362  Fibrobacteres          0.87    6.26    0.67    0.00    0.00    0.00    -
Fibro_02    3.5   122        93.1    0.7    51.4  2764  Fibrobacteres          4.00    4.16    2.62    0.05    0.01    0.02    +
Fibro_03    4.1   11         89.4    2.3    50.2  2968  Fibrobacteres          0.13    0.45    0.09    5.83    4.40    12.69   +
Firm_03     2.7   684        89.0    3.4    39.5  2573  Firmicutes             0.00    0.05    0.00    0.85    0.93    0.20    +
Firm_04     4.2   53         83.9    2.7    54.1  2959  Firmicutes             9.18    0.00    3.83    0.00    0.00    0.00    +
Firm_05     3.2   265        92.9    1.8    45.7  2806  Firmicutes             0.00    7.16    3.75    0.00    0.08    0.00    +
Firm_06     4.2   215        99.3    3.4    44.5  3763  Firmicutes             0.00    3.09    0.73    0.00    0.00    0.00    -
Firm_10     3.3   152        84.9    1.6    62.5  2482  Firmicutes             0.26    0.09    0.09    0.36    0.06    0.06    +
Firm_11     3.5   3          99.2    0.3    55.2  2847  Firmicutes             0.00    0.00    0.00    3.14    2.73    14.64   +
Firm_13     2.0   67         85.1    0.4    51.6  1514  Firmicutes             0.28    0.72    0.19    0.00    0.00    0.00    -
Firm_14     3.1   45         100.0   0.7    46.3  2824  Firmicutes             0.00    0.00    4.61    0.00    0.00    0.00    +
Firm_16     3.8   117        98.6    4.6    49.3  3097  Firmicutes             12.81   4.44    0.03    0.03    0.03    0.00    -
Lenti_01    3.9   118        70.3    0.4    60.9  2774  Lentisphaerae          2.43    2.08    2.84    2.99    3.63    1.54    +
Lenti_02    6.0   662        82.7    4.1    67.3  4114  Lentisphaerae          0.11    0.08    0.53    0.22    0.79    0.14    +
Planc_01    5.7   162        100.0   1.1    62.8  4512  Planctomycetes         0.00    0.00    0.00    0.40    1.79    1.14    +
Spiro_02    2.6   74         98.9    2.3    55.0  2311  Spirochaetes           2.93    1.23    0.07    0.46    0.19    0.01    -

Spiro_10    3.0   11         97.9    0.0    61.8  2527  Spirochaetes           0.33    0.36    1.29    6.10    2.61    1.55    +
Spiro_12    2.4   139        94.9    0.0    57.2  2129  Spirochaetes           0.95    0.86    0.75    2.64    0.50    0.79    +
Syner_01    3.7   683        83.8    5.7    58.9  3280  Synergistetes          0.01    0.01    0.00    0.44    1.26    0.86    +
Syner_03    1.9   218        100.0   2.4    52.0  1862  Synergistetes          0.48    0.79    1.44    0.32    1.35    1.85    +
Thermo_01   2.8   76         94.4    0.3    48.6  2472  Thermotogae            0.06    0.37    0.31    0.22    1.81    1.17    +
Thermo_02   3.5   643        93.8    1.9    47.0  3159  Thermotogae            0.00    0.00    0.00    0.33    0.27    0.01    -
Verruco_01  2.7   7          95.4    1.3    63.0  2261  Verrucomicrobia        0.00    0.01    0.19    5.88    0.51    0.00    +
Verruco_02  2.9   33         94.6    2.0    58.8  2263  Verrucomicrobia        0.00    0.00    0.06    0.01    0.65    0.00    +



Wiley-Blackwell and Society for Applied Microbiology


[Figure: tree of the population genomes placed among reference taxa; the image and its caption are not recoverable from the extracted text. Tip labels comprise reference organisms and lineages (prefixes c_, o_, f_ and g_ denote class, order, family and genus) and the population genome series Cren_, Methan_, Thermo_, Syner_, Chloro_, Actino_, Tener_, Firm_, Spiro_, Planc_, Lenti_, Verruco_, Fibro_, WWE1_, Chlorobi_, Bact_, Epsilon_, Deferri_, Delta_, Beta_ and Alpha_. Legend: Proteobacteria, Deferribacteres, Epsilonproteobacteria, Bacteroidetes, Chlorobi, Fibrobacteres, WWE1, Verrucomicrobia, Lentisphaerae, Planctomycetes, Spirochaetes, Firmicutes, Tenericutes, Actinobacteria, Chloroflexi, Synergistetes, Thermotogae, Euryarchaeota, Crenarchaeota. Scale bar: 0.001.]




Distribution of glycoside hydrolase (GH) families for 62 population genomes. The number of open reading frames (ORFs) identified within each GH family is shown by the heatmap and GH families are grouped by substrate activity. The phylum-level classification of the population genomes is shown on the left-hand side of the panel.




Metabolic network based on the functional classification of all populations present at >0.1% relative abundance in at least one of the anaerobic digesters (AD1, AD2 and AD3) at Day 96. The color of the edges corresponds to the substrate node and the thickness of the edges is representative of the relative abundance of each population genome (average for the three reactors). The percentages on the right-hand side of the panel show the fraction of the community (total relative abundance) classified within each functional guild.




Metabolic network based on the functional classification of all populations present at >0.1% relative abundance in at least one of the anaerobic digesters (AD1, AD2 and AD3) at Day 362. The color of the edges corresponds to the substrate node and the thickness of the edges is representative of the relative abundance of each population genome (average for the three reactors). The percentages on the right-hand side of the panel show the fraction of the community (total relative abundance) classified within each functional guild.
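The selection and weighting rule stated in the metabolic network captions (populations above 0.1% relative abundance in at least one reactor, edge thickness taken from the average over the three reactors) can be illustrated with a minimal sketch. The function name `edge_weights`, the population labels and the abundance values are hypothetical and chosen only for illustration; this is not the authors' pipeline.

```python
from statistics import mean

def edge_weights(abundance, threshold=0.1):
    """Keep populations exceeding `threshold` (% relative abundance) in at
    least one reactor and return their mean abundance across reactors."""
    return {
        name: mean(values)
        for name, values in abundance.items()
        if max(values) > threshold
    }

# Hypothetical relative abundances (%) for AD1, AD2 and AD3 at one time point.
example = {
    "pop_A": [0.48, 0.79, 1.44],
    "pop_B": [0.00, 0.01, 0.19],
    "pop_C": [0.00, 0.00, 0.06],  # never above 0.1%, so it is excluded
}

weights = edge_weights(example)
for name, weight in sorted(weights.items()):
    print(f"{name}: mean relative abundance {weight:.2f}%")
```

Under these assumptions, a population that is rare in two reactors but exceeds the threshold in the third is still drawn, with a thin edge reflecting its low average.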



