Table S1, Mustoe et al.

Table S1. UTR and IGR motifs identified from SHAPE-directed motif search, related to Figure 6. Motifs were identified as regions with median SHAPE reactivity below 0.3 (corresponding to the reactivity expected of stably paired nucleotides) and a base-pairing entropy cutoff of 0.04 (corresponding to a pairing probability of 0.9). For each motif, the cell-free SHAPE-directed minimum free energy structure is shown on the left. Unless the motif significantly overlaps a neighboring coding sequence, the consensus secondary structure is shown center. The phylogenetic distribution of homologs is shown right, listing the percentage of species that the motif is found in for each class. The key for SHAPE and consensus diagrams is located in Figure 6. Motifs are grouped according to level of functional evidence.
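The selection rule described in the legend can be illustrated with a minimal sketch. It assumes per-nucleotide SHAPE reactivities and base-pairing entropies are already available as equal-length lists with no missing values. The function name, the sliding-window scan, the window length, and the use of a median for entropy are illustrative assumptions, not the authors' implementation; only the two cutoffs (0.3 and 0.04) come from the legend. As a rough check on the stated correspondence, if the entropy contribution of a pairing probability p is taken as -p·log10(p) (also an assumption), p = 0.9 gives about 0.041.

```python
from statistics import median


def find_low_shape_low_entropy(shape, entropy, window=25,
                               shape_cutoff=0.3, entropy_cutoff=0.04):
    """Return merged (start, end) ranges (0-based, end exclusive) in which
    every contributing window has median SHAPE and median entropy below
    the cutoffs. The window length is a hypothetical parameter."""
    if len(shape) != len(entropy):
        raise ValueError("shape and entropy must have the same length")
    hits = []
    for start in range(0, len(shape) - window + 1):
        end = start + window
        low_shape = median(shape[start:end]) < shape_cutoff
        low_entropy = median(entropy[start:end]) < entropy_cutoff
        if low_shape and low_entropy:
            if hits and start <= hits[-1][1]:
                # Overlapping or adjacent window: extend the previous region.
                hits[-1] = (hits[-1][0], end)
            else:
                hits.append((start, end))
    return hits


# Toy profile: a 30-nt well-structured stretch flanked by reactive regions.
toy_shape = [1.0] * 10 + [0.1] * 30 + [1.0] * 10
toy_entropy = [0.5] * 10 + [0.01] * 30 + [0.5] * 10
regions = find_low_shape_low_entropy(toy_shape, toy_entropy, window=10)
```

Because medians are used, a reported region can extend a few nucleotides past the truly structured stretch; any real analysis would need to trim or inspect region boundaries.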

Known functional structures

5′ UTR rpsB (+) 189716-189835

Comparative genomics predicted locus (Ott et al., 2012; Rivas et al., 2001; Uzilov et al., 2006)

The two 3′ stems of this motif recapitulate the established S2-binding ribosomal autoregulatory element (Fu et al., 2013). In the functional S2-bound structure, the highly conserved GGGU bulge of the 3′ stem forms a pseudoknot with downstream nucleotides (labeled PK); pseudoknots are not detectable via our structure modeling strategy. We also detect two additional stems located 5′ to the functional 3′ stem. Consistent with prior phylogenetic analyses (Fu et al., 2013), these additional stems are not generally conserved.

5′ UTR rpsA (+) 961128-961189

Comparative genomics predicted locus (Ott et al., 2012)

This structure recapitulates the S1-binding ribosomal autoregulatory element (Fu et al., 2013). In the phylogenetic-derived structure, an additional stem forms downstream that sequesters the Shine-Dalgarno sequence (Fig. S6). Both our SHAPE data and structure models indicate that this stem is unstable in the absence of S1 ligand.

5′ UTR rpmI (-) 1798057-1798129

Gammaproteobacteria: Broadly conserved across all orders

Gammaproteobacteria: Broadly conserved across all orders

Comparative genomics predicted locus (Livny et al., 2008; Rivas and Eddy, 2001; Rivas et al., 2001; Uzilov et al., 2006; Washietl et al., 2005)

rpmI (encoding ribosomal protein L35) is preceded by a highly structured 430 nt UTR that contains two L20-binding ribosomal autoregulatory elements (Fu et al., 2013). The majority of the structure overlaps the infC coding sequence. For ease of analysis, we split the UTR into two well-defined sub-motifs. This structure recapitulates the proximal L20-binding ribosomal autoregulatory element.

5′ UTR rpmI (-) 1798206-1798425

This is the second sub-motif of the rpmI 5′ UTR. This structure largely recapitulates the previously derived model of the upstream L20-binding site in the rpmI leader (Chiaruttini et al., 1996). Several short hairpins of the previous model are unstable according to our cell-free and in-cell SHAPE data. In the functional L20-bound structure, a long-range pseudoknot (PK) is formed between the third apical loop and a region >280 nt upstream (Guillier et al., 2005). Our SHAPE data indicate that this long-range pseudoknot is unstable in the absence of protein (Fig. S6).

Intergenic [iscR, iscS] (-) 2659588-2659651

Gammaproteobacteria: Broadly conserved across all orders

Motif overlaps infC coding sequence

Comparative genomics predicted locus (Ott et al., 2012; Uzilov et al., 2006; Washietl et al., 2005)

This structure recapitulates the known iscR stability element (Desnoyers et al., 2009; Nawrocki et al., 2015). This element protects the upstream iscR cistron (iron-sulfur cluster regulator) from being degraded during small RNA targeted degradation of 3′ cistrons, mediating differential iscR expression.

Intergenic [rpmJ, rpsM] (-) 3440516-3440611

Comparative genomics predicted locus (Ott et al., 2012; Rivas and Eddy, 2001; Rivas et al., 2001; Uzilov et al., 2006; Washietl et al., 2005)

This structure largely recapitulates the known S4-binding ribosomal protein autoregulatory element (Fu et al., 2013). In the S4-bound structure, three pseudoknotted interactions (labeled PK1, PK2, and PK3) form between the top of the 3′ stem and the coding region, inhibiting translation initiation. Our structure modeling strategy ignores pseudoknots and thus cannot predict these PK interactions. In agreement with prior studies (Schlax et al., 2001), both cell-free and in-cell SHAPE data suggest that PK1, PK2, and PK3 are incompletely formed in the absence of S4 ligand (Fig. S6). Interestingly, the 5′-most hairpin identified here is generally not included as part of the functional S4 motif. This hairpin is located immediately downstream of rpmJ and is presumed to be a terminator (Salgado et al., 2013), although both 3′-end mapping experiments (Conway et al., 2014) and our own read depths suggest that read-through is prevalent. Given its high conservation, the 5′-most hairpin may play a role in S4 regulation.

5′ UTR rpsJ (-) 3451328-3451444

Gammaproteobacteria: Enterobacteriales (38%)

Gammaproteobacteria: Broadly conserved across all orders

Comparative genomics predicted locus (Ott et al., 2012; Rivas and Eddy, 2001; Rivas et al., 2001; Uzilov et al., 2006; Washietl et al., 2005)

This structure recapitulates the majority of the known L4-binding ribosomal protein autoregulatory element (Fu et al., 2013). The accepted phylogenetic structure includes two additional stems, an HA stem that is 5′ to HB, and an HG stem that is 5′ to HE. Additionally, HE is extended by several additional base pairs (Fig. S6). HA does not appear to be functional, while the base of HE and the Shine-Dalgarno-sequestering HG are thought to play an important role in translation repression (Fu et al., 2013; Zengel and Lindahl, 1996). Our data and structure models support the stable folding of HA, but this stem is located too close to the transcript boundary for detection. SHAPE data and our structure models indicate that the proposed base of HE and HG are unstable, consistent with prior structure probing experiments (Fig. S6) (Shen et al., 1988).

Intergenic [rpsL, rpsG] (-) 3472102-3472200

This motif overlaps the S7-binding ribosomal protein autoregulatory element. Our structural model deviates somewhat from the accepted functional structure (Fu et al., 2013). Helices H1, H2, and H5 are present in both the accepted structure and in our model. However, in the accepted structure, helices H3 and H4 form in place of H3*. H3 contains two GA pairs, making prediction of this helix particularly challenging. The high SHAPE reactivities in H3 and H3 regions also suggest that the accepted structure is unstable. Thus, in agreement with prior studies (Saito and Nomura, 1994), we speculate that the accepted structure only stably forms when bound by S7.

We note that despite deviations from the accepted structure, our homolog search identified a comparable number of homologs as previous searches (Fu et al., 2013). The functional structure is apparent from the pattern of highly conserved nucleotides in the sequence consensus. Nevertheless, given its apparent (weaker) conservation, we speculate that the alternative structure identified here may play some functional role.

Gammaproteobacteria: Enterobacteriales (100%), Pasteurellales (100%), Orbales (100%), Vibrionales (10%)

Gammaproteobacteria: Broadly conserved across all orders

5′ UTR cspA (+) 3717998-3718062

Comparative genomics predicted locus (Ott et al., 2012)

This motif recapitulates a substructure of the cspA (cold shock protein A) thermoregulator (Nawrocki et al., 2015). The full cspA motif stretches over 300 nts, extending substantially into the CDS, and functions by adopting different conformations at low and high temperatures, modulating cspA translation (Giuliodori et al., 2010). Consistent with the hypothesis that the cspA region can adopt multiple structures, our structural model indicates that most of the mRNA has high base pairing entropy. While the complete cspA motif is widely distributed among bacteria (Nawrocki et al., 2015), our homology search indicates that this minimal segment is less conserved.

Intergenic [rplA, rplJ] (+) 4177807-4177996

Comparative genomics predicted locus (Ott et al., 2012; Uzilov et al., 2006; Washietl et al., 2005)

This motif overlaps the L10(L12)4-binding ribosomal protein autoregulatory element (Fu et al., 2013; Iben and Draper, 2008). L10(L12)4 binds to a highly conserved kink-turn (KT) motif at the base of H2, which requires formation of the KT stem. In agreement with prior structural studies (Climie and Friesen, 1988), our SHAPE data indicate that the motif adopts the alternative structure shown here in the absence of the L10(L12)4 ligand. Indeed, structural changes associated with L10(L12)4 binding have been implicated as responsible for the repression of rplJ translation (Christensen et al., 1984). Alternative pairings of the KT stem also appear to be widely conserved (Iben and Draper, 2008).

Gammaproteobacteria: Enterobacteriales (28%)

Distributed among multiple bacterial phyla

Functionally uncharacterized structures in known functional loci

5′ UTR rpsT (-) 21107-21152

Comparative genomics predicted locus (Ott et al., 2012; Uzilov et al., 2006)

Past studies have provided strong evidence that rpsT (encoding ribosomal protein S20) is autogenously regulated at the translational level (Parsons and Mackie, 1983; Parsons et al., 1988; Wirth et al., 1982), but the responsible S20-binding motif has remained unknown. We identified this highly conserved hairpin motif 29 nts upstream of the rpsT start codon. Analysis of the ribosome crystal structure revealed that S20 binds an imperfect helix on the 16S rRNA, although otherwise there is little sequence homology between this motif and the 16S rRNA binding site. We attempted but were unable to purify S20 to validate binding. The high conservation of this motif across multiple bacterial orders is consistent with this motif contributing to S20 autoregulation.

rpsT has two promoters (Mackie and Parsons, 1983). Curiously, a previous study observed autoregulation of the proximal isoform of the rpsT transcript (Parsons et al., 1988), which lacks the conserved motif identified here. However, the proximal isoform exhibits significantly less autogenous translational repression than observed in mixtures of both rpsT isoforms (Parsons and Mackie, 1983). The proximal isoform also does not detectably bind S20 (Donly and Mackie, 1988). We thus speculate that S20 primarily binds and represses translation of the distal rpsT isoform that contains the motif shown here. Additional downstream sequences likely participate in the repression mechanism and weakly interact with S20, explaining the weak S20-mediated repression of the proximal isoform.

5′ UTR oppA (+) 1299142-1299178

Comparative genomics predicted locus (Ott et al., 2012)

We identified this dual hairpin motif 26 nts upstream of the oppA gene, which encodes a subunit of the oligopeptide ABC transporter. Previous studies have shown that oppA is negatively regulated by the small RNA GcvB, which interacts with the oppA ribosome binding site immediately adjacent to this hairpin motif (Pulvermacher et al., 2008; Sharma et al., 2007). Other studies have found that oppA translation is stimulated by spermidine, which was proposed to be due to destabilization of the oppA mRNA structure (Higashi et al., 2008). However, our SHAPE data do not support the structural model proposed by this study, and we suggest spermidine stimulation could be explained by changes in GcvB regulation. This motif is only found among the very closely related species E. coli, S. dysenteriae, and S. flexneri, with 100% sequence conservation, and hence lacks significant evolutionary support.

Gammaproteobacteria: Enterobacteriales (100%), Vibrionales (80%)

Gammaproteobacteria: Enterobacteriales (9%)

Intergenic [alaS, csrA] (-) 2817194-2817335

Comparative genomics predicted locus (Ott et al., 2012; Rivas and Eddy, 2001; Uzilov et al., 2006; Washietl et al., 2005)

This motif is located in a known autoregulatory sequence upstream of csrA (Yakhnin et al., 2011). The csrA gene product (carbon storage regulator; CsrA) is an mRNA-binding protein that globally regulates genes involved in carbon metabolism (Romeo et al., 2013). Previous experiments mapped multiple CsrA binding sites to this intergenic region (shown by blue line in figure), although these experiments used a truncated synthetic transcript unable to form the structure shown above (Yakhnin et al., 2011). In our identified structure, these binding sites are largely paired, deviating from the known preference for CsrA to bind GGA motifs in a hairpin loop context (Dubey et al., 2005).

csrA is transcribed as an alaS-csrA polycistronic transcript, from several minor promoters upstream of the motif shown here, and from the P3 promoter denoted above (Conway et al., 2014; Yakhnin et al., 2011). Under exponential growth conditions, the alaS-csrA read-through product and P3 csrA isoform are expressed at roughly equivalent amounts. Upon transition to stationary phase, the P3 isoform is significantly upregulated. We thus hypothesized that the different structures of the “full-length” versus truncated “P3” UTRs would modulate CsrA binding affinity, thereby tuning the autoregulatory circuit for different growth conditions. However, EMSAs indicated that both UTR isoforms bound CsrA with almost identical affinity (not shown). The observed KD75 was 3x higher than measured previously for a different version of the csrA UTR (Yakhnin et al., 2011), and binding was abnormally cooperative (Hill coefficient = 4), suggesting that a substantial fraction of our CsrA protein was inactive. Both UTR isoforms also exhibited significant structural heterogeneity. Thus, we cannot rule out that the different csrA UTR isoforms bind CsrA differently.
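To make the cooperativity remark concrete, the sketch below evaluates the standard Hill equation, fraction bound = [P]^n / (KD^n + [P]^n), for a non-cooperative case (n = 1) and for the Hill coefficient of 4 mentioned above. The function name and the concentration and KD values are hypothetical placeholders chosen for illustration; they are not the measured values, and this is not the fitting procedure used for the EMSAs.

```python
def hill_fraction_bound(protein_conc, kd, hill_n=1.0):
    """Fraction of RNA bound at a given protein concentration according to
    the Hill equation. protein_conc and kd must share the same units."""
    if protein_conc < 0 or kd <= 0:
        raise ValueError("protein_conc must be >= 0 and kd must be > 0")
    scaled = protein_conc ** hill_n
    return scaled / (kd ** hill_n + scaled)


# Hypothetical values: KD = 10 (arbitrary units), protein at 2x KD.
noncooperative = hill_fraction_bound(20.0, 10.0, hill_n=1.0)  # 20/30
cooperative = hill_fraction_bound(20.0, 10.0, hill_n=4.0)     # 16/17
at_kd = hill_fraction_bound(10.0, 10.0, hill_n=4.0)           # half-bound at KD
```

With n = 4 the transition from unbound to bound occurs over a much narrower concentration range than with n = 1, which is the kind of steep titration the text describes as abnormally cooperative.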

Our homolog search returned specific matches upstream of csrA in >90% of enterobacteria, but both structure and sequence were poorly conserved, indicating that the covariation model lost specificity during training. We thus report the results of using our SHAPE-derived structure to directly search for homologs without training.

5′ UTR rpoS (-) 2865600-2865661

Gammaproteobacteria: Enterobacteriales (53%)

Comparative genomics predicted locus (Ott et al., 2012)

rpoS encodes the alternative RNA polymerase sigma factor σS, which is a major regulator of the stress response in E. coli. Regulation of rpoS is extremely complex, involving numerous transcriptional and post-transcriptional regulatory processes. Of particular note, at least 4 different sRNAs bind to the 567 nt long rpoS 5′ UTR to alter transcript stability or inhibit or promote translation (Keseler et al., 2013). Multiple proteins have also been suggested to post-transcriptionally regulate rpoS by binding to the 5′ UTR.

Several studies have characterized the structure of the rpoS 5′ UTR in vitro, finding that it was highly structured (Peng et al., 2014a; 2014b). The most important functional feature of the structure is an inhibitory stem formed between sRNA binding sites in the 5′ UTR and the rpoS ribosome binding site (RBS). This stem constitutively inhibits translation until sRNA binding releases the RBS. Interestingly, in contrast to prior studies, our SHAPE data and structure models indicate that the inhibitory stem is only partially formed under cell-free conditions and unstable in cells (not shown). This is likely due to the increased temperature of our experiments (37 °C compared to 25 °C of prior experiments), as well as potential structural rearrangements effected by sRNA binding. Other structures previously identified upstream of the inhibitory stem also appeared to be less well-defined, but our search identified several well-defined motifs that recapitulate portions of previously identified structures.

The motif shown here corresponds to two hairpins that separate the sRNA binding site from the rpoS RBS. These hairpins do not have a well-defined function, but mutating these elements perturbs binding by the RNA chaperone Hfq and reduces translation of rpoS (Peng et al., 2014b).

5′ UTR rpoS (-) 2865927-2865971

This second of three well-structured motifs identified in the 5′ UTR rpoS corresponds to a previously identified hairpin of unknown function (Peng et al., 2014b).

5′ UTR rpoS (-) 2866048-2866126

Gammaproteobacteria: Enterobacteriales (97%)

Overlaps nlpD coding sequence

This third of three well-structured motifs identified in the 5′ UTR rpoS corresponds to a previously identified hairpin of unknown function (Peng et al., 2014b).

Intergenic [pgk, fbaA] (-) 3069427-3069510

We identified two extended hairpins in the pgk-fbaA IGR: the hairpin shown here, and a 3′ hairpin that is a known ERIC element (not shown) (Wilson and Sharp, 2006). These hairpins include two RNase E cleavage sites previously implicated in post-transcriptional regulation of pgk and fbaA expression (CS1 and CS2) (Bardey et al., 2005). The proteins encoded by pgk (phosphoglycerate kinase) and fbaA (class II fructose bisphosphate aldolase) are key carbon metabolism enzymes (Keseler et al., 2013). Both genes are co-transcribed from promoters upstream of pgk and then subsequently cleaved by RNase E at specific intergenic sites (Bardey et al., 2005). One of these cleavage sites is immediately 3′ of the pgk stop codon and is thought to cause degradation of pgk, controlling differential expression of the two genes (Bardey et al., 2005). Previous structure probing experiments identified the hairpin labeled H2 upstream of cleavage site CS1 (Bardey et al., 2005). However, different structures were proposed in place of H1, placing CS2 in a substantially different structural context. In contrast to the prior model, which was not conserved in other enterobacteria, our structural model is conserved across multiple species.

Notably, our structure provides compelling support for conserved RNase E regulation of pgk and fbaA. RNase E recognition sequences are conserved at CS1 and CS2, and both are positioned in conserved single-stranded loops (McDowall et al., 1994). As CS1 includes part of the pgk coding sequence, usage of this site likely leads to efficient degradation of pgk. By contrast, H2 and the upstream ERIC element stabilize fbaA. Alternative cleavage at CS2, which is located between H1 and the ERIC element, likely generates two stable mRNA products, providing precise control of the expression of these two important genes.

Our homolog search returned matches upstream of fbaA in 69% of enterobacteria, but both structure and sequence were poorly conserved, indicating that the covariation model lost specificity during the training process. We thus report the results of using our SHAPE-derived structure to directly search for homologs without training.

Gammaproteobacteria: Enterobacteriales (31%)

Overlaps nlpD coding sequence

5′ UTR rplM (-) 3376701-3376819

Comparative genomics predicted locus (Ott et al., 2012; Rivas et al., 2001)

Our low-SHAPE/low-entropy algorithm identified the helix labeled H3 in the 5′ UTR of the rplM-rpsI transcript, which encodes ribosomal proteins L13 and S9, respectively. Visual inspection of our structure models revealed upstream helices H1 and H2, which were well-structured under cell-free conditions but poorly structured under in-cell conditions. Homology searches revealed that H1-H3 were highly conserved, and hence we expanded the originally identified motif to include all three helices.

As discussed in the main text, a recent study reported that L13 translationally represses the rplM-rpsI operon in vivo (Aseev et al., 2016), but no binding motif was reported. EMSA experiments shown in main text Figure 6 validate that the 5′ UTR binds L13.

5′ UTR crp (+) 3484051-3484113

Comparative genomics predicted locus (Ott et al., 2012; Tran et al., 2009)

This dual hairpin motif was identified in the 5′ UTR of crp, 28 nts upstream of the crp start codon. crp encodes the cyclic-AMP receptor protein (CRP), which globally regulates transcription in response to carbon source availability. Previous studies have shown that crp is post-transcriptionally regulated by the protein CsrA, which binds to the crp 5′ UTR to modulate crp translation (Pannuri et al., 2016). CsrA is another global regulator of carbon metabolism, and hence this interaction helps integrate the CRP and CsrA carbon metabolism regulatory circuits.

While the exact binding site of CsrA has not been mapped, the well-conserved 5′ helix identified here overlaps one of several bioinformatically predicted CsrA binding sites (Pannuri et al., 2016). Corroborating that this 5′ helix is functional in CsrA binding, the homologous locus in Salmonella enterica was identified as a high-confidence binding site in CsrA pull-down experiments (Holmqvist et al., 2016). Interestingly, this motif does not conform to the preferred binding consensus of CsrA, which is a GGA motif located in a hairpin loop; the nearest GGA motif is stably paired within the 5′ stem. Thus, the role of this conserved structure in CsrA binding requires further study.

Intergenic [glmU, glmS] (-) 3911734-3911774

Gammaproteobacteria: Enterobacteriales (34%)

Gammaproteobacteria: Enterobacteriales (100%)

Comparative genomics predicted locus (Ott et al., 2012; Washietl et al., 2005)

The glmS gene, which encodes the enzyme L-glutamine:D-fructose-6-phosphate aminotransferase, has served as an archetype for understanding post-transcriptional regulation by sRNAs. glmS is cotranscribed with the upstream gene glmU and subsequently undergoes RNase E processing to yield a monocistronic glmS transcript (Kalamorz et al., 2007). The GlmZ sRNA binds upstream of the glmS start codon to both stabilize glmS and directly stimulate translation (Urban and Vogel, 2008). The mechanism for this translation stimulation is thought to derive from the glmS structure; structure predictions suggested that the GlmZ binding site is typically paired with the glmS ribosome binding site (RBS), and hence GlmZ binding would unmask the RBS (Kalamorz et al., 2007; Urban and Vogel, 2008).

Informed by our read-depths, we modeled this mRNA as the unprocessed dicistronic glmU-glmS transcript. While this is not the monocistronic glmS product that binds GlmZ, the glmU-glmS intergenic region is identical to the 5′ UTR of monocistronic glmS, and our structure models indicate that glmU does not interact with the IGR or the glmS coding sequence. In contrast to prior structure predictions, both our SHAPE data and structural models indicate that the glmS RBS is unstructured in all conditions. This corroborates observations from a previous in vitro SHAPE probing study, which indicated that the RBS was SHAPE reactive (Salim et al., 2012). Combined, these observations indicate that the mechanism for how GlmZ stimulates glmS translation may need revision. Nevertheless, we did identify two well-structured elements in the glmU-glmS intergenic region that, based on their conservation, are likely important for glmS regulation.

The dual hairpin shown here is located directly between two functional sequences important to GlmZ binding. Immediately upstream is an A-rich sequence that helps recruit the protein Hfq (Salim et al., 2012), and immediately downstream is the GlmZ binding site. Previous studies have observed these structures (Kalamorz et al., 2007; Salim et al., 2012), but neither their conservation nor function has been tested. Our homolog search revealed that these two hairpins are highly conserved in enterobacteria, with particularly strong sequence conservation of the 5′ helix. We propose that this helix helps bind and recruit Hfq.

Gammaproteobacteria: Enterobacteriales (81%)

Intergenic [glmU, glmS] (-) 3911797-3911828

Comparative genomics predicted locus (Washietl et al., 2005)

This is the second motif identified in the glmU-glmS intergenic region. This hairpin is located immediately downstream of the glmU stop codon and the RNase E processing site in the glmU-glmS intergenic region. This hairpin has been observed in prior studies (Kalamorz et al., 2007; Salim et al., 2012), but its significance has not been tested. Given its robust conservation, we suggest that this hairpin plays some role in RNase E processing and in stabilizing the resultant monocistronic glmS product.

Intergenic [trxA, rho] (+) 3964270-3964382

Comparative genomics predicted locus (Ott et al., 2012)

This structure is located approximately 60 nt upstream of rho, which encodes the essential transcription termination factor Rho (Keseler et al., 2013). Significantly, this structure overlaps two Rho-dependent termination sites (indicated by arrows) that autogenously regulate Rho expression (Matsumoto et al., 1986; Peters et al., 2009). The 5′-most hairpin also overlaps the putative rhoL open reading frame (Keseler et al., 2013).

The determinants of Rho-dependent termination remain poorly understood but appear to involve a complex interplay between sequence, secondary structure, polymerase elongation rate, and translation rate (Peters et al., 2011). The strong conservation of this structure upstream of enterobacterial rho genes suggests that it plays a role in autoregulation. We suggest two potential roles for these structures in Rho-concentration dependent termination of rho. First, these structures may contribute to a timing mechanism, where Rho has only a short time window to bind before hairpin folding inhibits binding, or translation of rhoL catches up to the paused polymerase, preventing termination. Second, these structures may directly interact with RNA polymerase to induce pausing, or directly interact with Rho to modulate termination efficiency (Hollands et al., 2014; Sevostyanova and Groisman, 2015).

Gammaproteobacteria: Enterobacteriales (41%)

Gammaproteobacteria: Enterobacteriales (31%)

Intergenic [rplL, rpoB] (+) 4178966-4179013

Comparative genomics predicted locus (Livny et al., 2008; Ott et al., 2012; Rivas and Eddy, 2001; Rivas et al., 2001; Uzilov et al., 2006; Washietl et al., 2005)

This dual hairpin motif is located in a long intergenic region that regulates expression of the downstream rpoBC gene cluster. rpoB and rpoC encode the essential RNA polymerase subunits β and β′, and are transcribed from promoters upstream of rplL (ribosomal protein L7) (Keseler et al., 2013). Prior studies have shown that the rplL-rpoB intergenic region directs three distinct (although potentially coupled) post-transcriptional regulatory mechanisms: (i) rpoBC mRNA expression is regulated by a transcriptional attenuator downstream of rplL, (ii) the rplL-rpoBC polycistronic mRNA is segmented by specific RNase III cleavage, and (iii) translation of rpoBC is autogenously regulated via interactions with RNA polymerase subunits β and β′ (Dennis, 1984; Downing and Dennis, 1987; Passador and Linn, 1989).

The two hairpins shown here are located immediately downstream of rplL, and the 3′ hairpin corresponds to the transcriptional attenuator described by prior studies (Barry et al., 1980). Given that rpoB-rpoC does not have an independent promoter, this attenuator effectively controls rpoB-rpoC expression. Remarkably, the read-through ratio of this attenuator changes in response to the concentration of RNA polymerase holoenzyme, thereby autogenously regulating transcription of the rpoB-rpoC operon (Dykxhoorn et al., 1996; Steward and Linn, 1992). The mechanism underlying this transcriptional autoregulation is unknown, but the specific conservation of this structural motif upstream of rpoB across multiple bacterial phyla indicates the uniqueness and importance of this motif in regulation of RNA polymerase.

We note that this motif is annotated in RFAM as Pseudomonas sRNA P26 (Nawrocki et al., 2015). This annotation is based on a bioinformatics prediction, with Northern analysis used to confirm expression (Livny et al., 2006). To our knowledge, function as an sRNA has not been validated. As this motif is expressed as part of the rplL-rpoB transcript, Northern analysis is insufficient to confirm function as an sRNA. We believe it is most likely that this motif functions as a cis-regulator of rpoB-rpoC transcription.
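Each entry in this table is headed by a line giving the region type, the associated gene or flanking gene pair, the strand, and the genomic coordinates, for example "Intergenic [rplL, rpoB] (+) 4178966-4179013". For readers who want to work with these entries programmatically, the following is a minimal sketch of how such header lines could be parsed. The names MotifHeader, HEADER_RE, and parse_header are hypothetical and not part of the original analysis, and treating the coordinates as 1-based and inclusive when computing a length is an assumption that the table does not state.

```python
import re
from typing import NamedTuple, Tuple


class MotifHeader(NamedTuple):
    """Parsed form of one table-entry header line (hypothetical helper type)."""
    region: str              # "Intergenic", "5′ UTR", or "3′ UTR"
    genes: Tuple[str, ...]   # one gene for UTR entries, two for intergenic entries
    strand: str              # "+" or "-"
    start: int
    end: int

    @property
    def length(self) -> int:
        # ASSUMPTION: coordinates are 1-based and inclusive.
        return self.end - self.start + 1


# The prime mark after 5/3 is optional so that extraction-damaged
# headers such as "5 UTR rpsP" are still accepted.
HEADER_RE = re.compile(
    r"^(?P<region>Intergenic|[35][′']? UTR)\s+"
    r"(?:\[(?P<pair>[^\]]+)\]|(?P<gene>\S+))\s+"
    r"\((?P<strand>[+-])\)\s+"
    r"(?P<start>\d+)-(?P<end>\d+)$"
)


def parse_header(line: str) -> MotifHeader:
    """Parse a header such as 'Intergenic [rplL, rpoB] (+) 4178966-4179013'."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a recognizable entry header: {line!r}")
    if m.group("pair"):
        genes = tuple(g.strip() for g in m.group("pair").split(","))
    else:
        genes = (m.group("gene"),)
    return MotifHeader(
        region=m.group("region"),
        genes=genes,
        strand=m.group("strand"),
        start=int(m.group("start")),
        end=int(m.group("end")),
    )


header = parse_header("Intergenic [rplL, rpoB] (+) 4178966-4179013")
print(header.genes, header.strand, header.length)
```

Under the inclusive-coordinate assumption, the first rplL-rpoB motif spans 48 nts.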

Intergenic [rplL, rpoB] (+) 4179072-4179191

Gammaproteobacteria: Broadly conserved across multiple orders

Alphaproteobacteria: Rhodobacterales (6%)

Comparative genomics predicted locus (Ott et al., 2012; Rivas and Eddy, 2001; Rivas et al., 2001; Uzilov et al., 2006; Washietl et al., 2005)

As described above in our discussion of the attenuator downstream of rplL, the rpoBC gene cluster encoding RNA polymerase subunits β and β′ is post-transcriptionally regulated by three mechanisms. The structure identified here is located downstream of the attenuator and 76 nts upstream of the rpoB start codon. The 5′-most stem comprises an RNase III processing site, recapitulating prior computational structure predictions (Barry et al., 1980). We also observe additional conserved structure on either side of the cleavage site. The role of RNase III cleavage in rpoBC regulation is poorly understood, although several studies have concluded that it contributes to regulation of rpoBC expression and translation autoregulation (Dennis, 1984; Dykxhoorn et al., 1997; Passador and Linn, 1989; 1992). The universal conservation of this motif in enterobacteria argues for an essential role in rpoBC regulation.

Intergenic [hfq, hflX] (+) 4398597-4398669

This structure overlaps the hfq coding region and extends into the hfq-hflX IGR, ending 25 nts upstream of the hflX start codon. hfq encodes the key sRNA chaperone protein Hfq and the downstream hflX gene encodes a ribosome factor involved in ribosome rescue during heat shock (Keseler et al., 2013; Zhang et al., 2015). hfq and hflX are co-transcribed. Previous studies identified a rho-independent terminator, an RNase cleavage site, and a rho-dependent terminator in this region (Tsui et al., 1996; 1994). Differential usage of these elements apparently allows variation in the relative expression of hflX and hfq under stress conditions (Tsui et al., 1996).

Uncharacterized structures in genomic loci with moderate evidence of regulatory function

Gammaproteobacteria: Enterobacteriales (100%)

Gammaproteobacteria: Enterobacteriales (72%)

Intergenic [rpsB, tsf] (+) 190774-190834

Comparative genomics predicted locus (Ott et al., 2012; Uzilov et al., 2006)

tsf (encoding ribosome elongation factor Ts) is regulated by the S2-binding RARE located upstream of rpsB (encoding ribosomal protein S2) (Aseev et al., 2008). How S2 binding represses expression of the downstream tsf is poorly understood, but it is coordinated through the 258-nt intergenic region separating rpsB and tsf. This intergenic region contains an ERIC element (Wilson and Sharp, 2006) (excluded from our analysis), as well as the dual hairpins shown here. The first hairpin is a predicted transcriptional terminator, although minimal termination is observed during exponential growth (Conway et al., 2014). Our homolog search returned 17 matches, 13 of which shared high sequence and structure conservation. The other four homologs had significantly lower bitscores, were derived from distant bacterial species, and shared only the relatively nondescript 5′ hairpin; they were therefore excluded from the consensus. The strong conservation of this intergenic motif across a subset of enterobacteria suggests that these hairpin structures play some role in coordinating S2 repression of rpsB, perhaps by causing transcription termination in the absence of rpsB translation.

Intergenic [rpmF, plsX] (+) 1146753-1146818

This structure overlaps the terminus of the rpmF coding region and extends through the majority of the rpmF-plsX IGR, ending 25 nt upstream of the plsX start codon. rpmF and plsX are co-transcribed but encode proteins with distinct functions and expression levels (Podkovyrov and Larson, 1995). rpmF encodes ribosomal protein L32 and is 7-fold more abundant than the downstream plsX, which encodes a poorly characterized lipid biosynthesis protein (Keseler et al., 2013; Li et al., 2014). The regulation of the rpmF-plsX operon is poorly described. We hypothesize that this structure serves as a transcription attenuator as we observed a large decrease in read-depth directly after this motif. There is strong sequence conservation across multiple enterobacteria, also supporting a role for this motif in the complex regulation of plsX.

Intergenic [oppA, oppB] (+) 1300821-1300926

Gammaproteobacteria: Enterobacteriales (41%)

Gammaproteobacteria: Enterobacteriales (38%)

Comparative genomics predicted locus (Pichon et al., 2012)

This motif spans the entire 85-nt oppA-oppB IGR. Both genes encode components of the oligopeptide ABC transporter and are transcribed as part of the polycistronic oppABCDF operon under the control of a single promoter upstream of oppA (Keseler et al., 2013). Ribosome-profiling experiments indicate that OppA and OppB proteins are synthesized at an 8:1 ratio (Li et al., 2014). This differential synthesis is achieved by a combination of transcriptional and translational mechanisms: 50% of transcripts terminate before oppB, yielding a 2:1 excess of oppA mRNA, and oppB has 4-fold lower translation efficiency (Li et al., 2014). This structure is likely important for both mechanisms. The strong upper stem is a predicted terminator (Salgado et al., 2013), and the lower stem is likely important for translationally coupling oppA and oppB.
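The arithmetic behind the 8:1 ratio can be made explicit with a short sketch. It assumes a simple multiplicative model in which the protein synthesis ratio is the mRNA ratio multiplied by the ratio of translation efficiencies, and in which every transcript that reaches oppB also contains oppA. The function name synthesis_ratio and this model are illustrative assumptions, not the method used by Li et al. (2014).

```python
from fractions import Fraction


def synthesis_ratio(termination_fraction, downstream_relative_efficiency):
    """Upstream:downstream protein synthesis ratio under a simple
    multiplicative model (ASSUMPTION, for illustration only).

    termination_fraction: fraction of transcripts that end before the
        downstream gene (0.5 for oppA/oppB in the text above).
    downstream_relative_efficiency: translation efficiency of the downstream
        gene relative to the upstream gene (1/4 for oppB in the text above).
    """
    readthrough = 1 - termination_fraction
    mrna_ratio = 1 / readthrough          # upstream mRNA per downstream mRNA
    return mrna_ratio / downstream_relative_efficiency


mrna_excess = 1 / (1 - Fraction(1, 2))
ratio = synthesis_ratio(Fraction(1, 2), Fraction(1, 4))
print(mrna_excess, ratio)  # 2 8
```

With 50% termination and a 4-fold lower translation efficiency for oppB, the two factors (2 and 4) multiply to the observed 8:1 OppA:OppB ratio.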

Intergenic [iscA, hscB] (-) 2657466-2657571

Comparative genomics predicted locus (Ott et al., 2012)

This structure is located in the iscA-hscB IGR. Both genes encode iron-sulfur cluster biosynthesis proteins that are co-transcribed as part of the long iscRSUA-hscBA-fdx iron-sulfur operon (Conway et al., 2014; Salgado et al., 2013). The 5′ hairpin is a rho-independent transcription attenuator that regulates relative expression of hscB: approximately two-thirds of transcripts terminate after iscA under log-growth conditions (Li et al., 2014; Salgado et al., 2013). hscB is also post-transcriptionally regulated by small RNAs targeted to upstream genes (Keseler et al., 2013). This post-transcriptional regulation is most likely due to small RNA-promoted degradation, although modulation of transcription termination via disruption of transcription-translation coupling is also a possibility.

5′ UTR rpsP (-) 2744229-2744265

Gammaproteobacteria: Enterobacteriales (69%)

Gammaproteobacteria: Enterobacteriales (66%)

Comparative genomics predicted locus (Livny et al., 2008; Ott et al., 2012; Pichon et al., 2012; Rivas and Eddy, 2001; Rivas et al., 2001; Uzilov et al., 2006; Washietl et al., 2005)

rpsP encodes ribosomal protein S16, and is the first gene of the polycistronic rpsP-rimM-trmD-rplS transcript. The 5′ UTR for this transcript is a relatively short 68 nts, with the hairpin motif identified here located immediately downstream of the transcription start and 21 nts upstream of the rpsP start, and comprising the only structured element in the UTR. Previous studies have shown that this hairpin is a strong transcription attenuator in vitro, with 60% of transcripts terminating prematurely in the UTR (Byström et al., 1989). This hairpin is also an active attenuator in vivo, generating small RNA fragments detectable by Northern blot (Kawano et al., 2005). Strikingly, this attenuator is universally conserved upstream of rpsP in non-endosymbiont enterobacteria.

While this hairpin is clearly functional as a transcription attenuator, its broader role is unclear. In particular, S16 is an essential and highly abundant protein, and correspondingly, the rpsP-rimM-trmD-rplS transcript is highly expressed. It therefore seems disadvantageous to maintain a constitutive attenuator upstream of rpsP. We suggest two potential functions. First, this motif may form the basis of a ribosomal-protein autoregulatory element (RARE) that acts at the transcriptional level. The rpsP-rimM-trmD-rplS operon is one of a minority of ribosomal protein operons without a known RARE. However, an earlier study concluded the operon was not autoregulated at either the transcriptional or translational level (Wikström et al., 1988). It is possible that the over-expression system used by this study masked transcription regulation that occurs at endogenous expression levels, or that the RARE responds to a protein ligand encoded on a different operon. Alternatively, the terminated 5′ UTR product may have its own distinct cellular role as a small RNA (Kawano et al., 2005).

We finally note that the covariation model used to identify homologs was highly specific within enterobacteria (homologs were only identified in front of rpsP), but 8 homologs were identified in distant bacterial phyla that were not associated with rpsP. Given the simplicity of this terminator motif, these likely reflect non-specific matches (and indeed have less significant bitscores than enterobacterial homologs). These 8 distant homologs were excluded from the consensus structure above.

3′ UTR yrbL (+) 3347165-3347204

This motif is located in the approximately 300-nt 3′ UTR downstream of the monocistronic yrbL transcript, which encodes a protein of unknown function. Previous studies identified a small RNA originating from this locus (Kawano et al., 2005). We thus suggest that this structure is involved in processing and/or function of the yrbL sRNA.

Intergenic [gpsA, cysE] (-) 3780609-3780640

Motif overlaps mtgA coding region

Gammaproteobacteria: Enterobacteriales (100%)

This motif was identified in the intergenic region between gpsA and cysE, 23 nts upstream of the cysE start codon. cysE encodes serine acetyltransferase, which converts serine to O-acetyl-L-serine as the first step in cysteine biosynthesis. Previous studies have shown that cysE is negatively regulated by the small RNA RyhB, which pairs with the -4 to +8 region of cysE relative to the start codon (Salvail et al., 2010). Nine homologs were identified upstream of cysE in enterobacterial genera, with an additional 3 homologs identified in distant phyla. These distant homologs most likely represent non-specific matches to the GC-rich stem and were not included in the consensus structure shown above.

Intergenic [rpoB, rpoC] (+) 4183290-4183365

Comparative genomics predicted locus (Rivas and Eddy, 2001; Rivas et al., 2001; Uzilov et al., 2006; Washietl et al., 2005)

This triple hairpin motif was identified in the intergenic region linking rpoB and rpoC, which encode the β and β′ subunits of RNA polymerase, respectively. As detailed in discussion of the rpoB motifs above, rpoC expression is post-transcriptionally regulated via several structures upstream of rpoB. In addition, studies have reported that the β and β′ subunits independently regulate translation of rpoC (Dykxhoorn et al., 1996). The mechanism underlying this translation regulation is poorly understood. The conservation of this motif in enterobacteria, with particularly high conservation of the 3′ helix that overlaps the rpoC Shine-Dalgarno sequence, is consistent with this motif being involved in the translation regulation of rpoC by β and β′.

Novel structures in uncharacterized regions

Intergenic [pyrH, frr] (+) 192799-192840

Comparative genomics predicted locus (Ott et al., 2012)

We identified two adjacent hairpin motifs in the intergenic region separating pyrH (UMP kinase) and frr (ribosome recycling factor RF4). The first hairpin is likely an unannotated repetitive extragenic palindromic element (see table entry below). The second motif shown here is located about 30 nts upstream of the frr start codon.

5′ UTR accA (+) 208285-208377

Gammaproteobacteria: Enterobacteriales (19%)

Gammaproteobacteria: Enterobacteriales (84%)

Gammaproteobacteria: Enterobacteriales (28%)

The monocistronic accA transcript encodes the α subunit of acetyl-CoA carboxylase (ACC), which catalyzes the first step of fatty acid biosynthesis (Keseler et al., 2013). The ACC holoenzyme is composed of four subunits that are encoded on three different operons, hence requiring coordinated expression from three different mRNAs (Li et al., 2014). Transcriptional regulation of accA by the global fatty acid biosynthesis activator fadR has been described (My et al., 2015), but the mechanism regulating stoichiometric protein production is unknown. Translational regulation of accA by sequences in the coding region was proposed (Meades et al., 2010) but later refuted (Smith and Cronan, 2014). Potential regulatory function of the ~300-nt 5′ UTR has not been explored.

Two motifs were identified in our analysis. accA is transcribed from two promoters; the motif identified here is unique to the distal isoform.

Motif overlaps dnaE coding region

5′ UTR accA (+) 208436-208599

The second of two motifs identified in the 5′ UTR of accA. This motif is located downstream of the proximal accA promoter, and thus should be present in both 5′ UTR isoforms.

3′ UTR rpsA (+) 962885-962967

Gammaproteobacteria: Enterobacteriales (31%)

Motif overlaps dnaE coding region

Comparative genomics predicted locus (Livny et al., 2008; Rivas and Eddy, 2001; Rivas et al., 2001; Uzilov et al., 2006; Washietl et al., 2005)

This motif is located in the 3′ UTR of the monocistronic rpsA transcript, which encodes ribosomal protein S1. In E. coli, the top of the 5′ helix and the adjacent polyU bulge are predicted to be a transcriptional terminator (Salgado et al., 2013). Previous 3′-end mapping experiments and our sequencing read depths suggest minimal termination occurs at this location (Conway et al., 2014). The polyU sequences found in E. coli are also poorly conserved, arguing against terminator function. The strong covariation of this helix suggests some functional role.

5′ UTR prs (-) 1261129-1261237

Comparative genomics predicted locus (Ott et al., 2012; Uzilov et al., 2006)

The monocistronic prs transcript encodes ribose-phosphate diphosphokinase, a key enzyme in nucleotide salvage and other metabolic pathways (Keseler et al., 2013). Feedback inhibition of the prs promoter by the purine purR-hypoxanthine repressor has been described (He et al., 1993). prs expression is also negatively regulated by pyrimidine through unknown mechanisms (Post et al., 1993). Intriguingly, other E. coli pyrimidine biosynthesis genes are regulated by a UTP-sensitive transcription attenuation mechanism dependent on hairpin- and sequence-induced polymerase pauses (Turnbough and Switzer, 2008). These hairpins may contribute to a similar pyrimidine-responsive transcription regulation mechanism.

5′ UTR rcsB (+) 2314096-2314169

Gammaproteobacteria: Enterobacteriales (50%)

This motif was identified 29 nts upstream of the rcsB start codon in the 5′ UTR of the monocistronic rcsB transcript. rcsB encodes the protein RcsB, which is part of a multicomponent phosphorelay system involved in regulation of capsule synthesis.

Intergenic [ackA, pta] (+) 2412700-2412760

Comparative genomics predicted locus (Ott et al., 2012; Rivas and Eddy, 2001; Rivas et al., 2001; Uzilov et al., 2006; Washietl et al., 2005)

ackA (acetate kinase A) and pta (phosphate acetyltransferase) are two of the central enzymes in acetate metabolism and are co-transcribed from a promoter upstream of ackA (Conway et al., 2014; Keseler et al., 2013; Salgado et al., 2013). This motif is located in the ackA-pta IGR, beginning 5 nts downstream of ackA and ending 8 nts upstream of the pta start codon. The 5′ hairpin was previously proposed to be a Rho-independent transcriptional terminator (Salgado et al., 2013), although 3′-end mapping experiments (Conway et al., 2014) and our read depths indicate minimal termination at this location. It has also been reported that pta is transcribed from an independent promoter within the ackA coding region (Kakuda et al., 1994), and thus this motif may be present in the 5′ UTR of a monocistronic pta transcript.

Intergenic [alaS, csrA] (-) 2817350-2817398

Comparative genomics predicted locus (Rivas and Eddy, 2001; Uzilov et al., 2006; Washietl et al., 2005)

This is the second of two motifs identified in the alaS-csrA intergenic region (see discussion in prior table entry). This hairpin is located 4 nts downstream of the alaS stop codon and overlaps several alternative csrA-specific promoters (see figure) (Yakhnin et al., 2011). Our homolog search returned five close homologs in enterobacterial species, as well as one non-specific homolog in the Oceanospirillales species Chromohalobacter salexigens. This non-specific homolog was not included in the consensus structure drawn above.

Gammaproteobacteria: Enterobacteriales (53%)

Gammaproteobacteria: Enterobacteriales (16%)

Motif overlaps rcsD coding region

5′ UTR epd (-) 3071723-3071805

Comparative genomics predicted locus (Pichon et al., 2012; Rivas et al., 2001)

This motif is located 9 nts upstream of the epd CDS, which encodes the enzyme D-erythrose-4-phosphate dehydrogenase. Previous work has shown that the 3′ helix reduces epd translation efficiency due to overlap with the epd ribosome binding site (Bardey et al., 2005). Indeed, it is uncommon to observe such an extended highly stable helix in a 5′ UTR. epd is part of the larger epd-pgk-fbaA operon, which is subject to differential post-transcriptional regulation by RNase E cleavage (Bardey et al., 2005), although this structure has not been implicated in cleavage. Moderate covariation and conservation across enterobacterial genera suggests some functional role. Our homolog search also returned one match in the very distant actinobacterium B. bifidum, which is most likely a non-specific match, and which we excluded from the consensus structure shown above.

3′ UTR rpsU (+) 3209004-3209053

This motif overlaps the terminus of the rpsU CDS, which encodes ribosomal protein S21. This CDS overlap explains the high sequence conservation at the 5′ end of the motif, but the corresponding conservation of the 3′ end is highly supportive of pairing between the rpsU CDS and the downstream intergenic region. rpsU is the first gene in the larger rpsU-dnaG-rpoD operon, which is subject to complex post-transcriptional regulation, including by a transcriptional terminator immediately following the motif identified here (Burton et al., 1983). Thus, this motif may facilitate differential usage of the rpsU terminator (altering read-through to the dnaG gene). Alternatively, rpsU is one of the few ribosomal protein genes without a known autoregulatory element (RARE). It is conceivable that this motif contributes to some sort of feedback control mechanism for S21 expression.

5′ UTR rpoD (+) 3210969-3211034

Gammaproteobacteria: Enterobacteriales (63%)

Gammaproteobacteria: Enterobacteriales (41%)

Comparative genomics predicted locus (Ott et al., 2012; Rivas and Eddy, 2001)

This motif is located 34 nts upstream of rpoD, which encodes RNA polymerase sigma factor 70, the primary sigma factor during exponential growth. This motif is only found among the very closely related genera of E. coli, Shigella dysenteriae, and Shigella flexneri, with 100% sequence conservation, and hence lacks significant evolutionary support. Experiments suggest that rpoD transcription is subject to transcription attenuation (with transcription stimulated by NusA), which may reflect underlying post-transcriptional regulation (Peacock et al., 1985). rpoD is also part of the rpsU-dnaG-rpoD operon, which is subject to complex post-transcriptional regulation, including RNase E processing upstream of this motif (Yajnik and Godson, 1993).

Intergenic [nlpI, yrbN] (-) 3306007-3306062

Comparative genomics predicted locus (Washietl et al., 2005)

This motif is located immediately downstream of nlpI, an adaptor lipoprotein involved in cell division, and 51 nts upstream of yrbN, a bioinformatically identified small open reading frame (26 amino acids) of unknown function (Hemm et al., 2008). Interestingly, the yrbN ORF overlaps the start of deaD, a DEAD-box RNA helicase that plays an important role in ribosome assembly, and which may also potentially interact with other cellular RNAs (Keseler et al., 2013). Regulation of nlpI, yrbN, and deaD is very poorly described, but the broad conservation of this motif in enterobacteria suggests it plays some functional role.

Intergenic [hpf, ptsN] (+) 3344489-3344587

Comparative genomics predicted locus (Ott et al., 2012; Rivas and Eddy, 2001; Rivas et al., 2001; Uzilov et al., 2006; Washietl et al., 2005)

Gammaproteobacteria: Enterobacteriales (19%)

Gammaproteobacteria: Enterobacteriales (9%)

Gammaproteobacteria: Enterobacteriales (78%)

This intergenic motif is located in a long polycistronic transcript that encodes diverse metabolism-related genes. The upstream gene hpf encodes ribosome hibernation promoting factor (Keseler et al., 2013). The downstream ptsN encodes enzyme IIANtr, a key protein in a phosphorelay system that regulates K+ transport and nitrogen metabolism (Keseler et al., 2013). A weak ptsN-specific promoter has been reported; however, ptsN is primarily expressed in conjunction with upstream genes (Powell et al., 1995; Salgado et al., 2013).

The 3′ hairpin is a predicted terminator (Salgado et al., 2013), but 3′-end mapping experiments (Conway et al., 2014) and our read depths indicate minimal termination at this location. Thus, this motif could be a conditional terminator. The close proximity of this motif to the ptsN start codon (upstream by 12 nts) also suggests the potential for translational regulation.

5′ UTR accB (+) 3403251-3403340

Comparative genomics predicted locus (Ott et al., 2012)

This motif was identified in the 296-nt-long 5′ UTR of the accB-accC operon. AccB and AccC comprise two of the four subunits of acetyl-CoA carboxylase (ACC), which catalyzes the first step of fatty acid biosynthesis. Previous studies have shown that accB-accC is autoregulated at the transcription promoter level (James and Cronan, 2004), but post-transcriptional regulation has not been explored. Interestingly, this motif is the second complex structure identified upstream of ACC genes, with motifs also found upstream of accA (see preceding table entries).

Our homolog search procedure identified homologs distributed broadly among multiple bacterial phyla. However, RNA structure was poorly conserved, and most homologs appeared to be specifically associated with the gene aroD (3-dehydroquinate dehydratase) rather than accB or accC. Analysis indicated that aroD lies immediately upstream of accB in most enterobacteria, with homologs of this RNA motif extending into the aroD ORF. Thus, during training the covariation model rapidly became specific for the aroD ORF. Searching without model training identified 5 matches in closely related genera, with some evidence of structure covariation.

5′ UTR nfuA (+) 3543508-3543598

nfuA encodes an Fe-S cluster scaffold protein important in Fe-S biogenesis under oxidative stress, and which also has potential chaperone functions (Angelini et al., 2008). The monocistronic nfuA mRNA has two promoters; this motif is unique to the distal transcript isoform.

Motif overlaps gntX coding region

Gammaproteobacteria: Enterobacteriales (16%)

5′ UTR rpoH (-) 3598830-3598864

rpoH encodes the RNA polymerase factor sigma 32, the primary heat shock sigma factor. Previous studies have shown that rpoH is translationally regulated by a thermoswitch in the CDS that masks the ribosome binding site at low temperature, but is destabilized at temperatures greater than 37 °C (Morita et al., 1999). While further study of the rpoH thermosensing mechanism is warranted, it appears unlikely that this motif contributes to thermosensing. In particular, gene fusions containing only 19 nts upstream of rpoH retain thermosensing function (Nagai et al., 1991), whereas the motif identified here is located 24 nts upstream. This motif is only found among the very closely related genera of E. coli, S. dysenteriae, and S. flexneri, and hence lacks significant evolutionary support.

5′ UTR rpmB (-) 3809744-3809817

Comparative genomics predicted locus (Ott et al., 2012; Rivas et al., 2001; Uzilov et al., 2006; Washietl et al., 2005)

We identified this highly structured three-way junction in the 5′ UTR of the polycistronic transcript encoding rpmB (ribosomal protein L28) and rpmG (ribosomal protein L33). A prior study observed possible cryptic signs of autoregulation of the rpmB-rpmG operon in vivo, explainable by L28 acting in combination with an unknown factor (Maguire and Wild, 1997), while no autoregulation was observed by another study (Aseev et al., 2016).

Validation of this motif as an L9 and L28 binder is discussed in the main text using a construct that stretched into the rpmB CDS. Homology searches performed for the entire construct revealed strong sequence conservation downstream of the three-way junction, but no clear structural conservation (Fig. S7). We therefore explored whether the minimal three-way junction motif shown above was sufficient to bind L9 and/or L28. In EMSAs, the isolated three-way junction did not bind L9 or L28 protein, leading us to conclude that both L9 and L28 interact with downstream sequence elements that overlap the RBS (Fig. S7).

5′ UTR rpmH (+) 3882234-3882326

Gammaproteobacteria: Enterobacteriales (9%)

Gammaproteobacteria: Enterobacteriales (100%)

Comparative genomics predicted locus (Ott et al., 2012)

We identified two well-structured hairpin motifs in the 220-nt 5′ UTR of the rpmH-rnpA transcript, approximately 30 nts upstream of the rpmH start codon. rpmH encodes ribosomal protein L34, and rnpA encodes C5, the protein component of RNase P (Keseler et al., 2013). rpmH is expressed in large excess relative to rnpA due to a transcription terminator located downstream of rpmH (Hansen et al., 1982; 1985). The 5′ hairpin overlaps the -10 sequence of an alternative rpmH promoter, but this alone is insufficient to explain its extremely high sequence conservation.

Validation of this motif as a C5 binder is discussed in the main text. For these validation experiments we used a construct consisting of the RNA shown in main text Figure 6D, which contains sequences both upstream and downstream of these hairpins. When homology searches were performed using the entire 5′ UTR, only the 5′ helix (H2 in Figure 6D) and flanking A/U-rich sequences were conserved. Homology searches using only the two hairpins shown above revealed very weak conservation of the 3′ helix (shown above).

3′ UTR fdhE (-) 4078282-4078313

This hairpin was identified in the 3′ UTR of the fdoGHI-fdhE transcript, 8 nts downstream of the fdhE stop codon. fdhE encodes formate dehydrogenase formation protein. Our homolog search returned matches in the closely related genera of E. coli, Shigella dysenteriae, and Shigella flexneri, as well as non-specific matches in widely distributed bacterial phyla, suggesting that the covariation model for this simple hairpin is non-specific. These non-specific matches were excluded from the consensus model shown above.

3′ UTR pfkA (+) 4106542-4106572

Comparative genomics predicted locus (Livny et al., 2008; Pichon et al., 2012; Rivas and Eddy, 2001)

This hairpin was identified in the 132-nt-long 3′ UTR of the pfkA transcript, immediately downstream of the pfkA stop codon. pfkA encodes 6-phosphofructokinase I, a key enzyme regulating glycolysis. This motif is only found among the very closely related genera of E. coli, Shigella dysenteriae, and Shigella flexneri, and hence lacks significant evolutionary support.

Gammaproteobacteria: Enterobacteriales (9%)

Gammaproteobacteria: Enterobacteriales (88%)

Gammaproteobacteria: Enterobacteriales (9%)

3 UTR oxyR (+) 4157443-4157478

This hairpin is in the 110-nt long 3 UTR of oxyR, which encodes the DNA-binding protein OxyR that is the primary oxidative stress regulator in E. coli. oxyR is heavily regulated at the transcriptional level (Keseler et al., 2013), but post-transcriptional regulation has not been explored.

3 UTR hupA (+) 4198554-4198591

This motif originates in the hupA CDS and stretches into the short 3 UTR of the hupA transcript. A terminator stem is located immediately downstream. hupA encodes the subunit of the global transcriptional regulator HU, which is extensively regulated at the transcriptional level, but post-transcriptional regulation has not been described. The strong sequence conservation of the 5 half of the motif is expected because it overlaps the CDS, while conservation of the 3 half indicates conserved pairing between the CDS and the 3 UTR. Homologs were also identified in the phylogenetically distant bacteria Idiomarina loihiensis and Enterococcus mundtii. These homologs exhibited poorer structural conservation, and likely reflect matches based on the hupA CDS. For this purpose, we removed these homologs from the final multiple alignment.

3′ UTR aceA (+) 4216709-4216816

aceA encodes isocitrate lyase, an enzyme involved in acetate metabolism. aceA is transcribed as part of the long polycistronic aceB-aceA-aceK transcript. aceK is expressed 100-fold less than aceA-aceB due to premature termination inside the aceK CDS. This termination produces a long structured 3′ UTR that includes multiple upstream REP elements as well as the structured motif identified here (Chung et al., 1993). This structure could potentially be involved in the aceB-aceA termination mechanism, or alternatively, the long 3′ UTR could serve some other post-transcriptional function.

Motif overlaps sthA coding region

Gammaproteobacteria: Enterobacteriales (59%)

Motif overlaps aceK coding region

5′ UTR miaA (+) 4397144-4397202

miaA encodes the tRNA modification enzyme MiaA, which modifies the anticodon loop of about 20% of tRNA species to improve translational fidelity (Keseler et al., 2013). miaA is part of a complex operon with multiple promoters and can also be co-transcribed with upstream genes (Conway et al., 2014; Keseler et al., 2013; Tsui and Winkler, 1994). The motif identified here is located in the approximately 200-nt long 5′ UTR transcribed from the proximal promoter, 72 nts upstream of the miaA start codon. Variable degradation by RNase E has been proposed as a potential miaA regulatory mechanism, and this structure could play a role in miaA stability (Tsui and Winkler, 1994).

3′ UTR purA (+) 4404017-4404045

Comparative genomics predicted locus (Ott et al., 2012; Pichon et al., 2012; Rivas et al., 2001)

This hairpin motif was identified 8 nts downstream of the purA CDS. purA encodes adenylosuccinate synthetase, which catalyzes the first committed step of de novo purine nucleotide synthesis. This motif is only found among the very closely related genera of E. coli, Shigella dysenteriae, and Shigella flexneri, and hence lacks significant evolutionary support.

Likely unannotated REP/ERIC elements

Motif overlaps mutL coding region

Gammaproteobacteria: Enterobacteriales (9%)

Intergenic [pyrH, frr] (+) 192636-192773

Comparative genomics predicted locus (Ott et al., 2012; Rivas and Eddy, 2001; Uzilov et al., 2006; Washietl et al., 2005)

This is the second of two motifs in the frr-pyrH intergenic region (see entry above). This motif is palindromic and widely distributed among bacterial phyla with no apparent conserved gene context. We thus classify this motif as an unannotated REP element.

Non-specifically distributed among many bacterial phyla

References for Table S1

Angelini, S., Gerez, C., Ollagnier-de Choudens, S., Sanakis, Y., Fontecave, M., Barras, F., and Py, B. (2008). NfuA, a new factor required for maturing Fe/S proteins in Escherichia coli under oxidative stress and iron starvation conditions. J. Biol. Chem. 283, 14084–14091.

Aseev, L.V., Koledinskaya, L.S., and Boni, I.V. (2016). Regulation of Ribosomal Protein Operons rplM-rpsI, rpmB-rpmG, and rplU-rpmA at the Transcriptional and Translational Levels. J. Bacteriol. 198, 2494–2502.

Aseev, L.V., Levandovskaya, A.A., Tchufistova, L.S., Scaptsova, N.V., and Boni, I.V. (2008). A new regulatory circuit in ribosomal protein operons: S2-mediated control of the rpsB-tsf expression in vivo. RNA 14, 1882–1894.

Bardey, V., Vallet, C., Robas, N., Charpentier, B., Thouvenot, B., Mougin, A., Hajnsdorf, E., Régnier, P., Springer, M., and Branlant, C. (2005). Characterization of the molecular mechanisms involved in the differential production of erythrose-4-phosphate dehydrogenase, 3-phosphoglycerate kinase and class II fructose-1,6-bisphosphate aldolase in Escherichia coli. Mol. Microbiol. 57, 1265–1287.

Barry, G., Squires, C., and Squires, C.L. (1980). Attenuation and processing of RNA from the rplJL–rpoBC transcription unit of Escherichia coli. Proc. Natl. Acad. Sci. U.S.A. 77, 3331–3335.

Burton, Z.F., Gross, C.A., Watanabe, K.K., and Burgess, R.R. (1983). The operon that encodes the sigma subunit of RNA polymerase also encodes ribosomal protein S21 and DNA primase in E. coli K12. Cell 32, 335–349.

Byström, A.S., von Gabain, A., and Björk, G.R. (1989). Differentially expressed trmD ribosomal protein operon of Escherichia coli is transcribed as a single polycistronic mRNA species. J. Mol. Biol. 208, 575–586.

Chiaruttini, C., Milet, M., and Springer, M. (1996). A long-range RNA-RNA interaction forms a pseudoknot required for translational control of the IF3-L35-L20 ribosomal protein operon in Escherichia coli. EMBO J. 15, 4402–4413.

Christensen, T., Johnsen, M., Fiil, N.P., and Friesen, J.D. (1984). RNA secondary structure and translation inhibition: analysis of mutants in the rplJ leader. EMBO J. 3, 1609–1612.

Chung, T., Resnik, E., Stueland, C., and LaPorte, D.C. (1993). Relative expression of the products of glyoxylate bypass operon: contributions of transcription and translation. J. Bacteriol. 175, 4572–4575.

Climie, S.C., and Friesen, J.D. (1988). In vivo and in vitro structural analysis of the rplJ mRNA leader of Escherichia coli. Protection by bound L10-L7/L12. J. Biol. Chem. 263, 15166–15175.

Conway, T., Creecy, J.P., Maddox, S.M., Grissom, J.E., Conkle, T.L., Shadid, T.M., Teramoto, J., San Miguel, P., Shimada, T., Ishihama, A., et al. (2014). Unprecedented high-resolution view of bacterial operon architecture revealed by RNA sequencing. MBio 5, e01442–14.

Dennis, P.P. (1984). Site specific deletions of regulatory sequences in a ribosomal protein-RNA polymerase operon in Escherichia coli. Effects on beta and beta' gene expression. J. Biol. Chem. 259, 3202–3209.

Desnoyers, G., Morissette, A., Prévost, K., and Massé, E. (2009). Small RNA-induced differential degradation of the polycistronic mRNA iscRSUA. EMBO J. 28, 1551–1561.

Donly, B.C., and Mackie, G.A. (1988). Affinities of ribosomal protein S20 and C-terminal deletion mutants for 16S rRNA and S20 mRNA. Nucleic Acids Res. 16, 997–1010.

Downing, W.L., and Dennis, P.P. (1987). Transcription products from the rplKAJL-rpoBC gene cluster. J. Mol. Biol. 194, 609–620.

Dubey, A.K., Baker, C.S., Romeo, T., and Babitzke, P. (2005). RNA sequence and secondary structure participate in high-affinity CsrA-RNA interaction. RNA 11, 1579–1587.

Dykxhoorn, D.M., St Pierre, R., and Linn, T. (1996). Synthesis of the beta and beta' subunits of Escherichia coli RNA polymerase is autogenously regulated in vivo by both transcriptional and translational mechanisms. Mol. Microbiol. 19, 483–493.

Dykxhoorn, D.M., St Pierre, R., Van Ham, O., and Linn, T. (1997). An efficient protocol for linker scanning mutagenesis: analysis of the translational regulation of an Escherichia coli RNA polymerase subunit gene. Nucleic Acids Res. 25, 4209–4218.

Fu, Y., Deiorio-Haggar, K., Anthony, J., and Meyer, M.M. (2013). Most RNAs regulating ribosomal protein biosynthesis in Escherichia coli are narrowly distributed to Gammaproteobacteria. Nucleic Acids Res. 41, 3491–3503.

Giuliodori, A.M., Di Pietro, F., Marzi, S., Masquida, B., Wagner, R., Romby, P., Gualerzi, C.O., and Pon, C.L. (2010). The cspA mRNA is a thermosensor that modulates translation of the cold-shock protein CspA. Mol. Cell 37, 21–33.

Guillier, M., Allemand, F., Dardel, F., Royer, C.A., Springer, M., and Chiaruttini, C. (2005). Double molecular mimicry in Escherichia coli: binding of ribosomal protein L20 to its two sites in mRNA is similar to its binding to 23S rRNA. Mol. Microbiol. 56, 1441–1456.

Hansen, F.G., Hansen, E.B., and Atlung, T. (1982). The nucleotide sequence of the dnaA gene promoter and of the adjacent rpmH gene, coding for the ribosomal protein L34, of Escherichia coli. EMBO J. 1, 1043–1048.

Hansen, F.G., Hansen, E.B., and Atlung, T. (1985). Physical mapping and nucleotide sequence of the rnpA gene that encodes the protein component of ribonuclease P in Escherichia coli. Gene 38, 85–93.

He, B., Choi, K.Y., and Zalkin, H. (1993). Regulation of Escherichia coli glnB, prsA, and speA by the purine repressor. J. Bacteriol. 175, 3598–3606.

Hemm, M.R., Paul, B.J., Schneider, T.D., Storz, G., and Rudd, K.E. (2008). Small membrane proteins found by comparative genomics and ribosome binding site models. Mol. Microbiol. 70, 1487–1501.

Higashi, K., Terui, Y., Suganami, A., Tamura, Y., Nishimura, K., Kashiwagi, K., and Igarashi, K. (2008). Selective structural change by spermidine in the bulged-out region of double-stranded RNA and its effect on RNA function. J. Biol. Chem. 283, 32989–32994.

Hollands, K., Sevostiyanova, A., and Groisman, E.A. (2014). Unusually long-lived pause required for regulation of a Rho-dependent transcription terminator. Proc. Natl. Acad. Sci. U.S.A. 111, E1999–E2007.

Holmqvist, E., Wright, P.R., Li, L., Bischler, T., Barquist, L., Reinhardt, R., Backofen, R., and Vogel, J. (2016). Global RNA recognition patterns of post-transcriptional regulators Hfq and CsrA revealed by UV crosslinking in vivo. EMBO J. 35, 991–1011.

Iben, J.R., and Draper, D.E. (2008). Specific interactions of the L10(L12)4 ribosomal protein complex with mRNA, rRNA, and L11. Biochemistry 47, 2721–2731.

James, E.S., and Cronan, J.E. (2004). Expression of two Escherichia coli acetyl-CoA carboxylase subunits is autoregulated. J. Biol. Chem. 279, 2520–2527.

Kakuda, H., Hosono, K., Shiroishi, K., and Ichihara, S. (1994). Identification and characterization of the ackA (acetate kinase A)-pta (phosphotransacetylase) operon and complementation analysis of acetate utilization by an ackA-pta deletion mutant of Escherichia coli. J. Biochem. 116, 916–922.

Kalamorz, F., Reichenbach, B., März, W., Rak, B., and Görke, B. (2007). Feedback control of glucosamine-6-phosphate synthase GlmS expression depends on the small RNA GlmZ and involves the novel protein YhbJ in Escherichia coli. Mol. Microbiol. 65, 1518–1533.

Kawano, M., Reynolds, A.A., Miranda-Rios, J., and Storz, G. (2005). Detection of 5′- and 3′-UTR-derived small RNAs and cis-encoded antisense RNAs in Escherichia coli. Nucleic Acids Res. 33, 1040–1050.

Keseler, I.M., Mackie, A., Peralta-Gil, M., Santos-Zavaleta, A., Gama-Castro, S., Bonavides-Martinez, C., Fulcher, C., Huerta, A.M., Kothari, A., Krummenacker, M., et al. (2013). EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Res. 41, D605–D612.

Li, G.-W., Burkhardt, D., Gross, C., and Weissman, J.S. (2014). Quantifying absolute protein synthesis rates reveals principles underlying allocation of cellular resources. Cell 157, 624–635.

Livny, J., Brencic, A., Lory, S., and Waldor, M.K. (2006). Identification of 17 Pseudomonas aeruginosa sRNAs and prediction of sRNA-encoding genes in 10 diverse pathogens using the bioinformatic tool sRNAPredict2. Nucleic Acids Res. 34, 3484–3493.

Livny, J., Teonadi, H., Livny, M., and Waldor, M.K. (2008). High-throughput, kingdom-wide prediction and annotation of bacterial non-coding RNAs. PLoS ONE 3, e3197.

Mackie, G.A., and Parsons, G.D. (1983). Tandem promoters in the gene for ribosomal protein S20. J. Biol. Chem. 258, 7840–7846.

Matsumoto, Y., Shigesada, K., Hirano, M., and Imai, M. (1986). Autogenous regulation of the gene for transcription termination factor rho in Escherichia coli: localization and function of its attenuators. J. Bacteriol. 166, 945–958.

McDowall, K.J., Lin-Chao, S., and Cohen, S.N. (1994). A+U content rather than a particular nucleotide order determines the specificity of RNase E cleavage. J. Biol. Chem. 269, 10790–10796.

Meades, G., Benson, B.K., Grove, A., and Waldrop, G.L. (2010). A tale of two functions: enzymatic activity and translational repression by carboxyltransferase. Nucleic Acids Res. 38, 1217–1227.

Morita, M., Kanemori, M., Yanagi, H., and Yura, T. (1999). Heat-induced synthesis of sigma32 in Escherichia coli: structural and functional dissection of rpoH mRNA secondary structure. J. Bacteriol. 181, 401–410.

My, L., Achkar, N.G., Viala, J.P., and Bouveret, E. (2015). Reassessment of the Genetic Regulation of Fatty Acid Synthesis in Escherichia coli: Global Positive Control by the Functional Dual Regulator FadR. J. Bacteriol. 197, 1862–1872.

Nagai, H., Yuzawa, H., and Yura, T. (1991). Interplay of two cis-acting mRNA regions in translational control of sigma 32 synthesis during the heat shock response of Escherichia coli. Proc. Natl. Acad. Sci. U.S.A. 88, 10515–10519.

Nawrocki, E.P., Burge, S.W., Bateman, A., Daub, J., Eberhardt, R.Y., Eddy, S.R., Floden, E.W., Gardner, P.P., Jones, T.A., Tate, J., et al. (2015). Rfam 12.0: updates to the RNA families database. Nucleic Acids Res. 43, D130–D137.

Ott, A., Idali, A., Marchais, A., and Gautheret, D. (2012). NAPP: the Nucleic Acid Phylogenetic Profile Database. Nucleic Acids Res. 40, D205–D209.

Pannuri, A., Vakulskas, C.A., Zere, T., McGibbon, L.C., Edwards, A.N., Georgellis, D., Babitzke, P., and Romeo, T. (2016). Circuitry Linking the Catabolite Repression and Csr Global Regulatory Systems of Escherichia coli. J. Bacteriol. 198, 3000–3015.

Parsons, G.D., and Mackie, G.A. (1983). Expression of the gene for ribosomal protein S20: effects of gene dosage. J. Bacteriol. 154, 152–160.

Parsons, G.D., Donly, B.C., and Mackie, G.A. (1988). Mutations in the leader sequence and initiation codon of the gene for ribosomal protein S20 (rpsT) affect both translational efficiency and autoregulation. J. Bacteriol. 170, 2485–2492.

Passador, L., and Linn, T. (1989). Autogenous regulation of the RNA polymerase beta subunit of Escherichia coli occurs at the translational level in vivo. J. Bacteriol. 171, 6234–6242.

Passador, L., and Linn, T. (1992). An internal region of rpoB is required for autogenous translational regulation of the beta subunit of Escherichia coli RNA polymerase. J. Bacteriol. 174, 7174–7179.

Peacock, S., Lupski, J.R., Godson, G.N., and Weissbach, H. (1985). In vitro stimulation of Escherichia coli RNA polymerase sigma subunit synthesis by NusA protein. Gene 33, 227–234.

Peng, Y., Curtis, J.E., Fang, X., and Woodson, S.A. (2014a). Structural model of an mRNA in complex with the bacterial chaperone Hfq. Proc. Natl. Acad. Sci. U.S.A. 111, 17134–17139.

Peng, Y., Soper, T.J., and Woodson, S.A. (2014b). Positional effects of AAN motifs in rpoS regulation by sRNAs and Hfq. J. Mol. Biol. 426, 275–285.

Peters, J.M., Mooney, R.A., Kuan, P.F., Rowland, J.L., Keles, S., and Landick, R. (2009). Rho directs widespread termination of intragenic and stable RNA transcription. Proc. Natl. Acad. Sci. U.S.A. 106, 15406–15411.

Peters, J.M., Vangeloff, A.D., and Landick, R. (2011). Bacterial transcription terminators: the RNA 3'-end chronicles. J. Mol. Biol. 412, 793–813.

Pichon, C., du Merle, L., Caliot, M.E., Trieu-Cuot, P., and Le Bouguénec, C. (2012). An in silico model for identification of small RNAs in whole bacterial genomes: characterization of antisense RNAs in pathogenic Escherichia coli and Streptococcus agalactiae strains. Nucleic Acids Res. 40, 2846–2861.

Podkovyrov, S., and Larson, T.J. (1995). Lipid biosynthetic genes and a ribosomal protein gene are cotranscribed. FEBS Lett. 368, 429–431.

Post, D.A., Hove-Jensen, B., and Switzer, R.L. (1993). Characterization of the hemA-prs region of the Escherichia coli and Salmonella typhimurium chromosomes: identification of two open reading frames and implications for prs expression. J. Gen. Microbiol. 139, 259–266.

Powell, B.S., Court, D.L., Inada, T., Nakamura, Y., Michotey, V., Cui, X., Reizer, A., Saier, M.H., and Reizer, J. (1995). Novel proteins of the phosphotransferase system encoded within the rpoN operon of Escherichia coli. Enzyme IIANtr affects growth on organic nitrogen and the conditional lethality of an erats mutant. J. Biol. Chem. 270, 4822–4839.

Pulvermacher, S.C., Stauffer, L.T., and Stauffer, G.V. (2008). The role of the small regulatory RNA GcvB in GcvB/mRNA posttranscriptional regulation of oppA and dppA in Escherichia coli. FEMS Microbiol. Lett. 281, 42–50.

Rivas, E., and Eddy, S.R. (2001). Noncoding RNA gene detection using comparative sequence analysis. BMC Bioinformatics 2, 8.

Rivas, E., Klein, R.J., Jones, T.A., and Eddy, S.R. (2001). Computational identification of noncoding RNAs in E. coli by comparative genomics. Curr. Biol. 11, 1369–1373.

Romeo, T., Vakulskas, C.A., and Babitzke, P. (2013). Post-transcriptional regulation on a global scale: form and function of Csr/Rsm systems. Environ. Microbiol. 15, 313–324.

Saito, K., and Nomura, M. (1994). Post-transcriptional regulation of the str operon in Escherichia coli. Structural and mutational analysis of the target site for translational repressor S7. J. Mol. Biol. 235, 125–139.

Salgado, H., Peralta-Gil, M., Gama-Castro, S., Santos-Zavaleta, A., Muniz-Rascado, L., García-Sotelo, J.S., Weiss, V., Solano-Lira, H., Martínez-Flores, I., Medina-Rivera, A., et al. (2013). RegulonDB v8.0: omics data sets, evolutionary conservation, regulatory phrases, cross-validated gold standards and more. Nucleic Acids Res. 41, D203–D213.

Salim, N.N., Faner, M.A., Philip, J.A., and Feig, A.L. (2012). Requirement of upstream Hfq-binding (ARN)x elements in glmS and the Hfq C-terminal region for GlmS upregulation by sRNAs GlmZ and GlmY. Nucleic Acids Res. 40, 8021–8032.

Salvail, H., Lanthier-Bourbonnais, P., Sobota, J.M., Caza, M., Benjamin, J.-A.M., Mendieta, M.E.S., Lépine, F., Dozois, C.M., Imlay, J., and Massé, E. (2010). A small RNA promotes siderophore production through transcriptional and metabolic remodeling. Proc. Natl. Acad. Sci. U.S.A. 107, 15223–15228.

Schlax, P.J., Xavier, K.A., Gluick, T.C., and Draper, D.E. (2001). Translational repression of the Escherichia coli alpha operon mRNA: importance of an mRNA conformational switch and a ternary entrapment complex. J. Biol. Chem. 276, 38494–38501.

Sevostyanova, A., and Groisman, E.A. (2015). An RNA motif advances transcription by preventing Rho-dependent termination. Proc. Natl. Acad. Sci. U.S.A. 112, E6835–E6843.

Sharma, C.M., Darfeuille, F., Plantinga, T.H., and Vogel, J. (2007). A small RNA regulates multiple ABC transporter mRNAs by targeting C/A-rich elements inside and upstream of ribosome-binding sites. Genes Dev. 21, 2804–2817.

Shen, P., Zengel, J.M., and Lindahl, L. (1988). Secondary structure of the leader transcript from the Escherichia coli S10 ribosomal protein operon. Nucleic Acids Res. 16, 8905–8924.

Smith, A.C., and Cronan, J.E. (2014). Evidence against translational repression by the carboxyltransferase component of Escherichia coli acetyl coenzyme A carboxylase. J. Bacteriol. 196, 3768–3775.

Steward, K.L., and Linn, T. (1992). Transcription frequency modulates the efficiency of an attenuator preceding the rpoBC RNA polymerase genes of Escherichia coli: possible autogenous control. Nucleic Acids Res. 20, 4773–4779.

Tran, T.T., Zhou, F., Marshburn, S., Stead, M., Kushner, S.R., and Xu, Y. (2009). De novo computational prediction of non-coding RNA genes in prokaryotic genomes. Bioinformatics 25, 2897–2905.

Tsui, H.C., and Winkler, M.E. (1994). Transcriptional patterns of the mutL-miaA superoperon of Escherichia coli K-12 suggest a model for posttranscriptional regulation. Biochimie 76, 1168–1177.

Tsui, H.C., Feng, G., and Winkler, M.E. (1996). Transcription of the mutL repair, miaA tRNA modification, hfq pleiotropic regulator, and hflA region protease genes of Escherichia coli K-12 from clustered Esigma32-specific promoters during heat shock. J. Bacteriol. 178, 5719–5731.

Tsui, H.C., Leung, H.C., and Winkler, M.E. (1994). Characterization of broadly pleiotropic phenotypes caused by an hfq insertion mutation in Escherichia coli K-12. Mol. Microbiol. 13, 35–49.

Turnbough, C.L., and Switzer, R.L. (2008). Regulation of pyrimidine biosynthetic gene expression in bacteria: repression without repressors. Microbiol. Mol. Biol. Rev. 72, 266–300.

Urban, J.H., and Vogel, J. (2008). Two seemingly homologous noncoding RNAs act hierarchically to activate glmS mRNA translation. PLoS Biol. 6, e64.

Uzilov, A.V., Keegan, J.M., and Mathews, D.H. (2006). Detection of non-coding RNAs on the basis of predicted secondary structure formation free energy change. BMC Bioinformatics 7, 173.

Washietl, S., Hofacker, I.L., and Stadler, P.F. (2005). Fast and reliable prediction of noncoding RNAs. Proc. Natl. Acad. Sci. U.S.A. 102, 2454–2459.

Wikström, P.M., Byström, A.S., and Björk, G.R. (1988). Non-autogenous control of ribosomal protein synthesis from the trmD operon in Escherichia coli. J. Mol. Biol. 203, 141–152.

Wilson, L.A., and Sharp, P.M. (2006). Enterobacterial repetitive intergenic consensus (ERIC) sequences in Escherichia coli: Evolution and implications for ERIC-PCR. Mol. Biol. Evol. 23, 1156–1168.

Wirth, R., Littlechild, J., and Böck, A. (1982). Ribosomal protein S20 purified under mild conditions almost completely inhibits its own translation. Mol. Gen. Genet. 188, 164–166.

Yajnik, V., and Godson, G.N. (1993). Selective decay of Escherichia coli dnaG messenger RNA is initiated by RNase E. J. Biol. Chem. 268, 13253–13260.

Yakhnin, H., Yakhnin, A.V., Baker, C.S., Sineva, E., Berezin, I., Romeo, T., and Babitzke, P. (2011). Complex regulation of the global regulatory gene csrA: CsrA-mediated translational repression, transcription from five promoters by Eσ⁷⁰ and Eσ(S), and indirect transcriptional activation by CsrA. Mol. Microbiol. 81, 689–704.

Zengel, J.M., and Lindahl, L. (1996). A hairpin structure upstream of the terminator hairpin required for ribosomal protein L4-mediated attenuation control of the S10 operon of Escherichia coli. J. Bacteriol. 178, 2383–2387.

Zhang, Y., Mandava, C.S., Cao, W., Li, X., Zhang, D., Li, N., Zhang, Y., Zhang, X., Qin, Y., Mi, K., et al. (2015). HflX is a ribosome-splitting factor rescuing stalled ribosomes under stress conditions. Nat. Struct. Mol. Biol. 22, 906–913.
