Tight Regulation of Unstructured Proteins: from Transcript Synthesis to Protein Degradation
Jörg Gsponer,1* Matthias E. Futschik,1,2,3 Sarah A. Teichmann,1 M. Madan Babu1*
Altered abundance of several intrinsically unstructured proteins (IUPs) has been associated with perturbed cellular signaling that may lead to pathological conditions such as cancer. Therefore, it is important to understand how cells precisely regulate the availability of IUPs. We observed that regulation of transcript clearance, proteolytic degradation, and translational rate contribute to controlling the abundance of IUPs, some of which are present in low amounts and for short periods of time. Abundant phosphorylation and low stochasticity in transcription and translation indicate that the availability of IUPs can be finely tuned. Fidelity in signaling may require that most IUPs are available in appropriate amounts and not present longer than needed.
Up to one third of all eukaryotic proteins have large segments that are unstructured and are commonly referred to as intrinsically unstructured proteins (IUPs). These proteins lack a unique structure, either entirely or in parts, when alone in solution (1). The lack of structure is thought to provide several advantages, such as (i) an increased interaction surface area, (ii) conformational flexibility to interact with several targets, (iii) the presence of molecular recognition elements that fold upon binding, (iv) accessible posttranslational modification sites, and (v) the availability of short linear interaction motifs (2–5). These and other properties are ideal for proteins that mediate signaling and coordinate regulatory events, and indeed proteins that participate in regulatory and signaling functions are enriched in unstructured segments (6–8) [supporting online material (SOM) text S1]. Because of their unusual structural and important functional properties, the presence of IUPs in a cell may need to be carefully monitored. In fact, altered abundance of IUPs is associated with several disease conditions. For instance, overexpression of thyroid cancer 1 (TC-1) (9) or underexpression of adenosine 5′-diphosphate (ADP) ribosylation factor (Arf) (10) and p27 (11) has been linked with various types of cancer. Similarly, overexpression of α-synuclein and tau proteins increases the risk of aggregate formation and has been linked to Parkinson’s disease and Alzheimer’s disease (12, 13). We therefore tested whether specific control mechanisms affect the availability of IUPs (that is, their abundance and residence time) within a cell.
Using the Disopred2 software (6), which reports unstructured residues based on the protein sequence, we computed the fraction of the polypeptide that is predicted to be unstructured for every protein in Saccharomyces cerevisiae (14). This allowed us to categorize 1971 sequences as highly structured (“S”; 0 to 10% of the total length is unstructured), 2711 sequences as moderately unstructured (“M”; 10 to 30% of the protein is unstructured), and 2020 sequences as highly unstructured (“U”; 30 to 100% of the protein is unstructured) (Fig. 1). This information was integrated with different genome-scale datasets that describe most of the regulatory steps that influence protein synthesis or degradation (table S1 and fig. S1), and we compared the distributions of the values for the proteins in the U and S groups with statistical tests (14).
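The grouping step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes that per-residue disorder calls are already available as a string of hypothetical flags ("D" for disordered, anything else for ordered), and it assumes that proteins lying exactly on the 10% and 30% boundaries fall into the higher group, a detail the text does not specify. The protein names and flag strings are made up.

```python
def unstructured_fraction(flags: str) -> float:
    """Fraction of residues flagged 'D' (disordered) in a per-residue string."""
    if not flags:
        raise ValueError("empty sequence")
    return flags.count("D") / len(flags)


def classify(fraction: float) -> str:
    """Assign S, M, or U from the fraction of unstructured residues.

    Boundary handling (exactly 10% -> M, exactly 30% -> U) is an assumption.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if fraction < 0.10:
        return "S"  # highly structured
    if fraction < 0.30:
        return "M"  # moderately unstructured
    return "U"      # highly unstructured


# Hypothetical per-residue calls for three made-up proteins.
examples = {
    "protA": "O" * 95 + "D" * 5,   # 5% disordered
    "protB": "O" * 80 + "D" * 20,  # 20% disordered
    "protC": "O" * 40 + "D" * 60,  # 60% disordered
}
groups = {name: classify(unstructured_fraction(f)) for name, f in examples.items()}
print(groups)
```

Keeping the fraction calculation separate from the thresholding makes it straightforward to repeat the analysis with a different disorder predictor or different group boundaries.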
We compared the transcription of genes encoding highly unstructured proteins with that of genes encoding more structured proteins. Because the steady-state amount of mRNA could be affected by the rate at which the transcripts are produced or degraded, we investigated whether the transcriptional rate or the degradation rate differed between the transcripts that encode highly structured and highly unstructured proteins. The number of transcription factors (TFs) that regulate a gene was comparable between the two groups (P = 0.55, Wilcoxon test) (Fig. 2A). However, mRNAs encoding highly unstructured proteins were generally less abundant than transcripts encoding more structured proteins (P = 1 × 10⁻⁶, Wilcoxon test) (Fig. 2B). The mRNA half-lives of the transcripts that encode highly unstructured proteins were shorter than those of transcripts that encode more structured proteins (P < 10⁻¹⁶, Wilcoxon test) (Fig. 2C), and a comparison of the distribution of transcriptional rates showed that the difference between the two groups was less significant (P = 1 × 10⁻⁸, Wilcoxon test) (table S3). Thus, differences in decay rates appear to be a major factor leading to differences in mRNA abundance (SOM text S5).
We analyzed polyadenylate [poly(A)] tail length because the two major pathways of mRNA decay are initiated by removal of the poly(A) tail. A significantly larger fraction of the transcripts encoding unstructured proteins had a short poly(A) tail (table S1) than did those encoding structured proteins (P < 10⁻¹⁶, Fisher’s exact test) (Fig. 2D). Analysis of transcript binding by Puf family RNA-binding proteins, which affect transcript stability, showed that Puf5p binding was enriched for transcripts that encode highly unstructured proteins. In fact, 108 of the 224 transcripts bound by Puf5p encode highly unstructured proteins, a much greater number than the 68 transcripts expected by chance (z score, 5.3 SD; P = 5 × 10⁻¹⁰) (14). Thus, poly(A) tail length and interaction with specific RNA-binding proteins may modulate the stability of transcripts encoding IUPs (SOM text S5).
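The expected count quoted for Puf5p can be approximated with a simple sampling model. The sketch below assumes that the background is the full categorized proteome (6702 proteins, of which 2020 are in the U group) and that bound transcripts are drawn without replacement, that is, a hypergeometric model. The authors' actual background set and procedure are described in their methods (14) and may differ, so the z score and P value computed here should not be expected to reproduce the reported 5.3 SD and 5 × 10⁻¹⁰ exactly.

```python
import math

# Counts quoted in the text.
BOUND = 224    # transcripts bound by Puf5p
BOUND_U = 108  # of those, encoding highly unstructured proteins
# Assumed background: the whole categorized proteome.
TOTAL = 6702
TOTAL_U = 2020


def hypergeom_summary(n_drawn, k_hits, n_total, n_marked):
    """Expected hits, z score, and upper-tail P under sampling without replacement."""
    frac = n_marked / n_total
    expected = n_drawn * frac
    variance = n_drawn * frac * (1 - frac) * (n_total - n_drawn) / (n_total - 1)
    z = (k_hits - expected) / math.sqrt(variance)
    # Exact upper tail P(X >= k_hits), summed with integer binomial coefficients.
    tail = sum(
        math.comb(n_marked, k) * math.comb(n_total - n_marked, n_drawn - k)
        for k in range(k_hits, min(n_drawn, n_marked) + 1)
    )
    p_value = tail / math.comb(n_total, n_drawn)
    return expected, z, p_value


expected, z, p_value = hypergeom_summary(BOUND, BOUND_U, TOTAL, TOTAL_U)
print(f"expected = {expected:.1f}, z = {z:.2f}, P(X >= {BOUND_U}) = {p_value:.2e}")
```

Under this assumed background the expected count rounds to the 68 transcripts stated in the text. A permutation-based z score, or a background restricted to transcripts present in the binding dataset, would give somewhat different values.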
REPORT
1Medical Research Council (MRC) Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, UK. 2Institute for Theoretical Biology, Charité, Humboldt-University, Berlin XX-XXXX, Germany. 3Centre of Molecular and Structural Biomedicine, University of Algarve, Faro XXXXXX, Portugal.
*To whom correspondence should be addressed. E-mail:[email protected] (M.M.B.); [email protected] (J.G.)
[Fig. 1 panel, "Grouping proteins in the proteome of yeast": proteome of S. cerevisiae (6702 proteins) divided into highly structured, S, 1971 proteins (0-10% unstructured); moderately unstructured, M, 2711 proteins (10-30% unstructured); and highly unstructured, U, 2020 proteins (30-100% unstructured). Example structures shown: 2F3J, REF2-I mRNA export factor; 1U96, cytochrome c oxidase; 1YPI, triosephosphate isomerase.]
Fig. 1. The S. cerevisiae proteome was grouped into three categories, highly structured (S), moderately unstructured (M), and highly unstructured (U), based on the fraction of unstructured residues over the entire protein length.
www.sciencemag.org SCIENCE VOL 000 MONTH 2008 1
MS no: RE1163581/CF/BIOCHEM
Unstructured proteins tend to be less abundant than structured proteins (P < 10⁻¹⁶; Wilcoxon test) (Fig. 2F, fig. S2, and SOM text S2 and S5). The rate of protein synthesis was significantly lower (P < 10⁻¹⁶, Wilcoxon test) (Fig. 2E) and protein half-life was shorter (P = 1 × 10⁻¹⁵, Wilcoxon test) (Fig. 2G and SOM text S5) for highly unstructured than for more structured proteins. Two pathways that mediate ubiquitin-proteasome–dependent degradation are the N-end–rule pathway and the pxxxxxx exxxxxx sxxxxxx txxxxxx (PEST)–mediated degradation pathway. Although the distribution of N-end residues was not significantly different (SOM text S3 and figs. S3 and S4), a significantly greater fraction of the unstructured proteins contained PEST motifs (P < 10⁻¹⁶, Fisher’s exact test) (Fig. 2H and SOM text S5), as previously observed (1, 15). Therefore, it appears that the availability of many IUPs is regulated via proteolytic degradation and a reduced translational rate.
For certain IUPs (for example, p27), posttranslational modifications such as phosphorylation (11, 16) can affect their abundance or half-life in a cell. In fact, recent computational studies using phosphorylation site–prediction methods have suggested that unstructured regions are enriched for sites that can be posttranslationally modified (17). We analyzed the experimentally determined yeast kinase-substrate network and found that highly unstructured proteins are on average substrates of twice as many kinases as are structured proteins (P = 1 × 10⁻¹², Wilcoxon test) (SOM text S4 and fig. S5). On average, 51 ± 19% (SD) of all substrates of the kinases are highly unstructured, whereas only 19 ± 13% (SD) are highly structured [the remaining 30 ± 14% (SD) of all substrates are moderately unstructured]. This is a significant bias as compared with the expected genomewide distribution of ~30% highly unstructured and ~30% structured proteins based on our categorization (P < 10⁻¹⁶, Fisher’s exact test) (14). We found that 85% of the kinases for which more than 50% of their substrates are highly unstructured (table S2) are either regulated in a cell cycle–dependent manner (for example, Cdc28) or activated upon exposure to particular stimuli (for example, Fus3) or stress (for example, Atg1). This suggests that posttranslational modification of IUPs through phosphorylation may be an important mechanism in fine-tuning their function and possibly their availability under different conditions.
Fig. 2. Comparison of the distribution of values for various regulatory and cellular properties for the three different groups of proteins (S, M, and U) in S. cerevisiae: (A) transcriptional complexity, (B) mRNA abundance, (C) mRNA half-life, (D) poly(A) tail length (percentage of transcripts that have short poly(A) tail length within a group), (E) ribosomal density, (F) protein abundance, (G) protein half-life, (H) PEST-sequence content (percentage of proteins with the PEST motif within a group), (I) TATA box content (percentage of genes with a TATA box within a group), and (J) noise (distance from the median coefficient of variation) in protein production. A plus sign denotes statistically significant differences between the S and U groups (table S3). ORF, open reading frame; CV, cxxxxxx vxxxxxx.
We investigated whether genes that encode highly unstructured proteins display low stochasticity in their expression levels among individuals in a population of genetically homogeneous cells. An important source of such stochasticity in cellular systems is random noise in transcription and translation, which results in very different amounts of transcripts and protein products. We used the presence of a TATA box in the promoter region to infer genes that might be more subject to noise in gene expression (18) and found that genes encoding highly unstructured proteins are less likely to have a TATA box than those that encode structured proteins (P = 8 × 10⁻⁷, Fisher’s exact test) (Fig. 2I). At the protein level, analysis of direct experimental data revealed that unstructured proteins have lower noise levels than do structured proteins (P = 3 × 10⁻¹¹, Wilcoxon test) (Fig. 2J). These results indicate that highly unstructured proteins may display less noise in transcription and translation than more structured proteins.
To assess regulation of IUPs in other organisms, we analyzed datasets (table S1) for Schizosaccharomyces pombe and Homo sapiens. Similar trends to those observed for S. cerevisiae were evident in these organisms (Fig. 3 and tables S3 and S4). Thus, both unicellular and multicellular organisms appear to regulate the availability of IUPs. The observed differences between structured and unstructured proteins were independent of the IUP prediction method used, protein length, localization within the major subcellular compartments, different grouping of proteins, or the number of interaction partners per protein (tables S6 to S11). Though the distributions of the values for the proteins in the structured and unstructured groups are broad and overlap, the differences reported here are consistently statistically significant with two independent statistical tests, the Wilcoxon rank-sum test and the Kolmogorov-Smirnov test (tables S3 to S5) (14). Thus, the reported trends appear to be attributable to the intrinsically unstructured nature of the proteins. Of all the IUPs, those that contain polyglutamine [poly(Q)] or pxxxxxxxx [poly(N)] stretches seem to be more tightly controlled (tables S3 and S4).
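The two tests named above compare distributions in different ways: the rank-sum test asks whether values in one group tend to be larger than those in the other, whereas the Kolmogorov-Smirnov statistic measures the largest gap between the two empirical cumulative distributions. The following standard-library sketch illustrates both under stated simplifications: the rank-sum P value uses a normal approximation without tie or continuity correction, no P value is computed for the Kolmogorov-Smirnov statistic, and the input values are made-up placeholders, not data from the study.

```python
import math
from bisect import bisect_left, bisect_right


def mann_whitney_u(x, y):
    """U statistic: number of (x, y) pairs with x > y, counting ties as 0.5."""
    ys = sorted(y)
    u = 0.0
    for v in x:
        below = bisect_left(ys, v)           # y values strictly below v
        tied = bisect_right(ys, v) - below   # y values equal to v
        u += below + 0.5 * tied
    return u


def rank_sum_test(x, y):
    """z score and two-sided P from the normal approximation (no tie correction)."""
    n, m = len(x), len(y)
    mean = n * m / 2
    sd = math.sqrt(n * m * (n + m + 1) / 12)
    z = (mann_whitney_u(x, y) - mean) / sd
    return z, math.erfc(abs(z) / math.sqrt(2))


def ks_statistic(x, y):
    """Largest absolute difference between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    return max(
        abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
        for v in xs + ys
    )


# Made-up half-life values (arbitrary units) for two small groups.
u_group = [1, 2, 3, 4, 5]
s_group = [6, 7, 8, 9, 10]
z, p = rank_sum_test(u_group, s_group)
d = ks_statistic(u_group, s_group)
print(f"z = {z:.2f}, two-sided P = {p:.3g}, KS D = {d:.2f}")
```

For real analyses an established implementation, such as the rank-sum and two-sample Kolmogorov-Smirnov routines in SciPy, would be preferable because it handles ties and small samples properly.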
Proteins with unstructured regions predominantly have signaling or regulatory roles and are often reused in multiple pathways to produce different physiological outcomes (2, 6–8). Accordingly, increased abundance of IUPs can result in undesirable interactions (for example, titration of unrelated proteins by inappropriate interaction through exposed peptide motifs), thereby disturbing the fine balance of the signaling networks and leading to dampened or inappropriate activities (19). Spatial and temporal segregation of signaling proteins as well as an increased signaling complexity may contribute to fidelity in regulation (SOM text S5 and fig. S6). In addition, tight regulation of signaling and regulatory IUPs could minimize the potentially harmful effects of ectopic interactions and may provide robustness to signaling processes by ensuring that such proteins are present in appropriate amounts and for appropriate time periods. Indeed, free IκB [an IUP that interacts with and inhibits the nuclear factor κB (NF-κB) transcription factor] must be continuously degraded to allow for rapid and robust NF-κB activation and to make the pathway sensitive to a signaling input (20, 21). In contrast, stabilization of free IκB by removal of the PEST motif reduces NF-κB activation (20, 21). In mammalian cells exposed to mild exxxxxx rxxxxxx stress, survival is favored as a direct consequence of the intrinsic instability of mRNAs and of proteins that drive apoptosis (such as Chop, an IUP) as compared with that of proteins that promote adaptation and cell longevity (for example, the chaperone protein BiP, a structured protein) (22).
Fig. 3. Comparison of various regulatory and cellular parameters for the three groups of proteins (S, M, and U) in S. pombe (A to D) and humans (E to G). All reported differences are statistically significant (table S4). A.U., arbitrary units; mya, mxxxxxx yxxxxxx axxxxxx.
Although the abundance of many IUPs is strictly controlled, certain IUPs are present in cells in large amounts or for long periods of time. In fact, in some cases (for example, the fibrous muscle protein titin), large amounts may be required throughout the lifetime of a cell. Fine-tuning of IUP availability can be achieved with posttranslational modifications and interactions with other factors (1, 3, 11, 16). Both mechanisms can promote increased abundance or longer half-life through changes in cellular localization or by protection from the degradation machinery (for example, certain phosphorylated forms of the cyclin-dependent kinase inhibitory protein p27Kip1 and the spinocerebellar ataxia type 1 protein ataxin-1) (23, 24). Although association with other proteins may increase their stability, free IUPs are likely to be rapidly degraded by the 20S proteasome via degradation by default (25), as shown for the unbound forms of p21Cip1 (26), p27Kip1 (27), α-synuclein (28), and tau (29). Certain posttranslational modifications may promote regulated degradation (for example, that of p27Kip1) (11, 16). In this context, our finding that many IUPs tend to be phosphorylated by multiple kinases and display low noise in transcription and translation suggests that their abundance and half-lives may be finely tuned in cells (SOM text S5).
Our studies reveal an evolutionarily conserved tight control of synthesis and clearance of most IUPs. The discovery was made possible by integrating multiple large-scale datasets that describe control mechanisms during transcription, translation, and posttranslational modification with structural information on proteins. Besides the elucidation of general trends, the framework describing multiple levels of regulation introduced here may facilitate investigation of how individual IUPs are fine-tuned in different cell types and how perturbations to this tight control might influence disease conditions (30).
References and Notes
1. A. K. Dunker et al., J. Mol. Graph. Model. 19, 26 (2001).
2. P. E. Wright, H. J. Dyson, J. Mol. Biol. 293, 321 (1999).
3. R. W. Kriwacki, L. Hengst, L. Tennant, S. I. Reed, P. E. Wright, Proc. Natl. Acad. Sci. U.S.A. 93, 11504 (1996).
4. P. Tompa, FEBS Lett. 579, 3346 (2005).
5. C. J. Oldfield et al., Biochemistry 44, 12454 (2005).
6. J. J. Ward, J. S. Sodhi, L. J. McGuffin, B. F. Buxton, D. T. Jones, J. Mol. Biol. 337, 635 (2004).
7. J. Liu et al., Biochemistry 45, 6873 (2006).
8. L. M. Iakoucheva, C. J. Brown, J. D. Lawson, Z. Obradovic, A. K. Dunker, J. Mol. Biol. 323, 573 (2002).
9. M. Sunde et al., Cancer Res. 64, 2766 (2004).
10. C. J. Sherr, Nat. Rev. Cancer 6, 663 (2006).
11. M. Grimmler et al., Cell 128, 269 (2007).
12. F. Chiti, C. M. Dobson, Annu. Rev. Biochem. 75, 333 (2006).
13. M. Goedert, Nat. Rev. Neurosci. 2, 492 (2001).
14. Materials and methods are available as supporting material on Science Online.
15. P. Tompa, J. Prilusky, I. Silman, J. L. Sussman, Proteins XX, XXXX (2007).
16. I. Chu et al., Cell 128, 281 (2007).
17. L. M. Iakoucheva et al., Nucleic Acids Res. 32, 1037 (2004).
18. J. M. Raser, E. K. O'Shea, Science 304, 1811 (2004).
19. T. Pawson, N. Warner, Oncogene 26, 1268 (2007).
20. E. Mathes, E. L. O'Dea, A. Hoffmann, G. Ghosh, EMBO J. 27, 1421 (2008).
21. E. L. O'Dea et al., Mol. Syst. Biol. 3, 111 (2007).
22. D. T. Rutkowski et al., PLoS Biol. 4, e374 (2006).
23. H. K. Chen et al., Cell 113, 457 (2003).
24. I. M. Chu, L. Hengst, J. M. Slingerland, Nat. Rev. Cancer 8, 253 (2008).
25. G. Asher, N. Reuven, Y. Shaul, Bioessays 28, 844 (2006).
26. X. Li et al., Mol. Cell 26, 831 (2007).
27. W. S. Tambyrajah, L. D. Bowler, C. Medina-Palazon, A. J. Sinclair, Arch. Biochem. Biophys. 466, 186 (2007).
28. G. K. Tofaris, R. Layfield, M. G. Spillantini, FEBS Lett. 509, 22 (2001).
29. D. C. David et al., J. Neurochem. 83, 176 (2002).
30. W. E. Balch, R. I. Morimoto, A. Dillin, J. W. Kelly, Science 319, 916 (2008).
31. The authors acknowledge MRC for funding and thank the two anonymous referees, the editor, A. Bertolotti, A. Tzakos, A. Wuster, C. Brockmann, C. Chothia, D. Finley, D. Rubinsztein, E. Levy, H. McMahon, I. Schafer, J. Bui, K. Weber, M. Goedert, R. Janky, S. Michnick, and S. Sarkar for providing helpful comments. We apologize for not citing, because of lack of space, the work that describes the datasets and other references; many can be found in the SOM. M.M.B. acknowledges Darwin College and Schlumberger Ltd, and M.M.B. and M.F. acknowledge the Royal Society for support. J.G. is funded by an MRC Special Training Fellowship in computational biology.
Supporting Online Material
www.sciencemag.org/cgi/content/full/[vol]/[issue no.]/[page]/DC1
Materials and Methods
SOM Text S1 to S5
Figures S1 to S6
Tables S1 to S12
References
22 July 2008; accepted 9 October 2008
10.1126/science.1163581