tight regulation of unstructured proteins: from … · tight regulation of unstructured proteins:...

4
Tight Regulation of Unstructured Proteins: from Transcript Synthesis to Protein Degradation Jörg Gsponer, 1 * Matthias E. Futschik, 1,2,3 Sarah A. Teichmann, 1 M. Madan Babu 1 * Altered abundance of several intrinsically unstructured proteins (IUPs) has been associated with perturbed cellular signaling that may lead to pathological conditions such as cancer. Therefore, it is important to understand how cells precisely regulate the availability of IUPs. We observed that regulation of transcript clearance, proteolytic degradation, and translational rate contribute to controlling the abundance of IUPs, some of which are present in low amounts and for short periods of time. Abundant phosphorylation and low stochasticity in transcription and translation indicates that the availability of IUPs can be finely tuned. Fidelity in signaling may require that most IUPs are available in appropriate amounts and not present longer than needed. U p to one third of all eukaryotic proteins have large segments that are unstructured and are commonly referred to as intrin- sically unstructured proteins (IUPs). These pro- teins lack a unique structure, either entirely or in parts, when alone in solution (1). The lack of structure is thought to provide several advan- tages, such as (i) an increased interaction surface area, (ii) conformational flexibility to interact with several targets, (iii) the presence of molecular recognition elements that fold upon binding, (iv) accessible posttranslational modification sites, and (v) the availability of short linear interaction motifs (25). These and other properties are ideal for proteins that mediate signaling and coordinate regulatory events, and indeed proteins that par- ticipate in regulatory and signaling functions are enriched in unstructured segments (68) [sup- porting online material (SOM) text S1]. Because of their unusual structural and important func- tional properties, the presence of IUPs in a cell may need to be carefully monitored. In fact, al- tered abundance of IUPs is associated with several disease conditions. For instance, overexpression of thyroid cancer 1 (TC-1) (9) or underexpression of adenosine 5´-diphosphate (ADP ) ribosylation factor (Arf) (10) and p27 (11) has been linked with various types of cancer. Similarly, overexpression of a-synuclein and tau proteins increases the risk of aggregate formation and has been linked to Parkinsons disease and Alzheimers disease (12, 13). We therefore tested whether specific control mechanisms affect the availability of IUPs (that is, their abundance and residence time) with- in a cell. Using the Disopred2 software ( 6), which reports unstructured residues based on the protein sequence, we computed the fraction of the polypeptide that is predicted to be unstructured for every protein in Saccharomyces cerevisiae ( 14). This allowed us to categorize 1971 sequences as highly structured (S; 0 to 10% of the total length is unstructured), 2711 sequences as moderately unstructured ( M; 10 to 30% of the protein is unstructured), and 2020 sequences as highly unstructured ( U; 30 to 100% of the protein is unstructured) (Fig. 1). This in- formation was integrated with different genome- scale datasets that describe most of the regulatory steps that influence protein synthesis or degrada- tion (table S1 and fig. S1), and we compared the distributions of the values for the proteins in the U and S groups with statistical tests (14). We compared the transcription of genes en- coding highly unstructured proteins with that of genes encoding more structured proteins. Be- cause the steady-state amount of mRNA could be affected by the rate at which the transcripts are produced or degraded, we investigated whether the transcriptional rate or the degradation rate were different for the transcripts that encode highly structured and unstructured proteins. The number of transcription factors (TFs) that regu- late a gene was comparable between the two groups (P = 0.55, Wilcoxon test) (Fig. 2A). However, mRNAs encoding highly unstructured proteins were generally less abundant than transcripts encoding more structured proteins (P = 1 × 10 -6 , Wilcoxon test) (Fig. 2B). The mRNA half-lives of the transcripts that encode highly unstructured proteins were lower than transcripts that encode more structured proteins (P< 10 -16 , Wilcoxon test) (Fig. 2C) and a comparison of the dis- tribution of transcriptional rates showed that the difference between the two groups was less sig- nificant (P = 1 × 10 -8 , Wilcoxon test) (table S3). Thus, differences in decay rates appear to be a major factor leading to differences in mRNA abundance (SOM text S5). We analyzed polyadenylate [poly(A)] tail length because the two major pathways of mRNA decay are initiated by removal of the poly(A) tail. A significantly larger fraction of the unstructured proteins had a short poly(A) tail (table S1) than did structured proteins (P < 10 -16 , Fishers exact test) (Fig. 2D). Analysis of transcript binding by Puf family RNA-binding proteins, which affect transcript stability, showed that Puf5p binding was enriched for transcripts that encode highly unstructured proteins. In fact, 108 of the 224 transcripts bound by Puf5p encode highly un- structured proteins, a much greater number than expected by chance, which was 68 transcripts ( z score, 5.3 SD; P = 5 × 10 -10 )(14). Thus, poly(A) tail length and interaction with specific RNA- binding proteins may modulate the stability of transcripts encoding IUPs (SOM text S5). REPORT 1 Medical Research Council (MRC) Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, UK. 2 Institute for Theoretical Biology, Charité, Humboldt-University, Berlin XX- XXXX, Germany. 3 Centre of Molecular and Structural Bio- medicine, University of Algarve, Faro XXXXXX, Portugal. *To whom correspondence should be addressed. E-mail: [email protected] (M.M.B.); jgsponer@mrc-lmb. cam.ac.uk (J.G.) Proteome of S. cerevisiae (6702 proteins) Highly structured, S 1971 proteins Moderately unstructured, M 2711 proteins Highly unstructured, U 2020 proteins 2F3J REF2-I mRNA export factor 1U96 Cytochrome C oxidase 1YPI Triosephosphate isomerase (0-10% unstructured) (10-30% unstructured) (30-100% unstructured) Grouping proteins in the proteome of yeast Fig. 1. The S. cerevisiae proteome was grouped into three categories, highly structured (S), moderately unstructured (M), and highly unstructured (U), based on the fraction of unstructured residues over the entire protein length. www.sciencemag.org SCIENCE VOL 000 MONTH 2008 1 MS no: RE1163581/CF/BIOCHEM

Upload: vutuong

Post on 29-Jul-2018

218 views

Category:

Documents


0 download

TRANSCRIPT

Tight Regulation of UnstructuredProteins: from Transcript Synthesisto Protein DegradationJörg Gsponer,1* Matthias E. Futschik,1,2,3 Sarah A. Teichmann,1 M. Madan Babu1*

Altered abundance of several intrinsically unstructured proteins (IUPs) has been associatedwith perturbed cellular signaling that may lead to pathological conditions such as cancer.Therefore, it is important to understand how cells precisely regulate the availability of IUPs. Weobserved that regulation of transcript clearance, proteolytic degradation, and translational ratecontribute to controlling the abundance of IUPs, some of which are present in low amounts and forshort periods of time. Abundant phosphorylation and low stochasticity in transcription andtranslation indicates that the availability of IUPs can be finely tuned. Fidelity in signaling mayrequire that most IUPs are available in appropriate amounts and not present longer than needed.

Up to one third of all eukaryotic proteinshave large segments that are unstructuredand are commonly referred to as intrin-

sically unstructured proteins (IUPs). These pro-teins lack a unique structure, either entirely or inparts, when alone in solution (1). The lack ofstructure is thought to provide several advan-tages, such as (i) an increased interaction surfacearea, (ii) conformational flexibility to interact withseveral targets, (iii) the presence of molecularrecognition elements that fold upon binding, (iv)accessible posttranslational modification sites,and (v) the availability of short linear interactionmotifs (2–5). These and other properties are idealfor proteins that mediate signaling and coordinateregulatory events, and indeed proteins that par-ticipate in regulatory and signaling functions areenriched in unstructured segments (6–8) [sup-porting online material (SOM) text S1]. Becauseof their unusual structural and important func-tional properties, the presence of IUPs in a cellmay need to be carefully monitored. In fact, al-tered abundance of IUPs is associatedwith severaldisease conditions. For instance, overexpressionof thyroid cancer 1 (TC-1) (9) or underexpressionof adenosine 5´-diphosphate (ADP ) ribosylationfactor (Arf) (10) and p27 (11) has been linkedwithvarious types of cancer. Similarly, overexpressionof a-synuclein and tau proteins increases therisk of aggregate formation and has been linkedto Parkinson’s disease and Alzheimer’s disease(12, 13). We therefore tested whether specificcontrol mechanisms affect the availability of IUPs(that is, their abundance and residence time) with-in a cell.

Using theDisopred2 software (6),which reportsunstructured residues based on the protein sequence,we computed the fraction of the polypeptide that ispredicted to be unstructured for every protein inSaccharomyces cerevisiae (14). This allowed us tocategorize 1971 sequences as highly structured(“S”; 0 to 10% of the total length is unstructured),2711 sequences asmoderately unstructured (“M”; 10to 30% of the protein is unstructured), and 2020sequences as highly unstructured (“U”; 30 to 100%of the protein is unstructured) (Fig. 1). This in-formation was integrated with different genome-scale datasets that describe most of the regulatorysteps that influence protein synthesis or degrada-tion (table S1 and fig. S1), and we compared thedistributions of the values for the proteins in theU and S groups with statistical tests (14).

We compared the transcription of genes en-coding highly unstructured proteins with that ofgenes encoding more structured proteins. Be-

cause the steady-state amount of mRNA could beaffected by the rate at which the transcripts areproduced or degraded, we investigated whetherthe transcriptional rate or the degradation ratewere different for the transcripts that encodehighly structured and unstructured proteins. Thenumber of transcription factors (TFs) that regu-late a genewas comparable between the twogroups(P = 0.55, Wilcoxon test) (Fig. 2A). However,mRNAs encoding highly unstructured proteinswere generally less abundant than transcriptsencoding more structured proteins (P = 1 × 10−6,Wilcoxon test) (Fig. 2B). The mRNA half-livesof the transcripts that encode highly unstructuredproteins were lower than transcripts that encodemore structured proteins (P < 10−16, Wilcoxontest) (Fig. 2C) and a comparison of the dis-tribution of transcriptional rates showed that thedifference between the two groups was less sig-nificant (P = 1 × 10−8, Wilcoxon test) (table S3).Thus, differences in decay rates appear to be amajor factor leading to differences in mRNAabundance (SOM text S5).

We analyzed polyadenylate [poly(A)] taillength because the twomajor pathways of mRNAdecay are initiated by removal of the poly(A) tail.A significantly larger fraction of the unstructuredproteins had a short poly(A) tail (table S1) thandid structured proteins (P < 10−16, Fisher’s exacttest) (Fig. 2D). Analysis of transcript binding byPuf family RNA-binding proteins, which affecttranscript stability, showed that Puf5p bindingwas enriched for transcripts that encode highlyunstructured proteins. In fact, 108 of the 224transcripts bound by Puf5p encode highly un-structured proteins, a much greater number thanexpected by chance, which was 68 transcripts(z score, 5.3 SD;P=5× 10−10) (14). Thus, poly(A)tail length and interaction with specific RNA-binding proteins may modulate the stability oftranscripts encoding IUPs (SOM text S5).

REPORT

1Medical Research Council (MRC) Laboratory of MolecularBiology, Hills Road, Cambridge CB2 0QH, UK. 2Institute forTheoretical Biology, Charité, Humboldt-University, Berlin XX-XXXX, Germany. 3Centre of Molecular and Structural Bio-medicine, University of Algarve, Faro XXXXXX, Portugal.

*To whom correspondence should be addressed. E-mail:[email protected] (M.M.B.); [email protected] (J.G.)

Proteome of S. cerevisiae(6702 proteins)

Highly structured, S1971 proteins

Moderately unstructured, M2711 proteins

Highly unstructured, U2020 proteins

2F3JREF2-I mRNA export factor

1U96 Cytochrome C oxidase

1YPI Triosephosphate isomerase

(0-10% unstructured) (10-30% unstructured) (30-100% unstructured)

Grouping proteins in the proteome of yeast

Fig. 1. The S. cerevisiae proteome was grouped into three categories, highly structured (S),moderately unstructured (M), and highly unstructured (U), based on the fraction of unstructuredresidues over the entire protein length.

www.sciencemag.org SCIENCE VOL 000 MONTH 2008 1

MS no: RE1163581/CF/BIOCHEM

Unstructured proteins tend to be less abundantthan structured proteins (P < 10−16; Wilcoxontest) (Fig. 2F, fig. S2, and SOM text S2 and S5).The rate of protein synthesis was significantlylower (P < 10−16, Wilcoxon test) (Fig. 2E) andprotein half-life was shorter (P = 1 × 10−15,Wilcoxon test) (Fig. 2G and SOM text S5) forhighly unstructured than for more structuredproteins. Two pathways that mediate ubquitinproteasome–dependent degradation are the N-end–rule pathway and pxxxxxx exxxxxx sxxxxxxtxxxxxx (PEST)–mediated degradation path-way. Although the distribution of N-end residueswas not significantly different (SOM text S3 andfigs. S3 and S4), a significantly greater fraction ofthe unstructured proteins contained PEST motifs(P < 10−16, Fisher’s exact test) (Fig. 2H and SOMtext S5) as previously observed (1, 15). Therefore,

it appears that the availability of many IUPs isregulated via proteolytic degradation and a re-duced translational rate.

For certain IUPs (for example, p27), post-translational modifications such as phosphoryl-ation (11, 16) can affect their abundance or half-lifein a cell. In fact, recent computational studies usingphosphorylation site–prediction methods havesuggested that unstructured regions are enrichedfor sites that can be posttranslationally modified(17). We analyzed the experimentally determinedyeast kinase-substrate network and found thathighly unstructured proteins are on average sub-strates of twice as many kinases as are structuredproteins (P = 1 × 10−12, Wilcoxon test) (SOM textS4 and fig. S5). On average, 51 T 19% (SD) of allsubstrates of the kinases are highly unstructured,whereas only 19 T 13% (SD) are highly structured

[the remaining 30 T 14% (SD) of all substrates aremoderately unstructured]. This is a significant biasas compared with the expected genomewide dis-tribution of ~30% highly unstructured and ~30%structured proteins based on our categorization(P < 10−16, Fisher’s exact test) (14). We foundthat 85% of the kinases for which more than 50%of their substrates are highly unstructured (tableS2) are either regulated in a cell cycle–dependentmanner (for example, Cdc28) or activated uponexposure to particular stimuli (for example, Fus3)or stress (for example, Atg1). This suggests thatposttranslational modification of IUPs throughphosphorylation may be an important mechanismin fine-tuning their function and possibly theiravailability under different conditions.

We investigated whether genes that encodehighly unstructured proteins display low stochas-

Fig. 2. Comparison of the distribution of values for various regulatory andcellular properties for the three different groups of proteins (S, M, and U) inS. cerevisiae: (A) transcriptional complexity, (B) mRNA abundance, (C) mRNAhalf-life, (D) poly(A) tail length (percentage of transcripts that have short poly(A)tail length within a group), (E) ribosomal density, (F) protein abundance, (G)protein half-life, (H) PEST-sequence content (percentage of proteins with the PESTmotif within a group), (I) TATA box content (percentage of genes with a TATA boxwithin a group), and (J) noise (distance from themedian coefficient of variation) inprotein production. A plus sign denotes statistically significant differences betweenthe S and U group (table S3). ORF, open reading frame; CV, cxxxxxx vxxxxxx.

MONTH 2008 VOL 000 SCIENCE www.sciencemag.org2

REPORT

MS no: RE1163581/CF/BIOCHEM

ticity in their expression levels among individualsin a population of genetically homogeneous cells.An important source of such stochasticity in cel-lular systems is random noise in transcription andtranslation, which results in very different amountsof transcripts and protein products. We used thepresence of a TATA box in the promoter region toinfer genes that might be more subjected to noisein gene expression (18) and found that genesencoding highly unstructured proteins are lesslikely to have a TATA box than those that encodestructured proteins (P = 8 × 10−7, Fisher’s exacttest) (Fig. 2I). At the protein level, analysis ofdirect experimental data revealed that unstructuredproteins have lower noise levels than do struc-tured proteins (P = 3 × 10−11, Wilcoxon test)(Fig. 2J). These results indicate that highly un-structured proteins may display less noise in tran-scription and translation than more structuredproteins.

To assess regulation of IUPs in other orga-nisms, we analyzed datasets (table S1) for Schizo-saccharomyces pombe andHomo sapiens. Similartrends to those observed for S. cerevisiae wereevident in these organisms (Fig. 3 and tables S3and S4). Thus, both unicellular and multicellularorganisms appear to regulate the availability ofIUPs. The observed differences between struc-tured and unstructured proteins were independentof the IUP prediction method used, proteinlength, localization within the major subcellular

compartments, different grouping of proteins, orthe number of interaction partners per protein(tables S6 to S11). Though the distributions of thevalues for the proteins in the structured and un-structured groups are broad and overlap, the dif-ferences reported here are consistently statisticallysignificant with two independent statistical tests,theWilcoxon rank-sum test and the Kolmogorov-Smirnov test (tables S3 to S5) (14). Thus, thereported trends appear to be attributable to theintrinsically unstructured nature of the proteins.Of all the IUPs, those that contain polyglutamine[poly(Q)] or pxxxxxxxx [poly(N)] stretches seemto be more tightly controlled (table S3 and S4).

Proteins with unstructured regions predom-inantly have signaling or regulatory roles and areoften reused in multiple pathways to produce dif-ferent physiological outcomes (2, 6–8). Accord-ingly, increased abundance of IUPs can result inundesirable interactions (for example, titration ofunrelated proteins by inappropriate interactionthrough exposed peptide motifs), thereby disturb-ing the fine balance of the signaling networksleading to dampened or inappropriate activities(19). Spatial and temporal segregation of signal-ing proteins as well as an increased signalingcomplexitymay contribute to fidelity in regulation(SOM text S5 and fig. S6). In addition, tightregulation of signaling and regulatory IUPs couldminimize the potentially harmful effects of ectopicinteractions and may provide robustness to sig-

naling processes by ensuring that such proteinsare present in appropriate amounts and timeperiods. Indeed, free IkB [an IUP that interactswith and inhibits the nuclear factor kB (NF-kB)transcription factor] must be continuously degradedto allow for rapid and robust NF-kB activationand to make the pathway sensitive for a signalinginput (20, 21). In contrast, stabilization of freeIkB by removal of the PESTmotif reduces NF-kBactivation (20, 21). In mammalian cells exposedtomild exxxxxx rxxxxxx stress, survival is favoredas a direct consequence of the intrinsic instabilityof mRNAs and of proteins that drive apoptosis(such as Chop, an IUP) as compared with that ofproteins that promote adaptation and cell longevity(for example, the chaperone protein BiP, a struc-tured protein) (22).

Although the abundance of many IUPs isstrictly controlled, certain IUPs are present incells in large amounts or for long periods of time.In fact, in some cases (for example, the fibrousmuscle protein titin), large amounts may be re-quired throughout the lifetime of a cell. Fine-tuning of IUP availability can be achieved withposttranslational modifications and interactionswith other factors (1, 3, 11, 16). Both mecha-nisms can promote increased abundance orlonger half-life through changes in cellular local-ization or by protection from the degradationmachinery (for example, certain phosphorylatedforms of the cyclin-dependent kinase inhibitory

Fig. 3. Comparison of various regulatory and cellular parameters for the three groups of proteins (S, M, and U) in S. pombe (A to D) and humans (E to G).All reported differences are statistically significant (table S4). A.U., arbitrary units; mya, mxxxxxx yxxxxxx axxxxxx.

www.sciencemag.org SCIENCE VOL 000 MONTH 2008 3

REPORT

MS no: RE1163581/CF/BIOCHEM

protein p27kip1 and the spinocerebellar ataxiatype 1 protein ataxin-1) (23, 24). Although asso-ciation with other proteins may increase theirstability, free IUPs are likely to be rapidly degradedby the 20S proteasome via degradation by default(25), as shown for the unbound forms of p21cip1

(26), p27kip1 (27), a-synuclein (28), and tau (29).Certain posttranslational modifications may pro-mote regulated degradation (for example, that ofp27Kip1) (11, 16). In this context, our finding thatmany IUPs tend to be phosphorylated by mul-tiple kinases and display low noise in transcrip-tion and translation suggests that their abundanceand half-lives may be finely tuned in cells (SOMtext S5).

Our studies reveal an evolutionarily con-served tight control of synthesis and clearanceof most IUPs. The discovery was made possibleby integrating multiple large-scale datasets thatdescribe control mechanisms during tran-scription, translation, and post-translational mod-ification with structural information on proteins.Besides the elucidation of general trends, theframework describing multiple levels of regula-tion introduced here may facilitate investigationof how individual IUPs are fine-tuned in differentcell types and how perturbations to this tightcontrol might influence disease conditions (30).

References and Notes1. A. K. Dunker et al., J. Mol. Graph. Model. 19, 26 (2001).2. P. E. Wright, H. J. Dyson, J. Mol. Biol. 293, 321 (1999).3. R. W. Kriwacki, L. Hengst, L. Tennant, S. I. Reed,

P. E. Wright, Proc. Natl. Acad. Sci. U.S.A. 93, 11504(1996).

4. P. Tompa, FEBS Lett. 579, 3346 (2005).5. C. J. Oldfield et al., Biochemistry 44, 12454 (2005).6. J. J. Ward, J. S. Sodhi, L. J. McGuffin, B. F. Buxton,

D. T. Jones, J. Mol. Biol. 337, 635 (2004).7. J. Liu et al., Biochemistry 45, 6873 (2006).8. L. M. Iakoucheva, C. J. Brown, J. D. Lawson, Z. Obradovic,

A. K. Dunker, J. Mol. Biol. 323, 573 (2002).9. M. Sunde et al., Cancer Res. 64, 2766 (2004).10. C. J. Sherr, Nat. Rev. Cancer 6, 663 (2006).11. M. Grimmler et al., Cell 128, 269 (2007).12. F. Chiti, C. M. Dobson, Annu. Rev. Biochem. 75, 333

(2006).13. M. Goedert, Nat. Rev. Neurosci. 2, 492 (2001).14. Materials and methods are available as supporting

material on Science Online.15. P. Tompa, J. Prilusky, I. Silman, J. L. Sussman, Proteins

XX, XXXX (2007).16. I. Chu et al., Cell 128, 281 (2007).17. L. M. Iakoucheva et al., Nucleic Acids Res. 32, 1037

(2004).18. J. M. Raser, E. K. O'Shea, Science 304, 1811 (2004).19. T. Pawson, N. Warner, Oncogene 26, 1268 (2007).20. E. Mathes, E. L. O'Dea, A. Hoffmann, G. Ghosh, EMBO J.

27, 1421 (2008).21. E. L. O'Dea et al., Mol. Syst. Biol. 3, 111 (2007).22. D. T. Rutkowski et al., PLoS Biol. 4, e374 (2006).23. H. K. Chen et al., Cell 113, 457 (2003).24. I. M. Chu, L. Hengst, J. M. Slingerland, Nat. Rev. Cancer

8, 253 (2008).

25. G. Asher, N. Reuven, Y. Shaul, Bioessays 28, 844(2006).

26. X. Li et al., Mol. Cell 26, 831 (2007).27. W. S. Tambyrajah, L. D. Bowler, C. Medina-Palazon,

A. J. Sinclair, Arch. Biochem. Biophys. 466, 186(2007).

28. G. K. Tofaris, R. Layfield, M. G. Spillantini, FEBS Lett.509, 22 (2001).

29. D. C. David et al., J. Neurochem. 83, 176 (2002).30. W. E. Balch, R. I. Morimoto, A. Dillin, J. W. Kelly, Science

319, 916 (2008).31. The authors acknowledge MRC for funding and thank the

two anonymous referees, the editor, A. Bertolotti,A. Tzakos, A. Wuster, C. Brockmann, C. Chothia, D. Finley,D. Rubinsztein, E. Levy, H. McMahon, I. Schafer, J. Bui,K. Weber, M. Goedert, R. Janky, S. Michnick, andS. Sarkar for providing helpful comments. We apologizefor, because of lack of space, not citing the work thatdescribes the datasets and other references . Many can befound in the SOM. M.M.B. acknowledges Darwin Collegeand Schlumberger Ltd, and M.M.B. and M.F. acknowledgethe Royal Society for support. J.G. is funded by an MRCSpecial Training Fellowship in computational biology.

Supporting Online Materialwww.sciencemag.org/cgi/content/full/[vol]/[issue no.]/[page]/DC1Materials and MethodsSOM Text S1 to S5Figures S1 to S6Tables S1 to S12References

22 July 2008; accepted 9 October 200810.1126/science.1163581

MONTH 2008 VOL 000 SCIENCE www.sciencemag.org4

REPORT

MS no: RE1163581/CF/BIOCHEM