Bioinformatics Practical for Biochemists Andrei Lupas, Birte Höcker, Steffen Schmidt SS 2012 03. Sequence Features


Bioinformatics Practical for Biochemists

Andrei Lupas, Birte Höcker, Steffen Schmidt

SS 2012

03. Sequence Features


Günter Blobel, 1999, nobelprize.org

Targeting proteins

• signal peptide

• targets proteins to the secretory pathway

• N-terminal sequence recognized while the peptide is still being synthesized on the ribosome


Nielsen et al. (2007)

Signal Peptide Prediction

• Sequence Logo of eukaryotic signal peptides


Signal Peptide Prediction - SignalP

[Figure: SignalP-4.0 prediction (euk networks) for ERP44_HUMAN. x-axis: Position (0 to 70); y-axis: Score (0.0 to 1.0); curves: C (cleavage) score, S (signal peptide) score, Y (combined) score.]

Sequence shown: MHPAVFLSLPDLRCSLLLLVTWVFTPVTTEITSLDTENIDEILNNADVALVNFYADWCRFSQMLHPIFEE

• http://www.cbs.dtu.dk/services/SignalP


PDB-id: 1c3w

Transmembrane Helices

• unusually long stretch of hydrophobic residues

• >18 hydrophobic amino acids

• hydrophobic interaction with lipids in membrane

• orientation of helix / topology of the protein

• looking at the “loops”: R & K mainly found on the cytoplasmic side (“positive inside rule”)

[Figure: sequence fragment with the N-terminus labelled: TGRPEWIWLALGTALMGLGTLYFLVKGMGVSDPDAKKFYAITTLVPAIAFTMYLSMLLGYGL]
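The "unusually long stretch of hydrophobic residues" criterion can be illustrated with a sliding-window hydropathy scan. This is only a sketch and is not how TMHMM works (TMHMM is a hidden Markov model). It swaps in the simpler Kyte-Doolittle hydropathy scale, and the window length of 19, the cutoff of 1.6 and the function names are assumptions chosen for illustration.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(seq, window=19):
    """Mean hydropathy of every full-length window along seq."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """0-based start positions of windows whose mean exceeds threshold."""
    return [i for i, mean in enumerate(window_hydropathy(seq, window))
            if mean > threshold]


# Sequence fragment shown on the slide (PDB 1c3w).
fragment = "TGRPEWIWLALGTALMGLGTLYFLVKGMGVSDPDAKKFYAITTLVPAIAFTMYLSMLLGYGL"
for start in hydrophobic_windows(fragment):
    print(start, fragment[start:start + 19])
```

Consecutive hits belong to the same hydrophobic stretch; merging them into segments and counting R and K in the flanking loops would be the natural next step towards a topology guess.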


Sonnhammer et al. (1998)

Transmembrane Helices – TMHMM

• http://www.cbs.dtu.dk/services/TMHMM/

• Accuracy of TM helix prediction: high (> 90%)

• Accuracy of topology prediction: > 75%


Transmembrane Helices – TMHMM

• http://www.cbs.dtu.dk/services/TMHMM/

[Figure: TMHMM posterior probabilities for 5H2A_CRIGR. x-axis: residue position (50 to 450); y-axis: probability (0 to 1.2); curves: transmembrane, inside, outside.]


William (1987) Biochim Biophys Acta

Secondary Structure – amino acid preferences

Residue   α-helix   β-strand   β-turn
Glu       1.59      0.52       1.01
Ala       1.41      0.72       0.92
Leu       1.34      1.22       0.57
Met       1.30      1.14       0.52
Gln       1.27      0.98       0.84
Lys       1.23      0.69       1.07
Arg       1.21      0.84       0.90
His       1.05      0.80       0.81
Val       0.90      1.87       0.41
Ile       1.09      1.67       0.47
Tyr       0.74      1.45       0.76
Cys       0.66      1.40       0.54
Trp       1.02      1.35       0.65
Phe       1.16      1.33       0.59
Thr       0.76      1.17       0.90
Gly       0.43      0.58       1.77
Asn       0.76      0.48       1.34
Pro       0.34      0.31       1.32
Ser       0.57      0.96       1.22
Asp       0.99      0.39       1.24
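To make the preference values concrete, the sketch below averages them over a short segment and reports which of the three states scores highest. The numbers are copied from the table on this slide. The averaging rule and the function names are assumptions for illustration only, and this crude rule is not what PSIPRED, JNET or the other predictors in this practical do.

```python
# Propensities from the slide: (alpha-helix, beta-strand, beta-turn).
PROPENSITY = {
    "E": (1.59, 0.52, 1.01), "A": (1.41, 0.72, 0.92), "L": (1.34, 1.22, 0.57),
    "M": (1.30, 1.14, 0.52), "Q": (1.27, 0.98, 0.84), "K": (1.23, 0.69, 1.07),
    "R": (1.21, 0.84, 0.90), "H": (1.05, 0.80, 0.81), "V": (0.90, 1.87, 0.41),
    "I": (1.09, 1.67, 0.47), "Y": (0.74, 1.45, 0.76), "C": (0.66, 1.40, 0.54),
    "W": (1.02, 1.35, 0.65), "F": (1.16, 1.33, 0.59), "T": (0.76, 1.17, 0.90),
    "G": (0.43, 0.58, 1.77), "N": (0.76, 0.48, 1.34), "P": (0.34, 0.31, 1.32),
    "S": (0.57, 0.96, 1.22), "D": (0.99, 0.39, 1.24),
}
STATES = ("helix", "strand", "turn")


def mean_propensities(segment):
    """Average propensity of a segment for each of the three states."""
    totals = [0.0, 0.0, 0.0]
    for aa in segment:
        for k, value in enumerate(PROPENSITY[aa]):
            totals[k] += value
    return {state: total / len(segment) for state, total in zip(STATES, totals)}


def preferred_state(segment):
    """State with the highest average propensity (hypothetical helper)."""
    means = mean_propensities(segment)
    return max(means, key=means.get)


for segment in ("EALMQK", "VIYCWF", "GNPSD"):
    print(segment, preferred_state(segment))
```

The three example segments are made up from the strongest helix, strand and turn formers in the table, so each one falls cleanly into its class; real sequences are far more ambiguous.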


Secondary Structure – buried β-sheet


Secondary Structure – amphiphilic, partially buried α-helix


Secondary Structure – amphiphilic β-strand
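One way to quantify the amphiphilic patterns on the helix and strand slides is a hydrophobic moment: each residue contributes a vector whose length is its hydropathy and whose direction rotates by a fixed angle per residue (about 100° for an α-helix, about 180° for an extended β-strand). The sketch below is an assumption-laden illustration rather than part of the practical: it uses the Kyte-Doolittle scale, and the function name and test sequences are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(seq, angle_deg):
    """Mean hydrophobic moment for a given rotation angle per residue."""
    delta = math.radians(angle_deg)
    x = sum(KYTE_DOOLITTLE[aa] * math.cos(i * delta) for i, aa in enumerate(seq))
    y = sum(KYTE_DOOLITTLE[aa] * math.sin(i * delta) for i, aa in enumerate(seq))
    return math.hypot(x, y) / len(seq)


# Made-up peptide alternating hydrophobic and charged residues.
strand_like = "VKVKVKVK"
print("strand periodicity:", hydrophobic_moment(strand_like, 180.0))
print("helix periodicity: ", hydrophobic_moment(strand_like, 100.0))
```

An alternating pattern gives a large moment at the strand angle and a smaller one at the helix angle, whereas a uniformly hydrophobic segment (as in a buried strand) gives a moment close to zero at either angle.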


Secondary Structure Prediction

• Toolkit - Ali2D

[Figure: Ali2D output for the query sequence below, shown in two blocks with one prediction row per method: SS PSIPRED, SS JNET, SS Prof (Ouali), SS Prof (Rost), CC Coils, TM HMMTOP, TM MEMSAT-SVM, TM PHOBIUS, DO DISOPRED2, DO IUPRED, SO Prof (Rost), SO JNET.]

Query sequence, block 1: MASTVSNTSKLEKPVSLIWGCELNEQNKTFEFKVEDDEEKCEHQLALRTVCLGDKAKDEFHIVEIVTQEEGAEKSVPIATLKPSILPMATMVGIELTPPVTFRLKAGSG

Query sequence, block 2: PLYISGQHVAMEEDYSWAEEEDEGEAEGEEEEEEEEDQESPPKAVKRPAATKKAGQAKKKKLDKEDESSEEDSPTKKGKGAGRGRKPAAKK

SS = secondary structure (H = alpha-helix, E = beta-sheet)
CC = coiled coils
TM = transmembrane ('+' = outside, '-' = inside)
DO = disorder
SO = solvent accessibility (a buried residue has at most 25% of its surface exposed to the solvent)


Secondary Structure Prediction

• Quick2D

Confidence: 0123456789 HHHHHHHHHH EEEEEEEEEE

gi|148227972 0000 MASTVSNTSKLEKPVS-LIWGCELNEQNKTFEFKVE-DDEEKCEHQLALRTVCLGDKAKDEFHIVEIV---T---Q---E-E-GAEK-SVPIATLKPSILPMATMVGI
gi|345312799 0001 ---------KAEKPVS-LIWGCELSQEKRSYTFD-S-LPVGKYDRQLALRTICLGEKAKDEVNLVEIVL--K---P---G-E-ESEK-RIPIATLQPSVLPMATLEGM
gi|112419136 0002 --------SKTEKPLS-TLWGCELNESNKEESFKTD-DTNHQ--HQLALRTMCLGHTAKDEFNIVELV---T---G---EGN-SNAK-AVPIATLHAKSMPTVNLSGL
gi|327282048 0003 -SSVTSENSGSSRPIN-LIWGCQLNSKNPSFTFDAP-EDPSY-DQELNLRTICLGENAKDEFNVVEIV---P---P---K-D-SKDTTPVHLATLKLSVLPMVALTGL
gi|58177282| 0004 -----------------FLFGCELKADKKEYSFKVE-DDEN--EHQLSLRTVSLGASAKDELHVVEAE---G---I---NYE-GKTI-KIALASLKPSVQPTVSLGGF
gi|348511872 0005 ------DISDMSRPQM-YLFGCTLKSDKREFKVDLD-GDET--EQQLSLKAVCLGAEAEDKFHMVEVE---G---L---TYD-GKTT-KVPLAVLKPSVLPSMSLGGF
gi|338722403 0006 ---------------A-LLWGCELNQEKRTWTFK-P-QREGKQDCRLSLSTVCLGENAKEEMNLVEVL---P---PAGRE-D-KKTK-PVTIASLQASVLPMAVLMGL
gi|226442972 0007 --------------VC-VLWGCELNDTQRNAVFEIA---EDLLEHQFFIRTMCLSAGASKEMHVVEVQ---D---R---V-G-EYCK-PVPIATLHPMCQPMVSFSGF
gi|301606268 0008 --------SEFGKKPP-VVWGCVLSKDDKTYVFEPE-DD--FLEHLLELWTICLGAETKDETNVV-AA---E---L---R-Q-TQGK-PITIASLRPSVLPMINVNGL
gi|348587890 0009 --STSSLEDKAERTVP---WGAELNFERRICTFRPQ-AEEDSC--RLVLSAICLGEKAKEEVNRVEIVPSAT---P---E-D-KRPQ-TITIATLKLSVLPMVAMAGI
gi|327279051 0010 -----------------FAFGCELNSTTRSFTFQVN-EEDDS-DHSLVLSTVCLTASAKDECNIVELV---G---R---DYQ-NKEI-IVPVANLKSSCLPMVSLHNF
gi|149689746 0011 ---------------S-FFFGCELSGHTRSFTFKVE--EEDDAEHVLALTMLCLTEGAKDECNVVEVV---A---Q---NHD-HQEI-AVPVANLKLSCQPMLSLDDF
gi|161898384 0012 -------------PQT-FLYGCEL-KAGKEVTFNPE-DDDDY-DHQLSVRMACVDPTTKDELNIVEIE---G---Q---DSE-GQKV-KAVLATLKPSSLPSVCLGGF
gi|148223515 0013 ---------------S-YLFGCELSSKIKQYTFQVN--EEDDAAHYVCLQTISLGAEAKDEHNVVEVTA--S---N---Y-Q-NKEV-TVPLANLKLSCQPMVNMGSF
gi|61806538| 0014 --SVVDSRSRLE---S-YVFSCELSSGVPFYTFQAD-EDEDV-EHFLELRTICLGDGAKEENNVVEVT---AM--N---H-Q-GKKI-SVPVANLNITCLPMVSLGEF
gi|118101359 0015 --------------------GCELGVNSRSCVVK---EDDDFLEHLVLLRTISLGADARDELHVVAVE---S---K---N-TYGDHK-PVPIASLRVSVLPMISLKGL

gi|148227972 0000 ELTPPVTFRLKAGSGPLYISGQHVAMEEDYSWAEEEDEGEAEGEEEEEEEEDQESPPKAVKRPAATKKAGQAKKKKLDKEDESSEEDSPTKKGKGAGRGRKPAAKK
gi|345312799 0001 ELTPPVTFRLRAGSGPVYISGQDITMVADVPW--------------------------------------------------------------------------
gi|112419136 0002 DLHPPVTFRLKHGSGPVFVGAEHVAL--------------------------------------------------------------------------------
gi|327282048 0003 ELNPPVTFRLKSGSGPVYLAGQHLA--ADLPWNEEEEDESLSKEDEDLEESSKEDSPVKFTKKAPAKRTSAATKEKELP-------VQEKPQEKKATRGRKPAAK-
gi|58177282| 0004 EITPPVILRLKSGSGPVYVSGQHLVALEDL----------------------------------------------------------------------------
gi|348511872 0005 EITPPVTFRLQSGSGPVYISGQHFISVKD-----------------------------------------------------------------------------
gi|338722403 0006 ELSPPVTFRLRNGSGPVFLSGQECYDTSDLSW--------------------------------------------------------------------------
gi|226442972 0007 ELMPPVIFNLRSGQGPLFISGQHLTL--------------------------------------------------------------------------------
gi|301606268 0008 EFSTPVTFTLKSGSGPVYISGIHISLVDD-----------------------------------------------------------------------------
gi|348587890 0009 SLSPPVTFQLRAGSGPVFLSGQ------------------------------------------------------------------------------------
gi|327279051 0010 ELEPPVTFRLRSGSGPVHLSGRH-----------------------------------------------------------------------------------
gi|149689746 0011 QLQPPVTFRLKSGSGPVRITGRH-----------------------------------------------------------------------------------
gi|161898384 0012 EITPPVVFRLRSGSGPVHISGQHLVI--------------------------------------------------------------------------------
gi|148223515 0013 EIEAPVTFRLTSGSGPVFISGRHYIVIDD-----------------------------------------------------------------------------
gi|61806538| 0014 ELMAPVTLRLKSGSGPVTISGLHLVATEN-----------------------------------------------------------------------------
gi|118101359 0015 EFVPPVTFMLQCGTGPVYLSGQHITLED------------------------------------------------------------------------------



wikipedia.org, PDB: 2fft

Disordered Regions

• ~25% of eukaryotic protein sequence is intrinsically unstructured

• lack of hydrophobic residues

• often with overrepresentation of a few amino acids
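The two sequence signatures listed above (few hydrophobic residues, a few over-represented amino acids) are easy to check by counting. The sketch below does this for the query sequence printed in the prediction output elsewhere in these slides. The choice of hydrophobic residues, the split position of 120 and the helper names are assumptions made for illustration, and counting composition is not a substitute for a disorder predictor.

```python
from collections import Counter

# Query sequence as printed in the prediction output in these slides.
SEQ = (
    "MASTVSNTSKLEKPVSLIWGCELNEQNKTFEFKVEDDEEKCEHQLALRTVCLGDKAKDEFHIVEIVTQEE"
    "GAEKSVPIATLKPSILPMATMVGIELTPPVTFRLKAGSG"
    "PLYISGQHVAMEEDYSWAEEEDEGEAEGEEEEEEEEDQESPPKAVKRPAATKKAGQAKKKKLDKEDESSE"
    "EDSPTKKGKGAGRGRKPAAKK"
)

HYDROPHOBIC = set("AVILMFWY")  # assumed definition of "hydrophobic"
SPLIT = 120                    # assumed boundary between core and tail


def hydrophobic_fraction(seq):
    """Fraction of residues that belong to the HYDROPHOBIC set."""
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq)


def top_residues(seq, n=2):
    """The n most frequent residues with their counts."""
    return Counter(seq).most_common(n)


core, tail = SEQ[:SPLIT], SEQ[SPLIT:]
for name, part in (("N-terminal part", core), ("C-terminal tail", tail)):
    print(name, round(hydrophobic_fraction(part), 2), top_residues(part))
```

Running it shows a clearly lower hydrophobic fraction in the C-terminal part, whose composition is dominated by just two residue types.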



Disorder Prediction

• IUPRED - http://iupred.enzim.hu/


Short Linear Motifs – Characteristics

• 3-11 amino acids long

• poorly conserved / evolve fast

• 1-3 amino acids in the motifs are “hot spots”

• ~ 80% in disordered regions

• relatively low affinity to interacting partner (1-150µM)

• interaction via induced fit
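To put the affinity range into energetic terms, the quoted values can be converted to a standard binding free energy with ΔG° = RT ln(Kd). The sketch below assumes that the 1-150 µM range refers to dissociation constants, uses a 1 M standard state and assumes a temperature of 298.15 K; the function name is hypothetical.

```python
import math

R_KCAL = 1.987e-3     # gas constant in kcal/(mol*K)
TEMPERATURE = 298.15  # assumed temperature in K


def binding_free_energy(kd_molar, temperature=TEMPERATURE):
    """Standard binding free energy in kcal/mol from a Kd given in mol/L."""
    return R_KCAL * temperature * math.log(kd_molar)


for kd in (1e-6, 150e-6):
    dg = binding_free_energy(kd)
    print(f"Kd = {kd * 1e6:.0f} uM -> dG = {dg:.1f} kcal/mol")
```

Under these assumptions the range corresponds to roughly -8 to -5 kcal/mol, which is weak compared with many interactions between folded domains.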


Short Linear Motifs – Function

• protein-protein interactions

• post-translational modifications

• e.g. Phosphorylation

• proteolytic cleavage/processing sites

• KEN / D box in cell cycle - degradation signals

• subcellular targeting sites

• NES - nuclear export signal

➡ modulation of interactions - fine tuning


David Goodsell, http://www.rcsb.org/pdb/101/motm.do?momID=85

Short Linear Motifs – Nuclear Localization Signal (NLS)

• Importin-beta (1qgk; blue) recognizes nuclear pores and moves through them. It wraps around the end of importin-alpha (1ee5; green), an adaptor molecule that connects importin-beta with the cargo, here nucleoplasmin (1k5j; yellow), a chaperone important in nucleosome assembly. All interactions are mediated by linear motifs in unstructured segments (bipartite nuclear localization signals).


ELM Resources

• elm.eu.org

• NUPL_XENLA


NLS in nucleoplasmin

• Quick2D secondary structure prediction for nucleoplasmin, showing the unstructured C-terminal tail and the bipartite nuclear localization motif

[Figure: prediction output for the nucleoplasmin sequence below, shown in two blocks with one row per method: SS PSIPRED, SS JNET, DO DISOPRED2, DO IUPRED, SO Prof (Rost), SO JNET.]

Sequence, block 1: MASTVSNTSKLEKPVSLIWGCELNEQNKTFEFKVEDDEEKCEHQLALRTVCLGDKAKDEFHIVEIVTQEEGAEKSVPIATLKPSILPMATMVGIELTPPVTFRLKAGSG

Sequence, block 2: PLYISGQHVAMEEDYSWAEEEDEGEAEGEEEEEEEEDQESPPKAVKRPAATKKAGQAKKKKLDKEDESSEEDSPTKKGKGAGRGRKPAAKK

SS = secondary structure (H = alpha-helix, E = beta-sheet)
DO = disorder
SO = solvent accessibility (a buried residue has at most 25% of its surface exposed to the solvent)
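The bipartite nuclear localization motif in the C-terminal tail can be located with a simple pattern search. The regular expression below is a simplified, assumed description of a bipartite NLS (two basic residues, a spacer of 10 to 12 residues, then at least three basic residues). It is not the pattern used by the ELM resource, and the function name is hypothetical.

```python
import re

# Nucleoplasmin sequence as printed on this slide.
SEQ = (
    "MASTVSNTSKLEKPVSLIWGCELNEQNKTFEFKVEDDEEKCEHQLALRTVCLGDKAKDEFHIVEIVTQEE"
    "GAEKSVPIATLKPSILPMATMVGIELTPPVTFRLKAGSG"
    "PLYISGQHVAMEEDYSWAEEEDEGEAEGEEEEEEEEDQESPPKAVKRPAATKKAGQAKKKKLDKEDESSE"
    "EDSPTKKGKGAGRGRKPAAKK"
)

# Assumed, simplified bipartite NLS pattern (illustration only).
BIPARTITE_NLS = re.compile(r"[KR]{2}.{10,12}[KR]{3,}")


def find_bipartite_nls(seq):
    """Return (1-based start, 1-based end, matched text) per match."""
    return [(m.start() + 1, m.end(), m.group())
            for m in BIPARTITE_NLS.finditer(seq)]


for start, end, text in find_bipartite_nls(SEQ):
    print(start, end, text)
```

Short patterns like this match by chance in many proteins, which is why motif hits are usually filtered by context such as predicted disorder, as in the prediction output above.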