www.sciencesignaling.org/cgi/content/full/1/35/ra2/DC1
Supplementary Materials for
Linear Motif Atlas for Phosphorylation-Dependent Signaling
Martin Lee Miller, Lars Juhl Jensen, Francesca Diella, Claus Jørgensen, Michele Tinti,
Lei Li, Marilyn Hsiung, Sirlester A. Parker, Jennifer Bordeaux, Thomas Sicheritz-Ponten,
Marina Olhovsky, Adrian Pasculescu, Jes Alexander, Stefan Knapp, Nikolaj Blom, Peer
Bork, Shawn Li, Gianni Cesareni, Tony Pawson, Benjamin E. Turk, Michael B. Yaffe,*
Søren Brunak,* Rune Linding*
*To whom correspondence should be addressed. E-mail: [email protected] (S.B.), [email protected]
(M.B.Y.), and [email protected] (R.L.)
Published 2 September 2008, Sci. Signal. 1, ra2 (2008)
DOI: 10.1126/scisignal.1159433
This PDF file includes:
Figure S1: Overview of the NetPhorest pipeline.
Figure S2: Phosphorylation data mapped onto domain trees.
Figure S3: Coverage of classifiers for targets of kinases, SH2 domains, and PTB
domains.
Figure S4: Score calibration.
Figure S5: Correlation between domain similarity and substrate specificity.
Figure S6: Kinase matrices from Positional Scanning Peptide Libraries (PSPL).
Figure S7: Sequence logos for kinases and pS/pT-binding domains.
Figure S8: Receiver operating characteristic (ROC) curves for the NetPhorest
classifiers.
Table S1: The selected set of NetPhorest classifiers.
Table S2: Benchmark of the NetPhorest method.
Supplementary Information for: Linear motif atlas for phosphorylation-dependent signaling
Martin Lee Miller 1,2∗, Lars Juhl Jensen 2,3∗, Francesca Diella 3, Claus Jørgensen 4, Michele Tinti 5, Lei Li 6, Marilyn Hsiung 4, Sirlester A. Parker 7, Jennifer Bordeaux 7, Thomas Sicheritz-Ponten 1, Marina Olhovsky 4, Adrian Pasculescu 4, Jes Alexander 8, Stefan Knapp 9, Nikolaj Blom 1, Peer Bork 2,10, Shawn Li 6, Gianni Cesareni 5, Tony Pawson 4, Benjamin E. Turk 7, Michael B. Yaffe 8†, Søren Brunak 1,2† and Rune Linding 4,8,11†
1 Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark
2 The Novo Nordisk Foundation Centre for Protein Research, University of Copenhagen, Copenhagen, Denmark
3 European Molecular Biology Laboratory, Heidelberg, Germany
4 Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Canada
5 University of Rome, Tor Vergata, Rome, Italy
6 University of Western Ontario, London, Ontario, Canada
7 Department of Pharmacology, Yale University School of Medicine, New Haven, USA
8 Center for Cancer Research, Massachusetts Institute of Technology, Cambridge, USA
9 Structural Genomics Consortium, University of Oxford, UK
10 Max-Delbruck-Centre for Molecular Medicine, Berlin, Germany
11 Cellular & Molecular Logic Team, The Institute of Cancer Research, London, UK
∗ These authors contributed equally to this work
† To whom correspondence should be addressed; E-mail: [email protected], [email protected] and [email protected]
Data organization: In vivo sites and in vitro peptide data are mapped onto phylogenetic domain trees.
Dataset compilation: Positive and negative examples are compiled for each domain or group of similar domains.
Redundancy reduction: Near-identical sites and sites located within similar proteins are removed.
Data partitioning: The non-redundant datasets are partitioned into separate training, test and validation sets.
Training of ANNs: A cross-validated ensemble of ANNs is trained on each of the partitioned datasets.
Evaluation of PSSMs: The PSSMs from in vitro experiments are benchmarked on the non-redundant sites.
Score calibration: The benchmark results are used to make the scores from different classifiers comparable.
Selection of classifiers: The subset of the classifiers is selected that gives the best overall predictive power.
Figure S1: Overview of the NetPhorest pipeline. The pipeline consists of a combination of Python, Perl and C code that takes as input a set of manually organized data sets and phylogenetic domain trees. Most of the steps in the flow chart are shown in more detail in Figures 1 and 2.
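The data partitioning step of the flow chart can be illustrated with a minimal sketch. The split fractions, the fixed seed and the function name `partition` are assumptions made for illustration only; the source does not state how the NetPhorest pipeline assigns sites to the three sets.

```python
import random

def partition(sites, seed=0, fractions=(0.5, 0.25, 0.25)):
    """Shuffle sites reproducibly and split them into training, test and validation sets.

    The fractions are hypothetical; the last set receives whatever remains.
    """
    shuffled = list(sites)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * fractions[0])
    n_test = int(len(shuffled) * fractions[1])
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation

# Toy input: placeholder identifiers standing in for non-redundant sites.
sites = [f"site_{i}" for i in range(20)]
train, test, validation = partition(sites)
```

Keeping the three sets disjoint is what allows the validation set to serve as an independent benchmark later in the pipeline.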
[Figure S2 image: phylogenetic domain trees for (A) kinase domains, (B) SH2 domains and (C) PTB domains, with leaves labeled by domain name. Legend: artificial neural network (ANN); position-specific scoring matrix (PSSM). Scale bars of 25 and 50 sites are shown.]
Figure S2: Phosphorylation data mapped onto domain trees. The numbers of phosphorylation sites annotated to be substrates of individual (orange bars) or groups (orange pies, centered on the group in question) of kinases (A), SH2 domains (B) and PTB domains (C) are shown on the respective phylogenetic domain trees. These data were used to develop artificial neural networks (ANNs). The blue bars show the NetPhorest collection of position-specific scoring matrices (PSSMs) based on in vitro assays. NetPhorest also contains ANNs for the WW domain of PIN1 and PSSMs for 14-3-3 and BRCT domains, all of which bind phosphorylated serines and threonines (not shown).
[Figure S3 image: phylogenetic domain trees for (A) kinase domains, (B) SH2 domains and (C) PTB domains, with leaves labeled by domain name and bars marking classifier coverage. Legend: artificial neural network (ANN); position-specific scoring matrix (PSSM).]
Figure S3: Coverage of classifiers for targets of kinases, SH2 domains and PTB domains. Bars show the kinases, SH2 domains and PTB domains that are covered by artificial neural networks (orange) and position-specific scoring matrices (blue). Based on current data, the NetPhorest pipeline yields a non-redundant collection of 125 sequence-based classifiers that cover 179 of 518 kinase domains, 93 of 118 SH2 domains and 8 of 18 phosphotyrosine-binding (PTB) domains (the figure is best viewed by zooming in a PDF viewer). NetPhorest also contains ANNs for the WW domain of PIN1 and PSSMs for 14-3-3 and BRCT domains, all of which bind phosphorylated serines and threonines (not shown). The trees for kinase and SH2 domains were obtained from Manning et al. 1 and Liu et al. 2, respectively. It should be noted that some of the kinase domains, for example TRRAP, do not appear to have catalytic activity. None of these are covered by the sequence-based classifiers.
[Figure S4 plot: Score calibration for CK2 family kinases. x-axis: ANN output score (0 to 1); y-axis: positive predictive value (PPV, 0 to 1). Legend: binned output scores, fitted sigmoid, rescaled sigmoid. χ2 = 0.281.]
Figure S4: Score calibration. To make the output scores from different sequence-based classifiers directly comparable, we calibrated the scores through benchmarking on our compilation of phosphorylation sites. As an example, we show here the score calibration for the CK2 group of kinases, which includes the CK2α1 and CK2α2 isoforms. The fraction of correct predictions (positive predictive value, PPV) is calculated within different score windows (running bins) on the validation set (orange curve). Subsequently, a sigmoid function is fitted to these values, minimizing the sum of squared errors. The resulting calibration curves enable us to estimate the posterior probability that a site is phosphorylated by a particular kinase or group of kinases. However, because the fractions of positive examples within the redundancy-reduced data sets do not reflect the corresponding prior probabilities, the calibration curves have to be rescaled (dotted curve, see Methods for details).
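The calibration idea described in this caption can be sketched as follows: compute the PPV in running score windows, then fit a sigmoid to those values by least squares. This is a minimal illustration under stated assumptions, not the NetPhorest code. The two-parameter logistic form, the window width and step, and the coarse grid search (used here in place of a proper optimizer) are all hypothetical choices, and the rescaling to prior probabilities described in Methods is not shown.

```python
import math

def sigmoid(x, slope, midpoint):
    """Two-parameter logistic curve (an assumed functional form)."""
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))

def binned_ppv(scores, labels, width=0.2, step=0.05):
    """Fraction of positives (PPV) within running score windows.

    Returns (window centre, PPV) pairs; empty windows are skipped.
    Width and step are hypothetical values.
    """
    points = []
    n_steps = int(round((1.0 - width) / step))
    for i in range(n_steps + 1):
        lo = i * step
        hi = lo + width
        in_window = [y for s, y in zip(scores, labels) if lo <= s <= hi]
        if in_window:
            points.append((lo + width / 2, sum(in_window) / len(in_window)))
    return points

def fit_sigmoid(points):
    """Grid search for the slope and midpoint minimizing the sum of squared errors."""
    best = None
    for slope_index in range(1, 101):        # slopes 0.5, 1.0, ..., 50.0
        slope = slope_index * 0.5
        for mid_index in range(0, 101):      # midpoints 0.00, 0.01, ..., 1.00
            midpoint = mid_index * 0.01
            sse = sum((sigmoid(x, slope, midpoint) - ppv) ** 2 for x, ppv in points)
            if best is None or sse < best[0]:
                best = (sse, slope, midpoint)
    return best

# Toy data: two low-scoring negatives and two high-scoring positives.
toy_points = binned_ppv([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1])
toy_sse, toy_slope, toy_midpoint = fit_sigmoid(toy_points)
```

Once fitted, `sigmoid(score, slope, midpoint)` maps a raw classifier score to an estimated PPV, which is what makes scores from different classifiers comparable.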
[Figure S5 plots: substrate similarity (y-axis, 0 to 1) against domain similarity (x-axis, 0 to 1) for (A) kinase domains and (B) SH2 domains. Kinase groups named in the plot: ATM/ATR, DNAPK, CDK1, CDK2/CDK3, JNK and p38.]
Figure S5: Correlation between domain similarity and substrate specificity. Calculating the similarity between two sequence motifs is non-trivial, in particular if the motifs are described by different types of sequence models (ANNs and PSSMs). To assess the similarity between motifs, we created 100,000 random phosphopeptides and scored each of them using all classifiers in NetPhorest. We thereby obtained a set of 100,000-dimensional score vectors, one for each sequence model, and defined the similarity between two motifs as the cosine similarity between their score vectors. The corresponding domain similarities were calculated based on pairwise sequence alignments (self-normalized bit scores, see Methods for details) between all domains in the kinase and SH2 trees. For internal nodes, this was calculated by averaging the pairwise sequence similarity of each member in the group across all other nodes. Substrate similarity was plotted against domain similarity for kinases (A) and SH2 domains (B). Note that, although significant, the correlation between domain and substrate similarity is far from perfect (R^2 = 0.18, P < 10^-12 and R^2 = 0.31, P < 10^-6 for kinases and SH2 domains, respectively; only domain pairs with a self-normalized bit score of 0.3 or higher are considered). Two groups of kinase domains that have diverged greatly in sequence but have retained their substrate specificity are highlighted in orange and blue, respectively.
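The motif comparison described above, scoring a common set of random phosphopeptides with each model and taking the cosine similarity of the resulting score vectors, can be sketched as below. The two scoring functions are hypothetical toy stand-ins for real classifiers, and the peptide length, residue sampling and sample size (1,000 rather than 100,000) are assumptions for illustration.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def random_phosphopeptide(rng, flank=7):
    """Random peptide with a central S, T or Y and `flank` residues on each side."""
    left = "".join(rng.choice(AMINO_ACIDS) for _ in range(flank))
    right = "".join(rng.choice(AMINO_ACIDS) for _ in range(flank))
    return left + rng.choice("STY") + right

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy "classifiers": fraction of acidic or basic residues.
def acidic_score(peptide):
    return sum(peptide.count(aa) for aa in "DE") / len(peptide)

def basic_score(peptide):
    return sum(peptide.count(aa) for aa in "KR") / len(peptide)

rng = random.Random(0)
peptides = [random_phosphopeptide(rng) for _ in range(1000)]
acidic_vector = [acidic_score(p) for p in peptides]
basic_vector = [basic_score(p) for p in peptides]
motif_similarity = cosine_similarity(acidic_vector, basic_vector)
```

Because every model is evaluated on the same peptides, the comparison does not depend on whether a model is an ANN or a PSSM; only its output scores are used.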
[Figure S6 image. Panel A: peptide library design, one peptide set per fixed position:
Y-A-Z-X-X-X-X-S/T-X-X-X-X-A-G-K-K(biotin)  -5 set
Y-A-X-Z-X-X-X-S/T-X-X-X-X-A-G-K-K(biotin)  -4 set
Y-A-X-X-Z-X-X-S/T-X-X-X-X-A-G-K-K(biotin)  -3 set
Y-A-X-X-X-Z-X-S/T-X-X-X-X-A-G-K-K(biotin)  -2 set
Y-A-X-X-X-X-Z-S/T-X-X-X-X-A-G-K-K(biotin)  -1 set
Y-A-X-X-X-X-X-S/T-Z-X-X-X-A-G-K-K(biotin)  +1 set
Y-A-X-X-X-X-X-S/T-X-Z-X-X-A-G-K-K(biotin)  +2 set
Y-A-X-X-X-X-X-S/T-X-X-Z-X-A-G-K-K(biotin)  +3 set
Y-A-X-X-X-X-X-S/T-X-X-X-Z-A-G-K-K(biotin)  +4 set
Panel B: arrays for Clk2, Clk3, LKB1, MST1, MST4, S6K1, TLK1 and TNIK. Axes: fixed residue (G A V I C P L M F Y W H K R Q N D E T S, plus pT and pY) and fixed position (-5 to +4).]
Figure S6: Kinase matrices from Positional Scanning Peptide Libraries (PSPL). Protein kinase phosphorylation motifs for Clk2, Clk3, LKB1, Mst1, Mst4, S6K1, Tlk1 and TNIK were determined using arrayed positional scanning peptide libraries as previously described 3. Briefly, we used a series of biotinylated peptides in which each of nine positions surrounding a central phosphorylation site was systematically substituted with each of the twenty proteinogenic amino acids. This set of peptides was subjected in parallel to radiolabel kinase assays followed by capture on streptavidin membranes. Membranes were washed and exposed to a phosphor storage screen.
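The library layout in panel A can be generated programmatically, which makes the nine-position design explicit. The sketch below only reproduces the peptide templates shown in the figure; the function name `pspl_template` is hypothetical, and the reading of Z as the fixed (substituted) position and X as a non-fixed position is an interpretation of the figure labels rather than a definition given in the caption.

```python
# Positions relative to the central S/T phosphoacceptor (position 0).
SCANNED_POSITIONS = [-5, -4, -3, -2, -1, 1, 2, 3, 4]

def pspl_template(fixed_position):
    """Build the peptide template with Z at `fixed_position` and X elsewhere."""
    core = []
    for position in range(-5, 5):
        if position == 0:
            core.append("S/T")          # central phosphorylation site
        elif position == fixed_position:
            core.append("Z")            # position fixed in this peptide set
        else:
            core.append("X")            # non-fixed position
    # Flanking residues and the biotinylated C-terminal lysine as shown in panel A.
    return "-".join(["Y", "A"] + core + ["A", "G", "K", "K(biotin)"])

templates = {position: pspl_template(position) for position in SCANNED_POSITIONS}
```

Each template corresponds to one row of peptide sets in the array, so `templates[-5]` is the "-5 set" and `templates[4]` the "+4 set".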
[Figure S7 image: sequence logos, one panel per classifier. In each panel the x-axis gives the position relative to the phosphorylation site (-7 to +7, N- to C-terminus) and the y-axis is in bits. Panels, labeled by classifier name, class and model type:
AMPK_group KIN PSSM; ATM_ATR_group KIN ANN; Abl_group KIN ANN; AuroraA KIN ANN; AuroraC_AuroraB_group KIN ANN; CDK1 KIN ANN; CDK2_CDK3_group KIN ANN; CDK4_CDK6_group KIN ANN; CDK5 KIN PSSM; CDK7 KIN ANN; CHK2 KIN PSSM; CK1_group KIN PSSM; CK2_group KIN ANN; CLK_group KIN PSSM; CaMKIIalpha_CaMKIIdelta_group KIN ANN; CaMKIIbeta_CaMKIIgamma_group KIN PSSM; DMPK_group KIN PSSM; DNAPK KIN ANN; EGFR_group KIN PSSM; EIF2AK2 KIN ANN; EphA7_EphA6_EphA4_EphA3_EphA5_group KIN PSSM; EphB4_EphB3_EphB1_EphB2_group KIN PSSM; FLT3_CSF1R_Kit_group KIN ANN; GRK_group KIN ANN; GSK3_group KIN ANN; RCK_group KIN PSSM; InsR_IGF1R_group KIN ANN; JAK2_JAK3_group KIN ANN; JNK_group KIN ANN; LKB1 KIN PSSM; Src_group KIN PSSM; MAP2K6_MAP2K3_MAP2K4_MAP2K7_group KIN ANN; p38_group KIN PSSM; MAPK3_MAPK1_MAPK7_NLK_group KIN ANN; MAPKAPK_group KIN ANN; MST_group KIN PSSM; YSK_group KIN PSSM; Met_group KIN ANN; NEK1_NEK5_NEK3_NEK4_NEK11_NEK2_group KIN PSSM; PAKA_group KIN PSSM; PAKB_group KIN PSSM; PDGFR_group KIN PSSM; PDK1 KIN ANN; PKA_group KIN ANN; PKB_group KIN PSSM; PKC_group KIN ANN; PKD_group KIN PSSM; PKGcGK_group KIN ANN; PLK1 KIN ANN; Pim3_Pim1_group KIN PSSM; Pim2 KIN PSSM; ROCK_group KIN ANN; RSK_group KIN ANN; SGK_group KIN ANN; SLK_group KIN PSSM; Syk_group KIN ANN; ACTR2_ACTR2B_TGFbR2_group KIN PSSM; TLK_group KIN PSSM; MSN_group KIN PSSM; Tec_group KIN ANN; mTOR KIN ANN; p70S6K_group KIN PSSM; any_group 1433 PSSM; any_group BRCT PSSM; APPL_group PTB ANN; FRS_group PTB ANN; SHC1_SHC2_SHC3_group PTB PSSM; SHC4 PTB ANN; any_group WW ANN.]
Figure S7: Sequence logos for kinases and pS/pT-binding domains. Each logo shows the amino acid preferences at different positions relative to the phosphorylation site. The height of the stack of symbols at each position is the relative entropy for that position, and the height of each individual symbol within a stack is proportional to its frequency. The logos were constructed using enoLOGOS 4. As new data become available, updated logos will be made available from http://netphorest.info.
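The two height rules in this caption can be written out for a single logo column. The sketch below is illustrative only: it assumes a uniform background distribution over the twenty amino acids and no small-sample correction, neither of which is specified in the caption, and it is not the enoLOGOS implementation. The function name `logo_column` is hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def logo_column(residues, background=None):
    """Return (stack height in bits, {symbol: symbol height}) for one column.

    Stack height is the relative entropy of the observed frequencies with
    respect to the background; each symbol's height is its frequency times
    the stack height.
    """
    if background is None:
        # Assumed uniform background over the twenty amino acids.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    counts = Counter(residues)
    total = sum(counts.values())
    frequencies = {aa: n / total for aa, n in counts.items()}
    stack = sum(p * math.log2(p / background[aa]) for aa, p in frequencies.items())
    heights = {aa: p * stack for aa, p in frequencies.items()}
    return stack, heights

# Toy column: residues observed at one position across eight aligned sites.
stack_height, symbol_heights = logo_column("SSSSSSTT")
```

Under the uniform-background assumption a fully conserved position reaches log2(20), about 4.32 bits, and a position with no preference has height zero.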
Figure S8: Receiver operating characteristic (ROC) curves for the NetPhorest classifiers. We benchmarked all classifiers for which at least 12 positive examples were available after homology reduction. Each ROC curve shows the sensitivity as a function of the false-positive rate (1 - specificity) for a particular classifier. The performance measures are based on an independent validation set that was not used for training or parameter optimization.
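A ROC curve of this kind, and the area under it, can be computed from classifier scores and true labels as sketched below. This is a generic illustration, not the benchmarking code used for NetPhorest; the function names and the toy data are hypothetical, and the area is obtained with the trapezoidal rule.

```python
def roc_curve(scores, labels):
    """Return ROC points as (false-positive rate, sensitivity) pairs.

    The threshold is swept from the highest score downward; examples with
    tied scores are added together so that ties produce a diagonal segment.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / n_neg, true_pos / n_pos))
    return points

def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Toy validation set: 1 marks a known positive site, 0 a negative example.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 1, 0, 0]
toy_auc = area_under_curve(roc_curve(toy_scores, toy_labels))
```

An area of 1.0 corresponds to perfect separation of positives from negatives and 0.5 to random ranking.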
Table S1: The selected set of NetPhorest classifiers. The table lists the non-redundant set of classifiers currently incorporated in the NetPhorest atlas. First, we list whether the classifier is based on artificial neural networks (ANNs) or a position-specific scoring matrix (PSSM), whether it is a classifier for a kinase (KIN) or a phospho-binding domain (SH2, PTB, 14-3-3, BRCT or WW), and the types of phosphorylated residues it can recognize (AA). Second, we show the number of non-redundant positive sites (POS) used for benchmarking. Finally, we report the performance as the area under the receiver operating characteristic curve (AROC); the ROC curves are available in Figure S8 or at http://netphorest.info. Owing to lack of data, some PSSMs could not be benchmarked. We assumed these PSSMs to have performance comparable to that of other PSSMs obtained from the same type of assay, and thus estimated their performance from the PSSMs that could be benchmarked. For the ANNs, the AROC obtained using zero hidden neurons, that is, a linear model, is shown in parentheses.
Name                            Source  Class   AA   POS  AROC
14-3-3                          PSSM    14-3-3  ST   113  0.894
Abl family                      ANN     KIN     Y    44   0.652 (0.610)
Abl family                      PSSM    SH2     Y    187  0.696
ACTR2/ACTR2B/TGFbR2             PSSM    KIN     SY   -    0.830
AMPK subfamily                  PSSM    KIN     ST   28   0.858
APPL subfamily                  ANN     PTB     Y    16   0.957 (0.934)
ATM/ATR                         ANN     KIN     ST   57   0.986 (0.985)
AuroraA                         ANN     KIN     ST   26   0.809 (0.77)
AuroraC/AuroraB                 ANN     KIN     ST   21   0.884 (0.881)
BCAR3                           ANN     SH2     Y    92   0.736 (0.737)
BLNK                            PSSM    SH2     Y    -    0.749
BRCT                            PSSM    BRCT    ST   6    0.875
BRDG1                           ANN     SH2     Y    85   0.870 (0.854)
Brk                             ANN     SH2     Y    126  0.815 (0.809)
CaMKIIalpha/CaMKIIdelta         ANN     KIN     ST   33   0.786 (0.770)
CaMKIIbeta/CaMKIIgamma          PSSM    KIN     ST   -    0.830
CBL subfamily                   ANN     SH2     Y    123  0.698 (0.681)
CDK1                            ANN     KIN     ST   128  0.839 (0.82)
CDK2/CDK3                       ANN     KIN     ST   65   0.959 (0.959)
CDK4/CDK6                       ANN     KIN     ST   12   0.671 (0.810)
CDK5                            PSSM    KIN     ST   15   0.973
CDK7                            ANN     KIN     ST   14   0.664 (0.532)
CHK2                            PSSM    KIN     ST   15   0.756
CISH                            ANN     SH2     Y    16   0.807 (0.604)
CK1 family                      PSSM    KIN     ST   63   0.723
CK2 family                      ANN     KIN     ST   251  0.930 (0.921)
CLK family                      PSSM    KIN     ST   -    0.830
CRK                             ANN     SH2     Y    138  0.841 (0.823)
CRKL                            ANN     SH2     Y    135  0.822 (0.841)
Csk                             ANN     SH2     Y    65   0.866 (0.807)
CTK                             ANN     SH2     Y    53   0.691 (0.744)
DAPP1                           ANN     SH2     Y    29   0.683 (0.694)
DMPK family                     PSSM    KIN     ST   -    0.830
DNAPK                           ANN     KIN     ST   21   0.913 (0.918)
EGFR family                     PSSM    KIN     Y    40   0.627
EIF2AK2                         ANN     KIN     ST   14   0.634 (0.560)
EphA7/EphA6/EphA4/EphA3/EphA5   PSSM    KIN     Y    -    0.830
EphB4/EphB3/EphB1/EphB2         PSSM    KIN     Y    -    0.830
Fes subfamily                   ANN     SH2     Y    55   0.897 (0.846)
FLT3/CSF1R/Kit                  ANN     KIN     Y    15   0.820 (0.541)
FRK                             ANN     SH2     Y    152  0.721 (0.758)
FRS family                      ANN     PTB     Y    24   0.971 (0.939)
GRB2 subfamily                  ANN     SH2     Y    193  0.935 (0.923)
GRB family                      ANN     SH2     Y    224  0.683 (0.673)
GRK subfamily                   ANN     KIN     ST   59   0.581 (0.598)
GSK3 subfamily                  ANN     KIN     ST   73   0.860 (0.822)
HSH2D/SH2D2A                    ANN     SH2     Y    93   0.718 (0.713)
INPP5D                          PSSM    SH2     Y    65   0.655
INPPL1                          ANN     SH2     Y    95   0.666 (0.667)
InsR family                     ANN     KIN     Y    45   0.762
JAK2/JAK3                       ANN     KIN     Y    20   0.707 (0.661)
JNK subfamily                   ANN     KIN     ST   49   0.918 (0.940)
LCP2                            ANN     SH2     Y    59   0.730 (0.726)
LKB1                            PSSM    KIN     T    14   0.969
MAP2K6/MAP2K3/MAP2K4/MAP2K7     ANN     KIN     STY  16   0.716 (0.662)
MAPK3/MAPK1/MAPK7/NLK           ANN     KIN     ST   143  0.869 (0.838)
MAPKAPK family                  ANN     KIN     ST   28   0.801 (0.762)
Met family                      ANN     KIN     Y    16   0.760 (0.601)
MIST                            ANN     SH2     Y    56   0.702 (0.683)
MSN family                      PSSM    KIN     ST   -    0.830
MST family                      PSSM    KIN     ST   -    0.830
mTOR                            ANN     KIN     ST   13   0.667 (0.740)
NCK family                      ANN     SH2     Y    117  0.780 (0.785)
NEK1/NEK5/NEK3/NEK4/NEK11/NEK2  PSSM    KIN     ST   -    0.830
p38 subfamily                   PSSM    KIN     ST   54   0.857
p70S6K subfamily                PSSM    KIN     ST   -    0.830
PAKA family                     PSSM    KIN     ST   32   0.850
PAKB family                     PSSM    KIN     ST   -    0.830
PDGFR family                    PSSM    KIN     Y    23   0.673
PDK1                            ANN     KIN     ST   27   0.806 (0.773)
PIK3R1 1                        ANN     SH2     Y    235  0.842 (0.818)
PIK3R1 2                        PSSM    SH2     Y    -    0.749
PIK3R2 2                        PSSM    SH2     Y    -    0.749
PIK3R3 1/PIK3R2 1               ANN     SH2     Y    140  0.888 (0.881)
PIK3R3 2                        PSSM    SH2     Y    -    0.749
Pim2                            PSSM    KIN     ST   -    0.830
Pim3/Pim1                       PSSM    KIN     S    -    0.830
PKA family                      ANN     KIN     ST   282  0.899 (0.878)
PKB family                      PSSM    KIN     ST   77   0.907
PKC family                      ANN     KIN     ST   296  0.873 (0.867)
PKD family                      PSSM    KIN     ST   -    0.830
PKGcGK family                   ANN     KIN     ST   35   0.761 (0.681)
PLCG 1 subfamily                PSSM    SH2     Y    30   0.713
PLCG 2 subfamily                PSSM    SH2     Y    -    0.749
PLK1                            ANN     KIN     ST   42   0.599 (0.498)
PTPN11 1/PTPN6 1                ANN     SH2     Y    90   0.809 (0.792)
PTPN11 2/PTPN6 2                ANN     SH2     Y    82   0.743 (0.703)
RASA1 1                         PSSM    SH2     Y    -    0.749
RASA1 2                         PSSM    SH2     Y    -    0.749
RCK family                      PSSM    KIN     Y    -    0.830
ROCK family                     ANN     KIN     ST   21   0.792 (0.787)
RSK family                      ANN     KIN     ST   32   0.725 (0.612)
SGK family                      ANN     KIN     ST   18   0.896 (0.841)
SH2B family                     ANN     SH2     Y    184  0.727 (0.709)
SH2D1A/SH2D1B                   ANN     SH2     Y    190  0.808 (0.765)
SH2D3C                          ANN     SH2     Y    76   0.684 (0.652)
SH2D6                           PSSM    SH2     Y    -    0.749
SH3BP2                          PSSM    SH2     Y    40   0.690
SHB                             PSSM    SH2     Y    -    0.749
SHC1/SHC2/SHC3                  PSSM    PTB     Y    30   0.981
SHC1/SHC4                       ANN     SH2     Y    112  0.739 (0.678)
SHC2                            PSSM    SH2     Y    -    0.749
SHC3                            ANN     SH2     Y    58   0.636 (0.564)
SHC4                            ANN     PTB     Y    32   0.935 (0.968)
SHD                             ANN     SH2     Y    143  0.738 (0.748)
SHF                             PSSM    SH2     Y    -    0.749
SLK family                      PSSM    KIN     ST   -    0.830
SOCS2/SOCS7                     ANN     SH2     Y    96   0.675 (0.686)
SOCS5/SOCS4                     ANN     SH2     Y    64   0.801 (0.708)
Src family                      ANN     SH2     Y    339  0.763
Src family                      PSSM    KIN     Y    229  0.626 (0.743)
Srm                             ANN     SH2     Y    92   0.680 (0.64)
SUPT6H                          ANN     SH2     Y    101  0.698 (0.653)
Syk 1 subfamily                 ANN     SH2     Y    55   0.596 (0.581)
Syk 2 subfamily                 ANN     SH2     Y    54   0.829 (0.862)
Syk family                      ANN     KIN     Y    48   0.705 (0.720)
Tec family                      ANN     KIN     Y    25   0.700 (0.542)
Tec family                      ANN     SH2     Y    171  0.750 (0.716)
TENC1/TNS3/TNS1                 ANN     SH2     Y    197  0.732 (0.689)
TLK family                      PSSM    KIN     ST   -    0.830
TNS4                            PSSM    SH2     Y    47   0.794
VAV1/VAV3                       ANN     SH2     Y    59   0.879 (0.869)
VAV2                            ANN     SH2     Y    40   0.821 (0.816)
WW                              ANN     WW      ST   119  0.884 (0.893)
YSK family                      PSSM    KIN     ST   -    0.830
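The caption of Table S1 notes that an ANN with zero hidden neurons is a linear model. The sketch below illustrates why, under the assumption that each peptide position is presented to the network as 20 binary inputs (one per amino acid). The encoding, the weight layout and the function names are hypothetical and are not taken from the NetPhorest implementation.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(peptide):
    """Encode a peptide as a flat 0/1 vector with 20 inputs per position."""
    vec = []
    for aa in peptide:
        vec.extend(1.0 if aa == a else 0.0 for a in AMINO_ACIDS)
    return vec

def linear_network(peptide, weights, bias):
    """Network without a hidden layer: sigmoid of a weighted sum of the inputs."""
    x = one_hot(peptide)
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def position_weight_sum(peptide, weights, bias):
    """The same weighted sum, written as one table lookup per position."""
    return bias + sum(weights[i * 20 + AMINO_ACIDS.index(aa)]
                      for i, aa in enumerate(peptide))

# Arbitrary illustrative weights for a three-residue window.
demo_weights = [0.01 * k for k in range(60)]
demo_score = linear_network("RSP", demo_weights, -1.0)
print(demo_score)
```

Because the sigmoid is monotonic, ranking peptides by the network output or by the weighted sum gives the same order, so the AROC of such a model depends only on the position-specific weights, much as it would for a scoring matrix.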
Table S2: Benchmark of the NetPhorest method. NetPhorest was compared to four published methods for kinase-specific prediction of phosphorylation sites (Scansite [5], NetPhosK [6], GPS [7] and KinasePhos [8]) and to the simple sequence patterns collected by the ELM [9], PROSITE [10] and HPRD [11] databases. GPS and KinasePhos were benchmarked exclusively on the phosphorylation sites that are dissimilar in sequence to those used for developing the method, whereas the pattern-based methods were benchmarked on the full data set (see Methods for details). First, we tested whether each predictor performs significantly (P < 0.05) better than random; those that do not are marked with ‘R’. Predictors performing better than random were subsequently tested to see whether they perform significantly worse than NetPhorest (‘W’) or comparably to it (‘C’). No predictor from any of the tested methods performed significantly better than the corresponding NetPhorest predictor. For each predictor from the other methods, the table lists its area under the receiver operating characteristic curve (AROC), the AROC of the corresponding NetPhorest classifier and the P-value for the comparison of the two. As Scansite [5] and NetPhosK [6] are part of NetPhorest, some comparisons of classifiers from these methods are self-comparisons; these are marked by an asterisk. It was impossible to compare with other published methods such as PREDIKIN [12] and PredPhospho [13], since these methods do not support batch predictions.
Name                  Class   Method      Result  AROC (method)  AROC (NetPhorest)  P-value
Abl                   KIN     Scansite    C       0.70           0.70               *
AMPK family           KIN     Scansite    C       0.79           0.79               *
ATM                   KIN     NetPhosK    W       0.84           0.99               0
ATM                   KIN     Scansite    W       0.95           0.99               0
BLK/Lck/HCK/Lyn       SH2     Scansite    W       0.59           0.73               0
CAMK family           KIN     Scansite    C       0.79           0.81               0.23
CaMKII family         KIN     NetPhosK    W       0.54           0.79               0
CaMKII family         KIN     Scansite    C       0.81           0.81               *
CDK1                  KIN     NetPhosK    W       0.75           0.84               0
CDK1                  KIN     Scansite    C       0.85           0.85               *
CDK5                  KIN     Scansite    C       0.90           0.90               *
CDK family            KIN     NetPhosK    W       0.73           0.86               0
CK1 family            KIN     Scansite    C       0.83           0.83               *
CK2 family            KIN     NetPhosK    W       0.61           0.93               0
CK2 family            KIN     Scansite    C       0.92           0.93               0.32
CRK family            SH2     Scansite    C       0.82           0.83               0.31
DNAPK                 KIN     NetPhosK    C       0.88           0.91               0.24
DNAPK                 KIN     Scansite    W       0.80           0.91               0.05
EGFR family           KIN     NetPhosK    R       -              0.64               -
EGFR                  KIN     Scansite    C       0.63           0.63               *
Fgr/Fyn/Src/Yes       SH2     Scansite    W       0.66           0.73               0
FLT3/CSF1R/Kit/PDGFR  KIN     Scansite    C       0.65           0.65               *
GRB2 family           SH2     Scansite    W       0.89           0.93               0.01
GSK3 subfamily        KIN     NetPhosK    W       0.65           0.86               0
GSK3 subfamily        KIN     Scansite    W       0.73           0.86               0
INPPL1/INPP5D         SH2     Scansite    W       0.56           0.64               0.03
InsR/IGF1R            KIN     NetPhosK    C       0.56           0.76               1
ITK                   SH2     Scansite    W       0.76           0.85               0.01
Lck/HCK/Lyn           KIN     Scansite    C       0.59           0.59               0.53
MAPK14                KIN     Scansite    C       0.75           0.79               0.2
MAPK3                 KIN     Scansite    W       0.79           0.91               0
NCK family            SH2     Scansite    C       0.74           0.78               0.09
p38 subfamily         KIN     NetPhosK    C       0.84           0.84               *
p38 subfamily         KIN     Scansite    C       0.78           0.84               0.11
PIK3R 1 subfamily     SH2     Scansite    W       0.75           0.80               0.03
PKA family            KIN     NetPhosK    W       0.83           0.90               0
PKA family            KIN     Scansite    W       0.86           0.90               0
PKB family            KIN     NetPhosK    W       0.85           0.95               0
PKB family            KIN     Scansite    C       0.95           0.95               *
PKCdelta/PKCtheta     KIN     Scansite    C       0.61           0.64               0.39
PKC family            KIN     NetPhosK    W       0.76           0.87               0
PKC family            KIN     Scansite    W       0.74           0.87               0
PKGcGK family         KIN     NetPhosK    C       0.71           0.76               0.08
RSK family            KIN     NetPhosK    C       0.74           0.74               *
SHC1/SHC4             SH2     Scansite    R       -              0.74               -
Src family            KIN     NetPhosK    C       0.59           0.62               0.37
Src                   KIN     Scansite    C       0.56           0.60               0.19
Tec family            SH2     Scansite    W       0.61           0.75               0

14-3-3                14-3-3  HPRD        W       0.73           0.89               0
Abl                   KIN     GPS         C       0.74           0.67               0.75
Abl                   KIN     HPRD        R       -              0.67               -
Abl                   SH2     HPRD        R       -              0.70               -
AGC group             KIN     HPRD        R       -              0.86               -
AMPK subfamily        KIN     GPS         C       0.77           0.86               0.2
AMPK subfamily        KIN     HPRD        C       0.66           0.78               0.07
ATM                   KIN     GPS         W       0.85           0.99               0
ATM                   KIN     HPRD        W       0.95           0.99               0.03
ATM                   KIN     KinasePhos  C       0.91           0.99               0.06
AuroraA               KIN     HPRD        R       -              0.81               -
BRCT                  BRCT    HPRD        R       -              0.88               -
CaMKII family         KIN     GPS         R       -              0.79               -
CaMKII family         KIN     HPRD        W       0.74           0.79               0.05
CDK1                  KIN     GPS         W       0.7            0.84               0
CDK1                  KIN     HPRD        W       0.65           0.84               0
CDK1                  KIN     KinasePhos  W       0.7            0.84               0.02
CDK4                  KIN     HPRD        R       -              0.82               -
CDK5                  KIN     HPRD        W       0.53           0.97               0
CDK family            KIN     ELM         W       0.67           0.86               0
CDK family            KIN     HPRD        W       0.67           0.86               0
CDK family            KIN     KinasePhos  C       0.76           0.87               0.12
CDK family            KIN     Prosite     W       0.67           0.86               0
CK1 family            KIN     ELM         W       0.58           0.72               0.01
CK1 family            KIN     HPRD        W       0.64           0.72               0.05
CK1 family            KIN     Prosite     W       0.58           0.72               0
CK2 family            KIN     ELM         W       0.71           0.93               0
CK2 family            KIN     GPS         W       0.72           0.93               0
CK2 family            KIN     HPRD        W       0.88           0.93               0
CK2 family            KIN     KinasePhos  R       -              0.93               -
CK2 family            KIN     Prosite     W       0.82           0.93               0
CRK family            SH2     HPRD        C       0.78           0.83               0.13
DNAPK                 KIN     HPRD        C       0.78           0.91               0.11
EGFR family           KIN     HPRD        R       -              0.63               -
EGFR                  KIN     HPRD        R       -              0.59               -
Fes                   SH2     HPRD        C       0.77           0.83               0.09
Fgr                   SH2     HPRD        R       -              0.65               -
Fyn                   SH2     HPRD        W       0.51           0.83               0
GRB2                  SH2     ELM         C       0.87           0.88               0.45
GRB family            SH2     HPRD        W       0.59           0.68               0
GRK family            KIN     HPRD        R       -              0.58               -
GSK3 subfamily        KIN     ELM         W       0.73           0.86               0
GSK3 subfamily        KIN     HPRD        W       0.77           0.86               0
GSK3 subfamily        KIN     Prosite     W       0.73           0.86               0
INPPL1                SH2     HPRD        R       -              0.67               -
ITK                   SH2     HPRD        R       -              0.85               -
JAK2/JAK3             KIN     HPRD        C       0.71           0.71               0.53
JNK subfamily         KIN     HPRD        R       -              0.92               -
LKB1                  KIN     HPRD        R       -              0.97               -
MAP2K family          KIN     HPRD        R       -              0.81               -
MAPK3                 KIN     HPRD        R       -              0.91               -
MAPKAPK2              KIN     HPRD        R       -              0.86               -
MAPKAPK family        KIN     HPRD        R       -              0.80               -
mTOR                  KIN     HPRD        R       -              0.67               -
NCK family            SH2     HPRD        R       -              0.78               -
PAK family            KIN     HPRD        R       -              0.78               -
PDGFR family          KIN     HPRD        R       -              0.67               -
PDK1                  KIN     HPRD        R       -              0.81               -
PIKK family           KIN     ELM         C       0.91           0.91               0.32
PIKK family           KIN     Prosite     C       0.91           0.91               0.35
PKA family            KIN     ELM         W       0.80           0.90               0
PKA family            KIN     GPS         W       0.77           0.9                0
PKA family            KIN     HPRD        W       0.86           0.90               0.01
PKA family            KIN     KinasePhos  W       0.72           0.9                0
PKA family            KIN     Prosite     W       0.66           0.90               0
PKB family            KIN     ELM         C       0.85           0.91               0.25
PKB family            KIN     GPS         R       -              0.91               -
PKB family            KIN     HPRD        W       0.86           0.91               0.03
PKB family            KIN     Prosite     C       0.85           0.91               0.14
PKCalpha              KIN     HPRD        R       -              0.82               -
PKC family            KIN     GPS         W       0.67           0.87               0
PKC family            KIN     HPRD        W       0.78           0.87               0
PKC family            KIN     KinasePhos  W       0.66           0.87               0
PKC family            KIN     Prosite     W       0.71           0.87               0
PKGcGK family         KIN     HPRD        R       -              0.76               -
PLCG 1 subfamily      SH2     HPRD        R       -              0.71               -
PLK1                  KIN     HPRD        R       -              0.60               -
PTPN11 1              SH2     ELM         W       0.54           0.78               0
PTPN family           SH2     HPRD        W       0.63           0.76               0
SHC1                  SH2     HPRD        R       -              0.65               -
SHC family            PTB     HPRD        W       0.90           0.97               0.02
Src family            KIN     HPRD        R       -              0.62               -
Src family            SH2     HPRD        W       0.56           0.76               0
Src                   KIN     HPRD        C       0.58           0.60               0.37
Src                   SH2     ELM         W       0.61           0.74               0
Syk 1                 SH2     HPRD        R       -              0.62               -
Syk 2                 SH2     HPRD        R       -              0.83               -
Syk                   KIN     HPRD        R       -              0.67               -
TK group              KIN     HPRD        R       -              0.79               -
TNS family            SH2     HPRD        R       -              0.70               -
VAV family            SH2     HPRD        R       -              0.84               -
WW                    WW      HPRD        C       0.88           0.88               0.24
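The first step of the benchmark in Table S2 asks whether a predictor ranks known sites significantly better than random. The statistical procedure actually used is described in the Methods and is not reproduced here. As a stand-in for illustration only, the sketch below uses a label-permutation test on the AROC; the function names, the number of permutations and the toy data are all assumptions.

```python
import random

def aroc(scores, labels):
    """AROC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_p_value(scores, labels, n_perm=2000, seed=0):
    """One-sided P-value for the null hypothesis that scores are unrelated to labels."""
    rng = random.Random(seed)
    observed = aroc(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if aroc(scores, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)

# Toy data: ten positives scored above ten negatives.
toy_scores = [0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.5,
              0.45, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05, 0.0]
toy_labels = [1] * 10 + [0] * 10
p_value = permutation_p_value(toy_scores, toy_labels, n_perm=500)
print("R" if p_value >= 0.05 else "better than random", p_value)
```

Under this sketch, a predictor whose P-value is 0.05 or higher would receive the ‘R’ label; comparing two predictors on the same sites would need a paired test, which is not shown.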
References
[1] G. Manning, D. B. Whyte, R. Martinez, T. Hunter, S. Sudarsanam, The protein kinase complement of the human genome. Science 298, 1912–1934 (2002).
[2] B. A. Liu, K. Jablonowski, M. Raina, M. Arce, T. Pawson, P. D. Nash, The human and mouse complement of SH2 domain proteins—establishing the boundaries of phosphotyrosine signaling. Mol. Cell 22, 851–868 (2006).
[3] J. E. Hutti, E. T. Jarrell, J. D. Chang, D. W. Abbott, P. Storz, A. Toker, L. C. Cantley, B. E. Turk, A rapid method for determining protein kinase phosphorylation specificity. Nat. Methods 1, 27–29 (2004).
[4] C. T. Workman, Y. Yin, D. L. Corcoran, T. Ideker, G. D. Stormo, P. V. Benos, enoLOGOS: a versatile web tool for energy normalized sequence logos. Nucleic Acids Res. 33, W389–W392 (2005).
[5] J. C. Obenauer, L. C. Cantley, M. B. Yaffe, Scansite 2.0: Proteome-wide prediction of cell signaling interactions using short sequence motifs. Nucleic Acids Res. 31, 3635–3641 (2003).
[6] N. Blom, T. Sicheritz-Ponten, R. Gupta, S. Gammeltoft, S. Brunak, Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence. Proteomics 4, 1633–1649 (2004).
[7] Y. Xue, F. Zhou, M. Zhu, K. Ahmed, G. Chen, X. Yao, GPS: a comprehensive www server for phosphorylation sites prediction. Nucleic Acids Res. 33, W184–W187 (2005).
[8] Y. H. Wong, T. Y. Lee, H. K. Liang, C. M. Huang, T. Y. Wang, Y. H. Yang, C. H. Chu, H. D. Huang, M. T. Ko, J. K. Hwang, KinasePhos 2.0: a web server for identifying protein kinase-specific phosphorylation sites based on sequences and coupling patterns. Nucleic Acids Res. 35, W588–W594 (2007).
[9] P. Puntervoll, R. Linding, C. Gemund, S. Chabanis-Davidson, M. Mattingsdal, S. Cameron, D. M. Martin, G. Ausiello, B. Brannetti, A. Costantini, F. Ferre, V. Maselli, A. Via, G. Cesareni, F. Diella, G. Superti-Furga, L. Wyrwicz, C. Ramu, C. McGuigan, R. Gudavalli, I. Letunic, P. Bork, L. Rychlewski, B. Kuster, M. Helmer-Citterich, W. N. Hunter, R. Aasland, T. J. Gibson, ELM server: a new resource for investigating short functional sites in modular eukaryotic proteins. Nucleic Acids Res. 31, 3625–3630 (2003).
[10] N. Hulo, A. Bairoch, V. Bulliard, L. Cerutti, E. De Castro, P. S. Langendijk-Genevaux, M. Pagni, C. J. A. Sigrist, The PROSITE database. Nucleic Acids Res. 34, D227–D230 (2006).
[11] R. Amanchy, B. Periaswamy, S. Mathivanan, R. Reddy, S. G. Tattikota, A. Pandey, A curated compendium of phosphorylation motifs. Nat. Biotechnol. 25, 285–286 (2007).
[12] R. I. Brinkworth, R. A. Breinl, B. Kobe, Structural basis and prediction of substrate specificity in protein serine/threonine kinases. Proc. Natl. Acad. Sci. U.S.A. 100, 74–79 (2003).
[13] J. H. Kim, J. Lee, B. Oh, K. Kimm, I. Koh, Prediction of phosphorylation sites using SVMs. Bioinformatics 20, 3179–3184 (2004).