Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models
TRANSCRIPT
![Page 2: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/2.jpg)
Molecular function
Molecular structure
Sequence
The paradigm
![Page 3: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/3.jpg)
Protein No.
FLAV_CLOBE 1 A . . . I V Y W S G T G N T E K M A E
CYSJ_THIRO 2 A . I T I L F G S Q T G N A K A V A E
…
Detecting homology
![Page 4: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/4.jpg)
Figure: r.m.s.d. (axis from 0.00 to 3.50) plotted against the fraction of sequence identity after structural superposition (axis from 1.0 to 0).
$\text{r.m.s.d.} = \left[\frac{1}{N}\sum d^{2}\right]^{1/2}$
Chothia and Lesk, EMBO J., 1986
Proteins evolve
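The r.m.s.d. definition on this slide, [(1/N) Σ d²]^(1/2), can be illustrated with a minimal sketch. It assumes the two structures have already been superimposed and that `distances` holds the distance d between each pair of equivalent atoms; the function name and the sample values are illustrative and do not come from the talk.

```python
import math

def rmsd(distances):
    """Root-mean-square deviation: [(1/N) * sum(d^2)]^(1/2)."""
    if not distances:
        raise ValueError("need at least one distance")
    n = len(distances)
    return math.sqrt(sum(d * d for d in distances) / n)

# Hypothetical per-atom distances after superposition (illustrative values only)
example_distances = [3.0, 4.0]
print(rmsd(example_distances))
```

Because the distances are squared before averaging, a few badly placed atoms dominate the value, which is one reason distance-cutoff measures are often reported alongside it.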
![Page 5: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/5.jpg)
AVGIFRAAVCTRGVAKAVDFVP
AVGIFRAAVCTRGVAKAVDFVP
| || | | || |||||   ||
AIGIWRSATCTKGVAKA--FVA
+
If the alignment is correct, we can use the Chothia and Lesk relationship to predict the expected quality of the model.
Comparative modelling
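As a rough sketch of that reasoning, the code below computes the fraction of identical residues for the alignment shown on the slide and feeds it into an exponential fit of the Chothia and Lesk type. The coefficients 0.40 and 1.87 are the values commonly quoted for that fit and are an assumption here, since the talk does not state them; the function names are also illustrative.

```python
import math

def fraction_identity(seq_a, seq_b, gap="-"):
    """Fraction of identical residues over aligned (non-gap) columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no aligned residue pairs")
    identical = sum(1 for a, b in pairs if a == b)
    return identical / len(pairs)

def expected_rmsd(identity, a=0.40, b=1.87):
    """Assumed fit: rmsd ~ a * exp(b * H), with H = 1 - identity.

    The default coefficients are an assumption, not values given in the talk.
    """
    return a * math.exp(b * (1.0 - identity))

# Target and template rows of the alignment shown on the slide
target = "AVGIFRAAVCTRGVAKAVDFVP"
template = "AIGIWRSATCTKGVAKA--FVA"

identity = fraction_identity(target, template)
print(f"fraction identity: {identity:.2f}")
print(f"expected r.m.s.d.: {expected_rmsd(identity):.2f}")
```

The estimate only holds under the slide's own condition, namely that the alignment is correct; an alignment error invalidates it regardless of the identity level.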
![Page 6: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/6.jpg)
Orengo, Curr. Op. Str. Biol, 1994
AVGIFRAAVCTRGVAKAVDFVPVESMETTMRSPVFTDNSSPPAVPQSFQVAHLHAPTGSGKSTKVPAAYAAQGYKVLVLNPSVAATLGFGAYMSKAHGIDPNIRTGVRTITTGAPVTYSTYGKFLADGGCSGGAYDIIICDECHSTDSTTILGIGTVLDQAETAGARLVVLATATPPGSVTVPHPNIEEVALSNTGEIP
Score and select model
Fold recognition
![Page 7: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/7.jpg)
Bystroff and Baker, JMB, 1998
AVGIFRAAVCTRGVAKAVDFVP…
AVGIFR AAVCTR GVAKAVDF
Fragment based
![Page 11: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/11.jpg)
Bystroff and Baker, JMB, 1998
AVGIFRAAVCTRGVAKAVDFVP…
AVGIFR AAVCTR GVAKAVDF
Score and select model
Fragment based
![Page 12: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/12.jpg)
Moult et al., Proteins, 1995
CASP: Critical Assessment of Techniques for Protein Structure Prediction
AVSRAFTRAFTAAFDGHTYIPK
The evaluation
![Page 13: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/13.jpg)
Tramontano, NSB, 2003
Figure: Models, Targets and Groups, plotted over 1 to 6.
The evaluation
![Page 14: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/14.jpg)
Cozzetto and Tramontano, Proteins, 2004
CASP4, CASP5, CASP6: best models
Figure: plot with series casp4, casp5 and casp6.
The evaluation
![Page 15: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/15.jpg)
http://predictioncenter.gov
State of the art
Moult et al., Proteins, 2005.
![Page 16: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/16.jpg)
http://www.caspur.it/PMDB
Castrignanò et al., NAR, 2006.
State of the art
![Page 17: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/17.jpg)
Structural genomics
![Page 18: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/18.jpg)
Protein crystallization
Diffraction data measurements
Model building
Phase estimation
Protein preparation
Molecular replacement
![Page 19: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/19.jpg)
Rotation search
Translation search
?
Model
Molecular replacement
![Page 20: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/20.jpg)
Molecular replacement
Completely automatic procedure:
CASP Models
MolRep (10x10)
AMoRe (20)
RefMac (10)
ArpWarp
![Page 21: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/21.jpg)
Giorgetti et al., Bioinformatics, 2005
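Figure: scale marked 100, 80, 60, 40 and "?".
Molecular replacement
GDT-TS (distance-based measure): $\text{GDT-TS} = \frac{N_{CA}(1\,\text{Å}) + N_{CA}(2\,\text{Å}) + N_{CA}(4\,\text{Å}) + N_{CA}(8\,\text{Å})}{4}$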
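The GDT-TS formula can be sketched as follows. Two assumptions are made that the slide does not spell out: each N_CA term is expressed as the percentage of Cα atoms within the cutoff (so the score runs from 0 to 100), and all four terms are evaluated on one fixed superposition given as a list of Cα distances in Å. The function name and sample distances are illustrative.

```python
def gdt_ts(ca_distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average, over the cutoffs, of the percentage of CA atoms within each cutoff.

    Assumes a single fixed superposition; this is a simplification.
    """
    if not ca_distances:
        raise ValueError("need at least one distance")
    n = len(ca_distances)
    percentages = [
        100.0 * sum(1 for d in ca_distances if d <= cutoff) / n
        for cutoff in cutoffs
    ]
    return sum(percentages) / len(cutoffs)

# Hypothetical CA-CA distances in Å (illustrative values only)
print(gdt_ts([0.5, 1.5, 3.0, 10.0]))
```

Unlike r.m.s.d., a single residue that is far off (the 10 Å value above) only lowers the score by its share of the residues rather than dominating it.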
![Page 22: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/22.jpg)
Giorgetti et al., submitted
Molecular replacement
What if we don’t know the quality of the model?
What if we don’t know how to build models?
![Page 23: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/23.jpg)
Molecular replacement
Giorgetti et al., submitted
ACTFGARTEADEASRTFCGAVHIGFRLPMNHTYWPLYHMVCS…
Structure factors
![Page 25: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/25.jpg)
Molecular replacement
60% success rate
If one of the retrieved models works, the procedure is successful
![Page 26: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/26.jpg)
molecular: catalytic activity
biological: blood coagulation
cellular: extracellular
Function prediction
![Page 27: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/27.jpg)
Moult et al., Proteins, 1995
AVSRAFTRAFTAAFDGHTYIPK
The experiment
?
![Page 28: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/28.jpg)
Scheme of the experiment
Collect known info on targets
Ask people to provide ADDITIONAL information
Compare predictions
Is there a consensus?
Once the structure is known, can we say more?
Function prediction
![Page 29: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/29.jpg)
EC Number
Binding
Binding site(s)
Residue role(s)
PT modifications
Free text comments
Soro and Tramontano, Proteins, 2005
Function prediction
![Page 30: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/30.jpg)
We had too few predictions per target to derive any sensible conclusion.
However, for the sake of the experiment, we tried to see what we could do and what the problems would be in analysing the data (other than the format), pretending that the numbers were significant.
Function prediction
![Page 31: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/31.jpg)
Summary table for target T0230
Molecular function: Unknown / COG annotation: Predicted metal-sulfur cluster biosynthetic enzyme (Group: General function prediction only; Category: Poorly characterized)
Predictions:

| GO number | GO name | frequency |
| --- | --- | --- |
| 287 | magnesium ion binding | 1 |
| 4176 | ATP-dependent peptidase activity | 1 |
| 4475 | mannose-1-phosphate guanylyltransferase activity | 1 |
| 4476 | | 1 |
| 4672 | protein kinase activity | 1 |
| 5094 | Rho GDP-dissociation inhibitor activity | 1 |
| 5554 | Molecular function unknown | 1 - |
| 6812 | PROCESS | (1) |
| 6825 | PROCESS | (1) |
| 8170 | N-methyltransferase activity | 1 |
| 16822 | hydrolase activity, acting on acid carbon-carbon bonds | 1 |
| 46872 | metal ion binding | 1 |

Function prediction
![Page 32: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/32.jpg)
Summary table for target T0230
Molecular function: Unknown / COG annotation: Predicted metal-sulfur cluster biosynthetic enzyme (Group: General function prediction only; Category: Poorly characterized)
Predictions:

| GO number | GO name | frequency | GO Parents |
| --- | --- | --- | --- |
| 287 | magnesium ion binding | 1 | 46872, 43167, 5488 |
| 4176 | ATP-dependent peptidase activity | 1 | 8233, 16787, 3824 |
| 4475 | mannose-1-phosphate guanylyltransferase activity | 1 | 8905, 16779, 16772, 16740, 3824 |
| 4672 | protein kinase activity | 1 | 16773, 16772, 16740 (16301), 3824 |
| 5094 | Rho GDP-dissociation inhibitor activity | 1 | 5092, 5083, 30695, 30234 |
| 8170 | N-methyltransferase activity | 1 | 8168, 16741, 16740, 3824 |
| 16822 | hydrolase activity, acting on acid carbon-carbon bonds | 1 | 16787, 3824 |
| 46872 | metal ion binding | 1 | 43167, 5488 |

Function prediction
![Page 34: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/34.jpg)
16787 hydrolase
3824 catalytic activity
16740 transferase activity
Function prediction
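One way to see why terms such as 3824, 16740 and 16787 stand out is to count how many of the predicted GO terms for T0230 list each parent. The sketch below does that with the GO parents from the summary table; the counting approach, the `min_count` threshold and the function name are assumptions for illustration and are not presented here as the procedure used by the authors. The parenthesised 16301 entry is left out, which is also an assumption.

```python
from collections import Counter

# GO parents as listed in the summary table for target T0230
# (the parenthesised 16301 entry is omitted; that choice is an assumption)
go_parents = {
    287: [46872, 43167, 5488],
    4176: [8233, 16787, 3824],
    4475: [8905, 16779, 16772, 16740, 3824],
    4672: [16773, 16772, 16740, 3824],
    5094: [5092, 5083, 30695, 30234],
    8170: [8168, 16741, 16740, 3824],
    16822: [16787, 3824],
    46872: [43167, 5488],
}

def shared_parents(parents_by_term, min_count=2):
    """Count how many predicted terms list each parent; keep the shared ones."""
    counts = Counter(
        parent
        for parents in parents_by_term.values()
        for parent in set(parents)
    )
    return {p: c for p, c in counts.most_common() if c >= min_count}

consensus = shared_parents(go_parents)
for parent, count in consensus.items():
    print(parent, count)
```

Individually every prediction has frequency 1, so agreement only becomes visible once the predictions are compared at the level of their parents.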
![Page 35: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/35.jpg)
| Target | Group | Mod | GO | GO name |
| --- | --- | --- | --- | --- |
| T0226 | P0009 | 1 | 4347 | glucose-6-phosphate isomerase activity |
| T0226 | P0050 | 1 | 16429 | tRNA (adenine-N1-)-methyltransferase activity |
| T0226 | P0050 | 3 | 16429 | tRNA (adenine-N1-)-methyltransferase activity |
| T0226 | P0070 | 1 | 4347 | glucose-6-phosphate isomerase activity |
| T0226 | P0344 | 1 | 4360 | glutamine-fructose-6-phosphate transaminase (isomerizing) activity |
| T0226 | Cons | | | Transferase activity/Isomerase activity |
| T0243 | P0050 | 2 | 3980 | UDP-glucose:glycoprotein glucosyltransferase activity |
| T0243 | P0050 | 3 | 4581 | dolichyl-phosphate beta-glucosyltransferase activity |
| T0243 | P0070 | 1 | 3700 | transcription factor activity |
| T0243 | P0237 | 1 | 3677 | DNA binding |
| T0243 | P0344 | 1 | 3677 | DNA binding |
| T0243 | Cons | | | DNA binding/Transferase activity |
| T0249 | P0003 | 1 | 3677 | DNA binding |
| T0249 | P0070 | 1 | 3700 | transcription factor activity |
| T0249 | P0100 | 1 | 3700 | transcription factor activity |
| T0249 | P0237 | 1 | 3677 | DNA binding |
| T0249 | P0344 | 1 | 3677 | DNA binding |
| T0249 | P0589 | 1 | 3700 | transcription factor activity |
| T0249 | Cons | | | DNA binding/Transcription factor activity |
| T0263 | P0049 | 1 | 3754 | chaperone activity |
| T0263 | P0050 | 1 | 5098 | Ran GTPase activator activity |
| T0263 | P0050 | 2 | 5098 | Ran GTPase activator activity |
| T0263 | P0070 | 1 | 4497 | monooxygenase activity |
| T0263 | P0237 | 1 | 16491 | oxidoreductase activity |
| T0263 | P0344 | 1 | 3676 | nucleic acid binding |
| T0263 | Cons | | | Binding/Oxidoreductase activity/Enzyme regulator activity |
| T0266 | P0003 | 1 | 3723 | RNA binding |
| T0266 | P0049 | 1 | 3754 | chaperone activity |
| T0266 | P0050 | 1 | 4587 | ornithine-oxo-acid transaminase activity |
| T0266 | P0050 | 3 | 4047 | aminomethyltransferase activity |
| T0266 | P0096 | 1 | 4827 | proline-tRNA ligase activity |
| T0266 | P0237 | 1 | 166 | nucleotide binding |
| T0266 | P0726 | 1 | 4812 | tRNA ligase activity |
| T0266 | Cons | | | Binding/Transport activity/Transferase activity |

Results: GO consensus
Soro and Tramontano, Proteins, 2005
Function prediction
![Page 36: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/36.jpg)
18 months later…
Annotations in DB decreased by 5%
24 new targets were annotated
We looked at methods (abstracts, directly contacting predictors, literature)
Function prediction
![Page 37: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/37.jpg)
Figure: values 4, 5, 2, 2, 2, 1, 11 shown with the binary codes 11011, 10011, 10100, 10001, 10000, 11100, 10101, 11001.
Function prediction
![Page 38: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/38.jpg)
18 months later…
4 newly annotated targets had been correctly predicted by at least one method
85% of the consensus non-redundant predictions were correct
Function prediction
![Page 40: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/40.jpg)
Function prediction
![Page 41: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/41.jpg)
CASP is about to start again:
We will start collecting targets next week
There will be a few differences
http://predictioncenter.org
Announcements
![Page 42: Anna.Tramontano@uniroma1.it DIMACS 2006 Quality and effectiveness of protein structure models](https://reader030.vdocument.in/reader030/viewer/2022032611/56649ee65503460f94bf70e3/html5/thumbnails/42.jpg)
BioSapiens - EU VI Framework
Ministero della Salute
Università di Roma
Istituto Pasteur Roma
Facoltà di Medicina
San Paolo
CNR
Claudia Bonaccini, Michele Ceriani, Domenico Cozzetto, Emanuela Giombini, Alejandro Giorgetti, Paolo Marcatili, Veronica Morea, Romina Oliva, Massimiliano Orsini, Marialuisa Pellegrini, Domenico Raimondo, Simonetta Soro, Ivano Talamo
Krzysztof Fidelis, Tim Hubbard, Andriy Kryshtafovych, John Moult, Burkhard Rost, Adam Zemla
Structural biologists
Predictors
Acknowledgements